ABSTRACT: One of the main reasons for using asynchronous design is that it offers the opportunity to exploit the data-dependent latency of many operations in order to achieve low power, high performance, or low area. This paper describes a novel asynchronous iterative multiplier which exhibits data dependency both in the number of iterations required to produce the result and in the delay of each step of the iteration. A preliminary evaluation of the multiplier, implemented using standard cells, shows that speed improvements can be achieved in comparison to a standard iterative radix-4 Booth multiplier.
Asynchronous Circuits and Systems, 2004. Proceedings. 10th International Symposium on; 05/2004
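The abstract above does not say how the novel multiplier works, so the following is only a minimal sketch of the baseline it names: textbook radix-4 Booth recoding, with zero digits skipped to show one way an iteration count can become data-dependent. The function names and the zero-skipping policy are assumptions made for illustration, not the paper's design.

```python
def booth_radix4_digits(y, width):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first, such that
    sum(d * 4**k) equals y interpreted as a signed `width`-bit value.
    """
    if width <= 0 or width % 2:
        raise ValueError("width must be a positive even number")
    y &= (1 << width) - 1          # keep only the low `width` bits
    digits = []
    prev = 0                       # implicit bit y[-1] = 0
    for i in range(0, width, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, width):
    """Multiply x by signed `width`-bit y; also count the non-zero steps."""
    product = 0
    steps = 0
    for k, d in enumerate(booth_radix4_digits(y, width)):
        if d == 0:
            continue               # illustrative assumption: zero digits cost no step
        product += d * x * 4 ** k
        steps += 1
    return product, steps


if __name__ == "__main__":
    print(booth_multiply(7, 6, 8))   # (product, number of add/subtract steps)
```

In this sketch a fixed-iteration multiplier would always take `width // 2` steps, whereas the step count returned here varies with the bit pattern of `y`.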
10th International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC 2004), 19-23 April 2004, Crete, Greece; 01/2004
ABSTRACT: The requirement for extended battery life, reduced size and low electromagnetic interference (EMI) for mobile communication equipment has led to the development of a novel low-power asynchronous digital signal processor (DSP) device known as CADRE (configurable asynchronous DSP for reduced energy). The architecture of CADRE is based on an asynchronous ‘function unit’ (FU) which has been designed to reduce power consumption as far as possible without sacrificing speed. This paper describes current work in redesigning the FU hardware to achieve improvements in power efficiency with the use of pass-transistor logic, voltage scaling and modifications to the hardware multiplication and addition circuitry. Software support for the new DSP is needed to avoid manual assembly-level programming and a ‘C’ compiler is being developed to make the best use of the power saving features of CADRE when implementing speech coding algorithms as required by 3G mobile telephony. A cost database and ‘pattern’ program for guiding the energy aware compilation is used and three optimisation strategies are described. Results are presented which demonstrate the effectiveness of the optimisations.
DSP enabled Radio, 2003 IEE Colloquium on; 10/2003
Low Power IC Design (Ref. No. 2001/042), IEE Seminar on; 02/2001
ABSTRACT: Viterbi decoders are used for decoding data encoded using
convolutional forward error correction codes or data that suffers from
inter-symbol interference. They occur in a large proportion of digital
transmission and digital recording systems, including digital mobile
telephony and digital TV broadcast, CD-ROM and magnetic disk reading.
This paper describes a design for a self-timed Viterbi decoder. The new
design is based upon serial, unary arithmetic for the manipulation and
storage of metrics. In the trace-back system, multiple concurrent
trace-backs may be running and trace-backs are terminated as soon as
they cease to be useful. The new architecture occupies between 23% and
29% less area than a selection of synchronous implementations with the
same design parameters which use the same process and cell-library.
Asynchronous Circuits and Systems, 2001. ASYNC 2001. Seventh International Symposium on; 02/2001
7th International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC 2001), 11-14 March 2001, Salt Lake City, UT, USA; 01/2001
ABSTRACT: By using the rotation of polarised light in a liquid crystal display, pixels can perform AND and OR logic functions. Such logic operations can be combined to implement more complex functions, and the results of an optical four-to-one multiplexer and a four-bit parallel shifter are presented.
Journal of Physics D Applied Physics 11/2000; 21(10S):S153. · 2.54 Impact Factor
ABSTRACT: An early-open latch controller for use in self-timed micropipeline
circuits is described. It switches into normally-open mode shortly
before the arrival of the data, preventing energy dissipation due to
data propagation down the pipe while preserving the speed advantage of a
normally-open latch controller. This is confirmed by comparing the
throughput and power for a large circuit with those obtained using other
Electronics Letters 02/2000; · 0.96 Impact Factor
ABSTRACT: Current interest in self-timed systems is motivated by the area,
power and design effort required for the global clock of VLSI
synchronous designs. A self-timed data-path, based on the ARM (Advanced
RISC Machine) processor, using `micropipeline' control techniques has
been developed for a newly updated high-performance differential bipolar
technology. This paper describes the architectural model produced to
verify the correctness of the prototype design, and the use of the model
in evaluating and enhancing the processor performance. Self-timed design
comprises independent blocks whose operation depends solely on input
data and unit availability. The modelling of the dynamic behaviour of
blocks and the control structures required are presented. These
illustrate how easily and well the self-timed operation is mapped onto
the Verilog modelling language. Benchmark results on the processor
indicate a factor-of-two performance improvement over a CMOS version.
The system state at a particular instant is difficult to determine and
the effects of interactions between modules are difficult to quantify.
The use of the model to explore design changes, particularly to the
buffering structures, is presented. This allows the design to be `tuned'
to the technology. It also enables a better understanding of total
IEE Proceedings - Computers and Digital Techniques 12/1997;
ABSTRACT: A high performance differential bipolar datapath based on the ARM
architecture has been designed using `micropipeline' self-timed
techniques. The datapath design included a full-custom 31×32 bit
register bank. Traditional bipolar single-ended design techniques are
not suited to implementing a RAM of this size on the target technology.
This has led to the adoption of a fully differential circuit for the RAM
cell here. The paper describes the challenges of designing such a
differential register bank and the surrounding self-timed control. The
data path has been fabricated by GEC Plessey Semiconductors and is fully
operational. Results for the register bank are presented in terms of
speed, power consumption and area.
IEE Proceedings - Circuits Devices and Systems 11/1997; · 0.36 Impact Factor
ABSTRACT: In this paper, the advantages of an asynchronous approach for efficient, concurrent operation are presented. Different approaches to implementing asynchronous designs are summarised leading to the selection of bundled-data transition signalling. Some key differences between CMOS and differential bipolar design are discussed with reference to their impact upon the microprocessor design. Finally, the performance potential of this approach is given, indicating the benefits of this work.
Introduction: Current high performance chip designs are devoting a significant silicon area and high power to generate and meet the speed of the global clock required in a synchronously timed system. Asynchronous techniques, whereby units on an integrated circuit operate at their natural speed, when ready, offer a route to significant design simplification, smaller area and lower power. This in turn should result in increased processor throughput. TAM-ARM is a collaborative project between ARM Ltd....
ABSTRACT: Differential and single-ended operation are compared for
optoelectronic logic based on phototransistors and LEDs. This
demonstrates the advantages of operating differentially to obtain
tolerant elements which do not require tuning. Fan-out levels and
interconnection strategies for this logic are discussed leading to the
adoption of local interconnections with a fan-out of four to support
regular 2D logic arrays of 4-to-1 multiplexers. It is shown that 4-to-1
multiplexers not only minimise the number of elements required to
perform a function but also minimise the number of interstage
connections and the number of stages required. The flexibility and power
of the logic functionality and interconnection approach are demonstrated
for both combinatorial and sequential logic where the use of shared
outputs leads to significant logic compaction. This considerably
increases the functionality and parallelism offered by a system.
IEE Proceedings - Optoelectronics 01/1995; · 0.71 Impact Factor
ABSTRACT: It is shown that any combinational function can be implemented by
a multiplexer of appropriate width. Since these logic elements can also
perform sequential operations, entire digital systems can be implemented
with multiplexers. Such systems are of particular relevance to optical
implementations as multiplexers using liquid crystal electro-optic
elements can be realised. The large number of pixels available on a
single display leads to the concept of significant optical systems
implemented using two-dimensional arrays of uncommitted multiplexers.
Since the multiplexer width is critical in determining the system
efficiency, the implementation of commonly occurring logic functions is
presented to determine the optimum width.
Optoelectronics [see also IEE Proceedings-Optoelectronics], IEE Proceedings J 11/1990;
ABSTRACT: Serial shifting techniques are slow whereas a totally parallel approach, though fast, is impractical for implementational reasons. A novel intermediate approach is presented whereby circular shifting is performed over two or three levels depending on the factorisation of the shift length n. The use of read-only memories to control its operation and to provide arithmetic and logical shifting produces a hardware structure which can be automatically generated by software for use in cell-based integrated circuit design. This implementation is compared for CMOS and bipolar differential mode logic.
Computers and Digital Techniques, IEE Proceedings E. 02/1990;
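As a rough illustration of the multi-level idea in the abstract above, a rotation over a width n = a × b can be split into a coarse stage (a multiple of b) followed by a fine stage (less than b), so each level selects among far fewer options than a fully parallel shifter. This sketch assumes n is the width of the word being rotated and shows only the two-level case; the function names are hypothetical and the paper's ROM-based control is not modelled.

```python
def rotate_left(bits, s):
    """Reference circular left shift of a list of bits by s places."""
    s %= len(bits)
    return bits[s:] + bits[:s]


def two_level_rotate(bits, s, b):
    """Rotate by s using a coarse stage and a fine stage.

    Assumes len(bits) = a * b. The coarse stage rotates by one of `a`
    multiples of b; the fine stage rotates by one of `b` small amounts.
    """
    n = len(bits)
    if b <= 0 or n % b:
        raise ValueError("b must be a positive factor of the word width")
    coarse, fine = divmod(s % n, b)
    stage1 = rotate_left(bits, coarse * b)   # level 1: a = n // b choices
    return rotate_left(stage1, fine)         # level 2: b choices


if __name__ == "__main__":
    word = list(range(12))                   # element values mark original positions
    print(two_level_rotate(word, 7, 4))
```

With n = 12 and b = 4, each level here chooses among 3 or 4 options rather than 12, which is the kind of trade-off between the serial and fully parallel extremes that the abstract describes.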
ABSTRACT: It is shown that any n variable boolean function can be
implemented with a 2^(n-1)-to-one multiplexer. The ability to
perform combinational and sequential logic gives the element universal
computability. The choice of the multiplexer width represents a
trade-off between the number of elements and the redundancy in a system.
Examples of commonly occurring logic in systems are presented. These
show that the optimum choice of width is four-to-one.
Optics in Computing, IEE Colloquium on; 12/1988
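The claim that an n-variable function fits a 2^(n-1)-to-one multiplexer can be illustrated with the standard Shannon-expansion construction: n-1 variables drive the select lines, and each data input is tied to 0, 1, the remaining variable, or its complement. The sketch below assumes that construction (the abstract does not spell out its method), and `mux_data_inputs` and `mux_eval` are hypothetical names.

```python
from itertools import product


def mux_data_inputs(f, n):
    """Derive the 2**(n-1) data inputs for a multiplexer realising f.

    The first n-1 variables act as select lines. Each data input is
    "0", "1", "x" (the last variable) or "not x" (its complement).
    """
    inputs = []
    for sel in product((0, 1), repeat=n - 1):
        lo = int(f(*sel, 0))       # function value when the last variable is 0
        hi = int(f(*sel, 1))       # function value when the last variable is 1
        if lo == hi:
            inputs.append(str(lo)) # output does not depend on the last variable
        elif hi == 1:
            inputs.append("x")
        else:
            inputs.append("not x")
    return inputs


def mux_eval(inputs, bits):
    """Evaluate the multiplexer for one assignment of all n variables."""
    *sel, last = bits
    index = 0
    for bit in sel:                # first variable is the most significant select bit
        index = (index << 1) | bit
    return {"0": 0, "1": 1, "x": last, "not x": 1 - last}[inputs[index]]


def majority(a, b, c):
    """Three-input majority function used as a worked example."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    print(mux_data_inputs(majority, 3))   # ['0', 'x', 'x', '1']
```

For the three-variable majority function this yields a four-to-one multiplexer, the width the abstract reports as the optimum choice.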
ABSTRACT: Energy-efficient computing in a DSP has become an important research issue because modern portable devices need longer battery operating times. An energy-efficient functional unit (FU) has been designed and implemented for an in-house asynchronous DSP named Configurable Asynchronous DSP for Reduced Energy (CADRE). The CADRE successor (CADRE-s) has been implemented as a full-custom design, and simulations are presented that demonstrate the energy efficiency of the FU. The results show that the FU achieves an energy improvement by a factor of 5 in the multiply-accumulator units and a factor of nearly 2 for the overall system compared with the original CADRE system. This demonstrates the importance of energy-efficient logic, circuit and layout techniques to a design.