-
ABSTRACT: Fast parallel multipliers that contain logarithmic partial-product reduction trees pose a challenge to simulation-based high-accuracy timing verification, since the reduction tree has many reconvergent signal branches. However, such a multiplier architecture also offers a clue as to how to attack the test-vector generation problem: the timing-critical paths are intimately associated with long carry propagation. We introduce a multiplier test-vector generation method that can exercise such long carry-propagation paths. Through extensive circuit simulation and static timing analysis, we evaluate the quality of the test vectors that result from the new method. Especially for fast multipliers with pronounced carry propagation, the timing-critical vectors manage to stimulate a path whose delay comes close to the true worst-case delay. We investigate the complexity and run time of the test-vector generation, and derive timing-critical vectors for factor word lengths of up to 54 bits.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 05/2006; · 1.22 Impact Factor
-
ABSTRACT: We present a twin-precision multiplier that in normal operation mode efficiently performs N-b multiplications. For applications where the demand on precision is relaxed, the multiplier can perform N/2-b multiplications while expending only a fraction of the energy of a conventional N-b multiplier. For applications with high demands on throughput, the multiplier is capable of performing two independent N/2-b multiplications in parallel. A comparison between two signed 16-b multipliers, where both perform single 8-b multiplications, shows that the twin-precision multiplier has 72% lower power dissipation and 15% higher speed than the conventional one, while only requiring 8% more transistors.
Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings. IEEE International Conference on; 11/2004
-
ABSTRACT: Gate leakage power dissipation is predicted to overtake subthreshold leakage power within the next few years, adding further problems for designers trying to meet a strict power budget. In this paper, a power cut-off technique is proposed that, in sleep mode, suppresses not only subthreshold leakage but also gate leakage. The proposed technique combines low total leakage power with a short wake-up time.
Solid-State Circuits Conference, 2004. ESSCIRC 2004. Proceeding of the 30th European; 10/2004
-
ABSTRACT: Glitches are common in arithmetic circuits, especially in large multipliers, where they often represent the major part of all transitions. With the aim of providing a judicious glitch-reduction strategy, we extract and study the relation between generated and propagated glitches for three different arithmetic blocks. We show that the number of propagated glitches is far larger than the number of generated glitches, regardless of circuit type, supply voltage, and threshold voltage. In contrast to existing glitch-reduction strategies, we propose to focus also on the glitch propagation mechanism. It is shown how the inverting property of adder cells can be harnessed to reduce the propagation of glitches and thus the overall power dissipation.
Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on; 06/2004
-
ABSTRACT: We employ a dynamic pass-transistor technique to drastically reduce the area requirement and power dissipation of the dot-operator cell in parallel-prefix adders. The technique is demonstrated in both 0.35 μm and 0.13 μm process technologies on a 64-bit Kogge-Stone carry tree. In a comparison with a corresponding domino implementation, it is shown that the transistor count and the power dissipation can be reduced by as much as 25% and 50%, respectively. On top of the area and power reduction, the delay can also be significantly reduced by using NMOS precharge transistors, but this requires a clock signal with a higher voltage.
Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on; 06/2004
-
ABSTRACT: We consider the power-optimal design of dual-V_T CMOS circuits under challenging delay constraints, with threshold voltages and device sizes as design variables. We show that the presence of interconnect resistance affects the optimum choices of V_T and device sizes, and that ignoring the resistance can lead to highly suboptimal results. We also present criteria for deciding when interconnect resistance should be taken into account.
VLSI, 2003. Proceedings. IEEE Computer Society Annual Symposium on; 03/2003
-
ABSTRACT: Full-custom design is considered superior to standard-cell design when a high-performance circuit is required. The structured routing of critical wires is considered to be the most important contributor to this performance gap. However, this holds only for bitsliced designs, such as ripple-carry adders, and not for designs with inter-bitslice interconnections spanning several bitslices, such as tree adders and reduction-tree multipliers. It is found that standard-cell design techniques scale better with the data width than full-custom bitsliced layouts for designs dominated by inter-bitslice interconnections.
Design Automation Conference, 2003. Proceedings of the ASP-DAC 2003. Asia and South Pacific; 02/2003
-
ABSTRACT: A new regular partial-product reduction tree for parallel multipliers is presented in this paper. The reduction tree has a simple and efficient interconnect configuration and a minimal hardware usage. The reduction tree has a gate structure, which allows for extensive use of carry-propagation adders. Since carry-propagation adders can be very efficiently implemented, significant delay reduction is expected for large multipliers.
Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Symposium on; 06/2001
-
ABSTRACT: A fast and area-efficient 32-b Manchester carry-bypass adder with low energy-delay product is presented in this paper. The high speed is achieved by the use of optimized bypass circuitry and fast repeater elements in the carry path. The fabricated adder has a measured worst-case delay of 2.8 ns and consumes 30 μW/MHz.
Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Symposium on; 06/2001
-
ABSTRACT: For 10 Gigabit Ethernet, CRC-32 generation is essential and timing-critical. Many efficient software algorithms have been proposed for CRC generation. In this work we use an algorithm based on the properties of Galois fields, which gives very efficient hardware. The CRC generator has been implemented and simulated using both standard cells and a full-custom design technique. With standard cells from the UMC 0.18 micron library, a throughput of 8.7 Gb/s has been achieved. With the full-custom design in the AMS 0.35 micron process, we have achieved a throughput of 5.0 Gb/s. The conclusion, based on extrapolation of device characteristics, is that CRC-32 generation for 10 Gb/s can be designed with standard cells in a 0.15 micron process technology, or using full-custom design techniques in a 0.18 micron process technology.
Electronics, Circuits and Systems, 2001. ICECS 2001. The 8th IEEE International Conference on; 02/2001
-
ABSTRACT: A new interconnect-driven DFT implementation is proposed in this paper. The normal way to implement the DFT is to use the FFT algorithm, since it is computationally favorable. However, the increased speed comes at the cost of increased communication, which gives higher power consumption. If the DFT algorithm is implemented directly instead, each channel becomes independent of all other channels, and consequently communication and hence power consumption are reduced. Other benefits of using the DFT directly are the possibility to calculate a spectrum of any length, not only a power of two, and to have an irregular frequency step between channels. A number of ad hoc processing-element (PE) and system-level solutions are also proposed to reduce the power consumption even further.
Circuits and Systems, 2000. Proceedings. ISCAS 2000 Geneva. The 2000 IEEE International Symposium on; 02/2000