H. Eriksson

Chalmers University of Technology, Göteborg, Västra Götaland, Sweden

Publications (11) · 1.22 Total impact

  • Article: Toward architecture-based test-vector generation for timing verification of fast parallel multipliers
    ABSTRACT: Fast parallel multipliers that contain logarithmic partial-product reduction trees pose a challenge to simulation-based high-accuracy timing verification, since the reduction tree has many reconvergent signal branches. However, such a multiplier architecture also offers a clue as to how to attack the test-vector generation problem. The timing-critical paths are intimately associated with long carry propagation. We introduce a multiplier test-vector generation method that has the ability to exercise such long carry propagation paths. Through extensive circuit simulation and static timing analysis, we evaluate the quality of the test vectors that result from the new method. Especially for fast multipliers with a pronounced carry propagation, the timing-critical vectors manage to stimulate a path whose delay comes close to the true worst-case delay. We investigate the complexity and run time of the test-vector generation, and derive timing-critical vectors up to a factor word length of 54 bits.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 05/2006 · 1.22 Impact Factor
  • Conference Proceeding: An efficient twin-precision multiplier
    ABSTRACT: We present a twin-precision multiplier that in normal operation mode efficiently performs N-b multiplications. For applications where the demand on precision is relaxed, the multiplier can perform N/2-b multiplications while expending only a fraction of the energy of a conventional N-b multiplier. For applications with high demands on throughput, the multiplier is capable of performing two independent N/2-b multiplications in parallel. A comparison between two signed 16-b multipliers, where both perform single 8-b multiplications, shows that the twin-precision multiplier has 72% lower power dissipation and 15% higher speed than the conventional one, while only requiring 8% more transistors.
    Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings. IEEE International Conference on; 11/2004
  • Conference Proceeding: A power cut-off technique for gate leakage suppression [CMOS logic circuits]
    ABSTRACT: Gate leakage power dissipation is predicted to overtake subthreshold leakage power within the next few years, thus adding further problems for designers trying to meet a strict power budget. In this paper, a power cut-off technique is proposed, which in sleep mode suppresses not only subthreshold leakage but also gate leakage. The proposed technique displays a combination of low total leakage power and short wake-up time.
    Solid-State Circuits Conference, 2004. ESSCIRC 2004. Proceeding of the 30th European; 10/2004
  • Conference Proceeding: Glitch-conscious low-power design of arithmetic circuits
    H. Eriksson, P. Larsson-Edefors
    ABSTRACT: Glitches are common in arithmetic circuits, especially in large multipliers where they often represent the major part of transitions. With the aim of providing a judicious glitch-reduction strategy, we extract and study the relation between generated and propagated glitches for three different arithmetic blocks. We show that the number of propagated glitches is far larger than the number generated, regardless of circuit type, supply voltage, and threshold voltage. In contrast to existing glitch-reduction strategies, we propose to focus also on the glitch propagation mechanism. It is shown how the inverting property of adder cells can be harnessed to reduce propagation of glitches and thus the overall power dissipation.
    Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on; 06/2004
  • Conference Proceeding: Dynamic pass-transistor dot operators for efficient parallel-prefix adders
    H. Eriksson, P. Larsson-Edefors
    ABSTRACT: We employ a dynamic pass-transistor technique to drastically reduce the area requirement and power dissipation of the dot-operator cell in parallel-prefix adders. The technique is demonstrated in both 0.35 μm and 0.13 μm process technologies on a 64-bit Kogge-Stone carry tree. In a comparison with a corresponding domino implementation, it is shown that the transistor count and the power dissipation can be reduced by as much as 25% and 50%, respectively. On top of the area and power reduction, the delay can also be significantly reduced by using NMOS precharge transistors, but this requires a clock signal with a higher voltage.
    Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on; 06/2004
  • Conference Proceeding: Dual threshold voltage circuits in the presence of resistive interconnects
    ABSTRACT: We consider the power-optimal design of dual-V_T CMOS circuits under challenging delay constraints, with threshold voltages and device sizes as design variables. We show that the presence of interconnect resistance affects the optimum choices of V_T and device sizes, and that ignoring the resistance can lead to highly suboptimal results. We also present criteria for deciding when interconnect resistance should be taken into account.
    VLSI, 2003. Proceedings. IEEE Computer Society Annual Symposium on; 03/2003
  • Conference Proceeding: Full-custom vs. standard-cell design flow - an adder case study
    ABSTRACT: Full-custom design is considered superior to standard-cell design when a high-performance circuit is requested. The structured routing of critical wires is considered to be the most important contributor to this performance gap. However, this is only true for bitsliced designs, such as ripple-carry adders, but not for designs with inter-bitslice interconnections spanning several bitslices, such as tree adders and reduction-tree multipliers. It is found that standard-cell design techniques scale better with the data width than full-custom bitsliced layouts for designs dominated by inter-bitslice interconnections.
    Design Automation Conference, 2003. Proceedings of the ASP-DAC 2003. Asia and South Pacific; 02/2003
  • Conference Proceeding: A regular parallel multiplier which utilizes multiple carry-propagate adders
    ABSTRACT: A new regular partial-product reduction tree for parallel multipliers is presented in this paper. The reduction tree has a simple and efficient interconnect configuration and a minimal hardware usage. The reduction tree has a gate structure that allows for extensive use of carry-propagation adders. Since carry-propagation adders can be very efficiently implemented, significant delay reduction is expected for large multipliers.
    Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Symposium on; 06/2001
  • Conference Proceeding: A 2.8 ns 30 μW/MHz area-efficient 32-b Manchester carry-bypass adder
    ABSTRACT: A fast and area-efficient 32-b Manchester carry-bypass adder with low energy-delay product is presented in this paper. The high speed is achieved by the use of optimized bypass circuitry and fast repeater elements in the carry path. The fabricated adder has a measured worst-case delay of 2.8 ns and consumes 30 μW/MHz.
    Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Symposium on; 06/2001
  • Conference Proceeding: VLSI implementation of CRC-32 for 10 Gigabit Ethernet
    ABSTRACT: For 10 Gigabit Ethernet, CRC-32 generation is essential and timing critical. Many efficient software algorithms have been proposed for CRC generation. In this work we use an algorithm based on the properties of Galois fields, which gives very efficient hardware. The CRC generator has been implemented and simulated in both standard cells and a full-custom design technique. In standard cells from the UMC 0.18 micron library, a throughput of 8.7 Gb/s has been achieved. In the full-custom design for the AMS 0.35 micron process, we have achieved a throughput of 5.0 Gb/s. The conclusion, based on extrapolation of device characteristics, is that CRC-32 generation for 10 Gb/s can be designed with standard cells in a 0.15 micron process technology, or using full-custom design techniques in a 0.18 micron process technology.
    Electronics, Circuits and Systems, 2001. ICECS 2001. The 8th IEEE International Conference on; 02/2001
  • Conference Proceeding: An interconnect-driven design of a DFT processor
    ABSTRACT: A new interconnect-driven DFT implementation is proposed in this paper. The normal way to implement the DFT is to use the FFT algorithm, since it is computationally favorable. However, the increased speed comes at the cost of increased communication, which gives a higher power consumption. If the DFT algorithm is directly implemented instead, each channel becomes independent of all other channels, and consequently communication and hence power consumption are reduced. Other benefits of using the DFT directly are the possibility to calculate a spectrum of any length, not only a power of two, and to have an irregular frequency step between channels. A number of ad hoc processing-element (PE) and system-level solutions are also proposed to reduce the power consumption even further.
    Circuits and Systems, 2000. Proceedings. ISCAS 2000 Geneva. The 2000 IEEE International Symposium on; 02/2000
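
The direct-DFT idea in the last abstract above can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: the function names `dft_channel` and `direct_dft`, and the convention of expressing frequencies in cycles per sample, are assumptions made for illustration. It shows that each output channel depends only on the input samples and its own frequency, so the input length need not be a power of two and the channel frequencies need not be evenly spaced.

```python
import cmath
import math


def dft_channel(samples, freq):
    """Compute one DFT channel directly.

    `freq` is a normalized frequency in cycles per sample (an assumed
    convention for this sketch). The channel uses only the input samples,
    never the result of another channel.
    """
    return sum(
        x * cmath.exp(-2j * math.pi * freq * n)
        for n, x in enumerate(samples)
    )


def direct_dft(samples, freqs):
    """Evaluate independent channels at arbitrary, possibly irregular, frequencies."""
    return [dft_channel(samples, f) for f in freqs]


if __name__ == "__main__":
    # Six samples (not a power of two) of a cosine at 1/6 cycles per sample.
    samples = [math.cos(2 * math.pi * n / 6) for n in range(6)]
    # Irregularly spaced channel frequencies.
    for f, value in zip([0.0, 1 / 6, 0.4], direct_dft(samples, [0.0, 1 / 6, 0.4])):
        print(f"{f:.4f}: magnitude {abs(value):.4f}")
```

Each channel here costs one pass over the samples, so K channels over N samples take on the order of K·N multiply-accumulate operations. That is more arithmetic than an FFT, which is the trade-off the abstract describes: extra computation in exchange for channels that share no intermediate results.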

Institutions

  • 2004–2006
    • Chalmers University of Technology
      • Department of Computer Science and Engineering
      Göteborg, Västra Götaland, Sweden
  • 2001
    • Linköping University
      • Department of Electrical Engineering
      Linköping, Östergötland, Sweden