Transmission Gates Combined With Level-Restoring CMOS Gates Reduce Glitches in Low-Power Low-Frequency Multipliers

ETH Zurich, Zurich
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Impact Factor: 1.14). 08/2008; DOI: 10.1109/TVLSI.2008.2000457
Source: IEEE Xplore

ABSTRACT Various 16-bit multiplier architectures are compared in terms of dissipated energy, propagation delay, energy-delay product (EDP), and area occupation, in view of low-power low-voltage signal processing for low-frequency applications. A novel practical approach has been set up to investigate and graphically represent the mechanisms of glitch generation and propagation. It is found that spurious activity is a major cause of energy dissipation in multipliers. Measurements point out that, because of its shorter full-adder chains, the Wallace multiplier dissipates less energy than other traditional array multipliers (8.2 µW/MHz versus 9.6 µW/MHz for 0.18 µm CMOS technology at 0.75 V). The benefits of transistor sizing are also evaluated (Wallace including minimum-size transistors dissipates 6.2 µW/MHz). By combining transmission gates with static CMOS in a Wallace architecture, a new approach is proposed to improve the energy-efficiency further (4.7 µW/MHz), beyond recently published low-power architectures. The innovation consists in suppressing glitches via resistance-capacitance low-pass filtering, while preserving unaltered driving capabilities. The reduced number of Vdd-to-ground paths also contributes to a significant decrease of static consumption.
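
The glitch-suppression idea mentioned in the abstract can be illustrated with an idealized first-order RC stage. The sketch below is not the paper's circuit model: it assumes a rectangular glitch pulse, a single lumped resistance and capacitance, and a level-restoring gate that switches at half the supply voltage. The function names and all component values are hypothetical, chosen only for illustration.

```python
import math


def pulse_peak(v_dd, width_s, r_ohm, c_farad):
    """Peak output voltage of an ideal first-order RC low-pass stage
    driven by a rectangular 0 -> v_dd pulse lasting width_s seconds."""
    tau = r_ohm * c_farad  # RC time constant in seconds
    return v_dd * (1.0 - math.exp(-width_s / tau))


def is_suppressed(v_dd, width_s, r_ohm, c_farad, threshold_fraction=0.5):
    """True if the filtered pulse stays below the assumed switching
    threshold of the following level-restoring gate."""
    peak = pulse_peak(v_dd, width_s, r_ohm, c_farad)
    return peak < threshold_fraction * v_dd


if __name__ == "__main__":
    V_DD = 0.75   # supply voltage quoted in the abstract (volts)
    R = 100e3     # hypothetical transmission-gate on-resistance (ohms)
    C = 5e-15     # hypothetical node capacitance (farads)
    for width in (0.1e-9, 5e-9):  # a short glitch and a long valid pulse
        print(width, pulse_peak(V_DD, width, R, C),
              is_suppressed(V_DD, width, R, C))
```

Under these assumptions, a pulse shorter than RC·ln 2 never reaches the half-supply threshold, so the restoring gate does not propagate it, while longer, valid transitions pass through.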

  • ABSTRACT: In this paper, we present a multiprecision (MP) reconfigurable multiplier that incorporates variable precision, parallel processing (PP), razor-based dynamic voltage scaling (DVS), and dedicated MP operands scheduling to provide optimum performance for a variety of operating conditions. All of the building blocks of the proposed reconfigurable multiplier can either work as independent smaller-precision multipliers or work in parallel to perform higher-precision multiplications. Given the user's requirements (e.g., throughput), a dynamic voltage/frequency scaling management unit configures the multiplier to operate at the proper precision and frequency. Adapting to the run-time workload of the targeted application, razor flip-flops together with a dithering voltage unit then configure the multiplier to achieve the lowest power consumption. The single-switch dithering voltage unit and razor flip-flops help to reduce the voltage safety margins and overhead typically associated with DVS to the lowest level. The large silicon area and power overhead typically associated with reconfigurability features are removed. Finally, the proposed novel MP multiplier can further benefit from an operands scheduler that rearranges the input data so as to determine the optimum voltage and frequency operating conditions for minimum power consumption. This low-power MP multiplier is fabricated in AMIS 0.35-µm technology. Experimental results show that the proposed MP design features a 28.2% and 15.8% reduction in circuit area and power consumption, respectively, compared with a conventional fixed-width multiplier. When combining this MP design with error-tolerant razor-based DVS, PP, and the proposed novel operands scheduler, 77.7%–86.3% total power reduction is achieved with a total silicon area overhead as low as 11.1%. This paper successfully demonstrates that an MP architecture can allow more aggressive frequency/supply voltage scaling for improved power efficiency.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 04/2014; 22(4):759-770. DOI:10.1109/TVLSI.2013.2252032 · 1.14 Impact Factor
  • ABSTRACT: This paper presents a new low-power, high-speed, unified and scalable word-based radix-8 architecture for Montgomery modular multiplication in GF(P) and GF(2^n). This architecture has some similarities to the architecture of Huang, but it achieves a greater reduction in area and power consumption. To speed up the modular multiplication process, the hardware architecture employs carry-save addition to avoid carry propagation at each addition operation of the add-shift loop. To reduce power consumption, latches called glitch blockers are placed at the outputs of some circuit modules to reduce the spurious transitions and the expected switching activity of high fan-out signals in the architecture. We also propose a modified low-power dual-field 4-to-2 carry-save adder whose internal logic structure reduces the chance of glitch occurrence. An ASIC implementation of the proposed architecture shows that it can perform 1,024-bit modular multiplication (for word size w = 32) in about 5.45 µs. The results also show that it has smaller Area × Time values than all unified and scalable designs, by ratios ranging from 12.2% to 66.8%, which makes it suitable for implementations where both area and performance are of concern. It also has higher throughput than these designs, by ratios ranging from 6.0% to 80.7%, and lower power consumption, by ratios ranging from 18.8% to 52.6%. Compared with the designs that are not unified, it has slightly higher Area × Time and lower throughput values than some of them. However, it achieves significantly lower power consumption than all of them.
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 11/2014; 39(11):7847-7863. DOI:10.1007/s13369-014-1363-5 · 0.37 Impact Factor
  • ABSTRACT: Due to the relatively constant and low-resistance path between input and output, transmission gate (TG) logic offers less delay than other logic styles, without threshold drop and with a low transistor count. Apart from transition time, load impedances, and initial conditions on internal node capacitances, the critical delay of TG logic depends on the chain length (n) of the circuit, with a quadratic dependency. In the current analysis methodology, this necessitates buffer insertion at depth 3 or 4 in a chain of transmission gates. In this paper, the dependency on two more factors, fan-out and input pattern, is discussed. We show that the delay is dynamic with respect to input pattern and exponential with respect to fan-out. As a consequence, buffers must be inserted at the proper depth for each fan-out configuration. A restoring mode transmission gate (RMTG) XOR gate is proposed which shows little dependency on fan-out and input patterns, thereby eliminating the complexity of buffer insertion. SPICE simulation in 180 nm UMC technology shows that our proposed RMTG XOR is 13.21% and 31.34% faster, and 51.63% and 1.72% more power efficient, than the conventional CMOS XOR and TG XOR, respectively, for a load capacitance of 10 fF. Our proposed design also uses less hardware than the conventional CMOS XOR.
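
The quadratic dependence of TG chain delay on chain length mentioned in the last abstract can be sketched with the standard Elmore approximation for a uniform RC ladder. This is a textbook simplification, not the analysis used in that paper: it assumes n identical stages, each with lumped resistance r and node capacitance c, and an ideal buffer with a fixed delay t_buf that fully isolates the segments. The function names and values are hypothetical.

```python
def elmore_chain_delay(n, r, c):
    """Elmore time constant of n identical RC stages in series.
    Node i sees upstream resistance i*r, so the sum is r*c*n*(n+1)/2."""
    return r * c * n * (n + 1) / 2


def buffered_chain_delay(n, k, r, c, t_buf):
    """Same chain split into segments of at most k stages, with an
    assumed ideal buffer (fixed delay t_buf) between segments."""
    full, rest = divmod(n, k)
    delay = full * elmore_chain_delay(k, r, c) + elmore_chain_delay(rest, r, c)
    segments = full + (1 if rest else 0)
    n_buffers = max(segments - 1, 0)  # one buffer between adjacent segments
    return delay + n_buffers * t_buf


if __name__ == "__main__":
    R, C, T_BUF = 1.0, 1.0, 2.0  # normalized, hypothetical units
    for n in (2, 4, 8, 16):
        print(n, elmore_chain_delay(n, R, C),
              buffered_chain_delay(n, 4, R, C, T_BUF))
```

In this simplified model the unbuffered delay grows roughly with n², while splitting the chain every few stages makes the total grow linearly with the number of segments, which is the usual motivation for buffer insertion at a small depth.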