-
ABSTRACT: A 1.8-V 14-b 12-MS/s pseudo-differential pipeline analog-to-digital converter (ADC) using a passive capacitor error-averaging technique and a nested CMOS gain-boosting technique is described. The converter is optimized for low-voltage low-power applications by applying an optimum stage-scaling algorithm at the architectural level and an opamp and comparator sharing technique at the circuit level. Prototyped in a 0.18-μm 6M-1P CMOS process, this converter achieves a peak signal-to-noise plus distortion ratio (SNDR) of 75.5 dB and a 103-dB spurious-free dynamic range (SFDR) without trimming, calibration, or dithering. With a 1-MHz analog input, the maximum differential nonlinearity is 0.47 LSB and the maximum integral nonlinearity is 0.54 LSB. The large analog bandwidth of the front-end sample-and-hold circuit is achieved using bootstrapped thin-oxide transistors as switches, resulting in an SFDR of 97 dB when a 40-MHz full-scale input is digitized. The ADC occupies an active area of 10 mm² and dissipates 98 mW.
IEEE Journal of Solid-State Circuits 01/2005; · 3.23 Impact Factor
-
ABSTRACT: This work presents the design of enhancement-mode and accumulation-mode thin-body MOSFETs optimized in terms of energy vs. delay (E-D), and assesses the effectiveness of back-gate biasing to adjust the leakage current. It is shown that back-gated FETs (BG-FETs) can provide power savings over double-gate FETs. Since BG-FETs span a wide range in E-D space, they can provide a single-device solution for high-performance and low-power applications through adaptive supply-voltage and threshold-voltage biasing.
SOI Conference, 2004. Proceedings. 2004 IEEE International; 11/2004
-
ABSTRACT: This paper presents methods for efficient energy-performance optimization at the circuit and micro-architectural levels. The optimal balance between energy and performance is achieved when the sensitivity of energy to a change in performance is equal for all the design variables. The sensitivity-based optimizations minimize energy subject to a delay constraint. Energy savings of about 65% can be achieved without delay penalty with equalization of sensitivities to sizing, supply, and threshold voltage in a 64-bit adder, compared to the reference design sized for minimum delay. Circuit optimization is effective only in the region of about ±30% around the reference delay; outside of this region the optimization becomes too costly either in terms of energy or delay. Using optimal energy-delay tradeoffs from the circuit level and introducing more degrees of freedom, the optimization is hierarchically extended to higher abstraction layers. We focus on the micro-architectural optimization and demonstrate that the scope of energy-efficient optimization can be extended by the choice of circuit topology or the level of parallelism. In a 64-bit ALU example, parallelism of five provides a three-fold performance increase, while requiring the same energy as the reference design. Parallel or time-multiplexed solutions significantly affect the area of their respective designs, so the overall design cost is minimized when optimal energy-area tradeoff is achieved.
IEEE Journal of Solid-State Circuits 09/2004; · 3.23 Impact Factor
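The equal-sensitivity condition stated in the abstract above can be written compactly. The notation here is an assumption for illustration and is not taken from the abstract: E is energy, D is delay, and W, V_DD, and V_TH stand for the sizing, supply, and threshold-voltage variables. The sign convention is likewise assumed.

```latex
% Sensitivity of energy to delay with respect to one design variable x
% (assumed notation; evaluated at the current design point x_0)
S_x = \left. \frac{\partial E / \partial x}{\partial D / \partial x} \right|_{x = x_0}

% Condition for minimum energy under a fixed delay constraint:
% the sensitivities to all tuning variables are equal
S_{W} = S_{V_{DD}} = S_{V_{TH}}
```

If the sensitivities were unequal, a small amount of delay could be given up on the variable with the larger-magnitude sensitivity and recovered on another, lowering energy at the same total delay, so an unbalanced point cannot be optimal.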
-
ABSTRACT: We present several hardware architectures to implement low-density parity-check (LDPC) decoders for codes constructed with a hierarchical structure. The proposed hierarchical formulation of the LDPC code allows a structured hardware realization of the decoder. For a fully parallel implementation, routing congestion is reduced, which allows implementations for block sizes up to 1024 bits in 0.13-μm technology. Partially and fully serial implementations also benefit greatly from the structure of the code, leading to several flexible, efficient architectures. In a general-purpose 0.13-μm technology, the approximate area required by a 1024-bit fully parallel LDPC decoder is found to be 12.5 mm², while a serial decoder can be implemented in an area of 0.15 mm².
Communications, 2004 IEEE International Conference on; 07/2004
-
ABSTRACT: A phase-locked loop (PLL) architecture is presented that allows adaptive optimization of tracking jitter by using an on-chip jitter estimation block. The jitter estimation circuit operates at the PLL reference clock frequency and is composed of digital blocks, improving the robustness of the overall architecture. The jitter estimates may be used to adaptively tune the PLL loop parameters to achieve minimum jitter operation. System design considerations are discussed and simulation results are reported for a PLL in 0.13 μm CMOS technology.
Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on; 06/2004
-
ABSTRACT: A shared n-well layout technique is developed for the design of dual-supply-voltage logic blocks. It is demonstrated on a design of a 64-bit arithmetic logic unit (ALU) module in domino logic. The second supply voltage is used to lower the power of noncritical paths in the sparse, radix-4 64-bit carry-lookahead adder and in the loopback bus. A 3 mm² test chip in 0.18-μm 1.8-V five-metal with local interconnect CMOS technology that contains six ALUs and test circuitry operates at 1.16 GHz at the nominal supply. For a target delay increase of 2.8%, energy savings are 25.3% using dual supplies, while for an 8.3% increase in delay, 33.3% can be saved.
IEEE Journal of Solid-State Circuits 04/2004; · 3.23 Impact Factor
-
ABSTRACT: A 1.8 V, 14 b pipelined ADC using passive capacitor error-averaging and nested CMOS gain boosting achieves 99 dB SFDR for signal frequencies up to 5.1 MHz without trimming or calibration. With a 1 MHz analog input, DNL is 0.31 LSB, INL is 0.58 LSB, and SNDR is 73.6 dB. The chip occupies 15 mm² in 0.18 μm CMOS and dissipates 112 mW.
Solid-State Circuits Conference, 2004. Digest of Technical Papers. ISSCC. 2004 IEEE International; 03/2004
-
ABSTRACT: Dual-supply voltage design using a clustered voltage scaling (CVS) scheme is an effective approach to reduce chip power. The optimal CVS design relies on a level converter implemented in a flip-flop to minimize energy, delay, and area penalties due to level conversion. Additionally, circuit robustness against supply bounce is a key property that differentiates good level converter design. Novel flip-flops presented in this paper incorporate a half-latch level converter and a precharged level converter. These flip-flops are optimized in the energy-delay design space to achieve over 30% reduction of energy-delay product and about 10% savings of total power in a CVS design as compared to the conventional flip-flop. These benefits are accompanied by 24% flip-flop robustness improvement leading to 13% delay spread reduction in a CVS critical path. The proposed flip-flops also show 18% layout area reduction. Advantages of level conversion in a flip-flop over asynchronous level conversion in combinational logic are also discussed in terms of delay penalty and its sensitivity to supply bounce.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 03/2004; · 1.22 Impact Factor
-
ABSTRACT: The striking benefits of iterative detection have generated strong interest in the disk drive signal processing area, but thus far application of this technology has been rather limited. We review the benefits that the most interesting iterative detectors have over the industry-standard partial-response maximum-likelihood (PRML) detectors, and examine the hardware complexity issues that have so far stood in the way of practical implementations.
IEEE Transactions on Magnetics 02/2004; · 1.36 Impact Factor
-
ABSTRACT: We present an adaptive digital technique to calibrate pipelined analog-to-digital converters (ADCs). Rather than achieving linearity by adjustment of analog component values, the new approach infers component errors from conversion results and applies digital postprocessing to correct those results. The scheme proposed here draws close analogy to the channel equalization problem commonly encountered in digital communications. We show that, with the help of a slow but accurate ADC, the proposed code-domain adaptive finite-impulse-response filter is sufficient to remove the effect of component errors including capacitor mismatch, finite op-amp gain, op-amp offset, and sampling-switch-induced offset, provided they are not signal-dependent. The algorithm is all digital, fully adaptive, data-driven, and operates in the background. Strong tradeoffs between accuracy and speed of pipelined ADCs are greatly relaxed in this approach with the aid of digital correction techniques. Analog precision problems are translated into the complexity of digital signal-processing circuits, allowing this approach to benefit from CMOS device scaling in contrast to most conventional correction techniques.
Circuits and Systems I: Regular Papers, IEEE Transactions on 02/2004; · 1.97 Impact Factor
-
ABSTRACT: A circuit sizing tool that minimizes the delay under energy constraints has been developed using optimisation software, tabulated delay models and analytical energy models. The tool is used to generate energy-delay (E-D) tradeoff curves for selected high-performance 64-bit carry-lookahead adders. The optimisation indicates that the sparse radix-4 carry-lookahead adder with sparseness factor of 2 has optimal performance in the energy-delay space.
Solid-State Circuits Conference, 2003. ESSCIRC '03. Proceedings of the 29th European; 10/2003
-
ABSTRACT: Dual-supply voltage design using a clustered voltage scaling (CVS) scheme is an effective approach to reduce chip power. The optimal CVS design relies on a level converter (LC) implemented in a flip-flop to minimize energy, delay, and area penalties due to level conversion. Novel flip-flops presented in this paper incorporate a half-latch LC and a precharged LC. These flip-flops are optimized in the energy-delay design space to achieve over 30% reduction of energy-delay product and about 10% savings of total power in a CVS design as compared to the conventional flip-flop. These benefits are accompanied by 24% robustness improvement and 18% layout area reduction.
Low Power Electronics and Design, 2003. ISLPED '03. Proceedings of the 2003 International Symposium on; 09/2003
-
ABSTRACT: Two eight-state 7-bit soft-output Viterbi decoders matched to an EPR4 channel and a rate-8/9 convolutional code are implemented in a 0.18-μm CMOS technology. The throughput of the decoders is increased through architectural transformation of the add-compare-select recursion, with a small area overhead. The survivor-path decoding logic of a conventional Viterbi decoder register exchange is adapted to detect the two most likely paths. The 4-mm² chip has been verified to decode at 500 Mb/s with 1.8-V supply. These decoders can be used as constituent decoders for Turbo codes in high-performance applications requiring information rates that are very close to the Shannon limit.
IEEE Journal of Solid-State Circuits 08/2003; · 3.23 Impact Factor
-
ABSTRACT: This paper describes the early analysis and estimation features currently implemented in the Berkeley Emulation Engine (BEE) system. BEE is an integrated rapid prototyping and design environment for communication and digital signal processing (DSP) systems, consisting of four multi-FPGA based processing units, each capable of emulating 10 million ASIC (application-specific integrated circuit) equivalent gates at an overall system clock rate up to 60 MHz. This translates to over 600 billion 16-bit additions (operations) per second on one unit. An integrated software design flow enables the users to specify the design using a data-flow diagram, then automatically generates both the FPGA implementation for real-time rapid prototyping and a cycle-accurate, bit-true, and functionally equivalent ASIC implementation. For system-level design, the BEE hardware and software support rapid design turn-around and early performance analysis, without full synthesis or hardware mapping, from the high-level design entry. A case study detailing a turbo-decoder explains how the processing capability of the emulator can be utilized to verify a design using one billion input vectors with a speed-up factor exceeding 10⁶ over equivalent software simulation methods.
Rapid Systems Prototyping, 2003. Proceedings. 14th IEEE International Workshop on; 07/2003
-
ABSTRACT: This paper presents methods for efficient power minimization at circuit and micro-architectural levels. The potential energy savings are strongly related to the energy profile of a circuit. These savings are obtained by using gate sizing, supply voltage, and threshold voltage optimization, to minimize energy consumption subject to a delay constraint. The true power minimization is achieved when the energy reduction potentials of all tuning variables are balanced. We derive the sensitivity of energy to delay for each of the tuning variables, connecting its energy saving potential to the physical properties of the circuit. This helps to develop understanding of optimization performance and identify the most efficient techniques for energy reduction. The optimizations are applied to some examples that span typical circuit topologies including inverter chains, SRAM decoders, and adders. At a delay of 20% larger than the minimum, energy savings of 40% to 70% are possible, indicating that achieving peak performance is expensive in terms of energy. Energy savings of about 50% can be achieved without delay penalty with the balancing of sizes, supplies, and thresholds.
Computer Aided Design, 2002. ICCAD 2002. IEEE/ACM International Conference on; 12/2002
-
ABSTRACT: The architectural considerations for VLSI implementations of soft output Viterbi decoders are presented. Structural transformation of the add-compare-select structures provides high throughput with small area overhead. Modifications to the survivor memory unit and a comparison between the register exchange and memory traceback methods are highlighted. A 4 mm² demonstration chip, consisting of two parallel, 8-state, 7-bit soft output Viterbi decoders, has been implemented in 0.18 μm CMOS technology, and decodes at 500 Mb/s with 1.8 V supply. These decoders are used with turbo codes, which have been demonstrated to achieve information rates close to the Shannon limit.
Signal Processing Systems, 2002. (SIPS '02). IEEE Workshop on; 11/2002
-
ABSTRACT: Two 8-state, 7-bit soft output Viterbi decoders matched to an EPR4 channel and a rate-8/9 convolutional code are implemented in 0.18 µm CMOS technology. Architectural transformation of the add-compare-select structures and modification of the register exchange allow a high throughput with small area overhead. The 4 mm² chip has been verified to decode at 500 Mb/s with 1.8 V supply. These decoders are used with Turbo codes, which have been demonstrated to achieve information rates very close to the Shannon limit.
Solid-State Circuits Conference, 2002. ESSCIRC 2002. Proceedings of the 28th European; 10/2002
-
ABSTRACT: This paper relates the potential energy savings to the energy profile of a circuit. These savings are obtained by using gate sizing and supply voltage optimization to minimize energy consumption subject to a delay constraint. The sensitivity of energy to delay is derived from a linear delay model extended to multiple supplies. The optimizations are applied to a range of examples that span typical circuit topologies including inverter chains, SRAM decoders and adders. At a delay of 20% larger than the minimum, energy savings of 40% to 70% are possible, indicating that achieving peak performance is expensive in terms of energy.
Solid-State Circuits Conference, 2002. ESSCIRC 2002. Proceedings of the 28th European; 10/2002
-
ABSTRACT: Architectures for low-density parity-check (LDPC) decoders are discussed, with methods to reduce their complexity. Serial implementations similar to traditional microprocessor datapaths are compared against implementations with multiple processing elements that exploit the inherent parallelism in the decoding algorithm. Several classes of LDPC codes, such as those based on irregular random graphs and geometric properties of finite fields are evaluated in terms of their suitability for VLSI implementation and performance as measured by bit-error rate. Efficient realizations of low-density parity check decoders under area, power, and throughput constraints are of particular interest in the design of communications receivers.
Circuits and Systems, 2002. MWSCAS-2002. The 2002 45th Midwest Symposium on; 09/2002
-
ABSTRACT: A hierarchical automated design flow for low-energy direct-mapped signal processing integrated circuits is presented. A modular framework based on a combined dataflow graph and floorplan description drives automatic layout generation with commercial CAD tools. Automatic characterization of layout improves system-level estimates. Simplified physical design methodologies for low supply voltages are discussed. The flow is demonstrated on a 300-k transistor test-chip, a time-division multiple-access baseband receiver, and a soft-output Viterbi decoder. An example of architectural comparison of energy efficiency is presented.
IEEE Journal of Solid-State Circuits 04/2002; · 3.23 Impact Factor