IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 30, NO. 3, MARCH 1995
A 4.4 ns CMOS 54 × 54-b Multiplier
Using Pass-Transistor Multiplexer
Norio Ohkubo, Makoto Suzuki, Member, IEEE, Toshinobu Shinbo, Toshiaki Yamanaka, Member, IEEE,
Akihiro Shimizu, Katsuro Sasaki, Member, IEEE, and Yoshinobu Nakagome, Member, IEEE
Abstract—A 54 × 54-b multiplier using pass-transistor multiplexers has been fabricated in 0.25-μm CMOS technology. To enhance the speed performance, a new 4-2 compressor and a carry lookahead adder (CLA), both featuring pass-transistor multiplexers, have been developed. The new circuits have a speed advantage over conventional CMOS circuits because the high logic functionality of pass-transistor multiplexers minimizes the number of critical-path gate stages. The active size of the 54 × 54-b multiplier is 3.77 × 3.41 mm. The multiplication time is 4.4 ns at a 2.5-V power supply.
I. INTRODUCTION

Enhancing the performance of floating-point operation is indispensable for current high-performance microprocessors. In particular, high-speed multiplication is becoming increasingly important in RISC’s, DSP’s, graphics accelerators, and so on, because of increasing demand for multimedia applications. Recent high-end microprocessors call for an operating frequency of 200 MHz or over. Furthermore, the multiplier will be required to operate in a single clock cycle, which at 200 MHz is 5 ns. However, no CMOS 54 × 54-b multiplier with a delay time of less than 5 ns has yet been reported.

This paper describes a 54 × 54-b multiplier macro developed for mantissa multiplication of two double-precision numbers, as outlined in the IEEE standard. The target multiplication time is less than 5 ns. To reduce the multiplication time, a new 4-2 compressor and a carry lookahead adder (CLA), both featuring pass-transistor multiplexers, have been developed. The new circuits provide a speed advantage over conventional CMOS circuits because the high logic functionality of pass-transistor multiplexers minimizes the number of critical-path gate stages. In addition, power reduction is important in attaining such high performance. For this purpose, we employed 0.25-μm CMOS technology and reduced the supply voltage to 2.5 V.

The architecture of the 54 × 54-b multiplier is described in Section II. In Section III, the circuit design based on Booth’s algorithm, as well as the design of the pass-transistor multiplexer, 4-2 compressor, and carry lookahead adder, is discussed. Section IV describes the fabrication of a test chip. Some experimental results are shown in Section V, and the conclusions are summarized in Section VI.

Manuscript received August 1, 1994; revised October 21, 1994.
N. Ohkubo, M. Suzuki, T. Yamanaka, and Y. Nakagome are with the Central Research Laboratory, Hitachi, Ltd., Kokubunji, Tokyo 185, Japan.
T. Shinbo and A. Shimizu are with the Hitachi VLSI Engineering Corporation, Kodaira, Tokyo 187, Japan.
K. Sasaki is with the R&D Division, Hitachi America, Ltd., Brisbane, CA.
IEEE Log Number 9408745.

Fig. 1. Block diagram of the 54 × 54-b multiplier using pass-transistor multiplexers.
II. ARCHITECTURE

The block diagram of the 54 × 54-b multiplier is shown in Fig. 1. It employs Booth’s algorithm, Wallace’s tree, and a conditional carry-selection (CCS) adder. Booth’s algorithm halves the number of partial products. The partial products are summed by Wallace’s tree without carry propagation, and the summed results are then added by the CCS adder with high-speed carry propagation.

Reducing the delay of Wallace’s tree is important in reducing the multiplication time, so we used a 4-2 compressor, which has five inputs and three outputs. The carry-out (Co) is connected to the carry-in (Ci) of the 4-2 compressor for the next higher bit, as shown in Fig. 1. Because the carry-out (Co) does not depend on the carry-in (Ci), the 4-2 compressor can add four partial products (I1–I4) without propagating the carry to a higher bit. With this 4-2 compressor, only four addition stages are needed for Wallace’s tree, as shown in Fig. 1. It is known that the 4-2 compressor has a speed advantage over full-adder-based designs because of the reduced number of addition stages. For further improvement, we have developed a new 4-2 compressor that reduces the number of critical-path gate stages by exploiting the high logic functionality of the pass-transistor multiplexer.
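To make the carry independence concrete, here is a minimal behavioral sketch in Python. It assumes the conventional construction of a 4-2 compressor from two cascaded full adders (the full-adder-based form, not the new pass-transistor circuit), and the output names `s`, `c`, and `co` are illustrative. An exhaustive check confirms that the inputs are preserved in weighted form, I1 + I2 + I3 + I4 + Ci = S + 2(C + Co), and that Co never changes when only Ci changes.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) for three 1-bit inputs."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(i1: int, i2: int, i3: int, i4: int, ci: int) -> tuple[int, int, int]:
    """Behavioral 4-2 compressor built from two cascaded full adders.

    Returns (s, c, co). co comes from the first full adder, which never
    sees ci, so no carry ripples along a row of compressors.
    """
    t, co = full_adder(i1, i2, i3)  # first full adder: I1, I2, I3
    s, c = full_adder(t, i4, ci)    # second full adder: partial sum, I4, Ci
    return s, c, co


# Exhaustive check over all 32 input combinations.
for i1, i2, i3, i4, ci in product((0, 1), repeat=5):
    s, c, co = compressor_4_2(i1, i2, i3, i4, ci)
    # Weighted sum is preserved: c and co both have twice the weight of s.
    assert i1 + i2 + i3 + i4 + ci == s + 2 * (c + co)
    # Toggling Ci alone never changes Co.
    assert co == compressor_4_2(i1, i2, i3, i4, 1 - ci)[2]
print("4-2 compressor model OK")
```

Because `co` is computed before `ci` is ever used, a row of such compressors has no carry chain; Co simply moves one column to the left.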
0018-9200/95$04.00 © 1995 IEEE
Fig. 2. Booth’s algorithm. (a) Booth’s encoder. (b) Partial-product generator.
Furthermore, we have developed a high-speed 108-b CLA, which is another important component of the high-speed multiplier. We have already reported a 32-b ALU with a 4-b lookahead carry scheme called conditional carry selection. To apply this scheme to the final adder of the multiplier, the 4-b CLA was modified into an 8-b CLA.
III. CIRCUIT DESIGN
A. Booth’s Algorithm
The purpose of using Booth’s algorithm in this design is not to reduce the delay time but to reduce the chip area: halving the number of partial products shrinks Wallace’s tree. If a full-adder-based design is used in Wallace’s tree, Booth’s algorithm shortens the delay time because it reduces the number of addition stages. However, because one addition stage of 4-2 compressors halves the number of partial products, the extra delay time without Booth’s algorithm is only that of one 4-2 compressor. Since generating the partial products without Booth’s encoder takes less time than generating them with it, the multiplication time is almost the same either with or without Booth’s algorithm.
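The stage-count argument can be checked with a rough Python model. It assumes, as an idealization, that each level of 4-2 compressors halves the number of rows (rounding up, with leftover rows absorbed by full or half adders); the function name is hypothetical.

```python
import math


def compressor_levels(rows: int) -> int:
    """Idealized count of 4-2 compressor levels needed to reduce rows to two.

    Each level maps every four rows to two, so the row count is roughly
    halved (rounded up); leftover rows are assumed to be absorbed by
    full or half adders within the same level.
    """
    levels = 0
    while rows > 2:
        rows = math.ceil(rows / 2)
        levels += 1
    return levels


print(compressor_levels(54))  # 5 levels: 54 partial products, no Booth encoding
print(compressor_levels(27))  # 4 levels: 27 partial products after Booth encoding
```

With 28 rows (for example, if an extra row is needed for sign or zero extension) the count is still four, so the difference remains a single 4-2 compressor level.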
Booth’s encoder is shown in Fig. 2(a). Three adjacent multiplicand bits are encoded by this circuit, and this encoding halves the number of partial products. The simulated propagation delay time is 0.50 ns. The partial-product generator is shown in Fig. 2(b). One of two adjacent multiplier bits is selected depending on the encoded data, and the selected bit is inverted by the encoded signal NEG. The simulated propagation delay time is 0.56 ns.
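The Python sketch below models radix-4 (modified) Booth recoding at the behavioral level, which is what halving the number of partial products implies. Names are hypothetical: `y` is the operand fed to the encoder and `x` is the operand whose bits the partial-product generator selects. Each digit in {-2, -1, 0, 1, 2} chooses 0, x, or 2x (in hardware, 2x means selecting the neighboring bit of x), and a negative digit plays the role of NEG; the hardware’s bit inversion plus a +1 correction at the LSB is modeled here simply as negation.

```python
import random


def booth_radix4_digits(y: int, nbits: int) -> list[int]:
    """Radix-4 (modified) Booth recoding of an unsigned nbits-bit operand y.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1] (with y[-1] = 0), so it lies in
    {-2, -1, 0, 1, 2}. Scanning past bit nbits-1 zero-extends y, which keeps
    an unsigned operand from being read as negative.
    """
    digits = []
    prev = 0  # y[2i-1]; the bit below bit 0 is taken as 0
    for i in range(0, nbits + 2, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, nbits: int) -> int:
    """Sum of the Booth partial products; should equal x * y."""
    total = 0
    for i, d in enumerate(booth_radix4_digits(y, nbits)):
        pp = x * abs(d)  # 0, x, or 2x (2x = x shifted left by one bit)
        if d < 0:
            pp = -pp     # NEG: hardware inverts the bits and adds 1 at the LSB
        total += pp << (2 * i)  # digit i has weight 4**i
    return total


random.seed(1)
for _ in range(1000):
    x, y = random.getrandbits(54), random.getrandbits(54)
    assert booth_multiply(x, y, 54) == x * y
print(len(booth_radix4_digits(2**54 - 1, 54)))  # 28 partial products
```

For an unsigned 54-b operand the sketch yields 28 digits, 27 plus one that comes from zero extension.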
B. Pass-Transistor Multiplexer
The pass-transistor multiplexer used in the 4-2 compressor and the 108-b CLA is shown in Fig. 3. When the control signal is low, one data input is selected, and when it is high, the other data input is selected. The output is used as the control signal input for the next-stage multiplexer, which needs both signal polarities. Thus, the multiplexer provides both positive and negative outputs, which reduces the propagation delay by eliminating an inverter.

Fig. 3. Pass-transistor multiplexer circuit.

Several pass-transistor logic circuits have been proposed to improve the performance of CMOS circuits. NMOS pass-transistor logic is one example; it has been shown to achieve high speed because of its low input capacitance and high logic functionality. However, particularly in reduced-supply-voltage designs, noise margins and speed degradation must be taken into account. These problems are caused by mismatches between the input signal levels and the logic threshold voltage of the CMOS gates, which fluctuates with process variations. To avoid these problems, the multiplexer in this design consists of both NMOS and PMOS pass transistors.

The delay time of the pass-transistor multiplexer is shorter than that of a CMOS gate, because in the pass-transistor-based design both the NMOS and PMOS transistors are turned on. The CMOS tristate inverter and the pass-transistor tristate inverter, which are used in CMOS and pass-transistor multiplexers, respectively, are shown in Fig. 4(a) and (b). Both circuits have the same number of transistors and equal input capacitance. Fig. 4(c) shows a simulated comparison of the delay time from “in” to “out” as a function of the output load capacitance. The low driving-source impedance attained by using the pass transistor makes the delay time of the pass-transistor tristate inverter shorter than that of the CMOS tristate inverter.

Fig. 4. Comparison of CMOS and pass-transistor logic circuits. (a) CMOS tristate inverter. (b) Pass-transistor tristate inverter. (c) Comparison of delay.

Fig. 5. 4-2 compressor circuits using pass-transistor multiplexers. (a) Full-adder-based construction. (b) Proposed construction.
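As a behavioral illustration of the multiplexer’s logic functionality, the Python sketch below models a 2:1 multiplexer that returns both output polarities. This is an assumption mirroring the positive/negative outputs described above, not a transistor-level model, and the function names are hypothetical. Feeding a signal and its inverse to the data inputs yields XOR and XNOR from a single multiplexer, and using one multiplexer’s output as the next one’s control signal gives a three-input XOR in two stages.

```python
from itertools import product


def mux(s: int, d0: int, d1: int) -> tuple[int, int]:
    """2:1 multiplexer returning (out, out_bar): d0 if s is low, d1 if s is high."""
    out = d1 if s else d0
    return out, 1 - out


def xor_xnor(a: int, b: int) -> tuple[int, int]:
    """One multiplexer whose data inputs are b and its inverse, controlled by a."""
    return mux(a, b, 1 - b)


def xor3(a: int, b: int, c: int) -> int:
    """Two cascaded multiplexers: the first output drives the second control input."""
    p, _ = xor_xnor(a, b)
    return mux(p, c, 1 - c)[0]


for a, b, c in product((0, 1), repeat=3):
    x, xn = xor_xnor(a, b)
    assert x == a ^ b and xn == 1 - (a ^ b)
    assert xor3(a, b, c) == a ^ b ^ c
```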
C. 4-2 Compressor Circuit
The 4-2 compressor circuits using pass-transistor multiplexers are shown in Fig. 5. Each signal line in the figure represents a pair of positive and negative signals. The inputs of each multiplexer are either two different signals or one signal and its logical inverse. The 4-2 compressor circuits add four partial products (I1–I4) and generate a sum signal and two carry signals. All outputs (the sum and both carries) have buffers to enhance driving ability.

Since the pass-transistor multiplexer circuit shown in Fig. 3 has high logic functionality, a full-adder circuit is constructed from three pass-transistor multiplexers. The 4-2 compressor is constructed from two full adders, giving four critical-path gate stages, as shown in Fig. 5(a). This circuit is faster than conventional CMOS circuits due to the use of pass-transistor multiplexers.

For further speed improvement, we developed a new 4-2 compressor. Although the number of multiplexers is the same, the number of critical-path gate stages in this circuit is reduced to three by exploiting parallelism, as shown in Fig. 5(b). In this new configuration, the carry-out (Co) still does not depend on the carry-in (Ci), so the advantage of the 4-2 compressor is maintained. Fig. 6 shows a simulated delay comparison of these 4-2 compressor circuits: the proposed circuit reduces the propagation delay time by 18% compared with the full-adder-based circuit.

Fig. 6. Simulated comparison of 4-2 compressor circuits.
Fig. 7. Construction of Wallace’s tree.
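One way to build a full adder from three 2:1 multiplexers is sketched below in Python. The arrangement is an assumption for illustration and may differ from the netlist of Fig. 5: mux 1 forms p = a XOR b from b and its inverse, mux 2 forms the sum p XOR c, and mux 3 uses p to choose the carry (c when a and b differ, otherwise a). The sum path passes through two multiplexers, so cascading two such adders gives four stages, consistent with the count quoted for Fig. 5(a).

```python
from itertools import product


def mux(s: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: d0 when s == 0, d1 when s == 1."""
    return d1 if s else d0


def mux_full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Full adder from three multiplexers; returns (sum, carry)."""
    p = mux(a, b, 1 - b)   # mux 1: p = a XOR b (data: b and its inverse)
    s = mux(p, c, 1 - c)   # mux 2: sum = p XOR c
    co = mux(p, a, c)      # mux 3: if a != b the carry is c, else it is a (== b)
    return s, co


for a, b, c in product((0, 1), repeat=3):
    s, co = mux_full_adder(a, b, c)
    assert a + b + c == s + 2 * co
```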
Wallace’s tree, shown in Fig. 7, is constructed from partial-product generators, 4-2 compressors, full adders, and half adders. Using the 4-2 compressor simplifies the construction of Wallace’s tree, which consists of only five kinds of blocks and four kinds of wire shifters. The five kinds of blocks are an 8 × 4 partial-product generator, eight 4-2 compressors, six half adders, 15 half adders, and 21 half adders with a full adder. The four kinds of wire shifters are an 8-b right shifter, an 8-b left shifter, a 16-b right shifter, and a 16-b left shifter. The sum and carry signals of the 4-2 compressors are shifted by the wire shifters; in addition, the carry signals are shifted left by one more bit, because a carry has twice the weight of a sum bit in the same column.
D. Conditional Carry-Selection Circuit
A number of fast-adder architectures have been proposed, some of which use pass-transistor logic circuits for carry propagation. These circuits gain their speed advantage over CMOS circuits from their high logic functionality. However, a problem with this architecture is the serial connection of the pass transistors in the carry propagation path.

We have already reported a new lookahead carry scheme called conditional carry selection (CCS). The 4-b CLA is constructed so as to have three critical-path gate stages, as shown in Fig. 8(a). In the CCS architecture, the conditional carry signals for each bit, $C^0_i$ (assuming an incoming group carry of “0”) and $C^1_i$ (assuming an incoming group carry of “1”), are selected by multiplexers controlled by the conditional carry signals of the previous bit, $C^0_{i-1}$ or $C^1_{i-1}$, as expressed by

$$C^0_i = \overline{C^0_{i-1}}\,G_i + C^0_{i-1}\,(G_i + P_i), \qquad C^1_i = \overline{C^1_{i-1}}\,G_i + C^1_{i-1}\,(G_i + P_i)$$

where $G_i$ is carry generation and $P_i$ is carry propagation. The CCS adder is faster than the conventional pass-transistor-based design because the pass transistors are not serially connected in the carry propagation path: each multiplexer’s output is directly connected to the next multiplexer’s control input.

Fig. 8. Carry lookahead circuits. (a) 4-b conditional carry-select (CCS) circuit. (b) 8-b modified conditional carry-select circuit.
Fig. 9. Construction of 108-b adder.
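A behavioral Python sketch of the CCS carry chain follows, using the multiplexer selection described above. The function names are hypothetical, and propagate is taken as p = a XOR b (an assumption; g OR p equals a OR b either way). Each bit’s multiplexer is controlled by the previous conditional carry and chooses the generate term (carry-in 0) or generate OR propagate (carry-in 1), starting from an assumed group carry of 0 or 1. The check compares every conditional carry with the carry obtained by ordinary integer addition of the low-order bits.

```python
def ccs_group(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], list[int]]:
    """Conditional carries (c0, c1) for one group; bit lists are LSB first.

    c0 assumes an incoming group carry of 0 and c1 assumes 1. For each bit,
    a multiplexer controlled by the previous conditional carry selects
    g (carry-in 0) or g | p (carry-in 1).
    """
    c0, c1 = [], []
    prev0, prev1 = 0, 1  # assumed incoming group carries
    for a, b in zip(a_bits, b_bits):
        g, p = a & b, a ^ b          # carry generate and propagate
        prev0 = (g | p) if prev0 else g
        prev1 = (g | p) if prev1 else g
        c0.append(prev0)
        c1.append(prev1)
    return c0, c1


def to_bits(v: int, n: int) -> list[int]:
    """LSB-first list of the n low bits of v."""
    return [(v >> i) & 1 for i in range(n)]


# Exhaustive check of a 4-b group against integer addition.
for a in range(16):
    for b in range(16):
        c0, c1 = ccs_group(to_bits(a, 4), to_bits(b, 4))
        for cin, chain in ((0, c0), (1, c1)):
            for i in range(4):
                mask = (1 << (i + 1)) - 1
                # Carry out of bit i = overflow of the low (i + 1) bits.
                assert chain[i] == ((a & mask) + (b & mask) + cin) >> (i + 1)
```

Each new carry only steers the next multiplexer’s control input, so in this model the carry never passes through a data input of more than one multiplexer.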
To apply the CCS scheme to the final 108-b adder, the 4-b CLA is modified into an 8-b CLA, as shown in Fig. 8(b). In the modified CCS architecture, the conditional carry signals for each bit are selected by multiplexers depending on the conditional carry signals two bits below. In this configuration, there are several series-connected pass transistors, as shown in Fig. 8(b). However, the critical path of addition in the multiplier contains no series-connected pass transistors, because the adder inputs from the multiplier’s critical path arrive later than those of the other bits. The new 8-b CLA has only four critical-path gate stages because it makes use of parallelism. It reduces the 108-b addition time from 1.82 ns to 1.52 ns, or by about 16%.
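The modified scheme, in which each carry is selected by the carry two bits below, can be modeled the same way. This Python sketch is an assumption about one possible realization, not a transcription of Fig. 8(b): it merges each pair of bits into a two-bit group generate and propagate, so the even-indexed and odd-indexed carries form two independent chains, each four multiplexers long for 8 b. Calling it once with `cin = 0` and once with `cin = 1` gives the two conditional carry sets.

```python
def ccs8_carries(a: int, b: int, cin: int) -> list[int]:
    """Carry out of each bit of an 8-b addition (bit 0 first).

    Bit 0 uses a plain CCS multiplexer controlled by cin. Every other bit i
    merges bits i and i-1 into a two-bit group and lets the carry entering
    bit i-1 (cin for i == 1, else c[i-2]) select between the group generate
    and group generate OR propagate.
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(8)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(8)]
    c = [0] * 8
    c[0] = (g[0] | p[0]) if cin else g[0]
    for i in range(1, 8):
        g2 = g[i] | (p[i] & g[i - 1])        # group generate over bits i, i-1
        p2 = p[i] & p[i - 1]                 # group propagate over bits i, i-1
        ctrl = cin if i == 1 else c[i - 2]   # carry entering bit i-1
        c[i] = (g2 | p2) if ctrl else g2
    return c


# Exhaustive check (131,072 cases) against integer addition.
for a in range(256):
    for b in range(256):
        for cin in (0, 1):
            c = ccs8_carries(a, b, cin)
            for i in range(8):
                mask = (1 << (i + 1)) - 1
                assert c[i] == ((a & mask) + (b & mask) + cin) >> (i + 1)
```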
Fig. 9 shows the construction of the 108-b adder based on the carry-select architecture. DPL (double pass-transistor logic) gates are used for the half-adder (HA) circuits. The conditional-sum selection (CSS) circuit consists of a multiplexer that selects one of the two conditional sums according to the incoming block carry signal. The CCS architecture is applied not only to the 8-b carry lookahead circuit CLA1 but also to the block carry lookahead circuit CLA2, as shown in Fig. 10. The block carry signals are generated by CLA2 assuming the incoming carry from the lower block to be 0 or 1, and are then selected by the multiplexer according to the incoming true carry. To shorten the carry propagation delay, the multiplexer on the critical carry propagation path is separated from the other multiplexers, as shown in Fig. 10. This architecture enhances parallelism and results in fast operation, because the carry signals of the upper 32 b are calculated in parallel with those of the lower 32 b, and the true carry signals of the upper 32 b are then generated after the delay time of only a single multiplexer.

Fig. 10. Block carry lookahead circuit CLA2.

TABLE I
CHARACTERISTICS OF THE 54 × 54-BIT MULTIPLIER
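Conditional-sum selection can be illustrated with a behavioral carry-select adder in Python. Each block forms its sum for both possible carry-ins, and the incoming carry then picks one. The 8-b block size and the function name are assumptions; the loop visits blocks sequentially only to keep the model short, whereas in the hardware described above CLA1 and CLA2 produce the block carries in parallel.

```python
import random


def carry_select_add(a: int, b: int, width: int = 108, block: int = 8) -> tuple[int, int]:
    """Add two width-bit numbers block by block; returns (sum bits, carry out).

    Each block forms both conditional sums (carry-in 0 and carry-in 1) and the
    incoming carry selects one, as a conditional-sum selection multiplexer would.
    """
    result, carry = 0, 0
    for lo in range(0, width, block):
        w = min(block, width - lo)          # the last block may be narrower
        mask = (1 << w) - 1
        x, y = (a >> lo) & mask, (b >> lo) & mask
        s0, s1 = x + y, x + y + 1           # conditional sums
        s = s1 if carry else s0             # selection by the incoming carry
        result |= (s & mask) << lo
        carry = s >> w                      # block carry out
    return result, carry


random.seed(2)
for _ in range(1000):
    a, b = random.getrandbits(108), random.getrandbits(108)
    s, cout = carry_select_add(a, b)
    assert s + (cout << 108) == a + b
```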
IV. FABRICATION

A 54 × 54-b multiplier test chip was fabricated in triple-metal 0.25-μm CMOS technology. The major process parameters are summarized in Table I. The first metal is tungsten, and the second and third metals are aluminum. The chip operates on a supply voltage of 2.5 V. Fig. 11 shows a micrograph of the chip. The input sections are at the top and on the right-hand side, where Booth’s encoder is also positioned. The final adder and the output are at the bottom. A total of 100,200 transistors are integrated in an active area of 3.77 × 3.41 mm. The area is mostly occupied by the 4-2 compressors, the partial-product generators, and the wire shifters.

Fig. 11. Micrograph of the 54 × 54-b multiplier.
Fig. 12. Multiplication time of the 54 × 54-b multiplier.
The simulated multiplication time of the 54 × 54-b multiplier is shown in Fig. 12. It is only 4.4 ns with a 2.5-V power supply and typical process at room temperature. It