
Constructing Zero-deficiency Parallel Prefix Adder of Minimum Depth

Haikun Zhu, Chung-Kuan Cheng, Ronald Graham

Department of Computer Science and Engineering

La Jolla, California 92093

{hazhu,kuan,rgraham}@cs.ucsd.edu

Abstract: The parallel prefix adder is a general technique for speeding up binary addition. Under the unit delay model, we denote the size and depth of an n-bit prefix adder C(n) by sC(n) and dC(n), respectively. Snir proved that sC(n) + dC(n) ≥ 2n − 2 holds for arbitrary prefix adders. Hence, a prefix adder is said to be of zero-deficiency if sC(n) + dC(n) = 2n − 2. In this paper, we first propose a new zero-deficiency prefix adder architecture, dubbed Z(d), which provably has the minimal depth among all zero-deficiency prefix adders. We then design a 64-bit prefix adder Z64, derived from Z(d)|d=8, and compare it against several classical prefix adders of the same bit width in terms of area and delay using the logical effort method. The results show that the proposed Z(d) adder is also promising in practical VLSI design.

I. INTRODUCTION

Binary adders are the most fundamental modules in computer arithmetic design, and consequently have been investigated extensively for decades. Quite a few classic fast adders, such as the carry-skip adder, the carry-select adder, and the carry-lookahead adder, were proposed in the past [1]. However, these fast adders are ad hoc in structure, and each of them represents a single area-time tradeoff point in the design space. The parallel prefix adder, on the other hand, represents a general class of adder structures that exhibits flexible area-time tradeoffs. Therefore, identifying the exact area-delay tradeoff curve of the parallel prefix adder is an interesting problem that has received much research attention.

In designing parallel prefix adders, it has been popular to assume the unit delay timing model, in which the computation nodes are arranged in levels that represent the signal timing [1], [2], [4]-[8]. Denoting the size (i.e., the number of computation nodes) and depth of an n-bit prefix adder C(n) by sC(n) and dC(n), respectively, Snir proved that sC(n) + dC(n) ≥ 2n − 2 holds for arbitrary prefix circuits [2]. He defined the deficiency of a prefix circuit as

$$\mathrm{def}(C(n)) = s_{C(n)} + d_{C(n)} - 2n + 2 \tag{1}$$

Therefore, a prefix adder is said to be of zero-deficiency if sC(n) + dC(n) = 2n − 2.
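To make Eq. (1) concrete, the following sketch evaluates the deficiency of a few well-known prefix structures. The size and depth formulas are the commonly quoted textbook counts for n a power of two (n ≥ 4 for Brent-Kung); they are assumptions of this example rather than results of this paper, and the function names are hypothetical.

```python
def deficiency(n: int, size: int, depth: int) -> int:
    """def(C(n)) = s + d - 2n + 2, following Eq. (1)."""
    return size + depth - 2 * n + 2


def log2_exact(n: int) -> int:
    """log2(n) for n a power of two."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, n >= 2")
    return n.bit_length() - 1


# Each helper returns (size, depth). The formulas are assumed textbook values.
def serial(n: int):
    return n - 1, n - 1                # one node per output, chained


def sklansky(n: int):
    k = log2_exact(n)
    return (n // 2) * k, k


def brent_kung(n: int):
    k = log2_exact(n)                  # intended for n >= 4
    return 2 * n - 2 - k, 2 * k - 2


def kogge_stone(n: int):
    k = log2_exact(n)
    return n * k - n + 1, k


if __name__ == "__main__":
    for name, f in [("serial", serial), ("Sklansky", sklansky),
                    ("Brent-Kung", brent_kung), ("Kogge-Stone", kogge_stone)]:
        s, d = f(16)
        print(f"{name:12s} s={s:3d} d={d:2d} def={deficiency(16, s, d)}")
```

Under these assumed counts, only the serial structure has zero deficiency at n = 16; the shallower structures pay for their depth with extra nodes.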

Snir’s theorem indicates that the solution space for parallel prefix adders should look like Fig. 1. For loose depth constraints, we observe a linear tradeoff between depth and size, which is exhibited by zero-deficiency prefix adders. However, if the depth constraint is too tight, the size of the prefix adder grows dramatically and zero-deficiency prefix adders no longer exist. It remains an open question to find the zero-deficiency prefix adders of minimum depth, i.e., to identify the curve d = f(n) shown in Fig. 1.

Various zero-deficiency prefix circuits were proposed in the past. Among them the most notable ones are Snir’s design [2], the LYD(n) circuit [5] and the LS(n) circuit [4]. The main purpose of this work has been to make the depth of zero-deficiency prefix circuits as small as possible for a given width. In [6], Zimmermann proposed a heuristic for prefix adder optimization using depth-controlled compression and expansion. His approach in many cases produces depth-size optimal or near-optimal prefix adders. However, the optimality of his results is not guaranteed.

[Fig. 1. Optimal depth-size tradeoffs of parallel prefix adders. For a given width n, the maximal depth of the prefix adders is n − 1 (serial prefix adder) while the minimal depth is ⌈log n⌉ (Sklansky adder [3]). Axes: n (bit width) and d (depth level). Plot annotations: zero-deficiency prefix adders (linear depth-size tradeoff); curves d = n − 1, d = f(n) and d = ⌈log n⌉.]

In this paper, we propose a new kind of zero-deficiency prefix adder called Z(d), which provably has the minimal depth for a given width. As in previous work, we adopt the unit delay timing model since it is simple yet easy to extend. We design our structure from an alternative point of view, that is, by constructing a zero-deficiency prefix adder of maximal width for a given depth. We then design a 64-bit prefix adder derived from Z(d)|d=8, and compare it against several classical prefix adders in terms of area and delay using the logical effort method [16], [17]. The results show that the Z(d) adder is promising in practical VLSI design.

The remainder of the paper is organized as follows. Section II explains how binary addition is formulated as a parallel prefix problem. In Section III, we first give a revised proof of Snir’s lower bound theorem sC(n) + dC(n) ≥ 2n − 2, and then discuss the properties of zero-deficiency adders. Our main contribution lies in Section IV, where a new class of zero-deficiency prefix circuits Z(d) is proposed. Section V focuses on area and delay analysis using logical effort, as well as comparisons. Section VI concludes the paper.

II. BACKGROUND

The generalized prefix problem is to compute $y_i = x_i \bullet x_{i-1} \bullet \cdots \bullet x_1$ for 1 ≤ i ≤ n, given n inputs x1, x2, ..., xn and an arbitrary associative operator •.

0-7803-8736-8/05/$20.00 ©2005 IEEE. ASP-DAC 2005

[Fig. 2. An example of a parallel prefix adder: n = 16, d = 4, s = 32. The figure marks the backbone, the affiliated tree and the ridge.]

Binary addition is usually expressed in terms of the carry generation signal gi, carry propagation signal pi, carry signal ci and sum signal si at each bit position (1 ≤ i ≤ n):

$$g_i = a_i b_i \tag{2}$$

$$p_i = a_i \oplus b_i \tag{3}$$

$$c_i = \begin{cases} g_i & \text{if } i = 1 \\ g_i + p_i c_{i-1} & \text{if } 2 \le i \le n \end{cases} \tag{4}$$

$$s_i = p_i \oplus c_{i-1} \tag{5}$$

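where A = an an−1 ··· a1 and B = bn bn−1 ··· b1 are the two binary numbers. The concepts of carry generation and carry propagation can be easily extended to a block of consecutive bits:

$$G_{[i:j]} = \begin{cases} g_i & \text{if } i = j \\ G_{[i:k]} + P_{[i:k]} G_{[k-1:j]} & \text{if } n \ge i > j \ge 1 \end{cases} \tag{6}$$

$$P_{[i:j]} = \begin{cases} p_i & \text{if } i = j \\ P_{[i:k]} P_{[k-1:j]} & \text{if } n \ge i > j \ge 1 \end{cases} \tag{7}$$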

By introducing an associative operator •, the computation of a pair of (G,P) signals and carry signals can be rewritten as:

$$(G,P)_{[i:i]} = (g,p)_i \tag{8}$$

$$(G,P)_{[i:j]} = (G,P)_{[i:k]} \bullet (G,P)_{[k-1:j]} = \left(G_{[i:k]} + P_{[i:k]} G_{[k-1:j]},\; P_{[i:k]} P_{[k-1:j]}\right) \tag{9}$$

$$c_i = G_{[i:1]} \tag{10}$$

Therefore, the carry signal generation, namely, the (G,P)[i:1]

signal generation in terms of equation (9) is exactly a prefix

computation problem. Since the generation of gi, pi signals

and sum bits siare just local operations, the performance of

the adder is determined by the prefix circuit of generating

(G,P) signals. In the sequel, we will only show the interme-

diate prefix circuit of the prefix adders. As an example, Fig. 2

shows a 16-bit prefix circuit of depth 5. Each computation

node (i.e., black node) in the figure is a (G,P) generator.

Generally, for an n-bit prefix circuit C(n), its size sC(n)is

defined as the number of computation nodes while its depth

dC(n)is defined as the level of the latest prefix output. The

white nodes in the figure are duplication nodes since they do

nothing but pass the signals. In the rest of the paper when

we say “a node”, we always refer to a computation (or black)

node.
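As an illustration of how Eqs. (2)-(10) fit together, here is a minimal sketch that adds two numbers by running the • operator as a serial prefix chain, the simplest (deepest) arrangement. The names `combine` and `prefix_add` are hypothetical, bits are indexed from 0 in the code (bit 0 corresponds to bit 1 in the text), and no attempt is made to model any of the parallel structures discussed in this paper.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # a (G, P) pair of single-bit signals


def combine(hi: GP, lo: GP) -> GP:
    """The associative operator of Eq. (9): (G,P)[i:k] . (G,P)[k-1:j]."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_add(a: int, b: int, n: int) -> int:
    """Add two n-bit numbers via a serial prefix computation of the carries."""
    # Local generate/propagate signals, Eqs. (2) and (3).
    gp: List[GP] = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
                    for i in range(n)]
    # Serial prefix: acc holds (G,P) of the block from the current bit down to bit 0.
    acc = gp[0]
    carries = [acc[0]]                       # Eq. (10): the carry is the G part
    for i in range(1, n):
        acc = combine(gp[i], acc)
        carries.append(acc[0])
    # Sum bits, Eq. (5); the carry into the lowest bit is 0.
    total = 0
    for i in range(n):
        c_in = carries[i - 1] if i > 0 else 0
        total |= (gp[i][1] ^ c_in) << i
    return total | (carries[n - 1] << n)     # append the carry-out


if __name__ == "__main__":
    print(prefix_add(11, 6, 4))  # compare with 11 + 6
```

Because • is associative, the same carries can be produced by any bracketing of the chain, which is what gives prefix adders their freedom to trade depth against size.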

III. PROPERTIES OF THE ZERO-DEFICIENCY PREFIX ADDER

Snir’s original proof for sC(n)+ dC(n) ≥ 2n − 2 is by

mathematical induction and does not reveal the structural

information of zero-deficiency circuits. We notice that there

are some nice ideas in [7] and [8] which can be used to devise

an elegant proof for Snir’s theorem. In this section, we polish

these ideas and prove an enhanced version of Snir’s lower

bound theorem.

Theorem 1: Let C(n) be an n-bit prefix circuit, with its size and depth denoted by $s_{C(n)}$ and $d_{C(n)}$, respectively. Denote the depth of its most significant bit (MSB) output by $d^{M}_{C(n)}$. Then

$$s_{C(n)} + d_{C(n)} \ge s_{C(n)} + d^{M}_{C(n)} \ge 2n - 2 \tag{11}$$

Proof: Consider the MSB output node in C(n). This output node is actually the root of an upside-down alphabetical tree with all the input nodes being its leaves. The size of this tree is exactly n − 1, and its depth is $d^{M}_{C(n)}$. Including the LSB bit, at most $d^{M}_{C(n)} + 1$ prefix outputs can be obtained from this tree. For each of the columns where the prefix results are not ready, at least one extra node is needed to generate the output. Thus, besides the tree, we need at least $n - (d^{M}_{C(n)} + 1)$ nodes for the outputs. Consequently, we have the inequality

$$s_{C(n)} \ge (n - 1) + \left(n - (d^{M}_{C(n)} + 1)\right) = 2n - 2 - d^{M}_{C(n)}$$

which leads to

$$s_{C(n)} + d^{M}_{C(n)} \ge 2n - 2$$

We now define a few new concepts about the prefix circuits

as follows.

Definition 1: For a prefix circuit C(n), the binary alphabetical tree generating the MSB prefix output is called the backbone of C(n). In addition, there is another tree whose nodes are exactly all the prefix output nodes, with the first input node being its root. This tree is called the affiliated tree of C(n). The common part of the backbone and the affiliated tree, that is, the path from the first input to the MSB output, is called the ridge of C(n).

For illustration, the backbone of the prefix circuit in Fig. 2 is enclosed by a solid line loop, while the affiliated tree is enclosed by a dashed line loop. Their common part, i.e., the ridge, is highlighted with a heavy line.

According to the proof of Theorem 1, it is straightforward

to derive the following corollary:

Corollary 1: A prefix circuit C(n) of depth d is of zero-

deficiency if and only if

1) The backbone of C(n) has depth d, and its size is n−1.

2) The affiliated tree of C(n) has size n − 1, and it

has exactly one node per column (excluding the LSB

column). Each node of the affiliated tree is a prefix

output.

3) The ridge has d nodes, one node per level.

IV. PROPOSED Z(d) CIRCUIT

In this section, we propose a new class of zero-deficiency

prefix circuits, called Z(d), which have the minimum depth

among all zero-deficiency prefix circuits.

We will first construct a class of parameterized trees called

Tk(t) trees which will be used to form the backbone of

the Z(d) circuit. We then define the Ak(t) trees which will

be used to form the affiliated tree of Z(d). Z(d) circuit

is constructed by assembling Tk(t) trees and Ak(t) trees

together.

The Tk(t) trees are defined recursively, as shown in Fig. 3. The parameter t (≥ 1) is the depth of the Tk(t) tree, while k represents the maximum number of black nodes in a single column. As an example, Fig. 5(a) shows the T3(5) tree, whose depth is 5 and whose maximum number of nodes per column is 3, and also shows how it is composed of T2(4), T2(3), T2(2) and T1(1). Algorithm 1 formally presents how we generate Tk(t) trees.

[Fig. 3. The recursive definition of Tk(t) tree: (a) T1(t); (b) Tk(t), 1 ≤ k ≤ t.]

[Fig. 4. The recursive definition of Ak(t) tree: (a) A1(t); (b) Ak(t), 1 ≤ k ≤ t.]

Algorithm 1 Generation of the Tk(t) tree
T_tree_generation(int t, int k)
    for i = 1 to k − 1 do
        for j = i to t − k + i do
            Construct Ti(j) according to Fig. 3
        end for
    end for
    Construct Tk(t) according to Fig. 3

Algorithm 2 Generation of the Ak(t) tree
A_tree_generation(int t, int k)
    for i = 1 to k − 1 do
        for j = i to t − k + i do
            Construct Ai(j) according to Fig. 4
        end for
    end for
    Construct Ak(t) according to Fig. 4
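The loop bounds of Algorithm 1 can be traced with a short sketch that lists, in order, which sub-trees Ti(j) are requested before Tk(t) itself. It only enumerates indices; the actual wiring of each tree follows Fig. 3 and is not reproduced here. The function name `t_tree_build_order` is hypothetical.

```python
from typing import List, Tuple


def t_tree_build_order(t: int, k: int) -> List[Tuple[int, int]]:
    """Return the (i, j) pairs for which Algorithm 1 constructs T_i(j), in order."""
    if not 1 <= k <= t:
        raise ValueError("expected 1 <= k <= t")
    order: List[Tuple[int, int]] = []
    for i in range(1, k):                    # i = 1 .. k-1
        for j in range(i, t - k + i + 1):    # j = i .. t-k+i
            order.append((i, j))
    order.append((k, t))                     # finally, T_k(t) itself
    return order


if __name__ == "__main__":
    for i, j in t_tree_build_order(5, 3):
        print(f"T{i}({j})")
```

For T3(5) the trace contains T1(1), T2(2), T2(3) and T2(4), the components named in the text, and ends with T3(5). The same index pattern applies to Algorithm 2 with A in place of T.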

Following nearly the same recursive way of construction as

Tk(t) trees, we define the Ak(t) trees as shown in Fig. 4. For

Ak(t) tree, the parameter t is the lateral fan-out of the root,

while k+1 is its depth. We also give an example of A3(5) tree

shown in Fig. 5(b). Similar to the structure of T3(5), A3(5)

comprises A2(4), A2(3), A2(2) and A1(1). The algorithmic

description of Ak(t) trees is presented in Algorithm 2.

It is interesting to note that, T3(5) and A3(5) can be

assembled together to form a partial prefix adder, as shown

in Fig. 5(c). If the root of A3(5) is the prefix output of bit i,

then essentially we have the prefix outputs from bit i + 1 to

i+26. Generally, we can always combine a pair of Tk(t) tree

and Ak(t) tree to form a partial prefix adder of depth k + t,

as shown in Fig. 6. This is feasible because Tk(t) tree and

Ak(t) tree have the same width, which is due to the fact that

they follow the same recursive way of definition.

Theorem 2 gives the width of Tk(t) tree and Ak(t) tree.

Actually since T-tree and A-tree are defined recursively, their

width is a two dimensional integer recurrence. Equation (12)

is obtained by deriving a closed form formula of that recur-

rence. The detailed mathematical derivation can be found in

[10], and is omitted here due to lack of space.

Theorem 2: The Tk(t) tree and the Ak(t) tree have the same width, which is

$$N(k,t) = \sum_{i=0}^{k} \binom{t}{i} \quad \text{for } 1 \le k \le t \tag{12}$$

Furthermore, the size of the Tk(t) tree is N(k,t) − 1, while the size of the Ak(t) tree is N(k,t) with one node per column.

[Fig. 5. Examples of Tk(t) and Ak(t) trees: (a) T3(5); (b) A3(5); (c) Assembling of T3(5) and A3(5). T3(5) is composed of T2(4), T2(3), T2(2) and T1(1); A3(5) of A2(4), A2(3), A2(2) and A1(1). Both span 26 columns, from i + 1 to i + 26.]

[Fig. 6. Assembling of Tk(t) tree and Ak(t) tree. The assembled circuit spans columns i + 1 to i + N(k,t) and has depth k + t.]
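Equation (12) is easy to evaluate directly. The sketch below computes N(k,t) and checks it against the 26-column example of T3(5) and A3(5). The helper name `tree_width` is hypothetical. The Pascal-type identity checked at the end is a standard property of partial binomial sums and is not claimed to be the recurrence derived in [10].

```python
from math import comb


def tree_width(k: int, t: int) -> int:
    """N(k, t) = sum_{i=0}^{k} C(t, i), Eq. (12); defined for 1 <= k <= t."""
    if not 1 <= k <= t:
        raise ValueError("expected 1 <= k <= t")
    return sum(comb(t, i) for i in range(k + 1))


if __name__ == "__main__":
    n = tree_width(3, 5)
    print("width of T3(5) and A3(5):", n)
    print("size of T3(5):", n - 1, " size of A3(5):", n)
    # Standard identity for partial binomial sums (not taken from the paper):
    # N(k, t) = N(k, t-1) + N(k-1, t-1) whenever both terms are defined.
    assert tree_width(3, 5) == tree_width(3, 4) + tree_width(2, 4)
```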

Now we are ready to introduce our new zero-deficiency

circuit Z(d). Algorithm 3 along with Fig. 7 defines the Z(d)

circuit. It can be seen that essentially Z(d) is defined over its

depth d. The width of Z(d) is given in Theorem 3. Again,

the derivation is omitted and can be found in [10].

Theorem 3: The width of the Z(d) circuit, which we denote by NZ(d), is

$$N_Z(d) = F(d + 3) - 1 \quad \text{for } d \ge 1 \tag{13}$$

where F(k) are the Fibonacci numbers with F(1) = F(2) = 1.
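A small sketch of Eq. (13), assuming the indexing F(1) = F(2) = 1, which is the convention that reproduces the Z row of Table I. The helper `min_depth` reads "minimum depth for width n" as the smallest d with NZ(d) ≥ n; that reading, and all function names, are assumptions of this example.

```python
def fib(k: int) -> int:
    """Fibonacci numbers with F(1) = F(2) = 1 (assumed indexing)."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a


def z_width(d: int) -> int:
    """N_Z(d) = F(d + 3) - 1, Eq. (13)."""
    if d < 1:
        raise ValueError("d must be >= 1")
    return fib(d + 3) - 1


def min_depth(n: int) -> int:
    """Smallest d such that N_Z(d) >= n."""
    d = 1
    while z_width(d) < n:
        d += 1
    return d


if __name__ == "__main__":
    print([z_width(d) for d in range(3, 11)])
    print("depth needed for 64 bits:", min_depth(64))
```

Because the widths grow like Fibonacci numbers, the depth needed grows only logarithmically in n, though with a larger constant than ⌈log n⌉.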

In order to prove the optimality of the Z(d) circuit, we first show that the Z(d) circuit is indeed of zero-deficiency, and then prove that it has the minimal depth among all possible zero-deficiency circuits. These two facts are presented in Theorem 4 and Theorem 5, respectively. For Theorem 4, the proof is relatively simple: we count the number of computation nodes in Z(d) and verify that it satisfies the definition of zero-deficiency. For Theorem 5, the proof is by contradiction. The basic idea is to show that, given a prefix circuit of depth d, if its ridge spans wider than that of Z(d), it cannot be of zero-deficiency. The detailed proofs are presented in [10].

Theorem 4: The parallel prefix circuit Z(d) shown in

Fig. 7 is of zero-deficiency.

Theorem 5: Z(d) has the maximum width for a given

depth d among all zero-deficiency prefix circuits.

As an example, Fig. 8 shows the Z(d) circuit for d = 8.

It is now clear that the curve of the function d = f(n) is just the inverse function of NZ(d) = F(d + 3) − 1 given in Theorem 3. Table I shows the widths of Z(d) circuits for 3 ≤ d ≤ 18, with the results of Lin’s design [4] and the LYD(n) circuit [5] listed for comparison. The numbers are read as the maximal widths up to which zero-deficiency prefix circuits of the specified type and depth can be constructed. Clearly, our design dominates the other two, especially when the width is large.

[Fig. 7. A new class of zero-deficiency prefix circuits Z(d). The circuit of depth d is assembled from T1(d − 1), T2(d − 2), ..., T⌈d/2⌉−1(⌊d/2⌋ + 1), T⌊d/2⌋(⌊d/2⌋), ..., T2(2), T1(1) and the matching A1(d − 1), A2(d − 2), ..., A⌈d/2⌉−1(⌊d/2⌋ + 1), A⌊d/2⌋(⌊d/2⌋), ..., A2(2), A1(1).]

[Fig. 8. Example of Z(d) circuits: Z(d)|d=8. Blocks shown: T1(7), T2(6), T3(5), T4(4), T3(3), T2(2) and A1(7), A2(6), A3(5), A4(4), A3(3), A2(2). Labeled columns: 1, 32, 33, 58, 59, 80, 81, 88.]

Algorithm 3 Generation of the Z(d) circuit
Z_circuit_generation(int d)    // d is the depth
    for i = 1 to ⌈d/2⌉ − 1 do
        T_tree_generation(d − i, i);    // call Algorithm 1
        A_tree_generation(d − i, i);    // call Algorithm 2
    end for
    for i = ⌊d/2⌋ to 1 do
        T_tree_generation(i, i);
        A_tree_generation(i, i);
    end for
    Stitch all the T, A trees together to form Z(d) as shown in Fig. 7.
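The two loops of Algorithm 3 can be cross-checked against Theorem 3 by summing the widths N(k,t) of the tree pairs they request. In the sketch below, the constant offset of 2 columns added at the end is an assumption of this example: it is the value that makes the total agree with F(d + 3) − 1, and the text does not state it explicitly. All function names are hypothetical, and the Fibonacci indexing F(1) = F(2) = 1 is assumed.

```python
from math import comb
from typing import List, Tuple


def tree_width(k: int, t: int) -> int:
    """N(k, t) from Eq. (12)."""
    return sum(comb(t, i) for i in range(k + 1))


def z_blocks(d: int) -> List[Tuple[int, int]]:
    """(k, t) parameters of the T/A tree pairs requested by Algorithm 3, in loop order."""
    blocks = [(i, d - i) for i in range(1, (d + 1) // 2)]   # i = 1 .. ceil(d/2) - 1
    blocks += [(i, i) for i in range(d // 2, 0, -1)]        # i = floor(d/2) .. 1
    return blocks


def stitched_width(d: int, extra_columns: int = 2) -> int:
    """Sum of block widths plus an ASSUMED offset (not stated in the text)."""
    return sum(tree_width(k, t) for k, t in z_blocks(d)) + extra_columns


def fib(k: int) -> int:
    a, b = 1, 1          # F(1) = F(2) = 1 (assumed indexing)
    for _ in range(k - 1):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    print(z_blocks(8))
    print(stitched_width(8), fib(8 + 3) - 1)
```

For d = 8 the block list matches the labels of Fig. 8 followed by the final (1, 1) pair of Fig. 7, and the stitched total agrees with Eq. (13).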

Since the Z(d) adder is provably optimal, it is guaranteed to be no worse than the result produced by Zimmermann’s algorithm [6]. For example, given width 54 and depth 7, our design has 99 nodes, while Zimmermann’s algorithm gives a design of 104 nodes [9].
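These node counts can be checked against Eq. (1). A zero-deficiency circuit of width n and depth d has exactly 2n − 2 − d nodes. The deficiency value below assumes that the 104-node design also has depth exactly 7, which is how the comparison above is read here.

```latex
\begin{aligned}
s_{Z} &= 2n - 2 - d = 2 \cdot 54 - 2 - 7 = 99, \\
\mathrm{def} &= 104 + 7 - 2 \cdot 54 + 2 = 5 \quad \text{(104-node design, assuming depth 7)}.
\end{aligned}
```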

V. ANALYSIS AND COMPARISON

In this section, we compare a Z(d)-derived zero-deficiency

adder with several classical prefix adders in terms of both

delay and area, for 64-bit width. The selected prefix adders

include Sklansky adder [3], Brent-Kung (BK) adder [13],

Kogge-Stone (KS) adder [14] and Han-Carlson (HC) adder

[15]. These adders are well known to be fast because of small

logic depth or regular layout.

TABLE I
WIDTHS OF LS(n), LYD(n) AND Z(d) CIRCUITS.

d      3     4     5     6     7     8     9     10
LS     7     11    16    23    33    47    66    95
LYD    7     12    20    33    54    77    95    135
Z      7     12    20    33    54    88    143   232

d      11    12    13    14    15    16    17    18
LS     131   191   260   383   517   575   1030  1535
LYD    169   242   308   446   576   843   1101  1625
Z      376   609   986   1596  2583  4180  6764  10945

We use the logical effort method for fast estimation of the adder delay [16]. The logical effort method is a shorthand for the RC delay model, yet it provides reasonable accuracy. For a single-stage gate, the logical effort method measures its delay in units of τ, which is the intrinsic delay of an ideal inverter:

$$D = D_{abs}/\tau = gh + p \tag{14}$$

where g is the logical effort of the gate, which is the ratio of the input capacitance of the gate to the input capacitance of an inverter with the same unit effective resistance; h is the electrical effort of the gate, which is the ratio of load capacitance to input capacitance; and p characterizes the parasitic delay of the gate. Note that by incorporating wire capacitance into h, the interconnect delay, as well as the fan-out effect, can be easily taken into account. The overall path delay is simply the sum of the gate delays:

$$D_{path} = \sum_i D_i = \sum_i h_i g_i + \sum_i p_i \tag{15}$$

The first term is called the path effort delay, while the second term is called the path parasitic delay.
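Equations (14) and (15) translate directly into a few lines of code. The sketch below is only an illustration: the function name and the stage representation are hypothetical, and the example values (an inverter chain with g = 1, p = 1 and a fan-out of 4) are the usual textbook inverter parameters, not values taken from Table II.

```python
from typing import Iterable, Tuple

Stage = Tuple[float, float, float]  # (g, h, p): logical effort, electrical effort, parasitic delay


def path_delay(stages: Iterable[Stage]) -> Tuple[float, float, float]:
    """Return (effort delay, parasitic delay, total), in units of tau, per Eq. (15)."""
    stages = list(stages)
    effort = sum(g * h for g, h, _ in stages)   # sum of g_i * h_i
    parasitic = sum(p for _, _, p in stages)    # sum of p_i
    return effort, parasitic, effort + parasitic


if __name__ == "__main__":
    # Three inverters, each driving four times its own input capacitance (assumed values).
    chain = [(1.0, 4.0, 1.0)] * 3
    print(path_delay(chain))
```

Wire capacitance would enter through h, by adding the wire load to the load capacitance of the stage that drives it.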

In this study, we follow exactly the experimental settings in [17]. We assume that the (P,G) generators are implemented using inverting static CMOS, as shown in Fig. 9. The transistors are sized such that each pull-down stack has unit effective resistance. Note that there are two kinds of (P,G) generators. The black cells in Fig. 9(a) calculate both P and G signals, while the gray cells in Fig. 9(b) only calculate G signals. Both black cells and gray cells have two versions of opposite polarities.

Table II lists the logical efforts, parasitic delays and circuit area of black and gray cells. The parasitic delay is estimated by counting the total transistor width on the output node. The cell area is calculated by summing up all the transistor areas in the cell in unit squares. Since alternating stages in the prefix network use alternating polarities of inputs and outputs, all the values in Table II are the average of the two polarities.

[Fig. 9. Inverting CMOS logic: (a) black cells; (b) gray cell; (c) inverter.]

TABLE II
LOGICAL EFFORT, PARASITIC DELAY AND AREA OF ADDER CIRCUIT BLOCKS

Cell          Term         Value
Black cell    LEblackgu    4.5/3
              LEblackgl    6/3
              LEblackpu    10.5/3
              LEblackpl    4.5/3
              PDblackg     7.5/3
              PDblackp     6/3
              Ablack       16.5
Gray cell     LEgraygu     4.5/3
              LEgraygl     6/3
              LEgraypu     6/3
              PDgray       7.5/3
              Agray        9
Buffer        LEbuf        1*1/2
              Abuf         1/2

LE = Logical Effort, PD = Parasitic Delay, A = Area

In analyzing the delay, two basic assumptions are made, as in [17]:

• The wires are short enough that distributed RC delay can be neglected. Thus the wires are only considered as capacitive load. This assumption is supported by [18].
• Vertical wires are short enough to be neglected. The horizontal wire capacitance is measured as w units per column spanned. It is estimated that w ≈ 0.5 for a 0.18 µm technology. For KS and HC adders, there are many parallel wires, which have significant coupling capacitance [18]. Therefore, for these two adders w = 1 is used.

[Fig. 10. Parallel prefix adders: (a) Sklansky; (b) Kogge-Stone; (c) Brent-Kung; (d) Han-Carlson.]

For illustration purposes, Fig. 10 shows 16-bit Sklansky, BK, KS and HC adders with critical paths identified. A few buffers are inserted to decouple the capacitive load from the critical path. These buffers have half the drive of an ordinary gate and hence half the input capacitance. Note that in [17] a fixed critical path is specified for a given adder structure across various bit widths. This does not hold in general, because as the bit width increases, the critical path changes due to the increased wire capacitance. Instead, we analyze the critical paths of the 64-bit adders by hand, and list them in Table III.

The 64-bit Z(d)-derived prefix adder, denoted Z64 for short, is generated as follows:

1) Generate Z(d)|d=8, whose width is 88;
2) Scan the nodes of Z(d)|d=8 one by one from level 0 to level 8, and column 1 to column 88. Do the following recursively for a selected node:
(a) If the fan-out of the node is larger than 4, slide its branch of least lateral connection down by one level; otherwise skip to the next node;
(b) If some nodes on the slid-down branch exceed level 8, the entire columns where the exceeding nodes reside are deleted. Do local connection adjustment if needed. After this step, a 72-bit prefix network whose largest fan-out is limited to 4 is generated;
3) Discard the eight MSB columns, yielding a 64-bit prefix network;
4) Further optimize the prefix adder by inserting buffers to decouple load capacitance from the critical path.

TABLE III
CRITICAL PATHS OF VARIOUS 64-BIT ADDERS

Sklansky  (0,1)input→(1,2)g→(2,4)g→(3,8)g→(4,16)g→(5,32)g→(6,64)g
KS        (0,32)input→(1,32)b→(2,32)b→(3,32)b→(4,32)b→(5,32)g→(6,64)g
HC        (0,30)input→(1,30)b→(2,30)b→(3,30)b→(4,30)b→(5,30)g→(6,62)g→(7,63)g
BK        (0,1)input→(2,4)g→(3,8)g→(4,16)g→(5,32)g→(6,48)g→(7,56)g→(8,60)g→(9,62)g→(10,63)g
Z64       (0,1)input→(1,2)g→(2,4)→(3,8)g→(4,16)g→(5,32)g→(6,57)g→(7,61)g→(8,64)g

(i,j)b denotes a black cell at level i, column j; (i,j)g similarly denotes a gray cell.

TABLE IV
DELAY AND AREA OF VARIOUS ADDERS

Adder       Delay            Area      Logic Depth
Z64         81.5 (w=0.5)     1474      8
BK          81 (w=0.5)       1507.05   10
Sklansky    175.5 (w=0.5)    2695.5    6
KS          100 (w=1)        4824      6
HC          105.5 (w=1)      2695.9    7

Fig. 11 shows Z64 with the critical path highlighted by a heavy line. As an example, we calculate its delay and area as follows:

$$\begin{aligned}
D_F &= 4(LE_{graygl} + LE_{buf}) + (2\,LE_{graygl} + LE_{buf}) + (3\,LE_{graygl} + LE_{buf}) \\
&\quad + (LE_{graygl} + LE_{buf}) + (3\,LE_{graygl} + LE_{buf}) \\
&= 13\,LE_{graygl} + 8\,LE_{buf} = 30 \\
D_P &= 8\,PD_{gray} = 20 \\
D_{wire} &= 63\,w = 63 \times 0.5 = 31.5 \\
D_{total} &= D_F + D_P + D_{wire} = 81.5 \\
Area &= (\#\text{black cells}) \cdot A_{black} + (\#\text{gray cells}) \cdot A_{gray} = 1474
\end{aligned}$$

The other four adder structures are evaluated in the same

way, and the results are listed in Table IV.
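The Z64 delay figure can be reproduced from the Table II entries with a few lines of arithmetic. The sketch below hard-codes the values LEgraygl = 6/3, LEbuf = 1*1/2 and PDgray = 7.5/3 as read from Table II, together with w = 0.5 and the 63 column spans used in the calculation above. The constant and function names are hypothetical. The area is not reproduced because the black and gray cell counts are not listed in the text.

```python
LE_GRAY_GL = 6 / 3      # logical effort of the gray cell's lower G input (Table II)
LE_BUF = 1 * 1 / 2      # logical effort of a buffer (Table II)
PD_GRAY = 7.5 / 3       # parasitic delay of a gray cell (Table II)
W = 0.5                 # wire capacitance per column spanned
COLUMNS_SPANNED = 63    # horizontal span along the critical path


def z64_delay():
    """Return (D_F, D_P, D_wire, D_total) for the Z64 critical path."""
    # Effort delay, written stage by stage as in the calculation above.
    d_f = (4 * (LE_GRAY_GL + LE_BUF)
           + (2 * LE_GRAY_GL + LE_BUF)
           + (3 * LE_GRAY_GL + LE_BUF)
           + (LE_GRAY_GL + LE_BUF)
           + (3 * LE_GRAY_GL + LE_BUF))
    d_p = 8 * PD_GRAY                 # eight gray cells on the path
    d_wire = COLUMNS_SPANNED * W
    return d_f, d_p, d_wire, d_f + d_p + d_wire


if __name__ == "__main__":
    print(z64_delay())
```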

Clearly, the Z64 and BK adders are the best two of the five in terms of both delay and area. Sklansky’s problem is that the fan-out grows exponentially as logic depth increases, resulting in a large delay. The KS and HC adders, on the other hand, suffer from high coupling capacitance (w = 1). The areas of the Sklansky, KS and HC adders are about 1.8 to 3.3 times those of Z64 and BK (Table IV).

Compared to the BK adder, Z64 has smaller logic depth but larger fan-out. These two factors effectively cancel out, so that the BK and Z64 adders have nearly the same delay. However, we conjecture that the Z64 adder is more power efficient than the BK adder. The reason is that circuit cells at deep logic levels tend to have a high activity rate and hence consume more power. A detailed power analysis of the proposed Z(d) circuit is left as future work.

VI. CONCLUSIONS

In this paper, we have proposed a new class of zero-

deficiency prefix adder Z(d) which has the minimal depth
