
A Statistical Approach to the Timing-Yield Optimization of Pipeline Circuits

Chin-Hsiung Hsu†, Szu-Jui Chou†, Jie-Hong R. Jiang§†, and Yao-Wen Chang§†

Department of Electrical Engineering§/Graduate Institute of Electronics Engineering†

National Taiwan University, Taipei 10617, Taiwan

{arious, rerechou}@eda.ee.ntu.edu.tw; {jhjiang, ywchang}@cc.ee.ntu.edu.tw

Abstract. The continuous miniaturization of semiconductor devices imposes serious threats to design robustness against process variations and environmental fluctuations. Modern circuit designs may suffer from design uncertainties, unpredictable in the design phase or even after manufacturing. This paper presents an optimization technique to make pipeline circuits robust against delay variations and thus maximize timing yield. By trading larger flip-flops for smaller latches, the proposed approach can be used as a post-synthesis or post-layout optimization tool, allowing accurate timing information to be available. Experimental results show an average of 31% timing yield improvement for pipeline circuits. They suggest that our method is promising for high-speed designs and is capable of tolerating clock variations.

1 Introduction

As semiconductor fabrication technology advances into the sub-100nm feature size regime, the sensitivity of IC designs to process variations and environmental fluctuations is ever-increasing. To maintain design robustness against these uncertainties, it becomes more and more apparent that traditional design methodologies need to be modified to consider variations at an early stage of the design flow, since not all process variations can be diminished by technology advances.

In recent years, statistical approaches to circuit analysis and optimization have been revolutionizing the EDA community. They are mostly centered around delay and power, the two main concerns affected by design uncertainties. In this paper, we focus on timing. Traditional approaches to timing optimization were based on worst-case analysis. For instance, a gate delay under a certain operating condition may be set to a deterministic value fixed at the 3σ point of its distribution to ensure enough margin for tolerating variations. However, worst-case analysis is too conservative, especially under increasingly stringent timing constraints. Furthermore, as designs become more sensitive to process variations, it becomes harder to make a design safe under worst-case variations. Due to the inadequacy of traditional worst-case analysis, the need for statistical analysis has emerged and has attracted intensive research effort. Statistical optimization is the next step as statistical analysis matures.


Fig. 1. A motivating example for timing yield improvement by replacing D-FFs with latches. (Figure not reproduced; its registers are labeled r0 to r4.)

Based on statistical timing analysis, most existing statistical optimization approaches have focused on gate sizing, e.g., [4–6,11], and clock skew scheduling, e.g., [1,7,10,14]. In contrast, we propose a new statistical optimization methodology, which is orthogonal and complementary to gate sizing and can possibly be combined with clock scheduling for further improvement. We take advantage of the transparency property of level-sensitive latches for tolerating delay uncertainties. There were prior efforts focusing on the tradeoff between flip-flops and latches in other optimization contexts. For instance, flip-flops may be replaced with latches to optimize storage [16] or power [8]. However, to the best of our knowledge, no work has been done in the context of optimizing timing yield in the statistical domain.

Consider Figure 1 for a motivating example. In the circuit, assume that the delays (in nanoseconds) of an AND gate, a NOT gate, and a wire follow the normal distributions N(5,1), N(3,1), and N(0,0), respectively. (That is, we neglect the wire delay and assume that the AND- and NOT-gate delays have mean values 5 and 3, respectively, and the same variance 1.) Suppose the clock period is 8ns. By Monte Carlo simulation, the timing yield of the circuit with all positive-edge-triggered D-type flip-flop (D-FF) registers is 33.19%; after replacing r2 with an active-high latch, the yield increases to 93.02%. An improvement of nearly 60 percentage points is thus achieved by replacing a single D-FF with a latch. Note that this replacement leaves the number of pipeline stages unchanged.

Given a design whose state-holding elements (i.e., registers) are implemented with edge-triggered D-FFs, we substitute level-sensitive latches for D-FFs such that timing yield is maximally improved. In addition, this substitution also enhances the tolerance to clock skew uncertainty, as is known in the timing community. Based on dynamic programming, we devise an optimal algorithm for pipelined circuits, and generalize it to arbitrary sequential circuits. The proposed method can be used for pre-layout optimization under a statistical model of design uncertainties. Moreover, because latches are smaller than D-FFs, the substitution is possible without affecting nearby circuit structures and thus can be performed even after physical design, when accurate timing information can be used. In contrast, yield improvement by gate sizing may invalidate prior physical design when devices are sized up, and thus may suffer from the design closure problem.

Why is latch substitution challenging? Firstly, statistical timing analysis for latch-based designs is itself tricky compared with that for combinational designs and D-FF based sequential designs [3]. Secondly, aside from the timing analysis issue, optimization must explore an exponential number of register configurations. Essentially, each register can be of type D (standing for a D-FF), H (an active-high latch), or L (an active-low latch). Thus, for a design with n registers, there are 3^n possible configurations, each of them requiring the above analysis to determine its timing yield. Despite these challenges, there exist effective approaches to the latch substitution problem.

We organize our explanation as follows. Section 2 gives some preliminaries of our models and the underlying timing analysis. Section 3 analyzes the effect of substituting latches for D-FFs, and formalizes our optimization objectives. Section 4 presents our algorithms, which are evaluated with experimental results in Section 5. Finally, concluding remarks are given in Section 6.
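To make the size of the configuration space concrete, the following minimal sketch enumerates every assignment of the types D, H, and L to n registers and counts them. The function names are hypothetical and are not part of the proposed method; the sketch only illustrates why exhaustive evaluation quickly becomes impractical.

```python
from itertools import product

# Register types as named in the text: D-FF, active-high latch, active-low latch.
REGISTER_TYPES = ("D", "H", "L")


def enumerate_configurations(n):
    """Return an iterator over all type assignments for n registers."""
    return product(REGISTER_TYPES, repeat=n)


def count_configurations(n):
    """Number of configurations, i.e. 3**n."""
    return len(REGISTER_TYPES) ** n


if __name__ == "__main__":
    # Brute-force enumeration is only feasible for very small n.
    for n in (2, 5, 10, 20):
        print(n, count_configurations(n))
```

Each configuration would still need its own statistical timing analysis, so the cost of a brute-force search is the count above multiplied by the cost of one yield evaluation.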

2 Preliminaries

2.1 Statistical Timing Models and Analysis

To simplify our exposition, we shall assume that gates are the main delay sources; wire delays can, however, be taken into account straightforwardly. Using the model of [15], global and local variations as well as correlations can be handled. By statistical static timing analysis (SSTA), the input-to-output delay distributions of a combinational block in a sequential circuit can be obtained. Thus we may compute the longest combinational-path delay distribution ∆(ri, rj) (resp. shortest combinational-path delay distribution δ(ri, rj)) from register ri to register rj by Gaussian-approximating max [2] (resp. min) and sum operations over Gaussian random variables.¹ While δ(ri, rj) is immaterial in combinational timing analysis, it is crucial in analyzing sequential circuits involving latches. Note that ∆(ri, rj) (similarly δ(ri, rj)) is not the distribution of a single fixed path; rather, it may probabilistically correspond to different paths.
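As an illustration of Gaussian-approximated max, min, and sum operations, the sketch below uses the classical moment-matching formulas for the maximum of two jointly Gaussian variables (often attributed to Clark). Whether this coincides exactly with the approximation of [2] is an assumption, and the function names and the correlation parameter rho are introduced here only for illustration.

```python
import math


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_sum(mu1, s1, mu2, s2, rho=0.0):
    """Mean and standard deviation of X1 + X2 (exact for joint Gaussians)."""
    return mu1 + mu2, math.sqrt(s1 * s1 + s2 * s2 + 2.0 * rho * s1 * s2)


def clark_max(mu1, s1, mu2, s2, rho=0.0):
    """Gaussian fit to max(X1, X2) by matching the first two moments."""
    a2 = s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2
    if a2 <= 0.0:
        # X1 - X2 is constant: the max is simply the variable with larger mean.
        return (mu1, s1) if mu1 >= mu2 else (mu2, s2)
    a = math.sqrt(a2)
    alpha = (mu1 - mu2) / a
    m1 = mu1 * _cdf(alpha) + mu2 * _cdf(-alpha) + a * _pdf(alpha)
    m2 = ((mu1 * mu1 + s1 * s1) * _cdf(alpha)
          + (mu2 * mu2 + s2 * s2) * _cdf(-alpha)
          + (mu1 + mu2) * a * _pdf(alpha))
    return m1, math.sqrt(max(m2 - m1 * m1, 0.0))


def clark_min(mu1, s1, mu2, s2, rho=0.0):
    """Gaussian fit to min(X1, X2), using min(X, Y) = -max(-X, -Y)."""
    m, s = clark_max(-mu1, s1, -mu2, s2, rho)
    return -m, s
```

Applying gaussian_sum along a path and clark_max (resp. clark_min) where paths reconverge gives a Gaussian estimate of the longest (resp. shortest) delay between two registers; the result is an approximation because the max of Gaussians is not itself Gaussian.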

¹ For circuits with pure D-FF registers, analyzing register-to-register delays may seem unnecessary; in fact, computing the longest delay of every combinational block is enough. However, for circuits containing latches, computing register-to-register delays is necessary because the transparency of latches makes combinational blocks not well separable for timing analysis.

2.2 Timing Yield of Sequential Circuits

Fig. 2. A single-path pipelined circuit (registers r0, r1, r2 and combinational blocks C1, C2) and timing diagrams over the clock period T = T_H + T_L. (a) type(r0) = type(r1) = type(r2) = D; (b) type(r1) = H and type(r0) = type(r2) = D; (c) type(r1) = L and type(r0) = type(r2) = D.

Let T = T_H + T_L be the clock period with high interval T_H and low interval T_L. Given a design with some target operation speed, its timing yield is the probability that no violation occurs with respect to timing constraints; see, e.g., [3]. In the simplest case, when a circuit is implemented with D-FFs for all of its registers, its timing yield is the probability

\[
\Pr\Bigl[\bigwedge_{(r_i, r_j)} \bigl(\Delta(r_i, r_j) \le T\bigr)\Bigr], \tag{1}
\]

where the conjunction ranges over every register pair (ri, rj) with a combinational path from ri to rj. For example, in Figure 2 (a), where registers r0, r1, r2 are all of type D, the yield is

\[
\Pr\bigl[(\Delta(r_0, r_1) \le T) \wedge (\Delta(r_1, r_2) \le T)\bigr]. \tag{2}
\]
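As a numerical companion to the all-D-FF yield expressions, the following sketch evaluates the yield in closed form under two simplifying assumptions that the text does not make in general: each stage delay ∆ is Gaussian, and the stage delays are mutually independent (so the joint probability factors into a product). The function names and the sample numbers are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Pr[X <= x] for X ~ N(mu, sigma^2); sigma == 0 means a fixed delay."""
    if sigma == 0.0:
        return 1.0 if x >= mu else 0.0
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def dff_pipeline_yield(stage_delays, T):
    """Yield of an all-D-FF pipeline, assuming independent Gaussian stages.

    stage_delays: list of (mean, std) pairs, one per register-to-register
    longest delay. Returns the product of Pr[delay <= T] over all stages.
    """
    y = 1.0
    for mu, sigma in stage_delays:
        y *= normal_cdf(T, mu, sigma)
    return y


if __name__ == "__main__":
    # Two stages whose mean delay equals the clock period: each passes with
    # probability 0.5, so the pipeline yield is 0.25 under independence.
    print(dff_pipeline_yield([(8.0, 1.0), (8.0, 2.0)], T=8.0))
```

With correlated stage delays the product form no longer holds, and the joint probability would have to be estimated, for example by Monte Carlo sampling from the correlated model.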

3 Timing Yield and Register Configuration

3.1 Timing Yield Changed by Latch Replacement

We study the effects of substituting latches for D-FFs. To begin with, consider the single-path pipelined circuit of Figure 2. Intuitively, an active-high latch can tolerate a longer delay of its fan-in combinational block than a D-FF can. If the type of r1 is changed to H as shown in Figure 2 (b), the longest delay of combinational block C1 can exceed T. For the circuit to operate without any timing violation, essentially four cases need to be analyzed depending on ∆(r0, r1):


case 1 (0 ≤ ∆(r0, r1) < T_H): The signal of C1 arrives at r1 within the active interval and can pass directly to C2; so T < ∆(r0, r1) + ∆(r1, r2) ≤ 2T must hold. In addition, C2 must satisfy T < δ(r0, r1) + δ(r1, r2) ≤ 2T for r2 to latch the right value.

case 2 (T_H ≤ ∆(r0, r1) < T): The signal of C1 arrives at r1 before r1 is turned on; so it must wait until r1 becomes active again at T. C2 must satisfy ∆(r1, r2) ≤ T. In addition, C1 must satisfy δ(r0, r1) > T_H; otherwise the earliest and latest signals of C1 arrive at C2 in different clock cycles.

case 3 (T ≤ ∆(r0, r1) < T + T_H): The signal of C1 arrives within the active interval of r1 and can pass directly to C2; so T < ∆(r0, r1) + ∆(r1, r2) ≤ 2T must hold. Also, δ(r0, r1) > T_H must hold for the same reason as in case 2.

case 4 (T + T_H ≤ ∆(r0, r1) < 2T): The signal of C1 cannot pass through r1 within 2T; so this case is forbidden.
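The four intervals above partition the range [0, 2T) of ∆(r0, r1) when r1 is an active-high latch. A minimal sketch of this classification for a single sampled delay value follows; the function name and the choice of returning None outside [0, 2T) are illustrative assumptions.

```python
def classify_case(delta_01, T, TH):
    """Return the case number (1 to 4) for a sampled longest delay of C1.

    delta_01: sampled value of the longest delay from r0 to r1.
    T, TH:    clock period and its high interval, with 0 < TH < T.
    Returns None if the delay lies outside [0, 2T).
    """
    if not (0 < TH < T):
        raise ValueError("expected 0 < TH < T")
    if 0 <= delta_01 < TH:
        return 1  # arrives while r1 is transparent in the first cycle
    if TH <= delta_01 < T:
        return 2  # arrives while r1 is closed; waits until time T
    if T <= delta_01 < T + TH:
        return 3  # arrives while r1 is transparent in the second cycle
    if T + TH <= delta_01 < 2 * T:
        return 4  # cannot pass through r1 within 2T
    return None


if __name__ == "__main__":
    for d in (3.0, 6.0, 9.0, 13.0):
        print(d, classify_case(d, T=8.0, TH=4.0))
```

The classification covers only the interval test on ∆(r0, r1); the additional constraints on δ(r0, r1) and ∆(r1, r2) stated in each case must be checked separately.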

Although case 1 incurs no timing violation in this example, it is problematic if r1 has a designated initial value (which will be erased) or if r1 fans out to a primary output, since the number of pipeline stages seen from the output then differs. We exclude it from our yield calculation and consider only the legal cases 2 and 3. For these two cases, the delay between r0 and r1 is restricted to T_H ≤ ∆(r0, r1) < T + T_H and δ(r0, r1) > T_H, while the delay between r1 and r2 is restricted to max{∆(r0, r1), T} + ∆(r1, r2) ≤ 2T, where max{∆(r0, r1), T} equals T in case 2 and ∆(r0, r1) in case 3. Thus, the yield equals

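\begin{align*}
\Pr[\text{case 2}] + \Pr[\text{case 3}]
&= \Pr\bigl[(T_H \le \Delta(r_0, r_1) < T) \wedge (\delta(r_0, r_1) > T_H) \wedge (\Delta(r_1, r_2) \le T)\bigr] \\
&\quad + \Pr\bigl[(T \le \Delta(r_0, r_1) < T + T_H) \wedge (\delta(r_0, r_1) > T_H) \wedge (T < \Delta(r_0, r_1) + \Delta(r_1, r_2) \le 2T)\bigr] \tag{3} \\
&= \Pr\bigl[(T_H \le \Delta(r_0, r_1) < T + T_H) \wedge (\delta(r_0, r_1) > T_H) \wedge (\max\{\Delta(r_0, r_1), T\} + \Delta(r_1, r_2) \le 2T)\bigr]. \tag{4}
\end{align*}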
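The combined condition for type(r1) = H can be checked sample by sample, which gives a simple Monte Carlo estimate of the yield. The sketch below is only illustrative: it assumes independent Gaussian delays and a single-path circuit as in Figure 2, so that the shortest delay δ(r0, r1) equals the longest delay ∆(r0, r1). The function names, the sample count, and the seed are hypothetical choices.

```python
import random


def h_latch_ok(D01, d01, D12, T, TH):
    """True if one delay sample satisfies the combined case 2 / case 3 condition.

    D01, d01: longest and shortest delay from r0 to r1.
    D12:      longest delay from r1 to r2.
    """
    return (TH <= D01 < T + TH) and (d01 > TH) and (max(D01, T) + D12 <= 2 * T)


def estimate_yield(mu01, s01, mu12, s12, T, TH, n=100_000, seed=0):
    """Monte Carlo yield estimate for the circuit of Figure 2 (b).

    Assumes independent Gaussian stage delays and a single path per stage,
    so the shortest delay equals the longest one (an assumption of this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        D01 = rng.gauss(mu01, s01)
        d01 = D01  # single path: shortest delay equals longest delay
        D12 = rng.gauss(mu12, s12)
        if h_latch_ok(D01, d01, D12, T, TH):
            hits += 1
    return hits / n


if __name__ == "__main__":
    # Hypothetical numbers: C1 is slower than the clock period on average,
    # C2 is faster, so the latch lets C1 borrow time from C2.
    print(estimate_yield(9.0, 1.0, 5.0, 1.0, T=8.0, TH=4.0))
```

A sample with D01 = 9, D12 = 6, T = 8, and TH = 4 passes because max{9, 8} + 6 = 15 ≤ 16, whereas the same stage would violate the all-D-FF constraint ∆(r0, r1) ≤ T.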

Similarly, if the type of r1 is changed to L as shown in Figure 2 (c), four cases analogous to the above need to be analyzed depending on ∆(r0, r1); we omit them due to limited space. This analysis forms the basis of our yield calculation. It extends to the analysis of pipeline circuits since every pair of adjacent registers can be transformed into a circuit as in Figure 2.

In computing the timing yield of a pipeline circuit, the timing constraints of a combinational block depend on the types of its preceding registers, which leads to complex computation, especially for latches. Due to the transparency of latches, delay distributions need to be propagated across latches. For example, ∆(r0, r1) is needed in Equation (4) in calculating the yield between registers r1 and r2. (For D-FF based designs, there is no need to propagate distributions across register boundaries since the output of a D-FF has zero arrival time.) To resolve this complication, we shift the delay distribution of a combinational block to make the equations for the three types of registers identical. That is, we modify the delay distribution of a register input and pass it as a slack to