A Statistical Approach to the Timing-Yield
Optimization of Pipeline Circuits
Chin-Hsiung Hsu†, Szu-Jui Chou†, Jie-Hong R. Jiang§†, and Yao-Wen Chang§†
Department of Electrical Engineering§/Graduate Institute of Electronics Engineering†
National Taiwan University, Taipei 10617, Taiwan
{arious, rerechou}@eda.ee.ntu.edu.tw; {jhjiang, ywchang}@cc.ee.ntu.edu.tw
Abstract. The continuous miniaturization of semiconductor devices imposes serious threats to design robustness against process variations and environmental fluctuations. Modern circuit designs may suffer from design uncertainties that are unpredictable in the design phase or even after manufacturing. This paper presents an optimization technique that makes pipeline circuits robust against delay variations and thus maximizes timing yield. By trading larger flip-flops for smaller latches, the proposed approach can be used as a post-synthesis or post-layout optimization tool, so that accurate timing information is available. Experimental results show an average timing yield improvement of 31% for pipeline circuits. They suggest that our method is promising for high-speed designs and is capable of tolerating clock variations.
1 Introduction
As semiconductor fabrication technology advances into the sub-100nm feature size regime, the sensitivity of IC designs to process variations and environmental fluctuations is ever increasing. To maintain design robustness against these uncertainties, it becomes more and more apparent that traditional design methodologies need to be modified to consider variations at the early stages of a design flow, since not all process variations can be diminished by technology advances.
In recent years, statistical approaches to circuit analysis and optimization have been revolutionizing the EDA community. They are mostly centered around delay and power, the two main concerns affected by design uncertainties. In this paper, we focus on timing. Traditional approaches to timing optimization were based on worst-case analysis. For instance, any gate delay under a certain operating condition may be set to a deterministic value fixed at the 3σ point to ensure enough margin for tolerating variations. However, worst-case analysis is too conservative, especially under increasingly stringent timing constraints. Furthermore, as designs become more sensitive to process variations, it is harder to make a design safe under worst-case variations. Due to the inadequacy of traditional worst-case analysis, the need for statistical analysis has emerged and has attracted intensive research efforts. As statistical analysis matures, statistical optimization is the next step.
Fig. 1. A motivating example for timing yield improvement by replacing DFFs with latches. (Circuit diagram with registers r0 to r4.)
Based on statistical timing analysis, most existing statistical optimization approaches focus on gate sizing, e.g., [4–6,11], and clock skew scheduling, e.g., [1,7,10,14]. In contrast, we propose a new statistical optimization methodology that is orthogonal and complementary to gate sizing and can possibly be combined with clock scheduling for further improvement. We take advantage of the transparency of level-sensitive latches to tolerate delay uncertainties. There were prior efforts on the tradeoff between flip-flops and latches in other optimization contexts. For instance, flip-flops may be replaced with latches to optimize storage [16] or power [8]. However, to the best of our knowledge, no prior work optimizes timing yield in the statistical domain.
Consider Figure 1 for a motivating example. In the circuit, assume that the delays (in nanoseconds) of an AND gate, a NOT gate, and a wire follow the normal distributions N(5,1), N(3,1), and N(0,0), respectively. (That is, we neglect the wire delay and assume that the AND-gate and NOT-gate delays have mean values 5 and 3, respectively, and the same variance 1.) Suppose the clock period is 8ns. By Monte Carlo simulation, the timing yield of the circuit with all positive-edge-triggered D-type flip-flop (DFF) registers is 33.19%; after replacing $r_2$ with an active-high latch, the yield increases to 93.02%. An improvement of nearly 60 percentage points is achieved by replacing a single DFF with a latch. Note that this replacement leaves the number of pipeline stages unchanged.
Given a design whose state-holding elements (i.e., registers) are implemented with edge-triggered DFFs, we substitute level-sensitive latches for DFFs such that timing yield is maximally improved. This substitution also enhances the tolerance to clock skew uncertainty, as is known in the timing community. Based on dynamic programming, we devise an optimal algorithm for pipelined circuits and generalize it to arbitrary sequential circuits. The proposed method can be used for pre-layout optimization under a statistical model of design uncertainties. Moreover, because latches are smaller than DFFs, the substitution is possible without affecting nearby circuit structures and thus can be performed even after physical design, when accurate timing information is available. In contrast, yield improvement by gate sizing may invalidate prior physical design when devices are sized up, and thus may suffer from the design closure problem.
Why is latch substitution challenging? Firstly, statistical timing analysis for latch-based designs is itself tricky compared with that for combinational designs and DFF-based sequential designs [3]. Secondly, aside from the timing analysis issue, optimization must explore an exponential number of register configurations. Essentially, each register can be of type D (a DFF), H (an active-high latch), or L (an active-low latch). Thus, for a design with $n$ registers, there are $3^n$ possible configurations, each of which requires the above analysis to determine its timing yield. Despite these challenges, there exist effective approaches to the latch substitution problem. The rest of this paper is organized as follows. Section 2 gives some preliminaries of our models and the underlying timing analysis. Section 3 analyzes the effect of substituting latches for DFFs and formalizes our optimization objectives. Section 4 presents our algorithms, which are evaluated with experimental results in Section 5. Finally, concluding remarks are given in Section 6.
2 Preliminaries
2.1 Statistical Timing Models and Analysis
To simplify our exposition, we assume that gates are the main delay sources; wire delays can be taken into account straightforwardly. Using the model of [15], global and local variations as well as correlations can be handled. By statistical static timing analysis (SSTA), the input-to-output delay distributions of a combinational block in a sequential circuit can be obtained. Thus we may compute the longest combinational-path delay distribution $\Delta(r_i, r_j)$ (resp. the shortest combinational-path delay distribution $\delta(r_i, r_j)$) from register $r_i$ to register $r_j$ by Gaussian-approximating max [2] (resp. min) and sum operations over Gaussian random variables.¹ While $\delta(r_i, r_j)$ is immaterial in combinational timing analysis, it is crucial in analyzing sequential circuits involving latches. Note that $\Delta(r_i, r_j)$ (and similarly $\delta(r_i, r_j)$) is not the distribution of a single fixed path; rather, it may probabilistically correspond to different paths.
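The Gaussian-approximating max operation can be illustrated with the moment-matching formulas of Clark [2]. The Python sketch below is only an illustration under stated assumptions: the function names (`clark_max`, `clark_min`, `gaussian_sum`) and the explicit correlation parameter `rho` are hypothetical and are not taken from the authors' SSTA implementation or from the model of [15].

```python
import math

def _phi(x: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def clark_max(mu1, var1, mu2, var2, rho=0.0):
    """Mean and variance of max(X1, X2) for jointly Gaussian X1, X2.

    rho is the correlation coefficient of X1 and X2 (assumed known).
    The result is then treated as a Gaussian with these two moments.
    """
    a2 = var1 + var2 - 2.0 * rho * math.sqrt(var1 * var2)
    if a2 <= 0.0:
        # X1 - X2 is deterministic, so the max is the variable with the larger mean.
        return (mu1, var1) if mu1 >= mu2 else (mu2, var2)
    a = math.sqrt(a2)
    alpha = (mu1 - mu2) / a
    first = mu1 * _Phi(alpha) + mu2 * _Phi(-alpha) + a * _phi(alpha)
    second = ((mu1 * mu1 + var1) * _Phi(alpha)
              + (mu2 * mu2 + var2) * _Phi(-alpha)
              + (mu1 + mu2) * a * _phi(alpha))
    return first, second - first * first

def clark_min(mu1, var1, mu2, var2, rho=0.0):
    """min(X1, X2) = -max(-X1, -X2); negating both keeps the correlation."""
    mean, var = clark_max(-mu1, var1, -mu2, var2, rho)
    return -mean, var

def gaussian_sum(mu1, var1, mu2, var2, rho=0.0):
    """Exact mean and variance of X1 + X2."""
    return mu1 + mu2, var1 + var2 + 2.0 * rho * math.sqrt(var1 * var2)
```

Applying `gaussian_sum` along a path and `clark_max` (resp. `clark_min`) where paths reconverge gives a Gaussian estimate of the longest (resp. shortest) register-to-register delay; the max of two Gaussians is not itself Gaussian, so this is an approximation.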
2.2 Timing Yield of Sequential Circuits
Let $T = T_H + T_L$ be the clock period with high interval $T_H$ and low interval $T_L$. Given a design with some target operation speed, its timing yield is the probability that no violation occurs with respect to timing constraints; see, e.g., [3]. In the simplest case, when a circuit is implemented with DFFs for all of its registers, its timing yield is the probability

$$\Pr\Big[\bigwedge_{(r_i, r_j)} \big(\Delta(r_i, r_j) \le T\big)\Big], \tag{1}$$

where the conjunction ranges over every register pair $(r_i, r_j)$ with a combinational path from $r_i$ to $r_j$. For example, in Figure 2 (a), where registers $r_0$, $r_1$, $r_2$ are of type D, the yield is

$$\Pr\big[(\Delta(r_0, r_1) \le T) \wedge (\Delta(r_1, r_2) \le T)\big]. \tag{2}$$

Fig. 2. A single-path pipelined circuit and timing diagrams. (a) type(r0) = type(r1) = type(r2) = D; (b) type(r1) = H and type(r0) = type(r2) = D; (c) type(r1) = L and type(r0) = type(r2) = D. (The diagram shows registers r0, r1, r2, combinational blocks C1 and C2 with Delay(r0, r1) and Delay(r1, r2), the clock intervals T, TH, TL, and the active interval of r1.)

¹ For circuits with pure DFF registers, analyzing register-to-register delays may seem far from necessary. In fact, computing the longest delay of every combinational block is enough. However, for circuits containing latches, computing register-to-register delays is necessary because the transparency of latches makes combinational blocks not well separable for timing analysis.
3 Timing Yield and Register Configuration
3.1 Timing Yield Changed by Latch Replacement
We study the effects of substituting latches for DFFs. To begin with, consider the single-path pipelined circuit of Figure 2. Intuitively, an active-high latch can tolerate a longer delay of its fan-in combinational block than a DFF can. If the type of $r_1$ is changed to H as shown in Figure 2 (b), the longest delay of combinational block $C_1$ can exceed $T$. For the circuit to operate without any timing violation, essentially four cases need to be analyzed depending on $\Delta(r_0, r_1)$:
case 1 $0 \le \Delta(r_0, r_1) < T_H$: The signal of $C_1$ arrives at $r_1$ within the active interval and passes directly to $C_2$; so $T < \Delta(r_0, r_1) + \Delta(r_1, r_2) \le 2T$ must hold. In addition, $C_2$ must satisfy $T < \delta(r_0, r_1) + \delta(r_1, r_2) \le 2T$ for $r_2$ to latch the right value.
case 2 $T_H \le \Delta(r_0, r_1) < T$: The signal of $C_1$ arrives at $r_1$ before $r_1$ is turned on; so it must wait until $r_1$ becomes active again at $T$. $C_2$ must satisfy $\Delta(r_1, r_2) \le T$. In addition, $C_1$ must satisfy $\delta(r_0, r_1) > T_H$; otherwise the earliest and latest signals of $C_1$ arrive at $C_2$ in different clock cycles.
case 3 $T \le \Delta(r_0, r_1) < T + T_H$: The signal of $C_1$ arrives within the active interval of $r_1$ and passes directly to $C_2$; so $T < \Delta(r_0, r_1) + \Delta(r_1, r_2) \le 2T$ must hold. Also, $\delta(r_0, r_1) > T_H$ must hold for the same reason as in case 2.
case 4 $T + T_H \le \Delta(r_0, r_1) < 2T$: The signal of $C_1$ cannot pass through $r_1$ within $2T$; so this case is forbidden.
Although case 1 incurs no timing violation in this example, it is problematic if $r_1$ has a designated initial value (which will be erased) or if $r_1$ fans out to a primary output, since the number of pipeline stages seen from the output then differs. We exclude it from our yield calculation and consider only the legal cases 2 and 3. For these two cases, the delay between $r_0$ and $r_1$ is restricted to $T_H \le \Delta(r_0, r_1) < T + T_H$ and $\delta(r_0, r_1) > T_H$, while the delay between $r_1$ and $r_2$ is restricted to $\max\{\Delta(r_0, r_1), T\} + \Delta(r_1, r_2) \le 2T$, where $\max\{\Delta(r_0, r_1), T\}$ equals $T$ in case 2 and $\Delta(r_0, r_1)$ in case 3. Thus, the yield equals

$$
\begin{aligned}
&\Pr[\text{case 2}] + \Pr[\text{case 3}] \\
&= \Pr\big[(T_H \le \Delta(r_0, r_1) < T) \wedge (\delta(r_0, r_1) > T_H) \wedge (\Delta(r_1, r_2) \le T)\big] \\
&\quad + \Pr\big[(T \le \Delta(r_0, r_1) < T + T_H) \wedge (\delta(r_0, r_1) > T_H) \wedge (T < \Delta(r_0, r_1) + \Delta(r_1, r_2) \le 2T)\big] && (3) \\
&= \Pr\big[(T_H \le \Delta(r_0, r_1) < T + T_H) \wedge (\delta(r_0, r_1) > T_H) \wedge (\max\{\Delta(r_0, r_1), T\} + \Delta(r_1, r_2) \le 2T)\big]. && (4)
\end{aligned}
$$
In contrast, if the type of $r_1$ is changed to L as shown in Figure 2 (c), four cases similar to the above need to be analyzed depending on $\Delta(r_0, r_1)$, which we omit due to limited space. This analysis forms the basis of our yield calculation. It extends to general pipeline circuits since every pair of adjacent registers can be transformed into a circuit as in Figure 2.
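As a rough illustration of how the conditions above can be checked by Monte Carlo simulation, the following Python sketch samples the delays of the three-register circuit of Figure 2 and tests two predicates: the all-DFF condition (both longest delays at most $T$) and the combined condition of cases 2 and 3 for an active-high latch at $r_1$. The function names, the independent Gaussian delay model, and the way the shortest delay is sampled and clipped are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import random

def ok_all_dff(d01, d12, T):
    """All registers are DFFs: both longest delays must fit in one period."""
    return d01 <= T and d12 <= T

def ok_r1_active_high(d01, s01, d12, T, TH):
    """r1 is an active-high latch: legal cases 2 and 3 combined.

    d01, d12 are longest delays of C1 and C2; s01 is the shortest delay of C1.
    """
    return (TH <= d01 < T + TH) and (s01 > TH) and (max(d01, T) + d12 <= 2 * T)

def estimate_yields(T, TH, long01, short01, long12, sigma,
                    trials=100_000, seed=0):
    """Estimate (all-DFF yield, latch-at-r1 yield) by sampling.

    Assumption: delays are independent Gaussians with common standard
    deviation sigma; the shortest delay is clipped so it never exceeds
    the longest delay of the same block.
    """
    rng = random.Random(seed)
    dff_hits = latch_hits = 0
    for _ in range(trials):
        d01 = rng.gauss(long01, sigma)
        s01 = min(d01, rng.gauss(short01, sigma))
        d12 = rng.gauss(long12, sigma)
        dff_hits += ok_all_dff(d01, d12, T)
        latch_hits += ok_r1_active_high(d01, s01, d12, T, TH)
    return dff_hits / trials, latch_hits / trials
```

A call such as `estimate_yields(T=8.0, TH=4.0, long01=8.5, short01=6.0, long12=6.0, sigma=1.0)` (hypothetical numbers) returns the two estimated yields for comparison; a real flow would sample correlated gate delays from the circuit rather than block delays directly.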
In computing the timing yield of a pipeline circuit, the timing constraints of a combinational block depend on the types of its preceding registers, which leads to complex computation, especially for latches. Due to the transparency of latches, delay distributions need to be propagated across latches. For example, $\Delta(r_0, r_1)$ is needed in Equation (4), the combined expression for cases 2 and 3, in calculating the yield between registers $r_1$ and $r_2$. (For DFF-based designs, there is no need to propagate distributions across register boundaries since the output of a DFF has zero arrival time.) To resolve this complication, we shift the delay distribution of a combinational block to make the equations for the three types of registers identical. That is, we modify the delay distribution of a register input and pass it as a slack to the fanout blocks. Thereby we may propagate probability distributions across latches. Precisely speaking, for active-high latches, by defining

$$\Delta_{\mathrm{shift}}(r_0, r_1) \equiv \Delta(r_0, r_1) - T_H, \qquad \delta_{\mathrm{shift}}(r_0, r_1) \equiv \delta(r_0, r_1) - T_H,$$
$$\Delta_{\mathrm{shift}}(r_1, r_2) \equiv \max\{\Delta(r_0, r_1) - T, 0\} + \Delta(r_1, r_2), \qquad \delta_{\mathrm{shift}}(r_1, r_2) \equiv \delta(r_1, r_2),$$

Equation (4) can be rewritten as

$$
\begin{aligned}
&\Pr[\text{case 2}] + \Pr[\text{case 3}] \\
&= \Pr\big[(\Delta_{\mathrm{shift}}(r_0, r_1) < T) \wedge (\delta_{\mathrm{shift}}(r_0, r_1) > 0) \wedge (\Delta_{\mathrm{shift}}(r_1, r_2) < T) \wedge (\delta_{\mathrm{shift}}(r_1, r_2) > 0)\big]. && (5)
\end{aligned}
$$

For active-low latches, a similar rewriting is available, which we omit due to limited space. For DFFs, on the other hand, no shifting is needed.
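As a quick check of the rewriting for active-high latches, each shifted constraint can be unfolded back into the corresponding original constraint. The derivation below is a sketch that assumes $\delta(r_0, r_1) \le \Delta(r_0, r_1)$ holds sample by sample and ignores the difference between strict and non-strict inequalities, which has probability zero for continuous delay distributions.

```latex
\begin{align*}
\Delta_{\mathrm{shift}}(r_0,r_1) < T
  &\iff \Delta(r_0,r_1) - T_H < T
   \iff \Delta(r_0,r_1) < T + T_H, \\
\delta_{\mathrm{shift}}(r_0,r_1) > 0
  &\iff \delta(r_0,r_1) > T_H
   \;\Longrightarrow\; \Delta(r_0,r_1) \ge \delta(r_0,r_1) > T_H, \\
\Delta_{\mathrm{shift}}(r_1,r_2) < T
  &\iff \max\{\Delta(r_0,r_1) - T,\, 0\} + \Delta(r_1,r_2) < T
   \iff \max\{\Delta(r_0,r_1),\, T\} + \Delta(r_1,r_2) < 2T .
\end{align*}
```

Under these assumptions, the lower bound $T_H \le \Delta(r_0, r_1)$ of cases 2 and 3 is implied by the shortest-delay constraint on $C_1$, and the last step uses $\max\{a - T, 0\} + T = \max\{a, T\}$.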
With the distributions shifted as above, all longest-delay constraints are compared with $T$ and all shortest-delay constraints are compared with 0, as in Equation (5), the rewritten form of Equation (4). Finally, for the register pairs $r_i$ and $r_j$ connected by a combinational block under analysis, we perform the max operation over $\{\Delta_{\mathrm{shift}}(r_i, r_j)\}$ and the min operation over $\{\delta_{\mathrm{shift}}(r_i, r_j)\}$, and obtain the probability that the combinational block has no timing violation as

$$\Pr\big[(\max\{\Delta_{\mathrm{shift}}(r_i, r_j)\} < T) \wedge (\min\{\delta_{\mathrm{shift}}(r_i, r_j)\} > 0)\big].$$
3.2 Problem Formulation
Definition 1. Let $R$ be a nonempty set of registers of a sequential circuit. A register configuration of $R$ is a total function $\rho : R \to \{D, H, L\}$.
DFFs are the most common implementation of state-holding elements of sequential circuits due to their simple edge-triggered timing constraints. We assume that a given design is initially in DFF implementation. By changing the initial register configuration, a circuit can be made less sensitive to timing variations while maintaining its behavior. Essentially, the pipeline stages should be the same before and after modifying the register configuration. Therefore, no two latches of the same type can be connected by a combinational path. Furthermore, even two latches of different types cannot be connected by a combinational path, because the number of pipeline stages would decrease unless the total number of registers increased. Hence we require that the fan-in and fan-out registers of a latch be of type D. (Note that a positive-edge-triggered DFF can be decomposed into an active-low latch followed by an active-high latch. So it is possible to maintain pipeline stages by increasing the register count, which we disallow in this paper.)
The optimization problem can be stated as follows.
Yield optimization problem: Given a sequential circuit with $\rho(r) = D$ for every register $r$, and the distributions of its gate and wire delays, find the register configuration such that timing yield is maximally improved subject to the above replacement criterion.
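The replacement criterion amounts to requiring that no combinational path connects two latches. A minimal Python sketch of this check, and of a brute-force enumeration of the admissible configurations, is given below; the names `is_valid` and `valid_configurations` and the representation of the circuit as a list of register-to-register edges are assumptions for illustration only.

```python
from itertools import product

def is_valid(edges, config):
    """True if every latch has only DFFs as fan-in and fan-out registers.

    edges: iterable of (ri, rj) pairs, one per combinational path ri -> rj.
    config: dict mapping each register name to "D", "H", or "L".
    """
    return all(config[u] == "D" or config[v] == "D" for u, v in edges)

def valid_configurations(registers, edges, fixed_d=()):
    """Enumerate all admissible configurations (exponential; small examples only).

    fixed_d lists registers that must stay DFFs, e.g. boundary registers.
    """
    for types in product("DHL", repeat=len(registers)):
        config = dict(zip(registers, types))
        if all(config[r] == "D" for r in fixed_d) and is_valid(edges, config):
            yield config
```

For the three-register chain of Figure 2 with both end registers fixed to D, the enumeration leaves exactly the three configurations of $r_1$ discussed in Section 3.1.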
Fig. 3. The flowchart of statistical latch replacement. (Flow: Start; circuit and library; graph conversion; acyclic graph? If no, cycle breaking; if yes, statistical dynamic programming; register configuration; Monte Carlo justification; estimated yield improvement; End.)
4 Statistical Latch Replacement
4.1 Optimization Flow Overview
The flow of our algorithm is shown in Figure 3. Firstly, the input circuit is converted to a register dependency graph, with statistical timing models and analysis used to abstract the essential timing information. Secondly, the register dependency graph is made acyclic by breaking all cycles with respect to a chosen minimal feedback vertex set. Thirdly, the resultant acyclic graph is levelized in topological order from inputs to outputs. Fourthly, our statistical dynamic programming algorithm is run forward over the levelized acyclic graph. The optimal configuration can then be derived by tracing backward from outputs to inputs. Finally, Monte Carlo simulation can optionally be applied to justify the yield improvement.
4.2 Statistical Dynamic Programming
We abstract a given input circuit $C$ as a register dependency graph $G = (V, E)$, where a vertex $v_i \in V$ represents a register $r_i$ in $C$ and there is a directed edge $(v_i, v_j) \in E$ if and only if there is a combinational path from $r_i$ to $r_j$ in $C$. Also, the register-to-register distributions $\Delta(r_i, r_j)$ and $\delta(r_i, r_j)$ are computed according to the delay distributions of $C$ and are associated with the corresponding edge $(v_i, v_j) \in E$. If a circuit has feedback, there will be cycles in the converted graph. In order to levelize the register dependency graph, we break all cycles by finding a minimal feedback vertex set (FVS) [9]. After making the register dependency graph acyclic, we levelize it in topological order such that each vertex is labelled with its longest distance from an input vertex. Given a levelized acyclic register dependency graph, we derive a register configuration with maximal timing yield by the statistical dynamic programming algorithm outlined in Figure 4.
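The levelization step can be sketched as a longest-path labelling over a topological order. The Python code below is a minimal illustration, assuming the graph is given as a vertex list and an edge list and that input vertices (those without incoming edges) receive level 1; the function name `levelize` is hypothetical, and cycle breaking by a feedback vertex set [9] is assumed to have been done beforehand.

```python
from collections import defaultdict, deque

def levelize(vertices, edges):
    """Label each vertex with 1 + its longest distance from an input vertex.

    Raises ValueError if the graph still contains a cycle.
    """
    successors = defaultdict(list)
    indegree = {v: 0 for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Input vertices have no incoming edges and form level 1.
    level = {v: 1 for v in vertices if indegree[v] == 0}
    queue = deque(level)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            # Keep the longest distance seen so far over all fan-in vertices.
            level[v] = max(level.get(v, 1), level[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if visited != len(indegree):
        raise ValueError("graph has a cycle; break cycles before levelizing")
    return level
```

Registers with equal labels form the sets processed level by level in the dynamic program.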
Algorithm: StatisticalDynamicProgramming
Input: levelized register dependency graph G = (V, E) and delay distributions on E
Output: optimal register configuration for yield
begin
  set level-1 registers to DFFs with local yield 1
  ℓ := LevelCount(G)
  for i = 2, ..., ℓ
    let R_i be the set of registers at level i
    for every register configuration α of R_i
      compute the highest local yield Y_α of α subject to the configurations of R_{i−1} and their local yields
      record the config. of R_{i−1} responsible for Y_α
  set R_ℓ to the config. β_ℓ of all DFFs
  for i := ℓ − 1, ℓ − 2, ..., 2
    set R_i to the config. β_i responsible for β_{i+1}
  return β's
end
Fig. 4. The Statistical Dynamic Programming Algorithm.
We add artificial DFFs at the primary inputs and outputs when converting a circuit to a register dependency graph. Hence we set the level-1 and level-$\ell$ registers to be of type D, where $\ell$ is the number of levels in the levelized register dependency graph. In addition, we define the local yield of a register to be the accumulated yield computed forward from the level-1 registers, each of which has local yield 1. The statistical dynamic programming algorithm computes and stores the optimal configurations and the corresponding local yields in a forward direction based on the timing analysis introduced in Section 3.
Take a single-path pipelined circuit as an example. The statistical dynamic programming algorithm proceeds in two phases as shown in Figure 5 (a) and (b). In the first phase, the three configurations {D, H, L} are considered for each register in a forward direction. Since we require the fan-in and fan-out registers of a latch to be of type D, only a subset of the pairs of consecutive configurations needs to be considered, as indicated by the arrows of Figure 5 (a). At each level, the maximal local yield is kept for each configuration in {D, H, L}. Once the final level is reached, the algorithm enters the second phase, which extracts the optimal configuration for each register by tracing backward.
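The two phases for a single-path pipeline can be sketched as follows. This is a simplified illustration, not the authors' implementation: the callback `stage_yield(i, prev, cur)` is a hypothetical stand-in for the statistical analysis of Section 3, and multiplying its values treats the stages as independent, whereas the actual method propagates shifted delay distributions across latches.

```python
TYPES = ("D", "H", "L")

def feasible(prev, cur):
    """A latch must have DFFs as its fan-in and fan-out registers."""
    return prev == "D" or cur == "D"

def best_configuration(n, stage_yield):
    """Choose register types for a chain of n registers (indices 0..n-1).

    stage_yield(i, prev, cur) is a hypothetical score in [0, 1] for the stage
    ending at register i when register i-1 has type prev and register i has
    type cur. The first and last registers are fixed to DFFs.
    Returns (list of types, accumulated yield of the best configuration).
    """
    if n < 2:
        raise ValueError("need at least two registers")
    best = [{"D": 1.0}]   # local yields per type; register 0 is a DFF with yield 1
    back = [{}]           # back[i][t]: type of register i-1 responsible for best[i][t]

    # Phase 1: forward yield calculation, keeping the best local yield per type.
    for i in range(1, n):
        allowed = ("D",) if i == n - 1 else TYPES
        cur_best, cur_back = {}, {}
        for cur in allowed:
            for prev, y_prev in best[i - 1].items():
                if not feasible(prev, cur):
                    continue
                y = y_prev * stage_yield(i, prev, cur)
                if cur not in cur_best or y > cur_best[cur]:
                    cur_best[cur], cur_back[cur] = y, prev
        best.append(cur_best)
        back.append(cur_back)

    # Phase 2: backward tracing of the optimal configuration.
    config = ["D"] * n
    for i in range(n - 1, 0, -1):
        config[i - 1] = back[i][config[i]]
    return config, best[n - 1]["D"]
```

Because only the best local yield per type is kept at each register, the work grows linearly with the chain length instead of enumerating all $3^n$ configurations.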
For a register dependency graph with large pipeline widths, the above algorithm becomes inefficient (in fact, its complexity is exponential in the pipeline width) since it considers all possible configurations for the registers at each level. We alleviate this problem by greedily optimizing one register at a time without considering the configurations of other registers at the same level. Thus, we may need to handle the consistency problem for conflicting register type assignments. Note that because we only consider one register at a time, the result may differ from the global optimum. It is a tradeoff between optimality and efficiency.

Fig. 5. Statistical dynamic programming for a single-path pipelined circuit optimization. (a) Forward yield calculation. Only feasible edges are shown. (b) Backward tracing the optimal configuration.
5 Experimental Results
The proposed algorithm was implemented in C++. The experiments were conducted on a Linux machine with a 3.2GHz Pentium IV CPU and 3GB of memory. Two sets of circuits were used, pipeline circuits and general sequential circuits, all from the ISCAS benchmark suites. The pipeline circuits were generated from combinational circuits by adding 4-stage pipelines. For a given circuit, under the SIS [13] environment, technology mapping was conducted to obtain delay information and then minimum-period retiming was performed (thus registers were relocated evenly over the circuit). In addition, the circuits were synthesized to balance long and short combinational paths. (Note that, in high-speed and/or low-power designs, long and short paths tend to be balanced. For instance, performance-driven logic optimization and power optimization with dual-threshold voltage assignments tend to balance long and short delays. Thus, design trends meet our timing requirements.) All delay variations follow normal distributions with 10–20% deviation.
Table 1 shows the results for 10% and 20% delay deviations. Columns 1, 2, and 3 show the circuits, numbers of pipeline stages, and numbers of registers, respectively. The clock periods are shown in the fourth column, where the clock period of a circuit is chosen such that the timing yield of the circuit with all DFF registers falls between 60% and 65%. The numbers of DFFs replaced by level-sensitive latches are shown in the fifth column. Columns 6, 7, and 8 list the original, final, and improved timing yields, respectively. The yields are justified with Monte Carlo simulation. The CPU times reported in the ninth column exclude the Monte Carlo simulation. Each of Columns 4–9 is further divided into two subcolumns for 10% and 20% delay deviations.

Table 1. ISCAS benchmark circuits with 10% and 20% delay deviations.

Circuit   # of    Total  Clock period     Replaced reg.  Original yield (%)  Final yield (%)  Impv. (%)      CPU time (s)
          stages  reg.   10%     20%      10%    20%     10%     20%         10%     20%      10%    20%     10%     20%
Pipeline circuits with clock minimization (ISCAS85)
c432      5       214    8.13    8.58     18     28      64.4    63.2        100.0   97.2     35.6   34.0    0.21    0.20
c499      5       186    8.65    9.37     13     8       65.0    62.2        100.0   100.0    35.0   37.8    0.11    0.11
c880      5       242    7.36    7.74     14     16      61.5    62.7        67.0    98.7     5.5    36.0    0.14    0.13
c1355     5       218    9.42    10.18    9      10      60.3    62.3        100.0   99.8     39.7   37.5    0.16    0.16
c1908     5       240    13.44   14.26    19     19      62.5    64.0        100.0   98.1     37.5   34.1    0.19    0.19
c3540     5       278    11.14   11.96    78     62      64.0    62.3        94.1    93.9     30.1   31.6    0.42    0.40
c5315     5       867    11.88   12.60    0      0       60.1    61.7        60.1    61.7     0.0    0.0     0.61    0.63
c7552     5       879    11.26   12.12    56     69      63.7    63.5        99.6    99.9     35.9   36.4    0.71    0.68
Average                                                                                       27.41  30.93   0.284   0.313
Sequential circuits (ISCAS89)
s1196     -       18     50.24   53.54    3      4       62.9    59.7        67.2    62.4     4.3    2.7     0.04    0.05
s5378     -       179    47.79   52.98    10     8       65.2    61.1        71.9    65.2     6.7    4.1     0.44    0.45
s9234     -       211    108.57  118.86   10     8       54.7    57.8        56.0    59.3     1.3    1.5     0.90    0.89
Average                                                                                       4.10   2.77    0.460   0.463
Note: the replaced-register counts are run together in the source layout; the split for c499 (13, 8) and the row assignment of the counts for the sequential circuits are the most plausible reading.
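The improvement column is the difference between the final and original yields in percentage points, and the reported averages for the pipeline circuits can be reproduced from the listed yields. The short Python check below uses the values of Table 1; the variable and function names are illustrative only.

```python
# (original yield, final yield) in percent for the eight pipeline circuits,
# in the order c432, c499, c880, c1355, c1908, c3540, c5315, c7552.
yields_10 = [(64.4, 100.0), (65.0, 100.0), (61.5, 67.0), (60.3, 100.0),
             (62.5, 100.0), (64.0, 94.1), (60.1, 60.1), (63.7, 99.6)]
yields_20 = [(63.2, 97.2), (62.2, 100.0), (62.7, 98.7), (62.3, 99.8),
             (64.0, 98.1), (62.3, 93.9), (61.7, 61.7), (63.5, 99.9)]

def mean_improvement(pairs):
    """Average of (final - original), in percentage points."""
    return sum(final - original for original, final in pairs) / len(pairs)

avg_10 = mean_improvement(yields_10)  # compare with the reported 27.41
avg_20 = mean_improvement(yields_20)  # compare with the reported 30.93
```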
As can be seen, the improvements are consistently above 30% for all of the pipeline circuits except c880 and c5315. For c880 in the 10%-deviation case, some inaccuracy in timing analysis causes inadequate latch replacements and degrades the yield improvement. For c5315, a similar reason causes inadequate latch replacements, which are later cancelled by the Monte Carlo justification. This problem can be overcome by using more accurate SSTA tools. Nevertheless, the average improvements for pipeline circuits are 27% and 31% for deviations of 10% and 20%, respectively. This suggests that our approach to yield improvement is robust against changes in delay deviation. It is interesting to note that the numbers of replaced registers for the 20%-deviation cases are in general larger than those for the 10%-deviation ones, as shown in Column 5. This suggests the importance of latch replacement for increased delay deviations. On the other hand, for cyclic sequential circuits, such as s1196, s5378, and s9234, our approach yields only mild improvements. This is understandable because the register dependency graphs of these circuits are close to complete graphs, which makes latch replacement almost impossible.
To see the relation between the clock period and yield improvement, we conducted another experiment on circuit c1355. The result is plotted in Figure 6. As can be seen, as the clock period is reduced, the yield of the original design with all DFFs drops very quickly from 100% to 0%, whereas that of the optimized version remains high and stable for another 1 unit of delay. (The glitch in the figure is due to different optimal register configurations for different clock periods.) The result suggests that our latch replacement algorithm is robust against clock variation and suitable for high-speed designs. Hence our approach is promising for yield improvement in the current trend of high-speed designs.
Fig. 6. Experimental results. Yield vs. clock period for circuit c1355. (Plot of the original yield, the final yield, and the improvement against clock periods from 8 to 12; the yield axis runs from 0% to 120%.)
6 Conclusions and Future Work
Based on statistical timing analysis, we have proposed an algorithm to optimize the timing yield of a sequential circuit. Experimental results show that, by substituting latches for DFFs, timing yield can be improved by about 31% on average for pipelined circuits. In addition, the results suggest that latch replacement tends to tolerate clock variations. Being complementary to other design-for-yield methodologies such as gate sizing and clock skew scheduling, our technique may be combined with them for further improvement. Since most circuits use DFFs for register implementation, our approach may be widely applicable to standard designs. Since replacing DFFs with latches incurs no area penalty, the proposed algorithm can be used for not only pre-layout but also post-layout optimization, where accurate timing information is available.
For future work, since our approach yields only mild timing yield improvements for cyclic sequential circuits, further work is needed to overcome this limitation. In addition, we may consider multiple-phase clocking schemes, which may lead to further yield improvements. Also, setup-time and hold-time constraints may be added to our framework.
Acknowledgments
This work was supported in part by NSC grants 94-2218-E-002-083, 95-2221-E-002-432, and 95-2218-E-002-064-MY3.
References
1. C. Albrecht, B. Korte, J. Schietke, and J. Vygen. Cycle time and slack optimization for VLSI-chips. In Proc. ICCAD, pp. 232–238, 1999.
2. C. E. Clark. The greatest of a finite set of random variables. Operations Research, vol. 9, no. 2, pp. 145–162, 1961.
3. C.-T. Chao, L.-C. Wang, K.-T. Cheng, and S. Kundu. Static statistical timing analysis for latch-based pipeline designs. In Proc. ICCAD, 2004.
4. S.-H. Choi, B. Paul, and K. Roy. Novel sizing algorithm for yield improvement under process variation in nanometer technology. In Proc. DAC, 2004.
5. K. Chopra, S. Shah, A. Srivastava, D. Blaauw, and D. Sylvester. Parametric yield maximization using gate sizing based on efficient statistical power and delay gradient computation. In Proc. ICCAD, 2005.
6. M. Guthaus, N. Venkateswaran, C. Visweswariah, and V. Zolotov. Gate sizing using incremental parameterized statistical timing analysis. In Proc. ICCAD, 2005.
7. A. Hurst and R. Brayton. Computing clock skew schedules under normal process variation. In Proc. IWLS, 2005.
8. K. Lalgudi and M. Papaefthymiou. Fixed-phase retiming for low power design. In Proc. ISLPED, 1996.
9. H.-M. Lin and J.-Y. Jou. On computing the minimum feedback vertex set of a directed graph by contraction operations. IEEE Trans. on CAD, vol. 19, no. 3, 2000.
10. J. Neves and E. Friedman. Optimal clock skew scheduling tolerant to process variations. In Proc. DAC, pp. 623–628, 1996.
11. S. Raj, S. Vrudhula, and J. Wang. A methodology to improve timing yield in the presence of process variations. In Proc. DAC, pp. 448–453, 2004.
12. K. Sakallah, T. Mudge, and O. Olukotun. checkTc and minTc: Timing verification and optimal clocking of synchronous digital circuits. In Proc. ICCAD, pp. 552–555, 1990.
13. E. M. Sentovich et al. SIS: A system for sequential circuit synthesis. Technical Report UCB/ERL M92/41, UC Berkeley, 1992.
14. J.-L. Tsai, D. Baik, C.-P. Chen, and K. Saluja. A yield improvement methodology using pre- and post-silicon statistical clock scheduling. In Proc. ICCAD, pp. 611–618, 2004.
15. C. Visweswariah, K. Ravindran, K. Kalafala, S. Walker, and S. Narayan. First-order incremental block-based statistical timing analysis. In Proc. DAC, pp. 331–336, 2004.
16. T.-Y. Wu and Y.-L. Lin. Storage optimization by replacing some flip-flops with latches. In Proc. DAC, 1996.