# A network-flow approach to timing-driven incremental placement for ASICs.

**ABSTRACT** We present a novel incremental placement methodology called FlowPlace for significantly reducing critical path delays of placed standard-cell circuits. FlowPlace includes: a) a timing-driven (TD) analytical global placer TAN that uses accurate delay functions and minimizes a combination of linear and quadratic objective functions; b) a network flow based detailed placer TIF that has new and effective techniques for performing TD incremental placement and satisfying row-length (white space) constraints. We have obtained results on three sets of benchmarks: i) TD versions of the ibm benchmark suite that we have constructed; ii) benchmarks used in TD-Dragon; iii) the Faraday benchmarks. Results show that starting with Dragon-placed circuits, we are able to obtain up to 34% and an average of 18% improvement in critical path delays, at an average of 17.5% of the run-time of the Dragon placer. Starting with a state-of-the-art TD placer TD-Dragon, for the TD-Dragon benchmarks we obtain up to about 10% and an average of 4.3% delay improvement with 12% of TD-Dragon’s run times; this is significant as we are extracting performance improvements from a performance-optimized layout. Wire length deterioration on the average over all benchmark suites is less than 8%.


A Network-Flow Approach to Timing-Driven Incremental Placement for ASICs∗

Shantanu Dutt, Huan Ren, Fenghua Yuan and Vishal Suthar

Dept. of ECE, University of Illinois-Chicago

dutt@ece.uic.edu, hren2@uic.edu


1. INTRODUCTION

Due to the increasing ratio of interconnect to gate delays in very deep submicron (VDSM) designs, and the large impact that placement has on the final wire length (WL) as well as on performance, WL and timing consideration during placement is critical. Timing-driven (TD) placement algorithms can be divided into three categories: 1) partition-based, like [9, 12], 2) simulated annealing (SA) based, like [11, 15], and 3) analytical [8]. Circuit timing optimization is basically a path-based problem, though it is impractical to track delays of all paths, since their number is generally exponential in circuit size [8]. Hence, timing constraints on paths are usually converted to either net/edge weights or constraints such as net delay bounds, yielding more tractable net-based methods. In a recent work [8], a novel edge weight function was proposed that, together with its new objective function, solves the convergence problem in net-based methods: delay reductions along critical paths are sometimes obtained at the expense of delay increases in non-critical paths, to the extent that the circuit delay reduces little, if at all. In [15], an SA approach is used along with delay bounds on nets. The slack assignment approach in that paper ensures that nets estimated to be long are assigned larger delay bounds so that they are not overly constrained. The objective function is to minimize the sum of delay violations across all nets.

∗This work was supported by NSF grant CCR-0204097. We also gratefully acknowledge the permission of Artisan Components, Inc. for the use of the cell-timing libraries of the TD-Dragon benchmarks, and also the help of the TD-Dragon authors in this regard.

ICCAD’06, November 5-9, 2006, San Jose, CA. Copyright 2006 ACM.

Another approach to TD placement is via targeted incremental placement. Starting from an initial base placement, an incremental TD placer can focus on reducing delays of the most critical paths. This greatly reduces the number of paths that need to be considered. Also, more timing information can be derived if there is an initial placement; thus delay and slack estimates, and thereby cost functions, are more accurate. Furthermore, by its very nature, TD incremental placement, if done properly, implicitly solves the aforementioned convergence problem, since it minimizes placement changes to the non-critical paths, thereby limiting any delay increases in them. TD incremental placement also finds applications in ECO scenarios, where changes in stages above the physical design (PD) level generally percolate down to required changes in the placement and routing stages. In such applications, TD incremental placement would make the required placement changes while minimizing placement changes in the unaffected portion of the circuit and minimizing any deterioration in critical path delays. TD incremental placement can also be invoked in ECOs for the express purpose of reducing delays of paths that violate target clock speed constraints, via appropriate placement changes in cells on these paths. It is in the context of the first and third applications that we will describe our TD incremental placer, though it can also be used in the general ECO context.

A TD incremental placer was proposed in [14] that directly controls the delay of critical or near-critical paths. It explicitly sets delay constraints for all the critical paths based on the half-perimeter bounding box (HPBB) net lengths on these paths. It then uses linear programming to find a solution to these constraints while minimizing the total HPBB WL change in the circuit. This method only takes BB length into consideration, which is only one component of sink delays in a net, resulting in less accurate timing estimates. Also, non-critical nets are ignored, and thus the convergence problem mentioned earlier can surface.

We propose a timing-driven incremental placer, FlowPlace, that addresses many of the above issues. It has two major components. First, an incremental TD analytical placer TAN is used to find an initial placement, possibly with overlaps. Then a TD detailed placer TIF is used to obtain a legal placement that minimizes the critical path delay increase over that of TAN’s placement. Our TD analytical placer extends the basic techniques of Gordian [10] and Gordian-L [13] to optimize a TD objective function with quadratic as well as linear terms, and also has carefully designed objective and weight functions. The detailed placement algorithm uses a network flow based method. Network flow has been used previously for solving the legalization problem in standard-cell circuits [4, 5]. In both works, the network flow modeling is similar to ours on some high-level issues: cells are represented by nodes, and possible movements of cells are represented by arcs from them to destination positions. The properties of network flow were used in these works to remove cell overlaps and to minimize the sum of flow costs while doing so. However, their objective was to minimize total WL, which is more easily modeled by a sum of flow costs. Our objective is to minimize the delay of critical paths rather than the sum of delays. To this end, we use more complex cost functions and flow graph structures to ensure that the sum of flow costs is a good indicator mainly of critical-path delay changes.

The rest of the paper is organized as follows. Section 2 discusses some basic issues of incremental TD placement and the high-level flow of our methodology. In Sec. 3 we present various aspects of our TD analytical placer, including its objective function and accurate interconnect delay estimates. In Sec. 4, the network-flow based TD incremental detailed placer is discussed at length. Section 5 presents experimental results, and we conclude in Sec. 6.

2. TD INCREMENTAL PLACEMENT AND METHODOLOGY FLOW

The TD incremental placement problem can be formally stated as follows.

Input: A placed circuit PC with some [...]

Output: A placed circuit PC′ with (1) [...] both cell movements and deterioration of placement metrics like total wirelength (WL) and chip area, and (2) either: (a) the critical path delay in PC′ is not increased beyond the one in PC (this applies in applications where the target clock speed has been met, and the ECO process is used to rectify other circuit problems), or (b) the critical path delay in PC′ is significantly improved compared to the one in the previous layout. We focus on objective (b) in this paper, though our incremental placer can also be used to tackle applications of type 2(a).

Figure 1: The flowchart of our TD incremental placer FlowPlace: Initial Placed Circuit → Perform STA & determine critical node set moveC → TD analytical placement (TAN) (on moveC) → TD n/w-flow based detailed placement (TIF) (on moveC) → New Placement w/ improved performance.

Fig. 1 shows the flow of our TD incremental placer used in applications of type (b) given above. We start from a placed circuit and identify all critical and near-critical paths using static timing analysis (STA). Let this set of paths be P. After P is identified, we remove from the layout either: (i) only the cells in P, or (ii) all cells in all nets in P. The removed cells form the cell set moveC (nets connected to cells in moveC are denoted by moveN) that will be re-placed by our TD incremental placer with the goal of reducing the critical path delay. This is achieved by a combination of a TD analytical placer TAN and a TD network-flow based detailed placer TIF. In TAN, moveC constitutes the set of movable cells, and the minimization function is a sum of net-delay functions, weighted inversely by their path slacks, that have both linear and quadratic interconnect-length terms. Our main contribution in this part is two-fold. The first is developing an accurate and detailed pre-routing net-delay function, and determining net weights so that the net delays of critical paths have the highest minimization priority. The second is performing both quadratic and linear optimization simultaneously.

The output of our TD analytical placer will generally be an illegal placement for cells in moveC: the cell positions determined will generally not be in cell rows and may overlap each other or cells in PC. However, these cell positions provide starting points for our detailed TD placer. It uses a novel network flow method to place the new cells in legal positions and to move existing cells minimally to accommodate them, so that the critical path delay is optimized and the row-length (i.e., row white space) constraint is satisfied. This TD min-cost max-flow white-space satisfying algorithm, called TIF, is the major contribution of this paper. The basic problem of TD incremental placement (and placement in general) is at its core a constraint-satisfying discrete optimization problem (DOP). By solving it with a network flow approach, we are using a continuous optimization approach, and thus certain “illegalities” are introduced into the solution of the core problem. We therefore also describe in Sec. 4 the in-processing methods we use for: a) legalizing the continuous solution of the network flow process, and b) satisfying white-space constraints that are not completely modeled by standard capacity constraints in the network flow graph.

2.1 STA and Path Slacks

We perform STA to determine delays to the output pins or flip-flops (FFs) of the circuit. Each of these “terminal” pins has a max-delay path to it, and the maximum delay over all these paths is the critical path delay. We define a near-critical path as a max-delay path to a terminal pin whose delay is at least a (1 − ε) fraction of the critical path delay; we use ε = 0.1 in our experiments. A path P’s slack S(P) is defined as the difference between the required arrival time (RAT) at the terminal pin of P and the arrival time (AT) of P. We assume a single target clock speed and thus uniform RATs at all terminal pins (our methods easily apply to non-uniform RATs as well). For slack-driven cost functions that minimize critical interconnect lengths to be meaningful, we need positive slacks. We thus bootstrap our methods by defining the RAT of the terminal pin of the critical path as (1 + α) times the critical path delay; we use α = 0.1 in our experiments. This ensures positive slack for all paths and, of course, smaller slacks for more critical paths.
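The slack bootstrapping and the near-critical filter above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: STA has already produced one max-delay value per terminal pin, RATs are uniform, and the function names and pin names are hypothetical.

```python
def path_slacks(terminal_delays, alpha=0.1):
    """Return the bootstrapped RAT and the slack S(P) of each terminal pin's path.

    Hypothetical helper: the RAT is set to (1 + alpha) * critical-path delay,
    so every slack is positive and more critical paths get smaller slacks.
    """
    d_crit = max(terminal_delays.values())
    rat = (1.0 + alpha) * d_crit
    return rat, {pin: rat - at for pin, at in terminal_delays.items()}


def near_critical(terminal_delays, eps=0.1):
    """Terminal pins whose max-delay path has delay >= (1 - eps) * critical delay."""
    d_crit = max(terminal_delays.values())
    return sorted(p for p, d in terminal_delays.items() if d >= (1.0 - eps) * d_crit)


delays = {"out1": 10.0, "ff3": 9.5, "out2": 7.0}  # hypothetical STA results (ns)
rat, slack = path_slacks(delays)
print(near_critical(delays))  # ['ff3', 'out1']
```

With ε = α = 0.1, the path to `out2` (7.0 ns) falls outside the near-critical set, while the critical path to `out1` receives the smallest positive slack, about 1.0 ns.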

3. TD ANALYTICAL GLOBAL PLACEMENT

Our analytical placer TAN is a TD extension of a combination of Gordian [10] and Gordian-L [13]: we optimize an objective function that contains both linear and quadratic terms.

3.1 Basic Gordian and Gordian-L

Gordian [10] is a quadratic programming technique for cell placement that minimizes quadratic WL. The quadratic net length estimate can be based on either a clique or a star-graph model. For the latter (see Fig. 2(a)), which we use in our TD objective function, the quadratic net length of net nj with k pins is given by:

$$L^2(n_j) = \sum_{u_i \in n_j} \left[ (x_i - x_c)^2 + (y_i - y_c)^2 \right]$$

where (xi, yi) are the coordinates of pin ui, and (xc, yc) is the coordinate of the centroid of the pins of nj, with xc (yc) = (1/k) × Σ_{ui∈nj} xi (yi).

Figure 2: (a) The star-graph model for net length estimate: the pins (e.g., up(xp, yp), uq(xq, yq)) are connected to the centroid C(xc, yc). (b) Interconnect delay computation in a pre-routing placement: driver ud(xd, yd), sink ui(xi, yi) at interconnect length ld,i (with the midpoint ld,i/2 marked), other pins up and uq, and the γ and (1 − γ) fractions of Ctotal. Ctotal is the total (net and load) capacitance seen by the driver.

For a circuit netlist G, Gordian optimizes the quadratic objective function Σ_{nj∈nets(G)} L²(nj). The linear constraints include those requiring the cell coordinates to lie initially within chip boundaries, and then within boundaries of subregions. After each Gordian phase in a region, the cells are partitioned, based on their solution coordinates, into two subregions by a cutline perpendicular to the optimization dimension (x or y). This prevents cell overlaps between the two groups, and ultimately between every subgroup of cells where this hierarchical process ends.
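As a minimal sketch of the quadratic core described above, the Python below computes the star-graph L²(nj) of one net and then minimizes Σ L²(nj) in one dimension. Setting the gradient to zero yields a small linear system. This is an illustration, not Gordian itself: it assumes a 1-D problem, omits the linear boundary constraints and the recursive partitioning, and its helper names (`star_l2`, `solve`, `quadratic_place_1d`) and the pad/cell names are hypothetical.

```python
def star_l2(xs, ys):
    """Quadratic star-graph length L^2(n_j) of one net from its pin coordinates."""
    k = len(xs)
    xc, yc = sum(xs) / k, sum(ys) / k  # centroid of the k pins
    return sum((x - xc) ** 2 + (y - yc) ** 2 for x, y in zip(xs, ys))


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def quadratic_place_1d(nets, fixed, movable):
    """Minimize sum_j sum_{i in n_j} (x_i - x_c)^2 over the movable cells (1-D).

    Per net, L^2 = x^T (I - 11^T / k) x, so the gradient vanishes when
    Q x_mov = b, with Q and b assembled from that matrix and the fixed pins.
    """
    idx = {c: i for i, c in enumerate(movable)}
    n = len(movable)
    q = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for net in nets:
        k = len(net)
        for u in net:
            if u not in idx:
                continue
            for v in net:
                coef = (1.0 if u == v else 0.0) - 1.0 / k  # entry of I - 11^T/k
                if v in idx:
                    q[idx[u]][idx[v]] += coef
                else:
                    b[idx[u]] -= coef * fixed[v]  # fixed pins move to the RHS
    return dict(zip(movable, solve(q, b)))


# Two movable cells a, b chained between pads at x = 0 and x = 10.
chain = [["p0", "a"], ["a", "b"], ["b", "p1"]]
print(quadratic_place_1d(chain, {"p0": 0.0, "p1": 10.0}, ["a", "b"]))
```

For the chain example, the cells spread evenly to a ≈ 10/3 and b ≈ 20/3. This even spreading is the typical behavior of quadratic WL minimization.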

Gordian-L [13] applies an additional inner iteration to the optimization in each subregion. In the (m+1)’th inner iteration, it essentially divides each L²(nj) part of the objective function by a net-centric linear-length quantity given by

$$\eta^m_j = \sum_{u_i \in n_j} \left| x^m_i - x^m_c \right|$$

(for the optimization along the horizontal dimension), where $x^m_i$ is the value of the x-coordinate of ui after the m’th iteration, and $\eta^0_j = 1$. This has the effect of linearizing the objective function at the end of the inner iteration.
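To show the linearizing effect of the η-division, here is a minimal 1-D Python sketch for a single movable cell. It assumes that the other pins of every net are fixed, so each weighted quadratic solve has a closed form. The function names and the floor on η (which guards against division by zero) are hypothetical choices, not details from [13].

```python
def place_one_cell(nets, weights):
    """Minimize sum_j w_j * L^2(n_j) over the x of the single movable cell.

    Each net is the list of fixed pin x-coordinates joined to the movable
    cell (so k_j = len + 1). Setting sum_j w_j * 2 * (x - x_c,j) = 0 gives
    x = sum_j w_j S_j / k_j  /  sum_j w_j (1 - 1/k_j).
    """
    num = den = 0.0
    for fixed_pins, w in zip(nets, weights):
        k = len(fixed_pins) + 1
        num += w * sum(fixed_pins) / k
        den += w * (1.0 - 1.0 / k)
    return num / den


def eta(x, fixed_pins):
    """Gordian-L linear length eta_j = sum_i |x_i - x_c| of one net (1-D)."""
    pins = [x] + list(fixed_pins)
    xc = sum(pins) / len(pins)
    return sum(abs(p - xc) for p in pins)


def gordian_l(nets, inner_iters=30, floor=1e-9):
    """Inner loop: re-solve with each L^2(n_j) divided by its last eta_j."""
    weights = [1.0] * len(nets)  # eta^0_j = 1: the first pass is plain quadratic
    history = []
    for _ in range(inner_iters):
        x = place_one_cell(nets, weights)
        history.append(x)
        weights = [1.0 / max(eta(x, f), floor) for f in nets]
    return history


# Cell tied by 2-pin nets to pads at 0, 0 and 10 (hypothetical example).
xs = gordian_l([[0.0], [0.0], [10.0]])
print(xs[0], xs[-1])  # starts at the quadratic optimum 10/3, ends near the linear optimum 0
```

The first iterate is the quadratic (mean) solution 10/3. The later iterates move toward the minimizer of total linear length, which here is the median pad position 0. This is the "linearization" the inner iteration achieves.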

3.2 Net Delays and Objective Function

We assume that we start with an unrouted placement¹, and thus use the routing model shown in Fig. 2(b). For a net nj with driver ud and k − 1 ≥ 1 sinks, let Rd be the driving resistance, Cg the load capacitance of a sink pin², r (c) the unit wire resistance (capacitance), and ld,i the interconnect length connecting driver ud to sink ui; see Fig. 2(b). Referring to this figure and considering a sink ui in nj, the delay D(ui, nj) to it from the driver ud (using the Elmore delay model) consists of three parts:

$$D_1(n_j) = R_d \left( c \cdot L(n_j) + (k-1) C_g \right) \qquad (1)$$

$$D_2(u_i, n_j) = \frac{rc}{2} \, l_{d,i}^2 + r \cdot l_{d,i} \, C_g \qquad (2)$$

$$D_3(u_i, n_j) = r \cdot \frac{l_{d,i}}{2} \left( 1 - \gamma + \frac{\gamma}{2} \right) \left( c \cdot L(n_j) + (k-2) C_g \right) \qquad (3)$$

$$D(u_i, n_j) = D_1(n_j) + D_2(u_i, n_j) + D_3(u_i, n_j) \qquad (4)$$

where γ ≤ 1. Note that the D1(nj) delay component is the same for all sinks of nj. The third delay component D3(ui, nj) rests on the following estimate, made without an exact route. If ui lies in the initial γ fraction of the HPBB of nj, starting from the driver position, then on average half of the interconnect length ld,i lies on the main trunk of the estimated route. This half “sees” the entire wire and sink capacitance of the remaining (1 − γ) fraction of the net. Furthermore, incremental pieces of this part of the (ud, ui) interconnect on the main trunk also see incremental portions of the γ fraction of the net and load capacitance. As a result, this interconnect ultimately sees a γ/2 fraction of the total (load + net) capacitance Ctotal.

¹Our methods apply to routed placements as well. However, since routing consumes a dominant part of the PD phase, it would be beneficial to perform a quick-and-approximate pre-routing estimate of critical path delays using as-accurate-as-possible net route models, and to perform TD re-placement before proceeding to the actual routing stage. This will hopefully enable pre-routing corrections, thus saving significantly in design time.

²For simplicity of exposition, we assume uniform loads for all sink pins, though clearly our net-delay modeling and methods also apply to non-uniform loads.
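Eqs. (1)-(4) translate directly into code. The sketch below is a hedged transcription: `Net` and the function names are hypothetical, L(nj) is taken as a given linear length estimate, and γ is passed per sink because it depends on where ui lies within the HPBB.

```python
from dataclasses import dataclass


@dataclass
class Net:
    """Hypothetical container for the parameters of net n_j in Eqs. (1)-(4)."""
    Rd: float  # driver resistance R_d
    Cg: float  # load capacitance of one sink pin (uniform, as in footnote 2)
    r: float   # wire resistance per unit length
    c: float   # wire capacitance per unit length
    k: int     # number of pins: one driver plus k - 1 sinks
    L: float   # linear net length estimate L(n_j)


def d1(n: Net) -> float:
    """Eq. (1): driver resistance times all wire and sink load (same for every sink)."""
    return n.Rd * (n.c * n.L + (n.k - 1) * n.Cg)


def d2(n: Net, l_di: float) -> float:
    """Eq. (2): distributed RC of the driver-to-sink wire plus that wire driving C_g."""
    return n.r * n.c / 2 * l_di ** 2 + n.r * l_di * n.Cg


def d3(n: Net, l_di: float, gamma: float) -> float:
    """Eq. (3): half of l_{d,i} sees a (1 - gamma) + gamma/2 share of the other load."""
    return n.r * (l_di / 2) * (1 - gamma + gamma / 2) * (n.c * n.L + (n.k - 2) * n.Cg)


def sink_delay(n: Net, l_di: float, gamma: float) -> float:
    """Eq. (4): Elmore-style pre-routing delay from driver u_d to sink u_i."""
    return d1(n) + d2(n, l_di) + d3(n, l_di, gamma)


net = Net(Rd=1.0, Cg=1.0, r=1.0, c=1.0, k=3, L=4.0)  # unit values, for illustration
print(sink_delay(net, 2.0, 1.0))  # 12.5
```

With unit parameters, D1 = 6, D2 = 4 and D3 = 2.5. Setting γ = 0 (the sink lies right next to the driver along the trunk) doubles D3 to 5, because the trunk half of ld,i then sees the full remaining capacitance.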

We define the critical delay Dc(nj) of nj as:

$$D_c(n_j) = D_1(n_j) + \sum_{u_i \in critical(n_j)} \left[ D_2(u_i, n_j) + D_3(u_i, n_j) \right]$$

The intent here is to include in Dc only the delays of the set critical(nj) of sinks of nj lying on near-critical paths. Note that Dc is really a delay-criticality measure of nj rather than an actual delay of some component of this net. We define the allocated slack Sa(nj) of net nj as S(Pmax(nj))/(# of nets in path Pmax(nj)), where Pmax(nj) is the maximum-delay path through nj, and recall that S(P) is the slack of path P.

How much to reduce a net nj’s interconnect lengths when optimizing the circuit’s critical path delay depends not only on the net’s Dc value but also on S(Pmax(nj)). A net with a high Dc value that lies on a path with relatively high slack should have lower delay-optimization priority, and similarly for the reverse case. Furthermore, two nets ni, nj on different max-delay paths with similar slacks and similar Dc values should not necessarily be optimized similarly. The important parameter besides Dc for determining optimization priority is the allocated slack Sa of a net. The rationale for this is as follows. Let the max-delay path through ni (nj) have 10 (5) nets in it. Suppose the delay optimization priority were the same for all the nets on Pmax(ni) and Pmax(nj) because of their similar Dc and path slack values. Then the delays on their critical interconnects would be made almost equal (assuming only one critical interconnect from the driver to a single critical sink on each of the 15 nets). This results in Pmax(ni) having twice the delay of Pmax(nj), and thereby a higher probability of violating the target clock speed. On the other hand, suppose the delay cost of each net is made ∝ Dc/Sa. In our example, the Sa of the nets in Pmax(ni) is half that of those in Pmax(nj), so the former will have twice the delay optimization priority (i.e., delay cost) of the latter. This leads to balanced delays for both critical paths Pmax(ni) and Pmax(nj).

Based on the above arguments, we define the delay cost CD(nj) of nj as

$$C_D(n_j) = D_c(n_j) / S_a(n_j)^{\beta}$$

where β is an exponent of the Sa metric. It allows magnification (with β > 1) or shrinking (with β < 1) of the differences in optimization priorities of nets on paths with varying allocated slacks; we use β = 1 in our experiments.
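The 10-net versus 5-net argument above can be checked numerically. The sketch below uses hypothetical helper names and made-up slack and delay values; it only transcribes the definitions of Dc, Sa and CD.

```python
def critical_delay(d1, critical_terms):
    """D_c(n_j) = D_1(n_j) + sum over critical(n_j) of [D_2 + D_3].

    critical_terms holds one (D_2, D_3) pair per sink in critical(n_j).
    """
    return d1 + sum(d2 + d3 for d2, d3 in critical_terms)


def allocated_slack(path_slack, nets_on_path):
    """S_a(n_j) = S(P_max(n_j)) / (number of nets on P_max(n_j))."""
    return path_slack / nets_on_path


def delay_cost(dc, sa, beta=1.0):
    """C_D(n_j) = D_c(n_j) / S_a(n_j)^beta."""
    return dc / sa ** beta


# Same D_c and same path slack (hypothetical values), but P_max(n_i) has
# 10 nets while P_max(n_j) has 5.
dc = critical_delay(6.0, [(4.0, 2.5)])
cost_i = delay_cost(dc, allocated_slack(2.0, 10))
cost_j = delay_cost(dc, allocated_slack(2.0, 5))
print(cost_i / cost_j)  # about 2: n_i gets twice the optimization priority
```

Raising β to 2 widens this ratio to about 4, which illustrates the "magnification" role of β.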

Figure 3: (a) The high-level network flow graph for placing cells A1, A2 in legal positions; w(u) is the width of a cell u. The panel shows rows 1, 2 and 3 with their row cells (C11, C12, ..., C21, C22, ..., C31, C32, ...), the new cells A1 and A2, white space cells, the row WS nodes W1, W2 and W3, the source and the sink; arcs are labeled (capacity, cost). (b) Details of the flow graph structure for vertical flows between cell pairs (C1,1, C2,1) and (C1,2, C2,1); o(u,v) is the amount of horizontal overlap between cells u and v. This flow graph structure only allows a flow of amount ≤ w(u) into a row cell u, and also allows the vertical flow out of a cell v to go to all cells in the adjacent row that it horizontally overlaps. (c) Similar details of the flow graph structure for flows from the new cells into vertically adjacent row cells.

Note that the Dc(nj) metric has a component Dc,quad(nj) that is quadratic and a component Dc,lin(nj) that is linear in length metrics. Thus we can write CD(nj) = (Dc,quad(nj) + Dc,lin(nj))/Sa(nj)^β. The desired TD objective function then is:

$$\sum_{n_j \in moveN} \frac{D_{c,quad}(n_j) + D_{c,lin}(n_j)}{S_a(n_j)^{\beta}} \qquad (5)$$

where, recall, moveN is the set of nets connected to cells in moveC, the set of cells selected for re-placement to reduce delays in critical and near-critical paths.

Since we use a quadratic placer, we need a quadratic version of Dc,lin(nj). We obtain it simply by replacing the linear length metrics in it (e.g., L(nj), ld,i) by their quadratic counterparts (e.g., L²(nj), l²d,i = (xd − xi)² + (yd − yi)²). Let us call this modified component Dc,lin_quad(nj). Then, the objective function for TAN is:

$$\sum_{n_j \in moveN} \frac{D_{c,quad}(n_j) + D_{c,lin\_quad}(n_j)}{S_a(n_j)^{\beta}} \qquad (6)$$

In TAN we optimize the quadratic portion just as in Gordian. We obtain the desired optimization of the linear Dc,lin_quad(nj) as in Gordian-L, by dividing Dc,lin_quad(nj) by its current linear value in an inner loop, as explained in Sec. 3.1. Since we perform both quadratic and linear optimization, the quadratic-optimization terms remain in the optimization function without modification during the inner loop (unlike the linear-optimization terms). Furthermore, since the analytical placement phase is followed by a legalizing detailed placer, we do not perform the hierarchical partition-based optimization process of Gordian and Gordian-L.

4. TD NETWORK FLOW BASED DETAILED PLACEMENT

The output of TAN will generally be an illegal placement, but it is a good starting point for our TD network-flow based detailed placer TIF. TIF places the new cells in legal positions so as to minimize critical path delays, and moves existing cells minimally to accommodate them. All cell movements use TD costs with two properties. a) They are proportional to the delay sensitivity Ds(u) of the moved cell u, where Ds(u) is the delay change, per unit displacement of u, of the most critical interconnect through u. b) They are inversely proportional to the allocated slack Sa(u) of u, where Sa(u) = Sa(nj) and nj is the net on the max-delay path through u. Further details are in Sec. 4.3. Besides placing the new cells in legal positions in a timing-driven manner, TIF also satisfies white space (WS) constraints using novel techniques. The rest of this section describes various aspects of TIF.

4.1 Network Flow Model

Fig. 3(a) shows a generic network flow graph with arc costs and capacities, and a minimum cost flow of some amount x from the source node S to the sink node T that passes through the network. Network flow has found application in VLSI CAD problems ranging from partitioning to placement [4, 5, 16].

Our network flow-based incremental placement algorithm TIF is novel in three ways: in how it models arc costs, in being timing driven, and in accurately satisfying white space constraints for standard cell placement by overlaying constraints on the flow determination process. The basic network flow model for our detailed incremental placer is shown in Fig. 3(a). Formally, the network graph we use is F(V, A), defined as follows. The node set V is the set moveC ∪ rowC ∪ IWS ∪ rowWS ∪ {S, T}, where:

- moveC is the set of new cells that need to be “pushed” to legal row positions so as to minimize critical path delay;
- rowC is the set of existing cells in each row of the placement;
- IWS is the set of intermediate row “WS cells”;
- rowWS is the set of row WS nodes, one per row, representing the total WS available in each row.

The arc set A is given by pushA ∪ vertA ∪ horA ∪ IWSA ∪ rowWSA, where:

- pushA is the set of flow-pushing arcs from S to each cell in moveC;
- vertA and horA are the sets of vertical and horizontal arcs, respectively, that represent cell movements in the corresponding directions when flows pass through them;
- IWSA is the set of arcs going from intermediate WS cells to the corresponding row WS nodes;
- rowWSA contains the arcs that go from each row WS node to the sink T.

The purposes of these different classes of nodes and arcs in F(V, A) are explained below.

There is a push arc from the source $S$ to each new cell $v$ with capacity equal to the width $w(v)$ of $v$. For each such $v$, there are two vertical arcs from it directed toward cells in the rows immediately above and below it (these "conceptual" arcs are shown in more detail in Fig. 3(c)); the capacity of each vertical arc is also $w(v)$. A total flow of $f = \sum_{v \in moveC} w(v)$ emanates from $S$, and a max-flow solution through the network results in each new cell being pushed to one of its row-position choices (modeled by the vertical arcs from it).

From each row cell $u$, there are two vertical and two horizontal arcs, one in each direction. The vertical arcs from $u$ go to cells in adjacent rows and model possible movement of $u$ in the respective vertical directions; the capacity of these arcs is $w(u)$, since only $u$ can move along them. The horizontal arcs from $u$ model possible horizontal movement of $u$ within its row, and could in principle have a capacity equal to the width of the row from $u$ to the corresponding end of the row, since $u$ could be moved up to either end. However, since arc cost estimates become less accurate for large cell displacements, the capacity of $u$'s outgoing horizontal arcs is capped at the maximum width among the cells in adjacent rows and the new cells that have vertical arcs into $u$. This allows enough horizontal flow through $u$ to move it far enough to remove overlaps with cells moved vertically into its position (via vertical flows into $u$). There can be intermediate white space within rows; each such gap is modeled as a node ($\in$ IWS) with incoming horizontal and vertical arcs but only one outgoing arc ($\in$ IWSA), which goes to the row WS node $W_i$ of the row and has zero cost and a capacity equal to the amount of that intermediate white space. Finally, the total white space of row $i$ ($R_i$), $w(W_i) = (\text{max row-size constraint}) - \sum_{u \in R_i} w(u)$, is also modeled as a node $W_i$ at the right end of the row, with an incoming horizontal arc from the rightmost cell and an outgoing arc ($\in$ rowWSA) to $T$ of zero cost and capacity $w(W_i)$.
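To make the node and arc classes concrete, here is a minimal Python sketch that builds only the arc capacities of F(V,A) for a toy layout; arc costs are omitted. Several input conventions are hypothetical and not from the text: rows are left-to-right lists of (name, width) pairs, intermediate white-space gaps are named with a `ws` prefix, the caller supplies which cells each vertical arc points to, and the helper name `build_flow_network` is our own. The horizontal-capacity rule follows our reading of the paragraph above.

```python
from collections import defaultdict


def build_flow_network(rows, new_cells, vertical_targets, max_row_size):
    """Return (arcs, f): arcs maps (tail, head) -> capacity, f = total flow from S.

    rows: list of rows, each a left-to-right list of (name, width); names starting
          with "ws" are intermediate white-space (IWS) gaps (hypothetical convention).
    new_cells: dict name -> width of the new cells pushed from the source S.
    vertical_targets: dict name -> names in adjacent rows its vertical arcs reach
          (supplied by the caller in this sketch).
    max_row_size: row-length constraint, assumed identical for all rows.
    """
    arcs = {}
    width = dict(new_cells)
    for row in rows:
        width.update(row)

    # Push arcs S -> v for every new cell v, with capacity w(v).
    for v, w in new_cells.items():
        arcs[("S", v)] = w
    total_flow = sum(new_cells.values())  # f = sum of w(v) over the new cells

    # Vertical arcs: capacity equals the width of the cell that would move.
    vertical_in = defaultdict(list)
    for u, targets in vertical_targets.items():
        for t in targets:
            arcs[(u, t)] = width[u]
            vertical_in[t].append(width[u])

    for i, row in enumerate(rows):
        w_node = f"W{i}"
        names = [name for name, _ in row]
        for k, u in enumerate(names):
            if u.startswith("ws"):
                # IWS node: one zero-cost arc to W_i, capacity = size of the gap.
                arcs[(u, w_node)] = width[u]
                continue
            # Horizontal capacity: max width among cells with vertical arcs into u
            # (0 if there are none, a simplification of this sketch).
            h_cap = max(vertical_in[u], default=0)
            if k > 0:
                arcs[(u, names[k - 1])] = h_cap
            right = names[k + 1] if k + 1 < len(names) else w_node
            arcs[(u, right)] = h_cap
        # Row white space w(W_i) = max row size - total cell width in row i.
        cell_width = sum(w for name, w in row if not name.startswith("ws"))
        arcs[(w_node, "T")] = max_row_size - cell_width
    return arcs, total_flow
```

For example, with rows `[[("A", 4), ("ws0", 2), ("B", 3)], [("C", 5)]]`, one new cell `{"N": 3}` whose vertical arcs reach `A` and `C`, and a row limit of 12, row 0 gets $w(W_0) = 12 - 7 = 5$ and the gap `ws0` feeds at most 2 units into $W_0$.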

4.2 The Simplex Network Flow Algorithm

The Simplex method is widely used to solve min-cost max-flow problems. Its basic idea is to iteratively improve an initial solution. It starts with a feasible but generally non-optimal flow of the given amount f. It then looks for negative cycles, i.e., cycles whose total cost is negative when traversed in a certain direction. For each such cycle, the Simplex method augments (pushes) the maximum possible flow around the cycle in the negative-cost direction. It continues doing so until no negative cycles remain, or until flow around every remaining negative cycle cannot be augmented further because some arc in the cycle is either saturated in the direction of the flow or carries no flow that could be reduced in the reverse direction. Our implementation is based on the Simplex algorithm in [3].
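The cycle-improvement loop described above can be sketched in Python. This is not the network simplex implementation of [3]; it is the simpler cycle-canceling scheme that follows the same idea: start from a feasible flow, find a negative-cost cycle in the residual network with Bellman-Ford, push the bottleneck residual capacity around it, and repeat. The input format (a dict from (tail, head) to (capacity, unit cost)) and the function names are assumptions of this sketch, and capacities and costs are assumed integral so that the loop terminates.

```python
def find_negative_cycle(nodes, edges):
    """Bellman-Ford with all distances starting at 0 (an implicit super-source).

    edges: list of (tail, head, residual_cap, cost, arc_key, sign).
    Returns a list of edge indices forming a negative-cost cycle, or None.
    """
    dist = {n: 0 for n in nodes}
    pred = {n: None for n in nodes}
    last = None
    for _ in range(len(nodes)):
        last = None
        for idx, (a, b, _cap, cost, _key, _sign) in enumerate(edges):
            if dist[a] + cost < dist[b]:
                dist[b] = dist[a] + cost
                pred[b] = idx
                last = b
        if last is None:
            return None  # distances converged: no negative cycle
    # A relaxation in the |V|-th pass means a negative cycle exists;
    # walking |V| predecessor steps back from `last` lands on that cycle.
    v = last
    for _ in range(len(nodes)):
        v = edges[pred[v]][0]
    cycle, w = [], v
    while True:
        cycle.append(pred[w])
        w = edges[pred[w]][0]
        if w == v:
            return cycle


def cancel_negative_cycles(nodes, arcs, flow):
    """arcs: dict (tail, head) -> (capacity, unit_cost); flow: feasible flow
    dict (tail, head) -> amount. Returns a min-cost flow of the same value."""
    flow = dict(flow)
    while True:
        # Residual network as a list, since (u, v) and (v, u) may both be arcs.
        edges = []
        for key, (cap, cost) in arcs.items():
            u, v = key
            x = flow.get(key, 0)
            if x < cap:
                edges.append((u, v, cap - x, cost, key, +1))  # push more on (u, v)
            if x > 0:
                edges.append((v, u, x, -cost, key, -1))  # undo flow on (u, v)
        cycle = find_negative_cycle(nodes, edges)
        if cycle is None:
            return flow
        delta = min(edges[i][2] for i in cycle)  # bottleneck residual capacity
        for i in cycle:
            key, sign = edges[i][4], edges[i][5]
            flow[key] = flow.get(key, 0) + sign * delta


if __name__ == "__main__":
    nodes = ["S", "a", "b", "T"]
    arcs = {("S", "a"): (2, 0), ("a", "T"): (2, 5),
            ("S", "b"): (2, 0), ("b", "T"): (2, 1)}
    start = {("S", "a"): 2, ("a", "T"): 2}  # feasible flow f = 2, cost 10
    best = cancel_negative_cycles(nodes, arcs, start)
    print(sum(x * arcs[k][1] for k, x in best.items()))  # prints 2
```

In the small example, one cancellation reroutes both units of flow from the cost-5 arc to the cost-1 arc while keeping the flow value f = 2 unchanged, which is the property the placer relies on: every new cell stays assigned while the total arc cost decreases.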

4.3 Arc Cost Functions

As mentioned earlier, the TD cost of arc $(u,v)$ should be: i) proportional to the delay change, or sensitivity, of the most critical interconnect of its start node $u$ to unit-length displacements of $u$ in the direction of the arc, and ii) inversely proportional to the allocated slack of its start node $u$. Delay sensitivity, which is essentially the derivative of the delay function w.r.t. start-cell displacement, is a good measure of performance cost when cells are moved by relatively small distances from well-established positions, as is the case in incremental detailed placement.

Eqns. 1-4 give the delay formulation for a sink $u_i$ on net $n_j$. The sensitivity of this delay to a displacement $\Delta l_{d,i}$ of either the sink $u_i$ or the driver $u_d$ can be obtained by taking derivatives w.r.t. $l_{d,i}$; following the components in Eqns. 1-4, these are:

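$$\Delta D_1(u_i,n_j) = R_d c \cdot \Delta L(n_j) \approx R_d c \cdot \Delta l_{d,i} \tag{7}$$

$$\Delta D_2(u_i,n_j) = r c \cdot l_{d,i} \cdot \Delta l_{d,i} + r \cdot \Delta l_{d,i} \cdot C_g \tag{8}$$

$\Delta D_3(u_i,n_j) = \Delta D_{3a}(u_i,n_j) + \Delta D_{3b}(u_i,n_j)$, where

$$\Delta D_{3a}(u_i,n_j) = r \cdot \frac{\Delta l_{d,i}}{2}\left(1-\frac{\gamma}{2}\right)\bigl(c \cdot L(n_j) + (k-2)\,C_g\bigr) \tag{9}$$

$$\Delta D_{3b}(u_i,n_j) = r \cdot \frac{l_{d,i}}{2}\left(1-\frac{\gamma}{2}\right)\bigl(c \cdot \Delta l_{d,i}\bigr) \tag{10}$$

$$\Delta D(u_i,n_j) = \Delta D_1(u_i,n_j) + \Delta D_2(u_i,n_j) + \Delta D_3(u_i,n_j) \tag{11}$$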
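Eqns. 7-11 can be evaluated directly. The Python sketch below assumes the symbols keep the meanings of Eqns. 1-4 (which are not reproduced in this section): $R_d$ the driver resistance, $r$ and $c$ the per-unit-length wire resistance and capacitance, $C_g$ the gate capacitance of a sink, and $\gamma$, $k$ the remaining parameters of that delay model. The function name and argument names are hypothetical.

```python
def delay_sensitivity(R_d, r, c, C_g, gamma, k, l_di, L_nj, dl):
    """Delay change of sink u_i on net n_j for a displacement dl of l_{d,i}.

    l_di: driver-to-sink length l_{d,i}; L_nj: total wire length L(n_j).
    Returns every component of Eqns. 7-11 plus their sum.
    """
    dD1 = R_d * c * dl  # Eqn. 7, using Delta L(n_j) ~ Delta l_{d,i}
    dD2 = r * c * l_di * dl + r * dl * C_g  # Eqn. 8
    dD3a = r * (dl / 2) * (1 - gamma / 2) * (c * L_nj + (k - 2) * C_g)  # Eqn. 9
    dD3b = r * (l_di / 2) * (1 - gamma / 2) * (c * dl)  # Eqn. 10
    return {"dD1": dD1, "dD2": dD2, "dD3a": dD3a, "dD3b": dD3b,
            "dD": dD1 + dD2 + dD3a + dD3b}  # Eqn. 11
```

Every term is linear in $\Delta l_{d,i}$, so reversing the direction of the displacement simply flips the sign of $\Delta D$; this is what lets a single sensitivity value serve as a per-arc cost.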

Note that $\Delta l_{d,i}$ can be positive or negative, depending on how the cell in question ($u_d$ or $u_i$) moves in the direction of the arc $e$ whose cost is being determined. For a horizontal arc, the magnitude of $\Delta l_{d,i}$ is the arc's capacity (which reflects the maximum displacement of the cell); for a vertical arc, it is the spacing between the two adjacent rows that the arc spans (which is the exact cell displacement if there is any positive flow along the arc).
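The magnitude rule above fixes $|\Delta l_{d,i}|$ per arc type; the sign depends on whether the move brings the two pins closer or farther apart. The Python sketch below illustrates both under a loudly stated assumption that is not from the text: $l_{d,i}$ is approximated by the Manhattan distance between the moved cell and the other pin. Under that assumption a horizontal move that crosses over the other pin yields a smaller magnitude than the arc capacity. The function name and arguments are hypothetical.

```python
def signed_delta_l(kind, moving, other, direction, capacity=0.0, row_spacing=0.0):
    """Signed change of l_{d,i} when `moving` is displaced along an arc.

    kind: "horizontal" (magnitude = arc capacity) or "vertical"
          (magnitude = spacing between the two rows the arc spans).
    moving, other: (x, y) positions of the moved cell and the other pin.
    direction: unit step of the arc, e.g. (1, 0) or (0, -1).
    Assumption: l_{d,i} is modeled as the Manhattan distance between the pins.
    """
    if kind == "horizontal":
        magnitude = capacity
    elif kind == "vertical":
        magnitude = row_spacing
    else:
        raise ValueError(f"unknown arc kind: {kind!r}")
    dx, dy = direction
    before = abs(moving[0] - other[0]) + abs(moving[1] - other[1])
    nx, ny = moving[0] + dx * magnitude, moving[1] + dy * magnitude
    after = abs(nx - other[0]) + abs(ny - other[1])
    return after - before  # negative: the pins get closer
```

For instance, a cell at (0, 0) whose driver sits at (10, 0) gets $\Delta l_{d,i} = -3$ along a rightward horizontal arc of capacity 3, and $+2$ along an upward vertical arc when rows are 2 units apart.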

The displacement of a cell $u$ in the direction of a flow arc $e$ emanating from it affects the critical nets connected to $u$ in two ways: a) as a sink on the most critical net connected to it, and b) as a driver of the most critical net connected to it.

a) As a sink, there are two cases:

i) $u$ is the most critical sink of its most critical net $n_j$, in which case its effect on the delay change of $n_j$ is $\Delta D_a(u) = \Delta D(u,n_j)$, as given by Eqns. 7-11.

ii) $u$ is not the most critical sink of its most critical net $n_j$, in which case its effect on the delay change of $n_j$ is $\Delta D_a(u) = \Delta D_1(u,n_j) + \Delta D_{3b}(u_i,n_j)$, which reflects the displacement's effect on $L(n_j)$ and thereby on $\Delta D(u_i,n_j)$ for the most critical sink $u_i$ on $n_j$.

b) As a driver of its most critical net $n_k$, the effect of $u$'s displacement on the delay of its most critical interconnect is $\Delta D_b(u) = \Delta D(u,n_k)$, given by Eqn. 11.

Based on the above, the cost of an arc $e$ emanating from $u$ (i.e., its unit-flow cost) is:

$$\mathrm{cost}(e) = \frac{\Delta D_a(u) + \Delta D_b(u)}{\mathrm{cap}(e)} \cdot \frac{1}{S_a(u)^{\kappa}}$$

Note that $S_a(u) = S_a(n_j) = S_a(n_k)$, as $n_j$ and $n_k$ lie on the max-delay path through $u$, and $\kappa$ is a tunable exponent that magnifies or shrinks the cost differences among arcs emanating from cells connected to critical and non-critical nets; $\kappa = 2$ gives us the best overall results.
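The case analysis for $\Delta D_a(u)$ and the unit-flow cost can be combined in one small Python function. This sketch takes the sensitivities as precomputed inputs (for example from Eqns. 7-11), reads the cost formula as dividing the sum $\Delta D_a(u) + \Delta D_b(u)$ by $\mathrm{cap}(e)$, and assumes the allocated slack $S_a(u)$ is positive; the function and parameter names are hypothetical.

```python
def arc_cost(cap, slack, dD_b, is_most_critical_sink,
             dD_full=0.0, dD1=0.0, dD3b_crit=0.0, kappa=2):
    """Unit-flow cost of an arc e leaving cell u.

    cap: cap(e); slack: allocated slack S_a(u) (assumed > 0).
    dD_b: Delta D(u, n_k) for u as driver of n_k (case b).
    dD_full: Delta D(u, n_j), used if u is the most critical sink of n_j (case a-i).
    dD1, dD3b_crit: Delta D1(u, n_j) and Delta D3b(u_i, n_j) for the most
        critical sink u_i, used otherwise (case a-ii).
    kappa: slack exponent; the text reports kappa = 2 worked best.
    """
    if cap <= 0 or slack <= 0:
        raise ValueError("capacity and allocated slack must be positive")
    dD_a = dD_full if is_most_critical_sink else dD1 + dD3b_crit
    return (dD_a + dD_b) / cap / slack ** kappa
```

With $\kappa = 2$, halving a cell's allocated slack quadruples the cost of every arc leaving it, which is how the exponent sharpens the separation between arcs of cells on critical and non-critical nets.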

4.4 Tackling Illegalities in the Incremental Placement DOP

As mentioned earlier, the core incremental detailed placement problem is a DOP, and thus certain illegalities are introduced when it is solved with a continuous optimization method like network flow. We discuss two main illegality issues and the in-processing techniques we have developed for them, i.e., techniques that work simultaneously with the network-flow algorithm.

4.4.1 Discrete flow requirement in vertical arcs

Figure 4(b) shows a vertical arc $(u,v)$ from cell $u$ to cell $v$ with capacity $w(u) = 5$ and unit-flow cost $c_1$. This arc models the possible movement of $u$ to the row immediately above it (and thus to the position of $v$). Any flow along $(u,v)$ must be physically interpreted as moving $u$ to $v$'s location, since any position between its current position and $v$'s is illegal. Thus the flow through $(u,v)$ must be exactly either 0 (no movement of $u$) or $w(u) = 5$. Furthermore, any flow of $x < w(u)$ through $(u,v)$ incurs an inaccurately low cost of $x \times c_1$ rather than the "full cost" of $w(u) \times c_1$ incurred by actually moving $u$ to $v$'s position. The resulting inaccuracies in the cell movements implied by such flows are shown in Figs. 4(b-c). We rectify these inaccuracies by initially giving $(u,v)$ a capacity of 1 and a cost of $w(u) \times c_1$ (the full cost), as illustrated in Fig. 4(d). When a flow of 1 passes through $(u,v)$, correctly incurring the full cost of $(u,v)$, we update $(u,v)$'s capacity to