Conference Paper

A network-flow approach to timing-driven incremental placement for ASICs.

DOI: 10.1109/ICCAD.2006.320061
Conference: 2006 International Conference on Computer-Aided Design (ICCAD'06), November 5-9, 2006, San Jose, CA, USA
Source: DBLP

ABSTRACT We present a novel incremental placement methodology called FlowPlace for significantly reducing critical path delays of placed standard-cell circuits. FlowPlace includes: a) a timing-driven (TD) analytical global placer, TAN, that uses accurate delay functions and minimizes a combination of linear and quadratic objective functions; b) a network-flow-based detailed placer, TIF, that has new and effective techniques for performing TD incremental placement and satisfying row-length (white space) constraints. We have obtained results on three sets of benchmarks: i) TD versions of the ibm benchmark suite that we have constructed; ii) the benchmarks used in TD-Dragon; iii) the Faraday benchmarks. Results show that, starting with Dragon-placed circuits, we obtain up to 34% and an average of 18% improvement in critical path delay, at an average of 17.5% of the run time of the Dragon placer. Starting with the state-of-the-art TD placer TD-Dragon, on the TD-Dragon benchmarks we obtain up to about 10% and an average of 4.3% delay improvement with 12% of TD-Dragon's run time; this is significant because we are extracting performance improvements from a performance-optimized layout. Wire length deterioration, averaged over all benchmark suites, is less than 8%.

    • "Our method tries to improve the critical path delay by minimizing the objective function proposed in [9]. A timing cost $t_c(n_i)$ of a net $n_i$ is defined as [9]: $t_c(n_i) = \sum_{u_j \in CS(n_i)}$ "
    ABSTRACT: We propose a timing-driven discrete cell-sizing algorithm that can address total cell size and/or leakage power constraints. We model cell sizing as a “discretized” min-cost network flow problem, wherein the available sizes of each cell are modeled as nodes. Flow passing through a node indicates the choice of the corresponding cell size, and the total flow cost reflects the timing objective function value corresponding to these choices. Compared to other discrete optimization methods for cell sizing, our method can obtain near-optimal solutions in a time-efficient manner. We tested our algorithm on ISCAS’85 benchmarks and compared our results to those produced by an optimal dynamic programming- (DP-) based method. The results show that, compared to the optimal method, the improvement to an initial sizing solution obtained by our method is only 1% (3%) worse when using a 180 nm (90 nm) library, while our method is 40–60 times faster. We also obtained results for ISPD’12 cell-sizing benchmarks, under a leakage power constraint, and compared them to those of a state-of-the-art approximate DP method (optimal DP runs out of memory for the smallest of these circuits). Our results show that we are only 0.9% worse than the approximate DP method, while being more than twice as fast.
    VLSI Design 05/2013; 2013. DOI:10.1155/2013/474601
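
The idea of modeling the available sizes of each cell as flow nodes can be illustrated with a toy example. The sketch below is not the cited paper's formulation: it assumes a single chain of cells, a made-up integer delay model (`input_weight`, `stage_weight`, and `output_load` are hypothetical parameters), and no area or leakage constraint. Each size option becomes a split node with capacity 1, one unit of flow is routed from source to sink, and the option nodes carrying flow give the chosen sizes. With unit flow the problem reduces to a shortest path, but the solver is a generic successive-shortest-paths min-cost flow routine.

```python
INF = float("inf")

def min_cost_flow(num_nodes, edges, source, sink, demand):
    """Successive shortest paths with Bellman-Ford.
    edges: list of (tail, head, capacity, cost). Returns (cost, flow per edge)."""
    adj = [[] for _ in range(num_nodes)]
    arcs = []  # [head, residual capacity, cost]; arc i ^ 1 is the reverse of arc i
    for u, v, cap, cost in edges:
        adj[u].append(len(arcs)); arcs.append([v, cap, cost])
        adj[v].append(len(arcs)); arcs.append([u, 0, -cost])
    sent = total = 0
    while sent < demand:
        dist = [INF] * num_nodes
        pred = [-1] * num_nodes  # arc used to reach each node
        dist[source] = 0
        for _ in range(num_nodes - 1):
            for u in range(num_nodes):
                if dist[u] == INF:
                    continue
                for a in adj[u]:
                    v, cap, cost = arcs[a]
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        pred[v] = a
        if dist[sink] == INF:
            raise ValueError("demand cannot be routed")
        # Find the bottleneck along the path, then augment.
        push, v = demand - sent, sink
        while v != source:
            push = min(push, arcs[pred[v]][1])
            v = arcs[pred[v] ^ 1][0]
        v = sink
        while v != source:
            arcs[pred[v]][1] -= push
            arcs[pred[v] ^ 1][1] += push
            v = arcs[pred[v] ^ 1][0]
        sent += push
        total += push * dist[sink]
    # Flow on edge i equals the residual capacity of its reverse arc.
    return total, [arcs[2 * i + 1][1] for i in range(len(edges))]

def size_chain(sizes, num_cells, input_weight, stage_weight, output_load):
    """Pick one size per cell in a chain (toy delay model, hypothetical)."""
    def stage_delay(drive, load):
        return stage_weight * load // drive
    source, sink = 0, 1
    def node_in(k, j):
        return 2 + 2 * (k * len(sizes) + j)
    def node_out(k, j):
        return node_in(k, j) + 1
    edges, option_edge = [], {}
    for k in range(num_cells):
        for j, s in enumerate(sizes):
            option_edge[(k, j)] = len(edges)
            edges.append((node_in(k, j), node_out(k, j), 1, 0))  # size-option node
            if k == 0:  # larger first cell loads the upstream driver more
                edges.append((source, node_in(k, j), 1, input_weight * s))
            if k == num_cells - 1:
                edges.append((node_out(k, j), sink, 1, stage_delay(s, output_load)))
            else:
                for j2, s2 in enumerate(sizes):
                    edges.append((node_out(k, j), node_in(k + 1, j2), 1,
                                  stage_delay(s, s2)))
    num_nodes = 2 + 2 * num_cells * len(sizes)
    cost, flow = min_cost_flow(num_nodes, edges, source, sink, 1)
    chosen = [next(sizes[j] for j in range(len(sizes))
                   if flow[option_edge[(k, j)]] == 1)
              for k in range(num_cells)]
    return cost, chosen

if __name__ == "__main__":
    print(size_chain([1, 2, 4], 3, 9, 12, 16))
```

Splitting each option into an in/out pair is what lets "flow through a node" be read off as flow on a single arc. Handling a total size or leakage budget, as the abstract describes, would need additional structure that this sketch does not attempt.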
    • "For example, [15] takes ~1hr on problems with only 12,000 cells compared to a runtime of ~1min for RePlace on 50,000 BLEs. Also, [16] is ~6 times faster than full ASIC placement but only ~1% of cells are moved, compared to RePlace being ~7 times faster than full FPGA placement when moving up to 2/3 of 50,000 BLEs. Diffusion [14] is fast, but it computes costs in the inner loop and may sacrifice quality compared to replacing from scratch or performing a final global anneal as done in RePlace. "
    ABSTRACT: Recompiling a large circuit after making a few logic changes is a time-consuming process. We present an incremental placement algorithm for FPGAs that is focused on extremely fast runtime for changes that can be localized. It is capable of handling multiple changes across large regions of an FPGA. This is especially useful when used with a floorplan where a modified subcircuit is instantiated several times in the design hierarchy or where several subcircuits are modified. The algorithm is simpler and faster than past approaches because its insertion and legalization steps are based on CPU-efficient shifting steps that do not continuously evaluate the impact of each move on costs. Instead, any lost quality is recovered by a fast, low-temperature anneal at the end. When 35,000 out of 50,000 LUTs are modified, the incremental placement (including the fast anneal) is 7 times faster than VPR's "fast placement" from scratch, with only 2% quality degradation. The key concepts used in the incremental placement algorithm include floorplanning constraints, CPU-efficient CLB shifting, a super placement grid, and a tuned annealing refinement process.
    Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on; 10/2009
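
A shifting-based insertion step of the kind the abstract describes, one that legalizes without evaluating placement cost per move, might look roughly like the following. This is a one-dimensional sketch under assumed conventions, not the RePlace algorithm: a row is a list of hypothetical `(name, x, width)` tuples, and `insert_with_shifting` is an illustrative name. Cells are pushed right just far enough to clear their left neighbor, then pulled back left if the push ran past the end of the row.

```python
def insert_with_shifting(row, new_cell, row_width):
    """Insert new_cell into a row and remove overlaps by shifting only.

    row: non-overlapping cells as (name, x, width) with x >= 0.
    new_cell: (name, desired_x, width).
    No wirelength or timing cost is evaluated; quality would have to be
    recovered by a later refinement step.
    """
    cells = sorted(row + [new_cell], key=lambda c: c[1])
    if sum(width for _, _, width in cells) > row_width:
        raise ValueError("row has too little white space for the new cell")

    # Left-to-right pass: push each cell right until it clears its neighbor.
    placed = []
    cursor = 0
    for name, x, width in cells:
        x = max(x, cursor)
        placed.append([name, x, width])
        cursor = x + width

    # Right-to-left pass: pull cells back if they were pushed past the row end.
    limit = row_width
    for cell in reversed(placed):
        cell[1] = min(cell[1], limit - cell[2])
        limit = cell[1]

    return [tuple(cell) for cell in placed]

if __name__ == "__main__":
    row = [("a", 0, 4), ("b", 4, 4), ("c", 10, 4)]
    print(insert_with_shifting(row, ("n", 3, 4), 20))
```

Both passes are linear in the number of cells in the row, which is the point of the approach: the work per insertion is small because no cost function is consulted.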
    • "). 4. Connect the TSG to the DPG via placement injection arcs so that for each cell in a net structure the amount of flow equal to its chosen transform option size is sent to its chosen global placement position in the DPG (see Sec. 7). 5. Translate the given total cell size constraint to a white-space constraint on the detailed placement process that is satisfied by flows through the DPG using the algorithm of [5]. 6. "
    ABSTRACT: We propose a post-placement physical synthesis algorithm that can apply multiple circuit synthesis and placement transforms on a placed circuit to improve the critical path delay under area constraints by simultaneously considering the benefits and costs of all transforms (as opposed to considering them sequentially after applying each transform). The circuit transforms we employ include, but are not limited to, incremental placement, two types of buffer insertion, cell resizing and cell replication. The problem is modeled as a min-cost network flow problem, in which nodes represent circuit transform options. By carefully determining the structure of the network graph and the cost of each arc, a set of near-optimal transform options can be obtained as those whose corresponding nodes in the network graph have the min-cost flow passing through them. We also tie the transform selection network graph to a detailed placement network graph with TD arc costs for cell movements. This enables our algorithms to incorporate considerations of detailed placement cost for each synthesis transform along with the basic cost of applying the transform in the circuit. We have tested our algorithms on three sets of benchmarks under 3-10% area increase constraints, and obtained up to 48% and an average of 27.8% timing improvement. Our average improvement is relatively 40% better (8.2% better by an absolute measure) than applying the same set of transforms in a good sequential order that is used in many current techniques. Considering only synthesis transforms (no replacement), our technique is relatively 50% better than the sequential approach.
    2008 International Conference on Computer-Aided Design (ICCAD'08), November 10-13, 2008, San Jose, CA, USA; 01/2008