-
ABSTRACT: With its advantages in wirelength reduction and routing flexibility over conventional Manhattan routing, the X architecture
has been proposed and applied to modern IC design. As a critical part of high-performance integrated circuits, clock network
design faces great challenges as feature sizes decrease and clock frequencies increase. To eliminate the delay and
attenuation of the clock signal introduced by vias, and to make the clock network more tolerant to process variations, in this paper we
propose a single-layer zero-skew clock routing algorithm in the X architecture (called Planar-CRX). Our Planar-CRX method
integrates the extended deferred-merge embedding algorithm (DME-X, which extends the DME algorithm to the X architecture) with
a modified version of Ohtsuki’s line-search algorithm to minimize the total wirelength and the number of bends. Compared with planar clock routing
in the Manhattan plane, our method achieves an average reduction of 6.81% in total wirelength and produces a clock
tree with fewer bends. Experimental results also indicate that our solution is comparable to previous non-planar zero-skew
clock routing algorithms.
Science in China Series F Information Sciences 04/2012; 52(8):1466-1475. · 0.66 Impact Factor
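DME, which Planar-CRX extends, builds a zero-skew tree bottom-up; for each pair of subtrees being merged it computes the wire lengths that balance their delays. A minimal sketch of that balancing step, assuming the Elmore delay model, Manhattan wire lengths, and uniform per-unit wire resistance r and capacitance c, is the classic exact zero-skew tapping-point formula below. It is not the DME-X or planar routing step of the paper, and Subtree and zero_skew_merge are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class Subtree:
    delay: float  # Elmore delay from the subtree root to every sink (zero skew)
    cap: float    # total downstream capacitance seen at the subtree root


def _elongated_length(extra_delay, load_cap, r, c):
    # Solve r*w*(c*w/2 + load_cap) = extra_delay for the wire length w >= 0.
    return (-r * load_cap + math.sqrt((r * load_cap) ** 2 + 2 * r * c * extra_delay)) / (r * c)


def zero_skew_merge(s1, s2, length, r, c):
    """Merge two zero-skew subtrees joined by a wire of the given length (> 0).

    Returns (x, w1, w2, merged): x is the tapping-point fraction measured from s1,
    w1/w2 are the wire lengths from the tapping point to s1/s2, and merged
    describes the new subtree rooted at the tapping point.
    """
    # Balance r*w1*(c*w1/2 + C1) + t1 == r*w2*(c*w2/2 + C2) + t2 with w1 = x*L, w2 = (1-x)*L.
    x = (s2.delay - s1.delay + r * length * (c * length / 2 + s2.cap)) / (
        r * length * (c * length + s1.cap + s2.cap)
    )
    if 0.0 <= x <= 1.0:
        w1, w2 = x * length, (1.0 - x) * length
    elif x < 0.0:
        # s1 is much slower: tap directly at s1 and elongate (snake) the wire to s2.
        w1 = 0.0
        w2 = _elongated_length(s1.delay - s2.delay, s2.cap, r, c)
    else:
        # s2 is much slower: tap directly at s2 and elongate the wire to s1.
        w2 = 0.0
        w1 = _elongated_length(s2.delay - s1.delay, s1.cap, r, c)
    delay = s1.delay + r * w1 * (c * w1 / 2 + s1.cap)
    cap = s1.cap + s2.cap + c * (w1 + w2)
    return x, w1, w2, Subtree(delay, cap)
```

Two identical sinks joined by a 10-unit wire are tapped at the midpoint (x = 0.5); when one subtree is far slower, x falls outside [0, 1] and the wire to the faster subtree is elongated instead.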
-
ABSTRACT: Technology mapping and placement have a significant impact on delays in standard cell-based very large scale integrated circuits. Traditionally, these steps are applied separately to optimize delays, possibly because efficient algorithms that allow simultaneous exploration of the mapping and placement solution spaces are unknown. In this paper, we present an exact polynomial-time algorithm for delay-optimal placement of a tree and extend it to simultaneous technology mapping and placement for optimal delay in the tree. We further extend the algorithm with the Lagrangian relaxation technique, which assesses the timing criticality of paths beyond a tree, to optimize delays in directed acyclic graphs. Experimental results on benchmark circuits in a 70 nm technology show that our algorithms improve timing significantly, with remarkably shorter runtimes than a competitive approach of iterative conventional timing-driven mapping and multilevel placement.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 04/2011; · 1.27 Impact Factor
-
Proceedings of the 2011 International Symposium on Physical Design, ISPD 2011, Santa Barbara, California, USA, March 27-30, 2011; 01/2011
-
ABSTRACT: Clock gating is one of the most effective techniques for reducing clock tree power. Although it has been studied considerably, most previous works are restricted to either the register transfer level (RTL) or the clock tree synthesis stage. Clock gating design at RTL is coarse and ignores physical information; therefore, it often results in large wirelength overhead. If clock gating is considered only at clock tree synthesis, however, the optimization space is largely limited because the register locations are already fixed. To fully use the logical and physical information between registers, we propose a new flow for low-power gated clock tree design in this work. It mainly includes three parts: gated clock tree aware register placement, gated clock tree construction, and incremental placement. Compared with previous works on clock gating, our algorithm reduces clock tree power with far less gating logic; therefore, the overhead to the placement is also reduced.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 01/2011; · 1.22 Impact Factor
-
ABSTRACT: The gap between VLSI technology and fabrication technology leads to strong refractive effects in lithography. Consequently, it is a huge challenge to reliably print layout features on wafers. The quality and robustness of lithography directly depend on layout patterns. It has become imperative to consider manufacturability during layout design so that the burden on the lithography process can be alleviated. In this paper, three algorithms, namely a cell flipping algorithm, a single-row optimization approach, and a multiple-row optimization approach, are proposed to tune any existing cell placement to be lithography-friendly. These algorithms are based on dynamic programming and graph-theoretic approaches, and provide different tradeoffs between critical dimension (CD) variation reduction and wirelength increase. Using lithography simulations, our experimental results demonstrate that the new approaches can obtain over 15% CD variation reduction in the post-OPC stage while introducing less than 1% additional wirelength.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 07/2010; · 1.22 Impact Factor
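The abstract says only that these algorithms rest on dynamic programming and graph-theoretic approaches. One generic way to set up such a DP for a single row, assumed here rather than taken from the paper, gives each cell two orientations (as placed or flipped), a per-cell cost (for example, a wirelength increase) and a cost for each pair of adjacent orientations (for example, a CD-variation penalty from the patterns facing each other), then minimizes the total along the row in linear time. The cost tables and the name flip_cells are hypothetical.

```python
from typing import List, Sequence, Tuple


def flip_cells(unary: Sequence[Tuple[float, float]],
               pair: Sequence[Sequence[Sequence[float]]]) -> Tuple[float, List[int]]:
    """Choose an orientation (0 = as placed, 1 = flipped) for each cell in a row.

    unary[i][o]   -- hypothetical cost of cell i in orientation o
    pair[i][a][b] -- hypothetical cost between cell i (orientation a) and cell i+1 (orientation b)
    Returns (minimum total cost, list of orientations).
    """
    n = len(unary)
    if n == 0:
        return 0.0, []
    best = [list(unary[0])]  # best[i][o]: min cost of cells 0..i with cell i in orientation o
    back = [[0, 0]]          # back[i][o]: orientation of cell i-1 on that optimal prefix
    for i in range(1, n):
        row, ptr = [], []
        for b in (0, 1):
            cands = [best[i - 1][a] + pair[i - 1][a][b] for a in (0, 1)]
            a_star = 0 if cands[0] <= cands[1] else 1
            row.append(cands[a_star] + unary[i][b])
            ptr.append(a_star)
        best.append(row)
        back.append(ptr)
    # Pick the best final orientation, then trace the choices back to cell 0.
    o = 0 if best[-1][0] <= best[-1][1] else 1
    total = best[-1][o]
    orient = [o]
    for i in range(n - 1, 0, -1):
        o = back[i][o]
        orient.append(o)
    orient.reverse()
    return total, orient
```

With a flip cost of 0.1 per cell and a penalty of 1 only when two neighbouring cells are both left as placed, a three-cell row is solved by flipping just the middle cell (total cost 0.1).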
-
ABSTRACT: As VLSI technology scales into the sub-65 nm realm, the complexity of timing optimization is drastically increased by the need to consider power and variations. Even though designers make great efforts during physical design, they often still face heavy timing violations in deep post-routing stages. For overall design convergence and timing closure, especially under current multi-corner multi-mode design, more efficient methods are needed. In this work, we propose to address this issue by exploiting useful clock skew, which can help reduce timing violations rapidly. We also add mode/corner metric balancing measures to make this method more flexible and applicable, especially in such deep stages when the CTS is ready. The results indicate that our method achieves average improvements of 33.16% in worst slack (WS) and 75.56% in total negative slack (TNS).
Quality Electronic Design (ISQED), 2010 11th International Symposium on; 04/2010
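Useful skew helps because a path's slack depends on the clock arrival times at its launch and capture flip-flops. The sketch below uses the standard single-corner setup/hold checks, ignoring clock uncertainty and the paper's multi-mode/multi-corner balancing, to show how intentionally delaying one clock shifts slack between consecutive paths and how WS and TNS are computed. The path tuple format and the names path_slacks and ws_tns are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical path record: (launch flip-flop, capture flip-flop, max delay, min delay).
Path = Tuple[str, str, float, float]


def path_slacks(paths: Sequence[Path], arrival: Dict[str, float], period: float,
                t_setup: float, t_hold: float) -> Tuple[List[float], List[float]]:
    """Setup and hold slack of each path, given per-flip-flop clock arrival times."""
    setup, hold = [], []
    for launch, capture, d_max, d_min in paths:
        # Setup: data must arrive t_setup before the next capture edge.
        setup.append(period + arrival[capture] - arrival[launch] - d_max - t_setup)
        # Hold: data must not change until t_hold after the same capture edge.
        hold.append(arrival[launch] + d_min - arrival[capture] - t_hold)
    return setup, hold


def ws_tns(slacks: Sequence[float]) -> Tuple[float, float]:
    """Worst slack and total negative slack (sum of the negative slacks only)."""
    return min(slacks), sum((s for s in slacks if s < 0.0), 0.0)
```

For example (times in ns), take paths FF1 to FF2 (max 1.2, min 0.5) and FF2 to FF3 (max 0.6, min 0.2) with a 1.0 period, 0.05 setup and 0.02 hold. Zero skew gives WS = TNS = -0.25; delaying the FF2 clock by 0.3 raises both setup slacks to 0.05 while the hold slacks stay positive (0.18 and 0.48).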
-
ABSTRACT: Gate sizing and threshold voltage (Vt) assignment are popular techniques for circuit timing and power optimization. Existing methods, by and large, are either sensitivity-driven heuristics or based on discretizing continuous optimization solutions. Sensitivity-driven heuristics are easily trapped in local optima, and the discretization may introduce significant errors. In this paper, we propose a systematic combinatorial approach for simultaneous gate sizing and Vt assignment. The core idea of this approach is joint relaxation and restriction, which employs consistency relaxation and coupled bi-directional solution search. The process of joint relaxation and restriction is conducted iteratively to systematically improve solutions. Our algorithm is compared with a state-of-the-art previous work on benchmark circuits, and its results lead to about 22% less power dissipation under the same timing constraints.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 03/2010; · 1.27 Impact Factor
-
ABSTRACT: Clock meshes have been widely used to distribute the clock signal across the chip. A clock mesh is driven by a top-level tree and a set of mesh buffers. We present fast and efficient combinatorial algorithms to simultaneously identify the candidate locations and sizes of the buffers driving the clock mesh. We show that such sizing offers a better solution than inserting buffers of uniform size across the mesh. Due to its high redundancy, a mesh architecture offers high tolerance to variations in clock skew. However, this redundancy comes at the expense of mesh wirelength and power dissipation. Based on survivable network theory, we formulate the problem of reducing the clock mesh by retaining only those edges that are critical to maintaining redundancy. This formulation offers designers the option to trade off between power and tolerance to process variations. We present efficient post-processing techniques to reduce the size of the mesh buffers after mesh reduction. Experimental results indicate that our techniques can achieve power savings of up to 28% with less than 3.3% delay penalty. We also present driver models that can help in simulating the clock mesh. These models achieve near-HSPICE accuracy with a significant runtime speedup.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 02/2010; · 1.22 Impact Factor
-
Proceedings of the 15th Asia South Pacific Design Automation Conference, ASP-DAC 2010, Taipei, Taiwan, January 18-21, 2010; 01/2010
-
Proceedings of the 2010 International Symposium on Physical Design, ISPD 2010, San Francisco, California, USA, March 14-17, 2010; 01/2010
-
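IEEE Transactions on Very Large Scale Integration (VLSI) Systems 01/2010; 18:1002-1006.
-
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 01/2010; 18:1639-1648.
-
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 01/2010; 29:1342-1353.
-
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 01/2010; 18:131-141.
-
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 01/2010; 18:1025-1035.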
-
Design, Automation and Test in Europe, DATE 2010, Dresden, Germany, March 8-12, 2010; 01/2010
-
ABSTRACT: Power/ground noise is a major source of VLSI circuit timing variations. This work aims to reduce clock network induced power noise by assigning different signal polarities (opposite switchings) to clock buffers in an existing buffered clock tree. Three assignment algorithms are proposed: 1) partitioning; 2) 2-coloring on a minimum spanning tree; and 3) recursive min-matching. A post-processing step of clock buffer sizing is performed to achieve the desired clock skew. SPICE-based experimental results indicate that our techniques could reduce the average peak current and average delay variations by 50% and 51%, respectively.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 07/2009; · 1.22 Impact Factor
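The second assignment strategy named above, 2-coloring on a minimum spanning tree, can be sketched as follows: connect the clock buffers by an MST over their locations so that nearby buffers are tree neighbours, then alternate polarities along the tree with a breadth-first traversal. Using Manhattan distance between buffer (x, y) locations and Prim's algorithm is an assumption of this sketch, the later buffer-sizing step for skew is not shown, and the function names are illustrative.

```python
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) location of a clock buffer


def manhattan(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mst_edges(points: Sequence[Point]) -> List[Tuple[int, int]]:
    """Prim's algorithm, O(n^2), on the complete graph of Manhattan distances."""
    n = len(points)
    edges: List[Tuple[int, int]] = []
    if n == 0:
        return edges
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known connection of each buffer to the tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = manhattan(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def assign_polarity(points: Sequence[Point]) -> List[int]:
    """2-color the MST so that tree neighbours get opposite polarities (0 or 1)."""
    n = len(points)
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in mst_edges(points):
        adj[u].append(v)
        adj[v].append(u)
    color = [-1] * n
    for start in range(n):
        if color[start] != -1:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:  # breadth-first traversal alternates the two polarities
            u = queue.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    queue.append(v)
    return color
```

Because an MST is a tree, it is always 2-colorable, so every pair of MST-adjacent buffers ends up with opposite polarities.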
-
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 01/2009; 28:818-825.
-
ABSTRACT: Clock gating is a popular technique for reducing power dissipation in clock networks. Although there have been numerous research efforts on clock gating, previous approaches still have a significant weakness: they usually construct a gated clock tree after cell placement, i.e., cell placement is performed without considering clock gating and may generate a solution unfriendly to subsequent gated clock tree construction. As a result, the control gates inserted during tree construction are very likely to cause cell overlap. Even though the overlap can eventually be removed in placement legalization, remarkable wirelength/power overhead is incurred. In this paper, we propose a gate planning technique that is integrated with a partition-based cell placer. During cell placement, the planning judiciously inserts clock gates based on power estimation. In addition, pseudo edges are inserted between clock gates and registers to reduce clock wirelength and enable long shut-off periods. At the end, when a relatively detailed placement is obtained, a post-processing step downgrades inefficient clock gates to clock buffers. We compared our approach with recent previous works on ISCAS89 benchmark circuits. Our method reduces clock tree wirelength and power by 22.06% and 40.80%, respectively, with a very limited increase in signal net wirelength and power compared with conventional (register-oblivious) placement. The results also indicate that our algorithm outperforms clock-gating-oblivious placement in power reduction and performance improvement.
Computer Design, 2008. ICCD 2008. IEEE International Conference on; 11/2008
-
ABSTRACT: In the nanometer regime, IC designs have to consider the impact of process variations, which is often indicated by manufacturing/parametric yield. This paper investigates a yield model: the probability that the values of multiple manufacturing/circuit parameters meet certain targets. This model can be applied to predict CMP (chemical-mechanical planarization) yield. We focus on the difficult cases that have a large number of partially correlated variations. To predict the yield for these difficult cases efficiently, we propose two techniques: (1) application of orthogonal principal component analysis (OPCA); (2) hierarchical adaptive quadrisection (HAQ). Systematic variations are also included in our model. Compared to previous work, the OPCA-based method can reduce the yield estimation error from 17.1%-21.1% to 1.3%-2.8% with a 4.6x speedup. The HAQ technique can reduce the error to 4.1%-5.6% with a 6x-9.4x speedup.
Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific; 04/2008
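The yield model above, the probability that every parameter lands inside its target window, can be written Y = P(l_i <= X_i <= u_i for all i). A brute-force Monte Carlo estimate gives a simple reference point against which faster estimators can be checked. The sketch below assumes jointly Gaussian parameters with a known mean vector and positive-definite covariance matrix; it is not the paper's OPCA or HAQ method, and the names cholesky and mc_yield are illustrative.

```python
import math
import random
from typing import List, Sequence


def cholesky(cov: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T == cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def mc_yield(mean: Sequence[float], cov: Sequence[Sequence[float]],
             lower: Sequence[float], upper: Sequence[float],
             samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(lower[i] <= X[i] <= upper[i] for all i).

    Assumption: X ~ N(mean, cov), i.e. the parameters are jointly Gaussian.
    """
    rng = random.Random(seed)
    low = cholesky(cov)
    n = len(mean)
    hits = 0
    for _ in range(samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlate the independent standard normals: x = mean + L z.
        x = [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        if all(lower[i] <= x[i] <= upper[i] for i in range(n)):
            hits += 1
    return hits / samples
```

For a single standard-normal parameter with target window [-1, 1], the estimate approaches the familiar one-sigma probability of about 0.683; the standard error shrinks only as 1/sqrt(samples), which is why faster estimators matter when many correlated parameters are involved.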