Lei He

University of California, Riverside, Riverside, CA, USA

Publications (185) · 52.14 Total impact

  • Conference Proceeding: Robust FPGA resynthesis based on fault-tolerant Boolean matching.
    2008 International Conference on Computer-Aided Design (ICCAD'08), November 10-13, 2008, San Jose, CA, USA; 01/2008
  • Conference Proceeding: Trace-based framework for concurrent development of process and FPGA architecture considering process variation and reliability.
    Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, FPGA 2008, Monterey, California, USA, February 24-26, 2008; 01/2008
  • Conference Proceeding: Efficient decoupling capacitance budgeting considering operation and process variations
    ABSTRACT: This paper solves the variation-aware on-chip decoupling capacitance (decap) budgeting problem. Unlike previous work assuming the worst-case current load, we develop a novel stochastic current model, which efficiently and accurately captures operation variation such as temporal correlation between clock cycles and logic-induced correlation between ports. The model also considers current variation due to process variation with spatial correlation. We then propose an iterative alternative programming algorithm to solve the decap budgeting problem under the stochastic current model. Experiments using industrial examples show that, compared with the baseline model which assumes maximum currents at all ports and under the same decap area constraint, the model considering temporal correlation reduces the noise by up to 5×, and the model considering both temporal and logic-induced correlations reduces the noise by up to 17×. Compared with the model using deterministic process parameters, considering process variation (Leff variation in this paper) reduces the mean noise by up to 4× and the 3σ noise by up to 13×. While existing stochastic optimization has been used mainly for process variation, this paper, to the best of our knowledge, is the first in-depth study on stochastic optimization taking into account both operation and process variations for power network design. We convincingly show that considering operation variation is highly beneficial for power integrity optimization and that it should be researched for optimizing signal and thermal integrity as well.
    Computer-Aided Design, 2007. ICCAD 2007. IEEE/ACM International Conference on; 12/2007
  • Conference Proceeding: Device and architecture concurrent optimization for FPGA transient soft error rate
    Yan Lin, Lei He
    ABSTRACT: Late CMOS scaling reduces device reliability, and existing work has studied the permanent SER (soft error rate) for configuration memory in FPGA extensively. In this paper, we show that continuous CMOS scaling dramatically increases the significance of FPGA chip-level transient soft errors in circuit elements other than configuration memory, and transient SER can no longer be ignored. We then develop an efficient, yet accurate, transient SER evaluation method, called trace-based methodology, considering logic, electrical and latch-window maskings. By collecting traces on logic probability and sensitivity and re-using these traces for different device settings, we finally perform device and architecture concurrent optimization considering hundreds of device and architecture combinations. Compared to the commonly used FPGA architecture and device settings, device and architecture concurrent optimization can reduce the transient SER by 2.8× and reduce the product of energy, delay and transient SER by 1.8×.
    Computer-Aided Design, 2007. ICCAD 2007. IEEE/ACM International Conference on; 12/2007
  • Conference Proceeding: Exploiting symmetry in SAT-based boolean matching for heterogeneous FPGA technology mapping
    ABSTRACT: The Boolean matching problem is a key procedure in technology mapping for heterogeneous field programmable gate arrays (FPGA), and SAT-based Boolean matching (SAT-BM) provides a highly flexible solution for various FPGA architectures. However, the computational complexity of state-of-the-art SAT-BM prohibits its practical application. In this paper we propose an efficient SAT-BM algorithm by exploring function and architectural symmetries. While the most recent work obtained up to 13× speedup, we achieve up to 200× speedup, when both are compared to the original SAT-BM algorithm.
    Computer-Aided Design, 2007. ICCAD 2007. IEEE/ACM International Conference on; 12/2007
  • Conference Proceeding: Design, synthesis and evaluation of heterogeneous FPGA with mixed LUTs and macro-gates
    ABSTRACT: Small gates, such as AND2, XOR2 and MUX2, have been mixed with lookup tables (LUTs) inside the programmable logic block (PLB) to reduce area and power and increase performance in FPGAs. However, it is unclear whether incorporating macro-gates with wide inputs inside PLBs is beneficial. In this paper, we first propose a methodology to extract a small set of logic functions that are able to implement a large portion of functions for given FPGA applications. Assuming that the extracted logic functions are implemented by macro-gates in PLBs, we then develop a complete synthesis flow for such heterogeneous PLBs with mixed LUTs and macro-gates. The flow includes a cut-based delay and area optimized technology mapping, a mixed binary integer and linear programming based area recovery algorithm to balance the resource utilization of macro-gates and LUTs for area-efficient packing, and a SAT-based packing. We finally evaluate the proposed heterogeneous FPGA using the newly developed flow and show that mixing LUT and macro-gates, both with 6 inputs, improves performance by 16.5% and reduces logic area by 30% compared to using merely 6-input LUTs.
    Computer-Aided Design, 2007. ICCAD 2007. IEEE/ACM International Conference on; 12/2007
  • Conference Proceeding: Temperature aware microprocessor floorplanning considering application dependent power load
    ABSTRACT: This paper studies microprocessor floorplanning considering thermal and throughput optimization. We first develop a stochastic heat diffusion model taking into account the application dependent power load for thermal analysis. Then, we design the floorplanning algorithm based on this model. Experimental results show that, compared with the deterministic heat diffusion model, our model obtains up to 3.2 °C reduction of the on-chip peak temperature, 1.25% reduction of the area, and 1.125× better CPI (cycles per instruction) performance, respectively. Compared with temperature aware floorplanning in the HOTSPOT tool set that ignores interconnect pipelining, our algorithm is up to 27× faster, reduces the peak temperature by up to 3 °C, and also reduces CPI significantly with a negligible area overhead.
    Computer-Aided Design, 2007. ICCAD 2007. IEEE/ACM International Conference on; 12/2007
  • Article: TermMerg: An Efficient Terminal-Reduction Method for Interconnect Circuits
    ABSTRACT: In this paper, a novel method to efficiently reduce the terminal number of general linear-interconnect circuits with a large number of input or output terminals considering delay uncertainty is proposed. Our new algorithm is motivated by the fact that terminal reduction can lead to a more compact order-reduced model and the observation that very large-scale integration interconnect circuits have many similar terminals in terms of their timing and delay metrics due to their closeness in structure or due to the mathematical discretization using meshing in finite-difference or finite-element scheme during the extraction process. The new method, called TermMerg (Proc. ICCAD, p. 821, 2005), is based on the moments of the circuits as the metrics for the timing or delay. It then employs a singular-value-decomposition (SVD) method to determine the best number of clusters based on the low-rank approximation. After this, the k-means clustering algorithm is used to cluster the moments of the terminals into the different clusters. The proposed method can work with any passive-model order reduction and ensures passive models. In contrast, we show that singular value decomposition model order reduction (SVDMOR) does not generate passive models in general. Passivity enforcement in SVDMOR will significantly hamper the terminal-reduction effectiveness. Experimental results on a number of real industry interconnect circuits demonstrate the effectiveness of the proposed method and also show that the proposed method is more accurate than SVDMOR when the used moment matrix does not give good terminal correlations.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 09/2007; · 1.27 Impact Factor
  • Article: Device and Architecture Cooptimization for FPGA Power Reduction
    ABSTRACT: Device optimization considering supply voltage Vdd and threshold voltage Vt has little chip-area increase but a great impact on power and performance in the nanometer technology. This paper studies simultaneous evaluation of device and architecture optimization for field-programmable gate arrays (FPGAs). We first develop an efficient yet accurate timing and power evaluation method called a trace-based model. By collecting trace information from a cycle-accurate simulation of placed and routed FPGA benchmark circuits and reusing the trace for different Vdds and Vts, we enable device and architecture cooptimization considering hundreds of device and architecture combinations. Compared to the baseline FPGA architecture, which uses the Versatile Place and Route architecture model and the same lookup table and cluster sizes as those used by the Xilinx Virtex-II, Vdd suggested by the International Technology Roadmap for Semiconductors, and Vt optimized with respect to the preceding architecture and Vdd, architecture and device cooptimization can reduce the energy-delay product (ED) by 20.5% and the chip area by 23.3%. Furthermore, considering the power gating of unused logic blocks and interconnect switches (in this case, sleep transistor size is a parameter of device tuning), our cooptimization reduces ED by 55.0% and the chip area by 8.2% compared to the baseline FPGA architecture. To the best of our knowledge, this is the first in-depth study in the literature on architecture and device cooptimization for FPGAs.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 08/2007; · 1.27 Impact Factor
  • Article: Microarchitecture Configurations and Floorplanning Co-Optimization
    ABSTRACT: Microarchitecture configurations and floorplanning are keys to boost throughput, and they are strongly related. In this paper, we propose a new method to optimize them simultaneously. We first concentrate on floorplanning under given microarchitecture configurations. In addition to the objectives of conventional floorplanning methods, we minimize the throughput degradation caused by pipelined global interconnects based on efficient yet accurate models for microarchitecture throughput over pipeline stages of global interconnects. Our results show that an accurate trajectory piecewise-linear (TPWL) model incurs more offline setup time to obtain 13% better throughput than a rough access ratio-based model, and both models lead to much better throughput (up to 64% higher) compared with conventional floorplanning methods. We then build a unified throughput model parameterized for pipelined global interconnects and microarchitecture configurations based on the TPWL method and apply this model to efficiently explore over one million microarchitecture configurations and corresponding floorplan variations. We obtain microarchitecture configurations and floorplans with throughput 26.9% better than manually chosen microarchitecture followed by automatic floorplanning in a very recent paper.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 08/2007; · 1.22 Impact Factor
  • Article: Circuit-Simulated Obstacle-Aware Steiner Routing
    ABSTRACT: This article develops circuit-simulated routing algorithms. We model the routing graph by an RC network with terminals as inputs, and show that the faster an output reaches its peak, the higher the possibility for the corresponding Hanan or escape node to become a Steiner point. This enables us to select Steiner points and then apply any minimum spanning tree algorithm to obtain obstacle-free or obstacle-aware Steiner routing. Compared with existing algorithms, our algorithms have significant gain on either wirelength or runtime for obstacle-free routing, and on both wirelength and runtime for obstacle-aware routing.
    ACM Transactions on Design Automation of Electronic Systems 08/2007; 12(3). · 0.81 Impact Factor
  • Conference Proceeding: Off-chip Decoupling Capacitor Allocation for Chip Package Co-Design
    Hao Yu, Chunta Chu, Lei He
    ABSTRACT: Off-chip decoupling capacitor (decap) allocation is a demanding task during package and chip codesign. Existing approaches cannot handle large I/O counts and large numbers of legal decap positions. In this paper, we propose a fast decoupling capacitor allocation method. By applying spectral clustering, a small number of principal I/Os can be found. Accordingly, the large power supply network is partitioned into several blocks, each with only one principal I/O. This enables a localized macromodeling for each block by a triangular-structured reduction. In addition, to systematically consider a large legal position map in a manageable fashion, the map of legal positions is decomposed into multiple rings, which are further parameterized in each block. The decaps are then allocated according to the sensitivity obtained from the parameterized macromodel for each block. Compared to the PRIMA-based macromodeling, experiments show that our method (TBS2) is 25X faster and has 3.04X smaller error. Moreover, our decap allocation reduces the optimization time by 97X, and reduces decap cost by up to 16% to meet the same power-integrity target.
    Design Automation Conference, 2007. DAC '07. 44th ACM/IEEE; 07/2007
  • Conference Proceeding: Non-Linear Statistical Static Timing Analysis for Non-Gaussian Variation Sources
    ABSTRACT: Existing statistical static timing analysis (SSTA) techniques suffer from limited modeling capability by using a linear delay model with Gaussian distribution, or have scalability problems due to expensive operations involved to handle non-Gaussian variation sources or non-linear delays. To overcome these limitations, we propose a novel SSTA technique to handle both nonlinear delay dependency and non-Gaussian variation sources simultaneously. We develop efficient algorithms to perform all statistical atomic operations (such as max and add) via either closed-form formulas or one-dimensional lookup tables. The resulting timing quantity provably preserves the correlation with variation sources to the third-order. We prove that the complexity of our algorithm is linear in both variation sources and circuit sizes, hence our algorithm scales well for large designs. Compared to Monte Carlo simulation for non-Gaussian variation sources and nonlinear delay models, our approach predicts all timing characteristics of circuit delay with less than 2% error.
    Design Automation Conference, 2007. DAC '07. 44th ACM/IEEE; 07/2007
  • Article: Simultaneous Buffer Insertion and Wire Sizing Considering Systematic CMP Variation and Random Leff Variation
    ABSTRACT: This paper presents extensions of the dynamic-programming (DP) framework to consider buffer insertion and wire-sizing under effects of process variation. We study the effectiveness of this approach to reduce timing impact caused by chemical-mechanical planarization (CMP)-induced systematic variation and random Leff process variation in devices. We first present a quantitative study on the impact of CMP on interconnect parasitics. We then introduce a simple extension to handle CMP effects in the buffer insertion and wire sizing problem by simultaneously considering fill insertion (SBWF). We also tackle the same problem but with random Leff process variation (vSBWF) by incorporating statistical timing into the DP framework. We develop an efficient yet accurate heuristic pruning rule to approximate the computationally expensive statistical problem. Experiments under conservative assumption on process variation show that the SBWF algorithm obtains 1.6% timing improvement over the variation-unaware solution. Moreover, our statistical vSBWF algorithm results in 43.1% yield improvement on average. We also show that our approaches have polynomial time complexity with respect to the net-size. The proposed extensions on the DP framework are orthogonal to other power/area-constrained problems under the same framework, which have been extensively studied in the literature.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 06/2007; · 1.27 Impact Factor
  • Article: Efficient In-Package Decoupling Capacitor Optimization for I/O Power Integrity
    Jun Chen, Lei He
    ABSTRACT: With high integration density of today's electronic system and reduced noise margins, maintaining high power integrity becomes more challenging for high performance design. Inserting decoupling capacitors is one important and effective solution to improve the power integrity. The existing decoupling capacitor optimization approaches meet constraints on input impedance. In this paper, we show that impedance metric leads to large overdesign and then develop a noise-driven optimization algorithm for decoupling capacitors in packages for power integrity. We use the simulated annealing algorithm to minimize the total cost of decoupling capacitors under the constraints of a worst case noise bound. The key enabler for efficient optimization is an incremental worst case noise computation based on fast Fourier transform over incremental impedance matrix evaluation. Compared to the existing impedance-based approaches, our algorithm reduces the decoupling capacitor cost by 3× and is also more than 10× faster even with explicit noise computation.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 05/2007; · 1.27 Impact Factor
  • Article: Field Programmability of Supply Voltages for FPGA Power Reduction
    Fei Li, Y. Lin, Lei He
    ABSTRACT: Power reduction is of growing importance for field-programmable gate arrays (FPGAs). In this paper, we apply programmable supply voltage (Vdd) to reduce FPGA power. We first design FPGA logic fabrics using dual-Vdd levels and show that field-programmable power supply is required to obtain a satisfactory power-versus-performance tradeoff. We further design FPGA interconnect fabrics for fine-grained Vdd programmability with minimal increase of the number of configuration static-random-access-memory cells. With a simple yet practical computer-aided design flow to leverage the field-programmable dual-Vdd logic and interconnect fabrics, we carry out a highly quantitative study using placed and routed benchmark circuits, and delay, power, and area models obtained from detailed circuit designs. Compared to single-Vdd FPGAs with the Vdd level suggested by the International Technology Roadmap for Semiconductors for 100-nm technology, field-programmable dual-Vdd FPGAs reduce the total power by 47.61% and the energy-delay product by 27.36%.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 05/2007; · 1.27 Impact Factor
  • Article: Probabilistic Transitive-Closure Ordering and Its Application on Variational Buffer Insertion
    Jinjun Xiong, Lei He
    ABSTRACT: We propose a provably transitive-closure ordering rule with theoretical foundations to prune suboptimal design solutions in the presence of process variations. As an example, this probabilistic ordering rule is applied to develop an efficient variational buffering algorithm. Compared to the conventional deterministic approach, variational buffering improves the parametric timing yield by 15.7% on average. This transitive-closure ordering rule may be leveraged to solve other computer-aided-design problems considering process variation effects.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 05/2007; · 1.27 Impact Factor
  • Article: Robust Extraction of Spatial Correlation
    ABSTRACT: The increased variability of process parameters makes it important yet challenging to extract the statistical characteristics and spatial correlation of process variation. Recent progress in statistical static-timing analysis also makes the extraction important for modern chip designs. Existing approaches either extract only a deterministic component of spatial variation or do not consider the actual difficulties in computing a valid spatial-correlation function, ignoring the fact that not every function and matrix can be used to describe the spatial correlation. Applying mathematical theories from random fields and convex analysis, we develop: 1) a robust technique to extract a valid spatial-correlation function by solving a constrained nonlinear optimization problem and 2) a robust technique to extract a valid spatial-correlation matrix by employing a modified alternative-projection algorithm. Our novel techniques guarantee to extract a valid spatial-correlation function and matrix from measurement data, even if those measurements are affected by unavoidable random noise. Experimental results, obtained from data generated by a Monte Carlo model, confirm the accuracy and robustness of our techniques and show that we are able to recover the correlation function and matrix with very high accuracy even in the presence of significant random noise.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 05/2007; · 1.27 Impact Factor
  • Conference Proceeding: Minimal skew clock embedding considering time variant temperature gradient.
    Proceedings of the 2007 International Symposium on Physical Design, ISPD 2007, Austin, Texas, USA, March 18-21, 2007; 01/2007
  • Conference Proceeding: Fast dual-vdd buffering based on interconnect prediction and sampling.
    The Ninth International Workshop on System-Level Interconnect Prediction (SLIP 2007), Austin, Texas, USA, March 17-18, 2007, Proceedings; 01/2007

Institutions

  • 2005–2013
    • University of California, Riverside
      • Department of Electrical Engineering
      Riverside, CA, USA
  • 1995–2011
    • University of California, Los Angeles
      • Department of Electrical Engineering
      • Department of Computer Science
      Los Angeles, CA, USA
  • 2003–2009
    • Tsinghua University
      • Department of Computer Science and Technology
      Beijing, Beijing Shi, China
  • 2006–2007
    • Synopsys
      Mountain View, CA, USA
  • 2001–2003
    • University of Wisconsin, Madison
      • Department of Electrical and Computer Engineering
      Madison, WI, USA