Lei He

University of California, Los Angeles, Los Angeles, California, United States


Publications (260) · 78.76 Total Impact

  • ABSTRACT: The majority of practical multivariate statistical analyses and optimizations model interdependence among random variables in terms of linear correlation. Although linear correlation is simple to use and evaluate, in several cases non-linear dependence between random variables may be too strong to ignore. In this paper, we propose polynomial correlation coefficients as a simple measure of multi-variable non-linear dependence and show that the need for modeling non-linear dependence strongly depends on the end function that is to be evaluated from the random variables. Then, we calculate the errors in estimation resulting from assuming independence of components generated by linear de-correlation techniques, such as PCA and ICA. The experimental results show that the error predicted by our method is within 1% of real simulation of statistical timing and leakage analysis. To deal with non-linear dependence, we further develop a target-function-driven component analysis algorithm (FCA) to minimize the error caused by ignoring high-order dependence. We apply FCA to statistical leakage power analysis and SRAM cell noise margin variation analysis. Experimental results show that the proposed FCA method is more accurate than traditional PCA or ICA.
    Integration the VLSI Journal 09/2014; · 0.53 Impact Factor
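The idea behind polynomial correlation coefficients can be illustrated with a minimal sketch: measure the Pearson correlation between powers of two variables. The order-(p, q) formulation and function names below are illustrative assumptions, not the paper's exact definition. For a purely even dependence y ≈ x², linear correlation is near zero while the order-(2, 1) coefficient is near one.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def poly_corr(xs, ys, p, q):
    """Polynomial correlation of order (p, q): correlation of x^p with y^q."""
    return pearson([x ** p for x in xs], [y ** q for y in ys])

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(20000)]
y = [xi ** 2 + 0.1 * random.gauss(0.0, 1.0) for xi in x]  # purely even dependence

r_linear = pearson(x, y)        # near 0: linear correlation misses the dependence
r_poly = poly_corr(x, y, 2, 1)  # near 1: the second-order term captures it
```

The contrast between `r_linear` and `r_poly` is exactly the situation where ignoring non-linear dependence would mislead a downstream analysis.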
  • IEEE Design and Test of Computers 08/2014; 22(5):6-15. · 1.62 Impact Factor
  • ABSTRACT: Statistical circuit simulation is of increasing importance for circuit design under process variations. Existing approaches cannot efficiently analyze the failure probability for circuits with a large number of variations, nor can they handle problems with multiple disjoint failure regions. The proposed rare event microscope (REscope) first reduces the problem dimension by pruning the parameters that contribute little to circuit failure. Furthermore, we apply a nonlinear classifier capable of identifying multiple disjoint failure regions. In REscope, only likely-to-fail samples are simulated and then matched to a generalized Pareto distribution. On a 108-dimension charge pump circuit in a PLL design, REscope outperforms importance sampling and achieves more than two orders of magnitude speedup over Monte Carlo. Moreover, it accurately estimates the failure rate, while importance sampling fails entirely because the failure regions are not correctly captured.
    Design Automation Conference (ASP-DAC), 2014 19th Asia and South Pacific; 01/2014
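The tail-matching step described above can be sketched generically: exceedances over a threshold are fitted to a generalized Pareto distribution (GPD) and the fit is extrapolated to a rarer level. This is a minimal method-of-moments sketch of GPD tail fitting in general, not REscope's actual estimator; all names and the exponential test data are illustrative.

```python
import math
import random

def fit_gpd_mom(exceedances):
    """Method-of-moments fit of a generalized Pareto distribution
    (shape xi, scale sigma) to exceedances over a threshold."""
    n = len(exceedances)
    m = sum(exceedances) / n
    var = sum((e - m) ** 2 for e in exceedances) / (n - 1)
    xi = 0.5 * (1.0 - m * m / var)
    sigma = m * (1.0 - xi)
    return xi, sigma

def tail_prob(z, u, p_u, xi, sigma):
    """P(X > z) for z > u, given P(X > u) = p_u and a GPD tail fit."""
    y = (z - u) / sigma
    if abs(xi) < 1e-6:                      # xi -> 0 limit: exponential tail
        return p_u * math.exp(-y)
    return p_u * max(0.0, 1.0 + xi * y) ** (-1.0 / xi)

random.seed(1)
samples = [random.expovariate(1.0) for _ in range(200000)]
u = 2.0
exc = [s - u for s in samples if s > u]
p_u = len(exc) / len(samples)               # empirical P(X > u)
xi, sigma = fit_gpd_mom(exc)
p4 = tail_prob(4.0, u, p_u, xi, sigma)      # true P(X > 4) = exp(-4) ≈ 0.0183
```

For exponential data the fitted shape parameter should come out near zero, and the extrapolated tail probability should match exp(-4).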
  • ABSTRACT: Boolean matching is one of the fundamental and time-consuming procedures in field-programmable gate array (FPGA) synthesis. SAT-based Boolean matchers (BMs) are not scalable, while other Boolean matchers based on complicated Boolean logic operation algorithms are not flexible for complex PLBs. Recently, a scalable Boolean matcher (F-BM) based on the Bloom filter has been proposed for both scalability and flexibility. However, it requires a large amount of memory, which can be a bottleneck for traditional personal computers. To tackle that problem, this letter proposes a novel Boolean matcher with a much smaller memory requirement. Compared with F-BM, the proposed Boolean matcher achieves an average of 5% better results with 2000x smaller storage and only 1.6x more runtime when applied to the same application. The significant reduction in storage requirements enables the proposed Boolean matcher to handle more complicated PLB structures with larger input sizes.
    IEEE Embedded Systems Letters 12/2013; 5(4):65-68.
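A Bloom filter stores set membership in a small bit array at the cost of a tunable false-positive rate, which is what makes it attractive for memory-bounded Boolean matching. The sketch below is a generic Bloom filter with a hypothetical string encoding of canonical truth tables; it is not F-BM's or this letter's actual data structure.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array.
    May report false positives, never false negatives."""
    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Derive k independent positions by salting a SHA-256 hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

# Store canonical truth tables of functions a PLB is known to implement
# (the hex strings are a hypothetical encoding, for illustration only).
bf = BloomFilter(m_bits=1 << 16, k_hashes=4)
bf.add("0x8")              # e.g. 2-input AND
bf.add("0xe")              # e.g. 2-input OR
matchable = "0x8" in bf    # True: stored, so guaranteed to be found
unknown = "0x6" in bf      # False with overwhelming probability (not stored)
```

The memory/accuracy trade-off the letter targets corresponds to choosing `m_bits` and `k_hashes`.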
  • ABSTRACT: Sitting posture analysis is widely applied in many daily applications in the biomedical, education, and health care domains. It is desirable to monitor sitting postures in an economical and comfortable manner. Accordingly, we present a textile-based sensing system, called Smart Cushion, which analyzes the sitting posture of a human being accurately and non-invasively. First, we introduce the electrical textile sensor and its electrical characteristics, such as offset, scaling, crosstalk, and rotation. Second, we present the design and implementation of the Smart Cushion system. Several effective techniques are proposed to improve the recognition rate of sitting postures, including sensor calibration, data representation, and dynamic time warping-based classification. Last, our experimental results show that the recognition rate of the Smart Cushion system is in excess of 85.9%.
    IEEE Sensors Journal 10/2013; 13(10):3926-3934. · 1.85 Impact Factor
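The dynamic time warping-based classification mentioned above can be sketched as nearest-template matching under the standard DTW recurrence. The posture templates and single-sensor pressure traces below are invented for illustration; the actual system uses a sensor array and additional calibration steps.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences,
    computed with the classic O(n*m) dynamic program."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def classify(sample, templates):
    """Nearest-template classification: return the label of the template
    with the smallest DTW distance to the sample."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))

# Hypothetical single-sensor pressure traces for two postures.
templates = {
    "upright": [0.2, 0.8, 0.8, 0.8, 0.2],
    "lean_forward": [0.9, 0.9, 0.4, 0.1, 0.1],
}
sample = [0.25, 0.7, 0.85, 0.75, 0.3]  # a slightly distorted upright-like trace
label = classify(sample, templates)    # -> "upright"
```

DTW's appeal here is that it tolerates the timing distortions that occur when different users shift posture at different speeds.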
  • ABSTRACT: Through-silicon via (TSV) enables vertical connectivity between stacked chips or interposers and is a key technology for 3-D integrated circuits (ICs). While arrays of TSVs are needed in 3-D ICs, a frequency-dependent resistance, inductance, conductance, and capacitance circuit model exists only for a pair of TSVs with coupling between them. In this paper, we develop a simple yet accurate circuit model for a multiport TSV network (e.g., a coupled TSV array) by decomposing the network into a number of TSV pairs and then applying circuit models to each of them. We call the new model a pair-based model for the multiport TSV network. It is first verified against a commercial electromagnetic solver for up to 20 GHz and subsequently employed in a variety of examples for signal and power integrity analysis.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 04/2013; 32(4):487-496. · 1.20 Impact Factor
  • ABSTRACT: It has become increasingly challenging to model the stochastic behavior of analog/mixed-signal (AMS) circuits under large-scale process variations. In this paper, a novel moment-matching based method is proposed to accurately extract the probabilistic behavioral distributions of AMS circuits. The method first utilizes Latin hypercube sampling (LHS) coupled with a correlation control technique to generate a few samples (e.g., a sample size linear in the number of variable parameters) and then analytically evaluates the high-order moments of the circuit behavior with high accuracy. In this way, the “arbitrary” probabilistic distributions of the circuit behavior can be extracted by moment matching. More importantly, the proposed method has been successfully applied to high-dimensional problems with linear complexity. The experiments demonstrate that the proposed method can provide up to 1666X speedup over the crude Monte Carlo method for the same accuracy.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 01/2013; 32(1):24-33. · 1.20 Impact Factor
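Plain Latin hypercube sampling (without the correlation control step the paper adds) can be sketched as follows: each dimension is divided into n equal-probability strata, one point is drawn per stratum, and the strata are shuffled independently per dimension. This is a generic LHS sketch, not the paper's sampler.

```python
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Latin hypercube sample on [0, 1)^d: each dimension is split into
    n_samples strata and each stratum is hit exactly once."""
    cols = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples strata...
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # ...then shuffle so the pairing across dimensions is random.
        rng.shuffle(strata)
        cols.append(strata)
    return [tuple(col[i] for col in cols) for i in range(n_samples)]

rng = random.Random(42)
pts = latin_hypercube(100, 3, rng)
# Stratification check: dimension 0 has exactly one point per 1/100 bin.
bins = sorted(int(p[0] * 100) for p in pts)
```

The stratification is why a small LHS sample estimates low-order moments far more stably than the same number of crude Monte Carlo draws.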
  • ABSTRACT: Maximum entropy (MAXENT) is a powerful and flexible method for estimating the arbitrary probabilistic distribution of a stochastic variable under moment constraints. However, modeling the stochastic behavior of analog/mixed-signal (AMS) circuits using MAXENT has not been explored. In this paper, we present a MAXENT-based approach to efficiently model the arbitrary behavioral distribution of AMS circuits with high accuracy. The exact behavioral distribution can be approximated by a product of exponential functions with different Lagrangian multipliers. The closest approximation is obtained by maximizing Shannon's information entropy subject to moment constraints, leading to a nonlinear system. The classic Newton's method is used to solve the nonlinear system for the Lagrangian multipliers, which then recover the arbitrary behavioral distribution of the AMS circuit. Extensive experiments on different circuits demonstrate that the proposed MAXENT-based approach offers better stability and improves accuracy by up to 110% compared to previous AWE-based moment matching approaches, and offers up to 592x speedup compared to the Monte Carlo method.
    Quality Electronic Design (ISQED), 2013 14th International Symposium on; 01/2013
  • ABSTRACT: Reliability has become an increasingly important concern for SRAM-based field-programmable gate arrays (FPGAs). Targeting single event upsets (SEUs) in SRAM-based FPGAs, this article first develops an SEU evaluation framework that can quantify the failure sensitivity of each configuration bit during design time. This framework considers detailed fault behavior and logic masking on a post-layout FPGA application and performs logic simulation on various circuit elements for fault evaluation. Applying this framework to MCNC benchmark circuits, we first characterize SEUs with respect to different FPGA circuits and architectures, for example, bidirectional routing and unidirectional routing. We show that in both routing architectures, interconnects not only contribute the lion's share of the SEU-induced functional failures, but also present higher failure rates per configuration bit than LUTs. In particular, local interconnect multiplexers in logic blocks have the highest failure rate per configuration bit. Then, we evaluate three recently proposed SEU mitigation algorithms, IPD, IPF, and IPV, which are all logic resynthesis-based with little or no overhead on placement and routing. Different fault-mitigating capabilities at the chip level are revealed: algorithms with explicit consideration for interconnect significantly mitigate SEUs at the chip level; for example, IPV achieves 61% failure rate reduction on average, compared with about 15% for IPF. In addition, the combination of the three algorithms delivers over 70% failure rate reduction on average at the chip level. The experiments also reveal that to improve fault tolerance at the chip level, future fault mitigation algorithms must consider not only LUT or interconnect faults, but also their interactions. We envision that our framework can offer useful insights toward more robust FPGA circuits and architectures and better synthesis algorithms.
    ACM Transactions on Design Automation of Electronic Systems 01/2013; 18(1). · 0.52 Impact Factor
  • ABSTRACT: Modern computing system applications or workloads can bring significant non-uniform temperature gradients on-chip, and hence can cause significant temperature uncertainty during clock-tree synthesis. Existing clock-tree designs have to assume a given time-invariant worst-case temperature map and cannot deal with a set of temperature maps under a set of workloads. For robust clock-tree synthesis considering temperature uncertainty, this paper presents a new problem formulation: Stochastic PErturbation based Clock Optimization (SPECO). In the SPECO algorithm, one nominal clock-tree is pre-synthesized with determined merging points. The impact of the stochastic temperature variation is modeled by perturbation (or small physical displacement) of merging points to offset the induced skews. Because the implementation cost is reduced but the design complexity is increased, determining the optimal positions of perturbed merging points requires a computationally efficient algorithm. In this paper, a non-Monte-Carlo (NMC) method is deployed to generate skew and skew variance by one-time analysis when a set of stochastic temperature maps is provided. Moreover, a principal temperature-map analysis is developed to reduce the design complexity by clustering correlated merging points based on the subspace of the correlation matrix. As a result, the new merging points can be efficiently determined level by level with both skew and its variance reduced. The experimental results show that our SPECO algorithm can effectively reduce the clock skew and its variance under a number of workloads with minimized wire-length overhead and computational cost.
    Integration the VLSI Journal 01/2013; 46(1):22-32. · 0.53 Impact Factor
  • ABSTRACT: This paper presents a parallel and incremental solver for stochastic capacitance extraction. The random geometrical variation is described by stochastic geometrical moments, which lead to a densely augmented system equation. To efficiently extract the capacitance and solve the system equation, a parallel fast multipole method (FMM) is developed in the framework of stochastic geometrical moments. This can efficiently estimate the stochastic potential interaction and its matrix-vector product (MVP) with charge. Moreover, a generalized minimal residual (GMRES) method with incremental update is developed to calculate both the nominal value and the variance. Our overall extraction flow is called piCAP. A number of experiments show that piCAP efficiently handles large-scale on-chip capacitance extraction with variations. Specifically, a parallel MVP in piCAP is up to 3X faster than a serial MVP, and an incremental GMRES in piCAP is up to 15X faster than non-incremental GMRES methods.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 09/2012; 20(9):1729-1737. · 1.14 Impact Factor
  • ABSTRACT: The most challenging problem in current block-based statistical static timing analysis (SSTA) is how to handle the max operation efficiently and accurately. Existing SSTA techniques suffer from limited modeling capability by using a linear delay model with Gaussian distribution, or have scalability problems due to the expensive operations involved in handling non-Gaussian variation sources or nonlinear delays. To overcome these limitations, we propose efficient algorithms to handle the max operation in SSTA with both quadratic delay dependency and non-Gaussian variation sources simultaneously. Based on these algorithms, we develop an SSTA flow with a quadratic delay model and non-Gaussian variation sources. All the atomic operations, max and add, are calculated efficiently via either closed-form formulas or low-dimension (at most 2-D) lookup tables. We prove that the complexity of our algorithm is linear in both variation sources and circuit sizes, hence our algorithm scales well for large designs. Compared to Monte Carlo simulation for non-Gaussian variation sources and nonlinear delay models, our approach predicts the mean, standard deviation, and 95th percentile point with less than 2% error, and the skewness with less than 10% error.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 08/2012; 1. · 1.14 Impact Factor
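The Gaussian base case of the SSTA max operation is classically handled by Clark's moment matching, which the paper extends to quadratic delay models and non-Gaussian sources. A sketch of the Gaussian case (function names are illustrative): the max of two jointly Gaussian arrival times is approximated by a Gaussian with the exact mean and variance of the max.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_max(mx, sx, my, sy, rho=0.0):
    """Clark's moment matching for max(X, Y) with jointly Gaussian X, Y:
    returns (mean, std) of the max."""
    a = math.sqrt(sx * sx + sy * sy - 2.0 * rho * sx * sy)
    alpha = (mx - my) / a
    mean = mx * Phi(alpha) + my * Phi(-alpha) + a * phi(alpha)
    second = ((mx * mx + sx * sx) * Phi(alpha)
              + (my * my + sy * sy) * Phi(-alpha)
              + (mx + my) * a * phi(alpha))
    return mean, math.sqrt(second - mean * mean)

# Two independent N(0, 1) arrival times:
# exact results are mean = 1/sqrt(pi), variance = 1 - 1/pi.
m, s = gaussian_max(0.0, 1.0, 0.0, 1.0)
```

In block-based SSTA this operation is applied at every signal-merge point, which is why its cost and accuracy dominate the whole flow.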
  • ABSTRACT: Gait analysis is an important medical diagnostic process with many applications in rehabilitation, therapy, and exercise training. However, standard human gait analysis has to be performed in a dedicated gait lab and operated by a medical professional. This traditional method increases the examination cost and decreases the accuracy of the natural gait model. In this paper, we present a novel portable system, called Smart Insole, to address these issues. Smart Insole integrates low-cost sensors and computes important gait features. In this way, patients or users can wear Smart Insole for gait analysis in daily life instead of participating in gait lab experiments for hours. With our proposed portable sensing system and effective feature extraction algorithm, the Smart Insole system enables precise gait analysis. Furthermore, taking advantage of the affordability and mobility of Smart Insole, pervasive gait analysis can be extended to many potential applications such as fall prevention, life behavior analysis, and networked wireless health systems.
    Proceedings of the 5th International Conference on PErvasive Technologies Related to Assistive Environments; 06/2012
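A simple example of the kind of gait feature such a system can compute is stride time: detect heel strikes as local maxima above a threshold in a pressure trace and difference their timestamps. The trace, sampling rate, and threshold below are invented for illustration and are not the paper's feature set.

```python
def stride_times(pressure, rate_hz, threshold):
    """Detect heel-strike events as local maxima above a threshold and
    return the time between consecutive strikes (stride time, seconds)."""
    strikes = [i for i in range(1, len(pressure) - 1)
               if pressure[i] > threshold
               and pressure[i] >= pressure[i - 1]
               and pressure[i] > pressure[i + 1]]
    return [(b - a) / rate_hz for a, b in zip(strikes, strikes[1:])]

# Hypothetical 10 Hz insole trace with heel strikes at samples 2, 12, 22.
trace = [0.1, 0.3, 0.9, 0.3, 0.1, 0.1, 0.1, 0.1, 0.2, 0.4,
         0.5, 0.7, 1.0, 0.6, 0.2, 0.1, 0.1, 0.1, 0.1, 0.3,
         0.6, 0.8, 0.95, 0.5, 0.2]
times = stride_times(trace, rate_hz=10.0, threshold=0.8)  # -> [1.0, 1.0]
```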
  • ABSTRACT: Process variation in nanometer technology is becoming an important issue for cutting-edge FPGAs with multimillion-gate capacity. Considering both die-to-die and within-die variations in effective channel length, threshold voltage, and gate oxide thickness, we first develop closed-form models of chip-level FPGA leakage and timing variations. Experiments show that the mean and standard deviation computed by our models are within 3% of those computed by Monte Carlo simulation. We also observe that the leakage and timing variations can be up to 3X and 1.9X, respectively. We then derive analytical yield models considering both leakage and timing variations, and use these models to evaluate the performance of FPGA devices and architectures under process variation. Compared to the baseline, which uses the VPR architecture and a device setup based on the ITRS roadmap, device and architecture tuning improves leakage yield by 10.4%, timing yield by 5.7%, and combined leakage and timing yield by 9.4%. We also observe that a LUT size of 4 gives the highest leakage yield, a LUT size of 7 gives the highest timing yield, but a LUT size of 5 achieves the maximum combined leakage and timing yield. To the best of our knowledge, this is the first in-depth study of FPGA architecture and device co-evaluation considering process variation.
    ACM Transactions on Reconfigurable Technology and Systems 06/2012; · 0.41 Impact Factor
  • ABSTRACT: Importance sampling is a popular approach to estimating rare event failures of SRAM cells. We propose to improve importance sampling with probability collectives. First, we use the Kullback-Leibler (KL) distance to measure the distance between the optimal sampling distribution and the original sampling distribution of variable process parameters. Then, the probability collectives (PC) technique using immediate sampling is adapted to analytically minimize the KL distance and to obtain a sampling distribution as close to the optimal as possible. The proposed algorithm significantly accelerates the convergence of importance sampling. Experiments demonstrate that the proposed algorithm is 5200X faster than the Monte Carlo approach and achieves more than 40X speedup over other existing state-of-the-art techniques without compromising estimation accuracy.
    Proceedings of the 2012 ACM international symposium on International Symposium on Physical Design; 03/2012
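The principle of importance sampling for rare failure events can be sketched with a one-dimensional toy: sample from a mean-shifted proposal and reweight each failing sample by the likelihood ratio. The paper's contribution, choosing the proposal by minimizing the KL distance via probability collectives, is omitted here; the shift is simply set by hand.

```python
import math
import random

def mc_fail_prob(threshold, n, rng):
    """Crude Monte Carlo estimate of P(X > threshold), X ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n)) / n

def is_fail_prob(threshold, n, rng, shift):
    """Importance sampling with a mean-shifted proposal N(shift, 1).
    Each failing sample is weighted by the likelihood ratio
    N(0,1)-pdf / N(shift,1)-pdf."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            total += math.exp(-0.5 * x * x + 0.5 * (x - shift) ** 2)
    return total / n

rng = random.Random(7)
t = 3.0                      # rare-ish event: true P = 1 - Phi(3) ≈ 1.35e-3
p_mc = mc_fail_prob(t, 50000, rng)            # noisy: few samples ever fail
p_is = is_fail_prob(t, 50000, rng, shift=t)   # centered on the failure region
```

Shifting the proposal onto the failure region is what makes the estimator converge with orders of magnitude fewer simulations than crude Monte Carlo.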
  • ABSTRACT: Differential signaling is widely used in high-speed interconnects. Signal integrity issues, such as inter-symbol interference (ISI) and crosstalk between the differential pair, however, still cause significant timing jitter and amplitude noise and heavily limit the performance of the differential link. The pre-emphasis filter is commonly used to reduce ISI but may also change the crosstalk behavior. In this paper, we first propose formula-based jitter and noise models considering the combined effect of ISI, crosstalk, and the pre-emphasis filter. With the same set of input patterns, experiments show our models are within 5% of SPICE simulation. Using these formula-based models, we then develop algorithms to directly find the input patterns for worst-case jitter and worst-case amplitude noise through pseudo-Boolean optimization (PBO) and mathematical programming. In addition, a heuristic algorithm is proposed to further reduce runtime. Experiments show our algorithms obtain more reliable worst-case jitter and noise than pseudorandom bit sequence simulation while reducing runtime by 25x when using a general PBO solver and by 150x when using our proposed heuristic algorithm.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 01/2012; 20:89-97. · 1.14 Impact Factor
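A simpler relative of the worst-case pattern search above is classical peak-distortion analysis: given a channel pulse response, the worst-case ISI bound is obtained by flipping each interfering bit so it pushes the sampled value in the same direction. The pulse response below is invented for illustration; the paper's models additionally capture crosstalk and pre-emphasis.

```python
def worst_case_isi(pulse, main_tap):
    """Peak-distortion bound for +/-1 binary signaling: the worst-case ISI
    at the sampling point is the sum of |h_k| over all pulse-response taps
    except the main cursor, achieved by choosing each interfering bit to
    push the same direction. Returns (worst_isi, adversarial bit pattern)."""
    isi = 0.0
    bits = []
    for k, h in enumerate(pulse):
        if k == main_tap:
            bits.append(+1)                    # the victim bit itself
            continue
        bits.append(-1 if h > 0 else +1)       # drive each tap negative
        isi += abs(h)
    return isi, bits

pulse = [0.05, 1.0, 0.3, -0.1, 0.02]  # hypothetical channel pulse response
isi, pattern = worst_case_isi(pulse, main_tap=1)
# worst-case eye closure: 0.05 + 0.3 + 0.1 + 0.02 = 0.47
```

PBO-based searches refine this idea when the bound's independence assumptions (no crosstalk, no equalizer interaction) no longer hold.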
  • ABSTRACT: In this paper, we present HCS (Heterogeneous CRAM Scrubbing) for FPGAs. Using stochastic fault modeling for SEUs in CRAM, we present a quantitative estimate of system MTTF improvement through CRAM scrubbing. HCS then leverages the fact that different SEUs have unequal effects on circuit system operation, so CRAM bits can be scrubbed at different rates based on the sensitivity of the bits to system failures. To maximize the improvement in system MTTF for a given circuit system, we present a dynamic programming algorithm that solves the problem efficiently and effectively. Through a detailed system-level case study of an H.264/AVC decoder implemented on a Xilinx Virtex-5 FPGA, we show an estimated 60% MTTF improvement of HCS over the existing homogeneous CRAM scrubbing method, while contributing virtually no area, performance, or power overhead to the system.
    Field-Programmable Technology (FPT), 2012 International Conference on; 01/2012
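The scrub-rate allocation in HCS can be viewed abstractly as a budgeted selection problem: each class of CRAM bits picks one scrub-rate option with a cost (scrub bandwidth) and a benefit (MTTF gain). The knapsack-style dynamic program below is a generic sketch of that idea under an invented cost/benefit model, not the paper's actual algorithm.

```python
def allocate_scrub(classes, budget):
    """Knapsack-style DP: for each CRAM bit class, pick exactly one
    scrub-rate option (cost, benefit) to maximize total benefit within an
    integer cost budget. Each class should include a zero-cost baseline
    option so an allocation always exists. Returns (best_benefit, picks)."""
    NEG = float("-inf")
    best = [NEG] * (budget + 1)
    best[0] = 0.0
    choice = [[None] * (budget + 1) for _ in classes]
    for ci, options in enumerate(classes):
        nxt = [NEG] * (budget + 1)
        for spent in range(budget + 1):
            if best[spent] == NEG:
                continue
            for oi, (cost, benefit) in enumerate(options):
                s = spent + cost
                if s <= budget and best[spent] + benefit > nxt[s]:
                    nxt[s] = best[spent] + benefit
                    choice[ci][s] = (oi, spent)   # option taken, cost before
        best = nxt
    # Backtrack from the best reachable total cost.
    end = max(range(budget + 1), key=lambda s: best[s])
    picks = [0] * len(classes)
    s = end
    for ci in range(len(classes) - 1, -1, -1):
        oi, s_prev = choice[ci][s]
        picks[ci] = oi
        s = s_prev
    return best[end], picks

# Hypothetical numbers: interconnect bits are far more failure-sensitive
# than LUT bits, so fast scrubbing of them yields more benefit per cost.
classes = [
    [(0, 0.0), (2, 5.0), (4, 8.0)],   # interconnect bits: slow/medium/fast
    [(0, 0.0), (2, 1.0), (4, 1.5)],   # LUT bits
]
best, picks = allocate_scrub(classes, budget=4)  # spend it all on interconnect
```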
  • ABSTRACT: Through-silicon via (TSV) enables vertical connectivity between stacked chips or interposers and is a key technology for three-dimensional (3D) ICs. In this paper, we study the signal integrity of TSV-based 3D ICs with high-speed signaling based on a 3D electromagnetic field solver and SPICE simulations. Unlike other existing works, our study focuses on an array of TSVs and includes the power and bandwidth trade-off between different signaling and termination techniques, such as single-ended, differential, and reduced-swing signaling. Our study shows that, to achieve the best power efficiency, unterminated single-ended reduced-swing signaling should be applied, while terminated single-ended signaling provides the maximum bandwidth. Beyond the TSV itself, critical design challenges for the junction structure between TSVs and RDL traces are also revealed and analyzed. Results show that at 20GHz, the fanout-like junction structure can cause more than 10dB return loss (S11) degradation when the TSV pitch changes from 50μm to 200μm, and can even contribute more insertion loss (S21) than the TSV itself.
    Electrical Performance of Electronic Packaging and Systems (EPEPS), 2012 IEEE 21st Conference on; 01/2012
  • ABSTRACT: Performance failure has become a significant threat to the reliability and robustness of analog circuits. In this article, we first develop an efficient non-Monte-Carlo (NMC) transient mismatch analysis, where the transient response is represented by a stochastic orthogonal polynomial (SOP) expansion under PVT variations and the probabilistic distribution of the transient response is solved. We further define performance yield and derive stochastic sensitivity for yield within the SOP framework, and finally develop a gradient-based multiobjective optimization to improve yield while satisfying other performance constraints. Extensive experiments show that, compared to Monte Carlo-based yield estimation, our NMC method achieves up to 700X speedup and maintains 98% accuracy. Furthermore, the multiobjective optimization not only improves yield by up to 95.3% under performance constraints, but also provides better efficiency than other existing methods.
    ACM Transactions on Design Automation of Electronic Systems 01/2012; 17:10.

Publication Stats

3k Citations
78.76 Total Impact Points


  • 1995–2014
    • University of California, Los Angeles
      • Department of Electrical Engineering
      • Department of Computer Science
      Los Angeles, California, United States
  • 2003–2008
    • Tsinghua University
      • Department of Computer Science and Technology
      Beijing, Beijing Shi, China
  • 2007
    • Synopsys
      Mountain View, California, United States
  • 2005–2007
    • University of California, Riverside
      • Department of Electrical Engineering
      Riverside, California, United States
    • CSU Mentor
      Long Beach, California, United States
  • 2004
    • Antioch University, Los Angeles
      Los Angeles, California, United States
  • 2000–2003
    • University of Wisconsin, Madison
      • Department of Electrical and Computer Engineering
      Madison, Wisconsin, United States