Conference Paper

Q&R On-Chip (QROC): A Unified, Oven-less and Scalable Circuit Reliability Platform

... Moreover, aging effects become more pronounced in deep sub-micron ICs because of increased manufacturing uncertainties, so different ICs age (burn in) at different rates. Consequently, even IC chips that pass structural and functional tests can still fail in practical use because of aging effects [19][20][21][22][23][24][25]. To address these issues, this research uses the initial threshold voltage of the critical path as a key parameter for assessing the aging durability of ICs, and employs an on-chip test structure to rapidly measure the initial threshold voltages of different critical paths [26][27][28][29][30]. The on-chip test structure offers high calibration precision and enables the aging durability of each IC to be predicted individually, while also detecting the actual aging condition of the circuit during use. ...
Article
The evolution of transistor topology from planar to confined-geometry transistors (i.e., FinFET, nanowire FET, nanosheet FET) has met the performance specifications of sub-20 nm integrated circuits (ICs), but only at the expense of increased power density and thermal resistance. Thus, the self-heating effect (SHE) has become a critical issue for IC performance and reliability. Indeed, temperature is one of the most important factors governing IC reliability mechanisms such as Negative Bias Temperature Instability (NBTI), Hot Carrier Injection (HCI), and Electromigration (EM). An accurate SHE model is therefore essential for predictive, reliability-aware IC design. Although SHE is collectively determined by the thermal resistances and capacitances of the various layers of an IC, most researchers focus on isolated components within this hierarchy (i.e., a single transistor, a few specific circuit configurations, or a specialized package type). This fragmented approach makes it difficult to verify the implications of SHE for the performance and reliability of ICs based on confined-geometry transistors. In this paper, we combine theoretical modeling and systematic transistor characterization to extract thermal parameters at the transistor level, and demonstrate the importance of multi-time-constant thermal circuits for predicting the spatio-temporal SHE in modern sub-20 nm transistors. Based on the refined Berkeley Short-channel IGFET Model Common Multi-Gate (BSIM-CMG) model, we examine SHE in typical digital circuits (e.g., a ring oscillator) and analog circuits (e.g., a two-stage operational amplifier) using Verilog-A-based HSPICE simulation. We also develop a physics-based thermal compact model for packaged ICs, using an effective medium approximation for the Back End Of Line (BEOL) interconnects and the IC package. We integrate these components to investigate the implications of SHE for IC reliability, and explain why various (biomimetic) strategies must be adopted to improve the lifetime of self-heated ICs.
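A multi-time-constant thermal circuit can be illustrated with a Foster-type RC network, whose response to a power step sums one exponential per stage: ΔT(t) = P · Σ R_i (1 − exp(−t/τ_i)). The sketch below is a minimal illustration under assumed values only; the function name `foster_temperature_rise`, the three stages, and the 50 µW power are hypothetical and are not parameters extracted in the paper.

```python
import math

def foster_temperature_rise(power_w, stages, t_s):
    """Temperature rise (K) at time t_s after a step of power_w watts.

    stages: list of (R_th in K/W, tau in s) pairs, one per RC time constant.
    Foster-network step response: dT(t) = P * sum(R_i * (1 - exp(-t / tau_i))).
    """
    return power_w * sum(r * (1.0 - math.exp(-t_s / tau)) for r, tau in stages)

# Hypothetical three-stage network: fast device/fin stage, BEOL stage, package stage.
STAGES = [(2.0e4, 1e-9), (8.0e3, 1e-6), (5.0e2, 1e-3)]
P = 50e-6  # illustrative per-transistor power, 50 uW

for t in (1e-10, 1e-8, 1e-5, 1e-2):
    print(f"t = {t:.0e} s  dT = {foster_temperature_rise(P, STAGES, t):.4f} K")
```

Each stage dominates the rise over its own time window, which is why a single time constant cannot capture both nanosecond device heating and millisecond package heating.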
Conference Paper
Asymmetric BTI aging in circuit paths has been shown to cause a time-dependent shift in a signal's duty cycle, affecting the performance of circuits such as low-power SRAMs, whose operation relies on both the positive and negative edges of the clock signal. In this work, we propose the first known on-chip reliability monitor to accurately characterize the impact of asymmetric BTI on SRAM read speed. Statistical data collected from a test chip built in a 32 nm high-k metal-gate technology show that (i) the average SRAM read frequency decreases with stress while its variation increases with stress, and (ii) both the mean (μ) and standard deviation (σ) of the read frequency shift follow a power-law dependence on stress time. These observations point to the impact of SRAM peripheral circuit aging on read performance, and to the utility of the proposed monitor for characterizing circuit-level reliability concerns.
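A power-law dependence such as shift = A · t^n is commonly fitted as a straight line in log-log space, where the slope is the time exponent n. The sketch below assumes strictly positive times and shifts; `fit_power_law` and the synthetic data are illustrative and are not the authors' analysis or measurements.

```python
import math

def fit_power_law(times, shifts):
    """Fit shift = A * t**n by ordinary least squares on (log t, log shift).

    Returns (A, n). All times and shifts must be strictly positive.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in shifts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope in log-log space = time exponent
    a = math.exp(y_mean - n * x_mean)  # intercept gives the prefactor A
    return a, n

# Synthetic, noise-free example (not measured data): shift = 0.5 * t**0.2
times = [10.0, 100.0, 1000.0, 10000.0]
shifts = [0.5 * t ** 0.2 for t in times]
a, n = fit_power_law(times, shifts)
print(f"A = {a:.3f}, n = {n:.3f}")
```

The same fit can be applied separately to the per-stress-time μ and σ of the shift.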
Conference Paper
A circuit for long-term measurement of bias temperature instability (BTI) degradation is described. It is an entirely on-chip measurement circuit that reports measurements periodically through a digital output. Implemented on IBM's z196 Enterprise systems, it can be used to monitor long-term degradation under real-use conditions. Over 500 days' worth of ring oscillator degradation data from customer systems are presented. The importance of using a reference oscillator to measure performance degradation in the field, where the supply voltage and temperature can vary dynamically, is demonstrated.
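One common way to use a reference oscillator is to track the ratio of the stressed oscillator's frequency to that of an unstressed, matched reference, so that supply-voltage and temperature swings affecting both cancel to first order. This is a sketch under that assumption; the abstract does not specify the normalization, and `degradation_percent` and its arguments are hypothetical.

```python
def degradation_percent(f_stress, f_ref, f_stress0, f_ref0):
    """Percent slowdown of a stressed ring oscillator, normalized by a reference RO.

    f_stress0, f_ref0: frequencies measured when fresh.
    f_stress, f_ref:   frequencies measured now, under the same (unknown) V and T.
    Dividing by the reference cancels, to first order, the common V/T effect.
    """
    ratio_now = f_stress / f_ref
    ratio_fresh = f_stress0 / f_ref0
    return 100.0 * (1.0 - ratio_now / ratio_fresh)

# Hypothetical field reading: V/T slows both ROs by 5%, aging slows the stressed RO by 1% more.
f_ref_now = 0.95e9
f_stress_now = 0.95e9 * 0.99
print(f"normalized: {degradation_percent(f_stress_now, f_ref_now, 1e9, 1e9):.2f} %")
print(f"raw:        {100.0 * (1.0 - f_stress_now / 1e9):.2f} %")
```

Without the reference, the environmental 5% slowdown would be misread as aging.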
Article
This paper describes the implications of bias temperature instability (BTI)-induced, time-dependent threshold voltage distributions for the performance and yield estimation of digital circuits. The statistical distributions encompassing both time-zero and time-dependent variability, and their correlations, are discussed. The impact of using normally distributed threshold voltages, as imposed by state-of-the-art design approaches, is contrasted with our defect-centric approach. Extensive Monte Carlo simulation results are shown for static random access memory (SRAM) cell and ring oscillator structures.
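A defect-centric threshold-voltage shift is often modeled as a Poisson-distributed number of defects per device, each adding an exponentially distributed shift with mean η. The Monte Carlo sketch below assumes that form; the helper names and the parameter values (5 defects on average, η = 1 mV) are hypothetical, not taken from the paper.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson(lam) integer with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def sample_delta_vth(rng, mean_defects, eta):
    """Threshold-voltage shift of one device: a Poisson number of defects,
    each contributing an exponentially distributed shift with mean eta."""
    n_defects = poisson(rng, mean_defects)
    return sum(rng.expovariate(1.0 / eta) for _ in range(n_defects))

# Hypothetical parameters: 5 defects per device on average, 1 mV mean impact each.
rng = random.Random(1)
samples = [sample_delta_vth(rng, 5.0, 1.0) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
# For this compound distribution, theory gives mean = N*eta = 5 and variance = 2*N*eta**2 = 10.
print(f"mean = {mean:.3f} mV, variance = {var:.3f} mV^2")
```

Unlike a normal approximation, this distribution is skewed with a long upper tail, which matters most for tail-sensitive structures such as SRAM cells.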
Article
Variations in the number and characteristics of charges or traps contributing to transistor degradation lead to a distribution of device "ages" at any given time. This issue is well understood in the study of time-dependent dielectric breakdown, but is only beginning to be thoroughly addressed for bias temperature instability (BTI) and hot carrier injection (HCI) stress. In this paper, we present a measurement system that enables efficient statistical aging measurements of the latter two mechanisms in an array of ring oscillators. Microsecond measurements for minimal BTI recovery, as well as frequency-shift measurement resolution down to an error floor of 0.07%, are achieved with three beat frequency detection systems working in tandem. Measurement results from a 65 nm test chip show that the fresh frequency and the stress-induced shift are uncorrelated, that both the mean and standard deviation of that shift increase with stress, and that the standard deviation/mean ratio decreases with stress time.
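Beat frequency detection can be sketched by sampling a stressed oscillator with a slightly different reference clock: the sampled bit toggles at |f_ref − f_osc|, so if one beat period spans N reference cycles, the relative frequency difference is 1/N. This idealized, noise-free, single-channel model is an assumption for illustration; the paper's system uses three detectors in tandem, and the function names here are hypothetical.

```python
def sampled_output(f_ref, f_osc, n_samples):
    """Sample a 50%-duty square wave of frequency f_osc on each rising edge of f_ref
    (an idealized D flip-flop: oscillator on D, reference on the clock)."""
    bits = []
    for k in range(n_samples):
        phase = (k * f_osc / f_ref) % 1.0
        bits.append(1 if phase < 0.5 else 0)
    return bits

def relative_shift_from_beat(bits):
    """Estimate |f_ref - f_osc| / f_ref from the beat period counted in reference cycles."""
    rises = [k for k in range(1, len(bits)) if bits[k - 1] == 0 and bits[k] == 1]
    if len(rises) < 2:
        raise ValueError("need at least two beat periods")
    beat_period = (rises[-1] - rises[0]) / (len(rises) - 1)  # reference cycles per beat
    return 1.0 / beat_period

f_ref = 1.0   # normalized reference frequency
f_osc = 0.99  # hypothetical stressed oscillator, 1% slower
bits = sampled_output(f_ref, f_osc, 2000)
print(f"estimated shift = {100 * relative_shift_from_beat(bits):.3f} %")
```

Because the beat period grows as the frequencies converge, small shifts are measured by counting many reference cycles rather than by resolving a tiny frequency difference directly.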
Article
In this paper, we propose a methodology to solve for leakage power self-consistently with temperature in order to predict thermal runaway. We target 28 nm technology node FinFET-based circuits, as they are more prone to thermal runaway than bulk metal-oxide-semiconductor field-effect transistors because of self-heating and less efficient heat dissipation. We have generated thermal models for logic cells (inverter, NAND, and NOR) to self-consistently determine the temperature map of a circuit block. Our cell-level thermal models account for lateral heat flow (the contribution of neighboring cells) along with vertical heat dissipation to the heat sink. We predict positive feedback between subthreshold leakage and temperature for all the cells in a given floor plan. Our proposed condition for thermal runaway shows the design trade-off between the primary input (PI) activity of a circuit block, the subthreshold leakage at room temperature, and the thermal resistance of the package. We show that, in FinFET circuits, thermal runaway can occur at the International Technology Roadmap for Semiconductors (ITRS)-specified subthreshold leakage (of 150 for high performance) for a nominal PI activity of 0.5 and a typical package thermal resistance. In addition, we show that the maximum temperature rise in an integrated circuit is limited by the package.
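The self-consistent leakage-temperature loop can be illustrated with a lumped, single-node fixed-point iteration. This is a simplification: unlike the cell-level models described above, it ignores lateral heat flow, and the exponential leakage model, the function name `junction_temperature`, and all parameter values are assumptions for illustration.

```python
import math

def junction_temperature(t_amb, r_th, p_dyn, p_leak0, k, t_max=500.0, tol=1e-6, max_iter=10_000):
    """Solve T = T_amb + R_th * (P_dyn + P_leak(T)) by fixed-point iteration.

    Assumed leakage model: P_leak(T) = P_leak0 * exp(k * (T - T_amb)).
    Returns the converged temperature (deg C), or None if T exceeds t_max
    or fails to converge, which is treated here as thermal runaway.
    """
    t = t_amb
    for _ in range(max_iter):
        p_leak = p_leak0 * math.exp(k * (t - t_amb))
        t_new = t_amb + r_th * (p_dyn + p_leak)
        if t_new > t_max:
            return None  # leakage-temperature feedback diverged
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return None

# Hypothetical values: 25 C ambient, 10 W dynamic, 5 W leakage at ambient, k = 0.02 /K.
t_ok = junction_temperature(25.0, 0.5, 10.0, 5.0, 0.02)    # low package resistance
t_run = junction_temperature(25.0, 10.0, 10.0, 5.0, 0.02)  # high package resistance
print(t_ok, t_run)
```

Starting from ambient, the iterates rise monotonically, so they either settle at the lowest stable operating point or grow without bound; raising the package thermal resistance alone is enough to move from the first case to the second.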
Article
In deep-submicrometer technologies, increased standby leakage current in high-performance processors results in increased junction temperature. Elevated junction temperature, in turn, causes a further increase in standby leakage current. Under burn-in conditions, the standby leakage current is expected to increase even more, leading to still higher junction temperatures and possibly thermal runaway. In this paper, the concept of thermal runaway and the conditions that lead to it are described for the first time. The thermal management of high-performance microprocessors to avoid thermal runaway is also investigated.
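A generic way to state the runaway condition, assuming a single junction-to-ambient thermal resistance θ_ja and an exponential leakage model with coefficient k (both assumptions, not taken from the abstract), is through the stability of the electrothermal fixed point. A small temperature perturbation δT is amplified by θ_ja ∂P/∂T on each pass around the leakage-temperature loop, so the operating point is stable only if that loop gain is below one:

```latex
\begin{aligned}
T_j &= T_a + \theta_{ja}\,P(T_j), \qquad
P(T_j) = P_{\mathrm{dyn}} + P_{\mathrm{leak},0}\, e^{k (T_j - T_a)} \\
\delta T_{n+1} &= \theta_{ja} \left.\frac{\partial P}{\partial T}\right|_{T_j} \delta T_n \\
\text{stable} \iff \theta_{ja} \left.\frac{\partial P}{\partial T}\right|_{T_j}
  &= \theta_{ja}\, k\, P_{\mathrm{leak},0}\, e^{k (T_j - T_a)} < 1
\end{aligned}
```

Higher burn-in temperature raises P_leak(T_j) and therefore the loop gain, which is why burn-in brings a part closer to the runaway boundary.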