Article

Reliability evaluation of logic circuits using probabilistic gate models

February 2011
Microelectronics Reliability 51(2):468-476

DOI:10.1016/j.microrel.2010.07.154

Source
DBLP

Authors:

University of Alberta

Logic circuits built using nanoscale technologies have significant reliability limitations due to fundamental physical and manufacturing constraints of their constituent devices. This paper presents a probabilistic gate model (PGM), which relates the output probability of an unreliable logic gate to its error and input probabilities. The PGM is used to obtain two computational algorithms for the evaluation of circuit reliability, one approximate and the other accurate. The complexity of the approximate algorithm, which does not consider dependencies among signals, increases linearly with the number of gates in a circuit. The accurate algorithm, which accounts for signal dependencies due to reconvergent fanouts and/or correlated inputs, has a worst-case complexity that is exponential in the numbers of dependent reconvergent fanouts and correlated inputs. Because many large circuits consist of common logic modules, a modular approach is further proposed that hierarchically decomposes a circuit into smaller modules and then applies the accurate PGM algorithm to each module. Simulation results are presented for the LGSynth91 and ISCAS85 benchmark circuits. It is shown that the modular PGM approach provides highly accurate results with moderate computational complexity. It can further be embedded into an early design flow and scales to the reliability evaluation of large circuits.
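
The approximate, linear-time algorithm described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the symmetric (von Neumann) error model, in which a faulty gate flips its output with probability `eps`, and it treats all gate inputs as independent. The tuple-based netlist format and all function names are hypothetical.

```python
from math import prod

def error_free_prob(kind, ps):
    """P(output = 1) of an error-free gate, assuming independent inputs."""
    if kind == "NOT":
        return 1.0 - ps[0]
    if kind == "AND":
        return prod(ps)
    if kind == "NAND":
        return 1.0 - prod(ps)
    if kind == "OR":
        return 1.0 - prod(1.0 - p for p in ps)
    if kind == "NOR":
        return prod(1.0 - p for p in ps)
    if kind == "XOR":
        p1, p2 = ps
        return p1 * (1.0 - p2) + p2 * (1.0 - p1)
    raise ValueError(f"unknown gate type: {kind}")

def pgm_output_prob(kind, ps, eps):
    """Gate model: correct with prob 1 - eps, output inverted with prob eps."""
    p = error_free_prob(kind, ps)
    return (1.0 - eps) * p + eps * (1.0 - p)

def propagate(netlist, inputs, eps):
    """Visit gates in topological order; netlist entries are (out, kind, [ins])."""
    probs = dict(inputs)
    for out, kind, ins in netlist:
        probs[out] = pgm_output_prob(kind, [probs[i] for i in ins], eps)
    return probs

def reliability(netlist, input_vector, eps, output):
    """P(output equals its fault-free value) for one deterministic input vector."""
    ideal = propagate(netlist, input_vector, 0.0)[output]  # exactly 0.0 or 1.0
    noisy = propagate(netlist, input_vector, eps)[output]
    return noisy if ideal >= 0.5 else 1.0 - noisy
```

For two cascaded inverters with `eps = 0.1`, `reliability([("n1", "NOT", ["a"]), ("n2", "NOT", ["n1"])], {"a": 1.0}, 0.1, "n2")` gives 0.82, which matches the exact value (both gates correct, or both wrong) because this chain has no reconvergent fanout.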

The Open System for Storing and Processing of a Dataset of Combinational Circuits

Article

Jan 2023

This paper presents open-source software for the generation, storage, and analysis of combinational circuits. Previously developed methods for generating combinational circuits have been optimized, and a dataset has been formed. Combinational circuits are generated on various devices. The application can combine the generated datasets into a single storage (Synology Drive) and analyze the fault tolerance of combinational circuits using various evaluation methods. New methods for assessing the reliability of combinational circuits using machine learning are proposed.

Logic Circuits Reliability Analysis Using Signal Probability and Bayesian Network Concepts

Article

Sep 2022

Hadi Jahanirad

Background: Reliability analysis of logic circuits has been widely investigated because of the increasing occurrence of faults in modern integrated circuits. Both simulation-based and analytical methods have been developed to estimate the reliability of logic circuits. Methods: In this paper, a signal probability-based method is presented to estimate the reliability of logic circuits. In the proposed approach, four signal probabilities (correct 0, correct 1, incorrect 0, and incorrect 1) are derived for every node of the circuit using a probabilistic transfer matrix (PTM), and correlation coefficients (CCs) are used to resolve the reconvergent fanout problem. The CCs are defined in a fanout cone and are propagated through the logic gates to the related reconvergent nodes. The Bayesian network concept is applied to propagate the CCs through the logic gates more accurately. Results: The accuracy and scalability of the proposed method are demonstrated by various simulations on benchmark circuits (ISCAS 85, LGSynth91, and ITC99). The proposed method efficiently solves the reconvergent fanout problem and outperforms previous methods in accuracy and scalability. Conclusion: Simulation results on the ISCAS 85, LGSynth91, and ITC99 benchmark circuits show an average error of less than 3% compared with the accurate simulation-based fault injection method.
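
The four signal probabilities above can be illustrated with a small sketch. Here each signal is a distribution over (ideal value, actual value) pairs, and a 2-input gate is evaluated by enumerating input-state pairs and flipping the actual output with probability `eps`. This is an illustrative assumption, not the paper's PTM or Bayesian-network formulation: it assumes independent inputs, so it does not handle the reconvergent fanouts that the correlation coefficients address. All names are hypothetical.

```python
from itertools import product

# A signal maps (ideal_value, actual_value) -> probability:
# (0, 0) correct 0, (1, 1) correct 1, (1, 0) incorrect 0, (0, 1) incorrect 1.
STATES = [(0, 0), (1, 1), (1, 0), (0, 1)]

GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "NAND": lambda x, y: 1 - (x & y),
    "XOR": lambda x, y: x ^ y,
}

def primary_input(p_one):
    """Error-free primary input that is 1 with probability p_one."""
    return {(0, 0): 1.0 - p_one, (1, 1): p_one, (1, 0): 0.0, (0, 1): 0.0}

def gate(kind, x, y, eps):
    """Four-state output of a 2-input gate whose output flips with probability eps.

    Assumes the two input signals are statistically independent.
    """
    f = GATES[kind]
    out = {s: 0.0 for s in STATES}
    for (sx, px), (sy, py) in product(x.items(), y.items()):
        weight = px * py
        ideal = f(sx[0], sy[0])   # value a fault-free circuit would produce
        actual = f(sx[1], sy[1])  # value produced from the (possibly wrong) inputs
        out[(ideal, actual)] += weight * (1.0 - eps)
        out[(ideal, 1 - actual)] += weight * eps
    return out

def reliability(signal):
    """Probability that the signal carries its correct value."""
    return signal[(0, 0)] + signal[(1, 1)]
```

For example, a NAND gate with `eps = 0.1` fed by two fair inputs has reliability 0.9; ANDing its output with a constant-0 input masks every error and restores reliability 1.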

Reliability Estimation of Logic Circuits at the Transistor Level

Article

May 2021
CIRC SYST SIGNAL PR

Hadi Jahanirad

The reliability evaluation of logic circuits is an essential step in the computer-aided design flow of emerging integrated circuits (ICs). Because of increased process variation effects in submicron IC technologies, reliability evaluation should include the modeling and analysis of transistor-level faults. In this paper, a two-step reliability evaluation method was developed. In the first step, the gate error probability (in matrix form) was computed based on transistor fault modeling. In the second step, the circuit's graph was traversed in topological order; for each gate, the probability of the gate's output being in each of 16 possible states was computed using the gate error probability matrix from the first step and the probability matrices of the gate's inputs. The reliability of the circuit's outputs was extracted from the related 16-state probability matrix. The reconvergent fanout problem was handled using correlation coefficients. Various simulations were performed on the ISCAS 89 and LGSynth91 benchmark circuits. Compared with Monte Carlo simulation as the reference method, the results indicated an average reliability estimation error below 3%. Furthermore, the proposed method significantly reduced the estimation error and algorithm runtime compared with some state-of-the-art methods.

Soft Error Reliability Evaluation of Nanoscale Logic Circuits in the Presence of Multiple Transient Faults

Article

Aug 2020
J ELECTRON TEST

Radiation-induced single transient faults (STFs) are expected to evolve into multiple transient faults (MTFs) at nanoscale CMOS technology nodes. For this reason, evaluating the reliability of logic circuits in the presence of MTFs is becoming an important part of the design process for deep-submicron and nanoscale systems. However, accurately evaluating the reliability of large-scale and very-large-scale circuits is both complex and time-consuming. Accordingly, this paper presents a novel soft error reliability calculation approach for logic circuits based on a probability distribution model. The correctness or incorrectness of each logic element is regarded as a random event obeying a Bernoulli distribution. Fault simulation experiments based on logic element conversion are then conducted to analyze the logical masking effects of the circuit when one logic element fails or when two elements fail simultaneously. On this basis, the reliability bounds of the logic circuits can be calculated efficiently from the proposed probability model and the fault simulation results. The proposed solution obtains an accurate reliability range from single- and double-fault simulations with small sample sizes, and it scales well as the error rate of the circuit elements varies. To validate the approach, we calculated the reliability bounds of the ISCAS’85, ISCAS’89, and ITC’99 benchmark circuits. Statistical analysis and experimental results demonstrate that the method is effective and scalable while maintaining sufficient accuracy.
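
One way to combine the Bernoulli element model with single- and double-fault simulation results into reliability bounds is sketched below. It assumes `n` elements that fail independently with the same probability `p`; `mask1` and `mask2` (hypothetical names) stand for the fractions of sampled single and double faults that fault simulation found to be logically masked. The lower bound treats three or more simultaneous failures as always fatal and the upper bound treats them as always masked. This is an illustrative formulation, not necessarily the paper's exact expressions.

```python
from math import comb

def k_failure_probs(n, p, k_max):
    """Binomial P(exactly k of n independent Bernoulli(p) elements fail), k = 0..k_max."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1)]

def reliability_bounds(n, p, mask1, mask2):
    """Lower/upper circuit reliability from single- and double-fault masking rates."""
    p0, p1, p2 = k_failure_probs(n, p, 2)
    lower = p0 + p1 * mask1 + p2 * mask2            # >= 3 failures assumed fatal
    upper = lower + max(0.0, 1.0 - p0 - p1 - p2)    # >= 3 failures assumed masked
    return lower, upper

# Hypothetical numbers: 100 elements, element error rate 1e-3,
# 60% of single faults and 40% of double faults masked in simulation.
print(reliability_bounds(100, 1e-3, 0.6, 0.4))
```

The width of the interval is the probability of three or more simultaneous failures, which is very small whenever `n * p` is small; this is why single- and double-fault simulations alone can pin down a tight range.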

Identifying Desirable Function Perturbations in Signaling Pathways Through Stochastic Analysis

Article

Jan 2020

The logical representation and analysis of biological systems provide valuable insights. Inherent fluctuations play an important role in the long-term behavior of biological systems. In this work, we mainly investigate the effects of function perturbations that might contribute to the development of effective therapies. We propose a stochastic model for one-bit perturbations and perform effective analyses of biological systems subjected to function perturbations. We also consider multibit perturbations (i.e., one-bit perturbations occurring in multiple columns), which lets us determine the most effective column combination for maximizing or minimizing the desired signal probability. A stochastic analysis of the caspase3 signaling pathway demonstrates the practicability and effectiveness of the approach. Consequently, appropriate therapies can be determined to maximize or minimize the probability of caspase3 signaling.

Survey on Reliability Estimation in Digital Circuits

Article

Dec 2021

Aggressive technology scaling has significantly affected circuit reliability. The interaction of environmental radiation with the devices in integrated circuits (ICs) may be the dominant reliability concern for advanced ICs. Several techniques have been explored to mitigate radiation effects and guarantee satisfactory reliability levels. In this context, estimating circuit radiation reliability is crucial and remains an open challenge. For decades, many different methods have been proposed to estimate circuit reliability. Recently, radiation effects have been incorporated more faithfully into these strategies to estimate circuit susceptibility more accurately. This paper overviews current trends in estimating the radiation reliability of digital circuits. The survey divides the approaches into two abstraction levels: (i) gate-level approaches that incorporate layout information and (ii) circuit-level approaches that traditionally exploit the logic characteristics of the circuit to determine the radiation susceptibility of combinational circuits. We also present an open-source tool that incorporates several previously explored methods. Finally, current research directions are discussed, including newly emerging topics such as selective hardening and critical vector identification. © 2021, Brazilian Microelectronics Society. All rights reserved.

Evaluating the Reliability of Integer Multipliers With Respect to Permanent Faults

Conference Paper

Mar 2024

Arithmetic circuits form the foundation of modern digital computation, enabling us to conduct precise mathematical operations and drive the digital age. They are integral components in nearly every digital circuit, such as processors' arithmetic and logic units. Especially in safety-critical domains like automotive and aviation, the flawless operation of these circuits is of paramount importance. This paper presents a case study involving two variants of Dadda multipliers and assesses their intrinsic reliability when affected by permanent hardware faults. We conducted extensive fault injection campaigns on the circuit models under various datasets, presenting the aggregated statistical errors in the form of the mean absolute error (MAE) for each case. Specifically, we performed fault injection campaigns in which the operands are sourced from trained quantized weights of a convolutional neural network, as well as randomly generated sets of integers. The results not only reveal differences between the two circuits but also show significant variations when different datasets are used in the fault injection campaigns.
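
A permanent-fault injection campaign with an MAE metric can be sketched as follows. To stay short, the sketch swaps in a plain shift-and-add partial-product model instead of an actual Dadda reduction tree, and it injects stuck-at faults only on partial-product bits (hypothetical fault sites); a real campaign would target nodes of the gate-level netlist. It still shows why the dataset matters: a stuck-at-0 fault on a high-order partial product is activated only when both corresponding operand bits are 1, which may be rare for small quantized weights.

```python
import random

def multiply(a, b, width=8, stuck=None):
    """Sum of partial products a_i * b_j * 2**(i + j) for unsigned `width`-bit operands.

    `stuck` is an optional (i, j, value) tuple that forces partial-product bit
    (i, j) to the constant `value`, modeling a permanent stuck-at fault.
    """
    result = 0
    for i in range(width):
        for j in range(width):
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            if stuck is not None and (i, j) == stuck[:2]:
                bit = stuck[2]
            result += bit << (i + j)
    return result

def mean_absolute_error(operands, stuck, width=8):
    """MAE of the faulty multiplier against the exact product over `operands`."""
    errors = [abs(multiply(a, b, width, stuck) - a * b) for a, b in operands]
    return sum(errors) / len(errors)

# Example campaign on uniformly random operands (a hypothetical dataset).
rng = random.Random(0)
random_ops = [(rng.randrange(256), rng.randrange(256)) for _ in range(1000)]
for fault in [(0, 0, 1), (7, 7, 0)]:
    print(fault, mean_absolute_error(random_ops, fault))
```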

An Effective Fanout-Based Method for Improving Error Propagation Probability Estimation in Combinational Circuits

Article

Jan 2024

The downsizing of nanoscale circuits imposes new challenges for circuit reliability, including hard defects, soft errors, and unsaturated voltage/current. Many studies on the reliability of digital circuits have focused on achieving accurate reliability estimation and greater efficiency for larger circuits. Accurate reliability estimation requires addressing error propagation and accounting for correlated signals from reconverging paths. In this paper, we propose an error propagation probability model for each gate that relates the probabilities of an unreliable logic gate's input signals to the probability of its output signal. Additionally, we introduce an efficient approach that uses a new fanout matrix to tackle the reconvergent fanout problem. Furthermore, to ensure accurate estimation of combinational logic circuit reliability, the probabilities obtained for each fanout are included in the calculations through a fanout probability matrix. To this end, a new method is applied at each calculation stage to minimize computational complexity, making it suitable for large circuits with many fanouts. We conducted various simulations on the ISCAS 85 and EPFL benchmark circuits to demonstrate the accuracy and scalability of the proposed method. The results show an average relative error of less than 1% in reliability estimation, and the method outperforms state-of-the-art methods in both reliability estimation and algorithm runtime.

Utilization of Modular Eight-Variable Karnaugh Maps in Digital Design

Chapter

Oct 2022

This chapter introduces a regular and modular version of the 8-variable Karnaugh map, a map with an unusually high variable-handling capability. The chapter also analyzes an n-bit comparator in the general case of arbitrary n and visualizes this analysis for n = 4 on the aforementioned map. The cases n = 3, 2, and 1 appear as special cases on 6-variable, 4-variable, and 2-variable submaps of the original map. The evaluation is a tutorial exposition of several important ideas in switching theory, such as implicants, prime implicants, essential prime implicants, minimal sum, complete sum, disjoint sum of products (or probability-ready expressions), Boole-Shannon expansion, and unate and threshold functions.
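
The idea of a probability-ready expression can be shown on a small hypothetical function, f = x1 x2 + x3 (not taken from the chapter). Its ordinary sum of products has overlapping terms, but the disjoint form x1 x2 + x1' x3 + x1 x2' x3 has pairwise disjoint products. With independent variables, the probability of f is then obtained by replacing each AND with a product and each OR with a plain sum. The sketch checks this against exhaustive enumeration.

```python
from itertools import product

def f(x1, x2, x3):
    """Example function f = x1 x2 + x3 (hypothetical)."""
    return (x1 & x2) | x3

def pre_probability(p1, p2, p3):
    """P(f = 1) from the disjoint SOP  x1 x2 + x1' x3 + x1 x2' x3.

    The three products are pairwise disjoint, so their probabilities add;
    independent variables turn each AND into a product.
    """
    return p1 * p2 + (1 - p1) * p3 + p1 * (1 - p2) * p3

def exact_probability(p):
    """Reference value by enumerating all 2**3 input vectors."""
    total = 0.0
    for bits in product((0, 1), repeat=3):
        weight = 1.0
        for bit, pi in zip(bits, p):
            weight *= pi if bit else 1 - pi
        total += weight * f(*bits)
    return total

p = (0.3, 0.6, 0.2)
print(pre_probability(*p), exact_probability(p))
```

Both functions evaluate to 0.344 (up to floating-point rounding) for these probabilities, whereas adding the overlapping terms x1 x2 and x3 of the original expression directly would give 0.18 + 0.2 = 0.38.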

Developing Methods for Combinational Circuit Generation

Conference Paper

Sep 2022

An Accurate Estimation Algorithm for Failure Probability of Logic Circuits Using Correlation Separation

Article

Apr 2022
J ELECTRON TEST

As the feature size of integrated circuits decreases to the nanometer scale, process fluctuations, aging effects, and particle radiation have an increasing influence on the Failure Probability of Circuits (FPC), which poses severe challenges to chip reliability. Accurate and efficient estimation of logic circuit failure probability is a prerequisite for high-reliability design. FPC is difficult to calculate because of the large number of reconvergent fanout structures and the resulting signal correlation, particularly in Very Large-Scale Integrated (VLSI) circuits. Accordingly, this paper presents a Correlation Separation Approach (COSEA) that aims to estimate the FPC efficiently and accurately. COSEA divides the circuit into several fanout-relevant and fanout-irrelevant subcircuits, and the error probability of each node is expressed as the result of interactions between these structures. As a result, the signal correlation problem can be solved efficiently. Because the computational complexity of COSEA grows linearly with circuit size, it scales well. Compared with the Probabilistic Transfer Matrices (PTM) method, Monte Carlo (MC) simulation, and other failure probability calculation methods in the literature, the experimental results show that our approach achieves fast speed and good scalability while maintaining high accuracy.
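
Why reconvergent fanout causes signal correlation can be seen on a tiny hypothetical circuit in which input `a` fans out to two AND gates whose outputs reconverge at an OR gate. The sketch compares gate-by-gate probability propagation, which wrongly treats the two AND outputs as independent, with exhaustive enumeration. It only illustrates the problem; it is not COSEA, whose separation of fanout-relevant structures is designed to avoid the exponential enumeration used here as the reference.

```python
from itertools import product

# Hypothetical circuit with a reconvergent fanout on input a:
#   y1 = a AND b,  y2 = a AND c,  z = y1 OR y2
def circuit(a, b, c):
    return (a & b) | (a & c)

def naive_probability(pa, pb, pc):
    """Propagate signal probabilities gate by gate, treating y1 and y2 as independent."""
    p_y1 = pa * pb
    p_y2 = pa * pc
    return 1.0 - (1.0 - p_y1) * (1.0 - p_y2)

def exact_probability(pa, pb, pc):
    """Enumerate all input vectors; exponential in the number of inputs."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        weight = (pa if a else 1 - pa) * (pb if b else 1 - pb) * (pc if c else 1 - pc)
        total += weight * circuit(a, b, c)
    return total

print(naive_probability(0.5, 0.5, 0.5))  # 0.4375
print(exact_probability(0.5, 0.5, 0.5))  # 0.375
```

The naive result overestimates P(z = 1) because y1 and y2 are both 1 only when `a` is 1, a dependency that independent propagation cannot see.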

Survey, Taxonomy, and Methods of QCA based Design Techniques – Part II: Reliability and Security

Article

Jun 2022
SEMICOND SCI TECH

Quantum-dot cellular automata (QCA) is a new and promising technology currently under extensive research for post-CMOS VLSI chip design. QCA promises more reliable, fault-tolerant, and secure chip designs. Analyses of QCA circuits for power and energy dissipation have also reported promising results, suggesting that QCA circuits dissipate significantly less energy and operate very close to the Shannon-von Neumann-Landauer (SNL) limit. Security is another concern that has led to the development of QCA-based security systems such as physically unclonable functions and true random number generators. This paper surveys fault-tolerant and security circuits based on QCA and discusses critical design aspects and parameters of QCA technology.

Utilization of Eight-Variable Karnaugh Maps in the Digital Design of n-bit Comparators

Chapter

Feb 2021

An n-bit comparator is a celebrated combinational circuit that compares two n-bit inputs and produces three orthonormal outputs: G (indicating that the first input is strictly greater than the second), E (indicating that the two inputs are equal or equivalent), and L (indicating that the first input is strictly less than the second). The symbols 'G', 'E', and 'L' are deliberately chosen to convey the notions of 'Greater than,' 'Equal to,' and 'Less than,' respectively. This paper analyzes an n-bit comparator in the general case of arbitrary n and visualizes the analysis for n = 4 on a regular and modular version of the 8-variable Karnaugh map. The cases n = 3, 2, and 1 appear as special cases on 6-variable, 4-variable, and 2-variable submaps of the original map. The analysis is a tutorial exposition of many important concepts in switching theory, including implicants, prime implicants, essential prime implicants, irredundant disjunctive forms, minimal sums, the complete sum, and disjoint sums of products (or probability-ready expressions).
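
The behavior of the G, E, and L outputs can be checked with a short sketch. It uses an iterative MSB-first recurrence rather than the Karnaugh-map minimization the chapter develops, so it is a swapped-in formulation; the input names `x_bits` and `y_bits` are hypothetical labels for the first and second inputs. The exhaustive loop covers the n = 4 (8-variable) case and confirms that exactly one output is 1 for every input pair.

```python
def compare(x_bits, y_bits):
    """Return (G, E, L) for two equal-length bit lists given MSB first.

    Scans from the most significant bit: the first position where the
    inputs differ decides G or L; E stays 1 only if no position differs.
    """
    g, e, l = 0, 1, 0
    for x, y in zip(x_bits, y_bits):
        g |= e & x & (1 - y)   # first difference has x = 1, y = 0
        l |= e & (1 - x) & y   # first difference has x = 0, y = 1
        e &= 1 - (x ^ y)       # still equal so far
    return g, e, l

def to_bits(value, n):
    """MSB-first list of the n low-order bits of value."""
    return [(value >> k) & 1 for k in reversed(range(n))]

# Exhaustive check for n = 4 (the 8-variable case).
n = 4
for x in range(2 ** n):
    for y in range(2 ** n):
        g, e, l = compare(to_bits(x, n), to_bits(y, n))
        assert (g, e, l) == (int(x > y), int(x == y), int(x < y))
        assert g + e + l == 1  # outputs are orthonormal
```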

Reliability-driven pin assignment optimization to improve in-orbit soft-error rate

Article

Oct 2020
MICROELECTRON RELIAB

Electronics are increasingly susceptible to energetic particle interactions within silicon. To improve circuit reliability under radiation effects, several hardening techniques have been adopted in the design flow of VLSI systems. This paper proposes pin assignment optimization in logic gates to reduce the Single-Event Transient (SET) cross-section and improve the in-orbit soft-error rate. Signal probability propagation is used to assign the lowest probability to the most sensitive input combination of the circuit by rewiring or pin swapping. Cell-level optimization can reduce the soft-error rate by up to 48%. For the analyzed arithmetic benchmark circuits, an optimized cell netlist achieves an 8% to 28% reduction in SET cross-section and in-orbit soft-error rate at no cost in circuit area. Additionally, because pin swapping is a layout-friendly technique, the optimization does not affect cell placement and can be combined with other hardening techniques in logic and physical synthesis.
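
The pin-swapping idea can be sketched for a single cell whose input pins are logically equivalent (for example, the inputs of a NAND gate), so that any net-to-pin permutation preserves the function. Given the signal probabilities of the nets and the input pattern assumed to be most SET-sensitive, the sketch tries every permutation and keeps the one that makes that pattern least likely. The sensitive pattern, the independence assumption, and all names are hypothetical; the paper's characterization of sensitivity is not reproduced here.

```python
from itertools import permutations

def pattern_probability(pin_probs, pattern):
    """P(pins show `pattern`), assuming independent signals; pin_probs[i] = P(pin i = 1)."""
    prob = 1.0
    for p_one, bit in zip(pin_probs, pattern):
        prob *= p_one if bit else 1.0 - p_one
    return prob

def best_pin_assignment(net_probs, sensitive_pattern):
    """Try every net-to-pin permutation of logically equivalent pins.

    Returns (perm, prob) where perm[i] is the index of the net wired to pin i and
    prob is the resulting probability of the most sensitive input pattern.
    """
    best = None
    for perm in permutations(range(len(net_probs))):
        prob = pattern_probability([net_probs[k] for k in perm], sensitive_pattern)
        if best is None or prob < best[1]:
            best = (perm, prob)
    return best

# Hypothetical 2-input cell whose most SET-sensitive pattern is pin0 = 1, pin1 = 0.
print(best_pin_assignment([0.9, 0.2], (1, 0)))
```

Here wiring the low-probability net to pin 0 lowers the probability of the sensitive pattern from about 0.72 to about 0.02. Exhaustive search is fine for cells with a handful of pins, since the number of permutations grows factorially.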

Exact quantitative probabilistic model checking through rational search

Article

Dec 2020
FORM METHOD SYST DES

Model checking systems formalized using probabilistic models such as discrete-time Markov chains (DTMCs) and Markov decision processes (MDPs) can be reduced to computing constrained reachability properties. Linear programming methods for computing reachability probabilities in DTMCs and MDPs do not scale to large models. Model checking tools therefore often employ iterative methods to approximate reachability probabilities. These approximations can be far from the actual probabilities, leading to inaccurate model checking results. On the other hand, the specialized techniques used in existing state-of-the-art exact quantitative model checkers do not scale as well as their iterative counterparts. In this work, we present a new model checking algorithm that improves the approximate results obtained by scalable iterative techniques to compute exact reachability probabilities. Our techniques are implemented as an extension of the PRISM model checker and are evaluated against other exact quantitative model checking engines.
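
The general idea of turning an iterative approximation into an exact answer can be sketched on a toy DTMC: run floating-point value iteration, round each value to a nearby fraction with a small denominator, and then verify the candidate exactly in rational arithmetic. The rounding step uses Python's `Fraction.limit_denominator` as a stand-in for rational search; the paper's algorithm and its PRISM integration are not reproduced. The chain and all names are hypothetical. Because every transient state in this chain reaches an absorbing state with probability 1, the equation system has a unique solution, so passing the exact check proves the candidate correct.

```python
from fractions import Fraction

# Hypothetical DTMC: each transient state maps to (probability, successor) pairs.
TRANSITIONS = {
    "s0": [(Fraction(1, 2), "s1"), (Fraction(1, 4), "goal"), (Fraction(1, 4), "fail")],
    "s1": [(Fraction(1, 2), "s0"), (Fraction(1, 2), "fail")],
}
TERMINAL = {"goal": 1, "fail": 0}  # reachability value of the absorbing states

def value_iteration(transitions, terminal, tol=1e-12, max_iter=10_000):
    """Floating-point approximation of P(reach goal), iterating from x = 0."""
    x = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        new = {
            s: sum(float(p) * (terminal[t] if t in terminal else x[t]) for p, t in succ)
            for s, succ in transitions.items()
        }
        converged = max(abs(new[s] - x[s]) for s in x) < tol
        x = new
        if converged:
            break
    return x

def round_to_rationals(approx, max_den=1000):
    """Closest fraction with denominator <= max_den (continued-fraction search)."""
    return {s: Fraction(v).limit_denominator(max_den) for s, v in approx.items()}

def is_exact_fixed_point(candidate, transitions, terminal):
    """Check x = T(x) exactly in rational arithmetic."""
    for s, succ in transitions.items():
        rhs = sum(p * (Fraction(terminal[t]) if t in terminal else candidate[t]) for p, t in succ)
        if rhs != candidate[s]:
            return False
    return True

approx = value_iteration(TRANSITIONS, TERMINAL)
exact = round_to_rationals(approx)
print(approx, exact, is_exact_fixed_point(exact, TRANSITIONS, TERMINAL))
```

For this chain the floating-point values only approach 1/3 and 1/6, while the rounded fractions satisfy the equations exactly.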

Dynamic Reliability Management for FPGA-Based Systems

Article

Jun 2020

Radiation tolerance in FPGAs is an important field of research, particularly for reliable computation in electronics used in aerospace and satellite missions. The motivation behind this research is the degradation of reliability in FPGA hardware due to single-event effects caused by radiation particles. Redundancy is a commonly used technique to enhance the fault tolerance of radiation-sensitive applications. However, redundancy comes with overhead in terms of excessive area consumption, latency, and power dissipation. Moreover, redundant circuit implementations vary in structure and resource usage depending on the redundancy insertion algorithm and the number of redundant stages used. The radiation environment varies over the operational span of a mission depending on the orbit and space weather conditions. Therefore, the overheads due to redundancy should also be optimized at run-time with respect to the current radiation level. In this paper, we propose a technique called Dynamic Reliability Management (DRM) that uses and interprets radiation data, selects a suitable redundancy level, and performs run-time reconfiguration, thus varying the reliability levels of the target computation modules. DRM is composed of two parts. The design-time tool flow of DRM generates a library of redundant implementations of the circuit with different performance characteristics. The run-time tool flow uses the radiation/error-rate data to select the required redundancy level and reconfigures the computation module with the corresponding redundant implementation. Both parts of DRM have been verified experimentally on various benchmarks.
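
The run-time selection step can be sketched as choosing, from a design-time library, the cheapest redundant variant whose reliability meets a target at the current error rate. The sketch assumes N-modular redundancy with a perfect majority voter, independent replica failures, and a constant-failure-rate module reliability r = exp(-error_rate * mission_time); the library entries, relative areas, and names are hypothetical and do not come from the paper.

```python
from math import comb, exp

# Hypothetical design-time library: (name, number of replicas, relative area).
LIBRARY = [("simplex", 1, 1.0), ("TMR", 3, 3.3), ("5MR", 5, 5.6)]

def nmr_reliability(r, n):
    """Probability that a majority of n independent replicas (reliability r) is correct.

    Assumes a perfect voter; for n = 3 this reduces to 3r^2 - 2r^3.
    """
    majority = n // 2 + 1
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(majority, n + 1))

def select_implementation(error_rate, mission_time, target):
    """Pick the smallest-area library entry that meets the reliability target."""
    r = exp(-error_rate * mission_time)  # constant-failure-rate module reliability
    feasible = [(area, name) for name, n, area in LIBRARY if nmr_reliability(r, n) >= target]
    if not feasible:
        return LIBRARY[-1][0]  # fall back to the most redundant variant
    return min(feasible)[1]

print(select_implementation(1e-6, 10, 0.9999))  # quiet environment
print(select_implementation(1e-2, 10, 0.99))    # harsh environment
```

In the quiet case the unprotected simplex design already meets the target, while the harsh case needs five replicas; a run-time manager would reconfigure only when the selected entry changes.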
The most significant finding from this experimentation is that performance can be improved severalfold by using the partial reconfiguration feature of DRM; e.g., performance results 7.7 and 3.7 times better were obtained for our data sorter and matrix multiplier case studies compared with static reliability management techniques. Therefore, DRM maintains a suitable trade-off between computation reliability and performance overhead during the run-time of an application.

1. Introduction

The advancement of semiconductor technology to nanometer dimensions has made the design of digital circuits challenging. Further shrinking of device parameters is inevitable because of the increasing need for high computing power and more computational blocks on a single system-on-chip. Besides the conventional performance, area, and power constraints, today’s circuits have to meet reliability requirements. However, the reliability of digital circuits degrades because of the probabilistic nature of errors in nanodevices. These errors, whether permanent or transient, are mainly caused by process variations, thermal fluctuations, quantum effects, power supply noise, and capacitive/inductive coupling, to name a few [1–4]. They are handled for fault tolerance at the device, logic, or network layer, depending on the feasibility of mitigation at each layer. However, when nanodevice-based circuits are used in high-radiation environments (containing alpha particles and cosmic rays), an additional source of error emerges, known as radiation-induced errors. Although reliability studies for electronics include well-defined mitigation strategies for different sources of errors, they propose redundancy as the most efficient way of countering radiation-induced errors.
The future of space computing is largely dominated by Field Programmable Gate Arrays (FPGAs) because their functional implementation in hardware can be modified at run-time. The reconfiguration feature of FPGAs allows them to perform various tasks during different phases of a mission [5]. Moreover, the increasing on-board processing requirements of space missions for various image-processing applications are well matched by the highly parallel architecture of FPGAs [6]. Most importantly, the space, weight, and power of a satellite payload design can be minimized with FPGAs because they can perform multiple tasks without dedicated hardware for each task. These advantages have drawn the attention of researchers and FPGA companies to ensuring the fault-tolerant operation of FPGAs in high-radiation space environments. When FPGAs are exposed to strong solar or cosmic radiation (involving high-energy electrons, alpha particles, and heavy ions), errors in the form of logic reversals appear in the digital circuit elements; their effects range from internally masked errors to system-level failures. For long-term space missions, the accumulated ionizing dose, measured as Total Ionizing Dose (TID), is an important design parameter that indicates how long the FPGA can withstand radiation before its transistors begin to degrade [7]. For shorter space missions, Single-Event Effects (SEEs) cause temporary errors in the circuit, whose mitigation strategies range from internal masking (using redundancy) to a system reset. SEEs, which are measurable changes in the state of a microelectronic device [8], are generally classified into four domains.
A Single-Event Transient (SET) is a voltage spike causing a glitch in a combinational element; a Single-Event Upset (SEU) is a soft error caused by a radiation particle in memory contents, particularly SRAM cells; a Single-Event Latchup (SEL) is a high-current state in a device caused by the passage of a single energetic particle through sensitive regions of the device structure; and a Single-Event Functional Interrupt (SEFI) causes the component to reset, lock up, or otherwise malfunction in a detectable way. SETs are typically short-lived and harmless unless they are latched into a sequential element, in which case they behave as an SEU. SELs can be cleared by power cycling, while SEFIs require resetting the component [7–9]. The remaining category, SEU, is the major concern for reliable FPGA operation because it occurs more frequently than the other SEEs and its effects accumulate. SRAM-based FPGAs are highly susceptible to SEUs in their configuration and application memory elements. Alternatives to SRAM-based FPGAs include radiation-hardened FPGAs, foremost among them the Actel (now Microsemi) RTAX antifuse FPGAs [10]. These devices are one-time programmable, and the permanent interconnections formed during configuration make them immune to SEUs. However, because they cannot be reprogrammed, they are unattractive in scenarios that require multiple design modifications. The second popular category consists of flash-based FPGAs [11], which offer full reconfiguration but lack partial reconfiguration [6]. Moreover, these devices typically have a lower TID rating than SRAM or antifuse FPGAs [12]. An additional category of radiation-tolerant FPGAs consists of space-grade devices that offer the performance of SRAM FPGAs together with built-in radiation-tolerance features, e.g., the Xilinx Virtex-5QV [13].
Besides relying on the inherent radiation tolerance of such FPGAs, fault-tolerant computation approaches in hardware and software are used as well. Redundancy [14, 15] and scrubbing [16, 17] are the two most popular techniques for tolerating SEUs and avoiding their accumulation. A classical and widely used form of redundancy is triple modular redundancy (TMR). In principle, TMR instantiates three identical copies of a circuit and places a voter module at the end to take a majority decision for each output. Hence, TMR does not depend on error detection; it masks the error as the outputs pass through the voter. In contrast, scrubbing involves continuously rewriting the configuration memory to prevent the accumulation of errors. The combination of redundancy and scrubbing is widely considered an optimal hardware fault-tolerance solution. Additional approaches in the literature include duplication with comparison (DWC) [18], error checking and correcting codes (ECAC) [19], and algorithm-based fault tolerance (ABFT) [20]. In this paper, however, we focus solely on redundancy and its variations in hardware. Hardware redundancy techniques for FPGAs are more involved than basic TMR: the circuit must be partitioned into submodules, and the designer must decide how many voters to insert and where to place them in the FPGA design. Tools for automating redundancy insertion in FPGA designs are available, including the TMR tool of Xilinx [21], Precision Hi-Rel software [22], and the BYU-LANL TMR tool [23]. Fault-tolerance mechanisms, particularly modular redundancy, come with overhead in terms of excessive area consumption as well as latency and power dissipation. Therefore, while providing fault tolerance, the design of a mission-critical system must also keep these overheads within given constraints.
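
The majority decision taken by a TMR voter can be sketched as a bitwise 2-of-3 function over integer words. This models only the voter's logic function, not an FPGA LUT implementation, and the word values are hypothetical. Because the vote is taken per bit, errors in different bits of two replicas are still masked, whereas the same bit corrupted in two replicas defeats the vote.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote: each output bit agrees with at least two inputs."""
    return (a & b) | (a & c) | (b & c)

golden = 0b1011_0110
# One corrupted replica is masked.
assert majority(golden, golden ^ 0b0000_0100, golden) == golden
# Errors in different bits of two replicas are still masked bit by bit.
assert majority(golden ^ 0b0000_0001, golden ^ 0b0000_0010, golden) == golden
# The same bit corrupted in two replicas defeats the vote.
assert majority(golden ^ 0b0000_0001, golden ^ 0b0000_0001, golden) != golden
```
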
The radiation environment during the operation of a satellite varies; in particular, the radiation particle strike rate increases enormously above Earth’s magnetosphere [6, 24]. Fault-tolerance techniques, particularly redundancy, impose a constant overhead in the performance factors of area, latency, and power dissipation. In general, higher levels of redundancy provide more reliability at the cost of increased overheads in these performance factors. Therefore, the reliability-performance trade-off must be understood before designing a redundant system. To avoid a constant degradation in system performance due to a fixed overhead, the system should be self-adaptive, optimizing the trade-off between reliability and performance factors at run-time based on the radiation strength of the environment. We implement this concept, named “Dynamic Reliability Management (DRM).” DRM, based on the partial reconfiguration of FPGAs, is beneficial because it allows the performance overheads to be optimized and can thus save cost and power or free hardware resources for other tasks when feasible. The contribution of this paper lies in providing a complete approach for using SRAM-based FPGAs in space missions, using real-time radiation scenarios for our analysis and experimentation. Most importantly, we focus on visualizing the trade-off of reliability against the three main performance parameters, i.e., latency, area, and power consumption. This contrasts with research studies along the same line that fail to show the complete picture of reliability versus the three performance factors and how to prefer one redundant structure over another based on performance constraints. This paper provides such a solution, in which Pareto optimization can be used to select the suitable redundant structure based on one or more performance factors.
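
Pareto filtering of redundant implementations can be sketched as follows: a candidate is kept only if no other candidate is at least as good in reliability, area, latency, and power and strictly better in at least one of them. The candidate names and numbers are hypothetical placeholders for a design-time library, not results from the paper.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    reliability: float  # higher is better
    area: float         # lower is better (relative units)
    latency: float      # lower is better
    power: float        # lower is better

def dominates(a, b):
    """True if a is no worse than b in every factor and strictly better in one."""
    no_worse = (a.reliability >= b.reliability and a.area <= b.area
                and a.latency <= b.latency and a.power <= b.power)
    better = (a.reliability > b.reliability or a.area < b.area
              or a.latency < b.latency or a.power < b.power)
    return no_worse and better

def pareto_front(candidates):
    """Candidates not dominated by any other candidate."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical design-time library entries.
library = [
    Candidate("simplex", 0.90, 1.0, 1.0, 1.0),
    Candidate("TMR-coarse", 0.97, 3.2, 1.1, 3.1),
    Candidate("TMR-fine", 0.99, 3.6, 1.4, 3.5),
    Candidate("TMR-alt", 0.96, 3.5, 1.5, 3.6),
]
print([c.name for c in pareto_front(library)])  # ['simplex', 'TMR-coarse', 'TMR-fine']
```

Only non-dominated structures remain; a run-time policy can then pick among them using whichever performance factor is currently constrained.
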
Our tool flows are unique and more extensive than those of previous research works, and they are comprehensively verified in this paper. Last but not least, we also provide possible extensions to our experimentation platform, which can serve as future research directions in this vast and emerging research area. We have tried to keep the work generic and broad in scope so that as much of the space computing research community as possible can benefit from it. This paper is organized as follows. In the next section, we provide the foundation for understanding the adaptive reliability management concept by describing implementation strategies of redundancy in FPGAs, varying radiation environments, corresponding decision mechanisms, and commonly used reliability theories. Afterwards, we discuss and analyze major research works in self-adaptive reliability management and contrast them with our approach of Dynamic Reliability Management. Section 3 describes the two portions of our DRM approach, the design-time and run-time parts, and Section 4 verifies each of these parts with detailed experimentation. Section 5 concludes the paper.

2. Background and Related Work

In this section, we describe the basic implementation strategy of TMR in FPGAs, followed by an insight into different forms of TMR and its cascaded version. To illustrate the varying radiation environment of space, we give real radiation scenarios as examples. The reliability of FPGA-based circuits can be computed using conventional and probabilistic theories, which are briefly described as well. Finally, we summarize and analyze the major research works similar to our approach of Dynamic Reliability Management.

2.1. Triple Modular Redundancy in FPGAs

The concept of triple modular redundancy (TMR) is straightforward: it triplicates a logic design and takes the final output of the design from a voter placed at the outputs of the redundant modules [14].
The voter samples the three logic outputs and forwards the majority result. The limitation of this architecture is its single point of failure: an error occurring in the voter renders the TMR technique useless. To avoid the single point of failure, a more reliable architecture triplicates the voters in addition to the logic modules. This architecture runs the three branches in parallel until they must interface with the world outside the FPGA, where they can either be converged by reducing voters or be brought out as three separate outputs.

TMR could be applied to an FPGA design simply by triplicating the inputs, outputs, and logic modules, inserting buffers, and connecting the outputs of the logic modules to the triplicated voters. This straightforward implementation is unsuitable for several practical reasons. First, TMR can mask only one error among the three redundant branches, and the longer each branch is, the more likely it becomes that more than one branch is erroneous. To deal with this issue, the logic of each branch must be broken at regular intervals, with triplicated voters placed at intermediate stages of the circuit, as shown in Figure 1. An error occurring in one partition of the logic is then not forwarded to the next partition, owing to the error-mitigating effect of the triplicated voters. However, a logic partition (the granularity level) cannot be smaller than a single FPGA component, e.g., a lookup table or a multiplexer. In addition, certain locations on an FPGA, called illegal-cut locations, should not be triplicated because of the FPGA architecture, e.g., dedicated route connections within a slice [25]. Moreover, voters should not be placed on high-speed carry chains, so as not to degrade the timing performance of the design.
Most importantly, voters should always be added to feedback paths, so that corrupted data at the outputs of sequential elements are not fed back into the logic [14, 25]. These voters are commonly called synchronization voters.
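The voting, partitioning, and feedback-path arguments of this subsection can be illustrated with a small sketch. It assumes statistically independent failures, a module reliability R, and (for the partitioned case only) perfect voters; `majority`, `tmr_reliability`, `partitioned_tmr_reliability`, and `run_tmr_counter` are hypothetical helpers, and the 8-bit counter merely stands in for an arbitrary sequential design. The reliability expressions are the textbook combinatorial model, not necessarily the reliability theory adopted later in the paper.

```python
MASK = 0xFF  # hypothetical 8-bit datapath


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote: each bit position is voted independently."""
    return (a & b) | (a & c) | (b & c)


def tmr_reliability(r_module: float, r_voter: float = 1.0) -> float:
    """TMR with one simplex voter: at least 2 of 3 independent modules must work
    (3R^2 - 2R^3), in series with the voter, which is a single point of failure."""
    return r_voter * (3 * r_module**2 - 2 * r_module**3)


def partitioned_tmr_reliability(r_module: float, k: int) -> float:
    """Each branch is cut into k equal partitions (reliability r_module ** (1/k) each)
    with perfect triplicated voters after every partition, so that each stage
    independently masks one faulty copy."""
    return tmr_reliability(r_module ** (1.0 / k)) ** k


def run_tmr_counter(cycles, upsets, sync_voters):
    """Triplicated counter register. upsets maps cycle -> (copy index, bit mask);
    those bits of that register copy are flipped (a single-event upset)."""
    regs = [0, 0, 0]
    outputs = []
    for cycle in range(cycles):
        if cycle in upsets:
            copy, mask = upsets[cycle]
            regs[copy] ^= mask
        voted = majority(*regs)
        outputs.append(voted)
        if sync_voters:
            # feedback passes through the voters: every copy restarts from the voted value
            regs = [(voted + 1) & MASK for _ in range(3)]
        else:
            # no feedback voters: a corrupted copy keeps counting from its wrong value
            regs = [(r + 1) & MASK for r in regs]
    return outputs


if __name__ == "__main__":
    print(tmr_reliability(0.95), tmr_reliability(0.95, r_voter=0.95))
    print([partitioned_tmr_reliability(0.9, k) for k in (1, 2, 4, 8)])
    upsets = {3: (0, 0x10), 7: (1, 0x10)}  # the same bit hit in two different copies
    print(run_tmr_counter(12, upsets, sync_voters=False))
    print(run_tmr_counter(12, upsets, sync_voters=True))
```

Under these assumptions, a single voter of reliability 0.95 pulls a TMR system of 0.95-reliable modules down to about 0.943, below the unprotected module, which is the single-point-of-failure problem in numbers. Cutting the branches into more partitions raises reliability with diminishing returns, and without synchronization voters the counter output is wrong from cycle 7 onward, whereas with them every cycle is correct.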

Reliability Modeling and Analysis of Generalized Majority Systems by Stochastic Computation

Article

Mar 2020

The k-out-of-n:G(F) majority voter consists of n components (or modules); the overall system works (fails) if at least k of its components work (fail). Depending on how component states are discretized, such a system is usually classified as either a binary system or a multi-state system. In practice, the operating conditions of different components may contribute differently to the operation of the entire system. In this manuscript, the k-out-of-n:G(F) majority voter is generalized as a consecutive-weighted-k-out-of-n:G(F) voter with either binary or multiple states. To overcome the drawbacks of existing approaches, a stochastic analysis is proposed for assessing the system reliability. In the stochastic analysis, the input signal probabilities are encoded into non-Bernoulli sequences with fixed numbers of 0s and 1s for the Boolean case, or into randomly permuted sequences for the multi-state scenario. Using stochastic logic, the reliability of a general system consisting of consecutive-weighted-k-out-of-n majority voters is predicted efficiently and accurately. The results are validated by the analysis of several case studies. Although the accuracy of the stochastic analysis depends closely on the sequence length employed, the stochastic approach is shown to be more efficient than the universal generating function (UGF) method while retaining acceptable accuracy.
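For the plain binary, unweighted k-out-of-n:G case, the stochastic idea can be sketched next to an exact reference. This is a minimal illustration under stated assumptions, not the paper's consecutive-weighted or multi-state method: component failures are independent, the function names are hypothetical, and the sequence length and seed are arbitrary.

```python
import random


def k_out_of_n_exact(probs, k):
    """P(at least k of n independent components work), by dynamic programming
    over the number of working components."""
    dist = [1.0]  # dist[j] = P(exactly j of the components seen so far work)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)   # this component fails
            new[j + 1] += q * p     # this component works
        dist = new
    return sum(dist[k:])


def k_out_of_n_stochastic(probs, k, length=10_000, seed=1):
    """Stochastic estimate: each p_i is encoded as a non-Bernoulli sequence holding
    exactly round(p_i * length) ones in random order; the k-out-of-n:G voter is
    then evaluated bit position by bit position."""
    rng = random.Random(seed)
    seqs = []
    for p in probs:
        ones = round(p * length)
        seq = [1] * ones + [0] * (length - ones)
        rng.shuffle(seq)  # an independent permutation for every component
        seqs.append(seq)
    working = sum(1 for bits in zip(*seqs) if sum(bits) >= k)
    return working / length


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.95]  # hypothetical component reliabilities
    print(k_out_of_n_exact(probs, 2), k_out_of_n_stochastic(probs, 2))
```

Because every sequence contains exactly the intended number of 1s, the per-input encoding error is zero; the remaining estimation error comes only from the random alignment of the sequences, which shrinks as `length` grows, consistent with the length/accuracy trade-off noted in the abstract.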

A strategy for soft-error vulnerability estimation using the single-event transient susceptibilities of each gate

Thesis

Aug 2019

Fábio Batagin Armelin

The Soft-Error Vulnerability (SEV) is an estimated parameter that, together with the characteristics of the radiation environment, is used to obtain the Soft-Error Rate (SER), a metric used to predict how digital systems will behave in that environment. Currently, the most trusted method for SER estimation is the radiation test, since it involves the actual interaction of the radiation with the electronic device. However, this test is expensive and requires the real device, which becomes available only late in the design cycle. These restrictions motivated the development of other SER and SEV estimation methods, including analytical, electrical and logic simulation, and emulation-based approaches. These techniques usually incorporate the logical, electrical, and latching-window masking effects into the estimation process. Nevertheless, most of them do not take into account a factor that is intrinsic to the radiation test: the probability that a radiation particle produces a Soft Error (SE) at the output of a gate of the circuit, referred to as the Single-Event Transient (SET) susceptibility. In this context, we propose a strategy for SEV estimation based on these SET susceptibilities, suitable for simulation- and emulation-based frameworks. In a simplified version of this strategy, the SET susceptibilities account only for the effects of the gate topology, while in a complete version they account for both the topology and the operation of the circuit, which affects its input pattern distribution. The proposed strategy was evaluated with a simulation-based framework by estimating the SEV of 38 benchmark circuits. The results show that both versions of the strategy improve the estimation accuracy, with the complete version presenting the lowest estimation error. Finally, we show the feasibility of adopting the proposed strategy in an emulation-based framework.
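The core idea, weighting fault-injection sites by their SET susceptibility instead of treating all gates alike, can be sketched on the ISCAS85 c17 circuit. This is a hypothetical, logic-simulation-only illustration: only logical masking is modeled (every SET is assumed to be latched), inputs are uniformly random, and the susceptibility weights passed to `sev` are invented numbers rather than values from the thesis.

```python
from itertools import product

# ISCAS85 c17: (output net, input 1, input 2) for six 2-input NANDs, in topological order
C17 = [("N10", "N1", "N3"), ("N11", "N3", "N6"), ("N16", "N2", "N11"),
       ("N19", "N11", "N7"), ("N22", "N10", "N16"), ("N23", "N16", "N19")]
INPUTS = ["N1", "N2", "N3", "N6", "N7"]
OUTPUTS = ["N22", "N23"]


def simulate(vector, struck_gate=None):
    """Logic simulation; an SET at struck_gate is modeled as an inverted gate output
    that is always latched (electrical and latching-window masking are ignored)."""
    nets = dict(zip(INPUTS, vector))
    for out, a, b in C17:
        nets[out] = (1 - (nets[a] & nets[b])) ^ (out == struck_gate)
    return tuple(nets[o] for o in OUTPUTS)


def sev(susceptibility):
    """Probability that an SET changes a primary output, averaged over uniformly random
    input vectors, when gate g is struck with probability proportional to susceptibility[g]."""
    vectors = list(product((0, 1), repeat=len(INPUTS)))
    total_w = sum(susceptibility.values())
    hits = 0.0
    for vec in vectors:
        golden = simulate(vec)
        for gate, w in susceptibility.items():
            if simulate(vec, gate) != golden:
                hits += w
    return hits / (total_w * len(vectors))


if __name__ == "__main__":
    uniform = {out: 1.0 for out, _, _ in C17}       # every gate equally likely to be hit
    weighted = {**uniform, "N11": 3.0, "N16": 3.0}  # hypothetical susceptibility values
    print(sev(uniform), sev(weighted))
```

Shifting weight toward internal gates such as N11, whose transients can be logically masked, changes the estimate relative to the uniform-injection baseline, which is the kind of bias the SET-susceptibility weighting is meant to correct.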

Economic Design of a Linear Consecutively Connected System Considering Cost and Signal Loss

Article

Oct 2019

Linear multistate consecutively connected systems (LMCCSs) have been widely applied in telecommunications. An LMCCS usually has several nodes arranged in sequence along a line, where connecting elements (CEs) are deployed at each node to provide connections to the following nodes. Many researchers have studied the reliability modeling and optimization of LMCCSs. However, most existing works on LMCCSs have focused on the uncertainty in the connection ranges of CEs; none of them has considered signal loss during transmission. In practice, a signal emitted from a node may reach the destination node only partially: only a fraction of the signal may arrive, while the rest is lost. This article makes new contributions by proposing a model that evaluates the expected signal fraction receivable by the sink node in an LMCCS subject to signal loss. Moreover, we solve the optimal design policy problem, which jointly determines the allocation of CEs and the building of nodes so as to minimize the system cost while meeting constraints on system reliability and on the expected receivable signal fraction. Three examples are provided to illustrate the proposed model.
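As background for the connection-range uncertainty that earlier LMCCS works focus on, the classical reliability of an LMCCS can be computed with a dynamic program over the furthest node reached so far. This is a minimal sketch of that classical model only: it does not include the signal-loss fraction or the cost optimization proposed in this article, and the range distributions in the example are hypothetical.

```python
def lmccs_reliability(range_dists):
    """Classical LMCCS reliability. Nodes are 0..n with n = len(range_dists); node 0 is
    the source and node n the sink. range_dists[i] maps a connection range r >= 0 to its
    probability for the CE at node i, which connects node i to nodes i+1, ..., i+r."""
    n = len(range_dists)
    # The reachable nodes always form a prefix 0..far, so "far" is a sufficient state.
    state = {0: 1.0}
    for i, dist in enumerate(range_dists):
        nxt = {}
        for far, p in state.items():
            if far < i:
                # node i is unreachable, so its CE cannot extend the reach
                nxt[far] = nxt.get(far, 0.0) + p
                continue
            for r, q in dist.items():
                new_far = min(max(far, i + r), n)
                nxt[new_far] = nxt.get(new_far, 0.0) + p * q
        state = nxt
    return state.get(n, 0.0)


if __name__ == "__main__":
    # hypothetical 3-node line: the source CE reaches 1 or 2 nodes, the middle CE 0 or 1
    print(lmccs_reliability([{1: 0.5, 2: 0.5}, {0: 0.3, 1: 0.7}]))
```

In this example the sink is reached either directly from the source (probability 0.5) or through the middle node (0.5 × 0.7), giving 0.85. Tracking only the furthest reachable node keeps the state space linear in the number of nodes.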

Soft-Error Vulnerability Estimation Approach Based on the SET Susceptibility of Each Gate

Article

Jul 2019

Soft-Error Vulnerability (SEV) is a parameter used to evaluate the robustness of a circuit to induced Soft Errors (SEs). There are many techniques for SEV estimation, including analytical, electrical and logic simulation, and emulation-based approaches. Each has advantages and disadvantages regarding estimation time, resource consumption, accuracy, and restrictions on the analysed circuit. Concerning ionising radiation effects, some analytical and electrical simulation approaches take into account how the circuit topology and the applied input patterns affect the gates' susceptibilities to Single Event Transients (SETs). Logic simulation and emulation techniques, on the other hand, usually ignore these SET susceptibilities. In this context, we propose a logic-simulation-based, probability-aware approach for SEV estimation that takes into account the specific SET susceptibility of each circuit gate. For a given operational scenario, we extract the input patterns applied to each gate and calculate its specific SET susceptibility. For the 38 analysed benchmark circuits, the average SEV estimation error, relative to a reference obtained at the transistor level, was reduced from 15.27% to 0.68%. The results indicate that considering the gate-specific SET susceptibilities improves the SEV estimation process.
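The step "extract the input patterns applied to each gate and calculate its specific SET susceptibility" can be read as a frequency-weighted average over input patterns. The sketch below assumes that reading; the per-pattern susceptibility table for a 2-input NAND and the simulation trace are invented for illustration and would in practice come from transistor-level characterization and from logic simulation of the operational scenario.

```python
from collections import Counter

# Hypothetical SET susceptibility of a 2-input NAND for each input pattern (a, b);
# real values would come from transistor-level characterization.
NAND_SET_SUSCEPTIBILITY = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}


def pattern_distribution(trace):
    """Fraction of cycles in which each input pattern is applied to a gate,
    taken from a logic-simulation trace of one operational scenario."""
    counts = Counter(trace)
    return {pattern: n / len(trace) for pattern, n in counts.items()}


def gate_susceptibility(pattern_probs, table):
    """Scenario-specific susceptibility: per-pattern values weighted by pattern frequency."""
    return sum(p * table[pattern] for pattern, p in pattern_probs.items())


if __name__ == "__main__":
    # topology-only view: all four patterns assumed equally likely
    uniform = {pattern: 0.25 for pattern in NAND_SET_SUSCEPTIBILITY}
    # hypothetical trace in which this gate mostly sees the pattern (1, 1)
    trace = [(1, 1)] * 75 + [(0, 1)] * 25
    print(gate_susceptibility(uniform, NAND_SET_SUSCEPTIBILITY),
          gate_susceptibility(pattern_distribution(trace), NAND_SET_SUSCEPTIBILITY))
```

With these made-up numbers the same NAND gate has a susceptibility of 0.5 under the topology-only view but 0.8 in a scenario that mostly applies (1, 1), illustrating why the operational scenario matters.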

Logical Design of n-bit Comparators: Pedagogical Insight from Eight-Variable Karnaugh Maps

Article

May 2019

An n-bit comparator is a celebrated combinational circuit that compares two n-bit inputs and produces three orthonormal outputs: G (indicating that the first input is strictly greater than the second), E (indicating that the two inputs are equal or equivalent), and L (indicating that the first input is strictly less than the second). The symbols 'G', 'E', and 'L' are deliberately chosen to convey the notions of 'Greater than', 'Equal to', and 'Less than', respectively. This paper analyzes an n-bit comparator for arbitrary n and visualizes the analysis for n = 4 on a regular and modular version of the 8-variable Karnaugh map. The cases n = 3, 2, and 1 appear as special cases on 6-variable, 4-variable, and 2-variable submaps of the original map. The analysis is a tutorial exposition of many important concepts in switching theory, including implicants, prime implicants, essential prime implicants, the minimal sum, the complete sum, and the disjoint sum of products (or probability-ready expressions).
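The behaviour of the three orthonormal outputs can be checked with a short iterative model of an n-bit comparator that scans from the most significant bit. This is an illustrative sketch, not the Karnaugh-map minimization developed in the paper; the inputs are called A and B here, and the helper names are hypothetical.

```python
def compare(a_bits, b_bits):
    """Iterative n-bit comparator, scanning from the most significant bit.
    Returns the orthonormal outputs (G, E, L) for inputs A and B."""
    g, e = 0, 1  # e stays 1 while all higher-order bit pairs are equal
    for a, b in zip(a_bits, b_bits):
        g |= e & a & (1 - b)  # the first unequal pair with a=1, b=0 sets G
        e &= 1 - (a ^ b)      # XNOR: E survives only if this pair is equal
    l = (1 - g) & (1 - e)     # L is asserted when neither G nor E is
    return g, e, l


def to_bits(x, n):
    """MSB-first list of the n bits of the integer x."""
    return [(x >> i) & 1 for i in reversed(range(n))]


if __name__ == "__main__":
    n = 4  # 4 bits of A plus 4 bits of B: the 8-variable case
    for x, y in [(9, 6), (5, 5), (3, 12)]:
        print(x, y, compare(to_bits(x, n), to_bits(y, n)))
```

Checking all 256 input combinations for n = 4 confirms that exactly one of G, E, and L is 1 for every pair, which is what "orthonormal" means here.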

CAD Architecture for Expansion of WSL-Based Combinational Circuits Dataset

Conference Paper

Mar 2024

A Model For Probabilistic Fault Propagation with the Approach of Effective Fanouts in the Logic Circuits

Conference Paper

May 2023

A Reliability-Critical Path Identifying Method With Local and Global Adjacency Probability Matrix in Combinational Circuits

Article

Jan 2023

Accurate and efficient identification of reliability-critical paths (RCPs) not only facilitates fault localization and troubleshooting but also allows circuit designers to improve circuit reliability at low cost. This paper proposes a local and global adjacency probability matrix-based approach (LGAPM) to identify the RCPs of combinational logic circuits quickly and efficiently. The approach reflects both the criticality of a path to the overall reliability of the circuit and the local criticality of the gates in the path. In addition, we design a pruning-based method to accelerate RCP identification in large-scale circuits. Experimental results of the LGAPM on all 74-series circuits, ISCAS-85, and part of the EPFL benchmark circuits show that the 74181 circuit, with the minimum of 17 paths, and the EPFL-remainder10 circuit, with the maximum of 8.081 × 10^8 paths, take about 0.18 s and 33931.04 s, respectively. The average accuracy on small and medium-scale circuits is 94.24%, and the average stability on circuits of all sizes is 86.19%. Compared with the SAT-based method, the hill-climbing algorithm, and the random method, LGAPM performs better on these metrics and is more appropriate for large-scale circuits. By hardening a tiny number of gates in the most critical identified RCPs, the overall circuit reliability can be improved from 0.7726 to 0.9238 on average, with average cost savings of 4.08 times over random hardening methods.
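Path counts of the order of 10^8 rule out explicit path enumeration, but simple path statistics can still be computed in one topological pass. The sketch below is a baseline under simplifying assumptions, not LGAPM: it counts input-to-output paths and finds the least reliable path when a path's reliability is taken as the product of independent gate reliabilities. The function name and the gate reliabilities are hypothetical; the netlist is ISCAS85 c17.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+


def analyze_paths(fanin, reliability, inputs, outputs):
    """fanin[g] lists the nets driving gate g. Counts primary-input-to-output paths and
    finds the least reliable one, taking a path's reliability as the product of the
    reliabilities of its gates (independent gate failures assumed)."""
    count = {}  # number of paths from any primary input to this node
    worst = {}  # (-log reliability, path) of the least reliable path ending here
    for node in TopologicalSorter(fanin).static_order():
        if node in inputs:
            count[node], worst[node] = 1, (0.0, [node])
            continue
        count[node] = sum(count[d] for d in fanin[node])
        cost, path = max(worst[d] for d in fanin[node])
        worst[node] = (cost - math.log(reliability[node]), path + [node])
    cost, path = max(worst[o] for o in outputs)
    return sum(count[o] for o in outputs), path, math.exp(-cost)


if __name__ == "__main__":
    # ISCAS85 c17 as gate -> driving nets; the gate reliabilities are illustrative
    fanin = {"N10": ["N1", "N3"], "N11": ["N3", "N6"], "N16": ["N2", "N11"],
             "N19": ["N11", "N7"], "N22": ["N10", "N16"], "N23": ["N16", "N19"]}
    rel = {g: 0.99 for g in fanin}
    rel["N11"] = 0.9
    print(analyze_paths(fanin, rel, {"N1", "N2", "N3", "N6", "N7"}, ["N22", "N23"]))
```

Working with -log reliability turns the least-reliable-path search into a longest-path problem on a DAG, so the cost is linear in the number of gates and wires even when the number of paths is astronomically large. The product-of-reliabilities model ignores logical masking, which is one reason approaches such as LGAPM use richer, probability-based criticality measures.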

Soft-Error Injection System for Processor on FPGA Platform

Conference Paper

Jul 2023

A Hybrid Method for Signal Probability Estimation with Combinational Circuits

Conference Paper

Nov 2022

A Framework for Reliability Analysis of Combinational Circuits Using Approximate Bayesian Inference

Article

Apr 2023

A commonly used approach to computing the error rate at the primary outputs (POs) of a circuit is to compare the fault-free and faulty copies of the circuit using XOR gates. With nonsampling-based reliability estimation methods, this model results in poor accuracy. An alternative is to use a single copy of the circuit with a four-valued representation for each net, corresponding to the correct and incorrect signals. One problem in this formulation is the accurate propagation of the associated probabilities. We use the framework of Bayesian inference (BI) to address this issue. We derive the conditional probability distribution (CPD) corresponding to the four-valued signals and find the output error rate using various approximate BI techniques. With our formulation, we demonstrate that the output error rate scales with the gate error probabilities. It is guaranteed to be zero when the gate error probability is zero, provided that approximate BI algorithms based on sum-product belief propagation (BP) are used. Although inaccuracies increase at very low gate error probabilities, the method is able to capture the relative reliability of outputs with respect to each other. We also propose a new method for finding the overall circuit error rate as the partition function for a fixed state of the POs. This method provides a significant improvement in accuracy compared with the existing method using OR gates.
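The four-valued representation can be made concrete by tracking, for every net, a joint distribution over (fault-free value, faulty value). The sketch below builds the gate CPD implicitly and propagates it on a fanout-free circuit, where inputs to each gate are independent and sum-product propagation is exact. It does not implement the paper's approximate BI on circuits with reconvergence; the names and the error probability are hypothetical.

```python
from itertools import product

STATES = list(product((0, 1), repeat=2))  # (fault-free value, faulty value)


def nand(a, b):
    return 1 - (a & b)


def gate_output(fn, in_dists, eps):
    """Propagate four-valued distributions through a gate whose faulty copy inverts its
    output with probability eps. Exact when the inputs are independent (fanout-free)."""
    out = {s: 0.0 for s in STATES}
    for combo in product(STATES, repeat=len(in_dists)):
        p = 1.0
        for dist, s in zip(in_dists, combo):
            p *= dist[s]
        if p == 0.0:
            continue
        v = fn(*(s[0] for s in combo))  # fault-free output
        w = fn(*(s[1] for s in combo))  # faulty output before the gate's own error
        out[(v, w)] += p * (1 - eps)
        out[(v, 1 - w)] += p * eps
    return out


def primary_input(p_one):
    # primary inputs are error-free: fault-free and faulty values always agree
    return {(0, 0): 1 - p_one, (0, 1): 0.0, (1, 0): 0.0, (1, 1): p_one}


def error_rate(dist):
    return dist[(0, 1)] + dist[(1, 0)]


if __name__ == "__main__":
    eps = 0.05  # hypothetical gate error probability
    a, b, c, d = (primary_input(0.5) for _ in range(4))
    g1 = gate_output(nand, [a, b], eps)
    g2 = gate_output(nand, [c, d], eps)  # g1 and g2 are distinct nets: no reconvergence
    y = gate_output(nand, [g1, g2], eps)
    print(error_rate(g1), error_rate(y))
```

Because errors enter only through the `eps` terms, setting `eps = 0` yields an output error rate of exactly zero, mirroring the guarantee stated above for sum-product propagation.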

A Computational Error Method to Datapath Single Event Transient Analysis

Chapter

Jan 2023

The reduction of device size and operating voltage poses critical challenges in the design of digital circuits. This paper presents a computational error analysis approach for reliability evaluation of the circuit datapath. The Soft Error Rate (SER) is investigated in terms of its dependency on the cell circuit structure and the input combination. Using this method, a sample cell library is evaluated to build a reliability profile for different implementation cases. Accordingly, the error rates of the different bits are extracted using Monte Carlo analysis and reflected as a computational error. The proposed method can be used alongside other methods for analysing reliability in terms of computational error: instead of increasing reliability via redundancy or other techniques, designers select a reliable design from different implementations of a component. By treating reliability as a computational cost, reliability can be traded against other implementation costs during high-level synthesis (HLS), resulting in higher performance and lower implementation costs. The computational error is simulated with a Gaussian distribution. To find the relationship between the coefficients of the Gaussian error model and the error rate, the coefficients are fitted to a cubic polynomial. Results show that this fit is suitable.
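The pipeline "per-bit error rates, then Monte Carlo computational error, then a Gaussian model, then a cubic fit" can be sketched as follows. This rests on loud assumptions: each output bit of an 8-bit result flips independently with one common rate (the chapter instead derives per-bit rates from a characterized cell library), the Gaussian is fitted by its maximum-likelihood sigma, and the cubic is fitted by ordinary least squares. All names and rates are hypothetical.

```python
import random
import statistics


def computational_error_sigma(bit_error_rate, width=8, samples=20_000, seed=0):
    """Monte Carlo: flip each bit of a random width-bit result independently with
    probability bit_error_rate, and return the sigma of the Gaussian fitted (by
    maximum likelihood) to the computational error (faulty value - correct value)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(samples):
        correct = rng.randrange(1 << width)
        faulty = correct
        for i in range(width):
            if rng.random() < bit_error_rate:
                faulty ^= 1 << i
        errors.append(faulty - correct)
    return statistics.pstdev(errors)


def fit_cubic(xs, ys):
    """Least-squares cubic fit; returns [c0, c1, c2, c3] for y = c0 + c1*x + c2*x^2 + c3*x^3."""
    n = 4
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]  # normal equations
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        coeffs[r] = (b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, n))) / a[r][r]
    return coeffs


if __name__ == "__main__":
    rates = [0.01, 0.02, 0.05, 0.1, 0.2]  # hypothetical per-bit error rates
    sigmas = [computational_error_sigma(p) for p in rates]
    print([round(s, 2) for s in sigmas])
    print(fit_cubic(rates, sigmas))  # sigma(p) ~ c0 + c1*p + c2*p^2 + c3*p^3
```

Under this independent bit-flip model the error variance is p·(4^8 − 1)/3, so sigma grows like the square root of the bit error rate; a cubic polynomial is then a smooth empirical approximation over the sampled range rather than an exact law.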

The Impact of Logic Gates Susceptibility in Overall Circuit Reliability Analysis

Conference Paper

May 2022

A Pruning and Feedback Strategy for Locating Reliability-Critical Gates in Combinational Circuits

Article

Jan 2022

In nanometric integrated circuits, hardening reliability-critical gates (RCGs) is an important step toward improving overall circuit reliability at low cost. Locating RCGs quickly and efficiently is a key prerequisite for selective hardening at the early stage of circuit design. This article develops a new approach for locating RCGs for multiple input vectors in combinational circuits, using an input-vector-oriented pruning technique to identify RCGs and a sensitivity-based algorithm to measure the criticality of gate reliability (CGR) of each identified RCG. To accelerate the location of RCGs, a feedback-based algorithm mines the accumulated simulation data for each RCG, and a grouping algorithm handles RCGs with similar CGR during convergence checking. Simulations on 74-series and ISCAS 85 benchmark circuits show that the average accuracy of the proposed method is 0.986 with Monte Carlo (MC) as the reference and that it is 7181 times faster than the MC model. The method also performs better than other approximate algorithms in terms of location accuracy and time overhead.
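A sensitivity-style criticality score can be illustrated by exact enumeration on a circuit small enough to enumerate. The sketch below defines a gate's score as the gain in circuit reliability when that gate alone is made error-free; this definition, the function names, and the uniform 0.95 gate reliability are assumptions for illustration, not the article's CGR formula, pruning, or feedback algorithms. The netlist is ISCAS85 c17, and the fault model is an independent output inversion per gate.

```python
from itertools import product

# ISCAS85 c17: (output net, input 1, input 2) for six 2-input NANDs, in topological order
C17 = [("N10", "N1", "N3"), ("N11", "N3", "N6"), ("N16", "N2", "N11"),
       ("N19", "N11", "N7"), ("N22", "N10", "N16"), ("N23", "N16", "N19")]
INPUTS = ["N1", "N2", "N3", "N6", "N7"]
OUTPUTS = ["N22", "N23"]
VECTORS = list(product((0, 1), repeat=len(INPUTS)))


def evaluate(vector, flips):
    """Simulate c17; flips[i] = 1 inverts the output of the i-th gate."""
    nets = dict(zip(INPUTS, vector))
    for (out, a, b), f in zip(C17, flips):
        nets[out] = (1 - (nets[a] & nets[b])) ^ f
    return tuple(nets[o] for o in OUTPUTS)


def circuit_reliability(rel):
    """Exact P(all outputs correct) for uniformly random input vectors, where gate g
    independently inverts its output with probability 1 - rel[g]."""
    golden = {v: evaluate(v, (0,) * len(C17)) for v in VECTORS}
    total = 0.0
    for flips in product((0, 1), repeat=len(C17)):
        p = 1.0
        for (out, _, _), f in zip(C17, flips):
            p *= (1 - rel[out]) if f else rel[out]
        if p > 0.0:
            ok = sum(evaluate(v, flips) == golden[v] for v in VECTORS)
            total += p * ok / len(VECTORS)
    return total


def gate_criticality(rel):
    """Sensitivity-style score: reliability gained if gate g alone were made error-free."""
    base = circuit_reliability(rel)
    return {g: circuit_reliability({**rel, g: 1.0}) - base for g in rel}


if __name__ == "__main__":
    rel = {out: 0.95 for out, _, _ in C17}  # hypothetical uniform gate reliability
    for gate, score in sorted(gate_criticality(rel).items(), key=lambda kv: -kv[1]):
        print(gate, round(score, 4))
```

Exhaustive enumeration costs 2^(gates + inputs) simulations, which is exactly why the article resorts to pruning, feedback, and grouping for realistic circuits; the sketch only makes the notion of a sensitivity-based ranking concrete.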

An Efficient Method for Sequential Circuit Reliability Estimation

Conference Paper

Aug 2022

A hybrid method for signal probability and reliability estimation with combinational circuits

Article

Aug 2022
INTEGRATION

Today's high-density integrated circuits (ICs) have become more sensitive to temperature, process variations, and environmental noise, which makes reliability evaluation one of the critical issues in the design flow. Circuit reliability evaluation generally requires estimating error-free signal probabilities and signal reliability correlations. In this paper, we present a fast hybrid method that estimates signal probability and reliability simultaneously by combining analysis with statistical simulation. Simulation results show that the proposed method is hundreds of times faster than Monte Carlo simulation while maintaining a high level of accuracy.
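The statistical-simulation ingredient, estimating error-free signal probabilities and signal reliabilities in the same pass, can be sketched with a fault-free and a faulty copy simulated side by side. How the paper combines this with analysis is not reproduced here; the three-gate netlist, the gate error probability `eps`, and the function name are hypothetical.

```python
import random

OPS = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y, "XOR": lambda x, y: x ^ y}
# Hypothetical netlist with a reconvergent fanout on input b: (output, op, in1, in2)
NETLIST = [("n1", "AND", "a", "b"), ("n2", "OR", "b", "c"), ("y", "XOR", "n1", "n2")]


def estimate(input_probs, eps, samples=20_000, seed=0):
    """Monte Carlo estimate, in one pass, of each gate net's error-free signal
    probability P(net = 1) and signal reliability P(faulty value == error-free value).
    Each gate output flips independently with probability eps; inputs are error-free."""
    rng = random.Random(seed)
    ones = {out: 0 for out, *_ in NETLIST}
    agree = {out: 0 for out, *_ in NETLIST}
    for _ in range(samples):
        good = {net: int(rng.random() < p) for net, p in input_probs.items()}
        bad = dict(good)
        for out, op, x, y in NETLIST:
            good[out] = OPS[op](good[x], good[y])
            bad[out] = OPS[op](bad[x], bad[y]) ^ int(rng.random() < eps)
            ones[out] += good[out]
            agree[out] += good[out] == bad[out]
    return ({n: c / samples for n, c in ones.items()},
            {n: c / samples for n, c in agree.items()})


if __name__ == "__main__":
    sp, rel = estimate({"a": 0.5, "b": 0.5, "c": 0.5}, eps=0.05)
    print(sp)
    print(rel)
```

Simulation captures signal correlations automatically: here P(y = 1) is 0.5, whereas a purely analytical propagation that treats n1 and n2 as independent would give 0.25 · 0.25 + 0.75 · 0.75 = 0.625, because both depend on input b. Handling such correlations cheaply is what motivates combining analysis with simulation.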

Identifying Reliability-Critical Primary Inputs of Combinational Circuits Based on the Model of Gate-Sensitive Attributes

Article

Jan 2022

The identification of reliability-critical primary input leads (RCPIs) plays an important role in testing logic circuits and predicting their reliability boundaries. This paper presents a gate-sensitive-attributes-based approach to estimate the criticality of the primary input leads of combinational circuits with respect to circuit reliability. For a given input vector, a subcircuit-based traversal method marks the critical input leads of each gate in a circuit. Gate-sensitive attributes and a reverse recursive algorithm quantify the effect of each RCPI on circuit reliability under that input vector. A parallel calculation method based on subcircuits with only one primary output reduces the computational complexity and accelerates the calculation. Similarity-based clustering avoids unnecessary calculations, and a self-adaptive strategy is used to check convergence. Experimental results on benchmark circuits show that the average accuracy of this approach is 0.9634 with Monte Carlo (MC) as the reference and that it is 3445 times faster than MC on average, while its average memory cost is 1.67 times greater than that of the MC model. Although the fitness of the worst input vector obtained by the other reference method is 1.09 times better than that of this approach on average, this approach is approximately 21 times faster than that reference method on average.

An Analytical Model for Circuit Reliability Estimation

Conference Paper

Aug 2021

A Novel Coplanar Based Adder Logic Design Using QCA

Article

Aug 2021

Nowadays, VLSI is one of the foremost technologies used in the field of electronics and communication. It is used to create integrated circuits by combining millions of MOS transistors on a single chip. In VLSI, most transistors are designed at the micro scale. Since ever more compact devices are demanded, it is necessary to design circuits at the nanoscale. CMOS technologies are used to design integrated circuit (IC) chips in VLSI, but the feature sizes used in CMOS circuit design have been at the micro scale. Researchers are therefore introducing new nanotechnologies, one of which is quantum-dot cellular automata (QCA). Logic gates are among the fundamental components of any electronic circuit. In this paper, a novel coplanar approach to designing an efficient QCA-based 4-bit full adder using XOR/XNOR logic gates is proposed. The QCA Designer version 2.0.3 simulation tool is used, and the performance of the proposed full adder is analysed and verified to determine its capabilities.
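At the logic level, a QCA adder is usually described with the three-input majority gate (QCA's basic primitive) for the carry and XOR gates for the sum. The behavioural sketch below assumes that decomposition and reads the "4-bit full adder" as a 4-bit ripple-carry adder; it checks the logic only and does not model the paper's coplanar cell layout or its XOR/XNOR gate. All names are hypothetical.

```python
def maj(a, b, c):
    """Three-input majority gate, the basic QCA logic primitive."""
    return (a & b) | (b & c) | (a & c)


def xor(a, b):
    # Behavioural stand-in for a QCA XOR cell; the paper's coplanar layout is not modeled.
    return a ^ b


def full_adder(a, b, cin):
    return xor(xor(a, b), cin), maj(a, b, cin)  # (sum, carry-out)


def ripple_add(x, y, n=4, cin=0):
    """n-bit ripple-carry adder assembled from 1-bit full adders."""
    total, carry = 0, cin
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    print(ripple_add(9, 7))  # 4-bit sum and carry-out
```

Fixing one majority input to 0 or 1 turns the gate into a 2-input AND or OR, respectively, which is why the majority gate alone, plus inverters, is logically complete in QCA.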

An Efficient Approach to Tolerate Soft Errors in Combinational Circuits

Conference Paper

May 2021

Probability gate model based methods for approximate arithmetic circuits reliability estimation

Article

Apr 2021

With the rapid development of approximate computing technology, the reliability evaluation of approximate circuits has attracted significant interest. So far, few methods can be applied to estimate the reliability of approximate circuits; the existing ones are based on the probabilistic transfer matrix (PTM) and the Monte Carlo (MC) method. However, PTM-based methods are confined to small-scale approximate circuits and to large circuits with weak signal correlation, and the MC method is time-consuming when accurate results are needed. This paper proposes an algorithm for determining the acceptable outputs of approximate dividers based on the design principle of the approximate divider. Then, based on the probabilistic gate model (PGM), this paper presents three methods for estimating the reliability of gate-level approximate arithmetic circuits. The non-processing correlation algorithm ignores the correlation among signals to obtain an approximate value of circuit reliability, and its time complexity grows linearly with the number of gates. The processing correlation algorithm accounts for the correlation caused by fanout nodes in approximate arithmetic circuits and has a clear advantage in accuracy; however, its time complexity is exponential in the number of fanout nodes in the circuit. The fusion algorithm considers the effect of each fanout node on circuit reliability separately and then uses a linear model to obtain the circuit reliability. Although some accuracy is lost, its time complexity is linear. Experimental results on benchmark circuits show that the proposed methods are effective and have certain advantages in accuracy and efficiency compared with existing methods.
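The difference between ignoring and handling fanout correlation can be seen on a four-NAND circuit with a single reconvergent fanout: f = NAND(a, b) feeds both g1 = NAND(f, c) and g2 = NAND(f, d), which reconverge at y = NAND(g1, g2). The sketch assumes the PGM form P(out = 1) = (1 − ε)·P_ideal + ε·(1 − P_ideal), with independent per-gate error probability ε and a deterministic input vector; the circuit, ε, and all function names are hypothetical. The correlation-aware version conditions on the value of f, and a brute-force enumeration of error patterns serves as the reference.

```python
from itertools import product

EPS = 0.1  # hypothetical per-gate error probability


def pgm_nand(pa, pb, eps=EPS):
    """PGM of a NAND gate with independent inputs:
    P(out=1) = (1-eps)*P(ideal NAND=1) + eps*P(ideal NAND=0)."""
    p_ideal = 1 - pa * pb
    return (1 - eps) * p_ideal + eps * (1 - p_ideal)


def approx_p_y(a, b, c, d):
    """Ignores correlation: g1 and g2 are treated as independent although both use f."""
    pf = pgm_nand(a, b)
    return pgm_nand(pgm_nand(pf, c), pgm_nand(pf, d))


def accurate_p_y(a, b, c, d):
    """Handles correlation by conditioning on the reconvergent fanout node f."""
    pf = pgm_nand(a, b)
    return sum(pv * pgm_nand(pgm_nand(v, c), pgm_nand(v, d))
               for v, pv in ((1, pf), (0, 1 - pf)))


def brute_force_p_y(a, b, c, d, eps=EPS):
    """Reference: enumerate the error patterns of the four gates (f, g1, g2, y)."""
    def nand(x, y):
        return 1 - (x & y)
    total = 0.0
    for e in product((0, 1), repeat=4):
        p = 1.0
        for bit in e:
            p *= eps if bit else 1 - eps
        f = nand(a, b) ^ e[0]
        total += p * (nand(nand(f, c) ^ e[1], nand(f, d) ^ e[2]) ^ e[3])
    return total


if __name__ == "__main__":
    # input vector 1111: the correct output is y = 0, so reliability = 1 - P(y = 1)
    print(1 - approx_p_y(1, 1, 1, 1), 1 - accurate_p_y(1, 1, 1, 1))
```

For the input vector 1111 the correlation-blind estimate gives a reliability of 0.63792, whereas conditioning on f gives 0.684, matching the brute-force reference. Conditioning doubles the work for each dependent fanout node, which is the exponential cost mentioned above.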

BM-RCGL: Benchmarking Approach for Localization of Reliability-Critical Gates in Combinational Logic Circuits

Article

Apr 2021

This paper introduces an accurate and effective approach for localizing reliability-critical gates (RCGs) in combinational logic blocks through a benchmarking technique. In the proposed approach, uniform non-Bernoulli sequences are used to produce a set of input vectors for driving circuits. A full-period linear congruential algorithm is employed to generate a sequence that provides the sampling order of the RCGs to be analyzed. This ensures that each gate in the circuit is treated as fairly as possible. To accelerate the localization process, an input-vector-based pruning technique combined with a counting method is also introduced to identify the specified number of RCGs. Then, the criticality of gate reliability of each RCG is measured through benchmarking. A clustering algorithm carries out the convergence checking for the proposed approach. The performance of the proposed approach was evaluated in terms of accuracy, stability, and time-space overhead by various simulations on 74-series circuits and ISCAS-85 benchmark circuits. The results show that its accuracy is close to that of the Monte Carlo model and that its stability is better than that of other approximate methods. Moreover, compared with other approximate methods, our approach has a lower time overhead at similar memory overheads.
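A full-period linear congruential generator visits every residue modulo m exactly once per period, so it yields a pseudo-random ordering in which every gate index appears once. The sketch below checks the Hull–Dobell conditions for x_{k+1} = (a·x_k + c) mod m and produces such an order; the gate count m = 16 and the parameters a = 5, c = 3 are hypothetical, and the paper's actual parameters are not given here.

```python
import math


def prime_factors(m):
    factors, d = set(), 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors


def is_full_period(m, a, c):
    """Hull-Dobell theorem for x_{k+1} = (a*x_k + c) mod m with c != 0: full period iff
    gcd(c, m) = 1, (a - 1) is divisible by every prime factor of m, and
    (a - 1) is divisible by 4 whenever m is."""
    return (math.gcd(c, m) == 1
            and all((a - 1) % p == 0 for p in prime_factors(m))
            and ((a - 1) % 4 == 0 if m % 4 == 0 else True))


def lcg_order(m, a, c, seed=0):
    """First m states of the LCG; a permutation of 0..m-1 when the period is full."""
    order, x = [], seed
    for _ in range(m):
        order.append(x)
        x = (a * x + c) % m
    return order


if __name__ == "__main__":
    print(is_full_period(16, 5, 3), lcg_order(16, 5, 3))  # hypothetical: 16 gates
```

With a = 3 instead of 5 the third condition fails (a − 1 = 2 is not divisible by 4) and the sequence cycles after only 8 distinct gates, so some gates would never be sampled.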

Design and execution of programmable logic device using quantum dot cellular automata

Article

Mar 2021

Quantum-dot cellular automata (QCA) is a nanoscale technology that is being explored by researchers in the VLSI domain as an alternative to CMOS transistors. In QCA, the elemental units of basic logic gates are QCA cells. In this work, a novel XOR/XNOR function gate with two inputs, a single enable input, and one output is proposed and designed in QCA nanotechnology. QCA circuits can be used to study and design programmable logic devices (PLDs). The work presents rules for designing specialized architectures with programmable devices, and a simulation engine is proposed to fully simulate the QCA circuits targeted by this construction. The presented cell structures are simulated and tested using the QCA Designer tool. These designs are superior to existing designs in cell complexity and covered area.

High Efficiency and Low Overkill Testing for Probabilistic Circuits

Conference Paper

Sep 2020

Built-in self-repair structure for real-time fault recovery applications

Article

Aug 2020
MICROELECTRON RELIAB

Recent developments in high-tech industries have led to integrated circuits being used increasingly in critical applications such as medical supervisory systems and space applications. These systems face two major challenges, namely reliability and timing requirements. In other words, these systems have to operate correctly while meeting timing requirements, even in the presence of several faults; therefore, they must be protected by real-time fault-tolerance techniques that deal with both challenges. However, most existing approaches consider only one of these factors. In this paper, we propose a new reconfigurable application-specific integrated circuit (ASIC) structure with real-time fault-tolerance capability, which repairs itself in two steps with minimum delay. It includes reconfigurable basic cells with self-repair capability, which implement the first recovery step. In the second step, fault recovery is performed by replacing the faulty basic cell through a new reconfigurable routing network. According to the simulation results, the proposed structure can tolerate several permanent faults with minimum delay, power consumption, and area overhead.

Recent advances on reliability evaluation and optimization of linear multistate consecutively connected systems

Article

May 2020

With the progress of technology, the linear multistate consecutively connected system (LMCCS) model is being applied ever more widely, for example in telecommunication systems and the Internet of Things. An LMCCS contains a series of linearly arranged nodes, and any disconnection between nodes leads to the failure of the whole system. The reliability of such a system is therefore affected by many factors, such as the positioning of elements. To incorporate different types of complicated practical factors, researchers have proposed various extensions of the LMCCS model. In addition, models have been proposed to study the optimal configurations of such systems considering reliability together with other objectives or constraints. Given the large body of work on LMCCSs, this paper reviews this literature, classifies it, and elicits some future research directions.

A Novel XOR/XNOR Structure for Modular Design of QCA Circuits

Article

Apr 2020

Quantum-dot cellular automata (QCA) is considered one of the most promising technologies to replace the current CMOS technology. Unlike traditional transistor technology, QCA computation relies on a new paradigm based on the interaction between nearby QCA cells. It has significant advantages, such as a high operating frequency (in the THz range), high device density, and low power consumption. In this paper, a novel XOR/XNOR-function logic gate with two inputs, two enable inputs, and one output is proposed and designed in QCA nanotechnology. To demonstrate the functionality and capabilities of the proposed QCA-based XOR/XNOR architecture, its performance is evaluated and analyzed. The proposed XOR/XNOR logic gate performs very well in terms of area, complexity, power consumption, and cost function in comparison with some existing QCA-based XOR architectures. Moreover, some efficient circuits based on the proposed XOR/XNOR gate are designed in QCA.

Reliability Estimation of Approximate Circuits Based on Probabilistic Gate Model

Conference Paper

Dec 2019

Accurate Soft Error Rate Reduction using Modified Resolution Method

Conference Paper

Sep 2019

Allocating Gate Reliability for Circuit Reliability Optimization

Conference Paper

Aug 2019

A Locating Method for Reliability-Critical Gates with a Parallel-Structured Genetic Algorithm

Article

Sep 2019

The reliability allowance of circuits tends to decrease as circuit integration increases and new technologies and materials are applied, and gate-oriented hardening is an effective technique for improving circuit reliability in this situation. Therefore, a parallel-structured genetic algorithm (GA), PGA, is proposed in this paper to locate reliability-critical gates so that targeted hardening can be performed. First, we design a binary coding method for reliability-critical gates and build an ordered initial population consisting of dominant individuals to improve the quality of the initial population. Second, we construct an embedded parallel operation loop for directional crossover and directional mutation to compensate for the poor local search of the GA. Third, in combination with a diversity-protection strategy for the population, we design an elitism-retention-based selection method to boost the convergence speed and avoid being trapped in a local optimum. Finally, we present an ordered identification method for reliability-critical gates that uses a scoring mechanism to retain the potential optimal solutions in each round, improving the robustness of the proposed locating method. Simulation results on benchmark circuits show that the proposed PGA is an efficient method for locating reliability-critical gates in terms of accuracy and convergence speed.

Circuit Reliability Prediction Based on Deep Autoencoder Network

Article

Sep 2019
NEUROCOMPUTING

As semiconductor feature size continues to decrease and integration density continues to increase, highly reliable circuit design faces many challenges, including reliability evaluation, which is one of the most important steps in circuit design. However, given the very large scale of present-day integrated circuits, traditional simulation-based methods are somewhat inadequate in terms of computational complexity and do not apply to circuits at the concept stage. To solve this problem, this paper presents a new prediction method for circuit reliability based on deep autoencoder networks. First, we analyze and extract the main features associated with circuit reliability. Next, we construct an efficient data-collection method by combining the characteristics of the feature set with the requirements of deep autoencoder networks. Then, we build a deep autoencoder network model for circuit reliability prediction in a supervised learning manner. Simulation results on 74-series circuits and ISCAS85 benchmark circuits show that although the accuracy of the proposed method is slightly lower than that of both the Monte Carlo (MC) method and the fast probabilistic transfer matrix (F-PTM) model, its time-space consumption is approximately constant across circuits, and it is 102,458,469 times faster than the MC method and approximately 4,383 times faster than the F-PTM model. Furthermore, the proposed method can predict circuit reliability at the conceptual stage, and it is a very efficient approximation method that could greatly reduce the power consumption of the computation.

ATPG and Test Compression for Probabilistic Circuits

Conference Paper

Apr 2019

Defect Tolerant N^2-Transistor Structure for Reliable Nanoelectronic Designs

Article

Dec 2009

Nanodevice-based circuit design will have to accept that a high percentage of the devices in a design will be defective. This study investigates a defect-tolerant technique that adds redundancy at the transistor level and provides built-in immunity to permanent defects (stuck-open, stuck-short, and bridges). The proposed technique replaces each transistor by an N^2-transistor structure (N ≥ 2) that guarantees tolerance of any N-1 defects, as validated by theoretical analysis and simulation. As demonstrated by extensive simulation results on the ISCAS 85 and 89 benchmark circuits, the investigated technique achieves significantly higher defect tolerance than recently reported nanoelectronic defect-tolerant techniques (even with up to 4-5 times higher transistor defect probability) and at reduced area overhead. For example, the quadded-transistor structure requires nearly half the area of the quadded-logic technique.
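The N-1 defect-tolerance claim can be checked by exhaustive enumeration for a structure of N parallel branches, each made of N transistors in series. That topology is one possible N^2 arrangement and an assumption of this sketch, as are the function names; only stuck-open and stuck-short defects are modeled (bridges are not). A structure is counted as tolerant if it still conducts when the gate is on and blocks when the gate is off.

```python
from itertools import product

STATES = ("ok", "open", "short")  # defect-free, stuck-open, stuck-short


def transistor_conducts(state, gate_on):
    return state == "short" or (state == "ok" and gate_on)


def structure_conducts(states, n, gate_on):
    """Hypothetical N^2 arrangement: n parallel branches of n series transistors;
    states is a flat tuple of n*n transistor states, listed branch by branch."""
    branches = [states[i * n:(i + 1) * n] for i in range(n)]
    return any(all(transistor_conducts(s, gate_on) for s in br) for br in branches)


def tolerates(states, n):
    # the structure must behave like one defect-free transistor for both gate values
    return structure_conducts(states, n, True) and not structure_conducts(states, n, False)


def tolerates_all_up_to(n, max_defects):
    return all(tolerates(states, n)
               for states in product(STATES, repeat=n * n)
               if sum(s != "ok" for s in states) <= max_defects)


if __name__ == "__main__":
    print(tolerates_all_up_to(2, 1), tolerates_all_up_to(2, 2))
```

The argument behind the result is short: with at most N-1 stuck-open defects at least one branch has no open transistor, so the structure still conducts when on, and a branch conducts when off only if all N of its transistors are stuck-short, which needs N defects. For N = 2 (the quadded-transistor case), every single defect is tolerated, but some double defects are not.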

Probabilistic Error Modeling for Nano-Domain Logic Circuits

Article

Feb 2009

In nano-domain logic circuits, errors are transient in nature and arise from the uncertainty or unreliability of the computing element itself. These errors, which we refer to as dynamic errors, are to be distinguished from traditional faults and radiation-related errors. Because such dynamic errors are highly likely, it is more appropriate to model nano-domain computing as probabilistic rather than deterministic. We propose a probabilistic error model based on Bayesian networks to estimate the expected output error probability, given the dynamic error probability of each device. This estimate is crucial for nano-domain circuit designers, who need to compare and rank designs by their expected output error. We estimate the overall output error probability by comparing the outputs of a dynamic-error-encoded model with those of an ideal logic model. We prove that this probabilistic framework is a compact and minimal representation of the overall effect of dynamic errors in a circuit. We use both exact and approximate Bayesian inference schemes to propagate probabilities. Exact inference runs faster than the state of the art by exploiting the conditional independencies of the underlying probabilistic framework. However, exact inference is NP-hard in the worst case and can handle only small circuits, so we use two approximate inference schemes for medium-sized benchmarks. We demonstrate the efficiency and accuracy of these approximate schemes by comparing their estimates with logic simulation results. Our experiments use the LGSynth'93 and ISCAS'85 benchmark circuits.
We use our probabilistic model to: 1) compute the error sensitivity of individual gates in a circuit; 2) compute overall exact error probabilities for small circuits; 3) compute approximate error probabilities for medium-sized benchmarks using two stochastic sampling schemes; 4) compare and vet designs with respect to dynamic errors; 5) characterize the input space for desired output characteristics using the unique backtracking capability of Bayesian networks (the inverse problem); and 6) apply selective redundancy to highly sensitive nodes for error-tolerant designs.
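To make the quantity being estimated concrete, the sketch below computes the exact output error probability of a tiny circuit by brute-force enumeration: it compares an ideal evaluation with one in which each gate independently inverts its output with probability eps. This is not the authors' Bayesian-network inference; it computes the same exact quantity for a hypothetical three-NAND netlist (with reconvergent fanout on input a) at a cost exponential in the numbers of inputs and gates, which illustrates why exact methods are limited to small circuits.

```python
from itertools import product

# Hypothetical netlist: (output_name, gate function, input_names), in topological order.
CIRCUIT = [
    ("n1", lambda x, y: 1 - (x & y), ("a", "b")),
    ("n2", lambda x, y: 1 - (x & y), ("a", "c")),
    ("out", lambda x, y: 1 - (x & y), ("n1", "n2")),
]


def evaluate(circuit, inputs, flips):
    """Evaluate the netlist; flips[i] == 1 inverts the output of gate i."""
    values = dict(inputs)
    for (name, fn, args), flip in zip(circuit, flips):
        values[name] = fn(*(values[a] for a in args)) ^ flip
    return values


def output_error_probability(circuit, input_probs, eps, output="out"):
    """Exact P(noisy output != ideal output), enumerating inputs and gate errors."""
    names = list(input_probs)
    total = 0.0
    for bits in product((0, 1), repeat=len(names)):
        p_in = 1.0
        for name, bit in zip(names, bits):
            p_in *= input_probs[name] if bit else 1 - input_probs[name]
        inputs = dict(zip(names, bits))
        ideal = evaluate(circuit, inputs, [0] * len(circuit))[output]
        for flips in product((0, 1), repeat=len(circuit)):
            p_err = 1.0
            for f in flips:
                p_err *= eps if f else 1 - eps
            if evaluate(circuit, inputs, flips)[output] != ideal:
                total += p_in * p_err
    return total


print(output_error_probability(CIRCUIT, {"a": 0.5, "b": 0.5, "c": 0.5}, 0.05))
```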

Estimation and optimization of reliability of noisy digital circuits

Conference Paper

Mar 2009

With continued scaling, reliability is emerging as a critical challenge for the designers of digital circuits. The challenge stems in part from the lack of computationally efficient techniques for analyzing and optimizing circuits for reliability. To address this problem, we propose an exact analysis method based on circuit transformations. We also propose a hybrid method that combines exact analysis with probabilistic measures to estimate reliability, and we use these measures in a rewiring-based optimization framework to optimize reliability. Our hybrid approach offers a 56X speedup over a pure Monte Carlo simulation-based approach with only a 3.5% loss in accuracy. Our optimization framework improves reliability by about 10%, accompanied by a 6.9% reduction in circuit area.
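For reference, the pure Monte Carlo baseline that such hybrid methods are compared against can be sketched in a few lines: draw random input vectors, simulate the circuit once fault-free and once with each gate inverting its output with probability eps, and count agreements. The netlist, the uniform input distribution and the gate error model are assumptions for illustration, not details from the paper.

```python
import random


def nand(x, y):
    return 1 - (x & y)


# Hypothetical netlist in topological order: (output, gate function, inputs).
NETLIST = [
    ("n1", nand, ("a", "b")),
    ("n2", nand, ("a", "c")),
    ("out", nand, ("n1", "n2")),
]


def simulate(netlist, inputs, eps, rng):
    """Evaluate the netlist; if rng is given, each gate output flips with probability eps."""
    values = dict(inputs)
    for name, fn, args in netlist:
        v = fn(*(values[a] for a in args))
        if rng is not None and rng.random() < eps:
            v ^= 1
        values[name] = v
    return values


def monte_carlo_reliability(netlist, input_names, eps, trials=100_000, seed=1, output="out"):
    """Fraction of random trials in which the noisy output equals the fault-free output."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        inputs = {n: rng.randint(0, 1) for n in input_names}
        ideal = simulate(netlist, inputs, eps, None)[output]
        noisy = simulate(netlist, inputs, eps, rng)[output]
        correct += ideal == noisy
    return correct / trials


print(monte_carlo_reliability(NETLIST, "abc", 0.05))
```

The estimate's standard error shrinks only as 1/sqrt(trials), which is the cost that exact and hybrid analyses try to avoid.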

Probabilistic transfer matrices in symbolic reliability analysis of logic circuits

Article

Feb 2008

We propose the probabilistic transfer matrix (PTM) framework to capture nondeterministic behavior in logic circuits. PTMs provide a concise description of both normal and faulty behavior, and are well-suited to reliability and error susceptibility calculations. A few simple composition rules based on connectivity can be used to recursively build larger PTMs (representing entire logic circuits) from smaller gate PTMs. PTMs for gates in series are combined using matrix multiplication, and PTMs for gates in parallel are combined using the tensor product operation. PTMs can accurately calculate joint output probabilities in the presence of reconvergent fanout and inseparable joint input distributions. To improve computational efficiency, we encode PTMs as algebraic decision diagrams (ADDs). We also develop equivalent ADD algorithms for newly defined matrix operations such as eliminate variables and eliminate redundant variables, which aid in the numerical computation of circuit PTMs. We use PTMs to evaluate circuit reliability and derive polynomial approximations for circuit error probabilities in terms of gate error probabilities. PTMs can also analyze the effects of logic and electrical masking on error mitigation. We show that ignoring logic masking can overestimate errors by an order of magnitude. We incorporate electrical masking by computing error attenuation probabilities, based on analytical models, into an extended PTM framework for reliability computation. We further define a susceptibility measure to identify gates whose errors are not well masked. We show that hardening a few gates can significantly improve circuit reliability.
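A minimal pure-Python sketch of the two composition rules stated above: series stages multiply PTMs, and parallel gates combine by the tensor (Kronecker) product. Row i of a PTM holds P(output = j | input = i). The gate model (output flipped with probability eps), the helper names, and the example circuit (an OR built as NAND(NOT a, NOT b)) are assumptions for illustration; the paper's ADD encoding is not reproduced.

```python
def matmul(A, B):
    """Series composition: the output of stage A feeds stage B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kron(A, B):
    """Parallel composition: A's index is the most significant bit."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def gate_ptm(truth_table, eps):
    """Single-output gate whose ideal output truth_table[i] is flipped with probability eps."""
    return [[1 - eps, eps] if out == 0 else [eps, 1 - eps] for out in truth_table]


NOT_TT, NAND_TT, OR_TT = [1, 0], [1, 1, 1, 0], [0, 1, 1, 1]


def or_circuit_ptm(eps):
    # Layer 1: two inverters in parallel; layer 2: a NAND in series.
    layer1 = kron(gate_ptm(NOT_TT, eps), gate_ptm(NOT_TT, eps))
    return matmul(layer1, gate_ptm(NAND_TT, eps))


def reliability(ptm, itm, input_dist):
    """Probability that the circuit produces its ideal (ITM) output."""
    return sum(p * sum(ptm[i][j] * itm[i][j] for j in range(len(itm[0])))
               for i, p in enumerate(input_dist))


ITM = gate_ptm(OR_TT, 0.0)  # ideal transfer matrix of OR
print(reliability(or_circuit_ptm(0.1), ITM, [0.25] * 4))
```

Because the full joint output distribution is carried, reconvergent fanout needs no special handling here; the price is matrices that grow exponentially with circuit width, which motivates the ADD encoding.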

A system architecture solution for unreliable nanoelectronic devices

Article

Jan 2003

The shrinking of electronic devices will inevitably introduce a growing number of defects and make these devices more sensitive to external influences. It is therefore likely that emerging nanometer-scale devices will eventually suffer from more errors than classical silicon devices in large-scale integrated circuits. To make systems based on nanometer-scale devices reliable, fault-tolerant architectures will be necessary. Initiated by von Neumann, the NAND multiplexing technique, based on massive duplication of imperfect devices and randomized imperfect interconnects, was studied in the past using an extremely high degree of redundancy. In this paper, NAND multiplexing is extended to a rather low degree of redundancy, and the stochastic Markov nature at the heart of the system is discovered and studied, leading to a comprehensive fault-tolerance theory. A system architecture based on NAND multiplexing is investigated by studying the problem of random background charges in single electron tunneling (SET) circuits. It may offer a system-level solution for the ultra-large-scale integration of highly unreliable nanometer-scale devices.
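The executive stage of NAND multiplexing can be simulated directly: two bundles of N lines, a random permutation of one bundle, N noisy NAND gates, and a count of stimulated output lines. The sketch below assumes that each NAND inverts its output independently with probability eps and that inputs are fixed bundles. It compares the simulated mean with the expected fraction (1 − xy)(1 − eps) + xy·eps, which holds because under a random permutation a given pair is doubly stimulated with probability xy. Restorative stages are omitted.

```python
import random


def nand_mux_stage(x_bundle, y_bundle, eps, rng):
    """One executive stage: permute Y randomly, NAND line pairs, flip each output w.p. eps."""
    perm = y_bundle[:]
    rng.shuffle(perm)
    out = []
    for a, b in zip(x_bundle, perm):
        z = 1 - (a & b)
        if rng.random() < eps:
            z ^= 1
        out.append(z)
    return out


def expected_output_fraction(x, y, eps):
    """Expected fraction of stimulated outputs for input fractions x and y."""
    return (1 - x * y) * (1 - eps) + x * y * eps


def average_output_fraction(n, x, y, eps, trials=2000, seed=7):
    rng = random.Random(seed)
    xb = [1] * round(x * n) + [0] * (n - round(x * n))
    yb = [1] * round(y * n) + [0] * (n - round(y * n))
    total = 0.0
    for _ in range(trials):
        total += sum(nand_mux_stage(xb, yb, eps, rng)) / n
    return total / trials


print(average_output_fraction(100, 0.9, 0.9, 0.01), expected_output_fraction(0.9, 0.9, 0.01))
```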

The Search for Alternative Computational Paradigms

Article

Aug 2008

Nanometer processes are characterized by extremes of process variations, noise, soft errors, and other nonidealities, which threaten to nullify the intrinsic benefits of scaling. The resulting robustness and energy efficiency problem cannot be addressed in a cost-effective manner solely through advances in manufacturing. Alternative models of computation are needed that thrive in the presence of statistical variations in the underlying device and circuit fabric. This article explores communications-inspired models of computation supported by innovative robust circuit and logic fabric design approaches. These models share the common feature of leveraging dense networks with information exchange and coupling among nodes to enhance robustness without compromising energy efficiency. Promising post-silicon devices such as carbon nanotubes (CNTs) offer an attractive platform on which to build such computational systems. This article identifies opportunities and challenges in designing robust and low-power SoCs in emerging nanoscale process technologies, employing radically new modes of computation.

Toward Hardware-Redundant, Fault-Tolerant Logic for Nanoelectronics

Article

Aug 2005

This article provides an overview of several logic redundancy schemes, including von Neumann's multiplexing logic, N-tuple modular redundancy, and interwoven redundant logic. We discuss several important concepts for redundant nanoelectronic system designs based on recent results. First, we use Markov chain models to describe the error-correcting and stationary characteristics of multiple-stage multiplexing systems. Second, we show how to obtain the fundamental error bounds by using bifurcation analysis based on probabilistic models of unreliable gates. Third, we describe the notion of random interwoven redundancy. Finally, we compare the reliabilities of quadded and random interwoven structures by using a simulation-based approach. We observe that the deeper a circuit's logical depth, the more fault-tolerant the circuit tends to be for a fixed number of faults. For a constant gate failure rate, a circuit's reliability tends to reach a stationary state as its logical depth increases.
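As a point of comparison for the N-tuple modular redundancy scheme mentioned above, the textbook reliability of NMR with independent modules and a majority voter is R_NMR = R_v · Σ_{k ≥ (N+1)/2} C(N, k) R^k (1 − R)^(N−k). The sketch below evaluates this standard formula; it is not taken from the article, and it assumes module failures are independent and that the voter has reliability r_voter (1.0 by default).

```python
from math import comb


def nmr_reliability(n, r_module, r_voter=1.0):
    """N-tuple modular redundancy: a majority of n (odd) independent modules must work."""
    if n % 2 == 0:
        raise ValueError("n must be odd for majority voting")
    k_needed = n // 2 + 1
    p_majority = sum(comb(n, k) * r_module**k * (1 - r_module)**(n - k)
                     for k in range(k_needed, n + 1))
    return r_voter * p_majority


for r in (0.9, 0.6, 0.4):
    print(r, nmr_reliability(3, r), nmr_reliability(5, r))
```

For N = 3 this reduces to 3R² − 2R³, which exceeds R only when R > 0.5. Redundancy therefore helps only when individual modules are already more likely to work than to fail.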

Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components in Automata Studies

Article

Jan 1988

John von Neumann

Nano CMOS Circuit and Physical Design

Book

Jan 2005

Ban Wong

Chapter 11 investigates the looming challenges of silicon process variations to robust circuit design. As the increasing variations significantly degrade design quality, it is critical to consider manufacturability and yield factors at the design stage. Design strategies to mitigate their impacts are examined, ranging from specific designs of clock distribution and memory units, to more general approaches of robust analog and digital designs for nano-CMOS technology. Besides design techniques for manufacturability, statistical analysis of circuit performance is also presented, in order to effectively handle process variations and improve system yield.

Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components

Article

Jan 1956

John von Neumann

Nano, quantum and molecular computing: implications to high level design and validation

Book

Jan 2004

Towards Accurate and Efficient Reliability Modeling of Nanoelectronic Circuits

Conference Paper

Jul 2006

The emergence of nanoelectronic devices which rely on fundamentally unreliable physics calls for reliability evaluation techniques and practical design-for-reliability solutions. This paper reviews a method that uses probabilistic gate models (PGMs) for reliability estimation and improves upon this method to enable the accurate evaluation of reliabilities of circuits. When applied to large, complex circuits, however, this and other accurate methods lead to long execution times. To simplify reliability analysis, this paper leverages the fact that many large circuits consist of common logic modules. The overall circuit reliability estimation can be made on the basis of accurate PGM-based reliabilities of individual modules. This technique significantly reduces the PGM method’s complexity, making it suitable for practical design-for-reliability applications. Results from the use of this technique on benchmark circuits indicate that the estimates produced correctly identify the most vulnerable paths through a circuit.
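A probabilistic gate model relates a gate's output probability to its error and input probabilities. Assuming the von Neumann fault model (the output is inverted with probability eps), P(out = 1) = (1 − eps)·P(f = 1) + eps·P(f = 0), where f is the fault-free gate function. The sketch propagates this through a hypothetical tree circuit, out = NAND(NAND(a, b), NOT(c)), treating signals as independent. Without reconvergent fanout that assumption holds, so this single linear pass is exact. With reconvergent fanout the same pass becomes the approximate algorithm. Function names are illustrative.

```python
from itertools import product


def pgm_output(p_fault_free_one, eps):
    """PGM: P(out=1) = (1 - eps) * P(f=1) + eps * P(f=0)."""
    return (1 - eps) * p_fault_free_one + eps * (1 - p_fault_free_one)


def pgm_nand(p1, p2, eps):
    return pgm_output(1 - p1 * p2, eps)  # assumes independent inputs


def pgm_not(p, eps):
    return pgm_output(1 - p, eps)


def tree_circuit_output(a, b, c, eps):
    """P(out = 1) for out = NAND(NAND(a, b), NOT(c)); a, b, c are input probabilities."""
    return pgm_nand(pgm_nand(a, b, eps), pgm_not(c, eps), eps)


def average_reliability(eps):
    """Mean P(out == ideal output) over all 8 input vectors."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        ideal = tree_circuit_output(a, b, c, 0.0)
        p1 = tree_circuit_output(a, b, c, eps)
        total += p1 if ideal > 0.5 else 1 - p1
    return total / 8


print(tree_circuit_output(1, 1, 0, 0.1), average_reliability(0.1))
```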

Probabilistic Error Propagation in Logic Circuits Using the Boolean Difference Calculus

Conference Paper

Nov 2008

A gate-level probabilistic error propagation model is presented that takes as input the Boolean function of the gate, the input signal probabilities, the error probabilities at the gate inputs, and the gate error probability, and generates the error probability at the output of the gate. The model uses the Boolean difference calculus and can be applied efficiently to the problem of calculating the error probability at the primary outputs of a multilevel Boolean circuit, with time complexity linear in the number of gates in the circuit. This is done by starting from the primary inputs and moving toward the primary outputs using a post-order (reverse depth-first search, DFS) traversal. Experimental results demonstrate the accuracy and efficiency of the proposed approach compared with other known methods for error calculation in VLSI circuits.
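The single-gate step of this model can be sketched by enumeration instead of the paper's closed-form Boolean-difference expressions. The sketch assumes independent inputs, input errors that invert the ideal input value, and a gate error that inverts the output. It first computes the probability that input errors propagate to the output. An independent gate flip then gives p_out = p_prop(1 − ε_g) + (1 − p_prop)ε_g. The `boolean_difference` helper shows the single-error propagation condition ∂f/∂x_i = f|x_i=0 ⊕ f|x_i=1.

```python
from itertools import product


def gate_output_error(f, signal_probs, input_errs, gate_err):
    """P(actual output != ideal output) for one gate, assuming independent inputs/errors."""
    n = len(signal_probs)
    p_prop = 0.0
    for ideal in product((0, 1), repeat=n):
        p_ideal = 1.0
        for bit, p1 in zip(ideal, signal_probs):
            p_ideal *= p1 if bit else 1 - p1
        for errs in product((0, 1), repeat=n):
            p_e = 1.0
            for e, pe in zip(errs, input_errs):
                p_e *= pe if e else 1 - pe
            actual = tuple(b ^ e for b, e in zip(ideal, errs))
            if f(*actual) != f(*ideal):
                p_prop += p_ideal * p_e
    # Output is wrong iff exactly one of (propagated error, gate flip) occurs.
    return p_prop * (1 - gate_err) + (1 - p_prop) * gate_err


def boolean_difference(f, i, point):
    """df/dx_i at point: 1 if toggling input i changes f."""
    lo, hi = list(point), list(point)
    lo[i], hi[i] = 0, 1
    return f(*lo) ^ f(*hi)


AND = lambda x, y: x & y
print(gate_output_error(AND, [0.5, 0.5], [0.1, 0.1], 0.05))
```

Applying this gate by gate in topological order gives the linear-time circuit pass. As with other single-pass methods, correlations from reconvergent fanout are ignored at that level.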

Signal probability for reliability evaluation of logic circuits

Article

Aug 2008
MICROELECTRON RELIAB

As integrated circuits scale down to nanometer dimensions, a great reduction in the reliability of combinational blocks is expected. As a result, the susceptibility of circuits to intermittent and transient faults is becoming a key parameter in the evaluation of logic circuits, and fast and accurate methods of reliability analysis must be developed. This paper presents a reliability analysis methodology based on signal probability that is straightforward to apply and can easily be integrated into the design flow. The proposed methodology computes a circuit's signal reliability as a function of its logical masking capabilities, taking into account the occurrence of multiple simultaneous faults.
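The following sketch illustrates signal-probability-based reliability in a simplified form. Each signal carries a 2×2 matrix m[ideal][actual] of joint probabilities, and signal reliability is m[0][0] + m[1][1]. This is not the paper's exact matrix formulation. The example assumes that gate inputs are independent and that each gate inverts its output with probability eps, and all names are hypothetical. Logical masking appears naturally, because an erroneous input only counts when it changes the gate's output.

```python
from itertools import product


def primary_input(p_one):
    """Fault-free primary input: ideal and actual values always agree."""
    return [[1 - p_one, 0.0], [0.0, p_one]]


def gate(f, eps, *inputs):
    """Propagate m[ideal][actual] matrices through gate f (independent inputs assumed)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    n = len(inputs)
    for ideals in product((0, 1), repeat=n):
        for actuals in product((0, 1), repeat=n):
            p = 1.0
            for m, i, a in zip(inputs, ideals, actuals):
                p *= m[i][a]
            if p == 0.0:
                continue
            ideal_out = f(*ideals)
            computed = f(*actuals)
            out[ideal_out][computed] += p * (1 - eps)      # gate works
            out[ideal_out][computed ^ 1] += p * eps        # gate output flips
    return out


def reliability(m):
    return m[0][0] + m[1][1]


a, b = primary_input(0.5), primary_input(0.5)
x = gate(lambda u, v: u & v, 0.1, a, b)
y = gate(lambda u: 1 - u, 0.1, x)
print(reliability(x), reliability(y))
```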

Defect Tolerant Logic Gates for Unreliable Future Nanotechnologies

Conference Paper

Jun 2007

In future nanotechnologies failure densities are predicted to be several orders of magnitude higher than in current CMOS technologies. For such failure densities existing fault tolerance implementations are inadequate. This work presents several principles of building multiple-fault tolerant memory cells and logic gates for circuits affected by high defect densities as well as a first evaluation of the area cost and performance.

Reliable Computer Systems Design and Evaluation

Book

Dec 1998

SCRAP: Sequential circuits reliability analysis program

Article

Aug 2009
MICROELECTRON RELIAB

The recent rapid growth in demand for highly reliable digital circuits has focused attention on tools and techniques that can accurately estimate the reliability of a proposed circuit from the failure rate of the technology used. Reliability analysis has become an integral part of the system design process, especially for systems with life-critical applications such as aircraft and spacecraft flight control. In this paper, we present an algorithm to evaluate the reliability of sequential circuits. This approach, called ‘multiple-pass’, combines gate failure probabilities with the propagated errors to calculate the reliability of every node of the circuit iteratively. The approach is implemented in the SCRAP program, which computes the reliability of a sequential circuit based on its standard cell library; the library can be extended with larger gates such as D flip-flops. The framework is applied to a subset of sequential benchmark circuits, and the results demonstrate the accuracy and speed of the proposed technique.
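The toy model below illustrates why sequential reliability has to be computed iteratively: state errors feed back and accumulate over clock cycles. It is not the SCRAP "multiple-pass" algorithm. It assumes a hypothetical one-bit accumulator s' = s XOR in, where an XOR gate error and a D flip-flop error each flip the next state independently, and an existing state error always propagates through the XOR. Because 1 − 2e' = (1 − 2e)(1 − 2q), the state error after t cycles is (1 − (1 − 2q)^t)/2, which drifts toward 0.5.

```python
def xor_combine(p, q):
    """Probability of an odd number of flips from two independent flip events."""
    return p * (1 - q) + q * (1 - p)


def state_error_over_cycles(eps_gate, eps_ff, cycles, e0=0.0):
    """Per-cycle state error probability of the hypothetical accumulator."""
    history = [e0]
    e = e0
    for _ in range(cycles):
        # Previous state error passes through XOR, then gate and flip-flop errors add flips.
        e = xor_combine(xor_combine(e, eps_gate), eps_ff)
        history.append(e)
    return history


h = state_error_over_cycles(0.01, 0.005, 10)
print([round(e, 4) for e in h])
```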

Reliability Analysis of Logic Circuits

Article

Mar 2009

Reliability of logic circuits is emerging as an important concern in scaled electronic technologies. Reliability analysis of logic circuits is computationally complex because of the exponential number of inputs, combinations, and correlations in gate failures. This paper presents three accurate and scalable algorithms for reliability analysis of logic circuits. The first algorithm, called observability-based reliability analysis, provides a closed-form expression for reliability and is accurate when single gate failures are dominant in a logic circuit. The second algorithm, called single-pass reliability analysis, computes reliability in a single topological walk through the logic circuit. It computes the exact reliability for circuits without reconvergent fan-out, even in the presence of multiple gate failures. The algorithm can also handle circuits with reconvergent fan-out with high accuracy using correlation coefficients as described in this paper. The third algorithm, called maximum-k gate failure reliability analysis, allows a constraint on the maximum number (k) of gates that can fail simultaneously in a logic circuit. Simulation results for several benchmark circuits demonstrate the accuracy, performance, and potential applications of the proposed algorithms.
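As we understand the observability-based algorithm, it estimates the output error probability as δ ≈ ½[1 − Π_i (1 − 2ε_i·o_i)], where o_i is the probability that an error at gate i is observable at the output. Treat that form as an assumption here. The sketch computes each o_i by exhaustive fault injection on a hypothetical three-NAND netlist with uniform inputs and then applies the formula. The estimate is accurate when single gate failures dominate, that is, for small ε.

```python
from itertools import product


def nand(x, y):
    return 1 - (x & y)


# Hypothetical netlist (topological order) and primary inputs.
NETLIST = [("n1", nand, ("a", "b")), ("n2", nand, ("a", "c")), ("out", nand, ("n1", "n2"))]
INPUTS = ("a", "b", "c")


def evaluate(inputs, flip=None):
    """Output value, optionally with the named gate's output inverted."""
    values = dict(inputs)
    for name, fn, args in NETLIST:
        v = fn(*(values[a] for a in args))
        values[name] = v ^ 1 if name == flip else v
    return values["out"]


def observability(gate_name):
    """Fraction of input vectors for which flipping gate_name changes the output."""
    vectors = list(product((0, 1), repeat=len(INPUTS)))
    changed = 0
    for bits in vectors:
        inputs = dict(zip(INPUTS, bits))
        changed += evaluate(inputs) != evaluate(inputs, flip=gate_name)
    return changed / len(vectors)


def observability_error_estimate(eps):
    """Assumed closed form: 0.5 * (1 - prod(1 - 2 * eps_i * o_i))."""
    prod = 1.0
    for name, _, _ in NETLIST:
        prod *= 1 - 2 * eps[name] * observability(name)
    return 0.5 * (1 - prod)


print({g: observability(g) for g, _, _ in NETLIST})
print(observability_error_estimate({"n1": 0.01, "n2": 0.01, "out": 0.01}))
```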

Probabilistic decision diagrams for exact probabilistic analysis

Conference Paper

Dec 2007

Afshin Abdollahi

A decision-diagram-based framework is proposed for representing the probabilistic behavior of circuits with faulty gates. The authors introduce probabilistic decision diagrams (PDDs) as an exact computational tool that, in addition to vast expressive power, has many other useful properties, such as space efficiency (on average) and efficient manipulation algorithms (polynomial in size). An algorithm for constructing the PDD of a circuit is proposed. Useful information about the probabilistic behavior of the circuit (such as the output error probability for an arbitrary input probability distribution) can be extracted directly from the PDD representation. Experimental results demonstrate the effectiveness and applicability of the proposed approach.

Faults, Error Bounds and Reliability of Nanoelectronic Circuits

Conference Paper

Aug 2005

This paper is concerned with faults, error bounds and reliability modeling of nanotechnology-based circuits. First, we briefly review failure mechanisms and fault models in nanoelectronics. Second, reliability functions based on probabilistic models are developed for unreliable logic gates. We then show that fundamental gate error bounds for general probabilistic computation can be derived using the nonlinear mapping functions constructed from the gate models. Finally, an analytical approach is proposed for estimating reliabilities of nanoelectronic circuits. This approach is based on the probabilistic modeling of unreliable logic gates and interconnects. In spite of the approximations used in probabilistic modeling, our study suggests that the proposed approach provides a simple and efficient way to model the reliability of nanoelectronic circuits.

A probabilistic-based design methodology for nanoscale computation

Conference Paper

Dec 2003

As current silicon-based techniques fast approach their practical limits, the investigation of nanoscale electronics, devices and system architectures becomes a central research priority. It is expected that nanoarchitectures will confront devices and interconnections with high inherent defect rates, which motivates the search for new architectural paradigms. In this paper, we propose a probabilistic-based design methodology for designing nanoscale computer architectures based on Markov Random Fields (MRF). The MRF can express arbitrary logic circuits and logic operation is achieved by maximizing the probability of state configurations in the logic network. Maximizing state probability is equivalent to minimizing a form of energy that depends on neighboring nodes in the network. Once we develop a library of elementary logic components, we can link them together to build desired architectures based on the belief propagation algorithm. Belief propagation is a way of organizing the global computation of marginal belief in terms of smaller local computations. We will illustrate the proposed design methodology with some elementary logic examples.
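The core MRF idea, that logic states are the probable low-energy configurations, can be shown on a single inverter clique. The sketch assumes a hypothetical energy U = −1 for valid states (x0 ≠ x1) and 0 otherwise, and the Gibbs distribution P ∝ exp(−U/kT). It computes the conditional probability of the correct output by enumeration. Belief propagation over larger networks is not shown.

```python
import math
from itertools import product


def inverter_energy(x0, x1):
    """Hypothetical clique energy: valid inverter states have lower energy."""
    return -1.0 if x0 != x1 else 0.0


def joint_distribution(energy, kT):
    """Gibbs distribution over the two nodes of the clique."""
    weights = {s: math.exp(-energy(*s) / kT) for s in product((0, 1), repeat=2)}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def p_output_given_input(energy, kT, x_in, x_out):
    joint = joint_distribution(energy, kT)
    return joint[(x_in, x_out)] / (joint[(x_in, 0)] + joint[(x_in, 1)])


for kT in (1.0, 0.5, 0.1):
    print(kT, p_output_given_input(inverter_energy, kT, 0, 1))
```

The correct output becomes more probable as kT (the noise level) falls, which is the sense in which logic operation "maximizes state probability".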

Majority Multiplexing—Economical Redundant Fault-Tolerant Designs for Nanoarchitectures

Article

Aug 2005

Motivated by the need for economical fault-tolerant designs for nanoarchitectures, we explore a novel multiplexing-based redundant design scheme at small (≤100) and very small (≤10) redundancy factors. In particular, we adapt a strategy known as von Neumann multiplexing to circuits of majority gates with three inputs and for the first time exactly analyze the performance of a multiplexing scheme for very small redundancies, using combinatorial arguments. We also develop an extension of von Neumann multiplexing that further improves performance by excluding unnecessary restorative stages in the computation. Our results show that the optimized three-input majority multiplexing (MAJ-3 MUX) outperforms the latest scheme presented in the literature, known as parallel restitution (PAR-REST), by a factor between two and four, for 48≤R≤100. Our scheme performs extremely well at very small redundancies, for which our analysis is the only accurate one. Finally, we determine an upper bound on the maximum tolerable failure probability when any redundancy factor may be used. This bound clearly indicates the advantage of using three-input majority gates in terms of reliable operation.
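For intuition about restoration with three-input majority gates, the classical large-bundle (expected-value) analysis maps the stimulated fraction x to θ(x) = (1 − ε)m(x) + ε(1 − m(x)), with m(x) = 3x² − 2x³. The sketch iterates this map and checks the slope at x = ½, which equals 1.5(1 − 2ε). Restoring fixed points exist only when that slope exceeds 1, i.e. ε < 1/6. This is the textbook asymptotic model, not the paper's exact combinatorial analysis for small redundancy factors.

```python
def maj3_map(x, eps):
    m = 3 * x**2 - 2 * x**3  # P(majority of 3 independent lines is stimulated)
    return (1 - eps) * m + eps * (1 - m)


def iterate(x0, eps, stages):
    x = x0
    for _ in range(stages):
        x = maj3_map(x, eps)
    return x


def restoration_possible(eps):
    # Slope of maj3_map at x = 1/2 is 1.5 * (1 - 2 * eps).
    return 1.5 * (1 - 2 * eps) > 1


print(iterate(0.9, 0.01, 50), iterate(0.9, 0.2, 200))
```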

Markov Chains and Probabilistic Computation—A General Framework for Multiplexed Nanoelectronic Systems

Article

Apr 2005

In emerging nanotechnologies, reliable computation will have to be carried out with unreliable components being integral parts of computing systems. One promising scheme for designing these systems is von Neumann's multiplexing technique. Using bifurcation theory and its associated geometrical representation, we have studied a NAND-multiplexing system recently proposed. The behavior of the system is characterized by the stationary distribution of a Markov chain, which is uni- or bi-modal, when the error probability of NAND gates is larger or smaller than the threshold value, respectively. The two modes and the median of the stationary distribution are the keys to the characterization of the system reliability. Examples of potential future nanochips are used to illustrate how the NAND-multiplexing technique can lead to high system reliability in spite of large gate error probability while keeping the cost of redundancy moderate. In nanoelectronic systems, while permanent defects can be taken care of by reconfiguration, probabilistic computation schemes can incorporate another level of redundancy so that high tolerance of transient errors may be achieved. The Markov chain model is shown to be a powerful tool for the analysis of multiplexed nanoelectronic systems.
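A Markov-chain view of one NAND-multiplexing stage can be sketched as follows. The state is the number k of stimulated lines in a bundle of N. Both inputs of each NAND are assumed to come from the same bundle (fraction x = k/N), so each output is stimulated with probability z = (1 − x²)(1 − ε) + x²ε. As a simplification, output lines are treated as independent, which makes the next state Binomial(N, z). Chaining transition matrices gives the bundle distribution after several stages. Because NAND inverts, an even number of stages restores polarity.

```python
from math import comb


def nand_stage_transition(n, eps):
    """T[k][j] = P(j stimulated outputs | k stimulated inputs), binomial approximation."""
    T = []
    for k in range(n + 1):
        x = k / n
        z = (1 - x * x) * (1 - eps) + x * x * eps
        T.append([comb(n, j) * z**j * (1 - z)**(n - j) for j in range(n + 1)])
    return T


def propagate(dist, T):
    """One Markov step: distribution over k -> distribution over j."""
    n = len(dist) - 1
    return [sum(dist[k] * T[k][j] for k in range(n + 1)) for j in range(n + 1)]


N, EPS = 20, 0.01
T = nand_stage_transition(N, EPS)
dist = [0.0] * N + [1.0]          # start with every line stimulated
for _ in range(2):                # two inverting stages
    dist = propagate(dist, T)
print(sum(dist[18:]))             # probability that at most 2 of 20 lines are wrong
```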

Unveiling the ISCAS-85 benchmarks: A case study in reverse engineering

Article

Feb 1999

Designing at higher levels of abstraction is key to managing the complexity of today's VLSI chips. The authors show how they reverse-engineered the ISCAS-85 benchmarks to add a useful, new high-level tool to the designer's arsenal

Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation

Article

Dec 2005

S. Borkar

As technology scales, variability in transistor performance continues to increase, making transistors less and less reliable. This creates several challenges in building reliable systems, from the unpredictability of delay to increasing leakage current. Finding solutions to these challenges requires a concerted effort on the part of all the players in system design. This article discusses these effects and proposes microarchitecture, circuit, and testing research that focuses on designing with many unreliable components (transistors) to yield reliable system designs.

Trends and challenges in VLSI circuit reliability

Article

Aug 2003

C. Constantinescu

Deep-submicron technology is having a significant impact on permanent, intermittent, and transient classes of faults. Faults experienced by semiconductor devices fall into three main categories: permanent, intermittent, and transient. Permanent faults reflect irreversible physical changes. Intermittent faults occur because of unstable or marginal hardware; they can be activated by environmental changes, like higher or lower temperature and voltage. Transients occur because of temporary environmental conditions. Fault-tolerant solutions are being developed continuously to address the problem.

Expected Value Analysis of Combinational Logic Networks

Article

Jun 1981

The results of an investigation into probabilistic analysis of combinational digital networks are reported herein. Basic gates as well as larger functional building blocks are assigned probability density functions in place of fixed input-output propagation delays. Signal lines interconnecting logic device models carry not 1 or 0 (binary) values, but rather continuous waveforms representing the expected values of these binary signals. Probabilistic models of the basic AND, OR, NOT, etc., gates are presented, as are methods for handling signal dependencies due to reconvergent fanout. Application of the models to reliability analysis is discussed.

Probabilistic Treatment of General Combinational Networks

Article

Jul 1975

In this correspondence two methods are given for calculating the probability that the output of a general combinational network is 1 given the probabilities for each input being 1. We define the notions of the probability of a signal and signal independence. Then several proofs are given to show the relationship between Boolean operations and algebraic operations upon probabilities. As a result of these, two simple algorithms are presented for calculating output probabilities. An example of the usefulness of these results is given with respect to the generation of tests for the purpose of fault detection.
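One well-known way to compute exact output probabilities despite reconvergent fanout is to manipulate probability polynomials symbolically. AND multiplies, OR uses P(a) + P(b) − P(a)P(b), NOT uses 1 − P, and every exponent is suppressed (x^k → x) because a signal ANDed with itself is just that signal. The sketch represents polynomials as dicts from variable sets to coefficients. The example is a hypothetical 2:1 multiplexer, y = ab + a'c, in which input a reconverges. Naive multiplication of probabilities would give the wrong answer here.

```python
from collections import defaultdict


def var(name):
    return {frozenset([name]): 1.0}


def const(c):
    return {frozenset(): c}


def add(p, q, scale=1.0):
    r = defaultdict(float, p)
    for m, c in q.items():
        r[m] += scale * c
    return {m: c for m, c in r.items() if c != 0}


def mul(p, q):
    r = defaultdict(float)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[m1 | m2] += c1 * c2  # set union = exponent suppression, x**k -> x
    return {m: c for m, c in r.items() if c != 0}


def p_not(p):
    return add(const(1.0), p, -1.0)


def p_and(p, q):
    return mul(p, q)


def p_or(p, q):
    return add(add(p, q), mul(p, q), -1.0)


def evaluate(poly, probs):
    total = 0.0
    for monomial, c in poly.items():
        term = c
        for v in monomial:
            term *= probs[v]
        total += term
    return total


a, b, c = var("a"), var("b"), var("c")
y = p_or(p_and(a, b), p_and(p_not(a), c))   # reduces to ab + c - ac
print(evaluate(y, {"a": 0.5, "b": 0.5, "c": 0.5}))
```

Treating the two AND outputs as independent would give 0.4375 for equiprobable inputs instead of the correct 0.5. The symbolic cost, however, can grow exponentially with the number of reconvergent paths.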

The search for alternative computational paradigms. The special issue on "System IC Design Challenges beyond 32 nm"

Jan 2008

NR Shanbhag
S Mitra
G de Veciana
M Orshansky
R Marculescu
J Roychowdhury
D Jones
JM Rabaey

Shanbhag NR, Mitra S, de Veciana G, Orshansky M, Marculescu R, Roychowdhury J, Jones D, Rabaey JM. The search for alternative computational paradigms. The special issue on "System IC Design Challenges beyond 32 nm". IEEE Des Test Comput 2008;25(4).

Nanoprism: a tool for evaluating granularity versus reliability tradeoffs in nano architectures

May 2004

D Bhaduri
S Shukla

Bhaduri D, Shukla S. Nanoprism: a tool for evaluating granularity versus reliability tradeoffs in nano architectures. In: ACM GLSVLSI, Boston, MA, April 2004.

Defects tolerant logic gates for unreliable future nanotechnologies (Lecture Notes in Computer Science)

Jan 2007
Vol. 4507, p. 422–429

L Anghel
M Nicolaidis

Anghel L, Nicolaidis M. Defects tolerant logic gates for unreliable future nanotechnologies. In: Sandoval F et al., editors. Lecture notes in computer science. IWANN 2007 4507; 2007. p. 422–9.

Essentials of electronic testing

Jan 2005

ML Bushnell
VD Agrawal

Bushnell ML, Agrawal VD. Essentials of electronic testing. Boston: Springer; 2005.
