Conference Paper

An Analytical Model for Reliability Evaluation of NoC Architectures

Univ. of Tehran, Tehran
DOI: 10.1109/IOLTS.2007.13 Conference: 13th IEEE International On-Line Testing Symposium (IOLTS 2007), 8-11 July 2007, Heraklion, Crete, Greece
Source: DBLP


This paper proposes an analytical model to assess the Reliability Factor of an NoC-based System-on-Chip design. The Reliability Factor is the probability that faults in the NoC infrastructure can be recovered from without any effect on system functionality. The proposed method classifies the switch faults of an NoC according to their impact on system functionality. Based on this classification, the contribution of each transient fault to lowering the reliability of the NoC is calculated. The model can be used to decide which fault-tolerant techniques yield the greatest improvement in system reliability.

    • "Significant work has been carried out to estimate the reliability of either single- and multi-processors [6] [7] [8] [9] [10] or of computer networks [11]. Reliability of NoCs has only recently been studied [12] [13] [14]. High-level metrics for reliable systems (e.g., reliability, availability, data integrity, mean time to failure (MTTF), mean time to repair (MTTR), architectural vulnerability factor (AVF), failures in time (FIT), FIT for reference circuit (FORC), etc.) have been used for quantifying the benefits of reliable systems."
    ABSTRACT: We present a new architecture-level unified reliability evaluation methodology for chip multiprocessors (CMPs). The proposed reliability estimation (REST) is based on a Monte Carlo algorithm. What distinguishes REST from previous work is that both the computational and communication components are considered in a unified manner to compute the reliability of the CMP. We utilize the REST tool to develop a new dynamic reliability management (DRM) scheme to address time-dependent dielectric breakdown and negative-bias temperature instability aging mechanisms in network-on-chip (NoC) based CMPs. Designed as a control loop, the proposed DRM scheme uses an effective neural-network-based reliability estimation module. The neural-network predictor is trained using the REST tool. We investigate how the system’s lifetime changes depending on whether the NoC, as the communication unit of the CMP, is considered during the reliability evaluation process, and find that differences can be as high as 60%. Full-system simulations using a customized GEM5 simulator show that reliability can be improved by up to 52% using the proposed DRM scheme in a best-effort scenario with a 2-9% performance penalty (using a user-set target lifetime of seven years) over the case when no DRM is employed.
    Microprocessors and Microsystems 01/2013; 38(1). DOI:10.1016/j.micpro.2013.11.009 · 0.43 Impact Factor
    • "Applying the analytical method usually accompanies with some assumption which can affect on result. In [8], an analytical model of a NoC architecture has been presented with the assumption of not being the faulty and destination router at same columns in a mesh topology. "
    ABSTRACT: This paper presents a comparison of transient fault effects in an asynchronous NoC router and a synchronous one. The experiment uses a simulation-based fault injection method to assess the fault-tolerant behavior of both architectures, employing a fault injector signal (FIS) in both the asynchronous and the synchronous design. Different fault models such as crosstalk, SEU, and SET have been applied to both architectures to evaluate their robustness. A glitch fault model has also been injected into the asynchronous scheme. The experimental results are examined from several aspects to estimate the NoC router’s robustness. Although asynchronous designs seem inherently fault-tolerant because they use handshaking signals, up to 55% of the injected faults result in failure, and about 44% of injected faults are replaced by new values before turning into errors. Less than 1% of injected faults are treated as latent errors. Moreover, the failure rate of token generation is higher than that of token consumption. Furthermore, experiments show that the asynchronous NoC router is more robust than the synchronous one because it prevents fault propagation.
    Journal of Systems Architecture 01/2011; 57(1-57):61-68. DOI:10.1016/j.sysarc.2010.10.003 · 0.44 Impact Factor
    • "In the context of reliable computing systems, high-level reliability metrics have been proposed including reliability function R(t), mean time to failure (MTTF), mean time to repair (MTTR), architectural vulnerability factor (AVF), availability, data integrity, etc. The reliability of NoCs has been evaluated using various metrics including fraction or probability of correctly delivered packets [21] [17] [32], percentage of lost packets or undetected errors [22], reliability factor [26], number of corrected errors [28] [29], minimum edge cutset [32], probability of correct operation [34], and path reliability [35]. "

    ACM/IEEE International Symposium on Networks-on-Chip; 01/2011