An Analytical Model for Reliability Evaluation of NoC Architectures
ABSTRACT: This paper proposes an analytical model to assess the Reliability Factor of an NoC-based System-on-Chip design. The Reliability Factor is the probability that faults in the NoC infrastructure can be recovered without any effect on system functionality. The proposed method classifies the switch faults of an NoC according to their impact on system functionality. Based on this classification, the contribution of each transient fault to lowering the reliability of the NoC is calculated. This model can be used to decide which fault-tolerant techniques yield the greatest improvement in system reliability.
Source available from: Hamed Sajjadikia
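The abstract does not give the model's equations. As a rough illustration only, the following sketch assumes that each switch fault class has a probability of occurrence and a probability of being recovered without functional impact, so the Reliability Factor (RF) is their weighted sum and each class's share of the RF deficit can be used to compare candidate fault-tolerant techniques. The class names, technique names (`buffer_ecc`, `arbiter_tmr`), helper functions, and all numbers are hypothetical.

```python
"""Hypothetical sketch of a Reliability Factor (RF) calculation.

Assumption (not the paper's published formulation): a transient switch
fault falls into class c with probability p_c and is recovered without
functional impact with probability r_c. Then RF = sum_c p_c * r_c, and
class c lowers RF below 1 by p_c * (1 - r_c).
Class names, technique names and numbers are illustrative only.
"""
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FaultClass:
    name: str
    occurrence: float  # P(transient fault falls in this class)
    recovery: float    # P(fault is recovered with no functional effect)


def reliability_factor(classes):
    """Probability that a transient switch fault is recovered harmlessly."""
    if abs(sum(c.occurrence for c in classes) - 1.0) > 1e-9:
        raise ValueError("occurrence probabilities must sum to 1")
    return sum(c.occurrence * c.recovery for c in classes)


def contributions(classes):
    """How much each fault class lowers RF below 1 (values sum to 1 - RF)."""
    return {c.name: c.occurrence * (1.0 - c.recovery) for c in classes}


def best_technique(classes, techniques):
    """Pick the technique (target class -> improved recovery) maximizing RF."""
    def rf_with(target, new_recovery):
        return reliability_factor([
            replace(c, recovery=new_recovery) if c.name == target else c
            for c in classes
        ])
    scores = {tech: rf_with(*change) for tech, change in techniques.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    switch = [
        FaultClass("routing_logic", 0.2, 0.5),
        FaultClass("buffers", 0.5, 0.9),
        FaultClass("arbiter", 0.3, 0.7),
    ]
    print(reliability_factor(switch))
    print(contributions(switch))
    techniques = {
        "buffer_ecc": ("buffers", 0.99),   # hypothetical buffer protection
        "arbiter_tmr": ("arbiter", 0.95),  # hypothetical arbiter hardening
    }
    print(best_technique(switch, techniques))
```

With these made-up numbers RF is about 0.76, and hardening the arbiter (RF about 0.835) beats protecting the buffers (about 0.805), because the arbiter class accounts for more of the RF deficit (0.09 versus 0.05).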
Conference Paper: Energy and Reliability Oriented Mapping for Regular Networks-on-Chip. ACM/IEEE International Symposium on Networks-on-Chip; 01/2011
ABSTRACT: Reliability is a growing fundamental challenge in the design of multiprocessor Systems-on-Chip (MPSoCs). This trend is accelerated by increasingly adverse process variations and wear-out mechanisms that result in a growing number of errors. Previously proposed fault-tolerant techniques are ad hoc and target processors or Networks-on-Chip (NoCs) separately. Because either of these two units may become a reliability bottleneck for NoC-based multiprocessor SoCs, it is imperative that the reliability of SoCs be evaluated and addressed in a unified manner, as a combination of communication and computational units. Using this holistic approach, we propose a new architecture-level unified reliability evaluation methodology for MPSoCs. At the core of the reliability estimation engine lies a Monte Carlo algorithm that works with failure times for time-dependent dielectric breakdown (TDDB) and negative bias temperature instability (NBTI), modeled as Weibull distributions. To demonstrate its usefulness, we use the proposed methodology to explore the impact of NoC router layout on the failure time of the system running the same set of benchmarks. In addition, we investigate the failure time of the system when the NoC, as the communication unit of the MPSoC, is taken into consideration or, as in previous work, ignored. Our simulation framework can help architecture designers identify architectural characteristics and develop design techniques meant to improve the system's lifetime.
Green Computing Conference (IGCC), 2012 International; 01/2012
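The Monte Carlo idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' engine: components fail independently, each at the earlier of its TDDB and NBTI Weibull failure times, and the system fails at its first component failure (a series system). The helper names and all Weibull parameters are hypothetical; `random.Random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
"""Minimal Monte Carlo sketch of MPSoC lifetime with Weibull failure times.

Assumptions (illustrative, not the paper's engine):
- each component (core or NoC router) has independent TDDB and NBTI
  failure times, each Weibull(scale, shape); it fails at the earlier one;
- the system is a series system: it fails when its first component fails;
- all (scale, shape) values are hypothetical, with scale in years.
"""
import random
import statistics


def component_failure_time(rng, mechanisms):
    """Earliest failure among a component's wear-out mechanisms."""
    # weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    return min(rng.weibullvariate(scale, shape) for scale, shape in mechanisms)


def estimate_mttf(components, trials=20_000, seed=1):
    """Monte Carlo estimate of the system's mean time to failure."""
    rng = random.Random(seed)
    lifetimes = [
        min(component_failure_time(rng, mechs) for mechs in components)
        for _ in range(trials)
    ]
    return statistics.mean(lifetimes)


if __name__ == "__main__":
    core = [(10.0, 2.0), (12.0, 1.5)]    # hypothetical (scale, shape): TDDB, NBTI
    router = [(15.0, 2.0), (18.0, 1.5)]  # hypothetical router parameters
    cores_only = [core] * 4
    cores_and_noc = [core] * 4 + [router] * 4
    print(f"cores only : {estimate_mttf(cores_only):.2f} years")
    print(f"cores + NoC: {estimate_mttf(cores_and_noc):.2f} years")
```

Because adding routers to a series system only adds ways for it to fail, leaving the NoC out of the model tends to overestimate lifetime; this mirrors the with/without-NoC comparison described in the abstract.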
ABSTRACT: With the increasing threat of soft-error-induced bit upsets, the Network on Chip (NoC), as the communication infrastructure of many-core systems, has proven to be a reliability bottleneck in fault-tolerant parallel systems. The widely used Architecture Vulnerability Factor (AVF) metric measures the architecture-level impact of soft errors, helping designers balance the design cost of fault-tolerant schemes against reliability. As a complement to existing estimation methods for standard IP such as processors and caches, this work presents an accelerated fault injection methodology for fine-grain AVF assessment in NoCs, with two components: (1) modeling the complex fault patterns of both Multi-Cell Upsets (MCUs) and Single Bit Upsets (SBUs) in the standard Fault Injection (FI) method; (2) accelerating the estimation by classifying and exploiting fine-grain metrics according to different error impacts. Comprehensive simulation results across diverse configurations (e.g., varying fault model, benchmark, traffic load, network size and fault list size) demonstrate that the proposed approach (i) reduces the estimation inaccuracy caused by MCU patterns, which on average amounts to an 18.89% underestimation in the no-protection case and an 88.92% overestimation under ECC (Error Correction Coding) protection; (ii) achieves about a 5× speedup without loss of estimation accuracy via phased pre-analysis based on fine-grain classification; (iii) confirms that ECC is a cost-effective mechanism for protecting NoC routers, reducing soft errors by about 50% relative to the no-protection case with less than 2% area overhead.
Microelectronics Reliability 07/2014; 54(11). DOI:10.1016/j.microrel.2014.06.008 · 1.21 Impact Factor
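To make the AVF idea concrete, here is a toy fault-injection sketch for a single router buffer. It is not the paper's framework; it assumes that an upset in an empty buffer word is masked, that an SBU flips one bit and an MCU flips two bits of the same word, and that ECC is single-error-correcting. AVF is estimated as the fraction of injections that end in the `failure` class, and the per-outcome counts stand in for a fine-grain classification. All names and parameters are hypothetical.

```python
"""Hypothetical fault-injection sketch for estimating a router buffer's AVF.

Assumptions (illustrative, not the paper's framework):
- the buffer holds len(occupied) words; an upset in an empty word is masked;
- an SBU flips 1 bit of one word, an MCU flips 2 bits of the same word;
- ECC, if enabled, is single-error-correcting: it fixes SBUs but not MCUs.
AVF is estimated as (# injections ending in "failure") / (# injections).
"""
import random
from collections import Counter


def classify(rng, occupied, mcu_prob, ecc):
    """Inject one random upset and return its outcome class."""
    entry = rng.randrange(len(occupied))
    flipped_bits = 2 if rng.random() < mcu_prob else 1
    if not occupied[entry]:
        return "masked"      # no live data in the hit word
    if ecc and flipped_bits == 1:
        return "corrected"   # single-error-correcting code repairs the bit
    return "failure"         # corrupted live data leaves the buffer


def estimate_avf(occupied, injections, mcu_prob, ecc, seed=0):
    """Return (AVF estimate, Counter of outcome classes)."""
    rng = random.Random(seed)
    outcomes = Counter(
        classify(rng, occupied, mcu_prob, ecc) for _ in range(injections)
    )
    return outcomes["failure"] / injections, outcomes


if __name__ == "__main__":
    occupied = [i % 5 < 3 for i in range(20)]  # 12 of 20 words hold live flits
    for ecc in (False, True):
        for mcu_prob in (0.0, 0.3):
            avf, _ = estimate_avf(occupied, 50_000, mcu_prob, ecc)
            print(f"ecc={ecc} mcu_prob={mcu_prob}: AVF ~ {avf:.3f}")
```

In this toy model an SBU-only fault model (`mcu_prob=0.0`) reports zero AVF once ECC is enabled, which illustrates why modeling MCU patterns matters when assessing protected routers.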