A Comparison of TMR With Alternative Fault-Tolerant Design Techniques for FPGAs

Los Alamos Nat. Lab., Los Alamos
IEEE Transactions on Nuclear Science (Impact Factor: 1.46). 01/2008; DOI: 10.1109/TNS.2007.910871
Source: IEEE Xplore

ABSTRACT With growing interest in the use of SRAM-based FPGAs in space and other radiation environments, there is a greater need for efficient and effective fault-tolerant design techniques specific to FPGAs. Triple-modular redundancy (TMR) is a common fault mitigation technique for FPGAs and has been successfully demonstrated by several organizations. This technique, however, requires significant hardware resources. This paper evaluates three additional mitigation techniques and compares them to TMR: quadded logic, state machine encoding, and temporal redundancy, all well-known techniques in custom circuit technologies. Each of these techniques is compared to TMR in both area cost and fault tolerance. The results from this paper suggest that none of these techniques provides greater reliability than TMR, and that they often require more resources.

  • ABSTRACT: The number of configurable systems deployed in hostile environments continues to rise. This, along with decreasing geometries and lower operating voltages, leads to an expected increase in transient errors. This paper presents Resiliency-aware Scheduling (RaS), a novel approach to resource allocation for hardening computations on configurable systems. RaS uses modular and replicated functional units, called hybrid TMR, that exploit a computation's Intrinsic Resiliency. Our results show that, for designs with similar performance, RaS achieves a 60% area savings over a traditional TMR configuration with the same operation coverage.
    Field-Programmable Technology (FPT), 2012 International Conference on; 01/2012
  • ABSTRACT: VLIW architectures are seeing increased deployment in a number of hostile environments. In addition, softcore VLIW architectures, which allow for run-time customization of the VLIW datapath, are becoming viable for a number of safety-critical applications. As error and failure rates rise, these applications elicit a need for automated and resilient architecture configuration tools. To mitigate these issues, this paper presents a Resiliency-aware Scheduling (RaS) approach to the configuration of a custom VLIW architecture, providing computational resilience via software duplication. The automated RaS tool determines the optimal set of resources needed to provide a given level of resilience for a reconfigurable softcore VLIW architecture. For a sample case study, based on a common physics code kernel, the RaS approach is compared to traditional hardware (TMR) and software (source-level code replication) approaches. Results show that a RaS-generated architecture configuration can potentially require up to 50% fewer functional units than a TMR-hardened machine of similar performance, and can potentially improve performance by up to 40% over source-level software approaches.
    Reconfigurable Computing and FPGAs (ReConFig), 2012 International Conference on; 01/2012
  • ABSTRACT: SRAM-based field programmable gate arrays (FPGAs) are particularly sensitive to single event upsets (SEUs) caused by high-energy space radiation. In order to successfully deploy SRAM-FPGA-based designs in aerospace applications, designers need to adopt suitable hardening techniques. In this paper, we describe novel hybrid time and hardware redundancy (HT&HR) structures to mitigate SEU effects on FPGAs, especially in digital circuits that are designed with bidirectional ports. The proposed structures, which combine time and hardware redundancy, decrease the SEU propagation mechanisms among the redundant hard units. Analysis results and fault injection experiments on some standard ISCAS benchmarks and on the MicroLAN protocol, as a case study of bidirectional ports, show that the capability of tolerating SEU effects with the HT&HR technique increases up to 70 times with respect to solely hardware-redundant versions. On average, the proposed method provides a 39.2 times improvement against single upset faults and 14.9 times for double upset faults; however, it imposes about 14.7% area overhead. Also, for the considered benchmarks, HT&HR circuits are on average 8.8% faster than their TMR versions.
    Microelectronics Journal 07/2014; 45(7):870–879. · 0.91 Impact Factor