A Comparison of TMR With Alternative Fault-Tolerant Design Techniques for FPGAs

Los Alamos Nat. Lab., Los Alamos
IEEE Transactions on Nuclear Science (Impact Factor: 1.22). 01/2008; DOI: 10.1109/TNS.2007.910871
Source: IEEE Xplore

ABSTRACT With growing interest in the use of SRAM-based FPGAs in space and other radiation environments, there is a greater need for efficient and effective fault-tolerant design techniques specific to FPGAs. Triple-modular redundancy (TMR) is a common fault mitigation technique for FPGAs and has been successfully demonstrated by several organizations. This technique, however, requires significant hardware resources. This paper evaluates three additional mitigation techniques and compares them to TMR. These include quadded logic, state machine encoding, and temporal redundancy, all well-known techniques in custom circuit technologies. Each of these techniques are compared to TMR in both area cost and fault tolerance. The results from this paper suggest that none of these techniques provides greater reliability and often require more resources than TMR.

  • [Show abstract] [Hide abstract]
    ABSTRACT: VLIW architectures are seeing increased deployment in a number of hostile environments. In addition, softcore VLIW architectures, which allow for run-time customization of the VLIW datapath, are becoming viable for a number of safety-critical applications. As error and failure rates rise, these applications elicit a need for automated and resilient architecture configuration tools. To mitigate these issues, this paper presents a Resiliency-aware Scheduling approach to the configuration of a custom VLIW architecture, providing computational resilience via software duplication. The automated RaS tool determines the optimal set of resources needed to provide a given level of resilience for a reconfigurable softcore VLIW architecture. For a sample case study, based on a common physics code kernel, the RaS approach is compared to traditional hardware (TMR) and software (source-level code replication) approaches. Results show a Resiliency-aware Scheduling-generated architecture configuration can potentially require up to 50% fewer functional units when compared to a TMR-hardened machine of similar performance, and can potentially improve performance by up to 40% over source-level software approaches.
    Reconfigurable Computing and FPGAs (ReConFig), 2012 International Conference on; 01/2012
  • [Show abstract] [Hide abstract]
    ABSTRACT: The number of configurable systems deployed in hostile environments continues to rise. This, along with decreasing geometries and lower operating voltages leads to an expected increase in transient errors. This paper presents Resiliency-aware Scheduling, a novel approach to resource allocation for hardening computations on configurable systems. Using modular and replicated functional units called hybrid TMR that exploit a computation's Intrinsic Resiliency, our results show that for designs with similar performance, RaS exhibits a 60% area savings over a traditional TMR configuration with the same operation coverage.
    Field-Programmable Technology (FPT), 2012 International Conference on; 01/2012
  • [Show abstract] [Hide abstract]
    ABSTRACT: Hostile environments, shrinking feature sizes and processor aging elicit a need for resilient computing. Coarse-grained hardware approaches, such as Triple Modular Redundancy (TMR) and Temporal Redundancy (TR), while exhibiting acceptable levels of fault coverage [1], are often wasteful of resources such as time, device/chip area and power. A TMR-hardened computation can exhibit poor performance relative to a non-TMR hardware configuration with similar area. This is because the resources that are used to replicate functional units in parallel (in the case of TMR) can only execute one operation at a time. Conversely, in an equivalent non-TMR configuration, those same resources could execute three different operations concurrently (albeit with no resiliency coverage). In short, TMR is very rigid in its allocation of resources, using them only for resiliency.
    Field Programmable Logic and Applications (FPL), 2012 22nd International Conference on; 01/2012