A Comparison of TMR With Alternative Fault-Tolerant Design Techniques for FPGAs

Los Alamos Nat. Lab., Los Alamos
IEEE Transactions on Nuclear Science (Impact Factor: 1.22). 01/2008; DOI: 10.1109/TNS.2007.910871
Source: IEEE Xplore

ABSTRACT: With growing interest in the use of SRAM-based FPGAs in space and other radiation environments, there is a greater need for efficient and effective fault-tolerant design techniques specific to FPGAs. Triple-modular redundancy (TMR) is a common fault mitigation technique for FPGAs and has been successfully demonstrated by several organizations. This technique, however, requires significant hardware resources. This paper evaluates three additional mitigation techniques and compares them to TMR. These include quadded logic, state machine encoding, and temporal redundancy, all well-known techniques in custom circuit technologies. Each of these techniques is compared to TMR in both area cost and fault tolerance. The results from this paper suggest that none of these techniques provides greater reliability than TMR, and that they often require more resources.
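
As a reader's aid, the voting step at the core of TMR can be sketched in a few lines. This is only a software illustration of bitwise 2-of-3 majority voting, not the paper's FPGA implementation; the names `majority_vote` and `flip_bit` and the 8-bit example word are assumptions made for this sketch.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs (hypothetical helper)."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single-bit upset by inverting one bit of a word (hypothetical helper)."""
    return word ^ (1 << bit)


# Three replicas compute the same value; one of them suffers an upset.
golden = 0b10110010           # assumed fault-free output
upset = flip_bit(golden, 3)   # one replica with bit 3 inverted

# The voter masks the single faulty replica.
voted = majority_vote(golden, upset, golden)

# If two replicas are upset in the same bit, the voter propagates the error.
double_fault = majority_vote(upset, upset, golden)
```

The sketch also makes the area cost visible: three copies of the logic plus a voter are needed to produce one result.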

  • ABSTRACT: Hostile environments, shrinking feature sizes and processor aging elicit a need for resilient computing. Coarse-grained hardware approaches, such as Triple Modular Redundancy (TMR) and Temporal Redundancy (TR), while exhibiting acceptable levels of fault coverage [1], are often wasteful of resources such as time, device/chip area and power. A TMR-hardened computation can exhibit poor performance relative to a non-TMR hardware configuration with similar area. This is because the resources that are used to replicate functional units in parallel (in the case of TMR) can only execute one operation at a time. Conversely, in an equivalent non-TMR configuration, those same resources could execute three different operations concurrently (albeit with no resiliency coverage). In short, TMR is very rigid in its allocation of resources, using them only for resiliency.
    Field Programmable Logic and Applications (FPL), 2012 22nd International Conference on; 01/2012
  • ABSTRACT: Nowadays the reliability issues of SRAM-based Field Programmable Gate Arrays (FPGAs) operating in harsh environments are well understood. One major effect is Single Event Upsets (SEUs), which are able to invert the stored logical value in flip-flops and memory cells. This issue is more serious when the affected memory cells are part of the configuration memory used for programming the circuit functionality. The consequences may be alterations of the circuit functionality, causing errors which may only be corrected by reprogramming the device. For a better understanding of the robustness of programmed circuits, this paper compares two decoders for Error Correction Codes (ECCs). A Hamming Decoder and a One-Step Majority Logic Decoder (OS-MLD) for the Difference-Set Cyclic Codes (DSCC) are analyzed, yielding unexpected results for their SEU susceptibility that are of interest to application designers.
    IEEE Transactions on Nuclear Science 06/2012; · 1.22 Impact Factor
  • ABSTRACT: VLIW architectures are seeing increased deployment in a number of hostile environments. In addition, softcore VLIW architectures, which allow for run-time customization of the VLIW datapath, are becoming viable for a number of safety-critical applications. As error and failure rates rise, these applications elicit a need for automated and resilient architecture configuration tools. To mitigate these issues, this paper presents a Resiliency-aware Scheduling (RaS) approach to the configuration of a custom VLIW architecture, providing computational resilience via software duplication. The automated RaS tool determines the optimal set of resources needed to provide a given level of resilience for a reconfigurable softcore VLIW architecture. For a sample case study, based on a common physics code kernel, the RaS approach is compared to traditional hardware (TMR) and software (source-level code replication) approaches. Results show that a RaS-generated architecture configuration can potentially require up to 50% fewer functional units when compared to a TMR-hardened machine of similar performance, and can potentially improve performance by up to 40% over source-level software approaches.
    Reconfigurable Computing and FPGAs (ReConFig), 2012 International Conference on; 01/2012