A Comparison of TMR With Alternative Fault-Tolerant Design Techniques for FPGAs

Los Alamos Nat. Lab., Los Alamos
IEEE Transactions on Nuclear Science (Impact Factor: 1.28). 01/2008; 54(6):2065-2072. DOI: 10.1109/TNS.2007.910871
Source: IEEE Xplore


With growing interest in the use of SRAM-based FPGAs in space and other radiation environments, there is a greater need for efficient and effective fault-tolerant design techniques specific to FPGAs. Triple-modular redundancy (TMR) is a common fault mitigation technique for FPGAs and has been successfully demonstrated by several organizations. This technique, however, requires significant hardware resources. This paper evaluates three additional mitigation techniques and compares them to TMR: quadded logic, state machine encoding, and temporal redundancy, all well-known techniques in custom circuit technologies. Each of these techniques is compared to TMR in both area cost and fault tolerance. The results suggest that none of these techniques provides greater reliability than TMR, and that they often require more resources.
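As a rough illustration of the TMR baseline the abstract compares against, the sketch below models a bitwise two-of-three majority voter in software. It is not taken from the paper: the names `majority_vote`, `tmr_output`, and `increment`, and the `upset_mask` fault-injection parameter, are hypothetical, and a real FPGA design would implement the replicas and voter in hardware.

```python
from typing import Callable


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_output(module: Callable[[int], int], x: int, upset_mask: int = 0) -> int:
    """Evaluate three copies of `module` and vote on their outputs.

    `upset_mask` (an assumption made for illustration) is XORed into the
    first replica's output to emulate bit flips confined to one replica.
    """
    r0 = module(x) ^ upset_mask  # replica with an injected fault
    r1 = module(x)               # fault-free replica
    r2 = module(x)               # fault-free replica
    return majority_vote(r0, r1, r2)


def increment(x: int) -> int:
    """Toy 8-bit module standing in for the protected logic."""
    return (x + 1) & 0xFF


# A fault in a single replica is outvoted by the other two.
masked = tmr_output(increment, 41, upset_mask=0b0000_0100)
```

The voter masks any fault confined to one replica; if the same bit is wrong in two replicas, the vote follows the faulty majority.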

    • "However, most of the SCADA systems described above are nonredundant, which cannot provide high reliability. The redundancy systems are usually developed based on the chip-level processors, such as Field Programmable Gate Array (FPGA) [23] [24] or Single Chip Microcomputer (SCM) [25] [26], but not the system-level processors, such as PLC or PC. The redundancy systems based on chip-level processors are difficult to develop, which require professionals to develop the control hardware and software."
    ABSTRACT: An extremely reliable remote control system for a subsea blowout preventer stack is developed based on an off-the-shelf triple modular redundancy system. To meet a high reliability requirement, various redundancy techniques such as controller redundancy, bus redundancy and network redundancy are used to design the system hardware architecture. The control logic, human-machine interface graphical design and redundant databases are developed using off-the-shelf software. A series of experiments was performed in the laboratory to test the subsea blowout preventer stack control system. The results showed that the tested subsea blowout preventer functions could be executed successfully. For faults in the programmable logic controllers, discrete input groups and analog input groups, the control system could give correct alarms in the human-machine interface.
    Full-text · Article · Sep 2011 · ISA Transactions
    • "The technique is widely used in mission-critical applications for fault detection as well as fault masking. As mentioned previously, TMR [13], [14] configuration involves three replicas of the design which are running at a time and the outputs are compared by a voting element. The majority voted output is passed through and becomes the actual output of the system."
    ABSTRACT: We employ output-discrepancy consensus to mitigate faulty modules of a Triple Modular Redundant (TMR) arrangement using dynamic partial reconfiguration. Traditionally, the fault-handling resilience of a TMR arrangement is limited to fault(s) in a single TMR instance over the entire mission duration. An additional permanent fault in either of the two other TMR instances results in mission failure. However, in this work, a novel Self-Configuring approach for Discrepancy Resolution (SCDR) is developed and assessed. In SCDR, the occurrence of faults in more than one module initiates the repair mechanism; then, upon fault recovery, the system is configured into Concurrent Error Detection (CED) mode. The approach is validated by the complete recovery of a TMR realization of a 25-stage Finite Impulse Response (FIR) filter implemented on a reconfigurable platform as a case study. The results show that a self-healing circuit can be realized by exploiting the dynamic partial reconfiguration capability of FPGAs while requiring a streamlined operational data path compared to TMR.
    Preview · Conference Paper · Jan 2011
    • "Also, the TMRed design has more than three times the LUTs of the original design with the addition of the voter circuits used to mask out the erroneous output values of the triplicate circuits. Taking into account the experiments in [5] [6] [7] [8] [9] and our experiments, we estimate that the overhead in LUTs by TMR may be from 3.2 times to 3.9 times."
    ABSTRACT: This paper describes the design and implementation of a radiation tolerant on-board computer (OBC) for the Science and Technology Satellite-3 (STSAT-3). SRAM-based FPGAs are replacing traditional integrated circuits for space applications. However, it is difficult to employ the approach in space applications without radiation tolerant schemes to deal with radiation effects such as single event upset (SEU). To mitigate the SEU effect, we apply a triple modular redundancy (TMR) scheme to the FPGA-based STSAT-3 OBC. Although there is an overhead in area, power and minimum clock period, we notice through a radiation test in an irradiation facility that our TMR based OBC is immune to the radiation environments up to a proton energy of 20.3 MeV. The radiation environment of the test is expected to be more severe than the environment in which STSAT-3 is to be located.
    Full-text · Conference Paper · Jul 2010