Software implemented transient fault detection in space computer
ABSTRACT Computer systems operating in the space environment are subject to various radiation phenomena, whose effects are often called "soft errors". These systems generally rely on hardware techniques to address soft errors; however, software techniques can provide a lower-cost and more flexible alternative. This paper presents a novel, software-only transient-fault-detection technique based on a new control-flow checking scheme combined with software redundancy. The distinctive advantage of our approach over other fault-tolerance techniques is lower performance overhead together with higher fault coverage. It copes with transient faults affecting both data and program control flow. By applying the proposed technique to several benchmark applications, we evaluate its error-detection capabilities by means of several fault-injection campaigns. Experimental results show that the proposed approach detects more than 98% of the injected bit-flip faults with a mean execution-time increase of 153%.
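As a rough illustration of how control-flow checking and software redundancy can be combined, the following Python sketch tracks the last executed block against a table of legal predecessors and runs each data computation twice. It is not the scheme proposed in the paper, whose details are not given in the abstract; the block names, the `LEGAL_PREDECESSORS` table, and every function and class name here are assumptions made for illustration.

```python
# Hypothetical sketch: NOT the paper's scheme. Block names, the predecessor
# table, and all identifiers are assumptions made for illustration.

class FaultDetected(Exception):
    """Raised when a check finds a control-flow or data mismatch."""


# Assumed control-flow graph: block -> set of legal predecessor blocks.
LEGAL_PREDECESSORS = {
    "entry": {None},
    "loop": {"entry", "loop"},
    "exit": {"entry", "loop"},
}


class ControlFlowChecker:
    def __init__(self):
        self.last_block = None  # runtime signature: the block executed last

    def enter(self, block):
        # Arriving from a block that is not a legal predecessor indicates
        # a control-flow error.
        if self.last_block not in LEGAL_PREDECESSORS[block]:
            raise FaultDetected(
                f"illegal transition {self.last_block} -> {block}"
            )
        self.last_block = block


def checked(compute, *args):
    """Software redundancy: run the computation twice and compare results."""
    first = compute(*args)
    second = compute(*args)
    if first != second:
        raise FaultDetected("redundant results differ")
    return first


def sum_of_squares(values, checker):
    checker.enter("entry")
    total = 0
    for v in values:
        checker.enter("loop")
        total = checked(lambda acc, x: acc + x * x, total, v)
    checker.enter("exit")
    return total
```

Calling `sum_of_squares([1, 2, 3], ControlFlowChecker())` returns 14, while calling `enter("exit")` on a fresh checker raises `FaultDetected`, because no block has run yet. A real implementation would insert such checks at the instruction or compiler level rather than in application code.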
- Source available from: illinois.edu
ABSTRACT: This paper evaluates the concurrent error detection capabilities of system-level checks, using fault and error injection. The checks comprise application and system level mechanisms to detect control flow errors. We propose Enhanced Control-Flow Checking Using Assertions (ECCA). In ECCA, branch-free intervals (BFI) in a given high or intermediate level program are identified and the entry and exit points of the intervals are determined. BFIs are then grouped into blocks, the size of which is determined through a performance/overhead analysis. The blocks are then fortified with preinserted assertions. For the high level ECCA, we describe an implementation of ECCA through a preprocessor that will automatically insert the necessary assertions into the program. Then, we describe the intermediate implementation possible through modifications made on gcc to make it ECCA capable. The fault detection capabilities of the checks are evaluated both analytically and experimentally. Fault injection experiments are conducted using FERRARI to determine the fault coverage of the proposed techniques. IEEE Transactions on Parallel and Distributed Systems 07/1999; · 1.80 Impact Factor
- ABSTRACT: Increasing design complexity for current and future generations of microelectronic technologies leads to an increased sensitivity to transient bit-flip errors. These errors can cause unpredictable behaviors and corrupt data integrity and system availability. This work proposes new solutions to detect all classes of faults, including those that escape conventional software detection mechanisms, allowing full protection against transient bit-flip errors. The proposed solutions, particularly well suited for low-cost safety-critical microprocessor-based applications, have been validated through exhaustive fault injection experiments performed on a set of real and synthetic benchmark programs. The fault model taken into consideration was single bit-flip errors corrupting memory cells accessible to the user by means of the processor instruction set. The obtained results demonstrate the effectiveness of the proposed solutions. IEEE Transactions on Nuclear Science 01/2005; · 1.22 Impact Factor
- ABSTRACT: This paper proposes a pure software technique, "error detection by duplicated instructions" (EDDI), for detecting errors during usual system operation. Compared to other error-detection techniques that use hardware redundancy, EDDI does not require any hardware modifications to add error detection capability to the original system. EDDI duplicates instructions during compilation and uses different registers and variables for the new instructions. For faults in the code segment of memory in particular, formulas are derived to estimate the error-detection coverage of EDDI using probabilistic methods. These formulas use statistics of the program, which are collected during compilation. EDDI was applied to eight benchmark programs and the error-detection coverage was estimated. Then, the estimates were verified by simulation, in which a fault injector forced a bit-flip in the code segment of executable machine codes. The simulation results validated the estimated fault coverage and show that approximately 1.5% of injected faults produced incorrect results in eight benchmark programs with EDDI, while on average, 20% of injected faults produced undetected incorrect results in the programs without EDDI. Based on the theoretical estimates and actual fault-injection experiments, EDDI can provide over 98% fault coverage without any extra hardware for error detection. This pure software technique is especially useful when designers cannot change the hardware, but they need dependability in the computer system. To reduce the performance overhead, EDDI schedules the instructions that are added for detecting errors such that "instruction-level parallelism" (ILP) is maximized. Performance overhead can be reduced by increasing ILP within a single super-scalar processor. The execution time overhead in a 4-way super-scalar processor is less than the execution time overhead in processors that can issue two instructions in one cycle. IEEE Transactions on Reliability 04/2002; · 2.29 Impact Factor
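
The duplication idea described in the EDDI abstract above can be sketched at a much higher level than the compiler pass the paper describes. The Python example below keeps a master and a shadow accumulator, updates both with duplicated statements, and compares them before returning. It is only an illustration under stated assumptions: EDDI itself operates on compiled instructions and registers, and the function names, the `fault_at` and `fault_bit` parameters, and the placement of the comparison are all hypothetical.

```python
# Hypothetical sketch of the duplication idea. EDDI itself works on compiled
# instructions and registers, not on Python objects; all names are assumed.

class MismatchDetected(Exception):
    """Raised when the master and shadow copies disagree."""


def flip_bit(value, bit):
    """Model a transient fault by inverting one bit of an integer."""
    return value ^ (1 << bit)


def dot_product(xs, ys, fault_at=None, fault_bit=0):
    """Compute sum(x * y) with a master and a shadow accumulator.

    fault_at (assumed parameter) is the iteration index at which a single
    bit-flip is injected into the master accumulator only.
    """
    acc = 0         # master copy
    acc_shadow = 0  # shadow copy, updated by the duplicated statement
    for i, (x, y) in enumerate(zip(xs, ys)):
        acc += x * y
        acc_shadow += x * y  # duplicated statement on separate storage
        if fault_at == i:
            acc = flip_bit(acc, fault_bit)  # injected transient fault
    # Compare the two copies before the value leaves the protected region.
    if acc != acc_shadow:
        raise MismatchDetected(f"master={acc} shadow={acc_shadow}")
    return acc
```

Without an injected fault, `dot_product([1, 2, 3], [4, 5, 6])` returns 32. With `fault_at=1`, the flipped bit makes the two accumulators diverge and the final comparison raises `MismatchDetected`.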