Microprocessor sensitivity to failures: control vs. execution and combinational vs. sequential logic
ABSTRACT: The goal of this study is to characterize the impact of soft errors on embedded processors. We focus on control versus speculation logic on one hand, and combinational versus sequential logic on the other. The target system is a gate-level implementation of a DLX-like processor. The synthesized design is simulated, and transients are injected to stress the processor while it is executing selected applications. Analysis of the collected data shows that the fault sensitivity of the combinational logic (4.2% for a fault duration of one clock cycle) is not negligible, even though it is smaller than the fault sensitivity of flip-flops (10.4%). Detailed study of the error impact, measured at the application level, reveals that errors in speculation and control blocks collectively contribute to about 34% of crashes, 34% of fail-silent violations and 69% of incomplete application executions. These figures indicate the increasing need for processor-level detection techniques over generic methods, such as ECC and parity, to prevent such errors from propagating beyond the processor boundaries.
Conference Paper: Hybrid residue generators for increased efficiency
ABSTRACT: In order for residue checking to effectively protect computer arithmetic, designers must be able to efficiently compute the residues of the input and output signals of functional units. Low-cost, single-cycle residue generators can be readily formed out of two's complement adders in two ways, which have area and delay tradeoffs. A residue generator using adder-incrementers for end-around-carry adders is small but slow, and a design using carry-select adders is fast, but large. It is shown that a hybrid combination of both approaches is more efficient than either.
Signals, Systems and Computers (ASILOMAR), 2011 Conference Record of the Forty Fifth Asilomar Conference on; 01/2011
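As a software illustration of the residue checking described in this abstract, the sketch below computes a residue modulo 2^k - 1 by summing k-bit chunks with end-around carry and uses it to check an addition. It models the arithmetic only, not the adder-incrementer, carry-select, or hybrid hardware designs the paper compares; the function names and the choice of modulus are assumptions made for illustration.

```python
def residue_mod_2k_minus_1(x: int, k: int) -> int:
    """Residue of a non-negative integer x modulo 2**k - 1.

    Sums the k-bit chunks of x and folds each carry-out back into the
    low bits (end-around carry), as a residue generator would.
    """
    mask = (1 << k) - 1
    acc = 0
    while x:
        acc += x & mask                   # add the next k-bit chunk
        x >>= k
        acc = (acc & mask) + (acc >> k)   # end-around carry
    # The all-ones pattern and zero represent the same residue.
    return 0 if acc == mask else acc


def add_passes_residue_check(a: int, b: int, result: int, k: int = 2) -> bool:
    """True if `result` is consistent with a + b modulo 2**k - 1."""
    predicted = residue_mod_2k_minus_1(
        residue_mod_2k_minus_1(a, k) + residue_mod_2k_minus_1(b, k), k
    )
    return residue_mod_2k_minus_1(result, k) == predicted
```

A correct sum always passes the check. A single flipped bit in the result changes it by a power of two, which is never a multiple of 2^k - 1 for k >= 2, so the check flags it.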
ABSTRACT: There is an increasing need for fault tolerance capabilities in logic devices brought about by the scaling of transistors to ever smaller geometries. This paper presents a hypervisor-based replication approach that can be applied to commodity hardware to allow for virtually lockstepped execution. It offers many of the benefits of hardware-based lockstep while being cheaper and easier to implement and more flexible in the configurations supported. A novel form of processor state fingerprinting is also presented, which can significantly reduce the fault detection latency. This further improves reliability by triggering rollback recovery before errors are recorded to a checkpoint. The mechanisms are validated using a full prototype and the benchmarks considered indicate an average performance overhead of approximately 14 percent with the possibility for significant optimization. Finally, a unique method of using virtual lockstep for fault injection testing is presented and used to show that significant detection latency reduction is achievable by comparing only a small amount of data across replicas.
IEEE Transactions on Dependable and Secure Computing 01/2012; 9(1):2-15.
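The general idea of comparing a compact fingerprint of processor state across replicas, rather than the full state, can be sketched as follows. This is not the paper's fingerprinting scheme, which the abstract does not detail; the register dictionary, the CRC-32 hash, and the function names are all assumptions chosen for illustration.

```python
import struct
import zlib
from typing import Mapping, Sequence


def fingerprint(registers: Mapping[str, int]) -> int:
    """Hash a (hypothetical) register snapshot into a 32-bit fingerprint."""
    data = b"".join(
        name.encode() + struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)
        for name, value in sorted(registers.items())  # fixed register order
    )
    return zlib.crc32(data)


def replicas_agree(states: Sequence[Mapping[str, int]]) -> bool:
    """True if every replica's snapshot yields the same fingerprint."""
    return len({fingerprint(state) for state in states}) <= 1
```

Only the 32-bit fingerprints need to be exchanged between replicas, and a mismatch would be the trigger for rollback recovery.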
ABSTRACT: This paper presents a fault-tolerant, programmable voter architecture for software-implemented N-tuple modular redundant (NMR) computer systems. Software NMR is a cost-efficient solution for high-performance, mission-critical computer systems because it can be built on top of commercial off-the-shelf (COTS) devices. Due to the large volume and randomness of voting data, a software NMR system requires a programmable voter. Our experiment shows that voting software that executes on a processor has time-of-check-to-time-of-use (TOCTTOU) vulnerabilities and is unable to tolerate long duration faults. In order to address these two problems, we present a special-purpose voter processor and its embedded software architecture. The processor has a set of new instructions and hardware modules that are used by the software in order to accelerate the voting software execution and address the two identified reliability problems. We have implemented the presented system on an FPGA platform. Our evaluation results show that using the presented system reduces the execution time of error detection codes (commonly used in voting software) by 14% and their code size by 56%. Our fault injection experiments validate that the presented system removes the TOCTTOU vulnerabilities and recovers under both transient and long duration faults. This is achieved by using 0.7% extra hardware in a baseline processor.
IEEE Aerospace Conference Proceedings 01/2012;
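For readers unfamiliar with NMR voting, a minimal software majority voter might look like the sketch below. It is a generic illustration, not the paper's voter processor or its instruction set, and the function name and return convention are assumptions. A plain software voter like this one is exactly the kind the abstract describes as exposed to TOCTTOU problems, since the voted value can be corrupted between the vote and its use.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of replicas.

    Returns None when the input is empty or no value has a strict majority.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None
```

With three replicas, one faulty output is outvoted by the other two; if all three disagree, the voter reports no majority and leaves the recovery decision to the caller.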