-
[show abstract]
[hide abstract]
ABSTRACT: We investigate the correlation between low-level faults in the control logic of a modern microprocessor and their instruction-level impact on the execution of typical workload. Such information can prove immensely useful in accurately assessing and prioritizing faults with regards to their criticality, as well as commensurately allocating resources to enhance online testability and error/fault resilience through concurrent error detection/correction methods. To this end, we developed an extensive fault simulation infrastructure which allows injection of stuck-at faults and transient errors of arbitrary starting time and duration, as well as cost-effective simulation and classification of their repercussions into various instruction-level error types. As a test vehicle for our study, we employ a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks. Extensive fault injection campaigns in control modules of this microprocessor facilitate valuable observations regarding the distribution of low-level faults into the instruction-level error types that they cause. Experimentation with both Register Transfer (RT-) and Gate-Level faults, as well as with both stuck-at faults and transient errors, confirms the validity and corroborates the utility of these observations.
IEEE Transactions on Computers 10/2011; · 1.10 Impact Factor
-
[show abstract]
[hide abstract]
ABSTRACT: We present a Concurrent Error Detection (CED) scheme for the Scheduler of a modern microprocessor. The proposed CED scheme is based on monitoring a set of invariances imposed through added hardware, violation of which signifies the occurrence of an error. The novelty of our solution stems from the workload-cognizant way in which these invariances are selected so that they leverage the application-level error masking inherent in program execution. Specifically, in order to ensure cost-effectiveness of the hardware employed to construct these invariances, we make use of information regarding the type and frequency of errors affecting the typical workload of the microprocessor. Thereby, we identify the most susceptible aspects of instruction execution and we accordingly distribute CED resources to protect them. Our approach is demonstrated on the Scheduler of an Alpha-like superscalar microprocessor with dynamic scheduling, hybrid branch prediction and out-of-order execution capabilities. Using an extensive fault-simulation infrastructure that we developed around this microprocessor, we profile the impact of Scheduler faults across a variety of different SPEC2000 benchmarks. Based on the results, we construct a CED scheme which monitors the time and location of instruction execution, the executed operation, the utilized resources, as well as the executed and retired sequence of instructions. At a hardware cost of only 32 percent of the Scheduler, the corresponding CED scheme detects over 85 percent of its faults that affect the architectural state of the microprocessor. Furthermore, over 99.5 percent of these faults are detected before they corrupt the architectural state, while the average detection latency for the remaining faults is in the order of a few clock cycles, implying that efficient recovery methods can be developed.
IEEE Transactions on Computers 10/2011; · 1.10 Impact Factor
-
[show abstract]
[hide abstract]
ABSTRACT: The notion of Architectural Vulnerability Factor (AVF) has been extensively used by designers to evaluate various aspects of design robustness. While AVF is a very accurate way of assessing element resiliency, its calculation requires rigorous and extremely time-consuming experiments. In response, designers have introduced various methodologies that allow AVF calculation within reasonable time, at the cost of some loss of accuracy. In this paper, we present a method for calculating the AVF of design elements-using Statistical Fault Injection (SFI)-with equal accuracy but several orders of magnitude faster than traditional SFI techniques. Our method partitions the design into various hierarchical levels and systematically performs incremental fault injections to generate the AVF numbers. The presented method has been applied on an Intel microprocessor, where experimental results corroborate its ability to achieve great speed-up while maintaining perfect accuracy in calculating AVF.
European Test Symposium (ETS), 2011 16th IEEE; 06/2011
-
[show abstract]
[hide abstract]
ABSTRACT: Towards improving performance, modern microprocessors incorporate a variety of architectural features, such as branch prediction and speculative execution, which are not critical to the correctness of their operation. While faults in the corresponding hardware may not necessarily affect functional correctness, they may, nevertheless, adversely impact performance. In this paper, we investigate quantitatively the performance impact of such faults using a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks. We provide extensive fault simulation-based experimental results and we discuss how this information may guide the inclusion of additional hardware for performance loss recovery and yield enhancement.
Computer Design, 2009. ICCD 2009. IEEE International Conference on; 11/2009
-
[show abstract]
[hide abstract]
ABSTRACT: We investigate the correlation between register transfer-level faults in the control logic of a modern microprocessor and their instruction-level impact on the execution flow of typical programs. Such information can prove immensely useful in accurately assessing and prioritizing faults with regards to their criticality, as well as commensurately allocating resources to enhance testability, diagnosability, manufacturability and reliability. To this end, we developed an extensive infrastructure which allows injection of stuck-at faults and transient errors of arbitrary starting point and duration, as well as cost-effective simulation and classification of their repercussions into various instruction-level error types. As a test vehicle for our study, we employ a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks. Extensive experimentation with faults injected in control logic modules of this microprocessor reveals interesting trends and results, corroborating the utility of this simulation infrastructure and motivating its further development and application to various tasks related to robust design.
Test Conference, 2008. ITC 2008. IEEE International; 11/2008
-
[show abstract]
[hide abstract]
ABSTRACT: This paper presents a concurrent error detection technique for the control logic of a modern microprocessor. Our method is based on execution time prediction for each instruction executing in the processor. To evaluate the proposed method, we use a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks and we consider the coverage and the detection latency for faults in the scheduler module of the microprocessor controller. Experimental results show, that through this method, a large percentage of control logic faults can be detected with low latency during normal operation of the processor.