Conference Paper

A low-cost concurrent error detection technique for processor control logic

DOI: 10.1145/1403375.1403592
Conference: Design, Automation and Test in Europe, DATE 2008, Munich, Germany, March 10-14, 2008
Source: DBLP


This paper presents a concurrent error detection technique for processor control logic, with an emphasis on low area overhead. Rather than detecting all modeled transient faults, the technique selects the faults that have a high probability of damaging the architectural state of the processor and protects the circuit against them. Fault detection is achieved through a series of assertions, each of which is an implication from the inputs to the outputs of a combinational circuit. Fault simulation experiments on control logic modules of an industrial processor suggest that a large reduction in damage-causing faults can be achieved with low overhead.
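To make the idea of an assertion as an input-to-output implication concrete, the following is a minimal Python sketch. It is not the paper's method: the three-input block, the signal names (`a`, `b`, `c`, `stall`, `flush`), the hand-picked assertions, and the single-output bit-flip fault model are all assumptions chosen for illustration. Each assertion reads "if the inputs match this partial pattern, then this output must have this value"; the checker raises an error only when an antecedent holds and its consequent is violated.

```python
from itertools import product

def control_logic(a: int, b: int, c: int) -> dict:
    """Hypothetical fault-free combinational block with two outputs."""
    return {"stall": a & b, "flush": (a & b) | c}

# Each assertion is (antecedent over inputs, output name, implied value).
# These three are hand-picked for the toy block above, not mined automatically.
ASSERTIONS = [
    ({"a": 1, "b": 1}, "stall", 1),  # a=1 and b=1  =>  stall=1
    ({"c": 1}, "flush", 1),          # c=1          =>  flush=1
    ({"a": 0}, "stall", 0),          # a=0          =>  stall=0
]

def error_flagged(inputs: dict, outputs: dict) -> bool:
    """Return True if any assertion's antecedent holds but its consequent fails."""
    for antecedent, out_name, implied in ASSERTIONS:
        if all(inputs[name] == value for name, value in antecedent.items()):
            if outputs[out_name] != implied:
                return True
    return False

def flip(outputs: dict, name: str) -> dict:
    """Model a transient fault as a single bit flip on one output (an assumption)."""
    faulty = dict(outputs)
    faulty[name] ^= 1
    return faulty

def coverage() -> tuple:
    """Exhaustively inject one output flip per input vector and count detections."""
    detected = total = 0
    for a, b, c in product((0, 1), repeat=3):
        inputs = {"a": a, "b": b, "c": c}
        good = control_logic(a, b, c)
        # A valid implication must never fire on the fault-free circuit.
        assert not error_flagged(inputs, good)
        for name in good:
            total += 1
            detected += error_flagged(inputs, flip(good, name))
    return detected, total

if __name__ == "__main__":
    detected, total = coverage()
    print(f"detected {detected} of {total} injected faults")
```

The assertions deliberately cover only part of the fault space, which mirrors the trade-off the abstract describes: a few cheap implications leave some faults undetected, and the selection step decides which faults are worth the extra checker hardware.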

Available from: R. Galivanche, Jun 05, 2014
  • Source
    • "While transient errors that occur during circuit operation will require a complex online error detection approach [4]–[6], permanent faults can be checked for during production and the chip can be discarded before it causes errors for the end user. Circuit designers use a variety of defect models to capture the behavior of a permanent defect in a chip."
    Journal of Computers 11/2011; 6(11):2335-2344. DOI:10.4304/jcp.6.11.2335-2344
  • Source
    • "Such invariance can be monitored during the normal operation of a circuit to identify errors that cause it to be violated. In [20] such invariance is mined from the gate-level of a controller implementation in the form of assertions, which are evaluated through simulation in order to select a cost-effective appropriate subset. The same principle governs the approach in [21]; therein, however, invariance is identified through a path-construction algorithm, which exploits inherent transparency channels that exist in the RTL description of a modular design."
    ABSTRACT: We present a Concurrent Error Detection (CED) scheme for the Scheduler of a modern microprocessor. The proposed CED scheme is based on monitoring a set of invariances imposed through added hardware, violation of which signifies the occurrence of an error. The novelty of our solution stems from the workload-cognizant way in which these invariances are selected so that they leverage the application-level error masking inherent in program execution. Specifically, in order to ensure cost-effectiveness of the hardware employed to construct these invariances, we make use of information regarding the type and frequency of errors affecting the typical workload of the microprocessor. Thereby, we identify the most susceptible aspects of instruction execution and we accordingly distribute CED resources to protect them. Our approach is demonstrated on the Scheduler of an Alpha-like superscalar microprocessor with dynamic scheduling, hybrid branch prediction and out-of-order execution capabilities. Using an extensive fault-simulation infrastructure that we developed around this microprocessor, we profile the impact of Scheduler faults across a variety of different SPEC2000 benchmarks. Based on the results, we construct a CED scheme which monitors the time and location of instruction execution, the executed operation, the utilized resources, as well as the executed and retired sequence of instructions. At a hardware cost of only 32 percent of the Scheduler, the corresponding CED scheme detects over 85 percent of its faults that affect the architectural state of the microprocessor. Furthermore, over 99.5 percent of these faults are detected before they corrupt the architectural state, while the average detection latency for the remaining faults is in the order of a few clock cycles, implying that efficient recovery methods can be developed.
    IEEE Transactions on Computers 10/2011; 60(9):1274-1287. DOI:10.1109/TC.2010.265
  • Source
    • "Other researchers have considered relationships that occur between flip-flops or between circuit sites at the gate level. For example, the authors of [13] investigated the protection of the control logic of a microprocessor through the use of relationships between functions of the flip-flops in a design. The authors of [2] proposed the use of checking functions, which identify circuit sites or functions of circuit sites that should always be equal to (or complements of) each other."
    ABSTRACT: We present a new method to identify multi-site implications that can significantly increase the fault coverage of error-detecting hardware without increasing the area overhead. This method intelligently divides the input space about the functions of internal circuit sites and finds new valuable implications that can share gates in checker logic.
    VLSI Test Symposium (VTS), 2011 IEEE 29th; 06/2011