Conference Paper

A low-cost concurrent error detection technique for processor control logic

DOI: 10.1145/1403375.1403592 Conference: Design, Automation and Test in Europe, DATE 2008, Munich, Germany, March 10-14, 2008
ABSTRACT
This paper presents a concurrent error detection technique targeted towards control logic in a processor with emphasis on low area overhead. Rather than detect all modeled transient faults, the technique selects faults which have a high probability of causing damage to the architectural state of the processor and protects the circuit against these faults. Fault detection is achieved through a series of assertions. Each assertion is an implication from inputs to the outputs of a combinational circuit. Fault simulation experiments performed on control logic modules of an industrial processor suggest that high reduction in damage causing faults can be achieved with a low overhead.

Ramtilak Vemu, Jacob A. Abraham
Computer Engineering Research Center, University of Texas at Austin
{rvemu,jaa}@cerc.utexas.edu

Abhijit Jas, Srinivas Patil, Rajesh Galivanche
Design and Technology Solutions, Intel Corporation
{ajas,spatil,rgalivanche}@intel.com
1 Introduction
Transient faults can occur in a processor as a result of electrical noise, such as crosstalk, or high-energy particles, such as neutrons and alpha particles. These faults can cause a program running on the processor to behave erratically if they propagate and change the architectural state of the processor. They can occur in memory arrays, sequential elements or the combinational logic of the processor. Protection against transient faults in combinational logic has traditionally not received much attention because combinational logic has a natural barrier stopping the propagation of the faults [4]. Three masking factors (logical, electrical and latching-window) reduce the probability that a transient fault propagates and latches on to sequential elements. With the current trends in the processor industry, however, the masking provided by these factors is reducing [5]. Reduced logical depth between latches means that there are more sensitized paths and hence more paths for a transient fault at a gate to propagate and latch on. Decreasing feature sizes and lower operating voltages result in less charge being stored at any node. Thus the electrical noise or the energy of the particle strike required to trigger a transient fault is decreasing. High operating frequencies mean that there are more latching windows per unit time, thus increasing the probability that a transient fault gets latched on. For these reasons, the combinational portion of the processor is projected to become a dominant source of failures due to transient faults [9].
Various techniques have been proposed to detect transient faults in combinational circuits. Residue codes have been found to be very effective in detecting faults in datapath circuits. For control logic circuits, codeword-based schemes have been proposed. Codes like parity [8], [11], [12], Berger [3] and Bose-Lin [1] are predicted for the outputs of the circuit, and the codes of the real-time outputs are matched against the predicted codes. These techniques for control logic circuits are viable for mission-critical applications where reliability is of primary concern and area, timing and power take second place. There is another class of techniques which, unlike the above methods, do not attempt to detect all the modeled faults. Rather, they try to detect most of the errors at a reasonable overhead. Such techniques are more viable for mainstream applications which do not have the same stringent FIT (Failure In Time) requirements as mission-critical applications [5]. The work presented in this paper falls under this category of techniques.
This paper presents a new low-cost technique for concurrent error detection (CED) in processor control logic. The proposed technique takes advantage of the fact that transient faults in gates along some paths are much more likely than others to propagate to an architectural state under normal running of the processor, and it protects against errors in these paths. The technique automatically extracts the control conditions (input value combinations) under which these paths are sensitized and converts these conditions into assertions. Each assertion is an implication from the control conditions to the value of an output. Depending on the area overhead budget and the required transient error reduction, a subset of the extracted assertions can be selected for CED. The work presented here is similar to the work presented in [2] in the sense that outputs are predicted for a few input combinations. As opposed to that technique, the present work does not make any assumptions as to the duration of transient faults and has a very low latency. The work presented here can be considered a more fine-grained approach than the concept of architectural vulnerability factor (AVF) [10]. Instead of just determining which modules are more vulnerable, we determine which flops in these modules are the most vulnerable and protect the combinational logic which feeds these flops.

978-3-9810801-3-1/DATE08 © 2008 EDAA

Figure 1: Block diagram for the proposed CED

Fault simulation experiments were performed on selected control logic modules in an industrial processor using sample segments of real applications. The proposed technique was implemented for each of these modules for fault escape reductions of 50%, 75%, 90%, 95% and 99%. A fault escape happens if a transient fault propagates undetected to the architectural state of the processor. Results show that high fault escape reductions can be achieved at low cost. Over 95% fault escape reduction can be obtained with just 25% area overhead.
The rest of the paper is organized as follows. Section 2 gives an overview of the proposed technique and provides insight into why high fault escape reductions can be obtained with a few assertions. Section 3 and Section 4 provide a detailed discussion of the algorithm. Section 5 presents the experimental results and Section 6 provides conclusions.
2 Overview
In this section, we present an overview of the proposed technique. The technique protects the combinational portion of the circuit against transient errors. Figure 1 shows a circuit which has CED capability. The technique introduces an Assertion Checker which takes as inputs the inputs and the outputs of the combinational circuit and outputs a signal indicating whether they conform to a series of assertions. Each assertion is an implication of the form $\text{antecedent} \Rightarrow \text{consequent}$ (if antecedent is true, then consequent is true). The antecedent in each assertion is a minterm on a subset of the inputs to the circuit. Not all the inputs need be part of the antecedent. In many cases, the antecedent is a minterm on just one or two of the inputs. The consequent in each assertion is a literal of an output of the circuit. For example, the following would be a valid assertion according to our technique: $i2 \cdot \overline{i3} \Rightarrow o5$, where i2 and i3 are two of the inputs to the circuit and o5 is an output of the circuit. This assertion states that o5 should have a value of 1 when i2 = 1 and i3 = 0. An assertion will detect all the faults which propagate to the output in the consequent when the antecedent is true. The above example assertion will detect all faults propagating to o5 when (i2, i3) = (1, 0).

Table 1: Distribution of input vectors of the combinational portion of an example module

Number of unique vectors    % of total vectors (703547 vectors)
32                          50
397                         75
5392                        90
37175                       95
65317                       99
72353                       100

In testing terminology, the antecedent of an assertion would form a test vector for a stuck-at fault at the output in the consequent. In the above example, i2 = 1 and i3 = 0 would form a test vector for a stuck-at-0 fault at o5. In fact, any test vector which detects any stuck-at fault at an output of the combinational circuit can be converted into an assertion on that output.
An assertion can also be viewed as checking a subset of the truth table for the corresponding output. The above example checks that subset of the o5 truth table which has i2 = 1 and i3 = 0.
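As an illustration only (the paper gives no code), the example assertion can be modelled in a few lines of Python. The function name `assertion_holds` and the dictionary encoding of the antecedent and consequent are assumptions made for this sketch, and `i1` is a made-up extra input that shows how inputs outside the antecedent are ignored.

```python
def assertion_holds(antecedent, consequent, inputs, outputs):
    """Return False only when the antecedent matches and the consequent is violated."""
    matched = all(inputs[name] == value for name, value in antecedent.items())
    out_name, out_value = consequent
    return (not matched) or outputs[out_name] == out_value

# The example assertion from the text: i2 = 1 and i3 = 0 implies o5 = 1.
antecedent = {"i2": 1, "i3": 0}
consequent = ("o5", 1)

# Fault-free behaviour satisfies the assertion.
ok = assertion_holds(antecedent, consequent, {"i1": 0, "i2": 1, "i3": 0}, {"o5": 1})
# A fault that flips o5 while the antecedent is true is flagged.
flagged = not assertion_holds(antecedent, consequent, {"i1": 0, "i2": 1, "i3": 0}, {"o5": 0})
# When the antecedent is false, the assertion says nothing about o5.
silent = assertion_holds(antecedent, consequent, {"i1": 0, "i2": 0, "i3": 0}, {"o5": 0})
print(ok, flagged, silent)
```

The third case shows the truth-table view: rows outside the checked subset are simply not covered by this assertion.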
In order to keep the overhead for concurrent error detection to a minimum, we need to select the minimal set of assertions such that the required transient fault coverage is achieved. To assist us in selecting the assertions to be included in the assertion checker, we run transient fault simulations using sample segments of real applications and take into consideration only the faults which propagate all the way to the architectural states of the processor. This differs from the experimental methodology followed in almost all previously proposed CED schemes [1], [3], [5], [8], [11], [12], where random vectors are used as inputs of the combinational circuits and all the faults which propagate to the outputs of the combinational portion are considered important. The methodology followed in this paper is similar to the one used in [6] for logic derating.
The effectiveness of an assertion in detecting transient faults which propagate to the primary outputs varies widely depending on the following factors.
• Faults in some paths are more likely to propagate and latch on to sequential elements than others. In the control logic of a processor, the distribution of vectors applied at run-time to the inputs of the combinational portion (primary inputs as well as outputs of latches/flops) is highly skewed. A small subset of input vectors is applied for a large percentage of clock cycles. This is because some state transitions in the finite state machine (FSM) of a control logic module are more common than others. Additionally, some input combinations are invalid and hence cannot occur. As a result, some paths in the circuit are exercised more often than others. Transient faults in these paths are more likely to propagate to the outputs of the combinational part of the circuit and hence to the inputs of the sequential elements (latches and flops). To show this skew, we collected the vectors that were applied at the inputs of the combinational portion of a control logic module (module3 in Table 3) during 703547 cycles. The module has 390 inputs to the combinational portion. We collected these vectors from traces of sample programs running on the processor. The unique vectors are sorted according to the number of times they occur and their distribution is shown in Table 1. The 703547 vectors contain a total of 72353 unique vectors. The asymmetrical nature of the vectors can be seen from the table. Just 32 unique vectors account for about 50% of all the vectors.
• Due to clock gating, bit flips at the inputs of some of the sequential elements are more likely to get latched on than others.
• Bit flips in certain sequential elements are more likely to affect the architectural states of the processor than others. Bit flips in latches (flops) may be masked logically in the combinational portion, which prevents them from propagating to the architectural states. Due to asymmetry in the vectors applied at the inputs of combinational logic, bit flips in some latches are more likely to propagate to the next level of latches. For example, if the output of a latch is fed to an AND gate whose other input is predominantly 0, a bit flip in that latch has a very low probability of propagating. The greater the pipeline distance between a latch and the architectural states, the more probable it is that a bit flip in that latch is masked. To show the asymmetry in the importance of latches in terms of bit flips in them propagating, we randomly injected transient faults in the sequential elements of the modules listed in Table 3. We then marked the faults which propagate to the primary outputs of the module in which each of those modules is instantiated. We sorted the sequential elements according to the number of faults in the elements which propagate to the architectural states, and Figure 2 shows the results. We can see that bit flips in just 5% of the flops contribute to more than 90% of all the faults propagating to the primary outputs.

Figure 2: Asymmetry among flops contributing to faults which propagate to primary outputs

The net effect of the above observations is that some assertions detect more transient faults propagating to the architectural states of the processor than others. An effective assertion is one whose consequent is on a combinational output which feeds a vulnerable latch. The antecedent of the assertion will cover the most common subset of the output truth table. The next few sections deal with how we automatically extract such assertions based on fault simulations on the circuit.
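The kind of tabulation behind Table 1 can be sketched as follows. This is a minimal Python illustration and not the authors' tooling: the function name `unique_vectors_needed` and the ten-cycle toy trace are hypothetical, and integer percentages are used only to avoid floating-point comparisons.

```python
from collections import Counter

def unique_vectors_needed(trace, percents):
    """For each percentage, count how many of the most frequent unique
    vectors are needed to account for at least that share of the trace."""
    counts = sorted(Counter(trace).values(), reverse=True)
    total = len(trace)
    needed = {}
    for percent in percents:
        covered = 0
        for rank, count in enumerate(counts, start=1):
            covered += count
            if covered * 100 >= percent * total:
                needed[percent] = rank
                break
    return needed

# Toy trace (made up): one vector dominates and the others are rare.
trace = ["0101"] * 6 + ["0011"] * 2 + ["1111", "1000"]
needed = unique_vectors_needed(trace, [50, 75, 90, 100])
print(needed)
```

On a skewed trace the first few entries stay small while the last ones grow quickly, which is the shape Table 1 reports.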
3 Algorithm for assertion extraction
This section describes the algorithm for extracting all the assertions which are valid for a particular input vector. In the next section, we integrate this algorithm with the rest of the flow for finding the minimal set of assertions. For ease of explanation, we define the term control assignment (CA). For any particular vector, the CA for any net in the circuit defines the assignments of values to inputs which guarantee the current value of the net (the value of the net when the current input vector is applied). In testing terminology, each assignment of values can be considered a different controllability condition which is true for the current vector. A CA is in sum of products (SOP) format where each product defines a different controllability condition. A CA of $i1 + \overline{i2}$ for a net means that the net is guaranteed to have the current value if i1 = 1 or if i2 = 0. Additionally, i1 and i2 have values 1 and 0 in the current vector. A product in the control assignment for an output of the combinational circuit is a test vector for a stuck-at fault at that output since the propagation condition is also met (in addition to the controlling condition).
If we know the CAs of all the inputs of any gate, the CA of its output can be calculated according to the rules stated below.
• If the gate has at least one controlling input, the CA of the output is the sum of the CAs of all the gate inputs which have controlling values.
• If the gate has all non-controlling values at the inputs, the CA of the output is the product of the CAs of all the gate inputs.
Table 2 illustrates the propagation of CAs for an AND gate with inputs i1 and i2. $CA_1$, $CA_2$ and $CA_o$ represent the CAs of i1, i2 and the output of the gate. The propagation tables for other types of gates can be similarly obtained.

Table 2: Propagation of CAs for an AND gate

i1 i2     00               01        10        11
$CA_o$    $CA_1 + CA_2$    $CA_1$    $CA_2$    $CA_1 \cdot CA_2$

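The two propagation rules can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: a CA is encoded as a set of products, each product being a frozenset of (input name, value) literals, and the helper names `literal`, `propagate` and `show` are hypothetical.

```python
from itertools import product as cartesian

def literal(name, value):
    """CA of a circuit input: a single product holding a single literal."""
    return {frozenset([(name, value)])}

def propagate(controlling_value, input_values, input_cas):
    """Return the output CA of a gate from the values and CAs of its inputs."""
    controlling = [ca for v, ca in zip(input_values, input_cas) if v == controlling_value]
    if controlling:
        # Rule 1: sum (union) of the CAs of the inputs holding a controlling value.
        return set().union(*controlling)
    # Rule 2: product of the CAs of all inputs, expanded back into SOP form.
    result = {frozenset()}
    for ca in input_cas:
        result = {p | q for p, q in cartesian(result, ca)}
    return result

def show(ca):
    """Render a CA as text, writing a negative literal as name'."""
    products = ["".join(n + ("" if v else "'") for n, v in sorted(p)) for p in ca]
    return " + ".join(sorted(products))

# Reproduce Table 2: an AND gate (controlling value 0) under all four input vectors.
for v1, v2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    ca_o = propagate(0, [v1, v2], [literal("i1", v1), literal("i2", v2)])
    print(v1, v2, show(ca_o))
```

With this encoding the four printed CAs are `i1' + i2'`, `i1'`, `i2'` and `i1i2`, matching the columns of Table 2 for the literals of the applied vector. An OR gate would use a controlling value of 1 instead of 0.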
We now present an algorithm for extracting the assertions on the outputs of the circuit for a given vector. Initially, all the nets in the circuit are ordered topologically (all inputs of a gate are listed before the output of the gate). For each net in the ordered list:
1. If the net is an input of the circuit, the CA of the net is the positive literal of the net if the net has a value 1 in the simulation vector. The CA is the negative literal of the net if it has a value 0.
2. If the net is not an input to the circuit, calculate the CA of the net from the CAs of the inputs of the gate driving the net according to the rules stated above.
3. Convert the CA into the SOP format if it is not already in that format.
4. Trim the CA to contain only those products which have fewer than (n + thresh) literals, where n is the minimum number of literals over all the products and thresh is a parameter of the algorithm.
Assertions on the outputs of the circuit are then extracted as follows. Each product in the output CA can be made the antecedent of a different assertion on that output. If the output has a value 0, then the consequent of the assertions will be the negative literal of the output. It will be the positive literal of the output otherwise.
We need to trim the CAs of the nets (step 4 in the algorithm above) to prevent an explosion in the number of terms in the CAs calculated subsequently from this CA. We trim away the products which have a large number of literals. The intuition behind this trimming is that the fewer the literals in the antecedent of an assertion, the more often it is likely to occur and hence the more likely it is to be picked among the most effective assertions. On the other hand, trimming away some of the products may lead to dropping some assertions which would detect a large number of transient faults.
Figure 3 gives an example circuit and shows how the control assignments are propagated. Each net in the circuit is accompanied by the tuple (net name, value, control assignment). The circuit has inputs a, b, c and d and an output y. A vector 0001 is applied to the circuit. The calculated control assignments are given in the figure. The AND gate gives an example of how CAs are propagated when both gate inputs are controlling. The output gate gives an example of how CAs are propagated when both gate inputs are non-controlling. The output y has a CA of $\overline{a} d + \overline{b} d$. Two assertions can then be extracted for the given vector, ($\overline{a} d \Rightarrow \overline{y}$) and ($\overline{b} d \Rightarrow \overline{y}$).

Figure 3: Example control assignment propagation

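The extraction steps of this section can be put together in a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The netlist below is a hypothetical circuit chosen so that the vector 0001 yields the same two assertions as the example; it is not claimed to be the circuit of Figure 3. The function names, the (net, gate type, fan-in) netlist encoding and the default `thresh=2` are likewise assumptions, and no absorption of redundant products is attempted.

```python
from itertools import product as cartesian

# Controlling input value per gate type. NOT has none, so rule 2 (product over
# its single input) simply passes the input CA through.
CONTROLLING = {"AND": 0, "OR": 1, "NOT": None}

def evaluate(gate, values):
    """Fault-free output value of a gate."""
    if gate == "AND":
        return int(all(values))
    if gate == "OR":
        return int(any(values))
    return 1 - values[0]  # NOT

def trim(ca, thresh):
    """Step 4: keep only products with fewer than (n + thresh) literals."""
    n = min(len(p) for p in ca)
    return {p for p in ca if len(p) < n + thresh}

def extract_assertions(inputs, gates, outputs, vector, thresh=2):
    """gates is a list of (net, gate_type, fan_in) already in topological order."""
    value = dict(vector)
    # Step 1: the CA of an input is its literal under the current vector.
    ca = {name: {frozenset([(name, value[name])])} for name in inputs}
    for net, gate, fan_in in gates:
        in_values = [value[f] for f in fan_in]
        value[net] = evaluate(gate, in_values)
        # Step 2: sum over controlling inputs, else product over all inputs.
        ctrl = [ca[f] for f, v in zip(fan_in, in_values) if v == CONTROLLING[gate]]
        if ctrl:
            combined = set().union(*ctrl)
        else:
            combined = {frozenset()}
            for f in fan_in:
                # Step 3: expanding the product pairwise keeps the CA in SOP form.
                combined = {p | q for p, q in cartesian(combined, ca[f])}
        ca[net] = trim(combined, thresh)
    # Each product of an output CA becomes the antecedent of one assertion.
    return [(dict(p), (out, value[out]))
            for out in outputs
            for p in sorted(ca[out], key=sorted)]

# Hypothetical circuit: n1 = AND(a, b), n2 = NOT(d), y = OR(n1, n2); c is unused.
gates = [("n1", "AND", ["a", "b"]), ("n2", "NOT", ["d"]), ("y", "OR", ["n1", "n2"])]
vector = {"a": 0, "b": 0, "c": 0, "d": 1}
assertions = extract_assertions(["a", "b", "c", "d"], gates, ["y"], vector)
for antecedent, consequent in assertions:
    print(antecedent, "=>", consequent)
```

For this netlist the sketch yields two assertions, (a = 0, d = 1) implies y = 0 and (b = 0, d = 1) implies y = 0. A smaller `thresh` prunes longer products earlier, trading possible missed assertions for smaller CAs.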
4 Algorithm for low-cost CED
In this section, we describe the algorithm for constructing the assertion checker for a given circuit. The algorithm takes as inputs the description of the circuit and the functional vectors applied to the circuit. The algorithm works for a given target reduction in fault escapes. A fault escape is a fault which propagates to the architectural state of the processor without being detected. In the absence of any concurrent error detection (CED), all the faults which propagate to the architectural states are fault escapes. In the presence of CED, some of these faults are detected and hence there is a reduction in the number of fault escapes. The required target reduction in fault escapes is given as a parameter to the algorithm. The steps involved in implementing the algorithm are given below.
Step 1: Performing fault simulations and building the fault database. In this step we inject m transient faults in each cycle of the functional vectors, where m is a parameter of the algorithm. The transient faults are injected in the combinational portion of the design according to any given fault model (single-event transients, crosstalk faults, etc.). We set as observation points the architectural state as well as the outputs of the combinational portions. For each fault which propagates to the architectural state, we note the outputs of the combinational portion to which the fault propagates before first being latched on to a sequential element. For these faults, we store the vector, the fault site and the outputs of the combinational portion to which the fault propagates in a fault database. Since we store only the faults which propagate to the architectural state, we automatically account for the various masking factors mentioned in Section 2.
Step 2: Extracting assertions. For each unique vector in the fault database, we extract assertions as described in Section 3 for all combinational outputs to which any fault injected in that vector propagates.

Table 3: Details of modules used for evaluation

Module     Num. of combinational gates    Num. of sequential elements    Num. of inputs    Num. of outputs
module1    1509                           378                            64                108
module2    1314                           363                            79                108
module3    1692                           435                            100               194
module4    495                            177                            49                27
module5    2602                           773                            76                110

Step 3: Building the assertion database. For each extracted assertion, we find all the faults which are detected by that assertion. An assertion detects a fault if the antecedent of the assertion holds true for the vector in which the fault is injected and the fault propagates to the output in the consequent of the assertion. We store the list of assertions and the faults they detect in an assertion database.
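The detection criterion of Step 3 can be written down directly. The sketch below is a hypothetical Python illustration: the record layout of the fault database (keys `vector`, `site` and `outputs`), the fault and net names and the function names are assumptions, since the paper does not specify a storage format.

```python
def detects(assertion, fault):
    """An assertion detects a fault if its antecedent holds for the vector in which
    the fault was injected and the fault reaches the output in its consequent."""
    antecedent, (out_name, _out_value) = assertion
    vector_ok = all(fault["vector"][name] == value for name, value in antecedent.items())
    return vector_ok and out_name in fault["outputs"]

def build_assertion_db(assertions, fault_db):
    """Map each assertion index to the set of fault ids it detects."""
    return {idx: {fid for fid, fault in fault_db.items() if detects(a, fault)}
            for idx, a in enumerate(assertions)}

# Made-up fault database: only faults that reached the architectural state are stored.
fault_db = {
    "f1": {"vector": {"a": 0, "b": 1, "d": 1}, "site": "n1", "outputs": {"y"}},
    "f2": {"vector": {"a": 1, "b": 0, "d": 1}, "site": "n2", "outputs": {"y"}},
    "f3": {"vector": {"a": 0, "b": 0, "d": 1}, "site": "n7", "outputs": {"z"}},
}
assertions = [({"a": 0, "d": 1}, ("y", 0)), ({"b": 0, "d": 1}, ("y", 0))]
assertion_db = build_assertion_db(assertions, fault_db)
print(assertion_db)
```

Fault f3 satisfies both antecedents but propagates only to a different output, so neither assertion is credited with it.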
Step 4: Picking top assertions for a given reduction in fault escapes. Ideally we would like to pick the minimal number of assertions for detecting a given number of faults. This problem is similar to the set-covering problem and is NP-complete. We employ a simple greedy approximation algorithm to pick assertions for a given reduction in fault escapes. The target number of faults required to be detected for achieving the target reduction in fault escapes is calculated. The assertions are greedily picked until the target number of faults is detected.
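A standard greedy set-cover heuristic of the kind described in Step 4 can be sketched as follows. This is a minimal Python illustration under assumptions: the paper does not give its exact selection or tie-breaking rule, so the function `pick_assertions`, the rule of always taking the assertion with the most newly detected faults, and the toy database are all hypothetical.

```python
def pick_assertions(assertion_db, total_escapes, target_percent):
    """Greedily pick assertions until the target share of escaping faults is detected."""
    # Target number of faults to detect, rounded up.
    target = (target_percent * total_escapes + 99) // 100
    detected, picked = set(), []
    remaining = dict(assertion_db)
    while len(detected) < target and remaining:
        # Take the assertion that adds the most not-yet-detected faults.
        best = max(remaining, key=lambda a: len(remaining[a] - detected))
        gain = remaining.pop(best) - detected
        if not gain:
            break  # no remaining assertion detects anything new
        picked.append(best)
        detected |= gain
    return picked, detected

# Toy assertion database: assertion name -> ids of the escaping faults it detects.
# Fault 8 is an escape that no extracted assertion detects.
assertion_db = {"A1": {1, 2, 3, 4}, "A2": {3, 4, 5}, "A3": {5, 6}, "A4": {7}}
picked, detected = pick_assertions(assertion_db, total_escapes=8, target_percent=75)
print(picked, sorted(detected))
```

Here A3 is preferred over A2 in the second round because, once A1 is picked, A2 adds only one new fault while A3 adds two.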
Step 5: Constructing the assertion checker. Once the assertions needed for a given reduction in fault escapes are picked, the assertion checker is constructed by synthesizing the conjunction of all the individual assertions. We considered two different implementations for synthesizing the assertion checker: a totally self-checking checker and a self-exercising strongly code-disjoint checker [7]. A dual-rail implementation is used for synthesizing the self-checking checker. For the self-exercising checker, during the test phase, all the antecedents are forced to be true and the consequents are forced to be false one after the other. This implementation takes advantage of the fact that the assertion checker is the conjunction of all the individual assertions to obtain a low-overhead self-exercising checker.
5 Experimental results
The algorithm presented in this paper was evaluated on five random control logic modules in the integer execution unit of an industrial processor. The modules (module1, module2, module3, module4 and module5) are instantiated in the execution unit of the processor. The details of these modules are given in Table 3.
An in-house transient fault simulator was used for all the fault simulations. The vectors used for fault simulation are functional traces extracted when running programs on the processor simulation model. 416 different functional traces with a total of 703547 vectors are used. For each transient fault to be injected, a fault site was chosen randomly among all the nets in a module and the value at the net during a given cycle was corrupted. 5 transient faults were injected per cycle (3.5 million faults) for each module considered. A fault was considered to escape if it propagated to the primary outputs of the execution unit. An implicit assumption here is that the faults which propagate to the primary outputs of the execution unit are going to affect the architectural state of the processor.
The entire algorithm for extracting and picking assertions is written in Perl. The program was run 5 times for each module with target fault escape reductions of 50%, 75%, 90%, 95% and 99%. Synopsys Design Analyzer was used to synthesize the modules. The assertion checkers (both self-checking and self-exercising) for each point were implemented and the area overheads were calculated. The technology library used is the lsi_10k library distributed along with Synopsys Design Compiler. For comparison purposes, the partial duplication technique described in [5] was also implemented on all five of the modules for the given target fault escape reductions. Consistent with our methodology, we considered only the faults which escape instead of considering all the faults which propagate to outputs of combinational logic.
Table 4 shows the area overhead results for partial duplication (PD), the proposed technique with dual-rail implementation (PT-D) and with self-exercising implementation (PT-S) when achieving different fault escape reductions. The average area overheads for different fault escape targets are plotted in Figure 4. It can be seen that large fault escape reductions can be obtained with a low area overhead. On average, 50% fault escape reduction can be obtained with just 3% overhead. This number increases to 54% for PT-D and 42% for PT-S when the target fault escape reduction is 99%. It can be seen from the figure that, compared to the partial duplication technique, the average area overhead of the proposed technique with dual-rail implementation is always lower. Further area savings can be obtained if just a self-exercising checker is needed. For a target fault escape reduction of 95%, the dual-rail implementation is 25% better than partial duplication and the self-exercising implementation is 40% better.
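The averages quoted above can be recomputed from the per-module numbers in Table 4. The short Python sketch below does only that arithmetic; the variable and function names are arbitrary, and the values are copied from the table. At the 95% target it gives average overheads of about 40.1% (PD), 29.5% (PT-D) and 24.4% (PT-S), i.e. relative savings of roughly 26% and 39% over partial duplication, which is in line with the rounded figures in the text.

```python
# Area overheads (%) copied from Table 4: one (PD, PT-D, PT-S) tuple per module,
# in the order module1 .. module5, keyed by target fault escape reduction (%).
TABLE4 = {
    50: [(0.9, 0.6, 0.7), (0.9, 0.6, 0.7), (4.8, 1.8, 1.9), (12.3, 9.2, 10.1), (1.9, 2.0, 1.9)],
    75: [(1.6, 0.8, 0.9), (8.3, 8.4, 7.8), (8.6, 4.2, 4.1), (15.9, 14.8, 15.7), (7.4, 5.6, 4.8)],
    90: [(8.7, 4.4, 4.4), (51.2, 38.8, 31.4), (21.6, 15.2, 13.0), (25.5, 20.8, 20.7), (14.4, 10.8, 8.8)],
    95: [(33.9, 18.0, 14.6), (67.5, 64.2, 48.5), (53.9, 26.4, 22.6), (27.1, 25.4, 24.9), (18.1, 13.6, 11.2)],
    99: [(60.6, 47.6, 36.0), (90.1, 106.2, 77.5), (82.9, 71.2, 53.2), (30.9, 31.8, 31.1), (23.4, 16.2, 13.8)],
}

def averages(rows):
    """Column-wise mean over the five modules."""
    return tuple(sum(col) / len(col) for col in zip(*rows))

for target, rows in TABLE4.items():
    pd_avg, ptd_avg, pts_avg = averages(rows)
    print(f"{target}%: PD={pd_avg:.1f} PT-D={ptd_avg:.1f} PT-S={pts_avg:.1f} "
          f"saving vs PD: PT-D {100 * (pd_avg - ptd_avg) / pd_avg:.0f}%, "
          f"PT-S {100 * (pd_avg - pts_avg) / pd_avg:.0f}%")
```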
It is to be noted that the area overhead numbers for partial duplication (PD in the table) are very low compared to the results presented in the original paper [5]. This difference can be attributed to the differences in the selection methodology followed. In this paper, we performed fault simulations using traces from real programs instead of using random vectors at the inputs of combinational logic. We also considered only the faults which escape to the primary outputs of the execution unit instead of considering all the faults which propagate to the outputs of the combinational logic.

Table 4: % Area overhead for different target fault escape reductions for partial duplication (PD), proposed technique as a dual-rail checker (PT-D) and proposed technique as a self-exercising checker (PT-S)

          50% reduction        75% reduction        90% reduction        95% reduction        99% reduction
Module    PD     PT-D   PT-S   PD     PT-D   PT-S   PD     PT-D   PT-S   PD     PT-D   PT-S   PD     PT-D   PT-S
module1   0.9    0.6    0.7    1.6    0.8    0.9    8.7    4.4    4.4    33.9   18     14.6   60.6   47.6   36
module2   0.9    0.6    0.7    8.3    8.4    7.8    51.2   38.8   31.4   67.5   64.2   48.5   90.1   106.2  77.5
module3   4.8    1.8    1.9    8.6    4.2    4.1    21.6   15.2   13     53.9   26.4   22.6   82.9   71.2   53.2
module4   12.3   9.2    10.1   15.9   14.8   15.7   25.5   20.8   20.7   27.1   25.4   24.9   30.9   31.8   31.1
module5   1.9    2.0    1.9    7.4    5.6    4.8    14.4   10.8   8.8    18.1   13.6   11.2   23.4   16.2   13.8

Figure 4: Average area overhead over different fault escape reductions

6 Conclusions
A new algorithm for detecting transient faults in the control logic of a processor with a low overhead has been presented. An assertion checker is automatically constructed using the architectural traces of real programs. The checker checks the outputs of a combinational circuit against a subset of the truth table. The algorithm takes advantage of the following properties of the control logic of a processor to yield a low-overhead checker: asymmetry in the paths which are exercised at run-time and asymmetry in the propagation of bit flips in individual flops to the architectural state of the processor. Fault simulation experiments were run on five different random control logic modules in an industrial processor. Results show that more than 95% of all the faults which propagate to architectural states can be detected with an average area overhead of just around 25%. This is more than 40% lower than previously proposed work for the same amount of fault detection.
Acknowledgements
We would like to acknowledge the contribution of Suriyaprakash Natarajan of Intel for early pattern-distribution results on a couple of Intel test cases demonstrating significant input vector bias.
References
[1] D. Das and N. A. Touba. Synthesis of circuits with low-cost concurrent error detection based on Bose-Lin codes. J. Electron. Test., 15(1-2):145–155, 1999.
[2] P. Drineas and Y. Makris. Non-intrusive design of concurrently self-testable FSMs. In ATS '02: Proceedings of the 11th Asian Test Symposium, pages 33–38, 2002.
[3] N. K. Jha and S.-J. Wang. Design and synthesis of self-checking VLSI circuits and systems. In ICCD '91: Proceedings of the 1991 IEEE International Conference on Computer Design on VLSI in Computer & Processors, pages 578–581, 1991.
[4] P. Liden, P. Dahlgren, R. Johansson, and J. Karlsson. On latching probability of particle induced transients in combinational networks. In 24th Int. Symposium on Fault-Tolerant Computing, pages 340–349, 1994.
[5] K. Mohanram and N. Touba. Cost-effective approach for reducing soft error failure rate in logic circuits. In Proceedings of the International Test Conference, pages 893–901, 2003.
[6] H. T. Nguyen and Y. Yagil. A systematic approach to SER estimation and solutions. In Proceedings of IEEE International Reliability Physics Symposium, pages 60–70, 2003.
[7] M. Nicolaidis. Self-exercising checkers for unified built-in self-test (UBIST). IEEE Transactions on CAD, 8(3):203–218, 1989.
[8] M. Nicolaidis, R. O. Duarte, S. Manich, and J. Figueras. Fault-secure parity prediction arithmetic operators. IEEE Des. Test, 14(2):60–71, 1997.
[9] P. Shivakumar et al. Modeling the effect of technology trends on the soft error rate of combinational logic. In DSN '02: Proceedings of the 2002 International Conference on Dependable Systems and Networks, pages 389–398, 2002.
[10] S. S. Mukherjee et al. A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor. In MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, page 29, 2003.
[11] F. F. Sellers, M.-Y. Hsiao, and L. W. Bearnson, editors. Error Detection Logic for Digital Computers. McGraw-Hill Book Company, 1968.
[12] N. A. Touba and E. J. McCluskey. Logic synthesis of multilevel circuits with concurrent error detection. IEEE Transactions on CAD, 16(7):783–789, 1997.
    Full-text · Article · Oct 2011 · IEEE Transactions on Computers
  • Source
    • "We suggest utilizing a Concurrent Error Detection (CED) mechanism, which is normally used to cope with faults (permanent or transient) in the circuitry to detect active Trojans. In general, there are three approaches to the design of CED schemes: the first is based on analyzing all the possible error events in a given circuit (e.g. [4]), the second is based on analyzing the possible combinations that may appear on the output lines of each logic block, and the third approach is based on analyzing the functionality of the whole system. Clearly, the first two methods are more suitable for trapping errors that are caused by faults, since a) they heavily relay on the actual implementation of the combinatorial circuit and b) they check the correctness of the current output. "
    [Show abstract] [Hide abstract] ABSTRACT: A Trojan horse is a malicious altering of hardware specification or implementation in such a way that its functionality is altered under a set of conditions defined by the attacker. The paper presents a technique for designing secure systems that can detect an active Trojan. The technique is based on utilizing specific information about the system's behavior, which is known to the designer of the system and/or is hidden in the functional specification of the system. A case study of the proposed technique conducted on an arithmetic unit of a microprocessor is provided. The study indicated a high level of Trojan detection with a small hardware overhead.
    Full-text · Article · Jul 2011
Show more