Built-in-Self-Test of FPGAs With Provable Diagnosabilities and High Diagnostic Coverage With Application to Online Testing.

Authors:
  • Designtech Systems Ltd

Built-in-Self-Test of FPGAs with Provable
Diagnosabilities and High Diagnostic Coverage
with Application to On-Line Testing
Shantanu Dutt, Member, IEEE, Vinay Verma and Vishal Suthar
Abstract— We present novel and efficient methods for built-
in-self-test (BIST) of FPGAs for detection and diagnosis of
permanent faults in current as well as emerging technologies
that are expected to have high fault densities. Our basic BIST
methods can be used in both on-line as well as off-line testing
scenarios, though we focus on the former in this paper. We
present 1- and 2-diagnosable BISTer designs that make up a
ROving TEster (ROTE). Due to their provable diagnosabilities,
these BISTers can avoid time-intensive adaptive diagnosis without
significantly compromising diagnostic coverage—the percentage
of faults correctly diagnosed. We also develop functional testing
methods that test programmable logic blocks (PLBs) in only two
circuit functions that will be mapped to them as the ROTE moves
across a functioning FPGA. We extend our basic BISTer designs
to those with test-pattern generators (TPGs) using multiple PLBs
to more efficiently test the complex PLBs of current commercial
FPGAs, and prove the diagnosabilities of these designs as well.
Simulation results show that our 1-diagnosable functional-test
based BISTer with a 3-PLB TPG has very high diagnostic
coverages—for example, for a random fault distribution, our non-
adaptive diagnosis methods provide diagnostic coverages of 96%
and 88% at fault densities of 10% and 25%, respectively, while
the previous best non-adaptive diagnosis method of the
STAR-$3 \times 2$ BISTer has diagnostic coverages of about 75% and 55%
at these fault densities.
Index Terms: built-in self-test (BIST), $k$-diagnosability, diagnostic
coverage, FPGAs, functional testing, on-line testing, roving
tester.
I. INTRODUCTION
An FPGA consists of an array of programmable logic blocks
(PLBs) interconnected by a programmable routing network
and programmable I/O PLBs. Current technology trends for
FPGA devices are in the very deep-submicron (VDSM) regime
with recent chips using 90 nanometers and seven metal layers.
Unfortunately, this trend has resulted in a decrease in fabrication
yield, and can potentially lead to decreased reliability of
operation. The larger die sizes also mean that there is more
likelihood of failure of some component. Thus testing and
fault tolerance techniques for FPGAs are important to increase
device fabrication yield, and the reliability of FPGA operation
in platforms ranging from mission/life-critical systems to
commercial products. For current very deep submicron CMOS
technology, and to a greater extent for emerging molecular
nano-technology and nanoscale CMOS technology, it is very
likely that transient faults will disturb system operation
and that permanent faults will develop during the FPGA’s
operational lifetime. The rate of occurrence of permanent
faults can be quite high in emerging technologies, and hence
there is a need for periodic testing of such FPGAs.
S. Dutt is with the University of Illinois at Chicago (UIC), and V. Verma
and V. Suthar are with Xilinx Inc. V. Suthar was at UIC when this paper was
submitted.
This work was funded by a grant from Xilinx Corp., DARPA Contract
# F33615-98-C-1318, and the National Science Foundation under Grant No.
0204097.
Copyright © 2007 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending an email to pubs-permissions@ieee.org.
The BIST methods presented in this paper are mainly
targeted at detection and diagnosis of permanent faults that
model fabrication defects as well as other physical defects
arising during the lifetime of the FPGA chip. We will use
the term fault density to mean the percentage of PLBs that
have a permanent fault. Finally, we define diagnostic coverage
of a testing technique as the percentage of faults it correctly
diagnoses, and its fault latency as the average time it takes
from the start of the testing phase for the FPGA to diagnose
a fault.
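The two percentage measures defined above can be made concrete with a minimal sketch. The function names and the PLB coordinates are illustrative assumptions, and counting only correctly identified faulty PLBs (ignoring fault-free PLBs that are wrongly flagged) is one possible reading of "correctly diagnoses".

```python
def fault_density(num_faulty_plbs, num_plbs):
    """Percentage of PLBs that have a permanent fault."""
    return 100.0 * num_faulty_plbs / num_plbs


def diagnostic_coverage(actual_faulty, diagnosed_faulty):
    """Percentage of the actual faults that were correctly diagnosed.

    Both arguments are collections of PLB identifiers. A fault-free PLB
    that is wrongly reported as faulty does not add to the coverage
    (an assumption of this sketch).
    """
    actual = set(actual_faulty)
    if not actual:
        return 100.0  # nothing to diagnose
    correctly_diagnosed = actual & set(diagnosed_faulty)
    return 100.0 * len(correctly_diagnosed) / len(actual)


# Hypothetical example: 4 faulty PLBs, 2 of which are found.
actual = {(0, 1), (2, 3), (4, 4), (1, 1)}
diagnosed = {(0, 1), (2, 3), (3, 3)}
coverage = diagnostic_coverage(actual, diagnosed)
```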
Off-line testing methods for FPGAs are reasonably ma-
ture and well developed [2], [10], [11]. Off-line testing is
acceptable in application environments where there are few
real-time constraints on the application circuit mapped to
the FPGA, since such testing requires the circuit to stop
functioning while it is tested. However, in systems with such
constraints like those in space, avionics and many commercial
products, it is desirable and sometimes necessary to perform
testing on-line, i.e., testing with the application circuit mapped
to and executing on the FPGA and with minimal disruption to
its functioning. As mentioned earlier, the rate of occurrence of
permanent faults can be quite high in emerging technologies.
Hence there is a need for on-line testing to frequently monitor
and check such FPGAs; this would be especially crucial in
remote, mission-critical and other critical applications. The
periodicity of on-line testing can be adjusted to the suscep-
tibility (fault probability) of the FPGA technology. While the
enveloping techniques we present here are for on-line testing,
the basic BIST methods we develop can be readily used for
off-line testing as well; in fact, the higher diagnosability of
our BISTers should also improve diagnostic coverage in the
off-line testing scenario compared to previous work. In the
off-line testing mode, instead of having a roving tester (see
Sec. II), multiple copies of our BISTers would be configured
into the entire FPGA to perform testing in parallel, thereby
yielding smaller test times (see footnote 1).
We differentiate between exhaustive and functional testing
of PLBs. In the former, a PLB is tested in all its modes of
operation, i.e., for all combinations of values in its lookup
tables (LUTs), all possible settings of its flip-flops (FFs), etc.
This will be required when it is not known which circuit(s)
will be mapped to the FPGA, as, e.g., during factory-testing
of FPGAs that could be sold to any user. In functional
testing, which can only be done when it is precisely known
which (small) set of circuits will be mapped to the FPGA,
a PLB is tested with it configured in only those functional
modes in which it will be used at different times across these
circuits. For simplicity of exposition, we assume though that
for functional testing only one circuit will be mapped to the
FPGA. Functional testing can be done in the field when the
user-circuit is generally known. It can also be used for factory
testing when it is known which application will be mapped to
the FPGA as, e.g., for FPGAs to be used in printers from a
certain vendor. Functional testing at the factory allows an FPGA
whose faults do not affect the correct execution of the
application circuit to be mapped to it to be correctly diagnosed
as ‘OK’, thus increasing chip yield and reducing costs. In this
paper, we present on-line methods for both exhaustive and
functional testing of FPGAs.
Before proceeding further, we give a useful definition. A
testing technique is said to be $k$-diagnosable if, in the presence
of at most $k$ faulty components, it can correctly identify all
of the faulty components among the components that it
tests. Such a testing technique is equivalently said to have a
diagnosability of $k$.
There have only been a few methods proposed for on-line
FPGA testing, including [1] (see footnote 2), [3], [4], [16], [18]. In [16],
Shnidman et al. proposed a fault-scanning technique for testing
a portion of an FPGA; for the technique to work, a bus-based
non-segmented interconnect architecture is assumed that is
not available in mainstream commercial FPGAs. The method
of [4] is among the first that uses a roving tester called
STAR to test a portion of the chip exhaustively (irrespective
of the mapped circuit) while the rest of the chip executes
the application circuit. Their tester, however, has no provable
diagnosability and in fact as we show later in Sec. III, the
basic built-in self tester (BISTer) in [4] is 0-diagnosable, i.e.,
it cannot identify the single faulty PLB. Furthermore, since
it uses an adaptive diagnosis method to get around the 0-
diagnosability problem, and also performs exhaustive testing
of PLBs (even when the circuit is known), it has high test
times and thus high fault latencies. In [1], [3], a BISTer
was proposed in combination with the STAR tester of [4] that
provides 1-diagnosability among 6 PLBs (i.e., if there is at
most one fault among 6 PLBs, the faulty PLB will be correctly
identified); it also diagnoses some but not all patterns of two
PLB faults. This BISTer, which we call the STAR-$3 \times 2$ BISTer,
will be discussed further in Sec. III. BISTers for interconnects
are presented in [18], [19].
1 To improve diagnosability, two such FPGA-wide multiple-BISTer
configurations can be used, with the second one’s configuration shifted two columns to
the right of the first one (with possible wrap-around of the rightmost BISTers
if they are more than 2 columns wide).
2 After the submission of the first draft of this paper, and during its review
cycle, [1] was published, and this was pointed out to us by one of the
reviewers. Paper [1] is an extension of [3] with the main addition being the
Multicello fault-diagnosis algorithm for an extended version of their BISTer
with a $4 \times 2$ tile; both are discussed in Sec. III-C.
Fig. 1. (a) Roving concept. (b) BISTer tiles in ROTE area. (c) Four
reconfiguration paths (shown by dark arrows) for four faulty PLBs and net
re-routings (shown by dashed lines) of nets connected to PLBs on these paths.
The new PLB labels in the reconfigured FPGA are shown in italics below
each PLB (previous labels are in roman); the PLB carrying a given label in
italics functionally replaces the PLB that previously carried that label in roman.
We present here a roving tester (ROTE) for on-line testing
of FPGAs, which tests parts of the circuit in a piecemeal
manner by duplication and comparison using BISTers with
provable diagnosabilities, thus avoiding expensive adaptive
diagnosis. The main contributions of this work, which is a
significant extension of [20], are novel 1- and 2-diagnosable
BISTer designs, functional testing methods coupled with the 1-
diagnosable BISTer, extensions of these designs for application
to many current FPGAs, and diagnosability results for all
these designs. Our testing methods are more accurate (higher
diagnosability and higher diagnostic coverage; see footnote 3) and faster
than those developed in previous work. In this paper we
address PLB faults; interconnect fault test-and-diagnosis has
been addressed in [19].
The rest of the paper is organized as follows. Section II
gives a bird’s-eye view of the on-line testing and fault tol-
erance environment that we propose; only on-line testing is
discussed in the rest of the paper. In Sec. III we discuss
previous BISTer designs for on-line testing: we establish that
a previous BISTer design [4] is 0-diagnosable, and also present
two 1-diagnosable BISTers [1], [3], one with a $3 \times 2$ tile
and another with a $4 \times 2$ tile. Section IV presents two new
BISTer architectures with provable diagnosabilities of one and
two, while Sec. V develops a technique for faster testing and
diagnosis that tests PLBs in only two functional modes that
they can be used in, in an operational FPGA with the roving
tester. In Sec. VI, we extend our conceptual BISTer designs to
perform testing in current FPGAs that have PLBs with large
functionalities in terms of the number of inputs. Section VII
presents our simulation results, and we conclude in Sec. VIII.
3 Note that diagnosability is a measure of the diagnosis capability of a
BISTer within a single BIST area, while diagnostic coverage is an average-
case measure of diagnosis capability of a testing technique across a system
(FPGA chip in our case) wherein it employs multiple BISTers to cover the
entire system. The two measures are clearly correlated.
II. THE BIG PICTURE – ROVING TESTER AND FAULT
RECONFIGURATION
We present here the overview of the process in which the
FPGA is repeatedly tested in the field in an on-line manner for
faults by a roving tester, and on whose detection the applica-
tion circuit mapped to the FPGA is dynamically reconfigured;
if at any point in this process a fault is non-reconfigurable, then
the system reports an “irrecoverable failure”. This scenario
mainly applies to on-line field testing. An earlier work [4]
presented the concept of a roving tester; our roving tester,
however, has a different structure and roving mechanism. Note
that the main contribution of our paper is not the roving
mechanism or in general the roving tester, but the BISTers
that comprise the roving tester.
We consider SRAM-based FPGAs that support partial and
run-time reconfiguration such as Xilinx’s Virtex series. In such an
FPGA, two columns are left spare for
the ROTE and, say, one spare column is allocated at the right
of the FPGA for fault reconfiguration. The system function
is implemented in the remaining subarray; see
Fig. 1(a). The leftmost spare columns are occupied by a roving
tester (ROTE) that roves across the FPGA and performs test
and diagnosis. The ROTE area comprises multiple BISTers
that simultaneously test different sub-areas of the ROTE;
see Fig. 1(b). An external reconfiguration and test controller
controls the BISTer operation (configures the BISTers in the
ROTE area, starts their operations, scans out their detailed
syndromes, and performs fault diagnosis as explained in this
paper), the movement of the ROTE across the FPGA and fault
reconfiguration. The new functions of the PLBs and the new
track positions of the nets for moving the ROTE to its next
position one column to the right are computed concurrently by
the controller while the ROTE performs testing in its current
position.
Fault detection in a BISTer consists of comparing the output
response of a PLB to another identically configured PLB; a
mismatch in the output response indicates faulty PLBs in the
BISTer tile. Diagnosis includes drawing inferences from the
output vectors and locating the exact faulty PLB(s). Once the
testing in the ROTE area is over, the circuit functioning is
stopped momentarily and the new bit-streams are downloaded
to the FPGA by the controller to move the ROTE to the right
by one column. In the presence of faults, the ROTE will move
in a warped manner so that, say, each BISTer-1 tile (described
in Sec. IV-A) in it occupies an array of PLBs whose latest
diagnosis status is “fault-free”. This is shown in Fig. 9 (this
figure also illustrates Lemma 1 in Sec. V).
When fault(s) are detected and diagnosed, then using techniques
proposed in [7], [14], each faulty PLB is reconfigured
directly or indirectly using a unique spare PLB. Briefly, the
steps are:
1. Compute reconfiguration paths from each identified
faulty PLB to a spare PLB using fast network flow
algorithms presented in [14]. If one PLB follows another
in a reconfiguration path, the following PLB is configured
with the functionality of the PLB it follows, for each such pair, to achieve
reconfiguration; see Fig. 1(c).
2. Perform incremental re-routing (e.g., [6], [7]) to
extend/re-route each interconnect that originally went to a PLB
so that it now connects to that PLB's successor, for each adjacent pair in a
reconfiguration path. Figure 1(c) shows three nets $n_1$, $n_2$ and $n_3$ of
the original configuration and their re-routings required
(shown by dashed lines) for reconfiguration.
Fig. 2. (a) BISTer-0 architecture of [4], shown for session $S_1$. (b) Cycling of PLBs in BISTer-0
tile (sessions $S_1$ to $S_4$).
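The function shift along a reconfiguration path (step 1 above) can be sketched as follows. This is a minimal illustration, not the algorithm of [14]: the paths are taken as given rather than computed by network flow, they are assumed to be node-disjoint, and the dictionary layout and function names (`apply_reconfiguration_paths`, `f11`, and so on) are hypothetical.

```python
def apply_reconfiguration_paths(function_of, paths):
    """Shift PLB functions one step along each reconfiguration path.

    function_of: dict mapping a PLB coordinate to the function it currently
                 implements (None for a spare PLB).
    paths:       list of paths; each path is a list of PLB coordinates that
                 starts at a faulty PLB and ends at a spare PLB. Paths are
                 assumed not to share PLBs.
    Returns a new mapping; the input mapping is left unchanged.
    """
    new_function_of = dict(function_of)
    for path in paths:
        # Each PLB takes over the functionality of the PLB it follows.
        for predecessor, follower in zip(path, path[1:]):
            new_function_of[follower] = function_of[predecessor]
        # The faulty PLB at the head of the path is no longer used.
        new_function_of[path[0]] = None
    return new_function_of


# Hypothetical row of three circuit PLBs followed by one spare PLB.
before = {(1, 1): "f11", (2, 1): "f21", (3, 1): "f31", (4, 1): None}
after = apply_reconfiguration_paths(before, [[(1, 1), (2, 1), (3, 1), (4, 1)]])
```

Reading every source function from the original mapping keeps the result independent of the order in which the pairs of a path are visited.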
III. PREVIOUS BISTER DESIGNS
In this section, we first present the BISTer design of [4]
denoted here by BISTer-0, and prove that it is 0-diagnosable.
While BISTer-0 has been superseded by the 1-diagnosable
BISTer of [1], [3], from the same primary designers,
BISTer-0 provides a starting point for developing our
own 1-diagnosable BISTer, which has better diagnostic coverage
than the BISTer of [1], [3], as we show in Sec. VII. We
then also present the design of the $3 \times 2$ BISTer of [1], [3]
(henceforth called the STAR-$3 \times 2$ BISTer), and its $4 \times 2$-tile
version along with its diagnosis algorithm Multicello.
A. A 0-diagnosable BISTer
The presentation of BISTer-0 sets the stage for developing
new BISTer designs that have provable diagnosability in the
next section. Throughout the paper we assume that intercon-
nects and wires are fault free; testing and diagnosis of faulty
interconnects has been addressed in [18], [19].
BISTer-0 shown in Fig. 2 comprises one test pattern
generator (TPG), one output response analyzer (ORA) and two
PLB cells under test (CUTs) that are exhaustively tested. The
TPG applies test patterns to two identically configured CUTs
whose outputs are compared by the ORA. The ORA latches
and reports mismatches as test failures. The testing of the two
CUTs by the TPG and ORA in one BISTer configuration is
called a session. BISTer-0 has four sessions, and successive
sessions are obtained by one-PLB rotations of the BISTer
functions (two CUTs, TPG, ORA), shown by a dotted arc
in Fig. 2a. Figure 2a also shows the configuration for session
$S_1$, and Fig. 2b shows all the sessions of BISTer-0. In session
$S_1$, PLB A becomes a CUT, PLB B becomes a TPG, PLB C
becomes a CUT and PLB D becomes an ORA. When the
BISTer completes a cycle of four sessions, each PLB has
been configured twice as a CUT. BISTer-0 can detect multiple
faults with high probability [4] but, as we show later, it
cannot diagnose, i.e., locate, any of them—it is 0-diagnosable.
Thus adaptive diagnosis schemes, which are generally time
consuming, are used in [4] even for a single PLB fault.
Fig. 3. (a) Six sessions of the STAR-$3 \times 2$ BISTer [1], [3] (T = TPG, C =
CUT, O = ORA). (b) One of its two “combined test sessions” represented
by an ORA test graph with CUT nodes B1, B2 and B3 and ORA edges O12, O13 and O23.
We define the detailed syndrome for a session as the 0/1 bit
pattern observed at the ORA output over all test vectors of the
TPG; a 0 indicates a match and a 1 indicates a mismatch.
$S_i$ represents the $i$'th session as well as the detailed syndrome for the
$i$'th session, the use of which will be clear from the context.
The gross syndrome of a session is the overall pass/fail observation
over all modes of tested operations for that session.
In other words, the gross syndrome of a session is an “X” (fail)
if the ORA output is “1” for any input test vector, and is a
pass otherwise.
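The relation between the two syndromes can be written in a few lines. This is a minimal sketch; representing the detailed syndrome as a Python list of 0/1 bits and the gross syndrome as the strings "pass" and "fail" are choices made here for illustration.

```python
def gross_syndrome(detailed_syndrome):
    """Collapse a detailed syndrome into a gross syndrome.

    detailed_syndrome: sequence of ORA output bits, one per TPG test vector
                       (0 = the two CUT outputs matched, 1 = mismatch).
    Returns "fail" if any test vector produced a mismatch, else "pass".
    """
    return "fail" if any(bit == 1 for bit in detailed_syndrome) else "pass"


one_mismatch = gross_syndrome([0, 0, 1, 0])
no_mismatch = gross_syndrome([0, 0, 0, 0])
```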
Theorem 1: BISTer-0 is zero-diagnosable.
Proof: There are four sessions for BISTer-0; see Fig. 2b. In
BISTer-0 the same pair of PLBs is configured as CUTs in
two different sessions. When either PLB fails, the gross and
detailed syndromes will be identical in both sessions in which
they are configured as CUTs, thus making it impossible to
locate the fault in either of them. For example, if PLB A fails
as a CUT only (i.e., its fault is detected when it is tested in
all its modes when it is a CUT, but this fault is not exercised
when it is configured as a TPG or an ORA), then the gross
syndromes of sessions $S_1$ and $S_3$ will be fail, while those of $S_2$ and $S_4$
will be pass. The same syndrome will be obtained when C is
faulty as a CUT only; see Fig. 2b. Also, A and C failing in the
same mode that is not exercised by the TPG or ORA will
produce the same detailed syndromes for all sessions. Thus in
such cases, we cannot determine if A or C is faulty. Similarly,
we cannot distinguish between faulty PLBs B and D. Hence
BISTer-0 is zero-diagnosable.
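The ambiguity used in the proof can be reproduced with a small simulation. This is a sketch under stated assumptions: the roles in the first session follow the text (A and C are CUTs, B is the TPG, D is the ORA), the direction of the one-PLB rotation is assumed, and the fault model is the "fails as a CUT only" case from the proof, so a session fails exactly when the faulty PLB is one of its two CUTs.

```python
PLBS = "ABCD"
# Roles of A, B, C, D in the first session, as stated in the text.
FIRST_SESSION_ROLES = ["CUT", "TPG", "CUT", "ORA"]


def roles_in_session(session):
    """Role of each PLB in a session (0..3); rotation direction is assumed."""
    return {PLBS[(i + session) % 4]: FIRST_SESSION_ROLES[i] for i in range(4)}


def bister0_gross_syndromes(faulty_plb):
    """Gross syndrome of all four sessions for one PLB that fails only as a CUT."""
    result = []
    for session in range(4):
        roles = roles_in_session(session)
        cuts = [plb for plb in PLBS if roles[plb] == "CUT"]
        # The ORA sees a mismatch only when the faulty PLB is being compared.
        result.append("fail" if faulty_plb in cuts else "pass")
    return tuple(result)


syndromes = {plb: bister0_gross_syndromes(plb) for plb in PLBS}
# PLBs that cannot be told apart share the same syndrome tuple.
ambiguous_groups = {}
for plb, syndrome in syndromes.items():
    ambiguous_groups.setdefault(syndrome, []).append(plb)
```

Under these assumptions the four possible single faults collapse into two syndrome classes, {A, C} and {B, D}, which is the indistinguishability the proof relies on.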
B. The 1-diagnosable STAR-$3 \times 2$ BISTer
We now briefly present the 1-diagnosable STAR-$3 \times 2$
BISTer of [1], [3], to which we directly compare in Sec. VII
a version of our 1-diagnosable BISTer with a 3-PLB TPG,
BISTer-1 (see Sec. VI-B), w.r.t. diagnostic coverage and
fault latency metrics.
As shown in Fig. 3, the STAR-$3 \times 2$ BISTer has a $3 \times 2$
arrangement and has a 3-PLB TPG in order to simultaneously
test all lookup tables (LUTs) in a PLB with input-output pin
ratio of 3:1 (in Sec. VI we further discuss this and alternative
strategies of tackling input-output pin ratios of greater than
1). The STAR-$3 \times 2$ BISTer has six sessions, and each session
is obtained by a rotation of BISTer functions of the previous
session. The goal of the six sessions is to test each PLB twice
and compare it to a different PLB each time it is tested. The
rotating strategy leads to this BISTer being 1-diagnosable (out
of 6 PLBs). The diagnosis technique (derived from Theorem
1 of [3]) is as follows: when, across all six sessions, the gross
syndromes contain exactly one pair of failing ORAs with a common CUT,
and the only other ORA that may also fail is the one implemented
by that common CUT when it is configured as an ORA,
then the common CUT is uniquely diagnosed as
being faulty (assuming there are no other faulty PLBs). E.g.,
if the ORAs in the first and third sessions of Fig. 3(a) report
failures, and among the other ORAs in the remaining four
sessions, at most the ORA in the second session reports a
failure, then the top CUT in the first session is diagnosed as
faulty. This is the non-adaptive diagnosis technique of [3] that
we use in our simulation of the STAR-$3 \times 2$ BISTer. As claimed
in [1], the STAR-$3 \times 2$ BISTer can also detect (though not
diagnose) 2-PLB fault patterns under the simplifying assumptions
that a TPG with a faulty PLB does not skip test vectors that
detect fault(s) in a faulty CUT, or that the two faulty PLBs do not
have the same responses to all test vectors (common-mode
failures); see footnote 4.
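The single-fault rule described above can be sketched in code. The session layout below is a hypothetical one, chosen only so that it matches the properties stated in the text (six sessions obtained by rotation, each PLB a CUT twice against a different partner, and a PLB that is a CUT in the first and third sessions acting as the ORA in the second); it is not claimed to be the exact layout of Fig. 3(a). PLBs are numbered 0 to 5 and sessions 0 to 5.

```python
# Hypothetical role sequence of one PLB over six successive sessions.
ROLE_PATTERN = ["C", "O", "C", "T", "T", "T"]


def build_sessions():
    """Return, per session, the ORA PLB and the pair of CUT PLBs."""
    sessions = []
    for s in range(6):
        roles = {p: ROLE_PATTERN[(s - p) % 6] for p in range(6)}
        ora = next(p for p in range(6) if roles[p] == "O")
        cuts = tuple(sorted(p for p in range(6) if roles[p] == "C"))
        sessions.append((ora, cuts))
    return sessions


def diagnose_single_fault(failed_sessions, sessions):
    """Apply the common-CUT rule; return the faulty PLB or None.

    A PLB is diagnosed when both sessions in which it is a CUT fail and
    the only other session allowed to fail is the one where it is the ORA.
    """
    failed = set(failed_sessions)
    candidates = []
    for plb in range(6):
        as_cut = {s for s, (_, cuts) in enumerate(sessions) if plb in cuts}
        as_ora = {s for s, (ora, _) in enumerate(sessions) if ora == plb}
        if as_cut <= failed and failed <= as_cut | as_ora:
            candidates.append(plb)
    return candidates[0] if len(candidates) == 1 else None


SESSIONS = build_sessions()
# First and third sessions fail (indices 0 and 2): their common CUT is PLB 0.
diagnosed = diagnose_single_fault({0, 2}, SESSIONS)
```

Any gross-syndrome pattern outside this shape, such as three failing sessions that do not share one CUT, returns `None` here, which corresponds to the rule making no non-adaptive diagnosis.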
The more involved diagnosis scheme Multicello presented
in [1] corresponds to their $4 \times 2$ BISTer, and is extended from
its off-line version [2]. We discuss both their $4 \times 2$ BISTer
and Multicello below.
C. The 1-diagnosable STAR-$4 \times 2$ BISTer
Figure 4(a) shows the STAR-$4 \times 2$ BISTer, which has a
$4 \times 2$ tile and is similar in structure to the STAR-$3 \times 2$
BISTer except for two consecutive “spare” PLBs (labeled ‘S’)
embedded in the tile of the former. Since there are 8 PLBs
in this BISTer, the functional rotations, including that of the
“spare functionalities”, yield 8 sessions. The 8 sessions can be
partitioned into two “combined test sessions”, each of which
consists of four non-consecutive sessions. In each combined
test session, disjoint sets of PLBs are CUTs, and in each, the
comparison results of each CUT are reported by two different
ORAs. The ORA test graphs for the two combined test sessions
representing the aforementioned relationship between CUTs
and ORAs are shown in Figs. 4(b) and (d); in an ORA test
graph, $O_{ij}$ is used to denote the ORA that reports the comparison
results of CUTs $C_i$ and $C_j$.
The fault diagnosis scheme Multicello [1] for the $4 \times 2$
BISTer attempts to diagnose multiple faults as follows (refer
to the diagnosis table of Fig. 4(c) for examples of the given
steps):
1. Record ORA results (0 for a pass gross syndrome, 1 for
a fail) and initialize the failure state of every CUT in each
phase as unknown (empty); a phase is a specific functional
configuration of the CUT PLBs.
2. In each column, for every two consecutive ORAs with a 0
mark, enter a 0 for the CUT between them.
3. In each column, for every two adjacent 0 marks followed by
an empty cell, enter a 0 in the empty cell.
4. In each column, for every adjacent 0 and 1 marks followed
by an empty cell, enter a 1 in the empty cell.
5. Consistency checks: If there is an ORA reporting a failure
in a phase (marked with a 1), while neither of the two CUTs
observed by the ORA fails in that phase (both are marked with
a 0), then there is a potential inconsistency. If the failing ORA
is reported as faulty in the other combined test session where it
is a CUT, then go to Step 6. Otherwise, divide the suspect PLBs
into subsets, retest and reapply the procedure to each subset. If
no further division is possible, then report inconsistency and
exit.
6. If every PLB has been identified as fault-free or faulty, the
group of faulty PLBs has been uniquely diagnosed. Otherwise,
divide the suspect PLBs into subsets, retest and reapply the
procedure to each subset.
4 In Theorem 4 we also make some simplifying assumptions for proving the
2-diagnosability of BISTer-2 (Sec. IV-B), which is a much higher capability
than 2-detectability. However, our assumptions allow for the faulty PLBs to
malfunction when they are TPGs (e.g., skip fault-detecting test vectors for
the other faulty PLB when it is a CUT) or ORAs, and do not include the
requirement of no common-mode failures in the two faulty PLBs. Specifically,
our assumptions are that both faulty PLBs either malfunction as TPGs and
ORAs or correctly perform as TPGs and ORAs.
Fig. 4. (a) One session of the STAR-$4 \times 2$ BISTer [1], [3] (T = TPG, C = CUT, O = ORA, S = Spare). (b) Its first combined test session represented by its
ORA test graph (CUTs $C_1$ to $C_4$, ORAs $O_{12}$, $O_{23}$, $O_{34}$, $O_{14}$); the two faulty PLBs are marked and have outputs s-a-0 or unequal-input LUT bits s-a-0. (c) Diagnosis using the non-adaptive phase (Steps 1-4) of Multicello for the first combined test
session for the two faults shown; Step 4 is not needed as all locations in the table are filled in the first three steps, and no CUTs are diagnosed as faulty.
(d) The ORA test graph for the second combined test session (CUTs $C_5$ to $C_8$, ORAs $O_{56}$, $O_{67}$, $O_{78}$, $O_{58}$). (e) Its diagnosis for the two faults using non-adaptive Multicello; again, Step 4 is not needed,
and no faults are diagnosed.
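The non-adaptive part of the procedure (Steps 1-4) and the trigger condition of Step 5 can be sketched for one phase of one combined test session. This is an illustrative reading of the steps, not the implementation of [1]: the column is modeled as a ring in which ORA `k` sits between CUT `k` and CUT `k+1` (so for four CUTs the ORAs are O12, O23, O34, O14 in order), "followed by" is applied in both directions around the ring, and Step 3 is repeated until nothing changes. All function names are hypothetical.

```python
def _across(k, n):
    """(ORA index, CUT on the far side of that ORA) for both ring directions."""
    return ((k, (k + 1) % n), ((k - 1) % n, (k - 1) % n))


def multicello_nonadaptive(ora_results):
    """Fill CUT marks from ORA marks: 0 = fault-free, 1 = faulty, None = unknown."""
    n = len(ora_results)
    cut = [None] * n                                  # Step 1: all CUTs unknown
    for k in range(n):                                # Step 2
        if ora_results[k - 1] == 0 and ora_results[k] == 0:
            cut[k] = 0
    changed = True
    while changed:                                    # Step 3, to a fixed point
        changed = False
        for k in range(n):
            if cut[k] == 0:
                for ora, other in _across(k, n):
                    if ora_results[ora] == 0 and cut[other] is None:
                        cut[other] = 0
                        changed = True
    for k in range(n):                                # Step 4
        if cut[k] == 0:
            for ora, other in _across(k, n):
                if ora_results[ora] == 1 and cut[other] is None:
                    cut[other] = 1
    return cut


def inconsistent_oras(ora_results, cut):
    """Step 5 trigger: failing ORAs whose two CUTs are both marked fault-free."""
    n = len(ora_results)
    return [k for k in range(n)
            if ora_results[k] == 1 and cut[k] == 0 and cut[(k + 1) % n] == 0]


# Illustrative inputs in ring order O12, O23, O34, O14.
all_zero_case = multicello_nonadaptive([0, 1, 0, 0])   # every CUT marked 0
one_faulty_case = multicello_nonadaptive([1, 1, 0, 0])  # common CUT marked 1
stuck_case = multicello_nonadaptive([1, 0, 1, 0])       # nothing can be filled
```

The first input leaves a failing ORA with two CUTs marked fault-free, so `inconsistent_oras` reports it and an adaptive phase would be needed; the third input has no two consecutive passing ORAs, so the procedure cannot get past Step 2.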
For more than a single faulty PLB, the two assumptions
made by Multicello and the theoretical diagnosis results it
relies on are [1], [3]:
(1) Assumption A1: A TPG with faulty PLBs does not skip
the patterns that detect faults in a CUT.
(2) Assumption A2: No more than two faulty CUTs have
identical responses in the same failing phase.
However, the theoretical diagnosis results (Lemmas 2-8 and
Theorem 2 of [1]) for various gross syndrome patterns used
by Multicello do not take into consideration the possibility
of ORA faults. These results thus do not hold when spe-
cific ORA faults occur (within the constraints of the above
two assumptions). Multicello accounts for situations like this
(called “inconsistent syndromes”—those that do not conform
to the syndrome patterns specified in their theoretical results)
by using adaptive diagnosis; see Steps 5 and 6 of Multicello.
Thus, while the Multicello procedure can diagnose some
patterns of multiple faults correctly in one pass, there is no
theoretical guarantee that it can diagnose all such patterns
(including all 2-fault patterns) without resorting to adaptive
diagnosis. Another issue in Multicello is that the definition
of which PLBs can be considered as “suspect” (see Steps 5
and 6 of Multicello) in different fault syndrome classes has
not been specified except for one case (see footnote 5). Below, we provide
two examples of 2-fault patterns that cannot be diagnosed by
Multicello without using adaptive diagnosis; one of these fault
patterns is, in fact, mis-diagnosed by the non-adaptive phase
(Steps 1-4).
5 The only exception is Lemma 7 of [1], which states that, for the $4 \times 2$ tile,
if only two ORAs without a common CUT fail a given phase, then at least one
pair of CUTs between the two ORAs are faulty and have identical responses
in that phase. This lemma implicitly identifies the set of suspect PLBs (though
that term is not used there) as the set of 4 PLBs checked by the two ORAs
in question. Other than this case, no other suspect PLB sets are identified for
any other fault syndrome patterns.
In the first combined test session shown in Fig. 4(b),
consider two consecutive faulty PLBs, one of which becomes
an ORA in combined test session 2 of Fig. 4(d) while the other
becomes a CUT in combined test session
2, each having one of the following faults: (i) their
output(s) are stuck-at-0, or (ii) their LUT bits corresponding
to unequal inputs (from the two CUTs they would compare
as ORAs) are stuck-at-0. Either of these internal faults causes
these PLBs, when configured as ORAs, to always report a pass
(0) output, even with mismatched inputs. Assume further that
these PLBs, while malfunctioning in the same way as ORAs,
have different internal faults (e.g., one has fault (i) and the
other (ii)), so that as CUTs they do not have identical outputs
in all phases of testing. Figures 4(c) and (e) show the results
of the application of the first three steps of Multicello (Step 4
is not needed as all positions get filled before it is reached),
for relevant phases $p_i$ and $p_j$ (in which fault syndromes are
reported by the ORAs). As can be seen, no CUT is diagnosed
as faulty. However, the inconsistent syndrome check of Step
5 is triggered, and each PLB in the suspect set needs to be
configured in different BISTers to diagnose them further in an
adaptive phase. Since the suspect set for this fault syndrome
class is not defined, we can only surmise that at the very
least, the suspect set includes the PLBs that are checked by
the ORAs that report fail syndromes, as well as those ORAs
(since it is possible that the ORAs are reporting fail syndromes
because of internal faults, e.g., their outputs are stuck-at-1).
This gives us 4 PLBs as suspect, and diagnosing them further
with 4 different BISTers, whose other PLBs are not guaranteed
to be fault-free, could be involved.
Figure 5 shows another example in which the PLBs corresponding
to two of the ORAs are faulty, with identical responses of
an output of 1 when they are configured as ORAs, even
when they receive matching inputs from the two CUTs they
compare. Assume also that their faults are such that they
produce identical responses as CUTs in any phase (e.g., both
their outputs are stuck-at-1 or LUT bits corresponding to
identical/matching inputs [which, when configured as ORAs,
they would receive from fault-free CUTs] are stuck-at-1).
Fig. 5. (a) The ORA test graph for the first combined test session, with two faulty PLBs indicated by ’s. (c) Its diagnosis for the two faults using the
non-adaptive phase (Steps 1-4) of Multicello for the first combined test session— is mis-diagnosed as faulty. (d) The ORA test graph for the second
combined test session. (e) Its diagnosis for the two faults using non-adaptive Multicello—no CUT locations can be filled and thus no fault diagnosis is
possible.
Figure 5(b) shows the application of Steps 1-4 of Multicello
for the combined test session represented in Fig. 5(a). This
results in a mis-diagnosis of as faulty. For the second
combined test session represented in part (c), Multicello cannot
proceed beyond Step 2 and no diagnosis is performed. Again,
the inconsistent syndrome check of Step 5 is triggered. In this
case, at least 6 PLBs are suspect, and if the PLB diagnosed as
faulty in the first combined test session is also included, as it
should be, 7 PLBs are in the suspect set, requiring 7 different
BISTers to diagnose them.
The above two examples show that Multicello is a
1-diagnosable technique, and for multiple faults, involved
adaptive diagnosis may be required in some cases. As explained
above, the adaptive diagnosis phase of Multicello is used when
it encounters inconsistent syndromes that can occur when
there is more than one fault in the BIST area. This also
seems to be the case for the adaptive diagnosis phase for
STAR- BISTer given in [3]. This phase uses a divide-and-
conquer strategy in which the “suspected” PLBs in the original
BISTer are re-configured into different BISTers for further
test and diagnosis, and can require several reconfigurations
of the FPGA tester area. Moreover, there are no results on the
convergence of this adaptive phase.
As established in Sec. VI-B, our 3-PLB-TPG BISTer,
BISTer-1 , while also being 1-diagnosable out of 6 PLBs
(like STAR- BISTer), can provably diagnose other fault
patterns using only non-adaptive diagnosis. Furthermore, as
we empirically show in Sec. VII, BISTer-1 has higher
diagnostic coverage than the STAR- BISTer using the
provable non-adaptive 1-diagnosable technique of [1], [3].
IV. NEW BISTER ARCHITECTURES
We now present our new BISTer designs that have non-zero
diagnosis capabilities—BISTer-1 which is 1-diagnosable and
BISTer-2 which is 2-diagnosable with very high probability.
Note that non-zero diagnosability of the BISTer reduces fault
latency since time-consuming adaptive diagnosis procedures
such as those used in [4] will not be needed as long as the
number of faults is within the diagnosability of each BISTer
tile; these numbers are 1 fault out of 4 PLBs for BISTer-
1 or a 25% fault density, and 2 faults out of 6 PLBs in
BISTer-2 or a 33% fault density. In this section, we present the
Fig. 6. (a) Our BISTer-1 architecture. (b) The 4 sessions of BISTer-1.
BISTer designs for exhaustive PLB testing. In the next section
we present the basic BISTer-1 design with a modified test-
and-diagnosis envelope for functional PLB testing; a similar
approach can be used for BISTer-2. Finally, we assume here
without loss of generality that a TPG can be configured in one
PLB. In Sec. VI we provide detailed discussions on how a
one-PLB TPG can be used to test other PLBs when the PLB
input/output pin ratio is greater than one, and alternatively,
how to extend BISTer-1 to a BISTer with a 3-PLB TPG that
can also diagnose one fault besides other fault patterns.
A. A New 1-diagnosable BISTer
Figure 6 shows a modified BISTer architecture, BISTer-1.
Like BISTer-0, it consists of one TPG, two CUTs and an ORA.
However, unlike in BISTer-0, where two diagonally opposite
PLBs are configured as CUTs, in BISTer-1 two adjacent
PLBs are CUTs. This results in a PLB being a CUT in two
consecutive sessions and each pair of adjacent PLBs being CUTs in
exactly one session. In contrast, in BISTer-0 there are two pairs
of PLBs that are each CUTs in two different sessions. This
difference is key to providing 1-diagnosability in BISTer-1.
As shown in Fig. 6, in session S1, PLB A is configured as
a TPG, B as a CUT, C as a CUT and D as an ORA. Again,
successive sessions are obtained by one-PLB rotations of the
BISTer functions.
In Sec. VI-B, we extend BISTer-1 to a 3-PLB-TPG BISTer,
BISTer-1 , for simultaneously testing all lookup tables
(LUTs) of large PLBs with input/output pin ratios of up to 3:1.
There we also establish the diagnosabilities of BISTer-1 .
Note that BISTer-1 can test the LUTs of the latter type of
PLBs sequentially with its 1-PLB TPG assuming the number
TABLE I
GROSS SYNDROMES FOR BISTER-1 AND THEIR DIAGNOSIS UNDER ASSUMPTION OF AT MOST ONE FAULTY PLB (X = FAIL, - = PASS).
S.No  S1  S2  S3  S4  Inference
1     -   -   -   -   No faulty PLB
2     X   -   -   -   Fault not in PLB
3     -   X   -   -   Fault not in PLB
4     -   -   X   -   Fault not in PLB
5     -   -   -   X   Fault not in PLB
6     X   X   -   -   Faulty C (CUT)
7     -   X   X   -   Faulty D (CUT)
8     -   -   X   X   Faulty A (CUT)
9     X   -   -   X   Faulty B (CUT)
10    X   -   X   -   Fault not in PLB
11    -   X   -   X   Fault not in PLB
12    X   X   X   -   Faulty D
13    -   X   X   X   Faulty A
14    X   X   -   X   Faulty C
15    X   -   X   X   Faulty B
16    X   X   X   X   Fault not in PLB
of inputs of a LUT is no more than the number of PLB outputs
emanating from FFs; this is the case with most current FPGAs
like those from Xilinx.
Table I shows all possible gross syndromes for the four
sessions and the inferences drawn from them. If a single PLB
is faulty, it should have an “X” in the two sessions in which it
is configured as a CUT under the assumption of at most one
faulty PLB.
Theorem 2: BISTer-1 is 1-diagnosable.
Proof: The theorem is proved by construction. We classify all
the 16 outputs (rows in Table I) into six groups (cases). The
same diagnostics apply to all rows of a given case.
Case 1: (Row 1 of the table). All the 4 sessions report pass.
If a single PLB is faulty then there should be a fail when it
is configured as a CUT. Since there is no fail in any session,
no PLB is faulty.
Case 2: (Rows 2-5). In this case, only one session reports fail.
Since a PLB is configured as a CUT twice, a faulty PLB should
cause at least two failing sessions. Hence the fault is not in a
PLB. The fault could be in the interconnects or it could be a
transient fault.
Case 3: (Rows 6-9). In this case, the two failing sessions
are consecutive. The two consecutive failing sessions
identify the faulty PLB as the one that is a CUT in these
two sessions; in BISTer-1 a unique PLB is configured as a
CUT in any two consecutive sessions. For example, for row 6,
the gross syndromes of sessions S1 and S2 report fail. PLB C
is configured as a CUT in these two sessions, and hence the
faulty PLB is C. Similar reasoning holds for rows 7-9.
Case 4: (Rows 10-11). In this case, the two failing sessions
are alternate. Since a PLB is configured as a CUT in two
consecutive sessions and not in alternate sessions, and a faulty
PLB should fail at least when it is a CUT, no PLB is faulty
in this case. Again, the fault may be in an interconnect or may be
transient.
Case 5: (Rows 12-15). There are three failing sessions. The
gross syndromes report fail when a PLB is configured as a
CUT (two sessions) and when it is an ORA. Whenever the
faulty PLB is configured as a TPG the gross syndrome is a
Fig. 7. (a) Testing graph for BISTer-1. (b) Tester configurations for PLB A.
pass, since, even if the TPG exercises the fault in the
PLB, identical test vectors are fed to the two fault-free CUTs,
whose outputs are compared by the fault-free ORA. Hence for row 12, the
faulty PLB is D, since D is configured as a TPG in session
S4, whose gross syndrome is a pass. Similar analysis holds for
rows 13, 14 and 15.
Case 6: (Row 16). In this case, all the sessions report fail. This
case is not possible with at most one faulty PLB, since when
the faulty PLB is configured as a TPG, the gross syndrome
should be a pass.
We thus see that for each faulty PLB, the gross syndrome is
unique. Hence BISTer-1 is 1-diagnosable.
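The case analysis of Theorem 2 can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes the one-PLB rotation proceeds so that C is the CUT shared by the first two sessions (as in row 6 of Table I), that a faulty PLB always fails as a CUT and always passes as a TPG, and that it may or may not fail as an ORA. The names `roles_in_session`, `build_table` and `diagnose` are hypothetical.

```python
PLBS = "ABCD"
ROLES = ("TPG", "CUT", "CUT", "ORA")  # roles of A, B, C, D in the first session


def roles_in_session(s):
    """Role of each PLB in session s (0-based), assuming a one-PLB rotation."""
    return {PLBS[(i + s) % 4]: ROLES[i] for i in range(4)}


def build_table():
    """Map each gross syndrome (1 = fail) to its single-fault diagnosis."""
    table = {(0, 0, 0, 0): "no faulty PLB"}
    for plb in PLBS:
        for ora_fails in (0, 1):  # the faulty PLB may or may not fail as an ORA
            outcome = {"TPG": 0, "CUT": 1, "ORA": ora_fails}
            syn = tuple(outcome[roles_in_session(s)[plb]] for s in range(4))
            table[syn] = f"faulty {plb}"
    return table


TABLE = build_table()


def diagnose(syndrome):
    """Diagnosis for a 4-session gross syndrome under the single-fault assumption."""
    return TABLE.get(tuple(syndrome), "fault not in a PLB")
```

Under these assumptions the table holds nine distinct syndromes (one fault-free, two per faulty PLB), and the remaining seven vectors fall into the "fault not in a PLB" cases of the proof.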
We next show that BISTer-1 is not 2-diagnosable.
Theorem 3: BISTer-1 is not 2-diagnosable.
Proof: Figure 7a shows the “testing graph” for a PLB in
BISTer-1. The testing graph of BISTer-1 has directed arcs
between each tester-testee pair. Since each PLB is a testee the
two times it is configured as a CUT, and the rest of the PLBs,
in two different configurations, form the two testers, each PLB
has two incoming arcs.
The testing graph fits the PMC fault-diagnosis model [15]
which states that an upper bound on the diagnosability of a
system is one less than the minimum in-degree of a node. Thus
BISTer-1’s diagnosability is at most one. The main argument
of the PMC model (applied here) is that when both tester syndromes
Q1 and Q2 of Fig. 7a report fail, it is not possible to distinguish between the two
fault patterns (each with 2 faults): 1) one of the testers (i.e.,
a non-TPG PLB in it) and A are faulty, and 2) both testers
(non-TPG PLBs in them) are faulty (e.g., D in Tester1 and C
in Tester2) and A is fault-free.
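The indistinguishability argument can be illustrated by enumerating the syndromes that each fault set could produce. This is a sketch under a simplified PMC-style model that is an assumption here, not taken from the paper: a fault-free ORA reports fail exactly when at least one of its CUTs is faulty, a faulty ORA may report either outcome, and a faulty TPG has no effect. The rotation order and the function names are likewise hypothetical.

```python
from itertools import combinations, product

PLBS = "ABCD"
ROLES = ("TPG", "CUT", "CUT", "ORA")  # roles of A, B, C, D in the first session


def roles_in_session(s):
    """Role of each PLB in session s (0-based), assuming a one-PLB rotation."""
    return {PLBS[(i + s) % 4]: ROLES[i] for i in range(4)}


def possible_syndromes(faulty):
    """All 4-session gross syndromes (1 = fail) the fault set could produce."""
    per_session = []
    for s in range(4):
        roles = roles_in_session(s)
        ora = next(p for p, r in roles.items() if r == "ORA")
        cuts = [p for p, r in roles.items() if r == "CUT"]
        if ora in faulty:
            per_session.append((0, 1))  # unreliable tester: either outcome
        elif any(c in faulty for c in cuts):
            per_session.append((1,))    # good ORA observes a faulty CUT
        else:
            per_session.append((0,))
    return set(product(*per_session))


def ambiguous_pairs(k):
    """Pairs of distinct k-fault sets that share at least one possible syndrome."""
    sets = list(combinations(PLBS, k))
    return [(a, b) for a, b in combinations(sets, 2)
            if possible_syndromes(a) & possible_syndromes(b)]
```

In this model `ambiguous_pairs(1)` is empty, while `ambiguous_pairs(2)` is not: for example, fault sets {A, C} and {B, D} can both produce a fail in all four sessions.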
B. A 2-diagnosable BISTer
The BISTer-2 architecture which has six PLBs is shown in
Fig. 8a. Two PLBs are configured as TPGs, two as CUTs, and
two as ORAs. Y1 and Y2 are the outputs at the first and second
ORA respectively. The first ORA compares the outputs of the
two CUTs and the second ORA compares the outputs of the
two TPGs. Since there are six PLBs, there are six sessions;
Fig. 8b shows the PLB configuration for each session. The
gross syndromes corresponding to are denoted also
by , while the corresponding detailed syndromes are
denoted by , respectively. Furthermore, the joint
syndromes for session is denoted simply by .
In the testing graph of BISTer-2, each testee PLB has four
incoming arcs, since it is twice tested as a CUT and twice
Fig. 8. (a) BISTer-2 architecture. (b) Six sessions of BISTer-2.
TABLE II
GROSS SYNDROMES FOR BISTER-2 FOR ONE FAULTY PLB. WHEN A FAULTY PLB IS EITHER A TPG OR ORA, ITS SYNDROME IS INDICATED BY A SYMBOL WHICH MEANS THAT THE SYNDROME CAN EITHER BE A FAIL (X) OR A PASS, DEPENDING ON WHETHER ITS FAULT(S) ARE EXERCISED OR NOT, RESPECTIVELY, BY ITS CURRENT FUNCTIONALITY (TPG OR ORA).
Faulty
PLB
A X
B X X
C X X
D X X
E X X
F X
as a TPG. Thus, according to the PMC model [15], BISTer-2 is at most
3-diagnosable. We next establish its diagnosability.
Theorem 4: Assuming that (1) there is no fault masking of
all detailed syndromes in the presence of two faults, and (2)
in a BISTer-2 tile, faulty PLBs either uniformly all fail as
TPGs and ORAs or uniformly all pass as TPGs and ORAs,
BISTer-2 is 2-diagnosable with very high probability.
Proof: We first show that BISTer-2 is 1-diagnosable,
irrespective of whether a faulty PLB fails as a TPG or as
an ORA. Table II, which is self-explanatory, shows that the
gross syndromes for each faulty PLB are unique; in
particular note the unique pattern of three consecutive (pass)
syndromes that occur in for each faulty PLB (these three
consecutive ’s start at a different session for each faulty
PLB). E.g., for faulty PLB A, is uniquely ’s in sessions
and . Hence BISTer-2 is 1-diagnosable.
We now show that BISTer-2 is 2-diagnosable for the case
that the two faulty PLBs also fail as TPGs and ORAs; the proof
for the case that they pass as TPGs and ORAs is similar. We
first note that when a CUT is tested as an ORA (the CUT
is exhaustively tested), it is easy to detect the case when its
output is stuck-at-0. If that is the case, then when this PLB is
an actual ORA the gross syndrome corresponding to its output
is taken as a fail instead of the pass syndrome indicated by its
stuck-at-0 state. With this scenario, the assumption that fault
masking does not occur for all detailed syndromes is a very
high probability one. Table III shows the gross syndromes
and for different faulty PLB pairs. From the six session
columns we see that and for each faulty pair is unique
except for faulty pairs AD,BE and CF. For these pairs,
are fails in all sessions. We thus need additional analysis via
the detailed syndromes to diagnose these pairs. For faulty pair
AD, PLB A is configured as a TPG in session and , while
PLB D is configured as a CUT in these sessions; see Fig. 8b.
The detailed syndromes and for sessions and
will thus be identical, since each faulty PLB is configured to
perform the same function in both sessions. Also for sessions
and , A is a CUT and D is a TPG. Hence and
will also have identical detailed syndromes. Using the same
reasoning for faulty pairs BE and CF we have: For AD:
, ; For BE: , ; For CF:
, .
We may not be able to distinguish faulty pairs AD and BE
when and . However,
this is a very unlikely event. For example, consider AD as
the faulty pair. In session , A is a TPG and is a CUT,
while in , and are ORAs. The syndrome for
gives the results of testing as a CUT using non-faulty PLBs
( as the ORA and as the other CUT), while for
is the result of testing two non-faulty CUTs ( ) using a
faulty PLB as the ORA. One can see that it is very unlikely
that the 0/1 bits in will match the 0/1 bits in
, since for that to happen needs to fail as a CUT in
for exactly those input test patterns for which fails as
an ORA in . Similarly, of and will only match
if fails in as a TPG for exactly those test patterns for
which fails in as an ORA–a very low probability event.
Thus when AD is the faulty pair, only if two very
low probability events occur. Similarly, if two very
low probability events occur. Thus AD and BE faulty pairs will
be indistinguishable, only if four very low probability events
occur, making this situation astronomically unlikely. Similarly,
any detailed syndrome equality between any of the other three
faulty pairs is extremely unlikely.
Finally, all the gross syndromes ( , ) of Table II (1 faulty
PLB) are distinct from those of Table III (2 faulty PLBs);
note again the unique pattern of three consecutive (pass)
syndromes in Table II that occur in for each faulty PLB,
and which do not occur in for any of the entries in Table III.
We can thus distinguish between the syndromes for single and
double faults.
V. FUNCTIONAL TESTING AND DIAGNOSIS
We present here a functional testing and diagnosis (TAD)
technique Fast-TAD, that, in conjunction with our BISTer
designs of the previous sections, detects failures of PLBs only
in two possible functional modes they will be used in as the
ROTE moves across the FPGA in either a fault-free scenario or
in the presence of reconfigured faults. As mentioned earlier,
functional testing is possible only when it is known which
circuit(s) will be mapped to the FPGA. For simplicity of
exposition, we assume that only one circuit will be mapped to
the FPGA; the extension to multiple circuits is straightforward.
Fig. 9. ROTE movement in the presence of a single f-faulty PLB shown dark. represents the functionality of each PLB. (a) Initial position of the ROTE.
PLB 3 implements its original function ; PLB 6 is 2 fault-free PLBs to its right. (b) ROTE occupies columns 4 and 5. Note the warped shape of the
BISTer-1 tile needed to occupy four fault-free PLBs. PLB 6 is thus occupied causing its original function to be mapped to PLB 3.
TABLE III
GROSS SYNDROMES FOR BISTER-2 IN THE PRESENCE OF TWO FAULTY PLBS, ASSUMING THAT A FAULTY PLB ALSO FAILS AS A TPG AND AS AN ORA AND THAT FAULT MASKING DOES NOT OCCUR FOR ALL DETAILED SYNDROMES (A VERY HIGH PROBABILITY ASSUMPTION).
Faulty
PLBs
AB X X X X X X
X X X X
AC X X X X X X
X X X X X
AD X X X X X X
X X X X X X
AE X X X X X X
X X X X X
AF X X X X X X
X X X X
BC X X X X X X
X X X X
BD X X X X X X
X X X X X
BE X X X X X X
X X X X X X
BF X X X X X X
X X X X X
CD X X X X X X
X X X X
CE X X X X X X
X X X X X
CF X X X X X X
X X X X X X
DE X X X X X X
X X X X
DF X X X X X X
X X X X X
EF X X X X X X
X X X X
Clearly, Fast-TAD should be much faster than the exhaustive
TAD methods of previous work. In the rest of this section,
we present the Fast-TAD method in conjunction with BISTer-
1; a similar approach can be used for functional testing with
BISTer-2.
Functional testing requires considerations of issues beyond
just restricting the function configured into a CUT to be its
function in the “normal” circuit (i.e., when there is no roving
tester). Issues that we address here are:
1. As the roving tester ROTE moves across the FPGA,
and the application circuit is configured to accommodate
this, each PLB can be configured with different circuit
functions; these are called its operational functions. Is
there a limited number of operational functions that a
PLB will be configured with, for any reconfigured fault
pattern (including the fault-free pattern), as the ROTE
moves across the FPGA, and is it possible to know these
a-priori before the ROTE starts roving?
2. Assume that the answer to the above question is in
the affirmative, and we know that each PLB will be
configured with different and known circuit functions
as the ROTE moves across the FPGA. Then, when a
PLB is in the ROTE area (i.e., in a BISTer-1 tile), it
must be ensured that all configurations that occur in it
during ROTE’s movement are tested. In a straightforward
method, this means that will need to be configured
with a total of functions in the two sessions in which
it is a CUT functions each time it is a CUT, e.g.,
when PLBs are CUTs in a session, they would each
be configured with the operational functions of as
well as the such functions of in a straightforward
application of functional testing. The question, however,
is whether it is possible to reduce the total number of
functional configurations for the CUT (and thereby
reduce the test time) below , while satisfying the
requirement of testing each CUT for all its operational
functions (across all its test sessions).
These issues and the diagnosability of BISTer-1 using
functional testing are tackled in the rest of this section.
A PLB is said to be functionally-faulty (f-faulty) if fault(s)
in cause incorrect output(s) to be produced for one or
more input vectors when implements any of its operational
functions. Fast-TAD only detects and diagnoses f-faulty PLBs;
faulty PLBs that are not f-faulty are accurately deemed to be
“good” as they do not affect the correct functioning of the
circuit.
The first question posed above is answered in the following
lemma.
Lemma 1: While roving the ROTE left to right in an FPGA
either without f-faults or with reconfigured f-faults, a PLB
needs to implement at most two functions, its original function
(determined when the ROTE is in its initial left-most position)
and the function of the PLB two f-fault-free PLBs to its right
in the same row.
Proof Sketch: A special case of this lemma is illustrated
in Fig. 9 in which the ROTE moves across the FPGA in
the presence of a single f-faulty PLB. Note that each PLB
implements at most two functions specified in the lemma, as
shown explicitly for PLB 3.
Following Lemma 1, for a PLB , we denote , the
original function of , as the circuit function it implements
when the ROTE is in its initial (leftmost two columns) position,
and as the circuit function mapped to it when the ROTE’s
leftmost column is to the immediate right of ’s column (in
Fig. 10. (a)-(b) PLB configuration for the first and second test sets,
respectively, with the functions tested in a CUT in each session. (c)-(d) Gross
syndromes for each test set.
other words, is the original function of the PLB two f-
fault-free PLBs to its right in the same row). In Fig. 9, for
PLB 3, for example, and . and are
the only two operational functions/configurations of any PLB
in any reconfigured fault situation.
We now address the second question posed at the begin-
ning of this section. Since , as established above, a
straightforward application of functional testing to BISTer-1
would require each PLB in the BISTer to be configured
with four functions each time it is a CUT (e.g., when
are CUTs in session [see Fig. 10a], both and would
be configured with functions ), resulting in eight
total functional configurations of each CUT across all four
sessions. It is, however, possible to reduce the total number of
configurations of each CUT to almost half this number as we
discuss below.
Fast-TAD uses the BISTer-1 architecture of Sec. IV, but
uses two sets of tests. The first test set uses all four BISTer-1
sessions, while in the second test set, which is used for
further diagnosis, only one session is adaptively used. In the
first test set, each PLB is a CUT twice. In one of its CUT
sessions, it is tested with configurations and , while in its
second CUT session, it is tested with configurations and ,
where is the other CUT in that session. Figure 10a shows
all the sessions in the first test set, along with the functions
configured in the CUTs. Note that for each CUT, its two
operational functions are tested in exactly one session. Thus
all operational functions of all PLBs in the BISTer are covered
in the first test set. Figure 10c shows the gross syndrome
for PLB configurations in Fig. 10a. When the f-faulty PLB
is configured as a TPG then the gross syndrome is a pass.
When it is configured as a CUT and implements its operational
functions, then the gross syndrome is a fail. In all other cases
it is either a fail or a pass.
The second test set (Fig. 10b) is used only to distinguish
between the possible f-fault being in either of or in
either of (each PLB in the two pairs have common gross
syndromes; see Fig. 10c). Only one further session of BISTer-1
is needed to distinguish between the above PLBs. As shown in
Fig. 10d, session is needed to distinguish between either of
being f-faulty, while session is needed to distinguish
between either of being f-faulty. Note that this second
test is needed only if a syndrome common to either of the
above pairs occurs. The diagnostics are explained further in
the proof of Theorem 5.
Theorem 5: The Fast-TAD method using BISTer-1 can di-
agnose one f-faulty PLB in each BISTer-1 tile.
Proof: As shown in Fig. 10c, the gross syndrome vector
(pass/fail) for all four sessions, when PLB is f-faulty are
disjoint from those of PLBs and . Also the gross syndrome
vectors for f-faulty PLB are disjoint from those of f-faulty
PLBs and . However, as shown in Fig. 10c, we may not be
able to distinguish between f-faulty PLBs and between
f-faulty PLBs . The second test (Fig. 10b) is performed in
case a gross syndrome that is common to the f-fault being in
either occurs (in this case only session of Fig. 10b is
performed), or a gross syndrome that is common to the f-fault
being in either occurs (in this case only session of
Fig. 10b is performed). We see from Fig. 10d that PLBs A, C
have different gross syndromes in session , and that PLBs B,
D have different gross syndromes in session . Hence using
the two sets of tests, and only one session in the second test
(if needed), we get unique gross syndromes for each f-faulty
PLB.
Hence in Fast-TAD using BISTer-1, in the fault-free case,
each CUT will be configured with a total of only four functions
over all sessions (test set 2 will not be required in this case).
When there are one or more faults, only two of the 15
remaining gross syndromes lead to the second test set with
exactly one session ( or in Fig. 10(b) depending on
the gross syndrome) in which the one pair of CUTs tested is
configured with two functions. Thus if p is the fault probability
of a PLB, the average number of functional configurations
required for each PLB per ROTE movement across the FPGA
is . E.g., for p=0.01, this number
is 4.01, and for , it is 4.09, as opposed to always 8
for the straightforward method. Thus almost a factor of two
improvement in test time is obtained over the latter by using
our novel functional testing technique.
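The quoted averages can be reproduced numerically. The sketch below is one reading of the paragraph above and not necessarily the authors' closed form: it assumes a baseline of four configurations, that a tile of four PLBs contains at least one fault with probability 1 - (1 - p)^4, that the 15 non-fault-free gross syndromes are then equally likely, and that the two syndromes leading to the second test set each add two configurations. The function name and parameters are hypothetical, and the value p = 0.1 used for the second figure is an assumption.

```python
def expected_configs(p, plbs_per_tile=4, base=4,
                     second_test_fraction=2 / 15, extra_functions=2):
    """Average functional configurations per CUT under the stated assumptions."""
    # Probability that at least one PLB in the tile is faulty.
    p_any_fault = 1 - (1 - p) ** plbs_per_tile
    # The second test set is needed only for a fraction of the faulty syndromes.
    return base + extra_functions * second_test_fraction * p_any_fault


for p in (0.01, 0.1):
    print(f"p = {p}: {expected_configs(p):.2f} configurations on average")
```

With these assumptions the result stays close to four for realistic fault probabilities, compared with a constant eight for the straightforward method.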
Finally, in the broader context of fault tolerance mentioned
in Sec. II, where spare PLBs need to be configured into the
circuit, we briefly discuss the issue of functional testing for
these PLBs. We propose that in order to reduce test time
these PLBs be tested only when they need to be configured
into the circuit. The fault reconfiguration method (e.g., [14])
can identify the spare PLBs that will be configured in, which
PLBs they will replace, and hence what their operational
functions will be. We can then perform Fast-TAD functional
testing for the spare PLBs by including them in BISTer-1
tiles. If any spare PLB fails such a test (is diagnosed as
faulty), then the reconfiguration algorithm can identify another
spare PLB as a replacement, and so forth, until we identify
the required functionally-correct spare PLB. On the issue of
accumulation of faults in untested spare PLBs until they are
needed, note that these PLBs are not connected to any track
segment and are thus effectively isolated from the rest of
the system. Of course, an interconnect BIST method (e.g.,
[18], [19]) should periodically check that the switches in the
PLB-to-track interconnection structure are all set to the off or
disconnect state. Also, due to the large number of interconnect
tracks in a routing channel in current FPGAs, adjacent PLBs
are sufficiently separated from each other that dormant faults
in them should not affect the other. Thus faults that may
accumulate in spare PLBs or in circuit PLBs that do not affect
their operational functions, have little likelihood of affecting
correct circuit operation.
VI. APPLICATIONS TO CURRENT FPGAS
The maximum length of a test vector generated by the
TPG should be equal to the number of inputs to the CUT.
Hence a TPG must have output pins to test an -input
PLB when it is a CUT. Thus if the number of output pins of
a PLB is , then the number of PLBs required for the TPG is
. For current FPGAs, , and thus three PLBs are
required in a TPG for generating test vectors for the entire
PLB. However, because of the multiple-block structure of
current SRAM based FPGAs (e.g., Xilinx Virtex-II), explained
in more detail below, it is possible to use TPGs with fewer
PLBs, and in particular, a single-PLB TPG, for sequentially
testing smaller parts of a PLB (e.g., logic units, look-up tables)
using smaller-length test vectors, thereby testing the entire
PLB.
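The sizing argument above amounts to a ceiling division. A minimal sketch follows, assuming the TPG needs one usable output pin per CUT input; the function name and the pin counts in the usage line are hypothetical and are not taken from any particular device.

```python
import math


def tpg_plbs_needed(cut_inputs, usable_outputs_per_plb):
    """Number of PLBs a TPG needs so that its outputs cover every CUT input."""
    return math.ceil(cut_inputs / usable_outputs_per_plb)


# Hypothetical PLB with a 3:1 input/output pin ratio.
print(tpg_plbs_needed(12, 4))
```

A ratio of 3:1 gives a 3-PLB TPG, while testing a sub-block whose input count does not exceed the usable outputs of one PLB gives a 1-PLB TPG.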
In the following, we provide techniques for:
1. A 1-PLB TPG to sequentially test a CUT PLB in the
Xilinx Virtex-II FPGA (a similar approach can be used
for other current FPGAs as well) that can be used in
BISTer-1 in a straightforward way.
2. A 3-PLB TPG used in BISTer-1 to test all parts of
a PLB simultaneously (useful for FPGAs with ,
which seems to be the typical value for this ratio in
current FPGAs). With such a TPG, BISTer-1 becomes a
PLB tile and some modifications are needed in its
operation compared to the 1-PLB TPG BISTer-1. These
are explained below and theoretical results proven for the
diagnosability of this modified BISTer-1; we refer to this
BISTer-1 configuration as BISTer-1 .
A. BISTer-1 with a 1-PLB TPG for Current FPGAs
Present technology FPGAs like Xilinx Virtex-II have PLBs
that are formed of multiple building blocks called Logic Cells
(LC). Each LC contains a function generator, i.e., a look-up
table (LUT), carry and control logic and a storage element
(flip-flop [FF]). Each of these -input LCs can be configured
independently to perform any -input boolean function. Two
LCs can also be multiplexed to perform a -input boolean
function ( ), with the control signal of the multi-
plexor coming from a separate PLB input. Fig. 11 shows the
schematic of the PLB with multiple LCs. Also, in most current
FPGAs, the number of flip-flopped (FF’ed) outputs (output
LUT FFLOGIC
LUT FFLOGIC
LUT FFLOGIC
LU
Fig. 11. Schematic of a PLB formed of multiple Logic Cells (LCs).
pins that are outputs of flip-flops in the PLB) of the PLB, is
greater than , the number of inputs to each LC. Note that the
outputs of a TPG are all essentially state bits of a finite-state
machine, and thus will all need to be flip-flop outputs. Hence,
even though the total number of PLB outputs may be greater
than , for the purpose of TPG design, we can only consider
that subset of PLB outputs that are FF’ed outputs. Most current
Xilinx FPGAs, for example, like the Virtex-II, Virtex-II Pro,
Virtex-4, Spartan 3E, Spartan 3/3L, have and
FF’ed outputs along with other non-FF’ed outputs. Thus we
can apply the technique described below, which requires that
, to use a 1-PLB TPG to sequentially generate test vectors
for all LCs in a CUT PLB.
With a TPG formed of a single PLB with the design given
in Fig. 11, the following procedure can then be employed in
each testing session in BISTer-1:
1. The TPG PLB with ( ) outputs produces -bit
test vectors, and an additional bit, if needed (see below),
that is held constant at a 0 or a 1 depending on which of
two possible multiplexed LCs is being currently tested.
2. If the LCs of the CUT are independent (i.e., the LCs are
not multiplexed) then the -bit test vectors generated by
the single PLB TPG are simultaneously passed to each -
input LC of the CUT. The output of each LC of the CUT
is then compared with the output of the corresponding
LC of the other CUT PLB.
3. If two LCs of the CUT are multiplexed then only one
LC is tested at a time by passing the -bit test vector
to it and comparing it with the corresponding LC of the
other CUT. As shown in Fig. 12a the upper LC of the
CUT PLB, enabled by keeping the control signal of the
multiplexor at a constant value 0, is under test. After
the testing of the first LC in the multiplexed structure is
completed, the second LC in this structure is enabled by
changing the control signal value of multiplexor and is
tested in a similar fashion by passing -bit test vectors to
its input (see Fig. 12b).
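The per-session procedure above can be sketched in software. This is an illustrative model only, not the hardware design: each LC is reduced to its LUT truth table, the TPG is replaced by exhaustive enumeration of input vectors, and the ORA is a simple output comparison. All names (`make_lut`, `compare_independent`, `compare_multiplexed_pair`) are hypothetical.

```python
from itertools import product


def make_lut(func, k):
    """Truth table of a k-input boolean function, keyed by input vector."""
    return {v: func(*v) for v in product((0, 1), repeat=k)}


def compare_independent(lc_a, lc_b, k):
    """Step 2: apply every k-bit vector to both LCs; True means a mismatch (fail)."""
    return any(lc_a[v] != lc_b[v] for v in product((0, 1), repeat=k))


def mux_output(upper, lower, select, vector):
    """Output of two multiplexed LCs: select = 0 enables upper, 1 enables lower."""
    return lower[vector] if select else upper[vector]


def compare_multiplexed_pair(cut1, cut2, k):
    """Step 3: test one LC at a time by holding the MUX control bit constant.

    cut1 and cut2 are (upper_lut, lower_lut) tuples; returns (upper_fail, lower_fail).
    """
    fails = []
    for select in (0, 1):
        fails.append(any(
            mux_output(*cut1, select, v) != mux_output(*cut2, select, v)
            for v in product((0, 1), repeat=k)))
    return tuple(fails)


good = make_lut(lambda a, b: a ^ b, 2)
stuck_at_0 = {v: 0 for v in good}  # LUT whose output is stuck at 0
print(compare_multiplexed_pair((good, good), (good, stuck_at_0), 2))
```

In this example only the lower LC of the second CUT differs, so only the pass with the control bit held at 1 reports a mismatch.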
B. BISTer-1 with a 3-PLB TPG for Current FPGAs
BISTer-1 is similar to the BISTer-1 of Sec. IV-A
except that there are two extra PLBs (in the third column)
acting as the two extra TPG PLBs, as shown in Fig. 13. There
Fig. 12. Testing of multiplexed LCs in a CUT. (a) The upper LC of the multiplexed structure is tested by passing test vectors from the TPG to its input
with the MUX control input held at 0. (b) The lower LC in the multiplexed structure is under test; the MUX control input is constant at 1.
Fig. 13. BISTER-1 tile: (a) In the first configuration, PLBs and
are tested in a BISTer-1 fashion in 4 sessions, while PLBs and
function as the two additional TPG PLBs. (b) The second configuration in
which PLBs are tested in 4 session and PLBs and are the
two additional TPG PLBs.
X
XX
X
X
X
X
X
ROTE POSITION 1 ROTE POSITION 2
Fig. 14. Faults in alternate columns and rows diagnosed by BISTer-1 .
The areas inside the dark lines are the BISTer areas in the corresponding
ROTE positions.
are two configurations, each with four sessions in this BISTer.
In the first configuration with the first four sessions, PLBs
and are tested in a BISTer-1 fashion with
PLBs and functioning as the two extra TPG PLBs. In
the next configuration with the rest of the four sessions, PLBs
and are tested in a BISTer-1 fashion with
PLBs and functioning as the extra TPG PLBs. Once
these eight sessions are completed, the BISTer tile moves one
row down and repeats another set of eight sessions. In BISTer-
1, the ROTE moves by two columns instead of one that
it did for the BISTer-1. This is because, if the ROTE
is moved by one column instead of two, then testing of the
sub-area and will be unnecessarily repeated.
a) Diagnosabilities of the 2×3 BISTer-1: Note that the 2×3
BISTer-1 can be used in the exhaustive testing mode (as in the
BISTer-1 of Sec. IV-A) or in the functional mode as described
in Sec. V.
Theorem 6: BISTer-1 ’s diagnosabilities in either the ex-
haustive or functional PLB testing modes are as follows.
(a) It can diagnose one faulty (f-faulty) PLB in each
sub-area in the exhaustive (functional—Fast-TAD) testing
mode.
(b) It can diagnose a faulty (f-faulty) PLB in every alter-
nate column or row in the exhaustive (functional—Fast-
TAD) testing mode.
(c) It can diagnose two faulty (f-faulty) PLBs in con-
secutive rows in the exhaustive (functional—Fast-TAD)
testing mode.
Proof: (a) Considering one fault in a BISTer sub-area,
if the fault is in either of or then for exhaustive
testing, from Theorem 2, it will be diagnosed in the first four
sessions, as the extra TPG PLBs and are fault-free.
Similarly, if the fault is in or , then it will be diagnosed
in the last four sessions where the extra two TPG PLBs will be
fault-free PLBs and . A similar conclusion follows from
Theorem 5 when BISTer-1 is used in the functional testing
mode (Fast-TAD).
(b) If the faults (or f-faults) are in alternate columns or rows,
then each fault will fall in a different BISTer area where
it will be the only fault in that particular sub-area. This is
shown in Fig. 14. As there will be only one fault per BISTer
area and since BISTer-1 is 1-diagnosable from part (a) of
this theorem, all the faults in alternate columns or rows will
be diagnosed.
(c) Since the BISTer tile moves one row down after eight
sessions, two faults (or f-faults) in consecutive rows will
fall in two different BISTer sub-areas, as shown in Fig. 15.
Thus, from part (a) above, each faulty or f-faulty PLB will be
diagnosed.
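The tile-level argument used in this proof can be sketched as a conservative check: a fault is counted as provably diagnosed if some 2×3 tile that contains it holds no other fault (Theorem 6(a)). The tile placement below is an assumption made for illustration: tiles start at every row (the two rounds shifted by one row) and at every even column (a ROTE step of two columns). The function names are hypothetical, and the check is only a sufficient condition; it does not model the finer conditions of the theorem.

```python
def tiles_covering(row, col, n_rows, n_cols):
    """Yield (top_row, left_col) of each assumed 2x3 tile containing (row, col)."""
    for top in (row - 1, row):
        if top < 0 or top + 1 >= n_rows:
            continue  # tile would fall outside the array
        for left in range(0, n_cols - 2, 2):  # ROTE positions two columns apart
            if left <= col <= left + 2:
                yield top, left


def faults_in_tile(faults, top, left):
    """Return the faults located inside the 2x3 tile at (top, left)."""
    return {(r, c) for (r, c) in faults
            if top <= r <= top + 1 and left <= c <= left + 2}


def provably_diagnosed(faults, n_rows, n_cols):
    """Faults that are the only fault in at least one tile that covers them."""
    faults = set(faults)
    return {f for f in faults
            if any(faults_in_tile(faults, top, left) == {f}
                   for top, left in tiles_covering(f[0], f[1], n_rows, n_cols))}


# Two faults in consecutive rows (rows 1 and 2) of the same column.
print(provably_diagnosed({(1, 0), (2, 0)}, 6, 7))
# Two faults in adjacent columns of the same row.
print(provably_diagnosed({(2, 1), (2, 2)}, 6, 7))
```

In the first example each fault is isolated by a different tile once the tile has moved down by one row, which mirrors part (c). In the second example only one of the two faults is isolated by some tile under the assumed placement.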
The following corollary provides more analysis of the fault
patterns of at most one fault in disjoint 2×2 subarrays (at most
one PLB fault out of 4 PLBs) that the 2×3 BISTer-1 can diagnose;
note that the basic BISTer-1 can diagnose all patterns with at most one
fault in each disjoint 2×2 subarray (Theorem 2).
Fig. 15. Two faults in consecutive rows are diagnosed by the 2×3 BISTer-1 (BISTer tile 1 and BISTer tile 2 within one ROTE position).
Areas inside the dark lines are the BISTer areas.
Fig. 16. Fault patterns for Corollary 1. (a) A pattern of single faults, shown by X's, in disjoint 2×2 subarrays that is completely diagnosable by the 2×3 BISTer-1.
(b) Another pattern of faults in disjoint 2×2 subarrays that is partially (three of four) diagnosable by the 2×3 BISTer-1: fault 2 meets condition (i), fault 3 meets
conditions (i) and (ii) (both the left-edge and right-edge subarrays that it lies in meet their respective conditions) and fault 4 meets condition (ii) of
Corollary 1.
Corollary 1: Assuming that the ROTE moves by two
columns to its next position, the 2×3 BISTer-1 can correctly
diagnose a single fault in any 2×2 subarray A that is within any
region tested by it, i.e., any region made up of the 2×3 tiles
occupied and tested by the BISTer, if one of the following
conditions is met:
(i) If the 2×2 subarray A is at the left edge of a 2×3 tile
occupied by the BISTer, then any single fault in A can be
correctly diagnosed if there is no fault in the left column of
the 2×2 subarray that is to the immediate right of A.
(ii) If the 2×2 subarray A is at the right edge of a 2×3 tile
occupied by the BISTer, then any single fault in A can be
correctly diagnosed if there is no fault in the right column of
the 2×2 subarray that is to the immediate left of A.
This means that the 2×3 BISTer-1 can completely diagnose (i.e.,
diagnose all the faults correctly) at least 50% of all fault
patterns with at most a single fault in each disjoint 2×2 subarray
of the FPGA; see Fig. 16(a) for an example of such a fault
pattern that is completely diagnosable by the 2×3 BISTer-1. For
other fault patterns in the above category that the 2×3 BISTer-1
is unable to completely diagnose, it may be able to diagnose
them partially, i.e., diagnose some of their faults but not all
(see Fig. 16(b)).
Proof: Consider any 2×2 subarray A in a 2×3 tile occupied by
the BISTer. If A is at the left edge of the tile (e.g., the
"left-edge" subarrays in Fig. 16(b) in which faults 1 and 2 lie),
then condition (i) of the corollary means that the column
to the immediate right of A is non-faulty. This implies that
there is exactly one fault in this 2×3 tile and from Theorem 6
this fault will be diagnosed. Similarly, if A is at the right
edge of a 2×3 tile (e.g., the "right-edge" subarray in
Fig. 16(b) in which fault 4 lies), then condition (ii) of the
corollary means that there is exactly one fault in this 2×3
tile and from Theorem 6 this fault will be diagnosed.
If A is at the left edge of a 2×3 tile, then, given that
the 2×2 subarray to its immediate right has a single fault
in it, the probability that this fault is not in the left column
of that subarray [condition (i)] is 0.5, and if that subarray has
no faults in it, then condition (i) holds with a probability of 1.
A similar analysis holds when A is at the right edge of a 2×3 tile.
Thus the 2×3 BISTer-1
will completely diagnose at least (exactly) 50% of all fault
patterns with at most (exactly) a single fault in each disjoint
2×2 subarray of the FPGA.
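The probability of 0.5 used in this argument can be checked by enumeration. The sketch below assumes that a single fault is equally likely to be in any of the four PLBs of a 2×2 subarray; this uniformity is an assumption of the sketch, and the variable names are hypothetical.

```python
from fractions import Fraction

# All PLB positions (row, column) of a 2x2 subarray.
cells = [(r, c) for r in range(2) for c in range(2)]

# Positions that are not in the left column (column 0) of the subarray.
not_in_left_column = [cell for cell in cells if cell[1] != 0]

# Probability that a uniformly placed single fault avoids the left column.
p_condition = Fraction(len(not_in_left_column), len(cells))
print(p_condition)
```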
Figure 16(a) shows a failure pattern that is completely
diagnosable by BISTer-1 while Figure 16(b) shows one
that is partially diagnosable.
C. Pros and Cons of BISTer-1 and BISTer-1
The disadvantage of BISTer-1 with a 1-PLB TPG in FPGAs
with to the 3-PLB TPG BISTer-1 (BISTer-1 ) is that it
requires separate configurations of the interconnections from
the TPG to the PLB CUTs, and from the relevant CUT outputs
to the ORA versus only one configuration for BISTer-1 ;
recall that is the input/output pin ratio of a PLB and is the
number of logic cells (LCs) and hence LUTs in a PLB. This
can appreciably increase the testing time for BISTer-1. This
is ameliorated somewhat by the fact that since in BISTer-1
smaller parts of the CUT PLBs are tested at a time, fewer total
test vectors are needed to test the entire PLB ( versus
in BISTer-1 , where recall that is the number of outputs
and the number of inputs of a PLB). Another ameliorating
factor is that in the BISTer-1 ROTE area, after the first
round of testing (8 sessions per BISTer-1 tile), each BISTer
tile (save the last) moves down by one row for a second round
of BISTer sessions. Thus the total number of sessions in each
ROTE area in an FPGA is for BISTer-1 , while
this number is for BISTer-1. However, since
the BISTer-1 ROTE moves across the FPGA by one column to
its next position, while the BISTer-1 ROTE moves across
the FPGA by two columns from one position to the next, the
total test sessions for the entire FPGA for BISTer-1 is
versus for BISTer-1 .
The 1-PLB-TPG BISTer-1, however, has the advantage
of higher diagnosability (1 of 4 PLB faults [Theorem 2])
compared to the 2×3 BISTer-1 with its 3-PLB TPG (1 of 6 PLB faults [Theorem 6(a)]).
Note, however, from Theorem 6 and Corollary 1 that there are
other fault patterns in the FPGA that the ROTE using the 2×3
BISTer-1 can diagnose; these alleviate the reduced diagnosability
within the BISTer tile and result in high diagnostic coverage,
as discussed in the next section.
D. BIST for FPGAs with PLBs with Larger Input/Output Pin
Ratios
With rapid technology advances leading to higher integra-
tion and smaller feature sizes, it is likely that larger function-
ality will be packed in a PLB leading to increases in the PLB
input/output pin ratio . Our assumption of is based on
the ORCA 2C series FPGAs from Lucent Technologies; [1],
[3], [4] use the same assumption also based on this family of
FPGAs.
Some current FPGAs like those from Xilinx and Altera
have PLB input/output pin ratios of around 4:1. However,
these PLBs are basically composed of multiple slices. Recent
FPGAs from Xilinx have four slices per PLB, and each slice
has two LCs (logic cells). Each LC has two 4-input, 1-output
LUTs along with control logic and a flip-flop. Thus each slice
has 8 inputs and two flip-flop outputs (along with other
non-flip-flop outputs). All four slices in a PLB can be tested
simultaneously by passing 8-bit test vectors to each slice. Even
if two slices are combined to perform a single function (and
at most two slices can be so combined), at most 16-input test
vectors are required. Three PLBs, each having 8 outputs, are
enough to produce 16-input test vectors. Thus BISTer-1
can be used to test such a PLB in two rounds, one round per
two slices of the PLB. In general, irrespective of the value of
, it should be possible to use either BISTer-1 or BISTer-1
to test a PLB in a sequence of one or more rounds.
VII. SIMULATION RESULTS
An FPGA array with 3-input, 1-output PLBs was
functionally simulated in C with random functions mapped
to each PLB; each LUT entry was randomly chosen to be
0 or 1 with a probability of 0.5. Test and diagnosis using
two techniques were implemented on this FPGA: (1) Fast-TAD
with the 2×3 BISTer-1, in which, as mentioned in Sec. VI, each
BISTer-1 tile in a ROTE position (except the bottom
one) is shifted down by one row after it finishes testing, for another
testing phase; as mentioned in Theorem 6 and Corollary 1,
this BISTer is 1-diagnosable but can also provably diagnose
many 2-fault patterns using non-adaptive diagnosis. (2) The
STAR-3×2 BISTer using the provably 1-diagnosable non-adaptive
diagnosis technique of [1], [3]; see Secs. III-B and III-C.
Finally, note that the STAR-3×2 BISTer performs exhaustive
testing (it has no other mode of testing) [1], [3]. However,
to reduce its fault latency below what would be normal for its
mode of testing, we used the STAR-3×2 BISTer to test only
16 of the 256 possible functions of a 3-input LUT.
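The random function mapping described at the start of this section can be sketched as follows. The paper's simulator was written in C; this Python version is only an illustration, and the names `random_lut`, `evaluate` and `random_fpga`, the array size and the seed are assumptions.

```python
import random


def random_lut(n_inputs, rng):
    """Truth table with 2**n_inputs entries, each 0 or 1 with probability 0.5."""
    return [rng.randint(0, 1) for _ in range(2 ** n_inputs)]


def evaluate(lut, inputs):
    """Look up the LUT output for a tuple of input bits (first bit is the MSB)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return lut[index]


def random_fpga(n_rows, n_cols, n_inputs=3, seed=0):
    """Array of PLBs, each modelled as one random n_inputs-input, 1-output LUT."""
    rng = random.Random(seed)
    return [[random_lut(n_inputs, rng) for _ in range(n_cols)]
            for _ in range(n_rows)]


fpga = random_fpga(4, 5)
print(len(fpga), len(fpga[0]), len(fpga[0][0]))
```

A 3-input LUT has 8 entries, so there are 2**8 = 256 distinct functions that such a PLB can implement.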
It is informative to note that while both BISTers use 6-PLB
tiles, their structures and PLB functionality rotations
for obtaining different test sessions are very dissimilar. In
particular, the STAR-3×2 BISTer rotates the PLB functionalities
in a cycle to obtain its six sessions (see Sec. III-B and Fig. 3).
On the other hand, in our 2×3 BISTer-1, the functionality
rotations occur in two overlapped cycles in 2×2 subtiles
(see Sec. VI-B and Fig. 13), each corresponding to the basic
structure of the 1-diagnosable 2×2-tile BISTer-1 of
Sec. IV-A, leading to 8 sessions of testing. As a result, in
the 2×3 BISTer-1 we are able to harness the 1-diagnosability of
each of its two overlapped 2×2 subtiles, leading to more
patterns of provable diagnosabilities (beyond 1-diagnosability)
as established in Theorem 6 and Corollary 1.
Three different types of PLB fault patterns were injected:
(a) Randomly distributed faults with a given density. (b)
Moderately clustered faults in which faults occur in a cluster
around a center faulty PLB with a probability distribution
function (pdf) of where is the Manhattan distance of
a PLB from and is a suitable proportionality constant; each
cluster is distributed randomly across the FPGA with a certain
density (e.g., a 2% cluster density out of 1000 PLBs means
20 fault clusters are randomly distributed and each cluster will
have multiple faults with the above pdf). (c) Strongly clustered
faults in which faults occur in a cluster around a center fault
with a pdf of .
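The three fault-injection patterns can be sketched as below. The cluster pdf is left as a caller-supplied function of the Manhattan distance, since this sketch does not reproduce the paper's pdf; the placeholder pdf in the example, the function names and the array size are assumptions made purely for illustration.

```python
import random


def all_plbs(n_rows, n_cols):
    """List the (row, column) coordinates of every PLB in the array."""
    return [(r, c) for r in range(n_rows) for c in range(n_cols)]


def inject_random_faults(n_rows, n_cols, density, rng):
    """Pattern (a): pick round(density * #PLBs) faulty PLBs uniformly at random."""
    cells = all_plbs(n_rows, n_cols)
    return set(rng.sample(cells, round(density * len(cells))))


def inject_clustered_faults(n_rows, n_cols, cluster_density, pdf, rng):
    """Patterns (b) and (c): random cluster centers, then faults around each.

    `pdf(d)` is a hypothetical caller-supplied fault probability for a PLB at
    Manhattan distance d > 0 from a cluster center.
    """
    cells = all_plbs(n_rows, n_cols)
    centers = rng.sample(cells, round(cluster_density * len(cells)))
    faults = set(centers)
    for cr, cc in centers:
        for r, c in cells:
            d = abs(r - cr) + abs(c - cc)  # Manhattan distance to the center
            if d > 0 and rng.random() < pdf(d):
                faults.add((r, c))
    return faults


rng = random.Random(1)
random_faults = inject_random_faults(25, 40, 0.10, rng)
# Placeholder pdf, NOT the paper's: probability falls off with distance.
clustered = inject_clustered_faults(25, 40, 0.02, lambda d: 0.5 / d ** 2, rng)
print(len(random_faults), len(clustered))
```

With 1000 PLBs and a 2% cluster density, the sketch places 20 cluster centers, matching the example given in the text.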
In both moderate and strong clusters, the probability of a
PLB being faulty grows roughly linearly with the number of
faults near it (see footnote 6). According to [12], this linear growth is the
property that leads to the well-established Stapper yield model
for chip defects [17]. Thus in our cluster model we are roughly
approximating the clustering effect of defects that leads to the
Stapper yield model.
We measured two metrics in our simulations of the two
BISTers: diagnostic coverage, defined as the percentage of
faults that are correctly diagnosed, and fault latency, defined
as the average time from the occurrence of a fault to its
correct diagnosis (in our simulations all faults are generated
simultaneously at time 0). In the results, fault latencies are
given in units of t_1, which is the time taken to test a PLB
with only one configuration or function mapped to it (see footnote 7).
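The two metrics can be written down directly from their definitions. This is a minimal sketch with hypothetical function names; it assumes, as in the simulations, that all faults occur at time 0, so a fault's latency equals the time at which it is correctly diagnosed.

```python
def diagnostic_coverage(actual_faults, diagnosed):
    """Percentage of actual faults that were correctly diagnosed."""
    actual = set(actual_faults)
    if not actual:
        return 100.0  # no faults to diagnose
    return 100.0 * len(actual & set(diagnosed)) / len(actual)


def mean_fault_latency(diagnosis_times):
    """Average time to correct diagnosis, in units of t_1.

    `diagnosis_times` maps each correctly diagnosed fault to the time of its
    diagnosis; all faults are assumed to occur at time 0.
    """
    return sum(diagnosis_times.values()) / len(diagnosis_times)


actual = {(0, 0), (1, 1), (2, 2), (3, 3)}
reported = {(0, 0), (1, 1), (2, 2), (9, 9)}  # (9, 9) is a misdiagnosis
print(diagnostic_coverage(actual, reported))
print(mean_fault_latency({(0, 0): 200, (1, 1): 300}))
```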
As we can see in Figs. 17-19 and Tables IV-VI, our Fast-TAD
method using the 2×3 BISTer-1 outperforms the STAR-3×2
BISTer in both diagnostic coverage and fault detection latency
across different fault densities ranging from 1% to as high as
30%. For random faults, our technique is quite stable, giving
a coverage of 96% at 10% fault density, while the STAR
method's coverage falls rapidly to about 75% at this density.
For clustered faults the absolute coverage gap between the two
methods is about 30-38% (this represents a relative coverage
gap of 50-68%), and the fault detection latency of the STAR
method is about 3-4 times more than that of our technique.
Thus our new methods significantly better the current state-
of-the-art in on-line testing. This augurs well for the effective
application of our techniques to current and emerging VDSM
6. E.g., in the strong cluster model, if a PLB is near two faulty PLBs
, and at distances of and from them, respectively, then its fault
probability is proportional to .
7. By definition, t_1 includes the reconfiguration time to load in the new
configuration bits to test a PLB and the actual test time (the time to generate
test vectors and the corresponding syndromes at the output of the ORA) to test
it; . Since our simulations are behavioral, we have kept
for both BISTers we compare. Note, however, that the latency comparisons
are valid since, if in our simulations, the two BISTers take
and times to test the entire FPGA under some fault density
and distribution, then if a time is included, the two BISTers will take
and times to complete testing, giving the same percentage
differences in their latencies.
Fig. 17. Results for random fault distribution: (a) diagnostic coverage (%) and (b) fault latency (× t_1) versus fault density (%), comparing our Fast-TAD: 2×3 BISTer-1 and
the STAR-3×2 BISTer [1], [3]. Fault latencies are in units of t_1, the time taken to test a PLB with only one configuration or function mapped to it.
TABLE IV
Results for random fault distribution: diagnostic coverage and fault detection latency comparisons between our Fast-TAD: 2×3 BISTer-1 (column "Fast-TAD") and the STAR-3×2 BISTer [1], [3] (column "STAR"). Fault latencies are in units of t_1, the time taken to test a PLB with only one configuration or function mapped to it.

Fault          Diagnostic Coverage (%)     Fault Latency (× t_1)
Density (%)    Fast-TAD     STAR           Fast-TAD     STAR
1              100.0        98.3           252          750
2              98.2         94.3           250          763
5              96.8         82.1           245          786
7              96.8         80.4           253          796
10             96.0         74.3           247          790
15             94.5         61.1           252          791
20             92.1         59.7           243          785
25             88.5         54.5           254          795
30             86.3         46.4           247          801

TABLE V
Results for moderate fault-clusters: diagnostic coverage and fault detection latency comparisons between our Fast-TAD: 2×3 BISTer-1 (column "Fast-TAD") and the STAR-3×2 BISTer [1], [3] (column "STAR") for k = 0.75 and k = 1.00. The fault latency unit is the same as in Table IV.

        Cluster        Fault          Diagnostic Coverage (%)     Fault Latency (× t_1)
k       Density (%)    Density (%)    Fast-TAD     STAR           Fast-TAD     STAR
0.75    1              4.1            92.2         61.3           237          761
0.75    2              8.2            91.6         56.6           239          840
0.75    3              12.3           89.2         52.0           239          854
1.00    1              4.7            92.9         55.1           240          876
1.00    2              9.1            90.1         52.5           231          869
1.00    3              14.2           87.6         51.1           245          826

TABLE VI
Results for strong fault-clusters: diagnostic coverage and fault detection latency comparisons between our Fast-TAD: 2×3 BISTer-1 (column "Fast-TAD") and the STAR-3×2 BISTer [1], [3] (column "STAR") for k = 0.25 and k = 0.50. The fault latency unit is the same as in Table IV.

        Cluster        Fault          Diagnostic Coverage (%)     Fault Latency (× t_1)
k       Density (%)    Density (%)    Fast-TAD     STAR           Fast-TAD     STAR
0.25    1              4.6            92.1         58.8           296          869
0.25    2              9.3            89.7         52.9           227          823
0.25    3              19.5           83.6         49.0           230          825
0.50    1              8.8            82.5         44.5           231          899
0.50    2              16.9           77.2         40.0           236          878
0.50    3              26.6           76.1         38.7           223          876
Fig. 18. Results for moderate fault-clusters: (a-b) diagnostic coverage (%) and (c-d) fault latency (× t_1) versus fault density (%), comparing our Fast-TAD: 2×3 BISTer-1
and the STAR-3×2 BISTer [1], [3] for (a,c) k = 0.75 and (b,d) k = 1.00. The three fault density points on the x-axis correspond to cluster densities of 1%,
2% and 3%. The fault latency unit is the same as in Fig. 17.
Fig. 19. Results for strong fault-clusters: (a-b) diagnostic coverage (%) and (c-d) fault latency (× t_1) versus fault density (%), comparing our Fast-TAD: 2×3 BISTer-1 and
the STAR-3×2 BISTer [1], [3] for (a,c) k = 0.25 and (b,d) k = 0.50. The three fault density points on the x-axis correspond to cluster densities of 1%,
2% and 3%. The fault latency unit is the same as in Fig. 17.
FPGAs, and to future nano-technology FPGAs.
At fault densities around the 8-9% range across the different
fault distributions, we also see that even the diagnostic
coverage of the 2×3 BISTer-1 goes down from 96% for a random fault distribution
(Fig. 17(a), Table IV), to around 90% for moderate-cluster
distributions (Figs. 18(a-b), Table V [k = 0.75 and k = 1.00]), to around
83% for a strong-cluster distribution (Fig. 19(b), Table VI
[k = 0.50]). Since a clustered-fault scenario is probably more
representative of fabrication defect distributions [17], and fault
densities would be high for emerging technology FPGAs, an
important future research direction would entail developing
BISTers that can correctly diagnose a very high percentage of
faults in such distributions.
VIII. CONCLUSIONS
We presented new BISTer designs that have provable di-
agnosabilities of one and two, which are significant im-
provements over previous BIST methods. Our 2-diagnosable
BISTer is the first design that offers diagnosability greater
than one. We improved the testing efficiency of our BISTers
for situations in which the circuit(s) mapped to the FPGA
are known, by developing a fast functional test-and-diagnosis
method Fast-TAD that requires a PLB to be tested for only
two circuit configurations that it will provably assume under
any reconfigured fault pattern, as the ROTE moves across the
FPGA. This is opposed to exhaustive testing of PLBs used
in previous work. Due to the provable diagnosabilities of our
BISTers, it is possible to avoid time-intensive adaptive diag-
nosis schemes without significantly compromising diagnostic
coverage, thus making our methods additionally faster. We
extended our basic BISTer designs to those with multiple-PLB
test-pattern generators to more efficiently test the complex
PLBs of current commercial FPGAs, and proved the diagnos-
abilities of these designs as well. Our BIST techniques were
simulated in an on-line testing wrapper, though they are also
applicable in an off-line testing scenario by having multiple
BISTers simultaneously configured to cover the entire FPGA
and by testing a PLB for only its original circuit function.
Simulation results obtained for our extended 2×3 BISTer-1
design for random as well as clustered faults with fault
densities of up to 30% show high diagnostic coverages. For
example, at around an 8-10% fault density we obtain diagnostic
coverages ranging from 96% for random faults to 83% for
strongly clustered faults. These results represent absolute
improvements of about 20-38% and relative improvements
of 28-68% in diagnostic coverage over the previous best
BISTer design, the STAR-3×2 BISTer of [1], [3] using
non-adaptive diagnosis. Further, the fault latencies of the 2×3 BISTer-1
are appreciably smaller, by factors of 3-4, than those of
the STAR-3×2 BISTer. Our BIST techniques should also perform
similarly well for off-line testing. Our methods are thus
well-suited for high diagnostic-coverage factory and field testing
of current VDSM and future nano-technology FPGAs that are
expected to have high fault densities.
REFERENCES
[1] M. Abramovici, C. Stroud and J. Emmert, “On-Line BIST and
BIST-Based Diagnosis of FPGA Logic Blocks”, IEEE Trans. on VLSI
Systems, Vol. 12, Issue 12, Dec. 2004, pp. 1284-1294.
[2] M. Abramovici and C. Stroud, “BIST-Based Test and Diagnosis of
FPGA Logic Blocks,” IEEE Trans. on VLSI Systems, Vol. 9, Issue 1,
Feb. 2001, pp. 159-172.
[3] M. Abramovici, C. Stroud, B. Skaggs and J. Emmert, “Improving on-
line BIST-based diagnosis for roving STARs”, Proc. 6th IEEE Int’l
On-Line Testing Workshop, 2000, pp. 31-39.
[4] M. Abramovici, C. Stroud, S. Wijesuriya and V. Verma, “Using Roving
STARs for On-Line Testing and Diagnosis of FPGAs in Fault-Tolerant
Applications”, Proc. IEEE Int’l Test Conf., Sept’99.
[5] M. Butts, A. DeHon, S.C. Goldstein, “Molecular Electronics: Devices,
Systems and Tools for Gigagate, Gigabit Chips”, International Confer-
ence on Computer Aided Design, 2002.
[6] S. Dutt, V. Verma and H. Arslan, “A Search-Based Bump-and-Refit
Approach to Incremental Routing for ECO Applications in FPGAs”,
Trans. Design Autom. of Electronic Syst., 7(4), pp. 664-693, 2002.
[7] S. Dutt, V. Shanmugavel and S. Trimberger, “Efficient Incremental
Rerouting for Fault Reconfiguration in Field Programmable Gate Ar-
rays”, Proc. IEEE Int. Conf. Comput.-Aided Design, 1999.
[8] S.C. Goldstein and M. Budiu, “NanoFabrics: Spatial Computing Using
Molecular Electronics”, Int’l Symp. Comp. Arch., 2001.
[9] F. Hanchek and S. Dutt, “Methodologies for Tolerating Logic and
Interconnect Faults in FPGAs,” IEEE Trans. Comp., Special Issue on
Dependable Comput., Jan. 1998, pp. 15-33.
[10] W. K. Huang, F.J. Meyer, X. Chen and F. Lombardi, “Testing Config-
urable LUT-Based FPGAs”, IEEE Trans. VLSI Systems, Vol. 6, No. 2,
pp. 276-283, June 1998.
[11] T. Inoue and H. Fujiwara, “Universal Fault Diagnosis for Lookup Table
FPGAs,” IEEE D & T of Computers, Vol. 15, No. 1, Jan. 1998.
[12] W. Kuo and T. Kim, “An Overview of Manufacturing Yield and
Reliability Modeling for Semiconductor Products”, Proceedings of the
IEEE, Vol. 87 No. 8, Aug. 1999.
[13] J. Lach, W. H. Mangione-Smith, and M. Potkonjak, “Low Overhead
Fault-Tolerant FPGA Systems,” IEEE Transactions on VLSI Systems,
Vol. 6, No. 2, 1998.
[14] N.R. Mahapatra and S. Dutt, “Efficient Network-Flow Based Tech-
niques for Dynamic Fault Reconfiguration in FPGAs”, Proc. 29th Int’l
Symp. on Fault-Tolerant Comput., 1999.
[15] F.P. Preparata, G. Metze and R.T. Chen, “On the connection assignment
problem of diagnosable systems”, IEEE Trans. Electron. Comput., vol.
EC-16, Dec. 1967, pp. 848-854.
[16] N.R. Shnidman, W. H. Mangione-Smith, and M. Potkonjak, “On-line
Fault Detection for Bus-Based Field Programmable Gate Arrays,” IEEE
Trans. on VLSI Systems, Vol. 6, No. 4, pp. 656-666, Dec. 1998.
[17] C.H. Stapper, “The effects of wafer to wafer defect density variations
on integrated circuit defect and fault distributions,” IBM Journal of
Research and Development, vol. 29, pp. 87-97, Jan. 1985.
[18] C. Stroud et al., “On-Line BIST and Diag. of FPGA Interconnect Using
Roving STARs”, Proc. IEEE Int’l On-Line Test Wkshp, 2001.
[19] V. Suthar and S. Dutt, “Efficient On-line Interconnect Testing in FPGAs
with Provable Detectability for Multiple Faults”, Proc. DATE’06, pp.
1165 - 1170, March 2006.
[20] V. Verma, S. Dutt and V. Suthar, “Efficient On-line Testing of FPGAs
with Provable Diagnosabilities”, Proc. Design Automation Conf. (DAC),
pp. 498-503, 2004 (nominated for a best paper award).
17
... Ceci est important quand il s'agit des missions où l'intervention directe de l'ingénieur est difficile. Dans la suite, nous étudions quelques travaux dans ce domaine.Dans[72], les chercheurs proposent une technique de test BIST en ligne et hors ligne pour le test et le diagnostic des pannes permanentes dans un FPGA-SRAM de façon périodique. La méthode propose trois circuits BISTs améliorés pour un diagnostic précis. ...
... La méthode propose trois circuits BISTs améliorés pour un diagnostic précis. La contribution principale de[72] est une continuité des travaux publiés dans[73] sont deux circuits BISTs pour le diagnostic appelés 1-et 2-diagnostiquable3 . La technique BIST proposée est implémentée en premier lieu dans les deux colonnes du FPGA-SRAM et qui va par la suite survoler tout le réseau FPGA-SRAM. ...
... La technique BIST proposée est implémentée en premier lieu dans les deux colonnes du FPGA-SRAM et qui va par la suite survoler tout le réseau FPGA-SRAM. La détection des pannes par la méthode BIST proposée dans[72] consiste à comparer la sortie du CLB sous test avec la sortie d'un autre CLB de même configuration. La non concordance dans les deux réponses indique un CLB défectueux dans le groupe de CLBs sous test. ...
Thesis
De nos jours, les circuits FPGAs à base de mémoire SRAM sont omniprésents dans les applications électroniques embarquées. Ainsi, ces circuits sont devenus un acteur principal dans l’amélioration du rendement de l’ensemble du spectre des systèmes-sur-puce (SoC). Néanmoins, les pannes se sont accentuées dans ces technologies émergentes, qu’il s’agisse de pannes permanentes provenant d’une forte densité d’intégration, associée à une complexité élevée des procédés de fabrication, ou de pannes transitoires découlant des particules chargées qui heurtent les FPGAs dans leurs environnements d’exploitation. La tolérance aux pannes des circuits FPGAs à base de mémoire SRAM est donc un paramètre essentiel pour assurer la sûreté de fonctionnement des applications implémentées. Dans le cadre de cette thèse, nous proposons une stratégie de tolérance aux pannes qui s’accommode des contraintes de fiabilité pour un système implémenté dans un FPGA à base de mémoire SRAM. Cette stratégie présente une grande flexibilité et un coût faible comparé à la technique de la redondance modulaire triple (TMR), et permet la gestion en temps d’exécution qui est une caractéristique importante pour les applications critiques. Dans cette thèse, nous proposons également des tests spécifiques, appelés algorithmes March, qui permettent de détecter les pannes intra-mots dans la mémoire de configuration d’un circuit FPGA- SRAM. Ces tests présentent l’avantage de bénéficier d’une implémentation rapide et d’obtenir un taux de couverture élevé
... Test can detect hidden defects. Ten years is an integrated self-test [7][8][9][10] . hese methods are popular in the test and the diagnosis of a variety of laws FPGA. ...
... hese methods are popular in the test and the diagnosis of a variety of laws FPGA. "his method was one of two design makes Bister roving linear diagnosis (Bank) 9 is proposed. Bister proposals avoid afecting the diagnosis will tell us the extent of the diference between loud. ...
... To compare the results, we make two mistakes density parameter diagnostic overlap. he density is deined as a defect alarm FPGA CLB 1000, and diagnostic coverage is deined as a percentage of the disease diagnosed well 9 . Figure 5 is a comparison of our work 9,22 . Figure 5 we say that we achieved 100% of the fault coverage, compared with the previous work. ...
Article
This paper discussed about the increasing complexity of Field-Programmable Gate Array (FPGA) in finding delay faults using BIST technique. It is a major challenge for FPGA for highest troubles shoot text and delay circuit quickly. Built-in-self-test method is a simple solution compared with expensive test equipment for the automatic transmission. Herein, the erection designed for the detection of delay faults in the second coefficient of FPGA resources Digital Signal Processing (DSP) block, FPGA board interconnects, Look-Up-Tables (LUT) and etc. The authors suggest comprehensive plan diagnose Bister to improve the effectiveness of the control logic, which diagnose all CLB 2 x 3 BIST are faulty. The overall process for the simulation has been done by tool Xilinx FPGA Vertex FPGA. The results show a significant improvement over previous methods.
... For decades, Build-In-Self-Test (BIST) [2]- [4], [5] has become very popular for testing and diagnosis of various faults. Traditionally logic BIST has performed in context of system, burn-in test and gate level test where diagnostic resolutions are usually not required. ...
... & Science University, Shibpur,India (e-mail: nachiketad@gmail.com, rahaman_h@it.becs.ac.in, e-mail: ibanerjee@it.becs.ac.in ).BIST is regaining its popularity as alternative test compression technique.Reference[5] presents a 1-and 2-diagnosable BISTer design that makes up Roving Tester (ROTE). The proposed BISTer can perform diagnosis without compromising fault coverage by avoiding time-intensive adaptive diagnosis. ...
... Roving test methods [16], [17] perform a progressive online scan of the FPGA fabric exploiting the device capability for run-time partial reconfiguration. These methods use small roving Self-Testing Areas (STARs) which are configured to be tested off-line while the remaining FPGA logic continues its normal operation without interruption. ...
... They also provide high fault coverage and diagnosis granularity comparable to BIST methods. Recent roving methods propose more efficient BIST designs achieving higher diagnostic coverage (diagnostic coverage is the percentage of faults correctly diagnosed), for example approach [17] achieves 88% diagnostic coverage at 25% fault density, while a previous approach [16] achieve 55%. It is obvious that as the defect density increases the diagnosis accuracy of roving methods will be reduced. ...
Conference Paper
Full-text available
The fundamental question addressed in this paper is how to maintain the operation dependability of future chips built from forthcoming nano- (or subnano-) technologies characterized by the reduction of component dimensions, the increase of atomic fluctuations and the massive occurrence of physical defects. We focus on fault tolerance at the architectural level, and especially on fault-tolerance approaches, which are based on chip self-diagnosis and self-reconfiguration. We study test and reconfiguration methodologies in massively defective nanoscale devices, either at fine granularity field programmable devices or at coarse granularity multi-core arrays. In particular, we address the important question of up to which point could future chips have self-organizing fault-tolerance mechanisms to autonomously ensure their own dependable operation. In the case of FPGAs, we present known fault tolerant approaches and discuss their limitations in future nanoscale devices. In the case of multicore arrays, we show that such properties as self-diagnosis, self-isolation of faulty elements and self-reorganization of communication routes are possible.
... This is critical for systems where the BIST works online and fault recovery must be done in time. A molecular approach is given by [4] and [19], but a circuit-oriented design is not taken into account. Therefore, a correct partitioning of the circuit and a distributed BIST with a fast response evaluator and fault recovery support are needed. ...
Chapter
FPGAs can be used for the design of autonomic reliable systems. Their advantages are reconfiguration and flexibility in the design. However, commercial FPGAs are, first, prone to errors. Second, the design flow does not yet support fault tolerance techniques such as Built-in Self-Tests. Fault tolerance can be reached through error detection and fault recovery. Most error detection techniques are not suitable for on-line detection because of their detection times and their long and inflexible training. This paper proposes a fault tolerant design for FPGAs. It has a Built-in Self-Test whose error evaluation and fault recovery are supported by computing techniques inspired by the immune system. A fault recovery model and a hardware implementation model are also presented.
Article
The increased circuit complexity of field programmable gate arrays (FPGAs) poses a major challenge in the testing of FPGAs. One of the test challenges is to detect the delay faults in high-speed circuits. The built-in-self-test (BIST) technique is an easy solution compared with expensive automatic test equipment. In this work, a BIST structure is proposed to detect the delay faults in the various resources of the FPGA, such as multipliers, digital signal processing (DSP) blocks and look-up tables, and in the interconnects of the FPGA. The authors have also proposed a full-diagnosable BISTer structure that improves the testing efficiency of the logic BIST. The proposed BISTer structure can diagnose the faulty configurable logic block (CLB), when all the CLBs in the 2 × 3 BIST are faulty. The proposed scheme has been simulated on a Xilinx Virtex FPGA, using the ISE tool, the JBits 3.0 API, XHWI (Xilinx HardWare Interface) and MATLAB 7.0. The results show significant improvement compared with earlier BIST methods.
Conference Paper
A conventional Concurrent Error Detection (CED) technique usually relies on two exact replicas of a given module to provide redundancy in fault-tolerant systems. A discrepancy between the two instances flags at least one of them as faulty. We propose a heterogeneous redundant FPGA-based system by exploiting the application properties. Consequently, the replicated module is not necessarily an exact copy of the original module but is much less resource and power hungry. In the paper, we discuss two forms of the heterogeneous structure, which are spatial and temporal redundancy based. These forms are evaluated using an FPGA-based hardware implementation of the Discrete Cosine Transform (DCT) block. A necessary condition is derived to declare the DCT block as fault-free. The results show that the heterogeneous spatial redundancy can realize a resource-efficient CED pair at the cost of a small latency in error detection. On the other hand, the heterogeneous temporal redundancy can provide permanent faults resource coverage at the cost of reduced throughput with negligible resource overhead.
Conference Paper
We employ output-discrepancy consensus to mitigate faulty modules of a Triple Modular Redundant (TMR) arrangement using dynamic partial reconfiguration. Traditionally, the fault-handling resilience of a TMR arrangement is limited to fault(s) in a single TMR instance over the entire mission duration. An additional permanent fault in any of two other TMR instances results in mission's failure. However, in this work, a novel Self-Configuring approach for Discrepancy Resolution (SCDR) is developed and assessed. In SCDR, the occurrence of faults in more than one module initiates the repair mechanism, then upon fault recovery, the system is configured into Concurrent Error Detection (CED) mode. The approach is validated by the complete recovery of a TMR realization of 25 stage Finite Impulse Response (FIR) filter implemented on a reconfigurable platform as a case study. The results show that a self-healing circuit can be realized exploiting the dynamic partial reconfiguration capability of FPGAs while requiring a streamlined operational data path compared to TMR.
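As background for the TMR arrangement described in this abstract, the sketch below shows plain majority voting over three module outputs, with the disagreeing modules reported. It is not the SCDR method itself: the function name and return convention are hypothetical, and the repair and CED-mode steps of the abstract are not modeled.

```python
from collections import Counter


def tmr_vote(outputs):
    """Return (voted_value, discrepant_indices) for three module outputs."""
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three module outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        # All three modules disagree: no majority exists.
        return None, [0, 1, 2]
    discrepant = [i for i, out in enumerate(outputs) if out != value]
    return value, discrepant


print(tmr_vote([5, 5, 5]))  # (5, [])
print(tmr_vote([5, 7, 5]))  # (5, [1])
print(tmr_vote([1, 2, 3]))  # (None, [0, 1, 2])
```

A single discrepant module is masked by the vote, which is the classical TMR limit the abstract refers to; a second fault can leave no majority, which is the case the abstract's repair mechanism targets.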
Article
The capabilities of current fault-handling techniques for Field Programmable Gate Arrays (FPGAs) develop a descriptive classification ranging from simple passive techniques to robust dynamic methods. Fault-handling methods not requiring modification of the FPGA device architecture or user intervention to recover from faults are examined and evaluated against overhead-based and sustainability-based performance metrics such as additional resource requirements, throughput reduction, fault capacity, and fault coverage. This classification alongside these performance metrics forms a standard for confident comparisons.
Article
Incremental physical CAD is encountered frequently in the so-called engineering change order (ECO) process in which design changes are made typically late in the design process in order to correct logical and/or technological problems in the circuit. As far as routing is concerned, in order to capitalize on the enormous resources and time already spent on routing the circuit, and to meet time-to-market requirements, it is desirable to re-route only the ECO-affected portion of the circuit, while minimizing any routing changes in the much larger unaffected part of the circuit. Incremental re-routing also needs to be fast and to effectively use available routing resources. In this paper, we develop a complete incremental routing methodology for FPGAs using a novel approach called bump and refit (B&R); B&R was initially proposed in [Efficient Incremental Rerouting for Fault Reconfiguration in Field Programmable Gate Arrays] in the much simpler context of extending some nets by a segment (for the purpose of fault tolerance) for FPGAs with simple i-to-i switchboxes. Here we significantly extend this concept to global and detailed incremental routing for FPGAs with complex switchboxes such as those in Lucent's ORCA and Xilinx's Virtex series. We also introduce new concepts such as B&R cost estimation during global routing, and determination of the optimal subnet set to bump for each bumped net, which we obtain using an efficient dynamic programming formulation. The basic B&R idea in our algorithms is to re-arrange some portions of some existing nets on other tracks within their current channels to find valid routings for the incrementally changed circuit without requiring any extra routing resources (i.e., completely unused tracks), and with little effect on the electrical properties of existing nets.
Conference Paper
New electronics technologies are emerging which may carry us beyond the limits of lithographic processing down to molecular-scale feature sizes. Devices and interconnects can be made from a variety of molecules and materials including bistable and switchable organic molecules, carbon nanotubes, and single-crystal semiconductor nanowires. They can be self-assembled into organized structures and attached onto lithographic substrates. This tutorial reviews emerging molecular-scale electronics technology for CAD and system designers and highlights where ICCAD research can help support this technology.
Conference Paper
In this paper, we consider a "dynamic" node covering framework for incorporating fault tolerance in SRAM-based segmented array FPGAs with spare row(s) and/or column(s) of cells. Two types of designs are considered: one that can support only node-disjoint (and hence nonintersecting) rectilinear reconfiguration paths, and the other that can support edge-disjoint (and hence possibly intersecting) rectilinear reconfiguration paths. The advantage of this approach is that reconfiguration paths are determined dynamically depending upon the actual set of faults and track segments are used as required, thus resulting in higher reconfigurability and lower track overheads compared to previously proposed "static" approaches. We provide optimal network-flow based reconfiguration algorithms for both of our designs and present and analyze a technique for speeding up these algorithms, depending upon the fault size, by as much as times. Finally, we present reconfigurability results for our FPGA designs that show much better fault tolerance for them compared to previous approaches: the reconfigurability of the edge-disjoint design is 90% or better and 100% most of the time, which implies near-optimal spare-cell utilization.
Conference Paper
We present a very effective on-line interconnect built-in-self-test (BIST) method I-BIST for FPGAs that uses a combination of the following novel techniques: a track-adjacent and a switch-adjacent (also called "mirror adjacent") pairwise net comparison mechanism that achieves high detectability, a carefully designed set of only five net-configurations that cover all types and locations of wire-segment and switch faults, a 2-phase global-detailed testing approach, and a divide-and-conquer technique used in detailed testing to quickly narrow down the set of potential suspect interconnects that are then detail-diagnosed. These techniques result in I-BIST having provable detectability in the presence of an unbounded number of multiple faults, very high diagnosability of 99-100% even for high fault densities of up to 10% that are expected in emerging nano-scale technologies, and much lower test times or fault latencies than the previous best interconnect BIST techniques. In particular, for application to on-line testing, our method requires roving-tester (ROTE) configurations to test an entire FPGA, while the previous best online interconnect BIST technique requires configurations. Thus, I-BIST is an order of magnitude more time- as well as power-efficient, and will scale well with rapidly increasing FPGA device sizes that are expected in emerging technologies.
Conference Paper
The ability to reconfigure around manufacturing defects and operational faults increases FPGA chip yield, reduces system downtime and maintenance in field operation, and increases reliabilities of mission- and life-critical systems. The fault reconfiguration technique discussed in this work uses the principle of node covering, in which reconfiguration is achieved by constructing replacement chains of cells from faulty cells to spare/unused ones. A key issue in such reconfiguration is efficient incremental rerouting in the FPGA. Previous methods for node-covering based reconfiguration are “static” in the sense that extra interconnects are added a priori as part of the initial circuit routing so that a specific fault pattern (e.g., one fault per row) can be tolerated [1]. This, however, results in worst-case track overheads and also in an inflexibility to tolerate other realistic fault patterns. In this paper, we develop dynamic reconfiguration and incremental rerouting techniques that are fault specific. In this approach, the FPGA is initially routed without any extra interconnects for reconfiguration. When faults occur, the routed nets have to be minimally perturbed to allow these interconnects to be inserted “on-the-fly” for reconfiguration. These requirements are addressed in our minimally incremental rerouting technique Conv_T-DAG, which uses a cost-directed depth-first search strategy. We prove several results that establish the near-optimality of Conv_T-DAG in terms of track overhead. To the best of our knowledge, this is the first time that an incremental rerouting technique has been developed for FPGAs. For several benchmark circuits, the static approach to tolerating one fault per row resulted in a 43% to 34% track overhead. Using the dynamic reconfiguration approach and Conv_T-DAG results in an average overhead of only 16%, an improvement of more than 50%. Over all circuits, the reconfiguration time per fault ranges from 16.8 to 72.9 secs. Simulations of smaller fault sets of one to four faults show very small track overheads ranging from 1.75% to 4.49%. Conv_T-DAG can also be used for interconnect fault tolerance.
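The "more than 50%" figure in this abstract can be checked with a short calculation, assuming that the improvement means the relative reduction in track overhead from the static overhead $s$ to the dynamic average $d = 16\%$ (the abstract does not state the formula, so this reading is an assumption):

\[
\frac{s - d}{s}\bigg|_{s = 34\%} = \frac{34 - 16}{34} \approx 52.9\%,
\qquad
\frac{s - d}{s}\bigg|_{s = 43\%} = \frac{43 - 16}{43} \approx 62.8\%.
\]

Under that reading, both ends of the reported static range give a reduction above 50%, which is consistent with the stated claim.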
Article
Yield modelers have to take into account not only the wafer to wafer variations in defect densities, but also lot to lot, day to day, week to week, and month to month variations in defect levels that occur in integrated circuit fabrication. Models for these effects are described in this paper. All these models are based on the application of straightforward, elementary statistics. They are developed from fundamental random defect theory and adapted to actual data by deductive analysis. The effects on defect and fault distributions are derived; and a deficiency in some previous yield models is eliminated.
Conference Paper
We present novel and efficient methods for on-line testing in FPGAs. The testing approach uses a ROving TEster (ROTE), which has provable diagnosabilities and is also faster than prior FPGA testing methods. We present 1- and 2-diagnosable built-in self-tester (BISTer) designs that make up the ROTE, and that avoid expensive adaptive diagnosis. To the best of our knowledge, this is the first time that a BISTer design with diagnosability greater than one has been developed for FPGAs. We also develop functional testing methods that test PLBs in only two circuit functions that will be mapped to them (as opposed to testing PLBs in all their operational modes) as the ROTE moves across a functioning FPGA. Simulation results show that our 1-diagnosable BISTer and our functional testing technique lead to significantly more accurate (98% (90.5%) fault coverage at a fault/defect density of 10% (25%)) and faster test-and-diagnosis of FPGAs than achieved by previous work. In general, it is expected that ROTE will achieve high fault coverages at fault/defect densities of up to 25% using our 1-diagnosable BISTer and up to 33% using our 2-diagnosable BISTer. Our methods should thus prove useful for testing current very deep submicron FPGAs as well as future nano-CMOS and molecular nanotechnology FPGAs in which defect densities are expected to be in the 10% range.
Conference Paper
Incremental physical CAD is encountered frequently in the so-called engineering change order (ECO) process in which design changes are made typically late in the design process in order to correct logical and/or technological problems in the circuit. As far as routing is concerned, in order to capitalize on the enormous resources and time already spent on routing the circuit, and to meet time-to-market requirements, it is desirable to re-route only the ECO-affected portion of the circuit, while minimizing any routing changes in the much larger unaffected part of the circuit. Incremental re-routing also needs to be fast and to effectively use available routing resources. We develop a complete incremental routing methodology for FPGAs using a novel approach called bump and refit (B&R). We significantly extend this concept to global and detailed incremental routing for FPGAs with complex switchboxes such as those in Lucent's ORCA and Xilinx's Virtex series. We also introduce new concepts such as B&R cost estimation during global routing, and determination of the optimal subnet set to bump for each bumped net, which we obtain using an efficient dynamic programming formulation.
Conference Paper
Presents the first on-line BIST and BIST-based diagnostic approach for the programmable interconnect resources in FPGAs. This interconnect BIST is used in the roving STARs approach. The technique provides a complete BIST of the programmable interconnect followed by high-resolution diagnostics to support reconfiguration around the fault for fault-tolerant applications. We have successfully implemented this BIST approach on the ORCA 2C series FPGA and present the results of testing and diagnosing known defective FPGAs.