A Selective replacement method for timing-error-predicting flip-flops
ABSTRACT Aggressive technology scaling brings new challenges such as parameter variations, soft errors, and device wearout. These increase the unreliability of transistors and are thus becoming a serious problem in SoC designs. To attack these problems, spatial redundancy is commonly utilized, and many dual-sensing flip-flops (FFs) based on it have been proposed. These FFs require additional circuits consisting of a redundant FF and a comparator, and thus suffer from a large area overhead. To reduce this overhead, this paper proposes a selective replacement method. We focus on a timing-error-predicting FF, named canary FF, and evaluate the selective replacement method by applying it to two commercial processors, Toshiba's MeP and Renesas Electronics' M32R. In the case of MeP, the area overhead is reduced from 55% to 11%.
ABSTRACT: The demand of power saving and highly dependable LSI has increased by the miniaturization of device process technology and the spread of portable devices such as mobile phones. The design method which takes the worst case scenario makes the design margin too large because of the parameter variations in the deep submicron domain and it has serious impact for performance and power consumption. To deal with excessive design margins, typical-case design method with canary FF has been proposed so far. By using canary FF, variability-aware large guard band can be decreased. In this paper, we describe how canary FF can be integrated in a typical digital circuit design flow in detail and analyze the area and power overheads compared with the worst-case design method. The analysis is done by implementing two conventional 32-bit RISC processor cores; miniMIPS and MeP (Media Embedded Processor). The results show that our proposed method can reduce chip areas effectively and power overhead can be reduced to very small.
Quality Electronic Design (ISQED), 2013 14th International Symposium on; 01/2013
Conference Paper: Dynamically reducing overestimated design margin of MultiCores
ABSTRACT: MultiCore processor is one of the promising techniques to satisfy computing demands of the future consumer devices. However, MultiCore processor is still threatened by increasing energy consumption due to PVT (Process-Voltage-Temperature) variations. They require large design margins in the supply voltage, resulting in large energy consumption. The combination of DVS (Dynamic voltage scaling) technique and Canary FF (flip-flop), named Canary-DVS, has been proposed to eliminate the overestimated voltage margin but has only been evaluated under the assumption of typical delay. This paper considers C2C (Core-to-Core) variations and evaluates how Canary-DVS eliminates the energy waste under the practical assumption of delay variations. We adopt Canary-DVS to a commercial processor, Toshiba's quad-core Media embedded Processor (MeP). From Monte Carlo simulations, it is found that energy is reduced by 18.6% on average and there are not any noticeable discrepancies from the typical situations, when 0.064 of σ/μ value is assumed in gate delay.
High Performance Computing and Simulation (HPCS), 2012 International Conference on; 01/2012
A Selective Replacement Method for Timing-Error-Predicting Flip-Flops*
Yuji Kunitake1 Toshinori Sato2 Hiroto Yasuura3 Takanori Hayashida4
1,3 Kyushu University
2,4 Fukuoka University
2 E-mail: email@example.com
* The preliminary results of this study were presented as a fast abstract at PRDC 2010 [1].
Keywords: VLSIs, deep submicron technologies, dual-sensing flip-flops, canary FF
1. Introduction
As semiconductor technologies have been scaling, new challenges of parameter variations, soft errors, and device wearout are emerging.
Variations on an SoC chip are classified into die-to-die (D2D) and within-die (WID) variations. Recently, the latter, especially random WID variations, have become serious. Random dopant placement and line edge roughness (process variations), supply voltage integrity (voltage variations), and temperature fluctuations (temperature variations) cause parameter variations [2, 3]. Process variations are inherent in semiconductor technologies and affect each transistor’s threshold voltage, resulting in performance variations. Each of the PVT variations increases the safety margin, which is required because delays are no longer constant, and thus SoC design that considers the worst case is becoming very difficult.
The increasing soft error rate (SER) is now another major concern for SoC manufacturers [4]. With the reduction in transistor size, the area per logic state bit scales down. In order to prevent breakdown caused by the high electric field, the supply voltage also scales down. Hence, the node charge decreases and a bit cell is easily flipped by neutrons coming from outer space and by alpha particles released by radioactive impurities. This results in logic errors. Since each bit cell becomes smaller, the probability that a particle hits the cell also becomes smaller, and the net effect is an almost constant SER per bit. However, since the number of transistors per SoC chip has increased tremendously following Moore’s law, the SER per chip is also increasing exponentially.
Negative Bias Temperature Instability (NBTI) [5] is a dominant wearout mechanism that increases the pMOS threshold voltage. This slows down transistor switching and thus degrades circuit performance. In other words, a non-critical path might cause a timing violation after aging. Researchers predict that the circuit impact of NBTI will become increasingly significant as semiconductor technologies continue to scale.
To attack these difficult problems, dual-sensing FFs have been studied [6-12]. These FFs have a redundant FF and a comparator to detect their target errors. Hence, they share a common problem: an area overhead, because they essentially utilize spatial redundancy. In addition, other circuits, together with an on-chip network, are necessary for error gathering and error correction. They also increase the area of the SoC chip. The increase in area causes larger power consumption and higher manufacturing costs. In this study, we focus on the problem of PVT variations and utilize a timing-error-predicting FF, named canary FF [6]. We designed it and found that it is 2.65 times larger than the conventional FF. This paper considers the problem of area overhead, proposes a solution, and evaluates it.
The rest of this paper is organized as follows. Section 2 explains the target problem of this study, summarizes the dual-sensing FFs with a focus on canary FF, and explains the area overhead caused by introducing canary FFs. Section 3 proposes a solution to reduce the area overhead and evaluates it. Section 4 concludes the paper.
2. Dual-sensing Flip-flops
The emerging problems of advanced semiconductor technologies tend to cause the timing margin to be overestimated. In order to reduce this large margin, the combination of a dynamic voltage scaling (DVS) system with timing-error-detecting or timing-error-predicting FFs has been studied [6, 7]. These FFs dynamically detect timing errors in an SoC chip and help the DVS system manage the supply voltage with few or no timing errors. As the overestimated timing margin is eliminated, the power consumption is significantly reduced. In our study, we utilize a timing-error-predicting FF, named canary FF [6].
978-1-61284-857-0/11/$26.00 ©2011 IEEE
2.1. Canary FF
The canary FF [6] is augmented with a delay element and a shadow FF, as shown in Figure 1. The shadow FF is used like a canary in a coal mine to help detect whether a timing error is about to occur. Timing errors are predicted by comparing the value of the main FF with that of the shadow FF, which runs into a timing error slightly before the main FF does. The resulting alert signal triggers voltage control. Utilizing canary FF has the following three advantages. First, using a single-phase clock simplifies clock tree design. It also eliminates the short path problem [7]. Second, canary FF is applicable to common LSIs that do not have a recovery mechanism, where Razor FF [7] is inapplicable, since the shadow FF protects the main FF against timing errors. Third, since the delay buffer always has a positive delay, the shadow FF always encounters a timing error before the main FF, and thus the canary FF is variation resilient.
Figure 1: Canary FF.
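The prediction mechanism described above can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions that are not part of the original design description: setup and hold times are ignored, the data input is assumed to change every cycle so that a late capture always yields a mismatch, and the class name `CanaryFF` and its parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CanaryFF:
    """Behavioral timing model of a canary FF (illustrative assumption only)."""
    cycle_time_ns: float    # clock period shared by the main and shadow FFs
    buffer_delay_ns: float  # delay element in front of the shadow FF (> 0)

    def sample(self, arrival_ns: float) -> Tuple[bool, bool]:
        """Return (main_ok, alert) for data arriving arrival_ns after the clock edge."""
        main_ok = arrival_ns <= self.cycle_time_ns
        shadow_ok = arrival_ns + self.buffer_delay_ns <= self.cycle_time_ns
        # The comparator raises the alert when the two FFs disagree. In this
        # model that happens only when the shadow FF alone misses the edge.
        alert = main_ok != shadow_ok
        return main_ok, alert


ff = CanaryFF(cycle_time_ns=2.0, buffer_delay_ns=0.2)
for arrival in (1.5, 1.9, 2.1):
    main_ok, alert = ff.sample(arrival)
    print(f"arrival={arrival} ns  main_ok={main_ok}  alert={alert}")
```

In this model, the alert is raised while the main FF still holds a correct value, which is the window in which the voltage controller would be expected to react. Once the arrival time exceeds the cycle time, both FFs fail together and the model no longer flags anything, so the delay element must be large enough for the controller to respond in time.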
Canary FF has additional circuits: the shadow FF, the delay element, and the comparator. They have a large impact on the circuit area of the entire microprocessor. We designed the layout of a canary FF using a 65nm standard cell library provided by VDEC and found that its area is 2.65 times larger than that of the conventional FF.
2.2. Related Work
To attack the problems caused by PVT variations, soft errors, and NBTI, many dual-sensing FFs have been proposed.
Razor FF [7] has a redundant FF, named shadow FF, to which a delayed clock is delivered so that it meets timing constraints.
In other words, every shadow FF is expected to always hold
correct values. If the values latched in the main and shadow
FFs do not match, a timing error is detected. When the
timing error is detected in microprocessor pipelines, the
processor state is recovered to a safe point where the error
iRoC Technologies [8] proposes to utilize the shadow FF to detect soft errors. Two implementations are considered. One is very similar to the Razor FF and requires a delayed clock. The other does not require a delayed clock; instead, the input to the shadow FF is delayed. When the two values stored in the main and shadow FFs do not match, a soft error is detected. By adjusting the delay, the maximum transient-pulse duration can be changed.
NEC [9] proposes to utilize the shadow FF to predict
wearout failures. Every combinational logic block is
duplicated and a failure part of the main circuit is switched
into its redundant copy. In order to predict the failure,
defect-prediction FF is proposed. It utilizes the shadow FF.
There is a delay line between the previous logic stage and
the shadow FF and the shadow FF might violate timing
constraints even when the main FF does not. Hence, by
comparing values stored in the main and shadow FFs, the
increase in the path delay due to the wearout can be
Agarwal et al. [10] propose a technique similar to
NEC’s defect-prediction FF. Intel [11] also proposes a
similar technique, which is an extension of the soft-error
resilient FF [12] in order to support process variation
3. Selective Replacement Method
This section first proposes a selective replacement method that reduces the number of canary FFs and thus the area overhead. It then introduces our evaluation methodology and presents the experimental results.
3.1. Replacement Strategy
We reduce the number of replaced FFs by considering the distribution of path delays. The delays of the paths in a circuit differ from one another due to their logic depth, wire length, and so on. A path with a small delay will not cause a timing error even if its supply voltage is lowered by the DVS system, so the FF at its output need not be replaced by a canary FF. We identify the paths where timing errors might occur and replace the FFs at their outputs with canary FFs. We call these FFs timing-violating FFs. This replacement policy reduces the area overhead.
We explain the replacement policy using Figure 2. First, the circuit is logic-synthesized under the best-case scenario and the target cycle time is determined. Next, static timing analysis is performed on the synthesized netlist. The paths that do not satisfy the target cycle time are identified, and their output FFs, which are the timing-violating FFs, should be replaced by canary FFs. The other FFs will not cause timing errors.
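The selection step of this policy can be sketched in a few lines. The code below is an illustration rather than the flow used in the evaluation: it assumes the static timing analysis report has already been reduced to pairs of endpoint FF name and path delay, and the function name, FF names, and delay values are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def select_timing_violating_ffs(
    paths: Iterable[Tuple[str, float]],
    target_cycle_ns: float,
) -> Set[str]:
    """Return endpoint FFs that have at least one path slower than the target cycle time."""
    worst: Dict[str, float] = {}
    for endpoint_ff, delay_ns in paths:
        # Keep only the slowest path ending at each FF.
        worst[endpoint_ff] = max(delay_ns, worst.get(endpoint_ff, 0.0))
    return {ff for ff, delay in worst.items() if delay > target_cycle_ns}


# Hypothetical STA report: (endpoint FF name, path delay in ns).
sta_paths = [
    ("exu/acc_reg[0]", 2.4),
    ("exu/acc_reg[0]", 1.7),
    ("lsu/addr_reg[3]", 2.1),
    ("ifu/pc_reg[1]", 1.2),
]
to_replace = select_timing_violating_ffs(sta_paths, target_cycle_ns=2.0)
print(sorted(to_replace))
```

Only the FFs returned by the function would be swapped for canary FFs; every other FF keeps its conventional cell, which is where the area saving comes from.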
3.2. Evaluation Methodology
The target processors for evaluation are Toshiba’s Media Embedded Processor (MeP) [13] and Renesas Electronics’ M32R [14]. We use the 65nm standard cell library shown in Table 1. Synopsys’s Design Compiler logic-synthesizes the processors using the cell library at 1.3V. Table 2 summarizes the target processor specifications based on the logic synthesis results.
Figure 2: Delay distributions.
Table 1: 65nm Standard Cell Library
Vdd (V) Temperature (C)
Table 2: Processor Specifications
Clock cycle time (nsec)
# of FFs
3.3. Number of Replaced FFs
We vary the supply voltage between 1.05V and 1.2V. Figure 3 summarizes the number of timing-violating FFs for every macro block. The blocks that do not have timing-violating FFs between 1.05V and 1.2V are not shown in the figure. The vertical axis shows the block names and the horizontal axis indicates the number of timing-violating FFs. First, let us look at the MeP results. At 1.2V, there are no timing-violating FFs except in the load/store unit (lsu). As the supply voltage is lowered to 1.1V and to 1.05V, a number of timing errors appear in the execution unit (exu) and the instruction fetch unit.
Next, let us look at the M32R results. At 1.2V, no timing error is detected. As the supply voltage is lowered to 1.1V and to 1.05V, timing errors are detected, especially in the multiply-add unit (cpumac) and in the instruction address data path (cpudp_pc). For the memory management unit (mmu), timing errors are not found when the supply voltage is higher than 1.1V. However, at 1.05V, it has the largest number of timing-violating FFs.
Figure 3: Number of timing-violating FFs.
3.4. Area Overhead
Figure 4 presents the area overhead when the timing-violating FFs are replaced by canary FFs. The bar graph indicates the number of replaced FFs and the line graph indicates the percentage area overhead. Although we found that the canary FF is 2.65 times larger than the conventional FF, we assume in this evaluation that it is 3 times larger in order to account for other overheads such as wire routing. In the graph, the values at 1.3V present the area overhead and the number of replaced FFs when all FFs are replaced by canary FFs. The values at the other supply voltages present them when only the timing-violating FFs are replaced by canary FFs.
First, let us look at the MeP results. At 1.2V, it has 82 timing-violating FFs. When the supply voltage is lowered to 1.05V, it has 778 timing-violating FFs. In other words, about 21% of the FFs are timing-violating FFs. When they are selectively replaced, the area overhead is only 12%.
Next, let us look at the M32R results. It does not have any timing-violating FFs at 1.2V. Even when the supply voltage is lowered to 1.05V, it has only 756 timing-violating FFs. In other words, only 7% of the FFs are timing-violating FFs. Hence, the area overhead is as small as 2%.
These observations confirm that the selective replacement method is useful for mitigating the area overhead.
Figure 4: Area overhead and # of replaced FFs.
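As a rough cross-check of these numbers, the overhead of selective replacement can be estimated from the overhead of full replacement. The sketch below rests on an assumption made here for illustration only: every FF has the same area and every replacement adds the same extra area, so the overhead scales linearly with the fraction of FFs replaced. The inputs are the full-replacement overheads of 55% (MeP) and 25% (M32R) and the timing-violating fractions of about 21% and 7% stated in the paper; the function name is hypothetical.

```python
def selective_overhead(full_overhead_pct: float, replaced_fraction: float) -> float:
    """Scale the all-FFs-replaced area overhead by the fraction of FFs actually replaced."""
    if not 0.0 <= replaced_fraction <= 1.0:
        raise ValueError("replaced_fraction must lie in [0, 1]")
    return full_overhead_pct * replaced_fraction


# Figures taken from the text; the linear scaling itself is an assumption.
mep = selective_overhead(full_overhead_pct=55.0, replaced_fraction=0.21)
m32r = selective_overhead(full_overhead_pct=25.0, replaced_fraction=0.07)
print(f"MeP:  {mep:.2f}%")
print(f"M32R: {m32r:.2f}%")
```

Under this linear model the estimates come out at roughly 11.6% for MeP and 1.8% for M32R, which is in line with the overheads reported for selective replacement at 1.05V.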
4. Conclusions
This paper proposed the selective replacement method for a timing-error-predicting FF, named canary FF, in order to mitigate the area overhead. The evaluations showed that, when all FFs are replaced by canary FFs, the chip area increases by 55% and by 25% for the MeP and M32R processors, respectively. The selective replacement method identifies timing-error-prone FFs and replaces only them with canary FFs. Using the method, the area overhead is reduced to 11% and to 2% for MeP and M32R, respectively.
Future studies should consider the impact of the error-gathering circuit and the error-correcting circuit on the chip area. An additional post-layout replacement method should also be investigated. When soft errors are considered, other selective replacement methods for the dual-sensing FFs may be required. This is because soft errors do not affect path timing, whereas NBTI does.
Acknowledgements
This work is partially supported by the CREST (Core Research for Evolutional Science and Technology) program of Japan Science and Technology Agency (JST), by Grant-in-Aid for Scientific Research (B) #20300019, and by Grant-in-Aid for JSPS Fellows #22-2357. The logic synthesis of the target circuits was performed with the collaboration of STARC, e-Shuttle, Fujitsu, Renesas Electronics, Synopsys through Toshiba and VDEC, the University of Tokyo.
References
[1] Y. Kunitake, T. Sato, and H. Yasuura, “A Replacement Strategy for Canary Flip-Flops,” 16th Pacific Rim International Symposium on Dependable Computing, 2010.
[2] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, “Parameter Variations and Impact on Circuits and Microarchitecture,” Design Automation Conference, 2003.
[3] O. Unsal, J. Tschanz, K. Bowman, V. De, X. Vera, A. Gonzales, and O. Ergin, “Parameter Variations and Impact on Circuits and Microarchitecture,” IEEE Micro, Vol. 26, No. 6, 2006.
[4] P. I. Rubinfeld, “Managing Problems at High Speed,” IEEE Computer, Vol. 31, No. 1, 1998.
[5] V. Reddy, A. T. Krishnan, A. Marshall, J. Rodriguez, S. Natarajan, T. Rost, and S. Krishnan, “Impact of Negative Bias Temperature Instability on Digital Circuit Reliability,” 40th IEEE International Reliability Physics Symposium, 2002.
[6] T. Sato and Y. Kunitake, “A Simple Flip-Flop Circuit for Typical-Case Designs for DFM,” 8th International Symposium on Quality Electronic Design, 2007.
[7] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation,” 36th International Symposium on Microarchitecture, 2003.
[8] M. Nicolaidis, “Time Redundancy Based Soft-Error Tolerance to Rescue Nanometer Technologies,” 17th VLSI Test Symposium, 1999.
[9] T. Nakura, K. Nose, and M. Mizuno, “Fine-Grain Redundant Logic Using Defect-Prediction Flip-Flops,” International Solid-State Circuits Conference, 2007.
[10] M. Agarwal, B. C. Paul, M. Zhang, and S. Mitra, “Circuit Failure Prediction and Its Application to Transistor Aging,” 25th VLSI Test Symposium, 2007.
[11] M. Zhang, TM Mak, J. Tschanz, K. S. Kim, N. Seifert, and D. Lu, “Design for Resilience to Soft Errors and Variations,” 13th International On-Line Testing
[12] S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” IEEE Computer, Vol. 38, No. 2, 2005.
[13] Toshiba Corporation, “Media embedded processor,”
[14] Renesas Electronics Corporation, “M32R/ECU Series,”