Cross-layer resilience using wearout aware design flow.
Bardia Zandian, Murali Annavaram
Electrical Engineering Department, University of Southern California
Los Angeles, CA
{bzandian,annavara}@usc.edu
Abstract—As process technology shrinks devices, circuits
experience accelerated wearout. Monitoring wearout will be
critical for improving the efficiency of error detection and
correction. The most effective wearout monitoring approach
relies on continuously checking only the most critical circuit
paths to detect timing degradation. However, circuits
optimized for power and area efficiency have a steep critical
path wall in some designs. Furthermore, wearout depends on
dynamic conditions, such as the processor’s operating
environment and the application-specific path utilization profile.
The dynamic nature of wearout coupled with steep critical
path walls may result in an excessive number of paths that need to
be monitored. In this paper we propose a novel cross-layer
circuit design flow that uses path timing information and
runtime path utilization data to significantly enhance
monitoring efficiency. The proposed methodology uses
the application-specific path utilization profile to select only a few
paths to be monitored for wearout. We propose and evaluate
four novel algorithms for selecting paths to be monitored.
These four approaches allow designers to select the best group
of paths under varying power, area and monitoring budget
constraints.
Keywords-Wearout; Timing margin; Cross-layer design
I. INTRODUCTION
As devices scale to nanometer dimensions, a processor’s
lifetime reliability is reduced due to increased stress factors
such as higher current density, electric field, and operation
temperature [1]. Reliability degradation manifests in the
form of many electro-physical phenomena such as
Electromigration, Time Dependent Dielectric Breakdown
(TDDB), Hot Carrier Injection (HCI), and Negative Bias
Temperature Instability (NBTI) [2]. The net result of these
phenomena is gradual timing degradation and eventual
breakdown of circuits [3-5], which is referred to as wearout
or aging. Designers estimate the expected wearout during the
lifetime of a processor and use guardbands to proactively
reduce the clock frequency (and increase supply voltage) to
account for worst-case wearout. But wearout prediction is
becoming increasingly challenging as process variations lead
to random device characteristics both within and across
chips. Dynamically changing environmental conditions and
workload-dependent circuit path utilization further
exacerbate the problem of wearout estimation. While these
uncertainties existed even before, the severity of their impact
is increased as devices are scaled [6, 7]. There are different
models which explain the rate of wearout and its dependence
on different static and dynamic parameters. All these models
and physical device level experiments show a gradual
degradation which happens over long time periods [3-5].
Given that wearout occurs at a glacial time scale compared to
processor cycle time, it is best to monitor wearout first
before deploying expensive error correction mechanisms.
With continuous monitoring, a wearout related error is
predicted before it occurs and
preventive adjustments are made to the circuit’s operation
point (e.g. changing clock frequency and supply voltage, or
deploying modular redundancy) in order to avoid errors [8,
9]. High accuracy error prediction allows for designing a
truly reliability-aware circuit which can adapt to the in-field
reliability state of the hardware.
Given the promise of the prediction approach several
prediction techniques have been proposed. Some prediction
techniques use “canary” circuits which are designed to fail
before the actual circuit [10]. Other techniques use sensors
inserted into the circuit at design time which are capable of
detecting wearout by sensing increased circuit delay [11, 12]
or changes in other parameters, such as threshold voltage
(Vth) [13]. Canary circuits do not test the actual signal paths
in the circuit; rather they only act as proxies for the primary
circuit wearout. More recently researchers proposed wearout
prediction based on monitoring the signal paths in the
primary circuit itself. In WearMon [14], stored test vectors
that are specifically selected to sensitize the critical paths of
the circuit are used for runtime tests that capture the timing
margin (also called timing slack) of these paths. Another in
situ circuit checking method [15] uses Built-in Self Test
(BIST) mechanism to perform runtime circuit tests. The
main advantage of monitoring actual signal paths using
stored test vectors compared to static sensory circuit
insertion is the ability to capture the effects of actual circuit
lifetime utilization at a lower cost and with higher flexibility
for online adaptation. For instance, test coverage can be
optimized during in-field operation with little or no overhead
by simply updating the stored test vectors.
Wearout monitoring mechanisms generally make the
following two basic assumptions: (1) In any given circuit
there are only a few circuit paths that have critical timing
margins. Hence, to accurately predict imminent timing
failures only a few circuit paths with the least timing margin
need to be monitored. (2) Circuit paths with least timing
margin have a higher probability of being among the first to
violate timing. Hence, monitoring prioritizes paths purely
based on the timing margin measured at design time. The
first assumption indicates that selection of only a few paths
for monitoring would be sufficient for robust monitoring.
This assumption may hold well in some designs that use
automatic design tools to synthesize, place and route the
design. In the absence of knowledgeable designer’s input
these tools typically do not create steep critical path walls
[16, 17], where a large number of paths have small timing
margin. However, custom design optimizations for
maximizing power and area efficiency, particularly
employed in high performance processors, may result in the
creation of a steep critical path wall in several circuit blocks.
In the presence of a steep critical path wall the number of
paths that need to be monitored can be very large, thereby
increasing the monitoring overhead. The second assumption
made by in situ approaches results in the selection of paths
purely based on design stage timing margin. However, it has
been shown that wearout depends on dynamic runtime
utilization of the processor and many of the causes of
wearout get exacerbated with increased circuit utilization [6,
7]. Path selection purely based on timing margin neglects
this important dependence. Thus the robustness of the
monitoring approach that relies purely on timing margin can
be compromised due to the dynamic nature of path
utilization. Hence, we conclude that in order for monitoring
approaches to be more broadly applied (beyond the low-cost
computing segment) there is a need for a symbiotic
interaction between circuit design tools, monitoring
hardware and the high-level application software. Only
through such an interaction is it possible to identify circuit
paths which are the slowest at design time and also have
higher lifetime utilization, resulting in the most wearout induced
timing degradation.
Figure 1. Design time and runtime cross-layer interaction.
In this paper we propose a novel cross-layer circuit
design flow methodology that combines static path timing
information with runtime path utilization data to significantly
enhance monitoring efficiency and robustness. Fig. 1 shows
the layered framework consisting of two phases:
(1) Cross-layer design flow (CLDF) phase: This phase
(marked as “Design Time” in the figure) uses representative
application inputs to derive circuit path utilization profile.
The microarchitecture specification provides monitoring
budget, such as the amount of chip area or the power
consumption allocated for monitoring. CLDF also derives
timing profile from static timing analysis of circuit’s design.
The wearout aware algorithm then combines information
from software, microarchitecture and circuit layers to drive
circuit design optimizations with the explicit goal of making
a circuit amenable for robust and efficient monitoring. The
algorithm selects a refined group of paths along with a robust
set of input vectors for wearout monitoring.
(2) Wearout monitoring phase: A runtime wearout
monitoring phase, similar to that proposed in WearMon [14],
continuously monitors the paths selected from the CLDF
phase. The information about the circuit paths which need to
be monitored, obtained from the CLDF phase, is used in the
runtime phase for wearout detection.
The focus of this research work is to develop the CLDF
framework. As such, we assume that a wearout monitoring
mechanism exists in the underlying microarchitecture. CLDF
significantly enhances the applicability of existing runtime
monitoring approaches. For example, where wearout sensors
or canary circuits are used for monitoring, CLDF will
identify circuit paths that are most susceptible to failure
thereby allowing the designer to select the most appropriate
location of the wearout sensors or canary circuitry. When in
situ monitoring approaches are used [14, 15, 18, 19] only the
most susceptible circuit paths reported by the CLDF
framework are monitored. It should be noted that although
the CLDF framework can be used with all the above
mentioned reliability monitoring approaches, throughout this
paper we assume that the underlying microarchitecture uses
an in situ monitoring approach similar to WearMon [14] to
illustrate how our design phase optimizations can enhance
runtime monitoring efficiency.
The main contributions of this work are:
1. We design and implement a novel cross-layer circuit
design flow methodology that combines static path timing
information with runtime path utilization data to significantly
enhance monitoring efficiency. This framework uses path
utilization profile, path delay characteristics, and number of
devices in critical paths to optimize the circuit using
selective path constraint adjustments (i.e. increasing the
timing margin of selected group of paths). This optimization
results in a new implementation of the circuit which is more
amenable for low overhead monitoring of wearout-induced
timing degradation.
2. We propose four algorithms for selecting the best
group of paths to be observed as early indicators of wearout
induced timing failures. Each of these algorithms allows the
designer to trade off the area and power overhead of monitoring
against the robustness and efficiency of monitoring.
3. We develop a hybrid hierarchical emulation/simulation
infrastructure to study the effects of application level events
on gate-level utilization profile. This setup provides a fast
and accurate framework to study system utilization across
multiple layers of the system stack using a combination of
FPGA emulation and gate-level simulation.
In an era when computers are built from an increasing
number of components with decreasing reliability, multi-layer
resiliency is becoming a requirement for all computer
systems. In this work we design and implement a low cost
and scalable solution in which different layers of the
computer system stack can communicate and adapt both at
design phase and during the runtime of the system. Our
proposed cross-layer design flow approach is discussed in
Section II. Section III shows our hybrid cross-layer
evaluation infrastructure, followed by the evaluation results
in Section IV. Section V describes the most relevant prior
work, followed by conclusions in Section VI.
II. CROSS-LAYER DESIGN FLOW
In this section we describe the cross-layer circuit design
flow (CLDF) methodology. At the core of CLDF is a novel
approach that modifies the distribution of path timing
margins, so as to create a group of critical paths that are
more likely to fail before any other paths fail. The paths that
are likely to fail first are referred to as wearout-critical paths.
Wearout-critical paths would be ideal candidates for being
monitored as early indicators of wearout. CLDF receives a
monitoring budget, in terms of the area and power overhead
allowed for monitoring, as input from the designer. CLDF
uses three characteristics of the circuit, namely path timing,
path utilization profile, and number of devices on the path, to
select a limited number of wearout-critical paths to satisfy
the monitoring budget constraints specified by the designer.
Paths which are selected to be monitored at runtime are
going to be checked regularly using approaches like [14, 15,
18-20]. Detailed description of runtime monitoring
techniques is outside of the scope of this paper. However, to
put the work presented in this paper in context, we will
highlight a key testing method used by these runtime
monitoring approaches. Many of the runtime monitoring
frameworks test the circuit (or canary circuit) at a test
frequency, $f_{test}$, which is higher than the normal operation
frequency, $f_{0+GB} = 1/T_{0+GB}$. $T_0$ is the delay of the slowest paths in
the circuit at design time and hence ideally the circuit can
operate at that clock period at fabrication time. As mentioned
earlier, designers add a guardband (increase the clock period)
to deal with wearout. $T_{0+GB}$ is the clock period of the system
with added guardband, which is the usual operational clock
period of the circuit. If multiple tests, each at a clock period
that falls within the $T_0$ to $T_{0+GB}$ range ($1/T_{0+GB} < f_{test} < 1/T_0$),
were performed, the test results would provide information
about the exact amount of timing degradation in the paths tested.
Throughout the paper we assume that the above described
approach is used for wearout monitoring.
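The multi-frequency test described above can be illustrated with a minimal Python sketch. It is not the implementation of any of the cited monitoring frameworks: the function names, the evenly spaced test periods, and the software pass/fail check (a stand-in for the result a hardware test would report) are assumptions made purely for illustration.

```python
def sweep_periods(t0, t0_gb, n_tests):
    """Return n_tests clock periods strictly between t0 and t0_gb, shortest first.

    Even spacing is an assumption; the text only requires each test period
    to fall within the T0 to T0+GB range.
    """
    step = (t0_gb - t0) / (n_tests + 1)
    return [t0 + step * (i + 1) for i in range(n_tests)]


def bracket_path_delay(path_delay, t0, t0_gb, n_tests):
    """Bracket a path's current delay from pass/fail tests at several periods.

    A test at clock period T passes when the path delay fits within T.
    Returns (last_fail, first_pass): the delay lies above last_fail and at or
    below first_pass. Either value is None when no such test result exists.
    """
    last_fail, first_pass = None, None
    for period in sweep_periods(t0, t0_gb, n_tests):
        if path_delay <= period:  # hypothetical stand-in for the hardware test
            first_pass = period
            break
        last_fail = period
    return last_fail, first_pass
```

With more test periods the bracket tightens, at the cost of more test time per monitored path. A result with no passing period means the path delay is already close to the guardbanded clock period.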
We first provide an overview of the algorithmic steps for
the proposed CLDF approach. Detailed description of the
key steps will follow immediately.
Step 1. The circuit is first synthesized using conventional
design flow. Performance, power, and area constraints are
provided as inputs to the synthesis tool. The synthesis tool
generates the implementation of the design and an initial
static timing report that shows the timing margin of each
circuit path. The first step in CLDF takes this synthesized
design as input and sorts all the circuit paths in the timing
report based on their timing margin. It then selects some
number of paths, say nLong, with least timing margin. These
nLong paths are further analyzed in the rest of the steps.
Step 2. The second step is where the cross-layer aspect of
design flow comes into effect. In this step, CLDF selects a
representative set of workloads and runs them on the
synthesized design. Utilization profile of the nLong paths
selected in step 1 is collected. The profile provides
information regarding how frequently each path has been
exercised during the execution of the selected workloads.
Step 3. One of the four approaches discussed in subsection
II.C is used to select two groups of paths from the nLong
paths: (a) paths to be optimized further and (b) paths to be
monitored at runtime.
Step 4. Paths selected in group 3(a) are optimized to be
faster, which results in more timing margin for these paths.
By optimizing paths in 3(a) the approach creates a distinct
separation of timing criticality between the two path groups.
This separation causes paths in the group 3(b) to be wearout-
critical paths that allow for robust monitoring. It should be
noted that groups 3(a) and (b) are not mutually exclusive and
depending on the approach selected by the CLDF framework
there might be paths which are in both groups and are
optimized and also selected for being monitored.
Step 5. This step collects necessary data to enable robust
runtime monitoring of paths in group 3(b). This step is
dependent on the monitoring framework used. For example,
if a runtime wearout monitoring approach such as [14] is used, the
input vectors that would sensitize the paths in group 3(b) are
created in this step. These inputs are then stored in a test
vector repository to enable runtime monitoring. If
approaches like [10, 20] are used for runtime monitoring,
then location of the paths in group 3(b) and their structure
should be stored so that canary circuits can be designed for
them or wearout sensors can be inserted at appropriate
locations. As stated earlier, in this paper we assume a
monitoring approach based on test vector injection for path
testing [14] is used.
A. Step 1: Selection of the Analysis Group
The first step in CLDF is to use a traditional synthesis
tool to synthesize the design and perform static timing
analysis. The hardware description language (HDL) code for
the design in addition to performance, area, and power
constraints are provided as inputs to the synthesis tool. The
output of this initial synthesis will be the gate-level
implementation and a timing report that indicates the amount
of timing margin for each circuit path. CLDF then generates
a sorted path list based on timing margin and selects a group
of nLong longest paths (paths with the least timing margin).
These paths are considered for optimization and/or runtime
monitoring as we will describe later. The selection of nLong
paths is done as follows.
CLDF selects nLong paths based on an initial cut-off
criterion (InitCutoff) given as input to the algorithm. CLDF
selects only those paths whose delay is at least InitCutoff
percent of the maximum path delay. For example if the
delay of the longest path in the circuit is 10ns and if
InitCutoff is selected as 75%, then CLDF picks all paths with
delay of 7.5ns or higher; this approach ensures that all paths
whose delay is at least 75% of the worst-case delay are selected for analysis.
The cutoff parameter is selected by the designer based on the
worst-case wearout expected in a design within the typical
lifetime of the processor. It has been shown in prior studies
that all wearout causing phenomena, such as NBTI, and
Electromigration, reach a maximum wearout level beyond
which they cause device failure [6, 7]. In fact, this
knowledge is what is used by conservative circuit design
approaches for selecting a guardband to prevent premature
failures; when a designer selects a 10% guardband the
assumption is that no path with more than 10% timing
margin will fail before the expected lifetime of the processor.
Hence, InitCutoff is simply the conservative guardband that
has already been estimated at design time.
Figure 2. Path delay distribution (a) before optimization and after (b)
Approach 1 (c) Approach 2 (d) Approach 3 (e) Approach 4.
TABLE I. COMPARISON OF APPROACHES
Path         Path Device   App. 1       App. 2       App. 3       App. 4
Utilization  Count         Opt.  Mon.   Opt.  Mon.   Opt.  Mon.   Opt.  Mon.
High         High (1)      No    Yes    Yes   Yes    Yes   Yes    Yes   Yes
High         Low (2)       Yes   No     No    Yes    No    No     No    Yes
Low          High (3)      Yes   No     No    Yes    No    No     No    No
Low          Low (4)       Yes   No     Yes   No     No    No     No    No
It is worth noting that for circuits with steep critical path
timing walls using InitCutoff may result in selection of a
large group of paths for further analysis, thereby making
nLong a very large number. Large nLong values do not
create any impediment in the next steps of CLDF algorithm.
Similarly, for circuits with shallow critical path timing walls
nLong may be small. If nLong is too small (smaller than the
number of paths which can be monitored efficiently), then
there is no need to even conduct further analysis since the
circuit does not have many critical paths and it may be
possible to monitor all critical paths without further analysis
or need for CLDF. The main goal of this work, however, is
to make circuits with steep critical path timing walls (large
nLong values) still amenable for monitoring.
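The InitCutoff selection of Step 1 can be sketched in a few lines of Python. This is an illustration only: the function name, the dictionary input format, and the path names are hypothetical, and the cutoff is expressed as a fraction (0.75 for 75%) rather than a percentage.

```python
def select_analysis_group(path_delays, init_cutoff):
    """Select the nLong analysis group from a static timing report.

    path_delays: dict mapping a (hypothetical) path name to its delay in ns.
    init_cutoff: fraction of the maximum path delay, e.g. 0.75 for 75%.
    Returns the selected path names, longest delay first.
    """
    max_delay = max(path_delays.values())
    threshold = init_cutoff * max_delay
    # Following the 10ns / 75% example, a path exactly at the threshold is kept.
    selected = [(name, d) for name, d in path_delays.items() if d >= threshold]
    selected.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in selected]


# The example from the text: longest path 10ns, InitCutoff 75%,
# so every path with a delay of 7.5ns or higher is selected.
delays_ns = {"p0": 10.0, "p1": 9.2, "p2": 7.5, "p3": 7.4, "p4": 3.0}
n_long_paths = select_analysis_group(delays_ns, 0.75)
```

For a circuit with a steep critical path wall, most entries in the timing report would clear the threshold and the returned list would be long, which is the case the later steps are meant to handle.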
B. Step 2: Utilization Based Path Prioritization
Step 2 generates utilization profile of nLong paths. The
utilization data is collected while executing a representative
set of applications that are expected to run on the design.
During execution of representative applications the number
of times each of the nLong paths is utilized is saved. Then
nLong paths are sorted based on the cumulative number of
times each path was utilized during profile runs; we call this
sorted list the utilization profile.
CLDF takes a HighUtilCutoff parameter as input and uses it
to identify paths whose utilization is greater than
HighUtilCutoff percent of the maximum utilization reported
for the nLong paths. These paths are demarcated as high
utilization paths. CLDF also uses a LowUtilCutoff parameter
and any path with utilization lower than this cutoff is
demarcated as a low utilization path. The rationale behind
using two cutoffs is to create two distinct groups of paths
with very different utilization levels. As explained shortly,
this clear separation between high and low utilization is used
to create robust and efficient monitoring mechanisms.
Timing degradation of a circuit path is a sum of the
degradation of all the devices on that path. Hence, if all other
parameters are the same, more devices on a path result in
more susceptibility to wearout induced timing degradation.
As such CLDF uses device counts on a path to further
differentiate between paths. CLDF uses a single input
parameter called DevCutoff to demarcate paths with high or
low device counts. The criteria for specifying these
parameters are described later.
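The demarcation by the three cutoff parameters can be sketched as follows. The sketch makes several assumptions that the text does not settle: LowUtilCutoff is treated, like HighUtilCutoff, as a fraction of the maximum utilization; DevCutoff is treated as an absolute device count; a path whose device count equals DevCutoff is labeled low; and the `PathProfile` type and `classify_paths` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathProfile:
    name: str
    utilization: int   # times the path was exercised during profiling runs
    device_count: int  # number of devices on the path


def classify_paths(paths, high_util_cutoff, low_util_cutoff, dev_cutoff):
    """Label each path with a (utilization class, device count class) pair.

    high_util_cutoff and low_util_cutoff are fractions of the maximum
    utilization among the given paths (assumption for the low cutoff).
    dev_cutoff is an absolute device count (assumption).
    """
    max_util = max(p.utilization for p in paths)
    labels = {}
    for p in paths:
        if p.utilization > high_util_cutoff * max_util:
            util = "high"
        elif p.utilization < low_util_cutoff * max_util:
            util = "low"
        else:
            util = "other"  # between the two cutoffs
        dev = "high" if p.device_count > dev_cutoff else "low"
        labels[p.name] = (util, dev)
    return labels
```

Using two utilization cutoffs instead of one leaves a middle band of paths that are neither clearly high nor clearly low, which is what gives the two labeled groups distinctly different utilization levels.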
After gathering the utilization profile, CLDF divides
nLong paths into three categories based on HighUtilCutoff,
LowUtilCutoff, and DevCutoff. Timing margin of one
category of paths will be increased; we refer to these as the
optimized group. Another category contains those paths that
are monitored for wearout at runtime, which is referred to as
the monitored group. The third category contains paths that
are neither optimized nor monitored. We have explored four
path categorization algorithms in this research. These
algorithms provide different tradeoffs between performance,
power, area, and reliability.
Illustrative example: While describing the four
algorithms, we will use an illustrative example to show how
path categorization is done. For this purpose in Fig. 2(a) we
show the initial delay distribution of a sample circuit taken
from OpenSPARC T1 processor [21]. This sample circuit is
the instruction decode block of the instruction fetch unit
(sparc_ifu_dec). Section IV presents more quantitative
details for this circuit but they are not necessary here for
understanding the algorithms. The timing constraint used for
synthesis is 0.95ns ($T_0$, the zero timing margin path delay). We
assume there is a 0.09ns timing guardband added by the
designer to deal with wearout. Hence, the resulting system
clock period is 1.04ns ($T_{0+GB}$). In this discussion we assume
that we use 90% as the InitCutoff value. Hence, we select as the
nLong paths those whose delay is at least 90% of the longest path delay.
All paths in the right most five columns of Fig. 2(a) form the
nLong paths. There are three types of paths highlighted with
shades of black in Fig. 2(a): high utilization & high device
count, low utilization & low device count, and all other paths.
The group marked high utilization & high device count are
the paths that have utilization that exceeds the
HighUtilCutoff and device count that exceeds the DevCutoff
parameters. Similarly, low utilization & low device count are
the paths that have utilization that is below the
LowUtilCutoff and device count that is below the DevCutoff
parameter.
Intuitively, the separation of paths into three types based
on utilization and device count provides an opportunity to
shift steep critical timing walls by not treating all paths with
the same timing margin as equally important. Instead we
create path heterogeneity with device count and utilization
information derived from application level information. By
exploiting this crucial runtime information through design
time utilization analysis we can avoid critical path timing
walls, as we will show in the next step.
C. Step 3: Approaches for Selecting Monitored Paths
The output from this step is the identification of paths
that are used for monitoring. We assume that a designer has a
fixed budget to monitor only nMonitor paths (based on the
area, power, and performance budget allocated for
monitoring). Hence, the goal is to select a total of nMonitor
paths. In this section we describe four approaches that we
designed for path selection.
1) Approach 1: Monitor Least Reliable
The goal of this approach is to create a distinct group of
paths which, with high probability, are the paths that are
going to have wearout induced timing failure before the rest
of the paths. These paths will be monitored and used as
predictors of imminent timing violations. Approach 1
achieves this goal by reshaping the path delay distribution of
the circuit as follows. A group of paths that are most
susceptible to wearout are selected for monitoring.
Concurrently, all the paths that are not monitored are
removed from the critical path wall by increasing the timing
margin of these paths. Since paths that are not monitored
have higher timing margin, the probability of an unmonitored
path failing before the monitored group is reduced. Fig.
2(a) shows the distribution of a sample circuit before using
Approach 1 and Fig. 2(b) shows the redistribution of the
paths after applying Approach 1. The paths with the most
delay in the redistributed plot, highlighted in black on Fig.
2(b), are the group left for monitoring while all other paths
are moved away from the critical path wall. Detailed
description of Approach 1 is given below.
Paths optimized: This approach starts with the
utilization profile generated in Step 2 of the algorithm, which
sorts nLong paths based on path utilization. The
HighUtilCutoff parameter is used to select paths with high
utilization, i.e. paths with utilization greater than the cutoff
parameter. We then sort the high utilization paths based on
the number of devices on each path. We further divide this
newly sorted list by using DevCutoff parameter and identify
the high device count and low device count paths. At the end
of this process we end up with three sets of paths: high
utilization & high device count, high utilization & low device
count, and the remaining paths without any concern for their
device count. We then separate high utilization & high
device count paths from the nLong paths. The remaining
paths (nLong paths excluding high utilization & high device
count paths) are optimized to have a larger timing margin.
The increase in the margin is equal to the initial circuit
guardband. Path optimization is done by resynthesizing the
design using stricter timing constraint for the paths selected.
The delay of the optimized paths can be reduced, for
instance, by increasing the size of devices used on these
paths. Since the optimized paths have more timing margin
they are also significantly less likely to cause timing
violations.
Paths monitored: The high utilization & high device
count paths which are not optimized (black bars in Fig. 2(b))
will form the set of paths which are going to be continuously
monitored for wearout. These paths have a higher probability
of suffering the most wearout. These paths are utilized more
frequently and utilization has a first order effect on many of
the wearout causing phenomena. These paths also have more
devices on them and are more susceptible to timing
degradation caused by wearout of their devices. Runtime
monitoring would check the path delay degradation of these
paths between $T_0$ and $T_{0+GB}$ and will alert the system if any
monitored path delay gets critically close to $T_{0+GB}$.
Discussion of Approach 1: The goal is to select a total
of nMonitor paths where all paths have the characteristic of
high utilization & high device count. Our main motivation
for using HighUtilCutoff selection criteria is to pick a subset
of nLong paths with a distinctly higher utilization compared
to the rest of the nLong paths in that circuit. To satisfy this
goal HighUtilCutoff can be selected in the range of 75% to
85% of the maximum utilization in the nLong path group. If
a smaller percentage is selected, the relative utilization
difference between the paths selected and the ones not
selected would become smaller and hence the goal of
leveraging utilization differences between paths will not be
satisfied.
A few special cases are worth mentioning. First, if the
number of paths in the high utilization & high device count
category exceeds the monitoring budget, we simply
select the most utilized nMonitor paths from this category
and optimize the remaining paths, including those left in this
category. On the other hand, in some circuits the number of paths
categorized as high utilization & high device count, after
applying HighUtilCutoff and DevCutoff, may be fewer than
nMonitor. In this case we fill the remaining monitoring slots
from the high utilization & low device count
category as well, thereby removing these paths from further
optimization.
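The path selection of Approach 1, including the two special cases above, can be sketched in Python as follows. This is a simplified illustration, not the authors' tool flow: the `PathProfile` type and `approach1` function are hypothetical names, DevCutoff is assumed to be an absolute device count, HighUtilCutoff is given as a fraction of the maximum utilization, and filling from the high utilization & low device count category in order of utilization is an assumption, since the text does not state an order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathProfile:
    name: str
    utilization: int   # times the path was exercised during profiling runs
    device_count: int  # number of devices on the path


def approach1(paths, high_util_cutoff, dev_cutoff, n_monitor):
    """Split the nLong paths into (monitored, optimized) name lists.

    Monitored paths keep their original timing margin; all other nLong
    paths are marked for resynthesis with a stricter timing constraint.
    """
    max_util = max(p.utilization for p in paths)
    by_util = sorted(paths, key=lambda p: p.utilization, reverse=True)
    high_util = [p for p in by_util if p.utilization > high_util_cutoff * max_util]
    high_dev = [p for p in high_util if p.device_count > dev_cutoff]
    low_dev = [p for p in high_util if p.device_count <= dev_cutoff]

    # Case 1: more candidates than budget, keep only the most utilized ones.
    monitored = high_dev[:n_monitor]
    # Case 2: too few candidates, fill from high utilization & low device count.
    if len(monitored) < n_monitor:
        monitored += low_dev[:n_monitor - len(monitored)]
    # If both categories together are smaller than n_monitor, fewer paths
    # are monitored; the text does not describe this situation.

    monitored_names = {p.name for p in monitored}
    optimized = [p.name for p in paths if p.name not in monitored_names]
    return [p.name for p in monitored], optimized
```

Note how a smaller `n_monitor` enlarges the optimized list, which is the source of the area overhead discussed next.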
It should be noted that the goal of this work is to deal
with circuits which have many more paths than the
nMonitor. If the paths selected to be in the nLong path group
are fewer than nMonitor paths, then it is not necessary to use
the CLDF approach and all paths in the nLong group can
simply be monitored.
The value used for nMonitor has a direct impact on the
area overhead of Approach 1. If nMonitor is small then the
number of paths which are not monitored will be large and
hence the area overhead of the optimization is going to
increase. Recall that all the paths in nLong group that are not
monitored will be optimized, which usually requires
increasing device sizes. Furthermore, paths optimized with
larger device sizes also lead to higher dynamic power