Adaptive Chip-Package Thermal Analysis for Synthesis and Design
Yonghong Yang†, Zhenyu (Peter) Gu‡, Changyun Zhu†, Li Shang†, Robert P. Dick‡
†Queen's University, Kingston, ON K7L 3N6, Canada
{4yy6, 4cz1}@qlink.queensu.ca, firstname.lastname@example.org
‡Northwestern University, Evanston, IL 60208, U.S.A.
{zgu646, dickrp}@ece.northwestern.edu
Abstract

Ever-increasing integrated circuit (IC) power densities and peak temperatures threaten reliability, performance, and economical cooling. To address these challenges, thermal analysis must be embedded within IC synthesis. However, detailed thermal analysis requires accurate three-dimensional chip-package heat flow analysis. This has typically been based on numerical methods that are too computationally intensive for numerous repeated applications during synthesis or design. Thermal analysis techniques must be both accurate and fast for use in IC synthesis.

This article presents a novel, accurate, incremental, self-adaptive, chip-package thermal analysis technique, called ISAC, for use in IC synthesis and design. It is common for IC temperature variation to strongly depend on position and time. ISAC dynamically adapts spatial and temporal modeling granularity to achieve high efficiency while maintaining accuracy. Both steady-state and dynamic thermal analysis are accelerated by the proposed heterogeneous spatial resolution adaptation and temporally decoupled element time marching techniques. Each technique enables orders of magnitude improvement in performance while preserving accuracy when compared with other state-of-the-art adaptive steady-state and dynamic IC thermal analysis techniques. Experimental results indicate that these improvements are sufficient to make accurate dynamic and static thermal analysis practical within the inner loops of IC synthesis algorithms. ISAC has been validated against reliable commercial thermal analysis tools using industrial and academic synthesis test cases and chip designs. It has been implemented as a software package suitable for integration in IC synthesis and design flows and has been publicly released.
1. Introduction

Integrated circuit (IC) densities and performance requirements are continuously increasing. The crucial task of managing the resulting increase in power density and peak IC temperature is becoming more difficult. Current architectural-level design automation and synthesis tools have multiple design metrics, such as power consumption,
temperature, performance, cost, and reliability. IC designs must care-
fully trade off these metrics. However, if not properly addressed, in-
creased IC temperature affects other design metrics including perfor-
mance (via decreased transistor switching speed and increased inter-
connect latency), power and energy consumption (via increased leak-
age power), reliability (via electromigration, hot carrier effects, ox-
ide thermal breakdown, etc.), and price (via increased system cooling
cost). Considering thermal issues during IC synthesis and design is
now necessary. When determining the impact of each decision in the
synthesis or design process, the impacts of changed thermal profile on
performance, power, price, and reliability must be considered. This
requires repeated use of fast, accurate thermal analysis tools during synthesis.
The IC thermal analysis problem may be separated into two
subproblems: steady-state (or static) analysis and dynamic analysis.
Steady-state analysis determines the temperature profile to which an
IC converges as time approaches infinity, given power and thermal con-
ductivity profiles. Dynamic thermal analysis determines the tempera-
ture profile of an IC at any time given an initial temperature, power,
heat capacity, and thermal conductivity profiles.
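These two subproblems can be illustrated on a single lumped thermal element (an illustrative sketch with made-up parameter values, not ISAC's solver): steady-state analysis finds the fixed point of the heat-balance equation, while dynamic analysis marches the same equation through time.

```python
# One lumped thermal element: power P (W) flows into heat capacity C
# (J/K), which leaks to ambient through thermal resistance R (K/W).
# Temperatures are rises above ambient.  Values are illustrative.
P, R, C = 2.0, 10.0, 0.5

# Steady-state analysis: set dT/dt = 0 in C*dT/dt = P - T/R, so T = P*R.
T_ss = P * R

# Dynamic analysis: march the same equation through time (forward Euler).
T, dt = 0.0, 0.01                  # initial temperature rise, step (s)
for _ in range(20000):             # simulate 200 s >> R*C = 5 s
    T += dt * (P - T / R) / C
# The dynamic trajectory converges to the steady-state temperature.
```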
This work is supported in part by the NSERC Discovery Grant #388694-01, and in part by
the NSF under award CNS-0347941.
Numerical analysis techniques were also proposed to character-
ize the thermal profile of on-chip interconnect layers [3–5]. Recently,
Skadron et al. developed steady-state and dynamic thermal analysis
tools for microarchitectural evaluation. Neither the matrix techniques
of the steady-state analysis tool nor the lock-step fourth-order
Runge-Kutta time marching technique used for dynamic analysis makes
use of spatial or asynchronous temporal adaptation; accuracy or perfor-
mance suffers. Researchers have proposed quad-tree mesh refinement
for thermal analysis, but did not consider local temporal adapta-
tion. Li et al. proposed an efficient multigrid modeling technique to
conduct full-chip steady-state thermal analysis. Although the ad-
vantage of heterogeneous element discretization is noted, no system-
atic adaptation method is provided. Zhan and Sapatnekar proposed
a steady-state thermal analysis method based on Green's function that
was accelerated by using discrete cosine transforms and look-up
tables. However, these methods do not support dynamic thermal analysis.
Existing IC thermal analysis tools are capable of providing either
accuracy or speed, but not both. Accurate thermal analysis requires
expensive computation for many elements in some regions, at some
times. Conventional IC thermal analysis techniques ensure accuracy
by choosing uniformly fine levels of detail across time and space, i.e.,
they use equivalent physical sizes or time step durations for all thermal
elements. The large number of elements and time steps resulting from
such techniques makes them computationally intensive and, therefore,
impractical for use within IC synthesis. This article presents validated,
synthesis-oriented IC thermal analysis techniques that differ from ex-
isting work by doing operation-by-operation dynamic adaptation of
temporal and spatial resolution in order to dramatically reduce com-
putational overhead without sacrificing accuracy. Experimental results
indicate that the proposed spatial adaptation technique improves CPU
time by 21.64–690.00? and that the temporal adaptation technique im-
proves CPU time by 122.81–337.23 ?. Although much faster than con-
ventional analysis techniques, the proposed techniques have been de-
signed for accuracy even when this increases complexity and run time,
e.g., by correctly modeling the dependence of thermal conductivity
on temperature. These algorithms have been validated against FEMLAB,
a reliable commercial finite element physical process modeling
package, and a high-resolution spatially and temporally homogeneous
initial value problem solver. Experimental results indicate that using
existing thermal analysis techniques within IC synthesis flow would
increase CPU time by many orders of magnitude, making it imprac-
tical to synthesize complex ICs. The proposed techniques make both
dynamic and static thermal analysis practical within the inner loop of
IC synthesis algorithms. They have been implemented as a software
tool called ISAC that has been publicly released.
This article is organized as follows. Section 2 gives a motivating
example, which illustrates the need for fast and accurate thermal analy-
sis during IC synthesis and suggests techniques to reach this goal. Sec-
tion 3 describes the model, algorithms, and implementation of ISAC, a
fast and accurate steady-state and dynamic thermal analysis tool. Sec-
tion 4 presents experimental results validating ISAC and demonstrating
the dramatic performance advantages resulting from spatial and tempo-
ral adaptation during thermal analysis. Section 5 presents conclusions.
2. Motivating Examples
In this section, we use a thermal-aware IC synthesis flow to demon-
strate the challenges of fast and accurate IC thermal modeling.

3-9810801-0-6/DATE06 © 2006 EDAA

Figure 1. Thermal-aware synthesis flow.

Figure 2. Thermal analysis during IC synthesis: (a) silicon chip and package; (b) temperature profile for active layer and heatsink.

Figure 1 shows an integrated behavioral-level and physical-level IC
synthesis system. This synthesis system uses a simulated annealing
algorithm to jointly optimize several design metrics, including perfor-
mance, area, power consumption, and peak IC temperature. It con-
ducts both behavioral-level and physical-level stochastic optimization
moves, including scheduling, voltage assignment, resource binding,
floorplanning, etc. An intermediate solution is generated after each
optimization move. A detailed two-dimensional power profile is then
reported based on the physical floorplan. Thermal analysis algorithms
are invoked to guide optimization moves.
As illustrated by the example synthesis flow, for each intermediate
solution, detailed thermal characterization requires full chip-package
thermal modeling and analysis using numerical methods, which are
computationally intensive. Figure 2 shows a full chip-package ther-
mal modeling example from an IBM IC design (see Section 4.1 for
more detail). The steady-state thermal profile of the active layer of the
silicon die in conjunction with the top layer of the cooling package,
shown in Figure 2(b), was characterized using a multigrid thermal
solver by partitioning the chip and the cooling package into 131,072
homogeneous thermal elements. Without spatial and temporal adapta-
tion, the solver requires many seconds or minutes when run on a high-
performance workstation. Compared to steady-state thermal modeling,
characterizing an IC's dynamic thermal profile is even more time consum-
ing. IC synthesis requires a large number of optimization steps; ther-
mal modeling can easily become its performance bottleneck.
A key challenge in thermal-aware IC synthesis is the develop-
ment of fast and accurate thermal analysis techniques. Fundamentally,
IC thermal modeling is the simulation of heat transfer from heat pro-
ducers (transistors and interconnect), through silicon die and cooling
package, to the ambient environment. This process is modeled with
partial differential equations. In order to approximate the solutions of
these equations using numerical methods, finite discretization is used,
i.e., an IC model is decomposed into numerous three-dimensional el-
ements. Adjacent elements interact via heat diffusion. Each element
is sufficiently small to permit its temperature to be expressed as a dif-
ference equation, as a function of time, its material characteristics, its
power dissipation, and the temperatures of its neighboring elements.
In an approach analogous to electric circuit analysis, thermal RC
(or R) networks are constructed to perform dynamic (or steady-state)
thermal analysis. Direct matrix operations, e.g., inversion, may be used
for steady-state thermal analysis. However, the computational demand
of this technique hinders its use within synthesis. Dynamic thermal
analysis may be conducted by partitioning the simulation period into
small time steps. The local times of all elements are then advanced, in
lock-step, using transient temperature approximations yielded by difference equations.

Figure 3. The potential of adaptive thermal modeling: (a) inter-element thermal gradient and (b) normalized maximum step size, each plotted against the number of elements.

The computation complexity of dynamic thermal
analysis is a function of the number of grid elements and time steps.
Therefore, to improve the efficiency of thermal modeling, the key issue
is to optimize the spatial and temporal modeling granularity, eliminat-
ing non-essential elements and stages.
There is a tension between accuracy and efficiency when choosing
modeling granularity. Increasing modeling granularity reduces analy-
sis complexity but may also decrease accuracy. Uniform temperature is
assumed within each thermal element. Intra-element thermal gradients
are neglected. Therefore, increasing spatial modeling granularity nat-
urally increases modeling errors. Similarly, increasing time step size
may result in failure to capture transient thermal fluctuation or may in-
crease truncation error when the actual temperature functions of some
elements are of higher order than the difference equations used to approximate them.
IC thermal profiles contain significant spatial and temporal varia-
tion due to the heterogeneity of thermal conductivity and heat capacity
in different materials, as well as varying power profiles resulting from
non-uniform functional unit activities, placements, and schedules. Fig-
ure 3(a) shows the inter-element thermal gradient distribution using ho-
mogeneous meshing of the example shown in Figure 2. The histogram
is normalized to the smallest inter-element thermal gradient. This fig-
ure contains a wide distribution of thermal gradients: heterogeneous
spatial element discretization refinement based on thermal gradients
has the potential to improve performance without impacting accuracy.
For dynamic thermal simulation, the size of each thermal ele-
ment's time steps should permit accurate approximation by the element
difference equations. An IC may experience different thermal fluctu-
ations at different locations. Therefore, the best sizes of time steps
for elements at different locations may vary. Figure 3(b) shows the
maximum potential time step size of each individual block based on local
thermal variation; local adaptation of time step sizes has the potential
to improve performance without impacting accuracy.
3. Thermal Analysis Model and Algorithms
This section gives details on the proposed thermal analysis techniques.
3.1. IC Thermal Analysis Problem Definition
IC thermal analysis is the simulation of heat transfer through heteroge-
neous material among heat producers (e.g., transistors) and heat con-
sumers (e.g., heat sinks attached to IC packages). Modeling thermal
conduction is analogous to modeling electrical conduction, with ther-
mal conductivity corresponding to electrical conductivity, power dissi-
pation corresponding to electrical current, heat capacity corresponding
to electrical capacitance, and temperature corresponding to voltage.
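As a concrete instance of this analogy, the lumped values of a single element follow directly from its geometry and material properties (the silicon constants below are approximate textbook values, used only for illustration):

```python
# Lumped thermal element values from geometry and material properties,
# mirroring the electrical analogy: R = L / (k A) in K/W plays the role
# of electrical resistance, C = rho * c_p * V in J/K that of capacitance.
k_si, rho_si, cp_si = 148.0, 2330.0, 712.0   # W/(m K), kg/m^3, J/(kg K)

dx = dy = dz = 100e-6            # a 100 um cubic element
R = dx / (k_si * dy * dz)        # conduction resistance across the element
C = rho_si * cp_si * dx * dy * dz
tau = R * C                      # element thermal time constant, s
```

The product R·C is the element's thermal time constant, which is why different regions of a chip-package model can tolerate very different time step sizes.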
The equation governing heat diffusion via thermal conduction in
an IC follows:

ρ c_p ∂T(r⃗, t)/∂t = ∇ · (k(r⃗) ∇T(r⃗, t)) + p(r⃗, t)    (1)

In Equation 1, ρ is the material density; c_p is the mass heat ca-
pacity; T(r⃗, t) and k(r⃗) are the temperature and thermal conductivity
of the material at position r⃗ and time t; and p(r⃗, t) is the power den-
sity of the heat source. Note that, in reality, the thermal conductivity,
k, also depends on temperature (see Section 3.5). ISAC supports arbi-
trary heterogeneous thermal conduction models. For example, a model
may be composed of a heat sink in a forced-air ambient environment,
heat spreader, bulk silicon, active layer, and packaging material or any
other geometry and combination of materials.
Figure 4. Overview of ISAC.
In order to do numerical thermal analysis, a seven point finite dif-
ference discretization method can be applied to the left and right side
of Equation 1, i.e., the IC thermal behavior may be modeled by decom-
posing it into numerous cubic elements, which may be of non-uniform
sizes. Adjacent elements interact via heat diffusion. Each element
has a power dissipation, temperature, thermal capacitance, as well as a
thermal resistance to adjacent elements. The discretized equation at an
interior point of a homogeneous material follows.
ρ c_p V (T_{i,j,k}(t + ∆t) − T_{i,j,k}(t)) / ∆t =
  G_x (T_{i−1,j,k} + T_{i+1,j,k} − 2 T_{i,j,k})
+ G_y (T_{i,j−1,k} + T_{i,j+1,k} − 2 T_{i,j,k})
+ G_z (T_{i,j,k−1} + T_{i,j,k+1} − 2 T_{i,j,k}) + V p_{i,j,k}    (2)

Given that ∆x, ∆y, and ∆z are the discretization steps in dimen-
sions x, y, and z, V = ∆x∆y∆z. G_x, G_y, and G_z are the thermal con-
ductivities between adjacent elements. They are defined as
G_x = k∆y∆z/∆x, G_y = k∆x∆z/∆y, and G_z = k∆x∆y/∆z. ∆t is the dis-
cretization step in time t. For steady-state analysis, the left term in
Equation 2, expressing temperature variation as a function of time, t, is
dropped. For either the dynamic or steady-state version of the problem,
the equations for all IC elements can be represented as a matrix. Al-
though it is possible to directly solve this problem, the computational
expense is prohibitive.
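For intuition, the discretization can be sketched for a one-dimensional chain of identical elements marched with forward-Euler time steps (a simplified illustration with made-up conductance and capacity values, not ISAC's solver):

```python
# Forward-Euler time marching of the discretized heat equation for a
# 1-D chain of N identical elements.  Element 0 dissipates power; heat
# exits past element N-1 into the ambient (temperature rise 0).
N = 5
G = 1.0e-3                        # inter-element thermal conductance, W/K
C = 1.0e-6                        # per-element heat capacity, J/K
P = [1.0e-3] + [0.0] * (N - 1)    # only element 0 dissipates (1 mW)
T = [0.0] * N
dt = 1.0e-4                       # s, well under the stability limit C/(2G)

for _ in range(100000):           # march 10 s of simulated time
    Tn = T[:]
    for i in range(N):
        left = T[i - 1] if i > 0 else T[i]       # adiabatic left wall
        right = T[i + 1] if i < N - 1 else 0.0   # ambient beyond the end
        Tn[i] = T[i] + dt * (G * (left - T[i]) + G * (right - T[i]) + P[i]) / C
    T = Tn
# In steady state, the 1 mW flows through every link toward ambient, so
# the temperature drops by P/G = 1 K per element: T = [5, 4, 3, 2, 1].
```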
3.2. ISAC Overview
Figure 4 gives an overview of ISAC, our proposed incremental, self-
adaptive, chip-package, thermal analysis tool. When used for steady-
state thermal analysis, it takes, as input, a three-dimensional chip and
package thermal conductivity profile, as well as a power dissipation
profile. A multigrid incremental solver is used to progressively refine
thermal element discretization to rapidly produce a temperature profile.
When used for dynamic thermal analysis, in addition to the input
data required for steady-state analysis, ISAC requires the chip-package
heat capacity profile. In addition, it may accept an initial temperature
profile and efficient element grid. If these inputs are not provided, the
dynamic analysis technique uses the steady-state analysis technique to
produce its initial temperature profile and element grid. It then re-
peatedly updates the local temperatures and times of elements at asyn-
chronous time steps, appropriately adapting the step sizes of neighbors
to maintain accuracy.
As described in Section 3.5, after analysis is finished, the temper-
ature profile is adapted using a feedback loop in which thermal con-
ductivity is modified based upon temperature. Upon convergence, the
temperature profile is reported to the IC synthesis tool or designer.
3.3. Spatial Adaptation in Thermal Analysis
In this section, we present an efficient technique for adapting ther-
mal element spatial resolution for thermal analysis. This technique
uses incremental refinement to generate a tree of heterogeneous paral-
lelepipeds that supports fast thermal analysis without loss in accuracy.
Within ISAC, this technique is incorporated with an efficient multigrid
Algorithm 1 hybrid_tree_traversal(node_root)
1: if node_root is a leaf node then
2:   Add node_root to contour_finest_level; return finest_level
3: end if
4: for each intermediate child chi_node_i do
5:   level_chi_node_i = hybrid_tree_traversal(chi_node_i)
6:   level_min = min(level_min, level_chi_node_i)
7: end for
8: for each intermediate child chi_node_i do
9:   if level_chi_node_i > level_min then
10:    Add chi_node_i to contour_level_chi_node_i
11:  end if
12: end for
13: Add node_root to contour_level_min
14: return level_min − 1
Figure 5. Heterogeneous spatial resolution adaptation.
numerical analysis method, yielding a complete steady-state thermal
analysis solution. Dynamic thermal analysis also benefits from the pro-
posed spatial adaptation technique due to the dramatic reduction of the
number of grid elements that must be considered during time marching.
3.3.1. Hybrid Data Structure. Efficient spatial adaptation in thermal
analysis relies on sophisticated data structures, i.e., it requires the effi-
cient organization of large data sets, representation of multi-level mod-
eling resolutions, and inter-level transition. The proposed technique
is supported by a hybrid oct-tree data structure, which provides an ef-
ficient and flexible representation to support spatial resolution adap-
tation. A hybrid oct-tree is a tree that maintains spatial relationships
among parallelepipeds in three dimensions. Each node may have up to
eight immediate children. Figure 5 shows a hybrid tree representation.
For the sake of simplicity, a two-dimensional quad-tree is shown instead
of a three-dimensional hybrid oct-tree. In the hybrid oct-tree, different
modeling resolutions are organized into contours along the tree hier-
archy, e.g., the contour formed by the leaf nodes represents the finest
spatial resolution (in this example, elements 2,3,6,7,...,12). Hetero-
geneous spatial resolution may result in a thermal element residing at
multiple resolution levels, e.g., element 2 resides at level 1, 2, and 3.
This information is represented as nodes existing in multiple contours
in the tree.
Spatial resolution adaptation requires two basic operations, parti-
tioning and coarsening. In a hybrid oct-tree, partitioning is the process
of breaking a leaf node along arbitrary orthogonal axes, e.g., nodes 9
and 10 result from refining node 4. Coarsening is the process of merg-
ing direct sub-nodes into their parent, e.g., nodes 11 and 12 merged into
node 5. To conduct multi-resolution thermal analysis, we proposed
an efficient contour search algorithm to determine the thermal grid
elements that belong to the same resolution level. As shown in
Algorithm 1, leaf nodes are assigned to the finest resolution level
(lines 1–3). The resolution level of a parent node of a subtree equals
the minimal resolution level of all of its intermediate children nodes
minus one (lines 4–7 and 13). An element may reside in multiple
resolution levels (lines 8–12). As will be explained later, this
algorithm provides an efficient solution to traverse different spatial
resolutions, thereby supporting efficient multigrid thermal analysis.
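The contour-labeling traversal can be sketched as follows (a simplified reading of the algorithm, using a binary tree for brevity; ISAC's hybrid oct-tree allows up to eight children per node):

```python
# Contour labeling for a spatial refinement tree: leaves sit at the
# finest level; a parent's level is the minimum of its children's
# levels minus one, and each node is recorded in a per-level contour.
FINEST_LEVEL = 3
contours = {}                          # resolution level -> node names

def hybrid_tree_traversal(node):
    name, children = node
    if not children:                   # leaf: finest spatial resolution
        contours.setdefault(FINEST_LEVEL, []).append(name)
        return FINEST_LEVEL
    level_min = min(hybrid_tree_traversal(c) for c in children)
    contours.setdefault(level_min, []).append(name)
    return level_min - 1

# Root splits into leaf "a" and internal node "b" with two leaf children.
tree = ("root", [("a", []), ("b", [("c", []), ("d", [])])])
root_level = hybrid_tree_traversal(tree)
```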
3.3.2. Multigrid Method. Since directly solving the system of linear
equations resulting from a large problem instance is intractable, more
efficient numerical methods are used to solve the heat diffusion prob-
lem. The multigrid method is an iterative method of solving (typically
sparse) systems of linear equations. It solves this problem by con-
structing a multi-level scheme, which greatly improves the efficiency
of removing low frequency solution errors common for conventional
iterative methods. A description of this technique is shown in Algorithm 2.
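The essence of the cycle can be sketched as a two-grid method for a one-dimensional Poisson problem (a textbook illustration, not ISAC's multigrid solver): relaxation smooths high-frequency error quickly, and the coarse-grid correction removes the low-frequency error that relaxation damps slowly.

```python
# Two-grid sketch of a multigrid cycle for the 1-D Poisson problem
# (2u_i - u_{i-1} - u_{i+1}) / h^2 = f_i with zero boundary values.
N = 31                         # fine-grid interior points
h = 1.0 / (N + 1)
f = [1.0] * N                  # constant source term

def apply_A(u, h):
    """Apply the 1-D Laplacian stencil with zero Dirichlet boundaries."""
    n = len(u)
    return [(2 * u[i]
             - (u[i - 1] if i > 0 else 0.0)
             - (u[i + 1] if i < n - 1 else 0.0)) / (h * h)
            for i in range(n)]

def jacobi(u, b, h, sweeps):
    """Weighted-Jacobi relaxation (the smoothing step)."""
    w = 2.0 / 3.0
    for _ in range(sweeps):
        Au = apply_A(u, h)
        u = [u[i] + w * (h * h / 2.0) * (b[i] - Au[i]) for i in range(len(u))]
    return u

def solve_direct(b, h):
    """Thomas algorithm: direct solve at the coarsest level."""
    n = len(b)
    diag, off = [2.0 / (h * h)] * n, -1.0 / (h * h)
    d = b[:]
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        d[i] -= m * d[i - 1]
    x = [0.0] * n
    x[-1] = d[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (d[i] - off * x[i + 1]) / diag[i]
    return x

def two_grid_cycle(u, b, h):
    u = jacobi(u, b, h, 3)                               # pre-smoothing
    r = [b[i] - a for i, a in enumerate(apply_A(u, h))]  # fine-grid residue
    rc = [(r[2*i] + 2*r[2*i+1] + r[2*i+2]) / 4.0         # restrict to coarse
          for i in range((len(r) - 1) // 2)]
    ec = solve_direct(rc, 2 * h)                         # coarse-grid solve
    e = [0.0] * len(u)                                   # prolong correction
    for i in range(len(u)):
        if i % 2 == 1:
            e[i] = ec[i // 2]
        else:
            left = ec[i // 2 - 1] if i // 2 - 1 >= 0 else 0.0
            right = ec[i // 2] if i // 2 < len(ec) else 0.0
            e[i] = (left + right) / 2.0
    u = [u[i] + e[i] for i in range(len(u))]
    return jacobi(u, b, h, 3)                            # post-smoothing

u = [0.0] * N
for _ in range(15):
    u = two_grid_cycle(u, f, h)
# u now approximates the exact solution u(x) = x(1 - x)/2 at the nodes.
```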
3.3.3. Incremental Analysis. Upon initialization, the steady-state ther-
mal analysis tool generates a coarse homogeneous oct-tree based on
the chip size. Iterative temperature approximation is repeated until
convergence to a stable profile. Elements across which temperature
varies by more than a user-specified threshold are further partitioned
into sub-elements. For each ordered element pair, (i, j), given that T_i
is the temperature of element i and that S is the temperature threshold,
the new number of elements, Q, along some partition g follows:

Q = ⌈ |T_i − T_j| / S ⌉    (3)

For each element, i, partitions along the three dimensions are gath-
ered into a three-tuple (x_i, y_i, z_i) that governs partitioning element i into
a hybrid sub oct-tree. The number of sub-elements depends on the
ratio of the temperature difference to the threshold. Therefore, some
elements may be further partitioned and local thermal simulation re-
peated. Simulation terminates when all element-to-element temper-
ature differences are smaller than the predefined threshold, S. This
method focuses computation on the most critical regions, increasing
analysis speed while preserving accuracy.
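Gradient-driven refinement of this kind can be sketched as follows, assuming the number of sub-elements along a partition is the ceiling of the pair's temperature difference over the threshold S (an assumption consistent with the description; the helper name is hypothetical):

```python
import math

# Gradient-driven refinement sketch: the number of sub-elements, Q,
# grows with the ratio of the pair's temperature difference to the
# user threshold S.  Q = ceil(|Ti - Tj| / S) is an assumed form.
def partition_count(Ti, Tj, S):
    return max(1, math.ceil(abs(Ti - Tj) / S))

S = 2.0                                   # temperature threshold, K
pairs = [(85.0, 84.5), (85.0, 79.0), (92.0, 80.0)]
counts = [partition_count(a, b, S) for a, b in pairs]
# Only pairs whose difference exceeds S trigger partitioning (Q > 1).
```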
3.4. Temporal Adaptation in Thermal Analysis
ISAC uses an adaptive time marching technique for dynamic thermal
analysis. This technique is loosely related to the adaptive Runge-Kutta
method described in Section 2. The computational cost of a finite
difference time marching technique is ∑_{e∈E} u_e c_e, where E is the set
of all elements, u_e is the number of time steps for a given element,
and c_e is the time cost per evaluation for that element.
Kutta methods, assuming a constant evaluation time and noting that all
elements experience the same number of evaluations, run time can be
expressed as u c ∑_{e∈E} n_e, where n_e is the number of a block's transitive
neighbors. For these methods, element time synchronization permits
evaluation amortization, eliminating the need to repeatedly evaluate
transitive neighbors, yielding a time cost of u c |E|.
Analysis time is classically reduced by attacking u, either by using
higher-order methods that allow larger steps under bounded error or
by adapting global step size during analysis, e.g., the adaptive Runge-
Kutta method. However, much greater gains are possible. As noted in
Section 2, the requirement that all thermal elements be synchronized
in time implies that, at each time step, all elements must have their
local times advanced by the smallest step required by any element in
the model. As indicated by Figure 3(b), this implies that most elements
are forced to take unnecessarily small steps.
Although many time marching numerical methods for solving or-
dinary differential equations are based on methods that do not require
explicit differentiation, these methods are conceptually based on re-
peated Taylor series expansions around increasing time instants. Re-
visiting these roots and basing time marching on Taylor series expan-
sion allows element-by-element time step adaptation by supporting the
extrapolation of temperatures at arbitrary times.
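The idea can be sketched for a single element: from the element difference equation, time derivatives of its temperature follow by repeated differentiation, and a truncated Taylor series then extrapolates the temperature to an arbitrary local time (a simplified illustration with the neighbor temperature held fixed and made-up parameter values, not ISAC's full scheme):

```python
import math

# Taylor-series time marching for a single element.  From the element
# difference equation C*dT/dt = G*(Tn - T) + P, with the neighbor
# temperature Tn held fixed, dT/dt = a - b*T and higher derivatives
# follow by differentiation.
G, C, P, Tn = 1.0e-3, 1.0e-6, 5.0e-4, 1.0
a, b = (G * Tn + P) / C, G / C          # dT/dt = a - b*T

def taylor_step(T, h):
    d1 = a - b * T                      # first derivative of T
    d2 = -b * d1                        # second derivative, d(d1)/dt
    return T + h * d1 + 0.5 * h * h * d2

T0, h = 0.0, 1.0e-4                     # step well under 1/b = 1 ms
T_taylor = taylor_step(T0, h)
T_exact = a / b + (T0 - a / b) * math.exp(-b * h)
err2 = abs(T_taylor - T_exact)              # O(h^3) truncation error
err1 = abs(T0 + h * (a - b * T0) - T_exact) # first-order step, O(h^2)
```

Because the local truncation error shrinks with the series order, each element can choose the largest step size that keeps its own extrapolation error bounded.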
For many problems, the differentiation required for calculating
Taylor series expansions is extremely complicated. Fortunately, for
the dynamic IC thermal analysis problem, little more than the Laplace
transform and linearity theorem are needed. Noting the definitions in
Equation 2, and given that T_n(t) is the temperature of element n at time
t, G_in is the thermal conductivity between elements i and n, N_i are
element i's neighbors, and M is the neighbor depth, the nearest-neighbor
approximation of the temperature of element i at time t follows
under boundary conditions determined by the chip, package, and cooling
Algorithm 2 Multigrid cycle
1: Pre-smoothing step: Iteratively relax initial random solution. {HF error eliminated.}
2: subtask Coarse grid correction
3:   Compute residue from finer grid.
4:   Approximate residue in coarser grid.
5:   Solve coarser grid problem using relaxation.
6:   if Coarsest level has been reached then
7:     Directly solve problem at this level.
8:   else
9:     Recursively apply the multigrid method.
10:  end if
11:  Map the correction back from the coarser to the finer grid.
12: end subtask
13: Post-smoothing step: Add correction to solution at finest grid level.
14: Iteratively relax to obtain the final solution.
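The cycle in Algorithm 2 can be sketched concretely. The following is a minimal, self-contained V-cycle for a one-dimensional Poisson problem (-u'' = f with zero Dirichlet boundaries), using weighted-Jacobi relaxation for smoothing, residual injection for restriction, and linear interpolation for prolongation; it is an illustration of the multigrid steps only, not ISAC's three-dimensional implementation.

```python
import numpy as np

def smooth(u, f, h, iters=3, omega=2.0/3.0):
    """Weighted-Jacobi relaxation for -u'' = f with zero Dirichlet ends."""
    for _ in range(iters):
        u[1:-1] += omega * (0.5 * (u[:-2] + u[2:] + h * h * f[1:-1]) - u[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def v_cycle(u, f, h):
    n = len(u) - 1                              # number of intervals (power of 2)
    if n == 2:                                  # coarsest grid: solve directly
        u[1] = h * h * f[1] / 2.0
        return u
    u = smooth(u, f, h)                         # pre-smoothing
    r = residual(u, f, h)                       # residue on the finer grid
    rc = r[::2].copy()                          # restrict residue to coarser grid
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)  # recurse (coarse grid correction)
    e = np.zeros_like(u)
    e[::2] = ec                                 # prolongate correction back
    e[1:-1:2] = 0.5 * (ec[:-1] + ec[1:])        # linear interpolation at odd points
    u += e                                      # add correction on finest grid
    return smooth(u, f, h)                      # post-smoothing relaxation
```

A few V-cycles drive the algebraic residual down by orders of magnitude, which is why multigrid serves as the efficient baseline against which ISAC's heterogeneous-resolution scheme is compared.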
Note that the potentially differing values of step size, h, and local
time, t, across thermal elements imply that the transitive temperature
extrapolations necessary for an element to advance by one time step
cannot be amortized over multiple uses, as is the case in lock-step
Runge-Kutta methods. As a result, for three-dimensional thermal
analysis, the number of evaluations, e, is related to the transitive
neighbor count, d, as follows:

e = (2d + 1)(2d^2 + 2d + 3) / 3

i.e., the discretized volume of the implied octahedron.
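As a sanity check, this count can be verified directly: under the assumption that the transitive neighbors of an element are exactly the grid elements within Manhattan distance d (the discretized octahedron), a brute-force enumeration matches the centered-octahedral-number closed form.

```python
def octahedron_volume(d):
    """Count grid points within Manhattan distance d of the origin in 3-D."""
    return sum(1
               for x in range(-d, d + 1)
               for y in range(-d, d + 1)
               for z in range(-d, d + 1)
               if abs(x) + abs(y) + abs(z) <= d)

def closed_form(d):
    """Centered octahedral number: (2d + 1)(2d^2 + 2d + 3) / 3."""
    return (2 * d + 1) * (2 * d * d + 2 * d + 3) // 3
```

For d = 0, 1, 2, 3 this gives 1, 7, 25, 63 evaluations, showing how quickly the per-step cost grows with neighbor depth.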
In summary, although it is common to improve the performance of
time marching techniques by increasing their orders, thereby increas-
ing their step sizes, for the IC thermal analysis problem greater gains
are possible by decoupling element local times, allowing most ele-
ments to take larger than minimum-sized steps. However, this requires
explicit differentiation and prevents the amortization of neighbor tem-
perature extrapolation, increasing the cost of using higher-order meth-
ods relative to that of fully synchronized element time marching
techniques. As demonstrated in Section 4, this trade-off is an
excellent one: the third-order element-by-element adaptation method
yields speedups ranging from 122.81× to 337.23× when compared to the
fourth-order adaptive Runge-Kutta method.
We now describe the element-by-element step size adaptation
methods used by ISAC to improve performance while preserving ac-
curacy. As illustrated in the right portion of Figure 4, dynamic analysis
starts with an initial three-dimensional temperature profile and hybrid
oct-tree that may have been provided by the synthesis tool or gener-
ated by ISAC using steady-state analysis; a chip/package/ambient heat
capacity and thermal conductivity profile; and a power profile. After
determining the initial maximum safe step sizes of all elements, ISAC
initializes an event queue of elements sorted by their target times, i.e.,
the element's current time plus its step size. The element with the
earliest target time is selected, its temperature is updated, a new maximum
safe step size is calculated for the element, and it is reinserted in the
event queue. The event queue serves to minimize the deviation be-
tween decoupled element current times, thereby avoiding temperature
extrapolation beyond the limits of the local time bounded-order expan-
sions. The new step size must take into account the truncation error of
the numerical method in use as well as the step sizes of the neighbors.
Given that h_i is element i's current step size, v is the order of the time-
marching numerical method, u is a constant slightly less than one, y is
the error threshold, dT_i/dt(t_i) is the derivative of i's temperature as a
function of time at i's current time, t_i, the safe next step size
for a block, regardless of its neighbors, follows.
This method of computing a new step size is based on the literature.
However, it uses non-integer test step sizes to bracket the
most probable new step size.
It is necessary to further bound the step size to ensure that the
local times of neighbors are sufficiently close for accurate temperature
extrapolation. Given that N_i is the set of i's neighbors and w is a small
constant, e.g., 3, the new step size follows.
For efficiency, the h_n of a neighbor at its own local time is used.
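The two bounds combine as a minimum. The sketch below uses a standard truncation-error step controller, h_acc = u · h · (y/err)^(1/(v+1)), as a stand-in for the paper's exact expression (an assumption, since the equation is not reproduced here), together with the neighbor bound h_nbr = w · min of the neighbors' step sizes.

```python
def next_step(h, err, neighbor_steps, v=3, u=0.9, y=1e-3, w=3.0):
    """Safe next step size for one element (illustrative stand-in).

    h              -- element's current step size
    err            -- estimated local truncation error of the last step
    neighbor_steps -- current step sizes h_n of the element's neighbors
    v, u, y, w     -- method order, safety constant (< 1), error
                      threshold, and neighbor-closeness constant
    """
    h_acc = u * h * (y / err) ** (1.0 / (v + 1))  # accuracy (truncation) bound
    h_nbr = w * min(neighbor_steps)               # neighbor local-time bound
    return min(h_acc, h_nbr)
```

With one very slow neighbor, the neighbor bound dominates, keeping local times close enough for accurate extrapolation.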
This temporal adaptation technique, based upon Equations 4, 5,
and 8, is general, and has been tested in first-order, second-order, and
third-order numerical methods. As indicated in Section 4.2, the result
is a 122.81–337.23× speedup without loss of accuracy when compared
to the fourth-order adaptive Runge-Kutta method.
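The event-driven marching loop described above can be sketched with a standard priority queue: each element carries its own local time and step size, and the queue always advances the element with the earliest target time. This is an illustration only; the temperature update and step-size recomputation are placeholders for the mechanisms described in the text.

```python
import heapq

def march(elements, t_end):
    """elements: dict id -> {"t": local time, "h": step size, "T": temperature}."""
    queue = [(e["t"] + e["h"], eid) for eid, e in elements.items()]
    heapq.heapify(queue)
    while queue:
        target, eid = heapq.heappop(queue)
        if target > t_end:       # earliest pending event is past the horizon
            break
        e = elements[eid]
        e["T"] += 0.0            # placeholder: Taylor-based temperature update
        e["t"] = target          # element advances by its own step, not a global one
        # placeholder: recompute e["h"] from truncation error and neighbor steps
        heapq.heappush(queue, (e["t"] + e["h"], eid))
    return elements
```

Note that each element's local time advances independently: an element with a large safe step is touched far less often than one in a thermal hotspot.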
3.5. Impact of Variable Thermal Conductivity
Thermal conductivity for a material is its ratio of heat flux density to
temperature gradient. The thermal conductivity of a material, e.g., sil-
icon, is a function of temperature, T. An IC's thermal conductivity,
k(r, T), is also a function of position, r. Most previous fast IC thermal
analysis work ignores the dependence of thermal conductivity on tem-
perature, approximating it as a constant. This introduces inaccuracy in
analysis results. In contrast, ISAC models thermal conductivity as a
function of temperature.
Position- and temperature-dependent thermal conductivity follows:
k(T) = k0 (T/300)^(−α), where k0 is the material's conductivity value at
300 °K and α is a constant for the specific material. Recalcu-
lating the thermal conductivity value after each iteration for all the ele-
ments would be computationally expensive. In order to maintain both
accuracy and performance, ISAC uses a post-processing feedback loop
to determine the impact of variations in thermal conductivity upon tem-
perature profile. As described in Section 4.1, the consequences were
improvements of several degrees Kelvin in peak temperature accuracy
when compared with a model assuming constant thermal conductivity.
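This conductivity model can be sketched numerically. The silicon parameters below (k0 ≈ 150 W/(m·K) at 300 K, α ≈ 4/3) are commonly cited approximate literature values, not taken from this paper.

```python
def k_silicon(T, k0=150.0, alpha=4.0/3.0):
    """Temperature-dependent conductivity k(T) = k0 * (T/300)^(-alpha).

    T in kelvin; k0 is the conductivity at the 300 K reference point,
    alpha a material-specific constant (approximate values for silicon).
    """
    return k0 * (T / 300.0) ** (-alpha)
```

At an operating temperature of, say, 400 K, silicon conducts heat noticeably worse than the room-temperature constant suggests; a constant-k model therefore overestimates heat removal and underestimates peak temperature, consistent with the observation in Section 4.1.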
3.6. The Use of ISAC in IC Synthesis
As explained in Section 2, ISAC was developed primarily for use
within IC synthesis, although it may also be used to provide guid-
ance during manual architectural decisions. ISAC may be used to solve
both the steady-state and dynamic thermal analysis problems described
in Section 3.1. For use in steady-state analysis, ISAC requires three-
dimensional chip-package profiles of thermal conductivity and power
density. The required IC power profiles are typically produced by a
floorplanner used within the synthesis process. ISAC produces
a three-dimensional steady-state temperature profile. When used
for dynamic thermal analysis, ISAC requires three-dimensional chip-
package profiles of temperature, power density, heat capacity, (option-
ally) initial temperature, and an elapsed duration after which to re-
port results. It produces a three-dimensional temperature profile at
any requested time.
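This input/output contract can be sketched as a data structure; the field names below are illustrative and do not reflect ISAC's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional

# A 3-D per-element profile, represented here as nested lists.
Grid3D = List[List[List[float]]]

@dataclass
class DynamicAnalysisInput:
    """Hypothetical sketch of the inputs a dynamic thermal solver needs."""
    temperature: Grid3D                            # chip-package temperature profile
    power_density: Grid3D                          # power density per element
    heat_capacity: Grid3D                          # heat capacity per element
    duration: float                                # elapsed time to simulate
    initial_temperature: Optional[Grid3D] = None   # optional initial profile
```

The solver consumes one of these and returns another Grid3D: the temperature profile at the requested time.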
Both steady-state and dynamic thermal analysis solvers within
ISAC have been accelerated, using the techniques described in Sec-
tions 3.3 and 3.4, in order to permit efficient use after each tentative
change to an IC power profile during synthesis or design. Use within
synthesis has been validated (see Section 4) by integrating ISAC within
a behavioral synthesis algorithm.
4. Experimental Results
In this section, we validate and evaluate the performance of ISAC.
Experiments were conducted on Linux workstations of similar perfor-
mance. Evaluation focuses on accuracy and efficiency. ISAC sup-
ports both steady-state and dynamic thermal analysis. Steady-state
thermal analysis is validated against FEMLAB, a widely-used com-
mercial physics modeling package, using two actual chip designs from
IBM and the MIT Raw group. Dynamic thermal simulation is vali-
dated against a fourth-order adaptive Runge-Kutta method using a set
of synthesis benchmarks. Efficiency determines the feasibility of using
thermal analysis during synthesis and design. To characterize the effi-
ciency of ISAC, we compare it with other popular numerical analysis
methods by conducting steady-state and dynamic thermal analysis on
the power profiles produced during IC synthesis.
4.1. Steady-State Thermal Analysis Results
This section reports the accuracy and efficiency of the steady-state ther-
mal simulation techniques used in ISAC. We first conduct the follow-
ing experiments using two actual chip designs. The first IC is designed
by IBM. The silicon die is 13 mm × 13 mm × 0.625 mm; it is sol-
dered to a ceramic carrier using flip-chip packaging and attached to
a heat sink. A detailed 11 × 11 block static power profile was pro-
duced using a power simulator. The second IC is a chip-level multi-
processor designed by the MIT Raw group. This IC contains 16 on-
chip MIPS processor cores organized in a 4 × 4 array. The die area is
18.2 mm × 18.2 mm. It uses an IBM ceramic column grid array package
with direct lid attach thermal enhancement. The static power profile is
based on data provided in the literature. We validate ISAC by
comparing its results with those produced by FEMLAB, a widely-used
commercial three-dimensional finite element based physics modeling
package. Table 1 provides thermal analysis results produced by ISAC
and FEMLAB for these ICs.
Average error, e_avg, will be used as a measure of difference be-
tween thermal profiles:

e_avg = (1/N) Σ_{i=1..N} |T_F,i − T_I,i| / T_F,i

where N is the total number of elements on the surface of the active
layer, and T_F,i and T_I,i are the temperatures of element i reported by
FEMLAB and ISAC, respectively. This measure is conservative: if
comparisons were made in degrees Kelvin instead of degrees Celsius,
the reported percentage error would be even lower.
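The metric is straightforward to compute; the temperature values below are hypothetical, not taken from Table 1.

```python
def avg_error(t_ref, t_test):
    """Average relative error between two temperature profiles.

    t_ref and t_test are per-element temperatures (e.g., from FEMLAB
    and ISAC) over the same N surface elements, in degrees Celsius.
    """
    assert len(t_ref) == len(t_test)
    return sum(abs(a - b) / a for a, b in zip(t_ref, t_test)) / len(t_ref)

# Hypothetical example: three surface elements, reference vs. test profile.
e = avg_error([80.0, 90.0, 100.0], [81.0, 90.0, 99.0])
```

Dividing by Celsius values rather than Kelvin values makes the denominators smaller and the reported percentage error larger, hence the conservatism noted above.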
In Table 1, the second and third columns show the peak and aver-
age temperatures of the surface of the active layer of the silicon dies of
these chips, as reported by ISAC. Compared to FEMLAB, the average
errors, e_avg, are 1.7% and 0.7%. The next four columns show the effi-
ciency of ISAC in terms of CPU time, speedup, memory use, and num-
ber of elements. For comparison, the next three columns show the effi-
ciency using a multigrid analysis technique with homogeneous mesh-
ing. These results clearly demonstrate that element resolution adapta-
tion allows ISAC to achieve dramatic improvements in efficiency com-
pared to the conventional multigrid technique. CPU times decrease to
3.6% and 0.14% and memory usage decreases to 5.6% and 2.4% of the
times and memory required by the homogeneous technique. Note that
multigrid steady-state analysis is itself a highly efficient approach.
Using FEMLAB, both simulations take at least 20 minutes.
Existing IC thermal simulators neglect the dependence of thermal
conductivity on temperature, potentially resulting in substantial errors
in peak temperature. In previous work, this error was not detected dur-
ing validation because the models against which they were validated
also used constant values for thermal conductivity. Temperature varies
through the silicon die. Therefore, ignoring the dependence of thermal
conductivity on temperature may introduce significant errors.
The last two columns of Table 1 show the peak and average
temperatures, reported by FEMLAB, using thermal conductivities fixed
at their room-temperature values. They show that, for both chips, the
peak temperatures are underestimated by approximately 5 °C. This effect
will be even more serious in designs with higher peak temperatures.
Note that the source of inaccuracy is not the specific value of thermal
conductivity chosen. No constant value will allow accurate results in
general: an accurate IC thermal model must consider the dependence
of silicon thermal conductivity upon temperature.
To further evaluate its efficiency, we use ISAC to conduct thermal
analysis for the behavioral synthesis algorithm described in Section 2.
This iterative algorithm does both behavioral-level and physical-level
optimization. In this experiment, ISAC performs steady-state thermal
analysis for each intermediate solution generated during synthesis of
ten commonly-used behavioral synthesis benchmarks.
Table 2 shows the performance of ISAC when used for steady-
state thermal analysis during behavioral synthesis. The second, third,
and fourth columns show the overall CPU time, speedup, and av-
erage memory used by ISAC to conduct steady-state thermal analy-
sis for all the intermediate solutions. Column five shows the aver-
age error compared to a conventional homogeneous meshing multigrid
method, whose overall CPU time and average memory use are shown
in columns six and seven. ISAC achieves almost the same accuracy
with much lower run-time overhead. The last column shows the CPU
time used by the behavioral synthesis algorithm. Comparing column
two and column seven makes it clear that, when used for steady-state
thermal analysis, ISAC consumes only a fraction of the CPU time re-
quired for synthesis: it is feasible to use ISAC during synthesis.
4.2. Dynamic Thermal Analysis Results
In this section, we evaluate the accuracy and efficiency of the dynamic
thermal analysis techniques used in ISAC. Heterogeneous spatial res-