Adaptive chip-package thermal analysis for synthesis and design.
ABSTRACT Ever-increasing integrated circuit (IC) power densities and peak temperatures threaten reliability, performance, and economical cooling. To address these challenges, thermal analysis must be embedded within IC synthesis. However, detailed thermal analysis requires accurate three-dimensional chip-package heat flow analysis. This has typically been based on numerical methods that are too computationally intensive for numerous repeated applications during synthesis or design. Thermal analysis techniques must be both accurate and fast for use in IC synthesis. This article presents a novel, accurate, incremental, self-adaptive, chip-package thermal analysis technique, called ISAC, for use in IC synthesis and design. It is common for IC temperature variation to strongly depend on position and time. ISAC dynamically adapts spatial and temporal modeling granularity to achieve high efficiency while maintaining accuracy. Both steady-state and dynamic thermal analysis are accelerated by the proposed heterogeneous spatial resolution adaptation and temporally decoupled element time marching techniques. Each technique enables orders of magnitude improvement in performance while preserving accuracy when compared with other state-of-the-art adaptive steady-state and dynamic IC thermal analysis techniques. Experimental results indicate that these improvements are sufficient to make accurate dynamic and static thermal analysis practical within the inner loops of IC synthesis algorithms. ISAC has been validated against reliable commercial thermal analysis tools using industrial and academic synthesis test cases and chip designs. It has been implemented as a software package suitable for integration in IC synthesis and design flows and has been publicly released.

Yonghong Yang† Zhenyu (Peter) Gu‡ Changyun Zhu† Li Shang† Robert P. Dick‡
†ECE Department
Queen’s University
Kingston, ON K7L 3N6, Canada
{4yy6, 4cz1}@qlink.queensu.ca, li.shang@queensu.ca
‡EECS Department
Northwestern University
Evanston, IL 60208, U.S.A.
{zgu646, dickrp}@ece.northwestern.edu
1. Introduction
Integrated circuit (IC) densities and performance requirements are continuously increasing. The crucial task of managing the resulting increase in power density and peak IC temperature is becoming more difficult [1],[2]. Current architectural-level design automation and synthesis tools have multiple design metrics, such as power consumption, temperature, performance, cost, and reliability. IC designs must carefully trade off these metrics. However, if not properly addressed, increased IC temperature affects other design metrics including performance (via decreased transistor switching speed and increased interconnect latency), power and energy consumption (via increased leakage power), reliability (via electromigration, hot carrier effects, oxide thermal breakdown, etc.), and price (via increased system cooling cost). Considering thermal issues during IC synthesis and design is now necessary. When determining the impact of each decision in the synthesis or design process, the impacts of the changed thermal profile on performance, power, price, and reliability must be considered. This requires repeated use of fast, accurate thermal analysis tools during synthesis.
The IC thermal analysis problem may be separated into two subproblems: steady-state (or static) analysis and dynamic analysis. Steady-state analysis determines the temperature profile to which an IC converges as time approaches infinity, given power and thermal conductivity profiles. Dynamic thermal analysis determines the temperature profile of an IC at any time given initial temperature, power, heat capacity, and thermal conductivity profiles.
This work is supported in part by the NSERC Discovery Grant #38869401, and in part by
the NSF under award CNS0347941.
Numerical analysis techniques were also proposed to characterize the thermal profile of on-chip interconnect layers [3–5]. Recently, Skadron et al. developed steady-state and dynamic thermal analysis tools for microarchitectural evaluation [6]. Neither the matrix techniques of the steady-state analysis tool nor the lock-step fourth-order Runge-Kutta time-marching technique used for dynamic analysis makes use of spatial or asynchronous temporal adaptation, so accuracy or performance suffers. Researchers have proposed quad-tree mesh refinement for thermal analysis [7], but did not consider local temporal adaptation. Li et al. proposed an efficient multigrid modeling technique to conduct full-chip steady-state thermal analysis [8]. Although the advantages of heterogeneous element discretization are noted, no systematic adaptation method is provided. Zhan and Sapatnekar [9] proposed a steady-state thermal analysis method based on Green’s function that was accelerated by using discrete cosine transforms and a look-up table. However, these methods [8],[9] do not support dynamic thermal analysis.
Existing IC thermal analysis tools are capable of providing either accuracy or speed, but not both. Accurate thermal analysis requires expensive computation for many elements in some regions, at some times. Conventional IC thermal analysis techniques ensure accuracy by choosing uniformly fine levels of detail across time and space, i.e., they use equivalent physical sizes or time step durations for all thermal elements. The large number of elements and time steps resulting from such techniques makes them computationally intensive and, therefore, impractical for use within IC synthesis. This article presents validated, synthesis-oriented IC thermal analysis techniques that differ from existing work by doing operation-by-operation dynamic adaptation of temporal and spatial resolution in order to dramatically reduce computational overhead without sacrificing accuracy. Experimental results indicate that the proposed spatial adaptation technique improves CPU time by 21.64–690.00× and that the temporal adaptation technique improves CPU time by 122.81–337.23×. Although much faster than conventional analysis techniques, the proposed techniques have been designed for accuracy even when this increases complexity and run time, e.g., by correctly modeling the dependence of thermal conductivity on temperature. These algorithms have been validated against FEMLAB, a reliable commercial finite element physical process modeling package, and a high-resolution spatially and temporally homogeneous initial value problem solver. Experimental results indicate that using existing thermal analysis techniques within an IC synthesis flow would increase CPU time by many orders of magnitude, making it impractical to synthesize complex ICs. The proposed techniques make both dynamic and static thermal analysis practical within the inner loop of IC synthesis algorithms. They have been implemented as a software tool called ISAC that has been publicly released [10].
This article is organized as follows. Section 2 gives a motivating example, which illustrates the need for fast and accurate thermal analysis during IC synthesis and suggests techniques to reach this goal. Section 3 describes the model, algorithms, and implementation of ISAC, a fast and accurate steady-state and dynamic thermal analysis tool. Section 4 presents experimental results validating ISAC and demonstrating the dramatic performance advantages resulting from spatial and temporal adaptation during thermal analysis. Section 5 presents conclusions.
2. Motivating Examples
In this section, we use a thermal-aware IC synthesis flow to demonstrate the challenges of fast and accurate IC thermal modeling. Figure 1 shows an integrated behavioral-level and physical-level IC synthesis system [11]. This synthesis system uses a simulated annealing algorithm to jointly optimize several design metrics, including performance, area, power consumption, and peak IC temperature. It conducts both behavioral-level and physical-level stochastic optimization moves, including scheduling, voltage assignment, resource binding, floorplanning, etc. An intermediate solution is generated after each optimization move. A detailed two-dimensional power profile is then reported based on the physical floorplan. Thermal analysis algorithms are invoked to guide optimization moves.
3981080106/DATE06 © 2006 EDAA
Figure 1. Thermal-aware synthesis flow. [Flowchart: input specification; iterative optimization over high-level optimization (scheduling, voltage partition, resource binding, etc.) and physical-level optimization (floorplanning); power analysis, thermal analysis, and performance profiling; multiobjective cost evaluation; final solutions.]
Figure 2. Thermal analysis during IC synthesis. (a) Silicon chip and package (silicon die, cooling package). (b) Temperature profile for active layer and heatsink. [Plot: temperature (°C), 35 to 90, versus position (mm) for the IC active layer and the heatsink/IC interface.]
As illustrated by the example synthesis flow, detailed thermal characterization of each intermediate solution requires full chip-package thermal modeling and analysis using numerical methods, which are computationally intensive. Figure 2 shows a full chip-package thermal modeling example from an IBM IC design (see Section 4.1 for more detail). The steady-state thermal profile of the active layer of the silicon die in conjunction with the top layer of the cooling package, shown in Figure 2(b), was characterized using a multigrid thermal solver by partitioning the chip and the cooling package into 131,072 homogeneous thermal elements. Without spatial and temporal adaptation, the solver requires many seconds or minutes when run on a high-performance workstation. Compared to steady-state thermal modeling, characterizing the IC dynamic thermal profile is even more time consuming. IC synthesis requires a large number of optimization steps; thermal modeling can easily become its performance bottleneck.
A key challenge in thermal-aware IC synthesis is the development of fast and accurate thermal analysis techniques. Fundamentally, IC thermal modeling is the simulation of heat transfer from heat producers (transistors and interconnect), through silicon die and cooling package, to the ambient environment. This process is modeled with partial differential equations. In order to approximate the solutions of these equations using numerical methods, finite discretization is used, i.e., an IC model is decomposed into numerous three-dimensional elements. Adjacent elements interact via heat diffusion. Each element is sufficiently small to permit its temperature to be expressed as a difference equation, as a function of time, its material characteristics, its power dissipation, and the temperatures of its neighboring elements.
In an approach analogous to electric circuit analysis, thermal RC (or R) networks are constructed to perform dynamic (or steady-state) thermal analysis. Direct matrix operations, e.g., inversion, may be used for steady-state thermal analysis. However, the computational demand of this technique hinders its use within synthesis. Dynamic thermal analysis may be conducted by partitioning the simulation period into small time steps. The local times of all elements are then advanced, in lockstep, using transient temperature approximations yielded by difference equations. The computational complexity of dynamic thermal analysis is a function of the number of grid elements and time steps. Therefore, to improve the efficiency of thermal modeling, the key issue is to optimize the spatial and temporal modeling granularity, eliminating non-essential elements and stages.
Figure 3. The potential of adaptive thermal modeling. (a) Inter-element thermal gradient. (b) Normalized maximum step size. [Histograms: number of elements versus the normalized quantity on a logarithmic axis from 10^0 to 10^2.]
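The lockstep scheme described above can be sketched on a toy thermal RC chain. This is a minimal illustration, not ISAC's solver: the one-dimensional chain, the forward Euler update, and all names and numeric values are assumptions made for the example.

```python
def lockstep_march(temps, power, g, c, g_amb, t_amb, dt, steps):
    """Advance every element by the same time step dt (forward Euler).

    temps : initial element temperatures (deg C)
    power : power dissipated in each element (W)
    g     : thermal conductance between adjacent elements (W/K)
    c     : heat capacity of each element (J/K)
    g_amb : conductance from the last element to ambient (W/K)
    t_amb : ambient temperature (deg C)
    """
    temps = list(temps)
    n = len(temps)
    for _ in range(steps):
        new_temps = []
        for i in range(n):
            # Net heat flow into element i: own power plus neighbor exchange.
            flow = power[i]
            if i > 0:
                flow += g * (temps[i - 1] - temps[i])
            if i < n - 1:
                flow += g * (temps[i + 1] - temps[i])
            else:
                flow += g_amb * (t_amb - temps[i])
            new_temps.append(temps[i] + dt * flow / c)
        temps = new_temps  # all local times advance together
    return temps


# Hypothetical three-element chain: 1 W injected at one end, ambient at the other.
final = lockstep_march([25.0, 25.0, 25.0], [1.0, 0.0, 0.0],
                       g=1.0, c=1.0, g_amb=1.0, t_amb=25.0, dt=0.1, steps=5000)
```

The work done is proportional to the number of elements times the number of steps, and the single step size dt must be small enough for the fastest-changing element, which is the cost the adaptive techniques aim to reduce.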
There is a tension between accuracy and efficiency when choosing modeling granularity. Increasing modeling granularity (i.e., using larger elements or time steps) reduces analysis complexity but may also decrease accuracy. Uniform temperature is assumed within each thermal element. Intra-element thermal gradients are neglected. Therefore, increasing spatial modeling granularity naturally increases modeling errors. Similarly, increasing time step size may result in failure to capture transient thermal fluctuation or may increase truncation error when the actual temperature functions of some elements are of higher order than the difference equations used to approximate them.
IC thermal profiles contain significant spatial and temporal variation due to the heterogeneity of thermal conductivity and heat capacity in different materials, as well as varying power profiles resulting from non-uniform functional unit activities, placements, and schedules. Figure 3(a) shows the inter-element thermal gradient distribution using homogeneous meshing of the example shown in Figure 2. The histogram is normalized to the smallest inter-element thermal gradient. This figure contains a wide distribution of thermal gradients: heterogeneous spatial element discretization refinement based on thermal gradients has the potential to improve performance without impacting accuracy.
For dynamic thermal simulation, the size of each thermal element’s time steps should permit accurate approximation by the element difference equations. An IC may experience different thermal fluctuations at different locations. Therefore, the best sizes of time steps for elements at different locations may vary. Figure 3(b) shows the maximum potential time step size of each individual block based on local thermal variation; local adaptation of time step sizes has the potential to improve performance without impacting accuracy.
3. Thermal Analysis Model and Algorithms
This section gives details on the proposed thermal analysis techniques.
3.1. IC Thermal Analysis Problem Definition
IC thermal analysis is the simulation of heat transfer through heterogeneous material among heat producers (e.g., transistors) and heat consumers (e.g., heat sinks attached to IC packages). Modeling thermal conduction is analogous to modeling electrical conduction, with thermal conductivity corresponding to electrical conductivity, power dissipation corresponding to electrical current, heat capacity corresponding to electrical capacitance, and temperature corresponding to voltage.
The equation governing heat diffusion via thermal conduction in an IC follows.
$$\rho c_p \frac{\partial T(\vec{r}, t)}{\partial t} = \nabla \cdot \left( k(\vec{r}) \nabla T(\vec{r}, t) \right) + p(\vec{r}, t) \qquad (1)$$
In Equation 1, $\rho$ is the material density; $c_p$ is the mass heat capacity; $T(\vec{r}, t)$ and $k(\vec{r})$ are the temperature and thermal conductivity of the material at position $\vec{r}$ and time $t$; and $p(\vec{r}, t)$ is the power density of the heat source. Note that, in reality, the thermal conductivity, $k$, also depends on temperature (see Section 3.5). ISAC supports arbitrary heterogeneous thermal conduction models. For example, a model may be composed of a heat sink in a forced-air ambient environment, heat spreader, bulk silicon, active layer, and packaging material or any other geometry and combination of materials.
Figure 4. Overview of ISAC. [Flowchart: inputs are the 3D chip/package/ambient heat capacity and thermal conductivity profiles, the power profile, and an optional initial 3D temperature profile and hybrid oct-tree. Steady-state thermal analysis: multigrid incremental solver, check of thermal gradient conditions, spatial hybrid oct-tree refinement. Dynamic thermal analysis: initialize/update discrete event simulator queue, process one pending event, adapt neighboring element step sizes, check whether the sample period is reached. Both paths: adapt profile based on k(T) and check convergence. Output: 3D thermal profile (and hybrid oct-tree).]
In order to do numerical thermal analysis, a seven-point finite difference discretization method can be applied to the left and right sides of Equation 1, i.e., the IC thermal behavior may be modeled by decomposing it into numerous cubic elements, which may be of non-uniform sizes. Adjacent elements interact via heat diffusion. Each element has a power dissipation, temperature, thermal capacitance, as well as a thermal resistance to adjacent elements. The discretized equation at an interior point of a homogeneous material follows.
$$\rho c_p V \frac{T^{m+1}_{i,j,k} - T^{m}_{i,j,k}}{\Delta t} = -2 (G_x + G_y + G_z) T^{m}_{i,j,k} + G_x T^{m}_{i-1,j,k} + G_x T^{m}_{i+1,j,k} + G_y T^{m}_{i,j-1,k} + G_y T^{m}_{i,j+1,k} + G_z T^{m}_{i,j,k-1} + G_z T^{m}_{i,j,k+1} + V p_{i,j,k} \qquad (2)$$
Given that $\Delta x$, $\Delta y$, and $\Delta z$ are discretization steps in dimensions $x$, $y$, and $z$, $V = \Delta x \Delta y \Delta z$. $G_x$, $G_y$, and $G_z$ are the thermal conductivities between adjacent elements. They are defined as follows: $G_x = k \Delta y \Delta z / \Delta x$, $G_y = k \Delta x \Delta z / \Delta y$, and $G_z = k \Delta x \Delta y / \Delta z$. $\Delta t$ is the discretization step in time $t$. For steady-state analysis, the left term in Equation 2, expressing temperature variation as a function of time, $t$, is dropped. For either the dynamic or steady-state version of the problem, the equations for all IC elements can be represented as a matrix. Although it is possible to directly solve this problem, the computational expense is prohibitive.
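To make the discretized equation concrete, the following sketch evaluates the conductances G_x, G_y, G_z and performs one explicit update of Equation 2 for a single interior element. The function names, the neighbor dictionary layout, and the silicon-like material constants are illustrative assumptions, not values taken from the paper.

```python
def conductances(k, dx, dy, dz):
    """Return (Gx, Gy, Gz) for an element of size dx x dy x dz."""
    return k * dy * dz / dx, k * dx * dz / dy, k * dx * dy / dz


def explicit_update(temp, nbrs, p, rho, cp, k, dx, dy, dz, dt):
    """One explicit step of Equation 2 for an interior element.

    temp : current element temperature T^m_{i,j,k}
    nbrs : neighbor temperatures keyed 'x-', 'x+', 'y-', 'y+', 'z-', 'z+'
    p    : power density of the element (W/m^3)
    """
    gx, gy, gz = conductances(k, dx, dy, dz)
    volume = dx * dy * dz
    rhs = (-2.0 * (gx + gy + gz) * temp
           + gx * (nbrs['x-'] + nbrs['x+'])
           + gy * (nbrs['y-'] + nbrs['y+'])
           + gz * (nbrs['z-'] + nbrs['z+'])
           + volume * p)
    return temp + dt * rhs / (rho * cp * volume)


# Assumed silicon-like constants (illustrative only).
RHO, CP, K = 2330.0, 700.0, 150.0       # kg/m^3, J/(kg K), W/(m K)
SIZE, DT = 1e-4, 1e-6                   # element edge (m), time step (s)
uniform = {key: 50.0 for key in ('x-', 'x+', 'y-', 'y+', 'z-', 'z+')}

no_power = explicit_update(50.0, uniform, 0.0, RHO, CP, K, SIZE, SIZE, SIZE, DT)
heated = explicit_update(50.0, uniform, 1e9, RHO, CP, K, SIZE, SIZE, SIZE, DT)
```

With all neighbors at the element's own temperature the conduction terms cancel, so the temperature changes only through the power term, by dt * p / (rho * cp).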
3.2. ISAC Overview
Figure 4 gives an overview of ISAC, our proposed incremental, self-adaptive, chip-package thermal analysis tool. When used for steady-state thermal analysis, it takes, as input, a three-dimensional chip and package thermal conductivity profile, as well as a power dissipation profile. A multigrid incremental solver is used to progressively refine thermal element discretization to rapidly produce a temperature profile.
When used for dynamic thermal analysis, in addition to the input data required for steady-state analysis, ISAC requires the chip-package heat capacity profile. In addition, it may accept an initial temperature profile and efficient element grid. If these inputs are not provided, the dynamic analysis technique uses the steady-state analysis technique to produce its initial temperature profile and element grid. It then repeatedly updates the local temperatures and times of elements at asynchronous time steps, appropriately adapting the step sizes of neighbors to maintain accuracy.
As described in Section 3.5, after analysis is finished, the temperature profile is adapted using a feedback loop in which thermal conductivity is modified based upon temperature. Upon convergence, the temperature profile is reported to the IC synthesis tool or designer.
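The feedback loop between temperature and thermal conductivity can be illustrated with a one-element fixed-point iteration. This is only a sketch: the lumped one-dimensional model and the power-law conductivity k(T) = k_300 (T/300)^(-4/3), a commonly quoted approximation for silicon, are assumptions made here for illustration and are not taken from Section 3.5.

```python
def k_silicon(temp_k, k300=150.0):
    """Assumed power-law model for silicon conductivity, W/(m K); temp in kelvin."""
    return k300 * (temp_k / 300.0) ** (-4.0 / 3.0)


def solve_with_kT(power, length, area, t_amb_k, tol=1e-6, max_iter=100):
    """Iterate T = T_amb + P * L / (k(T) * A) until the temperature stops changing."""
    temp = t_amb_k
    for iteration in range(1, max_iter + 1):
        k = k_silicon(temp)                      # update conductivity from temperature
        new_temp = t_amb_k + power * length / (k * area)
        if abs(new_temp - temp) < tol:           # converged
            return new_temp, iteration
        temp = new_temp
    raise RuntimeError("k(T) feedback loop did not converge")


# Hypothetical slab: 50 W through 0.5 mm of silicon with 1 cm^2 cross-section.
temp_k, iterations = solve_with_kT(power=50.0, length=5e-4, area=1e-4, t_amb_k=318.15)
constant_k_temp = 318.15 + 50.0 * 5e-4 / (150.0 * 1e-4)
```

Because the assumed conductivity falls as temperature rises, the converged temperature is slightly higher than the constant-conductivity estimate.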
3.3. Spatial Adaptation in Thermal Analysis
In this section, we present an efficient technique for adapting thermal element spatial resolution for thermal analysis. This technique uses incremental refinement to generate a tree of heterogeneous parallelepipeds that supports fast thermal analysis without loss in accuracy. Within ISAC, this technique is incorporated with an efficient multigrid numerical analysis method, yielding a complete steady-state thermal analysis solution. Dynamic thermal analysis also benefits from the proposed spatial adaptation technique due to the dramatic reduction of the number of grid elements that must be considered during time marching simulation.
Algorithm 1 hybrid_tree_traversal(node_root)
1: if node_root is a leaf node then
2:   Add node_root to contour_{finest_level}; return finest_level
3: end if
4: for each immediate child chi_node_i do
5:   level_{chi_node_i} = hybrid_tree_traversal(chi_node_i)
6:   level_min = min(level_min, level_{chi_node_i})
7: end for
8: for each immediate child chi_node_i do
9:   if level_{chi_node_i} > level_min then
10:    Add chi_node_i to contour_{level_{chi_node_i} − 1}, ..., contour_{level_min}
11:  end if
12: end for
13: Add node_root to contour_{level_min − 1}
14: return level_min − 1
Figure 5. Heterogeneous spatial resolution adaptation. [Diagram: a partitioned region with elements numbered 1 to 12, the corresponding tree rooted at node 0, and the contours for levels 1, 2, and 3.]
3.3.1. Hybrid Data Structure. Efficient spatial adaptation in thermal analysis relies on sophisticated data structures, i.e., it requires the efficient organization of large data sets, representation of multi-level modeling resolutions, and inter-level transition. The proposed technique is supported by a hybrid oct-tree data structure, which provides an efficient and flexible representation to support spatial resolution adaptation. A hybrid oct-tree is a tree that maintains spatial relationships among parallelepipeds in three dimensions. Each node may have up to eight immediate children. Figure 5 shows a hybrid tree representation. For the sake of simplicity, a two-dimensional quad-tree is shown instead of a three-dimensional hybrid oct-tree. In the hybrid oct-tree, different modeling resolutions are organized into contours along the tree hierarchy, e.g., the contour formed by the leaf nodes represents the finest spatial resolution (in this example, elements 2, 3, 6, 7, ..., 12). Heterogeneous spatial resolution may result in a thermal element residing at multiple resolution levels, e.g., element 2 resides at levels 1, 2, and 3. This information is represented as nodes existing in multiple contours in the tree.
Spatial resolution adaptation requires two basic operations, partitioning and coarsening. In a hybrid oct-tree, partitioning is the process of breaking a leaf node along arbitrary orthogonal axes, e.g., nodes 9 and 10 result from refining node 4. Coarsening is the process of merging direct sub-nodes into their parent, e.g., nodes 11 and 12 merged into node 5. To conduct multi-resolution thermal analysis, we propose an efficient contour search algorithm, with computational complexity $O(N)$, to determine the thermal grid elements belonging to the same resolution level. As shown in Algorithm 1, leaf nodes are assigned to the finest resolution level (lines 1–3). The resolution level of a parent node of a subtree equals the minimal resolution level of all of its immediate children nodes minus one (lines 4–7 and 13). An element may reside in multiple resolution levels (lines 8–12). As will be explained later, this algorithm provides an efficient solution to traverse different spatial resolutions, thereby supporting efficient multigrid thermal analysis.
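The contour search can be sketched in a few lines of Python. This is one reading of Algorithm 1, not ISAC's code: the `Node` class and the small example tree are hypothetical, loosely modeled on the Figure 5 example (in particular, the assumption that nodes 5 to 8 are the children of node 1 is ours), and the finest level is fixed at 3.

```python
from collections import defaultdict


class Node:
    """Hypothetical tree node; a node without children is a leaf element."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def hybrid_tree_traversal(node, finest_level, contours):
    """Assign nodes to resolution contours and return the node's own level."""
    if not node.children:
        contours[finest_level].append(node.name)      # leaves sit at the finest level
        return finest_level
    levels = [hybrid_tree_traversal(child, finest_level, contours)
              for child in node.children]
    level_min = min(levels)
    for child, level in zip(node.children, levels):
        # A child finer than its siblings also appears in every coarser
        # contour down to level_min, so each contour covers the whole region.
        for coarser in range(level - 1, level_min - 1, -1):
            contours[coarser].append(child.name)
    contours[level_min - 1].append(node.name)         # parent is one level coarser
    return level_min - 1


# Example tree (assumed structure): 0 -> 1, 2, 3, 4; 1 -> 5, 6, 7, 8;
# 5 -> 11, 12; 4 -> 9, 10.
tree = Node(0, [
    Node(1, [Node(5, [Node(11), Node(12)]), Node(6), Node(7), Node(8)]),
    Node(2),
    Node(3),
    Node(4, [Node(9), Node(10)]),
])
contours = defaultdict(list)
root_level = hybrid_tree_traversal(tree, 3, contours)
```

Each node is visited once, which is consistent with the linear complexity stated above; in this example element 2 ends up in the contours for levels 1, 2, and 3, matching the description of Figure 5.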
3.3.2. Multigrid Method. Since directly solving the system of linear equations resulting from a large problem instance is intractable, more efficient numerical methods are used to solve the heat diffusion problem. The multigrid method is an iterative method of solving (typically sparse) systems of linear equations. It solves this problem by constructing a multi-level scheme, which greatly improves the efficiency of removing low frequency solution errors common for conventional iterative methods [12]. A description of this technique is shown in Algorithm 2.
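As a rough illustration of the multigrid idea, the sketch below applies a recursive V-cycle to a one-dimensional steady-state problem, -u'' = f with zero boundary values, on a uniform grid. It is a textbook-style example under our own assumptions (weighted Jacobi smoothing, full-weighting restriction, linear interpolation, a zero initial guess) and does not reproduce ISAC's heterogeneous three-dimensional solver.

```python
import math


def relax(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi smoothing; damps high-frequency error quickly."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u


def residual(u, f, h):
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r):
    """Full-weighting transfer of the residual to the coarser grid."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc


def prolong(ec):
    """Linear interpolation of the coarse correction to the finer grid."""
    nf = 2 * (len(ec) - 1)
    ef = [0.0] * (nf + 1)
    for j in range(len(ec) - 1):
        ef[2 * j] = ec[j]
        ef[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    ef[nf] = ec[-1]
    return ef


def v_cycle(u, f, h):
    if len(u) == 3:                       # coarsest level: solve directly
        u[1] = 0.5 * h * h * f[1]
        return u
    relax(u, f, h, 2)                     # pre-smoothing
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)   # recursive coarse grid correction
    correction = prolong(ec)
    for i in range(1, len(u) - 1):
        u[i] += correction[i]
    relax(u, f, h, 2)                     # post-smoothing
    return u


n = 64
h = 1.0 / n
xs = [i * h for i in range(n + 1)]
f = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]   # exact solution: sin(pi x)
u = [0.0] * (n + 1)
for _ in range(15):
    u = v_cycle(u, f, h)
max_error = max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xs))
max_residual = max(abs(r) for r in residual(u, f, h))
```

Relaxation alone removes high-frequency error quickly but low-frequency error slowly; moving the residual to a coarser grid turns the remaining smooth error into a cheaper problem, which is why a handful of cycles suffices here.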
3.3.3. Incremental Analysis. Upon initialization, the steady-state thermal analysis tool generates a coarse homogeneous oct-tree based on the chip size. Iterative temperature approximation is repeated until convergence to a stable profile. Elements across which temperature varies by more than a user-specified threshold are further partitioned into sub-elements. For each ordered element pair, $(i, j)$, given that $T_i$ is the temperature of element $i$ and that $S$ is the temperature threshold, the new number of elements, $Q$, along some partition $g$ follows.
$$Q = \left\lceil \log_2 \left( \frac{T_i - T_j}{S} \right) \right\rceil \qquad (3)$$
For each element, i, partitions along three dimensions are gath
ered into a threetuple (xi
a hybrid sub octtree. The number of subelements depends on the
ratio of the temperature difference to the threshold. Therefore, some
elements may be further partitioned and local thermal simulation re
peated. Simulation terminates when all elementtoelement temper
ature differences are smaller than the predefined threshold, S. This
method focuses computation on the most critical regions, increasing
analysis speed while preserving accuracy.
3.4. Temporal Adaptation in Thermal Analysis
ISAC uses an adaptive time marching technique for dynamic thermal
analysis. This technique is loosely related to the adaptive RungeKutta
method [13] described in Section 2. The computational cost of a finite
?yi
?zi) that governs partitioning element i into
difference time marching technique is ∑e?Euecewhere E is the set
of all elements, $u_e$ is the number of time steps for a given element, and $c_e$ is the time cost per evaluation for that element. For Runge-Kutta methods, assuming a constant evaluation time and noting that all elements experience the same number of evaluations, run time can be expressed as $uc \sum_{e \in E} n_e$, where $n$ is the number of a block's transitive neighbors. For these methods, element time synchronization permits evaluation amortization, eliminating the need to repeatedly evaluate transitive neighbors, yielding a time cost of $|E| \, uc$.
Analysis time is classically reduced by attacking $u$, either by using higher-order methods that allow larger steps under bounded error or by adapting global step size during analysis, e.g., the adaptive Runge-Kutta method. However, much greater gains are possible. As noted in Section 2, the requirement that all thermal elements be synchronized in time implies that, at each time step, all elements must have their local times advanced by the smallest step required by any element in the model. As indicated by Figure 3(b), this implies that most elements are forced to take unnecessarily small steps.
Although many time marching numerical methods for solving ordinary differential equations are based on methods that do not require explicit differentiation, these methods are conceptually based on repeated Taylor series expansions around increasing time instants. Revisiting these roots and basing time marching on Taylor series expansion allows element-by-element time step adaptation by supporting the extrapolation of temperatures at arbitrary times.
For many problems, the differentiation required for calculating Taylor series expansions is extremely complicated. Fortunately, for the dynamic IC thermal analysis problem, little more than the Laplace transform and linearity theorem are needed. Noting the definitions in Equation 2, and given that $T_n(t)$ is the temperature of element $n$ at time $t$, $G_{in}$ is the thermal conductivity between elements $i$ and $n$, $N_i$ are element $i$'s neighbors, $M$ is the neighbor depth, $\alpha_i = \sum_{n \in N_i} G_{in}$, and

$$\beta_i(t, M) = \begin{cases} \sum_{n \in N_i} T_n(t, M) \, G_{in} + V p_i & \text{if } M > 0 \\ V p_i & \text{otherwise} \end{cases} \tag{4}$$

the nearest-neighbor approximation of temperature of element $i$ at time $t + h$ follows.

$$T_i(t + h, M) = \frac{\beta_i(t + h, M - 1)}{\alpha_i} + \left( T_i(t) - \frac{\beta_i(t + h, M - 1)}{\alpha_i} \right) e^{-h \alpha_i / (\rho c_{p_i} V)} \tag{5}$$
under boundary conditions determined by the chip, package, and cooling solution.
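The recursion in Equations 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not ISAC's implementation: the dictionary layout (`T`, `t`, `C`, `Vp`, `G`) and the function names are assumptions made for this example, with `C` standing for the element heat capacity $\rho c_{p_i} V$ and `Vp` for the power term $V p_i$.

```python
import math

def beta(i, t, depth, elems):
    """Equation 4: neighbor contribution plus the power term V*p_i."""
    e = elems[i]
    if depth > 0:
        # Neighbor temperatures are themselves extrapolated to time t.
        return sum(g * extrapolate(n, t, depth, elems)
                   for n, g in e["G"].items()) + e["Vp"]
    return e["Vp"]

def extrapolate(i, t_target, depth, elems):
    """Equation 5: temperature of element i at t_target, for depth >= 1."""
    e = elems[i]
    alpha = sum(e["G"].values())          # alpha_i = sum of G_in
    h = t_target - e["t"]                 # step from the element's local time
    steady = beta(i, t_target, depth - 1, elems) / alpha
    return steady + (e["T"] - steady) * math.exp(-h * alpha / e["C"])

# Two coupled elements with illustrative (made-up) values.
elems = {
    "a": {"T": 0.0, "t": 0.0, "C": 1.0, "Vp": 2.0, "G": {"b": 1.0}},
    "b": {"T": 1.0, "t": 0.0, "C": 1.0, "Vp": 0.0, "G": {"a": 1.0}},
}
t1 = extrapolate("a", 1.0, 1, elems)   # depth 1: neighbors are not consulted
t2 = extrapolate("a", 1.0, 2, elems)   # depth 2: neighbor "b" is extrapolated first
```

Because each element carries its own local time `t`, the same function serves elements whose steps are not synchronized. The cost grows quickly with `depth`, which is the loss of amortization discussed in the text.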
Algorithm 2 Multigrid cycle
1: Pre-smoothing step: Iteratively relax initial random solution.
2: subtask Coarse grid correction
3:   Compute residue from finer grid.
4:   Approximate residue in coarser grid.
5:   Solve coarser grid problem using relaxation.
6:   if Coarsest level has been reached then
7:     Directly solve problem at this level.
8:   else
9:     Recursively apply the multigrid method.
10:  end if
11:  Map the correction back from the coarser to finer grid.
12: end subtask
13: Post-smoothing step: Add correction to solution at finest grid level.
14: Iteratively relax to obtain the final solution. {HF error eliminated.}
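The cycle in Algorithm 2 can be illustrated compactly, assuming a one-dimensional Poisson problem $-u'' = f$ with zero boundary values in place of ISAC's three-dimensional heat flow model. The function names, the Gauss-Seidel smoother, full-weighting restriction, and linear interpolation are choices made for this sketch, not details taken from ISAC.

```python
import math

def relax(u, f, h, sweeps):
    """Gauss-Seidel relaxation for -u'' = f with fixed end values."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])

def residual(u, f, h):
    """Residue f - A u at the interior points of the current grid."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(r):
    """Full-weighting transfer to a grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc

def prolong(ec, n_fine):
    """Linear interpolation of a coarse-grid correction onto the fine grid."""
    e = [0.0] * n_fine
    for j in range(len(ec)):
        e[2 * j] = ec[j]
    for i in range(1, n_fine, 2):
        e[i] = 0.5 * (e[i - 1] + e[i + 1])
    return e

def v_cycle(u, f, h, sweeps=2):
    n = len(u) - 1                      # number of intervals, a power of two
    if n == 2:                          # coarsest level: solve directly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    relax(u, f, h, sweeps)              # pre-smoothing
    rc = restrict(residual(u, f, h))    # residue approximated on coarser grid
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, sweeps)  # recursive coarse solve
    e = prolong(ec, len(u))             # map correction back to the finer grid
    for i in range(1, len(u) - 1):
        u[i] += e[i]
    relax(u, f, h, sweeps)              # post-smoothing
    return u

n = 64
h = 1.0 / n
f = [math.pi ** 2 * math.sin(math.pi * i * h) for i in range(n + 1)]
u = [0.0] * (n + 1)
for _ in range(10):
    v_cycle(u, f, h)
max_error = max(abs(u[i] - math.sin(math.pi * i * h)) for i in range(n + 1))
```

For this right-hand side the exact solution is $\sin(\pi x)$, so `max_error` measures how closely the cycles approach it on the 65-point grid.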
Note that the potentially differing values of step size, $h$, and local time, $t$, for all thermal elements imply that the number of transitive temperature extrapolations necessary for an element to advance by one time step may not be amortized over multiple uses, as is the case in the lockstep Runge-Kutta methods. As a result, for three-dimensional thermal analysis, the number of evaluations, $e$, is related to the transitive neighbor count, $d$, as follows:

$$e = |E| \left( \frac{4}{3} d^3 + 2 d^2 + \frac{8}{3} d \right) \tag{6}$$

i.e., the discretized volume of the implied octahedron.
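The per-element factor $\frac{4}{3} d^3 + 2 d^2 + \frac{8}{3} d$ in Equation 6 can be checked by counting grid positions directly. The sketch below assumes a regular three-dimensional grid in which transitive neighbors at depth $d$ are the positions within Manhattan distance $d$ of an element, excluding the element itself. The function names are chosen for this example.

```python
def transitive_neighbor_count(d):
    """Closed form from Equation 6, written with integer arithmetic."""
    return (4 * d**3 + 6 * d**2 + 8 * d) // 3

def brute_force_count(d):
    """Count grid points with 0 < |x| + |y| + |z| <= d."""
    r = range(-d, d + 1)
    return sum(1 for x in r for y in r for z in r
               if 0 < abs(x) + abs(y) + abs(z) <= d)

counts = [transitive_neighbor_count(d) for d in range(4)]
```

For $d = 1$ the count is the six face neighbors of an element, and the cubic growth in $d$ is what makes deep neighbor extrapolation expensive when it cannot be amortized.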
In summary, although it is common to improve the performance of time marching techniques by increasing their orders, thereby increasing their step sizes, for the IC thermal analysis problem greater gains are possible by decoupling element local times, allowing most elements to take larger than minimum-sized steps. However, this requires explicit differentiation and prevents the amortization of neighbor temperature extrapolation, increasing the cost of using higher-order methods relative to that of using fully synchronized element time marching techniques. As demonstrated in Section 4, this trade-off is an excellent one: the third-order element-by-element adaptation method yields speedups ranging from 122.81× to 337.23× when compared to the fourth-order adaptive Runge-Kutta method.
We now describe the element-by-element step size adaptation methods used by ISAC to improve performance while preserving accuracy. As illustrated in the right portion of Figure 4, dynamic analysis starts with an initial three-dimensional temperature profile and hybrid oct-tree that may have been provided by the synthesis tool or generated by ISAC using steady-state analysis; a chip/package/ambient heat capacity and thermal conductivity profile; and a power profile. After determining the initial maximum safe step sizes of all elements, ISAC initializes an event queue of elements sorted by their target times, i.e., the element's current time plus its step size. The element with the earliest target time is selected, its temperature is updated, a new maximum safe step size is calculated for the element, and it is reinserted in the event queue. The event queue serves to minimize the deviation between decoupled element current times, thereby avoiding temperature extrapolation beyond the limits of the local time bounded-order expansions. The new step size must take into account the truncation error of the numerical method in use as well as the step sizes of the neighbors.
Given that $h_i$ is element $i$'s current step size, $v$ is the order of the time marching numerical method, $u$ is a constant slightly less than one, $y$ is the error threshold, $\frac{dT_i}{dt}(t)$ is the derivative of $i$'s temperature as a function of time at time $t$, and $t_i$ is $i$'s current time, the safe next step size for a block, regardless of its neighbors, follows.

$$s_i(t_i) = u \sqrt[v]{\frac{y}{\left| \frac{dT_i}{dt}(t_i) \cdot \frac{3}{2} h_i - \frac{3}{4} h_i \left( \frac{dT_i}{dt}(t_i) + \frac{dT_i}{dt}\left( t_i + \frac{3}{4} h_i \right) \right) \right|}} \tag{7}$$
This method of computing a new step size is based on the literature [14]. However, it uses non-integer test step sizes to bracket the most probable new step size.
It is necessary to further bound the step size to ensure that the local times of neighbors are sufficiently close for accurate temperature extrapolation. Given that $N_i$ is the set of $i$'s neighbors and $w$ is a small constant, e.g., 3, the new step size follows.

$$h'_i = \min\left( s_i(t_i), \; \min_{n \in N_i} \left( w \cdot (t_n + h_n - t_i) \right) \right) \tag{8}$$
For efficiency, the $h_n$ of a neighbor at its own local time is used.
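The event-queue loop and the bound of Equation 8 can be sketched as follows. This is an illustration under stated assumptions rather than ISAC's code: `safe_step` and `update` are hypothetical callables standing in for Equations 7 and 5, and the `h_min` guard and the clamp at `t_end` are additions made here so that the toy loop always terminates.

```python
import heapq

def march(elems, neighbors, safe_step, update, t_end, w=3.0, h_min=1e-6):
    """Advance all elements to t_end with decoupled local times.

    elems: id -> {"t": local time, "h": step size, "T": temperature}
    safe_step(i) stands in for Equation 7, update(i, t) for Equation 5.
    Returns the number of steps taken by each element.
    """
    counts = {i: 0 for i in elems}
    queue = [(e["t"] + e["h"], i) for i, e in elems.items()]
    heapq.heapify(queue)
    while queue:
        target, i = heapq.heappop(queue)   # element with the earliest target time
        e = elems[i]
        e["T"] = update(i, target)
        e["t"] = target
        counts[i] += 1
        if e["t"] >= t_end:
            continue
        # Equation 8: stay within w times the gap to each neighbor's target time.
        bound = min((w * (elems[n]["t"] + elems[n]["h"] - e["t"])
                     for n in neighbors[i]), default=float("inf"))
        h = max(h_min, min(safe_step(i), bound))   # h_min guard is an assumption
        e["h"] = min(h, t_end - e["t"])            # do not step past t_end
        heapq.heappush(queue, (e["t"] + e["h"], i))
    return counts

elems = {0: {"t": 0.0, "h": 0.1, "T": 50.0}, 1: {"t": 0.0, "h": 1.0, "T": 30.0}}
neighbors = {0: [1], 1: [0]}
limits = {0: 0.1, 1: 1.0}                  # hypothetical safe step sizes
counts = march(elems, neighbors, lambda i: limits[i],
               lambda i, t: elems[i]["T"], t_end=2.0)
```

In this toy run element 0 is limited to small steps while element 1 may take larger ones, so element 1 reaches `t_end` with fewer temperature updates. The neighbor bound keeps the two local times close enough for extrapolation.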
This temporal adaptation technique based upon Equations 4, 5, and 8 is general, and has been tested in first-order, second-order, and third-order numerical methods. As indicated in Section 4.2, the result is a 122.81–337.23× speedup without loss of accuracy when compared to the fourth-order adaptive Runge-Kutta method.
3.5. Impact of Variable Thermal Conductivity
Thermal conductivity for a material is its ratio of heat flux density to temperature gradient. The thermal conductivity of a material, e.g., silicon, is a function of temperature, $T$. An IC's thermal conductivity, $k(\vec{r}, T)$, is also a function of position, $\vec{r}$. Most previous fast IC thermal analysis work ignores the dependence of thermal conductivity on temperature, approximating it as a constant. This introduces inaccuracy in analysis results. In contrast, ISAC models thermal conductivity as a function of temperature.
Position- and temperature-dependent thermal conductivity follows: $k(\vec{r}, T) = k_0 (T / 300)^{-\alpha}$, where $k_0$ is the material's conductivity value at temperature 300 K and $\alpha$ is a constant for the specific material. Recalculating the thermal conductivity value after each iteration for all the elements would be computationally expensive. In order to maintain both accuracy and performance, ISAC uses a post-processing feedback loop to determine the impact of variations in thermal conductivity upon temperature profile. As described in Section 4.1, the consequences were over 5 K improvements in peak temperature accuracy when compared with a model assuming constant thermal conductivity.
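The idea of a feedback loop between temperature and conductivity can be illustrated with a deliberately simple stand-in for the thermal solver: a one-dimensional slab with uniform heat flux, whose conductivity is evaluated at the hot-side temperature. Everything here is an assumption made for illustration, including the power-law form with a negative exponent, the values `k0 = 148.0` and `alpha = 4/3`, the heat flux, and the sink temperature. ISAC's actual loop operates on its full three-dimensional model.

```python
def conductivity(temp_k, k0=148.0, alpha=4.0 / 3.0):
    """Assumed power law: k0 at 300 K, falling as temperature rises."""
    return k0 * (temp_k / 300.0) ** (-alpha)

def peak_temperature(q, thickness, t_sink, max_iters=50, tol=1e-9):
    """Fixed-point feedback: solve with current k, update k, and repeat."""
    temp = t_sink + q * thickness / conductivity(300.0)   # constant-k first pass
    for _ in range(max_iters):
        updated = t_sink + q * thickness / conductivity(temp)
        if abs(updated - temp) < tol:
            return updated
        temp = updated
    return temp

q = 1.0e6               # heat flux in W/m^2 (assumed)
thickness = 0.625e-3    # slab thickness in m
t_sink = 350.0          # temperature at the cool side in K (assumed)
t_const = t_sink + q * thickness / conductivity(300.0)
t_var = peak_temperature(q, thickness, t_sink)
```

Under these assumptions the temperature-dependent model reports a higher peak than the constant-conductivity model, which is the direction of the error discussed in Section 4.1.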
3.6. The Use of ISAC in IC Synthesis
As explained in Section 2, ISAC was developed primarily for use within IC synthesis, although it may also be used to provide guidance during manual architectural decisions. ISAC may be used to solve both the steady-state and dynamic thermal analysis problems described in Section 3.1. For use in steady-state analysis, ISAC requires three-dimensional chip-package profiles of thermal conductivity and power density. The required IC power profiles are typically produced by a floorplanner used within the synthesis process [11], [15], [16]. ISAC produces a three-dimensional steady-state temperature profile. When used for dynamic thermal analysis, ISAC requires three-dimensional chip-package profiles of thermal conductivity, power density, heat capacity, (optionally) initial temperature, and an elapsed IC duration after which to report results. It produces a three-dimensional temperature profile at any requested time.
Both steady-state and dynamic thermal analysis solvers within ISAC have been accelerated, using the techniques described in Sections 3.3 and 3.4, in order to permit efficient use after each tentative change to an IC power profile during synthesis or design. Use within synthesis has been validated (see Section 4) by integrating ISAC within a behavioral synthesis algorithm [11].
4. Experimental Results
In this section, we validate and evaluate the performance of ISAC. Experiments were conducted on Linux workstations of similar performance. Evaluation focuses on accuracy and efficiency. ISAC supports both steady-state and dynamic thermal analysis. Steady-state thermal analysis is validated against FEMLAB, a widely-used commercial physics modeling package, using two actual chip designs from IBM and the MIT Raw group. Dynamic thermal simulation is validated against a fourth-order adaptive Runge-Kutta method using a set of synthesis benchmarks. Efficiency determines the feasibility of using thermal analysis during synthesis and design. To characterize the efficiency of ISAC, we compare it with other popular numerical analysis methods by conducting steady-state and dynamic thermal analysis on the power profiles produced during IC synthesis.
4.1. Steady-State Thermal Analysis Results
This section reports the accuracy and efficiency of the steady-state thermal simulation techniques used in ISAC. We first conduct the following experiments using two actual chip designs. The first IC is designed by IBM. The silicon die is 13 mm × 13 mm × 0.625 mm; it is soldered to a ceramic carrier using flip-chip packaging and attached to a heat sink. A detailed 11 × 11 block static power profile was produced using a power simulator. The second IC is a chip-level multiprocessor designed by the MIT Raw group. This IC contains 16 on-chip MIPS processor cores organized in a 4 × 4 array. The die area is 18.2 mm × 18.2 mm. It uses an IBM ceramic column grid array package with direct lid attach thermal enhancement. The static power profile is based on data provided in the literature [17]. We validate ISAC by comparing its results with those produced by FEMLAB, a widely-used commercial three-dimensional finite element based physics modeling package. Table 1 provides thermal analysis results produced by ISAC and FEMLAB for these ICs.
Average error, $e_{avg}$, will be used as a measure of difference between thermal profiles:

$$e_{avg} = \frac{1}{N} \sum_{i=0}^{N-1} \frac{|T_i - T'_i|}{T_i} \tag{9}$$

where $N$ is the total number of elements on the surface of the active layer of the silicon die modeled by ISAC. $T_i$ and $T'_i$ are the temperatures of element $i$ reported by FEMLAB and ISAC, respectively. This is conservative. If comparisons were made in degrees Kelvin instead of degrees Celsius, the reported percentage error would be even lower.
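The average error metric of Equation 9, and the remark that a Celsius comparison is the conservative choice, can be reproduced with a few lines of Python. The temperature values below are hypothetical and serve only to exercise the formula. The function name is chosen for this example.

```python
def average_error(reference, measured):
    """Equation 9: mean relative difference between two temperature profiles."""
    if not reference or len(reference) != len(measured):
        raise ValueError("profiles must be non-empty and of equal length")
    total = sum(abs(t - tp) / t for t, tp in zip(reference, measured))
    return total / len(reference)

femlab_c = [100.0, 50.0]    # hypothetical reference temperatures in Celsius
isac_c = [99.0, 51.0]       # hypothetical ISAC temperatures in Celsius
err_celsius = average_error(femlab_c, isac_c)
err_kelvin = average_error([t + 273.15 for t in femlab_c],
                           [t + 273.15 for t in isac_c])
```

The absolute differences are identical in both units, but the Kelvin denominators are larger, so `err_kelvin` comes out smaller than `err_celsius`.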
In Table 1, the second and third columns show the peak and average temperatures of the surface of the active layer of the silicon dies of these chips, as reported by ISAC. Compared to FEMLAB, the average errors, $e_{avg}$, are 1.7% and 0.7%. The next four columns show the efficiency of ISAC in terms of CPU time, speedup, memory use, and number of elements. For comparison, the next three columns show the efficiency using a multigrid analysis technique with homogeneous meshing. These results clearly demonstrate that element resolution adaptation allows ISAC to achieve dramatic improvements in efficiency compared to the conventional multigrid technique. CPU times decrease to 3.6% and 0.14% and memory usage decreases to 5.6% and 2.4% of the times and memory required by the homogeneous technique. Note that multigrid steady-state analysis itself is a highly efficient approach [8]. Using FEMLAB, both simulations take at least 20 minutes.
Existing IC thermal simulators neglect the dependence of thermal conductivity on temperature, potentially resulting in substantial errors in peak temperature. In previous work, this error was not detected during validation because the models against which they were validated also used constant values for thermal conductivity. Temperature varies through the silicon die. Therefore, ignoring the dependence of thermal conductivity on temperature may introduce significant errors.
The last two columns of Table 1 show the peak and average temperatures, reported by FEMLAB, using thermal conductivities at 25 °C, i.e., room temperature. They show that, for both chips, the peak temperatures are underestimated by approximately 5 °C. This effect will be even more serious in designs with higher peak temperatures. Note that the source of inaccuracy is not the specific value of thermal conductivity chosen. No constant value will allow accurate results in general: an accurate IC thermal model must consider the dependence of silicon thermal conductivity upon temperature.
To further evaluate its efficiency, we use ISAC to conduct thermal analysis for the behavioral synthesis algorithm described in Section 2. This iterative algorithm does both behavioral-level and physical-level optimization. In this experiment, ISAC performs steady-state thermal analysis for each intermediate solution generated during synthesis of ten commonly-used behavioral synthesis benchmarks.
Table 2 shows the performance of ISAC when used for steady-state thermal analysis during behavioral synthesis. The second, third, and fourth columns show the overall CPU time, speedup, and average memory used by ISAC to conduct steady-state thermal analysis for all the intermediate solutions. Column five shows the average error compared to a conventional homogeneous meshing multigrid method, whose overall CPU time and average memory use are shown in columns six and seven. ISAC achieves almost the same accuracy with much lower run-time overhead. The last column shows the CPU time used by the behavioral synthesis algorithm. Comparing column two with the last column makes it clear that, when used for steady-state thermal analysis, ISAC consumes only a fraction of the CPU time required for synthesis: it is feasible to use ISAC during synthesis.
4.2. Dynamic Thermal Analysis Results
In this section, we evaluate the accuracy and efficiency of the dynamic thermal analysis techniques used in ISAC. Heterogeneous spatial res