
Adaptive Chip-Package Thermal Analysis for Synthesis and Design

Yonghong Yang† Zhenyu (Peter) Gu‡ Changyun Zhu† Li Shang† Robert P. Dick‡

†ECE Department

Queen’s University

Kingston, ON K7L 3N6, Canada

{4yy6, 4cz1}@qlink.queensu.ca, li.shang@queensu.ca

‡EECS Department

Northwestern University

Evanston, IL 60208, U.S.A.

{zgu646, dickrp}@ece.northwestern.edu

Abstract

Ever-increasing integrated circuit (IC) power densities and peak

temperatures threaten reliability, performance, and economical cool-

ing. To address these challenges, thermal analysis must be embedded

within IC synthesis. However, detailed thermal analysis requires ac-

curate three-dimensional chip-package heat flow analysis. This has

typically been based on numerical methods that are too computation-

ally intensive for numerous repeated applications during synthesis or

design. Thermal analysis techniques must be both accurate and fast

for use in IC synthesis.

This article presents a novel, accurate, incremental, self-adaptive,

chip-package thermal analysis technique, called ISAC, for use in IC

synthesis and design. It is common for IC temperature variation

to strongly depend on position and time. ISAC dynamically adapts

spatial and temporal modeling granularity to achieve high efficiency

while maintaining accuracy. Both steady-state and dynamic thermal

analysis are accelerated by the proposed heterogeneous spatial res-

olution adaptation and temporally decoupled element time marching

techniques. Each technique enables orders of magnitude improvement

in performance while preserving accuracy when compared with other

state-of-the-art adaptive steady-state and dynamic IC thermal analysis

techniques. Experimental results indicate that these improvements are

sufficient to make accurate dynamic and static thermal analysis prac-

tical within the inner loops of IC synthesis algorithms. ISAC has been

validated against reliable commercial thermal analysis tools using in-

dustrial and academic synthesis test cases and chip designs. It has

been implemented as a software package suitable for integration in IC

synthesis and design flows and has been publicly released.

1. Introduction

Integrated circuit (IC) densities and performance requirements are continuously increasing. The crucial task of managing the resulting increase in power density and peak IC temperature is becoming more difficult [1],[2]. Current architectural-level design automation and synthesis tools have multiple design metrics, such as power consumption,

temperature, performance, cost, and reliability. IC designs must care-

fully trade off these metrics. However, if not properly addressed, in-

creased IC temperature affects other design metrics including perfor-

mance (via decreased transistor switching speed and increased inter-

connect latency), power and energy consumption (via increased leak-

age power), reliability (via electromigration, hot carrier effects, ox-

ide thermal breakdown, etc.), and price (via increased system cooling

cost). Considering thermal issues during IC synthesis and design is

now necessary. When determining the impact of each decision in the

synthesis or design process, the impacts of changed thermal profile on

performance, power, price, and reliability must be considered. This

requires repeated use of fast, accurate thermal analysis tools during

synthesis.

The IC thermal analysis problem may be separated into two

subproblems: steady-state (or static) analysis and dynamic analysis.

Steady-state analysis determines the temperature profile to which an

IC converges as time approaches infinity, given power and thermal conductivity profiles. Dynamic thermal analysis determines the temperature profile of an IC at any time given an initial temperature, power,

heat capacity, and thermal conductivity profiles.

This work is supported in part by the NSERC Discovery Grant #388694-01, and in part by

the NSF under award CNS-0347941.

Numerical analysis techniques were also proposed to character-

ize the thermal profile of on-chip interconnect layers [3–5]. Recently,

Skadron et al. developed steady-state and dynamic thermal analysis

tools for microarchitectural evaluation [6]. Neither the matrix tech-

niques of the steady-state analysis tool nor the lock-step fourth-order

Runge-Kutta time marching technique used for dynamic analysis makes

use of spatial or asynchronous temporal adaptation; accuracy or perfor-

mance suffer. Researchers have proposed quad-tree mesh refinement

for thermal analysis [7], but did not consider local temporal adapta-

tion. Li et al. proposed an efficient multigrid modeling technique to

conduct full-chip steady-state thermal analysis [8]. Although the advantages of heterogeneous element discretization are noted, no systematic adaptation method is provided. Zhan and Sapatnekar [9] proposed a steady-state thermal analysis method based on Green’s function that was accelerated by using discrete cosine transforms and look-up tables. However, these methods [8],[9] do not support dynamic thermal analysis.

Existing IC thermal analysis tools are capable of providing either

accuracy or speed, but not both. Accurate thermal analysis requires

expensive computation for many elements in some regions, at some

times. Conventional IC thermal analysis techniques ensure accuracy

by choosing uniformly fine levels of detail across time and space, i.e.,

they use equivalent physical sizes or time step durations for all thermal

elements. The large number of elements and time steps resulting from

such techniques makes them computationally intensive and, therefore,

impractical for use within IC synthesis. This article presents validated,

synthesis-oriented IC thermal analysis techniques that differ from ex-

isting work by doing operation-by-operation dynamic adaptation of

temporal and spatial resolution in order to dramatically reduce com-

putational overhead without sacrificing accuracy. Experimental results

indicate that the proposed spatial adaptation technique improves CPU

time by 21.64–690.00× and that the temporal adaptation technique improves CPU time by 122.81–337.23×. Although much faster than conventional analysis techniques, the proposed techniques have been designed for accuracy even when this increases complexity and run time,

e.g., by correctly modeling the dependence of thermal conductivity

on temperature. These algorithms have been validated against FEM-

LAB, a reliable commercial finite element physical process modeling

package, and a high-resolution spatially and temporally homogeneous

initial value problem solver. Experimental results indicate that using

existing thermal analysis techniques within an IC synthesis flow would

increase CPU time by many orders of magnitude, making it imprac-

tical to synthesize complex ICs. The proposed techniques make both

dynamic and static thermal analysis practical within the inner loop of

IC synthesis algorithms. They have been implemented as a software

tool called ISAC that has been publicly released [10].

This article is organized as follows. Section 2 gives a motivating

example, which illustrates the need for fast and accurate thermal analy-

sis during IC synthesis and suggests techniques to reach this goal. Sec-

tion 3 describes the model, algorithms, and implementation of ISAC, a

fast and accurate steady-state and dynamic thermal analysis tool. Section 4 presents experimental results validating ISAC and demonstrating the dramatic performance advantages resulting from spatial and temporal adaptation during thermal analysis. Section 5 presents conclusions.

2. Motivating Examples

In this section, we use a thermal-aware IC synthesis flow to demonstrate the challenges of fast and accurate IC thermal modeling. Figure 1 shows an integrated behavioral-level and physical-level IC synthesis system [11]. This synthesis system uses a simulated annealing algorithm to jointly optimize several design metrics, including performance, area, power consumption, and peak IC temperature. It conducts both behavioral-level and physical-level stochastic optimization moves, including scheduling, voltage assignment, resource binding, floorplanning, etc. An intermediate solution is generated after each optimization move. A detailed two-dimensional power profile is then reported based on the physical floorplan. Thermal analysis algorithms are invoked to guide optimization moves.

3-9810801-0-6/DATE06 © 2006 EDAA

[Figure 1. Thermal-aware synthesis flow. Blocks: input specification; high-level optimization (scheduling, voltage partition, resource binding, etc.); physical-level optimization (floorplanning); iterative optimization with power analysis, thermal analysis, performance profiling, and multi-objective cost evaluation; final solutions.]

[Figure 2. Thermal analysis during IC synthesis. (a) Silicon chip and package (silicon die and cooling package). (b) Temperature profile for active layer and heatsink: temperature (°C, 35 to 90) versus position (mm, -8 to 8) for the IC active layer and the heatsink/IC interface.]

As illustrated by the example synthesis flow, for each intermediate

solution, detailed thermal characterization requires full chip-package

thermal modeling and analysis using numerical methods, which are

computationally intensive. Figure 2 shows a full chip-package ther-

mal modeling example from an IBM IC design (see Section 4.1 for

more detail). The steady-state thermal profile of the active layer of the

silicon die in conjunction with the top layer of the cooling package,

shown in Figure 2(b), was characterized using a multigrid thermal

solver by partitioning the chip and the cooling package into 131,072

homogeneous thermal elements. Without spatial and temporal adapta-

tion, the solver requires many seconds or minutes when run on a high-

performance workstation. Compared to steady-state thermal modeling,

characterizing the dynamic thermal profile of an IC is even more time consuming. IC synthesis requires a large number of optimization steps; thermal modeling can easily become its performance bottleneck.

A key challenge in thermal-aware IC synthesis is the develop-

ment of fast and accurate thermal analysis techniques. Fundamentally,

IC thermal modeling is the simulation of heat transfer from heat pro-

ducers (transistors and interconnect), through silicon die and cooling

package, to the ambient environment. This process is modeled with

partial differential equations. In order to approximate the solutions of

these equations using numerical methods, finite discretization is used,

i.e., an IC model is decomposed into numerous three-dimensional el-

ements. Adjacent elements interact via heat diffusion. Each element

is sufficiently small to permit its temperature to be expressed as a dif-

ference equation, as a function of time, its material characteristics, its

power dissipation, and the temperatures of its neighboring elements.

In an approach analogous to electric circuit analysis, thermal RC (or R) networks are constructed to perform dynamic (or steady-state) thermal analysis. Direct matrix operations, e.g., inversion, may be used for steady-state thermal analysis. However, the computational demand of this technique hinders its use within synthesis. Dynamic thermal analysis may be conducted by partitioning the simulation period into small time steps. The local times of all elements are then advanced, in lock-step, using transient temperature approximations yielded by difference equations. The computational complexity of dynamic thermal analysis is a function of the number of grid elements and time steps. Therefore, to improve the efficiency of thermal modeling, the key issue is to optimize the spatial and temporal modeling granularity, eliminating non-essential elements and stages.

[Figure 3. The potential of adaptive thermal modeling. (a) Inter-element thermal gradient: number of elements versus normalized gradient (logarithmic axis, 10^0 to 10^2). (b) Normalized maximum step size: number of elements versus normalized step size (logarithmic axis, 10^0 to 10^2).]

There is a tension between accuracy and efficiency when choosing modeling granularity. Increasing modeling granularity, i.e., using larger elements or time steps, reduces analysis complexity but may also decrease accuracy. Uniform temperature is

assumed within each thermal element. Intra-element thermal gradients

are neglected. Therefore, increasing spatial modeling granularity nat-

urally increases modeling errors. Similarly, increasing time step size

may result in failure to capture transient thermal fluctuation or may in-

crease truncation error when the actual temperature functions of some

elements are of higher order than the difference equations used to ap-

proximate them.

IC thermal profiles contain significant spatial and temporal varia-

tion due to the heterogeneity of thermal conductivity and heat capacity

in different materials, as well as varying power profiles resulting from

non-uniform functional unit activities, placements, and schedules. Figure 3(a) shows the inter-element thermal gradient distribution using homogeneous meshing of the example shown in Figure 2. The histogram

is normalized to the smallest inter-element thermal gradient. This fig-

ure contains a wide distribution of thermal gradients: heterogeneous

spatial element discretization refinement based on thermal gradients

has the potential to improve performance without impacting accuracy.

For dynamic thermal simulation, the size of each thermal element’s time steps should permit accurate approximation by the element

difference equations. An IC may experience different thermal fluctu-

ations at different locations. Therefore, the best sizes of time steps

for elements at different locations may vary. Figure 3(b) shows the

maximum potential time step size of each individual block based on local

thermal variation; local adaptation of time step sizes has the potential

to improve performance without impacting accuracy.

3. Thermal Analysis Model and Algorithms

This section gives details on the proposed thermal analysis techniques.

3.1. IC Thermal Analysis Problem Definition

IC thermal analysis is the simulation of heat transfer through heteroge-

neous material among heat producers (e.g., transistors) and heat con-

sumers (e.g., heat sinks attached to IC packages). Modeling thermal

conduction is analogous to modeling electrical conduction, with ther-

mal conductivity corresponding to electrical conductivity, power dissi-

pation corresponding to electrical current, heat capacity corresponding

to electrical capacitance, and temperature corresponding to voltage.

The equation governing heat diffusion via thermal conduction in

an IC follows.

$$\rho c_p \frac{\partial T(\vec{r}, t)}{\partial t} = \nabla \cdot \left( k(\vec{r}) \, \nabla T(\vec{r}, t) \right) + p(\vec{r}, t) \qquad (1)$$

In Equation 1, $\rho$ is the material density; $c_p$ is the mass heat capacity; $T(\vec{r}, t)$ and $k(\vec{r})$ are the temperature and thermal conductivity of the material at position $\vec{r}$ and time $t$; and $p(\vec{r}, t)$ is the power density of the heat source. Note that, in reality, the thermal conductivity, $k$, also depends on temperature (see Section 3.5). ISAC supports arbitrary heterogeneous thermal conduction models. For example, a model may be composed of a heat sink in a forced-air ambient environment, heat spreader, bulk silicon, active layer, and packaging material or any other geometry and combination of materials.

[Figure 4. Overview of ISAC. Inputs: 3-D chip/package/ambient heat capacity and thermal conductivity profiles, a power profile, and an optional initial 3-D temperature profile and hybrid oct-tree. Steady-state thermal analysis: multigrid incremental solver with spatial hybrid oct-tree refinement, repeated until the thermal gradient conditions are satisfied. Dynamic thermal analysis: initialize/update the discrete event simulator queue, process one pending event, and adapt neighboring element step sizes, repeated until the sample period is reached. The profile is then adapted based on k(T) until convergence. Output: 3-D thermal profile (and hybrid oct-tree).]

In order to do numerical thermal analysis, a seven point finite difference discretization method can be applied to the left and right side of Equation 1, i.e., the IC thermal behavior may be modeled by decomposing it into numerous cubic elements, which may be of non-uniform sizes. Adjacent elements interact via heat diffusion. Each element has a power dissipation, temperature, thermal capacitance, as well as a thermal resistance to adjacent elements. The discretized equation at an interior point of a homogeneous material follows.

$$\rho c_p V \frac{T^{m+1}_{i,j,k} - T^{m}_{i,j,k}}{\Delta t} = -2 (G_x + G_y + G_z) T^{m}_{i,j,k} + G_x T^{m}_{i-1,j,k} + G_x T^{m}_{i+1,j,k} + G_y T^{m}_{i,j-1,k} + G_y T^{m}_{i,j+1,k} + G_z T^{m}_{i,j,k-1} + G_z T^{m}_{i,j,k+1} + V p_{i,j,k} \qquad (2)$$

Given that $\Delta x$, $\Delta y$, and $\Delta z$ are discretization steps in dimension x, y and z, $V = \Delta x \Delta y \Delta z$. $G_x$, $G_y$ and $G_z$ are the thermal conductivities between adjacent elements. They are defined as follows: $G_x = k \Delta y \Delta z / \Delta x$, $G_y = k \Delta x \Delta z / \Delta y$, and $G_z = k \Delta x \Delta y / \Delta z$. $\Delta t$ is the discretization step in time t. For steady-state analysis, the left term in Equation 2 expressing temperature variation as function of time, t, is dropped. For either the dynamic or steady-state version of the problem, the equations for all IC elements can be represented as a matrix. Although it is possible to directly solve this problem, the computational expense is prohibitive.
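As an illustration of Equation 2, the following minimal sketch advances a uniform grid by one lock-step time step. It is not ISAC's implementation: the function name `explicit_step`, the nested-list layout, and the adiabatic treatment of boundary elements (a missing neighbor exchanges no heat) are assumptions made for this example. For interior elements the update is algebraically the same as Equation 2.

```python
def explicit_step(T, p, k, rho, cp, dx, dy, dz, dt):
    """Advance temperatures T[i][j][l] by one explicit step of Equation 2.

    T: temperatures (K); p: power densities (W/m^3), same shape as T.
    k, rho, cp: homogeneous conductivity, density, and mass heat capacity.
    Boundary elements are treated as adiabatic (assumption of this sketch).
    """
    nx, ny, nz = len(T), len(T[0]), len(T[0][0])
    V = dx * dy * dz
    Gx = k * dy * dz / dx
    Gy = k * dx * dz / dy
    Gz = k * dx * dy / dz
    offsets = ((1, 0, 0, Gx), (-1, 0, 0, Gx),
               (0, 1, 0, Gy), (0, -1, 0, Gy),
               (0, 0, 1, Gz), (0, 0, -1, Gz))
    new_T = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for l in range(nz):
                # Heat generated inside the element: V * p.
                heat = V * p[i][j][l]
                # Heat received from each existing neighbor: G * (T_nb - T).
                for di, dj, dl, G in offsets:
                    a, b, c = i + di, j + dj, l + dl
                    if 0 <= a < nx and 0 <= b < ny and 0 <= c < nz:
                        heat += G * (T[a][b][c] - T[i][j][l])
                new_T[i][j][l] = T[i][j][l] + dt * heat / (rho * cp * V)
    return new_T
```

With silicon-like values chosen only for illustration (k = 150 W/(m·K), ρ = 2330 kg/m³, c_p = 700 J/(kg·K), 0.1 mm cubes), an explicit step of this kind is stable only for time steps well below ρ c_p V / (2(G_x + G_y + G_z)), which is one reason lock-step time marching is expensive.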

3.2. ISAC Overview

Figure 4 gives an overview of ISAC, our proposed incremental, self-

adaptive, chip-package, thermal analysis tool. When used for steady-

state thermal analysis, it takes, as input, a three-dimensional chip and

package thermal conductivity profile, as well as a power dissipation

profile. A multigrid incremental solver is used to progressively refine

thermal element discretization to rapidly produce a temperature profile.

When used for dynamic thermal analysis, in addition to the input

data required for steady-state analysis, ISAC requires the chip-package

heat capacity profile. In addition, it may accept an initial temperature

profile and efficient element grid. If these inputs are not provided, the

dynamic analysis technique uses the steady-state analysis technique to

produce its initial temperature profile and element grid. It then re-

peatedly updates the local temperatures and times of elements at asyn-

chronous time steps, appropriately adapting the step sizes of neighbors

to maintain accuracy.

As described in Section 3.5, after analysis is finished, the temper-

ature profile is adapted using a feedback loop in which thermal con-

ductivity is modified based upon temperature. Upon convergence, the

temperature profile is reported to the IC synthesis tool or designer.

3.3. Spatial Adaptation in Thermal Analysis

In this section, we present an efficient technique for adapting ther-

mal element spatial resolution for thermal analysis. This technique

uses incremental refinement to generate a tree of heterogeneous paral-

lelepipeds that supports fast thermal analysis without loss in accuracy.

Within ISAC, this technique is incorporated with an efficient multigrid numerical analysis method, yielding a complete steady-state thermal analysis solution. Dynamic thermal analysis also benefits from the proposed spatial adaptation technique due to the dramatic reduction of the number of grid elements that must be considered during time marching simulation.

Algorithm 1 hybrid_tree_traversal(node_root)
 1: if node_root is a leaf node then
 2:   Add node_root to contour[finest_level]; return finest_level
 3: end if
 4: for each intermediate child chi_node_i do
 5:   level(chi_node_i) = hybrid_tree_traversal(chi_node_i)
 6:   level_min = min(level_min, level(chi_node_i))
 7: end for
 8: for each intermediate child chi_node_i do
 9:   if level(chi_node_i) > level_min then
10:     Add chi_node_i to contour[level(chi_node_i) - 1], ..., contour[level_min]
11:   end if
12: end for
13: Add node_root to contour[level_min - 1]
14: return level_min - 1

[Figure 5. Heterogeneous spatial resolution adaptation. A two-dimensional quad-tree example with root node 0, nodes 1 to 12, and contours at levels 1, 2, and 3.]

3.3.1. Hybrid Data Structure. Efficient spatial adaptation in thermal

analysis relies on sophisticated data structures, i.e., it requires the effi-

cient organization of large data sets, representation of multi-level mod-

eling resolutions, and inter-level transition. The proposed technique

is supported by a hybrid oct-tree data structure, which provides an ef-

ficient and flexible representation to support spatial resolution adap-

tation. A hybrid oct-tree is a tree that maintains spatial relationships

among parallelepipeds in three dimensions. Each node may have up to

eight immediate children. Figure 5 shows a hybrid tree representation.

For the sake of simplicity, a two-dimensional quad-tree is shown instead of a three-dimensional hybrid oct-tree. In the hybrid oct-tree, different modeling resolutions are organized into contours along the tree hierarchy, e.g., the contour formed by the leaf nodes represents the finest

spatial resolution (in this example, elements 2,3,6,7,...,12). Hetero-

geneous spatial resolution may result in a thermal element residing at

multiple resolution levels, e.g., element 2 resides at level 1, 2, and 3.

This information is represented as nodes existing in multiple contours

in the tree.

Spatial resolution adaptation requires two basic operations, partitioning and coarsening. In a hybrid oct-tree, partitioning is the process of breaking a leaf node along arbitrary orthogonal axes, e.g., nodes 9 and 10 result from refining node 4. Coarsening is the process of merging direct sub-nodes into their parent, e.g., nodes 11 and 12 merged into node 5. To conduct multi-resolution thermal analysis, we propose an efficient contour search algorithm, with computational complexity $O(N)$, to determine the thermal grid elements belonging to the same resolution level. As shown in Algorithm 1, leaf nodes are assigned to the finest resolution level (lines 1–3). The resolution level of a parent node of a subtree equals the minimal resolution level of all of its intermediate children nodes minus one (lines 4–7 and 13). An element may reside in multiple resolution levels (lines 8–12). As will be explained later, this algorithm provides an efficient solution to traverse different spatial resolutions, thereby supporting efficient multigrid thermal analysis.
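The contour search of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading, not ISAC's code: the `Node` class and the dictionary-of-lists contour store are hypothetical, and the example tree is inferred from the description of Figure 5 (nodes 9 and 10 under node 4, nodes 11 and 12 under node 5); attaching nodes 5 to 8 under node 1 is an assumption.

```python
from collections import defaultdict


class Node:
    """Hypothetical tree node: a name plus a list of child nodes."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def hybrid_tree_traversal(node, finest_level, contours):
    """Assign nodes to resolution contours and return the node's own level."""
    if not node.children:
        # Lines 1-3: leaves belong to the finest resolution level.
        contours[finest_level].append(node.name)
        return finest_level
    # Lines 4-7: visit the children and find their minimal level.
    levels = [hybrid_tree_traversal(c, finest_level, contours)
              for c in node.children]
    level_min = min(levels)
    # Lines 8-12: a child finer than level_min also appears in every
    # coarser contour down to level_min.
    for child, level in zip(node.children, levels):
        for coarser in range(level - 1, level_min - 1, -1):
            contours[coarser].append(child.name)
    # Lines 13-14: the parent sits one level above its coarsest child.
    contours[level_min - 1].append(node.name)
    return level_min - 1


def example_tree():
    """Quad-tree loosely following Figure 5 (structure partly assumed)."""
    n5 = Node(5, [Node(11), Node(12)])
    n1 = Node(1, [n5, Node(6), Node(7), Node(8)])
    n4 = Node(4, [Node(9), Node(10)])
    return Node(0, [n1, Node(2), Node(3), n4])


contours = defaultdict(list)
root_level = hybrid_tree_traversal(example_tree(), 3, contours)
```

In this example element 2 ends up in the contours of levels 1, 2, and 3, which agrees with the statement that a thermal element may reside at multiple resolution levels. Every node is visited once, so the traversal cost grows linearly with the number of nodes plus the number of contour entries.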

3.3.2. Multigrid Method. Since directly solving the system of linear

equations resulting from a large problem instance is intractable, more

efficient numerical methods are used to solve the heat diffusion prob-

lem. The multigrid method is an iterative method of solving (typically

sparse) systems of linear equations. It solves this problem by constructing a multi-level scheme, which greatly improves the efficiency

of removing low frequency solution errors common for conventional

iterative methods [12]. A description of this technique is shown in

Algorithm 2.
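To make the multigrid cycle concrete, here is a minimal recursive V-cycle for a one-dimensional model problem, -u'' = f with zero boundary values, which stands in for a steady-state heat conduction problem with constant conductivity. It is a sketch under stated assumptions, not ISAC's solver: the Gauss-Seidel smoother, full-weighting restriction, linear interpolation, and all function names are choices made for this example, and the grid must have 2^m - 1 interior points.

```python
def relax(u, f, h, sweeps):
    """Gauss-Seidel smoothing sweeps; damps high-frequency error."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = 0.5 * (left + right + h * h * f[i])


def residual(u, f, h):
    """r = f - A u for the standard three-point discretization."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r


def restrict(r):
    """Full-weighting restriction to the grid with (n - 1) / 2 points."""
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]


def prolong(e, n):
    """Linear interpolation of a coarse-grid correction to n fine points."""
    fine = [0.0] * n
    for j, v in enumerate(e):
        fine[2 * j + 1] += v
        fine[2 * j] += 0.5 * v
        fine[2 * j + 2] += 0.5 * v
    return fine


def v_cycle(u, f, h):
    """One multigrid V-cycle; u is updated in place and returned."""
    n = len(u)
    if n == 1:
        u[0] = 0.5 * h * h * f[0]  # coarsest level: solve directly
        return u
    relax(u, f, h, 2)                          # pre-smoothing
    coarse_r = restrict(residual(u, f, h))     # residue on coarser grid
    e = v_cycle([0.0] * len(coarse_r), coarse_r, 2.0 * h)
    correction = prolong(e, n)                 # map correction back
    for i in range(n):
        u[i] += correction[i]
    relax(u, f, h, 2)                          # post-smoothing
    return u
```

For example, with 31 interior points, f = 1, and a zero initial guess, repeated calls to `v_cycle` approach the exact solution x(1 - x)/2. The recursion handles the smooth error components that plain relaxation removes only slowly.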

3.3.3. Incremental Analysis. Upon initialization, the steady-state ther-

mal analysis tool generates a coarse homogeneous oct-tree based on

the chip size. Iterative temperature approximation is repeated until

convergence to a stable profile. Elements across which temperature

varies by more than a user-specified threshold are further partitioned

into sub-elements. For each ordered element pair, $(i, j)$, given that $T_i$ is the temperature of element i and that S is the temperature threshold, the new number of elements, Q, along some partition g follows.

$$Q = \left\lceil \log_2 \frac{T_i - T_j}{S} \right\rceil \qquad (3)$$

For each element, i, partitions along three dimensions are gathered into a three-tuple $(x_i, y_i, z_i)$ that governs partitioning element i into a hybrid sub oct-tree. The number of sub-elements depends on the ratio of the temperature difference to the threshold. Therefore, some elements may be further partitioned and local thermal simulation repeated. Simulation terminates when all element-to-element temperature differences are smaller than the predefined threshold, S. This method focuses computation on the most critical regions, increasing analysis speed while preserving accuracy.

3.4. Temporal Adaptation in Thermal Analysis

ISAC uses an adaptive time marching technique for dynamic thermal analysis. This technique is loosely related to the adaptive Runge-Kutta method [13] described in Section 2. The computational cost of a finite

difference time marching technique is $\sum_{e \in E} u_e c_e$, where E is the set of all elements, $u_e$ is the number of time steps for a given element, and $c_e$ is the time cost per evaluation for that element. For Runge-Kutta methods, assuming a constant evaluation time and noting that all elements experience the same number of evaluations, run time can be expressed as $uc \sum_{e \in E} n_e$, where $n_e$ is the number of a block’s transitive neighbors. For these methods, element time synchronization permits evaluation amortization, eliminating the need to repeatedly evaluate transitive neighbors, yielding a time cost of $|E| u c$.

Analysis time is classically reduced by attacking u, either by using higher-order methods that allow larger steps under bounded error or by adapting global step size during analysis, e.g., the adaptive Runge-Kutta method. However, much greater gains are possible. As noted in Section 2, the requirement that all thermal elements be synchronized in time implies that, at each time step, all elements must have their local times advanced by the smallest step required by any element in the model. As indicated by Figure 3(b), this implies that most elements are forced to take unnecessarily small steps.

Although many time marching numerical methods for solving ordinary differential equations are based on methods that do not require explicit differentiation, these methods are conceptually based on repeated Taylor series expansions around increasing time instants. Revisiting these roots and basing time marching on Taylor series expansion allows element-by-element time step adaptation by supporting the extrapolation of temperatures at arbitrary times.

For many problems, the differentiation required for calculating Taylor series expansions is extremely complicated. Fortunately, for the dynamic IC thermal analysis problem, little more than the Laplace transform and linearity theorem are needed. Noting the definitions in Equation 2, and given that $T_n(t)$ is the temperature of element n at time t, $G_{in}$ is the thermal conductivity between elements i and n, $N_i$ are element i’s neighbors, M is the neighbor depth, $\alpha_i = \sum_{n \in N_i} G_{in}$, and

$$\beta_i(t, M) = \begin{cases} V p_i & \text{if } M = 0 \\ \sum_{n \in N_i} T_n(t, M) \cdot G_{in} + V p_i & \text{otherwise} \end{cases} \qquad (4)$$

the nearest-neighbor approximation of temperature of element i at time $t + h$ follows.

$$T_i(t + h, M) = \frac{\beta_i(t + h, M - 1)}{\alpha_i} + \left( T_i(t) - \frac{\beta_i(t + h, M - 1)}{\alpha_i} \right) e^{-h \alpha_i / (\rho c_{p_i} V)} \qquad (5)$$

under boundary conditions determined by the chip, package, and cooling solution.
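The exponential form of Equation 5 is the closed-form solution of a single element's heat balance, ρ c_p V dT/dt = β - α T, when the neighbor term β is held constant over the step. The sketch below evaluates that closed form; the function name `extrapolate` and the numeric values in the usage note are assumptions chosen for illustration, not values from the experiments.

```python
import math


def extrapolate(T_i, beta_i, alpha_i, rho, cp, V, h):
    """Temperature of one element after a step h, following Equation 5.

    beta_i (neighbor heat inflow plus V * p_i) and alpha_i (sum of
    conductances to the neighbors) are held constant over the step,
    which is the simplifying assumption behind the closed form.
    """
    T_inf = beta_i / alpha_i  # temperature the element relaxes toward
    decay = math.exp(-h * alpha_i / (rho * cp * V))
    return T_inf + (T_i - T_inf) * decay
```

For instance, `extrapolate(310.0, 27.001, 0.09, 2330.0, 700.0, 1e-12, 1e-5)` moves an element at 310 K part of the way toward β/α ≈ 300.01 K. Because the expression is exact for constant β, it can be evaluated at any step size h, which is what allows each element to keep its own local time.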

Algorithm 2 Multigrid cycle
 1: Pre-smoothing step: Iteratively relax initial random solution.
 2: subtask Coarse grid correction
 3:   Compute residue from finer grid.
 4:   Approximate residue in coarser grid.
 5:   Solve coarser grid problem using relaxation.
 6:   if Coarsest level has been reached then
 7:     Directly solve problem at this level.
 8:   else
 9:     Recursively apply the multigrid method.
10:   end if
11:   Map the correction back from the coarser to finer grid.
12: end subtask
13: Post smoothing step: Add correction to solution at finest grid level.
14: Iteratively relax to obtain the final solution.
Comment: HF error eliminated.

Note that the potentially differing values of step size, h, and local

time, t, for all thermal elements implies that the number of transitive

temperature extrapolations necessary for an element to advance by one

time step may not be amortized over multiple uses, as in the case in

the lock-step Runge-Kutta methods. As a result, for three-dimensional

thermal analysis, the number of evaluations, e, is related to the transi-

tive neighbor count, d, as follows:

$$e = |E| \left( \tfrac{4}{3} d^3 + 2 d^2 + \tfrac{8}{3} d \right) \qquad (6)$$

i.e., the discretized volume of the implied octahedron.
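The per-element factor in Equation 6 can be checked by counting grid offsets directly. The sketch below assumes that "transitive neighbors within depth d" means all elements reachable in at most d axis-aligned hops on a uniform grid, excluding the element itself; both function names are hypothetical.

```python
def transitive_neighbors(d):
    """Count offsets (dx, dy, dz) != 0 with |dx| + |dy| + |dz| <= d."""
    count = 0
    for dx in range(-d, d + 1):
        for dy in range(-d, d + 1):
            for dz in range(-d, d + 1):
                if 0 < abs(dx) + abs(dy) + abs(dz) <= d:
                    count += 1
    return count


def closed_form(d):
    """(4/3) d^3 + 2 d^2 + (8/3) d, evaluated in exact integer arithmetic."""
    return (4 * d ** 3 + 6 * d ** 2 + 8 * d) // 3
```

For d = 1 both functions give the six face neighbors of a cubic element, and for d = 2 both give 24, so the cost per element grows cubically with the neighbor depth.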

In summary, although it is common to improve the performance of

time marching techniques by increasing their orders, thereby increas-

ing their step sizes, for the IC thermal analysis problem greater gains

are possible by decoupling element local times, allowing most ele-

ments to take larger than minimum-sized steps. However, this requires

explicit differentiation and prevents the amortization of neighbor tem-

perature extrapolation, increasing the cost of using higher-order meth-

ods relative to that of using fully synchronized element time march-

ing techniques. As demonstrated in Section 4, this trade-off is an

excellent one: the third-order element-by-element adaptation method

yields speed-ups ranging from 122.81–337.23× when compared to the

fourth-order adaptive Runge-Kutta method.

We now describe the element-by-element step size adaptation

methods used by ISAC to improve performance while preserving ac-

curacy. As illustrated in the right portion of Figure 4, dynamic analysis

starts with an initial three-dimensional temperature profile and hybrid

oct-tree that may have been provided by the synthesis tool or gener-

ated by ISAC using steady-state analysis; a chip/package/ambient heat

capacity and thermal conductivity profile; and a power profile. After

determining the initial maximum safe step sizes of all elements, ISAC

initializes an event queue of elements sorted by their target times, i.e.,

the element’s current time plus its step size. The element with the earliest target time is selected, its temperature is updated, a new maximum

safe step size is calculated for the element, and it is reinserted in the

event queue. The event queue serves to minimize the deviation be-

tween decoupled element current times, thereby avoiding temperature

extrapolation beyond the limits of the local time bounded-order expan-

sions. The new step size must take into account the truncation error of

the numerical method in use as well as the step sizes of the neighbors.

Given that $h_i$ is element i’s current step size, v is the order of the time-marching numerical method, u is a constant slightly less than one, y is the error threshold, $\frac{dT_i}{dt}(t)$ is the derivative of i’s temperature as a function of time at time t, and $t_i$ is i’s current time, the safe next step size for a block, regardless of its neighbors, follows.

$$s_i(t_i) = u \cdot \sqrt[v]{\frac{y}{\left| \frac{dT_i}{dt}(t_i) \cdot \frac{3}{2} h_i - \frac{3}{4} h_i \left( \frac{dT_i}{dt}(t_i) + \frac{dT_i}{dt}\left(t_i + \frac{3}{4} h_i\right) \right) \right|}} \qquad (7)$$

This method of computing a new step size is based on the litera-

ture [14]. However, it uses non-integer test step sizes to bracket the

most probable new step size.

It is necessary to further bound the step size to ensure that the

local times of neighbors are sufficiently close for accurate temperature

extrapolation. Given that Niis the set of i’s neighbors and w is a small

constant, e.g., 3, the new step size follows.

$$h'_i = \min\left( s_i(t_i), \; \min_{n \in N_i} \left( w \cdot (t_n + h_n - t_i) \right) \right) \qquad (8)$$

For efficiency, the $h_n$ of a neighbor at its own local time is used.
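The event-queue procedure described above can be sketched on a one-dimensional chain of elements. This is a simplified illustration rather than ISAC's algorithm: each element uses its neighbors' most recently stored temperatures instead of extrapolating them, the safe step size is a first-order stand-in for Equation 7 (limit the estimated temperature change per step to u·y), and the floor `h_min`, the cap `h_max`, the ambient conductance `G_amb`, and all names are assumptions. The neighbor bound follows Equation 8.

```python
import heapq
import math


def async_time_march(T, power, G, G_amb, C, T_amb, t_end,
                     y=0.05, u=0.9, w=3.0, h_min=1e-7, h_max=1e-4):
    """March a chain of at least two elements to t_end with local steps.

    T: initial temperatures; power: heat input per element (W);
    G: conductance between adjacent elements; G_amb: conductance from
    each element to the ambient at T_amb; C: heat capacity per element.
    Returns final temperatures, local times, and update counts.
    """
    n = len(T)
    T = list(T)
    t = [0.0] * n          # local time of each element
    h = [h_min] * n        # current step size of each element
    updates = [0] * n
    queue = [(h[i], i) for i in range(n)]  # (target time, element)
    heapq.heapify(queue)
    while queue:
        target, i = heapq.heappop(queue)   # earliest target time first
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        alpha = G * len(nbrs) + G_amb
        beta = G * sum(T[j] for j in nbrs) + G_amb * T_amb + power[i]
        T_inf = beta / alpha
        # Exponential update in the style of Equation 5.
        T[i] = T_inf + (T[i] - T_inf) * math.exp(-(target - t[i]) * alpha / C)
        t[i] = target
        updates[i] += 1
        if t[i] >= t_end:
            continue
        # Simplified safe step: bound the estimated change by u * y.
        rate = abs(beta - alpha * T[i]) / C
        s = h_max if rate == 0.0 else min(h_max, u * y / rate)
        # Equation 8: stay close to the neighbors' target times.
        bound = min(w * (t[j] + h[j] - t[i]) for j in nbrs)
        h[i] = max(h_min, min(s, bound))
        target = t[i] + h[i]
        if target >= t_end:
            target = t_end
            h[i] = t_end - t[i]
        heapq.heappush(queue, (target, i))
    return T, t, updates
```

As a usage example, `async_time_march([300.0] * 5, [0, 0, 0.05, 0, 0], 0.015, 0.005, 1.631e-6, 300.0, 1e-2)` heats only the middle element of five. The returned `updates` list records how many steps each element took, so the effect of decoupled local times can be inspected directly.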

This temporal adaptation technique based upon Equations 4, 5

and 8 is general, and has been tested in first-order, second-order, and

third-order numerical methods. As indicated in Section 4.2, the result

is a 122.81–337.23× speedup without loss of accuracy when compared

to the fourth-order adaptive Runge-Kutta method.
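The event-queue scheme described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not ISAC's implementation: it uses forward Euler, reads neighbors' most recent temperatures instead of extrapolating them, and introduces a hypothetical floor `h_min` so that a neighbor with an identical target time cannot force a zero step. The names `time_march`, `neighbor_bounded_step`, `rate`, and `safe_step` are all assumptions.

```python
import heapq

def neighbor_bounded_step(i, s_i, t, target, neighbors, w=3.0, h_min=1e-6):
    """Sketch of Equation 8: cap s_i by w * (t_n + h_n - t_i) over neighbors n.

    target[n] holds t_n + h_n, the pending target time of neighbor n.
    h_min is a hypothetical floor (not in the source) that prevents a zero step.
    """
    bound = min((w * (target[n] - t[i]) for n in neighbors[i]),
                default=float("inf"))
    return max(h_min, min(s_i, bound))

def time_march(T, rate, neighbors, safe_step, h0, t_end):
    """Advance temporally decoupled elements to t_end with an event queue.

    T         : list of element temperatures, updated in place
    rate      : rate(i, T) -> dT_i/dt from the latest known temperatures
    neighbors : mapping from element index to a list of neighbor indices
    safe_step : safe_step(i, T, t_i, h_i) -> neighbor-independent step size
    """
    n = len(T)
    t = [0.0] * n                       # local time of each element
    target = [min(h0, t_end)] * n       # target time = local time + step size
    queue = [(target[i], i) for i in range(n)]
    heapq.heapify(queue)
    while queue:
        # Select the element with the earliest target time.
        _, i = heapq.heappop(queue)
        h_i = target[i] - t[i]
        T[i] += h_i * rate(i, T)        # forward Euler update of element i
        t[i] = target[i]
        if t[i] >= t_end:
            continue                    # this element has finished
        # Compute a new step size and reinsert the element in the queue.
        s_i = safe_step(i, T, t[i], h_i)
        h_new = neighbor_bounded_step(i, s_i, t, target, neighbors)
        target[i] = min(t[i] + h_new, t_end)
        heapq.heappush(queue, (target[i], i))
    return T, t
```

Because the element with the earliest target time is always advanced first, no element's local time runs far ahead of its neighbors, which is the property the event queue is meant to provide.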

3.5. Impact of Variable Thermal Conductivity

Thermal conductivity for a material is its ratio of heat flux density to temperature gradient. The thermal conductivity of a material, e.g., silicon, is a function of temperature, $T$. An IC’s thermal conductivity, $k(\vec{r}, T)$, is also a function of position, $\vec{r}$. Most previous fast IC thermal analysis work ignores the dependence of thermal conductivity on temperature, approximating it as a constant. This introduces inaccuracy in analysis results. In contrast, ISAC models thermal conductivity as a function of temperature.

Position and temperature dependent thermal conductivity follows: $k(\vec{r}, T) = k_0 \left( \frac{T}{300} \right)^{-\alpha}$, where $k_0$ is the material’s conductivity value at temperature 300 K, and $\alpha$ is a constant for the specific material. Recalculating the thermal conductivity value after each iteration for all the elements would be computationally expensive. In order to maintain both accuracy and performance, ISAC uses a post-processing feedback loop to determine the impact of variations in thermal conductivity upon temperature profile. As described in Section 4.1, the consequences were over 5 K improvements in peak temperature accuracy when compared with a model assuming constant thermal conductivity.
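As a rough illustration of how strongly conductivity can vary over the operating range, the sketch below evaluates a power-law model of the form k0 (T/300)^(-alpha) with T in kelvin. The negative sign of the exponent is an assumption, as are the function name `conductivity` and the silicon-like values in the example (k0 = 150 W/(m·K), alpha = 4/3); none of these are taken from this paper.

```python
def conductivity(k0, alpha, T):
    """Temperature-dependent conductivity, assuming k(T) = k0 * (T / 300) ** (-alpha).

    k0    : conductivity at 300 K
    alpha : material-specific constant (assumed positive here)
    T     : absolute temperature in kelvin
    """
    return k0 * (T / 300.0) ** (-alpha)

# Illustrative, assumed values for a silicon-like material.
k_300 = conductivity(150.0, 4.0 / 3.0, 300.0)   # equals k0 by construction
k_360 = conductivity(150.0, 4.0 / 3.0, 360.0)   # noticeably lower at 360 K
```

Under these assumed values the conductivity falls by roughly a fifth between 300 K and 360 K, which is why a constant-conductivity model tends to underestimate temperatures in hot regions.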

3.6. The use of ISAC in IC Synthesis

As explained in Section 2, ISAC was developed primarily for use

within IC synthesis, although it may also be used to provide guidance

during manual architectural decisions. ISAC may be used to solve

both the steady-state and dynamic thermal analysis problems described

in Section 3.1. For use in steady-state analysis, ISAC requires three-

dimensional chip-package profiles of thermal conductivity and power

density. The required IC power profiles are typically produced by a

floorplanner used within the synthesis process [11], [15], [16]. ISAC

produces a three-dimensional steady-state temperature profile. When used

for dynamic thermal analysis, ISAC requires three-dimensional chip-

package profiles of temperature, power density, heat capacity, (option-

ally) initial temperature, and an elapsed IC duration after which to re-

port results. It produces a three-dimensional temperature profile at any

requested time.

Both steady-state and dynamic thermal analysis solvers within

ISAC have been accelerated, using the techniques described in Sec-

tions 3.3 and 3.4, in order to permit efficient use after each tentative

change to an IC power profile during synthesis or design. Use within

synthesis has been validated (see Section 4) by integrating ISAC within

a behavioral synthesis algorithm [11].

4. Experimental Results

In this section, we validate and evaluate the performance of ISAC.

Experiments were conducted on Linux workstations of similar perfor-

mance. Evaluation focuses on accuracy and efficiency. ISAC sup-

ports both steady-state and dynamic thermal analysis. Steady-state

thermal analysis is validated against FEMLAB, a widely-used com-

mercial physics modeling package, using two actual chip designs from

IBM and the MIT Raw group. Dynamic thermal simulation is vali-

dated against a fourth-order adaptive Runge-Kutta method using a set

of synthesis benchmarks. Efficiency determines the feasibility of using

thermal analysis during synthesis and design. To characterize the effi-

ciency of ISAC, we compare it with other popular numerical analysis

methods by conducting steady-state and dynamic thermal analysis on

the power profiles produced during IC synthesis.

4.1. Steady-State Thermal Analysis Results

This section reports the accuracy and efficiency of the steady-state thermal

simulation techniques used in ISAC. We first conduct the following

experiments using two actual chip designs. The first IC is designed

by IBM. The silicon die is 13 mm × 13 mm × 0.625 mm, which is soldered

to a ceramic carrier using flip-chip packaging, and attached to

a heat sink. A detailed 11 × 11 block static power profile was produced

using a power simulator. The second IC is a chip-level multiprocessor

designed by the MIT Raw group. This IC contains 16 on-chip

MIPS processor cores organized in a 4 × 4 array. The die area is

18.2 mm × 18.2 mm. It uses an IBM ceramic column grid array package

with direct lid attach thermal enhancement. The static power profile is

based on data provided in the literature [17]. We validate ISAC by

comparing its results with those produced by FEMLAB, a widely-used

commercial three-dimensional finite element based physics modeling

package. Table 1 provides thermal analysis results produced by ISAC

and FEMLAB for these ICs.

Average error, $e_{avg}$, will be used as a measure of difference between

thermal profiles:

$$e_{avg} = \frac{1}{N} \sum_{i=0}^{N-1} \frac{\left| T_i - T'_i \right|}{T_i} \qquad (9)$$

where $N$ is the total number of elements on the surface of the active

layer of the silicon die modeled by ISAC. $T_i$ and $T'_i$ are the temperatures

of element $i$ reported by FEMLAB and ISAC, respectively. This is

conservative. If comparisons were made in degrees Kelvin instead of

degrees Celsius, the reported percentage error would be even lower.
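The average error metric, and the reason that Celsius comparisons are conservative, can be illustrated with a short sketch. It assumes that Equation 9 is the mean of the per-element relative differences |T_i − T'_i| / T_i; the function name `average_error` and the sample temperatures are hypothetical.

```python
def average_error(reference, estimate):
    """Mean relative difference between two temperature profiles.

    reference : per-element temperatures from the reference tool (T_i)
    estimate  : per-element temperatures from the tool under test (T'_i)
    """
    if not reference or len(reference) != len(estimate):
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(abs(r - e) / r for r, e in zip(reference, estimate)) / len(reference)

# Hypothetical two-element profiles in degrees Celsius.
ref_c = [50.0, 100.0]
est_c = [49.0, 101.0]
err_celsius = average_error(ref_c, est_c)

# The same profiles in kelvin: identical absolute differences, larger denominators.
err_kelvin = average_error([x + 273.15 for x in ref_c],
                           [x + 273.15 for x in est_c])
```

The absolute differences are unchanged by the unit conversion while the denominators grow, so the kelvin-based percentage is smaller than the Celsius-based one.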

In Table 1, the second and third columns show the peak and aver-

age temperatures of the surface of the active layer of the silicon dies of

these chips, as reported by ISAC. Compared to FEMLAB, the average

errors, eavg, are 1.7% and 0.7%. The next four columns show the effi-

ciency of ISAC in terms of CPU time, speedup, memory use, and num-

ber of elements. For comparison, the next three columns show the effi-

ciency using a multigrid analysis technique with homogeneous mesh-

ing. These results clearly demonstrate that element resolution adapta-

tion allows ISAC to achieve dramatic improvements in efficiency com-

pared to the conventional multigrid technique. CPU times decrease to

3.6% and 0.14% and memory usage decreases to 5.6% and 2.4% of the

times and memory required by the homogeneous technique. Note that

multigrid steady-state analysis itself is a highly efficient approach [8].

Using FEMLAB, both simulations take at least 20 minutes.

Existing IC thermal simulators neglect the dependence of thermal

conductivity on temperature, potentially resulting in substantial errors

in peak temperature. In previous work, this error was not detected dur-

ing validation because the models against which they were validated

also used constant values for thermal conductivity. Temperature varies

through the silicon die. Therefore, ignoring the dependence of thermal

conductivity on temperature may introduce significant errors.

The last two columns of Table 1 show the peak and average

temperatures, reported by FEMLAB, using thermal conductivities at

25 °C, i.e., room temperature. It shows that, for both chips, the peak

temperatures are underestimated by approximately 5 °C. This effect

will be even more serious in designs with higher peak temperatures.

Note that the source of inaccuracy is not the specific value of thermal

conductivity chosen. No constant value will allow accurate results in

general: an accurate IC thermal model must consider the dependence

of silicon thermal conductivity upon temperature.

To further evaluate its efficiency, we use ISAC to conduct thermal

analysis for the behavioral synthesis algorithm described in Section 2.

This iterative algorithm does both behavioral-level and physical-level

optimization. In this experiment, ISAC performs steady-state thermal

analysis for each intermediate solution generated during synthesis of

ten commonly-used behavioral synthesis benchmarks.

Table 2 shows the performance of ISAC when used for steady-

state thermal analysis during behavioral synthesis. The second, third,

and fourth columns show the overall CPU time, speedup, and av-

erage memory used by ISAC to conduct steady-state thermal analy-

sis for all the intermediate solutions. Column five shows the aver-

age error compared to a conventional homogeneous meshing multigrid

method, whose overall CPU time and average memory use are shown

in columns six and seven. ISAC achieves almost the same accuracy

with much lower run-time overhead. The last column shows the CPU

time used by the behavioral synthesis algorithm. Comparing column

two and the last column makes it clear that, when used for steady-state

thermal analysis, ISAC consumes only a fraction of the CPU time re-

quired for synthesis: it is feasible to use ISAC during synthesis.

4.2. Dynamic Thermal Analysis Results

In this section, we evaluate the accuracy and efficiency of the dynamic

thermal analysis techniques used in ISAC. Heterogeneous spatial res-
