
Delay Analysis in Temperature-Constrained Hard Real-Time Systems

with General Task Arrivals

Shengquan Wang

The University of Michigan - Dearborn

Dearborn, MI 48128, USA

shqwang@umd.umich.edu

Riccardo Bettati

Texas A&M University

College Station, TX 77843, USA

bettati@cs.tamu.edu

Abstract

In this paper, we study temperature-constrained hard real-time systems, where real-time guarantees must be met without exceeding safe temperature levels within the processor. Dynamic speed scaling is one of the major techniques to manage power so as to maintain safe temperature levels. As an example, we adopt a simple reactive speed control technique in our work. We design a methodology to perform delay analysis for general task arrivals under reactive speed control with First-In-First-Out (FIFO) scheduling and Static-Priority (SP) scheduling. As a special case, we obtain a closed-form delay formula for the leaky-bucket task arrival model. Our data show how simple reactive speed control can decrease the delay of tasks compared with any constant-speed scheme.

1 Introduction

With the rapidly increasing power density in processors, the problem of thermal management in systems is becoming acute. Methods to manage heat and control its dissipation have been gaining much attention from researchers and practitioners. Techniques are being investigated for thermal control both at design time, through appropriate packaging and active heat dissipation mechanisms, and at run time, through various forms of dynamic thermal management (DTM) (e.g., [1]).

Thermal management through packaging (that improves

airflow, for example) and active heat dissipation will be-

come increasingly challenging in the near future, due to the

high levels of peak power involved and the extremely high

power density in emerging systems-in-package [2]. In addition, the packaging requirements and operating environments of many high-performance embedded devices render such approaches inappropriate.

This work was funded by NSF under Grant No. CNS-0509483, while Dr. Wang was at Texas A&M University.

A number of dynamic thermal management approaches

to control the temperature at run time have been proposed,

ranging from clock throttling to dynamic voltage scaling

(DVS) to in-chip load balancing:

• The Pentium 4 Series processors use Clock Throttling [3] or Clock Gating [4] to stall the clock and so allow the processor to cool during thermal overload.

• Dynamic Voltage Scaling (DVS) [1] is used in a variety of modern processor technologies and allows switching between different frequency and voltage operating points at run time in response to the current thermal situation. In the Enhanced Intel SpeedStep mechanism in the Pentium M processor, for example, a low-power operating point is reached in response to a thermal trigger by first reducing the frequency (within a few microseconds) and then reducing the voltage (at a rate of one mV per microsecond) [3].

• A number of architecture-level mechanisms for thermal control have been proposed that turn off components inside the processor in response to thermal overload. Skadron et al. [4], for example, argue that the microarchitecture should distribute the workload in response to the thermal situation by taking advantage of instruction-level parallelism. The performance penalty caused by this “local gating” would not be excessive. On a coarser level, the Pentium Core Duo Architecture allows the OS or the BIOS to disable one of the cores by putting it into sleep mode [5].

As high-performance embedded systems become increasingly temperature-constrained, the question of how the thermal behavior of the system and the thermal control mechanisms affect real-time guarantees must be addressed. In this paper we describe delay analysis techniques in temperature-constrained hard real-time systems, where deadline constraints for tasks have to be balanced against temperature constraints of the system.


Dynamic speed scaling allows for a trade-off between these two performance metrics: to meet the deadline constraint, we run the processor at a higher speed; to maintain safe temperature levels, we run the processor at a lower speed. The work on dynamic speed scaling techniques to control temperature in real-time systems was initiated in [6] and further investigated in [7]. Both [6] and [7] focus on online algorithms in real-time systems, where the scheduler learns about a task only at its release time. In contrast, in our work we assume a deterministic task model (e.g., periodic tasks), which allows for design-time delay analysis.

We distinguish between proactive and reactive speed scaling schemes. Whenever the temperature model is known, the scheduler could in principle use a proactive speed-scaling approach, where, similarly to a non-work-conserving scheduler, resources are preserved for future use. In this paper, we limit ourselves to reactive schemes, and propose a simple reactive speed scaling technique for the processor, which will be discussed in Section 2. We focus on reactive schemes primarily because they are simple to integrate with current processor capabilities through the ACPI power control framework [8,9]. In our previous paper [10], we motivated the reactive scheme and performed delay analysis for identical-period tasks. In this paper, we extend it to general task arrivals with First-In-First-Out (FIFO) scheduling and Static-Priority (SP) scheduling.

The rest of the paper is organized as follows. In Sec-

tion 2, we introduce the thermal model, speed scaling sche-

mes, and task model and scheduling algorithms. After in-

troducing two important lemmas in Section 3, we design

the methodology to perform delay analysis for FIFO and

SP scheduling algorithms in Sections 4 and 5 respectively.

We measure the performance in Section 6. Finally, we con-

clude our work with final remarks and give an outlook on

future work in Section 7.

2 Models

2.1 Thermal Model

A wide range of increasingly sophisticated thermal models for integrated circuits have been proposed in the last few years. Some are comparatively simple, chip-wide models, such as the one developed by Dhodapkar et al. [11] in TEMPEST. Other models, such as the one used in HotSpot [4], describe the thermal behavior at the granularity of architecture-level blocks or below, and so more accurately capture the effects of hotspots.

In this paper we will be using a very simple chip-wide thermal model previously used in [6,7,11,12]. While this model does not capture fine-granularity thermal effects, the authors in [4], for example, agree that it is somewhat appropriate for the investigation of chip-level techniques, such as speed scaling. In addition, existing processors typically have well-defined hotspots, and accurate placement of sensors alleviates the need for fine-granularity temperature modeling. The Intel Core Duo processor, for example, has a highly accurate digital thermometer placed at the single hotspot of each die, in addition to a single legacy thermal diode for both cores [5]. More accurate thermal models can be derived from this simple one by more closely modeling the power dissipation (such as the use of active dissipation devices) or by augmenting the input power by a stochastic component, etc.

We define $s(t)$ as the processor speed (frequency) at time $t$. Then the input power $P(t)$ at time $t$ is usually represented as

$$P(t) = \kappa s^{\alpha}(t), \tag{1}$$

for some constant $\kappa$ and $\alpha > 1$. Usually, it is assumed that $\alpha = 3$ [6,7].

We assume that the ambient has a fixed temperature, and that temperature is scaled so that the ambient temperature is zero. We define $T(t)$ as the temperature at time $t$. We adopt Fourier's Law as shown in the following formula [6,7,12]:

$$T'(t) = \frac{P(t)}{C_{th}} - \frac{T(t)}{R_{th} C_{th}}, \tag{2}$$

where $R_{th}$ is the thermal resistance and $C_{th}$ is the thermal capacitance of the chip. Applying (1) into (2), we have

$$T'(t) = a s^{\alpha}(t) - b T(t), \tag{3}$$

where $a$ and $b$ are positive constants defined as follows:

$$a = \frac{\kappa}{C_{th}}, \qquad b = \frac{1}{R_{th} C_{th}}. \tag{4}$$

Equation (3) is a classic linear differential equation. If we assume that the temperature at time $t_0$ is $T_0$, i.e., $T(t_0) = T_0$, (3) can be solved as

$$T(t) = \int_{t_0}^{t} a s^{\alpha}(\tau) e^{-b(t-\tau)} \, d\tau + T_0 e^{-b(t-t_0)}. \tag{5}$$

We observe that we can always appropriately scale the speed to control the temperature:

• If we want to keep the temperature constant at a value $T_C$ during a time interval $[t_0, t_1]$, then for any $t \in [t_0, t_1]$, we can set

$$s(t) = \left(\frac{b T_C}{a}\right)^{\frac{1}{\alpha}}. \tag{6}$$

• If, on the other hand, we keep the speed constant at $s(t) = s_C$ during the same interval, then the temperature develops as follows:

$$T(t) = \frac{a s_C^{\alpha}}{b} + \left(T(t_0) - \frac{a s_C^{\alpha}}{b}\right) e^{-b(t-t_0)}. \tag{7}$$

This relation between processor speed and temperature is the basis for any speed scaling scheme.
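As a quick sanity check of the constant-speed solution, the following minimal sketch compares the closed form (7) with a forward-Euler integration of (3). The function names, the step count, and the value of a are illustrative assumptions and are not taken from the paper; b = 228.6 is the value used in the paper's evaluation section.

```python
import math

def temp_closed_form(t, t0, T0, s_c, a, b, alpha=3.0):
    """Temperature at time t under constant speed s_c, following Eq. (7)."""
    steady = a * s_c**alpha / b          # steady-state temperature for speed s_c
    return steady + (T0 - steady) * math.exp(-b * (t - t0))

def temp_euler(t, t0, T0, s_c, a, b, alpha=3.0, steps=100_000):
    """Forward-Euler integration of Eq. (3): T'(t) = a*s^alpha - b*T."""
    dt = (t - t0) / steps
    T = T0
    for _ in range(steps):
        T += dt * (a * s_c**alpha - b * T)
    return T

if __name__ == "__main__":
    a, b = 1.0, 228.6                    # a is a placeholder; b as in the evaluation
    print(temp_closed_form(0.01, 0.0, 0.0, 10.0, a, b))
    print(temp_euler(0.01, 0.0, 0.0, 10.0, a, b))
```

Both values should agree closely, and both stay below the steady-state temperature a*s_c^alpha/b when starting from the (scaled) ambient temperature of zero.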


2.2 Speed Scaling

The effect of many dynamic thermal management schemes (most prominently DVS and clock throttling) can be described by the speed/temperature relation depicted in (6) and (7). The goal of dynamic thermal management is to maintain the processor temperature within a safe operating range, and not exceed what we call the highest-temperature threshold $T_H$, which in turn should be at a safe margin from the maximum junction temperature of the chip. Temperature control must ensure that

$$T(t) \le T_H. \tag{8}$$

On the other hand, we can freely set the processor speed, up to some maximum speed $s_H$, i.e.,

$$0 \le s(t) \le s_H. \tag{9}$$

In the absence of dynamic speed scaling we have to set a constant value of the processing speed so that the temperature will never exceed $T_H$. Assuming that the initial temperature is less than $T_H$, we can define the equilibrium speed $s_E$ as

$$s_E = \left(\frac{b}{a} T_H\right)^{\frac{1}{\alpha}}. \tag{10}$$

For any constant processor speed not exceeding $s_E$, the processor does not exceed temperature $T_H$. Note that the equilibrium speed $s_E$ is the maximum constant speed that we can set to maintain the safe temperature level.
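The equilibrium speed in (10) can be checked numerically: plugging it into the steady-state term of (7) should return exactly the threshold. The sketch below assumes illustrative parameter values (a = 1 is a placeholder; the names are hypothetical and not from the paper).

```python
def equilibrium_speed(a, b, T_H, alpha=3.0):
    """Equilibrium speed s_E from Eq. (10)."""
    return (b * T_H / a) ** (1.0 / alpha)

def steady_state_temp(s, a, b, alpha=3.0):
    """Limit of Eq. (7) as t grows, for a constant speed s."""
    return a * s**alpha / b

if __name__ == "__main__":
    a, b, T_H = 1.0, 228.6, 40.0         # a is a placeholder value
    s_E = equilibrium_speed(a, b, T_H)
    print(s_E, steady_state_temp(s_E, a, b))
```

Any constant speed below s_E settles at a temperature below the threshold, which is why s_E is the largest safe constant speed.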

A dynamic speed scaling scheme would take advantage of the power dissipation during idle times. It would make use of periods where the processor is “cool”, typically after idle periods, to dynamically scale the speed and temporarily execute tasks at speeds higher than $s_E$. As a result, dynamic speed scaling would be used to improve the overall processor utilization.

In defining the dynamic speed scaling algorithm we must keep in mind that (a) it must be supported by existing power control frameworks such as ACPI [8,9], and (b) it must lead to tractable design-time delay analysis. We therefore use the following very simple reactive speed scaling algorithm:

The processor will run at maximum speed $s_H$ when there is backlogged workload and the temperature is below the threshold $T_H$. Whenever the temperature hits $T_H$, the processor will run at the equilibrium speed $s_E$, which is defined in (10). Whenever the backlogged workload is empty, the processor idles (runs at the zero speed).

If we define $W(t)$ as the backlogged workload at time $t$, the speed scaling scheme described before can be expressed using the following formula:

$$s(t) = \begin{cases} s_H, & (W(t) > 0) \wedge (T(t) < T_H) \\ s_E, & (W(t) > 0) \wedge (T(t) = T_H) \\ 0, & W(t) = 0 \end{cases} \tag{11}$$

Figure 1 shows an example of how temperature changes under reactive speed scaling.

Figure 1. Illustration of reactive speed scaling. (Two plots over time $t$: the speed $s(t)$ with levels $s_H$ and $s_E$, and the temperature $T(t)$ with threshold $T_H$.)

It is easy to show that in any case the temperature never exceeds the threshold $T_H$. By running at full speed part of the time, we aim to improve the processor utilization compared with constant-speed operation. The reactive speed scaling is very simple: whenever the temperature reaches the threshold, an event is triggered by the thermal monitor, and the system throttles the processor speed.
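To make the policy concrete, here is a minimal discrete-time sketch of the reactive rule in (11) combined with the thermal model (3). It is an illustration under stated assumptions, not the authors' implementation: the function name, the forward-Euler step, the small tolerance used to detect "temperature equals the threshold", and the clamping of discretization overshoot are all choices made for this example.

```python
def simulate_reactive(arrivals, s_H, s_E, a, b, T_H,
                      alpha=3.0, dt=1e-5, horizon=0.05):
    """Simulate Eq. (11) with Euler steps of Eq. (3).

    arrivals: list of (release_time, cycles).
    Returns a list of (time, speed, temperature, backlog) samples.
    """
    pending = sorted(arrivals)
    idx, W, T = 0, 0.0, 0.0              # T = 0 is the scaled ambient temperature
    trace = []
    for k in range(int(round(horizon / dt))):
        t = k * dt
        while idx < len(pending) and pending[idx][0] <= t + 1e-12:
            W += pending[idx][1]         # newly released work joins the backlog
            idx += 1
        if W <= 0.0:
            s = 0.0                      # idle when nothing is backlogged
        elif T < T_H * (1.0 - 1e-9):
            s = s_H                      # below threshold: full speed
        else:
            s = s_E                      # at threshold: equilibrium speed
        trace.append((t, s, T, W))
        W = max(0.0, W - s * dt)
        # Clamp to T_H: a finite step at s_H may overshoot the threshold slightly.
        T = min(T_H, T + dt * (a * s**alpha - b * T))
    return trace
```

For example, with a = 1, b = 228.6, T_H = 40, s_E from (10) and s_H = (10/7) s_E, a single job of 0.2 cycles released at time 0 first runs at s_H, then at s_E once the threshold is reached, and the processor idles after completion.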

2.3 Task Model and Scheduling Algorithms

The workload consists of a set of tasks $\{\Gamma_i : i = 1, 2, \ldots, n\}$. Each task $\Gamma_i$ is composed of a sequence of jobs. For a job, the time elapsed from the release time $t_r$ to the completion time $t_f$ is called the delay of the job, and the worst-case delay of all jobs in Task $\Gamma_i$ is denoted by $d_i$. Jobs within a task are executed in a first-in first-out order.

We characterize the workload of Task $\Gamma_i$ by the workload function $f_i(t)$, the accumulated requested processor cycles of all the jobs from $\Gamma_i$ released during $[0, t]$. Similarly, to characterize the actual executed processor cycles received by $\Gamma_i$, we define $g_i(t)$, the service function for $\Gamma_i$, as the total executed processor cycles rendered to jobs of $\Gamma_i$ during $[0, t]$.

A time-independent representation of $f_i(t)$ is the workload constraint function $F_i(I)$, which is defined as follows.

Definition 1 (Workload Constraint Function). $F_i(I)$ is a workload constraint function for the workload function $f_i(t)$, if for any $0 \le I \le t$,

$$f_i(t) - f_i(t - I) \le F_i(I). \tag{12}$$


For example, if a task $\Gamma_i$ is constrained by a leaky bucket with a bucket size $\sigma_i$ and an average rate $\rho_i$, then

$$F_i(I) = \sigma_i + \rho_i I. \tag{13}$$
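Definition 1 can be illustrated with a small check that a finite list of job releases respects a given constraint function. This is a sketch with hypothetical helper names; it relies on the assumption that F is continuous and non-decreasing, so that it suffices to test windows that start and end at release instants.

```python
def leaky_bucket(sigma, rho):
    """Workload constraint function of Eq. (13): F(I) = sigma + rho * I."""
    return lambda I: sigma + rho * I

def satisfies_constraint(releases, F, eps=1e-9):
    """Check Eq. (12) for a finite trace.

    releases: list of (release_time, cycles).
    For every pair of releases i <= j, the cycles released from job i to
    job j must not exceed F(t_j - t_i).
    """
    releases = sorted(releases)
    for i in range(len(releases)):
        total = 0.0
        for j in range(i, len(releases)):
            total += releases[j][1]
            if total > F(releases[j][0] - releases[i][0]) + eps:
                return False
    return True

if __name__ == "__main__":
    F = leaky_bucket(sigma=2.0, rho=1.0)
    periodic = [(float(k), 1.0) for k in range(10)]   # one cycle per time unit
    burst = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]      # three cycles at once
    print(satisfies_constraint(periodic, F), satisfies_constraint(burst, F))
```

The periodic trace conforms to the bucket, while the burst of three cycles at the same instant exceeds the bucket size of two.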

Once tasks arrive in our system, a scheduling algorithm will be used to schedule the service order of jobs from different tasks. Both the workload and the scheduling algorithm will determine the delay experienced by jobs. In this paper, we consider two scheduling algorithms: First-In-First-Out (FIFO) scheduling and Static-Priority (SP) scheduling.

3 Important Lemmas

The difficulty for delay analysis in a system with reactive speed scaling lies in the speed of the processor not being constant. Moreover, the changes in processing speed are triggered by the thermal behavior, which follows (11). As a result, as we will show, simple busy-period analysis does not work.

The following two lemmas show how changes in temperature, job arrival, and job execution affect the temperature at a later time or the delay of a later job.

Lemma 1. In a system under our reactive speed scaling, given a time instance $t$, we consider a job with a release time $t_r$ and a completion time $t_f$ such that $t_r < t$ and $t_f < t$. We assume that the processor is idle during $[t_f, t]$. If we take any one of the following actions as shown in Figure 2:

• Action A: Increasing the temperature at time $t_0$ ($t_0 \le t_r$) such that the job has the same release time $t_r$ but a new completion time $t_f^*$ satisfying $t_f^* < t$;

• Action B: Increasing the processor cycles for this job such that the job has the same release time $t_r$ but a new completion time $t_f^*$ satisfying $t_f^* < t$;

• Action C: Shifting the job such that the job has a new release time $t_r^*$ and a new completion time $t_f^*$ satisfying $t_r < t_r^* < t$ and $t_f < t_f^* < t$,

then we have $T_t \le T_t^*$, where $T_t$ and $T_t^*$ are the temperatures at time $t$ in the original and the modified scenarios respectively.

Figure 2. Temperature effect. (Panels (A), (B), and (C) show the original job between $t_r$ and $t_f$ and the modified job for each action.)
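Action C of Lemma 1 can be illustrated numerically in a simplified setting: a single job that runs at s_H without ever reaching the threshold, with the processor idle otherwise. Under that assumption the temperature follows (7) piecewise, and shifting the job later leaves less time to cool before t. The helper names and parameter values below are illustrative only.

```python
import math

def temp_after(T0, s, duration, a, b, alpha=3.0):
    """Eq. (7): temperature after running at constant speed s for `duration`."""
    steady = a * s**alpha / b
    return steady + (T0 - steady) * math.exp(-b * duration)

def temp_at(t, release, cycles, s_H, a, b, T_start=0.0, alpha=3.0):
    """Temperature at time t for one job executed at s_H from `release`.

    Assumes the job completes before t and the threshold is never reached,
    so the speed is s_H while executing and zero otherwise.
    """
    run = cycles / s_H
    assert release + run <= t
    T = temp_after(T_start, 0.0, release, a, b, alpha)         # idle until release
    T = temp_after(T, s_H, run, a, b, alpha)                   # executing at s_H
    return temp_after(T, 0.0, t - release - run, a, b, alpha)  # idle until t

if __name__ == "__main__":
    a, b, s_H = 1.0, 228.6, 29.87        # placeholder values
    print(temp_at(0.01, 0.001, 0.02, s_H, a, b, T_start=10.0))
    print(temp_at(0.01, 0.004, 0.02, s_H, a, b, T_start=10.0))
```

The second call, with the later release time, reports the higher temperature at t = 0.01, in line with the lemma.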

Lemma 2. In a system under our reactive speed scaling, we consider two jobs $J_k$ ($k = 1, 2$), each of which has a release time $t_{k,r}$ and a completion time $t_{k,f}$. We assume $t_{1,f} < t_{2,f}$. If we take any one of the following actions as shown in Figure 3:

• Action A: Increasing the temperature at $t_0$ ($t_0 \le t_{2,r}$) such that Job $J_2$ has the same release time $t_{2,r}$ but a new completion time $t_{2,f}^*$;

• Action B: Increasing the processor cycles of Job $J_1$ such that Job $J_k$ ($k = 1, 2$) has the same release time $t_{k,r}$ but a new completion time $t_{k,f}^*$;

• Action C: Shifting Job $J_1$ such that Job $J_1$ has a new release time $t_{1,r}^*$ and a new completion time $t_{1,f}^*$, and Job $J_2$ has the same release time $t_{2,r}$ and a new completion time $t_{2,f}^*$ satisfying $t_{1,r} \le t_{1,r}^*$ and $t_{1,f}^* \le t_{2,f}^*$,

then $t_{2,f} \le t_{2,f}^*$. If we define $d_2$ and $d_2^*$ as the delay of Job $J_2$ in the original and the modified scenarios respectively, then $d_2 \le d_2^*$.

Figure 3. Delay effect. (Panels (A), (B), and (C) show Jobs $J_1$ and $J_2$ in the original and the modified scenario for each action.)

The proofs of Lemmas 1 and 2 can be found in [13].

Here we summarize the three actions defined in the above

two lemmas as follows:

• Action A: Increasing the temperature at some time

instances;

• Action B: Increasing the processor cycles of some

jobs;

• Action C: Shifting some jobs to a later time.

By the lemmas, any of the above three actions can only increase the temperature at a later time and the delay of a later job.

The above two lemmas together with the three actions

are important to our delay analysis under reactive speed

scaling, which will be our focus in the next two sections.


4 Delay Analysis of FIFO Scheduling

Recall that the speed of the processor is triggered by the thermal behavior and varies over time under reactive speed scaling. Simple busy-period analysis will not work in this environment. In simple busy-period analysis, the jobs arriving before the busy period will not affect the delay of jobs arriving during the busy period. However, under reactive speed scaling, the execution of a job arriving earlier will heat up the processor and so affect the delay of a job arriving later, as shown in Lemma 2. Therefore, in the busy-period analysis under reactive speed scaling, we have to take this effect into consideration.

We start our delay analysis in the system with FIFO scheduling. Under FIFO scheduling, all tasks experience the same worst-case delay as the aggregated task does. Therefore, we consider the aggregated task, whose workload constraint function can be written as $F(I) = \sum_{i=1}^{n} F_i(I)$. First, we investigate the worst-case delay for the aggregated task.

Delay Constraint. We consider a busy period $[t_1, t_0]$ with length $\delta_1$ during which a job will experience the longest delay and immediately before which the processor is idle. The processor runs at high speed $s_H$ in Interval $[t_1, t_{1,h}]$ with length $\delta_{1,h}$ and at equilibrium speed $s_E$ in Interval $[t_{1,h}, t_0]$ with length $\delta_{1,e}$, as shown in the right side of Figure 4(a).

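Figure 4. Job executions. (Panel (a) shows the intervals between $t_m, t_{m-1}, \ldots, t_2, t_1$ followed by the busy period $[t_1, t_0]$, split at $t_{1,h}$ into $\delta_{1,h}$ and $\delta_{1,e}$; panel (b) shows each interval $[t_k, t_{k-1}]$, $k = 2, \ldots, m$, divided at $t_{k,0}$ into an idle part $\delta_{k,0}$ and a busy part $\delta_{k,h}$.)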

We define $d$ as the worst-case delay experienced by a job in the busy period $[t_1, t_0]$. Then, by the definition of worst-case delay, we have

$$d = \sup_{t \ge t_1} \{\inf\{\tau : f(t) \le g(t + \tau)\}\}, \tag{14}$$

where $f(t)$ and $g(t)$ are the workload function and the service function of the aggregated task respectively, as defined in Section 2. In other words, if by time $t + \tau$ the service received by the task is no less than its workload function $f(t)$, then all jobs of the task arriving before time $t$ should have been served, with a delay no more than $\tau$.

Since the processor is idle at time $t_1$, we have $f(t_1) = g(t_1)$. Therefore, $f(t) \le g(t + \tau)$ in (14) can be written as

$$f(t) - f(t_1) \le g(t + \tau) - g(t_1). \tag{15}$$

First, we study the right side of (15). Recall that the processor runs at high speed $s_H$ in Interval $[t_1, t_{1,h}]$ with length $\delta_{1,h}$ and at equilibrium speed $s_E$ in Interval $[t_{1,h}, t_0]$ with length $\delta_{1,e}$. If we define $I = t - t_1$, then we have

$$g(t + \tau) - g(t_1) = G(I + \tau), \tag{16}$$

where $G(I)$, a service constraint function of $g(t)$, is defined as

$$G(I) = \min\{(s_H - s_E)\delta_{1,h} + s_E I, \; s_H I\}. \tag{17}$$

Next, we study the left side of (15). With Action B, the job will experience a longer delay with more workload released and completed before the completion of this job. Therefore, if we set

$$f(t) - f(t_1) = F(I), \tag{18}$$

together with (16), then the worst-case delay in (15) can be written as (see Figure 5)

$$d = \sup_{I \ge 0} \{\inf\{\tau : F(I) \le G(I + \tau)\}\}. \tag{19}$$

Figure 5. Delay constraint. (The curves $F(I)$ and $G(I)$ plotted over $I$, with $\delta_{1,h}$ and $\delta_{1,e}$ marked on the $I$ axis and $d$ shown as the maximum horizontal distance between the two curves.)

As we can see, the undetermined service constraint function $G(I)$ is the key in the worst-case delay formula (19). Next, we will focus on obtaining $G(I)$.
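For a given value of δ1,h, the delay in (19) can be evaluated numerically as the largest horizontal distance between F and G. The sketch below is an illustration with hypothetical names: it inverts the two-segment service curve of (17) exactly and approximates the supremum over I on a grid, which is an assumption made for simplicity.

```python
def service_curve(I, delta_1h, s_H, s_E):
    """Service constraint function G(I) of Eq. (17)."""
    return min((s_H - s_E) * delta_1h + s_E * I, s_H * I)

def worst_case_delay(F, delta_1h, s_H, s_E, I_max, n=2000):
    """Grid approximation of Eq. (19): sup over I of inf{tau : F(I) <= G(I + tau)}."""
    def delay_for(I):
        need = F(I)
        # G is the minimum of two increasing lines, so G(x) >= need holds
        # exactly when both lines are at least `need`.
        x = max(need / s_H, (need - (s_H - s_E) * delta_1h) / s_E)
        return max(0.0, x - I)
    return max(delay_for(I_max * k / n) for k in range(n + 1))

if __name__ == "__main__":
    F = lambda I: 2.0 + 1.0 * I          # leaky bucket, sigma = 2, rho = 1
    print(worst_case_delay(F, delta_1h=0.1, s_H=10.0, s_E=7.0, I_max=5.0))
```

With these illustrative numbers the result lies between σ/s_H = 0.2 and σ/s_E ≈ 0.286, and the supremum is attained at I = 0 because the arrival rate is below s_E.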

Service Constraint. As defined in (17), $G(I)$ is a function of $\delta_{1,h}$, which depends on the temperature at time $t_1$. Instead of determining the exact temperature at $t_1$, we aim to obtain a tight upper bound on it, which will result in an upper bound on the worst-case delay according to Lemma 2.

To achieve this, we introduce extra intervals $[t_{k+1}, t_k]$ ($k = 1, \ldots, m-1$), as shown in Figure 4(a). By Lemma 1, we can use the three actions mentioned above to upper-bound the temperature at $t_1$. With Action A, we upper-bound the temperature at $t_m$ to be $T_H$. With Action C, for each Interval $\delta_k$ ($k = 2, \ldots, m$), we shift all parts of job execution to the end of this interval, such that the beginning part is idle with length $\delta_{k,0}$ and the ending part is busy with length $\delta_{k,h}$, as shown in Figure 4(b). We assume that the temperature will not hit $T_H$ during $[t_m, t_1]$ (see Footnote 1); then the processor will run at high speed $s_H$ during each interval $[t_{k+1,0}, t_k]$.

Footnote 1: If there is an interval $[t_{k_0+1}, t_{k_0}]$ during which the temperature hits $T_H$, then the temperature at $t_{k_0}$ is $T_H$. In this case, we can set $m = k_0$ and remove all intervals on the left.

We consider the service received in each interval $[t_k, t_0]$, $k = 1, \ldots, m$. As shown in Figure 4(b), the executed processor cycles in $[t_k, t_0]$ can be written as

$$g(t_0) - g(t_k) = s_H \sum_{j=1}^{k} \delta_{j,h} + s_E \delta_{1,e}. \tag{20}$$

For $k = 1$, we have $g(t_0) - g(t_1) = f(t_0) - f(t_1)$. Following the analysis of the delay constraint above, we consider the worst-case workload $f(t_0) - f(t_1) = F(t_0 - t_1)$. Therefore, by (20) we have

$$s_H \delta_{1,h} + s_E \delta_{1,e} = F(\delta_{1,h} + \delta_{1,e}). \tag{21}$$

For $k = 2, \ldots, m$, by the definition of the worst-case delay, the number of processor cycles in Interval $[t_k, t_0]$ is bounded as $g(t_0) - g(t_k) \le f(t_0) - f(t_k - d)$. By Lemma 2, the delay becomes longer when $g(t_0) - g(t_k) = f(t_0) - f(t_k - d) = F(t_0 - t_k + d)$ by either shifting the job execution or increasing the processor cycles of jobs. Therefore, by (20) we have

$$s_H \sum_{j=1}^{k} \delta_{j,h} + s_E \delta_{1,e} = F\Big(\sum_{j=1}^{k} \delta_{j,h} + \delta_{1,e} + d\Big). \tag{22}$$

Note that the service received by jobs depends on the processing speed, which changes with the thermal behavior. Next we want to see how the temperature changes in each interval.

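Temperature Constraint. First, we consider each interval $[t_{k+1}, t_k]$, $k = 1, \ldots, m-1$, which is composed of an idle period with length $\delta_{k+1,0}$ and a busy period with length $\delta_{k+1,h}$. Define $T_k$ as the temperature at $t_k$; then, following the temperature formula (7), we have

$$T_k = \frac{a s_H^{\alpha}}{b} + \left(T_{k+1} e^{-b\delta_{k+1,0}} - \frac{a s_H^{\alpha}}{b}\right) e^{-b\delta_{k+1,h}}. \tag{23}$$

Together with the assumption that $T_k \le T_H$ and $T_m = T_H$, and writing $\delta_l = \delta_{l,0} + \delta_{l,h}$, we have

$$\frac{T_k}{T_H} = \left(\frac{s_H}{s_E}\right)^{\alpha} \sum_{r=k+1}^{m} e^{-b\sum_{l=k+1}^{r-1}\delta_l}\left(1 - e^{-b\delta_{r,h}}\right) + e^{-b\sum_{l=k+1}^{m}\delta_l} \le 1. \tag{24}$$

Next, considering Interval $[t_1, t_{1,h}]$, we have

$$\frac{T_1}{T_H} = \left(\frac{s_H}{s_E}\right)^{\alpha} - \left(\left(\frac{s_H}{s_E}\right)^{\alpha} - 1\right) e^{b\delta_{1,h}}. \tag{25}$$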
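The step from the recursion (23) to the closed form (24) can be checked numerically. The following sketch works with temperatures normalized by T_H, uses the identity a·s_H^α/(b·T_H) = (s_H/s_E)^α that follows from (10), and takes each interval length as the sum of its idle and busy parts. The dictionary layout and the function names are assumptions made for this illustration.

```python
import math

def temps_by_recursion(idle, busy, A, b):
    """Normalized T_k / T_H for k = m-1, ..., 1 via Eq. (23), with T_m = T_H.

    idle[k] and busy[k] hold delta_{k,0} and delta_{k,h} for k = 2..m;
    A = (s_H / s_E) ** alpha.
    """
    m = max(idle)
    T = {m: 1.0}
    for k in range(m - 1, 0, -1):
        cooled = T[k + 1] * math.exp(-b * idle[k + 1])        # idle part first
        T[k] = A + (cooled - A) * math.exp(-b * busy[k + 1])  # then busy at s_H
    return T

def temp_ratio_closed_form(k, idle, busy, A, b):
    """Normalized T_k / T_H from the closed form in Eq. (24)."""
    m = max(idle)
    total, elapsed = 0.0, 0.0            # elapsed = sum of delta_l, l = k+1..r-1
    for r in range(k + 1, m + 1):
        total += A * math.exp(-b * elapsed) * (1.0 - math.exp(-b * busy[r]))
        elapsed += idle[r] + busy[r]
    return total + math.exp(-b * elapsed)

if __name__ == "__main__":
    idle = {2: 0.004, 3: 0.003, 4: 0.005}
    busy = {2: 0.001, 3: 0.0005, 4: 0.0008}
    A, b = (10.0 / 7.0) ** 3, 228.6
    rec = temps_by_recursion(idle, busy, A, b)
    for k in (3, 2, 1):
        print(k, rec[k], temp_ratio_closed_form(k, idle, busy, A, b))
```

Both computations should agree up to floating-point rounding for every k.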

Therefore, for any given values of $\delta_{1,h}$, $\delta_{1,e}$, $\delta_{k,0}$, and $\delta_{k,h}$, $k = 2, \ldots, m$, which are constrained by the above constraint conditions (19), (21), (22), (24), and (25), we can obtain an upper bound on the worst-case delay, which we denote as $d(\delta_{1,h}, \delta_{1,e}, \delta_{2,0}, \delta_{2,h}, \ldots, \delta_{m,0}, \delta_{m,h})$. Note that $d(\delta_{1,h}, \delta_{1,e}, \delta_{2,0}, \delta_{2,h}, \ldots, \delta_{m,0}, \delta_{m,h})$ can always bound the worst-case delay. In order to find a tight upper bound on the worst-case delay, we can choose a set of $\delta_{k,0}$'s and $\delta_{k,h}$'s to minimize $d(\delta_{1,h}, \delta_{1,e}, \delta_{2,0}, \delta_{2,h}, \ldots, \delta_{m,0}, \delta_{m,h})$, as summarized in the following theorem:

Theorem 1. In a system with FIFO scheduling under reactive speed scaling, the worst-case delay $d$ can be obtained by the following formula

$$d = \min\{d(\delta_{1,h}, \delta_{1,e}, \delta_{2,0}, \delta_{2,h}, \ldots, \delta_{m,0}, \delta_{m,h})\} \quad \text{subject to (19), (21), (22), (24), and (25).} \tag{26}$$

As a case study, in the following, we consider a leaky-bucket task workload and have the following corollary for the worst-case delay with FIFO scheduling:

Corollary 1. In a system with FIFO scheduling under reactive speed scaling, we consider tasks with leaky-bucket workload and the workload constraint function of the aggregated task is $F(I) = \sigma + \rho I$. Define $\chi_1 = \frac{s_E}{s_H}$ and $\chi_2 = \frac{\rho}{s_H}$. A tight bound of the worst-case delay $d$ is expressed as follows:

$$d = \begin{cases} V(X - Y), & \chi_2 \le \chi_1^{\alpha} \\ V(X - Y - Z), & \text{otherwise} \end{cases} \tag{27}$$

where $V = \frac{(1-\chi_1)(1-\chi_2)}{\chi_1 - \chi_2}$, $X = \frac{\chi_1}{1-\chi_1} d_E$, $Y = \frac{1}{b} \ln\frac{1-\chi_2}{1-\chi_1^{\alpha}}$, and $Z = \frac{1}{b} \frac{\chi_2}{1-\chi_2} \ln\frac{\chi_2}{\chi_1^{\alpha}}$. Define $d_H$ and $d_E$ as the delay when the processor always runs at $s_H$ and $s_E$ respectively, i.e., $d_H = \frac{\sigma}{s_H}$ and $d_E = \frac{\sigma}{s_E}$. The worst-case delay $d$ is also constrained by

$$d_H \le d \le d_E. \tag{28}$$

The proof is given in Appendix A.

5 Delay Analysis of SP Scheduling

In order to perform delay analysis in the system with SP scheduling, we introduce the following lemma:

Lemma 3. No matter what scheduling (FIFO or SP) is used in a system under reactive speed scaling defined in (11), the service function $g(t)$ of the aggregated task will be uniquely determined by the workload function $f(t)$ of the aggregated task, not by the scheduling algorithm.

Proof: The service function $g(t)$ can be written as

$$g(t) = \int_{0}^{t} s(\tau) \, d\tau. \tag{29}$$


According to (11), $s(t)$ is determined by $W(t)$ and $T(t)$ under reactive speed scaling, where $W(t)$ is the backlogged workload at time $t$ (i.e., $W(t) = f(t) - g(t)$) and $T(t)$ is determined by $s(t)$ according to (5). Therefore, the service function $g(t)$ will be uniquely determined by the workload function $f(t)$ of the aggregated task. We make no assumption about the scheduling algorithm. Hence the lemma is proved.

Based on this lemma, we are able to obtain the worst-case delay under SP scheduling as shown in the following theorem (see Footnote 2):

Theorem 2. In a system with SP scheduling under reactive speed scaling, the worst-case delay $d_i$ for Task $\Gamma_i$ can be obtained by the following formula

$$d_i = \sup_{I \ge 0} \Big\{\inf\Big\{\tau : \sum_{j=1}^{i-1} F_j(I + \tau) + F_i(I) \le G(I + \tau)\Big\}\Big\}, \tag{30}$$

where $G(I)$ is defined in (17) and $\delta_{1,h}$ in $G(I)$ can be obtained by minimizing $d(\delta_{1,h}, \delta_{1,e}, \delta_{2,0}, \delta_{2,h}, \ldots, \delta_{m,0}, \delta_{m,h})$ in Theorem 1.

Proof: We consider a busy interval $[t_1, t_0]$, during which at least one job from Tasks $\Gamma_j$ ($j \le i$) is running, and immediately before which no jobs from Tasks $\Gamma_j$ ($j \le i$) are running. We know that the delay of a job $J$ of Task $\Gamma_i$ is introduced by two arrival stages of jobs in the queue: all queued jobs at $J$'s release time and the higher-priority ones coming between $J$'s release time and completion time. Then we have the worst-case delay for a job of Task $\Gamma_i$ as follows:

$$d_i = \sup_{t \ge t_1} \Big\{\inf\Big\{\tau : \sum_{j=1}^{i-1} f_j(t + \tau) + f_i(t) \le \sum_{j=1}^{i} g_j(t + \tau)\Big\}\Big\}, \tag{31}$$

where $f_i(t)$ and $g_i(t)$ are the workload function and the service function of Task $\Gamma_i$ respectively.

By our assumption about Interval $[t_1, t_0]$, we have $f_j(t_1) = g_j(t_1)$, $j = 1, \ldots, i$, and $g_j(t) = g_j(t_1)$, $j = i+1, \ldots, n$. Therefore, $\sum_{j=1}^{i-1} f_j(t + \tau) + f_i(t) \le \sum_{j=1}^{i} g_j(t + \tau)$ in the above formula can be written as $\sum_{j=1}^{i-1} (f_j(t + \tau) - f_j(t_1)) + (f_i(t) - f_i(t_1)) \le \sum_{j=1}^{n} (g_j(t + \tau) - g_j(t_1))$. With an analysis similar to that for FIFO scheduling, the worst-case delay happens when $\sum_{j=1}^{i-1} (f_j(t + \tau) - f_j(t_1)) + (f_i(t) - f_i(t_1)) = \sum_{j=1}^{i-1} F_j(I + \tau) + F_i(I)$. Define $I = t - t_1$ and then (30) holds. In (30), $G(I)$ is defined in (17). By Lemma 3, the service function under SP scheduling is the same as the one under FIFO scheduling. Then $\delta_{1,h}$ in $G(I)$ can be obtained by minimizing $d(\delta_{1,h}, \delta_{1,e}, \delta_{2,0}, \delta_{2,h}, \ldots, \delta_{m,0}, \delta_{m,h})$ in Theorem 1.

Footnote 2: In the following, the smaller index of a task indicates a higher priority.

Similarly, in the following we consider the leaky-bucket task workload as a case study. We have the following corollary on the worst-case delay for SP scheduling:

Corollary 2. In a system with SP scheduling under reactive speed scaling, we assume that Task $\Gamma_i$ has a workload constraint function $F_i(I) = \sigma_i + \rho_i I$. The worst-case delay $d_i$ for Task $\Gamma_i$ can be written as

$$d_i = \max\{d_{E,i} - \Delta, \; d_{H,i}\}, \tag{32}$$

where

$$d_{E,i} = \frac{\sum_{j=1}^{i} \sigma_j}{s_E - \sum_{j=1}^{i-1} \rho_j}, \tag{33}$$

$$d_{H,i} = \frac{\sum_{j=1}^{i} \sigma_j}{s_H - \sum_{j=1}^{i-1} \rho_j}, \tag{34}$$

$$\Delta = \frac{\sigma - s_E d}{s_E - \sum_{j=1}^{i-1} \rho_j}, \tag{35}$$

and $d$ in (35) can be obtained by Corollary 1.

The proof is given in Appendix B.
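Equations (32) to (35) translate directly into a few lines of code once the FIFO delay d of the aggregated task is known. The sketch below is illustrative: the function name and argument layout are assumptions, tasks are indexed from 1 with smaller indices meaning higher priority, and d is passed in rather than computed.

```python
def sp_delay(i, sigmas, rhos, s_H, s_E, d_fifo):
    """Worst-case delay of Task i under SP scheduling, Eqs. (32)-(35).

    sigmas, rhos: leaky-bucket parameters, ordered by decreasing priority.
    d_fifo: worst-case FIFO delay d of the aggregated task (from Corollary 1).
    """
    sigma_up_to_i = sum(sigmas[:i])      # sum over j = 1..i
    rho_higher = sum(rhos[:i - 1])       # sum over j = 1..i-1
    sigma_all = sum(sigmas)              # sigma of the aggregated task
    d_E = sigma_up_to_i / (s_E - rho_higher)                    # Eq. (33)
    d_H = sigma_up_to_i / (s_H - rho_higher)                    # Eq. (34)
    delta = (sigma_all - s_E * d_fifo) / (s_E - rho_higher)     # Eq. (35)
    return max(d_E - delta, d_H)                                # Eq. (32)

if __name__ == "__main__":
    # Illustrative numbers only; d_fifo is a placeholder input.
    print(sp_delay(2, [1.0, 2.0, 3.0], [0.5, 1.0, 1.5], 10.0, 7.0, d_fifo=0.8))
```

Two sanity checks follow from the formulas: with a single task the result equals d whenever d is at least σ/s_H, and with d = σ/s_E (no benefit from speed scaling) the correction Δ vanishes and the result is d_{E,i}.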

6 Performance Evaluation

In this section we evaluate the benefit of using a simple reactive speed scaling scheme by comparing the worst-case delay with that of a system without speed scaling. We adopt as the baseline a constant-speed processor that runs at equilibrium speed $s_E$.

We choose the same setting as [4] for a silicon chip. The thermal conductivity of the silicon material per unit volume is $k_{th} = 100$ W/mK and the thermal capacitance per unit volume is $c_{th} = 1.75 \times 10^{6}$ J/m$^3$K. The chip is $t_{th} = 0.55$ mm thick. Therefore, the thermal RC time constant is $RC = \frac{c_{th}}{k_{th}} t_{th}^2 = 0.0044$ sec [4]. Hence by Equation (4), $b \approx 228.6$ sec$^{-1}$. The ambient temperature is 45°C and the maximum temperature threshold is 85°C, so $T_H = 40$°C. The equilibrium speed $s_E$ will be fixed by the system, but $s_H$ can be freely chosen. We arbitrarily pick $s_H = \frac{10}{7} s_E$ and assume $\alpha = 3$.

We consider three tasks $\Gamma_i$ ($i = 1, 2, 3$). As a case study, we consider a leaky-bucket workload and assume each task $\Gamma_i$ has a leaky-bucket arrival with $F_i(I) = \sigma_i + \rho_i I$. The aggregate task has an arrival with $F(I) = \sigma + \rho I$, where $\sigma = \sum_{i=1}^{3} \sigma_i$ and $\rho = \sum_{i=1}^{3} \rho_i$. In our evaluation, we vary $\sigma/s_E$ and $\rho/s_E$ in the ranges of $[0, 0.005]$ and $[0, 0.5]$ respectively. We compare the worst-case delay of jobs in the system under reactive speed scaling with that in the baseline system, where the processor always runs at the equilibrium speed.


Figure 6. A contour plot of delay decrease ratio $|d - d_E|/d_E$ for the aggregated task under reactive speed scaling for FIFO scheduling. (Horizontal axis: $\rho/s_E$ from 0 to 0.5; vertical axis: $\sigma/s_E$ in sec from 0 to $5 \times 10^{-3}$; contour levels from 0.01 to 0.29.)

First we consider FIFO scheduling. We evaluate the worst-case delay decrease ratio $|d - d_E|/d_E$ for the aggregated task (see Footnote 3). Figure 6 shows a contour plot of $|d - d_E|/d_E$ in terms of $\sigma/s_E$ and $\rho/s_E$. We observe that the delay decrease ratio changes from a minimum of 0 (as $d = d_E$) to a maximum of $1 - \frac{s_E}{s_H} = 0.300$ (as $d = d_H$). The delay decrease ratio will decrease as either $\sigma$ or $\rho$ increases.

Next we consider SP scheduling. We assume that $\sigma_1 : \sigma_2 : \sigma_3 = \rho_1 : \rho_2 : \rho_3 = 1 : 2 : 3$. We evaluate the worst-case delay decrease ratio $|d_i - d_{E,i}|/d_{E,i}$ for Task $\Gamma_i$. Each individual picture in Figure 7 shows contour plots of $|d_i - d_{E,i}|/d_{E,i}$ in terms of $\sigma/s_E$ and $\rho/s_E$, for the three tasks separately. We observe that the delay decrease ratio changes from a minimum of 0 (as $d_i = d_{E,i}$) to a maximum of $1 - \frac{s_E}{s_H} = 0.300$ for Task $\Gamma_1$, to a maximum of $(1 - \frac{s_E}{s_H})/(1 - \frac{1}{6}\frac{s_E}{s_H}) = 0.316$ for Task $\Gamma_2$, and to a maximum of $(1 - \frac{s_E}{s_H})/(1 - \frac{1}{2}\frac{s_E}{s_H}) = 0.353$ for Task $\Gamma_3$ (as $d_i = d_{H,i}$).

If the delay decrease ratio is not larger than 0.3, the ratio will decrease as either $\sigma$ or $\rho$ increases for any task. If it becomes larger than 0.3, we observe different behavior for the lower-priority tasks. In particular, considering a lower-priority task, for small $\sigma$ and $\rho$, the delay decrease ratio can be written as $(1 - \frac{s_E}{s_H})/(1 - \frac{1}{s_H}\sum_{j=1}^{i-1}\rho_j)$. Therefore, as shown at the bottom-left corner of the last two contour plots in Figure 7, the delay decrease ratio will stay constant as $\sigma$ increases and $\rho$ stays constant, but increase as $\rho$ increases and $\sigma$ stays constant.
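The maximum decrease ratios quoted above follow from (33) and (34) when a task attains its best case $d_i = d_{H,i}$. The short derivation below is a worked restatement of those two equations and introduces no additional assumptions beyond $s_H = \frac{10}{7} s_E$ from the evaluation setup.

```latex
\frac{d_{E,i} - d_{H,i}}{d_{E,i}}
  = 1 - \frac{s_E - \sum_{j=1}^{i-1} \rho_j}{s_H - \sum_{j=1}^{i-1} \rho_j}
  = \frac{s_H - s_E}{s_H - \sum_{j=1}^{i-1} \rho_j}
  = \frac{1 - \frac{s_E}{s_H}}{1 - \frac{1}{s_H} \sum_{j=1}^{i-1} \rho_j}.
% For the highest-priority task (i = 1) the sum is empty, so the ratio is
% 1 - s_E / s_H = 1 - 7/10 = 0.3, which also matches the FIFO case d = d_H.
```

Because the denominator shrinks as the higher-priority rates grow, the attainable ratio for lower-priority tasks can exceed 0.3 and depends on $\rho$ but not on $\sigma$.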

7 Conclusion and Future Work

Delay analysis in systems with temperature-constrained

speedscalingis difficult,asthetraditionaldefinitionof“busy

3The alert reader has noticed that we did not define a value for param-

eter a. This is because a appears only in the computation of sE, which

cancels out in the delay decrease ratio.

[Figure 7: three contour-plot panels, one per task ($\Gamma_1$, $\Gamma_2$, $\Gamma_3$), with axes $\rho/s_E$ (0 to 0.5) and $\sigma/s_E$ in seconds (0 to $5 \times 10^{-3}$).]

Figure 7. Contour plots of delay decrease ratio $|d_i - d_{E,i}|/d_{E,i}$ for Task $\Gamma_i$ ($i = 1, 2, 3$) under reactive speed scaling for SP scheduling.

period” does not apply and it becomes difficult to separate

the execution of jobs from the interference by ones arriving

earlier or having low priorities because of dynamic speed

scaling triggered by the thermal behavior. In this paper we

have shown how to compute bounds on the worst-case de-

lay for tasks with arbitrary job arrivals for both FIFO and

SP scheduling algorithms in a system with a very simple

speed scaling algorithm, which simply runs at maximum

speed until the CPU becomes idle or reaches a critical tem-

perature. In the latter case the processing speed is reduced

(through DVS or appropriate clock throttling) to an equilibrium speed that keeps the temperature constant. We have
shown that such a scheme reduces worst-case delays compared with any constant-speed scheme.

In order to further improve the performance of speed

scaling, one would have to find ways to partially isolate jobs


from the thermal effects of ones arriving earlier or having

low priorities. One weakness of the proposed speed-scaling
algorithm is its inability to proactively process low-priority

tasks at lower-than-equilibrium speeds. At this point we

don’t know how to perform delay analysis for non-trivial

speed scaling algorithms, however.

References

[1] D. Brooks and M. Martonosi, “Dynamic thermal management for high-performance microprocessors,” in Proceedings of the 7th International Symposium on High-Performance Computer Architecture, 2001.

[2] Semiconductor Industry Association, “2005 international technology roadmap for semiconductors,” http://public.itrs.net.

[3] E. Rotem, A. Naveh, M. Moffie, and A. Mendelson, “Analysis of thermal monitor features of the Intel Pentium M processor,” in Proceedings of the First Workshop on Temperature-Aware Computer Systems, 2004.

[4] K. Skadron, M. Stan, W. Huang, S. Velusamy,

K. Sankaranarayanan, and D. Tarjan, “Temperature-

aware microarchitecture: Extended discussion and re-

sults,” Tech. Rep. CS-2003-08, Department of Com-

puter Science, University of Virginia, 2003.

[5] S. Gochman, A. Mendelson, A. Naveh, and E. Rotem,

“Introduction to Intel Core Duo processor architec-

ture,” Intel Technology Journal, vol. 10, no. 2, pp.

89 – 97, 2006.

[6] N. Bansal, T. Kimbrel, and K. Pruhs, “Dynamic speed

scaling to manage energy and temperature,” in IEEE

Symposium on Foundations of Computer Science, 2004.

[7] N. Bansal and K. Pruhs, “Speed scaling to manage

temperature,” in Symposium on Theoretical Aspects of

Computer Science, 2005.

[8] H. Sanchez, B. Kuttanna, T. Olson, M. Alexander, G. Gerosa, R. Philip, and J. Alvarez, “Thermal management system for high performance PowerPC microprocessors,” in IEEE International Computer Conference, 1997.

[9] E. Rotem, A. Naveh, M. Moffie, and A. Mendelson, “Analysis of thermal monitor features of the Intel Pentium M processor,” in Workshop on Temperature-aware Computer Systems, 2004.

[10] S. Wang and R. Bettati, “Reactive speed control in temperature-constrained real-time systems,” in Euromicro Conference on Real-Time Systems, 2006.

[11] A. Dhodapkar, C.H. Lim, G. Cai, and W.R. Daasch,

“TEMPEST: A thermal enabled multi-model pow-

er/performance estimator,” in Workshop on Power-

Aware Computer Systems, ASPLOS-IX, 2000.

[12] A. Cohen, L. Finkelstein, A. Mendelson, R. Ronen,

and D. Rudoy, “On estimating optimal performance

of CPU dynamic thermal management,” in Computer

Architecture Letters, 2003.

[13] S. Wang and R. Bettati, “Delay analysis in temperature-constrained hard real-time systems with general task arrivals,” Tech. Rep. tamu-cs-tr-2006-5-3, Department of Computer Science, Texas A&M University, 2006, http://www.cs.tamu.edu/academics/tr/tamu-cs-tr-2006-5-3.

A Proof of Corollary 1

Following the analysis in Section 4, we consider the three constraints in turn:

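Delay Constraint. Since $F(I) = \sigma + \rho I$, by (19) we can obtain the delay $d$ as

$$d = \max\left\{\frac{\sigma}{s_H},\ \frac{\sigma}{s_E} - \left(\frac{s_H}{s_E} - 1\right)\delta_{1,h}\right\}. \tag{36}$$

Therefore, we have

$$\delta_{1,h} = \frac{\frac{\sigma}{s_E} - d}{\frac{s_H}{s_E} - 1}, \tag{37}$$

as

$$d \ge \frac{\sigma}{s_H}. \tag{38}$$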
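The relation between (36) and (37) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the function names and the parameter values used in any call are hypothetical, and only the two formulas themselves come from the text.

```python
def delay_from_delta(sigma, s_E, s_H, delta_1h):
    """Delay d from Eq. (36) for a given high-speed duration delta_1h."""
    return max(sigma / s_H, sigma / s_E - (s_H / s_E - 1.0) * delta_1h)


def delta_from_delay(sigma, s_E, s_H, d):
    """High-speed duration delta_1h from Eq. (37); requires d >= sigma / s_H (38)."""
    if d < sigma / s_H:
        raise ValueError("Eq. (38) requires d >= sigma / s_H")
    return (sigma / s_E - d) / (s_H / s_E - 1.0)


if __name__ == "__main__":
    # Hypothetical values: burst 0.3, equilibrium speed 0.7, maximum speed 1.0.
    d = delay_from_delta(0.3, 0.7, 1.0, 0.1)
    print(d, delta_from_delay(0.3, 0.7, 1.0, d))
```

The two branches of the maximum in (36) coincide at $\delta_{1,h} = \sigma/s_H$, so (37) inverts (36) only while the second branch is active.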

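Service Constraint. To simplify the service analysis, we consider equal intervals and assume $\delta_k = \delta$ for $k = 3, \ldots, m$. We investigate the service received in each interval $[t_k, t_0]$, $k = 1, \ldots, m$.

For $k = 1$, by (21) we have

$$s_H\delta_{1,h} + s_E\delta_{1,e} = \sigma + \rho(\delta_{1,h} + \delta_{1,e}). \tag{39}$$

Hence,

$$(s_H - \rho)\delta_{1,h} + (s_E - \rho)\delta_{1,e} = \sigma. \tag{40}$$

For $k = 2, \ldots, m$, by (22), we have

$$s_H\sum_{j=1}^{k}\delta_{j,h} + s_E\delta_{1,e} = \sigma + \rho\bigl((k-2)\delta + (\delta_2 + \delta_1) + d\bigr). \tag{41}$$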


If $k = 2$, together with (40), we have

$$\delta_{2,h} = \frac{\delta_{2,0} + d}{\frac{s_H}{\rho} - 1}. \tag{42}$$

If $k \ge 3$, we have

$$\delta_{k,h} = \frac{\rho}{s_H}\delta. \tag{43}$$
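The step from (41) to (42) and (43) is compressed, so a worked version may help. It is a sketch under two assumptions that this appendix does not state explicitly: $\delta_1 = \delta_{1,h} + \delta_{1,e}$, and $\delta_2 = \delta_{2,0} + \delta_{2,h}$.

```latex
% k = 2: subtract (39) from (41) evaluated at k = 2.
s_H \delta_{2,h} = \rho(\delta_2 + d) = \rho(\delta_{2,0} + \delta_{2,h} + d)
\;\Longrightarrow\;
(s_H - \rho)\,\delta_{2,h} = \rho(\delta_{2,0} + d)
\;\Longrightarrow\;
\delta_{2,h} = \frac{\delta_{2,0} + d}{\frac{s_H}{\rho} - 1}.

% k >= 3: subtract (41) at k - 1 from (41) at k.
s_H \delta_{k,h} = \rho\,\delta
\;\Longrightarrow\;
\delta_{k,h} = \frac{\rho}{s_H}\,\delta.
```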

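Temperature Constraint. By (37) and (43), we can rewrite the temperature constraint condition (24). For $k = 2, \ldots, m-1$,

$$\frac{T_k}{T_H} = e^{-b(m-k)\delta}(1 - \xi) + \xi, \tag{44}$$

where

$$\xi = \left(\frac{s_H}{s_E}\right)^{\alpha}\frac{1 - e^{-b\frac{\rho}{s_H}\delta}}{1 - e^{-b\delta}}. \tag{45}$$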

By (44), $T_2$ is a function of $\delta$ and $m$. The smaller $T_2$ is, the shorter the delay $d$ is. Therefore, we want to find $\delta$ and $m$ that minimize $T_2$, so that we obtain a tight upper bound $d$ on the original worst-case delay.

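If $\xi \le 1$, then $T_k/T_H \le 1$. By (44), $T_2$ is a decreasing function of $(m-2)\delta$, so $T_2/T_H \ge \lim_{(m-2)\delta \to \infty} T_2/T_H = \xi$.⁴ Furthermore, $\xi$ is an increasing function of $\delta$, so $T_2/T_H \ge \lim_{\delta \to 0}\xi = \left(\frac{s_H}{s_E}\right)^{\alpha}\frac{\rho}{s_H}$. Therefore, we choose the minimum and set $T_2/T_H = \left(\frac{s_H}{s_E}\right)^{\alpha}\frac{\rho}{s_H}$, as $\left(\frac{s_H}{s_E}\right)^{\alpha}\frac{\rho}{s_H} \le 1$.

If $\xi > 1$, then $T_2/T_H$ is the maximum among all $T_k/T_H$'s. Therefore, we only need to consider the bound $T_2/T_H \le 1$. By (44), $T_2/T_H$ is an increasing function of $m$, so $T_2/T_H$ is minimized at $m = 2$. Hence, we set $T_2/T_H = 1$ in this case.

Therefore, with the analysis above, we can set

$$\frac{T_2}{T_H} = \min\left\{\left(\frac{s_H}{s_E}\right)^{\alpha}\frac{\rho}{s_H},\ 1\right\}. \tag{46}$$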
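The two properties of $\xi$ used above, its limit as $\delta \to 0$ and its monotonic growth in $\delta$, can be observed numerically. The sketch below evaluates (45) and (46) directly; the function names and all parameter values are hypothetical choices for illustration.

```python
import math


def xi(delta, rho, s_E, s_H, alpha, b):
    """xi from Eq. (45) for an interval length delta > 0."""
    scale = (s_H / s_E) ** alpha
    # expm1(x) = e^x - 1; the two minus signs cancel in the quotient,
    # and expm1 stays accurate when b * delta is very small.
    return scale * math.expm1(-b * rho / s_H * delta) / math.expm1(-b * delta)


def xi_limit(rho, s_E, s_H, alpha):
    """Limit of xi as delta -> 0, i.e. (s_H / s_E)^alpha * rho / s_H."""
    return (s_H / s_E) ** alpha * rho / s_H


def t2_ratio(rho, s_E, s_H, alpha):
    """T_2 / T_H as chosen in Eq. (46)."""
    return min(xi_limit(rho, s_E, s_H, alpha), 1.0)


if __name__ == "__main__":
    # Hypothetical parameters: rho = 0.2, s_E = 0.7, s_H = 1.0, alpha = 3, b = 1.
    for delta in (0.01, 0.1, 1.0, 10.0):
        print(delta, xi(delta, 0.2, 0.7, 1.0, 3.0, 1.0))
    print("limit", xi_limit(0.2, 0.7, 1.0, 3.0))
```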

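At the same time, by (23) and (25), we have

$$\delta_{1,h} + \delta_{2,h} = \frac{1}{b}\ln\frac{\left(\frac{s_H}{s_E}\right)^{\alpha} - \frac{T_2}{T_H}e^{-b\delta_{2,0}}}{\left(\frac{s_H}{s_E}\right)^{\alpha} - 1}. \tag{47}$$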

Therefore, by (37), (42), (46), and (47), we can obtain the worst-case delay $d$ as follows:

$$d = \frac{(1-\chi_1)(1-\chi_2)}{\chi_1 - \chi_2}\left(\frac{\chi_1}{1-\chi_1}\frac{\sigma}{s_E} + \frac{\chi_2}{1-\chi_2}\delta_{2,0} - \frac{1}{b}\ln\frac{1 - \min\{\chi_2, \chi_1^{\alpha}\}e^{-b\delta_{2,0}}}{1 - \chi_1^{\alpha}}\right), \tag{48}$$

where $\chi_1 = \frac{s_E}{s_H}$, and $\chi_2 = \frac{\rho}{s_H}$.

Equation (48) shows that $d$ is a function of $\delta_{2,0}$. Since the above analysis works for any chosen $\delta_{2,0}$, we want to obtain a minimum $d$ in terms of $\delta_{2,0}$. There are two cases:

• $\chi_2 \le \chi_1^{\alpha}$: $d$ will be minimized at $\delta_{2,0} = 0$, therefore

$$d = V(X - Y), \tag{49}$$

• $\chi_2 > \chi_1^{\alpha}$: $d$ will be minimized at $\delta_{2,0} = \frac{1}{b}\ln\frac{\chi_2}{\chi_1^{\alpha}}$, therefore

$$d = V(X - Y - Z), \tag{50}$$

where $V, X, Y, Z$ are defined in Corollary 1.

⁴ We assume that at time zero the system is at lowest temperature. Therefore, we can pick the intervals with overall length up to infinity.
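As a consistency check on (48), one can evaluate it numerically and confirm that the resulting $d$, fed back through (37) and (42), satisfies (47) with $T_2/T_H$ taken from (46). The sketch below does this for the first case only ($\chi_2 \le \chi_1^{\alpha}$); the function names and all parameter values are hypothetical, chosen so that $d$ falls between $\sigma/s_H$ and $\sigma/s_E$.

```python
import math


def heating_time_eq47(rho, s_E, s_H, alpha, b, delta_20):
    """Right-hand side of Eq. (47) with T_2/T_H substituted from Eq. (46)."""
    chi1, chi2 = s_E / s_H, rho / s_H
    mu = min(chi2, chi1 ** alpha)  # equals chi1^alpha * T_2/T_H
    return math.log((1.0 - mu * math.exp(-b * delta_20)) / (1.0 - chi1 ** alpha)) / b


def delay_eq48(sigma, rho, s_E, s_H, alpha, b, delta_20=0.0):
    """Delay d from Eq. (48) for a chosen idle time delta_20."""
    chi1, chi2 = s_E / s_H, rho / s_H
    v = (1.0 - chi1) * (1.0 - chi2) / (chi1 - chi2)
    inner = (chi1 / (1.0 - chi1) * sigma / s_E
             + chi2 / (1.0 - chi2) * delta_20
             - heating_time_eq47(rho, s_E, s_H, alpha, b, delta_20))
    return v * inner


if __name__ == "__main__":
    # Hypothetical parameters; chi2 = 0.2 <= chi1^alpha = 0.343 (first case).
    p = dict(sigma=0.3, rho=0.2, s_E=0.7, s_H=1.0, alpha=3.0, b=1.0)
    d = delay_eq48(**p)
    delta_1h = (p["sigma"] / p["s_E"] - d) / (p["s_H"] / p["s_E"] - 1.0)  # Eq. (37)
    delta_2h = d / (p["s_H"] / p["rho"] - 1.0)                            # Eq. (42), delta_20 = 0
    print(d, delta_1h + delta_2h, heating_time_eq47(0.2, 0.7, 1.0, 3.0, 1.0, 0.0))
```

For these values, increasing `delta_20` above zero increases the result of `delay_eq48`, in line with the first case being minimized at $\delta_{2,0} = 0$.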

On the other hand, in the temperature constraint, for $k = 1$, by (25) and the constraint that $T_1/T_H \le 1$, we have

$$\delta_{1,h} \ge 0. \tag{51}$$

Therefore, by (37), we have

$$d \le \frac{\sigma}{s_E}. \tag{52}$$

Recall that $d_H = \frac{\sigma}{s_H}$ and $d_E = \frac{\sigma}{s_E}$; then by (38) and (52), the worst-case delay is also constrained by

$$d_H \le d \le d_E. \tag{53}$$

B Proof of Corollary 2

By Theorem 2, $G(I) = \min\{(s_H - s_E)\delta_{1,h} + s_E(I + d_i),\ s_H(I + d_i)\}$ depends on $\delta_{1,h}$. For the leaky-bucket task workload, by (37) we have $\delta_{1,h} = (\frac{\sigma}{s_E} - d)/(\frac{s_H}{s_E} - 1)$, where $d$ can be obtained by Corollary 1.

By (30), the delay formula can be written as

$$\sum_{j=1}^{i-1}\bigl(\sigma_j + \rho_j(I + d_i)\bigr) + \sigma_i + \rho_i I = \min\{(s_H - s_E)\delta_{1,h} + s_E(I + d_i),\ s_H(I + d_i)\}. \tag{54}$$

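Then, as $d_i > \delta_{1,h}$, we have

$$d_i = \frac{\sum_{j=1}^{i-1}(\sigma_j + \rho_j d_i) + \sigma_i}{s_E} - \left(\frac{s_H}{s_E} - 1\right)\delta_{1,h}, \tag{55}$$

otherwise

$$d_i = \frac{\sum_{j=1}^{i-1}(\sigma_j + \rho_j d_i) + \sigma_i}{s_H}. \tag{56}$$

Therefore, by (37), (55), and (56), we have

$$d_i = \max\{d_{E,i} - \Delta,\ d_{H,i}\}, \tag{57}$$

where $d_{E,i}$ and $d_{H,i}$ are defined in Corollary 2, and $d$ can be obtained by Corollary 1. The worst-case delay $d_i$ is constrained by

$$d_{H,i} \le d_i \le d_{E,i}. \tag{58}$$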
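Since $d_i$ appears on both sides of (55) and (56), it may help to see them solved explicitly. The sketch below solves each equation for $d_i$ and takes the larger root, mirroring the maximum in (57). The closed forms are obtained here by elementary algebra and should be checked against the definitions in Corollary 2, which are not reproduced in this appendix; the function name, the 1-indexed task convention, and all numeric values are hypothetical.

```python
def sp_delay(i, sigmas, rhos, s_E, s_H, delta_1h):
    """Solve Eqs. (55)/(56) for d_i; task index i is 1-based, in priority order."""
    burst = sum(sigmas[:i])      # sigma_1 + ... + sigma_i
    rate = sum(rhos[:i - 1])     # rho_1 + ... + rho_{i-1}
    if rate >= s_E:
        raise ValueError("higher-priority rate must stay below s_E")
    # Eq. (56) solved for d_i: all work served at the high speed s_H.
    d_high = burst / (s_H - rate)
    # Eq. (55) solved for d_i: s_H for delta_1h, then the equilibrium speed s_E.
    d_mixed = (burst - (s_H - s_E) * delta_1h) / (s_E - rate)
    return max(d_mixed, d_high)


def residual_eq54(i, sigmas, rhos, s_E, s_H, delta_1h, d):
    """Left side minus right side of Eq. (54) evaluated at I = 0."""
    lhs = sum(sigmas[:i]) + sum(rhos[:i - 1]) * d
    rhs = min((s_H - s_E) * delta_1h + s_E * d, s_H * d)
    return lhs - rhs


if __name__ == "__main__":
    sigmas, rhos = [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]  # hypothetical leaky buckets
    for i in (1, 2, 3):
        d = sp_delay(i, sigmas, rhos, 0.7, 1.0, 0.05)
        print(i, d, residual_eq54(i, sigmas, rhos, 0.7, 1.0, 0.05, d))
```

Taking the larger of the two roots works because both arguments of the minimum in (54) grow faster in $d_i$ than the left-hand side whenever the higher-priority rate is below $s_E$.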