
Delay Analysis in Temperature-Constrained Hard Real-Time Systems

with General Task Arrivals

Shengquan Wang

The University of Michigan - Dearborn

Dearborn, MI 48128, USA

shqwang@umd.umich.edu

Riccardo Bettati

Texas A&M University

College Station, TX 77843, USA

bettati@cs.tamu.edu

Abstract

In this paper, we study temperature-constrained hard real-time systems, where real-time guarantees must be met without exceeding safe temperature levels within the processor. Dynamic speed scaling is one of the major techniques to manage power so as to maintain safe temperature levels. As an example, we adopt a simple reactive speed control technique in our work. We design a methodology to perform delay analysis for general task arrivals under reactive speed control with First-In-First-Out (FIFO) scheduling and Static-Priority (SP) scheduling. As a special case, we obtain a closed-form delay formula for the leaky-bucket task arrival model. Our data show how simple reactive speed control can decrease the delay of tasks compared with any constant-speed scheme.

1 Introduction

With the rapidly increasing power density in processors,

the problem of thermal management in systems is becom-

ing acute. Methods to manage heat to control its dissipa-

tion have been gaining much attention by researchers and

practitioners. Techniques are being investigated for ther-

mal control both at design time through appropriate pack-

aging and active heat dissipation mechanisms, and at run

time through various forms of dynamic thermal manage-

ment (DTM) (e.g., [1]).

Thermal management through packaging (that improves

airflow, for example) and active heat dissipation will be-

come increasingly challenging in the near future, due to the

high levels of peak power involved and the extremely high

power density in emerging systems-in-package [2]. In addition, the packaging requirements and operating environments of many high-performance embedded devices render such approaches inappropriate.

This work was funded by NSF under Grant No. CNS-0509483, while Dr. Wang was at Texas A&M University.

A number of dynamic thermal management approaches

to control the temperature at run time have been proposed,

ranging from clock throttling to dynamic voltage scaling

(DVS) to in-chip load balancing:

• The Pentium 4 Series processors use Clock Throttling [3] or Clock Gating [4] to stall the clock and so allow the processor to cool during thermal overload.

• Dynamic Voltage Scaling (DVS) [1] is used in a variety of modern processor technologies and allows the processor to switch between different frequency and voltage operating points at run time in response to the current thermal situation. In the Enhanced Intel SpeedStep mechanism in the Pentium M processor, for example, a low-power operating point is reached in response to a thermal trigger by first reducing the frequency (within a few microseconds) and then reducing the voltage (at a rate of one mV per microsecond) [3].

• A number of architecture-level mechanisms for thermal control have been proposed that turn off components inside the processor in response to thermal overload. Skadron et al. [4], for example, argue that the microarchitecture should distribute the workload in response to the thermal situation by taking advantage of instruction-level parallelism. The performance penalty caused by this “local gating” would not be excessive. On a coarser level, the Pentium Core Duo Architecture allows the OS or the BIOS to disable one of the cores by putting it into sleep mode [5].

As high-performance embedded systems become increasingly temperature-constrained, the question of how the thermal behavior of the system and the thermal control mechanisms affect real-time guarantees must be addressed. In this paper we describe delay analysis techniques in temperature-constrained hard real-time systems, where deadline constraints for tasks have to be balanced against temperature constraints of the system.


Dynamic speed scaling allows for a trade-off between these two performance metrics: to meet the deadline constraint, we run the processor at a higher speed; to maintain safe temperature levels, we run the processor at a lower speed. The work on dynamic speed scaling techniques to control temperature in real-time systems was initiated in [6] and further investigated in [7]. Both [6] and [7] focus on online algorithms in real-time systems, where the scheduler learns about a task only at its release time. In contrast, in our work we assume a deterministic task model (e.g., periodic tasks), which allows for design-time delay analysis.

We distinguish between proactive and reactive speed scaling schemes. Whenever the temperature model is known, the scheduler could in principle use a proactive speed-scaling approach, where, similarly to a non-work-conserving scheduler, resources are preserved for future use. In this paper, we limit ourselves to reactive schemes, and propose a simple reactive speed scaling technique for the processor, which will be discussed in Section 2. We focus on reactive schemes primarily because they are simple to integrate with current processor capabilities through the ACPI power control framework [8,9]. In our previous paper [10], we motivated the reactive scheme and performed delay analysis for identical-period tasks. In this paper, we extend that analysis to general task arrivals with First-In-First-Out (FIFO) scheduling and Static-Priority (SP) scheduling.

The rest of the paper is organized as follows. In Sec-

tion 2, we introduce the thermal model, speed scaling sche-

mes, and task model and scheduling algorithms. After in-

troducing two important lemmas in Section 3, we design

the methodology to perform delay analysis for FIFO and

SP scheduling algorithms in Sections 4 and 5 respectively.

We measure the performance in Section 6. Finally, we con-

clude our work with final remarks and give an outlook on

future work in Section 7.

2 Models

2.1 Thermal Model

A wide range of increasingly sophisticated thermal models for integrated circuits have been proposed in the last few years. Some are comparatively simple, chip-wide models, such as the one developed by Dhodapkar et al. [11] in TEMPEST. Other models, such as the one used in HotSpot [4], describe the thermal behavior at the granularity of architecture-level blocks or below, and so more accurately capture the effects of hotspots.

In this paper we will be using a very simple chip-wide thermal model previously used in [6,7,11,12]. While this model does not capture fine-granularity thermal effects, the authors in [4], for example, agree that it is somewhat appropriate for the investigation of chip-level techniques, such as speed scaling. In addition, existing processors typically have well-defined hotspots, and accurate placement of sensors alleviates the need for fine-granularity temperature modeling. The Intel Core Duo processor, for example, has a highly accurate digital thermometer placed at the single hotspot of each die, in addition to a single legacy thermal diode for both cores [5]. More accurate thermal models can be derived from this simple one by more closely modeling the power dissipation (such as the use of active dissipation devices) or by augmenting the input power by a stochastic component, etc.

We define $s(t)$ as the processor speed (frequency) at time $t$. Then the input power $P(t)$ at time $t$ is usually represented as

$$P(t) = \kappa s^{\alpha}(t), \tag{1}$$

for some constant $\kappa$ and $\alpha > 1$. Usually, it is assumed that $\alpha = 3$ [6,7].

We assume that the ambient has a fixed temperature, and that temperature is scaled so that the ambient temperature is zero. We define $T(t)$ as the temperature at time $t$. We adopt Fourier’s Law as shown in the following formula [6,7,12]:

$$T'(t) = \frac{P(t)}{C_{th}} - \frac{T(t)}{R_{th} C_{th}}, \tag{2}$$

where $R_{th}$ is the thermal resistance and $C_{th}$ is the thermal capacitance of the chip. Applying (1) to (2), we have

$$T'(t) = a s^{\alpha}(t) - b T(t), \tag{3}$$

where $a$ and $b$ are positive constants defined as follows:

$$a = \frac{\kappa}{C_{th}}, \quad b = \frac{1}{R_{th} C_{th}}. \tag{4}$$

Equation (3) is a classic linear differential equation. If we assume that the temperature at time $t_0$ is $T_0$, i.e., $T(t_0) = T_0$, (3) can be solved as

$$T(t) = \int_{t_0}^{t} a s^{\alpha}(\tau) e^{-b(t-\tau)} \, d\tau + T_0 e^{-b(t-t_0)}. \tag{5}$$

We observe that we can always appropriately scale the speed to control the temperature:

• If we want to keep the temperature constant at a value $T_C$ during a time interval $[t_0, t_1]$, then for any $t \in [t_0, t_1]$, we can set

$$s(t) = \left(\frac{b T_C}{a}\right)^{\frac{1}{\alpha}}. \tag{6}$$

• If, on the other hand, we keep the speed constant at $s(t) = s_C$ during the same interval, then the temperature develops as follows:

$$T(t) = \frac{a s_C^{\alpha}}{b} + \left(T(t_0) - \frac{a s_C^{\alpha}}{b}\right) e^{-b(t-t_0)}. \tag{7}$$

This relation between processor speed and temperature is the basis for any speed scaling scheme.
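As a quick numerical illustration of the thermal model, the following minimal sketch evaluates the constant-speed solution (7), compares it with a forward-Euler integration of (3), and computes the temperature-holding speed (6). The function names and the constants `a = 1.0`, `b = 0.5` are assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def temperature_constant_speed(t, t0, T0, s_c, a, b, alpha=3.0):
    """Closed-form temperature at time t >= t0 under constant speed s_c, as in (7)."""
    steady = a * s_c ** alpha / b  # temperature approached as t grows
    return steady + (T0 - steady) * math.exp(-b * (t - t0))


def temperature_euler(t, t0, T0, s_c, a, b, alpha=3.0, steps=100_000):
    """Forward-Euler integration of T'(t) = a * s^alpha - b * T, as in (3)."""
    dt = (t - t0) / steps
    T = T0
    for _ in range(steps):
        T += dt * (a * s_c ** alpha - b * T)
    return T


def speed_for_constant_temperature(T_c, a, b, alpha=3.0):
    """Speed that holds the temperature at T_c, as in (6)."""
    return (b * T_c / a) ** (1.0 / alpha)


# Illustrative constants only (assumed, not from the paper).
a, b = 1.0, 0.5
closed = temperature_constant_speed(2.0, 0.0, 0.0, 1.0, a, b)
numeric = temperature_euler(2.0, 0.0, 0.0, 1.0, a, b)
s_hold = speed_for_constant_temperature(2.0, a, b)
```

Under these assumed constants, `closed` and `numeric` should agree up to the Euler discretization error, and running at `s_hold` from an initial temperature of 2.0 should leave the temperature unchanged.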


2.2 Speed Scaling

The effect of many dynamic thermal management schemes (most prominently DVS and clock throttling) can be described by the speed/temperature relation depicted in (6) and (7). The goal of dynamic thermal management is to maintain the processor temperature within a safe operating range, and not exceed what we call the highest-temperature threshold $T_H$, which in turn should be at a safe margin from the maximum junction temperature of the chip. Temperature control must ensure that

$$T(t) \le T_H. \tag{8}$$

On the other hand, we can freely set the processor speed, up to some maximum speed $s_H$, i.e.,

$$0 \le s(t) \le s_H. \tag{9}$$

In the absence of dynamic speed scaling we have to set a constant value of the processing speed so that the temperature will never exceed $T_H$. Assuming that the initial temperature is less than $T_H$, we can define the equilibrium speed $s_E$ as

$$s_E = \left(\frac{b}{a} T_H\right)^{\frac{1}{\alpha}}. \tag{10}$$

For any constant processor speed not exceeding $s_E$, the processor does not exceed temperature $T_H$. Note that the equilibrium speed $s_E$ is the maximum constant speed that we can set to maintain the safe temperature level.

A dynamic speed scaling scheme would take advantage of the power dissipation during idle times. It would make use of periods where the processor is “cool”, typically after idle periods, to dynamically scale the speed and temporarily execute tasks at speeds higher than $s_E$. As a result, dynamic speed scaling would be used to improve the overall processor utilization.

In defining the dynamic speed scaling algorithm we must keep in mind that (a) it must be supported by existing power control frameworks such as ACPI [8,9], and (b) it must lead to tractable design-time delay analysis. We therefore use the following very simple reactive speed scaling algorithm:

The processor will run at maximum speed $s_H$ when there is backlogged workload and the temperature is below the threshold $T_H$. Whenever the temperature hits $T_H$, the processor will run at the equilibrium speed $s_E$, which is defined in (10). Whenever the backlogged workload is empty, the processor idles (runs at the zero speed).

If we define $W(t)$ as the backlogged workload at time $t$, the speed scaling scheme described before can be expressed using the following formula:

$$s(t) = \begin{cases} s_H, & (W(t) > 0) \wedge (T(t) < T_H) \\ s_E, & (W(t) > 0) \wedge (T(t) = T_H) \\ 0, & W(t) = 0 \end{cases} \tag{11}$$

Figure 1 shows an example of how temperature changes under reactive speed scaling.

Figure 1. Illustration of reactive speed scaling. (Plots of the speed $s(t)$, with levels $s_E$ and $s_H$, and of the temperature $T(t)$, with threshold $T_H$, over time $t$.)

It is easy to show that in any case the temperature never exceeds the threshold $T_H$. By running at full speed some of the time, we aim to improve the processor utilization compared with constant-speed operation. Reactive speed scaling is very simple: whenever the temperature reaches the threshold, an event is triggered by the thermal monitor, and the system throttles the processor speed.
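The reactive rule can be sketched as a small time-stepped simulation. This is only an illustration under stated assumptions: the time step `dt`, the clamp of the temperature to the threshold within a step (a discretization shortcut, since in continuous time the switch to the equilibrium speed happens exactly at the threshold), and all constants and function names are hypothetical and not part of the paper.

```python
import math


def reactive_speed(W, T, T_H, s_H, s_E, eps=1e-9):
    """Speed rule (11): s_H while backlogged and cool, s_E at the threshold, 0 when idle."""
    if W <= eps:
        return 0.0
    return s_H if T < T_H - eps else s_E


def simulate(arrivals, horizon, dt, a, b, T_H, s_H, alpha=3.0, T0=0.0):
    """Simulate reactive speed scaling.

    arrivals maps a step index to the processor cycles released at that step.
    Returns a list of (time, speed, temperature at the start of the step).
    """
    s_E = (b * T_H / a) ** (1.0 / alpha)  # equilibrium speed (10)
    W, T = 0.0, T0
    trace = []
    for k in range(int(round(horizon / dt))):
        W += arrivals.get(k, 0.0)
        s = reactive_speed(W, T, T_H, s_H, s_E)
        trace.append((k * dt, s, T))
        # Exact solution (7) over one step at constant speed s.
        steady = a * s ** alpha / b
        T = steady + (T - steady) * math.exp(-b * dt)
        # Discretization shortcut (assumption): clamp any overshoot to T_H.
        T = min(T, T_H)
        W = max(0.0, W - s * dt)
    return trace


# Illustrative run: one burst of 10 cycles at time 0, assumed constants.
trace = simulate({0: 10.0}, horizon=10.0, dt=0.01, a=1.0, b=0.5, T_H=16.0, s_H=3.0)
```

With these assumed values the equilibrium speed is 2, so the trace should show the processor starting at speed 3, dropping to speed 2 once the temperature reaches 16, and idling after the backlog is cleared.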

2.3 Task Model and Scheduling Algorithms

The workload consists of a set of tasks $\{\Gamma_i : i = 1, 2, \ldots, n\}$. Each task $\Gamma_i$ is composed of a sequence of jobs. For a job, the time elapsed from the release time $t_r$ to the completion time $t_f$ is called the delay of the job, and the worst-case delay of all jobs in Task $\Gamma_i$ is denoted by $d_i$. Jobs within a task are executed in a first-in first-out order.

We characterize the workload of Task $\Gamma_i$ by the workload function $f_i(t)$, the accumulated requested processor cycles of all the jobs from $\Gamma_i$ released during $[0, t]$. Similarly, to characterize the actual executed processor cycles received by $\Gamma_i$, we define $g_i(t)$, the service function for $\Gamma_i$, as the total executed processor cycles rendered to jobs of $\Gamma_i$ during $[0, t]$.

A time-independent representation of $f_i(t)$ is the workload constraint function $F_i(I)$, which is defined as follows.

Definition 1 (Workload Constraint Function). $F_i(I)$ is a workload constraint function for the workload function $f_i(t)$, if for any $0 \le I \le t$,

$$f_i(t) - f_i(t - I) \le F_i(I). \tag{12}$$


For example, if a task $\Gamma_i$ is constrained by a leaky bucket with a bucket size $\sigma_i$ and an average rate $\rho_i$, then

$$F_i(I) = \sigma_i + \rho_i I. \tag{13}$$
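To make Definition 1 concrete, the sketch below checks (12) on a sampled workload function against a leaky-bucket constraint (13). The sampling at integer time units, the example periodic task (2 cycles every 4 time units), and the function names are assumptions made for illustration only.

```python
def leaky_bucket(sigma, rho):
    """Return the constraint function F(I) = sigma + rho * I, as in (13)."""
    return lambda I: sigma + rho * I


def satisfies_constraint(f, F, tol=1e-9):
    """Check (12) on a sampled workload function.

    f[k] is the cumulative number of cycles released during [0, k], sampled
    once per time unit (a discretization assumed here).
    """
    for t in range(len(f)):
        for I in range(t + 1):
            if f[t] - f[t - I] > F(I) + tol:
                return False
    return True


# Hypothetical periodic task: 2 cycles released at times 0, 4, 8, ...
f = [2 * (k // 4 + 1) for k in range(20)]
ok = satisfies_constraint(f, leaky_bucket(2.0, 0.5))
too_tight = satisfies_constraint(f, leaky_bucket(1.0, 0.5))
```

Here the sampled trace fits within a bucket of size 2 and rate 0.5, while a bucket of size 1 is violated by the 2-cycle release at time 4.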

Once tasks arrive in our system, a scheduling algorithm determines the service order of jobs from different tasks. Together, the workload and the scheduling algorithm determine the delay experienced by jobs. In this paper, we consider two scheduling algorithms: First-In-First-Out (FIFO) scheduling and Static-Priority (SP) scheduling.

3 Important Lemmas

The difficulty of delay analysis in a system with reactive speed scaling lies in the speed of the processor not being constant. Moreover, the changes in processing speed are triggered by the thermal behavior, which follows (11). As a result, as we will show, simple busy-period analysis does not work.

The following two lemmas show how changes in temperature, job arrival, and job execution affect the temperature at a later time or the delay of a later job.

Lemma 1. In a system under our reactive speed scaling, given a time instant $t$, we consider a job with a release time $t_r$ and a completion time $t_f$ such that $t_r < t$ and $t_f < t$. We assume that the processor is idle during $[t_f, t]$. If we take any one of the following actions as shown in Figure 2:

• Action A: Increasing the temperature at time $t_0$ ($t_0 \le t_r$) such that the job has the same release time $t_r$ but a new completion time $t_f^*$ satisfying $t_f^* < t$;

• Action B: Increasing the processor cycles for this job such that the job has the same release time $t_r$ but a new completion time $t_f^*$ satisfying $t_f^* < t$;

• Action C: Shifting the job such that the job has a new release time $t_r^*$ and a new completion time $t_f^*$ satisfying $t_r < t_r^* < t$ and $t_f < t_f^* < t$,

then we have $T_t \le T_t^*$, where $T_t$ and $T_t^*$ are the temperatures at time $t$ in the original and the modified scenarios respectively.

Figure 2. Temperature effect. (Timelines for the original scenario and for Actions (A), (B), and (C), marking $t_0$, $t_r$, $t_f$, $t_r^*$, $t_f^*$, and $t$.)

Lemma 2. In a system under our reactive speed scaling, we consider two jobs $J_k$ ($k = 1, 2$), each of which has a release time $t_{k,r}$ and a completion time $t_{k,f}$. We assume $t_{1,f} < t_{2,f}$. If we take any one of the following actions as shown in Figure 3:

• Action A: Increasing the temperature at $t_0$ ($t_0 \le t_{2,r}$) such that Job $J_2$ has the same release time $t_{2,r}$ but a new completion time $t_{2,f}^*$;

• Action B: Increasing the processor cycles of Job $J_1$ such that Job $J_k$ ($k = 1, 2$) has the same release time $t_{k,r}$ but a new completion time $t_{k,f}^*$;

• Action C: Shifting Job $J_1$ such that Job $J_1$ has a new release time $t_{1,r}^*$ and a new completion time $t_{1,f}^*$, and Job $J_2$ has the same release time $t_{2,r}$ and a new completion time $t_{2,f}^*$ satisfying $t_{1,r} \le t_{1,r}^*$ and $t_{1,f}^* \le t_{2,f}^*$,

then $t_{2,f} \le t_{2,f}^*$. If we define $d_2$ and $d_2^*$ as the delay of Job $J_2$ in the original and the modified scenarios respectively, then $d_2 \le d_2^*$.

Figure 3. Delay effect. (Timelines for the original scenario and for Actions (A), (B), and (C), marking $t_0$, $t_{1,r}$, $t_{1,f}$, $t_{2,r}$, $t_{2,f}$, $t_{1,r}^*$, $t_{1,f}^*$, and $t_{2,f}^*$.)

The proofs of Lemmas 1 and 2 can be found in [13].

Here we summarize the three actions defined in the above

two lemmas as follows:

• Action A: Increasing the temperature at some time

instances;

• Action B: Increasing the processor cycles of some

jobs;

• Action C: Shifting some jobs to a later time.

By the lemmas, with any of the above three actions, we

can increase the temperature at a later time and the delay of

the later job.

The above two lemmas together with the three actions

are important to our delay analysis under reactive speed

scaling, which will be our focus in the next two sections.


4 Delay Analysis of FIFO Scheduling

Recall that, under reactive speed scaling, changes in the speed of the processor are triggered by the thermal behavior, so the speed varies over time. Simple busy-period analysis will not work in this environment. In simple busy-period analysis, the jobs arriving before the busy period will not affect the delay of jobs arriving during the busy period. However, under reactive speed scaling, the execution of a job arriving earlier will heat up the processor and so affect the delay of a job arriving later, as shown in Lemma 2. Therefore, in the busy-period analysis under reactive speed scaling, we have to take this effect into consideration.

We start our delay analysis in the system with FIFO scheduling. Under FIFO scheduling, all tasks experience the same worst-case delay as the aggregated task does. Therefore, we consider the aggregated task, whose workload constraint function can be written as $F(I) = \sum_{i=1}^{n} F_i(I)$. First, we investigate the worst-case delay for the aggregated task.

Delay Constraint

We consider a busy period $[t_1, t_0]$ with length $\delta_1$ during which a job will experience the longest delay and immediately before which the processor is idle. The processor runs at high speed $s_H$ in Interval $[t_1, t_{1,h}]$ with length $\delta_{1,h}$ and at equilibrium speed $s_E$ in Interval $[t_{1,h}, t_0]$ with length $\delta_{1,e}$, as shown in the right side of Figure 4(a).

Figure 4. Job executions. (Panels (a) and (b) show job executions on a time axis marked $t_m, t_{m-1}, \ldots, t_3, t_2, t_1, t_{1,h}, t_0$, with interval lengths $\delta_{k,0}$ and $\delta_{k,h}$ for $k = 2, \ldots, m$, and $\delta_{1,h}$, $\delta_{1,e}$.)

We define $d$ as the worst-case delay experienced by a job in the busy period $[t_1, t_0]$. Then, by the definition of worst-case delay, we have

$$d = \sup_{t \ge t_1} \{ \inf\{\tau : f(t) \le g(t + \tau)\} \}, \tag{14}$$

where f(t) and g(t) are the workload function and the ser-

vice function of the aggregated task respectively, as defined

in Section 2. In other words, if by time t + τ, the service

received by the task is no less than its workload function

f(t), then all jobs of the task arriving before time t should

have been served, with a delay no more than τ.
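The definition in (14) can be evaluated directly on sampled cumulative traces. The following minimal sketch assumes integer time steps and hypothetical traces `f` and `g`; it is meant only to illustrate the sup/inf structure, not the analysis that follows.

```python
def worst_case_delay(f, g):
    """Evaluate (14) on sampled traces.

    f[k] and g[k] are the cumulative released and served cycles at integer
    time k (both non-decreasing). For each t, find the smallest tau with
    f[t] <= g[t + tau], then take the maximum over t.
    """
    worst = 0
    for t in range(len(f)):
        tau = 0
        while t + tau < len(g) and g[t + tau] < f[t]:
            tau += 1
        if t + tau >= len(g):
            raise ValueError("service trace too short to serve all workload")
        worst = max(worst, tau)
    return worst


# Hypothetical example: 3 cycles released at time 0 and 2 more at time 3,
# served at a constant speed of one cycle per time unit.
f = [3, 3, 3, 5, 5, 5, 5]
g = [0, 1, 2, 3, 4, 5, 5]
d = worst_case_delay(f, g)
```

In this example the first job waits 3 time units for its 3 cycles, which is the largest delay in the trace.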

Since the processor is idle at time t1, we have f(t1) =

g(t1). Therefore, f(t) ≤ g(t + τ) in (14) can be written as

f(t) − f(t1) ≤ g(t + τ) − g(t1).

(15)

First, we study the right side of (15). Recall that the processor runs at high speed $s_H$ in Interval $[t_1, t_{1,h}]$ with length $\delta_{1,h}$ and at equilibrium speed $s_E$ in Interval $[t_{1,h}, t_0]$ with length $\delta_{1,e}$. If we define $I = t - t_1$, then we have

$$g(t + \tau) - g(t_1) = G(I + \tau), \tag{16}$$

where $G(I)$, a service constraint function of $g(t)$, is defined as

$$G(I) = \min\{(s_H - s_E)\delta_{1,h} + s_E I, \; s_H I\}. \tag{17}$$

Next, we study the left side of (15). With Action B, the job will experience a longer delay with more workload released and completed before the completion of this job. Therefore, if we set

$$f(t) - f(t_1) = F(I), \tag{18}$$

together with (16), then the worst-case delay in (15) can be written as (see Figure 5)

$$d = \sup_{I \ge 0} \{ \inf\{\tau : F(I) \le G(I + \tau)\} \}. \tag{19}$$

Figure 5. Delay constraint. (Plot of $F(I)$ and $G(I)$ against $I$, marking $\delta_{1,h}$, $\delta_{1,e}$, and the delay $d$.)
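The delay formula (19) can be evaluated numerically once $\delta_{1,h}$ is known. The sketch below treats $\delta_{1,h}$ as a given input (an assumption, since the text has not yet determined it), scans a grid of interval lengths, and uses the inverse of the two-piece function in (17). All names and numeric values are hypothetical.

```python
def service_constraint(I, delta_1h, s_H, s_E):
    """G(I) from (17): fast service for delta_1h, then equilibrium speed."""
    return min((s_H - s_E) * delta_1h + s_E * I, s_H * I)


def inverse_service(y, delta_1h, s_H, s_E):
    """Smallest I >= 0 with G(I) >= y, for y >= 0."""
    return max(y / s_H, (y - (s_H - s_E) * delta_1h) / s_E)


def delay_bound(F, delta_1h, s_H, s_E, I_max, steps=10_000):
    """Grid evaluation of (19): sup over I in [0, I_max] of inf{tau : F(I) <= G(I + tau)}."""
    worst = 0.0
    for k in range(steps + 1):
        I = I_max * k / steps
        tau = max(0.0, inverse_service(F(I), delta_1h, s_H, s_E) - I)
        worst = max(worst, tau)
    return worst


# Hypothetical leaky-bucket aggregate F(I) = 10 + I with assumed s_H = 3,
# s_E = 2 and an assumed high-speed interval delta_1h = 0.7.
d = delay_bound(lambda I: 10.0 + 1.0 * I, delta_1h=0.7, s_H=3.0, s_E=2.0, I_max=20.0)
```

Because the assumed arrival rate is below $s_E$, the per-interval delay shrinks as $I$ grows in this example, so the maximum is attained at $I = 0$. The grid search is a simple design choice; a closed-form evaluation would avoid the dependence on `I_max` and `steps`.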

As we can see, the undetermined service constraint function $G(I)$ is the key in the worst-case delay formula (19). Next, we will focus on obtaining $G(I)$.

Service Constraint

As defined in (17), $G(I)$ is a function of $\delta_{1,h}$, which depends on the temperature at time $t_1$. Instead of determining the exact temperature at $t_1$, we aim to obtain a tight upper bound on the temperature at $t_1$, which will result in an upper bound on the worst-case delay according to Lemma 2.

To achieve this, we introduce extra intervals $[t_{k+1}, t_k]$ ($k = 1, \ldots, m-1$), as shown in Figure 4(a). By Lemma 1, we can use the three actions mentioned above to upper-bound the temperature at $t_1$. With Action A, we upper-bound the temperature at $t_m$ to be $T_H$. With Action C, for each Interval $\delta_k$ ($k = 2, \ldots, m$), we shift all parts of job execution to the end of this interval, such that the beginning part is idle with length $\delta_{k,0}$ and the ending part is busy with length $\delta_{k,h}$, as shown in Figure 4(b). We assume that the temperature will not hit $T_H$ during $[t_m, t_1]$ (footnote 1: if there is an interval $[t_{k_0+1}, t_{k_0}]$ during which the temperature hits $T_H$, then the temperature at $t_{k_0}$ is $T_H$; in this case, we can set $m = k_0$ and remove all intervals on the left), then