
ISSN 0249-6399

ISRN INRIA/RR--7026--FR+ENG

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

Optimal Sampling for State Change Detection with

Application to the Control of Sleep Mode

Amar P. Azad — Sara Alouf — Eitan Altman — Vivek Borkar — Georgios Paschos

N° 7026

September 2009

inria-00420542, version 1 - 29 Sep 2009


Centre de recherche INRIA Sophia Antipolis – Méditerranée

2004, route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex

Telephone: +33 4 92 38 77 77, Fax: +33 4 92 38 77 65


Amar P. Azad∗, Sara Alouf∗, Eitan Altman∗, Vivek Borkar†, Georgios Paschos‡

Theme: Networks and telecommunications

Project-Team Maestro

Research Report No. 7026, September 2009, 17 pages

Abstract: This work considers systems with inactivity periods of unknown duration

during which the server goes on vacation. We study the question of scheduling “waking

up” instants in which a server can check whether the inactivity period is over. There

is a cost proportional to the delay from the moment the inactivity period ends until the

server discovers it, a (small) running cost while the server is away and also a cost for

waking up. As an application to the problem, we consider the energy management in

WiMax where inactive mobiles reduce their energy consumption by entering a sleep

mode. Various standards exist which impose specific waking-up scheduling policies at

wireless devices. We check these and identify optimal policies under various statistical

assumptions. We show that periodic fixed vacation durations are optimal for Poisson

arrivals and derive the optimal period. We show that this structure does not hold for

other inactivity distributions but manage to obtain some suboptimal solutions which

perform strictly better than the periodic ones. We finally obtain structural properties

for optimal policies for the case of arbitrary distribution of inactivity periods.

Key-words: Dynamic programming, optimization, sampling, WiMAX, performance

evaluation

∗ Maestro group, INRIA, 2004 Route des Lucioles, F-06902 Sophia Antipolis, {aazad,salouf,altman}@sophia.inria.fr

†School of Technology, TIFR, Mumbai, borkar@tifr.res.in.

‡ECE @ University of Thessaly, Volos, Greece, gpasxos@uth.gr.


Résumé: In this report, we consider systems with inactivity periods of unknown duration, during which the server is on vacation. The question we address is how to determine, optimally, the instants at which the server should check whether the inactivity period is still ongoing. In the systems considered, there is a cost proportional to the time elapsed between the end of the inactivity period and the instant at which the server becomes aware of it. In addition, there is a rather small running cost and a penalty for each check made by the server. As an application, we consider energy management in the WiMAX standard, where mobile terminals enter a sleep mode to save energy. Several standards exist that define different scheduling policies for waking up terminals. We assess their performance and identify optimal policies under various statistical assumptions. When inactivity periods are exponentially distributed, we show that it is optimal to check periodically whether the inactivity period has ended, and we compute the optimal period. We show that this policy is no longer optimal when inactivity periods follow another distribution, in which case we derive scheduling policies that are suboptimal yet perform better than the constant policy. Finally, we find structural properties of the optimal policies for the case where inactivity periods have an arbitrary distribution.

Keywords: Dynamic programming, optimization, sampling, WiMAX, performance evaluation


1 Introduction

Mobile terminals using contemporary radios can benefit greatly by shutting off the transceiver whenever there is no scheduled activity. Nevertheless, if the attention of the mobile is suddenly required, the mobile will be shut off and therefore unavailable. The longer the shut off (vacation) periods, the longer the expected response delay. Therefore, one can identify the inherent tradeoff of energy management: increase vacation length to improve energy saving or decrease vacation length to reduce delays.

Past approaches have considered incoming/outgoing traffic [17, 20, 18], the effect of setup time [11, 8], or even the queueing implications in the analysis [12, 2]. Concerning the arrival process, it has been assumed to be Poisson (cf. the above references), having a hyper-Erlang distribution [19] or a hyper-exponential distribution [7, 1]. In all cases, it does not depend on the energy management scheme. As for delay, it is the average packet delay in the system that is considered.

Recent works [11, 16, 10] focus on heuristic adaptive algorithms whose goal is

to control the vacation length according to the incoming arrival process. The work

[14] derives an optimal sleep policy using average cost structure for a given number of

consecutive sleep durations.

Our work departs from the existing models in two aspects. First, rather than an

exogenous independent arrival process, we have in mind elastic arrival processes in

which (i) a “think time” or “off time” begins when the activity of the server ends, and

(ii) the duration of the “on time” does not depend on the wake up delay, defined as

the time that elapses between the instant a request is issued and the instant at which

the request service actually begins. Both assumptions are appropriate to interactive

applications such as web browsing. As a result, the measure for delay is taken to be the

wake up delay.

Our objective is to optimize the vacation duration in order to achieve the desired

balance between delay and energy saving. We shall investigate in this paper optimal

energy management systems under one of the following assumptions on the off time

distribution:

a. Exponential distribution;

b. Hyper-exponential distribution;

c. General distribution.

The motivation behind the hyper-exponential distribution assumption comes from works that provide evidence of heavy-tailed off time distributions on the Internet [15] and of Pareto type distribution on the World Wide Web [5]. Furthermore, it is well-known that heavy-tailed distributed random variables (rvs) can be well approximated by hyper-exponential distributions [7].

Our contributions are as follows:

1. Our problem formulation allows us to minimize the weighted sum of the two

costs, which is essentially obtaining the optimal tradeoff of delay against energy

saving. We use dynamic programming (DP), which allows us to obtain the optimal

vacation size at each wake up instant.

2. For exponential off times, we show that the constant vacation policy is optimal

and we derive it.


3. For hyper-exponential off times, we derive interesting structural properties. We

show that the optimal control is bounded. Asymptotically, the optimal policy

converges to the constant policy corresponding to the smallest rate phase, irrespective of the initial state. This policy can be computed numerically using value

iteration.

4. For any general off time distribution, we show that the optimal control is bounded.

5. We propose suboptimal policies using policy iteration which perform strictly

better than optimal “homogeneous” policies and are simpler to compute. We

show numerically the performance of such suboptimal solutions using one stage

and two stage policy iteration.

6. We compare the proposed policies with that of the IEEE 802.16e standard [9]

under various statistical assumptions.

In the rest of the paper, Sect. 2 outlines our system model and introduces the cost

function. Section 3 introduces DP and derives the optimal sleep control and relevant

characteristics for hyper-exponential off times. Section 4 tackles the problem of finding

the optimal policy under the worst case process of arrivals. Numerical results and a

comparative study of the different (sub)optimal strategies and of the IEEE 802.16e

standard are reported in Sect. 5. Section 6 concludes the paper.

2 System Model

We consider a system with repeated vacations. As long as there are no customers, the

server goes on vacation. We are interested in finding the optimal policy, so that at any

start of vacation, the length of this vacation is optimal. This system models a mobile

device that turns off its radio antenna while inactive to save energy. A vacation is then

the time during which the mobile is sleeping. At the end of a vacation, the mobile

needs to turn on the radio to check for packets.

Let $X$ denote the number of vacations in an idle period. $X$ is a discrete random variable (rv) taking values in $\mathbb{N}^*$. The duration of the $k$th vacation is a rv denoted $B_k$, for $k \in \mathbb{N}^*$. For analytical tractability, we consider vacations $\{B_k\}_{k \in \mathbb{N}^*}$ that are mutually independent rvs. The time at the end of the $k$th sleep interval is a rv denoted $T_k$, for $k \in \mathbb{N}^*$. We denote $T_0$ as the time at the beginning of the first vacation; by convention $T_0 = 0$. We naturally have $T_k = T_{k-1} + B_k = \sum_{i=1}^{k} B_i$. Observe that a generic idle period ends at time $T_X$.

We will be using the following notation $\mathcal{Y}(s) := E[\exp(-sY)]$ to denote the Laplace-Stieltjes transform of a generic rv $Y$ evaluated at $s$. Hence, we can readily write $\mathcal{T}_k(s) = \prod_{i=1}^{k} \mathcal{B}_i(s)$.

Let $\tau$ denote the time length between the start of the first vacation and the arrival of a customer; this time is referred to as the “off time”. Since a generic idle period ends at time $T_X$, the service of the first customer to arrive during the idle period will be delayed for $T_X - \tau$ units of time.

$\tau$ is a rv whose probability density function is $f_\tau(t)$, $t \ge 0$. We will be assuming that $\tau$ is hyper-exponentially distributed with $n$ phases and parameters $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_n)$ and $q = (q_1, \ldots, q_n)$. In other words, we have

$$f_\tau(t) = \sum_{i=1}^{n} q_i \lambda_i \exp(-\lambda_i t), \qquad \sum_{i=1}^{n} q_i = 1. \tag{1}$$


Given its definition, the off time τ is also the conditional residual inter-arrival time.

Observe that when n = 1, τ will be exponentially distributed with parameter λ = λ1,

which, thanks to the memoryless property of this distribution, is equivalent to having a

Poisson arrival process with rate λ.

The energy consumed by a mobile while listening to the channel and checking for customers is denoted $E_L$. This is actually a penalty paid at the end of each vacation. The power consumed by a mobile in a sleep state is denoted $P_S$. The energy consumed by a mobile during vacation $B_k$ is then equal to $E_L + P_S B_k$, and that consumed during a generic idle period is equal to $E_L X + P_S T_X$.

We are interested in minimizing the cost of the power save mode, which is seen as a weighted sum of the energy consumed during the power save mode and the extra delay incurred on the traffic by a sleeping mobile. Let $V$ be this cost; it is written as follows

$$\begin{aligned}
V &:= E[\bar\epsilon (T_X - \tau) + \epsilon (E_L X + P_S T_X)] && (2)\\
  &= -\bar\epsilon E[\tau] + \epsilon E_L E[X] + \eta E[T_X] && (3)
\end{aligned}$$

where $\epsilon$ is a normalized weight that takes value between 0 and 1; $\bar\epsilon = 1 - \epsilon$; and $\eta := \bar\epsilon + \epsilon P_S$. The derivation of the elements of (3) when $\tau$ is hyper-exponentially distributed is straightforward. We derive

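$$P(X = k) = P(\tau > T_{k-1}) - P(\tau > T_k) = \sum_{i=1}^{n} q_i \mathcal{T}_{k-1}(\lambda_i)\bigl(1 - \mathcal{B}_k(\lambda_i)\bigr);$$

$$E[\tau] = \sum_{i=1}^{n} q_i / \lambda_i;$$

$$E[X] = \sum_{k=0}^{\infty} \sum_{i=1}^{n} q_i \mathcal{T}_k(\lambda_i); \tag{4}$$

$$E[T_X] = \sum_{k=0}^{\infty} \sum_{i=1}^{n} q_i \mathcal{T}_k(\lambda_i) E[B_{k+1}], \tag{5}$$

where $\mathcal{T}_k$ and $\mathcal{B}_k$ denote the Laplace-Stieltjes transforms of $T_k$ and $B_k$. Using (3)-(5), the cost can be rewritten

$$V = -\bar\epsilon E[\tau] + \sum_{k=0}^{\infty} \sum_{i=1}^{n} q_i \mathcal{T}_k(\lambda_i)\bigl(\epsilon E_L + \eta E[B_{k+1}]\bigr). \tag{6}$$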

For convenience, we have grouped the major notation used in the paper in Table 1.

Cost of IEEE 802.16e’s sleep policy

Our system model enables us to evaluate the cost, denoted VStd, incurred by the sleep

policy of the IEEE 802.16e protocol, and more precisely, the sleep policy advocated

for type I power saving classes [9]. There, vacations are deterministic (so we use small

letters to express that) and the size of a sleep window (i.e., a vacation) is doubled over

time until a maximum permissible sleep window, denoted $b_{\max}$, is reached. The size of the $k$th vacation is then

$$b_k = b_1 2^{\min\{k-1,\,l\}}, \quad k \in \mathbb{N}^*$$

where $l := \log_2(b_{\max}/b_1)$. We also have

$$t_k = b_1 \left( 2^{\min\{k,\,l\}} - 1 + 2^{l}(k - l)\,\mathbb{1}\{k > l\} \right), \quad k \in \mathbb{N}^*.$$


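Table 1: Glossary of notations

| Symbol | Meaning |
| --- | --- |
| $X$ | Number of vacations |
| $B_k$ | Duration of $k$th vacation |
| $T_k$ | Time until $k$th vacation, $T_k = \sum_{i=1}^{k} B_i$ |
| $T_0$ | Starting time of power save mode, $T_0 = 0$ |
| $\tau$ | Arrival time of first customer |
| $\mathcal{Y}$ | Laplace-Stieltjes transform of a random variable $Y$ |
| $E_L$ | Energy consumed when listening to the channel |
| $P_S$ | Power consumed by a mobile in a sleep state |
| $\epsilon$, $\bar\epsilon$ | Normalized energy/delay weight, $0 < \epsilon \le 1$, $\bar\epsilon = 1 - \epsilon$ |
| $V$ | Cost function |
| $c(t,b)$ | Cost incurred by vacation of size $b$ having started at time $t$ |
| $W_{-1}$ | Branch of the Lambert W function that is real-valued on the interval $[-\exp(-1), 0]$ and always below $-1$ |
| $\boldsymbol{\lambda}$, $q$ | Rate/probability vector in the $n$-phase hyper-exponential distribution, $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_n)$, $q = (q_1, \ldots, q_n)$ |
| $\eta$ | $= \bar\epsilon + \epsilon P_S$, $0 < \eta \le 1 + P_S$ |
| $\zeta$ | $= 1 + \lambda \epsilon E_L / \eta$ (with $\lambda = \lambda_i$, $i = 1, \ldots, n$), $\zeta > 1$ |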

The cost of the standard's policy is, using (6),

$$V_{\mathrm{Std}} = -\bar\epsilon E[\tau] + \sum_{k=0}^{\infty} \sum_{i=1}^{n} q_i e^{-\lambda_i t_k} \bigl(\epsilon E_L + \eta b_{k+1}\bigr). \tag{7}$$
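Equation (7) is an infinite series, so evaluating it numerically requires a truncation rule. The sketch below is one way to do this and is not part of the report: the function name `standard_policy_cost`, the stopping tolerance, and the assumption that $b_{\max}/b_1$ is an exact power of two are all illustrative choices.

```python
import math

def standard_policy_cost(lams, qs, b1, bmax, eps, e_l, p_s, tol=1e-12, max_terms=100000):
    """Truncated evaluation of (7) for a hyper-exponential off time.

    Assumes bmax / b1 is a power of two so that l = log2(bmax / b1) is an integer.
    """
    eps_bar = 1.0 - eps
    eta = eps_bar + eps * p_s
    l = round(math.log2(bmax / b1))
    mean_tau = sum(q / lam for q, lam in zip(qs, lams))
    total = -eps_bar * mean_tau
    t_k = 0.0                                   # t_0 = 0
    for k in range(max_terms):
        b_next = b1 * 2 ** min(k, l)            # b_{k+1} = b1 * 2^{min(k, l)}
        survival = sum(q * math.exp(-lam * t_k) for q, lam in zip(qs, lams))
        term = survival * (eps * e_l + eta * b_next)
        total += term
        # Once the window is capped at bmax the terms decay geometrically,
        # so a tiny term is used here as a heuristic stopping rule.
        if k > l and term < tol:
            break
        t_k += b_next                           # t_{k+1} = t_k + b_{k+1}
    return total
```

For instance, `standard_policy_cost([0.2, 3.0], [0.3, 0.7], 1.0, 8.0, 0.5, 1.0, 0.1)` evaluates the cost for a hypothetical two-phase off time with windows doubling from 1 to 8 time units.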

3 Dynamic Programming

Dynamic programming (DP) is a well-known tool for computing the optimal decision to be taken at each intermediate observation point, taking into account the whole lifetime of the system. Considering our system model, we want to identify the optimal sleep strategy where decisions are taken at each intermediate wake-up instant. Hence, a DP approach is a natural candidate for determining the optimal policy.

The observation points are at the end of the vacations, i.e., at $t_k$. The conditional residual off time at a time $t$ is denoted $\tau_t$. We introduce the following DP:

$$V^\star_k(t_k) = \min_{b_{k+1} \ge 0} \bigl\{ E[c(t_k, b_{k+1})] + P(\tau_{t_k} > b_{k+1})\, V^\star_{k+1}(t_{k+1}) \bigr\}.$$

Here, $V^\star_k(t_k)$ represents the optimal cost at time $t_k$ where the argument $t_k$ denotes the state of the system at time $t_k$. The terms $P(\tau_{t_k} > b_{k+1})$ and $c(t_k, b_{k+1})$ respectively represent the transition probability and the stage cost at $t_k$ when the control is $b_{k+1}$. In generic notation, the per stage cost is

$$c(t,b) = \bar\epsilon E[(b - \tau_t)\,\mathbb{1}\{\tau_t \le b\}] + \epsilon (E_L + P_S b). \tag{8}$$

We can see that each stage is characterized by the distribution of the residual off time $\tau_t$. The state of the system in sleep mode can then be described by the distribution of $\tau_t$.


In the rest of this section, three cases will be considered following the distribution

of the off time. We start with the DP solution for exponential off times, then derive

some structural properties of the DP solution for hyper-exponential off times. Last, the

case of general off times is considered: structural properties of the optimal policy are

found and then suboptimal solutions through DP are discussed.

3.1 Exponential Off Time

When arrivals form a Poisson process with rate $\lambda$, both the off time $\tau$ and the conditional residual off time $\tau_t$ will be exponentially distributed with parameter $\lambda$, whatever $t$ is (i.e., whatever stage). The distribution of $\tau_t$ is characterized solely by the rate $\lambda$. In other words, as time goes on, the state of the system is always represented by the parameter $\lambda$. Henceforth, the DP involves a single state, denoted $\lambda$.

We are faced with a Markov Decision Process (MDP), a single state $\lambda$, a Borel action space (the positive real numbers) and discrete time. Note that the sleep durations

are not discrete. However, decisions are taken at discrete embedded times: the kth

decision is taken at the end of the (k − 1)st vacation. Therefore, we are dealing with

a discrete time MDP. This is called “negative” dynamic programming [13]. It follows

from [6] that we can restrict to stationary policies (that depend only on the state) and

that do not require randomization. Since there is only one state (at which decisions are

taken) this implies that one can restrict to vacation sizes that have fixed size and that

are the same each time a decision has to be taken. In other words, the optimal sleep

policy is the constant one. Hence the optimal value is given by the minimization of the

following MDP:

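$$V^\star(\lambda) = \min_{b \ge 0} \Bigl\{ \bar\epsilon E\bigl[(b - \tau(\lambda))\,\mathbb{1}\{\tau(\lambda) \le b\}\bigr] + \epsilon (E_L + b P_S) + P\bigl(\tau(\lambda) > b\bigr) V^\star(\lambda) \Bigr\}. \tag{9}$$

Proposition 3.1 The optimal vacation size for exponential off time and the minimal cost are given by

$$b^\star = -\frac{1}{\lambda}\Bigl(\zeta + W_{-1}\bigl(-e^{-\zeta}\bigr)\Bigr); \tag{10}$$

$$V^\star(\lambda) = -\frac{1}{\lambda}\Bigl(\bar\epsilon + \eta W_{-1}\bigl(-e^{-\zeta}\bigr)\Bigr), \tag{11}$$

with $\zeta := 1 + \lambda \epsilon E_L / \eta$, and where $W_{-1}$ denotes the branch of the Lambert W function¹ that is real-valued on the interval $[-\exp(-1), 0]$ and always below $-1$.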

¹ The Lambert W function satisfies $W(x)\exp(W(x)) = x$. As $y \exp(y) = x$ has an infinite number of solutions $y$ for each (non-zero) value of $x$, the function $W(x)$ has an infinite number of branches.

Proof: From (9) we can express

$$V(\lambda) = \frac{\bar\epsilon E\bigl[(b - \tau(\lambda))\,\mathbb{1}\{\tau(\lambda) \le b\}\bigr] + \epsilon (E_L + b P_S)}{1 - P\bigl(\tau(\lambda) > b\bigr)}. \tag{12}$$

Substituting

$$E\bigl[(b - \tau(\lambda))\,\mathbb{1}\{\tau(\lambda) \le b\}\bigr] = \frac{\lambda b - 1 + \exp(-\lambda b)}{\lambda}$$

and

$$P\bigl(\tau(\lambda) > b\bigr) = \exp(-\lambda b)$$

in (12) and differentiating w.r.t. $b$ we obtain

$$V'(\lambda) = \eta\, \frac{1 - \exp(-\lambda b)(\zeta + \lambda b)}{(1 - \exp(-\lambda b))^2}. \tag{13}$$

At the extremum of $V(\lambda)$, denoted $b^\star$, we must have

$$1 - \exp(-\lambda b^\star)(\zeta + \lambda b^\star) = 0 \;\Leftrightarrow\; \exp(-\zeta - \lambda b^\star)(-\zeta - \lambda b^\star) = -\exp(-\zeta).$$

The last expression is of the form $y \exp(y) = x$ with $y = -\zeta - \lambda b^\star$ and $x = -\exp(-\zeta)$. The solution $y$ is the Lambert W function [4], denoted $W$, at the point $x$. Hence,

$$-\zeta - \lambda b^\star = W(-\exp(-\zeta)).$$

Since $\zeta \ge 1$, we have $-\exp(-1) \le -\exp(-\zeta) < 0$. Therefore, we need $W(-\exp(-\zeta))$ to be real-valued in $[-\exp(-1), 0[$. Also, given that $\zeta + \lambda b^\star \ge 1$, we need $W(-\exp(-\zeta))$ to be always negative and smaller than $-1$. Both conditions are satisfied by the branch numbered $-1$. Hence, $-\zeta - \lambda b^\star = W_{-1}(-\exp(-\zeta))$ and (10) is readily found. Replacing (10) in (12) and using the relation $\exp(y) = x/y$, one can derive (11).

Similarly we proceed to the second order conditions to determine if $b^\star$ yields minimum cost. Writing $\lambda_1 = \lambda$ and $\zeta_1 = \zeta$, the second derivative function of the cost is

$$V''(\lambda) = \frac{\eta \lambda_1 e^{-\lambda_1 b}}{(1 - e^{-\lambda_1 b})^3} \bigl[ (1 + e^{-\lambda_1 b})(1 + \zeta_1 + \lambda_1 b) - 4 \bigr].$$

The sign of $V''(\lambda)$ depends on the value of

$$z_1(b) := (1 + \exp(-\lambda_1 b))(1 + \zeta_1 + \lambda_1 b).$$

The following can be easily derived

$$z_1'(b) = \lambda_1 \bigl(1 - \exp(-\lambda_1 b)(\zeta_1 + \lambda_1 b)\bigr)$$

$$\lim_{b \to 0} z_1'(b) = \lambda_1 (1 - \zeta_1) < 0$$

$$\lim_{b \to \infty} z_1'(b) = \lambda_1 > 0$$

The derivative $z_1'(b)$ is null for $b = b^\star > 0$, negative for $b < b^\star$ and positive for $b > b^\star$. Hence, $z_1(b)$ decreases from $\lim_{b \to 0} z_1(b) = 2(1 + \zeta_1) > 4$ to its minimum

$$z_1(b^\star) = -\frac{\bigl(W_{-1}(-e^{-\zeta_1}) - 1\bigr)^2}{W_{-1}(-e^{-\zeta_1})} > 4$$

and then increases asymptotically to $+\infty$. We have shown that $z_1(b) > 4$ for any positive $b$. Therefore, $V''(\lambda) > 0$ for any positive $b$. $V(\lambda)$ is then a convex function in $b$ and the extremum $b^\star$ is a global minimum, which concludes the proof.

♦

3.2 Hyper-Exponential Off Time

We assume in this section that $\tau$ is hyper-exponentially distributed with $n$ phases and parameters $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_n)$ and $q = (q_1, \ldots, q_n)$.


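3.2.1 Distribution of the Conditional Residual Off Time $\tau_t$

The tail of $\tau_t$ can be computed as follows

$$\begin{aligned}
P(\tau_t > a) &= P(\tau > t + a \mid \tau > t) = \frac{P(\tau > t + a)}{P(\tau > t)} \\
&= \frac{\sum_{i=1}^{n} q_i \exp(-\lambda_i t) \exp(-\lambda_i a)}{\sum_{j=1}^{n} q_j \exp(-\lambda_j t)} \\
&= \sum_{i=1}^{n} g_i(q,t) \exp(-\lambda_i a)
\end{aligned} \tag{14}$$

where

$$g_i(q,t) := \frac{q_i \exp(-\lambda_i t)}{\sum_{j=1}^{n} q_j \exp(-\lambda_j t)}, \quad i = 1, \ldots, n. \tag{15}$$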

We denote $g(q,t)$ as the $n$-tuple of functions $g_i(q,t)$, $i = 1, \ldots, n$. Observe that $g(q,0) = q$. The operator $g$ transforms the distribution $q$ into another distribution $q'$ such that $\sum_{j=1}^{n} q'_j = 1$ and $q'_j > 0$.

Equation (14) is nothing but the tail of a hyper-exponential rv having $n$ phases and parameters $\boldsymbol{\lambda}$ and $g(q,t)$. Except for the probabilities of the $n$ phases, the off time $\tau$ and its residual time $\tau_t$ have the same distribution and same parameter $\boldsymbol{\lambda}$. As time goes on, the residual time keeps its distribution but updates its phases' probabilities, through the operator $g$. It can be shown that

$$g_i(q, b_1 + b_2) = g_i\bigl(g(q, b_1), b_2\bigr). \tag{16}$$

In other words, the operator $g$ is such that the result of the transformation after $b_1 + b_2$ units of time is the same as that of a first transformation after $b_1$ units of time, followed by a second transformation after $b_2$ units of time.

To simplify the notation, we will drop the subscript of the residual off time $\tau_t$, and instead, we will add as argument the current probability distribution (which is transformed over time through the operator $g$). For instance, if at some point in time, the residual off time has the probability distribution $q'$, then we will use the notation $\tau(q')$.
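The operator $g$ of (15) and the composition property (16) are easy to experiment with. The following is a minimal sketch with hypothetical function names and parameter values, not code from the report.

```python
import math

def g(q, lams, t):
    """Operator (15): phase probabilities of the residual off time after t time units."""
    weights = [qi * math.exp(-lam * t) for qi, lam in zip(q, lams)]
    total = sum(weights)
    return [w / total for w in weights]

def residual_tail(q, lams, t, a):
    """Tail P(tau_t > a) of the residual off time, written as in (14)."""
    return sum(gi * math.exp(-lam * a) for gi, lam in zip(g(q, lams, t), lams))

def tail(q, lams, a):
    """Tail P(tau > a) of the hyper-exponential off time itself."""
    return sum(qi * math.exp(-lam * a) for qi, lam in zip(q, lams))
```

With, say, `q = [0.3, 0.7]` and `lams = [0.2, 3.0]`, applying `g` for 0.4 time units and then for 1.1 more should agree with a single application for 1.5 time units, and `residual_tail(q, lams, t, a)` should equal `tail(q, lams, t + a) / tail(q, lams, t)`.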

The results above can be extended to account for a random passed time $T$. We have

$$P(\tau > T + a \mid \tau > T) = \sum_{i=1}^{n} g_i(q,T) \exp(-\lambda_i a)$$

where

$$g_i(q,T) := \frac{q_i \mathcal{T}(\lambda_i)}{\sum_{j=1}^{n} q_j \mathcal{T}(\lambda_j)} = \frac{q_i \mathcal{T}(\lambda_i)}{P(\tau > T)}, \tag{17}$$

with $\mathcal{T}(s) = E[\exp(-sT)]$ the Laplace-Stieltjes transform of $T$.

There is an abuse of notation in the definition of $g_i(q,T)$, as this function depends on the distribution of $T$ and not on the rv $T$ itself. The function $g_i(q,T)$ is not a rv. Observe that (15), where time is deterministic, is a particular case of (17). Asymptotic properties of $g$ are provided next.

Define the composition $g^m(q,b) = g\bigl(g^{m-1}(q,b), b\bigr) = g(q, mb)$, where $g^1(q,b)$ is the vector whose $i$th element is given in (15). Assume, without loss of generality, that $\lambda_1 \le \ldots \le \lambda_n$. Let $e(i)$ be the $n$-dimensional vector whose $i$th element is 1 and all other elements are zero.


Lemma 3.1 Fix $q$ and let $I(q)$ be the smallest $j$ for which $q_j > 0$. The following limit holds:

$$\lim_{m \to \infty} g^m(q,b) = e(I(q)).$$

Proof: Let $\alpha_i := \dfrac{\exp(-\lambda_i b)}{\exp(-\lambda_{I(q)} b)}$. Then (15) can be rewritten

$$g_i(q,b) = \frac{q_i \alpha_i}{\sum_{j=I(q)}^{n} q_j \alpha_j}.$$

In particular,

$$g^m_{I(q)}(q,b) = \frac{q_{I(q)}}{q_{I(q)} + \sum_{j > I(q)} q_j \alpha_j^m}.$$

Since $\lambda_i \le \lambda_j$ for $I(q) < i < j$, then $\alpha_j < \alpha_i \le \alpha_{I(q)+1} \le \alpha_{I(q)} = 1$. Hence

$$g^m_{I(q)}(q,b) \ge \frac{q_{I(q)}}{q_{I(q)} + \alpha^m_{I(q)+1} \sum_{j > I(q)} q_j}$$

$$\Leftrightarrow \sum_{j > I(q)} g^m_j(q,b) \le \frac{\alpha^m_{I(q)+1} (1 - q_{I(q)})}{q_{I(q)} + \alpha^m_{I(q)+1} (1 - q_{I(q)})} \le \alpha^m_{I(q)+1} \frac{1 - q_{I(q)}}{q_{I(q)}}.$$

We then have that

$$\lim_{m \to \infty} \sum_{j > I(q)} g^m_j(q,b) = 0,$$

which implies the lemma. ♦

Lemma 3.1 states that, as time passes, the residual off time's distribution translates its mass towards the phase with the smallest rate, and converges asymptotically irrespective of the initial distribution. This suggests that there exists a threshold on the time after which the optimal policy is the one that corresponds to the optimal policy for state $I(q)$.
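The convergence stated in Lemma 3.1 can be observed by composing the operator $g$ repeatedly. The sketch below is illustrative only; the function names and the three-phase example are assumptions, and rates are taken in increasing order as in the text.

```python
import math

def g(q, lams, t):
    """Operator (15): phase probabilities of the residual off time after t time units."""
    weights = [qi * math.exp(-lam * t) for qi, lam in zip(q, lams)]
    total = sum(weights)
    return [w / total for w in weights]

def iterate_g(q, lams, b, m):
    """Composition g^m(q, b): apply g with step b, m times in a row."""
    state = list(q)
    for _ in range(m):
        state = g(state, lams, b)   # renormalising at each step avoids underflow
    return state
```

For a hypothetical `q = [0.0, 0.3, 0.7]` with `lams = [0.5, 1.0, 3.0]`, the first phase has zero probability, so $I(q)$ is the second phase and `iterate_g(q, lams, 1.0, 50)` should put essentially all mass on it.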

Lemma 3.2 For any $q$ we have

$$\lim_{q' \to q} V(q') = V(q).$$

Lemma 3.2 states that as the state converges, the value also converges to the value at the converged state.

3.2.2 DP Solution

Below we formulate the optimization problem as an MDP where the state space is taken to be the simplex of dimension $n$, i.e. the set of probability measures over the set $\{1, 2, \ldots, n\}$. At each stage, the residual off time sees its probability distribution being updated. Let $q_0$ denote the probability distribution of the total off time. It is then the probability distribution of the residual off time at time 0. Thanks to the property (16), the probability distribution of the residual off time at stage $k + 1$, i.e., at time $t_k$, is $q = g(q_0, t_k)$. Henceforth, there is a one to one relation between the stage and the current probability distribution of the residual off time. Without loss of optimality, either of them can be the state in the MDP [3, Sect. 5.4].

The system state is denoted $q$ and represents the current probability distribution of the residual off time. The initial state is $q_0$. We assume that the controller can choose any time $b$ (a constant or a rv) until he wakes up. The transition probabilities are simply

$$P_{q,b,q'} = \mathbb{1}\{q' = g(q,b)\}.$$

We are faced with an MDP with a Borel action space and a state space that is the set of probability vectors $q$. Note however that, starting from a given $q$, there is a countable set $Q$ of $q$'s so that only states within $Q$ can be reached from $q$. Therefore we may restrict the state space to the countable set $Q$. We can again use [6] to conclude that we may restrict to policies that choose at each state a non-randomized decision $b$, and the decision depends only on the current state (and need not depend on the previous history). We next show that there is some $b$ such that actions may be restricted to the compact interval $[0,b]$ without loss of optimality.

Consider the policy $w$ that always takes a constant one unit length vacation. It is easily seen that the total expected cost, when using policy $w$, is upper bounded by

$$v := \bar\epsilon + \epsilon \Bigl(1 + \sup_i 1/\lambda_i\Bigr)(E_L + P_S).$$

Here, $\bar\epsilon$ is an upper bound on the expected waiting cost and $1 + \sup_i 1/\lambda_i$ is an upper bound on $E[X]$, the expected number of vacations, and on $E[T_X]$, the expected idle time. We conclude that

Lemma 3.3 For all $q$, $V(q) \le v$.

Lemma 3.4 Without loss of optimality, one may restrict to policies that take only actions within $[0,b]$ where

$$b = \frac{1}{\bar\epsilon}\Bigl\{v + 1 + 1/\bigl(\min_i \lambda_i\bigr)\Bigr\}.$$

Proof: Let $u$ be an $\epsilon$-optimal Markov policy that does not use randomization, where $\epsilon \in (0,1)$. If $u_i > b$ for some $i$ then the expected immediate cost at step $i$ is itself larger than 1 plus the total expected cost that would be incurred under the policy $w$:

$$E\bigl[(b - \tau(q))\,\mathbb{1}\{\tau(q) \le b\}\bigr] > v + 1.$$

Thus, by switching from time $i$ onwards to $w$, the expected cost strictly decreases by at least 1 unit; thus $u$ cannot be $\epsilon$-optimal. ♦

We conclude that the MDP can be viewed as one with a countable state space, compact action space, discrete time, and non-negative costs (known as “negative dynamic programming”). Using [13] we then conclude:

(i) The optimal value (minimal cost) is given by the minimal solution of the following DP:

$$V(q) = \min_{b \ge 0} \Bigl\{ \bar\epsilon E\bigl[(b - \tau(q))\,\mathbb{1}\{\tau(q) \le b\}\bigr] + \epsilon (E_L + b P_S) + P\bigl(\tau(q) > b\bigr) V\bigl(g(q,b)\bigr) \Bigr\}. \tag{18}$$

(ii) Let $B(q)$ denote the set of all $b$'s that minimize the right hand side of (18) for a given $q$. Then any policy that chooses at state $q$ some $b \in B(q)$ is optimal.

The value iteration can be used as an iterative method to compute $V(q)$. Starting with $V_0 = 0$ we write

$$V_{k+1}(q) = \min_{b \ge 0} \Bigl\{ \bar\epsilon E\bigl[(b - \tau(q))\,\mathbb{1}\{\tau(q) \le b\}\bigr] + \epsilon (E_L + b P_S) + P\bigl(\tau(q) > b\bigr) V_k\bigl(g(q,b)\bigr) \Bigr\}.$$

Then $V(q) = \lim_{k \to \infty} V_k(q)$, see [3]. The iteration is to be performed for every possible state $q$. Lemma 3.1 implies that the moving state, $g(q,b)$, converges asymptotically to $e(I(q))$. To complete the value iteration, we compute, for a fixed $b$,

$$E\bigl[(b - \tau(q))\,\mathbb{1}\{\tau(q) \le b\}\bigr] = b - \sum_{i=1}^{n} q_i \frac{1 - \exp(-\lambda_i b)}{\lambda_i}.$$
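The value iteration above can be sketched in code once the continuous action and state spaces are discretized. The version below is only an approximation under stated assumptions, not the report's procedure: vacations are restricted to multiples of a grid `step`, so that the reachable states are $g(q_0, m \cdot \text{step})$ indexed by $m$, and the state is frozen after `n_states` steps, which Lemma 3.1 suggests is harmless when that horizon is long. All names and default values are hypothetical.

```python
import math

def g(q, lams, t):
    """Operator (15): phase probabilities of the residual off time after t time units."""
    weights = [qi * math.exp(-lam * t) for qi, lam in zip(q, lams)]
    total = sum(weights)
    return [w / total for w in weights]

def stage_terms(q, lams, b, eps, e_l, p_s):
    """Stage cost and continuation probability in (18) for state q and vacation b."""
    late = b - sum(qi * (1.0 - math.exp(-lam * b)) / lam for qi, lam in zip(q, lams))
    cost = (1.0 - eps) * late + eps * (e_l + b * p_s)
    survive = sum(qi * math.exp(-lam * b) for qi, lam in zip(q, lams))
    return cost, survive

def value_iteration(q0, lams, eps, e_l, p_s, step=0.05, n_actions=100,
                    n_states=200, tol=1e-10, max_iter=10000):
    """Approximate V(g(q0, m*step)) for m = 0..n_states, starting from V_0 = 0."""
    states = [g(q0, lams, m * step) for m in range(n_states + 1)]
    # terms[m][j-1] holds (cost, survival) for a vacation of j grid steps in state m.
    terms = [[stage_terms(q, lams, j * step, eps, e_l, p_s)
              for j in range(1, n_actions + 1)] for q in states]
    values = [0.0] * (n_states + 1)
    for _ in range(max_iter):
        updated = [
            min(c + p * values[min(m + j, n_states)]   # state frozen beyond the grid
                for j, (c, p) in enumerate(terms[m], start=1))
            for m in range(n_states + 1)
        ]
        change = max(abs(a - b) for a, b in zip(updated, values))
        values = updated
        if change < tol:
            break
    return values
```

A call such as `value_iteration([0.3, 0.7], [0.2, 3.0], 0.5, 1.0, 0.1)` returns the approximate value along the trajectory of states; its first entry approximates $V(q_0)$. With a single phase, every state is identical and the result should coincide with the best constant vacation on the grid.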

3.3 General Distribution of Off Time

In this section, off times have a general distribution. As a consequence, one can no longer expect that the residual off time will keep the same distribution over time, updating only its parameters. Therefore, the system state is the instant $t$ at which a vacation is to start. We use again $\tau_t$ to denote the conditional residual value of $\tau$ at time $t$ (i.e., $\tau - t$ given that $\tau > t$).

As a state space, we consider the set of non-negative real numbers. An action b

is the duration of the next vacation. We shall assume that b can take value in a finite

set. The set of t reachable (with positive probability) by some policy is countable. We

can thus assume without loss of generality that the state space is discrete. Then the

following holds:

Proposition 3.2

(i) There exists an optimal deterministic stationary policy.

(ii) Let $V_0 := 0$, $V_{k+1} := LV_k$, where

$$LV(t) := \min_b \{c(t,b) + P(\tau_t > b) V(t + b)\}$$

where $c(t,b)$ has been defined in (8). Then $V_k$ converges monotonically to the optimal value $V^\star$.

(iii) $V^\star$ is the smallest nonnegative solution of $V^\star = LV^\star$. A stationary policy that chooses at state $t$ an action that achieves the minimum of $LV^\star$ is optimal.

Proof: (i) follows from [13, Thm 7.3.6], and (ii) from [13, Thm 7.3.10]. Part (iii) is due to [13, Thm 7.3.3]. ♦

Observe that $V_k$ expresses the optimal cost for the problem of minimizing the total cost over a horizon of $k$ steps.

Proposition 3.3 Assume that $\tau_t$ converges in distribution to some limit $\hat\tau$. Define $v(b) := \hat c(b) / [1 - P(\hat\tau > b)]$. Then

(i) $\lim_{t \to \infty} V^\star(t) = \min_b v(b)$.

(ii) Assume that there is a unique $b$ that achieves the minimum of $v(b)$ and denote it by $\hat b$. Then there is some stationary optimal policy $b(t)$ such that for all $t$ large enough, $b(t)$ equals $\hat b$.

Proof: By the bounded convergence theorem,

$$\lim_{t \to \infty} c(t,b) = \bar\epsilon E[(b - \hat\tau)\,\mathbb{1}\{\hat\tau \le b\}] + \epsilon (E_L + b P_S) = \hat c(b).$$

Let $V_0 := 0$. Then

$$\hat V_1 := \lim_{t \to \infty} (LV_0)(t) = \min_b \hat c(b)$$

which is a constant. Assume that $\hat V_k := \lim_{t \to \infty} V_k(t)$ exists for some $k$. Then

$$\begin{aligned}
\hat V_{k+1} &:= \lim_{t \to \infty} (LV_k)(t) \\
&= \lim_{t \to \infty} \min_b \bigl\{ c(t,b) + P(\tau_t > b) V_k(t + b) \bigr\} \\
&= \min_b \bigl\{ \hat c(b) + P(\hat\tau > b) \hat V_k \bigr\}
\end{aligned}$$

which is a constant. Hence by the monotone convergence of $V_k$ to $V^\star$, the limit $\hat V := \lim_{t \to \infty} V^\star(t)$ exists and satisfies the limit dynamic programming (DP)

$$\hat V = \min_b L \hat V.$$

This DP corresponds to an MDP that has a single state and thus there exists an optimal constant deterministic policy that takes always the same $b$, which we denote $\hat b$. This gives

$$\hat V = \hat c(\hat b) + P(\hat\tau > \hat b) \hat V$$

so that

$$\hat V = \frac{\hat c(\hat b)}{1 - P(\hat\tau > \hat b)} = v(\hat b) = \min_b v(b).$$

Any other stationary (constant) deterministic policy $b$ for the limit DP gives a larger value

$$\frac{\hat c(b)}{1 - P(\hat\tau > b)} \ge \hat V.$$

This establishes (i).

In part (ii), the last inequality is strict for all $b \ne \hat b$. Since the limit DP is obtained from the original one by considering large $t$, it follows that for all $t$ large enough, $\hat b$ will give a strictly lower value of $c(t,b) + P(\tau_t > b) V(t + b)$ than any other value of $b$. Thus by part (iii) of Proposition 3.2, $\hat b$ is optimal at all $t$ large enough. ♦

To recapitulate, we have shown that, for a general off time, it is enough to consider deterministic policies to achieve optimal performance. Also, if the residual off time distribution converges in time then the optimal policy converges to the constant policy and in fact becomes constant after finite time (under the appropriate conditions). This can be shown to be the case with the hyper-exponential distribution. Indeed, its residual time converges in distribution to an exponential distribution, having as parameter the smallest among the rates of the hyper-exponential distribution.


3.3.1 Suboptimal policies through dynamic programming

In this section, we propose a suboptimal solution approach using policy iteration for

a few stages. For the rest of the stages, we consider a static control that is computed

through parametric optimization, which is done next.

Consider a class of policies in which all vacations are i.i.d. exponentially distributed rvs with mean $b$. We will refer to this class as the “Exponential vacation policy.” With this policy, the cost, denoted $V_e$, depends on the off time only through $E[\tau]$, as detailed hereafter. Conditioning on a given inactivity period $\tau$, the number of vacations decremented by one is a Poisson variable with mean $\tau/b$. It is straightforward to write
\[
E[X] = E[\tau]/b + 1; \qquad E[T_X] = b\,E[X] = E[\tau] + b.
\]
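These two expectations can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the report: the function name is hypothetical and the off time is taken to be uniform on $(0, 2)$ purely as an example of a non-exponential $\tau$ with $E[\tau] = 1$.

```python
import random

def simulate_exponential_vacations(sample_tau, b, n_runs, seed=0):
    """Estimate E[X] and E[T_X] under i.i.d. exponential vacations of mean b.

    sample_tau(rng) draws one inactivity period tau; X is the number of
    vacations needed for their cumulative length T_X to reach tau.
    """
    rng = random.Random(seed)
    total_x = 0
    total_t = 0.0
    for _ in range(n_runs):
        tau = sample_tau(rng)
        elapsed, count = 0.0, 0
        while True:
            elapsed += rng.expovariate(1.0 / b)  # one vacation with mean b
            count += 1
            if elapsed >= tau:  # the server wakes up after the off time ended
                break
        total_x += count
        total_t += elapsed
    return total_x / n_runs, total_t / n_runs

# Example off time: Uniform(0, 2), so E[tau] = 1 (an arbitrary choice).
b = 0.5
mean_x, mean_t = simulate_exponential_vacations(
    lambda rng: rng.uniform(0.0, 2.0), b, 100_000)
print(mean_x, 1.0 / b + 1.0)  # simulated E[X]   vs. E[tau]/b + 1
print(mean_t, 1.0 + b)        # simulated E[T_X] vs. E[tau] + b
```

Only the mean of the off time enters the two formulas, so replacing the uniform sampler by any other distribution with the same mean should give comparable estimates.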

Equation (3) can be rewritten (recall that $\eta = \bar\epsilon + \epsilon P_S$) as
\[
V_e = \epsilon \left( P_S + E_L/b \right) E[\tau] + \epsilon E_L + \eta b. \tag{19}
\]
Observe that (19) holds for any distribution of $\tau$. We next find the optimal total cost under the Exponential policy.

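Proposition 3.4 The cost $V_e$ is a convex function having a minimum at
\[
b_e^\star = \sqrt{\frac{\epsilon E_L E[\tau]}{\bar\epsilon + \epsilon P_S}}. \tag{20}
\]
The minimal total cost is
\[
V_e^\star = \epsilon \left( P_S E[\tau] + E_L \right) + 2 \sqrt{\epsilon (\bar\epsilon + \epsilon P_S) E_L E[\tau]}. \tag{21}
\]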
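Before the proof, the closed forms (20) and (21) can be sanity-checked against the cost (19). The following is a minimal sketch with hypothetical function names. It assumes $\bar\epsilon = 1 - \epsilon$, which is not restated in this section, and uses $E_L = 10$, $P_S = 1$ as in Sect. 5 with an arbitrary $\epsilon$ and $E[\tau]$.

```python
import math

def exp_vacation_cost(b, eps, E_L, P_S, mean_tau):
    """Cost (19) for i.i.d. exponential vacations of mean b."""
    eps_bar = 1.0 - eps  # assumption: the delay weight is 1 - eps
    eta = eps_bar + eps * P_S
    return eps * (P_S + E_L / b) * mean_tau + eps * E_L + eta * b

def optimal_exp_vacation(eps, E_L, P_S, mean_tau):
    """Closed forms (20) and (21): optimal mean vacation and minimal cost."""
    eta = (1.0 - eps) + eps * P_S  # same assumption on eps_bar as above
    b_star = math.sqrt(eps * E_L * mean_tau / eta)
    v_star = eps * (P_S * mean_tau + E_L) + 2.0 * math.sqrt(
        eps * eta * E_L * mean_tau)
    return b_star, v_star

# Illustrative values; eps and E[tau] are arbitrary choices.
eps, E_L, P_S, mean_tau = 0.5, 10.0, 1.0, 0.66
b_star, v_star = optimal_exp_vacation(eps, E_L, P_S, mean_tau)
print(b_star, v_star)
print(exp_vacation_cost(b_star, eps, E_L, P_S, mean_tau))  # should agree with v_star
```

Evaluating the cost slightly below and above `b_star` gives larger values, which is consistent with the convexity claimed in the proposition.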

Proof: Let us compute the first and second derivatives of the cost w.r.t. $b$. We find
\[
V_e' = \eta - \frac{\epsilon E_L E[\tau]}{b^2}; \qquad V_e'' = \frac{2 \epsilon E_L E[\tau]}{b^3}.
\]
Clearly, $V_e'' \ge 0$ for any $b > 0$, hence $V_e$ is a convex function. The derivative $V_e'$ has a root at $b_e^\star$ as given in (20), which yields a minimum in the cost $V_e$ at $b_e^\star$. Substituting the optimal $b_e^\star$ in (19) we obtain the minimal cost (21). ♦

The optimal control is $b_e^\star$. Proposition 3.4 is notable in that it says that, with i.i.d. exponential vacations, only the expected inactivity period defines the optimal control. The inactivity period $\tau$ can be generally distributed. Therefore, Proposition 3.4 holds for any user application.

Now that we have computed the static control for all stages, we proceed with one stage policy iteration. With this iteration, the vacations have the form $(b_1, B, B, \ldots)$ where $B$ is an exponentially distributed rv with mean $b$. We can use DP to compute the optimal policy within this class. The problem is given by
\[
V_1^\star(0) = \min_{b_1 \ge 0} \left\{ c(0, b_1) + P(\tau > b_1)\, V^\star(b_1) \right\} \tag{22}
\]
where $V^\star(b_1)$ is equivalent to $V_e^\star$ in (21) after replacing the off time $\tau$ with the residual off time at time $b_1$, i.e., $\tau_{b_1}$. The optimal control identified through DP is $b_1^\star$ and $b^\star$.

When $\tau$ is hyper-exponentially distributed, the system state is the distribution $q$ which is transformed after each stage through the operator $g$.


If we add the constraint that the first vacation should be exponentially distributed

with the same distribution as B, then we will be back to the problem of finding an

optimal exponentially distributed vacation with state-independent mean. Since we do

not impose this restriction, the policy obtained after one stage iteration will do strictly

better than the Exponential vacation policy.
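The improvement brought by one stage of policy iteration can be illustrated on the hyper-exponential example of Sect. 5.2. The sketch below is not the authors' implementation and rests on several assumptions, each also flagged in the code: all function names are hypothetical, $\bar\epsilon = 1 - \epsilon$, and the first-stage cost $c(0, b_1)$ is taken to be one listening cost, the sleep energy over $b_1$, and the expected extra delay when the off time ends within the vacation. This decomposition reproduces (19) when all vacations are exponential, but the definition of $c(t,b)$ given earlier in the report should be preferred if it differs. The minimization in (22) is done by a plain grid search.

```python
import math

RATES = [0.2, 3.0, 10.0]  # phase rates of the hyper-exponential off time
PROBS = [0.1, 0.3, 0.6]   # initial phase probabilities
E_L, P_S = 10.0, 1.0      # physical parameters of Sect. 5

def residual_mean(t):
    """E[tau_t]: mean residual off time given tau > t."""
    lam_min = min(RATES)
    w = [q * math.exp(-(lam - lam_min) * t) for lam, q in zip(RATES, PROBS)]
    return sum(wi / lam for wi, lam in zip(w, RATES)) / sum(w)

def exp_policy_cost(mean_tau, eps):
    """Minimal cost (21) of the Exponential vacation policy (eps_bar = 1 - eps assumed)."""
    eta = (1.0 - eps) + eps * P_S
    return eps * (P_S * mean_tau + E_L) + 2.0 * math.sqrt(eps * eta * E_L * mean_tau)

def first_stage_cost(b1, eps):
    """Assumed c(0, b1): one wake-up, sleep energy over b1, expected extra delay."""
    delay = sum(q * (b1 - (1.0 - math.exp(-lam * b1)) / lam)
                for lam, q in zip(RATES, PROBS))
    return eps * (E_L + P_S * b1) + (1.0 - eps) * delay

def one_stage_objective(b1, eps):
    """Right-hand side of (22) for a given deterministic first vacation b1."""
    survive = sum(q * math.exp(-lam * b1) for lam, q in zip(RATES, PROBS))
    return first_stage_cost(b1, eps) + survive * exp_policy_cost(residual_mean(b1), eps)

def one_stage_iteration(eps, grid_size=4000, b_max=20.0):
    """Grid search for the first vacation minimizing (22)."""
    grid = [b_max * (i + 1) / grid_size for i in range(grid_size)]
    best_b1 = min(grid, key=lambda b1: one_stage_objective(b1, eps))
    return best_b1, one_stage_objective(best_b1, eps)

eps = 0.5  # arbitrary illustrative weight
b1_star, v1 = one_stage_iteration(eps)
v_exp = exp_policy_cost(residual_mean(0.0), eps)
print(b1_star, v1, v_exp)  # v1 should be below the Exponential vacation cost
```

The grid bounds and resolution are arbitrary; a finer search only refines `b1_star` and does not change the comparison with the Exponential vacation policy.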

This suboptimal method for one stage policy iteration can be extended to more stages. Instances of the two stage policy iteration are provided in Sect. 5. As the number of stages of the policy iteration increases, the suboptimal solution converges to the optimal solution (obtained from (18) if $\tau$ is hyper-exponentially distributed).

4 Worst Case Performance

We consider in this section the case where the off time is exponentially distributed with an unknown parameter. When the distribution of the parameter is known (Bayesian framework) the problem reduces to the study of the hyper-exponentially distributed off time. In practice there could be many situations when the statistical distribution of the off time is unknown or hard to estimate. In such non-Bayesian frameworks, we can conduct a worst-case analysis: optimize the performance under the worst case choice of the unknown parameter. We assume that this parameter lies within the interval $[\lambda_a, \lambda_b]$. The worst case is identified as follows
\[
\lambda_w := \arg\max_{\lambda \in [\lambda_a, \lambda_b]} \; \min_{\{B_k\},\, k \in \mathbb{N}^*} V. \tag{23}
\]

Given that $\tau$ is assumed to be exponentially distributed, it is enough to analyze the case of the Constant vacation policy, which has been found to be optimal in Sect. 3.1. The minimal cost under this policy is given in (11). We have studied (11) using the mathematics software tool Maple 11. We found the following: $V^\star(\lambda)$ is a monotonic function, decreasing with $\lambda$; $\lim_{\lambda\to+\infty} V^\star(\lambda) = \epsilon E_L$; and $\lim_{\lambda\to 0} V^\star(\lambda) = +\infty$. Thus, the optimal control under the worst case is the one corresponding to the smallest rate in the interval considered, i.e., $\lambda_w = \lambda_a$.
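The monotonicity argument can be reproduced numerically without a computer algebra system. The sketch below is an illustration under assumptions, not a transcription of (10) or (11). The cost expression is derived here from the geometric number of constant vacations under an exponential off time, with $\bar\epsilon = 1 - \epsilon$, giving $V = (\epsilon E_L + \eta b)/(1 - e^{-\lambda b}) - \bar\epsilon/\lambda$. All function names are hypothetical, and the minimizer is located by bisection on the sign of the derivative rather than through a closed form.

```python
import math

def constant_vacation_cost(b, lam, eps, E_L, P_S):
    """Assumed expected cost of constant vacations b when tau ~ Exp(lam)."""
    eps_bar = 1.0 - eps  # assumption: the delay weight is 1 - eps
    eta = eps_bar + eps * P_S
    return (eps * E_L + eta * b) / (-math.expm1(-lam * b)) - eps_bar / lam

def optimal_constant_vacation(lam, eps, E_L, P_S):
    """Minimize the assumed cost over b by bisection on the sign of dV/db."""
    eta = (1.0 - eps) + eps * P_S
    listen = eps * E_L

    def slope_sign(b):
        # Same sign as dV/db; increasing in b and negative at b = 0.
        return eta - math.exp(-lam * b) * (eta + lam * listen + lam * eta * b)

    lo, hi = 0.0, 1.0
    while slope_sign(hi) <= 0.0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if slope_sign(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    b_star = 0.5 * (lo + hi)
    return b_star, constant_vacation_cost(b_star, lam, eps, E_L, P_S)

def worst_case_rate(rates, eps, E_L, P_S):
    """Candidate rate whose optimal cost is the largest."""
    return max(rates, key=lambda lam: optimal_constant_vacation(lam, eps, E_L, P_S)[1])

eps, E_L, P_S = 0.5, 10.0, 1.0  # eps is an arbitrary illustrative weight
for lam in (0.1, 1.0, 10.0, 100.0):
    print(lam, optimal_constant_vacation(lam, eps, E_L, P_S))
print(worst_case_rate([0.5, 1.0, 2.0], eps, E_L, P_S))
```

Under these assumptions the printed optimal costs decrease with the rate and approach $\epsilon E_L$ for large rates, so the worst case over an interval is attained at its left endpoint.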

5 Numerical Investigation

In this section we show some numerical results of our model, when the off time $\tau$ is either exponentially or hyper-exponentially distributed. In each case, the best control and the corresponding cost are computed. The cost $V_{Std}$ of the standard’s policy is reported (using (7)) for comparison. The physical parameters are set to the following values: $E_L = 10$ and $P_S = 1$. The parameters of the standard protocol are $b_1 = 2$ and $l = 10$.

5.1 Exponential Off Time

In this case, the optimal policy is to fix all vacations to the value found in (10). This optimal control is depicted in Fig. 1. We naturally find that the optimal sleep duration decreases as $\lambda$ increases. The physical explanation is that a large arrival rate forces the server to be available after shorter breaks, otherwise the cost is too high. Also, as $\epsilon$ gets smaller, the extra delay gets more penalizing (cf. (2)), enforcing then smaller optimal sleep durations.


[Figure 1: Optimal sleep duration with exponential off times. Panel (a), $b^\star$ versus $\lambda$: optimal sleep duration against the arrival rate on logarithmic axes, for the optimal policy at $\epsilon = 0.1$ and $\epsilon = 0.9$. Panel (b), $b^\star$ versus $\epsilon$: optimal sleep duration against the energy coefficient weight on logarithmic axes, for the optimal policy at $\lambda = 0.1$ and $\lambda = 5$.]

[Figure 2: Optimal expected cost with exponential off times. Panel (a), $V^\star$ versus $\lambda$: cost against the arrival rate on logarithmic axes, for the optimal and standard policies at $\epsilon = 0.1$ and $\epsilon = 0.9$. Panel (b), $V^\star$ versus $\epsilon$: cost against the energy coefficient weight on logarithmic axes, for the optimal and standard policies at $\lambda = 0.1$ and $\lambda = 5$.]

[Figure 3: Sleep durations and costs with hyper-exponential off times. Panel (a), sleep durations versus $\epsilon$: two stage suboptimal policy ($b_1^\star$ and $b_2^\star$), one stage suboptimal policy ($b_1^\star$) and Exponential vacation policy ($b^\star$), on logarithmic axes. Panel (b), costs versus $\epsilon$: two stage suboptimal policy, one stage suboptimal policy, Exponential vacation policy and standard policy, on logarithmic axes.]

Figure 2 depicts the optimal (cf. (11)) and standard (cf. (7)) costs. Observe in Fig. 2(a) how the cost decreases asymptotically to $\epsilon E_L$ (1 for $\epsilon = 0.1$ and 9 for $\epsilon = 0.9$) as foreseen in Sect. 4. As $\lambda$ decreases, the increase of the optimal cost is due to the increase of the optimal sleep duration, while for the standard’s policy the cost increase is due to the extra (useless and costly) listening. The optimal cost increases with $\epsilon$ (cf. Fig. 2(b)). Small values of $\epsilon$ make the cost more sensitive to delay, thereby enforcing vacations to be smaller and subsequently incurring smaller costs.

The cost of the standard’s policy is high at small $\epsilon$, when delay is very penalizing.

This is because the standard has been designed to favor energy over delay. As the

vacation size increases exponentially over time, the extra delay can get very large.

5.2 Hyper-Exponential Off Time

In this case, we are able to compute two suboptimal policies using policy iteration. We compare the performance of these to that of the Exponential vacation policy and the standard’s policy. The off time distribution is hyper-exponential with parameters $\boldsymbol{\lambda} = \{0.2, 3, 10\}$ and $q = \{0.1, 0.3, 0.6\}$. The suboptimal solutions are evaluated using (22), the Exponential vacation policy using (20)-(21) and the standard’s policy using (7).

The performance of the four policies is depicted in Fig. 3 against the energy coefficient weight $\epsilon$. Naturally, the suboptimal policies perform strictly better than the Exponential vacation policy, with the two stage iteration policy strictly outperforming the one stage one (cf. Fig. 3(b)). Interestingly, for large values of $\epsilon$, the standard’s policy outperforms all the other policies. As observed earlier, the standard favors energy over delay, so that at large $\epsilon$, it is very efficient in reducing the cost. It is expected however that $n$-stage policy iteration will outperform the standard for sufficiently large $n$.

6 Concluding Remarks

We have introduced a model for the control of vacations for optimizing energy saving in wireless networks taking into account the tradeoff between energy consumption and delays. Previous models studied in the literature have considered an exogenous arrival process, whereas we considered an on-off model in which the off duration begins when the server leaves on vacation and where the duration of the on time does not depend on when it starts. We derived the optimal policy in case of a Poisson arrival process and found many structural properties of the optimal policy for hyper-exponential and general off times. Suboptimal policies have been derived in this case using one and two stage policy iteration.

References

[1] J. Almhana, Z. Liu, C. Li, and R. McGorman. Traffic estimation and power saving mechanism optimization of IEEE 802.16e networks. In Proc. of IEEE ICC 2008, Beijing, China, pages 322–326, May 2008.

[2] S. Alouf, E. Altman, and A. P. Azad. Analysis of an M/G/1 queue with repeated inhomogeneous vacations with application to IEEE 802.16e power saving mechanism. In Proc. of QEST 2008, pages 27–36, September 2008.

[3] D. Bertsekas. Dynamic Programming and Optimal Control, volume I. Athena Scientific, second edition, 1996.


[4] R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey, and D. E. Knuth. On the Lambert W function. Advances in Computational Mathematics, 5:329–359, 1996.

[5] M. Crovella and A. Bestavros. Self-similarity in world wide web traffic-evidence and possible causes. In Proc. of ACM Sigmetrics, Philadelphia, PA, pages 160–169, 1996.

[6] E. Feinberg. On stationary strategies in Borel dynamic programming. Math. of Operations Research, 17(2):392–397, May 1992.

[7] A. Feldmann and W. Whitt. Fitting mixtures of exponentials to long-tail distributions to analyze network performance models. Performance Evaluation, 31(8):963–976, August 1998.

[8] K. Han and S. Choi. Performance analysis of sleep mode operation in IEEE 802.16e mobile broadband wireless access systems. In Proc. of IEEE VTC 2006-Spring, Melbourne, Australia, May 2006.

[9] IEEE Std 802.16e-2005: Standard for Local and Metropolitan Area Networks Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems - Amendment: Physical and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands, 2005.

[10] D. G. Jeong and W. S. Jeon. Performance of adaptive sleep period control for wireless communications systems. IEEE Trans. on Wireless Communications, 5:3012–3016, November 2006.

[11] N-H. Lee and S. Bahk. MAC sleep mode control considering downlink traffic pattern and mobility. In Proc. of IEEE VTC 2005-Spring, Stockholm, Sweden, volume 3, pages 3102–3106, May 2005.

[12] Y. Park and G. U. Hwang. Performance modelling and analysis of the sleep mode in IEEE 802.16e WMAN. In Proc. of IEEE VTC 2007-Spring, Melbourne, Australia, pages 2801–2806, April 2007.

[13] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 2005.

[14] D. Shuman and M. Liu. Optimal sleep scheduling for a wireless sensor network node. In Proc. of 40th Asilomar Conference on Signals, Systems and Computers (ACSSC), pages 1337–1341, Nov. 2006.

[15] W. Willinger, M. Taqqu, R. Sherman, and D. Wilson. Self-similarity through high variability: Statistical analysis of Ethernet LAN traffic at the source level. In Proc. of ACM SIGCOMM, Cambridge, MA, volume 25, pages 110–113, 1995.

[16] J. Xiao, S. Zou, B. Ren, and S. Cheng. An enhanced energy saving mechanism in IEEE 802.16e. In Proc. of IEEE GLOBECOM 2006, pages 1–5, November 2006.

[17] Y. Xiao. Energy saving mechanism in the IEEE 802.16e wireless MAN. IEEE Communications Letters, 9:595–597, July 2005.


[18] Y. Xiao. Performance analysis of an energy saving mechanism in the IEEE 802.16e wireless MAN. In Proc. of IEEE CCNC 2006, pages 406–410, January 2006.

[19] Y. Zhang. Performance modeling of energy management mechanism in IEEE 802.16e mobile WiMAX. In Proc. of IEEE WCNC 2007, pages 3205–3209, March 2007.

[20] Y. Zhang and M. Fujise. Energy management in the 802.16e MAC. IEEE Communications Letters, 10:311–313, April 2006.
