
Utility-Optimal Scheduling in Time-Varying Wireless

Networks with Delay Constraints

I-Hong Hou

CSL and Department of Computer Science

University of Illinois

Urbana, IL, 61801, USA

ihou2@illinois.edu

P. R. Kumar

CSL and Department of ECE

University of Illinois

Urbana, IL 61801, USA

prkumar@illinois.edu

ABSTRACT

Clients in wireless networks may have per-packet delay con-

straints on their traffic. Further, in contrast to wireline net-

works, the wireless medium is subject to fading. In such a

time-varying environment, we consider the system problem of

maximizing the total utility of clients, where the utilities are

determined by their long-term average rates of being served

within their delay constraints. We also allow for the addi-

tional fairness requirement that each client may require a cer-

tain minimum service rate. This overall model can be applied

to a wide range of applications, including delay-constrained

networks, mobile cellular networks, and dynamic spectrum

allocation.

We address this problem through convex programming. We

propose an on-line scheduling policy and prove that it is utility-

optimal. Surprisingly, this policy does not need to know the

probability distribution of system states. We also design an

auction mechanism where clients are scheduled and charged

according to their bids. We prove that the auction mechanism

restricts any selfish client from improving its utility by faking

its utility function. We also show that the auction mechanism

schedules clients in the same way as that done by the on-line

scheduling policy. Thus, the auction mechanism is both truth-

ful and utility-optimal. Finally, we design specific algorithms

that implement the auction mechanism for a variety of appli-

cations.

Categories and Subject Descriptors

C.2.1 [COMPUTER-COMMUNICATION NETWORKS ]: Net-

work Architecture and Design —Wireless communication

General Terms

Theory

This material is based upon work partially supported by USARO under Contract Nos. W911NF-08-1-0238 and W-911-NF-0710287, AFOSR under Contract FA9550-09-0121, and NSF under Contract No. CNS-07-21992.

Permission to make digital or hard copies of all or part of this work for

personal or classroom use is granted without fee provided that copies are

not made or distributed for profit or commercial advantage and that copies

bear this notice and the full citation on the first page. To copy otherwise, to

republish, to post on servers or to redistribute to lists, requires prior specific

permission and/or a fee.

MobiHoc’10, September 20–24, 2010, Chicago, Illinois, USA.

Copyright 2010 ACM 978-1-4503-0183-1/10/09 ...$10.00.

Keywords

Scheduling, utility maximization, NUM, deadlines, delays, auc-

tion

1. INTRODUCTION

This paper studies the problem of network utility maximiza-

tion (NUM) in time-varying wireless networks, when packets

have delay constraints. It is motivated by two considerations.

First, delay constraints are becoming important as wireless

networks are increasingly used for serving real-time traffic

such as VoIP and video streaming. Also, delay constraints are

critical to applications such as networked control where inner

control loops can be destabilized by excessive delay, and even

outer control loops used for coordination are safety critical,

e.g., vehicular traffic control. Second, unlike wireline net-

works, where the network topologies and link capacities are

static, wireless networks are time-varying in that the available

bandwidth and link qualities are all time-varying due to both

node mobilities and channel fading.

We first propose a system model that allows us to pose and

provide solutions that address the dynamics of different en-

tities involved. The model characterizes the system state by

the collection of subsets of clients that can be served under

it, and makes no other assumptions about the network. Thus,

the model is general enough to be applied to a wide range of

applications, including delay-constrained wireless networks

with rate adaptation, mobile cellular networks, and dynamic

spectrum allocation; we will specifically focus on the prob-

lem of delay-constrained wireless networks. The performance

of a client is defined by the long-term average rate that it is

served, subject to per-packet delay constraints. The utility

gained by a client is determined by its service rate through its

utility function. Further, to impose a certain degree of fair-

ness and avoid starving some clients, we assume that each

client requires a certain lower bound on its service rate. The

NUM problem under this model is to maximize the total long-

term utility with respect to network dynamics, per-packet de-

lay constraints, and minimum service requirements of clients.

To solve the foregoing NUM problem in time-varying wire-

less networks, we first formulate it as a convex programming

problem in which network dynamics are considered. We then

propose an on-line scheduling policy for the NUM problem

that does not require knowledge of the probability distribu-

tion of system states. We prove that the policy converges to

the optimal solution of the convex programming problem and

thus solves the NUM problem.

In practice, utility functions may be known only to the clients.


Thus, clients may provide a fake utility function to gain more

service. To ensure that clients reveal their true utility func-

tions, we design an auction that is based on the Vickrey-

Clarke-Groves (VCG) mechanism [3,5,20] for scheduling. In

this auction, clients announce their bids in each instance, and

the server schedules service and charges clients based on their

bids as well as the system state. We prove that this auction is

truthful, meaning that a selfish client cannot strictly increase

its net utility by lying about its utility function. We also show

that the schedule derived from the auction is the same as that

from the on-line scheduling policy for solving the NUM. Thus,

this auction mechanism also achieves maximum total utility.

We also discuss how to implement the auction mechanism for three possible applications: delay-constrained wireless networks, mobile cellular networks, and dynamic spectrum allocation. For each of the three applications, we derive specific algorithms for both scheduling clients and charging them.

Finally, we provide simulation results for the three applica-

tions. We compare our proposed policies against state-of-the-

art policies for each application. The compared policies either

only focus on satisfying the minimum service requirements of

clients or consider utilities on a per-interval basis rather than

long-term average performance. Simulation results show that

these policies can result in low utility and serious unfairness.

This suggests that, where long-term average performance is

concerned, these compared policies are not applicable. On

the other hand, our proposed policies not only satisfy the minimum service requirements for all clients but also achieve the highest utilities in all three applications.

The rest of the paper is organized as follows. Section 2 sum-

marizes related work. Section 3 describes a system model for

time-varying wireless networks and defines the NUM prob-

lem. We demonstrate that several applications can be de-

scribed by our system model in Section 4. Section 5 formu-

lates the NUM problem as a convex programming problem

and describes a simple on-line scheduling policy that solves it.

Section 6 designs an auction mechanism that not only makes

clients report their true utility functions but also achieves the

maximum total long-term utility. Section 7 discusses algo-

rithms for implementing the auction mechanism under sev-

eral applications. Simulation results are presented in Section

8. Finally, Section 9 concludes this paper.

2. RELATED WORK

First, we note that there is no work other than [8], to our

knowledge, that addresses utility maximization when packets

have delay constraints. Rather, utility maximization has been

studied in the context of throughput only. Second, we note

that even such work that studies the NUM problem mostly

studies it in the context of static networks, and cannot be ap-

plied to time-varying wireless networks. The only work [8]

that studies delay constraints in a utility maximization frame-

work, also considers only a static network.

The work on network utility maximization was initiated by

Kelly [10] and Kelly, Maulloo, and Tan [11], who considered

utility-optimal rate control algorithms in wireline networks.

Lin and Shroff [13] have considered the NUM problem in

wireless networks with multi-path routing. These works as-

sume the network topology is static. Liu, Chong, and Shroff

[14] and O’Neill, Goldsmith, and Boyd [17] have both considered the NUM problem in time-varying environments. However, they have evaluated the performance of clients on a per-interval basis. Yi and Chiang [21] have summarized other

existing work on the NUM problem.

Shakkottai and Srikant [19] and Raghunathan et al. [18] have studied maximizing total throughput for delay-constrained traffic over unreliable wireless links. Their policies, however, may lead to serious unfairness. Hou and Kumar [7] have studied an analytical model for delay-constrained wireless networks and proposed feasibility-optimal scheduling policies that satisfy the minimum service requirements of clients. Their work has not considered utilities gained by clients. As noted above, the work [8] has proposed a utility-optimal scheduling policy for delay-constrained traffic over unreliable wireless links. This work only treats the case when link reliabilities are time-invariant and does not consider rate adaptation. Thus, it is not suitable for networks with fading channels or with rate adaptation.

Dynamic spectrum allocation has also attracted increasing research interest. Gandhi et al. [4] have proposed a framework for spectrum auctions. Zhou et al. [22] and Jia et al. [9] have studied the design of truthful spectrum auction mechanisms. These works have focused on the scenario where spectrum auctions are carried out infrequently.

3. SYSTEM MODEL

Consider a wireless system with one server and N clients,

numbered {1,2,...,N}. Time is divided into time intervals.

Each client desires some service in each time interval. The

service requirement within a time interval of each client is

indivisible; that is, the server can only either fully meet the

demand of a client or not serve it at all. At the beginning

of each time interval, the server obtains the current channel

condition. Both the demands of clients and the channel con-

dition can be time-varying, and together we call them the

system state in each time interval. The server can learn the

system state by either polling, probing, or estimating. Since

these operations are costly and cannot be carried out too fre-

quently, the server assumes that the system state does not

change within an interval. Due to limited wireless resources,

the server may be only able to serve some particular subsets

of clients in each system state. To be more specific, we denote the system state in the $k$th time interval by $c(k) \in C$, where $C$ is a finite set, and $\{c(1), c(2), \ldots\}$ are i.i.d. random variables with $\mathrm{Prob}\{c(k) = c\} =: p_c$. In practice, not only the system state but also the distribution of system states can be time-varying. However, the distribution of system states usually evolves on a much slower time scale than the length of a time interval and is thus assumed to be static.

A subset S of clients is said to be feasible under system state

c if it is possible for the server to serve all clients in S. For

simplicity, we represent a system state c by the collection of

subsets S that are feasible under c. Thus, we have S ∈ c if S is feasible under c, and S ∉ c otherwise. Since the constraints of

feasible sets can be defined arbitrarily, this model can be ap-

plied to a wide range of applications. We will illustrate some

examples of applications in Section 4. In particular, it can ac-

commodate per-packet delay constraints and rate adaptation.
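As an illustration of the representation just described, the following minimal sketch encodes a system state as the collection of its feasible subsets and draws i.i.d. states. All names (`make_state`, `is_feasible`, `sample_states`, `GOOD`, `BAD`) and the two-client example are hypothetical and not part of the model itself.

```python
import random
from typing import Dict, FrozenSet, List

Subset = FrozenSet[int]
State = FrozenSet[Subset]   # a system state is the collection of its feasible subsets


def make_state(*subsets) -> State:
    """Build a system state from the client subsets that are feasible under it."""
    return frozenset(frozenset(s) for s in subsets)


def is_feasible(subset, state: State) -> bool:
    """S is feasible under c exactly when S is a member of c."""
    return frozenset(subset) in state


def sample_states(dist: Dict[State, float], num_intervals: int, seed: int = 0) -> List[State]:
    """Draw i.i.d. system states c(1), c(2), ... with Prob{c(k) = c} = p_c."""
    rng = random.Random(seed)
    states = list(dist)
    weights = [dist[c] for c in states]
    return rng.choices(states, weights=weights, k=num_intervals)


# Hypothetical two-client example: in the "good" state both clients can be
# served together; in the "bad" state at most one of them can be served.
GOOD = make_state([], [1], [2], [1, 2])
BAD = make_state([], [1], [2])
```

Because a state is just a set of subsets, any application-specific constraint (deadlines, channel counts, interference) only has to decide which subsets to include.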

The server is in charge of choosing a feasible subset S ∈

c(k) to serve in each time interval k. The server’s choice is

described by a scheduling policy.

DEFINITION 1. Let $h(k)$ be the system’s history up to the $k$th time interval. A scheduling policy is a function $\eta : (h(k-1), c(k)) \to 2^{\{1,2,\ldots,N\}}$, such that given history $h(k-1)$ and current system state $c(k)$, the server chooses the subset $\eta[h(k-1), c(k)] \in c(k)$ of clients to serve. All clients $n \in \eta[h(k-1), c(k)]$ are considered to be served in the $k$th time interval.

As the system state is time-varying, it is less meaningful

to discuss the performance of clients on a per-interval basis.

Rather, we measure the performance of a client through its

average rate of being served. We define the service rate of a

client n as follows:

DEFINITION 2. Let $q_n(k)$ denote the service rate of client $n$ up to the $k$th time interval, defined by the recursion:

$$
q_n(k+1) =
\begin{cases}
(1 - \alpha_k)\, q_n(k) + \alpha_k, & \text{if client } n \text{ is served at the } k\text{th interval,} \\
(1 - \alpha_k)\, q_n(k), & \text{otherwise,}
\end{cases}
$$

where $0 \le \alpha_k \le 1$, for all $k$. The long-term service rate of client $n$ is defined as $q_n := \liminf_{k\to\infty} q_n(k)$.

In the above definition, $\alpha_k$ is a system-wide variable that is assumed to be the same for all clients. For example, by setting $\alpha_k \equiv \frac{1}{k}$, $q_n(k)$ becomes the proportion of time intervals in which client $n$ is served. On the other hand, setting all $\alpha_k$ to a constant makes $q_n(k)$ a weighted average of service in which recent service is more important than service a long time ago.

We further assume that each client $n$ has a utility function $U_n(\cdot)$. The utility functions are strictly increasing, strictly concave, and infinitely differentiable. At the $k$th time interval, client $n$ receives utility that is equivalent to an amount $\frac{1}{\alpha_k} U_n(q_n(k))$ of money. The scaling factor $\frac{1}{\alpha_k}$ of the amount of money received by client $n$ is set to equalize the effects of events in each interval. Section 6.1 provides a more detailed explanation of this setting. The long-term utility of client $n$ is defined as $\liminf_{k\to\infty} U_n(q_n(k))$, which equals $U_n(q_n)$ since $U_n(\cdot)$ is continuous and increasing.

Finally, to enforce some form of fairness among clients, we also assume that each client $n$ has a requirement of minimum long-term service rate, $\underline{q}_n$; that is, it requires $q_n \ge \underline{q}_n$ with probability 1. We assume that the minimum long-term service rate requirements are strictly feasible, that is, there exists some scheduling policy that ensures $q_n > \underline{q}_n$, for all $n$.

We are interested in maximizing the total long-term utility of the network, $\sum_{n=1}^{N} U_n(q_n)$. The NUM problem of this framework can hence be expressed as:

$$
\begin{aligned}
\text{Max } & \sum_{n=1}^{N} U_n(q_n) \\
\text{s.t. } & \text{Network dynamics and feasibility constraints,} \\
& \text{and } q_n \ge \underline{q}_n, \ \forall n.
\end{aligned}
$$

However, this formulation only considers the long-term behavior of the system. A solution to this NUM problem may not translate into an implementable scheduling policy, which would have to make decisions on a per-interval basis. Thus, we also wish to design utility-optimal scheduling policies.

DEFINITION 3. A scheduling policy $\eta$ is said to be utility-optimal if, by applying $\eta$, $\sum_{n=1}^{N} U_n(q_n(k))$ converges to the optimal value of the NUM problem almost surely as $k \to \infty$.
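The service-rate recursion of Definition 2, on which these definitions rest, can be sketched in a few lines. The function names and the sample service pattern below are illustrative assumptions; the two calls contrast the step size 1/k (a running proportion) with a constant step size (an exponentially weighted average).

```python
def update_service_rate(q: float, served: bool, alpha: float) -> float:
    """One step of the recursion: q(k+1) = (1 - alpha_k) q(k) + alpha_k if served."""
    return (1.0 - alpha) * q + (alpha if served else 0.0)


def run_recursion(served_flags, alpha_of_k, q_init=0.0):
    """Apply the recursion for intervals k = 1..K.

    served_flags[k-1] says whether the client was served in interval k, and
    alpha_of_k(k) returns the system-wide step size alpha_k.
    """
    q = q_init
    for k, served in enumerate(served_flags, start=1):
        q = update_service_rate(q, served, alpha_of_k(k))
    return q


# Hypothetical service pattern: served in intervals 1, 3 and 4 out of 4.
flags = [True, False, True, True]
q_avg = run_recursion(flags, lambda k: 1.0 / k)   # 0.75, the fraction of intervals served
q_ewma = run_recursion(flags, lambda k: 0.5)      # 0.8125, recent intervals weigh more
```

With a step size of 1/k the initial value is irrelevant, since the first step multiplies it by zero.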

4. EXAMPLES OF APPLICATIONS

We will first discuss several applications that can be de-

scribed by our framework.

4.1 Delay-Constrained Wireless Networks with Rate Adaptation

We consider the model introduced in [6] that characterizes

a system where clients generate real-time traffic, and which

was extended to allow fading in [7]. Assume that there are N

wireless clients and one access point (AP). Time is assumed

to be slotted and divided into time intervals, each consisting

of T consecutive time slots. At the beginning of each time

interval, packets for each client arrive at the AP. Each client

specifies a delay bound of τn time slots, with τn ≤ T. The

packet for client n is to be delivered no later than the τn-th

time slot in each time interval. Otherwise, the packet expires

and is dropped from the system.

Due to channel fading, the link qualities between the AP

and the client can be time-varying. We assume that the AP

has full knowledge of the current channel state. The AP then

applies rate adaptation for error-free transmissions. Thus, the

transmission rates for different clients can be different, which

in turn results in different transmission times. We define $t_{c,n}$ as the number of time slots required for an error-free transmission for client $n$ under system state $c$. A scheduling policy is one which selects an ordered subset $S = \{s_1, s_2, \ldots, s_m\}$ of clients and transmits packets for clients in $S$ according to the order. The ordered subset is considered feasible under system state $c$ if packets for all clients in $S$ can be delivered before their respective delay bounds, or, to be more specific, $\sum_{n=1}^{i} t_{c,s_n} \le \tau_{s_i}$, for all $1 \le i \le m$. In this scenario, the service rate of each client reflects its timely throughput.

4.2 Mobile Cellular Network

Consider a mobile cellular network with a base station and $N$ users. The system may have more than one channel, but each channel can be occupied by at most one user at any given time. We assume that time is slotted, where a time slot corresponds to a time interval in the system model. The length of a time slot is defined as the time needed for transmitting a packet plus any control overhead. Also, due to mobility, the link qualities between the base station and a user can be time-varying. We consider an ON/OFF model for links. The link between a user and the base station is considered ON if a packet can be transmitted between the two without errors, and considered OFF otherwise. We assume that the base station never transmits packets to users with OFF links. Thus, the system state at any time slot can be described as the set of users with ON links. A subset $S$ of users is considered feasible under some system state $c$ if for any user $n \in S$, the link between user $n$ and the base station is ON, and the size of $S$ is smaller than or equal to the number of channels. A scheduling policy is one which chooses, based on current system state and past history, a feasible subset of users and assigns channels to each of them. Finally, the service rate of each user is equal to its throughput.

4.3 Dynamic Spectrum Allocation

Consider a scenario with one primary user and $N$ secondary users. The primary user holds licenses for several channels over a large geographical region. TV broadcasters are typical examples of primary users. The primary user only uses a portion of its licensed channels and is willing to allocate unused channels to secondary users. The secondary users are scattered throughout the region and constrained to much smaller transmission powers compared to the primary user, which makes spatial reuse possible. Still, some secondary users may interfere with each other and thus cannot be allocated the same channel simultaneously. We use a conflict graph $G = (V, E)$ to represent the interference relations between secondary users, where $V$ is the set of secondary users and there is an edge between two users if they interfere with each other.

The primary user allocates unused channels periodically.

Since the network activity of the primary user can be time-

varying, the number of unused channels can also be time-

varying. A scheduling policy is one which chooses disjoint

subsets of secondary users for each unused channel, with the

constraint that two users that are assigned the same channel

cannot share a link in the conflict graph.
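The three notions of feasibility described in this section can be sketched as simple predicates. This is an illustrative sketch only: the function names, argument layouts, and the encoding of a channel assignment as a user-to-channel dictionary are assumptions made here, not part of the models above.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple


def feasible_delay_constrained(order: Sequence[int],
                               tx_slots: Dict[int, int],
                               deadline: Dict[int, int]) -> bool:
    """Section 4.1: each client in the ordered subset must finish by its own delay bound."""
    elapsed = 0
    for client in order:
        elapsed += tx_slots[client]      # t_{c,n} under the current system state c
        if elapsed > deadline[client]:   # tau_n
            return False
    return True


def feasible_cellular(users: Set[int], on_links: Set[int], num_channels: int) -> bool:
    """Section 4.2: every chosen user has an ON link and there are enough channels."""
    return users <= on_links and len(users) <= num_channels


def feasible_spectrum(assignment: Dict[int, int],
                      conflicts: Iterable[Tuple[int, int]],
                      num_unused_channels: int) -> bool:
    """Section 4.3: users joined by an edge of the conflict graph never share a channel.

    `assignment` maps a secondary user to a channel index, so the per-channel
    subsets are disjoint by construction.
    """
    if any(not 0 <= ch < num_unused_channels for ch in assignment.values()):
        return False
    return all(not (u in assignment and v in assignment and assignment[u] == assignment[v])
               for u, v in conflicts)
```

For example, with transmission times {1: 2, 2: 3} and delay bounds {1: 2, 2: 5}, the order [1, 2] is feasible while [2, 1] is not, which is why the subset in the first application must be ordered.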

5. A GENERAL METHOD FOR UTILITY

MAXIMIZATION

In this section, we propose a general method for solving the

NUM problem in time-varying wireless networks with mini-

mum service requirements. We first show that the NUM prob-

lem can be formulated as a convex programming problem.

Although the formulation requires explicit knowledge of the

distribution of system states, i.e., the values of probability

[pc], we will show the surprising result that there exists an on-

line scheduling policy that does not need any information on

the distribution of system states, and is, further, also utility-

optimal. For simplicity, we assume that $\alpha_k := 1/k$, that is, $q_n(k)$ is the proportion of time intervals in which client $n$ has been served until the $k$th time interval. We will discuss the case where $\alpha_k$ is a constant for all $k$ at the end of this section.

5.1 Convex Programming Formulation

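Define $p_c(k)$ and $f_{c,S}(k)$, for all system states $c$ and subsets $S \in c$, recursively, as follows:

$$
p_c(k+1) =
\begin{cases}
\frac{k-1}{k}\, p_c(k) + \frac{1}{k}, & \text{if } c(k) = c, \\
\frac{k-1}{k}\, p_c(k), & \text{otherwise,}
\end{cases}
$$

and

$$
f_{c,S}(k+1) =
\begin{cases}
\frac{k-1}{k}\, f_{c,S}(k) + \frac{1}{k}, & \text{if } c(k) = c \text{ and } S \text{ is scheduled at the } k\text{th interval,} \\
\frac{k-1}{k}\, f_{c,S}(k), & \text{otherwise.}
\end{cases}
$$

These two variables can be thought of as the relative frequencies of occurrence of the system state $c$ and the event that subset $S$ is scheduled under system state $c$, respectively. Also, we have $\sum_{S \in c} f_{c,S}(k) = p_c(k)$ and $\sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S}(k) = q_n(k)$ for all $c$ and $k$. For ease of discussion, we only consider scheduling policies where $f_{c,S} := \lim_{k\to\infty} f_{c,S}(k)$ exists for all system states $c$ and subsets $S$. Thus, we have $\sum_{S \in c} f_{c,S} = p_c$ and $\sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S} = q_n$. The NUM problem can be described as the following convex programming problem:

$$
\begin{aligned}
\text{Max } & \sum_{n=1}^{N} U_n(q_n) = \sum_{n=1}^{N} U_n\Big(\sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S}\Big) \\
\text{s.t. } & \sum_{S \in c} f_{c,S} = p_c, \ \forall c, \\
& q_n = \sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S} \ge \underline{q}_n, \ \forall n, \\
\text{over } & f_{c,S} \ge 0.
\end{aligned}
$$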

While typical techniques for solving a convex programming

problem can be applied to solve this NUM problem, such so-

lutions may not be directly translatable into a scheduling pol-

icy for our time-varying network. Also, a solution based on

solving the convex programming problem would require the

knowledge of the probability distribution of system states. In

practice, this knowledge may not always be available to the

server. Thus, a scheduling policy that makes decisions based

only on past history and current system state is needed.
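To make the dependence on the state distribution concrete, here is a minimal sketch that solves a toy instance of the convex program by grid search. The instance is an assumption made purely for illustration: two clients with utilities $U_n(q) = \log q$, a state A with probability 0.8 in which either client (but not both) can be served, a state B with probability 0.2 in which only client 1 can be served, and minimum rates of 0.1. The function name `solve_toy_num` is hypothetical.

```python
import math


def solve_toy_num(p_a=0.8, p_b=0.2, q_min=(0.1, 0.1), steps=800):
    """Grid search over x = f_{A,{1}}.

    The equality constraints fix the remaining variables: f_{A,{2}} = p_a - x
    and f_{B,{1}} = p_b (serving nobody is never better, since utilities increase).
    """
    best = None
    for i in range(steps + 1):
        x = p_a * i / steps            # f_{A,{1}}
        q1 = x + p_b                   # f_{A,{1}} + f_{B,{1}}
        q2 = p_a - x                   # f_{A,{2}}
        if q1 < q_min[0] or q2 < q_min[1] or q2 <= 0.0:
            continue                   # violates a minimum service rate
        total = math.log(q1) + math.log(q2)
        if best is None or total > best[0]:
            best = (total, x, q1, q2)
    return best


total_utility, x_opt, q1_opt, q2_opt = solve_toy_num()
```

In this instance the optimum serves client 1 in state A only about 0.3 of the time, so that both long-term rates are close to 0.5. Note that the solver takes the probabilities `p_a` and `p_b` as inputs, which is exactly the knowledge that may be unavailable to the server.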

5.2 An On-line Scheduling Policy

We now describe an on-line scheduling policy, and prove

that it is utility-optimal. This scheduling policy only requires

information on the past history and the current system state,

and, surprisingly, does not need any knowledge of the actual

probability distribution of system states. Thus, it is easily im-

plementable. The scheduling policy is based on dual decom-

position, which is similar to the approach used in Lin and

Shroff [13], although they do not consider network dynam-

ics.

We assign a Lagrange multiplier $\lambda_n$ for each constraint $\sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S} \ge \underline{q}_n$. The Lagrangian of the resulting convex programming problem is:

$$
L(f,\lambda) = \sum_{n=1}^{N} U_n\Big(\sum_{c,S: S \in c,\, n \in S} f_{c,S}\Big) + \sum_{n=1}^{N} \lambda_n \Big(\sum_{c,S: S \in c,\, n \in S} f_{c,S} - \underline{q}_n\Big),
$$

where $f$ denotes the vector consisting of $[f_{c,S}]$, for all $c$ and $S$, and $\lambda$ denotes the vector $[\lambda_n]$. The dual objective function is:

$$
D(\lambda) = \max_{f:\ f_{c,S} \ge 0;\ \sum_{S \in c} f_{c,S} = p_c,\ \forall c} L(f,\lambda).
$$

Since the minimum long-term service rate requirements, $[\underline{q}_n]$, are strictly feasible, there exist $[f_{c,S}]$ such that

$$
\sum_{S \in c} f_{c,S} = p_c, \quad \text{and} \quad \sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S} > \underline{q}_n,
$$

for all $n$. By Slater’s condition, $\min_{\lambda} D(\lambda)$ equals the maximum total utility.

Let $\lambda(k) = [\lambda_n(k)]$ denote Lagrange multipliers that are used in the $k$th period. The maximum total utility can be achieved by solving two subproblems: maximizing

$$
\lim_{k\to\infty} E[L(f(k),\lambda)],
$$

for any given $\lambda$, and minimizing

$$
\lim_{k\to\infty} E[D(\lambda(k))].
$$

We will refer to these two subproblems as the primal problem

and dual problem, respectively.

We first discuss how to solve the primal problem. Due to the constraint $\sum_{S \in c} f_{c,S} = p_c$, $[f_{c,S}]$ is an optimal solution if and only if

$$
\frac{\partial L}{\partial f_{c,S}} := \sum_{n \in S} \big(U'_n(q_n) + \lambda_n\big) = \max_{S' \in c} \frac{\partial L}{\partial f_{c,S'}}
$$

for every $c$ and $S$ such that $f_{c,S} > 0$. Recall that $U_n(\cdot)$ is strictly concave, and $U'_n(\cdot)$ is a strictly decreasing function. Suppose, at some time interval $k$ with $c(k) = c$, there exists a subset $S$ feasible under $c$ such that $\sum_{n \in S} (U'_n(q_n(k)) + \lambda_n) > \sum_{n \in S'} (U'_n(q_n(k)) + \lambda_n)$ for all other subsets $S'$ feasible under $c$. We wish to narrow the difference between $S$ and all other $S'$. One obvious choice would be to schedule the subset $S$ in the time interval, so as to increase $q_n(k+1)$ for all $n \in S$, and thus decrease $\sum_{n \in S} (U'_n(q_n(k)) + \lambda_n)$. In fact, as we shall see in the lemma below, selecting the feasible subset $S$ that maximizes $\sum_{n \in S} (U'_n(q_n(k)) + \lambda_n)$ also points in the steepest ascent direction of $L$.

DEFINITION 4. Given $\lambda$ and $f(k)$, a max-weight scheduling policy is one that schedules a feasible subset $S \in c(k)$ that maximizes $\sum_{n \in S} (U'_n(q_n(k)) + \lambda_n)$ in each time interval $k$.

LEMMA 1. Let $\Delta f(k)$ be the vector consisting of the elements $\Delta f_{c,S}(k) := f_{c,S}(k+1) - f_{c,S}(k)$ for all $c$ and $S$. Given $\lambda$ and $f(k)$, the max-weight scheduling policy also maximizes $E[\nabla L(f,\lambda) \cdot \Delta f(k) \mid f_{c,S}(k)]$.

PROOF. Recall that we have:

$$
f_{c,S}(k+1) =
\begin{cases}
\frac{k-1}{k}\, f_{c,S}(k) + \frac{1}{k}, & \text{if } c(k) = c \text{ and } S \text{ is scheduled at the } k\text{th interval,} \\
\frac{k-1}{k}\, f_{c,S}(k), & \text{otherwise.}
\end{cases}
$$

Thus, $\Delta f_{c,S}(k) = \frac{1}{k}(1 - f_{c,S}(k))$ if $c(k) = c$ and $S$ is scheduled, and $\Delta f_{c,S}(k) = -\frac{1}{k} f_{c,S}(k)$, otherwise. Let $\hat{f}_{c,S}(k)$ be the probability that $c(k) = c$ and $S$ is scheduled under the max-weight scheduling policy. We then have:

$$
\begin{aligned}
E[\nabla L(f,\lambda)\, \Delta f(k) \mid f_{c,S}(k)] &= \sum_{c,S} E\Big[\frac{\partial L}{\partial f_{c,S}}\, \Delta f_{c,S}(k) \,\Big|\, f_{c,S}(k)\Big] \\
&= \frac{1}{k} \Big\{ \sum_{c,S} \big[\hat{f}_{c,S}(k) - f_{c,S}(k)\big] \Big[\sum_{n \in S} \big(U'_n(q_n(k)) + \lambda_n\big)\Big] \Big\}.
\end{aligned}
\tag{1}
$$

Since $\mathrm{Prob}\{c(k) = c\} = p_c$, $\sum_{S} \hat{f}_{c,S}(k) = p_c$. The term $E[\nabla L(f,\lambda)\, \Delta f(k) \mid f_{c,S}(k)]$ is maximized by setting:

$$
\hat{f}_{c,S}(k) =
\begin{cases}
p_c, & \text{if } S = \arg\max_{S \in c} \sum_{n \in S} \big(U'_n(q_n(k)) + \lambda_n\big), \\
0, & \text{otherwise.}
\end{cases}
\tag{2}
$$

This is achieved by selecting the feasible subset $S$ that maximizes $\sum_{n \in S} (U'_n(q_n(k)) + \lambda_n)$ for every system state $c$.

Next, we prove that the max-weight scheduling policy solves the primal problem.

THEOREM 1. Under the max-weight scheduling policy,

L(f(k),λ) → D(λ), as k → ∞,

for any given λ.

PROOF. Since the utility functions are infinitely differentiable, $L(f,\lambda)$ is also infinitely differentiable. By Taylor’s theorem, we have that for any $f$, $\Delta f$, and fixed $\lambda$,

$$
L(f + \Delta f, \lambda) = L(f,\lambda) + \nabla L(f,\lambda)\, \Delta f + r(f, \Delta f, \lambda),
$$

where $|r(f, \Delta f, \lambda)| < a(\lambda) |\Delta f|^2$, for some constant $a(\lambda)$. Now we have,

$$
\begin{aligned}
E[L(f(k+1),\lambda) \mid f(k)] &\ge L(f(k),\lambda) + E\big[\nabla L(f(k),\lambda)\, \Delta f(k) - a(\lambda) |\Delta f(k)|^2 \,\big|\, f(k)\big] \\
&\ge L(f(k),\lambda) + E[\nabla L(f(k),\lambda)\, \Delta f(k) \mid f(k)] - \tilde{a}/k^2,
\end{aligned}
\tag{3}
$$

where $\Delta f(k)$ is defined as in Lemma 1 and $\tilde{a}$ is some constant. The last inequality follows because $|\Delta f_{c,S}(k)| \le \frac{1}{k}$ for all $c, S$.

Let $p_c(k) := \sum_{S \in c} f_{c,S}(k)$, which is the empirical frequency that system state $c$ occurs, and let $\hat{f}_{c,S}(k)$ be defined as in the proof of Lemma 1. The values of $\hat{f}_{c,S}(k)$ under the max-weight scheduling policy are given as in (2). Further, let $\hat{\mu}_c(k) := \max_{S \in c} \sum_{n \in S} (U'_n(q_n(k)) + \lambda_n)$, for all $c$. Using (1) and (2),

$$
E[\nabla L(f(k),\lambda)\, \Delta f(k) \mid f(k)] \ge \frac{1}{k} \sum_{c} \big(p_c - p_c(k)\big)\, \hat{\mu}_c(k).
$$

Since $k\, p_c(k+1)$ is the number of occurrences of system state $c$ until the $k$th time interval, and the system states in different time intervals are i.i.d., by the law of iterated logarithm [2], there exists some positive constant $b$ such that

$$
\limsup_{k\to\infty} \frac{k\,(p_c(k+1) - p_c)}{k^{1/2} (\log\log k)^{1/2}} \le b.
$$

Thus, for large enough $k$, there exists constant $\tilde{b}$ such that

$$
E[\nabla L(f(k),\lambda)\, \Delta f(k) \mid f(k)] \ge -\frac{(\log\log k)^{1/2}}{k^{3/2}}\, \tilde{b}.
$$

For large enough $k$, (3) can hence be bounded by

$$
E[L(f(k+1),\lambda) \mid f(k)] \ge L(f(k),\lambda) - \frac{(\log\log k)^{1/2}}{k^{3/2}}\, \tilde{b} - \frac{\tilde{a}}{k^2}.
\tag{4}
$$

As we can see in the above, $E[L(f(k+1),\lambda) \mid f(k)]$ is “almost” larger than $L(f(k),\lambda)$ except for two diminishing terms. For large enough constant $d$, $-L(f,\lambda) + d$ is also nonnegative for all $f$, and by (4) it is therefore a “near positive submartingale” as in [15]. Since $\sum_{k=1}^{\infty} \big[\frac{(\log\log k)^{1/2}}{k^{3/2}}\, \tilde{b} + \frac{\tilde{a}}{k^2}\big] < \infty$, Exercise II-4 in [15] shows that $L(f(k),\lambda)$ converges almost surely.

Next, we need to show that $\lim_{k\to\infty} L(f(k),\lambda) = D(\lambda)$. We prove this by contradiction. Recall that the necessary and sufficient condition for $L(f,\lambda) = D(\lambda)$ is that $\sum_{n \in S} (U'_n(q_n) + \lambda_n) = \max_{S' \in c} \sum_{n \in S'} (U'_n(q_n) + \lambda_n)$, for all $c, S$ such that $f_{c,S} > 0$. Suppose $L(f(k),\lambda)$ does not converge to $D(\lambda)$. Then, there exist $\delta > 0$, $\epsilon > 0$ such that for all large enough $k$, there exist $(c_k, S_k)$ so that $f_{c_k,S_k} > \delta$ and $\sum_{n \in S_k} (U'_n(q_n) + \lambda_n) < \max_{S' \in c} \sum_{n \in S'} (U'_n(q_n) + \lambda_n) - \epsilon$. Evaluating the term $E[\nabla L(f(k),\lambda)\, \Delta f(k) \mid f(k)]$ under this condition shows that $E[\nabla L(f(k),\lambda)\, \Delta f(k) \mid f(k)] > \frac{1}{k}\delta\epsilon$. Since there exists some constant $K$ such that for all $k > K$, $\frac{\tilde{a}}{k^2} < \frac{1}{k}\delta\epsilon/2$, we obtain $E[L(f(k+1),\lambda) \mid f(k)] > L(f(k),\lambda) + \frac{1}{k}\delta\epsilon - \frac{\tilde{a}}{k^2} > L(f(k),\lambda) + \frac{1}{k}\delta\epsilon/2$. Since $\sum_{k=1}^{\infty} \frac{1}{k} = \infty$, we also have

$$
\lim_{k\to\infty} E[L(f(k),\lambda)] = \infty,
$$

which is a contradiction. Thus, $\lim_{k\to\infty} L(f(k),\lambda) = D(\lambda)$.
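The behavior asserted by Theorem 1 can be observed in a small simulation of the max-weight scheduling policy. The sketch below is illustrative only and rests on stated assumptions: two clients with $U_n(q) = \log q$ (so $U'_n(q) = 1/q$), multipliers fixed at $\lambda_n = 0$, a state A with probability 0.8 in which either client alone can be served, and a state B with probability 0.2 in which only client 1 can be served. For this instance the rates that maximize total utility are 0.5 for each client. The function names are hypothetical.

```python
import random


def marginal_utility(q):
    """U'_n(q) for the assumed utility U_n(q) = log(q), floored to avoid division by zero."""
    return 1.0 / max(q, 1e-9)


def max_weight_subset(state, q, lam):
    """Pick the feasible subset with the largest sum of U'_n(q_n(k)) + lambda_n."""
    return max(state, key=lambda s: sum(marginal_utility(q[n]) + lam[n] for n in s))


def simulate(num_intervals=20000, seed=1):
    rng = random.Random(seed)
    state_a = [(), (1,), (2,)]     # either client can be served, but not both
    state_b = [(), (1,)]           # only client 1 can be served
    q = {1: 0.0, 2: 0.0}
    lam = {1: 0.0, 2: 0.0}         # fixed multipliers, as in the primal problem
    for k in range(1, num_intervals + 1):
        # The environment draws the state; the policy never reads the value 0.8.
        state = state_a if rng.random() < 0.8 else state_b
        chosen = max_weight_subset(state, q, lam)
        alpha = 1.0 / k
        for n in q:
            q[n] = (1.0 - alpha) * q[n] + (alpha if n in chosen else 0.0)
    return q


q_final = simulate()
```

The scheduler only looks at the current state and the running service rates, yet both rates settle near 0.5, which is consistent with the claim that the distribution of system states need not be known.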

Next we discuss how to solve the dual problem: minλD(λ).

We use the subgradient method to solve it. We first find a

subgradient for D(λ).

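LEMMA 2. Let $v_n := \sum_{c,S: S \in c,\, n \in S} f^*_{c,S} - \underline{q}_n$, where $[f^*_{c,S}]$ maximizes $L(f,\lambda)$. Then $v = [v_n]$ is a subgradient of $D(\lambda)$.

PROOF. Let $\lambda'$ be an arbitrary vector. We have:

$$
\begin{aligned}
D(\lambda') &= \max_{f:\ \sum_{S \in c} f_{c,S} = p_c,\ \forall c} L(f,\lambda') \\
&\ge L(f^*,\lambda') = L(f^*,\lambda) + (\lambda' - \lambda)^T v \\
&= D(\lambda) + (\lambda' - \lambda)^T v.
\end{aligned}
$$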
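The middle equality in this chain uses only the fact that the Lagrangian is affine in the multipliers. A short expansion, written here as a sketch with $\underline{q}_n$ denoting the minimum service rate requirement of client $n$, makes this explicit: the utility terms do not depend on the multipliers and cancel in the difference.

```latex
\begin{aligned}
L(f^*,\lambda') - L(f^*,\lambda)
  &= \sum_{n=1}^{N} (\lambda'_n - \lambda_n)
     \Big( \sum_{c,S:\, S \in c,\, n \in S} f^*_{c,S} - \underline{q}_n \Big) \\
  &= \sum_{n=1}^{N} (\lambda'_n - \lambda_n)\, v_n
   = (\lambda' - \lambda)^T v .
\end{aligned}
```

The final equality of the chain, $L(f^*,\lambda) = D(\lambda)$, holds because $f^*$ is a maximizer of $L(\cdot,\lambda)$ over the feasible set.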