Utility-optimal scheduling in time-varying wireless networks with delay constraints.
Utility-Optimal Scheduling in Time-Varying Wireless
Networks with Delay Constraints
I-Hong Hou
CSL and Department of Computer Science
University of Illinois
Urbana, IL, 61801, USA
ihou2@illinois.edu
P. R. Kumar
CSL and Department of ECE
University of Illinois
Urbana, IL 61801, USA
prkumar@illinois.edu
ABSTRACT
Clients in wireless networks may have per-packet delay con-
straints on their traffic. Further, in contrast to wireline net-
works, the wireless medium is subject to fading. In such a
time-varying environment, we consider the system problem of
maximizing the total utility of clients, where the utilities are
determined by their long-term average rates of being served
within their delay constraints. We also allow for the addi-
tional fairness requirement that each client may require a cer-
tain minimum service rate. This overall model can be applied
to a wide range of applications, including delay-constrained
networks, mobile cellular networks, and dynamic spectrum
allocation.
We address this problem through convex programming. We
propose an on-line scheduling policy and prove that it is utility-
optimal. Surprisingly, this policy does not need to know the
probability distribution of system states. We also design an
auction mechanism where clients are scheduled and charged
according to their bids. We prove that the auction mechanism
restricts any selfish client from improving its utility by faking
its utility function. We also show that the auction mechanism
schedules clients in the same way as that done by the on-line
scheduling policy. Thus, the auction mechanism is both truth-
ful and utility-optimal. Finally, we design specific algorithms
that implement the auction mechanism for a variety of appli-
cations.
Categories and Subject Descriptors
C.2.1 [COMPUTER-COMMUNICATION NETWORKS ]: Net-
work Architecture and Design —Wireless communication
General Terms
Theory
This material is based upon work partially supported by USARO
under Contract Nos. W911NF-08-1-0238 and W-911-NF-0710287,
AFOSR under Contract FA9550-09-0121, and NSF
under Contract No. CNS-07-21992.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
MobiHoc’10, September 20–24, 2010, Chicago, Illinois, USA.
Copyright 2010 ACM 978-1-4503-0183-1/10/09 ...$10.00.
Keywords
Scheduling, utility maximization, NUM, deadlines, delays, auc-
tion
1. INTRODUCTION
This paper studies the problem of network utility maximiza-
tion (NUM) in time-varying wireless networks, when packets
have delay constraints. It is motivated by two considerations.
First, delay constraints are becoming important as wireless
networks are increasingly used for serving real-time traffic
such as VoIP and video streaming. Also, delay constraints are
critical to applications such as networked control where inner
control loops can be destabilized by excessive delay, and even
outer control loops used for coordination are safety critical,
e.g., vehicular traffic control. Second, unlike wireline net-
works, where the network topologies and link capacities are
static, wireless networks are time-varying in that the available
bandwidth and link qualities are all time-varying due to both
node mobilities and channel fading.
We first propose a system model that allows us to pose and
provide solutions that address the dynamics of different en-
tities involved. The model characterizes the system state by
the collection of subsets of clients that can be served under
it, and makes no other assumptions about the network. Thus,
the model is general enough to be applied to a wide range of
applications, including delay-constrained wireless networks
with rate adaptation, mobile cellular networks, and dynamic
spectrum allocation; we will specifically focus on the prob-
lem of delay-constrained wireless networks. The performance
of a client is defined by the long-term average rate that it is
served, subject to per-packet delay constraints. The utility
gained by a client is determined by its service rate through its
utility function. Further, to impose a certain degree of fair-
ness and avoid starving some clients, we assume that each
client requires a certain lower bound on its service rate. The
NUM problem under this model is to maximize the total long-
term utility with respect to network dynamics, per-packet de-
lay constraints, and minimum service requirements of clients.
To solve the foregoing NUM problem in time-varying wire-
less networks, we first formulate it as a convex programming
problem in which network dynamics are considered. We then
propose an on-line scheduling policy for the NUM problem
that does not require knowledge of the probability distribu-
tion of system states. We prove that the policy converges to
the optimal solution of the convex programming problem and
thus solves the NUM problem.
In practice, utility functions may be known only to the clients.
Thus, clients may provide a fake utility function to gain more
service. To ensure that clients reveal their true utility func-
tions, we design an auction that is based on the Vickrey-
Clarke-Groves (VCG) mechanism [3,5,20] for scheduling. In
this auction, clients announce their bids in each instance, and
the server schedules service and charges clients based on their
bids as well as the system state. We prove that this auction is
truthful, meaning that a selfish client cannot strictly increase
its net utility by lying about its utility function. We also show
that the schedule derived from the auction is the same as that
from the on-line scheduling policy for solving the NUM. Thus,
this auction mechanism also achieves maximum total utility.
We also discuss how to implement the auction mechanism
for three possible applications: delay-constrained wireless networks,
mobile cellular networks, and dynamic spectrum allo-
cation. For each of the three applications, we derive specific
algorithms for both scheduling clients and charging them.
Finally, we provide simulation results for the three applica-
tions. We compare our proposed policies against state-of-the-
art policies for each application. The compared policies either
only focus on satisfying the minimum service requirements of
clients or consider utilities on a per-interval basis rather than
long-term average performance. Simulation results show that
these policies can result in low utility and serious unfairness.
This suggests that, where long-term average performance is
concerned, these compared policies are not applicable. On
the other hand, our proposed policies not only satisfy the minimum
service requirements for all clients but also achieve the
highest utilities in all three applications.
The rest of the paper is organized as follows. Section 2 sum-
marizes related work. Section 3 describes a system model for
time-varying wireless networks and defines the NUM prob-
lem. We demonstrate that several applications can be de-
scribed by our system model in Section 4. Section 5 formu-
lates the NUM problem as a convex programming problem
and describes a simple on-line scheduling policy that solves it.
Section 6 designs an auction mechanism that not only makes
clients report their true utility functions but also achieves the
maximum total long-term utility. Section 7 discusses algo-
rithms for implementing the auction mechanism under sev-
eral applications. Simulation results are presented in Section
8. Finally, Section 9 concludes this paper.
2. RELATED WORK
First, we note that there is no work other than [8], to our
knowledge, that addresses utility maximization when packets
have delay constraints. Rather, utility maximization has been
studied in the context of throughput only. Second, we note
that even such work that studies the NUM problem mostly
studies it in the context of static networks, and cannot be ap-
plied to time-varying wireless networks. The only work [8]
that studies delay constraints in a utility maximization frame-
work, also considers only a static network.
The work on network utility maximization was initiated by
Kelly [10] and Kelly, Maulloo, and Tan [11], who considered
utility-optimal rate control algorithms in wireline networks.
Lin and Shroff [13] have considered the NUM problem in
wireless networks with multi-path routing. These works as-
sume the network topology is static. Liu, Chong, and Shroff
[14] and O’Neill, Goldsmith, and Boyd [17] have both considered
the NUM problem in time-varying environments.
However, they have evaluated the performance of clients on a
per-interval basis. Yi and Chiang [21] have summarized other
existing work on the NUM problem.
Shakkottai and Srikant [19] and Raghunathan et al. [18]
have studied maximizing total throughput for delay-constrained
traffic over unreliable wireless links. Their results, however,
may result in serious unfairness. Hou and Kumar [7] have
studied an analytical model for delay-constrained wireless networks
and proposed feasibility-optimal scheduling policies that
satisfy the minimum service requirements of clients. Their
work has not considered utilities gained by clients. As noted
above, the work [8] has proposed a utility-optimal scheduling
policy for delay-constrained traffic over unreliable wire-
less links. This work only treats the case when link reliabili-
ties are time-invariant and does not consider rate adaptation.
Thus, it is not suitable for networks with fading channels or
with rate adaptation.
Dynamic spectrum allocation has also attracted increasing
research interest. Gandhi et al. [4] have proposed a framework
for spectrum auctions. Zhou et al. [22] and Jia et al.
[9] have studied designing truthful spectrum auction mecha-
nisms. These works have focused on the scenario where spec-
trum auctions are carried out infrequently.
3. SYSTEM MODEL
Consider a wireless system with one server and N clients,
numbered {1,2,...,N}. Time is divided into time intervals.
Each client desires some service in each time interval. The
service requirement within a time interval of each client is
indivisible; that is, the server can only either fully meet the
demand of a client or not serve it at all. At the beginning
of each time interval, the server obtains the current channel
condition. Both the demands of clients and the channel con-
dition can be time-varying, and together we call them the
system state in each time interval. The server can learn the
system state by either polling, probing, or estimating. Since
these operations are costly and cannot be carried out too fre-
quently, the server assumes that the system state does not
change within an interval. Due to limited wireless resources,
the server may be only able to serve some particular subsets
of clients in each system state. To be more specific, we denote
the system state in the kth time interval by c(k) ∈ C,
where C is a finite set, and {c(1),c(2),...} are i.i.d. random
variables with Prob{c(k) = c} =: pc. In practice, not only
the system state but also the distribution of system states can
be time-varying. However, the distribution of system states
usually evolves on a much slower time scale compared to the
length of a time interval and thus is assumed to be static.
A subset S of clients is said to be feasible under system state
c if it is possible for the server to serve all clients in S. For
simplicity, we represent a system state c by the collection of
subsets S that are feasible under c. Thus, we have S ∈ c if S is
feasible under c, and S ∉ c otherwise. Since the constraints of
feasible sets can be defined arbitrarily, this model can be ap-
plied to a wide range of applications. We will illustrate some
examples of applications in Section 4. In particular, it can ac-
commodate per-packet delay constraints and rate adaptation.
The server is in charge of choosing a feasible subset S ∈
c(k) to serve in each time interval k. The server’s choice is
described by a scheduling policy.
DEFINITION 1. Let h(k) be the system’s history up to the
kth time interval. A scheduling policy is a function
$\eta : (h(k-1), c(k)) \to 2^{\{1,2,\dots,N\}}$, such that given history h(k − 1) and
current system state c(k), the server chooses the subset
η[h(k − 1), c(k)] ∈ c(k) of clients to serve. All clients
n ∈ η[h(k − 1), c(k)] are considered to be served in the kth time interval.
As the system state is time-varying, it is less meaningful
to discuss the performance of clients on a per-interval basis.
Rather, we measure the performance of a client through its
average rate of being served. We define the service rate of a
client n as follows:
DEFINITION 2. Let $q_n(k)$ denote the service rate of client n
up to the kth time interval, defined by the recursion:
\[
q_n(k+1) =
\begin{cases}
(1-\alpha_k)\, q_n(k) + \alpha_k, & \text{if client } n \text{ is served at the } k\text{th interval},\\
(1-\alpha_k)\, q_n(k), & \text{otherwise},
\end{cases}
\]
where $0 \le \alpha_k \le 1$, for all k. The long-term service rate of
client n is defined as $q_n := \liminf_{k\to\infty} q_n(k)$.
In the above definition, $\alpha_k$ is a system-wide variable that is
assumed to be the same for all clients. For example, by setting
$\alpha_k \equiv \frac{1}{k}$, $q_n(k)$ becomes the proportion of time intervals that
client n is being served. On the other hand, setting all $\alpha_k$
to be a constant makes $q_n(k)$ a weighted average of service
where recent service is more important than service a long
time ago.
We further assume that each client n has a utility function
$U_n(\cdot)$. The utility functions are strictly increasing, strictly
concave, and infinitely differentiable. At the kth time interval,
client n receives utility that is equivalent to an amount
$\frac{1}{\alpha_k} U_n(q_n(k))$ of money. The scaling factor $\frac{1}{\alpha_k}$ of the amount
of money received by client n is set to equalize the effects of
events in each interval. Section 6.1 provides a more detailed
explanation of this setting. The long-term utility of client n is
defined as $\liminf_{k\to\infty} U_n(q_n(k))$, which equals $U_n(q_n)$ since
$U_n(\cdot)$ is continuous.
Finally, to enforce some form of fairness among clients, we
also assume that each client n has a requirement of minimum
long-term service rate, $\underline{q}_n$; that is, it requires $q_n \ge \underline{q}_n$ with
probability 1. We assume that the minimum long-term service
rate requirements are strictly feasible, that is, there exists
some scheduling policy that ensures $q_n > \underline{q}_n$, for all n.
We are interested in maximizing the total long-term utility
of the network, $\sum_{n=1}^{N} U_n(q_n)$. The NUM problem of this
framework can hence be expressed as:
\[
\begin{aligned}
\text{Max } & \sum_{n=1}^{N} U_n(q_n)\\
\text{s.t. } & \text{Network dynamics and feasibility constraints,}\\
& \text{and } q_n \ge \underline{q}_n, \ \forall n.
\end{aligned}
\]
However, this formulation only considers the long-term behavior
of the system. A solution to this NUM problem may
not translate into an implementable scheduling policy, which
would have to make decisions on a per-interval basis. Thus,
we also wish to design utility-optimal scheduling policies.
DEFINITION 3. A scheduling policy η is said to be utility-optimal
if, by applying η, $\sum_{n=1}^{N} U_n(q_n(k))$ converges to the
optimal value of the NUM problem almost surely as k → ∞.
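The service-rate recursion of Definition 2 can be illustrated with a minimal Python sketch. The function names `update_service_rate` and `run`, the initial value `q0 = 0`, and the sample service pattern are assumptions made for illustration; they do not appear in the paper.

```python
def update_service_rate(q, served, alpha):
    """One step of the recursion: q(k+1) = (1 - alpha_k) q(k) + alpha_k if served."""
    return (1.0 - alpha) * q + (alpha if served else 0.0)


def run(served_flags, alpha_of_k, q0=0.0):
    """Apply the recursion over intervals k = 1, 2, ... (q0 is an assumed start value)."""
    q = q0
    for k, served in enumerate(served_flags, start=1):
        q = update_service_rate(q, served, alpha_of_k(k))
    return q


# Hypothetical service pattern over four intervals.
served = [True, False, True, True]

# alpha_k = 1/k: q becomes the fraction of intervals in which the client was served.
q_avg = run(served, lambda k: 1.0 / k)

# Constant alpha_k: recent intervals weigh more than older ones.
q_ewma = run(served, lambda k: 0.5)

print(q_avg, q_ewma)
```

With `alpha_k = 1/k` the result is the plain fraction of served intervals (3 out of 4 here), while a constant `alpha_k` gives an exponentially weighted average.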
4. EXAMPLES OF APPLICATIONS
We will first discuss several applications that can be de-
scribed by our framework.
4.1 Delay-Constrained Wireless Networks with
Rate Adaptation
We consider the model introduced in [6] that characterizes
a system where clients generate real-time traffic, and which
was extended to allow fading in [7]. Assume that there are N
wireless clients and one access point (AP). Time is assumed
to be slotted and divided into time intervals, each consisting
of T consecutive time slots. At the beginning of each time
interval, packets for each client arrive at the AP. Each client
specifies a delay bound of τn time slots, with τn ≤ T. The
packet for client n is to be delivered no later than the τn-th
time slot in each time interval. Otherwise, the packet expires
and is dropped from the system.
Due to channel fading, the link qualities between the AP
and the client can be time-varying. We assume that the AP
has full knowledge of the current channel state. The AP then
applies rate adaptation for error-free transmissions. Thus, the
transmission rates for different clients can be different, which
in turn results in different transmission times. We define tc,n
as the number of time slots required for an error-free trans-
mission for client n under system state c. A scheduling policy
is one which selects an ordered subset S = {s1,s2,...,sm}
of clients and transmits packets for clients in S according to
the order. The ordered subset is considered feasible under
system state c if packets for all clients in S can be delivered
before their respective delay bounds, or, to be more specific,
$\sum_{n=1}^{i} t_{c,s_n} \le \tau_{s_i}$, for all 1 ≤ i ≤ m. In this scenario, the
service rate of each client reflects its timely throughput.
4.2 Mobile Cellular Network
Consider a mobile cellular network with a base station and
N users. The system may have more than one channel, but
each channel can be occupied by at most one user at any given
time. We assume that time is slotted, where a time slot corresponds
to a time interval in the system model. The length
of a time slot is defined as the time needed for transmitting
a packet plus any control overhead. Also, due to mobility,
the link qualities between the base station and a user can be
time-varying. We consider an ON/OFF model for links. The
link between a user and the base station is considered ON if
a packet can be transmitted between the two without errors,
and considered OFF otherwise. We assume that the base station
never transmits packets to users with OFF links. Thus,
the system state at any time slot can be described as the set of
users with ON links. A subset S of users is considered feasible
under some system state c if for any user n ∈ S, the link between
user n and the base station is ON, and the size of S is
smaller than or equal to the number of channels. A scheduling
policy is one which chooses, based on current system state
and past history, a feasible subset of users and assigns channels
to each of them. Finally, the service rate of each user is
equal to its throughput.
4.3 Dynamic Spectrum Allocation
Consider a scenario with one primary user and N secondary
users. The primary user holds licenses for several channels
over a large geographical region. TV broadcasters are typical
examples of primary users. The primary user only uses
a portion of its licensed channels and is willing to allocate
unused channels to secondary users. The secondary users
are scattered throughout the region and constrained to much
smaller transmission powers compared to the primary user,
which makes spatial reuse possible. Still, some secondary
users may interfere with each other and thus cannot be al-
located the same channel simultaneously. We use a conflict
graph G = (V,E) to represent the interference relations be-
tween secondary users, where V is the set of secondary users
and there is an edge between two users if they interfere with
each other.
The primary user allocates unused channels periodically.
Since the network activity of the primary user can be time-
varying, the number of unused channels can also be time-
varying. A scheduling policy is one which chooses disjoint
subsets of secondary users for each unused channel, with the
constraint that two users that are assigned the same channel
cannot share a link in the conflict graph.
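The feasibility conditions of the three applications in this section can be sketched as simple checks. This is an illustrative Python sketch only: the function names, the dictionary-based representation of transmission times and delay bounds, and the list-of-sets representation of a channel assignment are assumptions, not part of the paper.

```python
def feasible_delay(order, t, tau):
    """Section 4.1: an ordered subset is feasible if every packet is
    delivered no later than its client's delay bound.
    t[n]: slots needed by client n in the current state; tau[n]: its delay bound."""
    elapsed = 0
    for client in order:
        elapsed += t[client]
        if elapsed > tau[client]:
            return False
    return True


def feasible_cellular(subset, on_links, num_channels):
    """Section 4.2: all chosen users have ON links and fit in the channels."""
    return set(subset) <= set(on_links) and len(subset) <= num_channels


def feasible_spectrum(assignment, edges, num_unused):
    """Section 4.3: assignment is a list of user sets, one per unused channel.
    The sets must be disjoint, and no two users sharing a channel may be
    joined by an edge of the conflict graph."""
    if len(assignment) > num_unused:
        return False
    seen = set()
    for users in assignment:
        users = set(users)
        if seen & users:
            return False
        seen |= users
        if any(u in users and v in users for u, v in edges):
            return False
    return True


# Hypothetical data for illustration.
t = {1: 2, 2: 3}
tau = {1: 2, 2: 5}
print(feasible_delay([1, 2], t, tau), feasible_delay([2, 1], t, tau))
print(feasible_cellular([1], on_links={1, 3}, num_channels=1))
print(feasible_spectrum([{1, 3}, {2}], edges=[(1, 2)], num_unused=2))
```

In each case the system state c corresponds to the collection of all subsets for which the relevant check returns true, which is the only information the general model uses.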
5. A GENERAL METHOD FOR UTILITY
MAXIMIZATION
In this section, we propose a general method for solving the
NUM problem in time-varying wireless networks with mini-
mum service requirements. We first show that the NUM prob-
lem can be formulated as a convex programming problem.
Although the formulation requires explicit knowledge of the
distribution of system states, i.e., the values of probability
[pc], we will show the surprising result that there exists an on-
line scheduling policy that does not need any information on
the distribution of system states, and is, further, also utility-
optimal. For simplicity, we assume that αk := 1/k, that is,
qn(k) is the proportion of time intervals that client n has been
served until the kth time interval. We will discuss the case
where αk is a constant for all k at the end of this section.
5.1 Convex Programming Formulation
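Define $p_c(k)$ and $f_{c,S}(k)$, for all system states c and subsets
S ∈ c, recursively, as follows:
\[
p_c(k+1) =
\begin{cases}
\frac{k-1}{k}\, p_c(k) + \frac{1}{k}, & \text{if } c(k) = c,\\
\frac{k-1}{k}\, p_c(k), & \text{otherwise},
\end{cases}
\]
and
\[
f_{c,S}(k+1) =
\begin{cases}
\frac{k-1}{k}\, f_{c,S}(k) + \frac{1}{k}, & \text{if } c(k) = c \text{ and } S \text{ is scheduled at the } k\text{th interval},\\
\frac{k-1}{k}\, f_{c,S}(k), & \text{otherwise}.
\end{cases}
\]
These two variables can be thought of as the relative frequencies
of occurrence of the system state c and the event that
subset S is scheduled under system state c, respectively. Also,
we have $\sum_{S \in c} f_{c,S}(k) = p_c(k)$ and $\sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S}(k) = q_n(k)$
for all c and k. For ease of discussion, we only consider
scheduling policies where $f_{c,S} := \lim_{k\to\infty} f_{c,S}(k)$ exists
for all system states c and subsets S.
Thus, we have $\sum_{S \in c} f_{c,S} = p_c$ and $\sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S} = q_n$. The NUM
problem can be described as the following convex programming
problem:
\[
\begin{aligned}
\text{Max } & \sum_{n=1}^{N} U_n(q_n) = \sum_{n=1}^{N} U_n\Big(\sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S}\Big)\\
\text{s.t. } & \sum_{S \in c} f_{c,S} = p_c, \ \forall c,\\
& q_n = \sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S} \ge \underline{q}_n, \ \forall n,\\
\text{over } & f_{c,S} \ge 0.
\end{aligned}
\]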
While typical techniques for solving a convex programming
problem can be applied to solve this NUM problem, such so-
lutions may not be directly translatable into a scheduling pol-
icy for our time-varying network. Also, a solution based on
solving the convex programming problem would require the
knowledge of the probability distribution of system states. In
practice, this knowledge may not always be available to the
server. Thus, a scheduling policy that makes decisions based
only on past history and current system state is needed.
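The bookkeeping behind this formulation can be checked numerically. The following Python sketch tracks the empirical frequencies with the 1/k recursions and verifies the two identities relating $f_{c,S}(k)$ to $p_c(k)$ and $q_n(k)$. The function name `track`, the tuple-based encoding of states, the uniform draw of states, and the placeholder policy are all assumptions made for illustration.

```python
import random


def track(states, choose, num_intervals, seed=0):
    """states: list of system states, each a tuple of feasible subsets
    (each subset a tuple of client ids). choose(c) returns the scheduled subset.
    States are drawn i.i.d. and uniformly here, purely as an assumption."""
    rng = random.Random(seed)
    p = {c: 0.0 for c in states}
    f = {(c, S): 0.0 for c in states for S in c}
    q = {n: 0.0 for c in states for S in c for n in S}
    for k in range(1, num_intervals + 1):
        c = rng.choice(states)
        S = choose(c)
        decay = (k - 1) / k
        for key in p:
            p[key] = decay * p[key] + (1 / k if key == c else 0.0)
        for key in f:
            f[key] = decay * f[key] + (1 / k if key == (c, S) else 0.0)
        for n in q:
            q[n] = decay * q[n] + (1 / k if n in S else 0.0)
    return p, f, q


# Two hypothetical states over clients 1 and 2.
states = [((1,), (2,)), ((1,), (2,), (1, 2))]
# Placeholder policy: always schedule the last feasible subset listed.
p, f, q = track(states, lambda c: c[-1], num_intervals=200)

for c in states:
    # Frequencies of scheduled subsets under c add up to the frequency of c.
    assert abs(sum(f[(c, S)] for S in c) - p[c]) < 1e-9
for n in q:
    # Service rate of n equals the total frequency of scheduled subsets containing n.
    assert abs(sum(v for (c, S), v in f.items() if n in S) - q[n]) < 1e-9
```

Note that the recursions themselves never use the true probabilities of the states; only the simulated draw does.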
5.2 An On-line Scheduling Policy
We now describe an on-line scheduling policy, and prove
that it is utility-optimal. This scheduling policy only requires
information on the past history and the current system state,
and, surprisingly, does not need any knowledge of the actual
probability distribution of system states. Thus, it is easily im-
plementable. The scheduling policy is based on dual decom-
position, which is similar to the approach used in Lin and
Shroff [13], although they do not consider network dynam-
ics.
We assign a Lagrange multiplier $\lambda_n$ for each constraint
$\sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S} \ge \underline{q}_n$. The Lagrangian of the
resulting convex programming problem is:
\[
L(f,\lambda) = \sum_{n=1}^{N} U_n\Big(\sum_{c,S: S \in c,\, n \in S} f_{c,S}\Big)
+ \sum_{n=1}^{N} \lambda_n \Big(\sum_{c,S: S \in c,\, n \in S} f_{c,S} - \underline{q}_n\Big),
\]
where f denotes the vector consisting of $[f_{c,S}]$, for all c and
S, and λ denotes the vector $[\lambda_n]$. The dual objective function
is:
\[
D(\lambda) = \max_{f:\ f_{c,S} \ge 0;\ \sum_{S \in c} f_{c,S} = p_c,\ \forall c} L(f,\lambda).
\]
Since the minimum long-term service rate requirements,
$[\underline{q}_n]$, are strictly feasible, there exist $[f_{c,S}]$ such that
\[
\sum_{S \in c} f_{c,S} = p_c, \quad \text{and} \quad \sum_{c} \sum_{S: S \in c,\, n \in S} f_{c,S} > \underline{q}_n,
\]
for all n. By Slater’s condition, $\min_{\lambda} D(\lambda)$ equals the maximum
total utility.
Let $\lambda(k) = [\lambda_n(k)]$ denote Lagrange multipliers that are
used in the kth period. The maximum total utility can be
achieved by solving two subproblems: maximizing
\[
\lim_{k\to\infty} E[L(f(k),\lambda)],
\]
for any given λ, and minimizing
\[
\lim_{k\to\infty} E[D(\lambda(k))].
\]
We will refer to these two subproblems as the primal problem
and dual problem, respectively.
We first discuss how to solve the primal problem. Due to
the constraint $\sum_{S \in c} f_{c,S} = p_c$, $[f_{c,S}]$ is an optimal solution if
and only if
\[
\frac{\partial L}{\partial f_{c,S}} := \sum_{n \in S} \big(U_n'(q_n) + \lambda_n\big) = \max_{S' \in c} \frac{\partial L}{\partial f_{c,S'}}
\]
for every c and S such that $f_{c,S} > 0$. Recall that $U_n(\cdot)$ is
strictly concave, and $U_n'(\cdot)$ is a strictly decreasing function.
Suppose, at some time interval k with c(k) = c, there exists a
subset S feasible under c such that $\sum_{n \in S} (U_n'(q_n(k)) + \lambda_n) >
\sum_{n \in S'} (U_n'(q_n(k)) + \lambda_n)$ for all other subsets S' feasible under
c. We wish to narrow the difference between S and all other
S'. One obvious choice would be to schedule the subset S in
the time interval, so as to increase $q_n(k+1)$ for all n ∈ S, and
thus decrease $\sum_{n \in S} (U_n'(q_n(k)) + \lambda_n)$. In fact, as we shall
see in the lemma below, selecting the feasible subset S that
maximizes $\sum_{n \in S} (U_n'(q_n(k)) + \lambda_n)$ also points in the steepest
ascent direction of L.
DEFINITION 4. Given λ and f(k), a max-weight scheduling
policy is one that schedules a feasible subset S ∈ c(k) that
maximizes $\sum_{n \in S} (U_n'(q_n(k)) + \lambda_n)$ in each time interval k.
LEMMA 1. Let Δf(k) be the vector consisting of the elements
$\Delta f_{c,S}(k) := f_{c,S}(k+1) - f_{c,S}(k)$ for all c and S. Given
λ and f(k), the max-weight scheduling policy also maximizes
$E[\nabla L(f,\lambda) \cdot \Delta f(k) \mid f_{c,S}(k)]$.
PROOF. Recall that we have:
\[
f_{c,S}(k+1) =
\begin{cases}
\frac{k-1}{k}\, f_{c,S}(k) + \frac{1}{k}, & \text{if } c(k) = c \text{ and } S \text{ is scheduled at the } k\text{th interval},\\
\frac{k-1}{k}\, f_{c,S}(k), & \text{otherwise}.
\end{cases}
\]
Thus, $\Delta f_{c,S}(k) = \frac{1}{k}(1 - f_{c,S}(k))$ if c(k) = c and S is scheduled,
and $\Delta f_{c,S}(k) = -\frac{1}{k} f_{c,S}(k)$, otherwise. Let $\hat{f}_{c,S}(k)$ be
the probability that c(k) = c and S is scheduled under the
max-weight scheduling policy. We then have:
\[
\begin{aligned}
E[\nabla L(f,\lambda) \Delta f(k) \mid f_{c,S}(k)]
&= \sum_{c,S} E\Big[\frac{\partial L}{\partial f_{c,S}} \Delta f_{c,S}(k) \,\Big|\, f_{c,S}(k)\Big]\\
&= \frac{1}{k} \Big\{ \sum_{c,S} \big[\hat{f}_{c,S}(k) - f_{c,S}(k)\big] \Big[\sum_{n \in S} \big(U_n'(q_n(k)) + \lambda_n\big)\Big] \Big\}. \qquad (1)
\end{aligned}
\]
Since Prob{c(k) = c} = $p_c$, $\sum_{S} \hat{f}_{c,S}(k) = p_c$. The term
$E[\nabla L(f,\lambda) \Delta f(k) \mid f_{c,S}(k)]$ is maximized by setting:
\[
\hat{f}_{c,S}(k) =
\begin{cases}
p_c, & \text{if } S = \arg\max_{S \in c} \sum_{n \in S} \big(U_n'(q_n(k)) + \lambda_n\big),\\
0, & \text{otherwise}.
\end{cases}
\qquad (2)
\]
This is achieved by selecting the feasible subset S that maximizes
$\sum_{n \in S} (U_n'(q_n(k)) + \lambda_n)$ for every system state c.
Next, we prove that the max-weight scheduling policy solves
the primal problem.
THEOREM 1. Under the max-weight scheduling policy,
L(f(k),λ) → D(λ), as k → ∞,
for any given λ.
PROOF. Since the utility functions are infinitely differentiable,
L(f,λ) is also infinitely differentiable. By Taylor’s theorem,
we have that for any f, Δf, and fixed λ,
\[
L(f + \Delta f, \lambda) = L(f,\lambda) + \nabla L(f,\lambda) \Delta f + r(f, \Delta f, \lambda),
\]
where $|r(f,\Delta f,\lambda)| < a(\lambda) |\Delta f|^2$, for some constant a(λ).
Now we have,
\[
\begin{aligned}
E[L(f(k+1),\lambda) \mid f(k)]
&\ge L(f(k),\lambda) + E[\nabla L(f(k),\lambda) \Delta f(k) - a(\lambda) |\Delta f(k)|^2 \mid f(k)]\\
&\ge L(f(k),\lambda) + E[\nabla L(f(k),\lambda) \Delta f(k) \mid f(k)] - \tilde{a}/k^2, \qquad (3)
\end{aligned}
\]
where Δf(k) is defined as in Lemma 1 and $\tilde{a}$ is some constant.
The last inequality follows because $|\Delta f_{c,S}(k)| \le \frac{1}{k}$ for all c, S.
Let $p_c(k) := \sum_{S \in c} f_{c,S}(k)$, which is the empirical frequency
that system state c occurs, and let $\hat{f}_{c,S}(k)$ be defined as in
the proof of Lemma 1. The values of $\hat{f}_{c,S}(k)$ under the max-weight
scheduling policy are given as in (2). Further, let
$\hat{\mu}_c(k) := \max_{S \in c} \sum_{n \in S} (U_n'(q_n(k)) + \lambda_n)$, for all c. Using
(1) and (2),
\[
E[\nabla L(f(k),\lambda) \Delta f(k) \mid f(k)] \ge \frac{1}{k} \sum_{c} (p_c - p_c(k))\, \hat{\mu}_c(k).
\]
Since $k\,p_c(k+1)$ is the number of occurrences of system
state c until the kth time interval, and the system state in
each time interval is i.i.d. distributed, by the law of iterated
logarithm [2], there exists some positive constant b such that
$\limsup_{k\to\infty} \frac{k (p_c(k+1) - p_c)}{k^{1/2} (\log\log k)^{1/2}} \le b$. Thus, for large enough k,
there exists constant $\tilde{b}$ such that
\[
E[\nabla L(f(k),\lambda) \Delta f(k) \mid f(k)] \ge -\frac{(\log\log k)^{1/2}}{k^{3/2}}\, \tilde{b}.
\]
For large enough k, (3) can hence be bounded by
\[
E[L(f(k+1),\lambda) \mid f(k)] \ge L(f(k),\lambda) - \frac{(\log\log k)^{1/2}}{k^{3/2}}\, \tilde{b} - \frac{\tilde{a}}{k^2}. \qquad (4)
\]
As we can see in the above, E[L(f(k + 1),λ)|f(k)] is “almost”
larger than L(f(k),λ) except for two diminishing terms.
For large enough constant d, −L(f,λ) + d is also nonnegative
for all f, and by (4) it is therefore a “near positive submartingale”
as in [15]. Since
$\sum_{k=1}^{\infty} \big[\frac{(\log\log k)^{1/2}}{k^{3/2}}\, \tilde{b} + \frac{\tilde{a}}{k^2}\big] < \infty$,
Exercise II-4 in [15] shows that L(f(k),λ) converges almost
surely.
Next, we need to show that $\lim_{k\to\infty} L(f(k),\lambda) = D(\lambda)$.
We prove this by contradiction. Recall that the necessary and
sufficient condition for L(f,λ) = D(λ) is that
$\sum_{n \in S} (U_n'(q_n) + \lambda_n) = \max_{S' \in c} \sum_{n \in S'} (U_n'(q_n) + \lambda_n)$, for all c, S such that
$f_{c,S} > 0$. Suppose L(f(k),λ) does not converge to D(λ).
Then, there exists δ > 0, ε > 0 such that for all large enough
k, there exist $(c_k, S_k)$ so that $f_{c_k,S_k} > \delta$ and
$\sum_{n \in S_k} (U_n'(q_n) + \lambda_n) < \max_{S' \in c} \sum_{n \in S'} (U_n'(q_n) + \lambda_n) - \epsilon$. Evaluating the
term E[∇L(f(k),λ)Δf(k)|f(k)] under this condition shows
that $E[\nabla L(f(k),\lambda) \Delta f(k) \mid f(k)] > \frac{1}{k} \delta \epsilon$. Since there exists
some constant K such that for all k > K, $\frac{\tilde{a}}{k^2} < \frac{1}{k} \delta \epsilon / 2$, we
obtain
$E[L(f(k+1),\lambda) \mid f(k)] > L(f(k),\lambda) + \frac{1}{k} \delta \epsilon - \frac{\tilde{a}}{k^2} > L(f(k),\lambda) + \frac{1}{k} \delta \epsilon / 2$.
Since $\sum_{k=1}^{\infty} \frac{1}{k} = \infty$, we also have
\[
\lim_{k\to\infty} E[L(f(k),\lambda)] = \infty,
\]
which is a contradiction. Thus, $\lim_{k\to\infty} L(f(k),\lambda) = D(\lambda)$.
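The max-weight scheduling policy of Definition 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the utility $U_n(q) = w_n \log(1+q)$ (chosen so that $U_n'$ is finite at $q = 0$), the function names, the tuple encoding of states, and the single-state example are all hypothetical and not taken from the paper. The state probabilities are used only to simulate the environment; the scheduling decision itself never reads them.

```python
import random


def max_weight_subset(c, q, lam, dU):
    """Pick the feasible subset S in state c that maximizes
    sum over n in S of (U_n'(q_n(k)) + lambda_n)."""
    return max(c, key=lambda S: sum(dU[n](q[n]) + lam[n] for n in S))


def simulate(states, probs, dU, lam, num_intervals, seed=0):
    """Run the policy with alpha_k = 1/k and return the service rates."""
    rng = random.Random(seed)
    q = {n: 0.0 for n in lam}
    for k in range(1, num_intervals + 1):
        c = rng.choices(states, weights=probs)[0]  # environment draw only
        S = max_weight_subset(c, q, lam, dU)
        for n in q:
            q[n] = (k - 1) / k * q[n] + (1 / k if n in S else 0.0)
    return q


# Hypothetical example: one state in which either client 1 or client 2
# (but not both) can be served, equal weights, zero multipliers.
w = {1: 1.0, 2: 1.0}
dU = {n: (lambda x, wn=w[n]: wn / (1.0 + x)) for n in w}  # derivative of w*log(1+q)
lam = {1: 0.0, 2: 0.0}
states = [((1,), (2,))]

q = simulate(states, probs=[1.0], dU=dU, lam=lam, num_intervals=1000)
print(q)
```

By symmetry, the utility-optimal split in this toy example serves each client half of the time, and the simulated service rates settle close to that value.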
Next we discuss how to solve the dual problem: minλD(λ).
We use the subgradient method to solve it. We first find a
subgradient for D(λ).
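LEMMA 2. Let $v_n := [\sum_{c,S: S \in c,\, n \in S} f^*_{c,S} - \underline{q}_n]$, where $[f^*_{c,S}]$
maximizes L(f,λ). Then v is a subgradient of D(λ).
PROOF. Let λ' be an arbitrary vector. We have:
\[
\begin{aligned}
D(\lambda') &= \max_{f:\ \sum_{S \in c} f_{c,S} = p_c,\ \forall c} L(f,\lambda')\\
&\ge L(f^*,\lambda') = L(f^*,\lambda) + (\lambda' - \lambda)^T v\\
&= D(\lambda) + (\lambda' - \lambda)^T v.
\end{aligned}
\]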