
Cross-layer Optimization for Streaming Scalable

Video over Fading Wireless Networks

Honghai Zhang, Member, IEEE, Yanyan Zheng, Mohammad A. (Amir) Khojastepour, Member, IEEE,

and Sampath Rangarajan, Senior Member, IEEE

Abstract—We present a cross-layer design of transmitting

scalable video streams from a base station to multiple clients

over a shared fading wireless network by jointly considering the

application layer information and the wireless channel conditions.

We first design a long-term resource allocation algorithm that

determines the optimal wireless scheduling policy in order to

maximize the weighted sum of average video quality of all

streams. We prove that our algorithm achieves the global op-

timum even though the problem is not concave in the parameter

space. We then devise two on-line scheduling algorithms that

utilize the results obtained by the long-term resource allocation

algorithm for user and packet scheduling as well as video frame

dropping strategy. We compare our schemes with existing video

scheduling and buffer management schemes in the literature

and simulation results show our proposed schemes significantly

outperform existing ones.

Index Terms—Scheduling, video streaming, scalable video,

fading, wireless networks

I. INTRODUCTION

Recent years have witnessed increasing popularity of

streaming video over wireless networks as both wireless data

communication and video compression techniques undergo

significant progress. On one hand, the data transmission rates

of wireless networks are steadily growing, e.g., 1Gbps target

rate for nomadic and 100Mbps for mobile users in 4G systems

[9]. On the other hand, H.264/MPEG4-AVC [1] achieves

more efficient video compression and the Scalable Video

Coding (SVC) extension [16] of H.264/MPEG4-AVC obtains

both high coding efficiency and high scalability. Nevertheless,

because the wireless medium is often shared by many users,

it is still important to adapt to the wireless channel conditions

in order to satisfy the stringent bandwidth and delay requirements

of video traffic.

Streaming video over wireless networks has been studied

extensively by many researchers, but much of the previous

work ([7], [8], [25], [15], [22], [26] and the references

therein) has focused on the single-stream scenario where the

transmitter of a video streaming service adaptively adjusts its

transmission rate, re-transmission, video-truncation, Forward

Error Correction (FEC) and/or Hybrid ARQ (HARQ) policy

in order to optimize the received video quality. Multi-user

streaming where the wireless radio resources are shared by

multiple streaming users has also been considered in [24],

[10], [11], [13], [14], [12]. However, none of them considered
exploiting the fading wireless channel characteristics and the
scalable video encoding jointly.

Manuscript received 19 March 2009; revised 20 October 2009.
Honghai Zhang, Mohammad A. Khojastepour, and Sampath Rangarajan are with NEC Laboratories America (email: {honghai,amir,sampath}@nec-labs.com).
Yanyan Zheng is with the Electrical Engineering Department of Stanford University (email: yyzheng@stanford.edu).

Realtime radio resource scheduling algorithms have been

studied in [17], [3], [18] by considering the delay requirement

and channel conditions. An alternative formulation based on
optimizing a concave utility function and rate control over
fading wireless networks has been considered in [6], [19], [4].
However, these algorithms only guarantee asymptotic convergence
and do not explicitly consider the needs of video applications (such
as the hard deadline constraint and bursty rate requirements).

In this work, we consider the cross-layer optimization of rate

adaptation and exploiting the multi-user diversity for video

streaming using scalable video coding over a shared fading

wireless channel. In a fading wireless channel, it is important

to exploit the multi-user diversity, i.e., by scheduling users

in relatively good channel conditions. With SVC [16], it is

possible to adapt the video transmission rate to the wireless

channel capacity.

We first develop an empirical model to relate the aver-

age video quality (measured by PSNR (Peak Signal-to-Noise

Ratio)) and the average throughput based on SVC [16]. We

then formulate the following cross-layer problem: maximizing

the weighted sum of video quality of all users subject to

the achievable long-term (ergodic) rate constraint under a

fading wireless channel model. To solve the problem, we

develop a long-term radio resource allocation algorithm which

determines the wireless scheduling policy and the parameters

used by the scheduling policy. We prove the convergence

and the optimality of the proposed algorithm under mild

conditions.

Aiming to exploit multi-user diversity, we propose two

online scheduling algorithms that meet the realtime video

traffic QoS (Quality of Service) requirement. The Static

scheduling algorithm only uses the results obtained from

the aforementioned long-term resource allocation algorithm.

The Dynamic scheduling algorithm further adapts the
scheduling parameters to the instantaneous rate and deadline
requirements of video traffic and to the wireless channel conditions.

The underlying objective function for the optimal scheduling

problem is non-convex/concave and non-differentiable. We

transform the problem to an equivalent, but differentiable

one. We then develop a gradient-based approach to solve the

problem and prove that it converges to the optimal solution

even though the objective function is non-convex/concave. We

also design frame dropping strategies that determine when and

which frames will be dropped.

We carry out extensive simulations to validate our proposed

schemes using SVC-encoded real video sequences. Simulation

results show that our proposed scheduling schemes achieve

significant improvement over existing real-time video scheduling
algorithms in the literature. Our proposed schemes obtain

up to 8-10 dB gain of average video quality compared to

several well-known existing schemes. Moreover, our proposed

schemes are much more robust under weak wireless channel

conditions. At low SINR, some videos are not decodable with

existing video scheduling schemes because of heavy packet

drop, while nearly all videos are decodable even at very weak

channel conditions by using the dynamic scheme. Finally, our

proposed algorithms are robust to channel estimation errors.

Our main contributions are threefold. First, we design a

long-term radio resource allocation algorithm and prove that it

achieves the global optimum of the objective function. Second,

we design two online scheduling algorithms and also prove the

optimality of the obtained scheduling parameters used by the

dynamic scheduling algorithm. Third, we develop intelligent

frame/layer dropping strategies based on both the long-term

resource allocation algorithm and the dynamic scheduling

algorithm.

The rest of the paper is organized as follows. In Section

II, we discuss SVC and the rate-quality model. The proposed

long-term radio resource allocation scheme is given in Section

III. We present two algorithms for on-line scheduling of video

streaming traffic in Section IV. Simulation results based on

SVC-encoded real video sequences are reported in Section V

and the paper is concluded in Section VI.

II. SCALABLE VIDEO CODING AND RATE-QUALITY

MODEL

SVC can refer to both the general concept of
scalable video coding and the specific extension [16] of
H.264/MPEG4-AVC [1]. As a general concept, an SVC stream

has a base layer and several enhancement layers. As long as

the base layer is received, the receiver can decode the video

stream. As more enhancement layers are received, the decoded

video quality is improved. For a detailed overview of SVC,

please refer to [16].

As in [27], we use PSNR (Peak Signal to Noise Ratio) as a

measure of video quality and develop a model to characterize

the relationship between the rate and PSNR. It turns out

that the relationship between the PSNR Si of video stream

i and the video rate r can be described as a piece-wise linear

function:

$$
S_i(r) =
\begin{cases}
S_i^0 + L_i (r - r_i^0) & \text{if } r \le r_i^0 \\
S_i^0 + K_i (r - r_i^0) & \text{if } r_i^0 < r \le r_i^{\max} \\
S_i^0 + K_i (r_i^{\max} - r_i^0) & \text{else}
\end{cases}
\qquad (1)
$$

where $r_i^0$, $S_i^0$ are the rate and PSNR when only the base
quality layers are included, and the last line specifies the
maximum encoding rate of a video sequence. In Fig. 1,
we plot both the sample points of rate and PSNR and the
model we have obtained for eight video sequences: News,
Hall, Silent, City, Foreman, Crew, Harbour, and Mobile, all
of which can be downloaded from [21] (more video sequence
models are obtained but are omitted for the clarity of the
figure). The points in the figure show the sample points of
the rate and PSNR and the lines show the regression models
we obtained based on Eq. (1). It can be seen that our model
is quite accurate. Note that $L_i > K_i > 0$ based on the models
of the real video sequences, so the function $S_i(r)$ is concave,
continuous, and non-decreasing with respect to r.

Fig. 1. Sample points and linear regression models of the rate and PSNR. (Axes: rate (Kbps) vs. PSNR (dB); curves: News, Hall, Silent, City, Foreman, Crew, Harbour, Mobile.)
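The piecewise-linear model of Eq. (1) can be sketched in a few lines of Python. This is only an illustration: the function name `psnr_model` and the numerical parameters are hypothetical and are not the regression values obtained for the sequences in Fig. 1.

```python
def psnr_model(r, r0, s0, L, K, r_max):
    """Piecewise-linear rate-PSNR model of Eq. (1).

    r0, s0 : rate and PSNR when only the base quality layers are included
    L, K   : slopes below and above r0 (the paper observes L > K > 0)
    r_max  : maximum encoding rate of the sequence
    """
    if r <= r0:
        return s0 + L * (r - r0)
    if r <= r_max:
        return s0 + K * (r - r0)
    return s0 + K * (r_max - r0)      # PSNR saturates beyond r_max


# Hypothetical parameters (not from the paper): rates in Kbps, PSNR in dB.
params = dict(r0=200.0, s0=30.0, L=0.05, K=0.01, r_max=1000.0)
for rate in (100.0, 200.0, 600.0, 1500.0):
    print(rate, psnr_model(rate, **params))
```

Because L > K > 0, the slope never increases with r, which is what makes the model concave and non-decreasing.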

Along with the model, we also obtain the priority of

different video layers. Note that a layer can be specified with

a temporal level t and quality level q (when the video is

encoded with temporal and quality scalability). We use q = 0

to represent a base (quality) layer. We then determine the

priority of each layer based on the average ratio of the PSNR

drop to the rate decrease for each truncated layer; the details

are omitted due to space limit. Once the priority is determined,

we can drop video layers in ascending order of their priority

to obtain any desired transmission rate and the corresponding

PSNR can be obtained based on the model (1).

Stuhlmuller et al. [20] also proposed a rate-distortion model.

The major difference between our model and their model is

that the different rates in the model in [20] are obtained by

encoding with different source coding rates and INTRA rates

while those in our model are obtained via dropping layers in

the increasing order of layer priority. As a result, our model

is more appropriate in wireless networks where video packets

may be dynamically dropped depending on wireless channel

conditions.

III. LONG-TERM RADIO RESOURCE ALLOCATION FOR

STREAMING VIDEO

When a wireless base station receives multiple requests

for streaming service of different video sequences, it has to

decide i) how much radio resources should be allocated to

each user in order to maximize the overall video quality, ii)

how to achieve the desired radio resource allocation. In a non-

fading wireless network, these problems can be simply solved

by Time Division Multiple Access (TDMA) (e.g. [11], [12]).

But mobile networks often experience fast fading. In a fading

wireless network, channel state dependent scheduling is often

used to exploit multi-user diversity.

We describe the problem formulation under a fading channel

in subsection III-A. In subsection III-B, we develop an algo-

rithm to find the optimal scheduling policy and the associated

parameters. In subsection III-C, we rigorously prove that

the obtained scheduling policy and its parameters achieve

the global optimum of the overall video quality defined in

subsection III-A under mild conditions.


A. Problem formulation

We assume that the instantaneous transmission rate for each
user i at each time slot t is

$$C(h_i(t)) = B \log\left(1 + \rho |h_i(t)|^2 / \Gamma\right) \qquad (2)$$

where B is the channel bandwidth, ρ is the transmission power,
$h_i(t)$ is the channel gain of user i, normalized with respect
to the standard deviation of noise (and interference), and Γ ≥ 1
represents the gap between the actual coding scheme and
the Shannon capacity. We assume a discrete-time system and

a TDMA transmission strategy: at each time slot, the server

picks only one user (which may depend on the channel states

of all the users) and sends information with the supportable

rate of the channel of this scheduled user.

Our objective is to maximize the weighted sum of average
PSNR of all users:

$$\max \sum_{i=1}^{n} w_i S_i(r_i) \quad \text{s.t. } r \in R \qquad (3)$$

where $w_i$ is the weight of user i, $S_i$ is the PSNR of user i as
modeled in Eq. (1), $r = (r_1, \dots, r_n)$ and R is the achievable
ergodic (long-term average) rate region.

The major challenge in solving problem (3) is that the

achievable rate region cannot be explicitly specified in a

fading environment. We next develop an algorithm to solve

the problem.

B. Algorithm to solve problem (3)

We first consider a randomized channel-state-dependent
TDMA scheduling strategy (which is also called static service
split (SSS) in [2]): given the channel states h of all users at
a time slot, the scheduler picks user i with probability $\pi_i(h)$
and $\sum_{i=1}^{n} \pi_i(h) \le 1$. The achievable rate region under this
randomized TDMA strategy is decided by the randomized
scheduling function π(h):

$$R = \{ r : r_i \le E_h[\pi_i(h) C(h_i)],\ 1 \le i \le n \} \qquad (4)$$

It was shown in [19] that the achievable rate region under this

randomized TDMA strategy is convex, bounded, and closed.

Because the PSNR function Si(r) is a non-decreasing

function of the rate r, there must exist an optimal solution

to problem (3) on the boundary of the achievable rate region

(i.e., which is Pareto-optimal). It was proved in [2] that for

any achievable rate vector r on the boundary of the achievable

rate region, there exists a vector $\mu = (\mu_i > 0,\ i = 1, \dots, n)$
such that the rate vector r can be achieved by the following
scheduling function:

$$\pi_i(h) > 0 \Rightarrow h \in \{ h : \mu_i C(h_i) \ge \mu_k C(h_k),\ \forall k \ne i \}. \qquad (5)$$

The solution in (5) is essentially a deterministic scheduling
function: at each time slot, the user with the largest $\mu_i C(h_i)$
is chosen for scheduling, if we ignore the set of channel
states h for which $\mu_i C(h_i) = \mu_k C(h_k)$, which has
zero probability if the distribution of h is continuous. We
call the scheduling policy defined in (5) the maximal
scheduling policy, because this scheduler attains only
rate vectors on the boundary, which are Pareto-optimal.

We can now view a boundary point $r = (r_1, \dots, r_n)$ as
a function of the parameter set $\mu = (\mu_1, \dots, \mu_n)$, where the
average rate $r_i$ of user i can be written as

$$r_i = E\left[ C(h_i)\, I\left(\mu_i C(h_i) > \mu_j C(h_j) \text{ for all } j \ne i\right) \right] \qquad (6)$$

where I is an indicator function.
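As a rough illustration of Eq. (6), the average rates under the maximal scheduling policy (5) can be estimated by simulation. The sketch below makes several assumptions that are not stated in the text: the logarithm in Eq. (2) is taken base 2, the SINR of each user is exponentially distributed (Rayleigh fading) and independent across users and slots, and the helper names `capacity` and `estimate_rates` are hypothetical.

```python
import math
import random


def capacity(gamma, bandwidth=1.0, gap=1.0):
    """Rate of Eq. (2) for SINR gamma; the base-2 logarithm is an assumption."""
    return bandwidth * math.log2(1.0 + gamma / gap)


def estimate_rates(mu, mean_sinr, slots=20000, seed=0):
    """Monte Carlo estimate of the average rates r_i(mu) in Eq. (6)."""
    rng = random.Random(seed)
    totals = [0.0] * len(mu)
    for _ in range(slots):
        # Rayleigh fading: |h|^2 is exponential, so the SINR is exponential too.
        rates = [capacity(rng.expovariate(1.0 / m)) for m in mean_sinr]
        # Maximal scheduling policy (5): serve the user with the largest mu_i * C_i.
        winner = max(range(len(mu)), key=lambda i: mu[i] * rates[i])
        totals[winner] += rates[winner]
    return [total / slots for total in totals]


print(estimate_rates([1.0, 1.0], [10.0, 10.0]))
print(estimate_rates([2.0, 1.0], [10.0, 10.0]))
```

Raising the weight of one user moves the operating point along the boundary of the rate region: that user's average rate grows at the expense of the others.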

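Let $\gamma_i = \rho|h_i|^2$ denote the SINR (Signal-to-Interference-plus-Noise Ratio)
of user i, with PDF (Probability Density Function)
$f_{\gamma_i}(\gamma)$ and CDF (Cumulative Distribution Function)
$F_{\gamma_i}(\gamma)$. Let $R(\gamma) = B \log(1 + \gamma/\Gamma)$. Thus, the rate $r_i$ can be
computed as

$$r_i(\mu) = \int_0^{\infty} R(\gamma) \prod_{j \ne i} F_{\gamma_j}\left( R^{-1}\left( \mu_i R(\gamma) / \mu_j \right) \right) f_{\gamma_i}(\gamma)\, d\gamma \qquad (7)$$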
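For a concrete channel model, the integral in Eq. (7) can be evaluated numerically. The sketch below assumes exponentially distributed SINRs (Rayleigh fading), a base-2 logarithm in R(γ), and a simple midpoint rule truncated at 40 times the mean SINR. These choices, and the name `rate_integral`, are illustrative assumptions rather than part of the original derivation.

```python
import math


def rate_integral(i, mu, mean_sinr, gap=1.0, bandwidth=1.0, steps=20000):
    """Midpoint-rule evaluation of Eq. (7) for exponentially distributed SINRs."""
    upper = 40.0 * mean_sinr[i]          # truncation point; the tail is negligible
    width = upper / steps
    total = 0.0
    for s in range(steps):
        gamma = (s + 0.5) * width
        rate = bandwidth * math.log2(1.0 + gamma / gap)            # R(gamma)
        density = math.exp(-gamma / mean_sinr[i]) / mean_sinr[i]   # f_{gamma_i}
        win_prob = 1.0
        for j in range(len(mu)):
            if j == i:
                continue
            # R^{-1}(mu_i R(gamma) / mu_j) = gap * ((1 + gamma/gap)^(mu_i/mu_j) - 1)
            threshold = gap * ((1.0 + gamma / gap) ** (mu[i] / mu[j]) - 1.0)
            win_prob *= 1.0 - math.exp(-threshold / mean_sinr[j])  # F_{gamma_j}
        total += rate * win_prob * density * width
    return total


print(rate_integral(0, [1.0, 1.0], [10.0, 10.0]))
print(rate_integral(0, [2.0, 1.0], [10.0, 10.0]))
```

The product over j is the probability that user i wins the slot given its own SINR, which is why the rate of user i increases with its own weight.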

Now our objective is simply to

$$\text{maximize } Y = \sum_{i=1}^{n} w_i S_i(r_i(\mu)) \qquad (8)$$

Note that $S_i(r_i)$ is a non-decreasing concave continuous
function of $r_i$.

A general approach to solve an optimization problem is the

gradient-based method. But the challenge with problem (8)

is i) the function Y is not differentiable at the points where
some $r_i = r_i^0$ or $r_i = r_i^{\max}$, ii) the function Y is generally

not concave (or convex) with respect to µ. When a function is

not concave or convex, the solution generated by the gradient-

based approach is often only a local maximum (or minimum)

but not a global maximum. In the following we develop an

algorithm and prove that the limit point of the algorithm is

the global maximum of problem (8) under mild conditions.

To resolve the first issue, we note that although $S_i$ is
not differentiable at $r_i = r_i^0$ or $r_i = r_i^{\max}$, it has one-sided
derivatives. For functions with one-sided derivatives, the
following lemma is a simple generalization of the first-order
necessary optimality conditions.

Lemma 1 If $\mu^*$ is a local maximum of Y, then

$$Y'_{i+}(\mu^*) \le 0 \text{ and } Y'_{i-}(\mu^*) \ge 0 \qquad (9)$$

for all 1 ≤ i ≤ n, where $Y'_{i+}(\mu^*) = \frac{\partial Y}{\partial \mu_i^+}(\mu^*)$ is the right-sided
partial derivative and $Y'_{i-}(\mu^*) = \frac{\partial Y}{\partial \mu_i^-}(\mu^*)$ is the left-sided
partial derivative.

Now with the one-sided partial derivative, we can obtain an
iterative modified gradient-based solution as follows. First we
compute the modified gradient $g^{(k)} = (g_1^{(k)}, g_2^{(k)}, \dots, g_n^{(k)})$ in
each iteration k:

$$
g_i^{(k)} =
\begin{cases}
0, & \text{if } Y'_{i+}(\mu^{(k)}) \le 0 \text{ and } Y'_{i-}(\mu^{(k)}) \ge 0 \\
Y'_{i+}(\mu^{(k)}), & \text{if } Y'_{i+}(\mu^{(k)}) > 0 \text{ and } Y'_{i+}(\mu^{(k)}) \ge -Y'_{i-}(\mu^{(k)}) \\
Y'_{i-}(\mu^{(k)}), & \text{otherwise}
\end{cases}
$$

Let $i_0 = \arg\max_i |g_i^{(k)}|$. The ascent direction is chosen to
be $d^{(k)} = (0, \dots, g_{i_0}^{(k)}, \dots, 0)$. In other words, the ascent
direction $d^{(k)}$ is all zero except the $i_0$-th element, which takes
the value $g_{i_0}^{(k)}$. Note that $g^{(k)}$ is not necessarily an ascent
direction but $d^{(k)}$ is unless $d^{(k)}$ is zero.
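The case analysis for the modified gradient and the choice of a single-coordinate ascent direction can be written compactly. The following is a minimal sketch, assuming the one-sided partial derivatives have already been computed and are passed in as two lists. The function names are hypothetical.

```python
def modified_gradient(right, left):
    """Modified gradient from right-sided and left-sided partial derivatives."""
    g = []
    for y_plus, y_minus in zip(right, left):
        if y_plus <= 0 and y_minus >= 0:
            g.append(0.0)        # necessary condition (9) already holds here
        elif y_plus > 0 and y_plus >= -y_minus:
            g.append(y_plus)     # increasing mu_i gives the larger improvement
        else:
            g.append(y_minus)    # otherwise use the left-sided derivative
    return g


def ascent_direction(g):
    """Keep only the coordinate with the largest |g_i| and zero the rest."""
    i0 = max(range(len(g)), key=lambda i: abs(g[i]))
    return [g[i] if i == i0 else 0.0 for i in range(len(g))]


g = modified_gradient(right=[0.5, -1.0, -0.2], left=[0.5, -3.0, 0.4])
print(g, ascent_direction(g))
```

Moving along one coordinate at a time is what guarantees an ascent direction even where the ordinary gradient does not exist.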


TABLE I
ALGORITHM A1

A1: Pseudo-code to find the solution to problem (8)
/* ε, σ, α0, β are positive constant values. ε is close to 0, 0 < σ < 1, and 0 < β < 1. */
1: Select a starting point µ_i^(0) = 1 for all 1 ≤ i ≤ n.
2: Compute d^(0) from µ^(0)
3: k = 0;
4: while |d^(k)|^2 ≥ ε do
5:   /* Choose the step size */
6:   α = α0
7:   while Y(µ^(k) + α d^(k)) − Y(µ^(k)) < σ α · |d^(k)|^2 do
8:     α = α · β
9:   end while
10:  µ^(k+1) = µ^(k) + α · d^(k)
11:  k = k + 1
12:  Recompute d^(k) from µ^(k)
13: end while

Stepsize selection using modified Armijo Rule: In any

gradient-based approach, we also need to choose the step size

α(k)appropriately in order for the algorithm to converge to a

local maximum. For this purpose, we apply a modified Armijo

rule [5] where the gradient ∇Y (µ(k)) is replaced with d(k)

because the gradient ∇Y (µ(k)) may not exist. The pseudo-

code of our algorithm A1 is listed in Table I.
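The loop of Table I can be sketched in Python on a toy objective. This is not the authors' implementation: the objective `Y`, its hand-written one-sided derivatives `right` and `left`, the constants, and the `max_iter` guard are all assumptions chosen so that the example is small and terminates.

```python
def armijo_coordinate_ascent(Y, right, left, n, eps=1e-12, sigma=0.1,
                             alpha0=1.0, beta=0.5, max_iter=1000):
    """Sketch of Algorithm A1 with caller-supplied one-sided derivatives."""
    mu = [1.0] * n                                  # line 1: mu_i = 1

    def direction(point):
        g = []
        for i in range(n):
            yp, ym = right(point, i), left(point, i)
            if yp <= 0 and ym >= 0:
                g.append(0.0)
            elif yp > 0 and yp >= -ym:
                g.append(yp)
            else:
                g.append(ym)
        i0 = max(range(n), key=lambda i: abs(g[i]))
        return [g[i] if i == i0 else 0.0 for i in range(n)]

    d = direction(mu)
    for _ in range(max_iter):                       # guard added for this sketch
        norm2 = sum(x * x for x in d)
        if norm2 < eps:                             # line 4: stopping condition
            break
        alpha = alpha0
        while True:                                 # lines 7-9: modified Armijo rule
            trial = [m + alpha * x for m, x in zip(mu, d)]
            if Y(trial) - Y(mu) >= sigma * alpha * norm2:
                break
            alpha *= beta
        mu = trial                                  # line 10
        d = direction(mu)                           # line 12
    return mu


# Toy objective (an assumption): a kink at mu_1 = 2 and a smooth term in mu_2.
def Y(mu):
    return -abs(mu[0] - 2.0) - (mu[1] - 3.0) ** 2


def right(mu, i):
    if i == 0:
        return 1.0 if mu[0] < 2.0 else -1.0
    return -2.0 * (mu[1] - 3.0)


def left(mu, i):
    if i == 0:
        return 1.0 if mu[0] <= 2.0 else -1.0
    return -2.0 * (mu[1] - 3.0)


print(armijo_coordinate_ascent(Y, right, left, n=2))
```

In the real problem the one-sided derivatives of Y with respect to µ would come from Eq. (7) and the slopes of Eq. (1). Here they are written by hand so that the control flow stays visible.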

We can now show the convergence of the algorithm, which

is summarized next. The proof follows similar ideas in the

standard proof of the limit points of the stationary points for

gradient-based methods (see e.g., [5]) and is omitted.

Lemma 2 Algorithm A1 converges to a point µ∗satisfying the

necessary conditions of a local maximum in Eq. (9), assuming

that the finite stopping condition (in line 4 of Algorithm A1)

is removed.

C. Optimality of the algorithm A1

For all practical purposes, we can assume that the PSNR

function Y is continuously differentiable, non-decreasing and

concave with respect to r (we can always connect the two line

segments using a smooth curve to make the function continu-

ously differentiable). Under this assumption, we can prove that

the limit point of algorithm A1 is a global maximum. This is

quite significant as the function Y is generally not concave

with respect to the controlling variable µ. We first prove the

following lemma.

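Lemma 3 For any rate vector r(µ) achieved using the maximal
scheduling policy (5) with parameter µ, the entire achievable
rate region lies below the hyperplane defined by

$$\left\{ r(\mu) + \frac{\partial r}{\partial \mu}\Big|_{\mu} u : u \in \mathbb{R}^n \right\} \qquad (10)$$

where $\frac{\partial r}{\partial \mu}\big|_{\mu}$ is a matrix whose element at the i'th row and j'th
column is $\frac{\partial r_i}{\partial \mu_j}\big|_{\mu}$. More precisely, for any other achievable
rate vector r′, we can find a vector $u \in \mathbb{R}^n$ such that
$r' \le r + \frac{\partial r}{\partial \mu}\big|_{\mu} u$ element wise.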

Proof. The intuition is quite clear. At any point r that is
achieved by the maximal scheduling scheme with parameter
vector µ, there is a tangent hyperplane. The entire achievable
rate region lies below the tangent hyperplane. Next we give a
rigorous proof.

Since r is on the boundary of the achievable rate region,
which is convex, from Proposition B.12 in [5], there is a vector
b ≠ 0 such that

$$b^T (r' - r) \le 0 \qquad (11)$$

for any r′ in the achievable rate region. We now show two
properties of the vector b. First, $b^T \frac{\partial r}{\partial \mu_k} = 0$ for all k. Let r′
be the rate achieved using parameter set $\mu' = (\mu'_1, \dots, \mu'_n)$
where $\mu'_j = \mu_j$ for all j ≠ k and $\mu'_k > \mu_k$. Dividing both sides
of Eq. (11) by $\mu'_k - \mu_k$, we obtain $b^T \frac{r' - r}{\mu'_k - \mu_k} \le 0$. Letting
$\mu'_k > \mu_k$ and $\mu'_k \to \mu_k^+$, we have $b^T \frac{\partial r}{\partial \mu_k} \le 0$. Similarly, if
we let $\mu'_k < \mu_k$ and $\mu'_k \to \mu_k^-$, we get $b^T \frac{\partial r}{\partial \mu_k} \ge 0$. Since
r is continuously differentiable with respect to µ, we obtain
$b^T \frac{\partial r}{\partial \mu_k} = 0$ for all 1 ≤ k ≤ n.

The second important property is b ≥ 0 element wise.
Suppose $b = (b_1, \dots, b_n)$. We choose $r'_k < r_k$ and $r'_j = r_j$
for all j ≠ k. Clearly, r′ is an achievable rate vector. So from
Eq. (11), we get $0 \ge b^T (r' - r) = b_k (r'_k - r_k)$. Since $r'_k < r_k$,
we have $b_k \ge 0$.

We now look at the intersection point of the line $\{r' + bt : t \in \mathbb{R}\}$
and the hyperplane (10). At the intersection point, we
have

$$r' + bt = r + \frac{\partial r}{\partial \mu}\Big|_{\mu} u \qquad (12)$$

Multiplying both sides by $b^T$ from the left and doing a little
re-arrangement, we get

$$b^T b\, t = b^T (r - r') + b^T \frac{\partial r}{\partial \mu}\Big|_{\mu} u = b^T (r - r') \ge 0 \qquad (13)$$

where the second equality is because of the first property of
vector b and the last inequality is from Eq. (11). Since $b^T b > 0$,
we obtain t ≥ 0. Therefore, the intersection point r′ + bt
is on the hyperplane (10) and is larger than or equal to r′
element wise (because t ≥ 0 is a scalar and $b_j \ge 0$ for all j). □

Next we prove the main theorem. Note that we can always
connect the two line segments of the function $S_i(r_i)$ with a
smooth curve such that the function $S_i(r_i)$ is continuously
differentiable.

Theorem 1 The limit point of the algorithm A1 is a global

maximum of function Y assuming that the PSNR function

Si(ri) is non-decreasing, concave, and continuously differen-

tiable with respect to ri.

Proof. Let µ and r(µ) be the limit point of the algorithm
A1. Clearly r(µ) is a boundary point of the achievable rate
region. Applying Lemma 3, for any achievable rate vector r′,
there exists a vector u such that $r' \le r + \frac{\partial r}{\partial \mu}\big|_{\mu} u$ element
wise. Because of the non-decreasing property of the function
Y over each $r_i$, we have

$$Y(r') \le Y\left( r + \frac{\partial r}{\partial \mu}\Big|_{\mu} u \right) \qquad (14)$$

We next show that, for any point $r + \frac{\partial r}{\partial \mu}\big|_{\mu} u$ on the
hyperplane (10),

$$Y\left( r(\mu) + \frac{\partial r}{\partial \mu}\Big|_{\mu} u \right) \le Y(r(\mu)), \qquad (15)$$

given that µ is the limit point of algorithm A1. With the
assumption that Y is continuously differentiable with respect
to both r and µ, from Lemma 2, the limit point µ of algorithm
A1 satisfies, for any 1 ≤ k ≤ n,

$$0 = \frac{\partial Y}{\partial \mu_k} = \sum_{i=1}^{n} \frac{\partial Y}{\partial r_i} \frac{\partial r_i}{\partial \mu_k}, \qquad (16)$$

where the second equality is from the chain rule of partial
derivatives.

Fig. 2. Sum PSNR vs. µ2 while µ1 = 1. (Axes: µ2 vs. Sum PSNR (dB).)

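Because of the concavity of Y over r,

$$
\begin{aligned}
Y\left( r + \frac{\partial r}{\partial \mu}\Big|_{\mu} u \right)
&\le Y(r) + \nabla Y(r)^T \frac{\partial r}{\partial \mu}\Big|_{\mu} u \\
&= Y(r) + \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{\partial Y}{\partial r_j} \frac{\partial r_j}{\partial \mu_k} u_k \\
&= Y(r) + \sum_{k=1}^{n} \underbrace{\left( \sum_{j=1}^{n} \frac{\partial Y}{\partial r_j} \frac{\partial r_j}{\partial \mu_k} \right)}_{=0} u_k \\
&= Y(r)
\end{aligned}
\qquad (17)
$$

where the last equality is from Eq. (16).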

Combining Eq. (14) and (17), we obtain that Y(r′) ≤
Y(r(µ)) for any achievable rate vector r′ provided that µ is
the limit point of the algorithm A1. □

In general, for a non-convex (or non-concave) problem,
a stationary point is only a local maximum (or minimum).
It appears surprising that although the objective function Y
is not a concave function with respect to µ, the stationary
point is the global maximum of Y. To further understand the
problem, we consider a two-user scenario and fix µ1 = 1.
The objective function Y is then a function of µ2. Using
parameters derived from two real video sequences, we plot
the sum PSNR of the two users vs. µ2 in Fig. 2. We can
observe from the figure that the function Y is neither convex
nor concave, and that the stationary points of the function are
not even unique. However, all the stationary points are indeed
global maxima of Y.

IV. ONLINE SCHEDULING FOR SVC VIDEO STREAMING

An online scheduling algorithm for real-time video appli-

cations needs to address three issues:

1. User scheduling: at each time slot, which user should be

scheduled?

2. Frame scheduling: after a user is selected, which pack-

ets/frames of the selected user should be transmitted?

3. Dropping strategy: when does it need to drop frames and

which frames should be dropped?

The resource allocation algorithm presented in the previous section
produces two results: 1) the vector µ used for user scheduling, and
2) the achievable average rate $r_i$ for each user i. We next
describe two online scheduling schemes exploiting these results.

In the first scheme, we simply apply the results obtained from

the resource allocation algorithm. In the second scheme, we

also consider the bursty and dynamic arrival and the deadline

of video frames.

A. Static scheduling scheme

In this first scheme, the vector µ = {µ1, µ2, ···, µn} is computed by
the long-term resource allocation algorithm in the previous
section and is fixed during the process of streaming (µ
may be re-computed when new users join or some users leave).
At each time slot, the user with the largest $\mu_i C_i$ is chosen for
scheduling, where $C_i$ is the current channel capacity of user i.

Frame scheduling for the selected user is based on both the

deadline and priority of the packets. We differentiate two types

of deadlines. The playout deadline is the time at which a frame needs to be
displayed. The decoding deadline is the earliest time that a frame

is needed for decoding itself or other frames. The decoding

deadline of a frame can be computed as the minimum playout

deadline of all frames that depend on it. We then schedule

packets of a given user in the order of their decoding deadline.

Those packets with the same decoding deadline are scheduled

in the order of their priority, which is obtained in Section II.
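The deadline rule and the packet ordering can be sketched as follows. The data layout (dictionaries keyed by frame name, packets as small dictionaries), the convention that a larger number means a higher priority, and the function names are assumptions made for the example.

```python
def decoding_deadlines(playout, depends_on):
    """Decoding deadline: earliest playout deadline among a frame and its dependents.

    playout    : frame -> playout deadline
    depends_on : frame -> list of frames it needs for decoding
    """
    deadline = dict(playout)
    for frame, t in playout.items():
        stack = list(depends_on.get(frame, []))
        seen = set()
        while stack:                      # walk every direct or indirect reference
            ref = stack.pop()
            if ref in seen:
                continue
            seen.add(ref)
            deadline[ref] = min(deadline[ref], t)
            stack.extend(depends_on.get(ref, []))
    return deadline


def transmission_order(packets, deadline):
    """Sort by decoding deadline, then by descending priority (assumed convention)."""
    return sorted(packets, key=lambda p: (deadline[p["frame"]], -p["priority"]))


playout = {"I0": 0, "P4": 4, "B2": 2, "B1": 1, "B3": 3}
depends_on = {"P4": ["I0"], "B2": ["I0", "P4"], "B1": ["I0", "B2"], "B3": ["B2", "P4"]}
deadlines = decoding_deadlines(playout, depends_on)
packets = [{"frame": "B3", "priority": 1}, {"frame": "P4", "priority": 5},
           {"frame": "B1", "priority": 2}, {"frame": "I0", "priority": 9}]
print(deadlines)
print([p["frame"] for p in transmission_order(packets, deadlines)])
```

In this hypothetical GOP, frame P4 is displayed last but is needed by B1 through B2, so its decoding deadline moves up to the playout deadline of B1.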

As to the dropping strategy, there are two types of dropping.

The first is late dropping, which happens when the playout

deadline of a packet is passed. If the base layer of a frame

is dropped, all dependent frames are dropped too. Note that

when all packets of a frame are either successfully transmitted

or dropped, the decoding deadlines of the frames that it depends
on need to be re-computed.

The second type of dropping is early dropping. With the

achievable rate computed from the previous section, we

can pre-determine which layers should be dropped based on

the rate requirement. We find the minimum priority such that

the average data rate of the packets with priority higher than or

equal to the minimum priority does not exceed the achievable

rate computed from the previous section. All packets with

priority lower than the minimum priority are dropped at the

beginning of the video streaming.
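A minimal sketch of the early-dropping rule follows, assuming that each priority level has a known average rate and that a larger number means a higher priority. The function name `minimum_priority` and the rates are hypothetical.

```python
def minimum_priority(layer_rates, achievable_rate):
    """Smallest priority p such that all layers with priority >= p fit the rate.

    layer_rates maps a priority level to the average rate of that layer.
    Returns None when even the highest-priority layer alone does not fit.
    """
    kept_rate = 0.0
    threshold = None
    for priority in sorted(layer_rates, reverse=True):    # most important first
        if kept_rate + layer_rates[priority] > achievable_rate:
            break
        kept_rate += layer_rates[priority]
        threshold = priority
    return threshold


# Hypothetical per-layer average rates in Kbps.
layer_rates = {4: 200.0, 3: 150.0, 2: 250.0, 1: 300.0}
print(minimum_priority(layer_rates, achievable_rate=650.0))
```

Packets whose priority is below the returned threshold would be the ones dropped at the beginning of the stream.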

B. Dynamic scheduling scheme

The dynamic scheduling is built on top of the static

scheduling scheme with two additional enhancements. The

first enhancement is on the user scheduling. At each time slot,

still, the user with the largest µiCiis selected for scheduling.

However, the vector µ is periodically updated to reflect both

the bursty arrival and the deadline of video traffic.


Assume that for user i, the size of total packets that need
to be transmitted before the deadline $T_j$ is $Q_j$. We define the
target rate $\bar{r}_i$ for user i to be

$$\bar{r}_i = \max_{j=1}^{n} \frac{Q_j}{T_j - t} \qquad (18)$$

where t is the current time.
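Eq. (18) can be computed with one pass over the pending packets sorted by deadline. In this sketch the packet representation, the time and size units, and the decision to skip packets whose deadline has already passed are assumptions.

```python
def target_rate(pending, now):
    """Target rate of Eq. (18): maximum over deadlines T_j of Q_j / (T_j - now).

    pending is a list of (deadline, size) pairs for packets not yet sent;
    Q_j is the total size that must be delivered no later than T_j.
    """
    best = 0.0
    due = 0.0
    for deadline, size in sorted(pending):
        if deadline <= now:
            continue                  # already late; left to the late-dropping rule
        due += size                   # cumulative size Q_j due by this deadline
        best = max(best, due / (deadline - now))
    return best


# Deadlines in milliseconds and sizes in bits (hypothetical values).
pending = [(100, 3000), (200, 1000), (400, 10000)]
print(target_rate(pending, now=0))
```

The maximum matters because a burst with a late deadline can demand a higher rate than the packets that are due first.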

Now with the target rate $\bar{r}_i$ for each user i, we ask whether
there exists a vector µ such that the target rate $\bar{r}_i$ can be
satisfied for every user i. We consider the following max-min
problem:

$$\text{maximize } \min_i \frac{r_i(\mu)}{\bar{r}_i} \qquad (19)$$

over all possible choices of µ. Clearly, if the optimal value
of Eq. (19) is larger than or equal to 1, the target rate $\bar{r}$ is
schedulable, and vice versa. The problem (19) is not easy to
solve as i) the problem is non-convex and ii) the derivative
does not exist, so the gradient-based approach cannot be
directly applied. But the following result was obtained in [6]:

Lemma 4 Solving problem (19) is equivalent to solving the
following problem: find µ such that

$$\frac{r_1(\mu)}{\bar{r}_1} = \frac{r_2(\mu)}{\bar{r}_2} = \dots = \frac{r_n(\mu)}{\bar{r}_n}, \qquad (20)$$

assuming that the channel distribution function $f_{\gamma_i}(\gamma)$ is a
continuous function of γ for all i.

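To solve problem (20), we define $g = (g_1, \dots, g_n)$, $\bar{g}$, and
h(µ) as follows:

$$
\begin{aligned}
g_i(\mu) &= r_i(\mu) / \bar{r}_i \\
\bar{g} &= \sum_{i=1}^{n} g_i(\mu) / n \\
h(\mu) &= \frac{1}{2} \sum_{i=1}^{n} \left( g_i(\mu) - \bar{g} \right)^2 .
\end{aligned}
\qquad (21)
$$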
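The quantities in Eq. (21) are cheap to evaluate once the rates are known. The sketch below takes the rates and targets as plain lists. The function name and the numbers are hypothetical.

```python
def balance_metrics(rates, targets):
    """Return g, g_bar and h of Eq. (21) for given rates r_i(mu) and targets."""
    g = [r / t for r, t in zip(rates, targets)]
    g_bar = sum(g) / len(g)
    h = 0.5 * sum((gi - g_bar) ** 2 for gi in g)
    return g, g_bar, h


g, g_bar, h = balance_metrics(rates=[300.0, 500.0], targets=[200.0, 500.0])
print(g, g_bar, h, min(g))
```

h is zero exactly when all ratios are equal, which is condition (20). The smallest ratio, min(g), is the value of the objective in problem (19) at the current µ.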

In each iteration k, we compute $\bar{g}^{(k)}$ and treat it as a fixed
value during the iteration. We then solve the problem of
minimizing the function h(µ) in Eq. (21) using the Gauss-Newton
method. To do so, we choose the direction

$$d^{(k)} = \left( \nabla g(\mu) \nabla g(\mu)^T \right)^{-1} \nabla g(\mu) \left( g(\mu) - \bar{g}^{(k)} \right) \qquad (22)$$

and update $\mu^{(k)}$ as

$$\mu^{(k+1)} = \mu^{(k)} - \alpha^{(k)} d^{(k)}$$

where $\alpha^{(k)}$ is the step size chosen by the Armijo rule. The
pseudo-code of the algorithm A2 is presented in Table II. The
following lemma follows from the stationarity of limit points
of gradient-based approaches (see e.g. [5], page 43).

Lemma 5 Algorithm A2 converges to a stationary point of

the function h(µ) (defined in Eq. (21)) assuming that the finite

stopping condition in the outer-loop is removed.

TABLE II
ALGORITHM A2

A2: Pseudo-code to solve problem (20)
/* ε, σ, α0, β are positive constant values. ε is close to 0, 0 < σ < 1, and 0 < β < 1. */
1: Select a starting point µ_i^(0) = 1 for all 1 ≤ i ≤ n.
2: Compute g(µ^(0)), ḡ^(0), and h(µ^(0)) using Eq. (21).
3: Compute d^(0) according to Eq. (22)
4: k = 0;
5: while ||g(µ^(k)) − ḡ^(k)||^2 ≥ ε do
6:   /* Choose the step size */
7:   α = α0
8:   while h(µ^(k)) − h(µ^(k) − α d^(k)) < σ α · ∇h(µ^(k))^T d^(k) do
9:     α = α · β
10:  end while
11:  µ^(k+1) = µ^(k) − α · d^(k)
12:  k = k + 1
13:  Recompute d^(k), g(µ^(k)), ḡ^(k) and h(µ^(k)).
14: end while

In general, the stationary point of a function h(µ) is not
necessarily a global minimum of the function because of the
non-convexity of the function. But, surprisingly, we can prove
that the limit point of the algorithm A2 is a global minimum
(i.e., 0) of the function h(µ), which is also the unique solution
to problems (19) and (20), even though the function h(µ) may
not be convex. The result is stated in the following theorem.
The proof is presented in the appendix.

Theorem 2 Algorithm A2 converges to the optimum solution

to the problems (19) and (20) assuming that the finite stopping

condition in the outer-loop is removed.

The second enhancement of this dynamic scheduling
scheme is on the frame dropping strategy. Note that the
algorithm A2 not only produces the new vector µ that is used for
user scheduling, but also the value $\eta = \max \min_{i=1}^{n} r_i / \bar{r}_i$. If η > 1,
it indicates that the target rate vector $\bar{r} = (\bar{r}_1, \bar{r}_2, \dots, \bar{r}_n)$
is achievable in the long term. If η < 1, it indicates that the
target rate vector $\bar{r}$ is unachievable and some video frames
need to be dropped. To overcome the short-term uncertainty
of wireless channels, in this dynamic scheduling scheme, we
maintain a target range $(\underline{\eta}, \bar{\eta})$ of η. During the periodic
re-evaluation of the vector µ and η, if $\eta < \underline{\eta}$, we will start to
drop packets, and if $\eta > \bar{\eta}$, we will put some dropped packets
(whose playout deadline has not yet passed) back to the queue.
To support the function of putting dropped packets back to
the queue, we do not really drop a packet unless its playout
deadline has passed. Instead, we simply mark it to be dropped.
When we need to drop some packets (i.e., $\eta < \underline{\eta}$) or need
to put some dropped packets back to the queue (i.e., $\eta > \bar{\eta}$),
we first choose the user using round robin. After the user is
selected, we choose the packets that have the lowest priority
within a window from now when dropping packets, and choose
the packets that have the highest priority among those marked
as dropped when putting dropped packets back to the queue.
Then we re-compute the vector µ and η using Algorithm A2
and repeat the process until the value η falls between $\underline{\eta}$ and
$\bar{\eta}$. We use the final result µ for subsequent user scheduling.

V. SIMULATION RESULTS

The following settings are used for simulations. All video

sequences are encoded at 30Hz with GOP size of 16 pictures

and an intra period of 64 frames (about 0.5Hz). Wireless

channels are generated based on Rayleigh fading model unless

specified otherwise. Channel bandwidth is assumed to be


1MHz unless specified otherwise and slot duration is set to

2ms. For the dynamic scheme, the dynamic procedure is activated

once every 4 video frames. Our objective is to maximize the

sum Y-PSNR¹ of all users. In other words, the weights for all

users are set to 1. All results are averages over ten simulation

runs.
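The timing implied by these settings can be worked out directly. The short calculation below is only an arithmetic illustration of the stated parameters (30Hz frame rate, GOP size 16, intra period 64, 2ms slots, re-evaluation every 4 frames); the variable names are chosen here for readability.

```python
FRAME_RATE_HZ = 30.0        # video frame rate
GOP_SIZE = 16               # pictures per GOP
INTRA_PERIOD_FRAMES = 64    # frames between intra pictures
SLOT_MS = 2.0               # scheduling slot duration
REEVAL_EVERY_FRAMES = 4     # dynamic procedure period, in video frames

frame_ms = 1000.0 / FRAME_RATE_HZ                  # duration of one frame
slots_per_frame = frame_ms / SLOT_MS               # scheduling slots per frame
reeval_ms = REEVAL_EVERY_FRAMES * frame_ms         # time between re-evaluations
reeval_slots = reeval_ms / SLOT_MS                 # slots between re-evaluations
intra_rate_hz = FRAME_RATE_HZ / INTRA_PERIOD_FRAMES
gop_ms = GOP_SIZE * frame_ms                       # duration of one GOP

print(f"frame: {frame_ms:.2f} ms, about {slots_per_frame:.1f} slots")
print(f"re-evaluation: every {reeval_ms:.1f} ms, about {reeval_slots:.1f} slots")
print(f"intra rate: {intra_rate_hz:.3f} Hz, GOP: {gop_ms:.1f} ms")
```

So the dynamic procedure runs roughly once every 133ms, that is, about every 67 scheduling slots, and the intra rate of 30/64 is the "about 0.5Hz" quoted above.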

We consider three reference schemes. The first is the scheme

in [14] where the user selection is based on maximum channel

capacity and packets are dropped based on their priority at

the time of buffer overflow. The buffer limit for each link is

110KBytes as used in [14]. Packets with the lowest priority

are dropped first at the time of buffer overflow. This scheme

is termed as Maximum capacity scheduling w/ FD (FD refers

to frame dropping). The other two reference schemes are

enhanced versions of the Maximum capacity w/ FD scheme in which a

different user scheduling algorithm is employed. The second

scheme uses proportional fairness scheduling [23] and the

third uses Modified Largest Weighted Delay First (M-LWDF)

scheduling [3]. Note that in all the reference algorithms we

employ the same frame prioritization and dropping strategy

as in [14]. We choose these two enhanced scheduling al-

gorithms because i) the Proportional Fairness scheduling is

very widely used in wireless access networks, and ii) the

M-LWDF scheduling is designed for real-time traffic and is

shown in [13] to be one of the best scheduling algorithms for

video streaming. In the following we evaluate the developed

algorithms under four scenarios.
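For readers unfamiliar with the three reference user-selection rules, the sketch below gives their commonly used textbook forms. It is an illustration only: the function names, the weights `gamma`, and the averaging constant `tc` are assumptions made here, and the exact parameterization used in the simulations (following [14], [23], and [3]) is not reproduced.

```python
from typing import List, Sequence


def max_capacity(rates: Sequence[float]) -> int:
    """Pick the user with the highest instantaneous achievable rate."""
    return max(range(len(rates)), key=lambda i: rates[i])


def proportional_fair(rates: Sequence[float],
                      avg_throughput: Sequence[float]) -> int:
    """Pick the user maximizing instantaneous rate over average throughput."""
    return max(range(len(rates)), key=lambda i: rates[i] / avg_throughput[i])


def m_lwdf(rates: Sequence[float],
           hol_delay: Sequence[float],
           gamma: Sequence[float]) -> int:
    """Pick the user maximizing gamma_i * head-of-line delay_i * rate_i."""
    return max(range(len(rates)),
               key=lambda i: gamma[i] * hol_delay[i] * rates[i])


def update_avg_throughput(avg_throughput: Sequence[float],
                          rates: Sequence[float],
                          chosen: int, tc: float) -> List[float]:
    """Exponentially averaged throughput; only the chosen user is served."""
    return [(1.0 - 1.0 / tc) * avg
            + (1.0 / tc) * (rates[i] if i == chosen else 0.0)
            for i, avg in enumerate(avg_throughput)]


if __name__ == "__main__":
    rates = [2.0, 1.0, 1.5]   # achievable rates in the current slot
    avg = [4.0, 1.0, 2.0]     # past average throughput per user
    print(max_capacity(rates), proportional_fair(rates, avg))
```

None of these rules looks at frame priority or playout deadlines of individual packets when selecting a user, which is why the reference schemes rely on the buffer-overflow dropping of [14].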

A. Variable mean SINR and different video sequences

In this scenario, we encode 8 video sequences with the SVC

extension [16] of H.264/MPEG4-AVC: News, Hall, Silent,

City, Foreman, Crew, Harbour, and Mobile, all of which

are downloaded from [21]. The average SINR value of each

user is uniformly distributed from 5dB to 20dB. The initial

buffer duration is randomly generated from 700 milliseconds

to 800 milliseconds. For fair comparison, we obtain the same

initial buffer duration and the same SINR values for different

schemes by using the same pseudo random number seed.
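The seeding idea in this setup can be sketched as follows. The helper names `draw_scenario` and `instantaneous_sinr` are hypothetical, and the second function relies on a standard property assumed here rather than stated in the text: under Rayleigh fading the received power, and hence the instantaneous SINR in linear scale, is exponentially distributed around its mean.

```python
import random
from typing import List, Tuple


def draw_scenario(num_users: int, seed: int) -> Tuple[List[float], List[float]]:
    """Draw per-user mean SINR (dB) and initial buffer duration (ms)."""
    rng = random.Random(seed)  # private generator: same seed, same scenario
    mean_sinr_db = [rng.uniform(5.0, 20.0) for _ in range(num_users)]
    initial_buffer_ms = [rng.uniform(700.0, 800.0) for _ in range(num_users)]
    return mean_sinr_db, initial_buffer_ms


def instantaneous_sinr(mean_sinr_db: float, rng: random.Random) -> float:
    """One Rayleigh-faded SINR sample in linear scale (exponential power)."""
    mean_linear = 10.0 ** (mean_sinr_db / 10.0)
    return rng.expovariate(1.0 / mean_linear)


if __name__ == "__main__":
    # Every scheme is evaluated on the scenario produced by the same seed.
    for scheme in ("dynamic", "static", "max-capacity", "PF", "M-LWDF"):
        sinr_db, buffer_ms = draw_scenario(num_users=8, seed=2009)
        print(scheme, [round(s, 1) for s in sinr_db])
```

Using a dedicated `random.Random(seed)` instance instead of the module-level generator keeps the scenario reproducible even if other parts of a simulator also draw random numbers.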

Figure 3 shows the average PSNR of the Y, U, and V

components of all eight video sequences for different schemes

(we connect the points that belong to the same scheme with

lines simply to group them together). Although our algorithms

are applied to improve the Y-PSNR in the simulations, they

also improve the PSNR of the other components.

The average PSNR over all video sequences is summarized

in Table III. It can be seen that both of our proposed

schemes achieve significant gains over existing schemes.

For the Y-PSNR, the static scheme achieves a gain of 1.5-

5.4 dB and the dynamic scheme achieves a gain of 2.2-

6.1 dB compared to the existing schemes. The improvement

on U-PSNR and V-PSNR is less because i) it is not our

objective to optimize the U and V-components, and ii) the

color components appear less affected by dropping frames.

¹ In video encoding, video signals are decomposed into three components:

Y stands for luma component (for brightness), U and V are the chrominance

components (for color). Among the three components, Y-component is the

most important one as human eyes are most sensitive to the brightness

information.
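For reference, the PSNR of a single component plane (Y, U, or V) is conventionally computed from the mean squared error against the original. The sketch below assumes 8-bit samples (peak value 255), which the text does not state explicitly; `psnr_db` is a name chosen here.

```python
import math
from typing import Sequence


def psnr_db(reference: Sequence[float], decoded: Sequence[float],
            peak: float = 255.0) -> float:
    """PSNR in dB of one component plane, given as flat sample sequences."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("planes must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical planes
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    y_ref = [100, 100, 100, 100]
    y_dec = [101, 99, 100, 100]
    print(f"Y-PSNR: {psnr_db(y_ref, y_dec):.2f} dB")
```

The same function applies unchanged to the U and V planes, which is how per-component averages such as those in Fig. 3 can be obtained.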

[Figure 3: three panels plotting Y-PSNR (dB), U-PSNR (dB), and V-PSNR (dB) against user index for Dynamic scheduling, Static scheduling, Maximum-capacity w/ FD, Proportional Fairness w/ FD, and M-LWDF w/ FD.]

Fig. 3. Average PSNR of different schemes for each user

TABLE III

AVERAGE PSNR ACHIEVED BY EACH SCHEME

Scheme                         Y-PSNR (dB)   U-PSNR (dB)   V-PSNR (dB)
Dynamic Scheduling             38.0          42.5          43.5
Static Scheduling              37.3          42.3          42.9
Maximum capacity w/ FD         31.9          39.9          36.6
Proportional fairness w/ FD    34.3          41.1          40.1
M-LWDF w/ FD                   35.8          41.7          41.7
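The Y-PSNR gain ranges quoted in the text follow directly from the Y-PSNR values of Table III (38.0, 37.3, 31.9, 34.3, and 35.8 dB). The check below is plain arithmetic on those values; the dictionary and function names are chosen here for illustration.

```python
# Y-PSNR values (dB) from Table III.
y_psnr = {
    "Dynamic scheduling": 38.0,
    "Static scheduling": 37.3,
    "Maximum capacity w/ FD": 31.9,
    "Proportional fairness w/ FD": 34.3,
    "M-LWDF w/ FD": 35.8,
}
references = ["Maximum capacity w/ FD",
              "Proportional fairness w/ FD",
              "M-LWDF w/ FD"]


def gain_range(scheme: str) -> tuple:
    """Smallest and largest Y-PSNR gain (dB) over the reference schemes."""
    gains = [y_psnr[scheme] - y_psnr[ref] for ref in references]
    return round(min(gains), 1), round(max(gains), 1)


print("static :", gain_range("Static scheduling"))
print("dynamic:", gain_range("Dynamic scheduling"))
```

The smallest gain is always measured against M-LWDF w/ FD, the strongest reference scheme in this scenario, and the largest against Maximum capacity w/ FD.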

B. Same mean SINR, different video sequences

In the second scenario, we use the same 8 video sequences

as in the previous scenario, but choose the average channel

SINR for all users to be equal. We then investigate the video

quality when the average channel SINR varies. Figure 4(a)

shows the sum of the Y-PSNR of all eight video sequences

under different SINR values. We can see that the proposed

dynamic scheme always achieves the best video quality and the

static scheme is slightly worse than the dynamic scheme. When

the channel conditions are good, the M-LWDF scheduling

algorithm with frame dropping obtains slightly higher PSNR

than the static scheduling algorithm. However, when the

average channel SINR decreases, the video quality obtained

by all reference schemes including M-LWDF degrades very

quickly. This is because under bad channel conditions, none of the

reference schemes performs early dropping, so they may end

up dropping important (low-layer) frames, which affects the

decoding process of high-layer video packets. In the case of

very low SINR, both of our schemes achieve an average gain

of more than 6 dB compared to the three reference schemes.

Moreover, when the SINR is low, most other schemes cannot decode the video sequences completely because of the heavy packet loss². Figure 4(b) shows the number of video sequences that are not decodable under each scheduling algorithm for different SINR values. Because of the early dropping strategy employed, the dynamic scheme can decode all video sequences for all SINR values greater than or equal to 4dB, and the number of un-decodable sequences for the static scheme is much smaller than that of all reference schemes in the low SINR regime. For the static scheme, occasionally some video sequences may not be decoded completely. This is because the static scheme does not consider the instantaneous deadline requirement, which indicates that the dynamic scheme is required in order to achieve the best performance.

[Figure 4: (a) Y-PSNR (dB) versus SINR (dB); (b) average number of un-decodable sequences versus SINR (dB). Curves in both panels: Dynamic scheduling, Static scheduling, Maximum-capacity w/ FD, Proportional Fairness w/ FD, and M-LWDF w/ FD.]

Fig. 4. (a) Average PSNR of all users for each scheme; (b) Average number of un-decodable sequences for each scheme.

[Figure 5: average Y-PSNR (dB) versus number of users for Dynamic scheduling, Static scheduling, Maximum-capacity w/ FD, Proportional Fairness w/ FD, and M-LWDF w/ FD.]

Fig. 5. Average PSNR when all users request Mobile video sequence but the number of users varies.

C. Variable mean SINR, same video sequences

In this set of simulations, all the users request the same

Mobile video sequence (but the content is transmitted through

unicast) and we let the number of users vary. The average

SINRs (in dB) of the users are drawn uniformly at random

from 5dB to 20dB. To accommodate more

users, we set the channel bandwidth to 2.5MHz. Figure

5 shows the average PSNR of different schemes when the

number of users ranges from 2 to 16. When the number of

users is small, all scheduling algorithms perform very well

except for the Maximum-capacity algorithm w/ FD. But when

the number of users increases, our proposed schemes perform

much better than the reference schemes. The static scheme

and the dynamic one improve the average video quality by

8 dB and 10 dB, respectively, compared to the best of the

three reference schemes when there are 16 users. This again

demonstrates the efficacy of our proposed schemes.

D. Robustness under inaccurate channel model

In Sections III and IV, we assume precise information on the fading distributions. We now evaluate the performance of the algorithms when the fading distribution information is inaccurate. We assume a Rayleigh distribution in our scheduling algorithms but let the actual fading distribution be Rician with PDF

$$f(x \mid v, \sigma) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2 + v^2}{2\sigma^2}\right) I_0\left(\frac{x v}{\sigma^2}\right),$$

where $I_0(z)$ is the modified Bessel function of the first kind with order zero. When $v = 0$, the distribution reduces to a Rayleigh distribution. A non-zero $v$ indicates the deviation from the Rayleigh distribution, and normally $v^2 \le \sigma^2$. Other simulation setups are identical to those in Section V-A.

Figure 6 shows the average PSNR of the eight video sequences vs. $v^2/\sigma^2$. We maintain fixed SINR values when $v^2/\sigma^2$ changes. We can see that with the static scheme, the average PSNR drops about 0.8dB when $v = \sigma$, and there is nearly no drop in the PSNR values for the dynamic scheme. Therefore, our algorithms (especially the dynamic one) are robust to the channel distribution errors.

[Figure 6: average Y-PSNR (dB) versus $v^2/\sigma^2$ for Dynamic scheduling, Static scheduling, M-LWDF w/ FD, Proportional Fairness w/ FD, and Maximum-capacity w/ FD.]

Fig. 6. Average PSNR vs. $v^2/\sigma^2$, where $v$ and $\sigma$ are the parameters in the Rician distribution.

² The PSNR values plotted in Fig. 4(a) are the average values of the frames that can be successfully decoded. If we account for the frames that are not decodable, the actual PSNR of other schemes is even lower.
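The Rician density of Section V-D can be evaluated numerically to confirm that it reduces to the Rayleigh density when v = 0. The sketch below is an illustration with names chosen here (`bessel_i0`, `rician_pdf`, `rayleigh_pdf`); since the Python standard library has no Bessel functions, I0 is computed from its power series, which is adequate for the moderate arguments used in this example.

```python
import math


def bessel_i0(z: float, terms: int = 60) -> float:
    """Modified Bessel function I0 via its series sum_k (z^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (z * z / 4.0) / ((k + 1) ** 2)  # ratio of consecutive terms
    return total


def rician_pdf(x: float, v: float, sigma: float) -> float:
    """f(x | v, sigma) = x/s^2 * exp(-(x^2 + v^2)/(2 s^2)) * I0(x v / s^2)."""
    s2 = sigma * sigma
    return (x / s2) * math.exp(-(x * x + v * v) / (2.0 * s2)) * bessel_i0(x * v / s2)


def rayleigh_pdf(x: float, sigma: float) -> float:
    """Rayleigh density, the v = 0 special case of the Rician density."""
    s2 = sigma * sigma
    return (x / s2) * math.exp(-x * x / (2.0 * s2))


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(x, rician_pdf(x, 0.0, 1.0), rayleigh_pdf(x, 1.0),
              rician_pdf(x, 1.0, 1.0))
```

The last column uses v = σ, the largest deviation from Rayleigh considered in Fig. 6.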

VI. CONCLUSION

In this paper we study the problem of scalable video

streaming in fading wireless environments. We exploit both the

application layer video characteristics and the wireless channel

fading information to obtain a cross-layer solution. We first

develop a model to characterize the relationship between the

average rate and average PSNR of a video stream. We then

formulate the problem as a long-term radio resource allocation

problem in a fading environment in order to maximize the

weighted sum of average PSNR of all users. We develop

an algorithm to find the optimal scheduling policy and the

parameters used by the scheduling policy, and rigorously prove

the optimality of the solution. We next design two scheduling

algorithms based on the results of the long-term resource

allocation scheme. Simulation results show that our scheduling

algorithms substantially outperform existing solutions and are

robust to errors in the assumed channel fading distribution.

REFERENCES

[1] Advanced video coding for generic audiovisual services. ITU-T Recom-

mendation H.264-ISO/IEC 14496-10(AVC), ITU-T and ISO/IEC JTC 1,

2003.

[2] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, R. Vijayakumar,

and P. Whiting. CDMA data QoS scheduling on the forward link with

variable channel conditions. Bell Labs Technical Memorandum, April

2000.

[3] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, R. Vijayakumar,

and P. Whiting. Providing quality of service over a shared wireless link.

IEEE Communications Magazine, Feb. 2001.

[4] M. Andrews, L. Qian, and A. Stolyar. Optimal utility based multi-user

throughput allocation subject to throughput constraints. 2005.


[5] D. Bertsekas. Nonlinear Programming (2nd Edition). Athena Scientific,

April, 2004.

[6] S. Borst. Dynamic rate control algorithms for HDR throughput

optimization. In IEEE Infocom, 2001.

[7] M. Chen and A. Zakhor. Rate control for streaming video over wireless.

IEEE Wireless Communications, 12(4), Aug. 2005.

[8] P. A. Chou and Z. Miao. Rate-distortion optimized streaming of

packetized media. IEEE Transactions on Multimedia, 8(2), April 2006.

[9] J.M. Costa. More frequencies needed for mobiles - terrestrial spectrum

sought for IMT. ITU News, No. 3, April 2007.

[10] S. Deb, S. Jaiswal, and K. Nagaraj. Real-time video multicast in WiMAX networks. In IEEE Infocom, Phoenix, AZ, 2008.

[11] M. Kalman and B. Girod. Optimized transcoding rate selection and packet scheduling for transmitting multiple video streams over a shared channel. Proc. IEEE International Conference on Image Processing, ICIP-2005, September 2005.

[12] W. Kuo and W. Liao. Utility-based resource allocation in wireless networks. IEEE Transactions on Wireless Communications, 6(10), Oct. 2007.

[13] Günther Liebl, Hrvoje Jenkac, Thomas Stockhammer, and Christian Buchner. Radio link buffer management and scheduling for wireless video streaming. Telecommunication Systems, Springer Netherlands, 30(1-3), Nov. 2005.

[14] Günther Liebl, Thomas Schierl, Thomas Wiegand, and Thomas Stockhammer. Advanced wireless multiuser video streaming using the scalable video coding extensions of H.264/MPEG4-AVC. In IEEE ICME 2006.

[15] M. Lu, P. Steenkiste, and T. Chen. Time-based adaptive retry for wireless video streaming. Wireless Communications and Mobile Computing, Special Issue on Video Communications for 4G Wireless Systems, Jan. 2007.

[16] H. Schwarz, D. Marpe, and T. Wiegand. Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Transactions on Circuits and Systems for Video Technology, 17(9), Sep. 2007.

[17] S. Shakkottai and R. Srikant. Scheduling real-time traffic with deadlines over a wireless channel. In WoWMoM, 1999.

[18] S. Shakkottai and A. L. Stolyar. Scheduling for multiple flows sharing a time-varying channel: the exponential rule. Analytic Methods in Applied Probability, 207:185–202, 2001.

[19] A. Stolyar. On the asymptotic optimality of the gradient scheduling algorithm for multiuser throughput allocation. Operations Research, 53(1), 2005.

[20] K. Stuhlmüller, N. Färber, M. Link, and B. Girod. Analysis of video transmission over lossy channels. IEEE Journal on Selected Areas in Communications, 18(6):1012–1032, June 2000.

[21] Video test sequences. http://trace.eas.asu.edu/yuv/.

[22] N. Tizon and B. Pesquet-Popescu. Scalable and media aware adaptive video streaming over wireless networks. EURASIP Journal on Advances in Signal Processing, 2008.

[23] D. Tse. Multiuser diversity in wireless networks. http://www.eecs.berkeley.edu/~dtse/stanford416.ps, 2001.

[24] W. Tu, J. Chakareski, and E. Steinbach. Rate-distortion optimized frame dropping for multiuser streaming and conversational videos. Advances in Multimedia, 8(2), Jan. 2008.

[25] F. Yang, Q. Zhang, W. Zhu, and Y. Zhang. Streaming and bit allocation for scalable video over mobile wireless internet. In IEEE Infocom, 2004.

[26] H. Zhang and S. Rangarajan. Adaptive scheduling of streaming video over wireless networks. In IEEE International Conference on Multimedia & Expo (ICME), June 2008.

[27] H. Zhang, Y. Zheng, M. A. Khojastepour, and S. Rangarajan. Scalable video streaming over fading wireless channels. In IEEE Wireless Communications & Networking Conference, April 2009.

APPENDIX

Proof of Theorem 2

We first prove a technical lemma.

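Lemma 6 For any $1 \le i \le n$, $\sum_{j=1}^{n} \mu_j \frac{\partial g_i(\mu)}{\partial \mu_j} = 0$.

To prove the lemma, notice that if we re-scale the vector $\mu$ by a constant, the function $g_i(\mu)$ remains fixed. In other words, $g_i(\mu) = g_i(\tau\mu)$. Now consider the partial derivative (letting $\tau = \mu_1/\mu'_1$)

$$
\begin{aligned}
\frac{\partial g_i(\mu)}{\partial \mu_1}
&= \lim_{\mu'_1 \to \mu_1} \frac{g_i(\mu'_1, \mu_2, \ldots, \mu_n) - g_i(\mu_1, \mu_2, \ldots, \mu_n)}{\mu'_1 - \mu_1} \\
&= \lim_{\mu'_1 \to \mu_1} \frac{g_i(\mu_1, \tau\mu_2, \ldots, \tau\mu_n) - g_i(\mu_1, \mu_2, \ldots, \mu_n)}{\mu'_1 - \mu_1} \\
&= \lim_{\mu'_1 \to \mu_1} \sum_{j=2}^{n} \frac{\partial g_i(\mu)}{\partial \mu_j} \, \frac{\mu_j(\tau - 1)}{\mu'_1 - \mu_1} \\
&= \lim_{\tau \to 1} \sum_{j=2}^{n} \frac{\partial g_i(\mu)}{\partial \mu_j} \, \frac{-\mu_j \tau}{\mu_1} \\
&= -\sum_{j=2}^{n} \frac{\partial g_i(\mu)}{\partial \mu_j} \, \frac{\mu_j}{\mu_1}.
\end{aligned}
\tag{23}
$$

Rearranging the equation, we obtain the result in the lemma. □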
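Lemma 6 is an instance of Euler's identity for functions that are invariant to scaling of their argument, and it can be checked numerically on any such function. In the sketch below, `toy_g` is a hypothetical scale-invariant function chosen purely for illustration (it is not the paper's $g_i$), and the partial derivatives are approximated by central differences.

```python
from typing import Callable, List, Sequence


def toy_g(mu: Sequence[float]) -> float:
    """Hypothetical scale-invariant function: toy_g(t * mu) == toy_g(mu)."""
    return mu[0] ** 2 / sum(m * m for m in mu)


def euler_sum(g: Callable[[Sequence[float]], float],
              mu: Sequence[float], h: float = 1e-6) -> float:
    """Approximate sum_j mu_j * dg/dmu_j using central differences."""
    total = 0.0
    for j in range(len(mu)):
        up: List[float] = list(mu)
        down: List[float] = list(mu)
        up[j] += h
        down[j] -= h
        total += mu[j] * (g(up) - g(down)) / (2.0 * h)
    return total


if __name__ == "__main__":
    mu = [0.5, 1.2, 2.0]
    print("g(mu)     =", toy_g(mu))
    print("g(3 * mu) =", toy_g([3.0 * m for m in mu]))
    print("sum_j mu_j dg/dmu_j ~", euler_sum(toy_g, mu))
```

The weighted sum of partial derivatives is zero up to discretization and rounding error, as the lemma states for any function that is unchanged when $\mu$ is re-scaled.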

We now prove Theorem 2. From Lemma 5, Algorithm A2 converges to a stationary point of the function $h(\mu)$. That means at the limit point $\mu$,

$$\nabla g \, (g(\mu) - \bar{g}) = 0. \tag{24}$$

In order to show that the limit point $\mu$ is the optimal solution to problems (19) and (20), it is sufficient to show that $g_i(\mu) = \bar{g}$ for all $1 \le i \le n$. We prove this by contradiction. Suppose that there exist $g_i(\mu)$'s that are not all equal and satisfy Eq. (24). Denote $w_i = g_i(\mu) - \bar{g}$ and $w = (w_1, \cdots, w_n)$, and we have $\nabla g \cdot w = 0$ where $w$ is a vector with zero mean. Without loss of generality, we can assume the $w_i$'s are sorted in decreasing order. Because $w$ is a zero-mean vector and its elements are not all equal, at least one element of $w$ is positive. Now suppose that $k$ is the index such that $w_1 \ge \cdots \ge w_k \ge 0 \ge w_{k+1} \ge \cdots \ge w_n$. For simplicity, we use the matrix $A$ to represent $\nabla g$ and $a_{ij} = \frac{\partial g_j}{\partial \mu_i}$. Note that in the matrix $A = \nabla g$, the diagonal elements are positive and the rest are negative. If $k = 1$, the first element of $Aw$ cannot be zero because it is equal to $\sum_{j=1}^{n} a_{1j} w_j$ where all terms are non-negative and the first term is positive. This is a contradiction. For $k > 1$, we look at the $i$th ($i \le k$) element of $Aw$, which is $\sum_{j=1}^{n} a_{ij} w_j = 0$. Note that $a_{ij} w_j > 0$ for $j > k$. Therefore, $\sum_{j=1}^{k} a_{ij} w_j < 0$ and $a_{ii} w_i < \sum_{j=1, j \ne i}^{k} w_j |a_{ij}|$. Writing it in matrix form, we obtain

$$\mathrm{Diag}(w_1, \cdots, w_k)\,[a_{11}, a_{22}, \cdots, a_{kk}]^T < A_k w_{[k]},$$

where $A_k$ is a matrix with $k$ rows and $k$ columns and the element at the $i$th row and $j$th column of $A_k$ is $|a_{ij}|$ except that the diagonal elements are zero, $w_{[k]}$ is a vector with the first $k$ elements of $w$, and the sign "<" is element-wise. Since $w_1, \cdots, w_k$ are all positive, we have

$$[a_{11}, a_{22}, \cdots, a_{kk}]^T < \mathrm{Diag}(w_1, \cdots, w_k)^{-1} A_k w_{[k]}. \tag{25}$$

If we write the results of Lemma 6 in matrix form and note that $A = \nabla g$, we have $\mu^T A = 0$ and so $A^T \mu = 0$. Again consider the $i$th ($i \le k$) element of $A^T \mu$, for which we have $\sum_{j=1}^{n} a_{ji} \mu_j = 0$. Note that all terms for $j > k$ are negative (because $\mu_j > 0$ and $a_{ji} < 0$ for $j > i$). So the summation of the first $k$ terms must be positive. That is, $\sum_{j=1}^{k} a_{ji} \mu_j > 0$. Therefore, $\mu_i a_{ii} > \sum_{j=1, j \ne i}^{k} \mu_j |a_{ji}|$. Again, writing them in matrix form, we obtain (recall the definition of $A_k$, and $\mu_{[k]}$ is a vector containing the first $k$ elements of $\mu$)

$$\mathrm{Diag}(\mu_1, \cdots, \mu_k)\,[a_{11}, a_{22}, \cdots, a_{kk}]^T > A_k^T \mu_{[k]}.$$


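Since the $\mu_i$'s are all positive, we have

$$[a_{11}, a_{22}, \cdots, a_{kk}]^T > \mathrm{Diag}(\mu_1, \cdots, \mu_k)^{-1} A_k^T \mu_{[k]}. \tag{26}$$

Combining the two equations (25) and (26), we have

$$\mathrm{Diag}(\mu_1, \cdots, \mu_k)^{-1} A_k^T \mu_{[k]} < \mathrm{Diag}(w_1, \cdots, w_k)^{-1} A_k w_{[k]}. \tag{27}$$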

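Multiplying both sides on the left by $\mathrm{Diag}(\mu_1 w_1, \cdots, \mu_k w_k)$, and noting that $\mu_{[k]} = \mathrm{Diag}(\mu_1, \cdots, \mu_k) J$ and $w_{[k]} = \mathrm{Diag}(w_1, \cdots, w_k) J$ where $J = [1, 1, \cdots, 1]^T$ has $k$ elements, we obtain

$$\mathrm{Diag}(w_1, \cdots, w_k)\, A_k^T \,\mathrm{Diag}(\mu_1, \cdots, \mu_k)\, J < \mathrm{Diag}(\mu_1, \cdots, \mu_k)\, A_k \,\mathrm{Diag}(w_1, \cdots, w_k)\, J. \tag{28}$$

Let $B = \mathrm{Diag}(\mu_1, \cdots, \mu_k)\, A_k \,\mathrm{Diag}(w_1, \cdots, w_k)$. Eq. (28) can be written as

$$B^T J < B J.$$

Thus, $\mathrm{sum}(B^T J) < \mathrm{sum}(B J)$. However, $\mathrm{sum}(B^T J) = \mathrm{sum}(B J)$ because both are equal to the summation of all elements in the matrix $B$. Therefore, this is a contradiction and the proof is complete. □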
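The final step of the proof rests on the fact that $B^T J$ and $B J$ (the column sums and row sums of $B$) always add up to the same total, so one cannot be element-wise smaller than the other. The sketch below illustrates this on a randomly generated instance; the helper names and the random values are assumptions made for illustration and carry no meaning beyond satisfying the sign conditions of the proof (positive $\mu_i$ and $w_i$, non-negative $A_k$ with zero diagonal).

```python
import random
from typing import List

Matrix = List[List[float]]


def build_b(mu: List[float], a_k: Matrix, w: List[float]) -> Matrix:
    """B = Diag(mu) * A_k * Diag(w), so that B[i][j] = mu[i] * a_k[i][j] * w[j]."""
    k = len(mu)
    return [[mu[i] * a_k[i][j] * w[j] for j in range(k)] for i in range(k)]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def times_ones(m: Matrix) -> List[float]:
    """M * J with J the all-ones vector, i.e. the vector of row sums."""
    return [sum(row) for row in m]


if __name__ == "__main__":
    rng = random.Random(1)
    k = 4
    mu = [rng.uniform(0.1, 2.0) for _ in range(k)]
    w = [rng.uniform(0.1, 2.0) for _ in range(k)]
    a_k = [[0.0 if i == j else rng.uniform(0.1, 1.0) for j in range(k)]
           for i in range(k)]
    b = build_b(mu, a_k, w)
    bj, btj = times_ones(b), times_ones(transpose(b))
    print("sum(B J)   =", sum(bj))
    print("sum(B^T J) =", sum(btj))
    print("B^T J < B J element-wise?", all(x < y for x, y in zip(btj, bj)))
```

Both totals coincide up to rounding, so the element-wise strict inequality of Eq. (28) cannot hold, which is exactly the contradiction used in the proof.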
