
On Real Time Coding with Limited Lookahead

Himanshu Asnani∗ and Tsachy Weissman†

Abstract

A real time coding system with lookahead consists of a memoryless source, a memoryless channel, an encoder, which encodes

the source symbols sequentially with knowledge of future source symbols up to a fixed finite lookahead, d, with or without feedback

of the past channel output symbols and a decoder, which sequentially constructs the source symbols using the channel output.

The objective is to minimize the expected per-symbol distortion.

For a fixed finite lookahead d ≥ 1 we invoke the theory of controlled Markov chains to obtain an average cost optimality

equation (ACOE), the solution of which, denoted by D(d), is the minimum expected per-symbol distortion. With increasing d,

D(d) bridges the gap between causal encoding, d = 0, where symbol by symbol encoding-decoding is optimal and the infinite

lookahead case, d = ∞, where Shannon Theoretic arguments show that separation is optimal.

We extend the analysis to a system with finite state decoders, with or without noise-free feedback. For a Bernoulli source and

binary symmetric channel, under Hamming loss, we compute the optimal distortion for various source and channel parameters,

and thus obtain computable bounds on D(d). We also identify regions of source and channel parameters where symbol by symbol

encoding-decoding is suboptimal. Finally, we demonstrate the wide applicability of our approach by applying it in additional coding

scenarios, such as the case where the sequential decoder can take cost constrained actions affecting the quality or availability of

side information about the source.

Index Terms

Actions, Average Cost Optimality Equation (ACOE), Beliefs, Bellman Equation, Constrained Markov Decision Process,

Controlled Markov Chains, Expected Average Distortion, Finite State Decoders, Lagrangian, Lookahead, Optimal Cost, Policy,

Side Information, Value Iteration, Vending Machine.

I. INTRODUCTION

A. Motivation and Related Work

A memoryless source {U1,U2,...} is to be communicated over a memoryless channel with the objective of minimizing

expected average (per-symbol) distortion, with or without the availability of unit-delay noise-free feedback. The communication

is in real time and hence the encoding and decoding is sequential, with a fixed finite lookahead of source symbols available

at the encoder (cf. the setting in Fig. 1). The motivation stems from practical systems such as video streaming, cache

memory devices in computing systems, real time communication systems, etc., where the encoder has a fixed buffer of future

source symbols, and the quality of service demands that encoding and decoding should be in real time. The problem finds its

applications in other sequential decision systems, where resource allocation should be done on the fly due to adverse effects

of latency or delay, such as sensor networks, weather-monitoring systems, flow in societal networks such as transportation

networks, recycling systems, etc. A natural criterion of performance is to minimize the expected average distortion. What is

the best we can do here? Note that such a framework with real time constraints is not covered by Shannon Theory. In classical

Information Theory, encoding of long “typical” sequences in blocks as well as block decoding introduces large delays and

thus such achievable schemes violate the very premise of bounded or no delay constraint. To answer the question, we invoke

Markov decision theory and cast our problem and other such variants as discrete time controlled Markov chains with average

cost criterion.

The problem is well motivated by practical problems of delay constrained source-channel coding and has been of much

interest in the literature. There have also been many different ways to model the notion of sequential encoding and decoding.

In the source coding context, causal source codes were studied in [1], [2], [3], which demand the reconstruction to depend

causally on the source symbols. But this is a much weaker constraint and causal source codes can operate on large delays as

was pointed out in [1] itself. Causal source codes with side information were studied in [4].

Note that we can transform our setting of limited encoder lookahead d to that of zero lookahead with a Markov source, $V_i = U_i^{i+d}$. This transformation puts the problem in the class of sequential encoding-decoding problems with Markov sources.

When the communication horizon is fixed, the structure of optimal encoding and decoding policies with Markov sources

has been studied in [5], [6], [7], [8], [9], [10]. In [11], the authors propose a systematic methodology for such a non-classical

information structure to search for an optimal strategy.
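
To make the window transformation above concrete, the following is a minimal Python sketch (the function names `lookahead_windows` and `is_shift_consistent` are illustrative and do not come from the paper; indices are 0-based here). It builds the windows $V_i = U_i^{i+d}$ from a finite source realization and checks the shift structure that makes $\{V_i\}$ a Markov chain: $V_{i+1}$ drops the oldest symbol of $V_i$ and appends one fresh source symbol, which is independent of the past because the source is memoryless.

```python
def lookahead_windows(source, d):
    """Return V_i = (U_i, ..., U_{i+d}) for every i that has a full window."""
    return [tuple(source[i:i + d + 1]) for i in range(len(source) - d)]


def is_shift_consistent(windows):
    """Check that each window is the previous one shifted by one symbol.

    V_{i+1} must agree with V_i on the d overlapping symbols; only its last
    entry (the newly revealed source symbol) is free.
    """
    return all(a[1:] == b[:-1] for a, b in zip(windows, windows[1:]))


if __name__ == "__main__":
    u = [0, 1, 1, 0, 1]          # an arbitrary source realization
    v = lookahead_windows(u, d=2)
    print(v)
    print(is_shift_consistent(v))
```

An encoder with lookahead d acting on $\{U_i\}$ is then the same object as a causal (zero-lookahead) encoder acting on $\{V_i\}$.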

The problem of real time coding and decoding in a semi-stochastic setting, i.e., for individual sequences, was studied in

[12] and [13], while finite state digital systems were the subject of study in [14].

The connection between dynamic programming and information theory has been well exploited. The problem of computing

the capacity of channels with feedback was formulated as a Markov Decision Process in [15], [16]. The long standing problem

∗Stanford University, Email: asnani@stanford.edu.

†Stanford University, Email: tsachy@stanford.edu.

arXiv:1105.5755v1 [cs.IT] 29 May 2011


of the capacity of the trapdoor channel (cf. [17], [18]) with feedback was evaluated using average cost optimality equations in [19].

Zero error capacity for certain channel coding problems was computed using dynamic programming in [20].

B. Contributions and Organization of the Paper

The approaches in [5], [6], [7], [8], [9], [10] and [11] are inspired by control theory, which provides tools for finding optimal

schemes and understanding their structure. In this work, we take these tools further to provide more explicit expressions and

bounds for the optimum performance under a given lookahead constraint d. While optimum performance in the case d = 0

is easily shown to be attained by “symbol-by-symbol” operations, and the case d = ∞ can be answered with the tools of

Shannon theory, for any finite d ≥ 1, the existing literature does not provide useful analytical values or bounds on the minimum

expected average distortion, D(d). In addition to being amenable to a decision theoretic formulation of Markov sources, as

in the surveyed literature above, the model we consider here is more basic and lends itself to a simpler average cost optimality

equation, which in some cases (cf. Section V) can be computed exactly. While in [7], [8], [10] emphasis is on expected total

fixed horizon cost, we argue that expected average cost over infinite horizon is a more natural criterion of performance as in

the sequential encoding and decoding problems, we typically do not know when to stop, and hence we would like to analyze

the asymptotics of the horizon-independent problem. While the main focus in this work has been to characterize the minimum

achievable distortion, the average cost optimality equations also characterize sufficient conditions on the optimality of stationary

(encoding and decoding) policies.

Note that in our communication problem in Fig. 1, the lookahead is available only at the encoder while the decoder constructs the estimates causally, instead of a seemingly more general setting where a lookahead of $l_e$ is present at the encoder while the decoder has lookahead $l_d$. However, the performance of any policy/code with encoder and decoder lookahead parameters $(l_e, l_d)$ can be attained arbitrarily closely by the optimal policy for our setting in Fig. 1 with $d = l_e + l_d$, as pointed out in Section II of [13]. The authors in [21] consider a communication problem similar to our setting with $l_e = 0$, $l_d = d$ for $d \ge 0$ and per-symbol distortion D(d), show that D(d) converges exponentially rapidly to D(∞), and provide bounds on the exponent. However, the results are asymptotic in nature and hence different from this work, which provides an explicit exact or approximate characterization of the values of D(d) for any fixed, possibly small d.

Recently there has been work in the direction of “action in information theory”, i.e., canonical Shannon theoretic models with the encoder and/or decoder taking cost constrained actions to affect the generation or availability of channel state information, side information, feedback, etc.; cf. action in point to point scenarios in [22], [23], [24], [25], [26] and in multi-terminal systems in [27], [28]. We revisit the setting of source coding with a side information vending machine, as in [22] (see Fig. 6), for the case where the encoding is sequential with lookahead and the decoder takes an action $A_v$, sequentially dependent on the encoded symbols, to get side information about the source through a memoryless channel, $P_{Y|U,A_v}$. The reconstruction of the source is based upon the current encoded symbol, the current side information symbol and memories storing the past encoded symbols and side information symbols. We show that the problem can be formulated as a constrained Markov Decision Process.

The main contribution of this paper is to cast a large class of limited delay source, channel and joint source-channel coding problems in the realm of sequential decision theory, to obtain characterizations of the optimum performance via average cost optimality equations with finite or compact state spaces, and to solve exactly or obtain bounds for the expected average distortion as a function of lookahead d.

The paper is organized as follows. Section II describes the basic model of problems with lookahead (see Fig. 1): encoding is sequential, using the lookahead and unit-delay noise-free feedback, $X_i(U^{i+d}, Y^{i-1})$, while the decoding depends on the current channel output and the past memory, $\hat{U}_i(Y_i, Z_{i-1})$. The memory evolves as $Z_i(Z_{i-1}, Y_i)$. We seek to find the minimum expected average distortion as a function of lookahead, i.e.,

$$D(d) = \inf_{\{X_i(\cdot)\},\{\hat{U}_i(\cdot)\}} \limsup_{N\to\infty} E\left[\frac{1}{N}\sum_{i=1}^{N}\Lambda(U_i,\hat{U}_i)\right]. \tag{1}$$

In Section III we present an overview of controlled Markov processes with average cost, with the unconstrained case in Section III-A and constrained control in Section III-B. Section IV studies the case of complete memory, i.e., $Z_i = Y^i$. In Section IV-A we use the theory of Section III to construct an average cost optimality equation, the solution to which is the average optimal distortion. In Section IV-B, we consider the question “to look or not to lookahead” and specify a sufficient condition under which symbol by symbol encoding-decoding is optimal for a given source, channel, distortion function and lookahead. This kind of result in our problem of sequential encoding-decoding with lookahead complements that of “to code or not to code” of [29]. In Section V, we consider the framework with finite state decoders, constructing the corresponding ACOE in Section V-A. In Section V-B, we use relative value iteration to solve the problem exactly for an example of a binary source and binary symmetric channel under Hamming loss, thereby demonstrating how the average distortion values for this setting can be used to bound D(d) of Section IV. We also contrast with the extreme cases of no lookahead, d = 0, where symbol by symbol policies are optimal, and d = ∞, where Shannon’s Separation Theorem [30] determines the minimum expected average distortion. We also highlight the regions of source-channel parameters where, for any finite d ≥ 1, symbol by symbol encoding-decoding is strictly suboptimal for a Bernoulli source and binary symmetric channel. Section VI relaxes the assumption of the previous


sections that feedback is present. In Section VII, the setting of source coding with a side information vending machine is considered. Here again, encoding is sequential with lookahead, and the decoder takes cost constrained actions, $A_{v,i}$, sequentially to get side information about the source through a memoryless channel, $P_{Y|U,A_v}$. The decoding is the optimal reconstruction $\hat{U}_i(X_i, Y_i, M_{i-1}, N_{i-1})$, where $M_{i-1}$ and $N_{i-1}$ are the memories storing some or all of the past encoded symbols and side information symbols, respectively. Section VII-A evaluates the case when the encoder also has access to the side information, with the decoder having complete memory in Section VII-A1, while finite memory decoders are considered in Section VII-A2. Section VII-B studies the same source coding problem with a side information vending machine, but now the encoder has no access to the side information. Section VIII summarizes the methodology developed in this paper for constructing average cost optimality equations. The paper is concluded in Section IX.

II. PROBLEM FORMULATION

We begin by explaining the notation to be used throughout this paper. Let upper case, lower case, and calligraphic

letters denote, respectively, random variables, specific or deterministic values which random variables may assume, and their

alphabets. For two jointly distributed random variables, X and Y, let $P_X$, $P_{XY}$ and $P_{X|Y}$ respectively denote the marginal of X, the joint distribution of (X,Y) and the conditional distribution of X given Y. $X_m^n$ is a shorthand for the $n - m + 1$ tuple $\{X_m, X_{m+1}, \cdots, X_{n-1}, X_n\}$. B(X) denotes the Borel σ-algebra of a given topological space, X. P(X) denotes the probability simplex on the finite alphabet, X. Cb(X) denotes the set of continuous and bounded functions on the topological space X. 1{·} stands for the indicator function. N and R denote the sets of natural and real numbers respectively. We impose the assumption of finiteness of cardinality on all alphabets of operational significance (source, channel input, channel output, reconstruction), unless otherwise indicated. The general problem setup, depicted in Fig. 1, consists of the following principal components:

[Figure 1: block diagram. Memoryless source $U_i$ → channel encoder $X_i(U^{i+d}, Y^{i-1})$ → memoryless channel $P_{Y|X}$ → $Y_i$ → channel decoder $\hat{U}_i(Y_i, Z_{i-1})$, with memory update $Z_i(Z_{i-1}, Y_i)$ and feedback $Y^{i-1}$ to the encoder.]

Fig. 1. Real time coding with lookahead. Encoder uses future source symbols up to a fixed finite lookahead, d, and unit-delay noise-free feedback; decoder uses present channel output and past memory for source reconstruction. Complete memory case corresponds to $Z_i = Y^i$, which implies $|\mathcal{Z}_i| = |\mathcal{Y}|^i$.

• Source : Generates i.i.d. source symbols, $\{U_i\}_{i\in\mathbb{N}}$, $U_i \in \mathcal{U}$. The source symbols are distributed ∼ $P_U$.

• Channel Encoder : The encoder has access to unit-delay noise-free feedback from the channel output and future source symbols up to a fixed finite lookahead, d, i.e., $X_i = f_{e,i}(U^{i+d}, Y^{i-1})$, where $f_{e,i}$ is the encoding function, $f_{e,i} : \mathcal{U}^{i+d} \times \mathcal{Y}^{i-1} \to \mathcal{X}$, $i \in \mathbb{N}$.

• Channel : Given the channel input symbol, $x_i$, and all the source symbols and past channel inputs and outputs, $(u_1^\infty, x^{i-1}, y^{i-1})$, the channel output, $y_i$, is distributed i.i.d. ∼ $P_{Y|X}$, i.e.,

$$P(y_i \mid u_1^\infty, x^i, y^{i-1}) = P_{Y|X}(y_i \mid x_i). \tag{2}$$

• Memory : The decoder cannot make use of all the channel output symbols up to the current time due to memory constraints. Memory is updated as a function of the past state of the memory and the current channel output, i.e., $Z_i = f_{m,i}(Z_{i-1}, Y_i)$, where $f_{m,i}$ is the memory update function, $f_{m,i} : \mathcal{Z}_{i-1} \times \mathcal{Y} \to \mathcal{Z}_i$, $i \in \mathbb{N}$. Note that the alphabet $\mathcal{Z}_i$ can grow with i, hence the setup also includes the special case of complete memory, i.e., $Z_i = Y^i$, which implies $|\mathcal{Z}_i| = |\mathcal{Y}|^i$.

• Channel Decoder : The channel decoder uses the current channel output and the past memory state to construct its estimate of the source symbol, i.e., $\hat{U}_i = f_{d,i}(Z_{i-1}, Y_i)$; the decoding rule is the map $f_{d,i} : \mathcal{Z}_{i-1} \times \mathcal{Y} \to \hat{\mathcal{U}}$.

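The alphabets $\mathcal{U}$, $\mathcal{X}$, $\mathcal{Y}$ and $\hat{\mathcal{U}}$ are assumed to be finite. Let $\Lambda(\cdot,\cdot) : \mathcal{U} \times \hat{\mathcal{U}} \to \mathbb{R}$ indicate a distortion function. We assume for simplicity that $0 \le \Lambda(\cdot,\cdot) \le \Lambda_{max} < \infty$. Let the tuple $\mu(d) = (f_e, f_m, f_d)$ indicate the sequence of encoding rules, $\{f_{e,i}\}_{i\in\mathbb{N}}$, memory update rules, $\{f_{m,i}\}_{i\in\mathbb{N}}$, and decoding rules, $\{f_{d,i}\}_{i\in\mathbb{N}}$.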
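
As a small illustration of the objective, the sketch below evaluates the best expected distortion achievable by time-invariant, memoryless, deterministic symbol-by-symbol maps (one encoder map from source symbol to channel input and one decoder map from channel output to reconstruction), by exhaustive enumeration. This covers only a restricted zero-lookahead special case, not the general policy class $\mu(d)$; the function names and the example parameters are assumptions chosen for illustration.

```python
from itertools import product


def symbol_by_symbol_distortion(p_u, channel, loss):
    """Minimum expected distortion over deterministic single-letter maps.

    p_u[u]          : source probability P_U(u)
    channel[x][y]   : channel law P_{Y|X}(y | x)
    loss[u][u_hat]  : distortion Lambda(u, u_hat)
    """
    n_u, n_x = len(p_u), len(channel)
    n_y, n_hat = len(channel[0]), len(loss[0])
    best = float("inf")
    # enc[u] is the channel input for source symbol u,
    # dec[y] is the reconstruction for channel output y.
    for enc in product(range(n_x), repeat=n_u):
        for dec in product(range(n_hat), repeat=n_y):
            expected = sum(
                p_u[u] * channel[enc[u]][y] * loss[u][dec[y]]
                for u in range(n_u)
                for y in range(n_y)
            )
            best = min(best, expected)
    return best


def bsc(delta):
    """Transition matrix of a binary symmetric channel with crossover delta."""
    return [[1 - delta, delta], [delta, 1 - delta]]


HAMMING = [[0, 1], [1, 0]]

if __name__ == "__main__":
    print(symbol_by_symbol_distortion([0.7, 0.3], bsc(0.1), HAMMING))
    print(symbol_by_symbol_distortion([0.7, 0.3], bsc(0.4), HAMMING))
```

For a Bernoulli(0.3) source and Hamming loss, the first call evaluates to 0.1 up to floating point error (uncoded transmission over the channel with crossover 0.1), while the second evaluates to 0.3 (with crossover 0.4 it is better for the decoder to ignore the channel and always output 0).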


Definition 1: [Distortion-Optimal Policy] For a fixed lookahead, d, we define d-distortion optimal policies, $P_{opt}(d)$, as the set of $(f_e, f_m, f_d)$-policies, denoted by $\mu(d)$, which achieve the minimum expected average distortion, i.e.,

$$P_{opt}(d) = \left\{\mu(d) : \mu(d) = \arg\inf_{\{f_e,f_m,f_d\}} \limsup_{N\to\infty} E\left[\frac{1}{N}\sum_{i=1}^{N}\Lambda(U_i,\hat{U}_i)\right]\right\}. \tag{3}$$

The corresponding minimum expected distortion as a function of lookahead, d, is

$$D(d) = \inf_{\{f_e,f_m,f_d\}} \limsup_{N\to\infty} E\left[\frac{1}{N}\sum_{i=1}^{N}\Lambda(U_i,\hat{U}_i)\right]. \tag{4}$$

Our main goal is to characterize D(d) and identify structural properties of the elements of Popt(d).

Note 1: Note that inf in the definition of D(d) can equivalently be replaced by min (cf. Appendix A). This implies that $P_{opt}(d)$ is non-empty. Taking limsup in the definition of D(d), while appearing more conservative, is actually inconsequential, as one would get the same value of D(d) with a liminf in the definition. This can be easily argued as follows. Let the per-symbol expected distortion under a policy $\mu$ up to time N be denoted by $D_\mu^{(N)}$. Denoting by $D_{sup}(d)$ and $D_{inf}(d)$ the distortion criterion with limsup and liminf respectively, we know $D_{inf}(d) \le D_{sup}(d)$. We will now show $D_{inf}(d) \ge D_{sup}(d)$. Let a policy $\mu^*$ attain the infimum for $D_{inf}(d)$ (that such a policy exists follows from the same arguments as above for the non-emptiness of $P_{opt}(d)$). This implies (as $\Lambda(\cdot)$ is bounded) that for $\epsilon > 0$, $\exists\, N(\epsilon) > 0$ such that under this policy $D_{\mu^*}^{(N(\epsilon))} \le D_{inf}(d) + \epsilon$. Operating such a policy in b blocks,

$$D_{sup}(d) \le \lim_{b\to\infty} D_{\mu^*}^{(N(\epsilon)b)} \le D_{inf}(d) + \epsilon, \tag{5}$$

which implies, in the limit $\epsilon \to 0$, $D_{sup}(d) \le D_{inf}(d)$.

III. CONTROLLED MARKOV PROCESS WITH AVERAGE COST : BACKGROUND AND PRELIMINARIES

We present here an overview of parts of the controlled Markov process with average cost criterion framework that will

be applied. First, we present an overview of the unconstrained case where the only objective is to maximize an expected

average cost. We then consider the constrained case where, in addition, the system needs to satisfy certain expected average

cost constraints.

A. Unconstrained Control

Here we overview results about general Borel state and action spaces. We refer to [31] for a more complete discussion. The

problem is characterized by the tuple $(S, A_s, A, W, F, P_S, P_W, g)$ and a discrete time dynamical system,

$$s_t = F(s_{t-1}, a_t, w_t), \tag{6}$$

where the states $s_t$ take values in a finite, countable or, in general, Borel space $S$ (called the state space), the actions $a_t$ take values in the admissible action space, $A_s(s_t)$, which is a subset of a compact subset $A$ (called the action space) of a Borel space, and the disturbance, $w_t$, takes values in a measurable space $W$ (called the disturbance space). The initial state $S_0$ is drawn with distribution $P_S$ and the disturbance $w_t$ is drawn from the distribution $P_W(\cdot \mid s_{t-1}, a_t)$, which depends on past actions and states only through the pair $(s_{t-1}, a_t)$. We consider only measurable functions. A policy $\pi$ is defined to be the sequence of functions, $\pi = (\mu_1, \mu_2, \cdots)$, where $\mu_t$ is the function which maps histories, $\phi_t = (s_0, w_0, \cdots, w_{t-1})$, to actions. The set of history deterministic policies, $\Pi_{HD}$, is characterized by policies for which actions are generated as $a_t = \mu_t(\Phi_t)$. The set of Markov deterministic policies, $\Pi_{MD}$, is characterized by policies for which actions are generated as $a_t = \mu_t(s_{t-1})$. A set of policies, $\Pi_{SD}$, is referred to as stationary deterministic if it is characterized by a function $\mu : S \to A$ such that $\mu_t(\Phi_t) = \mu(s_{t-1})$ ∀ t. Policies can be randomized or deterministic ([31], Section 2.2). The policy sets $\Pi_{HR}$, $\Pi_{MR}$ and $\Pi_{SR}$ respectively stand for history randomized, Markov randomized and stationary randomized policies. As per our definitions and interests, the largest class of policies considered henceforth will be history deterministic policies, $\Pi_{HD}$. Let

$$K = \{(x,a) : x \in S,\ a \in A_s(x)\} \in \mathcal{B}(S \times A). \tag{7}$$

Note that if S and A are compact subsets of a Borel space, K is a compact subset ∈ B(S × A). The dynamics induce a stochastic transition kernel on B(S) × K, Q(·|x,a), which means that for each (x,a) ∈ K, Q(·|x,a) is a probability measure on B(S) and for each D ∈ B(S), Q(D|·) is Borel measurable on K.

The objective is to maximize the expected average reward given a bounded one stage reward function, $g : K \to \mathbb{R}$, and find the optimal policy. The average reward of a policy $\pi$ with a given initial state distribution $\nu$ is defined by

$$J(\nu,\pi) \triangleq \liminf_{N\to\infty} E_\nu^\pi\left[\frac{1}{N}\sum_{t=1}^{N} g(S_{t-1}, \mu_t(\Phi_t))\right]. \tag{8}$$


The optimal average reward and the optimal policy are defined by

$$J_{opt}(\nu) = \sup_{\pi} J(\nu,\pi), \tag{9}$$

$$\pi_{opt}(\nu) = \{\pi : J(\nu,\pi) = J_{opt}(\nu)\}. \tag{10}$$

Note that in general for a controlled Markov process with average cost criterion, where the state space is infinite, the total expected average cost might depend on the initial state. However, operationally, since our objective is to minimize the expected average distortion as in Eq. 4, we can decide to start off the system with the best initial state, i.e., the state which yields the best distortion, in which case the optimal cost and optimal policy will be denoted by $J_{opt}$ and $\pi_{opt}$:

$$J_{opt} = \sup_{\nu} J_{opt}(\nu), \tag{11}$$

$$\pi_{opt} = \{\pi : \exists\, \nu \text{ s.t. } J(\nu,\pi) = J_{opt}\}. \tag{12}$$

We need not dwell on sensitivity of the optimal cost to initial states, as this will not be an issue in our application of this

framework. However, when the state space is, say, finite, irreducible and positive recurrent, the average cost is indeed equal for all initial

states. In general, there can be more than one optimal policy, in which case ties are resolved arbitrarily.

The following theorem describes the average cost optimality equation (ACOE) for such a process, and relates the optimal

reward with the optimal stationary deterministic policy.

Theorem 1 (cf. Theorem 6.1 of [31]): If $\lambda \in \mathbb{R}$ and a bounded function $h : S \to \mathbb{R}$ satisfy

$$\lambda + h(s) = \sup_{a\in A}\left[g(s,a) + \int P_W(dw \mid s,a)\, h(F(s,a,w))\right], \quad \forall\, s \in S, \tag{13}$$

then $\lambda = J_{opt}$. Further, if there is a function $\mu : S \to A$ such that $\mu(s)$ attains the supremum above for all states, then $J(\pi) = J_{opt}$ for $\pi = \{\mu_1, \mu_2, \cdots\}$ with $\mu_i(\Phi_i) = \mu(s_{i-1})$, ∀ i.

Note 2: As in [31], the above theorem assumes the conditions of the semi-continuous model ([31], Section 2.4). However, in the set of problems considered in our paper, all such assumptions will be trivially met, such as the transition kernel being weakly continuous in K and the continuity of g. For brevity, we omit explicitly mentioning such assumptions before invoking the above theorem in the sections to follow.
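
For a finite state space and a finite action set, an equation of the form (13) can be solved numerically by relative value iteration, the method the paper later applies in Section V-B. The following is a minimal Python sketch under assumptions that are not part of Theorem 1: the function name and the data layout (`P[s][a][t]`, `g[s][a]`) are hypothetical, the integral in (13) becomes a finite sum over next states, and convergence is assumed (it holds, for instance, when every transition probability is strictly positive).

```python
def relative_value_iteration(P, g, ref_state=0, tol=1e-10, max_iter=100_000):
    """Approximately solve lam + h(s) = max_a [g(s,a) + sum_t P(t|s,a) h(t)].

    P[s][a][t] : probability of moving to state t from state s under action a
    g[s][a]    : one-stage reward
    Returns (lam, h, policy), with h normalized so that h[ref_state] = 0.
    """
    n_states = len(P)
    h = [0.0] * n_states

    def backup(values):
        # q[s][a] = g(s, a) + expected value of the next state
        return [
            [g[s][a] + sum(p * values[t] for t, p in enumerate(P[s][a]))
             for a in range(len(P[s]))]
            for s in range(n_states)
        ]

    for _ in range(max_iter):
        t_h = [max(row) for row in backup(h)]
        # Subtract the value at the reference state to keep h bounded.
        h_new = [v - t_h[ref_state] for v in t_h]
        converged = max(abs(x - y) for x, y in zip(h_new, h)) < tol
        h = h_new
        if converged:
            break

    q = backup(h)
    lam = max(q[ref_state]) - h[ref_state]
    policy = [max(range(len(row)), key=row.__getitem__) for row in q]
    return lam, h, policy


if __name__ == "__main__":
    # Two states, two actions, all transition probabilities positive.
    P = [[[0.9, 0.1], [0.5, 0.5]], [[0.2, 0.8], [0.5, 0.5]]]
    g = [[1.0, 0.0], [0.0, 0.5]]
    print(relative_value_iteration(P, g))
```

The returned `lam` plays the role of $\lambda$ in (13), `h` the role of the relative value function, and `policy` is a maximizing stationary deterministic rule. To use it for a distortion minimization problem, one would take the reward to be the negative of the one-stage distortion.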

B. Constrained Control

In constrained control, the system is characterized by the tuple $(S, A_s, A, W, F, P_S, P_W, g, l, \Gamma)$. With all the terms carrying the same meaning as in the previous subsection, $l = \{l_1(\cdot), \cdots, l_k(\cdot)\}$ and $\Gamma = \{\Gamma_1, \cdots, \Gamma_k\}$ are respectively k-dimensional constraint functions (defined on K) and cost vectors for some $k \in \mathbb{N}$. The dynamics of the system are precisely the same as in the unconstrained case, the objective here being

$$\text{maximize } J(\nu,\pi) \quad \text{subject to } J_i^c(\nu,\pi) \le \Gamma_i \ \ \forall\, i = 1,\cdots,k, \tag{14}$$

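where

$$J(\nu,\pi) \triangleq \liminf_{N\to\infty} E_\nu^\pi\left[\frac{1}{N}\sum_{t=1}^{N} g(S_{t-1}, \mu_t(\Phi_t))\right] \tag{15}$$

is the average cost and

$$J_i^c(\nu,\pi) \triangleq \liminf_{N\to\infty} E_\nu^\pi\left[\frac{1}{N}\sum_{t=1}^{N} l_i(S_{t-1}, \mu_t(\Phi_t))\right] \quad \forall\, i = 1,\cdots,k, \tag{16}$$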

are the constraints. [31] and [32] provide a treatment of this problem but only for denumerable states. We here present the more general framework of [33], with compact state and action spaces. The Lagrangian, L, associated with the problem is defined as

$$L((\nu,\pi),\lambda) = J(\nu,\pi) + \sum_{i=1}^{k} \lambda_i\left(\Gamma_i - J_i^c(\nu,\pi)\right), \tag{17}$$

for any $(\nu,\pi) \in P(S) \times \Pi_{HD}$ and $\lambda = (\lambda_1, \cdots, \lambda_k) \in \mathbb{R}_+^k$ (the positive orthant of the k-dimensional Euclidean space).

The following theorem gives conditions of optimality of a particular initial state distribution and a policy.

Theorem 2: [Theorem 2.3 of [33]] Assume the following conditions for the tuple $(S, A_s, A, W, F, P_S, P_W, g, l, \Gamma)$,

C1 S and K are compact.

C2 $g \in C_b(K)$ and $l_i \in C_b(K)$, ∀ i = 1,···,k.