On Real Time Coding with Limited Lookahead
Himanshu Asnani∗and Tsachy Weissman†
Abstract
A real time coding system with lookahead consists of a memoryless source, a memoryless channel, an encoder, which encodes
the source symbols sequentially with knowledge of future source symbols up to a fixed finite lookahead, d, with or without feedback
of the past channel output symbols and a decoder, which sequentially constructs the source symbols using the channel output.
The objective is to minimize the expected per-symbol distortion.
For a fixed finite lookahead d ≥ 1 we invoke the theory of controlled Markov chains to obtain an average cost optimality
equation (ACOE), the solution of which, denoted by D(d), is the minimum expected per-symbol distortion. With increasing d,
D(d) bridges the gap between causal encoding, d = 0, where symbol-by-symbol encoding-decoding is optimal, and the infinite
lookahead case, d = ∞, where Shannon theoretic arguments show that separation is optimal.
We extend the analysis to a system with finite state decoders, with or without noise-free feedback. For a Bernoulli source and
a binary symmetric channel, under Hamming loss, we compute the optimal distortion for various source and channel parameters,
and thus obtain computable bounds on D(d). We also identify regions of source and channel parameters where symbol-by-symbol
encoding-decoding is suboptimal. Finally, we demonstrate the wide applicability of our approach by applying it in additional coding
scenarios, such as the case where the sequential decoder can take cost constrained actions affecting the quality or availability of
side information about the source.
Index Terms
Actions, Average Cost Optimality Equation (ACOE), Beliefs, Bellman Equation, Constrained Markov Decision Process,
Controlled Markov Chains, Expected Average Distortion, Finite State Decoders, Lagrangian, Lookahead, Optimal Cost, Policy,
Side Information, Value Iteration, Vending Machine.
I. INTRODUCTION
A. Motivation and Related Work
A memoryless source {U1,U2,...} is to be communicated over a memoryless channel with the objective of minimizing
expected average (per-symbol) distortion, with or without the availability of unit-delay noise-free feedback. The communication
is in real time and hence the encoding and decoding are sequential, with a fixed finite lookahead of source symbols available
at the encoder (cf. the setting in Fig. 1). The motivation stems from practical systems such as video streaming, cache
memory devices in computing systems, and real time communication systems, where the encoder has a fixed buffer of future
source symbols and quality of service demands that encoding and decoding be in real time. The problem also finds
applications in other sequential decision systems where resource allocation must be done on the fly due to the adverse effects
of latency or delay, such as sensor networks, weather-monitoring systems, and flows in societal networks (e.g., transportation
networks, recycling systems). A natural criterion of performance is to minimize the expected average distortion. What is
the best we can do here? Note that such a framework with real time constraints is not covered by Shannon theory: in classical
information theory, the encoding of long “typical” sequences in blocks, as well as block decoding, introduces large delays,
so such achievable schemes violate the very premise of a bounded or no delay constraint. To answer the question, we invoke
Markov decision theory and cast our problem, and other such variants, as discrete time controlled Markov chains with an average
cost criterion.
The problem is well motivated by practical problems of delay constrained source-channel coding and has been of much
interest in the literature. There have also been many different ways to model the notion of sequential encoding and decoding.
In the source coding context, causal source codes were studied in [1], [2], [3]; these demand that the reconstruction depend
causally on the source symbols. This, however, is a much weaker constraint, and causal source codes can operate with large delays, as
was pointed out in [1] itself. Causal source codes with side information were studied in [4].
Note that we can transform our setting of limited encoder lookahead d to that of zero lookahead of a Markov source,
V_i = U_i^{i+d}. This transformation puts the problem in the class of sequential encoding-decoding problems with Markov sources.
When the communication horizon is fixed, the structure of optimal encoding and decoding policies for Markov sources
has been studied in [5], [6], [7], [8], [9], [10]. In [11], the authors propose a systematic methodology for searching for an
optimal strategy in such non-classical information structures.
The problem of real time coding and decoding in the semi-stochastic setting, i.e., for individual sequences, was studied in
[12] and [13], while finite state digital systems were the subject of study in [14].
The connection between dynamic programming and information theory has been well exploited. The problem of computing
the capacity of channels with feedback was formulated as a Markov Decision Process in [15], [16]. The long-standing problem
of the capacity of the trapdoor channel with feedback (cf. [17], [18]) was evaluated using average cost optimality equations in [19].
The zero error capacity for certain channel coding problems was computed using dynamic programming in [20].

∗Stanford University, Email: asnani@stanford.edu.
†Stanford University, Email: tsachy@stanford.edu.

arXiv:1105.5755v1 [cs.IT] 29 May 2011
B. Contributions and Organization of the Paper
The approaches in [5], [6], [7], [8], [9], [10] and [11] are inspired by control theory, which provides tools for finding optimal
schemes and understanding their structure. In this work, we take these tools further to provide more explicit expressions and
bounds for the optimum performance under a given lookahead constraint d. While optimum performance in the case d = 0
is easily shown to be attained by “symbol-by-symbol” operations, and the case d = ∞ can be answered with the tools of
Shannon theory, for any finite d ≥ 1 the existing literature does not provide useful analytical values or bounds on the minimum
expected average distortion, D(d). In addition to being amenable to a decision theoretic formulation via Markov sources, as
in the literature surveyed above, the model we consider here is more basic and lends itself to a simpler average cost optimality
equation, which in some cases (cf. Section V) can be solved exactly. While in [7], [8], [10] the emphasis is on the expected total
cost over a fixed horizon, we argue that the expected average cost over an infinite horizon is a more natural criterion of performance:
in sequential encoding and decoding problems we typically do not know when to stop, and hence we would like to analyze
the asymptotics of the horizon-independent problem. While the main focus of this work is to characterize the minimum
achievable distortion, the average cost optimality equations also yield sufficient conditions for the optimality of stationary
(encoding and decoding) policies.
Note that in our communication problem in Fig. 1, the lookahead is available only at the encoder, while the decoder constructs
the estimates causally. This may seem less general than a setting where a lookahead l_e is present at the encoder while the decoder
has lookahead l_d. However, the performance of any policy/code with encoder and decoder lookahead parameters (l_e, l_d) can be
attained arbitrarily closely by the optimal policy for our setting in Fig. 1 with d = l_e + l_d, as pointed out in Section II of
[13]. The authors in [21] consider a communication problem similar to our setting with l_e = 0, l_d = d for d ≥ 0 and per-symbol
distortion D(d), show that D(d) converges exponentially rapidly to D(∞), and provide bounds on the exponent. However,
those results are asymptotic in nature and hence differ from this work, which gives explicit exact or approximate characterizations
of D(d) for any fixed, possibly small, d.
Recently there has been work in the direction of “action in information theory”, i.e., canonical Shannon theoretic models
in which the encoder and/or decoder take cost constrained actions to affect the generation or availability of channel state information,
side information, feedback, etc.; cf. actions in point to point scenarios in [22], [23], [24], [25], [26] and in multi-terminal systems
in [27], [28]. We revisit the setting of source coding with a side information vending machine, as in [22] (see Fig. 6), for
the case where the encoding is sequential with lookahead and the decoder sequentially takes an action A_v, dependent on the encoded
symbols, to get side information about the source through a memoryless channel, P_{Y|U,A_v}. The reconstruction of the source is
based on the current encoded symbol, the current side information symbol, and memories storing the past encoded symbols
and side information symbols. We show that this problem can be formulated as a constrained Markov decision process.
The main contribution of this paper is to cast a large class of limited delay source, channel, and joint source-channel
coding problems in the realm of sequential decision theory, obtain characterizations of the optimum performance via average
cost optimality equations with finite or compact state spaces, and solve exactly, or obtain bounds for, the expected average
distortion as a function of the lookahead d.
The paper is organized as follows. Section II describes the basic model of problems with lookahead (see Fig. 1): encoding
is sequential, using the lookahead and unit-delay noise-free feedback, X_i(U^{i+d}, Y^{i-1}), while the decoding depends on the
current channel output and the past memory, Û_i(Y_i, Z_{i-1}). The memory evolves as Z_i(Z_{i-1}, Y_i). We seek the minimum
expected average distortion as a function of the lookahead, i.e.,

D(d) = inf_{{X_i(·)}, {Û_i(·)}} limsup_{N→∞} E[ (1/N) Σ_{i=1}^N Λ(U_i, Û_i) ].  (1)
In Section III we present an overview of controlled Markov processes with average cost: the unconstrained case in Section
III-A and constrained control in Section III-B. Section IV studies the case of complete memory, i.e., Z_i = Y^i. In Section IV-A
we use the theory of Section III to construct an average cost optimality equation, the solution of which is the optimal average
distortion. In Section IV-B, we consider the question “to look or not to lookahead” and specify a sufficient condition under
which symbol-by-symbol encoding-decoding is optimal for a given source, channel, distortion function, and lookahead. This
kind of result in our problem of sequential encoding-decoding with lookahead complements the “to code or not to code”
results of [29]. In Section V, we consider the framework with finite state decoders, constructing the corresponding ACOE in Section
V-A. In Section V-B, we use relative value iteration to solve the problem exactly for an example of a binary source and binary
symmetric channel under Hamming loss, thereby demonstrating how the average distortion values for this setting can be used to
bound the D(d) of Section IV. We also contrast with the extreme cases of no lookahead, d = 0, where symbol-by-symbol policies
are optimal, and d = ∞, where Shannon's separation theorem [30] determines the minimum expected average distortion. We
also highlight the regions of source-channel parameters where, for a Bernoulli source and binary symmetric channel, symbol-by-symbol
encoding-decoding is strictly suboptimal for any finite d ≥ 1. Section VI relaxes the assumption of the previous
sections that feedback is present. In Section VII, the setting of source coding with a side information vending machine is
considered. Here again, encoding is sequential with lookahead; the decoder takes cost constrained actions A_{v,i} sequentially to
get side information about the source through a memoryless channel, P_{Y|U,A_v}. The decoding is the optimal reconstruction
Û_i(X_i, Y_i, M_{i-1}, N_{i-1}), where M_{i-1} and N_{i-1} are memories storing some or all of the past encoded symbols and side
information symbols, respectively. Section VII-A treats the case where the encoder also has access to the side information, with
a complete memory decoder in Section VII-A1 and finite memory decoders in Section VII-A2. Section
VII-B studies the same source coding problem with a side information vending machine, but now the encoder has no access to
the side information. Section VIII summarizes the methodology developed in this paper for constructing average cost optimality
equations. The paper is concluded in Section IX.
II. PROBLEM FORMULATION
We begin by explaining the notation used throughout this paper. Upper case, lower case, and calligraphic
letters denote, respectively, random variables, specific or deterministic values which random variables may assume, and their
alphabets. For two jointly distributed random variables X and Y, let P_X, P_XY and P_X|Y respectively denote the marginal
of X, the joint distribution of (X, Y), and the conditional distribution of X given Y. X_m^n is shorthand for the (n − m + 1)-tuple
{X_m, X_{m+1}, ···, X_n}. B(X) denotes the Borel σ-algebra of a given topological space X. P(X) denotes the probability
simplex on the finite alphabet X. C_b(X) denotes the set of continuous and bounded functions on the topological space X. 1{·}
stands for the indicator function. N and R denote the sets of natural and real numbers, respectively. We impose the assumption
of finiteness of cardinality on all alphabets of operational significance (source, channel input, channel output, reconstruction),
unless otherwise indicated. The general problem setup, depicted in Fig. 1, consists of the following principal components:
Fig. 1. Real time coding with lookahead. The encoder uses future source symbols up to a fixed finite lookahead, d, and unit-delay noise-free feedback, X_i(U^{i+d}, Y^{i-1}); the decoder uses the present channel output and past memory for source reconstruction, Û_i(Y_i, Z_{i-1}), with memory update Z_i(Z_{i-1}, Y_i). The complete memory case corresponds to Z_i = Y^i, which implies |Z_i| = |Y|^i.
• Source: Generates i.i.d. source symbols {U_i}_{i∈N}, U_i ∈ U, distributed ∼ P_U.
• Channel Encoder: The encoder has access to unit-delay noise-free feedback of the channel output and to future source
symbols up to a fixed finite lookahead d, i.e., X_i = f_{e,i}(U^{i+d}, Y^{i-1}), where f_{e,i} is the encoding function, f_{e,i} : U^{i+d} ×
Y^{i-1} → X, i ∈ N.
• Channel: Given the channel input symbol x_i, and all the source symbols and past channel inputs and outputs
(u_1^∞, x^{i-1}, y^{i-1}), the channel output y_i is distributed i.i.d. ∼ P_{Y|X}, i.e.,

P(y_i | u_1^∞, x^i, y^{i-1}) = P_{Y|X}(y_i | x_i).  (2)

• Memory: The decoder cannot make use of all the channel output symbols up to the current time due to memory constraints.
The memory is updated as a function of its past state and the current channel output, i.e., Z_i = f_{m,i}(Z_{i-1}, Y_i),
where f_{m,i} is the memory update function, f_{m,i} : Z_{i-1} × Y → Z_i, i ∈ N. Note that the alphabet Z_i can grow with
i; hence the setup also includes the special case of complete memory, i.e., Z_i = Y^i, which implies |Z_i| = |Y|^i.
• Channel Decoder: The channel decoder uses the current channel output and the past memory state to construct its estimate
of the source symbol, i.e., Û_i = f_{d,i}(Z_{i-1}, Y_i); the decoding rule is the map f_{d,i} : Z_{i-1} × Y → Û.

The alphabets U, X, Y and Û are assumed to be finite. Let Λ(·,·) : U × Û → R denote a distortion function. We assume
for simplicity that 0 ≤ Λ(·,·) ≤ Λ_max < ∞. Let the tuple µ(d) = (f_e, f_m, f_d) denote the sequence of encoding rules
{f_{e,i}}_{i∈N}, memory update rules {f_{m,i}}_{i∈N}, and decoding rules {f_{d,i}}_{i∈N}.
Definition 1: [Distortion-Optimal Policy] For a fixed lookahead d, we define the set of d-distortion optimal policies, P_opt(d), as the
set of (f_e, f_m, f_d)-policies, denoted by µ(d), which achieve the minimum expected average distortion, i.e.,

P_opt(d) = { µ(d) : µ(d) = arg inf_{f_e,f_m,f_d} limsup_{N→∞} E[ (1/N) Σ_{i=1}^N Λ(U_i, Û_i) ] }.  (3)

The corresponding minimum expected distortion as a function of the lookahead d is

D(d) = inf_{f_e,f_m,f_d} limsup_{N→∞} E[ (1/N) Σ_{i=1}^N Λ(U_i, Û_i) ].  (4)

Our main goal is to characterize D(d) and identify structural properties of the elements of P_opt(d).
Note 1: The inf in the definition of D(d) can equivalently be replaced by min (cf. Appendix A). This implies that
P_opt(d) is non-empty. Taking limsup in the definition of D(d), while appearing more conservative, is actually inconsequential:
one obtains the same value of D(d) with liminf in the definition. This can be argued as follows. Let the
per-symbol expected distortion under a policy µ up to time N be denoted by D_µ^(N). Denoting by D_sup(d) and D_inf(d) the
distortion criteria with limsup and liminf, respectively, we know D_inf(d) ≤ D_sup(d). We will now show D_inf(d) ≥ D_sup(d).
Let a policy µ∗ attain the infimum for D_inf(d) (that such a policy exists follows from the same arguments as above
for the non-emptiness of P_opt(d)). Since Λ(·,·) is bounded, for every ε > 0 there exists N(ε) > 0 such that under this policy
D_{µ∗}^(N(ε)) ≤ D_inf(d) + ε. Operating such a policy in b blocks,

D_sup(d) ≤ lim_{b→∞} D_{µ∗}^(N(ε)b) ≤ D_inf(d) + ε,  (5)

which implies, in the limit ε → 0, D_sup(d) ≤ D_inf(d).
III. CONTROLLED MARKOV PROCESS WITH AVERAGE COST: BACKGROUND AND PRELIMINARIES
We present here an overview of the parts of the controlled Markov process with average cost criterion framework that will
be applied. First, we present an overview of the unconstrained case, where the only objective is to maximize an expected
average reward. We then consider the constrained case where, in addition, the system needs to satisfy certain expected average
cost constraints.
A. Unconstrained Control
Here we overview results about general Borel state and action spaces; we refer to [31] for a more complete discussion. The
problem is characterized by the tuple (S, A_s, A, W, F, P_S, P_W, g) and a discrete time dynamical system,

s_t = F(s_{t−1}, a_t, w_t),  (6)

where the states s_t take values in a finite, countable, or general Borel space S (the state space), actions a_t take values
in the admissible action space A_s(s_t), a subset of a compact subset A (the action space) of a Borel space,
and the disturbance w_t takes values in a measurable space W (the disturbance space). The initial state S_0 is drawn with
distribution P_S, and the disturbance w_t is drawn from the distribution P_W(·|s_{t−1}, a_t), which depends on past actions and
states only through the pair (s_{t−1}, a_t). We consider only measurable functions. A policy π is defined to be a sequence of
functions, π = (µ_1, µ_2, ···), where µ_t is the function which maps histories (Φ_t = (s_0, w_0, ···, w_{t−1})) to actions. The set of history
deterministic policies, Π_HD, is characterized by policies for which actions are generated as a_t = µ_t(Φ_t). The set of Markov
deterministic policies, Π_MD, is characterized by policies for which actions are generated as a_t = µ_t(s_{t−1}). The set of policies
Π_SD is referred to as stationary deterministic if it is characterized by a function µ : S → A such that µ_t(Φ_t) = µ(s_{t−1}) ∀ t.
Policies can be randomized or deterministic ([31], Section 2.2). The policy sets Π_HR, Π_MR and Π_SR respectively stand for
history randomized, Markov randomized, and stationary randomized policies. As per our definitions and interests, the largest
class of policies considered henceforth will be the history deterministic policies, Π_HD. Let

K = {(x, a) : x ∈ S, a ∈ A_s(x)} ∈ B(S × A).  (7)
Note that if S and A are compact subsets of a Borel space, then K is a compact subset in B(S × A). The dynamics induce a stochastic
transition kernel Q(·|x, a) on B(S) × K; that is, for each (x, a) ∈ K, Q(·|x, a) is a probability measure on B(S), and for
each D ∈ B(S), Q(D|·) is Borel measurable on K.
The objective is to maximize the expected average reward, given a bounded one-stage reward function g : K → R, and to find the
optimal policy. The average reward of a policy π with a given initial state distribution ν is defined by

J(ν, π) = liminf_{N→∞} E^π_ν [ (1/N) Σ_{t=1}^N g(S_{t−1}, µ_t(Φ_t)) ].  (8)
The optimal average reward and the set of optimal policies are defined by

J_opt(ν) = sup_π J(ν, π),  (9)
π_opt(ν) = {π : J(ν, π) = J_opt(ν)}.  (10)

Note that, in general, for a controlled Markov process with average cost criterion where the state space is infinite, the total
expected average cost might depend on the initial state. However, operationally, since our objective is to minimize the expected
average distortion as in Eq. (4), we can decide to start the system in the best initial state, i.e., the state which yields the best
distortion, in which case the optimal cost and optimal policy are denoted by J_opt and π_opt:

J_opt = sup_ν J_opt(ν),  (11)
π_opt = {π : ∃ ν s.t. J(ν, π) = J_opt}.  (12)

We need not dwell on the sensitivity of the optimal cost to the initial state, as this will not be an issue in our application of the
framework. (When the state space is finite and the induced chain is irreducible and positive recurrent, the average cost is indeed
equal for all initial states.) In general there can be more than one optimal policy, in which case ties are resolved arbitrarily.
The following theorem describes the average cost optimality equation (ACOE) for such a process, and relates the optimal
reward to the optimal stationary deterministic policy.

Theorem 1 (cf. Theorem 6.1 of [31]): If λ ∈ R and a bounded function h : S → R satisfy

λ + h(s) = sup_{a∈A} { g(s, a) + ∫ P_W(dw|s, a) h(F(s, a, w)) }, ∀ s ∈ S,  (13)

then λ = J_opt. Further, if there is a function µ : S → A such that µ(s) attains the supremum above for all states, then
J(π) = J_opt for π = {µ_1, µ_2, ···} with µ_i(Φ_i) = µ(s_{i−1}), ∀ i.

Note 2: As in [31], the above theorem assumes the conditions of the semi-continuous model ([31], Section 2.4). However, in
the set of problems considered in our paper, all such assumptions are trivially met, e.g., the transition kernel being
weakly continuous in K and g being continuous. For brevity, we omit explicit mention of these assumptions before invoking
the above theorem in the sections to follow.
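As a concrete illustration of Theorem 1 in the simplest finite setting, an ACOE like Eq. (13) can be solved numerically by relative value iteration (the same method the paper later applies in Section V-B). The sketch below is our illustration, not part of the paper: the array layout, the normalization at a reference state, and the two-state example used to exercise it are arbitrary toy choices.

```python
import numpy as np

def relative_value_iteration(P, g, tol=1e-10, max_iter=10_000):
    """Solve  lam + h(s) = max_a [ g(s,a) + sum_t P[a,s,t] h(t) ]
    for a finite-state, finite-action average-reward MDP.

    P : (num_actions, num_states, num_states) transition probabilities
    g : (num_states, num_actions) bounded one-stage rewards
    Returns (lam, h, policy), with h normalized so that h[0] = 0.
    """
    num_states = g.shape[0]
    h = np.zeros(num_states)
    for _ in range(max_iter):
        # Q[s, a] = g(s, a) + E[h(next state) | s, a]
        Q = g + np.einsum('ast,t->sa', P, h)
        h_new = Q.max(axis=1)
        lam = h_new[0]          # value at the reference state
        h_new = h_new - lam     # keep the iterates bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return lam, h, Q.argmax(axis=1)
```

At convergence, the pair (lam, h) satisfies Eq. (13) with the integral replaced by a finite sum, and the arg max yields a stationary deterministic policy as in the second part of the theorem.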
B. Constrained Control
In constrained control, the system is characterized by the tuple (S, A_s, A, W, F, P_S, P_W, g, l, Γ). With all the terms carrying
the same meaning as in the previous subsection, l = {l_1(·),···,l_k(·)} and Γ = {Γ_1,···,Γ_k} are, respectively, k-dimensional
constraint functions (defined on K) and cost vectors, for some k ∈ N. The dynamics of the system are precisely the same as
in the unconstrained case, the objective here being

maximize J(ν, π)
subject to J^c_i(ν, π) ≤ Γ_i, ∀ i = 1,···,k,  (14)
where

J(ν, π) = liminf_{N→∞} E^π_ν [ (1/N) Σ_{t=1}^N g(S_{t−1}, µ_t(Φ_t)) ]  (15)

is the average cost and

J^c_i(ν, π) = liminf_{N→∞} E^π_ν [ (1/N) Σ_{t=1}^N l_i(S_{t−1}, µ_t(Φ_t)) ], ∀ i = 1,···,k,  (16)
are the constraints. [31] and [32] provide a treatment of this problem, but only for denumerable state spaces. We present here the
more general framework of [33], with compact state and action spaces. The Lagrangian, L, associated with the problem is
defined as

L((ν, π), λ) = J(ν, π) + Σ_{i=1}^k λ_i (Γ_i − J^c_i(ν, π)),  (17)

for any (ν, π) ∈ P(S) × Π_HD and λ = (λ_1,···,λ_k) ∈ R^k_+ (the positive orthant of the k-dimensional Euclidean space).

The following theorem gives conditions for the optimality of a particular initial state distribution and policy.

Theorem 2: [Theorem 2.3 of [33]] Assume the following conditions for the tuple (S, A_s, A, W, F, P_S, P_W, g, l, Γ):
C1 S and K are compact.
C2 g ∈ C_b(K) and l_i ∈ C_b(K), ∀ i = 1,···,k.
C3 For all x_n → x and a_n → a, Q(·|x_n, a_n) converges weakly to Q(·|x, a).
C4 (Slater's Condition) There exists a (ν, π) ∈ P(S) × Π_HD such that

J^c_i(ν, π) < Γ_i, ∀ i = 1,···,k.  (18)

Under conditions C1-C4, the Lagrangian L(·,·) has a saddle point with a randomized stationary policy, i.e., ∃ λ∗ ≥ 0 and
(ν∗, π∗) ∈ P(S) × Π_SR such that

L((ν, π), λ∗) ≤ L((ν∗, π∗), λ∗) ≤ L((ν∗, π∗), λ), ∀ (ν, π) ∈ P(S) × Π, λ ≥ 0,  (19)

which implies (from Theorem 2.1 of [33]) that (ν∗, π∗) is a constrained optimal pair. Further (Theorem 2.2 of [33]),

L∗ = L((ν∗, π∗), λ∗) = inf_{λ≥0} sup_{(ν,π)∈P(S)×Π_HD} L((ν, π), λ) = sup_{(ν,π)∈P(S)×Π_HD} inf_{λ≥0} L((ν, π), λ),  (20)

and L∗ is the solution of the problem, or the minimum expected average distortion such that the constraints are satisfied.
Note 3: In all the settings considered henceforth, A_s(s) = A ∀ s ∈ S; hence, with benign abuse of notation, we will drop
A_s from the tuple associated with our description.
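A common computational route suggested by the Lagrangian formulation above is to sweep the multiplier: for each fixed λ ≥ 0 the inner supremum is an unconstrained average reward problem with one-stage reward g − λl, solvable as in Section III-A. The sketch below is our illustration for a single constraint (k = 1) and is simplified in one important respect: it searches only over deterministic stationary policies, whereas Theorem 2 in general requires randomized stationary policies to attain the constrained optimum. All function names and the toy MDP used to exercise it are our own choices.

```python
import numpy as np

def solve_unconstrained(P, r, iters=5000):
    """Relative value iteration: an optimal deterministic policy for the
    average-reward MDP with rewards r[s,a] and transitions P[a,s,t]."""
    h = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + np.einsum('ast,t->sa', P, h)
        m = Q.max(axis=1)
        h = m - m[0]
    return Q.argmax(axis=1)

def avg_value(P, c, policy, iters=500):
    """Long-run average of c[s,a] under a deterministic policy, via power
    iteration for the stationary distribution (assumes an ergodic chain)."""
    S = c.shape[0]
    Ppi = np.stack([P[policy[s], s] for s in range(S)])
    mu = np.full(S, 1.0 / S)
    for _ in range(iters):
        mu = mu @ Ppi
    return float(mu @ c[np.arange(S), policy])

def lagrangian_sweep(P, g, l, Gamma, lambdas):
    """For each multiplier lam, solve the unconstrained problem with reward
    g - lam * l; among the resulting policies satisfying J^c <= Gamma,
    return the tuple (J, J^c, lam, policy) with the largest reward J."""
    best = None
    for lam in lambdas:
        pol = solve_unconstrained(P, g - lam * l)
        J, Jc = avg_value(P, g, pol), avg_value(P, l, pol)
        if Jc <= Gamma and (best is None or J > best[0]):
            best = (J, Jc, lam, pol)
    return best
```

The sweep illustrates the decomposition behind Eq. (20): the multiplier trades off reward against constraint cost, and feasibility is checked a posteriori.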
IV. REAL-TIME CODING WITH LIMITED LOOKAHEAD: COMPLETE MEMORY
The problem we described in Section II (Fig. 1) is an abstraction of a real time communication problem in which the encoder
has a fixed lookahead of future source symbols and perfect unit-delay feedback of the channel output symbols. In this
section, we show that this problem can be formulated as a controlled Markov chain with average cost criterion, and
derive an optimality equation. Before that, we modify our source to concentrate on an equivalent problem. Note that the i.i.d.
source S = {U_i}_{i∈N} can be replaced by a Markov source S_M = {V_i}_{i∈N} such that V_i = U_i^{i+d} ∈ U^{d+1}. Since
the source S is i.i.d., the transition kernel of the Markov process S_M from v = (u_1, u_2, ···, u_{d+1}) to ṽ = (ũ_1, ũ_2, ···, ũ_{d+1})
is given by

K(v, ṽ) = P(ṽ|v) = 1{(u_2,···,u_{d+1}) = (ũ_1,···,ũ_d)} P_U(ũ_{d+1}).  (21)

The transition matrix is denoted by K. Let us assume the distribution of the initial state is P_V. Also, there is no loss of optimality
in considering the encoding functions to be |V|-dimensional mappings, {f_{e,i}(v, V^{i−1}, Y^{i−1})}_{v∈V}. The effective problem with
the modified source S_M is now a real-time communication problem as in Fig. 2 with no lookahead. For this modified problem,
we seek to minimize the average distortion,

inf limsup_{n→∞} (1/n) E[ Σ_{i=1}^n Λ̃(V_i, V̂_i^opt(Y^i)) ] = inf limsup_{n→∞} (1/n) E[ Σ_{i=1}^n Λ(U_i, Û_i^opt(Y^i)) ],  (22)

where Λ̃(V_i, V̂_i^opt(Y^i)) = Λ(U_i, Û_i^opt(Y^i)). In this section we construct an average cost optimality equation for the equivalent
problem in Fig. 2 with complete memory, i.e., Z_i = Y^i.
Fig. 2. Equivalent problem to Fig. 1, with the memoryless source S = {U_i}_{i∈N} transformed into a Markov source, S_M = {V_i}_{i∈N}; the actual channel encoder f_{e,i}(·, V^{i−1}, Y^{i−1}) plays the role of an equivalent encoder with no lookahead.
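The source transformation above is mechanical: V_i is a sliding block of d + 1 consecutive source symbols, and Eq. (21) defines a sparse stochastic matrix over U^{d+1}. A small sketch (our illustration, not part of the paper; the Bernoulli pmf used to exercise it is a toy choice):

```python
import itertools
import numpy as np

def sliding_blocks(u, d):
    """Transform an i.i.d. sequence u into the Markov source V_i = U_i^{i+d}:
    overlapping blocks of d + 1 consecutive symbols."""
    return [tuple(u[i:i + d + 1]) for i in range(len(u) - d)]

def transition_matrix(pU, d):
    """Kernel of Eq. (21): K(v, v') = 1{shift match} * P_U(new symbol),
    for states v in U^{d+1}, with U identified with {0, ..., len(pU)-1}."""
    U = range(len(pU))
    states = list(itertools.product(U, repeat=d + 1))
    idx = {v: k for k, v in enumerate(states)}
    K = np.zeros((len(states), len(states)))
    for v in states:
        for u_new in U:
            v_next = v[1:] + (u_new,)   # shift in one fresh source symbol
            K[idx[v], idx[v_next]] = pU[u_new]
    return K, states
```

Each row of K has exactly |U| nonzero entries, reflecting the deterministic shift structure of the indicator in Eq. (21).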
A. Average Cost Optimality Equation
Definition 2 (Bayes Envelope and Bayes Response): Consider a random variable X taking values in a finite alphabet X
with distribution P_X, and let x̂ ∈ X̂ be our guess. The loss function Λ : X × X̂ → R can be understood as quantifying the
discrepancy between the actual value of X and its estimate. An estimate is good if its expected loss E[Λ(X, X̂)] is small. We define
the Bayes Envelope as B(P_X) = min_x̂ E_{P_X}[Λ(X, x̂)]. This represents the minimal expected loss associated with the best
possible guess. The best guess is called the Bayes Response to P_X and is denoted by X̂_Bayes(P_X) = argmin_x̂ E[Λ(X, x̂)], where
ties are resolved arbitrarily. In the presence of an observation, the optimal estimator of X based on Y, in the sense of minimizing
expected loss under Λ, is given by X̂_Bayes(P_{X|Y}) = argmin_x̂ E[Λ(X, x̂)|Y]. Note that in general the Bayes response depends
on the loss function; this dependence is implied whenever we use the Bayes response.

Lemma 3: The optimal decoding rule for the problem in Fig. 2 is given by

V̂_i^opt(Y^i) = V̂_Bayes(P_{V_i|Y^i}).  (23)
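On finite alphabets the Bayes response and envelope of Definition 2 reduce to minimizing a vector-matrix product; a minimal sketch (our illustration, with alphabets identified with {0, ..., n−1} and the loss given as a matrix):

```python
import numpy as np

def bayes_response(p, loss):
    """Bayes response and envelope for a pmf p over X and a loss matrix
    loss[x, x_hat]: returns (argmin_xhat E[loss(X, xhat)], B(p))."""
    expected = p @ loss                 # expected loss of each guess x_hat
    x_hat = int(np.argmin(expected))    # ties resolved arbitrarily (lowest index)
    return x_hat, float(expected[x_hat])
```

Under Hamming loss on a binary alphabet, for example, the response is simply the more probable symbol and the envelope is min(p, 1 − p).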
Proof: Fix n and the encoding rule. From the definition of the Bayes response,

E[Λ̃(V_i, V̂_i(y^i)) | Y^i = y^i] ≥ E[Λ̃(V_i, V̂_Bayes(P_{V_i|y^i})) | Y^i = y^i],  (24)

which implies

E[Λ̃(V_i, V̂_i(Y^i))] ≥ E[Λ̃(V_i, V̂_Bayes(P_{V_i|Y^i}))].  (25)

Thus we have the following lower bound on the expected average cost:

limsup_{n→∞} (1/n) E[ Σ_{i=1}^n Λ̃(V_i, V̂_i(Y^i)) ] ≥ limsup_{n→∞} (1/n) E[ Σ_{i=1}^n Λ̃(V_i, V̂_Bayes(P_{V_i|Y^i})) ],  (26)

which is attained by the decoding rule V̂_i^opt(Y^i) = V̂_Bayes(P_{V_i|Y^i}). Thus the optimal decoding for the original source is
Û_i^opt(Y^i) = Û_Bayes(P_{U_i|Y^i}).

Fix the decoding rule to be the optimal rule {V̂_i^opt}_{i∈N} as above. Consider the state sequence for this problem, S_i =
(V_i, P_{V_i|Y^i}) ∈ S. Here P_{V_i|Y^i} denotes the belief of the encoder about the source symbol given all past and present channel
outputs; we denote it by a |V|-dimensional non-negative probability (column) vector, β_{Y^i}. As the source symbols take values
in a finite alphabet, the state space S is a compact subset of a Borel space. Consider the disturbance to be W_i = (V_i, Y_i), which
takes values in the finite set V × Y. The action is history dependent, A_i = f_{e,i}(S_0, V^{i−1}, Y^{i−1}) = f_{e,i}(S_0, W^{i−1}) (here S_0
is some fixed initial state), and P_S is some initial distribution. From now on we will use f_{e,i}(V^{i−1}, Y^{i−1}) interchangeably with
f_{e,i}(S_0, V^{i−1}, Y^{i−1}) to denote A_i, as S_0 is fixed. The action set is the set of mappings from V to X; hence |A| = |X|^{|V|}, which
is finite. Note,

P(W_i | S^{i−1}, A^i) = P(V_i, Y_i | V^{i−1}, A^i, β_{Y^1}, ···, β_{Y^{i−1}})  (27)
(∗)= K(V_{i−1}, V_i) P(Y_i | X_i = A_i(V_i))  (28)
= P_W(W_i | S_{i−1}, A_i),  (29)

where (∗) follows from the fact that {V_i}_{i∈N} is Markov, and from the DMC property of the channel as in Eq. (2). Hence
W_i ∼ P_W(·|S_{i−1}, A_i), where P_W is given by Eqs. (28) and (29).
Lemma 4: Given knowledge of the entire past history of actions, states, and disturbances, the current state evolves as a
deterministic function of the past state, the current action, and the current disturbance, i.e.,

S_i = F(S_{i−1}, A_i, W_i).  (30)

Proof:

P_{V_i=v|Y^i} = P_{v,Y_i|Y^{i−1}} / P_{Y_i|Y^{i−1}}  (31)
= Σ_{V_{i−1}} P_{V_{i−1},v,Y_i|Y^{i−1}} / Σ_{V_{i−1},V_i} P_{V_{i−1},V_i,Y_i|Y^{i−1}}  (32)
= Σ_{V_{i−1}} P_{V_{i−1}|Y^{i−1}} K(V_{i−1}, v) P(Y_i|X_i = A_i(v)) / Σ_{V_{i−1},V_i} P_{V_{i−1}|Y^{i−1}} K(V_{i−1}, V_i) P(Y_i|X_i = A_i(V_i)).  (33)
Therefore,

β_{Y^i} = [ P_{V_i=v|Y^i} ]_{v∈V}  (34)
= [ β_{Y^{i−1}}^T K(v) P(Y_i|A_i(v)) / Σ_{ṽ∈V} β_{Y^{i−1}}^T K(ṽ) P(Y_i|A_i(ṽ)) ]_{v∈V}  (35)
= G(β_{Y^{i−1}}, A_i, Y_i),  (36)

where K(v) = [K(ṽ, v)]_{ṽ∈V} is a column vector. Since W_i = (V_i, Y_i) and S_i = (V_i, β_{Y^i}), Eq. (36) implies

S_i = (V_i, β_{Y^i}) = (V_i, G(β_{Y^{i−1}}, A_i, Y_i))  (37)
= G′(V_i, Y_i, A_i, β_{Y^{i−1}})  (38)
= F(S_{i−1}, A_i, W_i).  (39)
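Numerically, the update G of Eqs. (35)-(36) is one application of Bayes' rule: propagate the belief through K, weight by the channel likelihood of the observed output under the encoder map, and normalize. A sketch (our illustration, not part of the paper; the binary example with a symmetric-channel matrix used to exercise it is a toy choice):

```python
import numpy as np

def belief_update(beta, K, pY_given_X, a, y):
    """One step of Eqs. (35)-(36): new belief over V_i from the previous
    belief beta (over V_{i-1}), source kernel K[v_prev, v], channel matrix
    pY_given_X[x, y], encoder map a[v] (channel input for source state v),
    and the observed channel output y."""
    pred = beta @ K                  # beta^T K: predicted prior on V_i
    lik = np.array([pY_given_X[a[v], y] for v in range(K.shape[0])])
    unnorm = pred * lik              # weight by channel likelihood
    return unnorm / unnorm.sum()     # normalize to a probability vector
```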
Let

g(S_i, A_{i+1}) = c(S_i)  (40)
= c(β_{Y^i})  (41)
= −E[Λ̃(V_i, V̂_Bayes(β_{Y^i})) | Y^i].  (42)

Therefore,

inf limsup_{n→∞} (1/n) E[ Σ_{i=1}^n Λ̃(V_i, V̂_i^opt(Y^i)) ] = −sup liminf_{n→∞} (1/n) E[ Σ_{i=1}^n g(S_{i−1}, A_i) ].  (43)
Hence the tuple T = (S, A, W, F, P_S, P_W, g) forms a controlled Markov process. The problem of finding the best channel
encoder in our problem of real time communication (taking the optimal decoder to be the Bayes response, V̂_i^opt(Y^i) =
V̂_Bayes(P_{V_i|Y^i})) is equivalent to the problem of finding the optimal policy for the tuple T which maximizes the average
reward under the cost function g. The optimal reward is given by

λ_T^opt = sup_π liminf_{n→∞} (1/n) E[ Σ_{i=1}^n g(S_{i−1}, A_i) ].  (44)

The ACOE for the controlled Markov process T = (S, A, W, F, P_S, P_W, g) has the generic form

λ + h(s) = sup_{a∈A} { g(s, a) + Σ_{w∈W} P_W(w|s, a) h(F(s, a, w)) }, ∀ s ∈ S,  (45)

which, when specialized to our setting, becomes

λ + h(v, β) = sup_{a∈A} { g(v, β, a) + Σ_{w∈W} P_W(w|v, β, a) h(F(v, β, a, w)) }, ∀ v ∈ V, β ∈ P(V),  (46)

and, upon substitution from Eq. (28),

λ + h(v, β) = c(β) + sup_{a∈A} Σ_{(ṽ,ỹ)∈V×Y} K(v, ṽ) P_{Y|X}(ỹ|a(ṽ)) h(ṽ, G(β, a, ỹ)), ∀ v ∈ V, β ∈ P(V).  (47)

We will now transform the setting back from the Markov source {V_i}_{i∈N} to the i.i.d. source {U_i}_{i∈N}. Let us denote v =
(u_1, u_2, ···, u_{d+1}) and β = [β(û_1, ···, û_{d+1})]_{(û_1,···,û_{d+1})∈U^{d+1}}. Note that

E[Λ̃(V_i, V̂^opt(Y^i)) | Y^i] = E[Λ̃(V_i, V̂_Bayes(P_{V_i|Y^i})) | Y^i]  (48)
= E[Λ(U_i, Û^opt(Y^i)) | Y^i]  (49)
= E[Λ(U_i, Û_Bayes(P_{U_i|Y^i})) | Y^i]  (50)
(∗)= min_û Σ_{u∈U} P_{U_i|Y^i}(u) Λ(u, û),  (51)

where (∗) follows from Definition 2. Hence,

g(s, a) = c(β) = −min_û Σ_{u∈U} β_1(u) Λ(u, û),  (52)
where β_1(u) = Σ_{(û_2,···,û_{d+1})∈U^d} β(u, û_2, ···, û_{d+1}) is the marginal of β on the first component. Note that g(·) is continuous
in β. Thus the ACOE transformed to our original problem with i.i.d. source reads

λ + h(u_1, ···, u_{d+1}, β) = −min_û Σ_{u∈U} β_1(u) Λ(u, û) + max_{a∈A} Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|a(u_2, ···, u_{d+1}, ũ)) h(u_2, ···, u_{d+1}, ũ, G(β, a, ỹ)),  (53)

∀ (u_1, ···, u_{d+1}) ∈ U^{d+1}, β ∈ P(U^{d+1}).
Note 4 (Structure of Optimal Policy): As A is finite, we have replaced the sup with max in Eq. (53). Specializing
Theorem 1 to the above ACOE: if there exist a constant λ∗ and a measurable, bounded, real valued function h : S → R such
that Eq. (53) is satisfied for all (u_1, ···, u_{d+1}) ∈ U^{d+1} and β ∈ P(U^{d+1}), then the minimum distortion is D(d) = −λ_opt =
−λ∗. Further, if there exists a function µ : S → A such that the maximum in Eq. (53) is attained for all states by µ(s) =
µ(u_1^{d+1}, β), then the optimal encoding policy is stationary and depends on the history only through the past state, i.e., µ_i(Φ_i) =
µ(·, S_{i−1}) = µ(·, U_{i−1}^{i+d−1}, β_{Y^{i−1}}); the input to the channel at the i-th time epoch is X_i = A_i(U_i^{i+d}) =
µ(U_i^{i+d}, U_{i−1}^{i+d−1}, β_{Y^{i−1}}). Hence the optimal encoding in this case is a stationary mapping into X which uses only the
d + 2 source symbols U_{i−1}^{i+d} and the belief β_{Y^{i−1}}, the latter updated via Eq. (36).
B. To Look or Not to Lookahead : Optimality of Symbol by Symbol Policies
In this section we derive conditions for stationary, symbol by symbol policies to be optimal. That is, we seek to
identify situations where the optimal encoding at time i is given by X_i = µ_symbol(U_i).
Lemma 5: When the lookahead d = 0, the minimum average distortion is achieved by symbol by symbol encoding (and
decoding) and is given by D_symbol = min_{X:U→X} E[ Λ(U, Û_Bayes(P_{U|Y})) ].

Proof: Consider the communication system in Fig. 1 with lookahead d = 0. This corresponds to a communication
system with a memoryless source and a memoryless channel, and causal encoding and causal decoding with unit delay feedback.
We will first use standard information theoretic methods to prove

D_symbol = min_{X:U→X, Û:Y→Û} E[ Λ(U, Û(Y)) ].  (54)
Achievability:
Let D_min denote the minimum distortion. Clearly D_symbol is achievable by the encoding X(·) and decoding Û(·) which attain
the minimum in Eq. (54). Hence D_min ≤ D_symbol.

Converse:
Consider the following chain of inequalities to prove D_min ≥ D_symbol. Let D be the distortion achieved by any causal encoding and
causal decoding. Also note that minimizing over functions of the form f_{e,i}(U^i) and f_{d,i}(Y^i) is equivalent to minimizing over
vector valued mappings of the form f_{e,i}(·, U^{i−1}) : U → X and f_{d,i}(·, Y^{i−1}) : Y → Û.

D = limsup_{n→∞} (1/n) Σ_{i=1}^n E[ Λ(U_i, Û_i) ]  (55)
= limsup_{n→∞} (1/n) Σ_{i=1}^n E[ E[ Λ(U_i, Û_i) | U^{i−1}, Y^{i−1} ] ]  (56)
= limsup_{n→∞} (1/n) Σ_{i=1}^n E[ E[ Λ(U_i, Û_i) | f_{e,i}, f_{d,i}, U^{i−1}, Y^{i−1} ] ].  (57)

Note that

E[ Λ(U_i, Û_i) | f_{e,i}, f_{d,i}, U^{i−1}, Y^{i−1} ] = Σ_{u∈U, y∈Y} P_U(u) P_{Y|X}(y|f_{e,i}(u)) Λ(u, f_{d,i}(y))  (58)
≥ min_{X,Û} Σ_{u∈U, y∈Y} P_U(u) P_{Y|X}(y|X(u)) Λ(u, Û(y))  (59)
= D_symbol,  (60)
which implies D ≥ D_symbol for all achievable distortions, and hence D_min ≥ D_symbol. What is left is to show
that

D_symbol = min_{X:U→X, Û:Y→Û} E[ Λ(U, Û(Y)) ] = min_{X:U→X} E[ Λ(U, Û_Bayes(P_{U|Y})) ],  (61)
which is equivalent to showing that for any encoding rule the optimal decoding rule Û is the Bayes response Û_Bayes(P_{U|Y});
this follows from the definition of the Bayes response.
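To make the minimum in Eq. (54) concrete in the binary case, it can be evaluated by brute force over the four encoder maps; the following is a minimal illustrative sketch (the function name and the parameter values p = 0.3, δ = 0.2 are our own and not part of the development above).

```python
from itertools import product

def d_symbol(p, delta):
    """Evaluate D_symbol for a Bern(p) source over a BSC(delta) under Hamming loss
    by brute force over all encoder maps X : {0,1} -> {0,1} (cf. Eq. (54))."""
    P_U = {0: 1.0 - p, 1: p}
    best = float("inf")
    for enc in product([0, 1], repeat=2):   # enc[u] is the channel input for source symbol u
        loss = 0.0
        for y in (0, 1):
            # joint P(u, y) under this encoder
            joint = {u: P_U[u] * ((1.0 - delta) if y == enc[u] else delta) for u in (0, 1)}
            # the Bayes response picks the more likely u, so the incurred loss is the smaller mass
            loss += min(joint[0], joint[1])
        best = min(best, loss)
    return best

print(d_symbol(0.3, 0.2))   # equals min{p, delta} up to floating point
```

For these parameters the identity encoder is optimal and the value agrees with the closed form min{p, δ} used later in Section V-B.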
The above proof shows that if a stationary symbol by symbol policy is optimal for the controlled Markov process of Section IV-A,
then the optimal reward is given by

λ_symbol = −min_{X:U→X} E[ Λ(U, Û_Bayes(P_{U|Y})) ].  (62)

Note that the joint distribution of (U,Y) on the right hand side of Eq. (62), and hence the expected loss, depends on the
encoding rule X. To simplify the notation we write Λ(U_i, ·) for Λ(U_i, Û_Bayes(·)); the Bayes response is implied in this
notation. Also, for a given source P_U, channel P_{Y|X}, and a symbol by symbol encoding policy µ_s, i.e., X_i = µ_s(U_i), let
f_{µ_s}[P_U, Y] denote the posterior P_{U|Y} when the source is distributed as P_U and the encoding policy is µ_s through the channel P_{Y|X}.
For brevity we omit P_{Y|X} in the argument of f_{µ_s}(·), though the posterior depends on the channel as well. Hence if
µ_s is the minimizer in Eq. (62), then λ_symbol is given by −E[Λ(U, f_{µ_s}[P_U, Y])]. To state our next result, pertaining to the
optimality of symbol by symbol coding, we introduce another bit of notation. The evolution of the posterior is through the
function G(β,a,y), i.e.,

β̃ = G(β, a, y)  (63)
= ( β^T K(v) P(y|a(v)) / Σ_{ṽ∈V} β^T K(ṽ) P(y|a(ṽ)) )_{v∈V}.  (64)
Also, for a distribution β, let B(β) = min_{û∈Û} Σ_{u∈U} β(u) Λ(u, û), which is the Bayes envelope for the given loss function.

Theorem 6: Denote the encoding function which achieves the minimum in Eq. (54) by µ_s. For the problem setup depicted
in Fig. 1 with the ACOE in Eq. (53), for a given positive lookahead d ≥ 1, the stationary symbol by symbol policy is optimal if
the following holds:

Σ_{(u,y)∈U×Y} P_U(u) P_{Y|X}(y|µ_s(u)) B(f_{µ_s}[P_U, y]) + Σ_{k=2}^{d+1} Σ_{y∈Y} P_{Y|X}(y|µ_s(u_k)) B(f_{µ_s}[β_k, y])
= Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|a∗(u_2^{d+1}, ũ)) B(β̃_1)
+ Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|a∗(u_2^{d+1}, ũ)) Σ_{k=2}^{d} Σ_{ŷ∈Y} P_{Y|X}(ŷ|µ_s(u_{k+1})) B(f_{µ_s}[β̃_k, ŷ])
+ Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|a∗(u_2^{d+1}, ũ)) Σ_{ŷ∈Y} P_{Y|X}(ŷ|µ_s(ũ)) B(f_{µ_s}[β̃_{d+1}, ŷ]),
∀ (u_1, ···, u_{d+1}) ∈ U^{d+1}, β ∈ P(U^{d+1}),  (65)

where a∗(·) is the minimizer of the right hand side of the above equation, β̃ = G(β, a∗, ỹ), and β_k and β̃_k denote the marginals
of the k-th components of β and β̃, respectively.
Proof:
We will first assume that the symbol by symbol policy is optimal, i.e., the optimal encoding is X_i = µ_i(U_i^{d+i}, Φ^i) =
µ_s(U_i^{d+i}, S_{i−1}) = µ_s(U_i^{d+i}) = µ_s(U_i). Hence we can solve for an h that satisfies the following equation:

λ_symbol + h(u_1, ···, u_{d+1}, β) = −min_û Σ_{u∈U} β_1(u) Λ(u, û)
+ Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|µ_s(u_2)) h(u_2, ···, u_{d+1}, ũ, β̃),
∀ (u_1, ···, u_{d+1}) ∈ U^{d+1}, β ∈ P(U^{d+1}), β̃ = G(β, µ_s, ỹ).  (66)

We claim that for a given lookahead d, Eq. (66) is satisfied with

h(u_1, ···, u_{d+1}, β) = −B(β_1) − Σ_{k=2}^{d+1} Σ_{ŷ∈Y} P_{Y|X}(ŷ|µ_s(u_k)) B(f_{µ_s}[β_k, ŷ]),  (67)
where β_k is the marginal β_k(u_k) = Σ_{(u_1^{k−1}, u_{k+1}^{d+1})} β(u_1, ···, u_{d+1}). To prove the claim, consider the L.H.S. of Eq. (66):

LHS = λ_symbol + h(u_1, ···, u_{d+1}, β)  (68)
= −E[Λ(U, f_{µ_s}[P_U, Y])] − B(β_1) − Σ_{k=2}^{d+1} Σ_{ŷ∈Y} P_{Y|X}(ŷ|µ_s(u_k)) B(f_{µ_s}[β_k, ŷ])  (69)
= −Σ_{(u,y)∈U×Y} P_U(u) P_{Y|X}(y|µ_s(u)) B(f_{µ_s}[P_U, y]) − B(β_1) − Σ_{k=2}^{d+1} Σ_{ŷ∈Y} P_{Y|X}(ŷ|µ_s(u_k)) B(f_{µ_s}[β_k, ŷ]).  (70)
Before evaluating the right hand side of Eq. (66), we evaluate the marginals β̃_k:

β̃(û_1^{d+1}) = G(β, µ_s, ỹ)(û_1^{d+1})  (71)
= β^T K(û_1^{d+1}) P_{Y|X}(ỹ|µ_s(û_1)) / Σ_{ũ_1^{d+1}} β^T K(ũ_1^{d+1}) P_{Y|X}(ỹ|µ_s(ũ_1))  (72)
= (Σ_u β(u, û_1^d)) P_U(û_{d+1}) P_{Y|X}(ỹ|µ_s(û_1)) / Σ_{ũ_1^{d+1}} (Σ_u β(u, ũ_1^d)) P_U(ũ_{d+1}) P_{Y|X}(ỹ|µ_s(ũ_1))  (73)
∝ (Σ_u β(u, û_1^d)) P_U(û_{d+1}) P_{Y|X}(ỹ|µ_s(û_1)).  (74)

Hence the marginals are

β̃_k(û_k) = Σ_{(û_1^{k−1}, û_{k+1}^{d+1})} β̃(û_1^{d+1})
= f_{µ_s}[β_2, ỹ](û_1), k = 1,  (75)
= β_{k+1}(û_k), 2 ≤ k ≤ d,  (76)
= P_U(û_{d+1}), k = d + 1.  (77)
Thus the R.H.S. of Eq. (66) is

RHS = −B(β_1) + Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|µ_s(u_2)) h(u_2, ···, u_{d+1}, ũ, β̃)  (78)
= −B(β_1) − Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|µ_s(u_2)) B(β̃_1)
− Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|µ_s(u_2)) Σ_{k=2}^{d} Σ_{ŷ∈Y} P_{Y|X}(ŷ|µ_s(u_{k+1})) B(f_{µ_s}[β̃_k, ŷ])
− Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|µ_s(u_2)) Σ_{ŷ∈Y} P_{Y|X}(ŷ|µ_s(ũ)) B(f_{µ_s}[β̃_{d+1}, ŷ])  (79)
= −B(β_1) − Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|µ_s(u_2)) B(f_{µ_s}[β_2, ỹ])
− Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|µ_s(u_2)) Σ_{k=2}^{d} Σ_{ŷ∈Y} P_{Y|X}(ŷ|µ_s(u_{k+1})) B(f_{µ_s}[β_{k+1}, ŷ])
− Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|µ_s(u_2)) Σ_{ŷ∈Y} P_{Y|X}(ŷ|µ_s(ũ)) B(f_{µ_s}[P_U, ŷ])  (80)
= −B(β_1) − Σ_{ỹ∈Y} P_{Y|X}(ỹ|µ_s(u_2)) B(f_{µ_s}[β_2, ỹ]) − Σ_{k=2}^{d} Σ_{ŷ∈Y} P_{Y|X}(ŷ|µ_s(u_{k+1})) B(f_{µ_s}[β_{k+1}, ŷ])
− Σ_{(ũ,ŷ)∈U×Y} P_U(ũ) P_{Y|X}(ŷ|µ_s(ũ)) B(f_{µ_s}[P_U, ŷ])  (81)
= LHS.  (82)
Thus Eq. (65) and Eq. (53) imply that there exists a bounded function h, given by Eq. (67), such that

λ_symbol + h(u_1, ···, u_{d+1}, β) = −B(β_1) + max_{a∈A} Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|a(u_2^{d+1}, ũ)) h(u_2, ···, u_{d+1}, ũ, β̃),
∀ (u_1, ···, u_{d+1}) ∈ U^{d+1}, β ∈ P(U^{d+1}), β̃ = G(β, µ_s, ỹ),  (83)

and the maximum is attained by the policy µ_s(·, s) = µ_s(·) with µ_s(u_1, ···, u_{d+1}) = µ_s(u_1). By Theorem
1, this implies that the symbol by symbol coding policy is optimal.
Corollary 7: Given X ⊇ U, the uncoded symbol by symbol policy, i.e., µ_s(U_i) = µ_c(U_i) = U_i, is optimal if

Σ_{(u,y)∈U×Y} P_U(u) P_{Y|X}(y|u) B(f_{µ_c}[P_U, y]) + Σ_{k=2}^{d+1} Σ_{y∈Y} P_{Y|X}(y|u_k) B(f_{µ_c}[β_k, y])
= Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|a∗(u_2^{d+1}, ũ)) B(β̃_1)
+ Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|a∗(u_2^{d+1}, ũ)) Σ_{k=2}^{d} Σ_{ŷ∈Y} P_{Y|X}(ŷ|u_{k+1}) B(f_{µ_c}[β̃_k, ŷ])
+ Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|a∗(u_2^{d+1}, ũ)) Σ_{ŷ∈Y} P_{Y|X}(ŷ|ũ) B(f_{µ_c}[β̃_{d+1}, ŷ]),
∀ (u_1, ···, u_{d+1}) ∈ U^{d+1}, β ∈ P(U^{d+1}),  (84)

where a∗(·) is the minimizer of the right hand side of the above equation, β̃ = G(β, a∗, ỹ), and β_k and β̃_k denote the marginals
of the k-th components of β and β̃, respectively.

Proof: Substitute µ_s = µ_c in Theorem 6.
V. REAL-TIME CODING WITH LIMITED LOOKAHEAD : FINITE MEMORY
A. Average Cost Optimality Equation
In Section IV, we considered the scenario where the decoder has access to the entire past channel output sequence or,
equivalently, memory is unbounded. In this section, we develop a controlled Markov process formulation for the case where
the memory alphabet is finite and does not grow with time, i.e., the memory space M is time-independent with |M| < ∞.
We make two assumptions on our coding systems for this setting:
A1 There is a fixed time-independent memory update function, i.e., there exists a function f_m such that Z_i = f_m(Z_{i−1}, Y_i)
for all i. This assumption is not very restrictive, as real systems such as quantizers or finite window storage devices store
only the past few channel output symbols and evolve in a time invariant way; e.g., Z_i = f_m(Z_{i−1}, Y_i) = Y_i implies the
reconstruction is given by Û_i = f_{d,i}(Y_i, Y_{i−1}).
A2 We fix the optimal decoding rule to be f_{d,i}(Y_i, Z_{i−1}) = Û_opt(Y_i, Z_{i−1}), that is, the decoding is restricted to optimal
policies among the stationary (time invariant) ones. Note that though we assume stationary decoding, optimal encoding
may in general not be stationary.
Hence the optimal expected average distortion depends on d, f_m, M, and we therefore denote it by D(d, f_m, M) to distinguish
it from D(d) of Section IV.
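As a concrete instance of assumption A1, a sliding window that retains the last m channel outputs is a time invariant update; the following is a minimal sketch (our own illustration, storing the outputs in a tuple):

```python
def f_m(z, y, m=2):
    # Time invariant memory update Z_i = f_m(Z_{i-1}, Y_i):
    # z is a tuple holding at most the last m channel outputs; append y, keep the newest m.
    return (z + (y,))[-m:]
```

With m = 1 this reduces to Z_i = Y_i, the example given in A1.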
Here too we begin by formulating a controlled Markov process for the modified source {V_i}_{i∈N} and then substitute
for the original source. By assumption A2, the optimal decoding is stationary, V̂_opt(·). Consider the state sequence
S_i = (V_i, Z_i) ∈ S (= V × Z) and the disturbance sequence W_i = (V_i, Y_i) ∈ W (= V × Y). The actions can be history
dependent, A_i = f_{e,i}(S_0, W^{i−1}), where S_0 is a fixed initial state (with some distribution P_S). The disturbance depends on the past
sequence of disturbances, states, actions and the current action only through the past state and the current action, i.e.,

P(W_i|W^{i−1}, S^{i−1}, A_i) = P(V_i, Y_i|V_{i−1}, Y_{i−1}, A_i)  (85)
= K(V_{i−1}, V_i) P_{Y|X}(Y_i|A_i(V_i))  (86)
= P_W(W_i|S_{i−1}, A_i),  (87)

and hence

S_i = (V_i, Z_i) = (V_i, f_m(Y_i, Z_{i−1})) = F(S_{i−1}, A_i, W_i).  (88)
Note that for our transformation (as in Section IV) the modified cost function is given by

Λ̃(V_i, V̂_opt(Y_i, Z_{i−1})) = Λ(U_i, Û_opt(Y_i, Z_{i−1})).  (89)

Therefore consider

E[ Λ̃(V_i, V̂_opt(Y_i, Z_{i−1})) | V_{i−1}, Y_{i−1}, Z_{i−1}, A_i ]  (90)
= Σ_{v∈V, y∈Y} P(V_i = v, Y_i = y|V_{i−1}, Y_{i−1}, Z_{i−1}, A_i) Λ̃(v, V̂_opt(y, Z_{i−1}))  (91)
= Σ_{v∈V, y∈Y} K(V_{i−1}, v) P(y|A_i(v)) Λ̃(v, V̂_opt(y, Z_{i−1}))  (92)
= −g(S_{i−1}, A_i).  (93)

Thus,

(1/n) Σ_{i=1}^n E[ Λ̃(V_i, V̂_opt(Y_i, Z_{i−1})) ]  (94)
= (1/n) Σ_{i=1}^n E[ E[ Λ̃(V_i, V̂_opt(Y_i, Z_{i−1})) | V_{i−1}, Y_{i−1}, Z_{i−1}, A_i ] ]  (95)
= −(1/n) Σ_{i=1}^n E[ g(S_{i−1}, A_i) ],  (96)
and consequently,

inf limsup_{n→∞} (1/n) E[ Σ_{i=1}^n Λ̃(V_i, V̂_opt(Y_i, Z_{i−1})) ] = −sup liminf_{n→∞} (1/n) E[ Σ_{i=1}^n g(S_{i−1}, A_i) ].  (97)
Thus the problem is to find the optimal policy for the controlled Markov process T = (S, A, W, F, P_S, P_W, g) which
maximizes the average reward under the cost function g. The optimal reward is given by

λ_T^opt = sup_π liminf_{n→∞} (1/n) E[ Σ_{i=1}^n g(S_{i−1}, A_i) ].  (98)

Thus the ACOE for the controlled Markov process T = (S, A, W, F, P_S, P_W, g) is given by

λ + h(s) = sup_{a∈A} { g(s,a) + Σ_{w∈W} P_W(w|s,a) h(F(s,a,w)) }, ∀ s ∈ S,  (99)

which in our setting becomes

λ + h(v,z) = sup_{a∈A} { g(v,z,a) + Σ_{w∈W} P_W(w|v,z,a) h(F(v,z,a,w)) }, ∀ v ∈ V, z ∈ Z.  (100)

We will now transform the setting back from the Markov source {V_i}_{i∈N} to the i.i.d. source {U_i}_{i∈N}. Let us denote v =
(u_1, u_2, ···, u_{d+1}). Since Λ̃(V_i, V̂_opt(Y_i, Z_{i−1})) = Λ(U_i, Û_opt(Y_i, Z_{i−1})), we have

g(s,a) = g(u_1^{d+1}, z, a) = −Σ_{ũ∈U, ỹ∈Y} P_U(ũ) P_{Y|X}(ỹ|a(u_2, ···, u_{d+1}, ũ)) Λ(u_2, Û_opt(ỹ, z)).  (101)

Thus the resulting ACOE for our problem of an i.i.d. source with lookahead d (replacing again sup by max due to the finiteness
of the action set) is

λ + h(u_1, ···, u_{d+1}, z) = max_{a∈A} Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|a(u_2, ···, u_{d+1}, ũ)) [ h(u_2, ···, u_{d+1}, ũ, f_m(ỹ, z)) − Λ(u_2, Û_opt(ỹ, z)) ],
∀ (u_1, ···, u_{d+1}) ∈ U^{d+1}, z ∈ Z.  (102)
Here again, invoking Theorem 1 implies that if the ACOE in Eq. (102) is solved by a real λ∗ and a bounded h(·), then
D(d, f_m, M) = −λ∗.

Note 5: The reason for making assumptions A1 and A2 is that the modification of the cost function Λ(·,·) then results in a g(·,·)
which is time invariant, and in a state sequence evolving through a function F(·,·,·) which is also time invariant.

Theorem 8 (Optimality of Stationary Policy): The ACOE Eq. (102) admits a stationary optimal policy.

Proof: This follows from Theorem 4.3 in [31], as for a fixed lookahead d the state space and action space are finite.
Hence the optimal encoding is X_i = µ(U_{i−1}^{d+i}, Z_{i−1}).
B. Computing D(d,fm,M), Bounding D(d)
In this section, we explicitly compute D(d, f_m, M). Note that in the setting of complete memory in Section IV the average
cost optimality equation can also be solved approximately. As the state space is compact, it admits discretization and the
running of value or policy iteration to obtain approximations to the optimal distortion. References [34] and [35] provide an
extensive treatment, along with prescriptions of error bounds and the trade-off between quantization resolution and the precision
of the approximated optimal reward for discounted cost problems. However, the computational point of view we take in this
section does not follow the path of discretization and then approximation of the average reward. Rather, we compute
D(d, f_m, M) exactly, which provides non-trivial upper bounds on D(d). This is illustrated by the following example.
We assume U = X = Y = Û = {0,1}. The source is Bern(p), p ∈ [0,0.5], the channel is BSC(δ), δ ∈ [0,0.5], and the loss
function is Hamming distortion. The memory consists of m bits and retains the last m channel outputs, hence Û_i = Û_opt(Y_i, Y_{i−m}^{i−1}).
We will denote the optimal expected average distortion by D(d,m) in this case. We observe the following:
• D(0,m) = D(0) = D_symbol = min_{X:U→X} E[ Λ(U, Û_Bayes(P_{U|Y})) ] = min{p, δ}, ∀ m.
• D(d) ≤ D(d_1, m) ≤ D(d_2, m) ≤ D(0) ∀ d_1 ≥ d_2.
• D(d) ≤ D(d, m_1) ≤ D(d, m_2) ≤ D(0) ∀ m_1 ≥ m_2.
• D(∞) ≤ D(d) ≤ D(d, m) ≤ D(0) ∀ m, d.

Lemma 9: D(∞) = D_min, where D_min is the minimum achievable distortion of the joint source-channel communication
problem.

Proof: Any sequential or limited delay encoding and decoding scheme can obviously be embedded into and emulated
arbitrarily closely by a sequence of block codes. Hence D_min ≤ D(∞). We will now prove D_min ≥ D(∞). This is equivalent
to showing that a good block coding scheme can be emulated by a sequential scheme with infinite lookahead, which then
attains distortion arbitrarily close to D_min; hence D(∞) ≤ D_min. The argument is based on block Markov coding (cf. the section on
the coherent multihop lower bound, Chapter 17, [36]), except that instead of coding via looking at the past block as in block
Markov coding, here we look at the future block. Fix an arbitrarily small ε > 0, an arbitrarily large B, and an n sufficiently
large that there exists a block coding scheme of blocklength n achieving per symbol distortion no larger than D_min + ε. Let f_e
and f_d denote the encoding and decoding mappings of that ε-achieving scheme. We now construct a sequential scheme with
lookahead d = 2n as follows:
• Encoding: We code in the present block using the source symbols of the future block, i.e., X^n(b) = f_e(U_{bn+1}^{(b+1)n}) for
b = 1, ···, B − 1, and use some dummy coding, known to the decoder, for the last block.
• Decoding: For block one, the decoder uses some predefined reconstruction. For block b, the decoder reconstructs the source symbols
as Û^n(b) = f_d(Y_{(b−2)n+1}^{(b−1)n}), b = 2, ···, B; thus decoding is in blocks, using the past block.
The per-symbol distortion achieved by this scheme is clearly upper bounded by (Λ_max + (B−1)(D_min + ε))/B, which can be made
arbitrarily close to D_min for sufficiently small ε and sufficiently large B.

We have run relative value iteration to compute D(d,m) for d = 1 and some values of m, yielding some interesting upper
bounds on D(1). Note that the values obtained are exact and do not approximate the distortion, as the relative value iteration
converges in a few iterations. This is because the state space and action space are finite, and it is easy to check that the weak
accessibility condition (Definition 4.2.2, [37]) is satisfied. This implies, by Proposition 4.3.1 of [37], that relative value iteration
converges. Fig. 3 shows the distortion values as a function of the source distribution when the crossover probability is fixed,
δ = 0.3. Fig. 4 shows the distortion values as a function of the channel crossover probability when the source distribution is
fixed, p = 0.3.

These plots provide insight into the structure of optimal policies in the setting of Section IV, given that we are considering
a Bern(p) source and BSC(δ) channel under Hamming loss with |X| = |U| = 2. Since D(1,2) is an upper bound on D(1), and
hence on D(d), d ≥ 1, it is clear that for source distributions and channel crossover probabilities where D(1,2) < D(0),
symbol by symbol coding is not optimal. We evaluate this region and show it in Fig. 5. Note that when d = ∞, since separation is optimal,
the region of suboptimality of symbol by symbol coding is the complete square (p ∈ (0,0.5), δ ∈ (0,0.5)) except the boundary,
where symbol by symbol coding is optimal. Also note, as is consistent with the plots, that in the zero lookahead case we have
D(0) = min{p, δ}. Hence, for any lookahead d and a fixed crossover probability δ, if symbol by symbol encoding-decoding
achieves D(d) for p = p_0 < 0.5, then it is also optimal for p ∈ (p_0, 1 − p_0]. Similarly, for a fixed source probability p, if
symbol by symbol encoding-decoding is optimal for δ = δ_0 < 0.5, then it is also optimal for δ ∈ (δ_0, 1 − δ_0].
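The relative value iteration referred to above is standard for finite average-reward MDPs; the following generic sketch (our own illustrative implementation, not the authors' code) returns an approximation of the optimal average reward λ∗, from which the distortion is recovered as −λ∗:

```python
import numpy as np

def relative_value_iteration(P, g, iters=1000):
    """Average-reward relative value iteration for a finite MDP.
    P[a] is the |S| x |S| transition matrix under action a, g[a] the reward vector.
    Returns an approximation of the optimal average reward lambda*."""
    h = np.zeros(P[0].shape[0])
    for _ in range(iters):
        # Bellman backup: (Th)(s) = max_a [ g(s,a) + sum_s' P(s'|s,a) h(s') ]
        Th = np.max([g[a] + P[a] @ h for a in range(len(P))], axis=0)
        h = Th - Th[0]                      # normalize against a reference state
    return np.max([g[a] + P[a] @ h for a in range(len(P))], axis=0)[0]
```

Convergence of this recursion under the weak accessibility condition is exactly the property invoked from [37] above.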
Fig. 3. Computing and contrasting D(d,m) and D(d) for a Bernoulli source, binary symmetric channel and Hamming loss. We fix the channel crossover
probability δ = 0.3 and vary the source probability in [0,0.5]. For d = 1, we have plotted values for increasing memory, m = 0,1,2, which yield a series of
non-trivial non-increasing upper bounds on D(1). D(0) is achieved by symbol by symbol coding, while D(∞) = D_min of Shannon's joint source
channel coding (achieved by separation).
Fig. 4. Computing and contrasting D(d,m) and D(d) for a Bernoulli source, binary symmetric channel and Hamming loss. We fix the source probability
p = 0.3 and vary the channel crossover probability in [0,0.5]. For d = 1, we have plotted values for increasing memory, m = 0,1,2, which yield a series of
non-trivial non-increasing upper bounds on D(1). D(0) is achieved by symbol by symbol coding, while D(∞) = D_min of Shannon's joint source
channel coding (achieved by separation).
Fig. 5. The plot shows a region in the source-channel plane where symbol by symbol coding is suboptimal among schemes with lookahead d = 1, i.e., it
does not achieve D(1). This is the shaded region bounded inside by the two curves.
VI. REAL TIME CODING WITH LIMITED LOOKAHEAD : IN THE ABSENCE OF FEEDBACK
In the previous sections we assumed the availability of perfect unit delay feedback from the decoder to the encoder. We
now consider the same setting as that depicted in Fig. 1, but without feedback, and we formulate the problem as a controlled
Markov process. Decoders have finite state (see Note 6) and assumptions A1 and A2 are presumed, for reasons similar to those
in Note 5 of Section V, i.e., memory is finite and decoding is stationary. Here again, we first study the system with the modified
source {V_i}_{i∈N}. The state space for this problem is S_i = (V_i, P_{Z_i|V^i}) ∈ S and the disturbance is W_i = V_i. The actions are
thus history dependent, A_i = f_{e,i}(S_0, V^{i−1}) = f_{e,i}(S_0, W^{i−1}), where S_0 is some fixed initial state with distribution P_S. Due to
the Markovity of the source we have

P(W_i|W^{i−1}, S^{i−1}, A_i) = K(V_{i−1}, V_i)  (103)
= P_W(W_i|S_{i−1}, A_i).  (104)

Denoting {P_{Z_i|V^i}(z)}_{z∈Z} by β_i,

β_i(z) = P_{V_i, z|V_{i−1}} / P_{V_i|V_{i−1}}  (105)
= Σ_{Z_{i−1}} P_{Z_{i−1}, V_i, z|V_{i−1}} / Σ_{Z_{i−1}, Z_i} P_{Z_{i−1}, V_i, Z_i|V_{i−1}}  (106)
= Σ_{Z_{i−1}, Y_i} P_{Z_{i−1}|V_{i−1}} K(V_{i−1}, V_i) P(Y_i|A_i(V_i)) 1{z = f_m(Y_i, Z_{i−1})}
/ Σ_{Z_{i−1}, Y_i, Z_i} P_{Z_{i−1}|V_{i−1}} K(V_{i−1}, V_i) P(Y_i|A_i(V_i)) 1{Z_i = f_m(Y_i, Z_{i−1})}  (107)
= Σ_{Z_{i−1}, Y_i} β_{i−1}(Z_{i−1}) K(V_{i−1}, V_i) P(Y_i|A_i(V_i)) 1{z = f_m(Y_i, Z_{i−1})}
/ Σ_{Z_{i−1}, Y_i, Z_i} β_{i−1}(Z_{i−1}) K(V_{i−1}, V_i) P(Y_i|A_i(V_i)) 1{Z_i = f_m(Y_i, Z_{i−1})}.  (108)
This implies

β_i = ξ(β_{i−1}, V_i, V_{i−1}, A_i) = ξ(S_{i−1}, A_i, W_i), S_i = F(S_{i−1}, A_i, W_i).  (109)

We now have for the average cost, with modified cost function Λ̃(V_i, V̂_i) = Λ(U_i, Û_i) and stationary decoding V̂_opt(·),

E[ Λ̃(V_i, V̂_opt(Y_i, Z_{i−1})) | V_{i−1}, β_{i−1}, A_i ]  (110)
= Σ_{z̃∈Z, ṽ∈V, ỹ∈Y} P(Z_{i−1} = z̃, V_i = ṽ, Y_i = ỹ|V_{i−1}, β_{i−1}, A_i) Λ̃(ṽ, V̂_opt(ỹ, z̃))  (111)
= Σ_{z̃∈Z, ṽ∈V, ỹ∈Y} β_{i−1}(z̃) K(V_{i−1}, ṽ) P(ỹ|A_i(ṽ)) Λ̃(ṽ, V̂_opt(ỹ, z̃))  (112)
= −g(V_{i−1}, β_{i−1}, A_i)  (113)
= −g(S_{i−1}, A_i).  (114)
Thus,

(1/n) Σ_{i=1}^n E[ Λ̃(V_i, V̂_opt(Y_i, Z_{i−1})) ]  (115)
= (1/n) Σ_{i=1}^n E[ E[ Λ̃(V_i, V̂_opt(Y_i, Z_{i−1})) | V_{i−1}, β_{i−1}, A_i ] ]  (116)
= −(1/n) Σ_{i=1}^n E[ g(S_{i−1}, A_i) ].  (117)
We finally write down the ACOE after transforming to the original source (similarly to the previous sections):

λ + h(u_1, ···, u_{d+1}, β) = max_{a∈A} { g(u_1^{d+1}, β, a) + Σ_{(ũ,ỹ)∈U×Y} P_U(ũ) P_{Y|X}(ỹ|a(u_2, ···, u_{d+1}, ũ)) h(u_2, ···, u_{d+1}, ũ, β̃) },
∀ (u_1, ···, u_{d+1}) ∈ U^{d+1}, β ∈ P(Z),  (118)

where

g(s,a) = g(u_1^{d+1}, β, a) = −Σ_{z̃∈Z, ũ∈U, ỹ∈Y} β(z̃) P_U(ũ) P(ỹ|a(u_2, ···, u_{d+1}, ũ)) Λ(u_2, Û_opt(ỹ, z̃)),  (119)

and

β̃ = ξ((u_1, ···, u_{d+1}, β), a, (u_2, ···, u_{d+1}, ũ))  (120)
= ( Σ_{z̃∈Z, ỹ∈Y} β(z̃) P(ũ) P(ỹ|a(u_2, ···, u_{d+1}, ũ)) 1{z = f_m(ỹ, z̃)}
/ Σ_{z̃∈Z, ỹ∈Y, z∈Z} β(z̃) P(ũ) P(ỹ|a(u_2, ···, u_{d+1}, ũ)) 1{z = f_m(ỹ, z̃)} )_{z∈Z}.  (121)
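The update in Eq. (121) is a push-forward of β through f_m, weighted by the channel likelihoods; a small sketch (our own notation: memory states and outputs indexed by integers, with lik_y[y] standing for P(ỹ|a(u_2, ···, u_{d+1}, ũ)) at the realized symbols) is:

```python
import numpy as np

def xi_update(beta, lik_y, f_m):
    # Eq. (121): beta_tilde(z) is proportional to
    #   sum over (z_old, y) of beta(z_old) * lik_y[y] * 1{z == f_m(y, z_old)}.
    out = np.zeros(len(beta))
    for z_old, b in enumerate(beta):
        for y, l in enumerate(lik_y):
            out[f_m(y, z_old)] += b * l
    return out / out.sum()
```

When f_m simply stores the last output, the updated belief over memory states collapses, as expected, to the output likelihoods.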
Theorem 1 implies that if the ACOE Eq. (118) is solved by a real λ∗ and a bounded h(·), then D(d, f_m, M) = −λ∗. The results
on the structure of optimal policies parallel those outlined in Note 4 and hence are omitted.

Note 6: For the setting considered in this section, where no feedback is present, we have restricted our attention to finite
state decoders only, unlike the previous section where feedback was present and we also considered the case where decoding
used complete memory. This is because in the absence of feedback, when decoding uses complete memory, the state space is
a simplex of distributions on an alphabet that grows exponentially with the time index, and hence the results of the
theory presented in Section III are not as directly applicable.
VII. SEQUENTIAL SOURCE CODING WITH A SIDE INFORMATION “VENDING MACHINE”
In previous sections we considered the problem of real time source-channel communication where the encoder generates
channel input symbols sequentially with a lookahead, with or without unit delay noise-free feedback, and the decoder generates
the estimate of the source given the channel output and the memory. In this section we consider a rate-distortion problem where
encoding is sequential with lookahead. In addition, the decoder can take cost constrained actions, also in a sequential
fashion, which affect the quality of the side information correlated with the source symbol it attempts to reconstruct. We
consider two classes of such models: one where the encoder has access to the past side information symbols through unit
delay noise-free feedback (Section VII-A) and the other where it does not (Section VII-B). The findings of this section are
similar in spirit to those of previous sections and assert the universality of the methodology invoked in the paper. We defer
the proofs of this section to the Appendix.
A. Encoder has access to Side Information
The setting depicted in Fig. 6 consists of the following blocks:
• Source Encoder: The encoder has access to source symbols up to a lookahead d and to the past side information symbols,
i.e., X_i = f_{e,i}(U^{i+d}, Y^{i−1}), where f_{e,i} is the encoding function, f_{e,i} : U^{i+d} × Y^{i−1} → X, i ∈ N.
• Memory X: The decoder might not be able to use all of the encoded symbols up to the current time due to memory
constraints. Memory X is updated as a function of the past state of the memory and the current encoder output, i.e.,
M_i = f_{m,i}(M_{i−1}, X_i), where f_{m,i} is the memory update function, f_{m,i} : M_{i−1} × X → M_i, i ∈ N. Note that the
alphabet M_i can grow with i, hence this includes the special case of complete memory, i.e., M_i = X^i.
Fig. 6. The setting of sequential source coding with source lookahead at the encoder and a side information vending machine at the decoder. The encoder
also knows the past side information symbols through a unit delay noise-free feedback from the decoder.
• Actuator: The actuator uses the past Memory X and the current encoded symbol to generate an action, i.e., A_{v,i} =
f_{v,i}(M_{i−1}, X_i), where f_{v,i} : M_{i−1} × X → A_v. The action sequence should satisfy the following cost constraint:

limsup_{N→∞} E[ (1/N) Σ_{i=1}^N C(A_{v,i}) ] ≤ Γ,  (122)

where C(·) is the cost function and Γ is the cost constraint.
• Side Information “Vending Machine”: The side information is generated according to P_{Y|U,A_v}, i.e.,

P(y_i|u_1^∞, x^i, a_v^i) = P_{Y|U,A_v}(y_i|u_i, a_{v,i}).  (123)

• Memory Y: The decoder may be limited in its ability to remember all the side information up to the current time due to
memory constraints. Memory Y is updated as a function of the past state of the memory and the current side information,
i.e., N_i = f_{n,i}(N_{i−1}, Y_i), where f_{n,i} is the memory update function, f_{n,i} : N_{i−1} × Y → N_i, i ∈ N. Here also the
alphabet N_i can grow with i, hence this includes the special case of complete memory, i.e., N_i = Y^i.
• Source Decoder: The source decoder uses the current encoded symbol, the current side information and the past memory states
to construct its estimate of the source symbol, i.e., Û_i = f_{d,i}(X_i, Y_i, M_{i−1}, N_{i−1}); the decoding rule is the map f_{d,i} :
X × Y × M_{i−1} × N_{i−1} → Û. The complete memory case corresponds to the decoding Û_i(X^i, Y^i).

The alphabets U, X, A_v, Y, M, N are assumed to be finite. Note that the finiteness of the alphabets implies we may assume,
without loss of generality, that 0 ≤ Λ(·) ≤ Λ_max < ∞ and 0 ≤ C(·) ≤ Γ_max < ∞. We make the further assumption that
there exists a ∈ A_v such that C(a) = 0. Thus it makes sense to consider cost constraints Γ ∈ [0, Γ_max].

Our approach to the construction of the ACOE is similar to that taken in previous sections: we first consider the system
with the modified source {V_i = U_i^{i+d}}_{i∈N}, and it is equivalent to consider source and action encoding rules as mappings
{f_{e,i}(v, V^{i−1}, Y^{i−1})}_{v∈V} and {f_{v,i}(x, X^{i−1})}_{x∈X}. Hence the modified vending machine is

P(Y_i|V^i, A_{v,i}(X^i)) = P(Y_i|U_i, A_{v,i}(X^i)),  (124)

and the modified cost function is

Λ̃(V_i, V̂_i) = Λ(U_i, Û_i).  (125)
We study two scenarios under this setting:
1) Complete Memory: Here M_i = X^i and N_i = Y^i. Note that we can restrict our attention to optimal decoders of the
form Û_i(X^i, Y^i) = Û_Bayes(X^i, Y^i) (cf. Lemma 3). Let us denote the minimum expected average distortion achieved by
D_a^FB(d); here the FB superscript indicates that the side information is available as feedback to the encoder, the subscript a
denotes the presence of actions, and d stands for the lookahead. We have the average cost optimality equation:
ρ_λ(u_1, ···, u_{d+1}, β) + h_λ(u_1, ···, u_{d+1}, β)
= max_{(a_e,a_v)} { g_λ(u_1, ···, u_{d+1}, β, a_e, a_v) + Σ_{ũ∈U, x̃∈X, ỹ∈Y} P(ũ) 1{x̃ = a_e(u_2, ···, u_{d+1}, ũ)} P(ỹ|u_2, a_v(x̃)) h(u_2, ···, u_{d+1}, ũ, β̃) },
∀ (u_1, ···, u_{d+1}) ∈ U^{d+1}, β ∈ P(U^{d+1}),  (126)

where (a_e, a_v) ∈ A_e × A_v, β̃ = G(β, a_e, a_v, (u_2^{d+1}, ũ, x̃, ỹ)) (cf. Appendix B) is the updated belief, and g_λ(·) is the
Lagrangian augmented cost,

g_λ(u_1, ···, u_{d+1}, β, a_e, a_v) = g(u_1, ···, u_{d+1}, β, a_e, a_v) + λ(Γ − l(u_1, ···, u_{d+1}, β, a_e, a_v))  (127)
= −min_{û∈Û} Σ_{u∈U} β_1(u) Λ(u, û) + λ( Γ − Σ_{ũ∈U, x̃∈X} P(ũ) 1{x̃ = a_e(u_2, ···, u_{d+1}, ũ)} C(a_v(x̃)) ).  (128)

We now have the following theorem, with proof in Appendix B.

Theorem 10: For a fixed lookahead d, let (ρ_λ(·), h_λ(·)) solve the ACOE Eq. (126). Then the optimal average distortion
is given by

D_a^FB(d) = −inf_{λ≥0} sup_{x∈U^{d+1}×P(U^{d+1})} ρ_λ(x).  (129)
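Operationally, Theorem 10 suggests solving the unconstrained Lagrangian ACOE over a grid of multipliers λ and taking the infimum; the schematic sketch below assumes a user-supplied routine solve_acoe(lam) returning sup_x ρ_λ(x) (such a routine is not specified in the paper):

```python
def constrained_distortion(solve_acoe, lam_grid):
    # Theorem 10: D = -inf over lam >= 0 of sup_x rho_lam(x), approximated on a grid.
    # solve_acoe(lam) must return sup_x rho_lam(x) for the Lagrangian-augmented MDP.
    return -min(solve_acoe(lam) for lam in lam_grid)
```

Since sup_x ρ_λ(x) is a supremum of functions affine in λ, it is typically convex in λ, so the grid can be replaced by a one dimensional convex search.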
Note 7: Note that for a fixed finite lookahead d ≥ 1, D_a^FB(d) contrasts with the minimum distortion at d = 0, where symbol
by symbol encoding, action-encoding and decoding are optimal, i.e.,

D_a^FB(0) = min_{X:U→X, A:X→A_v} E[ Λ(U, Û_Bayes(P_{U|X,Y})) ],  (130)

while at infinite lookahead, d = ∞, the minimum distortion is given by the distortion rate function at unit rate by the results of
[22], i.e.,

D_a^FB(∞) = min E[ Λ(U, Û_opt(W, Y)) ] such that
I(U; W, A_v) ≤ log_2 |X|,
|W| ≤ |U||A_v| + 2,
E[C(A_v)] ≤ Γ,  (131)

where the minimum is over P_{A_v,W|U} and Û_opt : W × Y → Û, and I(·;·) is the mutual information (cf. [38]). The above
distortion is basically the distortion rate function (cf. Theorem 3 of [22]) evaluated at the rate log_2 |X|. The proofs of
Equations (130) and (131) are similar to those of Lemma 5 and Lemma 9.
2) Finite Memory: In this section, all memories are finite (not growing with time). With the objective of minimizing the expected
distortion, we cast this problem as a constrained Markov decision process. To be able to do that, for the reasons discussed in
Section V, we assume f_{m,i} = f_m, f_{n,i} = f_n, M_i = M and N_i = N for all i ∈ N, the alphabets M, N being finite. We
further assume stationary optimal decoding and actuator policies, i.e., f_{d,i}(·,·,·,·) = Û_opt(·,·,·,·) and f_{v,i}(·,·) = A_v^opt(·,·)
for all i ∈ N.

Fix a lookahead d. Now, for fixed λ ≥ 0, the average cost optimality equation is

ρ_λ(u_1, ···, u_{d+1}, m, n) + h_λ(u_1, ···, u_{d+1}, m, n)
= max_{a∈A} { g_λ(u_1, ···, u_{d+1}, m, n, a) + Σ_{ũ∈U, x̃∈X, ỹ∈Y} P(ũ) 1{x̃ = a(u_2, ···, u_{d+1}, ũ)} P(ỹ|u_2, A_v^opt(x̃)) h(u_2, ···, u_{d+1}, ũ, m̃, ñ) },
∀ (u_1, ···, u_{d+1}) ∈ U^{d+1}, m ∈ M, n ∈ N,  (132)

where m̃ = f_m(m, x̃) and ñ = f_n(n, ỹ) are the memory updates and g_λ(·) is the Lagrangian augmented cost,

g_λ(u_1, ···, u_{d+1}, m, n, a) = g(u_1, ···, u_{d+1}, m, n, a) + λ(Γ − l(u_1, ···, u_{d+1}, m, n, a))  (133)
= −Σ_{ũ∈U, x̃∈X, ỹ∈Y} P(ũ) 1{x̃ = a(u_2, ···, u_{d+1}, ũ)} P(ỹ|u_2, A_v^opt(x̃)) Λ(u_2, Û_Bayes(x̃, ỹ, m, n))
+ λ( Γ − Σ_{ũ∈U, x̃∈X} P(ũ) 1{x̃ = a(u_2, ···, u_{d+1}, ũ)} C(A_v^opt(x̃)) ).  (134)

Let us denote the optimal distortion by D_a^FB(d, M, N). We now have the following theorem, with proof in Appendix C.

Theorem 11: For a fixed lookahead d, let (ρ_λ(·), h_λ(·)) solve the ACOE Eq. (132). Then the optimal average distortion
is given by

D_a^FB(d, M, N) = −inf_{λ≥0} sup_{x∈U^{d+1}×M×N} ρ_λ(x).  (135)
B. Encoder does not have access to Side Information
Here encoder does not recieve any knowledge about side information. In this section also, we make assumptions A1 and
A2 and further assume finite state decoders (for reasons similar to those outlined in Note 6). For a fixed lookahead d, λ ≥ 0,
we have the average cost optimality equation,
$\rho_\lambda(u_1,\cdots,u_{d+1},\beta,\gamma) + h_\lambda(u_1,\cdots,u_{d+1},\beta,\gamma)$
$= \max_{a\in\mathcal{A}} \Big[ g_\lambda(u_1,\cdots,u_{d+1},\beta,\gamma,a) + \sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} P(\tilde{u})1_{\{\tilde{x}=a(u_2,\cdots,u_{d+1},\tilde{u})\}} P(\tilde{y}|u_2,A_v^{opt}(\tilde{x}))\, h_\lambda(u_2,\cdots,u_{d+1},\tilde{u},\tilde{\beta},\tilde{\gamma}) \Big]$
$\forall\, (u_1,\cdots,u_{d+1})\in\mathcal{U}^{d+1},\ \beta\in\mathcal{P}(M),\ \gamma\in\mathcal{P}(N)$, (136)

where $\tilde{\beta} = \xi_m(\beta,u_1^{d+1},\tilde{u},a)$ and $\tilde{\gamma} = \xi_n(\gamma,u_1^{d+1},\tilde{u},a)$ are belief updates (cf. Appendix D) and $g_\lambda(\cdot)$ is the Lagrangian augmented cost,

$g_\lambda(u_1,\cdots,u_{d+1},\beta,\gamma,a)$
$= g(u_1,\cdots,u_{d+1},\beta,\gamma,a) + \lambda(\Gamma - l(u_1,\cdots,u_{d+1},\beta,\gamma,a))$ (137)
$= -\sum_{\tilde{m}\in M,\tilde{n}\in N,\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} \beta(\tilde{m})\gamma(\tilde{n})P(\tilde{u})1_{\{\tilde{x}=a(u_2,\cdots,u_{d+1},\tilde{u})\}}P(\tilde{y}|u_2,A_v^{opt}(\tilde{x}))\,\Lambda(u_2,\hat{U}_{Bayes}(\tilde{x},\tilde{y},\tilde{m},\tilde{n})) + \lambda\Big(\Gamma - \sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X}} P(\tilde{u})1_{\{\tilde{x}=a(u_2,\cdots,u_{d+1},\tilde{u})\}} C(A_v^{opt}(\tilde{x}))\Big)$. (138)

Let us denote the optimal distortion by $D_a^{NF}(d,M,N)$ (NF standing for no feedback of side information symbols). We can now state the following theorem, whose proof is deferred to Appendix D.

Theorem 12: For a fixed lookahead $d$, suppose that $(\rho_\lambda(\cdot),h_\lambda(\cdot))$ solves the ACOE Eq. (136). Then the optimal average distortion is given by,

$D_a^{NF}(d,M,N) = -\inf_{\lambda\ge 0}\ \sup_{x\in\mathcal{U}^{d+1}\times\mathcal{P}(M)\times\mathcal{P}(N)} \rho_\lambda(x)$. (139)
Note 8: Note that the ACOE in this section on sequential source coding with lookahead and a side information vending machine is amenable to computational solutions as in Section V-B. Here also $D_a^{FB}(d,M,N)$ can be computed exactly for increasing memories, and yields non-trivial bounds on $D_a^{FB}(d)$.
VIII. SUMMARY OF THE RESULTS
In this section we provide a summary of the various settings considered in this paper on real time communication with fixed finite lookahead at the encoder, and of the transformations performed to cast each problem as a (constrained or unconstrained) Markov decision process. The methodology is to construct an average cost optimality equation (ACOE) and seek its solution. We have considered two classes of problems in this paper:
1) Real Time Communication, Fig. 1. The problem is characterized by the tuple $(\mathcal{S},\mathcal{A},\mathcal{W},F,P_S,P_W,g)$, the meaning of the various symbols being explained in Section III. The general ACOE is,

$\lambda(s) + h(s) = \max_{a\in\mathcal{A}} \Big[ g(s,a) + \sum_{w\in\mathcal{W}} P_W(w|s,a)\, h(F(s,a,w)) \Big] \quad \forall\, s\in\mathcal{S}$. (140)

Note that in all the settings we considered, sup is replaced by max as the set of actions is finite. If $\exists\,\lambda^*\in\mathbb{R}$ and a bounded $h(\cdot)$ satisfying the above equation, then using Theorem 1, the minimum distortion is $-\lambda^*$. The following table exhibits the transformations, along with pointers to the equations in the paper that cast the problem of Fig. 1 as an unconstrained Markov decision process:
Real-Time Communication, Fig. 1, Lookahead $d$:

Noise-Free Feedback, Complete Memory Decoding:
  $\mathcal{S}$, state space: $\mathcal{U}^{d+1}\times\mathcal{P}(\mathcal{U}^{d+1})$
  $\mathcal{A}$, action space: Mappings $\mathcal{U}^{d+1}\to\mathcal{X}$
  $\mathcal{W}$, disturbance: $\mathcal{U}^{d+1}\times\mathcal{Y}$
  $F(\cdot)$: Eq. (39)
  $P_W(\cdot|S,A)$: Eq. (29)
  $g(S,A)$, reward: Eq. (52)
  ACOE: Eq. (53)

Noise-Free Feedback, Finite Memory ($M$) Decoder:
  $\mathcal{S}$, state space: $\mathcal{U}^{d+1}\times M$
  $\mathcal{A}$, action space: Mappings $\mathcal{U}^{d+1}\to\mathcal{X}$
  $\mathcal{W}$, disturbance: $\mathcal{U}^{d+1}\times\mathcal{Y}$
  $F(\cdot)$: Eq. (88)
  $P_W(\cdot|S,A)$: Eq. (87)
  $g(S,A)$, reward: Eq. (101)
  ACOE: Eq. (102)

No Feedback, Finite Memory ($M$) Decoder:
  $\mathcal{S}$, state space: $\mathcal{U}^{d+1}\times\mathcal{P}(M)$
  $\mathcal{A}$, action space: Mappings $\mathcal{U}^{d+1}\to\mathcal{X}$
  $\mathcal{W}$, disturbance: $\mathcal{U}^{d+1}$
  $F(\cdot)$: Eq. (109)
  $P_W(\cdot|S,A)$: Eq. (104)
  $g(S,A)$, reward: Eq. (119)
  ACOE: Eq. (118)
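As a concrete (and purely illustrative) companion to the ACOE (140), the following sketch runs relative value iteration on a small hypothetical MDP; the random kernel and reward are placeholders for the transformations tabulated above, and $-\lambda^*$ plays the role of the minimum distortion:

```python
import numpy as np

# Relative value iteration for the ACOE
#   lambda* + h(s) = max_a [ g(s,a) + sum_w P_W(w|s,a) h(F(s,a,w)) ],
# sketched on a hypothetical 3-state, 2-action MDP (a stand-in for the
# transformed real-time coding problem; -lambda* is the minimum distortion).
rng = np.random.default_rng(0)
S, A = 3, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] = transition prob.
g = rng.uniform(-1.0, 0.0, size=(S, A))      # per-stage reward (negative distortion)

h = np.zeros(S)
lam = 0.0
for _ in range(2000):
    Th = np.max(g + P @ h, axis=1)           # Bellman operator applied to h
    lam = Th[0]                              # normalize at a reference state
    h_new = Th - lam
    done = np.max(np.abs(h_new - h)) < 1e-12
    h = h_new
    if done:
        break

# lam approximates lambda*; -lam approximates the minimum average distortion.
print(lam)
```

On convergence, `lam + h` satisfies the fixed-point relation (140) up to the stopping tolerance.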
2) Source Coding with a Side Information Vending Machine, Fig. 6: The problem is characterized by the tuple $(\mathcal{S},\mathcal{A},\mathcal{W},F,P_S,P_W,g,l,\Gamma)$ explained in Section III-A. Here also the general ACOE is,

$\rho_\lambda(s) + h_\lambda(s) = \sup_{a\in\mathcal{A}} \Big[ g_\lambda(s,a) + \sum_{w\in\mathcal{W}} P_W(w|s,a)\, h_\lambda(F(s,a,w)) \Big] \quad \forall\, s\in\mathcal{S}$, (141)

where $\lambda$ is the Lagrangian parameter. The minimum distortion is given by Theorems 10, 11 and 12, respectively, for the cases tabulated below.
Source Coding with SI “Vendor”, Fig. 6, Lookahead $d$:

Noise-Free Feedback, Complete Memory Decoding:
  $\mathcal{S}$, state space: $\mathcal{U}^{d+1}\times\mathcal{P}(\mathcal{U}^{d+1})$
  $\mathcal{A}$, action space: Mappings $\mathcal{U}^{d+1}\times\mathcal{X}\to\mathcal{X}\times\mathcal{A}_v$
  $\mathcal{W}$, disturbance: $\mathcal{U}^{d+1}\times\mathcal{X}\times\mathcal{Y}$
  $F(\cdot)$: Eq. (166) (Appendix B)
  $P_W(\cdot|S,A)$: Eq. (162) (Appendix B)
  $g_\lambda(S,A)$, Lagrangian augmented reward: Eq. (173) (Appendix B)
  ACOE: Eq. (126)

Noise-Free Feedback, Finite Memory ($M,N$) Decoder:
  $\mathcal{S}$, state space: $\mathcal{U}^{d+1}\times M\times N$
  $\mathcal{A}$, action space: Mappings $\mathcal{U}^{d+1}\to\mathcal{X}$
  $\mathcal{W}$, disturbance: $\mathcal{U}^{d+1}\times\mathcal{X}\times\mathcal{Y}$
  $F(\cdot)$: Eq. (181) (Appendix C)
  $P_W(\cdot|S,A)$: Eq. (180) (Appendix C)
  $g_\lambda(S,A)$, Lagrangian augmented reward: Eq. (190) (Appendix C)
  ACOE: Eq. (132)

No Feedback, Finite Memory ($M,N$) Decoder:
  $\mathcal{S}$, state space: $\mathcal{U}^{d+1}\times\mathcal{P}(M)\times\mathcal{P}(N)$
  $\mathcal{A}$, action space: Mappings $\mathcal{U}^{d+1}\to\mathcal{X}$
  $\mathcal{W}$, disturbance: $\mathcal{U}^{d+1}$
  $F(\cdot)$: Eq. (199) (Appendix D)
  $P_W(\cdot|S,A)$: Eq. (194) (Appendix D)
  $g_\lambda(S,A)$, Lagrangian augmented reward: Eq. (208) (Appendix D)
  ACOE: Eq. (136)
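For the constrained problems of this class, Theorems 10-12 give the minimum distortion in the saddle-point form $-\inf_{\lambda\ge 0}\sup \rho_\lambda$. A minimal numerical sketch (a hypothetical toy constrained MDP, not from the paper) solves the $\lambda$-augmented average-reward problem for each $\lambda$ on a grid and then minimizes over the multiplier:

```python
import numpy as np

# Saddle-point computation D = -inf_{lambda>=0} rho_lambda, where rho_lambda
# solves the ACOE with the Lagrangian augmented reward
#   g_lambda(s,a) = g(s,a) + lambda * (Gamma - l(s,a)).
# Toy 3-state, 2-action constrained MDP (hypothetical stand-in for the
# side-information vending machine problem).
rng = np.random.default_rng(1)
S, A, Gamma = 3, 2, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] = transition prob.
g = rng.uniform(-1.0, 0.0, size=(S, A))      # reward = negative distortion
l = rng.uniform(0.0, 1.0, size=(S, A))       # per-stage vending action cost

def rho(lam):
    """Average reward of the lambda-augmented MDP via relative value iteration."""
    glam = g + lam * (Gamma - l)
    h = np.zeros(S)
    Th = np.zeros(S)
    for _ in range(5000):
        Th = np.max(glam + P @ h, axis=1)
        h_new = Th - Th[0]
        if np.max(np.abs(h_new - h)) < 1e-12:
            h = h_new
            break
        h = h_new
    return Th[0]

lams = np.linspace(0.0, 5.0, 101)            # grid search over the multiplier
D = -min(rho(lam) for lam in lams)
print(D)
```

Since $\rho_\lambda$ is a supremum of functions affine in $\lambda$, it is convex in $\lambda$, so the one-dimensional grid minimization is well behaved.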
IX. CONCLUSION
In this paper, we consider an important class of problems in real time coding: a memoryless source is to be communicated over a memoryless channel, with sequential encoding and decoding and with a fixed finite lookahead of future symbols available at the encoder. Unit delay feedback may or may not be present, and decoding is based on the channel output symbols without delay, with or without a memory constraint. In all these scenarios, under the objective of minimizing the per-symbol distortion, we obtain average cost optimality equations whose solutions yield the minimum achievable distortion, as well as sufficient conditions for the optimality of stationary policies. We contrast the minimum distortion at a fixed lookahead with the best achievable with zero lookahead, where symbol by symbol encoding-decoding is optimal, and with the infinite lookahead case, for which the minimum achievable per-symbol distortion is shown to coincide with that of the classical joint source channel coding problem, where separation is optimal. For the Bernoulli source and binary symmetric channel under Hamming loss, in the case of finite state decoders, we compute exactly the minimum distortion values for various memory sizes, and study the upper bounds that they yield on the minimum distortion for a fixed lookahead in the absence of memory constraints. Answering the question “to look or not to look ahead”, we characterize general conditions on the source and channel under which symbol by symbol encoding-decoding is optimal within the class of schemes of a given lookahead. We obtain and plot the region
for the source and channel parameters, in the case of a Bernoulli source, binary symmetric channel and Hamming distortion, where the symbol by symbol policy is strictly suboptimal. We then demonstrate that this framework of casting real time coding problems as Markov decision problems with an average cost criterion can be useful in various other settings, by applying the same methodology to a source coding problem with a side information vending machine, where the encoder encodes the source sequentially, with a possible lookahead, and the decoder takes cost constrained actions to receive side information about the source. This setting is cast as a constrained Markov decision problem, and it is shown that a stationary randomized policy can attain the minimum per-symbol distortion, which is characterized as the solution to a saddle point equation.
ACKNOWLEDGMENT
The authors would like to thank Benjamin Van Roy for enlightening discussions. This work is supported by The Scott A.
and Geraldine D. Macomber Stanford Graduate Fellowship and NSF Grants CCF-1049413 and 4101-38047. The authors also
acknowledge the support of Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant
agreement CCF-0939370.
REFERENCES
[1] D. Neuhoff and R. Gilbert, “Causal source codes,” Information Theory, IEEE Transactions on, vol. 28, no. 5, pp. 701 – 713, sep 1982.
[2] T. Linder and R. Zamir, “Causal source coding of stationary sources with high resolution,” in Information Theory, 2001. Proceedings. 2001 IEEE
International Symposium on, 2001, p. 28.
[3] P. Piret, “Causal sliding block encoders with feedback (corresp.),” Information Theory, IEEE Transactions on, vol. 25, no. 2, pp. 237 – 240, mar 1979.
[4] T. Weissman and N. Merhav, “On causal source codes with side information,” Information Theory, IEEE Transactions on, vol. 51, no. 11, pp. 4003 –
4013, nov. 2005.
[5] H. S. Witsenhausen, “On the structure of real-time source coders,” The Bell System Technical Journal, vol. 58, no. 6, 1979.
[6] D. Teneketzis, “Communication in decentralized control,” PhD Dissertation, MIT, Cambridge, MA, 1979.
[7] D. Teneketzis, “On the structure of optimal real-time encoders and decoders in noisy communication,” Information Theory, IEEE Transactions on, vol. 52,
no. 9, pp. 4017 –4035, sept. 2006.
[8] J. Walrand and P. Varaiya, “Optimal causal coding - decoding problems,” Information Theory, IEEE Transactions on, vol. 29, no. 6, pp. 814 – 820, nov
1983.
[9] G. Munson, “Causal information transmission with feedback,” PhD Dissertation, Cornell University, Ithaca, NY, 1981.
[10] S. K. Gorantla and T. P. Coleman, “Information-theoretic viewpoints on optimal causal coding-decoding problems,” CoRR, vol. abs/1102.0250, 2011.
[11] A. Mahajan and D. Teneketzis, “Optimal design of sequential real-time communication systems,” Information Theory, IEEE Transactions on, vol. 55,
no. 11, pp. 5317 –5338, nov. 2009.
[12] T. Linder and G. Lugosi, “A zero-delay sequential quantizer for individual sequences,” in Information Theory, 2000. Proceedings. IEEE International
Symposium on, 2000, p. 125.
[13] T. Weissman and N. Merhav, “On limited-delay lossy coding and filtering of individual sequences,” Information Theory, IEEE Transactions on, vol. 48,
no. 3, pp. 721 –733, mar 2002.
[14] N. Gaarder and D. Slepian, “On optimal finite-state digital transmission systems,” Information Theory, IEEE Transactions on, vol. 28, no. 2, pp. 167 –
186, mar 1982.
[15] S. C. Tatikonda, “Control under communication constraints,” PhD Dissertation, MIT, Cambridge, MA, 2000.
[16] S. C. Tatikonda and S. Mitter, “The capacity of channels with feedback,” IEEE Trans. Inf. Theor., vol. 55, no. 1, pp. 323–349, 2009.
[17] D. Blackwell, “Information theory,” in Modern Mathematics for the Engineer: Second Series. New York: McGraw-Hill, 1961, pp. 183–193.
[18] R. Ash, Information Theory. New York: Wiley, 1965.
[19] H. Permuter, P. Cuff, B. Van Roy, and T. Weissman, “Capacity of the trapdoor channel with feedback,” Information Theory, IEEE Transactions on,
vol. 54, no. 7, pp. 3150 –3165, Jul. 2008.
[20] L. Zhao and H. H. Permuter, “Zero-error feedback capacity via dynamic programming,” CoRR, vol. abs/0907.1956, 2009.
[21] A. Sahai, “Why do block length and delay behave differently if feedback is present?” Information Theory, IEEE Transactions on, vol. 54, no. 5, pp.
1860 –1886, may 2008.
[22] H. H. Permuter and T. Weissman, “Source coding with a side information ’vending machine’ at the decoder,” in ISIT’09: Proceedings of the 2009 IEEE
international conference on Symposium on Information Theory. Piscataway, NJ, USA: IEEE Press, 2009, pp. 1030–1034.
[23] T. Weissman, “Capacity of channels with action-dependent states,” Information Theory, IEEE Transactions on, vol. 56, no. 11, pp. 5396 –5411, nov.
2010.
[24] K. Kittichokechai, T. Oechtering, M. Skoglund, and R. Thobaben, “Source and channel coding with action-dependent partially known two-sided state
information,” in ISIT’10: Proceedings of the 2010 IEEE international conference on Symposium on Information Theory, June 2010, pp. 629–633.
[25] H. Asnani, H. H. Permuter, and T. Weissman, “Probing capacity,” oct. 2010, submitted to IEEE Transactions on Information Theory.
[26] ——, “To feed or not to feed back,” nov. 2010, submitted to IEEE Transactions on Information Theory.
[27] Y. Chia, H. Asnani, and T. Weissman, “Multi-terminal source coding with action dependent side information,” to appear in Proceedings of 2011 IEEE
International Symposium on Information Theory, St. Petersburg, Russia, August 2011.
[28] H. H. Permuter and H. Asnani, “Multiple access channel with partial and controlled cribbing encoders,” mar. 2011, submitted to IEEE Transactions on
Information Theory.
[29] M. Gastpar, B. Rimoldi, and M. Vetterli, “To code, or not to code: lossy source-channel communication revisited,” Information Theory, IEEE Transactions
on, vol. 49, no. 5, pp. 1147 – 1158, may 2003.
[30] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423 and 623–656, 1948.
[31] A. Arapostathis, V. S. Borkar, E. Fernández-Gaucherand, M. K. Ghosh, and S. I. Marcus, “Discrete-time controlled markov processes with average cost
criterion: a survey,” SIAM J. Control Optim., vol. 31, pp. 282–344, March 1993.
[32] E. Altman, Constrained Markov Decision Processes. Chapman and Hall/CRC, 1999.
[33] M. Kurano, J.-i. Nakagami, and Y. Huang, “Constrained markov decision processes with compact state and action spaces : The average case,” in
Optimization, vol. 48, 2000, pp. 255–269.
[34] W. Whitt, “Approximations of dynamic programs, i,” Mathematics of Operations Research, vol. 3, no. 3, pp. 231–243.
[35] ——, “Approximations of dynamic programs, ii,” Mathematics of Operations Research, vol. 4, no. 2, pp. 179–185.
[36] A. E. Gamal and Y. H. Kim, “Lecture notes on network information theory,” CoRR, vol. abs/1001.3404, 2010.
[37] D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. II, 3rd ed. Athena Scientific, 2007.
[38] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley, 2006.
APPENDIX A
In Section II the minimum expected distortion is defined as,
$D(d) = \inf_{\{f_e,f_m,f_d\}} \limsup_{N\to\infty} E\Big[\frac{1}{N}\sum_{i=1}^{N}\Lambda(U_i,\hat{U}_i)\Big]$. (142)

Note that the inf in the above definition can be replaced by min over the class of $(f_e,f_m,f_d)$-policies as outlined in Section II. This is argued by constructing an $(f_e,f_m,f_d)$-policy that achieves $D(d)$. Fix lookahead $d$. As $D(d)$ always exists (it is also finite due to our assumption that $\Lambda(\cdot,\cdot)\le\Lambda_{max}<\infty$), for a positive non-increasing vanishing sequence $\{\epsilon_m\}_{m\ge 1}$ we can construct a sequence of policies $\{\mu_m\}_{m\ge 1}$ with expected average distortion $D_{\mu_m}(d)$, i.e.,

$D_{\mu_m}(d) = \limsup_{N\to\infty} E_{\mu_m}\Big[\frac{1}{N}\sum_{i=1}^{N}\Lambda(U_i,\hat{U}_i)\Big]$ (143)
$= \limsup_{N\to\infty} D^N_{\mu_m}(d)$, (144)

($E_{\mu_m}$ is the expectation with respect to the joint probability distribution induced when the policy used is $\mu_m$), such that $D_{\mu_m}(d)$ is a monotone non-increasing sequence converging to $D(d)$. By the definition of limsup, for every $m\ge 1$, $\exists\, N_m(\epsilon_m)$ such that $\forall\, N\ge N_m$ ($N_m$ being a function of $\epsilon_m$ is implied henceforth),

$D^N_{\mu_m}(d) \le D_{\mu_m}(d) + \epsilon_m$. (145)

Now define a block-length sequence $\{l_m\in\mathbb{N}\}_{m\ge 1}$ satisfying the following requirements:
• R1: $l_m > \sum_{i=1}^{m-1} l_i$ and $\frac{\sum_{i=1}^{m-1} l_i}{l_m}\to 0$ as $m\to\infty$.
• R2: $l_m > \max\{N_m,N_{m+1}\}$ and $\frac{N_{m+1}}{l_m}\to 0$ as $m\to\infty$.

Note that we can always choose such a sequence, e.g., $l_m = \max\{N_m,N_{m+1}\}\big(\sum_{i=1}^{m-1} l_i\big)$. We define a block-coding scheme $\mu^*$ which operates with block length $l_i$ in the $i$th block with scheme $\mu_i$. Operating this scheme for time $N\in\big(\sum_{i=1}^{m} l_i,\ \sum_{i=1}^{m+1} l_i\big]$ for some $m=m(N)$ (note $m(N)\to\infty$ as $N\to\infty$), we can bound the normalized distortion as follows.

• (Case 1) $N - \sum_{i=1}^{m} l_i < N_{m+1}$:

$D^N_{\mu^*}(d) \overset{(a)}{\le} \frac{\sum_{i=1}^{m-1} l_i}{N}\Lambda_{max} + \frac{l_m}{N}(D_{\mu_m}(d)+\epsilon_m) + \frac{N-\sum_{i=1}^{m} l_i}{N}\Lambda_{max}$ (146)
$= \frac{N-l_m}{N}\Lambda_{max} + \frac{l_m}{N}(D_{\mu_m}(d)+\epsilon_m)$ (147)
$\le \frac{\sum_{i=1}^{m-1} l_i + N_{m+1}}{l_m}\Lambda_{max} + D_{\mu_m}(d)+\epsilon_m$, (148)

where (a) is due to the fact that $l_m > N_m$ (requirement R2) and hence the distortion in block $m$ is bounded above by $D_{\mu_m}(d)+\epsilon_m$.

• (Case 2) $N - \sum_{i=1}^{m} l_i \ge N_{m+1}$:

$D^N_{\mu^*}(d) \overset{(b)}{\le} \frac{\sum_{i=1}^{m-1} l_i}{N}\Lambda_{max} + \frac{l_m}{N}(D_{\mu_m}(d)+\epsilon_m) + \frac{N-\sum_{i=1}^{m} l_i}{N}\big(D_{\mu_{m+1}}(d)+\epsilon_{m+1}\big)$ (149)
$\overset{(c)}{\le} \frac{\sum_{i=1}^{m-1} l_i}{N}\Lambda_{max} + \frac{N-\sum_{i=1}^{m-1} l_i}{N}(D_{\mu_m}(d)+\epsilon_m)$ (150)
$\le \frac{\sum_{i=1}^{m-1} l_i}{N}\Lambda_{max} + D_{\mu_m}(d)+\epsilon_m$ (151)
$\le \frac{\sum_{i=1}^{m-1} l_i + N_{m+1}}{l_m}\Lambda_{max} + D_{\mu_m}(d)+\epsilon_m$, (152)

where (b) follows from bounding the distortion in block $m$ as in (a) and, similarly, since $N-\sum_{i=1}^{m} l_i \ge N_{m+1}$, bounding the distortion in block $m+1$ by $D_{\mu_{m+1}}(d)+\epsilon_{m+1}$, and (c) follows from the fact that both $D_{\mu_m}(d)$ and $\epsilon_m$ are non-increasing sequences.

Thus we see that in both the above cases, for any time $N$, the normalized distortion is bounded above as,

$D^N_{\mu^*}(d) \le \frac{\sum_{i=1}^{m-1} l_i + N_{m+1}}{l_m}\Lambda_{max} + D_{\mu_m}(d)+\epsilon_m$, (153)

which implies that the expected average distortion under policy $\mu^*$ is,

$D_{\mu^*}(d) \le \limsup_{N\to\infty}\Big[\frac{\sum_{i=1}^{m-1} l_i + N_{m+1}}{l_m}\Lambda_{max} + D_{\mu_m}(d)+\epsilon_m\Big]$ (154)
$\le \limsup_{N\to\infty}\frac{\sum_{i=1}^{m-1} l_i}{l_m}\Lambda_{max} + \limsup_{N\to\infty}\frac{N_{m+1}}{l_m}\Lambda_{max} + \limsup_{N\to\infty}\big(D_{\mu_m}(d)+\epsilon_m\big)$ (155)
$\overset{(d)}{=} D(d)$, (156)

where (d) follows from the fact that $\frac{\sum_{i=1}^{m-1} l_i}{l_m}\to 0$ and $\frac{N_{m+1}}{l_m}\to 0$ by requirements R1 and R2 respectively, since $m(N)\to\infty$ as $N\to\infty$. Thus we have a scheme $\mu^*$ with expected distortion $D_{\mu^*}(d)\le D(d)$, but we know that for any scheme $\mu^*$, $D(d)\le D_{\mu^*}(d)$, implying $D_{\mu^*}(d)=D(d)$.
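The construction above rests only on requirements R1 and R2. As a numeric sanity check (not from the paper; the thresholds $N_m$ below are hypothetical), the following sketch builds the suggested sequence $l_m = \max\{N_m,N_{m+1}\}\big(\sum_{i<m} l_i\big)$ and verifies both requirements, together with the vanishing ratios that drive (154)-(156):

```python
# Numeric sanity check of the block-length requirements R1 and R2 in
# Appendix A, using the suggested choice l_m = max{N_m, N_{m+1}} * sum_{i<m} l_i
# (with l_1 chosen arbitrarily). The N_m thresholds below are hypothetical.
N_m = [10 * m for m in range(1, 22)]           # N_1, ..., N_21 (nondecreasing)

l = [N_m[1]]                                   # l_1: any positive integer works
for m in range(2, 21):                         # build l_2, ..., l_20
    prefix = sum(l)                            # sum_{i=1}^{m-1} l_i
    l.append(max(N_m[m - 1], N_m[m]) * prefix)

for m in range(2, 21):
    prefix = sum(l[: m - 1])                   # sum_{i=1}^{m-1} l_i
    assert l[m - 1] > prefix                   # R1: l_m exceeds all earlier blocks
    assert l[m - 1] > max(N_m[m - 1], N_m[m])  # R2: l_m exceeds both thresholds

# The two ratios that vanish in (155), evaluated at m = 20:
r1 = sum(l[:19]) / l[19]                       # (sum_{i<m} l_i) / l_m
r2 = N_m[20] / l[19]                           # N_{m+1} / l_m
print(r1, r2)
```

With this choice, $r_1 = 1/\max\{N_m,N_{m+1}\}$ exactly, so both ratios vanish as $m$ grows.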
APPENDIX B
PROOF OF THEOREM 10
We will first obtain the ACOE Eq. (126). Define the state sequence $S_i = (V_i, \{P(V_i=v|X^i,Y^i)\}_{v\in\mathcal{V}})$, the disturbance sequence $W_i = (V_i,X_i,Y_i)$, and the action sequence $A_i = \{f_{e,i}(v,V^{i-1},Y^{i-1}),\ f_{v,i}(x,X^{i-1})\}_{v\in\mathcal{V},x\in\mathcal{X}}$, which is clearly a history dependent action, i.e., a function of $W^{i-1} = (V^{i-1},X^{i-1},Y^{i-1})$. We will now verify the conditions for the defined state, disturbance and action sequences to form a controlled Markov process. With some abuse of notation, we denote,

$\beta_i = \{P(V_i=v|X^i,Y^i)\}_{v\in\mathcal{V}}$ (157)
$A_{e,i} = f_{e,i}(\cdot,V^{i-1},Y^{i-1})$ (158)
$A_{v,i} = f_{v,i}(\cdot,X^{i-1})$. (159)

Now,

$P(W_i|W^{i-1},S^{i-1},A^i) = P(V_i,X_i,Y_i|V^{i-1},X^{i-1},Y^{i-1},A_v^i,A_e^i,\beta^i)$ (160)
$= K(V_{i-1},V_i)1_{\{X_i=A_{e,i}(V_i)\}}P(Y_i|V_i,A_{v,i}(X_i))$ (161)
$= P_W(W_i|S_{i-1},A_i)$, (162)

and

$\beta_i(v) = \frac{\sum_{V_{i-1}}\beta_{i-1}(V_{i-1})K(V_{i-1},v)1_{\{X_i=A_{e,i}(v)\}}P(Y_i|v,A_{v,i}(X_i))}{\sum_{V_{i-1},V_i}\beta_{i-1}(V_{i-1})K(V_{i-1},V_i)1_{\{X_i=A_{e,i}(V_i)\}}P(Y_i|V_i,A_{v,i}(X_i))}, \quad \forall\, v\in\mathcal{V}$ (163)
$= G(\beta_{i-1},A_{e,i},A_{v,i},V_i,X_i,Y_i)$ (164)
$= G(\beta_{i-1},A_i,W_i)$, (165)

which implies,

$S_i = F(S_{i-1},A_i,W_i)$. (166)

The optimal decoding is $\hat{V}_{Bayes}(P_{V_i|X^i,Y^i})$. Also let $g(S_i,A_{i+1}) = -E\big[\tilde{\Lambda}(V_{i+1},\hat{V}_{Bayes}(P_{V_{i+1}|X^{i+1},Y^{i+1}}))\,\big|\,Y^i\big]$, so that we have

$\inf\ \limsup_{N\to\infty} E\Big[\frac{1}{N}\sum_{i=1}^{N}\tilde{\Lambda}(V_i,\hat{V}_{Bayes}(X^i,Y^i))\Big] = -\sup\ \liminf_{N\to\infty} E\Big[\frac{1}{N}\sum_{i=1}^{N} g(S_{i-1},A_i)\Big]$. (167)

Also for the cost constraint on the action,

$E\big[C(A_{v,i}(X_i))\,\big|\,V^{i-1},X^{i-1},Y^{i-1},A^i\big] = \sum_{\tilde{v}\in\mathcal{V},\tilde{x}\in\mathcal{X}} K(V_{i-1},\tilde{v})1_{\{\tilde{x}=A_{e,i}(\tilde{v})\}}C(A_{v,i}(\tilde{x}))$ (168)
$= l(S_{i-1},A_i)$, (169)

which imply,

$\limsup_{N\to\infty} E\Big[\frac{1}{N}\sum_{i=1}^{N} C(A_{v,i})\Big] = \limsup_{N\to\infty} E\Big[\frac{1}{N}\sum_{i=1}^{N} l(S_{i-1},A_i)\Big]$. (170)

Thus the problem of minimizing the average distortion subject to constraints on the vending action is equivalent to a constrained Markov decision process $(\mathcal{S},\mathcal{A},\mathcal{W},F,P_S,P_W,g,l,\Gamma)$ (note here the number of constraints is $k=1$). Fix a lookahead $d$. Let $\beta_1$ denote the marginal of the belief $\beta$ with respect to the first argument. Now for fixed $\lambda\ge 0$, we have the average cost optimality equation,

$\rho_\lambda(u_1,\cdots,u_{d+1},\beta) + h_\lambda(u_1,\cdots,u_{d+1},\beta)$
$= \max_{(a_e,a_v)} \Big[ g_\lambda(u_1,\cdots,u_{d+1},\beta,a_e,a_v) + \sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} P(\tilde{u})1_{\{\tilde{x}=a_e(u_2,\cdots,u_{d+1},\tilde{u})\}}P(\tilde{y}|u_2,a_v(\tilde{x}))\,h_\lambda(u_2,\cdots,u_{d+1},\tilde{u},\tilde{\beta}) \Big]$
$\forall\, (u_1,\cdots,u_{d+1})\in\mathcal{U}^{d+1},\ \beta\in\mathcal{P}(\mathcal{U}^{d+1})$, (171)

where $\tilde{\beta} = G(\beta,a_e,a_v,(u_2^{d+1},\tilde{u},\tilde{x},\tilde{y}))$ is the updated belief, $(a_e,a_v)\in\mathcal{A}_e\times\mathcal{A}_v$ and $g_\lambda(\cdot)$ is the Lagrangian augmented cost,

$g_\lambda(u_1,\cdots,u_{d+1},\beta,a_e,a_v) = g(u_1,\cdots,u_{d+1},\beta,a_e,a_v) + \lambda(\Gamma - l(u_1,\cdots,u_{d+1},\beta,a_e,a_v))$ (172)
$= -\min_{\hat{u}\in\mathcal{U}}\sum_{u\in\mathcal{U}}\beta_1(u)\Lambda(u,\hat{u}) + \lambda\Big(\Gamma - \sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X}} P(\tilde{u})1_{\{\tilde{x}=a_e(u_2,\cdots,u_{d+1},\tilde{u})\}}C(a_v(\tilde{x}))\Big)$. (173)

Now having obtained the ACOE, the proof is an application of Theorem 2 stated in Section III-A. We merely need to verify the conditions:

C1 holds as the state space and the action space are both compact subsets of Borel spaces.

C2 holds because of our definitions of $g(\cdot)$, $l(\cdot)$ and the assumptions on the cost and distortion constraints.

C3: Denoting the state by $s = (u_1,\cdots,u_{d+1},\beta)$ and the action by $a = (a_e,a_v)$, we have the stochastic kernel,

$Q(\tilde{s}|s,a) = \sum_{\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} P(\tilde{u})1_{\{\tilde{x}=a_e(u_2,\cdots,u_{d+1},\tilde{u})\}}P(\tilde{y}|u_2,a_v(\tilde{x}))1_{\{\tilde{\beta}=G(\beta,a,(u_2^{d+1},\tilde{u},\tilde{x},\tilde{y}))\}}$ if $\tilde{s}=(u_2,\cdots,u_{d+1},\tilde{u},\tilde{\beta})$, (174)
$= 0$ otherwise. (175)

Fix a tuple $(u_1^{d+1},a)$, which takes values in a finite set. Consider a sequence $\beta_n\to\beta$. Let $\mu_n$ and $\mu$ be the measures on $\mathcal{B}(\mathcal{S})$ induced by $Q(\cdot|u_1^{d+1},\beta_n,a)$ and $Q(\cdot|u_1^{d+1},\beta,a)$ respectively. Proving C3 is equivalent to proving that $\forall\, h\in C_b(\mathcal{S})$ we have $\mu_n(h)\to\mu(h)$, i.e.,

$\sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} P(\tilde{u})1_{\{\tilde{x}=a_e(u_2,\cdots,u_{d+1},\tilde{u})\}}P(\tilde{y}|u_2,a_v(\tilde{x}))\,h(F(\beta_n,a,(u_2^{d+1},\tilde{u},\tilde{x},\tilde{y})))$
$\to \sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} P(\tilde{u})1_{\{\tilde{x}=a_e(u_2,\cdots,u_{d+1},\tilde{u})\}}P(\tilde{y}|u_2,a_v(\tilde{x}))\,h(F(\beta,a,(u_2^{d+1},\tilde{u},\tilde{x},\tilde{y})))$, (176)

which is true as $F(\cdot)$ (by its definition, Eq. (166)) is continuous in its arguments.

C4 (Slater's Condition): We need to show that there exists a policy such that the constraint on the vending action is strictly satisfied, but this is trivially true as we can select a policy with $A_{v,i}(\cdot)$ such that $C(A_{v,i})=0\ \forall\, i$, which satisfies Slater's condition.

Thus C1-C4 being true implies that the optimal distortion $D_a^{FB}(d)$ is,

$D_a^{FB}(d) = -\rho^* \overset{(a)}{=} -\sup_{(\nu,\pi)\in\mathcal{P}(\mathcal{S})\times\Pi_{HD}}\ \inf_{\lambda\ge 0} L((\nu,\pi),\lambda)$ (177)
$\overset{(b)}{=} -\inf_{\lambda\ge 0}\ \sup_{(\nu,\pi)\in\mathcal{P}(\mathcal{S})\times\Pi_{HD}} L((\nu,\pi),\lambda)$ (178)
$\overset{(c)}{=} -\inf_{\lambda\ge 0}\ \sup_{x\in\mathcal{U}^{d+1}\times\mathcal{P}(\mathcal{U}^{d+1})}\rho_\lambda(x)$, (179)

where (a) follows from the definition of $\rho^*$, (b) follows from Theorem 2 (note assumptions C1-C4 are satisfied here as proved above), while (c) follows as $(\rho_\lambda(\cdot),h_\lambda(\cdot))$ solve the ACOE.
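The belief update $G$ of Eq. (163) is just Bayes' rule through the source kernel $K$, the encoder map and the channel. A minimal sketch, with hypothetical toy distributions (for brevity the channel array is indexed directly by $(v,x)$, absorbing the vending action $A_v$):

```python
import numpy as np

# Minimal sketch of the belief update in Eq. (163): given the previous belief
# beta over V_{i-1}, the encoder map A_e and the observed pair (x, y), the new
# belief over V_i is Bayes' rule through K(v', v) 1{x = A_e(v)} P(y | v, A_v(x)).
# All distributions below are hypothetical toy choices.
V, X, Y = 3, 2, 2
rng = np.random.default_rng(2)
K = rng.dirichlet(np.ones(V), size=V)          # K[v_prev, v]: source kernel
Py = rng.dirichlet(np.ones(Y), size=(V, X))    # Py[v, x, y]: channel, with the
                                               # vending action absorbed into x
A_e = np.array([0, 1, 1])                      # hypothetical encoder map v -> x

def G(beta, x, y):
    """Updated belief beta'(v) = P(V_i = v | x^i, y^i), as in Eq. (163)."""
    joint = (beta @ K) * (A_e == x) * Py[:, x, y]   # numerator of (163)
    return joint / joint.sum()                      # denominator normalizes

beta0 = np.ones(V) / V                          # uniform prior
beta1 = G(beta0, x=1, y=0)
print(beta1)
```

Note that the update zeroes out every $v$ inconsistent with the observed channel input $x$, exactly as the indicator in (163) dictates.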
APPENDIX C
PROOF OF THEOREM 11
We will first derive the ACOE Eq. (132). Define the state sequence as $S_i = (V_i,M_i,N_i)$ and the disturbance sequence as $W_i = (V_i,X_i,Y_i)$. The action (encoder's control) sequence is history dependent, $A_i = f_{e,i}(\cdot,W^{i-1})$. (Note that here $A_i$ is the encoding action, while $A_v^{opt}$ is the action taken by the decoder to observe side information.) It can be easily established, as in the previous sections, that

$P(W_i|W^{i-1},S^{i-1},A^i) = P_W(W_i|S_{i-1},A_i) = K(V_{i-1},V_i)1_{\{X_i=A_i(V_i)\}}P(Y_i|V_i,A_v^{opt}(X_i))$. (180)

We have,

$X_i = A_i(V_i)$, $M_i = f_m(M_{i-1},X_i)$, $N_i = f_n(N_{i-1},Y_i)$, and hence $S_i = F(S_{i-1},A_i,W_i)$. (181)

By the assumptions in Section VII-A2, the decoding is stationary, hence we have,

$E\big[\tilde{\Lambda}(V_i,\hat{V}_{opt})\,\big|\,V_{i-1},M_{i-1},N_{i-1},Y^{i-1},A_i\big]$ (182)
$= \sum_{\tilde{v}\in\mathcal{V},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} K(V_{i-1},\tilde{v})1_{\{\tilde{x}=A_i(\tilde{v})\}}P(\tilde{y}|\tilde{v},A_v^{opt}(\tilde{x}))\,\Lambda(\tilde{v},\hat{U}_{opt}(\tilde{x},\tilde{y},M_{i-1},N_{i-1}))$ (183)
$= -g(S_{i-1},A_i)$. (184)

For the cost constraints we have,

$E\big[C(A_v^{opt}(X_i))\,\big|\,V_{i-1},M_{i-1},N_{i-1},Y^{i-1},A_i\big]$ (185)
$= \sum_{\tilde{v}\in\mathcal{V},\tilde{x}\in\mathcal{X}} K(V_{i-1},\tilde{v})1_{\{\tilde{x}=A_i(\tilde{v})\}}C(A_v^{opt}(\tilde{x}))$ (186)
$= l(S_{i-1},A_i)$. (187)

Fix a lookahead $d$. Now for fixed $\lambda\ge 0$, the average cost optimality equation is,

$\rho_\lambda(u_1,\cdots,u_{d+1},m,n) + h_\lambda(u_1,\cdots,u_{d+1},m,n)$
$= \max_{a\in\mathcal{A}}\Big[ g_\lambda(u_1,\cdots,u_{d+1},m,n,a) + \sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} P(\tilde{u})1_{\{\tilde{x}=a(u_2,\cdots,u_{d+1},\tilde{u})\}}P(\tilde{y}|u_2,A_v^{opt}(\tilde{x}))\,h_\lambda(u_2,\cdots,u_{d+1},\tilde{u},\tilde{m},\tilde{n})\Big]$
$\forall\, (u_1,\cdots,u_{d+1})\in\mathcal{U}^{d+1},\ m\in M,\ n\in N$, (188)

where $\tilde{m} = f_m(m,\tilde{x})$ and $\tilde{n} = f_n(n,\tilde{y})$ are memory updates and $g_\lambda(\cdot)$ is the Lagrangian augmented cost,

$g_\lambda(u_1,\cdots,u_{d+1},m,n,a) = g(u_1,\cdots,u_{d+1},m,n,a) + \lambda(\Gamma - l(u_1,\cdots,u_{d+1},m,n,a))$ (189)
$= -\sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} P(\tilde{u})1_{\{\tilde{x}=a(u_2,\cdots,u_{d+1},\tilde{u})\}}P(\tilde{y}|u_2,A_v^{opt}(\tilde{x}))\,\Lambda(u_2,\hat{U}_{Bayes}(\tilde{x},\tilde{y},m,n)) + \lambda\Big(\Gamma - \sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X}} P(\tilde{u})1_{\{\tilde{x}=a(u_2,\cdots,u_{d+1},\tilde{u})\}}C(A_v^{opt}(\tilde{x}))\Big)$. (190)

Once we have the ACOE, the rest of the proof is similar to the proof of Theorem 10, invoking Theorem 2.
APPENDIX D
PROOF OF THEOREM 12
The proofs of this section follow along the lines of the previous sections. We just need to establish the ACOE Eq. (136); the rest of the proof follows by invoking Theorem 2. Define the following:

$S_i = (V_i,P_{M_i|V^i},P_{N_i|V^i})$ (191)
$W_i = V_i$ (192)
$A_i = f_{e,i}(\cdot,W^{i-1})$. (193)

Let us use the notation $\beta_i = P(M_i|V^i)$ and $\gamma_i = P(N_i|V^i)$. It is easy to see (along the lines of the analysis in previous sections) that,

$P(W_i|W^{i-1},S^{i-1},A^i) = P(W_i|S_{i-1},A_i) = K(V_{i-1},V_i)$, (194)

and,

$\beta_i(m) = \frac{\sum_{M_{i-1},X_i}\beta_{i-1}(M_{i-1})K(V_{i-1},V_i)1_{\{X_i=A_i(V_i)\}}1_{\{m=f_m(X_i,M_{i-1})\}}}{\sum_{M_{i-1},X_i,M_i}\beta_{i-1}(M_{i-1})K(V_{i-1},V_i)1_{\{X_i=A_i(V_i)\}}1_{\{M_i=f_m(X_i,M_{i-1})\}}}, \quad \forall\, m\in M$ (195)
$= \xi_m(\beta_{i-1},V_{i-1},V_i,A_i)$, (196)

$\gamma_i(n) = \frac{\sum_{N_{i-1},X_i,Y_i}\gamma_{i-1}(N_{i-1})K(V_{i-1},V_i)1_{\{X_i=A_i(V_i)\}}P(Y_i|V_i,A_v^{opt}(X_i))1_{\{n=f_n(Y_i,N_{i-1})\}}}{\sum_{N_{i-1},X_i,Y_i,N_i}\gamma_{i-1}(N_{i-1})K(V_{i-1},V_i)1_{\{X_i=A_i(V_i)\}}P(Y_i|V_i,A_v^{opt}(X_i))1_{\{N_i=f_n(Y_i,N_{i-1})\}}}, \quad \forall\, n\in N$ (197)
$= \xi_n(\gamma_{i-1},V_{i-1},V_i,A_i)$, (198)

which imply that

$S_i = F(S_{i-1},A_i,W_i)$. (199)

Also, for the constraints,

$E\big[\tilde{\Lambda}(V_i,\hat{V}_{opt}(X_i,Y_i,M_{i-1},N_{i-1}))\,\big|\,V_{i-1},\beta_{i-1},\gamma_{i-1},Y^{i-1},A_i\big]$ (200)
$= \sum_{\tilde{m}\in M,\tilde{n}\in N,\tilde{v}\in\mathcal{V},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} \beta_{i-1}(\tilde{m})\gamma_{i-1}(\tilde{n})K(V_{i-1},\tilde{v})1_{\{\tilde{x}=A_i(\tilde{v})\}}P(\tilde{y}|\tilde{v},A_v^{opt}(\tilde{x}))\,\tilde{\Lambda}(\tilde{v},\hat{V}_{opt}(\tilde{x},\tilde{y},\tilde{m},\tilde{n}))$ (201)
$= -g(V_{i-1},\beta_{i-1},\gamma_{i-1},A_i)$ (202)
$= -g(S_{i-1},A_i)$, (203)

and

$E\big[C(A_v^{opt}(X_i))\,\big|\,V_{i-1},\beta_{i-1},\gamma_{i-1},Y^{i-1},A_i\big] = \sum_{\tilde{x}\in\mathcal{X},\tilde{v}\in\mathcal{V}} K(V_{i-1},\tilde{v})1_{\{\tilde{x}=A_i(\tilde{v})\}}C(A_v^{opt}(\tilde{x}))$ (204)
$= l(S_{i-1},A_i)$. (205)

After these transformations, for a fixed lookahead $d$ and $\lambda\ge 0$, we have the average cost optimality equation,

$\rho_\lambda(u_1,\cdots,u_{d+1},\beta,\gamma) + h_\lambda(u_1,\cdots,u_{d+1},\beta,\gamma)$
$= \max_{a\in\mathcal{A}}\Big[ g_\lambda(u_1,\cdots,u_{d+1},\beta,\gamma,a) + \sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} P(\tilde{u})1_{\{\tilde{x}=a(u_2,\cdots,u_{d+1},\tilde{u})\}}P(\tilde{y}|u_2,A_v^{opt}(\tilde{x}))\,h_\lambda(u_2,\cdots,u_{d+1},\tilde{u},\tilde{\beta},\tilde{\gamma})\Big]$
$\forall\, (u_1,\cdots,u_{d+1})\in\mathcal{U}^{d+1},\ \beta\in\mathcal{P}(M),\ \gamma\in\mathcal{P}(N)$, (206)

where $\tilde{\beta} = \xi_m(\beta,u_1^{d+1},\tilde{u},a)$ and $\tilde{\gamma} = \xi_n(\gamma,u_1^{d+1},\tilde{u},a)$ are belief updates and $g_\lambda(\cdot)$ is the Lagrangian augmented cost,

$g_\lambda(u_1,\cdots,u_{d+1},\beta,\gamma,a) = g(u_1,\cdots,u_{d+1},\beta,\gamma,a) + \lambda(\Gamma - l(u_1,\cdots,u_{d+1},\beta,\gamma,a))$ (207)
$= -\sum_{\tilde{m}\in M,\tilde{n}\in N,\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X},\tilde{y}\in\mathcal{Y}} \beta(\tilde{m})\gamma(\tilde{n})P(\tilde{u})1_{\{\tilde{x}=a(u_2,\cdots,u_{d+1},\tilde{u})\}}P(\tilde{y}|u_2,A_v^{opt}(\tilde{x}))\,\Lambda(u_2,\hat{U}_{Bayes}(\tilde{x},\tilde{y},\tilde{m},\tilde{n})) + \lambda\Big(\Gamma - \sum_{\tilde{u}\in\mathcal{U},\tilde{x}\in\mathcal{X}} P(\tilde{u})1_{\{\tilde{x}=a(u_2,\cdots,u_{d+1},\tilde{u})\}}C(A_v^{opt}(\tilde{x}))\Big)$, (208)

thus the ACOE Eq. (136) is established.
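The update $\xi_m$ of Eq. (195) simplifies because the factors $K(V_{i-1},V_i)1_{\{X_i=A_i(V_i)\}}$ are common to the numerator and the denominator: $\beta_{i-1}$ is simply pushed forward through the deterministic memory map $f_m(X_i,\cdot)$. A sketch with a hypothetical $f_m$ and encoder map:

```python
import numpy as np

# Sketch of the memory-belief update xi_m in Eq. (195): beta_i(m) is the
# probability that the decoder memory equals m, obtained by pushing beta_{i-1}
# through the deterministic memory update f_m with X_i = A_i(V_i).
# Toy sizes and maps throughout (hypothetical).
M, V, X = 4, 3, 2
f_m = lambda x, m: (2 * m + x) % M      # hypothetical memory update f_m(x, m)
A_i = np.array([0, 1, 1])               # hypothetical encoder map v -> x

def xi_m(beta, v_prev, v):
    """New belief over the decoder memory after source symbol v is encoded."""
    x = A_i[v]                          # X_i = A_i(V_i) is deterministic
    new = np.zeros(M)
    for m_prev in range(M):             # marginalize the previous memory
        new[f_m(x, m_prev)] += beta[m_prev]
    return new                          # the K(.,.) and indicator factors of
                                        # (195) cancel between num. and denom.

beta0 = np.ones(M) / M
beta1 = xi_m(beta0, v_prev=0, v=2)
print(beta1)
```

With the toy map above, memory states merge under $f_m(1,\cdot)$, so the uniform prior collapses onto two memory values of mass 0.5 each.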