GLOBALLY OPTIMAL PERFORMANCE OF FEEDBACK CONTROL

SYSTEMS WITH LIMITED COMMUNICATION OVER NOISY

CHANNELS

ADITYA MAHAJAN

AND DEMOSTHENIS TENEKETZIS∗

Abstract. A discrete time stochastic feedback control system consisting of a non-linear plant,

a sensor, a controller, and a noisy communication channel between the sensor and the controller

is considered. The sensor has limited memory and at each time, it transmits an encoded symbol

over the channel and updates its memory. The controller receives a noise-corrupted copy of the

transmitted symbol. It generates a control action based on all its past observations and all its past

actions. This control action is fed back to the plant. At each time instant the system incurs

an instantaneous cost depending on the state of the plant and the control action. The objective is

to choose encoding, memory update and control strategies to minimize: an expected total cost over

a finite horizon, or an expected discounted cost over an infinite horizon, or an average cost per unit

time over an infinite horizon. A solution methodology to obtain a sequential decomposition of the

global optimization problem is developed. This solution methodology is extended to the case when

the sensor makes an imperfect observation of the state of the plant.

Key words. optimal control over noisy communication, sequential stochastic control, decentral-

ized optimal control, non-classical information structures, dynamic teams, common knowledge

AMS subject classifications. 93E03, 93E20, 93A14, 62B05, 49N30

1. Introduction.

1.1. Preliminaries and literature overview. Recent advances in network and

communication technologies have led to an increasing interest in networked control

systems (NCS) (see the papers in [1]), in particular, in understanding the limitations

imposed upon a feedback control system by the presence of a communication channel

in the loop. Most researchers have concentrated on stability analysis of the system.

The problem of stabilization of a plant with finite data rate feedback was investigated

in [4,5,7,9,11,13,15,18,22–25,27,47]. See [26] for a unified overview of stabilization with finite data rate feedback. LQG stability of various systems (deterministic,

stochastic, stable, unstable) under various kinds of communication constraints (noisy

and noiseless channel) was considered in [35–37,39]. Stability of an unstable plant over

an AWGN channel subject to input power constraints was considered in [6]. Fundamental

asymptotic limitations of feedback for a linear time invariant plant and arbitrary time-

invariant causal feedback were investigated in [19,20] using an information theoretic

formulation. In retrospect, the connection between stability and information theory

is not surprising since stability as well as the information theoretic notions of source

entropy and channel capacity are asymptotic concepts.

Certain applications, like vehicular traffic control and biomedical applications,

require performance metrics different from the asymptotic metrics of stability. In this

paper we consider the class of additive performance metrics, where the total cost is

the sum of costs along the entire path. In order to determine the optimal performance of NCS, both the transient and the steady-state behaviours need to be chosen optimally.

The asymptotic notions of source entropy and channel capacity, and the asymptotic

∗The authors are with the department of EECS at the University of Michigan, Ann Arbor,

MI 48109–2212, USA (email: {adityam,teneket}@eecs.umich.edu). A preliminary version of this

paper appeared in the proceedings of the 45th IEEE Conference on Decision and Control, San Diego,

December 2006.


Revision submitted to SIAM Journal of Control and Optimization – November ,


results on stability are not appropriate for evaluating the transient performance and

consequently not appropriate for performance evaluation.

The problem of optimal performance has received less attention than stabiliza-

tion in the literature. The problems considered in the literature can be classified on

the basis of their plant dynamics (linear or non-linear), the nature of the communi-

cation channel (rate-limited noiseless channel or noisy channel), and the information

structure (classical or non-classical information structures). Optimal performance

of a linear plant with rate-limited noiseless communication channel was considered

in [21,30]: in [21] the plant disturbance is Gaussian and the controller is memoryless;

in [30] the plant is undisturbed and the controller has perfect recall. Optimal per-

formance of a linear plant with Gaussian disturbance, either a rate-limited noiseless

channel or a Gaussian memoryless channel, and various information structures at the

encoder was considered in [38]. Optimal performance of a non-linear plant and a noisy

channel with noiseless feedback from the output of the channel to the encoder was

considered in [40].

The most important feature in problems of optimal performance of NCS is whether

the encoder knows the information available at the decoder/controller or not. We can

classify problems into two cases on the basis of the presence or absence of this fea-

ture: case 1, when the encoder has access to all the information available at the

decoder/controller, and case 2, when it does not. In case 1 the problem of determin-

ing optimal performance can be reduced to a centralized stochastic control problem

from the encoder’s point of view. Such a reduction is not possible in case 2. Con-

sequently, in case 1 the encoder knows how the decoder/controller will interpret its

messages; in case 2, it does not. So efficient communication between the encoder and

decoder/controller is easier in case 1 than in case 2. Hence, determining optimal strategies for the encoder and the controller in case 2 is a considerably more difficult problem than in case 1.

The models of [21,30,40], the instances in [38] with noiseless channels, and the instance of information pattern A (see [38, pg. 1550] for the definition of information pattern A) belong to case 1. In all these situations optimal encoding and control strategies have been determined. The model in [38] with information pattern B (see [38, pg. 1550] for the definition of information pattern B) belongs to case 2. In this situation only sub-optimal encoding and control strategies have been proposed.

In this paper we consider a non-linear plant with a noisy communication channel.

Our model belongs to case 2. We model the performance analysis of NCS as a stochastic control problem. We study the simplest NCS—a network with only two nodes with

a noisy communication link between them. We identify the structure of optimal con-

trollers and develop a general methodology for determining globally optimal encoding

and control strategies for finite and infinite horizon problems. We show that even

for “well behaved” infinite horizon problems, stationary designs are not necessarily

globally optimal.

1.2. Features of the problem. We consider a discrete-time feedback control

system with a communication channel between the sensor and the controller, as shown

in Figure 2.1. Such problems arise when the plant and the controller are geographically

separated. We assume that there is a noisy discrete memoryless channel between the

sensor and the controller. (The rate-limited communication channel is the degenerate case where the channel is noiseless.) We model problems in which the sensor has

limited resources in terms of the power at which it can transmit and the data it can

store and process. The encoder connected with the sensor is assumed to have a finite


memory; thus it cannot remember all its past observations and actions, and at each

stage, must selectively shed some information. At each time, the sensor generates a

symbol using its current observation and the contents of its memory, and transmits

it over the noisy channel to the controller. We assume that there is no resource

constraint at the controller. It has infinite memory and infinite power. Thus we

assume that the controller has perfect recall—it remembers everything that it has

seen and done in the past—and the communication channel between the controller

and the plant is noiseless.¹ At each stage t the system incurs an instantaneous cost

depending on the state of the plant at t and the control action at t. The objective is to

choose globally optimal encoding, memory update, and control strategies to minimize:

the expected total cost over a finite horizon, or the expected discounted cost over an

infinite horizon, or the expected average cost per unit time over an infinite horizon.

The problem has two decision-makers, the sensor and the controller. Due to

the noise in the communication channel, the sensor and the controller have different

information about what is happening in nature. Due to the finite memory at the

sensor, the sensor forgets information and at any given time instant the sensor may

not know what actions it took in the past and why it took those actions. These two

considerations, the noise in the channel and the finite memory at the sensor, result

in a decentralized control problem. There is no known solution methodology to solve

infinite horizon decentralized stochastic control problems.

Markov decision theory [14] provides a solution methodology for centralized stochas-

tic control problems. For centralized problems with imperfect observations, Markov

decision theory shows that there is no loss of optimality in taking a control action

based on the controller’s belief about the state of the plant. The belief is obtained

using all the data available at the controller. Centralization of information and perfect

recall at the controller are crucial for this idea to work. Consequently this idea does

not extend to decentralized control problems: decentralization of information implies

that one decision-maker cannot infer the data available with the other decision-makers

and therefore cannot infer their beliefs. So if all decision-makers act according to their

beliefs about the state of the plant, they will act in an inconsistent manner, and the

system will not achieve the globally optimal performance. Hence, Markov decision

theory is not appropriate for this problem.

Orthogonal search [29] techniques provide a solution methodology for decentral-

ized stochastic control problems. There are different variations of the orthogonal search algorithm, but the key idea is the following. Initialize by arbitrarily choosing the

decision strategies of all agents; then pick an agent, say i, and determine the best

response of agent i to the strategies of the other agents. Fix this best response as

agent i’s strategy. Next pick another agent j, j ≠ i, and update agent j’s strat-

egy by its best response to the other agents’ strategies. Continue this way. If this

procedure converges, the resultant strategies are member by member optimal, i.e.,

unilateral deviations by a single agent do not improve the system’s performance. Fic-

titious play techniques [8,28,34] are philosophically similar to orthogonal search and

result in member by member optimal solutions. Since decentralized stochastic control problems are, in general, non-convex in strategy space, the above procedure may not converge to globally optimal strategies; that is, it does not guarantee that no other tuple of strategies for all agents outperforms the member by member optimal strategies found by the above procedure. Thus, orthogonal

¹In the sequel we show that assuming a noiseless feedback channel does not entail any loss of

generality.


search cannot be used to obtain globally optimal strategies for the problem under

consideration.
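The gap between member by member optimality and global optimality described above can be seen in a two-agent numerical toy. The sketch below is a hypothetical static team with a hand-picked common cost matrix (not a model from this paper): coordinate-wise best responses from a bad initialization stop at a member by member optimum that exhaustive search strictly beats.

```python
import itertools

# Hypothetical common team cost: J[a][b] for agent strategies a, b in {0, 1}.
# Chosen so that (1, 1) is member by member optimal but not globally optimal.
J = [[0, 5],
     [5, 1]]

def orthogonal_search(a, b):
    """Alternate best responses until no unilateral change lowers the cost."""
    while True:
        a_new = min((0, 1), key=lambda x: J[x][b])       # agent 1's best response
        b_new = min((0, 1), key=lambda y: J[a_new][y])   # agent 2's best response
        if (a_new, b_new) == (a, b):
            return a, b, J[a][b]
        a, b = a_new, b_new

# Starting from (1, 1), no unilateral deviation improves the cost ...
print(orthogonal_search(1, 1))   # (1, 1, 1): stuck at cost 1

# ... yet exhaustive search over strategy tuples finds a strictly better design.
best = min(itertools.product((0, 1), repeat=2), key=lambda ab: J[ab[0]][ab[1]])
print(best, J[best[0]][best[1]])  # (0, 0) 0
```

Neither agent can improve unilaterally at (1, 1), so orthogonal search terminates there even though the tuple (0, 0) achieves cost 0.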

Witsenhausen’s standard form [44] is the only known solution methodology for

general sequential decentralized stochastic control problems. It proceeds by converting

the problem into a standard form, and then obtains a sequential decomposition for

the standard form. The standard form is a finite horizon stochastic control problem

whose state evolution satisfies certain properties and whose cost, incurred entirely at the last time step, satisfies certain measurability properties. Since all the

cost is incurred at the last time step in the standard form, infinite horizon problems

cannot be converted into the standard form. Hence the solution methodology of [44]

is not appropriate for the problem under consideration.

1.3. Contributions of the paper. The main contribution of this paper is in

providing a solution methodology for sequentially determining globally optimal real-

time encoding, memory update, and control strategies for feedback control systems

with limited communication over noisy channels. To the best of our knowledge, the

methodology developed in this paper is the first one to provide a sequential decom-

position for the aforementioned class of problems. The methodology proceeds in two

steps. In the first step, we obtain qualitative properties of optimal controllers. In the

second step we use the qualitative properties of the first step to identify information

states sufficient for performance evaluation (also called sufficient statistic for control)

to obtain a sequential decomposition of the problem. The main conceptual difficulty

in the problem is identifying appropriate information states in the second step; once

appropriate information states are identified, obtaining a sequential decomposition

is straightforward. We would like to emphasize that identifying appropriate information states for performance evaluation is nontrivial; the difficulty can be judged

from the fact that decentralized stochastic control problems have been investigated

since the early 1970s, and even now there is no known solution methodology to ob-

tain information states sufficient for performance analysis for these problems. The

results of this paper provide a solution methodology for a hitherto unsolved class of

decentralized stochastic control problems and explain why this methodology works.

This methodology may be useful for other classes of decentralized stochastic control

problems.

1.4. Organization of the paper. The remainder of this paper is organized

as follows. We formulate the performance analysis of feedback control systems with

limited communication over a noisy channel as a decentralized stochastic optimization

problem. To illustrate the key concepts associated with our solution methodology we

first consider the finite horizon problem. In Section 2, we establish structural results

of an optimal controller and obtain a methodology for the sequential global optimization

of the encoding, memory update and control strategies for the finite horizon problem.

We provide an explanation of the methodology in Section 3. In Section 4 we extend

the methodology to infinite horizon problems. In Section 5 we consider the case of

uncountable state space. In Section 6 we consider the feedback control problem when

the encoder has imperfect observation of the state of the plant and extend the results

of Sections 2 and 4 to this problem. We conclude in Section 7.

1.5. Notation. Throughout this paper we use the following notation. Uppercase letters (X, Y, Z) denote random variables, lowercase letters (x, y, z) denote their realizations, and calligraphic letters (X, Y, Z) denote their alphabets. For random variables and functions, x^t is shorthand for x1,...,xt. E{ · } denotes the expec-


tation of a random variable, Pr(·) denotes the probability of an event, and 1[·] denotes the indicator function of a statement.

ity of a random variable or an event that depends on a function ϕ, we use E{ · | ϕ }

and Pr(·|ϕ), respectively. We have chosen this slightly unusual notation because we

want to keep track of all the functional dependencies and the conventional notation

of Eϕ{ · } and Prϕ(·) is too cumbersome.

2. The Finite Horizon Problem.

Fig. 2.1. Feedback control system with noisy communication. (The figure shows the plant with state Xt; the sensor, consisting of an encoder with memory content Mt−1; the transmitted symbol Zt; the channel with noise Nt and output Yt; and the controller, which generates the control action Ut.)

2.1. Problem formulation. Consider a discrete-time feedback control system

of Figure 2.1 which operates for a horizon T. The state evolution is given by

(2.1)    Xt+1 = f(Xt, Ut, Wt),

where f is the plant evolution function and the variables Xt, Ut, Wt denote the state of the plant, the control action, and the plant disturbance, respectively, at time t. We assume that all variables are finite valued. For all t, Xt takes values in a finite set X, Ut takes values in a finite set U, and Wt takes values in a finite set W. The initial state X1 is a random variable with PMF PX1. The random variables W1,...,WT are i.i.d. (independent and identically distributed) with PMF PW and are also independent of X1.

The sensor, consisting of an encoder and a memory, makes perfect observations

of the state of the plant. At each time instant t the encoder generates an encoded

symbol Zt, taking values in a finite set Z, as follows:

(2.2)    Zt = ct(Xt, Mt−1),

where ct is the encoding function at time t and Mt−1 denotes the content of the sensor’s memory at t−1. Mt takes values in a finite set M and is updated according to

(2.3)    Mt = lt(Xt, Mt−1),

where lt is the memory update function at time t. Observe that the sensor has a finite size memory and although it makes perfect observations of the state of the plant, it cannot store all the past observations. Thus, it does not have perfect recall and at

each stage it must selectively shed information.


The encoded symbol Zt is transmitted over a noisy communication channel and a channel output Yt is generated according to

(2.4)    Yt = h(Zt, Nt),

where h is the channel function and Nt denotes the channel noise. Yt takes values in a finite set Y and Nt takes values in a finite set N. The sequence of random variables N1,...,NT is i.i.d. with given PMF PN and is also independent of X1, W1,...,WT.

The controller observes the channel outputs and generates a control action Ut as follows:

(2.5)    Ut = gt(Y^t, U^{t−1}),

where gt is the control law at time t. Ut takes values in a finite set U. A uniformly bounded cost function ρ : X × U → [0, K], where K < ∞, is given. At each t, an instantaneous cost ρ(Xt, Ut) is incurred.

The collection (X, W, M, Z, N, Y, U, PX1, PW, PN, f, h, ρ, T) is called a

perfect observation system. The choice of (C,L,G), C := (c1,...,cT), L := (l1,...,lT),

G := (g1,...,gT), is called a design.
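Once a design is fixed, the closed loop (2.1)–(2.5) can be simulated directly. The sketch below uses illustrative binary choices for f, c, l, h, and ρ (not taken from the paper), and a hand-picked, non-optimized design; for simplicity the control law uses only the latest channel output, a special case of (2.5).

```python
import random

# One sample trajectory of the closed loop (2.1)-(2.5) with binary alphabets.
# All functions below are illustrative placeholders, not an optimal design.
random.seed(0)

f = lambda x, u, w: (x + u + w) % 2      # plant evolution (2.1)
c = lambda x, m: x                       # encoding function (2.2): send the state
l = lambda x, m: x                       # memory update (2.3): store the state
h = lambda z, n: z ^ n                   # binary symmetric channel (2.4)
rho = lambda x, u: (x - u) ** 2          # instantaneous cost

def total_cost(T=20, p_noise=0.1):
    x, m, cost = random.randint(0, 1), 0, 0.0
    for t in range(T):
        z = c(x, m)                      # sensor transmits (2.2)
        m = l(x, m)                      # and updates its memory (2.3)
        y = h(z, int(random.random() < p_noise))  # noisy reception (2.4)
        u = y                            # control law (2.5), here memoryless
        cost += rho(x, u)                # instantaneous cost at stage t
        x = f(x, u, random.randint(0, 1))  # plant moves on (2.1)
    return cost

print(total_cost())
```

Averaging `total_cost` over many runs estimates the expected total cost (2.6) of this particular design.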

The performance of a design is quantified by the expected total cost under that design and is given by

(2.6)    JT(C, L, G) := E{ Σ_{t=1}^{T} ρ(Xt, Ut) | C, L, G },

where the expectation in (2.6) is with respect to a joint measure on (X1,...,XT, U1,...,UT) generated by PW, PN, f, h and the choice of design (C, L, G). We are interested in the following optimization problem:

Problem 2.1. Given a perfect observation system (X, W, M, Z, N, Y, U, PX1, PW, PN, f, h, ρ, T), choose a design (C∗, L∗, G∗) such that

(2.7)    JT(C∗, L∗, G∗) = J∗T := min_{(C,L,G) ∈ C^T × L^T × G^T} JT(C, L, G),

where C^T := C × ··· × C (T times), C is the space of functions from X × M to Z; L^T := L × ··· × L (T times), L is the space of functions from X × M to M; G^T := G1 × ··· × GT, and Gt is the space of functions from Y^t × U^{t−1} to U.

Remarks.
1. There is no loss of generality in assuming a noiseless channel between the controller and the plant. Suppose that the channel between the controller and the plant is noisy. Let the input Ût to the plant be a noise-corrupted version of Ut given by

(2.8)    Ût = ĥ(Ut, N̂t),

where ĥ is the feedback channel and N̂t denotes the noise in the feedback channel. N̂1,...,N̂T is a sequence of independent random variables that is also independent of X1, W1,...,WT and N1,...,NT.² Then this model can be transformed into one equivalent to (2.1)–(2.5) by setting

(2.9)    Ŵt = (Wt, N̂t),

(2.10)    Xt+1 = f(Xt, ĥ(Ut, N̂t), Wt) =: f̂(Xt, Ut, Ŵt).

²We only require Ŵ1,...,ŴT, where Ŵt = (Wt, N̂t), to be an independent process. So, N̂t need not be independent of Wt.


Fig. 2.2. Problem 2.1 as a sequential stochastic optimization problem. This figure shows the ordering relation between the system variables, design rules, and information states. (Within each stage t, across the time instants t+, (t + 1/2), and (t + 1)−, it orders the system variables Wt−1, Xt, Zt, Nt, Yt, Mt, Ut; the design laws ct, lt, gt; the information It at the controller; the beliefs Bt; and the information states πt.)

Thus, without any loss of generality we can assume a noiseless feedback chan-

nel.

2. A globally optimal design for Problem 2.1 always exists because there are

finitely many designs and we can always choose one with the best perfor-

mance.

2.2. Salient features of the problem. Problem 2.1 is a decentralized multi-

agent stochastic optimization problem. There are two agents, the sensor and the

controller, that have different information about the system and a common objective

which is to minimize an expected total cost over a finite horizon. Multi-agent problems

in which the agents have a common objective are called team problems [17]. Team

problems are further classified as static teams or dynamic teams on the basis of their

information structure. See [43] for a definition of information structure (also called

information pattern). In static teams the actions taken by one agent do not affect the

information structure of the other agents; in dynamic teams they do. In Problem 2.1,

the actions taken by the sensor affect the observations of the controller and the actions

taken by the controller affect the observations of the sensor; furthermore, the sensor

and the controller have different information about the system. Moreover, due to the

finite memory at the sensor and the noise in the channel, Problem 2.1 has a strictly

non-classical information structure; thus Problem 2.1 is a dynamic team.

Determining globally optimal strategies for dynamic teams is difficult because

they are, in general, non-convex functional optimization problems having a complex

interdependence among their decision rules [12]. As pointed out in the introduction,

Markov decision theory, orthogonal search, and standard form are not appropriate for

solving infinite horizon dynamic team problems.

The solution concept that we are looking for is to decompose the global opti-

mization problem into a sequence of nested optimization sub-problems where each

sub-problem is easier to solve than the original problem. This is called sequential

decomposition and it exponentially reduces the search complexity of finding an op-


timal strategy. A crucial step in obtaining a sequential decomposition of the global

optimization problem is to identify information states that are sufficient for perfor-

mance evaluation. Properties that such states must satisfy are explained in [16]. All

the known techniques of identifying appropriate information states, viz., Markov de-

cision theory, orthogonal search and standard form are not appropriate for infinite

horizon dynamic team problems. The information states in Markov decision theory

— the conditional probability densities of the state given all the past observations

and all the past control actions — work only when there is a single controller with

perfect recall; so, they are inappropriate for dynamic teams. The information states

in orthogonal search are obtained under the assumption that the strategies of other

agents are fixed. These information states only determine member by member opti-

mal strategies, so they are not appropriate for determining globally optimal strategies

for dynamic teams. The information states in Witsenhausen’s standard form belong

to a space that increases with time; hence, it is not appropriate for infinite horizon

problems. Thus a new methodology for identifying information states is needed for

the problem under consideration. We provide one such methodology in this paper.
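The computational payoff of a sequential decomposition can be illustrated on a toy centralized finite-horizon MDP (a stand-in chosen for brevity; the decomposition developed in this paper operates on information states rather than plant states). Brute force enumerates all (|A|^|S|)^T Markov strategies, while backward induction solves T nested sub-problems and recovers the same optimal cost. All numbers below are hypothetical.

```python
import itertools

# Toy finite-horizon MDP: states S, actions A, horizon T, illustrative
# transition kernel P[(s, a)] and stage cost; uniform initial distribution.
S, A, T = (0, 1), (0, 1), 3
P = {(s, a): {s2: 0.7 if s2 == (s ^ a) else 0.3 for s2 in S} for s in S for a in A}
cost = lambda s, a: s + 0.1 * a

def evaluate(policy):
    """Expected total cost of a Markov strategy policy[t][s]."""
    dist, total = {0: 0.5, 1: 0.5}, 0.0
    for t in range(T):
        total += sum(p * cost(s, policy[t][s]) for s, p in dist.items())
        new = {s2: 0.0 for s2 in S}
        for s, p in dist.items():
            for s2, q in P[(s, policy[t][s])].items():
                new[s2] += p * q
        dist = new
    return total

# Global search: all (|A|^|S|)^T = 64 Markov strategies.
grid = list(itertools.product(A, repeat=len(S)))
brute = min(evaluate(dict(enumerate(g))) for g in itertools.product(grid, repeat=T))

# Sequential decomposition: T nested sub-problems via backward induction.
V = {s: 0.0 for s in S}
for t in reversed(range(T)):
    V = {s: min(cost(s, a) + sum(q * V[s2] for s2, q in P[(s, a)].items()) for a in A)
         for s in S}
seq = sum(0.5 * V[s] for s in S)
print(round(brute, 6), round(seq, 6))
```

Both computations yield the same optimal cost, but the nested sub-problems grow linearly, not exponentially, with the horizon.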

The sequential order in which the system variables are generated is the key to

understanding the solution methodology that we present in this paper. For this pur-

pose we need to refine the notion of time. We call each step of the system a stage. At

any stage t, we consider three time instants:³ t+, (t + 1/2), and (t + 1)−. From now on,

we will assume that the system has three agents—the encoder, the memory update,

and the controller—even though the encoder and the memory update are located in

the same device and have the same information. We assume that the sensor encodes

just after t+, the sensor’s memory is updated just after (t + 1/2), and the controller takes a control action just after (t + 1)−. The order in which the variables are generated in the

system is shown in Figure 2.2. Since the ordering of the decision makers can be done

independently of the realization of the system variables, the problem is a sequential

stochastic optimization problem [45].

To obtain a sequential decomposition of Problem 2.1, we proceed in two steps.

In step one, we derive structural properties of optimal controllers. In step two, we

use the structural results of step one to identify an information state sufficient for

performance evaluation, transform Problem 2.1 into an equivalent deterministic opti-

mization problem and obtain a sequential decomposition for this equivalent problem.

This sequential decomposition gives an algorithm to obtain an optimal design for

Problem 2.1.

As pointed out in the introduction, step two is the crucial step. The key difficulty

in step two is to identify an information state appropriate for performance evalua-

tion. Even when the structural results of step one are available, identifying such

an information state is a highly nontrivial task. Once an appropriate information

state is identified, the transformation to a deterministic problem and the sequential

decomposition follow.

2.3. Structure of optimal controllers. In this section we present structural

properties of optimal controllers. We first define random variables that capture the

information available just before the decision rules ct, lt, and gt act on the system.

³The actual values of these time instants are not important; we just need three values in increasing order.


Definition 2.1. Let It, I′t, and I′′t denote the information available at the controller at the time instants t+, (t + 1/2), and (t + 1)−, respectively. Specifically,
1. It := (Y^{t−1}, U^{t−1}, c^{t−1}, l^{t−1}, g^{t−1}).
2. I′t := (Y^t, U^{t−1}, c^t, l^{t−1}, g^{t−1}).
3. I′′t := (Y^t, U^{t−1}, c^t, l^t, g^{t−1}).
We have included the past decision rules in the definition of information because the distribution of the random variables depends on the choice of the past decision rules. Observe that

(2.11)    It = (I′′t−1, Ut−1, gt−1),    I′t = (It, Yt, ct),    and    I′′t = (I′t, lt).

Next we define the belief of the controller about the state of the plant and the memory contents of the sensor at each of the three time instants of stage t.

Definition 2.2. Let Bt, B′t, and B′′t be random vectors defined as follows:
1. Bt(x, m) := Pr(Xt = x, Mt−1 = m | It).
2. B′t(x, m) := Pr(Xt = x, Mt−1 = m | I′t).
3. B′′t(x, m) := Pr(Xt = x, Mt = m | I′′t).
For any particular realization it of It, that is, for any particular realization y^{t−1}, u^{t−1} of Y^{t−1}, U^{t−1} and arbitrary (but fixed) choice of c^{t−1}, l^{t−1}, and g^{t−1}, the realization bt of Bt is a PMF on X × M. If It is a random vector, then Bt is a random vector belonging to PX×M, the space of PMFs on X × M. Similar interpretations hold for B′t and B′′t.

The random vectors Bt, B′t, and B′′t represent the belief of the controller about the state of the plant and the encoder’s memory content at t+, (t + 1/2), and (t + 1)−, respectively. The sequential ordering of these beliefs with respect to the other variables in the system is shown in Figure 2.2. The time evolution of these beliefs is coupled as follows.

Lemma 2.3. For each stage t, there exist deterministic functions F, F′, and F′′ such that
1. Bt = F(B′′t−1, Ut−1).
2. B′t = F′(Bt, Yt, ct).
3. B′′t = F′′(B′t, lt).

Proof.
1. Consider a component of bt:

(2.12)    bt(xt, mt−1) = Pr(Xt = xt, Mt−1 = mt−1 | it)
          = Pr(Xt = xt, Mt−1 = mt−1 | i′′t−1, ut−1, gt−1)
          = Pr(Xt = xt, Mt−1 = mt−1, Ut−1 = ut−1 | i′′t−1, gt−1)
            / Σ_{(x′t, m′t−1) ∈ X×M} Pr(Xt = x′t, Mt−1 = m′t−1, Ut−1 = ut−1 | i′′t−1, gt−1).

Now consider

          Pr(Xt = xt, Mt−1 = mt−1, Ut−1 = ut−1 | i′′t−1, gt−1)
          = Pr(xt, mt−1, ut−1 | i′′t−1, gt−1)
(2.13)    = Σ_{xt−1 ∈ X} Pr(xt−1, mt−1 | i′′t−1, gt−1) × Pr(ut−1 | xt−1, mt−1, i′′t−1, gt−1)
              × Pr(xt | xt−1, mt−1, ut−1, i′′t−1, gt−1)
      (a)  = Σ_{xt−1 ∈ X} Pr(xt−1, mt−1 | i′′t−1) 1[ut−1 = gt−1(y^{t−1}, u^{t−2})] Pr(xt | xt−1, ut−1)
(2.14)    = 1[ut−1 = gt−1(y^{t−1}, u^{t−2})] Σ_{xt−1 ∈ X} b′′t−1(xt−1, mt−1) Pr(xt | xt−1, ut−1),

where equality (a) follows from (2.1) and (2.5) and 1[·] is the indicator function. Substitute equation (2.14) in equation (2.12) and cancel 1[ut−1 = gt−1(y^{t−1}, u^{t−2})] from the numerator and the denominator, giving

(2.15)    bt(xt, mt−1) = Σ_{xt−1 ∈ X} b′′t−1(xt−1, mt−1) Pr(xt | xt−1, ut−1)
            / Σ_{(x′t, x′t−1, m′t−1) ∈ X×X×M} b′′t−1(x′t−1, m′t−1) Pr(x′t | x′t−1, ut−1).

Hence,

(2.16)    bt = F(b′′t−1, ut−1),

where F is determined by (2.15).

2. Consider a component of b′t:

(2.17)    b′t(xt, mt−1) = Pr(Xt = xt, Mt−1 = mt−1 | i′t)
          = Pr(Xt = xt, Mt−1 = mt−1 | it, yt, ct)
          = Pr(Xt = xt, Mt−1 = mt−1, Yt = yt | it, ct)
            / Σ_{(x′t, m′t−1) ∈ X×M} Pr(Xt = x′t, Mt−1 = m′t−1, Yt = yt | it, ct).

Now consider

          Pr(Xt = xt, Mt−1 = mt−1, Yt = yt | it, ct)
          = Pr(xt, mt−1, yt | it, ct)
          = Pr(xt, mt−1 | it, ct) Pr(yt | xt, mt−1, it, ct)
      (b)  = Pr(xt, mt−1 | it) Pr(yt | xt, mt−1, ct)
(2.18)    = bt(xt, mt−1) Pr(yt | xt, mt−1, ct),

where equality (b) follows from (2.1)–(2.4). Combining (2.17) and (2.18) we have

(2.19)    b′t(xt, mt−1) = bt(xt, mt−1) Pr(yt | xt, mt−1, ct)
            / Σ_{(x′t, m′t−1) ∈ X×M} bt(x′t, m′t−1) Pr(yt | x′t, m′t−1, ct).

Hence,

(2.20)    b′t = F′(bt, yt, ct),

where F′ is given by (2.19).


3. Consider a component of b′′t:

          b′′t(xt, mt) = Pr(xt, mt | i′′t) = Pr(xt, mt | i′t, lt)
          = Σ_{mt−1 ∈ M} Pr(xt, mt, mt−1 | i′t, lt)
          = Σ_{mt−1 ∈ M} Pr(xt, mt−1 | i′t, lt) Pr(mt | xt, mt−1, i′t, lt)
      (c)  = Σ_{mt−1 ∈ M} Pr(xt, mt−1 | i′t) 1[mt = lt(xt, mt−1)]
(2.21)    = Σ_{mt−1 ∈ M} b′t(xt, mt−1) 1[mt = lt(xt, mt−1)],

where equality (c) follows from (2.1) and (2.3), and 1[·] is the indicator function. Hence,

(2.22)    b′′t = F′′(b′t, lt),

where F′′ is given by (2.21).
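On finite alphabets the three belief updates of Lemma 2.3 are direct to implement. The sketch below uses illustrative transition and channel PMFs and hypothetical helper names (`F`, `F1`, `F2` standing in for F, F′, F′′); it applies the three updates in sequence and checks that the result is again a PMF on X × M.

```python
# Beliefs are dicts mapping (x, m) -> probability. P_trans[(x1, u)][x] is an
# illustrative plant transition PMF Pr(x | x1, u); P_chan[z][y] an illustrative
# channel PMF Pr(y | z). Neither is taken from the paper.

def normalize(b):
    s = sum(b.values())
    return {k: v / s for k, v in b.items()}

def F(b_prev, u, P_trans, X, M):
    """(2.15): predict (Xt, Mt-1) from the end-of-stage belief at t-1."""
    return normalize({(x, m): sum(b_prev[(x1, m)] * P_trans[(x1, u)][x] for x1 in X)
                      for x in X for m in M})

def F1(b, y, c, P_chan):
    """(2.19): Bayes update after observing channel output y."""
    return normalize({(x, m): b[(x, m)] * P_chan[c(x, m)][y] for (x, m) in b})

def F2(b, l, X, M):
    """(2.21): push the belief through the deterministic memory update l."""
    return {(x, m): sum(b[(x, m1)] for m1 in M if l(x, m1) == m)
            for x in X for m in M}

# Tiny example: binary state/memory, symmetric transitions and channel.
X, M = [0, 1], [0, 1]
P_trans = {(x1, u): {x: 0.8 if x == (x1 ^ u) else 0.2 for x in X}
           for x1 in X for u in [0, 1]}
P_chan = {z: {y: 0.9 if y == z else 0.1 for y in [0, 1]} for z in [0, 1]}

b = {(x, m): 0.25 for x in X for m in M}            # uniform prior
b = F2(F1(F(b, 1, P_trans, X, M), y=1, c=lambda x, m: x, P_chan=P_chan),
       l=lambda x, m: x, X=X, M=M)
print(round(sum(b.values()), 6))                    # total probability, ≈ 1
```

Observing y = 1 through the 0.9-reliable channel shifts the belief toward states with Xt = 1, and the composition F2 ∘ F1 ∘ F is exactly the one-step map F̂ used in the proof of Theorem 2.4.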

The above relationships between the controller’s beliefs lead to the structural results

of the optimal controllers.

Theorem 2.4. Consider Problem 2.1 for arbitrary (but fixed) encoding and memory update strategies C := (c_1, ..., c_T) and L := (l_1, ..., l_T), respectively. Then, without loss of optimality, we can restrict attention to control laws of the form

(2.23)  U_t = g_t(B_t).

Proof. We will show that the process {B_t, t = 1, ..., T} is a perfectly observed controlled Markov process with control action U_t.

The controller knows I_t and hence B_t is perfectly observed at the controller's site. Parts (i)–(iii) of Lemma 2.3 can be combined to obtain

(2.24)  B_t = \check F\big(\hat F\big(\tilde F(B_{t-1}, U_{t-1}), Y_t, c_t\big), l_t\big) =: \bar F(B_{t-1}, Y_t, U_{t-1}, c_t, l_t).

Let b_t belong to P_{X \times M}. For any realization b_{t-1} of B_{t-1} and u_{t-1} of U_{t-1}, consider

\Pr(B_t = b_t | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
    = \sum_{y_t \in Y} \Pr(B_t = b_t, Y_t = y_t | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
    = \sum_{y_t \in Y} \Pr(B_t = b_t | Y_t = y_t, B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
        \times \Pr(Y_t = y_t | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
    = \sum_{y_t \in Y} 1\big[b_t = \bar F(b_{t-1}, y_t, u_{t-1}, c_t, l_t)\big]
        \times \Pr(Y_t = y_t | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G).        (2.25)


Now consider

\Pr(Y_t = y_t | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
    = \sum_{(x_t, x_{t-1}, z_t, m_{t-1}) \in X \times X \times Z \times M} \Pr(X_t = x_t, X_{t-1} = x_{t-1}, Z_t = z_t, Y_t = y_t, M_{t-1} = m_{t-1} | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
    = \sum_{(x_t, x_{t-1}, z_t, m_{t-1}) \in X \times X \times Z \times M} \Pr(X_{t-1} = x_{t-1}, M_{t-1} = m_{t-1} | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
        \times \Pr(X_t = x_t | X_{t-1} = x_{t-1}, M_{t-1} = m_{t-1}, B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
        \times \Pr(Z_t = z_t | X_t = x_t, X_{t-1} = x_{t-1}, M_{t-1} = m_{t-1}, B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
        \times \Pr(Y_t = y_t | Z_t = z_t, X_t = x_t, X_{t-1} = x_{t-1}, M_{t-1} = m_{t-1}, B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
    = \sum_{(x_t, x_{t-1}, z_t, m_{t-1}) \in X \times X \times Z \times M} b_{t-1}(x_{t-1}, m_{t-1}) \Pr(X_t = x_t | X_{t-1} = x_{t-1}, U_{t-1} = u_{t-1})
        \times 1[z_t = c_t(x_t, m_{t-1})] \Pr(Y_t = y_t | Z_t = z_t)
    = \Pr(Y_t = y_t | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; c_t).        (2.26)

Substituting the value of (2.26) in (2.25) we get

\Pr(B_t = b_t | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; C, L, G)
    = \sum_{y_t \in Y} 1\big[b_t = \bar F(b_{t-1}, y_t, u_{t-1}, c_t, l_t)\big]
        \times \Pr(Y_t = y_t | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; c_t)
    = \Pr(B_t = b_t | B_{t-1} = b_{t-1}, U_{t-1} = u_{t-1}; c_t, l_t).        (2.27)

Thus for any fixed C and L, B_t is a controlled Markov process with control action U_t.

Further, the expected instantaneous cost can be written as

(2.28)  E\big[\rho(X_t, U_t) | i_{t+1}\big] = \sum_{x_t \in X} \rho(x_t, u_t) \Pr(X_t = x_t | i_{t+1}).

Now, by Bayes rule,

(2.29)  \Pr(x_t | i_{t+1}) = \Pr(x_t | i_t, u_t, g_t) = \frac{\Pr(x_t, u_t | i_t, g_t)}{\sum_{x'_t \in X} \Pr(x'_t, u_t | i_t, g_t)}.

Further,

\Pr(x_t, u_t | i_t, g_t) = \Pr(u_t | x_t, i_t, g_t) \Pr(x_t | i_t, g_t)
    \overset{(d)}{=} \Pr(u_t | i_t, g_t) \Pr(x_t | i_t),        (2.30)

where equality (d) follows from (2.1) and (2.5). Combine (2.29) and (2.30), and cancel \Pr(u_t | i_t, g_t) from the numerator and the denominator to obtain

(2.31)  \Pr(X_t = x_t | i_{t+1}) = \Pr(X_t = x_t | i_t) = \sum_{m_t \in M} b_t(x_t, m_t).

Substituting back in (2.28) gives

(2.32)  E\big[\rho(X_t, U_t) | i_{t+1}\big] = \sum_{(x_t, m_t) \in X \times M} \rho(x_t, u_t) b_t(x_t, m_t) =: \hat\rho(b_t, u_t).
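The reduced cost (2.32) is a plain inner product between the belief and the cost column for the chosen action. A sketch with an illustrative cost matrix `rho[x, u]` (our name, not the paper's):

```python
import numpy as np

def rho_hat(b, u, rho):
    """Expected instantaneous cost (2.32):
    hat-rho(b, u) = sum over (x, m) of rho(x, u) * b(x, m).
    b[x, m] is the belief, rho[x, u] a finite cost matrix."""
    # the cost does not depend on m, so first marginalize m, then weight by rho
    return float(np.einsum('xm,x->', b, rho[:, u]))

rho = np.array([[0.0, 1.0],
                [2.0, 0.5]])       # rho[x, u]
b = np.array([[0.5, 0.0],
              [0.25, 0.25]])
cost = rho_hat(b, 1, rho)
```

Note that m enters only through the marginalization, matching (2.31): the cost depends on the belief about X_t alone, but B_t carries the joint belief needed for the recursion.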


We can think of \hat\rho(\cdot) as the instantaneous cost and write the total expected cost as

(2.33)  E\Big[\sum_{t=1}^{T} \rho(X_t, U_t) \Big| C, L, G\Big] = E\Big[\sum_{t=1}^{T} E\big[\rho(X_t, U_t) | I_{t+1}\big] \Big| C, L, G\Big] = E\Big[\sum_{t=1}^{T} \hat\rho(B_t, U_t) \Big| C, L, G\Big].

Hence the process {B_t, t = 1, ..., T} is a perfectly observed controlled Markov process with control action U_t. The instantaneous cost \hat\rho(\cdot) is a function of the controlled state B_t and the control action U_t. From Markov decision theory [14] we know that there is no loss of optimality in restricting attention to control laws of the form (2.23).
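The three belief updates compose into the single-step map \bar F of (2.24), which is the transition function of the belief MDP used in the proof above. A self-contained sketch for finite alphabets; all kernels (`P`, `channel`, `encoder`, `l`) are illustrative, not from the paper:

```python
import numpy as np

def bar_F(b_prev, y, u, P, channel, encoder, l):
    """Composite belief update (2.24): prediction by u, Bayes update by y
    and the encoder c_t, then the memory pushforward l_t.
    P[u][x, x'] is the transition matrix, channel[z, y] = Pr(Y = y | Z = z),
    encoder[x, m] = c_t(x, m), and l[x, m] = l_t(x, m)."""
    # (i) prediction, eq. (2.15)
    b = np.einsum('xm,xy->ym', b_prev, P[u])
    # (ii) observation update, eq. (2.19)
    b = b * channel[encoder, y]
    b = b / b.sum()
    # (iii) memory update, eq. (2.21)
    out = np.zeros_like(b)
    for x in range(b.shape[0]):
        for m in range(b.shape[1]):
            out[x, l[x, m]] += b[x, m]
    return out

P = [np.array([[0.9, 0.1], [0.2, 0.8]])] * 2
channel = np.array([[0.9, 0.1], [0.1, 0.9]])
encoder = np.array([[0, 1], [1, 0]])
l = np.array([[0, 0], [1, 1]])
b0 = np.full((2, 2), 0.25)
b1 = bar_F(b0, y=0, u=0, P=P, channel=channel, encoder=encoder, l=l)
```

Iterating this map while drawing y from the channel traces a sample path of the belief MDP on which the dynamic program of Theorem 2.4 operates.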

2.3.1. Implication of the structural results. Theorem 2.4 implies that at each stage t, without loss of optimality, we can restrict attention to controllers belonging to the family \hat G of functions from P_{X \times M} to U. With this modification Problem 2.1 is equivalent to the following problem:

Problem 2.2. Given a perfect observation system (X, W, M, Z, N, Y, U, P_{X_1}, P_W, P_N, f, h, ρ, T), choose a design (C*, L*, G*) that is optimal with respect to the performance criterion of (2.6), i.e.,

(2.34)  J_T(C*, L*, G*) = J*_T := \min_{(C, L, G) \in C^T \times L^T \times \hat G^T} J_T(C, L, G),

where \hat G^T := \hat G \times \cdots \times \hat G (T times).

Using the structural result of Theorem 2.4, we can transform Problem 2.1 into an equivalent problem, Problem 2.2, in which the domain of all the decision rules (the encoding rules, the memory update rules, and the control rules) does not change with time. This is in contrast to Problem 2.1, where the domain of the control rules was increasing with time. This reduction to a time-invariant domain is necessary for extending the solution methodology for finite horizon problems to infinite horizon problems. In the next section we provide a sequential decomposition of Problem 2.2.

2.4. Global optimization. As explained in Section 2.2, Problems 2.1 and 2.2 are dynamic teams with strictly non-classical information structure. To obtain a sequential decomposition we need to identify information states sufficient for performance evaluation, or equivalently, find sufficient statistics for performance evaluation. The sequential nature of the problem suggests choosing an information state for each decision rule. Suppose \tilde\pi_t, \hat\pi_t, and \pi_t are information states at time t for the encoder, the memory update, and the controller, respectively. Due to the decentralization of information, these information states should depend only on the decision rules (which are common knowledge) and not on the observations of any agent. For \tilde\pi_t, \hat\pi_t, and \pi_t to be information states in the sense of [14], at each instant of time \hat\pi_t must be determined from \tilde\pi_t and c_t; \pi_t must be determined from \hat\pi_t and l_t; and \tilde\pi_{t+1} must be determined from \pi_t and g_t. However, a system can have more than one information state, and not all of them are sufficient for performance evaluation (see [46]). To be sufficient for performance evaluation, the information states must absorb/summarize the effect of past decision rules on the expected future cost^4, that is, they should satisfy

(2.35)  E\Big[\sum_{s=t}^{T} \rho(X_s, U_s) \Big| C, L, G\Big]
        = E\Big[\sum_{s=t}^{T} \rho(X_s, U_s) \Big| \tilde\pi_t, c_t^T, l_t^T, g_t^T\Big]
        = E\Big[\sum_{s=t}^{T} \rho(X_s, U_s) \Big| \hat\pi_t, c_{t+1}^T, l_t^T, g_t^T\Big]
        = E\Big[\sum_{s=t}^{T} \rho(X_s, U_s) \Big| \pi_t, c_{t+1}^T, l_{t+1}^T, g_t^T\Big],

or equivalently,

(2.36)  E\{\rho(X_t, U_t) | C, L, G\} = E\{\rho(X_t, U_t) | \pi_t, g_t\}.

These properties, which information states sufficient for performance evaluation must satisfy, are explained in more detail in [16].

For sequential problems, one way to obtain information states satisfying the above properties is by converting the model to Witsenhausen's standard form [44]. However, in the standard form the space to which the information states belong increases with time, so such a transformation to the standard form does not lead to a formulation that can be extended to infinite horizon problems. We want an information state that is appropriate for both finite and infinite horizon problems. This is possible only when the space to which the information state belongs is time-invariant.

Thus information states sufficient for performance evaluation should satisfy the

following properties:

(P1) They must be states, that is, at each instant of time \hat\pi_t should be a function of \tilde\pi_t and c_t; \pi_t should be a function of \hat\pi_t and l_t; and \tilde\pi_{t+1} should be a function of \pi_t and g_t.

(P2) They must be sufficient for performance evaluation, that is, they should satisfy (2.35) or (2.36).

(P3) They should take values in a time-invariant space.

Next we present information states that have the above properties and show how these information states lead to a sequential decomposition of Problem 2.2. We want to reemphasize that the hardest part of our solution methodology is identifying the appropriate information states; there is no known systematic methodology for identifying information states in decentralized stochastic control problems like Problem 2.2.

The information states defined below have all the above-discussed desired features.

Definition 2.5. Let \Pi be the space of probability measures on X \times M \times P_{X \times M}. Define \tilde\pi_t, \hat\pi_t, \pi_t, t = 1, ..., T, as follows:
1. \tilde\pi_t := \Pr(X_t, M_{t-1}, \tilde B_t).
2. \hat\pi_t := \Pr(X_t, M_{t-1}, \hat B_t).
3. \pi_t := \Pr(X_t, M_t, B_t).

Here \tilde\pi_t, \hat\pi_t, and \pi_t are probability measures (or probability laws) on the measurable space \big(X \times M \times P_{X \times M}, 2^{X \times M} \otimes B(P_{X \times M})\big), where B(P_{X \times M}) is the Borel

^4 Note that in problems with classical information structure, we can find an information state that is independent of the control law [14]. For problems with strictly non-classical information structures it is not always possible to find information states that are independent of the control law. However, as long as the expected future cost conditioned on the information state is conditionally independent of the past control laws, a sequential decomposition can be obtained using that information state. See [44] for a proof.


σ-algebra on P_{X \times M}. These probability measures are information states sufficient for performance evaluation of Problem 2.2. Specifically, they satisfy the following properties:

Lemma 2.6. \tilde\pi_t, \hat\pi_t, \pi_t are information states for the encoder, the memory update, and the controller, respectively, i.e.,

1. there is a linear transformation \hat Q(c_t) such that
(2.37)  \hat\pi_t = \hat Q(c_t) \tilde\pi_t.
2. there is a linear transformation \check Q(l_t) such that
(2.38)  \pi_t = \check Q(l_t) \hat\pi_t.
3. there is a linear transformation \tilde Q(g_t) such that
(2.39)  \tilde\pi_{t+1} = \tilde Q(g_t) \pi_t.
4. the conditional expected instantaneous cost can be expressed as
(2.40)  E\big[\rho(X_t, U_t) | c^t, l^t, g^t\big] = \tilde\rho(\pi_t, g_t),
where \tilde\rho is a deterministic function.

Proof.
1. Consider a component of \hat\pi_t,

(2.41)  \hat\pi_t(x, m, \hat b) = \sum_{y \in Y} \int_{A(\hat b, y, c_t)} \tilde\pi_t(x, m, \tilde b) \Pr\big(Y_t = y | Z_t = c_t(x, m)\big) \, d\tilde b =: \hat Q(c_t) \tilde\pi_t,

where A(\hat b, y, c) = \{\tilde b \in P_{X \times M} : \hat b = \hat F(\tilde b, y, c)\}.

2. Consider a component of \pi_t,

(2.42)  \pi_t(x, m, b) = \sum_{\{m' \in M : m = l_t(x, m')\}} \int_{A(b, l_t)} \hat\pi_t(x, m', \hat b) \, d\hat b =: \check Q(l_t) \hat\pi_t,

where A(b, l) = \{\hat b \in P_{X \times M} : b = \check F(\hat b, l)\}.

3. Consider a component of \tilde\pi_{t+1},

(2.43)  \tilde\pi_{t+1}(x, m, \tilde b) = \sum_{x_t \in X} \int_{A(\tilde b, g_t)} \pi_t(x_t, m, b) \Pr\big(X_{t+1} = x | X_t = x_t, U_t = g_t(b)\big) \, db =: \tilde Q(g_t) \pi_t,

where A(\tilde b, g) = \{b \in P_{X \times M} : \tilde b = \tilde F(b, g(b))\}.

4. Consider E\{\rho(X_t, U_t) | c^t, l^t, g^t\}. By the problem formulation \tilde\pi_1 is known to all agents. For specified c^t, l^t, and g^{t-1}, the information state \pi_t can be evaluated using the transformations of the previous steps of this Lemma. Thus,

(2.44)  E\big[\rho(X_t, U_t) | c^t, l^t, g^t\big] = E\big[\rho(X_t, U_t) | c^t, l^t, g^t, \pi_t\big] = \sum_{x_t \in X} \int_{P_{X \times M}} \pi_t(x_t, b_t) \rho\big(x_t, g_t(b_t)\big) \, db_t =: \tilde\rho(\pi_t, g_t),

where \pi_t(x_t, b_t) is the marginal of \pi_t(x_t, m_t, b_t).
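The linearity of the transformations in Lemma 2.6 is easiest to see on an atomic (finitely supported) measure: each transformation moves atoms independently and carries their weights along unchanged. The sketch below implements \check Q(l_t) of (2.42) in this atomic representation; the representation and all names are ours, not the paper's:

```python
import numpy as np

def check_F(b_hat, l):
    """Belief memory-update (2.21): pushforward of b_hat[x, m] through l."""
    out = np.zeros_like(b_hat)
    for x in range(b_hat.shape[0]):
        for m in range(b_hat.shape[1]):
            out[x, l[x, m]] += b_hat[x, m]
    return out

def Q_l(atoms, l):
    """check-Q(l_t) of Lemma 2.6 on an atomic measure: a list of
    (weight, x, m, b) atoms.  Each atom (x, m, b) is mapped to
    (x, l_t(x, m), check_F(b, l_t)); linearity is immediate because atoms
    are mapped independently and weights are carried along unchanged."""
    return [(w, x, int(l[x, m]), check_F(b, l)) for (w, x, m, b) in atoms]

l = np.array([[1, 1],
              [0, 1]])              # m_t = l_t(x_t, m_{t-1})
b = np.array([[0.5, 0.0],
              [0.25, 0.25]])
atoms = [(0.6, 0, 0, b), (0.4, 1, 1, b)]
new_atoms = Q_l(atoms, l)
```

The same pattern applies to \hat Q(c_t) and \tilde Q(g_t), except that there an atom splits into several atoms weighted by the channel or transition kernel, which is still a linear operation on the measure.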
