
arXiv:0708.0271v1 [cs.IT] 2 Aug 2007

SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY, AUG. 2007.

Capacity Region of the Finite-State Multiple

Access Channel with and without Feedback

Haim Permuter and Tsachy Weissman

Abstract

The capacity region of the Finite-State Multiple Access Channel (FS-MAC) with feedback that may be an

arbitrary time-invariant function of the channel output samples is considered. We characterize both an inner and an

outer bound for this region, using Massey’s directed information. These bounds are shown to coincide, and hence

yield the capacity region, of FS-MACs where the state process is stationary and ergodic and not affected by the

inputs. Though ‘multi-letter’ in general, our results yield explicit conclusions when applied to specific scenarios of

interest. E.g., our results allow us to:

• Identify a large class of FS-MACs, that includes the additive mod-2 noise MAC where the noise may have

memory, for which feedback does not enlarge the capacity region.

• Deduce that, for a general FS-MAC with states that are not affected by the input, if the capacity (region) without

feedback is zero, then so is the capacity (region) with feedback.

• Deduce that, for a MAC that can be decomposed into a ‘multiplexer’ concatenated by a point-to-point channel (with, without, or with partial feedback), the capacity region is given by $\sum_m R_m \le C$, where $C$ is the capacity of the point-to-point channel and $m$ indexes the encoders. Moreover, we show that for this family of channels source-channel coding separation holds.

Index Terms

Feedback capacity, multiple access channel, capacity region, directed information, causal conditioning, code-tree,

source-channel coding separation, sup-additivity of sets.

I. INTRODUCTION

The Multiple Access Channel (MAC) has received much attention in the literature. To put our contributions

in context, we begin by briefly describing some of the key results in the area. The capacity region for the

memoryless MAC was derived by Ahlswede in [1]. Cover and Leung derived an achievable region for a memoryless

MAC with feedback in [2]. Using block Markov encoding, superposition and list codes, they showed that the region $R_1 \le I(X_1;Y|X_2,U)$, $R_2 \le I(X_2;Y|X_1,U)$ and $R_1+R_2 \le I(X_1,X_2;Y)$, where $P(u,x_1,x_2,y) = p(u)p(x_1|u)p(x_2|u)p(y|x_1,x_2)$, is achievable for a memoryless MAC with feedback. Willems showed in [3] that the achievable region given by Cover and Leung for a memoryless channel with feedback is optimal for a class of channels where one of the inputs is a deterministic function of the output and the other input. More recently Bross and Lapidoth [4] improved Cover and Leung’s region, and Wu et al. [5] have extended Cover and Leung’s region for the case that non-causal state information is available at both encoders.

The authors are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA. (Email: {haim1, tsachy}@stanford.edu)

This work was supported by the NSF through the CAREER award and TFR-0729119 grant.

Ozarow derived the capacity of a memoryless Gaussian MAC with feedback in [6], and showed it to be achievable

via a modification of the Schalkwijk-Kailath scheme [7]. In general, the capacity in the presence of noisy feedback

is an open question for the point-to-point channel and a fortiori for the MAC. Lapidoth and Wigger [8] presented an

achievable region for the case of the Gaussian MAC with noisy feedback and showed that it converges to Ozarow’s

noiseless-feedback sum-rate capacity as the feedback-noise variance tends to zero. Other recent variations on the

Schalkwijk-Kailath scheme of relevance to the themes of our work include the case of quantization noise in the

feedback link [9] and the case of interference known non-causally at the transmitter [10].

Verdú characterized the capacity region of a multiple-access channel of the form $P(y_i|x_1^i,x_2^i,y^{i-1}) = P(y_i|x_{1,i-m}^i,x_{2,i-m}^i)$ without feedback in [11]. Verdú further showed in that work that in the absence of frame synchronism between the two users, i.e., there is a random shift between the users, only stationary input distributions need be considered. Cheng and Verdú built on the capacity result from [11] in [12] to show that for a Gaussian MAC there exists a water-filling solution that generalizes the point-to-point Gaussian channel.

In [13] [14], Kramer derived several capacity results for discrete memoryless networks with feedback. By using

the idea of code-trees instead of code-words, Kramer derived a ‘multi-letter’ expression for the capacity of the

discrete memoryless MAC. One of the main results we develop in the present paper extends Kramer’s capacity

result to the case of a stationary and ergodic Markov Finite-State MAC (FS-MAC), to be formally defined below.

In [15] [16], Han used the information-spectrum method in order to derive the capacity of a general MAC

without feedback, when the channel transition probabilities are arbitrary for every n symbols. Han also considered

the additive mod-q MAC, which we shall use here to illustrate the way in which our general results characterize

special cases of interest. In particular, our results will imply that feedback does not increase the capacity region of

the additive mod-q MAC.

In this work, we consider the capacity region of the Finite-State Multiple Access Channel (FS-MAC), with

feedback that may be an arbitrary time-invariant function of the channel output samples. We characterize both an

inner and an outer bound for this region. We further show that these bounds coincide, and hence yield the capacity

region, for the important subfamily of FS-MACs with states that evolve independently of the channel inputs. Our

derivation of the capacity region is rooted in the derivation of the capacity of finite-state channels in Gallager’s

book [17, ch 4,5]. More recently, Lapidoth and Telatar [18] have used it in order to derive the capacity of a

compound channel without feedback, where the compound channel consists of a family of finite-state channels. In

particular, they have introduced into Gallager’s proof the idea of concatenating codewords, which we extend here

to concatenating code-trees.

Though ‘multi-letter’ in general, our results yield explicit conclusions when applied to more specific families

of MACs. For example, we find that feedback does not increase the capacity of the mod-q additive noise MAC


(where q is the size of the common alphabet of the input, output and noise), regardless of the memory in the

noise. This result is in sharp contrast with the finding of Gaarder and Wolf in [19] that feedback can increase the

capacity even of a memoryless MAC due to cooperation between senders that it can create. Our result should also

be considered in light of Alajaji’s work [20], where it was shown that feedback does not increase the capacity of

discrete point-to-point channels with mod-q additive noise. Thus, this part of our contribution can be considered

a multi-terminal extension of Alajaji’s result. Our results will in fact allow us to identify a class of MACs larger

than that of the mod-q additive noise MAC for which feedback does not enlarge the capacity region.

Further specialization of the results will allow us to deduce that, for a general FS-MAC with states that are

not affected by the input, if the capacity (region) without feedback is zero, then so is the capacity (region) with

feedback. It will also allow us to identify a large class of FS-MACs for which source-channel coding separation

holds.

The remainder of this paper is organized as follows. We concretely describe our channel model and assumptions

in Section II. In Section III we introduce some notation, tools and results pertaining to directed information and the

notion of causal conditioning that will be key in later sections. We state our main results in Section IV. In Section V

we apply the general results of Section IV to obtain the capacity region for several interesting classes of channels,

as well as establish a source-channel separation result. The validity of our inner and outer bounds is established,

respectively, in Section VI and Section VII. In Section VIII we show that our inner and outer bounds coincide,

and hence yield the capacity region, when applied to the FS-MAC without feedback. This result can be thought

of as the natural extension of Gallager’s results [17, Ch. 4] to the MAC or, alternatively, as the natural extension

of Gallager’s derivation of the MAC capacity region in [21] to channels with states. In Section IX we characterize

the capacity region for the case of arbitrary (time-invariant) feedback and FS-MAC channels with states that evolve

independently of the input, as well as the FS-MAC with limited ISI (which is the natural MAC-analogue of Kim’s

point-to-point channel [22]), by showing that our inner and outer bounds coincide for this case. We conclude in

Section X with a summary of our contribution and a related future research direction.

II. CHANNEL MODEL

In this paper, we consider an FS-MAC (finite-state MAC) with time-invariant feedback, as illustrated in Fig. 1. The MAC setting consists of two senders and one receiver. Each sender $l \in \{1,2\}$ chooses an index $m_l$ uniformly from the set $\{1,\dots,2^{nR_l}\}$ and independently of the other sender. The input to the channel from encoder $l$ is denoted by $\{X_{l1},X_{l2},X_{l3},\dots\}$, and the output of the channel is denoted by $\{Y_1,Y_2,Y_3,\dots\}$. The state at time $i$, i.e., $S_i \in \mathcal{S}$, takes values in a finite set of possible states. The channel is stationary and is characterized by a conditional probability $P(y_i,s_i|x_{1i},x_{2i},s_{i-1})$ that satisfies

\[ P(y_i,s_i|x_1^i,x_2^i,s^{i-1},y^{i-1}) = P(y_i,s_i|x_{1i},x_{2i},s_{i-1}), \tag{1} \]

where the superscripts denote sequences in the following way: $x_l^i = (x_{l1},x_{l2},\dots,x_{li})$, $l \in \{1,2\}$. We assume a communication with feedback $z_l^i$, where the element $z_{li}$ is a time-invariant function of the output $y_i$. For example, $z_{li}$ could equal $y_i$ (perfect feedback), or a quantized version of $y_i$, or null (no feedback). The encoders receive the feedback samples with one unit delay.

[Figure 1: block diagram. Encoder $l$ maps the message $m_l \in \{1,\dots,2^{nR_l}\}$ and the delayed feedback $z_l^{i-1}$ to the input $x_{l,i}(m_l,z_l^{i-1})$; the finite-state MAC $P(y_i,s_i|x_{1,i},x_{2,i},s_{i-1})$ produces $y_i$; a time-invariant function $z_{l,i}(y_i)$ is returned to encoder $l$ through a unit delay; the decoder outputs $\hat{m}_1(y^N)$, $\hat{m}_2(y^N)$.]

Fig. 1. Channel with feedback that is a time-invariant deterministic function of the output.

A code with feedback consists of two encoding functions $g_l : \{1,\dots,2^{nR_l}\} \times \mathcal{Z}_l^{n-1} \to \mathcal{X}_l^n$, $l = 1,2$, where the $k$th coordinate of $x_l^n \in \mathcal{X}_l^n$ is given by the function

\[ x_{lk} = g_{lk}(m_l, z_l^{k-1}), \quad k = 1,2,\dots,n, \quad l = 1,2, \tag{2} \]

and a decoding function,

\[ g : \mathcal{Y}^n \to \{1,\dots,2^{nR_1}\} \times \{1,\dots,2^{nR_2}\}. \tag{3} \]

The average probability of error for a $((2^{nR_1},2^{nR_2}),n)$ code is defined as

\[ P_e^{(n)} = \frac{1}{2^{n(R_1+R_2)}} \sum_{w_1,w_2} \Pr\{ g(Y^n) \neq (w_1,w_2) \mid (w_1,w_2) \text{ sent} \}. \tag{4} \]

A rate pair $(R_1,R_2)$ is said to be achievable for the MAC if there exists a sequence of $((2^{nR_1},2^{nR_2}),n)$ codes with $P_e^{(n)} \to 0$. The capacity region of the MAC is the closure of the set of achievable $(R_1,R_2)$ rates.

III. DIRECTED INFORMATION

Throughout this paper we use the causal conditioning notation $(\cdot\|\cdot)$. We denote the probability mass function (pmf) of $Y^N$ causally conditioned on $X^{N-d}$, for some integer $d \ge 0$, as $P(y^N\|x^{N-d})$, which is defined as

\[ P(y^N\|x^{N-d}) \triangleq \prod_{i=1}^{N} P(y_i|y^{i-1},x^{i-d}) \tag{5} \]

(if $i-d \le 0$ then $x^{i-d}$ is set to null). In particular, we extensively use the cases where $d = 0,1$:

\[ P(y^N\|x^N) \triangleq \prod_{i=1}^{N} P(y_i|y^{i-1},x^i), \tag{6} \]

\[ Q(x^N\|y^{N-1}) \triangleq \prod_{i=1}^{N} Q(x_i|x^{i-1},y^{i-1}), \tag{7} \]

where the letters $Q$ and $P$ are both used for denoting pmfs.

Directed information $I(X^N \to Y^N)$ was defined by Massey in [23] as

\[ I(X^N \to Y^N) \triangleq \sum_{i=1}^{N} I(X^i;Y_i|Y^{i-1}). \tag{8} \]

It has been widely used in the characterization of capacity of point-to-point channels [22], [24]–[29], compound

channels [30], network capacity [14], [31], rate distortion [32]–[34] and computational biology [35], [36]. Directed

information can also be expressed in terms of causal conditioning as

\[ I(X^N \to Y^N) = \sum_{i=1}^{N} I(X^i;Y_i|Y^{i-1}) = E\left[ \log \frac{P(Y^N\|X^N)}{P(Y^N)} \right], \tag{9} \]

where $E$ denotes expectation. The directed information from $X^N$ to $Y^N$, conditioned on $S$, is denoted as $I(X^N \to Y^N|S)$ and is defined as:

\[ I(X^N \to Y^N|S) \triangleq \sum_{i=1}^{N} I(X^i;Y_i|Y^{i-1},S). \tag{10} \]

Directed information from $X_1^N$ to $Y^N$ causally conditioned on $X_2^N$ is defined as

\[ I(X_1^N \to Y^N\|X_2^N) \triangleq \sum_{i=1}^{N} I(X_1^i;Y_i|X_2^i,Y^{i-1}) = E\left[ \log \frac{P(Y^N\|X_1^N,X_2^N)}{P(Y^N\|X_2^N)} \right], \tag{11} \]

where $P(y^N\|x_1^N,x_2^N) = \prod_{i=1}^{N} P(y_i|y^{i-1},x_1^i,x_2^i)$.

Throughout this paper we use several properties of causal conditioning and directed information that follow from the definitions and simple algebra. Many of the key properties that hold for mutual information and regular conditioning carry over to directed information and causal conditioning, where $P(x^N)$ is replaced by $P(x^N\|y^{N-1})$ and $P(y^N)$ is replaced by $P(y^N\|x^N)$. Specifically,

Lemma 1: (Analogue to $P(x_1^N,y^N) = P(x_1^N)P(y^N|x_1^N)$.) For arbitrary random vectors $(X_1^N,X_2^N,Y^N)$,

\[ P(x_1^N,y^N) = P(x_1^N\|y^{N-1})\,P(y^N\|x_1^N), \tag{12} \]

\[ P(x_1^N,y^N\|x_2^N) = P(x_1^N\|y^{N-1},x_2^N)\,P(y^N\|x_1^N,x_2^N). \tag{13} \]

Lemma 2: (Analogue to $|I(X_1^N;Y^N) - I(X_1^N;Y^N|S)| \le H(S)$.) For arbitrary random vectors and variables,

\[ \left| I(X_1^N \to Y^N) - I(X_1^N \to Y^N|S) \right| \le H(S) \le \log|\mathcal{S}|, \tag{14} \]

\[ \left| I(X_1^N \to Y^N\|X_2^N) - I(X_1^N \to Y^N\|X_2^N,S) \right| \le H(S) \le \log|\mathcal{S}|. \tag{15} \]

The proofs of Lemma 1 and Lemma 2 can be found in [27, Sec. IV], along with some additional properties of causal conditioning and directed information. The next lemma, which is proven in Appendix I, shows that by replacing the regular pmf with the causal conditioning pmf we get the directed information. Let us denote the mutual information $I(X_1^N;Y^N|X_2^N)$ as a functional of $Q(x_1^N,x_2^N)$ and $P(y^N|x_1^N,x_2^N)$, i.e., $I(Q(x_1^N,x_2^N);P(y^N|x_1^N,x_2^N)) \triangleq I(X_1^N;Y^N|X_2^N)$. Consider the case that the random variables $X_1^N,X_2^N$ are independent, i.e., $Q(x_1^N,x_2^N) = Q(x_1^N)Q(x_2^N)$; then by definition

\[ I(Q(x_1^N)Q(x_2^N);P(y^N|x_1^N,x_2^N)) \triangleq \sum_{y^N,x_1^N,x_2^N} Q(x_1^N)Q(x_2^N)P(y^N|x_1^N,x_2^N) \log \frac{P(y^N|x_1^N,x_2^N)}{\sum_{x_1'^N} Q(x_1'^N)P(y^N|x_1'^N,x_2^N)}. \tag{16} \]

Lemma 3: If the random vectors $X_1^N$ and $X_2^N$ are causal-conditionally independent given $Y^{N-1}$, i.e., $Q(x_1^N,x_2^N\|y^{N-1}) = Q(x_1^N\|y^{N-1})Q(x_2^N\|y^{N-1})$, then

\[ I(Q(x_1^N\|y^{N-1})Q(x_2^N\|y^{N-1});P(y^N\|x_1^N,x_2^N)) = I(X_1^N \to Y^N\|X_2^N). \tag{17} \]

The next lemma, which is proven in Appendix II, shows that in the absence of feedback, mutual information

becomes directed information.

Lemma 4: If $Q(x_1^N,x_2^N\|y^{N-1}) = Q(x_1^N)Q(x_2^N)$ then

\[ I(X_1^N;Y^N|X_2^N) = I(X_1^N \to Y^N\|X_2^N). \tag{18} \]
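The statement that mutual information and directed information agree in the absence of feedback can be checked numerically on a toy example. The sketch below is not taken from the paper: it uses the single-user analogue of Lemma 4 with $N = 2$, an arbitrarily chosen input pmf `Q`, and binary additive noise with memory (all parameter values are illustrative assumptions). It evaluates $I(X^2 \to Y^2) = I(X_1;Y_1) + I(X^2;Y_2|Y_1)$ from definition (8) and compares it with $I(X^2;Y^2)$.

```python
import math

def entropy(pmf, idx):
    """Entropy (bits) of the marginal of pmf on the coordinates listed in idx."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def cond_mi(pmf, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    a, b, c = list(a), list(b), list(c)
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))

# Hypothetical input pmf Q(x1, x2): correlated inputs, but chosen without feedback.
Q = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
# Hypothetical noise with memory: P(v1 = 1) = 0.2 and v2 repeats v1 w.p. 0.9.
PV = {(v1, v2): (0.2 if v1 else 0.8) * (0.9 if v2 == v1 else 0.1)
      for v1 in (0, 1) for v2 in (0, 1)}

# Joint pmf over (x1, x2, y1, y2) with y_i = x_i XOR v_i.
joint = {}
for (x1, x2), q in Q.items():
    for (v1, v2), pv in PV.items():
        key = (x1, x2, x1 ^ v1, x2 ^ v2)
        joint[key] = joint.get(key, 0.0) + q * pv

# Directed information, definition (8) with N = 2.
directed = cond_mi(joint, [0], [2]) + cond_mi(joint, [0, 1], [3], [2])
# Ordinary mutual information I(X^2; Y^2).
mutual = cond_mi(joint, [0, 1], [2, 3])
print(directed, mutual)
```

The two quantities coincide here because the inputs do not depend on past outputs; if `Q` were replaced by a feedback-dependent rule for `x2`, the two would in general differ.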

IV. MAIN THEOREMS

We dedicate this section to a statement of our main results, proofs of which will appear in the subsequent sections.

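Let $\underline{\mathcal{R}}_n$ denote the following region in $\mathbb{R}_+^2$ (the 2D set of nonnegative real numbers):

\[ \underline{\mathcal{R}}_n = \bigcup_{Q(w)Q(x_1^n\|z_1^{n-1},w)Q(x_2^n\|z_2^{n-1},w)} \left\{ \begin{array}{l} R_1 \le \min_{s_0} \frac{1}{n} I(X_1^n \to Y^n\|X_2^n,W,s_0) - \frac{\log|\mathcal{S}|}{n}, \\ R_2 \le \min_{s_0} \frac{1}{n} I(X_2^n \to Y^n\|X_1^n,W,s_0) - \frac{\log|\mathcal{S}|}{n}, \\ R_1+R_2 \le \min_{s_0} \frac{1}{n} I((X_1,X_2)^n \to Y^n|W,s_0) - \frac{\log|\mathcal{S}|}{n} \end{array} \right\}. \tag{19} \]

Having the auxiliary random variable $W$ is equivalent to taking the convex hull of the region. It is shown in the Appendix that the inclusion (or omission) of $W$ in the definition of the region $\underline{\mathcal{R}}_n$ has vanishing effect with increasing $n$.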

Theorem 5: (Inner bound.) For any FS-MAC with time-invariant feedback as shown in Fig. 1, and for any integer $n \ge 1$, the region $\underline{\mathcal{R}}_n$ defined in (19) is achievable.

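Let $\overline{\mathcal{R}}_n$ denote the following region in $\mathbb{R}_+^2$:

\[ \overline{\mathcal{R}}_n = \bigcup_{Q(x_1^n\|z_1^{n-1})Q(x_2^n\|z_2^{n-1})} \left\{ \begin{array}{l} R_1 \le \frac{1}{n} I(X_1^n \to Y^n\|X_2^n), \\ R_2 \le \frac{1}{n} I(X_2^n \to Y^n\|X_1^n), \\ R_1+R_2 \le \frac{1}{n} I((X_1,X_2)^n \to Y^n) \end{array} \right\}. \tag{20} \]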

In the following theorem we use the standard notion of convergence of sets. Confer Appendix IV for the details of

the definition.

Theorem 6: (Outer bound.) Let $(R_1,R_2)$ be an achievable pair for an FS-MAC with time-invariant feedback, as shown in Fig. 1. Then, for any $n$ there exists a distribution $Q(x_1^n\|z_1^{n-1})Q(x_2^n\|z_2^{n-1})$ such that the following inequalities hold:

\[ \begin{aligned} R_1 &\le \frac{1}{n} I(X_1^n \to Y^n\|X_2^n) + \epsilon_n, \\ R_2 &\le \frac{1}{n} I(X_2^n \to Y^n\|X_1^n) + \epsilon_n, \\ R_1+R_2 &\le \frac{1}{n} I((X_1,X_2)^n \to Y^n) + \epsilon_n, \end{aligned} \tag{21} \]

where $\epsilon_n$ goes to zero as $n$ goes to infinity. Moreover, the outer bound can be written as $\liminf \overline{\mathcal{R}}_n$, where $\overline{\mathcal{R}}_n$ is the region in (20).

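For the case where there is no feedback, i.e., $z_i$ is null, the inner-bound region $\underline{\mathcal{R}}_n$ of (19) and the outer-bound region $\overline{\mathcal{R}}_n$ of (20) can be expressed in terms of mutual information and regular conditioning due to Lemma 4.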

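Theorem 7: (Capacity of FS-MAC without feedback.) For any indecomposable FS-MAC without feedback, the achievable region is $\lim_{n\to\infty} \overline{\mathcal{R}}_n$, with $\overline{\mathcal{R}}_n$ the region in (20), and the limit exists.

Theorem 8: (Capacity of FS-MAC with feedback.) For any FS-MAC of the form

\[ P(y_i,s_i|x_{1,i},x_{2,i},s_{i-1}) = P(s_i|s_{i-1})\,P(y_i|x_{1,i},x_{2,i},s_{i-1}), \tag{22} \]

where the state process $S_i$ is stationary and ergodic, the achievable region is $\lim_{n\to\infty} \overline{\mathcal{R}}_n$, with $\overline{\mathcal{R}}_n$ the region in (20), and the limit exists.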

The next theorems will be seen to be consequences of the capacity theorems given above.

Theorem 9: For the channel described in (22), where the state process $S_i$ is stationary and ergodic, if the capacity without feedback is zero, then it is also zero in the case that there is feedback.

Corollary 10: For a memoryless MAC, the capacity with feedback is zero if and only if it is zero without

feedback.

Corollary 11: Feedback does not enlarge the capacity region of a discrete additive (mod-|X|) noise MAC.

In fact, among other results, we will see in the next section that the (mod-|X|) noise MAC is only a subset of a

larger family of MACs for which feedback does not enlarge the capacity region.

V. APPLICATIONS

The capacity formula of an FS-MAC given in Theorems 7 and 8 is a multi-letter characterization. In general, it is very hard to evaluate but, for the finite-state point-to-point channel, there are several cases where the capacity with and without feedback was found numerically [37], [38], [26], [25] and analytically [28].¹

The multi-letter capacity expression is also valuable for deriving useful concepts in communication. For instance, in order to show that feedback does not increase the capacity of a memoryless channel (cf. [43]), we can use the multi-letter upper bound of a channel with memory. Further, in [27] it was shown that for the cases where the capacity is given by the multi-letter expression $C = \lim_{N\to\infty} \frac{1}{N} \max_{Q(x^N\|z^{N-1})} I(X^N \to Y^N)$, the source-channel coding separation holds. It was also shown that if the state of the channel is known at both the encoder and decoder and the channel is connected (i.e., every state can be reached with some positive probability from every other state under some input distribution), then feedback does not increase the capacity of the channel.

¹For the Gaussian case without feedback there exists the water-filling solution [39], and recently the feedback capacity was found analytically for the case that the noise is an ARMA(1)-Gaussian process (cf. [40]–[42]).


In this section we use the capacity formula in order to derive three conclusions:

1) For stationary and ergodic Markovian channels, the capacity without feedback is zero if and only if the capacity with feedback is zero.

2) We identify FS-MACs for which feedback does not enlarge the capacity region, and show that for a MAC that can be decomposed into a ‘multiplexer’ concatenated by a point-to-point channel (with, without, or with partial feedback), the capacity region is given by $\sum_m R_m \le C$, where $C$ is the capacity of the point-to-point channel.

3) Source-channel coding separation holds for a MAC that can be decomposed into a ‘multiplexer’ concatenated by a point-to-point channel (with, without, or with partial feedback).

As a special case of the second conclusion we show that the capacity region of a binary Gilbert-Elliot MAC is $R_1+R_2 \le 1 - H(V)$, where $H(V)$ is the entropy rate of the hidden Markov noise that specifies the binary Gilbert-Elliot MAC.
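To make the sum-rate expression $1 - H(V)$ concrete, the following sketch computes the block entropy $H(V^n)$ of a two-state hidden Markov noise exactly, by enumerating all $2^n$ noise sequences and running a forward recursion on each. The parameter names (`a` for the good-to-bad transition probability, `b` for bad-to-good, `p_good`, `p_bad`) and the numerical values are assumptions made for illustration only; they are not taken from the paper. Since the noise is stationary, $H(V^n)/n$ is non-increasing in $n$ and converges to the entropy rate, so $1 - H(V^n)/n$ approaches $1 - H(V)$ from below.

```python
import itertools
import math

def hmm_block_entropy(n, a, b, p_good, p_bad):
    """Exact H(V^n) in bits for binary noise driven by a two-state Markov chain.

    State 0 = good, state 1 = bad. The chain starts in its stationary
    distribution; V_i ~ Bernoulli(p_good) or Bernoulli(p_bad) given the state.
    """
    pi = [b / (a + b), a / (a + b)]       # stationary distribution
    T = [[1 - a, a], [b, 1 - b]]          # T[s][t] = P(next = t | current = s)
    flip = [p_good, p_bad]
    total = 0.0
    for v in itertools.product((0, 1), repeat=n):
        f = pi[:]                         # f[s] = P(v^{i-1}, S_i = s)
        for vi in v:
            g = [f[s] * (flip[s] if vi else 1 - flip[s]) for s in (0, 1)]
            f = [g[0] * T[0][t] + g[1] * T[1][t] for t in (0, 1)]
        p = sum(f)                        # P(v^n)
        if p > 0:
            total -= p * math.log2(p)
    return total

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Illustrative parameters (assumed, not from the paper).
for n in (2, 4, 8):
    h_n = hmm_block_entropy(n, a=0.1, b=0.3, p_good=0.05, p_bad=0.4)
    print(n, 1 - h_n / n)   # finite-n value of the sum-rate bound, bits per use
```

Exhaustive enumeration is exponential in $n$, so this is only practical for short blocks; it is meant as a sanity check rather than an efficient estimator of the entropy rate.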

A. Zero capacity

The first conclusion is given in Theorem 9 and is proved here. The proof of Theorem 9 is based on the following lemma, which is proven in Appendix III.

Lemma 12: For a MAC described by an arbitrary causal conditioning $p(y^n\|x_1^n,x_2^n)$ the following holds:

\[ \max_{Q(x_1^n\|y^{n-1})Q(x_2^n\|y^{n-1})} I(X_1^n,X_2^n \to Y^n) = 0 \iff \max_{Q(x_1^n)Q(x_2^n)} I(X_1^n,X_2^n \to Y^n) = 0, \tag{23} \]

and each condition also implies that $P(y^n\|x_1^n,x_2^n) = P(y^n)$ for all $x_1^n,x_2^n$.

Proof of Theorem 9: Since the channel is a Markovian channel, i.e.,

\[ P(y_i,s_i|x_{1,i},x_{2,i},s_{i-1}) = p(s_i|s_{i-1})\,P(y_i|x_{1,i},x_{2,i},s_{i-1}), \tag{24} \]

and stationary and ergodic, its capacity region is given in Theorem 8 as $\mathcal{C} = \lim_{n\to\infty} \overline{\mathcal{R}}_n$. Furthermore, since the sequence $\{\overline{\mathcal{R}}_n\}$ is sup-additive (Lemma 22), then according to Lemma 23 in Appendix IV, $\lim_{n\to\infty} \overline{\mathcal{R}}_n = \mathrm{cl}\left( \bigcup_{n\ge 1} \overline{\mathcal{R}}_n \right)$, implying that if the capacity without feedback is zero, then for all $n \ge 1$

\[ \max_{Q(x_1^n)Q(x_2^n)} I(X_1^n,X_2^n \to Y^n) = 0. \tag{25} \]

According to Lemma 12, the maximization of the objective in eq. (25) over the distribution $Q(x_1^n\|y^{n-1})Q(x_2^n\|y^{n-1})$ is still zero; hence, the capacity region is zero even if there is perfect feedback.

Corollary 10, which states that the capacity of a memoryless MAC without feedback is zero if and only if the

capacity with feedback is zero, follows immediately from Theorem 9 because a memoryless MAC can be considered

a FS-MAC with one state.

Clearly, Theorem 9 also holds for the case of a stationary and ergodic FS-Markov point-to-point channel because a MAC is an extension of a point-to-point channel. However, it does not hold for the case of a broadcast channel. For instance, consider the binary broadcast channel given by $y_{1,i} = x_i \oplus n_i$ and $y_{2,i} = x_i \oplus n_{i-1}$, where $n_i$ is i.i.d. Bernoulli($\frac{1}{2}$) and $\oplus$ denotes addition mod-2. The capacity without feedback is clearly zero, but if the transmitter has feedback, namely if it knows $y_{1,i-1}$ and $y_{2,i-1}$ at time $i$, then it can compute the noise $n_{i-1} = y_{1,i-1} \oplus x_{i-1}$ and therefore it can transmit 1 bit per channel use to the second user.
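The feedback scheme in this broadcast example can be simulated directly. The sketch below is an illustration under assumed names (`simulate_broadcast`, `bits`, `seed` are not from the paper): at each time $i \ge 2$ the transmitter recovers $n_{i-1}$ from the fed-back $y_{1,i-1}$ and pre-cancels it, so the second receiver sees the information bit exactly. The first use is wasted because $n_0$ is unknown to the transmitter.

```python
import random

def simulate_broadcast(bits, seed=0):
    """Send bits to user 2 over y1 = x XOR n_i, y2 = x XOR n_{i-1} using feedback of y1."""
    rng = random.Random(seed)
    n = len(bits)
    noise = [rng.randint(0, 1) for _ in range(n + 1)]  # noise[i] = n_i, i = 0..n
    decoded = []
    x_prev = y1_prev = None
    for i in range(1, n + 1):
        if i == 1:
            x = bits[0]                      # n_0 is unknown: this use is unprotected
        else:
            n_prev = y1_prev ^ x_prev        # n_{i-1} = y_{1,i-1} XOR x_{i-1}
            x = bits[i - 1] ^ n_prev         # pre-cancel the noise user 2 will see
        y1 = x ^ noise[i]
        y2 = x ^ noise[i - 1]
        decoded.append(y2)                   # user 2 takes y2 as its estimate
        x_prev, y1_prev = x, y1
    return decoded

message = [random.Random(1).randint(0, 1) for _ in range(1000)]
received = simulate_broadcast(message)
print(received[1:] == message[1:])
```

Every bit after the first is delivered without error, which matches the rate of 1 bit per channel use to the second user claimed in the text.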

B. Examples of channels for which feedback does not enlarge capacity

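[Figure 2: two-state diagram with states G and B and transition probabilities labeled $\alpha$, $1-\alpha$, $\beta$, $1-\beta$. In each state the inputs $X_1$, $X_2$ and the noise $V$ produce the output $y$, with $V \sim$ Bernoulli($p_G$) in state G and $V \sim$ Bernoulli($p_B$) in state B.]

Fig. 2. Gilbert-Elliot MAC. It has two states, “Good” and “Bad”, where the transition between them is according to a first-order Markov process. Given that the channel is in a “Good” (or a “Bad”) state, it behaves as a binary additive noise channel where the noise is Bernoulli($p_G$) (or Bernoulli($p_B$)).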

1) Gilbert-Elliot MAC: The Gilbert-Elliot channel is a widely used example of a finite-state channel. It is often used to model wireless communication in the presence of fading [37], [38], [44]. The Gilbert-Elliot channel is a Markov channel with two states, denoted as “good” and “bad”. Each state is a binary symmetric channel and the probability of flipping the bit is lower in the “good” state. In the case of the Gilbert-Elliot MAC (Fig. 2), each state is an additive MAC with i.i.d. noise, where in the “good” channel the probability that the noise is ‘1’ is lower than in the “bad” channel. This channel can be represented as an additive MAC as in Fig. 2, where the noise is a hidden Markov process.

Since the Gilbert-Elliot MAC is an ergodic FS-MAC, its capacity with feedback, when the initial state distribution over the states “good” and “bad” is the stationary distribution, is given by $\lim_{n\to\infty} \overline{\mathcal{R}}_n$ (Theorem 8), with $\overline{\mathcal{R}}_n$ the region in (20). For the Gilbert-Elliot MAC, this limit reduces to the simple region

\[ R_1 + R_2 \le 1 - H(V), \tag{26} \]

where $H(V)$ denotes the entropy rate of the hidden Markov noise. The following equalities and inequalities upper bound the region $\overline{\mathcal{R}}_n$, and this upper bound can be achieved for any deterministic feedback by an i.i.d. input distribution $X_{1,i} \sim$ Bernoulli($\frac{1}{2}$) and $X_{2,i} \sim$ Bernoulli($\frac{1}{2}$), $i = 1,2,\dots,n$, where $X_1^n$ and $X_2^n$ are independent of each other.

\[ \begin{aligned} I((X_1,X_2)^n \to Y^n) &= \sum_{i=1}^{n} H(Y_i|Y^{i-1}) - H(Y_i|Y^{i-1},X_1^i,X_2^i) \\ &\overset{(a)}{=} \sum_{i=1}^{n} H(Y_i|Y^{i-1}) - H(V_i|Y^{i-1},X_1^i,X_2^i) \\ &= \sum_{i=1}^{n} H(Y_i|Y^{i-1}) - H(V_i|V^{i-1},Y^{i-1},X_1^i,X_2^i) \\ &\overset{(b)}{=} \sum_{i=1}^{n} H(Y_i|Y^{i-1}) - H(V_i|V^{i-1}) \\ &\overset{(c)}{\le} \sum_{i=1}^{n} \log 2 - H(V_i|V^{i-1}) \\ &= n\left(1 - \frac{H(V^n)}{n}\right). \end{aligned} \tag{27} \]

Equality (a) is due to the facts that $y_i$ is a function of $(v_i,x_{1,i},x_{2,i})$ and $v_i$ is a deterministic function of $(y_i,x_{1,i},x_{2,i})$, i.e., $y_i = x_{1,i} \oplus x_{2,i} \oplus v_i$ and $v_i = y_i \oplus x_{1,i} \oplus x_{2,i}$. Equality (b) follows from the fact that $v_i$ is independent of the messages. Inequality (c) is due to the fact that the size of the alphabet $\mathcal{Y}$ is 2. Similarly, $\frac{1}{n} I(X_1^n \to Y^n\|X_2^n) \le 1 - \frac{H(V^n)}{n}$ and $\frac{1}{n} I(X_2^n \to Y^n\|X_1^n) \le 1 - \frac{H(V^n)}{n}$, and equality is achieved with an i.i.d. input distribution Bernoulli($\frac{1}{2}$). Finally, by dividing both sides by $n$ and using the definition of entropy rate $H(V) = \lim_{n\to\infty} \frac{1}{n} H(V^n)$, we conclude the proof.

2) Multiplexer followed by a point-to-point channel: Here we extend the Gilbert-Elliot MAC to the case where the discrete MAC can be decomposed into two components as shown in Fig. 3. The first component is a MAC that can behave as a multiplexer and the second component is a point-to-point channel. The definitions of those components are the following:

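[Figure 3: block diagram. Messages $W_1,\dots,W_M$ enter encoders producing $X_{mi}(W_m,Y^{i-1})$, $m = 1,\dots,M$; the MAC multiplexer outputs $X_{0i}$, which passes through the point-to-point channel to give $Y_i$; the decoder outputs $(\hat{W}_1,\dots,\hat{W}_M)$; the channel output is returned to the encoders through delays.]

Fig. 3. Discrete MAC that can be decomposed into two parts. The first part is a MAC that behaves as a multiplexer and the second part is a point-to-point channel.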

Definition 1: A MAC behaves as a multiplexer if the inputs and the output have common alphabets and for all $m \in \{1,\dots,M\}$ there exists a choice of input symbols for all senders except sender $m$, such that the output is the $m$th input, i.e., $Y = X_m$.


An example of a multiplexer-MAC for the binary case is a MAC whose output is the AND, the OR, or the XOR of the inputs. For a general alphabet of size $q$ those operations could be max, min, or addition mod-$q$. For instance, if the channel is binary with two users and it is addition mod-2, i.e., $y = x_1 \oplus x_2$, then we can ensure that $y = x_1$ by choosing $x_2 = 0$.
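Definition 1 can be checked mechanically for small alphabets. The sketch below is an illustration only; the function name `behaves_as_multiplexer` and its interface are assumptions, not notation from the paper. For each sender $m$ it searches for a fixed choice of the other senders' symbols under which the output equals $x_m$ for every value of $x_m$.

```python
from itertools import product

def behaves_as_multiplexer(channel, alphabet, num_senders):
    """Return True if the deterministic map `channel` satisfies Definition 1."""
    for m in range(num_senders):
        found = False
        for others in product(alphabet, repeat=num_senders - 1):
            def with_sender(xm):
                xs = list(others)
                xs.insert(m, xm)          # place sender m's symbol in position m
                return xs
            if all(channel(with_sender(xm)) == xm for xm in alphabet):
                found = True              # these fixed symbols make Y = X_m
                break
        if not found:
            return False
    return True

binary = (0, 1)
print(behaves_as_multiplexer(lambda xs: xs[0] & xs[1], binary, 2))    # AND
print(behaves_as_multiplexer(lambda xs: xs[0] | xs[1], binary, 2))    # OR
print(behaves_as_multiplexer(lambda xs: xs[0] ^ xs[1], binary, 2))    # XOR
print(behaves_as_multiplexer(lambda xs: sum(xs) % 3, (0, 1, 2), 3))   # addition mod-3
print(behaves_as_multiplexer(lambda xs: 0, binary, 2))                # constant output
```

The first four maps satisfy the definition (for AND the other sender transmits 1, for OR, XOR and modular addition it transmits 0), while the constant map does not.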

Theorem 13: The capacity region of a multiplexer MAC followed by a point-to-point channel with a time-invariant feedback to all encoders, as shown in Fig. 3, is

\[ \sum_{m=1}^{M} R_m \le C, \tag{28} \]

where $C$ is the capacity of the point-to-point channel with the time-invariant feedback $z_{i-1}(y_{i-1})$.

Proof: The achievability is proved simply by time sharing. At each time, only one selected user sends information and the other users send a constant input that ensures that the output of the multiplexer is the input of the selected user.

The converse is based on the fact that the maximum rate that can be transmitted through the point-to-point channel is $C$, and this is an upper bound on the sum-rate of the multiplexer-MAC. If it were not an upper bound for the multiplexer-MAC, we could build a fictitious multiplexer-MAC before the point-to-point channel and thereby achieve a rate higher than its capacity, which would be a contradiction.

3) Discrete additive MAC: An immediate consequence of Theorem 13 is an extension of Alajaji’s result [20] to the additive MAC, which is given in Corollary 11. Corollary 11 states that feedback does not enlarge the capacity region of a discrete additive (mod-$|\mathcal{X}|$) noise MAC.

The proof of the corollary is based on the following observation. If feedback does not increase the capacity of a particular point-to-point channel, then feedback also does not increase the capacity of the multiplexer followed by the same channel. Specifically, feedback does not increase the achievable region of an additive MAC (Fig. 4) and the achievable region is given by

\[ \sum_{m=1}^{M} R_m \le \log q - H(V), \tag{29} \]

where $H(V)$ is the entropy rate of the additive noise.

[Figure 4: two block diagrams. Without feedback, messages $W_1,\dots,W_M$ are mapped to inputs $X_{1n}(W_1),\dots,X_{Mn}(W_M)$; with feedback, to $X_{1n}(W_1,Y^{n-1}),\dots,X_{Mn}(W_M,Y^{n-1})$ via delays. In both cases the noise $V_n$ is added to give $Y_n$, and the decoder outputs $(\hat{W}_1,\dots,\hat{W}_M)$.]

Fig. 4. Additive noise MAC with and without feedback. The random variables $X_{1n},\dots,X_{Mn},Y_n,V_n$, $n \in \{1,2,3,\dots\}$, are from a common alphabet of size $q$, and they denote the input from sender $1,\dots,M$, the output and the noise at time $n$, respectively. The relation between the random variables is given by $y_n = x_{1n} \oplus x_{2n} \oplus \dots \oplus x_{Mn} \oplus v_n$, where $\oplus$ denotes addition mod-$q$. The noise $V_n$, possibly with memory, is independent of the messages $W_1,\dots,W_M$.


4) Multiplexer followed by erasure channel: Consider the case of the multiplexer-erasure MAC which is a

multiplexer followed by an erasure channel, possibly with memory.

Definition 2: A point-to-point channel is called an erasure channel if the output at time $n$ can be written as $Y_n = f(X_n,Z_n)$, and the following properties hold:

1) The alphabet of $Z$ is binary and the alphabet of $Y$ is the same as that of $X$ plus one additional symbol called the erasure.

2) The process $Z_n$ is stationary and ergodic and is independent of the message.

3) If $z_n = 0$, then $y_n = x_n$, and if $z_n = 1$, then the output is an erasure regardless of the input.

For the multiplexer-erasure MAC we have the following corollary.

Corollary 14: The capacity region of the multiplexer-erasure MAC with or without feedback is

\[ \sum_{m=1}^{M} R_m \le (1 - p_e)\log q, \tag{30} \]

where $p_e$ is the marginal probability of having an erasure. Moreover, even if the encoders have non-causal side information, i.e., the encoders know non-causally where the erasures appear, the capacity is still given by (30).

Proof: According to Theorem 13 the capacity region is

\[ \sum_{m=1}^{M} R_m \le C, \tag{31} \]

where $C$ is the capacity of the erasure point-to-point channel. Diggavi and Grossglauser [45, Thm. 3.1] showed that the capacity of a point-to-point erasure channel, with and without feedback, is given by $(1 - p_e)\log q$. Since the probability of having an erasure does not depend on the input to the channel, we deduce that even in the case where the encoder knows the sequence $Z^n$ non-causally, which is better than feedback, the transmitter can transmit only a fraction $1 - p_e$ of the time; hence the capacity cannot exceed $(1 - p_e)\log q$.
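The factor $1 - p_e$ can be seen in a simple simulation of a point-to-point erasure channel with feedback. The sketch below is illustrative and makes simplifying assumptions that are not in the paper: erasures are i.i.d. (a special case of the stationary ergodic process $Z_n$), and the encoder uses plain retransmission, repeating a symbol until the feedback shows it was not erased. The names `send_with_feedback`, `p_erase` and `num_uses` are hypothetical.

```python
import random

def send_with_feedback(symbols, p_erase, num_uses, seed=0):
    """Retransmit each symbol until it gets through; return the delivered symbols."""
    rng = random.Random(seed)
    delivered = []
    next_index = 0
    for _ in range(num_uses):
        if next_index >= len(symbols):
            break
        x = symbols[next_index]
        y = None if rng.random() < p_erase else x   # None stands for the erasure symbol
        if y is not None:                           # feedback: advance only on success
            delivered.append(y)
            next_index += 1
    return delivered

q = 4
num_uses = 50_000
source = [random.Random(2).randrange(q) for _ in range(num_uses)]
out = send_with_feedback(source, p_erase=0.3, num_uses=num_uses)
throughput = len(out) / num_uses   # delivered symbols per channel use
print(throughput)                  # each delivered symbol carries log2(q) bits
```

The measured throughput is close to $1 - p_e$ symbols per channel use, i.e. about $(1 - p_e)\log q$ bits, in line with (30); the same fraction would be an upper limit even if the erasure positions were known in advance, since nothing gets through in an erased slot.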

5) Multiplexer followed by the trapdoor channel: In this example feedback increases the capacity. Based on the fact that the capacity of the trapdoor channel with feedback [28] is the logarithm of the golden ratio, i.e., $\log\frac{\sqrt{5}+1}{2}$, the achievable region of a multiplexer followed by the trapdoor channel is

\[ \sum_{m=1}^{M} R_m \le \log\frac{\sqrt{5}+1}{2}. \tag{32} \]

C. Source-channel coding separation

Cover, El-Gamal and Salehi [46] showed that, in general, the source channel separation does not hold for MACs

even for a memoryless channel without feedback. However, for the case where the MAC is a discrete Multiplexer

followed by a channel we now show that it does hold.

We want to send the sequence of symbols $U_1^n,U_2^n$ over the MAC, so that the receiver can reconstruct the sequence. To do this we can use a joint source-channel coding scheme where we send through the channel the symbols $x_{1,i}(u_1^n,z^{i-1})$ and $x_{2,i}(u_2^n,z^{i-1})$. The receiver looks at its received sequence $Y^n$ and makes an estimate $\hat{U}_1^n,\hat{U}_2^n$. The receiver makes an error if $\hat{U}_1^n \neq U_1^n$ or if $\hat{U}_2^n \neq U_2^n$, i.e., the probability of error $P_e^{(n)}$ is $P_e^{(n)} = \Pr((\hat{U}_1^n,\hat{U}_2^n) \neq (U_1^n,U_2^n))$.


Theorem 15: (Source-channel coding theorem for a Multiplexer followed by a channel.) Let $(U_1,U_2)_{n\ge 1}$ be a finite alphabet, jointly stationary and ergodic pair of processes and let the MAC channel be a multiplexer followed by a point-to-point channel with time invariant feedback and capacity $C = \lim_{N\to\infty}\frac{1}{N}\max_{Q(x^N\|z^{N-1})} I(X^N;Y^N)$ (e.g., a memoryless channel, an indecomposable FSC without feedback, stationary and ergodic Markovian channel). For the source and the MAC described above:

(direct part.) There exists a source-channel code with $P_e^{(n)} \to 0$, if $H(U_1,U_2) < C$, where $H(U_1,U_2)$ is the entropy rate of the sources and $C$ is the capacity of the point-to-point channel with a time-invariant feedback.

(converse part.) If $H(U_1,U_2) > C$, then the probability of error is bounded away from zero (independent of the blocklength).

Proof: The achievability is a straightforward consequence of the Slepian-Wolf result for ergodic and stationary processes [47] and the achievability of the multiplexer followed by a point-to-point channel. First, we encode the sources by using the Slepian-Wolf achievability scheme where we assign every $u_1^n$ to one of $2^{nR_1}$ bins according to a uniform distribution on $\{1,\dots,2^{nR_1}\}$ and independently we assign every $u_2^n$ to one of $2^{nR_2}$ bins according to a uniform distribution on $\{1,\dots,2^{nR_2}\}$. Second, we encode the bins as if they were messages, as shown in Fig. 5.

In the converse, we assume that there exists a sequence of codes with $P_e^{(n)} \to 0$, and we show that it implies that $H(U_1,U_2) \le C$. Fix a given coding scheme and consider the following:

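$$\begin{aligned}
H(U_1^n,U_2^n) &\overset{(a)}{\le} I(U_1^n,U_2^n;\hat U_1^n,\hat U_2^n) + n\epsilon_n \\
&\overset{(b)}{\le} I(U_1^n,U_2^n;Y^n) + n\epsilon_n \\
&= H(Y^n) - H(Y^n|U_1^n,U_2^n) + n\epsilon_n \\
&= \sum_{i=1}^{n}\big[H(Y_i|Y^{i-1}) - H(Y_i|U_1^n,U_2^n,Y^{i-1})\big] + n\epsilon_n \\
&\overset{(c)}{=} \sum_{i=1}^{n}\big[H(Y_i|Y^{i-1}) - H(Y_i|U_1^n,U_2^n,Y^{i-1},X_1^i,X_2^i)\big] + n\epsilon_n \\
&\overset{(d)}{=} \sum_{i=1}^{n}\big[H(Y_i|Y^{i-1}) - H(Y_i|Y^{i-1},X_1^i,X_2^i)\big] + n\epsilon_n \\
&= \sum_{i=1}^{n} I(X_1^i,X_2^i;Y_i|Y^{i-1}) + n\epsilon_n \\
&\overset{(e)}{\le} \sum_{i=1}^{n} I(X_0^i;Y_i|Y^{i-1}) + n\epsilon_n \\
&= I(X_0^n \to Y^n) + n\epsilon_n \\
&\le \max_{Q(x_0^n\|z^{n-1})} I(X_0^n \to Y^n) + n\epsilon_n.
\end{aligned} \tag{33}$$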

Inequality (a) is due to Fano's inequality where $n\epsilon_n = 1 + P_e^{(n)}\, n\log(|\mathcal U_1||\mathcal U_2|)$. Inequality (b) follows from the data processing inequality because $(U_1^n,U_2^n) - Y^n - (\hat U_1^n,\hat U_2^n)$ form a Markov chain. Equality (c) is due to the fact that, for a given code, $X_1^i$ is a deterministic function of $U_1^n, Y^{i-1}$ and, similarly, $X_2^i$ is a deterministic function of $U_2^n, Y^{i-1}$. Equality (d) is due to the Markov chain $(U_1^n,U_2^n) - (X_1^i,X_2^i,Y^{i-1}) - Y_i$. The notation $X_{0,i}$ denotes the output of the multiplexer, which is also the input to the point-to-point channel at time $i$. The inequality in (e) is due to the data processing inequality, which can be invoked thanks to the fact that given $Y^{i-1}$ we have the Markov chain $(X_1^i,X_2^i) - X_0^i - Y_i$.

By dividing both sides of (33) by $n$, taking the limit $n\to\infty$, and recalling that $C = \lim_{n\to\infty}\frac{1}{n}\max_{Q(x^n\|z^{n-1})} I(X^n;Y^n)$ we have

$$H(U_1,U_2) = \lim_{n\to\infty}\frac{1}{n}H(U_1^n,U_2^n) \le C. \tag{34}$$

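[Figure 5 (block diagram): each source sequence $U_l^n$ is mapped to a bin index $W_l(U_l^n) \in \{1,\dots,2^{nR_l}\}$, $l = 1,2$; the encoders produce $X_{1i}(W_1,Y^{i-1})$ and $X_{2i}(W_2,Y^{i-1})$ from the delayed channel output; the multiplexer output $X_{0i}$ enters the point-to-point channel (multiplexer and channel together form the MAC); from the output $Y_i$ the decoder forms $\hat W_1(Y^n)$, $\hat W_2(Y^n)$ and then $\hat U_1^n(\hat W_1,\hat W_2)$, $\hat U_2^n(\hat W_1,\hat W_2)$.]

Fig. 5. Source-channel coding separation in a discrete Multiplexer followed by a point-to-point channel.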
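The random binning step used in the achievability part of Theorem 15 can be sketched as follows. This is only an illustration under stated assumptions: a tiny block length, a hypothetical helper `random_binning`, and Python's pseudo-random generator standing in for the uniform bin assignment. It does not implement the Slepian-Wolf decoder.

```python
import itertools
import math
import random


def random_binning(alphabet, n, rate_bits, seed=0):
    """Assign every length-n sequence to a bin drawn uniformly from {1,...,2^{nR}}.

    Returns the assignment (a dict from sequence to bin index) and the number of bins.
    """
    rng = random.Random(seed)
    num_bins = math.ceil(2 ** (n * rate_bits))
    bins = {seq: rng.randint(1, num_bins)
            for seq in itertools.product(alphabet, repeat=n)}
    return bins, num_bins


# Independent binnings for the two sources (illustrative rates R1 = 0.5, R2 = 0.75).
bins_1, m1 = random_binning((0, 1), n=4, rate_bits=0.5, seed=1)
bins_2, m2 = random_binning((0, 1), n=4, rate_bits=0.75, seed=2)
```

The bin indices `bins_1[u1]` and `bins_2[u2]` then play the role of the messages $W_1$ and $W_2$ that are fed to the channel encoders.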

VI. PROOF OF ACHIEVABILITY (THEOREM 5)

The proof of achievability for the FS-MAC with feedback is similar to the proof of achievability for the point-

to-point FSC given in [27, Sec. V], but there are two main differences:

1) In the case of FSC, only one message is sent, and in the case of FS-MAC, two independent messages are

sent, which requires that we analyze three different types of errors: the first type occurs when only the first

message is decoded with error, the second type occurs when only the second message is decoded with error,

and the third type occurs when both messages are decoded with error.

2) In both cases, we generate the encoding scheme (code-trees) randomly but the distribution that is used is different. In the case of FSC we generate, for each message in $[1,\dots,2^{NR}]$, a code-tree of length $N$ by using the causal conditioning distribution $Q^*(x^N\|z^{N-1}) = \arg\max_{Q(x^N\|z^{N-1})}\min_{s_0} I(X^N \to Y^N|s_0)$, and here we generate for each message in $[1,\dots,2^{NR_l}]$, $l = 1,2$, a code-tree of length $N = Kn$ by concatenating $K$ independent code-trees where each one is created with a causal conditioning distribution $Q(x_l^n\|z_l^{n-1})$, $l = 1,2$.

Encoding scheme: Randomly generate for encoder $l \in \{1,2\}$, $2^{NR_l}$ code-trees of length $N = Kn$ by drawing them with the fixed distribution $Q(x_l^n\|z_l^{n-1})$. In other words, given a feedback sequence $z_1^{N-1}$ the causal conditioning probability that the sequence $x_1^N$ will be mapped to a given message is

$$Q(x_1^N\|z_1^{N-1}) = \prod_{k=1}^{K} Q\big(x_{1,(k-1)n+1}^{kn}\,\big\|\,z_{1,(k-1)n+1}^{kn-1}\big), \tag{35}$$

where $x_{1,(k-1)n+1}^{kn}$ denotes the vector $(x_{1,(k-1)n+1}, x_{1,(k-1)n+2},\dots,x_{1,kn})$. Fig. 6 illustrates the concatenation of trees graphically. In order to shorten the notation we will sometimes use the notation $Q_N$ to denote $Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})$ and we will express the concatenation of pmfs in (35) as $Q_N = \prod_{k=1}^{K} Q_n$.
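The block concatenation in (35) can be sketched in code. The sketch assumes a binary alphabet and a hypothetical per-symbol kernel `example_kernel(x, x_past, z_past)` that plays the role of one factor of $Q(x^n\|z^{n-1})$; neither the kernel nor the function names come from the paper.

```python
from itertools import product


def block_prob(kernel, x_block, z_block):
    """Q(x^n || z^{n-1}) as a product of per-symbol factors within one block."""
    p = 1.0
    for i, x in enumerate(x_block):
        # The i-th input may depend only on earlier inputs and earlier feedback.
        p *= kernel(x, x_block[:i], z_block[:i])
    return p


def concatenated_prob(kernel, x_seq, z_seq, n):
    """Product over K = N/n blocks, each block restarting its own history."""
    assert len(x_seq) % n == 0, "N must be a multiple of n"
    p = 1.0
    for start in range(0, len(x_seq), n):
        p *= block_prob(kernel, x_seq[start:start + n], z_seq[start:start + n])
    return p


def example_kernel(x, x_past, z_past):
    """Illustrative kernel: uniform first symbol, then repeat last feedback w.p. 0.9."""
    if not z_past:
        return 0.5
    return 0.9 if x == z_past[-1] else 0.1
```

For any fixed feedback sequence the concatenated probabilities sum to one over all input sequences, so the product is itself a valid causal conditioning pmf of length $N = Kn$. Note that the first symbol of each block ignores feedback from the previous block, which mirrors the independence of the concatenated code-trees.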

[Figure 6: three panels titled "codeword (case of no feedback)", "code-tree (used in [27])" and "concatenated code-tree (used here)". Nodes are labeled with the inputs $x_i$ at times $i = 1,\dots,4$, and branches with the feedback value $z_{i-1} = 0$ or $z_{i-1} = 1$.]

Fig. 6. Illustration of coding scheme for setting without feedback, setting with feedback as used for point-to-point channel [27] and a code-tree

that was created by concatenating smaller code-trees. In the case of no feedback each message is mapped to a codeword, and in the case of

feedback each message is mapped to a code-tree. The third scheme is a code-tree of depth 4 created by concatenating two trees of depth 2.

Decoding Errors: For each code in the ensemble, the decoder uses maximum likelihood decoding and we want to upper bound the expected value $E[P_e]$ for this ensemble. Let $P_{e1}, P_{e2}, P_{e3}$ be defined as follows.

$P_{e1}$ (type 1 error): probability that the decoded pair $(\hat m_1,\hat m_2)$ satisfies $\hat m_1 \ne m_1$, $\hat m_2 = m_2$,

$P_{e2}$ (type 2 error): probability that the decoded pair $(\hat m_1,\hat m_2)$ satisfies $\hat m_1 = m_1$, $\hat m_2 \ne m_2$,

$P_{e3}$ (type 3 error): probability that the decoded pair $(\hat m_1,\hat m_2)$ satisfies $\hat m_1 \ne m_1$, $\hat m_2 \ne m_2$.

Because the error events are disjoint we have

$$P_e = P_{e1} + P_{e2} + P_{e3}. \tag{36}$$

In the next sequence of theorems and lemmas, we upper bound the expected value of each error type and show that if $(R_1,R_2)$ satisfies the three inequalities that define $\underline{\mathcal R}_n$ then the corresponding $E[P_{ei}]$, $i = 1,2,3$, goes to zero and hence $E[P_e]$ goes to zero.

Theorem 16: Suppose that an arbitrary message pair $(m_1,m_2)$, $1 \le m_1 \le M_1$, $1 \le m_2 \le M_2$, enters the encoder with feedback and that ML decoding is employed. Let $E[P_{ei}|m_1,m_2]$, $i = 1,2,3$, denote the probability of a type-$i$ decoding error averaged over the ensemble of codes when the messages $m_1,m_2$ were sent. Then for any choice of $\rho$, $0 < \rho \le 1$,

$$E[P_{e1}|m_1,m_2] \le (M_1-1)^{\rho}\sum_{y^N,x_2^N} Q(x_2^N\|z_2^{N-1})\Big[\sum_{x_1^N} Q(x_1^N\|z_1^{N-1})\,P(y^N\|x_1^N,x_2^N)^{\frac{1}{1+\rho}}\Big]^{1+\rho}, \tag{37}$$

$$E[P_{e2}|m_1,m_2] \le (M_2-1)^{\rho}\sum_{y^N,x_1^N} Q(x_1^N\|z_1^{N-1})\Big[\sum_{x_2^N} Q(x_2^N\|z_2^{N-1})\,P(y^N\|x_1^N,x_2^N)^{\frac{1}{1+\rho}}\Big]^{1+\rho}, \tag{38}$$

$$E[P_{e3}|m_1,m_2] \le \big((M_1-1)(M_2-1)\big)^{\rho}\sum_{y^N}\Big[\sum_{x_1^N,x_2^N} Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})\,P(y^N\|x_1^N,x_2^N)^{\frac{1}{1+\rho}}\Big]^{1+\rho}. \tag{39}$$

The proof is given in Appendix VI and is similar to [27, Theorem 9], except that here we take into account the fact that there are two encoders rather than one.

Let $P_{ei}(s_0)$, $i = 1,2,3$, be the probability of error of type $i$ given that the initial state of the channel is $s_0$. Also let $R_1 = \frac{1}{N}\log M_1$ and $R_2 = \frac{1}{N}\log M_2$ be the rates of the code and $R_3$ be the sum rate, i.e. $R_3 = R_1 + R_2$. The following theorem establishes exponential bounds on $E[P_{ei}(s_0)]$.

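Theorem 17: The average probability of error over the ensemble, for all initial states $s_0$, and all $\rho$, $0 \le \rho \le 1$, is bounded as

$$E[P_{ei}(s_0)|m_1,m_2] \le |\mathcal S|\,2^{-N[-\rho R_i + F_{N,i}(\rho,Q_N)]}, \quad i = 1,2,3, \tag{40}$$

where

$$F_{N,i}(\rho,Q_N) = -\frac{\rho\log|\mathcal S|}{N} + \Big[\min_{s_0} E_{N,i}(\rho,Q_N,s_0)\Big], \quad i = 1,2,3, \tag{41}$$

$$E_{N,1}(\rho,Q_N,s_0) = -\frac{1}{N}\log\sum_{y^N,x_2^N} Q(x_2^N\|z_2^{N-1})\Big[\sum_{x_1^N} Q(x_1^N\|z_1^{N-1})\,P(y^N\|x_1^N,x_2^N,s_0)^{\frac{1}{1+\rho}}\Big]^{1+\rho}, \tag{42}$$

$$E_{N,2}(\rho,Q_N,s_0) = -\frac{1}{N}\log\sum_{y^N,x_1^N} Q(x_1^N\|z_1^{N-1})\Big[\sum_{x_2^N} Q(x_2^N\|z_2^{N-1})\,P(y^N\|x_1^N,x_2^N,s_0)^{\frac{1}{1+\rho}}\Big]^{1+\rho}, \tag{43}$$

$$E_{N,3}(\rho,Q_N,s_0) = -\frac{1}{N}\log\sum_{y^N}\Big[\sum_{x_1^N,x_2^N} Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})\,P(y^N\|x_1^N,x_2^N,s_0)^{\frac{1}{1+\rho}}\Big]^{1+\rho}. \tag{44}$$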

The proof is based on algebraic manipulation of the bounds given in (37)-(39). It is similar to the proof of Theorem 9 in [27] and therefore omitted. There are two differences between the proofs (and both are straightforward to accommodate). First, here the input distribution $Q_N = Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})$ is arbitrary while in [27] we chose the one that maximizes the error exponent. Second, here we bound the averaged error over the ensemble and in [27] we have an additional step where we claim that there exists a code that has an error that is bounded by the expression in (40). Because of this difference the bound on the probability of error in [27] has an additional factor of 4.

The following lemma presents a few properties of the functions $E_{N,i}(\rho,Q_N,s_0)$, $i = 1,2,3$, such as positivity of the function and its derivative, convexity with respect to $\rho$, and an upper bound on the derivative which is achieved for $\rho = 0$.


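Lemma 18: The term $E_{N,i}(\rho,Q_N,s_0)$ has the following properties:

$$E_{N,i}(\rho,Q_N,s_0) \ge 0; \quad \rho \ge 0,\ i = 1,2,3, \tag{45}$$

$$\begin{aligned}
\frac{1}{N} I(X_1^N \to Y^N\|X_2^N,s_0) &\ge \frac{\partial E_{N,1}(\rho,Q_N,s_0)}{\partial\rho} > 0; \quad \rho \ge 0, \\
\frac{1}{N} I(X_2^N \to Y^N\|X_1^N,s_0) &\ge \frac{\partial E_{N,2}(\rho,Q_N,s_0)}{\partial\rho} > 0; \quad \rho \ge 0, \\
\frac{1}{N} I(X_1^N,X_2^N \to Y^N|s_0) &\ge \frac{\partial E_{N,3}(\rho,Q_N,s_0)}{\partial\rho} > 0; \quad \rho \ge 0,
\end{aligned} \tag{46}$$

$$\frac{\partial^2 E_{N,i}(\rho,Q_N,s_0)}{\partial\rho^2} > 0; \quad \rho \ge 0,\ i = 1,2,3. \tag{47}$$

Furthermore, equality holds in (45) when $\rho = 0$, and equality holds on the left sides of eq. (46) when $\rho = 0$ for $i = 1,2,3$.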
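The first two properties can be checked numerically in the simplest special case. The sketch below is an assumption-laden illustration, not the two-user quantity of (42)-(44): it uses the single-user, memoryless, no-feedback Gallager function $E_0(\rho,Q) = -\log\sum_y\big[\sum_x Q(x)P(y|x)^{1/(1+\rho)}\big]^{1+\rho}$ for a binary symmetric channel with crossover 0.1, and all names are hypothetical.

```python
import math


def gallager_e0(rho, q, channel):
    """E_0(rho, Q) in bits for input pmf q and transition matrix channel[x][y]."""
    total = 0.0
    for y in range(len(channel[0])):
        inner = sum(q[x] * channel[x][y] ** (1.0 / (1.0 + rho))
                    for x in range(len(q)))
        total += inner ** (1.0 + rho)
    return -math.log2(total)


def mutual_information(q, channel):
    """I(X;Y) in bits for input pmf q and transition matrix channel[x][y]."""
    num_outputs = len(channel[0])
    p_y = [sum(q[x] * channel[x][y] for x in range(len(q)))
           for y in range(num_outputs)]
    info = 0.0
    for x in range(len(q)):
        for y in range(num_outputs):
            joint = q[x] * channel[x][y]
            if joint > 0:
                info += joint * math.log2(channel[x][y] / p_y[y])
    return info


# Illustrative channel and input: BSC(0.1) with a uniform input.
BSC = [[0.9, 0.1], [0.1, 0.9]]
UNIFORM = [0.5, 0.5]
```

In this special case the function vanishes at $\rho = 0$, is positive for $\rho > 0$, and its slope at $\rho = 0$ matches the mutual information, which is the single-user analogue of (45) and of the equality case in (46).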

The proof of the lemma is the same as the proof of [21, eq. (2.20)], [17, Theorem 5.6.3]. In [21] the arguments $Q_N$ of $E_{N,1}(\rho,Q_N,s_0)$ are regular conditioning, i.e., $Q(x_1^N)Q(x_2^N)$, and the channel is given by $P(y^N|x_1^N,x_2^N,s_0)$, hence the derivative of $E_{N,1}(\rho,Q_N,s_0)$ with respect to $\rho$ is upper-bounded by $I(X_1^N;Y^N|X_2^N,s_0)$. Here we replace $Q(x_1^N)Q(x_2^N)$ with $Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})$ and $P(y^N|x_1^N,x_2^N,s_0)$ with $P(y^N\|x_1^N,x_2^N,s_0)$ and, according to Lemma 3, the upper-bound becomes $I(X_1^N \to Y^N\|X_2^N,s_0)$. The next lemma establishes the sup-additivity of $F_{N,i}(\rho,Q_N)$, $i = 1,2,3$.

Lemma 19: Sup-additivity of $F_{N,i}(\rho,Q_N)$. For any finite-state channel, $F_{N,i}(\rho,Q_N)$, as given by eq. (41), satisfies

$$F_{n+l,i}(\rho,Q_{n+l}) \ge \frac{n}{n+l}F_{n,i}(\rho,Q_n) + \frac{l}{n+l}F_{l,i}(\rho,Q_l), \quad i = 1,2,3. \tag{48}$$

The proof steps are identical to the proof of the sub-additivity for the point-to-point channel [27, Lemma 11]. Invoking this lemma on the pmf $Q_N = \prod_{k=1}^{K} Q_n$ where $N = nK$ we get

$$F_{N,i}(\rho,Q_N) \ge K\frac{n}{N}F_{n,i}(\rho,Q_n) = F_{n,i}(\rho,Q_n). \tag{49}$$

Let us define

$$C_{N,1}(Q_N) = \frac{1}{N}\min_{s_0} I(X_1^N \to Y^N\|X_2^N,s_0), \tag{50}$$

$$C_{N,2}(Q_N) = \frac{1}{N}\min_{s_0} I(X_2^N \to Y^N\|X_1^N,s_0), \tag{51}$$

$$C_{N,3}(Q_N) = \frac{1}{N}\min_{s_0} I(X_1^N,X_2^N \to Y^N|s_0), \tag{52}$$

where the joint distribution of $X_1^N,X_2^N,Y^N$ conditioned on $s_0$ is given by $P(x_1^N,x_2^N,y^N|s_0) = Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})P(y^N\|x_1^N,x_2^N,s_0)$.


Theorem 5 (inner bound) given in Sec. IV states that for every $n$ and $0 \le R_i < C_{n,i}(Q_n) - \frac{\log|\mathcal S|}{n}$, $i = 1,2,3$ (recall, $R_3 \triangleq R_1 + R_2$), and every $\eta > 0$ there exists an $N$ and an $(N,\lceil 2^{NR_1}\rceil,\lceil 2^{NR_2}\rceil)$ code with a probability of error $P_e(s_0)$ (averaged over the messages) that is less than $\eta$ for all initial states $s_0$.

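Proof of Theorem 5: The proof consists of the following three steps:

• Showing that for a fixed $n$, if $R_i < C_{n,i}(Q_n) - \frac{\log|\mathcal S|}{n}$, $i = 1,2,3$, then there exists $\rho^*$ such that

$$F_{n,i}(\rho^*,Q_n) - \rho^* R_i > 0, \quad i = 1,2,3. \tag{53}$$

• We choose $\epsilon < \min_{i\in\{1,2,3\}}\big[F_{n,i}(\rho^*,Q_n) - \rho^* R_i\big]$ and show that for sufficiently large $N$

$$E[P_{ei}(s_0)|m_1,m_2] \le 2^{-N([F_{n,i}(\rho^*,Q_n)-\rho^* R_i]-\epsilon)}, \quad \forall s_0. \tag{54}$$

• From the last step we deduce the existence of an $(N,\lceil 2^{NR_1}\rceil,\lceil 2^{NR_2}\rceil)$ code s.t.

$$P_e(s_0) < \eta, \quad \forall s_0. \tag{55}$$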

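First step: for any pair $(R_1,R_2)$, we can rewrite eq. (40) for $i = 1,2,3$ as

$$E[P_{ei}(s_0)|m_1,m_2] \le 2^{-N\left(F_{N,i}(\rho,Q_N)-\rho R_i-\frac{\log|\mathcal S|}{N}\right)}. \tag{56}$$

By using (49), which states that $F_{N,i}(\rho,Q_N) \ge F_{n,i}(\rho,Q_n)$, we get

$$E[P_{ei}(s_0)|m_1,m_2] \le 2^{-N\left(F_{n,i}(\rho,Q_n)-\rho R_i-\frac{\log|\mathcal S|}{N}\right)}. \tag{57}$$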

Note that $F_{n,i}(\rho,Q_n)$ and therefore $F_{n,i}(\rho,Q_n) - \rho R_i$ is continuous in $\rho \in [0,1]$, so there exists a maximizing $\rho$. Let us show that if $R_1 < C_{n,1}(Q_n) - \frac{\log|\mathcal S|}{n}$, then $\max_{0\le\rho\le 1}[F_{n,1}(\rho,Q_n) - \rho R_1] > 0$ (the cases $i = 2,3$ are identical to $i = 1$). Let us define $\delta \triangleq C_{n,1} - R_1$. From Lemma 18, we have that $E_{n,1}(\rho,Q_n,s_0)$ is zero when $\rho = 0$, is a continuous function of $\rho$, and its derivative at zero with respect to $\rho$ is greater than or equal to $C_{n,1}$, which satisfies $C_{n,1} \ge R_1 + \frac{\log|\mathcal S|}{n} + \frac{\delta}{2}$. Thus, for each state $s_0$ there is a range $\rho > 0$ such that

$$E_{n,1}(\rho,Q_n,s_0) - \rho\Big(R_1 + \frac{\log|\mathcal S|}{n}\Big) > 0. \tag{58}$$

Moreover, because the number of states is finite, there exists a $\rho^* > 0$ for which the inequality (58) is true for all $s_0$. Thus, from the definition of $F_{n,1}(\rho^*,Q_n)$ given in (41) and from (58),

$$F_{n,1}(\rho^*,Q_n) = -\rho^*\frac{\log|\mathcal S|}{n} + \min_{s_0} E_{n,1}(\rho^*,Q_n,s_0) > \rho^* R_1, \quad \forall s_0. \tag{59}$$

Second step: We choose a positive number $\epsilon$ such that $\epsilon < \min_{i\in\{1,2,3\}}\big[F_{n,i}(\rho^*,Q_n) - \rho^* R_i\big]$. It follows from (57) that for every $N$ that satisfies $N > \frac{\log|\mathcal S|}{\epsilon}$,

$$E[P_{ei}(s_0)|m_1,m_2] \le 2^{-N(F_{n,i}(\rho^*,Q_n)-\rho^* R_i-\epsilon)}, \tag{60}$$

and according to the first step of the proof the exponent $F_{n,i}(\rho^*,Q_n) - \rho^* R_i - \epsilon$ is strictly positive.


Third step: According to the previous step, for all $\frac{\eta}{3(|\mathcal S|+1)} > 0$ there exists an $N$ such that $E[P_{ei}(s_0)|m_1,m_2] \le \frac{\eta}{3(|\mathcal S|+1)}$ for all $i \in \{1,2,3\}$, all $s_0 \in \mathcal S$ and all messages. Since $P_e(s_0) = \sum_{i=1}^{3} P_{ei}(s_0)$, then $E[P_e(s_0)|m_1,m_2] \le \frac{\eta}{|\mathcal S|+1}$; furthermore $E[P_e(s_0)] \le \frac{\eta}{|\mathcal S|+1}$ for all $s_0 \in \mathcal S$. By using the Markov inequality, we have

$$\Pr(P_e(s_0) \ge \eta) \le \frac{1}{|\mathcal S|+1}, \tag{61}$$

and by using the union bound we have

$$\Pr(P_e(s_0) \ge \eta, \text{ for some } s_0 \in \mathcal S) \le \sum_{s_0\in\mathcal S}\Pr(P_e(s_0) \ge \eta) \le \frac{|\mathcal S|}{|\mathcal S|+1} < 1. \tag{62}$$

Because the probability over the ensemble of codes of having a code with error probability (averaged over all messages) that is less than $\eta$ for all initial states is positive, there must exist at least one code that has an error probability (averaged over all messages) that is less than $\eta$ for all initial states.
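The arithmetic of the third step (Markov inequality followed by a union bound over initial states) can be traced with exact fractions. The function names and the sample values are assumptions for illustration.

```python
from fractions import Fraction


def per_type_target(eta, num_states):
    """Target eta / (3(|S|+1)) for each of the three error types."""
    return eta / (3 * (num_states + 1))


def good_code_probability_lower_bound(eta, num_states, expected_pe):
    """Lower bound on Pr(P_e(s0) < eta for all s0) over the code ensemble."""
    # Markov inequality: Pr(P_e(s0) >= eta) <= E[P_e(s0)] / eta for each state.
    markov = expected_pe / eta
    # Union bound over the |S| possible initial states.
    union = num_states * markov
    return 1 - union
```

With three states and $\eta = 1/10$, summing the three per-type targets gives $E[P_e(s_0)] \le \eta/4$, and the bound on the probability of drawing a good code is $1/4 > 0$, which is all the existence argument needs.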

VII. PROOF OF THE OUTER BOUND (THEOREM 6)

In this section we prove Theorem 6, which states that for any FS-MAC there exists a distribution $Q(x_1^n\|z_1^{n-1})Q(x_2^n\|z_2^{n-1})$ such that the following inequalities hold:

$$\begin{aligned}
R_1 &\le \frac{1}{n} I(X_1^n \to Y^n\|X_2^n) + \epsilon_n, \\
R_2 &\le \frac{1}{n} I(X_2^n \to Y^n\|X_1^n) + \epsilon_n, \\
R_1 + R_2 &\le \frac{1}{n} I((X_1,X_2)^n \to Y^n) + \epsilon_n,
\end{aligned} \tag{63}$$

where $\epsilon_n$ goes to zero as $n$ goes to infinity.

Proof of Theorem 6: Let $W_1$ and $W_2$ be two independent messages, each chosen according to a uniform distribution $\Pr(W_l = w_l) = 2^{-nR_l}$, $l = 1,2$. The input to the channel from encoder $l$ at time $i$ is $x_{li}$, and is a function of the message $W_l$ and the arbitrary deterministic feedback output $z_l^{i-1}(y^{i-1})$.

The following sequence of equalities and inequalities proves that if a code that achieves rate $R_1$ exists then the first inequality holds, i.e., $R_1 \le \frac{1}{n} I(X_1^n \to Y^n\|X_2^n) + \epsilon_n$:

$$\begin{aligned}
nR_1 &\overset{(a)}{=} H(W_1) \\
&\overset{(b)}{=} H(W_1|W_2) \\
&= I(W_1;Y^n|W_2) + H(W_1|Y^n,W_2) \\
&\overset{(c)}{\le} I(Y^n;W_1|W_2) + 1 + P_e^{(n)} nR \\
&= H(Y^n|W_2) - H(Y^n|W_1,W_2) + 1 + P_e^{(n)} nR \\
&\overset{(d)}{=} \sum_{i=1}^{n} H(Y_i|Y^{i-1},W_2) - \sum_{i=1}^{n} H(Y_i|W_1,W_2,Y^{i-1}) + 1 + P_e^{(n)} nR \\
&\overset{(e)}{=} \sum_{i=1}^{n} H(Y_i|Y^{i-1},W_2,X_2^i) - \sum_{i=1}^{n} H(Y_i|W_1,W_2,Y^{i-1},X_1^i,X_2^i) + 1 + P_e^{(n)} nR \\
&\overset{(f)}{\le} \sum_{i=1}^{n} H(Y_i|Y^{i-1},X_2^i) - \sum_{i=1}^{n} H(Y_i|Y^{i-1},X_1^i,X_2^i) + 1 + P_e^{(n)} nR \\
&= \sum_{i=1}^{n} I(Y_i;X_1^i|Y^{i-1},X_2^i) + 1 + P_e^{(n)} nR \\
&\le I(X_1^n \to Y^n\|X_2^n) + 1 + P_e^{(n)} nR,
\end{aligned} \tag{64}$$

where

(a) and (b) follow from the fact that the messages $W_1$ and $W_2$ are independent and chosen according to a uniform distribution,

(c) follows from Fano's inequality,

(d) follows from the chain rule,

(e) follows from the fact that $x_{1i}$ is a deterministic function given the message $W_1$ and the feedback $z_1^{i-1}$, where the feedback $z_1^{i-1}$ is a deterministic function of the output $y^{i-1}$,

(f) follows from the fact that the random variables $W_1,W_2,X_1^i,X_2^i,Y^i$ form the Markov chain $(W_1,W_2) - (X_1^i,X_2^i,Y^{i-1}) - Y_i$.

Dividing (64) by $n$, we conclude that if there exists a code for which the error probability of decoding the messages $W_1,W_2$ is $P_e^{(n)}$ then the distribution $Q(x_1^n\|z_1^{n-1})Q(x_2^n\|z_2^{n-1})$ induced by the code satisfies the first inequality of the outer bound theorem where $\epsilon_n = \frac{1}{n} + P_e^{(n)} R$. The proofs of the other two inequalities in (63) follow by a completely analogous sequence of steps as in (64): the proof of the second inequality of the outer bound starts with the equalities $nR_2 = H(W_2) = H(W_2|W_1)$ and the third with $n(R_1 + R_2) = H(W_1,W_2)$.
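To see how the slack term behaves, the following small sketch evaluates $\epsilon_n = \frac{1}{n} + P_e^{(n)} R$ for a hypothetical sequence of codes. The numbers are illustrative assumptions, not results from the paper.

```python
def fano_epsilon(n: int, p_e: float, rate: float) -> float:
    """Slack term eps_n = 1/n + P_e^(n) * R that appears after dividing (64) by n."""
    if n <= 0:
        raise ValueError("block length must be positive")
    return 1.0 / n + p_e * rate


# Hypothetical code sequence whose error probability decays like 1/n at rate R = 2.
slack = [fano_epsilon(n, 1.0 / n, 2.0) for n in (10, 100, 1000)]
```

As long as $P_e^{(n)} \to 0$, both terms vanish and the outer bound inequalities in (63) become tight in the limit.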

Corollary 20: The outer bound given in Theorem 6 implies that $\liminf \overline{\mathcal R}_n$ is an outer bound for the achievable region.

Proof: Recall the definition of $\overline{\mathcal R}_n$ in eq. (20). Let $(R_1,R_2)$ be an achievable rate pair. We will create a sequence of rate pairs $(R_{1,n},R_{2,n}) \in \overline{\mathcal R}_n$ that converges to $(R_1,R_2)$ and therefore, by the definition of liminf of a sequence of sets (given in Appendix IV), $(R_1,R_2) \in \liminf \overline{\mathcal R}_n$.

If $(R_1,R_2) \in \overline{\mathcal R}_n$ then we choose $(R_{1,n},R_{2,n}) = (R_1,R_2)$. Otherwise we choose the closest point in $\overline{\mathcal R}_n$ to $(R_1,R_2)$. Because of inequality (63) the distance $\|(R_{1,n},R_{2,n}) - (R_1,R_2)\| \le 2\epsilon_n$ and, therefore, the sequence $(R_{1,n},R_{2,n})$ converges to $(R_1,R_2)$.

VIII. CAPACITY REGION OF THE FS-MAC WITHOUT FEEDBACK

The inner and outer bounds given in Theorems 5 and 6 specialize to the case where there is no feedback, i.e., z1,z2

are null. Hence, we can use it in order to extend Gallager’s results [17, Ch. 4] on the capacity of indecomposable

FSCs to indecomposable FS-MACs. An indecomposable FS-MAC (FSC) is a FS-MAC (FSC) for which the effect

of the initial state vanishes with time. More precisely:

Definition 3: A FS-MAC (FSC) is indecomposable if, for every $\epsilon > 0$, there exists an $n_0$ such that for $n \ge n_0$, $|P(s_n|x_1^n,x_2^n,s_0) - P(s_n|x_1^n,x_2^n,s_0')| \le \epsilon$ for all $s_n, x_1^n, x_2^n, s_0$ and $s_0'$.

Since there is no feedback, according to Lemma 4 directed information becomes mutual information and causal conditioning becomes regular conditioning in all the expressions in the inner bound (Theorem 5) and outer bound (Theorem 6).

The proof of the capacity region of FS-MAC is based on the following two lemmas. The first lemma is used for

showing that the difference between the lower bound and the upper bound goes to zero as n → ∞ and the second

lemma, which is proved in Appendix V, is used for showing that the limits exist.

Lemma 21: Let $\{Q(x_1^n)Q(x_2^n)\}_{n\ge 1}$ be an arbitrary sequence of input distributions. If the channel is an indecomposable FS-MAC then the following holds for all $s_0', s_0''$:

$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\big|I(X_1^n;Y^n|X_2^n,s_0') - I(X_1^n;Y^n|X_2^n,s_0'')\big| &= 0, \\
\lim_{n\to\infty}\frac{1}{n}\big|I(X_2^n;Y^n|X_1^n,s_0') - I(X_2^n;Y^n|X_1^n,s_0'')\big| &= 0, \\
\lim_{n\to\infty}\frac{1}{n}\big|I(X_1^n,X_2^n;Y^n|s_0') - I(X_1^n,X_2^n;Y^n|s_0'')\big| &= 0.
\end{aligned} \tag{65}$$

Proof: The proof is identical to the proof of Theorem 4.6.4 in [17].

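The following lemma, which is proved in Appendix V, establishes the sup-additivity of $\{\underline{\mathcal R}_n\}$.

Lemma 22: (sup-additivity of $\underline{\mathcal R}_n$.) For any FS-MAC, the sequence $\{\underline{\mathcal R}_n\}$, which is defined in (19), is sup-additive, i.e.,

$$(n+l)\underline{\mathcal R}_{n+l} \supseteq n\underline{\mathcal R}_n + l\underline{\mathcal R}_l, \tag{66}$$

and therefore $\lim_{n\to\infty}\underline{\mathcal R}_n$ exists. Moreover, for an indecomposable FS-MAC without feedback $\lim_{n\to\infty}\underline{\mathcal R}_n = \lim_{n\to\infty}\overline{\mathcal R}_n$, where $\overline{\mathcal R}_n$ is defined in (20).

Proof of Theorem 7: Theorem 5 implies that $\lim_{n\to\infty}\underline{\mathcal R}_n$ is achievable, and Corollary 20 implies that $\liminf_{n\to\infty}\overline{\mathcal R}_n$ is an outer bound. Finally, since according to Lemma 22 the two limits are equal to $\lim_{n\to\infty}\underline{\mathcal R}_n$, the capacity region is given by the last limit.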


IX. SUFFICIENT CONDITIONS FOR THE INNER AND OUTER BOUNDS TO COINCIDE FOR GENERAL FEEDBACK

A. Stationary Finite state Markovian MAC with feedback

A stationary finite state Markovian MAC satisfies

P(yi,si|x1i,x2i,si−1) = P(si|si−1)P(yi|si−1,x1i,x2i),

(67)

where the initial state distribution is the stationary distribution P(s0). In words, the states are not affected by the

channel inputs.

For the stationary Markovian MAC, the sequence $\{\mathcal R_n\}$ is sup-additive. It follows from the fact that if we concatenate two input distributions $Q_{n+k} = Q_n Q_k$, then $I(X_1^{n+k} \to Y^{n+k}\|X_2^{n+k}) = I(X_1^n \to Y^n\|X_2^n) + I(X_{1,n+1}^{n+k} \to Y_{n+1}^{n+k}\|X_{2,n+1}^{n+k})$, hence $(n+k)\mathcal R_{n+k} \supseteq n\mathcal R_n + k\mathcal R_k$. According to Lemma 23, the limit exists and is equal to

$$\lim_{n\to\infty}\mathcal R_n = \mathrm{cl}\Big(\bigcup_{n\ge 1}\mathcal R_n\Big). \tag{68}$$

Next, we prove Theorem 8 that states that for a Markovian FS-MAC with a stationary ergodic state process, the inner bound (Theorem 5) and the outer bound (Theorem 6) coincide and therefore the capacity region is given by $\lim_{n\to\infty}\mathcal R_n$.

Proof of Theorem 8: Recall that the inner bound is given in Theorem 5 as $\underline{\mathcal R}_N$ and the outer bound is given in Theorem 6 and in Corollary 20 as $\liminf \overline{\mathcal R}_N$. Next we show that the distance between $\underline{\mathcal R}_N$ and $\overline{\mathcal R}_N$ goes to zero, which implies by Lemma 25 that both limits are equal and therefore the capacity region can be written as $\lim \mathcal R_N$.

Let us consider a specific input distribution denoted by $Q(x_1^N\|z^{N-1})Q(x_2^N\|z^{N-1})$ corresponding to the region of the outer bound. Let us now consider an input distribution $Q$ for $n+N$ inputs corresponding to the inner bound, such that it is arbitrary for the first $n$ inputs and then it is $Q(x_1^N\|z^{N-1})Q(x_2^N\|z^{N-1})$.

Now let us show that the term of the inner bound, i.e. $I_Q(X_1^{N+n} \to Y^{N+n}\|X_2^{N+n},s_0)$, and the term of the outer bound, $I_Q(X_1^N \to Y^N\|X_2^N)$, are arbitrarily close to each other.

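$$\begin{aligned}
I_Q(X_1^{N+n} \to Y^{N+n}\|X_2^{N+n},s_0)
&\overset{(a)}{\ge} I_Q(X_1^{N+n} \to Y^{N+n}\|X_2^{N+n},S_n,s_0) - \log|\mathcal S| \\
&\overset{(b)}{\ge} \sum_{i=n+1}^{N+n}\big[H_Q(Y_i|Y^{i-1},X_2^i,S_n,s_0) - H_Q(Y_i|Y^{i-1},X_2^i,X_1^i,S_n,s_0)\big] - \log|\mathcal S| \\
&\overset{(c)}{\ge} \sum_{i=n+1}^{N+n}\big[H_Q(Y_i|Y_{n+1}^{i-1},X_{2,n+1}^i,S_n,s_0) - H_Q(Y_i|Y_{n+1}^{i-1},X_{2,n+1}^i,X_{1,n+1}^i,S_n,s_0)\big] - \log|\mathcal S| \\
&= I_Q(X_{1,n+1}^{N+n} \to Y_{n+1}^{N+n}\|X_{2,n+1}^{N+n},S_n,s_0) - H(S_n) \\
&\overset{(d)}{\ge} I_Q(X_{1,n+1}^{N+n} \to Y_{n+1}^{N+n}\|X_{2,n+1}^{N+n},S_n)(1-\delta) - \log|\mathcal S| \\
&\ge I_Q(X_{1,n+1}^{N+n} \to Y_{n+1}^{N+n}\|X_{2,n+1}^{N+n},S_n) - \delta(N+n)\log|\mathcal Y| - \log|\mathcal S| \\
&\overset{(e)}{\ge} I_Q(X_{1,n+1}^{N+n} \to Y_{n+1}^{N+n}\|X_{2,n+1}^{N+n}) - \delta(N+n)\log|\mathcal Y| - 2\log|\mathcal S| \\
&\overset{(f)}{\ge} I_Q(X_1^N \to Y^N\|X_2^N) - \delta(N+n)\log|\mathcal Y| - 2\log|\mathcal S|,
\end{aligned} \tag{69}$$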

where

(a) follows from Lemma 2 that states that conditioning on $S_n$ can differ at most by $\log|\mathcal S|$,

(b) follows from omitting the first $n$ elements in the sum that defines directed information,

(c) follows from the fact that conditioning decreases entropy,

(d) follows from the fact that the Markov chain is ergodic, hence for any $\delta > 0$, there exists an $n$ such that $|P(s_n|s_0) - P(s_n)| \le \delta$ for any $s_0 \in \mathcal S$ and $s_n \in \mathcal S$, where $P(s_n)$ is the stationary distribution of $s_n$,

(e) follows from Lemma 2 that states that conditioning on $S_n$ can differ by at most $\log|\mathcal S|$,

(f) follows from the stationarity of the channel.
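The ergodicity fact used in step (d) can be illustrated on a two-state chain. The sketch below assumes a hypothetical transition matrix with $P(1|0) = 0.1$ and $P(0|1) = 0.2$, and searches for the smallest $n$ with $|P(s_n|s_0) - P(s_n)| \le \delta$ for all $s_0, s_n$; the helper names are not from the paper.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def two_state_stationary(p01, p10):
    """Stationary distribution of the chain [[1-p01, p01], [p10, 1-p10]]."""
    return [p10 / (p01 + p10), p01 / (p01 + p10)]


def mixing_steps(transition, stationary, delta, max_steps=10000):
    """Smallest n such that every entry of P^n is within delta of the stationary pmf."""
    power = transition
    for n in range(1, max_steps + 1):
        gap = max(abs(power[s0][s] - stationary[s])
                  for s0 in range(len(transition))
                  for s in range(len(transition)))
        if gap <= delta:
            return n
        power = mat_mul(power, transition)
    raise RuntimeError("chain did not mix within max_steps")


P = [[0.9, 0.1], [0.2, 0.8]]
PI = two_state_stationary(0.1, 0.2)
```

For this chain the gap decays geometrically with ratio $1 - 0.1 - 0.2 = 0.7$, so a prefix of moderate length $n$ suffices to make the state at time $n$ nearly independent of $s_0$.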

Dividing both sides by $N + n$ we get that for any $s_0$,

$$\frac{1}{N+n} I_Q(X_1^{N+n} \to Y^{N+n}\|X_2^{N+n},s_0) - \frac{1}{N+n} I_Q(X_1^N \to Y^N\|X_2^N) \ge -\delta\Big(1+\frac{n}{N}\Big)\log|\mathcal Y| - \frac{2\log|\mathcal S|}{N+n}. \tag{70}$$

Inequality (70) shows that the difference between the upper bound region and the lower bound is arbitrarily small for $N$ large enough and, hence, in the limit the regions coincide.

B. Finite State Markovian MAC with limited ISI

In this subsection we consider a MAC inspired by Kim's point-to-point channel [22]. The conditional probability of the MAC is given by

$$P(y_i,z_i|x_1^i,x_2^i,z_{i-1}) = P(z_i|z_{i-1})\,P(y_i|z_{i-1},x_{1,i-m}^i,x_{2,i-m}^i), \quad i = 1,2,3,\dots \tag{71}$$

where the distribution of $Z_0$ is the stationary distribution $P(z_0)$, and there is also some initial distribution $P(x_{-m+1},\dots,x_0)$.

This channel is a FS-MAC where the state at time $i$ is $(z_{i-1},x_{1,i-m}^{i-1},x_{2,i-m}^{i-1})$ and therefore the inner bound (Theorem 5) and the outer bound (Theorem 6) apply to this channel. Theorem 8 also holds for this kind of channel, namely, the capacity region is given by $\lim_{n\to\infty}\mathcal R_n$. The proof is very similar, the only difference being that the input $Q$ for $n+N$ inputs is constructed slightly differently: it is arbitrary for the first $n-m$ inputs, then it is as the initial distribution $P(x_{-m+1},\dots,x_0)$, and then it is $Q(x_1^N\|z^{N-1})Q(x_2^N\|z^{N-1})$.

It is also possible to represent the channel with an alternative law, identical to the law of the channel given in eq. (71) for $i \ge m+1$, but for $i \le m$ the output $y_i$ is not influenced by the input and is, with probability 1, a particular output $\phi \in \mathcal Y$. Let us define $\mathcal R_n^{\phi}$ similarly as $\mathcal R_n$ but with the alternative law for the channel. On one hand, it is clear that $\mathcal R_n^{\phi} \subseteq \mathcal R_n$ for all $n$, and on the other hand the difference between $\mathcal R_n^{\phi}$ and $\mathcal R_n$ is at most $m\log|\mathcal Y|$ because it is possible to use the distribution of the first $m$ inputs, $Q(x_1^m)$, to create a desired initial distribution and then use the same input as in $\mathcal R_n$. Hence,

$$\lim_{n\to\infty}\mathcal R_n^{\phi} = \lim_{n\to\infty}\mathcal R_n. \tag{72}$$

The advantage of analyzing $\mathcal R_n^{\phi}$ rather than analyzing $\mathcal R_n$ is that the sequence $n\mathcal R_n^{\phi}$ is sup-additive, i.e. $(n+l)\mathcal R_{n+l}^{\phi} \supseteq n\mathcal R_n^{\phi} + l\mathcal R_l^{\phi}$, and according to Lemma 23, $\lim_{n\to\infty}\mathcal R_n^{\phi} = \mathrm{cl}\big(\bigcup_{n\ge 1}\mathcal R_n^{\phi}\big)$. Hence, we can conclude that Theorem 9 holds for this channel too, namely, if the capacity of the Finite state Markovian MAC with limited ISI is zero without feedback then it is zero also in the presence of feedback.

X. CONCLUSIONS AND FUTURE DIRECTIONS

In this paper we have shown that directed information and causal conditioning emerge naturally in characterizing

the capacity region of FS-MACs in the presence of a time-invariant feedback. The capacity region is given as a

‘multi-letter’ expression and it is a first step toward deriving useful concepts in communication. For instance, we

use this characterization in order to show that for a stationary and ergodic Markovian channel, the capacity is zero if

and only if the capacity with feedback is zero. Further, we identify FS-MACs for which feedback does not enlarge

the capacity region and for which source-channel separation holds.

For the point-to-point channel with feedback, recent work has shown that, for some families of channels such as

unifilar channels [28] or the additive Gaussian where the noise is ARMA [22], the directed information formula can

be computed and, further, can lead to the development of capacity achieving coding schemes. One future direction

is to use the characterizations developed in this paper to explicitly compute the capacity regions of classes of MACs

with memory and feedback (other than the multiplexer followed by a point-to-point channel), and to find optimal

coding schemes.


APPENDIX I

PROOF OF LEMMA 3

Recall that Lemma 3 states that if

$$Q(x_1^N,x_2^N\|y^{N-1}) = Q(x_1^N\|y^{N-1})\,Q(x_2^N\|y^{N-1}), \tag{73}$$

then

$$I\big(Q(x_1^N,x_2^N\|y^{N-1});P(y^N\|x_1^N,x_2^N)\big) = I(X_1^N \to Y^N\|X_2^N). \tag{74}$$

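Proof: The following sequence of equalities proves the lemma.

$$\begin{aligned}
&I\big(Q(x_1^N,x_2^N\|y^{N-1});P(y^N\|x_1^N,x_2^N)\big) \\
&\overset{(a)}{=} I\big(Q(x_1^N\|y^{N-1})Q(x_2^N\|y^{N-1});P(y^N\|x_1^N,x_2^N)\big) \\
&\overset{(b)}{=} \sum_{y^N,x_1^N,x_2^N} Q(x_1^N\|y^{N-1})Q(x_2^N\|y^{N-1})P(y^N\|x_1^N,x_2^N)\log\frac{P(y^N\|x_1^N,x_2^N)}{\sum_{x_1'^N} Q(x_1'^N\|y^{N-1})P(y^N\|x_1'^N,x_2^N)} \\
&\overset{(c)}{=} \sum_{y^N,x_1^N,x_2^N} P(x_1^N,x_2^N,y^N)\log\frac{P(y^N\|x_1^N,x_2^N)}{\sum_{x_1'^N} Q(x_1'^N\|y^{N-1})P(y^N\|x_1'^N,x_2^N)} \\
&= E\left[\log\frac{P(Y^N\|X_1^N,X_2^N)}{\sum_{x_1'^N} Q(x_1'^N\|Y^{N-1})P(Y^N\|x_1'^N,X_2^N)}\right] \\
&= E\left[\log\frac{Q(X_2^N\|Y^{N-1})P(Y^N\|X_1^N,X_2^N)}{\sum_{x_1'^N} Q(X_2^N\|Y^{N-1})Q(x_1'^N\|Y^{N-1})P(Y^N\|x_1'^N,X_2^N)}\right] \\
&= E\left[\log\frac{Q(X_2^N\|Y^{N-1})P(Y^N\|X_1^N,X_2^N)}{\sum_{x_1'^N} P(Y^N,x_1'^N,X_2^N)}\right] \\
&= E\left[\log\frac{Q(X_2^N\|Y^{N-1})P(Y^N\|X_1^N,X_2^N)}{P(X_2^N,Y^N)}\right] \\
&= E\left[\log\frac{P(Y^N\|X_1^N,X_2^N)}{P(Y^N\|X_2^N)}\right] \\
&\overset{(d)}{=} I(X_1^N \to Y^N\|X_2^N).
\end{aligned} \tag{75}$$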

(a) follows from the assumption given in eq. (73).

(b) follows from the definition of the functional $I(Q;P)$ given in eq. (16).

(c) follows from Lemma 1 that states that $P(x_1^N,x_2^N,y^N) = Q(x_1^N,x_2^N\|y^{N-1})P(y^N\|x_1^N,x_2^N)$ and the assumption given in (73).

(d) follows from the definition of directed information.


APPENDIX II

PROOF OF LEMMA 4

Lemma 4 states that if

$$Q(x_1^N,x_2^N\|y^{N-1}) = Q(x_1^N)\,Q(x_2^N), \tag{76}$$

then

$$I(X_1^N;Y^N|X_2^N) = I(X_1^N \to Y^N\|X_2^N). \tag{77}$$

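Proof: The following sequence of equalities proves the lemma.

$$\begin{aligned}
I(X_1^N;Y^N|X_2^N) &= E\left[\log\frac{P(Y^N,X_1^N|X_2^N)}{P(Y^N|X_2^N)\,Q(X_1^N|X_2^N)}\right] \\
&\overset{(a)}{=} E\left[\log\frac{P(Y^N,X_1^N,X_2^N)}{P(Y^N,X_2^N)\,Q(X_1^N|X_2^N)}\right] \\
&\overset{(b)}{=} E\left[\log\frac{Q(X_1^N,X_2^N\|Y^{N-1})P(Y^N\|X_1^N,X_2^N)}{Q(X_2^N\|Y^{N-1})P(Y^N\|X_2^N)\,Q(X_1^N|X_2^N)}\right] \\
&\overset{(c)}{=} E\left[\log\frac{Q(X_1^N)Q(X_2^N)P(Y^N\|X_1^N,X_2^N)}{Q(X_2^N)P(Y^N\|X_2^N)\,Q(X_1^N)}\right] \\
&= E\left[\log\frac{P(Y^N\|X_1^N,X_2^N)}{P(Y^N\|X_2^N)}\right] \\
&= I(X_1^N \to Y^N\|X_2^N).
\end{aligned} \tag{78}$$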

(a) follows from multiplying the numerator and denominator by $P(x_2^N)$.

(b) follows from decomposing the joint distributions $P(y^N,x_1^N,x_2^N)$ and $P(y^N,x_2^N)$ into causal conditioning distributions by using Lemma 1.

(c) follows from the fact that the assumption of the lemma given in (76) implies that $Q(X_1^N,X_2^N) = Q(X_1^N)Q(X_2^N)$. This can be obtained by multiplying both sides of (76) by $P(y^N\|x_1^N,x_2^N)$ and then summing over all $y^N \in \mathcal Y^N$.
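The statement of Lemma 4 can be sanity-checked numerically in a simplified setting. The sketch below is an illustration under assumptions that are not in the paper: a single user, a memoryless binary symmetric channel with crossover 0.1, and two hypothetical input kernels. Without feedback, directed information and mutual information coincide; with the feedback rule $x_i = y_{i-1}$ they differ.

```python
import math
from collections import defaultdict
from itertools import product


def entropy(pmf):
    """Entropy in bits of a pmf stored as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, nx, ny):
    """Marginal pmf of (X^nx, Y^ny) from a joint pmf over pairs (x^N, y^N)."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(x[:nx], y[:ny])] += p
    return out


def directed_information(joint, n):
    """I(X^N -> Y^N) = sum_i [H(Y_i|Y^{i-1}) - H(Y_i|X^i, Y^{i-1})]."""
    total = 0.0
    for i in range(1, n + 1):
        h_y = entropy(marginal(joint, 0, i)) - entropy(marginal(joint, 0, i - 1))
        h_y_x = entropy(marginal(joint, i, i)) - entropy(marginal(joint, i, i - 1))
        total += h_y - h_y_x
    return total


def mutual_information(joint, n):
    """I(X^N; Y^N) = H(X^N) + H(Y^N) - H(X^N, Y^N)."""
    return (entropy(marginal(joint, n, 0)) + entropy(marginal(joint, 0, n))
            - entropy(joint))


def build_joint(n, input_kernel, flip_prob):
    """Joint pmf for a BSC driven by a causal input kernel q(x_i | x^{i-1}, y^{i-1})."""
    joint = {}
    for x in product((0, 1), repeat=n):
        for y in product((0, 1), repeat=n):
            p = 1.0
            for i in range(n):
                p *= input_kernel(x[i], x[:i], y[:i])
                p *= (1 - flip_prob) if y[i] == x[i] else flip_prob
            joint[(x, y)] = p
    return joint


def markov_input(x, x_past, y_past):
    """No feedback: binary Markov input that repeats its last symbol w.p. 0.8."""
    if not x_past:
        return 0.5
    return 0.8 if x == x_past[-1] else 0.2


def echo_feedback_input(x, x_past, y_past):
    """Feedback: uniform first symbol, then retransmit the previous output."""
    if not y_past:
        return 0.5
    return 1.0 if x == y_past[-1] else 0.0
```

In the feedback example with two channel uses, the mutual information is 1 bit while the directed information is only $1 - h(0.1)$ bits, which illustrates why the capacity expressions with feedback are stated in terms of directed information.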

APPENDIX III

PROOF OF LEMMA 12

Lemma 12 states that

$$\max_{Q(x_1^n\|y^{n-1})Q(x_2^n\|y^{n-1})} I(X_1^n,X_2^n \to Y^n) = 0 \iff \max_{Q(x_1^n)Q(x_2^n)} I(X_1^n,X_2^n \to Y^n) = 0, \tag{79}$$

and each condition also implies that $P(y^n\|x_1^n,x_2^n) = P(y^n)$ for all $x_1^n,x_2^n$.

Proof: Proving the direction $\Longrightarrow$ is trivial since

$$\max_{Q(x_1^n\|y^{n-1})Q(x_2^n\|y^{n-1})} I(X_1^n,X_2^n \to Y^n) \ge \max_{Q(x_1^n)Q(x_2^n)} I(X_1^n,X_2^n \to Y^n). \tag{80}$$

For the other direction, $\Longleftarrow$, we have the assumption that $I(X_1^n,X_2^n \to Y^n) = 0$ for all input distributions $Q(x_1^n)Q(x_2^n)$, and in particular for the case that $X_1^n$ and $X_2^n$ are uniformly distributed over their alphabets. Directed information can be written as a Kullback-Leibler divergence, i.e.,

$$\sum_{x_1^n,x_2^n,y^n} Q(x_1^n)Q(x_2^n)P(y^n\|x_1^n,x_2^n)\log\frac{Q(x_1^n)Q(x_2^n)P(y^n\|x_1^n,x_2^n)}{P(y^n)Q(x_1^n)Q(x_2^n)} = 0, \tag{81}$$

and by using the fact that if the Kullback-Leibler divergence $D(P\|Q) \triangleq \sum_{x\in\mathcal X} P(x)\log\frac{P(x)}{Q(x)}$ is zero, then $P(x) = Q(x)$ for all $x \in \mathcal X$, we conclude that (81) implies that $P(y^n\|x_1^n,x_2^n) = P(y^n)$ for all $x_1^n \in \mathcal X_1^n$ and all $x_2^n \in \mathcal X_2^n$. It follows that

$$\max_{Q(x_1^n\|y^{n-1})Q(x_2^n\|y^{n-1})} I(X_1^n,X_2^n \to Y^n) = \max_{Q(x_1^n\|y^{n-1})Q(x_2^n\|y^{n-1})} E\left[\log\frac{P(Y^n\|X_1^n,X_2^n)}{P(Y^n)}\right] = \max_{Q(x_1^n\|y^{n-1})Q(x_2^n\|y^{n-1})} E[0] = 0. \tag{82}$$

APPENDIX IV

SUP-ADDITIVITY AND CONVERGENCE OF 2D REGIONS

Let A,B be sets in R2, i.e., A and B are sets of 2D vectors. The sum of two regions is denoted as A + B and

defined as

A + B = {a + b : a ∈ A,b ∈ B},

(83)

and multiplication of a set A with a scalar c is defined as

cA = {ca : a ∈ A}.

(84)
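The set operations in (83) and (84) can be made concrete for finite point sets. The sketch below is illustrative only: real rate regions are continuous sets, and the helper names and the toy regions are assumptions.

```python
from fractions import Fraction


def set_sum(a_set, b_set):
    """Sum of two 2D regions, {a + b : a in A, b in B}, for finite point sets."""
    return {(a[0] + b[0], a[1] + b[1]) for a in a_set for b in b_set}


def scale(c, a_set):
    """Scalar multiple cA = {ca : a in A} for a finite point set."""
    return {(c * a[0], c * a[1]) for a in a_set}


def sup_additive_at(regions, n, m):
    """Check (n+m) A_{n+m} contains n A_n + m A_m for regions indexed by length."""
    big = scale(n + m, regions[n + m])
    return set_sum(scale(n, regions[n]), scale(m, regions[m])) <= big


# Toy sequence of regions, indexed by n (hypothetical values for illustration).
REGIONS = {
    1: {(0, 0), (1, 0)},
    2: {(0, 0), (Fraction(1, 2), 0), (1, 0)},
}
```

In this toy example $2A_2 \supseteq A_1 + A_1$, which is the finite-set analogue of the sup-additivity condition used for the rate regions.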

A sequence $\{A_n\}$, $n = 1,2,3,\dots$, of 2D regions is said to converge to a region $A$, written $A = \lim A_n$, if

$$\limsup A_n = \liminf A_n = A, \tag{85}$$

where

$$\begin{aligned}
\liminf A_n &= \{a : a = \lim a_n,\ a_n \in A_n\}, \\
\limsup A_n &= \{a : a = \lim a_k,\ a_k \in A_{n_k}\},
\end{aligned} \tag{86}$$

and $n_k$ denotes an arbitrary increasing subsequence of the integers. An alternative and equivalent definition of limsup and liminf is given by $\limsup A_n = \bigcap_{n\ge 1}\mathrm{cl}\big(\bigcup_{m\ge n} A_m\big)$ and $\liminf A_n = \bigcup_{n\ge 1}\mathrm{cl}\big(\bigcap_{m\ge n} A_m\big)$. For more details on convergence of sets in finite dimensions see [48].

Let $A$ denote

$$A = \mathrm{cl}\Big(\bigcup_{n\ge 1} A_n\Big). \tag{87}$$

We say that a sequence $\{A_n\}_{n\ge 1}$ is bounded if $\sup\{\|a\| : a \in A\} < \infty$, where $\|\cdot\|$ denotes a norm in $\mathbb R^2$.


Lemma 23: Let $A_n$, $n=1,2,\ldots$, be a bounded sequence of sets in $\mathbb{R}^2$ that includes the origin, i.e. (0,0). If $nA_n$ is sup-additive, i.e., for all $n\ge1$ and all $N>n$

$$
NA_N\supseteq nA_n+(N-n)A_{N-n} \tag{88}
$$

then

$$
\lim_{n\to\infty}A_n=A. \tag{89}
$$

Proof: From the definitions we have $A\supseteq\limsup A_n\supseteq\liminf A_n$. Hence it is enough to show that $A\subseteq\liminf A_n$.

Let a be a point in A. Then for every $\epsilon>0$ there exists an n and a point $a_\epsilon$ such that $a_\epsilon\in A_n$ and $\|a-a_\epsilon\|\le\epsilon$.

By induction we prove that for any integer $m\ge2$, $A_n\subseteq A_{mn}$, and this implies that $a_\epsilon\in A_{mn}$. For m = 2 we choose N = 2n and we get that

$$
A_{2n}\supseteq\frac{A_n}{2}+\frac{A_n}{2}\supseteq A_n. \tag{90}
$$

Now assume that it holds for m − 1 and let us show that it holds for m.

$$
A_{mn}\supseteq\frac{A_n}{m}+\frac{(m-1)A_{(m-1)n}}{m}\supseteq\frac{A_n}{m}+\frac{(m-1)A_n}{m}\supseteq A_n. \tag{91}
$$

Now, for any N > n, we can represent N as mn + j where 0 ≤ j ≤ n − 1, hence

$$
A_{mn+j}\supseteq\frac{j}{mn+j}A_j+\frac{mn}{mn+j}A_{mn}. \tag{92}
$$

Because $a_\epsilon$ is in $A_n$, it is in $A_{mn}$ too. Following (92) and the fact that $(0,0)\in A_j$ we obtain

$$
\frac{mn}{mn+j}a_\epsilon\in A_{mn+j}. \tag{93}
$$

For any $\delta>0$ and for any $N\ge\frac{n}{\delta}$ we conclude the existence of an element in $A_N$ for which the distance from a can be upper-bounded by

$$
\left\|\frac{mn}{mn+j}a_\epsilon-a\right\|=\left\|a_\epsilon-a-\frac{j}{mn+j}a_\epsilon\right\|\le\|a_\epsilon-a\|+\delta\|a_\epsilon\|\le\epsilon+\delta\|a_\epsilon\|. \tag{94}
$$

Because $\epsilon$ and $\delta$ are arbitrarily small we can find a sequence of points $a_n\in A_n$ that converges to a and therefore $a\in\liminf A_n$, which implies that $A\subseteq\liminf A_n$.
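A toy instance may help fix ideas. The sketch below assumes hypothetical square regions $A_n=[0,c_n]\times[0,c_n]$ with $c_n=1-1/n$; these are not regions from the paper. For such boxes, condition (88) reduces to the scalar inequality $Nc_N\ge nc_n+(N-n)c_{N-n}$, and the closure of the union in (87) is the unit square, so Lemma 23 predicts that $c_n$ approaches 1.

```python
def side(n):
    """Side length c_n of the hypothetical box A_n = [0, c_n] x [0, c_n]."""
    return 1.0 - 1.0 / n

def is_sup_additive(side_fn, max_n, tol=1e-12):
    """Check N*c_N >= n*c_n + (N-n)*c_{N-n} for all 1 <= n < N <= max_n.

    For axis-aligned boxes anchored at the origin this is exactly (88).
    """
    return all(
        N * side_fn(N) >= n * side_fn(n) + (N - n) * side_fn(N - n) - tol
        for N in range(2, max_n + 1)
        for n in range(1, N)
    )

print(is_sup_additive(side, 200))
# Gap between the limiting side length 1 and c_n shrinks as n grows.
for n in (1, 10, 100, 1000):
    print(n, 1.0 - side(n))
```

A sequence such as $c_n=1/n$ fails the check, since then $Nc_N=1$ while the right-hand side equals 2.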

Corollary 24: For a sup-additive sequence, as defined in Lemma 23, the limit is convex.

This corollary follows immediately from the definition of the sup-additivity property, eq. (88) where n = αN,

where 0 < α < 1, and N goes to infinity.

The (Hausdorff) distance between two sets A and B is defined as

$$
d(A,B)=\max\{\sup[d(a,B) : a\in A],\ \sup[d(b,A) : b\in B]\}, \tag{95}
$$

where the distance between a set A and a point b is given by

$$
d(b,A)=\inf_a[\|a-b\| : a\in A]. \tag{96}
$$
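For finite point sets the infimum and supremum in (95) and (96) become a minimum and a maximum, which gives the following minimal sketch. The Euclidean norm and the example sets are assumptions made for illustration; the definition above allows any norm.

```python
import math

def point_to_set(b, A):
    """d(b, A) = min over a in A of ||a - b|| (Euclidean norm), as in (96)."""
    return min(math.dist(a, b) for a in A)

def hausdorff(A, B):
    """Hausdorff distance (95) between two finite, non-empty point sets."""
    return max(max(point_to_set(a, B) for a in A),
               max(point_to_set(b, A) for b in B))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
print(hausdorff(A, B))  # 2.0
```

Every point of A also lies in B, so the distance is driven entirely by the point (1, 2) of B, whose nearest neighbour in A is (1, 0).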


Lemma 25: If $\lim_{n\to\infty}d(A_n,B_n)=0$ then

$$
\begin{aligned}
\limsup A_n &= \limsup B_n,\\
\liminf A_n &= \liminf B_n.
\end{aligned}
\tag{97}
$$

Proof: The proof is straightforward. Given a sequence $\{a_k\}$, $a_k\in A_{n_k}$, that converges to a, we construct a sequence $\{b_k\}$ by finding a point in $B_{n_k}$ that is at a distance less than $\frac{1}{k}+d(a_k,B_{n_k})$ from $a_k$. Since the distance between the sets goes to zero, $\lim b_k=\lim a_k=a$, and from the definitions of limits of sets it follows that (97) holds.

APPENDIX V

PROOF OF LEMMA 22

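Recall the definition of $\underline{\mathcal{R}}_n$ and $\mathcal{R}_n$ in (19) and (20) respectively.

Lemma 22 states that

$$
(n+l)\underline{\mathcal{R}}_{n+l}\supseteq n\underline{\mathcal{R}}_n+l\underline{\mathcal{R}}_l, \tag{98}
$$

and for an indecomposable FS-MAC without feedback $\lim_{n\to\infty}\underline{\mathcal{R}}_n=\lim_{n\to\infty}\mathcal{R}_n$.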

Proof of Lemma 22: We notice that if a sequence of sets is sup-additive then the sequence of the convex hulls of the sets is also sup-additive. Hence, it is enough to prove the sup-additivity of the sequence $\underline{\mathcal{R}}_n$ without the random variable W, whose only role is to convexify the regions.

The set $\underline{\mathcal{R}}_n$ is defined by three expressions that involve directed information. Because each expression is sup-additive, the whole set is sup-additive. We prove that the first expression, i.e. $\min_{s_0}I(X_1^n\to Y^n\|X_2^n,s_0)-\log|\mathcal{S}|$, is sup-additive (the proofs of the sup-additivity of the other expressions are similar and therefore omitted).

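$$
\begin{aligned}
\min_{s_0} I(X_1^{n+l}\to Y^{n+l}\|X_2^{n+l},s_0)
&\stackrel{(a)}{\ge} \min_{s_0}\sum_{i=1}^{n} I(Y_i;X_1^i|Y^{i-1},X_2^i,s_0)+\min_{s_0}\sum_{j=n+1}^{n+l} I(Y_j;X_1^j|Y^{j-1},X_2^j,s_0)\\
&\stackrel{(b)}{\ge} \min_{s_0} I(X_1^n\to Y^n\|X_2^n,s_0)+\min_{s_0}\sum_{j=n+1}^{n+l} I(Y_j;X_{1,n+1}^j|Y^{j-1},X_2^j,s_0)\\
&\stackrel{(c)}{\ge} \min_{s_0} I(X_1^n\to Y^n\|X_2^n,s_0)+\min_{s_0}\sum_{j=n+1}^{n+l} I(Y_j;X_{1,n+1}^j|Y^{j-1},X_2^j,S_n,s_0)-\log|\mathcal{S}|\\
&= \min_{s_0} I(X_1^n\to Y^n\|X_2^n,s_0)+\min_{s_0}\sum_{s_n}P(s_n|s_0)\sum_{j=n+1}^{n+l} I(Y_j;X_{1,n+1}^j|Y^{j-1},X_{2,n+1}^j,s_n)-\log|\mathcal{S}|\\
&\ge \min_{s_0} I(X_1^n\to Y^n\|X_2^n,s_0)+\min_{s_n}\sum_{j=n+1}^{n+l} I(Y_j;X_{1,n+1}^j|Y_{n+1}^{j-1},X_{2,n+1}^j,s_n)-\log|\mathcal{S}|\\
&\stackrel{(d)}{=} \min_{s_0} I(X_1^n\to Y^n\|X_2^n,s_0)+\min_{s_0} I(X_1^l\to Y^l\|X_2^l,s_0)-\log|\mathcal{S}|.
\end{aligned}
\tag{99}
$$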

(a) follows from the definition of the directed information and the fact that $\min_s[f(s)+g(s)]\ge\min_s f(s)+\min_s g(s)$,


(b) follows from the fact that $I(X;Y,Z)\ge I(X;Y)$,

(c) follows from Lemma 2, which states that conditioning on $S_n$ changes the expression by at most $\log|\mathcal{S}|$,

(d) follows from the stationarity of the channel.
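Step (a) rests on the elementary fact that minimizing a sum over a common argument can never give less than minimizing each term separately. The following sketch checks this on random values; the helper `min_sum_gap` and the four-element state set are assumptions made only for illustration.

```python
import random

def min_sum_gap(f, g):
    """Return min_s [f(s) + g(s)] - (min_s f(s) + min_s g(s)); never negative."""
    joint = min(fs + gs for fs, gs in zip(f, g))
    return joint - (min(f) + min(g))

random.seed(0)
num_states = 4  # stands in for a small hypothetical state alphabet
gaps = [
    min_sum_gap([random.random() for _ in range(num_states)],
                [random.random() for _ in range(num_states)])
    for _ in range(1000)
]
print(min(gaps) >= 0.0)
```

The gap is zero exactly when a single state minimizes both functions at once, which is why the inequality in (a) is not an equality in general.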

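According to Lemma 23, since the sequence $\{\underline{\mathcal{R}}_n\}$ is sup-additive the limit exists. In the rest of the proof we show that $\lim_{n\to\infty}\underline{\mathcal{R}}_n=\lim_{n\to\infty}\mathcal{R}_n$. The terms of the region $\underline{\mathcal{R}}_n$ have an auxiliary random variable W whose only role is to convexify the region. Let us denote by $\underline{\mathcal{R}}^o_n$ the same region as $\underline{\mathcal{R}}_n$ where W is restricted to be null. We show first that restricting W to being null does not influence the limit, i.e., $\lim_{n\to\infty}\underline{\mathcal{R}}_n=\lim_{n\to\infty}\underline{\mathcal{R}}^o_n$. In the first half of the proof we showed that $\underline{\mathcal{R}}^o_n$ is sup-additive. Using this fact, we now show that any convex combination with rational weights $\left(\frac{l}{k},\frac{k-l}{k}\right)$ of any two points from $\underline{\mathcal{R}}^o_n$ is in $\underline{\mathcal{R}}^o_{kn}$.

$$
\underline{\mathcal{R}}^o_{kn}\supseteq\frac{l}{k}\underline{\mathcal{R}}^o_{ln}+\frac{k-l}{k}\underline{\mathcal{R}}^o_{(k-l)n}\supseteq\frac{l}{k}\underline{\mathcal{R}}^o_n+\frac{k-l}{k}\underline{\mathcal{R}}^o_n \tag{100}
$$

The left and the right inclusions in (100) are due to the sup-additivity of $\underline{\mathcal{R}}^o_n$. The left inclusion is from the definition of the sup-additivity and the right is due to the fact that sup-additivity of $\underline{\mathcal{R}}^o_n$ also implies that for any two positive integers m, n, $\underline{\mathcal{R}}^o_{mn}\supseteq\underline{\mathcal{R}}^o_n$ (this is shown by induction in (90), (91)). From (100) we can deduce that for any $\epsilon>0$ we can find a $k(\epsilon)$ such that $\underline{\mathcal{R}}_n\subseteq\underline{\mathcal{R}}^o_{nk}+\epsilon$. This fact, together with the trivial fact that $\underline{\mathcal{R}}_n\supseteq\underline{\mathcal{R}}^o_n$, and the fact that the limits of both sequences exist, allows us to deduce that the limits are the same, i.e., $\lim_{n\to\infty}\underline{\mathcal{R}}_n=\lim_{n\to\infty}\underline{\mathcal{R}}^o_n$.

We conclude the proof by showing that, for any input distribution $Q(x_1^n)Q(x_2^n)$, the difference between the terms in the inequalities of $\{\underline{\mathcal{R}}^o_n\}$ and $\{\mathcal{R}_n\}$ goes to zero as $n\to\infty$; hence the distance between the sets of the sequences goes to zero as $n\to\infty$ and, by Lemma 25, the limits of the sequences are the same.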

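$$
\begin{aligned}
&\lim_{n\to\infty}\frac{1}{n}\left|I(X_1^n\to Y^n\|X_2^n)-\min_{s_0}I(X_1^n\to Y^n\|X_2^n,s_0)+\log|\mathcal{S}|\right|\\
&\quad\stackrel{(a)}{\le}\lim_{n\to\infty}\frac{1}{n}\left(\left|I(X_1^n\to Y^n\|X_2^n,S_0)-\min_{s_0}I(X_1^n\to Y^n\|X_2^n,s_0)+\log|\mathcal{S}|\right|+\log|\mathcal{S}|\right)\\
&\quad=\lim_{n\to\infty}\frac{1}{n}\left(I(X_1^n\to Y^n\|X_2^n,S_0)-\min_{s_0}I(X_1^n\to Y^n\|X_2^n,s_0)\right)\\
&\quad\stackrel{(b)}{\le}\lim_{n\to\infty}\frac{1}{n}\left(\max_{s_0}I(X_1^n\to Y^n\|X_2^n,s_0)-\min_{s_0}I(X_1^n\to Y^n\|X_2^n,s_0)\right)\\
&\quad\stackrel{(c)}{=}0
\end{aligned}
\tag{101}
$$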

(a) follows from Lemma 2 and the triangle inequality.

(b) follows from the fact that $\max_{s_0}I(X_1^n\to Y^n\|X_2^n,s_0)\ge I(X_1^n\to Y^n\|X_2^n,S_0)$.

(c) follows from Lemma 21 that states this equality for indecomposable FS-MAC without feedback (recall also

that directed information equals mutual information in the absence of feedback).


APPENDIX VI

PROOF OF THEOREM 16

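$$
\begin{aligned}
E[P_{e1}]&=\sum_{y^N}\sum_{x_1^N,x_2^N}P(x_1^N,x_2^N,y^N)P[\text{error}_1|m_1,m_2,x_1^N,x_2^N,y^N]\\
&=\sum_{y^N}\sum_{x_1^N,x_2^N}Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})P(y^N\|x_1^N,x_2^N)P[\text{error}_1|m_1,m_2,x_1^N,x_2^N,y^N],
\end{aligned}
\tag{102}
$$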

where $P[\text{error}_1|m_1,m_2,x_1^N,x_2^N,y^N]$ is the error probability of decoding $m_1$ given that $m_2$ is decoded correctly. Throughout the remainder of the proof we fix the messages $m_1,m_2$. For a given tuple $(m_1,m_2,x_1^N,x_2^N,y^N)$ define the event $A_{m_1'}$, for each $m_1'\neq m_1$, as the event that the message $m_1'$ is selected in such a way that $P(y^N|m_1',m_2)>P(y^N|m_1,m_2)$, which is the same as $P(y^N\|x_1'^N,x_2^N)>P(y^N\|x_1^N,x_2^N)$, where $x_1'^N$ is a shorthand notation for $x_1^N(m_1',z_1^{N-1}(y^{N-1}))$ and $x_l^N$ is a shorthand notation for $x_l^N(m_l,z_l^{N-1}(y^{N-1}))$ for $l=1,2$. From the definition of $A_{m_1'}$ we have

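$$
\begin{aligned}
P(A_{m_1'}|m_1,m_2,x_1^N,x_2^N,y^N)&=\sum_{x_1'^N}Q(x_1'^N\|z_1^{N-1})\cdot 1\left[P(y^N\|x_1'^N,x_2^N)>P(y^N\|x_1^N,x_2^N)\right]\\
&\le\sum_{x_1'^N}Q(x_1'^N\|z_1^{N-1})\left[\frac{P(y^N\|x_1'^N,x_2^N)}{P(y^N\|x_1^N,x_2^N)}\right]^s;\quad\text{any } s>0
\end{aligned}
\tag{103}
$$

where 1(x) denotes the indicator function.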

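$$
\begin{aligned}
P[\text{error}_1|m_1,m_2,x_1^N,x_2^N,y^N]&=P\Big(\bigcup_{m_1'\neq m_1}A_{m_1'}\Big|m_1,m_2,x_1^N,x_2^N,y^N\Big)\\
&\le\min\Big\{\sum_{m_1'\neq m_1}P(A_{m_1'}|m_1,m_2,x_1^N,x_2^N,y^N),\,1\Big\}\\
&\le\Big[\sum_{m_1'\neq m_1}P(A_{m_1'}|m_1,m_2,x_1^N,x_2^N,y^N)\Big]^{\rho};\quad\text{any } 0\le\rho\le1\\
&\le\left[(M_1-1)\sum_{x_1'^N}Q(x_1'^N\|z_1^{N-1})\left[\frac{P(y^N\|x_1'^N,x_2^N)}{P(y^N\|x_1^N,x_2^N)}\right]^s\right]^{\rho},\quad 0\le\rho\le1,\ s>0,
\end{aligned}
\tag{104}
$$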

where the last inequality is due to inequality (103). By substituting inequality (104) in eq. (102) we obtain:

$$
E[P_{e1}]\le(M_1-1)^{\rho}\sum_{y^N,x_2^N}Q(x_2^N\|z_2^{N-1})\left[\sum_{x_1^N}Q(x_1^N\|z_1^{N-1})P(y^N\|x_1^N,x_2^N)^{1-s\rho}\right]\left[\sum_{x_1'^N}Q(x_1'^N\|z_1^{N-1})P(y^N\|x_1'^N,x_2^N)^{s}\right]^{\rho}.
$$

By substituting $s=1/(1+\rho)$, and recognizing that $x'$ is a dummy variable of summation, we obtain eq. (37) and complete the proof of the bound on $E[P_{e1}]$.
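The two elementary inequalities behind (103) and (104) can be sanity-checked numerically. The sketch below is only an illustration on an arbitrary grid of values; the function names and the grid are assumptions, not part of the proof. The first function checks that the indicator $1[a>b]$ is at most $(a/b)^s$ for $s>0$, and the second that $\min\{t,1\}\le t^{\rho}$ for $0\le\rho\le1$.

```python
def indicator_bound_holds(a, b, s):
    """Check 1[a > b] <= (a / b)**s for a >= 0, b > 0, s > 0 (step in (103))."""
    indicator = 1.0 if a > b else 0.0
    return indicator <= (a / b) ** s

def union_bound_holds(probs, rho, tol=1e-12):
    """Check min(sum(probs), 1) <= sum(probs)**rho for 0 <= rho <= 1 (step in (104))."""
    total = sum(probs)
    return min(total, 1.0) <= total ** rho + tol

likelihoods = [0.05, 0.2, 0.5, 0.9]
exponents = [0.25, 0.5, 1.0, 2.0]
print(all(indicator_bound_holds(a, b, s)
          for a in likelihoods for b in likelihoods for s in exponents))

event_probs = [[0.1, 0.2], [0.6, 0.7], [0.0], [0.9, 0.9, 0.9]]
rhos = [0.0, 0.3, 0.7, 1.0]
print(all(union_bound_holds(p, rho) for p in event_probs for rho in rhos))
```

The free parameter $\rho$ is what makes the union bound useful even when the sum of probabilities exceeds one, since the bound is then capped near 1 instead of growing linearly.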

The proof for bounding $E[P_{e2}]$ is identical to the proof that is given here for $E[P_{e1}]$, up to exchanging the indices. For $E[P_{e3}]$ the upper bound is identical to the case of the point-to-point channel with an input $(x_1^N,x_2^N)$, as proven in [27], where the union bound which appears here in eq. (104) consists of $(M_1-1)(M_2-1)$ terms.


REFERENCES

[1] R. Ahlswede, “Multi-way communication channels,” in Proceedings of 2nd International Symposium on Information Theory (Thakadsor,

Armenian SSR, Sept. 1971), 1973, pp. 23–52.

[2] T. M. Cover and C. S. K. Leung, “An achievable rate region for the multiple-access channel with feedback,” IEEE Trans. on Info. Theory,

vol. 27, no. 3, pp. 292–298, 1981.

[3] F. M. J. Willems, “The feedback capacity region of a class of discrete memoryless multiple access channels,” IEEE Trans. on Info. Theory,

vol. 28, no. 1, pp. 93–95, 1982.

[4] S. Bross and A. Lapidoth, “An improved achievable region for the discrete memoryless two-user multiple-access channel with noiseless

feedback,” IEEE Trans. on Info. Theory, vol. 51, pp. 811–833, 2005.

[5] W. Wu, S. Vishwanath, and A. Arapostathis, “On the capacity of multiple access channels with state information and feedback,” 2006,

submitted to IEEE Trans. Inform. Theory. Available at arxiv.org/cs.IT/0606014.

[6] L. H. Ozarow, “The capacity of the white Gaussian multiple access channel with feedback,” IEEE Transactions on Information Theory,

vol. 30, no. 4, pp. 623–628, 1984.

[7] J. P. M. Schalkwijk and T. Kailath, “Coding scheme for additive noise channels with feedback I: No bandwidth constraint,” IEEE Trans.

Inform. Theory, vol. 12, pp. 172–182, 1966.

[8] A. Lapidoth and M. A. Wigger, “On the Gaussian MAC with imperfect feedback,” in 24th IEEE Convention of Electrical and Electronics

Engineers in Israel (IEEEI06), Eilat, Israel, November 2006.

[9] N. Martins and T. Weissman, “Coding schemes for additive white noise channels with feedback corrupted by quantization or bounded

noise,” 2006, submitted to IEEE Trans. Inform. Theory. Available at arxiv.org/cs.IT/0609055.

[10] N. Merhav and T. Weissman, “Coding for the feedback Gel’fand-Pinsker channel and the feedforward Wyner-Ziv source,” IEEE Trans.

on Info. Theory, vol. 52, pp. 4207–4211, 2006.

[11] S. Verdú, “Multiple-access channels with memory with and without frame synchronism,” IEEE Trans. on Info. Theory, vol. 35, no. 3, pp.

605–619, 1989.

[12] R. Cheng and S. Verdú, “Gaussian multiaccess channels with ISI: Capacity region and multiuser water-filling,” IEEE Trans. Inform. Theory,

vol. 39, no. 3, pp. 773–785, 1993.

[13] G. Kramer, “Directed information for channels with feedback,” Ph.D. Dissertation, Swiss Federal Institute of Technology Zurich, 1998.

[14] ——, “Capacity results for the discrete memoryless network,” IEEE Trans. Inform. Theory, vol. 49, pp. 4–21, 2003.

[15] T. S. Han, Information-Spectrum Method in Information Theory. Springer, 2003.

[16] ——, “An information-spectrum approach to capacity theorems for the general multiple-access channel,” IEEE Trans. Inform. Theory,

vol. 44, no. 7, pp. 2773–2795, 1998.

[17] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.

[18] A. Lapidoth and I. Telatar, “The compound channel capacity of a class of finite-state channels,” IEEE Trans. Inform. Theory, vol. 44, p.

973, 1998.

[19] N. T. Gaarder and J. K. Wolf, “The capacity region of a multiple-access discrete memoryless channel can increase with feedback,” IEEE

Trans. Inform. Theory, vol. 21, pp. 100–102, 1975.

[20] F. Alajaji, “Feedback does not increase the capacity of discrete channels with additive noise,” IEEE Trans. Inform. Theory, vol. 41, pp.

546–549, 1995.

[21] R. G. Gallager, “A perspective on multiaccess channels,” IEEE Transactions on Information Theory, vol. 31, no. 2, pp. 124–142, 1985.

[22] Y. Kim, “A coding theorem for a class of stationary channels with feedback,” Jan 2007, submitted to IEEE Trans. Inform. Theory. Available

at arxiv.org/cs.IT/0701041.

[23] J. Massey, “Causality, feedback and directed information,” Proc. Int. Symp. Information Theory Application (ISITA-90), pp. 303–305, 1990.

[24] S. Tatikonda, “Control under communication constraints,” Ph.D. dissertation, MIT, Cambridge, MA, 2000.

[25] J. Chen and T. Berger, “The capacity of finite-state Markov channels with feedback,” IEEE Trans. on Information theory, vol. 51, pp.

780–789, 2005.

[26] S. Yang, A. Kavcic, and S. Tatikonda, “Feedback capacity of finite-state machine channels,” IEEE Trans. Inform. Theory, pp. 799–810,

2005.


[27] H. H. Permuter, T. Weissman, and A. J. Goldsmith, “Finite state channels with time-invariant deterministic feedback,” Sep 2006, submitted

to IEEE Trans. Inform. Theory. Available at arxiv.org/pdf/cs.IT/0608070.

[28] H. H. Permuter, P. W. Cuff, B. Van-Roy, and T. Weissman, “Capacity of the trapdoor channel with feedback,” Aug 2006, submitted to

IEEE Trans. Inform. Theory. Available at arxiv.org/cs.IT/0610047.

[29] S. Tatikonda and S. Mitter, “The capacity of channels with feedback,” September 2006, submitted to IEEE Trans. Inform. Theory. Available

at arxiv.org/cs.IT/0609139.

[30] B. Shrader and H. H. Permuter, “On the compound finite state channel with feedback,” in ISIT 2007. Nice, France: IEEE, 2007.

[31] G. Kramer, “Directed information for channels with feedback,” Ph.D. Dissertation, Swiss Federal Institute of Technology Zurich, 1998.

[32] S. Pradhan, “Source coding with feedforward: Gaussian sources,” in Proceedings 2004 International Symposium on Information Theory,

2004, p. 212.

[33] R. Venkataramanan and S. S. Pradhan, “Source coding with feedforward: Rate-distortion function for general sources,” in IEEE Information

theory workshop (ITW), 2004.

[34] R. Zamir, Y. Kochman, and U. Erez, “Achieving the gaussian rate-distortion function by prediction,” July 2006, submitted for publication

in “IEEE Trans. Inform. Theory”. [Online]. Available: http://www.eng.tau.ac.il/~zamir/papers/dpcm.pdf

[35] A. Rao, A. Hero, D. States, and J. Engel, “Inference of biologically relevant gene influence networks using the directed information

criterion,” in ICASSP 2006 Proceedings, 2006.

[36] P. Mathai, N. C. Martins, and B. Shapiro, “On the detection of gene network interconnections using directed mutual information,” in ITA,

San Diego, 2007.

[37] M. Mushkin and I. Bar-David, “Capacity and coding for the Gilbert-Elliot channel,” IEEE Trans. Inform. Theory, vol. 35, pp. 1277–1290,

1989.

[38] A. Goldsmith and P. Varaiya, “Capacity, mutual information, and coding for finite-state Markov channels,” IEEE Trans. on Info. Theory,

vol. 42, pp. 868–886, 1996.

[39] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423 and 623–656, 1948.

[40] Y. Kim, “Feedback capacity of the first-order moving average gaussian channel,” IEEE Trans. Inform. Theory, vol. 52, p. 3063, 2006.

[41] ——, “Feedback capacity of stationary Gaussian channels,” 2006, submitted to IEEE Trans. Inform. Theory. Available at

arxiv.org/pdf/cs.IT/0602091.

[42] A. K. S. Yang and S. Tatikonda, “On the feedback capacity of power constrained Gaussian channels with memory,” submitted to IEEE

Trans. Inform. Theory, Oct. 2003.

[43] C. E. Shannon, “The zero error capacity of a noisy channel,” IRE Trans. Information Theory, vol. IT-2, pp. 8–19, 1956.

[44] E. O. Elliott, “Estimates of error rates for codes on burst-noise channels,” Bell Syst. Tech. J., vol. 42, pp. 1977–1997, 1963.

[45] S. Diggavi and M. Grossglauser, “On information transmission over a finite buffer channel,” IEEE Trans. Inform. Theory, vol. 52, p. 1226,

2006.

[46] T. M. Cover, A. E. Gamal, and M. Salehi, “Multiple access channels with arbitrarily correlated sources,” IEEE Trans. Inform. Theory,

vol. 26, pp. 648–657, 1980.

[47] T. M. Cover, “A proof of the data compression theorem of Slepian and Wolf for ergodic sources,” IEEE Trans. Inform. Theory, vol. 22,

pp. 226–228, 1975.

[48] G. Salinetti and R. Wets, “On the convergence of sequences of convex sets in finite dimensions,” SIAM Review, vol. 21, no. 1, pp. 18–33,

1979.