Protocol Coding through Reordering of User Resources, Part I: Capacity Results

Petar Popovski∗ and Zoran Utkovski†

∗ Department of Electronic Systems, Aalborg University, Denmark

† Institute of Information Technology, University of Ulm, Germany

Email: petarp@es.aau.dk, zoran.utkovski@uni-ulm.de

Abstract

The vast existing wireless infrastructure features a variety of systems and standards. It is of significant practical value to introduce new features and devices without changing the physical layer/hardware infrastructure, upgrading it only in software. One way to achieve this is protocol coding: encoding information in the actions taken by a certain (existing) communication protocol. In this work we investigate strategies for protocol coding via combinatorial ordering of the labelled user resources (packets, channels) in an existing, primary system. Such protocol coding introduces a new secondary communication channel in the existing system, which has been considered in prior work exclusively in a steganographic context. Instead, we focus on the use of the secondary channel for reliable communication with newly introduced secondary devices, which are low-complexity versions of the primary devices, capable only of decoding the robustly encoded header information in the primary signals. We introduce a suitable communication model that captures the constraints that the primary system operation puts on protocol coding. We derive the capacity of the secondary channel under arbitrary error models. The insights from the information-theoretic analysis are used in Part II of this work to design practical error-correcting mechanisms for secondary channels with protocol coding.

I. INTRODUCTION

A. Motivation and Initial Observations

After two decades of explosive growth, the starting point for wireless innovation has changed.

With the vast amount of deployed infrastructure and variety of existing systems, it is of significant

practical value to introduce new features without changing the physical layer/hardware of the

February 3, 2012DRAFT

arXiv:1202.0307v1 [cs.IT] 1 Feb 2012

infrastructure, but only upgrade it in software. This can be achieved by a suitable, backward–

compatible upgrade of the communication protocols. We use the term protocol coding to refer

to techniques that convey information by modulating the actions of a communication protocol.

Consider the example in Fig. 1, where a cellular base station (BS) serves a group of primary terminals

in its range. It is assumed that the cellular system is frame–based (WiMax [1], LTE [2], etc.). The

metadata contained in the frame header informs the terminals how to receive/interpret the actual

data that follows. The frame header is commonly encoded more robustly compared to the data,

such that it can be reliably received in an area that is larger than the nominal coverage area, as

depicted on Fig. 1. In such a context, while still using the same infrastructure, we can introduce

new secondary devices, which are able to operate in the extended coverage area. These can be

e.g. machine-type devices [3], such as sensors or actuators, that are controlled by the cellular

BS. The secondary devices are simple and have limited functionality: they are capable of decoding only

the frame header, but not the complex high–rate codebooks used for data. The main idea is

that BS can send information to the secondary devices in the frame header. However, one could

immediately object that the frame header carries important metadata that cannot be changed

arbitrarily. The BS decides how to schedule the primary users based on a certain QoS criterion.

Nevertheless, there could be still freedom to rearrange the headers and thereby send information

to the secondary devices. To illustrate this point, assume that there are two OFDMA channels,

1 and 2, defined in a diversity mode [1], such that if a user Alice is scheduled in a given frame,

it is irrelevant whether it is assigned to channel 1 or 2. Hence, if BS schedules Alice and Bob

in a given frame, then it can encode 1−bit secondary information as follows: allocating Alice to

channel 1 and Bob to channel 2 is a bit value 0, otherwise it is a bit value 1. Taking this simple

example further, let there be three OFDMA channels, but still only two users, Alice and Bob.

In a given frame, each of them can get from 0 up to 3 channels assigned, which is decided by

the primary scheduling criterion; the secondary transmitter can encode information by assigning

these channels to Alice/Bob in a particular way. If there are 2 packets for Alice and 1 for Bob, they

can be assigned in 3 possible ways, so that in that particular frame log2(3) secondary bits can be

sent. However, if all 3 packets are addressed to Alice, no secondary information can be sent in

that frame. This variable amount of information due to the primary operation is the crux of the

communication model considered in this work.
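The counting in this example can be sketched in a few lines of Python. The helper name `secondary_bits` and its input format are assumptions introduced here for illustration only; the function counts the distinct ways of arranging the scheduled packets for a given primary allocation and returns the base-2 logarithm of that count.

```python
from math import factorial, log2

def secondary_bits(packets_per_user):
    """Bits conveyed in one frame by reordering the scheduled packets.

    packets_per_user: number of packets the primary scheduler assigned
    to each user in this frame (hypothetical input format).
    """
    total = sum(packets_per_user)
    arrangements = factorial(total)
    for n in packets_per_user:
        arrangements //= factorial(n)  # multinomial coefficient
    return log2(arrangements)

print(secondary_bits([1, 1]))  # Alice and Bob on two channels: 1 bit
print(secondary_bits([2, 1]))  # three channels, two for Alice: log2(3) bits
print(secondary_bits([3, 0]))  # all three for Alice: 0 bits
```

The amount of secondary information is thus fixed by the primary allocation, not by the secondary transmitter.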

The objective of this and the companion paper [4] is to investigate the fundamental properties

of communication systems that use protocol coding to send information, under restrictions

imposed by a primary system. The secondary information is encoded in the ordering of labelled

resources (packets, channels) of the primary (legacy) users. In this paper we introduce a suitable

communication model that can capture the restrictions imposed by the primary system. The model

captures the key feature of a secondary communication: in a given scheduling epoch, the primary

system decides which packets/users to send data to, while secondary information can be sent by

only rearranging these packets. Each primary packet is subject to an error (e.g. erasure), which

induces a corresponding error model for secondary communication. In this paper we analyze the

model using information-theoretic tools and obtain capacity–achieving communication strategies,

which we then apply in Part II of the work to obtain practical encoding strategies.

B. Related Work and Contributions

Protocol coding can appear in many flavors. An early work that mentions the possibility to

send data by modulating the random access protocol is [5], but in a rather “negative” context,

since the model used explicitly prohibits deciding the protocol actions based on user data. The

seminal work [6] uses a form of protocol coding: the information is modulated in the arrival times

of data packets. More recent work on encoding information in relaying scenarios

through the protocol-level choice of whether to transmit or receive is presented in [7], [8] and [9]. At

a conceptual level, protocol coding bridges information theory and networking [10]. The idea of

communication based on packet reordering is not new per se and has been presented in the context

of covert channels [11], [12], [13]. However, the key difference in our work is that our objective

is not steganographic; rather, we ask what kind of communication strategies can be used when the

degrees of freedom for secondary communication are limited by a certain (random) process in

the primary system. The practical coding strategies are related to the frequency permutation

arrays for power line communications [14], [15].

Preliminary results of this work have appeared in [16] and [17]. In [16] we have introduced the

notion of a secondary channel and sketched the communication strategies when the primary

packets are subject to an erasure channel, while in [17] we treated the case when the error

model for the primary packets is represented by a Z-channel. In this paper we devise capacity-achieving

strategies for an arbitrary error model incurred on the primary packets and provide the

detailed proofs. We first show that our communication model is related to the model of Shannon

for channels with causal side information at the transmitter (CSIT) [18]. We then develop a new

framework for computing the secondary capacity, which leads us to explicit specification of the

communication strategies that are applied to convolutional codes in Part II [4].

II. SYSTEM MODEL

A. Communication Scenario

The communication model is depicted in Fig. 2. A Base Station (BS) transmits downlink data

to a set of two users, addressed 0 and 1, respectively. The BS serves the users in scheduling

frames with Time Division Multiple Access (TDMA). Each frame has a fixed number of F

packets. Each packet carries the address of a user to whom the packet is destined, as well as

data for that user. This is called primary data, destined to either user 0 or user 1. There is a third

receiving device, termed secondary device, that listens to the TDMA frames sent by the BS. This

device only records the address of each packet and ignores the packet data. Since this work is

focused on the secondary communication, the notions “transmitter” and “receiver” will be used

to refer to secondary transmitter and receiver, respectively. By addressing the packets in a given

frame in a particular order, the BS sends secondary information. Thus, an input symbol for the

secondary channel is an F-dimensional binary vector $x \in \mathcal{X} = \{0,1\}^F$.

The model with only two primary users is limiting, but extension to K primary addresses entails

complexity that is outside the scope of this initial paper on the topic. Yet, the results with binary

secondary inputs provide novel insights for the communication strategies and set the basis for

generalizations to K > 2. Furthermore, the binary input captures the following practical setup.

Consider the case in which the arrival of packets in the primary system is random and, in a

certain frame, the BS has only F′ < F packets to send, so that (F − F′) of the slots will be empty.

In this case we can still use the binary input model. We assign address 0 to the empty packet

slots, such that these empty slots can be actually treated as valid secondary input symbols. On

the other hand, the presence of a packet in a given slot is treated as a secondary symbol 1. The

secondary receiver only needs to detect packet presence/absence, without decoding its header.

The key assumption in the model is that the packets that are scheduled in a frame are decided

by the primary communication system: the primary system decides that s packets in a frame

will be addressed to user 1 and (F − s) packets will be addressed to user 0, where 0 ≤ s ≤ F.

This assumption captures the essence of protocol coding: secondary communication is realized

by modulating the degrees of freedom left over from the operation of the original, primary

communication system. In other words, it is assumed that the operational requirements of the

primary system are contained in the set of packets that the BS decides to send in a given frame.

The number of packets s addressed to user 1 in a given frame is called the state of the frame. We assume that the primary system selects packets in a memoryless fashion: in each frame, a packet is addressed to user 1 with probability a and to user 0 with probability 1 − a, independently of the other packets and the previous frames. Hence, the probability that a frame is in state s is binomial:

\[ P_S(s) = \binom{F}{s} a^s (1-a)^{F-s}. \]

With the state s decided by the primary system, the secondary transmitter is only allowed to rearrange the packets in the frame. Since s is a random variable over which the secondary transmitter has no control, a frame carries a variable amount of secondary information. For example, if F = 4 and the primary system decides s = 3, then the possible secondary symbols for the frame are 1110, 1101, 1011, 0111. But if s = F = 4, then the secondary transmitter cannot send any information in that frame.
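As a minimal sketch of the state model, the following Python fragment evaluates the binomial state distribution and lists the secondary symbols available in a given state. The function names `state_pmf` and `symbols_in_state` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from math import comb

def state_pmf(F, a):
    """Binomial distribution P_S(s) of the frame state, s = 0..F."""
    return [comb(F, s) * a**s * (1 - a)**(F - s) for s in range(F + 1)]

def symbols_in_state(F, s):
    """All frames of length F with exactly s packets addressed to user 1."""
    symbols = []
    for ones in combinations(range(F), s):
        symbols.append("".join("1" if f in ones else "0" for f in range(F)))
    return symbols

print(symbols_in_state(4, 3))  # ['1110', '1101', '1011', '0111']
print(symbols_in_state(4, 4))  # ['1111']: a single symbol, no information
print(state_pmf(4, 0.5))
```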

B. Error Models for the Secondary Channel

From the perspective of a secondary transmitter/receiver, each packet is sent over a memoryless

channel with binary inputs. Several suitable error models can be inferred from the physical setup.

In an erasure channel, the receiver either correctly decodes the packet address 0 or 1, or the header

checksum is incorrect, leading to an erasure, denoted ?. In a binary symmetric channel, the receiver uses

error-correction decoding to decide whether it is more likely that address 0 or 1 is received.

This results in only two possible outputs and symmetric error events. Finally, the Z-channel

is suitable if 0/1 corresponds to packet absence/presence, respectively. The probability that, in

absence of a packet, the noise produces a valid packet detection sequence, is practically 0, while

the probability that packet transmission is not detected is $p_e > 0$.

In the general case of a channel with binary inputs, there can be J possible outputs from the set $\mathcal{J}$. The special cases above have $\mathcal{J} = \{0,1,?\}$ and $\mathcal{J} = \{0,1\}$. When i = 0,1 is sent, there are J transition probabilities, represented by a vector:

\[ q_i = (q_{i1}, q_{i2}, \ldots, q_{iJ}), \quad i = 0,1 \tag{1} \]

where $q_{ij} = P(y = j \mid x = i)$ and some $q_{ij}$ can be equal to 0. A secondary output symbol is $y \in \mathcal{Y} = \mathcal{J}^F$. The input/output variables of the secondary channel are denoted by X and Y,

respectively. By denoting $x = (x_1, x_2, \ldots, x_F)$ with $x_f \in \{0,1\}$ and $y = (y_1, y_2, \ldots, y_F)$ with $y_f \in \mathcal{J}$, we can define the channel X − Y through the transition probabilities:

\[ P_{Y|X}(y|x) = \prod_{f=1}^{F} q_{x_f y_f} \tag{2} \]

When there is no risk for confusion, we simply write P(y|x). Thus, the channel X − Y is

specified by the memoryless binary channel through which each packet is passed.
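The product form in (2) can be illustrated with a short Python sketch for the erasure model. The dictionary layout `q[i][j]` and the function names are assumptions made for this example; "?" denotes an erasure as in the text.

```python
from itertools import product

def erasure_channel(p):
    """Per-packet transition probabilities q[i][j] = P(y = j | x = i)."""
    return {"0": {"0": 1 - p, "1": 0.0, "?": p},
            "1": {"0": 0.0, "1": 1 - p, "?": p}}

def frame_likelihood(y, x, q):
    """P(y | x) as the product of per-packet probabilities, as in (2)."""
    prob = 1.0
    for xf, yf in zip(x, y):
        prob *= q[xf][yf]
    return prob

q = erasure_channel(0.1)
print(frame_likelihood("1?10", "1110", q))  # one erasure, three correct packets

# Sanity check: the likelihoods of all outputs for a fixed input sum to one.
total = sum(frame_likelihood("".join(y), "1110", q)
            for y in product("01?", repeat=4))
print(total)
```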

The following notation will be used. $\mathcal{S} = \{0,1,\ldots,F\}$ denotes the set of possible states. The sets of input and output symbols of the secondary channel are denoted by $\mathcal{X}$ and $\mathcal{Y}$, respectively. The set of input symbols is partitioned into F + 1 subsets $\mathcal{X}_s$, defined as follows:

\[ x \in \mathcal{X}_s \Leftrightarrow \sum_{i=1}^{F} x_i = s \tag{3} \]

When the frame state is S = s, only $x \in \mathcal{X}_s$ can be sent over the secondary channel.

III. FRAMEWORK FOR ANALYZING THE CAPACITY OF A SECONDARY CHANNEL

A. Relation to Shannon's Model with Causal State Information at the Transmitter (CSIT)

The secondary channel can be represented by the framework of Shannon for channels with causal state information at the transmitter (CSIT) [18]. Shannon showed that instead of considering the original channel with CSIT, one can consider an ordinary, discrete memoryless channel with equivalent capacity that has a larger input alphabet. The input variable of the equivalent channel is T and each possible input letter t, termed strategy [19], represents a mapping from the state alphabet $\mathcal{S}$ to the input alphabet $\mathcal{X}$ of the original channel. A particular strategy $t \in \mathcal{T}$ is defined by the vector of size $|\mathcal{S}|$: $(t(1), \ldots, t(|\mathcal{S}|))$, where $t(s) \in \mathcal{X}$. Therefore, if each $s \in \mathcal{S}$ can be mapped to any $x \in \mathcal{X}$, then the total number of possible strategies is $|\mathcal{X}|^{|\mathcal{S}|}$ and therefore $|\mathcal{T}| \le |\mathcal{X}|^{|\mathcal{S}|}$. The capacity of the equivalent channel can be found as:

\[ C = \max_{P_T(\cdot)} I(T;Y) \tag{4} \]

where $P_T(\cdot)$ is a probability distribution defined over the set $\mathcal{T}$ which is independent of the state S. The maximization is performed across all the joint distributions that satisfy [19]:

\[ P_{S,T,X,Y}(s,t,x,y) = P_S(s) P_T(t) \delta(x,t(s)) P_{Y|X,S}(y|x,s) \tag{5} \]

where $\delta(x,t(s)) = 1$ if $x = t(s)$ and $\delta(x,t(s)) = 0$ otherwise. Following the properties of mutual information ([20], Section 8.3), the required cardinality of $\mathcal{T}$ is not more than $|\mathcal{Y}|$.

However, Shannon’s result is for the general case of channels with causal CSIT. The secondary

channel considered here has a specific structure that permits more explicit characterization of

the communication strategies. As noted in relation to (3), for a given state S = s only a subset $\mathcal{X}_s \subset \mathcal{X}$ of symbols x may be produced. For example, when F = 4 and s = 2, it is not possible to send the symbol x = 1011. Nevertheless, in the model with causal CSIT the distribution $P_{Y|X,S}(y|x,s)$ needs to be defined for all pairs (x,s), irrespective of the fact that in the original model some x are incompatible with s, i.e. when the state is S = s, the symbols $x \notin \mathcal{X}_s$ cannot be sent. In order to deal with this situation, we need to extend the model. Given $P_{Y|X}(y|x)$, we define $P_{Y|X,S}(y|x,s)$ in the following way: for each $x_u \notin \mathcal{X}_s$ we take one $x_v \in \mathcal{X}_s$ and define:

\[ P_{Y|X,S}(y|x_u,s) \equiv P_{Y|X,S}(y|x_v,s) \quad \forall y \in \mathcal{Y}. \tag{6} \]

The idea behind this approach is the following. For example, let us assume F = 4 and the

erasure model. When s = 0 only x = 0000 can be sent. But we can look at it in another way:

when s = 0 only y = 0000 or the versions of 0000 with erasures can occur. Hence, we can equivalently say that when s = 0, any x can be sent, but, in absence of errors, the output is always 0000. Picking a strategy t″ in which t″(s) = $x_u$ is equivalent to picking t′ in which t′(s) = $x_v$. In short, for given s, we define $P_{Y|X,S}$ in order to discourage selection of symbols x for which x ≠ y in absence of channel errors.

As pointed out in [19], expressing the capacity in terms of strategies might pose some

conceptual and practical problems for code construction and implementation when F is large.

On the other hand, our objective is to use the specific way in which the set of states partitions

the possible set of transmitted symbols X in order to provide insights into the capacity-achieving

communication strategies. Therefore, a different framework for capacity analysis will be

used. A practical dividend of such a framework is presented in the companion paper [4], where

the capacity–achieving strategies are converted into convolutional code designs.

B. Capacity Analysis through a Cascade of Channels

Recall that T is an auxiliary random variable defined over the set of possible strategies T .

For given T = t and each $s \in \mathcal{S}$ there is a single representative of t in $\mathcal{X}_s$, namely $x = t(s) \in \mathcal{X}_s$. In the text that follows we use "strategies" and "input symbols" interchangeably. Hence, $\mathcal{T}$ consists of the input symbols $\{1,2,\ldots,|\mathcal{T}|\}$. The set of F + 1 representatives $\{x_s(t)\}$ for given t will be called a multisymbol of t.

Due to the randomized state change, each $t \in \mathcal{T}$ induces a distribution on $\mathcal{X}$. For example, if F = 2 and the strategy is defined as t(0) = 00, t(1) = 01, t(2) = 11, then we can define $P_{X|T}(x = 00|t) = (1-a)^2 = P_S(0)$, $P_{X|T}(x = 11|t) = a^2 = P_S(2)$, $P_{X|T}(x = 01|t) = 2a(1-a) = P_S(1)$, and $P_{X|T}(x = 10|t) = 0$. In general, $P_{X|T}(\cdot)$ should satisfy that for each $s \in \mathcal{S}$ there is a single $x \in \mathcal{X}_s$ such that $P_{X|T}(x|t) = P_S(s)$. The set of such distributions is:

\[ \mathcal{P}_{X|T} = \left\{ P_{X|T}(\cdot) \mid \forall t \in \mathcal{T}, \forall s \in \mathcal{S}, \exists! x \in \mathcal{X}_s \text{ such that } P_{X|T}(x|t) = P_S(s) \right\} \tag{7} \]

In this way, we do not need to explicitly consider the state in the capacity analysis, but instead we model the secondary communication channel by using a cascade of two channels T − X − Y, and the primary constraints are reflected in the definition of $\mathcal{P}_{X|T}$. In order to express the mutual information I(T;Y), we write I(T,X;Y) = I(T;Y) + I(X;Y|T) = I(X;Y) + I(T;Y|X). Using the Markov property for the cascade we get I(T;Y|X) = 0, which implies:

\[ I(T;Y) = I(X;Y) - I(X;Y|T) \tag{8} \]

Let $\mathcal{P}_T$ denote the set of all distributions $P_T(\cdot)$. Our objective is to find the pair of distributions $(P_T(\cdot), P_{X|T}(\cdot))$ that maximizes I(T;Y). Thus, the capacity of the secondary channel is:

\[ C = \max_{P_T(\cdot), P_{X|T}(\cdot)} I(T;Y) \tag{9} \]

We will always assume that $P_{X|T}(\cdot) \in \mathcal{P}_{X|T}$. The expression (9) can be upper-bounded:

\[ C \le \max_{P_T(\cdot), P_{X|T}(\cdot)} I(X;Y) - \min_{P_T(\cdot), P_{X|T}(\cdot)} I(X;Y|T) \tag{10} \]

where the equality is achieved if and only if there is a pair of distributions $(P_T(\cdot), P_{X|T}(\cdot))$ that simultaneously attains the max/min in the first/second term, respectively. We will decompose the problem (9) into two sub-problems, maximization of I(X;Y) and minimization of I(X;Y|T).

Fig. 3 illustrates the cascade of channels where F = 2 and the erasure model is used for X − Y with $\mathcal{J} = \{0,1,?\}$ and $q_{00} = q_{11} = 1 - p$, while $q_{0?} = q_{1?} = p$. Let us assume that the primary constraint uses $a = \frac{1}{2}$. The two multisymbols, corresponding to t = 1 and t = 2, are {00,01,11}

and {00,10,11}, respectively. It is seen that uniform PT(·) induces uniform PX(·). On the

other hand, the capacity of the vector channel with erasures X − Y is achieved when PX(·) is

uniform. The reader can check that uniform PT(·) and the choice of PX|T(·) according to Fig. 3

simultaneously maximize I(X;Y) and minimize I(X;Y|T).
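The decomposition I(T;Y) = I(X;Y) − I(X;Y|T) can be checked numerically for this F = 2 erasure example. The sketch below is an illustration under the stated assumptions (a = 1/2 by default, uniform P_T over the two multisymbols); the function names are hypothetical, and all quantities are obtained by brute-force enumeration of the joint distributions.

```python
from itertools import product
from math import log2

def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def cascade_informations(p, a=0.5):
    """Return (I(T;Y), I(X;Y), I(X;Y|T)) for the F = 2 erasure example."""
    p_state = [(1 - a) ** 2, 2 * a * (1 - a), a ** 2]
    multisymbols = {1: ["00", "01", "11"], 2: ["00", "10", "11"]}
    q = {"0": {"0": 1 - p, "1": 0.0, "?": p},
         "1": {"0": 0.0, "1": 1 - p, "?": p}}
    joint_ty, joint_xy = {}, {}
    cond = 0.0
    for t, reps in multisymbols.items():
        joint_t = {}  # joint distribution of (X, Y) given T = t
        for s, x in enumerate(reps):
            for y in product("01?", repeat=2):
                pr = p_state[s] * q[x[0]][y[0]] * q[x[1]][y[1]]
                joint_t[(x, y)] = joint_t.get((x, y), 0.0) + pr
                # P_T(t) = 1/2 for both strategies.
                joint_ty[(t, y)] = joint_ty.get((t, y), 0.0) + 0.5 * pr
                joint_xy[(x, y)] = joint_xy.get((x, y), 0.0) + 0.5 * pr
        cond += 0.5 * mutual_information(joint_t)
    return mutual_information(joint_ty), mutual_information(joint_xy), cond

i_ty, i_xy, i_cond = cascade_informations(0.2)
print(i_ty, i_xy - i_cond)  # the two values agree, as in (8)
```

With a = 1/2 the induced P_X is uniform, so I(X;Y) equals 2(1 − p), the capacity of two erasure channel uses.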

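IV. MAXIMIZATION OF I(X;Y)

Each pair of distributions $(P_T(\cdot), P_{X|T}(\cdot))$ induces a distribution $P_X$ on $\mathcal{X}$. Let $\mathcal{P}_X$ denote the set of all possible distributions $P_X(\cdot)$, while $\mathcal{P}^T_X \subset \mathcal{P}_X$ contains the distributions $P_X(\cdot)$ that can be induced by all possible pairs $(P_T(\cdot), P_{X|T}(\cdot))$. Then the following holds:

Proposition 1: The set of distributions $\mathcal{P}^T_X$ is a subset of $\mathcal{P}_{X,S}$, where $\mathcal{P}_{X,S} \subset \mathcal{P}_X$ and:

\[ \mathcal{P}_{X,S} = \left\{ P_X(\cdot) \;\Big|\; \sum_{x \in \mathcal{X}_s} P_X(x) = P_S(s), \; \forall s = 0,1,\ldots,F \right\} \tag{11} \]

Proof: We need to show that if $P_X(\cdot) \in \mathcal{P}^T_X$, then $P_X(\cdot) \in \mathcal{P}_{X,S}$. Let $P_X(\cdot) \in \mathcal{P}^T_X$, then:

\[ \sum_{x \in \mathcal{X}_s} P_X(x) = \sum_{x \in \mathcal{X}_s} \sum_{t \in \mathcal{T}} P_T(t) P_{X|T}(x|t) = \sum_{t \in \mathcal{T}} P_T(t) \sum_{x \in \mathcal{X}_s} P_{X|T}(x|t) \overset{(a)}{=} P_S(s) \sum_{t \in \mathcal{T}} P_T(t) \overset{(b)}{=} P_S(s) \]

where (a) follows from the definition (7) and (b) from $\sum_{t \in \mathcal{T}} P_T(t) = 1$.

The previous proposition implies $\max_{P_T(\cdot), P_{X|T}(\cdot)} I(X;Y) \le \max_{P_X(\cdot) \in \mathcal{P}_{X,S}} I(X;Y)$. We will first look for the distribution $P^*_X(\cdot) \in \mathcal{P}_{X,S}$ that maximizes I(X;Y). Once $P^*_X(\cdot)$ is known, we choose $(P_T(\cdot), P_{X|T}(\cdot))$ in order to induce the desired $P^*_X(\cdot)$. Let us define:

\[ C_{XY} = \max_{P_X(\cdot) \in \mathcal{P}_{X,S}} I(X;Y) \tag{12} \]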

which is never larger than the capacity of X − Y, achieved by selecting over all $P_X(\cdot) \in \mathcal{P}_X$. For example, if the probability $a \ne \frac{1}{2}$ and there are erasure-type errors, then $C_{XY} < F(1-p)$, where F(1 − p) is the capacity of F erasure channel uses. This is because achieving the capacity of the erasure channel requires the uniform distribution $P_{U,X}(x) = 2^{-F}$, which induces the necessary condition $\sum_{x \in \mathcal{X}_s} P_{U,X}(x) = \binom{F}{s} 2^{-F}$, but this is not equal to $P_S(s)$ if $a \ne \frac{1}{2}$.

In this text we are interested in channels X − Y where each single channel use x consists of F uses of a more elementary, identical channel, leading to the following symmetry: the set of transition probabilities $\{P_{Y|X}(y|x)\}$ is identical for all $x \in \mathcal{X}_s$, as they are all permutations of a vector with s 1s and F − s 0s. This is valid irrespective of the type of elementary channel used for a single primary packet. Such a symmetry is instrumental for making statements about $C_{XY}$. The following lemma is proved in Appendix A.

Lemma 1: The distribution $P_X(\cdot) \in \mathcal{P}_{X,S}$ that achieves $C_{XY}$ is, for all s and each $x \in \mathcal{X}_s$:

\[ P_X(x) = \frac{P_S(s)}{\binom{F}{s}} \tag{13} \]
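A brief Python sketch can make the distribution in (13) concrete. The function name `lemma1_distribution` is hypothetical, and the binomial state distribution from Section II is assumed. The fragment builds P_X over all frames and checks the constraint (11) for one state.

```python
from itertools import product
from math import comb

def lemma1_distribution(F, a):
    """P_X(x) = P_S(s) / C(F, s) for every x of Hamming weight s, as in (13)."""
    dist = {}
    for bits in product("01", repeat=F):
        x = "".join(bits)
        s = x.count("1")
        p_state = comb(F, s) * a**s * (1 - a)**(F - s)  # binomial P_S(s)
        dist[x] = p_state / comb(F, s)
    return dist

dist = lemma1_distribution(4, 0.3)
# Constraint (11): the total mass on X_1 must equal P_S(1).
mass_weight_1 = sum(p for x, p in dist.items() if x.count("1") == 1)
print(mass_weight_1, 4 * 0.3 * 0.7**3)
```

With a binomial P_S, the binomial coefficient cancels and (13) reduces to a^s (1 − a)^(F − s), i.e. the distribution obtained when each packet is addressed to user 1 independently with probability a.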

Having found $P_X(\cdot)$ that attains $C_{XY}$, it remains to find $\mathcal{T}$, $P_T(\cdot)$ and $P_{X|T}(\cdot)$ (i.e. the representatives of each T = t) such that (13) is satisfied. For example, let F = 4 and $|\mathcal{X}_s| = 1,4,6,4,1$ for s = 0,1,2,3,4, respectively. Let us first take $|\mathcal{T}| = 4m$ and uniform $P_T(t) = \frac{1}{4m}$. Then each $x \in \mathcal{X}_1$ can be a representative of exactly m different elements of $\mathcal{T}$, such that $P_X(X = x) = P_S(1) \cdot m \cdot \frac{1}{4m} = P_S(1)/\binom{4}{1}$. In general, if $|\mathcal{T}| = \binom{F}{s} \cdot m$ and $P_T(t)$ is uniform, we can choose each $x \in \mathcal{X}_s$ to be a representative of exactly m elements from $\mathcal{T}$; i.e. $P_{X|T}(x|t) = P_S(s)$ for m different values t and zero otherwise. The resulting $P_X(\cdot)$ satisfies (13). To satisfy this condition for all s simultaneously, $|\mathcal{T}|$ should be divisible by $\binom{F}{s}$ for all s = 0, ..., F, leading to the following lemma, stated without proof (lcm stands for "least common multiple"):

Lemma 2: The distribution $P_X(\cdot)$ that satisfies (13) can be achieved by choosing uniform $P_T(\cdot)$ over a set with a minimal cardinality of $|\mathcal{T}| = \mathrm{lcm}\left(\binom{F}{0}, \binom{F}{1}, \ldots, \binom{F}{F}\right)$.

V. MINIMIZATION OF I(X;Y|T)

A. Definition of Minimal Multisymbols

The multisymbol $M_t = \{x_0(t), \ldots, x_F(t)\}$ corresponding to t has one representative $x_s(t) \in \mathcal{X}_s$ in each $\mathcal{X}_s$, such that $P_{X|T}(x_s(t)|t) = P_S(s)$ and is zero for the other x. Since I(X;Y|T = t) depends on the choice of representatives in $M_t$, we will denote it by $I(X;Y|M_t)$, such that:

\[ I(X;Y|T) = \sum_{t \in \mathcal{T}} P_T(t) I(X;Y|M_t) \tag{14} \]

For example, let F = 5 with $M_1$ = {00000,00001,00011,00111,01111,11111} and $M_2$ = {00000,00001,00110,11100,10111,11111}. Assuming a binary symmetric channel with $q_{00} = q_{11} = 0.8$, $q_{01} = q_{10} = 0.2$, it can be seen that $I(X;Y|M_1) < I(X;Y|M_2)$. For an intuitive explanation, consider two representatives $x_{s_i} \in \mathcal{X}_{s_i}$, i = 1,2. From (3) the Hamming weight of $x_{s_i}$ is $s_i$ and, without loss of generality, assume $s_2 > s_1$. For the multisymbol $M_1$, the Hamming distance between any two representatives is given by:

\[ d_H(x_{s_1}, x_{s_2}) = s_2 - s_1 \tag{15} \]

and is minimal possible. Informally, any two representatives from $M_1$ are as similar to each other as possible since they represent the same input T = 1, which is not the case for $M_2$.

The multisymbols satisfying (15) are of special interest and will be termed minimal multisymbols. Among them, there is one termed basic multisymbol $M_b$ with a particular structure:

the representative in $\mathcal{X}_s$ is 00···011···1, which starts with F − s consecutive zeros followed by s consecutive ones. It can be shown that any minimal multisymbol can be obtained from the basic one via permutation, such that there are F! different minimal multisymbols. For example, let $M_b$ = {000,001,011,111} and apply the permutation π = 321: the components of each $x \in M_b$ are permuted according to π to obtain $M_m$ = {000,100,110,111}. In general, for a given permutation π we define $\gamma_\pi(\cdot)$:

\[ M' = \gamma_\pi(M) \tag{16} \]

such that each $x'_s \in M'$ is obtained from the corresponding $x_s \in M$ by permuting the packets according to π, and the Hamming distance between any two representatives is preserved: $d_H(x_{s_1}, x_{s_2}) = d_H(x'_{s_1}, x'_{s_2}) = s_2 - s_1$.
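The construction of minimal multisymbols can be sketched in Python as follows. All function names are hypothetical. The permutation convention (new position i takes the component at old position π(i)) is an assumption, chosen because it reproduces the π = 321 example in the text.

```python
from itertools import permutations

def basic_multisymbol(F):
    """Representative for state s: F - s zeros followed by s ones."""
    return ["0" * (F - s) + "1" * s for s in range(F + 1)]

def permute(multisymbol, pi):
    """Apply the 1-based permutation pi to every representative:
    new position i takes the component that was at position pi[i]."""
    return ["".join(x[j - 1] for j in pi) for x in multisymbol]

def hamming(x, z):
    return sum(c1 != c2 for c1, c2 in zip(x, z))

def is_minimal(multisymbol):
    """Check d_H(x_s1, x_s2) = |s2 - s1| for all pairs, as in (15).
    The list must be ordered by state s = 0..F."""
    return all(hamming(x1, x2) == abs(s2 - s1)
               for s1, x1 in enumerate(multisymbol)
               for s2, x2 in enumerate(multisymbol))

def all_minimal_multisymbols(F):
    """Distinct multisymbols obtained by permuting the basic one."""
    mb = basic_multisymbol(F)
    return {tuple(permute(mb, pi)) for pi in permutations(range(1, F + 1))}

print(permute(basic_multisymbol(3), (3, 2, 1)))  # ['000', '100', '110', '111']
print(len(all_minimal_multisymbols(3)))          # 6, i.e. 3!
```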

B. Analysis of I(X;Y|T = t) = I(X;Y|Mt)

We write the mutual information $I(X;Y|M_t) = H(Y|M_t) - H(Y|X,M_t)$ and first consider:

\[ H(Y|X,M_t) = \sum_{s=0}^{F} P_S(s) H(Y|x_s(t)) \tag{17} \]

Since each component of $x_s$ uses an identical memoryless channel, $H(Y|x_s(t))$ depends only on the Hamming weight s, but not on how the 0s and 1s are arranged in $x_s$. This is stated through:

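Lemma 3: The conditional entropy for $x_s \in \mathcal{X}_s$, having a Hamming weight of s, is given by:

\[ H(Y|X = x_s) = sH(q_1) + (F-s)H(q_0) = H_s \tag{18} \]

where $H(q_i) = -\sum_{j=1}^{J} q_{ij} \log_2 q_{ij}$ for i = 0,1 and $q_i$ is given by (1).

Proof: In order to determine $H(Y|X = x) = -\sum_{y \in \mathcal{J}^F} P(y|x) \log_2 P(y|x)$, we use the fact that $P(y|x) = \prod_{f=1}^{F} q_{x_f y_f}$ is a product distribution, such that we can write H(Y|X = x) as:

\[ -\sum_{y \in \mathcal{J}^F} \prod_{i=1}^{F} q_{x_i y_i} \sum_{j=1}^{F} \log_2 q_{x_j y_j} \overset{(a)}{=} -\sum_{j=1}^{F} \sum_{y_1 \in \mathcal{J}} \cdots \sum_{y_F \in \mathcal{J}} \log_2 q_{x_j y_j} \prod_{i=1}^{F} q_{x_i y_i} \]

where (a) follows from changing the order of summation. If we consider the component j = 1:

\[ -\sum_{y_1 \in \mathcal{J}} \cdots \sum_{y_F \in \mathcal{J}} \log_2 q_{x_1 y_1} \prod_{i=1}^{F} q_{x_i y_i} = -\sum_{y_1 \in \mathcal{J}} q_{x_1 y_1} \log_2 q_{x_1 y_1} \left[ \sum_{y_2 \in \mathcal{J}} \cdots \sum_{y_F \in \mathcal{J}} \prod_{i=2}^{F} q_{x_i y_i} \right] \overset{(b)}{=} -\sum_{y_1 \in \mathcal{J}} \log_2 q_{x_1 y_1} \cdot q_{x_1 y_1} = H(q_{x_1}) \tag{19} \]

where (b) follows from $\sum_{y_2 \in \mathcal{J}} \cdots \sum_{y_F \in \mathcal{J}} \prod_{i=2}^{F} q_{x_i y_i} = 1$. Doing the same for j = 2, ..., F shows that each $x_j = i$, i = 0,1, contributes $H(q_i)$ to H(Y|X = x), which proves the lemma.

Using the lemma, (17) can be rewritten as $H(Y|X,M_t) = \sum_{s=0}^{F} P_S(s) H_s$ and is not affected by the actual choice of $M_t$, as long as there is a representative in each $\mathcal{X}_s$.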
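Lemma 3 lends itself to a simple numerical check. The sketch below compares a brute-force evaluation of H(Y|X = x) over all output vectors with the closed form (18). The function names and the Z-channel parameter are chosen only for illustration.

```python
from itertools import product
from math import log2

def entropy(dist):
    """Entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in dist if p > 0)

def conditional_entropy_bruteforce(x, q):
    """H(Y | X = x) by enumerating all output vectors y in J^F."""
    outputs = range(len(q[0]))
    total = 0.0
    for y in product(outputs, repeat=len(x)):
        p = 1.0
        for xf, yf in zip(x, y):
            p *= q[xf][yf]
        if p > 0:
            total -= p * log2(p)
    return total

def conditional_entropy_lemma3(x, q):
    """Closed form s*H(q1) + (F - s)*H(q0) from (18)."""
    s = sum(x)
    return s * entropy(q[1]) + (len(x) - s) * entropy(q[0])

# Z-channel with miss probability 0.3 (value chosen only for illustration).
q = [[1.0, 0.0], [0.3, 0.7]]
x = (0, 1, 1, 0, 1)
print(conditional_entropy_bruteforce(x, q), conditional_entropy_lemma3(x, q))
```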

C. Analysis of H(Y|Mt)

To gain intuition, we first consider a special type of $P_S(\cdot)$, in which only two states $s_1, s_2 \in \mathcal{S}$ occur with non-zero probability $P_S(s_1) = \lambda$ and $P_S(s_2) = 1 - \lambda$, such that $M_t = \{x_{s_1}, x_{s_2}\}$. Due to the symmetry implied by Lemma 3, without losing generality, we first pick an arbitrary $x_{s_1} \in \mathcal{X}_{s_1}$. Then, how should $x_{s_2} \in \mathcal{X}_{s_2}$ be selected in order to minimize $H(Y|M_t)$? Slightly abusing the notation from (15), we use $d_H(x)$ to denote the Hamming weight of x. Recall that $d_H(x) = s$ for $x \in \mathcal{X}_s$. Let $g_{uv}(x_{s_1}, x_{s_2})$, where $u,v \in \{0,1\}$, denote the number of positions f at which $x_{s_1,f} = u$ and $x_{s_2,f} = v$. For example, if $x_{s_1} = 00110$ and $x_{s_2} = 11011$, then $g_{00} = 0$, $g_{01} = 3$, $g_{10} = 1$, and $g_{11} = 1$ (we write $g_{uv}$ for brevity). Using similar arithmetic as in Lemma 3:

\[ H(Y|T = t) = g_{00}H(q_0) + g_{11}H(q_1) + g_{01}H(\lambda q_0 + (1-\lambda)q_1) + g_{10}H((1-\lambda)q_0 + \lambda q_1) \tag{20} \]

The Hamming distance is $d_H(x_{s_1}, x_{s_2}) = g_{01} + g_{10}$. The following lemma formalizes the intuition that $H(Y|M_t)$ is minimized when any two representatives are as similar to each other as possible.

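Lemma 4: When $M_t$ consists of only two representatives $x_{s_1}, x_{s_2}$, $H(Y|M_t)$ is minimized when the Hamming distance takes its minimal possible value $d_H(x_{s_1}, x_{s_2}) = |s_2 - s_1|$.

Proof: Without loss of generality, assume that $s_2 > s_1$. Then $g_{10}(x_{s_1}, x_{s_2}) < g_{01}(x_{s_1}, x_{s_2})$ since $d_H(x_{s_1}) < d_H(x_{s_2})$. Assume that $g_{10}(x_{s_1}, x_{s_2}) > 0$ and let there be $f_1, f_2$ such that:

\[ (x_{s_1,f_1}, x_{s_2,f_1}) = (1,0), \qquad (x_{s_1,f_2}, x_{s_2,f_2}) = (0,1) \tag{21} \]

Let $z_{s_2}$ be another representative from $\mathcal{X}_{s_2}$, obtained by swapping the positions $f_1, f_2$ in $x_{s_2}$, but keeping the other values of $x_{s_2}$, such that $z_{s_2,f_1} = 1$ and $z_{s_2,f_2} = 0$. Then:

\[ \begin{aligned} g_{00}(x_{s_1}, x_{s_2}) + 1 &= g_{00}(x_{s_1}, z_{s_2}), & g_{11}(x_{s_1}, x_{s_2}) + 1 &= g_{11}(x_{s_1}, z_{s_2}), \\ g_{01}(x_{s_1}, x_{s_2}) - 1 &= g_{01}(x_{s_1}, z_{s_2}), & g_{10}(x_{s_1}, x_{s_2}) - 1 &= g_{10}(x_{s_1}, z_{s_2}) \end{aligned} \tag{22} \]

Using the concavity of the entropy function, we can write:

\[ H(\lambda q_0 + (1-\lambda)q_1) + H((1-\lambda)q_0 + \lambda q_1) \ge \lambda H(q_0) + (1-\lambda)H(q_1) + (1-\lambda)H(q_0) + \lambda H(q_1) = H(q_0) + H(q_1) \tag{23} \]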

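Using (22) and (23) it follows:

\[ \begin{aligned} H_{x_{s_1},x_{s_2}} &= g_{00}H(q_0) + g_{11}H(q_1) + g_{01}H(\lambda q_0 + (1-\lambda)q_1) + g_{10}H((1-\lambda)q_0 + \lambda q_1) \\ &\ge (g_{00}+1)H(q_0) + (g_{11}+1)H(q_1) + (g_{01}-1)H(\lambda q_0 + (1-\lambda)q_1) + (g_{10}-1)H((1-\lambda)q_0 + \lambda q_1) = H_{x_{s_1},z_{s_2}} \end{aligned} \]

where $g_{uv} = g_{uv}(x_{s_1}, x_{s_2})$ and $H_{x_{s_1},x_{s_2}} = H(Y|M_t = \{x_{s_1}, x_{s_2}\})$. We can analogously continue to swap positions in $x_{s_2}$ until getting $g_{10} = 0$. No swap increases $H(Y|M_t)$, which means that when $g_{10} = 0$, $H(Y|M_t)$ is minimal.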
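The swap argument in the proof can be mirrored by a small Python sketch. The names `pair_counts` and `swap_until_nested` are hypothetical. The second function repeats the swap described by (21) on the heavier representative until no (1,0) position remains, so that the Hamming distance reaches |s2 − s1|.

```python
def pair_counts(x1, x2):
    """g[(u, v)]: number of positions f with x1[f] = u and x2[f] = v."""
    g = {(u, v): 0 for u in "01" for v in "01"}
    for u, v in zip(x1, x2):
        g[(u, v)] += 1
    return g

def swap_until_nested(x1, x2):
    """Repeat the swap from the proof of Lemma 4 on x2 until g10 = 0.

    Assumes weight(x2) > weight(x1), so a (0, 1) position always exists
    whenever a (1, 0) position does.
    """
    z = list(x2)
    while True:
        f1 = next((f for f in range(len(z))
                   if (x1[f], z[f]) == ("1", "0")), None)
        if f1 is None:
            return "".join(z)
        f2 = next(f for f in range(len(z)) if (x1[f], z[f]) == ("0", "1"))
        z[f1], z[f2] = z[f2], z[f1]  # keeps the Hamming weight of x2

print(pair_counts("00110", "11011"))
z = swap_until_nested("00110", "11011")
print(z, pair_counts("00110", z))
```

For the example from the text, one swap suffices and the distance drops from 4 to 2 = |4 − 2|.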

We now consider a general $P_S(\cdot)$. As indicated above, $H(Y|M_t)$ can be written as:

\[ H(Y|M_t) = \sum_{f=1}^{F} H(u_f) \tag{24} \]

where $u_f$ is the probability distribution that corresponds to the f-th position, defined as:

\[ u_f = \sum_{s=0}^{F} P_S(s) \left[ (1 - x_{s,f}) q_0 + x_{s,f} q_1 \right], \quad \text{where } x_{s,f} \in \{0,1\} \tag{25} \]

Without losing generality, taking the first value $x_{s,1}$ of each of the representatives $x_s$ we can create the (F + 1)-dimensional vector $z_1$. In a similar way $z_2$ is created, such that:

\[ z_1 = (x_{0,1}, x_{1,1}, \ldots, x_{F,1}), \qquad z_2 = (x_{0,2}, x_{1,2}, \ldots, x_{F,2}) \tag{26} \]

The probability distribution vectors $u_1$ and $u_2$ can be written as:

\[ u_1 = (Q_{00} + Q_{01}) q_0 + (Q_{10} + Q_{11}) q_1, \qquad u_2 = (Q_{00} + Q_{10}) q_0 + (Q_{01} + Q_{11}) q_1 \tag{27} \]

where $Q_{uv} = \sum_{s \in G_{uv}(z_1,z_2)} P_S(s)$ and the sets $G_{uv}(z_1,z_2) = \{s \mid x_{s,1} = u, x_{s,2} = v\}$ for $u,v \in \{0,1\}$.

Lemma 5: The contribution of the positions 1 and 2 to the entropy $H(Y|M_t)$ is minimized when one of the sets $G_{01}, G_{10}$ is empty.

Proof: Let us start with a multisymbol $\{x_s\}$ in which none of the sets $G_{01}(z_1,z_2), G_{10}(z_1,z_2)$ is empty. Without losing generality, we will "empty" the set $G_{01}(z_1,z_2)$ as follows: if there is $s \in \mathcal{S}$ such that $x_{s,1} = 0, x_{s,2} = 1$, these two positions in the representative $x_s$ are swapped. That is, if there is a representative x = 01···, it is changed to 10···. Using the concavity of the entropy, we can show that these swapping operations can decrease the contribution of the positions f = 1,2 to the entropy (24). Note that after swapping, the distributions (27) become:

\[ u'_1 = Q_{00} q_0 + (Q_{01} + Q_{10} + Q_{11}) q_1, \qquad u'_2 = (Q_{00} + Q_{01} + Q_{10}) q_0 + Q_{11} q_1 \tag{28} \]

Using the concavity property, it can be shown that

\[ H(u_1) + H(u_2) \ge H(u'_1) + H(u'_2) \tag{29} \]

where $u_1, u_2$ and $u'_1, u'_2$ are given by (27) and (28), respectively. Analogously, the contribution from the two positions will decrease to the value on the right-hand side of (29) if the set $G_{10}(z_1,z_2)$ is emptied.

This analysis leads us to the following theorem (proof in Appendix B) and corollary:

Theorem 1: When each individual packet in a frame is sent over an identical channel with

binary inputs and general outputs, the minimal multisymbol minimizes H(Y|Mt).

Corollary 1: The following mutual information is constant for all minimal multisymbols Mm:

I(X;Y|Mm) = H(Y|Mm) − H(Y|X,Mm) = Im

(30)
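As a hedged illustration, the per-position expression in (24) and (25) can be evaluated for the two F = 5 multisymbols from Section V-A. The sketch computes the sum of the per-position entropies exactly as (24) is written. The value a = 0.5 is an assumption, since the text does not fix it for that example, and the function names are hypothetical.

```python
from math import comb, log2

def entropy(dist):
    """Entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in dist if p > 0)

def per_position_entropy(multisymbol, p_state, q0, q1):
    """Sum over positions f of H(u_f), with u_f as in (25).

    multisymbol must be ordered by state, i.e. multisymbol[s] has weight s.
    """
    F = len(multisymbol[0])
    total = 0.0
    for f in range(F):
        # Probability that position f carries a 1, averaged over the state.
        w1 = sum(ps for ps, x in zip(p_state, multisymbol) if x[f] == "1")
        u_f = [(1 - w1) * a0 + w1 * a1 for a0, a1 in zip(q0, q1)]
        total += entropy(u_f)
    return total

F, a = 5, 0.5  # a = 0.5 is an assumption; the text does not fix it here
p_state = [comb(F, s) * a**s * (1 - a)**(F - s) for s in range(F + 1)]
q0, q1 = [0.8, 0.2], [0.2, 0.8]  # binary symmetric channel from the example
m1 = ["00000", "00001", "00011", "00111", "01111", "11111"]
m2 = ["00000", "00001", "00110", "11100", "10111", "11111"]
print(per_position_entropy(m1, p_state, q0, q1),
      per_position_entropy(m2, p_state, q0, q1))
```

Under these assumptions the minimal multisymbol M1 yields the smaller sum, in line with Theorem 1.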

VI. ACHIEVING THE CAPACITY OF THE SECONDARY CHANNEL

Here we analyze (10) and find $T$ and $\{M_t\}$ (i.e., $P_T(\cdot)$ and $P_{X|T}(\cdot)$, respectively) that simultaneously maximize $I(X;Y)$ according to Lemma 1 and minimize $I(X;Y|T) = I_m$ according to (30). Recall that uniform $T$ with $|\mathcal{T}| = \mathrm{lcm}\left(\binom{F}{0}, \binom{F}{1}, \ldots, \binom{F}{F}\right) = L$ can achieve $C_{XY}$. Since there are $F! \geq L$ minimal multisymbols, in principle it should be possible to select $L$ minimal multisymbols in order to have $I(X;Y|T) = I_m$ and maximize $I(X;Y)$.

In order to show that it is always possible to select $\{M_t\}$ with $|\{M_t\}| = L$ and uniform $T$, we first take an example with $F = 4$. The set of $L = 12$ multisymbols can be selected as in Fig. 4(a). Multisymbols can be represented by a directed graph, see Fig. 4(b). Each node in the

graph represents a particular $x \in \mathcal{X}$. An edge exists between $x_s \in \mathcal{X}_s$ and $x_{s+1} \in \mathcal{X}_{s+1}$ if and only if the Hamming distance is $d_H(x_s, x_{s+1}) = 1$. The directed edge from $x_s$ to $x_{s+1}$ exists if they can both belong to the same minimal multisymbol $M_t$. A multisymbol is represented by a path of length $F$ that starts at $00\cdots0$ and ends at $11\cdots1$. To each edge we can assign a nonnegative integer, which denotes the number of multisymbols (paths) that contain that edge. In Fig. 4(b), each edge that starts from $0000$ has weight 3, each edge between an element of $\mathcal{X}_1$ and an element of $\mathcal{X}_2$ has weight 1, etc. The weight of each edge between $x_s$ and $x_{s+1}$ can be treated as an outgoing weight for $x_s$ and an incoming weight for $x_{s+1}$. Using this framework, we need to prove that, for each $s = 0, \ldots, F-1$, it is possible to match all outgoing weights from $\mathcal{X}_s$ to all incoming weights of $\mathcal{X}_{s+1}$. This is stated with the following theorem (proof in Appendix C):


Theorem 2: If $L = \mathrm{lcm}\left(\binom{F}{0}, \binom{F}{1}, \ldots, \binom{F}{F}\right)$ and the distribution over $T$ is uniform, then the multisymbols can be chosen so as to achieve the capacity of the secondary channel.

If $F = 4$ it turns out that $\frac{m_s}{F-s}$ is always an integer, such that all the outgoing/incoming weights to the same node are identical. This is not the case if, e.g., $F = 7$: then $L = 105$, $m_1 = 15$ and $\frac{m_1}{7-1} = \frac{15}{6}$, such that each node from $\mathcal{X}_1$ has 3 outgoing edges of weight 3 and 3 of weight 2.
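The numbers quoted for $F = 4$ and $F = 7$ can be reproduced with a short computation. This is a sketch under one stated assumption: $m_s$ is taken to be $L / \binom{F}{s}$, i.e. the number of selected multisymbols passing through each node of $\mathcal{X}_s$, which matches $m_1 = 15$ for $F = 7$ but is inferred rather than quoted from the definition. The helper `balanced_split` is hypothetical; it only shows how a non-integer ratio such as $15/6$ forces unequal edge weights and is not claimed to be the construction used in Appendix C.

```python
from fractions import Fraction
from functools import reduce
from math import comb, gcd

def lcm_of_binomials(F):
    """L = lcm(C(F,0), C(F,1), ..., C(F,F))."""
    return reduce(lambda a, b: a * b // gcd(a, b),
                  (comb(F, s) for s in range(F + 1)))

def per_node_counts(F):
    """Assumed definition m_s = L / C(F,s): multisymbols through each node of X_s."""
    L = lcm_of_binomials(F)
    return [L // comb(F, s) for s in range(F + 1)]

def balanced_split(total, parts):
    """Split `total` into `parts` integer weights that differ by at most one."""
    base, extra = divmod(total, parts)
    return [base + 1] * extra + [base] * (parts - extra)

for F in (4, 7):
    L = lcm_of_binomials(F)
    m = per_node_counts(F)
    print(f"F={F}, L={L}")
    for s in range(F):
        # Each node of X_s has F - s outgoing edges.
        ratio = Fraction(m[s], F - s)
        print(f"  s={s}: m_s={m[s]}, m_s/(F-s)={ratio}, "
              f"split={balanced_split(m[s], F - s)}")
```

For $F = 7$ and $s = 1$ the split consists of three edges of weight 3 and three of weight 2, in agreement with the text.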

VII. FURTHER CONSIDERATIONS AND NUMERICAL ILLUSTRATIONS

In the absence of errors $Y = X$, such that $I(T;Y) = I(T;X)$ and the capacity is

$$ C_{F,0} = \sum_{s=0}^{F} P_S(s) \log_2 \binom{F}{s} \tag{31} $$

When there are no errors, the state $s$ is always known at the receiver as well, and the communication strategy is different, see [9]. Each state $s$ is seen as a different subchannel, also denoted $s$, and both the transmitter X and the receiver Y know which subchannel is used in a frame. Let $r(F,s) = \log_2 \binom{F}{s}$ denote the number of bits that are sent in a single use of the subchannel $s$. Considering a large number of channel uses $n \to \infty$, the realization of the sequence of frame states becomes typical [20] and the state $s$ occurs approximately $nP_S(s)$ times. The sender segments the message into submessages and each submessage is sent over a separate subchannel. The submessage sent over the subchannel $s$ contains approximately $nP_S(s)\,r(F,s)$ bits. If during the $i$-th channel use the sender observes the state $s$, then it takes the next $r(F,s)$ bits from the corresponding submessage. Thus, the whole message is sent by time-interleaving of all the available subchannels and the time-interleaved sequence is perfectly observed by the receiver.
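The error-free capacity (31) and the submessage sizes of the time-interleaving strategy can be evaluated directly. The sketch below assumes, for illustration only, a binomial state distribution $P_S(s) = \binom{F}{s} a^s (1-a)^{F-s}$; the function names and the choice of $F$, $a$ and $n$ are hypothetical, and any other state distribution can be passed in as a list.

```python
from math import comb, log2

def binomial_state_pmf(F, a):
    """Assumed state distribution: P_S(s) = C(F,s) a^s (1-a)^(F-s)."""
    return [comb(F, s) * a**s * (1 - a)**(F - s) for s in range(F + 1)]

def rate(F, s):
    """r(F,s) = log2 C(F,s): bits per single use of subchannel s."""
    return log2(comb(F, s))

def error_free_capacity(F, pmf):
    """C_{F,0} from (31): average of r(F,s) over the state distribution."""
    return sum(pmf[s] * rate(F, s) for s in range(F + 1))

def submessage_lengths(F, pmf, n):
    """Approximate bits carried by each subchannel over n frames: n P_S(s) r(F,s)."""
    return [n * pmf[s] * rate(F, s) for s in range(F + 1)]

F, a, n = 4, 0.5, 1000
pmf = binomial_state_pmf(F, a)
capacity = error_free_capacity(F, pmf)
lengths = submessage_lengths(F, pmf, n)
print("C_{F,0} in bits per frame:", capacity)
print("approximate submessage lengths in bits:", lengths)
```

The submessage lengths add up to $n\,C_{F,0}$, which is the sense in which time-interleaving of the subchannels attains (31).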

We now consider the model with erasures. An upper bound on the secondary capacity is simply $C_{XY}$, as defined in (12). If $a = \frac{1}{2}$, then $C_{XY} = F(1-p)$, the capacity of the erasure channel with $F$ uses. Consider now the asymptotic case $F \to \infty$ and observe a single frame (one single channel use). The state becomes typical and, with high probability, $s \in \left[\frac{F(1-\epsilon)}{2}, \frac{F(1+\epsilon)}{2}\right]$, where $\epsilon \to 0$ as $F \to \infty$. We sketch how the capacity can be achieved in this case. First note that it suffices that $|\mathcal{T}|$ is $\binom{F}{F(1-\epsilon)/2}$, where $\frac{F(1-\epsilon)}{2}$ is assumed to be an integer. Then the multisymbol for each $T = t$ has representatives in the sets $\mathcal{X}_s$, where $s \in \left[\frac{F(1-\epsilon)}{2}, \frac{F(1+\epsilon)}{2}\right]$. If a state $s$ outside of that interval occurs, then an arbitrary $x$ is sent. With this strategy, there are some $x \in \mathcal{X}_s$ with $s > \frac{F(1-\epsilon)}{2}$ that are unused, but this is asymptotically negligible, and it can be shown that

$$ \lim_{F \to \infty} \frac{C_F}{F} = (1 - p) \tag{32} $$
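The typicality step in the argument above can be illustrated numerically. The following sketch assumes that, for $a = \frac{1}{2}$, the frame state $s$ is binomially distributed with parameters $F$ and $\frac{1}{2}$ (an assumption made here for illustration), and computes the exact probability that $s$ falls in the window $\left[\frac{F(1-\epsilon)}{2}, \frac{F(1+\epsilon)}{2}\right]$ for a fixed $\epsilon$. The function name is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb, floor

def prob_state_in_window(F, eps):
    """P(F(1-eps)/2 <= s <= F(1+eps)/2) for s ~ Binomial(F, 1/2) (assumed model)."""
    lo = ceil(F * (1 - eps) / 2)
    hi = floor(F * (1 + eps) / 2)
    favourable = sum(comb(F, s) for s in range(lo, hi + 1))
    return favourable / 2**F

eps = Fraction(1, 5)  # exact rational avoids rounding at the window edges
for F in (10, 100, 1000):
    print(F, prob_state_in_window(F, eps))
```

For fixed $\epsilon$ the probability approaches one as $F$ grows, which is why states outside the window, where an arbitrary $x$ is sent, contribute negligibly in the limit.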
