
Index Coding with Side Information

Ziv Bar-Yossef∗

Yitzhak Birk†

T. S. Jayram‡

Tomer Kol§

Abstract

Motivated by a problem of transmitting data over broadcast channels (Birk and Kol, INFOCOM 1998), we study the following coding problem: a sender communicates with n receivers R1,...,Rn. He holds an input x ∈ {0,1}n and wishes to broadcast a single message so that each receiver Ri can recover the bit xi. Each Ri has prior side information about x, induced by a directed graph G on n nodes; Ri knows the bits of x in the positions {j | (i,j) is an edge of G}. We call encoding schemes that achieve this goal INDEX codes for {0,1}n with side information graph G.

In this paper we identify a measure on graphs, the

minrank, which we conjecture to exactly characterize

the minimum length of INDEX codes. We resolve the

conjecture for certain natural classes of graphs. For ar-

bitrary graphs, we show that the minrank bound is tight

for both linear codes and certain classes of non-linear

codes. For the general problem, we obtain a (weaker)

lower bound that the length of an INDEX code for any

graph G is at least the size of the maximum acyclic in-

duced subgraph of G.

1. Introduction

Source coding is one of the central areas of coding

and information theory. Shannon’s famous source cod-

ing theorem states that the average number of bits nec-

essary and sufficient to encode a source is equal (up to

one bit) to the entropy of the source. In many distributed

applications, though, the receiver may have some prior

∗Department of Electrical Engineering, Technion, Haifa 32000, Is-

rael. Email: zivby@ee.technion.ac.il. Supported by the Eu-

ropean Commission Marie Curie International Re-integration Grant.

†Department of Electrical Engineering, Technion, Haifa 32000, Is-

rael. Email: birk@ee.technion.ac.il.

‡IBM Almaden Research Center, 650 Harry Road, San Jose 95120,

CA, USA. Email: jayram@almaden.ibm.com.

§Department of Electrical Engineering, Technion, Haifa 32000, Is-

rael. Email: tomer@tx.technion.ac.il.

side information about x, before it is sent. Source coding

with side information addresses encoding schemes that

exploit the side information in order to reduce the length

of the code. Classical results in this area [16, 19, 18]

describe how to achieve optimal rates with respect to the

joint entropy of the source and the side information.

Witsenhausen [17] initiated the study of the zero-

error side information problem. For every source input

x ∈ X, the receiver gets an input y ∈ Y that gives some

information about x. This is captured by restricting the

pairs (x,y) to belong to a fixed set L ⊆ X×Y. Both the

sender and the receiver know L, and thus each of them,

given his own input, has information about the other’s

input. Witsenhausen showed that fixed-length side in-

formationcodeswereequivalent tocoloringsofarelated

objectcalledtheconfusiongraph, andthusthelogarithm

of the chromatic number of this graph tightly charac-

terizes the minimum number of bits needed to encode

the source. Further results by Alon and Orlitsky [2] and

Koulgi et al. [12] showed that graph-theoretic informa-

tion measures could be used to characterize both the av-

erage length of variable-length codes, as well as asymp-

totic rates of codes that simultaneously encode multiple

inputs drawn from the same source.

In this paper, we study a new variant of source coding

with side information, first proposed by Birk and Kol [6]

in the context of a server that disseminates a set of data

blocks (e.g., the daily newspaper) over a broadcast chan-

nel (e.g., satellite or coaxial cable) to a set of caching

clients. At the end of the main transmission, each client

possesses some subset of the transmitted blocks, be it

due to intermittent reception, “interest filters” or limited

storage capacity. Also, any given client is only interested

in some subset of the blocks, and requests retransmis-

sion of those blocks that it needs but does not possess.

There is no communication among clients, but a (slow)

“backward” channel can be used by a client to send re-

quests and metadata to the server. Each client requests

a subset of the data blocks, and advises the server of

the data blocks already available in its cache. Assum-

ing large blocks and in view of the fact that the amount


of metadata per block is independent of block size, the

challenge is to minimize the amount of supplemental in-

formation that must be broadcast by the server in order

to enable every client to derive all its requested blocks.

Birk and Kol [6] suggested the idea of coding on de-

mand by an informed source (ISCOD). With ISCOD, the

server uses its knowledge of the cache contents and re-

quested blocks of each client along with a systematic

erasure correcting code (e.g., Reed-Solomon) to derive

a set of supplemental data blocks that would jointly en-

able every client to derive its requested blocks. The

supplemental blocks are then transmitted. Each client

uses a subset of the received supplemental blocks along

with some of its cached blocks to derive its requested

block(s). Instance-specific upper bounds on the amount

of data that must be transmitted are presented, along

with some heuristic algorithms. The bounds are nev-

ertheless shown not to be tight. No lower bounds are

presented. Finally, [6] presents a two-way protocol for

exchanging control information between the server and

the clients.

A client may request multiple blocks. With a broad-

cast channel, however, this is equivalent to multiple

single-request clients, each with the same cache content

as the original one, and is so represented. In [6], it is

pointed out that when a given block is requested by mul-

tiple clients, the main communication savings is through

only transmitting it once. Both [6] and the current paper

only address the case of unique requests.

The above scenario is formalized as a source coding

with side information problem as follows (cf. [6]). There

is a sender who has an input x from a source alphabet

X (in this paper we confine ourselves to the alphabet

X = {0,1}n). There are n receivers R1,...,Rn, where

for each i, Ri is interested in the bit xi. The side information is characterized by a simple directed graph G

(no self loops or parallel edges) on {1,2,...,n}. For

a subset S ⊆ [n], x[S] denotes the projection of x on

the coordinates in S. The side information of Ri equals x[N(i)], where N(i) ≜ {j ∈ V | (i,j) is an edge} denotes the set of out-neighbors of i in the graph G.

Example 1. Let R1,R2,...,Rn be the n receivers (clients) over a broadcast channel whose source alphabet is X = {0,1}n. For an input (data) x ∈ X, each receiver Ri is interested in the value xi (requested block) but knows xi−1 as side information (cached block). (Abusing notation slightly, receiver R1 knows xn.) The side information graph is thus a directed cycle of length n. Since xi−1 is "independent" of xi, it may not be clear

at first how the sender (server) can take advantage of the

side information of the receivers to shorten the broadcast. However, there is a strategy in which the sender can save one bit: rather than send all the bits of x, the sender broadcasts the n − 1 parities x1⊕x2, x2⊕x3, ..., xn−1⊕xn. Now, each receiver Ri for i > 1 can recover xi by taking the parity of xi−1⊕xi with xi−1. The receiver R1, on the other hand, just XORs the n − 1 parities broadcast by the sender together with xn to recover x1.
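The cycle scheme above is easy to simulate. A minimal sketch (0-based indices, so receiver i holds x[(i−1) mod n]; the function names are ours, not the paper's):

```python
from functools import reduce

def encode(x):
    # broadcast the n-1 parities x0^x1, x1^x2, ..., x_{n-2}^x_{n-1}
    return [x[i] ^ x[i + 1] for i in range(len(x) - 1)]

def decode(i, msg, side_bit, n):
    # receiver i holds side_bit = x[(i-1) % n] and sees the broadcast msg
    if i > 0:
        return msg[i - 1] ^ side_bit  # (x_{i-1} ^ x_i) ^ x_{i-1} = x_i
    # receiver 0: the n-1 parities telescope to x_0 ^ x_{n-1}
    return reduce(lambda a, b: a ^ b, msg) ^ side_bit

n = 6
x = [1, 0, 1, 1, 0, 0]
msg = encode(x)  # only n-1 = 5 bits on the channel
assert all(decode(i, msg, x[(i - 1) % n], n) == x[i] for i in range(n))
```

The telescoping in the last decoder is exactly the "XOR all parities together with xn" step described above.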

Definition 2 (INDEX codes). A deterministic INDEX code C for {0,1}n with side information graph G on n nodes, abbreviated as "INDEX code for G", is a set of codewords in {0,1}ℓ together with:

1. An encoding function E mapping inputs in {0,1}n to codewords, and

2. A set of decoding functions D1,D2,...,Dn such that Di(E(x),x[N(i)]) = xi for every i.

The graph G is known in advance to the sender and the receivers; thus the encoding and decoding functions typically depend on G. The length of C, denoted by len(C), is defined to be ℓ.

The above problem can also be cast in an equivalent

setting with a single receiver: The receiver is given an

index i and the side information x[N(i)] as inputs and

wants to recover the value xi. (The equivalence fol-

lows from the fact the sender does not know the index

i given to the receiver, and thus has to use an encoding

that allows recovering xi, for any i.) Using this equiva-

lent form, we can contrast our side information problem

with Witsenhausen’s zero-error side information prob-

lem. A first notable difference is that while in Witsen-

hausen’s setting the entire input x has to be recovered,

in our setting only a single bit xi is needed. This allows significant savings in the encoding length, as the

following example demonstrates: Suppose the side in-

formation graph is a perfect matching on n nodes. Since

the receiver has only a single bit of side information,

then n − 1 bits are necessary to recover the entire input.

On the other hand, if only a single bit is needed, then

the sender can encode his input by the n/2 parities of

pairs of matched bits. A second difference from Wit-

senhausen’s setting is that the type of side information

addressed in our problem is restricted to side informa-

tion graphs. This natural restriction emanates from the

broadcastapplicationmentionedaboveandalsoimposes

more structure that enables us to obtain an interesting

combinatorial characterization of the minimum length

of INDEX codes in terms of the side information graphs.

We also consider in this paper randomized INDEX

codes, in which the encoding and decoding functions are

allowed to be randomized and are even allowed to use a


common public random string. Decoding needs to suc-

ceed only with high probability, taken over the random

choices made by the encoding and decoding functions.

Our contributions. In this paper we identify a graph functional, called minrank, which we show to characterize the minimum length of INDEX codes, for natural types of codes and for wide classes of side information graphs. Let G be a directed graph on n vertices without self-loops. We say that a 0-1 matrix A = (aij) fits G if for all i and j: (i) aii = 1, and (ii) aij = 0 whenever (i,j) is not an edge of G. Thus, A − I is the adjacency matrix of an edge subgraph of G, where I denotes the identity matrix. Let rk2(·) denote the 2-rank of a 0-1 matrix, namely, its rank over the field GF(2).

Definition 3. minrk2(G) ≜ min{rk2(A) : A fits G}
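For small graphs, minrk2 can be evaluated directly from the definition by searching over all matrices that fit G. A brute-force sketch (exponential in the number of edges, so only for toy instances; rows are bitmasks):

```python
from itertools import product

def gf2_rank(rows):
    # Gaussian elimination over GF(2); each row is an integer bitmask
    pivots, rank = {}, 0
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead in pivots:
                r ^= pivots[lead]
            else:
                pivots[lead] = r
                rank += 1
                break
    return rank

def minrk2(n, edges):
    # a_ii is forced to 1; a_ij may be set to 1 only when (i,j) is an edge of G
    free = sorted(set(edges))
    best = n
    for bits in product([0, 1], repeat=len(free)):
        rows = [1 << i for i in range(n)]  # the forced diagonal
        for (i, j), b in zip(free, bits):
            if b:
                rows[i] |= 1 << j
        best = min(best, gf2_rank(rows))
    return best

directed_c5 = [(i, (i + 1) % 5) for i in range(5)]
undirected_c5 = directed_c5 + [(j, i) for (i, j) in directed_c5]
assert minrk2(5, directed_c5) == 4    # directed cycle: exactly one bit saved
assert minrk2(5, undirected_c5) == 3  # pentagon: minrank n+1 = 3 for length 2n+1 = 5
```

Since computing minrk2(G) is NP-hard (Peeters [14]), exhaustive search of this kind is only feasible for very small instances.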

The above measure for undirected graphs was con-

sidered by Haemers [11] in the context of proving

bounds for the Shannon capacity Θ of undirected

graphs. For an undirected graph G whose adjacency

matrix is M, the 2-rank of M + I (which fits G) has

also been studied in the algebraic graph theory com-

munity. For example, Brouwer and van Eijl [7] and

Peeters [15] study this quantity for strongly regular and

distance-regular graphs, respectively. It has been shown

by Peeters [14] that computing minrk2(G) is NP-hard.

Finally, it is known that minrk2 has the "sandwich property", similar to other natural quantities such as the Lovász Theta function:

Proposition 4 ([10, 11]). For any undirected graph G, α(G) ≤ Θ(G) ≤ minrk2(G) ≤ χ(Ḡ), where Ḡ denotes the complement of G. Moreover, each of these inequalities can be strict.

Our first result (see Section 3) shows that minrk2(G)

completely characterizes the minimum length of linear

INDEX codes (i.e., ones whose encoding function is lin-

ear), for arbitrary directed side information graphs G:

Theorem 5. The optimal length of a linear INDEX code

for a side information graph G equals minrk2(G).

The upper bound in the above theorem strictly improves a previous result of Birk and Kol [6]. Birk and Kol showed a construction of a linear INDEX code, whose length is the "cover cost" of the side information graph (and showed that the construction is suboptimal). For undirected graphs, the cover cost is the same as the chromatic number of the complement graph. Since the minrank can be strictly smaller than this chromatic number, it immediately follows that this bound beats the Birk and Kol bound. The lower bound for linear codes is of interest, since linear codes are possibly the most natural type of codes. In fact, all the existing INDEX codes (with or without side information) we are aware of are linear.

Our second contribution is a lower bound which

holds for general INDEX codes including deterministic

and randomized INDEX codes. This result is presented

in Section 4.

Theorem 6. The length of any δ-error randomized

INDEX code for G is at least MAIS(G) · (1 − H2(δ)),

where MAIS(G) is the size of the maximum acyclic in-

duced subgraph of G and H2(·) is the binary entropy

function.

If G is undirected, then MAIS(G) equals the size of the largest independent set in G, i.e., α(G). Given the gap between α(G) and minrk2(G) mentioned above, a natural question is whether minrk2(G) characterizes the optimal length of general INDEX codes for general graphs G.
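The quantity MAIS(G) in Theorem 6 can also be computed by brute force on small instances. A sketch that tests every vertex subset for acyclicity via Kahn's algorithm (our own helper names):

```python
from itertools import combinations

def is_acyclic(nodes, edges):
    # Kahn's algorithm on the subgraph induced by `nodes`
    nodes = set(nodes)
    indeg = {v: 0 for v in nodes}
    adj = {v: [] for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].append(v)
            indeg[v] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == len(nodes)  # all vertices peeled off => no cycle

def mais(n, edges):
    # size of the maximum acyclic induced subgraph
    for size in range(n, 0, -1):
        if any(is_acyclic(s, edges) for s in combinations(range(n), size)):
            return size
    return 0

directed_c5 = [(i, (i + 1) % 5) for i in range(5)]
assert mais(5, directed_c5) == 4  # matches the n-1 bits used in Example 1
bidirected_c5 = directed_c5 + [(j, i) for (i, j) in directed_c5]
assert mais(5, bidirected_c5) == 2  # undirected case: the independence number
```

Note how for the directed cycle of Example 1 the bound MAIS(G) = n − 1 matches the n − 1 parities actually broadcast, so Theorem 6 is tight there.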

Conjecture 7. The optimal length of a general INDEX

code for G equals minrk2(G), i.e. linear codes achieve

the optimal length over all codes for G.

In Section 5 we give supporting evidence for this conjecture by proving that minrk2(G) is a lower bound on the minimum length of a wide class of non-linear codes. An INDEX code is called linearly-decodable, if all its n decoding functions are linear. A linearly-decodable code need not be linearly encodable. A simple argument shows that the length of a linearly-decodable INDEX code for any graph G is at least minrk2(G). We relax the notion of linearly-decodable

codes to “semi-linearly-decodable” codes. An INDEX

code is k-linearly-decodable, if at least k of its decod-

ing functions are linear. Note that n-linearly-decodable

codes are simply linearly-decodable, while 0-linearly-

decodable codes are unrestricted. We are able to prove

the conjecture for k-linearly-decodable codes when k ≥

n − 2:

Theorem 8. For any graph G, and for any k ≥ n − 2,

the length of any k-linearly-decodable INDEX code for

G is at least minrk2(G).

Our lower bound for general codes (Theorem 6)

immediately gives tight bounds for directed acyclic

graphs and undirected graphs G that satisfy α(G) = minrk2(G) = χ(Ḡ). In particular, they hold for perfect

graphs1. In Section 6, we are able to prove that min-

rank characterizes the minimum length of INDEX codes,


1Recall that an undirected graph G is called perfect, if for every induced subgraph G′ of G, ω(G′) = χ(G′). Perfect graphs include a wide class of graphs such as trees, bipartite graphs, interval graphs, chordal graphs, etc.


even for non-perfect graphs, namely odd holes (undi-

rected odd-length cycles of length at least 5) and odd

anti-holes (complements of odd holes).

Theorem 9. Let G be any graph, which is either a

DAG, a perfect graph, an odd hole, or an odd anti-hole.

Then, the length of any INDEX code for G is at least

minrk2(G).

Finally, we consider the following natural direct sum-

type problem: If a graph G has k connected components

G1,...,Gk, then is the length of the best INDEX code

for G equal to the sum of the lengths of the best codes

for G1,...,Gk? The answer should intuitively be affir-

mative, but a direct proof seems to be elusive. In fact,

using the techniques of Feder et al. [9], one can show a

connection between the two, but incurring a loss of an

additive term that depends linearly on k. After lower

bounding the length of a code by its information cost,

we are able to prove a tight direct sum theorem w.r.t.

the information cost measure. We note that almost all

our lower bounds hold not only for the length of INDEX

codes but also for their information cost. This result is

presented in Section 4.

Techniques. The many results presented in this paper

required us to resort to a multitude of techniques from

linear algebra, information theory, Fourier analysis, and

combinatorics.

The lower bounds for linearly-encodable and

linearly-decodable codes are based on dimension argu-

ments from linear algebra. To extend the lower bound

for linearly-decodable codes to semi-linearly-decodable

codes, we used an intriguing “balance property” of

Boolean functions: If all linear Boolean functions are

“balanced” on some set U (i.e., get the same number

of 0’s and 1’s on the set), then all Boolean functions

(whether linear or not) are balanced on U. To prove

this property, we use Fourier analysis to represent arbi-

trary Boolean functions as linear combinations of linear

functions. We then introduce the notion of “conditional

minrank” of a Boolean matrix and explore its proper-

ties using the balance property. This in turn allows us

to extend the lower bound for linearly-decodable codes

to (n − 2)-linearly-decodable codes. Extension of the

proof technique to hold for k-linearly-decodable codes,

for k < n−2, would require better understanding of the

conditional minrank measure.

The lower bound for general (randomized) codes and

the direct sum theorem are proved via information theory arguments. We extend previous arguments from

[5, 4] to obtain a direct sum theorem for the informa-

tion cost of codes.


Finally, our lower bounds for odd holes and odd anti-

holes are purely combinatorial. We employ a connection

between vertex covers of a graph G and the structure of

the confusion graph corresponding to the INDEX coding

for G. We note that dealing with odd holes, and with the

pentagon in particular, turned out to be very challenging,

because the standard technique of lower bounding the

chromatic number of the corresponding confusion graph

via its independence number does not work.

Related work. There are settings other than source coding in which INDEX codes have been addressed. Ambainis et al. [3] considered the so called "random access codes"2, which are identical to randomized INDEX codes

without side information. Their main thrust was proving

tight bounds on the length of the codes in the quantum

setting, where inputs can be encoded by qubits rather

than classical bits; their result applied to the classical

setting is a special case of our Theorem 6 for the case

when G is the empty graph.

The problem of INDEX coding with side information

can also be cast as a one-way communication complex-

ity problem of the indexing function [13] (from which

the term INDEX codes was coined) with the additional

twist of side information. Alice (the sender) is given an input x and sends a single message to Bob. Bob is given

an index i and the side information x[N(i)], and wants

to learn xi. Another formulation of INDEX coding is in

terms of network coding [20, 1]. As such, it represents a

restricted case of a single source, a single encoder and a

single channel, but with the important addition of a spe-

cial flavor of side information. Parts of this information

are known to different decoders, and the encoder is fully

aware of this knowledge.


Notation. Throughout the paper, we use the following notations. Let [n] denote the set {1,2,...,n}. Let ei

denote the i-th standard basis vector. The dimensions

of these vectors are understood from the context. For a

subset S ⊆ [n], we denote by x[S] the projection of a

vector x on the coordinates in S.


2. Sandwich property of minrank

We start with an observation relating minrank to other

well-known graph measures.

2We chose the term INDEX codes to avoid confusion since the term

“random access” denotes a different concept in the information theory

community.


Proposition 4 (restated) For any undirected graph G, α(G) ≤ Θ(G) ≤ minrk2(G) ≤ χ(Ḡ). Moreover, each of these inequalities can be strict.

Proof. Fix an optimal coloring of Ḡ. Define the 0-1 matrix A by Aij = 1 if i and j get the same color, and 0 otherwise. Vertices sharing a color in Ḡ form a clique in G, so the matrix A fits G, and rk2(A) = χ(Ḡ). Hence, minrk2(G) ≤ rk2(A) = χ(Ḡ).

Recall that the Shannon capacity Θ(G) of a graph G is defined as lim_{k→∞} α(G^k)^{1/k}. Here G^k denotes the (strong) k-th power of G, where there is an edge between distinct (u1,u2,...,uk) and (v1,v2,...,vk) if and only if for all j either uj = vj or uj is connected to vj in G. It can be verified that G^k has an independent set of size α(G)^k, so Θ(G) ≥ α(G).

Suppose A fits G such that rk2(A) = minrk2(G). It can be verified that the k-th matrix tensor power of A, denoted by A^⊗k, fits G^k. Since A^⊗k has a square identity sub-matrix corresponding to a largest independent set in G^k, we have α(G^k) ≤ rk2(A^⊗k). It is well known that rk2(B^⊗k) = rk2(B)^k for any matrix B, so rk2(A^⊗k) = rk2(A)^k = minrk2(G)^k. Taking the k-th root on both sides and letting k → ∞ proves the required bound.

From the results in [10], it is known that the family of symplectic graphs Gn with parameter n satisfies minrk2(Gn) = 2n + 1 whereas χ(Ḡn) = 2^n + 1, exhibiting a large gap between these two measures. In contrast, the largest gap between minrk2(G) and α(G) of which we are aware is via odd cycles: for a cycle of length 2n+1, its minrank equals n+1 whereas its independence number equals n. Lovász's classic paper, which introduced the θ-function, shows that the Shannon capacity of the 5-cycle equals √5, which is strictly smaller than its minrank.
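For the pentagon, the three combinatorial quantities in the sandwich (the independence number, the minrank, and the chromatic number of the complement) can all be checked exhaustively. A brute-force sketch, with all helper names our own:

```python
from itertools import combinations, product

def gf2_rank(rows):
    # Gaussian elimination over GF(2); rows are bitmasks
    pivots, rank = {}, 0
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead in pivots:
                r ^= pivots[lead]
            else:
                pivots[lead] = r
                rank += 1
                break
    return rank

n = 5
edges = {frozenset((i, (i + 1) % n)) for i in range(n)}  # the pentagon C5

def alpha():
    # independence number: largest subset containing no edge
    return max(k for k in range(n + 1)
               if any(all(frozenset(p) not in edges for p in combinations(s, 2))
                      for s in combinations(range(n), k)))

def chrom_complement():
    # chromatic number of the complement, by trying k-colorings
    comp = {frozenset(p) for p in combinations(range(n), 2)} - edges
    for k in range(1, n + 1):
        for col in product(range(k), repeat=n):
            if all(col[i] != col[j] for i, j in (tuple(e) for e in comp)):
                return k

def minrk2():
    # diagonal forced to 1; off-diagonal 1s allowed only on cycle edges
    free = [(i, j) for i in range(n) for j in range(n)
            if i != j and frozenset((i, j)) in edges]
    best = n
    for bits in product([0, 1], repeat=len(free)):
        rows = [1 << i for i in range(n)]
        for (i, j), b in zip(free, bits):
            rows[i] |= b << j
        best = min(best, gf2_rank(rows))
    return best

# sandwich for the pentagon: 2 <= Theta = sqrt(5) <= 3 <= 3
assert (alpha(), minrk2(), chrom_complement()) == (2, 3, 3)
```

The Shannon capacity itself (√5 for the pentagon) is not computed here; it is the one quantity in the chain with no known finite exhaustive procedure.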

3. Linear codes

In this section we obtain a tight characterization of

the length of linear INDEX codes for all side information

graphs G.

Theorem 5 (restated) The optimal length of a linear INDEX code for a side information graph G equals minrk2(G).

Proof. Let A be the matrix that fits G whose rank equals minrk2(G) ≜ k. Assume without loss of generality that the span of the first k rows A1,...,Ak equals the span of all the rows of A. The encoding function is simply the k bits bj ≜ Aj · x for 1 ≤ j ≤ k.

Decoding proceeds as follows. Fix a receiver Ri for some i ∈ [n] and let Ai = Σ_{j=1}^{k} λj Aj for some choice of λj's. The receiver first computes Ai · x = Σ_{j=1}^{k} λj bj using the k-bit encoding of x. Now, consider the vector ci = Ai − ei, where ei is the i-th standard basis vector. Observe that the only non-zero entries in ci correspond to coordinates which are among the neighbors of i in G. This means that the receiver can compute ci · x using the side information. Receiver Ri can now recover xi via (Ai · x) − (ci · x) = ei · x = xi.

For the lower bound, suppose C is an arbitrary linear INDEX code for G defined by the set S = {u1,u2,...,uk}, i.e. x is encoded by taking its inner product with each vector in S.

Claim 10. For every i, ei belongs to the span of S ∪ {ej : j ∈ N(i)}.

Before we prove the claim, we show how to finish the proof of the lower bound. Fix an i ∈ [n]; the claim shows that ei = Σ_{j=1}^{k} λj uj + Σ_{j∈N(i)} µj ej, for some choice of λ and µ. Rearranging, we have ei − Σ_{j∈N(i)} µj ej ≜ Ai = Σ_j λj uj. It follows that Ai has value 0 in all coordinates outside N(i) ∪ {i}, value 1 in coordinate i, and that Ai belongs to the span of S. Therefore, the matrix A whose rows are given by A1,A2,...,An fits G and has rank at most k. We conclude that k ≥ rk2(A) ≥ minrk2(G).

It remains to prove the claim. Fix an i and suppose to the contrary that ei is not in the subspace W spanned by the vectors in S ∪ {ej : j ∈ N(i)}. Recall that the dual of W, denoted by W⊥, is the set of vectors orthogonal to every vector in W, i.e., W⊥ = {v : v · w = 0 for all w ∈ W}. It is well-known that W⊥⊥ = W. Therefore, the assumption ei ∉ W implies that there is a vector x ∈ W⊥ such that

x · ei ≠ 0.    (∗)

On the other hand, since x ∈ W⊥, we have that x is orthogonal to every vector in S ∪ {ej : j ∈ N(i)}. It follows that (i) the encoding for x equals 0^k, and (ii) the side information xj available to receiver Ri equals 0 for all j ∈ N(i). This violates the correctness of the encoding because the input 0^n also satisfies (i) and (ii), yet Equation (∗) shows that it differs from x in coordinate i.
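The encoding and decoding steps of this proof can be exercised directly. A sketch for the pentagon, using a rank-3 fitting matrix built from the clique cover {0,1}, {2,3}, {4} (the particular matrix and cover are our illustrative choices, not taken from the paper):

```python
from itertools import product

n = 5
N = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}  # receiver i sees both neighbors

# A fits the pentagon (a_ii = 1, off-diagonal 1s only on cycle edges), rk2(A) = 3
A = [[1, 1, 0, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 1, 1, 0],
     [0, 0, 1, 1, 0],
     [0, 0, 0, 0, 1]]
basis = [A[0], A[2], A[4]]  # rows spanning the row space of A

def dot(u, v):
    # inner product over GF(2)
    return sum(a & b for a, b in zip(u, v)) & 1

def encode(x):
    # broadcast k = 3 bits b_j = A_j . x, one per basis row
    return [dot(row, x) for row in basis]

def decode(i, b, side):
    # express A_i as a GF(2) combination of the basis rows (k is tiny: brute force)
    for coeffs in product([0, 1], repeat=len(basis)):
        combo = [0] * n
        for c, row in zip(coeffs, basis):
            if c:
                combo = [u ^ v for u, v in zip(combo, row)]
        if combo == A[i]:
            break
    Ai_x = sum(c & bj for c, bj in zip(coeffs, b)) & 1  # A_i . x from the broadcast
    # c_i = A_i - e_i is supported on N(i), so c_i . x comes from the side information
    ci_x = 0
    for j in range(n):
        if j != i and A[i][j]:
            ci_x ^= side[j]
    return Ai_x ^ ci_x  # (A_i . x) - (c_i . x) = x_i

for x in product([0, 1], repeat=n):
    b = encode(x)
    assert all(decode(i, b, {j: x[j] for j in N[i]}) == x[i] for i in range(n))
```

Three broadcast bits suffice for all 32 inputs, matching minrk2 = 3 for the pentagon; the two subtraction steps of the proof become XORs since the field is GF(2).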

4. General codes

In this section, we prove lower bounds for the class

of general randomized INDEX codes. The main techni-

cal statement is a direct-sum result for the information

cost of a randomized INDEX code. See [8] for the basic

information theory notions and facts used in this section.