Index Coding with Side Information
Ziv Bar-Yossef∗
Yitzhak Birk†
T. S. Jayram‡
Tomer Kol§
Abstract
Motivated by a problem of transmitting data over broadcast channels (Birk and Kol, INFOCOM 1998), we study the following coding problem: a sender communicates with n receivers R1,...,Rn. He holds an input x ∈ {0,1}^n and wishes to broadcast a single message so that each receiver Ri can recover the bit xi. Each Ri has prior side information about x, induced by a directed graph G on n nodes; Ri knows the bits of x in the positions {j | (i,j) is an edge of G}. We call encoding schemes that achieve this goal INDEX codes for {0,1}^n with side information graph G.
In this paper we identify a measure on graphs, the
minrank, which we conjecture to exactly characterize
the minimum length of INDEX codes. We resolve the
conjecture for certain natural classes of graphs. For ar-
bitrary graphs, we show that the minrank bound is tight
for both linear codes and certain classes of non-linear
codes. For the general problem, we obtain a (weaker)
lower bound that the length of an INDEX code for any
graph G is at least the size of the maximum acyclic in-
duced subgraph of G.
1. Introduction
Source coding is one of the central areas of coding
and information theory. Shannon’s famous source cod-
ing theorem states that the average number of bits nec-
essary and sufficient to encode a source is equal (up to
one bit) to the entropy of the source. In many distributed
applications, though, the receiver may have some prior
∗Department of Electrical Engineering, Technion, Haifa 32000, Is-
rael. Email: zivby@ee.technion.ac.il. Supported by the Eu-
ropean Commission Marie Curie International Re-integration Grant.
†Department of Electrical Engineering, Technion, Haifa 32000, Is-
rael. Email: birk@ee.technion.ac.il.
‡IBM Almaden Research Center, 650 Harry Road, San Jose 95120,
CA, USA. Email: jayram@almaden.ibm.com.
§Department of Electrical Engineering, Technion, Haifa 32000, Is-
rael. Email: tomer@tx.technion.ac.il.
side information about x, before it is sent. Source coding
with side information addresses encoding schemes that
exploit the side information in order to reduce the length
of the code. Classical results in this area [16, 19, 18]
describe how to achieve optimal rates with respect to the
joint entropy of the source and the side information.
Witsenhausen [17] initiated the study of the zero-
error side information problem. For every source input
x ∈ X, the receiver gets an input y ∈ Y that gives some
information about x. This is captured by restricting the
pairs (x,y) to belong to a fixed set L ⊆ X×Y. Both the
sender and the receiver know L, and thus each of them,
given his own input, has information about the other’s
input. Witsenhausen showed that fixed-length side information codes were equivalent to colorings of a related object called the confusion graph, and thus the logarithm of the chromatic number of this graph tightly characterizes the minimum number of bits needed to encode
the source. Further results by Alon and Orlitsky [2] and
Koulgi et al. [12] showed that graph-theoretic informa-
tion measures could be used to characterize both the av-
erage length of variable-length codes, as well as asymp-
totic rates of codes that simultaneously encode multiple
inputs drawn from the same source.
In this paper, we study a new variant of source coding
with side information, first proposed by Birk and Kol [6]
in the context of a server that disseminates a set of data
blocks (e.g., the daily newspaper) over a broadcast chan-
nel (e.g., satellite or coaxial cable) to a set of caching
clients. At the end of the main transmission, each client
possesses some subset of the transmitted blocks, be it
due to intermittent reception, “interest filters” or limited
storage capacity. Also, any given client is only interested
in some subset of the blocks, and requests retransmis-
sion of those blocks that it needs but does not possess.
There is no communication among clients, but a (slow)
“backward” channel can be used by a client to send re-
quests and metadata to the server. Each client requests
a subset of the data blocks, and advises the server of
the data blocks already available in its cache. Assum-
ing large blocks and in view of the fact that the amount
of metadata per block is independent of block size, the
challenge is to minimize the amount of supplemental in-
formation that must be broadcast by the server in order
to enable every client to derive all its requested blocks.
Birk and Kol [6] suggested the idea of coding on de-
mand by an informed source (ISCOD). With ISCOD, the
server uses its knowledge of the cache contents and re-
quested blocks of each client along with a systematic
erasure correcting code (e.g., Reed-Solomon) to derive
a set of supplemental data blocks that would jointly en-
able every client to derive its requested blocks. The
supplemental blocks are then transmitted. Each client
uses a subset of the received supplemental blocks along
with some of its cached blocks to derive its requested
block(s). Instance-specific upper bounds on the amount
of data that must be transmitted are presented, along
with some heuristic algorithms. The bounds are nev-
ertheless shown not to be tight. No lower bounds are
presented. Finally, [6] presents a two-way protocol for
exchanging control information between the server and
the clients.
A client may request multiple blocks. With a broad-
cast channel, however, this is equivalent to multiple
single-request clients, each with the same cache content
as the original one, and is so represented. In [6], it is
pointed out that when a given block is requested by multiple clients, the main communication savings comes from transmitting it only once. Both [6] and the current paper
only address the case of unique requests.
The above scenario is formalized as a source coding
with side information problem as follows (cf. [6]). There is a sender who has an input x from a source alphabet X (in this paper we confine ourselves to the alphabet X = {0,1}^n). There are n receivers R1,...,Rn, where for each i, Ri is interested in the bit xi. The side information is characterized by a simple directed graph G (no self-loops or parallel edges) on {1,2,...,n}. For a subset S ⊆ [n], x[S] denotes the projection of x on the coordinates in S. The side information of Ri equals x[N(i)], where N(i) ≜ {j ∈ V | (i,j) is an edge} denotes the set of out-neighbors of i in the graph G.
Example 1. Let R1,R2,...,Rn be the n receivers
(clients) over a broadcast channel whose source alphabet is X = {0,1}^n. For an input (data) x ∈ X, each receiver
Ri is interested in the value xi (requested block) but
knows xi−1 as side information (cached block). (Abusing notation slightly, receiver R1 knows xn.) The side information graph is thus a directed cycle of length n.
Since xi−1 is “independent” of xi, it may not be clear
at first how the sender (server) can take advantage of the
side information of the receivers to shorten the broad-
cast. However, there is a strategy in which the sender can save one bit: rather than send all the bits of x, the sender broadcasts the n−1 parities x1⊕x2, x2⊕x3, ..., xn−1⊕xn. Now, each receiver Ri for i > 1 can recover xi by taking the parity of xi−1⊕xi with xi−1. The receiver R1, on the other hand, just XORs the n−1 parities broadcast by the sender together with xn to recover x1.
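The cycle scheme in Example 1 can be sketched in a few lines of Python (a toy illustration with 0-based indices, so receiver i knows x[i−1] and receiver 0 knows x[n−1]; the function names are ours):

```python
from functools import reduce

def encode_cycle(x):
    """Broadcast the n-1 parities x[0]^x[1], x[1]^x[2], ..., x[n-2]^x[n-1]."""
    return [x[i] ^ x[i + 1] for i in range(len(x) - 1)]

def decode_cycle(i, parities, side_bit):
    """Receiver i holds x[i-1] as side_bit (receiver 0 holds x[n-1])."""
    if i > 0:
        return parities[i - 1] ^ side_bit  # (x[i-1]^x[i]) ^ x[i-1] = x[i]
    # For receiver 0, the XOR of all the parities telescopes to x[0]^x[n-1].
    return reduce(lambda a, b: a ^ b, parities) ^ side_bit

x = [1, 0, 1, 1, 0]
msg = encode_cycle(x)  # 4 bits broadcast instead of 5
assert decode_cycle(0, msg, x[-1]) == x[0]
assert all(decode_cycle(i, msg, x[i - 1]) == x[i] for i in range(1, len(x)))
```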
Definition 2 (INDEX codes). A deterministic INDEX code C for {0,1}^n with side information graph G on n nodes, abbreviated as “INDEX code for G”, is a set of codewords in {0,1}^ℓ together with:
1. An encoding function E mapping inputs in {0,1}^n to codewords, and
2. A set of decoding functions D1,D2,...,Dn such that Di(E(x),x[N(i)]) = xi for every i.
The graph G is known in advance to the sender and the receivers; thus the encoding and decoding functions typically depend on G. The length of C, denoted by len(C), is defined to be ℓ.
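Since the definition is finite, the decoding condition Di(E(x),x[N(i)]) = xi can be checked exhaustively. The sketch below is our own helper (G is assumed to be given as an out-neighbor list), brute-forcing all 2^n inputs:

```python
from itertools import product

def is_index_code(n, N, E, D):
    """Check Di(E(x), x[N(i)]) == x[i] for every x in {0,1}^n and every i.

    N[i] lists the out-neighbors of i, E maps a bit tuple to a codeword,
    and D[i] maps (codeword, side-information tuple) back to a bit.
    """
    for x in product((0, 1), repeat=n):
        c = E(x)
        for i in range(n):
            side = tuple(x[j] for j in N[i])
            if D[i](c, side) != x[i]:
                return False
    return True

# Toy case: two matched receivers; the single parity x0 ^ x1 suffices.
N = [[1], [0]]
E = lambda x: (x[0] ^ x[1],)
D = [lambda c, side: c[0] ^ side[0]] * 2
assert is_index_code(2, N, E, D)
```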
The above problem can also be cast in an equivalent
setting with a single receiver: The receiver is given an
index i and the side information x[N(i)] as inputs and
wants to recover the value xi. (The equivalence follows from the fact that the sender does not know the index i given to the receiver, and thus has to use an encoding that allows recovering xi for any i.) Using this equivalent form, we can contrast our side information problem
with Witsenhausen’s zero-error side information prob-
lem. A first notable difference is that while in Witsen-
hausen’s setting the entire input x has to be recovered,
in our setting only a single bit xi is needed. This allows significant savings in the encoding length, as the
following example demonstrates: Suppose the side in-
formation graph is a perfect matching on n nodes. Since
the receiver has only a single bit of side information,
n − 1 bits are necessary to recover the entire input.
On the other hand, if only a single bit is needed, then
the sender can encode his input by the n/2 parities of
pairs of matched bits. A second difference from Wit-
senhausen’s setting is that the type of side information
addressed in our problem is restricted to side informa-
tion graphs. This natural restriction emanates from the
broadcast application mentioned above, and also imposes
more structure that enables us to obtain an interesting
combinatorial characterization of the minimum length
of INDEX codes in terms of the side information graphs.
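The perfect-matching example above can likewise be made concrete. In this minimal sketch (our own naming; bits 2k and 2k+1 are matched, and each receiver's side information is its partner's bit), n/2 parities suffice:

```python
def encode_matching(x):
    """Broadcast one parity per matched pair: n/2 bits for n input bits."""
    return [x[2 * k] ^ x[2 * k + 1] for k in range(len(x) // 2)]

def decode_matching(i, parities, partner_bit):
    """Receiver i XORs its pair's parity with its partner's bit."""
    return parities[i // 2] ^ partner_bit

x = [0, 1, 1, 1, 0, 0]
msg = encode_matching(x)  # 3 bits broadcast instead of 6
for i in range(len(x)):
    partner = i ^ 1  # pairs 0<->1, 2<->3, 4<->5
    assert decode_matching(i, msg, x[partner]) == x[i]
```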
We also consider in this paper randomized INDEX
codes, in which the encoding and decoding functions are
allowed to be randomized and are even allowed to use a
Page 10
size is 2^{n+1}, while the above proof only shows a lower bound of |C| > 2^n. Optimal code size lower bounds are important for deriving lower bounds on the average encoding length and on the information cost. In the full version of this paper, we give tight lower bounds (i.e., 2^{n+1}) on the size of INDEX codes for odd holes; the proof for n ≥ 7 uses a more intricate combinatorial argument, while the proof for the pentagon is by brute-force computer search.
7. Conclusions
In this paper, we explored upper and lower bounds on the length of INDEX codes for {0,1}^n with side information graph G. We identified a measure on graphs, the minrank, which we showed to characterize the length of INDEX codes for natural classes of graphs (DAGs, perfect graphs, odd holes, and odd anti-holes). We also proved that minrank characterizes the minimum length of natural types of INDEX codes (linear, linearly-decodable, and semi-linearly-decodable) for arbitrary graphs. For general codes and general graphs, we were
able to obtain a weaker bound in terms of the maximum
acyclic induced subgraph. Finally, we proved a direct
sum theorem for the information cost of INDEX codes
with side information.
The general question, i.e., whether minrank is a lower
bound on the length of any INDEX code for any graph,
remains open. Perhaps one could relax the conjecture
and consider fields other than GF(2).
The minrank by itself is an interesting subject of
study. We know that for undirected graphs, it is bounded
from below by the Shannon capacity and from above
by the chromatic number of the graph complement. It
would be interesting to explore further properties of
minrank with respect to other graph measures such as
the Lovász Theta function.
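For small graphs, minrank over GF(2) can be computed by brute force. In the sketch below (our own code), minrank is taken to be the minimum GF(2) rank over all 0/1 matrices M with ones on the diagonal and Mij = 0 whenever i ≠ j and (i,j) is not an edge; for the pentagon this returns 3, squeezed between the Shannon capacity (√5 ≈ 2.24) and the chromatic number of the complement (here 3), consistent with the bounds noted above.

```python
from itertools import product

def gf2_rank(rows):
    """Rank over GF(2); each row is an integer bitmask."""
    basis = []
    for row in rows:
        for b in basis:
            row = min(row, row ^ b)  # clear b's leading bit if present
        if row:
            basis.append(row)
    return len(basis)

def minrank_gf2(n, edges):
    """Min GF(2) rank of matrices M with M[i][i] = 1 and
    M[i][j] = 0 (i != j) unless (i, j) is an edge."""
    free = [(i, j) for i in range(n) for j in range(n) if (i, j) in edges]
    best = n
    for bits in product((0, 1), repeat=len(free)):
        rows = [1 << i for i in range(n)]  # forced diagonal ones
        for (i, j), b in zip(free, bits):
            rows[i] |= b << j
        best = min(best, gf2_rank(rows))
    return best

# The pentagon C5: an undirected 5-cycle, i.e., an edge in each direction.
C5 = {(i, (i + 1) % 5) for i in range(5)} | {((i + 1) % 5, i) for i in range(5)}
assert minrank_gf2(5, C5) == 3
```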
References
[1] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung. Network information flow. IEEE Transactions on Information Theory, 46:1204–1216, 2000.
[2] N. Alon and A. Orlitsky. Source coding and graph entropies. IEEE Transactions on Information Theory, 42(5):1329–1339, 1996.
[3] A. Ambainis, A. Nayak, A. Ta-Shma, and U. Vazirani. Dense quantum coding and quantum finite automata. J. ACM, 49(4):496–511, 2002.
[4] Z. Bar-Yossef, T. S. Jayram, R. Krauthgamer, and R. Kumar. The sketching complexity of pattern matching. In Proceedings of the 8th International Workshop on Randomization and Computation (RANDOM), pages 261–272, 2004.
[5] Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivaku-
mar. An information statistics approach to data stream
and communication complexity. J. Computer and Sys-
tem Sciences, 68(4):702–732, 2004.
[6] Y. Birk and T. Kol. Coding-on-demand by an informed
source (ISCOD) for efficient broadcast of different sup-
plemental data to caching clients. IEEE Transactions
on Information Theory, 52(6):2825–2830, 2006. Earlier
version appeared in INFOCOM ’98.
[7] A. E. Brouwer and C. A. van Eijl. On the p-ranks of the
adjacency matrices of strongly regular graphs. Journal
of Algebraic Combinatorics, 1:329–346, 1992.
[8] T. M. Cover and J. A. Thomas. Elements of Information
Theory. John Wiley & Sons, Inc., 1991.
[9] T. Feder, E. Kushilevitz, M. Naor, and N. Nisan. Amortized communication complexity. SIAM Journal on Computing, 24(4):736–750, 1995.
[10] W. H. Haemers. An upper bound for the Shannon ca-
pacity of a graph. Algebraic methods in Graph Theory,
25:267–272, 1978.
[11] W. H. Haemers. On some problems of Lovász concerning the Shannon capacity of a graph. IEEE Transactions on Information Theory, 25(2):231–232, 1979.
[12] P. Koulgi, E. Tuncel, S. L. Regunathan, and K. Rose. On
zero-error source coding with decoder side information.
IEEE Transactions on Information Theory, 49(1):99–
111, 2003.
[13] I. Kremer, N. Nisan, and D. Ron. On randomized one-
round communication complexity. Computational Com-
plexity, 8(1):21–49, 1999.
[14] R. Peeters. Orthogonal representations over finite fields
and the chromatic number of graphs. Combinatorica,
16(3):417–431, 1996.
[15] R. Peeters. On the p-ranks of the adjacency matrices of
distance-regular graphs. Journal of Algebraic Combina-
torics, 15(2):127–149, 2002.
[16] D. Slepian and J. K. Wolf. Noiseless coding of correlated
information sources. IEEE Transactions on Information
Theory, IT-19:471–480, 1973.
[17] H. S. Witsenhausen. The zero-error side information
problem and chromatic numbers. IEEE Transactions on
Information Theory, 22(5):592–593, 1976.
[18] A. Wyner. A theorem on the entropy of certain binary
sequences and applications II. IEEE Transactions on In-
formation Theory, IT-19:772–777, 1973.
[19] A. Wyner and J. Ziv. A theorem on the entropy of certain
binary sequences and applications I. IEEE Transactions
on Information Theory, IT-19:769–771, 1973.
[20] R. W. Yeung and Z. Zhang. Distributed source coding for satellite communications. IEEE Transactions on Information Theory, 45:1111–1120, 1999.