Raptor Codes
AMIN SHOKROLLAHI
Digital Fountain, Inc.
39141 Civic Center Drive, Fremont, CA 94538, USA
amin@digitalfountain.com
AND
Laboratoire d'algorithmique
Laboratoire de mathématique algorithmique
École Polytechnique Fédérale de Lausanne
1015 Lausanne, Switzerland
amin.shokrollahi@epfl.ch
January 13, 2004
Abstract
LT-Codes are a new class of codes introduced in [1] for the purpose of scalable and fault-tolerant distribution of data over computer networks. In this paper we introduce Raptor Codes, an extension of LT-Codes with linear time encoding and decoding. We will exhibit a class of universal Raptor codes: for a given integer $k$ and any real $\varepsilon > 0$, Raptor codes in this class produce a potentially infinite stream of symbols such that any subset of symbols of size $k(1+\varepsilon)$ is sufficient to recover the original $k$ symbols with high probability. Each output symbol is generated using $O(\log(1/\varepsilon))$ operations, and the original symbols are recovered from the collected ones with $O(k\log(1/\varepsilon))$ operations.
We will also introduce novel techniques for the analysis of the error probability of the decoder for finite length Raptor codes. Moreover, we will introduce and analyze systematic versions of Raptor codes, i.e., versions in which the first $k$ output symbols of the coding system coincide with the original $k$ input symbols.
Work on this project was done while the author was a full-time employee of Digital Fountain, Inc.
1 Introduction
The binary erasure channel (BEC) of communication was introduced by Elias [2] in 1955, but it was regarded
as a rather theoretical channel model until the large-scale deployment of the Internet about 40 years later.
On the Internet data is transmitted in the form of packets. Each packet is equipped with a header that
describes the source and the destination of the packet, and often also a sequence number describing the
absolute or relative position of the packet within a given stream. These packets are routed on the network
from the sender to the receiver. Due to various reasons, for example buffer overflows at the intermediate
routers, some packets may get lost and never reach their destination. Other packets may be declared as lost if
the internal checksum of the packet does not match. Therefore, the Internet is a very good real-world model
of the BEC.
Reliable transmission of data over the Internet has been the subject of much research. For the most part,
reliability is guaranteed by use of appropriate protocols. For example, the ubiquitous TCP/IP ensures re-
liability by essentially re-transmitting packets within a transmission window whose reception has not been
acknowledged by the receiver (or packets for which the receiver has explicitly sent a negative acknowledg-
ment). It is well-known that such protocols exhibit poor behavior in many cases, such as transmission of
data from one server to multiple receivers, or transmission of data over heavily impaired channels, such as
poor wireless or satellite links. Moreover, acknowledgment-based protocols such as TCP perform poorly when the distance between the sender and the receiver is large, since long round-trip times lead to idle periods during which the sender waits for acknowledgments and cannot send data.
For these reasons other transmission solutions have been proposed. One class of such solutions is based
on coding. The original data is encoded using some linear erasure correcting code. If during the transmission
some part of the data is lost, then it is possible to recover the lost data using erasure correcting algorithms.
For applications it is crucial that the codes used are capable of correcting as many erasures as possible, and it
is also crucial that the encoding and decoding algorithms for these codes are very fast.
Elias showed that the capacity of the BEC with erasure probability $p$ equals $1-p$. He further proved that random codes of rates arbitrarily close to $1-p$ can be decoded on this channel with an exponentially small error probability using Maximum Likelihood (ML) decoding. In the case of the erasure channel, ML decoding of linear codes is equivalent to solving systems of linear equations. This task can be done in polynomial time using Gaussian elimination. However, Gaussian elimination is not fast enough, especially when the length of the code is long.
Reed-Solomon codes can be used to partially compensate for the inefficiency of random codes. Reed-
Solomon codes can be decoded from a block with the maximum possible number of erasures in time quadratic
in the dimension. (There are faster algorithms based on fast polynomial arithmetic, but these algorithms are
often too complicated in practice.) However, quadratic running times are still too large for many applications.
In [3] the authors construct codes with linear time encoding and decoding algorithms that can come
arbitrarily close to the capacity of the BEC. These codes, called Tornado codes, are very similar to Gallager’s
LDPC-codes [4], but they use a highly irregular weight distribution for the underlying graphs.
The running times of the encoding and decoding algorithms for Tornado codes are proportional to their
block-length rather than to their dimension. Therefore, for small rates the encoding and decoding algorithms
for these codes are slow. This turns out to be quite limiting in many applications, such as those described
in [5], since the codes used there are of extremely low rate. This suggests that the encoding/decoding times
of traditional coding technologies may not be adequate for the design of scalable data transmission systems.
There are more disadvantages of traditional block codes when it comes to their use for data transmission.
The model of a single erasure channel is not adequate for cases where data is to be sent concurrently from
one sender to many receivers. In this case the erasure channels from the sender to each of the receivers have
potentially different erasure probabilities. Typically in applications the sender or the receiver may probe their
channels so the sender has a reasonable guess of the current erasure probability of the channel and can adjust
the coding rate accordingly. But if the number of receivers is large, or in situations such as satellite or wireless
transmission where receivers experience sudden abrupt changes in their reception characteristics, it becomes
unrealistic to assume and keep track of the loss rates of individual receivers. The sender is then forced to
assume a worst case loss rate for all the receivers. This not only puts unnecessary burdens on the network if
the actual loss rate is smaller, but also compromises reliable transmission if the actual loss rate is larger than
the one provisioned for.
Therefore, to construct robust and reliable transmission schemes, a new class of codes is needed. Foun-
tain Codes constitute such a class, and they address all the above mentioned issues. They were first men-
tioned without an explicit construction in [5]. A Fountain Code produces for a given set of $k$ input symbols $(x_1,\dots,x_k)$ a potentially limitless stream of output symbols $z_1, z_2, z_3, \dots$. The input and output symbols can be bits, or more generally, they can be binary vectors of arbitrary length. The output symbols are produced independently and randomly, according to a given distribution on $\mathbb{F}_2^k$. Each output symbol is the addition of some of the input symbols, and we suppose that the output symbol is equipped with information describing which input symbols it is the addition of. In practice, this information can be either a part of the symbol (e.g., using a header in a packet), or it can be obtained via time-synchronization between the sender and the receiver, or it may be obtained by other application-dependent means. A decoding algorithm for a Fountain Code is an algorithm which can recover the original $k$ input symbols from any set of $m$ output symbols with high probability. For good Fountain Codes the value of $m$ is very close to $k$, and the decoding time is close to linear in $k$.
Fountain Codes are ideally suited for transmitting information over computer networks. A server sending
data to many recipients can implement a Fountain Code for a given piece of data to generate a potentially
infinite stream of packets. As soon as a receiver requests data, the packets are copied and forwarded to the
recipient. In a broadcast transmission model there is no need for copying the data since any outgoing packet
is received by all the receivers. In other types of networks, the copying can be done actively by the sender, or
it can be done by the network, for example if multicast is enabled. The recipient collects the output symbols,
and leaves the transmission as soon as it has received $m$ of them. At that time it uses the decoding algorithm to recover the original $k$ symbols. Note that the number $m$ is the same regardless of the channel characteristics between the sender and the receiver. More loss of symbols just translates to a longer waiting time to receive the $m$ packets. If $m$ can be chosen to be arbitrarily close to $k$, then the corresponding Fountain Code has a universality property in the sense that it operates close to capacity for any erasure channel with erasure probability less than 1.
Fountain Codes have also other very desirable properties. For example, since each output symbol is
generated independently of any other one, a receiver may collect output symbols generated from the same
set of input symbols, but by different devices operating a Fountain encoder. This allows for the design of
massively scalable and fault-tolerant communication systems over packet based networks. In this paper we
will not address these and other applications, but will instead focus on the theory of such codes.
In order to make Fountain Codes work in practice, one needs to ensure that they possess a fast encoder and
decoder, and that the decoder is capable of recovering the original symbols from any set of output symbols
whose size is close to optimal with high probability. We call such Fountain Codes universal. The first class
of such universal Fountain Codes was invented by Luby [6, 7, 1]. The codes in this class are called LT-Codes.
The distribution used for generating the output symbols lies at the heart of LT-Codes. Every time an output
symbol is generated in an LT-Code, a weight distribution is sampled, which returns an integer $d$ between 1 and the number $k$ of input symbols. Then $d$ random distinct input symbols are chosen, and their values are added to yield the value of that output symbol. Decoding of LT-Codes is similar to that of LDPC codes over the erasure channel and is described later in Section 3. Whether or not the decoding algorithm is successful depends solely on the weight distribution.
It can be shown (see Proposition 1) that if an LT-Code has a decoding algorithm with a probability of error that is at most inversely polynomial in the number $k$ of input symbols, and if the algorithm needs $m$ output symbols to operate, then the average weight of an output symbol needs to be at least $c\,k\ln(k)/m$ for some constant $c$. Hence, in the desirable case where $m$ is close to $k$, the output symbols of the LT-Code need to have an average weight of at least $c\ln(k)$. It is absolutely remarkable that it is possible to construct a weight distribution that matches this lower bound via a fast decoder. Such distributions were exhibited by Luby [1].
For many applications it is important to construct universal Fountain Codes for which the average weight
of an output symbol is a constant and which have fast decoding algorithms. In this paper we introduce such a
class of Fountain Codes, called Raptor Codes. The basic idea behind Raptor codes is a pre-coding of the input
symbols prior to the application of an appropriate LT-Code (see Section 4). In the asymptotic setting, we will design a class of universal Raptor Codes with linear time encoders and decoders for which the probability of decoding failure converges to zero polynomially fast in the number of input symbols. This will be the topic of Section 6.
In practical applications it is important to bound the error probability of the decoder. The bounds obtained
from the asymptotic analysis of Section 6 are rather poor. Therefore, we develop in Section 7 analytic tools
for the design of finite length Raptor Codes which exhibit very low decoding error probabilities, and we will
exemplify our methods by designing a specific Raptor Code with guaranteed bounds on its error performance.
One of the disadvantages of LT- or Raptor Codes is that they are not systematic. This means that the input
symbols are not necessarily reproduced among the output symbols. The straightforward idea of transmitting
the input symbols prior to the output symbols produced by the coding system is easily seen to be flawed, since
this does not guarantee a high probability of decodability from any subset of received output symbols. In
Section 8 we develop a new set of ideas and design efficient systematic versions of Raptor Codes.
Raptor Codes were discovered in late 2000, and patented in late 2001 [8]. Independently, Maymounkov [9] later discovered the idea of pre-coding to obtain linear time codes. His results are similar to parts of Section 6.
Raptor codes have been highly optimized and are being used in commercial systems of Digital Fountain (http://www.digitalfountain.com), a Silicon Valley based startup specializing in fast and reliable delivery of data over heterogeneous networks. The Raptor implementation of Digital Fountain reaches speeds of several gigabits per second on a 2.4 GHz Intel Xeon processor, while ensuring very stringent conditions on the error probability of the decoder, even for very short lengths.
2 Distributions on $\mathbb{F}_2^k$
Let $k$ be a positive integer. The dual space of $\mathbb{F}_2^k$ is the space of linear forms in $k$ variables with coefficients in $\mathbb{F}_2$. This space is non-canonically isomorphic to $\mathbb{F}_2^k$ via the isomorphism mapping the vector $(v_1,\dots,v_k)$ with respect to the standard basis to the linear form $\sum_{i=1}^k v_i x_i$. A probability distribution on $\mathbb{F}_2^k$ induces a probability distribution on the dual space of $\mathbb{F}_2^k$ with respect to this isomorphism. For the rest of this paper we will use this isomorphism and will freely and implicitly interchange distributions on $\mathbb{F}_2^k$ and its dual.
Let $\Omega$ be a distribution on $\{1,\dots,k\}$ so that $\Omega_i$ denotes the probability that the value $i$ is chosen. Often we will denote this distribution by its generator polynomial $\Omega(x) = \sum_{i=1}^k \Omega_i x^i$. For example, using this notation, the expectation of this distribution is succinctly given by $\Omega'(1)$, where $\Omega'(x)$ is the derivative of $\Omega(x)$ with respect to $x$.
The distribution $\Omega(x)$ induces a distribution on $\mathbb{F}_2^k$ (and hence on its dual) in the following way: for any vector $v \in \mathbb{F}_2^k$ the probability of $v$ is $\Omega_w/\binom{k}{w}$, where $w$ is the Hamming weight of $v$. A simple sampling algorithm for this distribution would be to first sample from the distribution $\Omega$ to obtain a weight $w$, and then to sample a vector of weight $w$ in $\mathbb{F}_2^k$ uniformly at random. By abuse of notation, we will in the following denote the distribution induced by $\Omega(x)$ on $\mathbb{F}_2^k$ by $\Omega(x)$ as well.
As an example we mention that the uniform distribution on $\mathbb{F}_2^k$ is given by the generating polynomial $\Omega(x) = \sum_{i=0}^k \binom{k}{i} x^i/2^k = (1+x)^k/2^k$.
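As a small worked illustration of this notation: for the uniform distribution just mentioned, the expected weight of a sampled vector is
$$ \Omega'(1) = \left.\frac{k(1+x)^{k-1}}{2^k}\right|_{x=1} = \frac{k\,2^{k-1}}{2^k} = \frac{k}{2}, $$
a value that reappears as the encoding cost of the random LT-Code in Proposition 2 below.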
3 Fountain Codes and LT-Codes
The theoretical idea of Fountain Codes was introduced in [5] and the first practical realizations of Fountain Codes were invented by Luby [6, 7, 1]. They represent a new class of linear error-correcting codes. Let $k$ be a positive integer, and let $\mathcal{D}$ be a degree distribution on $\mathbb{F}_2^k$. A Fountain Code with parameters $(k, \mathcal{D})$ has as its domain the space $\mathbb{F}_2^k$ of binary strings of length $k$, and as its target space the set of all sequences over $\mathbb{F}_2$, denoted by $\mathbb{F}_2^{\mathbb{N}}$. Formally, a Fountain Code with parameters $(k, \mathcal{D})$ is a linear map $\mathbb{F}_2^k \to \mathbb{F}_2^{\mathbb{N}}$ in which the coordinates are independent random variables with distribution $\mathcal{D}$ over $\mathbb{F}_2^k$. The block-length of a Fountain Code is potentially infinite, but in applications we will solely consider truncated Fountain Codes, i.e., Fountain Codes with finitely many coordinates, and make frequent and implicit use of the fact that unlike traditional codes the length of a Fountain Code is not fixed a priori.
The symbols produced by a Fountain Code are called output symbols, and the symbols from which these output symbols are calculated are called input symbols. The input and output symbols could be elements of $\mathbb{F}_2$, or more generally the elements of any finite dimensional vector space over $\mathbb{F}_2$ (or, more generally, over an arbitrary field $\mathbb{F}$).
Encoding of a Fountain Code is rather straightforward: for a given vector $(x_1,\dots,x_k)$ of input symbols, each output symbol is generated independently and randomly by first sampling from the distribution $\mathcal{D}$ to obtain a weight $w$ between 1 and $k$. Next, a vector $(v_1,\dots,v_k)$ of weight $w$ is chosen uniformly at random from $\mathbb{F}_2^k$ and the value of the output symbol is calculated as $\sum_i v_i x_i$. We will not be concerned with the cost of sampling from the distribution $\mathcal{D}$ over $\mathbb{F}_2^k$, as this will be trivial in our applications. The encoding cost of a Fountain Code is the expected number of operations sufficient to calculate an output symbol. This is easily seen to be at most $a$, where $a$ is the expected Hamming weight of the random variable with distribution $\mathcal{D}$ over $\mathbb{F}_2^k$.
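For illustration, the following minimal Python sketch generates one output symbol in the way just described. It is not an implementation from this paper: symbol values are modeled as machine integers that are added by XOR, and `omega` is assumed to be a list with `omega[d]` the probability of weight `d` (and `omega[0] = 0`).

```python
import random

def sample_degree(omega, rng):
    """Sample a weight d with probability omega[d]."""
    u = rng.random()
    cumulative = 0.0
    for d, p in enumerate(omega):
        cumulative += p
        if u <= cumulative:
            return d
    return len(omega) - 1  # guard against floating-point rounding

def encode_output_symbol(input_symbols, omega, rng=random):
    """Generate one output symbol: sample a weight, pick that many distinct
    input symbols uniformly at random, and XOR their values together."""
    k = len(input_symbols)
    d = sample_degree(omega, rng)
    neighbors = rng.sample(range(k), d)   # support of the sampled vector
    value = 0
    for i in neighbors:
        value ^= input_symbols[i]         # addition over F_2, bit by bit
    return neighbors, value
```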
In addition to conceptual differences between Fountain Codes and block codes there is also an important
operational difference between these classes of codes. For a traditional block code the structure of the code
is determined prior to its use for transmission of information. This is also true for randomized block codes,
such as random LDPC codes. On the other hand, in practice, Fountain Codes are generated “online.” Each set
of input symbols may have its own associated Fountain Code. There are various advantages to this mode of
operation of Fountain Codes such as those described in Luby’s paper [1], in Luby’s patents on this subject [6,
7], or in [5].
In this paper we will consider Fountain Codes over a memoryless BEC with erasure probability $p$. Even though all our results also hold for more general and not necessarily memoryless erasure channels, we will only consider the memoryless case for the sake of simplicity.
A reliable decoding algorithm of length $n$ for a Fountain Code is an algorithm which can recover the $k$ input symbols from any set of $n$ output symbols and errs with a probability that is at most inversely polynomial in $k$ (i.e., the error probability is at most $1/k^c$ for some positive constant $c$). Often, we will skip the term reliable and only talk about an algorithm of length $n$. The cost of such a decoding algorithm is the (expected) number of its arithmetic operations divided by $k$. This is equal to the average cost of recovering each input symbol.
When transmitting information using a traditional code, both the sender and the receiver are in possession
of a description of the coding method used. For Fountain Codes this is not necessarily the case, since the
code is being generated concurrently with the transmission. Therefore, in order to be able to recover the
original information from the output symbols, it is necessary to transmit a description of the code together
with the output symbols. In a setting where the symbols correspond to packets in a computer network, one
can augment each transmitted packet with header information that describes the set of input symbols from
which this output symbol was generated. We refer the reader to Luby [1, 6, 7] for a description of different
methods for accomplishing this. In this paper, we will implicitly assume that the structure of the Fountain
Code is transmitted together with the code using one of the many existing methods.
A special class of Fountain Codes is furnished by LT-Codes. In this class the distribution $\mathcal{D}$ has the form $\Omega(x)$ described in Section 2. It is relatively easy to prove an information theoretic lower bound on the encoding/decoding cost of any LT-Code which has a decoding algorithm of length approximately equal to $k$. We will prove the lower bound in terms of the number of edges in the decoding graph. The decoding graph of an algorithm of length $n$ is a bipartite graph with $k$ nodes on the one side (called the input nodes or the input symbols) and $n$ nodes on the other (called the output nodes or the output symbols). There is an edge between an input symbol and an output symbol if the input symbol contributes to the value of the output symbol. The following proposition shows that the decoding graph of a reliable decoding algorithm has at least on the order of $k\ln(k)$ edges. Therefore, if the number $n$ of collected output symbols is close to $k$, then the encoding cost of the code is at least of the order of $\ln(k)$.
Proposition 1. If an LT-Code with $k$ input symbols possesses a reliable decoding algorithm, then there is a constant $c$ such that the graph associated to the decoder has at least $c\,k\ln(k)$ edges.

Proof. Suppose that the Fountain Code has parameters $(k, \Omega(x))$. In the decoding graph we call an input node covered if it is the neighbor of at least one output node. Otherwise, we call the node uncovered. The error probability of the decoder is lower bounded by the probability that there is an uncovered input node. We will establish a relationship between this probability and the average degree of an output node.

Let $G$ denote the decoding graph of the algorithm. $G$ is a random bipartite graph between $k$ input and $n$ output nodes such that each output node is of degree $d$ with probability $\Omega_d$, and such that the neighbors of an output node are randomly chosen. Let $v$ be an input node in $G$. If an output node is of degree $d$, then the probability that $v$ is not a neighbor of that output node is $1 - d/k$. Since the output node is of degree $d$ with probability $\Omega_d$, the probability that $v$ is not a neighbor of an output node is $\sum_d \Omega_d (1 - d/k) = 1 - a/k$, where $a := \Omega'(1)$ is the average degree of an output node. Since output nodes are constructed independently, the probability that $v$ is not a neighbor of any of the $n$ output nodes is $(1 - a/k)^n$. We may assume that $a \le k/2$. Then the Taylor expansion of $\ln(1 - a/k)$ shows that
$$ \left(1 - \frac{a}{k}\right)^{n} \ \ge\ \mathrm{e}^{-\frac{an}{k}\left(1 + \frac{a}{k}\right)} \ \ge\ \mathrm{e}^{-\frac{3}{2}\,\alpha}, $$
where $\alpha := an/k$ is the average degree of an input node. Since the decoder is assumed to be reliable, it errs with probability at most $1/k^{c_0}$ for some constant $c_0$. This shows that $\mathrm{e}^{-\frac{3}{2}\alpha} \le 1/k^{c_0}$, i.e., $\alpha \ge \frac{2c_0}{3}\ln(k)$. Since the expected number of edges in the decoding graph equals $\alpha k$, this completes the proof.
In the following we will give some examples of Fountain Codes, and study different decoding algorithms.
A random LT-Code is an LT-Code with parameters $(k, \Omega(x))$ where $\Omega(x) = (1+x)^k/2^k$. As discussed earlier, this choice for $\Omega(x)$ amounts to the uniform distribution on $\mathbb{F}_2^k$, which explains the name.

Proposition 2. A random LT-Code with $k$ input symbols has encoding cost $k/2$, and ML decoding is a reliable decoding algorithm for this code of overhead $1 + O(\log(k)/k)$.

Proof. Since the expected weight of a vector in $\mathbb{F}_2^k$ under the uniform distribution is $k/2$, the encoding cost of the random LT-Code is $k/2$.
In the case of the erasure channel the ML decoding algorithm amounts to Gaussian elimination: we collect $n$ output symbols (the value of $n$ will be determined shortly). Each received output symbol represents a linear equation (with coefficients in $\mathbb{F}_2$) in the unknown input values $x_1,\dots,x_k$, and thus the decoding process can be viewed as solving a (consistent) system of $n$ linear equations in $k$ unknowns. The decoding cost of this algorithm is $O(k^2)$, since Gaussian elimination can be performed using $O(k^3)$ operations.

It is well-known that a necessary and sufficient condition for the solvability of this system is that the rank of the corresponding $n \times k$ matrix is equal to $k$. The entries of this matrix are independent binary random variables with equal probability of being one or zero. We will now prove that the probability that this matrix is not of full rank is at most $2^{k-n}$. This is shown by using a union bound. For each hyperplane in $\mathbb{F}_2^k$ the probability that all the rows of the matrix belong to the hyperplane is $2^{-n}$. There are $2^k - 1$ hyperplanes. Therefore, the probability that the matrix is not of full rank is at most $2^{k-n}$. Choosing $n = k + c\log_2(k)$, we see that the error probability of ML decoding becomes at most $1/k^c$, which proves the claim.
Gaussian elimination is computationally expensive for dense codes like random LT-Codes. For properly
designed LT-Codes, the Belief-Propagation (BP) decoder [3, 1] provides a much more efficient decoder. The
BP decoder can be best described in terms of the graph associated to the decoder. It performs the following
steps until either no output symbols of degree one are present in the graph, or until all the input symbols
have been recovered. At each step of the algorithm the decoder identifies an output symbol of degree one.
If none exists, and not all the input symbols have been recovered, the algorithm reports a decoding failure.
Otherwise, the value of the output symbol of degree one recovers the value of its unique neighbor among
the input symbols. Once this input symbol value is recovered, its value is added to the values of all the
neighboring output symbols, and the input symbol and all edges emanating from it are removed from the graph.
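The peeling procedure just described can be sketched in a few lines of Python; the sketch below is illustrative only and reuses the `(neighbors, value)` representation of output symbols from the encoding sketch in Section 3.

```python
def bp_decode(k, received):
    """Peeling (BP) decoder over the erasure channel.  `received` is a list of
    (neighbors, value) pairs, one per collected output symbol.  Returns a list
    of recovered input symbols, with None at positions that stay unrecovered."""
    neighbor_sets = [set(nbrs) for nbrs, _ in received]   # reduced neighborhoods
    values = [val for _, val in received]                  # reduced values
    recovered = [None] * k
    ripple = [j for j, s in enumerate(neighbor_sets) if len(s) == 1]
    while ripple:
        j = ripple.pop()
        if len(neighbor_sets[j]) != 1:
            continue                                       # stale ripple entry
        i = next(iter(neighbor_sets[j]))                   # its unique neighbor
        recovered[i] = values[j]
        # Substitute the recovered value into every output symbol containing i.
        for t, s in enumerate(neighbor_sets):
            if i in s:
                s.remove(i)
                values[t] ^= recovered[i]
                if len(s) == 1:
                    ripple.append(t)
    return recovered
```

Decoding fails exactly when the list of output symbols of reduced degree one empties before all input symbols are recovered, matching the description above.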
For random LT-Codes the BP decoder fails miserably even when the number of collected output symbols
is very large. Thus the design of the degree distribution must be dramatically different from the random
distribution to guarantee the success of the BP decoder.
The analysis of the BP decoding algorithm is more complicated than the analysis of ML decoding. For
the sake of completeness, we include a short expectation analysis for the case where every output symbol
chooses its neighbors among the input symbols randomly and with replacement. We refer the reader to [1] for
the analysis of the original case where the choice is done without replacement.
As described above, the BP decoder proceeds in steps, and recovers one input symbol at each step. Following Luby's notation, we call the set of output symbols of reduced degree one the output ripple at step $i$ of the algorithm. We say that an output symbol is released at step $i$ if its degree is larger than one at step $i-1$ and equal to one at step $i$, so that recovery of the input symbol at step $i$ reduces the degree of the output symbol to one. The probability that an output symbol of initial degree $d$ is released at step $i$ can be easily calculated as follows: this is the probability that the output symbol has exactly one neighbor among the $k-i$ input symbols that are not yet recovered, and that not all of the remaining $d-1$ neighbors are among the $i-1$ input symbols recovered before step $i$. The probability that the output symbol has exactly one neighbor among the unrecovered input symbols, and that all its other neighbors are within a fixed set of size $s$ of the recovered input symbols, is $d\,\frac{k-i}{k}\left(\frac{s}{k}\right)^{d-1}$. Therefore,
$$ \Pr[\text{output symbol is released at step } i \mid \text{degree is } d] \;=\; d\,\frac{k-i}{k}\left(\left(\frac{i}{k}\right)^{d-1} - \left(\frac{i-1}{k}\right)^{d-1}\right). $$
Multiplying this term with the probability $\Omega_d$ that the degree of the symbol is $d$, and summing over all $d$, we obtain
$$ \Pr[\text{output symbol is released at step } i] \;=\; \left(1-\frac{i}{k}\right)\left(\Omega'\!\left(\frac{i}{k}\right) - \Omega'\!\left(\frac{i-1}{k}\right)\right). $$
Note that $\Omega'\!\left(\frac{i}{k}\right) - \Omega'\!\left(\frac{i-1}{k}\right) \approx \frac{1}{k}\,\Omega''\!\left(\frac{i}{k}\right)$. The approximation is very good if $i = xk$ for constant $x$ and large $k$.

Suppose that the decoder collects $(1+\varepsilon)k$ output symbols. Then the expected number of output symbols releasing at step $i$ is $(1+\varepsilon)k$ times the probability that an output symbol releases at step $i$, which, by the above, is approximately equal to $(1+\varepsilon)(1-x)\,\Omega''(x)$, where $x = i/k$.
In order to construct asymptotically optimal codes, i.e., codes that can recover the $k$ input symbols from any $(1+\varepsilon)k$ output symbols for values of $\varepsilon$ arbitrarily close to zero, we require that every decoded input symbol releases exactly one output symbol. Thus, in the limit, we require $\varepsilon = 0$, and we require that the output ripple has expected size one at every step. This means that $(1-x)\,\Omega''(x) = 1$ for $x \in [0,1)$. Solving this differential equation, and keeping in mind that $\Omega(1) = 1$, we obtain the soliton distribution
$$ \Omega(x) = \sum_{i \ge 2} \frac{x^i}{i(i-1)}. $$
The distribution is similar to the ideal soliton distribution of Luby [1], except that it assigns a probability of zero to degree one, and has infinitely many terms.
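As a quick check of this derivation, term-by-term differentiation of the series gives
$$ \Omega'(x) = \sum_{i\ge 2}\frac{x^{i-1}}{i-1} = -\ln(1-x), \qquad \Omega''(x) = \frac{1}{1-x}, $$
so $(1-x)\,\Omega''(x) = 1$ as required, and the telescoping sum $\sum_{i\ge 2}\frac{1}{i(i-1)} = \sum_{i\ge 2}\left(\frac{1}{i-1}-\frac{1}{i}\right) = 1$ confirms that the coefficients indeed form a probability distribution.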
The distribution of the size of the output ripple at each point in time is more difficult to calculate and we
refer the reader to the upcoming paper [10] for details.
The reader is referred to Luby's paper for a description of LT-Codes with a distribution $\Omega(x)$ of average degree $O(\ln(k))$ for which the BP decoder is a reliable decoder of overhead $1 + O(\ln^2(k)/\sqrt{k})$. These degree distributions are absolutely remarkable, since they lead to an extremely simple decoding algorithm that essentially matches the information theoretic lower bound in Proposition 1.
4 Raptor Codes
The results of the previous section imply that LT-Codes cannot be encoded with constant cost if the number
of collected output symbols is close to the number of input symbols. In this section we will present a different
class of Fountain Codes. One of the many advantages of the new construction is that it allows for encoding
and decoding with constant cost, as we will see below.
The reason behind the lower bound of $c\ln(k)$ for the cost of LT-Codes is the information theoretic lower bound of Proposition 1. The decoding graph needs to have on the order of $k\ln(k)$ edges in order to make sure that all the input nodes are covered with high probability. The idea of Raptor Coding is to relax this condition and require that only a constant fraction of the input symbols be recoverable. Then the same information theoretic argument as before shows only a linear lower bound for the number of edges in the decoding graph.

Figure 1: Raptor Codes: the input symbols are appended with redundant symbols (black squares) in the case of a systematic pre-code. An appropriate LT-Code is used to generate output symbols from the pre-coded input symbols.

There are two potential problems with this approach: (1) the information theoretic lower bound may not be matchable with an algorithm, and (2) we need to recover all the input symbols, not only a constant fraction. The second issue is addressed easily: we encode the input symbols using a traditional erasure correcting code, and then apply an appropriate LT-Code to the new set of symbols, in such a way that the traditional code is capable of recovering all the input symbols even in the face of a fixed fraction of erasures. To deal with the first issue, we need to design the traditional code and the LT-Code appropriately.
Let $\mathcal{C}$ be a linear code of block-length $n$ and dimension $k$, and let $\Omega(x)$ be a degree distribution. A Raptor Code with parameters $(k, \mathcal{C}, \Omega(x))$ is an LT-Code with distribution $\Omega(x)$ on the $n$ symbols which are the coordinates of codewords in $\mathcal{C}$. The code $\mathcal{C}$ is called the pre-code of the Raptor Code. The input symbols of a Raptor Code are the $k$ symbols used to construct the codeword in $\mathcal{C}$ consisting of $n$ intermediate symbols. The output symbols are the symbols generated by the LT-Code from the intermediate symbols. A graphical presentation of a Raptor Code is given in Figure 1. Typically, we assume that $\mathcal{C}$ is equipped with a systematic encoding, though this is not necessary.

A moment's thought reveals that Raptor Codes form a subclass of Fountain Codes: the output distribution $\Omega(x)$ and a fixed pre-code $\mathcal{C}$ induce a distribution $\mathcal{D}$ on $\mathbb{F}_2^k$, where $k$ is the number of input symbols of the Raptor Code. The output symbols of the Raptor Code are sampled independently from the distribution $\mathcal{D}$.

A Raptor Code has an obvious encoding algorithm as follows: given $k$ input symbols, an encoding algorithm for $\mathcal{C}$ is used to generate a codeword in $\mathcal{C}$ corresponding to the input symbols. Then an encoding
algorithm for the LT-Code with distribution $\Omega(x)$ is used to generate the output symbols.
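For illustration, a minimal Python sketch of this two-stage encoder is given below; it is not the implementation referred to in this paper, `precode_encode` stands for any encoder of the pre-code supplied by the caller, and `encode_output_symbol` is the LT encoding helper sketched in Section 3.

```python
def raptor_encode_stream(input_symbols, precode_encode, omega, num_output, rng):
    """Two-stage Raptor encoding: pre-code the k input symbols into n
    intermediate symbols, then run an LT encoder on the intermediate symbols."""
    intermediate = precode_encode(input_symbols)   # codeword of the pre-code
    return [encode_output_symbol(intermediate, omega, rng)
            for _ in range(num_output)]
```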
A reliable decoding algorithm of length $m$ for a Raptor Code is an algorithm which can recover the $k$ input symbols from any set of $m$ output symbols and errs with a probability which is at most $1/k^c$ for some positive constant $c$. As with LT-Codes, we sometimes omit mentioning the attribute "reliable."

The definition of the encoding cost of a Raptor Code differs slightly from that of a Fountain Code. This is because the encoding cost of the pre-code has to be taken into account. We define the encoding cost of a Raptor Code as $E(\mathcal{C})/k + \Omega'(1)$, where $E(\mathcal{C})$ is the number of arithmetic operations sufficient for generating a codeword in $\mathcal{C}$ from the $k$ input symbols. The encoding cost equals the per-symbol cost of generating $k$ output symbols.

The decoding cost of a decoding algorithm for a Raptor Code is the expected number of arithmetic operations sufficient to recover the $k$ input symbols, divided by $k$. As with Fountain Codes, this cost counts the expected number of arithmetic operations per input symbol.
We will study Raptor Codes with respect to the following performance parameters:
1. Space: Since Raptor Codes require storage for the intermediate symbols, it is important to study their space consumption. We will count the space as a multiple of the number $k$ of input symbols. The space requirement of the Raptor Code is $1/R$, where $R = k/n$ is the rate of the pre-code.

2. Overhead: The overhead is a function of the decoding algorithm used, and is defined as the number of output symbols that the decoder needs to collect in order to recover the input symbols with high probability. We will measure the overhead as a multiple of the number $k$ of input symbols, so an overhead of $1+\varepsilon$, for example, means that $(1+\varepsilon)k$ output symbols need to be collected to ensure successful decoding with high probability.
3. Cost: The cost of the encoding and the decoding process.
In the next section we will give several examples of Raptor Codes and study their performance.
5 First Examples of Raptor Codes
The first example of a Raptor Code is an LT-Code. An LT-Code with $k$ input symbols and output distribution $\Omega(x)$ is a Raptor Code with parameters $(k, \mathbb{F}_2^k, \Omega(x))$. ($\mathbb{F}_2^k$ is the trivial code of dimension and block-length $k$.) LT-Codes have optimal space consumption (i.e., 1). With an appropriate output distribution the overhead of an LT-Code can be made arbitrarily close to 1, and its cost is then proportional to $\ln(k)$, as was seen in Section 3.
LT-Codes have no pre-coding, and compensate for the lack of it by using a very intricate output distribution $\Omega(x)$. At the other end of the spectrum are Raptor Codes that have the simplest possible output distribution, combined with a sophisticated pre-code; we call these pre-code-only (PCO) Raptor Codes. Let $\mathcal{C}$ be a code of dimension $k$ and block-length $n$. A Raptor Code with parameters $(k, \mathcal{C}, x)$ is called a PCO Raptor Code with pre-code $\mathcal{C}$. In this code the $k$ input symbols are encoded via $\mathcal{C}$ to produce the $n$ intermediate symbols, and the output distribution is fixed to the trivial distribution $\Omega(x) = x$. The value of every output symbol equals that of an intermediate symbol chosen uniformly at random.
The decoding algorithm for a PCO Raptor Code is the trivial one: a predetermined number of output
symbols are collected. These will determine the values of, say, $\ell$ of the intermediate symbols. Next the decoding algorithm for the pre-code $\mathcal{C}$ is applied to these recovered intermediate values to obtain the values of the $k$ input symbols.
The performance of a PCO Raptor Code depends on the performance of its pre-code $\mathcal{C}$, as the following result suggests.
Proposition 3. Let $\mathcal{C}$ be a linear code of dimension $k$ and block-length $n$ with encoding and decoding algorithms that have the following properties:

1. An arbitrary input vector of length $k$ can be encoded with $\alpha k$ arithmetic operations for some $\alpha > 0$.

2. There is a $\delta > 0$ such that the decoding algorithm can decode $\mathcal{C}$ over a binary erasure channel with erasure probability $\delta$ with high probability using $\beta k$ arithmetic operations for some $\beta > 0$.

Then the PCO Raptor Code with pre-code $\mathcal{C}$ has space consumption $1/R$, overhead $\ln(1/\delta)/R$, encoding cost $1+\alpha$, and decoding cost $\beta$ with respect to the decoding algorithm for $\mathcal{C}$, where $R = k/n$ is the rate of $\mathcal{C}$.

Proof. The space consumption and the costs of encoding and decoding are clear. As for the overhead, suppose that the decoder collects $\bigl(\ln(1/\delta)/R\bigr)\,k$ output symbols. We need to show that the probability that an intermediate symbol is not covered is at most $\delta$, since if this condition is satisfied, then the decoder for the pre-code can recover the $k$ input symbols. To show the latter, note that the probability that an intermediate symbol is not covered is $(1 - 1/n)^{\ln(1/\delta)\,k/R}$, which is upper bounded by $\mathrm{e}^{-\ln(1/\delta)\,k/(Rn)} = \mathrm{e}^{-\ln(1/\delta)} = \delta$.
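As a purely illustrative instance of this bound: a pre-code of rate $R = 1/2$ whose decoder tolerates an erasure probability of $\delta = 0.4$ yields an overhead of $\ln(1/0.4)/0.5 = 2\ln(2.5) \approx 1.83$, i.e., roughly $1.83\,k$ output symbols have to be collected, which is far from the optimal value of 1.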
Note that the overhead of the PCO Raptor Code in the previous proposition is at least $\ln(1/\delta)/(1-\delta)$, since $R \le 1-\delta$ for any code of rate $R$ that can correct a $\delta$-fraction of erasures with high probability. Moreover, this lower bound is strictly larger than 1, since $\ln(1/\delta) > 1-\delta$ for $0 < \delta < 1$, and it approaches the optimal value 1 only if $1-\delta$ approaches zero. Therefore, to obtain PCO Raptor Codes with close to optimal overhead the rate of the pre-code needs to approach zero, which means that the running time of the code cannot be a constant. The same is true for the space consumption of the PCO Raptor Code.
Despite these obvious shortcomings PCO Raptor Codes are quite appealing, since the construction transforms any block code into a Fountain Code. For example, PCO Raptor Codes could be useful when the intermediate symbols (the codeword in $\mathcal{C}$) can be calculated offline via pre-processing, and the space needed to keep these symbols is of no concern. In such a scenario a PCO Raptor Code is the fastest possible Fountain Code.

The choice of the pre-code $\mathcal{C}$ depends on the specific application in mind, though usually it is best to choose
a code with very good encoding and decoding algorithms and little overhead for a given rate. One possible
choice would be a Tornado code [3], though other choices are also possible (for example an LT-code with the
appropriate number of output symbols, or an irregular Repeat-Accumulate Code [11]).
6 Raptor Codes with Good Asymptotic Performance
In the last section we encountered two types of Raptor Codes. For one of them, the LT-Codes, the overhead and the space were close to 1, while the decoding cost grew with $k$. For PCO Raptor Codes the decoding cost could be chosen to be a constant, but then the overhead and the space were bounded away from 1; moreover, convergence to an overhead equal to 1 amounted to letting the space and the cost grow with $k$.

In this section we will design Raptor Codes between these two extremes. These codes have encoding and decoding algorithms of constant cost, and their space consumption and overhead are arbitrarily close to 1. We will design these codes by choosing an appropriate output distribution $\Omega(x)$ and an appropriate pre-code $\mathcal{C}$.
The output degree distribution we will use is very similar to the soliton distribution in Section 3. However, this distribution needs to be slightly modified. First, the soliton distribution does not have output nodes of degree one. This means that it is not possible to start the decoding process with this distribution. Second, the soliton distribution has infinitely many terms. Our distribution will modify the soliton distribution by capping it at some maximum degree $D$, and giving an appropriate weight to output symbols of degree one.

Let $\varepsilon$ be a real number larger than zero, set $D := \lceil 4(1+\varepsilon)/\varepsilon \rceil$, and define
$$ \Omega_D(x) := \frac{1}{\mu+1}\left( \mu x + \sum_{i=2}^{D} \frac{x^i}{i(i-1)} + \frac{x^{D+1}}{D} \right), $$
where $\mu := (\varepsilon/2) + (\varepsilon/2)^2$. Then we have the following result.
Lemma 4. There exists a positive real number $c$ (depending on $\varepsilon$) such that with an error probability of at most $\mathrm{e}^{-ck}$ any set of $(1+\varepsilon/2)k + 1$ output symbols of the LT-Code with parameters $(k, \Omega_D(x))$ is sufficient to recover at least $(1-\delta)k$ input symbols via BP decoding, where $\delta = (\varepsilon/4)/(1+\varepsilon)$.
Proof. We use the analysis of the decoding process as described in [3] or in [12]. Consider a set of $(1+\varepsilon/2)k + 1$ output symbols and set up the graph associated to these output symbols. This graph is a random graph with edge degree distributions $\omega(x)$ and $\iota(x)$ corresponding to the output and the input symbols, respectively. According to the analysis in [3], for any constant $\delta > 0$, if $\iota(1 - \omega(1-x)) < x$ for $x \in [\delta, 1]$, then the probability that the decoder cannot recover $(1-\delta)k$ or more of the input nodes is upper bounded by $\mathrm{e}^{-ck}$, where $c$ is a suitable constant (depending on $\varepsilon$, $\delta$, and $D$, but not on $k$).

In the case of an LT-Code with parameters $(k, \Omega_D(x))$ we have $\omega(x) = \Omega_D'(x)/\Omega_D'(1)$. To compute $\iota(x)$, fix an input node. The probability that this node is a neighbor of a given output node is $a/k$, where $a$ is the average degree of an output node, i.e., $a = \Omega_D'(1)$. The probability that the input node is a neighbor of exactly $\ell$ output nodes is therefore $\binom{N}{\ell}(a/k)^{\ell}(1-a/k)^{N-\ell}$, where $N = (1+\varepsilon/2)k + 1$ is the number of output symbols in the graph. Hence, the generating function of the degree distribution of the input nodes equals
$$ \sum_{\ell} \binom{N}{\ell}\left(\frac{a}{k}\right)^{\ell}\left(1-\frac{a}{k}\right)^{N-\ell} x^{\ell} \;=\; \left(1 - \frac{a(1-x)}{k}\right)^{N}. $$
The edge degree distribution $\iota(x)$ of the input nodes is the derivative of this polynomial with respect to $x$, normalized so that $\iota(1) = 1$. This shows that $\iota(x) = \left(1 - a(1-x)/k\right)^{N-1}$. Since $1-z \le \mathrm{e}^{-z}$ for $z \ge 0$, this implies
$$ \iota(x) \;\le\; \mathrm{e}^{-a(1-x)(N-1)/k} \;=\; \mathrm{e}^{-(1+\varepsilon/2)\,a(1-x)}. $$
So, we only need to show that the right-hand side of this inequality, with $x$ replaced by $1-\omega(1-x)$, is less than $x$ on $[\delta, 1]$, or, equivalently, that $\mathrm{e}^{-(1+\varepsilon/2)\,\Omega_D'(1-x)} < x$ for $x \in [\delta, 1]$. Note that
$$ (\mu+1)\,\Omega_D'(u) \;=\; \mu + \sum_{i=1}^{D-1} \frac{u^i}{i} + \frac{D+1}{D}\,u^{D} \;=\; \mu - \ln(1-u) - \sum_{i \ge D} \frac{u^i}{i} + \frac{D+1}{D}\,u^{D}. $$
We will show that $\sum_{i \ge D} u^i/i \le \frac{D+1}{D}\,u^D$ for $u \in [0, 1-\delta]$, which proves the inequality $(\mu+1)\,\Omega_D'(u) \ge \mu - \ln(1-u)$ on this interval. To see this, note that the left-hand side of the equivalent inequality $\sum_{j \ge 0} u^j/(j+D) \le (D+1)/D$ is monotonically increasing in $u$, so we only need to prove the inequality for $u = 1-\delta$. For the choice of $D$ in the statement of the lemma we have $D \ge 1/\delta$, and hence
$$ \sum_{j \ge 0} \frac{(1-\delta)^j}{j+D} \;\le\; \frac{1}{D}\sum_{j \ge 0}(1-\delta)^j \;=\; \frac{1}{D\delta} \;\le\; 1 \;\le\; \frac{D+1}{D}. $$
So far, we have shown that $\Omega_D'(1-x) \ge (\mu - \ln x)/(\mu+1)$ for $x \in [\delta, 1]$. So,
$$ \mathrm{e}^{-(1+\varepsilon/2)\,\Omega_D'(1-x)} \;\le\; \mathrm{e}^{-\frac{1+\varepsilon/2}{1+\mu}(\mu - \ln x)} \;=\; x^{\frac{1+\varepsilon/2}{1+\mu}}\;\mathrm{e}^{-\frac{(1+\varepsilon/2)\mu}{1+\mu}}. $$
To complete the proof we need to show that
$$ x^{\frac{1+\varepsilon/2}{1+\mu}}\;\mathrm{e}^{-\frac{(1+\varepsilon/2)\mu}{1+\mu}} \;<\; x $$
for $x \in [\delta, 1]$. Note that $(1+\varepsilon/2)/(1+\mu) < 1$, since $\mu > \varepsilon/2$. Therefore, the left-hand side divided by $x$ is decreasing in $x$, and the above inequality is valid on the entire interval iff it is valid at $x = \delta$, i.e., iff
$$ \delta^{\frac{1+\varepsilon/2}{1+\mu} - 1}\;\mathrm{e}^{-\frac{(1+\varepsilon/2)\mu}{1+\mu}} \;<\; 1. $$
Taking logarithms and plugging in the values of $\mu$ and $\delta$, it remains to show that $(\varepsilon/2)\ln\bigl(4(1+\varepsilon)/\varepsilon\bigr) < (1+\varepsilon/2)^2$, which is verified easily.
Note that the choices in the previous lemma are far from optimal, but they suffice to prove the asymptotic result.
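For concreteness, the following Python sketch (illustrative only) constructs the distribution $\Omega_D(x)$ for a given $\varepsilon$ and evaluates its average degree $\Omega_D'(1)$, which for fixed $\varepsilon$ is a constant of order $\log(1/\varepsilon)$.

```python
import math

def omega_D(eps):
    """Return (D, omega) where omega[d] is the probability of degree d in the
    distribution Omega_D(x) defined above."""
    D = math.ceil(4.0 * (1.0 + eps) / eps)
    mu = (eps / 2.0) + (eps / 2.0) ** 2
    omega = [0.0] * (D + 2)
    omega[1] = mu / (mu + 1.0)
    for i in range(2, D + 1):
        omega[i] = 1.0 / ((mu + 1.0) * i * (i - 1))
    omega[D + 1] = 1.0 / ((mu + 1.0) * D)
    return D, omega

# Example: maximum and average output degree for eps = 0.1.
D, omega = omega_D(0.1)
avg_degree = sum(d * p for d, p in enumerate(omega))
print(D, round(sum(omega), 6), round(avg_degree, 3))
```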
To construct asymptotically good Raptor Codes, we will use the LT-Codes described in the previous lemma, and suitable pre-codes. In the following we will assume that $\varepsilon$ is a fixed positive real number, and we assume that for every $n$ we have a linear code $\mathcal{C}_n$ of block-length $n$ with the following properties:

1. The rate $R$ of $\mathcal{C}_n$ is $(1+\varepsilon/2)/(1+\varepsilon)$, and

2. The BP decoder can decode $\mathcal{C}_n$ on a BEC with erasure probability $(1-R)/2 = (\varepsilon/4)/(1+\varepsilon)$ with $O(n\log(1/\varepsilon))$ arithmetic operations.

Examples of such codes are Tornado codes [3], right-regular codes [13], and certain types of repeat-accumulate codes. The reader can consult [14] for other types of such capacity-achieving examples. We remark, however, that it is not necessary for $\mathcal{C}_n$ to be capacity-achieving, since we only require that the decoder be able to decode up to a $(1-R)/2$ fraction of erasures rather than $1-R$. For example, we mention without proof that a right-regular LDPC code with an appropriately chosen message edge degree distribution can be used as the pre-code $\mathcal{C}_n$.
Theorem 5. Let $\varepsilon$ be a positive real number, $k$ be an integer, $D = \lceil 4(1+\varepsilon)/\varepsilon \rceil$, $R = (1+\varepsilon/2)/(1+\varepsilon)$, $n = \lceil k/R \rceil$, and let $\mathcal{C}_n$ be a code with the properties described above. Then the Raptor Code with parameters $(k, \mathcal{C}_n, \Omega_D(x))$ has space consumption $1/R$, overhead $1+\varepsilon$, and a cost of $O(\log(1/\varepsilon))$ with respect to BP decoding of both the pre-code and the LT-Code.
Proof. Given $(1+\varepsilon)k$ output symbols, which is (up to rounding) equal to $(1+\varepsilon/2)n$, we use the LT-Code with parameters $(n, \Omega_D(x))$ to recover at least a $(1-\delta)$-fraction of the intermediate symbols, where $\delta = (\varepsilon/4)/(1+\varepsilon)$. Lemma 4, applied with $n$ in place of $k$, guarantees that this is possible. Next we use the BP decoder for $\mathcal{C}_n$ to recover the $k$ input symbols in linear time.

It remains to show the assertion on the cost. The average degree of the distribution $\Omega_D(x)$ is
$$ \Omega_D'(1) = \frac{\mu + H(D-1) + 1 + 1/D}{1+\mu}, $$
where $H(D-1) = \sum_{i=1}^{D-1} 1/i$ is the harmonic sum up to $D-1$. (One can show that $H(D-1) = \ln(D) + \gamma + o(1)$, where $\gamma \approx 0.577$ is Euler's constant; hence $\Omega_D'(1) = O(\log(1/\varepsilon))$.) The number of operations necessary for generating the redundant symbols of $\mathcal{C}_n$ is proportional to $n\log(1/\varepsilon)$, which is proportional to $k\log(1/\varepsilon)$. The same is true for the decoding cost of $\mathcal{C}_n$. This proves the assertion on the cost.
A careful look at the decoder described above shows that its error probability is only polynomially small in $k$, rather than exponentially small (in other words, its error exponent is zero). The reason for this is that the error probability of the decoder for $\mathcal{C}_n$ has this property. So, if a different linear time decoder for $\mathcal{C}_n$ exhibits a subexponentially small error probability, then the same will also be true for the error probability of the Raptor Code which uses $\mathcal{C}_n$ as its pre-code.

We also remark that the construction in the previous theorem is essentially optimal. Using the same techniques as in Proposition 1 it can be shown that the parameters of a Raptor Code with a reliable decoding algorithm of length $(1+\varepsilon)k$ and a pre-code of rate $R$ satisfy the inequality
$$ \Omega'(1) \;\ge\; c\,\frac{\ln\bigl(1/(1-R)\bigr)}{(1+\varepsilon)\,R} $$
for some constant $c$, where $\Omega(x)$ is the output degree distribution of the corresponding LT-Code. In our construction we have $1-R = (\varepsilon/2)/(1+\varepsilon)$, so the right-hand side is of the order of $\ln(1/\varepsilon)$ for small $\varepsilon$, while the average degree $\Omega_D'(1)$ of our distribution is $O(\log(1/\varepsilon))$. Therefore, the upper and the lower bounds on the average output degree have the same order of magnitude for small $\varepsilon$. In this respect, the codes constructed here are essentially optimal.
7 Finite Length Analysis of Raptor Codes
The analysis in the previous section is satisfactory from an asymptotic but not from a practical point of view.
The analysis of the decoding process of the corresponding LT-Codes relies on martingale arguments to enable
upper bounds on the error probability of the decoder. The same is true for the pre-code. Such bounds are very
far from tight, and are especially bad when the number of input symbols is small.
In this section we will introduce a different type of error analysis for Raptor Codes of finite length with
BP decoding. This analysis relies on the exact calculation of the error probability of the LT-decoder, derived
in [10], combined with the calculation of the error probability for certain LDPC codes [15].
7.1 Design of the Output Degree Distribution
Following [1], we call an input symbol released at time $t$ if at least one neighbor of that input symbol becomes of reduced degree one after $t$ input symbols have been recovered. The input ripple at time $t$ is defined as the set of all input symbols that are released at time $t$. The aim of the design is to keep the input ripple large during as large a fraction of the decoding process as possible.

We will give a heuristic analysis of the expected size of the input ripple given that the decoding process has already recovered a fixed fraction of the input symbols. For this, it is advantageous to rephrase the BP decoding. At every round of this algorithm messages are sent along the edges from output symbols to input symbols, and then from input symbols to output symbols. The messages sent are 0 or 1. An input symbol sends a 1 to an incident output symbol iff its value is not recovered yet. Similarly, an output symbol sends a 1 to an incident input symbol iff the output symbol is not able to recover the value of the input symbol from the messages of its other neighbors.
Let $u_\ell$ denote the probability that an edge in the decoding graph carries the message 1 from an output symbol at step $\ell$ of the decoding process. Then, a standard tree analysis argument [16] shows the recursion
$$ u_{\ell+1} \;=\; 1 - \omega\bigl(1 - \iota(u_\ell)\bigr), $$
where $\omega(x)$ and $\iota(x)$ are the output and the input edge degree distributions, respectively. Note that this recursion is only valid if we can assume that the messages along the edges are statistically independent.

We have $\omega(x) = \Omega'(x)/\Omega'(1)$, and $\iota(x) \approx \mathrm{e}^{-\alpha(1-x)}$, where $\alpha$ is the average degree of an input symbol, and $\Omega'(x)$ is the derivative of $\Omega(x)$. (The latter is a standard approximation of the binomial distribution by a Poisson distribution, see the proof of Lemma 4.) Moreover, the generating polynomial of the input node degree distribution also equals $\mathrm{e}^{-\alpha(1-x)}$, since for the Poisson approximation this distribution is equal to $\iota(x)$.
Let $x_\ell$ denote the probability that an input symbol is recovered at round $\ell$. An input symbol is recovered iff it is incident to an edge which carries the message 0 from some output symbol. The probability that an input symbol is recovered, conditioned on its degree being $d$, equals $1 - u_\ell^d$. Hence, the probability that an input symbol is unrecovered at round $\ell$ of the algorithm is $\mathrm{e}^{-\alpha(1-u_\ell)}$. This shows that $x_\ell = 1 - \mathrm{e}^{-\alpha(1-u_\ell)}$. Phrasing the above recursion for the $u_\ell$ in terms of the $x_\ell$, we obtain
$$ x_{\ell+1} \;=\; 1 - \mathrm{e}^{-\alpha\,\omega(x_\ell)}. $$
This recursion shows that if an expected $x$-fraction of the input symbols has already been recovered at some step of the algorithm, then in the next step that fraction increases to $1 - \mathrm{e}^{-\alpha\,\omega(x)}$. Therefore, the expected fraction of input symbols in the input ripple will be $1 - x - \mathrm{e}^{-\alpha\,\omega(x)}$.

Suppose that the decoding algorithm runs on $(1+\varepsilon)k$ output symbols. Then $\alpha = (1+\varepsilon)\,\Omega'(1)$, and we see that the expected fraction of symbols in the input ripple is
$$ 1 - x - \mathrm{e}^{-(1+\varepsilon)\,\Omega'(x)}. $$
The above derivation is a heuristic. But for two reasons this does not matter for the design of the Raptor
codes:
1. The heuristic is only a means for obtaining good degree distribution candidates. Once we have found
a candidate, we will exactly calculate the error probability of the LT-decoder on that candidate as dis-
cussed in Section 7.2.
2. It can be shown by other means that the above formula is, in fact, the exact expectation of the size of
the input ripple [10].
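The expected-ripple formula above is easy to evaluate numerically; the short Python sketch below (illustrative, reusing the `omega` list representation of the earlier sketches) does so for a given degree distribution, overhead, and recovered fraction.

```python
import math

def expected_ripple_fraction(omega, eps, x):
    """Expected fraction of input symbols in the input ripple when an
    x-fraction has been recovered and (1+eps)k output symbols were collected:
    1 - x - exp(-(1+eps) * Omega'(x))."""
    omega_prime = sum(d * p * x ** (d - 1) for d, p in enumerate(omega) if d >= 1)
    return 1.0 - x - math.exp(-(1.0 + eps) * omega_prime)
```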
Let us assume that the pre-code of the Raptor Code to be designed has block-length $n$. We need to design the output degree distribution $\Omega(x)$ in such a way as to ensure that a large fraction of the input symbols of the LT-Code (the $n$ intermediate symbols) is recovered. To solve this design problem, we use an idea communicated to us by Luby [17]: we try to keep the expected ripple size larger than or equal to $c\sqrt{(1-x)k}$, for some positive constant $c$. The rationale behind this choice is that if deletion and insertion of elements into the input ripple were to happen independently with probability $1/2$ every time an input symbol is recovered, then the input ripple size would need to be larger by a factor of $c$ than the square root of the number of input symbols yet to be recovered, which is $(1-x)k$. Though only a heuristic, this condition turns out to be very useful for the design of the degree distributions.
Using this condition, the design problem becomes the following: given $\varepsilon$ and $\delta$, and given the number $k$ of input symbols, find a degree distribution $\Omega(x)$ such that
$$ 1 - x - \mathrm{e}^{-(1+\varepsilon)\,\Omega'(x)} \;\ge\; c\,\sqrt{\frac{1-x}{k}} $$
for $x \in [0, 1-\delta]$. Indeed, if this condition is satisfied, then the expected size of the input ripple, which is $k\bigl(1 - x - \mathrm{e}^{-(1+\varepsilon)\,\Omega'(x)}\bigr)$, is larger than or equal to $c\sqrt{(1-x)k}$.

This design problem can be solved by means of linear programming, in a manner described in [3]. Namely, the inequality can be manipulated to yield
$$ \Omega'(x) \;\ge\; \frac{-\ln\!\left(1 - x - c\sqrt{\frac{1-x}{k}}\right)}{1+\varepsilon} $$
for $x \in [0, 1-\delta]$. Note that for this to be solvable, $1-x$ needs to be larger than $c^2/k$. By discretizing the interval $[0, 1-\delta]$ and requiring the above inequality to hold on the discretization points, we obtain linear inequalities in the unknown coefficients of $\Omega(x)$. Moreover, we can choose to minimize the objective function $\Omega'(1)$ (which is again linear in the unknown coefficients of $\Omega(x)$), in order to obtain a degree distribution with the minimum possible average degree.
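The resulting linear program can be set up directly; the following sketch is illustrative only (it uses `scipy.optimize.linprog` and a maximum degree `D` chosen by the caller) and is not the exact program used to produce Table 1.

```python
import numpy as np
from scipy.optimize import linprog

def design_distribution(k, D, eps, delta, c, grid_points=100):
    """Find Omega_1..Omega_D minimizing the average degree Omega'(1) subject
    to the discretized ripple condition derived above."""
    xs = np.linspace(0.0, 1.0 - delta, grid_points)
    degrees = np.arange(1, D + 1)
    A_ub, b_ub = [], []
    for x in xs:
        margin = 1.0 - x - c * np.sqrt((1.0 - x) / k)
        if margin <= 0:
            continue                                  # constraint not imposable here
        rhs = -np.log(margin) / (1.0 + eps)
        A_ub.append(-degrees * x ** (degrees - 1))    # -Omega'(x) <= -rhs
        b_ub.append(-rhs)
    A_eq = [np.ones(D)]                               # coefficients sum to one
    b_eq = [1.0]
    objective = degrees.astype(float)                 # minimize sum_i i * Omega_i
    res = linprog(objective, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    return res.x if res.success else None
```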
Table 1 shows several optimized degree distributions we have found using this method for various values of $k$. All of the designs use the same values of the design parameters $c$ and $\delta$. It is interesting to note that for small values of $i$ the coefficient $\Omega_i$ is approximately equal to $1/(i(i-1))$, which is the same as for the soliton distribution given in Section 3.
7.2 Error Analysis of LT-codes
The upcoming paper [10] describes a dynamic programming approach to calculate the error probability of the LT-decoder for a given degree distribution. More precisely, given $k$ and the degree distribution $\Omega(x)$, the procedure computes for every $u$ with $0 \le u < k$ the probability that the decoding process fails with exactly $u$ input symbols recovered.

Figure 2 shows a plot of the cumulative probability of decoding error (vertical axis, in log-scale) versus the fraction of decoded input symbols (horizontal axis), for the sequence in Table 1 corresponding to the value $k = 65536$. Note that for all the degree distributions given in Table 1 and all large enough numbers of input symbols the error probability of the LT-decoder jumps to 1 before all input symbols are recovered. This is because the average degree of the output symbols in the LT-decoder is too small to guarantee coverage of all input symbols.
k:          65536      80000      100000     120000

Coefficients $\Omega_i$ of the optimized output degree distributions (rows correspond to increasing degree $i$):

            0.007969   0.007544   0.006495   0.004807
            0.493570   0.493610   0.495044   0.496472
            0.166220   0.166458   0.168010   0.166912
            0.072646   0.071243   0.067900   0.073374
            0.082558   0.084913   0.089209   0.082206
            0.056058   0.049633   0.041731   0.057471
            0.037229   0.043365   0.050162   0.035951
            0.001167
            0.055590   0.045231   0.038837   0.054305
            0.010157   0.015537
            0.025023   0.018235
            0.003135   0.010479   0.016298   0.009100
            0.017365   0.010777

$\varepsilon$:  0.038      0.035      0.028      0.02
$a$:            5.87       5.91       5.85       5.83

Table 1: Degree distributions for various values of $k$; $1+\varepsilon$ is the overhead, and $a$ is the average degree of an output symbol.
7.3 Design and Error Analysis of the Pre-Code
Even though the choice of a Tornado code or a right-regular code as the pre-code of a Raptor Code is sufficient for proving theoretical results about the linear time encodability and decodability of Raptor Codes with suitable distributions, such choices turn out to be rather poor in practical situations. In such cases, one is interested in a robust pre-code with provable guarantees on its decoding capability, even for short lengths.

In this section we discuss a special class of LDPC codes that are well suited as pre-codes. First, let us recall the definition of an LDPC code. Let $G$ be a bipartite graph with $v$ left and $h$ right nodes. In the context of LDPC codes the left nodes are often referred to as the message or variable nodes, while the right nodes are referred to as the check nodes. The linear code associated with the graph is of block-length $v$. The coordinate positions of a codeword are identified with the $v$ message nodes. The codewords are those vectors of length $v$
Figure 2: Decimal logarithm of the cumulative probability of error of the LT-decoder (vertical axis) versus the fraction of decoded input symbols (horizontal axis) for the sequence given in Table 1 for $k = 65536$. (a) full range, and (b) towards the end of the decoding process (less than 3% of the input symbols left to decode).
over the base field such that for every check node the sum of its neighbors among the message nodes is zero.
(See Figure 3.)
BP decoding of LDPC codes over an erasure channel is very similar to the BP decoding of LT-Codes [3]. It has been shown in [18] that this decoding algorithm is successful if and only if the graph induced by the erased message positions does not contain a stopping set. A stopping set is a set of message nodes such that their induced graph has the property that all the check nodes have degree greater than one. For example, in Figure 3 a subset of the message nodes generates a stopping set.

Since the union of two stopping sets is again a stopping set, a bipartite graph contains a unique maximal stopping set (which may be the empty set). The analysis of erasure decoding for LDPC codes boils down to computing, for each value of $s$, the probability that the graph generated by the erased positions has a maximal stopping set of size $s$.
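Viewed algorithmically, the maximal stopping set contained in a set of erasures is exactly what remains after peeling; the following illustrative Python sketch makes this concrete, with `check_neighbors` assumed to be a list giving, for every check node, the incident message positions.

```python
def peel_erasures(check_neighbors, erased):
    """Erasure decoding of an LDPC code, viewed combinatorially: repeatedly
    use a check node with exactly one erased neighbor to recover that
    neighbor.  The positions that remain erased at the end form the maximal
    stopping set contained in the erased set (empty iff decoding succeeds)."""
    erased = set(erased)
    progress = True
    while progress and erased:
        progress = False
        for neighbors in check_neighbors:          # one entry per check node
            erased_here = [m for m in neighbors if m in erased]
            if len(erased_here) == 1:
                erased.discard(erased_here[0])     # value determined by the check
                progress = True
    return erased
```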
The LDPC codes we will study in this section are constructed from a node degree distribution $P(x) = \sum_d P_d x^d$. For each of the $v$ message nodes the neighboring check nodes are constructed as follows: a degree $d$ is chosen independently at random from the distribution given by $P(x)$; then $d$ random check nodes are chosen which constitute the neighbors of the message node. The ensemble of graphs defined this way will be denoted by $\mathcal{G}(v, h, P)$ in the following.

For a graph in the ensemble $\mathcal{G}(v, h, P)$ we can calculate an upper bound on the probability that the graph has a maximal stopping set of size $s$. The following theorem has been proved in [15].
Figure 3: An LDPC code
Theorem 6 ([15]). Let $s$ be a positive integer and let $G$ be a random graph in the ensemble $\mathcal{G}(v, h, P)$. Then the probability that $G$ has a maximal stopping set of size $s$ is upper bounded by an explicitly computable expression. The expression is built from a recursively defined quantity, namely the probability that the subgraph induced by a random set of $s$ message nodes of a graph in the ensemble has a prescribed number of check nodes of degree zero and a prescribed number of check nodes of degree one; a stopping set of size $s$ corresponds to the case in which no induced check node has degree one, which yields the bound of the theorem.

A standard dynamic programming algorithm can compute the upper bound in the above theorem with a number of bit operations that is polynomial in $v$ and $h$ and in the maximum degree of the distribution $P(x)$.
7.4 Combined Error Probability
The decoding error probability of a Raptor Code with parameters $(k, \mathcal{C}, \Omega(x))$ can be estimated using the finite length analysis of the corresponding LT-Code and of the pre-code $\mathcal{C}$. This can be done for any code $\mathcal{C}$ with a decoder for which the decoding error probability is completely known. For example, $\mathcal{C}$ can be chosen from the ensemble $\mathcal{G}(v, h, P)$ of the previous subsection.

We will assume throughout that the $k$ input symbols of the Raptor Code need to be recovered from $(1+\varepsilon)k$ output symbols. Suppose that $\mathcal{C}$ has block-length $n$. For any $u$ with $0 \le u < n$, let $p_u$ denote the probability that the LT-decoder fails after recovering $u$ of the intermediate symbols. Further, let $q_m$ denote the probability that the code $\mathcal{C}$ cannot decode $m$ erasures at random positions. Since the LT-decoding process is independent of the choice of $\mathcal{C}$, the set of unrecovered intermediate symbols at the point of failure of the LT-decoder is random. Therefore, if $P_{\mathrm{err}}$ denotes the probability that the $k$ input symbols cannot be recovered from the $(1+\varepsilon)k$ output symbols, then we have
$$ P_{\mathrm{err}} \;=\; \sum_{u=0}^{n-1} p_u\, q_{n-u}. $$
Using the results of the previous two subsections, it is possible to obtain good upper bounds on the overall
error probability of Raptor Codes.
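Numerically, combining the two analyses amounts to a single sum; an illustrative sketch, with `p[u]` and `q[m]` as defined above:

```python
def raptor_error_probability(p, q):
    """Combine the LT-decoder failure probabilities p[u] (failure with exactly
    u intermediate symbols recovered, u = 0..n-1) with the pre-code failure
    probabilities q[m] (m random erasures cannot be decoded, m = 0..n)."""
    n = len(q) - 1
    return sum(p[u] * q[n - u] for u in range(len(p)))
```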
7.5 Finite Length Design
As an example of the foregoing discussion we will give one possible design of Raptor Codes for which the number $k$ of input symbols is larger than or equal to 64536. In this case, we first encode the $k$ input symbols using an extended Hamming code. This increases the number of symbols to a number which is roughly $k + \log_2(k)$.

The reason to choose the extended Hamming code as a first stage of the pre-coding is to reduce the effect of stopping sets of very small size, since an extended Hamming code has minimum distance 4 and thus takes care of stopping sets of sizes 2 and 3. Moreover, stopping sets of larger sizes can also be resolved, with good probability, using the extended Hamming code.

Next, we use a random code from an ensemble $\mathcal{G}(v, h, P)$ of the type described in the previous subsection to pre-code these symbols and produce 65536 intermediate symbols. Then we use an LT-Code with the degree distribution for length 65536 given in Table 1. The overall Raptor Code in this case is shown in Figure 4.

Figure 4: One version of Raptor Codes: the pre-coding is done in multiple stages. The first stage is Hamming coding, and the second stage is a version of LDPC coding.

Using the results of the previous sections, we can calculate an upper bound on the probability of error for this Raptor Code. For any $s$, let $\pi_s$ be the probability that the Raptor Code fails to recover a subset of size $s$ within the symbols on which the LT-encoding is performed. Figure 5 shows an upper bound on $\pi_s$ as $s$ grows. The addition of the Hamming code at the beginning results in very small values of $\pi_s$ for small $s$, and reduces the upper bound on the overall block error probability significantly.
8 Systematic Raptor Codes
One of the disadvantages of Raptor Codes is that they are not systematic. This means that the input symbols
are not necessarily reproduced by the encoder. As many applications require systematic codes for better
performance, we will design in this section systematic versions of Raptor Codes.
Throughout this section we will assume that we have a Raptor Code with parameters which
has a reliable decoding algorithm of overhead . We denote by the block-length of the pre-code .
We will design an encoding algorithm which accepts input symbols and produces a set
of distinct indices between and and an unbounded string of output symbols
such that , and such that the output symbols can be computed efficiently. Moreover,
we will also design a reliable decoding algorithm of overhead for this code.
In the following we will refer to the indices as the systematic positions, we will call the output
symbols the systematic output symbols, and we will refer to the other output symbols as the
non-systematic output symbols.
Figure 5: Upper bound on the probability that the Raptor Code of Section 7 cannot recover a subset of a given
size, for small values of that size. (The plotted bounds lie between 0 and roughly 6e-15 for subset sizes 1 through 10.)
8.1 Summary of the Approach
The overall structure of our approach is as follows. We will first compute the systematic positions .
This process also yields an invertible binary -matrix . These data are computed by sampling
times from the distribution independently to obtain vectors and applying a modification
of the decoding algorithm to these vectors. The matrix will be the product of the matrix consisting of
the rows and a generator matrix of the pre-code. These sampled vectors also determine the first
output symbols of the systematic encoder.
To encode the input symbols we first use the inverse of the matrix to transform these into
intermediate symbols . We then apply the Raptor Code with parameters to the in-
termediate symbols, whereby the first symbols are obtained using the previously sampled vectors
. All this will be done in such a way that the output symbols corresponding to the systematic
positions coincide with the input symbols.
The decoding process for the systematic Raptor Code will consist of a decoding step for the original
Raptor Code to obtain the intermediate symbols . The matrix is then used to transform these
intermediate symbols back to the input symbols .
In the next section we will introduce a matrix interpretation of the encoding and decoding procedures for
Raptor Codes and use this point of view to describe our encoding and decoding algorithms for systematic
Raptor Codes.
8.2 A Matrix Interpretation of Raptor Codes
The encoding procedure for a Raptor Code amounts to performing multiplications of matrices with vectors,
and to solving systems of equations. The matrices involved are binary, i.e., their coefficients are either zero or
one. The vectors on the other hand will be vectors of symbols, where each symbol is a binary digit, or itself a
binary vector. We will always view vectors as row vectors.
For the rest of this paper we will fix a generator matrix of the pre-code. is an binary matrix.
Let denote the vector consisting of the input vectors. The pre-coding step of the Raptor Code
corresponds to the multiplication .
Each output symbol of the Raptor Code is obtained by sampling independently from the distribution
to obtain a row vector in . The value of the output symbol is calculated as the scalar product . We
call the vector the vector corresponding to the output symbol.
To any given set of output symbols of the Raptor Code there corresponds a binary -matrix
in which the rows are the vectors corresponding to the output symbols. In other words, the rows of are
sampled independently from the distribution , and we have
(1)
where is the column vector consisting of the output symbols. Decoding the Raptor Code
corresponds to solving the system of equations given in (1) for the vector .
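This matrix view can be made concrete with a small sketch (Python with numpy; bits stand in for symbols, and the uniform sampling below is only a placeholder for the output degree distribution of the paper): pre-coding is a vector-matrix product over GF(2), and each output symbol is the inner product of the intermediate vector with the binary vector sampled for that symbol.

    import numpy as np

    rng = np.random.default_rng(0)

    def precode(x, G):
        # Pre-coding step: multiply the row vector of input symbols by the
        # generator matrix G, with all arithmetic modulo 2.
        return x @ G % 2

    def lt_output_symbol(z, v):
        # One output symbol: the GF(2) inner product of the intermediate
        # vector z with the binary row vector v sampled for this symbol.
        return int(z @ v % 2)

    # Toy example with hypothetical sizes k = 4 and n = 6.
    k, n = 4, 6
    G = rng.integers(0, 2, size=(k, n))   # placeholder pre-code generator
    x = rng.integers(0, 2, size=k)        # input symbols (bits)
    z = precode(x, G)
    v = rng.integers(0, 2, size=n)        # stands in for one sampled vector
    print(lt_output_symbol(z, v))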
8.3 The systematic positions and the Matrix
In this section we will discuss the problem of calculating the systematic positions, and the matrix . More-
over, we will study the cost of multiplication of with a generic vector of length , and the cost of solving a
system of equations for , where is a given vector of length . In order to make assertions on the
cost, it is advantageous to introduce a piece of notation. For a matrix we will denote by the number
of arithmetic operations sufficient to calculate the product of the matrix with a generic vector ,
divided by the number of rows of . This is the number of arithmetic operations per entry of the product
. In this sense is the cost of multiplying with a generic column vector, and is the cost
of solving the system of equations for a generic vector .
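For a binary matrix, one natural accounting (used in the rough illustration below, which is not from the paper) charges one XOR per nonzero entry beyond the first in each row, so the cost per entry of the product is essentially the average row weight of the matrix.

    def cost_per_row(M):
        # Operations needed to form each entry of the product M * x^T:
        # XOR-ing the symbols selected by the 1's of a row takes
        # (weight - 1) symbol operations, so the cost per entry is
        # roughly the average row weight of M.
        return sum(max(sum(row) - 1, 0) for row in M) / len(M)

    print(cost_per_row([[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 1, 1]]))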
The system (1) is solvable if and only if the rank of is . Gaussian elimination identifies rows with
indices such that the submatrix of consisting of these rows has the property that is
an invertible -matrix. This gives us the following algorithm for calculating and the systematic indices.
Algorithm 7. Input: Raptor Code with parameters , and positive real number .
Output: If successful, vectors , indices between and , and
invertible matrix such that is the matrix formed by rows .
(1) Sample times independently from the distribution on to obtain .
(2) Calculate the matrix consisting of rows and the product .
(3) Using Gaussian elimination, calculate rows such that the submatrix of consisting of
these rows is invertible, and calculate . If the rank of is less than , output an error flag.
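The sketch below (Python; an illustration under the assumption that symbols are bits, not the implementation used for the reported codes) shows one way to carry out step (3): each row of the matrix is reduced over GF(2) by the pivot rows found so far, and the rows leaving a nonzero remainder are kept. Computing the inverse of the resulting matrix is a further standard Gaussian elimination and is omitted here.

    def independent_rows_gf2(T, k):
        # Select k row indices of the binary matrix T (k columns) whose rows
        # form an invertible k x k matrix R; return None if rank(T) < k.
        pivots = []                      # list of (pivot column, reduced row)
        chosen = []                      # indices of the selected rows of T
        for idx, row in enumerate(T):
            r = list(row)
            for col, prow in pivots:     # reduce by earlier pivots, in order
                if r[col]:
                    r = [a ^ b for a, b in zip(r, prow)]
            if any(r):                   # nonzero remainder: row is independent
                pivots.append((r.index(1), r))
                chosen.append(idx)
                if len(chosen) == k:
                    return chosen, [list(T[i]) for i in chosen]
        return None                      # rank deficient: flag an error

    # Toy usage: four sampled rows over GF(2), k = 3.
    T = [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 0, 1]]
    print(independent_rows_gf2(T, 3))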
Theorem 8. (1) If the decoding algorithm for the Raptor Code errs with probability , then the probability
that Algorithm 7 fails is at most .
(2) The algorithm computes the matrix and its inverse with binary arithmetic operations.
(3) .
(4) With high probability (over the choice of the ) is upper bounded by ,
where is the encoding cost of , and is a function approaching as approaches infinity.
Proof. Let be the matrix whose rows are the vectors . The system of equations (1) is solvable
if and only if the rank of is . The probability of solving the system using the decoding algorithm for
the Raptor Code is , hence the probability that the rank of is smaller than is at most . This proves
(1).
The matrix can be calculated with operations. The matrix and its inverse, and the
systematic indices can be obtained using a standard application of Gaussian elimination to the
matrix . This step uses operations, and proves (2).
It is easily seen that for any -matrix, so (3) follows.
The multiplication can be performed by first multiplying with to obtain an -dimensional
vector , and then multiplying with . The cost of the second step is the average weight
of the . To obtain an upper bound on this average weight, note that a standard application of the Chernoff
bound shows that the sum of the weights of is , with high probability. The
average weight of the vectors is therefore at most .
This shows (4).
The above algorithm can be simplified considerably in the case of LT-Codes by a slight adaptation of BP
decoding.
Algorithm 9. Input: LT-Code with parameters , and positive real number .
Output: If successful, vectors , indices between and , and invertible
matrix formed by rows .
(1) Sample times independently from the distribution on , where is the block-length of
, to obtain , and let denote the matrix formed by these vectors as its rows.
(2) Set counter , and matrix , and loop through the following steps:
(2.1) If , identify a row of weight of ; flag an error and stop if it does not exist; otherwise, set
to be equal to the index of the row in .
(2.2) Identify the unique nonzero position of the row, and delete the column corresponding to that
position from .
(3) Set equal to the rows of .
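The following sketch (Python; illustrative only) mirrors this procedure: it runs the peeling schedule of BP decoding on the rows of the sampled matrix without operating on any symbols, and records which row resolves each intermediate position.

    def systematic_rows_by_peeling(S):
        # S is a binary matrix with k columns, one row per sampled vector.
        # Returns k row indices, or None if no row of weight one remains at
        # some step (exactly the event in which BP decoding would fail).
        k = len(S[0])
        rows = [set(j for j, bit in enumerate(r) if bit) for r in S]
        chosen = []
        for _ in range(k):
            pivot = next((i for i, r in enumerate(rows) if len(r) == 1), None)
            if pivot is None:
                return None              # flag an error, as in step (2.1)
            chosen.append(pivot)
            (col,) = rows[pivot]         # the unique nonzero position
            for r in rows:               # delete that column from every row
                r.discard(col)
        return chosen

    # Toy usage with k = 3 intermediate positions and four sampled rows.
    S = [[1, 1, 0], [0, 0, 1], [0, 1, 1], [1, 1, 1]]
    print(systematic_rows_by_peeling(S))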
Theorem 10. (1) Suppose that BP decoding is an algorithm of overhead for the above LT-Code, and
suppose that it errs with probability . Then the probability that Algorithm 9 fails is at most .
(2) The matrix can be calculated with at most arithmetic operations.
(3) With high probability (over the choice of the ) is upper bounded by ,
where is a function approaching as approaches infinity.
(4) With high probability (over the choice of the ) is upper bounded by .
Proof. The assertions follow from the fact that the algorithm provided is essentially the BP decoding algo-
rithm (except that it does not perform any operations on symbols, and it keeps track of the rows of which
participate in the decoding process). The assertion on the cost of calculating , and the cost follow
from the upper bound on the average weights of the vectors . (See the proof
of the previous proposition.)
To calculate , note that the success of the algorithm shows that, with respect to a suitable column
and row permutation, is lower triangular with 1's on the main diagonal. Hence, the cost is equal to
the average weight of and the assertion follows.
In what follows we assume that the matrix , the vectors , and the systematic positions
have already been calculated, and that this data is shared by the encoder and the decoder. The
systematic encoding algorithm flags an error if Algorithm 7 (or its LT-analogue) fails to calculate this data.
8.4 Encoding Systematic Raptor Codes
The following algorithm describes how to generate the output symbols for a systematic Raptor Code.
Algorithm 11. Input: Input symbols .
Output: Output symbols , where for the symbol corresponds to the vectors
, and where for .
1. Calculate given by .
2. Encode using the generator matrix of the pre-code to obtain , where
.
3. Calculate for .
4. Generate the output symbols by applying the LT-Code with parameters
to the vector .
We will first show that this encoder is indeed a systematic encoder with systematic positions .
This also shows that it is not necessary to calculate the output symbols corresponding to these positions.
Proposition 12. In Algorithm 11 the output symbols coincide with the input symbols for .
Proof. Note that , where the rows of are . We have , i.e., ,
since . Hence, for all , , and we are done.
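The four steps of Algorithm 11 can be summarized in the following sketch (Python with numpy; symbols are modelled as single bits, and the inverse of the matrix produced by Algorithm 7 is assumed to be given). In practice each symbol is a whole packet, and the modulo-2 arithmetic below becomes a coordinate-wise XOR of packets.

    import numpy as np

    def systematic_encode(x, R_inv, G, vectors):
        # x       : the k input symbols (bits in this sketch)
        # R_inv   : inverse over GF(2) of the k x k matrix of Algorithm 7
        # G       : k x n generator matrix of the pre-code
        # vectors : binary row vectors of length n, one per output symbol;
        #           the first k are the sampled vectors of Algorithm 7
        x = np.asarray(x) % 2
        y = x @ np.asarray(R_inv) % 2            # step 1: transform the input
        z = y @ np.asarray(G) % 2                # step 2: pre-code the result
        outputs = [int(z @ np.asarray(v) % 2)    # steps 3 and 4: inner products
                   for v in vectors]
        return z, outputs                        # outputs[:k] coincide with x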
Next, we focus on the cost of the encoding algorithm.
Theorem 13. The cost of Algorithm 11 is at most , where is the encoding cost of the Raptor
Code. In particular, if the Raptor Code is an LT-Code, then the cost of this algorithm is at most
.
Proof. Computation of has cost . Encoding has cost , where is the encoding cost of the
pre-code. Calculation of each of the has expected average cost of . Therefore, the total cost is
. But is the encoding cost of the Raptor Code, hence the assertion follows.
If the Raptor Code is an LT-Code, then is at most by Theorem 10, and
.
8.5 Decoding Systematic Raptor Codes
The decoder for the systematic Raptor Code collects output symbols and recovers the input symbols
with high probability.
Algorithm 14. Input: Output symbols , where .
Output: The input symbols of the systematic Raptor Code.
(1) Decode the output symbols using the decoding algorithm for the original Raptor Code to obtain the
intermediate symbols . Flag an error if decoding is not successful.
(2) Calculate , where , and .
As in the case of the encoding algorithm, we will first focus on the correctness of the algorithm.
Proposition 15. The output of Algorithm 14 is equal to the input symbols of the systematic encoder, and the
error probability of this algorithm is equal to the error probability of the decoding algorithm used in Step 1.
Proof. The output symbols are independently generated output symbols of a Raptor Code with
parameters applied to the vector . Therefore, the decoding algorithm used in Step 1 is successful
with high probability, and it computes the vector if it succeeds. Since , the correctness of the
algorithm follows.
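A corresponding sketch of Algorithm 14 (Python; the decoder for the original, non-systematic Raptor Code is assumed to be available as a black box raptor_decode) reads as follows.

    import numpy as np

    def systematic_decode(received, R, raptor_decode):
        # received      : the collected output symbols with their vectors
        # R             : the k x k matrix shared with the encoder
        # raptor_decode : decoder for the original Raptor Code; assumed to
        #                 return the length-k intermediate vector, or None
        y = raptor_decode(received)
        if y is None:
            raise RuntimeError("decoding of the inner Raptor Code failed")
        # undo the transformation y = x * R^{-1} of the systematic encoder
        return np.asarray(y) @ np.asarray(R) % 2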
Next we focus on the cost of the algorithm.
Theorem 16. The cost of Algorithm 14 is at most , where is the encoding cost and
is the decoding cost of the original Raptor Code, and is the encoding cost of the pre-code. If the Raptor
Code is an LT-Code, then the cost of Algorithm 14 is at most .
Proof. Step 1 of the algorithm has cost , and by Theorem 8. If the Raptor
Code is an LT-Code, then , and is upper bounded by by Theorem 10.
We would like to remark that the above algorithm can be improved. For example, if all the systematic
positions have been received, then there is no need to run the decoding algorithm at all. More generally, if
systematic positions have been received, then only input symbols need to be calculated. We leave it to
the reader to show that the cost of calculating the missing input symbols is actually
if .
8.6 Practical Considerations
Our first remark concerns the systematic positions. One would like these positions to be . The
reason we could not satisfy this condition is hidden in the proof of Proposition 15, since we needed to make
sure that the collected output symbols are statistically independent. In practice, however, it is a good idea to
permute the vectors so that the systematic positions become the first positions.
Next we remark that it is possible to reduce the error probability of the encoder considerably by generating
many more initial vectors than in Algorithm 7 (or its LT-analogue). Depending on how many initial
vectors are generated, this makes the error probability of the algorithm very small (for example, much smaller
than the probability of failure of the decoding Algorithm 14).
There are various ways of improving the running time of the systematic decoder. For example, it is not
necessary to entirely re-encode the vector in Step 2 of Algorithm 14. This is because the decoding process
in Step 1 will have recovered a large fraction of the coordinate positions of the vector obtained by applying
the pre-code to . These coordinate positions do not need to be recalculated.
We also comment that in practice the cost of multiplying with in Algorithm 11 is much smaller than
. This is because the matrix can be “almost” upper triangularized, i.e., after suitable permutations of
rows and columns will be of the form , where is an upper triangular matrix of large size,
and is invertible.
9 Acknowledgments
A number of people have helped with the design of Raptor Codes at various stages. First and foremost I would
like to thank Michael Luby for sharing with me his insight into LT-Codes and for carefully proofreading
previous versions of the paper. I am also grateful to Igal Sason for carefully reading a previous draft of the
paper and pointing out a number of corrections. Many thanks go to David MacKay for pointing out several
remaining typos and for suggestions for improvement.
Soren Lassen implemented the Raptor Codes reported in this paper and has since optimized the design
and the implementation to reach the speeds reported in the introduction. I would also like to thank Richard
Karp, Avi Wigderson, Vivek Goyal, Michael Mitzenmacher, and John Byers for many helpful discussions
during various development phases of this project.
References
[1] M. Luby, “LT-codes,” in Proceedings of the IEEE Symposium on Foundations of Computer Science
(FOCS), 2002.
[2] P. Elias, “Coding for two noisy channels,” in Information Theory, Third London Symposium, 1955, pp.
61–76.
[3] M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman, “Efficient erasure correcting codes,”
IEEE Trans. Inform. Theory, vol. 47, pp. 569–584, 2001.
[4] R. G. Gallager, Low Density Parity-Check Codes, MIT Press, Cambridge, MA, 1963.
[5] J. Byers, M. Luby, M. Mitzenmacher, and A. Rege, “A digital fountain approach to reliable distribution
of bulk data,” in Proceedings of ACM SIGCOMM ’98, 1998.
[6] M. Luby, “Information additive code generator and decoder for communication systems,” U.S. Patent
No. 6,307,487, Oct. 23, 2001.
[7] M. Luby, “Information additive code generator and decoder for communication systems,” U.S. Patent
No. 6,373,406, April 16, 2002.
[8] A. Shokrollahi, S. Lassen, and M. Luby, “Multi-stage code generator and decoder for communication
systems,” U.S. patent application 20030058958, Serial No. 032156, December 2001.
[9] P. Maymounkov, “Online codes,” Submitted for publication, 2002.
[10] R. Karp, M. Luby, and A. Shokrollahi, “Finite length analysis of LT-codes,” To appear, 2002.
[11] H. Jin, A. Khandekar, and R. McEliece, “Irregular repeat-accumulate codes,” in Proc. 2nd International
Symposium on Turbo Codes, 2000, pp. 1–8.
[12] M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman, “Improved low-density parity-check
codes using irregular graphs,” IEEE Trans. Inform. Theory, vol. 47, pp. 585–598, 2001.
[13] A. Shokrollahi, “New sequences of linear time erasure codes approaching the channel capacity,” in
Proceedings of the 13th International Symposium on Applied Algebra, Algebraic Algorithms, and Error-
Correcting Codes, M. Fossorier, H. Imai, S. Lin, and A. Poli, Eds., 1999, number 1719 in Lecture Notes
in Computer Science, pp. 65–76.
[14] P. Oswald and A. Shokrollahi, “Capacity-achieving sequences for the erasure channel,” IEEE Trans. In-
form. Theory, vol. 48, pp. 3017–3028, 2002.
[15] A. Shokrollahi and R. Urbanke, “Finite length analysis of a certain class of LDPC codes,” Unpublished,
2001.
[16] M. Luby, M. Mitzenmacher, and A. Shokrollahi, “Analysis of random processes via and-or tree eval-
uation,” in Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, 1998, pp.
364–373.
[17] M. Luby, “Design of degree distributions,” Private Communication, 2001.
[18] C. Di, D. Proietti, E. Telatar, T. Richardson, and R. Urbanke, “Finite-length analysis of low-density
parity-check codes on the binary erasure channel,” IEEE Trans. Inform. Theory, vol. 48, pp. 1570–1579,
2002.