CHAPTER 4
Channel Coding
J.H. Weber (TU Delft)
L.M.G.M. Tolhuizen (Philips Research Eindhoven)
K.A. Schouhamer Immink (University of Essen/Turing Machines)
Introduction
Channel coding plays a fundamental role in digital communication and in digital
storage systems. The position of channel coding in such a system is depicted in
Figure 4.1 overleaf. The channel encoder adds redundancy to the (possibly source
encoded and encrypted) messages generated by the information source, in order to
make them more resistant to noise and other disturbances affecting the modulated
signals during transmission over the channel. The channel decoder exploits the
redundancy when trying to retrieve the original information based on the
demodulator output. The choice of a channel coding scheme for a particular
application is a trade-off between various factors, such as the rate (the ratio
between the number of information symbols and the number of code symbols), the
reliability (the bit or message error probability), and the complexity (the
number of calculations required to perform the encoding and decoding operations).
This chapter covers references [346]–[450].
Figure 4.1: Channel coding as a component in a communication or storage system.
(Block diagram: source -> source encoder -> encrypter -> channel encoder ->
modulator -> channel, with noise entering at the channel; then demodulator ->
channel decoder -> decrypter -> source decoder -> user.)
In his landmark paper [3], Shannon showed that virtually error-free
communication is possible at any rate below the channel capacity. However, his
result did not include explicit constructions and allowed for infinite bandwidth
and complexity. Hence, ever since 1948, scientists and engineers have been
working to further develop coding theory and to find practically implementable
coding schemes. The paper of Costello, Hagenauer, Imai and Wicker gives a good
overview of applications of error-control coding. Some of the codes emerging
from coding theory as developed in the 1950s and 1960s have been applied in mass
consumer products like the CD (developed jointly by Philips and Sony in the
1970s and 1980s) and GSM (1990s). Classical reference works are the book of
MacWilliams and Sloane [42] and that of Blahut [62]. A more recent reference is
the 2-volume work [108]. The above books focus mainly on block codes; the book
of Johannesson and Zigangirov [110] deals exclusively (and expertly!) with
convolutional codes. In [109], a comprehensive overview is given of (modulation)
codes particularly designed for data storage systems, such as optical and
magnetic recording products.
The introduction of turbo codes in 1993 [90] caused a true revolution in
error-control coding. These codes allow transmission rates that closely approach
channel capacity. Also, the rediscovery of Gallager's low-density parity-check
(LDPC) codes [15] contributed to the large present interest in iterative
decoding, both theoretically and practically (iterative decoders are being
applied in UMTS).
Sessions on channel coding have been part of the Symposia on Information Theory
held in the Benelux since 1980. On average, about four channel coding papers
were presented per symposium. Among the highlights of the many Benelux
contributions to this field are the celebrated Roos bound on the minimum
distance of cyclic codes [352], Best's work on the performance evaluation of
convolutional codes on the binary symmetric channel [368], and the comprehensive
survey papers by Delsarte on association schemes in the context of coding theory
[400], [444].
In this chapter, we briefly describe the over one hundred papers on channel
coding presented at the Symposia on Information Theory in the Benelux. Some
structure has been pursued by classifying each paper into one of the following
categories: constructions and properties of (block) codes (Section 4.1),
decoding techniques (Section 4.2), codes for data storage systems (Section 4.3),
codes for special channels (Section 4.4), and, finally, applications (Section
4.5). Some categories have been divided further into subcategories. The
classification is not always unambiguous, since many papers deal with more than
one aspect (e.g., a paper presenting a code construction together with an
accompanying decoding method). The final choice represents the main contribution
of the paper in the opinion of the authors of this chapter.
4.1 Block Codes
4.1.1 Constructions
In this section, we discuss papers that deal with the construction of block
codes. Some papers in this section might just as well have been discussed in the
next section, as they aim at constructing codes with special properties, e.g. a
large minimum distance.
The well-known Griesmer bound states that the length n of a binary [n, k, d]
code satisfies the following inequality:

n ≥ g(k, d) := Σ_{i=0}^{k−1} ⌈d/2^i⌉.   (4.1)

In [349], Van Tilborg and Helleseth explicitly construct, for each k ≥ 4, a
binary [2^k + 2^{k−2} − 15, k, 2^{k−1} + 2^{k−3} − 8] code that is readily seen
to meet the Griesmer bound with equality. It is claimed that for k ≥ 8, up to
equivalence, the constructed codes are the only ones with these parameters. In
[380], Kapralov and Tonchev construct self-dual binary codes from the known
2-(21, 10, 3) designs without ovals, and study the automorphism groups of these
codes.
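The Griesmer function g(k, d) from Equation (4.1) is straightforward to
compute; the following minimal sketch (not from any of the cited papers)
evaluates it and checks two classical codes that meet the bound:

```python
import math

def griesmer(k: int, d: int) -> int:
    """Griesmer lower bound g(k, d) on the length of a binary [n, k, d] code."""
    return sum(math.ceil(d / 2**i) for i in range(k))

# The [7,4,3] Hamming code and the [7,3,4] simplex code both meet the bound:
print(griesmer(4, 3))  # 7
print(griesmer(3, 4))  # 7
```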
In [401], Ericson and Zinoviev give three methods for constructing spherical
codes (i.e., sets of unit-norm vectors in R^n) from binary constant-weight
codes. Bounds are given on the dimensionality, the minimum squared Euclidean
distance, and the cardinality of the resulting spherical codes, and numerical
examples are given.
In [403], Peirani studies a class of codes obtained by application of the
well-known (u, u+v) construction to a simplex code U and a code V from a class
of codes with normal asymptotic weight distribution. It is shown that the
resulting codes have an asymptotically normal weight distribution as well, by
using properties of the dual of the (u, u+v) code, the MacWilliams identity, and
the central limit theorem.
According to the Singleton bound, the cardinality of a code C of length n and
minimum distance d over a q-ary alphabet Q is at most q^{n−d+1}. In case of
equality, C is called an MDS code. Examples of MDS codes are Reed-Solomon codes,
which are defined if Q is endowed with the structure of a finite field (and
hence q is a power of a prime). In [404], Vanroose studies MDS codes over the
alphabet Z_m. His main results are the following. Let N^L_m(k) denote the
largest length of a linear MDS code over Z_m with m^k words. Then
N^L_m(2) = p + 1, and N^L_m(k) ≤ p + k − 1, where p is the largest prime factor
of m. Note that the demand that the code is linear over Z_m is quite
restrictive: if m is the power of a prime, doubly extended Reed-Solomon codes
are [m+1, k, m+2−k] codes for each k ∈ {1, 2, ..., m+1}.
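As a small illustration of the MDS property (not taken from [404]), the sketch
below builds a Reed-Solomon code over the prime field GF(5) by polynomial
evaluation and verifies by brute force that its minimum distance attains the
Singleton bound d = n − k + 1:

```python
from itertools import product

# Reed-Solomon code over GF(5): evaluate every polynomial of degree < k at
# n = 5 distinct field points; the Singleton bound d <= n - k + 1 is met.
q, n, k = 5, 5, 2
points = range(q)

def codeword(coeffs):
    return tuple(sum(c * x**i for i, c in enumerate(coeffs)) % q for x in points)

code = {codeword(m) for m in product(range(q), repeat=k)}
# For a linear code, minimum distance = minimum weight of a nonzero codeword.
d = min(sum(s != 0 for s in c) for c in code if any(c))
print(len(code), d)  # 25 4  -> d = n - k + 1, i.e., MDS
```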
In [422], Van Dijk and Keuning describe a construction of binary quasi-cyclic
codes from quaternary BCH codes. The length and dimension of the binary code are
determined by the generator polynomial of its originating quaternary code; its
minimum distance is at least the minimum distance of the quaternary code. For
some example codes obtained with this construction, the true minimum distance
(found by computer search) equals the best known minimum distance for binary
linear codes of the given length and dimension.
An (n, w, λ) optical orthogonal code C is a set of binary sequences of length n
and weight w such that for each x ∈ C and each integer τ ∈ {1, 2, ..., n−1},

Σ_{t=0}^{n−1} x_t x_{t+τ} ≤ λ,   (4.2)

and for any two distinct x, y ∈ C and each integer τ ∈ {0, 1, ..., n−1},

Σ_{t=0}^{n−1} x_t y_{t+τ} ≤ λ.   (4.3)

The subscripts are to be taken modulo n. Optical orthogonal codes can be used to
allow multi-user optical communication.
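Conditions (4.2) and (4.3) are easy to check numerically. The sketch below (an
illustration, not from the cited papers) verifies them for a one-word (7, 3, 1)
code whose ones lie at {0, 1, 3}, a perfect difference set mod 7:

```python
# Check the auto- and cross-correlation conditions (4.2)-(4.3) for a small
# optical orthogonal code: a single word with ones at positions {0, 1, 3}.
n, lam = 7, 1
C = [[1, 1, 0, 1, 0, 0, 0]]

def corr(x, y, tau, n):
    # Cyclic correlation at shift tau, subscripts taken modulo n.
    return sum(x[t] * y[(t + tau) % n] for t in range(n))

auto_ok = all(corr(x, x, tau, n) <= lam for x in C for tau in range(1, n))
cross_ok = all(corr(x, y, tau, n) <= lam
               for i, x in enumerate(C) for j, y in enumerate(C) if i != j
               for tau in range(n))
print(auto_ok, cross_ok)  # True True
```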
In [426], Stam and Vinck give a good overview of the known results in this area.
They also introduce a property they call "super cross-correlation": for all
distinct x, y and z in C, and each integer τ ∈ {0, 1, ..., n−1}, it is demanded
that

Σ_{t=0}^{n−τ−1} x_t y_{t+τ} + Σ_{t=n−τ}^{n−1} x_t z_{t+τ} ≤ λ.   (4.4)

Codes satisfying this extra property could be used in applications with partial
synchronization between different codewords and where the mutually synchronized
words typically are not sent simultaneously. In [436], Martirosyan and Vinck
describe a construction of optical orthogonal codes with λ = 1. If a certain
parameter in their construction is small enough, their code contains, in a
first-order approximation, as many words as possible. Specific examples of good
codes resulting from the construction are tabulated.
Figure 4.2: Citation of the Roos bound in a textbook from 2003.
4.1.2 Properties
Over the years, properties like the length, cardinality, minimum distance, or
weight distribution of codes belonging to a particular family have been studied
extensively. In this section we review miscellaneous results in this area as
presented at the various Benelux Information Theory symposia.
In [352], Roos states and proves what in present textbooks (see Figure 4.2) is
referred to as the "Roos bound" for the minimum distance of cyclic codes. The
bound reads as follows. Let α be a primitive n-th root of unity in GF(q). Let
b, c1, c2, δ and s be integers such that δ ≥ 2, (n, c1) = 1, and (n, c2) < δ,
and let

N := {α^{b + i1·c1 + i2·c2} | 0 ≤ i1 ≤ δ−2, 0 ≤ i2 ≤ s}.   (4.5)

Let C be a cyclic code over GF(q) such that each element of N is a zero of C.
That is, for each word c = (c_0, c_1, ..., c_{n−1}) ∈ C and each β ∈ N, we have
that Σ_{i=0}^{n−1} c_i β^i = 0. Then the minimum distance of C is at least
δ + s. The Roos bound is often applied to prove a lower bound on the minimum
distance of a subfield subcode of C. For example, let α be a 51st root of unity
in GF(2^8). Let B be the binary cyclic code with zeroes α, α^5 and α^9. The
conjugacy constraints imply that all elements of
N = {α^i | i ∈ {7, 8, 9, 10, 13, 14, 15, 16}} are zeroes of B. It follows from
the Roos bound, with b = 7, c1 = 1, c2 = 6, δ = 5, and s = 1, that the code
C := {(c_0, c_1, ..., c_50) ∈ (GF(2^8))^51 | Σ_{i=0}^{50} c_i β^i = 0 for all
β ∈ N} has minimum distance at least six. As B is a subcode of C, its minimum
distance is surely at least six.
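The combinatorial ingredients of this example can be checked mechanically. The
sketch below (an illustration, not part of [352]) computes the conjugates of the
zeroes of B under squaring mod 51 and confirms that they contain the Roos zero
set {b + i1·c1 + i2·c2}:

```python
# Verify the Roos-bound example: the conjugates (under squaring mod 51) of the
# zeroes alpha, alpha^5, alpha^9 of B include every exponent
# b + i1*c1 + i2*c2 with b=7, c1=1, c2=6, 0 <= i1 <= 3, 0 <= i2 <= 1.
def coset(e, n=51):
    # Cyclotomic coset of e modulo n under multiplication by 2.
    s, x = set(), e
    while x not in s:
        s.add(x)
        x = (2 * x) % n
    return s

conjugates = coset(1) | coset(5) | coset(9)
N = {(7 + i1 * 1 + i2 * 6) % 51 for i1 in range(4) for i2 in range(2)}
print(sorted(N))        # [7, 8, 9, 10, 13, 14, 15, 16]
print(N <= conjugates)  # True
```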
In [354], De Vroedt considers formally self-dual codes. For such codes, with the
property that all weights are multiples of some constant t > 1, he derives the
weight enumerator through computation of the eigenvalues and eigenvectors of the
so-called Krawtchouk matrix, rather than by using the traditional method based
on invariant theory.
In [357], Bussbach, Gerretzen and Van Tilborg study properties of
[g(k, d), k, d] codes, i.e., codes that meet the Griesmer bound from Equation
(4.1) with equality. It is shown that the maximum number of times a coordinate
in C is repeated equals s := ⌈d/2^{k−1}⌉. Moreover, it is shown that the
covering radius ρ of such codes is at most d − ⌈s/2⌉, with equality if and only
if a [g(k+1, d), k+1, d] code exists. For s ≤ 2, all [g(k, d), k, d] codes with
ρ = d − ⌈s/2⌉ are described; for fixed k and sufficiently large d, there exist
[g(k, d), k, d] codes with ρ = d − ⌈s/2⌉.
In [400], Delsarte gives a comprehensive survey of some of the main applications
and generalizations of the MacWilliams transform relevant to coding theory. The
author, one of the world's most respected contributors to this area, considers
in this paper both the generalized MacWilliams identities for inner
distributions of dual codes and the generalized MacWilliams inequalities for the
inner distributions of unrestricted codes. The latter leads to the linear
programming bound in general coding theory. The paper also contains an
introduction to association scheme theory, which is an appropriate framework for
non-constructive coding theory. In [444], again a survey paper by Delsarte, the
Hamming space, particularly important to coding theory, is viewed as an
association scheme. The paper provides an extensive overview of those parts of
association scheme theory that are especially relevant to coding problems.
Special emphasis is put on several forms of duality inherent in the theory. The
Hamming space is also considered by Canogar in [424]. The author studies an
example of a nontrivial partition design of the 10-dimensional Hamming space. He
shows that this partition can be reconstructed from its adjacency matrix.
Gillot derives in [402] bounds on the codeword weights of cyclic codes by using
bounds on exponential sums. In particular, the author pays attention to a family of
codes deﬁned by Wolfmann, for which the parameters can be expressed in terms
of numbers of solutions of trace equations.
Maximum-likelihood decoding of a linear block code can be efficiently performed
with a trellis. An important parameter for judging the complexity of trellis
decoding is the state complexity of the code. In [419], Tolhuizen shows that a
binary linear code of dimension k, Hamming distance d and state complexity at
most k − 3 has length n ≥ 2d + 2⌈d/2⌉ − 1, and constructs a [15, 7, 5] code
attaining this bound with equality.
A superimposed code in n-dimensional Euclidean space is a subset of vectors with
the property that all possible sums of any m or fewer of these vectors form a
set of points which are separated by a certain minimum distance d. Since known
bounds on the rate of such a code are not so useful for small values of m,
Vangheluwe [425] studies experimentally the case m = 2 using visualization
software packages, leading to plots for both the random-coding bound and the
sphere-packing bound.
4.1.3 Cooperating Codes
Two (or more) error-correcting codes can be combined into a new code, which has
good error correction capabilities for combinations of random and burst errors.
The new (long) code can make use of the encoding and decoding algorithms of the
(short) constituent codes, so the encoding and decoding complexity can be kept
rather low. Product codes and concatenated codes are two important classes of
such cooperating codes. In the product coding concept, two (or more) codes over
the same alphabet are combined. In the concatenated coding concept, a
hierarchical coding structure is established by combining an inner code over a
low-order (mostly binary) alphabet with an outer code over a high-order
alphabet.
The product coding concept was introduced by Elias in 1954 [9]. In the
two-dimensional case, the codewords are arrays in which the rows are codewords
from a code C1, while the columns are codewords from a code C2. After (row-wise)
transmission, the received symbols are collected in a similar array, in which
first the rows are decoded according to C1 and next the columns according to C2.
In this way, random errors are likely to be corrected by the row decoder, while
remaining burst errors, which have been distributed over various columns due to
interleaving, are to be corrected by the column decoder.
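As a toy instance of this concept (an illustration, not from the cited papers),
the sketch below uses single parity checks for both row and column codes; a
single bit error is then located at the intersection of the failing row and
column checks:

```python
# Toy two-dimensional product code: each row and each column of a 3x3
# information array gets a single parity bit. A single flipped bit is located
# by the intersection of the failing row and column checks.
def encode(info):  # info: 3x3 list of bits -> 4x4 array with parities
    rows = [r + [sum(r) % 2] for r in info]
    cols = [sum(r[j] for r in rows) % 2 for j in range(4)]
    return rows + [cols]

def correct_single(word):  # word: 4x4 received array; corrects one bit error
    bad_r = [i for i, r in enumerate(word) if sum(r) % 2]
    bad_c = [j for j in range(4) if sum(r[j] for r in word) % 2]
    if bad_r and bad_c:
        word[bad_r[0]][bad_c[0]] ^= 1
    return word

cw = encode([[1, 0, 1], [0, 1, 1], [1, 1, 0]])
rx = [row[:] for row in cw]
rx[1][2] ^= 1                    # flip one bit
print(correct_single(rx) == cw)  # True
```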
In [370], Blaum, Farrell and Van Tilborg consider simple product codes using
even-weight codes (requiring only a single parity-check bit) as constituent
codes. They propose a diagonal readout structure (instead of the traditional
row-wise procedure) together with an efficient decoding algorithm, which enables
the correction of relatively long burst errors.
In [385], Tolhuizen and Baggen show that a product code is much more powerful
than commonly expected. Product codes generally have a poor minimum distance,
i.e., there may exist codes of the same length and dimension with a higher
minimum distance. Nevertheless, they may still offer good performance, since
many error patterns of a weight exceeding half the minimum distance can be
decoded correctly, even with relatively simple algorithms. The authors derive
upper bounds on the number of error patterns of low weight that a nearest
neighbor decoder does not necessarily decode correctly. Further, they also
present a class of error patterns which are decoded correctly by a nearest
neighbor decoder. This class suggests possibilities beyond those already known
in 1989 for the simultaneous correction of burst and random errors.
Concatenated codes were introduced by Forney [18] in 1966. The classical
concatenated coding scheme consists of a binary inner code with 2^k words and an
outer code over GF(2^k), typically a Reed-Solomon code. Information is first
encoded using the outer encoder. Next, each of the generated symbols is
considered as a binary vector of length k, which is encoded using the inner
code. After transmission, the received bits are decoded by the inner decoder,
leading to symbols which are decoded using the outer decoder. In order to
further increase the burst error correction capabilities, one can insert an
interleaver between the outer and inner encoder, and a corresponding
de-interleaver between the inner and outer decoder. A popular concatenated
coding scheme (e.g., for deep space missions) uses a rate 1/2 convolutional
inner code of constraint length k = 7, and a Reed-Solomon outer code over
GF(256) of length 255 and dimension 223.
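The burst-spreading effect of an interleaver can be seen in a minimal block
interleaver sketch (an illustration with made-up dimensions, not a description
of any cited system):

```python
# Block interleaver sketch: write symbols row by row into a depth x width
# array and read them out column by column. A channel burst of length <= depth
# then touches at most one symbol per codeword (row).
def interleave(symbols, depth, width):
    table = [symbols[i * width:(i + 1) * width] for i in range(depth)]
    return [table[r][c] for c in range(width) for r in range(depth)]

data = list(range(12))  # three codewords of width 4
sent = interleave(data, 3, 4)
print(sent)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
# A burst hitting sent[0:3] corrupts symbols 0, 4, 8 - one per codeword.
```

Interleaving with the transposed dimensions undoes the permutation, which is how
the matching de-interleaver can be realized.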
In [373], Van der Moolen proposes a decoding scheme for a concatenated coding
system with a convolutional inner code and a Reed-Solomon (RS) outer code, with
block interleaving. For bursty channels, if a symbol error occurs in an RS word,
the symbols at the corresponding positions in the previous and next codewords
are suspicious. Based on this observation, Van der Moolen develops a "decoding
with memory" strategy. The basic idea is that if the RS decoder succeeds, then
at all the locations of the (corrected) symbol errors, the Viterbi decoder is
(re)started to decode the corresponding symbols of the subsequent codewords with
the new initial states. Furthermore, the author gives a 12-state Markov model
describing the process of decoding with memory for the concatenated coding
system.
In the same year, Tolhuizen [375] considered the generalized concatenation
construction proposed by Blokh and Zyablov in 1974. The BZ construction uses a
code A1 over GF(q) of dimension k and r (outer) codes B_i over GF(q^{a_i}),
where Σ_{i=1}^{r} a_i = k. The author indicates how these ingredients should be
chosen to obtain a good code, i.e., a code with high minimum distance given its
length and dimension.
At the 1989 symposium in Houthalen, prof. T. Ericson from Linköping University
in Sweden gave an invited lecture on recent developments in concatenated coding
[386]. In particular, he discussed decoding principles, the construction of
optimal codes via concatenation (e.g., a construction of the Golay code using a
Reed-Solomon outer code and a trivial distance-1 inner code), and asymptotic
bounds.
In the late 1990s, Weber and Abdel-Ghaffar studied decoder optimization issues
for concatenated coding schemes. Instead of exploiting the full error correction
capability of the inner decoder with Hamming distance d, they use this
capability only partly, thus leaving more erasures but fewer errors for the
outer decoder. Since it is easier to correct an erasure than an error, there is
a trade-off problem to be solved in order to determine the optimal choice. In
[420], the inner code error correction radius t is optimized over all possible
values 0 ≤ t ≤ ⌊(d−1)/2⌋, either by maximizing the number of correctable errors
or by minimizing the unsuccessful decoding probability. For small channel error
probabilities, a strategy that is optimal in the latter respect is also optimal
in the former respect. However, for large channel error probabilities, a
strategy that is optimal in one respect may be suboptimal in the other. In
[430], the erasing strategy is not determined by the inner code error-correction
radius, but it is made adaptive to the actual reliability values of the inner
decoder outputs. The authors also determine the maximum number of channel errors
for which correction is guaranteed under such an optimized erasing strategy.
In 1995, Baggen and Tolhuizen [409] introduced a new class of cooperating codes:
Diamond codes. The two constituent codes, C1 and C2, have the same length n and
are defined over the same alphabet. As illustrated in Figure 4.3, the Diamond
code consists of the bi-infinite strips of height n, where each column is in C1
and each slant line with a given slope is in C2.

Figure 4.3: The format of Diamond codes (columns are C1 words; slant lines are
C2 words).

In contrast to CIRC (Cross-Interleaved Reed-Solomon Code, used in the CD
system), all symbols of the Diamond code are checked by both codes. In the area
of optical recording, the application of Diamond codes can enhance storage
densities significantly. In the accompanying paper [410], Tolhuizen and Baggen
consider block variations of Diamond codes in order to make these more suited
for rewritable, block-oriented applications.
4.2 Decoding Techniques
In the previous sections, we considered papers dealing with properties of codes
and constructions of codes. In the present section, we review papers on the
decoding of error-correcting codes, both block codes and convolutional codes.
Various contributions to the decoding of convolutional codes are described.
4.2.1 HardDecision Decoding
Hard-decision decoders operate on the symbol estimates delivered by the
demodulator. A hard-decision decoder may decode up to a prespecified number of
errors and declare a decoding failure otherwise; in that case, we speak of a
bounded-distance decoder.
In [359], Simons and Roefs describe algorithms for the encoding and decoding of
[255, 255 − 2T, 2T + 1] Reed-Solomon codes over GF(256) that allow an efficient
implementation in digital signal processors. The decoding algorithms contain the
following conventional steps: syndrome computation, solving the key equation,
and error location and evaluation. Significant savings in the number of
computations are reported for Fast Fourier Transform techniques (strongly
advocated in the then recent book of Blahut [62]) used for encoding, syndrome
computations and for determining the error values.
In [379], Stevens shows that the BCH algorithm can be used to decode up to a
particular instance of the Hartmann-Tzeng bound. By applying this result while
trying all values of a set of judiciously chosen syndromes, he obtains an
algorithm for decoding cyclic codes up to half their minimum distance. For
various code parameters, the cardinality of this set of syndrome values to be
tried is minimized, and thus efficient decoding algorithms are obtained.
Van Tilburg describes [387] a probabilistic algorithm for decoding an arbitrary
linear [n, k] code. It refines the following well-known method. A set of k of
the n received bits is selected at random. It is hoped that these k bits are
error free. If the positions corresponding to these k bits form an information
set, the unique codeword corresponding to these k bits is determined, and it is
checked whether the codeword so obtained is sufficiently close to the received
word. If not, another group of k bits is selected. The method proposed by Van
Tilburg features a systematic way of checking, and a random bit-swapping
procedure.
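The basic method that [387] refines can be sketched as follows. This is an
illustration only (the choice of the [7,4] Hamming code, the trial count, and
the seed are assumptions, not taken from the paper): pick k positions at random,
and if they form an information set, re-encode and keep the closest candidate.

```python
import random

# Generator matrix of the [7,4] Hamming code (illustrative choice).
G = [[1,0,0,0,0,1,1], [0,1,0,0,1,0,1], [0,0,1,0,1,1,0], [0,0,0,1,1,1,1]]
n, k = 7, 4

def solve_gf2(A, b):
    # Solve m * A = b over GF(2) (A is k x k); return None if A is singular.
    M = [list(col) + [bit] for col, bit in zip(zip(*A), b)]  # rows of A^T | b
    for c in range(k):
        piv = next((r for r in range(c, k) if M[r][c]), None)
        if piv is None:
            return None
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c and M[r][c]:
                M[r] = [x ^ y for x, y in zip(M[r], M[c])]
    return [M[r][k] for r in range(k)]

def encode(m):
    return [sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]

def decode(r, trials=200, seed=1):
    random.seed(seed)
    best, best_d = None, n + 1
    for _ in range(trials):
        S = random.sample(range(n), k)  # hope these k positions are error-free
        m = solve_gf2([[G[i][j] for j in S] for i in range(k)], [r[j] for j in S])
        if m is None:
            continue                    # the k positions were no information set
        c = encode(m)
        d = sum(a != b for a, b in zip(c, r))
        if d < best_d:
            best, best_d = c, d
    return best

cw = encode([1, 0, 1, 1])
rx = cw[:]
rx[5] ^= 1              # one channel error
print(decode(rx) == cw)  # True
```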
In [415], Heijnen considers binary [mk, k] codes that are quasi-cyclic. That is,
if

(c_1, c_2, ..., c_k | c_{k+1}, ..., c_{2k} | ... | c_{(m−1)k+1}, ..., c_{mk})   (4.6)

is a codeword, then the vector obtained by simultaneously applying a cyclic
shift on each of the m blocks,

(c_k, c_1, ..., c_{k−1} | c_{2k}, c_{k+1}, ..., c_{2k−1} | ... | c_{mk}, c_{(m−1)k+1}, ..., c_{mk−1}),   (4.7)

is a codeword as well. Three general decoding methods are compared: comparison
to all codewords, syndrome decoding (where the quasi-cyclic property allows
reduction of the number of coset leaders to be stored), and "error division".
The latter method is based on the observation that an error vector of weight t
has a weight of at most s = ⌊t/m⌋ in at least one of its m blocks. For each i,
1 ≤ i ≤ m, and each vector e of length k and weight at most s, the codeword is
computed that in the i-th block equals the sum of e and the i-th block of the
received word. The Hamming distance of the codeword so obtained and the received
vector is used to select the codeword to decode to.
4.2.2 SoftDecision Decoding
While hard-decision decoders do their job solely based on the symbol estimates
delivered by the demodulator, soft-decision decoders also take into
consideration the reliability of those estimates. This leads to better
performance, at the expense of higher complexity. Over the years, many
soft-decision decoding techniques have been proposed. Although a
maximum-likelihood (ML) decoding algorithm minimizes the decoding error
probability, other algorithms are of interest as well, due to the
(prohibitively) high computational complexity of ML decoding for long codes.
Generalized Minimum Distance (GMD) decoding, as introduced by Forney [17] in
1966, permits flexible use of reliability information in algebraic decoding
algorithms for error correction. In subsequent trials, an increasing number of
the most unreliable symbols in the received sequence is erased, and the
resulting sequence is supplied to an algebraic error-erasure decoder, until the
decoding result and the received sequence satisfy a certain distance criterion.
In Forney's original algorithm, the unique codeword (if one exists) satisfying
the generalized minimum distance criterion is found in at most ⌈d/2⌉ trials,
where d is the Hamming distance of the code. In 1972, Chase [28] presented a
similar class of decoding algorithms for binary block codes, in which unreliable
symbols are inverted (instead of erased) in various decoding trials. From the
list of generated codewords the most likely one is chosen as the decoding
result. Although the Forney and Chase decoding approaches are rather old, they
are still highly relevant. The resulting decoders are not only used as
stand-alone decoders, but also as constituent components in modern techniques
like iterative decoding of product codes.
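The trial structure of GMD decoding can be sketched on a toy code. The sketch
below is a simplified variant (an assumption for illustration, not Forney's
exact algorithm): it uses the length-7 repetition code, a majority-vote
errors-and-erasures decoder, and simply keeps the trial candidate with the best
correlation to the soft received values instead of applying Forney's acceptance
criterion.

```python
# Simplified GMD sketch on the binary repetition code of length 7 (d = 7):
# successively erase the 0, 2, 4, 6 least reliable positions, decode by
# majority vote over the non-erased positions, keep the best candidate.
d = n = 7
codewords = [[1] * n, [-1] * n]  # the two codewords in +-1 form

def ee_decode(hard, erased):     # majority vote over non-erased positions
    s = sum(h for h, e in zip(hard, erased) if not e)
    return codewords[0] if s >= 0 else codewords[1]

def gmd(soft):
    hard = [1 if v >= 0 else -1 for v in soft]
    order = sorted(range(n), key=lambda i: abs(soft[i]))  # least reliable first
    best, best_corr = None, float("-inf")
    for j in range(0, d, 2):     # erase 0, 2, 4, ... least reliable symbols
        erased = [i in order[:j] for i in range(n)]
        c = ee_decode(hard, erased)
        corr = sum(v * b for v, b in zip(soft, c))
        if corr > best_corr:
            best, best_corr = c, corr
    return best

# Four hard-decision errors, but all with low reliability: plain majority
# voting fails, while the GMD trials recover the all-ones codeword.
soft = [0.9, 0.8, -0.1, -0.2, 0.7, -0.05, -0.15]
print(gmd(soft) == [1] * n)  # True
```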
In [391], Hollmann and Tolhuizen present a new condition on GMD decoding to
guarantee correct decoding. They apply their weakened condition to the decoding
of product codes, and describe a class of error patterns that is corrected by a
slightly adapted version of the GMD-based Wainberg algorithm for decoding
product codes. This class of error patterns equals the class that Tolhuizen and
Baggen [385] showed to be correctable by a nearest neighbor decoder two years
before, cf. Section 4.2.1.
In the early 2000s, Weber and Abdel-Ghaffar considered reduced GMD decoders.
They studied the degradation in performance resulting from limiting the number
of decoding trials and/or restricting (e.g., quantizing) the set of reliability
values. In [431], they focus on single-trial methods with fixed erasing
strategies, threshold erasing strategies, and optimized erasing strategies. The
ratios between the realizable distances and the code's Hamming distance for
these strategies are about 2/3, 2/3, and 3/4, respectively. A particular class
of reliability values is emphasized, allowing a link to the field of
concatenated coding. In [437], asymptotic results on the error-correction radius
of reduced GMD decoders are derived.
Recently, limited-trial versions of the Chase algorithm were introduced as well.
The least complex version of the original Chase algorithms ("Chase 3") [28] uses
roughly d/2 trials, where d is the code's Hamming distance. In [442], Kossen and
Weber show that decoders exist with lower complexity and better performance than
the Chase 3 decoder. It also turns out that optimization of the settings of the
trials depends on the nature of the channel, i.e., AWGN and Rayleigh fading
channels may require different arrangements. In [449], Weber considers
Chase-like algorithms achieving bounded-distance (BD) decoding, i.e., decoders
for which the error-correction radius (in Euclidean space) is equal to that of a
decoder that maps every point in Euclidean space to the nearest codeword. He
proposes two Chase-like BD decoders: a static method requiring about d/6 trials,
and a dynamic method requiring only about d/12 trials. Hence, the complexity is
reduced by factors of three and six, respectively, compared to the Chase 3
algorithm.
4.2.3 Decoding of Convolutional Codes
The Viterbi algorithm [110, Ch. 4] is a well-known method for decoding
convolutional codes that minimizes the sequence-error probability. It is the
most popular decoding algorithm for decoding convolutional codes with a short
constraint length. In the literature, quite some attention has been paid to
implementation aspects of the algorithm. Also some contributions to the WIC
symposia dealt with implementation aspects of the Viterbi algorithm.
In [369], Nouwens and Verlijsdonk discuss (in Dutch) soft-decision Viterbi
decoding of a rate R = 1/2, K = 3 convolutional code with generator polynomials
1 + D + D^2 and 1 + D^2 that is used on an AWGN channel. The effect of
quantization of the bit reliabilities that serve as input to the Viterbi decoder
is studied. An equally-spaced quantizer is assumed, and the level spacing is
determined to optimize the union bound on the error probability after decoding.
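The rate-1/2, K = 3 code with generators 1 + D + D^2 and 1 + D^2 mentioned
above admits a compact Viterbi decoder. The sketch below is a hard-decision
illustration only (soft inputs and quantization, the actual topic of [369], are
omitted; message and error position are made up):

```python
# Hard-decision Viterbi sketch for the rate-1/2, K=3 convolutional code with
# generators 1 + D + D^2 and 1 + D^2; branch metrics are Hamming distances.
G = (0b111, 0b101)  # generator taps: bit 2 = current input, bits 1,0 = state

def encode(bits):
    state, out = 0, []
    for b in bits:
        reg = (b << 2) | state
        out += [bin(reg & g).count("1") % 2 for g in G]
        state = reg >> 1
    return out

def viterbi(rx):
    INF = float("inf")
    metric, paths = {0: 0}, {0: []}   # encoder starts in the all-zero state
    for t in range(0, len(rx), 2):
        new_m, new_p = {}, {}
        for s, m in metric.items():
            for b in (0, 1):
                reg = (b << 2) | s
                o = [bin(reg & g).count("1") % 2 for g in G]
                ns = reg >> 1
                bm = m + (o[0] != rx[t]) + (o[1] != rx[t + 1])
                if bm < new_m.get(ns, INF):   # keep the survivor per state
                    new_m[ns], new_p[ns] = bm, paths[s] + [b]
        metric, paths = new_m, new_p
    return paths[min(metric, key=metric.get)]

msg = [1, 0, 1, 1, 0, 0]   # two trailing zeroes flush the encoder
tx = encode(msg)
rx = tx[:]
rx[3] ^= 1                 # single channel bit error
print(viterbi(rx) == msg)  # True
```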
Baggen, Egner and Vanderwiele [448] discuss quantization for a Viterbi decoder
used on a Rayleigh fading channel. Also here, an equally-spaced quantizer is
considered. The level spacing is now computed in such a way that the cutoff rate
of the discrete channel resulting from this quantization is optimized. The
optimal spacing depends only weakly on the average SNR, and it is better to
choose a spacing that is too large than one that is too small. Simulation
results suggest that the spacing that maximizes the cutoff rate is optimal for
Viterbi decoding as well.
Quantization of the bit reliabilities is not the only important practical aspect
of Viterbi decoding; one also has to determine which numerical range suffices
for performing the required computations. In [393] and [397], Hekstra gives
results on the maximum difference between path metrics in Viterbi decoders. From
this maximum difference, he derives consequences for reduction of the required
numerical range.
The Viterbi algorithm operates on a trellis that has a number of states that is
exponential in the encoder constraint length. Consequently, the implementation
of the Viterbi algorithm is impractical for convolutional codes with a large
constraint length. In this case, sequential decoding [110, Ch. 6], which can be
seen as a backtracking decoding method, can be applied. In the basic stack
algorithm, a search is performed in a tree, while a list is maintained of paths
of different lengths ordered according to their metrics. The path with the
highest metric is extended and subsequently removed, while the new paths are
placed within the ordered list (stack). The stack algorithm suffers incomplete
decoding when the stack is full ("stack overflow"). Its number of required
computations depends on the actual noise sequence. In [351], Schalkwijk
describes several ways of reducing the complexity of sequential decoders, using
the syndrome of the received vector. One of the observations is that extension
of a noise sequence with a "zero" digit is much more likely than extension with
a "one" digit, and that one has to consider more noise digits at each decoding
step to obtain two a priori equally likely extensions. Simulation results are
given.
The m-algorithm is a list decoding algorithm [110, Ch. 5]. It is a
non-backtracking method and, in contrast to sequential decoding, its decoding
complexity does not depend on the actually received sequence. The idea of the
algorithm is that at each time instant, a list of the m most promising initial
(equal length) parts of the codewords is extended. In [383], Van der Vleuten and
Vinck describe an implementation of the m-algorithm. Paths for which the metric
is below the median are extended; the other paths are not. As finding the median
of m numbers is linear in m, the time complexity of the algorithm is linear in
m. Their ingenious traceback method allows use of a small traceback memory.
Assume that we generate the list of the m most likely transmitted words from a convolutional code, given the received sequence. If messages include a CRC checksum, the most likely codeword in the list that has a correct CRC checksum can be selected as the final decoding result. In this way, a significant decoding gain over conventional Viterbi decoding (m = 1) can be obtained. In [447], Hekstra proposes to generate an unordered list of all words for which the path metric is within B of that of the most likely path. In this way, sorting of paths according to their path metrics is avoided. An algorithm for generating this list is given. The length of the list is a random variable. A strategy is described for choosing B in such a way that the list size remains reasonable. Simulation results are presented, showing a decoding gain of about 1.5 dB for the coding scheme employed in GSM/GPRS on a static AWGN channel.
In 1983, Best [353] describes a convolutional decoder that outputs reliability information. This decoder seems to be a rediscovery of the BCJR or forward-backward algorithm described by Bahl, Cocke, Jelinek and Raviv in 1974 [34], largely forgotten until its use in the decoding of turbo codes in the 1990s. Best considers such a decoder “not useful for practical purposes because of speed limitations”, but he does find it useful for theoretical insight into what happens during decoding. He mentions that the likelihood of a state in a most likely path is almost always equal to one, until the decoder is forced to choose between two paths with almost the same metric. In that case, the probability drops to about one half, and remains at that value until the paths merge. As a result, Best was led to modify a Viterbi decoder so that it outputs both alternative paths in case of a close decision. In a concatenated code system, the outer code can then decide which path is the correct one.
The Viterbi algorithm minimizes the sequence error probability, while the BCJR algorithm [34] minimizes the bit error probability. In concatenated coding schemes, it seems more important to minimize the error probability of the symbols entering the outer decoder. Willems and Pašić [413] describe an implementation of such a decoder with a complexity much lower than that achieved before, but still significantly larger than that of a Viterbi decoder. Simulations with a specific convolutional code show that the symbol error rate at the output of the proposed decoder is only negligibly lower than with Viterbi decoding. The proposed decoder has the advantage of generating soft-output information about the symbols, which can possibly be used by the outer decoder.
We conclude this section by discussing papers dealing with the performance of maximum-likelihood (ML) decoded convolutional codes employed on a binary symmetric channel with error probability p.
Post [346] describes an upper bound for the first error event probability of ML decoding. First, with the aid of the codeword enumerator of the code, he derives lower bounds on the weights of error patterns of a given length that an ML decoder does not decode correctly. Next, by analyzing a related random walk, he determines the probability of occurrence of error patterns satisfying these lower bounds. For small p, the well-known union bound is sharper, but for larger p, Post's bound is sharper.
Schalkwijk [348] describes a syndrome decoder for ML decoding of convolutional codes with the aim of analyzing the first error event probability. A diagram incorporating metrics and states is studied, and a Markov chain technique is applied for estimating the error event probability. This approach was continued and extended by Best, who shows in [368] that a convolutional coding scheme with ML decoding over a discrete memoryless channel can be modeled as a Markov chain. This model allows exact analysis of the statistical behavior of the errors. The method is illustrated with an R = 1/2 code with constraint length 1, used over a binary symmetric channel. Unfortunately, the amount of computation grows rapidly with the constraint length of the code. For example, according to the author, for the “standard code” with constraint length 3 and generator polynomials 1 + D and 1 + D + D^2, used on a binary symmetric channel, the Markov model has as many as 104 states. In 1995, this work was reported on in [94], dedicated to the memory of Mark Best – see Figure 4.4.
4.2.4 Iterative Decoding
The introduction of turbo codes [90] in 1993 caused a true revolution in the field of error control coding. In their original form, turbo codes combine two recursive convolutional codes along with a pseudo-random interleaver in a parallel concatenated coding scheme. Through a maximum a posteriori (MAP) iterative decoding process, performance very close to the Shannon limit is achieved. As mentioned by Wicker in [108, Ch. 25, Sect. 11], turbo codes initially met with some skepticism, but already four years after their introduction, a turbo code experimental package was launched into space aboard the Cassini spacecraft. Further
research on iteratively decodable codes resulted in the rediscovery of Gallager's low-density parity-check (LDPC) codes [15], dating from the 1960s. Currently, both turbo codes and LDPC codes are studied extensively and are considered among the most promising candidate codes for many application areas. For example, turbo codes have been implemented in UMTS, the third-generation mobile communication standard.

Figure 4.4: Paper in IEEE Transactions on Information Theory based on [368].
In [421], Tolhuizen and Hekstra-Nowacka consider turbo coding schemes employing serial (instead of parallel) concatenation. They focus on the word error rate after decoding, for which they give the average union bound. In order to compute this bound, one needs the input-output weight enumerator of the inner decoder. The authors provide an explicit formula for this enumerator, and apply it to some specific examples.
Dielissen and Huisken [432] explain four implementation techniques for the soft-input soft-output (SISO) decoding module of a third-generation mobile communication turbo decoder. They compare the performance and implementation costs (in terms of silicon area and power dissipation). The final choice is not trivial, but a trade-off between the different aspects.
The inputs and outputs of an a posteriori probability (APP) decoder as used in turbo decoding can be represented as log-likelihood ratios (LLRs). Hagenauer's box function log((1 + e^(x+y))/(e^x + e^y)) can be used to establish an explicit input-output relation of an APP decoder. Janssen and Koppelaar [433] consider turbo codes with BPSK modulation over an AWGN channel. They show that the random variable z that is the output of the box function exhibits the LLR property, that is, for each z,

log( p_z(z | b = 0) / p_z(−z | b = 0) ) = z.    (4.8)

They study the effect of mismatched inputs to the box function, and give upper and lower bounds on the LLR at the output of the box function as a function of the mismatch.
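A numerical sketch of the box function and its widely used min-sum approximation (the approximation is a standard simplification, not taken from [433]); the output always has the product of the input signs and a magnitude no larger than the smaller input magnitude:

```python
import math

def boxplus(x, y):
    """Hagenauer's box function on two log-likelihood ratios."""
    return math.log((1.0 + math.exp(x + y)) / (math.exp(x) + math.exp(y)))

def boxplus_minsum(x, y):
    """Min-sum approximation: sign product times the smaller magnitude."""
    return math.copysign(1.0, x) * math.copysign(1.0, y) * min(abs(x), abs(y))
```

For example, boxplus(x, 0) = 0 for any x: an input LLR of zero (no information) erases the combined reliability.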
Le Bars, Le Dantec and Piret [443] focus on the design of the interleavers in a turbo coding scheme. The authors present an algebraic interleaver construction method leading to codes with a high minimum distance. The performance of these codes is very good at high signal-to-noise ratios.
In [435], Balakirsky describes a realization of the maximum-likelihood (ML) decoding algorithm for messages encoded by an LDPC code and transmitted over a binary symmetric channel. The algorithm is based on the introduction of a tree structure in a space consisting of all possible noise vectors, and on principles of sequential decoding with the use of a special metric function. The author derives an upper bound on the exponent of the expected number of computations in the ensemble of low-density codes and shows that it is much smaller than the exponent for exhaustive search. It should be noted that this work is based on a (Russian) paper by the author dating from 1991, i.e., from well before the worldwide rediscovery of LDPC codes!
Steendam and Moeneclaey [441] derive the ML performance of LDPC codes, considering BPSK and QPSK transmission over a Gaussian channel. They compare the theoretical ML performance with that of the iterative decoding algorithm. It turns out that the performance of the iterative decoding algorithm is close to the ML performance when the girth of the code is sufficiently high.
4.3 Codes for Data Storage Systems
Given the continuing demand for increased data storage capacity, it is not surprising that interest in coding techniques for mass data storage systems, such as optical and magnetic recording products, has continued unabated ever since the day when the first mechanical computer memories were introduced in the 1950s. Evidently, technological advances such as improved materials, heads, mechanics, and so on have been the driving force behind the “ever” increasing data storage capacity, but state-of-the-art storage densities are also a function of improvements in channel coding, the topic addressed in this section. The book by Immink [109] and the survey article by Immink, Siegel and Wolf [107] offer a comprehensive description of the literature on this topic.
Optical recording, developed in the late 1960s and early 1970s, is the enabling technology of a series of very successful products for digital consumer electronics systems such as Compact Disc (CD), CD-ROM, CD-R, and Digital Video Disc (DVD). The design of codes for optical recording systems is essentially the design of combined dc-free, runlength-limited (DC-RLL) codes.
An encoder accepts a series of information words as input and transforms them into a series of output words, called codewords. Binary sequences generated by a (d, k) RLL encoder have, by definition, at least d and at most k 0s between consecutive 1s. Let the integers m and n denote the information word length and codeword length, respectively. The code rate, R = m/n, is a measure of the code's efficiency. The maximum rate of an RLL code, given values of d and k, is called the Shannon capacity, and it is denoted by C(d, k) [3].
Early examples of RLL codes were given by Berkoff [16] some forty years ago, and since then the quest of code designers around the world has been the creation of “practical” RLL codes whose rate approaches Shannon's theoretical rate limit. Hundreds of examples of RLL codes have been published and/or patented over the years. Dc-free codes, as their name suggests, have no spectral components at the zero frequency and suppressed spectral content near the zero frequency.
4.3.1 RLL Block Codes
One approach that has proved very successful for the conversion of source information into constrained sequences is the use of block codes. The source sequence is partitioned into blocks of length m, called source words, and under the code rules such blocks are mapped onto words of n channel symbols, called codewords. In order to clarify the concept of block-decodable codes, we have written down a simple illustrative case of a rate 3/5, (1,∞) block code. The codeword assignment of Table 4.1 provides a simple block code that converts source words of bit length m = 3 into codewords of length n = 5. The two leftmost columns tabulate the eight possible source words along with their decimal representation. We have enumerated all eight words of length four that comply with the d = 1 constraint. The eight codewords, tabulated in the right-hand column, are found by adding one leading zero to the eight 4-bit words, so that the codewords can be freely cascaded without violating the d = 1 constraint.
The code rate is m/n = 3/5 < C(1,∞) ≈ 0.69, where C(1,∞) denotes the maximum rate possible for any d = 1 code irrespective of the complexity of such an encoder. The code efficiency, expressed as the quotient of the code rate and the Shannon capacity of the (d, k)-constrained channel having the same run length constraints, is R/C(d, k) ≈ 0.60/0.69 ≈ 0.86. Thus the very simple block code considered is sufficient to attain 86% of the rate that is maximally possible.
Table 4.1: Simple (d = 1) block code.
source output
0 000 00000
1 001 00001
2 010 00010
3 011 00100
4 100 00101
5 101 01000
6 110 01001
7 111 01010
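The codebook of Table 4.1 and its free-cascading property can be checked mechanically. The closed-form value C(1,∞) = log2((1 + √5)/2), the base-2 logarithm of the golden ratio, is the standard result for the (1,∞) constraint (it is not derived in this chapter):

```python
import math
from itertools import product

# Table 4.1: 3-bit source words -> 5-bit codewords, all starting with 0
CODEBOOK = {
    (0, 0, 0): (0, 0, 0, 0, 0), (0, 0, 1): (0, 0, 0, 0, 1),
    (0, 1, 0): (0, 0, 0, 1, 0), (0, 1, 1): (0, 0, 1, 0, 0),
    (1, 0, 0): (0, 0, 1, 0, 1), (1, 0, 1): (0, 1, 0, 0, 0),
    (1, 1, 0): (0, 1, 0, 0, 1), (1, 1, 1): (0, 1, 0, 1, 0),
}

def satisfies_d1(bits):
    """d = 1 constraint: no two adjacent 1s."""
    return all(not (a and b) for a, b in zip(bits, bits[1:]))

# any cascade of two codewords keeps the d = 1 constraint
ok = all(satisfies_d1(c1 + c2)
         for c1, c2 in product(CODEBOOK.values(), repeat=2))

# C(1, inf) = log2 of the golden ratio; efficiency of the rate-3/5 code
capacity = math.log2((1 + math.sqrt(5)) / 2)    # about 0.6942
efficiency = (3 / 5) / capacity                 # about 0.86
```

The leading zero of every codeword is what makes the pairwise check succeed, exactly as argued in the text.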
It is straightforward to generalize the preceding implementation example to encoder constructions that generate sequences with an arbitrary value of the minimum run length d. To that end, choose some appropriate codeword length n. Write down all d-constrained words that start with d zeros. The number of codewords that meet the given run length condition is N_d(n−d), which can be computed with generating functions or recursive relations [23].
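One such recursive relation counts the number of (d,∞)-constrained words of length n: a valid word either ends in a 0, or ends in a 1 preceded by d 0s, giving N(n) = N(n−1) + N(n−d−1). A minimal sketch (the boundary condition for n ≤ d reflects that at most a single 1 fits):

```python
def count_d_constrained(n, d):
    """Number of length-n binary words in which any two 1s are separated
    by at least d 0s, via N(n) = N(n-1) + N(n-d-1)."""
    N = [k + 1 for k in range(d + 1)]     # for n <= d, at most one 1 fits
    for k in range(d + 1, n + 1):
        N.append(N[k - 1] + N[k - d - 1])
    return N[n]
```

For d = 1 this is a shifted Fibonacci sequence: count_d_constrained(4, 1) gives the eight 4-bit words used in Table 4.1, and the ratio of consecutive counts converges to the golden ratio, whose base-2 logarithm is C(1,∞).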
A maximum run length constraint, k, can be incorporated in the code rules in a straightforward manner. For instance, in the (d = 1) code previously described, the first codeword symbol is at all times preset to zero. If, however, the last symbol of the preceding codeword and the second symbol of the actual codeword to be conveyed are both zero, then the first codeword symbol can be set to one without violating the d = 1 channel constraint. This extra rule, which governs the selection of the first symbol, the merging rule, can be implemented quite smoothly with some extra hardware. It is readily conceded that with this additional ‘merging’ rule the (1,∞) code turns into a (1,6) code. The process of decoding is exactly the same as that for the simple (1,∞) code, since the first bit, the “merging” bit, is redundant, and in decoding it is skipped anyway. The (1,6) code is a good illustration of a code that uses state-dependent encoding (the actual codeword transmitted depends on the previous codeword) and state-independent decoding (the source word can be retrieved by observing just a single codeword, that is, without knowledge of previous or upcoming codewords or the channel state).
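The merging rule and the resulting (1,6) constraint can be verified exhaustively over the codewords of Table 4.1; the sketch below checks every cascade of three codewords:

```python
from itertools import product

# the eight codewords of Table 4.1
CODEWORDS = [(0, 0, 0, 0, 0), (0, 0, 0, 0, 1), (0, 0, 0, 1, 0),
             (0, 0, 1, 0, 0), (0, 0, 1, 0, 1), (0, 1, 0, 0, 0),
             (0, 1, 0, 0, 1), (0, 1, 0, 1, 0)]

def cascade(words):
    """Cascade codewords with the merging rule: the leading 0 becomes a 1
    when the previous codeword ended in 0 and the second symbol of the
    current codeword is 0."""
    stream = list(words[0])
    for w in words[1:]:
        w = list(w)
        if stream[-1] == 0 and w[1] == 0:
            w[0] = 1
        stream.extend(w)
    return stream

def is_d1(bits):
    """No two adjacent 1s."""
    return all(not (a and b) for a, b in zip(bits, bits[1:]))

def max_zero_run(bits):
    run = best = 0
    for b in bits:
        run = run + 1 if b == 0 else 0
        best = max(best, run)
    return best

# all 8^3 cascades of three codewords
streams = [cascade(t) for t in product(CODEWORDS, repeat=3)]
```

The worst case (a run of exactly six zeros) arises when a codeword ending in 1 is followed by the all-zero codeword and then a codeword whose second symbol is 1, so that the merging bit cannot be set.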
The first article describing RLL block codes was written by Tang and Bahl [23] in 1970. It describes a method where (d, k)-constrained info blocks of length n0 are cascaded with merging blocks of length d + 2. Twelve years later, it was shown by Beenker and Immink [60] that this method can be made more efficient by constraining the maximum number of zeros at the beginning and end of the (d, k)-constrained info blocks to k − d. Then merging blocks of length d are sufficient to cascade (glue) the info blocks. The authors presented two constructions. In the first construction, the merging block is the all-zero word (as in Table 4.1), while in the second (more efficient) construction, the merging blocks depend on the two neighboring info words.
4.3 Codes for Data StorageSystems 107
The methods described by Weber and Abdel-Ghaffar [392], [395] offer a more flexible and efficient way of cascading RLL blocks than that described in the early literature, specifically for the case where k is rather small. The method presented by Tjalkens [394] does not use ‘merging bits’ to cascade the RLL info blocks; instead, Tjalkens shows that with the set of (d, k)-constrained codewords that start with at least d zeros and end with at most k − 1 zeros, one may construct an RLL block code of maximum size. Later constructions showed that merging blocks of length less than d can be used, where the merging algorithm may alter both the merging block and (small) parts of the info word.
The article by Hollmann and Immink [390] addresses the problem of generating RLL sequences under the additional demand that a certain prescribed sequence of run lengths is never generated. This specific sequence of run lengths to be avoided is called a prefix; in recording practice it is normally used as a synchronization pattern.
In essence, all articles mentioned above discuss block codes. The article by Hollmann [398] takes a completely different approach, as codes generated by his constructions must be decoded by sliding-block decoders. A sliding-block decoder observes the n-bit codeword plus r preceding n-bit codewords plus q trailing n-bit codewords. Such a sliding-block concept leads to codes that have a high efficiency, involve little hardware, and usually do not have too many significant drawbacks. A drawback of codes that are decoded by a sliding-block decoder is error propagation, as the decoding operation depends on r + q + 1 consecutive codewords. In practice, the increased efficiency and reduced hardware of a sliding-block decoder outweigh the extra load on the error correction unit. There are various coding formats and design methods with which such codes can be constructed. Immink [114] has recently shown that very efficient sliding-block codes can be designed. For example, a rate 9/13, (1,18) 5-state encoder has a redundancy of 0.2%, while a rate 6/11, (2,15) 9-state encoder has a redundancy of 0.84%.
The article by Abdel-Ghaffar and Weber [412] addresses runlength-constrained channels where there is, as in the prior art, a maximum run length constraint, and additionally a maximum run length constraint on both the odd and the even positions of the encoded sequence. These codes are often called (0, G/I)-constrained, where G denotes the maximum run length constraint on the sequence, and I denotes the maximum run length imposed on the symbols at the odd and even positions. Abdel-Ghaffar and Weber study block codes, for which they show results on the maximal size of a set of (0, G/I)-constrained codewords of length n that can be freely concatenated without violating the specified (0, G/I) constraint.
Closing Remark by the Editors
The work described in several WIC papers of Schouhamer Immink et al. summarized in this subsection on RLL codes has found its way into consumer electronics products, such as CD and DVD. His contributions to these products have gained him recognition from several international institutions and societies.
4.3.2 Dc-Free Codes
Dc-balanced or dc-free codes, as they are often called, have a long history and their application is certainly not confined to recording practice. Since the early days of digital communication over cable, dc-balanced codes have been employed to counter the effects of low-frequency cutoff due to coupling components, isolating transformers, and so on. In optical recording, dc-balanced codes are employed to circumvent or reduce interaction between the data written on the disc and the servo systems that follow the track. Low-frequency disturbances, for example due to fingerprints, may cause a completely wrong readout if the signal falls below the decision level. Errors of this type are avoided by high-pass filtering, which is only permissible provided that the encoded sequence itself does not contain low-frequency components, or, in other words, provided that it is dc-balanced.
Rejection of LF components is usually achieved by bounding the accumulated sum of the transmitted symbols. Common sense tells us that a certain rate has to be sacrificed in order to convert arbitrary user data into a dc-balanced sequence. The quantification of the maximum rate, the capacity, of a sequence given that it contains no low-frequency components has been reported by Chien [22]. The articles by Immink [358] and De With [360] provide a description of key characteristics of dc-free sequences generated by a Markov information source having maximum entropy. Given a maxentropic Markov source describing a dc-balanced sequence, we can substitute the maxentropic transition probabilities, and computation of the spectrum is then straightforward. Knowledge of ideal, “maxentropic” sequences with a spectral null at dc is essential for understanding the basic trade-offs between the rate of a code and the amount of suppression of low-frequency components. The results obtained in [358] and [360] allow us to derive a figure of merit of implemented dc-balanced codes that takes into account both the redundancy and the emergent frequency range with suppressed components (notch width).
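A small sketch of the bounded accumulated sum, here called the running digital sum (RDS), using the biphase code mentioned in Section 4.3.3 as a trivially dc-balanced example; the mapping of 0/1 to −1/+1 channel symbols is the usual convention:

```python
def rds_trace(symbols):
    """Running digital sum of a binary sequence mapped to -1/+1 symbols
    (0 -> -1, 1 -> +1); a sequence is dc-free iff the RDS stays bounded."""
    rds, trace = 0, []
    for b in symbols:
        rds += 1 if b else -1
        trace.append(rds)
    return trace

def biphase(bits):
    """Biphase (Manchester) encoding: 0 -> 01, 1 -> 10.  Every symbol
    pair sums to zero, so the RDS never leaves the interval [-1, +1]."""
    out = []
    for b in bits:
        out += [b, 1 - b]
    return out
```

The tighter the RDS bound, the stronger the suppression of low-frequency components, at the cost of rate: biphase achieves the extreme bound of 1 at rate 1/2.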
Beenker and Immink [367] present a category of dc-free codes called dc2-free codes. This type of code offers a larger rejection of low-frequency components than is possible with the traditional codes discussed in the prior art. Besides the trivial fact that they are dc-balanced, an additional property of dc2-free codes is that the second (and even higher) derivative of the code spectrum also vanishes at zero frequency (note that the odd derivatives of the spectrum at zero frequency are zero because the spectrum is an even function of the frequency). The imposition of this additional channel constraint results in a substantial decrease of the power at the very low frequencies for a fixed code redundancy, as compared with designs based on the conventional ‘bounded accumulated sum’ concept. The drawback of this new scheme lies in the implementation of the codes, as it demands significantly more hardware and large codewords at high coding rates.
4.3.3 Error-Detecting Constrained Codes
The paper by Immink [374] offers coding techniques for simple partial-response channels. He showed that the simple biphase code can be used as an inner code of an outer code designed for maximum (free) Hamming distance. The paper by Weber and Abdel-Ghaffar [389] discloses a class of runlength-limited codes that can detect asymmetric errors made during transmission. Baggen and Balakirsky [450] consider data transmission over so-called bit shift channels with (2,∞) RLL constraints, and obtain bounds on the entropy of the output sequences.
4.4 Codes for Special Channels
4.4.1 Coding for Memories with Defects
In 1974, Kusnetsov and Tsybakov introduced [35] the following model for coding for memories with stuck-at defects. In some memory cells, known to the encoder, only one particular symbol (also known to the encoder) can be written. The decoder does not know in which positions stuck-at errors occur. The question is how much information can be stored in such a memory with stuck-at defects. Kusnetsov and Tsybakov [35] gave upper bounds on the rate that can be obtained if a fraction p of the positions contains stuck-at errors. With a random coding argument, they obtained the surprising result that the capacity of a stuck-at channel with stuck-at probability p equals 1 − p.
Some ten years later, coding for stuck-at defects was a popular subject at various WIC symposia. In 1985, Van Pul [361] described an explicit construction for achieving the capacity of the stuck-at channel with stuck-at probability p. In the same year, Baggen [362] showed that MDS codes achieve the upper bound on the information rate, given the number of stuck-at errors combined with random errors. Vinck [363] varies on the theme by using convolutional codes for correcting bursts of defect errors, separated by guard spaces. In [382], Peek and Vinck give an explicit algorithm for the binary stuck-at channel. Bounds on the bit error rate and the decoding complexity are also obtained. Schalkwijk and Post [381] take an information-theoretic approach to coding for stuck-at errors. Indeed, suppose that information is stored in elementary blocks of n bits. The memory with known defects is then equivalent to a noisy channel with input and output alphabets of size 2^n. This “super channel” can be described by a strategy in which an n-bit input block is to be used for a particular input message and defect pattern. In a memory with known defects, the bit values that are eventually read out become available at the moment of storing. In other words, the equivalent super channel has perfect feedback, and repetition feedback strategies can be used [26] – see also Section 4.4.5. Strategies for small n are described.
Vinck and Post [376] discuss the following combined test and error-correction procedure. A message m of even length is initially written in memory as x(m) = (0, m, P), where P is the parity of m. Upon reading a word z from memory, we check whether it has an even number of ones. If so, we leave it unchanged; if not, we invert all its bits and obtain z′. If z originates from x(m) by a single stuck-at error, then all bits of z except for the stuck-at bit are actually inverted; the stuck-at bit keeps its value, which is incorrect for x(m). Consequently, z′ is the complement of x(m). We see that m can be represented by two words, namely x(m) and its complement, as long as at most one stuck-at error occurs in the bits of the word. Note that both x(m) and its complement have an even number of ones. We keep applying the same procedure. A next single stuck-at error that occurs in the course of time is detected, as inversion of the word leads to a 0 in the leftmost bit. Upper and lower bounds on the mean time before a memory fails with this procedure are given, and an extension of the procedure for combination with coding for random (non-permanent) errors is indicated.
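The single-error part of this procedure can be sketched as follows; the function names and the particular stuck position are illustrative, and only the case of one stuck-at cell is modeled:

```python
def store(msg):
    """x(m) = (0, m, P): a leading 0, the message, and its parity bit P."""
    return [0] + msg + [sum(msg) % 2]

def write(word, stuck):
    """Write to memory; cells listed in `stuck` retain their stuck value."""
    return [stuck.get(i, b) for i, b in enumerate(word)]

def refresh(mem, stuck):
    """Read-out with parity check: on odd parity, rewrite the complement.
    The stuck cell (wrong for x(m)) is correct for the complement, so the
    memory then holds the exact complement of x(m)."""
    if sum(mem) % 2 == 1:
        mem = write([1 - b for b in mem], stuck)
    return mem

def retrieve(mem):
    """m is represented by x(m) or its complement; the leftmost bit
    tells which of the two is currently stored."""
    if mem[0] == 1:
        mem = [1 - b for b in mem]
    return mem[1:-1]
```

Both x(m) and its complement have even weight, so the parity check stays consistent across repeated applications of the procedure.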
In 1989, Bassalygo, Gelfand and Pinsker [76] introduced the model of localized errors. In this model, the encoder knows a set E of codeword positions in which an error may occur; outside E, no errors occur. The decoder does not know E. Coding for this model received quite some attention in the early nineties, as indicated by Bratatjandra and Weber in their paper from 1997 [417]. In this paper, the authors take for E a set of multiple burst errors, that is, E is the union of a collection of disjoint sets of consecutive positions. In the literature, the main attention is on sets E consisting of all sets of positions up to a certain cardinality. Bratatjandra and Weber assume that both encoder and decoder know an upper bound m on the number of bursts, and an upper bound b on the length of each burst. They give a “fixed-rate” scheme for this situation. They also give a “variable-rate” scheme that allows the transmitter to send more information if the actual number of burst errors is below m, or one or more of the burst lengths is below b.
4.4.2 Asymmetric/Unidirectional Error Control Codes
Most classes of error control codes have been designed for use on binary symmetric channels, on which 0→1 crossovers and 1→0 crossovers occur with equal probability (symmetric errors). However, in certain applications, such as optical communications, the error probability from 1 to 0 may be significantly higher than the error probability from 0 to 1. These applications can be modeled by an asymmetric channel, on which only 1→0 transitions can occur (asymmetric errors). Further, some memory systems behave like a unidirectional channel, on which both 1→0 and 0→1 errors are possible, but per transmission, all errors are of the same type (unidirectional errors).
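The three channel models can be sketched as straightforward simulations (the sketch is illustrative and not taken from any of the papers discussed):

```python
import random

def symmetric(word, p, rng):
    """BSC: 0->1 and 1->0 crossovers occur with equal probability p."""
    return [b ^ (rng.random() < p) for b in word]

def asymmetric(word, p, rng):
    """Asymmetric (Z-) channel: only 1->0 transitions can occur."""
    return [0 if b == 1 and rng.random() < p else b for b in word]

def unidirectional(word, p, rng):
    """Per transmission, all errors are of one (randomly drawn) type."""
    kind = rng.randrange(2)            # 0: only 1->0, 1: only 0->1
    if kind == 0:
        return [0 if b == 1 and rng.random() < p else b for b in word]
    return [1 if b == 0 and rng.random() < p else b for b in word]
```

Note the structural difference: the output of the asymmetric channel is always coordinate-wise below the input, while the unidirectional channel output is either coordinate-wise below or coordinate-wise above it.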
Codes that detect and/or correct symmetric errors have been studied extensively since the 1940s. Of course, these codes can also be used to detect and/or correct asymmetric or unidirectional errors. However, it seemed likely that one could design codes that detect and/or correct asymmetric or unidirectional errors with less redundancy than a comparable symmetric-error-correcting code. Pioneering work in this area was done by Varshamov [33] in the 1960s and 1970s. In the Benelux, the topic was further explored by Weber and various co-authors in the late 1980s and early 1990s.
In [377], Weber, De Vroedt and Boekee propose a method to construct codes correcting up to t asymmetric errors by expurgating and puncturing codes of Hamming distance 2t + 1. The resulting codes are often of higher cardinality than their symmetric-error-correcting counterparts, but are mostly nonlinear. The same group of authors derived bounds on the sizes of codes that correct unidirectional errors [378], and they determined necessary and sufficient conditions for a block code to be capable of correcting/detecting any combination of symmetric, unidirectional, and asymmetric errors [384].
For practical purposes it is highly desirable that a code is systematic, i.e., that the message is to be found unchanged in the codeword. In [399], Weber and Kaag present a construction method for systematic codes which are able to correct up to t asymmetric errors and detect from t + 1 up to d asymmetric errors.
Finally, in [405], Weber studies the asymptotic behavior of the rates of optimal codes correcting and/or detecting combinations of symmetric, unidirectional, and/or asymmetric errors. The main conclusion is that, without losing rate asymptotically, one can upgrade any error control combination to simultaneous symmetric error correction/detection and all unidirectional error detection.
4.4.3 Codes for Combined Bit and Symbol Error Correction
In 1983, Piret introduced [355] binary codes for compound channels on which both bit errors and symbol errors occur, where a symbol is a fixed group of bit positions. He introduces a distance profile to measure the error control capabilities and gives some examples of codes for combined bit and symbol error control.
Two years later, Van Gils published the first of a series of three papers dealing with the construction of codes for combined bit and symbol error correction. In the application that Van Gils has in mind, a symbol corresponds to a module in a processor. An erased symbol thus corresponds to a module that is detected to be in error, while an erroneous symbol corresponds to a malfunctioning module that is not detected to be in error. In [366], Van Gils announces binary [3k, k] codes for k = 4, 8, 16 that can correct one single symbol error (i.e., one of the three groups of k bits is in error), up to k/4 + 1 bit errors, and one single symbol erasure plus up to k/4 bit errors (for k = 4, 8) or 3 bit errors (for k = 8). In addition, for k = 8 and k = 16, k/4 + 2 bit errors can be detected. In [371], he describes a binary [27,16] code, with symbol size 9, that can correct single bit errors, detect single (9-bit) symbol errors, and detect up to four bit errors. Finally, in [372], Boly and Van Gils suggest constructing codes for controlling bit and symbol errors by representing the symbols of a symbol-error-correcting code with respect to a judiciously chosen basis.
4.4.4 Coding for Informed Decoders
In 2001, Van Dijk, Baggen and Tolhuizen introduced informed decoding [438]. This concept was inspired by the following practical application. The address of a sector of an optical disc is part of a header that is protected by its own error-correcting code. In many circumstances, the location of the reading/writing head is approximately known. The question is whether it is somehow possible to use this information on the actual sector address for retrieving the header more reliably. With informed decoding, it is assumed that the decoder is informed about the value of some information symbols of the transmitted codeword. The authors show that with judicious encoding, the decoder can employ such information to effectively decode to a subcode with a larger minimum distance. Three ways to encode well-known codes that lead to favorable decoding capabilities are presented.
In [440], Tolhuizen, Hekstra, Cai and Baggen discuss two aspects of coding for informed decoding. Firstly, they propose to use a certain Gray code for addressing sectors in such a way that all addresses of sectors close to a target sector have many coordinates in common. In this manner, it is ensured that whenever the reading/writing head lands close to the target sector, many coordinates of the address of the sector in which the head actually lands are known. It is claimed that the proposed method yields the maximum number of common coordinates for each maximum deviation from the target sector. The other aspect aims to improve decoding for data encoded for informed decoding, but where no information about known information symbols is supplied to the decoder. This is done by combining the codewords of several consecutive sectors, which usually have many information symbols in common.
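As an illustration of the idea (though not the specific construction of [440]), the standard binary-reflected Gray code already guarantees that adjacent sector numbers share all but one address bit:

```python
def gray(i):
    """Binary-reflected Gray code of sector number i."""
    return i ^ (i >> 1)

def hamming(a, b):
    """Number of address bits in which two sector addresses differ."""
    return bin(a ^ b).count('1')

# adjacent sector numbers share all but one address bit
assert all(hamming(gray(i), gray(i + 1)) == 1 for i in range(1023))
```

A head landing one sector away from the target thus leaves only one address coordinate uncertain; the construction claimed in [440] extends this to larger maximum deviations.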
4.4.5 Coding for Channels with Feedback
Already in 1956, Shannon proved [10] the surprising fact that feedback does not
increase the capacity of a discrete memoryless channel. Feedback may, however,
significantly reduce the complexity that is required to obtain reliable
communication. In 1971, Schalkwijk presented simple fixed-length feedback
strategies for the binary symmetric channel with error probability p [26]. It is
assumed that the feedback is error-free and instantaneous, that is, immediately
after the transmission of a bit, the transmitter knows which bit value has been
received. Schalkwijk's strategies achieve an upper bound on the rate below which
reliable communication is possible and can be described as follows. A message
index s is precoded to an n-bit message m that does not contain a run of k equal
symbols. The transmitter consecutively transmits the bits of m until the feedback
reports the occurrence of an error. In such a case, the bit that was meant to be
transmitted is repeated k times, and transmission continues until the next error
occurs. If all bits of m have been transmitted successfully, a tail is added until
n bits have been transmitted. The receiver decodes as follows. Working its way
back from the last received bit, it replaces subsequences 01^k by 1 and 10^k by 0,
respectively, and afterwards, it removes the tail.
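The receiver's back-to-front replacement rule can be sketched concretely as follows (a toy implementation under the assumption that the tail has already been removed; the function name is ours):

```python
def schalkwijk_decode(received: str, k: int) -> str:
    # Scan from right to left; replace 01^k by 1 and 10^k by 0,
    # as in Schalkwijk's receiver rule (tail assumed already stripped).
    s = list(received)
    i = len(s) - 1                  # right end of the current (k+1)-window
    while i >= k:
        seg = s[i - k:i + 1]
        if seg == ['0'] + ['1'] * k:
            s[i - k:i + 1] = ['1']
            i -= k                  # window now ends at the new symbol
        elif seg == ['1'] + ['0'] * k:
            s[i - k:i + 1] = ['0']
            i -= k
        else:
            i -= 1
    return ''.join(s)
```

For example, with k = 3 and message 0110, an error turning the second transmitted bit (a 1) into a 0 makes the transmitter repeat 1 three times, so the channel output is 0011110; scanning right to left, the decoder replaces the subsequence 0111 by 1 and recovers 0110.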
In the 1990s, Veugen and Bargh, two Ph.D. students of Schalkwijk, built further
on his research on channels with feedback. The remainder of this section describes
their work as presented at various WIC symposia.
A possible choice for the tails in Schalkwijk's strategy is the alternating
sequence 0101.... In [407], Veugen studies conditions on the tails that are
sufficient for correct operation of Schalkwijk's strategies. In [396], he
introduces the following generalization of Schalkwijk's scheme. Each bit of the
message m is transmitted c times in c consecutive transmissions. If not all c
received bits are equal, the receiver neglects them, and the transmitter again
transmits the intended message bit c times, until c equal bits are received. If
the receiver decodes incorrectly, which happens if the channel produces c
consecutive errors, the transmitter acts as in Schalkwijk's scheme: it inserts
the last message bit k times in the message m. This scheme reduces to
Schalkwijk's scheme if c = 1. For c > 1, it introduces large redundancies, so it
is not suitable for small p. For each p < 1/2, a strategy can be found that has a
positive rate. The schemes need less than 1 bit of feedback per transmitted bit,
as for each c bits, the encoder only needs to know if they were all zero, all
one, or not all equal.
In [406], Veugen considers the following extension of Schalkwijk's scheme to
nonbinary channels. If the transmitter observes that symbol j was received
although it sent symbol i, it immediately repeats symbol i k_ij times. A precoder
takes care that in the data stream to be transmitted, subsequences of the form
j i^{k_ij} (with i ≠ j) do not occur. Veugen considers decoding with a fixed
delay D. That is, suppose the sequence (x_n)_{n≥0} is transmitted, and the
sequence (y_n)_{n≥0} is received. Symbol y_n will be decoded as follows. The
sequence y_n, y_{n+1}, ..., y_{n+D} is scanned from right to left, and each
subsequence j i^{k_ij} is replaced by i. The leftmost symbol of the resulting
sequence is the estimate x̂_n. By comparing x̂ and y, the precoder inverse can
locate the errors and eliminate the error correction symbols. Veugen studies the
error probabilities for these schemes. Combining
calculations on random walks and a plausible conjecture, he computes the error
exponent of the strategy.
In [414], Schalkwijk and Bargh consider the situation where the feedback link is
without delay and noiseless, but operates at a smaller rate than the forward
channel. They combine Ungerboeck's set partitioning technique with feedback
schemes for full-rate feedback. The feedback scheme is used to check whether the
received signal was in the correct subset of signal points. If so, convolutional
decoding is expected to retrieve the remaining information correctly. If not, the
label of the subset of signal points is repeated. An example with feedback rate
1/2 and a ν = 2 convolutional code shows a much better performance than a much
more complicated ν = 6 convolutional code.
In [423], Bargh and Schalkwijk compare the block coding strategies discussed
above with a recursive scheme. In the latter case, decoding takes place after a
fixed delay D. A new strategy is discussed, and results on the rate and error
exponent are obtained. In [428], Bargh and Schalkwijk introduce Soft-Repetition
Feedback Coding and its recursive decoding method for binary-input, soft-output
symmetrical discrete memoryless channels. The method is explained with a
binary-input, quaternary-output channel.
In [429], Bargh and Schalkwijk give an overview of error correction schemes for
DMCs and AWGN channels with noiseless, instantaneous and full-rate feedback. They
distinguish between two classes. In the first class, which they call "repeat to
resolve uncertainty", the transmitter conceptually reconstructs the list of
candidate codewords for the decoder, and aims to reduce this list size with every
transmission. In the second class of schemes, called "repeat to correct erroneous
reception", the transmitter repeats a message segment if it is received
incorrectly. In such schemes, a mechanism is required to signal to the receiver
whether a transmission is repeated, or a new segment is transmitted.
4.5 Applications
Channel coding theory is applied in a wide range of areas: deep space
communication, satellite communication, data transmission, data storage, mobile
communication, file transfer, digital audio/video transmission, etc. For an
overview of applications in the first fifty years following Shannon's 1948 "noisy
channel coding theorem", we refer to [105]. One of the most notable success
stories for the Benelux in this respect is the development of the compact disc
(CD) in the late 1970s and early 1980s [109]. In this section we provide an
overview of various applications reported at the symposia on Information Theory
in the Benelux.
In [347], Roefs discusses candidate concatenated coding schemes (cf. Section 4.1.3)
for European Space Agency (ESA) telemetry applications in the early 1980s. The
inner code is fixed as the standard rate 1/2 convolutional code of constraint
length 7, but several candidates for the outer code are considered: Reed-Solomon
codes with interleaving, Gallager's burst-correcting scheme, and Tong's
burst-trapping scheme. Their performances are compared for dense burst channels
with widely varying burst and guard space lengths. This work is continued in
[350]. In this paper, Best and Roefs again take as inner code the conventional
rate 1/2 convolutional code of constraint length 7. As outer code, they use a
[256,224] Reed-Solomon code C over GF(257). To be more precise, they propose to
encode 224 nonzero symbols (in GF(257)) systematically into a word from C. If a
generated parity symbol happens to be zero, it is replaced by the element 1 (in
GF(257)). The authors argue that the encoding error probability introduced by
this replacement is negligible compared to the symbol error probability of the
Viterbi decoder. The choice for GF(257) instead of GF(256) is motivated by the
resulting possibility to employ the Fermat Number Transform for more efficient
encoding and decoding.
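The attraction of GF(257) is that 257 is the Fermat prime 2^8 + 1, so the field contains roots of unity of every order dividing 256 and transforms can be computed with plain integer arithmetic modulo 257. The paper's actual transform parameters are not reproduced here; the following naive length-8 sketch (the length 8 and all names are chosen purely for illustration, with 64 a primitive 8th root of unity mod 257) conveys the idea:

```python
P = 257   # Fermat prime 2^8 + 1; all arithmetic is in GF(257)
N = 8     # illustrative transform length; N must divide P - 1 = 256
W = 64    # primitive N-th root of unity mod 257 (64^4 = 256 = -1 mod 257)

def ntt(a):
    # Naive O(N^2) number-theoretic transform over GF(257).
    return [sum(a[i] * pow(W, i * k, P) for i in range(N)) % P
            for k in range(N)]

def intt(A):
    # Inverse: same sum with W^-1, scaled by N^-1 (both computed mod P
    # via Fermat's little theorem: x^-1 = x^(P-2) mod P).
    w_inv = pow(W, P - 2, P)
    n_inv = pow(N, P - 2, P)
    return [n_inv * sum(A[k] * pow(w_inv, i * k, P) for k in range(N)) % P
            for i in range(N)]
```

As with the ordinary DFT, the round trip intt(ntt(x)) returns x, and a fast radix-2 version exists for every power-of-two length up to 256, which is what makes encoding and decoding over GF(257) efficient.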
Van Gils [364] describes dot codes for product identification (as an alternative
to the well-known bar codes). As a product carrying a dot code word can have
several orientations with respect to the read-out device, the same product is
identified by several dot code words. It is indicated that for certain
error-correcting codes, this ambiguity can be efficiently resolved.
At the time when telephony, telegraphy, and postal services were still all carried
out by the PTT, Haemers considered the protection of a binary representation of the
postal code, as printed on envelopes, against read-out errors. In [365] he proposes
the use of an (extended) Hamming code for this purpose, with a small modification
in order to increase the burst error detection capability.
Belgian bank account numbers consist of 12 digits, a_9 a_8 ... a_1 a_0 c_1 c_0,
where c_0 and c_1 are such that ∑_{i=0}^{9} a_i 10^i ≡ 10 c_1 + c_0 (mod 97). The
check digits c_0 and c_1 serve to detect the most common errors made by humans
when processing digit strings (single errors, transpositions of consecutive
symbols). Stevens [388] shows that replacing the modulus 97 by 93 slightly
increases the error detection probability. Another slight increase is obtained if
it is stipulated that the bank account number be divisible by 93, i.e., that
∑_{i=0}^{9} a_i 10^{i+2} + 10 c_1 + c_0 ≡ 0 (mod 93).
Offermans, Breeuwer, Weber and Van Willigen [408] consider error-correction
strategies for Eurofix, an integrated radio navigation system that combines
terrestrial Loran-C and the satellite-based Global Positioning System (GPS).
Differential GPS messages are transported via the Loran-C data link, which is
disturbed by continuous wave interference, cross-rate interference, atmospheric
noise, etc. In order to combat these phenomena, the authors propose a coding
scheme based on the concatenation of a Reed-Solomon code and a parity check code.
In [411], Hekstra considers the following synchronization problem. Suppose that
when a bit string x = (x_1, x_2, ..., x_n) is written down, then either x or one
of its cyclic shifts, i.e., a string of the form (x_{1+i}, x_{2+i}, ..., x_n,
x_1, ..., x_i), could be read out. The problem is how to efficiently encode much
information into strings such that all cyclic shifts of two distinct information
strings are different. The author proposes the following method for efficient
encoding of nearly the maximum amount of information. Suppose that n = 2^m − 1.
Then encode k = n − m information bits systematically to a cyclic Hamming code of
length n, and subsequently invert the leftmost parity symbol. Synchronization is
re-established by single-error correction, followed by shifting the received
sequence until the error position corresponds to the leftmost parity bit.
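A toy version of this construction for m = 3 (so n = 7, using the cyclic [7,4] Hamming code with generator g(x) = x^3 + x + 1) can be sketched as follows; all names are ours, and a brute-force search over bit positions stands in for proper syndrome decoding. Because the code is cyclic, every rotation of the marked word is a codeword with exactly one bit inverted, and the location of that bit reveals the rotation:

```python
G = (1, 0, 1, 1)  # coefficients of g(x) = x^3 + x + 1, high degree first

def encode(msg):
    # Systematic encoding in the cyclic [7,4] Hamming code: the 4 message
    # bits, then the 3 parity bits (remainder of x^3 * m(x) modulo g(x)).
    work = list(msg) + [0, 0, 0]
    for i in range(4):                     # polynomial long division over GF(2)
        if work[i]:
            for j in range(4):
                work[i + j] ^= G[j]
    return tuple(msg) + tuple(work[4:])

CODEBOOK = {encode(tuple((m >> s) & 1 for s in (3, 2, 1, 0))) for m in range(16)}

def mark(cw):
    # Invert the leftmost parity symbol (position 4 of positions 0..6).
    w = list(cw)
    w[4] ^= 1
    return tuple(w)

def resync(read):
    # Find the unique position whose flip yields a codeword; that position
    # marks where the inverted parity bit landed, which gives the shift.
    for e in range(7):
        w = list(read)
        w[e] ^= 1
        if tuple(w) in CODEBOOK:
            shift = (4 - e) % 7            # read = left-rotation by `shift`
            return read[7 - shift:] + read[:7 - shift] if shift else read
```

The flip position is unique because the Hamming code is perfect with minimum distance 3: flipping any other bit gives a word at distance 2 from a codeword, which can never itself be a codeword.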
In [418], De Bart shows that the channel coding scheme of the Digital Video
Broadcasting (DVB) satellite system, based on the concatenation of a Reed-Solomon
code and a convolutional code, has to deal with ambiguities that cannot be
resolved by the Viterbi decoder. The channel and the QPSK demodulator may cause
transformations (rotations, shifts, etc.) yielding an incorrect sequence that
resembles a codeword of the original convolutional code. Joint synchronization of
the Viterbi and Reed-Solomon decoders should solve the problem.
A method for error correction in IC implementations of Boolean functions is
proposed by Muurling, Kleihorst, Benschop, Van der Vleuten and Simonis [434]. The
method corrects both manufacturing hard errors and temporary soft errors during
circuit operation. A systematic Hamming code is used, which can be implemented
through additional logic or even through software tools.
Desset [439] considers error control coding for Wireless Personal Area Networks
(WPAN) in 2002. In a Wireless Personal Area Network, power consumption plays a
very important role. High-performance channel coding strategies can be used to
obtain coding gain and thus reduce transmit power; the average energy required
per bit in a typical situation is about 15 nJ/bit. In addition, the power
consumption due to the complexity of encoding and decoding has to be considered.
The complexity of Hamming codes, Reed-Muller codes, Reed-Solomon codes, and
convolutional and turbo codes is analyzed. These two requirements conflict, and
an optimum trade-off has to be found. The paper proposes a strategy for selecting
error-correcting codes for WPANs. For applications with average bit energies
ranging from 100 pJ/bit to 10 nJ/bit, the authors recommend Hamming codes, short
constraint-length convolutional codes, and turbo coding, respectively.