Error-Correcting Balanced Knuth Codes
Jos H. Weber, Senior Member, IEEE, Kees A. Schouhamer Immink, Fellow, IEEE, and
Hendrik C. Ferreira, Senior Member, IEEE
Abstract—Knuth's celebrated balancing method consists of inverting the first $z$ bits in a binary information sequence, such that the resulting sequence has as many ones as zeroes, and communicating the index $z$ to the receiver through a short balanced prefix. In the proposed method, Knuth's scheme is extended with error-correcting capabilities, where it is allowed to give unequal protection levels to the prefix and the payload. The proposed scheme is very general in the sense that any error-correcting block code may be used for the protection of the payload. Analyses with respect to redundancy and block and bit error probabilities are performed, showing good results while maintaining the simplicity features of the original scheme. It is shown that the Hamming distance of the code is of minor importance with respect to the error probability.
I. INTRODUCTION

Sets of binary sequences that have a fixed length $n$ and a fixed weight (number of ones) $w$ are usually called constant-weight codes. An important sub-class is formed by the so-called balanced codes, for which $n$ is even and $w = n/2$, i.e., all codewords have as many zeroes as ones. Such codes have found application in various transmission and (optical/magnetic) recording systems, e.g., in flash memories [11]. A survey on balanced codes can be found in [4].
A simple method for generating balanced codewords, which is capable of encoding and decoding (very) large blocks, was proposed by Knuth [6] in 1986. In his method, an $m$-bit binary data word, $m$ even, is forwarded to the encoder. The encoder inverts the first $z$ bits of the data word, where $z$ is chosen in such a way that the modified word has equal numbers of zeroes and ones. Knuth showed that such an index $z$ can always be found. The index $z$ is represented by a balanced word of length $p$. The $p$-bit prefix word followed by the modified $m$-bit data word are both transmitted, so that the rate of the code is $m/(m+p)$. The receiver can easily undo the inversion of the first $z$ bits received once $z$ is computed from the prefix. Both encoder and decoder do not require large look-up tables, and Knuth's algorithm is, therefore, very attractive for constructing long balanced codewords. The redundancy of Knuth's method is roughly twice the redundancy of a code which uses the full set of balanced words.
Since the latter has a prohibitively high complexity in case of
large lengths, the factor of two can be considered as a price to be paid for simplicity. In [5] and [10], modifications to Knuth's method are presented that close this gap while maintaining sufficient simplicity.

Manuscript received January 05, 2011; revised July 11, 2011; accepted July 20, 2011. Date of publication September 15, 2011; date of current version January 06, 2012. This work was supported by Grant "Theory and Practice of Coding and Cryptography," Award Number NRF-CRP2-2007-03.
J. H. Weber is with the Delft University of Technology, The Netherlands (e-mail: j.h.weber@tudelft.nl).
K. A. S. Immink is with Turing Machines BV, The Netherlands, and also with Nanyang Technological University, Singapore (e-mail: immink@turing-machines.com).
H. C. Ferreira is with the University of Johannesburg, Johannesburg, South Africa (e-mail: hcferreira@uj.ac.za).
Communicated by M. Blaum, Associate Editor for Coding Techniques.
Digital Object Identifier 10.1109/TIT.2011.2167954
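To make the balancing step concrete, the following minimal Python sketch (ours, for illustration only; the function names are not Knuth's) searches for an inversion index and applies it to an even-length binary word.

def balancing_index(word):
    # Return an index z (0 <= z <= m-1) such that inverting the first z bits
    # of `word` (a list of 0/1 bits of even length m) yields a word with
    # exactly m/2 ones. Knuth [6] showed such an index always exists: as z
    # grows, the weight of the modified word changes by one per step, moving
    # from w to m - w, so it must pass through m/2 along the way.
    m = len(word)
    assert m % 2 == 0
    for z in range(m):
        weight = sum(1 - b for b in word[:z]) + sum(word[z:])
        if weight == m // 2:
            return z
    raise AssertionError("unreachable for even-length words")

def knuth_balance(word):
    # Invert the first z bits; Knuth's scheme transmits the balanced word
    # together with z, encoded as a short balanced prefix.
    z = balancing_index(word)
    return z, [1 - b for b in word[:z]] + word[z:]

# Example: an 8-bit word of weight 6 is balanced by inverting its first 2 bits.
z, balanced = knuth_balance([1, 1, 1, 0, 1, 1, 1, 0])
assert z == 2 and sum(balanced) == 4

In the scheme proposed in this paper, the same search is applied to the codeword produced by the payload code rather than to the raw data word.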
Knuth’s method does not provide protection against errors
which may occur during transmission or storage. Actually, er-
rors in the prefix may lead to catastrophic error propagation in
the data word. Here, we propose and analyze a method to ex-
tend Knuth’s original scheme with error correcting capabilities.
Previous constructions for error-correcting balanced codes were
given in [2], [9] and [8]. In [9], van Tilborg and Blaum intro-
duced the idea to consider short balanced blocks as symbols of
an alphabet and to construct error-correcting codes over that al-
phabet. Only moderate rates can be achieved by this method,
but it has the advantage of limiting the digital sum variation
and the runlengths. In [2], Al-Bassam and Bose constructed bal-
anced codes correcting a single error, which can be extended to
codes correcting up to two, three, or four errors by concatenation
techniques. In [8], Mazumdar, Roth, and Vontobel considered
linear balancing sets and applied such sets to obtain error-cor-
recting coding schemes in which the codewords are balanced.
In the method proposed in the current paper, we stay very close
to the original Knuth algorithm. Hence, we only operate in the
binary field and inherit the low-complexity features of Knuth’s
method. In our method, the error-correcting capability can be
any number. The focus will be on long codes, for which table
look-up methods are unfeasible. An additional feature is the pos-
sibility to assign different error protection levels to the prefix
and the payload, which will be shown to be useful when de-
signing the scheme to achieve a certain required error perfor-
mance while optimizing the rate. In this context, it turns out that
the Hamming distance of the proposed code is of minor impor-
tance.
The rest of this paper is organized as follows. In Section II, the
proposed method for providing balancing and error-correcting
capabilities is presented. In Section III, the redundancy of the
new scheme is considered. The block and bit error probabili-
ties are analyzed in Section IV. Case studies are presented in
Section V. Finally, the results of this paper are discussed in
Section VI.
II. CONSTRUCTION METHOD

The proposed construction method is based on a combination of conventional error correction techniques and Knuth's method for obtaining balanced words. The encoding procedure consists of four steps, which are described below and illustrated in Fig. 1. The input to the encoder is a binary data block $u$ of length $k$. Let $v^r$ denote a run of $r$ bits $v$, e.g., $0^3 = 000$.

Fig. 1. Encoding procedure.

1) Encode $u$ using a binary linear block code $C_1$ of dimension $k$, Hamming distance $d_1$, and even length $m$. The encoding function is denoted by $\phi_1$.
2) Find a balancing index $z$ for the obtained codeword $c = \phi_1(u)$, with $0 \le z \le m-1$.
3) Invert the first $z$ bits of $c$, resulting in the balanced word $c'$.
4) Encode the number $z$ into a unique codeword from a binary code $C_2$ of even length $p$, constant weight $p/2$, and Hamming distance $d_2$. The encoding function is denoted by $\phi_2$.
The output of the encoder is the concatenation of the balanced word $\phi_2(z)$, called the prefix, and the balanced word $c'$, called the bulk or payload. It is obvious that the resulting code $C$ is balanced and has length $n = m + p$ and redundancy $n - k = m - k + p$, and thus, code rate $k/(m+p)$ and normalized redundancy $(m-k+p)/(m+p)$. Its Hamming distance $d$ satisfies the following lower bound.
Theorem 1: The Hamming distance $d$ of code $C$ is at least
$$\min\{2\lceil d_1/2\rceil,\ 2\lceil d_2/2\rceil\}.$$
Proof: Let $(x_1, y_1)$ and $(x_2, y_2)$ denote two different codewords of $C$, where $x_i \in C_2$ is the prefix and $y_i$ is the balanced payload, and let $\delta$ denote their Hamming distance. If $x_1 = x_2$, then $\delta \ge 2\lceil d_1/2\rceil$: since $x_1 = x_2$ implies equal balancing indices, the payloads $y_1$ and $y_2$ are obtained by inverting the same positions in two different codewords of $C_1$, so $\delta = d(y_1, y_2) \ge d_1$, and $\delta$ is even because $y_1$ and $y_2$ are both balanced. If $x_1 \ne x_2$, then $\delta \ge 2\lceil d_2/2\rceil$, which follows from the fact that $x_1$ and $x_2$ are two different codewords from $C_2$, implying that $d(x_1, x_2) \ge d_2$, and the fact that $x_1$, $x_2$, $y_1$, and $y_2$ all have fixed weight, implying that $\delta$ is even.
Corollary 1: In order to make $C$ capable of correcting up to $t$ errors, it suffices to choose constituent codes $C_1$ and $C_2$ with distances $d_1 \ge 2t+1$ and $d_2 \ge 2t+1$, respectively.
Upon receipt of a sequence $(x, y)$, where $x$ and $y$ have lengths $p$ and $m$, respectively, a simple decoding procedure, illustrated in Fig. 2, consists of the following steps.
1) Look for a codeword $\hat x$ in $C_2$ which is closest to $x$, and set $\hat z = \phi_2^{-1}(\hat x)$.
2) Invert the first $\hat z$ bits in $y$.
3) Decode the resulting word according to a decoding algorithm for code $C_1$, leading to an estimated codeword, and thus, to an estimated information block $\hat u$.
Fig. 2. Decoding procedure.
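A corresponding decoder sketch, with the same caveats as the encoder sketch above (the component decoders are placeholders chosen by the designer), follows the three steps just listed.

def decode(r, p, prefix_codewords, index_of, decode_C1):
    # r                : received word of length p + m (list of 0/1)
    # prefix_codewords : list of all codewords of the constant-weight prefix code
    # index_of         : dictionary mapping each prefix codeword (as a tuple) to its index z
    # decode_C1        : decoder of the payload code, returning the estimated data block
    x, y = r[:p], r[p:]
    # step 1: minimum Hamming distance decoding of the prefix
    best = min(prefix_codewords, key=lambda c: sum(a != b for a, b in zip(c, x)))
    z = index_of[tuple(best)]
    # step 2: undo the (estimated) inversion of the first z bits
    y = [1 - b for b in y[:z]] + y[z:]
    # step 3: decode the bulk with the decoder of the payload code
    return decode_C1(y)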
The following results are immediate.
Theorem 2: The proposed decoding procedure for code $C$ corrects any error pattern with at most $\lfloor (d_2-1)/2 \rfloor$ errors in the first $p$ bits and at most $\lfloor (d_1-1)/2 \rfloor$ errors in the last $m$ bits.
Corollary 2: The proposed decoding procedure for code $C$ corrects up to the number of errors guaranteed by the Hamming distance result from Theorem 1.
Corollary 3: The proposed decoding procedure for code $C$ corrects up to $t$ errors if $d_1 \ge 2t+1$ and $d_2 \ge 2t+1$.
Typically, as suggested in Figs. 1 and 2, the length $m$ of the bulk is much larger than the length $p$ of the prefix, and consequently $d_1$ is usually (much) larger than $d_2$. Hence, from the results presented in this section, the final Hamming distance is only $2\lceil d_2/2\rceil$ in such cases, which may be very small with respect to the overall length $n = m + p$. However, it will be shown in Section IV that, from the perspective of achieving a certain target decoding error probability, the overall Hamming distance of our scheme is a parameter which is only of minor importance.
III. REDUNDANCY

In this section, we first derive a lower bound on the redundancy of balanced codes with error-correcting capabilities. Then, we compare the redundancy of the code from Section II with this lower bound.

Let $A(n,d,w)$ denote the maximum cardinality of a code of length $n$, constant weight $w$, and even Hamming distance $d$. Hence, for any balanced code of even length $n$ and Hamming distance $d$, the redundancy is at least
$$r_{\min}(n,d) = n - \log_2 A(n, d, n/2). \quad (1)$$
Since
$$A(n, 2, n/2) = \binom{n}{n/2}, \quad (2)$$
the minimum redundancy for a balanced code without error correction capabilities [6] is
$$r_{\min}(n,2) = n - \log_2 \binom{n}{n/2} \quad (3)$$
$$\approx \tfrac{1}{2}\log_2 (\pi n/2) \quad (4)$$
$$= \tfrac{1}{2}\log_2 n + 0.326, \quad (5)$$
where the first approximation is due to the well-known Stirling formula
$$n! \approx \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^n. \quad (6)$$
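The bound (3)-(5) is easy to check numerically; the short Python sketch below (ours, for illustration) compares the exact minimum redundancy of a balanced code with the Stirling-based approximation.

from math import comb, log2, pi

def min_balanced_redundancy(n):
    # Exact value of (3): n - log2 C(n, n/2), for even n.
    return n - log2(comb(n, n // 2))

for n in (16, 64, 256, 1024):
    exact = min_balanced_redundancy(n)
    approx = 0.5 * log2(pi * n / 2)   # the approximation (4)-(5): 0.5*log2(n) + 0.326
    print(n, round(exact, 3), round(approx, 3))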
No general expression for $A(n,d,n/2)$ is known, but bounds are available in the literature. From Theorem 12 in [1], we have the upper bound
(7)
Note that for $t = 0$, i.e., $d = 2$, this gives the same expression as (2), and thus, the bound is tight in this case. The upper bound (7) can be used to lower bound the minimum redundancy from (1) in case $d = 2t+2$, i.e., the redundancy is at least
(8)
Note that this expression can be decomposed into a contribution from the balance property and a contribution from the capability of correcting up to $t$ errors. Using Stirling's formula, we obtain the approximation
(9)
Next, we will investigate the difference between the lower
bound on the redundancy, of which it is unknown whether
it is achievable in general, and the redundancy
of the proposed
construction method. We know from [6] that the redundancy of
the Knuth scheme, without error correction, exceeds the min-
imum achievable redundancy by a factor of two. Obviously, this
is a price to be paid for simplicity. For the proposed method with error correction, the redundancy is equal to the sum of the redundancy $m-k$ of code $C_1$ and the length $p$ of the prefix. For neither of these terms is a general expression available. Another complication in the analysis is that the error correction levels $t_1 = \lfloor(d_1-1)/2\rfloor$ of $C_1$ and $t_2 = \lfloor(d_2-1)/2\rfloor$ of $C_2$ may be different. The value of $m-k$ depends on the choice of $C_1$. For example, for BCH codes it is roughly $t_1\log_2 m$ [7]. The length $p$ of the prefix can be decomposed into two parts: a contribution of length about $\log_2 m$ identifying the balancing index and a contribution of length roughly $(t_2+\tfrac{1}{2})\log_2 p$, based on the presented bound (9), providing the error correction and balancing properties to the prefix. Hence, assuming $m-k \approx t_1\log_2 m$, as for BCH codes, the total redundancy $m-k+p$ of the proposed method can be approximated as
$$m - k + p \approx (t_1+1)\log_2 m + (t_2+\tfrac{1}{2})\log_2 p. \quad (10)$$
Further assuming that
and that the length is very
large, the comparison of this expression to (9) gives that the re-
dundancy of our method is roughly a factor
higher than (the lower bound on) the minimum
redundancy for any
-error-correcting balanced code. Although
the results presented in this analysis are based on bounds and ap-
proximations rather than exact results, it seems safe to conclude
that the redundancy of the presented method is within a factor
of two of the optimum. This will be illustrated in Section V. The
redundancy of the codes presented in [2] is (slightly) lower, but
the method presented here is simpler and more general (since
the constructions from [2] are for $t \le 4$ only). The construc-
tions from [9] are of a completely different nature, with much
higher redundancies but balancing being established on a very
small block scale.
IV. ERROR PROBABILITY ANALYSIS

Since the length of the prefix is considerably shorter than the length of the bulk, the probability of the prefix being hit by a random error is proportionally smaller. Therefore, one may consider giving the prefix a lower error correcting capability. On the other hand, uncorrectable errors in the prefix will lead to a wrong balancing index and may thus lead to a huge number of errors in the bulk. Therefore, it may be worthwhile to invest in some extra protection of the prefix. Hence, determining the error correction levels of the codes $C_1$ and $C_2$ is a delicate issue. This will be investigated from both the block error probability and the bit error probability perspectives, in Sections IV-A and IV-B, respectively. Throughout the analysis, we will assume a memoryless binary symmetric channel with error probability $q$.
A. Block Error Probability

In this subsection, we will investigate proper choices of the error correction capabilities in the context of the block error probability $P_B$, which is defined as the probability that the decoding result $\hat u$ is different from the original information block $u$.

An often-used general expression for the block error probability of a block code of length $n$ and Hamming distance $d$, thus correcting up to $t = \lfloor(d-1)/2\rfloor$ errors, is
$$P = \sum_{i=t+1}^{n} \binom{n}{i} q^i (1-q)^{n-i}. \quad (11)$$
Actually, this is an upper bound, since error correction might also take place in case more than $t$ errors occur. To which extent this could happen depends on the structure of the code and the implementation of its decoder. However, this effect will be neglected throughout this paper. In other words, the performance analysis reflects a worst-case scenario: the real error probabilities may be (a little bit) better. A well-known approximation of (11) is obtained by considering only its first term
$$\binom{n}{t+1} q^{t+1} (1-q)^{n-t-1}, \quad (12)$$
since it dominates all other terms. A further simplification is obtained by ignoring the factor $(1-q)^{n-t-1}$ in (12), leading to
$$\binom{n}{t+1} q^{t+1}. \quad (13)$$
The smaller $q$, the better (12) and (13) approximate (11).
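The quality of the approximations (12) and (13) is easily verified numerically; the following sketch (ours, with arbitrarily chosen example values for the code length, correction capability, and channel error probability) evaluates all three expressions.

from math import comb

def tail_bound(n, t, q):
    # (11): probability of more than t errors in n uses of a BSC with error probability q
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(t + 1, n + 1))

def first_term(n, t, q):
    # (12): dominant term of the tail
    return comb(n, t + 1) * q**(t + 1) * (1 - q)**(n - t - 1)

def simple_term(n, t, q):
    # (13): (12) without the (1-q)^(n-t-1) factor
    return comb(n, t + 1) * q**(t + 1)

n, t, q = 1000, 3, 1e-4   # illustrative values, not taken from the paper
print(tail_bound(n, t, q), first_term(n, t, q), simple_term(n, t, q))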
Since the scheme presented in Section II is multistep, the evaluation is a bit more involved. For a received word of length $n = p + m$, let $e_1$ and $e_2$ be the number of errors in the last $m$ bits (the bulk) and the first $p$ bits (the prefix), respectively. We consider various situations.
• If $e_1 \le t_1$ and $e_2 \le t_2$, then correction of all errors is guaranteed.
• If $e_1 > t_1$ and $e_2 \le t_2$, then the balancing index is correctly retrieved from the prefix, but the number of errors in the bulk is beyond the error-correcting capability of $C_1$, thus leading to a wrong decoding result.
• If $e_2 > t_2$, then the balancing index retrieved from the prefix is wrong, i.e., $\hat z \ne z$, leading to the introduction of extra errors in the bulk due to the inversion process in step 2 of the decoding procedure. Assuming $e_1 = 0$ for the moment, successful decoding requires $|\hat z - z| \le t_1$. Since, typically, the bulk length $m$ (and thus, the range of $z$-values) is very large and the error correcting capability $t_1$ is very small, the probability of the $C_1$-decoder still being successful is negligible. If $e_1 > 0$, then the successful interval will shrink even further, except when a 'transmission error' coincides with an 'inversion error'. Still, even in the latter case, chances for correct decoding are very low. It could be helpful to design $\phi_2$, the mapping of the $z$-values to codewords in $C_2$, in such a way that distances between codewords corresponding to subsequent $z$-values are kept as low as possible, but also the impact of this will be limited. In conclusion, we assume that $e_2 > t_2$ implies a wrong decoding result with high probability.

Hence, we approximate the block error probability by
$$P_B \approx \Pr\{e_1 > t_1, e_2 \le t_2\} + \Pr\{e_2 > t_2\} \quad (14)$$
$$\le \Pr\{e_1 > t_1\} + \Pr\{e_2 > t_2\} \quad (15)$$
$$= P_1 + P_2, \quad (16)$$
where
$$P_1 = \sum_{i=t_1+1}^{m}\binom{m}{i} q^i (1-q)^{m-i} \quad (17)$$
$$\approx \binom{m}{t_1+1} q^{t_1+1} \quad (18)$$
and
$$P_2 = \sum_{i=t_2+1}^{p}\binom{p}{i} q^i (1-q)^{p-i} \quad (19)$$
$$\approx \binom{p}{t_2+1} q^{t_2+1}. \quad (20)$$
The approximations in (18) and (20) are only valid if $mq$ and $pq$, respectively, are sufficiently small, which we will assume throughout the rest of this section.
When designing the scheme such that the redundancy is minimized while achieving a certain Hamming distance, it follows from the results in Section II that the Hamming distances of the constituent codes should be chosen (about) equal, i.e., $d_1 \approx d_2$. However, since the prefix is mostly much shorter than the bulk, the chance of the bulk being hit by errors is much larger than for the prefix. Therefore, intuitively, it should be beneficial to choose $t_2$ smaller than $t_1$. Indeed, when the focus is on the block error probability rather than the Hamming distance, this turns out to be true. When designing the scheme to achieve a certain target block error probability at maximum rate for a given data length and channel error probability, the focus should not be on the final Hamming distance $d$, but on a careful choice of the error correction capabilities $t_1$ and $t_2$, where the latter can typically be smaller than the former. Since it follows from (16), (18), and (20) that $P_B$ is approximately
$$P_B \approx \binom{m}{t_1+1} q^{t_1+1} + \binom{p}{t_2+1} q^{t_2+1}, \quad (21)$$
an appropriate design strategy is choosing $t_1$ and $t_2$ such that both terms in (21) are in the same order of magnitude as the target block error probability, while their sum is (just) below this target. Note that if $m \gg p$, which is usually the case, the binomial coefficient in the first term in (21) is huge in comparison to the binomial coefficient in the second term, and, therefore, the exponent of $q$ in the second term can be considerably smaller than the exponent in the first term.
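The design rule just described can be phrased as a small search; the sketch below (our illustration, using the notation of this section) returns the smallest prefix correction level whose contribution keeps the approximated block error probability (21) below a target. For simplicity the prefix length is treated as fixed here, although in practice a smaller $t_2$ also allows a shorter prefix.

from math import comb

def block_error_terms(m, p, t1, t2, q):
    # The two terms of (21): bulk decoder failure and prefix decoder failure.
    return comb(m, t1 + 1) * q**(t1 + 1), comb(p, t2 + 1) * q**(t2 + 1)

def smallest_t2(m, p, t1, q, target):
    # Smallest t2 (0 <= t2 <= t1) such that the sum of both terms meets the target.
    for t2 in range(t1 + 1):
        bulk, prefix = block_error_terms(m, p, t1, t2, q)
        if bulk + prefix <= target:
            return t2
    return None   # even t2 = t1 does not meet the target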
If $t_1$ and $t_2$ are substantially smaller than $p$, then a more detailed general analysis is possible. To this end, we evaluate the ratio $P_2/P_1$ for the case $t_1 = t_2 = t$. Using (18) and (20), we find
$$\frac{P_2}{P_1} \approx \frac{\binom{p}{t+1}}{\binom{m}{t+1}} \quad (22)$$
$$\approx \left(\frac{p}{m}\right)^{t+1}, \quad (23)$$
where the second approximation follows from $\binom{n}{i} \approx n^i/i!$, which is quite good if $i$ is much smaller than $n$. If $p \ll m$, then (23) gives
(24)
which shows that $P_2$ is indeed orders of magnitude below $P_1$ in this case. For any $t_2 \ge 1$, keeping $t_1$ fixed and reducing $t_2$ by 1 causes a growth in $P_2$ by a factor
$$\frac{\binom{p'}{t_2}}{\binom{p}{t_2+1}} \cdot \frac{1}{q}, \quad (25)$$
where $p$ and $p'$ are the prefix lengths in the cases that the prefix error correcting capabilities are $t_2$ and $t_2-1$, respectively. Note that for ranges of practical interest $1/q$ is the dominating factor in (25). Hence, after several reductions of $t_2$ the ratio $P_2/P_1$ may be well above zero, i.e., $P_2$ may no longer be negligibly small.
B. Bit Error Probability
When the prime interest is the bit error probability rather than
the block error probability, then the picture may be different
from the one sketched in the previous subsection. A decoding
error in the bulk typically results in just a few erroneous bits
in the data, but a decoding error in the prefix may completely
destroy the data. This would ask for a stronger error protection
of the prefix. Therefore, in this subsection, we analyze the bit
error probability $P_b$ for the proposed scheme.
In general, for a block code of length $n$ and Hamming distance $d$, the bit error probability can be well approximated by
$$P_b \approx \frac{d}{n}\, P, \quad (26)$$
where $P$ is as given in (11). This approximation is based on the fact that, in case erroneous decoding occurs, it is most likely that the output codeword is still close to the original codeword. Since closest codewords differ in $d$ bits and the total number of bits is $n$, the result follows.
As in the previous subsection, the analysis of the proposed scheme is more involved due to its multistep character. Again, for a received word of length $n = p + m$, let $e_1$ and $e_2$ be the number of errors in the last $m$ bits (the bulk) and the first $p$ bits (the prefix), respectively. We consider various situations.
• If $e_1 \le t_1$ and $e_2 \le t_2$, then correction of all errors is guaranteed and the original data block is retrieved at the receiver.
• If $e_1 > t_1$ and $e_2 \le t_2$, then the balancing index is correctly retrieved from the prefix, but the number of errors in the bulk is beyond the error-correcting capability of $C_1$. As argued in the previous paragraph, the resulting fraction of erroneous bits in the data block is roughly $d_1/m$.
• If $e_2 > t_2$, then the balancing index retrieved from the prefix is wrong, i.e., $\hat z \ne z$. As argued in the previous subsection, the final decoding result is wrong with high probability, and, moreover, the number of bit errors can be huge. The actual distribution of the number of errors depends on the implementation of Knuth's algorithm and the mapping $\phi_2$. When assuming that both $z$ and $\hat z$ are more or less uniformly distributed, it follows from standard probability theory that the expected value of the absolute difference between $z$ and $\hat z$ is about $m/3$ (a short derivation of this value is sketched below). Hence, the expected fraction of erroneous bits in the data word is about $1/3$.
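The "standard probability theory" step can be made explicit. Under the simplifying assumption that the true and retrieved balancing indices $Z$ and $\hat Z$ are independent and (approximately) uniform on the interval $[0, m]$, a short computation gives the expected number of wrongly inverted positions:

% derivation sketch under the uniformity assumption stated above
\mathbb{E}\,|Z - \hat{Z}|
  \;=\; \frac{1}{m^{2}} \int_{0}^{m}\!\!\int_{0}^{m} |x-y|\,\mathrm{d}x\,\mathrm{d}y
  \;=\; \frac{2}{m^{2}} \int_{0}^{m}\!\!\int_{0}^{x} (x-y)\,\mathrm{d}y\,\mathrm{d}x
  \;=\; \frac{2}{m^{2}} \int_{0}^{m} \frac{x^{2}}{2}\,\mathrm{d}x
  \;=\; \frac{m}{3},

so that, on average, about one third of the $m$ payload bits are corrupted when the prefix decoder fails.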
Hence, we approximate the bit error probability by
$$P_b \approx P_{b,1} + P_{b,2}, \quad (27)$$
where
$$P_{b,1} = \frac{d_1}{m}\, P_1 \quad (28)$$
and
$$P_{b,2} = \frac{1}{3}\, P_2, \quad (29)$$
and where $P_1$ and $P_2$ are as given in (17) and (19), respectively. It thus follows that $P_b$ is approximately
$$P_b \approx \frac{d_1}{m}\binom{m}{t_1+1} q^{t_1+1} + \frac{1}{3}\binom{p}{t_2+1} q^{t_2+1}. \quad (30)$$
Following a similar reasoning as in the previous subsection, an appropriate design strategy is choosing $t_1$ and $t_2$ such that both terms in (30) are in the same order of magnitude as the target bit error probability, while their sum is (just) below this target. Again, note that if $m \gg p$, the coefficient in the first term in (30) is huge in comparison to the coefficient in the second term, and, therefore, the exponent of $q$ in the second term can be considerably smaller than the exponent in the first term.
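As a companion to the block error sketch above, the two contributions to (30) can be evaluated in the same way. The coefficients $d_1/m$ and $1/3$ used below follow the reasoning of the two failure cases discussed above and rely on the notation adopted in this rewrite; they are a sketch, not a verbatim transcription of the paper's formulas.

from math import comb

def bit_error_terms(m, p, d1, t1, t2, q):
    # Approximate contributions to the bit error probability (30):
    # a bulk decoding error corrupts roughly d1 of the m payload bits,
    # a prefix decoding error corrupts on average about one third of them.
    bulk = (d1 / m) * comb(m, t1 + 1) * q**(t1 + 1)
    prefix = (1 / 3) * comb(p, t2 + 1) * q**(t2 + 1)
    return bulk, prefix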
If $t_1$ and $t_2$ are substantially smaller than $p$, then a more detailed general analysis is possible. To this end, evaluate the ratio $P_{b,2}/P_{b,1}$ for the case $t_1 = t_2 = t$. Using (28), (29), (18), and (20), we find
$$\frac{P_{b,2}}{P_{b,1}} \approx \frac{m}{3 d_1}\cdot\frac{\binom{p}{t+1}}{\binom{m}{t+1}} \quad (31)$$
$$\approx \frac{m}{3 d_1}\left(\frac{p}{m}\right)^{t+1}, \quad (32)$$
where the approximation comes from (23). If $p \ll m$, then (32) gives
(33)
which shows that $P_{b,2}$ is indeed orders of magnitude below $P_{b,1}$ in this case. For any $t_2 \ge 1$, keeping $t_1$ fixed and reducing $t_2$ by 1 causes a growth in $P_{b,2}$ by the factor given in (25). Hence, after several reductions of $t_2$ the ratio $P_{b,2}/P_{b,1}$ may be well above zero, i.e., $P_{b,2}$ may no longer be negligibly small. Assuming the same starting point $t_1 = t_2$, this final value is larger than for the block error probability case, since, for $t_1 = t_2$, the ratio $P_{b,2}/P_{b,1}$ exceeds $P_2/P_1$ by a factor of $m/(3d_1)$, while the growth factor when reducing $t_2$ is the same in both cases.
In conclusion, when designing the scheme to achieve a certain target bit error probability $P_b$ at maximum rate, the focus should, as for the block error probability case, not be on the final Hamming distance $d$, but on a careful choice of the error correction capabilities $t_1$ and $t_2$. Again, the latter can typically be smaller than the former, but not to the same extent as for the block error probability case.
V. CASE STUDIES
In this section, we illustrate the results obtained in this paper
by working out two case studies. In the first case, various op-
tions for the constituent codes are considered and studied for
one fixed channel. In the second case, one particular code for
protecting the payload is evaluated for different channel condi-
tions.
A. Case Study I: Shortened BCH Codes
Let the information block length be $k = 750$. We consider codes $C_1$ obtained by shortening BCH codes [7]. For $C_2$, we consider the shortest known balanced codes with cardinality at least $m$ and Hamming distance $d_2$, with $d_2 = 2t_2 + 2$. Such balanced codes are tabulated in [3], from which we collected the cardinalities of some short codes in Table I.

TABLE I. Cardinalities of the largest known balanced codes with length $n$ and Hamming distance $d$ [3].

TABLE II. Code parameters in the setting of Case Study I for $k = 750$ with $t_1 = t_2$.
An overview of the parameters of codes obtained by choosing $t_1 = t_2$ is provided in Table II. If $t_1 = t_2 = 0$, then it is found from Table I that a prefix of length 12 is required to represent the 750 possible balancing positions without error correction capabilities, as in the original Knuth case, leading to a code rate of $750/762 \approx 0.984$, i.e., a normalized redundancy of about 0.016. If $t_1 = t_2 = 1$, then it is found from Table I that a prefix of length 16 is required to represent the 760 possible balancing positions in the BCH codeword with a single error correction capability, leading to a higher normalized redundancy of about 0.0335. Further increasing the value of $t_1 = t_2$ leads to higher distances at the expense of higher redundancies, as can be checked from the table. For $t_1 = t_2 = 3$, $C_1$ has length $m = 780$ and Hamming distance $d_1 = 7$, while $C_2$ has length $p = 24$ and Hamming distance $d_2 = 8$. Thus, the code $C$ has length 804, redundancy 54, normalized redundancy 0.0672, and Hamming distance $d = 8$. In Table III, we compare the normalized redundancy of our scheme to the lower bound on the normalized redundancy of any scheme with the same length $n$ and error correction level $t$. Note that the ratio between the two is, as for Knuth's original method, close to 2, in fact a little bit less in case we have error-correcting capabilities. This factor, the price to be paid for simplicity, may even be smaller, since we only compare against a lower bound of which it is unknown whether it is achievable in case $t \ge 1$.

TABLE III. Redundancy comparison in the setting of Case Study I for $k = 750$ with $t_1 = t_2$.

TABLE IV. Code parameters in the setting of Case Study I for $k = 750$ and $t_1 = 3$.

TABLE V. Numerical evaluations of $P_1$ and $P_2$ in the setting of Case Study I, for various choices of $t_1$ and $t_2$.

TABLE VI. Numerical evaluations of $P_{b,1}$ and $P_{b,2}$ in the setting of Case Study I, for various choices of $t_1$ and $t_2$.

The proposed scheme also offers the option to provide unequal error protection to the bulk and the prefix. An overview of the parameters of codes obtained by fixing $t_1 = 3$ and varying $t_2$ is provided in Table IV. Choosing $t_2 = 0$, i.e., providing no error correction capability to the prefix, gives a Hamming distance $d = 2$ and a normalized redundancy of about 0.053. Note that increasing $t_2$ up to the value of 3 increases both the Hamming distance (since the minimum in Theorem 1 is then determined by $d_2$) and the redundancy. However, also note that further increasing $t_2$ from 3 to 4 (or beyond) increases the redundancy, without the reward of an improved distance $d$ (since it is stuck at 8 due to the fact that $d_1 = 7$).
Fig. 3. Probabilities $P_1$ and $P_2$ (for various values of $t_2$), as a function of the channel error probability $q$, for Case Study II.
Fig. 4. Probabilities $P_{b,1}$ and $P_{b,2}$ (for various values of $t_2$), as a function of the channel error probability $q$, for Case Study II.
Next, we study the block error probability, where we assume a fixed channel error probability $q$. In Table V, the values of $P_1$ and $P_2$ are displayed for various choices of $t_1$ and $t_2$. Let the target block error probability for our application be fixed, i.e., the error protection levels $t_1$ and $t_2$ should be chosen in such a way that $P_B$ does not exceed this target. From Table V, we conclude that $t_1$ should be (at least) equal to 3. From Table II, we see that choosing $t_2 = 3$ as well would result in a code $C$ with normalized redundancy 0.0672 and Hamming distance 8, while we see from Table V that the target is then met with a considerable margin. However, from Tables IV and V, we can also conclude that keeping $t_1 = 3$ and lowering $t_2$ from 3 to only 1 results in (i) a normalized redundancy decrease to 0.0578, i.e., a reduction by 14%, (ii) a Hamming distance decrease to 4, and (iii) a block error probability still meeting the target. Hence, we obtain a rate increase while still meeting the performance requirement, in spite of the distance drop. Note that a further rate increase is not possible within this scheme, since the performance requirement is not met by the choice $t_2 = 0$, i.e., by giving the prefix no error protection at all. Similar observations can be made for a stricter target block error probability, where we find that a larger $t_1$ is required, but that a $t_2$ smaller than $t_1$ is still sufficient for the error protection of the prefix.
Finally, we investigate the bit error probability, for the same channel error probability. In Table VI, the values of $P_{b,1}$ and $P_{b,2}$ are displayed for various choices of $t_1$ and $t_2$. Let the target bit error probability for a certain application be fixed, i.e., the error protection levels $t_1$ and $t_2$ should be chosen in such a way that $P_b$ does not exceed this target. Table VI shows that a choice with $t_2$ smaller than $t_1$ still satisfies the requirement, but that $t_2$ cannot be reduced any further. This indicates that when designing the scheme to achieve a certain target bit error probability, $t_2$ could still be chosen smaller than $t_1$, but not to the same extent as for the block error probability.
B. Case Study II: Reed-Muller Code RM(6,10)

In this subsection, we use the Reed-Muller code RM(6,10) [7] as the code $C_1$ protecting the payload. The code length is $m = 1024$, the data block length is $k = 848$, and the Hamming distance is $d_1 = 16$, and thus, the error correction level is $t_1 = 7$. We investigate the choice of an appropriate code $C_2$ protecting the prefix in the proposed scheme, for different channel error probabilities $q$.
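The parameters just quoted follow from the standard Reed-Muller formulas (length $2^v$, dimension equal to the sum of binomial coefficients up to the order $r$, minimum distance $2^{v-r}$); the short check below is ours, using $v$ for the second RM parameter to avoid a clash with the payload length $m$.

from math import comb

def rm_parameters(r, v):
    # Length, dimension, and minimum distance of the Reed-Muller code RM(r, v).
    n = 2**v
    k = sum(comb(v, i) for i in range(r + 1))
    d = 2**(v - r)
    return n, k, d

m, k, d1 = rm_parameters(6, 10)
t1 = (d1 - 1) // 2
print(m, k, d1, t1)   # 1024 848 16 7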
The balanced code $C_2$ should have a size of at least $m = 1024$ and a length as small as possible. For small values of the Hamming distance $d_2$, the minimum length can be determined from Table I. For Hamming distances 2, 4, 6, 8, and 10, the minimum lengths are 14, 16, 22, 24, and 30, respectively. The corresponding error protection levels $t_2 = \lfloor(d_2-1)/2\rfloor$ are 0, 1, 2, 3, and 4.
The block error probability can be approximated by the sum of $P_1$ and $P_2$ as given in (17) and (19). The values of $P_1$ and $P_2$ (for various values of $t_2$) are given in Fig. 3 as a function of $q$. When choosing the error protection level $t_2 = t_1 = 7$, $P_2$ is many orders of magnitude below $P_1$. Hence, in order to lower the redundancy, $t_2$ should rather be chosen smaller. Also when lowering $t_2$ from 7 to 4, and thus considerably decreasing the redundancy, $P_2$ is still negligible compared to $P_1$ for the whole range covered in the figure, so further reductions may be possible. The figure shows that the channel error probability $q$ determines to which extent $t_2$ can be reduced. The higher $q$, the lower $t_2$ can be chosen while keeping $P_2$ well below $P_1$.
The bit error probability can be approximated by the sum of $P_{b,1}$ and $P_{b,2}$, which, according to (28) and (29), are proportional to $P_1$ and $P_2$, respectively. The values of $P_{b,1}$ and $P_{b,2}$ (for various values of $t_2$) are given in Fig. 4 as a function of $q$. Similar conclusions as for the block error probability can be drawn, i.e., the higher $q$, the lower $t_2$ can be chosen while keeping $P_{b,2}$ well below $P_{b,1}$. However, the resulting value of $t_2$ for the bit error probability case is higher than for the block error probability case, due to the fact that the coefficient in (29) exceeds the coefficient in (28) by a large factor.
VI. CONCLUSION

We have extended Knuth's balancing scheme with error-correcting capabilities. The approach is very general in the sense that any block code can be used to protect the payload, while the prefix of length $p$ is protected by a constant-weight code where the weight is $p/2$. It has been demonstrated that, in order to meet a certain target block or bit error probability in an efficient way, the distances of the constituent codes may preferably be unequal. Hence, from the performance perspective, the overall Hamming distance is of minor importance. As for the original Knuth algorithm, the scheme's simplicity comes at the price of a somewhat higher redundancy than the most efficient, but prohibitively complex, code. Therefore, the proposed scheme is an attractive and simple alternative for achieving (long) balanced sequences with error correction properties.
REFERENCES

[1] E. Agrell, A. Vardy, and K. Zeger, "Upper bounds for constant-weight codes," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2373–2395, Nov. 2000.
[2] S. Al-Bassam and B. Bose, "Design of efficient error-correcting balanced codes," IEEE Trans. Computers, vol. 42, no. 10, pp. 1261–1266, Oct. 1993.
[3] A. E. Brouwer, Bounds for Binary Constant Weight Codes [Online]. Available: http://www.win.tue.nl/~aeb/codes/Andw.html
[4] K. A. S. Immink, Codes for Mass Data Storage Systems, 2nd ed. Eindhoven, The Netherlands: Shannon Foundation Publishers, 2004.
[5] K. A. S. Immink and J. H. Weber, "Very efficient balanced codes," IEEE J. Sel. Areas Commun., vol. 28, no. 2, pp. 188–192, Feb. 2010.
[6] D. E. Knuth, "Efficient balanced codes," IEEE Trans. Inf. Theory, vol. IT-32, no. 1, pp. 51–53, Jan. 1986.
[7] S. Lin and D. J. Costello, Jr., Error Control Coding, 2nd ed. Upper Saddle River, NJ: Pearson Prentice-Hall, 2004.
[8] A. Mazumdar, R. M. Roth, and P. O. Vontobel, "On linear balancing sets," in Proc. IEEE Int. Symp. Information Theory, Seoul, South Korea, Jun.-Jul. 2009, pp. 2699–2703.
[9] H. van Tilborg and M. Blaum, "On error-correcting balanced codes," IEEE Trans. Inf. Theory, vol. 35, no. 5, pp. 1091–1095, Sep. 1989.
[10] J. H. Weber and K. A. S. Immink, "Knuth's balanced code revisited," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1673–1679, Apr. 2010.
[11] H. Zhou, A. Jiang, and J. Bruck, "Error-correcting schemes with dynamic thresholds in nonvolatile memories," in Proc. IEEE Int. Symp. Information Theory, Saint Petersburg, Russia, Jul.-Aug. 2011, pp. 2109–2113.
Jos H. Weber (S’87–M’90–SM’00) was born in Schiedam, The Netherlands,
in 1961. He received the M.Sc. (in mathematics, with honors), Ph.D., and MBT
(Master of Business Telecommunications) degrees from Delft University of
Technology, Delft, The Netherlands, in 1985, 1989, and 1996, respectively.
Since 1985, he has been with the Faculty of Electrical Engineering, Mathe-
matics, and Computer Science of Delft University of Technology. Currently, he
is an associate professor at the Wireless and Mobile Communications Group.
He is the chairman of the WIC (Werkgemeenschap voor Informatie-en Com-
municatietheorie in de Benelux) and the secretary of the IEEE Benelux Chapter
on Information Theory. He was a Visiting Researcher at the University of Cal-
ifornia at Davis, USA, the University of Johannesburg, South Africa, and the
Tokyo Institute of Technology, Japan. His main research interests are in the areas
of channel and network coding.
Kees A. Schouhamer Immink (M’81–SM’86–F’90) received his Ph.D. degree
from the Eindhoven University of Technology. He founded and was named
president of Turing Machines, Inc., in 1998. He has been, since 1994, an ad-
junct professor at the Institute for Experimental Mathematics, Essen University,
Germany, and is affiliated with the Nanyang Technological University of Sin-
gapore. Immink designed coding techniques of a wealth of digital audio and
video recording products, such as Compact Disc, CD-ROM, CD-Video, Dig-
ital Compact Cassette system, DCC, DVD, Video Disc Recorder, and Blu-ray
Disc. He received a Knighthood in 2000, a personal “Emmy” award in 2004, the
1996 IEEE Masaru Ibuka Consumer Electronics Award, the 1998 IEEE Edison
Medal, 1999 AES Gold and Silver Medals, and the 2004 SMPTE Progress
Medal. He was named a fellow of the IEEE, AES, and SMPTE, and was inducted
into the Consumer Electronics Hall of Fame, and elected into the Royal Nether-
lands Academy of Sciences and the U.S. National Academy of Engineering. He
served the profession as President of the Audio Engineering Society, Inc., New
York, in 2003.
Hendrik C. Ferreira (SM’08) was born and educated in South Africa where
he received the D.Sc. (Eng.) degree from the University of Pretoria in 1980.
From 1980 to 1981, he was a postdoctoral researcher at the Linkabit Cor-
poration in San Diego, CA. In 1983, he joined the Rand Afrikaans University,
Johannesburg, South Africa where he was promoted to professor in 1989 and
served two terms as Chairman of the Department of Electrical and Electronic
Engineering, from 1994 to 1999. He is currently a research professor at the Uni-
versity of Johannesburg. His research interests are in Digital Communications
and Information Theory, especially Coding Techniques, as well as in Power Line
Communications.
Dr. Ferreira is a past chairman of the Communications and Signal Processing
Chapter of the IEEE South Africa section, and from 1997 to 2006 he was Ed-
itor-in-Chief of the Transactions of the South African Institute of Electrical En-
gineers. He has served as chairman of several conferences, including the inter-
national 1999 IEEE Information Theory Workshop in the Kruger National Park,
South Africa, as well as the 2010 IEEE African Winter School on Information
Theory and Communications.