
# Error-Correcting Balanced Knuth Codes

Authors: Jos H. Weber, Kees A. Schouhamer Immink, and Hendrik C. Ferreira

## Abstract

Knuth's celebrated balancing method consists of inverting the first bits in a binary information sequence, such that the resulting sequence has as many ones as zeroes, and communicating the index to the receiver through a short balanced prefix. In the proposed method, Knuth's scheme is extended with error-correcting capabilities, where it is allowed to give unequal protection levels to the prefix and the payload. The proposed scheme is very general in the sense that any error-correcting block code may be used for the protection of the payload. Analyses with respect to redundancy and block and bit error probabilities are performed, showing good results while maintaining the simplicity features of the original scheme. It is shown that the Hamming distance of the code is of minor importance with respect to the error probability.
Published in: IEEE Transactions on Information Theory, vol. 58, no. 1, pp. 82–89, January 2012. Manuscript received January 05, 2011; revised July 11, 2011; accepted July 20, 2011. Date of publication September 15, 2011; date of current version January 06, 2012. This work was supported by the grant "Theory and Practice of Coding and Cryptography," Award Number NRF-CRP2-2007-03. Communicated by M. Blaum, Associate Editor for Coding Techniques. Digital Object Identifier 10.1109/TIT.2011.2167954.

J. H. Weber is with the Delft University of Technology, The Netherlands (e-mail: j.h.weber@tudelft.nl). K. A. S. Immink is with Turing Machines BV, The Netherlands, and also with Nanyang Technological University, Singapore (e-mail: immink@turing-machines.com). H. C. Ferreira is with the University of Johannesburg, Johannesburg, South Africa (e-mail: hcferreira@uj.ac.za).
I. INTRODUCTION

Sets of binary sequences that have a fixed length $n$ and a fixed weight (number of ones) $w$ are usually called constant-weight codes. An important sub-class is formed by the so-called balanced codes, for which $n$ is even and $w = n/2$, i.e., all codewords have as many zeroes as ones. Such codes have found application in various transmission and (optical/magnetic) recording systems, e.g., in flash memories [11]. A survey on balanced codes can be found in [4].

A simple method for generating balanced codewords, which is capable of encoding and decoding (very) large blocks, was proposed by Knuth [6] in 1986. In his method, an $m$-bit binary data word, $m$ even, is forwarded to the encoder. The encoder inverts the first $z$ bits of the data word, where $z$ is chosen in such a way that the modified word has equal numbers of zeroes and ones. Knuth showed that such an index $z$ can always be found. The index $z$ is represented by a balanced word of length $p$. The $p$-bit prefix word followed by the modified $m$-bit data word are both transmitted, so that the rate of the code is $m/(m+p)$. The receiver can easily undo the inversion of the first $z$ bits once $z$ is computed from the prefix. Both encoder and decoder do not require large look-up tables, and Knuth's algorithm is, therefore, very attractive for constructing long balanced codewords. The redundancy of Knuth's method is roughly twice the redundancy of a code which uses the full set of balanced words. Since the latter has a prohibitively high complexity in case of large lengths, the factor of two can be considered as a price to be paid for simplicity. In [5] and [10], modifications to Knuth's method are presented closing this gap while maintaining sufficient simplicity.
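To make the balancing step concrete, the following minimal Python sketch (ours, not from [6]; the function name and interface are illustrative) finds such an index and exhaustively verifies the existence claim for short words.

```python
def balancing_index(word):
    """Smallest z in {1, ..., m} such that inverting the first z bits of
    `word` (a 0/1 list of even length m) yields as many ones as zeroes."""
    m = len(word)
    weight = sum(word)
    for z in range(1, m + 1):
        weight += 1 - 2 * word[z - 1]   # flipping bit z changes the weight by +/-1
        if weight == m // 2:
            return z
    raise ValueError("cannot happen for binary words of even length")

# Exhaustive verification of Knuth's existence claim for all words of length 8:
from itertools import product
assert all(balancing_index(list(w)) >= 1 for w in product((0, 1), repeat=8))
```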
Knuth's method does not provide protection against errors which may occur during transmission or storage. Actually, errors in the prefix may lead to catastrophic error propagation in the data word. Here, we propose and analyze a method to extend Knuth's original scheme with error-correcting capabilities. Previous constructions for error-correcting balanced codes were given in [2], [9], and [8]. In [9], van Tilborg and Blaum introduced the idea to consider short balanced blocks as symbols of an alphabet and to construct error-correcting codes over that alphabet. Only moderate rates can be achieved by this method, but it has the advantage of limiting the digital sum variation and the runlengths. In [2], Al-Bassam and Bose constructed balanced codes correcting a single error, which can be extended to codes correcting up to two, three, or four errors by concatenation techniques. In [8], Mazumdar, Roth, and Vontobel considered linear balancing sets and applied such sets to obtain error-correcting coding schemes in which the codewords are balanced. In the method proposed in the current paper, we stay very close to the original Knuth algorithm. Hence, we only operate in the binary field and inherit the low-complexity features of Knuth's method. In our method, the error-correcting capability can be any number. The focus will be on long codes, for which table look-up methods are unfeasible. An additional feature is the possibility to assign different error protection levels to the prefix and the payload, which will be shown to be useful when designing the scheme to achieve a certain required error performance while optimizing the rate. In this context, it turns out that the Hamming distance of the proposed code is of minor importance.

The rest of this paper is organized as follows. In Section II, the proposed method for providing balancing and error-correcting capabilities is presented. In Section III, the redundancy of the new scheme is considered. The block and bit error probabilities are analyzed in Section IV. Case studies are presented in Section V. Finally, the results of this paper are discussed in Section VI.
II. CONSTRUCTION METHOD

The proposed construction method is based on a combination of conventional error correction techniques and Knuth's method for obtaining balanced words. The encoding procedure consists of four steps, which are described below and illustrated in Fig. 1. The input to the encoder is a binary data block $\mathbf{u}$ of length $k$. Let $v^i$ denote a run of $i$ bits $v$, e.g., $1^3 0^2 = 11100$.

1) Encode $\mathbf{u}$ using a binary linear block code $\mathcal{C}_m$ of dimension $k$, Hamming distance $d_m$, and even length $m$. The encoding function is denoted by $\phi_m$, i.e., $\mathbf{c} = \phi_m(\mathbf{u})$.
2) Find a balancing index $z$ for the obtained codeword $\mathbf{c}$, with $1 \leq z \leq m$.

3) Invert the first $z$ bits of $\mathbf{c}$, resulting in the balanced word $\mathbf{b}$.

4) Encode the number $z$ into a unique codeword from a binary code $\mathcal{C}_p$ of even length $p$, constant weight $p/2$, and Hamming distance $d_p$, with $|\mathcal{C}_p| \geq m$. The encoding function is denoted by $\phi_p$.

Fig. 1. Encoding procedure.

The output of the encoder is the concatenation of the balanced word $\phi_p(z)$, called the prefix, and the balanced word $\mathbf{b}$, called the bulk or payload. It is obvious that the resulting code $\mathcal{C}$ is balanced and has length $n = p + m$ and redundancy $r = p + m - k$, and thus, code rate $k/(p+m)$ and normalized redundancy $(p+m-k)/(p+m)$. Its Hamming distance $d$ satisfies the following lower bound.

Theorem 1: The Hamming distance of code $\mathcal{C}$ is at least $\min\{2\lceil d_m/2 \rceil, d_p\}$.

Proof: Let $(\phi_p(z_1), \mathbf{b}_1)$ and $(\phi_p(z_2), \mathbf{b}_2)$ denote two different codewords of $\mathcal{C}$, and let $\mathbf{c}_1$ and $\mathbf{c}_2$ be the underlying codewords of $\mathcal{C}_m$. If $z_1 = z_2$, then the Hamming distance between the codewords is at least $2\lceil d_m/2 \rceil$, since $\mathbf{c}_1$ and $\mathbf{c}_2$ are both in $\mathcal{C}_m$, implying that $d(\mathbf{b}_1, \mathbf{b}_2) = d(\mathbf{c}_1, \mathbf{c}_2) \geq d_m$, and since $\mathbf{b}_1$ and $\mathbf{b}_2$ are both balanced, implying that $d(\mathbf{b}_1, \mathbf{b}_2)$ is even. If $z_1 \neq z_2$, then the Hamming distance between the codewords is at least $d_p$, which follows from the fact that $\phi_p(z_1)$ and $\phi_p(z_2)$ are two different codewords from $\mathcal{C}_p$, implying that $d(\phi_p(z_1), \phi_p(z_2)) \geq d_p$, and the fact that $\mathbf{b}_1$ and $\mathbf{b}_2$ are both balanced, implying that $d(\mathbf{b}_1, \mathbf{b}_2)$ is even.

Corollary 1: In order to make $\mathcal{C}$ capable of correcting up to $t$ errors, it suffices to choose constituent codes $\mathcal{C}_m$ and $\mathcal{C}_p$ with distances $d_m \geq 2t+1$ and $d_p \geq 2t+2$, respectively.

In the sequel, let $t_m = \lfloor (d_m - 1)/2 \rfloor$ and $t_p = d_p/2 - 1$ denote the error-correcting capabilities of $\mathcal{C}_m$ and $\mathcal{C}_p$, respectively (note that $d_p$ is even, since all codewords of $\mathcal{C}_p$ have the same weight). Upon receipt of a sequence $(\mathbf{r}_p, \mathbf{r}_m)$, where $\mathbf{r}_p$ and $\mathbf{r}_m$ have lengths $p$ and $m$, respectively, a simple decoding procedure, illustrated in Fig. 2, consists of the following steps.

1) Look for a codeword in $\mathcal{C}_p$ which is closest to $\mathbf{r}_p$, and set $\hat{z}$ equal to the corresponding balancing index.

2) Invert the first $\hat{z}$ bits in $\mathbf{r}_m$, resulting in $\hat{\mathbf{c}}$.

3) Decode $\hat{\mathbf{c}}$ according to a decoding algorithm for code $\mathcal{C}_m$, leading to an estimated codeword, and thus, to an estimated information block $\hat{\mathbf{u}}$.

Fig. 2. Decoding procedure.

The following results are immediate.

Theorem 2: The proposed decoding procedure for code $\mathcal{C}$ corrects any error pattern with at most $t_p$ errors in the first $p$ bits and at most $t_m$ errors in the last $m$ bits.

Corollary 2: The proposed decoding procedure for code $\mathcal{C}$ corrects up to the number of errors guaranteed by the Hamming distance result from Theorem 1.

Corollary 3: The proposed decoding procedure for code $\mathcal{C}$ corrects up to $t$ errors if $d_m \geq 2t+1$ and $d_p \geq 2t+2$.

Typically, as suggested in Figs. 1 and 2, the length $m$ of the bulk is much larger than the length $p$ of the prefix, and consequently $d_m$ is also usually (much) larger than $d_p$. Hence, from the results presented in this section, the final Hamming distance is only $d_p$ in such cases, which may be very small with respect to the overall length $n$. However, it will be shown in Section IV that, from the perspective of achieving a certain target decoding error probability, the overall Hamming distance of our scheme is a parameter which is only of minor importance.
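As an illustration of the four encoding steps and three decoding steps, here is a toy end-to-end instantiation in Python. It is a sketch under assumed miniature ingredients, not a construction from the paper: $\mathcal{C}_m$ is taken to be the $[4,1,4]$ repetition code ($t_m = 1$) and $\mathcal{C}_p$ a hand-picked 4-word balanced code of length $p = 6$ with $d_p = 4$ ($t_p = 1$), so that, by Theorem 2, one error in the prefix and one in the bulk are corrected.

```python
# Toy instantiation of the scheme; all names and both constituent codes are
# illustrative assumptions, chosen as small as possible.
PHI_M = {(0,): [0, 0, 0, 0], (1,): [1, 1, 1, 1]}   # [4,1,4] repetition code
PHI_P = [[1, 1, 1, 0, 0, 0], [1, 0, 0, 1, 1, 0],   # balanced, pairwise distance 4;
         [0, 1, 0, 1, 0, 1], [0, 0, 1, 0, 1, 1]]   # entry i encodes z = i + 1

def invert_first(word, z):
    return [1 - b for b in word[:z]] + word[z:]

def balancing_index(word):                         # as in the Section I sketch
    m = len(word)
    return next(z for z in range(1, m + 1)
                if sum(invert_first(word, z)) == m // 2)

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def encode(u):
    c = PHI_M[tuple(u)]                            # 1) ECC-encode the data block
    z = balancing_index(c)                         # 2) find a balancing index
    b = invert_first(c, z)                         # 3) balance the bulk
    return PHI_P[z - 1] + b                        # 4) prepend the encoded index

def decode(r):
    r_p, r_m = r[:6], r[6:]
    z_hat = 1 + min(range(len(PHI_P)), key=lambda i: hamming(PHI_P[i], r_p))
    c_hat = invert_first(r_m, z_hat)               # undo the inversion
    return min(PHI_M, key=lambda u: hamming(PHI_M[u], c_hat))

r = encode([1])
r[0] ^= 1                                          # one channel error in the prefix
r[8] ^= 1                                          # one channel error in the bulk
assert decode(r) == (1,)
```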
III. REDUNDANCY

In this section, we first derive a lower bound on the redundancy of balanced codes with error-correcting capabilities. Then, we compare the redundancy of the code from Section II with this lower bound.

Let $A(n, d, w)$ denote the maximum cardinality of a code of length $n$, constant weight $w$, and even Hamming distance $d$. Hence, for any balanced code of even length $n$ and Hamming distance $d$, the redundancy is at least

$$n - \log_2 A(n, d, n/2). \qquad (1)$$

Since

$$A(n, 2, n/2) = \binom{n}{n/2}, \qquad (2)$$

the minimum redundancy for a balanced code without error correction capabilities [6] is

$$n - \log_2 \binom{n}{n/2} \approx n - \log_2\left(2^n \sqrt{\frac{2}{\pi n}}\right) = \frac{1}{2}\log_2 n + \frac{1}{2}\log_2 \frac{\pi}{2} \approx \frac{1}{2}\log_2 n + 0.326, \qquad (3)\text{--}(5)$$

where the first approximation is due to the well-known Stirling formula

$$n! \approx \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^n. \qquad (6)$$

No general expression for $A(n, d, n/2)$ with $d > 2$ is known, but bounds are available in the literature. From Theorem 12 in [1], we have the upper bound

$$A(n, 2\delta, w) \leq \binom{n}{w - \delta + 1} \Big/ \binom{w}{w - \delta + 1}, \qquad (7)$$

where $\delta = d/2$. Note that for $\delta = 1$, i.e., $d = 2$, this gives the same expression as (2), and thus, the bound is tight in this case. The upper bound (7) can be used to lower bound the minimum redundancy from (1) in case $d = 2\delta$, i.e., the redundancy is at least

$$n - \log_2 \binom{n}{n/2} + \log_2 \binom{n/2 + \delta - 1}{\delta - 1}. \qquad (8)$$

Note that this expression can be decomposed into a contribution $n - \log_2\binom{n}{n/2}$ from the balance property and a contribution $\log_2\binom{n/2+\delta-1}{\delta-1}$ from the capability of correcting up to $\delta - 1$ errors. Using Stirling's formula, we obtain the approximation

$$\frac{1}{2}\log_2 n + \frac{1}{2}\log_2\frac{\pi}{2} + \log_2\binom{n/2 + \delta - 1}{\delta - 1}. \qquad (9)$$

Next, we will investigate the difference between the lower bound on the redundancy, of which it is unknown whether it is achievable in general, and the redundancy of the proposed construction method. We know from [6] that the redundancy of the Knuth scheme, without error correction, exceeds the minimum achievable redundancy by a factor of two. Obviously, this is a price to be paid for simplicity. For the proposed method with error correction, the redundancy is equal to the sum of the redundancy $m - k$ of code $\mathcal{C}_m$ and the length $p$ of the prefix. For neither of these terms a general expression is available. Another complication in the analysis is that the error correction levels $t_m$ of $\mathcal{C}_m$ and $t_p$ of $\mathcal{C}_p$ may be different. The value of $m - k$ depends on the choice of $\mathcal{C}_m$. For example, for BCH codes it is roughly $t_m \log_2 m$ [7]. The length of the prefix can be decomposed into two parts: a contribution of length $\log_2 m$ identifying the balancing index, and a contribution of length roughly $t_p \log_2 p$, based on the presented bound (9), providing the error correction and balancing properties to the prefix. Hence, assuming $m - k \approx t_m \log_2 m$, as for BCH codes, the total redundancy of the proposed method can be approximated as

$$r \approx (t_m + 1)\log_2 m + t_p \log_2 p. \qquad (10)$$

Further assuming that $t_p = t_m$ and that the length is very large, the comparison of this expression to (9) gives that the redundancy of our method is roughly a factor of two higher than (the lower bound on) the minimum redundancy for any $t$-error-correcting balanced code. Although the results presented in this analysis are based on bounds and approximations rather than exact results, it seems safe to conclude that the redundancy of the presented method is within a factor of two of the optimum. This will be illustrated in Section V. The redundancy of the codes presented in [2] is (slightly) lower, but the method presented here is simpler and more general (since the constructions from [2] are for $t \leq 4$ only). The constructions from [9] are of a completely different nature, with much higher redundancies but balancing being established on a very small block scale.
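As a quick sanity check of the factor-of-two claim, the following snippet (our illustration, assuming the bound (8) as stated above) evaluates the bound for the parameters of the $t_m = t_p = 3$ entry of Case Study I in Section V ($n = 804$, $r = 54$, $t = 3$); the printed ratio comes out a little below two.

```python
from math import comb, log2

def r_min_lb(n, t):
    """Lower bound (8) on the redundancy of a balanced code of even length n
    with Hamming distance d = 2t + 2, i.e., correcting t errors."""
    delta = t + 1
    return n - log2(comb(n, n // 2)) + log2(comb(n // 2 + delta - 1, delta - 1))

print(54 / r_min_lb(804, 3))   # scheme redundancy vs. the lower bound
```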
IV. ERROR PROBABILITY ANALYSIS

Since the length of the prefix is considerably shorter than the length of the bulk, the probability of the prefix being hit by a random error is proportionally smaller. Therefore, it may be considered giving the prefix a lower error correcting capability. On the other hand, uncorrectable errors in the prefix will lead to a wrong balancing index and may thus lead to a huge number of errors in the bulk. Therefore, it may be worthwhile to invest in some extra protection of the prefix. Hence, determining the error correction levels of the codes $\mathcal{C}_m$ and $\mathcal{C}_p$ is a delicate issue. This will be investigated from both the block error probability and the bit error probability perspectives, in Sections IV-A and IV-B, respectively. Throughout the analysis, we will assume a memoryless binary symmetric channel with error probability $q$.

A. Block Error Probability

In this subsection, we will investigate proper choices of the error correction capabilities in the context of the block error probability $P_B$, which is defined as the probability that the decoding result $\hat{\mathbf{u}}$ is different from the original information block $\mathbf{u}$. An often-used general expression for the block error probability for a block code of length $n$ and Hamming distance $d$, thus correcting up to $t = \lfloor (d-1)/2 \rfloor$ errors, is

$$P_B \approx \sum_{i=t+1}^{n} \binom{n}{i} q^i (1-q)^{n-i}. \qquad (11)$$

Actually, this is an upper bound, since error correction might also take place in case more than $t$ errors occur. To which extent this could happen depends on the structure of the code and the implementation of its decoder. However, this effect will be neglected throughout this paper. In other words, the performance analysis reflects a worst case scenario: the real error probabilities may be (a little bit) better. A well-known approximation of (11) is obtained by considering only its first term,

$$P_B \approx \binom{n}{t+1} q^{t+1} (1-q)^{n-t-1}, \qquad (12)$$

since it dominates all other terms. A further simplification is obtained by ignoring the factor $(1-q)^{n-t-1}$ in (12), leading to

$$P_B \approx \binom{n}{t+1} q^{t+1}. \qquad (13)$$

The smaller $nq$, the better (12) and (13) approximate (11).

Since the scheme presented in Section II is multistep, the evaluation is a bit more involved. For a received word of length $p + m$, let $e_m$ and $e_p$ be the number of errors in the last $m$ bits (the bulk) and the first $p$ bits (the prefix), respectively. We consider various situations.

• If $e_m \leq t_m$ and $e_p \leq t_p$, then correction of all errors is guaranteed.

• If $e_m > t_m$ and $e_p \leq t_p$, then the balancing index is correctly retrieved from the prefix, but the number of errors in the bulk is beyond the error-correcting capability of $\mathcal{C}_m$, thus leading to a wrong decoding result.

• If $e_p > t_p$, then the balancing index retrieved from the prefix is wrong, i.e., $\hat{z} \neq z$, leading to the introduction of extra errors in the bulk due to the inversion process in step 2 of the decoding procedure. Assuming $e_m = 0$ for the moment, successful decoding requires $|\hat{z} - z| \leq t_m$. Since, typically, the bulk length $m$ (and thus, the range of $z$-values) is very large and the error correcting capability $t_m$ is very small, the probability of the $\mathcal{C}_m$-decoder still being successful is negligible. If $e_m > 0$, then the successful interval will shrink even further, except when a 'transmission error' coincides with an 'inversion error'. Still, even in the latter case, chances for correct decoding are very low. It could be helpful to design $\phi_p$, the mapping of the $z$-values to codewords in $\mathcal{C}_p$, in such a way that distances between codewords corresponding to subsequent $z$-values are kept as low as possible, but the impact of this will also be limited. In conclusion, we assume that $e_p > t_p$ implies a wrong decoding result with high probability.

Hence, we approximate the block error probability by

$$P_B \approx \Pr(e_m > t_m, e_p \leq t_p) + \Pr(e_p > t_p) \approx P_1 + P_2, \qquad (14)\text{--}(16)$$

where

$$P_1 = \sum_{i=t_m+1}^{m} \binom{m}{i} q^i (1-q)^{m-i} \approx \binom{m}{t_m+1} q^{t_m+1} \qquad (17)\text{--}(18)$$

and

$$P_2 = \sum_{i=t_p+1}^{p} \binom{p}{i} q^i (1-q)^{p-i} \approx \binom{p}{t_p+1} q^{t_p+1}. \qquad (19)\text{--}(20)$$

The approximations in (18) and (20) are only valid if $mq$ and $pq$, respectively, are sufficiently small, which we will assume throughout the rest of this section.

When designing the scheme such that the redundancy is minimized while achieving a certain Hamming distance, it follows from the results in Section II that the Hamming distances of the constituent codes should be chosen (about) equal. However, since the prefix is mostly much shorter than the bulk, the chance of the bulk being hit by errors is much larger than for the prefix. Therefore, intuitively, it should be beneficial to choose $t_p$ smaller than $t_m$. Indeed, when the focus is on the block error probability rather than the Hamming distance, this turns out to be true.
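The probabilities $P_1$ and $P_2$ are plain binomial tails and are therefore straightforward to evaluate numerically. The sketch below is our illustration (function names assumed) of (17), (19), and the approximation (16).

```python
from math import comb

def tail(n, t, q):
    """Probability of more than t errors in n independent uses of a BSC(q),
    as in (17) and (19)."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(t + 1, n + 1))

def block_error_prob(m, t_m, p, t_p, q):
    return tail(m, t_m, q) + tail(p, t_p, q)   # P_1 + P_2, approximation (16)

# Illustrative call with the t_m = 3, t_p = 1 lengths of Case Study I;
# q = 1e-4 is an assumed example value, not taken from the paper:
print(block_error_prob(780, 3, 16, 1, 1e-4))
```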
When designing the scheme to achieve a certain target block error probability at maximum rate for a given data length and channel error probability, the focus should not be on the final Hamming distance $d$, but on a careful choice of the error correction capabilities $t_m$ and $t_p$, where the latter can typically be smaller than the former. Since it follows from (16), (18), and (20) that $P_B$ is approximately

$$P_B \approx \binom{m}{t_m+1} q^{t_m+1} + \binom{p}{t_p+1} q^{t_p+1}, \qquad (21)$$

an appropriate design strategy is choosing $t_m$ and $t_p$ such that both terms in (21) are in the same order of magnitude as the target block error probability, while their sum is (just) below this target. Note that if $m \gg p$, which is usually the case, the binomial coefficient in the first term in (21) is huge in comparison to the binomial coefficient in the second term, and, therefore, the exponent of $q$ in the second term can be considerably smaller than the exponent in the first term.

If $mq$ and $pq$ are substantially smaller than 1, then a more detailed general analysis is possible. To this end, we evaluate $P_2/P_1$ for the case $t_p = t_m$. Using (18) and (20), we find

$$\frac{P_2}{P_1} \approx \frac{\binom{p}{t_m+1}}{\binom{m}{t_m+1}} \approx \left(\frac{p}{m}\right)^{t_m+1}, \qquad (22)\text{--}(23)$$

where the second approximation follows from $\binom{a}{b} \approx a^b/b!$, which is quite good if $b$ is much smaller than $a$. If, for example, $p \leq m/10$, then (23) gives

$$\frac{P_2}{P_1} \lesssim 10^{-(t_m+1)}, \qquad (24)$$

which shows that $P_2$ is indeed orders of magnitude below $P_1$ in case $p \ll m$. For any $t_m$, keeping $t_m$ fixed and reducing $t_p$ by 1 causes a growth in $P_2$ by a factor

$$\frac{\binom{p'}{t_p}}{\binom{p}{t_p+1}} \cdot \frac{1}{q}, \qquad (25)$$

where $p$ and $p'$ are the prefix lengths in the cases that the prefix error correcting capabilities are $t_p$ and $t_p - 1$, respectively. Note that for ranges of practical interest $1/q$ is the dominating factor in (25). Hence, after several reductions of $t_p$ the ratio $P_2/P_1$ may be well above zero, i.e., $P_2$ may no longer be negligibly small.
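The design strategy just described is easy to mechanize. The sketch below is our illustration of it, assuming the approximation (21) and two user-supplied 'code tables' `bulk_length(t_m)` and `prefix_length(t_p)`, hypothetical helpers returning the shortest available code lengths (e.g., collected from [3] and [7]).

```python
from math import comb

def choose_levels(q, target, bulk_length, prefix_length, t_max=10):
    """Smallest t_m, then smallest t_p <= t_m, such that the sum of the two
    terms in (21) stays (just) below the target block error probability."""
    for t_m in range(t_max + 1):
        term1 = comb(bulk_length(t_m), t_m + 1) * q ** (t_m + 1)
        if term1 < target:
            for t_p in range(t_m + 1):
                term2 = comb(prefix_length(t_p), t_p + 1) * q ** (t_p + 1)
                if term1 + term2 < target:
                    return t_m, t_p
    raise ValueError("no feasible error correction levels up to t_max")
```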
B. Bit Error Probability

When the prime interest is the bit error probability rather than the block error probability, the picture may be different from the one sketched in the previous subsection. A decoding error in the bulk typically results in just a few erroneous bits in the data, but a decoding error in the prefix may completely destroy the data. This would ask for a stronger error protection of the prefix. Therefore, in this subsection, we analyze the bit error probability $P_b$ for the proposed scheme.

In general, for a block code of length $n$ and Hamming distance $d$, the bit error probability can be well approximated by

$$P_b \approx \frac{d}{n} P_B, \qquad (26)$$

where $P_B$ is as given in (11). This approximation is based on the fact that in case erroneous decoding occurs it is most likely that the output codeword is still close to the original codeword. Since closest codewords differ in $d$ bits and the total number of bits is $n$, the result follows.

As in the previous subsection, the analysis of the proposed scheme is more involved due to its multistep character. Again, for a received word of length $p + m$, let $e_m$ and $e_p$ be the number of errors in the last $m$ bits (the bulk) and the first $p$ bits (the prefix), respectively. We consider various situations.

• If $e_m \leq t_m$ and $e_p \leq t_p$, then correction of all errors is guaranteed and the original data block is retrieved at the receiver.

• If $e_m > t_m$ and $e_p \leq t_p$, then the balancing index is correctly retrieved from the prefix, but the number of errors in the bulk is beyond the error-correcting capability of $\mathcal{C}_m$. As argued in the previous paragraph, the resulting fraction of erroneous bits in the data block is about $d_m/m$.

• If $e_p > t_p$, then the balancing index retrieved from the prefix is wrong, i.e., $\hat{z} \neq z$. As argued in the previous subsection, the final decoding result is wrong with high probability, and, moreover, the number of bit errors can be huge. The actual distribution of the number of errors depends on the implementation of Knuth's algorithm and the mapping $\phi_p$. When assuming that both $z$ and $\hat{z}$ are more or less uniformly distributed over $\{1, \ldots, m\}$, it follows from standard probability theory that the expected value of the absolute difference between $z$ and $\hat{z}$ is $m/3$. Hence, the expected fraction of erroneous bits in the data word is $1/3$.

Hence, we approximate the bit error probability by

$$P_b \approx Q_1 + Q_2, \qquad (27)$$

where

$$Q_1 = \frac{d_m}{m} P_1 \qquad (28)$$

and

$$Q_2 = \frac{1}{3} P_2, \qquad (29)$$

and where $P_1$ and $P_2$ are as given in (17) and (19), respectively. It thus follows that $P_b$ is approximately

$$P_b \approx \frac{d_m}{m}\binom{m}{t_m+1} q^{t_m+1} + \frac{1}{3}\binom{p}{t_p+1} q^{t_p+1}. \qquad (30)$$

Following a similar reasoning as in the previous subsection, an appropriate design strategy is choosing $t_m$ and $t_p$ such that both terms in (30) are in the same order of magnitude as the target bit error probability, while their sum is (just) below this target. Again, note that if $m \gg p$, the coefficient in the first term in (30) is huge in comparison to the coefficient in the second term, and, therefore, the exponent of $q$ in the second term can be considerably smaller than the exponent in the first term.

If $mq$ and $pq$ are substantially smaller than 1, then a more detailed general analysis is possible. To this end, evaluate $Q_2/Q_1$ for the case $t_p = t_m$. Using (28), (29), (18), and (20), we find

$$\frac{Q_2}{Q_1} = \frac{m}{3 d_m} \cdot \frac{P_2}{P_1} \approx \frac{m}{3 d_m}\left(\frac{p}{m}\right)^{t_m+1}, \qquad (31)\text{--}(32)$$

where the approximation comes from (23). If, for example, $p \leq m/10$, then (32) gives

$$\frac{Q_2}{Q_1} \lesssim \frac{m}{3 d_m}\, 10^{-(t_m+1)}, \qquad (33)$$

which shows that $Q_2$ is indeed orders of magnitude below $Q_1$ if $p \ll m$ and $t_m$ is not too small. For any $t_m$, keeping $t_m$ fixed and reducing $t_p$ by 1 causes a growth in $Q_2$ by the factor given in (25). Hence, after several reductions of $t_p$ the ratio $Q_2/Q_1$ may be well above zero, i.e., $Q_2$ may no longer be negligibly small. Assuming $m > 3 d_m$, this final value of $t_p$ is larger than for the block error probability case, since, for $t_p = t_m$, $Q_2/Q_1$ exceeds $P_2/P_1$ by a factor of $m/(3 d_m)$, while the growth factor when reducing $t_p$ is the same in both cases. In conclusion, when designing the scheme to achieve a certain $P_b$ at maximum rate, the focus should, as for the $P_B$ case, not be on the final Hamming distance $d$, but on a careful choice of the error correction capabilities $t_m$ and $t_p$. Again, the latter can typically be smaller than the former, but not to the same extent as for the block error probability case.
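The $m/3$ step used in the analysis above can be checked empirically. A minimal Monte Carlo sketch (ours), assuming $z$ and $\hat{z}$ independent and uniform on $\{1, \ldots, m\}$:

```python
import random

m, trials = 1024, 200_000
mean_gap = sum(abs(random.randint(1, m) - random.randint(1, m))
               for _ in range(trials)) / trials
print(mean_gap / m)   # converges to about 1/3 as the number of trials grows
```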
V. CASE STUDIES

In this section, we illustrate the results obtained in this paper by working out two case studies. In the first case, various options for the constituent codes are considered and studied for one fixed channel. In the second case, one particular code for protecting the payload is evaluated for different channel conditions.

A. Case Study I: Shortened BCH Codes

Let the information block length be $k = 750$. We consider codes $\mathcal{C}_m$, with various error correction levels $t_m$, obtained by shortening BCH codes [7]. For $\mathcal{C}_p$, we consider the shortest known balanced codes with cardinality at least $m$ and Hamming distance $d_p = 2t_p + 2$, with $t_p$ the error protection level of the prefix. Such balanced codes are tabulated on [3], from which we collected the cardinalities of some short codes in Table I.

Table I. Cardinalities of the largest known balanced codes with length $n$ and Hamming distance $d$ [3].

An overview of the parameters of codes obtained by choosing $t_p = t_m$ is provided in Table II.

Table II. Code parameters in the setting of Case Study I for $t_p = t_m$.

If $t_m = 0$, then it is found from Table I that a prefix of length 12 is required to represent the 750 possible balancing positions without error correction capabilities, as in the original Knuth case, leading to a code rate of $750/762 \approx 0.984$, i.e., a normalized redundancy of $12/762 \approx 0.0157$. If $t_m = t_p = 1$, then it is found from Table I that a prefix of length 16 is required to represent the 760 possible balancing positions in the BCH codeword with a single error correction capability, leading to a higher normalized redundancy of $26/776 \approx 0.0335$. Further increasing the value of $t_m = t_p$ leads to higher distances at the expense of higher redundancies, as can be checked from the table. For $t_m = t_p = 3$, $\mathcal{C}_m$ has length $m = 780$ and Hamming distance $d_m = 7$, while $\mathcal{C}_p$ has length $p = 24$ and Hamming distance $d_p = 8$. Thus, the code $\mathcal{C}$ has length $n = 804$, redundancy $r = 54$, normalized redundancy $r/n \approx 0.0672$, and Hamming distance $d \geq 8$. In Table III, we compare the normalized redundancy of our scheme to the lower bound (9) on the normalized redundancy of any scheme with the same length and error correction level. Note that the ratio is, as for Knuth's original method, close to 2, in fact a little bit less in case we have error-correcting capabilities.

Table III. Redundancy comparison in the setting of Case Study I for $t_p = t_m$.

The factor of two, the price to be paid for simplicity, may even be smaller, since (9) is only a lower bound of which it is unknown whether it is achievable in case $d > 2$.

The proposed scheme also offers the option to provide unequal error protection to the bulk and the prefix. An overview of the parameters of codes obtained by fixing $t_m = 3$ and varying $t_p$ is provided in Table IV.

Table IV. Code parameters in the setting of Case Study I for $t_m = 3$ and varying $t_p$.

Choosing $t_p = 0$, i.e., providing no error correction capability to the prefix, gives a Hamming distance $d \geq 2$ and a normalized redundancy of about 0.053. Note that increasing $t_p$ up to the value of 3 increases both the Hamming distance (since then $d_p \leq 2\lceil d_m/2 \rceil$) and the redundancy. However, also note that further increasing $t_p$ from 3 to 4 (or beyond) increases the redundancy, without the reward of an improved distance (since it is stuck at $2\lceil d_m/2 \rceil = 8$ due to the fact that $d_p > 2\lceil d_m/2 \rceil$).
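The normalized redundancies quoted above follow directly from the quoted lengths; the following lines (our illustration) reproduce the $t_m = 3$ numbers, where the length-12 prefix for $t_p = 0$ is inferred from the cardinalities discussed above ($\binom{12}{6} = 924 \geq 780$).

```python
k, m = 750, 780                 # data length; shortened BCH length for t_m = 3
for p in (24, 16, 12):          # prefix lengths for t_p = 3, 1, 0 (cf. Table I)
    n, r = m + p, m + p - k
    print(p, round(r / n, 4))   # 0.0672, 0.0578, 0.053
```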
Next, we study the block error probability, where we assume a fixed channel error probability $q$. In Table V, the values of $P_1$ and $P_2$ are displayed for various choices of $t_m$ and $t_p$.

Table V. Numerical evaluations of $P_1$ and $P_2$ for various $t_m$ and $t_p$, in the setting of Case Study I.

Let the target block error probability for our application be $P_B^*$, i.e., the error protection levels $t_m$ and $t_p$ should be chosen in such a way that $P_1 + P_2$ does not exceed $P_B^*$. From Table V, we conclude that $t_m$ should be (at least) equal to 3. From Table II, we see that choosing $t_p = 3$ as well would result in a code with normalized redundancy 0.0672 and Hamming distance 8, while we see from Table V that the resulting $P_1 + P_2$ is well below the target. However, from Tables IV and V, we can also conclude that keeping $t_m = 3$ and lowering $t_p$ from 3 to only 1 results in (i) a normalized redundancy decrease to 0.0578, i.e., a reduction by about 14%, (ii) a Hamming distance decrease to 4, and (iii) a block error probability still meeting the target. Hence, we obtain a rate increase while still meeting the performance requirement, in spite of the distance drop. Note that a further rate increase is not possible within this scheme, since the performance requirement is not met by the choice $t_p = 0$, i.e., by giving the prefix no error protection at all. Similar observations can be made for a more stringent target, for which we find that a larger $t_m$ is required, but that a value of $t_p$ smaller than $t_m$ is again sufficient for the error protection of the prefix.

Finally, we investigate the bit error probability, still assuming the same channel. In Table VI, the values of $Q_1$ and $Q_2$ are displayed for various choices of $t_m$ and $t_p$.

Table VI. Numerical evaluations of $Q_1$ and $Q_2$ for various $t_m$ and $t_p$, in the setting of Case Study I.

Let the target bit error probability for a certain application be $P_b^*$, i.e., the error protection levels $t_m$ and $t_p$ should be chosen in such a way that $Q_1 + Q_2$ does not exceed $P_b^*$. In this setting, a choice with $t_p$ smaller than $t_m$ again satisfies the requirement, but $t_p$ cannot be reduced as far as in the block error probability case. This indicates that when designing the scheme to achieve a certain target bit error probability, $t_p$ could still be chosen smaller than $t_m$, but not to the same extent as for the block error probability.

B. Case Study II: The Reed-Muller Code RM(6,10)

In this subsection, we use the Reed-Muller code RM(6,10) [7] as the code $\mathcal{C}_m$ protecting the payload. The code length is $m = 1024$, the data block length is $k = 848$, and the Hamming distance is $d_m = 16$; thus, the error correction level is $t_m = 7$. We investigate the choice of an appropriate code $\mathcal{C}_p$ protecting the prefix in the proposed scheme, for different channel error probabilities $q$.

The balanced code $\mathcal{C}_p$ should have a size of at least $m = 1024$ and a length as small as possible. For small values of the Hamming distance $d_p$, the minimum length can be determined from Table I. For Hamming distances 2, 4, 6, 8, and 10, the minimum lengths are 14, 16, 22, 24, and 30, respectively. The error protection level is $t_p = d_p/2 - 1$.

The block error probability can be approximated by the sum of $P_1$ and $P_2$ as given in (17) and (19). For $t_m = 7$, the values of $P_1$ and $P_2$ (for various values of $t_p$) are given in Fig. 3.

Fig. 3. Probabilities $P_1$ and $P_2$ (for various $t_p$), as a function of the channel error probability $q$, for Case Study II.

When choosing the error protection level $t_p = 7$, $P_2$ is many orders of magnitude below $P_1$. Hence, in order to lower the redundancy, $t_p$ should rather be chosen smaller. Also when lowering $t_p$ from 7 to 4, and thus considerably decreasing the redundancy, $P_2$ is still negligible compared to $P_1$ for the whole range covered in the figure, so further reductions may be possible. The figure shows that the channel error probability determines to which extent $t_p$ can be reduced: the higher $q$, the lower $t_p$ can be chosen while keeping $P_2$ well below $P_1$.

The bit error probability can be approximated by the sum of $Q_1$ and $Q_2$, which, according to (28) and (29), are $P_1/64$ and $P_2/3$, respectively. For $t_m = 7$, the values of $Q_1$ and $Q_2$ (for various values of $t_p$) are given in Fig. 4.

Fig. 4. Probabilities $Q_1$ and $Q_2$ (for various $t_p$), as a function of the channel error probability $q$, for Case Study II.

Similar conclusions as for the block error probability can be drawn, i.e., the higher $q$, the lower $t_p$ can be chosen while keeping $Q_2$ well below $Q_1$. However, the resulting value of $t_p$ for the bit error probability case is higher than for the block error probability case, due to the fact that $Q_2/Q_1$ exceeds $P_2/P_1$ by a factor of $m/(3 d_m) = 1024/48 \approx 21$.
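Curves in the style of Figs. 3 and 4 can be regenerated from the formulas of Section IV. The sketch below (our illustration) evaluates $Q_1 + Q_2$ for the RM(6,10) parameters and an assumed length-16 prefix code with $d_p = 4$, at a few illustrative channel error probabilities.

```python
from math import comb

def tail(n, t, q):
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(t + 1, n + 1))

m, d_m, t_m = 1024, 16, 7       # RM(6,10) payload code
p, t_p = 16, 1                  # assumed prefix code: length 16, d_p = 4
for q in (1e-2, 1e-3, 1e-4):    # illustrative channel error probabilities
    Q1, Q2 = tail(m, t_m, q) / 64, tail(p, t_p, q) / 3   # (28), (29); d_m/m = 1/64
    print(q, Q1 + Q2)
```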
VI. CONCLUSION

We have extended Knuth's balancing scheme with error-correcting capabilities. The approach is very general in the sense that any block code can be used to protect the payload, while the prefix of length $p$ is protected by a constant-weight code of weight $p/2$. It has been demonstrated that, in order to meet a certain target block or bit error probability in an efficient way, the distances of the constituent codes may preferably be unequal. Hence, from the performance perspective, the overall Hamming distance is of minor importance. As for the original Knuth algorithm, the scheme's simplicity comes at the price of a somewhat higher redundancy than the most efficient, but prohibitively complex, code. Therefore, the proposed scheme is an attractive and simple alternative for achieving (long) balanced sequences with error correction properties.

REFERENCES

[1] E. Agrell, A. Vardy, and K. Zeger, "Upper bounds for constant-weight codes," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2373–2395, Nov. 2000.
[2] S. Al-Bassam and B. Bose, "Design of efficient error-correcting balanced codes," IEEE Trans. Computers, vol. 42, no. 10, pp. 1261–1266, Oct. 1993.
[3] A. E. Brouwer, Bounds for Binary Constant Weight Codes. [Online]. Available: http://www.win.tue.nl/~aeb/codes/Andw.html
[4] K. A. S. Immink, Codes for Mass Data Storage Systems, 2nd ed. Eindhoven, The Netherlands: Shannon Foundation Publishers, 2004.
[5] K. A. S. Immink and J. H. Weber, "Very efficient balanced codes," IEEE J. Sel. Areas Commun., vol. 28, no. 2, pp. 188–192, Feb. 2010.
[6] D. E. Knuth, "Efficient balanced codes," IEEE Trans. Inf. Theory, vol. IT-32, no. 1, pp. 51–53, Jan. 1986.
[7] S. Lin and D. J. Costello, Jr., Error Control Coding, 2nd ed. Upper Saddle River, NJ: Pearson Prentice-Hall, 2004.
[8] A. Mazumdar, R. M. Roth, and P. O. Vontobel, "On linear balancing sets," in Proc. IEEE Int. Symp. Information Theory, Seoul, South Korea, Jun.–Jul. 2009, pp. 2699–2703.
[9] H. van Tilborg and M. Blaum, "On error-correcting balanced codes," IEEE Trans. Inf. Theory, vol. 35, no. 5, pp. 1091–1095, Sep. 1989.
[10] J. H. Weber and K. A. S. Immink, "Knuth's balanced code revisited," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1673–1679, Apr. 2010.
[11] H. Zhou, A. Jiang, and J. Bruck, "Error-correcting schemes with dynamic thresholds in nonvolatile memories," in Proc. IEEE Int. Symp. Information Theory, Saint Petersburg, Russia, Jul.–Aug. 2011, pp. 2109–2113.

Jos H. Weber (S'87–M'90–SM'00) was born in Schiedam, The Netherlands, in 1961. He received the M.Sc. (in mathematics, with honors), Ph.D., and MBT (Master of Business Telecommunications) degrees from Delft University of Technology, Delft, The Netherlands, in 1985, 1989, and 1996, respectively. Since 1985, he has been with the Faculty of Electrical Engineering, Mathematics, and Computer Science of Delft University of Technology. Currently, he is an associate professor at the Wireless and Mobile Communications Group. He is the chairman of the WIC (Werkgemeenschap voor Informatie- en Communicatietheorie in de Benelux) and the secretary of the IEEE Benelux Chapter on Information Theory. He was a Visiting Researcher at the University of California at Davis, USA, the University of Johannesburg, South Africa, and the Tokyo Institute of Technology, Japan. His main research interests are in the areas of channel and network coding.

Kees A. Schouhamer Immink (M'81–SM'86–F'90) received his Ph.D. degree from the Eindhoven University of Technology. He founded and was named president of Turing Machines, Inc., in 1998. He has been, since 1994, an adjunct professor at the Institute for Experimental Mathematics, Essen University, Germany, and is affiliated with the Nanyang Technological University of Singapore. Immink designed coding techniques for a wealth of digital audio and video recording products, such as the Compact Disc, CD-ROM, CD-Video, the Digital Compact Cassette system (DCC), DVD, the Video Disc Recorder, and the Blu-ray Disc. He received a Knighthood in 2000, a personal "Emmy" award in 2004, the 1996 IEEE Masaru Ibuka Consumer Electronics Award, the 1998 IEEE Edison Medal, the 1999 AES Gold and Silver Medals, and the 2004 SMPTE Progress Medal.
He was named a fellow of the IEEE, AES, and SMPTE, was inducted into the Consumer Electronics Hall of Fame, and was elected into the Royal Netherlands Academy of Sciences and the U.S. National Academy of Engineering. He served the profession as President of the Audio Engineering Society, Inc., New York, in 2003.

Hendrik C. Ferreira (SM'08) was born and educated in South Africa, where he received the D.Sc. (Eng.) degree from the University of Pretoria in 1980. From 1980 to 1981, he was a postdoctoral researcher at the Linkabit Corporation in San Diego, CA. In 1983, he joined the Rand Afrikaans University, Johannesburg, South Africa, where he was promoted to professor in 1989 and served two terms as Chairman of the Department of Electrical and Electronic Engineering, from 1994 to 1999. He is currently a research professor at the University of Johannesburg. His research interests are in Digital Communications and Information Theory, especially Coding Techniques, as well as in Power Line Communications. Dr. Ferreira is a past chairman of the Communications and Signal Processing Chapter of the IEEE South Africa section, and from 1997 to 2006 he was Editor-in-Chief of the Transactions of the South African Institute of Electrical Engineers. He has served as chairman of several conferences, including the international 1999 IEEE Information Theory Workshop in the Kruger National Park, South Africa, as well as the 2010 IEEE African Winter School on Information Theory and Communications.