IEEE TRANSACTIONS ON INFORMATION THEORY, VOL.56, NO. 4, APRIL 2010 1673
Knuth’s Balanced Codes Revisited
Jos H. Weber, Senior Member, IEEE, and Kees A. Schouhamer Immink, Fellow, IEEE
Abstract—In 1986, Don Knuth published a very simple algorithm for constructing sets of bipolar codewords with equal numbers of “+1”s and “−1”s, called balanced codes. Knuth’s algorithm is well suited for use with large codewords. The redundancy of Knuth’s balanced codes is a factor of two larger than that of a code comprising the full set of balanced codewords. In this paper, we present results of our attempts to improve the performance of Knuth’s balanced codes.
Index Terms—Balanced code, channel capacity, constrained
code, magnetic recording, optical recording.
I. INTRODUCTION
SETS of bipolar codewords that have equal numbers of “+1”s and “−1”s are usually called balanced codes. Such codes have found application in cable transmission and in optical and magnetic recording. A survey of properties and methods for constructing balanced codes can be found in [1]. A simple encoding technique for generating balanced codewords, which is capable of handling (very) large blocks, was described by Knuth [2] in 1986.
Knuth’s algorithm is extremely simple. An $m$-bit user word, $m$ even, consisting of bipolar symbols valued $\pm 1$, is forwarded to the encoder. The encoder inverts the first $k$ bits of the user word, where $k$ is chosen in such a way that the modified word has equal numbers of “+1”s and “−1”s. Knuth showed that such an index $k$ can always be found. The index $k$ is represented by a balanced word of length $p$. The $p$-bit prefix word followed by the modified $m$-bit user word are both transmitted, so that the rate of the code is $m/(m+p)$. The receiver can easily undo the inversion of the first $k$ bits received once $k$ is computed from the prefix. Neither encoder nor decoder requires large look-up tables, and Knuth’s algorithm is therefore very attractive for constructing long balanced codewords. Modifications of the generic scheme are discussed in Knuth [2], Alon et al. [3], Al-Bassam and Bose [4], and Tallini, Capocelli and Bose [5].
Knuth showed that in his best construction [2], the redundancy, i.e., the number of redundant symbols $p$, is roughly equal to
$$\log_2 m, \quad m \gg 1. \qquad (1)$$
Manuscript received March 23, 2009. Current version published March 17,
2010. This work was supported by grant Theory and Practice of Coding and
Cryptography, Award Number: NRF-CRP2-2007-03. The material in this paper
was presented in part at the IEEE International Symposium on Information
Theory, Toronto, ON, Canada, July 2008.
J. H. Weber is with the IRCTR/CWPC, Delft University of Technology, 2628
CD Delft, The Netherlands (e-mail: J.H.Weber@tudelft.nl).
K. A. Schouhamer Immink is with the Nanyang Technological University of
Singapore, Singapore, and with Turing Machines BV, 3016 DK Rotterdam, The
Netherlands (e-mail: immink@turing-machines.com).
Communicated by H.-A. Loeliger, Associate Editor for Coding Techniques.
Color versions of Figures 1–4 in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2010.2040868
The cardinality of a full set of balanced codewords of length $m$ equals
$$\binom{m}{m/2} \approx \frac{2^m}{\sqrt{\pi m/2}},$$
where the approximation of the central binomial coefficient follows from Stirling’s formula. Then the redundancy of a full set of balanced codewords is roughly equal to
$$\frac{1}{2}\log_2 m + 0.326, \quad m \gg 1. \qquad (2)$$
We conclude that the redundancy of a balanced code generated by Knuth’s algorithm is a factor of two larger than that of a code that uses “full” balanced code sets. Clearly, this loss in redundancy is the price one has to pay for a simple construction without look-up tables. There are two features of Knuth’s construction that could help to explain the difference in performance, and they offer opportunities for code improvement.
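The comparison of redundancies can be checked numerically. The sketch below computes the exact redundancy of the full balanced set, $m - \log_2 \binom{m}{m/2}$, and the Stirling-based estimate $\frac{1}{2}\log_2 m + \frac{1}{2}\log_2(\pi/2)$. The function names are hypothetical, and taking Knuth's redundancy as roughly $\log_2 m$ is an assumption based on the factor-of-two statement.

```python
from math import comb, log2, pi


def full_set_redundancy(m: int) -> float:
    """Exact redundancy m - log2 C(m, m/2) of the full set of balanced words."""
    if m <= 0 or m % 2:
        raise ValueError("m must be a positive even integer")
    return m - log2(comb(m, m // 2))


def stirling_redundancy(m: int) -> float:
    """Stirling-based estimate: (1/2) log2 m + (1/2) log2(pi/2)."""
    return 0.5 * log2(m) + 0.5 * log2(pi / 2)
```

For example, `log2(m) / full_set_redundancy(m)` approaches 2 only slowly as `m` grows, because of the additive constant in the full-set redundancy.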
The first feature that may offer a possibility of improving the code’s performance stems from the fact that Knuth’s algorithm is greedy, as it takes the very first opportunity for balancing the codeword [1]; that is, in Knuth’s basic scheme, the first, i.e., the smallest, index $k$ where balance is reached is selected. In case there is more than one position where balance can be achieved, the encoder will thus favor smaller values of the position index. As a result, we may expect that smaller values of the index are more probable than larger ones. Then, if the index distribution is non-uniform, we may conclude that the average length of the prefix required to transmit the position information is less than $\log_2 m$. A practical embodiment of a scheme that takes advantage of this feature is characterized by the fact that the length of the prefix word is not fixed, but user data dependent. The prefix assigned to a position with a smaller, more probable, index has a smaller length than a prefix assigned to a position with a larger index.
Second, it has been shown by Knuth that there is always a position where balance can be reached. It can be verified that there is, for some user words, more than one suitable position where balance of the word can be realized. It will be shown later that the number of positions where words can be balanced lies between 1 and $m/2$. This freedom offers a possibility to improve the redundancy of Knuth’s basic construction. An enhanced Knuth’s algorithm may transmit auxiliary data by using the freedom of selecting from the balancing positions possible. Assume there are $v$ positions, $1 \le v \le m/2$, where the encoder can balance the user word; then the encoder can convey an additional $\log_2 v$ bits. The number $v$ depends on the user word, and therefore the amount of auxiliary data that can be transmitted is user data dependent.
We start, in Section II, with a survey of known properties of
Knuth’s coding method. Thereafter, in Section III, we will com-
pute the distribution of the transmitted index in Knuth’s basic
scheme. Given the distribution of the index, we will compute the
entropy of the index, and evaluate the performance of a suitably
modified scheme. In Section IV, we will compute the amount
of additional data that can be conveyed in a modification of
Knuth’s basic scheme. Section V concludes this article.
II. KNUTH’S BASIC SCHEME
Knuth’s balancing algorithm is based on the idea that there is a simple translation between the set of all $m$-bit bipolar user words, $m$ even, and the set of all $(m+p)$-bit codewords. This conversion is based on the observation that in any block of data having an even number of binary digits, it is always possible to find a location which defines two digit segments having equal disparity. A balanced block can then be created by the inversion of all the digits within either segment. The translation is achieved by selecting a bit position $k$ within the $m$-bit word that defines two segments, each having the same disparity. A zero-disparity, or balanced, block is now generated by the inversion of the first $k$ bits (or the last $m-k$ bits). The position digit $k$ is encoded in the $p$-bit prefix. The rate of the code is simply $m/(m+p)$.
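The proof that there is at least one position, $k$, where balance in any even-length user word can be achieved is due to Knuth. Let the user word be $w = (w_1, \ldots, w_m)$, $w_i \in \{-1, +1\}$, and let $d(w)$ be the sum, or disparity, of the user symbols, or
$$d(w) = \sum_{i=1}^{m} w_i. \qquad (3)$$
Let $d_k(w)$ be the running digital sum of the first $k$, $k \le m$, bits of $w$, or
$$d_k(w) = \sum_{i=1}^{k} w_i, \qquad (4)$$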
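and let $w^{(k)}$ be the word $w$ with its first $k$ bits inverted. For example, let $w = (-1, +1, +1, +1, -1, +1)$; then we have $d(w) = 2$ and $w^{(3)} = (+1, -1, -1, +1, -1, +1)$. We let $\sigma_k(w)$ stand for $d(w^{(k)})$; then the quantity $\sigma_k(w)$ is
$$\sigma_k(w) = -\sum_{i=1}^{k} w_i + \sum_{i=k+1}^{m} w_i = d(w) - 2\sum_{i=1}^{k} w_i. \qquad (5)$$
It is immediate that $\sigma_0(w) = d(w)$ (no symbols inverted) and $\sigma_m(w) = -d(w)$ (all symbols inverted). We may, as $\sigma_k(w) = \sigma_{k-1}(w) \pm 2$, conclude that every word $w$, $m$ even, can be associated with at least one position $k$ for which $\sigma_k(w) = 0$, or $w^{(k)}$ is balanced. This concludes the proof.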
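The existence argument translates directly into a linear scan. The following is a minimal sketch, not the authors' implementation: it tracks the disparity of the word after inverting its first $k$ symbols and records every $k$ in $1, \ldots, m$ where that disparity is zero. The function names are hypothetical, and the prefix that would carry $k$ is left out; the index is simply returned alongside the modified word.

```python
from typing import List, Tuple


def balancing_indices(w: List[int]) -> List[int]:
    """All k in 1..m such that inverting the first k symbols balances w."""
    m = len(w)
    if m == 0 or m % 2:
        raise ValueError("word length must be a positive even integer")
    sigma = sum(w)  # disparity with no symbols inverted
    indices = []
    for k in range(1, m + 1):
        sigma -= 2 * w[k - 1]  # inverting symbol k changes the disparity by -2*w_k
        if sigma == 0:
            indices.append(k)
    return indices


def knuth_encode(w: List[int]) -> Tuple[int, List[int]]:
    """Invert the shortest prefix that balances w; return (k, balanced word)."""
    k = balancing_indices(w)[0]
    return k, [-s for s in w[:k]] + list(w[k:])


def knuth_decode(k: int, c: List[int]) -> List[int]:
    """Undo the inversion of the first k symbols."""
    return [-s for s in c[:k]] + list(c[k:])
```

A call such as `knuth_encode([-1, 1, 1, 1, -1, 1])` selects the smallest balancing index, and `knuth_decode` restores the user word from that index and the balanced word.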
The value of $k$ is encoded in a balanced word of length $p$, $p$ even. The maximum length $m$ of the user word is, since the prefix has an equal number of “+1”s and “−1”s, governed by
$$m \le \binom{p}{p/2}. \qquad (6)$$
In this article, we follow Knuth’s generic format, where $1 \le k \le m$. Note that in a slightly different format, we may opt for $0 \le k \le m$, where the encoder has the option to invert or not to invert the codeword in case the user word is balanced. For small values of $m$, this will lead to slightly different results, though for very large values of $m$, the differences between the two formats are small. Knuth described some variations on the general framework. For example, if $m$ and $p$ are both odd, we can use a similar construction. The redundancy of Knuth’s most efficient construction is roughly given by (1).
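To make the prefix-length constraint concrete, the sketch below finds the smallest even $p$ whose balanced words are numerous enough to label $m$ index values, and the resulting code rate. It assumes the prefix must be one of the $\binom{p}{p/2}$ balanced words of length $p$ and that exactly $m$ index values need to be distinguished; the function names are hypothetical.

```python
from math import comb


def min_prefix_length(m: int) -> int:
    """Smallest even p such that C(p, p/2) >= m."""
    if m < 1:
        raise ValueError("m must be positive")
    p = 2
    while comb(p, p // 2) < m:
        p += 2
    return p


def code_rate(m: int) -> float:
    """Rate m / (m + p) with the shortest admissible balanced prefix."""
    return m / (m + min_prefix_length(m))
```

For instance, a 1024-bit user word needs a 14-bit balanced prefix under these assumptions, since there are only 924 balanced words of length 12.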
III. DISTRIBUTION OF THE TRANSMITTED INDEX
The basic Knuth algorithm, as described above, progressively scans the user word till it finds the first suitable position, $k$, where the word can be balanced. In case there is more than one position where balance can be obtained, it is expected that the encoder will favor smaller values of the position index. Then the distribution of the index is not uniform, and, thus, the entropy of the index is less than $\log_2 m$, which opens the door for a more efficient scheme. A practical embodiment of a more efficient scheme would imply that the prefix assigned to a smaller index has a smaller length than a prefix assigned to a larger index. We will compute the entropy of the index sent by the basic Knuth encoder, and in order to do so we first compute the probability distribution of the transmitted index. In our analysis it is assumed that all information words are equiprobable and independent. Let $\Pr(k)$ denote the probability that the transmitted index equals $k$, $1 \le k \le m$.
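Theorem 1: The distribution of the transmitted index $k$, $1 \le k \le m$, is given by
$$\Pr(2j-1) = \Pr(2j) = \frac{4(m-2j+1)}{m\,2^{m}} \binom{2j-2}{j-1} \binom{m-2j}{m/2-j}, \quad 1 \le j \le m/2.$$
Proof: Theorem 1 follows from Lemma 3 in the Appendix and the fact that there are $2^m$ (equally probable) sequences of length $m$.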
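The distribution of the transmitted index can be checked by exhaustive enumeration for small $m$. The sketch below counts, for every bipolar word of length $m$, the smallest balancing index, and compares the counts with one closed form, $\frac{4(m-2j+1)}{m}\binom{2j-2}{j-1}\binom{m-2j}{m/2-j}$ for $k = 2j-1$ and $k = 2j$. That closed form is an assumption here: it agrees with the enumeration for the values of $m$ tested, but the paper's own expression may be written differently. All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def first_index_counts(m: int) -> Counter:
    """Number of words of length m whose smallest balancing index is k."""
    counts = Counter()
    for w in product((1, -1), repeat=m):
        sigma = sum(w)
        for k in range(1, m + 1):
            sigma -= 2 * w[k - 1]  # disparity after inverting the first k symbols
            if sigma == 0:
                counts[k] += 1
                break
    return counts


def closed_form_count(k: int, m: int) -> int:
    """Assumed closed form for the number of words with smallest index k."""
    j = (k + 1) // 2  # k = 2j - 1 or k = 2j
    return (4 * (m - 2 * j + 1) * comb(2 * j - 2, j - 1)
            * comb(m - 2 * j, m // 2 - j)) // m
```

Dividing the counts by `2 ** m` gives the probabilities; the enumeration is exponential in `m`, so it is only meant as a sanity check for short words.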
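Invoking Stirling’s approximation for large $m$, we have $\Pr(1) = \Pr(2) \approx \sqrt{2/(\pi m)}$ at the low end of the index range and $\Pr(m-1) = \Pr(m) \approx \frac{1}{m}\sqrt{2/(\pi m)}$ at the high end. Fig. 1 shows two examples of the distribution $\Pr(k)$. The entropy of the transmitted index, denoted by $H_p(m)$, is
$$H_p(m) = -\sum_{k=1}^{m} \Pr(k) \log_2 \Pr(k). \qquad (7)$$
Given the distribution, it is now straightforward to compute the entropy, $H_p(m)$, of the index. Fig. 2 shows a few results of computations. The diagram shows that $H_p(m)$ is only slightly less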
Fig. 1. Distribution of the (normalized) transmitted index for and .
Fig. 2. Entropy versus .
than $\log_2 m$, and we conclude that the above proposed modification of Knuth’s scheme using a variable-length prefix can offer only a small improvement in redundancy within the range of codeword length investigated. We conclude that, at least within this range, the proposed variable prefix-length scheme cannot bridge the factor of two in redundancy between the basic Knuth scheme and that of full-set balanced codes.
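The size of the possible saving can be estimated by computing the entropy of the index directly. The sketch below builds the index distribution from an assumed closed form (it matches exhaustive enumeration for short words, but is not necessarily the expression used in the paper) and evaluates the entropy in bits, which can then be compared with $\log_2 m$. The function names are hypothetical.

```python
from math import comb, log2
from typing import List


def index_pmf(m: int) -> List[float]:
    """Assumed distribution of the smallest balancing index k = 1..m."""
    pmf = []
    for k in range(1, m + 1):
        j = (k + 1) // 2  # k = 2j - 1 or k = 2j
        count = (4 * (m - 2 * j + 1) * comb(2 * j - 2, j - 1)
                 * comb(m - 2 * j, m // 2 - j)) // m
        pmf.append(count / 2 ** m)
    return pmf


def index_entropy(m: int) -> float:
    """Entropy of the transmitted index in bits."""
    return -sum(p * log2(p) for p in index_pmf(m) if p > 0)
```

Evaluating `log2(m) - index_entropy(m)` for a range of `m` shows a gap of less than one bit, which is in line with the conclusion that a variable-length prefix offers only a small improvement.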
IV. ENCODING AUXILIARY DATA
There is at least one position and there are at most $m/2$ positions within an $m$-bit word, $m$ even, where a word can be balanced. The “at least” one position, which makes Knuth’s algorithm possible, was proved by Knuth (see above). The “at most” bound will be shown in the next theorem.
Theorem 2: There are at most $m/2$ positions within an $m$-bit word, $m$ even, where a word can be balanced.
Proof: Let $k$ denote a position where balance can be made. Then, at the neighboring positions $k-1$ or $k+1$ such a balance cannot be made, so that we conclude that the number of positions where balance can be made is less than or equal to $m/2$.
Fig. 3. Distribution of the (normalized) number, , of possible balancing positions for and .
Note that the indices of a word with $m/2$ balance positions are either all even or all odd. It can easily be verified that there are three groups of words that can be balanced at $m/2$ positions, namely
• the words consisting of the cascade of the di-bits “+1 −1” or “−1 +1”,
• the words beginning with a “+1” followed by $m/2 - 1$ di-bits “+1 −1” or “−1 +1”, followed by a “+1”, and
• the inverted words of the previous case.
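The bound and the three groups of extremal words can be verified by brute force for short words. The sketch below lists the balancing positions of a word and collects all words of length $m$ that attain $m/2$ positions; the three groups together should account for $2^{m/2} + 2 \cdot 2^{m/2-1} = 2^{m/2+1}$ words. The function names are hypothetical, and positions are taken from $1, \ldots, m$.

```python
from itertools import product
from typing import List, Sequence, Tuple


def balancing_positions(w: Sequence[int]) -> List[int]:
    """All k in 1..m where inverting the first k symbols balances w."""
    sigma = sum(w)
    positions = []
    for k, s in enumerate(w, start=1):
        sigma -= 2 * s
        if sigma == 0:
            positions.append(k)
    return positions


def words_with_max_positions(m: int) -> List[Tuple[int, ...]]:
    """All bipolar words of even length m with exactly m/2 balancing positions."""
    return [w for w in product((1, -1), repeat=m)
            if len(balancing_positions(w)) == m // 2]
```

For `m = 4`, the word `(1, 1, -1, 1)` belongs to the second group and has the balancing positions 1 and 3.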
Since, on average, the encoder has the freedom of selecting from more than one balance position, it can transmit auxiliary data. Assume there are $v$ positions, $1 \le v \le m/2$, where the encoder can balance the user word; then the encoder can convey an additional $\log_2 v$ bits. The number $v$ depends on the user word at hand, and therefore the amount of auxiliary data that can be transmitted is user data dependent.
Let $\Pr(v = i)$ denote the probability that the encoder may choose between $i$, $1 \le i \le m/2$, possible positions where balancing is possible.
Theorem 3: The distribution of the number of positions, $v$, where an $m$-bit word, $m$ even, can be balanced is given by
$$\Pr(v = i) = 2^{\,i+1-m} \binom{m-i-1}{m/2-1}, \quad 1 \le i \le m/2. \qquad (8)$$
Proof: Theorem 3 follows from Lemma 6 in the Appendix and the fact that there are $2^m$ (equally probable) sequences of length $m$.
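The distribution of the number of balancing positions can also be checked exhaustively for short words. The sketch below tallies how many words of length $m$ have exactly $i$ balancing positions and compares the tally with the candidate count $2^{i+1}\binom{m-i-1}{m/2-1}$. This closed form is an assumption that is consistent with the enumeration for the values of $m$ tested; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def position_count_histogram(m: int) -> Counter:
    """Map i -> number of words of length m with exactly i balancing positions."""
    hist = Counter()
    for w in product((1, -1), repeat=m):
        sigma = sum(w)
        positions = 0
        for s in w:
            sigma -= 2 * s  # disparity after inverting one more leading symbol
            if sigma == 0:
                positions += 1
        hist[positions] += 1
    return hist


def closed_form_words(i: int, m: int) -> int:
    """Assumed closed form for the number of words with i balancing positions."""
    return 2 ** (i + 1) * comb(m - i - 1, m // 2 - 1)
```

Dividing by `2 ** m` turns the counts into the probabilities of having `i` positions to choose from.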
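Fig. 3 shows two examples of the distribution $\Pr(v = i)$. The average amount of information, $H_a(m)$, that can be conveyed via the choice in the position data is
$$H_a(m) = \sum_{i=1}^{m/2} \Pr(v = i) \log_2 i. \qquad (9)$$
Results of computations are shown in Fig. 4. We can recursively compute $\Pr(v = i)$ by invoking
$$\Pr(v = i+1) = \frac{m-2i}{m-i-1}\,\Pr(v = i), \qquad \Pr(v = 1) = 2^{\,2-m}\binom{m-2}{m/2-1}.$$
For large $m$ and $i \ll m$, we have
$$\Pr(v = i+1) \approx \left(1 - \frac{i}{m}\right)\Pr(v = i),$$
where $\Pr(v = 1) \approx \sqrt{2/(\pi m)}$. We approximate
$$\prod_{l=1}^{i-1}\left(1 - \frac{l}{m}\right) \approx e^{-i^2/(2m)},$$
so that
$$\Pr(v = i) \approx \sqrt{\frac{2}{\pi m}}\, e^{-i^2/(2m)}.$$
Now, for large $m$, we can approximate $H_a(m)$ by
$$H_a(m) \approx \sqrt{\frac{2}{\pi m}} \int_0^{\infty} e^{-x^2/(2m)} \log_2 x \, dx \qquad (10)$$
$$= \frac{1}{2}\log_2 m - \frac{\gamma + \ln 2}{2\ln 2} \qquad (11)$$
$$\approx \frac{1}{2}\log_2 m - 0.916, \qquad (12)$$
where $\gamma = 0.5772\ldots$ is Euler’s constant. We conclude that the average amount of information that can be conveyed by exploiting the choice of index compensates for the loss in rate between codes based on Knuth’s algorithm and codes based on full balanced codeword sets.
Fig. 4. The average amount of information, $H_a(m)$, that can be conveyed via the choice in the index as a function of $m$.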
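The average amount of auxiliary information can be evaluated numerically and compared with a large-$m$ estimate. The sketch below assumes that the probability of having $i$ balancing positions is $2^{i+1-m}\binom{m-i-1}{m/2-1}$ (consistent with exhaustive enumeration for short words) and that the large-$m$ behaviour is $\frac{1}{2}\log_2 m - (\gamma + \ln 2)/(2\ln 2)$, obtained from a half-Gaussian approximation of that distribution. Both are assumptions of this sketch, and the function names are hypothetical.

```python
from math import comb, log, log2

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def aux_information(m: int) -> float:
    """Average of log2(i) over the assumed distribution of the position count i."""
    total = 0.0
    for i in range(1, m // 2 + 1):
        p = comb(m - i - 1, m // 2 - 1) * 2 ** (i + 1) / 2 ** m
        total += p * log2(i)
    return total


def aux_information_asymptotic(m: int) -> float:
    """Large-m estimate (1/2) log2 m - (gamma + ln 2) / (2 ln 2)."""
    return 0.5 * log2(m) - (EULER_GAMMA + log(2)) / (2 * log(2))
```

The two values approach each other slowly as `m` grows; for short words the exact sum is noticeably larger than the estimate.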
V. CONCLUSION
We have investigated some characteristics and possible improvements of Knuth’s algorithm for constructing bipolar codewords with equal numbers of “+1”s and “−1”s. An $(m+p)$-bit codeword is obtained after a small modification of the $m$-bit user word plus appending a fixed-length $p$-bit prefix. The $p$-bit prefix represents the position index within the codeword where the modification has been made.
We have derived the distribution of the index (assuming
equiprobable user words), and have computed the entropy of
the transmitted index. Our computations show that a modifica-
tion of Knuth’s generic scheme using a variable length prefix
of the position index will only offer a small improvement in
redundancy.
The transmitter can, in general, choose from a plurality of indices, so that the transmitter can transmit additional information. The number of possible indices depends on the given user word, so that the amount of extra information that can be transmitted is data dependent. We have derived the distribution of the number of positions where a word can be balanced. We have computed the average information that can be conveyed by using the freedom of choosing from multiple indices. The average amount of information can, for large user word length, $m$, be approximated by $\frac{1}{2}\log_2 m - 0.916$. This compensates for the loss in code rate between codes based on Knuth’s algorithm and codes based on full balanced codeword sets.
APPENDIX
In this Appendix, we give combinatorial proofs of Theorems
1 and 3. We first review some results on Dyck words and then
derive lemmas leading to the proofs of the theorems. We also
refer the reader to On Line Encyclopedia of Integer Sequences
A33820 and A112326.
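A Dyck word of length $2n$ is a balanced bipolar sequence of length $2n$ such that no initial segment has more “+1”s than “−1”s [6], or in other words, $x$ is a Dyck word if the running digital sum $d_k(x) \le 0$ for all $k$, $1 \le k \le 2n$, and $d_{2n}(x) = 0$. The number of Dyck words of length $2n$ is equal to
$$C_n = \frac{1}{n+1}\binom{2n}{n}, \qquad (13)$$
which is the $n$th Catalan number [6]. For example, $C_2 = 2$, and “−+−+” and “−−++” are the Dyck words of length 4, and $C_3 = 5$, and “−+−+−+”, “−+−−++”, “−−++−+”, “−−+−++”, and “−−−+++” are the Dyck words of length 6, where for clerical convenience we have written “+” instead of “+1” and “−” instead of “−1”.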
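The Catalan count of Dyck words is easy to confirm by enumeration. The sketch below assumes the convention that the running digital sum never becomes positive and ends at zero (the mirrored convention gives the same counts), and compares the number of such words of length $2n$ with $\frac{1}{n+1}\binom{2n}{n}$. The function names are hypothetical.

```python
from itertools import product
from math import comb
from typing import Sequence


def is_dyck(w: Sequence[int]) -> bool:
    """True if the running digital sum stays <= 0 and ends at 0."""
    rds = 0
    for s in w:
        rds += s
        if rds > 0:
            return False
    return rds == 0


def count_dyck_words(n: int) -> int:
    """Number of Dyck words of length 2n, by exhaustive enumeration."""
    return sum(1 for w in product((1, -1), repeat=2 * n) if is_dyck(w))


def catalan(n: int) -> int:
    """n-th Catalan number, C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)
```

The enumeration grows as $4^n$, so it is only practical for small `n`.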
Let $B(m)$ denote the set of all balanced sequences of length $m$ without internal balancing positions, i.e., there are no balancing positions $k$ with $1 \le k \le m-1$. Define $b(m) = |B(m)|$. Note that a sequence is in $B(m)$ if and only if it has the format $(-1, x, +1)$ or its inverse, where $x$ is a Dyck word of length $m-2$. Hence, for all even $m \ge 2$,
$$b(m) = \frac{4}{m}\binom{m-2}{m/2-1}, \qquad (14)$$
which is twice the Catalan number of index $m/2 - 1$ given by (13). For example, $B(4)$ consists of “−−++” and “++−−”, so that $b(4) = 2$, which is indeed the result provided by (14).
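Let $S(k, m)$ denote the set of bipolar sequences of even length $m$ for which the smallest balancing index is $k$, $1 \le k \le m$. Define $s(k, m) = |S(k, m)|$. We will derive an explicit expression for $s(k, m)$ (in Lemma 3), from which Theorem 1 immediately follows.
Lemma 1: For all $j$, $1 \le j \le m/2$, it holds that
$$s(2j-1, m) = s(2j, m). \qquad (15)$$
Proof: Let $x = (a, y) \in S(2j, m)$ with $a$ of length 1. We define a mapping $\varphi$ from $S(2j, m)$ to $S(2j-1, m)$ by $\varphi(x) = (y, \bar{a})$, where $\bar{a}$ is the inverse of $a$, i.e., $\varphi(x)$ is the cyclic shift of $x$ with an inversion of the last bit of $\varphi(x)$. The lemma follows from the observation that $\varphi$ is a bijection.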
Lemma 2: For all , it holds that
(16)
Proof: Let denote the set of all bipolar sequences
of length , where and is
balanced. Let with of length .We
define a mapping from to by ,
where is the symbol-wise inverse of . Since is a bijection
(17)
and the lemma follows using (14).
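Lemma 3: For all $j$, $1 \le j \le m/2$, the number $s(k, m)$ of bipolar sequences of even length $m$ whose smallest balancing index is $k$ satisfies
$$s(2j-1, m) = s(2j, m) = \frac{4(m-2j+1)}{m} \binom{2j-2}{j-1} \binom{m-2j}{m/2-j}. \qquad (18)$$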
Proof: The first equality follows from Lemma 1. Suppose
that the second equality holds for . From Lemma 2
(19)
and thus the second equality also holds for . Since the
second equality holds for because of (14), the result
follows by induction.
Let denote the set of bipolar sequences of even length
which can be balanced in positions . De-
fine . We will derive an explicit ex-
pression for (in Lemma 6), from which Theorem 3 im-
mediately follows. Any sequence with balancing
positions can be uniquely decomposed as
, where is of length , with
and . Note that is in for all
and that is in . From
these observations, we can easily derive the recursive relation
(20)
for all . Further, we have, for all , the trivial
equality
(21)
Lemma 4: For all and satisfying , it holds
that
(22)
Proof: Any bipolar sequence of length containing
’ones’ can be uniquely written as , where is a Dyck
word of length , with , and is
a bipolar sequence of length containing
’s. Using (13) for Dyck word enumeration, a simple counting
argument gives the stated result.
Lemma 5: For all , it holds that
(23)
Proof: Any bipolar sequence of length having more than
’s can be uniquely written as , where is of length
, with , and is of length and has
’s. Any bipolar sequence of length containing less than
’s can be uniquely written as , where is of length
, with , and is of length and has
’s. Hence
(24)
which concludes the proof.
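Lemma 6: For all $i$, $1 \le i \le m/2$, the number $t(i, m)$ of bipolar sequences of even length $m$ that can be balanced in exactly $i$ positions satisfies
$$t(i, m) = 2^{\,i+1} \binom{m-i-1}{m/2-1}. \qquad (25)$$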
Proof: Assuming that the statement holds for all ,
we will show that it also holds for . For all
,wehave
(26)
where the first equality follows from (20), the second from (25)
and (14), and the third from Lemma 4 (with and
). Further, we have
(27)
where the first equality follows from (21) (with ),
the second from (26), and the third from Lemma 5 (with
). Hence, if the statement in the lemma holds for all
, then it holds for as well. Since (21) gives that
, (25) holds for , and the lemma follows by
induction on .
REFERENCES
[1] K. A. S. Immink, Codes for Mass Data Storage Systems, Second ed.
Eindhoven, Netherlands: Shannon Foundation Publishers, 2004.
[2] D. E. Knuth, “Efficient balanced codes,” IEEE Trans. Inf. Theory, vol.
IT-32, pp. 51–53, Jan. 1986.
[3] N. Alon, E. E. Bergmann, D. Coppersmith, and A. M. Odlyzko,
“Balancing sets of vectors,” IEEE Trans. Inf. Theory, vol. IT-34, pp.
128–130, Jan. 1988.
[4] S. Al-Bassam and B. Bose, “On balanced codes,” IEEE Trans. Inf.
Theory, vol. 36, pp. 406–408, Mar. 1990.
[5] L. G. Tallini, R. M. Capocelli, and B. Bose, “Design of some new
balanced codes,” IEEE Trans. Inf. Theory, vol. 42, pp. 790–802, May
1996.
[6] R. P. Stanley, Enumerative Combinatorics. New York: Cambridge
University Press, 1999, vol. 2.
Jos H. Weber (S’87–M’90–SM’00) was born in Schiedam, The Netherlands,
in 1961. He received the M.Sc. (in mathematics, with honors), Ph.D., and MBT
(Master of Business Telecommunications) degrees from Delft University of
Technology, Delft, The Netherlands, in 1985, 1989, and 1996, respectively.
Since 1985, he has been with the Faculty of Electrical Engineering, Mathe-
matics, and Computer Science of Delft University of Technology. Currently, he
is an associate professor at the Wireless and Mobile Communications Group.
He is the chairman of the WIC (Werkgemeenschap voor Informatie- en Com-
municatietheorie in de Benelux) and the secretary of the IEEE Benelux Chapter
on Information Theory. He was a Visiting Researcher at the University of Cal-
ifornia at Davis, the University of Johannesburg, South Africa, and the Tokyo
Institute of Technology, Japan. His main research interests are in the areas of
channel and network coding.
Kees A. Schouhamer Immink (M’81–SM’86–F’90) received the Ph.D. degree
from the Eindhoven University of Technology, The Netherlands.
He founded and was named President of Turing Machines, Inc., in 1998. He
has, since 1994, been an Adjunct Professor at the Institute for Experimental
Mathematics, Essen University, Germany, and is affiliated with the Nanyang
Technological University of Singapore. He designed coding techniques of a
wealth of digital audio and video recording products, such as compact disc,
CD-ROM, CD-video, digital compact cassette system, DCC, DVD, video disc
recorder, and blu-ray disc.
Dr. Immink received a Knighthood in 2000, a personal “Emmy” award in
2004, the 1996 IEEE Masaru Ibuka Consumer Electronics Award, the 1998
IEEE Edison Medal, 1999 AES Gold and Silver Medals, and the 2004 SMPTE
Progress Medal. He was named a Fellow of the IEEE, AES, and SMPTE, and
was inducted into the Consumer Electronics Hall of Fame, and elected into the
Royal Netherlands Academy of Sciences and the US National Academy of En-
gineering. He served the profession as President of the Audio Engineering So-
ciety inc., New York, in 2003.
... Therefore, in [9], [10], Immink and Weber proposed balancing schemes that transmit variable-length prefixes and studied the average redundancy of their proposals. Specifically, in [9], Weber and Immink provided two variable-length balancing schemes whose average redundancy are asymptotically equal to log 2 n and 1 2 log 2 n + 0.936, respectively. ...
... Therefore, in [9], [10], Immink and Weber proposed balancing schemes that transmit variable-length prefixes and studied the average redundancy of their proposals. Specifically, in [9], Weber and Immink provided two variable-length balancing schemes whose average redundancy are asymptotically equal to log 2 n and 1 2 log 2 n + 0.936, respectively. Later in [10], Immink and Weber proposed another variable-length balancing scheme which we study closely in this paper. ...
... To this end, we borrow tools from lattice-path combinatorics and provide closed formulas for the upper bounds on the average redundancy of both Schemes A and B (described in Sections III and V). Unfortunately, as with [10], we are unable to complete the asymptotic analysis for Scheme A. Hence, we introduce Scheme B which uses slightly more redundant bits, and show that Scheme B incurs average redundancy of at most 1 2 log 2 n + 2.526 redundant bits asymptotically when q > 0. Interestingly, for the case q = 0, the average redundancy of Scheme B can be reduced to 1 2 log 2 n + 0.526 and this is better than the schemes given in [9]. ...
Preprint
Full-text available
We study and propose schemes that map messages onto constant-weight codewords using variable-length prefixes. We provide polynomial-time computable formulas that estimate the average number of redundant bits incurred by our schemes. In addition to the exact formulas, we also perform an asymptotic analysis and demonstrate that our scheme uses $\frac12 \log n+O(1)$ redundant bits to encode messages into length-$n$ words with weight $(n/2)+{\sf q}$ for constant ${\sf q}$.
... In [2], an attempt to improve Knuth's balancing algorithm was presented based on the distribution of the transmitted prefix index. The basic Knuth scheme uses Manuscript received April 30, 2019; revised November 8, 2019. ...
... x (1) = 11000110, x (2) = 10000110, x (3) = 10100110, x (4) = 10110110, x (5) = x (6) = 10111110, x (7) = 10111000, ...
... (4) Using the RDS approach as described in Theorem 1, the balanced codewords with inversion performed from the left are x (1) , x (3) , and x (3) , while balanced codewords from the right are x (2) , x (4) , and x (8) , as presented in (4). This is an efficient way of finding inversion points from both left and right directions; this approach presents a linear complexity of 2 operation digits. ...
Article
Full-text available
A simple scheme was proposed by Knuth to generate balanced codewords from a random binary information sequence. However, this method presents a redundancy which is twice as that of the full sets of balanced codewords, that is the minimal achievable redundancy. The gap between the Knuth's algorithm-generated redundancy and the minimal one is significantly considerable and can be reduced. This paper attempts to achieve this goal through a method based on information sequence candidates.  Index Terms-Balanced code, inversion point, redundancy, running digital sum (RDS), running digital sum from left (RDSL), running digital sum from right (RDSR), information sequence candidates.
... This approach gives a novel non-recursive efficient codes design method which makes the cited codes less redundant than other code designs. Moreover, the Knuth's parallel decoding scheme has been also used by Weber and Immink [29] and Swart and Weber [18] to convey extra auxiliary data by exploiting the freedom degree to select from more than one possible balancing indexes of a given information word. Pelusi et al. [14] gave a generalization of Knuth's scheme for obtaining efficient m-ary balanced codes with a parallel decoding scheme. ...
... Refer to the complexity of Algorithm 2, note that step at row 2 can be accomplished in space O(k) memory bits and time O(k log k) bit operations by using any of the methods given in [2], [3], [8], [13], [13], [18], [19], [29]. The step in rows 3-14 can be accomplished in space O(r 5 + k) memory bits and time O(r 3 ) bit operations (see [28]). ...
... The idea to exploit the degree of freedom to select between more than one possible balancing encoding of a given information word, was proposed by Weber and Immink [29], Swart and Weber [18], Pelusi et al. [14] and Paluncic and Maharaj [12]. Auxiliary data can be used to reduce the redundancy of Knuth's simple balancing method. ...
Article
Full-text available
The code design problem of non-recursive second-Order Spectral Null (2-OSN) codes is to convert balanced information words into 2-OSN words employing the minimum possible redundancy. Let k be the balanced information word length. If k∈2IN then the 2-OSN coding scheme has length n = k +r, with 2-OSN redundancy r∈2IN and n∈4IN. Here, we use a scheme with r = 2 log k + Θ(log log k). The challenge is to reduce redundancy even further for any given k. The idea is to exploit the degree of freedom to select from more than one possible 2-OSN encoding of a given balanced information word. To reduce redundancy, empirical results suggest that extra information δk = 0:5 log k + Θ(log log k) is obtained. Thus, the proposed approach would give a smaller redundancy r’ = 1:5 log k + Θ(log log k) less than r = 2 log k + Θ(log log k).
... In other words, by inverting a first segment of symbols any word x of even length can be balanced. Note that the balancing index is not unique [21]. We assume here that the encoder selects the smallest balancing index from the set of balancing indexes. ...
... We assume here that the encoder selects the smallest balancing index from the set of balancing indexes. The distribution of the number of symbol inversions, , for obtaining the balanced word in Knuth's scheme, denoted by P r 2 ( ), 1 ≤ ≤ n (1 ≤ j ≤ n/2), has been computed by Weber and Immink [21] (assuming equiprobable source words) ...
Article
Full-text available
We present and analyze a new construction of bipolar balanced codes where each codeword contains equally many -1’s and +1’s. The new code is minimally modified as the number of symbol changes made to the source word for translating it into a balanced codeword is as small as possible. The balanced codes feature low redundancy and time complexity. Large look-up tables are avoided.
... In other words, by inverting a first segment of symbols any word x of even length can be balanced. Note that the balancing index is not unique [21]. We assume here that the encoder selects the smallest balancing index from the set of balancing indexes. ...
... We assume here that the encoder selects the smallest balancing index from the set of balancing indexes. The distribution of the number of symbol inversions, , for obtaining the balanced word in Knuth's scheme, denoted by P r 2 ( ), 1 ≤ ≤ n (1 ≤ j ≤ n/2), has been computed by Weber and Immink [21] (assuming equiprobable source words) ...
Preprint
Full-text available
We present and analyze a new systematic construction of bipolar balanced codes where each code word contains equally many −1's and +1's. The new code is minimally modified as the number of symbol changes made to the source word for translating it into a balanced code word is as small as possible. The balanced codes feature low redundancy and time complexity. Large look-up tables are avoided.
... However, the disadvantages of these methods, which have limited applicability, are the high redundancy and complexity. For example, the redundancy of a full set of balanced codewords is O(log m), where m is the number of user bits [48]. 1. INTRODUCTION (iii) A promising decoding technique with asymptotic zero redundancy as the codeword length increases is proposed in [49], where it is shown that decoders using the Pearson distance have immunity to offset and/or gain mismatch. ...
... In [9], two schemes were described to improve the redundancy of Knuth's algorithm. The first one used the distribution of the prefix index; knowing that the balancing point may be not unique given an information word, it has been proven that this distribution for equiprobable words is not uniform and presents a redundancy slightly less than that of Knuth's scheme. ...
Article
Full-text available
In this paper, the construction of binary balanced codes is revisited. Binary balanced codes refer to sets of bipolar codewords where the number of "1"s in each codeword equals that of "0"s. The first algorithm for balancing codes was proposed by Knuth in 1986; however, its redundancy is almost two times larger than that of the full set of balanced codewords. We will present an efficient and simple construction with a redundancy approaching the minimal achievable one
Article
An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. In this article, we investigate codes that correct either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two linear-time encoders. One corrects a single edit with $\lceil {\log \text {n}}\rceil+\text {O}(\log \log \text {n})$ redundancy bits, while the other corrects a single indel with $\lceil {\log \text {n}}\rceil+2$ redundant bits. These two encoders are order-optimal . The former encoder is the first known order-optimal encoder that corrects a single edit, while the latter encoder (that corrects a single indel) reduces the redundancy of the best known encoder of Tenengolts (1984) by at least four bits. Over the DNA alphabet, we impose an additional constraint: the $\mathtt {GC}$ -balanced constraint and require that exactly half of the symbols of any DNA codeword to be either $\mathtt {C}$ or $\mathtt {G}$ . In particular, via a modification of Knuth’s balancing technique, we provide a linear-time map that translates binary messages into $\mathtt {GC}$ -balanced codewords and the resulting codebook is able to correct a single indel or a single edit. These are the first known constructions of $\mathtt {GC}$ -balanced codes that correct a single indel or a single edit.
Article
In many channels, the transmitted signals face not only noise but offset mismatch as well. In the prior art, maximum likelihood (ML) decision criteria have already been developed for noisy channels suffering from signal-independent offset. In this paper, such an ML criterion is considered for the case of binary signals suffering from Gaussian noise and signal-dependent offset. The signal dependency of the offset signifies that it may differ for distinct signal levels, i.e., the offset experienced by the zeroes in a transmitted codeword is not necessarily the same as the offset for the ones. Besides the ML criterion itself, an option to reduce the complexity is also considered. Further, a brief performance analysis is provided, confirming the superiority of the newly developed ML decoder over classical decoders based on the Euclidean or Pearson distances.
Book
Preface to the Second Edition. About five years after the publication of the first edition, it was felt that an update of this text would be inescapable as so many relevant publications, including patents and survey papers, have been published. The author's principal aim in writing the second edition is to add the newly published coding methods, and discuss them in the context of the prior art. As a result about 150 new references, including many patents and patent applications, most of them less than five years old, have been added to the former list of references. Fortunately, the US Patent Office now follows the European Patent Office in publishing a patent application eighteen months after its first filing, and this policy clearly adds to the rapid access to this important part of the technical literature. I am grateful to many readers who have helped me to correct (clerical) errors in the first edition and also to those who brought new and exciting material to my attention. I have tried to correct every error that I found or that was brought to my attention by attentive readers, and seriously tried to avoid introducing new errors in the Second Edition. China is becoming a major player in the art of constructing, designing, and basic research of electronic storage systems. A Chinese translation of the first edition was published in early 2004. The author is indebted to Prof. Xu, Tsinghua University, Beijing, for taking the initiative for this Chinese version, and also to Mr. Zhijun Lei, Tsinghua University, for undertaking the arduous task of translating this book from English to Chinese. Clearly, this translation makes it possible that a billion more people will now have access to it. Kees A. Schouhamer Immink, Rotterdam, November 2004
Article
Coding schemes in which each codeword contains equally many zeros and ones are constructed in such a way that they can be efficiently encoded and decoded.
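The prefix-inversion idea behind these coding schemes can be illustrated with a minimal Python sketch. It assumes the word is a list of 0/1 values of even length; the function names `knuth_balance` and `knuth_restore` are hypothetical, and the encoding of the index `k` into a balanced prefix, which a complete scheme also needs, is left out.

```python
def knuth_balance(word):
    """Return (k, balanced), where inverting the first k bits balances `word`."""
    n = len(word)
    if n % 2:
        raise ValueError("word length must be even")
    ones = sum(word)  # number of ones after inverting the first 0 bits
    for k in range(n + 1):
        if 2 * ones == n:
            return k, [1 - b for b in word[:k]] + list(word[k:])
        if k < n:
            # Inverting bit k changes the count of ones by exactly +1 or -1,
            # so the count must pass through n/2 somewhere between k=0 and k=n.
            ones += 1 if word[k] == 0 else -1
    raise AssertionError("unreachable for even-length words")


def knuth_restore(k, balanced):
    """Undo the balancing step by inverting the first k bits again."""
    return [1 - b for b in balanced[:k]] + list(balanced[k:])
```

For the word `[1, 1, 1, 1, 0, 1]`, the sketch inverts the first two bits and returns `[0, 0, 1, 1, 0, 1]`, which contains three ones and three zeros.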
Article
A balanced code with $r$ check bits and $k$ information bits is a binary code of length $k+r$ and cardinality $2^k$ such that each codeword is balanced; that is, it has [(k+r)/2] 1's and [(k+r)/2] 0's. This paper contains new methods to construct efficient balanced codes. To design a balanced code, an information word with a low number of 1's or 0's is compressed and then balanced using the saved space. On the other hand, an information word having almost the same number of 1's and 0's is encoded using the single maps defined by Knuth's (1986) complementation method. Three different constructions are presented. Balanced codes with $r$ check bits and $k$ information bits with $k \le 2^{r+1}-2$, $k \le 3\times 2^{r}-8$, and $k \le 5\times 2^{r}-10r+c(r)$, $c(r)\in\{-15, -10, -5, 0, +5\}$, are given, improving the constructions found in the literature. In some cases, the first two constructions have a parallel coding scheme.
Article
In a balanced code each codeword contains equally many 1's and 0's. Parallel decoding balanced codes with $2^r$ (or $2^r-1$) information bits are presented, where $r$ is the number of check bits. The $2^r-r-1$ construction given by D.E. Knuth (ibid., vol. 32, no. 1, pp. 51-53, 1986) is improved. The new codes are shown to be optimal when Knuth's complementation method is used.
Article
For $n > 0$, $d \ge 0$, $n \equiv d \pmod 2$, let $K(n,d)$ denote the minimal cardinality of a family $V$ of $\pm 1$ vectors of dimension $n$, such that for any $\pm 1$ vector $w$ of dimension $n$ there is a $v \in V$ such that $|v \cdot w| \le d$, where $v \cdot w$ is the usual scalar product of $v$ and $w$. A generalization of a simple construction due to D.E. Knuth (1986) shows that $K(n,d) \le \lceil n/(d+1) \rceil$. A linear algebra proof is given here that this construction is optimal, so that $K(n,d) = \lceil n/(d+1) \rceil$ for all $n \equiv d \pmod 2$. This construction and its extensions have applications to communication theory, especially to the construction of signal sets for optical data links.
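The upper bound on $K(n,d)$ can be checked by brute force for small parameters. The Python sketch below is only an illustration: it assumes one natural generalization of Knuth's construction, namely the family of vectors whose first $j(d+1)$ entries are $-1$ and whose remaining entries are $+1$, for $j = 0, 1, \ldots, \lceil n/(d+1) \rceil - 1$. Whether this matches the exact family used in the cited paper is an assumption, and the function names are hypothetical.

```python
from itertools import product


def balancing_family(n, d):
    """Family of ceil(n/(d+1)) vectors; vector j has its first j*(d+1) entries set to -1."""
    step = d + 1
    count = -(-n // step)  # integer ceiling of n / (d + 1)
    return [[-1] * (j * step) + [1] * (n - j * step) for j in range(count)]


def is_balancing(family, n, d):
    """Exhaustively check that every +-1 vector w has some v in family with |v . w| <= d."""
    return all(
        any(abs(sum(a * b for a, b in zip(v, w))) <= d for v in family)
        for w in product((-1, 1), repeat=n)
    )
```

The exhaustive check enumerates all $2^n$ vectors, so it is only practical for small $n$; it illustrates the bound and does not replace the proof.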
Enumerative Combinatorics