Knuth's Balanced Codes Revisited

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 4, APRIL 2010 1673
Jos H. Weber, Senior Member, IEEE, and Kees A. Schouhamer Immink, Fellow, IEEE
Abstract—In 1986, Don Knuth published a very simple algorithm
for constructing sets of bipolar codewords with equal numbers
of “1”s and “−1”s, called balanced codes. Knuth’s algorithm is
well suited for use with large codewords. The redundancy of
Knuth’s balanced codes is a factor of two larger than that of a
code comprising the full set of balanced codewords. In this
paper, we present results of our attempts to improve the
performance of Knuth’s balanced codes.
Index Terms—Balanced code, channel capacity, constrained
code, magnetic recording, optical recording.
I. INTRODUCTION
SETS of bipolar codewords that have equal numbers of “1”s
and “−1”s are usually called balanced codes. Such codes
have found application in cable transmission and in optical and
magnetic recording. A survey of properties and methods for
constructing balanced codes can be found in [1]. A simple encoding
technique for generating balanced codewords, which is capable
of handling (very) large blocks, was described by Knuth [2] in
1986.
Knuth’s algorithm is extremely simple. An $m$-bit user word,
$m$ even, consisting of bipolar symbols valued $\pm 1$ is forwarded
to the encoder. The encoder inverts the first $k$ bits of the user
word, where $k$ is chosen in such a way that the modified word
has equal numbers of “1”s and “−1”s. Knuth showed that such
an index $k$ can always be found. The index $k$ is represented
by a balanced word of length $p$. The $p$-bit prefix word followed
by the modified $m$-bit user word are both transmitted, so
that the rate of the code is $m/(m+p)$. The receiver can easily
undo the inversion of the first $k$ bits received once $k$ is computed
from the prefix. Neither encoder nor decoder requires large
look-up tables, and Knuth’s algorithm is therefore very attractive
for constructing long balanced codewords. Modifications of
the generic scheme are discussed in Knuth [2], Alon et al. [3],
Al-Bassam and Bose [4], and Tallini, Capocelli and Bose [5].
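The encoding and decoding steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `knuth_balance` and `knuth_restore` are hypothetical, the word is modeled as a list of +1/−1 integers, and the mapping of the index onto a balanced prefix word is left out (the index is passed around as a plain integer).

```python
from typing import List, Tuple


def knuth_balance(word: List[int]) -> Tuple[int, List[int]]:
    """Return (k, balanced), where k is the smallest index in 1..m such that
    inverting the first k symbols of `word` gives zero disparity."""
    m = len(word)
    if m % 2 != 0:
        raise ValueError("word length must be even")
    if any(s not in (-1, 1) for s in word):
        raise ValueError("symbols must be +1 or -1")
    total = sum(word)   # disparity of the whole word
    running = 0         # running digital sum of the first k symbols
    for k in range(1, m + 1):
        running += word[k - 1]
        # Inverting the first k symbols changes the disparity to total - 2*running.
        if total - 2 * running == 0:
            return k, [-s for s in word[:k]] + word[k:]
    raise AssertionError("not reached for even-length words")


def knuth_restore(k: int, balanced: List[int]) -> List[int]:
    """Undo the inversion of the first k symbols."""
    return [-s for s in balanced[:k]] + balanced[k:]


if __name__ == "__main__":
    user_word = [1, 1, 1, -1, 1, 1]          # hypothetical example word
    k, codeword = knuth_balance(user_word)
    print(k, codeword, knuth_restore(k, codeword) == user_word)
```

In a full scheme the integer `k` would still have to be mapped to a balanced prefix of length $p$; that step is omitted here.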
Knuth showed that in his best construction [2], the redundancy,
i.e., the number of redundant symbols $p$, is roughly equal
to
[…] (1)
Manuscript received March 23, 2009. Current version published March 17,
2010. This work was supported by grant Theory and Practice of Coding and
Cryptography, Award Number: NRF-CRP2-2007-03. The material in this paper
was presented in part at the IEEE International Symposium on Information
Theory, Toronto, ON, Canada, July 2008.
J. H. Weber is with the IRCTR/CWPC, Delft University of Technology, 2628
CD Delft, The Netherlands (e-mail: J.H.Weber@tudelft.nl).
K. A. Schouhamer Immink is with the Nanyang Technological University of
Singapore, Singapore, and with Turing Machines BV, 3016 DK Rotterdam, The
Netherlands (e-mail: immink@turing-machines.com).
Communicated by H.-A. Loeliger, Associate Editor for Coding Techniques.
Color versions of Figures 1–4 in this paper are available online at
http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2010.2040868
The cardinality of a full set of balanced codewords of length
$m$ equals
$$\binom{m}{m/2} \approx \frac{2^m}{\sqrt{\pi m/2}},$$
where the approximation of the central binomial coefficient
follows from Stirling’s formula. Then the redundancy of a full set
of balanced codewords is roughly equal to
$$\frac{1}{2}\log_2 m + 0.326, \quad m \gg 1. \tag{2}$$
We conclude that the redundancy of a balanced code generated
by Knuth’s algorithm is a factor of two larger than that of
a code that uses “full” balanced code sets. Clearly, the loss in
redundancy is the price one has to pay for a simple construction
without look-up tables. There are two features of Knuth’s
construction that could help to explain the difference in
performance, and they offer opportunities for code improvement.
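The gap between the two redundancies can be checked numerically. The sketch below rests on two assumptions: the full-set redundancy is taken as $m - \log_2\binom{m}{m/2}$, and the prefix length of the generic scheme is taken as the smallest even $p$ whose number of balanced words, $\binom{p}{p/2}$, is at least $m$. The function names are hypothetical.

```python
import math


def full_set_redundancy(m: int) -> float:
    """Exact redundancy m - log2(C(m, m/2)) of the full set of balanced words."""
    return m - math.log2(math.comb(m, m // 2))


def full_set_redundancy_approx(m: int) -> float:
    """Stirling-based estimate 0.5*log2(m) + 0.5*log2(pi/2), about 0.5*log2(m) + 0.326."""
    return 0.5 * math.log2(m) + 0.5 * math.log2(math.pi / 2)


def generic_prefix_length(m: int) -> int:
    """Smallest even p such that C(p, p/2) >= m, i.e. enough balanced
    prefixes to label m different index values (an assumption of this sketch)."""
    p = 2
    while math.comb(p, p // 2) < m:
        p += 2
    return p


if __name__ == "__main__":
    for m in (16, 64, 256, 1024):
        print(m,
              round(full_set_redundancy(m), 3),
              round(full_set_redundancy_approx(m), 3),
              generic_prefix_length(m))
```

For growing $m$ the prefix length stays close to twice the full-set redundancy, which is the factor-of-two gap discussed in the text.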
The first feature that may offer a possibility of improving the
code’s performance stems from the fact that Knuth’s algorithm
is greedy, as it takes the very first opportunity for balancing the
codeword [1]: in Knuth’s basic scheme, the first, i.e., the
smallest, index $k$ where balance is reached is selected. In case
there is more than one position where balance can be achieved,
the encoder will thus favor smaller values of the position index.
As a result, we may expect that smaller values of the index are
more probable than larger ones. Then, if the index distribution
is non-uniform, we may conclude that the average length of the
prefix required to transmit the position information is less than
$\log_2 m$. A practical embodiment of a scheme that takes
advantage of this feature is characterized by the fact that the length of
the prefix word is not fixed, but user data dependent. The prefix
assigned to a position with a smaller, more probable, index has
a smaller length than a prefix assigned to a position with a larger
index.
Second, it has been shown by Knuth that there is always a
position where balance can be reached. It can be verified that
there is, for some user words, more than one suitable position
where balance of the word can be realized. It will be shown
later that the number of positions where words can be balanced
lies between 1 and $m/2$. This freedom offers a possibility to
improve the redundancy of Knuth’s basic construction. An
enhanced Knuth’s algorithm may transmit auxiliary data by using
the freedom of selecting from the possible balancing positions.
Assume there are $v$ positions, $1 \le v \le m/2$, where the encoder
can balance the user word; then the encoder can convey an
additional $\log_2 v$ bits. The number $v$ depends on the user word, and
therefore the amount of auxiliary data that can be transmitted is
user data dependent.
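One way to picture this use of the extra freedom is the following minimal sketch. It is a hypothetical embodiment, not the scheme analyzed in the paper: the encoder picks the balancing position whose rank equals the auxiliary value, and the decoder recovers that rank by recomputing the balancing positions of the restored word. All function names are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def balancing_positions(word: List[int]) -> List[int]:
    """All indices k in 1..m for which inverting the first k symbols balances the word."""
    total = sum(word)
    positions, running = [], 0
    for k, s in enumerate(word, start=1):
        running += s
        if 2 * running == total:   # disparity after inversion is total - 2*running
            positions.append(k)
    return positions


def encode_with_choice(word: List[int], choice: int) -> Tuple[int, List[int]]:
    """Balance `word` at the position ranked `choice` among its balancing positions."""
    positions = balancing_positions(word)
    k = positions[choice % len(positions)]
    return k, [-s for s in word[:k]] + word[k:]


def recover_choice(k: int, balanced: List[int]) -> int:
    """Restore the user word and return the rank of k among its balancing positions."""
    word = [-s for s in balanced[:k]] + balanced[k:]
    return balancing_positions(word).index(k)


if __name__ == "__main__":
    word = [1, -1, 1, -1]                      # hypothetical user word
    v = len(balancing_positions(word))
    print(v, math.log2(v))                     # number of positions and auxiliary bits
    k, codeword = encode_with_choice(word, 1)
    print(k, codeword, recover_choice(k, codeword))
```

Because the number of positions depends on the word, a practical scheme would need a way to handle the variable amount of auxiliary data; that is outside this sketch.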
We start, in Section II, with a survey of known properties of Knuth’s coding method. Thereafter, in Section III, we will compute the distribution of the transmitted index in Knuth’s basic scheme. Given the distribution of the index, we will compute the entropy of the index, and evaluate the performance of a suitably modified scheme. In Section IV, we will compute the amount of additional data that can be conveyed in a modification of Knuth’s basic scheme. Section V concludes this article.
0018-9448/$26.00 © 2010 IEEE
II. KNUTH’S BASIC SCHEME
Knuth’s balancing algorithm is based on the idea that there is a simple translation between the set of all $m$-bit bipolar user words, $m$ even, and the set of all $(m+p)$-bit codewords. This conversion is based on the observation that in any block of data, having an even number of binary digits, it is always possible to find a location which defines two digit segments having equal disparity. A balanced block can then be created by the inversion of all the digits within either segment. The translation is achieved by selecting a bit position $k$ within the $m$-bit word that defines two segments, each having the same disparity. A zero-disparity, or balanced, block is now generated by the inversion of the first $k$ bits (or the last $m-k$ bits). The position digit $k$ is encoded in the $p$-bit prefix. The rate of the code is simply $m/(m+p)$.
The proof that there is at least one position, $k$, where balance in any even length user word can be achieved is due to Knuth. Let the user word be $\mathbf{w} = (w_1, \ldots, w_m)$, $w_i \in \{-1, 1\}$, and let $d(\mathbf{w})$ be the sum, or disparity, of the user symbols, or
$$d(\mathbf{w}) = \sum_{i=1}^{m} w_i. \tag{3}$$
Let $d_k(\mathbf{w})$ be the running digital sum of the first $k$, $k \le m$, bits of $\mathbf{w}$, or
$$d_k(\mathbf{w}) = \sum_{i=1}^{k} w_i, \tag{4}$$
and let $\mathbf{w}^{(k)}$ be the word $\mathbf{w}$ with its first $k$ bits inverted. For example, let […]; then we have […] and […]. We let $\sigma_k(\mathbf{w})$ stand for $d(\mathbf{w}^{(k)})$; then the quantity $\sigma_k(\mathbf{w})$ is
$$\sigma_k(\mathbf{w}) = -\sum_{i=1}^{k} w_i + \sum_{i=k+1}^{m} w_i = -2 d_k(\mathbf{w}) + d(\mathbf{w}). \tag{5}$$
It is immediate that $\sigma_0(\mathbf{w}) = d(\mathbf{w})$ (no symbols inverted) and $\sigma_m(\mathbf{w}) = -d(\mathbf{w})$ (all symbols inverted). Since $m$ is even, every $\sigma_k(\mathbf{w})$ is even, and as $\sigma_{k+1}(\mathbf{w}) = \sigma_k(\mathbf{w}) \pm 2$, we may conclude that every word $\mathbf{w}$, $m$ even, can be associated with at least one position $k$ for which $\sigma_k(\mathbf{w}) = 0$, or $\mathbf{w}^{(k)}$ is balanced. This concludes the proof. The value of $k$ is encoded in a balanced word of length $p$, $p$ even.
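The argument in the proof can be traced numerically. The sketch below computes the disparity after inverting the first $k$ symbols, for $k = 0, \ldots, m$, on a hypothetical example word chosen for illustration; the function name `sigma_sequence` is an assumption, not notation from the paper.

```python
from typing import List


def sigma_sequence(word: List[int]) -> List[int]:
    """Disparity of the word after inverting its first k symbols, for k = 0..m.

    Uses sigma_k = d(w) - 2*d_k(w), where d_k is the running digital sum.
    """
    total = sum(word)
    sigmas = [total]          # k = 0: nothing inverted
    running = 0
    for s in word:
        running += s
        sigmas.append(total - 2 * running)
    return sigmas


if __name__ == "__main__":
    word = [1, 1, -1, 1, 1, 1]            # hypothetical user word, m = 6
    sigmas = sigma_sequence(word)
    print(sigmas)                          # [4, 2, 0, 2, 0, -2, -4]
    print([k for k, s in enumerate(sigmas) if s == 0 and k > 0])
```

The sequence starts at the disparity of the word, ends at its negative, and moves in steps of two, so it has to pass through zero; here it does so at $k = 2$ and $k = 4$.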
The maximum codeword length of $\mathbf{w}$ is, since the prefix has an equal number of “1”s and “−1”s, governed by
$$m \le \binom{p}{p/2}. \tag{6}$$
In this article, we follow Knuth’s generic format, where $1 \le k \le m$. Note that in a slightly different format, we may opt for […], where the encoder has the option to invert or not to invert the codeword in case the user word is balanced. For small values of $m$, this will lead to slightly different results, though for very large values of $m$, the differences between the two formats are small.
Knuth described some variations on the general framework. For example, if $m$ and $p$ are both odd, we can use a similar construction. The redundancy of Knuth’s most efficient construction is […].
III. DISTRIBUTION OF THE TRANSMITTED INDEX
The basic Knuth algorithm, as described above, progressively scans the user word till it finds the first suitable position, $k$, where the word can be balanced. In case there is more than one position where balance can be obtained, it is expected that the encoder will favor smaller values of the position index. Then the distribution of the index is not uniform, and, thus, the entropy of the index is less than $\log_2 m$, which opens the door for a more efficient scheme. A practical embodiment of a more efficient scheme would imply that the prefix assigned to a smaller index has a smaller length than a prefix assigned to a larger index. We will compute the entropy of the index sent by the basic Knuth encoder, and in order to do so we first compute the probability distribution of the transmitted index. In our analysis it is assumed that all information words are equiprobable and independent. Let $P(k)$ denote the probability that the transmitted index equals $k$, $1 \le k \le m$.
Theorem 1: The distribution of the transmitted index $k$, $1 \le k \le m$, is given by […].
Proof: Theorem 1 follows from Lemma 3 in the Appendix and the fact that there are $2^m$ (equally probable) sequences of length $m$.
Invoking Stirling’s approximation, we have […]. For […], we have […], and for […], we have […]. Fig. 1 shows two examples of the distribution $P(k)$.
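For small word lengths the index distribution and its entropy can be obtained by exhaustive enumeration, without any closed-form expression. The sketch below assumes equiprobable user words and indices in the range 1 to $m$; the function names are hypothetical, and the approach only scales to short words because it visits all $2^m$ of them.

```python
import itertools
import math
from collections import Counter
from typing import Dict, Sequence


def smallest_balancing_index(word: Sequence[int]) -> int:
    """Smallest k in 1..m such that inverting the first k symbols balances the word."""
    total = sum(word)
    running = 0
    for k, s in enumerate(word, start=1):
        running += s
        if 2 * running == total:
            return k
    raise ValueError("no balancing index found; is the length even?")


def index_distribution(m: int) -> Dict[int, float]:
    """Probability of each transmitted index over all 2**m equiprobable words."""
    counts = Counter(smallest_balancing_index(w)
                     for w in itertools.product((1, -1), repeat=m))
    return {k: counts[k] / 2 ** m for k in range(1, m + 1)}


def entropy_bits(dist: Dict[int, float]) -> float:
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


if __name__ == "__main__":
    for m in (4, 8, 12):
        dist = index_distribution(m)
        print(m, round(entropy_bits(dist), 3), round(math.log2(m), 3))
```

For $m = 4$ the enumeration gives probabilities 0.375, 0.375, 0.125, 0.125 for $k = 1, \ldots, 4$, with an entropy of about 1.81 bits against $\log_2 4 = 2$.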
The entropy of the transmitted index, denoted by $H(m)$, is
$$H(m) = -\sum_{k=1}^{m} P(k) \log_2 P(k). \tag{7}$$
Given the distribution, it is now straightforward to compute the entropy, $H(m)$, of the index. Fig. 2 shows a few results of computations. The diagram shows that $H(m)$ is only slightly less than $\log_2 m$, and we conclude that the above proposed modification of Knuth’s scheme using a variable length prefix can offer only a small improvement in redundancy within the range of codeword length investigated. We conclude that, at least within this range, the proposed variable prefix-length scheme cannot bridge the factor of two in redundancy between the basic Knuth scheme and that of full set balanced codes.
Fig. 1. Distribution of the (normalized) transmitted index for two values of $m$.
Fig. 2. Entropy of the transmitted index as a function of the word length.
IV. ENCODING AUXILIARY DATA
There is at least one position and there are at most $m/2$ positions within an $m$-bit word, $m$ even, where a word can be balanced. The “at least” one position, which makes Knuth’s algorithm possible, was proved by Knuth (see above). The “at most” bound will be shown in the next theorem.
Theorem 2: There are at most $m/2$ positions within an $m$-bit word, $m$ even, where a word can be balanced.
Proof: Let $k$ denote a position where balance can be made. Then, at the neighboring positions $k-1$ or $k+1$ such a balance cannot be made, so that we conclude that the number of positions where balance can be made is less than or equal to $m/2$.
Fig. 3. Distribution of the (normalized) number, $v$, of possible balancing positions for two values of $m$.
Note that the indices of a word with $m/2$ balance positions are either all even or all odd. It can easily be verified that there are three groups of words that can be balanced at $m/2$ positions, namely
• the words consisting of the cascade of the di-bits $(1, -1)$ or $(-1, 1)$,
• the words beginning with a $1$ followed by $m/2 - 1$ di-bits $(1, -1)$ or $(-1, 1)$, followed by a $1$, and
• the inverted words of the previous case.
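The bound of Theorem 2 and the words that attain it can be checked by brute force for short words. The sketch below counts, for every word of length $m$, the number of balancing positions in the range 1 to $m$ and tabulates how often each count occurs; the function names are hypothetical and the enumeration is only feasible for small $m$.

```python
import itertools
from collections import Counter
from typing import Sequence


def count_balancing_positions(word: Sequence[int]) -> int:
    """Number of indices k in 1..m where inverting the first k symbols balances the word."""
    total = sum(word)
    running, count = 0, 0
    for s in word:
        running += s
        if 2 * running == total:
            count += 1
    return count


def position_count_histogram(m: int) -> Counter:
    """Map v -> number of length-m bipolar words with exactly v balancing positions."""
    return Counter(count_balancing_positions(w)
                   for w in itertools.product((1, -1), repeat=m))


if __name__ == "__main__":
    for m in (4, 6, 8):
        hist = position_count_histogram(m)
        print(m, dict(sorted(hist.items())), max(hist) <= m // 2)
```

For $m = 4$, eight words have one balancing position and eight have two; for $m = 6$, sixteen words reach the maximum of three positions, matching the three groups listed above.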
Since, on average, the encoder has the degree of freedom of selecting from more than one balance position, it offers the encoder the possibility to transmit auxiliary data. Assume there are $v$ positions, $1 \le v \le m/2$, where the encoder can balance the user word; then the encoder can convey an additional $\log_2 v$ bits. The number $v$ depends on the user word at hand, and therefore the amount of auxiliary data that can be transmitted is user data dependent. Let $Q(v)$ denote the probability that the encoder may choose between $v$, $1 \le v \le m/2$, possible positions where balancing is possible.
Theorem 3: The distribution of the number of positions, $v$, where an $m$-bit word, $m$ even, can be balanced is given by
[…] (8)
Proof: Theorem 3 follows from Lemma 6 in the Appendix and the fact that there are $2^m$ (equally probable) sequences of length $m$.
Fig. 3 shows two examples of the distribution $Q(v)$. The average amount of information, $H_a(m)$, that can be conveyed via the choice in the position data is
$$H_a(m) = \sum_{v=1}^{m/2} Q(v) \log_2 v. \tag{9}$$
Results of computations are shown in Fig. 4. We can recursively compute $Q(v)$ by invoking […]. For large $m$ and […], we have […], where […]. We approximate […], so that […]. Now, for large $m$, we can approximate $H_a(m)$ by
[…] (10)
[…] (11)
[…] (12)
where $\gamma$ is Euler’s constant. We conclude that the average amount of information that can be conveyed by exploiting the choice of index compensates for the loss in rate between codes based on Knuth’s algorithm and codes based on full balanced codeword sets.
Fig. 4. The average amount of information, $H_a(m)$, that can be conveyed via the choice in the index as a function of $m$.
V. CONCLUSION
We have investigated some characteristics and possible improvements of Knuth’s algorithm for constructing bipolar codewords with equal numbers of “1”s and “−1”s. An $(m+p)$-bit codeword is obtained by a small modification of the $m$-bit user word and by appending a fixed-length $p$-bit prefix. The $p$-bit prefix represents the position index within the codeword where the modification has been made.
We have derived the distribution of the index (assuming equiprobable user words), and have computed the entropy of the transmitted index. Our computations show that a modification of Knuth’s generic scheme using a variable length prefix of the position index will only offer a small improvement in redundancy.
The transmitter can, in general, choose from a plurality of indices, so that the transmitter can transmit additional information. The number of possible indices depends on the given user word, so that the amount of extra information that can be transmitted is data dependent. We have derived the distribution of the number of positions where a word can be balanced. We have computed the average information that can be conveyed by using the freedom of choosing from multiple indices. The average amount of information can, for large user word length, $m$, be approximated by […]. This compensates for the loss in code rate between codes based on Knuth’s algorithm and codes based on full balanced codeword sets.
APPENDIX
In this Appendix, we give combinatorial proofs of Theorems 1 and 3. We first review some results on Dyck words and then derive lemmas leading to the proofs of the theorems. We also refer the reader to the On-Line Encyclopedia of Integer Sequences, entries A33820 and A112326.
A Dyck word of length $2n$ is a balanced bipolar sequence of length $2n$ such that no initial segment has more ’1’s than ’−1’s [6], or in other words, $\mathbf{x}$ is a Dyck word if the running digital sum $d_k(\mathbf{x}) \le 0$ for all $k$. The number of Dyck words of length $2n$ is equal to
$$C_n = \frac{1}{n+1}\binom{2n}{n}, \tag{13}$$
which is the $n$th Catalan number [6]. For example, there are $C_2 = 2$ Dyck words of length 4 and $C_3 = 5$ Dyck words of length 6.
Let $\mathcal{B}(m)$ denote the set of all balanced sequences of length $m$ without internal balancing positions, i.e., there are no balancing positions $k$ with $0 < k < m$. Define $b(m) = |\mathcal{B}(m)|$.
Note that a sequence is in $\mathcal{B}(m)$ if and only if it has the format $(-1, \mathbf{x}, 1)$ or its inverse, where $\mathbf{x}$ is a Dyck word of length $m - 2$. Hence, for all even $m \ge 2$,
$$b(m) = 2 C_{m/2 - 1}. \tag{14}$$
For example, […], which is indeed the result provided by (14).
Let $\mathcal{T}(k, m)$ denote the set of bipolar sequences of even length $m$ for which the smallest balancing index is $k$. Define $t(k, m) = |\mathcal{T}(k, m)|$. We will derive an explicit expression for $t(k, m)$ (in Lemma 3), from which Theorem 1 immediately follows.
Lemma 1: For all […], it holds that
[…] (15)
Proof: Let […] with […] of length […]. We define a mapping from […] to […] by […], where […] is the inverse of […], i.e., […] is the cyclic shift of […] with an inversion of the last bit of […]. The lemma follows from the observation that […] is a bijection.
Lemma 2: For all […], it holds that
[…] (16)
Proof: Let […] denote the set of all bipolar sequences […] of length $m$, where […] and […] is balanced. Let […] with […] of length […]. We define a mapping from […] to […] by […], where […] is the symbol-wise inverse of […]. Since […] is a bijection,
[…] (17)
and the lemma follows using (14).
Lemma 3: For all […], it holds that
[…] (18)
Proof: The first equality follows from Lemma 1. Suppose that the second equality holds for […]. From Lemma 2,
[…] (19)
and thus the second equality also holds for […]. Since the second equality holds for […] because of (14), the result follows by induction.
Let $\mathcal{N}(v, m)$ denote the set of bipolar sequences of even length $m$ which can be balanced in $v$ positions $k$, […]. Define $n(v, m) = |\mathcal{N}(v, m)|$. We will derive an explicit expression for $n(v, m)$ (in Lemma 6), from which Theorem 3 immediately follows. Any sequence […] with […] balancing positions can be uniquely decomposed as […], where […] is of length […], with […] and […]. Note that […] is in […] for all […] and that […] is in […]. From these observations, we can easily derive the recursive relation
[…] (20)
for all […]. Further, we have, for all […], the trivial equality
[…] (21)
Lemma 4: For all […] and […] satisfying […], it holds that
[…] (22)
Proof: Any bipolar sequence of length […] containing […] ’1’s can be uniquely written as […], where […] is a Dyck word of length […], with […], and […] is a bipolar sequence of length […] containing […] ’1’s.
Using (13) for Dyck word enumeration, a simple counting argument gives the stated result.
Lemma 5: For all […], it holds that
[…] (23)
Proof: Any bipolar sequence of length […] having more than […] ’1’s can be uniquely written as […], where […] is of length […], with […], and […] is of length […] and has […] ’1’s. Any bipolar sequence of length […] containing less than […] ’1’s can be uniquely written as […], where […] is of length […], with […], and […] is of length […] and has […] ’1’s. Hence
[…] (24)
which concludes the proof.
Lemma 6: For all […], it holds that
[…] (25)
Proof: Assuming that the statement holds for all […], we will show that it also holds for […]. For all […], we have
[…] (26)
where the first equality follows from (20), the second from (25) and (14), and the third from Lemma 4 (with […] and […]). Further, we have
[…] (27)
where the first equality follows from (21) (with […]), the second from (26), and the third from Lemma 5 (with […]). Hence, if the statement in the lemma holds for all […], then it holds for […] as well. Since (21) gives that […], (25) holds for […], and the lemma follows by induction on […].
REFERENCES
[1] K. A. S. Immink, Codes for Mass Data Storage Systems, Second ed. Eindhoven, Netherlands: Shannon Foundation Publishers, 2004.
[2] D. E. Knuth, “Efficient balanced codes,” IEEE Trans. Inf. Theory, vol. IT-32, pp. 51–53, Jan. 1986.
[3] N. Alon, E. E. Bergmann, D. Coppersmith, and A. M. Odlyzko, “Balancing sets of vectors,” IEEE Trans. Inf. Theory, vol. IT-34, pp. 128–130, Jan. 1988.
[4] S. Al-Bassam and B. Bose, “On balanced codes,” IEEE Trans. Inf. Theory, vol. 36, pp. 406–408, Mar. 1990.
[5] L. G. Tallini, R. M. Capocelli, and B. Bose, “Design of some new balanced codes,” IEEE Trans. Inf. Theory, vol. 42, pp. 790–802, May 1996.
[6] R. P. Stanley, Enumerative Combinatorics. New York: Cambridge University Press, 1999, vol. 2.
Jos H. Weber (S’87–M’90–SM’00) was born in Schiedam, The Netherlands, in 1961. He received the M.Sc. (in mathematics, with honors), Ph.D., and MBT (Master of Business Telecommunications) degrees from Delft University of Technology, Delft, The Netherlands, in 1985, 1989, and 1996, respectively. Since 1985, he has been with the Faculty of Electrical Engineering, Mathematics, and Computer Science of Delft University of Technology. Currently, he is an associate professor at the Wireless and Mobile Communications Group. He is the chairman of the WIC (Werkgemeenschap voor Informatie- en Communicatietheorie in de Benelux) and the secretary of the IEEE Benelux Chapter on Information Theory. He was a Visiting Researcher at the University of California at Davis, the University of Johannesburg, South Africa, and the Tokyo Institute of Technology, Japan. His main research interests are in the areas of channel and network coding.
Kees A. Schouhamer Immink (M’81–SM’86–F’90) received the Ph.D. degree from the Eindhoven University of Technology, The Netherlands. He founded and was named President of Turing Machines, Inc., in 1998. He has, since 1994, been an Adjunct Professor at the Institute for Experimental Mathematics, Essen University, Germany, and is affiliated with the Nanyang Technological University of Singapore. He designed coding techniques of a wealth of digital audio and video recording products, such as compact disc, CD-ROM, CD-video, digital compact cassette system, DCC, DVD, video disc recorder, and blu-ray disc. Dr. Immink received a Knighthood in 2000, a personal “Emmy” award in 2004, the 1996 IEEE Masaru Ibuka Consumer Electronics Award, the 1998 IEEE Edison Medal, 1999 AES Gold and Silver Medals, and the 2004 SMPTE Progress Medal. He was named a Fellow of the IEEE, AES, and SMPTE, and was inducted into the Consumer Electronics Hall of Fame, and elected into the Royal Netherlands Academy of Sciences and the US National Academy of Engineering. He served the profession as President of the Audio Engineering Society inc., New York, in 2003.