Knuth published a very simple algorithm for constructing bipolar codewords with equal numbers of +1's and −1's, called balanced codes. In our paper we will present new code constructions that generate balanced runlength-limited sequences using a modification of Knuth's algorithm.

... This requirement is typically accomplished by balancing signal signs in the stream of transmitted (written) codewords. The author in [12] developed a particularly elegant method of achieving balance, which requires the addition of more than log₂ m bits, where m is the code length, and this method was later tailored to RLL codes [13]. The null at DC can be widened by constraining the higher order statistics of line codewords (see [14] and [15] for a frequency domain approach). ...

... The four codewords are shown in Table I. The bits of codeword c_u are c_{u,i}, i ∈ {0, 1}, and a_{u,i} is defined for each c_{u,i} as in (13). We need to prove that (14) yields g(c_u) = u, u ∈ {0, 1, 2, 3}. ...

... where a_{e_i} is defined for each e_i the same way a_i is defined for each c_i in (13). To prevent encoding a vector like e, which is not a C-LOCO codeword, we need to prevent forbidden patterns from appearing while encoding via Algorithm 1. ...

Line codes make it possible to mitigate interference, to prevent short pulses, and to generate streams of bipolar signals with no direct-current (DC) power content through balancing. They find application in magnetic recording (MR) devices, in Flash devices, in optical recording devices, and in some computer standards. This paper introduces a new family of fixed-length, binary constrained codes, named lexicographically-ordered constrained codes (LOCO codes), for bipolar non-return-to-zero signaling. LOCO codes are capacity-achieving, the lexicographic indexing enables simple, practical encoding and decoding, and this simplicity is demonstrated through analysis of circuit complexity. LOCO codes are easy to balance, and their inherent symmetry minimizes the rate loss with respect to unbalanced codes having the same constraints. Furthermore, LOCO codes that forbid certain patterns can be used to alleviate inter-symbol interference in MR systems and inter-cell interference in Flash systems. Numerical results demonstrate a gain of up to 10% in rate achieved by LOCO codes with respect to other practical constrained codes, including run-length-limited codes, designed for the same purpose. Simulation results suggest that it is possible to achieve a channel density gain of about 20% in MR systems by using a LOCO code to encode only the parity bits, limiting the rate loss, of a low-density parity-check code before writing.

... This requirement is typically accomplished by balancing signal signs in the stream of transmitted (written) codewords. The author in [10] developed a particularly elegant method of achieving balance, which requires the addition of more than log₂ m bits, where m is the code length, and this method was later tailored to RLL codes [11]. The null at DC can be widened by constraining the higher order statistics of line codewords (see [12] and [13] for a frequency domain approach). ...

... Substituting (11) in (5) and (6) gives: ...

... As far as we know, balancing other constrained codes in the literature always incurs a notable rate loss, even asymptotically, with respect to the unbalanced codes [10], [14], [22], which is not the case for LOCO codes. For example, the balancing penalty in [10] is an added redundancy of more than log₂ m (see also [11]), which is a costly penalty, especially for large m. Moreover, in order to reduce the rate loss due to balancing, the authors in [22] adopt large code lengths, which is not needed for LOCO codes. ...

Line codes make it possible to mitigate interference, to prevent short pulses, and to generate streams of bipolar signals with no direct-current (DC) power content through balancing. Thus, they find applications in magnetic recording (MR) devices, in Flash devices, in optical recording devices, and in some computer standards. This paper introduces a new family of fixed-length, binary constrained codes, namely, lexicographically-ordered constrained codes (LOCO codes), for bipolar non-return-to-zero signaling. LOCO codes are capacity-achieving, the lexicographic indexing enables simple, practical encoding and decoding, and this simplicity is demonstrated through analysis of circuit complexity. LOCO codes are easy to balance, and their inherent symmetry minimizes the rate loss with respect to unbalanced codes having the same constraints. Furthermore, LOCO codes that forbid certain patterns can be used to alleviate inter-symbol interference in MR systems and inter-cell interference in Flash systems. Experimental results demonstrate a gain of up to 10% in rate achieved by LOCO codes compared with practical run-length-limited codes designed for the same purpose. Simulation results suggest that it is possible to achieve channel density gains of about 20% in MR systems by using a LOCO code to encode only the parity bits of a low-density parity-check code before writing.

... The balancing index is communicated via a (balanced) prefix, allowing the decoder to retrieve the original sequence by inverting the bits beyond the balancing index in the received sequence. In [5], Immink et al. presented an adaptation of this technique in order to balance RLL sequences. As with the original Knuth method, the length of the prefix, and thus the redundancy, grows with the length of the source sequence. ...

... In [1], Ferreira et al. proposed an alternative method for balancing RLL sequences, for which the redundancy does not depend on the length of the source sequence. Hence, for long sequences, this method is more efficient than the method in [5]. The key idea is to communicate Knuth's balancing index by inserting a marker within the RLL sequence rather than using a prefix. ...

... When comparing the methods from [5] and [1], the latter is less redundant and less complex, at the price of an occasional violation of the upper runlength constraint. As argued in [1], such a violation is defensible given technological developments. ...

A well-known method for balancing binary sequences, in the sense of forcing them to have as many zeroes as ones, was proposed by Knuth. It is based on the inversion of all bits beyond a certain balancing index, and communicating this index via a prefix. This principle has also been applied to balance runlength-limited (RLL) sequences. Another Knuth-based approach exploits the insertion of a marker in the RLL sequence causing a deliberate runlength violation at the position of the balancing index. This marker method has an advantage over the prefix method, since its redundancy does not grow with the length of the source blocks. In this paper, the markers are optimized with respect to their length and the severeness of the runlength violation, for possible application in future (optical) recording systems.
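The inversion principle described above is easy to demonstrate. The following Python sketch (our own illustration, not code from any of the cited papers) finds a balancing index for an even-length word: complementing one more bit changes the weight by exactly ±1, and as k runs from 0 to n the weight moves from w to n − w, so an intermediate-value argument guarantees that a balancing index exists.

```python
def knuth_balance(bits):
    """Find k such that complementing the first k bits balances the word.

    Complementing one more bit changes the weight by exactly 1; since the
    weight goes from w (at k = 0) to n - w (at k = n), some k hits n/2.
    """
    n = len(bits)
    assert n % 2 == 0, "Knuth's scheme balances even-length words"
    for k in range(n + 1):
        candidate = [1 - b for b in bits[:k]] + bits[k:]
        if sum(candidate) == n // 2:
            return k, candidate
    raise AssertionError("unreachable: a balancing index always exists")

# weight 5 out of 6; complementing the first k = 2 bits brings it to 3
k, word = knuth_balance([1, 1, 1, 1, 0, 1])
```

A complete encoder would additionally communicate k in a (balanced) prefix, which is exactly where the prefix and marker methods discussed above differ.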

... It was shown that this scheme performs better than the original Knuth schemes in terms of low-frequency suppression, with similar (and sometimes lower) redundancy. In [7], a scheme is proposed that applies the Knuth balancing approach to RLL sequences, adding a balanced prefix to encode the inversion point, as well as an interfix of length d + 1. The total overhead introduced by this scheme is relatively large. ...

... As an example, a d = 1, k = 3 code with a redundancy of 20 encodes only 71 distinct inversion points. See Tables III, IV, and V in [7] for more examples. ...

... The overhead introduced by the k-violating marker is less than the overhead required to encode the inversion point with a separate (d, k)-compliant prefix, provided the RLL source word length m is large enough, as per [7]. ...

An RDS-minimizing, multi-mode modulation coding scheme, using maximum run-length violating markers and based on the Knuth balancing approach, is applied to run-length limited sequences. Simulations are used to measure spectra and DC suppression performance. A comparison to EFM is included.

... There is very little literature on constructing such long block codes. We now present a simple method, requiring fewer overhead bits and also less implementation complexity than the more direct application of Knuth's algorithm in [5]. ...

... Results: With the above marker construction, the overhead is 2k + 2d + 4 bits per codeword. We compare this to the previous construction, presented in [5], where Knuth's algorithm is directly applied to constrained sequences, which requires a fixed length interfix of d + 1 bits as well as a prefix, of length dependent on (d, k) and n, as tabled in [5]. Note that the overhead of the marker construction does not depend on n. ...

In this reported work, Knuth's balancing scheme, which was originally developed for unconstrained binary codewords, is adapted. Presented is a simple method to balance the NRZ runlength-constrained block codes corresponding to (d, k)-constrained NRZI sequences. A short marker violating the maximum-runlength or k constraint is used to indicate the balancing point for Knuth's inversion. The marker requires fewer overhead bits and less implementation complexity than indexing the balancing point's address by mapping it onto a (d, k) or runlength-constrained prefix, as when applying Knuth's original scheme more directly. The new code construction may be attractive for future magnetic and especially optical recording schemes. In fact, the current optical storage media, such as the CD, DVD, and Blu-ray Disc, all attempt to achieve some suppression of low-frequency components of the constrained codes by exploiting a limited degree of freedom within the set of candidate (d, k) words.
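As a point of reference for the marker construction above, the (d, k) constraint itself can be checked in a few lines: in NRZI terms, every run of 0s strictly between two consecutive 1s must have length at least d and at most k. This sketch, with function names of our own choosing, is illustrative only:

```python
def satisfies_dk(bits, d, k):
    """Check the (d, k) runlength constraint on an NRZI sequence:
    every run of 0s between two consecutive 1s has length in [d, k]."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    for a, b in zip(ones, ones[1:]):
        run = b - a - 1          # number of 0s between consecutive 1s
        if not (d <= run <= k):
            return False
    return True

# a (1, 3)-compliant sequence: the runs of 0s between 1s are 1, 2 and 3 long
assert satisfies_dk([1, 0, 1, 0, 0, 1, 0, 0, 0, 1], 1, 3)
# a run of k + 1 = 4 zeros is exactly the kind of detectable k-violation
# that the marker schemes above exploit
assert not satisfies_dk([1, 0, 0, 0, 0, 1], 1, 3)
```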

... In the serial or sequential scheme, the prefix comprises the original sequence's weight; then, the complementing is performed on the overall sequence (prefix and original sequence) up to the balancing point. Improvements and embellishments of Knuth's binary methods can be found in [5]-[10]. ...

... The values of B^q_k correspond to the central q-nomial coefficients of (1 + x + x^2 + x^3 + ··· + x^{q−1})^k, where x is an unknown variable. Table I shows the cardinalities of B^q_k for k ∈ [3, 10] with q ∈ [2, 6]. Some of these coefficients can be acquired from [20]. ...

A simplified and efficient algorithm with parallel decoding capability was presented by Knuth for balancing binary sequences. This study proposes a generalization of this algorithm to q-ary sequences. The new approach retains the simplicity and parallel decoding of Knuth's method for q-ary balanced codes. Furthermore, it has a fixed redundancy for short and long sequences that equals log_q k, where k is the sequence length, and no lookup tables are required.

... , x v may lead to a violation of the runlength constraint near the position v. In order to solve this problem we follow the teachings of [22]. We assume that a properly chosen interfix (buffering) bit is stuffed (inserted) at the balancing position v + 1 so that balancing is obtained without violating the runlength constraint. ...

We describe properties and constructions of constraint-based codes for DNA-based data storage which account for the maximum repetition length and AT balance. We present algorithms for computing the number of sequences satisfying a maximum repetition length and an AT-balance constraint. We present efficient routines for translating binary runlength-limited and/or balanced strings into DNA strands. We show that the implementation of AT-balanced codes is straightforwardly accomplished with binary balanced codes. We present codes that account for both the maximum repetition length and AT balance.

... The construction of efficient balanced codes has been extensively studied in [7]- [12], and extensions to non-binary and two-dimensional balanced codes have been considered in [13]- [16]. Codes that combine the balanced property with certain other constraints, such as runlength limitations, have also been addressed in, for example, [17]. ...

Inter-cell interference (ICI) is one of the main obstacles to precise programming (i.e., writing) of a flash memory. In the presence of ICI, the voltage level of a cell might increase unexpectedly if its neighboring cells are programmed to high levels. For q-ary cells, the most severe ICI arises when three consecutive cells are programmed to levels high - low - high, represented as (q-1)0(q-1), resulting in an unintended increase in the level of the middle cell and the possibility of decoding it incorrectly as a nonzero value. ICI-free codes are used to mitigate this phenomenon by preventing the programming of any three consecutive cells as (q-1)0(q-1). In this work, we extend ICI-free codes in two directions. First, we consider binary balanced ICI-free codes which, in addition to forbidding the 101 pattern, require the number of 0 symbols and 1 symbols to be the same. Using combinatorial methods, we determine the asymptotic information rate of these codes and show that the asymptotic rate loss due to the imposition of the balanced property is approximately 2%. Extensions to q-ary cells, for q>2 are also discussed. Next, we consider q-ary ICI-free write-once-memory (WOM) codes that support multiple writes of a WOM while mitigating ICI effects. These codes forbid the appearance of the (q-1)0(q-1) pattern in any codeword used in any writing step. Using properties of two-dimensional constrained codes and generalized WOMs, we characterize the maximum sum-rate of t-write ICI-free WOM codes or, equivalently, the t-write sum-capacity of an ICI-free WOM.
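The rate loss quoted above can be probed by exhaustive enumeration at short lengths. The following brute-force count (our own, purely illustrative, and exponential in n) compares 101-free binary words with and without the balance requirement:

```python
from itertools import product

def count_101_free(n, balanced=False):
    """Count length-n binary words with no 101 substring;
    if balanced, additionally require weight n // 2."""
    count = 0
    for bits in product('01', repeat=n):
        word = ''.join(bits)
        if '101' in word:
            continue
        if balanced and word.count('1') != n // 2:
            continue
        count += 1
    return count

# For n = 4: 12 words avoid the 101 pattern, of which 4 are also balanced.
```

Comparing log2 of the two counts for growing n gives a rough empirical view of the asymptotic rate gap the paper computes combinatorially.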

A novel Knuth-like balancing method for runlength-limited words is presented, which forms the basis of new variable- and fixed-length balanced runlength-limited codes that improve on the code rate as compared to balanced runlength-limited codes based on Knuth's original balancing procedure as developed by Immink et al. While Knuth's original balancing procedure, as incorporated by Immink et al., requires the inversion of each bit one at a time, our balancing procedure only inverts the runs as a whole, one at a time. The advantage of this approach is that the number of possible inversion points, which needs to be encoded by a redundancy-contributing prefix/suffix, is reduced, thereby allowing a better code rate to be achieved. Furthermore, this balancing method also allows for runlength-violating markers which improve, in a number of respects, on the optimal such markers based on Knuth's original balancing method.
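The rate advantage comes from the smaller index space: a length-m word offers m + 1 candidate bitwise inversion points, while run-wise balancing only needs to index one of r + 1 points, where r is the number of runs. A small sketch of the count (our own, not from the paper):

```python
from math import ceil, log2

def runs(bits):
    """Split a binary word into its maximal runs of identical symbols."""
    out, i = [], 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        out.append(bits[i:j])
        i = j
    return out

word = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]   # 5 runs
bitwise_points = len(word) + 1           # 11 candidate inversion points
runwise_points = len(runs(word)) + 1     # 6 candidate inversion points
# prefix cost in bits: ceil(log2(11)) = 4 versus ceil(log2(6)) = 3
```

For RLL words, whose minimum run length is d + 1, the number of runs (and hence the index space) shrinks further as d grows.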

Preface to the Second Edition
About five years after the publication of the first edition, it was felt that an update of this text would be inescapable, as so many relevant publications, including patents and survey papers, have appeared. The author's principal aim in writing the second edition is to add the newly published coding methods and discuss them in the context of the prior art. As a result, about 150 new references, including many patents and patent applications, most of them less than five years old, have been added to the former list of references. Fortunately, the US Patent Office now follows the European Patent Office in publishing a patent application eighteen months after its first filing, and this policy clearly adds to the rapid access to this important part of the technical literature. I am grateful to the many readers who have helped me correct (clerical) errors in the first edition, and also to those who brought new and exciting material to my attention. I have tried to correct every error that I found or that was brought to my attention by attentive readers, and have seriously tried to avoid introducing new errors in the Second Edition.
China is becoming a major player in the construction, design, and basic research of electronic storage systems. A Chinese translation of the first edition was published in early 2004. The author is indebted to Prof. Xu, Tsinghua University, Beijing, for taking the initiative for this Chinese version, and also to Mr. Zhijun Lei, Tsinghua University, for undertaking the arduous task of translating this book from English to Chinese. Clearly, this translation makes it possible that a billion more people will now have access to it.
Kees A. Schouhamer Immink
Rotterdam, November 2004

In 1986, Don Knuth published a very simple algorithm for constructing sets of bipolar codewords with equal numbers of 1s and 0s, called balanced codes. Because look-up tables are absent, Knuth's algorithm is well suited for use with large codewords. The redundancy of Knuth's balanced codes is a factor of two larger than that of a code comprising the full set of balanced codewords. In our paper we will present results of our attempts to improve the performance of Knuth's balanced codes.

In 1986, Don Knuth published a very simple algorithm for constructing sets of bipolar codewords with equal numbers of ones and zeros, called balanced codes. Knuth's algorithm is well suited for use with large codewords. The redundancy of Knuth's balanced codes is a factor of two larger than that of a code comprising the full set of balanced codewords. In this paper, we will present results of our attempts to improve the performance of Knuth's balanced codes.

The prior art construction of sets of balanced codewords by Knuth is attractive for its simplicity and absence of look-up tables, but the redundancy of the balanced codes generated by Knuth's algorithm falls a factor of two short with respect to the minimum required. We present a new construction, which is simple, does not use look-up tables, and is less redundant than Knuth's construction. In the new construction, the user word is modified in the same way as in Knuth's construction, that is by inverting a segment of user symbols. The prefix that indicates which segment has been inverted, however, is encoded in a different, more efficient, way.

We present an enumerative technique for encoding and decoding DC-free runlength-limited sequences. This technique enables the encoding and decoding of sequences approaching the maxentropic performance bounds very closely in terms of the code rate and low-frequency suppression capability. Use of finite-precision floating-point notation to express the weight coefficients results in channel encoders and decoders of moderate complexity. For channel constraints of practical interest, the hardware required for implementing such a quasi-maxentropic coding scheme consists mainly of a ROM of at most 5 kB.
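The enumerative principle behind such encoders can be sketched in a much simpler setting of our own choosing: ranking and unranking fixed-weight (balanced) words lexicographically via binomial weight coefficients. The paper applies the same idea to DC-free RLL sequences, with finite-precision weights; this is only an unconstrained illustration.

```python
from math import comb

def unrank(index, n, w):
    """Return the index-th (lexicographically) length-n binary word of weight w."""
    bits = []
    for pos in range(n):
        # words starting with 0 here keep weight w in the remaining bits
        zeros_first = comb(n - pos - 1, w)
        if index < zeros_first:
            bits.append(0)
        else:
            index -= zeros_first
            bits.append(1)
            w -= 1
    return bits

def rank(bits):
    """Inverse of unrank: the lexicographic index among words of the same weight."""
    n, w, index = len(bits), sum(bits), 0
    for pos, b in enumerate(bits):
        if b == 1:
            index += comb(n - pos - 1, w)
            w -= 1
    return index

# round trip over all C(8, 4) = 70 balanced words of length 8
assert all(rank(unrank(i, 8, 4)) == i for i in range(comb(8, 4)))
```

Encoding maps a data index to a codeword with unrank; decoding recovers the index with rank, with no codebook stored.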

Coding schemes in which each codeword contains equally many zeros and ones are constructed in such a way that they can be efficiently encoded and decoded.

A balanced code with r check bits and k information bits is a binary code of length k + r and cardinality 2^k such that each codeword is balanced; that is, it has (k+r)/2 1's and (k+r)/2 0's. This paper contains new methods to construct efficient balanced codes. To design a balanced code, an information word with a low number of 1's or 0's is compressed and then balanced using the saved space. On the other hand, an information word having almost the same number of 1's and 0's is encoded using the single maps defined by Knuth's (1986) complementation method. Three different constructions are presented. Balanced codes with r check bits and k information bits with k ≤ 2^(r+1) − 2, k ≤ 3·2^r − 8, and k ≤ 5·2^r − 10r + c(r), c(r) ∈ {−15, −10, −5, 0, +5}, are given, improving the constructions found in the literature. In some cases, the first two constructions have a parallel coding scheme.

In a balanced code each codeword contains equally many 1's and 0's. Parallel decoding balanced codes with 2^r (or 2^r − 1) information bits are presented, where r is the number of check bits. The 2^r − r − 1 construction given by D.E. Knuth (ibid., vol. 32, no. 1, pp. 51-53, 1986) is improved. The new codes are shown to be optimal when Knuth's complementation method is used.

For n > 0, d ≥ 0, n ≡ d (mod 2), let K(n, d) denote the minimal cardinality of a family V of ±1 vectors of dimension n, such that for any ±1 vector w of dimension n there is a v ∈ V such that |v · w| ≤ d, where v · w is the usual scalar product of v and w. A generalization of a simple construction due to D.E. Knuth (1986) shows that K(n, d) ≤ ⌈n/(d+1)⌉. A linear algebra proof is given here that this construction is optimal, so that K(n, d) = ⌈n/(d+1)⌉ for all n ≡ d (mod 2). This construction and its extensions have applications to communication theory, especially to the construction of signal sets for optical data links.
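The upper-bound construction generalizes Knuth's prefix inversion: flip successive blocks of d + 1 coordinates, so consecutive family members differ in at most d + 1 positions and the scalar product with any w changes by at most 2(d + 1) per step. An exhaustive check for small n (our own brute-force verification, for illustration only):

```python
from math import ceil

def knuth_family(n, d):
    """ceil(n/(d+1)) vectors: the j-th has its first j*(d+1) entries set to -1."""
    m = ceil(n / (d + 1))
    family = []
    for j in range(m):
        cut = min(j * (d + 1), n)
        family.append([-1] * cut + [1] * (n - cut))
    return family

def covers(n, d):
    """Check every ±1 vector w has some family member v with |v . w| <= d."""
    V = knuth_family(n, d)
    for x in range(2 ** n):
        w = [1 if (x >> i) & 1 else -1 for i in range(n)]
        if min(abs(sum(vi * wi for vi, wi in zip(v, w))) for v in V) > d:
            return False
    return True

assert covers(6, 0) and covers(7, 1) and covers(8, 2)
```

The d = 0 case is exactly Knuth's balancing: some prefix flip makes the word orthogonal to the all-ones vector.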