Traditional schemes for encoding and decoding
runlength-constrained sequences using the enumeration principle require
two sets of weighting coefficients. A new enumeration is presented
requiring only one set of coefficients.

... Severe error propagation resulting from the use of long codewords can be avoided by reversing the conventional hierarchy of outer error correcting code and inner modulation code [6]. Enumerative decoding is done by forming the weighted sum of the symbols of the codeword received [7]. The integer-valued weights used in forming this sum are a function of the channel constraints in force. ...

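... Let $n_S(x_1, \ldots, x_k)$ be the number of elements in $S$ for which the first $k$ coordinates are $(x_1, \ldots, x_k)$. The rank of $x$ can be obtained by using $i_S(x) = \sum_{j=1}^{n} x_j \, n_S(x_1, \ldots, x_{j-1}, 0)$ (7). An alternative of Cover's enumeration scheme can be given by counting the number of elements in $S$ that have a lexicographic index higher than $x$, the inverse rank of $x$ [7]. The inverse rank of $x$ can be obtained by using $\bar{i}_S(x) = \sum_{j=1}^{n} \bar{x}_j \, n_S(x_1, \ldots, x_{j-1}, 1)$ (8), where $\bar{x}_j = 1 - x_j$, the complement of $x_j$. ...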

... The algorithms (7) and (8) implement the decoding operation, i.e., given the constrained sequence $x$, find the corresponding lexicographic index in the set $S$. The inverse rank has the virtue that the same set of weight coefficients can be used for encoding and decoding [7]. We will now consider the inverse rank for enumerative decoding of DCRLL sequences. ...

We present an enumerative technique for encoding and decoding
DC-free runlength-limited sequences. This technique enables the encoding
and decoding of sequences approaching the maxentropic performance bounds
very closely in terms of the code rate and low-frequency suppression
capability. Use of finite-precision floating-point notation to express
the weight coefficients results in channel encoders and decoders of
moderate complexity. For channel constraints of practical interest, the
hardware required for implementing such a quasi-maxentropic coding
scheme consists mainly of a ROM of at most 5 kB.

... This includes all fixed-rate codes with finite input and output block lengths. Two existing near-optimal, fixed-rate encoding techniques are based on enumerative coding [10], [19], [43] and arithmetic coding [31], [58]. Some capacity-achieving variable-rate codes are outlined in [3], [6], [27]. ...

... Chap. 6] for a summary, [10], [43], [19] for more details), which has been shown to achieve very high encoding rates that approach capacity with increasing codeword length n. However, the disadvantages of enumerative coding are bitwise encoding and decoding, and additions and comparisons with pre-stored, n-bit weighting coefficients. ...

Run-Length-Limited (RLL) channels are found in digital recording systems like the Hard Disk Drive (HDD), Compact Disc (CD), and Digital Versatile Disc (DVD). This thesis presents novel encoding algorithms for RLL channels based on a simple technique called bit stuffing. First, two new capacity-achieving variable-rate code constructions are proposed for (d,k) constraints. The variable-rate encoding ideas are then extended to (0,G/I) and other RLL constraints. Since variable-rate codes are of limited practical value, the second half of this thesis focuses on fixed-rate codes. The fixed-rate bit stuff (FRB) algorithm is proposed for the design of simple, high-rate (0,k) codes. The key to achieving high encoding rates with the FRB algorithm lies in a novel, iterative pre-processing of the fixed-length input sequence prior to bit stuffing. Detailed rate analysis for the proposed FRB algorithm is presented, and upper and lower bounds on the asymptotic (in input block length) encoding rate are derived. Several system-level issues of the proposed FRB codes are addressed, and FRB code parameters required to design rate 100/101 and rate 200/201 (0,k) codes are tabulated. Finally, the proposed fixed-rate encoding is extended to (0,G/I) constraints. Ph.D. Committee Chair: McLaughlin, Steven; Committee Member: Barnwell, Thomas; Committee Member: Barry, John; Committee Member: Fekri, Faramarz; Committee Member: Tetali, Prasad
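
The bit-stuffing idea underlying the thesis summarized above can be illustrated with a minimal sketch for a (0,k) constraint. This is the plain textbook form of bit stuffing, not the thesis's capacity-achieving or fixed-rate (FRB) constructions; the function names `stuff` and `unstuff` are hypothetical.

```python
def stuff(bits, k):
    """Variable-rate (0,k) encoder: insert a 1 after every run of k zeros."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 0 else 0
        if run == k:
            out.append(1)   # stuffed bit breaks the zero run
            run = 0
    return out


def unstuff(coded, k):
    """Inverse of stuff: drop the bit that follows every run of k zeros."""
    out, run, i = [], 0, 0
    while i < len(coded):
        b = coded[i]
        out.append(b)
        run = run + 1 if b == 0 else 0
        if run == k:
            i += 1          # skip the stuffed 1
            run = 0
        i += 1
    return out
```

The output length depends on the input data, which is why the plain scheme is variable-rate; the fixed-rate variant described in the abstract needs additional pre-processing that this sketch does not attempt.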

... An alternative of the above enumeration scheme can be given by counting the number of elements of $S$ that have a lexicographic index higher than $x$, the inverse rank of $x$. The inverse rank of $x$ is given by $\bar{i}_S(x) = \sum_{j=1}^{n} \bar{x}_j \, n_S(x_1, \ldots, x_{j-1}, 1)$ (2), where $\bar{x}_j = 1 - x_j$, the complement of $x_j$. In the next section, taken from [16], we will apply the above theory to the enumeration of sequences. We will first give an algorithm for enumerating -constrained sequences for which the length of the leading zero-run is not constrained. ...

... [18] we know that the sequence of the mantissa of will ultimately become (and remain) periodic. That is, there are integers and such that (15) In other words, per-cycle period of length , the number of sequences increases with a fixed factor, which is equal to a power of two. From the above it is immediate that (16) The theory of feedback registers [18] stipulates that the cycle period must be smaller than As this number is huge in the range of parameters of practical interest, we are inclined to believe that Proposition 2 is not of great practical interest. However, results of a computer search, which are listed in Table II, reveal that relatively small cycle periods are surprisingly frequent. ...

A new coding technique is proposed that translates user
information into a constrained sequence using very long codewords. Huge
error propagation resulting from the use of long codewords is avoided by
reversing the conventional hierarchy of the error control code and the
constrained code. The new technique is exemplified by focusing on (d,
k)-constrained codes. A storage-effective enumerative encoding scheme is
proposed for translating user data into long dk sequences and vice
versa. For dk runlength-limited codes, estimates are given of the
relationship between coding efficiency versus encoder and decoder
complexity. We show that for most common d, k values, a code rate of
less than 0.5% below channel capacity can be obtained by using hardware
mainly consisting of a ROM lookup table of size 1 kbyte. For selected
values of d and k, the size of the lookup table is much smaller. The
paper is concluded by an illustrative numerical example of a rate
256/466, (d=2, k=15) code, which provides a serviceable 10% increase in
rate with respect to its traditional rate 1/2, (2, 7) counterpart.
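
The rate figures quoted in this abstract can be checked with a short calculation. The sketch below assumes the standard characteristic equation for a (d,k) constraint with finite k, namely that the capacity is log2 of the largest real root z of sum_{i=d+1}^{k+1} z^(-i) = 1; the function name `dk_capacity` and the bisection approach are illustrative choices, not taken from the paper.

```python
from math import log2


def dk_capacity(d, k, iters=200):
    """Capacity in bits per channel bit of the (d,k) constraint, k finite.

    Finds the root z in [1, 2] of sum_{i=d+1}^{k+1} z**(-i) = 1 by
    bisection (the left-hand side decreases in z) and returns log2(z).
    """
    def f(z):
        return sum(z ** (-i) for i in range(d + 1, k + 2)) - 1.0

    lo, hi = 1.0, 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return log2((lo + hi) / 2)


rate_long = 256 / 466          # long-codeword (d=2, k=15) code
rate_classic = 1 / 2           # traditional rate 1/2, (2,7) code
gain = rate_long / rate_classic - 1
efficiency = rate_long / dk_capacity(2, 15)
```

Here `gain` evaluates to roughly 0.099, which the abstract rounds to 10%, and `efficiency` gives the ratio of the code rate to the computed capacity of the (2,15) constraint.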

... Enumerative coding makes it possible to translate source words into codewords and vice versa by invoking an algorithm rather than performing the translation with a look-up table. In [5], a method is described that requires storage capacity of a bank of n integer coefficients. The algorithm itself is conceptually very simple. ...

Runlength-limited (RLL) codes have found widespread usage in
optical and magnetic recording products. Specifically, the RLL codes EFM
and its successor, EFMPlus, are used in the compact discs (CD) and the
digital versatile discs (DVD), respectively. EFMPlus offers a 6%
increase in storage capacity with respect to EFM. The work reports on
the feasibility and limits of EFM-like codes that offer an even larger
capacity. To this end, we provide an overview of the various limiting
factors, such as runlength constraint, dc-content, and code complexity,
and outline their relative effect on the code rate. In the second part
of the article we show how the performance predicted by the tenets of
information theory can be realized in practice. A worked example of a
code whose rate is 7.5% larger than EFMPlus, namely a rate 256/476,
(d=2, k=15) code, showing a 13 dB attenuation at f<sub>b</sub>=10<sup>-3</sup>,
is given to illustrate the theory.

... Extremely large values of p and q are encountered in the design of high-rate codes, or when the efficiency of the code should be very high. In such cases, methods such as guided scrambling [38], or a promising variant of enumerative encoding ([39], [40], see also [34], Chapter 1) should be considered. Here, the extremely large block length imposes a special system architecture to limit error propagation. ...

Modulation codes such as runlength-limited codes have been widely employed in magnetic and optical data storage systems. We review the main techniques involved in the design and use of these codes: the maximal code rate or capacity, graphical presentations of constraints, encoders and decoders, and code construction methods such as the ACH state-splitting algorithm. We conclude this survey by discussing some recent developments and research trends.

... The construction is designed to preserve the advantages of fixed-length RLL codes such as: a) the translation of source words into codewords can be accomplished with a single look-up table; b) the cascading of codewords can be done with a simple merging rule. Although Immink [5] has presented a new version of the coding-decoding algorithm that uses a single new set of weighting coefficients n_s, he did not also provide a method for determining the value of those integer coefficients. The proposed algorithm could be used for computing the n_s coefficients. ...

The construction procedure for (d,k,l,r)-sequences by traditional methods (based on the enumeration principle) requires two sets of weighting coefficients. Based on a set of parameters and recursive relationships, the proposed algorithm with just one set of weighting coefficients is presented. A new formula to determine the number of the messages permitted on constrained channels is introduced.

Permutation codes, introduced by Slepian (1965), were shown to
perform well on an additive white Gaussian noise (AWGN) channel.
Unfortunately, these codes required large lookup tables, making them
quite complex to implement even though the maximum-likelihood decoder is
very simple. In this correspondence, we present an enumeration scheme
which encodes and decodes permutation codes with low complexity. We
concentrate on the use of permutation codes for constructing high-rate
codes that satisfy runlength-limited constraints. Wolf (1990) showed
that permutation codes can be used for runlength constraints and that
they have rate that asymptotically achieves the capacity of a noiseless,
runlength-limited constrained channel. Wolf, however, gave no efficient
encoders/decoder. Our code construction is enumerative, but unlike other
enumerative codes, has storage requirements that are a function of the
runlength parameters d and k instead of the block length n. In addition,
these codes have error detection and correction capabilities. Finally,
we use this approach to construct (0, G/I) codes whereby all
odd-numbered occurrences of double-adjacent errors are detected. As an
example, a 99.2% efficient, rate 498/528, (0, 6/3) code is presented.

Enumerative coding is an attractive algorithmic procedure for
translating long source words into codewords and vice versa. The usage
of long codewords makes it possible to approach a code rate which is as
close as desired to Shannon's noiseless capacity of the constrained
channel. Enumerative encoding is prone to massive error propagation as a
single bit error could ruin entire decoded words. This contribution
evaluates the effects of error propagation of the enumerative coding of
runlength-limited sequences.

We introduce a new enumerative encoding method for (d,k) codes.
The encoding algorithm, which is based on enumeration of multiset
permutations, is conceptually simpler and computationally less expensive
than other algorithms proposed thus far. We also describe a new
application of enumerative encoding methods for phrase length
distribution shaping of run-length-limited (RLL) sequences. We
demonstrate that by reducing the probability of occurrence of long
phrases in maxentropic RLL sequences, the frequency of patterns that
account for most of the errors in magnetic recording systems can be
decreased.

In this work, we consider the analysis and design of optimal
block-decodable M-ary runlength-limited (RLL) codes. We present two
general construction methods: one based on permutation codes due to
Datta and McLaughlin (1999), and the other, a nonbinary generalization
of the binary enumeration methods of Patrovics and Immink (1996), and Gu
and Fuja (1994). The construction based on permutation codes is simple
and asymptotically (in block length) optimal, while the other
construction is optimal in the sense that the resulting codes have the
highest rate among all block-decodable codes for any block length. In
the process, we shall also extend a result due to Zehavi and Wolf (1988)
on the capacity of binary (d, k) constraints to M-ary channels. Finally,
we present examples of template codes: remarkably low-complexity (M,d,k)
block codes which achieve the optimal rate without the use of
enumeration.

We present an enumerative technique for encoding and decoding
DC-free runlength-limited sequences. We use finite-precision floating
point notation to express the weight coefficients. Code rates very close
to Shannon capacity can be achieved by using hardware mainly consisting
of a ROM of at most 5 kByte.

The storage requirements of conventional enumerative schemes can be reduced by using floating-point arithmetic operations instead of the conventional fixed-point operations. The new enumeration scheme incurs a small coding loss. A simple relationship between storage requirements and coding loss is derived.
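
The trade-off between storage and coding loss described above can be illustrated with a simplified sketch. It assumes a pure d constraint (at least d zeros between consecutive ones, k unbounded) and weights whose mantissa is truncated to p bits, so that each weight can be stored as a p-bit mantissa plus an exponent. The names `truncate`, `weights`, `encode` and `decode` are hypothetical, and the scheme in the cited work covers more general constraints than this example.

```python
def truncate(v, p):
    """Keep only the p most significant bits of the non-negative integer v."""
    shift = max(v.bit_length() - p, 0)
    return (v >> shift) << shift


def weights(n, d, p):
    """w[m], m = 0..n: truncated counts of d-constrained words of length m."""
    w = [1]
    for m in range(1, n + 1):
        back = w[m - d - 1] if m - d - 1 >= 0 else 1
        w.append(truncate(w[m - 1] + back, p))   # round down to p bits
    return w


def encode(index, n, d, p):
    """Map an index in range(w[n]) to a d-constrained word of length n."""
    w = weights(n, d, p)
    assert 0 <= index < w[n]
    bits = []
    for j in range(n):
        wj = w[n - 1 - j]
        if index >= wj:
            bits.append(1)
            index -= wj
        else:
            bits.append(0)
    return bits


def decode(bits, d, p):
    """Weighted sum of the received symbols, using the same weight table."""
    n = len(bits)
    w = weights(n, d, p)
    return sum(w[n - 1 - j] for j, b in enumerate(bits) if b)
```

Because every weight is rounded down, fewer than the full set of constrained words are used, which is the small coding loss; in exchange, each stored weight needs only p mantissa bits rather than a full-length integer.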

We present a new result on the capacity of M-ary runlength-limited constraints and use it to construct asymptotically efficient block codes. The codes are constructed from permutation codes and are scalable with respect to M. The codes are useful in high-rate constrained coding and nonbinary optical recording systems that achieve both high density and backward compatibility to binary recording.

In this paper we consider the analysis and design of optimal block-decodable M-ary runlength-limited (RLL) codes. We present two general construction methods: one based on permutation codes due to Datta and McLaughlin (1999), and the other a nonbinary generalization of the binary enumeration methods of Patrovics and Immink (1996), and Gu and Fuja (1994). The construction based on permutation codes is simple and asymptotically (in block-length) optimal, while the other construction is optimal in the sense that the resulting codes have the highest rate among all block-decodable codes for any block-length. In the process, we also prove a new result on the capacity of (M,d,k) constraints. Finally, we present examples of remarkably low-complexity (M,d,k) block codes which achieve the optimal rate without the use of enumeration.

Given an abstract group G, an N-dimensional orthogonal matrix representation G of G, and an "initial vector" x ∈ R^N, Slepian defined the group code generated by the representation G to be the set of vectors Gx. If G is a group of permutation matrices, the set Gx is called a "permutation code". For permutation codes a "stack algorithm" decoder exists that, in the presence of low noise, produces the maximum-likelihood estimate of the transmitted vector by using far fewer computations than the standard decoder. In this paper a new concept of equivalence of codes of different dimensions is presented which is weaker than the usual definition.

Since the early 1980s we have witnessed the digital audio and video revolution: the Compact Disc (CD) has become a commodity audio system. CD-ROM and DVD-ROM have become the de facto standard for the storage of large computer programs and files. Growing fast in popularity are the digital audio and video recording systems called DVD and BluRay Disc. The above mass storage products, which form the backbone of modern electronic entertainment industry, would have been impossible without the usage of advanced coding systems.
Pulse Code Modulation (PCM) is a process in which an analogue, audio or video, signal is encoded into a digital bit stream. The analogue signal is sampled, quantized and finally encoded into a bit stream. The origins of digital audio can be traced as far back as 1937, when Alec H. Reeves, a British scientist, invented pulse code modulation \cite{Ree}. The advantages of digital audio and video recording have been known and appreciated for a long time. The principal advantage that digital implementation confers over analog systems is that in a well-engineered digital recording system the sole significant degradation takes place at the initial digitization, and the quality lasts until the point of ultimate failure. In an analog system, quality is diminished at each stage of signal processing and the number of recording generations is limited. The quality of analog recordings, like the proverbial 'old soldier', just fades away. The advent of ever-cheaper and faster digital circuitry has made feasible the creation of high-end digital video and audio recorders, an impracticable possibility using previous generations of conventional analog hardware.
The general subject of coding for digital recorders is very broad, with its roots deep set in history. In digital recording (and transmission) systems, channel encoding is employed to improve the efficiency and reliability of the channel. Channel coding is commonly accomplished in two successive steps: (a) error-correction code followed by (b) recording (or modulation) code. Error-correction control is realized by adding extra symbols to the conveyed message. These extra symbols make it possible for the receiver to correct errors that may occur in the received message.
In the second coding step, the input data are translated into a sequence with special properties that comply with the given "physical nature" of the recorder. Of course, it is very difficult to define precisely the area of recording codes and it is even more difficult to be in any sense comprehensive. The special attributes that the recorded sequences should have to render them compatible with the physical characteristics of the available transmission channel are called channel constraints. For instance, in optical recording a '1' is recorded as a pit and a '0' is recorded as a land. For physical reasons, the pits or lands should be neither too long nor too short. Thus, one records only those messages that satisfy a run-length-limited constraint. This requires the construction of a code which translates arbitrary source data into sequences that obey the given constraints. Many commercial recorder products, such as Compact Disc and DVD, use an RLL code.
The main part of this book is concerned with the theoretical and practical aspects of coding techniques intended to improve the reliability and efficiency of mass recording systems as a whole. The successful operation of any recording code is crucially dependent upon specific properties of the various subsystems of the recorder. There are no techniques, other than experimental ones, available to assess the suitability of a specific coding technique. It is therefore not possible to provide a cookbook approach for the selection of the 'best' recording code.
In this book, theory has been blended with practice to show how theoretical principles are applied to design encoders and decoders. The practitioner's view will predominate: we shall not be content with proving that a particular code exists and ignore the practical detail that the decoder complexity is only a billion times more complex than the largest existing computer. The ultimate goal of all work, application, is never once lost from sight. Much effort has gone into the presentation of advanced topics such as in-depth treatments of code design techniques, hardware consequences, and applications. The list of references (including many US Patents) has been made as complete as possible and suggestions for 'further reading' have been included for those who wish to pursue specific topics in more detail.
The decision to update Coding Techniques for Digital Recorders, published by Prentice-Hall (UK) in 1991, was made in Singapore during my stay in the winter of 1998. The principal reason for this decision was that during the last ten years or so, we have witnessed a success story of coding for constrained channels. The topic of this book, once the province of industrial research, has become an active research field in academia as well. During the IEEE International Symposia on Information Theory (ISIT) and the IEEE International Conference on Communications (ICC), for example, there are now usually three sessions entirely devoted to aspects of constrained coding. As a result, very exciting new material, in the form of (conference) articles and theses, has become available, and an update became a necessity.
The author is indebted to the Institute for Experimental Mathematics, University of Duisburg-Essen, Germany, the Data Storage Institute (DSI) and National University of Singapore (NUS), both in Singapore, and Princeton University, US, for the opportunity offered to write this book. Among the many people who helped me with this project, I would like to thank Dr. Ludo Tolhuizen, Philips Research Eindhoven, for reading and providing useful comments and additions to the manuscript.
Preface to the Second Edition
About five years after the publication of the first edition, it was felt that an update of this text would be inescapable as so many relevant publications, including patents and survey papers, have been published. The author's principal aim in writing the second edition is to add the newly published coding methods, and discuss them in the context of the prior art. As a result about 150 new references, including many patents and patent applications, most of them less than five years old, have been added to the former list of references. Fortunately, the US Patent Office now follows the European Patent Office in publishing a patent application after eighteen months of its first application, and this policy clearly adds to the rapid access to this important part of the technical literature.
I am grateful to many readers who have helped me to correct (clerical) errors in the first edition and also to those who brought new and exciting material to my attention. I have tried to correct every error that I found or that was brought to my attention by attentive readers, and seriously tried to avoid introducing new errors in the Second Edition.
China is becoming a major player in the art of constructing, designing, and basic research of electronic storage systems. A Chinese translation of the first edition was published in early 2004. The author is indebted to Prof. Xu, Tsinghua University, Beijing, for taking the initiative for this Chinese version, and also to Mr. Zhijun Lei, Tsinghua University, for undertaking the arduous task of translating this book from English to Chinese. Clearly, this translation makes it possible that a billion more people will now have access to it.
Kees A. Schouhamer Immink, Rotterdam, November 2004

Introduction. In channel coding schemes we are usually faced with the problem of translating a given source word into a codeword that satisfies some prescribed constraints, and vice versa. In the absence of an algorithmic rule defining the relationship between the source word and codeword, the translation must be carried out with simple look-up tables. As hardware grows with the number of codewords used, i.e. exponentially with the codeword length, there is a technological limit to the length of the words that can be translated using such a simple look-up table. A preferable alternative technique, called enumerative coding, makes it possible to perform the translation by invoking an algorithmic procedure [1]. Essentially, enumerative decoding is accomplished by forming the weighted sum of the symbols of the codeword received. The integer-valued weights used in forming the sum are a function of the channel constraints in force. Encoding is done by a method which is ...
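
The weighted-sum decoding described in this introduction can be sketched for the simplest runlength constraint: binary words in which consecutive ones are separated by at least d zeros, with no upper limit on zero runs. Under that assumption the weight of a one is the number of constrained words that fit in the positions to its right, and the decoder output is the word's lexicographic index. This is an illustrative special case with hypothetical function names, not the single-coefficient-set scheme of the paper, which addresses the more general (d,k) case.

```python
def count_d(m, d):
    """Number of binary words of length m with >= d zeros between ones."""
    if m <= 0:
        return 1
    c = [1] * (m + 1)
    for i in range(1, m + 1):
        # word starts with 0, or with a 1 followed by d forced zeros
        c[i] = c[i - 1] + (c[i - d - 1] if i - d - 1 >= 0 else 1)
    return c[m]


def decode(word, d):
    """Lexicographic index of word: weighted sum over the positions of ones."""
    n = len(word)
    return sum(count_d(n - 1 - j, d) for j, bit in enumerate(word) if bit)


def encode(index, n, d):
    """Inverse of decode: greedy subtraction of the same weights."""
    word = []
    for j in range(n):
        w = count_d(n - 1 - j, d)
        if index >= w:
            word.append(1)
            index -= w
        else:
            word.append(0)
    return word
```

For example, with d = 1 and n = 3 the weights are 3, 2, 1, so the word 101 decodes to 3 + 1 = 4, the last of the five admissible words. In a hardware setting the weights would be precomputed and stored, which is why the number of coefficient sets matters.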

Many modulation systems used in magnetic and optical recording are based on binary run-length-limited codes. We generalize the concept of dk-limited sequences of length n introduced by Tang and Bahl by imposing constraints on the maximum number of consecutive zeros at the beginning and the end of the sequences. It is shown that the encoding and decoding procedures are similar to those of Tang and Bahl. The additional constraints allow a more efficient merging of the sequences. We demonstrate two constructions of run-length-limited codes with merging rules of increasing complexity and efficiency and compare them to Tang and Bahl's method.

A special case with binary sequences was presented at the IEEE 1969 International Symposium on Information Theory in a paper titled “Run-Length-Limited Codes.”

Let S be a given subset of binary n-sequences. We provide an explicit scheme for calculating the index of any sequence in S according to its position in the lexicographic ordering of S. A simple inverse algorithm is also given. Particularly nice formulas arise when S is the set of all n-sequences of weight k and also when S is the set of all sequences having a given empirical Markov property. Schalkwijk and Lynch have investigated the former case. The envisioned use of this indexing scheme is to transmit or store the index rather than the sequence, thus resulting in a data compression of (log |S|)/n.
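
For the constant-weight case mentioned in this abstract, the indexing scheme reduces to sums of binomial coefficients. The sketch below is a minimal illustration with hypothetical function names `rank` and `unrank`: each one in the sequence contributes the number of weight-k sequences that share the preceding prefix but have a zero at that position.

```python
from math import comb


def rank(x, k):
    """Lexicographic index of x among binary sequences of len(x) and weight k."""
    n = len(x)
    index, seen = 0, 0            # seen = number of ones in the prefix
    for j, bit in enumerate(x):
        if bit == 1:
            # sequences with the same prefix and a 0 here come earlier
            index += comb(n - j - 1, k - seen)
            seen += 1
    return index


def unrank(index, n, k):
    """Inverse of rank: rebuild the weight-k sequence of length n."""
    x, seen = [], 0
    for j in range(n):
        c = comb(n - j - 1, k - seen)   # completions with a 0 at position j
        if index >= c:
            x.append(1)
            index -= c
            seen += 1
        else:
            x.append(0)
    return x
```

With n = 4 and k = 2, for instance, the six sequences 0011, 0101, 0110, 1001, 1010, 1100 receive indices 0 through 5, so storing the index takes about log2(6) bits instead of 4.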

A new family of codes is described for representing serial binary data, subject to constraints on the maximum separation between successive changes in value (0 → 1, 1 → 0, or both), or between successive like digits (0's, 1's, or both). These codes have application to the recording or transmission of digital data without an accompanying clock. In such cases, the clock must be regenerated during reading (receiving, decoding), and its accuracy controlled directly from the data itself. The codes developed for this type of synchronization are shown to be optimal, and to require a very small amount of redundancy. Their encoders and decoders are not unreasonably complex, and they can be easily extended to include simple error detection or correction for almost the same additional cost as is required for arbitrary data.

The paper describes a technique for constructing fixed-length
block codes for (d, k)-constrained channels. The codes described are of
the simplest variety: codes for which the encoder restricted to any
particular channel state is a one-to-one mapping and which is not
permitted to “look ahead” to future messages. Such codes can
be decoded with no memory and no anticipation and are thus an example of
what Schouhamer Immink (1992) has referred to as block-decodable. For a
given blocklength n and given values of (d, k), the procedure constructs
a code with the highest possible rate among all such block codes, and it
does so without the iterative search that is typically used (i.e.,
Franaszek's recursive elimination algorithm). The technique used is
similar to Beenker and Immink's (1983) “Construction 2” in
that every message is associated with a (d, k, l, r) sequence of length
n-d; however the values used in the present approach are l=k-d and
r=k-1, as opposed to Beenker and Schouhamer Immink's values of l=r=k-d.
Thus the present approach demonstrates that “Construction 2”
is optimal for d=1 but is suboptimal for d>1. Furthermore, the
structure of the present codes permits enumerative coding techniques to
simplify encoding and decoding.