Content uploaded by Kees Schouhamer Immink

Author content

All content in this area was uploaded by Kees Schouhamer Immink on Nov 25, 2017

Content may be subject to copyright.

Preface to the Second Edition
About five years after the publication of the first edition, it was felt that an update of this text had become inescapable, as so many relevant publications, including patents and survey papers, had appeared. The author's principal aim in writing the second edition is to add the newly published coding methods and to discuss them in the context of the prior art. As a result, about 150 new references, including many patents and patent applications, most of them less than five years old, have been added to the former list of references. Fortunately, the US Patent Office now follows the European Patent Office in publishing a patent application eighteen months after its first filing, and this policy clearly speeds up access to this important part of the technical literature. I am grateful to the many readers who have helped me to correct (clerical) errors in the first edition, and also to those who brought new and exciting material to my attention. I have tried to correct every error that I found or that was brought to my attention by attentive readers, and I have seriously tried to avoid introducing new errors in the Second Edition.
China is becoming a major player in the design, construction, and basic research of electronic storage systems. A Chinese translation of the first edition was published in early 2004. The author is indebted to Prof. Xu, Tsinghua University, Beijing, for taking the initiative for this Chinese version, and also to Mr. Zhijun Lei, Tsinghua University, for undertaking the arduous task of translating this book from English to Chinese. Clearly, this translation gives a billion more people access to it.
Kees A. Schouhamer Immink
Rotterdam, November 2004

... The coded sequence is denoted as a codeword in the case of convolutional codes or a chain of codewords in the case of block codes. The RLL codes are used in order to exclude long runs of equal symbols from the coded sequence [3]. Such long runs can deprive the signals carrying the information of the properties needed to support synchronization. ...

... From the parity check matrices given in (2) or (3), the basic parameters of the RS codes can be derived, namely the codeword length n (which is equal to the number of columns in them), the number of information symbols k in each codeword (because the number of rows in them is equal to n − k), and also the code distance d_m, which for the original RS codes is equal to the number of rows in (2) or (3) plus one. Usually these parameters for linear ...

... EXAMPLE The one-time extended RS code [8,6,3] defined in GF(8) generated by the primitive polynomial p(x) = x^3 + x + 1 has the following parity check matrix: ...
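The field arithmetic behind this example can be sketched as follows. This is only an illustration of GF(8) built from x^3 + x + 1; it does not reproduce the parity check matrix elided in the snippet. The helper names `gf8_mul` and `powers_of_alpha`, and the convention that bit i of an integer holds the coefficient of x^i, are assumptions made for this sketch.

```python
# Minimal sketch of GF(8) arithmetic (assumed representation: 3-bit integers,
# bit i = coefficient of x^i). Names are hypothetical, not from the cited paper.
PRIMITIVE_POLY = 0b1011  # x^3 + x + 1


def gf8_mul(a, b):
    """Multiply two GF(8) elements using shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in characteristic two is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0b1000:           # degree reached 3: reduce modulo p(x)
            a ^= PRIMITIVE_POLY
    return result


def powers_of_alpha():
    """Return alpha^0 .. alpha^6, where alpha = x is a root of p(x)."""
    alpha = 0b010
    powers = [1]
    for _ in range(6):
        powers.append(gf8_mul(powers[-1], alpha))
    return powers


print(powers_of_alpha())
```

The seven powers are distinct and cover every nonzero element, which is what makes p(x) primitive.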

Reed Solomon codes were standardized for numerous wireless communication systems. Most practical Reed Solomon codes belong to non-binary linear block codes defined over finite fields with characteristic two. Each linear block code contains one codeword composed of all zeros. The concatenation of this and also other codewords can lead to long, theoretically even infinite, runs of equal symbols. Such long runs do not support synchronization in wireless receivers and therefore are unwanted. In this paper it is shown that extended and some appropriately shortened Reed Solomon codes constructed over finite fields with characteristic two can be transformed into Run Length Limited Reed Solomon codes. Where applicable, the presented method achieves this without inserting additional redundancy. Another advantage is that after the transformation, if some round conditions are fulfilled, the decoding does not have to be rebuilt.

... Since the physical characteristics of the available transmission channel may not be compatible with every code, constrained coding is often used to incorporate the additional requirements. In classical storage and communication systems, balanced codes and codes with restricted run-lengths are used to avoid voltage imbalance and charge accumulation between connected components [1]. A more novel application of constrained non-overlapping codes, introduced by Yazdi et al. [2], is DNA-based storage, where several constraints from biochemical properties and methods should be considered in addition to code balance and restriction of run-lengths. ...

This paper concerns non-overlapping codes, block codes motivated by synchronisation and DNA-based storage applications. Most existing constructions of these codes do not account for the restrictions posed by the physical properties of communication channels. If undesired sequences are not avoided, the system using the encoding may start behaving incorrectly. Hence, we aim to characterise all non-overlapping codes satisfying two additional constraints. For the first constraint, where approximately half of the letters in each word are positive, we derive necessary and sufficient conditions for the code's non-expandability and improve known bounds on its maximum size. We also determine exact values for the maximum sizes of polarity-balanced non-overlapping codes having small block and alphabet sizes. For the other constraint, where long sequences of consecutive equal symbols lead to undesired behaviour, we derive bounds and constructions of constrained non-overlapping codes. Moreover, we provide constructions of non-overlapping codes that satisfy both constraints and analyse the sizes of the obtained codes.

In DNA-based data storage, sequencing the stored DNA is essential in reading the stored data. Nanopore sequencing, an emerging sequencing technology, has attracted a lot of attention recently owing to its various advantages: in particular, it is portable, scalable, automated and rapid. However, several kinds of errors, including inter-symbol interference, noisy measurement, backtracking, and skipping, reduce the accuracy of the technology. Several coding schemes have been proposed recently to deal with various kinds of error sources, especially inter-symbol interference and noisy measurement. In this work, we focus on backtracking and skipping errors and aim to design a good coding scheme to combat these errors. We first note that backtracking and skipping errors can be modelled as synchronization errors, including duplication and deletion errors. Next, we propose new families of codes to locate and correct all synchronization errors caused by backtracking and skipping. The proposed codes are constrained codes avoiding a prescribed set of patterns. Then, we focus on studying these constrained codes. In particular, we present a method to compute their maximal asymptotic rates. For illustration, we use experimental data available online to compute the numerical results for maximal asymptotic rates of these codes.

Systems with 1-bit quantization and oversampling are promising for Internet of Things (IoT) networks because they can reduce the power consumption of the analog-to-digital converters in the devices. The novel time-instance zero-crossing (TI ZX) modulation is a promising approach for IoT networks and devices, but existing studies rely on optimization problems with high computational complexity and delay. In this work, we propose a practical waveform design based on the established TI ZX modulation for multiuser multi-input multi-output (MIMO) systems in the downlink scenario with 1-bit quantization and temporal oversampling at the receivers. In this sense, the proposed temporal transmit signals are constructed by concatenating segments of coefficients which convey the information into the time-instances of zero-crossings according to the TI ZX mapping rules. The proposed waveform design is compared with other methods from the literature in terms of bit error rate and normalized power spectral density. Numerical results show that the proposed technique is suitable for multiuser MIMO systems with 1-bit quantization while tolerating some small amount of out-of-band radiation.

We study and propose schemes that map messages onto constant-weight codewords using variable-length prefixes. We provide polynomial-time computable formulas that estimate the average number of redundant bits incurred by our schemes. In addition to the exact formulas, we also perform an asymptotic analysis and demonstrate that our scheme uses $\frac{1}{2}\log_2 n + O(1)$ redundant bits to encode messages into length-$n$ words with weight $(n/2) + \mu$ for constant $\mu$. We also propose schemes that map messages into balanced codebooks with error-correcting capabilities. For such schemes, we provide methods to enumerate the average number of redundant bits.
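As a rough numerical check of the order of growth quoted above, the sketch below computes the redundancy of the full set of length-n words with weight n/2 + μ, that is, n minus the base-2 logarithm of the binomial coefficient. This is the counting bound for any constant-weight code, not the average redundancy of the variable-length-prefix schemes described in the abstract; the function name `redundancy_bits` is an assumption made for this illustration.

```python
from math import comb, log2, pi


def redundancy_bits(n, mu=0):
    """Redundancy n - log2 C(n, n/2 + mu) of the set of ALL length-n binary
    words with weight n/2 + mu (a counting bound, not the paper's scheme)."""
    return n - log2(comb(n, n // 2 + mu))


# Compare with (1/2) log2 n; the gap settles near a constant, here
# (1/2) log2(pi/2) for mu = 0 by Stirling's approximation.
for n in (16, 64, 256, 1024):
    gap = redundancy_bits(n) - 0.5 * log2(n)
    print(n, round(redundancy_bits(n), 4), round(gap, 4))
```

The printed gap stays bounded as n grows, which is the O(1) term in the asymptotic expression.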

On a finite sequence of binary (0-1) trials we define a random variable enumerating patterns of length subject to certain constraints. For sequences of independent and identically distributed binary trials, exact probability mass functions are established in closed form by means of combinatorial analysis. An explicit expression for the mean value of this random variable is obtained. The results associated with the probability mass functions are extended to sequences of exchangeable binary trials. An application in information theory concerning the counting of a class of run-length-limited binary sequences is provided as a direct byproduct of our study. Illustrative numerical examples further exemplify the results.

The sound from a Compact Disc system encoded into data bits and modulated into channel bits is sent along the 'transmission channel' consisting of write laser - master disk - user disk - optical pick-up. The maximum information density on the disk is determined by the diameter d of the laser light spot on the disk and the 'number of data bits per light spot'. The effect of making d smaller is to greatly reduce the manufacturing tolerances for the player and the disk. The compromise adopted is d ≈ 1 μm, giving very small tolerances for objective disk tilt, disk thickness and defocusing.

The systematic design of DC-constrained codes based on codewords of fixed length is considered. Simple recursion relations for enumerating the number of codewords satisfying a constraint on the maximum unbalance of ones and zeros in a codeword are derived. An enumerative scheme for encoding and decoding maximum unbalance constrained codewords with binary symbols is developed. Examples of constructions of transmission systems based on unbalance constrained codewords are given. A worked example of an 8b10b channel code is given, which is of particular interest because of its practical simplicity and relative efficiency.

A (1,6) runlength limited (RLL) code is described in which the encoder converts unconstrained data strings into (1,6) RLL constrained strings, where the rate is 2/3. The decoder, having limited error propagation, recovers the original data strings. One error in the encoded string can result in no more than eleven errors in the decoded data.
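To make the (d, k) constraint concrete, the following sketch checks whether a binary sequence satisfies it, using the usual reading that consecutive ones are separated by at least d and at most k zeros. It is not the encoder or decoder of the abstract. The function name `satisfies_dk` and the treatment of leading and trailing zero runs (only the upper limit k is enforced there) are assumptions of this sketch.

```python
def satisfies_dk(bits, d, k):
    """Return True if every run of zeros between two ones has length in
    [d, k]. Leading and trailing zero runs are only required to be <= k
    (an assumed convention for finite sequences)."""
    run = 0            # length of the current run of zeros
    seen_one = False   # becomes True once the first one has been read
    for b in bits:
        if b == 1:
            if seen_one and run < d:
                return False   # two ones are too close together
            seen_one = True
            run = 0
        else:
            run += 1
            if run > k:
                return False   # run of zeros is too long
    return True


print(satisfies_dk([1, 0, 1, 0, 0, 1], 1, 6))   # valid (1,6) sequence
print(satisfies_dk([1, 1, 0], 1, 6))            # adjacent ones violate d = 1
print(satisfies_dk([1] + [0] * 7 + [1], 1, 6))  # seven zeros violate k = 6
```

The same check applies to other parameter pairs, for example (5, 12), by changing d and k.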

A code that converts unconstrained data strings into (5, 12) run-length limited constrained strings is presented. Its decoder recovers the original data strings, and its error propagation is limited. One error in the encoded string can result in no more than five errors in the decoded data.

The stochastic process appearing at the output of a digital encoder is investigated. Based upon the statistics of the code being employed, a systematic procedure is developed by means of which the average power spectral density of the process can be determined. The method is readily programmed on the digital computer, facilitating the calculation of the spectral densities for large numbers of codes. As an example of its use, the procedure is applied in the case of a specific multi-alphabet, multi-level code.

This paper analyzes a block-coding scheme designed to suppress spectral energy near f = 0 for any binary message sequence. In this scheme, the polarity of each block is either maintained or reversed, depending on which decision drives the accumulated digit sum toward zero. The polarity of the block's last digit informs the decoder as to which decision was made.
Our objective is to derive the average power spectrum of the coded signal when the message is a random sequence of +1's and −1's and the block length (M) is odd. The derivation uses a mixture of theoretical analysis and computer simulation. The theoretical analysis leads to a spectrum description in terms of a set of correlation coefficients, {ρq}, q = 1, 2, etc., with the ρq's functions of M. The computer simulation uses FFT algorithms to estimate the power spectrum and autocorrelation function of the block-coded signal. From these results, {ρq} is estimated for various M. A mathematical approximation to ρq in terms of q and M is then found which permits a closed-form evaluation of the power spectrum. Comparisons between the final formula and simulation results indicate an accuracy of ±5 percent (±0.2 dB) or better.
The block-coding scheme treated here is of particular interest because of its practical simplicity and relative efficiency. The methods used to analyze it can be applied to other block-coding schemes as well, some of which are discussed here for purposes of comparison.
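The polarity-switching rule described above can be sketched in a few lines. The sketch assumes that each block consists of M − 1 message digits followed by one added polarity digit, initially +1, and that a block is left unchanged when the accumulated digit sum is zero; the abstract does not spell out either detail. The function names `encode` and `decode` are likewise hypothetical.

```python
def encode(message, M):
    """Encode a list of +1/-1 digits into blocks of odd length M.
    Assumed layout: M - 1 message digits plus a trailing polarity digit."""
    assert M % 2 == 1 and len(message) % (M - 1) == 0
    coded, accumulated = [], 0
    for i in range(0, len(message), M - 1):
        block = message[i:i + M - 1] + [+1]   # append polarity digit
        block_sum = sum(block)                # odd, hence never zero
        if block_sum * accumulated > 0:
            # Same sign as the accumulated sum: reverse the block polarity
            # so that the accumulated digit sum is driven toward zero.
            block = [-x for x in block]
            block_sum = -block_sum
        coded.extend(block)
        accumulated += block_sum
    return coded


def decode(coded, M):
    """Recover the message: the last digit of each block tells whether
    the block was reversed (-1) or maintained (+1)."""
    message = []
    for i in range(0, len(coded), M):
        block = coded[i:i + M]
        polarity = block[-1]
        message.extend(x * polarity for x in block[:-1])
    return message


msg = [1, 1, 1, 1, 1, 1, -1, 1, -1, -1, -1, -1, 1, -1, 1, 1]
assert decode(encode(msg, 5), 5) == msg
```

Under these assumptions the accumulated digit sum at block boundaries never exceeds M in magnitude, which is the mechanism that suppresses spectral energy near f = 0.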

A method for determining maximum-size block codes, with the property that no concatenation of codewords violates the input restrictions of a given channel, is presented. The class of channels considered is essentially that of Shannon (1948), in which input restrictions are represented through use of a finite-state machine. The principal results apply to channels of finite memory and codes of length greater than the channel memory, but shorter codes and non-finite memory channels are discussed briefly.
A partial ordering is first defined over the set of states. On the basis of this ordering, complete terminal sets of states are determined. Use is then made of Mason's general gain formula to obtain a generating function for the size of the code which is associated with each complete terminal set. Comparison of coefficients for a particular block length establishes an optimum terminal set and codewords of the maximum-size code are then obtained directly.
Two important classes of binary channels are then considered. In the first class, an upper bound is placed on the separation of 1's during transmission while, in the second class, a lower bound is placed on this separation. Universal solutions are obtained for both classes.

A special family J of prefix codes is considered. It is verified that if A ∈ J does not have a certain synchronizing property, then A = C^p (p > 1), where C is another code from the same family.

We derive the limiting efficiencies of dc-constrained codes. Given bounds on the running digital sum (RDS), the best possible coding efficiency η, for a K-ary transmission alphabet, is η = log2 λmax/log2 K, where λmax is the largest eigenvalue of a matrix which represents the transitions of the allowable states of RDS. Numerical results are presented for the three special cases of binary, ternary and quaternary alphabets.
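The efficiency formula η = log2 λmax / log2 K can be evaluated numerically for the binary case (K = 2), where the RDS changes by ±1 per symbol and the state transition matrix is the adjacency matrix of a path on the allowed RDS values. The sketch below uses plain power iteration on the shifted matrix A + I, a choice made here to avoid the oscillation that the unshifted matrix would show; it is not the method of the paper. The function name `rds_efficiency` and its parameters are assumptions.

```python
from math import log2


def rds_efficiency(num_states, iterations=2000):
    """Efficiency of a binary code whose running digital sum is confined
    to num_states consecutive values (num_states >= 2)."""
    n = num_states
    K = 2                      # binary transmission alphabet
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        # w = (A + I) v, where A connects each RDS value to its neighbours
        w = [
            v[i]
            + (v[i - 1] if i > 0 else 0.0)
            + (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)
        ]
        lam = max(w)           # converges to lambda_max + 1
        v = [x / lam for x in w]
    lam_max = lam - 1.0        # undo the shift by the identity matrix
    return log2(lam_max) / log2(K)


for n in (2, 3, 5, 9):
    print(n, round(rds_efficiency(n), 4))
```

With three allowed RDS values the largest eigenvalue is √2, giving an efficiency of one half, and the efficiency rises toward 1 as more RDS values are allowed.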