Book
Full-text available

Codes for Mass Data Storage Systems

Authors: Kees A. Schouhamer Immink

Abstract

Preface to the Second Edition

About five years after the publication of the first edition, it was felt that an update of this text would be inescapable, as so many relevant publications, including patents and survey papers, had appeared. The author's principal aim in writing the second edition is to add the newly published coding methods and to discuss them in the context of the prior art. As a result, about 150 new references, including many patents and patent applications, most of them less than five years old, have been added to the former list of references. Fortunately, the US Patent Office now follows the European Patent Office in publishing a patent application eighteen months after its first filing, and this policy clearly speeds access to this important part of the technical literature.

I am grateful to the many readers who have helped me correct (clerical) errors in the first edition, and also to those who brought new and exciting material to my attention. I have tried to correct every error that I found or that was brought to my attention by attentive readers, and I have seriously tried to avoid introducing new errors in the second edition.

China is becoming a major player in the construction, design, and basic research of electronic storage systems. A Chinese translation of the first edition was published in early 2004. The author is indebted to Prof. Xu, Tsinghua University, Beijing, for taking the initiative for this Chinese version, and also to Mr. Zhijun Lei, Tsinghua University, for undertaking the arduous task of translating the book from English into Chinese. Clearly, this translation makes it possible for a billion more people to have access to it.

Kees A. Schouhamer Immink
Rotterdam, November 2004
... A word of even length is balanced if its imbalance is exactly zero, and a code is balanced if all of its codewords are balanced. Due to their applications in various recording systems, balanced codes have been extensively studied (see [1] for a survey). In recent years, interest in balanced codes has been rekindled by the emergence of DNA macromolecules as a next-generation data storage medium with unprecedented density, durability and replication efficiency [2], [3]. ...
... While numerically the redundancy values are close to the optimal value given in (3), a tight asymptotic analysis was not provided. In this work, we make modifications to the scheme in [10] and demonstrate that the average redundancy is at most $\frac{1}{2}\log_2 n + 0.526$ asymptotically. Even though the average redundancy of our scheme differs from the optimal (3) by an additive constant of approximately 0.2, our scheme and its accompanying analysis can be easily extended to the case where the imbalance is fixed to some positive constant. ...
... To this end, we borrow tools from lattice-path combinatorics and provide closed formulas for the upper bounds on the average redundancy of both Schemes A and B (described in Sections III and V). Unfortunately, as with [10], we are unable to complete the asymptotic analysis for Scheme A. Hence, we introduce Scheme B, which uses slightly more redundant bits, and show that Scheme B incurs an average redundancy of at most $\frac{1}{2}\log_2 n + 2.526$ redundant bits asymptotically when q > 0. Interestingly, for the case q = 0, the average redundancy of Scheme B can be reduced to $\frac{1}{2}\log_2 n + 0.526$, which is better than the schemes given in [9]. ...
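To make the balance notion in the first excerpt concrete, here is a minimal Python sketch. Taking the imbalance to be the number of ones minus the number of zeros is our assumption, following the usual convention in the balanced-code literature surveyed in [1].

```python
def imbalance(word):
    """Imbalance of a binary word, taken here as (# of 1s) - (# of 0s)."""
    ones = sum(word)
    return ones - (len(word) - ones)

def is_balanced(word):
    """A word of even length is balanced iff its imbalance is exactly zero."""
    return len(word) % 2 == 0 and imbalance(word) == 0

assert is_balanced([0, 1, 1, 0])      # two 1s, two 0s
assert not is_balanced([1, 1, 1, 0])  # imbalance +2
```

A code is then balanced precisely when `is_balanced` holds for every codeword.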
Preprint
Full-text available
We study and propose schemes that map messages onto constant-weight codewords using variable-length prefixes. We provide polynomial-time computable formulas that estimate the average number of redundant bits incurred by our schemes. In addition to the exact formulas, we also perform an asymptotic analysis and demonstrate that our scheme uses $\frac12 \log n+O(1)$ redundant bits to encode messages into length-$n$ words with weight $(n/2)+{\sf q}$ for constant ${\sf q}$.
... Here, RS codes also play an important role. For disk applications, very often an RS code over GF$(2^8)$ is used, because the 8-bit symbols can be translated in an efficient way into, for instance, Run Length Limited (RLL) blocks, see [32]. An RLL block puts constraints on the runs of bits. ...
... Thus, we use the same approach as for the coded modulation invented by Ungerboeck [58]. For further information, we refer to the textbook by Schouhamer Immink [32]. ...
Preprint
Full-text available
The material in this book is presented to graduate students in Information and Communication theory. The idea is that we give an introduction to particular applications of information theory and coding in digital communications. The goal is to bring understanding of the underlying concepts, both in theory as well as in practice. We mainly concentrate on our own research results. After showing obtainable performance, we give a specific implementation using Reed-Solomon (RS) codes. The reason for using RS codes is that they can be seen as optimal codes with maximum obtainable minimum distance. Furthermore, the structure of RS codes enables specific applications that fit perfectly into the developed concepts. We do not intend to develop the theory of error correcting codes.
... Let $SN(n, q)$ indicate the set of qth-order spectral-null words in $\phi^n$, with $\phi = \{-1, +1\}$ a bipolar alphabet. This set is defined as (see [6], [16], [7]) ...
... In the case q = 1, the q-OSN(n, k) codes coincide with the balanced codes [6], [8], [4], [2], [3], [16], [19], [21], [26], [32], [23], [10], [24], [25], [27], [13], [15], [14]. On the other hand, for q ≥ 2, the q-order spectral null codes are applied in digital recording and partial-response channels [7], [16]. Considering the q-OSN codes over the binary alphabet $\mathbb{Z}_2 = \{0, 1\}$ [21] and replacing the symbols −1 and +1 with 0 and 1 respectively, $SN(n, q)$ becomes equivalent to the set $SN(n, q) \subseteq \mathbb{Z}_2^n$ ...
Article
Full-text available
The code design problem of non-recursive second-Order Spectral Null (2-OSN) codes is to convert balanced information words into 2-OSN words employing the minimum possible redundancy. Let k be the balanced information word length. If $k \in 2\mathbb{N}$ then the 2-OSN coding scheme has length n = k + r, with 2-OSN redundancy $r \in 2\mathbb{N}$ and $n \in 4\mathbb{N}$. Here, we use a scheme with $r = 2\log k + \Theta(\log\log k)$. The challenge is to reduce redundancy even further for any given k. The idea is to exploit the degree of freedom to select from more than one possible 2-OSN encoding of a given balanced information word. Empirical results suggest that extra information $\delta_k = 0.5\log k + \Theta(\log\log k)$ is obtained in this way. Thus, the proposed approach would give a smaller redundancy $r' = 1.5\log k + \Theta(\log\log k)$ instead of $r = 2\log k + \Theta(\log\log k)$.
... S(n, p) is the set of all 1D sequences that satisfy the p-bounded constraint. One may use enumeration coding [26], [27] to construct $\mathrm{rank}_p : S(n, p) \to [|S(n, p)|]$ and $\mathrm{unrank}_p : [|S(n, p)|] \to S(n, p)$. The redundancy of this encoding algorithm is then $\lambda(n, p) = n - \log |S(n, p)|$ (bits). ...
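A classical way to obtain such a rank/unrank pair is enumerative coding in the style of Cover. The sketch below is illustrative rather than the construction of [26], [27]; it assumes the p-bounded constraint means weight at most floor(p·n), and orders S(n, p) lexicographically.

```python
from math import comb, floor

def size(n, wmax):
    """|S(n, p)| with wmax = floor(p * n): length-n words of weight <= wmax."""
    return sum(comb(n, w) for w in range(wmax + 1))

def rank(word, wmax):
    """Index of `word` in the lexicographic ordering of S(n, p)."""
    n, idx, left = len(word), 0, wmax
    for i, bit in enumerate(word):
        if bit:
            # skip every word that has a 0 in this position instead
            rem = n - i - 1
            idx += sum(comb(rem, w) for w in range(left + 1))
            left -= 1
    return idx

def unrank(idx, n, wmax):
    """Inverse of rank: recover the word from its index."""
    word, left = [], wmax
    for i in range(n):
        rem = n - i - 1
        zeros_first = sum(comb(rem, w) for w in range(left + 1))
        if idx < zeros_first:
            word.append(0)
        else:
            word.append(1)
            idx -= zeros_first
            left -= 1
    return word

n, p = 8, 0.4
wmax = floor(p * n)  # = 3
assert all(rank(unrank(i, n, wmax), wmax) == i for i in range(size(n, wmax)))
```

The redundancy $\lambda(n, p) = n - \log |S(n, p)|$ is then directly computable from `size`.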
Preprint
Full-text available
In this work, we study two types of constraints on two-dimensional binary arrays. In particular, given $p,\epsilon>0$, we study (i) the $p$-bounded constraint: a binary vector of size $m$ is said to be $p$-bounded if its weight is at most $pm$, and (ii) the $\epsilon$-balanced constraint: a binary vector of size $m$ is said to be $\epsilon$-balanced if its weight is within $[(0.5-\epsilon)m,(0.5+\epsilon)m]$. Such constraints are crucial in several data storage systems that regard the information data as two-dimensional (2D) instead of one-dimensional (1D), such as crossbar resistive memory arrays and holographic data storage. In this work, efficient encoding/decoding algorithms are presented for binary arrays so that the weight constraint (either the $p$-bounded constraint or the $\epsilon$-balanced constraint) is enforced over every row and every column, regarded as 2D row-column (RC) constrained codes, or over every subarray, regarded as 2D subarray constrained codes. While low-complexity designs have been proposed in the literature, mostly focusing on 2D RC constrained codes where $p = 1/2$ and $\epsilon = 0$, this work provides efficient coding methods that work for both 2D RC constrained codes and 2D subarray constrained codes, and, more importantly, the methods are applicable for arbitrary values of $p$ and $\epsilon$. Furthermore, for certain values of $p$ and $\epsilon$, we show that, for sufficiently large array sizes, there exist linear-time encoding/decoding algorithms that incur at most one redundant bit.
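A small checker makes the 2D RC constraint concrete; this is our own illustration, not code from the paper.

```python
def p_bounded(vec, p):
    """Weight at most p * m."""
    return sum(vec) <= p * len(vec)

def eps_balanced(vec, eps):
    """Weight within [(0.5 - eps) * m, (0.5 + eps) * m]."""
    m, w = len(vec), sum(vec)
    return (0.5 - eps) * m <= w <= (0.5 + eps) * m

def rc_constrained(array, check):
    """2D RC constraint: every row and every column passes `check`."""
    return all(check(row) for row in array) and \
           all(check(list(col)) for col in zip(*array))

A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 1]]
print(rc_constrained(A, lambda v: eps_balanced(v, 0.0)))  # True: all weights m/2
print(rc_constrained(A, lambda v: p_bounded(v, 0.25)))    # False: weight 2 > 1
```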
... Compared to this approach, runlength coding yields a spectral efficiency that is mostly superior or at least similar. This is due to the fact that the optimization is performed w.r.t. the achievable rate in (11) and not w.r.t. the spectral efficiency, which is not equivalent due to the influence of the optimization on the 90% power containment bandwidth. ...
Preprint
Today's communication systems typically use high-resolution analog-to-digital converters (ADCs). However, in future communication systems with data rates on the order of 100 Gbit/s, the ADC power consumption becomes a major factor due to the high sampling rates. A promising alternative is a receiver based on 1-bit quantization and oversampling w.r.t. the signal bandwidth. Such an approach requires a redesign of modulation, receiver synchronization, and demapping. A zero crossing modulation is a natural choice, as the information needs to be carried in the zero crossing time instants. The present paper provides an overview of zero crossing modulation, achievable rates, sequence mapping and demapping, 1-bit based channel parameter estimation, and continuous phase modulation as an alternative zero crossing modulation scheme.
... We generated this mapping rule in the following four stages. First, considering the constraint on the maximum homopolymer runs, the DNA-based storage was seen as a constrained system with a runlength limit (RLL) [23], denoted by (M, d, k), where M = 4, d = 0 and k = 2 (Supplementary S5). Accordingly, the finite state transition diagram (FSTD) of the (4, 0, 2) homopolymer-constrained DNA data storage was generated (Supplementary S5 and Figure 2(C, i)). ...
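The capacity of this FSTD is easy to reproduce numerically. The sketch below is our own illustration, assuming k = 2 means a nucleotide may repeat at most k + 1 = 3 times in a row, with states tracking the current run length.

```python
import numpy as np

# States 1..3 = current homopolymer run length.  From any state we may
# switch to one of the 3 other nucleotides (back to run length 1); while
# the run is shorter than 3 we may also repeat the current nucleotide.
A = np.array([[3, 1, 0],
              [3, 0, 1],
              [3, 0, 0]], dtype=float)

lam = max(np.linalg.eigvals(A).real)
print(f"capacity = {np.log2(lam):.3f} bits/nt")  # ~1.982 bits/nt
```

The resulting ≈1.98 bits/nt is consistent with the average mapping potential reported in the abstract below.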
Article
Full-text available
Background: With its inherent high density and durable preservation, DNA has recently been recognized as a distinguished medium for storing enormous amounts of data over millennia. To overcome the limitations of a recently reported high-capacity DNA data storage system while achieving a competitive information capacity, we are inspired to explore a new coding system that facilitates the practical implementation of DNA data storage with high capacity. Result: In this work, we devised and implemented a DNA data storage scheme with variable-length oligonucleotides (oligos), where a hybrid DNA mapping scheme that converts digital data to DNA records is introduced. The encoded DNA oligos store 1.98 bits per nucleotide (bits/nt) on average (approaching the upper bound of 2 bits/nt), while conforming to the biochemical constraints. Beyond that, an oligo-level repeat-accumulate coding scheme is employed to address data loss and corruption in the biochemical processes. In a wet-lab experiment, an error-free retrieval of 379.1 KB of data with a minimum coverage of 10x is achieved, validating the error resilience of the proposed coding scheme. Along with that, the theoretical analysis shows that the proposed scheme exhibits a net information density (user bits per nucleotide) of 1.67 bits/nt while achieving 91% of the information capacity. Conclusion: To advance towards practical implementations of DNA storage, we proposed and tested a DNA data storage system enabling a high-potential mapping (bits-to-nucleotide conversion) scheme and a low-redundancy yet highly efficient error-correction code design. The advancement reported here moves us closer to a practical high-capacity DNA data storage system.
... The 6B/8B encoding is a kind of direct-current (DC) balanced code, containing equal numbers of 0 digits and 1 digits, which is utilized to maintain DC balance in the communication system [30]. We define the disparity as the difference between the numbers of one and zero digits in the binary codes. ...
Article
In distributed Internet of Things (IoT) networks, channel hopping (CH) is an effective scheme for neighbor nodes to achieve blind rendezvous over common available channels and to establish communication links. When nodes are unaware of each other's local clocks and the global channels, and have no pre-assigned CH strategies or identifiers (IDs), it is particularly challenging to guarantee blind rendezvous within a finite period of time; this has not yet been achieved using only one radio. In this paper, we propose a novel Quaternary-Encoding-based CH (QECH) algorithm to tackle this issue. The QECH algorithm encodes a randomly selected channel into a quaternary string according to the 6B/8B encoding. We also prepend a common prefix string as well as the randomly selected channel to the quaternary string to guarantee overlaps in the asynchronous scenario. For all kinds of quaternary digits, we construct four mutually co-prime numbers to enumerate all possible combinations of the common available channels. We theoretically analyze the deterministic rendezvous principle and the upper-bounded rendezvous latency of the QECH algorithm. We also verify the effectiveness of the QECH algorithm through extensive simulations. Evaluation results show the superiority of the QECH algorithm in terms of rendezvous latency.
Article
In this paper, we present a deliberate bit flipping (DBF) coding scheme for binary two-dimensional (2-D) channels, where specific patterns in channel inputs are the significant cause of errors. The idea is to eliminate a constrained encoder and, instead, embed a constraint into an error correction codeword that is arranged into a 2-D array by deliberately flipping the bits that violate the constraint. The DBF method relies on the error correction capability of the code being used, which must be able to correct both the deliberate errors and the channel errors. It is therefore crucial to flip the minimum number of bits in order not to overburden the error correction decoder. We devise a constrained combinatorial formulation for minimizing the number of flipped bits for a given set of harmful patterns. The generalized belief propagation algorithm is used to find an approximate solution for the problem. We evaluate the performance gain of our proposed approach on a data-dependent 2-D channel, where 2-D isolated-bits patterns are the harmful patterns for the channel. Furthermore, the performance of the DBF method is compared with classical 2-D constrained coding schemes for the 2-D no isolated-bits constraint on a memoryless binary symmetric channel.
Article
This paper studies a deep learning (DL) framework to solve distributed non-convex constrained optimizations in wireless networks, where multiple computing nodes, interconnected via backhaul links, desire to determine an efficient assignment of their states based on local observations. Two different configurations are considered: First, an infinite-capacity backhaul enables the nodes to communicate in a lossless way, thereby obtaining the solution by centralized computations. Second, a practical finite-capacity backhaul leads to the deployment of distributed solvers equipped with quantizers for communication through the capacity-limited backhaul. The distributed nature and the non-convexity of the optimizations render the identification of the solution unwieldy. To handle this, deep neural networks (DNNs) are introduced to accurately approximate an unknown computation for the solution. As a consequence, the original problems are transformed into training tasks of the DNNs subject to non-convex constraints, to which existing DL libraries fail to extend straightforwardly. A constrained training strategy is developed based on the primal-dual method. For distributed implementation, a novel binarization technique at the output layer is developed for quantization at each node. Our proposed distributed DL framework is examined in various network configurations of wireless resource management. Numerical results verify the effectiveness of our proposed approach over existing optimization techniques.
Article
Full-text available
In a Compact Disc system, the sound is encoded into data bits, modulated into channel bits, and sent along the 'transmission channel' consisting of write laser, master disk, user disk, and optical pick-up. The maximum information density on the disk is determined by the diameter d of the laser light spot on the disk and the number of data bits per light spot. The effect of making d smaller is to greatly reduce the manufacturing tolerances for the player and the disk. The compromise adopted is d ≈ 1 µm, giving very small tolerances for objective disk tilt, disk thickness and defocusing.
Article
Full-text available
The systematic design of DC-constrained codes based on codewords of fixed length is considered. Simple recursion relations for enumerating the number of codewords satisfying a constraint on the maximum unbalance of ones and zeros in a codeword are derived. An enumerative scheme for encoding and decoding maximum-unbalance-constrained codewords with binary symbols is developed. Examples of constructions of transmission systems based on unbalance-constrained codewords are given. A worked example of an 8b10b channel code, of particular interest because of its practical simplicity and relative efficiency, is given.
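The abstract does not spell out the recursion, so the following sketch is only one plausible formulation: it counts length-n bipolar codewords whose running digital sum never leaves [−c, c], which is one common way to bound the unbalance of ones and zeros.

```python
from functools import lru_cache

def count(n, c):
    """Number of length-n words over {+1, -1} whose running digital sum
    stays within [-c, c] after every symbol."""
    @lru_cache(maxsize=None)
    def N(i, s):
        if abs(s) > c:
            return 0          # constraint violated
        if i == n:
            return 1          # one admissible completion: the empty suffix
        return N(i + 1, s + 1) + N(i + 1, s - 1)
    return N(0, 0)

print(count(10, 2))  # admissible 10-bit codewords for unbalance bound c = 2
```

Enumerative encoding and decoding of such codewords then follow by rank/unrank techniques over the same recursion.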
Article
A (1,6) runlength limited (RLL) code is described in which the encoder converts unconstrained data strings into (1,6) RLL constrained strings at rate 2/3. The decoder, which has limited error propagation, recovers the original data strings. One error in the encoded string can result in no more than eleven errors in the decoded data.
Article
A code that converts unconstrained data strings into (5, 12) run-length limited constrained strings is presented. Its decoder recovers the original data strings, and its error propagation is limited. One error in the encoded string can result in no more than five errors in the decoded data.
Article
The stochastic process appearing at the output of a digital encoder is investigated. Based upon the statistics of the code being employed, a systematic procedure is developed by means of which the average power spectral density of the process can be determined. The method is readily programmed on the digital computer, facilitating the calculation of the spectral densities for large numbers of codes. As an example of its use, the procedure is applied in the case of a specific multi-alphabet, multi-level code.
Article
This paper analyzes a block-coding scheme designed to suppress spectral energy near f = 0 for any binary message sequence. In this scheme, the polarity of each block is either maintained or reversed, depending on which decision drives the accumulated digit sum toward zero. The polarity of the block's last digit informs the decoder as to which decision was made. Our objective is to derive the average power spectrum of the coded signal when the message is a random sequence of +1's and −1's and the block length (M) is odd. The derivation uses a mixture of theoretical analysis and computer simulation. The theoretical analysis leads to a spectrum description in terms of a set of correlation coefficients, $\{\rho_q\}$, $q = 1, 2, \ldots$, with the $\rho_q$'s functions of M. The computer simulation uses FFT algorithms to estimate the power spectrum and autocorrelation function of the block-coded signal. From these results, $\{\rho_q\}$ is estimated for various M. A mathematical approximation to $\rho_q$ in terms of q and M is then found which permits a closed-form evaluation of the power spectrum. Comparisons between the final formula and simulation results indicate an accuracy of ±5 percent (±0.2 dB) or better. The block-coding scheme treated here is of particular interest because of its practical simplicity and relative efficiency. The methods used to analyze it can be applied to other block-coding schemes as well, some of which are discussed here for purposes of comparison.
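A hedged sketch of our reading of the scheme: each block carries M − 1 message digits plus a final polarity digit, and the whole block is negated whenever that drives the accumulated digit sum (RDS) toward zero. All names here are ours.

```python
def encode(msg, M):
    """Polarity block coding: msg is a list of +1/-1 digits, block length M odd."""
    assert M % 2 == 1 and len(msg) % (M - 1) == 0
    out, rds = [], 0
    for i in range(0, len(msg), M - 1):
        block = msg[i:i + M - 1] + [+1]          # append polarity digit
        if abs(rds + sum(block)) > abs(rds - sum(block)):
            block = [-d for d in block]          # reverse the block polarity
        out += block
        rds += sum(block)
    return out

def decode(coded, M):
    """The sign of each block's last digit says whether it was reversed."""
    msg = []
    for i in range(0, len(coded), M):
        block = coded[i:i + M]
        if block[-1] < 0:
            block = [-d for d in block]
        msg += block[:-1]
    return msg

data = [+1, -1, +1, +1, -1, -1, +1, +1]
assert decode(encode(data, 5), 5) == data
```

Because M is odd, every block's digit sum is odd and hence nonzero, so the polarity choice always keeps the accumulated sum bounded, which is what suppresses energy near f = 0.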
Article
A method for determining maximum-size block codes, with the property that no concatenation of codewords violates the input restrictions of a given channel, is presented. The class of channels considered is essentially that of Shannon (1948) in which input restrictions are represented through use of a finite-state machine. The principal results apply to channels of finite memory and codes of length greater than the channel memory but shorter codes and non-finite memory channels are discussed briefly. A partial ordering is first defined over the set of states. On the basis of this ordering, complete terminal sets of states are determined. Use is then made of Mason's general gain formula to obtain a generating function for the size of the code which is associated with each complete terminal set. Comparison of coefficients for a particular block length establishes an optimum terminal set and codewords of the maximum-size code are then obtained directly. Two important classes of binary channels are then considered. In the first class, an upper bound is placed on the separation of 1's during transmission while, in the second class, a lower bound is placed on this separation. Universal solutions are obtained for both classes.
Article
A special family J of prefix codes is considered. It is verified that if A ∈ J does not have a certain synchronizing property, then A = C^p (p > 1), where C is another code from the same family.
Article
We derive the limiting efficiencies of dc-constrained codes. Given bounds on the running digital sum (RDS), the best possible coding efficiency $\eta$, for a K-ary transmission alphabet, is $\eta = \log_2 \lambda_{\max} / \log_2 K$, where $\lambda_{\max}$ is the largest eigenvalue of a matrix which represents the transitions of the allowable states of the RDS. Numerical results are presented for the three special cases of binary, ternary and quaternary alphabets.
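As a quick illustration of the formula (ours, not from the paper): for a binary alphabet, each transmitted bit moves the RDS up or down by one, so the transition matrix over N allowed RDS states is that of a path graph, and $\eta$ follows from its largest eigenvalue.

```python
import numpy as np

def efficiency(N, K=2):
    """eta = log2(lambda_max) / log2(K) for an RDS confined to N states,
    each K = 2 symbol shifting the RDS by +/-1 (a path-graph matrix)."""
    A = np.zeros((N, N))
    for s in range(N - 1):
        A[s, s + 1] = A[s + 1, s] = 1
    lam_max = max(np.linalg.eigvals(A).real)
    return np.log2(lam_max) / np.log2(K)

for N in (3, 5, 7):
    print(N, round(efficiency(N), 4))   # 0.5, 0.7925, 0.8858
```

For this path-graph matrix $\lambda_{\max} = 2\cos(\pi/(N+1))$, so $\eta$ approaches 1 as the RDS bound widens.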