Article

On the design of spectrum shaping codes for high-density data storage

Authors:
  • Turing Machines Inc

Abstract

This paper proposes systematic code design methods for constructing efficient spectrum shaping codes with the maximum runlength limited constraint k, which are widely used in data storage systems for digital consumer electronics products. Through shaping the spectrum of the input user data sequence, the codes can effectively circumvent the interaction between the data signal and servo signal in high-density data storage systems. In particular, we first propose novel methods to design high-rate k constrained codes in the non-return-to-zero (NRZ) format, which can not only facilitate timing recovery of the storage system, but also avoid error propagation during decoding and reduce the system complexity. We further propose to combine the Guided Scrambling (GS) technique with the k constrained code design methods to construct highly efficient spectrum shaping k constrained codes. Simulation results demonstrate that the designed codes can achieve significant spectrum shaping effect with only around 1% code rate loss and reasonable computational complexity.
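The maximum runlength constraint mentioned in the abstract can be illustrated with a short check on an NRZ sequence. This is a minimal sketch, not the paper's method: the function names `max_runlength` and `satisfies_run_limit` are hypothetical, and the sketch assumes the constraint is expressed as a bound on the longest run of identical symbols. Whether that bound equals k or k + 1 depends on a convention the abstract does not state.

```python
from itertools import groupby


def max_runlength(bits):
    """Return the length of the longest run of identical symbols.

    `bits` is an NRZ sequence, e.g. a list of 0/1 values.
    An empty sequence has a longest run of 0.
    """
    return max((sum(1 for _ in run) for _, run in groupby(bits)), default=0)


def satisfies_run_limit(bits, limit):
    """True if no run of identical symbols is longer than `limit` (assumed bound)."""
    return max_runlength(bits) <= limit
```

A short maximum runlength guarantees frequent level transitions in the stored waveform, which is what the abstract refers to when it says the codes facilitate timing recovery.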


... Over the years, different construction schemes for RLL codes, with varied enhancements, have been proposed and analyzed [15]-[18]. The study of RLL codes has continued to be an important research topic, and recent work includes applications to high-density data storage [19], DNA-based storage [20], and visible light communication [21]. ...
... Note from Prop. 10 that the feasible subblock length L → ∞ as E_max → ∞. Hence, using (19) and (1), ...
... The upper bound is computed using the expression h(max{B, 0.5}), which corresponds to the code capacity when the fraction of ones in each codeword is at least B (see Prop. 8 and the following remark). The lower bound for O_SEC^{E_max}(B) is computed using (19) and (1). As shown in Thm. ...
Article
Run-length limited (RLL) codes are a well-studied class of constrained codes having application in diverse areas, such as optical and magnetic data recording systems, DNA-based storage, and visible light communication. RLL codes have also been proposed for the emerging area of simultaneous energy and information transfer, where the receiver uses the received signal for decoding information as well as for harvesting energy to run its circuitry. In this paper, we show that RLL codes are not the best codes for simultaneous energy and information transfer, in terms of the maximum number of codewords which avoid energy outage, i.e., outage-constrained capacity. Specifically, we show that sliding window constrained (SWC) codes and sub-block energy constrained (SEC) codes have significantly higher outage-constrained capacities than RLL codes for moderate to large energy buffer sizes.
Preprint
Run-length limited (RLL) codes are a well-studied class of constrained codes having application in diverse areas such as optical and magnetic data recording systems, DNA-based storage, and visible light communication. RLL codes have also been proposed for the emerging area of simultaneous energy and information transfer, where the receiver uses the received signal for decoding information as well as for harvesting energy to run its circuitry. In this paper, we show that RLL codes are not the best codes for simultaneous energy and information transfer, in terms of the maximum number of codewords which avoid energy outage, i.e., outage-constrained capacity. Specifically, we show that sliding window constrained (SWC) codes and subblock energy constrained (SEC) codes have significantly higher outage-constrained capacities than RLL codes.
... Our designed spectrum shaping codes can be easily implemented on various consumer electronics devices, such as magnetic tape and other data storage products such as ultramobile HDDs [3,4,5,6]. The above described code designs and constructions are all done off-line, and once the codes are designed, the encoder and decoder can be implemented based on simple look-up tables. ...
... Spectral null codes have been applied in a myriad of practical communication [15] and data storage systems [2,3,4,5,6]. In this section, we analyze the difference in spectral performance and redundancy between the newly developed spectral shaping codes and prior art dc-balanced codes having a null at dc, f = 0. ...
Article
We investigate a new approach for designing spectral shaping block codes with a target spectrum, H_t(f), that has been specified at a plurality of frequencies. We analyze the probability density function of the spectral power density function of uncoded n-symbol bipolar code words. We present estimates of the redundancy and the spectrum of spectral shaping codes with specified target spectral densities H_t(f_i) at frequencies f_i. Constructions of low-redundancy codes with suppressed low-frequency content are presented that compare favorably with conventional dc-balanced codes currently used in data transmission and data storage devices with applications in consumer electronics.
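To make the notion of a spectrum specified at a set of frequencies more concrete, the sketch below evaluates the spectral power of a single bipolar codeword at a chosen normalized frequency. It is only an illustration under stated assumptions: the function name `spectral_power` and the 1/n normalization are hypothetical choices, not taken from the paper.

```python
import cmath


def spectral_power(word, f):
    """Spectral power of a bipolar word (+1/-1 symbols) at normalized frequency f.

    Computed as |sum_k x_k * exp(-j*2*pi*f*k)|^2 / n, where n is the word
    length. The 1/n normalization is an assumption made for this sketch.
    """
    n = len(word)
    total = sum(x * cmath.exp(-2j * cmath.pi * f * k) for k, x in enumerate(word))
    return abs(total) ** 2 / n
```

At f = 0 the sum reduces to the word disparity, so a word with equal numbers of +1 and -1 symbols has zero power at dc, which is the property of the dc-balanced codes that the abstract uses as its point of comparison.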
... Constrained codes have been widely applied to mass data storage systems such as magnetic and optical recording systems and flash memories, which are key constituents of consumer electronics products [1]. DNA data storage systems are much more compact and durable than any existing data storage system, and hence they have emerged as a very promising candidate for the storage of Big Data for consumers [2,3]. ...
... To train the MLP, we need to generate a large amount of signals y read back from the DNA sequencer and the associated labels x as the training data set. Since this is a very hard and expensive task, we use the DNA data storage channel model given by (1) to generate the current drop signals. ...
... In this study, we first design a configuration of k constrained codes with LDPC codes. The k constrained codes herein are constructed using a nibble replacement method [10], [11], which, as reported recently, can achieve high code rates, simple encoders and decoders, and limited error propagation. Then, we also extend our research to design DC-free k constrained LDPC codes. ...
... The k constrained codes based on the nibble replacement have been reported recently in [10], [11]. Basically, the technique removes all inadmissible q-bit nibbles and replaces them by q-bit admissible ones, where the admissible nibbles are predefined, i.e., dec(u) ≥ w, where the function dec() refers to the decimal representation of a binary sequence u, and w is a predefined integer. ...
Article
An efficient concatenation of error correction codes with constrained codes is proposed in this paper. Generally, constrained codes are designed to match specified channels, whereas error correction coding schemes are designed to correct the channel errors. Both play important roles in ensuring the integrity of data in data storage systems. In this study, we first investigate the design of k constrained codes combined with a low-density parity-check (LDPC) code, and then we extend the idea to the design of DC-free k constrained LDPC codes. Simulation results show that the proposed designs achieve an improved bit error rate (BER) performance compared to prior art schemes. In particular, the proposed design for the DC-free k constrained codes not only fully eliminates the effect of error propagation in a reverse configuration, but also achieves significant DC suppression.
... For real-world implementation of S3, cost is a factor that must be considered. Others, such as Cai et al. (2017), describe systematic code design methods for constructing efficient spectrum shaping codes with a maximum runlength constraint. These codes can be very useful in high-density data storage systems. ...
Article
A symbiotic simulation system (S3), sometimes also called a 'digital twin', enables interactions between a physical system and its computational model representation. With the goal of supporting operational decisions, an S3 uses real-time data from the physical system, which is gathered via sensors. This real-time data is also saved in an enterprise data storage system (EDSS), so it can be used as historical data for future use. Both real-time and historical data are then used as inputs to the different components of an S3, which typically comprises several modules: data acquisition, simulation, optimisation, machine learning, and an 'actuator'. The latter is needed when there is not a human agent between the S3 and the system. Given the amount of data generated by today's smart systems, an S3 needs to be coupled with an EDSS. Furthermore, the S3 may produce a large amount of output data that needs to be stored, since it might be re-used by the machine learning module to make the S3 adaptive in dynamic scenarios. With the goal of supporting real-time operational decision-making, especially in Industry 4.0 applications such as smart cities, smart factories, intelligent transportation systems, and digital supply chains, this paper proposes a generic system architecture for an S3 and discusses its integration within EDSSs. Moreover, the paper reviews the state-of-the-art in S3, and analyses how these systems can interact with EDSSs to make real-time decision making a reality. Finally, the paper also points out several research challenges in S3.
... Constrained sequence (CS) codes, such as runlength-limited (RLL) codes, DC-free codes and DC-free RLL codes, continue to be studied for application in digital transmission [1]-[4], magnetic and optical recording [5]-[7], non-volatile storage [8]-[11], DNA-based storage systems [12] and visible light communication (VLC) [13]-[18]. Most constrained sequence codes are fixed-length codes, where codebooks consist of source words and codewords of uniform length. ...
Article
We consider the construction of capacity-approaching variable-length constrained sequence codes based on multi-state encoders that permit state-independent decoding. Based on the finite state machine description of the constraint, we first select the principal states and establish the minimal sets. By performing partial extensions and normalized geometric Huffman coding, efficient codebooks that enable state-independent decoding are obtained. We then extend this multi-state approach to a construction technique based on n-step FSMs. We demonstrate the usefulness of this approach by constructing capacity-approaching variable-length constrained sequence codes with improved efficiency and/or reduced implementation complexity to satisfy a variety of constraints, including the runlength-limited (RLL) constraint, the DC-free constraint, and the DC-free RLL constraint, with an emphasis on their application in visible light communications.
... Spectral null, or dc-balanced, codes have been applied in cable transmission [1], [2], magnetic recording [3], and optical recording systems [4], [5]. Spectral null codes have recently been advocated in visible light communications (VLC) systems, where the light intensity of solid-state light sources, mostly LEDs, is varied [6]. ...
Preprint
We apply the central limit theorem for deriving approximations to the auto-correlation function and power density function (spectrum) of second-order spectral null (dc^2-balanced) codes. We show that the auto-correlation function of dc^2-balanced codes can be accurately approximated by a cubic function. We show that the difference between the approximate and exact spectrum is less than 0.04 dB for codeword length n = 256.
... Spectral shaping codes with a null at the zero frequency, also called dc-free, dc-balanced or spectral null codes, have been applied extensively in cable transmission [1], [2], magnetic recording [3], and optical recording systems [4], [5]. Spectral null codes have recently been advocated in visible light communications (VLC) systems, where the light intensity of solid-state light sources, mostly LEDs, is varied [6]. ...
Preprint
We apply the central limit theorem for deriving approximations to the auto-correlation function and power density function (spectrum) of second-order spectral null (dc^2-balanced) codes. We show that the auto-correlation function of dc^2-balanced codes can be accurately approximated by a cubic function. We compare the approximate auto-correlation function and spectrum with the exact auto-correlation function and spectrum of full set dc^2-balanced codes. We show that the difference between the approximate and exact spectrum is less than 0.04 dB for codeword length n = 256. We compare the spectral performance of dc-balanced versus dc^2-balanced codes in the low-frequency range.
... Traditionally, spectral shaping codes with a null at the zero frequency, also called dc-balanced codes, have been employed to counter the effects of the channel's low-frequency cut-off [1], [2]. Spectral shaping codes have been applied extensively in magnetic recording products such as Digital Video Recording (DV) and optical recording systems, such as Compact Disc, DVD, Blu-ray disc [3], [4], where spectral null codes are applied for avoiding the interaction between the written data and the mechanical servo systems that are used for tracking the data. Constructions of spectral null block codes have received widespread attention in the literature, see for example [5], [6], [7], [8], [9], [10]. ...
Article
We present an estimate of the power density function (spectrum) of binary K-th order spectral null codes. We work out the auto-correlation model in detail for second- and third-order spectral null codes. We compare the auto-correlation functions and spectra predicted by the model with those generated by full-set K-th order spectral null block codes.
Article
Resistive random access memory (ReRAM) is a promising emerging non-volatile memory (NVM) technology that shows high potential for both data storage and computing. However, its crossbar array architecture leads to the sneak path (SP) problem, which may severely degrade the data storage reliability of ReRAM. Due to the complicated nature of the SP-induced interference (SPI) [1], it is difficult to derive an accurate channel model for it. The deep learning (DL)-based detection scheme (Zhong et al., 2020) can better mitigate the SPI, at the cost of additional power consumption and read latency. In this letter, we first propose a novel constrained coding (CC) scheme which can not only reduce the SPI, but also effectively differentiate the memory arrays into two categories of SPI-free and SPI-affected arrays. For the SPI-free arrays, we can use a simple middle-point threshold detector to detect the low and high resistance cells of ReRAM. For the SPI-affected arrays, a DL detector is first trained off-line. To avoid the additional power consumption and latency introduced by the DL detector, we further propose a DL-based threshold detector, whose detection threshold can be derived based on the outputs of the DL detector. It is then utilized for the online data detection of all the identified SPI-affected arrays. Simulation results demonstrate that the above CC and DL aided threshold detection scheme can effectively mitigate the SPI of the ReRAM array and achieve better error rate performance than the prior art detection schemes, without prior knowledge of the channel.
Article
We apply the central limit theorem for deriving approximations to the auto-correlation function and power density function (spectrum) of second-order spectral null (dc^2-balanced) codes. We show that the auto-correlation function of dc^2-balanced codes can be accurately approximated by a cubic function. We show that the difference between the approximated and exact spectrum is less than 0.03 dB for codeword length n = 256.
Article
In this paper, we will present coding techniques for the character-constrained channel, where information is conveyed using q-bit characters (nibbles), and where w prescribed characters are disallowed. Using codes for the character-constrained channel, we present simple and systematic constructions of high-rate binary maximum runlength constrained codes. The new constructions have the virtue that large lookup tables for encoding and decoding are not required. We will compare the error propagation performance of codes based on the new construction with that of prior art codes.
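The character-constrained channel can be illustrated with a small sketch that enumerates the allowed q-bit characters and computes the highest rate such a channel supports. The names `allowed_characters` and `max_rate` are hypothetical. Which characters are disallowed is also an assumption: the sketch removes the w smallest values, following the rule dec(u) ≥ w quoted in one of the citing excerpts on this page.

```python
from math import log2


def allowed_characters(q, w):
    """List the q-bit characters (as bit strings) whose decimal value is at least w.

    Assumes the w disallowed characters are those with values 0 .. w-1.
    """
    return [format(value, f"0{q}b") for value in range(w, 2 ** q)]


def max_rate(q, w):
    """Upper limit on the code rate, in user bits per channel bit.

    With 2**q - w characters available and no other restriction, each
    character can carry at most log2(2**q - w) bits.
    """
    return log2(2 ** q - w) / q
```

For example, with q = 4 and w = 1 only the all-zero nibble is excluded, leaving 15 characters. A run of zeros can then span at most the q - 1 trailing zeros of one nibble plus the q - 1 leading zeros of the next, which is how a character constraint translates into a maximum runlength constraint.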
Book
Preface to the Second Edition
About five years after the publication of the first edition, it was felt that an update of this text would be inescapable as so many relevant publications, including patents and survey papers, have been published. The author's principal aim in writing the second edition is to add the newly published coding methods, and discuss them in the context of the prior art. As a result about 150 new references, including many patents and patent applications, most of them younger than five years old, have been added to the former list of references. Fortunately, the US Patent Office now follows the European Patent Office in publishing a patent application after eighteen months of its first application, and this policy clearly adds to the rapid access to this important part of the technical literature.
I am grateful to many readers who have helped me to correct (clerical) errors in the first edition and also to those who brought new and exciting material to my attention. I have tried to correct every error that I found or was brought to my attention by attentive readers, and seriously tried to avoid introducing new errors in the Second Edition.
China is becoming a major player in the art of constructing, designing, and basic research of electronic storage systems. A Chinese translation of the first edition was published in early 2004. The author is indebted to Prof. Xu, Tsinghua University, Beijing, for taking the initiative for this Chinese version, and also to Mr. Zhijun Lei, Tsinghua University, for undertaking the arduous task of translating this book from English to Chinese. Clearly, this translation makes it possible that a billion more people will now have access to it.
Kees A. Schouhamer Immink
Rotterdam, November 2004
Article
In 1986, Don Knuth published a very simple algorithm for constructing sets of bipolar codewords with equal numbers of ones and zeros, called balanced codes. Knuth's algorithm is well suited for use with large codewords. The redundancy of Knuth's balanced codes is a factor of two larger than that of a code comprising the full set of balanced codewords. In this paper, we will present results of our attempts to improve the performance of Knuth's balanced codes.
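The core step of Knuth's algorithm can be sketched in a few lines: invert the first i bits of an even-length word, trying i = 0, 1, 2, ... until the word is balanced. Such an index always exists because the weight changes by exactly one per step and moves from w to n - w. The function names `knuth_balance` and `knuth_restore` are hypothetical, and the sketch returns the index as a plain integer, whereas Knuth's scheme conveys it to the decoder in a separate balanced prefix, which is omitted here.

```python
def knuth_balance(word):
    """Balance an even-length 0/1 word by inverting its first `index` bits.

    Returns (index, balanced_word), using the smallest index that works.
    """
    n = len(word)
    if n % 2:
        raise ValueError("word length must be even")
    for index in range(n + 1):
        candidate = [1 - b for b in word[:index]] + list(word[index:])
        if 2 * sum(candidate) == n:
            return index, candidate
    raise AssertionError("unreachable: a balancing index always exists")


def knuth_restore(index, balanced_word):
    """Undo the balancing step by inverting the first `index` bits again."""
    return [1 - b for b in balanced_word[:index]] + list(balanced_word[index:])
```

The redundancy mentioned in the abstract comes from the prefix that carries the index, which is why this step alone does not show the factor-of-two gap to the full set of balanced codewords.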
Conference Paper
General construction methods of prefix synchronized codes and runlength limited codes are presented, which make use of so-called sequence replacement techniques. These techniques provide a simple and efficient conversion of data words into codewords of a constrained block-code, where subsequences violating the imposed constraints are replaced by encoded information to indicate their relative positions in the data word. Several constructions are proposed for constrained codes with low error propagation, and for variable length constrained codes. The coding algorithms have a low computational and hardware complexity. The rate of the constructed codes approaches the theoretical maximum. It is feasible to apply these high rate constrained block codes in communication and recording systems.
Article
This paper proposes a general and systematic code design method to efficiently combine constrained codes with parity-check (PC) codes for optical recording. The proposed constrained PC code includes two component codes: the normal constrained (NC) code and the parity-related constrained (PRC) code. They are designed based on the same finite state machine (FSM). The rates of the designed codes are only a few tenths of a percent below the theoretical maximum. The PC constraint is defined by the generator matrix (or generator polynomial) of a linear binary PC code, which can detect any type of dominant error events or error event combinations of the system. Error propagation due to parity bits is avoided, since both component codes are protected by PCs. Two approaches are proposed to design the code in the non-return-to-zero-inverse (NRZI) format and the non-return-to-zero (NRZ) format, respectively. Designing the codes in NRZ format may reduce the number of parity bits required for error detection and simplify post-processing for error correction. Examples of several newly designed codes are illustrated. Simulation results with the Blu-ray Disc (BD) systems show that the new d = 1 constrained 4-bit PC code significantly outperforms the rate 2/3 code without parity, at both nominal density and high density.
Article
We report on a class of high-rate dc-free codes, called multimode codes, where each source word can be represented by a codeword taken from a selection set of codeword alternatives. Conventional multimode codes will be analyzed using a simple mathematical model. The criterion used to select the "best" codeword from the selection set available has a significant bearing on the performance. Various selection criteria are introduced and their effect on the performance of multimode codes will be examined.
Article
The perpendicular magnetic recording (PMR) in hard disk drives is approaching its physical limitation. The emerging technologies, such as heat assisted magnetic recording and microwave assisted magnetic recording have been proposed to record on magnetic media with thermally stable smaller size grains at higher areal density (AD). However, in the media fabrication, achieving well-isolated small size of grains is more challenging than obtaining high ${K_{u}}$ material as recording media. Reducing the number of grains per bit is a major path for keeping AD growth of PMR in recent years. To minimize the SNR penalty at a smaller grain number per bit, pushing more on track density is the right approach. With the 2-D magnetic recording (TDMR) readers for inter-track interference cancellation, the off-track read capability is improved significantly for allowing a narrower track read. In the drive working environment, when the external vibration or other mechanical disturbance happens during the writing process, it creates more track squeeze at adjacent tracks and leaves a very narrow track at some locations of the track. When the track width is narrower than the squeeze to death width in the 747 curve, it causes hard failure in the channel. To solve the track squeeze problem, this paper proposes to add an additional magnetic recording layer in between the data recording layer and the soft underlayer of conventional PMR media. This additional recording layer is used to record servo information only. The continuous positioning error signal is able to improve the servo performance and to provide the real-time monitoring of the positioning error. When it is under bad servo conditions, the writing process can be stopped to avoid nontolerable track squeeze. The continuous servo signals are designed to be of moderate intensity at very low frequency, and its impact on data signal has been minimized. 
The linear density gap between the dedicated servo media and the conventional PMR media can be controlled within 3%. As the dedicated servo system keeps only around 100 wedges of track ID and sector ID at the data layer, the surface area saving at the data layer can break even in capacity. The dedicated servo technology together with TDMR readers is the key technology to achieve ultrahigh track density during both writing and reading processes.
Conference Paper
The paper gives a survey of spectrum shaping codes used for digital recording systems. This class of codes belongs to the broader class of modulation codes, which are widely used in recording systems for adjusting the source characteristics to the characteristics of the recording channel. The Shannon noiseless capacities of recording channels are considered, as well as the spectra of maxentropic sequences of M-ary recording constraints. In addition, some practical encoding and decoding schemes are discussed.
Article
The sequence replacement technique converts an input sequence into a constrained sequence in which a prescribed subsequence is forbidden to occur. Several coding algorithms are presented that use this technique for the construction of maximum run-length limited sequences. The proposed algorithms show how all forbidden subsequences can be successively or iteratively removed to obtain a constrained sequence and how special subsequences can be inserted at predefined positions in the constrained sequence to represent the indices of the positions where the forbidden subsequences were removed. Several modifications are presented to reduce the impact of transmission errors on the decoding operation, and schemes to provide error control are discussed as well. The proposed algorithms can be implemented efficiently, and the rates of the constructed codes are close to their theoretical maximum. As such, the proposed algorithms are of interest for storage systems and data networks.
Conference Paper
At the extremely high-linear-density regime, where +-+ appears to be the dominant error event at the detector output, two complementary methods are presented to improve the performance of iterative detection/decoding schemes based on LDPC codes. First, without increasing complexity, noticeable performance gains can be obtained by using a 1/(1 ⊕ D) precoder for the partial response channel. Second, selecting a longer partial response target provides substantial additional gains.
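The 1/(1 ⊕ D) precoder named above is a running XOR: each output bit is the current input bit XORed with the previous output bit. A minimal sketch follows; the function names and the `initial` state parameter are hypothetical, and the inverse operation (1 ⊕ D) is included to show that precoding is reversible.

```python
def precode(bits, initial=0):
    """Apply a 1/(1 XOR D) precoder: y[k] = x[k] XOR y[k-1], with y[-1] = initial."""
    out, prev = [], initial
    for b in bits:
        prev ^= b
        out.append(prev)
    return out


def unprecode(bits, initial=0):
    """Apply the inverse filter (1 XOR D): x[k] = y[k] XOR y[k-1]."""
    out, prev = [], initial
    for b in bits:
        out.append(b ^ prev)
        prev = b
    return out
```

Because the inverse only combines two adjacent bits, a single bit error in the precoded sequence affects at most two bits after `unprecode`.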
Article
The proper functioning of a digital video recorder is largely governed by the selection of a suitable channel code in conjunction with both the detection method and the tracking system. The channel code for the digital video cassette recorder (DVCR) satisfies a variety of design requirements. The paper provides an overview of those requirements. A detailed description is given of the construction of the new channel code, called 24→25 code, that complies with the given constraints and involves only a minor drawback in terms of the overhead needed. The servo position information is recorded as low-frequency components, pilot tracking tones, which are embedded in the recorded stream of binary digits.
Article
A general method of constructing run-length limited (d, k) constrained codes from arbitrary sequences is introduced. This method is then combined with the method of guided scrambling for constructing a class of weakly constrained codes. The proposed codes are analyzed for the case of d=0 and are shown to give results which are better or comparable to those of the best available codes, however, at the cost of failure with some very low probability. For d>0, the code efficiency of the codes constructed according to the proposed method reduces significantly.
Article
The technique introduced has relatively simple encoding and decoding procedures which can be implemented at the high bit rates used in optical fiber communication systems. Because it is similar to the established technique of self-synchronizing scrambling but is also capable of guiding the scrambling process to produce a balanced encoded bit stream, the technique is called guided scrambling (GS). The concept of GS coding is explained, and design parameters which ensure good line code characteristics are discussed. The performance of a number of guided scrambling configurations is reported in terms of maximum consecutive like-encoded bits, encoded stream disparity, decoder error extension, and power spectral density of the encoded signal. Comparison of guided scrambling with conventional line code techniques indicates a performance which approaches that of alphabetic lookup table codes with an implementation complexity similar to that of current nonalphabetic coding techniques.
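The guided scrambling idea can be sketched as follows: each source word is augmented with r extra bits in every possible way, each augmented word is scrambled, and the candidate that keeps the running digital sum (RDS) closest to zero is selected. This is a simplified illustration under stated assumptions, not the configuration studied in the paper: the scrambler is a running XOR (division by 1 ⊕ D) chosen for brevity, its state is reset for every word, and all function names are hypothetical.

```python
from itertools import product


def scramble(bits):
    """Running-XOR scrambler (assumed polynomial 1 XOR D), state reset to 0."""
    out, prev = [], 0
    for b in bits:
        prev ^= b
        out.append(prev)
    return out


def descramble(bits):
    """Inverse of `scramble`: XOR of each bit with its predecessor."""
    out, prev = [], 0
    for b in bits:
        out.append(b ^ prev)
        prev = b
    return out


def disparity(bits):
    """Number of ones minus number of zeros in a word."""
    return sum(1 if b else -1 for b in bits)


def gs_encode(source, rds=0, r=1):
    """Pick the scrambled candidate that brings the running digital sum closest to 0."""
    candidates = [scramble(list(prefix) + list(source))
                  for prefix in product((0, 1), repeat=r)]
    return min(candidates, key=lambda c: abs(rds + disparity(c)))


def gs_encode_stream(words, r=1):
    """Encode a list of source words, tracking the RDS across codewords."""
    rds, coded = 0, []
    for word in words:
        codeword = gs_encode(word, rds, r)
        rds += disparity(codeword)
        coded.append(codeword)
    return coded


def gs_decode(codeword, r=1):
    """Descramble and drop the r augmenting bits; no side information is needed."""
    return descramble(codeword)[r:]
```

The decoder does not need to know which candidate was chosen: descrambling recovers the augmenting bits along with the data, and they are simply discarded. Other selection criteria, such as the longest run of like bits, could replace the RDS rule in `gs_encode`.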
Article
A balanced code with k information bits and r check bits is a binary code of length n = k + r and cardinality 2^k such that the number of 1s in each code word is equal to [n/2]. This paper describes the design of efficient balanced codes with parallel encoding and parallel decoding. In this case, since area and delay of such circuits are critical factors, another parameter is introduced in the definition of balanced code: the "number of balancing functions used in the code design", p. Parallel encoding and decoding algorithms independent from the chosen balancing method are given and these can be implemented by a VLSI circuit of size O(pk) and depth O(log p). This paper also presents a new balancing method: the permutation method, which, for infinitely many values of k (such as k = 8, 10, 20, 22, 32, 34, ...) is more efficient than Knuth's complementation method. This new method results in efficient balanced codes with k information bits, k even, r = 2[k/12] + 2 check bits and p = 6 balancing functions. Further, Knuth's complementation method is generalized to obtain efficient code designs for any value of the parameters k, r, and p, provided that $k \le 2\sum_{i=0}^{m}\binom{r}{i} + p(r-2m-1)[(kr+k+r) \bmod 2]$, where m is such that $\binom{r}{m-1} < p \le \binom{r}{m}$.
Article
A general method of constructing (d, k) constrained codes from arbitrary sequences is introduced. This method is then combined with the method of guided scrambling for constructing a class of weakly constrained codes. The proposed codes are analyzed for the case of d = 0 and shown to give results which are better or comparable to those of the best available codes, however at the cost of failure with some very low probability.
P. McEwen, K. Fitzpatrick, and B. Zafar, "High Rate Runlength Limited Codes for 10-bit ECC Symbols," US Patent 6,259,384, July 2001.
M. A. McClellan, "Runlength-limited Code and Method," US Patent 6,285,302, Sept. 2001.
J. H. Weber and K. A. S. Immink, "Knuth's Balancing of Codewords Revisited," IEEE Trans. Inform. Theory, vol. 56, no. 4, pp. 1673-1679, 2010.
P. McEwen, B. Zafar, and K. Fitzpatrick, "High Rate Runlength Limited Codes for 8-bit ECC Symbols," US Patent 6,201,485, March 2001.