# Codes for digital recorders

## Abstract

Constrained codes are a key component in digital recording devices that have become ubiquitous in computer data storage and electronic entertainment applications. This paper surveys the theory and practice of constrained coding, tracing the evolution of the subject from its origins in Shannon's classic 1948 paper to present-day applications in high-density digital recorders. Open problems and future research directions are also addressed.
2260 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998
Kees A. Schouhamer Immink, Fellow, IEEE, Paul H. Siegel, Fellow, IEEE, and Jack K. Wolf, Fellow, IEEE
(Invited Paper)
Index Terms—Constrained channels, modulation codes, recording codes.
I. INTRODUCTION
As has been observed by many authors, the storage and
retrieval of digital information is a special case of digital
communications. To quote E. R. Berlekamp [18]:
Communication links transmit information from here to
there. Computer memories transmit information from
now to then.
Thus as information theory provides the theoretical under-
pinnings for digital communications, it also serves as the
foundation for understanding fundamental limits on reliable
digital data recording, as measured in terms of data rate and
storage density.
A block diagram which depicts the various steps in record-
ing and recovering data in a storage system is shown in Fig. 1.
This “Fig. 1” is essentially the same as the well-known Fig. 1
used by Shannon in his classic paper [173] to describe a
general communication system, but with the configuration of
codes more explicitly shown.
As in many digital communication systems, a concatenated
approach to channel coding has been adopted in data recording,
consisting of an algebraic error-correcting code in cascade
with a modulation code. The inner modulation code, which
is the focus of this paper, serves the general function of
matching the recorded signals to the physical channel and
to the signal-processing techniques used in data retrieval,
while the outer error-correction code is designed to remove
Manuscript received December 10, 1997; revised June 5, 1998. The work
of P. H. Siegel was supported in part by the National Science Foundation
under Grant NCR-9612802. The work of J. K. Wolf was supported in part by
the National Science Foundation under Grant NCR-9405008.
K. A. S. Immink is with the Institute of Experimental Mathematics,
University of Essen, 45326 Essen, Germany.
P. H. Siegel and J. K. Wolf are with the University of California at San
Diego, La Jolla, CA 92093-0407 USA.
Publisher Item Identifier S 0018-9448(98)06735-2.
Fig. 1. Block diagram of digital recording system.
any errors remaining after the detection and demodulation
process. (See [41] in this issue for a survey of applications
of error-control coding.)
As we will discuss in more detail in the next section, a
recording channel can be modeled, at a high level, as a linear,
intersymbol-interference (ISI) channel with additive Gaussian
noise, subject to a binary input constraint. The combination
of the ISI and the binary input restriction has presented a
challenge in the information-theoretic performance analysis
of recording channels, and it has also limited the applica-
bility of the coding and modulation techniques that have
been overwhelmingly successful in communication over linear
Gaussian channels. (See [56] in this issue for a comprehensive
discussion of these methods.)
The development of signal processing and coding tech-
niques for recording channels has taken place in an environ-
ment of escalating demand for higher data transfer rates and
storage capacity—magnetic disk drives for personal computers
today operate at astonishing data rates on the order of 240
million bits per second and store information at densities of
up to 3 billion bits per square inch—coupled with increasingly
severe constraints on hardware complexity and cost.
The needs of the data storage industry have not only fostered
innovation in practical code design, but have also spurred the
development of a rigorous mathematical foundation for the
theory and implementation of constrained codes. They have
also stimulated advances in the information-theoretic analysis
of input-constrained, noisy channels.
In this paper, we review the progress made during the past
50 years in the theory and practical design of constrained
modulation codes for digital data recording. Along the way, we
will highlight the fact that, although Shannon did not mention
storage in his classic two-part paper whose golden anniversary we celebrate in this issue—indeed, random-access storage as we know it today did not exist at the time—a large number of fundamental results and techniques relevant to coding for storage were introduced in his seminal publication. We will also survey emerging directions in data-storage technology, and discuss the new challenges in information theory that they offer.

The outline of the remainder of the paper is as follows. In Section II, we present background on magnetic-recording channels. Section II-A gives a basic description of the physical recording process and the resulting signal and noise characteristics. In Section II-B, we discuss mathematical models that capture essential features of the recording channel, and we review information-theoretic bounds on the capacity of these models. In Section II-C, we describe the signal-processing and -detection techniques that have been most widely used in commercial digital-recording systems.

In Section III-A, we introduce the input-constrained (noiseless) recording channel model, and we examine certain time-domain and frequency-domain constraints that the channel input sequences must satisfy to ensure successful implementation of the data-detection process. In Section III-B, we review Shannon's theory of input-constrained noiseless channels, including the definition and computation of capacity, the determination of the maxentropic sequence measure, and the fundamental coding theorem for discrete noiseless channels.

In Section IV, we discuss the problem of designing efficient, invertible encoders for input-constrained channels. As in the case of coding for noisy communication channels, this is a subject about which Shannon had little to say. We will summarize the substantial theoretical and practical progress that has been made in constrained modulation code design.
In Section V, we present coded-modulation techniques that have been developed to improve the performance of noisy recording channels. In particular, we discuss families of distance-enhancing constrained codes that are intended for use with partial-response equalization and various types of sequence detection, and we compare their performance to estimates of the noisy channel capacity.

In Section VI, we give a compendium of modulation-code constraints that have been used in digital recorders, describing in more detail their time-domain, frequency-domain, and statistical properties.

In Section VII, we indicate several directions for future research in coding for digital recording. In particular, we consider the incorporation of improved channel models into the design and performance evaluation of modulation codes, as well as the invention of new coding techniques for exploratory information-storage technologies, such as nonsaturation recording using multilevel signals, multitrack recording and detection, and multidimensional page-oriented storage.

Finally, in Section VIII, we close the paper with a discussion of Shannon's intriguing, though somewhat cryptic, remarks pertaining to the existence of crossword puzzles, and make some observations about their relevance to coding for multidimensional constrained recording channels. Section IX briefly summarizes the objectives and contents of the paper.

II. BACKGROUND ON DIGITAL RECORDING

The history of signal processing in digital recording systems can be cleanly broken into two epochs. From 1956 until approximately 1990, direct-access storage devices relied upon "analog" detection methods, most notably peak detection. Beginning in 1990, the storage industry made a dramatic shift to "digital" techniques, based upon partial-response equalization and maximum-likelihood sequence detection, an approach that had been proposed 20 years earlier by Kobayashi and Tang [130], [131], [133].
To understand how these signal-processing methods arose, we review a few basic facts about the physical process underlying digital magnetic recording. (Readers interested in the corresponding background on optical recording may refer to [25], [84], [102, Ch. 2], and [163].) We distill from the physics several mathematical models of the recording channel, and describe upper and lower bounds on their capacity. We then present in more detail the analog and digital detection approaches, and we compare them to the optimal detector for the uncoded channel.

A. Digital Recording Basics

The magnetic material contained on a magnetic disk or tape can be thought of as being made up of a collection of discrete magnetic particles or domains which can be magnetized by a write head in one of two directions. In present systems, digital information is stored along paths, called tracks, in this magnetic medium. We store binary digits on a track by magnetizing these particles or domains in one of two directions. This method is known as "saturation" recording.

The stored binary digits usually are referred to as "channel bits." Note that the word "bit" is used here as a contraction of the words "binary digit," not as a measure of information. In fact, we will see that when coding is introduced, each channel bit represents only a fraction of a bit of user information. The modifier "channel" in "channel bits" emphasizes this difference.

We will assume a synchronous storage system in which the channel bits occur at the fixed rate of $1/T$ channel bits per second; thus $T$ is the duration of a channel bit. In all magnetic-storage systems used today, the magnetic medium and the read/write transducer (referred to as the read/write head) move with respect to each other. If the relative velocity of a track and the read/write head is constant, the constant time-duration $T$ of the bit translates to a constant linear channel-bit density, reflected in a fixed length along the track corresponding to a channel bit.
The normalized input signal applied to the recording transducer (write head) in this process can be thought of as a two-level waveform which assumes the values $+1$ and $-1$ over consecutive time intervals of duration $T$. In the waveform, the transitions from one level to another, which effectively carry the digital information, are therefore constrained to occur at integer multiples of the time period $T$, and we can describe the waveform digitally as a sequence $\{a_i\}$ over the bipolar alphabet $\{+1,-1\}$, where $a_i$ is the signal amplitude in the $i$th time interval.

In the simplest model, the input–output relationship of the digital magnetic recording channel can be viewed as linear. Denote by $s(t)$ the output signal (readback voltage), in the absence of noise, corresponding to a single transition from, say, $-1$ to $+1$ at time $t=0$. Then, the output signal $y(t)$ generated by the waveform represented by the sequence $\{a_i\}$ is given by

$$y(t) = \sum_i c_i\, s(t - iT) \eqno{(1)}$$

with $c_i = (a_i - a_{i-1})/2$. Note that the "derivative" sequence of coefficients $\{c_i\}$ consists of elements taken from the ternary alphabet $\{-1,0,+1\}$, and the nonzero values, corresponding to the transitions in the input signal, alternate in sign.

A frequently used model for the transition response is the function

$$s(t) = \frac{1}{1 + \left(2t/PW_{50}\right)^2}$$

often referred to as the Lorentzian model for an isolated-step response. The parameter $PW_{50}$ is an abbreviation for "pulsewidth at 50% maximum amplitude," the width of the pulse measured at 50% of its maximum height. A Lorentzian step response is shown in Fig. 2.

Fig. 2. Lorentzian channel step response.

The output signal is therefore the linear superposition of time-shifted Lorentzian pulses with coefficients of unit magnitude and alternating polarity. For this channel, sometimes called the differentiated Lorentzian channel, the frequency response follows from the Fourier transform of the Lorentzian pulse; its magnitude is shown in Fig. 3.

The simplest model for channel noise assumes that the noise is additive white Gaussian noise (AWGN).
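The noiseless superposition model (1) with a Lorentzian transition response can be sketched in a few lines of code. This is a minimal illustration, not code from the paper; the normalization to unit peak amplitude, the initial magnetization of $-1$, and the choice $PW_{50} = 2T$ are assumptions made for the example.

```python
def lorentzian(t, pw50):
    """Isolated-transition (step) response 1 / (1 + (2t/PW50)^2), unit peak."""
    return 1.0 / (1.0 + (2.0 * t / pw50) ** 2)

def readback(a, t, T=1.0, pw50=2.0):
    """Noiseless readback y(t) for a bipolar channel sequence a (entries +/-1),
    formed as the superposition (1) of shifted transition responses."""
    y, prev = 0.0, -1.0              # assumed initial magnetization
    for i, ai in enumerate(a):
        c = (ai - prev) / 2.0        # ternary "derivative" coefficient in {-1, 0, +1}
        y += c * lorentzian(t - i * T, pw50)
        prev = ai
    return y
```

A single transition at $t=0$ reproduces the isolated Lorentzian pulse, and each subsequent transition contributes a pulse of the opposite polarity, as described above.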
That is, the readback signal takes the form $r(t) = y(t) + n(t)$, where $n(t)$ is a zero-mean, white Gaussian noise process.

There are, of course, far more accurate and sophisticated models of a magnetic-recording system. These models take into account the failure of linear superposition, asymmetries in the positive and negative step responses, and other nonlinear phenomena in the readback process. There are also advanced models for media noise, incorporating the effects of material defects, thermal asperities, data dependence, and adjacent-track interference. For more information on these, we direct the reader to [20], [21], [32], and [33].

B. Channel Models and Capacity

The most basic model of a saturation magnetic-recording system is a binary-input, linear, intersymbol-interference (ISI) channel with AWGN, shown in Fig. 4.

Fig. 3. Differentiated Lorentzian channel frequency response magnitude.

Fig. 4. Continuous-time recording channel model.

This model has been, and continues to be, widely used in comparing the theoretical performance of competing modulation, coding, and signal-processing systems. During the past decade, there has been considerable research effort devoted to finding the capacity of this channel. Much of this work was motivated by the growing interest in digital recording among the information and communication theory communities [36], [37]. In this section, we survey some of the results pertaining to this problem. As the reader will observe, the analysis is limited to rather elementary channel models; the extension to more advanced channel models represents a major open research problem.

1) Continuous-Time Channel Models: Many of the bounds we cite were first developed for the ideal, low-pass filter channel model. These are then adapted to the more realistic differentiated Lorentzian ISI model. For a given channel, let $C_{av}$ denote the capacity with a constraint $P$ on the average input power.
Let $C_{pk}$ denote the capacity with a peak power constraint $P$. Finally, let $C_b$ denote the capacity with binary input levels $\pm\sqrt{P}$. It is clear that

$$C_b \le C_{pk} \le C_{av}.$$

The following important result, due to Ozarow, Wyner, and Ziv [159], states that the first inequality is, in fact, an equality under very general conditions on the channel ISI.

Peak-Power Achievable Rate Lemma: For the channel shown in Fig. 4, if the channel impulse response is square integrable, then any rate achievable using waveforms satisfying the peak constraint $|x(t)| \le \sqrt{P}$ is achievable using the constrained two-level waveforms $x(t) \in \{-\sqrt{P}, +\sqrt{P}\}$.

We now exploit this result to develop upper and lower bounds on the capacity $C_b$. Consider, first, a continuous-time, bandlimited, additive Gaussian noise channel with transfer function

$$H(f) = \begin{cases} 1, & |f| \le W \\ 0, & \text{otherwise.} \end{cases}$$

Assume that the noise has (double-sided) spectral density $N_0/2$. Let $N = N_0 W$ be the total noise power in the channel bandwidth. Shannon established the well-known and celebrated formula for the capacity of this channel, under the assumption of an average power constraint $P$ on the channel input signals. We quote from [173]:

Theorem 17: The capacity of a channel of band $W$ perturbed by white thermal noise of power $N$ when the average transmitter power is limited to $P$ is given by

$$C = W \log \frac{P+N}{N}. \eqno{(2)}$$

(We have substituted our notation for Shannon's to avoid confusion.)

This result is a special case of the more general "water-filling" theorem for the capacity of an average input-power constrained channel with transfer function $H(f)$ and noise power spectral density $N(f)$ [74, p. 388]:

$$C = \frac{1}{2}\int_{F} \log \frac{B\,|H(f)|^2}{N(f)}\, df$$

where $F$ denotes the range of frequencies in which $B\,|H(f)|^2 \ge N(f)$, and $B$ satisfies the equation

$$\int_{F} \left( B - \frac{N(f)}{|H(f)|^2} \right) df = P.$$

By the peak-power achievable rate lemma, this result provides an upper bound on the capacity of the recording channel. Applications of this bound to a parameterized channel model are presented in [70]. An improved upper bound on the capacity of the low-pass AWGN channel was developed by Shamai and Bar-David [171].
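Theorem 17 is straightforward to evaluate numerically. The following sketch, with bandwidth and powers chosen arbitrarily for illustration, computes the capacity of formula (2) in bits per second by taking the logarithm to base 2.

```python
from math import log2

def shannon_capacity(W, P, N):
    """Capacity (bits/s) of an ideal bandlimited AWGN channel per (2):
    C = W * log2((P + N) / N)."""
    return W * log2((P + N) / N)
```

For example, 1 Hz of bandwidth at an SNR of $P/N = 3$ yields exactly 2 bits/s, and doubling the bandwidth at a fixed SNR doubles the capacity.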
This bound is a refinement of the water-filling upper bound, based upon a characterization of the power spectral density of any unit process, meaning a zero-mean, stationary, two-level continuous-time random process [175]. For a specified input power spectral density, a Gaussian input distribution maximizes the achievable information rate. Therefore, for a given channel transfer function $H(f)$, the peak-power capacity is upper-bounded by the water-filling rate maximized over all unit-process power spectral densities. In [171], an approximate solution to this optimization problem for the ideal low-pass filter was used to prove that peak-power limiting on the bandlimited channel does indeed reduce capacity relative to the average-power constrained channel. This bounding technique was applied to the differentiated Lorentzian channel with additive colored Gaussian noise in [207].

We now consider lower bounds to the capacity $C_b$. Shannon [173] considered the capacity of a peak-power input constraint on the ideal bandlimited AWGN channel, noting that "a constraint of this type does not work out as well mathematically as the average power limitation." Nevertheless, he provided a lower bound, quoted below:

Theorem 20: The channel capacity for a band $W$ perturbed by white thermal noise of power $N$ is bounded by

$$C \ge W \log \frac{2}{\pi e^3}\,\frac{P+N}{N}$$

where $P$ is the peak allowed transmitter power.

(We have substituted our notation for Shannon's to avoid confusion.)

In [159], the peak-power achievable rate lemma was used to derive a lower bound, in nats/s, on $C_b$ for the ideal, binary-input constrained, bandlimited channel. A lower bound for the more accurate channel model comprising a cascade of a differentiator and an ideal low-pass filter was also determined. In both cases, the discrepancy between the lower bounds and the water-filling upper bounds represents an effective signal-to-noise ratio (SNR) difference of about 7.6 dB at high signal-to-noise ratios.
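A recurring reference quantity in this circle of results is the capacity of a binary input-constrained, memoryless Gaussian channel, which also reappears below in the discrete-time dicode bounds. It has no closed form, but it is easy to evaluate by numerically integrating the output differential entropy. The sketch below is an illustration under assumed equiprobable $\pm 1$ signaling, not code from the paper.

```python
from math import e, exp, log2, pi, sqrt

def binary_awgn_capacity(sigma, lim=30.0, n=100000):
    """Capacity (bits per channel use) of the +/-1-input memoryless Gaussian
    channel, computed as h(Y) - h(Y|X) by midpoint-rule integration."""
    def gauss(x):
        return exp(-x * x / (2.0 * sigma * sigma)) / sqrt(2.0 * pi * sigma * sigma)
    dy = 2.0 * lim / n
    h_out = 0.0
    for k in range(n):
        y = -lim + (k + 0.5) * dy
        p = 0.5 * (gauss(y - 1.0) + gauss(y + 1.0))   # output density, equiprobable inputs
        if p > 0.0:
            h_out -= p * log2(p) * dy                 # differential entropy h(Y)
    return h_out - 0.5 * log2(2.0 * pi * e * sigma * sigma)  # subtract h(Y|X)
```

At high SNR (small noise standard deviation `sigma`) the capacity approaches 1 bit per channel use; at low SNR it falls toward zero.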
Heegard and Ozarow [83] incorporated the differentiated Lorentzian channel model into a similar analysis. To obtain a lower bound, they optimize, with respect to the channel bit period, a water-filling-type inequality in which the input spectrum is the pulse power spectral density for the differentiated Lorentzian channel. Their results indicate that, just as for the low-pass channel and the differentiated low-pass channel, the difference in effective signal-to-noise ratios between upper and lower bounds on capacity is approximately 7.6 dB for large signal-to-noise ratios. The corresponding bound for the differentiated Lorentzian channel with additive colored Gaussian noise was determined in [207].

Shamai and Bar-David [171] developed an improved lower bound on $C_b$ by analyzing the achievable rate of a random telegraph wave, that is, a unit process with time intervals between transitions independently governed by an exponential distribution. Again, the corresponding bound for the differentiated Lorentzian channel with additive colored Gaussian noise was discussed in [207]. Bounds on capacity for a model incorporating slope limitations on the magnetization are addressed in [14].

Computational results for the differentiated Lorentzian channel with additive colored Gaussian noise are given in [207]. For channel densities in the range of current practical interest, the required SNR for arbitrarily low error rate was calculated. The gap between the best capacity bounds, namely, the unit-process upper bound and the random telegraph-wave lower bound, was found to be approximately 3 dB throughout the range.

2) Discrete-Time Channel Models: The capacity of discrete-time channel models applicable to digital recording has been addressed by several authors, for example, [193], [88], [87], and [172]. The capacity of an average input-power-constrained, discrete-time, memoryless channel with additive, independent and identically distributed (i.i.d.)
Gaussian noise is given by the well-known formula [74]

$$C = \frac{1}{2} \log\left(1 + \frac{P}{\sigma^2}\right)$$

where $\sigma^2$ is the noise variance and $P$ is the average input-power constraint. This result is the discrete-time equivalent of Shannon's formula (2) via the sampling theorem. Smith [180] showed that the capacity of an amplitude-constrained, discrete-time, memoryless Gaussian channel is achieved by a finite-valued random variable, representing the input to the channel, whose distribution is uniquely determined by the input constraint. (Note that, unlike the case of an average input-power constraint, this result cannot be directly translated to the continuous-time model.)

Shamai, Ozarow, and Wyner [172] established upper and lower bounds on the capacity of the discrete-time Gaussian channel with ISI and stationary inputs. We will encounter in the next section a discrete-time ISI model of the magnetic-recording channel with system polynomial $h(D) = \frac{1}{2}(1-D^2)$; this channel decomposes into a pair of interleaved "dicode" channels, each with system polynomial $\frac{1}{2}(1-D)$. In [172], the capacity upper bound was compared to upper and lower bounds on the maximum achievable information rate for the normalized dicode channel model with binary input levels $\pm 1$. Both of these bounds (3) are expressed in terms of the capacity of a binary input-constrained, memoryless Gaussian channel: the upper bound is simply the capacity of the latter channel at the channel SNR, and the lower bound is that capacity evaluated at half the SNR. These upper and lower bounds thus differ by 3 dB, as was the case for the continuous-time channel models.

For other results on capacity estimates of recording-channel models, we refer the reader to [14] and [149]. The general problem of computing, or developing improved bounds for, the capacity of discrete-time ISI models of recording channels remains a significant challenge.

C. Detectors for Uncoded Channels

Forney [53] derived the optimal sequence detector for an uncoded, linear, intersymbol-interference channel with additive white Gaussian noise.
This detection method, the well-known maximum-likelihood sequence detector (MLSD), comprises a whitened matched filter, whose output is sampled at the symbol rate, followed by a Viterbi detector whose trellis structure reflects the memory of the ISI channel. For the differentiated Lorentzian channel model, as for many communication channel models, this detector structure would be prohibitively complex to implement, requiring an unbounded number of states in the Viterbi detector. Consequently, suboptimal detection techniques have been implemented.

As mentioned at the start of this section, most storage devices did not even utilize sampled detection methods until the start of this decade, relying upon equalization to mitigate effects of ISI, coupled with analog symbol-by-symbol detection of waveform features such as peak positions and amplitudes. Since the introduction of digital signal-processing techniques in recording systems, partial-response equalization and Viterbi detection have been widely adopted. They represent a practical compromise between implementability and optimality, with respect to the MLSD. We now briefly summarize the main features of these detection methods.

1) Peak Detection: The channel model described above is accurate at relatively low linear densities, where the noise is generated primarily in the readback electronics. Provided that the density of transitions and the noise variance are small enough, the locations of peaks in the output signal will closely correspond to the locations of the transitions in the recorded input signal. With a synchronous clock of period $T$, one could then, in principle, reconstruct the ternary sequence $\{c_i\}$ and the recorded bipolar sequence $\{a_i\}$. The detection method used to implement this process in the potentially noisy digital recording device is known as peak detection, and it operates roughly as follows.
The peak detector differentiates the rectified readback signal and determines the time intervals in which zero crossings occur. In parallel, the amplitude of each corresponding extremal point in the rectified signal is compared to a prespecified threshold; if the threshold is not exceeded, the corresponding zero crossing is ignored. This ensures that low-amplitude, spurious peaks due to noise will be excluded from consideration. Those intervals in which the threshold is exceeded are designated as having a peak. The two-level recorded sequence is then reconstructed, with a transition in polarity corresponding to each interval containing a detected peak. Clock accuracy is maintained by an adaptive timing-recovery circuit—known as a phase-locked loop (PLL)—which adjusts the clock frequency and phase to ensure that the amplitude-qualified zero crossings occur, on average, in the center of their respective clock intervals.

2) PRML: Current high-density recording systems use a technique referred to as PRML, an acronym for "partial-response (PR) equalization with maximum-likelihood (ML) sequence detection." We now briefly review the essence of this technique in order to motivate the use of constrained modulation codes in PRML systems.

Kobayashi and Tang [133] proposed a digital communications approach to handling intersymbol interference in digital magnetic recording. In contrast to peak detection, their method reconstructed the recorded sequence from sample values of a suitably equalized readback signal, with the samples measured
at time instants $t = iT$. At moderate channel bit densities, the transfer characteristics of the Lorentzian model of the saturation recording channel (with a suitable time shift) closely resemble those of a linear filter with step response given by

$$p(t) = \operatorname{sinc}\left(\frac{t}{T}\right) + \operatorname{sinc}\left(\frac{t-T}{T}\right) \eqno{(4)}$$

where $\operatorname{sinc}(x) = \sin(\pi x)/(\pi x)$. Note that at the consecutive sample times $t = 0$ and $t = T$, the function has the value $1$, while at all other times which are multiples of $T$, the value is $0$. Through linear superposition (1), the output signal generated by the waveform represented by the bipolar sequence $\{a_i\}$ is given by

$$y(t) = \sum_i c_i\, p(t - iT)$$

which can be rewritten as

$$y(t) = \sum_i d_i \operatorname{sinc}\left(\frac{t}{T} - i\right)$$

where we set $d_i = c_i + c_{i-1}$. The transition response results in controlled intersymbol interference at sample times, leading to output-signal samples that, in the absence of noise, assume values in the set $\{-1, 0, +1\}$. Thus in the noiseless case, we can recover the recorded bipolar sequence $\{a_i\}$ from the output sample values, because the interference between adjacent transitions is prescribed. In contrast to the peak detection method, this approach does not require the separation of transitions.

Sampling provides a discrete-time version of this recording-channel model. Setting $y_i = y(iT)$, the input–output relationship is given by

$$y_i = \frac{a_i - a_{i-2}}{2}.$$

In $D$-transform notation, whereby a sequence $\{x_i\}$ is represented by $x(D) = \sum_i x_i D^i$, the input–output relationship becomes $y(D) = h(D)\,a(D)$, where the channel transfer function satisfies $h(D) = \frac{1}{2}(1 - D^2)$. This representation, called a partial-response channel model, is among those given a designation by Kretzmer [134] and tabulated by Kabal and Pasupathy [117].
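Under the unit-amplitude normalization used here, the sampled PR4 relationship $y_i = (a_i - a_{i-2})/2$ and its noiseless invertibility can be illustrated directly; the following sketch assumes, for the example, the initial channel state $a_{-1} = a_{-2} = -1$.

```python
def pr4_output(a):
    """Noiseless PR4 samples y_i = (a_i - a_{i-2}) / 2 for a bipolar input,
    i.e. transfer function h(D) = (1 - D^2)/2; samples lie in {-1, 0, +1}."""
    pad = [-1, -1] + list(a)                  # assumed initial channel state
    return [(pad[i] - pad[i - 2]) / 2 for i in range(2, len(pad))]

def pr4_recover(y):
    """Invert the channel in the absence of noise: a_i = 2*y_i + a_{i-2}."""
    pad = [-1, -1]
    for yi in y:
        pad.append(int(2 * yi) + pad[-2])
    return pad[2:]
```

Because the interference is confined to a known two-sample offset, the recursion in `pr4_recover` reconstructs the recorded bipolar sequence exactly from noiseless samples.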
The label assigned to it—"Class-4"—continues to be used in its designation, and the model is sometimes denoted "PR4."

For higher channel bit densities, Thapar and Patel [190] introduced a general class of partial-response models, with step-response functions generalizing (4). The corresponding input–output relationship takes the form $y(D) = h(D)\,a(D)$, where the discrete-time impulse response has the form

$$h(D) = \frac{1}{2}(1 - D)(1 + D)^n$$

with $n \ge 1$. The frequency response corresponding to $h(D)$ has a first-order null at zero frequency and a null of order $n$ at the Nyquist frequency, one-half the symbol frequency. Clearly, the PR4 model corresponds to $n = 1$. The channel models with $n \ge 2$ are usually referred to as "extended Class-4" models, and denoted by E$^{n-1}$PR4. The PR4, EPR4, and E$^2$PR4 models are used in the design of most magnetic disk drives today.

Models proposed for use in optical-recording systems have discrete-time impulse responses of the form $(1 + D)^n$, where $n \ge 1$. These models reflect the nonzero DC-response characteristic of some optical-recording systems, as well as their high-frequency attenuation. The models corresponding to $n = 1$ and $n = 2$ were also tabulated in [117], and are known as Class-1 (PR1) or "duobinary," and Class-2 (PR2), respectively. Recently, the models with $n \ge 3$ have been called "extended PR2" models, and denoted by E$^{n-2}$PR2. (See [203] for an early analysis and application of PR equalization.)

If the differentiated Lorentzian channel with AWGN is equalized to a partial-response target, the sampled channel model becomes

$$z_i = y_i + n_i$$

where $y(D) = h(D)\,a(D)$ and $\{n_i\}$ denotes the sampled, equalized noise sequence. Under the simplifying assumption that the noise samples are independent and identically distributed, and Gaussian—which is a reasonable assumption if the selected partial-response target accurately reflects the behavior of the channel at the specified channel bit density—the maximum-likelihood sequence detector determines the channel input–output pair $a(D)$ and $y(D)$, satisfying $y(D) = h(D)\,a(D)$, that minimizes the accumulated squared error $\sum_i (z_i - y_i)^2$ at each time. This computation can be carried out recursively, using the Viterbi algorithm.
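The Thapar–Patel family of partial-response polynomials is generated by repeated polynomial multiplication, which the following sketch carries out by convolving coefficient lists. This is illustrative code; the coefficient order is constant term first, and the $\frac{1}{2}$ normalization is omitted for clarity.

```python
def poly_mul(p, q):
    """Multiply two polynomials in D given as coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def extended_class4(n):
    """Coefficients of (1 - D)(1 + D)^n; n = 1 gives PR4, n = 2 gives EPR4."""
    h = [1, -1]                  # 1 - D
    for _ in range(n):
        h = poly_mul(h, [1, 1])  # multiply by (1 + D)
    return h
```

Here `extended_class4(1)` returns `[1, 0, -1]`, i.e. $1 - D^2$, and `extended_class4(2)` returns `[1, 1, -1, -1]`, i.e. $1 + D - D^2 - D^3$; the coefficients of every member sum to zero, reflecting the null at zero frequency.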
In fact, Kobayashi [130], [131] proposed the use of the Viterbi algorithm for maximum-likelihood sequence detection (MLSD) on a PR4 recording channel at about the same time that Forney [53] demonstrated its applicability to MLSD on digital communication channels with intersymbol interference.

Fig. 5. Trellis diagram for PR4 channel.

The operation of the Viterbi algorithm and its implementation complexity are often described in terms of the trellis diagram corresponding to $h(D)$ [53], [54], representing the time evolution of the channel input–output process. The trellis structure for the E$^{n-1}$PR4 channel has $2^{n+1}$ states. In the case of the PR4 channel, the input–output relationship permits the detector to operate independently on the output subsequences at even and odd time indices. The Viterbi algorithm can then be described in terms of a decoupled pair of two-state trellises, as shown in Fig. 5. There has been considerable effort applied to simplifying Viterbi detector architectures for use in high data-rate, digital-recording systems. In particular, there are a number of formulations of the PR4 channel detector. See [131], [178], [206], [211], [50], and [205].

Analysis, simulation, and experimental measurements have confirmed that PRML systems provide substantial performance improvements over RLL-coded, equalized peak detection. The benefits can be realized in the form of 3–5-dB additional noise immunity at linear densities where optimized peak-detection bit-error rates are in the typical operating range. Alternatively, the gains can translate into increased linear density—in that range of error rates, PR4-based PRML channels achieve 15–25% higher linear density than RLL-coded peak detection, with EPR4-based PRML channels providing an additional improvement of approximately 15% [189], [39]. The SNR loss of several PRML systems and MLSD relative to the matched-filter bound was computed in [190].
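Since the PR4 detector decouples into two independent two-state dicode trellises, Viterbi detection is easy to sketch on a single interleave. The following is an illustrative implementation, not code from the paper; it assumes the normalized dicode target $y_i = (a_i - a_{i-1})/2$, a starting state of $-1$, and unbounded path memory.

```python
def viterbi_dicode(z):
    """ML sequence detection on a 2-state dicode trellis with target
    y_i = (a_i - a_{i-1}) / 2; the state is the previous input symbol."""
    INF = float("inf")
    metric = {-1: 0.0, +1: INF}   # assume the trellis starts in state -1
    path = {-1: [], +1: []}
    for zi in z:
        new_metric, new_path = {}, {}
        for s in (-1, +1):        # s = candidate current input a_i
            best, best_prev = INF, -1
            for prev in (-1, +1):
                y = (s - prev) / 2.0               # noiseless branch label
                m = metric[prev] + (zi - y) ** 2   # squared-error branch metric
                if m < best:
                    best, best_prev = m, prev
            new_metric[s] = best
            new_path[s] = path[best_prev] + [s]
        metric, path = new_metric, new_path
    return path[min(metric, key=metric.get)]
```

On the noiseless sample sequence $1, -1, 0, 1$ the detector returns the input sequence $1, -1, -1, 1$, the unique path of metric zero from the assumed starting state.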
The results show that, with the proper choice of PR target for a given density, PRML performance can come within 1–2 dB of the MLSD. In [207], simulation results for MLSD and PR4-based PRML detection on a differentiated Lorentzian channel with colored Gaussian media noise were compared to some of the capacity bounds discussed in Section II-B. For channel densities in the range considered, PR4-based PRML required approximately 2–4 dB higher SNR than MLSD at a fixed bit-error rate. The SNR gap between MLSD and the telegraph-wave information-rate lower bound [171] was approximately 4 dB, and the gap from the unit-process upper bound [171] was approximately 7 dB. These results suggest that, through suitable coupling of equalization and coding, SNR gains as large as 6 dB over PR4-based PRML should be achievable. In Section V, we will describe some of the equalization and coding techniques that have been developed in an attempt to realize this gain.

III. SHANNON THEORY OF CONSTRAINED CHANNELS

In this section, we show how the implementation of recording systems based upon peak detection and PRML introduces the need for constraints to be imposed upon channel input sequences. We then review Shannon's fundamental results on the theory of constrained channels and codes.

A. Modulation Constraints

1) Runlength Constraints: At moderate densities, peak-detection errors may arise from ISI-induced shifting of peak locations and from drifting of clock phase due to an inadequate number of detected peaks. These problems are pattern-dependent, and the class of runlength-limited (RLL) sequences is intended to address them both [132], [101]. Specifically, in order to reduce the effects of pulse interference, one can demand that the derivative sequence $\{c_i\}$ of the channel input contain some minimum number, say $d$, of symbols of value zero between consecutive nonzero values.
Similarly, to prevent loss of clock synchronization, one can require that there be no more than some maximum number, say k, of symbols of value zero between consecutive nonzero values in the derivative sequence. In this context, we mention that two conventions are used to map a binary sequence to the magnetization pattern along a track, or equivalently, to the two-level channel sequence x = x_1 x_2 .... In one convention, called nonreturn-to-zero (NRZ), one direction of magnetization (x_n = +1) corresponds to a stored 1 and the other direction of magnetization (x_n = -1) corresponds to a stored 0. In the other convention, called nonreturn-to-zero-inverse (NRZI), a reversal of the direction of magnetization (x_n = -x_{n-1}) represents a stored 1 and a nonreversal (x_n = x_{n-1}) represents a stored 0. The NRZI precoding convention may be interpreted as a translation of the binary information sequence a = a_1 a_2 ... into another binary sequence b = b_1 b_2 ... that is then mapped by the NRZ convention to the two-level sequence x. The relationship between a and b is defined by b_n = a_n ⊕ b_{n-1}, where b_0 = 0 and ⊕ denotes addition modulo 2. It is easy to see that a_n = b_n ⊕ b_{n-1} and, therefore, that a_n = 1 exactly when x_n differs from x_{n-1}. Thus under the NRZI precoding convention, constraints on the runlengths of consecutive zero symbols in the derivative of the channel sequence are reflected in corresponding (d, k) constraints on the binary information sequences a. The set of sequences satisfying this constraint can be generated by reading the labels off the paths in the directed graph shown in Fig. 6.

Fig. 6. Labeled directed graph for the (d, k) constraint.

2) Constraints for PRML Channels: Two issues arise in the implementation of PRML systems that are related to properties of the recorded sequences. The first issue is that, just as in peak-detection systems, long runs of zero samples in the PR channel output can degrade the performance of the timing-recovery and gain-control loops. This dictates the use of a global constraint, denoted G, on the number of consecutive zero samples, analogous to the k constraint described above.
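Returning to the NRZI precoding convention described above, the relation can be sketched in a few lines of Python (notation mine): the precoder integrates the information bits modulo 2, and the inverse map recovers them by differencing.

```python
def nrzi_precode(a, b0=0):
    """NRZI precoder: b[n] = a[n] XOR b[n-1], with initial value b0.
    A 1 in the information sequence a marks a transition in b."""
    b, prev = [], b0
    for bit in a:
        prev ^= bit
        b.append(prev)
    return b

def nrzi_inverse(b, b0=0):
    """Inverse map: a[n] = b[n] XOR b[n-1], recovering the information bits."""
    a, prev = [], b0
    for bit in b:
        a.append(bit ^ prev)
        prev = bit
    return a
```

A run of j zeros between two ones in a becomes a run of j + 1 identical symbols in b, which is why (d, k) constraints on a control the spacing of transitions in the recorded waveform.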
The second issue arises from a property of the PR systems known as quasicatastrophic error propagation [55]. This refers to the fact that certain bi-infinite PR channel output sequences are represented by more than one path in the detector trellis. Such a sequence is produced by at least two distinct channel input sequences. For the PR channels under consideration, namely, those with transfer polynomial h(D) = (1 - D)(1 + D)^n, the difference sequences e = x - x', corresponding to pairs of such input sequences x and x', are easily characterized. (For convenience, the symbols +2 and -2 are denoted by + and -, respectively, in these difference sequences.) They are the nonzero bipolar sequences annihilated by h(D): runs of identical nonzero symbols, corresponding to the zero of h(D) at D = 1, and runs of alternating nonzero symbols, corresponding to the zero at D = -1; for PR4, these patterns appear independently in the even and odd interleaves. As a consequence of the existence of these sequences, there could be a potentially unbounded delay in the merging of survivor paths in the Viterbi detection process beyond any specified time index, even in the absence of noise. It is therefore desirable to constrain the channel input sequences in such a way that these difference sequences are forbidden. This property makes it possible to limit the detector path memory, and therefore the decoding delay, without incurring any significant degradation in the sequence estimates produced by the detector. In the case of PR4, this has been accomplished by limiting the length of runs of identical channel inputs in each of the even and odd interleaves (or, equivalently, the length of runs of zero samples in each interleave at the channel output) to be no more than a specified positive integer I. By incorporating interleaved NRZI (INRZI) precoding, the G and I constraints on output sequences translate into corresponding constraints on the binary input sequences. The resulting constraints are denoted (0, G/I), where the "0" may be interpreted as a d = 0 constraint, emphasizing the point that intersymbol interference is acceptable in PRML systems.

Fig. 7. DC-free constrained sequences with DSV N.
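A direct way to state the two PRML output conditions is as a predicate on the sample sequence; the following sketch (my own illustration, not code from the paper) checks the global zero-run bound G and the per-interleave bound I.

```python
def max_zero_run(seq):
    """Length of the longest run of zero samples in seq."""
    run = best = 0
    for s in seq:
        run = run + 1 if s == 0 else 0
        best = max(best, run)
    return best

def satisfies_g_i(y, G, I):
    """Check the (0, G/I) conditions on a PR channel output sequence y:
    at most G consecutive zero samples globally, and at most I consecutive
    zero samples within each of the even and odd interleaves."""
    return (max_zero_run(y) <= G
            and max_zero_run(y[0::2]) <= I
            and max_zero_run(y[1::2]) <= I)
```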
It should be noted that the combination of (0, G/I) constraints and an INRZI precoder has been used to prevent quasicatastrophic error propagation in EPR4 channels, as well.

3) Spectral-Null Constraints: The runlength-limited constraints and the PRML constraints are representative of constraints whose description is essentially in the time domain (although the constraints certainly have implications for frequency-domain characteristics of the constrained sequences). There are other constraints whose formulation is most natural in the frequency domain. One such constraint specifies that the recorded sequences have no spectral content at a particular frequency f; that is, the average power spectral density function of the sequences has value zero at the specified frequency. The sequences are said to have a spectral null at frequency f. For an ensemble of sequences x = x_1 x_2 ..., with symbols drawn from the bipolar alphabet {-1, +1} and generated by a finite labeled directed graph of the kind illustrated in Fig. 6, a necessary and sufficient condition for a spectral null at frequency f, where T is the duration of a single recorded symbol, is that there exist a constant B such that

| Σ_{n=r}^{s} x_n e^{-j2πfnT} | ≤ B    (6)

for all recorded sequences x and all indices r ≤ s [145], [162], [209]. In digital recording, the spectral-null constraints of most importance have been those that prescribe a spectral null at f = 0, or DC. The sequences are said to be DC-free or charge-constrained. The concept of the running digital sum (RDS) of a sequence plays a significant role in the description and analysis of DC-free sequences. For a bipolar sequence x = x_1 x_2 ..., the RDS of a subsequence x_1 x_2 ... x_n, denoted RDS_n(x), is defined as

RDS_n(x) = x_1 + x_2 + ··· + x_n.

From (6), we see that the spectral density of the sequences vanishes at f = 0 if and only if the RDS values for all sequences are bounded in magnitude by some constant. For sequences that assume a range of N consecutive RDS values, we say that their digital sum variation (DSV) is N.
Fig. 7 shows a graph describing the bipolar, DC-free system with DSV equal to N. DC-free sequences have found widespread application in optical and magnetic recording systems. In magnetic-tape systems with rotary-type recording heads, such as the R-DAT digital audio tape system, they prevent write-signal distortion that can arise from transformer coupling in the write electronics. In optical-recording systems, they reduce interference between data and servo signals, and also permit filtering of low-frequency noise stemming from smudges on the disk surface. It should be noted that the application of DC-free constraints has certainly not been confined to data storage. Since the early days of digital communication by means of cable, DC-free codes have been employed to counter the effects of low-frequency cutoff due to coupling components, isolating transformers, and other possible system impairments [35]. Sequences with a spectral null at the Nyquist frequency f = 1/(2T) also play an important role in digital recording. These sequences are often referred to as Nyquist-free. There is in fact a close relationship between Nyquist-free and DC-free sequences. Specifically, consider sequences x over the bipolar alphabet {-1, +1}. If x is DC-free, then the sequence y defined by y_n = (-1)^n x_n is Nyquist-free. DC/Nyquist-free sequences have spectral nulls at both f = 0 and f = 1/(2T). Such sequences can always be decomposed into a pair of interleaved DC-free sequences. This fact is exploited in Section V-C in the design of distance-enhancing, DC/Nyquist-free codes for PRML systems. In some recording applications, sequences satisfying both charge and runlength constraints have been used.
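The quantities just defined translate directly into code. The sketch below (an illustration with my own function names) computes the running digital sum and DSV of a bipolar sequence, and applies the sign-alternation map that turns a DC-free sequence into a Nyquist-free one.

```python
def rds_values(x):
    """All running-digital-sum values attained by the bipolar sequence x,
    starting from RDS = 0 before the first symbol."""
    vals, rds = [0], 0
    for s in x:
        rds += s
        vals.append(rds)
    return vals

def dsv(x):
    """Digital sum variation: the number of consecutive RDS values assumed."""
    v = rds_values(x)
    return max(v) - min(v) + 1

def to_nyquist_free(x):
    """Map a DC-free sequence to y[n] = (-1)**n * x[n]; the bounded RDS of x
    then bounds the alternating partial sums of y, giving a spectral null
    at the Nyquist frequency f = 1/(2T)."""
    return [(-1) ** n * s for n, s in enumerate(x)]
```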
In particular, a sequence in the charge-RLL constraint satisfies a (d, k) runlength constraint, with the added restriction that the corresponding NRZI bipolar sequence be DC-free with DSV no larger than a prescribed value. Two codes of this kind, known as the "zero-modulation" and "Miller-squared" codes, have found application in commercial tape-recording systems [160], [139], [150].

B. Discrete Noiseless Channels

In Section III-A, we saw that the successful implementation of analog and digital signal-processing techniques used in data recording may require that the binary channel input sequences satisfy constraints in both the time and the frequency domains. Shannon established many of the fundamental properties of noiseless, input-constrained communication channels in Part I of his 1948 paper [173]. In that part, entitled "Discrete Noiseless Systems," Shannon considered discrete communication channels, such as the teletype or telegraph channel, where the transmitted symbols were of possibly different time durations and satisfied a set of constraints as to the order in which they could occur. We will review his key results and illustrate them using the family of runlength-limited codes introduced in Section III-A. Shannon first defined the capacity C of a discrete noiseless channel as

C = lim_{T→∞} (log N(T)) / T    (7)

where N(T) is the number of allowed sequences of length T. The following quote, which provides a method of computing the capacity, is taken directly from Shannon's original paper (equation numbers added):

Suppose all sequences of the symbols S_1, ..., S_n are allowed and these symbols have durations t_1, ..., t_n. What is the channel capacity? If N(t) represents the number of sequences of duration t, we have

N(t) = N(t - t_1) + N(t - t_2) + ··· + N(t - t_n).    (8)

The total number is equal to the sum of the numbers of sequences ending in S_1, S_2, ..., S_n and these are N(t - t_1), N(t - t_2), ..., N(t - t_n), respectively.
According to a well-known result in finite differences, N(t) is then asymptotic for large t to A X_0^t, where X_0 is the largest real solution of the characteristic equation

X^{-t_1} + X^{-t_2} + ··· + X^{-t_n} = 1    (9)

and, therefore,

C = log X_0.    (10)

Shannon's results can be applied directly to the case of (d, k) codes by associating the symbols S_i with the different allowable runs of 0's ending in a 1. The result is

C = log_2 x_0    (11)

where x_0 is the largest real solution of the equation

x^{-(d+1)} + x^{-(d+2)} + ··· + x^{-(k+1)} = 1.    (12)

Shannon went on to describe constrained sequences by labeled, directed graphs, often referred to as state-transition diagrams. Again, quoting from the paper:

A very general type of restriction which may be placed on allowed sequences is the following: We imagine a number of possible states. For each state only certain symbols from the set S_1, ..., S_n can be transmitted (different subsets for the different states). When one of these has been transmitted the state changes to a new state depending both on the old state and the particular symbol transmitted.

Shannon then proceeded to state the following theorem, which he proved in an appendix:

Theorem 1: Let b_{ij}^{(s)} be the duration of the s-th symbol which is allowable in state i and leads to state j. Then the channel capacity C is equal to log W, where W is the largest real root of the determinant equation

| Σ_s W^{-b_{ij}^{(s)}} - δ_{ij} | = 0    (13)

where δ_{ij} = 1 if i = j and is zero otherwise.

The condition that different states must correspond to different subsets of the transmission alphabet is unnecessarily restrictive. For the theorem to hold, it suffices that the state-transition diagram representation be lossless, meaning that any two distinct state sequences beginning at a common state and ending at a, possibly different, common state generate distinct symbol sequences [144]. This result can be applied to (d, k) sequences in two different ways. In the first, we let the symbols S_i be the collection of allowable runs of consecutive 0's followed by a 1, as before.
With this interpretation we have only one state, since any concatenation of these runs is allowable. The determinant equation then becomes the same as (12) with x replaced by W. In the second interpretation, we let the symbols be the binary symbols 0 and 1, and we use the graph with k + 1 states shown earlier in Fig. 6. Note now that all of the symbols are of length 1, so that the determinant equation takes the form

| W^{-1} A - I | = 0    (14)

where A is the connection matrix, or adjacency matrix, of the graph; that is, a matrix which has (i, j)th entry equal to 1 if there is a symbol from state i that results in the new state j, and which has (i, j)th entry equal to 0 otherwise. (The notion of adjacency matrix can be extended to graphs with a multiplicity of distinctly labeled edges connecting pairs of states.) Multiplying every element in the matrix by W, we see that this equation specifies the eigenvalues of the connection matrix. Thus we see that the channel capacity is equal to the logarithm of the largest real eigenvalue of the connection matrix of the constraint graph shown in Fig. 6. Shannon proceeded to produce an information source by assigning nonzero probabilities to the symbols leaving each state of the graph. These probabilities can be assigned in any manner, subject to the constraint that for each state, the sum of the probabilities for all symbols leaving that state is 1. Shannon gave formulas for choosing these probabilities such that the resulting information source has maximum entropy. He further showed that this maximum entropy is equal to the capacity C. Specifically, he proved the following theorem.

Theorem 8: Let the system of constraints considered as a channel have a capacity C = log W. If we assign

p_{ij}^{(s)} = (B_j / B_i) W^{-l_{ij}^{(s)}}

where l_{ij}^{(s)} is the duration of the s-th symbol leading from state i to state j and the B_i satisfy

B_i = Σ_{s,j} B_j W^{-l_{ij}^{(s)}}

then H is maximized and equal to C.

It is an easy matter to apply Shannon's result to find these probabilities for (d, k) codes. The result is that the probability of a run of j 0's followed by a 1 is equal to λ^{-(j+1)} for d ≤ j ≤ k, where log_2 λ is the maximum entropy.
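Both routes to the capacity, the largest eigenvalue of the connection matrix and the largest root of (12), are easy to exercise numerically, and the same root yields the maximum-entropy run probabilities just given. The sketch below (function names mine; NumPy assumed) computes all of these for a (d, k) constraint with d < k.

```python
import numpy as np

def dk_capacity(d, k):
    """Capacity of the (d,k) constraint: log2 of the largest real eigenvalue
    of the adjacency matrix of the (k+1)-state graph of Fig. 6."""
    A = np.zeros((k + 1, k + 1))
    for i in range(k):
        A[i, i + 1] = 1.0        # emit a 0: the zero-run counter advances
    for i in range(d, k + 1):
        A[i, 0] = 1.0            # emit a 1: allowed after at least d zeros
    return float(np.log2(max(abs(np.linalg.eigvals(A)))))

def max_entropy_run_probs(d, k):
    """Bisection for the largest real root lam of
    sum_{j=d}^{k} lam**-(j+1) = 1 (cf. (12) and (15)); returns lam and the
    maximum-entropy probability lam**-(j+1) of a run of j zeros and a one."""
    f = lambda lam: sum(lam ** -(j + 1) for j in range(d, k + 1)) - 1.0
    lo, hi = 1.0 + 1e-12, 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)  # f decreases in lam
    lam = 0.5 * (lo + hi)
    return lam, {j: lam ** -(j + 1) for j in range(d, k + 1)}
```

For example, dk_capacity(2, 7) is approximately 0.5174, which is why the classic (2,7) code can have rate 1/2, and the run probabilities sum to 1 with entropy per symbol equal to the capacity.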
Since the sum of these probabilities (summed over all possible runlengths) must equal 1, we have

Σ_{j=d}^{k} λ^{-(j+1)} = 1.    (15)

Note that this equation is identical to (12), except for the choice of the indeterminate. Thus the maximum entropy is achieved by choosing λ as the largest real root of this equation, and the maximum entropy is equal to the capacity C. The probabilities of the symbols which result in the maximum entropy are shown in Fig. 8 (where now the branch labels are the probabilities of the binary symbols and not the symbols themselves).

Fig. 8. Markov graph for maximum entropy sequences.

The maximum-entropy solution described in the theorem dictates that any sequence of length n, starting in state u and ending in state v, has probability (P_v / P_u) 2^{-nC}, where P_u denotes the probability of state u. Therefore, all such sequences are essentially equiprobable. This is a special case of the notion of "typical long sequences" introduced by Shannon in his classic paper. In this special case of maximum-entropy sequences, for n large enough, all sequences of length n are entropy-typical in this sense. This is analogous to the case of symbols which are of fixed duration, equally probable, and statistically independent. Shannon proved that the capacity of a constrained channel represents an upper bound on the achievable rate of information transmission on the channel. Moreover, he defined a concept of typical sequences and, using that concept, demonstrated that transmission at rates arbitrarily close to C can in principle be achieved. Specifically, he proved the following "fundamental theorem for a noiseless channel" governing transmission of the output of an information source over a constrained channel. We again quote from [173].

Theorem 9: Let a source have entropy H (bits per symbol) and a channel have a capacity C (bits per second). Then it is possible to encode the output of the source in such a way as to transmit at the average rate C/H - ε symbols per second over the channel, where ε is arbitrarily small.
It is not possible to transmit at an average rate greater than C/H.

The proof technique, relying as it does upon typical long sequences, is nonconstructive. It is interesting to note, however, that Shannon formulated the operations of the source encoder (and decoder) in terms of a finite-state machine, a construct that has since been widely applied to constrained-channel encoding and decoding. In the next section, we turn to the problem of designing efficient finite-state encoders.

IV. CODES FOR NOISELESS CONSTRAINED CHANNELS

For constraints described by a finite-state, directed graph with edge labels, Shannon's fundamental coding theorem guarantees the existence of codes that achieve any rate less than the capacity. Unfortunately, as mentioned above, Shannon's proof of the theorem is nonconstructive. However, during the past 40 years, substantial progress has been made in the engineering design of efficient codes for various constraints, including many of interest in digital recording. There have also been major strides in the development of general code-construction techniques, and, during the past 20 years, rigorous mathematical foundations have been established that permit the resolution of questions pertaining to code existence, code construction, and code implementation complexity. Early contributors to the theory and practical application of constrained code design include Berkoff [19]; Cattermole [34], [35]; Cohen [40]; Freiman and Wyner [69]; Gabor [73]; Jacoby [112], [113]; Kautz [125]; Lempel [136]; Patel [160]; Tang and Bahl [188]; and, especially, Franaszek [57]-[64]. Further advances were made by Adler, Coppersmith, and Hassner (ACH) [3]; Marcus [141]; Karabed and Marcus [120]; Ashley, Marcus, and Roth [12]; Ashley and Marcus [9], [10]; Immink [104]; and Hollmann [91]-[93].

Fig. 9. Finite-state encoder schematic.

In this section, we will survey selected aspects of this theoretical and practical progress.
The presentation largely follows [102], [146], and, especially, [144], where more detailed and comprehensive treatments of coding for constrained channels may be found.

A. Encoders and Decoders

Encoders have the task of translating arbitrary source information into a constrained sequence. In coding practice, the source sequence is typically partitioned into blocks of length p, and under the code rules such blocks are mapped onto codewords of q channel symbols. The rate of such an encoder is R = p/q. To emphasize the blocklengths, we sometimes denote the rate as p : q. It is most important that this mapping be done as efficiently as possible, subject to certain practical considerations. Efficiency is measured by the ratio of the code rate to the capacity of the constrained channel. A good encoder algorithm realizes a code rate close to the capacity of the constrained sequences, uses a simple implementation, and avoids the propagation of errors in the process of decoding. An encoder may be state-dependent, in which case the codeword used to represent a given source block is a function of the channel or encoder state, or the code may be state-independent. State-independence implies that codewords can be freely concatenated without violating the sequence constraints. A set of such codewords is called self-concatenable. When the encoder is state-dependent, it typically takes the form of a synchronous finite-state machine, illustrated schematically in Fig. 9. A decoder is preferably state-independent. As a result of errors made during transmission, a state-dependent decoder could easily lose track of the encoder state and begin to make errors, with no guarantee of recovery. In order to avoid error propagation, therefore, a decoder should use a finite observation interval of channel bits for decoding, thus limiting the span in which errors may occur.

Fig. 10. Sliding-block decoder schematic.
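Such a finite-window decoding rule can be phrased generically as follows (a schematic sketch; the decoding function f and the example mapping are hypothetical, not a code from the paper).

```python
def sliding_block_decode(codewords, f, m, a):
    """Apply a sliding-block decoding function f to a stream of q-bit
    codewords: the output word at time i depends on the codeword at time i
    together with the m preceding and a upcoming codewords (cf. Fig. 10)."""
    out = []
    for i in range(m, len(codewords) - a):
        window = tuple(codewords[i - m : i + a + 1])
        out.append(f(window))
    return out
```

With m = a = 0 the rule reduces to a block decoder; for example, the look-up f = lambda w: {"01": 0, "10": 1}[w[0]] inverts a hypothetical rate 1 : 2 block code.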
Such a decoder is called a sliding-block decoder. A sliding-block decoder makes a decision on a received q-bit word on the basis of the q-bit word itself, as well as the m preceding q-bit words and the a upcoming q-bit words. Essentially, the decoder comprises a register of length m + a + 1 q-bit words and a logic function that translates the contents of the register into the retrieved p-bit source word. Since the constants m and a are finite, an error in the received sequence can propagate in the decoded sequence only for a finite distance, at most the decoder window length. Fig. 10 shows a schematic of a sliding-block decoder. An important subclass of sliding-block decoders are the block decoders, which use only a single codeword for reproducing the source word, i.e., m = a = 0. Generally speaking, the problem of code design is to construct practical, efficient, finite-state encoders with sliding-block decoders. There are several fundamental questions related to this problem.

a) For a given rate p/q, what encoder input and output block sizes p and q are realizable?
b) Can a sliding-block decodable encoder always be found?
c) Can 100%-efficient sliding-block decodable encoders be designed when the capacity is a rational number p/q?
d) Are there good bounds on basic complexity measures pertaining to constrained codes for a given constraint, such as the number of encoder states, encoder gate complexity, encoding delay, and sliding-block decoder window length?

Many of these questions have been answered fully or in part, as we now describe.

B. Graphs and Constraints

It is very useful and convenient, when stating code-existence results and specifying code-construction algorithms, to refer to labeled graph descriptions of constrained sequences. More precisely, a labeled graph (or a finite labeled directed graph) G consists of a finite set of states V; a finite set of edges E, where each edge has an initial state and a terminal state, both in V; and an edge labeling L: E -> Σ, where Σ is a finite alphabet. Fig. 11 shows a "typical" labeled graph.
When context makes it clear, a labeled graph may be called simply a "graph." A labeled graph can be used to generate finite symbol sequences by reading off the labels along paths in the graph, thereby producing a word (also called a string or a block). For example, in Fig. 11, a word is generated by reading the labels of the edges along any path through the graph. We will sometimes call a word of length q generated in this way a q-block. The connections in the directed graph underlying a labeled graph are conveniently described by an adjacency matrix, as was mentioned in Section III. Specifically, for a graph G, we denote by A = A_G the adjacency matrix whose (u, v) entry is the number of edges from state u to state v in G. The adjacency matrix, of course, has nonnegative integer entries. Note that the number of paths of length n from state u to state v is simply (A^n)_{u,v}, and the number of cycles of length n is simply the trace of A^n. The fundamental object considered in the theory of constrained coding is the set of words generated by a labeled graph. A constrained system (or constraint), denoted S, is the set of all words (i.e., finite-length sequences) generated by reading the labels of paths in a labeled graph G. We will also, at times, consider right-infinite sequences and sometimes bi-infinite sequences. The alphabet of symbols appearing in the words of S is denoted Σ. We say that the graph G presents, or is a presentation of, S, and we write S = S(G). For a state u in G, the set of all finite words generated from u is called the follower set of u in G, denoted F_G(u). As mentioned above, a rate p : q finite-state encoder will generate a word in the constrained system composed of a sequence of q-blocks. For a constrained system S presented by a labeled graph G, it will be very useful to have an explicit description of the words in S, decomposed into such nonoverlapping blocks of length q. Let G be a labeled graph.
The qth power of G, denoted G^q, is the labeled graph with the same set of states as G, but with one edge for each path of length q in G, labeled by the q-block generated by that path. The adjacency matrix of G^q satisfies A_{G^q} = (A_G)^q. For a constrained system S presented by a labeled graph G, the qth power of S, denoted S^q, is the constrained system presented by G^q. So, S^q is the constrained system obtained from S by grouping the symbols in each word into nonoverlapping words of length q. Note that the definition of S^q does not depend on which presentation of S is used. It is important to note that a given constrained system can be presented by many different labeled graphs and, depending on the context, one presentation will have advantages relative to another. For example, one graph may present the constraint using the smallest possible number of states, while another may serve as the basis for an encoder finite-state machine. There are important connections between the theory of constrained coding and other scientific disciplines, including symbolic dynamics, systems theory, and automata theory. Many of the objects, concepts, and results in constrained coding have counterparts in these fields. For example, the set of bi-infinite sequences derived from a constrained system is called a sofic system (or sofic shift) in symbolic dynamics. In systems theory, these sequences correspond to a discrete-time, complete, time-invariant system. Similarly, in automata theory, a constrained system is equivalent to a regular language which is recognized by a certain type of automaton [94]. The interrelationships among these various disciplines are discussed in more detail in [15], [127], and [142]. The bridge to symbolic dynamics, established in [3], has proven to be especially significant, leading to breakthroughs in both the theory and design of constrained codes.
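The counting facts above, and the power-graph identity A_{G^q} = (A_G)^q, can be checked directly with NumPy; the example below (my own illustration) uses the two-state graph of the (0, 1)-RLL constraint, whose adjacency matrix generates the Fibonacci numbers.

```python
import numpy as np

# Adjacency matrix of the (d,k) = (0,1) graph: state 0 may emit a 1
# (returning to state 0) or a 0 (moving to state 1); state 1 must emit a 1.
A = np.array([[1, 1],
              [1, 0]])

paths_5 = np.linalg.matrix_power(A, 5)[0, 0]       # length-5 paths 0 -> 0
cycles_5 = np.trace(np.linalg.matrix_power(A, 5))  # length-5 cycles

# The q-th power graph G^q has one edge per length-q path of G:
q = 3
A_Gq = np.linalg.matrix_power(A, q)

# Entries of A^q grow like lam**q, so (1/q) log2 of the total path count
# converges to the capacity log2(lam) ~ 0.6942 of the (0,1) constraint.
cap_estimate = np.log2(float(np.linalg.matrix_power(A, 20).sum())) / 20
```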
An interesting account of this development and its impact on the design of recording codes for magnetic storage is given in [2]. A very comprehensive mathematical treatment may be found in [138].

C. Properties of Graph Labelings

In order to state the coding theorems, as well as for purposes of encoder construction, it will be important to consider labelings with special properties. We say that a labeled graph is deterministic if, at each state, the outgoing edges have distinct labels. In other words, at each state, any label generated from that state determines a unique outgoing edge from that state. Constrained systems that play a role in digital recording generally have natural presentations by a deterministic graph. For example, the labeled graphs in Figs. 6 and 7 are both deterministic. It can be shown that any constrained system can be presented by a deterministic graph [144]. Similarly, a graph is called codeterministic if, at each state, the incoming edges are distinctly labeled. The graph of Fig. 6 is not codeterministic, while that of Fig. 7 is. Many algorithms for constructing constrained codes begin with a deterministic presentation of the constrained system and transform it into a presentation which satisfies a weaker version of the deterministic property called finite anticipation. A labeled graph G is said to have finite anticipation if there is an integer N such that any two paths of length N + 1 with the same initial state and the same labeling must have the same initial edge. The anticipation of G refers to the smallest N for which this condition holds. Similarly, we define the coanticipation of a labeled graph as the anticipation of the labeled graph obtained by reversing the directions of the edges. A labeled graph G has finite memory if there is an integer N such that all paths in G of length N that generate the same word terminate at the same state. The smallest N for which this holds is called the memory of G. A property related to finite anticipation is that of being (m, a)-definite.
A labeled graph has this property if, given any word w = w_{-m} ... w_0 ... w_a, the paths e_{-m} ... e_0 ... e_a that generate w all agree in the edge e_0. A graph with this property is sometimes said to have finite memory-and-anticipation. Note that, whereas the definition of finite anticipation involves knowledge of an initial state, the (m, a)-definite property replaces that with knowledge of a finite amount of memory. Finally, as mentioned in Section III, a labeled graph is lossless if any two distinct paths with the same initial state and terminal state have different labelings. The graph in Fig. 6 has finite memory k, and it is (k, 0)-definite because, for any given word of length at least k + 1, all paths that generate the word end with the same edge. In contrast, the graph in Fig. 7 does not have finite memory and is not definite.

D. Finite-Type and Almost-Finite-Type Constraints

There are some special classes of constraints, called finite-type and almost-finite-type, that play an important role in the theory and construction of constrained codes. A constrained system is finite-type (a term derived from symbolic dynamics [138]) if it can be presented by a definite graph. Thus the (d, k)-RLL constraint is finite-type. There is also a useful intrinsic characterization of finite-type constraints: there is an integer N such that, for any symbol b and any word w of length at least N, we have wb in S if and only if w'b is in S, where w' is the suffix of w of length N. The smallest such integer N, if any, is called the memory of S. Using this intrinsic characterization, we can show that not every constrained system of practical interest is finite-type. In particular, the charge-constrained system described by Fig. 7 is not. To see this, note that two arbitrarily long words may share the same suffix while attaining different RDS values, so whether the symbol "+" can be appended to a word cannot be decided from any bounded suffix. Nevertheless, this constrained system falls into a natural broader class of constrained systems.
These systems can be thought of as "locally finite-type." More precisely, a constrained system is almost-finite-type if it can be presented by a labeled graph that has both finite anticipation and finite coanticipation. Since definiteness implies finite anticipation and finite coanticipation, every finite-type constrained system is also almost-finite-type. Therefore, the class of almost-finite-type systems does indeed include all of the finite-type systems. This inclusion is proper, as can be seen by referring to Fig. 7. There, we see that the charge-constrained systems are presented by labeled graphs with zero anticipation (i.e., deterministic) and zero coanticipation (i.e., codeterministic). Thus these systems are almost-finite-type, but not finite-type. Constrained systems used in practical applications are virtually always almost-finite-type. Another useful property of constrained systems is irreducibility. A constraint S is irreducible if, for every pair of words w and w' in S, there is a word z such that w z w' is in S. Equivalently, S is irreducible if and only if it is presented by some irreducible labeled graph. In coding, it usually suffices to consider irreducible constraints. Irreducible constrained systems have a distinguished presentation called the Shannon cover, which is the unique (up to labeled-graph isomorphism) deterministic presentation of S with the smallest number of states. The Shannon cover can be used to determine if the constraint is almost-finite-type or finite-type. More precisely, an irreducible constrained system is finite-type (respectively, almost-finite-type) if and only if its Shannon cover has finite memory (respectively, finite coanticipation). Referring to Section III, recall that the (base-2) capacity of a constrained system S is given by

cap(S) = lim_{q→∞} (1/q) log_2 N(q; S)

where N(q; S) is the number of q-blocks in S. The (base-2) capacity of an irreducible system can be obtained from the Shannon cover.
In fact, as mentioned in Section III, if G is any irreducible lossless presentation of S, then cap(S) = log_2 λ(A_G), where λ(A_G) denotes the largest real eigenvalue of the adjacency matrix A_G.

E. Coding Theorems

We now state a series of coding theorems that refine and strengthen the fundamental coding theorem of Shannon, thus answering many of the questions posed above. Moreover, the proofs of these theorems are often constructive, leading to practical algorithms for code design. First, we establish some useful notation and terminology. An encoder usually takes the form of a synchronous finite-state machine, as mentioned earlier and shown schematically in Fig. 9. More precisely, for a constrained system S and a positive integer n, an (S, n)-encoder is a labeled graph E satisfying the following properties: 1) each state of E has out-degree n, that is, n outgoing edges; 2) S(E) is contained in S; and 3) the presentation E is lossless. A tagged (S, n)-encoder is an (S, n)-encoder in which the outgoing edges from each state in E are assigned distinct words, or input tags, from an alphabet of size n. We will sometimes use the same symbol E to denote both a tagged (S, n)-encoder and the underlying (S, n)-encoder. Finally, we define a rate p : q finite-state encoder to be a tagged (S^q, M^p)-encoder where the input tags are the M-ary p-blocks. We will be primarily concerned with the binary case, M = 2, and will call such an encoder a rate p : q finite-state encoder for S. The encoding proceeds in the obvious fashion, given a selection of an initial state. If the current state is u and the input data is the p-block s, the codeword generated is the q-block that labels the outgoing edge from state u with input tag s. The next encoder state is the terminal state of that edge. A tagged encoder is illustrated in Fig. 12.

Fig. 12. Rate tagged encoder.

1) Block Encoders: We first consider the construction of the structurally simplest type of encoder, namely, a block encoder. A rate p : q finite-state encoder is called a rate p : q block encoder if it contains only one state. Block encoders have played an important role in digital storage systems.
The following theorem states that block encoders can be used to asymptotically approach capacity. It follows essentially from Shannon's proof of the fundamental theorem for noiseless channels.

Block-Coding Theorem: Let S be an irreducible constrained system. Then there exist block (S^q, n)-encoders with (log2 n)/q arbitrarily close to cap(S).

The next result provides a characterization of all block encoders.

Block Code Characterization: Let S be a constrained system with a deterministic presentation G, and let n be a positive integer. Then there exists a block (S, n)-encoder if and only if there exist a subgraph H of G and a collection W of n words such that W is the set of labels of the outgoing edges from each state in H.

Freiman and Wyner [69] developed a procedure that can be used to determine whether there exists a block (S^q, n)-encoder for a given constrained system S with finite memory. Specifically, let G be a deterministic presentation of S. For every pair of states u and v in G, consider the set W(u, v) of all words of length q that can be generated in G by paths that start at u and terminate at v. To identify a subgraph H as in the block-code characterization, we search for a set P of states in G such that the number of length-q words that can be generated from every state of P by paths terminating within P is at least n. Freiman and Wyner [69] simplify the search by proving that, when G has finite memory, it suffices to consider sets P which are complete; namely, if u is in P and the follower set of a state u' contains the follower set of u, then u' is also in P.

Even with the restriction of the search to complete sets, this block-code design procedure is not efficient, in general. However, given n and q, for certain constrained systems S, such as the (d, k)-RLL constraints, it does allow us to effectively compute the largest n for which there exists a block (S^q, n)-encoder. In fact, the procedure can be used to find a largest possible set of self-concatenable words of length q.

TABLE I. OPTIMAL LENGTH-5 LIST FOR THE (0, 2) CONSTRAINT.

TABLE II. RATE 1 : 2 VARIABLE-LENGTH BLOCK ENCODER FOR THE (2, 7) CONSTRAINT.

Block Encoder Examples: Digital magnetic-tape systems have utilized block codes satisfying (0, k) constraints, for k = 1 and k = 2.
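As an illustration of the search for self-concatenable word lists, the following sketch (helper names are our own) finds the largest freely concatenable set of length-5 words for the (0, 2) constraint by brute force. A set of words is freely concatenable exactly when the largest trailing zero run plus the largest leading zero run over the set does not exceed k, so it suffices to try each split of the budget k = 2 between the two word ends:

```python
# Sketch: brute-force search for the largest set of freely concatenable
# length-5 words satisfying the (0,2)-RLL constraint (zero runs <= 2).

from itertools import product

def runs_ok(word, k=2):
    return '0' * (k + 1) not in word

def lead_zeros(w):
    return len(w) - len(w.lstrip('0'))

def trail_zeros(w):
    return len(w) - len(w.rstrip('0'))

valid = [''.join(bits) for bits in product('01', repeat=5)
         if runs_ok(''.join(bits))]

# Try every split of the zero-run budget k = 2 between word boundaries.
best = max((
    [w for w in valid if lead_zeros(w) <= a and trail_zeros(w) <= 2 - a]
    for a in range(3)), key=len)

print(len(best))   # 17, the size of the optimal list of Table I
```

The maximum, 17 words, is attained with at most one leading and one trailing zero; removing the all-1's word from this list leaves the 16 codewords of the GCR code discussed below.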
Specifically, the codes, with rates 1 : 2 and 4 : 5, respectively, were derived from optimal lists of sizes 2 and 17, respectively. The simple rate 1 : 2 code, known as the Frequency Modulation (FM) code, consists of the two codewords 11 and 10. The 17 words of the (0, 2) list are shown in Table I. The 16 words remaining after deletion of the all-1's word form the codebook for the rate 4 : 5 Group Code Recording (GCR) code, which became the industry standard for nine-track tape drives. The input tag assignments are also shown in the table. See [146] for further details.

A rate 1 : 2, (2, 7) code, developed by Franaszek [59], [44], became an industry standard in disk drives using peak detection. It can be described as a variable-length block code, and was derived using a similar search method. The encoder table is shown in Table II.

Disk drives using PRML techniques have incorporated a block code satisfying (0, G/I) constraints [45]. The (0, 4/4) code, with rate 8 : 9, was derived from the unique maximum-size list of self-concatenable length-9 words. The list has a very simple description. It is the set of length-9 binary words satisfying the following three conditions: 1) the maximum runlength of 0's within the word is no more than 4; 2) the maximum runlengths of 0's at the beginning and end of the word are no more than 2; and 3) the maximum runlengths of 0's at the beginning and end of the even interleave and odd interleave of the word are no more than 2. A rate 16 : 17 block code, derived from an optimal list of length-17 words with an analogous definition, has also been designed for use in PRML systems [161], [1].

2) Deterministic Encoders: Block encoders, although conceptually simple, may not be suitable in many cases, since they might require a prohibitively large value of q in order to achieve the desired rate. Allowing multiple states in the encoder can reduce the required codeword length. If each state in a deterministic presentation G has at least n outgoing edges, then we can obtain a deterministic (S, n)-encoder by deleting excess edges.
In fact, it is sufficient (and necessary) for G to have a subgraph in which each state satisfies this condition. This result, characterizing deterministic encoders, is stated by Franaszek in [57].

Deterministic Encoder Characterization: Let S be a constrained system with a deterministic presentation G, and let n be a positive integer. Then there exists a deterministic (S, n)-encoder if and only if there exists such an encoder which is a subgraph of G.

Let G be a deterministic presentation of a constrained system S. According to the characterization, we can derive from G a deterministic (S, n)-encoder if and only if there exists a set P of states in G, called a set of principal states, such that every state in P has at least n outgoing edges terminating in P. This inequality can be expressed in terms of the characteristic vector x of the set of states P, where x_u = 1 if u is in P and x_u = 0 otherwise. Then, P is a set of principal states if and only if

A_G x >= n x.   (16)

We digress briefly to discuss the significance of this inequality. Given a nonnegative integer square matrix A and a positive integer n, an (A, n)-approximate eigenvector is a nonnegative integer vector x satisfying

A x >= n x   (17)

where the inequality holds componentwise. We refer to this inequality as the approximate eigenvector inequality, and we denote the set of all (A, n)-approximate eigenvectors by X(A; n). Approximate eigenvectors will play an essential role in the constructive proof of the finite-state coding theorem in the next section, as they do in many code-construction procedures.

The existence of approximate eigenvectors is guaranteed by the Perron–Frobenius theory [76], [170]. Specifically, let lambda be the largest positive eigenvalue of A, and let n be a positive integer satisfying n <= lambda. Then there exists a nonzero vector x, with nonnegative integer components, satisfying (17). The following algorithm, taken from [3] and due originally to Franaszek, is an approach to finding such a vector.

Fig. 13. Rate 1 : 2 MFM encoder.
Franaszek Algorithm for Finding an Approximate Eigenvector: Choose an initial vector v_0 all of whose entries are equal to xi, where xi is a nonnegative integer. Define inductively

v_{k+1} = min( v_k , floor( (1/n) A v_k ) )

where the minimum and the integer floor are taken componentwise. Let v = v_K, where K is the first integer such that v_{K+1} = v_K. There are two situations that can arise: a) v is nonzero and b) v = 0. Case a) means that we have found an approximate eigenvector; in case b) there is no solution with components bounded by xi, so we increase xi and start from the top again.

There may be multiple solutions for the vector v. The choice of the vector may affect the complexity of the code constructed in this way. The components of v are often called weights.

From (16), it follows that P is a set of principal states if and only if the characteristic vector of P is an (A_G, n)-approximate eigenvector. Hence, we can find whether there is a deterministic (S, n)-encoder by applying the Franaszek algorithm to the matrix A_G, the integer n, and the all-1's vector as the initial vector v_0. A nonzero output vector v is a necessary and sufficient condition for the existence of a set of principal states, for which v is then a characteristic vector.

Deterministic Encoder Example: The rate 1 : 2, (1, 3) encoder, known as the Modified Frequency Modulation (MFM) code, Miller code, or Delay Modulation, is a deterministic encoder. The encoder is derived from the second power of the Shannon cover of the (1, 3) constraint, with a set of three principal states. Fig. 13 shows the rate 1 : 2 deterministic encoder. In fact, the tagged encoder in Fig. 12 is a simpler description of the MFM tagged encoder, obtained by "merging" two of the states in Fig. 13. (See Section IV-F for more on merging of states.)

3) Finite-State Coding Theorem: Although deterministic encoders can overcome some of the limitations of block encoders, further improvements may arise if we relax the deterministic property. In this section, we show that, for a desired rate p : q with p/q <= cap(S), even though a deterministic encoder may not exist, a finite-state encoder always does.
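The Franaszek iteration is short enough to state directly in code. The sketch below (our own implementation, not from the paper) applies it to the second power of the (1, 3) Shannon cover used in the MFM example; the resulting 0-1 vector is the characteristic vector of a set of three principal states:

```python
# Sketch of the Franaszek algorithm: iterate v <- min(v, floor(A v / n))
# componentwise from the all-xi vector until a fixed point is reached.
# A nonzero fixed point satisfies A v >= n v, i.e. it is an
# (A, n)-approximate eigenvector.

def franaszek(A, n, xi):
    m = len(A)
    v = [xi] * m
    while True:
        w = [min(v[i], sum(A[i][j] * v[j] for j in range(m)) // n)
             for i in range(m)]
        if w == v:
            return v
        v = w

# Shannon cover of the (1,3)-RLL constraint; state i = number of 0's
# seen since the last 1.  Its second power presents pairs of channel bits.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 0]]
A2 = [[sum(A[i][k] * A[k][j] for k in range(4)) for j in range(4)]
      for i in range(4)]

print(franaszek(A2, 2, 1))   # [1, 1, 1, 0]: the first three states are principal
```

Starting from the all-1's vector with n = 2, the iteration converges to the vector (1, 1, 1, 0), so the MFM encoder can be built on the first three states of the squared Shannon cover.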
If an encoder has finite anticipation a, then we can decode in a state-dependent manner, beginning at the initial state and retracing the path followed by the encoder, as follows. If the current state is u, then the current codeword to be decoded, together with the a upcoming codewords, constitutes a word of length a + 1 (measured in q-blocks) that is generated by a path that starts at u. By definition of anticipation, the initial edge e of such a path is uniquely determined; the decoded p-block is the input tag of e, and the next decoder state is the terminal state of e. This decoding method will invert the encoder when applied to valid codeword sequences. The output of the decoder will be identical to the input to the encoder, possibly with a shift of a input p-blocks.

The following theorem establishes that, with finite anticipation, invertible encoders can achieve all rational rates less than or equal to capacity, with any input and output blocklengths p and q satisfying p/q <= cap(S).

Finite-State Coding Theorem: Let S be a constrained system. If p/q <= cap(S), then there exists a rate p : q finite-state encoder for S with finite anticipation.

The theorem improves upon Shannon's result in three important ways. First, the proof is constructive, relying upon the state-splitting algorithm, which will be discussed in Section IV-F. Next, it proves the existence of finite-state encoders that achieve rate equal to the capacity cap(S), when cap(S) is rational. Finally, for any positive integers p and q satisfying the inequality p/q <= cap(S), there is a finite-state encoder that operates at rate p : q. In particular, choosing p and q relatively prime, one can design an invertible encoder using the smallest possible codeword length compatible with the chosen rate.

For completeness, we also state the more simply proved converse-to-coding theorem.

Finite-State Converse-to-Coding Theorem: Let S be a constrained system.
Then, there exists a rate p : q finite-state encoder for S only if p/q <= cap(S).

4) Sliding-Block Codes and Block-Decodable Codes: As mentioned earlier, it is often desirable for finite-state encoders to have decoders that limit the extent of error propagation. The results in this section address the design of encoders with sliding-block decoders, which we now formally define.

Let m and a be integers with m + a >= 0. A sliding-block decoder for a rate p : q finite-state encoder is a mapping D from sequences of m + a + 1 consecutive q-blocks to p-blocks such that, if w_0 w_1 w_2 ... is any sequence of q-blocks generated by the encoder from the input tag sequence of p-blocks s_0 s_1 s_2 ..., then

s_i = D(w_{i-m}, ..., w_i, ..., w_{i+a}).

We call a the look-ahead of D and m the look-behind of D. The sum m + a + 1 is called the decoding window length of D. See Fig. 10.

As mentioned earlier, a single error at the input to a sliding-block decoder can only affect the decoding of p-blocks that fall in a "window" of length at most m + a + 1, measured in q-blocks. Thus a sliding-block decoder controls the extent of error propagation.

The following result, due to Adler, Coppersmith, and Hassner [3], improves upon the finite-state coding theorem for finite-type constrained systems.

Sliding-Block Code Theorem for Finite-Type Systems: Let S be a finite-type constrained system. If p/q <= cap(S), then there exists a rate p : q finite-state encoder for S with a sliding-block decoder.

This result, sometimes called the ACH theorem, follows readily from the proof of the finite-state coding theorem. The constructive proof technique, based upon state-splitting, is sometimes referred to as the ACH algorithm (see Section IV-F).

Sliding-Block Code Example: The (1, 7)-RLL constraint has capacity approximately 0.6793. Adler, Hassner, and Moussouris [4] used the state-splitting algorithm to construct a rate 2 : 3, (1, 7) encoder with five states, represented in tabular form in Table III. Entries in the "state" columns indicate the output word and next encoder state.

TABLE III. RATE 2 : 3 SLIDING-BLOCK-DECODABLE ENCODER.
With the input tagging shown, the encoder is sliding-block decodable, and the decoder error propagation is limited to five input bits. The same underlying encoder graph was independently constructed by Jacoby [112] using "look-ahead" code design techniques. Weathers and Wolf applied the state-splitting algorithm to design a sliding-block-decodable (1, 7) encoder, also with error propagation at most five input bits, that has the distinction of achieving the smallest possible number of encoder states for this constraint and rate [143].

A block-decodable encoder is a special case of sliding-block-decodable encoders in which both the look-behind m and the look-ahead a are zero. Because of the favorable implications for error propagation, a block-decodable encoder is often sought in practice. The following result characterizes these encoders completely.

Block-Decodable Encoder Characterization: Let S be a constrained system with a deterministic presentation G, and let n be a positive integer. Then there exists a block-decodable (S, n)-encoder if and only if there exists such an encoder which is a subgraph of G.

It has been shown that the general problem of deciding whether a particular subgraph of G can be input-tagged in such a way as to produce a block-decodable encoder is NP-complete [8]. Nevertheless, for certain classes of constraints, and many other specific examples, such an input-tag assignment can be found.

Block-Decodable Code Examples: For certain irreducible constrained systems, including powers of (d, k)-RLL constrained systems, Franaszek [57], [58] showed that whenever there is a deterministic (S, n)-encoder which is a subgraph of the Shannon cover, there is also such an encoder that can be tagged so that it is block-decodable. In fact, the MFM encoder of Fig. 13 is block-decodable. For (d, k)-RLL constrained systems, an explicit description of such a labeling was found by Gu and Fuja [79] and, independently, by Tjalkens [191].
They show that their labeling yields the largest rate attainable by any block-decodable encoder for any given (d, k)-RLL constrained system.

The Gu–Fuja construction is a generalization of a coding scheme introduced by Beenker and Immink [16]. The underlying idea, which is quite generally applicable, is to design block-decodable encoders by using merging bits between constrained words [16], [112], [104]. Each input p-block has a unique constrained representation as a q-block, part of which is reserved for the merging bits. The encoder uses a look-up table for translating source words into constrained words, plus some logic circuitry for determining the merging bits. Decoding is extremely simple: discard the merging bits and translate the remaining constrained word back into the p-bit source word.

For (d, k) sequences, the encoder makes use of a set W of (d, k)-constrained blocks with a prescribed minimum number of leading 0's and a prescribed maximum number of trailing 0's. Using a look-up table or enumeration techniques [102, p. 117], [188], [42], the encoder maps each of the 2^p input tags to a unique block in W. The codewords in W are not necessarily freely concatenable, however. When the concatenation of the current codeword with the preceding one violates the constraint, the encoder inverts one of the leading 0's in the current codeword. The conditions imposed on W guarantee that such an inversion can always resolve the constraint violation. In this case, the leading bits of each codeword may be regarded as the merging bits.

Immink [106] gave a constructive proof that, with merging bits, the rate loss relative to capacity can be made to vanish as the codeword length grows. As a result, (d, k) codes with a rate only 0.1% less than Shannon's capacity can be constructed with sufficiently long codewords. Such long codewords could present an additional practical problem, beyond that of mapping the input words to the constrained words (which can be handled by enumerative coding): a single channel bit error could corrupt the entire data content of the decoded word.
One proposal for resolving this difficulty is to use a special configuration of the error-correcting code and the recording code [22], [49], [106].

Another well-known application of this method is that of the Eight-to-Fourteen Modulation (EFM) code, a rate 8 : 17 code which is implemented in the compact audio disc [96], [84], [109]. A collection of 256 codewords is drawn from the set of length-14 words that satisfy the (2, 10) constraint. With this codebook, two merging bits would suffice to achieve a rate 8 : 16 block-decodable code. However, in order to induce more favorable low-frequency spectral characteristics in the recorded code sequences, the encoding algorithm introduces an additional merging bit, yielding the rate 8 : 17 block-decodable EFM encoder.

5) Extensions: In this section, we present strengthened versions of both the finite-state coding theorem and the ACH theorem.

A noncatastrophic encoder is a tagged (S, n)-encoder with finite anticipation and the additional property that, whenever the sequences of output labels of two right-infinite paths differ in only finitely many places, the corresponding sequences of input tags also differ in only finitely many places. A rate p : q finite-state encoder is noncatastrophic if the corresponding tagged (S^q, 2^p)-encoder is noncatastrophic. Noncatastrophic encoders restrict error propagation in the sense that they limit the number of decoded data errors spawned by an isolated channel error. They do not necessarily limit the time span in which these errors occur. The concept of noncatastrophicity appears in the theory of convolutional codes, as well, where it actually coincides with sliding-block decodability [137, Ch. 10].

The following theorem is due to Karabed and Marcus [120].

Noncatastrophic Encoder Theorem: Let S be a constrained system. If p/q <= cap(S), then there exists a noncatastrophic rate p : q finite-state encoder for S.
For the noncatastrophic encoders constructed in the proof of the theorem, the decoding errors generated by a single channel error are, in fact, confined to two bursts of finite length, although these bursts may appear arbitrarily far apart.

Karabed and Marcus also extended the ACH theorem to almost-finite-type systems.

Sliding-Block Code Theorem for Almost-Finite-Type Systems: Let S be an almost-finite-type constrained system. If p/q <= cap(S), then there exists a rate p : q finite-state encoder for S with a sliding-block decoder.

The proof of this result is quite complicated. Although it does not translate as readily as the proof of the ACH theorem into a practical encoder design algorithm, the proof does introduce new and powerful techniques that, in combination with the state-splitting approach, can be applied effectively in certain cases. For example, some of these techniques were used in the design of a 100%-efficient, sliding-block-decodable encoder for a combined charge-constrained and runlength-limited system [8]. In fact, it was the quest for such an encoder that provided the original motivation for the theorem. Several of the ideas in the proof of this generalization of the ACH theorem from finite-type to almost-finite-type systems have also played a role in the design of coded-modulation schemes based upon spectral-null constraints, discussed in Section V-C.

F. The State-Splitting Algorithm

There are many techniques available to construct efficient finite-state encoders. The majority of these construction techniques employ approximate eigenvectors to guide the construction process. Among these code design techniques is the state-splitting algorithm (or ACH algorithm) introduced by Adler, Coppersmith, and Hassner [3]. It implements the proof of the finite-state coding theorem and provides a recipe for constructing finite-state encoders that, for finite-type constraints, are sliding-block-decodable.
The state-splitting approach combines ideas found in Patel's construction of the Zero-Modulation (ZM) code [160] and earlier work of Franaszek [62]–[64] with concepts and results from the mathematical theory of symbolic dynamics [138].

The ACH algorithm proceeds roughly as follows. For a given deterministic presentation G of a constrained system S and an achievable rate p : q, with p/q <= cap(S), we iteratively apply a state-splitting transformation beginning with the qth-power graph G^q. The choice of transformation at each step is guided by an approximate eigenvector, which is updated at each iteration. The procedure culminates in a new presentation of S^q with at least 2^p outgoing edges at each state. After deleting excess edges, we are left with an (S^q, 2^p)-encoder, which, when tagged, gives our desired rate p : q finite-state encoder. (Note that, if S is finite-type, the encoder is sliding-block-decodable regardless of the assignment of input tags.)

In view of its importance in the theory and practice of code design, we now present the state-splitting algorithm in more detail. This discussion follows [144], to which we refer the reader for further details. The basic step in the procedure is an out-splitting of a graph and, more specifically, an approximate-eigenvector-consistent out-splitting, both of which we now describe.

An out-splitting of a labeled graph G begins with a partition of the set E_u of outgoing edges for each state u in G into N(u) disjoint subsets E_u^1, E_u^2, ..., E_u^{N(u)}. The partition is used to derive a new labeled graph G'. The set of states of G' consists of descendant states u^1, u^2, ..., u^{N(u)} for every state u of G. Outgoing edges from state u in G are partitioned among its descendant states and replicated in G' to each of the descendant terminal states in the following manner. For each edge e from u to v in G, we determine the partition element E_u^i to which e belongs, and endow G' with edges from u^i to v^j for j = 1, 2, ..., N(v). The label on each such edge in G' is the same as the label of the edge e in G. (Sometimes an out-splitting is called a round of out-splitting to indicate that several states may have been split simultaneously.)
The resulting graph G' generates the same system as G, and has anticipation at most one greater than that of G. Figs. 14 and 15 illustrate an out-splitting operation on a single state.

Fig. 14. Before out-splitting.
Fig. 15. After out-splitting.

Given a labeled graph G, a positive integer n, and an (A_G, n)-approximate eigenvector x, an x-consistent partition of G is defined by partitioning the set E_u of outgoing edges for each state u in G into N(u) disjoint subsets E_u^1, ..., E_u^{N(u)} with the property that

sum_{e in E_u^i} x_{tau(e)} >= n x_u^i,  for i = 1, ..., N(u),   (18)

where tau(e) denotes the terminal state of the edge e, and the x_u^i are nonnegative integers satisfying, for every state u,

sum_{i=1}^{N(u)} x_u^i = x_u.   (19)

The out-splitting based upon such a partition is called an x-consistent splitting. The vector y, indexed by the states of the split graph and defined by y_{u^i} = x_u^i, is called the induced vector. An x-consistent partition or splitting is called nontrivial if, for at least one state u, N(u) >= 2 and at least two of the integers x_u^i are positive. Figs. 16 and 17 illustrate an x-consistent splitting.

Fig. 16. Before x-consistent out-splitting.
Fig. 17. After x-consistent out-splitting.

We now summarize the steps in the state-splitting algorithm for constructing a finite-state encoder with finite anticipation [144].

The State-Splitting Algorithm:

1) Select a labeled graph G and integers p and q as follows:
   a) Find a deterministic labeled graph G (or, more generally, a labeled graph with finite anticipation) which presents the given constrained system S (most constrained systems have a natural deterministic representation that is used to describe them in the first place).
   b) Find the adjacency matrix A_G of G.
   c) Compute the capacity cap(S) = log2 lambda(A_G).
   d) Select a desired code rate p : q satisfying p/q <= cap(S) (one usually wants to keep p and q relatively small for complexity reasons).
2) Construct the qth-power graph G^q.

3) Using the Franaszek algorithm of Section IV-E2, find an (A_{G^q}, 2^p)-approximate eigenvector x.

4) Eliminate all states u with x_u = 0 from G^q, and restrict to an irreducible sink of the resulting graph, meaning a maximal irreducible subgraph H with the property that all edges with initial states in H have their terminal states in H. Restrict x to be indexed by the states of H.

5) Iterate steps 5a)–5c) below until the labeled graph has at least 2^p edges outgoing from each state:
   a) Find a nontrivial x-consistent partition of the edges. (This can be shown to be possible with a state of maximum weight.)
   b) Perform the x-consistent splitting corresponding to this partition, creating a labeled graph G' and an induced approximate eigenvector x'.
   c) Replace G by G' and x by x'.

6) At each state of the resulting graph, delete all but 2^p outgoing edges and tag the remaining edges with the binary p-blocks, one for each outgoing edge. This gives a rate p : q finite-state encoder for S.

At every iteration, at least one state is split in a nontrivial way. Since a state u with weight x_u will be split into at most x_u descendant states throughout the whole iteration process, the number of iterations required to generate the encoder graph is bounded in terms of the sum of the weights x_u; therefore, the anticipation of the final graph is bounded as well. For the same reason, the number of states in the final encoder is at most the sum of the weights x_u.

The operations of taking higher powers and out-splitting preserve definiteness (although the anticipation may increase under out-splitting). Therefore, if S is finite-type and G is a finite-memory presentation of S, any encoder constructed by the state-splitting algorithm will be (m, a)-definite for some m and a and, therefore, sliding-block-decodable.

The execution of the sliding-block code algorithm can be made completely systematic, in the sense that a computer program can be devised to automatically generate an encoder and decoder for any valid code rate.
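The core splitting step can be illustrated in a few lines of code. The sketch below (all function names are our own) performs one round of out-splitting on the third power of the (0, 1) Shannon cover, using the approximate eigenvector x = (2, 1) with n = 4; a single nontrivial x-consistent split of the maximum-weight state already yields at least 2^2 = 4 outgoing edges at every state, as required for a rate 2 : 3, (0, 1) encoder (cap is approximately 0.694 > 2/3):

```python
from itertools import product

def power_graph():
    """Third power of the (0,1)-RLL Shannon cover: state 0 = 'last bit 1',
    state 1 = 'last bit 0'; edges are labeled by the valid 3-blocks."""
    G = {0: [], 1: []}
    for s in (0, 1):
        for bits in product('01', repeat=3):
            word, st, ok = ''.join(bits), s, True
            for b in word:
                if b == '0':
                    if st == 1:        # two 0's in a row: forbidden
                        ok = False
                        break
                    st = 1
                else:
                    st = 0
            if ok:
                G[s].append((word, st))
    return G

def out_split(G, u, partition):
    """One round of out-splitting of state u.  Descendants are (u, i);
    every edge into u is replicated to all descendants of u."""
    N = len(partition)
    def dests(v):
        return [(u, i) for i in range(N)] if v == u else [v]
    H = {}
    for s, edges in G.items():
        if s == u:
            for i in range(N):
                H[(u, i)] = [(w, d) for (w, v) in partition[i] for d in dests(v)]
        else:
            H[s] = [(w, d) for (w, v) in edges for d in dests(v)]
    return H

G3 = power_graph()
x = {0: 2, 1: 1}                       # an (A, 4)-approximate eigenvector
# x-consistent partition of state 0's five outgoing edges: each part has
# terminal-weight sum >= n * x_0^i = 4 * 1.
partition = [[('111', 0), ('101', 0)],
             [('110', 1), ('011', 0), ('010', 1)]]
H = out_split(G3, 0, partition)
print({s: len(e) for s, e in H.items()})   # every state now has >= 4 edges
```

After deleting excess edges and tagging the four surviving edges at each state with the 2-bit input words, one obtains a rate 2 : 3 finite-state encoder for the (0, 1) constraint.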
Nevertheless, the application of the method to just about any nontrivial code design problem will benefit from the interactive involvement of the code designer. There are some practical tools that can help the designer make "good" choices during the construction process, meaning choices that optimize certain measures of performance and complexity. Among them is state merging, a technique that can be used to simplify the encoder produced by the ACH algorithm, as we now describe.

Let G be a labeled graph and let u and v be two states in G such that the follower set of u is contained in the follower set of v. Suppose that x is an (A_G, n)-approximate eigenvector with x_u >= x_v. The merger of v into u is the labeled graph obtained from G by: 1) eliminating all outgoing edges of v; 2) redirecting into state u all remaining edges coming into state v; and 3) eliminating the state v. It is straightforward to show that the resulting graph still presents a subsystem of S, and that the vector obtained by restricting x to the remaining states is again an (A, n)-approximate eigenvector. This operation reduces the final number of encoder states. The general problem of determining when to apply state merging during the state-splitting procedure in order to achieve the minimum number of states in the final encoder remains open.

It is also desirable to minimize the sliding-block decoder window size, in order to limit error propagation as well as decoder complexity. There are several elements of the code design that influence the window size, such as the initial presentation, the choice of approximate eigenvector, the selection of out-splittings, excess-edge elimination, and the input tag assignment. There are approaches that, in some cases, can be used during the application of the state-splitting algorithm to help reduce the size of the decoder window, but the problem of minimizing the window size remains open. In this context, it should be noted that there are alternative code-design procedures that provide very useful heuristics for constructing sliding-block-decodable encoders with small decoding window.
They also imply useful upper bounds on the minimum size of the decoding window and on the smallest possible anticipation (or decoding delay) [12]. In particular, Hollmann [92] has recently developed an approach, influenced by earlier work of Immink [103], which combines the state-splitting method with a generalized look-ahead encoding technique called bounded-delay encoding, originally introduced by Franaszek [61], [63]. In a number of cases, it was found that this hybrid code design technique produced a sliding-block-decodable encoder with smaller window length than was achieved using other methods. Several examples of such codes for specific constraints of practical importance were constructed in [92]. For more extensive discussion of complexity measures and bounds, as well as brief descriptions of other general code construction methods, the reader is referred to [144].

G. Universality of State Splitting

The guarantee of a sliding-block decoder when S is finite-type, along with the explicit bound on the decoder window length, represents a key strength of the state-splitting algorithm. Another important property is its universality. In this context, we think of the state-splitting algorithm as comprising a selection of a deterministic presentation G of a constrained system S, an approximate eigenvector x, a sequence of x-consistent out-splittings, followed by deletion of excess edges, and finally an input-tag assignment, resulting in a tagged (S, n)-encoder.

For integers m and a, and a function D from (m + a + 1)-blocks of q-blocks to the p-blocks (such as a sliding-block decoder), we define the induced mapping on bi-infinite sequences given by

(D(w))_i = D(w_{i-m}, ..., w_{i+a}).

For a tagged (S, n)-encoder with sliding-block decoder D, we take the domain of the induced mapping to be the set of all bi-infinite (output) symbol sequences obtained from the encoder. We say that a mapping is a sliding-block (S, n)-decoder if it is a sliding-block decoder for some tagged (S, n)-encoder.
The universality of the state-splitting algorithm is summarized in the following theorem due to Ashley and Marcus [9], which we quote from [144].

Universality Theorem: Let S be an irreducible constrained system and let n be a positive integer.

a) Every sliding-block (S, n)-decoder has a unique minimal tagged (S, n)-encoder, where minimality is in terms of the number of encoder states.

b) If we allow an arbitrary choice of deterministic presentation G of S and of (A_G, n)-approximate eigenvector x, then the state-splitting algorithm can find a tagged (S, n)-encoder for every sliding-block (S, n)-decoder. If we also allow merging of states (i.e., the merging described above), then it can find the minimal tagged (S, n)-encoder for every sliding-block (S, n)-decoder.

c) If we fix G to be the Shannon cover of S, but allow an arbitrary choice of (A_G, n)-approximate eigenvector x, then the state-splitting algorithm can find a tagged (S, n)-encoder for every sliding-block (S, n)-decoder D, modulo a change in the domain of D, possibly with a constant shift of each bi-infinite sequence prior to applying D (but with no change in the decoding function itself). If we also allow merging of states, then, modulo the same changes, it can find the minimal tagged (S, n)-encoder for every sliding-block (S, n)-decoder. In particular, it can find a sliding-block (S, n)-decoder with minimal decoding window length.

Certain limitations on the use of the algorithm should be noted, however [9]. If we apply the state-splitting algorithm to the Shannon cover of an irreducible constrained system S, it need not be able to find a sliding-block (S, n)-decoder with the smallest number of encoder states in its minimal tagged (S, n)-encoder. Similarly, if we start with the Shannon cover of an irreducible constrained system and, in addition, we fix x to be a minimal (A_G, n)-approximate eigenvector (i.e., one with smallest component sum), then the algorithm may fail to find a sliding-block (S, n)-decoder with minimum decoding window length [119], [103], [9].
The universality of the state-splitting algorithm is an attractive property, in that it implies that the technique can be used to produce the "best" codes. However, in order to harness the power of this design tool, strategies for making the right choices during the execution of the construction procedure are required. There is considerable room for further research in this direction, as well as in the development of other code-construction methods.

H. Practical Aspects of High-Rate Code Design

The construction of very high rate (0, k)-constrained codes and DC-balanced codes is an important practical problem [71], [102], [208]. The construction of such high-rate codes is far from obvious, as table look-up for encoding and decoding is an engineering impracticality. The usual approach is to supplement the m source bits with a small number of additional bits. Under certain, usually simple, rules the source word is modified in such a way that the modified word plus supplementary bits comply with the constraints. The information that certain modifications have been made is carried by the supplementary bits. The receiver, on reception of the word, will undo the modifications. In order to reduce complexity and error propagation, the number of bits affected by a modification should be as small as possible. We now give some examples of such constructions.

A traditional example of a simple DC-free code is the polarity bit code [26]. The m source symbols are supplemented by one bit called the polarity bit. The encoder has the option to transmit the (m + 1)-bit word without modification or to invert all of its symbols. The choice between the two is made in such a way that the running digital sum of the transmitted sequence remains as close to zero as possible. It can easily be shown that the running digital sum then takes a finite number of values, so that the sequence generated is DC-balanced.
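A minimal sketch of the polarity bit code (our own implementation) makes the boundedness of the running digital sum concrete; with bits mapped to plus and minus one, the RDS at word boundaries never exceeds m + 1 in magnitude, so the instantaneous RDS is bounded by 2(m + 1):

```python
# Sketch of the polarity-bit code: append one bit to each m-bit word and
# transmit either the word or its complement, whichever keeps the running
# digital sum (RDS, with 0 -> -1 and 1 -> +1) closest to zero.

import random

def polarity_encode(words):
    rds, channel = 0, []
    for w in words:
        sym = [1 if b else -1 for b in w] + [1]      # polarity bit appended
        if abs(rds + sum(sym)) > abs(rds - sum(sym)):
            sym = [-s for s in sym]                  # transmit inverted word
        rds += sum(sym)
        channel.extend(sym)
    return channel

random.seed(1)
m = 8
src = [[random.randint(0, 1) for _ in range(m)] for _ in range(500)]
out = polarity_encode(src)
peaks = [abs(sum(out[:i + 1])) for i in range(len(out))]
print(max(peaks) <= 2 * (m + 1))   # RDS stays bounded: the code is DC-free
```

The decoder simply inspects the polarity bit of each received (m + 1)-bit word and re-inverts the word if necessary, so error propagation is confined to one word.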
A surprisingly simple method for transforming an arbitrary word into a codeword having equal numbers of 's and 's (that is, a balanced or zero-disparity word) was published by Knuth [129] and Henry [85]. Let be the disparity of the binary source word . Let be the running digital sum of the first bits of , and let be the word with its first bits inverted. For example, if we have and . If is of even length , and if we let stand for , then the quantity is . It is immediate that (no symbols inverted) and (all symbols inverted). We may, therefore, conclude that every word can be associated with at least one such that is balanced. The value of is encoded in a (preferably) zero-disparity word of even length . If and are both odd, we can use a similar construction. The maximum codeword length is governed by . Some other modifications of the basic scheme are discussed in Knuth [129] and Alon [5].

The sequence replacement technique [202] converts source words of length into -constrained words of length . The control bit is set to and appended at the beginning of the -bit source word. If this -bit sequence satisfies the prescribed constraint, it is transmitted. If the constraint is violated, i.e., a runlength of at least 's occurs, we remove the trespassing 's. The position where the start of the violation was found is encoded in bits, which are appended at the beginning of the -bit word. Such a modification is signaled to the receiver by setting the control bit to . The codeword remains of length . The procedure above is repeated until all forbidden subsequences have been removed. The receiver can reconstruct the source word, as the position information is stored at a predefined position in the codeword. In certain situations the entire source word has to be modified, which makes the procedure prone to error propagation. The class of rate -constrained codes of [111] was constructed to minimize error propagation.
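The balancing step of Knuth's method admits a direct sketch. The linear search over the inversion index below is for clarity only (Knuth's scheme locates the index incrementally), and we omit the short balanced prefix that must convey the index to the decoder:

```python
def knuth_balance(word):
    """Knuth balancing sketch: for an even-length binary word there is
    always an index k such that inverting the first k bits yields a
    balanced word.  The balance count changes by +/-1 per step and has
    opposite signs at k = 0 and k = n, so it must pass through zero."""
    n = len(word)
    for k in range(n + 1):
        flipped = [1 - b for b in word[:k]] + list(word[k:])
        if 2 * sum(flipped) == n:      # as many 1's as 0's
            return k, flipped
    raise AssertionError("unreachable for even-length words")

def knuth_restore(k, flipped):
    """Decoder: re-invert the first k bits."""
    return [1 - b for b in flipped[:k]] + list(flipped[k:])
```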
Fig. 18. Probability that no sequence of drawings from a selection set of random sequences satisfies the constraint. Code rate . Upper curve: codeword length , selection set size ; lower curve: codeword length , selection set size .

In this construction, error propagation is confined to one decoded 8-bit symbol, irrespective of the codeword length.

Recently, the publications by Fair et al. [48] and Immink and Patrovics [110] on guided scrambling brought new insights into high-rate code design. Guided scrambling is a member of a larger class of related coding schemes called multimode codes. In multimode codes, the -bit source word is mapped into -bit codewords. Each source word can be represented by a member of a selection set consisting of codewords. Examples of such mappings are the guided scrambling algorithm presented by Fair et al. [48], the DC-free coset codes of Deng and Herro [43], and the scrambling using a Reed-Solomon code by Kunisa et al. [135]. A mapping is considered to be "good" if the selection set contains sufficiently distinct and random codewords. The encoder transmits the codeword that minimizes, according to a prescribed criterion, some property of the encoded sequence, such as its low-frequency spectral content. In general, there are two key elements which need to be chosen judiciously: a) the mapping between the source words and their corresponding selection sets, and b) the criterion used to select the "best" word. The use of multimode codes is not confined to the generation of DC-free sequences. Provided that is large enough and the selection set contains sufficiently different codewords, multimode codes can also be used to satisfy almost any channel constraint with a suitably chosen selection method. For a given rate and proper selection criteria, the spectral content of multimode codes is very close to that of maxentropic RDS-constrained sequences.
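A minimal multimode encoder can be sketched as follows. The details are hypothetical simplifications of ours: the r-bit prefix indexes a toy scrambler (XOR with the repeated prefix), standing in for the guided-scrambling mapping of [48], and the selection criterion is the magnitude of the word-end RDS:

```python
import itertools

def multimode_encode(word, r):
    """Multimode encoding sketch: each of the 2^r prefixes indexes one
    candidate codeword (prefix + word XOR repeated-prefix pattern);
    transmit the candidate minimizing |disparity|, i.e. the word-end
    running digital sum on the bipolar scale."""
    def criterion(cw):
        return abs(sum(2 * b - 1 for b in cw))
    best = None
    for prefix in itertools.product([0, 1], repeat=r):
        pattern = list(prefix) * (len(word) // r + 1)
        candidate = list(prefix) + [b ^ p for b, p in zip(word, pattern)]
        if best is None or criterion(candidate) < criterion(best):
            best = candidate
    return best

def multimode_decode(codeword, r):
    """Read the prefix, rebuild the scrambler pattern, undo the XOR."""
    prefix = codeword[:r]
    pattern = list(prefix) * ((len(codeword) - r) // r + 1)
    return [b ^ p for b, p in zip(codeword[r:], pattern)]
```

Note that the decoder is oblivious to the selection criterion; only the encoder pays the cost of evaluating all candidates.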
A clear disadvantage is that the encoder needs to generate all possible codewords, compute the criterion, and make the decision.

In the context of high-rate multimode codes, there is interest in weakly constrained codes [107]. Weakly constrained codes may produce sequences that violate the constraints with probability . It is argued that if the channel is not free of errors, it is pointless to feed the channel with perfectly constrained sequences. We illustrate the effectiveness of this idea by considering the properties of two examples of weak codes. Fig. 18 shows the probability that no sequence taken from a selection set of size of random sequences obeys the constraint. Let the code rate , the codeword length , and the size of the selection set . Then we observe that with probability a codeword violates the constraint. The alternative implementation [111] requires a rate of (four times the redundancy of the weakly constrained code) to strictly guarantee the same constraint.

V. CONSTRAINED CODES FOR NOISY RECORDING CHANNELS

In Section III-A, we indicated how the implementation of timing recovery, gain control, and detection algorithms in recording systems created a need for suitably constrained recording codes. These codes are typically used as an inner code, in concatenation with an outer error-correcting code. The error-correcting codes improve system performance by introducing structure, usually of an algebraic nature, that increases the separation of code sequences as measured by some distance metric, such as Hamming distance.

A number of authors have addressed the problem of endowing constrained codes with advantageous distance properties. Metrics that have been considered include Hamming distance, edit (or Levenshtein) distance, and Lee distance. These metrics arise in the context of a variety of error types, including random bit errors, insertion and deletion errors, bitshift errors, and, more generally, burst errors.
Code constructions, performance analyses, as well as lower and upper bounds on the achievable size of constrained codes with specified distance properties are surveyed in [144].

It is fair to say that the application of constrained codes with random or burst-error correction capabilities, proposed largely in the context of storage systems using symbol-by-symbol detection such as peak detection, has been extremely limited. However, the advent of digital signal processing techniques such as PRML has created a new role for recording codes, analogous to the role of trellis-coded modulation in digital communications. In this section, we describe how appropriately constrained code sequences can improve PRML system performance by increasing the separation between the channel output sequences with respect to Euclidean distance.

A. PRML Performance Bounds and Error Event Analysis

The design of distance-enhancing constrained codes for recording channels requires an understanding of the performance of the PRML Viterbi detector, which we now briefly review. The detector performance is best understood in terms of error events. For a pair of input sequences and , define the input error sequence and the output error sequence . A closed error event corresponds to a polynomial input error sequence , where and are finite integers, , and . A closed error event is said to be simple if the condition is not true for any integer , where is the memory of the channel.
An open error event corresponds to a right-infinite input error sequence of the form , where infinitely many are nonzero, but the Euclidean norm is finite. In general, for an error event with corresponding input error sequence and output error sequence , the squared-Euclidean distance is defined as . The number of channel input-bit errors corresponding to an error event is given by .

The ML detector produces an error when the selected trellis path differs from the correct path by a sequence of error events. The union bound provides an upper bound on the probability of an error event beginning at some time (first event at time ) by considering the set of all possible simple error events, which in the assumed case of AWGN yields . Reorganizing the summation according to the error event distance , the bound is expressed as , where the values , known as the error event distance spectrum, are defined by . At moderate-to-high SNR, the performance of the system is largely dictated by error events with small distance . In particular, the events with the minimum distance will be the dominant contributors to the union bound, leading to the frequently used approximation .

For a number of the PR channel models applicable to recording, the error event distance spectrum values, as well as the corresponding input error sequences, have been determined for a range of values of the distance [7], [198], [6]. The calculation is made somewhat interesting by the fact, mentioned in Section II-C2, that the PR trellises support closed error events of unbounded length having certain specified, finite distances. For channels with limited ISI, analytical methods may be applied in the characterization of low-distance events. However, for larger distances, and for PR channel polynomials of higher degree, computer search methods have been more effective.

TABLE IV. ERROR EVENT MULTIPLICITY GENERATING FUNCTIONS

Table IV gives several terms of the error event multiplicity generating functions for several PR channels.
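The squared-Euclidean distance of a closed error event can be computed directly by convolving the input error sequence with the channel polynomial. The sketch below assumes normalized ternary error symbols in {-1, 0, +1}, a convention under which, for example, a single-bit error on PR4 (h(D) = 1 - D^2, a standard model) has squared distance 2:

```python
def event_distance_sq(h, e):
    """Squared-Euclidean distance of a closed error event: the output
    error sequence is the convolution of the input error sequence e
    (normalized ternary symbols) with the PR channel coefficients h;
    d^2 is the sum of its squares."""
    out = [0] * (len(e) + len(h) - 1)
    for i, ei in enumerate(e):
        for j, hj in enumerate(h):
            out[i + j] += ei * hj
    return sum(v * v for v in out)
```

With this normalization, the event e = (+, 0, +) on PR4 also attains d^2 = 2, reflecting the per-interleave dicode structure, while a single error on EPR4 (h(D) = 1 + D - D^2 - D^3) gives d^2 = 4.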
Tables V and VI, respectively, list the input error sequences for simple closed events on the PR4 and EPR4 channels having squared-distance . Table VII describes the input error sequences for simple closed events on the E PR4 channel having squared-distance . In the error sequence tables, the symbol " " is used to designate " ," " " is used to designate " ," and a parenthesized string denotes any positive number of repetitions of the string .

TABLE V. CLOSED ERROR EVENTS (PER INTERLEAVE) FOR PR4 CHANNEL

TABLE VI. CLOSED ERROR EVENTS FOR EPR4 CHANNEL

B. Code Design Strategy

The characterization of error sequences provides a basis for the design of constrained codes that eliminate events with a small Euclidean distance, thereby increasing the minimum distance and giving a performance improvement [123], [182], [122], [154]. This operation is similar in nature to expurgation in the context of algebraic codes.

More specifically, the design of distance-enhancing constrained codes for PRML systems is based upon the following strategy. First, we identify the input error sequences with , where is the target distance of the coded channel. Then, we determine a list of input error strings that, if eliminated by means of a code constraint, will prevent the occurrence of error events with . We denote the set of ternary error sequences satisfying this constraint by . In order to prevent these error strings, we must next determine a code constraint with the property that the corresponding set of input error sequences satisfies (20). There are many choices for the error strings , as well as for constraints satisfying (20).

TABLE VII. CLOSED ERROR EVENTS FOR E PR4 CHANNEL

The problem of identifying those that can generate the best practical distance-enhancing codes (with a specified coding gain, high rate, simple encoder and decoder, and low-complexity sequence detector) remains open.
The code constraints and the PR channel memory are then incorporated into a single detector trellis that can serve as the basis for the Viterbi detector. The final step in the design procedure is to construct an efficient code into the constraints . This can be accomplished using code design techniques such as those discussed in Section IV.

It is useful to distinguish between two cases in implementing this strategy for the PR channels we have discussed. The cases are determined by the relationship of the minimum distance to the matched-filter-bound (MFB) distance , where is the energy in the channel impulse response. The first case pertains to those channels which are said to achieve the MFB, including PR4, EPR4, and PR1. For these channels, the set of minimum-distance input error sequences includes , and so any distance-enhancing code constraint must prevent this input error impulse from occurring. The second case involves channels which do not achieve the MFB. This case applies to E PR4, for all , as well as EPR2, for all . Note that, in this situation, a minimum-distance input error sequence (in fact, every error sequence satisfying ) has length strictly greater than , where event length refers to the span between the first and last nonzero symbols. These events can often be eliminated with constraints that are quite simply specified and for which practical, efficient codes are readily constructed.

For the latter class of channels, we can determine distance-enhancing constraints that increase the minimum distance to , yet are characterizable in terms of a small list of relatively short forbidden code strings. (We will sometimes denote such constraints by .) This permits the design of high-rate codes, and also makes it possible to limit the complexity of the Viterbi detector, since the maximum length of a forbidden string may not exceed too significantly, or at all, the memory of the uncoded channel.
Consequently, and perhaps surprisingly, the design of high-rate, distance-enhancing codes with acceptable encoder/decoder and Viterbi detector complexity proves to be considerably simpler for the channels in the second group, namely, the channels with relatively larger intersymbol interference. We now turn to a discussion of some specific distance-enhancing constraints and codes for partial-response channels.

C. Matched-Spectral-Null Constraints

As mentioned above, spectral-null constraints, particularly those with DC nulls and/or Nyquist nulls, are well-matched to the frequency characteristics of digital recording channels, and have found application in many recording systems prior to the introduction of PRML techniques. In [121] and [46], it was shown that, in addition, constraints with spectral nulls at the frequencies where the channel frequency response has the value zero, called matched-spectral-null (MSN) constraints, can increase the minimum distance relative to the uncoded channel. An example of this phenomenon, and one which served historically to motivate the use of matched-spectral-null codes, is the rate biphase code, with binary codewords and , which, one can easily show, increases the minimum squared-Euclidean distance of the binary-input dicode channel from to .

To state a more general bound on the distance-enhancing properties of MSN codes, we generalize the notion of a spectral-null constraint to include sequences for which higher order derivatives of the power spectrum vanish at specified frequencies as well. More precisely, we say that an ensemble of sequences has an order- spectral density null at if the power spectral density satisfies . We will concentrate here upon those with a high-order spectral null at DC. Sequences with high-order spectral nulls can be characterized in a number of equivalent ways.
The high-order running digital sums of a sequence x_1, x_2, ... at DC can be defined recursively as

RDS(1)_n = x_1 + x_2 + ... + x_n,
RDS(k)_n = RDS(k-1)_1 + RDS(k-1)_2 + ... + RDS(k-1)_n.

Sequences with an order- spectral null at DC may be characterized in terms of properties of RDS. Another characterization involves the related notion of high-order moments (power sums), where the order- moment at DC of the sequence is defined as . In analogy to the characterization of (first-order) spectral-null sequences, one can show that an ensemble of sequences generated by freely concatenating a set of codewords of finite length will have an order- spectral null at DC if and only if (21) holds for all codewords. In other words, for each codeword, the order- moments at DC must vanish for . A sequence satisfying this condition is also said to have zero disparity of order . Finally, we remark that a length- sequence with -transform has an order- spectral null at DC if and only if is divisible by . This fact plays a role in bounding the distance-enhancing properties of spectral-null sequences. For more details about high-order spectral-null constraints, particularly constraints with a high-order null at DC, we refer the interested reader to Immink [99], Monti and Pierobon [153], Karabed and Siegel [121], Eleftheriou and Cideciyan [46], and Roth, Siegel, and Vardy [165], as well as other references cited therein.

The original proof of the distance-enhancing properties of MSN codes was based upon a number-theoretic lower bound on the minimum Hamming distance of zero-disparity codes, due to Immink and Beenker [108]. They proved that the minimum Hamming distance (and, therefore, the minimum Euclidean distance) of a block code over the bipolar alphabet with an order- spectral null at DC grows at least linearly in . Specifically, they showed that the bound holds for any pair of length- sequences in the code. This result for block codes can be suitably generalized to any constrained system with an order- spectral null at DC.
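The moment conditions (21) can be checked mechanically. A sketch assuming bipolar codeword symbols, with the higher order RDS computed by iterated cumulative summation; for instance, the word +1, -1, -1, +1 has zero disparity of order 2 (its polynomial 1 - z - z^2 + z^3 is divisible by (1 - z)^2):

```python
def moment(x, k):
    """Order-k moment at DC of the sequence x: sum_i i^k * x_i."""
    return sum((i ** k) * v for i, v in enumerate(x))

def dc_null_order(x):
    """Largest K such that the order-k moments of the codeword x vanish
    for k = 0, ..., K-1; freely concatenating such codewords yields an
    order-K spectral null at DC (condition (21))."""
    K = 0
    while K < len(x) and moment(x, K) == 0:
        K += 1
    return K

def rds(x, order):
    """Order-`order` running digital sum: iterated cumulative sums."""
    cur = list(x)
    for _ in range(order):
        total, nxt = 0, []
        for v in cur:
            total += v
            nxt.append(total)
        cur = nxt
    return cur
```

Consistent with the equivalence noted above, a codeword with vanishing moments of orders 0 and 1 also ends with RDS(1) and RDS(2) equal to zero.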
The result also extends to systems with an order- spectral null at any rational submultiple of the symbol frequency, in particular, at the Nyquist frequency.

In [121], this result was extended to show that the Lee distance, and a fortiori the squared-Euclidean distance, between output sequences of a bipolar, input-constrained channel is lower-bounded by if the input constraint and the channel, with spectral nulls at DC (or the Nyquist frequency) of orders and , respectively, combine to produce a spectral null at DC (or Nyquist) of order . This result can be proved by applying Descartes' rule of signs to the -transform representation of these sequences, using the divisibility conditions mentioned above [121].

This result can be applied to the PR4, EPR4, and E PR4 channels, which have a first-order null at DC and a Nyquist null of order and , respectively. If the channel inputs are constrained to be bipolar sequences with an order- Nyquist null, the channel outputs will satisfy the following lower bound on minimum squared-Euclidean distance: for PR4, for EPR4, and for E PR4. Comparing to the minimum distance of the uncoded bipolar channels, we see that the MSN constraint with , corresponding to a first-order Nyquist null, provides a coding gain (unnormalized for rate loss) of at least 3, 1.8, and 1.2 dB, respectively. Using the observation made in Section III-A3, one can design codes with a first-order null at DC and Nyquist by twice-interleaving a DC-free code. When such a code is applied to the PR4 channel, which has an interleaved dicode decomposition, the implementation of the MSN-coded system becomes feasible. Code-design techniques such as those described in Section IV have been used to design efficient MSN codes. For analytical and experimental results pertaining to a rate MSN-coded PR4 system, the reader is referred to [164] and [169].
Experimental evaluation of a spectral-null coded tape system is described in [27].

For these examples of MSN-constrained PR channels, the error event characterization discussed above provides another confirmation, and a refinement, of the coding gain bounds. The verification makes use of the moment conditions satisfied by closed input error sequences satisfying spectral-null properties, a generalization of the moment conditions in (21) above. Specifically, a first-order DC null requires that (22) holds, and a first-order Nyquist null requires that (23) holds. Examination of the error events for PR4 in Table V shows that each error event with fails to satisfy at least one of these conditions. Similarly, for EPR4, the error events in Table VI with are forbidden by the moment conditions. In the case of E PR4, the error event characterization not only confirms, but also improves, the lower bound. Table VII shows that the moment conditions cannot be satisfied by any error sequence with , implying a nominal coding gain of 2.2 dB. MSN coding based upon Nyquist-free constraints is applicable to optical PR channels, and error-event analysis can be used to confirm the coding gain bounds in a similar manner [152].

There have been a number of extensions and variations on MSN coding techniques, most aimed at increasing code rate, improving intrinsic runlength constraints, or reducing the implementation complexity of the encoder, decoder, and detector. For further details, the reader should consult [68] and the references therein, as well as more recent results in, for example, [151] and [147].

When implementing MSN-coded PR systems, the complexity of the trellis structure that incorporates both the PR channel memory and the MSN constraints can be an issue, particularly for high-rate codes requiring larger digital sum variation.
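The first-order moment conditions (22) and (23) translate directly into code; error symbols are again taken in normalized ternary form, an assumption of ours:

```python
def dc_moment(e):
    """First-order DC-null moment condition (22): the error symbols
    of a closed event must sum to zero."""
    return sum(e)

def nyquist_moment(e):
    """First-order Nyquist-null moment condition (23): the alternating
    sum of the error symbols must vanish."""
    return sum(((-1) ** i) * v for i, v in enumerate(e))

def forbidden_by_msn(e):
    """A closed input error sequence is ruled out by a constraint with
    first-order nulls at both DC and Nyquist (e.g. a twice-interleaved
    DC-free code) unless both moment conditions hold."""
    return dc_moment(e) != 0 or nyquist_moment(e) != 0
```

For example, a single-symbol error violates both conditions, the event (+, -) satisfies (22) but not (23), while (+, 0, -) satisfies both and so is not excluded by these first-order conditions alone.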
Reduced-complexity, suboptimal detection algorithms based upon a concatenation of a Viterbi detector for the PR channel and an error-event post-processor have been proposed for a DC/Nyquist-free block-coded PR4 channel [128] and EPR4 channel [184]. In both schemes, DC/Nyquist-free codewords are obtained by interleaving pairs of DC-free codewords, and discrepancies in the first-order moments of the interleaved codeword estimates produced by the PR channel detector are utilized by the post-processors to determine and correct the most probable minimum-distance error events.

It should be pointed out that aspects of the code design strategy described above were foreshadowed in an unpublished paper of Fredrickson [66] dealing with the biphase-coded dicode channel. In that paper, the observation was made that the input error sequences corresponding to the minimum squared-distance were of the form , and those corresponding to the next-minimum distance were of the form . Fredrickson modified the encoding process to eliminate minimum-distance events by appending an overall "parity-check" bit to each block of input bits, for a specified value of . The resulting rate code provided a minimum squared-Euclidean distance at the output of the dicode channel, with only a modest penalty in rate for large . The Viterbi detector for the coded channel was modified to incorporate the parity modulo- and to reflect the even-parity condition at the codeword boundaries. It was also shown that both the and the events can be eliminated by appending a pair of bits to each block of input bits in order to enforce a specific parity condition modulo- . The resulting rate code yielded at the dicode channel output, and the coding gain was realized with a suitably enhanced detector structure.

D. Runlength Constraints

Certain classes of runlength constraints have distance-enhancing properties when applied to the magnetic and optical PR channels.
For example, the NRZI constraint has been applied to the EPR4 and the E PR4 magnetic channels, as well as the PR1 and PR2 optical channels; see, for example, [152] and the references therein. On the EPR4 and PR1 channels, the constraint does not increase minimum distance. However, it does eliminate some of the minimum-distance error events, thereby providing some performance improvement. Moreover, the incorporation of the constraint into the detector trellis for EPR4 leads to a reduction of complexity from eight states to six states, eliminating those corresponding to the NRZ channel inputs and .

TABLE VIII. INPUT PAIRS FOR FORBIDDEN ERROR STRINGS IN

In the case of E PR4, Behrens and Armstrong [17] showed that the constraint provides a 2.2-dB increase in minimum squared-Euclidean distance. To see why this is the case, observe that forbidding the input error strings will prevent all closed error events with . Forbidding, in addition, the strings prevents all open events with as well. Table VIII depicts pairs of binary input strings whose corresponding error strings belong to . The symbol represents an arbitrary binary value common to both strings in a pair. Clearly, the elimination of the NRZ strings and precludes all of the input error strings. The precoded constraint precludes the NRZ strings (that is, the NRZ constraint is ), confirming that the constraint prevents all events with . When the constraint is incorporated into the detector trellis, the resulting structure has only 10 states, substantially fewer than the 16 states required by the uncoded channel.

The input error sequence analysis used above to confirm the distance-enhancing properties of the constraint on the EPR4 channel suggests a relaxation of the constraint that nevertheless still achieves the same distance gain.
Specifically, the constraint and the complementary constraint are sufficient to ensure the elimination of closed and open events with . The capacity of this constraint satisfies , and a rate finite-state encoder with state-independent decoder is described in [122]. The corresponding detector trellis requires 12 states. Thus with a modest increase in complexity, this code achieves essentially the same performance as the rate code, while increasing the rate by 20%.

This line of reasoning may be used to demonstrate the distance-enhancing properties of another class of NRZI runlength constraints, referred to as maximum-transition-run (MTR) constraints [154]. These constraints limit, sometimes in a periodically time-varying manner, the maximum number of consecutive 's that can occur. The MTR constraints are characterized by a parameter , which determines the maximum allowable runlength of 's. These constraints can be interpreted as a generalization of the constraint, which is the same as the MTR constraint with . The MTR constraint with was introduced by Moon and Brickner [154] (see also Soljanin [181]). A labeled graph representation is shown in Fig. 19. The capacity of this constraint is . Imposing an additional constraint, which we now denote , on the maximum runlength of 's reduces the capacity, as shown in Table IX.

Fig. 19. Labeled graph for MTR constraint.

TABLE IX. CAPACITY OF MTR FOR SELECTED VALUES OF

TABLE X. INPUT PAIRS FOR FORBIDDEN ERROR STRINGS IN

The NRZI MTR constraint with corresponds to an NRZ constraint . The error-event characterization in Table VII shows that the forbidden input error list suffices to eliminate the closed error events on E PR4 with , though not all the open events. Analysis of input pairs, shown in Table X, reveals that the MTR constraint indeed eliminates the closed error events with . The detector trellis that incorporates the E PR4 memory with this MTR constraint requires 14 states. A rate block code is shown in Table XI [154].
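Capacities such as those quoted in Table IX can be reproduced from the constraint graph of Fig. 19: the capacity is the base-2 logarithm of the largest eigenvalue of the graph's adjacency matrix. A stdlib-only sketch, with power iteration standing in for a proper eigensolver; MTR with parameter 1 should recover the golden-ratio capacity of about 0.6942, and parameter 2 about 0.8791:

```python
import math

def spectral_radius(A, iters=500):
    """Largest eigenvalue of a nonnegative irreducible matrix by
    power iteration with max-normalization."""
    n = len(A)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam

def mtr_capacity(j):
    """Capacity of the NRZI MTR constraint allowing at most j
    consecutive 1's: log2 of the spectral radius of the adjacency
    matrix of the graph whose state records the current run of 1's."""
    n = j + 1
    A = [[0] * n for _ in range(n)]
    for s in range(n):
        A[s][0] = 1            # a 0 resets the run of 1's
        if s < j:
            A[s][s + 1] = 1    # a 1 extends the run, up to j
    return math.log2(spectral_radius(A))
```

The same adjacency-matrix construction extends to the time-varying (TMTR) case by taking products of the even-time and odd-time transition matrices.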
It is interesting to observe that the MTR constraint is the symbol-wise complement of the constraint, and the rate MTR codebook is the symbol-wise complement of the rate Group Code Recording code shown in Table I. With this code, all open error events with are eliminated.

The MTR constraint supports codes with rates approaching its capacity [154], [155]. However, in practical applications, a distance-enhancing code with rate or higher is considered very desirable. It has been shown that higher rate trellis codes can be based upon time-varying MTR (TMTR) constraints [67], [23], [123], [52]. For example, the TMTR constraint defined by , which limits the maximum runlength of 's beginning at an even time-index to at most , has capacity . The constraint has been shown to support a rate block code. Graph representations for the TMTR constrained system are shown in Fig. 20. The states in the upper graph are depicted as either circles or squares, corresponding to odd time indices and even time indices, respectively. The numbering of states reflects the number of 's seen since the last . In the upper graph, each state represents a unique such number. The lower graph is obtained by successively merging states with identical follower sets.

TABLE XI. ENCODER TABLE FOR RATE 4/5, MTR BLOCK CODE

Fig. 20. Labeled graphs for TMTR constraint.

The TMTR constraint eliminates all closed error events with on the E PR4 channel by preventing the input error sequences . As with the MTR constraint, it can be shown that all open error events with can be eliminated by an appropriately designed rate TMTR block code [123], [23], [21], [24], [52]. The time-varying trellis used by the detector for the rate coded E PR4 channel requires 16 states, no more states than the uncoded system.
It has been shown that these constraints and codes may also be applied to the E PR4 channel to increase the minimum distance to the channel MFB, that is, from to [152]. Time-varying constraints for the E PR4 channel that support distance-enhancing codes with rates larger than have also been found [52]. Fig. 21 shows a computer simulation of the bit-error-rate performance of four distance-enhancing constraints on the EPR4 channel, assuming a constant channel bit rate [152].

As a result of the favorable tradeoff between performance and complexity offered by high-rate distance-enhancing codes for high-order PR channels, there is currently great interest in deploying them in commercial magnetic data-storage systems, and further research into the design of such codes is being actively pursued.

Finally, we remark that, for optical recording, the constraint and the TMTR constraint increase the minimum distance to on the PR2 and EPR2 channels, yielding nominal coding gains of 1.8 and 3 dB, respectively. A simple rate code for the TMTR constraint may be used with a four-state detector to realize these coding gains [29], [152].

E. Precoded Convolutional Codes

An alternative, and in fact earlier, approach to coded modulation for PR channels of the form was introduced by Wolf and Ungerboeck [204] (see also [30]). Consider first the case , the dicode channel.
A binary input sequence is applied to an NRZI precoder, which implements the precoding operation characterized by the polynomial . The binary precoder outputs are modulated to produce the bipolar channel inputs according to the rule . Let be precoder inputs, with corresponding channel outputs . Then the Euclidean distance at the output of the channel is related to the Hamming distance at the input to the precoder by the inequality (24).

Now, consider as precoder inputs the set of code sequences in a convolutional code with states in the encoder and free Hamming distance . The outputs of the PR channel may be described by a trellis with or fewer states [212], which may be used as the basis for Viterbi detection. The inequality (24) leads to the following lower bound on of the coded system: if is even, and if is odd.

This coding scheme achieves coding gains on the dicode channel by the application of good convolutional codes, designed for memoryless Gaussian channels, and the use of a sequence detector trellis that reflects both the structure of the convolutional code and the memory of the channel. Using a nontrivial coset of the convolutional code ensures the satisfaction of constraints on the zero runlengths at the output of the channel. It is clear that, by interleaving convolutional encoders and using a precoder of the form , this technique, and the bound on free distance, may be extended to PR channels of the form . In particular, it is applicable to the PR4 channel, corresponding to . The selection of the underlying convolutional code and nontrivial coset to optimize runlength constraints, free distance, and detector trellis complexity has been investigated by several authors. See, for example, [89], [90], and [212].

Fig. 21. Performance of uncoded and coded E PR4 systems.
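The precoding chain itself is easy to simulate. The sketch below assumes bipolar modulation 0 -> -1, 1 -> +1, so channel output error symbols lie in {-2, 0, +2}; in this normalization the measured output distances for odd and even input Hamming distances reflect the even/odd form of the bound (24):

```python
def precode_and_transmit(a, h=(1, -1)):
    """Apply the 1/(1 XOR D) precoder to the binary sequence a,
    modulate to bipolar symbols, and convolve with the PR channel
    coefficients h (default: dicode, 1 - D)."""
    b, prev = [], 0
    for bit in a:
        prev ^= bit                  # b_n = a_n XOR b_{n-1}
        b.append(prev)
    x = [2 * v - 1 for v in b]       # 0/1 -> -1/+1
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def output_dist_sq(a1, a2, h=(1, -1)):
    """Squared-Euclidean distance at the channel output for two
    precoder input sequences of equal length."""
    y1 = precode_and_transmit(a1, h)
    y2 = precode_and_transmit(a2, h)
    return sum((u - v) ** 2 for u, v in zip(y1, y2))
```

For example, precoder inputs at Hamming distance 1 and at Hamming distance 2 from the all-zero sequence both yield output distance 8 on the dicode channel, while Hamming distance 3 yields 16.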
For the PR4 channel and a specified free Euclidean distance at the channel output, the runlength constraints and complexity of precoded convolutional codes have been found to be slightly inferior to those of matched-spectral-null (MSN) codes. For example, a rate precoded convolutional code was shown to achieve a 3-dB gain (unnormalized for rate loss) with constraints and a 16-state detector trellis with 256 branches (per interleave). The comparable MSN code with this gain achieved the equivalent of constraints and used a six-state detector trellis with 24 branches (per interleave).

Recently, a modified version of this precoding approach was developed for use with a rate turbo code [168]. The detection procedure incorporated an a posteriori probability (APP) PR channel detector, combined with an iterative turbo decoder. Performance simulations of this coding scheme on a PR4 channel with AWGN demonstrated a gain of 5.3 dB (normalized for rate loss) at a bit-error rate of , relative to the uncoded PRML channel. Turbo equalization, whereby the PR detector is integrated into the iterative decoding procedure, was also considered. This increased the gain by another 0.5 dB. Thus the improvement over the previously proposed rate codes, which achieve a 2-dB gain (normalized for rate loss), is approximately 3.3-3.8 dB. The remaining gap between the rate turbo code performance at a bit-error rate of and the upper bound capacity limit (3) at rate [172] is approximately 2.25 dB [168]. The corresponding gap to the upper bound capacity limit at rate for the precoded convolutional code and the MSN code is therefore approximately 5.5-6 dB. This estimate of the SNR gap can be compared with that implied by the continuous-time channel capacity bounds, as discussed in Section II-B.

VI. COMPENDIUM OF MODULATION CONSTRAINTS

In this section, we describe in more detail selected properties of constrained systems that have played a prominent role in digital recording systems.
The classes of runlength-limited constraints and spectral-null constraints have already been introduced. In addition, there are constraints that generate spectral lines at specified frequencies, called pilot tracking tones, which can be used for servo tracking systems in videotape recorders [118], [115]. Certain channels require a combination of time and frequency constraints [128], [157], [160]; specifically, DC-balanced RLL sequences have found widespread usage in recording practice. In addition, there are many other constraints that play a role in recording systems; see, for example, [102], [196], [146], [177], and [178]. Table XII gives a survey of recording constraints used in consumer electronics products.

TABLE XII. Survey of recording codes and their application area.

A. Runlength-Limited Sequences

We have already encountered (d, k)-constrained binary sequences with k < ∞. We are also interested in the case k = ∞. Fig. 22 illustrates a graph representing (d, k) constraints.

Fig. 22. Shannon cover for a (d, k) constraint.

TABLE XIII. Capacity versus runlength parameters d and k.

For (d, k) sequences we can easily derive the characteristic equation z^{k+2} − z^{k+1} − z^{k−d+1} + 1 = 0, or equivalently, Σ_{i=d+1}^{k+1} z^{−i} = 1. Table XIII lists the capacity C(d, k) for selected values of the parameters d and k. RLL sequences are used to increase the minimum separation between recorded transitions. The quantity DR = (d + 1)R, where R is the code rate, is called the density ratio or packing density. It expresses the number of information bit intervals within the minimum separation between consecutive transitions of an RLL sequence. It may be shown that the density ratio can be made arbitrarily large by choosing d sufficiently large [3].
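As a concrete illustration (ours, not part of the original paper), the capacity C(d, k) can be computed numerically from the Shannon cover just described, without solving the characteristic equation by hand; the sketch below uses the standard state labeling in which state i records the number of 0's since the last 1.

```python
import numpy as np

def rll_capacity(d, k):
    """Capacity (bits/symbol) of the (d, k) runlength-limited constraint,
    computed as log2 of the spectral radius of the Shannon-cover
    adjacency matrix (state i = number of 0's emitted since the last 1)."""
    n = k + 1                      # states 0, 1, ..., k
    A = np.zeros((n, n))
    for i in range(n):
        if i < k:
            A[i, i + 1] = 1        # emit a 0 (runlength not yet exhausted)
        if i >= d:
            A[i, 0] = 1            # emit a 1 (allowed after at least d zeros)
    return np.log2(max(abs(np.linalg.eigvals(A))))
```

For instance, `rll_capacity(1, 3)` and `rll_capacity(2, 7)` reproduce the well-known values 0.5515 and 0.5174 listed in tables of RLL capacities.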
The minimum increment within a runlength is called the timing window or detection window, denoted by T. Measured in units of information bit intervals, T = R, the code rate. Sequences with a larger value of d, and thus a lower capacity, are penalized by an increasingly difficult tradeoff between the detection window and the density ratio. Practical codes have typically used constraints with small values of d.

TABLE XIV. Capacity of asymmetrical runlength-limited sequences versus minimum runlength.

B. Asymmetrical Runlength Constraints

Asymmetrical runlength-limited sequences [75], [194], [186] have different constraints on the runlengths of 1's and 0's. One application of these constraints has been in optical recording systems, where the minimum size of a written pit, as determined by diffraction limitations, is larger than the minimum size of the area separating two pits, a spacing determined by the mechanical positioning capabilities of the optical recording fixture. Asymmetrical runlength-limited sequences are described by four parameters, d₁ and k₁, and d₂ and k₂, which describe the constraints on runlengths of 1's and 0's, respectively. An allowable sequence is composed of alternate phrases of 1's and 0's: one constituent sequence contributes phrases whose durations are the allowed runlengths of 1's, and the second contributes phrases whose durations are the allowed runlengths of 0's. The interleaved sequence is composed of phrases taken alternately from the first, odd sequence and the second, even sequence, implying the product-form characteristic equation (Σ_{i=d₁+1}^{k₁+1} z^{−i})(Σ_{j=d₂+1}^{k₂+1} z^{−j}) = 1. (25) If we assume that k₁ = k₂ = ∞, then (25) can be written as z^{−(d₁+d₂+2)} = (1 − z^{−1})². As an immediate implication of the symmetry in d₁ and d₂, we find for the capacity of the asymmetrical runlength-limited sequences C_a(d₁, d₂) = C_a(d₂, d₁), (26) where C_a denotes the capacity of asymmetrical runlength-limited sequences.
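The reduced characteristic equation for k₁ = k₂ = ∞ can be solved numerically; the following sketch (our illustration, under the k₁ = k₂ = ∞ assumption stated above) finds the largest real root by bisection, and makes the dependence on d₁ + d₂ alone directly visible.

```python
from math import log2

def asym_rll_capacity(d1, d2):
    """Capacity of asymmetrical RLL sequences with k1 = k2 = infinity:
    log2 of the largest real root z > 1 of z^-(d1+d2+2) = (1 - 1/z)^2."""
    s = d1 + d2
    f = lambda z: z ** (-(s + 2)) - (1.0 - 1.0 / z) ** 2
    lo, hi = 1.0 + 1e-12, 2.0      # f > 0 just above 1, f <= 0 at z = 2
    for _ in range(200):           # bisection on the sign change
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return log2(lo)
```

Sanity checks: with d₁ = d₂ = 0 the constraint is vacuous and the capacity is 1; with d₁ = d₂ = 1 the root is the golden ratio, recovering C(1, ∞) ≈ 0.6942; and any two pairs with equal d₁ + d₂ give identical capacity.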
Thus the capacity of asymmetrical RLL sequences is a function of the sum of the two minimum runlength parameters only, and it suffices to evaluate the capacity as a function of d₁ + d₂ by solving the characteristic equation. Results of computations are given in Table XIV.

Fig. 23. Labeled graph for a (d, k, s) constraint.

We can derive another useful relation with the following observation. Let d₁ = d₂ = d, i.e., the restrictions on the runlengths of 1's and 0's are again symmetric; then from (26) we obtain the following relation between the capacity of symmetrical and asymmetrical RLL sequences: C_a(d₁, d₂) = C((d₁ + d₂)/2, ∞).

C. RLL Sequences with Multiple Spacings

Funk [72] showed that the theory of RLL sequences is unnecessarily narrow in scope and that it precludes certain relevant coding possibilities which could prove useful in particular devices. The limitation is removed by introducing multiple-spaced RLL sequences, where one further restriction is imposed upon the admissible runlengths of 0's. The runlength/spacing constraints may be expressed as follows: for integers d, k, and s, where k − d is a multiple of s, the number of 0's between successive 1's must be equal to d + js, where 0 ≤ j ≤ (k − d)/s. The parameters d and k again define the minimum and maximum allowable runlength. A sequence defined in this way is called an RLL sequence with multiple spacing (RLL/MS). Such a sequence is characterized by the parameters (d, k, s). Note that for standard RLL sequences we have s = 1. Fig. 23 illustrates a state-transition diagram for the (d, k, s) constraint. The capacity can simply be found by invoking Shannon's capacity formula C = log₂ λ, where λ is the largest root of the characteristic equation Σ_{j=0}^{(k−d)/s} z^{−(d+js+1)} = 1. (27) Note that if d + 1 and s have a common factor p, then every allowed runlength plus one is also divisible by p. Therefore, a sequence with the above condition on d and s is equivalent to a time-scaled standard RLL sequence. For s = 2, the characteristic equation follows directly from (27). Table XV shows the results of computations. Within any s adjacent bit periods, there is only one possible location for the next 1, given the location of the last 1.
The detection window for an RLL/MS sequence is therefore T = sR, and the minimum spacing between two transitions equals (d + 1)R. By rewriting (27) we obtain a relationship between the density ratio and the detection window.

TABLE XV. Capacity for selected values of d, k, and s.

Fig. 24. Relationship between density ratio and detection window T. The operating points of various sequences are indicated.

This relationship is plotted in Fig. 24. With (d, k) constrained sequences, only discrete points on this curve are possible. RLL sequences with multiple spacing, however, make it possible, by a proper choice of d and s, to approximate any point on this curve. A multiple-spaced RLL code has been designed and experimentally evaluated in exploratory magnetooptic recording systems using a resonant bias coil direct-overwrite technique [167], [200].

D. (0, G/I) Sequences

The (0, G/I) constraints for partial-response maximum-likelihood systems were introduced in Section II-C2. Recall that the parameter G stipulates the maximum number of allowed 0's between consecutive 1's, while the parameter I stipulates the maximum number of 0's between 1's in both the even- and odd-numbered positions of the sequence. To describe a graph presentation of these constraints, we define three parameters. The quantity g denotes the number of 0's since the last 1. The quantities a and b denote the number of 0's since the last 1 in the even and odd subsequence, respectively. It is immediate that g equals either 2·min(a, b) or 2·min(a, b) + 1, according to which subsequence contains the most recent 1. Each state in the graph corresponds to a pair (a, b), with a ≤ I, b ≤ I, and g ≤ G. Wherever permitted, there is an edge from state (a, b) to state (b, a + 1) with a label 0, and an edge from state (a, b) to state (b, 0) with a label 1. By computing the maximum eigenvalue of the adjacency matrix corresponding to the graph, we obtain the capacity of the constraint, C(0, G/I). Results of computations are listed in Table XVI.

TABLE XVI. Capacity for selected values of G and I.

For all of these constraints, rate 8/9 codes have been constructed [146].
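The graph construction just described translates directly into a small capacity calculator. The sketch below is our illustration (not code from the paper); it takes the state to be the pair (a, b), where a counts 0's in the interleave written next and b in the other interleave, so that the global zero-run is g = min(2b, 2a + 1).

```python
import numpy as np

def giv_capacity(G, I):
    """Capacity of the PRML (0, G/I) constraint.
    State (a, b): a = 0's since the last 1 in the interleave written next,
    b = the same count for the other interleave; global run g = min(2b, 2a+1)."""
    states = [(a, b) for a in range(I + 1) for b in range(I + 1)
              if min(2 * b, 2 * a + 1) <= G]
    idx = {s: i for i, s in enumerate(states)}
    A = np.zeros((len(states), len(states)))
    for (a, b) in states:
        for bit in (0, 1):
            na = 0 if bit == 1 else a + 1    # updated count for this interleave
            nxt = (b, na)                     # interleave roles swap each bit
            if na <= I and min(2 * nxt[1], 2 * nxt[0] + 1) <= G:
                A[idx[(a, b)], idx[nxt]] += 1
    return np.log2(max(abs(np.linalg.eigvals(A))))
```

As a sanity check, for (G, I) = (2, 1) the global constraint is never binding and the capacity reduces to that of a (0, 1) constraint per interleave, about 0.6942; and C(0, 4/4) must exceed 8/9, since rate 8/9 codes exist for it.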
As mentioned earlier, a rate 8/9, (0, 4/4) block code was used in early disk drives employing PRML techniques. Current disk drives make use of more relaxed constraints, which can support codes with even higher rates, such as rate 16/17 [161], [51].

E. Spectral-Null Sequences

Frequency-domain analysis of constrained sequences is based upon the average power spectral density, or, as it is often called, the power spectrum. In order to define the power spectrum, we must endow the ensemble of constrained sequences with a probability measure. Generally, the measure chosen is the maxentropic measure determined by the transition probabilities discussed in Section III-B. The autocorrelation function is the sequence of mth-order autocorrelation coefficients R(m), defined by R(m) = E[x_k x_{k+m}], where the x_k represent channel input symbols and the expectation is with respect to the given measure. According to the Wiener–Khinchin theorem, the average power spectrum Ψ(ω) is given by the discrete-time Fourier transform of the autocorrelation function, Ψ(ω) = Σ_{m=−∞}^{∞} R(m) e^{−jωm}. The computation of the power spectrum of an ensemble of Markov-chain driven sequences is well-studied and has been carried out for many families of runlength-type constraints, as well as for the subsets of constrained sequences generated by specific finite-state encoders; see [75] and references therein. It is important to note that for a particular sequence, the average power density at a particular frequency, if it exists at all, may differ significantly from the ensemble average. For spectral-null constraints, however, every sequence in the constraint has a well-defined average power density at the null frequency, and the magnitude is equal to zero [145]. As has already been mentioned, the spectral null frequencies of primary interest in digital recording are zero frequency (DC) and the Nyquist frequency.

TABLE XVII. Capacity and sum variance of maxentropic RDS-constrained sequences versus digital sum variation N.
(Further general results on spectral-null sequences are given in [145], [100], and [102], for example.) Chien [38] studied bipolar sequences that assume a finite range of consecutive running-digital-sum (RDS) values, that is, sequences with digital-sum variation (DSV) N. The range of RDS values may be used, as in Fig. 7, to define a set of N allowable states. The adjacency matrix A of the RDS-constrained channel is given by A_{ij} = 1 if |i − j| = 1, and A_{ij} = 0 otherwise. For most constraints, it is not possible to find a simple closed-form expression for the capacity, and one has to rely on numerical methods to obtain an approximation. The RDS-constrained sequences provide a beautiful exception to the rule, as the structure of A allows us to provide a closed-form expression for the capacity of an RDS-constrained channel. We have [38] λ_max = 2 cos(π/(N + 1)), and thus the capacity of the RDS-constrained channel is C(N) = log₂ 2 cos(π/(N + 1)). (28) Table XVII lists the capacity C(N) for selected values of N. It can be seen that the sum constraint is not very expensive in terms of rate loss when N is relatively large. For instance, a sequence that takes on at most ten sum values has a capacity greater than 0.9, which implies a rate loss of less than 10%. Closed-form expressions for the spectra of maxentropic RDS-constrained sequences were derived by Kerpez [126]. Fig. 25 displays the power spectral density function of maxentropic RDS-constrained sequences for various values of the digital sum variation N. The width of the spectral notch of a sequence with vanishing power at DC is a very important design characteristic, which is usually quantified by a parameter called the cutoff frequency, denoted by ω₀.

Fig. 25. Power density function of maxentropic RDS-constrained sequences against frequency, with digital sum variation N as a parameter.
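The closed form (28) can be checked against a direct eigenvalue computation on the N-state path graph; the short sketch below (our illustration, not from the paper) does both.

```python
import numpy as np
from math import cos, pi, log2

def rds_capacity(N):
    """Closed-form capacity (28) of a channel whose RDS takes N values."""
    return log2(2 * cos(pi / (N + 1)))

def rds_capacity_numeric(N):
    """Same capacity via the N-state path-graph adjacency matrix:
    from each RDS state, the sum may move up or down by 1."""
    A = np.zeros((N, N))
    for i in range(N - 1):
        A[i, i + 1] = A[i + 1, i] = 1
    return np.log2(max(abs(np.linalg.eigvals(A))))
```

For N = 2 the sequence is forced to alternate and the capacity is 0; for N = 3 the capacity is exactly 1/2, as (28) gives log₂ √2.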
For one representative case in Fig. 25, we have indicated the cutoff frequency, which is defined by a criterion given in [65]. It can be observed that the cutoff frequency becomes smaller when the digital sum variation is allowed to increase. Let s² denote the variance of the RDS. Justesen [116] discovered a useful relation between the sum variance s² and the width of the spectral notch: he found that the cutoff frequency is, to good approximation, inversely proportional to the sum variance. (29) Extensive computations of samples of implemented channel codes, made by Justesen [116] and Immink [98] to validate the reciprocal relation (29) between the cutoff frequency and s², have revealed that this relationship is fairly reliable. The sum variance of a maxentropic RDS-constrained sequence, denoted by s²(N), is given in closed form in [38]. (30) Table XVII lists the sum variance for selected values of N. Fig. 26, which shows a plot of the sum variance versus the redundancy 1 − C(N), affords more insight into the tradeoffs in the engineering of DC-balanced sequences. It presents the designer with a spectral budget, reflecting the price in terms of code redundancy for a desired spectral notch width. It also reveals that the relationship between the logarithms of the sum variance and the redundancy is approximately linear. For large digital sum variation N, it was shown by A. Janssen [114] that 1 − C(N) ≈ π²/(2 ln 2 · (N + 1)²), and similarly s²(N) ≈ (N + 1)²(1/12 − 1/(2π²)).

Fig. 26. Sum variance versus redundancy of maxentropic RDS-constrained sequences.

These approximations, coupled with (28) and (30), lead to a fundamental relation between the redundancy and the sum variance of a maxentropic RDS-constrained sequence, namely, [1 − C(N)] · s²(N) ≈ (π² − 6)/(24 ln 2) ≈ 0.2326. (31) Actually, the bound on the right is within 1% accuracy for moderate values of N. Equation (31) states that, for large enough N, the product of redundancy and sum variance of maxentropic RDS-constrained sequences is approximately constant, as was suggested by Fig. 26.
In this section, we briefly describe several technology developments, some evolutionary and some revolutionary, that introduce new elements that can be incorporated into mathematical models for digital recording channels.

A. Improved Channel Models

Reflecting the continuing, rapid increase in areal density of conventional magnetic recording, as well as the characteristics of the component heads and disks, channel models now incorporate factors such as asymmetry in the positive and negative step responses of magnetoresistive read heads; deviations from linear superposition; spectral coloring, nonadditivity, and nonstationarity in media noise; and partial-erasure effects and other data-dependent distortions [20], [21], [32], [33]. The evaluation of the impact of these channel characteristics on the performance of the signal processing and coding techniques discussed in this paper is an active area of research, as is the development of new approaches that take these channel properties into account. See, for example, related papers in [192].

B. Nonsaturation Multilevel Recording

At various times during the past, the possibility of abandoning saturation recording, "linearizing" the digital magnetic-recording channel, and incorporating nonbinary signaling has been examined. In all such studies, however, the potential increase in recording density that might accrue from the application or adaptation of coded-modulation techniques developed for digital communications has been outweighed by the increase in detector complexity and, more fundamentally, the cost in signal-to-noise ratio that accompanies the linearization process. However, several novel storage technologies can support multilevel alphabets, such as electron-trapping optical memories (ETOM) [148], [31] and optical recording with multivalued magnetooptic media [176].

C.
Multitrack Recording

Another avenue toward increasing the storage capacity of disk and tape systems is to exploit their inherent two-dimensional nature. Runlength-limited codes, such as multitrack codes, that increase the per-track code rate by sharing the timing (k) constraint across multiple tracks have been analyzed and designed [140], [185], [47]. Using models of signal-to-noise ratio dependence upon track width, as well as intertrack interference (ITI), one can investigate information-theoretic capacity bounds as a function of track density. Multitrack recording and multihead detection techniques based upon partial-response equalization, decision-feedback equalization, and sequence detection have been studied [13], along with coding schemes that can improve their performance. See, for example, [183] and references therein.

D. Multidimensional Recording

New, exploratory technologies, such as volume holographic data storage [80] and two-photon-based three-dimensional (3-D) optical memories [95], have generated interest in page-oriented recording and readback. Models of these processes have generated proposals for two-dimensional equalization and detection methods [82], [158], along with two-dimensional codes [81], [195]. This has generated interest in two-dimensional constrained systems and modulation codes. As an example, consider a two-dimensional binary (d, k)-constrained array: an m (row) by n (column) binary array such that every 1 has no less than d 0's and no more than k 0's above it, below it, to the right of it, and to the left of it (with the exception of 1's on or near the borders). The capacity of such an array is equal to the limit, as m and n approach infinity, of the ratio of the logarithm of the number of distinct arrays satisfying the constraints to the product of m times n. Little is known at this time about finding the capacity of such two-dimensional binary constrained arrays.
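Even where closed forms are unknown, such two-dimensional capacities can be estimated numerically. The sketch below (our illustration, not part of the paper) treats the simplest nontrivial case, d = 1, k = ∞, i.e., no two horizontally or vertically adjacent 1's (the "hard square" constraint, whose capacity is known numerically to be about 0.5879). It counts admissible m-column strips with a transfer matrix: log₂ λ_m/m decreases to the capacity, and successive ratios λ_{m+1}/λ_m approach it quickly.

```python
import numpy as np

def strip_growth(m):
    """Largest eigenvalue of the width-m transfer matrix for the 2-D
    (1, inf) constraint ("no two adjacent 1's", the hard-square model)."""
    rows = [r for r in range(1 << m) if r & (r >> 1) == 0]  # no horizontal 11
    T = np.array([[1 if (r & s) == 0 else 0 for s in rows]  # no vertical 11
                  for r in rows])
    return max(abs(np.linalg.eigvals(T)))

m = 7
upper = np.log2(strip_growth(m)) / m                   # overestimate
est = np.log2(strip_growth(m + 1) / strip_growth(m))   # fast-converging estimate
```

With m = 7 the ratio estimate already agrees with the hard-square capacity 0.587891 to about three decimal places, while the per-column figure still overshoots because of boundary effects.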
A notable exception is that it has been proved that the two-dimensional capacity of such two-dimensional binary (d, k) arrays is equal to zero if and only if k = d + 1 and d ≥ 1 [124]. Thus the two-dimensional capacity of the (1, 2) constraint is equal to 0, while the two-dimensional capacity of the (2, 4) constraint is strictly greater than 0. This is in contrast to the one-dimensional case, where the capacities of (1, 2) and (2, 4) constrained binary sequences are both nonzero and, in fact, are equal. Lower bounds on the capacity of some two-dimensional constraints are presented in [124], [179], and other constraints relevant to two-dimensional recording are analyzed in [11], [187], and [199].

VIII. SHANNON'S CROSSWORD PUZZLES

A. Existence of Multidimensional Crossword Puzzles

As mentioned in the preceding section, multidimensional constrained codes represent a new challenge for information theorists, with potentially important applications to novel, high-density storage devices. We feel it is particularly fitting, then, to bring our survey to a close by returning once more to Shannon's 1948 paper [173] where, remarkably, in a short passage addressing the connection between the redundancy of a language and the existence of crossword puzzles, Shannon anticipated some of the issues that arise in multidimensional constrained coding. Specifically, Shannon suggested that there would be cases where the capacity of a two-dimensional constraint is equal to zero, even though the capacity of the constituent one-dimensional constraint is nonzero, a situation illustrated by certain two-dimensional (d, k) constraints. We cite the following excerpt from Shannon's 1948 paper:

The ratio of the entropy of a source to the maximum value it could have while still restricted to the same symbols will be called its relative entropy. One minus the relative entropy is the redundancy. The redundancy of a language is related to the existence of crossword puzzles.
If the redundancy is zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible, etc.

To the best of our knowledge Shannon never published a more detailed exposition on this subject. This led us to try to construct a plausibility argument for his statement. We assume that the phrase "large crossword puzzles are just possible" should be taken to mean that the capacity of the corresponding two-dimensional constraint is nonzero. Let q denote the number of source symbols, h denote the source binary entropy (in bits per symbol), and h/log₂ q denote the relative entropy. We begin with all n by n arrays that can be formed from the q symbols. We eliminate all arrays that do not have all of their rows and columns made up of a concatenation of allowable words from the language. The probability that any row of the array is made up of a concatenation of allowable words from the language is equal to the ratio of the number of allowable concatenations of words with n letters, M(n) ≈ 2^{nh}, to q^n. Thus, assuming statistical independence of the rows, the probability that all rows are concatenations of allowable words is this ratio raised to the nth power, (M(n)/q^n)^n, or approximately 2^{n²(h − log₂ q)}. The identical ratio results for the probability that all columns are made up of concatenations of allowable words.
Now, assuming that the rows and columns are statistically independent, we see that the probability for an array to have all of its rows and all of its columns made up of concatenations of allowable words is approximately 2^{2n²(h − log₂ q)}. The assumption of independence of the rows and columns is made with the sole justification that this property might be expected to be true for a language that is "of a rather chaotic and random nature." Multiplying this probability by the number q^{n²} of arrays yields the average number of surviving arrays, approximately 2^{n²(2h − log₂ q)}, which grows exponentially with n² provided that h > (1/2) log₂ q, that is, provided the redundancy 1 − h/log₂ q is less than 50%. A similar argument for three-dimensional arrays yields the condition h > (2/3) log₂ q, i.e., redundancy less than 33%. This is Shannon's result. (The authors thank K. Shaughnessy [174] for contributions to this argument.) We remark that for ordinary English crossword puzzles, we would interpret the black square to be a 27th symbol in the alphabet. Thus to compute the "relative entropy" of English, we divide the entropy of English by log₂ 27. In this context, we would propose using an unusual definition of the entropy of English, which we call H_w, based upon the dependencies of letters within individual words, but not across word boundaries, since the rows and columns of crossword puzzles are made up of unrelated words separated by one or more black squares. To compute H_w for the English language, we can proceed as follows. Assume that W_l is the number of words in an English dictionary with l letters, for l = 1, 2, …. We lengthen each word by one letter to include the black square at the end of a word and then add one more word of length 1 to represent a single black square. (This allows more than one black square between words.) Following Shannon, the number of distinct sequences of words containing exactly n symbols, N(n), is given by the difference equation N(n) = N(n − 1) + Σ_l W_l N(n − l − 1). (32) Then, H_w is given by the logarithm of the largest real root x₀ of the equation 1 = x^{−1} + Σ_l W_l x^{−(l+1)}. (33) The distribution of word lengths in an English dictionary has been investigated by Lord Rothschild [166]. (See also the discussion in Section VIII-C.)

B.
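Given a table of word-length counts (a hypothetical dictionary here, since we do not reproduce Rothschild's data), the root of (33) is easy to find numerically, because the right-hand side is decreasing in x; the following sketch (our illustration, not from the paper) uses bisection.

```python
from math import log2

def word_entropy(word_counts):
    """H_w = log2 of the largest real root of (33),
    1 = 1/x + sum_l W_l * x^-(l+1),
    where word_counts[l] is the (hypothetical) number of words of length l."""
    f = lambda x: 1.0 / x + sum(w * x ** -(l + 1)
                                for l, w in word_counts.items())
    lo, hi = 1.0, 2.0
    while f(hi) > 1.0:            # grow the bracket until f(hi) <= 1
        hi *= 2.0
    for _ in range(100):          # bisection for f(x) = 1 (f is decreasing)
        mid = (lo + hi) / 2.0
        if f(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return log2(hi)
```

As a check, a degenerate "dictionary" with a single one-letter word gives the golden-ratio root of x² = x + 1, i.e., H_w = log₂ φ ≈ 0.694, and richer dictionaries give strictly larger entropy.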
Connections to Two-Dimensional Constraints

Unfortunately, a direct application of Shannon's statement to the (1, 2) and (2, 4) constraints leads to a problem. Their one-dimensional capacities and, therefore, their relative entropies, are equal, with C(1, 2) = C(2, 4) ≈ 0.4057. However, we have seen that the capacity of the two-dimensional (1, 2) constraint is zero, while that of the two-dimensional (2, 4) constraint is nonzero. In order to resolve this inconsistency with Shannon's bound, we tried to modify the argument by more accurately approximating the probability of a column satisfying the specified row constraint, as follows. Although the one-dimensional capacities of the two constraints are equal, the one-dimensional constraints have different first-order entropies H₁. In particular, H₁ is larger for the (1, 2) constraint than for the (2, 4) constraint, since the relative frequency of 1's is higher for the (1, 2) constraint than for the (2, 4) constraint. In the previous plausibility argument for Shannon's result, once one chooses the rows of the array to be a concatenation of allowable words, the relative frequencies of the symbols in each column occur in accordance with the relative frequency of the symbols in the words of the language. Thus the probability that any column is a concatenation of allowable words is approximately 2^{n(h − H₁)} rather than 2^{n(h − log₂ q)}. Proceeding as above, we find that the average number of surviving arrays grows exponentially with n² provided that h > H₁/2 for two-dimensional arrays, or h > 2H₁/3 for three-dimensional arrays. However, for both the one-dimensional (1, 2) and (2, 4) constraints, we find h < H₁/2. Therefore, this modified analysis still does not satisfactorily explain the behavior of these two constraints. A possible explanation is that a further refinement in the argument is needed. Another possibility is that these constraints are not "chaotic and random" enough for Shannon's conclusion, and our plausibility arguments, to apply.

C. Coda

As this paper was undergoing final revisions, one of the authors (JKW) received a letter from E. Gilbert pertaining to Shannon's crossword puzzles [77].
The letter was prompted by a lecture given by JKW at the Shannon Day Symposium, held at Bell Labs on May 18, 1998, in which the connection between the capacity of two-dimensional constraints and Shannon's result on crossword puzzles was discussed. In the letter, Gilbert recalls a conversation he had with Shannon 50 years ago on this subject. Referring to Shannon's paper, he says:

I didn't understand that crossword example and tried to reconstruct his argument. That led to a kind of hand-waving "proof," which I showed to Claude. Claude's own argument turned out to have been something like mine … Fortunately, I outlined my proof in the margin of my reprint of the paper (like Fermat and his copy of Diophantos). It went like this: …

The argument that followed is exactly the same as the one presented in Section VIII-A above, with the small exception that arrays were assumed to be square. In fact, in a subsequent e-mail correspondence [78], Gilbert describes a calculation of the redundancy of English along the lines suggested by (32) and (33). Thus we see that the study of multidimensional constrained arrays actually dates back 50 years to the birth of information theory. A great deal remains to be learned.

IX. SUMMARY

In this paper, we have attempted to provide an overview of the theoretical foundations and practical applications of constrained coding in digital-recording systems. In keeping with the theme of this special issue, we have highlighted essential contributions to this area made by Shannon in his landmark 1948 paper. We described the basic characteristics of a digital-recording channel, and surveyed bounds on the noisy-channel capacity for several mathematical channel models. We then discussed practical equalization and detection techniques and indicated how their implementation imposes constraints on the recording-channel inputs.
Following a review of Shannon's fundamental results on the capacity of discrete noiseless channels and on the existence of efficient codes, we presented a summary of key results in the theory and practice of efficient constrained code design. We then discussed the application of distance-enhancing constrained codes to improve the reliability of noisy recording channels, and compared the resulting performance to estimates of the noisy-channel capacity. Finally, we pointed out several new directions that future research in the area of recording codes might follow, and we concluded with a discussion of the connection between Shannon's remarks on crossword puzzles and the theory of multidimensional constrained codes. Through the inclusion of numerous references and indications of open research problems, we hope to have provided the reader with an introduction to this fascinating, important, and active branch of information theory, as well as with some incentive and encouragement to contribute to it.

ACKNOWLEDGMENT

The authors are grateful to Dick Blahut, Brian Marcus, Ron Roth, and Emina Soljanin for their thoughtful comments on an earlier version of this paper. They also wish to thank Bruce Moision for assistance with computer simulations and for preparation of Fig. 21.

REFERENCES

[1] K. A. S. Abdel-Ghaffar and J. H. Weber, "Constrained block codes for class-IV partial-response channels with maximum-likelihood sequence estimation," IEEE Trans. Inform. Theory, vol. 42, pp. 1405–1424, Sept. 1996.
[2] R. L. Adler, "The torus and the disk," IBM J. Res. Develop., vol. 31, no. 2, pp. 224–234, Mar. 1987.
[3] R. L. Adler, D. Coppersmith, and M. Hassner, "Algorithms for sliding block codes: An application of symbolic dynamics to information theory," IEEE Trans. Inform. Theory, vol. IT-29, pp. 5–22, Jan. 1983.
[4] R. L. Adler, M. Hassner, and J. Moussouris, "Method and apparatus for generating a noiseless sliding block code for a (1, 7) channel with rate 2/3," U.S.
Patent 4,413,251, June 1982.
[5] N. Alon, E. E. Bergmann, D. Coppersmith, and A. M. Odlyzko, "Balancing sets of vectors," IEEE Trans. Inform. Theory, vol. 34, pp. 128–130, Jan. 1988.
[6] S. Altekar, "Detection and coding techniques for magnetic recording channels," Ph.D. dissertation, Univ. Calif. San Diego, June 1997.
[7] S. A. Altekar, M. Berggren, B. E. Moision, P. H. Siegel, and J. K. Wolf, "Error-event characterization on partial-response channels," in Proc. 1997 IEEE Int. Symp. Information Theory (Ulm, Germany, June 29–July 4), p. 461; IEEE Trans. Inform. Theory, vol. 45, Jan. 1999, to be published.
[8] J. Ashley, R. Karabed, and P. H. Siegel, "Complexity and sliding-block decodability," IEEE Trans. Inform. Theory, vol. 42, no. 6, pt. 1, pp. 1925–1947, Nov. 1996.
[9] J. J. Ashley and B. H. Marcus, "Canonical encoders for sliding block decoders," SIAM J. Discrete Math., vol. 8, pp. 555–605, 1995.
[10] ——, "A generalized state-splitting algorithm," IEEE Trans. Inform. Theory, vol. 43, pp. 1326–1338, July 1997.
[11] ——, "Two-dimensional low-pass filtering codes," IEEE Trans. Commun., vol. 46, pp. 724–727, June 1998.
[12] J. J. Ashley, B. H. Marcus, and R. M. Roth, "Construction of encoders with small decoding look-ahead for input-constrained channels," IEEE Trans. Inform. Theory, vol. 41, pp. 55–76, Jan. 1995.
[13] L. C. Barbosa, "Simultaneous detection of readback signals from interfering magnetic recording tracks using array heads," IEEE Trans. Magn., vol. 26, pp. 2163–2165, Sept. 1990.
[14] I. Bar-David and S. Shamai (Shitz), "Information rates for magnetic recording channels with peak- and slope-limited magnetization," IEEE Trans. Inform. Theory, vol. 35, pp. 956–962, Sept. 1989.
[15] M.-P. Béal, Codage Symbolique. Paris, France: Masson, 1993.
[16] G. F. M. Beenker and K. A. S. Immink, "A generalized method for encoding and decoding runlength-limited binary sequences," IEEE Trans. Inform. Theory, vol. IT-29, pp. 751–754, Sept. 1983.
[17] R. Behrens and A.
Armstrong, "An advanced read/write channel for magnetic disk storage," in Proc. 26th Asilomar Conf. Signals, Systems, and Computers (Pacific Grove, CA, Oct. 1992), pp. 956–960.
[18] E. R. Berlekamp, "The technology of error-correcting codes," Proc. IEEE, vol. 68, pp. 564–593, May 1980.
[19] M. Berkoff, "Waveform compression in NRZI magnetic recording," Proc. IEEE, vol. 52, pp. 1271–1272, Oct. 1964.
[20] H. N. Bertram, Theory of Magnetic Recording. Cambridge, U.K.: Cambridge Univ. Press, 1994.
[21] H. N. Bertram and X. Che, "General analysis of noise in recorded transitions in thin film recording media," IEEE Trans. Magn., vol. 29, pp. 201–208, Jan. 1993.
[22] W. G. Bliss, "Circuitry for performing error correction calculations on baseband encoded data to eliminate error propagation," IBM Tech. Discl. Bull., vol. 23, pp. 4633–4634, 1981.
[23] ——, "An 8/9 rate time-varying trellis code for high density magnetic recording," IEEE Trans. Magn., vol. 33, pp. 2746–2748, Sept. 1997.
[24] W. G. Bliss, S. She, and L. Sundell, "The performance of generalized maximum transition run trellis codes," IEEE Trans. Magn., vol. 34, no. 1, pt. 1, pp. 85–90, Jan. 1998.
[25] G. Bouwhuis, J. Braat, A. Huijser, J. Pasman, G. van Rosmalen, and K. A. S. Immink, Principles of Optical Disc Systems. Bristol, U.K. and Boston, MA: Adam Hilger, 1985.
[26] F. K. Bowers, U.S. Patent 2,957,947, 1960.
[27] V. Braun, K. A. S. Immink, M. A. Ribiero, and G. J. van den Enden, "On the application of sequence estimation algorithms in the Digital Compact Cassette (DCC)," IEEE Trans. Consumer Electron., vol. 40, pp. 992–998, Nov. 1994.
[28] V. Braun and A. J. E. M. Janssen, "On the low-frequency suppression performance of DC-free runlength-limited modulation codes," IEEE Trans. Consumer Electron., vol. 42, pp. 939–945, Nov. 1996.
[29] B. Brickner and J. Moon, "Investigation of error propagation in DFE and MTR coding for ultra-high density," Tech. Rep., Commun. Data Storage Lab., Univ.
Minnesota, Minneapolis, July 10, 1997. [30] A. R. Calderbank, C. Heegard, and T.-A. Lee, “Binary convolutional codes with application to magnetic recording, IEEE Trans. Inform. Theory, vol. IT-32, pp. 797–815, Nov. 1986. [31] A. R. Calderbank, R. Laroia, and S. W. McLaughlin, “Coded modulation and precoding for electron-trapping optical memories,” IEEE Trans. Commun., vol. 46, pp. 1011–1019, Aug. 1998. [32] J. Caroselli and J. K. Wolf, “A new model for media noise in thin ﬁlm magnetic recording media,” in Proc. 1995 SPIE Int. Symp. Voice, Video, and Data Communications (Philadelphia, PA, Oct. 1995), vol. 2605, pp. 29–38. [33] J. Caroselli and J. K. Wolf, “Applications of a new simulation model for media noise limited magnetic recording channels,” IEEE Trans. Magn., vol. 32, pp. 3917–3919, Sept. 1996. [34] K. W. Cattermole, Principles of Pulse Code Modulation. London, U.K.: Iliffe, 1969. [35] , “Principles of digital line coding,” Int. J. Electron., vol. 55, pp. 3–33, July 1983. [36] Workshop on Modulation, Coding, and Signal Processing for Magnetic Recording Channels, Center for Magnetic Recording Res., Univ. Calif. at San Diego. La Jolla, CA, May 20–22, 1985. [37] Workshop on Modulation and Coding for Digital Recording Systems, Center for Magnetic Recording Res., Univ. Calif. at San Diego. La Jolla, CA, Jan. 8–10, 1987. [38] T. M. Chien, “Upper bound on the efﬁciency of DC-constrained codes,” Bell Syst. Tech. J., vol. 49, pp. 2267–2287, Nov. 1970. [39] R. Cideciyan, F. Dolivo, R. Hermann, W. Hirt, and W. Schott, “A PRML system for digital magnetic recording,” IEEE J. Select. Areas Commun., vol. 10, pp. 38–56, Jan. 1992. IMMINK et al.: CODES FOR DIGITAL RECORDERS 2297 [40] M. Cohn and G. V. Jacoby, “Run-length reduction of 3PM code via look- ahead technique,” IEEE Trans. Magn., vol. MAG-18, pp. 1253–1255, Nov. 1982. [41] D. J. Costello, Jr., J. Hagenauer, H. Imai, and S. B. Wicker, “Applica- tions of error control coding,” this issue, pp. 2531–2560. [42] T. M. 
Cover, “Enumerative source coding,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 73–77, Jan. 1973. [43] R. H. Deng and M. A. Herro, “DC-free coset codes,” IEEE Trans. Inform. Theory, vol. 34, pp. 786–792, July 1988. [44] J. Eggenberger and P. Hodges, “Sequential encoding and decoding of variable length, ﬁxed rate data codes,” U.S. Patent 4 115768, 1978. [45] J. Eggenberger and A. M. Patel, “Method and apparatus for implement- ing optimum PRML codes,” U.S. Patent 4707 681, Nov. 17, 1987. [46] E. Eleftheriou and R. Cideciyan, “On codes satisfying th order running digital sum constraints,” IEEE Trans. Inform. Theory, vol. 37, pp. 1294–1313, Sept. 1991. [47] T. Etzion, “Cascading methods for runlength-limited arrays,” IEEE Trans. Inform. Theory, vol. 43, pp. 319–324, Jan. 1997. [48] I. J. Fair, W. D. Gover, W. A. Krzymien, and R. I. MacDonald, “Guided scrambling: A new line coding technique for high bit rate ﬁber optic transmission systems,” IEEE Trans. Commun., vol. 39, pp. 289–297, Feb. 1991. [49] J. L. Fan and A. R. Calderbank, “A modiﬁed concatenated coding scheme with applications to magnetic data storage,” IEEE Trans. Inform. Theory, vol. 44, pp. 1565–1574, July 1998. [50] M. J. Ferguson, “Optimal reception for binary partial response chan- nels,” Bell Syst. Tech. J., vol. 51, pp. 493–505, 1972. [51] J. Fitzpatrick and K. J. Knudson, “Rate modulation code for a magnetic recording channel,” U.S. Patent 5635 933, June 3, 1997. [52] K. K. Fitzpatrick and C. S. Modlin, “Time-varying MTR codes for high density magnetic recording,” in Proc. 1997 IEEE Global Telecommuni- cations Conf. (GLOBECOM ’97) (Phoenix, AZ, Nov. 4–8, 1997). [53] G. D. Forney, Jr., “Maximum likelihood sequence detection in the presence of intersymbol interference,” IEEE Trans. Inform. Theory, vol. IT-18, pp. 363–378, May 1972. [54] , “The Viterbi algorithm,” Proc. IEEE, vol. 61, no. 3, pp. 268–278, Mar. 1973. [55] G. D. Forney, Jr. and A. R. 
Calderbank, “Coset codes for partial response channels; or, coset codes with spectral nulls,” IEEE Trans. Inform. Theory, vol. 35, pp. 925–943, Sept. 1989.
[56] G. D. Forney, Jr. and G. Ungerboeck, “Modulation and coding for linear Gaussian channels,” this issue, pp. 2384–2415.
[57] P. A. Franaszek, “Sequence-state encoding for digital transmission,” Bell Syst. Tech. J., vol. 47, pp. 143–157, Jan. 1968.
[58] ———, “Sequence-state methods for run-length-limited coding,” IBM J. Res. Develop., vol. 14, pp. 376–383, July 1970.
[59] ———, “Run-length-limited variable length coding with error propagation limitation,” U.S. Patent 3,689,899, Sept. 1972.
[60] ———, “On future-dependent block coding for input-restricted channels,” IBM J. Res. Develop., vol. 23, pp. 75–81, 1979.
[61] ———, “Synchronous bounded delay coding for input restricted channels,” IBM J. Res. Develop., vol. 24, pp. 43–48, 1980.
[62] ———, “A general method for channel coding,” IBM J. Res. Develop., vol. 24, pp. 638–641, 1980.
[63] ———, “Construction of bounded delay codes for discrete noiseless channels,” IBM J. Res. Develop., vol. 26, pp. 506–514, 1982.
[64] ———, “Coding for constrained channels: A comparison of two approaches,” IBM J. Res. Develop., vol. 33, pp. 602–607, 1989.
[65] J. N. Franklin and J. R. Pierce, “Spectra and efficiency of binary codes without DC,” IEEE Trans. Commun., vol. COM-20, pp. 1182–1184, Dec. 1972.
[66] L. Fredrickson, unpublished report, 1993.
[67] ———, “Time-varying modulo N trellis codes for input restricted partial response channels,” U.S. Patent 5,257,272, Oct. 26, 1993.
[68] L. Fredrickson, R. Karabed, J. W. Rae, P. H. Siegel, H. Thapar, and R. Wood, “Improved trellis coding for partial response channels,” IEEE Trans. Magn., vol. 31, pp. 1141–1148, Mar. 1995.
[69] C. V. Freiman and A. D. Wyner, “Optimum block codes for noiseless input restricted channels,” Inform. Contr., vol. 7, pp. 398–415, 1964.
[70] C. A. French and J. K. Wolf, “Bounds on the capacity of a peak power constrained Gaussian channel,” IEEE Trans. Magn., vol. 24, pp. 2247–2262, Sept. 1988.
[71] S. Fukuda, Y. Kojima, Y. Shimpuku, and K. Odaka, “8/10 modulation codes for digital magnetic recording,” IEEE Trans. Magn., vol. MAG-22, pp. 1194–1196, Sept. 1986.
[72] P. Funk, “Run-length-limited codes with multiple spacing,” IEEE Trans. Magn., vol. MAG-18, pp. 772–775, Mar. 1982.
[73] A. Gabor, “Adaptive coding for self-clocking recording,” IEEE Trans. Electron. Comp., vol. EC-16, pp. 866–868, Dec. 1967.
[74] R. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[75] A. Gallopoulos, C. Heegard, and P. H. Siegel, “The power spectrum of run-length-limited codes,” IEEE Trans. Commun., vol. 37, pp. 906–917, Sept. 1989.
[76] F. R. Gantmacher, Matrix Theory, Volume II. New York: Chelsea, 1960.
[77] E. Gilbert, private correspondence, May 1998.
[78] ———, private e-mail, June 1998.
[79] J. Gu and T. Fuja, “A new approach to constructing optimal block codes for runlength-limited channels,” IEEE Trans. Inform. Theory, vol. 40, pp. 774–785, May 1994.
[80] J. Heanue, M. Bashaw, and L. Hesselink, “Volume holographic storage and retrieval of digital data,” Science, vol. 265, pp. 749–752, 1994.
[81] ———, “Channel codes for digital holographic data storage,” J. Opt. Soc. Amer. Ser. A, vol. 12, pp. 2432–2439, 1995.
[82] J. Heanue, K. Gurkan, and L. Hesselink, “Signal detection for page-access optical memories with intersymbol interference,” Appl. Opt., vol. 35, no. 14, pp. 2431–2438, May 1996.
[83] C. Heegard and L. Ozarow, “Bounding the capacity of saturation recording: The Lorentz model and applications,” IEEE J. Select. Areas Commun., vol. 10, pp. 145–156, Jan. 1992.
[84] J. P. J. Heemskerk and K. A. S. Immink, “Compact disc: System aspects and modulation,” Philips Tech. Rev., vol. 40, no. 6, pp. 157–164, 1982.
[85] P. S. Henry, “Zero disparity coding system,” U.S. Patent 4,309,694, Jan. 1982.
[86] T. Himeno, M. Tanaka, T. Katoku, K. Matsumoto, M. Tamura, and H. Min-Jae, “High-density magnetic tape recording by a nontracking method,” Electron. Commun. in Japan, vol. 76, no. 5, pt. 2, pp. 83–93, 1993.
[87] W. Hirt, “Capacity and information rates of discrete-time channels with memory,” Ph.D. dissertation (Diss. ETH no. 8671), Swiss Federal Inst. Technol. (ETH), Zurich, Switzerland, 1988.
[88] W. Hirt and J. L. Massey, “Capacity of the discrete-time Gaussian channel with intersymbol interference,” IEEE Trans. Inform. Theory, vol. 34, pp. 380–388, May 1988.
[89] K. J. Hole, “Punctured convolutional codes for the 1−D partial-response channel,” IEEE Trans. Inform. Theory, vol. 37, pt. 2, pp. 808–817, May 1991.
[90] K. J. Hole and Ø. Ytrehus, “Improved coding techniques for partial-response channels,” IEEE Trans. Inform. Theory, vol. 40, pp. 482–493, Mar. 1994.
[91] H. D. L. Hollmann, “Modulation codes,” Ph.D. dissertation, Eindhoven Univ. Technol., Eindhoven, The Netherlands, Dec. 1996.
[92] ———, “On the construction of bounded-delay encodable codes for constrained systems,” IEEE Trans. Inform. Theory, vol. 41, pp. 1354–1378, Sept. 1995.
[93] ———, “Bounded-delay-encodable, block-decodable codes for constrained systems,” IEEE Trans. Inform. Theory, vol. 42, pp. 1957–1970, Nov. 1996.
[94] J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation. Reading, MA: Addison-Wesley, 1979.
[95] S. Hunter, F. Kiamilev, S. Esener, D. Parthenopoulos, and P. M. Rentzepis, “Potentials of two-photon based 3D optical memories for high performance computing,” Appl. Opt., vol. 29, pp. 2058–2066, 1990.
[96] K. A. S. Immink, “Modulation systems for digital audio discs with optical readout,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (Atlanta, GA, Apr. 1981), pp. 587–590.
[97] ———, “Construction of binary DC-constrained codes,” Philips J. Res., vol. 40, pp. 22–39, 1985.
[98] ———, “Performance of simple binary DC-constrained codes,” Philips J. Res., vol. 40, pp. 1–21, 1985.
[99] ———, “Spectrum shaping with DC²-constrained channel codes,” Philips J. Res., vol. 40, pp. 40–53, 1985.
[100] ———, “Spectral null codes,” IEEE Trans. Magn., vol. 26, pp. 1130–1135, Mar. 1990.
[101] ———, “Runlength-limited sequences,” Proc. IEEE, vol. 78, pp. 1745–1759, Nov. 1990.
[102] ———, Coding Techniques for Digital Recorders. Englewood Cliffs, NJ: Prentice-Hall Int. (UK), 1991.
[103] ———, “Block-decodable runlength-limited codes via look-ahead technique,” Philips J. Res., vol. 46, pp. 293–310, 1992.
[104] ———, “Constructions of almost block-decodable runlength-limited codes,” IEEE Trans. Inform. Theory, vol. 41, pp. 284–287, Jan. 1995.
[105] ———, “The Digital Versatile Disc (DVD): System requirements and channel coding,” SMPTE J., vol. 105, no. 8, pp. 483–489, Aug. 1996.
[106] ———, “A practical method for approaching the channel capacity of constrained channels,” IEEE Trans. Inform. Theory, vol. 43, pp. 1389–1399, Sept. 1997.
[107] ———, “Weakly constrained codes,” Electron. Lett., vol. 33, no. 23, pp. 1943–1944, Nov. 1997.
[108] K. A. S. Immink and G. F. M. Beenker, “Binary transmission codes with higher order spectral zeros at zero frequency,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 452–454, May 1987.
[109] K. A. S. Immink and H. Ogawa, “Method for encoding binary data,” U.S. Patent 4,501,000, Feb. 1985.
[110] K. A. S. Immink and L. Patrovics, “Performance assessment of DC-free multimode codes,” IEEE Trans. Commun., vol. 45, pp. 293–299, Mar. 1997.
[111] K. A. S. Immink and A. van Wijngaarden, “Simple high-rate constrained codes,” Electron. Lett., vol. 32, no. 20, p. 1877, Sept. 1996.
[112] G. V. Jacoby, “A new look-ahead code for increasing data density,” IEEE Trans. Magn., vol. MAG-13, pp. 1202–1204, Sept. 1977. See also U.S. Patent 4,323,931, Apr. 1982.
[113] G. V. Jacoby and R. Kost, “Binary two-thirds rate code with full word look-ahead,” IEEE Trans. Magn., vol. MAG-20, pp.
709–714, Sept. 1984. See also M. Cohn, G. V. Jacoby, and C. A. Bates III, U.S. Patent 4,337,458, June 1982.
[114] A. J. E. M. Janssen, private communication, 1998.
[115] A. J. E. M. Janssen and K. A. S. Immink, “Entropy and power spectrum of asymmetrically DC-constrained binary sequences,” IEEE Trans. Inform. Theory, vol. 37, pp. 924–927, May 1991.
[116] J. Justesen, “Information rates and power spectra of digital codes,” IEEE Trans. Inform. Theory, vol. IT-28, pp. 457–472, May 1982.
[117] P. Kabal and S. Pasupathy, “Partial-response signaling,” IEEE Trans. Commun., vol. COM-23, pp. 921–934, Sept. 1975.
[118] J. A. H. Kahlman and K. A. S. Immink, “Channel code with embedded pilot tracking tones for DVCR,” IEEE Trans. Consumer Electron., vol. 41, pp. 180–185, Feb. 1995.
[119] H. Kamabe, “Minimum scope for sliding block decoder mappings,” IEEE Trans. Inform. Theory, vol. 35, pp. 1335–1340, Nov. 1989.
[120] R. Karabed and B. H. Marcus, “Sliding-block coding for input-restricted channels,” IEEE Trans. Inform. Theory, vol. 34, pp. 2–26, Jan. 1988.
[121] R. Karabed and P. H. Siegel, “Matched spectral-null codes for partial response channels,” IEEE Trans. Inform. Theory, vol. 37, no. 3, pt. II, pp. 818–855, May 1991.
[122] ———, “Coding for higher order partial response channels,” in Proc. 1995 SPIE Int. Symp. Voice, Video, and Data Communications (Philadelphia, PA, Oct. 1995), vol. 2605, pp. 115–126.
[123] R. Karabed, P. Siegel, and E. Soljanin, “Constrained coding for channels with high intersymbol interference,” IEEE Trans. Inform. Theory, to be published.
[124] A. Kato and K. Zeger, “On the capacity of two-dimensional run-length-limited codes,” in Proc. 1998 IEEE Int. Symp. Information Theory (Cambridge, MA, Aug. 16–21, 1998), p. 320; submitted for publication to IEEE Trans. Inform. Theory.
[125] W. H. Kautz, “Fibonacci codes for synchronization control,” IEEE Trans. Inform. Theory, vol. IT-11, pp. 284–292, 1965.
[126] K. J. Kerpez, “The power spectral density of maximum entropy charge constrained sequences,” IEEE Trans. Inform. Theory, vol. 35, pp. 692–695, May 1989.
[127] Z.-A. Khayrallah and D. Neuhoff, “Subshift models and finite-state codes for input-constrained noiseless channels: A tutorial,” Univ. Delaware EE Tech. Rep. 90-9-1, Dover, DE, 1990.
[128] K. J. Knudson, J. K. Wolf, and L. B. Milstein, “A concatenated decoding scheme for partial response with matched spectral-null coding,” in Proc. 1993 IEEE Global Telecommunications Conf. (GLOBECOM ’93) (Houston, TX, Nov. 1993), pp. 1960–1964.
[129] D. E. Knuth, “Efficient balanced codes,” IEEE Trans. Inform. Theory, vol. IT-32, pp. 51–53, Jan. 1986.
[130] H. Kobayashi, “Application of probabilistic decoding to digital magnetic recording systems,” IBM J. Res. Develop., vol. 15, pp. 65–74, Jan. 1971.
[131] ———, “Correlative level coding and maximum-likelihood decoding,” IEEE Trans. Inform. Theory, vol. IT-17, pp. 586–594, Sept. 1971.
[132] ———, “A survey of coding schemes for transmission or recording of digital data,” IEEE Trans. Commun., vol. COM-19, pp. 1087–1099, Dec. 1971.
[133] H. Kobayashi and D. T. Tang, “Application of partial-response channel coding to magnetic recording systems,” IBM J. Res. Develop., vol. 14, pp. 368–375, July 1970.
[134] E. R. Kretzmer, “Generalization of a technique for binary data transmission,” IEEE Trans. Commun. Technol., vol. COM-14, pp. 67–68, Feb. 1966.
[135] A. Kunisa, S. Takahashi, and N. Itoh, “Digital modulation method for recordable digital video disc,” IEEE Trans. Consumer Electron., vol. 42, pp. 820–825, Aug. 1996.
[136] A. Lempel and M. Cohn, “Look-ahead coding for input-restricted channels,” IEEE Trans. Inform. Theory, vol. IT-28, pp. 933–937, Nov. 1982.
[137] S. Lin and D. J. Costello, Jr., Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.
[138] D. Lind and B. Marcus, Symbolic Dynamics and Coding. Cambridge, U.K.: Cambridge Univ. Press, 1995.
[139] J. C. Mallinson and J. W. Miller, “Optimal codes for digital magnetic recording,” Radio Elec. Eng., vol. 47, pp. 172–176, 1977.
[140] M. W. Marcellin and H. J. Weber, “Two-dimensional modulation codes,” IEEE J. Select. Areas Commun., vol. 10, pp. 254–266, Jan. 1992.
[141] B. H. Marcus, “Sofic systems and encoding data,” IEEE Trans. Inform. Theory, vol. IT-31, pp. 366–377, May 1985.
[142] ———, “Symbolic dynamics and connections to coding theory, automata theory and systems theory,” in Different Aspects of Coding Theory (Proc. Symp. Applied Mathematics), A. R. Calderbank, Ed., vol. 50, American Math. Soc., 1995.
[143] B. H. Marcus and R. M. Roth, “Bounds on the number of states in encoder graphs for input-constrained channels,” IEEE Trans. Inform. Theory, vol. 37, no. 3, pt. 2, pp. 742–758, May 1991.
[144] B. H. Marcus, R. M. Roth, and P. H. Siegel, “Constrained systems and coding for recording channels,” in Handbook of Coding Theory, R. Brualdi, C. Huffman, and V. Pless, Eds. Amsterdam, The Netherlands: Elsevier, 1998.
[145] B. H. Marcus and P. H. Siegel, “On codes with spectral nulls at rational submultiples of the symbol frequency,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 557–568, July 1987.
[146] B. H. Marcus, P. H. Siegel, and J. K. Wolf, “Finite-state modulation codes for data storage,” IEEE J. Select. Areas Commun., vol. 10, pp. 5–37, Jan. 1992.
[147] P. A. McEwen and J. K. Wolf, “Trellis codes for E²PR4ML with squared-distance 18,” IEEE Trans. Magn., vol. 32, pp. 3995–3997, Sept. 1996.
[148] S. W. McLaughlin, “Five runlength-limited codes for M-ary recording channels,” IEEE Trans. Magn., vol. 33, pp. 2442–2450, May 1997.
[149] S. W. McLaughlin and D. L. Neuhoff, “Upper bounds on the capacity of the digital magnetic recording channel,” IEEE Trans. Magn., vol. 29, pp. 59–66, Jan. 1993.
[150] J. W. Miller, U.S. Patent 4,027,335, 1977.
[151] T. Mittelholzer, P. A. McEwen, S. A. Altekar, and J. K. Wolf, “Finite truncation depth trellis codes for the dicode channel,” IEEE Trans. Magn., vol. 31, no. 6, pt. 1, pp. 3027–3029, Nov. 1995.
[152] B. E. Moision, P. H. Siegel, and E. Soljanin, “Distance-enhancing codes for digital recording,” IEEE Trans. Magn., vol. 34, no. 1, pt. 1, pp. 69–74, Jan. 1998.
[153] C. M. Monti and G. L. Pierobon, “Codes with a multiple spectral null at zero frequency,” IEEE Trans. Inform. Theory, vol. 35, pp. 463–471, Mar. 1989.
[154] J. Moon and B. Brickner, “Maximum transition run codes for data storage systems,” IEEE Trans. Magn., vol. 32, no. 5, pt. 1, pp. 3992–3994, Sept. 1996.
[155] ———, “Design of a rate 5/6 maximum transition run code,” IEEE Trans. Magn., vol. 33, pp. 2749–2751, Sept. 1997.
[156] H. Nakajima and K. Odaka, “A rotary-head high-density digital audio tape recorder,” IEEE Trans. Consumer Electron., vol. CE-29, pp. 430–437, Aug. 1983.
[157] K. Norris and D. S. Bloomberg, “Channel capacity of charge-constrained run-length limited codes,” IEEE Trans. Magn., vol. MAG-17, no. 6, pp. 3452–3455, Nov. 1981.
[158] B. Olson and S. Esener, “Partial response precoding for parallel-readout optical memories,” Opt. Lett., vol. 19, pp. 661–663, 1993.
[159] L. H. Ozarow, A. D. Wyner, and J. Ziv, “Achievable rates for a constrained Gaussian channel,” IEEE Trans. Inform. Theory, vol. 34, pp. 365–371, May 1988.
[160] A. M. Patel, “Zero-modulation encoding in magnetic recording,” IBM J. Res. Develop., vol. 19, pp. 366–378, July 1975. See also U.S. Patent 3,810,111, May 1974.
[161] ———, IBM Tech. Discl. Bull., vol. 31, no. 8, pp. 4633–4634, Jan. 1989.
[162] G. L. Pierobon, “Codes for zero spectral density at zero frequency,” IEEE Trans. Inform. Theory, vol. IT-30, pp. 435–439, Mar. 1984.
[163] K. C. Pohlmann, The Compact Disc Handbook, 2nd ed. Madison, WI: A–R Editions, 1992.
[164] J. Rae, G. Christiansen, S.-M. Shih, H. Thapar, R. Karabed, and P. Siegel, “Design and performance of a VLSI 120 Mb/s trellis-coded partial-response channel,” IEEE Trans. Magn., vol. 31, pp. 1208–1214, Mar. 1995.
[165] R. M. Roth, P. H. Siegel, and A. Vardy, “High-order spectral-null codes: Constructions and bounds,” IEEE Trans. Inform. Theory, vol. 40, pp. 1826–1840, Nov. 1994.
[166] Lord Rothschild, “The distribution of English dictionary word lengths,” J. Statist. Planning Infer., vol. 14, pp. 311–322, 1986.
[167] D. Rugar and P. H. Siegel, “Recording results and coding considerations for the resonant bias coil overwrite technique,” in Optical Data Storage Topical Meet., Proc. SPIE, G. R. Knight and C. N. Kurtz, Eds., vol. 1078, pp. 265–270, 1989.
[168] W. E. Ryan, L. L. McPheters, and S. W. McLaughlin, “Combined turbo coding and turbo equalization for PR4-equalized Lorentzian channels,” in Proc. Conf. Information Science and Systems (CISS’98) (Princeton, NJ, Mar. 1998).
[169] N. Sayiner, “Impact of the track density versus linear density trade-off on the read channel: TCPR4 versus EPR4,” in Proc. 1995 SPIE Int. Symp. Voice, Video, and Data Communications (Philadelphia, PA, Oct. 1995), vol. 2605, pp. 84–91.
[170] E. Seneta, Non-negative Matrices and Markov Chains, 2nd ed. New York: Springer, 1980.
[171] S. Shamai (Shitz) and I. Bar-David, “Upper bounds on the capacity for a constrained Gaussian channel,” IEEE Trans. Inform. Theory, vol. 35, pp. 1079–1084, Sept. 1989.
[172] S. Shamai (Shitz), L. H. Ozarow, and A. D. Wyner, “Information rates for a discrete-time Gaussian channel with intersymbol interference and stationary inputs,” IEEE Trans. Inform. Theory, vol. 37, pp. 1527–1539, Nov. 1991.
[173] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423, July 1948.
[174] K. Shaughnessy, personal communication, Dec. 1997.
[175] L. A. Shepp, “Covariance of unit processes,” in Proc. Working Conf.
Stochastic Processes (Santa Barbara, CA, 1967), pp. 205–218.
[176] K. Shimazaki, M. Yoshihiro, O. Ishizaki, S. Ohnuki, and N. Ohta, “Magnetic multi-valued magneto-optical disk,” J. Magn. Soc. Japan, vol. 19, suppl. no. S1, pp. 429–430, 1995.
[177] P. H. Siegel, “Recording codes for digital magnetic storage,” IEEE Trans. Magn., vol. MAG-21, pp. 1344–1349, Sept. 1985.
[178] P. H. Siegel and J. K. Wolf, “Modulation and coding for information storage,” IEEE Commun. Mag., vol. 29, pp. 68–86, Dec. 1991.
[179] ———, “Bit-stuffing bounds on the capacity of two-dimensional constrained arrays,” in Proc. 1998 IEEE Int. Symp. Inform. Theory (Cambridge, MA, Aug. 16–21, 1998), p. 323.
[180] J. G. Smith, “The information capacity of amplitude and variance constrained scalar Gaussian channels,” Inform. Contr., vol. 18, pp. 203–219, 1971.
[181] E. Soljanin, “On-track and off-track distance properties of Class 4 partial response channels,” in Proc. 1995 SPIE Int. Symp. Voice, Video, and Data Communications (Philadelphia, PA, Oct. 1995), vol. 2605, pp. 92–102.
[182] ———, “On coding for binary partial-response channels that don’t achieve the matched-filter bound,” in Proc. 1996 Information Theory Workshop (Haifa, Israel, June 9–13, 1996).
[183] E. Soljanin and C. N. Georghiades, “Multihead detection for multitrack recording channels,” IEEE Trans. Inform. Theory, vol. 44, Nov. 1998, to be published.
[184] E. Soljanin and O. E. Agazzi, “An interleaved coding scheme for partial response with concatenated decoding,” in Proc. 1996 IEEE Global Telecommunications Conf. (GLOBECOM ’96) (London, U.K., Nov. 1996).
[185] R. E. Swanson and J. K. Wolf, “A new class of two-dimensional RLL recording codes,” IEEE Trans. Magn., vol. 28, pp. 3407–3416, Nov. 1992.
[186] N. Swenson and J. M. Cioffi, “Sliding block line codes to increase dispersion-limited distance of optical fiber channels,” IEEE J. Select. Areas Commun., vol. 13, pp. 485–498, Apr. 1995.
[187] R. Talyansky, T. Etzion, and R. M. Roth, “Efficient code constructions for certain two-dimensional constraints,” in Proc. 1997 IEEE Int. Symp. Information Theory (Ulm, Germany, June 29–July 4), p. 387.
[188] D. T. Tang and L. R. Bahl, “Block codes for a class of constrained noiseless channels,” Inform. Contr., vol. 17, pp. 436–461, 1970.
[189] H. K. Thapar and T. D. Howell, “On the performance of partial response maximum-likelihood and peak detection methods in digital recording,” in Tech. Dig. Magn. Rec. Conf. 1991 (Hidden Valley, PA, June 1991).
[190] H. Thapar and A. Patel, “A class of partial-response systems for increasing storage density in magnetic recording,” IEEE Trans. Magn., vol. MAG-23, pp. 3666–3668, Sept. 1987.
[191] Tj. Tjalkens, “Runlength limited sequences,” IEEE Trans. Inform. Theory, vol. 40, pp. 934–940, May 1994.
[192] IEEE Trans. Magn., vol. 34, no. 1, pt. 1, Jan. 1998.
[193] B. S. Tsybakov, “Capacity of a discrete Gaussian channel with a filter,” Probl. Pered. Inform., vol. 6, pp. 78–82, 1970.
[194] C. M. J. van Uijen and C. P. M. J. Baggen, “Performance of a class of channel codes for asymmetric optical recording,” in Proc. 7th Int. Conf. Video, Audio and Data Recording, IERE Conf. Publ. no. 79 (York, U.K., Mar. 1988), pp. 29–32.
[195] A. Vardy, M. Blaum, P. Siegel, and G. Sincerbox, “Conservative arrays: Multi-dimensional modulation codes for holographic recording,” IEEE Trans. Inform. Theory, vol. 42, pp. 227–230, Jan. 1996.
[196] J. Watkinson, The Art of Digital Audio. London, U.K.: Focal, 1988.
[197] A. D. Weathers and J. K. Wolf, “A new sliding block code for the (1,7) runlength constraint with the minimal number of encoder states,” IEEE Trans. Inform. Theory, vol. 37, no. 3, pt. 2, pp. 908–913, May 1991.
[198] A. D. Weathers, S. A. Altekar, and J. K. Wolf, “Distance spectra for PRML channels,” IEEE Trans. Magn., vol. 33, pp. 2809–2811, Sept. 1997.
[199] W. Weeks IV and R. E. Blahut, “The capacity and coding gain of certain checkerboard codes,” IEEE Trans. Inform. Theory, vol. 44, pp. 1193–1203, May 1998.
[200] T. Weigandt, “Magneto-optic recording using a (2,18,2) run-length-limited code,” S.M. thesis, Mass. Inst. Technol., Cambridge, MA, 1991.
[201] A. X. Widmer and P. A. Franaszek, “A DC-balanced, partitioned-block, 8B/10B transmission code,” IBM J. Res. Develop., vol. 27, no. 5, pp. 440–451, Sept. 1983.
[202] A. van Wijngaarden and K. A. S. Immink, “Construction of constrained codes using sequence replacement techniques,” submitted for publication to IEEE Trans. Inform. Theory, 1997.
[203] J. K. Wolf and W. R. Richard, “Binary to ternary conversion by linear filtering,” Tech. Documentary Rep. RADC-TDR-62-230, May 1962.
[204] J. K. Wolf and G. Ungerboeck, “Trellis coding for partial-response channels,” IEEE Trans. Commun., vol. COM-34, pp. 765–773, Aug. 1986.
[205] R. W. Wood, “Denser magnetic memory,” IEEE Spectrum, vol. 27, pp. 32–39, May 1990.
[206] R. W. Wood and D. A. Petersen, “Viterbi detection of class IV partial response on a magnetic recording channel,” IEEE Trans. Commun., vol. COM-34, pp. 454–461, May 1986.
[207] Z.-N. Wu, S. Lin, and J. M. Cioffi, “Capacity bounds for magnetic recording channels,” in Proc. 1998 IEEE Global Telecommun. Conf. (GLOBECOM ’98) (Sydney, Australia, Nov. 8–12, 1998), to be published.
[208] H. Yoshida, T. Shimada, and Y. Hashimoto, “8-9 block code: A DC-free channel code for digital magnetic recording,” SMPTE J., vol. 92, pp. 918–922, Sept. 1983.
[209] S. Yoshida and S. Yajima, “On the relation between an encoding automaton and the power spectrum of its output sequence,” Trans. IECE Japan, vol. 59, pp. 1–7, 1976.
[210] A. H. Young, “Implementation issues of 8/9 distance-enhancing constrained codes for EEPR4 channel,” M.S. thesis, Univ. Calif., San Diego, June 1997.
[211] E. Zehavi, “Coding for magnetic recording,” Ph.D. dissertation, Univ. Calif., San Diego, 1987.
[212] E. Zehavi and J. K. Wolf, “On saving decoder states for some trellis codes and partial response channels,” IEEE Trans. Commun., vol. 36, pp. 454–461, Feb. 1988.