
2260 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998

Codes for Digital Recorders

Kees A. Schouhamer Immink, Fellow, IEEE, Paul H. Siegel, Fellow, IEEE, and Jack K. Wolf, Fellow, IEEE

(Invited Paper)

Abstract—Constrained codes are a key component in the digital

recording devices that have become ubiquitous in computer data

storage and electronic entertainment applications. This paper

surveys the theory and practice of constrained coding, tracing

the evolution of the subject from its origins in Shannon’s classic

1948 paper to present-day applications in high-density digital

recorders. Open problems and future research directions are also

addressed.

Index Terms—Constrained channels, modulation codes, recording codes.

I. INTRODUCTION

AS has been observed by many authors, the storage and

retrieval of digital information is a special case of digital

communications. To quote E. R. Berlekamp [18]:

Communication links transmit information from here to

there. Computer memories transmit information from

now to then.

Thus as information theory provides the theoretical underpinnings for digital communications, it also serves as the

foundation for understanding fundamental limits on reliable

digital data recording, as measured in terms of data rate and

storage density.

A block diagram which depicts the various steps in record-

ing and recovering data in a storage system is shown in Fig. 1.

This “Fig. 1” is essentially the same as the well-known Fig. 1

used by Shannon in his classic paper [173] to describe a

general communication system, but with the conﬁguration of

codes more explicitly shown.

As in many digital communication systems, a concatenated

approach to channel coding has been adopted in data recording,

consisting of an algebraic error-correcting code in cascade

with a modulation code. The inner modulation code, which

is the focus of this paper, serves the general function of

matching the recorded signals to the physical channel and

to the signal-processing techniques used in data retrieval,

while the outer error-correction code is designed to remove

Manuscript received December 10, 1997; revised June 5, 1998. The work

of P. H. Siegel was supported in part by the National Science Foundation

under Grant NCR-9612802. The work of J. K. Wolf was supported in part by

the National Science Foundation under Grant NCR-9405008.

K. A. S. Immink is with the Institute of Experimental Mathematics,

University of Essen, 45326 Essen, Germany.

P. H. Siegel and J. K. Wolf are with the University of California at San

Diego, La Jolla, CA 92093-0407 USA.

Publisher Item Identiﬁer S 0018-9448(98)06735-2.

Fig. 1. Block diagram of digital recording system.

any errors remaining after the detection and demodulation

process. (See [41] in this issue for a survey of applications

of error-control coding.)

As we will discuss in more detail in the next section, a

recording channel can be modeled, at a high level, as a linear,

intersymbol-interference (ISI) channel with additive Gaussian

noise, subject to a binary input constraint. The combination

of the ISI and the binary input restriction has presented a

challenge in the information-theoretic performance analysis

of recording channels, and it has also limited the applica-

bility of the coding and modulation techniques that have

been overwhelmingly successful in communication over linear

Gaussian channels. (See [56] in this issue for a comprehensive

discussion of these methods.)

The development of signal processing and coding tech-

niques for recording channels has taken place in an environ-

ment of escalating demand for higher data transfer rates and

storage capacity—magnetic disk drives for personal computers

today operate at astonishing data rates on the order of 240

million bits per second and store information at densities of

up to 3 billion bits per square inch—coupled with increasingly

severe constraints on hardware complexity and cost.

The needs of the data storage industry have not only fostered

innovation in practical code design, but have also spurred the

development of a rigorous mathematical foundation for the

theory and implementation of constrained codes. They have

also stimulated advances in the information-theoretic analysis

of input-constrained, noisy channels.

In this paper, we review the progress made during the past

50 years in the theory and practical design of constrained

modulation codes for digital data recording. Along the way, we

will highlight the fact that, although Shannon did not mention



storage in his classic two-part paper whose golden anniversary

we celebrate in this issue—indeed random-access storage as

we know it today did not exist at the time—a large number of

fundamental results and techniques relevant to coding for stor-

age were introduced in his seminal publication. We will also

survey emerging directions in data-storage technology, and

discuss new challenges in information theory that they offer.

The outline of the remainder of the paper is as follows.

In Section II, we present background on magnetic-recording

channels. Section II-A gives a basic description of the physical

recording process and the resulting signal and noise character-

istics. In Section II-B, we discuss mathematical models that

capture essential features of the recording channel and we

review information-theoretic bounds on the capacity of these

models. In Section II-C, we describe the signal-processing

and -detection techniques that have been most widely used

in commercial digital-recording systems.

In Section III-A, we introduce the input-constrained, (noise-

less) recording channel model, and we examine certain time-

domain and frequency-domain constraints that the channel

input sequences must satisfy to ensure successful implemen-

tation of the data-detection process. In Section III-B, we

review Shannon’s theory of input-constrained noiseless chan-

nels, including the deﬁnition and computation of capacity, the

determination of the maxentropic sequence measure, and the

fundamental coding theorem for discrete noiseless channels.

In Section IV, we discuss the problem of designing efﬁcient,

invertible encoders for input-constrained channels. As in the

case of coding for noisy communication channels, this is

a subject about which Shannon had little to say. We will

summarize the substantial theoretical and practical progress

that has been made in constrained modulation code design.

In Section V, we present coded-modulation techniques

that have been developed to improve the performance of

noisy recording channels. In particular, we discuss families

of distance-enhancing constrained codes that are intended for

use with partial-response equalization and various types of

sequence detection, and we compare their performance to

estimates of the noisy channel capacity.

In Section VI, we give a compendium of modulation-code

constraints that have been used in digital recorders, describ-

ing in more detail their time-domain, frequency-domain, and

statistical properties.

In Section VII, we indicate several directions for future

research in coding for digital recording. In particular, we

consider the incorporation of improved channel models into

the design and performance evaluation of modulation codes,

as well as the invention of new coding techniques for ex-

ploratory information storage technologies, such as nonsatu-

ration recording using multilevel signals, multitrack recording

and detection, and multidimensional page-oriented storage.

Finally, in Section VIII, we close the paper with a dis-

cussion of Shannon’s intriguing, though somewhat cryptic,

remarks pertaining to the existence of crossword puzzles, and

make some observations about their relevance to coding for

multidimensional constrained recording channels.

Section IX brieﬂy summarizes the objectives and contents

of the paper.

II. BACKGROUND ON DIGITAL RECORDING

The history of signal processing in digital recording systems

can be cleanly broken into two epochs. From 1956 until

approximately 1990, direct-access storage devices relied upon

“analog” detection methods, most notably peak detection.

Beginning in 1990, the storage industry made a dramatic shift

to “digital” techniques, based upon partial-response equaliza-

tion and maximum-likelihood sequence detection, an approach

that had been proposed 20 years earlier by Kobayashi and

Tang [130], [131], [133]. To understand how these signal-

processing methods arose, we review a few basic facts about

the physical process underlying digital magnetic recording.

(Readers interested in the corresponding background on optical

recording may refer to [25], [84], [102, Ch. 2], and [163].)

We distill from the physics several mathematical models of

the recording channel, and describe upper and lower bounds

on their capacity. We then present in more detail the analog

and digital detection approaches, and we compare them to the

optimal detector for the uncoded channel.

A. Digital Recording Basics

The magnetic material contained on a magnetic disk or tape

can be thought of as being made up of a collection of discrete

magnetic particles or domains which can be magnetized by

a write head in one of two directions. In present systems,

digital information is stored along paths, called tracks, in this

magnetic medium. We store binary digits on a track by magne-

tizing these particles or domains in one of two directions. This

method is known as “saturation” recording. The stored binary

digits usually are referred to as “channel bits.” Note that the

word “bit” is used here as a contraction of the words “binary

digit” and not as a measure of information. In fact, we will

see that when coding is introduced, each channel bit represents

only a fraction of a bit of user information. The modiﬁer

“channel” in “channel bits” emphasizes this difference. We

will assume a synchronous storage system where the channel

bits occur at the fixed rate of 1/T channel bits per second. Thus T is the duration of a channel bit. In all magnetic-storage systems used today, the magnetic medium and the

read/write transducer (referred to as the read/write head) move

with respect to each other. If the relative velocity of a track

and the read/write head is constant, the constant time-duration

of the bit translates to a constant linear channel-bit density,

reﬂected in the length corresponding to a channel bit along

the track.

The normalized input signal applied to the recording trans-

ducer (write head) in this process can be thought of as a

two-level waveform which assumes the values +1 and -1 over consecutive time intervals of duration T. In the waveform, the transitions from one level to another, which effectively carry the digital information, are therefore constrained to occur at integer multiples of the time period T, and we can describe the waveform digitally as a sequence x = {x_k} over the bipolar alphabet {-1, +1}, where x_k is the signal amplitude in the time interval [kT, (k+1)T). In the simplest model, the input-output relationship of the digital magnetic recording channel can be viewed as linear. Denote by s(t)

2262 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998

Fig. 2. Lorentzian channel step response.

the output signal (readback voltage), in the absence of noise, corresponding to a single transition from, say, -1 to +1 at time t = 0. Then, the output signal y(t) generated by the waveform represented by the sequence x is given by

y(t) = sum_k b_k s(t - kT)   (1)

with b_k = (x_k - x_{k-1})/2 and x_{-1} = -1. Note that the "derivative" sequence of coefficients b = {b_k} consists of elements taken from the ternary alphabet {-1, 0, +1}, and the nonzero values, corresponding to the transitions in the input signal, alternate in sign.

A frequently used model for the transition response s(t) is the function

s(t) = 1 / (1 + (2t/PW50)^2)

often referred to as the Lorentzian model for an isolated-step response. The parameter in the denominator is commonly denoted PW50, an abbreviation for "pulsewidth at 50% maximum amplitude," the width of the pulse measured at 50% of its maximum height. The Lorentzian step response is shown in Fig. 2.
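The superposition model (1) with a Lorentzian transition response can be sketched in a few lines of Python. This is an illustrative sketch, not from the paper; the function names and the default choice PW50 = 2.5T are assumptions:

```python
def lorentzian_step(t, pw50):
    # Isolated-transition response: peak value 1 at t = 0,
    # half of the peak value at t = +/- pw50 / 2 (hence the name PW50).
    return 1.0 / (1.0 + (2.0 * t / pw50) ** 2)

def readback(t, x, T=1.0, pw50=2.5):
    # Noiseless readback y(t) = sum_k b_k * s(t - kT), where the
    # ternary coefficients b_k = (x_k - x_{k-1}) / 2 mark transitions;
    # the initial magnetization x_{-1} = -1 is an assumed convention.
    y, prev = 0.0, -1
    for k, xk in enumerate(x):
        b = (xk - prev) / 2.0
        y += b * lorentzian_step(t - k * T, pw50)
        prev = xk
    return y
```

A write of all +1's contains a single transition at k = 0, so the readback reduces to one Lorentzian pulse; denser transition patterns produce overlapping pulses of alternating sign, which is the intersymbol interference discussed below.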

The output signal is therefore the linear superposition of time-shifted Lorentzian pulses with coefficients of magnitude equal to 1 and alternating polarity. For this channel, sometimes called the differentiated Lorentzian channel, the frequency response is

H(f) = j pi f S(f), with S(f) = (pi PW50 / 2) e^{-pi PW50 |f|}

where S(f) is the Fourier transform of the Lorentzian pulse s(t). The magnitude of the frequency response is shown in Fig. 3.

The simplest model for channel noise assumes that the noise is additive white Gaussian noise (AWGN). That is, the readback signal takes the form

r(t) = y(t) + n(t)

where

E[n(t)] = 0

and

E[n(t) n(s)] = (N0/2) delta(t - s)

for all t and s.

There are, of course, far more accurate and sophisticated

models of a magnetic-recording system. These models take

into account the failure of linear superposition, asymmetries

in the positive and negative step responses, and other nonlinear

phenomena in the readback process. There are also advanced

models for media noise, incorporating the effects of material

defects, thermal asperities, data dependence, and adjacent track

interference. For more information on these, we direct the

reader to [20], [21], [32], and [33].

B. Channel Models and Capacity

The most basic model of a saturation magnetic-recording

system is a binary-input, linear, intersymbol-interference (ISI)

channel with AWGN, shown in Fig. 4.

This model has been, and continues to be, widely used in

comparing the theoretical performance of competing modula-

tion, coding, and signal-processing systems. During the past


Fig. 3. Differentiated Lorentzian channel frequency response magnitude.

Fig. 4. Continuous-time recording channel model.

decade, there has been considerable research effort devoted to

ﬁnding the capacity of this channel. Much of this work was

motivated by the growing interest in digital recording among

the information and communication theory communities [36],

[37]. In this section, we survey some of the results pertaining

to this problem. As the reader will observe, the analysis is

limited to rather elementary channel models; the extension

to more advanced channel models represents a major open

research problem.

1) Continuous-Time Channel Models: Many of the bounds

we cite were ﬁrst developed for the ideal, low-pass ﬁlter

channel model. These are then adapted to the more realistic

differentiated Lorentzian ISI model.

For a given channel, let C_av(P) denote the capacity with a constraint P on the average input power. Let C_pk(P) denote the capacity with a peak power constraint P. Finally, let C_b(P) denote the capacity with binary input levels +/- sqrt(P). It is clear that

C_b(P) <= C_pk(P) <= C_av(P).

The following important result, due to Ozarow, Wyner, and Ziv [159], states that the first inequality is, in fact, an equality under very general conditions on the channel ISI.

Peak-Power Achievable Rate Lemma: For the channel shown in Fig. 4, if the channel impulse response is square integrable, then any rate achievable using waveforms x(t) satisfying |x(t)| <= sqrt(P) is achievable using the constrained waveforms x(t) in {-sqrt(P), +sqrt(P)}.

We now exploit this result to develop upper and lower bounds on the capacity C_b. Consider, first, a continuous-time, bandlimited, additive Gaussian noise channel with transfer function

H(f) = 1 if |f| <= W, and H(f) = 0 otherwise.

Assume that the noise has (double-sided) spectral density N0/2. Let N = N0 W be the total noise power in the channel bandwidth. Shannon established the well-known and celebrated formula for the capacity of this channel, under the assumption of an average power constraint on the channel input signals. We quote from [173]:

Theorem 17: The capacity of a channel of band W perturbed by white thermal noise of power N when the average transmitter power is limited to P is given by

C = W log((P + N)/N).   (2)

(We have substituted our notation for Shannon's to avoid confusion.)
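Theorem 17 is easy to evaluate numerically; the helper below (an illustrative sketch, not from the paper) returns the capacity in bits per second:

```python
import math

def awgn_capacity(W, P, N):
    # Shannon's Theorem 17: C = W * log2((P + N) / N) bits/s for a
    # channel of bandwidth W hertz, average signal power P, and
    # in-band noise power N.
    return W * math.log2((P + N) / N)
```

Doubling the bandwidth at a fixed signal-to-noise ratio doubles the capacity, while at high SNR doubling P/N adds only about W bits/s.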


This result is a special case of the more general "water-filling" theorem for the capacity of an average input-power constrained channel with transfer function H(f) and noise power spectral density N(f) [74, p. 388]

C = Integral_F (1/2) log( nu |H(f)|^2 / N(f) ) df

where F denotes the range of frequencies in which nu |H(f)|^2 >= N(f), and nu satisfies the equation

P = Integral_F ( nu - N(f)/|H(f)|^2 ) df.

By the peak-power achievable rate lemma, this result provides an upper bound on the capacity C_b of the recording channel. Applications of this bound to a parameterized channel model are presented in [70].
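In the water-filling solution, power is poured above the noise-to-gain floor N(f)/|H(f)|^2 up to a common level, which can be found by bisection on the transmit-power constraint. The discretized sketch below is illustrative, not from the paper; |H(f)|^2 and N(f) are tabulated on a uniform frequency grid of spacing df:

```python
import math

def water_filling(H2, N, df, P, iters=200):
    # H2[i] = |H(f_i)|**2, N[i] = noise p.s.d. at f_i, df = grid spacing.
    # Bisect on the water level nu so that the allocated power
    # sum(max(nu - N/|H|^2, 0)) * df equals P, then integrate the rate.
    floor = [n / h for n, h in zip(N, H2)]

    def power(nu):
        return sum(max(nu - f, 0.0) for f in floor) * df

    lo, hi = min(floor), max(floor) + P / df
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if power(mid) < P:
            lo = mid
        else:
            hi = mid
    nu = (lo + hi) / 2.0
    # Capacity in bits/s, integrating (1/2) log2(nu |H|^2 / N) over
    # the frequencies where the water level exceeds the floor.
    return sum(0.5 * math.log2(nu / f) for f in floor if nu > f) * df
```

A flat channel recovers Theorem 17: with |H|^2 = 1 and constant noise density over a double-sided band, the water level is uniform and the integral reduces to Shannon's formula.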

An improved upper bound on the capacity C_b(P) of the low-pass AWGN channel was developed by Shamai and Bar-David [171]. This bound is a refinement of the water-filling upper bound, based upon a characterization of the power spectral density S_x(f) of any unit process, meaning a zero-mean, stationary, two-level continuous-time random process [175]. For a specified input-power spectral density S_x(f), a Gaussian input distribution maximizes the capacity. Therefore, for a given channel transfer function H(f),

C_b(P) <= sup_{S_x} Integral (1/2) log( 1 + S_x(f) |H(f)|^2 / (N0/2) ) df

where the supremum is taken over all unit-process power spectral densities S_x(f) of total power P. In [171], an approximate solution to this optimization problem for the ideal low-pass filter was used to prove that peak-power limiting on the bandlimited channel does indeed reduce capacity relative to the average-power constrained channel. This bounding technique was applied to the differentiated Lorentzian channel with additive colored Gaussian noise in [207].

We now consider lower bounds to the capacity C_b. Shannon [173] considered the capacity of a peak-power input constraint on the ideal bandlimited AWGN channel, noting that "a constraint of this type does not work out as well mathematically as the average power limitation." Nevertheless, he provided a lower bound, quoted below:

Theorem 20: The channel capacity C for a band W perturbed by white thermal noise of power N is bounded by

C >= W log( (2/e^3) (S + N)/N )

where S is the peak allowed transmitter power.

(We have substituted our notation for Shannon's to avoid confusion.)

In [159], the peak-power achievable rate lemma was used to derive a lower bound on C_b for the ideal, binary-input constrained, bandlimited channel

C_b >= W ln( 1 + (4/(pi e^2)) P/(N0 W) ) nats/s.

A similar closed-form lower bound (in nats/s) was also determined for the more accurate channel model comprising a cascade of a differentiator and an ideal low-pass filter. In both cases, the discrepancy between the lower bounds and the water-filling upper bounds represents an effective signal-to-noise ratio (SNR) difference of pi e^2/4, or about 7.6 dB, at high signal-to-noise ratios.

Heegard and Ozarow [83] incorporated the differentiated Lorentzian channel model into a similar analysis. To obtain a lower bound, they optimize, with respect to the signaling bandwidth, a lower bound of the same water-filling type in which the pulse power spectral density for the differentiated Lorentzian channel,

Phi(f) = |H(f)|^2, proportional to f^2 e^{-2 pi PW50 |f|},

plays the role of the channel characteristic. Their results indicate that, just as for the low-pass channel and the differentiated low-pass channel, the difference in effective signal-to-noise ratios between upper and lower bounds on capacity is approximately pi e^2/4 (about 7.6 dB) for large signal-to-noise ratios. The corresponding bound for the differentiated Lorentzian channel with additive colored Gaussian noise was determined in [207].

Shamai and Bar-David [171] developed an improved lower

bound on by analyzing the achievable rate of a

random telegraph wave, that is, a unit process with time

intervals between transitions independently governed by an

exponential distribution. Again, the corresponding bound for

the differentiated Lorentzian channel with additive colored

Gaussian noise was discussed in [207]. Bounds on capacity for

a model incorporating slope-limitations on the magnetization

are addressed in [14].

Computational results for the differentiated Lorentzian chan-

nel with additive colored Gaussian noise are given in [207].

For channel densities PW50/T in a range of current practical interest, the required SNR for arbitrarily low error rate was calculated.

The gap between the best capacity bounds, namely, the unit

process upper bound and the random telegraph wave lower

bound, was found to be approximately 3 dB throughout

the range.


2) Discrete-Time Channel Models: The capacity of discrete-

time channel models applicable to digital recording has

been addressed by several authors, for example, [193], [88],

[87], and [172]. The capacity of an average input-power-

constrained, discrete-time, memoryless channel with additive,

independent and identically distributed (i.i.d.) Gaussian noise

is given by the well-known formula [74]

C = (1/2) log2( 1 + P/sigma^2 ) bits per channel use,

where sigma^2 is the noise variance and P is the average input-power constraint. This result is the discrete-time equivalent

to Shannon’s formula (2) via the sampling theorem. Smith

[180] showed that the capacity of an amplitude-constrained,

discrete-time, memoryless Gaussian channel is achieved by a

ﬁnite-valued random variable, representing the input to the

channel, whose distribution is uniquely determined by the

input constraint. (Note that, unlike the case of an average

input-power constraint, this result cannot be directly translated

to the continuous-time model.)

Shamai, Ozarow, and Wyner [172] established upper and

lower bounds on the capacity of the discrete-time Gaussian

channel with ISI and stationary inputs. We will encounter in the next section a discrete-time ISI model of the magnetic-recording channel of the form h(D) = 1 - D^N, for N = 1, 2. For N = 2, the channel decomposes into a pair of interleaved "dicode" channels corresponding to h(D) = 1 - D.

In [172], the capacity upper bound was compared to upper and lower bounds on the maximum achievable information rate C for the normalized dicode channel model with system polynomial h(D) = (1 - D)/sqrt(2), and input levels +/-1. These upper and lower bounds are given by

C <= C_B(gamma)   (3)

and

C >= C_B(gamma/2)

respectively, where gamma denotes the channel signal-to-noise ratio and C_B(gamma) is the capacity of a binary input-constrained, memoryless Gaussian channel. Thus the upper bound on C is simply the capacity of the latter channel. These upper and lower bounds differ by 3 dB, as was the case for continuous-time channel models.
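The capacity of a binary input-constrained, memoryless Gaussian channel has no elementary closed form, but it reduces to a one-dimensional Gaussian integral. The sketch below (illustrative, not from the paper) evaluates it by trapezoidal quadrature, using the standard identity C_B(snr) = 1 - E_z[log2(1 + exp(-2 snr - 2 sqrt(snr) z))] for z ~ N(0, 1):

```python
import math

def binary_awgn_capacity(snr, half_width=8.0, steps=4000):
    # Capacity (bits/use) of y = x + n with x in {+a, -a}, n ~ N(0, s^2),
    # snr = a**2 / s**2; trapezoidal quadrature over z in [-8, 8].
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
        val = math.log1p(math.exp(-2.0 * snr - 2.0 * math.sqrt(snr) * z)) / math.log(2.0)
        w = 0.5 if i in (0, steps) else 1.0   # trapezoid end weights
        total += w * phi * val
    return 1.0 - total * h
```

The function is monotone in SNR, vanishes at snr = 0, saturates at 1 bit, and always lies below the unconstrained Gaussian-input capacity (1/2) log2(1 + snr).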

For other results on capacity estimates of recording-channel

models, we refer the reader to [14] and [149]. The general

problem of computing, or developing improved bounds for,

the capacity of discrete-time ISI models of recording channels

remains a signiﬁcant challenge.

C. Detectors for Uncoded Channels

Forney [53] derived the optimal sequence detector for an un-

coded, linear, intersymbol-interference channel with additive

white Gaussian noise. This detection method, the well-known

maximum-likelihood sequence detector (MLSD), comprises a

whitened matched ﬁlter, whose output is sampled at the symbol

rate, followed by a Viterbi detector whose trellis structure

reﬂects the memory of the ISI channel. For the differenti-

ated Lorentzian channel model, as for many communication

channel models, this detector structure would be prohibi-

tively complex to implement, requiring an unbounded number

of states in the Viterbi detector. Consequently, suboptimal

detection techniques have been implemented. As mentioned

at the start of this section, most storage devices did not

even utilize sampled detection methods until the start of

this decade, relying upon equalization to mitigate effects

of ISI, coupled with analog symbol-by-symbol detection of

waveform features such as peak positions and amplitudes.

Since the introduction of digital signal-processing techniques

in recording systems, partial-response equalization and Viterbi

detection have been widely adopted. They represent a practical

compromise between implementability and optimality, with

respect to the MLSD. We now brieﬂy summarize the main

features of these detection methods.

1) Peak Detection: The channel model described above is accurate at relatively low linear densities (small PW50/T), where the noise is generated primarily in the readback

electronics. Provided that the density of transitions and the

noise variance are small enough, the locations of peaks

in the output signal will closely correspond to the locations of

the transitions in the recorded input signal. With a synchronous

clock of period , one could then, in principle, reconstruct

the ternary sequence and the recorded bipolar sequence

The detection method used to implement this process in the

potentially noisy digital recording device is known as peak

detection and it operates roughly as follows. The peak detector

differentiates the rectiﬁed readback signal, and determines the

time intervals in which zero crossings occur. In parallel, the

amplitude of each corresponding extremal point in the rectiﬁed

signal is compared to a prespeciﬁed threshold, and if the

threshold is not exceeded, the corresponding zero crossing is

ignored. This ensures that low-amplitude, spurious peaks due

to noise will be excluded from consideration. Those intervals

in which the threshold is exceeded are designated as having a

peak. The two-level recorded sequence is then reconstructed,

with a transition in polarity corresponding to each interval

containing a detected peak. Clock accuracy is maintained by

an adaptive timing recovery circuit—known as a phase-lock

loop (PLL)—which adjusts the clock frequency and phase to

ensure that the amplitude-qualiﬁed zero crossings occur, on

average, in the center of their respective clock intervals.
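The amplitude-qualification logic above can be sketched on sampled data. This toy detector is illustrative (not the paper's analog implementation): a sample is declared a peak if the rectified signal is locally maximal there and exceeds the qualification threshold.

```python
def detect_peaks(samples, threshold):
    # Toy peak detector on a sampled readback waveform: report indices
    # where the rectified signal |r| is a local maximum (its discrete
    # derivative changes sign) AND exceeds the qualification threshold,
    # suppressing low-amplitude spurious peaks caused by noise.
    r = [abs(v) for v in samples]
    peaks = []
    for i in range(1, len(r) - 1):
        if r[i] > threshold and r[i] >= r[i - 1] and r[i] > r[i + 1]:
            peaks.append(i)
    return peaks
```

Raising the threshold trades missed low-amplitude transitions against false peaks from noise, which is exactly the compromise the qualification circuit makes.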

2) PRML: Current high-density recording systems use a

technique referred to as PRML, an acronym for “partial-

response (PR) equalization with maximum-likelihood (ML)

sequence detection.” We now brieﬂy review the essence of

this technique in order to motivate the use of constrained

modulation codes in PRML systems.

Kobayashi and Tang [133] proposed a digital communica-

tions approach to handling intersymbol interference in digital

magnetic recording. In contrast to peak detection, their method

reconstructed the recorded sequence from sample values of a

suitably equalized readback signal, with the samples measured


at time instants t = kT. At channel bit densities corresponding to moderate values of PW50/T, the transfer characteristics of the Lorentzian model of the saturation recording channel (with a time shift of T/2) closely resemble those of a linear filter with step response given by

s(t) = sinc(t/T) + sinc(t/T - 1)   (4)

where sinc(x) = sin(pi x)/(pi x).

Note that at the consecutive sample times t = 0 and t = T, the function has the value 1, while at all other times which are multiples of T, the value is 0. Through linear superposition (1), the output signal y(t) generated by the waveform represented by the bipolar sequence x is given by

y(t) = sum_k b_k s(t - kT)

which can be rewritten as

y(t) = sum_k (1/2)(x_k - x_{k-2}) sinc(t/T - k)

where we set x_{-1} = x_{-2} = -1. The transition response

results in controlled intersymbol interference at sample times, leading to output-signal samples y_k = y(kT) that, in the absence of noise, assume values in the set {-1, 0, +1}. Thus in the noiseless case, we can recover the recorded bipolar sequence x from the output sample values y_k, because the interference between adjacent transitions is prescribed. In

contrast to the peak detection method, this approach does not

require the separation of transitions.
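The defining property of this step response, unit samples at t = 0 and t = T and zeros at every other multiple of T, is easy to verify numerically. A small check (illustrative; assumes the standard normalized sinc):

```python
import math

def sinc(x):
    # Normalized sinc: sinc(0) = 1, zeros at all nonzero integers.
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def pr4_step(t, T=1.0):
    # PR4-matched step response: s(t) = sinc(t/T) + sinc(t/T - 1),
    # with unit samples at t = 0 and t = T only.
    return sinc(t / T) + sinc(t / T - 1.0)
```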

Sampling provides a discrete-time version of this recording-channel model. Setting y_k = y(kT), the input-output relationship is given by

y_k = (x_k - x_{k-2})/2.

In D-transform notation, whereby a sequence {a_k} is represented by

a(D) = sum_k a_k D^k,

the input-output relationship becomes

y(D) = h(D) x(D)

where the channel transfer function satisfies

h(D) = (1 - D^2)/2.

This representation, called a partial-response channel model,

is among those given a designation by Kretzmer [134] and

tabulated by Kabal and Pasupathy [117]. The label assigned

to it—“Class-4”—continues to be used in its designation, and

the model is sometimes denoted “PR4.”
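In the noiseless case the PR4 relation y_k = (x_k - x_{k-2})/2 between bipolar inputs and ternary outputs is immediately invertible, since x_k = 2 y_k + x_{k-2}. A minimal sketch (illustrative; the initial conditions x_{-1} = x_{-2} = -1 are an assumed convention):

```python
def pr4_channel(x):
    # Noiseless PR4 output y_k = (x_k - x_{k-2}) / 2, x_k in {-1, +1},
    # with assumed initial conditions x_{-1} = x_{-2} = -1.
    ext = [-1, -1] + list(x)
    return [(ext[k + 2] - ext[k]) / 2.0 for k in range(len(x))]

def pr4_recover(y):
    # Invert the noiseless channel: x_k = 2 * y_k + x_{k-2}.
    ext = [-1, -1]
    for yk in y:
        ext.append(int(2 * yk + ext[-2]))
    return ext[2:]
```

The recursion x_k = 2 y_k + x_{k-2} runs separately over even and odd indices, foreshadowing the even/odd decoupling of the PR4 detector discussed below.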

For higher channel bit densities, Thapar and Patel [190] introduced a general class of partial-response models, with step-response functions

s_n(t) = sum_{i=0}^{n} C(n, i) sinc(t/T - i), n >= 1   (5)

where C(n, i) denotes the binomial coefficient. The corresponding input-output relationship takes the form

y(D) = h_n(D) x(D)

where the discrete-time impulse response has the form

h_n(D) = (1/2)(1 - D)(1 + D)^n

where n >= 1. The frequency response corresponding to h_n(D) has a first-order null at zero frequency and a null of order n at the Nyquist frequency, one-half the symbol frequency. Clearly, the PR4 model corresponds to n = 1. The channel models with n >= 2 are usually referred to as "extended Class-4" models, and denoted by E^(n-1)PR4. The PR4, EPR4, and E^2PR4 models are used in the design of most magnetic disk drives today.
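The impulse-response coefficients of the extended Class-4 family follow from polynomial multiplication. The sketch below is illustrative (the 1/2 scaling is omitted) and expands (1 - D)(1 + D)^n:

```python
def poly_mul(a, b):
    # Coefficient-wise product of polynomials in D (lowest degree first).
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def extended_class4(n):
    # Coefficients of (1 - D) * (1 + D)**n:
    # n = 1 gives PR4, n = 2 gives EPR4, n = 3 gives E2PR4.
    h = [1, -1]                 # 1 - D
    for _ in range(n):
        h = poly_mul(h, [1, 1]) # multiply by (1 + D)
    return h
```

The coefficients always sum to zero, reflecting the first-order spectral null at DC contributed by the (1 - D) factor.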

Models proposed for use in optical-recording systems have discrete-time impulse responses of the form

h_n(D) = (1 + D)^n

where n >= 1. These models reflect the nonzero DC-response characteristic of some optical-recording systems, as well as their high-frequency attenuation. The models corresponding to n = 1 and n = 2 were also tabulated in [117], and are known as Class-1 (PR1) or "duobinary," and Class-2 (PR2), respectively. Recently, the models with n >= 3 have been called "extended PR2" models, and denoted by E^(n-2)PR2. (See [203] for an early analysis and application of PR equalization.)

If the differentiated Lorentzian channel with AWGN is equalized to a partial-response target h(D), the sampled channel model becomes

r(D) = y(D) + n(D)

where y(D) = h(D) x(D) and n(D) = sum_k n_k D^k represents the sampled, equalized noise. Under the simplifying assumption that the noise samples n_k are independent and identically distributed, and Gaussian (which is a reasonable assumption if the selected partial-response target accurately reflects the behavior of the channel at the specified channel bit density), the maximum-likelihood sequence detector determines the channel input-output pair x(D) and y(D) = h(D) x(D) minimizing the accumulated squared error

sum_k (r_k - y_k)^2

at each time k.

This computation can be carried out recursively, using the

Viterbi algorithm. In fact, Kobayashi [130], [131] proposed the

use of the Viterbi algorithm for maximum-likelihood sequence


Fig. 5. Trellis diagram for PR4 channel.

detection (MLSD) on a PR4 recording channel at about the

same time that Forney [53] demonstrated its applicability to

MLSD on digital communication channels with intersymbol

interference.

The operation of the Viterbi algorithm and its implemen-

tation complexity are often described in terms of the trellis

diagram corresponding to [53], [54] representing the

time evolution of the channel input-output process. The trellis structure for the E^(n-1)PR4 channel has 2^(n+1) states. In the case of the PR4 channel, the input-output relationship y_k = (x_k - x_{k-2})/2 permits the detector to operate independently on the output subsequences at even and odd time indices. The Viterbi algorithm can then be described in terms of a decoupled pair of two-state trellises, as shown in Fig. 5. There has been

considerable effort applied to simplifying Viterbi detector ar-

chitectures for use in high data-rate, digital-recording systems.

In particular, there are a number of formulations of the PR4

channel detector. See [131], [178], [206], [211], [50], and

[205].
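For the two-state dicode trellis (one interleave of PR4), a Viterbi detector fits in a few lines. The sketch below is illustrative, not a production architecture; it minimizes the squared error between received samples and the noiseless outputs y_k = (x_k - x_{k-1})/2, with the initial state x_{-1} = -1 as an assumption:

```python
def viterbi_dicode(r, x_init=-1):
    # Two-state Viterbi detector for the dicode channel
    # y_k = (x_k - x_{k-1}) / 2; the state is the last input in {-1, +1}.
    states = (-1, 1)
    INF = float("inf")
    metric = {s: (0.0 if s == x_init else INF) for s in states}
    paths = {s: [] for s in states}
    for rk in r:
        new_metric, new_paths = {}, {}
        for s in states:                      # candidate new input x_k
            best, best_prev = INF, states[0]
            for p in states:                  # previous state x_{k-1}
                m = metric[p] + (rk - (s - p) / 2.0) ** 2
                if m < best:
                    best, best_prev = m, p
            new_metric[s] = best
            new_paths[s] = paths[best_prev] + [s]
        metric, paths = new_metric, new_paths
    return paths[min(states, key=metric.get)]
```

A full PR4 detector would simply run two such detectors, one on the even-indexed and one on the odd-indexed samples.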

Analysis, simulation, and experimental measurements have

conﬁrmed that PRML systems provide substantial performance

improvements over RLL-coded, equalized peak detection. The

beneﬁts can be realized in the form of 3–5-dB additional noise

immunity at linear densities where optimized peak-detection bit-error rates lie in the typical operating range. Alternatively, the gains can translate into increased linear density: in that range of error rates, PR4-based PRML channels achieve 15-25% higher linear density than RLL-coded peak detection, with

EPR4-based PRML channels providing an additional improve-

ment of approximately 15% [189], [39].

The SNR loss of several PRML systems and MLSD relative

to the matched-filter bound at a reference bit-error rate was computed in [190]. The results show that, with the proper

choice of PR target for a given density, PRML performance

can achieve within 1–2 dB of the MLSD.

In [207], simulation results for MLSD and PR4-based

PRML detection on a differentiated Lorentzian channel with

colored Gaussian media noise were compared to some of the

capacity bounds discussed in Section II-B. For densities in the range considered, PR4-based PRML required approximately 2–4 dB higher SNR than MLSD to achieve a fixed bit-error rate. The SNR gap between MLSD and the telegraph-

wave information-rate lower bound [171] was approximately

4 dB, and the gap from the unit-process upper bound [171]

was approximately 7 dB. These results suggest that, through

suitable coupling of equalization and coding, SNR gains as

large as 6 dB over PR4-based PRML should be achievable.

In Section V, we will describe some of the equalization and

coding techniques that have been developed in an attempt to

realize this gain.

III. SHANNON THEORY OF CONSTRAINED CHANNELS

In this section, we show how the implementation of record-

ing systems based upon peak detection and PRML introduces

the need for constraints to be imposed upon channel input

sequences. We then review Shannon’s fundamental results on

the theory of constrained channels and codes.

A. Modulation Constraints

1) Runlength Constraints: At moderate densities, peak de-

tection errors may arise from ISI-induced shifting of peak

locations and drifting of clock phase due to an inadequate

number of detected peak locations.

Both problems are pattern-dependent, and the class of runlength-limited (RLL) $(d,k)$ sequences is intended to address them both [132], [101]. Specifically, in order to reduce the effects of pulse interference, one can demand that the derivative sequence $\{a_i\}$ of the channel input contain some minimum number, say $d$, of symbols of value zero between consecutive nonzero values. Similarly, to prevent loss of clock synchronization, one can require that there be no more than some maximum number, say $k$, of symbols of value zero between consecutive nonzero values in $\{a_i\}$.

In this context, we mention that two conventions are used to map a binary sequence $\{a_i\}$ to the magnetization pattern along a track, or, equivalently, to the two-level channel sequence $\{b_i\}$, $b_i \in \{-1, +1\}$. In one convention, called nonreturn-to-zero (NRZ), one direction of magnetization (or $b_i = +1$) corresponds to a stored $1$ and the other direction of magnetization (or $b_i = -1$) corresponds to a stored $0$. In the other convention, called nonreturn-to-zero-inverse (NRZI), a reversal of the direction of magnetization (or $b_i = -b_{i-1}$) represents a stored $1$ and a nonreversal of magnetization (or $b_i = b_{i-1}$) represents a stored $0$.

The NRZI precoding convention may be interpreted as a translation of the binary information sequence $\{a_i\}$ into another binary sequence $\{\tilde{a}_i\}$ that is then mapped by the NRZ convention to the two-level sequence $\{b_i\}$. The relationship between $\{a_i\}$ and $\{\tilde{a}_i\}$ is defined by

$\tilde{a}_i = a_i \oplus \tilde{a}_{i-1}$

where $\tilde{a}_{-1} = 0$ and $\oplus$ denotes addition modulo $2$. It is easy to see that

$a_i = \tilde{a}_i \oplus \tilde{a}_{i-1}$

and, therefore,

$a_i = |\tilde{a}_i - \tilde{a}_{i-1}| = \tfrac{1}{2}\,|b_i - b_{i-1}|.$

Thus under the NRZI precoding convention, the constraints on the runlengths of consecutive zero symbols in the derivative (transition) sequence are reflected


Fig. 6. Labeled directed graph for the $(d,k)$ constraint.

in corresponding constraints on the binary information sequences $\{a_i\}$. The set of sequences satisfying this constraint can be generated by reading the labels off of the paths in the directed graph shown in Fig. 6.

2) Constraints for PRML Channels: Two issues arise in the

implementation of PRML systems that are related to properties

of the recorded sequences. The ﬁrst issue is that, just as in peak

detection systems, long runs of zero samples in the PR channel

output can degrade the performance of the timing recovery and

gain control loops. This dictates the use of a global constraint, denoted $G$, on the number of consecutive zero samples, analogous to the $k$ constraint described above.

The second issue arises from a property of the PR

systems known as quasicatastrophic error propagation

[55]. This refers to the fact that certain bi-inﬁnite PR

channel output sequences are represented by more than

one path in the detector trellis. Such a sequence is pro-

duced by at least two distinct channel input sequences.

For the PR channels under consideration, namely, those with transfer polynomial $h(D) = (1 - D)(1 + D)^N$, the difference sequences $\{\epsilon_i\} = \{x_i - x_i'\}$, corresponding to pairs of such input sequences $\{x_i\}$ and $\{x_i'\}$, are easily characterized: they are precisely the nonzero input difference sequences annihilated by $h(D)$. (For convenience, the nonzero difference symbols are denoted by $+$ and $-$ in these sequences.) For example, the factor $(1 - D)$ admits the constant difference sequences $\pm(\cdots, +, +, +, \cdots)$, the factor $(1 + D)$ admits the alternating difference sequences $\pm(\cdots, +, -, +, -, \cdots)$, and for PR4, where $h(D) = 1 - D^2$, the difference sequences are exactly those that are constant in each of the even and odd interleaves.

As a consequence of the existence of these sequences, there

could be a potentially unbounded delay in the merging of

survivor paths in the Viterbi detection process beyond any

specified time index, even in the absence of noise. It is

therefore desirable to constrain the channel input sequences

in such a way that these difference sequences are forbidden.

This property makes it possible to limit the detector path

memory, and therefore the decoding delay, without incurring

any signiﬁcant degradation in the sequence estimates produced

by the detector.

In the case of PR4, this has been accomplished by limiting the length of runs of identical channel inputs in each of the even and odd interleaves, or, equivalently, the length of runs of zero samples in each interleave at the channel output, to be no more than a specified positive integer $I$. By incorporating interleaved NRZI (INRZI) precoding, the $G$ and $I$ constraints on output sequences translate into corresponding $G$ and $I$ constraints on the binary input sequences. The resulting constraints are denoted $(0, G/I)$, where the "$0$" may be interpreted as a $d = 0$ constraint, emphasizing the point that intersymbol interference is acceptable in PRML systems. It should be noted that the combination of $(0, G/I)$ constraints and an INRZI precoder has been used to prevent quasicatastrophic error propagation in EPR4 channels, as well.

Fig. 7. DC-free constrained sequences with DSV $N$.
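A minimal checker for such PRML constraints might look as follows (our sketch with hypothetical helper names; $G$ bounds the global zero run and $I$ bounds the zero run in each interleave, as described above):

```python
# Sketch: verify the (0, G/I) PRML constraint on a binary sequence.
# Helper names are ours, for illustration only.

def max_zero_run(bits):
    run = best = 0
    for b in bits:
        run = run + 1 if b == 0 else 0
        best = max(best, run)
    return best

def satisfies_0_G_I(bits, G, I):
    return (max_zero_run(bits) <= G            # global constraint
            and max_zero_run(bits[0::2]) <= I  # even interleave
            and max_zero_run(bits[1::2]) <= I) # odd interleave

seq = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
assert satisfies_0_G_I(seq, G=4, I=4)
assert not satisfies_0_G_I([1, 0, 0, 0, 0, 0, 1], G=4, I=4)  # run of 5 zeros
```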

3) Spectral-Null Constraints: The family of run-

length-limited constraints and PRML constraints are

representative of constraints whose description is essentially in

the time domain (although the constraints certainly have impli-

cations for frequency-domain characteristics of the constrained

sequences). There are other constraints whose formulation is

most natural in the frequency domain. One such constraint specifies that the recorded sequences have no spectral content at a particular frequency $f$; that is, the average power spectral density function of the sequences has value zero at the specified frequency. The sequences are said to have a spectral null at frequency $f$.

For an ensemble of sequences $\{b_i\}$, with symbols drawn from the bipolar alphabet $\{-1, +1\}$ and generated by a finite labeled directed graph of the kind illustrated in Fig. 6, a necessary and sufficient condition for a spectral null at frequency $f$, where $T$ is the duration of a single recorded symbol, is that there exist a constant $B$ such that

$\left| \sum_{i=j}^{l} b_i \, e^{-2\pi \sqrt{-1}\, f i T} \right| \le B \qquad (6)$

for all recorded sequences and all $j \le l$ [145], [162], [209].

In digital recording, the spectral null constraints of most importance have been those that prescribe a spectral null at $f = 0$, or DC. The sequences are said to be DC-free or charge-constrained. The concept of running digital sum (RDS) of a sequence plays a significant role in the description and analysis of DC-free sequences. For a bipolar sequence $\{b_i\}$, the RDS of a subsequence $b_j, b_{j+1}, \cdots, b_l$, denoted RDS$(b_j^l)$, is defined as

RDS$(b_j^l) = \displaystyle\sum_{i=j}^{l} b_i.$

From (6), we see that the spectral density of the sequences vanishes at $f = 0$ if and only if the RDS values for all sequences are bounded in magnitude by some constant integer. For sequences that assume a range of $N$ consecutive RDS values, we say that their digital sum variation (DSV) is $N$. Fig. 7 shows a graph describing the bipolar, DC-free system with DSV equal to $N$.
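As a small illustration (ours, not the paper's), the RDS and the DSV of a bipolar sequence can be computed directly:

```python
# Sketch: running digital sums of a bipolar sequence, and the number of
# consecutive RDS values spanned (the digital sum variation).

def rds_values(b):
    """Prefix running digital sums RDS(b_0^l) for l = 0..len(b)-1."""
    total, out = 0, []
    for s in b:
        total += s
        out.append(total)
    return out

b = [1, -1, 1, 1, -1, -1, 1, -1]    # a DC-balanced bipolar sequence
r = rds_values(b)
dsv = max(r) - min(r) + 1           # consecutive RDS values assumed
assert r[-1] == 0                   # zero net charge over the block
assert dsv == 3
```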

DC-free sequences have found widespread application in

optical and magnetic recording systems. In magnetic-tape


systems with rotary-type recording heads, such as the R-

DAT digital audio tape system, they prevent write-signal

distortion that can arise from transformer-coupling in the

write electronics. In optical-recording systems, they reduce

interference between data and servo signals, and also permit

ﬁltering of low-frequency noise stemming from smudges on

the disk surface. It should be noted that the application of DC-

free constraints has certainly not been conﬁned to data storage.

Since the early days of digital communication by means of

cable, DC-free codes have been employed to counter the

effects of low-frequency cutoff due to coupling components,

isolating transformers, and other possible system impairments

[35].

Sequences with a spectral null at $f = 1/(2T)$ also play an important role in digital recording. These sequences are often referred to as Nyquist-free. There is in fact a close relationship between Nyquist-free and DC-free sequences. Specifically, consider sequences $\{b_i\}$ over the bipolar alphabet $\{-1, +1\}$. If $\{b_i\}$ is DC-free, then the sequence $\{c_i\}$ defined by $c_i = (-1)^i b_i$ is Nyquist-free. DC/Nyquist-free sequences have spectral nulls at both $f = 0$ and $f = 1/(2T)$. Such sequences can always be decomposed into a pair of interleaved DC-free sequences. This fact is exploited in Section V-C in the design of distance-enhancing, DC/Nyquist-free codes for PRML systems.
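The transformation $c_i = (-1)^i b_i$ and its effect can be sketched as follows (an illustrative snippet under the definitions above): a bounded running sum of $\{b_i\}$ (DC null) becomes a bounded alternating sum of $\{c_i\}$ (null at $f = 1/(2T)$).

```python
# Sketch: the DC-free -> Nyquist-free map c_i = (-1)^i * b_i.
# The alternating running sum of c equals the ordinary RDS of b.

def running_sums(seq):
    total, out = 0, []
    for s in seq:
        total += s
        out.append(total)
    return out

b = [1, -1, -1, 1, 1, -1, 1, -1]             # DC-free: bounded RDS
c = [((-1) ** i) * b[i] for i in range(len(b))]

alt_sums_c = running_sums([((-1) ** i) * c[i] for i in range(len(c))])
assert alt_sums_c == running_sums(b)         # alternating sum of c = RDS of b
assert max(map(abs, running_sums(b))) <= 2   # RDS of b stays bounded
```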

In some recording applications, sequences satisfying both charge and runlength constraints have been used. In particular, a sequence in a charge-RLL constraint satisfies the $(d,k)$ runlength constraint, with the added restriction that the corresponding NRZI bipolar sequence be DC-free with DSV no larger than a specified value. Codes based on such combined constraints, known as "zero-modulation" and "Miller-squared" codes, have found application in commercial tape-recording systems [160], [139], [150].

B. Discrete Noiseless Channels

In Section III-A, we saw that the successful implementation

of analog and digital signal-processing techniques used in data

recording may require that the binary channel input sequences

satisfy constraints in both the time and the frequency domains.

Shannon established many of the fundamental properties

of noiseless, input-constrained communication channels in

Part I of his 1948 paper [173]. In that section, entitled

“Discrete Noiseless Systems,” Shannon considered discrete

communication channels, such as the teletype or telegraph

channel, where the transmitted symbols were of possibly

different time duration and satisﬁed a set of constraints as to

the order in which they could occur. We will review his key

results and illustrate them using the family of runlength-limited

codes, introduced in Section III-A.

Shannon first defined the capacity $C$ of a discrete noiseless channel as

$C = \lim_{T \to \infty} \frac{\log N(T)}{T} \qquad (7)$

where $N(T)$ is the number of allowed sequences of length $T$.

The following quote, which provides a method of computing

the capacity, is taken directly from Shannon’s original paper

(equation numbers added):

Suppose all sequences of the symbols $S_1, \cdots, S_n$ are allowed and these symbols have durations $t_1, \cdots, t_n$. What is the channel capacity? If $N(t)$ represents the number of sequences of duration $t$, we have

$N(t) = N(t - t_1) + N(t - t_2) + \cdots + N(t - t_n). \qquad (8)$

The total number is equal to the sum of the numbers of sequences ending in $S_1, S_2, \cdots, S_n$ and these are $N(t - t_1), N(t - t_2), \cdots, N(t - t_n)$, respectively. According to a well-known result in finite differences, $N(t)$ is then asymptotic for large $t$ to $X_0^t$, where $X_0$ is the largest real solution of the characteristic equation

$X^{-t_1} + X^{-t_2} + \cdots + X^{-t_n} = 1 \qquad (9)$

and, therefore,

$C = \log X_0. \qquad (10)$

Shannon's results can be applied directly to the case of $(d,k)$ codes by associating the symbols $S_1, \cdots, S_n$ with the $k - d + 1$ different allowable sequences of $0$'s ending in a $1$. The result is

$C_{(d,k)} = \log \lambda \qquad (11)$

where $\lambda$ is the largest real solution of the equation

$\lambda^{-(d+1)} + \lambda^{-(d+2)} + \cdots + \lambda^{-(k+1)} = 1. \qquad (12)$
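Equation (12) is easy to solve numerically. The following sketch (ours; a simple bisection, assuming $k > d$) recovers well-known capacity values such as $C_{(2,7)} \approx 0.5174$:

```python
# Sketch: C(d,k) = log2(lambda), lambda the largest real root of
# x^-(d+1) + ... + x^-(k+1) = 1. The left side is decreasing in x,
# and for k > d it exceeds 1 just above x = 1 and is below 1 at x = 2,
# so bisection on [1, 2] finds the root.
import math

def rll_capacity(d, k):
    f = lambda x: sum(x ** -(i + 1) for i in range(d, k + 1)) - 1.0
    lo, hi = 1.0 + 1e-12, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.log2((lo + hi) / 2)

assert abs(rll_capacity(2, 7) - 0.5174) < 5e-4
assert abs(rll_capacity(1, 7) - 0.6793) < 5e-4
```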

Shannon went on to describe constrained sequences by labeled,

directed graphs, often referred to as state-transition diagrams.

Again, quoting from the paper:

A very general type of restriction which may be placed on allowed sequences is the following: We imagine a number of possible states $a_1, a_2, \cdots, a_m$. For each state only certain symbols from the set $S_1, \cdots, S_n$ can be transmitted (different subsets for the different states). When one of these has been transmitted the state changes to a new state depending both on the old state and the particular symbol transmitted.

Shannon then proceeded to state the following theorem

which he proved in an appendix:

Theorem 1: Let $b_{ij}^{(s)}$ be the duration of the $s$th symbol which is allowable in state $i$ and leads to state $j$. Then the channel capacity $C$ is equal to $\log W$, where $W$ is the largest real root of the determinant equation:

$\left| \sum_{s} W^{-b_{ij}^{(s)}} - \delta_{ij} \right| = 0 \qquad (13)$

where $\delta_{ij} = 1$ if $i = j$ and is zero otherwise.

The condition that different states must correspond to dif-

ferent subsets of the transmission alphabet is unnecessarily


restrictive. For the theorem to hold, it sufﬁces that the state-

transition diagram representation be lossless, meaning that any

two distinct state sequences beginning at a common state and

ending at a, possibly different, common state generate distinct

symbol sequences [144].

This result can be applied to $(d,k)$ sequences in two different ways. In the first, we let the symbols be the collection of allowable runs of consecutive $0$'s followed by a $1$, as before. With this interpretation we have only one state, since any concatenation of these runs is allowable. The determinant equation then becomes the same as (12) with $\lambda$ replaced by $W$. In the second interpretation, we let the symbols be associated with the binary symbols $0$ and $1$, and we use the graph with $k + 1$ states shown earlier in Fig. 6. Note now that all of the symbols are of length $1$, so that the determinant equation is of the form (14), as shown at the bottom of this page.

Multiplying every element in the matrix by $W$, we see that this equation specifies the eigenvalues of the connection matrix, or adjacency matrix, of the graph; that is, a matrix which has $(i,j)$th entry equal to $1$ if there is a symbol from state $i$ that results in the new state $j$, and which has $(i,j)$th entry equal to $0$ otherwise. (The notion of adjacency matrix can be extended to graphs with a multiplicity of distinctly labeled edges connecting pairs of states.) Thus we see that the channel capacity is equal to the logarithm of the largest real eigenvalue of the connection matrix of the constraint graph shown in Fig. 6.

Shannon proceeded to produce an information source by assigning nonzero probabilities to the symbols leaving each state of the graph. These probabilities can be assigned in any manner subject to the constraint that, for each state, the sum of the probabilities for all symbols leaving that state is $1$. Shannon gave formulas as to how to choose these probabilities such that the resulting information source has maximum entropy. He further showed that this maximum entropy is equal to the capacity $C$. Specifically, he proved the following theorem.

Theorem 8: Let the system of constraints considered as a channel have a capacity $C = \log W$. If we assign

$P_{ij}^{(s)} = \frac{B_j}{B_i} W^{-l_{ij}^{(s)}}$

where $l_{ij}^{(s)}$ is the duration of the $s$th symbol leading from state $i$ to state $j$ and the $B_i$ satisfy

$B_i = \sum_{s,j} B_j W^{-l_{ij}^{(s)}}$

then $H$ is maximized and equal to $C$.

It is an easy matter to apply Shannon's result to find these probabilities for $(d,k)$ codes. The result is that the probability of a run of $i$ $0$'s followed by a $1$ is equal to $\lambda^{-(i+1)}$ for $d \le i \le k$, and $\log \lambda$ is the maximum entropy. Since the sum of these probabilities (summed over all possible runlengths) must equal $1$, we have

$\displaystyle\sum_{i=d}^{k} \lambda^{-(i+1)} = 1. \qquad (15)$

Note that this equation is identical to (12), except for the choice of the indeterminate. Thus the maximum entropy is achieved by choosing $\lambda$ as the largest real root of this equation, and the maximum entropy is equal to the capacity $C$. The probabilities of the symbols which result in the maximum entropy are shown in Fig. 8 (where now the branch labels are the probabilities of the binary symbols and not the symbols themselves).
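A short computation (our sketch) confirms that the runlength probabilities $\lambda^{-(i+1)}$ sum to one and that the resulting entropy per channel symbol equals $\log_2 \lambda$:

```python
# Sketch: maxentropic runlength probabilities for the (1,3) constraint.
# p_i = lambda^-(i+1) for d <= i <= k; entropy per run divided by the
# mean run length equals log2(lambda), the capacity.
import math

d, k = 1, 3
# largest real root of sum_{i=d}^{k} x^-(i+1) = 1, by bisection
lo, hi = 1.0 + 1e-12, 2.0
for _ in range(200):
    mid = (lo + hi) / 2
    if sum(mid ** -(i + 1) for i in range(d, k + 1)) > 1:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2

p = {i: lam ** -(i + 1) for i in range(d, k + 1)}
assert abs(sum(p.values()) - 1.0) < 1e-9        # probabilities sum to 1

H_run = -sum(q * math.log2(q) for q in p.values())   # entropy per run
mean_len = sum((i + 1) * q for i, q in p.items())    # mean run length
assert abs(H_run / mean_len - math.log2(lam)) < 1e-9 # = capacity
```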

The maximum-entropy solution described in the theorem dictates that any sequence of length $n$, starting in state $i$ and ending in state $j$, has probability $P_i (B_j / B_i)\, \lambda^{-n}$, where $P_i$ denotes the probability of state $i$. Therefore,

$-\frac{1}{n} \log \Pr \longrightarrow \log \lambda = C \quad \text{as } n \to \infty.$

This is a special case of the notion of "typical long sequences" again introduced by Shannon in his classic paper. In this special case of maximum-entropy sequences, for $n$ large enough, all sequences of length $n$ are entropy-typical in this sense. This is analogous to the case of symbols which are of fixed duration, equally probable, and statistically independent.

Shannon proved that the capacity $C$ of a constrained channel represents an upper bound on the achievable rate of information transmission on the channel. Moreover, he defined a concept of typical sequences and, using that concept, demonstrated that transmission at rates arbitrarily close to $C$ can in

$\det\left( W^{-1} A - I \right) = 0 \qquad (14)$

where $A$ is the adjacency matrix of the graph in Fig. 6 and $I$ is the identity matrix.


Fig. 8. Markov graph for maximum entropy sequences.

principle be achieved. Speciﬁcally, he proved the following

“fundamental theorem for a noiseless channel” governing

transmission of the output of an information source over a

constrained channel. We again quote from [173].

Theorem 9: Let a source have entropy $H$ (bits per symbol) and a channel have a capacity $C$ (bits per second). Then it is possible to encode the output of the source in such a way as to transmit at the average rate $\frac{C}{H} - \epsilon$ symbols per second over the channel, where $\epsilon$ is arbitrarily small. It is not possible to transmit at an average rate greater than $\frac{C}{H}$.

The proof technique, relying as it does upon typical long

sequences, is nonconstructive. It is interesting to note, how-

ever, that Shannon formulated the operations of the source

encoder (and decoder) in terms of a ﬁnite-state machine, a

construct that has since been widely applied to constrained

channel encoding and decoding. In the next section, we turn

to the problem of designing efﬁcient ﬁnite-state encoders.

IV. CODES FOR NOISELESS CONSTRAINED CHANNELS

For constraints described by a ﬁnite-state, directed graph

with edge labels, Shannon’s fundamental coding theorem guar-

antees the existence of codes that achieve any rate less than

the capacity. Unfortunately, as mentioned above, Shannon’s

proof of the theorem is nonconstructive. However, during

the past 40 years, substantial progress has been made in the

engineering design of efﬁcient codes for various constraints,

including many of interest in digital recording. There have

also been major strides in the development of general code

construction techniques, and, during the past 20 years, rigorous

mathematical foundations have been established that permit

the resolution of questions pertaining to code existence, code

construction, and code implementation complexity.

Early contributors to the theory and practical application

of constrained code design include: Berkoff [19]; Cattermole

[34], [35]; Cohen [40]; Freiman and Wyner [69]; Gabor [73];

Jacoby [112], [113]; Kautz [125]; Lempel [136]; Patel [160];

and Tang and Bahl [188]; and, especially, Franaszek [57]–[64].

Further advances were made by Adler, Coppersmith, and

Hassner (ACH) [3]; Marcus [141]; Karabed and Marcus [120];

Ashley, Marcus, and Roth [12]; Ashley and Marcus [9], [10];

Immink [104]; and Hollmann [91]–[93].

Fig. 9. Finite-state encoder schematic.

In this section, we will survey selected aspects of this theo-

retical and practical progress. The presentation largely follows

[102], [146], and, especially, [144], where more detailed and

comprehensive treatments of coding for constrained channels

may be found.

A. Encoders and Decoders

Encoders have the task of translating arbitrary source in-

formation into a constrained sequence. In coding practice,

typically, the source sequence is partitioned into blocks of length $p$, and under the code rules such blocks are mapped onto words of $q$ channel symbols. The rate of such an encoder is $R = p/q$. To emphasize the blocklengths, we sometimes denote the rate as $p : q$.

It is most important that this mapping be done as efﬁ-

ciently as possible subject to certain practical considerations.

Efﬁciency is measured by the ratio of the code rate to

the capacity of the constrained channel. A good encoder

algorithm realizes a code rate close to the capacity of the

constrained sequences, uses a simple implementation, and

avoids the propagation of errors in the process of decoding.

An encoder may be state-dependent, in which case the code-

word used to represent a given source block is a function of the

channel or encoder state, or the code may be state-independent.

State-independence implies that codewords can be freely con-

catenated without violating the sequence constraints. A set of

such codewords is called self-concatenable. When the encoder

is state-dependent, it typically takes the form of a synchronous

ﬁnite-state machine, illustrated schematically in Fig. 9.

A decoder is preferably state-independent. As a result of

errors made during transmission, a state-dependent decoder

could easily lose track of the encoder state, and begin to

make errors, with no guarantee of recovery. In order to avoid


Fig. 10. Sliding-block decoder schematic.

error propagation, therefore, a decoder should use a ﬁnite

observation interval of channel bits for decoding, thus limiting

the span in which errors may occur. Such a decoder is called a sliding-block decoder. A sliding-block decoder makes a decision on a received $q$-bit word on the basis of the $q$-bit word itself, as well as the $m$ preceding $q$-bit words and the $a$ upcoming $q$-bit words. Essentially, the decoder comprises a register of length $m + a + 1$ $q$-bit words and a logic function that translates the contents of the register into the retrieved $p$-bit source word. Since the constants $m$ and $a$ are finite, an error in the retrieved sequence can propagate in the decoded sequence only for a finite distance, at most the decoder window length. Fig. 10 shows a schematic of a sliding-block decoder. An important subclass of sliding-block decoders is the block decoders, which use only a single codeword for reproducing the source word, i.e., $m = a = 0$.
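As a concrete illustration, the classic MFM code, a rate $1{:}2$ code into the $(1,3)$ constraint, admits a block decoder: each source bit is recovered from its own 2-bit codeword alone. The implementation below is our sketch of the textbook encoding rule (a clock bit that is $1$ only between two $0$ data bits).

```python
# Sketch: MFM encoding into the (1,3) RLL constraint, with a block
# decoder (m = a = 0): the source bit is the second bit of each codeword.

def mfm_encode(data):
    out, prev = [], 0
    for b in data:
        clock = 1 if (prev == 0 and b == 0) else 0
        out += [clock, b]            # codeword: (clock, data)
        prev = b
    return out

def mfm_decode(channel):
    return channel[1::2]             # block decoder: drop the clock bits

data = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
ch = mfm_encode(data)
assert mfm_decode(ch) == data

# Verify (d,k) = (1,3): between 1 and 3 zeros separate consecutive ones.
ones = [i for i, b in enumerate(ch) if b == 1]
gaps = [j - i - 1 for i, j in zip(ones, ones[1:])]
assert all(1 <= g <= 3 for g in gaps)
```

Because the decoder never looks outside the current codeword, a single channel-bit error corrupts at most one decoded source bit, the best possible error-propagation behavior.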

Generally speaking, the problem of code design is to con-

struct practical, efﬁcient, ﬁnite-state encoders with sliding-

block decoders. There are several fundamental questions re-

lated to this problem.

a) For a rate $R$, what encoder input and output block sizes $p$ and $q$, with $p/q = R$, are realizable?

b) Can a sliding-block decodable encoder always be found?

c) Can 100% efficient sliding-block decodable encoders be designed when the capacity is a rational number $p/q$?

d) Are there good bounds on basic complexity measures

pertaining to constrained codes for a given constraint,

such as number of encoder states, encoder gate complex-

ity, encoding delay, and sliding-block decoder window

length?

Many of these questions have been answered fully or in

part, as we now describe.

B. Graphs and Constraints

It is very useful and convenient, when stating code existence

results and specifying code construction algorithms, to refer

to labeled graph descriptions of constrained sequences. More

precisely, a labeled graph (or a finite labeled directed graph) $G = (V, E, L)$ consists of a finite set of states $V$; a finite set of edges $E$, where each edge has an initial state and a terminal state, both in $V$; and an edge labeling $L : E \rightarrow \Sigma$, where $\Sigma$ is a finite alphabet. Fig. 11 shows a "typical" labeled graph. When context makes it clear, a labeled graph may be called simply a "graph."

Fig. 11. Typical labeled graph.

A labeled graph can be used to generate finite symbol sequences by reading off the labels along paths in the graph, thereby producing a word (also called a string or a block). For example, in Fig. 11, a word is generated by following a path along edges and reading off the corresponding labels. We will sometimes call a word of length $q$ generated by $G$ a $q$-block.

The connections in the directed graph underlying a labeled graph are conveniently described by an adjacency matrix, as was mentioned in Section III. Specifically, for a graph $G$, we denote by $A_G$ the $|V| \times |V|$ adjacency matrix whose $(u,v)$th entry is the number of edges from state $u$ to state $v$ in $G$. The adjacency matrix, of course, has nonnegative integer entries. Note that the number of paths of length $q$ from state $u$ to state $v$ is simply the $(u,v)$th entry of $(A_G)^q$, and the number of cycles of length $q$ is simply the trace of $(A_G)^q$.

The fundamental object considered in the theory of constrained coding is the set of words generated by a labeled graph. A constrained system (or constraint), denoted $S$, is the set of all words (i.e., finite-length sequences) generated by reading the labels of paths in a labeled graph $G$. We will also, at times, consider right-infinite sequences $w_0 w_1 w_2 \cdots$ and sometimes bi-infinite sequences $\cdots w_{-1} w_0 w_1 w_2 \cdots$. The alphabet of symbols appearing in the words of $S$ is denoted $\Sigma(S)$. We say that the graph $G$ presents or is a presentation of $S$, and we write $S = S(G)$. For a state $v$ in $G$, the set of all finite words generated from $v$ is called the follower set of $v$ in $G$, denoted by $F_G(v)$.

As mentioned above, a rate $p : q$ finite-state encoder will generate a word in the constrained system composed of a sequence of $q$-blocks. For a constrained system $S$ presented by a labeled graph $G$, it will be very useful to have an explicit description of the words in $S$, decomposed into such nonoverlapping blocks of length $q$.

Let $G$ be a labeled graph. The $q$th power of $G$, denoted $G^q$, is the labeled graph with the same set of states as $G$, but one edge for each path of length $q$ in $G$, labeled by the $q$-block generated by that path. The adjacency matrix of $G^q$ satisfies $A_{G^q} = (A_G)^q$.

For a constrained system $S$ presented by a labeled graph $G$, the $q$th power of $S$, denoted $S^q$, is the constrained system presented by $G^q$. So, $S^q$ is the constrained system obtained from $S$ by grouping the symbols in each word into nonoverlapping words of length $q$. Note that the definition of $S^q$ does not depend on which presentation of $S$ is used.
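The role of $A_{G^q} = (A_G)^q$ can be illustrated by counting paths (our sketch, using the two-state graph of the $(0,1)$ constraint, where the path counts are Fibonacci numbers and their growth rate approaches the golden ratio, the Perron eigenvalue):

```python
# Sketch: the number of length-q paths in G (edges of G^q) is the sum of
# the entries of A_G^q; its growth rate gives the capacity's base.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, q):
    n = len(A)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(q):
        R = mat_mul(R, A)
    return R

A = [[1, 1],
     [1, 0]]                  # adjacency matrix of the (0,1) graph

def num_paths(q):
    Aq = mat_pow(A, q)        # adjacency matrix of the q-th power graph
    return sum(sum(row) for row in Aq)

assert [num_paths(q) for q in (1, 2, 3, 4, 5)] == [3, 5, 8, 13, 21]
# growth rate approaches the golden ratio, the largest real eigenvalue
assert abs(num_paths(20) / num_paths(19) - (1 + 5 ** 0.5) / 2) < 1e-3
```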

It is important to note that a given constrained system can

be presented by many different labeled graphs and, depending

on the context, one presentation will have advantages relative

to another. For example, one graph may present the constraint

using the smallest possible number of states, while another

may serve as the basis for an encoder ﬁnite-state machine.

There are important connections between the theory of

constrained coding and other scientiﬁc disciplines, including

symbolic dynamics, systems theory, and automata theory.

Many of the objects, concepts, and results in constrained

coding have counterparts in these ﬁelds. For example, the set

of bi-inﬁnite sequences derived from a constrained system is

called a soﬁc system (or soﬁc shift) in symbolic dynamics.

In systems theory, these sequences correspond to a discrete-

time, complete, time-invariant system. Similarly, in automata

theory, a constrained system is equivalent to a regular language

which is recognized by a certain type of automaton [94].

The interrelationships among these various disciplines are

discussed in more detail in [15], [127], and [142].

The bridge to symbolic dynamics, established in [3], has

proven to be especially signiﬁcant, leading to breakthroughs in

both the theory and design of constrained codes. An interesting

account of this development and its impact on the design of

recording codes for magnetic storage is given in [2]. A very

comprehensive mathematical treatment may be found in [138].

C. Properties of Graph Labelings

In order to state the coding theorems, as well as for purposes

of encoder construction, it will be important to consider

labelings with special properties.

We say that a labeled graph is deterministic if, at each state,

the outgoing edges have distinct labels. In other words, at each

state, any label generated from that state determines a unique

outgoing edge from that state. Constrained systems that play

a role in digital recording generally have natural presentations

by a deterministic graph. For example, the labeled graphs in

Figs. 6 and 7 are both deterministic. It can be shown that any

constrained system can be presented by a deterministic graph

[144]. Similarly, a graph is called codeterministic if, for each

state, the incoming edges are distinctly labeled. Fig. 6 is not

codeterministic, while Fig. 7 is.

Many algorithms for constructing constrained codes begin

with a deterministic presentation of the constrained system

and transform it into a presentation which satisﬁes a weaker

version of the deterministic property called ﬁnite anticipation.

A labeled graph is said to have finite anticipation if there is an integer $N$ such that any two paths of length $N + 1$ with the same initial state and labeling must have the same initial edge. The anticipation of the graph refers to the smallest $N$ for which this condition holds. Similarly, we define the coanticipation of a labeled graph as the anticipation of the labeled graph obtained by reversing the directions of the edges.

A labeled graph has finite memory if there is an integer $N$ such that the paths in the graph of length $N$ that generate the same word all terminate at the same state. The smallest $N$ for which this holds is called the memory of the graph.

A property related to finite anticipation is that of being $(m, a)$-definite. A labeled graph has this property if, given any word $w = w_{-m} \cdots w_0 \cdots w_a$, the set of paths $e_{-m} \cdots e_0 \cdots e_a$ that generate $w$ all agree in the edge $e_0$. A graph with this property is sometimes said to have finite memory-and-anticipation. Note that, whereas the definition of finite anticipation involves knowledge of an initial state, the $(m, a)$-definite property replaces that with knowledge of a finite amount of memory.

Finally, as mentioned in Section III, a labeled graph is

lossless if any two distinct paths with the same initial state

and terminal state have different labelings.

The graph in Fig. 6 has finite memory, and it is definite because, for any given word $w$ of sufficient length, all paths that generate $w$ end with the same edge. In contrast, the graph in Fig. 7 does not have finite memory and is not definite.

D. Finite-Type and Almost-Finite-Type Constraints

There are some special classes of constraints, called ﬁnite-

type and almost-ﬁnite type, that play an important role in the

theory and construction of constrained codes. A constrained

system is ﬁnite-type (a term derived from symbolic dynamics

[138]) if it can be presented by a deﬁnite graph. Thus the

$(d,k)$-RLL constraint is finite-type.

There is also a useful intrinsic characterization of finite-type constraints: there is an integer $N$ such that, for any symbol $b$ and any word $w$ of length at least $N$, we have $wb \in S$ if and only if $\tilde{w}b \in S$, where $\tilde{w}$ is the suffix of $w$ of length $N$. The smallest such integer $N$, if any, is called the memory of $S$.

Using this intrinsic characterization, we can show that not every constrained system of practical interest is finite-type. In particular, the charge-constrained system described by Fig. 7 is not. To see this, note that whether a symbol can be appended to a word depends on the running digital sum of the entire word, and two words may share an arbitrarily long common suffix while having different RDS values.

Nevertheless, this constrained system falls into a natural

broader class of constrained systems. These systems can be

thought of as “locally ﬁnite-type.” More precisely, a con-

strained system is almost-ﬁnite-type if it can be presented

by a labeled graph that has both ﬁnite anticipation and ﬁnite

coanticipation.

Since deﬁniteness implies ﬁnite anticipation and ﬁnite coan-

ticipation, every ﬁnite-type constrained system is also almost-

ﬁnite-type. Therefore, the class of almost-ﬁnite-type systems

does indeed include all of the ﬁnite-type systems. This inclu-

sion is proper, as can be seen by referring to Fig. 7. There,

we see that the charge-constrained systems are presented by

labeled graphs with zero anticipation (i.e., deterministic) and


zero coanticipation (i.e., codeterministic). Thus these systems

are almost-ﬁnite-type, but not ﬁnite-type. Constrained systems

used in practical applications are virtually always almost-

ﬁnite-type.

Another useful property of constrained systems is irreducibility. A constraint $S$ is irreducible if, for every pair of words $w$ and $w'$ in $S$, there is a word $z$ such that $wzw'$ is in $S$. Equivalently, $S$ is irreducible if and only if it is presented by some irreducible labeled graph. In coding, it usually suffices to consider irreducible constraints.

Irreducible constrained systems have a distinguished presentation called the Shannon cover, which is the unique (up to labeled graph isomorphism) deterministic presentation of S with a smallest number of states. The Shannon cover

can be used to determine if the constraint is almost-ﬁnite-

type or ﬁnite-type. More precisely, an irreducible constrained

system is ﬁnite-type (respectively, almost-ﬁnite-type) if and

only if its Shannon cover has ﬁnite memory (respectively,

ﬁnite coanticipation).

Referring to Section III, recall that the (base-2) capacity of a constrained system S is given by

cap(S) = lim_{q→∞} (1/q) log₂ N(q; S)

where N(q; S) is the number of q-blocks in S. The (base-2) capacity of an irreducible system can be obtained from the Shannon cover. In fact, as mentioned in Section III, if G is any irreducible lossless presentation of S, then

cap(S) = log₂ λ(A_G),

where λ(A_G) is the largest real eigenvalue of the adjacency matrix A_G of G.
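As a concrete illustration (ours, not the paper's), the capacity of the (1,3)-RLL constraint can be computed from the adjacency matrix of its Shannon cover; the four states below track the current run of zeros since the last one:

```python
import numpy as np

# Shannon cover of the (1,3)-RLL constraint: state r in {0,1,2,3} is the
# length of the current run of 0s.  From r we may emit a 0 (moving to r+1,
# provided r < 3) or a 1 (allowed once r >= d = 1, returning to state 0).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0]])

lam = max(np.linalg.eigvals(A).real)  # largest real eigenvalue of A_G
cap = np.log2(lam)                    # base-2 capacity, approximately 0.5515
print(round(cap, 4))
```

The value is comfortably above 1/2, which is why rate 1 : 2 codes such as MFM (discussed below) exist for this constraint.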

E. Coding Theorems

We now state a series of coding theorems that reﬁne and

strengthen the fundamental coding theorem of Shannon, thus

answering many of the questions posed above. Moreover, the

proofs of these theorems are often constructive, leading to

practical algorithms for code design.

First, we establish some useful notation and terminology.

An encoder usually takes the form of a synchronous finite-state machine, as mentioned earlier and shown schematically in Fig. 9. More precisely, for a constrained system S and a positive integer n, an (S, n)-encoder is a labeled graph E satisfying the following properties: 1) each state of E has out-degree n, that is, n outgoing edges; 2) the constrained system S(E) presented by E satisfies S(E) ⊆ S; and 3) the presentation E is lossless.

A tagged (S, n)-encoder is an (S, n)-encoder in which the n outgoing edges from each state in E are assigned distinct words, or input tags, from an alphabet of size n. We will sometimes use the same symbol to denote both a tagged (S, n)-encoder and the underlying (S, n)-encoder.

Finally, we define a rate p : q finite-state (S, M)-encoder to be a tagged (S^q, M^p)-encoder where the input tags are the M-ary p-blocks. We will be primarily concerned with the binary case, M = 2, and will call such an encoder a rate p : q finite-state encoder for S. The encoding proceeds in the obvious fashion, given a selection of an initial state. If the current state is u and the input data is the p-block s, the codeword generated is the q-block that labels the outgoing edge e from

Fig. 12. Rate 1 : 2 tagged encoder.

state u with input tag s. The next encoder state is the terminal state of the edge e. A tagged encoder is illustrated in Fig. 12.

1) Block Encoders: We ﬁrst consider the construction of

the structurally simplest type of encoder, namely, a block

encoder. A rate p : q finite-state (S, M)-encoder is called a rate p : q block (S, M)-encoder if it contains only one state.

Block encoders have played an important role in digital storage

systems.

The following theorem states that block encoders can be

used to asymptotically approach capacity. It follows essen-

tially from Shannon’s proof of the fundamental theorem for

noiseless channels.

Block-Coding Theorem: Let S be an irreducible constrained system and let M be a positive integer. There exists a sequence of rate p_i : q_i block (S, M)-encoders such that

lim_{i→∞} (p_i / q_i) log₂ M = cap(S).

The next result provides a characterization of all block

encoders.

Block Code Characterization: Let S be a constrained system with a deterministic presentation G and let n be a positive integer. Then there exists a block (S, n)-encoder if and only if there exist a subgraph H of G and a collection F of n symbols of S, such that F is the set of labels of the outgoing edges from each state in H.

Freiman and Wyner [69] developed a procedure that can be used to determine whether there exists a block (S^q, n)-encoder for a given constrained system S with finite memory m. Specifically, let G be a deterministic presentation of S. For every pair of states u and v in G, consider the set W(u, v) of all words of length q that can be generated in G by paths that start at u and terminate at v. To identify a subgraph as in the block-code characterization, we search for a set P of states in G satisfying

|⋂_{u∈P} ⋃_{v∈P} W(u, v)| ≥ n.

Freiman and Wyner [69] simplify the search by proving that, when S has finite memory m, it suffices to consider sets P which are complete; namely, if u is in P and the follower set of a state v contains the follower set of u, then v is also in P.

Even with the restriction of the search to complete sets,

this block-code design procedure is not efﬁcient, in general.

However, given S and q, for certain constrained systems, such as the (d, k)-RLL constraints, it does allow us to effectively compute the largest n for which there exists a block


TABLE I
OPTIMAL LENGTH-5 LIST FOR THE (0, 2) CONSTRAINT

TABLE II
RATE 1/2, (2, 7) VARIABLE-LENGTH BLOCK ENCODER

(S^q, n)-encoder. In fact, the procedure can be used to find a largest possible set of self-concatenable words of length q.

Block Encoder Examples: Digital magnetic-tape systems have utilized block codes satisfying (0, k) constraints, for k = 1 and k = 2. Specifically, the codes, with rates 1/2 and 4/5, respectively, were derived from optimal lists of sizes 2 and 17, respectively. The simple rate 1/2, (0,1) code, known as the Frequency Modulation (FM) code, consists of the two codewords 11 and 10. The 17 words of the (0,2) list are shown in Table I. The 16 words remaining after deletion of the all-1's word form the codebook for the rate 4/5 Group Code Recording (GCR) code, which became the industry standard for nine-track tape drives. The input tag assignments are also shown in the table. See [146] for further details.
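The optimal (0,2) list can be reproduced with a short search. The boundary conditions used below (at most one leading zero, at most one trailing zero, no internal run of three zeros) are our assumption; they make any two list words freely concatenable without violating k = 2:

```python
from itertools import product

# Length-5 binary words that stay (0,2)-constrained under free
# concatenation: no run of three 0s inside a word, and at most one 0
# at either end of a word.
words = []
for bits in product("01", repeat=5):
    w = "".join(bits)
    if "000" in w or w.startswith("00") or w.endswith("00"):
        continue
    words.append(w)

print(len(words))  # 17: the optimal list size cited in the text
codebook = [w for w in words if w != "11111"]
print(len(codebook))  # 16 codewords for the rate 4/5 GCR code
```

Since every word ends with at most one zero and starts with at most one zero, any concatenation of list words contains no run of three zeros.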

A rate 1/2, (2,7) code, developed by Franaszek [59], [44], became an industry standard in disk drives using peak detection. It can be described as a variable-length block code, and was derived using a similar search method. The encoder table is shown in Table II.

Disk drives using PRML techniques have incorporated a block code satisfying (0, G/I) = (0, 4/4) constraints [45]. The code, with rate 8/9, was derived from the unique maximum-size list. The list has a very simple description. It is the set of length-9 binary words satisfying the following three conditions: 1) the maximum runlength of zeros within the word is no more than 4; 2) the maximum runlengths of zeros at the beginning and end of the word are no more than 2; and 3) the maximum runlengths of zeros at the beginning and end of the even interleave and odd interleave of the word are no more than 2. A rate 16/17, (0, 6/6) block code, derived from an optimal list of length-17 words with an analogous definition, has also been designed for use in PRML systems [161], [1].

2) Deterministic Encoders: Block encoders, although conceptually simple, may not be suitable in many cases, since they might require a prohibitively large value of q in order to achieve the desired rate. Allowing multiple states in the encoder can reduce the required codeword length. If each state in G has at least n outgoing edges, then we can obtain a deterministic (S, n)-encoder by deleting excess edges. In fact, it is sufficient (and necessary) for G to have a subgraph where each state satisfies this condition. This result, characterizing deterministic encoders, is stated by Franaszek in [57].

Deterministic Encoder Characterization: Let S be a constrained system with a deterministic presentation G and let n be a positive integer. Then there exists a deterministic (S, n)-encoder if and only if there exists such an encoder which is a subgraph of G.

Let G be a deterministic presentation of a constrained system S. According to the characterization, we can derive from G a deterministic (S, n)-encoder if and only if there exists a set P of states in G, called a set of principal states, such that

Σ_{v∈P} (A_G)_{u,v} ≥ n for every u ∈ P.

This inequality can be expressed in terms of the characteristic vector x = x(P) of the set of states P, where x_u = 1 if u ∈ P and x_u = 0 otherwise. Then, P is a set of principal states if and only if

A_G x ≥ n x. (16)

We digress briefly to discuss the significance of this inequality. Given a nonnegative integer square matrix A and a positive integer n, an (A, n)-approximate eigenvector is a nonnegative integer vector x satisfying

A x ≥ n x (17)

where the inequality holds componentwise. We refer to this inequality as the approximate eigenvector inequality, and we denote the set of all (A, n)-approximate eigenvectors by V(A, n). Approximate eigenvectors will play an essential role in the constructive proof of the finite-state coding theorem in the next section, as they do in many code-construction procedures.

The existence of approximate eigenvectors is guaranteed by the Perron–Frobenius theory [76], [170]. Specifically, let λ(A) be the largest positive eigenvalue of A, and let n be a positive integer satisfying n ≤ λ(A). Then there exists a vector x ≠ 0, with nonnegative integer components, satisfying (17). The following algorithm, taken from [3] and due originally to Franaszek, is an approach to finding such a vector.


Fig. 13. Rate 1 : 2 MFM encoder.

Franaszek Algorithm for Finding an Approximate Eigenvector: Choose an initial vector x₀ all of whose entries are 2^B, where B is a nonnegative integer. Define inductively

x_{k+1} = min{ x_k, ⌊(1/n) A x_k⌋ },

where the minimum and the floor are taken componentwise. Let x = x_m, where m is the first integer such that x_{m+1} = x_m. There are two situations that can arise: a) x ≠ 0 and b) x = 0. Case a) means that we have found an approximate eigenvector, and in case b) there is no solution for this choice of B, so we increase B and start from the top again. There may be multiple solutions for the vector x. The choice of the vector x may affect the complexity of the code constructed in this way. The components of x are often called weights.
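A minimal sketch of this iteration (our own illustration), applied to the second power of the (1,3)-RLL Shannon cover with n = 2, which is exactly the setting of the MFM example that follows:

```python
import numpy as np

def franaszek(A, n, B=0):
    """Iterate x_{k+1} = min(x_k, floor(A x_k / n)) from the all-(2^B) vector."""
    x = np.full(A.shape[0], 2 ** B, dtype=int)
    while True:
        y = np.minimum(x, (A @ x) // n)  # componentwise min and floor
        if np.array_equal(y, x):
            return x  # fixed point: an approximate eigenvector, or the zero vector
        x = y

# Shannon cover of the (1,3)-RLL constraint (state = current run of 0s).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0]])
A2 = A @ A            # second-power graph, appropriate for a rate 1:2 code
x = franaszek(A2, n=2)
print(x.tolist())     # nonzero entries mark a set of principal states
```

Starting from the all-ones vector, the iteration converges immediately to a 0/1 vector whose support is a set of principal states.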

From (16), it follows that P is a set of principal states if and only if the characteristic vector x(P) is an (A_G, n)-approximate eigenvector. Hence, we can find whether there is a deterministic (S, n)-encoder by applying the Franaszek algorithm to the matrix A_G, the integer n, and the all-1's vector as the initial vector x₀. A nonzero output vector x is a necessary and sufficient condition for the existence of a set of principal states, for which x is then a characteristic vector.

Deterministic Encoder Example: The rate 1/2, (1,3) encoder, known as the Modified Frequency Modulation (MFM) code, Miller code, or Delay Modulation, is a deterministic encoder. The encoder is derived from the second power of the Shannon cover of the (1,3)-RLL constraint. A set of principal states is {0, 1, 2}. Fig. 13 shows a rate 1 : 2 deterministic encoder. In fact, the tagged encoder in Fig. 12 is a simpler description of the MFM tagged encoder obtained by "merging" two of the states in Fig. 13. (See Section IV-F for more on merging of states.)
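The bit-level MFM rule is not spelled out above, so as an illustration we use its standard description: each data bit b is encoded as a clock bit followed by b, where the clock bit is 1 exactly when the previous and current data bits are both 0 (the bit preceding the first data bit is assumed to be 0). Decoding is block decoding: keep the second bit of each pair.

```python
def mfm_encode(bits, prev=0):
    """Encode data bits into a (1,3)-RLL sequence (assumes previous data bit 0)."""
    out = []
    for b in bits:
        clock = 1 if (prev == 0 and b == 0) else 0
        out += [clock, b]
        prev = b
    return out

def mfm_decode(channel_bits):
    """Block decoder: the data bit is the second bit of each 2-bit codeword."""
    return channel_bits[1::2]

data = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
code = mfm_encode(data)
assert mfm_decode(code) == data           # decoding inverts encoding
s = "".join(map(str, code))
assert "11" not in s and "0000" not in s  # runlengths respect d = 1, k = 3
```

Because the decoder looks at a single codeword only, a channel error corrupts at most one decoded data bit, the best possible error-propagation behavior.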

3) Finite-State Coding Theorem: Although deterministic

encoders can overcome some of the limitations of block

encoders, further improvements may arise if we relax the

deterministic property. In this section, we show that, for a

desired rate p : q, where p/q ≤ cap(S), even though a

deterministic encoder may not exist, a ﬁnite-state encoder

always does.

If an encoder E has finite anticipation a, then we can decode in a state-dependent manner, beginning at the initial state, and retracing the path followed by the encoder, as follows. If the current state is u, then the current codeword to be decoded, together with the a upcoming codewords, constitutes a word of length a + 1 (measured in q-blocks) that is generated by a path that starts at u. By definition of anticipation, the initial edge e of such a path is uniquely determined; the decoded p-block is the input tag of e, and the next decoder state is the terminal state of e.

This decoding method will invert the encoder when applied

to valid codeword sequences. The output of the decoder will

be identical to the input to the encoder, possibly with a shift of several input p-blocks.

The following theorem establishes that, with finite anticipation, invertible encoders can achieve all rational rates less than or equal to capacity, with any input and output blocklengths p and q satisfying p/q ≤ cap(S).

Finite-State Coding Theorem: Let S be a constrained system. If p/q ≤ cap(S), then there exists a rate p : q finite-state encoder for S with finite anticipation.

The theorem improves upon Shannon's result in three important ways. First, the proof is constructive, relying upon the state-splitting algorithm, which will be discussed in Section IV-F. Next, it proves the existence of finite-state encoders that achieve rate equal to the capacity cap(S), when cap(S) is rational. Finally, for any positive integers p and q satisfying the inequality p/q ≤ cap(S), there is a finite-state encoder that operates at rate p : q. In particular, choosing p and q relatively prime, one can design an invertible encoder using the smallest possible codeword length q compatible with the chosen rate p/q.

For completeness, we also state the more simply proved finite-state converse-to-coding theorem.

Finite-State Converse-to-Coding Theorem: Let S be a constrained system. Then, there exists a rate p : q finite-state encoder for S only if p/q ≤ cap(S).

4) Sliding-Block Codes and Block-Decodable Codes: As

mentioned earlier, it is often desirable for ﬁnite-state encoders

to have decoders that limit the extent of error propagation.

The results in this section address the design of encoders with

sliding-block decoders, which we now formally deﬁne.

Let m and a be integers such that m + a ≥ 0. A sliding-block decoder for a rate p : q finite-state encoder for S is a mapping

D : ({0,1}^q)^{m+a+1} → {0,1}^p

such that, if w = w₀w₁w₂⋯ is any sequence of q-blocks generated by the encoder from the input tag sequence of p-blocks s₀s₁s₂⋯, then, for i ≥ m,

D(w_{i−m}, …, w_i, …, w_{i+a}) = s_i.

We call a the look-ahead of D and m the look-behind of D. The sum m + a + 1 is called the decoding window length of D. See Fig. 10.
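In code, a sliding-block decoder is simply a windowed map over the stream of q-blocks; this generic sketch (the names and the toy encoder are our own, not from the text) makes the definition concrete:

```python
def sliding_block_decode(blocks, D, m, a):
    """Decode q-blocks with look-behind m and look-ahead a (window m+a+1)."""
    return [D(blocks[i - m : i + a + 1]) for i in range(m, len(blocks) - a)]

# Toy example with m = a = 0: a block decoder inverting a trivial
# encoder that maps data bit b to the 2-block (1-b, b).
blocks = [(1, 0), (0, 1), (1, 0), (0, 1)]
decoded = sliding_block_decode(blocks, lambda win: win[0][1], 0, 0)
print(decoded)  # [0, 1, 0, 1]
```

A corrupted block can affect only the windows that contain it, which is exactly the error-propagation bound discussed next.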


TABLE III
RATE 2/3 SLIDING-BLOCK-DECODABLE ENCODER

As mentioned earlier, a single error at the input to a sliding-block decoder can only affect the decoding of p-blocks that fall in a "window" of length at most m + a + 1, measured in q-blocks. Thus a sliding-block decoder controls the extent of error propagation.

The following result, due to Adler, Coppersmith, and Hass-

ner [3], improves upon the ﬁnite-state coding theorem for

ﬁnite-type constrained systems.

Sliding-Block Code Theorem for Finite-Type Systems: Let S be a finite-type constrained system. If p/q ≤ cap(S), then there exists a rate p : q finite-state encoder for S with a sliding-block decoder.

This result, sometimes called the ACH theorem, follows

readily from the proof of the ﬁnite-state coding theorem.

The constructive proof technique, based upon state-splitting,

is sometimes referred to as the ACH algorithm (see Section

IV-F).

Sliding-Block Code Example: The (1,7)-RLL constraint has capacity ≈ 0.6793. Adler, Hassner, and Moussouris [4] used the state-splitting algorithm to construct a rate 2/3, (1,7) encoder with five states, represented in tabular form in Table III. Entries in the "state" columns indicate the output word and next encoder state. With the input tagging shown, the encoder is sliding-block decodable; the decoder error propagation is limited to five input bits. The same underlying encoder graph was independently constructed by Jacoby [112] using "look-ahead" code design techniques. Weathers and Wolf [112] applied the state-splitting algorithm to design a sliding-block-decodable encoder with fewer states and error propagation at most 5 input bits. This encoder has the distinction of achieving the smallest possible number of states for this constraint and rate [143].

A block-decodable encoder is a special case of (m, a)-sliding-block decodable encoders where both m and a are zero. Because of the favorable implications for error propagation, a block-decodable encoder is often sought in practice. The following result characterizes these encoders completely.

Block-Decodable Encoder Characterization: Let S be a constrained system with a deterministic presentation G and let n be a positive integer. Then there exists a block-decodable (S, n)-encoder if and only if there exists such an encoder which is a subgraph of G.

It has been shown that the general problem of deciding whether a particular subgraph of G can be input-tagged in such a way as to produce a block-decodable encoder is NP-complete [8]. Nevertheless, for certain classes of constraints, and many

other speciﬁc examples, such an input-tag assignment can be

found.

Block-Decodable Code Examples: For certain irreducible constrained systems, including powers of (d, k)-RLL constrained systems, Franaszek [57], [58] showed that whenever there is a deterministic encoder which is a subgraph of the Shannon cover, there is also such an encoder that can be tagged so that it is block-decodable. In fact, the MFM encoder of Fig. 13 is block-decodable.

For (d, k)-RLL constrained systems, an explicit description of such a labeling was found by Gu and Fuja [79] and, independently, by Tjalkens [191]. They show that their labeling yields the largest rate attainable by any block-decodable encoder for any given (d, k)-RLL constrained system.

The Gu–Fuja construction is a generalization of a coding scheme introduced by Beenker and Immink [16]. The underlying idea, which is quite generally applicable, is to design block-decodable encoders by using merging bits between constrained words [16], [112], [104]. Each input p-block has a unique constrained q₁-block representation, where q₁ < q. The encoder uses a look-up table for translating source words into constrained words of length q₁, plus some logic circuitry for determining the q − q₁ merging bits. Decoding is extremely simple: discard the merging bits and translate the q₁-bit word into the p-bit source word.

For (d, k) sequences, the encoder makes use of the set W of all (d, k)-constrained words of the appropriate length with at least a prescribed number of leading zeroes and at most a prescribed number of trailing zeroes; these bounds, together with d and k, are assumed to satisfy conditions under which the construction goes through. Using a look-up table or enumeration techniques [102, p. 117], [188], [42], the encoder maps each of the 2^p p-bit input tags to a unique word in W, where |W| ≥ 2^p. The codewords in W are not necessarily freely concatenable, however. When the concatenation of the current codeword with the preceding one violates the (d, k) constraint, the encoder inverts one of the first zeroes in the current codeword. The condition on the parameters guarantees that such an inversion can always resolve the constraint violation. In this case, the leading bits of each codeword may be regarded as the merging bits.

Immink [106] gave a constructive proof that (d, k) codes with merging bits can be made whose rate approaches Shannon's capacity as the codeword length grows. As a result, codes with a rate only 0.1% less than the capacity can be constructed with sufficiently long codewords. Such long codewords could present an

additional practical problem—beyond that of mapping the

input words to the constrained words, which can be handled by

enumerative coding—because a single channel bit error could

corrupt the entire data in the decoded word. One proposal for

resolving this difﬁculty is to use a special conﬁguration of the

error-correcting code and the recording code [22], [49], [106].

Another well-known application of this method is that of the Eight-to-Fourteen Modulation (EFM) code, a rate 8/17 code which is implemented in the compact audio disc [96], [84], [109]. A collection of 256 codewords is drawn from the set of length-14 words that satisfy the (2,10) constraint. With this codebook, two merging bits would suffice to achieve a rate 8/16 block-decodable code. However, in order to induce more favorable low-frequency spectral


characteristics in the recorded code sequences, the encoding

algorithm introduces an additional merging bit, yielding the

rate 8/17 block-decodable EFM encoder.
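The size of the EFM codeword pool can be checked by enumeration. The boundary convention below (internal runs of zeros between ones of length 2 to 10, end runs of at most 10) is our assumption; under it, comfortably more than 256 words of length 14 survive:

```python
from itertools import product  # 2^14 words: small enough for brute force

def ok(w):
    # runs of 0s, delimited by the 1s and the word boundaries
    runs = [len(r) for r in w.split("1")]
    inner = runs[1:-1]  # runs strictly between two 1s
    if any(r < 2 or r > 10 for r in inner):
        return False
    return runs[0] <= 10 and runs[-1] <= 10  # boundary runs

words = []
for bits in product("01", repeat=14):
    w = "".join(bits)
    if ok(w):
        words.append(w)

print(len(words) >= 256)  # enough codewords for all 256 byte values
```

Note that requiring internal zero-runs of length at least 2 automatically forbids adjacent ones, i.e., enforces d = 2 within a word.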

5) Extensions: In this section, we present strengthened

versions of both the ﬁnite-state coding theorem and the ACH

theorem.

A noncatastrophic encoder is a tagged (S, n)-encoder with finite anticipation and the additional property that, whenever the sequences of output labels of two right-infinite paths differ in only finitely many places, the corresponding sequences of input tags also differ in only finitely many places. A rate p : q finite-state tagged encoder is noncatastrophic if the corresponding tagged (S^q, 2^p)-encoder is noncatastrophic.

Noncatastrophic encoders restrict error propagation in the

sense that they limit the number of decoded data errors

spawned by an isolated channel error. They do not necessarily

limit the time span in which these errors occur. The concept

of noncatastrophicity appears in the theory of convolutional

codes, as well, where it actually coincides with sliding-block

decodability [137, Ch. 10].

The following theorem is due to Karabed and Marcus [120].

Noncatastrophic Encoder Theorem: Let S be a constrained system. If p/q ≤ cap(S), then there exists a noncatastrophic rate p : q finite-state encoder for S.

For the noncatastrophic encoders constructed in the proof

of the theorem, the decoding errors generated by a single

channel error are, in fact, conﬁned to two bursts of ﬁnite

length, although these bursts may appear arbitrarily far apart.

Karabed and Marcus also extended the ACH theorem to

almost-ﬁnite-type systems.

Sliding-Block Code Theorem for Almost-Finite-Type Systems: Let S be an almost-finite-type constrained system. If p/q ≤ cap(S), then there exists a rate p : q finite-state encoder for S with a sliding-block decoder.

The proof of this result is quite complicated. Although it

does not translate as readily as the proof of the ACH theorem

into a practical encoder design algorithm, the proof does

introduce new and powerful techniques that, in combination

with the state-splitting approach, can be applied effectively in

certain cases.

For example, some of these techniques were used in the design of a 100% efficient, sliding-block-decodable encoder for a combined charge-constrained, runlength-limited system [8]. In fact, it was the quest for such an encoder

that provided the original motivation for the theorem. Several

of the ideas in the proof of this generalization of the ACH

theorem from ﬁnite-type to almost-ﬁnite-type systems have

also played a role in the design of coded-modulation schemes

based upon spectral-null constraints, discussed in Section V-C.

F. The State-Splitting Algorithm

There are many techniques available to construct efﬁ-

cient ﬁnite-state encoders. The majority of these construction

techniques employ approximate eigenvectors to guide the

construction process. Among these code design techniques

is the state-splitting algorithm (or ACH algorithm) intro-

duced by Adler, Coppersmith, and Hassner [3]. It implements

the proof of the ﬁnite-state coding theorem and provides a

recipe for constructing ﬁnite-state encoders that, for ﬁnite-type

constraints, are sliding-block-decodable. The state-splitting

approach combines ideas found in Patel’s construction of

the Zero-Modulation (ZM) code [160] and earlier work of

Franaszek [62]–[64] with concepts and results from the math-

ematical theory of symbolic dynamics [138].

The ACH algorithm proceeds roughly as follows. For a given deterministic presentation G of a constrained system S and an achievable rate p : q, with p/q ≤ cap(S), we iteratively apply a state-splitting transformation beginning with the qth-power graph G^q. The choice of transformation at each step is guided by an approximate eigenvector, which is updated at each iteration. The procedure culminates in a new presentation of S^q with at least 2^p outgoing edges at each state. After deleting excess edges, we are left with an (S^q, 2^p)-encoder, which, when tagged, gives our desired rate p : q finite-state encoder for S. (Note that, if S is finite-type, the encoder is sliding-block-decodable regardless of the assignment of input tags.)

In view of its importance in the theory and practice of

code design, we now present the state-splitting algorithm

in more detail. This discussion follows [144], to which we

refer the reader for further details. The basic step in the

procedure is an out-splitting of a graph, and, more speciﬁcally,

an approximate-eigenvector consistent out-splitting, both of

which we now describe.

An out-splitting of a labeled graph G begins with a partition of the set E_u of outgoing edges for each state u in G into N(u) disjoint subsets

E_u^(1), E_u^(2), …, E_u^(N(u)).

The partition is used to derive a new labeled graph H. The set of states of H consists of descendant states u^(1), …, u^(N(u)) for every state u of G. Outgoing edges from state u in G are partitioned among its descendant states and replicated in H to each of the descendant terminal states in the following manner. For each edge e from u to v in G, we determine the partition element E_u^(i) to which e belongs, and endow H with edges e^(j) from u^(i) to v^(j), for j = 1, …, N(v). The label on the edge e^(j) in H is the same as the label of the edge e in G. (Sometimes an out-splitting is called a round of out-splitting to indicate that several states may have been split simultaneously.) The resulting graph H generates the same system S, and has anticipation at most one greater than that of G. Figs. 14 and 15 illustrate an out-splitting operation on one state.

Given a labeled graph G, a positive integer n, and an (A_G, n)-approximate eigenvector x, an x-consistent partition of G is defined by partitioning the set E_u of outgoing edges for each state u in G into N(u) disjoint subsets

E_u^(1), E_u^(2), …, E_u^(N(u))


Fig. 14. Before out-splitting.

Fig. 15. After out-splitting.

Fig. 16. Before x-consistent out-splitting.

with the property that

Σ_{e ∈ E_u^(i)} x_{τ(e)} ≥ n x_u^(i), for i = 1, …, N(u) (18)

where τ(e) denotes the terminal state of the edge e, the x_u^(i) are nonnegative integers, and

Σ_{i=1}^{N(u)} x_u^(i) = x_u for every state u. (19)

The out-splitting based upon such a partition is called an x-consistent splitting. The vector x′, indexed by the states of the split graph and defined by x′_{u^(i)} = x_u^(i), is called the induced vector. An x-consistent partition or splitting is called nontrivial if, for at least one state u, N(u) ≥ 2 and at least two of the x_u^(i) are positive. Figs. 16 and 17 illustrate an x-consistent splitting.

Fig. 17. After x-consistent out-splitting.

We now summarize the steps in the state-splitting algorithm

for constructing a ﬁnite-state encoder with ﬁnite anticipation

[144].

The State-Splitting Algorithm:

1) Select a labeled graph G and integers p and q as follows:

a) Find a deterministic labeled graph G (or more generally a labeled graph with finite anticipation) which presents the given constrained system S (most constrained systems have a natural deterministic representation that is used to describe them in the first place).

b) Find the adjacency matrix A_G of G.

c) Compute the capacity cap(S) = log₂ λ(A_G).

d) Select a desired code rate p : q satisfying

p/q ≤ cap(S)

(one usually wants to keep p and q relatively small for complexity reasons).

2) Construct G^q.

3) Using the Franaszek algorithm of Section IV-E2, find an (A_{G^q}, 2^p)-approximate eigenvector x ≠ 0.

4) Eliminate all states u with x_u = 0 from G^q, and restrict to an irreducible sink H of the resulting graph, meaning a maximal irreducible subgraph with the property that all edges with initial states in H have their terminal states in H. Restrict x to be indexed by the states of H.

5) Iterate steps 5a)–5c) below until the labeled graph H has at least 2^p edges outgoing from each state:

a) Find a nontrivial x-consistent partition of the edges in H. (This can be shown to be possible with a state of maximum weight.)

b) Find the x-consistent splitting corresponding to this partition, creating a labeled graph H′ and an approximate eigenvector x′.

c) Replace H by H′ and x by x′.

6) At each state of H, delete all but 2^p outgoing edges and tag the remaining edges with binary p-blocks, one for


each outgoing edge. This gives a rate p : q finite-state encoder for S.

At every iteration, at least one state is split in a nontrivial way. Since a state u with weight x_u will be split into at most x_u descendant states throughout the whole iteration process, the number of iterations required to generate the encoder graph is no more than Σ_u (x_u − 1). Therefore, the anticipation of the encoder is at most that of G^q plus Σ_u (x_u − 1). For the same reason, the number of states in the encoder is at most Σ_u x_u.

The operations of taking higher powers and out-splitting preserve definiteness (although the anticipation may increase under out-splitting). Therefore, if S is finite-type and G is a finite-memory presentation of S, any encoder constructed by the state-splitting algorithm will be (m, a)-definite for some m and a and, therefore, sliding-block-decodable.

The execution of the sliding-block code algorithm can be

made completely systematic, in the sense that a computer

program can be devised to automatically generate an encoder

and decoder for any valid code rate. Nevertheless, the appli-

cation of the method to just about any nontrivial code design

problem will beneﬁt from the interactive involvement of the

code designers. There are some practical tools that can help

the designer make “good” choices during the construction

process, meaning choices that optimize certain measures of

performance and complexity. Among them is state merging, a

technique that can be used to simplify the encoder produced

by the ACH algorithm, as we now describe.

Let G be a labeled graph and let u and v be two states in G such that the follower set of u is contained in the follower set of v. Suppose that x is an (A_G, n)-approximate eigenvector, and that x_u = x_v. The (u, v)-merger of G is the labeled graph obtained from G by: 1) eliminating all outgoing edges of v; 2) redirecting into state u all remaining edges coming into state v; and 3) eliminating the state v. It is straightforward to show that the merged graph presents a subset of S, and the vector obtained by restricting x to the remaining states is an approximate eigenvector of the merged graph. This operation reduces the final number of encoder states. The general problem of determining when to apply state merging during the state-splitting procedure in order to achieve the minimum number of states in the final encoder remains open.

It is also desirable to minimize the sliding-block decoder

window size, in order to limit error propagation as well as

decoder complexity. There are several elements of the code

design that inﬂuence the window size, such as initial presen-

tation, choice of approximate eigenvector, selection of out-

splittings, excess edge elimination, and input tag assignment.

There are approaches that, in some cases, can be used during

the application of the state-splitting algorithm to help reduce

the size of the decoder window, but the problem of minimizing

the window size remains open. In this context, it should

be noted that there are alternative code-design procedures

that provide very useful heuristics for constructing sliding-

block-decodable encoders with small decoding window. They

also imply useful upper bounds on the minimum size of the

decoding window and on the smallest possible anticipation

(or decoding delay) [12]. In particular, Hollmann [92] has

recently developed an approach, inﬂuenced by earlier work of

Immink [103], which combines the state-splitting method with

a generalized look-ahead encoding technique called bounded-

delay encoding, originally introduced by Franaszek [61], [63].

In a number of cases, it was found that this hybrid code design

technique produced a sliding-block-decodable encoder with

smaller window length than was achieved using other methods.

Several examples of such codes for speciﬁc constraints of

practical importance were constructed in [92].

For more extensive discussion of complexity measures and

bounds, as well as brief descriptions of other general code

construction methods, the reader is referred to [144].

G. Universality of State Splitting

The guarantee of a sliding-block decoder when S is finite-type, along with the explicit bound on the decoder window length, represents a key strength of the state-splitting algorithm. Another important property is its universality. In this context, we think of the state-splitting algorithm as comprising a selection of a deterministic presentation G of a constrained system S, an (A_G, n)-approximate eigenvector x, a sequence of x-consistent out-splittings, followed by deletion of excess edges, and finally an input-tag assignment, resulting in a tagged (S, n)-encoder.

For integers m and a, and a function D from (m + a + 1)-blocks of S to the n-ary input-tag alphabet (such as a sliding-block decoder), we define D_∞ to be the induced mapping on bi-infinite sequences, given by y = D_∞(w), where

y_i = D(w_{i−m} ⋯ w_{i+a}).

For convenience, we use the same notation D to denote D_∞ as well. For a tagged (S, n)-encoder E with sliding-block decoder D, we take the domain of the induced mapping to be the set of all bi-infinite (output) symbol sequences obtained from E. We say that a mapping D is a sliding-block (S, n)-decoder if D is a sliding-block decoder for some tagged (S, n)-encoder.

The universality of the state-splitting algorithm is summa-

rized in the following theorem due to Ashley and Marcus [9],

which we quote from [144].

Universality Theorem: Let S be an irreducible constrained system and let n be a positive integer.

a) Every sliding-block (S, n)-decoder has a unique minimal tagged (S, n)-encoder, where minimality is in terms of number of encoder states.

b) If we allow an arbitrary choice of deterministic presentation G of S and (A_G, n)-approximate eigenvector x, then the state-splitting algorithm can find a tagged (S, n)-encoder for every sliding-block (S, n)-decoder. If we also allow merging of states (i.e., the state merging described above), then it can find the minimal tagged (S, n)-encoder for every sliding-block (S, n)-decoder.

c) If we fix G to be the Shannon cover of S, but allow an arbitrary choice of (A_G, n)-approximate eigenvector x


, then the state-splitting algorithm can find a tagged (S, n)-encoder for every sliding-block (S, n)-decoder D, modulo a change in the domain of D_∞, possibly with a constant shift of each bi-infinite sequence prior to applying D_∞ (but with no change in the decoding function D itself). If we also allow merging of states, then, modulo the same changes, it can find the minimal tagged (S, n)-encoder for every sliding-block (S, n)-decoder. In particular, it can find a sliding-block (S, n)-decoder with minimal decoding window length.

Certain limitations on the use of the algorithm should be

noted, however [9]. If we apply the state-splitting algorithm to the Shannon cover of an irreducible constrained system S, it need not be able to find a sliding-block (S, n)-decoder with smallest number of encoder states in its minimal tagged (S, n)-encoder.

Similarly, if we start with the Shannon cover of an irreducible constrained system and, in addition, we fix x to be a minimal (A_G, n)-approximate eigenvector (i.e., one with smallest eigenvector component sum), then the algorithm may fail to find a sliding-block (S, n)-decoder with minimum decoding window length [119], [103], [9].

The universality of the state-splitting algorithm is an at-

tractive property, in that it implies that the technique can

be used to produce the “best” codes. However, in order to

harness the power of this design tool, strategies for making

the right choices during the execution of the construction

procedure are required. There is considerable room for further

research in this direction, as well as in the development of

other code-construction methods.

H. Practical Aspects of High-Rate Code Design

The construction of very high rate (d, k)-constrained codes and DC-balanced codes is an important practical problem [71], [102], [208]. The construction of such high-rate codes is far from obvious, as table look-up for encoding and decoding is an engineering impracticality. The usual approach is to supplement the source bits with a small number of additional bits. Under certain,

usually simple, rules the source word is modiﬁed in such a

way that the modiﬁed word plus supplementary bits comply

with the constraints. The information that certain modiﬁcations

have been made is carried by the supplementary bits. The

receiver, on reception of the word, will undo the modiﬁcations.

In order to reduce complexity and error propagation, the

number of bits affected by a modiﬁcation should be as small as

possible. We now give some examples of such constructions.

A traditional example of a simple DC-free code is called the polarity bit code [26]. The source symbols are supplemented by one bit called the polarity bit. The encoder has the option to transmit the supplemented word without modification or to invert all of its symbols. The choice between the two translations is made in such a way that the running digital sum is kept as close to zero as possible. It can easily be shown that the running digital sum then takes a finite number of values, so that the sequence generated is DC-balanced.
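The polarity-bit rule just described can be sketched in a few lines of Python (an illustrative sketch, not a production encoder; the bipolar symbol convention and the tie-breaking rule are our own assumptions):

```python
def polarity_bit_encode(words):
    """Encode bipolar (+1/-1) source words with one polarity bit each.

    For every word the encoder transmits either (+1, word) or the inverted
    (-1, -word), choosing whichever keeps the running digital sum (RDS)
    closest to zero.
    """
    rds = 0
    out = []
    for w in words:
        plain = [+1] + list(w)               # polarity bit +1: word as-is
        inverted = [-1] + [-x for x in w]    # polarity bit -1: word inverted
        # pick the option whose resulting RDS is smaller in magnitude
        if abs(rds + sum(plain)) <= abs(rds + sum(inverted)):
            chosen = plain
        else:
            chosen = inverted
        rds += sum(chosen)
        out.extend(chosen)
    return out, rds

def polarity_bit_decode(stream, wordlen):
    """Undo the encoding: the leading polarity bit says whether to re-invert."""
    words = []
    for i in range(0, len(stream), wordlen + 1):
        block = stream[i:i + wordlen + 1]
        sign = block[0]
        words.append([sign * x for x in block[1:]])
    return words
```

Because the RDS after each word is steered back toward zero, it remains confined to a finite interval, which is exactly the DC-balance property claimed above.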

A surprisingly simple method for transforming an arbitrary word into a codeword having equal numbers of 1's and 0's—that is, a balanced or zero-disparity word—was published by Knuth [129] and Henry [85]. Let d(x) denote the disparity (the number of 1's minus the number of 0's) of the binary source word x = (x_1, ..., x_m), let σ_k(x) be the running digital sum of the first k bits of x (with bits valued ±1), and let x^(k) be the word x with its first k bits inverted, so that d(x^(k)) = d(x) − 2σ_k(x). If x is of even length m, then inverting one additional prefix bit changes the disparity by ±2, so d(x^(k)) moves in steps of 2 as k increases. It is immediate that d(x^(0)) = d(x) (no symbols inverted) and d(x^(m)) = −d(x) (all symbols inverted). We may, therefore, conclude that every word x can be associated with at least one k such that d(x^(k)) = 0, i.e., x^(k) is balanced. The value of k is encoded in a (preferably) zero-disparity word of length p, p even. If m and p are both odd, we can use a similar construction. The maximum source word length is governed by the requirement that there be enough balanced words of length p to represent all needed values of k. Some other modifications of the basic scheme are discussed in Knuth [129] and Alon [5].

The sequence replacement technique [202] converts source words of length m into (0, k)-constrained words of length m + 1. A control bit is set to 1 and appended at the beginning of the m-bit source word. If this (m + 1)-bit sequence satisfies the prescribed constraint it is transmitted. If the constraint is violated, i.e., a runlength of at least k + 1 0's occurs, we remove the trespassing 0's. The position where the start of the violation was found is encoded in a fixed number of bits, which are appended at the beginning of the word. Such a modification is signaled to the receiver by setting the control bit to 0. The codeword remains of length m + 1. The procedure above is repeated until all forbidden subsequences have been removed. The receiver can reconstruct the source word as the position information is stored at a predefined position in the codeword. In certain situations, the entire source word has to be modified, which makes the procedure prone to error propagation. A class of high-rate (0, k)-constrained codes of this type was constructed to minimize


Fig. 18. Probability that no sequence of drawings from a selection set of random sequences satisfies the constraint, for two combinations of codeword length and selection set size (upper and lower curves).

error propagation [111]. Error propagation is confined to one decoded 8-bit symbol, irrespective of the codeword length.
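A single replacement step of this idea can be sketched as follows (a toy illustration, not the exact construction of [202]; in particular, it performs only one replacement and does not guarantee that the inserted prefix itself satisfies the constraint):

```python
def find_violation(bits, k):
    """Index of the start of the first run of k+1 zeros, or None."""
    run = 0
    for i, b in enumerate(bits):
        run = run + 1 if b == 0 else 0
        if run == k + 1:
            return i - k
    return None

def srt_encode(src, k):
    """One sequence-replacement step: m-bit source -> (m+1)-bit codeword."""
    m = len(src)
    assert m <= 2 ** (k + 1)      # position must fit in a (k+1)-bit field
    p = find_violation(src, k)
    if p is None:
        return [1] + src          # control bit 1: no modification
    field = [int(c) for c in format(p, '0%db' % (k + 1))]
    rest = src[:p] + src[p + k + 1:]   # remove the trespassing zeros
    return [0] + field + rest     # control bit 0: modified

def srt_decode(code, k):
    if code[0] == 1:
        return code[1:]
    p = int(''.join(map(str, code[1:k + 2])), 2)
    tail = code[k + 2:]
    return tail[:p] + [0] * (k + 1) + tail[p:]
```

The full technique iterates this step until no forbidden run remains, keeping the codeword length constant throughout.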

Recently, the publications by Fair et al. [48] and Immink and Patrovics [110] on guided scrambling brought new insights into high-rate code design. Guided scrambling is a member of a larger class of related coding schemes called multimode codes. In multimode codes, each source word is mapped, by the addition of a small number, say r, of redundant bits, into a selection set of candidate codewords; each source word can thus be represented by any member of a selection set consisting of 2^r codewords. Examples of such mappings are the guided scrambling algorithm presented by Fair et al. [48], the DC-free coset codes of Deng and Herro [43], and the scrambling using a Reed–Solomon code by Kunisa et al. [135]. A mapping is considered to be "good" if the selection set contains sufficiently distinct and random codewords.

The encoder transmits the codeword that minimizes, accord-

ing to a prescribed criterion, some property of the encoded

sequence, such as its low-frequency spectral content. In gen-

eral, there are two key elements which need to be chosen

judiciously: a) the mapping between the source words and

their corresponding selection sets, and b) the criterion used to

select the “best” word.

The use of multimode codes is not confined to the generation of DC-free sequences. Provided that the selection set is large enough and contains sufficiently different codewords, multimode codes can also be used to satisfy almost any channel constraint with a suitably chosen selection method.

For given rate and proper selection criteria, the spectral content

of multimode codes is very close to that of maxentropic RDS-

constrained sequences. A clear disadvantage is that the encoder needs to generate all candidate codewords in the selection set, compute the criterion for each, and make the decision.
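A minimal multimode encoder can be sketched as follows (an illustrative stand-in for guided scrambling: candidates are formed by a simple index-dependent scrambling pattern of our own choosing, and the selection criterion is the magnitude of the running digital sum):

```python
import itertools

def multimode_encode(word, r):
    """Represent an m-bit word by one of 2**r candidate codewords.

    Each candidate prepends an r-bit index and XORs the word with a pattern
    derived from that index; the transmitted candidate minimizes |RDS| of
    the bipolar sequence, a proxy for low-frequency spectral content.
    """
    best = None
    for idx in itertools.product([0, 1], repeat=r):
        candidate = list(idx) + [b ^ idx[i % r] for i, b in enumerate(word)]
        rds = sum(1 if b else -1 for b in candidate)
        if best is None or abs(rds) < abs(best[1]):
            best = (candidate, rds)
    return best[0]

def multimode_decode(code, r):
    """Decoding needs no search: read the index, undo the scrambling."""
    idx = code[:r]
    return [b ^ idx[i % r] for i, b in enumerate(code[r:])]
```

Note that the exponential search over 2^r candidates sits entirely in the encoder; the decoder is trivial, which is what makes the approach attractive at high rates.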

In the context of high-rate multimode codes, there is interest in weakly constrained codes [107]. Weakly constrained codes may produce sequences that violate the constraints with some (small) probability. It is argued that if the channel is not free of errors, it is pointless to feed the channel with perfectly constrained sequences. We illustrate the effectiveness of this idea by considering the properties of two examples of weak codes. Fig. 18 shows the probability that no sequence taken from a selection set of random sequences obeys the constraint, for two combinations of codeword length and selection set size. For the larger selection set, the probability that a codeword violates the constraint is extremely small, while the alternative implementation [111] requires four times the redundancy of the weakly constrained code to strictly guarantee the same constraint.
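The selection-set argument can be checked numerically (a sketch under the idealized assumption that candidate codewords behave like independent uniformly random words):

```python
def count_0k(n, k):
    """Number of n-bit words with no run of more than k zeros ((0,k) constraint)."""
    # state = current run of trailing zeros, 0..k
    state = [0] * (k + 1)
    state[0] = 1
    for _ in range(n):
        nxt = [0] * (k + 1)
        nxt[0] = sum(state)            # append a 1: zero-run resets
        for run in range(k):
            nxt[run + 1] = state[run]  # append a 0: run grows, must stay <= k
        state = nxt
    return sum(state)

def miss_probability(n, k, set_size):
    """Probability that none of `set_size` independent random n-bit words
    satisfies the (0,k) constraint (idealized multimode selection)."""
    q = count_0k(n, k) / 2 ** n
    return (1 - q) ** set_size
```

The miss probability decays exponentially in the selection set size, which is why even modest redundancy makes constraint violations rare.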

V. CONSTRAINED CODES FOR NOISY RECORDING CHANNELS

In Section III-A, we indicated how the implementation

of timing recovery, gain control, and detection algorithms

in recording systems created a need for suitably constrained

recording codes. These codes are typically used as an inner

code, in concatenation with an outer error-correcting code.

The error-correcting codes improve system performance by

introducing structure, usually of an algebraic nature, that

increases the separation of code sequences as measured by

some distance metric, such as Hamming distance.

A number of authors have addressed the problem of endow-

ing constrained codes with advantageous distance properties.

Metrics that have been considered include Hamming distance,

edit (or Levenshtein) distance, and Lee distance. These metrics

arise in the context of a variety of error types, including

random-bit errors, insertion and deletion errors, bitshift errors,

and more generally, burst errors. Code constructions, perfor-

mance analyses, as well as lower and upper bounds on the


achievable size of constrained codes with speciﬁed distance

properties are surveyed in [144].

It is fair to say that the application of constrained codes

with random or burst-error correction capabilities, proposed

largely in the context of storage systems using symbol-by-

symbol detection such as peak detection, has been extremely

limited. However, the advent of digital signal processing

techniques such as PRML has created a new role for recording

codes, analogous to the role of trellis-coded modulation in

digital communications. In this section, we describe how

appropriately constrained code sequences can improve PRML

system performance by increasing the separation between the

channel output sequences with respect to Euclidean distance.

A. PRML Performance Bounds and Error Event Analysis

The design of distance-enhancing constrained codes for recording channels requires an understanding of the performance of the PRML Viterbi detector, which we now briefly review. The detector performance is best understood in terms of error events. For a pair of input sequences x and x', define the input error sequence ε = x − x' and the output error sequence as ε passed through the channel. A closed error event corresponds to a polynomial input error sequence

ε(D) = ε_j D^j + ε_{j+1} D^{j+1} + · · · + ε_l D^l

where j and l are finite integers, ε_j ≠ 0, and ε_l ≠ 0. A closed error event is said to be simple if it contains no run of M consecutive zero error symbols, where M is the memory of the channel (such a run would split the event into two separate events). An open error event corresponds to a right-infinite input error sequence of the form

ε(D) = ε_j D^j + ε_{j+1} D^{j+1} + · · ·

where infinitely many ε_i are nonzero, but the Euclidean norm of the corresponding output error sequence is finite.

In general, for an error event with input error sequence ε(D) and output error sequence η(D) = h(D)ε(D), the squared-Euclidean distance is defined as

d²(ε) = Σ_i η_i².

The number of channel input-bit errors corresponding to an error event, denoted w(ε), is the number of nonzero symbols in ε.
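These quantities are easy to compute (an illustrative sketch; error symbols are normalized to the ternary alphabet {+1, 0, −1}):

```python
def convolve(e, h):
    """Output error sequence: input error e passed through channel h(D)."""
    y = [0] * (len(e) + len(h) - 1)
    for i, ei in enumerate(e):
        for j, hj in enumerate(h):
            y[i + j] += ei * hj
    return y

def d2(e, h):
    """Squared Euclidean distance of the error event e on channel h."""
    return sum(v * v for v in convolve(e, h))

def weight(e):
    """Number of channel input-bit errors (nonzero input error symbols)."""
    return sum(1 for v in e if v != 0)

PR4 = [1, 0, -1]          # h(D) = 1 - D^2
EPR4 = [1, 1, -1, -1]     # h(D) = (1 - D)(1 + D)^2
```

For example, the single-error impulse has d² = 2 on PR4 and d² = 4 on EPR4 under this normalization.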

The ML detector produces an error when the selected trellis path differs from the correct path by a sequence of error events. The union bound provides an upper bound to the probability of an error event beginning at some time t by considering the set E of all possible simple error events:

Pr(error event at time t) ≤ Σ_{ε∈E} Pr(first event at time t is ε)

TABLE IV

ERROR EVENT MULTIPLICITY GENERATING FUNCTIONS

which in the assumed case of AWGN with variance σ² yields

Pr(error event at time t) ≤ Σ_{ε∈E} 2^{−w(ε)} Q(d(ε)/2σ)

where w(ε) is the number of input-bit errors associated with ε. Reorganizing the summation according to the error event distance d, the bound is expressed as

Pr(error event at time t) ≤ Σ_d N(d) Q(d/2σ)

where the values N(d), known as the error event distance spectrum, are defined by

N(d) = Σ_{ε∈E : d(ε)=d} 2^{−w(ε)}.

At moderate-to-high SNR, the performance of the system is largely dictated by error events with small distance d. In particular, the events with the minimum distance d_min will be the dominant contributors to the union bound, leading to the frequently used approximation

Pr(error event at time t) ≈ N(d_min) Q(d_min/2σ).

For a number of the PR channel models applicable to

recording, the error event distance spectrum values, as well as

the corresponding input error sequences, have been determined

for a range of values of the distance [7], [198], [6].

The calculation is made somewhat interesting by the fact,

mentioned in Section II-C2, that the PR trellises support closed

error events of unbounded length having certain speciﬁed,

ﬁnite distances. For channels with limited ISI, analytical

methods may be applied in the characterization of low distance

events. However, for larger distances, and for PR channel

polynomials of higher degree, computer search methods have

been more effective.

Table IV gives several terms of the error event multiplicity generating functions for several PR channels. Tables V and VI, respectively, list the input error sequences for simple closed events on the PR4 and EPR4 channels having small squared-distance. Table VII describes the input error sequences for simple closed events on the E²PR4 channel having small squared-distance. In the error sequence tables, the symbol "+" is used to designate "+1," "−" is used to designate "−1," and a parenthesized string denotes any positive number of repetitions of the string.
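Small entries of such tables can be reproduced by brute force (a finite search over short closed events only; open events and longer closed events are outside its reach):

```python
import itertools

def min_closed_distance(h, max_len=6):
    """Smallest squared distance over short ternary input error sequences
    with nonzero endpoints, on the channel with impulse response h."""
    def dist2(e):
        y = [0] * (len(e) + len(h) - 1)
        for i, ei in enumerate(e):
            for j, hj in enumerate(h):
                y[i + j] += ei * hj
        return sum(v * v for v in y)

    best = None
    for n in range(1, max_len + 1):
        for e in itertools.product([-1, 0, 1], repeat=n):
            if e[0] == 0 or e[-1] == 0:
                continue          # closed events start and end nonzero
            d = dist2(e)
            if best is None or d < best:
                best = d
    return best
```

Under the ±1 normalization this reproduces the uncoded minimum squared distances of 2 for PR4 and 4 for EPR4.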


TABLE V
CLOSED ERROR EVENTS (PER INTERLEAVE) FOR PR4 CHANNEL

TABLE VI
CLOSED ERROR EVENTS FOR EPR4 CHANNEL

B. Code Design Strategy

The characterization of error sequences provides a basis

for the design of constrained codes that eliminate events with

a small Euclidean distance, thereby increasing the minimum

distance and giving a performance improvement [123], [182],

[122], [154]. This operation is similar in nature to expurgation

in the context of algebraic codes.

More speciﬁcally, the design of distance-enhancing con-

strained codes for PRML systems is based upon the following

strategy. First, we identify the input error sequences ε with d²(ε) below the target distance of the coded channel. Then, we determine a list L of input error strings that, if eliminated by means of a code constraint, will prevent the occurrence of error events below the target distance. We denote the set of ternary error sequences satisfying this constraint by E(L).

In order to prevent these error strings, we must next determine a code constraint S with the property that the corresponding set of input error sequences E(S) satisfies

E(S) ⊆ E(L). (20)

There are many choices for the error strings L, as well as for constraints S satisfying (20). The problem of identifying

TABLE VII

CLOSED ERROR EVENTS FOR E²PR4 CHANNEL

those that can generate the best practical distance-enhancing

codes—with a speciﬁed coding gain, high-rate, simple encoder

and decoder, and low-complexity sequence detector—remains

open.

The code constraints and the PR channel memory are then incorporated into a single detector trellis that can serve as the basis for the Viterbi detector. The final step in the design procedure is to construct an efficient code for the chosen constraint. This can be accomplished using code design techniques such as those discussed in Section IV.

It is useful to distinguish between two cases in implementing this strategy for the PR channels we have discussed. The cases are determined by the relationship of the minimum distance d²_min to the matched-filter-bound (MFB) distance d²_MFB, the squared distance produced by a single input-bit error, which is proportional to the energy in the channel impulse response.

The first case pertains to those channels which are said to achieve the MFB, d²_min = d²_MFB, including PR4, EPR4, and PR1. For these channels, the set of minimum-distance input error sequences includes the single-error impulse, and so any distance-enhancing code constraint must prevent this input error impulse from occurring.

The second case involves channels which do not achieve the MFB, d²_min < d²_MFB. This case applies to E²PR4 and its higher order extensions, as well as to PR2, EPR2, and their extensions. Note that, in this situation, a minimum-distance input error sequence—in fact, every error sequence satisfying d²(ε) = d²_min—has length strictly greater than one, where event length refers to the span between the first and last nonzero symbols. These events can often be

eliminated with constraints that are quite simply speciﬁed

and for which practical, efﬁcient codes are readily constructed.

For the latter class of channels, we can determine distance-enhancing constraints that increase the minimum distance to d²_MFB, yet are characterizable in terms of a small list of


relatively short forbidden code strings. (We will sometimes denote such constraints simply by their list of forbidden strings.) This permits the design

of high-rate codes, and also makes it possible to limit the

complexity of the Viterbi detector, since the maximum length of a forbidden string need exceed the memory of the uncoded channel only slightly, if at all. Consequently,

and perhaps surprisingly, the design of high-rate, distance-

enhancing codes with acceptable encoder/decoder and Viterbi

detector complexity proves to be considerably simpler for

the channels in the second group, namely, the channels with

relatively larger intersymbol interference.

We now turn to a discussion of some speciﬁc distance-

enhancing constraints and codes for partial-response channels.

C. Matched-Spectral-Null Constraints

As mentioned above, spectral-null constraints, particularly

those with DC-nulls and/or Nyquist-nulls, are well-matched

to the frequency characteristics of digital recording channels,

and have found application in many recording systems prior

to the introduction of PRML techniques. In [121] and [46], it

was shown that, in addition, constraints with spectral nulls at

the frequencies where the channel frequency response has the

value zero—matched-spectral-null (MSN) constraints—can in-

crease the minimum distance relative to the uncoded channel.

An example of this phenomenon, and one which served historically to motivate the use of matched-spectral-null codes, is the rate 1/2 biphase code, with binary codewords 01 and 10, which, one can easily show, increases the minimum squared-Euclidean distance of the binary-input dicode channel, h(D) = 1 − D, from 2 to 6.

To state a more general bound on the distance-enhancing properties of MSN codes, we generalize the notion of a spectral null constraint to include sequences for which higher order derivatives of the power spectrum vanish at specified frequencies as well. More precisely, we say that an ensemble of sequences has an order-K spectral density null at a frequency f_0 if the power spectral density, together with its derivatives up to order 2K − 1, vanishes at f_0. We will concentrate here upon those with high-order spectral null at DC. Sequences with high-order spectral nulls can be characterized in a number of equivalent ways. The high-order running digital sums RDS^(j) of a sequence x_1, x_2, ... at DC can be defined recursively as

RDS^(1)_n = Σ_{i≤n} x_i,   RDS^(j)_n = Σ_{i≤n} RDS^(j−1)_i.

Sequences with order-K spectral null at DC may be characterized in terms of properties (e.g., boundedness) of RDS^(K).

Another characterization involves the related notion of high-order moments (power sums), where the order-j moment at DC of the sequence w_1, ..., w_N is defined as

M_j(w) = Σ_{i=1}^{N} i^j w_i.

In analogy to the characterization of (first-order) spectral null sequences, one can show that an ensemble of sequences generated by freely concatenating a set of codewords of finite length will have an order-K spectral null at DC if and only if

M_j(w) = 0,  j = 0, 1, ..., K − 1 (21)

for all codewords w. In other words, for each codeword, the order-j moments at DC must vanish for j = 0, 1, ..., K − 1. A sequence satisfying this condition is also said to have zero disparity of order K.

Finally, we remark that a length-N sequence w_1 · · · w_N with D-transform w(D) = Σ_{i=1}^{N} w_i D^{i−1} has an order-K spectral null at DC if and only if w(D) is divisible by (D − 1)^K. This fact plays a role in bounding the distance-enhancing properties of spectral-null sequences.
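The equivalence between the moment conditions and divisibility by (D − 1)^K can be checked computationally (an illustrative sketch using repeated synthetic division at D = 1; function names are ours):

```python
def moments_vanish(w, K):
    """Order-K zero-disparity condition (21): the j-th moments
    sum_i i**j * w_i vanish for j = 0, ..., K-1."""
    return all(sum((i ** j) * wi for i, wi in enumerate(w)) == 0
               for j in range(K))

def divisible_by_D_minus_1(w, K):
    """Equivalent test: w(D) = sum w_i D^i is divisible by (D - 1)**K,
    checked by K rounds of synthetic division at the root D = 1."""
    coeffs = list(w)                      # ascending powers of D
    for _ in range(K):
        vals = []
        b = 0
        for c in reversed(coeffs):        # Horner from the top coefficient
            b = c + b
            vals.append(b)
        if vals[-1] != 0:                 # remainder is w(1)
            return False
        coeffs = list(reversed(vals[:-1]))  # quotient, ascending again
    return True
```

For instance, the bipolar word (+1, −1, −1, +1), whose D-transform is (1 − D)²(1 + D), passes both tests for K = 2 but fails them for K = 3.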

For more details about high-order spectral null constraints,

particularly constraints with high-order null at DC, we refer the

interested reader to Immink [99], Monti and Pierobon [153],

Karabed and Siegel [121], Eleftheriou and Cideciyan [46], and

Roth, Siegel, and Vardy [165], as well as other references cited

therein.

The original proof of the distance-enhancing properties of MSN codes was based upon a number-theoretic lower bound on the minimum Hamming distance of zero-disparity codes, due to Immink and Beenker [108]. They proved that the minimum Hamming distance (and, therefore, the minimum Euclidean distance) of a block code over the bipolar alphabet with order-K spectral null at DC grows at least linearly in K. Specifically, they showed that, for any pair of distinct length-N sequences x and y in the code,

d_H(x, y) ≥ 2K  and  d_E²(x, y) = 4 d_H(x, y) ≥ 8K.

This result for block codes can be suitably generalized to any constrained system with order-K spectral null at DC. The result also extends to systems with an order-K spectral null at any rational submultiple of the symbol frequency, in particular, at the Nyquist frequency.

In [121], this result was extended to show that the Lee distance, and a fortiori the squared-Euclidean distance, between output sequences of a bipolar, input-constrained channel is lower-bounded by 2K if the input constraint and the channel, with spectral nulls at DC (or the Nyquist frequency) of orders K_1 and K_2, respectively, combine to produce a spectral null at DC (or Nyquist) of order K = K_1 + K_2. This result can be proved by applying Descartes' rule of signs to the D-transform representation of these sequences, using the divisibility conditions mentioned above [121].


This result can be applied to the PR4, EPR4, and E²PR4 channels, which have a first-order null at DC and a Nyquist null of order 1, 2, and 3, respectively. If the channel inputs are constrained to be bipolar sequences with an order-M Nyquist null, the channel outputs will satisfy the following lower bound on minimum squared-Euclidean distance:

d²_min ≥ 2(M + 1) for PR4
d²_min ≥ 2(M + 2) for EPR4
d²_min ≥ 2(M + 3) for E²PR4.

Comparing to the minimum distance of the uncoded bipolar channels, we see that the MSN constraint with M = 1, corresponding to a first-order Nyquist null, provides a coding gain (unnormalized for rate loss) of at least 3, 1.8, and 1.2 dB, respectively. Using the observation made in Section III-

A3, one can design codes with ﬁrst-order null at DC and

Nyquist by twice-interleaving a DC-free code. When such a

code is applied to the PR4 channel, which has an interleaved

dicode decomposition, the implementation of the MSN-coded

system becomes feasible. Code-design techniques such as

those described in Section IV have been used to design

efficient MSN codes. For analytical and experimental results pertaining to a rate 8/10, MSN-coded PR4 system, the reader is referred to [164] and [169]. Experimental evaluation of a

spectral-null coded-tape system is described in [27].

For these examples of MSN-constrained PR channels, the error event characterization discussed above provides another confirmation, and a refinement, of the coding gain bounds. The verification makes use of the moment conditions satisfied by closed input error sequences inheriting the spectral null properties, a generalization of the moment conditions in (21) above. Specifically, a first-order DC null requires that

Σ_i ε_i = 0 (22)

and a first-order Nyquist null requires that

Σ_i (−1)^i ε_i = 0. (23)

Examination of the error events for PR4 in Table V shows that each error event with d² < 4 fails to satisfy at least one of these conditions. Similarly, for EPR4, the error events in Table VI with d² < 6 are forbidden by the moment conditions.

In the case of E²PR4, the error event characterization not only confirms, but also improves, the lower bound. Table VII shows that the moment conditions cannot be satisfied by any error sequence with d² < 10, implying a nominal coding gain of 2.2 dB. MSN coding based upon Nyquist-free constraints is applicable to optical PR channels, and error-event analysis can be used to confirm the coding gain bounds in a similar manner [152].
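The moment-condition screening used in this verification can be sketched as follows (illustrative; PR4 with error symbols normalized to {+1, 0, −1}, and the time-index alignment of the Nyquist condition is an assumed convention):

```python
def dc_null_ok(e):
    """First-order DC null condition (22): error symbols sum to zero."""
    return sum(e) == 0

def nyquist_null_ok(e):
    """First-order Nyquist null condition (23): alternating sum is zero."""
    return sum(((-1) ** i) * v for i, v in enumerate(e)) == 0

def survives_msn(e):
    """An error event can occur between MSN-coded sequences only if it
    satisfies both moment conditions (first-order DC and Nyquist nulls)."""
    return dc_null_ok(e) and nyquist_null_ok(e)

def d2_pr4(e):
    """Squared distance of the error event e on PR4, h(D) = 1 - D^2."""
    h = [1, 0, -1]
    y = [0] * (len(e) + len(h) - 1)
    for i, ei in enumerate(e):
        for j, hj in enumerate(h):
            y[i + j] += ei * hj
    return sum(v * v for v in y)
```

For example, the single-error impulse (d² = 2 on PR4) violates the DC condition and is therefore eliminated, while the event (+1, 0, −1) satisfies both conditions and has d² = 6, consistent with the bound.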

There have been a number of extensions and variations

on MSN coding techniques, most aimed at increasing code

rate, improving intrinsic runlength constraints, or reducing

the implementation complexity of the encoder, decoder, and

detector. For further details, the reader should consult [68] and

the references therein, as well as more recent results in, for

example, [151] and [147].

When implementing MSN-coded PR systems, the complex-

ity of the trellis structure that incorporates both the PR channel

memory and the MSN constraints can be an issue, particularly

for high-rate codes requiring larger digital sum variation.

Reduced-complexity, suboptimal detection algorithms based

upon a concatenation of a Viterbi detector for the PR channel

and an error-event post-processor have been proposed for a

DC/Nyquist-free block-coded PR4 channel [128] and EPR4

channel [184]. In both schemes, DC/Nyquist-free codewords

are obtained by interleaving pairs of DC-free codewords, and

discrepancies in the ﬁrst-order moments of the interleaved

codeword estimates produced by the PR channel detector

are utilized by the post-processors to determine and correct

most-probable minimum-distance error events.

It should be pointed out that aspects of the code design strategy described above were foreshadowed in an unpublished paper of Fredrickson [66] dealing with the biphase-coded dicode channel. In that paper, the observation was made that the input error sequences corresponding to the minimum squared-distance d² = 6 were of the form ±(1, −1), and those corresponding to the next-minimum distance d² = 10 were of the form ±(1, −1, −1, 1). Fredrickson modified the encoding process to eliminate minimum-distance events by appending an overall "parity-check" bit to each block of k input bits, for a specified value of k. The resulting code provided a minimum squared-Euclidean distance of 10 at the output of the dicode channel, with only a modest penalty in rate for large k. The Viterbi detector for the coded channel was modified to incorporate the parity information and to reflect the even-parity condition at the codeword boundaries. It was also shown that both the d² = 6 and the d² = 10 events can be eliminated by appending a pair of bits to each block of k input bits in order to enforce a specified parity condition. The resulting code yielded d² = 14 at the dicode channel output, and the coding gain was realized with a suitably enhanced detector structure.

D. Runlength Constraints

Certain classes of runlength constraints have distance-enhancing properties when applied to the magnetic and optical PR channels. For example, the d = 1 NRZI constraint has been applied to the EPR4 and the E²PR4 magnetic channels, as well as the PR1 and PR2 optical channels; see, for example, [152] and the references therein. On the EPR4 and PR1 channels, the constraint does not increase minimum distance. However, it does eliminate some of the minimum-distance error events, thereby providing some performance improvement. Moreover, the incorporation of the constraint into the detector trellis for EPR4 leads to a reduction of complexity from eight states


TABLE VIII
INPUT PAIRS FOR FORBIDDEN ERROR STRINGS

to six states, eliminating those corresponding to the NRZ channel inputs 010 and 101.

In the case of E²PR4, Behrens and Armstrong [17] showed that the d = 1 constraint provides a 2.2-dB increase in minimum squared-Euclidean distance. To see why this is the case, observe that forbidding the input error strings ±(+, −, +) will prevent all closed error events with d² = 6. Forbidding, in addition, the right-infinite strings ±(+, −, +, −, ...) prevents all open events with d² = 6 as well. Table VIII depicts pairs of binary input strings whose corresponding error strings belong to the forbidden list; a common symbol in the table represents an arbitrary binary value shared by both strings in a pair. Clearly, the elimination of the NRZ strings 010 and 101 precludes all of the input error strings. The precoded d = 1 constraint precludes precisely these NRZ strings, confirming that the constraint prevents all events with d² = 6. When the constraint is incorporated into the detector trellis, the resulting structure has only 10 states, substantially less than the 16 states required by the uncoded channel.

The input error sequence analysis used above to confirm the distance-enhancing properties of the d = 1 constraint on the E²PR4 channel suggests a relaxation of the constraint that nevertheless still achieves the same distance gain. Specifically, a constraint forbidding a short NRZ string, together with the complementary constraint forbidding its symbol-wise complement, is sufficient to ensure the elimination of closed and open events with d² = 6. The capacity of this constraint exceeds 4/5, and a rate 4/5, finite-state encoder with state-independent decoder is described in [122]. The corresponding detector trellis requires 12 states. Thus with a modest increase in complexity, this code achieves essentially the same performance as the rate 2/3, d = 1 code, while increasing the rate by 20%.

This line of reasoning may be used to demonstrate the distance-enhancing properties of another class of NRZI runlength constraints, referred to as maximum-transition-run (MTR) constraints [154]. These constraints limit, sometimes in a periodically time-varying manner, the maximum number of consecutive 1's that can occur. The MTR constraints are characterized by a parameter j, which determines the maximum allowable runlength of 1's. These constraints can be interpreted as a generalization of the d = 1 constraint, which is the same as the MTR constraint with j = 1.

The MTR constraint with j = 2 was introduced by Moon and Brickner [154] (see also Soljanin [181]). A labeled graph representation is shown in Fig. 19. The capacity of this constraint is approximately 0.8791. Imposing an additional constraint, which we now denote k, on the maximum runlength of 0's reduces the capacity, as shown in Table IX.
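Capacities such as these follow from the growth rate of the number of admissible words (a power-method sketch over the run-length state graph; the iteration count is an arbitrary choice):

```python
import math

def mtr_capacity(j, n=60):
    """Estimate the capacity of the MTR(j) constraint as log2 of the growth
    rate of the number of admissible binary words."""
    # state = current run of trailing 1s, 0..j
    state = [0] * (j + 1)
    state[0] = 1
    prev_total = 1
    ratio = 1.0
    for _ in range(n):
        nxt = [0] * (j + 1)
        nxt[0] = sum(state)            # append a 0: run of 1s resets
        for run in range(j):
            nxt[run + 1] = state[run]  # append a 1: run grows, must stay <= j
        state = nxt
        total = sum(state)
        ratio = total / prev_total
        prev_total = total
    return math.log2(ratio)
```

The ratio of successive word counts converges to the largest eigenvalue of the constraint graph's adjacency matrix; for j = 2 this gives the value 0.8791 quoted above, and for j = 1 (the d = 1 constraint) the familiar 0.6942.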

Fig. 19. Labeled graph for MTR constraint.

TABLE IX
CAPACITY OF MTR CONSTRAINTS FOR SELECTED VALUES OF j AND k

TABLE X
INPUT PAIRS FOR FORBIDDEN ERROR STRINGS

The NRZI MTR constraint with j = 2 corresponds to an NRZ constraint forbidding the strings 0101 and 1010. The error-event characterization in Table VII shows that the corresponding forbidden input error list suffices to eliminate the closed error events on E²PR4 with d² = 6, though not all the open events. Analysis of input pairs, shown in Table X, reveals that the MTR constraint indeed eliminates the closed error events with d² = 6. The detector trellis that incorporates the E²PR4 memory with this MTR constraint requires 14 states.

A rate 4/5 block code is shown in Table XI [154]. It is interesting to observe that the MTR constraint is the symbol-wise complement of the (0, 2) constraint, and the rate 4/5 MTR codebook is the symbol-wise complement of the rate 4/5 Group Code Recording code, shown in Table I. With this code, all open error events with d² = 6 are eliminated.

The MTR constraint supports codes with rates approaching its capacity [154], [155]. However, in practical applications, a distance-enhancing code with rate 8/9 or higher is considered very desirable. It has been shown that higher rate trellis codes can be based upon time-varying MTR (TMTR) constraints [67], [23], [123], [52]. For example, the TMTR constraint which limits the maximum runlength of 1's beginning at an even time-index to at most 2 (and to at most 3 otherwise) has capacity approximately 0.916. The constraint has been shown to support a rate 8/9 block code.

Graph representations for the TMTR constrained system are shown in Fig. 20. The states in the upper graph are depicted as either circles or squares, corresponding to odd time indices and even time indices, respectively. The numbering of states reflects the number of 1's seen since the


TABLE XI
ENCODER TABLE FOR RATE 4/5, j = 2 MTR BLOCK CODE

Fig. 20. Labeled graphs for TMTR constraint.

last 0. In the upper graph, each state represents a unique such number. The lower graph is obtained by successively merging states with identical follower sets.

The TMTR constraint eliminates all closed error events with d² = 6 on the E²PR4 channel by preventing the corresponding forbidden input error sequences. As with the MTR j = 2 constraint, it can be shown that all open error events with d² = 6 can be eliminated by an appropriately designed rate 8/9 TMTR block code [123], [23], [21], [24], [52]. The time-varying trellis used by the detector for the rate 8/9 coded E²PR4 channel requires 16 states, no more states than the uncoded system. It has been

shown that these constraints and codes may also be applied to the E³PR4 channel to increase the minimum distance to the channel MFB [152]. Time-varying constraints for the E³PR4 channel that support distance-enhancing codes with rates larger than 8/9 have also been found [52].

Fig. 21 shows a computer simulation of the bit-error-rate

performance of four distance-enhancing constraints on the

EPR4 channel, assuming a constant channel bit rate [152].

As a result of the favorable tradeoff between performance

and complexity offered by high-rate distance-enhancing codes

for high-order PR channels, there is currently great interest in

deploying them in commercial magnetic data-storage systems,

and further research into the design of such codes is being

actively pursued.

Finally, we remark that, for optical recording, the MTR constraint and the TMTR constraint increase the minimum distance on the PR2 and EPR2 channels, yielding nominal coding gains of 1.8 and 3 dB, respectively. A simple block code for the TMTR constraint may be used with a four-state detector to realize these coding gains [29], [152].

E. Precoded Convolutional Codes

An alternative, and in fact earlier, approach to coded modulation for PR channels of the form 1 − D^N was introduced by Wolf and Ungerboeck [204] (see also [30]). Consider first the case N = 1, the dicode channel 1 − D. A binary input sequence {a_k} is applied to an NRZI precoder, which implements the precoding operation characterized by the polynomial 1/(1 ⊕ D). The binary precoder outputs {p_k} are modulated to produce the bipolar channel inputs {x_k} according to the rule x_k = 2p_k − 1.

Let a and a′ be precoder inputs, with corresponding channel outputs y and y′. Then the Euclidean distance at the output of the channel is bounded from below in terms of the Hamming distance at the input to the precoder by inequality (24).

Now, consider as precoder inputs the set of code sequences in a convolutional code with 2^ν states in the encoder and free Hamming distance d_free. The outputs of the PR channel may be described by a trellis with 2^(ν+1) or fewer states [212], which may be used as the basis for Viterbi detection. The inequality (24) leads to a lower bound on the minimum squared distance of the coded system that is proportional to d_free when d_free is even, and to d_free + 1 when d_free is odd.

This coding scheme achieves coding gains on the dicode

channel by the application of good convolutional codes, de-

signed for memoryless Gaussian channels, and the use of

a sequence detector trellis that reﬂects both the structure

of the convolutional code and the memory of the channel.

Using a nontrivial coset of the convolutional code ensures the

satisfaction of constraints on the zero runlengths at the output

of the channel.
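The distance relation underlying inequality (24) is easy to check numerically. The sketch below is our own experiment, and it assumes the normalization x_k = 2p_k − 1 introduced above, so the dicode outputs take values in {0, ±2}; with other normalizations the constant in the relation changes, so the check uses only a weak form of the inequality.

```python
import random

def dicode_outputs(a):
    """NRZI-precode a binary sequence, map to bipolar x_k = 2p_k - 1,
    and pass it through the dicode channel 1 - D."""
    p, x_prev, y = 0, 0, []
    for ak in a:
        p ^= ak                  # precoder: p_k = a_k XOR p_{k-1}
        x = 2 * p - 1            # bipolar channel input
        y.append(x - x_prev)     # dicode output y_k = x_k - x_{k-1}
        x_prev = x
    return y

random.seed(1)
for _ in range(2000):
    a = [random.randint(0, 1) for _ in range(20)]
    b = [random.randint(0, 1) for _ in range(20)]
    dH = sum(u != v for u, v in zip(a, b))
    dE2 = sum((u - v) ** 2 for u, v in zip(dicode_outputs(a), dicode_outputs(b)))
    assert dE2 >= 2 * dH         # output distance grows with input distance
```

Every position where the two input sequences differ toggles the precoder difference on or off, and each such toggle produces a nonzero output difference, which is the mechanism behind (24).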

It is clear that, by interleaving N convolutional encoders and using a precoder of the form 1/(1 ⊕ D^N), this technique, and the bound on free distance, may be extended to PR channels of the form 1 − D^N. In particular, it is applicable to the PR4 channel, corresponding to N = 2. The selection of the underlying convolutional code and nontrivial coset to optimize runlength constraints, free distance, and detector trellis complexity has been investigated by several authors. See, for example, [89], [90],


Fig. 21. Performance of uncoded and coded E²PR4 systems.

and [212]. For the PR4 channel, and specified free Euclidean distance at the channel output, the runlength constraints and complexity of precoded convolutional codes have been found to be slightly inferior to those of matched-spectral-null (MSN) codes. For example, one precoded convolutional code was shown to achieve 3-dB gain (unnormalized for rate loss) with a 16-state detector trellis with 256 branches (per interleave). The comparable MSN code with this gain achieved the equivalent runlength constraints and used a six-state detector trellis with 24 branches (per interleave).

Recently, a modified version of this precoding approach was developed for use with a high-rate turbo code [168]. The detection procedure incorporated an a posteriori probability (APP) PR channel detector, combined with an iterative turbo decoder. Performance simulations of this coding scheme on a PR4 channel with AWGN demonstrated a gain of 5.3 dB (normalized for rate loss), at a representative bit-error rate, relative to

the uncoded PRML channel. Turbo equalization, whereby the

PR detector is integrated into the iterative decoding procedure,

was also considered. This increased the gain by another 0.5

dB. Thus the improvement over the previously proposed codes, which achieve 2-dB gain (normalized for rate loss), is approximately 3.3–3.8 dB. The remaining gap in SNR between the turbo code performance and the upper bound capacity limit (3) [172] is approximately 2.25 dB [168]. The corresponding gap to the upper bound capacity limit for the precoded convolutional code and the MSN code is therefore approximately 5.5–6 dB. This estimate of the SNR gap can be

compared with that implied by the continuous-time channel

capacity bounds, as discussed in Section II-B.

VI. COMPENDIUM OF MODULATION CONSTRAINTS

In this section, we describe in more detail selected properties

of constrained systems that have played a prominent role in

digital recording systems. The classes of runlength-

limited constraints and spectral-null constraints have already

been introduced. In addition, there are constraints that generate

spectral lines at speciﬁed frequencies, called pilot tracking

tones, which can be used for servo tracking systems in

videotape recorders [118], [115]. Certain channels require a

combination of time and frequency constraints [128], [157],

[160]; speciﬁcally DC-balanced RLL sequences have found

widespread usage in recording practice. In addition, there are

many other constraints that play a role in recording systems;

see, for example, [102], [196], [146], [177], and [178]. Table

XII gives a survey of recording constraints used in consumer

electronics products.

A. Runlength-Limited Sequences

We have already encountered (d,k)-constrained binary sequences, where k < ∞. We are also interested in the case k = ∞. Fig. 22 illustrates a graph representing (d,k) constraints.


TABLE XII

SURVEY OF RECORDING CODES AND THEIR APPLICATION AREA

Fig. 22. Shannon cover for a (d,k) constraint.

TABLE XIII
CAPACITY C(d,k) VERSUS RUNLENGTH PARAMETERS d AND k

For (d,k) sequences we can easily derive the characteristic equation

z^(k+2) − z^(k+1) − z^(k−d+1) + 1 = 0

or equivalently,

z^(−(d+1)) + z^(−(d+2)) + · · · + z^(−(k+1)) = 1.

Table XIII lists the capacity C(d,k) = log2 λ, where λ is the largest real root of the characteristic equation, for selected values of the parameters d and k.
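Equivalently, the capacity can be computed from the graph of Fig. 22 as the base-2 logarithm of the largest eigenvalue of its adjacency matrix. The following sketch is our own illustration; the printed values match the well-known capacities of the (1,7) and (2,10) constraints.

```python
import numpy as np

def rll_capacity(d, k):
    """C(d,k) = log2 of the largest eigenvalue of the adjacency matrix of
    the (d,k) graph.  State i = number of 0's since the last 1."""
    A = np.zeros((k + 1, k + 1))
    for i in range(k + 1):
        if i < k:
            A[i, i + 1] = 1    # emit a 0 (at most k of them)
        if i >= d:
            A[i, 0] = 1        # emit a 1 (at least d 0's have passed)
    return np.log2(max(abs(np.linalg.eigvals(A))))

print(rll_capacity(1, 7))   # ~0.6793
print(rll_capacity(2, 10))  # ~0.5418
```

The same routine reproduces any entry of Table XIII by varying d and k.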

RLL sequences are used to increase the minimum separation between recorded transitions. The quantity DR, called the density ratio or packing density, is defined as

DR = (d + 1)R,

where R is the rate of the RLL code. It expresses the number of information bit intervals within the minimum separation between consecutive transitions of an RLL sequence. It may be shown that the density ratio can be made arbitrarily large by choosing d sufficiently large [3]. The minimum increment within a runlength is called the timing window or detection window, denoted by T. Measured in units of information bit intervals, T = R. Sequences with a larger value of d, and thus a lower capacity C(d,k), are penalized by an increasingly difficult tradeoff between the detection window and the density ratio. Practical codes have typically used constraints with d ≤ 2.

TABLE XIV
CAPACITY OF ASYMMETRICAL RUNLENGTH-LIMITED SEQUENCES VERSUS THE MINIMUM RUNLENGTH PARAMETERS

B. Asymmetrical Runlength Constraints

Asymmetrical runlength-limited sequences [75], [194], [186] have different constraints on the runlengths of 1's and 0's. One application of these constraints has been in optical recording systems, where the minimum size of a written pit, as determined by diffraction limitations, is larger than the minimum size of the area separating two pits, a spacing determined by the mechanical positioning capabilities of the optical recording fixture.

Asymmetrical runlength-limited sequences are described by four parameters, d1, k1 and d2, k2, which describe the constraints on the runlengths of 1's and 0's, respectively. An allowable sequence is composed of alternating runs of 1's, of length between d1 + 1 and k1 + 1, and runs of 0's, of length between d2 + 1 and k2 + 1.

Let one sequence be composed of phrases of durations T1, T2, ..., and let the second sequence have phrases of durations T1′, T2′, .... The interleaved sequence is composed of phrases taken alternately from the first, odd sequence and the second, even sequence. The interleaved sequence is composed of phrases of duration Ti + Tj′, implying that the characteristic equation is the product of the characteristic sums of the two phrase alphabets, which can be rewritten as

( Σ_{i=d1+1}^{k1+1} z^(−i) ) · ( Σ_{j=d2+1}^{k2+1} z^(−j) ) = 1. (25)

If we assume that k1 = k2 = ∞, then (25) can be written as

z^(−(d1+d2+2)) = (1 − z^(−1))².

As an immediate implication of the symmetry in d1 and d2, we find for the capacity of the asymmetrical runlength-limited sequences

Ca(d1, d2) = Ca(d1′, d2′), d1 + d2 = d1′ + d2′ (26)

where Ca(d1, d2) denotes the capacity of asymmetrical runlength-limited sequences with minimum runlength parameters d1 and d2. Thus the capacity of asymmetrical RLL sequences is a function of the sum of the two minimum runlength parameters only, and it suffices to evaluate Ca(d1 + d2, 0) by solving the characteristic equation. Results of computations are given in Table XIV.


Fig. 23. Labeled graph for a (d,k,s) constraint.

We can derive another useful relation with the following observation. Let d1 = d2 = d, i.e., the restrictions on the runlengths of 1's and 0's are again symmetric; then from (26), Ca(d, d) = Ca(2d, 0), so that we obtain the following relation between the capacity of symmetrical and asymmetrical RLL sequences:

C(d, ∞) = Ca(d1, d2), d1 + d2 = 2d.
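The sum-dependence of the capacity can be verified numerically. The sketch below is our own construction: it locates the largest root of the characteristic equation for k1 = k2 = ∞ by bisection and confirms that only d1 + d2 matters.

```python
import math

def asym_rll_capacity(d1, d2):
    """Capacity of asymmetrical RLL sequences with minimum runlengths
    d1 + 1 (1's) and d2 + 1 (0's) and k1 = k2 = infinity: log2 of the
    largest root of z**-(d1+d2+2) = (1 - 1/z)**2, found by bisection."""
    f = lambda z: z ** -(d1 + d2 + 2) - (1 - 1 / z) ** 2
    lo, hi = 1.000001, 2.0               # f > 0 near z = 1, f <= 0 at z = 2
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return math.log2((lo + hi) / 2)

# The capacity depends on d1 + d2 only, and equals C(d, inf) for d1 + d2 = 2d:
print(asym_rll_capacity(1, 3), asym_rll_capacity(2, 2))  # equal, = C(2, inf)
```

For d1 = d2 = d the equation reduces, after taking square roots, to the characteristic equation of the (d, ∞) constraint, which is the relation stated above.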

C. RLL Sequences with Multiple Spacings

Funk [72] showed that the theory of RLL sequences is unnecessarily narrow in scope and that it precludes certain relevant coding possibilities which could prove useful in particular devices. The limitation is removed by introducing multiple-spaced RLL sequences, where one further restriction is imposed upon the admissible runlengths of 0's. The runlength/spacing constraints may be expressed as follows: for integers d, k, and s, where k − d is a multiple of s, the number of 0's between successive 1's must be equal to d + is, where 0 ≤ i ≤ (k − d)/s. The parameters d and k again define the minimum and maximum allowable runlength. A sequence defined in this way is called an RLL sequence with multiple spacing (RLL/MS). Such a sequence is characterized by the parameters (d, k, s). Note that for standard RLL sequences we have s = 1. Fig. 23 illustrates a state-transition diagram for such a constraint.

The capacity C(d, k, s) can simply be found by invoking Shannon's capacity formula

C(d, k, s) = log2 λ,

where λ is the largest root of the characteristic equation

Σ_{i=0}^{(k−d)/s} z^(−(d+1+is)) = 1. (27)

Note that if d + 1 and s have a common factor p > 1, then every admissible phrase length 1 + d + is is also divisible by p. Therefore, a sequence with the above condition on d and s is equivalent to a scaled sequence with all phrase lengths divided by p. For given parameter values, the characteristic equation follows directly from (27).
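The capacity computation implied by (27) can be sketched as follows (our own construction, working directly from the phrase lengths 1 + d + is):

```python
import math

def rll_ms_capacity(d, k, s):
    """Capacity of a (d,k,s) multiple-spaced RLL sequence.  Phrases are a 1
    followed by d + i*s 0's, i = 0..(k-d)/s, so the phrase lengths are
    1 + d + i*s; solve sum_i z**-(1 + d + i*s) = 1 for the root z > 1."""
    assert (k - d) % s == 0
    lengths = [1 + d + i * s for i in range((k - d) // s + 1)]
    f = lambda z: sum(z ** -L for L in lengths) - 1
    lo, hi = 1.0 + 1e-9, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return math.log2((lo + hi) / 2)

print(rll_ms_capacity(1, 7, 1))  # s = 1 reduces to the ordinary C(1,7)
print(rll_ms_capacity(1, 7, 2))  # multiple spacing removes phrases, costing capacity
```

Setting s = 1 recovers the standard (d,k) capacities, which provides a convenient consistency check.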

Table XV shows the results of computations. Within any s adjacent bit periods, there is only one possible location for the next 1, given the location of the last 1. The detection window for an RLL/MS sequence is therefore s channel bit intervals, and the minimum spacing between two transitions equals d + 1 channel bit intervals. By rewriting (27), we obtain a relationship between the density ratio and the detection window.

TABLE XV
CAPACITY C(d,k,s) FOR SELECTED VALUES OF THE PARAMETERS

Fig. 24. Relationship between density ratio and detection window. The operating points of various sequences are indicated.

This relationship is plotted in Fig. 24. With (d,k) constrained sequences, only discrete points on this curve are possible. RLL sequences with multiple spacing, however, make it possible, by a proper choice of the parameters d and s, to approximate any point on this curve.

A multiple-spaced RLL code has been designed and experimentally evaluated in exploratory magnetooptic recording systems using a resonant bias coil direct-overwrite technique [167], [200].

D. (0, G/I) Sequences

The (0, G/I) constraints for partial-response maximum-likelihood systems were introduced in Section II-C2. Recall that the parameter G stipulates the maximum number of allowed 0's between consecutive 1's, while the parameter I stipulates the maximum number of 0's between 1's in both the even- and odd-numbered positions of the sequence.

To describe a graph presentation of these constraints, we define three parameters. The quantity q denotes the number of 0's since the last 1. The quantities e and o denote the number of 0's since the last 1 in the even and odd subsequence, respectively. It is immediate that

min(e, o) = ⌊q/2⌋,

so that q equals either 2·min(e, o) or 2·min(e, o) + 1. Each state in the graph corresponds to a triple (q, e, o) with q ≤ G and e, o ≤ I. Wherever permitted, there


TABLE XVI
CAPACITY C(0, G/I) FOR SELECTED VALUES OF G AND I

is an edge from state (q, e, o) to state (q + 1, o, e + 1) with a label 0, and an edge from state (q, e, o) to state (0, o, 0) with a label 1; the roles of e and o are interchanged at each step, since successive symbols occupy positions of opposite parity. By computing the maximum eigenvalue of the adjacency matrix corresponding to the graph, we obtain the capacity of the constraint, C(0, G/I). Results of computations are listed in Table XVI.

For all of these constraints, rate 8/9 codes have been constructed [146]. As mentioned earlier, a rate 8/9, (0,4/4) block code was used in early disk drives employing PRML techniques. Current disk drives make use of more relaxed constraints, which can support codes with even higher rates, such as rate 16/17 [161], [51].
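The (q, e, o) graph described above translates directly into a capacity computation. The following sketch is our own construction: it builds the reachable state set, forms the adjacency matrix, and returns C(0, G/I); for (0,4/4) the result necessarily lies between the practical code rate 8/9 and the unconstrained-interleave capacity C(0,4) ≈ 0.9752.

```python
import numpy as np

def g_i_capacity(G, I):
    """Capacity of the (0, G/I) constraint from the (q, e, o) state graph:
    q = 0's since the last 1; e = 0's since the last 1 in the subsequence
    containing the next symbol; o = the same for the other subsequence."""
    states, seen = [(0, 0, 0)], {(0, 0, 0)}
    edges, i = [], 0
    while i < len(states):
        q, e, o = states[i]
        succs = [(0, o, 0)]                    # emit a 1 (always allowed)
        if q < G and e < I:
            succs.append((q + 1, o, e + 1))    # emit a 0; parities swap
        for t in succs:
            if t not in seen:
                seen.add(t)
                states.append(t)
            edges.append((i, t))
        i += 1
    idx = {s: n for n, s in enumerate(states)}
    A = np.zeros((len(states), len(states)))
    for u, t in edges:
        A[u, idx[t]] = 1
    return np.log2(max(abs(np.linalg.eigvals(A))))

print(g_i_capacity(4, 4))  # between the code rate 8/9 and C(0,4) ~ 0.9752
```

Relaxing I (or G) enlarges the constraint, so the computed capacity increases monotonically, mirroring the trend toward higher rate codes noted above.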

E. Spectral-Null Sequences

Frequency-domain analysis of constrained sequences

is based upon the average power spectral density, or, as

it is often called, the power spectrum. In order to deﬁne

the power spectrum, we must endow the ensemble of

constrained sequences with a probability measure. Generally,

the measure chosen is the maxentropic measure determined

by the transition probabilities discussed in Section III-B.

The autocorrelation function is the sequence of kth-order autocorrelation coefficients R(k), defined by

R(k) = E[x_i x_{i+k}],

where the x_i represent channel input symbols and the expectation is with respect to the given measure. According to the Wiener–Khinchin theorem, the average power spectrum is given by the discrete-time Fourier transform of the autocorrelation function

H(ω) = Σ_k R(k) e^(−jkω),

where, as before, ω denotes the normalized angular frequency. Alternatively, we can express H(ω) as the limit, as N → ∞, of the expected periodogram E| Σ_{i=1}^{N} x_i e^(−jiω) |² / N.

The computation of the power spectrum of an ensemble of

Markov-chain driven sequences is well-studied and has been

carried out for many families of runlength-type constraints, as

well as for the subsets of constrained sequences generated by

speciﬁc ﬁnite-state encoders; see [75] and references therein.
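The spectral effect of a running-digital-sum constraint can also be seen in a simple simulation. The sketch below is our own experiment: it generates a crude (not maxentropic) bipolar sequence whose RDS is forced to stay within a small range, and estimates its power spectrum by averaging periodograms; the suppression of power near DC is evident.

```python
import numpy as np

rng = np.random.default_rng(0)

# A crude DC-free sequence: random bipolar symbols, with the symbol forced
# whenever the running digital sum nears its bounds.
N = 1 << 16
x = np.empty(N)
rds = 0
for i in range(N):
    if rds <= -3:
        b = 1
    elif rds >= 3:
        b = -1
    else:
        b = rng.choice((-1, 1))
    x[i] = b
    rds += b

# Bartlett's method: average periodograms over 64 segments of 1024 samples.
seg = x.reshape(64, -1)
psd = np.mean(np.abs(np.fft.rfft(seg, axis=1)) ** 2, axis=0) / seg.shape[1]
print(psd[0], psd[len(psd) // 2])  # power near DC is strongly suppressed
```

Because the RDS is bounded, each segment's sample sum is small, so the estimated power at DC is orders of magnitude below the mid-band level.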

It is important to note that for a particular sequence, the average power density at a particular frequency ω, if it exists at all, may differ significantly from the ensemble-average spectrum H(ω).

TABLE XVII
CAPACITY AND SUM VARIANCE OF MAXENTROPIC RDS-CONSTRAINED SEQUENCES VERSUS DIGITAL SUM VARIATION N

For spectral-null constraints, however, every sequence in the constraint has a well-defined average power density at the null frequency, and its magnitude is equal to zero [145]. As has already been mentioned, the spectral null frequencies of primary interest in digital recording are zero frequency (DC) and the Nyquist frequency. (Further general results on spectral-null sequences are given in [145], [100], and [102], for example.)

Chien [38] studied bipolar sequences that assume a finite range of N consecutive running-digital-sum (RDS) values, that is, sequences with digital-sum variation (DSV) N. The range of RDS values may be used, as in Fig. 7, to define a set of N allowable states. The adjacency matrix A = {a_uv} for the RDS-constrained channel is given by

a_uv = 1, |u − v| = 1
a_uv = 0, otherwise.

For most constraints, it is not possible to find a simple closed-form expression for the capacity, and one has to rely on numerical methods to obtain an approximation. The RDS-constrained sequences provide a beautiful exception to the rule, as the structure of A allows us to provide a closed-form expression for the capacity of an RDS-constrained channel. We have [38]

λ_max = 2 cos(π/(N + 1)),

and thus the capacity of the RDS-constrained channel is

C(N) = log2 [2 cos(π/(N + 1))]. (28)

Table XVII lists the capacity C(N) for selected values of N. It can be seen that the sum constraint is not very expensive in terms of rate loss when N is relatively large. For instance, a sequence whose RDS takes a modest number of values already has a capacity greater than 0.9, which implies a rate loss of less than 10%.
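The closed-form expression (28) can be checked against a direct eigenvalue computation for the N-state chain, whose adjacency matrix is the path graph defined above:

```python
import numpy as np

def rds_capacity_numeric(N):
    """log2 of the largest eigenvalue of the N-state RDS chain: the states
    are the N allowed RDS values and each symbol moves the RDS by +/- 1,
    so the adjacency matrix is that of a path graph."""
    A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
    return np.log2(np.linalg.eigvalsh(A).max())

for N in (3, 5, 9):
    closed_form = np.log2(2 * np.cos(np.pi / (N + 1)))
    print(N, rds_capacity_numeric(N), closed_form)  # the two columns agree
```

The agreement is exact (up to floating point), since the eigenvalues of the path graph are 2 cos(mπ/(N + 1)), m = 1, ..., N.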

Closed-form expressions for the spectra of maxentropic RDS-constrained sequences were derived by Kerpez [126]. Fig. 25 displays the power spectral density function of maxentropic RDS-constrained sequences for various values of the digital sum variation N.

Let H(ω) denote the power spectral density of a sequence with vanishing power at DC, where H(0) = 0. The width of the spectral notch is a very important design characteristic, which is usually quantified by a parameter called the cutoff frequency. The cutoff frequency of a DC-free constraint, denoted by ω0,


Fig. 25. Power density function of maxentropic RDS-constrained sequences against frequency, with the digital sum variation N as a parameter. For one case, the cutoff frequency ω0 is indicated.

is defined in [65] as the frequency at which H(ω) first attains a prescribed reference level. It can be observed that the cutoff frequency becomes smaller when the digital sum variation N is allowed to increase.

Let s_z² denote the variance of the RDS. Justesen [116] discovered a useful relation between the sum variance s_z² and the width of the spectral notch: he found that the cutoff frequency ω0 is, to a good approximation, inversely proportional to the sum variance,

ω0 ≈ 1/(2 s_z²). (29)

Extensive computations of samples of implemented channel codes, made by Justesen [116] and Immink [98] to validate the reciprocal relation (29) between ω0 and s_z², have revealed that this relationship is fairly reliable. The sum variance of a maxentropic RDS-constrained sequence, denoted by s_z²(N), is given by [38]

s_z²(N) = Σ_{i=1}^{N} (i − (N + 1)/2)² π_i, π_i = (2/(N + 1)) sin²(iπ/(N + 1)), (30)

where π_i is the maxentropic stationary probability of the ith RDS state. Table XVII lists the sum variance s_z²(N) for selected values of N.

Fig. 26, which shows a plot of the sum variance versus the redundancy 1 − C(N), affords more insight into the tradeoffs

in the engineering of DC-balanced sequences. It presents the

designer with a spectral budget, reﬂecting the price in terms

of code redundancy for a desired spectral notch width. It also

reveals that the relationship between the logarithms of the sum

variance and the redundancy is approximately linear.

For large digital sum variation N, it was shown by A. Janssen [114] that

1 − C(N) ≈ (π² log2 e) / (2 (N + 1)²)

and similarly

s_z²(N) ≈ (1/12 − 1/(2π²)) (N + 1)².

Fig. 26. Sum variance versus redundancy of maxentropic RDS-constrained

sequences.

These approximations, coupled with (28) and (30), lead to a fundamental relation between the redundancy 1 − C(N) and the sum variance of a maxentropic RDS-constrained sequence, namely,

(1 − C(N)) · s_z²(N) ≈ (π² log2 e)/2 · (1/12 − 1/(2π²)) ≈ 0.2326. (31)

Actually, the value on the right is accurate to within 1% for moderate and large N. Equation (31) states that, for large enough N, the product of redundancy and sum variance of maxentropic RDS-constrained sequences is approximately constant, as was suggested by Fig. 26.

VII. FUTURE DIRECTIONS

As digital recording technology advances and changes, so

does the system model that serves as the basis for information-

theoretic analysis and the motivation for signal processing and

coding techniques. In this section, we brieﬂy describe several

technology developments, some evolutionary and some revo-

lutionary, that introduce new elements that can be incorporated

into mathematical models for digital recording channels.

A. Improved Channel Models

Reﬂecting the continuing, rapid increase in areal density of

conventional magnetic recording, as well as the characteristics

of the component heads and disks, channel models now incor-

porate factors such as asymmetry in the positive and negative

step responses of magnetoresistive read heads; deviations

from linear superposition; spectral coloring, nonadditivity, and

nonstationarity in media noise; and partial-erasure effects and

other data-dependent distortions [20], [21], [32], [33].

The evaluation of the impact of these channel characteristics

on the performance of the signal processing and coding

techniques discussed in this paper is an active area of research,

as is the development of new approaches that take these

channel properties into account. See, for example, related

papers in [192].


B. Nonsaturation Multilevel Recording

At various times during the past, the possibility of aban-

doning saturation recording, “linearizing” the digital magnetic-

recording channel, and incorporating nonbinary signaling has

been examined. In all such studies, however, the potential

increase in recording density that might accrue from the

application or adaptation of coded-modulation techniques de-

veloped for digital communications has been outweighed by

the increase in detector complexity and, more fundamentally,

the cost in signal-to-noise ratio that accompanies the lineariza-

tion process. However, several novel storage technologies

can support multilevel alphabets, such as electron-trapping optical memories (ETOM) [148], [31] and optical recording with multivalued magnetooptic media [176].

C. Multitrack Recording

Another avenue toward increasing the storage capacity

of disk and tape systems is to exploit their inherent two-

dimensional nature. Runlength-limited codes, such as n-track (d,k) codes, that increase the per-track code rate by sharing the k timing constraint across multiple tracks, have been analyzed and designed [140], [185], [47].

Using models of signal-to-noise ratio dependence upon

track width, as well as intertrack interference (ITI), one

can investigate information-theoretic capacity bounds as a

function of track density. Multitrack recording and multihead

detection techniques based upon partial-response equalization,

decision-feedback-equalization, and sequence detection have

been studied [13], along with coding schemes that can improve

their performance. See, for example, [183] and references

therein.

D. Multidimensional Recording

New, exploratory technologies, such as volume holographic

data storage [80] and two-photon-based three-dimensional

(3-D) optical memories [95], have generated interest in page-

oriented recording and readback. Models of these processes

have generated proposals for two-dimensional equalization and

detection methods [82], [158], along with two-dimensional

codes [81], [195].

This has generated interest in two-dimensional constrained systems and modulation codes. As an example, consider a two-dimensional binary (d,k)-constrained array as an m (row) by n (column) binary array such that every 1 has no less than d 0's and no more than k 0's above it, below it, to the right of it, and to the left of it (with the exception of 1's on or near the borders). The capacity of such an array is equal to the limit, as m and n approach infinity, of the ratio of the logarithm of the number of distinct arrays satisfying the constraints to the product of m times n. Little is known at this time about finding the capacity of such two-dimensional binary constrained arrays.

A notable exception is that it has been proved that the two-dimensional capacity of such two-dimensional binary (d,k) arrays is equal to zero if and only if k = d + 1 [124]. Thus the two-dimensional capacity of the (1,2) constraint is equal to zero, while the two-dimensional capacity of the (2,4) constraint is strictly greater than zero. This is in contrast to the one-dimensional case, where the capacities of (1,2)- and (2,4)-constrained binary sequences are both nonzero and, in fact, are equal. Lower bounds on the capacity of some two-dimensional (d,k) constraints are presented in [124], [179], and other constraints relevant to two-dimensional recording are analyzed in [11], [187], and [199].
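The equality of one-dimensional capacities mentioned above is easily confirmed numerically; for example, the (d,k) pairs (1,2) and (2,4) have exactly equal capacities (our sketch, using the same adjacency-matrix computation as before):

```python
import numpy as np

def rll_capacity(d, k):
    """log2 of the Perron eigenvalue of the (d,k) graph;
    state i = number of 0's since the last 1."""
    A = np.zeros((k + 1, k + 1))
    for i in range(k + 1):
        if i < k:
            A[i, i + 1] = 1    # emit a 0
        if i >= d:
            A[i, 0] = 1        # emit a 1
    return np.log2(max(abs(np.linalg.eigvals(A))))

print(rll_capacity(1, 2), rll_capacity(2, 4))  # both ~0.4057
```

The equality is exact: both characteristic equations share the root z satisfying z³ = z + 1, so the common capacity is log2 of that root.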

VIII. SHANNON’S CROSSWORD PUZZLES

A. Existence of Multidimensional Crossword Puzzles

As mentioned in the preceding section, multidimensional

constrained codes represent a new challenge for information

theorists, with potentially important applications to novel,

high-density storage devices. We feel it is particularly ﬁtting,

then, to bring our survey to a close by returning once more

to Shannon’s 1948 paper [173] where, remarkably, in a short

passage addressing the connection between the redundancy of

a language and the existence of crossword puzzles, Shannon

anticipated some of the issues that arise in multidimensional

constrained coding.

Speciﬁcally, Shannon suggested that there would be cases

where the capacity of a two-dimensional constraint is equal

to zero, even though the capacity of the constituent one-

dimensional constraint is nonzero, a situation illustrated by

certain two-dimensional (d,k) constraints. We cite the following excerpt from Shannon’s 1948 paper:

The ratio of the entropy of a source to the maximum value it could have while still restricted to the same symbols will be called its relative entropy. One minus the relative entropy is the redundancy. The redundancy of a language is related to the existence of crossword puzzles. If the redundancy is zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible, etc.

To the best of our knowledge Shannon never published a

more detailed exposition on this subject. This led us to try to

construct a plausibility argument for his statement. We assume

that the phrase “large crossword puzzles are just possible”

should be taken to mean that the capacity of the corresponding

two-dimensional constraint is nonzero.

Let S denote the number of source symbols, H denote the source binary entropy, and r = H/log2 S denote the relative entropy. We begin with all n by n arrays that can be formed from the S symbols. We eliminate all arrays that do not have all of their rows and columns made up of a concatenation of allowable words from the language. The probability that any row of the array is made up of a concatenation of allowable words from the language is equal to the ratio of the number of allowable concatenations of words with n letters, approximately 2^(nH), to S^n. Thus, assuming statistical independence of the rows, the probability that all n rows are concatenations of allowable words is this ratio raised to the nth power, or (2^(nH)/S^n)^n, or S^(−n²(1−r)). The identical ratio results for the probability that all n columns are made up of concatenations of allowable words. Now assuming that the rows and columns are statistically independent, we see that the probability for an array to have all of its rows and all of its columns made up of concatenations of allowable words is equal to S^(−2n²(1−r)). The assumption of independence of the rows and columns is made with the sole justification that this property might be expected to be true for a language that is “of a rather chaotic and random nature.” Multiplying this probability by the number of arrays, S^(n²), yields the average number of surviving arrays, S^(n²(2r−1)), which grows exponentially with n provided that r > 1/2, i.e., provided that the redundancy is less than 50%. A similar argument for three-dimensional arrays yields the condition r > 2/3, a redundancy of less than 33%. This is Shannon’s result. (The authors thank K. Shaughnessy [174] for contributions to this argument.) We

remark that for ordinary English crossword puzzles, we would interpret the black square to be a 27th symbol in the alphabet. Thus to compute the “relative entropy” of English, we divide the entropy of English by log2 27. In this context, we would propose using an unusual definition of the entropy of English, which we call Hw, based upon the dependencies of letters within individual words, but not across word boundaries, since the rows and columns of crossword puzzles are made up of unrelated words separated by one or more black squares. To compute Hw for the English language, we can proceed as follows. Assume that n_L is the number of words in an English dictionary with L letters, for L = 1, 2, .... We lengthen each

word by one letter to include the black square at the end of a

word and then add one more word of length 1 to represent a single black square. (This allows more than one black square between words.) Following Shannon, the number of distinct sequences of words containing exactly n symbols, N(n), is given by the difference equation

N(n) = N(n − 1) + Σ_L n_L N(n − L − 1). (32)

Then, Hw is given by the logarithm of the largest real root x0 of the equation

x^(−1) + Σ_L n_L x^(−(L+1)) = 1. (33)

The distribution of word lengths in an English dictionary

has been investigated by Lord Rothschild [166]. (See also the

discussion in Section VIII-C.)
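The computation prescribed by (32) and (33) can be sketched as follows (our own construction; the toy dictionary is a hypothetical example, not word-length data from [166]):

```python
import math

def word_entropy(word_counts):
    """H_w per (32)-(33): log2 of the largest real root x0 of
    1/x + sum_L n_L / x**(L+1) = 1, where n_L is the number of dictionary
    words of length L (each word gains a trailing black square, and a
    lone length-1 black-square 'word' is added)."""
    f = lambda x: 1 / x + sum(n / x ** (L + 1) for L, n in word_counts.items()) - 1
    lo, hi = 1.0 + 1e-12, 2.0
    while f(hi) > 0:      # enlarge the bracket; f decreases toward -1
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return math.log2(hi)

# Toy dictionary of 26 one-letter words: the root solves x**2 = x + 26.
print(word_entropy({1: 26}), math.log2((1 + math.sqrt(105)) / 2))  # agree
```

Dividing the result by log2 27 then gives the "relative entropy" proposed above for the 27-symbol crossword alphabet.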

B. Connections to Two-Dimensional Constraints

Unfortunately, a direct application of Shannon’s statement to the (1,2) and (2,4) constraints leads to a problem. Their one-dimensional capacities and, therefore, their relative entropies, are equal, with C(1,2) = C(2,4) ≈ 0.4057. However, we have seen that the capacity of the two-dimensional (1,2) constraint is zero, while that of the two-dimensional (2,4) constraint is nonzero. In order to resolve this inconsistency with Shannon’s bound, we tried to modify the argument by more accurately approximating the probability of a column satisfying the specified row constraint, as follows.

Although the one-dimensional capacities of the two constraints are equal, the one-dimensional constraints have different first-order entropies H1 = H(p), where p denotes the relative frequency of 1's. In particular, H1 ≈ 0.98 for the (1,2) constraint and H1 ≈ 0.83 for the (2,4) constraint, since the relative frequency of 1's is higher for the (1,2) constraint than for the (2,4) constraint. In the previous plausibility argument for Shannon’s result, once one chooses the rows of the array to be a concatenation of allowable words, the relative frequencies of the symbols in each column occur in accordance with the relative frequency of the symbols in the words of the language. Thus the probability that any column is a concatenation of allowable words is equal to 2^(−n(H1−C)). Proceeding as above, we find that the average number of surviving arrays grows exponentially with n provided that 2C > H1 for two-dimensional arrays, or 3C > 2H1 for three-dimensional arrays.

However, for both the one-dimensional (1,2) and (2,4) constraints, we find 2C < H1. Therefore, this modified analysis still does not satisfactorily explain the behavior of these two constraints. A possible explanation is that a further refinement in the argument is needed. Another possibility is that these (d,k) constraints are not “chaotic and random” enough for Shannon’s conclusion, and our plausibility arguments, to apply.

C. Coda

As this paper was undergoing ﬁnal revisions, one of the

authors (JKW) received a letter from E. Gilbert pertaining to

Shannon’s crossword puzzles [77]. The letter was prompted

by a lecture given by JKW at the Shannon Day Symposium,

held at Bell Labs on May 18, 1998, in which the connection

between the capacity of two-dimensional constraints and Shan-

non’s result on crossword puzzles was discussed. In the letter,

Gilbert recalls a conversation he had with Shannon 50 years

ago on this subject. Referring to Shannon’s paper, he says:

I didn’t understand that crossword example and tried to reconstruct his argument. That led to a kind of hand-waving “proof,” which I showed to Claude. Claude’s own argument turned out to have been something like mine.... Fortunately, I outlined my proof in the margin of my reprint of the paper (like Fermat and his copy of Diophantos). It went like this:

The argument that followed is exactly the same as the one

presented in Section VIII-A above, with the small exception

that arrays were assumed to be square. In fact, in a subsequent

e-mail correspondence [78], Gilbert describes a calculation

of the redundancy of English along the lines suggested by

(32) and (33). Thus we see that the study of multidimensional

constrained arrays actually dates back 50 years to the birth of

information theory. A great deal remains to be learned.
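The plausibility arguments discussed above turn on how often the columns of an array inherit a constraint imposed on its rows. As a purely illustrative experiment, independent of the analysis in the paper, this column probability can be estimated by Monte Carlo for a small example; the (0,1) constraint and the 4 × 4 array size below are arbitrary choices.

```python
import random

def satisfies(bits):
    # (0, 1) runlength constraint: no two consecutive 0s.
    return '00' not in ''.join(map(str, bits))

def valid_rows(n):
    # Enumerate all length-n binary rows meeting the constraint.
    return [r for r in
            ([(x >> i) & 1 for i in range(n)] for x in range(2 ** n))
            if satisfies(r)]

def column_valid_fraction(n, trials=20000, seed=1):
    # Draw n x n arrays with i.i.d. uniformly chosen valid rows and
    # estimate the probability that every column is also valid.
    rows = valid_rows(n)
    rng = random.Random(seed)
    hits = sum(
        all(satisfies(col)
            for col in zip(*[rng.choice(rows) for _ in range(n)]))
        for _ in range(trials))
    return hits / trials

print(column_valid_fraction(4))
```

For this example the exact fraction is 1234/4096 ≈ 0.30: the number of 4 × 4 binary arrays avoiding adjacent 0s both horizontally and vertically, divided by the number of arrays whose rows alone are valid.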

IX. SUMMARY

In this paper, we have attempted to provide an overview

of the theoretical foundations and practical applications of


constrained coding in digital-recording systems. In keeping

with the theme of this special issue, we have highlighted

essential contributions to this area made by Shannon in his

landmark 1948 paper. We described the basic characteristics

of a digital-recording channel, and surveyed bounds on the

noisy-channel capacity for several mathematical channel mod-

els. We then discussed practical equalization and detection

techniques and indicated how their implementation imposes

constraints on the recording-channel inputs. Following a re-

view of Shannon’s fundamental results on the capacity of

discrete noiseless channels and on the existence of efﬁcient

codes, we presented a summary of key results in the theory

and practice of efﬁcient constrained code design. We then

discussed the application of distance-enhancing constrained

codes to improve the reliability of noisy recording channels,

and compared the resulting performance to estimates of the

noisy-channel capacity. Finally, we pointed out several new

directions that future research in the area of recording codes

might follow, and we concluded with a discussion of the

connection between Shannon’s remarks on crossword puzzles

and the theory of multidimensional constrained codes. Through

the inclusion of numerous references and indications of open

research problems, we hope to have provided the reader with

an introduction to this fascinating, important, and active branch

of information theory, as well as with some incentive and

encouragement to contribute to it.

ACKNOWLEDGMENT

The authors are grateful to Dick Blahut, Brian Marcus,

Ron Roth, and Emina Soljanin for their thoughtful comments

on an earlier version of this paper. They also wish to thank

Bruce Moision for assistance with computer simulations and

for preparation of Fig. 21.

REFERENCES

[1] K. A. S. Abdel-Ghaffar and J. H. Weber, “Constrained block codes for

class-IV partial-response channels with maximum-likelihood sequence

estimation,” IEEE Trans. Inform. Theory, vol. 42, pp. 1405–1424, Sept.

1996.

[2] R. L. Adler, “The torus and the disk,” IBM J. Res. Develop., vol. 31,

no. 2, pp. 224–234, Mar. 1987.

[3] R. L. Adler, D. Coppersmith, and M. Hassner, “Algorithms for sliding

block codes: An application of symbolic dynamics to information

theory,” IEEE Trans. Inform. Theory, vol. IT-29, pp. 5–22, Jan. 1983.

[4] R. L. Adler, M. Hassner, and J. Moussouris, “Method and apparatus for

generating a noiseless sliding block code for a (1, 7) channel with rate

2/3,” U.S. Patent 4,413,251, June 1982.

[5] N. Alon, E. E. Bergmann, D. Coppersmith, and A. M. Odlyzko,

“Balancing sets of vectors,” IEEE Trans. Inform. Theory, vol. 34, pp.

128–130, Jan. 1988.

[6] S. Altekar, “Detection and coding techniques for magnetic recording

channels,” Ph.D. dissertation, Univ. Calif. San Diego, June 1997.

[7] S. A. Altekar, M. Berggren, B. E. Moision, P. H. Siegel, and J. K.

Wolf, “Error-event characterization on partial-response channels,” in

Proc. 1997 IEEE Int. Symp. Information Theory (Ulm, Germany, June

29–July 4), p. 461; IEEE Trans. Inform. Theory, vol. 45, Jan. 1999, to

be published.

[8] J. Ashley, R. Karabed, and P. H. Siegel, “Complexity and sliding-block

decodability,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pt. 1, pp.

1925–1947, Nov. 1996.

[9] J. J. Ashley and B. H. Marcus, “Canonical encoders for sliding block

decoders,” SIAM J. Discrete Math., vol. 8, pp. 555–605, 1995.

[10] , “A generalized state-splitting algorithm,” IEEE Trans. Inform.

Theory, vol. 43, pp. 1326–1338, July 1997.

[11] , “Two-dimensional low-pass ﬁltering codes,” IEEE Trans. Com-

mun., vol. 46, pp. 724–727, June 1998.

[12] J. J. Ashley, B. H. Marcus, and R. M. Roth, “Construction of encoders

with small decoding look-ahead for input-constrained channels,” IEEE

Trans. Inform. Theory, vol. 41, pp. 55–76, Jan. 1995.

[13] L. C. Barbosa, “Simultaneous detection of readback signals from inter-

fering magnetic recording tracks using array heads,” IEEE Trans. Magn.,

vol. 26, pp. 2163–2165, Sept. 1990.

[14] I. Bar-David and S. Shamai (Shitz), “Information rates for magnetic

recording channels with peak- and slope-limited magnetization,” IEEE

Trans. Inform. Theory, vol. 35, pp. 956–962, Sept. 1989.

M.-P. Béal, Codage Symbolique. Paris, France: Masson, 1993.

[16] G. F. M. Beenker and K. A. S. Immink, “A generalized method

for encoding and decoding runlength-limited binary sequences,” IEEE

Trans. Inform. Theory, vol. IT-29, pp. 751–754, Sept. 1983.

[17] R. Behrens and A. Armstrong, “An advanced read/write channel for

magnetic disk storage,” in Proc. 26th Asilomar Conf. Signals, Systems,

and Computers (Paciﬁc Grove, CA, Oct. 1992), pp. 956–960.

[18] E. R. Berlekamp, “The technology of error-correcting codes,” Proc.

IEEE, vol. 68, pp. 564–593, May 1980.

[19] M. Berkoff, “Waveform compression in NRZI magnetic recording,”

Proc. IEEE, vol. 52, pp. 1271–1272, Oct. 1964.

[20] H. N. Bertram, Theory of Magnetic Recording. Cambridge, U.K.:

Cambridge Univ. Press, 1994.

[21] H. N. Bertram and X. Che, “General analysis of noise in recorded

transitions in thin ﬁlm recording media,” IEEE Trans. Magn., vol. 29,

pp. 201–208, Jan. 1993.

[22] W. G. Bliss, “Circuitry for performing error correction calculations on

baseband encoded data to eliminate error propagation,” IBM Tech. Discl.

Bull., vol. 23, pp. 4633–4634, 1981.

[23] , “An 8/9 rate time-varying trellis code for high density magnetic

recording,” IEEE Trans. Magn., vol. 33, pp. 2746–2748, Sept. 1997.

[24] W. G. Bliss, S. She, and L. Sundell, “The performance of generalized

maximum transition run trellis codes,” IEEE Trans. Magn., vol. 34, no.

1, pt. 1, pp. 85–90, Jan. 1998.

[25] G. Bouwhuis, J. Braat, A. Huijser, J. Pasman, G. van Rosmalen, and K.

A. S. Immink, Principles of Optical Disc Systems. Bristol, U.K. and

Boston, MA: Adam Hilger, 1985.

[26] F. K. Bowers, U.S. Patent 2,957,947, 1960.

[27] V. Braun, K. A. S. Immink, M. A. Ribiero, and G. J. van den Enden,

“On the application of sequence estimation algorithms in the Digital

Compact Cassette (DCC),” IEEE Trans. Consumer Electron., vol. 40,

pp. 992–998, Nov. 1994.

[28] V. Braun and A. J. E. M. Janssen, “On the low-frequency suppression

performance of DC-free runlength-limited modulation codes,” IEEE

Trans. Consumer Electron., vol. 42, pp. 939–945, Nov. 1996.

[29] B. Brickner and J. Moon, “Investigation of error propagation in

DFE and MTR coding for ultra-high density,” Tech. Rep.,

Commun. Data Storage Lab., Univ. Minnesota, Minneapolis, July 10,

1997.

[30] A. R. Calderbank, C. Heegard, and T.-A. Lee, “Binary convolutional

codes with application to magnetic recording,” IEEE Trans. Inform.

Theory, vol. IT-32, pp. 797–815, Nov. 1986.

[31] A. R. Calderbank, R. Laroia, and S. W. McLaughlin, “Coded modulation

and precoding for electron-trapping optical memories,” IEEE Trans.

Commun., vol. 46, pp. 1011–1019, Aug. 1998.

[32] J. Caroselli and J. K. Wolf, “A new model for media noise in thin ﬁlm

magnetic recording media,” in Proc. 1995 SPIE Int. Symp. Voice, Video,

and Data Communications (Philadelphia, PA, Oct. 1995), vol. 2605, pp.

29–38.

[33] J. Caroselli and J. K. Wolf, “Applications of a new simulation model for

media noise limited magnetic recording channels,” IEEE Trans. Magn.,

vol. 32, pp. 3917–3919, Sept. 1996.

[34] K. W. Cattermole, Principles of Pulse Code Modulation. London,

U.K.: Iliffe, 1969.

[35] , “Principles of digital line coding,” Int. J. Electron., vol. 55, pp.

3–33, July 1983.

[36] Workshop on Modulation, Coding, and Signal Processing for Magnetic

Recording Channels, Center for Magnetic Recording Res., Univ. Calif.

at San Diego. La Jolla, CA, May 20–22, 1985.

[37] Workshop on Modulation and Coding for Digital Recording Systems,

Center for Magnetic Recording Res., Univ. Calif. at San Diego. La

Jolla, CA, Jan. 8–10, 1987.

[38] T. M. Chien, “Upper bound on the efﬁciency of DC-constrained codes,”

Bell Syst. Tech. J., vol. 49, pp. 2267–2287, Nov. 1970.

[39] R. Cideciyan, F. Dolivo, R. Hermann, W. Hirt, and W. Schott, “A PRML

system for digital magnetic recording,” IEEE J. Select. Areas Commun.,

vol. 10, pp. 38–56, Jan. 1992.


[40] M. Cohn and G. V. Jacoby, “Run-length reduction of 3PM code via look-

ahead technique,” IEEE Trans. Magn., vol. MAG-18, pp. 1253–1255,

Nov. 1982.

[41] D. J. Costello, Jr., J. Hagenauer, H. Imai, and S. B. Wicker, “Applica-

tions of error control coding,” this issue, pp. 2531–2560.

[42] T. M. Cover, “Enumerative source coding,” IEEE Trans. Inform. Theory,

vol. IT-19, pp. 73–77, Jan. 1973.

[43] R. H. Deng and M. A. Herro, “DC-free coset codes,” IEEE Trans.

Inform. Theory, vol. 34, pp. 786–792, July 1988.

[44] J. Eggenberger and P. Hodges, “Sequential encoding and decoding of

variable length, ﬁxed rate data codes,” U.S. Patent 4,115,768, 1978.

[45] J. Eggenberger and A. M. Patel, “Method and apparatus for implement-

ing optimum PRML codes,” U.S. Patent 4,707,681, Nov. 17, 1987.

[46] E. Eleftheriou and R. Cideciyan, “On codes satisfying Mth-order

running digital sum constraints,” IEEE Trans. Inform. Theory, vol. 37,

pp. 1294–1313, Sept. 1991.

[47] T. Etzion, “Cascading methods for runlength-limited arrays,” IEEE

Trans. Inform. Theory, vol. 43, pp. 319–324, Jan. 1997.

[48] I. J. Fair, W. D. Gover, W. A. Krzymien, and R. I. MacDonald, “Guided

scrambling: A new line coding technique for high bit rate ﬁber optic

transmission systems,” IEEE Trans. Commun., vol. 39, pp. 289–297,

Feb. 1991.

[49] J. L. Fan and A. R. Calderbank, “A modiﬁed concatenated coding

scheme with applications to magnetic data storage,” IEEE Trans. Inform.

Theory, vol. 44, pp. 1565–1574, July 1998.

[50] M. J. Ferguson, “Optimal reception for binary partial response chan-

nels,” Bell Syst. Tech. J., vol. 51, pp. 493–505, 1972.

[51] J. Fitzpatrick and K. J. Knudson, “Rate

modulation code for a magnetic recording channel,” U.S. Patent

5,635,933, June 3, 1997.

[52] K. K. Fitzpatrick and C. S. Modlin, “Time-varying MTR codes for high

density magnetic recording,” in Proc. 1997 IEEE Global Telecommuni-

cations Conf. (GLOBECOM ’97) (Phoenix, AZ, Nov. 4–8, 1997).

[53] G. D. Forney, Jr., “Maximum likelihood sequence detection in the

presence of intersymbol interference,” IEEE Trans. Inform. Theory, vol.

IT-18, pp. 363–378, May 1972.

[54] , “The Viterbi algorithm,” Proc. IEEE, vol. 61, no. 3, pp. 268–278,

Mar. 1973.

[55] G. D. Forney, Jr. and A. R. Calderbank, “Coset codes for partial response

channels; or, cosets codes with spectral nulls,” IEEE Trans. Inform.

Theory, vol. 35, pp. 925–943, Sept. 1989.

[56] G. D. Forney, Jr. and G. Ungerboeck, “Modulation and coding for linear

gaussian channels,” this issue, pp. 2384–2415.

[57] P. A. Franaszek, “Sequence-state encoding for digital transmission,” Bell

Syst. Tech. J., vol. 47, pp. 143–157, Jan. 1968.

[58] , “Sequence-state methods for run-length-limited coding,” IBM J.

Res. Develop., vol. 14, pp. 376–383, July 1970.

[59] , “Run-length-limited variable length coding with error propaga-

tion limitation,” U.S. Patent 3,689,899, Sept. 1972.

[60] , “On future-dependent block coding for input-restricted chan-

nels,” IBM J. Res. Develop., vol. 23, pp. 75–81, 1979.

[61] , “Synchronous bounded delay coding for input restricted chan-

nels,” IBM J. Res. Develop., vol. 24, pp. 43–48, 1980.

[62] , “A general method for channel coding,” IBM J. Res. Develop.,

vol. 24, pp. 638–641, 1980.

[63] , “Construction of bounded delay codes for discrete noiseless

channels,” IBM J. Res. Develop., vol. 26, pp. 506–514, 1982.

[64] , “Coding for constrained channels: A comparison of two ap-

proaches,” IBM J. Res. Develop., vol. 33, pp. 602–607, 1989.

[65] J. N. Franklin and J. R. Pierce, “Spectra and efﬁciency of binary codes

without DC,” IEEE Trans. Commun., vol. COM-20, pp. 1182–1184,

Dec. 1972.

[66] L. Fredrickson, unpublished report, 1993.

[67] , “Time-varying modulo trellis codes for input restricted partial

response channels,” U.S. Patent 5,257,272, Oct. 26, 1993.

[68] L. Fredrickson, R. Karabed, J. W. Rae, P. H. Siegel, H. Thapar, and

R. Wood, “Improved trellis coding for partial response channels,” IEEE

Trans. Magn., vol. 31, pp. 1141–1148, Mar. 1995.

[69] C. V. Freiman and A. D. Wyner, “Optimum block codes for noiseless

input restricted channels,” Inform. Contr., vol. 7, pp. 398–415, 1964.

[70] C. A. French and J. K. Wolf, “Bounds on the capacity of a peak

power constrained Gaussian channel,” IEEE Trans. Magn., vol. 24, pp.

2247–2262, Sept. 1988.

[71] S. Fukuda, Y. Kojima, Y. Shimpuku, and K. Odaka, “8/10 modulation

codes for digital magnetic recording,” IEEE Trans. Magn., vol. MAG-22,

pp. 1194–1196, Sept. 1986.

[72] P. Funk, “Run-length-limited codes with multiple spacing,” IEEE Trans.

Magn., vol. MAG-18, pp. 772–775, Mar. 1982.

[73] A. Gabor, “Adaptive coding for self-clocking recording,” IEEE Trans.

Electron. Comp., vol. EC-16, pp. 866–868, Dec. 1967.

[74] R. Gallager, Information Theory and Reliable Communication. New

York: Wiley, 1968.

[75] A. Gallopoulos, C. Heegard, and P. H. Siegel, “The power spectrum of

run-length-limited codes,” IEEE Trans. Commun., vol. 37, pp. 906–917,

Sept. 1989.

[76] F. R. Gantmacher, Matrix Theory, Volume II. New York: Chelsea,

1960.

[77] E. Gilbert, private correspondence, May 1998.

[78] , private e-mail, June 1998.

[79] J. Gu and T. Fuja, “A new approach to constructing optimal block

codes for runlength-limited channels,” IEEE Trans. Inform. Theory, vol.

40, pp. 774–785, May 1994.

[80] J. Heanue, M. Bashaw, and L. Hesselink, “Volume holographic storage

and retrieval of digital data,” Science, vol. 265, pp. 749–752, 1994.

[81] , “Channel codes for digital holographic data storage,” J. Opt.

Soc. Amer. Ser. A, vol. 12, pp. 2432–2439, 1995.

[82] J. Heanue, K. Gurkan, and L. Hesselink, “Signal detection for page-

access optical memories with intersymbol interference,” Appl. Opt., vol.

35, no. 14, pp. 2431–2438, May 1996.

[83] C. Heegard and L. Ozarow, “Bounding the capacity of saturation

recording: the Lorentz model and applications,” IEEE J. Select. Areas

Commun., vol. 10, pp. 145–156, Jan. 1992.

[84] J. P. J. Heemskerk and K. A. S. Immink, “Compact disc: System aspects

and modulation,” Philips Tech. Rev., vol. 40, no. 6, pp. 157–164, 1982.

[85] P. S. Henry, “Zero disparity coding system,” U.S. Patent 4,309,694, Jan.

1982.

[86] T. Himeno, M. Tanaka, T. Katoku, K. Matsumoto, M. Tamura, and

H. Min-Jae, “High-density magnetic tape recording by a nontracking

method,” Electron. Commun. in Japan, vol. 76. no. 5, pt. 2, pp. 83–93,

1993.

[87] W. Hirt, “Capacity and information rates of discrete-time channels with

memory,” Ph.D. dissertation (Diss. ETH no. 8671), Swiss Federal Inst.

Technol. (ETH), Zurich, Switzerland, 1988.

[88] W. Hirt and J. L. Massey, “Capacity of the discrete-time Gaussian

channel with intersymbol interference,” IEEE Trans. Inform. Theory,

vol. 34, pp. 380–388, May 1988.

[89] K. J. Hole, “Punctured convolutional codes for the partial-response

channel,” IEEE Trans. Inform. Theory, vol. 37, pt. 2, pp. 808–817, May

1991.

[90] K. J. Hole and Ø. Ytrehus, “Improved coding techniques for partial-

response channels,” IEEE Trans. Inform. Theory, vol. 40, pp. 482–493,

Mar. 1994.

[91] H. D. L. Hollmann, “Modulation codes,” Ph.D. dissertation, Eindhoven

Univ. Technol., Eindhoven, The Netherlands, Dec. 1996.

[92] , “On the construction of bounded-delay encodable codes for con-

strained systems,” IEEE Trans. Inform. Theory, vol. 41, pp. 1354–1378,

Sept. 1995.

[93] , “Bounded-delay-encodable, block-decodable codes for con-

strained systems,” IEEE Trans. Inform. Theory, vol. 42, pp. 1957–1970,

Nov. 1996.

[94] J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory,

Languages, and Computation. Reading, MA: Addison-Wesley, 1979.

[95] S. Hunter, F. Kiamilev, S. Esener, D. Parthenopoulos, and P. M.

Rentzepis, “Potentials of two-photon based 3D optical memories for high

performance computing,” Appl. Opt., vol. 29, pp. 2058–2066, 1990.

[96] K. A. S. Immink, “Modulation systems for digital audio discs with

optical readout,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal

Processing (Atlanta, GA, Apr. 1981), pp. 587–590.

[97] , “Construction of binary DC-constrained codes,” Philips J. Res.,

vol. 40, pp. 22–39, 1985.

[98] , “Performance of simple binary DC-constrained codes,” Philips

J. Res., vol. 40, pp. 1–21, 1985.

[99] , “Spectrum shaping with DC²-constrained channel codes,”

Philips J. Res., vol. 40, pp. 40–53, 1985.

[100] , “Spectral null codes,” IEEE Trans. Magn., vol. 26, pp.

1130–1135, Mar. 1990.

[101] , “Runlength-limited sequences,” Proc. IEEE, vol. 78, pp.

1745–1759, Nov. 1990.

[102] ,Coding Techniques for Digital Recorders. Englewood Cliffs,

NJ: Prentice-Hall Int. (UK), 1991.

[103] , “Block-decodable runlength-limited codes via look-ahead tech-

nique,” Philips J. Res., vol. 46, pp. 293–310, 1992.

[104] , “Constructions of almost block-decodable runlength-limited

codes,” IEEE Trans. Inform. Theory, vol. 41, pp. 284–287, Jan. 1995.

[105] , “The Digital Versatile Disc (DVD): System requirements and

channel coding,” SMPTE J., vol. 105, no. 8, pp. 483–489, Aug. 1996.

[106] , “A practical method for approaching the channel capacity


of constrained channels,” IEEE Trans. Inform. Theory, vol. 43, pp.

1389–1399, Sept. 1997.

[107] , “Weakly constrained codes,” Electron. Lett., vol. 33, no. 23, pp.

1943–1944, Nov. 1997.

[108] K. A. S. Immink and G. F. M. Beenker, “Binary transmission codes

with higher order spectral zeros at zero frequency,” IEEE Trans. Inform.

Theory, vol. IT-33, pp. 452–454, May 1987.

[109] K. A. S. Immink and H. Ogawa, “Method for encoding binary data,”

U.S. Patent 4,501,000, Feb. 1985.

[110] K. A. S. Immink and L. Patrovics, “Performance assessment of DC-free

multimode codes,” IEEE Trans. Commun., vol. 45, pp. 293–299, Mar.

1997.

[111] K. A. S. Immink and A. van Wijngaarden, “Simple high-rate constrained

codes,” Electron. Lett., vol. 32, no. 20, p. 1877, Sept. 1996.

[112] G. V. Jacoby, “A new look-ahead code for increasing data density,”

IEEE Trans. Magn., vol. MAG-13, pp. 1202–1204, Sept. 1977. See also

U.S. Patent 4,323,931, Apr. 1982.

[113] G. V. Jacoby and R. Kost, “Binary two-thirds rate code with full word

look-ahead,” IEEE Trans. Magn., vol. MAG-20, pp. 709–714, Sept.

1984. See also M. Cohn, G. V. Jacoby, and C. A. Bates III, U.S. Patent

4,337,458, June 1982.

[114] A. J. E. M. Janssen, private communication, 1998.

[115] A. J. E. M. Janssen and K. A. S. Immink, “Entropy and power

spectrum of asymmetrically DC-constrained binary sequences,” IEEE

Trans. Inform. Theory, vol. 37, pp. 924–927, May 1991.

[116] J. Justesen, “Information rates and power spectra of digital codes,” IEEE

Trans. Inform. Theory, vol. IT-28, pp. 457–472, May 1982.

[117] P. Kabal and S. Pasupathy, “Partial-response signaling,” IEEE Trans.

Commun., vol. COM-23, pp. 921–934, Sept. 1975.

[118] J. A. H. Kahlman and K. A. S. Immink, “Channel code with embedded

pilot tracking tones for DVCR,” IEEE Trans. Consumer Electron., vol.

41, pp. 180–185, Feb. 1995.

[119] H. Kamabe, “Minimum scope for sliding block decoder mappings,”

IEEE Trans. Inform. Theory, vol. 35, pp. 1335–1340, Nov. 1989.

[120] R. Karabed and B. H. Marcus, “Sliding-block coding for input-restricted

channels,” IEEE Trans. Inform. Theory, vol. 34, pp. 2–26, Jan. 1988.

[121] R. Karabed and P. H. Siegel, “Matched spectral-null codes for partial

response channels,” IEEE Trans. Inform. Theory, vol. 37, no. 3, pt. II,

pp. 818–855, May 1991.

[122] , “Coding for higher order partial response channels,” in Proc.

1995 SPIE Int. Symp. Voice, Video, and Data Communications (Philadel-

phia, PA, Oct. 1995), vol. 2605, pp. 115–126.

[123] R. Karabed, P. Siegel, and E. Soljanin, “Constrained coding for channels

with high intersymbol interference,” IEEE Trans. Inform. Theory, to be

published.

[124] A. Kato and K. Zeger, “On the capacity of two-dimensional run-

length-limited codes,” in Proc. 1998 IEEE Int. Symp. Information Theory

(Cambridge, MA, Aug. 16–21, 1998), p. 320; submitted for publication

to IEEE Trans. Inform. Theory.

[125] W. H. Kautz, “Fibonacci codes for synchronization control,” IEEE

Trans. Inform. Theory, vol. IT-11, pp. 284–292, 1965.

[126] K. J. Kerpez, “The power spectral density of maximum entropy charge

constrained sequences,” IEEE Trans. Inform. Theory, vol. 35, pp.

692–695, May 1989.

[127] Z.-A. Khayrallah and D. Neuhoff, “Subshift models and ﬁnite-state

codes for input-constrained noiseless channels: A tutorial,” Univ.

Delaware EE Tech. Rep. 90–9–1, Dover, DE, 1990.

[128] K. J. Knudson, J. K. Wolf, and L. B. Milstein, “A concatenated decoding

scheme for partial response with matched spectral-null coding,”

in Proc. 1993 IEEE Global Telecommunications Conf. (GLOBECOM

’93) (Houston, TX, Nov. 1993), pp. 1960–1964.

[129] D. E. Knuth, “Efﬁcient balanced codes,” IEEE Trans. Inform. Theory,

vol. IT-32, pp. 51–53, Jan. 1986.

[130] H. Kobayashi, “Application of probabilistic decoding to digital magnetic

recording systems,” IBM J. Res. Develop., vol. 15, pp. 65–74, Jan. 1971.

[131] , “Correlative level coding and maximum-likelihood decoding,”

IEEE Trans. Inform. Theory, vol. IT-17, pp. 586–594, Sept. 1971.

[132] , “A survey of coding schemes for transmission or recording of

digital data,” IEEE Trans. Commun., vol. COM-19, pp. 1087–1099, Dec.

1971.

[133] H. Kobayashi and D. T. Tang, “Application of partial-response channel

coding to magnetic recording systems,” IBM J. Res. Develop., vol. 14,

pp. 368–375, July 1970.

[134] E. R. Kretzmer, “Generalization of a technique for binary data transmis-

sion,” IEEE Trans. Commun. Technol., vol. COM-14, pp. 67–68, Feb.

1966.

[135] A. Kunisa, S. Takahashi, and N. Itoh, “Digital modulation method for

recordable digital video disc,” IEEE Trans. Consumer Electron., vol.

42, pp. 820–825, Aug. 1996.

[136] A. Lempel and M. Cohn, “Look-ahead coding for input-restricted

channels,” IEEE Trans. Inform. Theory, vol. IT-28, pp. 933–937, Nov.

1982.

[137] S. Lin and D. J. Costello, Jr., Error Control Coding, Fundamentals and

Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.

[138] D. Lind and B. Marcus, Symbolic Dynamics and Coding. Cambridge,

U.K.: Cambridge Univ. Press, 1995.

[139] J. C. Mallinson and J. W. Miller, “Optimal codes for digital magnetic

recording,” Radio Elec. Eng., vol. 47, pp. 172–176, 1977.

[140] M. W. Marcellin and H. J. Weber, “Two-dimensional modulation codes,”

IEEE J. Select. Areas Commun., vol. 10, pp. 254–266, Jan. 1992.

[141] B. H. Marcus, “Soﬁc systems and encoding data,” IEEE Trans. Inform.

Theory, vol. IT-31, pp. 366–377, May 1985.

[142] , “Symbolic dynamics and connections to coding theory, automata

theory and systems theory,” in Different Aspects of Coding Theory (Proc.

Symp. Applied Mathematics), A. R. Calderbank, Ed., vol. 50, American

Math. Soc., 1995.

[143] B. H. Marcus and R. M. Roth, “Bounds on the number of states in

encoder graphs for input-constrained channels,” IEEE Trans. Inform.

Theory, vol. 37, no. 3, pt. 2, pp. 742–758, May 1991.

[144] B. H. Marcus, R. M. Roth, and P. H. Siegel, “Constrained systems

and coding for recording channels,” in Handbook of Coding Theory,R.

Brualdi, C. Huffman, and V. Pless, Eds. Amsterdam, The Netherlands:

Elsevier, 1998.

[145] B. H. Marcus and P. H. Siegel, “On codes with spectral nulls at rational

submultiples of the symbol frequency,” IEEE Trans. Inform. Theory,

vol. IT-33, pp. 557–568, July 1987.

[146] B. H. Marcus, P. H. Siegel, and J. K. Wolf, “Finite-state modulation

codes for data storage,” IEEE J. Select. Areas Commun., vol. 10, pp.

5–37, Jan. 1992.

[147] P. A. McEwen and J. K. Wolf, “Trellis codes for E²PR4ML

with squared-distance 18,” IEEE Trans. Magn., vol. 32, pp. 3995–3997,

Sept. 1996.

[148] S. W. McLaughlin, “Five runlength-limited codes for M-ary recording

channels,” IEEE Trans. Magn., vol. 33, pp. 2442–2450, May 1997.

[149] S. W. McLaughlin and D. L. Neuhoff, “Upper bounds on the capacity

of the digital magnetic recording channel,” IEEE Trans. Magn., vol. 29,

pp. 59–66, Jan. 1993.

[150] J. W. Miller, U.S. Patent 4,027,335, 1977.

[151] T. Mittelholzer, P. A. McEwen, S. A. Altekar, and J. K. Wolf, “Finite

truncation depth trellis codes for the dicode channel,” IEEE Trans.

Magn., vol. 31, no. 6, pt. 1, pp. 3027–3029, Nov. 1995.

[152] B. E. Moision, P. H. Siegel, and E. Soljanin, “Distance-enhancing codes

for digital recording,” IEEE Trans. Magn., vol. 34, no. 1, pt. 1, pp.

69–74, Jan. 1998.

[153] C. M. Monti and G. L. Pierobon, “Codes with a multiple spectral null

at zero frequency,” IEEE Trans. Inform. Theory, vol. 35, pp. 463–471,

Mar. 1989.

[154] J. Moon and B. Brickner, “Maximum transition run codes for data stor-

age systems,” IEEE Trans. Magn., vol. 32, no. 5, pt. 1, pp. 3992–3994,

Sept. 1996.

[155] , “Design of a rate 5/6 maximum transition run code,” IEEE

Trans. Magn., vol. 33, pp. 2749–2751, Sept. 1997.

[156] H. Nakajima and K. Odaka, “A rotary-head high-density digital au-

dio tape recorder,” IEEE Trans. Consumer Electron., vol. CE-29, pp.

430–437, Aug. 1983.

[157] K. Norris and D. S. Bloomberg, “Channel capacity of charge-constrained

run-length limited codes,” IEEE Trans. Magn., vol. MAG-17, no. 6, pp.

3452–3455, Nov. 1981.

[158] B. Olson and S. Esener, “Partial response precoding for parallel-readout

optical memories,” Opt. Lett., vol. 19, pp. 661–663, 1993.

[159] L. H. Ozarow, A. D. Wyner, and J. Ziv, “Achievable rates for a

constrained Gaussian channel,” IEEE Trans. Inform. Theory, vol. 34,

pp. 365–371, May 1988.

[160] A. M. Patel, “Zero-modulation encoding in magnetic recording,” IBM

J. Res. Develop., vol. 19, pp. 366–378, July 1975. See also U.S. Patent

3,810,111, May 1974.

[161] , IBM Tech. Discl. Bull., vol. 31, no. 8, pp. 4633–4634, Jan. 1989.

[162] G. L. Pierobon, “Codes for zero spectral density at zero frequency,”

IEEE Trans. Inform. Theory, vol. IT-30, pp. 435–439, Mar. 1984.

[163] K. C. Pohlmann, The Compact Disc Handbook, 2nd ed. Madison, WI:

A–R Editions, 1992.

[164] J. Rae, G. Christiansen, S.-M. Shih, H. Thapar, R. Karabed, and P.

Siegel, “Design and performance of a VLSI 120 Mb/s trellis-coded

partial-response channel,” IEEE Trans. Magn., vol. 31, pp. 1208–1214,

Mar. 1995.

[165] R. M. Roth, P. H. Siegel, and A. Vardy, “High-order spectral-null codes:

Constructions and bounds,” IEEE Trans. Inform. Theory, vol. 40, pp.

1826–1840, Nov. 1994.


[166] Lord Rothschild, “The distribution of English dictionary word lengths,”

J. Statist. Planning Infer., vol. 14, pp. 311–322, 1986.

[167] D. Rugar and P. H. Siegel, “Recording results and coding considerations

for the resonant bias coil overwrite technique,” in Optical Data Storage

Topical Meet., Proc. SPIE, G. R. Knight and C. N. Kurtz, Eds., vol.

1078, pp. 265–270, 1989.

[168] W. E. Ryan, L. L. McPheters, and S. W. McLaughlin, “Combined turbo

coding and turbo equalization for PR4-equalized Lorentzian channels,”

in Proc. Conf. Information Science and Systems (CISS’98) (Princeton,

NJ, Mar. 1998).

[169] N. Sayiner, “Impact of the track density versus linear density trade–off

on the read channel: TCPR4 versus EPR4,” in Proc. 1995 SPIE Int.

Symp. on Voice, Video, and Data Communications (Philadelphia, PA,

Oct. 1995), vol. 2605, pp. 84–91.

[170] E. Seneta, Non-negative Matrices and Markov Chains, 2nd ed. New

York: Springer, 1980.

[171] S. Shamai (Shitz) and I. Bar-David, “Upper bounds on the capacity for

a constrained Gaussian channel,” IEEE Trans. Inform. Theory, vol. 35,

pp. 1079–1084, Sept. 1989.

[172] S. Shamai (Shitz), L. H. Ozarow, and A. D. Wyner, “Information rates

for a discrete-time Gaussian channel with intersymbol interference and

stationary inputs,” IEEE Trans. Inform. Theory, vol. 37, pp. 1527–1539,

Nov. 1991.

[173] C. E. Shannon, “A mathematical theory of communication,” Bell Syst.

Tech. J., vol. 27, pp. 379–423, July 1948.

[174] K. Shaughnessy, personal communication, Dec. 1997.

[175] L. A. Shepp, “Covariance of unit processes,” in Proc. Working Conf.

Stochastic Processes (Santa Barbara, CA, 1967), pp. 205–218.

[176] K. Shimazaki, M. Yoshihiro, O. Ishizaki, S. Ohnuki, and N. Ohta,

“Magnetic multi-valued magneto-optical disk,” J. Magn. Soc. Japan,

vol. 19, suppl. no. S1, pp. 429–430, 1995.

[177] P. H. Siegel, “Recording codes for digital magnetic storage,” IEEE

Trans. Magn., vol. MAG-21, pp. 1344–1349, Sept. 1985.

[178] P. H. Siegel and J. K. Wolf, “Modulation and coding for information

storage,” IEEE Commun. Mag., vol. 29, pp. 68–86, Dec. 1991.

[179] , “Bit-stufﬁng bounds on the capacity of two-dimensional con-

strained arrays,” in Proc. 1998 IEEE Int. Symp. Inform. Theory (Cam-

bridge, MA, Aug. 16–21, 1998), p. 323.

[180] J. G. Smith, “The information capacity of amplitude and variance

constrained scalar Gaussian channels,” Inform. Contr., vol. 18, pp.

203–219, 1971.

[181] E. Soljanin, “On-track and off-track distance properties of Class 4

partial response channels,” in Proc. 1995 SPIE Int. Symp. Voice, Video,

and Data Communications (Philadelphia, PA, Oct. 1995), vol. 2605, pp.

92–102.

[182] , “On coding for binary partial-response channels that don’t

achieve the matched-ﬁlter-bound,” in Proc. 1996 Information Theory

Work. (Haifa, Israel, June 9–13, 1996).

[183] E. Soljanin and C. N. Georghiades, “Multihead detection for multitrack

recording channels,” to be published in IEEE Trans. Inform. Theory,

vol. 44, Nov. 1998.

[184] E. Soljanin and O. E. Agazzi, “An interleaved coding scheme for

partial response with concatenated decoding,” in

Proc. 1996 IEEE Global Telecommunications Conf. (GLOBECOM ’96)

(London, U.K., Nov. 1996).

[185] R. E. Swanson and J. K. Wolf, “A new class of two-dimensional RLL

recording codes,” IEEE Trans. Magn., vol. 28, pp. 3407–3416, Nov.

1992.

[186] N. Swenson and J. M. Ciofﬁ, “Sliding block line codes to increase

dispersion-limited distance of optical ﬁber channels,” IEEE J. Select.

Areas Commun., vol. 13, pp. 485–498, Apr. 1995.

[187] R. Talyansky, T. Etzion, and R. M. Roth, “Efﬁcient code constructions

for certain two-dimensional constraints,” in Proc. 1997 IEEE Int. Symp.

Information Theory (Ulm, Germany, June 29–July 4), p. 387.

[188] D. T. Tang and L. R. Bahl, “Block codes for a class of constrained

noiseless channels,” Inform. Contr., vol. 17, pp. 436–461, 1970.

[189] H. K. Thapar and T. D. Howell, “On the performance of partial response

maximum-likelihood and peak detection methods in digital recording,”

in Tech. Dig. Magn. Rec. Conf 1991 (Hidden Valley, PA, June 1991).

[190] H. Thapar and A. Patel, “A class of partial-response systems for

increasing storage density in magnetic recording,” IEEE Trans. Magn.,

vol. MAG-23, pp. 3666–3668, Sept. 1987.

[191] Tj. Tjalkens, “Runlength limited sequences,” IEEE Trans. Inform. The-

ory, vol. 40, pp. 934–940, May 1994.

[192] IEEE Trans. Magn., vol. 34, no. 1, pt. 1, Jan. 1998.

[193] B. S. Tsybakov, “Capacity of a discrete Gaussian channel with a filter,” Probl. Pered. Inform., vol. 6, pp. 78–82, 1970.
[194] C. M. J. van Uijen and C. P. M. J. Baggen, “Performance of a class of channel codes for asymmetric optical recording,” in Proc. 7th Int. Conf. Video, Audio and Data Recording, IERE Conf. Publ. no. 79 (York, U.K., Mar. 1988), pp. 29–32.
[195] A. Vardy, M. Blaum, P. Siegel, and G. Sincerbox, “Conservative arrays: Multi-dimensional modulation codes for holographic recording,” IEEE Trans. Inform. Theory, vol. 42, pp. 227–230, Jan. 1996.
[196] J. Watkinson, The Art of Digital Audio. London, U.K.: Focal, 1988.

[197] A. D. Weathers and J. K. Wolf, “A new rate 2/3 sliding block code for the (1, 7) runlength constraint with the minimal number of encoder states,” IEEE Trans. Inform. Theory, vol. 37, no. 3, pt. 2, pp. 908–913, May 1991.
[198] A. D. Weathers, S. A. Altekar, and J. K. Wolf, “Distance spectra for PRML channels,” IEEE Trans. Magn., vol. 33, pp. 2809–2811, Sept. 1997.

[199] W. Weeks IV and R. E. Blahut, “The capacity and coding gain of certain checkerboard codes,” IEEE Trans. Inform. Theory, vol. 44, pp. 1193–1203, May 1998.
[200] T. Weigandt, “Magneto-optic recording using a (2,18,2) run-length-limited code,” S.M. thesis, Mass. Inst. Technol., Cambridge, MA, 1991.
[201] A. X. Widmer and P. A. Franaszek, “A DC-balanced, partitioned-block, 8b/10b transmission code,” IBM J. Res. Develop., vol. 27, no. 5, pp. 440–451, Sept. 1983.
[202] A. van Wijngaarden and K. A. S. Immink, “Construction of constrained codes using sequence replacement techniques,” submitted for publication to IEEE Trans. Inform. Theory, 1997.

[203] J. K. Wolf and W. R. Richard, “Binary to ternary conversion by linear filtering,” Tech. Documentary Rep. RADC-TDR-62-230, May 1962.
[204] J. K. Wolf and G. Ungerboeck, “Trellis coding for partial-response channels,” IEEE Trans. Commun., vol. COM-34, pp. 765–773, Aug. 1986.
[205] R. W. Wood, “Denser magnetic memory,” IEEE Spectrum, vol. 27, pp. 32–39, May 1990.
[206] R. W. Wood and D. A. Petersen, “Viterbi detection of class IV partial response on a magnetic recording channel,” IEEE Trans. Commun., vol. COM-34, pp. 454–461, May 1986.

[207] Z.-N. Wu, S. Lin, and J. M. Cioffi, “Capacity bounds for magnetic recording channels,” in Proc. 1998 IEEE Global Telecommun. Conf. (GLOBECOM ’98) (Sydney, Australia, Nov. 8–12, 1998), to be published.
[208] H. Yoshida, T. Shimada, and Y. Hashimoto, “8-9 block code: A DC-free channel code for digital magnetic recording,” SMPTE J., vol. 92, pp. 918–922, Sept. 1983.
[209] S. Yoshida and S. Yajima, “On the relation between an encoding automaton and the power spectrum of its output sequence,” Trans. IECE Japan, vol. 59, pp. 1–7, 1976.

[210] A. H. Young, “Implementation issues of 8/9 distance-enhancing constrained codes for EEPR4 channel,” M.S. thesis, Univ. Calif., San Diego, June 1997.
[211] E. Zehavi, “Coding for magnetic recording,” Ph.D. dissertation, Univ. Calif., San Diego, 1987.
[212] E. Zehavi and J. K. Wolf, “On saving decoder states for some trellis codes and partial response channels,” IEEE Trans. Commun., vol. 36, pp. 454–461, Feb. 1988.