2260 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998
Codes for Digital Recorders
Kees A. Schouhamer Immink, Fellow, IEEE, Paul H. Siegel, Fellow, IEEE, and Jack K. Wolf, Fellow, IEEE
Abstract—Constrained codes are a key component in the digital
recording devices that have become ubiquitous in computer data
storage and electronic entertainment applications. This paper
surveys the theory and practice of constrained coding, tracing
the evolution of the subject from its origins in Shannon’s classic
1948 paper to present-day applications in high-density digital
recorders. Open problems and future research directions are also
addressed.
Index Terms—Constrained channels, modulation codes, recording
codes.
I. INTRODUCTION
As has been observed by many authors, the storage and
retrieval of digital information is a special case of digital
communications. To quote E. R. Berlekamp :
Communication links transmit information from here to
there. Computer memories transmit information from
now to then.
Thus as information theory provides the theoretical under-
pinnings for digital communications, it also serves as the
foundation for understanding fundamental limits on reliable
digital data recording, as measured in terms of data rate and
storage density.
A block diagram which depicts the various steps in record-
ing and recovering data in a storage system is shown in Fig. 1.
This “Fig. 1” is essentially the same as the well-known Fig. 1
used by Shannon in his classic paper  to describe a
general communication system, but with the conﬁguration of
codes more explicitly shown.
As in many digital communication systems, a concatenated
approach to channel coding has been adopted in data recording,
consisting of an algebraic error-correcting code in cascade
with a modulation code. The inner modulation code, which
is the focus of this paper, serves the general function of
matching the recorded signals to the physical channel and
to the signal-processing techniques used in data retrieval,
while the outer error-correction code is designed to remove
Manuscript received December 10, 1997; revised June 5, 1998. The work
of P. H. Siegel was supported in part by the National Science Foundation
under Grant NCR-9612802. The work of J. K. Wolf was supported in part by
the National Science Foundation under Grant NCR-9405008.
K. A. S. Immink is with the Institute of Experimental Mathematics,
University of Essen, 45326 Essen, Germany.
P. H. Siegel and J. K. Wolf are with the University of California at San
Diego, La Jolla, CA 92093-0407 USA.
Publisher Item Identiﬁer S 0018-9448(98)06735-2.
Fig. 1. Block diagram of digital recording system.
any errors remaining after the detection and demodulation
process. (See  in this issue for a survey of applications
of error-control coding.)
As we will discuss in more detail in the next section, a
recording channel can be modeled, at a high level, as a linear,
intersymbol-interference (ISI) channel with additive Gaussian
noise, subject to a binary input constraint. The combination
of the ISI and the binary input restriction has presented a
challenge in the information-theoretic performance analysis
of recording channels, and it has also limited the applica-
bility of the coding and modulation techniques that have
been overwhelmingly successful in communication over linear
Gaussian channels. (See  in this issue for a comprehensive
discussion of these methods.)
The development of signal processing and coding tech-
niques for recording channels has taken place in an environ-
ment of escalating demand for higher data transfer rates and
storage capacity—magnetic disk drives for personal computers
today operate at astonishing data rates on the order of 240
million bits per second and store information at densities of
up to 3 billion bits per square inch—coupled with increasingly
severe constraints on hardware complexity and cost.
The needs of the data storage industry have not only fostered
innovation in practical code design, but have also spurred the
development of a rigorous mathematical foundation for the
theory and implementation of constrained codes. They have
also stimulated advances in the information-theoretic analysis
of input-constrained, noisy channels.
In this paper, we review the progress made during the past
50 years in the theory and practical design of constrained
modulation codes for digital data recording. Along the way, we
will highlight the fact that, although Shannon did not mention
0018–9448/98$10.00 1998 IEEE
IMMINK et al.: CODES FOR DIGITAL RECORDERS 2261
storage in his classic two-part paper whose golden anniversary
we celebrate in this issue—indeed random-access storage as
we know it today did not exist at the time—a large number of
fundamental results and techniques relevant to coding for stor-
age were introduced in his seminal publication. We will also
survey emerging directions in data-storage technology, and
discuss new challenges in information theory that they offer.
The outline of the remainder of the paper is as follows.
In Section II, we present background on magnetic-recording
channels. Section II-A gives a basic description of the physical
recording process and the resulting signal and noise character-
istics. In Section II-B, we discuss mathematical models that
capture essential features of the recording channel and we
review information-theoretic bounds on the capacity of these
models. In Section II-C, we describe the signal-processing
and -detection techniques that have been most widely used
in commercial digital-recording systems.
In Section III-A, we introduce the input-constrained, (noise-
less) recording channel model, and we examine certain time-
domain and frequency-domain constraints that the channel
input sequences must satisfy to ensure successful implemen-
tation of the data-detection process. In Section III-B, we
review Shannon’s theory of input-constrained noiseless chan-
nels, including the deﬁnition and computation of capacity, the
determination of the maxentropic sequence measure, and the
fundamental coding theorem for discrete noiseless channels.
In Section IV, we discuss the problem of designing efﬁcient,
invertible encoders for input-constrained channels. As in the
case of coding for noisy communication channels, this is
a subject about which Shannon had little to say. We will
summarize the substantial theoretical and practical progress
that has been made in constrained modulation code design.
In Section V, we present coded-modulation techniques
that have been developed to improve the performance of
noisy recording channels. In particular, we discuss families
of distance-enhancing constrained codes that are intended for
use with partial-response equalization and various types of
sequence detection, and we compare their performance to
estimates of the noisy channel capacity.
In Section VI, we give a compendium of modulation-code
constraints that have been used in digital recorders, describing
in more detail their time-domain, frequency-domain, and
statistical properties.
In Section VII, we indicate several directions for future
research in coding for digital recording. In particular, we
consider the incorporation of improved channel models into
the design and performance evaluation of modulation codes,
as well as the invention of new coding techniques for ex-
ploratory information storage technologies, such as nonsatu-
ration recording using multilevel signals, multitrack recording
and detection, and multidimensional page-oriented storage.
Finally, in Section VIII, we close the paper with a dis-
cussion of Shannon’s intriguing, though somewhat cryptic,
remarks pertaining to the existence of crossword puzzles, and
make some observations about their relevance to coding for
multidimensional constrained recording channels.
Section IX brieﬂy summarizes the objectives and contents
of the paper.
II. BACKGROUND ON DIGITAL RECORDING
The history of signal processing in digital recording systems
can be cleanly broken into two epochs. From 1956 until
approximately 1990, direct-access storage devices relied upon
“analog” detection methods, most notably peak detection.
Beginning in 1990, the storage industry made a dramatic shift
to “digital” techniques, based upon partial-response equaliza-
tion and maximum-likelihood sequence detection, an approach
that had been proposed 20 years earlier by Kobayashi and
Tang , , . To understand how these signal-
processing methods arose, we review a few basic facts about
the physical process underlying digital magnetic recording.
(Readers interested in the corresponding background on optical
recording may refer to , , [102, Ch. 2], and .)
We distill from the physics several mathematical models of
the recording channel, and describe upper and lower bounds
on their capacity. We then present in more detail the analog
and digital detection approaches, and we compare them to the
optimal detector for the uncoded channel.
A. Digital Recording Basics
The magnetic material contained on a magnetic disk or tape
can be thought of as being made up of a collection of discrete
magnetic particles or domains which can be magnetized by
a write head in one of two directions. In present systems,
digital information is stored along paths, called tracks, in this
magnetic medium. We store binary digits on a track by magne-
tizing these particles or domains in one of two directions. This
method is known as “saturation” recording. The stored binary
digits usually are referred to as “channel bits.” Note that the
word “bit” is used here as a contraction of the words “binary
digit” and not as a measure of information. In fact, we will
see that when coding is introduced, each channel bit represents
only a fraction of a bit of user information. The modiﬁer
“channel” in “channel bits” emphasizes this difference. We
will assume a synchronous storage system where the channel
bits occur at the fixed rate of $1/T$ channel bits per second.
Thus $T$ is the duration of a channel bit. In all magnetic-storage
systems used today, the magnetic medium and the
read/write transducer (referred to as the read/write head) move
with respect to each other. If the relative velocity of a track
and the read/write head is constant, the constant time-duration
of the bit translates to a constant linear channel-bit density,
reflected in the length corresponding to a channel bit along
the track.
The normalized input signal applied to the recording transducer
(write head) in this process can be thought of as a two-level
waveform which assumes the values $+1$ and $-1$ over consecutive
time intervals of duration $T$. In the waveform, the transitions
from one level to another, which effectively carry the digital
information, are therefore constrained to occur at integer
multiples of the time period $T$, and we can describe the
waveform digitally as a sequence
$$w = w_0, w_1, w_2, \ldots$$
over the bipolar alphabet $\{+1, -1\}$, where $w_i$ is the signal
amplitude in the time interval $(iT, (i+1)T]$. In the simplest
model, the input–output relationship of the digital magnetic
recording channel can be viewed as linear. Denote by $2g(t)$
the output signal (readback voltage), in the absence of noise,
corresponding to a single transition from, say, $-1$ to $+1$ at
time $t = 0$. Then, the output signal $y(t)$ generated by the
waveform represented by the sequence $w$ is given by
$$y(t) = \sum_{i=0}^{\infty} (w_i - w_{i-1})\, g(t - iT) \tag{1}$$
with $w_{-1}$ denoting the signal level preceding time $t = 0$.
Note that the “derivative” sequence $w'$ of coefficients
$w'_i = w_i - w_{i-1}$ consists of elements taken from the ternary
alphabet $\{0, \pm 2\}$ and the nonzero values, corresponding to
the transitions in the input signal, alternate in sign.
Fig. 2. Lorentzian channel step response.
A frequently used model for the transition response is
$$g(t) = \frac{1}{1 + (2t/\tau)^2}$$
often referred to as the Lorentzian model for an isolated-step
response. The parameter $\tau$ is sometimes denoted $PW_{50}$, an
abbreviation for “pulsewidth at 50% maximum amplitude,” the
width of the pulse measured at 50% of its maximum height.
The Lorentzian step response is shown in Fig. 2.
The output signal $y(t)$ is therefore the linear superposition of
time-shifted Lorentzian pulses with coefficients of magnitude
equal to 2 and alternating polarity. For this channel, sometimes
called the differentiated Lorentzian channel, the frequency
response is
$$H(f) = j \pi^2 \tau f\, e^{-\pi \tau |f|}$$
where $j = \sqrt{-1}$ and $\tau = PW_{50}$. The magnitude of the
frequency response is shown in Fig. 3.
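The simplest model for channel noise assumes that the
noise is additive white Gaussian noise (AWGN). That is, the
readback signal takes the form
$$r(t) = y(t) + n(t)$$
where $y(t)$ is the noiseless output signal and $n(t)$ is a white
Gaussian noise process.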
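As an illustration of the linear superposition model with additive
Gaussian noise, the following Python sketch evaluates the readback
signal for a short bipolar sequence, using the Lorentzian pulse
g(t) = 1 / (1 + (2t/PW50)^2). The function names, the default initial
level `w_init = -1`, and the use of one independent Gaussian draw per
evaluated time instant (a discrete stand-in for continuous-time white
noise) are assumptions made for this example only.

```python
import math
import random


def lorentzian(t, pw50):
    """Lorentzian pulse g(t) = 1 / (1 + (2t / PW50)^2)."""
    return 1.0 / (1.0 + (2.0 * t / pw50) ** 2)


def readback(w, t, T=1.0, pw50=1.0, w_init=-1):
    """Noiseless output y(t) = sum_i (w_i - w_{i-1}) g(t - iT).

    w_init is the assumed level of the waveform before t = 0.
    """
    y = 0.0
    prev = w_init
    for i, wi in enumerate(w):
        y += (wi - prev) * lorentzian(t - i * T, pw50)
        prev = wi
    return y


def noisy_readback(w, t, sigma, rng, **kwargs):
    """Readback value r(t) = y(t) + n, with n drawn from N(0, sigma^2)."""
    return readback(w, t, **kwargs) + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    rng = random.Random(1)
    w = [1, 1, -1, -1, -1, 1]
    for k in range(0, 60, 5):
        t = k / 10.0
        print(f"t={t:4.1f}  y={readback(w, t):+.3f}  "
              f"r={noisy_readback(w, t, 0.05, rng):+.3f}")
```

A single transition from -1 to +1 at t = 0 gives y(0) = 2, and the
output falls to half that value at t = PW50/2, in line with the
definition of PW50.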
There are, of course, far more accurate and sophisticated
models of a magnetic-recording system. These models take
into account the failure of linear superposition, asymmetries
in the positive and negative step responses, and other nonlinear
phenomena in the readback process. There are also advanced
models for media noise, incorporating the effects of material
defects, thermal asperities, data dependence, and adjacent track
interference. For more information on these, we direct the
reader to , , , and .
B. Channel Models and Capacity
The most basic model of a saturation magnetic-recording
system is a binary-input, linear, intersymbol-interference (ISI)
channel with AWGN, shown in Fig. 4.
This model has been, and continues to be, widely used in
comparing the theoretical performance of competing modula-
tion, coding, and signal-processing systems. During the past
Fig. 3. Differentiated Lorentzian channel frequency response magnitude,
Fig. 4. Continuous-time recording channel model.
decade, there has been considerable research effort devoted to
ﬁnding the capacity of this channel. Much of this work was
motivated by the growing interest in digital recording among
the information and communication theory communities ,
. In this section, we survey some of the results pertaining
to this problem. As the reader will observe, the analysis is
limited to rather elementary channel models; the extension
to more advanced channel models represents a major open
problem.
1) Continuous-Time Channel Models: Many of the bounds
we cite were ﬁrst developed for the ideal, low-pass ﬁlter
channel model. These are then adapted to the more realistic
differentiated Lorentzian ISI model.
For a given channel, let $C_{av}$ denote the capacity with a
constraint $P$ on the average input power. Let $C_{peak}$ denote the
capacity with a peak power constraint $P$. Finally, let $C_{bin}$ denote
the capacity with binary input levels $\pm\sqrt{P}$. It is clear that
$$C_{bin} \le C_{peak} \le C_{av}.$$
The following important result, due to Ozarow, Wyner, and
Ziv , states that the first inequality is, in fact, an equality
under very general conditions on the channel ISI.
Peak-Power Achievable Rate Lemma: For the channel
shown in Fig. 4, if $h(t)$ is square integrable, then any rate
achievable using waveforms $x(t)$ satisfying
$$|x(t)| \le \sqrt{P}$$
is achievable using the constrained waveforms
$$x(t) = \pm\sqrt{P}.$$
We now exploit this result to develop upper and lower
bounds on the capacity of the binary-input channel. Consider,
first, a continuous-time, bandlimited, additive Gaussian noise
channel with transfer function
$$H(f) = \begin{cases} 1, & |f| \le W \\ 0, & \text{otherwise.} \end{cases}$$
Assume that the noise has (double-sided) spectral density
$N_0/2$. Let $N = N_0 W$ be the total noise power in the
channel bandwidth. Shannon established the well-known and
celebrated formula for the capacity of this channel, under the
assumption of an average power constraint on the channel
input signals. We quote from :
Theorem 17: The capacity of a channel of band $W$
perturbed by white thermal noise of power $N$ when the
average transmitter power is limited to $P$ is given by
$$C = W \log \frac{P + N}{N}. \tag{2}$$
(We have substituted the notation for Shannon’s nota-
tion to avoid confusion.)
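For a quick numerical feel for Theorem 17, the following Python
sketch evaluates the capacity W log2((P + N)/N) in bits per second.
The function and argument names are chosen for this example, and the
base-2 logarithm is an assumption that fixes the unit to bits.

```python
import math


def shannon_capacity(bandwidth_hz, signal_power, noise_power):
    """Capacity in bits/s of a band-limited AWGN channel.

    bandwidth_hz : channel bandwidth W
    signal_power : average transmitter power P
    noise_power  : total noise power N in the band
    """
    return bandwidth_hz * math.log2((signal_power + noise_power) / noise_power)


if __name__ == "__main__":
    for snr_db in (0, 10, 20, 30):
        snr = 10.0 ** (snr_db / 10.0)
        c = shannon_capacity(1.0, snr, 1.0)
        print(f"SNR = {snr_db:2d} dB -> {c:.3f} bits/s per Hz of bandwidth")
```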
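This result is a special case of the more general “water-filling”
theorem for the capacity of an average input-power
constrained channel with transfer function $H(f)$ and noise
power spectral density $N(f)$ [74, p. 388]
$$C = \frac{1}{2} \int_{F(\theta)} \log \frac{|H(f)|^2\, \theta}{N(f)}\, df$$
where $F(\theta)$ denotes the range of frequencies in which
$N(f)/|H(f)|^2 \le \theta$, and $\theta$ satisfies the equation
$$P = \int_{F(\theta)} \left[ \theta - \frac{N(f)}{|H(f)|^2} \right] df.$$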
By the peak-power achievable rate lemma, this result provides
an upper bound on the capacity of the recording channel.
Applications of this bound to a parameterized channel model
are presented in .
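The water-filling computation can be sketched numerically on a
frequency grid. The Python example below is a minimal illustration,
not the procedure used in the cited work: it finds, by bisection, a
water level `theta` such that the total of
max(theta - N(f)/|H(f)|^2, 0) over the grid equals the power budget,
and then sums (1/2) log2(theta |H(f)|^2 / N(f)) over the filled bins.
The function name, the uniform grid, and the bisection search are
assumptions of this sketch.

```python
import math


def water_fill(gain_sq, noise_psd, power, df, iters=200):
    """Water-filling over a uniform two-sided frequency grid.

    gain_sq   : samples of |H(f)|^2
    noise_psd : samples of the two-sided noise density N(f)
    power     : average input power budget P
    df        : grid spacing in Hz
    Returns (theta, capacity in bits/s).
    """
    # "Floor" N(f)/|H(f)|^2 on the bins where the channel passes energy.
    floor = [n / g for g, n in zip(gain_sq, noise_psd) if g > 0.0]
    lo, hi = 0.0, max(floor) + power / df  # hi always uses at least `power`
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        used = sum(max(theta - v, 0.0) for v in floor) * df
        if used > power:
            hi = theta
        else:
            lo = theta
    theta = 0.5 * (lo + hi)
    capacity = sum(0.5 * math.log2(theta / v) for v in floor if v < theta) * df
    return theta, capacity


if __name__ == "__main__":
    # Ideal low-pass channel: W = 1 Hz, N0/2 = 0.5, so N = N0 * W = 1.
    bins = 100
    theta, cap = water_fill([1.0] * bins, [0.5] * bins, 3.0, 2.0 / bins)
    print(theta, cap)
```

For the flat channel in the usage example, the result can be compared
with W log2((P + N)/N) for W = 1, P = 3, and N = 1.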
An improved upper bound on the capacity of
the low-pass AWGN channel was developed by Shamai and
Bar-David . This bound is a reﬁnement of the water-
ﬁlling upper bound, based upon a characterization of the
power spectral density of any unit process, meaning
a zero-mean, stationary, two-level continuous-time random
process . For a speciﬁed input-power spectral density
, a Gaussian input distribution maximizes the capacity.
Therefore, for a given channel transfer function
and the supremum is taken over all unit process power
spectral densities. In , an approximate solution to this
optimization problem for the ideal low-pass ﬁlter was used to
prove that peak-power limiting on the bandlimited channel
does indeed reduce capacity relative to the average-power
constrained channel. This bounding technique was applied to
the differentiated Lorentzian channel with additive colored
Gaussian noise in .
We now consider lower bounds to the capacity
Shannon  considered the capacity of a peak-power input
constraint on the ideal bandlimited AWGN channel, noting
that “a constraint of this type does not work out as well
mathematically as the average power limitation.” Nevertheless,
he provided a lower bound, quoted below:
Theorem 20: The channel capacity $C$ for a band $W$
perturbed by white thermal noise of power $N$ is bounded by
$$C \ge W \log \frac{2}{\pi e^3} \frac{P}{N}$$
where $P$ is the peak allowed transmitter power.
(We have substituted the notation for Shannon’s notation
to avoid confusion.)
In , the peak-power achievable rate lemma was used
to derive a lower bound on for the ideal, binary-input
constrained, bandlimited channel
A lower bound for the more accurate channel model compris-
ing a cascade of a differentiator and ideal low-pass ﬁlter was
also determined. For this channel, it was shown that
In both cases, the discrepancy between the lower bounds and
the water-ﬁlling upper bounds represents an effective
signal-to-noise ratio (SNR) difference of or about 7.6
dB at high signal-to-noise ratios.
Heegard and Ozarow  incorporated the differentiated
Lorentzian channel model into a similar analysis. To obtain a
lower bound, they optimize, with respect to , the inequality
where is the pulse power spectral density for the
differentiated Lorentzian channel
Their results indicate that, just as for the low-pass channel and
the differentiated low-pass channel, the difference in effective
signal-to-noise ratios between upper and lower bounds on
capacity is approximately , for large signal-to-noise ra-
tios. The corresponding bound for the differentiated Lorentzian
channel with additive colored Gaussian noise was determined
Shamai and Bar-David  developed an improved lower
bound on by analyzing the achievable rate of a
random telegraph wave, that is, a unit process with time
intervals between transitions independently governed by an
exponential distribution. Again, the corresponding bound for
the differentiated Lorentzian channel with additive colored
Gaussian noise was discussed in . Bounds on capacity for
a model incorporating slope-limitations on the magnetization
are addressed in .
Computational results for the differentiated Lorentzian chan-
nel with additive colored Gaussian noise are given in .
For channel densities in the range of – , which
corresponds to channel densities of current practical interest,
the required SNR for arbitrarily low error rate was calculated.
The gap between the best capacity bounds, namely, the unit
process upper bound and the random telegraph wave lower
bound, was found to be approximately 3 dB throughout
this range.
2) Discrete-Time Channel Models: The capacity of discrete-time
channel models applicable to digital recording has
been addressed by several authors, for example, , ,
, and . The capacity of an average input-power-
constrained, discrete-time, memoryless channel with additive,
independent and identically distributed (i.i.d.) Gaussian noise
is given by the well-known formula
$$C = \frac{1}{2} \log \left( 1 + \frac{P}{\sigma^2} \right)$$
where $\sigma^2$ is the noise variance and $P$ is the average
input-power constraint. This result is the discrete-time equivalent
to Shannon’s formula (2) via the sampling theorem. Smith
 showed that the capacity of an amplitude-constrained,
discrete-time, memoryless Gaussian channel is achieved by a
ﬁnite-valued random variable, representing the input to the
channel, whose distribution is uniquely determined by the
input constraint. (Note that, unlike the case of an average
input-power constraint, this result cannot be directly translated
to the continuous-time model.)
Shamai, Ozarow, and Wyner  established upper and
lower bounds on the capacity of the discrete-time Gaussian
channel with ISI and stationary inputs. We will encounter in
the next section a discrete-time ISI model of the magnetic-recording
channel of the form $h(D) = (1 - D)(1 + D)^N$, for
$N \ge 1$. For $N = 1$, the channel decomposes into a pair of
interleaved “dicode” channels corresponding to $h(D) = 1 - D$.
In , the capacity upper bound was compared to upper
and lower bounds on the maximum achievable information
rate for the normalized dicode channel model with system
polynomial , and input levels
These upper and lower bounds are given by
is the capacity of a binary input-constrained, memoryless
Gaussian channel. Thus the upper bound on is simply the
capacity of the latter channel. These upper and lower bounds
differ by 3 dB, as was the case for continuous-time channel
models.
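The capacity of a binary input-constrained, memoryless Gaussian
channel has no closed form, but it is easy to evaluate numerically.
The Python sketch below assumes equiprobable inputs of amplitude
+a and -a and noise of standard deviation sigma, and integrates
1 - E[log2(1 + exp(-2aY/sigma^2))] with a midpoint rule; the
Gaussian-input formula (1/2) log2(1 + P/sigma^2) is included for
comparison. The function names, the integration span, and the step
count are choices made for this example.

```python
import math


def gaussian_capacity(power, noise_var):
    """Average-power-constrained capacity, bits per channel use."""
    return 0.5 * math.log2(1.0 + power / noise_var)


def biawgn_capacity(amplitude, sigma, steps=20000, span=10.0):
    """Capacity of the memoryless Gaussian channel with inputs +/-amplitude.

    By symmetry it suffices to condition on the input +amplitude.
    """
    lo = amplitude - span * sigma
    hi = amplitude + span * sigma
    dy = (hi - lo) / steps
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * dy
        pdf = math.exp(-((y - amplitude) ** 2) / (2.0 * sigma ** 2)) / norm
        llr = 2.0 * amplitude * y / sigma ** 2
        # log(1 + exp(-llr)), written to avoid overflow for large |llr|
        if llr >= 0.0:
            loss = math.log1p(math.exp(-llr))
        else:
            loss = -llr + math.log1p(math.exp(llr))
        total += pdf * (loss / math.log(2.0)) * dy
    return 1.0 - total


if __name__ == "__main__":
    for snr_db in (-5, 0, 5, 10):
        a = math.sqrt(10.0 ** (snr_db / 10.0))
        print(f"{snr_db:3d} dB  binary: {biawgn_capacity(a, 1.0):.4f}  "
              f"gaussian: {gaussian_capacity(a * a, 1.0):.4f}")
```

The binary-input value saturates at 1 bit per channel use, while the
Gaussian-input value keeps growing with SNR.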
For other results on capacity estimates of recording-channel
models, we refer the reader to  and . The general
problem of computing, or developing improved bounds for,
the capacity of discrete-time ISI models of recording channels
remains a signiﬁcant challenge.
C. Detectors for Uncoded Channels
Forney  derived the optimal sequence detector for an un-
coded, linear, intersymbol-interference channel with additive
white Gaussian noise. This detection method, the well-known
maximum-likelihood sequence detector (MLSD), comprises a
whitened matched ﬁlter, whose output is sampled at the symbol
rate, followed by a Viterbi detector whose trellis structure
reﬂects the memory of the ISI channel. For the differenti-
ated Lorentzian channel model, as for many communication
channel models, this detector structure would be prohibi-
tively complex to implement, requiring an unbounded number
of states in the Viterbi detector. Consequently, suboptimal
detection techniques have been implemented. As mentioned
at the start of this section, most storage devices did not
even utilize sampled detection methods until the start of
this decade, relying upon equalization to mitigate effects
of ISI, coupled with analog symbol-by-symbol detection of
waveform features such as peak positions and amplitudes.
Since the introduction of digital signal-processing techniques
in recording systems, partial-response equalization and Viterbi
detection have been widely adopted. They represent a practical
compromise between implementability and optimality, with
respect to the MLSD. We now brieﬂy summarize the main
features of these detection methods.
1) Peak Detection: The channel model described above is
accurate at relatively low linear densities and where the
noise is generated primarily in the readback
electronics. Provided that the density of transitions and the
noise variance are small enough, the locations of peaks
in the output signal $y(t)$ will closely correspond to the
locations of the transitions in the recorded input signal. With
a synchronous clock of period $T$, one could then, in principle,
reconstruct the ternary sequence $w'$ and the recorded bipolar
sequence $w$.
potentially noisy digital recording device is known as peak
detection and it operates roughly as follows. The peak detector
differentiates the rectiﬁed readback signal, and determines the
time intervals in which zero crossings occur. In parallel, the
amplitude of each corresponding extremal point in the rectiﬁed
signal is compared to a prespeciﬁed threshold, and if the
threshold is not exceeded, the corresponding zero crossing is
ignored. This ensures that low-amplitude, spurious peaks due
to noise will be excluded from consideration. Those intervals
in which the threshold is exceeded are designated as having a
peak. The two-level recorded sequence is then reconstructed,
with a transition in polarity corresponding to each interval
containing a detected peak. Clock accuracy is maintained by
an adaptive timing recovery circuit—known as a phase-lock
loop (PLL)—which adjusts the clock frequency and phase to
ensure that the amplitude-qualiﬁed zero crossings occur, on
average, in the center of their respective clock intervals.
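The peak-detection procedure described above can be sketched on a
sampled readback waveform. The Python example below is a simplified
illustration under several assumptions that are not in the original
description: the signal is available as uniformly spaced samples,
clock interval i is centered on time iT, the clock is already
synchronized (no PLL is modeled), and the initial level is -1. A
local maximum of the rectified signal stands in for the zero crossing
of its derivative.

```python
def peak_detect(samples, samples_per_bit, threshold, initial_level=-1):
    """Amplitude-qualified peak detection on a sampled readback signal.

    Sample k is assumed to be taken at time k * T / samples_per_bit.
    Returns (peaks, w): a 0/1 flag per clock interval and the
    reconstructed two-level sequence.
    """
    rect = [abs(s) for s in samples]  # rectified readback signal
    n_bits = len(samples) // samples_per_bit
    peaks = [0] * n_bits
    for k in range(1, len(rect) - 1):
        # Sign change of the discrete derivative marks an extremal point.
        rising = rect[k] - rect[k - 1] > 0
        falling = rect[k + 1] - rect[k] <= 0
        # Ignore low-amplitude peaks, which are likely due to noise.
        if rising and falling and rect[k] > threshold:
            bit = (2 * k + samples_per_bit) // (2 * samples_per_bit)
            if bit < n_bits:
                peaks[bit] = 1
    # Each detected peak corresponds to a transition in polarity.
    level = initial_level
    w = []
    for p in peaks:
        if p:
            level = -level
        w.append(level)
    return peaks, w
```

In practice the threshold trades missed peaks against spurious ones,
and the timing loop would continually adjust the interval boundaries
that are fixed in this sketch.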
2) PRML: Current high-density recording systems use a
technique referred to as PRML, an acronym for “partial-
response (PR) equalization with maximum-likelihood (ML)
sequence detection.” We now brieﬂy review the essence of
this technique in order to motivate the use of constrained
modulation codes in PRML systems.
Kobayashi and Tang  proposed a digital communica-
tions approach to handling intersymbol interference in digital
magnetic recording. In contrast to peak detection, their method
reconstructed the recorded sequence from sample values of a
suitably equalized readback signal, with the samples measured
at time instants $t = iT$. At channel bit densities
corresponding to an appropriate value of $PW_{50}/T$, the transfer
characteristics of the Lorentzian model of the saturation
recording channel (with a time shift of $T/2$) closely resemble
those of a linear filter with step response given by
$$h(t) = \mathrm{sinc}\!\left(\frac{t}{T}\right) + \mathrm{sinc}\!\left(\frac{t - T}{T}\right), \qquad \mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}.$$
Note that at the consecutive sample times $t = 0$ and $t = T$,
the function $h(t)$ has the value 1, while at all other times
which are multiples of $T$, the value is 0. Through linear
superposition (1), the output signal $y(t)$ generated by the
waveform represented by the bipolar sequence $w$ is given by
$$y(t) = \sum_{i=0}^{\infty} (w_i - w_{i-1})\, h(t - iT)$$
which can be rewritten as
$$y(t) = \sum_{i=0}^{\infty} (w_i - w_{i-2})\, \mathrm{sinc}\!\left(\frac{t - iT}{T}\right)$$
where we set $w_{-2} = w_{-1}$. The transition response
results in controlled intersymbol interference at sample times,
leading to output-signal samples $y(iT)$ that, in the
absence of noise, assume values in the set $\{0, \pm 2\}$. Thus in
the noiseless case, we can recover the recorded bipolar sequence
$w$ from the output sample values $y(iT)$, because
the interference between adjacent transitions is prescribed. In
contrast to the peak detection method, this approach does not
require the separation of transitions.
Sampling provides a discrete-time version of this recording-channel
model. Setting $y_i = y(iT)$, the input–output relationship
is given by
$$y_i = w_i - w_{i-2}.$$
In $D$-transform notation, whereby a sequence $u$ is represented
by $u(D) = \sum_i u_i D^i$,
the input–output relationship becomes
$$y(D) = h(D)\, w(D)$$
where the channel transfer function satisfies
$$h(D) = (1 - D)(1 + D) = 1 - D^2.$$
This representation, called a partial-response channel model,
is among those given a designation by Kretzmer  and
tabulated by Kabal and Pasupathy . The label assigned
to it—“Class-4”—continues to be used in its designation, and
the model is sometimes denoted “PR4.”
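The noiseless PR4 relationship, in which each output sample is the
current input minus the input two positions earlier, can be checked
with a few lines of Python. This is a small sketch: the function
names and the assumed initial levels `init = (w_{-2}, w_{-1})` are
choices made for the example.

```python
def pr4_output(w, init=(-1, -1)):
    """Noiseless PR4 samples y_i = w_i - w_{i-2} for a bipolar sequence w.

    init holds the assumed levels (w_{-2}, w_{-1}) before the sequence.
    """
    hist = list(init) + list(w)
    return [hist[i + 2] - hist[i] for i in range(len(w))]


def pr4_recover(y, init=(-1, -1)):
    """Invert the noiseless channel: w_i = y_i + w_{i-2}."""
    hist = list(init)
    for yi in y:
        hist.append(yi + hist[-2])
    return hist[2:]


if __name__ == "__main__":
    w = [1, -1, -1, 1, 1, 1, -1]
    y = pr4_output(w)
    print(y)               # samples take values in {0, +2, -2}
    print(pr4_recover(y))  # recovers w when there is no noise
```

Because the recovery rule feeds earlier decisions forward, a single
wrong sample would propagate in the presence of noise, which is one
motivation for sequence detection.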
For higher channel bit densities, Thapar and Patel 
introduced a general class of partial-response models.
The corresponding input–output relationship takes the form
$$y(D) = h(D)\, w(D)$$
where the discrete-time impulse response $h(D)$
has the form
$$h(D) = (1 - D)(1 + D)^N$$
where $N \ge 1$. The frequency response corresponding to
$h(D)$ has a first-order null at zero frequency and a null
of order $N$ at the Nyquist frequency, one-half the symbol
frequency. Clearly, the PR4 model corresponds to $N = 1$.
The channel models with $N \ge 2$ are usually referred to as
“extended Class-4” models, and denoted by E$^{N-1}$PR4. The
PR4, EPR4, and E$^2$PR4 models are used in the design of most
magnetic disk drives today.
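The tap values of these targets follow from expanding the polynomial
(1 - D)(1 + D)^N. The short Python sketch below does the expansion
and checks the spectral nulls at zero frequency (D = 1) and at the
Nyquist frequency (D = -1); the helper names are assumptions made for
this example.

```python
def poly_mul(a, b):
    """Multiply two polynomials in D given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def extended_class4(n):
    """Coefficients of h(D) = (1 - D)(1 + D)^n, lowest power first."""
    h = [1, -1]
    for _ in range(n):
        h = poly_mul(h, [1, 1])
    return h


def trellis_states(n):
    """Number of Viterbi states for binary inputs: 2^(channel memory)."""
    return 2 ** (len(extended_class4(n)) - 1)


if __name__ == "__main__":
    for n, name in ((1, "PR4"), (2, "EPR4"), (3, "E2PR4")):
        h = extended_class4(n)
        dc = sum(h)                                          # h(1)
        nyq = sum(c * (-1) ** i for i, c in enumerate(h))    # h(-1)
        print(name, h, "h(1) =", dc, "h(-1) =", nyq,
              "states =", trellis_states(n))
```

The same `poly_mul` helper can expand other targets, for example a
pure power of (1 + D), by changing the starting polynomial.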
Models proposed for use in optical-recording systems have
discrete-time impulse responses of the form
$$h(D) = (1 + D)^N$$
where $N \ge 1$. These models reflect the nonzero DC-response
characteristic of some optical-recording systems, as well as
their high-frequency attenuation. The models corresponding
to $N = 1$ and $N = 2$ were also tabulated in , and
are known as Class-1 (PR1) or “duobinary,” and Class-2 (PR2),
respectively. Recently, the models with $N \ge 3$
have been called “extended PR2” models, and denoted by
E$^{N-2}$PR2. (See  for an early analysis and application
of PR equalization.)
If the differentiated Lorentzian channel with AWGN is
equalized to a partial-response target, the sampled channel
Under the simplifying assumption that the noise samples
are independent and identically distributed, and Gauss-
ian—which is a reasonable assumption if the selected partial-
response target accurately reﬂects the behavior of the channel
at the speciﬁed channel bit density—the maximum-likelihood
sequence detector determines the channel input–output pair
at each time
This computation can be carried out recursively, using the
Viterbi algorithm. In fact, Kobayashi ,  proposed the
use of the Viterbi algorithm for maximum-likelihood sequence
Fig. 5. Trellis diagram for PR4 channel.
detection (MLSD) on a PR4 recording channel at about the
same time that Forney  demonstrated its applicability to
MLSD on digital communication channels with intersymbol
interference.
The operation of the Viterbi algorithm and its implemen-
tation complexity are often described in terms of the trellis
diagram corresponding to ,  representing the
time evolution of the channel input–output process. The trellis
structure for the E$^{N-1}$PR4 channel, with system polynomial
$(1 - D)(1 + D)^N$, has $2^{N+1}$ states. In the
case of the PR4 channel, the input–output relationship
$y_i = w_i - w_{i-2}$ permits the detector to operate independently on
the output subsequences at even and odd time indices. The
Viterbi algorithm can then be described in terms of a decoupled
pair of 2-state trellises, as shown in Fig. 5. There has been
considerable effort applied to simplifying Viterbi detector ar-
chitectures for use in high data-rate, digital-recording systems.
In particular, there are a number of formulations of the PR4
channel detector. See , , , , , and
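The decoupled structure of the PR4 detector can be illustrated with
a textbook Viterbi recursion. The Python sketch below is not one of
the simplified architectures referred to above: it runs a plain
2-state Viterbi detector for the dicode channel (output equals
current input minus previous input) on the even-indexed and
odd-indexed samples separately, using squared Euclidean distance as
the branch metric. The function names, the known initial levels, and
the storage of full survivor paths are assumptions of this example.

```python
def viterbi_dicode(r, init=-1):
    """ML sequence detection for y_k = a_k - a_{k-1}, a_k in {-1, +1}.

    The trellis state is the previous input symbol; init is the
    assumed symbol before the first sample.
    """
    states = (-1, +1)
    inf = float("inf")
    metric = {s: (0.0 if s == init else inf) for s in states}
    paths = {s: [] for s in states}
    for rk in r:
        new_metric, new_paths = {}, {}
        for a in states:  # a is the current input and the next state
            best = min(states, key=lambda s: metric[s] + (rk - (a - s)) ** 2)
            new_metric[a] = metric[best] + (rk - (a - best)) ** 2
            new_paths[a] = paths[best] + [a]
        metric, paths = new_metric, new_paths
    final = min(states, key=lambda s: metric[s])
    return paths[final]


def viterbi_pr4(r, init=(-1, -1)):
    """PR4 detection as two interleaved dicode detectors.

    init = (w_{-2}, w_{-1}) are the assumed levels before the sequence.
    """
    out = [0] * len(r)
    out[0::2] = viterbi_dicode(r[0::2], init[0])
    out[1::2] = viterbi_dicode(r[1::2], init[1])
    return out


if __name__ == "__main__":
    noisy = [1.7, 0.3, -2.2, 1.8, 2.1, -0.2, -1.9]
    print(viterbi_pr4(noisy))
```

A hardware detector would keep only a bounded survivor memory per
state rather than whole paths, which this sketch omits for clarity.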
Analysis, simulation, and experimental measurements have
conﬁrmed that PRML systems provide substantial performance
improvements over RLL-coded, equalized peak detection. The
beneﬁts can be realized in the form of 3–5-dB additional noise
immunity at linear densities where optimized peak-detection
bit-error rates are in the range of – Alternatively, the
gains can translate into increased linear density—in that range
of error rates, PR4-based PRML channels achieve 15–25%
higher linear density than -coded peak detection, with
EPR4-based PRML channels providing an additional improve-
ment of approximately 15% , .
The SNR loss of several PRML systems and MLSD relative
to the matched-ﬁlter bound at a bit-error rate of was
computed in . The results show that, with the proper
choice of PR target for a given density, PRML performance
can achieve within 1–2 dB of the MLSD.
In , simulation results for MLSD and PR4-based
PRML detection on a differentiated Lorentzian channel with
colored Gaussian media noise were compared to some of the
capacity bounds discussed in Section II-B. For in
the range of – , PR4-based PRML required approximately
2–4 dB higher SNR than MLSD to achieve a bit-error rate
of The SNR gap between MLSD and the telegraph-
wave information-rate lower bound  was approximately
4 dB, and the gap from the unit-process upper bound 
was approximately 7 dB. These results suggest that, through
suitable coupling of equalization and coding, SNR gains as
large as 6 dB over PR4-based PRML should be achievable.
In Section V, we will describe some of the equalization and
coding techniques that have been developed in an attempt to
realize this gain.
III. SHANNON THEORY OF CONSTRAINED CHANNELS
In this section, we show how the implementation of record-
ing systems based upon peak detection and PRML introduces
the need for constraints to be imposed upon channel input
sequences. We then review Shannon’s fundamental results on
the theory of constrained channels and codes.
A. Modulation Constraints
1) Runlength Constraints: At moderate densities, peak de-
tection errors may arise from ISI-induced shifting of peak
locations and drifting of clock phase due to an inadequate
number of detected peak locations.
The latter two problems are pattern-dependent, and the class
of $(d, k)$ runlength-limited (RLL) sequences is intended to
address them both , . Specifically, in order to reduce
the effects of pulse interference, one can demand that the
derivative sequence $w'$ of the channel input contain some
minimum number, say $d$, of symbols of value zero between
consecutive nonzero values. Similarly, to prevent loss of clock
synchronization, one can require that there be no more than
some maximum number, say $k$, of symbols of value zero
between consecutive nonzero values in $w'$.
In this context, we mention that two conventions are used
to map a binary sequence $x = x_0 x_1 x_2 \cdots$ to the magnetization
pattern along a track, or equivalently, to the two-level sequence
$w$. In one convention, called nonreturn-to-zero (NRZ), one
direction of magnetization (or $w_i = +1$) corresponds to a
stored $1$ and the other direction of magnetization (or $w_i = -1$)
corresponds to a stored $0$. In the other convention, called
nonreturn-to-zero-inverse (NRZI), a reversal of the direction
of magnetization (or $w_i \neq w_{i-1}$) represents a stored $1$ and a
nonreversal of magnetization (or $w_i = w_{i-1}$) represents a stored $0$.
The NRZI precoding convention may be interpreted as a
translation of the binary information sequence $x$ into another
binary sequence $z = z_0 z_1 z_2 \cdots$ that is then mapped by the
NRZ convention to the two-level sequence $w$. The relationship
between $x$ and $z$ is defined by
$$z_i = z_{i-1} \oplus x_i$$
where $z_{-1} = 0$ and $\oplus$ denotes addition modulo $2$.
It is easy to see that
$$x_i = \tfrac{1}{2}\,|w_i - w_{i-1}|.$$
Thus under the NRZI precoding convention, the constraints on
the runlengths of consecutive zero symbols in the derivative
sequence are reflected in corresponding $(d,k)$ constraints on the
binary information sequences. The set of sequences satisfying this
constraint can be generated by reading the labels off of the paths
in the directed graph shown in Fig. 6.
Fig. 6. Labeled directed graph for $(d,k)$ constraint.
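As an illustration of the NRZI precoding rule and of the runlength constraints it induces, the following Python sketch precodes a binary sequence, maps it to two levels, and checks a $(d,k)$ constraint. The function names, the example sequence, and the initial condition (a precoder state of 0, that is, an initial level of -1) are assumptions made for this example.

```python
def nrzi_precode(x, z_init=0):
    """NRZI precoding: z_i = z_{i-1} XOR x_i (z_init is an assumed start state)."""
    z, prev = [], z_init
    for bit in x:
        prev ^= bit
        z.append(prev)
    return z


def nrz_map(z):
    """NRZ mapping of a binary sequence to two levels: 1 -> +1, 0 -> -1."""
    return [1 if b else -1 for b in z]


def satisfies_dk(x, d, k):
    """True if no run of 0s exceeds k and every run of 0s between
    two consecutive 1s has length at least d."""
    run, seen_one = 0, False
    for bit in x:
        if bit == 0:
            run += 1
            if run > k:
                return False
        else:
            if seen_one and run < d:
                return False
            seen_one, run = True, 0
    return True


x = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
z = nrzi_precode(x)
w = nrz_map(z)
# A level reversal in w marks a stored 1; the level before w[0] is -1.
transitions = [int(a != b) for a, b in zip([-1] + w[:-1], w)]
```

In this sketch the reversals of the two-level sequence reproduce the information sequence, so a runlength constraint on zeros between reversals is the same constraint on the information bits.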
2) Constraints for PRML Channels: Two issues arise in the
implementation of PRML systems that are related to properties
of the recorded sequences. The ﬁrst issue is that, just as in peak
detection systems, long runs of zero samples in the PR channel
output can degrade the performance of the timing recovery and
gain control loops. This dictates the use of a global constraint $G$
on the number of consecutive zero samples, analogous to
the $k$ constraint described above.
The second issue arises from a property of the PR
systems known as quasicatastrophic error propagation
. This refers to the fact that certain bi-inﬁnite PR
channel output sequences are represented by more than
one path in the detector trellis. Such a sequence is pro-
duced by at least two distinct channel input sequences.
For the PR channels under consideration, namely, those
with transfer polynomial
, the difference sequences ,
corresponding to pairs of such input sequences and ,
are easily characterized. (For convenience, the symbols
and are denoted by and , respectively, in these
difference sequences.) Speciﬁcally, if and ,
then these difference sequences are and
If and , the difference sequences are of the form
and Finally, if and ,
then they are and
As a consequence of the existence of these sequences, there
could be a potentially unbounded delay in the merging of
survivor paths in the Viterbi detection process beyond any
speciﬁed time index , even in the absence of noise. It is
therefore desirable to constrain the channel input sequences
in such a way that these difference sequences are forbidden.
This property makes it possible to limit the detector path
memory, and therefore the decoding delay, without incurring
any signiﬁcant degradation in the sequence estimates produced
by the detector.
In the case of PR4, this has been accomplished by limiting
the length of runs of identical channel inputs in each of the
even and odd interleaves, or, equivalently, the length of runs
of zero samples in each interleave at the channel output, to be
no more than a specified positive integer $I$. By incorporating
interleaved NRZI (INRZI) precoding, the $G$ and $I$ constraints
on output sequences translate into $G$ and $I$ constraints on
binary input sequences. The resulting constraints are denoted
$(0,G/I)$, where the “$0$” may be interpreted as a $d = 0$
constraint, emphasizing the point that intersymbol interference
is acceptable in PRML systems. It should be noted that the
combination of $(0,G/I)$ constraints and an INRZI precoder
has been used to prevent quasicatastrophic error propagation
in EPR4 channels, as well.
Fig. 7. DC-free constrained sequences with DSV
3) Spectral-Null Constraints: The family of $(d,k)$ runlength-limited
constraints and $(0,G/I)$ PRML constraints are
representative of constraints whose description is essentially in
the time domain (although the constraints certainly have implications
for frequency-domain characteristics of the constrained
sequences). There are other constraints whose formulation is
most natural in the frequency domain. One such constraint
specifies that the recorded sequences have no spectral
content at a particular frequency $f$; that is, the average power
spectral density function of the sequences has value zero at the
specified frequency. The sequences are said to have a spectral
null at frequency $f$.
For an ensemble of sequences, with symbols drawn from the
bipolar alphabet $\{+1,-1\}$ and generated by a finite labeled
directed graph of the kind illustrated in Fig. 6, a necessary
and sufficient condition for a spectral null at frequency
$f = m/(nT)$, where $T$ is the duration of a single recorded
symbol, is that there exist a constant $B$ such that
$$\left| \sum_{i=\ell}^{\ell'} w_i\, e^{-j 2\pi i m/n} \right| \le B \tag{6}$$
for all recorded sequences $w = w_0 w_1 \cdots w_{N-1}$ and
all $0 \le \ell \le \ell' < N$.
In digital recording, the spectral null constraints of most
importance have been those that prescribe a spectral null
at $f = 0$, or DC. The sequences are said to be DC-free
or charge-constrained. The concept of running digital sum
(RDS) of a sequence plays a significant role in the description
and analysis of DC-free sequences. For a bipolar sequence
$w = w_0 w_1 \cdots w_{N-1}$, the RDS of a subsequence $w_\ell \cdots w_{\ell'}$,
denoted $\mathrm{RDS}(w_\ell, \ldots, w_{\ell'})$, is defined as
$$\mathrm{RDS}(w_\ell, \ldots, w_{\ell'}) = \sum_{i=\ell}^{\ell'} w_i.$$
From (6), we see that the spectral density of the sequences
vanishes at $f = 0$ if and only if the RDS values for all
sequences are bounded in magnitude by some constant integer.
For sequences that assume a range of consecutive RDS
values, we say that their digital sum variation (DSV) is
Fig. 7 shows a graph describing the bipolar, DC-free system
with DSV equal to
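The running digital sum is easy to compute directly. The sketch below, with function names chosen for this example, returns the running sums of a bipolar sequence and the number of consecutive RDS values it visits. Two assumptions are made: the initial sum 0 is counted as a visited value, and the result is reported as a count of values, since how that count is quoted as a DSV depends on the convention in use.

```python
from itertools import accumulate


def running_digital_sums(w):
    """Partial sums w_0 + ... + w_i of a bipolar (+1/-1) sequence."""
    return list(accumulate(w))


def rds_value_count(w):
    """Number of consecutive RDS values visited, counting the initial sum 0."""
    sums = [0] + running_digital_sums(w)
    return max(sums) - min(sums) + 1


balanced = [+1, -1, -1, +1, +1, -1]
drifting = [+1] * 5

print(running_digital_sums(balanced))
print(rds_value_count(balanced), rds_value_count(drifting))
```

A DC-free ensemble keeps this count bounded for every sequence length, whereas a sequence with a persistent bias, such as the all-ones example, makes it grow with the length.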
DC-free sequences have found widespread application in
optical and magnetic recording systems. In magnetic-tape
systems with rotary-type recording heads, such as the R-
DAT digital audio tape system, they prevent write-signal
distortion that can arise from transformer-coupling in the
write electronics. In optical-recording systems, they reduce
interference between data and servo signals, and also permit
ﬁltering of low-frequency noise stemming from smudges on
the disk surface. It should be noted that the application of DC-
free constraints has certainly not been conﬁned to data storage.
Since the early days of digital communication by means of
cable, DC-free codes have been employed to counter the
effects of low-frequency cutoff due to coupling components,
isolating transformers, and other possible system impairments
Sequences with a spectral null at $f = 1/(2T)$ also play an
important role in digital recording. These sequences are often
referred to as Nyquist-free. There is in fact a close relationship
between Nyquist-free and DC-free sequences. Specifically,
consider sequences $w = w_0 w_1 w_2 \cdots$ over the bipolar
alphabet $\{+1,-1\}$. If $w$ is DC-free, then the sequence $y$ defined by
$$y_i = (-1)^i w_i$$
is Nyquist-free. DC/Nyquist-free sequences have spectral nulls
at both $f = 0$ and $f = 1/(2T)$. Such sequences can always
be decomposed into a pair of interleaved DC-free sequences.
This fact is exploited in Section V-C in the design of distance-enhancing,
DC/Nyquist-free codes for PRML systems.
In some recording applications, sequences satisfying both
charge and runlength constraints have been used. In particular,
a sequence in the charge-RLL constraint satisﬁes
the runlength constraint, with the added restriction
that the corresponding NRZI bipolar sequence be DC-
free with DSV no larger than Codes using
and constraints—known, respectively, as
“zero-modulation” and “Miller-squared” codes—have found
application in commercial tape-recording systems , ,
B. Discrete Noiseless Channels
In Section III-A, we saw that the successful implementation
of analog and digital signal-processing techniques used in data
recording may require that the binary channel input sequences
satisfy constraints in both the time and the frequency domains.
Shannon established many of the fundamental properties
of noiseless, input-constrained communication channels in
Part I of his 1948 paper . In that section, entitled
“Discrete Noiseless Systems,” Shannon considered discrete
communication channels, such as the teletype or telegraph
channel, where the transmitted symbols were of possibly
different time duration and satisﬁed a set of constraints as to
the order in which they could occur. We will review his key
results and illustrate them using the family of runlength-limited
codes, introduced in Section III-A.
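Shannon first defined the capacity $C$ of a discrete noiseless
channel as
$$C = \lim_{T \to \infty} \frac{\log N(T)}{T}$$
where $N(T)$ is the number of allowed sequences of length $T$.
The following quote, which provides a method of computing
the capacity, is taken directly from Shannon’s original paper
(equation numbers added):
Suppose all sequences of the symbols $S_1, \ldots, S_n$ are
allowed and these symbols have durations $t_1, \ldots, t_n$.
What is the channel capacity? If $N(t)$ represents the
number of sequences of duration $t$, we have
$$N(t) = N(t - t_1) + N(t - t_2) + \cdots + N(t - t_n).$$
The total number is equal to the sum of the number of
sequences ending in $S_1, S_2, \ldots, S_n$ and these are
$N(t - t_1), N(t - t_2), \ldots, N(t - t_n)$, respectively. According
to a well-known result in finite differences, $N(t)$ is then
asymptotic for large $t$ to $X_0^t$ where $X_0$ is the largest real
solution of the characteristic equation
$$X^{-t_1} + X^{-t_2} + \cdots + X^{-t_n} = 1.$$
Shannon’s results can be applied directly to the case of
$(d,k)$ codes by associating the symbols $\{S_j\}$ with the $k - d + 1$
different allowable sequences of 0’s ending in a 1. The
capacity is then $C = \log \lambda$,
where $\lambda$ is the largest real solution of the equation
$$z^{-(d+1)} + z^{-(d+2)} + \cdots + z^{-(k+1)} = 1. \tag{12}$$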
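The characteristic-equation route to the capacity of a $(d,k)$ constraint can be checked numerically. The following sketch solves $\sum_{j=d+1}^{k+1} z^{-j} = 1$ by bisection and returns the base-2 logarithm of the root. The function name and the use of bisection are choices made for this example; the bracket $[1, 2]$ is valid because the left-hand side is decreasing in $z$, is at least 1 at $z = 1$, and is below 1 at $z = 2$.

```python
import math


def dk_capacity(d, k, tol=1e-12):
    """Base-2 capacity of the (d,k) constraint: log2 of the largest real
    root of sum_{j=d+1}^{k+1} z**(-j) = 1, found by bisection on [1, 2]."""
    def excess(z):
        return sum(z ** -j for j in range(d + 1, k + 2)) - 1.0

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:   # sum still too large: the root lies to the right
            lo = mid
        else:
            hi = mid
    return math.log2((lo + hi) / 2)


for d, k in [(0, 1), (1, 3), (1, 7), (2, 7)]:
    print((d, k), round(dk_capacity(d, k), 4))
```

For $(d,k) = (0,1)$ the equation reduces to $z^2 = z + 1$, so the capacity is the base-2 logarithm of the golden ratio.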
Shannon went on to describe constrained sequences by labeled,
directed graphs, often referred to as state-transition diagrams.
Again, quoting from the paper:
A very general type of restriction which may be placed
on allowed sequences is the following: We imagine a
number of possible states $a_1, a_2, \ldots, a_m$. For each state
only certain symbols from the set $S_1, \ldots, S_n$ can be
transmitted (different subsets for the different states).
When one of these has been transmitted the state changes
to a new state depending both on the old state and the
particular symbol transmitted.
Shannon then proceeded to state the following theorem
which he proved in an appendix:
Theorem 1: Let $b_{ij}^{(s)}$ be the duration of the $s$th symbol
which is allowable in state $i$ and leads to state $j$. Then
the channel capacity $C$ is equal to $\log W$ where $W$ is
the largest real root of the determinant equation:
$$\left| \sum_s W^{-b_{ij}^{(s)}} - \delta_{ij} \right| = 0$$
where $\delta_{ij} = 1$ if $i = j$ and is zero otherwise.
The condition that different states must correspond to dif-
ferent subsets of the transmission alphabet is unnecessarily
restrictive. For the theorem to hold, it sufﬁces that the state-
transition diagram representation be lossless, meaning that any
two distinct state sequences beginning at a common state and
ending at a, possibly different, common state generate distinct
symbol sequences .
This result can be applied to $(d,k)$ sequences in two
different ways. In the first, we let the $S_j$ be the collection
of allowable runs of consecutive 0’s followed by a 1, as
before. With this interpretation we have only one state since
any concatenation of these runs is allowable. The determinant
equation then becomes the same as (12) with $z$ replaced by $W$.
In the second interpretation, we let the $S_j$ be associated
with the binary symbols 0 and 1 and we use the graph with
$k + 1$ states shown earlier in Fig. 6. Note now that all of the
symbols are of length 1 so that the determinant equation is of
the form
$$\left| W^{-1} a_{ij} - \delta_{ij} \right| = 0 \tag{14}$$
where $a_{ij} = 1$ if the graph in Fig. 6 has an edge from state $i$
to state $j$ and $a_{ij} = 0$ otherwise.
Multiplying every element in the matrix by $W$, we see
that this equation specifies the eigenvalues of the connection
matrix, or adjacency matrix, of the graph, that is, a matrix
which has $(i,j)$th entry equal to 1 if there is a symbol from
state $i$ that results in the new state $j$ and which has $(i,j)$th
entry equal to 0 otherwise. (The notion of adjacency matrix
can be extended to graphs with a multiplicity of distinctly
labeled edges connecting pairs of states.) Thus we see that the
channel capacity $C$ is equal to the logarithm of the largest real
eigenvalue of the connection matrix of the constraint graph
shown in Fig. 6.
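The eigenvalue characterization can be tried on a concrete $(d,k)$ graph. The sketch below builds a $(k+1)$-state adjacency matrix and estimates its largest real eigenvalue by power iteration. The state labeling (state $i$ means that $i$ zeros have been emitted since the last 1) is an assumption about how the graph is drawn, and the iteration is applied to $A + I$, a standard shift that makes the iteration converge for periodic graphs as well; neither detail is taken from the original text.

```python
import math


def dk_adjacency(d, k):
    """Adjacency matrix of a (k+1)-state graph for the (d,k) constraint.
    Assumed labeling: state i = number of 0s emitted since the last 1."""
    n = k + 1
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        if i < k:
            A[i][i + 1] = 1   # emit 0: one more zero in the current run
        if i >= d:
            A[i][0] = 1       # emit 1: allowed once the run has d zeros
    return A


def largest_eigenvalue(A, iters=2000):
    """Largest real eigenvalue of a nonnegative irreducible matrix,
    via power iteration on A + I."""
    n = len(A)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [v[i] + sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)                 # converges to (largest eigenvalue + 1)
        v = [x / lam for x in w]
    return lam - 1.0


capacity_13 = math.log2(largest_eigenvalue(dk_adjacency(1, 3)))
print(round(capacity_13, 4))
```

The value obtained this way agrees with the one found from the characteristic equation (12), as the argument above predicts.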
Shannon proceeded to produce an information source by
assigning nonzero probabilities to the symbols leaving each
state of the graph. These probabilities can be assigned in any
manner subject to the constraint that for each state, the sum of
the probabilities for all symbols leaving that state is 1. Shannon
gave formulas as to how to choose these probabilities such
that the resulting information source had maximum entropy.
He further showed that this maximum entropy is equal to the
capacity $C$. Specifically, he proved the following theorem.
Theorem 8: Let the system of constraints considered as
a channel have a capacity $C = \log W$. If we assign
$$p_{ij}^{(s)} = \frac{B_j}{B_i} W^{-\ell_{ij}^{(s)}}$$
where $\ell_{ij}^{(s)}$ is the duration of the $s$th symbol leading from
state $i$ to state $j$ and the $B_i$ satisfy
$$B_i = \sum_{s,j} B_j W^{-\ell_{ij}^{(s)}}$$
then $H$ is maximized and equal to $C$.
It is an easy matter to apply Shannon’s result to find these
probabilities for $(d,k)$ codes. The result is that the probability
of a run of $j$ 0’s followed by a 1 is equal to $\lambda^{-(j+1)}$ for
$d \le j \le k$, where $\log \lambda$ is the maximum entropy. Since
the sum of these probabilities (summed over all possible
runlengths) must equal 1, we have
$$\sum_{j=d}^{k} \lambda^{-(j+1)} = 1.$$
Note that this equation is identical to (12), except for the
choice of the indeterminate. Thus the maximum entropy is
achieved by choosing $\lambda$ as the largest real root of this equation
and the maximum entropy is equal to the capacity $C$. The
probabilities of the symbols which result in the maximum
entropy are shown in Fig. 8 (where now the branch labels are
the probabilities of the binary symbols and not the symbols
themselves).
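The maximum-entropy run probabilities can be verified numerically. In the sketch below, whose function names are assumptions made for this example, $\lambda$ is obtained by bisection, each run of $j$ zeros followed by a 1 receives probability $\lambda^{-(j+1)}$, and the entropy per channel symbol is computed as the entropy per run divided by the mean run length.

```python
import math


def largest_root(d, k, tol=1e-13):
    """Largest real root of sum_{j=d}^{k} z**-(j+1) = 1, by bisection on [1, 2]."""
    def excess(z):
        return sum(z ** -(j + 1) for j in range(d, k + 1)) - 1.0

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def maxentropic_runs(d, k):
    """Probability of each run of j zeros followed by a 1, for d <= j <= k."""
    lam = largest_root(d, k)
    return {j: lam ** -(j + 1) for j in range(d, k + 1)}, lam


probs, lam = maxentropic_runs(1, 3)
total = sum(probs.values())
entropy_per_run = -sum(p * math.log2(p) for p in probs.values())
mean_run_length = sum(p * (j + 1) for j, p in probs.items())
entropy_per_symbol = entropy_per_run / mean_run_length
print(round(total, 6), round(entropy_per_symbol, 4), round(math.log2(lam), 4))
```

Because $-\log_2 \lambda^{-(j+1)} = (j+1)\log_2 \lambda$, the entropy per run equals $\log_2 \lambda$ times the mean run length, so the entropy per channel symbol is exactly $\log_2 \lambda$.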
The maximum-entropy solution described in the theorem
dictates that any sequence of length , starting
in state and ending in state , has probability
where denotes the probability of state Therefore,
This is a special case of the notion of “typical long sequences”
again introduced by Shannon in his classic paper. In this
special case of maximum-entropy sequences, for large
enough, all sequences of length are entropy-typical in this
sense. This is analogous to the case of symbols which are of
ﬁxed duration, equally probable, and statistically independent.
Fig. 8. Markov graph for maximum entropy sequences.
Shannon proved that the capacity $C$ of a constrained channel
represents an upper bound on the achievable rate of information
transmission on the channel. Moreover, he defined a
concept of typical sequences and, using that concept, demonstrated
that transmission at rates arbitrarily close to $C$ can in
principle be achieved. Specifically, he proved the following
“fundamental theorem for a noiseless channel” governing
transmission of the output of an information source over a
constrained channel. We again quote from .
Theorem 9: Let a source have entropy $H$ (bits per
symbol) and a channel have a capacity $C$ (bits per
second). Then it is possible to encode the output of the
source in such a way as to transmit at the average rate
$C/H - \epsilon$ symbols per second over the channel where $\epsilon$
is arbitrarily small. It is not possible to transmit at an
average rate greater than $C/H$.
The proof technique, relying as it does upon typical long
sequences, is nonconstructive. It is interesting to note, how-
ever, that Shannon formulated the operations of the source
encoder (and decoder) in terms of a ﬁnite-state machine, a
construct that has since been widely applied to constrained
channel encoding and decoding. In the next section, we turn
to the problem of designing efﬁcient ﬁnite-state encoders.
IV. CODES FOR NOISELESS CONSTRAINED CHANNELS
For constraints described by a ﬁnite-state, directed graph
with edge labels, Shannon’s fundamental coding theorem guar-
antees the existence of codes that achieve any rate less than
the capacity. Unfortunately, as mentioned above, Shannon’s
proof of the theorem is nonconstructive. However, during
the past 40 years, substantial progress has been made in the
engineering design of efﬁcient codes for various constraints,
including many of interest in digital recording. There have
also been major strides in the development of general code
construction techniques, and, during the past 20 years, rigorous
mathematical foundations have been established that permit
the resolution of questions pertaining to code existence, code
construction, and code implementation complexity.
Early contributors to the theory and practical application
of constrained code design include: Berkoff ; Cattermole
, ; Cohen ; Freiman and Wyner ; Gabor ;
Jacoby , ; Kautz ; Lempel ; Patel ;
and Tang and Bahl; and, especially, Franaszek –.
Further advances were made by Adler, Coppersmith, and
Hassner (ACH) ; Marcus ; Karabed and Marcus ;
Ashley, Marcus, and Roth ; Ashley and Marcus , ;
Immink ; and Hollmann –.
Fig. 9. Finite-state encoder schematic.
In this section, we will survey selected aspects of this theo-
retical and practical progress. The presentation largely follows
, , and, especially, , where more detailed and
comprehensive treatments of coding for constrained channels
may be found.
A. Encoders and Decoders
Encoders have the task of translating arbitrary source in-
formation into a constrained sequence. In coding practice,
typically, the source sequence is partitioned into blocks of
length $p$, and under the code rules such blocks are mapped onto
words of $q$ channel symbols. The rate of such an encoder is
$R = p/q$. To emphasize the blocklengths, we sometimes
denote the rate as $p:q$.
It is most important that this mapping be done as efficiently
as possible subject to certain practical considerations.
Efficiency is measured by the ratio of the code rate $R$ to
the capacity $C$ of the constrained channel. A good encoder
algorithm realizes a code rate close to the capacity of the
constrained sequences, uses a simple implementation, and
avoids the propagation of errors in the process of decoding.
An encoder may be state-dependent, in which case the code-
word used to represent a given source block is a function of the
channel or encoder state, or the code may be state-independent.
State-independence implies that codewords can be freely con-
catenated without violating the sequence constraints. A set of
such codewords is called self-concatenable. When the encoder
is state-dependent, it typically takes the form of a synchronous
ﬁnite-state machine, illustrated schematically in Fig. 9.
A decoder is preferably state-independent. As a result of
errors made during transmission, a state-dependent decoder
could easily lose track of the encoder state, and begin to
make errors, with no guarantee of recovery. In order to avoid
error propagation, therefore, a decoder should use a finite
observation interval of channel bits for decoding, thus limiting
the span in which errors may occur. Such a decoder is called
a sliding-block decoder. A sliding-block decoder makes a
decision on a received word on the basis of the $q$-bit word
itself, as well as $m$ preceding $q$-bit words and $a$ upcoming
$q$-bit words. Essentially, the decoder comprises a register of
length $(m + a + 1)$ $q$-bit words and a logic function that
translates the contents of the register into the retrieved
$p$-bit source word. Since the constants $m$ and $a$ are finite, an
error in the retrieved sequence can propagate in the decoded
sequence only for a finite distance, at most the decoder window
length. Fig. 10 shows a schematic of a sliding-block decoder.
An important subclass of sliding-block decoders are the block
decoders, which use only a single codeword for reproducing
the source word, i.e., $m = a = 0$.
Fig. 10. Sliding-block decoder schematic.
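The window structure of a sliding-block decoder, and the way it confines error propagation, can be sketched in a few lines. The generic decoder below and its parameter names are assumptions made for this example. As a toy case we use the inverse of the NRZI precoder, which recovers each bit from the current and the preceding channel bit, so that $m = 1$, $a = 0$, and $p = q = 1$.

```python
def nrzi_precode(x, z_init=0):
    """Toy rate 1:1 encoder: z_i = z_{i-1} XOR x_i."""
    z, prev = [], z_init
    for bit in x:
        prev ^= bit
        z.append(prev)
    return z


def sliding_block_decode(words, m, a, logic, pad):
    """Decode word i from the window words[i-m .. i+a].
    `pad` stands in for words beyond either end of the sequence."""
    padded = [pad] * m + list(words) + [pad] * a
    window = m + a + 1
    return [logic(tuple(padded[i:i + window])) for i in range(len(words))]


def nrzi_logic(window):
    """Logic function for the toy code: XOR of previous and current bit."""
    previous, current = window
    return previous ^ current


x = [1, 0, 1, 1, 0, 0, 1, 0]
z = nrzi_precode(x)
decoded = sliding_block_decode(z, m=1, a=0, logic=nrzi_logic, pad=0)

# Flip a single channel bit and see how far the damage spreads.
z_bad = list(z)
z_bad[3] ^= 1
decoded_bad = sliding_block_decode(z_bad, m=1, a=0, logic=nrzi_logic, pad=0)
error_positions = [i for i, (u, v) in enumerate(zip(x, decoded_bad)) if u != v]
print(error_positions)
```

A single channel error corrupts only the decoded words whose window contains it, here two consecutive bits, which is the window length $m + a + 1$.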
Generally speaking, the problem of code design is to con-
struct practical, efﬁcient, ﬁnite-state encoders with sliding-
block decoders. There are several fundamental questions re-
lated to this problem.
a) For a rate $R \le C$, what encoder input and output block
sizes $p$ and $q$, with $R = p/q$, are realizable?
b) Can a sliding-block decodable encoder always be found?
c) Can 100% efficient sliding-block decodable encoders be
designed when the capacity $C$ is a rational number $p/q$?
d) Are there good bounds on basic complexity measures
pertaining to constrained codes for a given constraint,
such as number of encoder states, encoder gate complexity,
encoding delay, and sliding-block decoder window length?
Many of these questions have been answered fully or in
part, as we now describe.
B. Graphs and Constraints
It is very useful and convenient, when stating code existence
results and specifying code construction algorithms, to refer
to labeled graph descriptions of constrained sequences. More
precisely, a labeled graph (or a finite labeled directed graph)
$G = (V, E, L)$ consists of a finite set of states $V = V_G$; a
finite set of edges $E = E_G$, where each edge has an initial
state and a terminal state, both in $V$; and an edge labeling
$L : E \to \Sigma$, where $\Sigma$ is a finite alphabet. Fig. 11 shows a
“typical” labeled graph. When context makes it clear, a labeled
graph may be called simply a “graph.”
Fig. 11. Typical labeled graph.
A labeled graph can be used to generate ﬁnite symbol
sequences by reading off the labels along paths in the graph,
thereby producing a word (also called a string or a block). For
example, in Fig. 11, the word can be generated by
following a path along edges with state sequence
We will sometimes call a word of length $\ell$ generated by $G$ an $\ell$-block.
The connections in the directed graph underlying a labeled
graph are conveniently described by an adjacency matrix, as
was mentioned in Section III. Specifically, for a graph $G$,
we denote by $A = A_G = [(A_G)_{u,v}]_{u,v \in V_G}$ the $|V_G| \times |V_G|$
adjacency matrix whose entry $(A_G)_{u,v}$ is the number of edges
from state $u$ to state $v$ in $G$. The adjacency matrix, of course,
has nonnegative integer entries. Note that the number of paths
of length $\ell$ from state $u$ to state $v$ is simply $(A_G^\ell)_{u,v}$, and the
number of cycles of length $\ell$ is simply the trace of $A_G^\ell$.
The fundamental object considered in the theory of con-
strained coding is the set of words generated by a labeled
graph. A constrained system (or constraint), denoted $S$, is the
set of all words (i.e., finite-length sequences) generated by
reading the labels of paths in a labeled graph $G$. We will
also, at times, consider right-infinite sequences $x_0 x_1 x_2 \cdots$ and
sometimes bi-infinite sequences $\cdots x_{-1} x_0 x_1 \cdots$. The
alphabet of symbols appearing in the words of $S$ is denoted
$\Sigma(S)$. We say that the graph $G$ presents $S$ or is a presentation
of $S$, and we write $S = S(G)$. For a state $u$ in $G$, the set of
all finite words generated from $u$ is called the follower set of
$u$ in $G$, denoted by $F_G(u)$.
As mentioned above, a rate $p:q$ finite-state encoder will
generate a word in the constrained system $S$ composed of a
sequence of $q$-blocks. For a constrained system $S$ presented
by a labeled graph $G$, it will be very useful to have an
explicit description of the words in $S$, decomposed into such
nonoverlapping blocks of length $q$.
Let $G$ be a labeled graph. The $q$th power of $G$, denoted
$G^q$, is the labeled graph with the same set of states as $G$,
but one edge for each path of length $q$ in $G$, labeled by the
$q$-block generated by that path. The adjacency matrix of
$G^q$ satisfies $A_{G^q} = (A_G)^q$.
For a constrained system $S$ presented by a labeled graph $G$,
the $q$th power of $S$, denoted $S^q$, is the constrained system
presented by $G^q$. So, $S^q$ is the constrained system obtained from
$S$ by grouping the symbols in each word into nonoverlapping
words of length $q$. Note that the definition of $S^q$ does not
depend on which presentation $G$ of $S$ is used.
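The construction of a power graph is mechanical, and a small sketch may make it concrete. Below, a labeled graph is represented as a list of (initial state, terminal state, label) triples; this representation, the function names, and the choice of a two-state graph for the $(0,1)$ runlength constraint as the example are all assumptions made for illustration.

```python
def graph_power(edges, q):
    """Edges of the q-th power: one edge per path of length q,
    labeled by the q-block (a tuple of symbols) read along that path."""
    paths = [(u, v, (label,)) for u, v, label in edges]
    for _ in range(q - 1):
        paths = [(u, v2, word + (label,))
                 for u, v, word in paths
                 for u2, v2, label in edges if u2 == v]
    return paths


def adjacency(edges, n):
    """Adjacency matrix: entry (u, v) counts the edges from u to v."""
    A = [[0] * n for _ in range(n)]
    for u, v, _ in edges:
        A[u][v] += 1
    return A


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, q):
    result = A
    for _ in range(q - 1):
        result = mat_mul(result, A)
    return result


# Two-state graph for the (0,1) constraint (no two consecutive 0s).
EDGES = [(0, 0, 1), (0, 1, 0), (1, 0, 1)]
A = adjacency(EDGES, 2)
square = graph_power(EDGES, 2)
blocks_from_state_0 = {word for u, _, word in square if u == 0}
print(sorted(blocks_from_state_0), adjacency(square, 2), mat_pow(A, 2))
```

Counting the edges of the power graph reproduces the matrix power of the adjacency matrix, and the trace of that power counts the cycles of the corresponding length.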
It is important to note that a given constrained system can
be presented by many different labeled graphs and, depending
on the context, one presentation will have advantages relative
to another. For example, one graph may present the constraint
using the smallest possible number of states, while another
may serve as the basis for an encoder ﬁnite-state machine.
There are important connections between the theory of
constrained coding and other scientiﬁc disciplines, including
symbolic dynamics, systems theory, and automata theory.
Many of the objects, concepts, and results in constrained
coding have counterparts in these ﬁelds. For example, the set
of bi-inﬁnite sequences derived from a constrained system is
called a soﬁc system (or soﬁc shift) in symbolic dynamics.
In systems theory, these sequences correspond to a discrete-
time, complete, time-invariant system. Similarly, in automata
theory, a constrained system is equivalent to a regular language
which is recognized by a certain type of automaton .
The interrelationships among these various disciplines are
discussed in more detail in , , and .
The bridge to symbolic dynamics, established in , has
proven to be especially signiﬁcant, leading to breakthroughs in
both the theory and design of constrained codes. An interesting
account of this development and its impact on the design of
recording codes for magnetic storage is given in . A very
comprehensive mathematical treatment may be found in .
C. Properties of Graph Labelings
In order to state the coding theorems, as well as for purposes
of encoder construction, it will be important to consider
labelings with special properties.
We say that a labeled graph is deterministic if, at each state,
the outgoing edges have distinct labels. In other words, at each
state, any label generated from that state determines a unique
outgoing edge from that state. Constrained systems that play
a role in digital recording generally have natural presentations
by a deterministic graph. For example, the labeled graphs in
Figs. 6 and 7 are both deterministic. It can be shown that any
constrained system can be presented by a deterministic graph
. Similarly, a graph is called codeterministic if, for each
state, the incoming edges are distinctly labeled. Fig. 6 is not
codeterministic, while Fig. 7 is.
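Both labeling properties are simple to test on an explicit edge list. The sketch below uses (initial state, terminal state, label) triples and two example graphs whose state labelings are assumptions: a $(d,k)$ graph in which state $i$ counts the zeros since the last 1, and a charge-constrained graph in which the state is the current running digital sum.

```python
def is_deterministic(edges):
    """True if, at each state, the outgoing edges carry distinct labels."""
    seen = set()
    for u, _, label in edges:
        if (u, label) in seen:
            return False
        seen.add((u, label))
    return True


def is_codeterministic(edges):
    """True if, at each state, the incoming edges carry distinct labels."""
    return is_deterministic([(v, u, label) for u, v, label in edges])


def dk_graph(d, k):
    """Assumed (d,k) graph: state i = number of 0s since the last 1."""
    edges = [(i, i + 1, 0) for i in range(k)]
    edges += [(i, 0, 1) for i in range(d, k + 1)]
    return edges


def charge_graph(n_states):
    """Assumed charge-constrained graph: +1 moves up one state, -1 moves down."""
    up = [(i, i + 1, +1) for i in range(n_states - 1)]
    down = [(i + 1, i, -1) for i in range(n_states - 1)]
    return up + down


rll = dk_graph(1, 3)
charge = charge_graph(4)
print(is_deterministic(rll), is_codeterministic(rll))
print(is_deterministic(charge), is_codeterministic(charge))
```

In the runlength graph several states return to state 0 on the symbol 1, which is why codeterminism fails there, while in the charge graph each state is entered by exactly one edge per symbol.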
Many algorithms for constructing constrained codes begin
with a deterministic presentation of the constrained system
and transform it into a presentation which satisﬁes a weaker
version of the deterministic property called ﬁnite anticipation.
A labeled graph $G$ is said to have finite anticipation if there is an
integer $N$ such that any two paths of length $N + 1$ with the
same initial state and labeling must have the same initial edge.
The anticipation $\mathcal{A}(G)$ of $G$ refers to the smallest $N$ for which
this condition holds. Similarly, we define the coanticipation
of a labeled graph $G$ as the anticipation of the labeled graph
obtained by reversing the directions of the edges in $G$.
A labeled graph $G$ has finite memory if there is an integer $N$
such that the paths in $G$ of length $N$ that generate the same
word all terminate at the same state. The smallest $N$ for which
this holds is called the memory of $G$ and is denoted $\mathcal{M}(G)$.
A property related to finite anticipation is that of being
$(m,a)$-definite. A labeled graph has this property if, given
any word $w = w_{-m} w_{-m+1} \cdots w_0 \cdots w_a$, the set of paths
$e_{-m} e_{-m+1} \cdots e_0 \cdots e_a$ that generate $w$ all agree in the edge $e_0$.
A graph with this property is sometimes said to have finite
memory-and-anticipation. Note that, whereas the definition of
finite anticipation involves knowledge of an initial state, the
$(m,a)$-definite property replaces that with knowledge of a
finite amount of memory.
Finally, as mentioned in Section III, a labeled graph is
lossless if any two distinct paths with the same initial state
and terminal state have different labelings.
The graph in Fig. 6 has finite memory $k$, and it is $(k,0)$-definite
because, for any given word $w$ of length at least $k + 1$,
all paths that generate $w$ end with the same edge. In contrast,
the graph in Fig. 7 does not have finite memory and is not
definite.
D. Finite-Type and Almost-Finite-Type Constraints
There are some special classes of constraints, called ﬁnite-
type and almost-ﬁnite type, that play an important role in the
theory and construction of constrained codes. A constrained
system is ﬁnite-type (a term derived from symbolic dynamics
) if it can be presented by a definite graph. Thus the
$(d,k)$-RLL constraint is finite-type.
There is also a useful intrinsic characterization of finite-type
constraints: there is an integer $N$ such that, for any symbol
$b \in \Sigma(S)$ and any word $w \in S$ of length at least $N$, we have
$wb \in S$ if and only if $w'b \in S$, where $w'$ is the suffix of $w$ of
length $N$. The smallest such integer $N$, if any, is called the
memory of $S$ and is denoted by $\mathcal{M}(S)$.
Using this intrinsic characterization, we can show that not
every constrained system of practical interest is ﬁnite-type. In
particular, the charge-constrained system described by Fig. 7
is not. To see this, note that the symbol “ ” can be appended
to the word
but not to the word
Nevertheless, this constrained system falls into a natural
broader class of constrained systems. These systems can be
thought of as “locally finite-type.” More precisely, a constrained
system is almost-finite-type if it can be presented
by a labeled graph that has both finite anticipation and finite
coanticipation.
Since definiteness implies finite anticipation and finite coanticipation,
every finite-type constrained system is also almost-finite-type.
Therefore, the class of almost-finite-type systems
does indeed include all of the finite-type systems. This inclusion
is proper, as can be seen by referring to Fig. 7. There,
we see that the charge-constrained systems are presented by
labeled graphs with zero anticipation (i.e., deterministic) and
zero coanticipation (i.e., codeterministic). Thus these systems
are almost-finite-type, but not finite-type. Constrained systems
used in practical applications are virtually always almost-finite-type.
Another useful property of constrained systems is irreducibility.
A constraint $S$ is irreducible if, for every pair of
words $w, w'$ in $S$, there is a word $z$ such that $wzw'$ is in $S$.
Equivalently, $S$ is irreducible if and only if it is presented by
some irreducible labeled graph. In coding, it usually suffices
to consider irreducible constraints.
Irreducible constrained systems have a distinguished presentation
called the Shannon cover, which is the unique (up
to labeled graph isomorphism) deterministic presentation of
$S$ with a smallest number of states. The Shannon cover
can be used to determine if the constraint is almost-finite-type
or finite-type. More precisely, an irreducible constrained
system is finite-type (respectively, almost-finite-type) if and
only if its Shannon cover has finite memory (respectively,
finite coanticipation).
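Referring to Section III, recall that the (base-2) capacity of
a constrained system $S$ is given by
$$\mathrm{cap}(S) = \lim_{\ell \to \infty} \frac{1}{\ell} \log_2 N(\ell; S)$$
where $N(\ell; S)$ is the number of $\ell$-blocks in $S$. The (base-2)
capacity of an irreducible system can be obtained from the
Shannon cover. In fact, as mentioned in Section III, if $G$ is
any irreducible lossless presentation of $S$, then
$$\mathrm{cap}(S) = \log_2 \lambda(A_G)$$
where $\lambda(A_G)$ denotes the largest real eigenvalue of the adjacency
matrix $A_G$.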
E. Coding Theorems
We now state a series of coding theorems that reﬁne and
strengthen the fundamental coding theorem of Shannon, thus
answering many of the questions posed above. Moreover, the
proofs of these theorems are often constructive, leading to
practical algorithms for code design.
First, we establish some useful notation and terminology.
An encoder usually takes the form of a synchronous ﬁnite-
state machine, as mentioned earlier and shown schematically
in Fig. 9. More precisely, for a constrained system $S$ and a
positive integer $n$, an $(S,n)$-encoder is a labeled graph $\mathcal{E}$
satisfying the following properties: 1) each state of $\mathcal{E}$ has
out-degree $n$, that is, $n$ outgoing edges; 2) $S(\mathcal{E}) \subseteq S$; and 3) the
presentation $\mathcal{E}$ is lossless.
A tagged $(S,n)$-encoder is an $(S,n)$-encoder $\mathcal{E}$ in which
the outgoing edges from each state in $\mathcal{E}$ are assigned distinct
words, or input tags, from an alphabet of size $n$. We will
sometimes use the same symbol $\mathcal{E}$ to denote both a tagged
$(S,n)$-encoder and the underlying $(S,n)$-encoder.
Finally, we define a rate $p:q$ finite-state $(S,n)$-encoder to
be a tagged $(S^q, n^p)$-encoder where the input tags are the
$n$-ary $p$-blocks. We will be primarily concerned with the binary
case, $n = 2$, and will call such an encoder a rate $p:q$
finite-state encoder for $S$. The encoding proceeds in the obvious
fashion, given a selection of an initial state. If the current
state is $u$ and the input data is the $p$-block $s$, the codeword
generated is the $q$-block that labels the outgoing edge $e$ from
state $u$ with input tag $s$. The next encoder state is the terminal
state of the edge $e$. A tagged encoder is illustrated in Fig. 12.
Fig. 12. Rate tagged encoder.
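The encoding procedure just described amounts to a table lookup per input block. As a minimal sketch, the code below runs a two-state tagged encoder for the well-known MFM (Miller) code, a rate 1:2 code for the $(1,3)$ runlength constraint. The table layout, the state names "A" and "B", and the choice of "A" as initial state are assumptions made for this example and are not meant to reproduce Fig. 12.

```python
from itertools import product

# state -> {input tag -> (output 2-block, next state)}
# "A": the previous data bit was 0; "B": the previous data bit was 1.
MFM = {
    "A": {0: ((1, 0), "A"), 1: ((0, 1), "B")},
    "B": {0: ((0, 0), "A"), 1: ((0, 1), "B")},
}


def encode(table, state, blocks):
    """Follow the edge tagged by each input block and emit its label."""
    out = []
    for tag in blocks:
        word, state = table[state][tag]
        out.extend(word)
    return out


def runs_ok(bits, d, k):
    """(d,k) check: no run of 0s longer than k, and at least d zeros
    between any two consecutive 1s."""
    run, seen_one = 0, False
    for bit in bits:
        if bit == 0:
            run += 1
            if run > k:
                return False
        else:
            if seen_one and run < d:
                return False
            seen_one, run = True, 0
    return True


data = (1, 0, 0, 1, 1, 0, 1, 0)
code = encode(MFM, "A", data)
recovered = tuple(code[1::2])   # the second bit of every codeword is the data bit
all_valid = all(runs_ok(encode(MFM, "A", d8), 1, 3)
                for d8 in product((0, 1), repeat=8))
print(code, recovered == data, all_valid)
```

Each state has $2^1$ outgoing edges, as a rate 1:2 encoder requires, and decoding needs only the current codeword, so this particular encoder admits a block decoder even though the encoder itself is state-dependent.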
1) Block Encoders: We first consider the construction of the structurally simplest type of encoder, namely, a block encoder. A rate $p:q$ finite-state $(S, n)$-encoder is called a rate $p:q$ block $(S, n)$-encoder if it contains only one state. Block encoders have played an important role in digital storage systems.
The following theorem states that block encoders can be
used to asymptotically approach capacity. It follows essen-
tially from Shannon’s proof of the fundamental theorem for
Block-Coding Theorem: Let $S$ be an irreducible constrained system and let $n$ be a positive integer. There exists a sequence of rate $p_m : q_m$ block $(S, n)$-encoders such that
$$\lim_{m \to \infty} \frac{p_m}{q_m} = \mathrm{cap}(S).$$
The next result provides a characterization of all block encoders.
Block Code Characterization: Let $S$ be a constrained system with a deterministic presentation $G$ and let $n$ be a positive integer. Then there exists a block $(S, n)$-encoder if and only if there exists a subgraph $H$ of $G$ and a collection $\mathcal{L}$ of $n$ symbols of the alphabet of $S$, such that $\mathcal{L}$ is the set of labels of the outgoing edges from each state in $H$.
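Freiman and Wyner  developed a procedure that can be used to determine whether there exists a block $(S^q, n)$-encoder for a given constrained system $S$ with finite memory $M \le q$. Specifically, let $G$ be a deterministic presentation of $S$. For every pair of states $u$ and $v$ in $G$, consider the set $\mathcal{F}^q(u, v)$ of all words of length $q$ that can be generated in $G$ by paths that start at $u$ and terminate at $v$. To identify a subgraph $H$ of $G^q$ as in the block-code characterization, we search for a set $P$ of states in $G$ satisfying
$$\left| \bigcap_{u \in P} \bigcup_{v \in P} \mathcal{F}^q(u, v) \right| \ge n.$$
Freiman and Wyner  simplify the search by proving that, when $G$ has finite memory $M \le q$, it suffices to consider sets $P$ which are complete; namely, if $u$ is in $P$ and $\mathcal{F}(u) \subseteq \mathcal{F}(u')$, where $\mathcal{F}(u)$ denotes the set of words generated by paths starting at $u$, then $u'$ is also in $P$.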
Even with the restriction of the search to complete sets,
this block-code design procedure is not efﬁcient, in general.
However, given and , for certain constrained systems
, such as the -RLL constraints, it does allow us to
effectively compute the largest $n$ for which there exists a block $(S^q, n)$-encoder. In fact, the procedure can be used to find a largest possible set of self-concatenable words of length $q$.
[Table I: optimal length-5 list]
[Table II: variable-length block encoder]
Block Encoder Examples: Digital magnetic-tape systems have utilized block codes satisfying $(d, k) = (0, k)$ constraints, for $k = 1$ and $k = 2$. Specifically, the codes, with rates $1:2$ and $4:5$, respectively, were derived from optimal lists of sizes $n = 2$ and $n = 17$, respectively. The simple rate $1:2$, $(d, k) = (0, 1)$ code, known as the Frequency Modulation code, consists of two codewords of length 2. The 17 words of the $(d, k) = (0, 2)$ list are shown in Table I. The 16 words remaining after deletion of the all-1's word form the codebook for the rate $4:5$ Group Code Recording (GCR) code, which became the industry standard for nine-track tape drives. The input tag assignments are also shown in the table. See  for further details.
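A list of 17 freely concatenable words can be reproduced by brute-force enumeration. The sketch below assumes that the list consists of the length-5 words with zero runs of length at most 2 and at most one leading and one trailing zero; that boundary rule is an assumption made here for illustration, chosen so that any two words concatenate without exceeding a zero run of 2.

```python
from itertools import product

def max_zero_run(word):
    """Length of the longest run of zeros in a binary string."""
    return max(len(run) for run in word.split("1"))

def self_concatenable_list(length, k, lead, trail):
    """Words with zero runs <= k, at most `lead` leading zeros and at most
    `trail` trailing zeros (the boundary rule is an assumed convention)."""
    words = []
    for bits in product("01", repeat=length):
        w = "".join(bits)
        leading = len(w) - len(w.lstrip("0"))
        trailing = len(w) - len(w.rstrip("0"))
        if max_zero_run(w) <= k and leading <= lead and trailing <= trail:
            words.append(w)
    return words

words = self_concatenable_list(5, 2, 1, 1)
codebook = [w for w in words if w != "11111"]   # drop the all-1's word
print(len(words), len(codebook))
```

Under these assumptions the enumeration yields 17 words, and 16 remain after the all-1's word is removed, which is enough to carry 4 input bits per codeword.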
A rate $1:2$, $(d, k) = (2, 7)$ code, developed by Franaszek, became an industry standard in disk drives using peak detection. It can be described as a variable-length block code, and was derived using a similar search method. The encoder table is shown in Table II.
Disk drives using PRML techniques have incorporated a block code satisfying $(0, G/I) = (0, 4/4)$ constraints. The code, with rate $8:9$, was derived from the unique maximum size list of size $n = 279$. The list has a very simple description. It is the set of length-9 binary words satisfying the following three conditions: 1) the maximum runlength of zeros within the word is no more than 4; 2) the maximum runlengths of zeros at the beginning and end of the word are no more than 2; and 3) the maximum runlengths of zeros at the beginning and end of the even interleave and odd interleave of the word are no more than 2. A rate $16:17$, $(0, G/I) = (0, 6/6)$ block code, derived from an optimal list of length-17 words with an analogous definition, has also been designed for use in PRML systems.
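The three conditions defining such a list translate directly into a filter over all binary words. The sketch below assumes the parameter values usually quoted for the rate $8:9$ PRML code (length 9, zero runs of at most 4 inside the word, at most 2 at either end of the word, and at most 2 at either end of each interleave); these values and all function names are assumptions made for illustration.

```python
from itertools import product

def zeros_at_start(bits):
    return len(bits) - len(bits.lstrip("0"))

def zeros_at_end(bits):
    return len(bits) - len(bits.rstrip("0"))

def longest_zero_run(bits):
    return max(len(run) for run in bits.split("1"))

def interleaved_list(length, g, edge, interleave_edge):
    """Words meeting the three conditions: global zero run <= g, zero runs
    at the word boundaries <= edge, and zero runs at the boundaries of the
    two interleaves <= interleave_edge."""
    keep = []
    for t in product("01", repeat=length):
        w = "".join(t)
        interleaves = (w[0::2], w[1::2])
        if (longest_zero_run(w) <= g
                and zeros_at_start(w) <= edge
                and zeros_at_end(w) <= edge
                and all(zeros_at_start(s) <= interleave_edge
                        and zeros_at_end(s) <= interleave_edge
                        for s in interleaves)):
            keep.append(w)
    return keep

print(len(interleaved_list(9, 4, 2, 2)))
```

With these assumed parameters the enumeration gives 279 words, comfortably more than the 256 needed for 8 input bits.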
2) Deterministic Encoders: Block encoders, although conceptually simple, may not be suitable in many cases, since they might require a prohibitively large value of $q$ in order to achieve the desired rate. Allowing multiple states in the encoder can reduce the required codeword length. If each state in $G$ has at least $n$ outgoing edges, then we can obtain a deterministic $(S, n)$-encoder by deleting excess edges. In fact, it is sufficient (and necessary) for $G$ to have a subgraph where each state satisfies this condition. This result, characterizing deterministic encoders, is stated by Franaszek.
Deterministic Encoder Characterization: Let $S$ be a constrained system with a deterministic presentation $G$ and let $n$ be a positive integer. Then there exists a deterministic $(S, n)$-encoder if and only if there exists such an encoder which is a subgraph of $G$.
Let $G$ be a deterministic presentation of a constrained system $S$. According to the characterization, we can derive from $G$ a deterministic $(S, n)$-encoder if and only if there exists a set $P$ of states in $G$, called a set of principal states, such that
$$\sum_{v \in P} (A_G)_{u,v} \ge n \quad \text{for every } u \in P.$$
This inequality can be expressed in terms of the characteristic vector $x = (x_u)$ of the set of states $P$, where $x_u = 1$ if $u \in P$ and $x_u = 0$ otherwise. Then, $P$ is a set of principal states if and only if
$$A_G\, x \ge n\, x. \tag{16}$$
We digress briefly to discuss the significance of this inequality. Given a nonnegative integer square matrix $A$ and an integer $n$, an $(A, n)$-approximate eigenvector is a nonnegative integer vector $x \ne 0$ satisfying
$$A x \ge n x \tag{17}$$
where the inequality holds componentwise. We refer to this inequality as the approximate eigenvector inequality, and we denote the set of all $(A, n)$-approximate eigenvectors by $\mathcal{X}(A, n)$. Approximate eigenvectors will play an essential role in the constructive proof of the finite-state coding theorem in the next section, as they do in many code-construction methods.
The existence of approximate eigenvectors is guaranteed by the Perron–Frobenius theory. Specifically, let $\lambda$ be the largest positive eigenvalue of $A$, and let $n$ be a positive integer satisfying $n \le \lambda$. Then there exists a vector $x \ne 0$, with nonnegative integer components, satisfying (17). The following algorithm, taken from  and due originally to Franaszek, is an approach to finding such a vector.
Fig. 13. Rate $1:2$ MFM encoder.
Franaszek Algorithm for Finding an Approximate Eigenvector: Choose an initial vector $x^{(0)}$ whose entries are $x^{(0)}_u = L$, where $L$ is a nonnegative integer. Define inductively
$$x^{(i+1)}_u = \min\left( x^{(i)}_u,\ \left\lfloor \frac{1}{n} \sum_{v} A_{u,v}\, x^{(i)}_v \right\rfloor \right).$$
Let $x = x^{(i)}$, where $i$ is the first integer such that $x^{(i+1)} = x^{(i)}$. There are two situations that can arise: a) $x \ne 0$ and b) $x = 0$. Case a) means that we have found an approximate eigenvector, and in case b) there is no solution, so we increase $L$ and start from the top again. There may be multiple solutions for the vector $x$. The choice of the vector $x$ may affect the complexity of the code constructed in this way. The components of $x$ are often called weights.
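The iteration is short enough to write out in full. The following sketch implements the componentwise update with integer floor division and applies it to the second power of a four-state presentation of the $(1,3)$ constraint; the state labeling (number of zeros since the last one) and the helper names are assumptions made for this example.

```python
def matmul(A, B):
    """Product of two square integer matrices given as lists of lists."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def franaszek(A, n, L):
    """Iterate x <- min(x, floor(A x / n)) componentwise, starting from the
    all-L vector, until the vector stops changing. A nonzero result is an
    (A, n)-approximate eigenvector; a zero result means L was too small or
    no such vector exists."""
    x = [L] * len(A)
    while True:
        nxt = [min(x[u], sum(A[u][v] * x[v] for v in range(len(A))) // n)
               for u in range(len(A))]
        if nxt == x:
            return x
        x = nxt

def is_approximate_eigenvector(A, n, x):
    """Check A x >= n x componentwise for a nonzero nonnegative vector x."""
    Ax = [sum(a * b for a, b in zip(row, x)) for row in A]
    return any(x) and all(ax >= n * xi for ax, xi in zip(Ax, x))

# Assumed (1,3) presentation: state i = number of zeros since the last one.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 0]]
x = franaszek(matmul(A, A), 2, 1)
print(x)
```

Started from the all-1's vector, the output is a 0/1 vector, so its support can be read as a candidate set of principal states for a rate $1:2$ deterministic encoder.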
From (16), it follows that $P$ is a set of principal states if and only if the characteristic vector $x$ is an $(A_G, n)$-approximate eigenvector. Hence, we can find whether there is a deterministic $(S, n)$-encoder by applying the Franaszek algorithm to the matrix $A = A_G$, the integer $n$, and the all-1's vector as the initial vector $x^{(0)}$. A nonzero output vector $x$ is a necessary and sufficient condition for the existence of a set of principal states, for which $x$ is then a characteristic vector.
Deterministic Encoder Example: The rate $1:2$, $(d, k) = (1, 3)$ encoder, known as Modified Frequency Modulation code, Miller code, or Delay Modulation, is a deterministic encoder. The encoder is derived from the second power of the Shannon cover of the $(d, k) = (1, 3)$ constraint. A set of
principal states is Fig. 13 shows a rate
deterministic encoder. In fact, the tagged encoder in Fig. 12
is a simpler description of the MFM tagged encoder obtained
by “merging” states and in Fig. 13. (See Section IV-F for
more on merging of states.)
3) Finite-State Coding Theorem: Although deterministic encoders can overcome some of the limitations of block encoders, further improvements may arise if we relax the deterministic property. In this section, we show that, for a desired rate $p:q$ where $p/q \le \mathrm{cap}(S)$, even though a deterministic encoder may not exist, a finite-state encoder always does.
If an encoder $\mathcal{E}$ has finite anticipation $\mathcal{A}$, then we can decode in a state-dependent manner, beginning at the initial state $u_0$, and retracing the path followed by the encoder, as follows. If the current state is $u$, then the current codeword to be decoded, together with the $\mathcal{A}$ upcoming codewords, constitute a word of length $\mathcal{A} + 1$ (measured in $q$-blocks) that is generated by a path that starts at $u$. By definition of anticipation, the initial edge $e$ of such a path is uniquely determined; the decoded $p$-block is the input tag of $e$, and the next decoder state is the terminal state of $e$.
This decoding method will invert the encoder when applied
to valid codeword sequences. The output of the decoder will
be identical to the input to the encoder, possibly with a shift of $\mathcal{A}$ input $p$-blocks.
The following theorem establishes that, with finite anticipation, invertible encoders can achieve all rational rates less than or equal to capacity, with any input and output blocklengths $p$ and $q$ satisfying $p/q \le \mathrm{cap}(S)$.
Finite-State Coding Theorem: Let $S$ be a constrained system. If $p/q \le \mathrm{cap}(S)$, then there exists a rate $p:q$ finite-state $(S, n)$-encoder with finite anticipation.
The theorem improves upon Shannon's result in three important ways. First, the proof is constructive, relying upon the state-splitting algorithm, which will be discussed in Section IV-F. Next, it proves the existence of finite-state $(S, n)$-encoders that achieve rate equal to the capacity $\mathrm{cap}(S)$, when $\mathrm{cap}(S)$ is rational. Finally, for any positive integers $p$ and $q$ satisfying the inequality $p/q \le \mathrm{cap}(S)$, there is a rate $p:q$ finite-state $(S, n)$-encoder that operates at rate $p:q$. In particular, choosing $p$ and $q$ relatively prime, one can design an invertible encoder using the smallest possible codeword length $q$ compatible with the chosen rate $p/q$.
For completeness, we also state the more simply proved finite-state inverse-to-coding theorem.
Finite-State Inverse-to-Coding Theorem: Let $S$ be a constrained system. Then, there exists a rate $p:q$ finite-state $(S, n)$-encoder only if $p/q \le \mathrm{cap}(S)$.
4) Sliding-Block Codes and Block-Decodable Codes: As
mentioned earlier, it is often desirable for ﬁnite-state encoders
to have decoders that limit the extent of error propagation.
The results in this section address the design of encoders with
sliding-block decoders, which we now formally deﬁne.
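Let $m$ and $a$ be integers such that $m + a \ge 0$. A sliding-block decoder for a rate $p:q$ finite-state $(S, n)$-encoder is a mapping $\mathcal{D}$ from $(m + a + 1)$-tuples of $q$-blocks to $n$-ary $p$-blocks such that, if $w = w_0 w_1 w_2 \cdots$ is any sequence of $q$-blocks generated by the encoder from the input tag sequence of $p$-blocks $s = s_0 s_1 s_2 \cdots$, then, for $i \ge m$,
$$s_i = \mathcal{D}(w_{i-m}, \ldots, w_i, \ldots, w_{i+a}).$$
We call $a$ the look-ahead of $\mathcal{D}$ and $m$ the look-behind of $\mathcal{D}$. The sum $m + a + 1$ is called the decoding window length of $\mathcal{D}$.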
See Fig. 10, where and
[Table III: rate $2:3$ sliding-block-decodable encoder]
As mentioned earlier, a single error at the input to a sliding-block decoder can only affect the decoding of $q$-blocks that fall in a "window" of length at most $m + a + 1$, measured in $q$-blocks. Thus a sliding-block decoder controls the extent of error propagation.
The following result, due to Adler, Coppersmith, and Hassner, improves upon the finite-state coding theorem for finite-type constrained systems.
Sliding-Block Code Theorem for Finite-Type Systems: Let $S$ be a finite-type constrained system. If $p/q \le \mathrm{cap}(S)$, then there exists a rate $p:q$ finite-state $(S, n)$-encoder with a sliding-block decoder.
This result, sometimes called the ACH theorem, follows readily from the proof of the finite-state coding theorem. The constructive proof technique, based upon state-splitting, is sometimes referred to as the ACH algorithm (see Section IV-F).
Sliding-Block Code Example: The $(d, k) = (1, 7)$ constraint has capacity $\mathrm{cap} \approx 0.6793$. Adler, Hassner, and Moussouris  used the state-splitting algorithm to construct a rate $2:3$, $(d, k) = (1, 7)$ encoder with five states, represented in tabular form in Table III. Entries in the "state" columns indicate the output word and next encoder state. With the input tagging shown, the encoder is sliding-block decodable, and the decoder error propagation is limited to five input bits. The same underlying encoder graph was independently constructed by Jacoby  using "look-ahead" code design techniques. Weathers and Wolf  applied the state-splitting algorithm to design a 4-state, $(d, k) = (1, 7)$ sliding-block-decodable encoder with error propagation at most 5 input bits. This encoder has the distinction of achieving the smallest possible number of states for this constraint and rate.
A block-decodable encoder is a special case of $(m, a)$-sliding-block decodable encoders where both $m$ and $a$ are zero. Because of the favorable implications for error propagation, a block-decodable encoder is often sought in practice. The following result characterizes these encoders completely.
Block-Decodable Encoder Characterization: Let $S$ be a constrained system with a deterministic presentation $G$ and let $n$ be a positive integer. Then there exists a block-decodable $(S, n)$-encoder if and only if there exists such an encoder which is a subgraph of $G$.
It has been shown that the general problem of deciding whether a particular subgraph of $G$ can be input-tagged in such a way as to produce a block-decodable encoder is NP-complete. Nevertheless, for certain classes of constraints, and many other specific examples, such an input-tag assignment can be found.
Block-Decodable Code Examples: For certain irreducible constrained systems, including powers of $(d, k)$-RLL constrained systems, Franaszek showed that whenever there is a deterministic $(S, n)$-encoder which is a subgraph of the Shannon cover, there is also such an encoder that can be tagged so that it is block-decodable. In fact, the MFM encoder of Fig. 13 is block-decodable.
For $(d, k)$-RLL constrained systems, an explicit description of such a labeling was found by Gu and Fuja  and, independently, by Tjalkens. They show that their labeling yields the largest rate attainable by any block-decodable encoder for any given $(d, k)$-RLL constrained system.
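A sliding-block decoder applies one fixed function to every window of received blocks, and block decodability is the special case of a window of a single block. The sketch below is illustrative only: it assumes the standard MFM rule in which the data bit equals the second channel bit of each 2-bit codeword, and the function names are hypothetical.

```python
def sliding_block_decode(blocks, D, m, a):
    """Apply D to each window (w[i-m], ..., w[i+a]). Only positions that
    have a complete window are decoded."""
    return [D(tuple(blocks[i - m:i + a + 1]))
            for i in range(m, len(blocks) - a)]

def mfm_block_decoder(window):
    """Assumed MFM rule: the data bit is the second bit of the codeword."""
    (codeword,) = window
    return codeword[1]

received = ["01", "00", "10", "01", "01", "00"]
print(sliding_block_decode(received, mfm_block_decoder, 0, 0))

# A single corrupted codeword changes at most one decoded bit when m = a = 0.
corrupted = list(received)
corrupted[2] = "11"
print(sliding_block_decode(corrupted, mfm_block_decoder, 0, 0))
```

With $m = a = 0$ the window length is one block, so an isolated channel error cannot spread beyond the block in which it occurs.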
The Gu–Fuja construction is a generalization of a coding
scheme introduced by Beenker and Immink . The under-
lying idea, which is quite generally applicable, is to design
block-decodable encoders by using merging bits between con-
strained words , , . Each input -block has a
unique constrained -block representation, where The
encoder uses a look-up table for translating source words into
constrained words of length plus some logic circuitry for
determining the merging bits. Decoding is extremely
simple: discard the merging bits and translate the -bit word
into the -bit source word.
For sequences, the encoder makes use of the set
of all -constrained -blocks with at least
leading zeroes and at most trailing zeroes. The parameters
are assumed to satisfy and
Using a look-up table or enumeration techniques [102, p.
117], , , the encoder maps each of the -bit
input tags to a unique -block in where
The codewords in
are not necessarily freely concatenable, however. When the
concatenation of the current codeword with the preceding
one violates the constraint, the encoder inverts one
of the ﬁrst zeroes in the current codeword. The condition
guarantees that such an inversion can always
resolve the constraint violation. In this case, the ﬁrst bits
of each codeword may be regarded as the merging bits.
Immink  gave a constructive proof that codes
with merging bits can be made for which
As a result, codes with a rate only 0.1% less
than Shannon’s capacity can be constructed with codewords
of length Such long codewords could present an
additional practical problem—beyond that of mapping the
input words to the constrained words, which can be handled by
enumerative coding—because a single channel bit error could
corrupt the entire data in the decoded word. One proposal for
resolving this difﬁculty is to use a special conﬁguration of the
error-correcting code and the recording code , , .
Another well-known application of this method is that of the Eight-to-Fourteen Modulation (EFM) code, a rate $8:17$ code which is implemented in the compact audio disc. A collection of 256 codewords is drawn from the set of length-14 words that satisfy the $(d, k) = (2, 10)$ constraint. With this codebook, two merging bits would suffice to achieve a rate $8:16$, $(d, k) = (2, 10)$ block-decodable code. However, in order to induce more favorable low-frequency spectral characteristics in the recorded code sequences, the encoding algorithm introduces an additional merging bit, yielding the rate $8:17$ block-decodable EFM encoder.
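That a codebook of 256 words is available can be confirmed by counting. The sketch below assumes the constraint parameters $(d, k) = (2, 10)$ and word length 14 that are standard for EFM, and counts words in which consecutive ones are separated by at least $d$ zeros and no zero run, including those at the word boundaries, exceeds $k$.

```python
from itertools import product

def satisfies_dk(word, d, k):
    """True if every zero run between two ones has length >= d and every
    zero run (including leading and trailing runs) has length <= k."""
    runs = word.split("1")
    interior = runs[1:-1]          # runs enclosed by ones on both sides
    if any(len(run) < d for run in interior):
        return False
    return all(len(run) <= k for run in runs)

candidates = ["".join(bits) for bits in product("01", repeat=14)
              if satisfies_dk("".join(bits), 2, 10)]
print(len(candidates))
```

The enumeration gives 267 candidate words under these assumptions, so 256 of them can be selected to represent the 8-bit source words.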
5) Extensions: In this section, we present strengthened versions of both the finite-state coding theorem and the ACH theorem.
A noncatastrophic encoder is a tagged $(S, n)$-encoder with finite anticipation and the additional property that, whenever the sequences of output labels of two right-infinite paths differ in only finitely many places, the corresponding sequences of input tags also differ in only finitely many places. A rate $p:q$ finite-state tagged $(S, n)$-encoder is noncatastrophic if the corresponding tagged $(S^q, n^p)$-encoder is noncatastrophic.
Noncatastrophic encoders restrict error propagation in the
sense that they limit the number of decoded data errors
spawned by an isolated channel error. They do not necessarily
limit the time span in which these errors occur. The concept
of noncatastrophicity appears in the theory of convolutional
codes, as well, where it actually coincides with sliding-block
decodability [137, Ch. 10].
The following theorem is due to Karabed and Marcus.
Noncatastrophic Encoder Theorem: Let $S$ be a constrained system. If $p/q \le \mathrm{cap}(S)$, then there exists a noncatastrophic rate $p:q$ finite-state $(S, n)$-encoder.
For the noncatastrophic encoders constructed in the proof of the theorem, the decoding errors generated by a single channel error are, in fact, confined to two bursts of finite length, although these bursts may appear arbitrarily far apart.
Karabed and Marcus also extended the ACH theorem to almost-finite-type systems.
Sliding-Block Code Theorem for Almost-Finite-Type Systems: Let $S$ be an almost-finite-type constrained system. If $p/q \le \mathrm{cap}(S)$, then there exists a rate $p:q$ finite-state $(S, n)$-encoder with a sliding-block decoder.
The proof of this result is quite complicated. Although it
does not translate as readily as the proof of the ACH theorem
into a practical encoder design algorithm, the proof does
introduce new and powerful techniques that, in combination
with the state-splitting approach, can be applied effectively in
For example, some of these techniques were used in the de-
sign of a 100% efﬁcient, sliding-block-decodable encoder for a
combined charge-constrained runlength-
limited system . In fact, it was the quest for such an encoder
that provided the original motivation for the theorem. Several
of the ideas in the proof of this generalization of the ACH
theorem from ﬁnite-type to almost-ﬁnite-type systems have
also played a role in the design of coded-modulation schemes
based upon spectral-null constraints, discussed in Section V-C.
F. The State-Splitting Algorithm
There are many techniques available to construct efﬁ-
cient ﬁnite-state encoders. The majority of these construction
techniques employ approximate eigenvectors to guide the
construction process. Among these code design techniques
is the state-splitting algorithm (or ACH algorithm) intro-
duced by Adler, Coppersmith, and Hassner . It implements
the proof of the ﬁnite-state coding theorem and provides a
recipe for constructing ﬁnite-state encoders that, for ﬁnite-type
constraints, are sliding-block-decodable. The state-splitting
approach combines ideas found in Patel’s construction of
the Zero-Modulation (ZM) code  and earlier work of
Franaszek – with concepts and results from the math-
ematical theory of symbolic dynamics .
The ACH algorithm proceeds roughly as follows. For a given deterministic presentation $G$ of a constrained system $S$ and an achievable rate $p/q \le \mathrm{cap}(S)$, we iteratively apply a state-splitting transformation beginning with the $q$th-power graph $G^q$. The choice of transformation at each step is guided by an approximate eigenvector, which is updated at each iteration. The procedure culminates in a new presentation of $S^q$ with at least $n^p$ outgoing edges at each state. After deleting edges, we are left with an $(S^q, n^p)$-encoder, which, when tagged, gives our desired rate $p:q$ finite-state $(S, n)$-encoder. (Note that, if $S$ is finite-type, the encoder is sliding-block-decodable regardless of the assignment of input tags.)
In view of its importance in the theory and practice of
code design, we now present the state-splitting algorithm
in more detail. This discussion follows , to which we
refer the reader for further details. The basic step in the
procedure is an out-splitting of a graph, and, more speciﬁcally,
an approximate-eigenvector consistent out-splitting, both of
which we now describe.
An out-splitting of a labeled graph $H$ begins with a partition of the set $E_u$ of outgoing edges for each state $u$ in $H$ into $N(u)$ disjoint subsets
$$E_u = E_u^{(1)} \cup E_u^{(2)} \cup \cdots \cup E_u^{(N(u))}.$$
The partition is used to derive a new labeled graph $H'$. The set of states of $H'$ consists of $N(u)$ descendant states $u^{(1)}, u^{(2)}, \ldots, u^{(N(u))}$ for every state $u$ of $H$. Outgoing edges from state $u$ in $H$ are partitioned among its descendant states and replicated in $H'$ to each of the descendant terminal states in the following manner. For each edge $e$ from $u$ to $v$ in $H$, we determine the partition element $E_u^{(i)}$ to which $e$ belongs, and endow $H'$ with edges $e^{(r)}$ from $u^{(i)}$ to $v^{(r)}$ for $r = 1, 2, \ldots, N(v)$. The label on the edge $e^{(r)}$ in $H'$ is the same as the label of the edge $e$ in $H$. (Sometimes an out-splitting is called a round of out-splitting to indicate that several states may have been split simultaneously.) The resulting graph $H'$ generates the same system $S$, and has anticipation at most $\mathcal{A}(H) + 1$. Figs. 14 and 15 illustrate an out-splitting operation.
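The out-splitting operation can be sketched on a small example. The code below represents a labeled graph as a list of `(initial state, terminal state, label)` triples and a partition as lists of edge indices; this representation, the function names, and the two-state example graph (binary sequences with no two consecutive zeros) are assumptions made for illustration and do not reproduce Figs. 14 and 15.

```python
def out_split(edges, partition):
    """edges: list of (u, v, label). partition: {state: [[edge indices], ...]}
    giving the subsets of outgoing edges of each state to be split; states
    not listed keep all outgoing edges in a single subset."""
    states = {u for u, _, _ in edges} | {v for _, v, _ in edges}
    blocks = {s: partition.get(s, [[i for i, e in enumerate(edges) if e[0] == s]])
              for s in states}
    new_edges = []
    for s in states:
        for i, block in enumerate(blocks[s]):
            for idx in block:
                _, v, label = edges[idx]
                # replicate the edge to every descendant of its terminal state
                for r in range(len(blocks[v])):
                    new_edges.append(((s, i), (v, r), label))
    return new_edges

def words(edges, length):
    """Set of all label sequences of the given length generated by paths."""
    paths = {(s, "") for e in edges for s in (e[0], e[1])}
    for _ in range(length):
        paths = {(v, w + lab) for (s, w) in paths
                 for (u, v, lab) in edges if u == s}
    return {w for _, w in paths}

# State 0: last symbol was 1; state 1: last symbol was 0 (no "00" allowed).
graph = [(0, 0, "1"), (0, 1, "0"), (1, 0, "1")]
split = out_split(graph, {0: [[0], [1]]})
print(len(split), words(graph, 6) == words(split, 6))
```

In this example state 0 is split into two descendants, one per outgoing edge, and the split graph generates the same words as the original while no longer being deterministic.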
Given a labeled graph $H$, a positive integer $n$, and an $(A_H, n)$-approximate eigenvector $x = (x_v)$, an $x$-consistent partition of $H$ is defined by partitioning the set $E_u$ of outgoing edges for each state $u$ in $H$ into $N(u)$ disjoint subsets
$$E_u = E_u^{(1)} \cup E_u^{(2)} \cup \cdots \cup E_u^{(N(u))}$$
with the property that
$$\sum_{e \in E_u^{(r)}} x_{\tau(e)} \ge n\, y_u^{(r)} \quad \text{for } r = 1, 2, \ldots, N(u) \tag{18}$$
where $\tau(e)$ denotes the terminal state of the edge $e$, $y_u^{(r)}$ are nonnegative integers, and
$$\sum_{r=1}^{N(u)} y_u^{(r)} = x_u \quad \text{for every state } u. \tag{19}$$
The out-splitting based upon such a partition is called an $x$-consistent splitting. The vector $x'$ indexed by the states $u^{(r)}$ of the split graph $H'$ and defined by $x'_{u^{(r)}} = y_u^{(r)}$ is called the induced vector. An $x$-consistent partition or splitting is called nontrivial if $N(u) \ge 2$ for at least one state $u$ and both $y_u^{(1)}$ and $y_u^{(2)}$ are positive. Figs. 16 and 17 illustrate an $x$-consistent splitting.
Fig. 14. Before out-splitting.
Fig. 15. After out-splitting.
Fig. 16. Before $x$-consistent out-splitting.
Fig. 17. After $x$-consistent out-splitting.
We now summarize the steps in the state-splitting algorithm for constructing a finite-state encoder with finite anticipation.
The State-Splitting Algorithm:
1) Select a labeled graph $G$ and integers $p$ and $q$ as follows:
a) Find a deterministic labeled graph $G$ (or more generally a labeled graph with finite anticipation) which presents the given constrained system $S$ (most constrained systems have a natural deterministic representation that is used to describe them in the first place).
b) Find the adjacency matrix $A_G$ of $G$.
c) Compute the capacity $\mathrm{cap}(S)$.
d) Select a desired code rate $p:q$ satisfying $p/q \le \mathrm{cap}(S)$ (one usually wants to keep $p$ and $q$ relatively small for complexity reasons).
2) Construct $G^q$.
3) Using the Franaszek algorithm of Section IV-E2, find an $(A_G^q, n^p)$-approximate eigenvector $x$.
4) Eliminate all states $u$ with $x_u = 0$ from $G^q$, and restrict to an irreducible sink $H$ of the resulting graph, meaning a maximal irreducible subgraph with the property that all edges with initial states in $H$ have their terminal states in $H$. Restrict $x$ to be indexed by the states of $H$.
5) Iterate steps 5a)–5c) below until the labeled graph $H$ has at least $n^p$ edges outgoing from each state:
a) Find a nontrivial $x$-consistent partition of the edges in $H$. (This can be shown to be possible with a state of maximum weight.)
b) Find the $x$-consistent splitting corresponding to this partition, creating a labeled graph $H'$ and an approximate eigenvector $x'$.
c) Replace $H$ by $H'$ and $x$ by $x'$.
6) At each state of $H$, delete all but $n^p$ outgoing edges and tag the remaining edges with $n$-ary $p$-blocks, one for each outgoing edge. This gives a rate $p:q$ finite-state $(S, n)$-encoder.
At every iteration, at least one state is split in a nontrivial way. Since a state $v$ with weight $x_v$ will be split into at most $x_v$ descendant states throughout the whole iteration process, the number of iterations required to generate the encoder graph $\mathcal{E}$ is no more than $\sum_v (x_v - 1)$. Therefore, the anticipation of $\mathcal{E}$ is at most $\sum_v (x_v - 1)$. For the same reason, the number of states in $\mathcal{E}$ is at most $\sum_v x_v$.
The operations of taking higher powers and out-splitting preserve definiteness (although the anticipation may increase under out-splitting). Therefore, if $S$ is finite-type and $G$ is a finite-memory presentation of $S$, any $(S^q, n^p)$-encoder constructed by the state-splitting algorithm will be $(m, a)$-definite for some $m$ and $a$ and, therefore, sliding-block-decodable.
made completely systematic, in the sense that a computer
program can be devised to automatically generate an encoder
and decoder for any valid code rate. Nevertheless, the appli-
cation of the method to just about any nontrivial code design
problem will beneﬁt from the interactive involvement of the
code designers. There are some practical tools that can help
the designer make “good” choices during the construction
process, meaning choices that optimize certain measures of
performance and complexity. Among them is state merging, a
technique that can be used to simplify the encoder produced
by the ACH algorithm, as we now describe.
Let $H$ be a labeled graph and let $u$ and $u'$ be two states in $H$ such that $\mathcal{F}(u) \subseteq \mathcal{F}(u')$, where $\mathcal{F}(\cdot)$ denotes the set of words generated by paths starting at a state. Suppose that $x$ is an $(A_H, n)$-approximate eigenvector, and that $x_u = x_{u'}$. The $(u, u')$-merger of $H$ is the labeled graph $H'$ obtained from $H$ by: 1) eliminating all edges in $E_{u'}$; 2) redirecting into state $u$ all remaining edges coming into state $u'$; and 3) eliminating the state $u'$. It is straightforward to show that $S(H') \subseteq S(H)$, and the vector $x'$ defined by $x'_v = x_v$ for all vertices $v$ of $H'$ is an $(A_{H'}, n)$-approximate eigenvector. This operation reduces the final number of encoder states by $x_u$. The general problem of determining when to apply state merging during the state-splitting procedure in order to achieve the minimum number of states in the final encoder remains open.
It is also desirable to minimize the sliding-block decoder
window size, in order to limit error propagation as well as
decoder complexity. There are several elements of the code
design that inﬂuence the window size, such as initial presen-
tation, choice of approximate eigenvector, selection of out-
splittings, excess edge elimination, and input tag assignment.
There are approaches that, in some cases, can be used during
the application of the state-splitting algorithm to help reduce
the size of the decoder window, but the problem of minimizing
the window size remains open. In this context, it should
be noted that there are alternative code-design procedures
that provide very useful heuristics for constructing sliding-
block-decodable encoders with small decoding window. They
also imply useful upper bounds on the minimum size of the
decoding window and on the smallest possible anticipation
(or decoding delay) . In particular, Hollmann  has
recently developed an approach, inﬂuenced by earlier work of
Immink , which combines the state-splitting method with
a generalized look-ahead encoding technique called bounded-
delay encoding, originally introduced by Franaszek , .
In a number of cases, it was found that this hybrid code design
technique produced a sliding-block-decodable encoder with
smaller window length than was achieved using other methods.
Several examples of such codes for speciﬁc constraints of
practical importance were constructed in .
For more extensive discussion of complexity measures and
bounds, as well as brief descriptions of other general code
construction methods, the reader is referred to .
G. Universality of State Splitting
The guarantee of a sliding-block decoder when $S$ is finite-type, along with the explicit bound on the decoder window length, represent key strengths of the state-splitting algorithm. Another important property is its universality. In this context, we think of the state-splitting algorithm as comprising a selection of a deterministic presentation $G$ of a constrained system $S$, an $(A_G, n)$-approximate eigenvector $x$, a sequence of $x$-consistent out-splittings, followed by deletion of excess edges, and finally an input-tag assignment, resulting in a tagged $(S, n)$-encoder.
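For integers $m$, $a$ and a function $\mathcal{D}$ from $(m + a + 1)$-blocks of $S$ to the $n$-ary alphabet (such as a sliding-block decoder), we define $\mathcal{D}_\infty^{m,a}$ to be the induced mapping on bi-infinite sequences given by
$$\mathcal{D}_\infty^{m,a}(\cdots w_{-1} w_0 w_1 \cdots) = \cdots s_{-1} s_0 s_1 \cdots, \quad \text{where } s_i = \mathcal{D}(w_{i-m}, \ldots, w_{i+a}).$$
For convenience, we use the notation $\mathcal{D}_\infty$ to denote $\mathcal{D}_\infty^{m,a}$. For a tagged $(S, n)$-encoder $\mathcal{E}$ with sliding-block decoder $\mathcal{D}$, we take the domain of the induced mapping $\mathcal{D}_\infty$ to be the set of all bi-infinite (output) symbol sequences obtained from $\mathcal{E}$. We say that a mapping $\mathcal{D}_\infty$ is a sliding-block $(S, n)$-decoder if $\mathcal{D}$ is a sliding-block decoder for some tagged $(S, n)$-encoder.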
The universality of the state-splitting algorithm is summa-
rized in the following theorem due to Ashley and Marcus ,
which we quote from .
Universality Theorem: Let $S$ be an irreducible constrained system and let $n$ be a positive integer.
a) Every sliding-block $(S, n)$-decoder has a unique minimal tagged $(S, n)$-encoder, where minimality is in terms of number of encoder states.
b) If we allow an arbitrary choice of deterministic presentation $G$ of $S$ and $(A_G, n)$-approximate eigenvector $x$, then the state-splitting algorithm can find a tagged $(S, n)$-encoder for every sliding-block $(S, n)$-decoder. If we also allow merging of states (i.e., $(u, u')$-merging as described above), then it can find the minimal tagged $(S, n)$-encoder for every sliding-block $(S, n)$-decoder.
c) If we fix $G$ to be the Shannon cover of $S$, but allow an arbitrary choice of $(A_G, n)$-approximate eigenvector $x$, then the state-splitting algorithm can find a tagged $(S, n)$-encoder for every sliding-block $(S, n)$-decoder $\mathcal{D}_\infty$, modulo a change in the domain of $\mathcal{D}_\infty$, possibly with a constant shift of each bi-infinite sequence prior to applying $\mathcal{D}_\infty$ (but with no change in the decoding function $\mathcal{D}$ itself). If we also allow merging of states, then, modulo the same changes, it can find the minimal tagged $(S, n)$-encoder for every sliding-block $(S, n)$-decoder. In particular, it can find a sliding-block $(S, n)$-decoder with minimal decoding window length.
Certain limitations on the use of the algorithm should be noted, however. If we apply the state-splitting algorithm to the Shannon cover of an irreducible constrained system $S$, it need not be able to find a sliding-block $(S, n)$-decoder with smallest number of encoder states in its minimal tagged $(S, n)$-encoder.
Similarly, if we start with the Shannon cover of an irreducible constrained system $S$ and, in addition, we fix $x$ to be a minimal $(A_G, n)$-approximate eigenvector (i.e., with smallest eigenvector component sum), then the algorithm may fail to find a sliding-block $(S, n)$-decoder with minimum decoding window length.
The universality of the state-splitting algorithm is an at-
tractive property, in that it implies that the technique can
be used to produce the “best” codes. However, in order to
harness the power of this design tool, strategies for making
the right choices during the execution of the construction
procedure are required. There is considerable room for further
research in this direction, as well as in the development of
other code-construction methods.
H. Practical Aspects of High-Rate Code Design
The construction of very high rate -constrained codes
and DC-balanced codes is an important practical problem ,
, . The construction of such high-rate codes is far
from obvious, as table look-up for encoding and decoding is
an engineering impracticality. The usual approach is to sup-
plement the source bits with bits. Under certain,
usually simple, rules the source word is modiﬁed in such a
way that the modiﬁed word plus supplementary bits comply
with the constraints. The information that certain modiﬁcations
have been made is carried by the supplementary bits. The
receiver, on reception of the word, will undo the modiﬁcations.
In order to reduce complexity and error propagation, the
number of bits affected by a modiﬁcation should be as small as
possible. We now give some examples of such constructions.
A traditional example of a simple DC-free code is called the
polarity bit code . The source symbols are supplemented
by one bit called the polarity bit. The encoder has the option to
transmit the -bit word without modiﬁcation or to invert
all symbols. The choice of a speciﬁc translation is made
in such a way that the running digital sum is as close to zero
as possible. It can easily be shown that the running digital sum takes a finite number of values, so that the sequence generated is DC-free.
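The polarity bit rule can be sketched in a few lines. The code below assumes bipolar symbols ($+1$ and $-1$), a polarity symbol placed first in each codeword, and the convention that $+1$ means "not inverted"; these conventions and the function names are illustrative assumptions rather than details given in the text.

```python
def polarity_encode(blocks):
    """blocks: list of lists of +1/-1 source symbols. Each codeword is the
    block preceded by a polarity symbol; the whole codeword is inverted
    when that brings the running digital sum closer to zero."""
    rds = 0
    out = []
    for block in blocks:
        word = [1] + list(block)       # +1 marks "not inverted" (assumed)
        if abs(rds + sum(word)) > abs(rds - sum(word)):
            word = [-s for s in word]  # invert all symbols, polarity included
        rds += sum(word)
        out.append(word)
    return out

def polarity_decode(codewords):
    """Undo the inversion indicated by the leading polarity symbol."""
    return [[s * w[0] for s in w[1:]] for w in codewords]

source = [[1, 1, 1], [1, 1, 1], [1, -1, 1], [-1, -1, -1]]
coded = polarity_encode(source)
print(coded)
print(polarity_decode(coded) == source)
```

Decoding needs only the polarity symbol of the current codeword, so an error in one codeword does not affect the others.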
A surprisingly simple method for transforming an arbi-
trary word into a codeword having equal numbers of ’s
and ’s—that is, a balanced or zero-disparity word—was
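published by Knuth  and Henry . Let
$$d(x) = \sum_{i=1}^{m} x_i$$
be the disparity of the binary source word
$x = (x_1, \ldots, x_m)$, $x_i \in \{-1, 1\}$.
Let $z_k(x)$ be the running digital sum of the first
$k$, $k \le m$, bits of $x$, or
$$z_k(x) = \sum_{i=1}^{k} x_i$$
and let $x^{(k)}$ be the word $x$ with its first $k$ bits inverted. For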
we have and
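If $x$ is of even length $m$, and if we let $\sigma_k(x)$ stand for $d(x^{(k)})$,
then the quantity $\sigma_k(x)$ is
$$\sigma_k(x) = -\sum_{i=1}^{k} x_i + \sum_{i=k+1}^{m} x_i = d(x) - 2 z_k(x).$$
It is immediate that $\sigma_0(x) = d(x)$ (no symbols inverted),
and $\sigma_m(x) = -d(x)$ (all symbols inverted). We may, therefore,
conclude that every word $x$ can be associated with at
least one $k$, so that $\sigma_k(x) = 0$, or $x^{(k)}$ is balanced. The
value of $k$ is encoded in a (preferably) zero-disparity word
$u$ of length $p$, $p$ even. If $m$ and $p$ are both odd, we can use
a similar construction. The maximum codeword length of
$x$ is governed by
$$m \le \binom{p}{p/2}.$$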
Some other modiﬁcations of the basic scheme are discussed
in Knuth  and Alon .
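The balancing step can be sketched in a few lines. This illustration assumes bipolar symbols in {+1, -1} and an even word length, and it returns the inversion index directly instead of encoding it in a balanced prefix word; the function names are chosen for the example only.

```python
def knuth_balance(word):
    """Return (k, balanced), where `balanced` is `word` with its first k symbols
    inverted and k is the smallest index that gives zero disparity."""
    if len(word) % 2:
        raise ValueError("this sketch assumes an even word length")
    sigma = sum(word)                      # disparity with no symbols inverted
    for k in range(len(word) + 1):
        if sigma == 0:
            return k, [-s for s in word[:k]] + list(word[k:])
        if k < len(word):
            sigma -= 2 * word[k]           # inverting one more symbol moves sigma by +/-2
    raise AssertionError("unreachable for even-length bipolar words")


def knuth_restore(k, balanced):
    """Invert the first k symbols again to recover the source word."""
    return [-s for s in balanced[:k]] + list(balanced[k:])
```

Because the disparity starts at some even value, ends at its negative, and moves in steps of 2, the loop always finds a zero crossing. For example, `knuth_balance([1, 1, 1, 1])` returns `(2, [-1, -1, 1, 1])`.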
The sequence replacement technique  converts source
words of length into -constrained words of length
The control bit is set to and appended at
the beginning of the -bit source word. If this -bit
sequence satisﬁes the prescribed constraint it is transmitted. If
the constraint is violated, i.e., a runlength of at least ’s
occur, we remove the trespassing ’s. The position where
the start of the violation was found is encoded in bits,
which are appended at the beginning of the -bit word. Such
a modiﬁcation is signaled to the receiver by setting the control
bit to . The codeword remains of length The procedure
above is repeated until all forbidden subsequences have been
removed. The receiver can reconstruct the source word as the
position information is stored at a predeﬁned position in the
codeword. In certain situations, the entire source word has
to be modiﬁed which makes the procedure prone to error
propagation. The class of rate -constrained
codes, was constructed to minimize
error propagation . Error propagation is confined to one
decoded 8-bit symbol, irrespective of the codeword length
Fig. 18. Probability that no sequence of drawings from a selection set of random sequences satisfies the constraint. Code rate Upper
curve: codeword length , selection set size ; lower curve: codeword length , selection set size
Recently, the publications by Fair et al.  and Immink
and Patrovics  on guided scrambling brought new insights
into high-rate code design. Guided scrambling is a member
of a larger class of related coding schemes called multimode
codes. In multimode codes, the -bit source word is mapped
into -bit codewords. Each source word can be
represented by a member of a selection set consisting of
codewords. Examples of such mappings are the
guided scrambling algorithm presented by Fair et al. ,
the DC-free coset codes of Deng and Herro , and the
scrambling using a Reed–Solomon code by Kunisa et al. .
A mapping is considered to be “good” if the selection set
contains sufﬁciently distinct and random codewords.
The encoder transmits the codeword that minimizes, accord-
ing to a prescribed criterion, some property of the encoded
sequence, such as its low-frequency spectral content. In gen-
eral, there are two key elements which need to be chosen
judiciously: a) the mapping between the source words and
their corresponding selection sets, and b) the criterion used to
select the “best” word.
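The two design choices, a) and b), can be made concrete with a small guided-scrambling style sketch. Everything specific here is an assumption for illustration: two redundant prefix bits, the scrambler polynomial D^2 + D + 1 with zero initial state for every word, and "end-of-word running digital sum closest to zero" as the selection criterion. It is not the scheme of any of the cited papers.

```python
def scramble(bits):
    """Self-synchronizing scrambler c_i = a_i XOR c_{i-1} XOR c_{i-2}."""
    c1 = c2 = 0
    out = []
    for a in bits:
        c = a ^ c1 ^ c2
        out.append(c)
        c1, c2 = c, c1
    return out


def descramble(bits):
    """Inverse mapping a_i = c_i XOR c_{i-1} XOR c_{i-2}."""
    c1 = c2 = 0
    out = []
    for c in bits:
        out.append(c ^ c1 ^ c2)
        c1, c2 = c, c1
    return out


def selection_set(source, r=2):
    """a) Mapping: prepend each of the 2**r prefixes, then scramble."""
    candidates = []
    for p in range(2 ** r):
        prefix = [(p >> i) & 1 for i in reversed(range(r))]
        candidates.append(scramble(prefix + list(source)))
    return candidates


def disparity(bits):
    """Sum of the word after mapping 0 -> -1 and 1 -> +1."""
    return sum(2 * b - 1 for b in bits)


def multimode_encode(words, r=2):
    """b) Criterion: keep the running digital sum at word ends near zero."""
    running, out = 0, []
    for word in words:
        best = min(selection_set(word, r),
                   key=lambda c: abs(running + disparity(c)))
        running += disparity(best)
        out.append(best)
    return out


def multimode_decode(codewords, r=2):
    """Descramble and discard the r prefix bits."""
    return [descramble(c)[r:] for c in codewords]
```

The decoder needs no knowledge of which candidate was selected, since descrambling followed by dropping the prefix recovers the source word for every member of the selection set.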
The use of multimode codes is not conﬁned to the generation
of DC-free sequences. Provided that is large enough and
the selection set contains sufﬁciently different codewords,
multimode codes can also be used to satisfy almost any
channel constraint with a suitably chosen selection method.
For given rate and proper selection criteria, the spectral content
of multimode codes is very close to that of maxentropic RDS-
constrained sequences. A clear disadvantage is that the encoder
needs to generate all possible codewords, compute the
criterion, and make the decision.
In the context of high-rate multimode codes, there is in-
terest in weakly constrained codes . Weakly constrained
codes may produce sequences that violate the constraints with
probability It is argued that if the channel is not free
of errors, it is pointless to feed the channel with perfectly
constrained sequences. We illustrate the effectiveness of this
idea by considering the properties of two examples of weak
codes. Fig. 18 shows the probability that no sequence
taken from a selection set of size of random sequences
obeys the constraint. Let the code rate ,
the codeword length , and the size of the selection set
Then we observe that with probability
a codeword violates the constraint. The alternative
implementation  requires a rate of —four
times the redundancy of the weakly constrained code—to
strictly guarantee the same constraint.
V. CONSTRAINED CODES FOR NOISY RECORDING CHANNELS
In Section III-A, we indicated how the implementation
of timing recovery, gain control, and detection algorithms
in recording systems created a need for suitably constrained
recording codes. These codes are typically used as an inner
code, in concatenation with an outer error-correcting code.
The error-correcting codes improve system performance by
introducing structure, usually of an algebraic nature, that
increases the separation of code sequences as measured by
some distance metric, such as Hamming distance.
A number of authors have addressed the problem of endow-
ing constrained codes with advantageous distance properties.
Metrics that have been considered include Hamming distance,
edit (or Levenshtein) distance, and Lee distance. These metrics
arise in the context of a variety of error types, including
random-bit errors, insertion and deletion errors, bitshift errors,
and more generally, burst errors. Code constructions, perfor-
mance analyses, as well as lower and upper bounds on the
achievable size of constrained codes with speciﬁed distance
properties are surveyed in .
It is fair to say that the application of constrained codes
with random or burst-error correction capabilities, proposed
largely in the context of storage systems using symbol-by-
symbol detection such as peak detection, has been extremely
limited. However, the advent of digital signal processing
techniques such as PRML has created a new role for recording
codes, analogous to the role of trellis-coded modulation in
digital communications. In this section, we describe how
appropriately constrained code sequences can improve PRML
system performance by increasing the separation between the
channel output sequences with respect to Euclidean distance.
A. PRML Performance Bounds and Error Event Analysis
The design of distance-enhancing constrained codes for
recording channels requires an understanding of the perfor-
mance of the PRML Viterbi detector, which we now brieﬂy
review. The detector performance is best understood in terms
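of error events. For a pair of input sequences $x(D)$ and $x'(D)$,
define the input error sequence $e_x(D) = x(D) - x'(D)$ and the
output error sequence $e_y(D) = h(D)\, e_x(D)$. A closed error
event corresponds to a polynomial input error sequence
$$e_x(D) = \sum_{k=k_1}^{k_2} e_{x,k} D^k$$
where $k_1$ and $k_2$ are finite integers, $e_{x,k_1} \neq 0$, and
$e_{x,k_2} \neq 0$. A closed error event is said to be simple if the condition
$$e_{x,k} = e_{x,k+1} = \cdots = e_{x,k+\nu-1} = 0$$
is not true for any
integer $k_1 \le k \le k_2 - \nu$, where $\nu$ is the memory of the
channel. An open error event corresponds to a right-infinite
input error sequence of the form
$$e_x(D) = \sum_{k=k_1}^{\infty} e_{x,k} D^k$$
where infinitely many $e_{x,k}$ are nonzero, but the Euclidean
norm is finite
$$\|e_y(D)\| = \Bigl( \sum_{k=k_1}^{\infty} e_{y,k}^2 \Bigr)^{1/2} < \infty.$$
In general, for an error event $e$, with corresponding input
error sequence $e_x(D)$ and output error sequence $e_y(D)$, the
squared-Euclidean distance is defined as
$$d^2(e) = \|e_y(D)\|^2 = \sum_{k} e_{y,k}^2.$$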
The number of channel input-bit errors corresponding to an
error event is given by
The ML detector produces an error when the selected trellis
path differs from the correct path by a sequence of error events.
The union bound provides an upper bound to the probability
of an error event beginning at some time by considering the
set of all possible simple error events
ﬁrst event at time
ERROR EVENT MULTIPLICITY GENERATING FUNCTIONS
which in the assumed case of AWGN, yields
Reorganizing the summation according to the error event
distance , the bound is expressed as:
where the values , known as the error event distance
spectrum, are deﬁned by
At moderate-to-high SNR, the performance of the system is
largely dictated by error events with small distance In
particular, the events with the minimum distance will be
the dominant contributors to the union bound, leading to the
frequently used approximation
For a number of the PR channel models applicable to
recording, the error event distance spectrum values, as well as
the corresponding input error sequences, have been determined
for a range of values of the distance , , .
The calculation is made somewhat interesting by the fact,
mentioned in Section II-C2, that the PR trellises support closed
error events of unbounded length having certain speciﬁed,
ﬁnite distances. For channels with limited ISI, analytical
methods may be applied in the characterization of low distance
events. However, for larger distances, and for PR channel
polynomials of higher degree, computer search methods have
been more effective.
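As a small-scale illustration of such a computer search, the squared-Euclidean distance of a closed error event can be obtained by multiplying the channel polynomial by the input error polynomial and summing the squared output coefficients. The sketch below assumes binary (0/1) channel inputs, so that input error symbols lie in {-1, 0, +1}, and channel polynomials of the form (1 - D)(1 + D)^N; the function names and the search depth are illustrative choices.

```python
def poly_mul(a, b):
    """Multiply two polynomials in D given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def pr_polynomial(n_minus, n_plus):
    """Coefficients of (1 - D)**n_minus * (1 + D)**n_plus."""
    h = [1]
    for _ in range(n_minus):
        h = poly_mul(h, [1, -1])
    for _ in range(n_plus):
        h = poly_mul(h, [1, 1])
    return h


def squared_distance(h, e_x):
    """Energy of the output error sequence e_y(D) = h(D) e_x(D)."""
    return sum(v * v for v in poly_mul(h, e_x))


def min_distance_search(h, max_len):
    """Exhaustive search over ternary input errors of span <= max_len.

    The first symbol is fixed to +1, since negating an error sequence
    does not change its distance.
    """
    best = None
    for length in range(1, max_len + 1):
        for code in range(3 ** (length - 1)):
            e_x, c = [1], code
            for _ in range(length - 1):
                e_x.append((c % 3) - 1)
                c //= 3
            if e_x[-1] == 0:
                continue                   # span would be shorter than `length`
            d2 = squared_distance(h, e_x)
            if best is None or d2 < best[0]:
                best = (d2, e_x)
    return best


PR4 = pr_polynomial(1, 1)      # 1 - D^2
EPR4 = pr_polynomial(1, 2)     # 1 + D - D^2 - D^3
E2PR4 = pr_polynomial(1, 3)    # 1 + 2D - 2D^3 - D^4
```

With these definitions, the single-symbol error has squared distance 2, 4, and 10 on PR4, EPR4, and E²PR4, respectively, whereas the error string + - + has squared distance 6 on E²PR4. A search this shallow only covers short closed events; it does not address the unbounded-length events mentioned above.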
Table IV gives several terms of the error event multi-
plicity generating functions for several
PR channels. Tables V and VI, respectively, list the input
error sequences for simple closed events on the PR4 and
EPR4 channels having squared-distance Table VII
describes the input error sequences for simple closed events on
the E PR4 channel having squared-distance In
the error sequence tables, the symbol “ ” is used to designate
“,” “ ” is used to designate “ ,” and a parenthesized string
denotes any positive number of repetitions of the string
CLOSED ERROR EVENTS (PER INTERLEAVE) FOR PR4 CHANNEL,
CLOSED ERROR EVENTS FOR EPR4 CHANNEL,
B. Code Design Strategy
The characterization of error sequences provides a basis
for the design of constrained codes that eliminate events with
a small Euclidean distance, thereby increasing the minimum
distance and giving a performance improvement , ,
, . This operation is similar in nature to expurgation
in the context of algebraic codes.
More speciﬁcally, the design of distance-enhancing con-
strained codes for PRML systems is based upon the following
strategy. First, we identify the input error sequences
with , where is the target
distance of the coded channel. Then, we determine a list
of input error strings that, if eliminated by means of a
code constraint, will prevent the occurrence of error events
with We denote the set of ternary error sequences
satisfying this constraint by
In order to prevent these error strings, we must next de-
termine a code constraint with the property that the corre-
sponding set of input error sequences satisﬁes
There are many choices for the error strings , as well as
for constraints satisfying (20). The problem of identifying
those that can generate the best practical distance-enhancing
codes—with a specified coding gain, high-rate, simple encoder
and decoder, and low-complexity sequence detector—remains
open.
CLOSED ERROR EVENTS FOR E²PR4 CHANNEL,
The code constraints and the PR channel memory are
then incorporated into a single detector trellis that can serve
as the basis for the Viterbi detector. The ﬁnal step in the design
procedure is to construct an efﬁcient code into the constraints
This can be accomplished using code design techniques
such as those discussed in Section IV.
It is useful to distinguish between two cases in implementing
this strategy for the PR channels we have discussed. The
cases are determined by the relationship of the minimum
distance to the matched-ﬁlter-bound (MFB) distance,
the energy in the channel impulse response.
The ﬁrst case pertains to those channels which are said to
achieve the MFB
including PR4, EPR4, and PR1. For these channels, the set of
minimum-distance input error sequences includes
, and so any distance-enhancing code constraint must
prevent this input error impulse from occurring.
The second case involves channels which do not achieve
This case applies to E PR4, for all , as well as
EPR2, for all Note that, in this situation, a
minimum-distance input error sequence—in fact, every er-
ror sequence satisfying —has length strictly
greater than , where event length refers to the span between
the ﬁrst and last nonzero symbols. These events can often be
eliminated with constraints that are quite simply speciﬁed
and for which practical, efﬁcient codes are readily constructed.
For the latter class of channels, we can determine distance-
enhancing constraints that increase the minimum distance to
, yet are characterizable in terms of a small list of
relatively short forbidden code strings. (We will sometimes
denote such constraints by This permits the design
of high-rate codes, and also makes it possible to limit the
complexity of the Viterbi detector, since the maximum length
of a forbidden string may not exceed too signiﬁcantly, or
at all, the memory of the uncoded channel. Consequently,
and perhaps surprisingly, the design of high-rate, distance-
enhancing codes with acceptable encoder/decoder and Viterbi
detector complexity proves to be considerably simpler for
the channels in the second group, namely, the channels with
relatively larger intersymbol interference.
We now turn to a discussion of some speciﬁc distance-
enhancing constraints and codes for partial-response channels.
C. Matched-Spectral-Null Constraints
As mentioned above, spectral-null constraints, particularly
those with DC-nulls and/or Nyquist-nulls, are well-matched
to the frequency characteristics of digital recording channels,
and have found application in many recording systems prior
to the introduction of PRML techniques. In  and , it
was shown that, in addition, constraints with spectral nulls at
the frequencies where the channel frequency response has the
value zero—matched-spectral-null (MSN) constraints—can in-
crease the minimum distance relative to the uncoded channel.
An example of this phenomenon, and one which served
historically to motivate the use of matched-spectral-null codes,
is the rate 1/2 biphase code, with binary codewords 01
and 10, which, one can easily show, increases the minimum
squared-Euclidean distance of the binary-input dicode channel,
$h(D) = 1 - D$, from 2 to 6.
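The distance gain of the biphase code on the dicode channel can be checked by brute force on short blocks. The sketch below assumes binary (0/1) channel inputs, a channel that starts and ends in the zero state, and the bit-to-codeword assignment 0 -> 01, 1 -> 10; the assignment is an arbitrary choice for the example.

```python
def dicode(x):
    """Output of h(D) = 1 - D with zero initial state, including the tail sample."""
    padded = [0] + list(x) + [0]
    return [padded[i] - padded[i - 1] for i in range(1, len(padded))]


def biphase(bits):
    """Rate 1/2 biphase mapping (assumed assignment: 0 -> 01, 1 -> 10)."""
    out = []
    for b in bits:
        out += [0, 1] if b == 0 else [1, 0]
    return out


def min_squared_distance(n_bits, encode=list):
    """Minimum squared-Euclidean distance between channel outputs over all
    pairs of distinct n_bits-bit data words."""
    words = [[(i >> k) & 1 for k in range(n_bits)] for i in range(2 ** n_bits)]
    outputs = [dicode(encode(w)) for w in words]
    best = None
    for i in range(len(outputs)):
        for j in range(i + 1, len(outputs)):
            d2 = sum((a - b) ** 2 for a, b in zip(outputs[i], outputs[j]))
            if best is None or d2 < best:
                best = d2
    return best
```

For 5-bit data words, `min_squared_distance(5)` evaluates to 2 for the uncoded channel and `min_squared_distance(5, biphase)` to 6 for the biphase-coded channel.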
To state a more general bound on the distance-enhancing
properties of MSN codes, we generalize the notion of a
spectral null constraint to include sequences for which higher
order derivatives of the power spectrum vanish at speciﬁed
frequencies, as well. More precisely, we say that an ensemble
of sequences has an order-spectral density null at if the
power spectral density satisﬁes
We will concentrate here upon those with high-order spectral
null at DC. Sequences with high-order spectral nulls can be
characterized in a number of equivalent ways. The high-order
running-digital-sums of a sequence at
DC can be deﬁned recursively as
Sequences with order- spectral null at DC may be char-
acterized in terms of properties of RDS
Another characterization involves the related notion of high-order
moments (power-sums), where the order-$k$ moment at
DC of the sequence $x = x_0, x_1, \ldots, x_{n-1}$ is defined as
$$M_k(x) = \sum_{i=0}^{n-1} i^k x_i.$$
In analogy to the characterization of (first-order) spectral
null sequences, one can show that an ensemble of sequences
generated by freely concatenating a set of codewords of finite
length will have an order-$K$ spectral null at DC if and only if
$$M_k(x) = 0, \quad k = 0, 1, \ldots, K-1$$
for all codewords $x$. In other words, for each codeword, the
order-$k$ moments at DC must vanish for $k = 0, 1, \ldots, K-1$.
A sequence satisfying this condition is also said to have zero
disparity of order $K$.
Finally, we remark that a length-$n$ sequence with $D$-transform
$x(D) = \sum_{i=0}^{n-1} x_i D^i$ has an order-$K$ spectral null at DC
if and only if $x(D)$ is divisible by $(1 - D)^K$. This fact plays
a role in bounding the
distance-enhancing properties of spectral-null sequences.
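The equivalence between vanishing moments and divisibility by a power of (1 - D) can be checked numerically. The sketch below assumes the moment definition as a power sum over the positions 0, 1, ..., n-1 and integer-valued sequences; the function names are illustrative.

```python
def moment(x, k):
    """Order-k power sum of x over positions 0..n-1 (Python evaluates 0**0 as 1)."""
    return sum((i ** k) * xi for i, xi in enumerate(x))


def null_order_by_moments(x):
    """Largest K such that the moments of order 0..K-1 all vanish."""
    if not any(x):
        raise ValueError("the all-zero sequence has no finite null order")
    K = 0
    while moment(x, K) == 0:
        K += 1
    return K


def null_order_by_division(x):
    """Number of times (1 - D) divides x(D)."""
    if not any(x):
        raise ValueError("the all-zero sequence has no finite null order")
    coeffs, K = list(x), 0
    while sum(coeffs) == 0:                # x(1) == 0  <=>  (1 - D) divides x(D)
        quotient, acc = [], 0
        for c in coeffs[:-1]:              # quotient coefficients are prefix sums
            acc += c
            quotient.append(acc)
        coeffs = quotient
        K += 1
    return K
```

For instance, the bipolar word (+1, -1, -1, +1, -1, +1, +1, -1), whose D-transform is (1 - D)(1 - D^2)(1 - D^4), has null order 3 under both tests.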
For more details about high-order spectral null constraints,
particularly constraints with high-order null at DC, we refer the
interested reader to Immink , Monti and Pierobon ,
Karabed and Siegel , Eleftheriou and Cideciyan , and
Roth, Siegel, and Vardy , as well as other references cited
therein.
The original proof of the distance-enhancing properties of
MSN codes was based upon a number-theoretic lower bound
on the minimum Hamming distance of zero-disparity codes,
due to Immink and Beenker . They proved that the
minimum Hamming distance (and, therefore, the minimum
Euclidean distance) of a block code over the bipolar alphabet
with order-$K$ spectral-null at DC grows at least linearly in $K$.
Specifically, they showed that, for any pair of distinct length-$n$
sequences $x$, $y$ in the code
$$d_H(x, y) \ge 2K.$$
This result for block codes can be suitably generalized to any
constrained system with order- spectral null at DC. The
result also extends to systems with an order- spectral null at
any rational submultiple of the symbol frequency, in particular,
at the Nyquist frequency.
In , this result was extended to show that the Lee
distance, and a fortiori the squared-Euclidean distance, between
output sequences of a bipolar, input-constrained channel
is lower-bounded by $8(K+L)$ if the input constraint and the
channel, with spectral nulls at DC (or the Nyquist frequency)
of orders $K$ and $L$, respectively, combine to produce a
spectral null at DC (or Nyquist) of order $K+L$. This
result can be proved by applying Descartes’ rule of signs to
the $D$-transform representation of these sequences, using the
divisibility conditions mentioned above .
This result can be applied to the PR4, EPR4, and E²PR4
channels, which have a first-order null at DC and a Nyquist
null of order 1, 2, and 3, respectively. If the channel
inputs are constrained to be bipolar sequences with an order-$K$
Nyquist null, the channel outputs will satisfy the following
lower bound on minimum squared-Euclidean distance:
$$d_{\min}^2 \ge \begin{cases} 8(K+1), & \text{for PR4} \\ 8(K+2), & \text{for EPR4} \\ 8(K+3), & \text{for E}^2\text{PR4.} \end{cases}$$
Comparing to the minimum distance of the uncoded bipolar
channels, we see that the MSN constraint with $K = 1$,
corresponding to a first-order Nyquist null, provides a coding
gain (unnormalized for rate loss) of at least 3, 1.8, and 1.2
dB, respectively. Using the observation made in Section III-A3,
one can design codes with first-order null at DC and
Nyquist by twice-interleaving a DC-free code. When such a
code is applied to the PR4 channel, which has an interleaved
dicode decomposition, the implementation of the MSN-coded
system becomes feasible. Code-design techniques such as
those described in Section IV have been used to design
efﬁcient MSN codes. For analytical and experimental results
pertaining to a rate , MSN-coded PR4 system, the reader
is referred to  and . Experimental evaluation of a
spectral-null coded-tape system is described in .
For these examples of MSN-constrained PR channels, the
error event characterization discussed above provides another
conﬁrmation, and a reﬁnement, of the coding gain bounds. The
veriﬁcation makes use of the moment conditions satisﬁed by
closed input error sequences satisfying spectral
null properties, a generalization of the moment conditions in
(21) above. Specifically, a first-order DC null requires that
$$\sum_{k} e_{x,k} = 0$$
and a first-order Nyquist null requires that
$$\sum_{k} (-1)^k e_{x,k} = 0.$$
Examination of the error events for PR4 in Table V shows
that each error event with fails to satisfy at least one
of these conditions. Similarly, for EPR4, the error events in
Table VI with are forbidden by the moment conditions.
In the case of E PR4, the error event characterization not only
conﬁrms, but also improves, the lower bound. Table VII shows
that the moment conditions cannot be satisﬁed by any error
sequence with implying a nominal coding gain of
2.2 dB. MSN coding based upon Nyquist-free constraints is
applicable to optical PR channels, and error-event analysis can
be used to conﬁrm the coding gain bounds in a similar manner
There have been a number of extensions and variations
on MSN coding techniques, most aimed at increasing code
rate, improving intrinsic runlength constraints, or reducing
the implementation complexity of the encoder, decoder, and
detector. For further details, the reader should consult  and
the references therein, as well as more recent results in, for
example,  and .
When implementing MSN-coded PR systems, the complex-
ity of the trellis structure that incorporates both the PR channel
memory and the MSN constraints can be an issue, particularly
for high-rate codes requiring larger digital sum variation.
Reduced-complexity, suboptimal detection algorithms based
upon a concatenation of a Viterbi detector for the PR channel
and an error-event post-processor have been proposed for a
DC/Nyquist-free block-coded PR4 channel  and EPR4
channel . In both schemes, DC/Nyquist-free codewords
are obtained by interleaving pairs of DC-free codewords, and
discrepancies in the ﬁrst-order moments of the interleaved
codeword estimates produced by the PR channel detector
are utilized by the post-processors to determine and correct
most-probable minimum-distance error events.
It should be pointed out that aspects of the code design
strategy described above were foreshadowed in an unpublished
paper of Fredrickson  dealing with the biphase-coded
dicode channel. In that paper, the observation was made
that the input error sequences corresponding to the minimum
squared-distance were of the form
and those corresponding to the next-minimum distance
were of the form
Fredrickson modiﬁed the encoding process to eliminate
minimum-distance events by appending an overall “parity-
check” bit to each block of input bits, for a speciﬁed value
of The resulting rate code provided a minimum
squared-Euclidean distance at the output of the
dicode channel, with only a modest penalty in rate for large
The Viterbi detector for the coded channel was modiﬁed to
incorporate the parity modulo- and to reﬂect the even-parity
condition at the codeword boundaries. It was also shown that
both the and the events can be eliminated
by appending a pair of bits to each block of input bits in
order to enforce a speciﬁc parity condition modulo- . The
resulting rate code yielded at the
dicode channel output, and the coding gain was realized with
a suitably enhanced detector structure.
D. Runlength Constraints
Certain classes of runlength constraints have distance-
enhancing properties when applied to the magnetic and optical
PR channels. For example, the NRZI constraint has been
applied to the EPR4 and the E PR4 magnetic channels, as well
as the PR1 and PR2 optical channels; see, for example 
and the references therein. On the EPR4 and PR1 channels,
the constraint does not increase minimum distance. However,
it does eliminate some of the minimum distance error-events,
thereby providing some performance improvement. Moreover,
the incorporation of the constraint into the detector trellis for
EPR4 leads to a reduction of complexity from eight states
to six states, eliminating those corresponding to the NRZ
channel inputs 010 and 101.
INPUT PAIRS FOR FORBIDDEN ERROR STRINGS IN
In the case of E²PR4, Behrens and Armstrong  showed
that the $d = 1$ constraint provides a 2.2-dB increase in
minimum squared-Euclidean distance. To see why this is
the case, observe that forbidding the input error strings
will prevent all closed error events with
Forbidding, in addition, the strings pre-
vents all open events with as well. Table VIII depicts
pairs of binary input strings whose corresponding error strings
belong to The symbol
represents an arbitrary binary value common to both strings in
a pair. Clearly, the elimination of the NRZ strings and
precludes all of the input error strings. The precoded
constraint precludes the NRZ strings —that
is, the NRZ constraint is —conﬁrming that the
constraint prevents all events with When the
constraint is incorporated into the detector trellis, the resulting
structure has only 10 states, substantially less than the 16
states required by the uncoded channel.
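The correspondence between an NRZI runlength constraint and a list of forbidden NRZ strings follows from the precoding rule, in which an NRZI 1 toggles the recorded level. The sketch below checks it exhaustively on short words, under the assumption that the constraint in question forbids adjacent NRZI 1's; the helper names are chosen for this example.

```python
def nrzi_to_nrz(nrzi, initial=0):
    """NRZI precoding: a 1 toggles the recorded level, a 0 keeps it.

    The returned NRZ sequence starts with the initial level, so that
    patterns straddling the first symbol are visible.
    """
    level, out = initial, [initial]
    for bit in nrzi:
        level ^= bit
        out.append(level)
    return out


def contains(seq, pattern):
    """True if `pattern` occurs as a contiguous substring of `seq`."""
    n = len(pattern)
    return any(seq[i:i + n] == pattern for i in range(len(seq) - n + 1))


def has_run_of_ones(nrzi, length):
    """True if the NRZI sequence contains `length` consecutive 1's."""
    return contains(list(nrzi), [1] * length)


def has_alternation(nrz, length):
    """True if the NRZ sequence contains 0101... or 1010... of the given length."""
    a = [i % 2 for i in range(length)]
    b = [1 - v for v in a]
    return contains(list(nrz), a) or contains(list(nrz), b)
```

A run of L consecutive NRZI 1's produces L + 1 alternating NRZ levels, so forbidding the NRZI string 11 is the same as forbidding the NRZ strings 010 and 101, and forbidding 111 is the same as forbidding 0101 and 1010.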
The input error sequence analysis used above to conﬁrm the
distance-enhancing properties of the constraint on the
EPR4 channel suggests a relaxation of the constraint that
nevertheless still achieves the same distance gain. Speciﬁ-
cally, the constraint and the complementary
constraint are sufﬁcient to ensure the elimination of closed
and open events with The capacity of this constraint
satisﬁes , and a rate , ﬁnite-state encoder
with state-independent decoder is described in . The
corresponding detector trellis requires 12 states. Thus with a
modest increase in complexity, this code achieves essentially
the same performance as the rate code, while
increasing the rate by 20%.
This line of reasoning may be used to demonstrate the
distance-enhancing properties of another class of NRZI
runlength constraints, referred to as maximum-transition-run
(MTR) constraints . These constraints limit, sometimes
in a periodically time-varying manner, the maximum number
of consecutive 1’s that can occur. The MTR constraints
are characterized by a parameter $j$, which determines the
maximum allowable runlength of 1’s. These constraints can
be interpreted as a generalization of the $d = 1$ constraint,
which is the same as the MTR constraint with $j = 1$.
The MTR constraint with $j = 2$ was introduced by
Moon and Brickner  (see also Soljanin ). A labeled
graph representation is shown in Fig. 19. The capacity of this
constraint is $C \approx 0.8791$. Imposing an additional constraint,
which we now denote $k$, on the maximum runlength of 0’s
reduces the capacity, as shown in Table IX.
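The capacity of a runlength constraint of this kind is the base-2 logarithm of the largest eigenvalue of the adjacency matrix of its labeled graph. The sketch below builds a graph whose states count the trailing 1's and estimates the eigenvalue by power iteration. The state labeling, the iteration count, and the restriction to a limit on 1's only (no additional limit on 0's) are assumptions of the example.

```python
import math


def mtr_adjacency(j):
    """Adjacency matrix of a graph for 'at most j consecutive 1's'.

    State s (0 <= s <= j) is the number of trailing 1's: emitting a 0
    returns to state 0, emitting a 1 moves from s to s + 1 while s < j.
    """
    size = j + 1
    A = [[0] * size for _ in range(size)]
    for s in range(size):
        A[s][0] = 1
        if s < j:
            A[s][s + 1] = 1
    return A


def capacity(A, iterations=200):
    """log2 of the largest eigenvalue of A, estimated by power iteration."""
    v = [1.0] * len(A)
    lam = 1.0
    for _ in range(iterations):
        w = [sum(A[r][c] * v[c] for c in range(len(A))) for r in range(len(A))]
        lam = max(w)                       # eigenvalue estimate (max-norm scaling)
        v = [x / lam for x in w]
    return math.log2(lam)
```

For j = 2 the largest eigenvalue is the real root of z^3 = z^2 + z + 1, about 1.8393, which gives a capacity of about 0.8791; for j = 1 the same routine gives about 0.6942.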
Fig. 19. Labeled graph for MTR constraint.
CAPACITY OF MTR
FOR SELECTED VALUES OF
INPUT PAIRS FOR FORBIDDEN ERROR STRINGS IN
The NRZI MTR constraint with $j = 2$ corresponds to
an NRZ constraint that forbids the strings 0101 and 1010. The error-event characterization
in Table VII shows that the forbidden input error list
sufﬁces to eliminate the closed
error events on E PR4 with , though not all the open
events. Analysis of input pairs, shown in Table X, reveals that
the MTR constraint indeed eliminates the closed error events
with The detector trellis that incorporates the E PR4
memory with this MTR constraint requires 14 states.
A rate 4/5 block code is shown in Table XI . It is
interesting to observe that the MTR $j = 2$ constraint is the
symbol-wise complement of the $(d, k) = (0, 2)$ constraint,
and the rate 4/5 MTR codebook is the symbol-wise complement
of the rate 4/5 Group Code Recording code, shown in
Table I. With this code, all open error events with $d^2 < 10$ are
eliminated.
The MTR constraint supports codes with rates approaching
its capacity, , . However, in practical
applications, a distance-enhancing code with rate or higher
is considered very desirable. It has been shown that higher rate
trellis codes can be based upon time-varying MTR (TMTR)
constraints , , , . For example, the TMTR
constraint deﬁned by , which limits the maximum
runlength of ’s beginning at an even time-index to at most
, has capacity The constraint has been shown to
support a rate block code.
Graph representations for the TMTR constrained
system are shown in Fig. 20. The states in the upper graph
are depicted as either circles or squares, corresponding
to odd time indices and even time indices, respectively. The
numbering of states reﬂects the number of ’s seen since the
ENCODER TABLE FOR RATE 4/5,
MTR BLOCK CODE
Fig. 20. Labeled graphs for TMTR constraint.
last . In the upper graph , each state represents a unique
such number. The lower graph is obtained by successively
merging states with identical follower sets.
The TMTR constraint eliminates all closed error
events with on the E PR4 channel by preventing the
input error sequences
As with the MTR constraint, it can be shown that
all open error events with can be eliminated by an
appropriately designed rate , TMTR block code ,
,  , . The time-varying trellis used by the
detector for the rate coded E PR4 channel requires 16
states, no more states than the uncoded system. It has been
shown that these constraints and codes also may be applied
to the E PR4 channel to increase the minimum distance to
the channel MFB, that is from to .
Time-varying constraints for the E PR4 channel that support
distance-enhancing codes with rates larger than have also
been found .
Fig. 21 shows a computer simulation of the bit-error-rate
performance of four distance-enhancing constraints on the
EPR4 channel, assuming a constant channel bit rate .
As a result of the favorable tradeoff between performance
and complexity offered by high-rate distance-enhancing codes
for high-order PR channels, there is currently great interest in
deploying them in commercial magnetic data-storage systems,
and further research into the design of such codes is being
actively pursued.
Finally, we remark that, for optical recording, the
constraint and the TMTR constraint increase the
minimum distance to on the PR2 and EPR2 channels,
yielding nominal coding gains of 1.8 and 3 dB, respectively.
A simple, rate code for the TMTR constraint
may be used with a four-state detector to realize these coding
gains , .
E. Precoded Convolutional Codes
An alternative, and in fact earlier, approach to coded-
modulation for PR channels of the form
was introduced by Wolf and Ungerboeck  (see also ).
Consider ﬁrst the case , the dicode channel. A binary
input sequence is applied to an NRZI precoder,
which implements the precoding operation characterized by
the polynomial The binary precoder
outputs are modulated to produce the bipolar
channel inputs according to the rule
Let be precoder inputs, with corresponding
channel outputs Then the Euclidean distance at
the output of the channel is related to the Hamming distance
at the input to the precoder by the inequality
Now, consider as precoder inputs the set of code sequences
in a convolutional code with states in the encoder and free
Hamming distance The outputs of the PR channel may
be described by a trellis with or fewer states , which
may be used as the basis for Viterbi detection. The inequality
(24) leads to the following lower bound on of the coded
if is even
if is odd.
This coding scheme achieves coding gains on the dicode
channel by the application of good convolutional codes, de-
signed for memoryless Gaussian channels, and the use of
a sequence detector trellis that reﬂects both the structure
of the convolutional code and the memory of the channel.
Using a nontrivial coset of the convolutional code ensures the
satisfaction of constraints on the zero runlengths at the output
of the channel.
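The mechanism behind this scheme can be illustrated for the dicode channel: with NRZI precoding, the channel output is nonzero exactly where the precoder input is 1, so every position in which two precoder input sequences differ contributes to the Euclidean distance at the channel output. The sketch below assumes the bipolar mapping c = 2b - 1 and a zero initial precoder state; with that normalization the outputs lie in {-2, 0, +2} and the squared distance is at least four times the Hamming distance. The constant in the inequality depends on the mapping rule, which is an assumption here.

```python
def precoded_dicode(a):
    """NRZI precoder b_k = a_k XOR b_{k-1}, bipolar map c = 2b - 1, channel 1 - D."""
    b_prev, y = 0, []
    for bit in a:
        b = bit ^ b_prev
        y.append((2 * b - 1) - (2 * b_prev - 1))   # c_k - c_{k-1}
        b_prev = b
    return y


def hamming(a, b):
    """Number of positions in which two equal-length sequences differ."""
    return sum(p != q for p, q in zip(a, b))


def squared_euclidean(u, v):
    """Squared-Euclidean distance between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(u, v))
```

Since `abs(y[k]) == 2 * a[k]` for every position k, two inputs that differ in position k produce outputs that differ by at least 2 there, which is where the factor of 4 in this normalization comes from.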
It is clear that, by interleaving convolutional encoders
and using a precoder of the form , this
technique, and the bound on free distance, may be extended
to PR channels of the form In
particular, it is applicable to the PR4 channel corresponding to
The selection of the underlying convolutional
code and nontrivial coset to optimize runlength constraints,
free distance, and detector trellis complexity has been in-
vestigated by several authors. See, for example, , ,
Fig. 21. Performance of uncoded and coded E PR4 systems.
and . For the PR4 channel, and speciﬁed free Euclidean
distance at the channel output, the runlength constraints and
complexity of precoded convolutional codes have been found
to be slightly inferior to those of matched-spectral-null (MSN)
codes. For example, a rate precoded convolutional code
was shown to achieve 3-dB gain (unnormalized for rate loss)
with constraints and a 16-state detector
trellis with 256 branches (per interleave). The comparable
MSN code with this gain achieved the equivalent of constraints
and used a six-state detector trellis with
24 branches (per interleave).
Recently, a modiﬁed version of this precoding approach
was developed for use with a rate turbo code . The
detection procedure incorporated an a posteriori probability
(APP) PR channel detector, combined with an iterative, turbo
decoder. Performance simulations of this coding scheme on
a PR4 channel with AWGN demonstrated a gain of 5.3 dB
(normalized for rate loss) at a bit-error rate of , relative to
the uncoded PRML channel. Turbo equalization, whereby the
PR detector is integrated into the iterative decoding procedure,
was also considered. This increased the gain by another 0.5
dB. Thus the improvement over the previously proposed rate
codes, which achieve 2-dB gain (normalized for rate loss)
is approximately 3.3–3.8 dB. The remaining gap in
between the rate turbo code performance at a bit-error
rate of and the upper bound capacity limit (3) at rate
 is approximately 2.25 dB . The corresponding
gap to the upper bound capacity limit at rate for the
precoded convolutional code and the MSN code is therefore
approximately 5.5–6 dB. This estimate of the SNR gap can be
compared with that implied by the continuous-time channel
capacity bounds, as discussed in Section II-B.
VI. COMPENDIUM OF MODULATION CONSTRAINTS
In this section, we describe in more detail selected properties
of constrained systems that have played a prominent role in
digital recording systems. The classes of runlength-
limited constraints and spectral-null constraints have already
been introduced. In addition, there are constraints that generate
spectral lines at speciﬁed frequencies, called pilot tracking
tones, which can be used for servo tracking systems in
videotape recorders , . Certain channels require a
combination of time and frequency constraints , ,
; speciﬁcally DC-balanced RLL sequences have found
widespread usage in recording practice. In addition, there are
many other constraints that play a role in recording systems;
see, for example, , , , , and . Table
XII gives a survey of recording constraints used in consumer
A. Runlength-Limited Sequences
We have already encountered $(d,k)$-constrained binary sequences
with finite $k$. We are also interested in the
case $k = \infty$. Fig. 22 illustrates a graph representing
$(d,\infty)$ sequences.
TABLE XII. SURVEY OF RECORDING CODES AND THEIR APPLICATION AREA
Fig. 22. Shannon cover for a $(d,\infty)$ constraint.
TABLE XIII. CAPACITY $C(d,k)$ VERSUS RUNLENGTH PARAMETERS $d$ AND $k$
For $(d,\infty)$ sequences we can easily derive the characteristic
equation
$$z^{d+1} - z^{d} - 1 = 0.$$
Table XIII lists the capacity $C(d,k)$ for selected values of
the parameters $d$ and $k$.
RLL sequences are used to increase the minimum separation
between recorded transitions. The quantity , called the
density ratio or packing density, is deﬁned as
It expresses the number of information bit intervals within
the minimum separation between consecutive transitions of an
RLL sequence. It may be shown that the density ratio can
be made arbitrarily large by choosing sufﬁciently large .
The minimum increment within a runlength is called the timing
window or detection window, denoted by Measured in
units of information bit intervals, Sequences
with a larger value of , and thus a lower capacity ,
are penalized by an increasingly difﬁcult tradeoff between the
detection window and the density ratio. Practical codes have
typically used constraints with
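The capacities discussed above can be checked numerically. The following is a minimal Python sketch, assuming the standard $(d,k)$ characteristic equation $\sum_{i=d+1}^{k+1} z^{-i} = 1$, which reduces to $z^{d+1} - z^{d} - 1 = 0$ for $k = \infty$. The function names are hypothetical. The reading of the density ratio as $(1+d)\,C(d,k)$ for maxentropic sequences is an assumption made for illustration, not a definition taken from the text.

```python
from math import log2


def phrase_sum(z, d, k=None):
    """Sum of z**-(run+1) over the allowed zero-runs d..k (k=None means infinity)."""
    if k is None:
        # Geometric series z^-(d+1) + z^-(d+2) + ... = z^-(d+1) / (1 - 1/z)
        return z ** -(d + 1) / (1.0 - 1.0 / z)
    return sum(z ** -(run + 1) for run in range(d, k + 1))


def rll_capacity(d, k=None):
    """Capacity log2(lambda); lambda is the largest real root of phrase_sum(z) = 1."""
    lo, hi = 1.0 + 1e-12, 2.0
    for _ in range(200):  # bisection: phrase_sum is decreasing in z on (1, 2]
        mid = 0.5 * (lo + hi)
        if phrase_sum(mid, d, k) > 1.0:
            lo = mid
        else:
            hi = mid
    return log2(0.5 * (lo + hi))


def density_ratio(d, k=None):
    """Assumed definition: (1 + d) * C(d, k), in information bit intervals."""
    return (1 + d) * rll_capacity(d, k)


if __name__ == "__main__":
    for d, k in [(1, 3), (1, 7), (2, 7), (2, 10), (1, None), (2, None)]:
        c = rll_capacity(d, k)
        print(f"d={d}, k={k}: capacity={c:.4f}, density ratio={density_ratio(d, k):.4f}")
```

For the $(1,7)$ and $(2,7)$ constraints the sketch gives capacities of about 0.6793 and 0.5174, respectively. Under the assumed definition, the larger $d$ buys a higher density ratio at the price of a lower capacity.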
CAPACITY OF ASYMMETRICAL RUNLENGTH-LIMITED
SEQUENCES VERSUS MINIMUM RUNLENGTH
B. Asymmetrical Runlength Constraints
Asymmetrical runlength-limited sequences , ,
 have different constraints on the runlengths of ’s and
’s. One application of these constraints has been in optical
recording systems, where the minimum size of a written
pit, as determined by diffraction limitations, is larger than
the minimum size of the area separating two pits, a spacing
determined by the mechanical positioning capabilities of the
optical recording ﬁxture.
Asymmetrical runlength-limited sequences are described by
four parameters and , and ,
, which describe the constraints on runlengths of ’s
and ’s, respectively. An allowable sequence is composed
of alternate phrases of the form ,
Let one sequence be composed of phrases of durations
, and let the second sequence have phrases of
durations The interleaved sequence is composed of
phrases taken alternately from the ﬁrst, odd sequence and the
second, even sequence. The interleaved sequence is composed
of phrases of duration , , ,
implying that the characteristic equation is
which can be rewritten as
If we assume that , then (25) can be written as
As an immediate implication of the symmetry in and ,
we ﬁnd for the capacity of the asymmetrical runlength-limited
where denotes the capacity of asymmetrical
runlength-limited sequences. Thus the capacity of asym-
metrical RLL sequences is a function of the sum of the
two minimum runlength parameters only, and it sufﬁces to
evaluate by solving the characteristic equation
Results of computations are given in Table XIV.
Fig. 23. Labeled graph for constraint.
We can derive another useful relation with the following
observation. Let , i.e., the restrictions on the runlengths
of ’s and ’s are again symmetric, then from (26)
so that we obtain the following relation between the capacity
of symmetrical and asymmetrical RLL sequences:
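The dependence of the capacity on the sum of the two minimum runlengths can be illustrated numerically. The sketch below is written under stated assumptions: runs of 0’s and 1’s are parameterized directly by their minimum lengths (hypothetical names `min_zero_run` and `min_one_run`), and there is no upper limit on either runlength. The characteristic equation is then the product of two geometric series set equal to one. This parameterization is chosen for illustration and may differ by an offset from the parameters used in the text.

```python
from math import log2


def asym_capacity(min_zero_run, min_one_run):
    """Capacity when runs of 0's (1's) have length >= min_zero_run (min_one_run)."""

    def phrase_pair_sum(z):
        tail = 1.0 - 1.0 / z
        zero_runs = z ** -min_zero_run / tail   # sum over allowed runs of 0's
        one_runs = z ** -min_one_run / tail     # sum over allowed runs of 1's
        return zero_runs * one_runs             # one run of 0's followed by one run of 1's

    lo, hi = 1.0 + 1e-9, 2.0
    for _ in range(200):  # bisection: phrase_pair_sum is decreasing in z
        mid = 0.5 * (lo + hi)
        if phrase_pair_sum(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return log2(0.5 * (lo + hi))


if __name__ == "__main__":
    # Same sum of minimum runlengths, hence the same capacity.
    print(asym_capacity(1, 3), asym_capacity(2, 2))
    # Symmetric case: equals the capacity of the (1, infinity) RLL constraint.
    print(asym_capacity(2, 2), log2((1 + 5 ** 0.5) / 2))
```

In the symmetric case the product of the two series is a perfect square, so the equation collapses to the $(d,\infty)$ characteristic equation. This is the mechanism behind the relation between symmetrical and asymmetrical capacities.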
C. RLL Sequences with Multiple Spacings
Funk  showed that the theory of RLL sequences is
unnecessarily narrow in scope and that it precludes certain
relevant coding possibilities which could prove useful in
particular devices. The limitation is removed by introducing
multiple-spaced RLL sequences, where one further restriction
is imposed upon the admissible runlengths of ’s. The run-
length/spacing constraints may be expressed as follows: for
integers and where is a multiple of the number
of ’s between successive ’s must be equal to , where
The parameters and again deﬁne
the minimum and maximum allowable runlength. A sequence
deﬁned in this way is called an RLL sequence with multiple
spacing (RLL/MS). Such a sequence is characterized by the
parameters Note that for standard RLL sequences
we have Fig. 23 illustrates a state-transition diagram
for the constraint.
The capacity can simply be found by invoking
Shannon’s capacity formula
where is the largest root of the characteristic equation
Note that if and have a common factor , then
is also divisible by Therefore, a sequence
with the above condition on and is equivalent to a
sequence. For ,we
obtain the characteristic equation
Table XV shows the results of computations. Within any
adjacent bit periods, there is only one possible location for the
next , given the location of the last . The detection window
for an RLL/MS sequence is therefore and
the minimum spacing between two transitions, , equals
By rewriting (27) we obtain a relationship between
and , namely,
CAPACITY FOR SELECTED VALUES OF AND
Fig. 24. Relationship between and window The operating points
of various sequences are indicated.
This relationship is plotted, for , in Fig. 24. With
constrained sequences, only discrete points on this
curve are possible. RLL sequences with multiple spacing,
however, make it possible, by a proper choice of and ,
to approximate any point on this curve.
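A numerical sketch of the RLL/MS capacity computation follows. It assumes parameters named `d`, `k`, and `s` (minimum runlength, maximum runlength, and spacing), allowed zero-runs $d, d+s, d+2s, \ldots, k$, and the characteristic equation $\sum z^{-(\mathrm{run}+1)} = 1$ over the allowed runs. The readings of the detection window as $s$ times the capacity and of the minimum spacing as $(d+1)$ times the capacity are assumptions for maxentropic sequences.

```python
from math import log2


def rll_ms_capacity(d, k, s):
    """Capacity of an RLL sequence with multiple spacing; k - d must be a multiple of s."""
    if (k - d) % s != 0:
        raise ValueError("k - d must be a multiple of s")
    phrase_lengths = [run + 1 for run in range(d, k + 1, s)]

    lo, hi = 1.0 + 1e-12, 2.0
    for _ in range(200):  # bisection on sum(z^-L) = 1
        mid = 0.5 * (lo + hi)
        if sum(mid ** -length for length in phrase_lengths) > 1.0:
            lo = mid
        else:
            hi = mid
    return log2(0.5 * (lo + hi))


def operating_point(d, k, s):
    """Assumed definitions: (minimum spacing, detection window) in information bits."""
    c = rll_ms_capacity(d, k, s)
    return (d + 1) * c, s * c


if __name__ == "__main__":
    print(rll_ms_capacity(1, 7, 1))        # standard RLL, s = 1
    print(rll_ms_capacity(1, 7, 2))        # d + 1 and s share the factor 2
    print(rll_ms_capacity(0, 3, 1) / 2.0)  # equivalent constraint on a coarser time grid
    print(operating_point(2, 18, 2))
```

The second and third printed values agree. When $d+1$ and $s$ share a common factor, the constraint is a standard one on a coarser time grid, and the capacity per channel bit scales accordingly.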
A multiple-spaced RLL code with parameters has
been designed and experimentally evaluated in exploratory
magnetooptic recording systems using a resonant bias coil
direct-overwrite technique , .
D. $(0, G/I)$ Sequences
The $(0, G/I)$ constraints for partial-response maximum-likelihood
systems were introduced in Section II-C2. Recall
that the parameter $G$ stipulates the maximum number of
allowed 0’s between consecutive 1’s, while the parameter $I$
stipulates the maximum number of 0’s between 1’s in both the
even- and odd-numbered positions of the sequence.
To describe a graph presentation of these constraints, we
deﬁne three parameters. The quantity denotes the number
of ’s since the last . The quantities and denote the
number of ’s since the last in the even and odd subsequence,
respectively. It is immediate that
Each state in the graph corresponds to a -tuple with
and Wherever permitted, there
CAPACITY FOR SELECTED VALUES OF AND
is an edge from state to state with a label
, and an edge from state to state with a label
. By computing the maximum eigenvalue $\lambda$ of the adjacency
matrix corresponding to the graph, we obtain the capacity
$\log_2 \lambda$ of the $(0, G/I)$ constraint. Results of computations are listed in
Table XVI.
For all of these constraints, rate codes have been
constructed . As mentioned earlier, a rate ,
block code was used in early disk drives employing PRML
techniques. Current disk drives make use of more relaxed
constraints, such as and , which can support
codes with even higher rates, such as rate , .
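The graph construction and eigenvalue computation described above can be sketched in Python. The state encoding below is an assumption made for this illustration and need not match the labeling used in the text. A state is a pair `(a, b)`: `a` counts the trailing 0’s in the interleave that receives the next symbol, and `b` counts the trailing 0’s in the interleave that was written last. The global run of 0’s is then $\min(2b, 2a+1)$. The largest eigenvalue is estimated by power iteration.

```python
from math import log2


def zero_gi_capacity(G, I, iterations=2000):
    """Estimate the capacity of the (0, G/I) constraint by power iteration."""

    def global_run(a, b):
        # Trailing 0's of the full sequence implied by the two interleave counters.
        return min(2 * b, 2 * a + 1)

    states = [(a, b) for a in range(I + 1) for b in range(I + 1)
              if global_run(a, b) <= G]

    successors = {}
    for a, b in states:
        nxt = [(b, 0)]                       # writing a 1 is always allowed
        if a + 1 <= I and global_run(b, a + 1) <= G:
            nxt.append((b, a + 1))           # writing a 0, if both limits hold
        successors[(a, b)] = nxt

    weight = {state: 1.0 for state in states}
    growth = 1.0
    for _ in range(iterations):
        new = {state: sum(weight[t] for t in successors[state]) for state in states}
        growth = max(new.values())           # converges to the largest eigenvalue
        weight = {state: value / growth for state, value in new.items()}
    return log2(growth)


if __name__ == "__main__":
    for G, I in [(4, 4), (3, 3), (2, 1)]:
        print(f"(0,{G}/{I}): capacity ~ {zero_gi_capacity(G, I):.4f}")
```

As a sanity check, when $G \ge 2I$ the global constraint is implied by the interleave constraint, and the capacity reduces to that of a $(0, I)$ runlength-limited sequence. For $(0, 2/1)$ this is $\log_2$ of the golden ratio.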
E. Spectral-Null Sequences
Frequency-domain analysis of constrained sequences
is based upon the average power spectral density, or, as
it is often called, the power spectrum. In order to deﬁne
the power spectrum, we must endow the ensemble of
constrained sequences with a probability measure. Generally,
the measure chosen is the maxentropic measure determined
by the transition probabilities discussed in Section III-B.
The autocorrelation function is the sequence of th-order
autocorrelation coefﬁcients , deﬁned by
where represent channel input symbols and the expec-
tation is with respect to the given measure. According to
the Wiener–Khinchin theorem, the average power spectrum
is given by the discrete-time Fourier transform of the autocor-
where, as before, Alternatively, we can express
The computation of the power spectrum of an ensemble of
Markov-chain driven sequences is well-studied and has been
carried out for many families of runlength-type constraints, as
well as for the subsets of constrained sequences generated by
speciﬁc ﬁnite-state encoders; see  and references therein.
It is important to note that for a particular sequence, the
average power density at a particular frequency , if it exists
at all, may differ signiﬁcantly from if
CAPACITY AND SUM VARIANCE OF MAXENTROPIC RDS-CONSTRAINED
SEQUENCES VERSUS DIGITAL SUM VARIATION
For spectral-null constraints with , however, every
sequence in the constraint has a well-deﬁned average power
density at , and the magnitude is equal to zero . As
has already been mentioned, the spectral null frequencies
of primary interest in digital recording are zero frequency
(DC) and the Nyquist frequency. (Further general results on
spectral-null sequences are given in , , and ,
Chien studied bipolar sequences that assume a finite
range of $N$ consecutive running-digital-sum (RDS) values,
that is, sequences with digital-sum variation (DSV) $N$. The
range of RDS values may be used, as in Fig. 7, to define a
set of $N$ allowable states. The $N \times N$ adjacency matrix $D_N$ for the
RDS-constrained channel is given by
$$D_N(i+1, i) = D_N(i, i+1) = 1, \quad i = 1, 2, \ldots, N-1,$$
with all other entries equal to zero.
For most constraints, it is not possible to find a simple
closed-form expression for the capacity, and one has to rely
on numerical methods to obtain an approximation. The RDS-
constrained sequences provide a beautiful exception to the rule,
as the structure of $D_N$ allows us to provide a closed-form
expression for the capacity of an RDS-constrained channel.
We have
$$\lambda_{\max} = 2 \cos \frac{\pi}{N+1}$$
and thus the capacity of the RDS-constrained channel is
$$C(N) = \log_2 \lambda_{\max} = 1 + \log_2 \cos \frac{\pi}{N+1}. \tag{28}$$
Table XVII lists the capacity , for It
can be seen that the sum constraint is not very expensive in
terms of rate loss when is relatively large. For instance,
a sequence that takes at maximum sum values has a
capacity , which implies a rate loss of less than
Closed-form expressions for the spectra of maxentropic
RDS-constrained sequences were derived by Kerpez .
Fig. 25 displays the power spectral density function of max-
entropic RDS-constrained sequences for various values of the
digital sum variation
Let denote the power spectral density of a sequence
with vanishing power at DC, where The width of the
spectral notch is a very important design characteristic which
is usually quantiﬁed by a parameter called the cutoff frequency.
The cutoff frequency of a DC-free constraint, denoted by ,
Fig. 25. Power density function of maxentropic RDS-constrained
sequences against frequency with digital sum variation as a parameter.
For the case , we have indicated the cutoff frequency
is deﬁned by 
It can be observed that the cutoff frequency becomes
smaller when the digital sum variation is allowed to
Let denote RDS Justesen  discovered
a useful relation between the sum variance and
the width of the spectral notch He found the following
approximation of the cutoff frequency :
Extensive computations of samples of implemented channel
codes, made by Justesen  and Immink  to validate the
reciprocal relation (29) between and , have revealed that
this relationship is fairly reliable. The sum variance of
a maxentropic RDS-constrained sequence, denoted by ,
is given by 
Table XVII lists the sum variance for
Fig. 26, which shows a plot of the sum variance versus the
redundancy , affords more insight into the tradeoffs
in the engineering of DC-balanced sequences. It presents the
designer with a spectral budget, reﬂecting the price in terms
of code redundancy for a desired spectral notch width. It also
reveals that the relationship between the logarithms of the sum
variance and the redundancy is approximately linear.
For large digital sum variation , it was shown by A.
Janssen  that
Fig. 26. Sum variance versus redundancy of maxentropic RDS-constrained
These approximations, coupled with (28) and (30), lead to
a fundamental relation between the redundancy and
the sum variance of a maxentropic RDS-constrained sequence,
Actually, the bound on the right is within 1% accuracy for
Equation (31) states that, for large enough , the
product of redundancy and sum variance of maxentropic
RDS-constrained sequences is approximately constant, as was
suggested by Fig. 26.
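The near-constancy of the product of redundancy and sum variance can be reproduced with a few lines of Python. The sketch assumes $N$ RDS states connected as a path, so that the adjacency matrix has largest eigenvalue $2\cos(\pi/(N+1))$, and it assumes the maxentropic stationary probability of state $k$ to be $\frac{2}{N+1}\sin^2\frac{\pi k}{N+1}$, the normalized square of the corresponding eigenvector. The symbol $N$ and the function names are assumptions made for this illustration.

```python
from math import cos, log, log2, pi, sin


def rds_capacity(n):
    """Capacity of sequences whose running digital sum takes at most n values."""
    return log2(2.0 * cos(pi / (n + 1)))


def rds_sum_variance(n):
    """Variance of the RDS under the maxentropic measure (assumed state probabilities)."""
    centre = (n + 1) / 2.0
    return (2.0 / (n + 1)) * sum(
        (k - centre) ** 2 * sin(pi * k / (n + 1)) ** 2 for k in range(1, n + 1)
    )


def redundancy_variance_product(n):
    return (1.0 - rds_capacity(n)) * rds_sum_variance(n)


# Large-n limit of the product, from the leading terms of both quantities.
ASYMPTOTIC_PRODUCT = (pi ** 2 / 6.0 - 1.0) / (4.0 * log(2.0))

if __name__ == "__main__":
    for n in (3, 5, 7, 10, 20, 50):
        print(n, round(rds_capacity(n), 4), round(rds_sum_variance(n), 4),
              round(redundancy_variance_product(n), 4))
    print("limit:", round(ASYMPTOTIC_PRODUCT, 4))
```

Under these assumptions the product equals 0.25 at $N = 3$ and decreases toward approximately 0.2326 as $N$ grows, in line with the approximately linear relation on the logarithmic scales of Fig. 26.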
VII. FUTURE DIRECTIONS
As digital recording technology advances and changes, so
does the system model that serves as the basis for information-
theoretic analysis and the motivation for signal processing and
coding techniques. In this section, we brieﬂy describe several
technology developments, some evolutionary and some revo-
lutionary, that introduce new elements that can be incorporated
into mathematical models for digital recording channels.
A. Improved Channel Models
Reﬂecting the continuing, rapid increase in areal density of
conventional magnetic recording, as well as the characteristics
of the component heads and disks, channel models now incor-
porate factors such as asymmetry in the positive and negative
step responses of magnetoresistive read heads; deviations
from linear superposition; spectral coloring, nonadditivity, and
nonstationarity in media noise; and partial-erasure effects and
other data-dependent distortions , , , .
The evaluation of the impact of these channel characteristics
on the performance of the signal processing and coding
techniques discussed in this paper is an active area of research,
as is the development of new approaches that take these
channel properties into account. See, for example, related
papers in .
B. Nonsaturation Multilevel Recording
At various times during the past, the possibility of aban-
doning saturation recording, “linearizing” the digital magnetic-
recording channel, and incorporating nonbinary signaling has
been examined. In all such studies, however, the potential
increase in recording density that might accrue from the
application or adaptation of coded-modulation techniques de-
veloped for digital communications has been outweighed by
the increase in detector complexity and, more fundamentally,
the cost in signal-to-noise ratio that accompanies the lineariza-
tion process. However, several novel storage technologies
can support multilevel alphabets, such as electron-trapping
optical memories (ETOM) ,  and optical recording with
multivalued magnetooptic media .
C. Multitrack Recording
Another avenue toward increasing the storage capacity
of disk and tape systems is to exploit their inherent two-
dimensional nature. Runlength-limited codes, such as -track
codes, that increase the per-track code rate by sharing
the timing constraint across multiple tracks have been
analyzed and designed , , .
Using models of signal-to-noise ratio dependence upon
track width, as well as intertrack interference (ITI), one
can investigate information-theoretic capacity bounds as a
function of track density. Multitrack recording and multihead
detection techniques based upon partial-response equalization,
decision-feedback-equalization, and sequence detection have
been studied , along with coding schemes that can improve
their performance. See, for example,  and references
therein.
D. Multidimensional Recording
New, exploratory technologies, such as volume holographic
data storage  and two-photon-based three-dimensional
(3-D) optical memories , have generated interest in page-
oriented recording and readback. Models of these processes
have generated proposals for two-dimensional equalization and
detection methods , , along with two-dimensional
codes , .
This has generated interest in two-dimensional constrained
systems and modulation codes. As an example, consider a two-
dimensional binary $(d,k)$ constrained array as an $N$ (row) by $M$
(column) binary array such that every 1 has no less than $d$ 0’s
and no more than $k$ 0’s above it, below it, to the right of it, and
to the left of it (with the exception of 1’s on or near borders).
The capacity of such an array is equal to the limit, as $N$ and
$M$ approach infinity, of the ratio of the logarithm of the number
of distinct arrays satisfying the constraints to the product of
$N$ times $M$. Little is known at this time about finding the
capacity of such two-dimensional binary constrained arrays.
A notable exception is that it has been proved that the two-
dimensional capacity of such two-dimensional $(d,k)$ binary
arrays is equal to zero if and only if $k = d + 1$ and $d \ge 1$.
Thus the two-dimensional capacity of the $(1,2)$ constraint is
equal to $0$, while the two-dimensional capacity of the $(2,4)$
constraint is strictly greater than $0$. This is in contrast to the
one-dimensional case, where the capacities of $(1,2)$ and
$(2,4)$ constrained binary sequences are both nonzero and, in
fact, are equal. Lower bounds on the capacity of some two-
dimensional $(d,k)$ constraints are presented in , ,
and other constraints relevant to two-dimensional recording
are analyzed in , , and .
VIII. SHANNON’S CROSSWORD PUZZLES
A. Existence of Multidimensional Crossword Puzzles
As mentioned in the preceding section, multidimensional
constrained codes represent a new challenge for information
theorists, with potentially important applications to novel,
high-density storage devices. We feel it is particularly ﬁtting,
then, to bring our survey to a close by returning once more
to Shannon’s 1948 paper  where, remarkably, in a short
passage addressing the connection between the redundancy of
a language and the existence of crossword puzzles, Shannon
anticipated some of the issues that arise in multidimensional
Speciﬁcally, Shannon suggested that there would be cases
where the capacity of a two-dimensional constraint is equal
to zero, even though the capacity of the constituent one-
dimensional constraint is nonzero, a situation illustrated by
certain two-dimensional constraints. We cite the fol-
lowing excerpt from Shannon’s 1948 paper:
The ratio of the entropy of a source to the maximum
value it could have while still restricted to the same symbols
will be called its relative entropy. One minus the
relative entropy is the redundancy. The redundancy
of a language is related to the existence of crossword
puzzles. If the redundancy is zero any sequence of
letters is a reasonable text in the language and any two-
dimensional array of letters forms a crossword puzzle.
If the redundancy is too high the language imposes
too many constraints for large crossword puzzles to be
possible. A more detailed analysis shows that if we
assume the constraints imposed by the language are
of a rather chaotic and random nature, large crossword
puzzles are just possible when the redundancy is 50%.
If the redundancy is 33%, three-dimensional crossword
puzzles should be possible, etc.
To the best of our knowledge Shannon never published a
more detailed exposition on this subject. This led us to try to
construct a plausibility argument for his statement. We assume
that the phrase “large crossword puzzles are just possible”
should be taken to mean that the capacity of the corresponding
two-dimensional constraint is nonzero.
Let $m$ denote the number of source symbols, $H$ denote
the source binary entropy, and $\eta = H/\log_2 m$ denote
the relative entropy. We begin with all $m^{NM}$ $N$ by $M$ arrays
that can be formed from $m$ symbols. We eliminate all arrays
that do not have all of their rows and columns made up
of a concatenation of allowable words from the language.
The probability that any row of the array is made up of a
concatenation of allowable words from the language is equal
to the ratio of the number of allowable concatenations of
words with $M$ letters, $2^{MH}$, to $m^{M}$. Thus assuming statistical
independence of the rows, the probability that all rows are
concatenations of allowable words is this ratio raised to the
$N$th power, or $(2^{MH}/m^{M})^{N}$ or $m^{NM(\eta - 1)}$. The identical
ratio results for the probability that all columns are made
up of concatenations of allowable words. Now assuming that
the rows and columns are statistically independent, we see that
the probability for an array to have all of its rows and all of
its columns made up of concatenations of allowable words is
equal to $m^{2NM(\eta - 1)}$. The assumption of independence of
the rows and columns is made with the sole justification that
this property might be expected to be true for a language that
is “of a rather chaotic and random nature.” Multiplying this
probability by the number of arrays $m^{NM}$ yields the average
number of surviving arrays, $m^{NM(2\eta - 1)}$, which grows
exponentially with $NM$ provided that $\eta > 1/2$. A similar
argument for three-dimensional arrays yields the condition
$\eta > 2/3$. This is Shannon’s result. (The authors thank K.
Shaughnessy  for contributions to this argument.) We
remark that for ordinary English crossword puzzles, we would
interpret the black square to be a 27th symbol in the alphabet.
Thus to compute the “relative entropy” of English, we divide
the entropy of English by $\log_2 27$. In this context, we would
propose using an unusual definition of the entropy of English,
which we call $H_w$, based upon the dependencies of letters
within individual words, but not across word boundaries, since
the rows and columns of crossword puzzles are made up of
unrelated words separated by one or more black squares. To
compute $H_w$ for the English language, we can proceed as
follows. Assume that $w_i$ is the number of words in an English
dictionary with $i$ letters, for $i = 1, 2, \ldots, L$. We lengthen each
word by one letter to include the black square at the end of a
word and then add one more word of length $1$ to represent a
single black square. (This allows more than one black square
between words.) Following Shannon, the number of distinct
sequences of words containing exactly $n$ symbols, $W(n)$, is
given by the difference equation
$$W(n) = W(n-1) + \sum_{i=1}^{L} w_i\, W(n-i-1). \tag{32}$$
Then, $H_w$ is given by the logarithm of the largest real root of
$$z^{-1} + \sum_{i=1}^{L} w_i\, z^{-(i+1)} = 1. \tag{33}$$
The distribution of word lengths in an English dictionary
has been investigated by Lord Rothschild . (See also the
discussion in Section VIII-C.)
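The word-based entropy computation outlined above can be sketched in Python. The sketch assumes that a dictionary is summarized by a mapping from word length to the number of words of that length. Each word is lengthened by one symbol for its trailing black square, and a single black square is added as an extra word of length 1. The function name and the toy word counts are hypothetical and are not data about English.

```python
from math import log2


def crossword_entropy(word_counts):
    """Log2 of the largest real root of z^-1 + sum_i count_i * z^-(i+1) = 1.

    word_counts maps a word length i (in letters) to the number of words of that length.
    """

    def phrase_sum(z):
        total = 1.0 / z  # the extra length-1 "word": a single black square
        for length, count in word_counts.items():
            total += count * z ** -(length + 1)
        return total

    lo, hi = 1.0, 2.0
    while phrase_sum(hi) > 1.0:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):         # bisection: phrase_sum is decreasing in z
        mid = 0.5 * (lo + hi)
        if phrase_sum(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return log2(0.5 * (lo + hi))


if __name__ == "__main__":
    toy_counts = {1: 2, 2: 30, 3: 200}  # hypothetical counts, for illustration only
    h_w = crossword_entropy(toy_counts)
    print("word-based entropy:", h_w)
    print("relative entropy with 27 symbols:", h_w / log2(27))
```

Dividing the result by $\log_2 27$ gives the relative entropy to be compared with the thresholds of the plausibility argument.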
B. Connections to Two-Dimensional Constraints
Unfortunately, a direct application of Shannon’s statement
to the $(d,k) = (1,2)$ and $(d,k) = (2,4)$ constraints leads to a
problem. Their one-dimensional capacities and, therefore, their
relative entropies, are equal, at approximately $0.4057$. However,
we have seen that the capacity of the two-dimensional $(1,2)$
constraint is zero, while that of the two-dimensional $(2,4)$
constraint is nonzero. In order to resolve this inconsistency
with Shannon’s bound, we tried to modify the argument by
more accurately approximating the probability of a column
satisfying the speciﬁed row constraint, as follows.
Although the one-dimensional capacities of the two con-
straints are equal, the one-dimensional constraints have dif-
ferent ﬁrst-order entropies In particular,
for the constraint and for the
constraint, since the relative frequency of ’s is higher for the
constraint than for the constraint. In the previous
plausibility argument for Shannon’s result, once one chooses
the rows of the array to be a concatenation of allowable words,
the relative frequencies of the symbols in each column occur
in accordance with the relative frequency of the symbols in the
words of the language. Thus the probability that any column is
a concatenation of allowable words is equal to
Proceeding as above, we ﬁnd that the average number of
surviving arrays grows exponentially with provided that
for two-dimensional arrays, or
for three-dimensional arrays.
However, for both the one-dimensional and
constraints, we ﬁnd Therefore, this modiﬁed
analysis still does not satisfactorily explain the behavior of
these two constraints. A possible explanation is that a further
reﬁnement in the argument is needed. Another possibility is
that these constraints are not “chaotic and random”
enough for Shannon’s conclusion, and our plausibility argu-
ments, to apply.
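The distinction between capacity and first-order entropy can be made concrete with a small computation. The sketch below takes the $(1,2)$ and $(2,4)$ runlength constraints as an illustrative pair with equal one-dimensional capacity. It assumes the maxentropic measure, under which a phrase consisting of a run of 0’s followed by a 1, of total length $L$, has probability $\lambda^{-L}$. The relative frequency of 1’s is then the reciprocal of the mean phrase length, and the first-order entropy is the binary entropy of that frequency. The function name is hypothetical.

```python
from math import log2


def maxentropic_stats(d, k):
    """Return (capacity, frequency of 1's, first-order entropy) for finite (d, k)."""
    lengths = list(range(d + 1, k + 2))  # phrase = run of 0's plus the closing 1

    lo, hi = 1.0 + 1e-12, 2.0
    for _ in range(200):                 # bisection on sum(z^-L) = 1
        mid = 0.5 * (lo + hi)
        if sum(mid ** -length for length in lengths) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    mean_length = sum(length * lam ** -length for length in lengths)
    p_one = 1.0 / mean_length            # one 1 per phrase
    h1 = -(p_one * log2(p_one) + (1.0 - p_one) * log2(1.0 - p_one))
    return log2(lam), p_one, h1


if __name__ == "__main__":
    for d, k in [(1, 2), (2, 4)]:
        capacity, p_one, h1 = maxentropic_stats(d, k)
        print(f"({d},{k}): capacity={capacity:.4f}, P(1)={p_one:.4f}, H1={h1:.4f}")
```

The two constraints share the same capacity but differ in the frequency of 1’s, and hence in first-order entropy, which is the asymmetry the modified argument attempts to exploit.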
As this paper was undergoing ﬁnal revisions, one of the
authors (JKW) received a letter from E. Gilbert pertaining to
Shannon’s crossword puzzles . The letter was prompted
by a lecture given by JKW at the Shannon Day Symposium,
held at Bell Labs on May 18, 1998, in which the connection
between the capacity of two-dimensional constraints and Shan-
non’s result on crossword puzzles was discussed. In the letter,
Gilbert recalls a conversation he had with Shannon 50 years
ago on this subject. Referring to Shannon’s paper, he says:
I didn’t understand that crossword example and tried to
reconstruct his argument. That led to a kind of hand-
waving “proof,” which I showed to Claude. Claude’s
own argument turned out to have been something like
mine. Fortunately, I outlined my proof in the margin
of my reprint of the paper (like Fermat and his copy of
Diophantos). It went like this:
The argument that followed is exactly the same as the one
presented in Section VIII-A above, with the small exception
that arrays were assumed to be square. In fact, in a subsequent
e-mail correspondence , Gilbert describes a calculation
of the redundancy of English along the lines suggested by
(32) and (33). Thus we see that the study of multidimensional
constrained arrays actually dates back 50 years to the birth of
information theory. A great deal remains to be learned.
IX. SUMMARY
In this paper, we have attempted to provide an overview
of the theoretical foundations and practical applications of
constrained coding in digital-recording systems. In keeping
with the theme of this special issue, we have highlighted
essential contributions to this area made by Shannon in his
landmark 1948 paper. We described the basic characteristics
of a digital-recording channel, and surveyed bounds on the
noisy-channel capacity for several mathematical channel mod-
els. We then discussed practical equalization and detection
techniques and indicated how their implementation imposes
constraints on the recording-channel inputs. Following a re-
view of Shannon’s fundamental results on the capacity of
discrete noiseless channels and on the existence of efﬁcient
codes, we presented a summary of key results in the theory
and practice of efﬁcient constrained code design. We then
discussed the application of distance-enhancing constrained
codes to improve the reliability of noisy recording channels,
and compared the resulting performance to estimates of the
noisy-channel capacity. Finally, we pointed out several new
directions that future research in the area of recording codes
might follow, and we concluded with a discussion of the
connection between Shannon’s remarks on crossword puzzles
and the theory of multidimensional constrained codes. Through
the inclusion of numerous references and indications of open
research problems, we hope to have provided the reader with
an introduction to this fascinating, important, and active branch
of information theory, as well as with some incentive and
encouragement to contribute to it.
ACKNOWLEDGMENT
The authors are grateful to Dick Blahut, Brian Marcus,
Ron Roth, and Emina Soljanin for their thoughtful comments
on an earlier version of this paper. They also wish to thank
Bruce Moision for assistance with computer simulations and
for preparation of Fig. 21.
REFERENCES
 K. A. S. Abdel-Ghaffar and J. H. Weber, “Constrained block codes for
class–IV partial-response channels with maximum-likelihood sequence
estimation,” IEEE Trans. Inform. Theory, vol. 42, pp. 1405–1424, Sept.
 R. L. Adler, “The torus and the disk,” IBM J. Res. Develop., vol. 31,
no. 2, pp. 224–234, Mar. 1987.
 R. L. Adler, D. Coppersmith, and M. Hassner, “Algorithms for sliding
block codes: An application of symbolic dynamics to information
theory,” IEEE Trans. Inform. Theory, vol. IT-29, pp. 5–22, Jan. 1983.
 R. L. Adler, M. Hassner, and J. Moussouris, “Method and apparatus for
generating a noiseless sliding block code for a (1, 7) channel with rate
2/3,” U.S. Patent 4 413 251, June 1982.
 N. Alon, E. E. Bergmann, D. Coppersmith, and A. M. Odlyzko,
“Balancing sets of vectors,” IEEE Trans. Inform. Theory, vol. 34, pp.
128–130, Jan. 1988.
 S. Altekar, “Detection and coding techniques for magnetic recording
channels,” Ph.D. dissertation, Univ. Calif. San Diego, June 1997.
 S. A. Altekar, M. Berggren, B. E. Moision, P. H. Siegel, and J. K.
Wolf, “Error-event characterization on partial-response channels,” in
Proc. 1997 IEEE Int. Symp. Information Theory (Ulm, Germany, June
29–July 4), p. 461; IEEE Trans. Inform. Theory, vol. 45, Jan. 1999, to
 J. Ashley, R. Karabed, and P. H. Siegel, “Complexity and sliding-block
decodability,” IEEE Trans. Inform. Theory, vol. 42, no. 6, pt. 1, pp.
1925–1947, Nov. 1996.
 J. J. Ashley and B. H. Marcus, “Canonical encoders for sliding block
decoders,” SIAM J. Discrete Math., vol. 8, pp. 555–605, 1995.
 J. J. Ashley and B. H. Marcus, “A generalized state-splitting algorithm,” IEEE Trans. Inform.
Theory, vol. 43, pp. 1326–1338, July 1997.
 J. J. Ashley and B. H. Marcus, “Two-dimensional low-pass filtering codes,” IEEE Trans.
Commun., vol. 46, pp. 724–727, June 1998.
 J. J. Ashley, B. H. Marcus, and R. M. Roth, “Construction of encoders
with small decoding look-ahead for input-constrained channels,” IEEE
Trans. Inform. Theory, vol. 41, pp. 55–76, Jan. 1995.
 L. C. Barbosa, “Simultaneous detection of readback signals from inter-
fering magnetic recording tracks using array heads,” IEEE Trans. Magn.,
vol. 26, pp. 2163–2165, Sept. 1990.
 I. Bar-David and S. Shamai (Shitz), “Information rates for magnetic
recording channels with peak- and slope-limited magnetization,” IEEE
Trans. Inform. Theory, vol. 35, pp. 956–962, Sept. 1989.
 M.-P. Béal, Codage Symbolique. Paris, France: Masson, 1993.
 G. F. M. Beenker and K. A. S. Immink, “A generalized method
for encoding and decoding runlength-limited binary sequences,” IEEE
Trans. Inform. Theory, vol. IT-29, pp. 751–754, Sept. 1983.
 R. Behrens and A. Armstrong, “An advanced read/write channel for
magnetic disk storage,” in Proc. 26th Asilomar Conf. Signals, Systems,
and Computers (Paciﬁc Grove, CA, Oct. 1992), pp. 956–960.
 E. R. Berlekamp, “The technology of error-correcting codes,” Proc.
IEEE, vol. 68, pp. 564–593, May 1980.
 M. Berkoff, “Waveform compression in NRZI magnetic recording,”
Proc. IEEE, vol. 52, pp. 1271–1272, Oct. 1964.
 H. N. Bertram, Theory of Magnetic Recording. Cambridge, U.K.:
Cambridge Univ. Press, 1994.
 H. N. Bertram and X. Che, “General analysis of noise in recorded
transitions in thin ﬁlm recording media,” IEEE Trans. Magn., vol. 29,
pp. 201–208, Jan. 1993.
 W. G. Bliss, “Circuitry for performing error correction calculations on
baseband encoded data to eliminate error propagation,” IBM Tech. Discl.
Bull., vol. 23, pp. 4633–4634, 1981.
 W. G. Bliss, “An 8/9 rate time-varying trellis code for high density magnetic
recording,” IEEE Trans. Magn., vol. 33, pp. 2746–2748, Sept. 1997.
 W. G. Bliss, S. She, and L. Sundell, “The performance of generalized
maximum transition run trellis codes,” IEEE Trans. Magn., vol. 34, no.
1, pt. 1, pp. 85–90, Jan. 1998.
 G. Bouwhuis, J. Braat, A. Huijser, J. Pasman, G. van Rosmalen, and K.
A. S. Immink, Principles of Optical Disc Systems. Bristol, U.K. and
Boston, MA: Adam Hilger, 1985.
 F. K. Bowers, U.S. Patent 2 957 947, 1960.
 V. Braun, K. A. S. Immink, M. A. Ribiero, and G. J. van den Enden,
“On the application of sequence estimation algorithms in the Digital
Compact Cassette (DCC),” IEEE Trans. Consumer Electron., vol. 40,
pp. 992–998, Nov. 1994.
 V. Braun and A. J. E. M. Janssen, “On the low-frequency suppression
performance of DC-free runlength-limited modulation codes,” IEEE
Trans. Consumer Electron., vol. 42, pp. 939–945, Nov. 1996.
 B. Brickner and J. Moon, “Investigation of error propagation in
DFE and MTR coding for ultra-high density,” Tech. Rep.,
Commun. Data Storage Lab., Univ. Minnesota, Minneapolis, July 10,
 A. R. Calderbank, C. Heegard, and T.-A. Lee, “Binary convolutional
codes with application to magnetic recording,” IEEE Trans. Inform.
Theory, vol. IT-32, pp. 797–815, Nov. 1986.
 A. R. Calderbank, R. Laroia, and S. W. McLaughlin, “Coded modulation
and precoding for electron-trapping optical memories,” IEEE Trans.
Commun., vol. 46, pp. 1011–1019, Aug. 1998.
 J. Caroselli and J. K. Wolf, “A new model for media noise in thin ﬁlm
magnetic recording media,” in Proc. 1995 SPIE Int. Symp. Voice, Video,
and Data Communications (Philadelphia, PA, Oct. 1995), vol. 2605, pp.
 J. Caroselli and J. K. Wolf, “Applications of a new simulation model for
media noise limited magnetic recording channels,” IEEE Trans. Magn.,
vol. 32, pp. 3917–3919, Sept. 1996.
 K. W. Cattermole, Principles of Pulse Code Modulation. London,
U.K.: Iliffe, 1969.
 , “Principles of digital line coding,” Int. J. Electron., vol. 55, pp.
3–33, July 1983.
 Workshop on Modulation, Coding, and Signal Processing for Magnetic
Recording Channels, Center for Magnetic Recording Res., Univ. Calif.
at San Diego. La Jolla, CA, May 20–22, 1985.
 Workshop on Modulation and Coding for Digital Recording Systems,
Center for Magnetic Recording Res., Univ. Calif. at San Diego. La
Jolla, CA, Jan. 8–10, 1987.
 T. M. Chien, “Upper bound on the efﬁciency of DC-constrained codes,”
Bell Syst. Tech. J., vol. 49, pp. 2267–2287, Nov. 1970.
 R. Cideciyan, F. Dolivo, R. Hermann, W. Hirt, and W. Schott, “A PRML
system for digital magnetic recording,” IEEE J. Select. Areas Commun.,
vol. 10, pp. 38–56, Jan. 1992.
 M. Cohn and G. V. Jacoby, “Run-length reduction of 3PM code via look-
ahead technique,” IEEE Trans. Magn., vol. MAG-18, pp. 1253–1255,
 D. J. Costello, Jr., J. Hagenauer, H. Imai, and S. B. Wicker, “Applica-
tions of error control coding,” this issue, pp. 2531–2560.
 T. M. Cover, “Enumerative source coding,” IEEE Trans. Inform. Theory,
vol. IT-19, pp. 73–77, Jan. 1973.
 R. H. Deng and M. A. Herro, “DC-free coset codes,” IEEE Trans.
Inform. Theory, vol. 34, pp. 786–792, July 1988.
 J. Eggenberger and P. Hodges, “Sequential encoding and decoding of
variable length, ﬁxed rate data codes,” U.S. Patent 4 115 768, 1978.
 J. Eggenberger and A. M. Patel, “Method and apparatus for implementing
optimum PRML codes,” U.S. Patent 4 707 681, Nov. 17, 1987.
 E. Eleftheriou and R. Cideciyan, “On codes satisfying Mth order
running digital sum constraints,” IEEE Trans. Inform. Theory, vol. 37,
pp. 1294–1313, Sept. 1991.
 T. Etzion, “Cascading methods for runlength-limited arrays,” IEEE
Trans. Inform. Theory, vol. 43, pp. 319–324, Jan. 1997.
 I. J. Fair, W. D. Gover, W. A. Krzymien, and R. I. MacDonald, “Guided
scrambling: A new line coding technique for high bit rate ﬁber optic
transmission systems,” IEEE Trans. Commun., vol. 39, pp. 289–297,
 J. L. Fan and A. R. Calderbank, “A modiﬁed concatenated coding
scheme with applications to magnetic data storage,” IEEE Trans. Inform.
Theory, vol. 44, pp. 1565–1574, July 1998.
 M. J. Ferguson, “Optimal reception for binary partial response chan-
nels,” Bell Syst. Tech. J., vol. 51, pp. 493–505, 1972.
 J. Fitzpatrick and K. J. Knudson, “Rate
modulation code for a magnetic recording channel,” U.S. Patent
5 635 933, June 3, 1997.
 K. K. Fitzpatrick and C. S. Modlin, “Time-varying MTR codes for high
density magnetic recording,” in Proc. 1997 IEEE Global Telecommuni-
cations Conf. (GLOBECOM ’97) (Phoenix, AZ, Nov. 4–8, 1997).
 G. D. Forney, Jr., “Maximum likelihood sequence detection in the
presence of intersymbol interference,” IEEE Trans. Inform. Theory, vol.
IT-18, pp. 363–378, May 1972.
 , “The Viterbi algorithm,” Proc. IEEE, vol. 61, no. 3, pp. 268–278,
 G. D. Forney, Jr. and A. R. Calderbank, “Coset codes for partial response
channels; or, coset codes with spectral nulls,” IEEE Trans. Inform.
Theory, vol. 35, pp. 925–943, Sept. 1989.
 G. D. Forney, Jr. and G. Ungerboeck, “Modulation and coding for linear
Gaussian channels,” this issue, pp. 2384–2415.
 P. A. Franaszek, “Sequence-state encoding for digital transmission,” Bell
Syst. Tech. J., vol. 47, pp. 143–157, Jan. 1968.
 , “Sequence-state methods for run-length-limited coding,” IBM J.
Res. Develop., vol. 14, pp. 376–383, July 1970.
 , “Run-length-limited variable length coding with error propaga-
tion limitation,” U.S. Patent 3 689 899, Sept. 1972.
 , “On future-dependent block coding for input-restricted chan-
nels,” IBM J. Res. Develop., vol. 23, pp. 75–81, 1979.
 , “Synchronous bounded delay coding for input restricted chan-
nels,” IBM J. Res. Develop., vol. 24, pp. 43–48, 1980.
 , “A general method for channel coding,” IBM J. Res. Develop.,
vol. 24, pp. 638–641, 1980.
 , “Construction of bounded delay codes for discrete noiseless
channels,” IBM J. Res. Develop., vol. 26, pp. 506–514, 1982.
 , “Coding for constrained channels: A comparison of two ap-
proaches,” IBM J. Res. Develop., vol. 33, pp. 602–607, 1989.
 J. N. Franklin and J. R. Pierce, “Spectra and efﬁciency of binary codes
without DC,” IEEE Trans. Commun., vol. COM-20, pp. 1182–1184,
 L. Fredrickson, unpublished report, 1993.
 , “Time-varying modulo trellis codes for input restricted partial
response channels,” U.S. Patent 5 257 272, Oct. 26, 1993.
 L. Fredrickson, R. Karabed, J. W. Rae, P. H. Siegel, H. Thapar, and
R. Wood, “Improved trellis coding for partial response channels,” IEEE
Trans. Magn., vol. 31, pp. 1141–1148, Mar. 1995.
 C. V. Freiman and A. D. Wyner, “Optimum block codes for noiseless
input restricted channels,” Inform. Contr., vol. 7, pp. 398–415, 1964.
 C. A. French and J. K. Wolf, “Bounds on the capacity of a peak
power constrained Gaussian channel,” IEEE Trans. Magn., vol. 24, pp.
2247–2262, Sept. 1988.
 S. Fukuda, Y. Kojima, Y. Shimpuku, and K. Odaka, “8/10 modulation
codes for digital magnetic recording,” IEEE Trans. Magn., vol. MAG-22,
pp. 1194–1196, Sept. 1986.
 P. Funk, “Run-length-limited codes with multiple spacing,” IEEE Trans.
Magn., vol. MAG-18, pp. 772–775, Mar. 1982.
 A. Gabor, “Adaptive coding for self-clocking recording,” IEEE Trans.
Electron. Comp., vol. EC-16, pp. 866–868, Dec. 1967.
 R. Gallager, Information Theory and Reliable Communication. New
York: Wiley, 1968.
 A. Gallopoulos, C. Heegard, and P. H. Siegel, “The power spectrum of
run-length-limited codes,” IEEE Trans. Commun., vol. 37, pp. 906–917,
 F. R. Gantmacher, Matrix Theory, Volume II. New York: Chelsea,
 E. Gilbert, private correspondence, May 1998.
 , private e-mail, June 1998.
 J. Gu and T. Fuja, “A new approach to constructing optimal block
codes for runlength-limited channels,” IEEE Trans. Inform. Theory, vol.
40, pp. 774–785, May 1994.
 J. Heanue, M. Bashaw, and L. Hesselink, “Volume holographic storage
and retrieval of digital data,” Science, vol. 265, pp. 749–752, 1994.
 , “Channel codes for digital holographic data storage,” J. Opt.
Soc. Amer. Ser. A, vol. 12, pp. 2432–2439, 1995.
 J. Heanue, K. Gurkan, and L. Hesselink, “Signal detection for page-
access optical memories with intersymbol interference,” Appl. Opt., vol.
35, no. 14, pp. 2431–2438, May 1996.
 C. Heegard and L. Ozarow, “Bounding the capacity of saturation
recording: the Lorentz model and applications,” IEEE J. Select. Areas
Commun., vol. 10, pp. 145–156, Jan. 1992.
 J. P. J. Heemskerk and K. A. S. Immink, “Compact disc: System aspects
and modulation,” Philips Tech. Rev., vol. 40, no. 6, pp. 157–164, 1982.
 P. S. Henry, “Zero disparity coding system,” U.S. Patent 4 309 694, Jan.
 T. Himeno, M. Tanaka, T. Katoku, K. Matsumoto, M. Tamura, and
H. Min-Jae, “High-density magnetic tape recording by a nontracking
method,” Electron. Commun. in Japan, vol. 76, no. 5, pt. 2, pp. 83–93,
 W. Hirt, “Capacity and information rates of discrete-time channels with
memory,” Ph.D. dissertation (Diss. ETH no. 8671), Swiss Federal Inst.
Technol. (ETH), Zurich, Switzerland, 1988.
 W. Hirt and J. L. Massey, “Capacity of the discrete-time Gaussian
channel with intersymbol interference,” IEEE Trans. Inform. Theory,
vol. 34, pp. 380–388, May 1988.
 K. J. Hole, “Punctured convolutional codes for the partial-response
channel,” IEEE Trans. Inform. Theory, vol. 37, pt. 2, pp. 808–817, May
 K. J. Hole and Ø. Ytrehus, “Improved coding techniques for partial-
response channels,” IEEE Trans. Inform. Theory, vol. 40, pp. 482–493,
 H. D. L. Hollmann, “Modulation codes,” Ph.D. dissertation, Eindhoven
Univ. Technol., Eindhoven, The Netherlands, Dec. 1996.
 , “On the construction of bounded-delay encodable codes for con-
strained systems,” IEEE Trans. Inform. Theory, vol. 41, pp. 1354–1378,
 , “Bounded-delay-encodable, block-decodable codes for con-
strained systems,” IEEE Trans. Inform. Theory, vol. 42, pp. 1957–1970,
 J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory,
Languages, and Computation. Reading, MA: Addison-Wesley, 1979.
 S. Hunter, F. Kiamilev, S. Esener, D. Parthenopoulos, and P. M.
Rentzepis, “Potentials of two-photon based 3D optical memories for high
performance computing,” Appl. Opt., vol. 29, pp. 2058–2066, 1990.
 K. A. S. Immink, “Modulation systems for digital audio discs with
optical readout,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal
Processing (Atlanta, GA, Apr. 1981), pp. 587–590.
 , “Construction of binary DC-constrained codes,” Philips J. Res.,
vol. 40, pp. 22–39, 1985.
 , “Performance of simple binary DC-constrained codes,” Philips
J. Res., vol. 40, pp. 1–21, 1985.
 , “Spectrum shaping with DC²-constrained channel codes,”
Philips J. Res., vol. 40, pp. 40–53, 1985.
 , “Spectral null codes,” IEEE Trans. Magn., vol. 26, pp.
1130–1135, Mar. 1990.
 , “Runlength-limited sequences,” Proc. IEEE, vol. 78, pp.
1745–1759, Nov. 1990.
 , Coding Techniques for Digital Recorders. Englewood Cliffs,
NJ: Prentice-Hall Int. (UK), 1991.
 , “Block-decodable runlength-limited codes via look-ahead tech-
nique,” Philips J. Res., vol. 46, pp. 293–310, 1992.
 , “Constructions of almost block-decodable runlength-limited
codes,” IEEE Trans. Inform. Theory, vol. 41, pp. 284–287, Jan. 1995.
 , “The Digital Versatile Disc (DVD): System requirements and
channel coding,” SMPTE J., vol. 105, no. 8, pp. 483–489, Aug. 1996.
 , “A practical method for approaching the channel capacity
of constrained channels,” IEEE Trans. Inform. Theory, vol. 43, pp.
1389–1399, Sept. 1997.
 , “Weakly constrained codes,” Electron. Lett., vol. 33, no. 23, pp.
1943–1944, Nov. 1997.
 K. A. S. Immink and G. F. M. Beenker, “Binary transmission codes
with higher order spectral zeros at zero frequency,” IEEE Trans. Inform.
Theory, vol. IT-33, pp. 452–454, May 1987.
 K. A. S. Immink and H. Ogawa, “Method for encoding binary data,”
U.S. Patent 4 501 000, Feb. 1985.
 K. A. S. Immink and L. Patrovics, “Performance assessment of DC-free
multimode codes,” IEEE Trans. Commun., vol. 45, pp. 293–299, Mar.
 K. A. S. Immink and A. van Wijngaarden, “Simple high-rate constrained
codes,” Electron. Lett., vol. 32, no. 20, p. 1877, Sept. 1996.
 G. V. Jacoby, “A new look-ahead code for increasing data density,”
IEEE Trans. Magn., vol. MAG-13, pp. 1202–1204, Sept. 1977. See also
U.S. Patent 4 323 931, Apr. 1982.
 G. V. Jacoby and R. Kost, “Binary two-thirds rate code with full word
look-ahead,” IEEE Trans. Magn., vol. MAG-20, pp. 709–714, Sept.
1984. See also M. Cohn, G. V. Jacoby, and C. A. Bates III, U.S. Patent
4 337 458, June 1982.
 A. J. E. M. Janssen, private communication, 1998.
 A. J. E. M. Janssen and K. A. S. Immink, “Entropy and power
spectrum of asymmetrically DC-constrained binary sequences,” IEEE
Trans. Inform. Theory, vol. 37, pp. 924–927, May 1991.
 J. Justesen, “Information rates and power spectra of digital codes,” IEEE
Trans. Inform. Theory, vol. IT-28, pp. 457–472, May 1982.
 P. Kabal and S. Pasupathy, “Partial-response signaling,” IEEE Trans.
Commun., vol. COM-23, pp. 921–934, Sept. 1975.
 J. A. H. Kahlman and K. A. S. Immink, “Channel code with embedded
pilot tracking tones for DVCR,” IEEE Trans. Consumer Electron., vol.
41, pp. 180–185, Feb. 1995.
 H. Kamabe, “Minimum scope for sliding block decoder mappings,”
IEEE Trans. Inform. Theory, vol. 35, pp. 1335–1340, Nov. 1989.
 R. Karabed and B. H. Marcus, “Sliding-block coding for input-restricted
channels,” IEEE Trans. Inform. Theory, vol. 34, pp. 2–26, Jan. 1988.
 R. Karabed and P. H. Siegel, “Matched spectral-null codes for partial
response channels,” IEEE Trans. Inform. Theory, vol. 37, no. 3, pt. II,
pp. 818–855, May 1991.
 , “Coding for higher order partial response channels,” in Proc.
1995 SPIE Int. Symp. Voice, Video, and Data Communications (Philadel-
phia, PA, Oct. 1995), vol. 2605, pp. 115–126.
 R. Karabed, P. Siegel, and E. Soljanin, “Constrained coding for channels
with high intersymbol interference,” IEEE Trans. Inform. Theory, to be
 A. Kato and K. Zeger, “On the capacity of two-dimensional run-
length-limited codes,” in Proc. 1998 IEEE Int. Symp. Information Theory
(Cambridge, MA, Aug. 16–21, 1998), p. 320; submitted for publication
to IEEE Trans. Inform. Theory.
 W. H. Kautz, “Fibonacci codes for synchronization control,” IEEE
Trans. Inform. Theory, vol. IT-11, pp. 284–292, 1965.
 K. J. Kerpez, “The power spectral density of maximum entropy charge
constrained sequences,” IEEE Trans. Inform. Theory, vol. 35, pp.
692–695, May 1989.
 Z.-A. Khayrallah and D. Neuhoff, “Subshift models and ﬁnite-state
codes for input-constrained noiseless channels: A tutorial,” Univ.
Delaware EE Tech. Rep. 90–9–1, Dover, DE, 1990.
 K. J. Knudson, J. K. Wolf, and L. B. Milstein, “A concatenated decoding
scheme for partial response with matched spectral-null coding,”
in Proc. 1993 IEEE Global Telecommunications Conf. (GLOBECOM
’93) (Houston, TX, Nov. 1993), pp. 1960–1964.
 D. E. Knuth, “Efﬁcient balanced codes,” IEEE Trans. Inform. Theory,
vol. IT-32, pp. 51–53, Jan. 1986.
 H. Kobayashi, “Application of probabilistic decoding to digital magnetic
recording systems,” IBM J. Res. Develop., vol. 15, pp. 65–74, Jan. 1971.
 , “Correlative level coding and maximum-likelihood decoding,”
IEEE Trans. Inform. Theory, vol. IT-17, pp. 586–594, Sept. 1971.
 , “A survey of coding schemes for transmission or recording of
digital data,” IEEE Trans. Commun., vol. COM-19, pp. 1087–1099, Dec.
 H. Kobayashi and D. T. Tang, “Application of partial-response channel
coding to magnetic recording systems,” IBM J. Res. Develop., vol. 14,
pp. 368–375, July 1970.
 E. R. Kretzmer, “Generalization of a technique for binary data transmis-
sion,” IEEE Trans. Commun. Technol., vol. COM-14, pp. 67–68, Feb.
 A. Kunisa, S. Takahashi, and N. Itoh, “Digital modulation method for
recordable digital video disc,” IEEE Trans. Consumer Electron., vol.
42, pp. 820–825, Aug. 1996.
 A. Lempel and M. Cohn, “Look-ahead coding for input-restricted
channels,” IEEE Trans. Inform. Theory, vol. IT-28, pp. 933–937, Nov.
 S. Lin and D. J. Costello, Jr., Error Control Coding, Fundamentals and
Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.
 D. Lind and B. Marcus, Symbolic Dynamics and Coding. Cambridge,
U.K.: Cambridge Univ. Press, 1995.
 J. C. Mallinson and J. W. Miller, “Optimal codes for digital magnetic
recording,” Radio Elec. Eng., vol. 47, pp. 172–176, 1977.
 M. W. Marcellin and H. J. Weber, “Two-dimensional modulation codes,”
IEEE J. Select. Areas Commun., vol. 10, pp. 254–266, Jan. 1992.
 B. H. Marcus, “Soﬁc systems and encoding data,” IEEE Trans. Inform.
Theory, vol. IT-31, pp. 366–377, May 1985.
 , “Symbolic dynamics and connections to coding theory, automata
theory and systems theory,” in Different Aspects of Coding Theory (Proc.
Symp. Applied Mathematics), A. R. Calderbank, Ed., vol. 50, American
Math. Soc., 1995.
 B. H. Marcus and R. M. Roth, “Bounds on the number of states in
encoder graphs for input-constrained channels,” IEEE Trans. Inform.
Theory, vol. 37, no. 3, pt. 2, pp. 742–758, May 1991.
 B. H. Marcus, R. M. Roth, and P. H. Siegel, “Constrained systems
and coding for recording channels,” in Handbook of Coding Theory, R.
Brualdi, C. Huffman, and V. Pless, Eds. Amsterdam, The Netherlands:
 B. H. Marcus and P. H. Siegel, “On codes with spectral nulls at rational
submultiples of the symbol frequency,” IEEE Trans. Inform. Theory,
vol. IT-33, pp. 557–568, July 1987.
 B. H. Marcus, P. H. Siegel, and J. K. Wolf, “Finite-state modulation
codes for data storage,” IEEE J. Select. Areas Commun., vol. 10, pp.
5–37, Jan. 1992.
 P. A. McEwen and J. K. Wolf, “Trellis codes for E PR4ML
with squared-distance 18,” IEEE Trans. Magn., vol. 32, pp. 3995–3997,
 S. W. McLaughlin, “Five runlength-limited codes for M-ary recording
channels,” IEEE Trans. Magn., vol. 33, pp. 2442–2450, May 1997.
 S. W. McLaughlin and D. L. Neuhoff, “Upper bounds on the capacity
of the digital magnetic recording channel,” IEEE Trans. Magn., vol. 29,
pp. 59–66, Jan. 1993.
 J. W. Miller, U.S. Patent 4 027 335, 1977.
 T. Mittelholzer, P. A. McEwen, S. A. Altekar, and J. K. Wolf, “Finite
truncation depth trellis codes for the dicode channel,” IEEE Trans.
Magn., vol. 31, no. 6, pt. 1, pp. 3027–3029, Nov. 1995.
 B. E. Moision, P. H. Siegel, and E. Soljanin, “Distance-enhancing codes
for digital recording,” IEEE Trans. Magn., vol. 34, no. 1, pt. 1, pp.
69–74, Jan. 1998.
 C. M. Monti and G. L. Pierobon, “Codes with a multiple spectral null
at zero frequency,” IEEE Trans. Inform. Theory, vol. 35, pp. 463–471,
 J. Moon and B. Brickner, “Maximum transition run codes for data storage
systems,” IEEE Trans. Magn., vol. 32, no. 5, pt. 1, pp. 3992–3994,
 , “Design of a rate 5/6 maximum transition run code,” IEEE
Trans. Magn., vol. 33, pp. 2749–2751, Sept. 1997.
 H. Nakajima and K. Odaka, “A rotary-head high-density digital au-
dio tape recorder,” IEEE Trans. Consumer Electron., vol. CE-29, pp.
430–437, Aug. 1983.
 K. Norris and D. S. Bloomberg, “Channel capacity of charge-constrained
run-length limited codes,” IEEE Trans. Magn., vol. MAG-17, no. 6, pp.
3452–3455, Nov. 1981.
 B. Olson and S. Esener, “Partial response precoding for parallel-readout
optical memories,” Opt. Lett., vol. 19, pp. 661–663, 1993.
 L. H. Ozarow, A. D. Wyner, and J. Ziv, “Achievable rates for a
constrained Gaussian channel,” IEEE Trans. Inform. Theory, vol. 34,
pp. 365–371, May 1988.
 A. M. Patel, “Zero-modulation encoding in magnetic recording,” IBM
J. Res. Develop., vol. 19, pp. 366–378, July 1975. See also U.S. Patent
3 810 111, May 1974.
 , IBM Tech. Discl. Bull., vol. 231, no. 8, pp. 4633–4634, Jan. 1989.
 G. L. Pierobon, “Codes for zero spectral density at zero frequency,”
IEEE Trans. Inform. Theory, vol. IT-30, pp. 435–439, Mar. 1984.
 K. C. Pohlmann, The Compact Disc Handbook, 2nd ed. Madison, WI:
A–R Editions, 1992.
 J. Rae, G. Christiansen, S.-M. Shih, H. Thapar, R. Karabed, and P.
Siegel, “Design and performance of a VLSI 120 Mb/s trellis-coded
partial-response channel,” IEEE Trans. Magn., vol. 31, pp. 1208–1214,
 R. M. Roth, P. H. Siegel, and A. Vardy, “High-order spectral-null codes:
Constructions and bounds,” IEEE Trans. Inform. Theory, vol. 40, pp.
1826–1840, Nov. 1994.
 Lord Rothschild, “The distribution of English dictionary word lengths,”
J. Statist. Planning Infer., vol. 14, pp. 311–322, 1986.
 D. Rugar and P. H. Siegel, “Recording results and coding considerations
for the resonant bias coil overwrite technique,” in Optical Data Storage
Topical Meet., Proc. SPIE, G. R. Knight and C. N. Kurtz, Eds., vol.
1078, pp. 265–270, 1989.
 W. E. Ryan, L. L. McPheters, and S. W. McLaughlin, “Combined turbo
coding and turbo equalization for PR4-equalized Lorentzian channels,”
in Proc. Conf. Information Science and Systems (CISS’98) (Princeton,
NJ, Mar. 1998).
 N. Sayiner, “Impact of the track density versus linear density trade-off
on the read channel: TCPR4 versus EPR4,” in Proc. 1995 SPIE Int.
Symp. on Voice, Video, and Data Communications (Philadelphia, PA,
Oct. 1995), vol. 2605, pp. 84–91.
 E. Seneta, Non-negative Matrices and Markov Chains, 2nd ed. New
York: Springer, 1980.
 S. Shamai (Shitz) and I. Bar-David, “Upper bounds on the capacity for
a constrained Gaussian channel,” IEEE Trans. Inform. Theory, vol. 35,
pp. 1079–1084, Sept. 1989.
 S. Shamai (Shitz), L. H. Ozarow, and A. D. Wyner, “Information rates
for a discrete-time Gaussian channel with intersymbol interference and
stationary inputs,” IEEE Trans. Inform. Theory, vol. 37, pp. 1527–1539,
 C. E. Shannon, “A mathematical theory of communication,” Bell Syst.
Tech. J., vol. 27, pp. 379–423, July 1948.
 K. Shaughnessy, personal communication, Dec. 1997.
 L. A. Shepp, “Covariance of unit processes,” in Proc. Working Conf.
Stochastic Processes (Santa Barbara, CA, 1967), pp. 205–218.
 K. Shimazaki, M. Yoshihiro, O. Ishizaki, S. Ohnuki, and N. Ohta,
“Magnetic multi-valued magneto-optical disk,” J. Magn. Soc. Japan,
vol. 19, suppl. no. S1, pp. 429–430, 1995.
 P. H. Siegel, “Recording codes for digital magnetic storage,” IEEE
Trans. Magn., vol. MAG-21, pp. 1344–1349, Sept. 1985.
 P. H. Siegel and J. K. Wolf, “Modulation and coding for information
storage,” IEEE Commun. Mag., vol. 29, pp. 68–86, Dec. 1991.
 , “Bit-stufﬁng bounds on the capacity of two-dimensional con-
strained arrays,” in Proc. 1998 IEEE Int. Symp. Inform. Theory (Cam-
bridge, MA, Aug. 16–21, 1998), p. 323.
 J. G. Smith, “The information capacity of amplitude and variance
constrained scalar Gaussian channels,” Inform. Contr., vol. 18, pp.
 E. Soljanin, “On-track and off-track distance properties of Class 4
partial response channels,” in Proc. 1995 SPIE Int. Symp. Voice, Video,
and Data Communications (Philadelphia, PA, Oct. 1995), vol. 2605, pp.
 , “On coding for binary partial-response channels that don’t
achieve the matched-ﬁlter-bound,” in Proc. 1996 Information Theory
Work. (Haifa, Israel, June 9–13, 1996).
 E. Soljanin and C. N. Georghiades, “Multihead detection for multitrack
recording channels,” to be published in IEEE Trans. Inform. Theory,
vol. 44, Nov. 1998.
 E. Soljanin and O. E. Agazzi, “An interleaved coding scheme for
partial response with concatenated decoding,” in
Proc. 1996 IEEE Global Telecommunications Conf. (GLOBECOM ’96)
(London, U.K., Nov. 1996).
 R. E. Swanson and J. K. Wolf, “A new class of two-dimensional RLL
recording codes,” IEEE Trans. Magn., vol. 28, pp. 3407–3416, Nov.
 N. Swenson and J. M. Ciofﬁ, “Sliding block line codes to increase
dispersion-limited distance of optical ﬁber channels,” IEEE J. Select.
Areas Commun., vol. 13, pp. 485–498, Apr. 1995.
 R. Talyansky, T. Etzion, and R. M. Roth, “Efﬁcient code constructions
for certain two-dimensional constraints,” in Proc. 1997 IEEE Int. Symp.
Information Theory (Ulm, Germany, June 29–July 4), p. 387.
 D. T. Tang and L. R. Bahl, “Block codes for a class of constrained
noiseless channels,” Inform. Contr., vol. 17, pp. 436–461, 1970.
 H. K. Thapar and T. D. Howell, “On the performance of partial response
maximum-likelihood and peak detection methods in digital recording,”
in Tech. Dig. Magn. Rec. Conf. 1991 (Hidden Valley, PA, June 1991).
 H. Thapar and A. Patel, “A class of partial-response systems for
increasing storage density in magnetic recording,” IEEE Trans. Magn.,
vol. MAG-23, pp. 3666–3668, Sept. 1987.
 Tj. Tjalkens, “Runlength limited sequences,” IEEE Trans. Inform. The-
ory, vol. 40, pp. 934–940, May 1994.
 IEEE Trans. Magn., vol. 34, no. 1, pt. 1, Jan. 1998.
 B. S. Tsybakov, “Capacity of a discrete Gaussian channel with a ﬁlter,”
Probl. Pered. Inform., vol. 6, pp. 78–82, 1970.
 C. M. J. van Uijen and C. P. M. J. Baggen, “Performance of a class of
channel codes for asymmetric optical recording,” in Proc. 7th Int. Conf.
Video, Audio and Data Recording, IERE Conf. Publ. no. 79 (York, U.K.,
Mar. 1988), pp. 29–32.
 A. Vardy, M. Blaum, P. Siegel, and G. Sincerbox, “Conservative arrays:
Multi-dimensional modulation codes for holographic recording,” IEEE
Trans. Inform. Theory, vol. 42, pp. 227–230, Jan. 1996.
 J. Watkinson, The Art of Digital Audio. London, U.K.: Focal, 1988.
 A. D. Weathers and J. K. Wolf, “A new sliding block code for the
runlength constraint with the minimal number of encoder states,”
IEEE Trans. Inform. Theory, vol. 37, no. 3, pt. 2, pp. 908–913, May
 A. D. Weathers, S. A. Altekar, and J. K. Wolf, “Distance spectra for
PRML channels,” IEEE Trans. Magn., vol. 33, pp. 2809–2811, Sept.
 W. Weeks IV and R. E. Blahut, “The capacity and coding gain of
certain checkerboard codes,” IEEE Trans. Inform. Theory, vol. 44, pp.
1193–1203, May 1998.
 T. Weigandt, “Magneto-optic recording using a (2,18,2) run-length-
limited code,” S.M. thesis, Mass. Inst. Technol., Cambridge, MA, 1991.
 A. X. Widmer and P. A. Franaszek, “A DC-balanced, partitioned-block,
8b/10b transmission code,” IBM J. Res. Develop., vol. 27, no. 5, pp.
440–451, Sept. 1983.
 A. van Wijngaarden and K. A. S. Immink, “Construction of constrained
codes using sequence replacement techniques,” submitted for publica-
tion to IEEE Trans. Inform. Theory, 1997.
 J. K. Wolf and W. R. Richard, “Binary to ternary conversion by linear
ﬁltering,” Tech. Documentary Rep. RADC-TDR-62-230, May 1962.
 J. K. Wolf and G. Ungerboeck, “Trellis coding for partial-response
channels,” IEEE Trans. Commun., vol. COM-34, pp. 765–773, Aug.
 R. W. Wood, “Denser magnetic memory,” IEEE Spectrum, vol. 27, pp.
32–39, May 1990.
 R. W. Wood and D. A. Petersen, “Viterbi detection of class IV partial
response on a magnetic recording channel,” IEEE Trans. Commun., vol.
COM-34, pp. 454–461, May 1986.
 Z.-N. Wu, S. Lin, and J. M. Ciofﬁ, “Capacity bounds for magnetic
recording channels,” in Proc. 1998 IEEE Global Telecommun. Conf.
(GLOBECOM ’98) (Sydney, Australia, Nov. 8–12, 1998), to be pub-
 H. Yoshida, T. Shimada, and Y. Hashimoto, “8-9 block code: A DC-free
channel code for digital magnetic recording,” SMPTE J., vol. 92,
pp. 918–922, Sept. 1983.
 S. Yoshida and S. Yajima, “On the relation between an encoding
automaton and the power spectrum of its output sequence,” Trans. IECE
Japan, vol. 59, pp. 1–7, 1976.
 A. H. Young, “Implementation issues of 8/9 distance-enhancing con-
strained codes for EEPR4 channel,” M.S. thesis, Univ. Calif., San Diego,
 E. Zehavi, “Coding for magnetic recording,” Ph.D. dissertation, Univ.
Calif., San Diego, 1987.
 E. Zehavi and J. K. Wolf, “On saving decoder states for some trellis
codes and partial response channels,” IEEE Trans. Commun., vol. 36,
pp. 454–461, Feb. 1988.