A novel design technique for weakly constrained codes

Transactions Letters
Design Techniques for Weakly Constrained Codes
Ming Jin, Member, IEEE, K. A. S. Immink, Fellow, IEEE, and B. Farhang-Boroujeny, Senior Member, IEEE
Abstract—A general method of constructing run-length limited
(d, k) constrained codes from arbitrary sequences is introduced.
This method is then combined with the method of guided scrambling
to obtain a class of weakly constrained codes. The proposed codes
are analyzed for the case of d = 0 and are shown to give results
which are better than or comparable to those of the best available
codes, however, at the cost of failure with some very low
probability. For d > 0, the code efficiency of the codes constructed
according to the proposed method reduces significantly.
Index Terms—Bit stuffing, guided scrambling, run-length lim-
ited (RLL) code, weakly constrained codes.
CODES BASED on run-length limited (RLL) sequences
have found wide application in data storage products [4],
[5]. RLL sequences are characterized by two parameters, d and
k, which are, respectively, the minimum and maximum number
of zeros allowed between two consecutive ones. We recall that
a one corresponds to a change of direction of flux on a magnetic
medium, and a zero means no change of flux. The d constraint
is applied to reduce the rate of flux change on the magnetic
medium, in order to cope with the maximum allowable
flux change along the recording track. The k constraint, on the
other hand, is dictated by timing recovery considerations [7], [8].
Realization of RLL codes requires manipulation of the original
source sequence to satisfy the d and k constraints. This naturally
requires addition of some redundancy in the sequence.
This is commonly done by first partitioning the source sequence
into blocks of length m. Each block then undergoes a mapping
which, according to the coding rules, generates a block of
n symbols for transmission, resulting in a code rate of
R = m/n. The maximum achievable value of R is given by the
noiseless capacity, C(d, k), of the constrained channel [4], [5]. The
code efficiency is defined as η = R/C(d, k).
Paper approved by E. Ayanoglu, the Editor for Communication Theory and
Coding Application of the IEEE Communications Society. Manuscript received
February 26, 2001; revised October 23, 2001 and May 19, 2002. This paper was
presented in part at the IEEE GLOBECOM’01, San Antonio, TX, November
25–29, 2001.
M. Jin was with the Department of Electrical Engineering, National Univer-
sity of Singapore, Singapore. He is now with Seagate Technology International,
Singapore 118249, Singapore.
K. A. S. Immink is with the Institute for Experimental Mathematics, D-45326
Essen, Germany.
B. Farhang-Boroujeny was with the Department of Electrical Engineering,
National University of Singapore. He is now with the Department of Electrical
Engineering, University of Utah, Salt Lake City, UT 84112-9206 USA (e-mail:
Digital Object Identifier 10.1109/TCOMM.2003.811377
The idea of weakly constrained codes was first proposed by
Immink [3]. Weakly constrained codes follow the code rules
most of the time, but not always. Code violation occurs with
a low probability, and this can be made arbitrarily low by re-
ducing the code rate. It is argued that as the channel is not free
of errors, failure to satisfy the code constraints with a proba-
bility significantly less than the probability of errors due to the
channel imperfections will not result in any significant degrada-
tion of the system performance.
Immink [3] presented a performance analysis of weakly con-
strained codes in a rather general context. He demonstrated that
repetitive independent scrambling of the original data will give a
set of code words which comprises, with a very high probability,
at least one member that complies with the code constraint.
In this letter, we first introduce a systematic method of generating
a class of variable-length (VL) RLL (d, k) constrained
codes. VL codes were pioneered by Franaszek in the early
1970s; see [4, Sec. 7.2]. The VL codes presented by Franaszek
generate encoded blocks of fixed length, and are therefore
named synchronous VL codes. Here, we study asynchronous
VL codes, which are made synchronous (i.e., they have the
required feature of fixed output block length) by adding a
number of dummy bits. As the proposed encoding algorithm
cannot fully guarantee the fixed-length requirement without
disobeying the maximum-runlength constraint, this leads to
a class of weakly constrained codes. The encoding algorithm
is improved significantly by using a procedure called guided
scrambling first introduced by Fair et al. [1].
The long block codes considered in this letter may suffer
from severe error propagation as, in worst-case situations, en-
tire decoded words can be in error as the result of a single
channel-bit error. With the coding configuration developed by
Bliss (see [4, Sec. 6.3] and further) serious error propagation
of long block codes can be effectively combated. We assume
that our newly developed block code is used in conjunction with
such a Bliss-type encoder configuration, and that as a result, the
error propagation of long block codes does not play a role.
We consider a simple method for constructing (d, k) constrained
codes where bit stuffing [4]–[6] is used to translate arbitrary
data into constrained sequences. We assume k > d. The
encoding is done in two steps. In the first step, the encoder scans
the input data sequence and inserts (stuffs) a 1 after every
string of k − d consecutive 0’s such that the resulting bit-stuffed
sequence satisfies the RLL (0, k − d) constraint. In the second step,
a string of d consecutive 0’s is inserted after every 1 in the
bit-stuffed sequence.
0090-6778/03$17.00 © 2003 IEEE
This results in an output sequence satisfying the (d, k) constraint.
Decoding simply removes the inserted bits, which, as can be
verified, can be done in a unique fashion.
We note that the number of inserted bits, and thus the length of
the coded data, depends on the input; hence the name variable length.
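The two-step procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `stuff_encode`, `stuff_decode`, and `satisfies_dk` are hypothetical, and the first step is assumed to stuff a 1 after every run of k − d zeros so that the second step yields zero runs between d and k.

```python
def stuff_encode(bits, d, k):
    """Two-step bit stuffing (sketch): returns a (d, k)-constrained bit list."""
    if not 0 <= d < k:
        raise ValueError("this sketch assumes 0 <= d < k")
    limit = k - d
    # Step 1: stuff a 1 after every run of (k - d) consecutive 0's.
    step1, run = [], 0
    for b in bits:
        step1.append(b)
        run = run + 1 if b == 0 else 0
        if run == limit:
            step1.append(1)  # stuffed bit
            run = 0
    # Step 2: insert d 0's after every 1 (data 1's and stuffed 1's alike).
    out = []
    for b in step1:
        out.append(b)
        if b == 1:
            out.extend([0] * d)
    return out


def stuff_decode(coded, d, k):
    """Inverse of stuff_encode: removes the inserted bits."""
    limit = k - d
    # Undo step 2: drop the d 0's that follow every 1.
    step1, i = [], 0
    while i < len(coded):
        step1.append(coded[i])
        i += 1 + d if coded[i] == 1 else 1
    # Undo step 1: drop the 1 that follows every run of (k - d) 0's.
    data, run, i = [], 0, 0
    while i < len(step1):
        b = step1[i]
        data.append(b)
        i += 1
        run = run + 1 if b == 0 else 0
        if run == limit:
            i += 1  # skip the stuffed 1
            run = 0
    return data


def satisfies_dk(seq, d, k):
    """Check zero runs: between d and k inside, at most k at the edges."""
    ones = [i for i, b in enumerate(seq) if b == 1]
    if not ones:
        return len(seq) <= k
    gaps = [j - i - 1 for i, j in zip(ones, ones[1:])]
    edges_ok = ones[0] <= k and len(seq) - 1 - ones[-1] <= k
    return edges_ok and all(d <= g <= k for g in gaps)
```

For instance, `stuff_encode([0, 0, 0, 0, 1], 1, 3)` stuffs two 1's in the first step and then appends one 0 after each of the three 1's, and `stuff_decode` recovers the five input bits. The coded length depends on the data, which is the variable-length behavior noted above.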
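To analyze the above bit-stuffing method, we assume that the
input data bits are independent and that 0’s and 1’s are equally
probable. In addition, we identify two categories of bit patterns
(phrases) and their probabilities of occurrence in the input
data as listed in Table I. We note that in the first step of bit
stuffing, no action will be taken on the phrases in Category I.
However, the bit phrase in Category II is expanded by one bit.
We also note that the average phrase length in a random input
sequence is
$$\bar{L} = \sum_{i=1}^{k-d} i\,2^{-i} + (k-d)\,2^{-(k-d)} = 2 - 2^{1-(k-d)}. \qquad (1)$$
Moreover, the average number of inserted bits per phrase in the
first step and the average phrase length in the bit-stuffed sequence
are $2^{-(k-d)}$ and $2 - 2^{-(k-d)}$, respectively. The average code
rate of the first step is, therefore,
$$R_1 = \frac{2 - 2^{1-(k-d)}}{2 - 2^{-(k-d)}}. \qquad (2)$$
Hence, assuming that the input sequence contains N bits, the average
length of the bit-stuffed sequence will be $N/R_1$ bits.
In the second step, the encoder inserts a string of d consecutive
0’s after every 1 in the bit-stuffed sequence. Noting that the average
number of 0’s in the bit-stuffed sequence is the same as in the input
sequence and is equal to N/2, the average number of 1’s in the
bit-stuffed sequence is $N/R_1 - N/2$, and the average length
of the final sequence is
$$\frac{N}{R_1} + d\left(\frac{N}{R_1} - \frac{N}{2}\right). \qquad (3)$$
We obtain the average code rate of the final sequence as
$$R = \frac{2R_1}{2(1+d) - dR_1} = \frac{2 - 2^{1-(k-d)}}{2 + d - 2^{-(k-d)}}. \qquad (4)$$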
To evaluate the code efficiency of the above method, we
may compare the numerical results derived from (4) with
the theoretical bounds (i.e., the noiseless capacity of the
constrained channel) and also with those of the well-known
industry-standard RLL (d, k) codes. For example, from (4), we
obtain and . These
values should be compared against the maxentropic capacity
values 0.6793 and 0.5174, respectively. We may also notice
that the respective values for the well-known industry-standard
codes are 2/3 and 1/2. Another good example is RLL (0, k)
codes. From (4), for , we get
This also is very close to the capacity
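As a numerical cross-check of this comparison, the sketch below evaluates an average bit-stuffing rate and the noiseless capacity for a given (d, k). It rests on assumptions stated here rather than taken from the text: the rate is re-derived from a phrase model in which each input phrase receives one stuffed 1 with probability 2^-(k-d) plus d extra 0's, and the capacity is taken as log2 of the largest real root of sum_{i=d+1}^{k+1} z^-i = 1, found by bisection. The names `stuffing_rate` and `capacity` are illustrative.

```python
import math


def stuffing_rate(d, k):
    """Average rate of two-step bit stuffing for equiprobable i.i.d. input bits."""
    kp = k - d
    avg_in = 2.0 - 2.0 ** (1 - kp)       # mean input phrase length
    # Each phrase gains one stuffed 1 with probability 2^-kp, plus d 0's
    # after its single (data or stuffed) 1.
    avg_out = avg_in + 2.0 ** (-kp) + d
    return avg_in / avg_out


def capacity(d, k, tol=1e-12):
    """Capacity of the (d, k) constraint (finite k > d), by bisection."""
    def f(z):
        # Strictly decreasing in z; positive at z = 1 and negative at z = 2.
        return sum(z ** -i for i in range(d + 1, k + 2)) - 1.0

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.log2(0.5 * (lo + hi))


if __name__ == "__main__":
    for d, k in [(1, 7), (2, 7)]:
        r, c = stuffing_rate(d, k), capacity(d, k)
        print(f"(d, k) = ({d}, {k}): rate {r:.4f}, capacity {c:.4f}, efficiency {r / c:.4f}")
```

The capacities returned for (1, 7) and (2, 7) can be compared directly with the values 0.6793 and 0.5174 quoted above.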
Although the above bit stuffing is attractive as an inex-
pensive-to-implement method, it may be found impractical
because of its serious problem of error propagation. A single
bit error in the decoded data may corrupt the rest of the data
sequence. To resolve this problem, we propose a fixed-length
weakly constrained code, using the above bit-stuffing method.
The discussion in this section is limited to (0, k) codes.
The reason that we limit the discussion to (0, k) codes
is that the proposed weakly constrained codes will only give
good results when d = 0. When d > 0, addition of d 0’s after
each 1 without any consideration of the subsequent information
bits, as done in this letter, reduces the code efficiency of the
constructed constrained codes significantly.
We divide the input sequence into blocks of a fixed number of bits.
Each block is coded separately into a fixed, larger number of bits.
Thus, a fixed maximum number of stuffed bits is allowed in each block.
If the number of the required stuffed bits is less than this maximum,
the remaining positions are filled with dummy bits, which are ignored
in the decoding process. If the number of the required stuffed bits is
larger than this maximum, constraint failure occurs. The success of this
method should be assessed by studying the probability of code
failure (PCF). We, therefore, proceed with such a study. A possible
study which may lead to an exact expression for PCF involves
use of multinomial distribution functions [2]. However,
our attempt along this line has led to cumbersome equations
which are hard to handle. On the other hand, analysis based on
some approximations gives results which match computer sim-
ulations very closely. We proceed with two analysis methods of
this type. The first method is an oversimplified analysis, which
within the range of interest gives an upper bound to the PCF.
It also serves as a means of more in-depth understanding of
the problem, and it paves the way for the development of more
accurate results in the second method. The analytical PCF formulas
developed by the second method match very closely with
the simulation results.
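The fixed-length scheme described in this section can be sketched as below for (0, k) codes. This is an illustration under stated assumptions, not the authors' encoder: the names `encode_block` and `decode_block` are hypothetical, the parameter `max_stuffed` stands for the allowed number of stuffed bits per block, dummy bits are taken to be 1's so that they cannot extend a zero run, and code failure is signalled by returning `None`.

```python
def encode_block(block, k, max_stuffed):
    """Encode one block into len(block) + max_stuffed bits, or None on failure."""
    out, run, stuffed = [], 0, 0
    for b in block:
        out.append(b)
        run = run + 1 if b == 0 else 0
        if run == k:          # k consecutive 0's seen: stuff a 1
            out.append(1)
            stuffed += 1
            run = 0
    if stuffed > max_stuffed:
        return None           # code failure: block needs too many stuffed bits
    # Pad with dummy bits (assumed to be 1's) up to the fixed output length.
    return out + [1] * (max_stuffed - stuffed)


def decode_block(code, k, m):
    """Recover the m data bits; stuffed and dummy bits are discarded."""
    data, run, i = [], 0, 0
    while len(data) < m:
        b = code[i]
        data.append(b)
        i += 1
        run = run + 1 if b == 0 else 0
        if run == k:
            i += 1            # skip the stuffed 1
            run = 0
    return data               # anything left in `code` is ignored
```

Because the decoder knows the data block length `m`, it can stop as soon as `m` data bits are recovered, which is why the dummy bits need no marker. An all-zero block is the worst case: `encode_block([0] * 8, 2, 3)` needs four stuffed bits and therefore fails.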
A. Analysis Based on Single Phrase Length
In an RLL (0, k) constrained code that is constructed according
to the procedure discussed in Section II, the phrase length is a
random variable which takes values of 1 to k + 1 with the
respective probabilities listed in Table I. To simplify the analysis,
we assume that all phrases are of equal length $\bar{L}$, where $\bar{L}$ is
the average phrase length. Accordingly, in a block of length m,
there are $n_p = m/\bar{L}$ phrases, and the probability distribution
function (PDF) of the number of Category II phrases, i, in a given
block follows the binomial distribution function [2]
$$P(i) = \binom{n_p}{i}\, p^{i} (1-p)^{n_p-i} \qquad (5)$$
where $p = 2^{-k}$ is the probability that a chosen phrase
belongs to Category II, and the first term on the right-hand side
is the binomial coefficient (combinatorial number)
$$\binom{n_p}{i} = \frac{n_p!}{i!\,(n_p-i)!}.$$
The PCF is then given by
$$\mathrm{PCF} = \sum_{i=s+1}^{n_p} P(i) \qquad (6)$$
where s is the maximum number of stuffed bits allowed in each block.
Fig. 1. Trellis diagram explanation for binomial distribution based on
single-phrase approach.
A more in-depth understanding of (5) is obtained by refer-
ring to the trellis diagram depicted in Fig. 1. The horizontal
(dashed) and diagonal (dotted) arrows indicate occurrences of
phrases from Category I and Category II, respectively. Attached
to each arrow is the probability of occurrence of a phrase from
the respective category. For convenience, we may imagine that
the phrases in the data block are selected successively from left
to right, as they appear in the trellis. The trellis begins with a
root and ends at a number of terminals. Each terminal may be
reached through a number of different paths. So, attached to
each terminal, there is a set of paths. Each terminal is high-
lighted with a number which indicates the number of obser-
vations of Category II phrases along each of the paths in the
respective set. The probability of occurrence of each path is obtained
by multiplying the respective probabilities along the path.
Simple inspection shows that all the paths ending at the same
terminal (i.e., in a set) have the same probability of occurrence.
Thus, the number of paths in a set multiplied by the probability
of occurrence of each path in the set is equal to the probability
of reaching the associated terminal. This, clearly, gives the PDF
of a number of phrases from Category II in the data block, i.e.,
the binomial distribution of (5).
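The single-phrase approximation lends itself to a short numerical sketch. The code below is an illustration under assumptions that are not fixed by the text: the number of phrases per block is taken as the block length divided by the average input phrase length 2 − 2^(1−k) and rounded to an integer, the Category II probability is taken as 2^−k, and the names `category2_pmf` and `pcf_single_phrase` (with `m` the block length and `s` the allowed number of stuffed bits) are hypothetical.

```python
from math import comb


def category2_pmf(n_phrases, k):
    """Binomial PMF of the number of Category II phrases among n_phrases."""
    p = 2.0 ** -k  # assumed probability of a Category II phrase (k zeros)
    return [
        comb(n_phrases, i) * p ** i * (1.0 - p) ** (n_phrases - i)
        for i in range(n_phrases + 1)
    ]


def pcf_single_phrase(m, k, s):
    """Probability that more than s phrases of a block are Category II."""
    avg_len = 2.0 - 2.0 ** (1 - k)          # assumed mean phrase length, d = 0
    n_phrases = round(m / avg_len)          # rounding is a choice of this sketch
    # Each terminal i of the trellis collects comb(n, i) equiprobable paths;
    # failure corresponds to the terminals with i > s.
    return sum(category2_pmf(n_phrases, k)[s + 1:])
```

For example, `pcf_single_phrase(256, 4, 10)` sums the tail of the binomial distribution beyond ten Category II phrases for a 256-bit block; raising `s` shrinks that tail.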
Obviously, in practice, where the block length is fixed and
the phrase length varies, the trellis of Fig. 1, and thus, (5) may
not give a good prediction of PCF. An exact evaluation of PCF
requires consideration of the length of branches (arrows) in the
trellis and the fact that the summation of the branch lengths
along each path cannot be larger than the block length. As
noted earlier, a full consideration of this point will result in a
procedure which involves use of the multinomial distribution, which
is difficult to handle. We proceed with our next analysis which
considers the length of the Category II phrase accurately, but
still uses an average length for the phrases from Category I.
B. Analysis Based on Double Phrase Length
For convenience of the presentation here, let us assume that
the average phrase length in Category I is and the Category
II phrase has a length of . Fig. 2 presents a trellis which is
built up for this scenario, in the case where . Here,
we have three types of branches; two horizontal and one diagonal.
We note that near the end of each sequence of horizontal
branches, we reach a point where the remaining length is smaller
than a Category II phrase length; thus, the Category I phrases
occur with probability 1. Such branches are indicated by solid
line arrows. The PDF associated with each of the terminals in
Fig. 2 can be calculated in the same manner as done in Fig. 1.
That is, the probability of reaching each terminal is obtained by
identifying all the paths connecting the root to the terminal and
adding the respective probabilities.
We note that unlike Fig. 1, in the trellis of Fig. 2, the paths
ending at the same terminal do not share the same probability of
occurrence. For example, the paths ending at terminal 2 in Fig. 2
may be divided into the subsets 0, 1, and 2, with the respective
probabilities , ,and
. Moreover, we may observe that the number of paths in each
of these sets is a binomial combinatorial number (compare the
form of the trellis in each subset in Fig. 2 with each set in Fig. 1).
Generalization of this observation is obvious and will result in
the PDF equation (for )
where is the number of phrases on
the paths ending at terminal , is the number
of solid line arrows ending at terminal , and denotes the
integer part of . A more accurate estimate of PCF is, thus, given
To evaluate the accuracy of the results above, in Fig. 3, we
have presented a set of PCF curves obtained by using (6) and (8)
with different maximum run-length constraints k. The results
clearly show that the double-phrase approximation gives almost
perfect results. The single-phrase approach, however, does not
give accurate results, particularly when is small. We also note
that the single-phrase approach provides an upper limit to PCF.
This may be explained as follows.
In the single-phrase analysis, we ignore the fact that,
upon occurrence of a phrase from Category II, the remaining
length in the present block will reduce by the length of the
Category II phrase. That is, we ignore the fact that the length of
the horizontal paths in the trellis decreases as we move to the
upper branches. On the contrary, in the double-phrase analysis,
this reduction of trellis path lengths is considered. Ignoring the
reduction in the path lengths obviously results in overcounting
the number of paths that end at higher terminals. This, in turn,
results in overestimating the chance of reaching the higher
terminals, i.e., those that correspond to code failure.
Fig. 2. Trellis diagram explanation based on double-phrase approach.
Fig. 3. PCF for weakly constrained codes. Block length .
Solid-line curves are the theoretical results based on double-phrase
approximation. Dots are simulation results based on examination of data
blocks. Dashed-line curves are the theoretical results based on single-phrase
approximation.
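The quantity that both approximations target can also be computed numerically, which gives a reference for comparing them. The sketch below is not one of the two analyses of this section: it is a dynamic program over the pair (current zero-run length, number of stuffed bits so far), assuming equiprobable independent input bits and (0, k) stuffing that starts from a zero run of length 0 in every block. The name `pcf_exact` and its parameters (`m` data bits per block, `s` allowed stuffed bits, with `s >= 0`) are hypothetical.

```python
def pcf_exact(m, k, s):
    """P(more than s bits are stuffed into a block of m random bits)."""
    # dist maps (run, count) to probability; count is capped at s + 1,
    # which acts as an absorbing "failure" level.
    dist = {(0, 0): 1.0}
    for _ in range(m):
        nxt = {}
        for (run, count), pr in dist.items():
            # Input bit 1: the zero run is reset.
            key = (0, count)
            nxt[key] = nxt.get(key, 0.0) + 0.5 * pr
            # Input bit 0: the run grows; reaching k triggers a stuffed 1,
            # which resets the run and increments the stuffed-bit count.
            if run + 1 == k:
                key = (0, min(count + 1, s + 1))
            else:
                key = (run + 1, count)
            nxt[key] = nxt.get(key, 0.0) + 0.5 * pr
        dist = nxt
    return sum(pr for (_, count), pr in dist.items() if count == s + 1)
```

The state space has at most k(s + 2) entries, so the cost grows only linearly with the block length. For a 4-bit block with k = 2, `pcf_exact(4, 2, 1)` equals 1/16, since only the all-zero block needs two stuffed bits.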
Further results of (8) are presented in Fig. 4. Here, the code
efficiency is fixed at and is used to select the value of
according to
We note that the PCF decreases rapidly with the increase of both
the run length, k, and the block length. However, when k is
small, say , PCF is high, and thus, may not be acceptable.
Fig. 4. PCF as a function of the run length k, with the input data block
length as a parameter. Code efficiency is fixed at .
For smaller values of k, we propose using the idea of guided scrambling (GS) [1],
[9] to achieve a low PCF. Fig. 5 depicts a schematic of the en-
coder which uses the idea of GS. The GS encoder takes a block
of input data ( bits) and generates randomly scram-
bled sequences of length bits each. The scrambled se-
quences are bit stuffed with a maximum of bits to satisfy
the constraint. Assuming that the scrambled sequences are suf-
ficiently distinct, which is achievable according to [1], the prob-
ability of failure of not being able to get at least one successfully
constrained sequence from the structure of Fig. 5 is
We note that the code rate here is .
Fig. 6 shows the PCF as a function of the code efficiency for
different input data block lengths, with the remaining parameters held
fixed. These results show that the failure probability
decreases with the increase of the data block length. To achieve
a small PCF, it is preferred to choose a large block length. On the
other hand, to limit the error propagation, it is better to choose
a small block length. The following presentation uses a single block
length chosen as a compromise.
Fig. 5. Combined GS and bit-stuffing encoder scheme for weakly constrained codes.
Fig. 6. PCF as a function of the code efficiency.
Fig. 7. PCF as a function of the code efficiency.
Fig. 8. PCF as a function of the code efficiency, panels (a) and (b).
Fig. 7 presents a set of plots showing the PCF, , versus
code efficiency, , for and and various values
of . Note that corresponds to no scrambling. From the
results, we clearly see the effect of GS in improving the failure
probability. For example, without scrambling, a PCF of
can be achieved at a code efficiency of 0.952, while if ,
i.e., eight scrambled codes are examined, for the same PCF, the
achievable code efficiency is improved to 0.978.
Fig. 8 presents two sets of curves which show how the PCF
improves as increases, for values of . The results
are interesting and also impressive. They show very efficient
codes can be designed for even small values of
. For example, if we choose , and failure
probability of , the code efficiency of the resulting RLL
(0, 2), (0, 3), and (0, 6) codes in Fig. 8(a) are about 0.933, 0.953,
and 0.978, respectively. In Fig. 8(b), the corresponding code
efficiency will improve to the values 0.948, 0.962, and 0.984,
respectively, if is selected. These should be compared
with the values 0.9101, 0.9388, and 0.9467 corresponding to
the industry standards 4/5 RLL (0, 2), 8/9 RLL (0, 3), and 16/17
RLL (0, 6) codes, respectively [4].
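The efficiencies quoted for the industry-standard codes can be reproduced from the definition of code efficiency as rate divided by capacity. The sketch below assumes that the capacity of the (0, k) constraint is log2 of the largest real root of sum_{i=1}^{k+1} z^-i = 1 and finds it by bisection; the names `capacity_0k` and `efficiency` are illustrative.

```python
import math


def capacity_0k(k, tol=1e-12):
    """Noiseless capacity of the (0, k) constraint, by bisection."""
    def f(z):
        # Strictly decreasing in z; positive at z = 1, negative at z = 2.
        return sum(z ** -i for i in range(1, k + 2)) - 1.0

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.log2(0.5 * (lo + hi))


def efficiency(rate, k):
    """Code efficiency: rate divided by the (0, k) capacity."""
    return rate / capacity_0k(k)


if __name__ == "__main__":
    for rate, k, label in [(4 / 5, 2, "4/5 (0, 2)"),
                           (8 / 9, 3, "8/9 (0, 3)"),
                           (16 / 17, 6, "16/17 (0, 6)")]:
        print(f"{label}: efficiency {efficiency(rate, k):.4f}")
```

The printed values can be set against the 0.9101, 0.9388, and 0.9467 quoted above, and against the efficiencies read from Fig. 8 for the proposed codes.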
In this letter, we introduced a general method of constructing
(d, k) constrained codes from arbitrary sequences using
a simple bit-stuffing technique. We noted that this method,
although it leads to a class of relatively efficient codes, has
the problems of error propagation and variable code rate. These
problems were then resolved by dividing the input data into
blocks of fixed length and applying the proposed method to
individual blocks, but keeping the number of stuffed bits fixed
for each block. We noticed that this leads to a class of weakly
constrained codes with a fixed code rate. These codes were
analyzed for the case of the (0, k) constraint, the only case that we
had found to give good results. We found that these codes can
only give acceptable results for larger values of k. The
use of GS was then suggested as a very effective method of
achieving high efficiency for smaller values of k.
[1] I. J. Fair, W. D. Grover, W. A. Krzymien, and R. I. MacDonald, “Guided
scrambling: a new line coding technique for high bit rate fiber optic
transmission systems,” IEEE Trans. Commun., vol. 39, pp. 289–297,
Feb. 1991.
[2] W. Feller, An Introduction to Probability Theory and Its Applications,
2nd ed. New York: Wiley, 1971.
[3] K. A. S. Immink, “Weakly constrained codes,” Electron. Lett., vol. 33,
no. 23, pp. 1943–1944, Nov. 1997.
[4] K. A. S. Immink, Codes for Mass Data Storage Systems. Eindhoven,
The Netherlands: Shannon Foundation, 1999.
[5] E. A. Lee and D. G. Messerschmitt, Digital Communication. Boston,
MA: Kluwer, 1988.
[6] P. E. Bender and J. K. Wolf, “A universal algorithm for generating op-
timal and nearly optimal run-length limited, charge-constrained binary
sequences,” in Proc. 1993 IEEE Int. Symp. Information Theory, Jan.
17–22, 1993, p. 6.
[7] J. W. M. Bergmans, Digital Baseband Transmission and
Recording. Boston, MA: Kluwer, 1996.
[8] S. X. Wang and A. M. Taratorin, Magnetic Information Storage Tech-
nology. San Diego, CA: Academic, 1999.
[9] A. Kunisa, “Run-length control based on guided scrambling for digital
magnetic recording,” IEICE Trans. Electron., vol. E82-C, no. 12, pp.
2209–2217, Dec. 1999.