All content in this area was uploaded by Kees Schouhamer Immink on Aug 19, 2021

Efficient encoding of constrained block codes

Kees A. Schouhamer Immink, Fellow, IEEE, and Kui Cai, Senior Member, IEEE

Abstract: We present coding methods for generating $\ell$-symbol constrained codewords taken from a set, $S$, of allowed codewords. In standard practice, the size of the set $S$, denoted by $M = |S|$, is truncated to an integer power of two, which may lead to a serious waste of capacity. We present an efficient and low-complexity coding method for avoiding the truncation loss, where the encoding is accomplished in two steps. First, a series of binary input (user) data is translated into a series of $M$-ary symbols in the alphabet $\mathcal{M} = \{0, \ldots, M-1\}$. Then, in the second step, the $M$-ary symbols are translated into a series of admissible $\ell$-symbol words in $S$ by using a small look-up table. The presented construction of Pearson codes and fixed-weight codes offers a rate close to capacity. For example, the presented 255B320B balanced code, where 255 source bits are translated into 32 10-bit balanced codewords, has a rate 0.1 % below capacity.

Keywords: constrained code, code design, binary block code, balanced code, Pearson code.

I. INTRODUCTION

Constrained codes have found widespread application in a large variety of communication systems such as cable transmission [1, 2], vehicular communications systems, visible light communications (VLC) systems [3], and data storage products based on magnetic, optical, solid-state (Flash), and DNA media. Runlength limited (RLL) codes [4] use codewords with restrictions on the minimum and maximum runlength (that is, the number of consecutive like symbols) of the encoded sequence. RLL codes are ubiquitous in optical disc and magnetic recording products [5], VLC [3, 6], and DNA-based storage media [7, 8]. Balanced and almost balanced codes employ codewords with equal, or almost equal, numbers of 1’s and 0’s [5]. A typical example of an almost balanced code is the 8B10B code. The 8B10B code has many embodiments [2, 9], and is widely used in gigabit telecommunication systems and data storage media. Combinations of RLL and balanced codes can be found in data storage, energy harvesting, and communications codes [10, 11, 12, 13].

The codewords of a constrained block code are taken from a selected repertoire, $S \subseteq Q^\ell$, of admissible codewords $\mathbf{x} = (x_1, x_2, \ldots, x_\ell)$, $x_i \in Q$. The number of admissible codewords is denoted by $M = |S|$, the size of $S$. We do not concern ourselves with the selection or pairing of codewords. Each admissible codeword will be uniquely represented by an integer symbol taken from the alphabet $\mathcal{M} = \{0, \ldots, M-1\}$.

Kees A. Schouhamer Immink is with Turing Machines Inc, Willemskade 15d, 3016 DK Rotterdam, The Netherlands. E-mail: immink@turing-machines.com.

Kui Cai is with Singapore University of Technology and Design (SUTD), Science, Mathematics, and Technology Cluster, 8 Somapah Rd, 487372, Singapore. E-mail: cai_kui@sutd.edu.sg.

This work is supported by Singapore Ministry of Education Academic Research Fund Tier 2 MOE2019-T2-2-123 and RIE2020 Advanced Manufacturing and Engineering (AME) programmatic grant A18A6b0057.

The information capacity, denoted by $C$, of the channel using constrained codewords equals

$$C = \log_2 M. \tag{1}$$

In standard practice, the size of the original set $S$ of admissible codewords is truncated to the largest integer power of two not exceeding $M$, $2^{\lfloor C \rfloor}$, by judiciously deleting the surplus, $M - 2^{\lfloor C \rfloor}$, words, which may lead to a serious waste of capacity. A notorious example is the binary Pearson code, where only one word, the all-1 or all-0 word, is excluded [14]. With prior art fixed-block codes the rate equals $(\ell-1)/\ell$, which entails a significant redundancy for small $\ell$. Another well-known example is $M = 252$ (the number of 10-bit balanced binary codewords with equal numbers of 1’s and 0’s), where the truncation to $2^7 = 128$ codewords wastes about 12 % of the information rate, since $1 - 7/\log_2 252 \approx 0.12$. By combining ten 10-bit balanced words, we can translate 79 bits into ten 10-bit balanced codewords. The redundancy is then less than one percent, but the improvement comes at a higher complexity of the look-up translation tables. There is a need to improve the rate efficiency of constrained codes without complex look-up tables.

We present and investigate a new encoding procedure that aims to improve the code rate efficiency without the need to use large look-up tables. The new encoding method is accomplished in two steps. In the first, key step, binary source data are efficiently translated into a series of integer symbols in the alphabet $\mathcal{M}$ that are conveniently represented by $q$-bit binary words, where $q$ is an integer satisfying $q = \lceil C \rceil$. In the second step, the series of $q$-bit words is translated into a series of admissible codewords in $S$. The complexity of the first encoding step scales linearly with the number of symbols in the codeword.

The paper is organized as follows. In Section II, we start with a survey of the properties of the radix conversion scheme. Section III presents results of the new coding technique. An alternative scheme, the variable length to fixed length scheme, is investigated in Section IV. Applications to binary Pearson codes and balanced codes are given in Section V, where we present a high-rate 255B320B balanced code in which 255 source bits are translated into 32 10-bit balanced codewords. Our conclusions are presented in Section VI.

II. RADIX CONVERSION SCHEME

A straightforward method for translating an $n$-bit source file into a series of integer symbols in the alphabet $\mathcal{M}$ is base or radix conversion. The binary $n$-bit input word, considered as a number in radix 2, is converted into $L_o$ integer symbols in $\mathcal{M}$, where $L_o$ is a user-defined positive integer [15]. The $L_o$ radix-$M$ symbols, in turn, are translated, using a look-up table, into the corresponding admissible words. The number of distinct integers that can be addressed with $L_o$ symbols in an $M$-radix system equals $M^{L_o}$, so that for a code to exist we require $2^n \le M^{L_o}$; the largest such $n$ is

$$n = \lfloor L_o C \rfloor. \tag{2}$$

An integer symbol in the alphabet $\mathcal{M}$ can carry at most $C$ bits, so it is natural to define the rate efficiency of an encoder as the quotient of the (average) number of bits that are translated per symbol and the capacity $C$. Then, the rate efficiency of the radix-2-to-$M$ conversion, denoted by $R_o(L_o)$, equals

$$R_o(L_o) = \frac{n}{L_o C} = \frac{\lfloor L_o C \rfloor}{L_o C}. \tag{3}$$

The rate efficiency of a simple look-up table, $L_o = 1$, equals $R_o(1) = \lfloor C \rfloor / C$. The best case rate efficiency is obtained when $M$ is an integer power of two; then $R_o(L_o) = 1$, and the coding step is lossless.

Example 1: Let $M = 252$. Then, we can transmit at most $C = \log_2(252) \approx 7.977$ bits per symbol. A simple binary encoder, using a look-up table, has a rate efficiency $R_o(1) = 7/7.977 \approx 0.877$, which implies a relative rate loss of about 12 %. Let, for example, $L_o = 10$; then $n = 79$, so that the (relative) code redundancy of the radix conversion scheme equals $1 - R_o(10) = 1 - 79/79.77 \approx 0.0097$.
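The numbers in Example 1 can be checked with a few lines of Python. This is a minimal sketch, not part of the original paper: the function names `max_source_bits` and `rate_efficiency` are illustrative, and the floor in (2) is evaluated exactly with integer arithmetic rather than floating-point logarithms.

```python
from math import log2


def max_source_bits(M: int, Lo: int) -> int:
    """Largest n with 2**n <= M**Lo, i.e. n = floor(Lo * log2(M)), eq. (2)."""
    # bit_length() - 1 is the exact floor of log2 for a positive integer.
    return (M ** Lo).bit_length() - 1


def rate_efficiency(M: int, Lo: int) -> float:
    """Rate efficiency R_o(Lo) = n / (Lo * C) with C = log2(M), eq. (3)."""
    return max_source_bits(M, Lo) / (Lo * log2(M))


if __name__ == "__main__":
    for Lo in (1, 10):
        n = max_source_bits(252, Lo)
        redundancy = 1 - rate_efficiency(252, Lo)
        print(f"Lo={Lo:2d}  n={n:2d}  redundancy={redundancy:.4f}")
```

For $M = 252$ the sketch gives $n = 7$ for $L_o = 1$ and $n = 79$ for $L_o = 10$, in line with the example.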

A drawback of the radix conversion scheme, unless $M$ is an integer power of two, is the increasing complexity with growing word length $n$, as we require $n$-bit addition/subtraction units and storage of the $(M-1) \times L_o$ $n$-bit wide coefficients. Let $M$ be close to an integer power of two, say $M = 2^u + v$, where $u, v$ are two integers, $u > 0$, $|v| \ll 2^u$; then $C \approx u + \frac{v}{\ln(2)\, 2^u}$. Let $v > 0$; then $R_o(1) = u/C$, and we simply find that $R_o(L_o) > R_o(1)$ for $L_o \gtrsim 2^u \ln(2)/v \approx 0.693 \cdot 2^u / v$. In other words, in order to improve the rate with respect to that of a simple look-up table, $R_o(1)$, we must increase $L_o$. For example, let $M = 33$; then $C = \log_2(33) \approx 5.044394118$ and $R_o(1) = 5/C \approx 0.9911$. We easily find that $L_o' = 23$ is the smallest $L_o$ that can increase the rate, to $R_o(23) = \lfloor C L_o' \rfloor / (C L_o') = (116/115)\, R_o(1)$. In the next section, we describe a simple method for efficiently generating codewords that raises fewer concerns with respect to complexity.
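The threshold on $L_o$ can also be found by direct search. The sketch below is an illustration under our own naming (`smallest_improving_Lo` and its `limit` parameter are not from the paper). It uses the fact that $R_o(L_o) > R_o(1)$ holds exactly when $\lfloor L_o C \rfloor > L_o \lfloor C \rfloor$, which can be tested with integers only.

```python
from typing import Optional


def smallest_improving_Lo(M: int, limit: int = 10_000) -> Optional[int]:
    """Smallest Lo >= 2 for which radix conversion beats a plain look-up table.

    Returns None if M is a power of two (no loss to recover) or if no
    improvement is found up to `limit`.
    """
    floor_c = M.bit_length() - 1          # floor(log2 M)
    if M == 1 << floor_c:                 # M is an integer power of two
        return None
    for Lo in range(2, limit + 1):
        n = (M ** Lo).bit_length() - 1    # floor(Lo * log2 M), exact
        if n > Lo * floor_c:              # equivalent to R_o(Lo) > R_o(1)
            return Lo
    return None


if __name__ == "__main__":
    print(smallest_improving_Lo(33))      # the M = 33 case discussed above
```

For $M = 33$ the search returns 23, close to the estimate $2^u \ln(2)/v = 32 \ln 2 \approx 22.2$.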

III. DESCRIPTION OF THE NEW CODING METHOD

A. Basic encoder

An integer in $\mathcal{M}$ is represented by a $q$-bit word, $q = \lceil \log_2 M \rceil$, taken from a constrained set, $\mathcal{C}$, where $|\mathcal{C}| = M$. The aim of the new coding method is to efficiently translate binary source data into a series of $q$-bit words in $\mathcal{C}$. The binary source data are assumed to be represented by $(n-1)$-bit words, denoted by $(a_1, \ldots, a_{n-1})$, $a_i \in B = \{0, 1\}$, where $n$ is a conveniently chosen integer. The $(n-1)$-bit source word is translated, using the new algorithm, into $L$ $q$-bit words, where $qL = n$. The $q$-bit words are denoted by $\mathbf{u}_i$, $1 \le i \le L$, where the first word, $\mathbf{u}_1 = (p, a_1, \ldots, a_{q-1})$, called the pivot word, contains $q-1$ user bits plus a redundant bit called the pivot bit, denoted by $p$, $p \in B$. The value of the pivot bit, $p$, is governed by the encoder, as described below. The remaining $(L-1)$ $q$-bit words are defined by a shuffled input: $\mathbf{u}_i = (a_{(i-1)q}, \ldots, a_{iq-1})$, $2 \le i \le L$.

B. Description of the encoding and decoding algorithms

For clerical convenience we define two functions: $\mathrm{dec}(\mathbf{y})$ denotes the integer value of the $q$-bit word $\mathbf{y} = (y_1, \ldots, y_q)$, and, vice versa, $\mathbf{y} = \mathrm{bin}_q(z)$ denotes the $q$-bit binary representation, $\mathbf{y}$, of the integer $z$, $0 \le z \le 2^q - 1$. Clearly, $\mathrm{dec}(\mathbf{y}) = z$. All variables are integers; boldface variables denote $q$-bit words. Let $w = 2^q - M$ denote the number of inadmissible words, where a word $\mathbf{u}$ is inadmissible if $\mathrm{dec}(\mathbf{u}) < w$. At the conclusion of the algorithm, all inadmissible $q$-bit words, $\mathbf{u}_i$, $\mathrm{dec}(\mathbf{u}_i) < w$, are eliminated and replaced by admissible $q$-bit words, so that $\mathrm{dec}(\mathbf{u}_i) \ge w$, $\forall i$.

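Encoding routine

Input: The integers $q$, $w = 2^q - M$, $L$, and the binary $(Lq-1)$-bit source data $(a_1, \ldots, a_{Lq-1})$.

Output: Series of encoded $q$-bit words $\mathbf{u}_i$, where $\mathrm{dec}(\mathbf{u}_i) \ge w$, $1 \le i \le L$.

Initialize: Define the $L$ $q$-bit words $\mathbf{u}_1 = (1, a_1, \ldots, a_{q-1})$ and $\mathbf{u}_i = (a_{(i-1)q}, \ldots, a_{iq-1})$, $2 \le i \le L$. Set $v = 1$.

Replacing inadmissible words:

for i = 2 : L
    if dec(u_i) < w
        t = dec(u_i); u_i = u_v; u_v = bin_q((i-1)w + t); v = i
    end
end

The temporary variable t holds the value of the inadmissible word, since u_i is overwritten before the new content of u_v is formed.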

Note that the pivot bit equals ‘1’ in case the user data is sent unmodified, or it equals ‘0’ in case at least one word has been modified. As a result, the receiver can easily detect that modifications have been effected. Decoding of the received codeword can be accomplished in a straightforward way by recursively undoing the replacements.

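Decoding routine

Input: The integers $q$, $w = 2^q - M$, $L$, and $L$ $q$-bit words $\mathbf{u}_i$, $1 \le i \le L$, encoded by the above routine.

Output: Series of decoded $q$-bit words, denoted by $\hat{\mathbf{u}}_i$, $1 \le i \le L$. The $(Lq-1)$-bit source data $(a_1, \ldots, a_{Lq-1})$ are found after a reshuffling of $\hat{\mathbf{u}}_i$.

Restoring the source data:

for i = 1 : L, û_i = u_i, end
if dec(u_1) < 2^(q-1)
    c = dec(u_1);
    while (c < 2^(q-1))
        v = 1 + c ÷ w; û_v = bin_q(c - (v-1)w); c = dec(u_v);
    end
    û_1 = bin_q(c - 2^(q-1));
end

Here ÷ denotes integer division. The index v is computed only while c < 2^(q-1), so that v never exceeds L.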

The arithmetic of the algorithms is easily embodied in a

look-up table. Two worked encoding examples will be helpful

to understand the encoding algorithm.

Example 2: Let $q = 4$, $L = 4$, $w = 2$ (‘0000’ and ‘0001’ are forbidden words). Let the user data be ‘000 0001 0000 1111’. After prepending the pivot bit ‘1’, we obtain the sequence ‘1000 0001 0000 1111’. Set at the start $v = 1$. We find the first inadmissible word, ‘0001’, at position $i_1 = 2$; let $w_{i_1} = \mathrm{dec}(\mathbf{u}_2) = 1$ denote its value. We replace it by the pivot word, that is, $\mathbf{u}_2 = \mathbf{u}_1 =$ ‘1000’, and the pivot word becomes $\mathbf{u}_1 = \mathrm{bin}_4((i_1 - 1)w + w_{i_1}) = \mathrm{bin}_4((2-1)2 + 1) = \mathrm{bin}_4(3) =$ ‘0011’. We obtain the intermediate result ‘0011 1000 0000 1111’. Set $v = i_1 = 2$. The second inadmissible word, ‘0000’ with value $w_{i_2} = 0$, is found at index position $i_2 = 3$. We now set $\mathbf{u}_3 = \mathbf{u}_v = \mathbf{u}_2 =$ ‘1000’ and $\mathbf{u}_2 = \mathrm{bin}_4((i_2 - 1)w + w_{i_2}) = \mathrm{bin}_4(4 + 0) =$ ‘0100’, and we obtain the final result ‘0011 0100 1000 1111’.

Example 3: Let, as above, $q = 4$, $L = 4$, $w = 2$. Let the user data be ‘000 0000 0000 0000’. Without much ado, we write down the intermediate results ‘0010 1000 0000 0000’, ‘0010 0100 1000 0000’, and ‘0010 0100 0110 1000’. The sent codeword is ‘0010 0100 0110 1000’. In case the source word is ‘001 0001 0001 0001’, we obtain the codeword ‘0011 0101 0111 1001’.
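The encoding and decoding routines of this section can be sketched in Python as follows. This is our own illustrative transcription, not code from the paper: the names `encode` and `decode` are assumptions, words are held as integers, indices are 0-based (so the paper's $(i-1)w$ becomes `k * w`), and the value of an inadmissible word is saved in a temporary before the word is overwritten.

```python
def encode(bits: str, q: int, w: int) -> list[int]:
    """Encode an (L*q - 1)-bit string into L q-bit words with value >= w."""
    if (len(bits) + 1) % q:
        raise ValueError("source length must be L*q - 1")
    padded = "1" + bits                    # prepend the pivot bit '1'
    L = len(padded) // q
    if L * w > 2 ** (q - 1):
        raise ValueError("L too large for unique decoding")
    u = [int(padded[k * q:(k + 1) * q], 2) for k in range(L)]
    v = 0                                  # word that receives the next pointer
    for k in range(1, L):                  # k = i - 1 in the paper's numbering
        if u[k] < w:                       # inadmissible word
            t = u[k]                       # save its value before overwriting
            u[k] = u[v]                    # move the pivot word here
            u[v] = k * w + t               # pointer: position and old value
            v = k
    return u


def decode(u: list[int], q: int, w: int) -> str:
    """Undo the replacements and return the (L*q - 1)-bit source string."""
    half = 2 ** (q - 1)
    out = list(u)
    c = u[0]
    while c < half:                        # c is a pointer, not the pivot word
        k = c // w                         # position of the replaced word
        out[k] = c - k * w                 # its original (inadmissible) value
        c = u[k]                           # follow the chain
    out[0] = c - half                      # original pivot word, pivot bit cleared
    text = "".join(format(x, f"0{q}b") for x in out)
    return text[1:]                        # drop the pivot bit position


if __name__ == "__main__":
    words = encode("000000100001111", q=4, w=2)       # user data of Example 2
    print(" ".join(format(x, "04b") for x in words))  # 0011 0100 1000 1111
    print(decode(words, q=4, w=2))
```

Run on the user data of Examples 2 and 3, the sketch reproduces the codewords given there, and decoding returns the original source bits.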

The maximum number, $L$, of $q$-bit words for which unique decoding is possible equals [16]

$$L = \left\lfloor \frac{2^{q-1}}{2^q - M} \right\rfloor, \tag{4}$$

so that the rate efficiency of the code, denoted by $R_1(M)$, is at most

$$R_1(M) = \frac{Lq - 1}{LC}. \tag{5}$$

If $2^q - M = 2^t$ is a power of two, $1 \le t \le q-1$, we simply find

$$R_1(M) = \frac{q - 2^{t+1-q}}{C}. \tag{6}$$

In case $L = 1$, that is, when $M$ is in the range

$$2^{q-1} < M \le \frac{3}{2}\, 2^{q-1} - 1, \tag{7}$$

the algorithm does not improve the rate efficiency with respect to a simple look-up table. The worst case rate efficiency equals

$$R_1\!\left(\frac{3}{2}\, 2^{q-1} - 1\right) \approx \frac{q-1}{q - 2 + \log_2(3)}. \tag{8}$$

We may, in some instances, improve the rate efficiency by combining $r$, $r \ge 1$, $M$-ary symbols into a single $r$-symbol word.
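Equations (4) and (5) are easy to evaluate numerically. The following sketch uses hypothetical helper names (`code_parameters`, `rate_r1`) and assumes that $M$ is not an integer power of two, so that $w = 2^q - M > 0$.

```python
from math import log2


def code_parameters(M: int) -> tuple[int, int, int]:
    """Return (q, w, L) for an alphabet of M symbols, following eq. (4)."""
    q = (M - 1).bit_length()          # q = ceil(log2 M) for M >= 2
    w = 2 ** q - M                    # number of inadmissible q-bit words
    if w == 0:
        raise ValueError("M is a power of two; no replacement is needed")
    L = 2 ** (q - 1) // w             # floor(2^(q-1) / (2^q - M))
    return q, w, L


def rate_r1(M: int) -> float:
    """Rate efficiency R_1(M) = (L*q - 1) / (L*C), eq. (5)."""
    q, _, L = code_parameters(M)
    return (L * q - 1) / (L * log2(M))


if __name__ == "__main__":
    for M in (70, 252, 924):
        q, w, L = code_parameters(M)
        print(f"M={M:4d}  q={q:2d}  w={w:3d}  L={L:2d}  R1={rate_r1(M):.3f}")
```

For $M = 252$ this gives $q = 8$, $w = 4$, and $L = 32$, that is, 255 source bits per 32 words.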

C. Combining words

The number of combined admissible $r$-symbol words equals $M^r$, so that we obtain

$$q' = \lceil r \log_2 M \rceil, \tag{9}$$

$$L' = \left\lfloor \frac{2^{q'-1}}{2^{q'} - M^r} \right\rfloor. \tag{10}$$

Fig. 1. Redundancy $1 - R_r(M)$ versus $C = \log_2 M$ for $r = 1$ and 2. The curve denoted by VF is discussed in Section IV.

Fig. 2. Redundancy $1 - \hat{R}_3(M)$ versus $C = \log_2 M$, where $\hat{R}_3(M)$ denotes the largest of $R_1(M)$, $R_2(M)$, and $R_3(M)$.

Let $R_r(M)$ denote the rate efficiency of the encoder, then

$$R_r(M) = \frac{L'q' - 1}{L' r C} = R_1(M^r). \tag{11}$$

Figure 1 shows the redundancy $1 - R_r(M)$ versus $C = \log_2 M$ for $r = 1$ and 2. The encoder cannot improve its rate efficiency with respect to an encoder using a simple look-up table if $L' = 1$, that is, when $M$ is in the range $2^{(q'-1)/r} < M < \sqrt[r]{\tfrac{3}{2}\, 2^{q'-1}}$, which gets smaller with increasing $r$.

We may further observe that $R_r(M) > R_1(M)$, $r > 1$, does not hold for all $M$. Let $\hat{R}_m(M)$ denote the highest $R_r(M)$ achievable for $r = 1, \ldots, m$, that is, $\hat{R}_m(M) = \max\{R_1(M), R_2(M), \ldots, R_m(M)\}$. Figure 2 shows the redundancy $1 - \hat{R}_3(M)$ as a function of $\log_2 M$.
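The effect of combining symbols can be explored with the sketch below, which evaluates (9) to (11) using exact integer arithmetic for $q'$ and $L'$. The function names `rate_rr` and `best_rate` are ours, and returning a rate of 1 when $M^r$ is a power of two is an assumption for that limiting case, which the formulas above do not cover.

```python
from math import log2


def rate_rr(M: int, r: int) -> float:
    """Rate efficiency R_r(M) = R_1(M**r) when r symbols are combined."""
    Mr = M ** r
    q = (Mr - 1).bit_length()         # q' = ceil(r * log2 M), eq. (9)
    w = 2 ** q - Mr                   # inadmissible q'-bit words
    if w == 0:
        return 1.0                    # assumption: lossless limiting case
    L = 2 ** (q - 1) // w             # L', eq. (10)
    return (L * q - 1) / (L * r * log2(M))   # eq. (11)


def best_rate(M: int, m: int) -> float:
    """Largest R_r(M) over r = 1, ..., m."""
    return max(rate_rr(M, r) for r in range(1, m + 1))


if __name__ == "__main__":
    for M in (180, 252):
        rates = ", ".join(f"{rate_rr(M, r):.4f}" for r in (1, 2, 3))
        print(f"M={M}: R_1, R_2, R_3 = {rates}")
```

For $M = 180$, a value chosen here only for illustration, a single symbol gives $L = 1$ and no gain over a look-up table, whereas $r = 2$ gives $L' = 44$ and a rate efficiency above 0.999.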

D. Encoder complexity

The complexity of the encoder scales with $L$. If the source data are random, then the number of replacements, $t$, $0 \le t \le L$, in the $L$ $q$-bit words follows a binomial distribution. The average number of replacements, denoted by $\mu$, is

$$\mu = \frac{Lw}{2^q}. \tag{12}$$

From (4), we have $\mu \le 1/2$. The probability that no word is altered equals $\left(1 - \frac{w}{2^q}\right)^L$.

TABLE I
ENCODER TABLE OF VF CODE FOR $M = 5$.

input   output
000     0
001     1
01      2
10      3
11      4

IV. VARIABLE-TO-FIXED (VF) LENGTH ENCODING

Cao and Fair have investigated the application of variable-to-fixed length codewords (VF code) for constrained systems [17, 18]. Again, let $q = \lceil \log_2 M \rceil$. In the VF scheme, the $M$ source words may have two lengths, namely $q-1$ and $q$. We define $v_1 = 2^q - M$ source words of length $q-1$ and $v_2 = M - v_1 = 2M - 2^q$ source words of length $q$. The $M$ source words are assigned to the $M$ integers taken from $\mathcal{M}$.

Example 4: Let $M = 5$, then $q = 3$, $v_1 = 2^q - M = 3$, and $v_2 = 2$. Define the five source words 000, 001, 01, 10, 11, where the first two have length 3 and the last three have length 2. The $M = 5$ source words are arbitrarily assigned to the integers $0, \ldots, 4$. Table I shows a possible assignment. Let the input string be 001011000011. The encoder parses the input string into the words 001, 01, 10, 000, 11 and translates them, using Table I, into the output string 1, 2, 3, 0, and 4.
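The parsing step of Example 4 can be sketched as follows. The dictionary reproduces the assignment of Table I; the function name `vf_encode` is ours, and the greedy parser relies on the source words forming a prefix-free set, which holds for this table.

```python
# Assignment of Table I: variable-length source words to symbols 0..4.
TABLE_I = {"000": 0, "001": 1, "01": 2, "10": 3, "11": 4}


def vf_encode(bits: str, table: dict[str, int] = TABLE_I) -> list[int]:
    """Parse a bit string into source words and map each word to its symbol."""
    symbols = []
    word = ""
    for b in bits:
        word += b
        if word in table:             # prefix-free: first match is the only match
            symbols.append(table[word])
            word = ""
    if word:
        raise ValueError("input ends in the middle of a source word")
    return symbols


if __name__ == "__main__":
    print(vf_encode("001011000011"))  # [1, 2, 3, 0, 4]
```

Note that the number of input bits consumed per output symbol varies, which is why the rate of the VF code is stated as an average.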

Assuming independent and identically distributed random input data, the average rate efficiency of the VF code, denoted by $R_{vl}(M)$, equals

$$R_{vl}(M) = \frac{1}{C}\left[\frac{q-1}{2^{q-1}}\, v_1 + \frac{q}{2^q}\, v_2\right] = \frac{1}{C}\left[q - 2 + \frac{M}{2^{q-1}}\right]. \tag{13}$$

The worst case rate efficiency equals $\lfloor C \rfloor / C$, which is the same as that of a simple block code. Note that, see (6), for $M = 2^q - 2^t$ we have $R_{vl}(M) = R_1(M)$. Figure 3 shows the redundancy $1 - R_{vl}(M)$ versus $\log_2 M$. Figure 1 compares the redundancy $1 - R_{vl}(M)$ of the variable-to-fixed length (VF) encoding with the redundancy $1 - R_r(M)$, $r = 1$ and $r = 2$, of the new method, both versus $C = \log_2 M$.
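Both forms of (13) can be evaluated and compared numerically. In this sketch the names `vf_average_bits` and `rate_vf` are illustrative; the first function computes the average number of source bits per symbol from $v_1$ and $v_2$, and the usage lines compare it with the closed form $q - 2 + M/2^{q-1}$.

```python
from math import log2


def vf_average_bits(M: int) -> float:
    """Average number of source bits per symbol of the VF code, from (13)."""
    q = (M - 1).bit_length()              # q = ceil(log2 M)
    v1 = 2 ** q - M                       # source words of length q - 1
    v2 = 2 * M - 2 ** q                   # source words of length q
    return (q - 1) * v1 / 2 ** (q - 1) + q * v2 / 2 ** q


def rate_vf(M: int) -> float:
    """Average rate efficiency R_vl(M) of the VF code."""
    return vf_average_bits(M) / log2(M)


if __name__ == "__main__":
    for M in (5, 70, 252):
        q = (M - 1).bit_length()
        closed_form = q - 2 + M / 2 ** (q - 1)
        print(M, vf_average_bits(M), closed_form, round(rate_vf(M), 3))
```

For $M = 5$ both forms give 2.25 bits per symbol, which matches Table I: the two 3-bit words occur with probability 1/8 each and the three 2-bit words with probability 1/4 each.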

V. APPLICATIONS

In this section, we present applications to Pearson codes and fixed-weight codes.

Fig. 3. Redundancy $1 - R_{vl}(M)$ of the VF scheme versus $C = \log_2 M$.

A. Binary Pearson codes

Pearson codes have been advocated for channels whose gain and/or offset are unknown [19]. For binary channels, $Q = \{0, 1\}$, with unknown offset (the offset channel), it suffices to forbid the all-0 word (or the all-1 word), and for channels with both unknown gain and offset (the offset/gain channel), we forbid both the all-0 and all-1 words. Let the codeword size be $q$, then we simply have $M = 2^q - 1$, $M' = 2^q - 2$, and $C = \log_2 M$, $C' = \log_2 M'$ for the offset and offset/gain channel, respectively. Although only one or two codewords are barred, prior art block codes face a serious loss for small $q$. By invoking the new coding method, we are able to improve the rate efficiency to $R_1(2^q - 1) = (q - 2^{-q+1})/C$ or $R_1'(2^q - 2) = (q - 2^{-q+2})/C'$, see (6). For the VF code we find the same rate efficiency results, namely $R_{vl}(2^q - 1) = R_1(2^q - 1)$ and $R_{vl}'(2^q - 2) = R_1'(2^q - 2)$, respectively, which accords with the results presented in [17, 18, 20].

B. Fixed-weight codes

The weight of a binary codeword is the number of its symbols equal to ‘1’. A balanced code is a fixed-weight code whose codewords have equal numbers of 1’s and 0’s. The 6B8B [21] and 4B6B [3] codes are examples of balanced codes, where the short-hand notation $m$B$\ell$B refers to codes that translate an $m$-bit input word into an $\ell$-bit codeword. Balanced codes, such as the 6B8B and 5B10B codes, have a minimum Hamming distance of at least two. The 5B10B code features a minimum Hamming distance of 4 [22], which offers a greater noise resilience at the cost of a higher redundancy. Note that the 8B10B code [2] is not balanced, as it uses codewords of weight 4, 5, and 6. The codewords with weight 4 or 6 are sent alternately for balancing the numbers of 0’s and 1’s, so that the concatenation of codewords is almost balanced [2]. The minimum Hamming distance of the 8B10B code is unity.

We have applied the new coding method to constructing balanced codes. Table II shows the rate efficiency of the new construction versus the codeword length $\ell$, where $M = \binom{\ell}{\ell/2}$.

TABLE II
PERFORMANCE OF $\ell$-BIT BALANCED CODES.

$\ell$   $M = \binom{\ell}{\ell/2}$   $\lfloor C \rfloor / C$   $R_1(M)$   $R_{vl}(M)$
8        70                           0.979                     0.979      0.994
10       252                          0.877                     0.999      0.999
12       924                          0.914                     0.995      0.995
14       3432                         0.937                     0.993      0.994
16       12870                        0.952                     0.989      0.994

Except for the case $\ell = 8$, the rate efficiency of the new construction is close to capacity, with a redundancy of roughly one percent or less. Example 5, $\ell = 10$, details the construction of a rate 255/320, 255B320B balanced code with minimum Hamming distance two, whose rate is 0.3 % lower than that of an 8B10B code with a minimum Hamming distance of unity.

Example 5: There are $M = 252$ 10-bit balanced words. A straightforward implementation of a block code translates 7 source bits into 10 channel bits. We may improve the efficiency by combining codewords, see Example 1, but its implementation requires impracticably large look-up tables. With the new scheme, we find $q = \lceil \log_2 M \rceil = 8$ and $w = 2^8 - 252 = 4$. Then $Lq - 1 = 255$ source bits can be encoded into $L = 32$ $(= 2^{q-1}/w)$ 10-bit balanced words. The rate efficiency is 0.999, see Table II. The new encoding method requires data storage of 32 bytes, the execution of the encoding algorithm, and a small look-up table for translating an 8-bit wide word into a 10-bit balanced word.
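The small look-up table of the second encoding step might be built as in the sketch below. The paper leaves the pairing of integers and codewords open, so the numeric ordering used here is purely an assumption, as are the names `balanced_words`, `symbol_to_codeword`, and `codeword_to_symbol`. Admissible 8-bit values $u$, $w \le u \le 255$, are mapped to table entry $u - w$.

```python
from itertools import combinations


def balanced_words(length: int = 10) -> list[int]:
    """All length-bit words with exactly length/2 ones, in increasing order."""
    words = []
    for ones in combinations(range(length), length // 2):
        words.append(sum(1 << position for position in ones))
    return sorted(words)


Q, W = 8, 4                                   # q and w = 2^8 - 252 of Example 5
TABLE = balanced_words(10)                    # 252 balanced 10-bit words
INVERSE = {word: index for index, word in enumerate(TABLE)}


def symbol_to_codeword(u: int) -> int:
    """Map an admissible 8-bit value u (W <= u < 256) to a balanced word."""
    if not W <= u < 2 ** Q:
        raise ValueError("inadmissible 8-bit word")
    return TABLE[u - W]                       # assumed ordering, see lead-in


def codeword_to_symbol(x: int) -> int:
    """Inverse mapping, used by the decoder."""
    return INVERSE[x] + W


if __name__ == "__main__":
    print(len(TABLE))                         # 252
    print(format(symbol_to_codeword(4), "010b"))
```

Since every output of the replacement algorithm has value at least $w$, each of the 32 bytes of an encoded block maps to exactly one balanced word, and the mapping is invertible.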

VI. CONCLUSIONS

We have presented an encoding method for efficiently translating binary source data into a series of integer symbols in the alphabet $\{0, \ldots, M-1\}$. The series of integer symbols is translated, using a second encoder, into a series of constrained codewords. We have compared the rate efficiency of the new scheme with that of variable-to-fixed (VF) length codes. As an application example, we have presented constructions of Pearson codes and fixed-weight and balanced codes that offer a rate close to capacity. We have presented a high-rate 255B320B balanced code, where 255 source bits are translated into 32 10-bit balanced codewords. The code has a rate 0.1 % below capacity, and the minimum Hamming distance between the 10-bit words is two.

REFERENCES

[1] K. Balasubramanian, S. S. Agili, and A. Morales, “Encoding and compensation schemes using improved pre-equalization for the 64B/66B Encoder,” 2012 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, pp. 361-363, 2012, doi: 10.1109/ICCE.2012.6161902.
[2] A. X. Widmer and P. A. Franaszek, “A Dc-balanced, Partitioned-Block, 8B/10B Transmission Code,” IBM J. Res. Develop., vol. 27, no. 5, pp. 440-451, Sept. 1983, doi: 10.1147/rd.275.0440.
[3] “IEEE standard for local and metropolitan area networks, part 15.7-2011: Short range wireless optical communication using visible light,” IEEE Std 802.15.7-2011, pp. 1-309, Sept. 2011, doi: 10.1109/IEEESTD.2011.6016195.
[4] B. H. Marcus, P. H. Siegel, and J. K. Wolf, “Finite-state Modulation Codes for Data Storage,” IEEE Journal on Selected Areas in Communications, vol. 10, no. 1, pp. 5-37, Jan. 1992, doi: 10.1109/49.124467.
[5] P. H. Siegel, “Recording Codes for Digital Magnetic Storage,” IEEE Transactions on Magnetics, vol. MAG-21, no. 5, pp. 1344-1349, Sept. 1985, doi: 10.1109/TMAG.1985.1063972.
[6] Z. Wang, Q. Wang, W. Huang, and Z. Xu, Visible Light Communications: Modulation and Signal Processing, Wiley-IEEE Press, Jan. 2018.
[7] M. Blawat, K. Gaedke, I. Hutter, X. Cheng, B. Turczyk, S. Inverso, B. W. Pruitt, and G. M. Church, “Forward Error Correction for DNA Data Storage,” International Conference on Computational Science (ICCS 2016), vol. 80, pp. 1011-1022, 2016, doi: 10.1016/j.procs.2016.05.398.
[8] K. A. S. Immink and K. Cai, “Properties and Constructions of Constrained Codes for DNA-based Data Storage,” IEEE Access, vol. 8, pp. 49523-49531, 2020, doi: 10.1109/ACCESS.2020.2980036.
[9] S. Fukuda, Y. Kojima, Y. Shimpuku, and K. Odaka, “8/10 Modulation Codes for Digital Magnetic Recording,” IEEE Transactions on Magnetics, vol. MAG-22, no. 5, pp. 1194-1196, Sept. 1986, doi: 10.1109/TMAG.1986.1064445.
[10] V. Braun and K. A. S. Immink, “An Enumerative Coding Technique for DC-free Runlength-Limited Sequences,” IEEE Transactions on Communications, vol. 48, no. 12, pp. 2024-2031, Dec. 2000, doi: 10.1109/26.891213.
[11] K. A. S. Immink and K. Cai, “Properties and constructions of energy-harvesting sliding-window constrained codes,” IEEE Communications Letters, vol. 24, no. 9, pp. 1890-1893, Sept. 2020, doi: 10.1109/LCOMM.2020.2993467.
[12] K. A. S. Immink, “A New DC-free Runlength Limited Coding Method for Data Transmission and Recording,” IEEE Transactions on Consumer Electronics, vol. CE-65, no. 4, pp. 502-505, Nov. 2019, doi: 10.1109/TCE.2019.2932795.
[13] K. A. S. Immink and K. Cai, “Spectral Shaping Codes,” IEEE Transactions on Consumer Electronics, vol. CE-67, no. 2, pp. 158-165, May 2021, doi: 10.1109/TCE.2021.3073199.
[14] J. H. Weber, K. A. S. Immink, and S. R. Blackburn, “Pearson Codes,” IEEE Transactions on Information Theory, vol. IT-62, no. 1, pp. 131-135, Jan. 2016, doi: 10.1109/TIT.2015.2490219.
[15] D. E. Knuth, “Positional Number Systems,” The Art of Computer Programming, vol. 2: Seminumerical Algorithms, 3rd ed. Reading, MA: Addison-Wesley, pp. 195-213, 1998.
[16] K. A. S. Immink, “High-Rate Maximum Runlength Constrained Coding Schemes Using Nibble Replacement,” IEEE Transactions on Information Theory, vol. IT-58, no. 10, pp. 6572-6580, Oct. 2012, doi: 10.1109/TIT.2012.2204034.
[17] C. Cao and I. Fair, “Construction of Multi-State Capacity-Approaching Variable-Length Constrained Sequence Codes With State-Independent Decoding,” IEEE Access, vol. 7, pp. 54746-54759, 2019, doi: 10.1109/ACCESS.2019.2913339.
[18] C. Cao and I. Fair, “Capacity-Approaching Variable-Length Pearson Codes,” IEEE Communications Letters, vol. 22, no. 7, pp. 1310-1313, July 2018, doi: 10.1109/LCOMM.2018.2829706.
[19] K. A. S. Immink and J. H. Weber, “Minimum Pearson Distance Detection for Multi-Level Channels with Gain and/or Offset Mismatch,” IEEE Transactions on Information Theory, vol. IT-60, no. 10, pp. 5966-5974, Oct. 2014, doi: 10.1109/TIT.2014.2342744.
[20] J. H. Weber, T. G. Swart, and K. A. S. Immink, “Simple Systematic Pearson Coding,” IEEE International Symposium on Information Theory, Barcelona, Spain, pp. 385-389, July 2016, doi: 10.1109/ISIT.2016.7541326.
[21] A. X. Widmer, “Dc-balanced 6B/8B Transmission Codes with Local Parity,” US Patent 6,876,315, April 2005.
[22] V. A. Reguera, “New RLL code with improved error performance for visible light communication,” arXiv preprint arXiv:1910.10079, 2019.