Minimally modified balanced codes

Kees A. Schouhamer Immink, Fellow, IEEE and Jos H. Weber, Senior Member, IEEE
Abstract: We present and analyze a new construction of bipolar balanced codes where each codeword contains equally many $-1$'s and $+1$'s. The new code is minimally modified, as the number of symbol changes made to the source word for translating it into a balanced codeword is as small as possible. The balanced codes feature low redundancy and time complexity. Large look-up tables are avoided.
Keywords: balanced code, constrained code, error propagation, Raney's Lemma.
I. INTRODUCTION
Let $x = (x_1, \ldots, x_n)$, $x_i \in \{-1, 1\}$, be a word of length $n$ with bipolar symbols. The balance of a word $x$, denoted by $w(x)$, is defined by $w(x) = \sum_{i=1}^{n} x_i$. A word is said to be balanced if $w(x) = 0$, $n$ even, i.e., it consists of equal numbers of $-1$'s and $+1$'s. A code is said to be balanced if each word in the code is balanced. Balanced codes have found widespread application in various fields such as data transmission and data storage [1, 2, 3, 4, 5, 6]. Look-up tables for translating source words into balanced codewords and vice versa have been applied for small $n$ [7, 8]. Enumeration techniques [9, 10, 11, 12, 13] have been advocated for encoding and decoding balanced words, as they achieve the minimum redundancy possible. The complexity of enumerative coding, mainly the look-up tables of coefficients, grows with $n^2$, which makes it less practical if complexity is at a premium.
Knuth's implementation of balanced codes [14, 15, 16] is attractive for encoding large source words as its complexity scales linearly with word length $n$, but it requires a redundancy of $\log_2 n$, which is, for large $n$, around twice the minimum redundancy of a code comprising the full set of balanced codewords. Modifications of Knuth's generic scheme bridging the gap between the minimum redundancy and that of Knuth's implementation are discussed in Al-Bassam and Bose [17, 18], Tallini, Capocelli, and Bose [19, 20], and Weber and Immink [21, 22].
Knuth's balancing method is handsomely simple: a first segment of the source word is inverted, i.e., the sign of each symbol in that segment is flipped, so that the word becomes balanced. In addition, a prefix (tag) that uniquely identifies the length of the inverted segment is forwarded to the receiver. A disadvantage of Knuth's method is that the encoder inverts on average $n/4 + 1$ symbols, which may result in extreme error propagation when the tag is received in error. Implementations having low redundancy, complexity, and error propagation are welcome alternatives to the prior art.
Kees A. Schouhamer Immink is with Turing Machines Inc, Willemskade 15d, 3016 DK Rotterdam, The Netherlands. E-mail: immink@turing-machines.com.
Jos H. Weber is with Delft University of Technology, Delft, The Netherlands. E-mail: j.h.weber@tudelft.nl.
Our contributions: We present a novel method for efficiently translating arbitrary user data into balanced codewords, which is based on Raney's Lemma, also known as the Cycle Lemma [23, 24, 25, 26]. As in Knuth's construction, the encoder judiciously inverts a number of source symbols for obtaining the codeword. The proposed code, however, is minimally modified as the number of symbol inversions is minimal, which is an attractive virtue for reconstructing the source word when errors are made during transmission. The encoder inverts, on average, approximately $\sqrt{n/(2\pi)}$, $n \gg 1$, symbols of the source word. Thus, for example, for $n = 1000$ only around twelve symbol inversions are required on average (assuming equiprobable source words).
Information regarding the symbol modifications made to the source word is encoded into a small redundant tag appended to the codeword. We investigate fixed- and variable-length tag schemes. Tags of multiple codewords can be combined, thereby reducing the overall redundancy. The redundancy of a fixed-length tag scheme equals $\log_2(n/2 + 1)$. The average redundancy of a variable-length tag scheme approaches the minimum possible for asymptotically large values of $n$. The (time) complexity of the new balanced encoder and decoder grows linearly with $n$.
We start in Section II with a description of Raney’s Lemma.
In Section III, we detail the encoding and decoding algorithms.
The code’s redundancy is discussed in Section IV, and a
performance comparison is given in Section V. Section VI
furnishes the conclusions of our paper.
II. RANEY'S LEMMA
We start with two definitions. The $n$ partial, or running, balances of the index $i$, $1 \le i \le n$, denoted by $s(i, k)$, are defined by
$$ s(i, k) = \sum_{j=i}^{i+k-1} x_j, \quad 1 \le i \le n, \ 1 \le k \le n, \tag{1} $$
where we extend the sequence $x$ by letting $x_{n+p} = x_p$ for $1 \le p \le n$. An index $i$ is said to be a minimal index of $x$ if and only if all the partial balances are positive, i.e.,
$$ s(i, k) > 0, \quad 1 \le k \le n. \tag{2} $$
In other words, an index $i$ is a minimal index of $x$ if and only if the partial balances of $x_i, \ldots, x_n, x_1, \ldots, x_{i-1}$ are all positive. Note that it is immediate from (2) that if index $i$ is a minimal index then $x_i = x_{i+1} = 1$.
Denote the set of all minimal indexes of $x$ by $\sigma(x)$. If $w(x) > 0$ there are, according to Raney's Lemma [23, 24], exactly $|\sigma(x)| = w(x)$ minimal indexes, where $|X|$ denotes the cardinality of a set $X$.
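As an illustration of definition (2) and of Raney's Lemma, the following Python sketch finds the minimal indexes by directly testing all cyclic partial balances. It is a quadratic-time check written for clarity, not the linear-time algorithm of [25]; the function names and the sample word are assumptions made for this example.

```python
from itertools import product

def minimal_indexes(x):
    """Return sigma(x) as a sorted list of 1-based indexes, by testing (2) directly."""
    n = len(x)
    sigma = []
    for start in range(n):                 # 0-based candidate index
        running = 0
        for k in range(n):                 # running balance s(start+1, k+1), cyclic
            running += x[(start + k) % n]
            if running <= 0:
                break                      # a non-positive partial balance: not minimal
        else:
            sigma.append(start + 1)        # all n partial balances were positive
    return sigma

def raney_holds(n):
    """Check |sigma(x)| = w(x) for every length-n bipolar word with w(x) > 0."""
    return all(len(minimal_indexes(x)) == sum(x)
               for x in product((-1, 1), repeat=n) if sum(x) > 0)

# Illustrative word (not taken from the paper): n = 6, w(x) = 2.
x = [-1, -1, 1, 1, 1, 1]
print(minimal_indexes(x))   # [3, 4]
print(raney_holds(8))       # True
```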
Fig. 1. Partial balance $s(1, k)$ versus $k$ for $n = 10$ and x = (1,1,1,1,1,1,1,1,1,1). The diagram shows $s(1, k)$ in the extended interval $1 \le k \le 2n$, by letting $x_{n+p} = x_p$ for $1 \le p \le n$, which makes it more convenient to peruse the partial balances $s(i, k)$ for any $i$, as explained in the text. The minimal indexes of $x$ are the $k$-values of the points indicated by the circles.
Example 1: Let $n = 10$ and x = (1,1,1,1,1,1,1,1,1,1). There are $w(x) = 4$ minimal indexes. Figure 1 illustrates the partial balances, $s(1, k)$, versus $k$. We can check that index $i = 1$ is a minimal index, since all partial balances $s(1, k)$ are positive. Further, note that $s(i, k) = s(1, k + i - 1) - s(1, i - 1)$. This implies that for any $i$, $1 \le i \le n$, the $s(i, k)$ curve with $1 \le k \le n$ can be obtained from the curve in Figure 1 by considering it from $k = i$ up to $k = i + n - 1$ and then shifting this segment $i - 1$ units to the left and $s(1, i - 1)$ units downwards. Hence it follows that the minimal index set is $\sigma(x) = \{1, 8, 9, 10\}$.
III. RANEY'S LEMMA-BASED BALANCED CODES
A. Antipodal matchings
Ordentlich and Roth [25] pioneered antipodal matchings for two-dimensional weight-constrained codes, which are based on Raney's Lemma. Before showing their results, we define the function $y = f(x, S)$, where $S$ is a subset of $\{1, \ldots, n\}$, by
$$ y_i = \begin{cases} -x_i, & i \in S, \\ x_i, & i \notin S. \end{cases} \tag{3} $$
Ordentlich and Roth [25] showed that all $n$-bit input words, $x$, of balance $w(x) > 0$ can be converted into $n$-bit output words, $y$, of inverted balance $w(y) = -w(x)$ by
$$ y = f(x, \sigma(x)). \tag{4} $$
In other words, we simply obtain the entries $y_i$ of $y$ by inverting the $+1$'s at all minimal indexes of $x$ to $-1$'s. For the other indexes we simply have $y_i = x_i$. Ordentlich and Roth proved that the above antipodal matchings are bijective mappings, and they presented an efficient (linear-time complexity) algorithm for finding the set of minimal indexes $\sigma(x)$, $w(x) > 0$, for all word lengths $n$. They generalized the algorithm to words $x$ with $w(x) < 0$.
At first sight, the above algorithm looks superfluous, as purely reversing the sign of a word balance is obviously achieved by inverting all symbols of $x$. The algorithm based on Raney's Lemma, however, has the advantage that a minimal plurality of symbols ($+1$'s only if $w(x) > 0$ or $-1$'s only if $w(x) < 0$) is inverted, which is a highly attractive feature for constructing two-dimensional weight-constrained codes as shown in [25]. Below we show that Raney's Lemma can be harnessed to balance codewords with a minimal number of symbol inversions.
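The antipodal matching (3)-(4) can be sketched in a few lines of Python. The sketch below uses a brute-force search for the minimal indexes as a stand-in for the linear-time algorithm of [25], and checks on small word lengths that the balance is negated and that distinct inputs give distinct outputs. All function names are assumptions made for this illustration.

```python
from itertools import product

def minimal_indexes(x):
    """Brute-force sigma(x), 1-based (stand-in for the linear-time algorithm of [25])."""
    n = len(x)
    sigma = []
    for start in range(n):
        running = 0
        for k in range(n):
            running += x[(start + k) % n]
            if running <= 0:
                break
        else:
            sigma.append(start + 1)
    return sigma

def flip(x, S):
    """f(x, S) of (3): invert the symbols at the 1-based indexes in S."""
    return [-xi if i in S else xi for i, xi in enumerate(x, start=1)]

def antipodal(x):
    """y = f(x, sigma(x)) of (4), intended for words with w(x) > 0."""
    return flip(x, set(minimal_indexes(x)))

def is_bijection(n, w):
    """Check that antipodal() maps S_w one-to-one onto words of balance -w."""
    source = [x for x in product((-1, 1), repeat=n) if sum(x) == w]
    images = {tuple(antipodal(x)) for x in source}
    return all(sum(y) == -w for y in images) and len(images) == len(source)

print(antipodal([-1, -1, 1, 1, 1, 1]))   # [-1, -1, -1, -1, 1, 1]
print(is_bijection(8, 2), is_bijection(8, 4))
```

Since $|S_w| = |S_{-w}|$, injectivity on $S_w$ together with $w(y) = -w(x)$ is enough to confirm a bijection for the tested parameters.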
B. Balanced codes
Let $S_w$ denote the set of $n$-bit words, $x$, whose balance equals $w(x) = w$, that is
$$ S_w = \left\{ x \in \{-1, 1\}^n : \sum_{i=1}^{n} x_i = w \right\}. \tag{5} $$
Note that $S_0$ is the set of balanced words.
Let the (minimal) indexes in $\sigma(x)$ be ordered in magnitude, that is $\sigma(x) = \{i_1, i_2, \ldots, i_w\}$, where $i_1 < i_2 < \ldots < i_{w-1} < i_w$. For $w > 0$ we define the mapping $\phi(\cdot)$ between $x \in S_w$ and the balanced $y = \phi(x) \in S_0$, where
$$ y = \phi(x) = f(x, \{i_1, i_2, \ldots, i_{w/2}\}). \tag{6} $$
Clearly $w(\phi(x)) = 0$.
The following lemma shows some important properties,
which will be used later. Essential parts have been presented
in [25, Prop. 4.6].
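Lemma 1: For $x \in \{-1, 1\}^n$ with $w(x) = w > 0$ and $\sigma(x) = \{i_1, i_2, \ldots, i_w\}$ with $i_1 < i_2 < \ldots < i_{w-1} < i_w < i_{w+1} = i_1 + n$, it holds for all $1 \le j \le w$ and $i_j + 1 \le v \le i_{j+1} - 1$ that
(i) $\sum_{i=i_j+1}^{v} x_i \ge 0$,
(ii) $\sum_{i=i_j+1}^{i_{j+1}-1} x_i = 0$,
(iii) $\sum_{i=v}^{i_{j+1}-1} x_i \le 0$.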
Proof: (i) Since $i_j$ is a minimal index of $x$, we have
$$ \sum_{i=i_j+1}^{v} x_i = \sum_{i=i_j}^{v} x_i - x_{i_j} \ge 1 - 1 = 0. \tag{7} $$
(ii) Note that
$$ \sum_{j=1}^{w} \sum_{i=i_j+1}^{i_{j+1}-1} x_i = \sum_{j=1}^{w} \sum_{i=i_j}^{i_{j+1}-1} x_i - \sum_{j=1}^{w} x_{i_j} = \sum_{i=1}^{n} x_i - \sum_{j=1}^{w} x_{i_j} = w - w = 0. $$
Since $\sum_{i=i_j+1}^{i_{j+1}-1} x_i \ge 0$ for all $j$ because of (i), the result follows.
(iii) It follows from (i) and (ii) that
$$ \sum_{i=v}^{i_{j+1}-1} x_i = \sum_{i=i_j+1}^{i_{j+1}-1} x_i - \sum_{i=i_j+1}^{v-1} x_i \le 0 - 0 = 0. $$

Fig. 2. Partial balances, $s(1, k)$, of a) an arbitrary source word, $x$, and b) the balanced codeword $y = \phi(x)$ versus index $k$ for $n = 100$; the word balance equals $w = w(x) = 16$. The partial balances at the minimal indexes of $x$ are indicated by a '*'.
Note that this lemma deals with sums of symbols at positions in the strings between two (cyclically) consecutive minimal indexes. In particular, it says that the sum of the symbols in
(i) any head of such a string is nonnegative,
(ii) the complete string is equal to zero,
(iii) any tail of such a string is nonpositive.
As a visual illustration we have plotted in Figure 2 the partial balances, $s(1, k)$, of a) an arbitrary source word $x$ of length $n = 100$, $w(x) = 16$, and b) the balanced codeword $y = \phi(x)$. The partial balances at the minimal indexes of $x$ are indicated by a '*'. The various properties discussed in Lemma 1 can easily be noted. For example, note the unity balance increments between consecutive minimal indexes. The partial balances of the balanced codeword $y$ are also indicated by a '*' at the minimal indexes of the source word $x$. For the smallest $w/2$ minimal indexes of $x$ we note unity balance decrements between consecutive minimal indexes, while for the largest $w/2$ minimal indexes we note unity balance increments between consecutive minimal indexes.
C. Encoding
We propose the following encoding rule, denoted by $y = \psi(x)$, for translating an $n$-bit source word $x$, $x \in \{-1, 1\}^n$, into a balanced $n$-bit codeword $y$, $y \in S_0$:
$$ y = \psi(x) = \begin{cases} \phi(x), & w(x) > 0, \\ -\phi(-x), & w(x) < 0, \\ x, & w(x) = 0, \end{cases} \tag{8} $$
where $-x$ denotes $(-x_1, -x_2, \ldots, -x_n)$. Figure 3 shows the basic encoding algorithm ENC($x$). Part of the encoding table, $n = 6$, has been tabulated in Table I; note that for clerical convenience a '0' indicates a $-1$ symbol. As $\psi(-x) = -\psi(x)$ we can easily extend the table.

Input: The bipolar $n$-bit word $(x_1, \ldots, x_n)$, $x_i \in \{-1, 1\}$.
Output: Encoded $n$-bit bipolar word $y$ and tag $w$, i.e., ENC($x$) $= (y, w)$.
begin
  let $w = \sum_{i=1}^{n} x_i$
  if $w = 0$ set $y = x$ and halt
  if $w < 0$ set $x = -x$ {invert all symbols}
  run Algorithm [25, Fig. 6] yielding $\{i_1, \ldots, i_{|w|}\}$
  for $i \in \{i_1, \ldots, i_{|w|/2}\}$ set $x_i = -1$
  if $w > 0$ set $y = x$
  if $w < 0$ set $y = -x$ {invert all symbols}
end.
Fig. 3. Basic encoding algorithm ENC($x$).

TABLE I
PART OF ENCODING TABLE $y = \psi(x)$ FOR $n = 6$. A '0' INDICATES $-1$.
$x$       $\psi(x)$    $x$       $\psi(x)$
000000    111000       001000    101100
000001    110001       001001    101001
000010    110010       001010    101010
000011    100011       001011    001011
000100    110100       001100    001110
000101    100101       001101    001101
000110    100110       001110    001110
000111    000111       001111    000111

Note that in the mapping $y = \psi(x)$ the $|w(x)/2|$ rightmost symbols of the codeword $y$ equal those of the source word $x$. That is, $y = \psi(x)$ implies $x_i = y_i$, $i = n - w/2 + 1, \ldots, n$, where $w = w(x) > 0$. By definition of $y = \psi(x)$, see (6), (8), and (9), only the symbols at indexes in $\{i_1, i_2, \ldots, i_{w/2}\}$ are inverted. We have $\{i_1, i_2, \ldots, i_{w/2}\} \subset \{1, \ldots, n - w(x)/2\}$, so that the $w(x)/2$ rightmost symbols of $x$ are unchanged. Since $|w(x)| \ge 2$ for every unbalanced $x$, and $y = x$ for balanced $x$, we have $y_n = x_n$ for all $x$.
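The encoding rule (8) and the steps of Fig. 3 can be sketched as follows. This is a minimal illustration, assuming a brute-force search for the minimal indexes in place of the linear-time algorithm of [25, Fig. 6]; the helper names `from_bits` and `to_bits` are hypothetical and only implement the '0' for $-1$ convention of Table I.

```python
def minimal_indexes(x):
    """Brute-force sigma(x), 1-based (stand-in for [25, Fig. 6])."""
    n = len(x)
    sigma = []
    for start in range(n):
        running = 0
        for k in range(n):
            running += x[(start + k) % n]
            if running <= 0:
                break
        else:
            sigma.append(start + 1)
    return sigma

def encode(x):
    """ENC(x) = (y, w): balance x by inverting |w|/2 symbols, as in Fig. 3."""
    w = sum(x)
    if w == 0:
        return list(x), 0
    sign = 1 if w > 0 else -1
    u = [sign * xi for xi in x]                  # work on -x when w < 0
    for i in minimal_indexes(u)[: abs(w) // 2]:  # the |w|/2 smallest minimal indexes
        u[i - 1] = -1
    return [sign * ui for ui in u], w            # undo the global inversion

def from_bits(s):
    """Table I convention: '0' stands for -1, '1' for +1."""
    return [1 if c == "1" else -1 for c in s]

def to_bits(x):
    return "".join("1" if xi == 1 else "0" for xi in x)

for word in ("000000", "000010", "001000", "001100", "001111"):
    y, w = encode(from_bits(word))
    print(word, "->", to_bits(y), "tag w =", w)
```

The printed codewords can be compared row by row with Table I.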
The receiver is able to uniquely recover $x$ from the received (balanced) $y = \psi(x)$ if we add a tag to the sent $y$ that uniquely identifies the balance of the source word $x$. A tag can be sent separately as a pre- or postfix, or we may combine multiple tags to form a large tag data word. The code redundancy is discussed in Section IV.
D. Decoding
The decoder uniquely retrieves a facsimile $x'$ of the original source word, $x$, from the received (balanced) $y = \psi(x)$ and the tag associated with the balance of the source word, $w(x)$. Figure 4 shows a description of the basic decoding algorithm. Note that the decoder (time) complexity grows linearly with word length $n$. The next theorem shows that the decoding algorithm is correct, that is, DEC(ENC($x$)) $= x$.

Input: The integer $w = w(x) \in \{-n, -n + 2, \ldots, n\}$, and the bipolar $n$-bit balanced word $(y_1, \ldots, y_n) = \psi(x) \in S_0$, $y_i \in \{-1, 1\}$.
Output: Decoded $n$-bit bipolar word DEC($y, w$) $= x'$.
Initialize:
  if $w = 0$ set $x' = y$ and halt;
  if $w < 0$ set $v = -y$ {invert all symbols}
  if $w > 0$ set $v = y$
  set $w_2 = \mathrm{abs}(w/2)$
begin
  let $z_i = \sum_{j=1}^{i} v_j$, $i = 1, \ldots, n$
  let $m = \min\{z_i\}$
  let $i'_j = \min\{i : z_i = m + w_2 - j\}$, $j = 1, \ldots, w_2$
  let $v' = f(v, \{i'_1, \ldots, i'_{w_2}\})$
  if $w > 0$ set $x' = v'$
  if $w < 0$ set $x' = -v'$ {invert all symbols}
end.
Fig. 4. Basic decoding algorithm DEC($y, w$).
Theorem 1: For any $x \in \{-1, 1\}^n$, it holds that DEC(ENC($x$)) $= x$.
Proof: We show that the decoding algorithm, shown in Figure 4, with input $(\psi(x), w(x))$ is correct and generates the original source word $x$ as an output. From the encoding and decoding procedures, this is trivially true if $w(x) = 0$, while correctness of the $w(x) > 0$ case implies that it is also true for the $w(x) < 0$ case. Hence, we further assume that $w = w(x) > 0$. Note that
$$ v_i = \begin{cases} -x_i = -1 & \text{if } i \in \{i_1, \ldots, i_{w/2}\}, \\ x_i = 1 & \text{if } i \in \{i_{w/2+1}, \ldots, i_w\}, \\ x_i & \text{otherwise.} \end{cases} \tag{9} $$
Let $a$ be the sum of the first $i_1 - 1$ entries of $v$, i.e.,
$$ a = z_{i_1 - 1} = \sum_{i=1}^{i_1 - 1} v_i = \sum_{i=1}^{i_1 - 1} x_i. \tag{10} $$
It follows from Lemma 1 (ii) and (9) that
$$ z_{i_j} = a - j, \quad j \in \{1, 2, \ldots, w/2\}. \tag{11} $$
Furthermore we have
$$ z_i \ge \begin{cases} a - \frac{w}{2}, & i \in \{i_{w/2}, \ldots, n\}, \\ a - j, & j \in \{1, \ldots, \frac{w}{2} - 1\}, \ i \in \{i_j, \ldots, i_{j+1} - 1\}, \\ a, & i \in \{1, \ldots, i_1 - 1\}, \end{cases} \tag{12} $$
where the first two inequalities follow from (9), (11), and Lemma 1 (i), while the third inequality follows from the fact that $z_i < a$ would imply with (9) and (10) that
$$ \sum_{j=i+1}^{i_1 - 1} x_j = \sum_{j=i+1}^{i_1 - 1} v_j = z_{i_1 - 1} - z_i > a - a = 0, $$
which contradicts Lemma 1 (iii). Hence, (11) and (12) give that $m = a - w/2$ and that for any $j \in \{1, \ldots, w/2\}$ the smallest $i$ such that $z_i = m + w/2 - j = a - j$ is $i = i_j$, and thus that $i'_j = i_j$. In conclusion, the decoder output satisfies
$$ x' = v' = f(v, \{i'_1, \ldots, i'_{w/2}\}) = f(y, \{i_1, \ldots, i_{w/2}\}) = x. $$
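Theorem 1 can be checked exhaustively for small $n$. The sketch below pairs a brute-force version of the encoder of Fig. 3 with a direct transcription of the decoder of Fig. 4 and verifies DEC(ENC($x$)) $= x$ for every word of a given length. The minimal-index search is a quadratic-time stand-in for the algorithm of [25], and the function names are assumptions for this illustration.

```python
from itertools import product

def minimal_indexes(x):
    """Brute-force sigma(x), 1-based (stand-in for the linear-time algorithm of [25])."""
    n = len(x)
    sigma = []
    for start in range(n):
        running = 0
        for k in range(n):
            running += x[(start + k) % n]
            if running <= 0:
                break
        else:
            sigma.append(start + 1)
    return sigma

def encode(x):
    """ENC(x) = (y, w), following Fig. 3."""
    w = sum(x)
    if w == 0:
        return list(x), 0
    sign = 1 if w > 0 else -1
    u = [sign * xi for xi in x]
    for i in minimal_indexes(u)[: abs(w) // 2]:
        u[i - 1] = -1
    return [sign * ui for ui in u], w

def decode(y, w):
    """DEC(y, w), following Fig. 4."""
    if w == 0:
        return list(y)
    sign = 1 if w > 0 else -1
    v = [sign * yi for yi in y]
    w2 = abs(w) // 2
    z, running = [], 0
    for vi in v:                         # prefix sums z_1, ..., z_n of v
        running += vi
        z.append(running)
    m = min(z)
    for j in range(1, w2 + 1):
        first = z.index(m + w2 - j)      # smallest position where z_i = m + w2 - j
        v[first] = -v[first]             # undo the encoder's inversion
    return [sign * vi for vi in v]

def round_trip_ok(n):
    """Check DEC(ENC(x)) = x for all 2^n bipolar words of length n."""
    return all(decode(*encode(x)) == list(x) for x in product((-1, 1), repeat=n))

print(round_trip_ok(8))   # True
```

The decoder needs only one pass for the prefix sums and one search per inverted symbol; a production version would locate all first-passage positions in a single scan to keep the linear time bound.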
IV. REDUNDANCY
The number of balanced codewords of length $n$ equals
$$ |S_0| = \binom{n}{n/2}, \tag{13} $$
and thus the minimum redundancy of balanced codewords of length $n$, denoted by $H_0$, is
$$ H_0 = n - \log_2 |S_0| = n - \log_2 \binom{n}{n/2}. \tag{14} $$
For asymptotically large $n$ we have the approximation [14]
$$ H_0 \approx \frac{1}{2} \log_2 n + 0.326, \quad n \gg 1. \tag{15} $$
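To get a feel for how close the approximation (15) is to the exact value (14), both can be evaluated numerically. The following sketch uses only the Python standard library; the function names are chosen for this illustration.

```python
from math import comb, log2

def min_redundancy(n):
    """Exact H_0 of (14): n - log2 C(n, n/2), for even n."""
    return n - log2(comb(n, n // 2))

def min_redundancy_approx(n):
    """Asymptotic approximation (15)."""
    return 0.5 * log2(n) + 0.326

for n in (16, 64, 256, 1000):
    print(n, round(min_redundancy(n), 4), round(min_redundancy_approx(n), 4))
```

The constant 0.326 corresponds to $\frac{1}{2}\log_2(\pi/2)$, which follows from Stirling's approximation of the central binomial coefficient.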
The redundancy of the new code is governed by the amount of data required to recover the balance $w(x)$ of the source word $x$. The balance $w(x) \in \{-n, -n + 2, \ldots, n - 2, n\}$, so that for the simplest fixed-length tag scheme, the redundancy is $\log_2(n + 1)$. The next theorem will help to reduce the redundancy.
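Theorem 2: Let $y \in S_0$, $z_i = \sum_{j=1}^{i} y_j$, for $i = 1, \ldots, n$, $z_{\min} = \min\{z_i\}$, and $z_{\max} = \max\{z_i\}$. Then it holds that
$$ |\{x \in S_w : \psi(x) = y\}| = \begin{cases} 1 & \text{if } w \in \{-2z_{\max}, -2z_{\max} + 2, \ldots, -2z_{\min}\}, \\ 0 & \text{otherwise.} \end{cases} \tag{16} $$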
Proof: From Theorem 1 it follows that the mapping ENC($x$) from $\{-1, 1\}^n$ to $S_0 \times \{-n, -n + 2, \ldots, n\}$ is injective. Hence, for each $w \in \{-n, -n + 2, \ldots, n\}$, there is at most one word $x \in S_w$ for which $\psi(x) = y$. In items (i)-(v) below we investigate for which values of $w$ such a word $x$ exists. Define $i'_j = \min\{i : z_i = z_{\min} + w/2 - j\}$, $j = 1, \ldots, w/2$, and observe that $y \in S_0$ implies
$$ z_{\min} \le 0 \le z_{\max}. $$
(i) If $w = 0$, then there does exist an $x \in S_w$ such that $\psi(x) = y$, namely $x = y$, which immediately follows from (8).
(ii) If $w \in \{2, 4, \ldots, -2z_{\min}\}$ then there does exist an $x \in S_w$ such that $\psi(x) = y$, namely $x = f(y, \{i'_1, \ldots, i'_{w/2}\})$. This can be checked as follows. Note that $z_{i'_j} = z_{\min} + w/2 - j < 0$ and $x_{i'_j} = -y_{i'_j} = 1$ for $j = 1, 2, \ldots, w/2$, while $x_i = y_i$ for all indexes $i \ne i'_j$. On the one hand, observe that any $i$ with $i'_j < i < i'_{j+1}$, $j \in \{0, 1, \ldots, w/2 - 1\}$, $i'_0 = 0$, is not a minimal index of $x$, since
$$ \sum_{m=i}^{i'_{j+1} - 1} x_m = \sum_{m=i}^{i'_{j+1} - 1} y_m = z_{i'_{j+1} - 1} - z_{i-1} \le 0. $$
On the other hand, any $i'_j$, $j = 1, \ldots, w/2$, is a minimal index of $x$, since for all $k \in \{1, 2, \ldots, n\}$ it holds that
$$ \sum_{i=i'_j}^{i'_j + k - 1} x_i \ge \sum_{i=i'_j}^{i'_j + k - 1} y_i + 2b \ge -b + 2b = b \ge 1, $$
where $b = |\{m \in \{j, j + 1, \ldots, w/2\} : i'_j \le i'_m \le i'_j + k - 1\}|$. In conclusion, $i'_1, \ldots, i'_{w/2}$ are the $w/2$ smallest minimal indexes of $x$, and thus $\psi(x) = f(x, \{i'_1, \ldots, i'_{w/2}\}) = y$.
(iii) If $w \in \{-2z_{\min} + 2, -2z_{\min} + 4, \ldots, n\}$, then there is no $x \in S_w$ for which $\psi(x) = y$, as we will show next. Suppose there does exist such $x$. Let the $w/2$ smallest minimal indexes of $x$ be $i_1, \ldots, i_{w/2}$. Since $y = f(x, \{i_1, \ldots, i_{w/2}\})$ and $x = f(y, \{i'_1, \ldots, i'_{w/2}\})$, it follows that $i_j = i'_j$ for all $j$. Hence, we obtain the contradiction
$$ z_{i_{w/2}} = \sum_{i=1}^{i_1 - 1} y_i + \sum_{i=i_1}^{i_{w/2}} y_i = \sum_{i=1}^{i_1 - 1} x_i - \frac{w}{2} \le -\frac{w}{2} < z_{\min}, $$
where the first inequality follows from Lemma 1 (iii) and the second from the fact that $w > -2z_{\min}$.
(iv) If $w \in \{-n, -n + 2, \ldots, -2z_{\max} - 2\}$, then there is no $x \in S_w$ for which $\psi(x) = y$, which can be shown in a similar way as (iii).
(v) If $w \in \{-2z_{\max}, -2z_{\max} + 2, \ldots, -2\}$, then there exists an $x \in S_w$ such that $\psi(x) = y$, which can be shown in a similar way as (ii).
Define
$$ N(y) = z_{\max} - z_{\min} + 1, \tag{17} $$
where $N(y)$ is called the balance span of $y$. Let $r(y)$ denote the number of distinct source words $x \in \{-1, 1\}^n$ that map to $y \in S_0$, that is
$$ r(y) = |\{x \in \{-1, 1\}^n : y = \psi(x)\}|, \quad y \in S_0. \tag{18} $$
Corollary 1: For all $y \in S_0$, it holds that $r(y) = N(y)$.
Proof: This result immediately follows from Theorem 2 by counting the number of $w$ for which $|\{x \in S_w : \psi(x) = y\}| = 1$.
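Corollary 1 lends itself to a brute-force check on short words: encode all $2^n$ source words, count how many land on each balanced codeword, and compare the count with the balance span (17). The sketch below does this with a quadratic-time minimal-index search as a stand-in for the algorithm of [25]; the function names are assumptions for this example.

```python
from collections import Counter
from itertools import product

def minimal_indexes(x):
    """Brute-force sigma(x), 1-based (stand-in for the linear-time algorithm of [25])."""
    n = len(x)
    sigma = []
    for start in range(n):
        running = 0
        for k in range(n):
            running += x[(start + k) % n]
            if running <= 0:
                break
        else:
            sigma.append(start + 1)
    return sigma

def psi(x):
    """The mapping y = psi(x) of (8), returned as a tuple."""
    w = sum(x)
    if w == 0:
        return tuple(x)
    sign = 1 if w > 0 else -1
    u = [sign * xi for xi in x]
    for i in minimal_indexes(u)[: abs(w) // 2]:
        u[i - 1] = -1
    return tuple(sign * ui for ui in u)

def balance_span(y):
    """N(y) = z_max - z_min + 1 of (17)."""
    z, running = [], 0
    for yi in y:
        running += yi
        z.append(running)
    return max(z) - min(z) + 1

def corollary_holds(n):
    """Check r(y) = N(y) for every balanced word y of length n."""
    r = Counter(psi(x) for x in product((-1, 1), repeat=n))
    balanced = [y for y in product((-1, 1), repeat=n) if sum(y) == 0]
    return all(r[y] == balance_span(y) for y in balanced)

print(corollary_holds(4), corollary_holds(8))
```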
A. Fixed-length (FL) tag scheme
The tag length of a scheme with a fixed-length tag depends on the maximum value of $r(y)$, and for a variable-length scheme it depends on the distribution of $r(y)$. We easily find that $2 \le r(y) \le n/2 + 1$. Note that the codeword denoted by $y_1$ that starts with $n/2$ $-1$'s and ends with $n/2$ $+1$'s (and the $n - 1$ circular shifts of $y_1$) has the largest number of source words that map on it, namely the $n/2 + 1$ words, $x$, that start with $p$, $p = 0, 1, \ldots, n/2$, $-1$'s and end with $n - p$ $+1$'s.
The decoder must be able to distinguish between at most $n/2 + 1$ source words that map on the received word, which makes it possible to reduce the tag length to $\log_2(n/2 + 1)$. To do so, the encoder first computes $y = \psi(x)$ using the encoding algorithm, see Figure 3, and subsequently it computes $r(y)$. Using $r(y)$, the value $w(x)$ is uniquely encoded into the $n/2 + 1$ possible tag values, so that the decoder can uniquely recover $w(x)$ from the tag and $y$. The redundancy of this scheme equals $\log_2(n/2 + 1)$.
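The text does not spell out how $w(x)$ is mapped onto the $n/2 + 1$ tag values. One possible mapping, given here purely as an assumption, ranks $w$ within the admissible set $\{-2z_{\max}, -2z_{\max} + 2, \ldots, -2z_{\min}\}$ of Theorem 2, so that the tag is an integer between 0 and $N(y) - 1 \le n/2$. The function names below are hypothetical.

```python
def prefix_extremes(y):
    """Return (z_min, z_max) of the prefix sums z_1, ..., z_n of y."""
    z, running = [], 0
    for yi in y:
        running += yi
        z.append(running)
    return min(z), max(z)

def tag_from_balance(y, w):
    """Hypothetical tag: rank of w in {-2 z_max, -2 z_max + 2, ..., -2 z_min}."""
    z_min, z_max = prefix_extremes(y)
    if w % 2 or not (-2 * z_max <= w <= -2 * z_min):
        raise ValueError("w is not an admissible source balance for this codeword")
    return (w + 2 * z_max) // 2

def balance_from_tag(y, tag):
    """Inverse of tag_from_balance: recover w(x) from the codeword and its tag."""
    _, z_max = prefix_extremes(y)
    return 2 * tag - 2 * z_max

y = [-1, -1, -1, 1, 1, 1]            # z_min = -3, z_max = 0, so w is in {0, 2, 4, 6}
print([tag_from_balance(y, w) for w in (0, 2, 4, 6)])   # [0, 1, 2, 3]
print(balance_from_tag(y, 3))                           # 6
```

Because both sides can compute $z_{\min}$ and $z_{\max}$ from $y$ alone, no side information beyond the tag is needed under this assumed mapping.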
B. Variable-length (VL) tag scheme
The average redundancy of a VL tag scheme is less than that of the above fixed-length tag scheme. As the distribution of $r(y)$ is the same as that of Knuth's code, we follow [22] for the computation of the redundancy of the VL tag scheme. The number of balanced words $y$ of length $n$ with $r(y) = u$, denoted by $P(u, n)$, $2 \le u \le n/2 + 1$, is given by [22]
$$ P(u, n) = D(u, n) - 2D(u - 1, n) + D(u - 2, n), \tag{19} $$
where
$$ D(u, n) = 2^n \sum_{i=1}^{u} \cos^n \frac{\pi i}{u + 1}. \tag{20} $$
The above expression is surprising as $D(u, n)$ is integer valued. Using a result by Merca [27] we may translate (20) into a summation of binomial coefficients
$$ D(u, n) = (u + 1) \sum_{k=-v}^{v} \binom{n}{\frac{n}{2} + k(u + 1)} - 2^n, \tag{21} $$
where $v = \lfloor n/(2u + 2) \rfloor$. The redundancy of the VL tag scheme, denoted by $H$, equals [22]
$$ H = 2^{-n} \sum_{u=2}^{n/2+1} u P(u, n) \log_2 u. \tag{22} $$
The redundancy $H$ has been computed in [22, Table II] for selected values of $n \le 2^{13}$. For $n = 2^{13}$, we find $H - H_0 \approx 0.033$. Eq. (19) is ill-conditioned as $P(u, n)$ is the difference between two much larger quantities. We were not able to obtain results of (22) for asymptotically large $n$, see also [2].
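For moderate $n$, expressions (19), (21), and (22) can be evaluated with Python's arbitrary-precision integers, which sidesteps the cancellation problem of (19) because $D(u, n)$ is computed exactly from binomial coefficients. The sketch below is an illustration under that assumption; the function names are chosen for this example, and no claim is made about its practicality for very large $n$.

```python
from math import comb, log2

def D(u, n):
    """D(u, n) via the binomial form (21), in exact integer arithmetic."""
    v = n // (2 * u + 2)
    total = sum(comb(n, n // 2 + k * (u + 1)) for k in range(-v, v + 1))
    return (u + 1) * total - 2 ** n

def P(u, n):
    """P(u, n) of (19): number of balanced words y of length n with r(y) = u."""
    return D(u, n) - 2 * D(u - 1, n) + D(u - 2, n)

def vl_redundancy(n):
    """H of (22): average tag length of the variable-length scheme."""
    return sum(u * P(u, n) / 2 ** n * log2(u) for u in range(2, n // 2 + 2))

n = 16
counts = [P(u, n) for u in range(2, n // 2 + 2)]
print(sum(counts) == comb(n, n // 2))                             # all balanced words counted
print(sum(u * c for u, c in zip(range(2, n // 2 + 2), counts)) == 2 ** n)
print(vl_redundancy(n), n - log2(comb(n, n // 2)))                # H versus H_0
```

The two consistency checks state that every balanced word has exactly one value of $r(y)$ and that the $r(y)$ values sum to the number of source words, $2^n$.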
V. PERFORMANCE COMPARISON
In this section, we discuss the number of modifications to
a source word that are made by the prior art Knuth code [14]
and the newly developed code. We start with the new method.
A. New method
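The probability, denoted by $\mathrm{Pr}_1(\ell)$, that $\ell = |w(x)|/2$, $0 \le \ell \le n/2$, symbols of $x$ are inverted to obtain $\psi(x)$ equals (assuming equiprobable source words)
$$ \mathrm{Pr}_1(\ell) = \begin{cases} \dfrac{1}{2^n} \dbinom{n}{n/2}, & \ell = 0, \\ \dfrac{1}{2^{n-1}} \dbinom{n}{n/2 + \ell}, & 1 \le \ell \le n/2. \end{cases} \tag{23} $$
The average number of symbol inversions, denoted by $\bar{\ell}_1$, equals
$$ \bar{\ell}_1 = \sum_{\ell=1}^{n/2} \ell \, \mathrm{Pr}_1(\ell). \tag{24} $$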
Fig. 5. Distributions $\mathrm{Pr}_1(\ell)$ and $\mathrm{Pr}_2(\ell)$ versus the (relative) number of symbol inversions $\ell/n$. Word length $n = 64$ and $n = 256$.
For large $n$, we obtain, by using the well-known Gaussian approximation to the binomial coefficients,
$$ \bar{\ell}_1 \approx \sqrt{\frac{n}{2\pi}}, \quad n \gg 1. \tag{25} $$
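The quality of the approximation (25) can be gauged by evaluating (23) and (24) exactly. A minimal sketch using Python's exact integer binomials follows; the function names are assumptions for this illustration.

```python
from math import comb, pi, sqrt

def pr1(l, n):
    """Pr_1(l) of (23): probability that l symbols are inverted by the new code."""
    if l == 0:
        return comb(n, n // 2) / 2 ** n
    return comb(n, n // 2 + l) / 2 ** (n - 1)

def mean_inversions_new(n):
    """Average number of inversions (24), summed exactly before dividing."""
    return sum(l * comb(n, n // 2 + l) for l in range(1, n // 2 + 1)) / 2 ** (n - 1)

for n in (64, 256, 1000):
    print(n, mean_inversions_new(n), sqrt(n / (2 * pi)))
```

For $n = 1000$ both values are close to 12.6, in line with the "around twelve" inversions quoted in the text.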
B. Knuth’s method
Knuth [14] presented a simple scheme for balancing large codewords. Let $x$ be the $n$-bit ($n$ even) source word of bipolar symbols, $x_i \in \{-1, 1\}$. Knuth showed that there is a balancing index, $\ell$, such that
$$ -\sum_{i=1}^{\ell} x_i + \sum_{i=\ell+1}^{n} x_i = 0, \quad n \text{ even}. \tag{26} $$
In other words, by inverting a first segment of $\ell$ symbols any word $x$ of even length can be balanced. Note that the balancing index $\ell$ is not unique [21]. We assume here that the encoder selects the smallest balancing index from the set of balancing indexes. The distribution of the number of symbol inversions, $\ell$, for obtaining the balanced word in Knuth's scheme, denoted by $\mathrm{Pr}_2(\ell)$, $1 \le \ell \le n$ ($1 \le j \le n/2$), has been computed by Weber and Immink [21] (assuming equiprobable source words):
$$ \mathrm{Pr}_2(2j) = \mathrm{Pr}_2(2j - 1) = \frac{n - 2j + 1}{n 2^{n-2}} \binom{2(j-1)}{j-1} \binom{n - 2j}{\frac{n}{2} - j}. $$
The average number of symbol inversions of Knuth's scheme, denoted by $\bar{\ell}_2$, simply equals, see Appendix,
$$ \bar{\ell}_2 = \sum_{\ell=1}^{n} \ell \, \mathrm{Pr}_2(\ell) = \frac{n}{4} + 1. \tag{27} $$
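The distribution $\mathrm{Pr}_2(\ell)$ and the average (27) can be cross-checked for short words by enumerating all source words and recording the smallest balancing index of (26). The sketch below uses exact rational arithmetic; the function names are assumptions for this illustration, and the brute-force part is only feasible for small $n$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def pr2(l, n):
    """Pr_2(l) for Knuth's scheme; Pr_2(2j) = Pr_2(2j - 1), as given in [21]."""
    j = (l + 1) // 2
    numerator = (n - 2 * j + 1) * comb(2 * (j - 1), j - 1) * comb(n - 2 * j, n // 2 - j)
    return Fraction(numerator, n * 2 ** (n - 2))

def mean_knuth(n):
    """Average number of inverted symbols, the left-hand sum of (27)."""
    return sum(l * pr2(l, n) for l in range(1, n + 1))

def smallest_balancing_index(x):
    """Smallest l in 1..n such that inverting x_1..x_l balances x, see (26)."""
    w, prefix = sum(x), 0
    for l, xl in enumerate(x, start=1):
        prefix += xl
        if w - 2 * prefix == 0:
            return l
    raise ValueError("no balancing index found (is n even?)")

def brute_force_matches(n):
    """Compare the enumerated distribution of the balancing index with pr2."""
    counts = [0] * (n + 1)
    for x in product((-1, 1), repeat=n):
        counts[smallest_balancing_index(x)] += 1
    return all(Fraction(counts[l], 2 ** n) == pr2(l, n) for l in range(1, n + 1))

print(mean_knuth(64))            # 17, i.e. n/4 + 1
print(brute_force_matches(8))    # True
```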
C. Comparison of the two methods
Figure 5 shows two examples of the distributions $\mathrm{Pr}_1(\ell)$ and $\mathrm{Pr}_2(\ell)$ versus the relative number of symbol inversions $\ell/n$ for word lengths $n = 64$ and $n = 256$. We may notice that the distribution of Knuth's method, $\mathrm{Pr}_2(\ell)$, is much wider than that of the new method, $\mathrm{Pr}_1(\ell)$, which has a direct effect on the average number of inversions (bit changes) made. For example, for a codeword length $n = 1000$ around 12 symbol inversions are required on average per codeword for the new scheme. Knuth's code requires, on average, for the same codeword length, $n = 1000$, around 250 symbol inversions for translating source words into codewords.
VI. CONCLUSIONS
We have presented a novel method for efficiently translating arbitrary user data into balanced codewords. The new code is minimally modified as the number of symbol changes made to the source word for translating it into a balanced codeword is minimal. The encoder inverts, on average, approximately $\sqrt{n/(2\pi)}$, $n \gg 1$, symbols of the source word, where $n$ denotes the source word length; the other code symbols are equal to the source symbols. The redundancy of the new method using a fixed-length tag is $\log_2(n/2 + 1)$. Large look-up tables for encoding and decoding are avoided. The (time) complexity of the new balanced encoder and decoder grows linearly with source word length $n$ for asymptotically large values of $n$.
VII. APPENDIX
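Let, for $1 \le j \le n/2$,
$$ \mathrm{Pr}_2(2j) = \mathrm{Pr}_2(2j - 1) = \frac{n - 2j + 1}{n 2^{n-2}} \binom{2(j-1)}{j-1} \binom{n - 2j}{\frac{n}{2} - j}. \tag{28} $$
Theorem 3:
$$ \bar{\ell}_2 = \sum_{i=1}^{n} i \, \mathrm{Pr}_2(i) = \frac{n}{4} + 1. \tag{29} $$
Proof: We simply find, combining $\mathrm{Pr}_2(2i)$ and $\mathrm{Pr}_2(2i - 1)$,
$$ \sum_{i=1}^{n} i \, \mathrm{Pr}_2(i) = \sum_{i=1}^{n/2} (4i - 1) \mathrm{Pr}_2(2i). \tag{30} $$
Since $\mathrm{Pr}_2(i)$ is a probability mass function, we have
$$ \sum_{i=1}^{n/2} \mathrm{Pr}_2(2i) = \frac{1}{2}, \tag{31} $$
and we obtain
$$ \bar{\ell}_2 = 4 \sum_{i=1}^{n/2} i \, \mathrm{Pr}_2(2i) - \frac{1}{2}. \tag{32} $$
Define the moments
$$ m_k(n) = \sum_{j=1}^{n/2} j^k \binom{2(j-1)}{j-1} \binom{n - 2j}{\frac{n}{2} - j}, \quad k = 0, 1, 2, \tag{33} $$
then substituting into (32) yields
$$ \bar{\ell}_2 = \frac{4(n + 1)}{n 2^{n-2}} m_1(n) - \frac{8}{n 2^{n-2}} m_2(n) - \frac{1}{2}. \tag{34} $$
In the literature [24, pp. 187], we find
$$ m_0(n) = 2^{n-2}. \tag{35} $$
As, see (28), (31), and (33),
$$ \frac{n + 1}{n 2^{n-2}} m_0(n) - \frac{2}{n 2^{n-2}} m_1(n) = \frac{1}{2}, \tag{36} $$
we obtain
$$ m_1(n) = 2^{n-4}(n + 2). \tag{37} $$
The zeroth and second moments, $m_0(n)$ and $m_2(n)$, are the autoconvolution of the sequence $\binom{2i}{i}$ and $i\binom{2i}{i}$, $i = 1, 2, \ldots$, respectively. The generating function of the autoconvolution is obtained by squaring the original generating function as presented in [24]. Due to space limitations, we omit the details, and summarize the result:
$$ m_2(n) = 2^{n-7}(3n^2 + 6n + 8). \tag{38} $$
Substituting (37) and (38) into (34) proves the theorem.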
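Since the derivation of (38) is omitted, a numerical sanity check of the closed forms (35), (37), and (38), and of the final substitution into (34), may be helpful. The sketch below evaluates the moments (33) by direct summation in exact rational arithmetic; the function names are assumptions for this illustration, and the check is of course no substitute for the proof.

```python
from fractions import Fraction
from math import comb

def moment(k, n):
    """m_k(n) of (33), by direct summation."""
    return sum(j ** k * comb(2 * (j - 1), j - 1) * comb(n - 2 * j, n // 2 - j)
               for j in range(1, n // 2 + 1))

def closed_forms(n):
    """Closed forms (35), (37), (38) for m_0, m_1, m_2."""
    two = Fraction(2)
    return (two ** (n - 2),
            two ** (n - 4) * (n + 2),
            two ** (n - 7) * (3 * n * n + 6 * n + 8))

def mean_from_moments(n):
    """Right-hand side of (34), using the directly summed moments."""
    scale = Fraction(1, n * 2 ** (n - 2))
    return 4 * (n + 1) * scale * moment(1, n) - 8 * scale * moment(2, n) - Fraction(1, 2)

for n in range(2, 42, 2):
    assert tuple(moment(k, n) for k in range(3)) == closed_forms(n)
    assert mean_from_moments(n) == Fraction(n, 4) + 1
print("closed forms (35), (37), (38) and Theorem 3 agree for n = 2, 4, ..., 40")
```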
ACKNOWLEDGEMENT
The authors are indebted to dr. A.J.E.M. (Guido) Janssen
for his assistance with the proof of Theorem 3 given in the
Appendix.
REFERENCES
[1] A. R. Calderbank, "The art of signaling: fifty years of coding theory," IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2561-2595, Oct. 1998, doi: 10.1109/18.720549.
[2] D. T. Dao, H. M. Kiah, and T. T. Nguyen, "Average Redundancy of Variable-Length Balancing Schemes à la Knuth," ArXiv: 2204.13831, April 2022.
[3] O. P. Babalola and V. Balyan, "Efficient Channel Coding for Dimmable Visible Light Communications System," IEEE Access, vol. 8, pp. 215100-215106, 2020, doi: 10.1109/ACCESS.2020.3041431.
[4] J. N. Franklin and J. R. Pierce, "Spectra and Efficiency of Binary Codes without DC," IEEE Transactions on Communications, vol. COM-20, no. 6, pp. 1182-1184, Dec. 1972, doi: 10.1109/TCOM.1972.1091308.
[5] F. Chang, W. Hu, D. Lee and C. Yu, "Design and implementation of anti low-frequency noise in visible light communications," 2017 International Conference on Applied System Innovation (ICASI), Sapporo, 2017, pp. 1536-1538, doi: 10.1109/ICASI.2017.7988219.
[6] K. A. S. Immink and K. Cai, "Properties and Constructions of Constrained Codes for DNA-based Data Storage," IEEE Access, vol. 8, pp. 49523-49531, 2020, doi: 10.1109/ACCESS.2020.2980036.
[7] A. X. Widmer and P. A. Franaszek, "A Dc-balanced, Partitioned-Block, 8B/10B Transmission Code," IBM J. Res. Develop., vol. 27, no. 5, pp. 440-451, Sept. 1983, doi: 10.1147/rd.275.0440.
[8] C. N. Yang and D. J. Lee, "Some new efficient second-order spectral-null codes with small lookup tables," IEEE Transactions on Computers, vol. 55, no. 7, pp. 924-927, July 2006, doi: 10.1109/TC.2006.111.
[9] T. M. Cover, "Enumerative Source Coding," IEEE Transactions on Information Theory, vol. IT-19, no. 1, pp. 73-77, Jan. 1973, doi: 10.1109/TIT.1973.1054929.
[10] V. Braun and K. A. S. Immink, "An Enumerative Coding Technique for DC-free Runlength-Limited Sequences," IEEE Transactions on Communications, vol. 48, no. 12, pp. 2024-2031, Dec. 2000, doi: 10.1109/26.891213.
[11] J. P. M. Schalkwijk, "An Algorithm for Source Coding," IEEE Transactions on Information Theory, vol. IT-18, no. 3, pp. 395-399, May 1972, doi: 10.1109/TIT.1972.1054832.
[12] Y. Xin and I. J. Fair, "Algorithms to Enumerate Codewords for DC2-Constrained Channels," IEEE Transactions on Information Theory, vol. IT-47, no. 7, pp. 3020-3025, Nov. 2001, doi: 10.1109/18.959281.
[13] A. Hareedy, B. Dabak, and R. Calderbank, "The Secret Arithmetic of Patterns: A General Method for Designing Constrained Codes Based on Lexicographic Indexing," IEEE Transactions on Information Theory, 2022, doi: 10.1109/TIT.2022.3170692.
[14] D. E. Knuth, "Efficient Balanced Codes," IEEE Transactions on Information Theory, vol. IT-32, no. 1, pp. 51-53, Jan. 1986, doi: 10.1109/TIT.1986.1057136.
[15] H. D. L. Hollmann and K. A. S. Immink, "Performance of efficient balanced codes," IEEE Transactions on Information Theory, vol. IT-37, no. 3, pp. 913-918, May 1991, doi: 10.1109/18.79961.
[16] F. Paluncic, B. T. Maharaj, and H. C. Ferreira, "Variable- and Fixed-Length Balanced Runlength-Limited Codes Based on a Knuth-Like Balancing Method," IEEE Transactions on Information Theory, vol. IT-65, no. 11, pp. 7045-7066, Nov. 2019, doi: 10.1109/TIT.2019.2914205.
[17] S. Al-Bassam and B. Bose, "On Balanced Codes," IEEE Transactions on Information Theory, vol. IT-36, no. 2, pp. 406-408, March 1990, doi: 10.1109/18.52490.
[18] S. Al-Bassam and B. Bose, "Design of Efficient Balanced Codes," IEEE Transactions on Computers, vol. 43, pp. 362-365, March 1994, doi: 10.1109/12.272436.
[19] L. G. Tallini, R. M. Capocelli, and B. Bose, "Design of some new efficient balanced codes," IEEE Transactions on Information Theory, vol. IT-42, no. 3, pp. 790-802, May 1996, doi: 10.1109/18.490545.
[20] L. G. Tallini and B. Bose, "Balanced codes with parallel encoding and decoding," IEEE Transactions on Computers, vol. 48, no. 8, pp. 794-814, Aug. 1999, doi: 10.1109/12.795122.
[21] J. H. Weber and K. A. S. Immink, "Knuth's Balanced Codes Revisited," IEEE Transactions on Information Theory, vol. IT-56, no. 4, pp. 1673-1679, April 2010, doi: 10.1109/TIT.2010.2040868.
[22] K. A. S. Immink and J. H. Weber, "Very Efficient Balanced Codes," IEEE Journal on Selected Areas in Communications, vol. 28, no. 2, pp. 188-192, Feb. 2010, doi: 10.1109/JSAC.2010.100207.
[23] G. Raney, "Functional Composition Patterns and Power Series Reversion," Transactions of the American Mathematical Society, vol. 94, pp. 441-451, 1960.
[24] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science (2nd Edition), Addison-Wesley, ISBN-13: 978-0201558029, 2018.
[25] E. Ordentlich and R. M. Roth, "Low Complexity Two-Dimensional Weight-Constrained Codes," IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3892-3899, June 2012, doi: 10.1109/TIT.2012.2190380.
[26] T. T. Nguyen, K. Cai, K. A. S. Immink and Y. M. Chee, "Efficient Design of Capacity-Approaching Two-Dimensional Weight-Constrained Codes," 2021 IEEE International Symposium on Information Theory (ISIT), pp. 2930-2935, 2021, doi: 10.1109/ISIT45174.2021.9517970.
[27] M. Merca, "A note on cosine power series," Journal of Integer Sequences, vol. 15, no. 5, Article 15.5.3, MR2942751, 2012.
Kees A. Schouhamer Immink (M’81-SM’86-F’90) founded
Turing Machines Inc. in 1998, an innovative start-up focused
on novel signal processing for DNA-based storage, where he
currently holds the position of president. He was from 1994
till 2014 an adjunct professor at the Institute for Experimental
Mathematics, Essen-Duisburg University, Germany.
He contributed to digital video, audio, and data recording
products including Compact Disc, CD-ROM, DCC, DVD,
and Blu-ray Disc. He received the 2017 IEEE Medal of
Honor, a Knighthood in 2000, a personal Emmy award in
2004, the 1999 AES Gold Medal, the 2004 SMPTE Progress
Medal, the 2014 Eduard Rhein Prize for Technology, and the
2015 IET Faraday Medal. He received the Golden Jubilee
Award for Technological Innovation by the IEEE Information
Theory Society in 1998. He was inducted into the Consumer
Electronics Hall of Fame, elected into the Royal Netherlands
Academy of Sciences and the (US) National Academy of
Engineering. He received an honorary doctorate from the
University of Johannesburg in 2014. He served the profession
as President of the Audio Engineering Society inc., New
York, in 2003.
Jos H. Weber (S'87-M'90-SM'00) was born in Schiedam, The Netherlands, in 1961. He received the M.Sc. (in mathematics, with honors), Ph.D., and MBT (Master of Business Telecommunications) degrees from Delft University of Technology, Delft, The Netherlands, in 1985, 1989, and 1996, respectively.
Since 1985 he has been with the Delft University of Technology. Currently, he is an associate professor at the Department of Applied Mathematics. He was the chairman of the Werkgemeenschap voor Informatie- en Communicatietheorie from 2006 until 2021. He has been the secretary of the IEEE Benelux Chapter on Information Theory since 2008. He was a visiting researcher at the University of California (Davis, CA, USA), the Tokyo Institute of Technology (Japan), the University of Johannesburg (South Africa), EPFL (Switzerland), and SUTD (Singapore). His main research interests are in the area of channel coding.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Constrained codes are used to prevent errors from occurring in various data storage and data transmission systems. They can help in increasing the storage density of magnetic storage devices, in managing the lifetime of solid-state storage devices, and in increasing the reliability of data transmission over wires. Over the years, designing practical (complexity-wise) capacity-achieving constrained codes has been an area of research gaining significant interest. We recently designed various constrained codes based on lexicographic indexing. We introduced binary symmetric lexicographically-ordered constrained (S-LOCO) codes, $q$-ary asymmetric LOCO (QA-LOCO) codes, and a class of two-dimensional LOCO (TD-LOCO) codes. These families of codes achieve capacity with simple encoding and decoding, and they are easy to reconfigure. We demonstrated that these codes can contribute to notable density and lifetime gains in magnetic recording (MR) and Flash systems, and they find application in other systems too. In this paper, we generalize our work on LOCO codes by presenting a systematic method that guides the code designer to build any constrained code based on lexicographic indexing once the finite set of data patterns to forbid is known. In particular, we connect the set of forbidden patterns directly to the cardinality of the LOCO code and most importantly to the rule that uncovers the index associated with a LOCO codeword. By doing that, we reveal the secret arithmetic of patterns, and make the design of such constrained codes significantly easier. We give examples illustrating the method via codes based on lexicographic indexing from the literature. We then design optimal (rate-wise) constrained codes for the new two-dimensional magnetic recording (TDMR) technology. Over a practical TDMR model, we show notable performance gains as a result of solely applying the new codes. Moreover, we show how near-optimal constrained codes for TDMR can be designed and used to further reduce complexity and error propagation. All the newly introduced LOCO codes are designed using the proposed general method, and they inherit all the desirable properties in our previously designed LOCO codes.
Visible light communication (VLC) offers short-range wireless communication based on wavelength converters and light-emitting diodes (LEDs). In a VLC system, conventional forward error correction (FEC) codes are not guaranteed to provide flicker mitigation and dimming support. Consequently, modified coding schemes have been introduced for reliable VLC. These methods require complicated coding structures, the use of lookup tables, and the addition of large redundancy, resulting in increased computational complexity and low transmission efficiency. In this article, we propose a coding scheme that is flicker-free and enhances the transmission efficiency of VLC systems. The proposed scheme is based on polar codes (PC) and the Knuth balancing code with an enhanced prefix coding technique. The results show that the proposed algorithm exhibits improved transmission efficiency compared to the PC without and with run-length limited code, for dimming values of 75% (or 25%) and 87.5% (or 12.5%). The proposed scheme also presents a significant bit error rate (BER) performance gain compared to the schemes in the literature. The proposed scheme is flicker-free, provides a simple encoding structure, does not utilize lookup tables, and generates a minimal amount of redundancy for energy efficiency. Thus, the approach is flexible and more suitable for real-time VLC systems. Index terms: forward error correction, Knuth balancing codes, light-emitting diode, polar codes, visible light communication.
In this work, given n, p>0, efficient encoding/decoding algorithms are presented for mapping arbitrary data to and from n×n binary arrays in which the weight of every row and every column is at most pn. Such a constraint, referred to as the p-bounded-weight constraint, is crucial for reducing the parasitic currents in crossbar resistive memory arrays, and has also been proposed for certain applications in holographic data storage. While low-complexity designs have been proposed in the literature only for the case p=1/2, this work provides efficient coding methods that work for arbitrary values of p. The coding rate of our proposed encoder approaches the channel capacity for all p.
We describe properties and constructions of constraint-based codes for DNA-based data storage which account for the maximum repetition length and AT/GC balance. Generating functions and approximations are presented for computing the number of sequences with maximum repetition length and AT/GC balance constraint. We describe routines for translating binary runlength limited and/or balanced strings into DNA strands, and compute the efficiency of such routines. Expressions for the redundancy of codes that account for both the maximum repetition length and AT/GC balance are derived.
Using the multisection series method, we establish formulas for various power sums of cosine functions. As corollaries we derive several combinatorial identities.
A novel Knuth-like balancing method for runlength-limited words is presented, which forms the basis of new variable- and fixed-length balanced runlength-limited codes that improve on the code rate as compared to balanced runlength-limited codes based on Knuth’s original balancing procedure developed by Immink et al. While Knuth’s original balancing procedure, as incorporated by Immink et al., requires the inversion of each bit one at a time, our balancing procedure only inverts the runs as a whole one at a time. The advantage of this approach is that the number of possible inversion points, which needs to be encoded by a redundancy-contributing prefix/suffix, is reduced, thereby allowing a better code rate to be achieved. Furthermore, this balancing method also allows for runlength-violating markers which improve, in a number of respects, on the optimal such markers based on Knuth’s original balancing method.
found in the writings of Jacobson [9], Becker [2], Motzkin [11], and Bourbaki [3; 4]. This paper will be concerned with a natural generalization of Cayley's problem, and will show that the solution to the generalized problem contains all of the combinatorial information needed to establish the well known formula of Lagrange for the reversion of power series. To describe the problem, we consider expressions which are built from operator symbols and argument symbols, using a prefix notation for operators. Weights are assigned to the symbols in an expression, an argument symbol having the weight 0 and an n-ary operator symbol having the weight n. Expressions are of various types, the type of an expression depending only on the weights of the symbols in it and on the order in which they appear. The expression (a+b)+c, for example, is written ++abc and is of the type 22000, while a+(b+c) is written +a+bc and is of the type 20200. The expression F(G(x, H(y, z), t), K(u)) is of the type 230200010. Following P. C. Rosenbloom [13], we call those finite sequences of natural numbers which designate the types of expressions "words." Definitions and some special properties of these sequences are stated in §2.
Two low complexity coding techniques are described for mapping arbitrary data to and from m × n binary arrays in which the Hamming weight of each row (respectively, column) is at most n/2 (respectively, m/2). One technique is based on flipping rows and columns of an arbitrary binary array until the Hamming weight constraint is satisfied in all rows and columns, and the other is based on a certain explicitly constructed “antipodal” matching between layers of the Boolean lattice. Both codes have a redundancy of roughly m+n and may have applications in next generation resistive memory technologies.
In digital transmission of binary (+1,-1) signals it is desirable that the stream of pulses which constitutes the signal have no dc, that is, that the power spectrum go to zero at zero frequency. It is desirable that, for a given efficiency or entropy, the spectrum rise slowly with increasing frequency. We have obtained the spectrum for selected blocks with equal numbers of plus ones and minus ones. For a given efficiency, this is better than the spectrum obtained by Rice, using the Monte Carlo method, for block encoding using polarity pulses. An algorithm given by Schalkwijk should allow simple encoding into selected blocks.