Minimally modified balanced codes
Kees A. Schouhamer Immink, Fellow, IEEE and Jos H. Weber, Senior Member, IEEE
Abstract—We present and analyze a new construction of
bipolar balanced codes where each codeword contains equally
many −1’s and +1’s. The new code is minimally modified as
the number of symbol changes made to the source word for
translating it into a balanced codeword is as small as possible.
The balanced codes feature low redundancy and low time complexity.
Large look-up tables are avoided.
Keywords: balanced code, constrained code, error propagation, Raney's Lemma.
I. INTRODUCTION
Let $x = (x_1, \ldots, x_n)$, $x_i \in \{-1, 1\}$, be a word of length $n$ with bipolar symbols. The balance of a word $x$, denoted by $w(x)$, is defined by $w(x) = \sum_{i=1}^{n} x_i$. A word is said to be balanced if $w(x) = 0$, $n$ even, i.e., it consists of equal numbers of $-1$'s and $+1$'s. A code is said to be balanced if each word in the code is balanced. Balanced codes have found widespread application in various fields such as data transmission and data storage [1, 2, 3, 4, 5, 6]. Look-up tables for translating source words into balanced codewords and vice versa have been applied for small $n$ [7, 8]. Enumeration techniques [9, 10, 11, 12, 13] have been advocated for encoding and decoding balanced words as they achieve the minimum possible redundancy. The complexity of enumerative coding, mainly the coefficient look-up tables, grows with $n^2$, which makes it less practical if complexity is at a premium.
Knuth's implementation of balanced codes [14, 15, 16] is attractive for encoding large source words as its complexity scales linearly with word length $n$, but it requires a redundancy of $\log_2 n$, which is, for large $n$, around twice the minimum redundancy of a code comprising the full set of balanced codewords. Modifications of Knuth's generic scheme bridging the gap between the minimum redundancy and that of Knuth's implementation are discussed in Al-Bassam and Bose [17, 18], Tallini, Capocelli, and Bose [19, 20], and Weber and Immink [21, 22].
Knuth's balancing method is handsomely simple: a first segment of the source word is inverted, i.e., the symbol signs are flipped, for balancing. In addition, a prefix (tag) that uniquely identifies the length of the inverted segment is forwarded to the receiver. A disadvantage of Knuth's method is that the encoder inverts on average $n/4 + 1$ symbols, which may result in extreme error propagation when the tag is received in error. Implementations having low redundancy, complexity, and error propagation are welcome alternatives to the art.
Kees A. Schouhamer Immink is with Turing Machines Inc., Willemskade 15d, 3016 DK Rotterdam, The Netherlands. E-mail: immink@turing-machines.com.
Jos H. Weber is with Delft University of Technology, Delft, The Netherlands. E-mail: j.h.weber@tudelft.nl.
Our contributions: We present a novel method for efficiently translating arbitrary user data into balanced codewords, which is based on Raney's Lemma, also known as the Cycle Lemma [23, 24, 25, 26]. As in Knuth's construction, the encoder judiciously inverts a number of source symbols for obtaining the codeword. The proposed code, however, is minimally modified as the number of symbol inversions is minimal, which is an attractive virtue for reconstructing the source word when errors are made during transmission. The encoder inverts, on average, approximately $\sqrt{n/(2\pi)}$, $n \gg 1$, symbols of the source word. Thus, for example, for $n = 1000$ only around twelve symbol inversions are required on average (assuming equiprobable source words).
Information regarding the symbol modifications made to the source word is encoded into a small redundant tag appended to the codeword. We investigate fixed- and variable-length tag schemes. Tags of multiple codewords can be combined, thereby reducing the overall redundancy. The redundancy of a fixed-length tag scheme equals $\log_2(n/2 + 1)$. The average redundancy of a variable-length tag scheme approaches the minimum possible for asymptotically large values of $n$. The (time) complexity of the new balanced encoder and decoder grows linearly with $n$.
We start in Section II with a description of Raney’s Lemma.
In Section III, we detail the encoding and decoding algorithms.
The code’s redundancy is discussed in Section IV, and a
performance comparison is given in Section V. Section VI
furnishes the conclusions of our paper.
II. RANEY'S LEMMA
We start with two definitions. The $n$ partial, or running, balances of the index $i$, $1 \le i \le n$, denoted by $s(i,k)$, are defined by
$$ s(i,k) = \sum_{j=i}^{i+k-1} x_j, \quad 1 \le i \le n, \qquad (1) $$
where we extend the sequence $x$ by letting $x_{n+p} = x_p$ for $1 \le p \le n$. An index $i$ is said to be a minimal index of $x$ if and only if all the partial balances are positive, i.e.
$$ s(i,k) > 0, \quad 1 \le k \le n. \qquad (2) $$
In other words, an index $i$ is a minimal index of $x$ if and only if the partial balances of $x_i, \ldots, x_n, x_1, \ldots, x_{i-1}$ are all positive. Note that it is immediate from (2) that if index $i$ is a minimal index then $x_i = x_{i+1} = 1$.
Define the set of all minimal indexes of $x$ by $\sigma(x)$. If $w(x) > 0$ there are, according to Raney's Lemma [23, 24], exactly $|\sigma(x)| = w(x)$ minimal indexes, where $|X|$ denotes the cardinality of a set $X$.
Fig. 1. Partial balance $s(1,k)$ versus $k$ for $n = 10$ and $x = (1,1,1,-1,-1,1,-1,1,1,1)$. The diagram shows $s(1,k)$ in the extended interval $1 \le k \le 2n$, by letting $x_{n+p} = x_p$ for $1 \le p \le n$, which makes it more convenient to peruse the partial balances $s(i,k)$ for any $i$, as explained in the text. The minimal indexes of $x$ are the $k$-values of the points indicated by the circles.
Example 1: Let $n = 10$ and $x = (1,1,1,-1,-1,1,-1,1,1,1)$. There are $w(x) = 4$ minimal indexes. Figure 1 illustrates the partial balances, $s(1,k)$, versus $k$. We can check that index $i = 1$ is a minimal index, since all partial balances $s(1,k)$ are positive. Further, note that $s(i,k) = s(1, k+i-1) - s(1, i-1)$. This implies that for any $i$, $1 \le i \le n$, the $s(i,k)$ curve with $1 \le k \le n$ can be obtained from the curve in Figure 1 by considering it from $k = i$ up to $k = i+n-1$ and then shifting this segment $i-1$ units to the left and $s(1, i-1)$ units downwards. Hence it follows that the minimal index set is $\sigma(x) = \{1, 8, 9, 10\}$.
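As an illustration, the following Python sketch computes $\sigma(x)$ by brute force directly from definitions (1) and (2); it is not the linear-time algorithm of [25, Fig. 6], and the helper name minimal_indexes is ours.

def minimal_indexes(x):
    """Return the set of minimal indexes of x (1-based), per definitions (1)-(2)."""
    n = len(x)
    sigma = set()
    for i in range(1, n + 1):                  # candidate index i
        s, ok = 0, True
        for k in range(1, n + 1):              # partial balances s(i, k), cyclic extension
            s += x[(i + k - 2) % n]
            if s <= 0:                         # condition (2) violated
                ok = False
                break
        if ok:
            sigma.add(i)
    return sigma

# Example 1: w(x) = 4 and sigma(x) = {1, 8, 9, 10}
x = [1, 1, 1, -1, -1, 1, -1, 1, 1, 1]
print(sum(x), minimal_indexes(x))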
III. RANEY'S LEMMA-BASED BALANCED CODES
A. Antipodal matchings
Ordentlich and Roth [25] pioneered antipodal matchings for two-dimensional weight-constrained codes, which are based on Raney's Lemma. Before showing their results, we define the function $y = f(x, S)$, where $S$ is a subset of $\{1, \ldots, n\}$, by
$$ y_i = \begin{cases} -x_i, & i \in S, \\ x_i, & i \notin S. \end{cases} \qquad (3) $$
Ordentlich and Roth [25] showed that all $n$-bit input words, $x$, of balance $w(x) > 0$ can be converted into $n$-bit output words, $y$, of inverted balance $w(y) = -w(x)$ by
$$ y = f(x, \sigma(x)). \qquad (4) $$
In other words, we simply obtain the entries $y_i$ of $y$ by inverting the $+1$'s at all minimal indexes of $x$ to $-1$'s. For the other indexes we simply have $y_i = x_i$. Ordentlich and Roth proved that the above antipodal matchings are bijective mappings, and they presented an efficient (linear-time complexity) algorithm for finding the set of minimal indexes $\sigma(x)$, $w(x) > 0$, for all word lengths $n$. They generalized the algorithm to words $x$ with $w(x) < 0$.
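A minimal sketch of the mappings (3) and (4), reusing minimal_indexes from the previous sketch (the helper name flip is ours, not taken from [25]):

def flip(x, S):
    """y = f(x, S): invert the symbols of x at the (1-based) indexes in S, see (3)."""
    return [-xi if i + 1 in S else xi for i, xi in enumerate(x)]

x = [1, 1, 1, -1, -1, 1, -1, 1, 1, 1]      # w(x) = 4
y = flip(x, minimal_indexes(x))            # antipodal matching (4)
print(sum(x), sum(y))                      # prints 4 -4, so w(y) = -w(x)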
At first sight, the above algorithm looks superfluous, as merely reversing the sign of a word's balance is obviously achieved by inverting all symbols of $x$. The algorithm based on Raney's Lemma, however, has the advantage that a minimal plurality of symbols ($+1$'s only if $w(x) > 0$, or $-1$'s only if $w(x) < 0$) is inverted, which is a highly attractive feature for constructing two-dimensional weight-constrained codes as shown in [25]. Below we show that Raney's Lemma can be harnessed to balance codewords with a minimal number of symbol inversions.
B. Balanced codes
Let $S_w$ denote the set of $n$-bit words, $x$, whose balance equals $w(x) = w$, that is
$$ S_w = \{ x \in \{-1,1\}^n : \sum_{i=1}^{n} x_i = w \}. \qquad (5) $$
Note that $S_0$ is the set of balanced words.
Let the (minimal) indexes in $\sigma(x)$ be ordered in magnitude, that is, $\sigma(x) = \{i_1, i_2, \ldots, i_w\}$, where $i_1 < i_2 < \ldots < i_{w-1} < i_w$. For $w > 0$ we define the mapping $\phi(\cdot)$ between $x \in S_w$ and the balanced $y = \phi(x) \in S_0$, where
$$ y = \phi(x) = f(x, \{i_1, i_2, \ldots, i_{w/2}\}). \qquad (6) $$
Clearly $w(\phi(x)) = 0$.
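A small sketch of the balancing map $\phi$ of (6), reusing the minimal_indexes and flip helpers introduced above:

def phi(x):
    """Balance a word x with w(x) > 0 by inverting its w(x)/2 smallest minimal indexes, see (6)."""
    w = sum(x)
    assert w > 0 and w % 2 == 0
    smallest = sorted(minimal_indexes(x))[: w // 2]
    return flip(x, set(smallest))

x = [1, 1, 1, -1, -1, 1, -1, 1, 1, 1]      # sigma(x) = {1, 8, 9, 10}, w(x) = 4
print(phi(x), sum(phi(x)))                 # only positions 1 and 8 are inverted; the balance is 0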
The following lemma shows some important properties,
which will be used later. Essential parts have been presented
in [25, Prop. 4.6].
Lemma 1: For $x \in \{-1,1\}^n$ with $w(x) = w > 0$ and $\sigma(x) = \{i_1, i_2, \ldots, i_w\}$ with $i_1 < i_2 < \ldots < i_{w-1} < i_w < i_{w+1} = i_1 + n$, it holds for all $1 \le j \le w$ and $i_j + 1 \le v \le i_{j+1} - 1$ that

(i) $\sum_{i=i_j+1}^{v} x_i \ge 0$,

(ii) $\sum_{i=i_j+1}^{i_{j+1}-1} x_i = 0$,

(iii) $\sum_{i=v}^{i_{j+1}-1} x_i \le 0$.

Proof: (i) Since $i_j$ is a minimal index of $x$, we have
$$ \sum_{i=i_j+1}^{v} x_i = \sum_{i=i_j}^{v} x_i - x_{i_j} \ge 1 - 1 = 0. \qquad (7) $$
(ii) Note that
$$ \sum_{j=1}^{w} \sum_{i=i_j+1}^{i_{j+1}-1} x_i = \sum_{j=1}^{w} \sum_{i=i_j}^{i_{j+1}-1} x_i - \sum_{j=1}^{w} x_{i_j} = \sum_{i=1}^{n} x_i - \sum_{j=1}^{w} x_{i_j} = w - w = 0.
$$
Since $\sum_{i=i_j+1}^{i_{j+1}-1} x_i \ge 0$ for all $j$ because of (i), the result follows.
(iii) It follows from (i) and (ii) that
$$ \sum_{i=v}^{i_{j+1}-1} x_i = \sum_{i=i_j+1}^{i_{j+1}-1} x_i - \sum_{i=i_j+1}^{v-1} x_i \le 0 - 0 = 0.
$$

Fig. 2. Partial balances, $s(1,k)$, of a) an arbitrary source word, $x$, and b) the balanced codeword $y = \phi(x)$, versus index $k$ for $n = 100$; the word balance equals $w = w(x) = 16$. The partial balances at the minimal indexes of $x$ are indicated by a '*'.
Note that this lemma deals with sums of symbols at positions in the strings between two (cyclically) consecutive minimal indexes. In particular, it says that the sum of the symbols in
(i) any head of such a string is nonnegative,
(ii) the complete string is equal to zero,
(iii) any tail of such a string is nonpositive.
As a visual illustration we have plotted in Figure 2 the partial balances, $s(1,k)$, of a) an arbitrary source word $x$ of length $n = 100$, $w(x) = 16$, and b) the partial balances of the balanced codeword $y = \phi(x)$. The partial balances at the minimal indexes of $x$ are indicated by a '*'. The various properties discussed in Lemma 1 can easily be noted. For example, note the unity balance increments between consecutive minimal indexes. The partial balances of the balanced codeword $y$ are also indicated by a '*' at the minimal indexes of the source word $x$. For the smallest $w/2$ minimal indexes of $x$ we note unity balance decrements between consecutive minimal indexes, while for the largest $w/2$ minimal indexes we note unity balance increments between consecutive minimal indexes.
C. Encoding
We propose the following encoding rule, denoted by $y = \psi(x)$, for translating an $n$-bit source word $x$, $x \in \{-1,1\}^n$, into a balanced $n$-bit codeword $y$, $y \in S_0$:
$$ y = \psi(x) = \begin{cases} \phi(x), & w(x) > 0, \\ -\phi(-x), & w(x) < 0, \\ x, & w(x) = 0, \end{cases} \qquad (8) $$
where $-x$ denotes $(-x_1, -x_2, \ldots, -x_n)$.
Input: The bipolar $n$-bit word $(x_1, \ldots, x_n)$, $x_i \in \{-1,1\}$.
Output: Encoded $n$-bit bipolar word $y$ and tag $w$, i.e. ENC($x$) $= (y, w)$.
begin
  let $w = \sum_{i=1}^{n} x_i$
  if $w = 0$: $y = x$; halt
  if $w < 0$: set $x = -x$ {invert all symbols}
  run Algorithm [25, Fig. 6] yielding $\{i_1, \ldots, i_{|w|}\}$
  for $i \in \{i_1, \ldots, i_{|w|/2}\}$: set $x_i = -1$
  if $w > 0$: set $y = x$
  if $w < 0$: set $y = -x$ {invert all symbols}
end.
Fig. 3. Basic encoding algorithm ENC($x$).
Figure 3 shows the basic encoding algorithm ENC($x$). Part of the encoding table, $n = 6$, has been tabulated in Table I; note that for clerical convenience a '0' indicates a '$-1$' symbol. As $\psi(-x) = -\psi(x)$ we can easily extend the table.
TABLE I
PART OF ENCODING TABLE $y = \psi(x)$ FOR $n = 6$. A '0' INDICATES '$-1$'.

  x       ψ(x)      x       ψ(x)
  000000  111000    001000  101100
  000001  110001    001001  101001
  000010  110010    001010  101010
  000011  100011    001011  001011
  000100  110100    001100  001110
  000101  100101    001101  001101
  000110  100110    001110  001110
  000111  000111    001111  000111
Note that in the mapping $y = \psi(x)$ the $|w(x)|/2$ rightmost symbols of the codeword $y$ equal those of the source word $x$. We have $y = \psi(x) \Rightarrow x_i = y_i$, $i = n - w/2 + 1, \ldots, n$, where $w = w(x) > 0$. By definition of $y = \psi(x)$, see (6), (8), and (9), only the symbols at the indexes in $\{i_1, i_2, \ldots, i_{w/2}\}$ are inverted. We have $\{i_1, i_2, \ldots, i_{w/2}\} \subset \{1, \ldots, n - w(x)/2\}$, so that the $w(x)/2$ rightmost symbols of $x$ are unchanged. As $w(x) \ge 2$, we have $y_n = x_n$ for all $x$.
The receiver is able to uniquely recover $x$ from the received (balanced) $y = \psi(x)$ if we add a tag to the sent $y$ that uniquely identifies the balance of the source word $x$. A tag can be sent separately as a pre- or postfix, or we may combine multiple tags to form a large tag data word. The code redundancy is discussed in Section IV.
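A runnable sketch of the encoder of Fig. 3 follows; as assumptions, it reuses the brute-force minimal_indexes and flip helpers from the earlier sketches instead of the linear-time algorithm of [25, Fig. 6], and the function name enc is ours.

def enc(x):
    """Return (y, w): the balanced codeword y = psi(x) of (8) and the tag w = w(x)."""
    w = sum(x)
    if w == 0:
        return list(x), 0
    v = [-xi for xi in x] if w < 0 else list(x)    # work with a word of positive balance
    smallest = sorted(minimal_indexes(v))[: abs(w) // 2]
    v = flip(v, set(smallest))                     # invert |w|/2 symbols, see (6)
    y = [-vi for vi in v] if w < 0 else v          # undo the initial inversion, see (8)
    return y, w

y, w = enc([1, 1, 1, -1, -1, 1, -1, 1, 1, 1])
print(y, w, sum(y))                                # balanced codeword, tag w = 4, w(y) = 0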
D. Decoding
The decoder uniquely retrieves a facsimile $x'$ of the original source word, $x$, from the received (balanced) $y = \psi(x)$ and the tag associated with the balance of the source word, $w(x)$. Figure 4 shows a description of the basic decoding algorithm. Note that the decoder (time) complexity grows linearly with word length $n$.
Input: The integer $w = w(x) \in \{-n, -n+2, \ldots, n\}$, and the bipolar $n$-bit balanced word $(y_1, \ldots, y_n) = \psi(x) \in S_0$, $y_i \in \{-1,1\}$.
Output: Decoded $n$-bit bipolar word DEC($y, w$) $= x'$.
Initialize:
  if $w = 0$: $x' = y$; halt
  if $w < 0$: $v = -y$ {invert all symbols}
  if $w > 0$: $v = y$
  set $w_2 = |w/2|$
begin
  let $z_i = \sum_{j=1}^{i} v_j$, $\forall i = 1, \ldots, n$
  let $m = \min\{z_i\}$
  let $i'_j = \min\{i : z_i = m + w_2 - j\}$, $\forall j = 1, \ldots, w_2$
  let $v' = f(v, \{i'_1, \ldots, i'_{w_2}\})$
  if $w > 0$: $x' = v'$
  if $w < 0$: $x' = -v'$ {invert all symbols}
end.
Fig. 4. Basic decoding algorithm DEC($y, w$).
The next theorem shows that the decoding algorithm is correct, that is, DEC(ENC($x$)) $= x$.
Theorem 1: For any $x \in \{-1,1\}^n$, it holds that DEC(ENC($x$)) $= x$.
Proof: We show that the decoding algorithm, shown in Figure 4, with input $(\psi(x), w(x))$ is correct and generates the original source word $x$ as an output. From the encoding and decoding procedures, this is trivially true if $w(x) = 0$, while correctness of the $w(x) > 0$ case implies that it is also true for the $w(x) < 0$ case. Hence, we further assume that $w = w(x) > 0$. Note that
$$ v_i = \begin{cases} -x_i = -1 & \text{if } i \in \{i_1, \ldots, i_{w/2}\}, \\ x_i = 1 & \text{if } i \in \{i_{w/2+1}, \ldots, i_w\}, \\ x_i & \text{otherwise.} \end{cases} \qquad (9) $$
Let $a$ be the sum of the first $i_1 - 1$ entries of $v$, i.e.,
$$ a = z_{i_1-1} = \sum_{i=1}^{i_1-1} v_i = \sum_{i=1}^{i_1-1} x_i. \qquad (10) $$
It follows from Lemma 1 (ii) and (9) that
$$ z_{i_j} = a - j, \quad \forall j \in \{1, 2, \ldots, w/2\}. \qquad (11) $$
Furthermore we have
$$ z_i \ge \begin{cases} a - w/2, & \forall i \in \{i_{w/2}, \ldots, n\}, \\ a - j, & \forall j \in \{1, \ldots, w/2-1\},\ i \in \{i_j, \ldots, i_{j+1}-1\}, \\ a, & \forall i \in \{1, \ldots, i_1-1\}, \end{cases} \qquad (12) $$
where the first two inequalities follow from (9), (11), and Lemma 1 (i), while the third inequality follows from the fact that $z_i < a$ would imply with (9) and (10) that
$$ \sum_{j=i+1}^{i_1-1} x_j = \sum_{j=i+1}^{i_1-1} v_j = z_{i_1-1} - z_i > a - a = 0, $$
which contradicts Lemma 1 (iii). Hence, (11) and (12) give that $m = a - w/2$ and that for any $j \in \{1, \ldots, w/2\}$ the smallest $i$ such that $z_i = m + w/2 - j = a - j$ is $i = i_j$, and thus that $i'_j = i_j$. In conclusion, the decoder output satisfies
$$ x' = v' = f(v, \{i'_1, \ldots, i'_{w/2}\}) = f(y, \{i_1, \ldots, i_{w/2}\}) = x. $$
IV. REDUNDANCY
The number of balanced codewords of length $n$ equals
$$ |S_0| = \binom{n}{n/2}, \qquad (13) $$
and thus the minimum redundancy of balanced codewords of length $n$, denoted by $H_0$, is
$$ H_0 = n - \log_2 |S_0| = n - \log_2 \binom{n}{n/2}. \qquad (14) $$
For asymptotically large $n$ we have the approximation [14]
$$ H_0 \approx \frac{1}{2} \log_2 n + 0.326, \quad n \gg 1. \qquad (15) $$
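A small numeric check of (14) against the approximation (15); a sketch, with the helper name h0 ours:

from math import comb, log2

def h0(n):
    """Minimum redundancy of length-n balanced codes, eq. (14)."""
    return n - log2(comb(n, n // 2))

for n in (64, 256, 1024):
    print(n, round(h0(n), 3), round(0.5 * log2(n) + 0.326, 3))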
The redundancy of the new code is governed by the amount of data required to recover the balance $w(x)$ of the source word $x$. The balance $w(x) \in \{-n, -n+2, \ldots, n-2, n\}$, so that for the simplest fixed-length tag scheme the redundancy is $\log_2(n+1)$. The next theorem will help to reduce the redundancy.
Theorem 2: Let $y \in S_0$, $z_i = \sum_{j=1}^{i} y_j$ for $i = 1, \ldots, n$, $z_{\min} = \min\{z_i\}$, and $z_{\max} = \max\{z_i\}$. Then it holds that
$$ |\{x \in S_w : \psi(x) = y\}| = \begin{cases} 1 & \text{if } w \in \{-2z_{\max}, -2z_{\max}+2, \ldots, -2z_{\min}\}, \\ 0 & \text{otherwise.} \end{cases} \qquad (16) $$
Proof: From Theorem 1 it follows that the mapping ENC($x$) from $\{-1,1\}^n$ to $S_0 \times \{-n, -n+2, \ldots, n\}$ is injective. Hence, for each $w \in \{-n, -n+2, \ldots, n\}$, there is at most one word $x \in S_w$ for which $\psi(x) = y$. In items (i)-(v) below we investigate for which values of $w$ such a word $x$ exists. Define $i'_j = \min\{i : z_i = z_{\min} + w/2 - j\}$, $j = 1, \ldots, w/2$, and observe that $y \in S_0$ implies $z_{\min} \le 0 \le z_{\max}$.
(i) If $w = 0$, then there does exist an $x \in S_w$ such that $\psi(x) = y$, namely $x = y$, which immediately follows from (8).
(ii) If $w \in \{2, 4, \ldots, -2z_{\min}\}$, then there does exist an $x \in S_w$ such that $\psi(x) = y$, namely $x = f(y, \{i'_1, \ldots, i'_{w/2}\})$. This can be checked as follows. Note that $z_{i'_j} = z_{\min} + w/2 - j < 0$ and $x_{i'_j} = -y_{i'_j} = 1$ for $j = 1, 2, \ldots, w/2$, while $x_i = y_i$ for all indexes $i \ne i'_j$. On the one hand, observe that any $i$ with $i'_j < i < i'_{j+1}$, $j \in \{0, 1, \ldots, w/2-1\}$, $i'_0 = 0$, is not a minimal index of $x$, since
$$ \sum_{m=i}^{i'_{j+1}-1} x_m = \sum_{m=i}^{i'_{j+1}-1} y_m = z_{i'_{j+1}-1} - z_{i-1} \le 0. $$
On the other hand, any $i'_j$, $j = 1, \ldots, w/2$, is a minimal index of $x$, since for all $k \in \{1, 2, \ldots, n\}$ it holds that
$$ \sum_{i=i'_j}^{i'_j+k-1} x_i \ge \sum_{i=i'_j}^{i'_j+k-1} y_i + 2b \ge -b + 2b = b \ge 1, $$
where $b = |\{m \in \{j, j+1, \ldots, w/2\} : i'_j \le i'_m \le i'_j + k - 1\}|$. In conclusion, $i'_1, \ldots, i'_{w/2}$ are the $w/2$ smallest minimal indexes of $x$, and thus $\psi(x) = f(x, \{i'_1, \ldots, i'_{w/2}\}) = y$.
(iii) If $w \in \{-2z_{\min}+2, -2z_{\min}+4, \ldots, n\}$, then there is no $x \in S_w$ for which $\psi(x) = y$, as we will show next. Suppose there does exist such an $x$. Let the $w/2$ smallest minimal indexes of $x$ be $i_1, \ldots, i_{w/2}$. Since $y = f(x, \{i_1, \ldots, i_{w/2}\})$ and $x = f(y, \{i'_1, \ldots, i'_{w/2}\})$, it follows that $i_j = i'_j$ $\forall j$. Hence, we obtain the contradiction
$$ z_{i_{w/2}} = \sum_{i=1}^{i_1-1} y_i + \sum_{i=i_1}^{i_{w/2}} y_i = \sum_{i=1}^{i_1-1} x_i - \frac{w}{2} \le -\frac{w}{2} < z_{\min}, $$
where the first inequality follows from Lemma 1 (iii) and the second from the fact that $w > -2z_{\min}$.
(iv) If $w \in \{-n, -n+2, \ldots, -2z_{\max}-2\}$, then there is no $x \in S_w$ for which $\psi(x) = y$, which can be shown in a similar way as (iii).
(v) If $w \in \{-2z_{\max}, -2z_{\max}+2, \ldots, -2\}$, then there exists an $x \in S_w$ such that $\psi(x) = y$, which can be shown in a similar way as (ii).
Define
$$ N(y) = z_{\max} - z_{\min} + 1, \qquad (17) $$
where $N(y)$ is called the balance span of $y$. Let $r(y)$ denote the number of distinct source words $x \in \{-1,1\}^n$ that map to $y \in S_0$, that is,
$$ r(y) = |\{x \in \{-1,1\}^n : y = \psi(x)\}|, \quad y \in S_0. \qquad (18) $$
Corollary 1: For all $y \in S_0$, it holds that $r(y) = N(y)$.
Proof: This result immediately follows from Theorem 2 by counting the number of $w$ for which $|\{x \in S_w : \psi(x) = y\}| = 1$.
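Corollary 1 is easily checked by exhaustive search for small $n$; the sketch below reuses the phi helper introduced earlier and implements $\psi$ of (8) as a helper psi (the name is ours).

from itertools import product
from collections import Counter

def psi(x):
    """The encoding map psi of (8)."""
    w = sum(x)
    if w == 0:
        return tuple(x)
    if w > 0:
        return tuple(phi(list(x)))
    return tuple(-yi for yi in phi([-xi for xi in x]))

n = 6
r = Counter(psi(x) for x in product((-1, 1), repeat=n))     # r(y) for every balanced y
for y, ry in r.items():
    z = [sum(y[:i + 1]) for i in range(n)]                   # running balances of y
    assert ry == max(z) - min(z) + 1                         # r(y) = N(y), Corollary 1
print("Corollary 1 verified for n =", n)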
A. Fixed-length (FL) tag scheme
The tag length of a scheme with a fixed-length tag depends on the maximum value of $r(y)$, and for a variable-length scheme it depends on the distribution of $r(y)$. We easily find that $2 \le r(y) \le n/2 + 1$. Note that the codeword, denoted by $y_1$, that starts with $n/2$ $-1$'s and ends with $n/2$ $+1$'s (and the $n-1$ circular shifts of $y_1$) has the largest number of source words that map onto it, namely the $n/2+1$ words, $x$, that start with $p$, $p = 0, 1, \ldots, n/2$, $-1$'s and end with $n-p$ $+1$'s.
The decoder must be able to distinguish between at most $n/2+1$ source words that map onto the received word, which makes it possible to reduce the tag length to $\log_2(n/2+1)$. To do so, the encoder first computes $y = \psi(x)$ using the encoding algorithm, see Figure 3, and subsequently it computes $r(y)$. Using $r(y)$, the value $w(x)$ is uniquely encoded into the $n/2+1$ possible tag values, so that the decoder can uniquely recover $w(x)$ from the tag and $y$. The redundancy of this scheme equals $\log_2(n/2 + 1)$.
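One way to realize such a tag, consistent with Theorem 2, is to index the admissible balances $w \in \{-2z_{\max}, -2z_{\max}+2, \ldots, -2z_{\min}\}$ of the computed codeword $y$; the sketch below follows this idea, reusing enc and dec from the earlier sketches, with the helper names make_tag and read_tag ours.

def make_tag(y, w):
    """Map w(x) to a tag value in {0, ..., N(y)-1}, with N(y) <= n/2 + 1."""
    zmax = max(sum(y[:i + 1]) for i in range(len(y)))
    return (w + 2 * zmax) // 2

def read_tag(y, t):
    """Recover w(x) from the received balanced word y and the tag t."""
    zmax = max(sum(y[:i + 1]) for i in range(len(y)))
    return 2 * t - 2 * zmax

x = [1, 1, 1, -1, -1, 1, -1, 1, 1, 1]
y, w = enc(x)
t = make_tag(y, w)                                 # t fits in ceil(log2(n/2 + 1)) bits
print(t, read_tag(y, t) == w, dec(y, read_tag(y, t)) == x)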
B. Variable-length (VL) tag scheme
The average redundancy of a VL tag scheme is less than that of the above fixed-length tag scheme. As the distribution of $r(y)$ is the same as that of Knuth's code, we follow [22] for the computation of the redundancy of the VL tag scheme. The number of balanced words $y$ of length $n$ with $r(y) = u$, denoted by $P(u,n)$, $2 \le u \le n/2+1$, is given by [22]
$$ P(u,n) = D(u,n) - 2D(u-1,n) + D(u-2,n), \qquad (19) $$
where
$$ D(u,n) = 2^n \sum_{i=1}^{u} \cos^n \frac{\pi i}{u+1}. \qquad (20) $$
The above expression is surprising as $D(u,n)$ is integer valued. Using a result by Merca [27] we may translate (20) into a summation of binomial coefficients
$$ D(u,n) = (u+1) \sum_{k=-v}^{v} \binom{n}{n/2 + k(u+1)} - 2^n, \qquad (21) $$
where $v = \lfloor n/(2u+2) \rfloor$. The redundancy of the VL tag scheme, denoted by $H$, equals [22]
$$ H = 2^{-n} \sum_{u=2}^{n/2+1} u P(u,n) \log_2 u. \qquad (22) $$
The redundancy $H$ has been computed in [22, Table II] for selected values of $n \le 2^{13}$. For $n = 2^{13}$, we find $H - H_0 \approx 0.033$. Eq. (19) is ill-conditioned as $P(u,n)$ is the difference between two much larger quantities. We were not able to obtain results of (22) for asymptotically large $n$, see also [2].
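A sketch computing $H$ of (22): using the integer-valued binomial form (21) for $D(u,n)$, the ill-conditioned differences in (19) are taken between exact (arbitrary-precision) integers, so the evaluation is reliable for moderate $n$; the helper names follow the equation symbols.

from math import comb, log2

def D(u, n):
    """Eq. (21); an exact integer."""
    v = n // (2 * u + 2)
    return (u + 1) * sum(comb(n, n // 2 + k * (u + 1)) for k in range(-v, v + 1)) - 2 ** n

def P(u, n):
    return D(u, n) - 2 * D(u - 1, n) + D(u - 2, n)           # eq. (19)

def H(n):
    return sum(u * P(u, n) * log2(u) for u in range(2, n // 2 + 2)) / 2 ** n

n = 64
print(round(H(n), 3), round(n - log2(comb(n, n // 2)), 3))   # VL-tag redundancy vs. minimum H0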
V. PERFORMANCE COMPARISON
In this section, we discuss the number of modifications to a source word that are made by the prior art Knuth code [14] and the newly developed code. We start with the new method.
A. New method
The probability, denoted by $Pr_1(\ell)$, that $\ell = |w(x)|/2$, $0 \le \ell \le n/2$, symbols of $x$ are inverted to obtain $\psi(x)$ equals (assuming equiprobable source words)
$$ Pr_1(\ell) = \begin{cases} \frac{1}{2^n} \binom{n}{n/2}, & \ell = 0, \\ \frac{1}{2^{n-1}} \binom{n}{n/2+\ell}, & 1 \le \ell \le \frac{n}{2}. \end{cases} \qquad (23) $$
The average number of symbol inversions, denoted by $\bar{\ell}_1$, equals
$$ \bar{\ell}_1 = \sum_{\ell=1}^{n/2} \ell\, Pr_1(\ell). \qquad (24) $$
Fig. 5. Distributions $Pr_1(\ell)$ and $Pr_2(\ell)$ versus the (relative) number of symbol inversions $\ell/n$. Word length $n = 64$ and $n = 256$.
For large $n$, we obtain, by using the well-known Gaussian approximation to the binomial coefficients,
$$ \bar{\ell}_1 \approx \sqrt{\frac{n}{2\pi}}, \quad n \gg 1. \qquad (25) $$
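A quick numeric check of the exact average (24) against the approximation (25); a sketch assuming equiprobable source words, with the helper name ours.

from math import comb, sqrt, pi

def avg_inversions_new(n):
    """Exact average number of inversions of the new method, eqs. (23)-(24)."""
    return sum(l * comb(n, n // 2 + l) / 2 ** (n - 1) for l in range(1, n // 2 + 1))

for n in (64, 256, 1000):
    print(n, round(avg_inversions_new(n), 2), round(sqrt(n / (2 * pi)), 2))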
B. Knuth’s method
Knuth [14] presented a simple scheme for balancing large codewords. Let $x$ be the $n$-bit ($n$ even) source word of bipolar symbols, $x_i \in \{-1,1\}$. Knuth showed that there is a balancing index, $\ell$, such that
$$ -\sum_{i=1}^{\ell} x_i + \sum_{i=\ell+1}^{n} x_i = 0, \quad n \text{ even}. \qquad (26) $$
In other words, by inverting a first segment of $\ell$ symbols any word $x$ of even length can be balanced. Note that the balancing index $\ell$ is not unique [21]. We assume here that the encoder selects the smallest balancing index from the set of balancing indexes. The distribution of the number of symbol inversions, $\ell$, for obtaining the balanced word in Knuth's scheme, denoted by $Pr_2(\ell)$, $1 \le \ell \le n$ ($1 \le j \le n/2$), has been computed by Weber and Immink [21] (assuming equiprobable source words):
$$ Pr_2(2j) = Pr_2(2j-1) = \frac{n-2j+1}{n\, 2^{n-2}} \binom{2(j-1)}{j-1} \binom{n-2j}{\frac{n}{2}-j}. $$
The average number of symbol inversions of Knuth's scheme, denoted by $\bar{\ell}_2$, simply equals, see the Appendix,
$$ \bar{\ell}_2 = \sum_{\ell=1}^{n} \ell\, Pr_2(\ell) = \frac{n}{4} + 1. \qquad (27) $$
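A small numeric check of (27), evaluating the distribution $Pr_2$ of [21] in exact rational arithmetic; the helper name pr2 is ours.

from fractions import Fraction
from math import comb

def pr2(l, n):
    """Pr2(l) of Knuth's scheme, see (28); Pr2(2j) = Pr2(2j-1)."""
    j = (l + 1) // 2
    num = (n - 2 * j + 1) * comb(2 * (j - 1), j - 1) * comb(n - 2 * j, n // 2 - j)
    return Fraction(num, n * 2 ** (n - 2))

for n in (64, 256, 1000):
    avg = sum(l * pr2(l, n) for l in range(1, n + 1))
    print(n, float(avg), n / 4 + 1)                # the two values coincide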
C. Comparison of the two methods
Figure 5 shows two examples of the distributions $Pr_1(\ell)$ and $Pr_2(\ell)$ versus the relative number of symbol inversions $\ell/n$ for word lengths $n = 64$ and $n = 256$. We may notice that the distribution of Knuth's method, $Pr_2(\ell)$, is much wider than that of the new method, $Pr_1(\ell)$, which has a direct effect on the average number of inversions (bit changes) made. For example, for a codeword length $n = 1000$ around 12 symbol inversions are required on average per codeword for the new scheme. Knuth's code requires, on average, for the same codeword length, $n = 1000$, around 250 symbol inversions for translating source words into codewords.
VI. CONCLUSIONS
We have presented a novel method for efficiently translating arbitrary user data into balanced codewords. The new code is minimally modified as the number of symbol changes made to the source word for translating it into a balanced codeword is minimal. The encoder inverts, on average, approximately $\sqrt{n/(2\pi)}$, $n \gg 1$, symbols of the source word, where $n$ denotes the source word length; the other code symbols are equal to the source symbols. The redundancy of the new method using a fixed-length tag is $\log_2(n/2 + 1)$. Large look-up tables for encoding and decoding are avoided. The (time) complexity of the new balanced encoder and decoder grows linearly with the source word length $n$.
VII. APPENDIX
Let for $1 \le j \le \frac{n}{2}$
$$ Pr_2(2j) = Pr_2(2j-1) = \frac{n-2j+1}{n\, 2^{n-2}} \binom{2(j-1)}{j-1} \binom{n-2j}{\frac{n}{2}-j}. \qquad (28) $$
Theorem 3:
$$ \bar{\ell}_2 = \sum_{i=1}^{n} i\, Pr_2(i) = \frac{n}{4} + 1. \qquad (29) $$
Proof: We simply find, combining $Pr_2(2i)$ and $Pr_2(2i-1)$,
$$ \sum_{i=1}^{n} i\, Pr_2(i) = \sum_{i=1}^{n/2} (4i-1)\, Pr_2(2i). \qquad (30) $$
Since $Pr_2(i)$ is a probability mass function, we have
$$ \sum_{i=1}^{n/2} Pr_2(2i) = \frac{1}{2}, \qquad (31) $$
and we obtain
$$ \bar{\ell}_2 = 4 \sum_{i=1}^{n/2} i\, Pr_2(2i) - \frac{1}{2}. \qquad (32) $$
Define the moments
$$ m_k(n) = \sum_{j=1}^{n/2} j^k \binom{2(j-1)}{j-1} \binom{n-2j}{\frac{n}{2}-j}, \quad k = 0, 1, 2, \qquad (33) $$
then substituting into (32) yields
$$ \bar{\ell}_2 = \frac{4(n+1)}{n\, 2^{n-2}} m_1(n) - \frac{8}{n\, 2^{n-2}} m_2(n) - \frac{1}{2}. \qquad (34) $$
In the literature [24, pp. 187], we find
$$ m_0(n) = 2^{n-2}. \qquad (35) $$
As, see (28), (31), and (33),
$$ \frac{n+1}{n\, 2^{n-2}} m_0(n) - \frac{2}{n\, 2^{n-2}} m_1(n) = \frac{1}{2}, \qquad (36) $$
we obtain
$$ m_1(n) = 2^{n-4}(n+2). \qquad (37) $$
The zeroth and second moments, $m_0(n)$ and $m_2(n)$, are the autoconvolutions of the sequences $\binom{2i}{i}$ and $i\binom{2i}{i}$, $i = 1, 2, \ldots$, respectively. The generating function of the autoconvolution is obtained by squaring the original generating function, as presented in [24]. Due to space limitations, we omit the details, and summarize the result:
$$ m_2(n) = 2^{n-7}(3n^2 + 6n + 8). \qquad (38) $$
Substituting (37) and (38) into (34) proves the theorem.
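The closed forms (35), (37), and (38) are easily verified numerically; a small sketch (the identity (38) is kept in integers by scaling both sides by $2^7$):

from math import comb

def m(k, n):
    """Moment m_k(n) of (33)."""
    return sum(j ** k * comb(2 * (j - 1), j - 1) * comb(n - 2 * j, n // 2 - j)
               for j in range(1, n // 2 + 1))

for n in (8, 16, 64, 128):
    assert m(0, n) == 2 ** (n - 2)                             # (35)
    assert m(1, n) == (n + 2) * 2 ** (n - 4)                   # (37)
    assert 128 * m(2, n) == (3 * n * n + 6 * n + 8) * 2 ** n   # (38) scaled by 2^7
print("moment identities (35), (37), (38) verified")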
ACKNOWLEDGEMENT
The authors are indebted to Dr. A.J.E.M. (Guido) Janssen for his assistance with the proof of Theorem 3 given in the Appendix.
REFERENCES
[1] A. R. Calderbank, ”The art of signaling: fifty years of coding theory,”
IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2561-2595,
Oct. 1998, doi: 10.1109/18.720549.
[2] D. T. Dao, H. M. Kiah, and T. T. Nguyen, “Average Redundancy of Variable-Length Balancing Schemes à la Knuth,” arXiv:2204.13831, April 2022.
[3] O. P. Babalola and V. Balyan, “Efficient Channel Coding for Dimmable Visible Light Communications System,” IEEE Access, vol. 8, pp. 215100-215106, 2020, doi: 10.1109/ACCESS.2020.3041431.
[4] J. N. Franklin and J. R. Pierce, “Spectra and Efficiency of Binary Codes
without DC,” IEEE Transactions on Communications, vol. COM-20, no.
6, pp. 1182-1184, Dec. 1972, doi: 10.1109/TCOM.1972.1091308.
[5] F. Chang, W. Hu, D. Lee and C. Yu, “Design and implementation of anti
low-frequency noise in visible light communications,” 2017 International
Conference on Applied System Innovation (ICASI), Sapporo, 2017, pp.
1536-1538, doi: 10.1109/ICASI.2017.7988219.
[6] K. A. S. Immink and K. Cai, “Properties and Constructions of Con-
strained Codes for DNA-based Data Storage,” IEEE Access, vol. 8, pp.
49523-49531, 2020, doi: 10.1109/ACCESS.2020.2980036.
[7] A. X. Widmer and P. A. Franaszek, “A Dc-balanced, Partitioned-Block,
8B/10B Transmission Code,” IBM J. Res. Develop., vol. 27, no. 5, pp.
440-451, Sept. 1983, doi: 10.1147/rd.275.0440.
[8] C. N. Yang and D. J. Lee, “Some new efficient second-order spectral-
null codes with small lookup tables,” IEEE Transactions on Computers,
vol. 55, no. 7, pp. 924-927, July 2006, doi: 10.1109/TC.2006.111.
[9] T. M. Cover, “Enumerative Source Coding,” IEEE Transactions on
Information Theory, vol. IT-19, no. 1, pp. 73-77, Jan. 1973, doi:
10.1109/TIT.1973.1054929.
[10] V. Braun and K. A. S. Immink, “An Enumerative Coding Technique
for DC-free Runlength-Limited Sequences,” IEEE Transactions on
Communications, vol. 48, no. 12, pp. 2024-2031, Dec. 2000, doi:
10.1109/26.891213.
[11] J. P. M. Schalkwijk, “An Algorithm for Source Coding,” IEEE Transac-
tions on Information Theory, vol. IT-18, no. 3, pp. 395-399, May 1972,
doi: 10.1109/TIT.1972.1054832.
[12] Y. Xin and I. J. Fair, “Algorithms to Enumerate Codewords for DC2-
Constrained Channels,” IEEE Transactions on Information Theory, vol.
IT-47, no. 7, pp. 3020-3025, Nov. 2001, doi: 10.1109/18.959281.
[13] A. Hareedy, B. Dabak, and R. Calderbank, “The Secret Arithmetic of
Patterns: A General Method for Designing Constrained Codes Based
on Lexicographic Indexing,” IEEE Transactions on Information Theory,
2022. doi: 10.1109/TIT.2022.3170692.
[14] D. E. Knuth, “Efficient Balanced Codes,” IEEE Transactions on
Information Theory, vol. IT-32, no. 1, pp. 51-53, Jan. 1986, doi:
10.1109/TIT.1986.1057136
[15] H. D. L. Hollmann and K. A. S. Immink, “Performance of efficient
balanced codes,” IEEE Transactions on Information Theory, vol. IT-37,
no. 3, pp. 913-918, May 1991, doi: 10.1109/18.79961.
[16] F. Paluncic, B. T. Maharaj, and H. C. Ferreira, “Variable- and Fixed-Length Balanced Runlength-Limited Codes Based on a Knuth-Like Balancing Method,” IEEE Transactions on Information Theory, vol. IT-65, no. 11, pp. 7045-7066, Nov. 2019, doi: 10.1109/TIT.2019.2914205.
[17] S. Al-Bassam and B. Bose, “On Balanced Codes,” IEEE Transactions
on Information Theory, vol. IT-36, no. 2, pp. 406-408, March 1990, doi:
10.1109/18.52490.
[18] S. Al-Bassam and B. Bose, “Design of Efficient Balanced Codes,” IEEE
Transactions on Computers, vol. 43, pp. 362-365, March 1994, doi:
10.1109/12.272436.
[19] L. G. Tallini, R. M. Capocelli, and B. Bose, “Design of some new efficient balanced codes,” IEEE Transactions on Information Theory, vol. IT-42, no. 3, pp. 790-802, May 1996, doi: 10.1109/18.490545.
[20] L. G. Tallini and B. Bose, “Balanced codes with parallel encoding and
decoding,” IEEE Transactions on Computers, vol. 48, no. 8, pp. 794-
814, Aug. 1999, doi: 10.1109/12.795122.
[21] J. H. Weber and K. A. S. Immink, “Knuth’s Balanced Codes Revisited,”
IEEE Transactions on Information Theory, vol. IT-56, no. 4, pp. 1673-
1679, April 2010, doi: 10.1109/TIT.2010.2040868.
[22] K. A. S. Immink and J. H. Weber, “Very Efficient Balanced Codes,”
IEEE Journal on Selected Areas of Communications, vol. 28, no. 2, pp.
188-192, Feb. 2010, doi: 10.1109/JSAC.2010.100207.
[23] G. Raney, “Functional Composition Patterns and Power Series Rever-
sion,” Transactions of the American Mathematical Society, vol. 94. pp.
441-451, 1960.
[24] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete mathematics: A
Foundation for Computer Science (2nd Edition), Addison-Wesley, ISBN-
13: 978-0201558029, 2018.
[25] E. Ordentlich and R. M. Roth, “Low Complexity Two-
Dimensional Weight-Constrained Codes”, IEEE Transactions on
Information Theory, vol. 58, no. 6, pp. 3892-3899, June 2012, doi:
10.1109/TIT.2012.2190380.
[26] T. T. Nguyen, K. Cai, K. A. S. Immink and Y. M. Chee, “Efficient
Design of Capacity-Approaching Two-Dimensional Weight-Constrained
Codes,” 2021 IEEE International Symposium on Information Theory
(ISIT), pp. 2930-2935, 2021, doi: 10.1109/ISIT45174.2021.9517970.
[27] M. Merca, “A note on cosine power series,” Journal of Integer
Sequences, vol. 15, no. 5, Article 15.5.3, MR2942751, 2012.
Kees A. Schouhamer Immink (M’81-SM’86-F’90) founded
Turing Machines Inc. in 1998, an innovative start-up focused
on novel signal processing for DNA-based storage, where he
currently holds the position of president. He was from 1994
till 2014 an adjunct professor at the Institute for Experimental
Mathematics, Essen-Duisburg University, Germany.
He contributed to digital video, audio, and data recording
products including Compact Disc, CD-ROM, DCC, DVD,
and Blu-ray Disc. He received the 2017 IEEE Medal of
Honor, a Knighthood in 2000, a personal Emmy award in
2004, the 1999 AES Gold Medal, the 2004 SMPTE Progress
Medal, the 2014 Eduard Rhein Prize for Technology, and the
2015 IET Faraday Medal. He received the Golden Jubilee
Award for Technological Innovation by the IEEE Information
Theory Society in 1998. He was inducted into the Consumer
Electronics Hall of Fame, elected into the Royal Netherlands
Academy of Sciences and the (US) National Academy of
Engineering. He received an honorary doctorate from the
University of Johannesburg in 2014. He served the profession
as President of the Audio Engineering Society, Inc., New York, in 2003.
Jos H. Weber (S’87-M’90-SM’00) was born in Schiedam, The
Netherlands, in 1961. He received the M.Sc. (in mathematics,
with honors), Ph.D., and MBT (Master of Business Telecom-
munications) degrees from Delft University of Technology,
Delft, The Netherlands, in 1985, 1989, and 1996, respectively.
Since 1985 he has been with the Delft University of
Technology. Currently, he is an associate professor at the De-
partment of Applied Mathematics. He was the chairman of the
Werkgemeenschap voor Informatie- en Communicatietheorie
from 2006 until 2021. He has been the secretary of the IEEE Benelux Chapter on Information Theory since 2008. He was a visiting
researcher at the University of California (Davis, CA, USA),
the Tokyo Institute of Technology (Japan), the University of
Johannesburg (South Africa), EPFL (Switzerland), and SUTD
(Singapore). His main research interests are in the area of
channel coding.