
All content in this area was uploaded by Kees Schouhamer Immink on Aug 17, 2022



Minimally modified balanced codes

Kees A. Schouhamer Immink, Fellow, IEEE and Jos H. Weber, Senior Member, IEEE

Abstract—We present and analyze a new construction of bipolar balanced codes where each codeword contains equally many −1’s and +1’s. The new code is minimally modified, as the number of symbol changes made to the source word when translating it into a balanced codeword is as small as possible. The balanced codes feature low redundancy and low time complexity, and large look-up tables are avoided.

Keywords: balanced code, constrained code, error propagation, Raney’s Lemma.

I. INTRODUCTION

Let $x = (x_1, \ldots, x_n)$, $x_i \in \{-1, 1\}$, be a word of length $n$ with bipolar symbols. The balance of a word $x$, denoted by $w(x)$, is defined by $w(x) = \sum_{i=1}^{n} x_i$. A word is said to be balanced if $w(x) = 0$ ($n$ even), i.e., it consists of equal numbers of −1’s and +1’s. A code is said to be balanced if each word in the code is balanced. Balanced codes have found widespread application in various fields such as data transmission and data storage [1, 2, 3, 4, 5, 6]. Look-up tables for translating source words into balanced codewords and vice versa have been applied for small $n$ [7, 8]. Enumeration techniques [9, 10, 11, 12, 13] have been advocated for encoding and decoding balanced words, as they achieve the minimum redundancy possible. The complexity of enumerative coding, mainly the look-up tables of coefficients, grows with $n^2$, which makes it less practical if complexity is at a premium.

Knuth’s implementation of balanced codes [14, 15, 16] is attractive for encoding large source words, as its complexity scales linearly with the word length $n$, but it requires a redundancy of $\log_2 n$, which is, for large $n$, around twice the minimum redundancy of a code comprising the full set of balanced codewords. Modifications of Knuth’s generic scheme bridging the gap between the minimum redundancy and that of Knuth’s implementation are discussed in Al-Bassam and Bose [17, 18], Tallini, Capocelli, and Bose [19, 20], and Weber and Immink [21, 22].

Knuth’s balancing method is handsomely simple: a first segment of the source word is inverted, i.e., the signs of its symbols are flipped, to balance the word. In addition, a prefix (tag) that uniquely identifies the length of the inverted segment is forwarded to the receiver. A disadvantage of Knuth’s method is that the encoder inverts on average $n/4 + 1$ symbols, which may result in extreme error propagation when the tag is received in error. Implementations having low redundancy, complexity, and error propagation are welcome alternatives to the art.

Kees A. Schouhamer Immink is with Turing Machines Inc, Willemskade 15d, 3016 DK Rotterdam, The Netherlands. E-mail: immink@turing-machines.com.

Jos H. Weber is with Delft University of Technology, Delft, The Netherlands. E-mail: j.h.weber@tudelft.nl.

Our contributions: We present a novel method for efficiently translating arbitrary user data into balanced codewords, which is based on Raney’s Lemma, also known as the Cycle Lemma [23, 24, 25, 26]. As in Knuth’s construction, the encoder judiciously inverts a number of source symbols to obtain the codeword. The proposed code, however, is minimally modified, as the number of symbol inversions is minimal, which is an attractive virtue for reconstructing the source word when errors are made during transmission. The encoder inverts, on average, approximately $\sqrt{n/(2\pi)}$, $n \gg 1$, symbols of the source word. Thus, for example, for $n = 1000$ only around twelve symbol inversions are required on average (assuming equiprobable source words).

Information regarding the symbol modifications made to the source word is encoded into a small redundant tag appended to the codeword. We investigate fixed- and variable-length tag schemes. Tags of multiple codewords can be combined, thus reducing the overall redundancy. The redundancy of a fixed-length tag scheme equals $\log_2(n/2 + 1)$. The average redundancy of a variable-length tag scheme approaches the minimum possible for asymptotically large values of $n$. The (time) complexity of the new balanced encoder and decoder grows linearly with $n$.

We start in Section II with a description of Raney’s Lemma. In Section III, we detail the encoding and decoding algorithms. The code’s redundancy is discussed in Section IV, and a performance comparison is given in Section V. Section VI furnishes the conclusions of our paper.

II. RANEY’S LEMMA

We start with two definitions. The $n$ partial, or running, balances of the index $i$, $1 \le i \le n$, denoted by $s(i,k)$, are defined by

$$ s(i,k) = \sum_{j=i}^{i+k-1} x_j, \quad 1 \le k \le n, \tag{1} $$

where we extend the sequence $x$ by letting $x_{n+p} = x_p$ for $1 \le p \le n$. An index $i$ is said to be a minimal index of $x$ if and only if all the partial balances are positive, i.e.,

$$ s(i,k) > 0, \quad 1 \le k \le n. \tag{2} $$

In other words, an index $i$ is a minimal index of $x$ if and only if the partial balances of $x_i, \ldots, x_n, x_1, \ldots, x_{i-1}$ are all positive. Note that it is immediate from (2), taking $k = 1$ and $k = 2$, that if index $i$ is a minimal index, then $x_i = x_{i+1} = 1$.

Define the set of all minimal indexes of $x$ by $\sigma(x)$. If $w(x) > 0$, there are, according to Raney’s Lemma [23, 24], exactly $|\sigma(x)| = w(x)$ minimal indexes, where $|X|$ denotes the cardinality of a set $X$.


Fig. 1. Partial balance $s(1,k)$ versus $k$ for $n = 10$ and $x = (1,1,1,-1,-1,1,-1,1,1,1)$. The diagram shows $s(1,k)$ in the extended interval $1 \le k \le 2n$, obtained by letting $x_{n+p} = x_p$ for $1 \le p \le n$, which makes it more convenient to peruse the partial balances $s(i,k)$ for any $i$, as explained in the text. The minimal indexes of $x$ are the $k$-values of the points indicated by the circles.

Example 1: Let $n = 10$ and $x = (1,1,1,-1,-1,1,-1,1,1,1)$. There are $w(x) = 4$ minimal indexes. Figure 1 illustrates the partial balances $s(1,k)$ versus $k$. We can check that index $i = 1$ is a minimal index, since all partial balances $s(1,k)$ are positive. Further, note that $s(i,k) = s(1, k+i-1) - s(1, i-1)$. This implies that, for any $i$, $1 \le i \le n$, the $s(i,k)$ curve with $1 \le k \le n$ can be obtained from the curve in Figure 1 by considering it from $k = i$ up to $k = i+n-1$ and then shifting this segment $i-1$ units to the left and $s(1, i-1)$ units downwards. Hence it follows that the minimal index set is $\sigma(x) = \{1, 8, 9, 10\}$.
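Example 1 can be reproduced with a few lines of code. The Python sketch below is our own illustration and not the algorithm of [25, Fig. 6]; all function names are hypothetical. It uses the observation, which follows from (1) and (2), that index $i$ is minimal if and only if the running sum $s(1, i-1)$ lies strictly below every later running sum $s(1, j)$, $i \le j \le i+n-1$, of the cyclically extended word; with prefix and suffix minima this test costs $O(n)$ in total. A direct (slow) evaluation of (2) is included for cross-checking, together with Raney’s count $|\sigma(x)| = w(x)$.

```python
from itertools import product


def minimal_indexes(x):
    """Sketch: 1-based minimal indexes sigma(x) of a bipolar word x with w(x) > 0.

    Index i is minimal iff P[i-1] < P[j] for j = i, ..., i+n-1, where P holds the
    running sums of the cyclically extended word, so that P[n+m] = P[m] + w(x).
    """
    n, w = len(x), sum(x)
    if w <= 0:
        raise ValueError("this sketch assumes w(x) > 0")
    P = [0] * (n + 1)                      # P[j] = x_1 + ... + x_j, P[0] = 0
    for j in range(1, n + 1):
        P[j] = P[j - 1] + x[j - 1]
    inf = float("inf")
    suf = [inf] * (n + 2)                  # suf[i] = min(P[i], ..., P[n])
    for j in range(n, 0, -1):
        suf[j] = min(P[j], suf[j + 1])
    pre = [inf] * (n + 1)                  # pre[i] = min(P[1], ..., P[i]), pre[0] = inf
    for j in range(1, n + 1):
        pre[j] = min(P[j], pre[j - 1])
    # First test covers j = i..n; second covers the wrapped j = n+m, m = 1..i-1.
    return [i for i in range(1, n + 1)
            if P[i - 1] < suf[i] and P[i - 1] - w < pre[i - 1]]


def minimal_indexes_brute(x):
    """Direct (slow) evaluation of definition (2), used only for cross-checking."""
    n = len(x)
    ext = list(x) + list(x)                # x_{n+p} = x_p
    return [i for i in range(1, n + 1)
            if all(sum(ext[i - 1:i - 1 + k]) > 0 for k in range(1, n + 1))]


def cross_check(n_max):
    """Both versions agree and |sigma(x)| = w(x) for all words up to length n_max."""
    for n in range(1, n_max + 1):
        for x in product((-1, 1), repeat=n):
            if sum(x) > 0:
                s = minimal_indexes(x)
                if s != minimal_indexes_brute(x) or len(s) != sum(x):
                    return False
    return True


print(minimal_indexes([1, 1, 1, -1, -1, 1, -1, 1, 1, 1]))   # [1, 8, 9, 10], cf. Example 1
```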

III. RANEY’S LEMMA-BASED BALANCED CODES

A. Antipodal matchings

Ordentlich and Roth [25] pioneered antipodal matchings, which are based on Raney’s Lemma, for two-dimensional weight-constrained codes. Before showing their results, we define the function $y = f(x, S)$, where $S$ is a subset of $\{1, \ldots, n\}$, by

$$ y_i = \begin{cases} -x_i, & i \in S, \\ x_i, & i \notin S. \end{cases} \tag{3} $$

Ordentlich and Roth [25] showed that all $n$-bit input words $x$ of balance $w(x) > 0$ can be converted into $n$-bit output words $y$ of inverted balance $w(y) = -w(x)$ by

$$ y = f(x, \sigma(x)). \tag{4} $$

In other words, we simply obtain the entries $y_i$ of $y$ by inverting the +1’s at all minimal indexes of $x$ to −1’s. For the other indexes we simply have $y_i = x_i$. Ordentlich and Roth proved that the above antipodal matchings are bijective mappings, and they presented an efficient (linear-time complexity) algorithm for finding the set of minimal indexes $\sigma(x)$, $w(x) > 0$, for all word lengths $n$. They generalized the algorithm to words $x$ with $w(x) < 0$.
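The antipodal matching (3)-(4) can be illustrated with a small Python sketch. It finds $\sigma(x)$ by a direct (slow) evaluation of (2) instead of the linear-time algorithm of [25], and the function names are hypothetical. For small $n$ it checks exhaustively that $w(y) = -w(x)$ and that the map is one-to-one on the set of words with balance $w$, for each $w > 0$; because there are equally many words of balance $w$ and $-w$, this is consistent with the bijectivity proved in [25].

```python
from itertools import product
from math import comb


def minimal_indexes(x):
    """Direct evaluation of (2): 1-based i with s(i, k) > 0 for all 1 <= k <= n."""
    n = len(x)
    ext = list(x) + list(x)                # cyclic extension x_{n+p} = x_p
    return [i for i in range(1, n + 1)
            if all(sum(ext[i - 1:i - 1 + k]) > 0 for k in range(1, n + 1))]


def f(x, S):
    """Eq. (3): invert the symbols of x at the 1-based positions in S."""
    return tuple(-xi if i in S else xi for i, xi in enumerate(x, start=1))


def antipodal(x):
    """Eq. (4): y = f(x, sigma(x)) for a word with w(x) > 0."""
    return f(x, set(minimal_indexes(x)))


def check_antipodal(n):
    """Exhaustive check for length n: w(y) = -w(x) and injectivity for each w > 0."""
    images = {}
    for x in product((-1, 1), repeat=n):
        w = sum(x)
        if w > 0:
            y = antipodal(x)
            if sum(y) != -w:
                return False
            images.setdefault(w, set()).add(y)
    # There are C(n, (n + w)/2) words of balance w; all images distinct => injective.
    return all(len(ys) == comb(n, (n + w) // 2) for w, ys in images.items())


print(antipodal((1, 1, 1, -1, -1, 1, -1, 1, 1, 1)))
# (-1, 1, 1, -1, -1, 1, -1, -1, -1, -1): the +1's at indexes 1, 8, 9, 10 are inverted
```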

At first sight, the above algorithm looks superfluous, as merely reversing the sign of a word balance is obviously achieved by inverting all symbols of $x$. The algorithm based on Raney’s Lemma, however, has the advantage that a minimal number of symbols (+1’s only if $w(x) > 0$, or −1’s only if $w(x) < 0$) is inverted, which is a highly attractive feature for constructing two-dimensional weight-constrained codes, as shown in [25]. Below we show that Raney’s Lemma can be harnessed to balance codewords with a minimal number of symbol inversions.

B. Balanced codes

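Let $S_w$ denote the set of $n$-bit words $x$ whose balance equals $w(x) = w$, that is,

$$ S_w = \left\{ x \in \{-1,1\}^n : \sum_{i=1}^{n} x_i = w \right\}. \tag{5} $$

Note that $S_0$ is the set of balanced words.

Let the (minimal) indexes in $\sigma(x)$ be ordered in magnitude, that is, $\sigma(x) = \{i_1, i_2, \ldots, i_w\}$, where $i_1 < i_2 < \cdots < i_{w-1} < i_w$. For $w > 0$ we define the mapping $\varphi(\cdot)$ between $x \in S_w$ and the balanced $y = \varphi(x) \in S_0$, where

$$ y = \varphi(x) = f\left(x, \{i_1, i_2, \ldots, i_{w/2}\}\right). \tag{6} $$

As $n$ is even, $w$ is even, and since exactly $w/2$ symbols equal to +1 are inverted, clearly $w(\varphi(x)) = w - 2 \cdot w/2 = 0$.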

The following lemma shows some important properties, which will be used later. Essential parts have been presented in [25, Prop. 4.6].

Lemma 1: For $x \in \{-1,1\}^n$ with $w(x) = w > 0$ and $\sigma(x) = \{i_1, i_2, \ldots, i_w\}$ with $i_1 < i_2 < \cdots < i_{w-1} < i_w < i_{w+1} = i_1 + n$, it holds for all $1 \le j \le w$ and $i_j + 1 \le v \le i_{j+1} - 1$ that

(i) $\displaystyle \sum_{i=i_j+1}^{v} x_i \ge 0$,

(ii) $\displaystyle \sum_{i=i_j+1}^{i_{j+1}-1} x_i = 0$,

(iii) $\displaystyle \sum_{i=v}^{i_{j+1}-1} x_i \le 0$.

Proof: (i) Since $i_j$ is a minimal index of $x$, we have

$$ \sum_{i=i_j+1}^{v} x_i = \sum_{i=i_j}^{v} x_i - x_{i_j} \ge 1 - 1 = 0. \tag{7} $$

(ii) Note that, since $x_{i_j} = 1$ for all $j$,

$$ \sum_{j=1}^{w} \sum_{i=i_j+1}^{i_{j+1}-1} x_i = \sum_{j=1}^{w} \sum_{i=i_j}^{i_{j+1}-1} x_i - \sum_{j=1}^{w} x_{i_j} = \sum_{i=1}^{n} x_i - \sum_{j=1}^{w} x_{i_j} = w - w = 0. $$

Since

$$ \sum_{i=i_j+1}^{i_{j+1}-1} x_i \ge 0 \quad \forall j $$

because of (i), the result follows.

(iii) It follows from (i) and (ii) that

$$ \sum_{i=v}^{i_{j+1}-1} x_i = \sum_{i=i_j+1}^{i_{j+1}-1} x_i - \sum_{i=i_j+1}^{v-1} x_i \le 0 - 0 = 0. $$

Fig. 2. Partial balances $s(1,k)$ versus index $k$ of a) an arbitrary source word $x$ and b) the balanced codeword $y = \varphi(x)$, for $n = 100$ and word balance $w = w(x) = 16$. The partial balances at the minimal indexes of $x$ are indicated by a ’*’.

Note that this lemma deals with sums of symbols at positions in the strings between two (cyclically) consecutive minimal indexes. In particular, it says that the sum of the symbols in

(i) any head of such a string is nonnegative,

(ii) the complete string is equal to zero,

(iii) any tail of such a string is nonpositive.

As a visual illustration, we have plotted in Figure 2 the partial balances $s(1,k)$ of a) an arbitrary source word $x$ of length $n = 100$, $w(x) = 16$, and b) the balanced codeword $y = \varphi(x)$. The partial balances at the minimal indexes of $x$ are indicated by a ’*’. The various properties discussed in Lemma 1 can easily be noted. For example, note the unity balance increments between consecutive minimal indexes of the source word. The partial balances of the balanced codeword $y$ are also indicated by a ’*’ at the minimal indexes of the source word $x$. For the smallest $w/2$ minimal indexes of $x$ we note unity balance decrements between consecutive minimal indexes, while for the largest $w/2$ minimal indexes we note unity balance increments between consecutive minimal indexes.
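Lemma 1 can be spot-checked exhaustively for short words. The Python sketch below uses hypothetical helper names and a direct evaluation of (2); it walks over the cyclic strings strictly between consecutive minimal indexes and tests (i) nonnegative heads, (ii) a zero sum, and (iii) nonpositive tails. Together with $x_{i_j} = 1$, property (ii) is what produces the unity increments of $s(1,k)$ between consecutive minimal indexes seen in Figure 2.

```python
from itertools import product


def minimal_indexes(x):
    """Direct evaluation of (2): 1-based i with s(i, k) > 0 for all 1 <= k <= n."""
    n = len(x)
    ext = list(x) + list(x)
    return [i for i in range(1, n + 1)
            if all(sum(ext[i - 1:i - 1 + k]) > 0 for k in range(1, n + 1))]


def lemma1_holds(x):
    """Check Lemma 1 (i)-(iii) for one word x with w(x) > 0."""
    n, w = len(x), sum(x)
    ext = list(x) + list(x)                # 0-based: ext[m - 1] = x_m, cyclically
    idx = minimal_indexes(x)
    idx.append(idx[0] + n)                 # i_{w+1} = i_1 + n
    for j in range(w):
        a, b = idx[j], idx[j + 1]
        if ext[a - 1] != 1:                # x_{i_j} = 1 at every minimal index
            return False
        seg = ext[a:b - 1]                 # x_{i_j + 1}, ..., x_{i_{j+1} - 1}
        heads = [sum(seg[:t]) for t in range(1, len(seg) + 1)]
        tails = [sum(seg[t:]) for t in range(len(seg))]
        if any(h < 0 for h in heads):      # (i)
            return False
        if sum(seg) != 0:                  # (ii)
            return False
        if any(t > 0 for t in tails):      # (iii)
            return False
    return True


def check_lemma1(n_max):
    """Exhaustive check over all words of length <= n_max with positive balance."""
    return all(lemma1_holds(x)
               for n in range(1, n_max + 1)
               for x in product((-1, 1), repeat=n) if sum(x) > 0)
```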

C. Encoding

We propose the following encoding rule, denoted by $y = \psi(x)$, for translating an $n$-bit source word $x \in \{-1,1\}^n$ into a balanced $n$-bit codeword $y \in S_0$:

$$ y = \psi(x) = \begin{cases} \varphi(x), & w(x) > 0, \\ -\varphi(-x), & w(x) < 0, \\ x, & w(x) = 0, \end{cases} \tag{8} $$

where $-x$ denotes $(-x_1, -x_2, \ldots, -x_n)$. Figure 3 shows the basic encoding algorithm ENC(x).

Input: The bipolar n-bit word x = (x_1, ..., x_n), x_i ∈ {−1, 1}.
Output: Encoded n-bit balanced bipolar word y and tag w, i.e., ENC(x) = (y, w).
begin
  let w = Σ_{i=1}^{n} x_i
  if w = 0 set y = x, halt
  if w < 0 set x = −x   {invert all symbols}
  run Algorithm [25, Fig. 6] on x, yielding σ(x) = {i_1, ..., i_{|w|}}
  for i ∈ {i_1, ..., i_{|w|/2}} set x_i = −1
  if w > 0 set y = x
  if w < 0 set y = −x   {invert all symbols}
end.

Fig. 3. Basic encoding algorithm ENC(x).

Part of the encoding table for $n = 6$ is given in Table I; note that, for clerical convenience, a ‘0’ indicates a ‘−1’ symbol. As $\psi(-x) = -\psi(x)$, we can easily extend the table.

TABLE I
PART OF ENCODING TABLE y = ψ(x) FOR n = 6. A ‘0’ INDICATES ‘−1’.

x        ψ(x)     x        ψ(x)
000000   111000   001000   101100
000001   110001   001001   101001
000010   110010   001010   101010
000011   100011   001011   001011
000100   110100   001100   001110
000101   100101   001101   001101
000110   100110   001110   001110
000111   000111   001111   000111

Note that in the mapping $y = \psi(x)$ the $|w(x)|/2$ rightmost symbols of the codeword $y$ equal those of the source word $x$: for $w = w(x) > 0$, $y = \psi(x)$ implies $x_i = y_i$, $i = n - w/2 + 1, \ldots, n$. By definition of $y = \psi(x)$, see (6), (8), and (9), only the symbols at indexes in $\{i_1, i_2, \ldots, i_{w/2}\}$ are inverted. Since $\sigma(x)$ consists of $w$ distinct indexes in $\{1, \ldots, n\}$, its $w/2$ smallest elements satisfy $\{i_1, i_2, \ldots, i_{w/2}\} \subset \{1, \ldots, n - w/2\}$, so that the $w/2$ rightmost symbols of $x$ are unchanged; the case $w(x) < 0$ follows by symmetry. As $|w(x)|/2 \ge 1$ whenever $w(x) \neq 0$, and $y = x$ when $w(x) = 0$, we have $y_n = x_n$ for all $x$.
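The encoding rule (8) can be sketched in Python as follows. This is a minimal illustration under our own naming (`psi`, `from_bits`, `to_bits`, and `check_encoder` are hypothetical), and it finds the minimal indexes by a direct evaluation of (2), which is slower than the linear-time algorithm of [25, Fig. 6] used in Figure 3. The usage lines reproduce entries of Table I, and `check_encoder` tests the balance of $\psi(x)$, the symmetry $\psi(-x) = -\psi(x)$, and the unchanged $|w(x)|/2$ rightmost symbols.

```python
from itertools import product


def minimal_indexes(x):
    """Direct evaluation of (2): 1-based i with s(i, k) > 0 for all 1 <= k <= n."""
    n = len(x)
    ext = list(x) + list(x)
    return [i for i in range(1, n + 1)
            if all(sum(ext[i - 1:i - 1 + k]) > 0 for k in range(1, n + 1))]


def psi(x):
    """Encoder (8): return (y, w) with y balanced and w = w(x) as the tag."""
    x = tuple(x)
    w = sum(x)
    if w == 0:
        return x, 0
    v = x if w > 0 else tuple(-s for s in x)            # now w(v) = |w| > 0
    flip = set(minimal_indexes(v)[: abs(w) // 2])       # i_1, ..., i_{|w|/2}
    y = tuple(-s if i in flip else s for i, s in enumerate(v, start=1))
    return (y if w > 0 else tuple(-s for s in y)), w


def from_bits(s):
    """Table I notation: '1' stands for +1 and '0' for -1."""
    return tuple(1 if c == "1" else -1 for c in s)


def to_bits(y):
    return "".join("1" if s == 1 else "0" for s in y)


def check_encoder(n):
    """Balanced output, psi(-x) = -psi(x), unchanged rightmost |w|/2 symbols (n even)."""
    for x in product((-1, 1), repeat=n):
        y, w = psi(x)
        ny, _ = psi(tuple(-s for s in x))
        k = abs(w) // 2
        if sum(y) != 0 or ny != tuple(-s for s in y) or (k and y[-k:] != x[-k:]):
            return False
    return True


for src in ("000000", "001000", "001100", "001111"):
    print(src, to_bits(psi(from_bits(src))[0]))
# 000000 111000
# 001000 101100
# 001100 001110
# 001111 000111
```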

The receiver is able to uniquely recover $x$ from the received (balanced) word $y = \psi(x)$ if we add to $y$ a tag that uniquely identifies the balance of the source word $x$. A tag can be sent separately as a prefix or postfix, or we may combine multiple tags to form a large tag data word. The code redundancy is discussed in Section IV.

D. Decoding

The decoder uniquely retrieves a facsimile $x'$ of the original source word $x$ from the received (balanced) word $y = \psi(x)$ and the tag associated with the balance of the source word, $w(x)$. Figure 4 shows a description of the basic decoding algorithm. Note that the decoder (time) complexity grows linearly with the word length $n$. The next theorem shows that the decoding algorithm is correct, that is, DEC(ENC(x)) = x.

Input: The integer w = w(x) ∈ {−n, −n+2, ..., n} and the bipolar n-bit balanced word y = (y_1, ..., y_n) = ψ(x) ∈ S_0, y_i ∈ {−1, 1}.
Output: Decoded n-bit bipolar word DEC(y, w) = x′.
Initialize:
  if w = 0 set x′ = y, halt
  if w < 0 set v = −y   {invert all symbols}
  if w > 0 set v = y
  set w_2 = |w|/2
begin
  let z_i = Σ_{j=1}^{i} v_j, for all i = 1, ..., n
  let m = min{z_i}
  let i′_j = min{i : z_i = m + w_2 − j}, for all j = 1, ..., w_2
  let v′ = f(v, {i′_1, ..., i′_{w_2}})
  if w > 0 set x′ = v′
  if w < 0 set x′ = −v′   {invert all symbols}
end.

Fig. 4. Basic decoding algorithm DEC(y, w).

Theorem 1: For any $x \in \{-1,1\}^n$, it holds that DEC(ENC(x)) = x.

Proof: We show that the decoding algorithm shown in Figure 4, with input $(\psi(x), w(x))$, is correct and generates the original source word $x$ as output. From the encoding and decoding procedures, this is trivially true if $w(x) = 0$, while correctness of the $w(x) > 0$ case implies that it is also true for the $w(x) < 0$ case. Hence, we further assume that $w = w(x) > 0$. Note that

$$ v_i = \begin{cases} -x_i = -1 & \text{if } i \in \{i_1, \ldots, i_{w/2}\}, \\ x_i = 1 & \text{if } i \in \{i_{w/2+1}, \ldots, i_w\}, \\ x_i & \text{otherwise}. \end{cases} \tag{9} $$

Let $a$ be the sum of the first $i_1 - 1$ entries of $v$, i.e.,

$$ a = z_{i_1-1} = \sum_{i=1}^{i_1-1} v_i = \sum_{i=1}^{i_1-1} x_i. \tag{10} $$

It follows from Lemma 1 (ii) and (9) that

$$ z_{i_j} = a - j, \quad \forall j \in \{1, 2, \ldots, w/2\}. \tag{11} $$

Furthermore, we have

$$ z_i \ge \begin{cases} a - w/2 & \forall i \in \{i_{w/2}, \ldots, n\}, \\ a - j & \forall j \in \{1, \ldots, w/2 - 1\},\ i \in \{i_j, \ldots, i_{j+1} - 1\}, \\ a & \forall i \in \{1, \ldots, i_1 - 1\}, \end{cases} \tag{12} $$

where the first two inequalities follow from (9), (11), and Lemma 1 (i), while the third inequality follows from the fact that $z_i < a$ would imply with (9) and (10) that

$$ \sum_{j=i+1}^{i_1-1} x_j = \sum_{j=i+1}^{i_1-1} v_j = z_{i_1-1} - z_i > a - a = 0, $$

which contradicts Lemma 1 (iii). Hence, (11) and (12) give that $m = a - w/2$ and that, for any $j \in \{1, \ldots, w/2\}$, the smallest $i$ such that $z_i = m + w/2 - j = a - j$ is $i = i_j$, and thus that $i'_j = i_j$. In conclusion, the decoder output satisfies

$$ x' = v' = f\left(v, \{i'_1, \ldots, i'_{w/2}\}\right) = f\left(y, \{i_1, \ldots, i_{w/2}\}\right) = x. $$
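A Python sketch of the decoder of Figure 4 follows, together with an exhaustive round-trip test of Theorem 1 for small $n$. It is our own illustration with hypothetical names; the encoder is repeated (with a direct, not linear-time, evaluation of (2)) so that the block runs on its own. The decoder records the first index at which each running-sum level $z_i$ occurs, so all $i'_j$ are obtained from a single pass over $z$.

```python
from itertools import product


def minimal_indexes(x):
    """Direct evaluation of (2): 1-based i with s(i, k) > 0 for all 1 <= k <= n."""
    n = len(x)
    ext = list(x) + list(x)
    return [i for i in range(1, n + 1)
            if all(sum(ext[i - 1:i - 1 + k]) > 0 for k in range(1, n + 1))]


def neg(x):
    return tuple(-s for s in x)


def enc(x):
    """Encoder of Fig. 3 / eq. (8): returns (y, w)."""
    x = tuple(x)
    w = sum(x)
    if w == 0:
        return x, 0
    v = x if w > 0 else neg(x)
    flip = set(minimal_indexes(v)[: abs(w) // 2])
    y = tuple(-s if i in flip else s for i, s in enumerate(v, start=1))
    return (y if w > 0 else neg(y)), w


def dec(y, w):
    """Decoder of Fig. 4: recover x' from the balanced word y and the tag w."""
    y = tuple(y)
    if w == 0:
        return y
    v = y if w > 0 else neg(y)
    w2 = abs(w) // 2
    first, z = {}, 0
    for i, s in enumerate(v, start=1):     # z_i = v_1 + ... + v_i
        z += s
        first.setdefault(z, i)             # smallest i at which level z occurs
    m = min(first)                         # m = min{z_i}
    flip = {first[m + w2 - j] for j in range(1, w2 + 1)}   # i'_1, ..., i'_{w2}
    vp = tuple(-s if i in flip else s for i, s in enumerate(v, start=1))
    return vp if w > 0 else neg(vp)


def round_trip(n):
    """Theorem 1 for all x of (even) length n: DEC(ENC(x)) = x."""
    return all(dec(*enc(x)) == x for x in product((-1, 1), repeat=n))
```

For instance, `enc((-1, -1, 1, 1, 1, 1))` gives the codeword `(-1, -1, -1, 1, 1, 1)` with tag 2, and `dec` inverts the third symbol back, in line with the row 001111 of Table I.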

IV. REDUNDANCY

The number of balanced codewords of length $n$ equals

$$ |S_0| = \binom{n}{n/2}, \tag{13} $$

and thus the minimum redundancy of balanced codewords of length $n$, denoted by $H_0$, is

$$ H_0 = n - \log_2 |S_0| = n - \log_2 \binom{n}{n/2}. \tag{14} $$

For asymptotically large $n$ we have the approximation [14]

$$ H_0 \approx \frac{1}{2}\log_2 n + 0.326, \quad n \gg 1. \tag{15} $$

The redundancy of the new code is governed by the amount of data required to recover the balance $w(x)$ of the source word $x$. The balance $w(x)$ takes one of the $n+1$ values $\{-n, -n+2, \ldots, n-2, n\}$, so that for the simplest fixed-length tag scheme the redundancy is $\log_2(n+1)$. The next theorem will help to reduce the redundancy.
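To put these numbers side by side, the short Python sketch below (hypothetical function names) evaluates the exact minimum redundancy (14), the approximation (15), and the length $\log_2(n+1)$ of the simplest fixed-length tag. `math.comb` returns the exact binomial coefficient and `math.log2` accepts arbitrarily large integers, so (14) is evaluated without overflow.

```python
from math import comb, log2


def h0(n):
    """Minimum redundancy (14) of the full set of balanced words, n even."""
    return n - log2(comb(n, n // 2))


def h0_approx(n):
    """Large-n approximation (15)."""
    return 0.5 * log2(n) + 0.326


for n in (16, 64, 256, 1024, 8192):
    # exact H0, approximation (15), and the simplest fixed-length tag log2(n + 1)
    print(n, round(h0(n), 4), round(h0_approx(n), 4), round(log2(n + 1), 4))
```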

Theorem 2: Let $y \in S_0$, $z_i = \sum_{j=1}^{i} y_j$ for $i = 1, \ldots, n$, $z_{\min} = \min\{z_i\}$, and $z_{\max} = \max\{z_i\}$. Then it holds that

$$ |\{x \in S_w : \psi(x) = y\}| = \begin{cases} 1 & \text{if } w \in \{-2z_{\max}, -2z_{\max}+2, \ldots, -2z_{\min}\}, \\ 0 & \text{otherwise}. \end{cases} \tag{16} $$

Proof: From Theorem 1 it follows that the mapping ENC(x) from $\{-1,1\}^n$ to $S_0 \times \{-n, -n+2, \ldots, n\}$ is injective. Hence, for each $w \in \{-n, -n+2, \ldots, n\}$, there is at most one word $x \in S_w$ for which $\psi(x) = y$. In items (i)-(v) below we investigate for which values of $w$ such a word $x$ exists. Define $i'_j = \min\{i : z_i = z_{\min} + w/2 - j\}$, $j = 1, \ldots, w/2$, and observe that $y \in S_0$ implies $z_{\min} \le 0 \le z_{\max}$.

(i) If $w = 0$, then there does exist an $x \in S_w$ such that $\psi(x) = y$, namely $x = y$, which immediately follows from (8).

(ii) If $w \in \{2, 4, \ldots, -2z_{\min}\}$, then there does exist an $x \in S_w$ such that $\psi(x) = y$, namely $x = f(y, \{i'_1, \ldots, i'_{w/2}\})$. This can be checked as follows. Note that $z_{i'_j} = z_{\min} + w/2 - j < 0$ and $x_{i'_j} = -y_{i'_j} = 1$ for $j = 1, 2, \ldots, w/2$, while $x_i = y_i$ for all indexes $i \neq i'_j$. On the one hand, observe that any $i$ with $i'_j < i < i'_{j+1}$, $j \in \{0, 1, \ldots, w/2 - 1\}$, $i'_0 = 0$, is not a minimal index of $x$, since

$$ \sum_{m=i}^{i'_{j+1}-1} x_m = \sum_{m=i}^{i'_{j+1}-1} y_m = z_{i'_{j+1}-1} - z_{i-1} \le 0. $$

On the other hand, any $i'_j$, $j = 1, \ldots, w/2$, is a minimal index of $x$, since for all $k \in \{1, 2, \ldots, n\}$ it holds that

$$ \sum_{i=i'_j}^{i'_j+k-1} x_i \ge \sum_{i=i'_j}^{i'_j+k-1} y_i + 2b \ge -b + 2b = b \ge 1, $$

where $b = |\{m \in \{j, j+1, \ldots, w/2\} : i'_j \le i'_m \le i'_j + k - 1\}|$. In conclusion, $i'_1, \ldots, i'_{w/2}$ are the $w/2$ smallest minimal indexes of $x$, and thus $\psi(x) = f(x, \{i'_1, \ldots, i'_{w/2}\}) = y$.

(iii) If $w \in \{-2z_{\min}+2, -2z_{\min}+4, \ldots, n\}$, then there is no $x \in S_w$ for which $\psi(x) = y$, as we will show next. Suppose there does exist such an $x$. Let the $w/2$ smallest minimal indexes of $x$ be $i_1, \ldots, i_{w/2}$. Since $y = f(x, \{i_1, \ldots, i_{w/2}\})$ and $x = f(y, \{i'_1, \ldots, i'_{w/2}\})$, it follows that $i_j = i'_j$ for all $j$. Hence, we obtain the contradiction

$$ z_{i_{w/2}} = \sum_{i=1}^{i_1-1} y_i + \sum_{i=i_1}^{i_{w/2}} y_i = \sum_{i=1}^{i_1-1} x_i - \frac{w}{2} \le -\frac{w}{2} < z_{\min}, $$

where the first inequality follows from Lemma 1 (iii) and the second from the fact that $w > -2z_{\min}$.

(iv) If $w \in \{-n, -n+2, \ldots, -2z_{\max}-2\}$, then there is no $x \in S_w$ for which $\psi(x) = y$, which can be shown in a similar way as (iii).

(v) If $w \in \{-2z_{\max}, -2z_{\max}+2, \ldots, -2\}$, then there exists an $x \in S_w$ such that $\psi(x) = y$, which can be shown in a similar way as (ii).

Define

$$ N(y) = z_{\max} - z_{\min} + 1, \tag{17} $$

where $N(y)$ is called the balance span of $y$. Let $r(y)$ denote the number of distinct source words $x \in \{-1,1\}^n$ that map to $y \in S_0$, that is,

$$ r(y) = |\{x \in \{-1,1\}^n : y = \psi(x)\}|, \quad y \in S_0. \tag{18} $$

Corollary 1: For all $y \in S_0$, it holds that $r(y) = N(y)$.

Proof: This result immediately follows from Theorem 2 by counting the number of values of $w$ for which $|\{x \in S_w : \psi(x) = y\}| = 1$.
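Theorem 2 and Corollary 1 lend themselves to an exhaustive check for short words. The Python sketch below (hypothetical names; encoder as in (8), with a direct evaluation of (2)) groups all $2^n$ source words by their codeword and compares, for every $y \in S_0$, the list of source balances with $\{-2z_{\max}, \ldots, -2z_{\min}\}$ and the number of preimages with the balance span $N(y)$ of (17).

```python
from collections import defaultdict
from itertools import accumulate, product
from math import comb


def minimal_indexes(x):
    """Direct evaluation of (2): 1-based i with s(i, k) > 0 for all 1 <= k <= n."""
    n = len(x)
    ext = list(x) + list(x)
    return [i for i in range(1, n + 1)
            if all(sum(ext[i - 1:i - 1 + k]) > 0 for k in range(1, n + 1))]


def psi(x):
    """Encoder (8); returns the balanced codeword only."""
    x = tuple(x)
    w = sum(x)
    if w == 0:
        return x
    v = x if w > 0 else tuple(-s for s in x)
    flip = set(minimal_indexes(v)[: abs(w) // 2])
    y = tuple(-s if i in flip else s for i, s in enumerate(v, start=1))
    return y if w > 0 else tuple(-s for s in y)


def check_theorem2(n):
    """Exhaustive check of (16) and of r(y) = N(y) for even n."""
    balances = defaultdict(list)           # y -> balances w(x) of all x with psi(x) = y
    for x in product((-1, 1), repeat=n):
        balances[psi(x)].append(sum(x))
    if len(balances) != comb(n, n // 2):   # every balanced y is reached
        return False
    for y, ws in balances.items():
        z = list(accumulate(y))            # z_i = y_1 + ... + y_i, i = 1..n
        zmin, zmax = min(z), max(z)
        expected = list(range(-2 * zmax, -2 * zmin + 1, 2))      # Theorem 2
        if sorted(ws) != expected or len(ws) != zmax - zmin + 1:  # Corollary 1
            return False
    return True
```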

A. Fixed-length (FL) tag scheme

The tag length of a scheme with a fixed-length tag depends on the maximum value of $r(y)$, and for a variable-length scheme it depends on the distribution of $r(y)$. We easily find that $2 \le r(y) \le n/2 + 1$, since $z_1 = \pm 1$ and $z_n = 0$ give a span of at least one, while the span of a balanced word cannot exceed $n/2$. Note that the codeword denoted by $y_1$ that starts with $n/2$ −1’s and ends with $n/2$ +1’s (and the $n-1$ circular shifts of $y_1$) has the largest number of source words that map onto it, namely the $n/2 + 1$ words $x$ that start with $p$ −1’s, $p = 0, 1, \ldots, n/2$, and end with $n - p$ +1’s. The decoder must be able to distinguish between at most $n/2 + 1$ source words that map onto the received word, which makes it possible to reduce the tag length to $\log_2(n/2 + 1)$. To do so, the encoder first computes $y = \psi(x)$ using the encoding algorithm, see Figure 3, and subsequently computes $r(y)$. Using $r(y)$, the value $w(x)$ is uniquely encoded into the $n/2 + 1$ possible tag values, so that the decoder can uniquely recover $w(x)$ from the tag and $y$. The redundancy of this scheme equals $\log_2(n/2 + 1)$.
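The text leaves open how $w(x)$ is mapped onto the $n/2 + 1$ tag values. One possible assignment, which is our assumption and not necessarily the authors’ choice, follows directly from Theorem 2: since $w(x) \in \{-2z_{\max}, -2z_{\max}+2, \ldots, -2z_{\min}\}$, the tag $t = w(x)/2 + z_{\max}$ lies in $\{0, 1, \ldots, z_{\max} - z_{\min}\} \subseteq \{0, 1, \ldots, n/2\}$, and the decoder, which can compute $z_{\max}$ from $y$, recovers $w(x) = 2(t - z_{\max})$. A Python sketch with hypothetical names:

```python
from itertools import accumulate, product


def minimal_indexes(x):
    """Direct evaluation of (2): 1-based i with s(i, k) > 0 for all 1 <= k <= n."""
    n = len(x)
    ext = list(x) + list(x)
    return [i for i in range(1, n + 1)
            if all(sum(ext[i - 1:i - 1 + k]) > 0 for k in range(1, n + 1))]


def psi(x):
    """Encoder (8); returns the balanced codeword only."""
    x = tuple(x)
    w = sum(x)
    if w == 0:
        return x
    v = x if w > 0 else tuple(-s for s in x)
    flip = set(minimal_indexes(v)[: abs(w) // 2])
    y = tuple(-s if i in flip else s for i, s in enumerate(v, start=1))
    return y if w > 0 else tuple(-s for s in y)


def fl_tag(y, w):
    """Hypothetical fixed-length tag: t = w/2 + z_max, an integer in {0, ..., n/2}."""
    return w // 2 + max(accumulate(y))     # w is even, so w // 2 is exact


def balance_from_tag(y, t):
    """Receiver side of the same assumed mapping: w = 2 (t - z_max)."""
    return 2 * (t - max(accumulate(y)))


def check_fl_tags(n):
    """Every tag fits in n/2 + 1 values and the balance is recovered exactly."""
    for x in product((-1, 1), repeat=n):
        y, w = psi(x), sum(x)
        t = fl_tag(y, w)
        if not 0 <= t <= n // 2 or balance_from_tag(y, t) != w:
            return False
    return True
```

With this assignment the tag occupies $\lceil \log_2(n/2 + 1) \rceil$ bits when sent on its own; combining tags of several codewords, as mentioned in Section I, would bring the cost closer to $\log_2(n/2 + 1)$ per codeword.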

B. Variable-length (VL) tag scheme

The average redundancy of a VL tag scheme is less than that of the above fixed-length tag scheme. As the distribution of $r(y)$ is the same as that of Knuth’s code, we follow [22] for the computation of the redundancy of the VL tag scheme. The number of balanced words $y$ of length $n$ with $r(y) = u$, denoted by $P(u,n)$, $2 \le u \le n/2 + 1$, is given by [22]

$$ P(u,n) = D(u,n) - 2D(u-1,n) + D(u-2,n), \tag{19} $$

where

$$ D(u,n) = 2^n \sum_{i=1}^{u} \cos^n \frac{\pi i}{u+1}. \tag{20} $$

The above expression is surprising, as $D(u,n)$ is integer valued. Using a result by Merca [27], we may translate (20) into a summation of binomial coefficients

$$ D(u,n) = (u+1) \sum_{k=-v}^{v} \binom{n}{n/2 + k(u+1)} - 2^n, \tag{21} $$

where $v = \lfloor n/(2u+2) \rfloor$. The redundancy of the VL tag scheme, denoted by $H$, equals [22]

$$ H = 2^{-n} \sum_{u=2}^{n/2+1} u\,P(u,n) \log_2 u. \tag{22} $$

The redundancy $H$ has been computed in [22, Table II] for selected values of $n \le 2^{13}$. For $n = 2^{13}$, we find $H - H_0 \approx 0.033$. Eq. (19) is ill-conditioned, as $P(u,n)$ is the difference between two much larger quantities. We were not able to obtain results of (22) for asymptotically large $n$; see also [2].
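Equations (19), (21), and (22) can be evaluated in a few lines. The Python sketch below is our own illustration (function names are hypothetical). It keeps every $D(u,n)$ and $P(u,n)$ as an exact integer, so the cancellation in (19) causes no loss of precision here; only the final weights $u P(u,n)/2^n$ are converted to floating point, by exact integer division. A brute-force count of balanced words by their balance span $N(y)$ (Corollary 1) is included for cross-checking at small $n$.

```python
from itertools import accumulate, product
from math import comb, log2


def vl_distribution(n):
    """P(u, n), 2 <= u <= n/2 + 1, from (19) with D(u, n) taken from (21); n even."""
    row = [comb(n, k) for k in range(n + 1)]     # exact binomial row, computed once

    def D(u):
        v = n // (2 * u + 2)
        return (u + 1) * sum(row[n // 2 + k * (u + 1)] for k in range(-v, v + 1)) - 2 ** n

    Ds = [D(u) for u in range(n // 2 + 2)]
    return {u: Ds[u] - 2 * Ds[u - 1] + Ds[u - 2] for u in range(2, n // 2 + 2)}


def vl_redundancy(n):
    """Eq. (22); each weight u * P(u, n) / 2^n is formed by exact int/int division."""
    return sum((u * p / 2 ** n) * log2(u) for u, p in vl_distribution(n).items())


def span_counts(n):
    """Brute force: number of balanced y of length n with N(y) = u, cf. (17)."""
    counts = {}
    for y in product((-1, 1), repeat=n):
        if sum(y) == 0:
            z = list(accumulate(y))
            u = max(z) - min(z) + 1
            counts[u] = counts.get(u, 0) + 1
    return counts
```

Subtracting $H_0$ from (14) from `vl_redundancy(2 ** 13)` would give a figure that can be compared with the value of $H - H_0$ quoted above.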

V. PERFORMANCE COMPARISON

In this section, we discuss the number of modifications to a source word that are made by the prior-art Knuth code [14] and by the newly developed code. We start with the new method.

A. New method

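The probability, denoted by $\mathrm{Pr}_1(\ell)$, that $\ell = |w(x)|/2$, $0 \le \ell \le n/2$, symbols of $x$ are inverted to obtain $\psi(x)$ equals (assuming equiprobable source words)

$$ \mathrm{Pr}_1(\ell) = \begin{cases} \dfrac{1}{2^n} \dbinom{n}{n/2}, & \ell = 0, \\[2mm] \dfrac{1}{2^{n-1}} \dbinom{n}{n/2 + \ell}, & 1 \le \ell \le n/2. \end{cases} \tag{23} $$

The average number of symbol inversions, denoted by $\bar{\ell}_1$, equals

$$ \bar{\ell}_1 = \sum_{\ell=1}^{n/2} \ell\, \mathrm{Pr}_1(\ell). \tag{24} $$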


Fig. 5. Distributions $\mathrm{Pr}_1(\ell)$ and $\mathrm{Pr}_2(\ell)$ versus the (relative) number of symbol inversions $\ell/n$, for word lengths $n = 64$ and $n = 256$ (logarithmic vertical axis).

For large $n$, we obtain, by using the well-known Gaussian approximation to the binomial coefficients,

$$ \bar{\ell}_1 \approx \sqrt{\frac{n}{2\pi}}, \quad n \gg 1. \tag{25} $$
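The quality of approximation (25) is easy to inspect numerically. The Python sketch below (hypothetical function names) evaluates (23) and (24) with exact integer binomials and one final division, and compares the result with $\sqrt{n/(2\pi)}$; for $n = 1000$ both values are close to 12.6.

```python
from math import comb, pi, sqrt


def pr1(l, n):
    """Eq. (23): probability that l = |w(x)|/2 symbols are inverted (n even)."""
    if l == 0:
        return comb(n, n // 2) / 2 ** n
    return comb(n, n // 2 + l) / 2 ** (n - 1)


def mean_inversions_new(n):
    """Eq. (24), summed in exact integers before a single division."""
    return sum(l * comb(n, n // 2 + l) for l in range(1, n // 2 + 1)) / 2 ** (n - 1)


for n in (64, 256, 1000):
    print(n, round(mean_inversions_new(n), 3), round(sqrt(n / (2 * pi)), 3))
```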

B. Knuth’s method

Knuth [14] presented a simple scheme for balancing large codewords. Let $x$ be the $n$-bit ($n$ even) source word of bipolar symbols, $x_i \in \{-1,1\}$. Knuth showed that there is a balancing index $\ell$ such that

$$ -\sum_{i=1}^{\ell} x_i + \sum_{i=\ell+1}^{n} x_i = 0, \quad n \text{ even}. \tag{26} $$

In other words, by inverting a first segment of $\ell$ symbols, any word $x$ of even length can be balanced. Note that the balancing index $\ell$ is not unique [21]. We assume here that the encoder selects the smallest balancing index from the set of balancing indexes. The distribution of the number of symbol inversions $\ell$, $1 \le \ell \le n$, needed to obtain the balanced word in Knuth’s scheme, denoted by $\mathrm{Pr}_2(\ell)$, has been computed by Weber and Immink [21] (assuming equiprobable source words):

$$ \mathrm{Pr}_2(2j) = \mathrm{Pr}_2(2j-1) = \frac{n-2j+1}{n\,2^{n-2}} \binom{2(j-1)}{j-1} \binom{n-2j}{n/2-j}, \quad 1 \le j \le n/2. $$

The average number of symbol inversions of Knuth’s scheme, denoted by $\bar{\ell}_2$, simply equals (see the Appendix)

$$ \bar{\ell}_2 = \sum_{\ell=1}^{n} \ell\, \mathrm{Pr}_2(\ell) = \frac{n}{4} + 1. \tag{27} $$
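The distribution $\mathrm{Pr}_2(\ell)$ and the mean (27) can be confirmed for small $n$ by enumerating all source words. In the Python sketch below (hypothetical names), the smallest balancing index $\ell \ge 1$ is the first position at which the running sum equals $w(x)/2$, because inverting $x_1, \ldots, x_\ell$ changes the balance to $w(x) - 2\sum_{i=1}^{\ell} x_i$. Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison with the closed form is exact.

```python
from fractions import Fraction
from itertools import product
from math import comb


def pr2(l, n):
    """Closed-form distribution of the smallest balancing index l, 1 <= l <= n."""
    j = (l + 1) // 2                                   # Pr2(2j) = Pr2(2j - 1)
    num = (n - 2 * j + 1) * comb(2 * (j - 1), j - 1) * comb(n - 2 * j, n // 2 - j)
    return Fraction(num, n * 2 ** (n - 2))


def smallest_balancing_index(x):
    """Smallest l >= 1 for which inverting x_1, ..., x_l satisfies (26)."""
    w, prefix = sum(x), 0
    for l, s in enumerate(x, start=1):
        prefix += s
        if 2 * prefix == w:                            # -prefix + (w - prefix) = 0
            return l
    raise ValueError("no balancing index; is n even?")


def empirical_pr2(n):
    """Exact distribution over all 2^n equiprobable source words, l = 1..n."""
    counts = [0] * (n + 1)
    for x in product((-1, 1), repeat=n):
        counts[smallest_balancing_index(x)] += 1
    return [Fraction(c, 2 ** n) for c in counts[1:]]


def mean_inversions_knuth(n):
    """Eq. (27), evaluated exactly from the closed-form distribution."""
    return sum(l * pr2(l, n) for l in range(1, n + 1))
```

For $n = 1000$, `mean_inversions_knuth(1000)` evaluates to exactly 251, the value $n/4 + 1$ behind the figure of around 250 inversions quoted in Section V-C.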

C. Comparison of the two methods

Figure 5 shows two examples of the distributions $\mathrm{Pr}_1(\ell)$ and $\mathrm{Pr}_2(\ell)$ versus the relative number of symbol inversions $\ell/n$ for word lengths $n = 64$ and $n = 256$. We may notice that the distribution of Knuth’s method, $\mathrm{Pr}_2(\ell)$, is much wider than that of the new method, $\mathrm{Pr}_1(\ell)$, which has a direct effect on the average number of inversions (bit changes) made. For example, for a codeword length $n = 1000$, around 12 symbol inversions are required on average per codeword for the new scheme. Knuth’s code requires, on average, for the same codeword length $n = 1000$, around 250 symbol inversions for translating source words into codewords.

VI. CONCLUSIONS

We have presented a novel method for efficiently translating arbitrary user data into balanced codewords. The new code is minimally modified, as the number of symbol changes made to the source word for translating it into a balanced codeword is minimal. The encoder inverts, on average, approximately $\sqrt{n/(2\pi)}$, $n \gg 1$, symbols of the source word, where $n$ denotes the source word length; the other code symbols are equal to the source symbols. The redundancy of the new method using a fixed-length tag is $\log_2(n/2 + 1)$. Large look-up tables for encoding and decoding are avoided. The (time) complexity of the new balanced encoder and decoder grows linearly with the source word length $n$ for asymptotically large values of $n$.

VII. APPENDIX

Let, for $1 \le j \le n/2$,

$$ \mathrm{Pr}_2(2j) = \mathrm{Pr}_2(2j-1) = \frac{n-2j+1}{n\,2^{n-2}} \binom{2(j-1)}{j-1} \binom{n-2j}{n/2-j}. \tag{28} $$

Theorem 3:

$$ \bar{\ell}_2 = \sum_{i=1}^{n} i\, \mathrm{Pr}_2(i) = \frac{n}{4} + 1. \tag{29} $$

Proof: Combining $\mathrm{Pr}_2(2i)$ and $\mathrm{Pr}_2(2i-1)$, we simply find

$$ \sum_{i=1}^{n} i\, \mathrm{Pr}_2(i) = \sum_{i=1}^{n/2} (4i-1)\, \mathrm{Pr}_2(2i). \tag{30} $$

Since $\mathrm{Pr}_2(i)$ is a probability mass function, we have

$$ \sum_{i=1}^{n/2} \mathrm{Pr}_2(2i) = \frac{1}{2}, \tag{31} $$

and we obtain

$$ \bar{\ell}_2 = 4 \sum_{i=1}^{n/2} i\, \mathrm{Pr}_2(2i) - \frac{1}{2}. \tag{32} $$

Define the moments

$$ m_k(n) = \sum_{j=1}^{n/2} j^k \binom{2(j-1)}{j-1} \binom{n-2j}{n/2-j}, \quad k = 0, 1, 2; \tag{33} $$

then substituting into (32) yields

$$ \bar{\ell}_2 = \frac{4(n+1)}{n\,2^{n-2}}\, m_1(n) - \frac{8}{n\,2^{n-2}}\, m_2(n) - \frac{1}{2}. \tag{34} $$

In the literature [24, p. 187], we find

$$ m_0(n) = 2^{n-2}. \tag{35} $$

As, see (28), (31), and (33),

$$ \frac{n+1}{n\,2^{n-2}}\, m_0(n) - \frac{2}{n\,2^{n-2}}\, m_1(n) = \frac{1}{2}, \tag{36} $$

we obtain

$$ m_1(n) = 2^{n-4}(n+2). \tag{37} $$

The zeroth and second moments, $m_0(n)$ and $m_2(n)$, are the autoconvolution of the sequences $\binom{2i}{i}$ and $i\binom{2i}{i}$, $i = 1, 2, \ldots$, respectively. The generating function of the autoconvolution is obtained by squaring the original generating function, as presented in [24]. Due to space limitations, we omit the details and summarize the result:

$$ m_2(n) = 2^{n-7}(3n^2 + 6n + 8). \tag{38} $$

Substituting (37) and (38) into (34) proves the theorem.
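The closed forms (35), (37), and (38), and the result of Theorem 3 obtained through (34), can be confirmed for a range of even $n$ with exact arithmetic. The Python sketch below is a numerical sanity check under our own (hypothetical) naming, not a substitute for the proof; (37) and (38) are compared after multiplying out the powers of two, so that no fractional powers arise for small $n$.

```python
from fractions import Fraction
from math import comb


def m(k, n):
    """Moment m_k(n) of (33), n even."""
    return sum(j ** k * comb(2 * (j - 1), j - 1) * comb(n - 2 * j, n // 2 - j)
               for j in range(1, n // 2 + 1))


def check_appendix(n):
    """Check (35), (37), (38) and evaluate (34) exactly for one even n."""
    ok35 = m(0, n) == 2 ** (n - 2)
    ok37 = 16 * m(1, n) == 2 ** n * (n + 2)                       # m_1 = 2^(n-4)(n+2)
    ok38 = 128 * m(2, n) == 2 ** n * (3 * n * n + 6 * n + 8)      # m_2 = 2^(n-7)(...)
    scale = n * 2 ** (n - 2)
    l2 = (Fraction(4 * (n + 1), scale) * m(1, n)
          - Fraction(8, scale) * m(2, n) - Fraction(1, 2))        # eq. (34)
    return ok35 and ok37 and ok38 and l2 == Fraction(n, 4) + 1    # Theorem 3
```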

ACKNOWLEDGEMENT

The authors are indebted to Dr. A.J.E.M. (Guido) Janssen for his assistance with the proof of Theorem 3 given in the Appendix.

REFERENCES

[1] A. R. Calderbank, “The art of signaling: fifty years of coding theory,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2561-2595, Oct. 1998, doi: 10.1109/18.720549.

[2] D. T. Dao, H. M. Kiah, and T. T. Nguyen, “Average Redundancy of Variable-Length Balancing Schemes à la Knuth,” arXiv:2204.13831, April 2022.

[3] O. P. Babalola and V. Balyan, “Efficient Channel Coding for Dimmable Visible Light Communications System,” IEEE Access, vol. 8, pp. 215100-215106, 2020, doi: 10.1109/ACCESS.2020.3041431.

[4] J. N. Franklin and J. R. Pierce, “Spectra and Efficiency of Binary Codes without DC,” IEEE Transactions on Communications, vol. COM-20, no. 6, pp. 1182-1184, Dec. 1972, doi: 10.1109/TCOM.1972.1091308.

[5] F. Chang, W. Hu, D. Lee, and C. Yu, “Design and implementation of anti low-frequency noise in visible light communications,” 2017 International Conference on Applied System Innovation (ICASI), Sapporo, 2017, pp. 1536-1538, doi: 10.1109/ICASI.2017.7988219.

[6] K. A. S. Immink and K. Cai, “Properties and Constructions of Constrained Codes for DNA-based Data Storage,” IEEE Access, vol. 8, pp. 49523-49531, 2020, doi: 10.1109/ACCESS.2020.2980036.

[7] A. X. Widmer and P. A. Franaszek, “A DC-Balanced, Partitioned-Block, 8B/10B Transmission Code,” IBM J. Res. Develop., vol. 27, no. 5, pp. 440-451, Sept. 1983, doi: 10.1147/rd.275.0440.

[8] C. N. Yang and D. J. Lee, “Some new efficient second-order spectral-null codes with small lookup tables,” IEEE Transactions on Computers, vol. 55, no. 7, pp. 924-927, July 2006, doi: 10.1109/TC.2006.111.

[9] T. M. Cover, “Enumerative Source Coding,” IEEE Transactions on Information Theory, vol. IT-19, no. 1, pp. 73-77, Jan. 1973, doi: 10.1109/TIT.1973.1054929.

[10] V. Braun and K. A. S. Immink, “An Enumerative Coding Technique for DC-free Runlength-Limited Sequences,” IEEE Transactions on Communications, vol. 48, no. 12, pp. 2024-2031, Dec. 2000, doi: 10.1109/26.891213.

[11] J. P. M. Schalkwijk, “An Algorithm for Source Coding,” IEEE Transactions on Information Theory, vol. IT-18, no. 3, pp. 395-399, May 1972, doi: 10.1109/TIT.1972.1054832.

[12] Y. Xin and I. J. Fair, “Algorithms to Enumerate Codewords for DC2-Constrained Channels,” IEEE Transactions on Information Theory, vol. IT-47, no. 7, pp. 3020-3025, Nov. 2001, doi: 10.1109/18.959281.

[13] A. Hareedy, B. Dabak, and R. Calderbank, “The Secret Arithmetic of Patterns: A General Method for Designing Constrained Codes Based on Lexicographic Indexing,” IEEE Transactions on Information Theory, 2022, doi: 10.1109/TIT.2022.3170692.

[14] D. E. Knuth, “Efficient Balanced Codes,” IEEE Transactions on Information Theory, vol. IT-32, no. 1, pp. 51-53, Jan. 1986, doi: 10.1109/TIT.1986.1057136.

[15] H. D. L. Hollmann and K. A. S. Immink, “Performance of efficient balanced codes,” IEEE Transactions on Information Theory, vol. IT-37, no. 3, pp. 913-918, May 1991, doi: 10.1109/18.79961.

[16] F. Paluncic, B. T. Maharaj, and H. C. Ferreira, “Variable- and Fixed-Length Balanced Runlength-Limited Codes Based on a Knuth-Like Balancing Method,” IEEE Transactions on Information Theory, vol. IT-65, no. 11, pp. 7045-7066, Nov. 2019, doi: 10.1109/TIT.2019.2914205.

[17] S. Al-Bassam and B. Bose, “On Balanced Codes,” IEEE Transactions on Information Theory, vol. IT-36, no. 2, pp. 406-408, March 1990, doi: 10.1109/18.52490.

[18] S. Al-Bassam and B. Bose, “Design of Efficient Balanced Codes,” IEEE Transactions on Computers, vol. 43, pp. 362-365, March 1994, doi: 10.1109/12.272436.

[19] L. G. Tallini, R. M. Capocelli, and B. Bose, “Design of some new efficient balanced codes,” IEEE Transactions on Information Theory, vol. IT-42, no. 3, pp. 790-802, May 1996, doi: 10.1109/18.490545.

[20] L. G. Tallini and B. Bose, “Balanced codes with parallel encoding and decoding,” IEEE Transactions on Computers, vol. 48, no. 8, pp. 794-814, Aug. 1999, doi: 10.1109/12.795122.

[21] J. H. Weber and K. A. S. Immink, “Knuth’s Balanced Codes Revisited,” IEEE Transactions on Information Theory, vol. IT-56, no. 4, pp. 1673-1679, April 2010, doi: 10.1109/TIT.2010.2040868.

[22] K. A. S. Immink and J. H. Weber, “Very Efficient Balanced Codes,” IEEE Journal on Selected Areas in Communications, vol. 28, no. 2, pp. 188-192, Feb. 2010, doi: 10.1109/JSAC.2010.100207.

[23] G. Raney, “Functional Composition Patterns and Power Series Reversion,” Transactions of the American Mathematical Society, vol. 94, pp. 441-451, 1960.

[24] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science (2nd Edition), Addison-Wesley, ISBN-13: 978-0201558029, 2018.

[25] E. Ordentlich and R. M. Roth, “Low Complexity Two-Dimensional Weight-Constrained Codes,” IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3892-3899, June 2012, doi: 10.1109/TIT.2012.2190380.

[26] T. T. Nguyen, K. Cai, K. A. S. Immink, and Y. M. Chee, “Efficient Design of Capacity-Approaching Two-Dimensional Weight-Constrained Codes,” 2021 IEEE International Symposium on Information Theory (ISIT), pp. 2930-2935, 2021, doi: 10.1109/ISIT45174.2021.9517970.

[27] M. Merca, “A note on cosine power series,” Journal of Integer Sequences, vol. 15, no. 5, Article 15.5.3, MR2942751, 2012.

Kees A. Schouhamer Immink (M’81-SM’86-F’90) founded Turing Machines Inc. in 1998, an innovative start-up focused on novel signal processing for DNA-based storage, where he currently holds the position of president. From 1994 till 2014 he was an adjunct professor at the Institute for Experimental Mathematics, Essen-Duisburg University, Germany.

He contributed to digital video, audio, and data recording products including Compact Disc, CD-ROM, DCC, DVD, and Blu-ray Disc. He received the 2017 IEEE Medal of Honor, a Knighthood in 2000, a personal Emmy award in 2004, the 1999 AES Gold Medal, the 2004 SMPTE Progress Medal, the 2014 Eduard Rhein Prize for Technology, and the 2015 IET Faraday Medal. He received the Golden Jubilee Award for Technological Innovation by the IEEE Information Theory Society in 1998. He was inducted into the Consumer Electronics Hall of Fame, and elected into the Royal Netherlands Academy of Sciences and the (US) National Academy of Engineering. He received an honorary doctorate from the University of Johannesburg in 2014. He served the profession as President of the Audio Engineering Society Inc., New York, in 2003.

Jos H. Weber (S’87-M’90-SM’00) was born in Schiedam, The Netherlands, in 1961. He received the M.Sc. (in mathematics, with honors), Ph.D., and MBT (Master of Business Telecommunications) degrees from Delft University of Technology, Delft, The Netherlands, in 1985, 1989, and 1996, respectively.

Since 1985 he has been with the Delft University of Technology. Currently, he is an associate professor at the Department of Applied Mathematics. He was the chairman of the Werkgemeenschap voor Informatie- en Communicatietheorie from 2006 until 2021. He has been the secretary of the IEEE Benelux Chapter on Information Theory since 2008. He was a visiting researcher at the University of California (Davis, CA, USA), the Tokyo Institute of Technology (Japan), the University of Johannesburg (South Africa), EPFL (Switzerland), and SUTD (Singapore). His main research interests are in the area of channel coding.