
0018-9448 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIT.2017.2767034, IEEE

Transactions on Information Theory


Prefixless q-ary Balanced Codes with Fast Syndrome-based Error Correction

Theo G. Swart, Senior Member, IEEE, Jos H. Weber, Senior Member, IEEE, and Kees A. Schouhamer Immink, Fellow, IEEE

Abstract—We investigate a Knuth-like scheme for balancing q-ary codewords, which has the virtue that look-up tables for coding and decoding the prefix are avoided by using precoding and error correction techniques. We show how the scheme can be extended to allow for correction of single channel errors using a fast decoding algorithm that depends on syndromes only, making it considerably faster than the prior art exhaustive decoding strategy. A comparison between the new and prior art schemes, in terms of both redundancy and error performance, completes the study.

Index Terms—Balanced code, constrained code, error correction, Knuth code, running digital sum.

I. INTRODUCTION

BALANCED, sometimes called dc-free, q-ary sequences have found widespread application in popular optical recording devices such as CD, DVD, and Blu-Ray [1], cable communication [2], and recently in non-volatile (flash) memories [3]. A sequence of symbols is said to be balanced if the sum of the symbols equals a prescribed balancing value. The study of simple and efficient methods for translating arbitrary source sequences into balanced q-ary sequences has been an active field of research. If the sequences are not too long, look-up translation tables can be used. For handling extremely long binary (q = 2) blocks, where look-up tables are impractically large, Knuth [4] devised two simple algorithms for generating binary balanced codewords, namely a parallel algorithm and a serial algorithm.

In the parallel algorithm, the encoder splits the user word into two segments: the first consisting of the first v bits of the user word, and the second consisting of the remaining m − v bits. The encoder inverts the first segment by adding (modulo 2) a '1' to the v symbols in the first segment. The index v is chosen in such a way that the modified word is balanced. Knuth showed that such an index v can always

This paper was presented in part at the IEEE Information Theory Workshop, Seville, Spain, September 2013.

T. G. Swart is with the Department of Electrical and Electronic Engineering Science, University of Johannesburg, Auckland Park, 2006, South Africa (e-mail: tgswart@uj.ac.za). J. H. Weber is with the Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 Delft, The Netherlands, and a visiting professor with the Department of Electrical and Electronic Engineering Science, University of Johannesburg, Auckland Park, 2006, South Africa (e-mail: j.h.weber@tudelft.nl). K. A. Schouhamer Immink is with Turing Machines Inc., 3016 DK Rotterdam, The Netherlands (e-mail: immink@turing-machines.com).

This work is based on research supported in part by the National Research Foundation of South Africa (UID 77596).

Copyright © 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

be found. In the simplest embodiment of Knuth's algorithm, the index v is represented by a p-bit balanced word, called a prefix. The p-bit prefix is appended to the m-bit modified user word, and the sequence of p + m bits is transmitted. The rate of the code is m/(m + p). The receiver, after observing the prefix, decodes the index v and subsequently undoes the modifications made to the user word. Note that neither the encoder nor the decoder requires large, m-bit wide look-up tables, making Knuth's algorithm very attractive for balancing long user words. The serial algorithm adds a p-bit prefix (not necessarily balanced) that describes the weight of the original m-bit user word. In this case, the encoder splits the sequence, consisting of the prefix and user word together, into two segments and finds an index v that balances the overall sequence. The receiver undoes the modification by inverting the bits until the original weight, captured by the prefix, is attained. Modifications and embellishments of Knuth's binary schemes have been presented by Al-Bassam and Bose [5], Tallini et al. [6], and Weber and Immink [7].

Binary balancing schemes that enable correction of errors have been presented by van Tilborg and Blaum [8], Al-Bassam and Bose [9], and Weber et al. [10], among others. In [8], the idea is to consider short balanced sequences as symbols of a non-binary alphabet and to construct error-correcting codes over that alphabet. In [9], balanced codes that correct a single error are constructed. These codes can be extended using concatenation techniques to correct up to four errors. In [10], a combination of conventional error correction techniques and Knuth's balancing method is used.

Methods for balancing q-ary, q > 2, codewords can be found, for example, in Capocelli et al. [11], Tallini and Vaccaro [12], Al-Bassam [13], Swart and Weber [14], and Pelusi et al. [15]. Balancing is achieved in [14], as in Knuth's parallel scheme, by splitting the user word into a first and second segment of v and m − v symbols, respectively. The encoder adds (modulo q) an integer s + 1 to the symbols in the first segment, and an integer s to the symbols in the second segment. A further improvement of [14] was presented by Pelusi et al. [15]. More details regarding Swart and Weber's q-ary scheme are provided in the next section.

Other work closely related to q-ary balancing, but with different alphabets or constraints, includes balancing codes over the q-th roots of unity [16], [17] and balancing codes that are invariant under symbol permutation [18]. For the former, the non-binary, complex alphabet is chosen as the q-th roots of unity, e.g. when q = 4, the alphabet is {+1, +j, −1, −j}, and the complex sum of the symbols in a codeword must be zero.


For the latter, each alphabet symbol occurs as many times as any other symbol in the codeword, which can thus be seen as a special case of q-ary balancing.

In most Knuth-like balancing schemes, both binary and non-binary, the encoder appends a prefix, which is required by the decoder for restoring the original user word. However, some of these schemes require look-up tables for encoding and decoding the prefix, which is undesirable for certain high-speed applications. Schemes like [12] either require a very small look-up table or obtain the check word (i.e., the prefix) by direct computation of the user word's weight. In this paper, we present a simple prefixless scheme, which extends the work by Swart and Immink [19]; see also Section III. As in [19], we add error correction capabilities to the balancing scheme, so that single channel errors can be corrected, and in this paper we show that fast decoding can be done based on syndromes only.

In Sections II and III, we present relevant results from the literature. We also detail Swart and Immink's [19] method for constructing prefixless q-ary balanced codes in Section III. In Section IV, we show how error correction capabilities can be efficiently added to the balancing act. In Section V, we investigate the redundancy, complexity, and performance of the new scheme. In Section VI we discuss and highlight certain aspects of the work along with directions for future research, and finally in Section VII we present our conclusions.

II. BALANCING OF q-ARY SEQUENCES

The following definitions will be used in this paper. Let x = (x1, . . . , xm) be a sequence of m symbols taken from the q-ary alphabet Q = {0, 1, . . . , q−1}, with q and m positive integers and q ≥ 2. The weight of x, denoted by weight(x), is defined as the real sum of the m q-ary symbols, i.e.

weight(x) = Σ_{i=1}^{m} x_i.

We further define the balancing value by

Ω_{q,m} = m(q − 1)/2,    (1)

where q and m are chosen such that Ω_{q,m} is an integer. A codeword x of length m is said to be balanced if

weight(x) = Ω_{q,m}.    (2)

Alternatively, an alphabet with polar symbols can be considered, where

Q_odd = {−(q−1)/2, . . . , −2, −1, 0, +1, +2, . . . , +(q−1)/2},

if q is odd, and

Q_even = {−(q−1), . . . , −3, −1, +1, +3, . . . , +(q−1)},

if q is even. In this case, balancing is achieved when the symbol sum equals zero, i.e. when weight(x) = 0. The conversion between the two representations is straightforward, thus for clerical convenience we only use the former representation.

To keep this paper as self-contained as possible, we summarize the most important results from [14] without proofs. A q-ary sequence can be balanced by adding (modulo q) an appropriate q-ary balancing sequence, as defined in the following.

Definition 1: A q-ary balancing sequence of length m is denoted by b_{s,v} = (b1, . . . , bm), s ∈ Q, v ∈ {1, . . . , m}, with

b_i = s + 1 (mod q)  if i ≤ v,
b_i = s              otherwise.

The balancing sequence can be seen as the combination of two sequences: a q-ary sequence (the all-s sequence) and a "binary" sequence (indicating the position). Then,

b_{s,v} = (s, s, s, . . . , s) ⊕_q (1, . . . , 1, 0, . . . , 0),

where the second sequence starts with v ones and ⊕_q represents modulo q summation.

Example 1: Consider a 4-ary sequence x = (0, 2, 3, 3, 3, 1, 3, 2), of weight 17. By adding the sequence (2, 2, 2, 2, 2, 2, 2, 1), we obtain a balanced sequence (2, 0, 1, 1, 1, 3, 1, 3) with weight equal to Ω_{4,8} = 12. The balancing sequence is not unique; in this case three more balancing sequences can be found, namely (3, 3, 3, 2, 2, 2, 2, 2), (3, 3, 3, 3, 3, 3, 3, 2) and (0, 0, 0, 3, 3, 3, 3, 3).
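As an illustrative sketch (our own code, not the authors' implementation), the balancing of Example 1 can be reproduced in a few lines of Python; the function names are our own:

```python
def weight(x):
    # Real (integer) sum of the q-ary symbols
    return sum(x)

def balancing_sequence(q, m, s, v):
    # b_{s,v} from Definition 1: (s+1 mod q) in the first v positions, s elsewhere
    return [(s + 1) % q if i < v else s for i in range(m)]

def add_mod_q(x, b, q):
    # Symbol-wise modulo-q addition
    return [(xi + bi) % q for xi, bi in zip(x, b)]

q, x = 4, [0, 2, 3, 3, 3, 1, 3, 2]        # Example 1
m = len(x)
omega = m * (q - 1) // 2                   # balancing value (1): Omega_{4,8} = 12
b = balancing_sequence(q, m, s=1, v=7)     # the sequence (2,2,2,2,2,2,2,1)
w = add_mod_q(x, b, q)

assert weight(x) == 17
assert b == [2, 2, 2, 2, 2, 2, 2, 1]
assert w == [2, 0, 1, 1, 1, 3, 1, 3] and weight(w) == omega == 12
```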

To see how the balancing sequences affect the weight, let b_z denote the z-th balancing sequence, with

z = sm + v,  1 ≤ z ≤ qm,

and let ω(z) = weight(x ⊕_q b_z). Note that there are qm possible balancing sequences.

For the binary case, we know from [4] that the minimum and maximum values of ω(z) will always be such that min{ω(z)} ≤ Ω_{2,m} ≤ max{ω(z)}. Plotted against z, ω(z) always increases or decreases by exactly one per step, and therefore it must pass through Ω_{2,m} at some stage. A progression of this nature, consisting of a succession of random steps, is called a random walk, and it is the basis of Knuth's proof. For q-ary balancing in [12], Tallini and Vaccaro construct single or double maps in such a way that random walks are also achieved, with ω(z) changing by −1, 0 or +1 for each step, thereby ensuring that it will pass through Ω_{q,m}. The approach from [14] also results in random walks, but the value of ω(z) does not change by −1, 0 or +1, as described in the following lemma.

Lemma 1: When adding b_z to x, ω(z) forms a random walk with increases of 1 and decreases of q − 1.

An exact minimum and maximum value for ω(z) is hard to find since it is sequence dependent, but it can be shown that a bound on these values exists for all sequences.

Lemma 2: The ω(z)-random walk has min{ω(z)} ≤ Ω_{q,m} ≤ max{ω(z)}.

Using Lemmas 1 and 2 we can state the main result from [14]:

Theorem 1: There is at least one pair of integers, s and v, s ∈ Q, v ∈ {1, . . . , m}, such that x ⊕_q b_{s,v} is balanced.
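Theorem 1 can be checked numerically. The sketch below (our own code, not from the paper) enumerates ω(z) over all qm balancing sequences for the word of Example 1 and verifies both the step sizes of Lemma 1 and the straddling property of Lemma 2:

```python
def balancing_sequence(q, m, s, v):
    # b_{s,v} from Definition 1
    return [(s + 1) % q if i < v else s for i in range(m)]

def omega_walk(x, q):
    # omega(z) = weight(x (+)_q b_z) for z = s*m + v, s in Q, v in {1, ..., m}
    m = len(x)
    return [sum((xi + bi) % q for xi, bi in zip(x, balancing_sequence(q, m, s, v)))
            for s in range(q) for v in range(1, m + 1)]

q, x = 4, [0, 2, 3, 3, 3, 1, 3, 2]
omega = len(x) * (q - 1) // 2
walk = omega_walk(x, q)

# Lemma 1: every step of the walk is +1 or -(q-1)
assert all(b - a in (1, 1 - q) for a, b in zip(walk, walk[1:]))
# Lemma 2 and Theorem 1: the walk straddles, and therefore hits, Omega_{q,m}
assert min(walk) <= omega <= max(walk) and omega in walk
```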

This result is used in the next section to construct the prefixless balanced codes from [19].


III. PREFIXLESS BALANCED CODES

As before, let x = (x1, . . . , xm) be a q-ary word of m symbols, x_i ∈ Q. The word d = (d1, . . . , dm) is obtained by modulo q integration¹ of x:

d_i = d_{i+1} ⊕_q x_i,  1 ≤ i ≤ m,

where d_{m+1} = 0. The above integration operation will be denoted by d = I(x). Note that the original word x can be uniquely restored by modulo q differentiation:

x_i = d_i ⊖_q d_{i+1},  1 ≤ i ≤ m,    (3)

where ⊖_q indicates modulo q subtraction. The above differentiation operation will be denoted by x = I⁻¹(d). Clearly, I⁻¹(I(x)) = x.
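The integration and differentiation operations are straightforward to implement; the following sketch (our own helper names, not the authors' code) checks the round trip I⁻¹(I(x)) = x on a 5-ary word:

```python
def integrate(x, q):
    # d = I(x): d_i = d_{i+1} (+)_q x_i, computed from right to left with d_{m+1} = 0
    d, acc = [0] * len(x), 0
    for i in reversed(range(len(x))):
        acc = (acc + x[i]) % q
        d[i] = acc
    return d

def differentiate(d, q):
    # x = I^{-1}(d): x_i = d_i (-)_q d_{i+1}, as in Eq. (3)
    m = len(d)
    return [(d[i] - (d[i + 1] if i + 1 < m else 0)) % q for i in range(m)]

q = 5
x = [3, 2, 0, 1, 1, 4, 0]
d = integrate(x, q)
assert d == [1, 3, 1, 1, 0, 4, 0]
assert differentiate(d, q) == x      # I^{-1}(I(x)) = x
```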

Define the binary m-bit word u_v = (0, . . . , 0, 1, 0, . . . , 0), consisting of v − 1 zeros, a single one in position v, and m − v trailing zeros. We are now in a position to formulate Theorem 2.

Theorem 2: There is at least one pair of integers, s and v, s ∈ Q, v ∈ {1, . . . , m}, such that w = I(x ⊕_q u_v ⊕_q s·u_m) is balanced, that is, weight(w) = Ω_{q,m}.

Since u_v and s·u_m under modulo q integration are equivalent to b_{s,v}, according to Theorem 1 balancing of x will always be possible. Note that in the binary case, q = 2, the search simplifies to finding the balancing index v only, since such an index will always be found for s = 0. The following example illustrates the method.

Example 2: Let q = 5 and m = 7, and let the input be x = (3, 2, 0, 1, 1, 4, 0). After a search we find that the choice of s = 3 and v = 3 balances the integrated sequence I(x) = (1, 3, 1, 1, 0, 4, 0). Adding u_3 ⊕_5 3·u_7 = (0, 0, 1, 0, 0, 0, 3) to the input yields (3, 2, 1, 1, 1, 4, 3), and after integration we obtain w = I((3, 2, 1, 1, 1, 4, 3)) = (0, 2, 0, 4, 3, 2, 3). Note that w is balanced since the sum of its components equals Ω_{5,7} = 14.

It is worth noting that determining s and v can be simplified by first finding an s such that weight(x ⊕_q b_{s,0}) ≤ Ω_{q,m} and weight(x ⊕_q b_{s+1,0}) ≥ Ω_{q,m}, and then finding the v that balances the sequence. We will elaborate on the complexity of this in Section V-B.
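A plain exhaustive search over the qm candidate pairs of Theorem 2 can be sketched as follows (our own code; the optimized two-stage search over s and then v mentioned above is not shown). Note that the balancing pair is not necessarily unique, so a search may return a valid pair other than the one quoted in Example 2:

```python
def integrate(x, q):
    # d_i = d_{i+1} (+)_q x_i, right to left, with d_{m+1} = 0
    d, acc = [0] * len(x), 0
    for i in reversed(range(len(x))):
        acc = (acc + x[i]) % q
        d[i] = acc
    return d

def encode_balanced(x, q, s, v):
    # w = I(x (+)_q u_v (+)_q s*u_m) for a given pair (s, v)
    y = list(x)
    y[v - 1] = (y[v - 1] + 1) % q   # add u_v
    y[-1] = (y[-1] + s) % q         # add s*u_m
    return integrate(y, q)

def find_balancing_pair(x, q):
    # Exhaustive search for a pair (s, v) promised by Theorem 2
    m = len(x)
    omega = m * (q - 1) // 2
    for s in range(q):
        for v in range(1, m + 1):
            w = encode_balanced(x, q, s, v)
            if sum(w) == omega:
                return s, v, w

x, q = [3, 2, 0, 1, 1, 4, 0], 5
assert encode_balanced(x, q, 3, 3) == [0, 2, 0, 4, 3, 2, 3]   # the pair from Example 2
s, v, w = find_balancing_pair(x, q)
assert sum(w) == 14                                           # Omega_{5,7}
```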

From the receiver's point of view, u_v introduced an error of magnitude one in x in an unknown position. In the rest of the paper we will refer to this as the magnitude-one error. The following encoding and decoding algorithms exploit Theorem 2, and we will show that, in conjunction with error correcting techniques to correct the magnitude-one error, it is possible to efficiently balance q-ary words and circumvent the encoding and decoding of the prefix in the prior art constructions.

A. Encoding

We will make use of a q-ary (m−1, k) linear block code, denoted by C, of dimension k and length m − 1 to encode the user word a = (a1, . . . , ak) of length k. Let r0 = m − 1 − k be the redundancy of the block code, and define the r0 × (m−1)

¹Note that for convenience we perform the integration from right to left. Similar results would be obtained if it were performed from left to right, i.e. d_{i+1} = d_i ⊕_q x_i with d_0 = 0.

matrix H_{q,r0} whose i-th column h_i is the q-ary representation of the integer i, 1 ≤ i ≤ m − 1, m ≤ q^{r0}. For example, for q = 3, r0 = 2, and m = 9 we obtain

H_{3,2} = [ 1 2 0 1 2 0 1 2 ]
          [ 0 0 1 1 1 2 2 2 ].    (4)
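Since the columns of H_{q,r0} are just base-q digit expansions, the matrix can be generated on the fly rather than stored; a sketch in Python (our own code):

```python
def check_matrix(q, r0, m):
    # H_{q,r0}: column i (1 <= i <= m-1) holds the q-ary digits of i,
    # least significant digit in the first row
    return [[(i // q**row) % q for i in range(1, m)] for row in range(r0)]

# Reproduces H_{3,2} of Eq. (4) for q = 3, r0 = 2, m = 9
H = check_matrix(3, 2, 9)
assert H == [[1, 2, 0, 1, 2, 0, 1, 2],
             [0, 0, 1, 1, 1, 2, 2, 2]]
```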

We call H_{q,r0} a check matrix, for which easy syndrome decoding is available, similar to that of binary Hamming codes [20]. The code C is a single magnitude-one error correction code, or a single error detection code, as described in [21], [22], [23]. The maximum row length of the check matrix H_{q,r0} is q^{r0} − 1, r0 > 1. The encoding function is denoted by x = φ_q(a), and is defined in such a way that x satisfies H_{q,r0} x^T = 0^T.

The encoding procedure consists of the following three steps:

Step 1: The k-symbol user word a is encoded into the codeword x = (x1, . . . , x_{m−1}) using the q-ary (m−1, k) linear block code, i.e. x = φ_q(a).

Step 2: The m-symbol word x′ is obtained by appending a redundant '0' to x, that is, x′ = (x1, . . . , x_{m−1}, 0).

Step 3: Find a pair of integers, s ∈ Q and v ∈ {1, . . . , m}, such that w = I(x′ ⊕_q u_v ⊕_q s·u_m), with weight(w) = Ω_{q,m}. (According to Theorem 1, such a pair of integers s and v can always be found.)

Example 3: Let q = 5 and k = 2, and let the user word be a = (3, 2). Using the shortened linear code with generator matrix

G_{5,2} = [ 1 0 1 1 3 2 ]
          [ 0 1 1 4 1 4 ],

and check matrix

H_{5,2} = [ 1 2 3 4 0 1 ]
          [ 0 0 0 0 1 1 ],

the user word is encoded as x = (3, 2, 0, 1, 1, 4), and after appending a redundant '0' we have x′ = (3, 2, 0, 1, 1, 4, 0). This is the same sequence used in Example 2 and will thus be balanced as w = (0, 2, 0, 4, 3, 2, 3).

B. Decoding

At the receiver side, the m-symbol word y′ is retrieved from the received w by modulo q differentiation, i.e.

y′ = I⁻¹(w) = x′ ⊕_q u_v ⊕_q s·u_m.

We drop the last symbol, 's' (or 's + 1' if v = m), of y′ and thereby obtain y of length m − 1. Then either the words y and x differ only at an unknown index position v if v ≠ m, because of the magnitude-one "error" introduced during encoding, or y = x if v = m.

As x satisfies H_{q,r0} x^T = 0^T, we have

H_{q,r0} y^T = H_{q,r0} (x ⊕_q u_v)^T = h_v  if 1 ≤ v ≤ m − 1,
H_{q,r0} y^T = 0^T                           if v = m,

where h_i is the i-th column of H_{q,r0}. Thus, we can uniquely retrieve the index v and restore the original word by subtracting '1' from y_v, i.e. x = y ⊖_q u_v.


By removing the redundant symbols, we obtain the original user word a.

Example 4: Using the balanced codeword obtained in Example 3 as our received word w = (0, 2, 0, 4, 3, 2, 3), we apply modulo q differentiation to obtain y′ = I⁻¹((0, 2, 0, 4, 3, 2, 3)) = (3, 2, 1, 1, 1, 4, 3). We then drop the last symbol to get y = (3, 2, 1, 1, 1, 4). Multiplying y by the shortened check matrix gives

[ 1 2 3 4 0 1 ] × (3, 2, 1, 1, 1, 4)^T = (3, 0)^T,
[ 0 0 0 0 1 1 ]

which identifies the third column, representing v = 3. Thus x = (3, 2, 0, 1, 1, 4) and the original information is retrieved as a = (3, 2).
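The decoding of Examples 3 and 4 can be sketched as follows (our own code): the syndrome H_{q,r0} y^T directly names the column, and hence the index v, of the intentional magnitude-one "error":

```python
def syndrome(H, y, q):
    # H y^T (mod q)
    return [sum(h * yi for h, yi in zip(row, y)) % q for row in H]

q = 5
H = [[1, 2, 3, 4, 0, 1],      # shortened check matrix H_{5,2} of Example 3
     [0, 0, 0, 0, 1, 1]]
y = [3, 2, 1, 1, 1, 4]        # word obtained in Example 4 after differentiation

s = syndrome(H, y, q)
cols = [list(col) for col in zip(*H)]
v = cols.index(s) + 1         # the syndrome equals column v of H
x = list(y)
x[v - 1] = (x[v - 1] - 1) % q # x = y (-)_q u_v

assert s == [3, 0] and v == 3
assert x == [3, 2, 0, 1, 1, 4]
assert syndrome(H, x, q) == [0, 0]   # the restored word satisfies H x^T = 0^T
```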

IV. ADDING ERROR CORRECTION

So far, the error correction techniques employed were used to identify the index v that was used during the encoding process. We will now extend the error correction code in such a way that we will also be able to correct single channel errors.

From (3), any single channel error in, say, w_j will be transformed into a double adjacent error, in x_{j−1} and x_j. This, together with the single "error" we introduced during balancing, means that we must be able to correct three errors. However, this would come at the price of much more redundancy. We can avoid this by extending the code used in the previous section and by introducing interleaving.

A. Encoding

We start with a code similar to that used in Section III-A, but extend it by adding a check symbol that checks all the previous symbols, and denote this extended code by C*. If q is odd,² we then choose an odd value for m as well and set n = (m − 1)/2. The code C* then forms an (n, k) linear block code, with redundancy r* = n − k and check matrix H*_{q,r*}. In general, the check matrix will be of size r* × n, with the i-th column h*_i being the q-ary representation of the integer (q^{r*−1} + i), 1 ≤ i ≤ n. As an example, the check matrix in (4) becomes

H*_{3,3} = [ 1 2 0 1 2 0 1 2 ]
           [ 0 0 1 1 1 2 2 2 ]
           [ 1 1 1 1 1 1 1 1 ].

We now formally define the syndrome, s = (s1, s2, . . . , s_{r*})^T, as it is generally used for Hamming codes, by

s = H*_{q,r*} ŵ^T,

where ŵ is the (possibly corrupted) received codeword. The syndrome is then equal to the summation (modulo q) of those columns where the errors occurred, multiplied by the error magnitudes. For clerical convenience, in the rest of the paper we do not indicate explicitly that modular arithmetic is used when calculating the syndromes.

²If q is even, then instead of using n = (m − 1)/2 and x′ = (x1, . . . , x_{m−1}, 0), use n = (m − 2)/2 and x′ = (x1, . . . , x_{m−2}, 0, 0) with m even, so that the overall length is even and balancing can be achieved.

Note that this code is a special case of the class of t symmetric error correction and all unidirectional error detection (tEC-AUED) codes with t = 1. The reader can refer to [21], [22], [23], [24] and references therein for more details. We will further elaborate on the use of these codes in Section VI. To make this paper self-contained, we include the following lemma.

Lemma 3: The code C* with check matrix H*_{q,r*} can:

(i) detect a single magnitude-one error and a single random error, for any value of q, or

(ii) correct a single magnitude-one error, for any value of q, or

(iii) correct a single random error, for q any integer power of a prime.

Proof: Let the magnitude-one error be in position i and the random error, with error magnitude e, e ∈ Q, be in position j. The resulting syndrome is s = h*_i + e·h*_j. We prove each case individually:

(i) s_{r*} ≠ 0 for all e except e = q − 1 (since s_{r*} = 1 ⊕_q (q − 1) = 0). If e = q − 1, then s = 0^T only if i = j, which would mean that the magnitude-one error and the random error "cancel" each other out. Therefore, a single magnitude-one error and a single random error can always be detected based on s ≠ 0^T.

(ii) If e = 0 (there is no random error), then, with s = h*_i, a magnitude-one error can always be corrected.

(iii) If there is no magnitude-one error, then s = e·h*_j. Since we can find e from s_{r*}, we can retrieve j from h*_j = e⁻¹·s. As the modular multiplicative inverse e⁻¹ is needed, this case is limited to q being an integer power of a prime. Therefore, a single random error can always be corrected.

We have two user words a and a′, each of length k, that are encoded into codewords of length n, c = (c1, c2, . . . , cn) and c′ = (c′1, c′2, . . . , c′n) respectively, using the code C*. The encoding function for this code is defined as c = φ*_q(a). Interleave these two codewords to a depth of two, to form

x = (x1, x2, . . . , x_{m−1}) = (c1, c′1, c2, c′2, . . . , cn, c′n).

The encoding now follows the same steps as in Section III-A: add a redundant '0', find the values s and v that balance the sequence, and encode it into w. The final encoding step is to append symbols α and β to w, where

α = w1 ⊕_q w3 ⊕_q · · · ⊕_q wm ⊕_q δ_{q,m},  and
β = w2 ⊕_q w4 ⊕_q · · · ⊕_q w_{m−1},

with δ_{q,m} ≡ (q − 1) − Ω_{q,m} (mod q).

Lemma 4: The sequence (α, β) is balanced.

Proof: Adding the two check symbols together:

α + β ≡ w1 + w3 + · · · + wm + δ_{q,m} + w2 + w4 + · · · + w_{m−1} (mod q)
      ≡ w1 + w2 + w3 + w4 + · · · + w_{m−1} + wm + (q − 1) − Ω_{q,m} (mod q)
      ≡ Ω_{q,m} + (q − 1) − Ω_{q,m} (mod q)
      ≡ q − 1 (mod q).

Since 0 ≤ α, β ≤ q − 1, it must hold that α + β = q − 1, and thus (α, β) is balanced.

In essence, α and β are check symbols over the odd and even symbols respectively, and δ_{q,m} is added to ensure that α and β together are balanced. The sender then sends the balanced sequence (w1, w2, . . . , wm, α, β) to the receiver. The encoding process is summarized in Fig. 1.

The following example illustrates the encoding process.

Example 5: We consider a q = 5, (4, 2) linear block code with generator matrix

G*_{5,2} = [ 1 0 2 2 ]
           [ 0 1 3 1 ],

and check matrix

H*_{5,2} = [ 1 2 3 4 ]
           [ 1 1 1 1 ].

Let the user words be a = (4, 0) and a′ = (2, 1). Using G*_{5,2}, these are encoded as c = (4, 0, 3, 3) and c′ = (2, 1, 2, 0). After interleaving and adding the redundant zero, we have x′ = (4, 2, 0, 1, 3, 2, 3, 0, 0). According to Theorem 2, we can find s = 1 and v = 4; then w = I((4, 2, 0, 1, 3, 2, 3, 0, 0) ⊕_5 (0, 0, 0, 0, 0, 0, 0, 0, 1) ⊕_5 (0, 0, 0, 1, 0, 0, 0, 0, 0)) = I((4, 2, 0, 2, 3, 2, 3, 0, 1)) = (2, 3, 1, 1, 4, 1, 4, 1, 1). The check symbols are calculated as α = 3 and β = 1, with δ_{5,9} = 1. The transmitted sequence (2, 3, 1, 1, 4, 1, 4, 1, 1, 3, 1) is balanced with Ω_{5,11} = 22.
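The check symbols of Example 5 can be verified directly; the sketch below (our own code) computes δ_{q,m}, α, and β and confirms Lemma 4:

```python
q, m = 5, 9
w = [2, 3, 1, 1, 4, 1, 4, 1, 1]          # the balanced word of Example 5
omega = m * (q - 1) // 2                 # Omega_{5,9} = 18
delta = ((q - 1) - omega) % q            # delta_{5,9} = 1
alpha = (sum(w[0::2]) + delta) % q       # check over the odd-indexed symbols w1, w3, ..., wm
beta = sum(w[1::2]) % q                  # check over the even-indexed symbols w2, ..., w_{m-1}

assert (delta, alpha, beta) == (1, 3, 1)
assert alpha + beta == q - 1             # Lemma 4: (alpha, beta) is balanced
tx = w + [alpha, beta]
assert sum(tx) == (m + 2) * (q - 1) // 2 # the transmitted word meets Omega_{5,11} = 22
```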

B. Decoding

Previously, in [19], decoding was done exhaustively by trying to correct the error in every even (or odd) position until the syndromes were found to be zero. For each attempt at correcting the error, modulo q differentiation, deinterleaving, and syndrome calculation had to be performed. This can negatively affect the decoding time if the sequence becomes very long. Instead of this exhaustive

[Fig. 1. Summary of the encoding algorithm for q odd: the user words a and a′ are encoded by φ*_q into c and c′, which are interleaved to depth 2 to form x = (c1, c′1, c2, c′2, . . . , cn, c′n); a redundant 0 is appended to give x′; the balanced word w = I(x′ ⊕_q u_v ⊕_q s·u_m) is computed; and the check symbols α and β are appended to give (w1, w2, . . . , wm, α, β).]

decoding, we will show that we can decode once and correct the error by making use of the syndromes directly.

We now define the notation used in the following description of the decoding process. Let ŵ be the (possibly corrupted) received codeword, and α̂ and β̂ the (possibly corrupted) received check symbols. Let x̂ be the sequence after applying modulo q differentiation and dropping the last redundant symbol, with ĉ and ĉ′ the codewords recovered after deinterleaving. Let s = (s1, s2, . . . , s_{r*})^T and s′ = (s′1, s′2, . . . , s′_{r*})^T be the syndromes calculated by multiplying ĉ and ĉ′, respectively, with the check matrix H*_{q,r*}. Finally, let c̄ and c̄′ be the codewords after correction is applied, with â and â′ the recovered user words. The decoding process is summarized in Fig. 2, where (φ*_q)⁻¹ is used to denote the inverse operation of φ*_q. If the error correction was successful, then we will have a = â and a′ = â′.

We define the imbalance as the difference between the weight of the received sequence and the known weight of the balanced sequence, i.e. Σ_{i=1}^{m} ŵ_i − Ω_{q,m}. From this imbalance the error magnitude can be determined, provided a single channel error occurred. By using the check symbols α and β we can also determine whether the channel error occurred in an even or odd position.
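The imbalance calculation can be sketched as follows (our own code; the corrupted position is chosen purely for illustration). A single channel error of magnitude e shifts the weight of the received word away from Ω_{q,m}, so the imbalance reveals e modulo q regardless of the direction of the error:

```python
q, m = 5, 9
omega = m * (q - 1) // 2
w = [2, 3, 1, 1, 4, 1, 4, 1, 1]           # a balanced word (from Example 5)

rx = list(w)
rx[3] = (rx[3] + 2) % q                   # channel error of magnitude e = 2
assert sum(rx) - omega == 2               # the imbalance Delta equals +e

rx2 = list(w)
rx2[3] = (rx2[3] - 2) % q                 # same magnitude, opposite direction
assert (sum(rx2) - omega) % q == (q - 2)  # Delta = -e, and -e = q - e (mod q)
```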

As was seen in the proof of Lemma 3, the modular multiplicative inverse is needed during decoding, which means that the algorithm as described here is limited to prime values of the alphabet size q, or integer powers of a prime if it is adapted to work in an extension field.

[Fig. 2. Summary of the decoding algorithm for q odd: the received word (ŵ1, . . . , ŵm, α̂, β̂) is used to determine the parity of the error position, the check symbols are dropped, and ŵ is differentiated by I⁻¹; the redundant symbol is dropped to give x̂, which is deinterleaved into ĉ and ĉ′; the syndromes s = (s1, . . . , s_{r*})^T and s′ = (s′1, . . . , s′_{r*})^T determine the corrections, giving c̄ and c̄′, from which (φ*_q)⁻¹ recovers â and â′.]

[Fig. 3. Effect of errors on symbols from one decoding stage to the next: a random channel error in position t of ŵ affects positions t − 1 and t of x̂, and hence positions τ of ĉ and τ′ of ĉ′, with τ = τ′ = t/2 if t is even and τ − 1 = τ′ = (t − 1)/2 if t is odd; the intentional error from encoding, in position v of x̂, lands in position ν = ⌈v/2⌉.]

We assume that the random channel error occurred in position t when considering ŵ, and that the intentional magnitude-one error occurred in position v when considering x̂.

The following observations are crucial to note, using Fig. 3 as a guide:

• a random channel error in ŵ1 will only affect x̂1 and thus only ĉ1, and similarly an error in ŵm will only affect x̂_{m−1} and thus only ĉ′_n;

• a random channel error of magnitude e in ŵ_t, 2 ≤ t ≤ m − 1, will affect x̂_t by e and x̂_{t−1} by −e after modulo q differentiation;

• similarly, a random channel error in ŵ_t, 2 ≤ t ≤ m − 1, will affect ĉ_τ and ĉ′_{τ′} after modulo q differentiation and deinterleaving, where τ = τ′ = t/2 if t is even, or τ − 1 = τ′ = (t − 1)/2 if t is odd, with 1 ≤ τ, τ′ ≤ n;

• an intentional magnitude-one error in position v in x̂ will, after deinterleaving, be in position ν = ⌈v/2⌉, 1 ≤ ν ≤ n, in ĉ if v is odd or in ĉ′ if v is even;

• x̂_m does not affect any syndromes; x̂1, x̂3, x̂5, . . . , x̂_{m−2} affect s, and x̂2, x̂4, x̂6, . . . , x̂_{m−1} affect s′;

• a random channel error of magnitude e in position τ will result in a syndrome s = e·h*_τ, and consequently we have to multiply by e⁻¹ before we can determine the τ-th column of H*_{q,r*}, i.e. h*_τ = e⁻¹·s with

τ = Σ_{i=1}^{r*−1} q^{i−1} [e⁻¹ s_i (mod q)].    (5)

The same applies to τ′ and s′;

• a random channel error's effect on ĉ and ĉ′ is determined from s_{r*} and s′_{r*} respectively, since all columns of H*_{q,r*} have ones in the r*-th position.
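Equation (5) can be sketched as follows (our own code, with a hypothetical single channel error chosen for illustration). Since the last row of H*_{q,r*} is all ones, s_{r*} yields e; multiplying the syndrome by e⁻¹ then exposes the base-q digits of q^{r*−1} + τ:

```python
def star_column(q, rstar, i):
    # h*_i: q-ary digits (least significant first) of q^(r*-1) + i
    n = q**(rstar - 1) + i
    return [(n // q**k) % q for k in range(rstar)]

q, rstar = 5, 2
tau, e = 3, 2                                          # hypothetical error: magnitude 2 at position 3
s = [(e * d) % q for d in star_column(q, rstar, tau)]  # s = e * h*_tau

e_rec = s[rstar - 1]              # last row of H* is all ones, so s_{r*} = e
e_inv = pow(e_rec, -1, q)         # modular inverse (q prime here; Python 3.8+)
tau_rec = sum(q**i * ((e_inv * s[i]) % q) for i in range(rstar - 1))  # Eq. (5)

assert (e_rec, tau_rec) == (e, tau)
```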

Table I now lists the possible error locations that can be affected by the two types of errors and classifies them into states, to be used in the decoding process. By pairing the states together, we obtain all the possible error scenarios as listed in Table II, along with the complete syndromes. The values of s_{r*} and s′_{r*} are used to determine the error state. It is now also evident why it is necessary to determine the position parity of the channel error: e.g. for q = 7 and e = 3, one is unable to distinguish between states B2 and A3, as both are valid with s_{r*} = 4 and s′_{r*} = 4.

The flow diagram (see Fig. 4) and the values of the parameters e, s_{r*} and s′_{r*} determine the error state. Once the error state is determined, s and s′ in Table II can be used to

TABLE I
CLASSIFICATION OF ERROR LOCATIONS FOR THE MAGNITUDE-ONE ERROR AND THE CHANNEL ERROR

Magnitude-one error in position v     Channel error in position t
State  Error locations                State  Error locations
A      v odd, 1 ≤ v ≤ m − 2           0      No channel error
B      v even, 2 ≤ v ≤ m − 1          1      t = 1
C      v = m                          2      t even, 2 ≤ t ≤ m − 1
                                      3      t odd, 3 ≤ t ≤ m − 2
                                      4      t = m

TABLE II
SYNDROME VALUES BASED ON THE POSSIBLE ERROR STATES (ALL OPERATIONS PERFORMED MODULO q)

Error state  s                 s′                  s_{r*}  s′_{r*}  Note
A0           h*_ν              0                   1       0
A1           h*_ν + e·h*_τ     0                   1 + e   0        τ = 1
A2           h*_ν − e·h*_τ     e·h*_{τ′}           1 − e   e        τ = τ′
A3           h*_ν + e·h*_τ     −e·h*_{τ′}          1 + e   −e       τ − 1 = τ′
A4           h*_ν              −e·h*_{τ′}          1       −e       τ′ = n
B0           0                 h*_ν                0       1
B1           e·h*_τ            h*_ν                e       1        τ = 1
B2           −e·h*_τ           h*_ν + e·h*_{τ′}    −e      1 + e    τ = τ′
B3           e·h*_τ            h*_ν − e·h*_{τ′}    e       1 − e    τ − 1 = τ′
B4           0                 h*_ν − e·h*_{τ′}    0       1 − e    τ′ = n
C0           0                 0                   0       0
C1           e·h*_τ            0                   e       0        τ = 1
C2           −e·h*_τ           e·h*_{τ′}           −e      e        τ = τ′
C3           e·h*_τ            −e·h*_{τ′}          e       −e       τ − 1 = τ′
C4           0                 −e·h*_{τ′}          0       −e       τ′ = n

ν = ⌈v/2⌉; τ = τ′ = t/2 if t is even, τ − 1 = τ′ = (t − 1)/2 if t is odd

determine the error positions using (5). Depending on whether e = 1 or e = q − 1, certain states may appear equivalent, and during decoding we need to be able to differentiate between these. Here we consider two states to be equivalent if their s_{r*} and s′_{r*} values are the same; in that case, we need to determine the error positions to distinguish between the two states.

If only a single error occurred, then decoding is straightforward, as no ambiguity exists. However, since we are working with a constrained code, we can use the constraints


[Fig. 4. Flow diagram to determine the error states based on the values of e, s_{r*} and s′_{r*}; some pairs of states are equivalent when e = 1 or e = q − 1 and must then be distinguished by their error positions.]

to check at different stages in the decoding whether multiple errors possibly occurred, and declare a decoding failure if this is detected. Decoding is done according to the following steps.

Step 1: Let Δ be the imbalance of the received sequence, with

Δ = Σ_{i=1}^{m} ŵ_i − Ω_{q,m}.

If |Δ| > q − 1, conclude that multiple errors occurred, declare a decoding failure and STOP. Otherwise, proceed to calculate all the necessary values by determining the error magnitude³ as e ≡ Δ (mod q) and the check symbols as

γ = ŵ1 ⊕_q ŵ3 ⊕_q · · · ⊕_q ŵm ⊕_q δ_{q,m} ⊖_q α̂,
γ′ = ŵ2 ⊕_q ŵ4 ⊕_q · · · ⊕_q ŵ_{m−1} ⊖_q β̂.

Perform modulo qdifferentiation with ˆ

x0=I−1(ˆ

w), drop the

redundant last symbol to obtain ˆ

x, deinterleave the codewords

to ˆ

cand ˆ

c0, and determine sand s0.

Footnote 3: The imbalance, ∆, indicates the error magnitude, but its sign also indicates whether we are above or below Ω_{q,m}. This information can be used to check if the error correction was performed successfully. The error magnitude, e, used in the decoding algorithm, however, does not make use of the sign information, as −e ≡ q − e (mod q) in all the calculations.
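As a small numerical check of Step 1, the sketch below applies it to the Case 1 received sequence of Example 6. The balancing value Ω_{q,m} = m(q − 1)/2 is an assumption here (Ω_{q,m} is defined earlier in the paper, outside this excerpt), but it is consistent with the balanced sequence of Example 5, whose symbols sum to 22 for q = 5 and m = 11.

```python
q, m = 5, 11
omega = m * (q - 1) // 2   # assumed balancing value Ω_{q,m} = 22
w_hat = [2, 3, 1, 1, 4, 3, 4, 1, 1, 3, 1]   # Case 1 received sequence

delta = sum(w_hat) - omega  # imbalance ∆
e = delta % q               # error magnitude e ≡ ∆ (mod q)

assert (delta, e) == (2, 2)   # matches Step 1) of Case 1 in Example 6
assert abs(delta) <= q - 1    # no immediate multiple-error declaration
```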

TABLE III
ERROR STATES AND CORRESPONDING CORRECTIVE ACTION

Error State   Corrective action (all operations done modulo q)
A0            Subtract 1 from ĉ_ν
A1            Subtract 1 from ĉ_ν, subtract e from ĉ_1
A2            Subtract 1 from ĉ_ν, add e to ĉ_τ, subtract e from ĉ'_τ
A3            Subtract 1 from ĉ_ν, subtract e from ĉ_τ, add e to ĉ'_{τ'}
A4            Subtract 1 from ĉ_ν, add e to ĉ'_n
B0            Subtract 1 from ĉ'_ν
B1            Subtract 1 from ĉ'_ν, subtract e from ĉ_1
B2            Subtract 1 from ĉ'_ν, add e to ĉ_τ, subtract e from ĉ'_τ
B3            Subtract 1 from ĉ'_ν, subtract e from ĉ_τ, add e to ĉ'_{τ'}
B4            Subtract 1 from ĉ'_ν, add e to ĉ'_n
C0            No correction necessary
C1            Subtract e from ĉ_1
C2            Add e to ĉ_τ, subtract e from ĉ'_τ
C3            Subtract e from ĉ_τ, add e to ĉ'_{τ'}
C4            Add e to ĉ'_n

Step 2: If s = s' = 0^T and ∆ = γ = γ' = 0, then proceed to Step 8; otherwise proceed to the next step.

Step 3: If γ ≠ 0 and γ' = 0, then t is odd, or if γ = 0 and γ' ≠ 0, then t is even, and proceed to the next step. Otherwise, if ∆ ≠ 0 (and since either γ ≠ 0, γ' ≠ 0 or γ = γ' = 0, the parity of t cannot be determined), then multiple errors occurred. In that case, declare a decoding failure and STOP. Otherwise, proceed.

Step 4: Use e, s_{r*} and s'_{r*} together with Table I and Fig. 4 to determine the error state(s). If equivalent error states are obtained, for e = 1 (C3 ≡ A4 and C1 ≡ B3) or for e = q − 1 (C3 ≡ B1 and C4 ≡ A3), then use the next step to determine whether τ = 1 or τ' = n.

Step 5: For the determined error state, use Table II together with the syndromes to solve for τ, τ' and ν, making use of (5). If τ ≠ τ' for t even, or τ − 1 ≠ τ' for t odd (footnote 4), or if 0 < τ, τ' ≤ n does not hold, then declare a decoding failure and STOP. If ŵ_{2τ} − ∆ ∉ Q for t even, or if ŵ_{2τ−1} − ∆ ∉ Q for t odd, then declare a decoding failure and STOP. (Note that in this case we are not doing modulo q subtraction, as we are testing whether correcting the original imbalance, ∆, in position t would have resulted in an invalid channel symbol.)

Step 6: Correct the errors by applying the corrective action as described in Table III, using e, τ, τ' and ν.

Step 7: Recalculate the syndromes for the corrected codewords. If s = s' = 0^T, then proceed to the next step; otherwise declare a decoding failure and STOP.

Step 8: Finish decoding by recovering the user words from the codewords.

Footnote 4: These conditions are checked in the case where τ and τ' can be determined independently, to test whether the expected result is obtained, e.g. for state C2 or C3 in Table II.

Theorem 3: Using the fast syndrome-based algorithm described, a single channel error can be corrected, provided that q is an integer power of a prime number.

Proof: Using γ and γ' we can determine the parity of t. Using the imbalance we can determine e, and from s and s' we can obtain s_{r*} and s'_{r*}. All these parameters can then be used to determine the error state from those listed in Table II, as well as to distinguish between equivalent states for instances where e = 1 or e = q − 1, as described in the decoding algorithm.

By using e, e^{−1}, s and s', and according to Lemma 3, we can solve for ν, τ and τ', since we have two syndromes and two unknowns, recalling that τ and τ' are related. Then correction follows from Table III.

We conclude this section with the following example of

decoding.

Example 6: We use the q = 5 balanced sequence, (2,3,1,1,4,1,4,1,1,3,1), obtained in Example 5, and consider three received sequences in which one or more symbols have been corrupted by channel errors.

• Case 1: The received sequence is (2,3,1,1,4,3,4,1,1,3,1). We proceed through the decoding steps:
1) ∆ = 2, e = 2, γ = 0 and γ' = 2. After performing modulo 5 differentiation and dropping the redundant last symbol, we obtain x̂ = (4,2,0,2,1,4,3,0). Deinterleaving produces ĉ = (4,0,1,3) and ĉ' = (2,2,4,0), and multiplying these with H*_{5,2} results in the syndromes s = (4,3)^T and s' = (3,3)^T.
2) None of the conditions are met, and we proceed to the next step.
3) According to γ and γ', t is even.
4) From the syndromes we extract s_2 = 3 and s'_2 = 3, together with e = 2, using Fig. 4 to determine the error state to be B2.
5) Using Table II, we find −e·h*_τ = (4,3)^T and h*_ν + e·h*_{τ'} = (3,3)^T. Keeping in mind that τ = τ' and e^{−1} = 3, we solve the column positions as τ = τ' = 3 and ν = 2. Testing ŵ_{2τ} − ∆ = 3 − 2 = 1 ∈ Q shows that a valid channel symbol is obtained.
6) Apply the correction from Table III; then c̄ = (4,0,3,3) and c̄' = (2,1,2,0).
7) Recalculating the syndromes results in s = 0^T and s' = 0^T. Proceed to the next step.
8) According to G*_{5,2}, the information symbols are in positions 1 and 2. Thus â = (4,0) and â' = (2,1), which agrees with the original a and a' from Example 3.

• Case 2: The received sequence is (1,3,1,1,4,1,4,1,1,3,1). We proceed through the decoding steps:
1) ∆ = −1, e = 4, γ = −1 and γ' = 0. We obtain x̂ = (3,2,0,2,3,2,3,0), deinterleaving produces ĉ = (3,0,3,3) and ĉ' = (2,2,2,0), and the syndromes s = (4,4)^T and s' = (2,1)^T are obtained.
2) None of the conditions are met, and we proceed to the next step.
3) t is odd.
4) s_2 = 4 and s'_2 = 1, together with e = 4. Use Fig. 4 to determine the error state to be B1 or C3.
5) From Table II, since s = e·h*_τ for both possible error states, solving τ will enable us to distinguish between the two states. Multiplying s by e^{−1} = 4 determines that τ = 1, and thus the error state is B1. With τ solved, it can straightforwardly be found that ν = 2. Testing ŵ_{2τ−1} − ∆ = 1 − (−1) = 2 ∈ Q shows that a valid channel symbol is obtained.
6) Apply the correction from Table III, making ĉ = (4,0,3,3) and ĉ' = (2,1,2,0).
7) Recalculating the syndromes gives s = 0^T and s' = 0^T. Proceed to the next step.
8) â = (4,0) and â' = (2,1), which agrees with the original a and a' from Example 3.

• Case 3: The received sequence is (2,3,1,3,4,2,4,1,1,3,1). We proceed through the decoding steps:
1) ∆ = 3, e = 3, γ = 0 and γ' = 3. We obtain x̂ = (4,2,3,4,2,3,3,0), deinterleaving produces ĉ = (4,3,2,3) and ĉ' = (2,4,3,0), and the syndromes s = (3,2)^T and s' = (4,4)^T are obtained.
2) None of the conditions are met, therefore we proceed to the next step.
3) t is even.
4) s_2 = 2 and s'_2 = 4, together with e = 3. Use Fig. 4 to determine the error state to be B2.
5) Using Table II, we find −e·h*_τ = (3,2)^T and h*_ν + e·h*_{τ'} = (4,4)^T. Using τ = τ' and e^{−1} = 2, we solve the column positions as ν = 2 and τ = τ' = 4. Testing ŵ_{2τ} − ∆ = 1 − 3 = −2 ∉ Q indicates that an invalid channel symbol has been obtained, therefore we declare a decoding failure and stop.
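The Case 1 computations can be replayed mechanically. The matrix H*_{5,2} is defined earlier in the paper and not repeated in this section; the sketch below assumes H*_{5,2} = [[1,2,3,4],[1,1,1,1]] (top row the column indices, bottom row all ones, matching the structure of H*_{3,3} shown in Section VI). This assumed matrix reproduces every syndrome quoted in Example 6, but it is an inference, not a definition taken from this excerpt.

```python
Q5 = 5
# Assumed H*_{5,2}; an inference consistent with Example 6, not given here.
H = [[1, 2, 3, 4],
     [1, 1, 1, 1]]

def syndrome(c):
    """Syndrome s = H*_{5,2} c^T (mod q)."""
    return tuple(sum(h * x for h, x in zip(row, c)) % Q5 for row in H)

# Case 1 after differentiation and deinterleaving:
c  = [4, 0, 1, 3]   # ĉ
cp = [2, 2, 4, 0]   # ĉ'
assert syndrome(c) == (4, 3) and syndrome(cp) == (3, 3)

# Error state B2 with e = 2, τ = τ' = 3, ν = 2 (1-based positions).
# Table III: subtract 1 from ĉ'_ν, add e to ĉ_τ, subtract e from ĉ'_τ.
e, tau, nu = 2, 3, 2
c[tau - 1] = (c[tau - 1] + e) % Q5
cp[nu - 1] = (cp[nu - 1] - 1) % Q5
cp[tau - 1] = (cp[tau - 1] - e) % Q5
assert c == [4, 0, 3, 3] and cp == [2, 1, 2, 0]

# Step 7: the corrected codewords have all-zero syndromes.
assert syndrome(c) == (0, 0) and syndrome(cp) == (0, 0)
```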

V. ANALYSIS

A. Redundancy

We first look at the redundancy of the balancing scheme in Section III and compare it with those discussed in Sections I and II. Let r denote the total number of redundant symbols of the balanced code, r = r₀ + 1. Since the maximum length of the check matrix H_{q,r₀} equals q^{r₀} − 1, we conclude that the maximum length, L_q(r), of the user word for q > 2 is

L_q(r) = (q^{r₀} − 1) + 1 − r = q^{r−1} − r.

For the binary case q = 2, since only the index v needs to be encoded and not the integer s, we find

L_2(r) = 2^r − r − 1,

which is the same value as presented by Knuth [4] using a construction with a prefix. Note that for q = 2 the check matrix H_{2,r} defines a regular (binary) Hamming code with redundancy r₀ = r.

Swart and Weber's construction [14] has a balanced prefix of length r, where each prefix uniquely represents the pair of integers s and v. Let N_q(r) denote the number of distinct q-ary balanced prefixes of length r. For this construction the maximum length of the user word, denoted by L^{Sw}_q(r), is

L^{Sw}_q(r) = ⌊N_q(r)/q⌋.


TABLE IV
MAXIMUM USER WORD LENGTHS AS A FUNCTION OF r

q   r    L^Sw_q(r)   L^Cap1_q(r)   L^Cap2_q(r)   L^Pel1_q(r)   L_q(r)        L^ECC_q(r)   R^ECC
3   4    6           40            76            9             23            —            —
3   5    17          121           237           25            76            —            —
3   6    47          364           722           70            237           —            —
3   7    131         1093          2179          196           722           —            —
3   8    369         3280          6552          553           2179          —            —
3   9    1046        9841          19673         1569          6552          10           0.526
3   10   2984        29524         59038         4476          19673         9            0.474
3   11   8551        88573         177135        12826         59038         44           0.800
3   12   24596       265720        531428        36894         177135        43           0.782
3   13   70980       797161        1594309       106470        531428        150          0.920
3   14   205409      2391484       4782954       308113        1594309       149          0.914
5   4    17          156           308           21            121           —            —
5   5    76          781           1557          95            620           —            —
5   6    350         3906          7806          437           3119          —            —
5   7    1627        19531         39055         2033          15618         4            0.364
5   8    7633        97656         195304        9541          78117         3            0.273
5   9    36065       488281        976553        45081         390616        42           0.824
5   10   171389      2441406       4882802       214236        1953115       41           0.804
5   11   818299      12207031      24414051      1022873       9765614       240          0.956
5   12   3922235     61035156      122070300     4902793       48828113      239          0.952
5   13   18861819    305175781     610351549     23577274      244140612     1238         0.990
5   14   90961151    1525878906    3051757798    113701438     1220703111    1237         0.989

Using generating functions, we can straightforwardly compute N_q(r) as the largest coefficient of the expansion of (1 + x + x² + ··· + x^{q−1})^r. These values correspond to the central binomial coefficients (A001405), central trinomial coefficients (A002426), central quadrinomial coefficients (A005190) and central pentanomial coefficients (A005191) for sequences with q = 2, 3, 4 and 5, respectively, where the bracketed numbers indicate the sequences from [25].
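The generating-function computation of N_q(r) can be sketched directly; the values below check against the central trinomial and pentanomial coefficients cited above, and against Table IV when taking ⌊N_q(r)/q⌋ for L^{Sw}_q(r).

```python
def N(q, r):
    """Largest coefficient of (1 + x + ... + x^(q-1))^r: the number of
    q-ary balanced sequences (prefixes) of length r."""
    coeffs = [1]
    for _ in range(r):
        # multiply the polynomial by (1 + x + ... + x^(q-1))
        new = [0] * (len(coeffs) + q - 1)
        for i, c in enumerate(coeffs):
            for j in range(q):
                new[i + j] += c
        coeffs = new
    return max(coeffs)

# Central trinomial (A002426) and pentanomial (A005191) values:
assert [N(3, r) for r in (4, 5, 6)] == [19, 51, 141]
assert N(5, 4) == 85
# L^Sw entries of Table IV:
assert N(3, 4) // 3 == 6 and N(5, 4) // 5 == 17
```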

Capocelli et al. [11] presented two code constructions, where the maximum length of the user word for the first construction, L^{Cap1}_q(r), is

L^{Cap1}_q(r) = (q^r − 1)/(q − 1),

and the maximum length of the user word for the second, slightly more complex, construction, L^{Cap2}_q(r), is

L^{Cap2}_q(r) = 2(q^r − 1)/(q − 1) − r.

In Tallini and Vaccaro [12], a generalization of Knuth's complementation method is used to balance sequences that are close to being balanced, while other sequences are compressed with a uniquely decodable variable-length code and balanced using the saved space. The maximum length of the user word for this construction, L^{Tal}_q(r), is

L^{Tal}_q(r) = (1/(1 − 2α)) · (q^r − 1)/(q − 1) − c₁(q, α)r − c₂(q, α),

where c₁ and c₂ are dependent on q and α, with α ∈ [0, 1/2). If only the balancing aspect is taken into account and not the compression aspect, then L^{Tal}_q(r) is equal to L^{Cap2}_q(r).

Pelusi et al. [15] have two constructions with parallel decoding. The first construction has balanced prefixes, similar to [14], and the maximum length of the user word for this construction, L^{Pel1}_q(r), is

L^{Pel1}_q(r) = ( N_q(r) − {(q mod 2) + [(q − 1)L^{Pel1}_q(r)] mod 2} ) / (q − 1).

The second construction is a refinement of the first, with prefixes that need not be balanced, and has a maximum user word length that is the same as L^{Cap1}_q(r).

Fig. 5. Maximum user word lengths as a function of r, for q = 3 and q = 16.

Table IV shows, for q= 3 and q= 5, the maximum length

of the user words as a function of rfor the schemes discussed

thus far. Fig. 5 graphically shows the maximum length of

the user words as a function of rfor q= 3 and q= 16.

Table V compares the schemes’ redundancies for practical user

word lengths. From this we can see that the new scheme has

maximum user lengths that are considerably longer than the

scheme from [14] and Scheme 1 from [15], comparable to that

of Scheme 1 from [11] and Scheme 2 from [15], but roughly

half that of Scheme 2 from [11].


TABLE V
REDUNDANCY COMPARISON FOR SPECIFIC INFORMATION LENGTHS

                          Redundancy, r
q   User word   [14]   Scheme 1   Scheme 2   Scheme 1   Our      Our scheme
    length             [11]       [11]       [15]       scheme   with ECC
3   64          7      5          4          6          5        12
3   128         7      6          5          7          6        13
3   256         8      6          6          8          7        14
3   512         9      7          6          8          7        16
3   1024        9      7          7          9          8        17
3   2048        10     8          7          10         8        18
3   4096        11     9          8          10         9        19
5   64          5      4          4          5          4        10
5   128         6      4          4          6          5        11
5   256         6      5          4          6          5        12
5   512         7      5          5          7          5        12
5   1024        7      6          5          7          6        13
5   2048        8      6          6          8          6        14
5   4096        8      7          6          8          7        15

This can further be illuminated by looking at the redundancies when the alphabet size becomes large. To proceed, we make use of Star's approximation [26] for N_q(r), given by

N_q(r) = q^r √(6/(πr(q² − 1))) · (1 + O(1/r)) ≈ q^r √(6/(πr(q² − 1))),

as r → ∞. Now, letting q → ∞ for the previous redundancies, we find

L_q(r) ≈ q^{r−1},
L^{Sw}_q(r) ≈ 1.38 q^{r−2}/√r,
L^{Cap1}_q(r) = L^{Pel2}_q(r) ≈ q^{r−1},
L^{Cap2}_q(r) ≈ 2q^{r−1},

and

L^{Pel1}_q(r) ≈ 1.38 q^{r−2}/√r,

confirming our earlier observation.
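Star's approximation can be checked numerically against the exact generating-function count; already at r = 14 the two agree to within about 1.4% for q = 3.

```python
import math

def N(q, r):
    """Exact N_q(r) via the polynomial (1 + x + ... + x^(q-1))^r."""
    coeffs = [1]
    for _ in range(r):
        new = [0] * (len(coeffs) + q - 1)
        for i, c in enumerate(coeffs):
            for j in range(q):
                new[i + j] += c
        coeffs = new
    return max(coeffs)

def star(q, r):
    """Star's approximation N_q(r) ≈ q^r sqrt(6 / (π r (q² − 1)))."""
    return q ** r * math.sqrt(6 / (math.pi * r * (q * q - 1)))

exact, approx = N(3, 14), star(3, 14)
assert exact == 616227                  # central trinomial coefficient
assert abs(approx / exact - 1) < 0.02   # within 2% already at r = 14
```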

For the redundancy of the balancing scheme with error correction in Section IV, the total redundancy is r = 2r* + 3, taking into account that we use the C* code twice with redundancy of r* for each, and add one redundant symbol for balancing and two more redundant check symbols. Furthermore, the maximum length of the check matrix for one C* code is n = q^{r*−1} − 1, since the one row is an all-ones row. The total length of the encoded word is 2(q^{r*−1} − 1) + 3. Let L^{ECC}_q(r) denote the maximum length of the user word; then it can be shown that for q odd

L^{ECC}_q(r) = 2(q^{r*−1} − 1) + 3 − r
             = 2q^{⌊(r−3)/2⌋−1} − r + 3 − 2
             = 2q^{⌊(r−5)/2⌋} − r + 1.

For q even, one more redundant symbol is added. The values for L^{ECC}_q(r) are also shown in Table IV, along with the error correction code rate based on these values, where

R^{ECC} = L^{ECC}_q(r) / (L^{ECC}_q(r) + r).

Again, for the binary case q = 2, since only the index v needs to be encoded and not the integer s, we find

L^{ECC}_2(r) = 2^{⌊(r−2)/2⌋} − r − 1.

B. Complexity

The main contributor to the time and space complexity in our balancing scheme is the ECC component: the vector-matrix multiplication affects the time complexity, and the size of the generator/parity-check matrices affects the space complexity. For the complexity analyses we use r₀ ≈ log_q k, which holds as k → ∞, leading to m ≈ k + log_q k + 1.

The time complexity for encoding consists of the error correction and balancing aspects. For the error correction, vector-matrix multiplication is needed, with the number of operations being k(m − 1) = km − k = k² + k log_q k, resulting in O(k²). Note that we have not considered a parallel vector-matrix multiplication implementation here. Using the method briefly described after Example 2 to find s and v results in complexity O((m + q) log_q m). This is obtained by first computing weight(x ⊕_q b_{0,0}), requiring O(m log_q m) digit operations (footnote 5). Next, the number of appearances of all q-ary symbols in x are computed and stored in a table, also requiring O(m log_q m) digit operations. If m_i denotes the number of appearances of symbol i in x, i ∈ Q, then for each 0 ≤ s ≤ q − 2, the weight of x ⊕_q b_{s+1,0} can be computed from the weight of x ⊕_q b_{s,0} using

weight(x ⊕_q b_{s+1,0}) = weight(x ⊕_q b_{s,0}) + m − q·m_{q−1−s},

which requires only O(log_q m) digit operations. Once an s is found such that weight(x ⊕_q b_{s,0}) ≤ Ω_{q,m} ≤ weight(x ⊕_q b_{s+1,0}), a v has to be found that balances the sequence. The final step of modulo q integration has complexity O(m). Taking all of these into account results in a time complexity for encoding of O(k² + q).

Footnote 5: Here we make use of the fact that arithmetic operations with numbers up to m(q − 1) require O(m log_q m) q-ary digit operations, assuming that m is larger than q.

The time complexity for decoding depends on modulo q differentiation with complexity O(m) and vector-matrix multiplication to find the syndrome, with the number of operations being r₀(m − 1) = r₀m − r₀ = k log_q k + log_q² k, finally resulting in O(k log_q k).

The space complexity for encoding mainly depends on the generator matrix of size (m − 1) × k. After substitution for m we have O(k²). Similarly, the space complexity for decoding mainly depends on the parity-check matrix of size (m − 1) × r₀, resulting in O(k log_q k).

TABLE VI
TIME/SPACE COMPLEXITY COMPARISON

                              [14]            Schemes 1 and 2   Scheme 1        Scheme 2          Our scheme
                                              [11]              [15]            [15]
Time complexity (encoding)    O(k log_q k)    O(qk log_q k)     O(qk log_q k)   O(k √(log_q k))   O(k² + q)
Time complexity (decoding)    O(1)            O(qk log_q k)     O(1)            O(1)              O(k log_q k)
Space complexity (encoding)   O(qk log_q k)   O(k + q)          O(qk log_q k)   O(qk log_q k)     O(k²)
Space complexity (decoding)   O(qk log_q k)   O(k + q)          O(qk log_q k)   O(qk log_q k)     O(k log_q k)

In Table VI we compare the time and space complexity of our balancing scheme, as derived above, with those of previous schemes. In terms of encoding time complexity, the new scheme generally takes longer than the previous ones, except in the special case of large q and short k. Scheme 2 [15] is the most efficient, regardless of q and k. Note that we listed a reduced encoding time complexity for [14], since the same fast encoding method for balancing as described earlier can be used. Our decoding time complexity is a factor of q less than that of the previous schemes with time complexity O(qk log_q k). However, it cannot compete with the schemes that have parallel decoding with O(1). Again, it should be pointed out that one could improve the time complexity of our new scheme if a parallel implementation of the vector-matrix multiplication is considered, although this could potentially increase the space complexity.
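The O(log_q m) weight update used in the encoding complexity argument can be sketched as follows. We assume here, for illustration, that b_{s,0} is the all-s sequence of length m (b_{s,v} is defined earlier in the paper, outside this excerpt), so that x ⊕_q b_{s,0} adds s to every symbol modulo q: in the step from s to s + 1 every symbol increases by 1 except those currently equal to q − 1 (i.e. the symbols of x equal to q − 1 − s), which wrap to 0, giving the correction term −q·m_{q−1−s}.

```python
import random

random.seed(7)
q, m = 5, 64
x = [random.randrange(q) for _ in range(m)]

counts = [x.count(i) for i in range(q)]   # m_i: occurrences of symbol i in x

def weight(v):
    return sum(v)

w = weight(x)                             # weight(x ⊕_q b_{0,0}), s = 0
for s in range(q - 1):
    # incremental update, O(log_q m) digit operations per candidate s:
    w = w + m - q * counts[q - 1 - s]
    # compare against the O(m log_q m) direct recomputation:
    direct = weight([(xi + s + 1) % q for xi in x])
    assert w == direct
```

Each candidate shift s is thus tested in constant arithmetic time, which is where the O((m + q) log_q m) total for the balancing search comes from.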

C. Performance

We look at the performance of four codes:
• R = 10/19 code with q = 3: an (8,5) linear block code is used, giving a (16,10) code after interleaving; after balancing and adding the extra two check symbols this becomes (19,10).
• R = 44/55 code with q = 3: a (26,22) linear block code is used.
• R = 4/11 code with q = 5: a (4,2) linear block code is used. (This is the code used in Example 3.)
• R = 12/21 code with q = 5: a (9,6) linear block code is used (footnote 6).

In general, if we start with an (n, k) code C*, then the overall rate of the scheme will be R = 2k/(2n + 3).

Footnote 6: This is an example of a code with a shorter user word length than the maximum attainable, which according to Table IV is 42.
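The four rates follow directly from R = 2k/(2n + 3):

```python
from fractions import Fraction

def rate(n, k):
    """Overall rate R = 2k / (2n + 3) for an (n, k) component code C*."""
    return Fraction(2 * k, 2 * n + 3)

assert rate(8, 5) == Fraction(10, 19)    # q = 3
assert rate(26, 22) == Fraction(44, 55)  # q = 3
assert rate(4, 2) == Fraction(4, 11)     # q = 5, the Example 3 code
assert rate(9, 6) == Fraction(12, 21)    # q = 5
```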

Fig. 6. Probability of decoding failure as a function of the symbol error rate, p_s.

In the ﬁgures we also compare the results of the previ-

ous exhaustive decoding algorithm (from [19]) with the fast

syndrome-based decoding presented here.

Fig. 6 shows the probability that a decoding failure occurred. This can happen when:
• the imbalance indicates that multiple errors possibly occurred, i.e. the imbalance size is greater than the maximum symbol size, or an imbalance occurred in both the codeword and the check symbols; or
• checks during the decoding indicated that decoding was unsuccessful, i.e. the syndrome indicated invalid positions, correction would have resulted in an invalid channel symbol, or correction proceeded but the syndromes are still non-zero.

Fig. 7 shows the symbol error rate after decoding for these

four codes. If a decoding failure occurred, the information

was discarded, and was not taken into account in the symbol

error rate calculations. Typically this would be applicable in

an ARQ system where the information would be requested

again.

These ﬁgures show that the same performance is attained

by the fast syndrome-based decoding method compared to

the previous exhaustive decoding method presented in [19].

However, it should be noted that the small difference in

performance between the two decoders is because in the

exhaustive method, decoding was performed even though the

check symbols could not determine the even or odd position

of the channel error, by iterating through all positions. In some

instances this resulted in two possible positions where the

error could be corrected, one correct and one incorrect. In

these cases the first possible position was used for correction.

Fig. 7. Decoding symbol error rate.

Fig. 8. Average time to perform one decoding operation.

However, this situation only occurs when more than one

channel error occurred, and is regarded as a decoding failure

in the fast syndrome-based decoder. If the exhaustive decoder

is changed to do the same, the exact same performance is

attained by both decoders.

Finally, we look at the time it takes for the algorithms

to decode. These results should be seen as comparative, as

different times would be obtained when the same simulations

are run on different computer hardware or using a different

programming language. Fig. 8 shows the average time that

the algorithms spend on a decoding operation (speciﬁcally

Steps 4 to 8 in the syndrome decoding algorithm above, and

the corresponding steps in the exhaustive algorithm). It is

clear that for the fast syndrome-based algorithm the average

decoding time is approximately the same, regardless of the

symbol error rate and the length of the code. As one would

expect, longer lengths cause the exhaustive algorithm to spend

considerably more time on decoding.

VI. DISCUSSIONS

Although our new balancing scheme does not improve in terms of redundancy or complexity compared to other balancing schemes that do not need lookup tables (such as the two schemes in [11]), it provides us with an error-correcting framework that can be extended to include error correction of channel errors, as was done in Section IV for single errors.

Further improvements are possible if some of the known

tEC-AUED codes are employed, provided that one can adhere

to the parameter constraints (e.g. length of code, alphabet size

of code, etc.) of the chosen code. An immediate extension

would be to employ some of the known tEC-AUED codes

with t > 1, to be able to correct more than one channel error.

Alternatively, a tEC-AUED code with t= 2 could be used to

avoid the use of interleaving.

The most promising and flexible option appears to be [24], where the proposed codes can correct t₁ asymmetric errors of maximum magnitude l₁ and t₂ asymmetric errors of maximum magnitude l₂, with l₁ < l₂. The authors state that the "model can be naturally generalized to a wider range of magnitudes as well as for errors in both directions". Provided that the code is generalized for errors in both directions for the larger magnitude errors, our code in Section IV can be replaced by such a code with t₁ = t₂ = 1, l₁ = 1 and l₂ = q − 1. Again, to avoid interleaving we can use a similar code with t₁ = 1, t₂ = 2, l₁ = 1 and l₂ = q − 1. Either of these options can be extended to correct multiple channel errors by increasing the value of t₂.

Lower redundancies for the error-correcting balancing scheme described can easily be attained by adding more columns to the check matrix, where the first non-zero element from the bottom is one, e.g.

H*_{3,3} = [ 1 2 0 1 2 0 1 2 0 1 2 1 ]
           [ 0 0 1 1 1 2 2 2 1 1 1 0 ]
           [ 1 1 1 1 1 1 1 1 0 0 0 0 ],

where the last four columns were added. However, this will increase the complexity of the decoding algorithm slightly, as the error states cannot be determined simply by looking at s_{r*} and s'_{r*}.
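The added-column condition is easy to verify mechanically for the H*_{3,3} shown above: in every column, the first non-zero element scanning from the bottom row equals one.

```python
# H*_{3,3} from the text, one list per row; the last four columns are the
# added ones.
H = [
    [1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 1],
    [0, 0, 1, 1, 1, 2, 2, 2, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
]

def first_nonzero_from_bottom(col):
    """First non-zero entry of a column, scanning from the bottom row up."""
    for entry in reversed(col):
        if entry != 0:
            return entry

columns = list(zip(*H))
assert all(first_nonzero_from_bottom(c) == 1 for c in columns)
assert columns[-4:] == [(0, 1, 0), (1, 1, 0), (2, 1, 0), (1, 0, 0)]
```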

VII. CONCLUSIONS

We have presented a simple method for balancing q-ary codewords, where look-up tables for coding and decoding the prefix can be avoided by making use of an error correction technique. The redundancy of the new construction is comparable to that of other constructions for certain parameters, but a factor away in other cases.

The method was expanded to include error correction ca-

pabilities to correct single channel errors by simply extending

the already used error correction code, introducing interleav-

ing and adding a further three redundant symbols. A fast

syndrome-based decoding algorithm was presented that can

correct single channel errors more quickly than the prior art

exhaustive decoding algorithm.

Although simultaneous balancing and error correction have

been investigated before, this is the ﬁrst attempt to closely tie

the two operations together. By doing this we have established

a balancing scheme within an error correction framework

that can easily be extended in future to account for multiple

channel errors.


ACKNOWLEDGMENTS

The authors would like to extend their gratitude towards the anonymous reviewers for their insightful comments and critiques that improved this paper.

REFERENCES

[1] K. A. S. Immink, “Coding methods for high-density optical recording,”

Philips J. Res., vol. 41, pp. 410–430, 1986.

[2] K. W. Cattermole, “Principles of digital line coding,” Int. J. Electron.,

vol. 55, pp. 3–33, July 1983.

[3] H. Zhou, A. Jiang, and J. Bruck, “Balanced modulation for nonvolatile

memories,” IEEE Trans. Inform. Theory, submitted for publication.

Available: http://arxiv.org/abs/1209.0744

[4] D. E. Knuth, “Efﬁcient balanced codes,” IEEE Trans. Inform. Theory,

vol. 32, no. 1, pp. 51–53, Jan. 1986.

[5] S. Al-Bassam and B. Bose, “On balanced codes,” IEEE Trans. Inform.

Theory, vol. 36, no. 2, pp. 406–408, Mar. 1990.

[6] L. G. Tallini, R. M. Capocelli, and B. Bose, “Design of some new

efﬁcient balanced codes,” IEEE Trans. Inform. Theory, vol. 42, no. 3,

pp. 790–802, May 1996.

[7] J. H. Weber and K. A. S. Immink, “Knuth’s balanced codes revisited,”

IEEE Trans. Inform. Theory, vol. 56, no. 4, pp. 1673–1679, Apr. 2010.

[8] H. van Tilborg and M. Blaum, “On error-correcting balanced codes,”

IEEE Trans. Inf. Theory, vol. 35, no. 5, pp. 1091–1095, Sep. 1989.

[9] S. Al-Bassam and B. Bose, “Design of efﬁcient error-correcting balanced

codes,” IEEE Trans. Comput., vol. 42, no. 10, pp. 1261–1266, Oct. 1993.

[10] J. H. Weber, K. A. S. Immink, and H. C. Ferreira, “Error-correcting

balanced Knuth codes,” IEEE Trans. Inform. Theory, vol. 58, no. 1, pp.

82–89, Jan. 2012.

[11] R. M. Capocelli, L. Gargano, and U. Vaccaro, “Efficient q-ary immutable

codes,” Discrete Appl. Math., vol. 33, pp. 25–41, 1991.

[12] L. G. Tallini and U. Vaccaro, “Efﬁcient m-ary balanced codes,” Discrete

Appl. Math., vol. 92, pp. 17–56, 1999.

[13] S. Al-Bassam, “Balanced codes,” Ph.D. dissertation, Oregon State University, USA, 1990.

[14] T. G. Swart and J. H. Weber, “Efﬁcient balancing of q-ary sequences

with parallel decoding,” in Proc. IEEE Intl. Symp. Inform. Theory, Seoul,

South Korea, Jun. 29–Jul. 3, 2009, pp. 1564–1568.

[15] D. Pelusi, S. Elmougy, L. G. Tallini and B. Bose, “m-ary balanced codes

with parallel decoding,” IEEE Trans. Inform. Theory, vol. 61, no. 6, pp.

3251–3264, May 2015.

[16] A. Baliga and S. Boztas¸, “Balancing sets of non-binary vectors,” in Proc.

IEEE Intl. Symp. Inform. Theory, Lausanne, Switzerland, Jun. 30–Jul.

5, 2002, p. 300.

[17] R. Mascella, L. G. Tallini, S. Al-Bassam and B. Bose, “On efﬁcient

balanced codes over the mth roots of unity,” IEEE Trans. Inform.

Theory, vol. 52, no. 5, pp. 2214–2217, May 2006.

[18] R. Mascella and L. G. Tallini, “Efﬁcient m-ary balanced codes which

are invariant under symbol permutation,” IEEE Trans. Comput., vol. 55,

no. 8, pp. 929–946, Aug. 2006.

[19] T. G. Swart and K. A. S. Immink, “Preﬁxless q-ary balanced codes with

ECC,” in Proc. IEEE Inform. Theory Workshop, Seville, Spain, Sep. 9–

13, pp. 1–5.

[20] R. W. Hamming, Coding and Information Theory, Prentice-Hall, Engle-

wood Cliffs, 1986.

[21] F.-W. Fu, S. Ling and C. Xing, “Constructions for nonbinary codes

correcting tsymmetric errors and detecting all unidirectional errors:

Magnitude error criterion,” Progress in Comput. Sci. and Applied Logic,

vol. 23, pp. 139–152, 2004.

[22] I. Naydenova and T. Kløve, “Some optimal binary and ternary t-EC-

AUED codes,” IEEE Trans. Inform. Theory, vol. 55, no. 11, pp. 4898–

4904, Nov. 2009.

[23] T. Kløve, B. Bose and N. Elarief, “Systematic, single limited magnitude

error correcting codes for ﬂash memories,” IEEE Trans. Inform. Theory,

vol. 57, no. 7, pp. 4477–4487, Jun. 2011.

[24] E. Yaakobi, P. H. Siegel, A. Vardy and J. K. Wolf, “On codes that correct

asymmetric errors with graded magnitude distribution,” in Proc. IEEE

Intl. Symp. Inform. Theory, Saint-Petersburg, Russia, Jul. 31–Aug. 5,

2011, pp. 1056–1060.

[25] N. J. A. Sloane, ed., “The on-line encyclopedia of integer sequences,”

2005. [Online]. Available: https://oeis.org

[26] Z. Star, “An asymptotic formula in the theory of compositions,” Aequa-

tiones Mathematicae, vol. 13, pp. 279–284, 1975.

Theo G. Swart (M’05-SM’14) received the B.Eng. and M.Eng. degrees in

electrical and electronic engineering from the Rand Afrikaans University,

South Africa, in 1999 and 2001, respectively, and the D.Eng. degree from

the University of Johannesburg, South Africa, in 2006.

He is an associate professor in the Department of Electrical and Electronic

Engineering Science and a member of the UJ Center for Telecommunications.

He is the chair of the IEEE South Africa Chapter on Information Theory.

His research interests include digital communications, error-correction coding,

constrained coding and power-line communications.

Jos H. Weber (S’87-M’90-SM’00) was born in Schiedam, The Netherlands,

in 1961. He received the M.Sc. (in mathematics, with honors), Ph.D., and

MBT (Master of Business Telecommunications) degrees from Delft University

of Technology, Delft, The Netherlands, in 1985, 1989, and 1996, respectively.

Since 1985 he has been with the Faculty of Electrical Engineering, Math-

ematics, and Computer Science of Delft University of Technology. Currently,

he is an associate professor in the Department of Applied Mathematics.

He is the chairman of the WIC (Werkgemeenschap voor Informatie- en

Communicatietheorie in the Benelux) and the secretary of the IEEE Benelux

Chapter on Information Theory. He was a Visiting Researcher at the University

of California at Davis, USA, the University of Johannesburg, South Africa,

the Tokyo Institute of Technology, Japan, and EPFL, Switzerland. His main

research interests are in the area of channel coding.

Kees A. Schouhamer Immink (M’81-SM’86-F’90) received his Ph.D. degree

from the Eindhoven University of Technology. He was from 1994 till 2014

an adjunct professor at the Institute for Experimental Mathematics, Essen,

Germany. In 1998, he founded Turing Machines Inc., an innovative start-up

focused on novel signal processing for hard disk drives and solid-state (Flash)

memories.

He received the Golden Jubilee Award for Technological Innovation by the

IEEE Information Theory Society in 1998. He received the 2017 IEEE Medal

of Honor, a knighthood in 2000, a personal Emmy award in 2004, and the

2015 IET Faraday Medal. He was elected into the (US) National Academy

of Engineering, and he received an honorary doctorate from the University of

Johannesburg in 2014.