Content uploaded by Hendrik Christoffel Ferreira on Mar 24, 2014 (author content; may be subject to copyright).


82 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 58, NO. 1, JANUARY 2012

Error-Correcting Balanced Knuth Codes

Jos H. Weber, Senior Member, IEEE, Kees A. Schouhamer Immink, Fellow, IEEE, and

Hendrik C. Ferreira, Senior Member, IEEE

Abstract—Knuth's celebrated balancing method consists of inverting the first z bits in a binary information sequence, such that the resulting sequence has as many ones as zeroes, and communicating the index z to the receiver through a short balanced prefix. In the proposed method, Knuth's scheme is extended with error-correcting capabilities, where unequal protection levels may be given to the prefix and the payload. The proposed scheme is very general in the sense that any error-correcting block code may be used for the protection of the payload. Analyses with respect to redundancy and block and bit error probabilities are performed, showing good results while maintaining the simplicity features of the original scheme. It is shown that the Hamming distance of the code is of minor importance with respect to the error probability.

I. INTRODUCTION

Sets of binary sequences that have a fixed length n and a fixed weight (number of ones) w are usually called constant-weight codes. An important sub-class is formed by the so-called balanced codes, for which n is even and w = n/2, i.e., all codewords have as many zeroes as ones. Such codes have found application in various transmission and (optical/magnetic) recording systems, e.g., in flash memories [11]. A survey on balanced codes can be found in [4].

A simple method for generating balanced codewords, which is capable of encoding and decoding (very) large blocks, was proposed by Knuth [6] in 1986. In his method, an m-bit binary data word, m even, is forwarded to the encoder. The encoder inverts the first z bits of the data word, where z is chosen in such a way that the modified word has equal numbers of zeroes and ones. Knuth showed that such an index z can always be found. The index z is represented by a balanced word of length p. The p-bit prefix word followed by the modified m-bit data word are both transmitted, so that the rate of the code is m/(m + p). The receiver can easily undo the inversion of the first z bits received once z is computed from the prefix. Neither encoder nor decoder requires large look-up tables, and Knuth's algorithm is, therefore, very attractive for constructing long balanced codewords. The redundancy of Knuth's method is roughly twice the redundancy of a code which uses the full set of balanced words. Since the latter has a prohibitively high complexity in case of large lengths, the factor of two can be considered as a price to be paid for simplicity. In [5] and [10], modifications to Knuth's method are presented closing this gap while maintaining sufficient simplicity.

Manuscript received January 05, 2011; revised July 11, 2011; accepted July 20, 2011. Date of publication September 15, 2011; date of current version January 06, 2012. This work was supported by Grant "Theory and Practice of Coding and Cryptography," Award Number NRF-CRP2-2007-03.
J. H. Weber is with the Delft University of Technology, The Netherlands (e-mail: j.h.weber@tudelft.nl).
K. A. S. Immink is with Turing Machines BV, The Netherlands, and also with Nanyang Technological University, Singapore (e-mail: immink@turing-machines.com).
H. C. Ferreira is with the University of Johannesburg, Johannesburg, South Africa (e-mail: hcferreira@uj.ac.za).
Communicated by M. Blaum, Associate Editor for Coding Techniques.
Digital Object Identifier 10.1109/TIT.2011.2167954
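Knuth's balancing step is simple enough to sketch in a few lines. The following Python is an illustrative sketch only (the function name and the list-of-bits representation are ours, not from the paper):

```python
def balance(data):
    """Return (z, balanced_word): invert the first z bits of `data`
    so that the result has as many ones as zeroes.

    Knuth's existence argument: as z runs from 0 to len(data), the weight
    of the modified word changes by +/-1 per step and moves from w to
    len(data) - w, so it must pass through len(data)//2 somewhere.
    """
    m = len(data)
    assert m % 2 == 0, "Knuth's method needs an even length"
    word = list(data)
    for z in range(m + 1):
        if sum(word) == m // 2:   # balanced: as many ones as zeroes
            return z, word
        if z < m:
            word[z] ^= 1          # invert the next bit
    raise AssertionError("unreachable: a balancing index always exists")

z, w = balance([1, 1, 1, 0, 1, 0, 0, 0])   # already balanced, so z == 0
assert z == 0 and sum(w) == 4
```

The receiver undoes the balancing by simply inverting the first z bits again once z has been recovered from the prefix.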

Knuth's method does not provide protection against errors which may occur during transmission or storage. Actually, errors in the prefix may lead to catastrophic error propagation in the data word. Here, we propose and analyze a method to extend Knuth's original scheme with error-correcting capabilities. Previous constructions for error-correcting balanced codes were given in [2], [9], and [8]. In [9], van Tilborg and Blaum introduced the idea to consider short balanced blocks as symbols of an alphabet and to construct error-correcting codes over that alphabet. Only moderate rates can be achieved by this method, but it has the advantage of limiting the digital sum variation and the runlengths. In [2], Al-Bassam and Bose constructed balanced codes correcting a single error, which can be extended to codes correcting up to two, three, or four errors by concatenation techniques. In [8], Mazumdar, Roth, and Vontobel considered linear balancing sets and applied such sets to obtain error-correcting coding schemes in which the codewords are balanced. In the method proposed in the current paper, we stay very close to the original Knuth algorithm. Hence, we only operate in the binary field and inherit the low-complexity features of Knuth's method. In our method, the error-correcting capability can be any number. The focus will be on long codes, for which table look-up methods are unfeasible. An additional feature is the possibility to assign different error protection levels to the prefix and the payload, which will be shown to be useful when designing the scheme to achieve a certain required error performance while optimizing the rate. In this context, it turns out that the Hamming distance of the proposed code is of minor importance.

The rest of this paper is organized as follows. In Section II, the proposed method for providing balancing and error-correcting capabilities is presented. In Section III, the redundancy of the new scheme is considered. The block and bit error probabilities are analyzed in Section IV. Case studies are presented in Section V. Finally, the results of this paper are discussed in Section VI.

II. CONSTRUCTION METHOD

The proposed construction method is based on a combination of conventional error correction techniques and Knuth's method for obtaining balanced words. The encoding procedure consists of four steps, which are described below and illustrated in Fig. 1. The input to the encoder is a binary data block u of length k. Let b^r denote a run of r bits b, e.g., 0^3 = 000.

Fig. 1. Encoding procedure.

1) Encode u using a binary linear block code C1 of dimension k, Hamming distance d1, and even length m. The encoding function is denoted by c = φ1(u).
2) Find a balancing index z for the obtained codeword c, with 0 ≤ z ≤ m − 1.
3) Invert the first z bits of c, resulting in the balanced word c_z.
4) Encode the number z into a unique codeword from a binary code C2 of even length p, constant weight p/2, and Hamming distance d2. The encoding function is denoted by φ2(z).

0018-9448/$26.00 © 2011 IEEE

The output of the encoder is the concatenation of the balanced word φ2(z), called the prefix, and the balanced word c_z, called the bulk or payload. It is obvious that the resulting code C is balanced and has length m + p and redundancy m − k + p, and thus, code rate k/(m + p) and normalized redundancy (m − k + p)/(m + p). Its Hamming distance d satisfies the following lower bound.

Theorem 1: The Hamming distance d of code C is at least min{2⌈d1/2⌉, 2⌈d2/2⌉}.

Proof: Let (s, x) and (s′, x′) denote two different codewords of C, with prefixes s = φ2(z) and s′ = φ2(z′). If z = z′, then the Hamming distance between the codewords is at least d1, since the words obtained from x and x′ by undoing the inversion of the first z bits are both in C1; moreover, this distance is even, since x and x′ are both balanced, so it is at least 2⌈d1/2⌉. If z ≠ z′, then the Hamming distance between the codewords is at least d2, which follows from the fact that s and s′ are two different codewords from C2, implying that d(s, s′) ≥ d2; since d(s, s′) is even (s and s′ have equal weight) and d(x, x′) is even (x and x′ are both balanced), the total distance is an even number, and thus at least 2⌈d2/2⌉.

Corollary 1: In order to make C capable of correcting up to t errors, it suffices to choose constituent codes C1 and C2 with distances d1 ≥ 2t + 1 and d2 ≥ 2t + 2, respectively.

Upon receipt of a sequence (r2, r1), where r2 and r1 have lengths p and m, respectively, a simple decoding procedure, illustrated in Fig. 2, consists of the following steps.
1) Look for a codeword ŝ in C2 which is closest to r2, and set ẑ = φ2^(−1)(ŝ).
2) Invert the first ẑ bits in r1, i.e., set r1′ equal to r1 with its first ẑ bits inverted.
3) Decode r1′ according to a decoding algorithm for code C1, leading to an estimated codeword ĉ, and thus, to an estimated information block û = φ1^(−1)(ĉ).

Fig. 2. Decoding procedure.
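The decoding steps can be sketched in the same illustrative style as the encoder above (nearest-codeword decoding of the prefix by exhaustive search, and an abstract decoder for C1 passed in as a function; names are ours, not the paper's):

```python
def decode(r, prefix_book, ecc_decode, p):
    """Steps 1-3 of the decoding procedure: recover the balancing index
    from the prefix, undo the inversion, then run C1's decoder."""
    r2, r1 = r[:p], r[p:]
    # step 1: nearest-codeword decoding of the prefix
    dist = lambda a, b: sum(x != y for x, y in zip(a, b))
    z_hat = min(range(len(prefix_book)),
                key=lambda i: dist(prefix_book[i], r2))
    # step 2: undo the inversion of the first z_hat bits
    r1 = [b ^ 1 if i < z_hat else b for i, b in enumerate(r1)]
    # step 3: decode with C1's decoder to get the data estimate
    return ecc_decode(r1)
```

Note that a wrong estimate ẑ makes step 2 introduce |z − ẑ| extra errors in the bulk, which is exactly the error-propagation effect analyzed in Section IV.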

The following results are immediate.

Theorem 2: The proposed decoding procedure for code C corrects any error pattern with at most t1 = ⌊(d1 − 1)/2⌋ errors in the last m bits and at most t2 = ⌊(d2 − 1)/2⌋ errors in the first p bits.

Corollary 2: The proposed decoding procedure for code C corrects up to min{t1, t2} errors, which is the number of errors guaranteed by the Hamming distance result from Theorem 1.

Corollary 3: The proposed decoding procedure for code C corrects up to t errors if t1 ≥ t and t2 ≥ t.

Typically, as suggested in Figs. 1 and 2, the length m of the bulk is much larger than the length p of the prefix, and consequently d1 is usually (much) larger than d2. Hence, from the results presented in this section, the final Hamming distance is only d2 in such cases, which may be very small with respect to the overall length m + p. However, it will be shown in Section IV that, from the perspective of achieving a certain target decoding error probability, the overall Hamming distance of our scheme is a parameter which is only of minor importance.

III. REDUNDANCY

In this section, we first derive a lower bound on the redundancy of balanced codes with error-correcting capabilities. Then, we compare the redundancy of the code from Section II with this lower bound.

Let A(n, w, d) denote the maximum cardinality of a code of length n, constant weight w, and even Hamming distance d. Hence, for any balanced code of even length n and Hamming distance d, the redundancy is at least

r_min(n, d) = n − log2 A(n, n/2, d).   (1)

Since

A(n, n/2, 2) = C(n, n/2),   (2)

the minimum redundancy for a balanced code without error correction capabilities [6] is

r_min(n, 2) = n − log2 C(n, n/2)   (3)
            ≈ (1/2) log2 (πn/2)   (4)
            ≈ (1/2) log2 n,   (5)

where the first approximation is due to the well-known Stirling formula

n! ≈ √(2πn) (n/e)^n.   (6)

No general expression for A(n, n/2, d) is known, but bounds are available in the literature. From Theorem 12 in [1], we have the upper bound

A(n, w, 2δ) ≤ C(n, w − δ + 1) / C(w, w − δ + 1),   (7)

where δ = d/2. Note that for δ = 1, i.e., d = 2, this gives the same expression as (2), and thus, the bound is tight in this case. The upper bound (7) can be used to lower bound the minimum redundancy r_min(n, d) from (1) in case d ≥ 4, i.e., r_min(n, d) is at least

n − log2 [C(n, n/2 − δ + 1) / C(n/2, n/2 − δ + 1)].   (8)

Note that this expression can be decomposed into a contribution n − log2 C(n, n/2) from the balance property and a contribution from the capability of correcting up to δ − 1 errors. Using Stirling's formula, we obtain the approximation

r_min(n, 2δ) ⪆ (1/2) log2 (πn/2) + (δ − 1) log2 (n/2) − log2 ((δ − 1)!).   (9)
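As a quick numerical check of the balance-only case, the exact minimum redundancy n − log2 C(n, n/2) of (3) can be compared with the Stirling-based approximation (1/2) log2(πn/2) of (4); this snippet is an illustration we add here, not part of the paper:

```python
from math import comb, log2, pi

def r_min_balanced(n):
    """Exact minimum redundancy n - log2(C(n, n/2)) of a balanced code
    without error correction, cf. (3)."""
    return n - log2(comb(n, n // 2))

def r_min_approx(n):
    """Stirling-based approximation 0.5 * log2(pi*n/2), cf. (4)."""
    return 0.5 * log2(pi * n / 2)

# The approximation is within a few hundredths of a bit already at n = 100,
# and the redundancy grows only logarithmically in the block length.
values = {n: (r_min_balanced(n), r_min_approx(n)) for n in (100, 1000, 10000)}
```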

Next, we will investigate the difference between the lower bound on the redundancy, of which it is unknown whether it is achievable in general, and the redundancy of the proposed construction method. We know from [6] that the redundancy of the Knuth scheme, without error correction, exceeds the minimum achievable redundancy by a factor of two. Obviously, this is a price to be paid for simplicity. For the proposed method with correction, the redundancy is equal to the sum of the redundancy m − k of code C1 and the length p of the prefix. For neither of these terms is a general expression available. Another complication in the analysis is that the error correction levels t1 of C1 and t2 of C2 may be different. The value of m − k depends on the choice of C1. For example, for BCH codes it is roughly t1 log2 m [7]. The length of the prefix can be decomposed into two parts: a contribution of length log2 m identifying the balancing index and a contribution, based on the presented bound (9), providing the error correction and balancing properties to the prefix. Hence, assuming m − k ≈ t1 log2 m, as for BCH codes, the total redundancy m − k + p of the proposed method can be approximated as

(t1 + 1) log2 m + (1/2) log2 (πp/2) + t2 log2 (p/2).   (10)

Further assuming that t1 = t2 = t and that the length is very large, the comparison of this expression to (9) gives that the redundancy of our method is roughly a factor of (2t + 2)/(2t + 1) higher than (the lower bound on) the minimum redundancy for any t-error-correcting balanced code. Although the results presented in this analysis are based on bounds and approximations rather than exact results, it seems safe to conclude that the redundancy of the presented method is within a factor of two of the optimum. This will be illustrated in Section V. The redundancy of the codes presented in [2] is (slightly) lower, but the method presented here is simpler and more general (since the constructions from [2] are for small numbers of errors only). The constructions from [9] are of a completely different nature, with much higher redundancies but balancing being established on a very small block scale.

IV. ERROR PROBABILITY ANALYSIS

Since the prefix is considerably shorter than the bulk, the probability of the prefix being hit by a random error is proportionally smaller. Therefore, it may be considered to give the prefix a lower error-correcting capability. On the other hand, uncorrectable errors in the prefix will lead to a wrong balancing index and may thus lead to a huge number of errors in the bulk. Therefore, it may be worthwhile to invest in some extra protection of the prefix. Hence, determining the error correction levels of the codes C1 and C2 is a delicate issue. This will be investigated from both the block error probability and the bit error probability perspectives, in Sections IV-A and IV-B, respectively. Throughout the analysis, we will assume a memoryless binary symmetric channel with error probability q.

A. Block Error Probability

In this subsection, we will investigate proper choices of the error correction capabilities in the context of the block error probability P_block, which is defined as the probability that the decoding result û is different from the original information block u.

An often-used general expression for the block error probability for a block code of length n and Hamming distance d, thus correcting up to t = ⌊(d − 1)/2⌋ errors, is

P_block = Σ_{i=t+1}^{n} C(n, i) q^i (1 − q)^{n−i}.   (11)

Actually, this is an upper bound, since error correction might also take place in case more than t errors occur. To which extent this could happen depends on the structure of the code and the implementation of its decoder. However, this effect will be neglected throughout this paper. In other words, the performance analysis reflects a worst-case scenario: the real error probabilities may be (a little bit) better. A well-known approximation of (11) is obtained by considering only its first term

P_block ≈ C(n, t + 1) q^{t+1} (1 − q)^{n−t−1},   (12)

since it dominates all other terms. A further simplification is obtained by ignoring the factor (1 − q)^{n−t−1} in (12), leading to

P_block ≈ C(n, t + 1) q^{t+1}.   (13)

The smaller q, the better (12) and (13) approximate (11).

Since the scheme presented in Section II is multistep, the evaluation is a bit more involved. For a received word (r2, r1) of length m + p, let e1 and e2 be the number of errors in the last m bits (the bulk) and the first p bits (the prefix), respectively. We consider various situations.
• If e1 ≤ t1 and e2 ≤ t2, then correction of all errors is guaranteed.
• If e1 > t1 and e2 ≤ t2, then the balancing index is correctly retrieved from the prefix, but the number of errors in the bulk is beyond the error-correcting capability of C1, thus leading to a wrong decoding result.
• If e2 > t2, then the balancing index retrieved from the prefix is wrong, i.e., ẑ ≠ z, leading to the introduction of |z − ẑ| extra errors in the bulk due to the inversion process in step 2 of the decoding procedure. Assuming e1 = 0 for the moment, successful decoding requires |z − ẑ| ≤ t1. Since, typically, the bulk length m (and thus, the range of z-values) is very large and the error-correcting capability t1 is very small, the probability of the C1-decoder still being successful is negligible. If e1 > 0, then the successful interval will shrink even further, except when a 'transmission error' coincides with an 'inversion error'. Still, even in the latter case, chances for correct decoding are very low. It could be helpful to design φ2, the mapping of the z-values to codewords in C2, in such a way that distances between codewords corresponding to subsequent z-values are kept as low as possible, but also the impact of this will be limited. In conclusion, we assume that e2 > t2 implies a wrong decoding result with high probability.

Hence, we approximate the block error probability by

P_block ≈ 1 − (1 − P1)(1 − P2)   (14)
        = P1 + P2 − P1 P2   (15)
        ≈ P1 + P2,   (16)

where

P1 = Σ_{i=t1+1}^{m} C(m, i) q^i (1 − q)^{m−i}   (17)
   ≈ C(m, t1 + 1) q^{t1+1}   (18)

and

P2 = Σ_{i=t2+1}^{p} C(p, i) q^i (1 − q)^{p−i}   (19)
   ≈ C(p, t2 + 1) q^{t2+1}.   (20)

The approximations in (18) and (20) are only valid if P1 and P2, respectively, are sufficiently small, which we will assume throughout the rest of this section.

When designing the scheme such that the redundancy is minimized while achieving a certain Hamming distance, it follows from the results in Section II that the Hamming distances of the constituent codes should be chosen (about) equal, i.e., d1 ≈ d2. However, since the prefix is mostly much shorter than the bulk, the chance of the bulk being hit by errors is much larger than for the prefix. Therefore, intuitively, it should be beneficial to choose t2 smaller than t1. Indeed, when the focus is on the block error probability rather than the Hamming distance, this turns out to be true. When designing the scheme to achieve a certain target block error probability at maximum rate for a given data length and channel error probability, the focus should not be on the final Hamming distance d, but on a careful choice of the error correction capabilities t1 and t2, where the latter can typically be smaller than the former. Since it follows from (16), (18), and (20) that P_block is approximately

P_block ≈ C(m, t1 + 1) q^{t1+1} + C(p, t2 + 1) q^{t2+1},   (21)

an appropriate design strategy is choosing t1 and t2 such that both terms in (21) are in the same order of magnitude as the target block error probability, while their sum is (just) below this target. Note that if p ≪ m, which is usually the case, the binomial coefficient in the first term in (21) is huge in comparison to the binomial coefficient in the second term, and, therefore, the exponent of q in the second term can be considerably smaller than the exponent in the first term.
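The design strategy based on (21) can be sketched as a small search. This is our illustrative sketch, not code from the paper; the dominant-term approximations (18) and (20) are used, and the example parameters are hypothetical:

```python
from math import comb

def p1_approx(m, t1, q):
    """First term of (21): dominant-term estimate (18) of the probability
    of more than t1 channel errors hitting the length-m bulk."""
    return comb(m, t1 + 1) * q ** (t1 + 1)

def p2_approx(p, t2, q):
    """Second term of (21): dominant-term estimate (20) for the
    length-p prefix."""
    return comb(p, t2 + 1) * q ** (t2 + 1)

def smallest_t(n, q, target, approx):
    """Smallest error-correcting capability whose dominant-term
    estimate meets the target probability."""
    t = 0
    while approx(n, t, q) > target:
        t += 1
    return t
```

With a long bulk and a short prefix (say m = 1000, p = 20), the bulk needs a noticeably larger capability than the prefix for the same per-term target, which is exactly the "t2 smaller than t1" effect described above.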

If t1 and t2 are substantially smaller than p, then a more detailed general analysis is possible. To this end, we evaluate P2/P1 for the case t1 = t2 = t. Using (18) and (20), we find

P2/P1 ≈ C(p, t + 1) / C(m, t + 1)   (22)
      ≈ (p/m)^{t+1},   (23)

where the second approximation follows from C(x, t + 1) ≈ x^{t+1}/(t + 1)!, which is quite good if t is much smaller than p. If p ≤ m/10, then (23) gives

P2/P1 ⪅ 10^{−(t+1)},   (24)

which shows that P2 is indeed orders of magnitude below P1 in case t ≥ 1. For any t1, keeping t1 fixed and reducing t2 by 1 causes a growth in P2 by a factor

[C(p′, t2) / C(p, t2 + 1)] (1/q),   (25)

where p and p′ are the prefix lengths in the cases that the prefix error-correcting capabilities are t2 and t2 − 1, respectively. Note that for ranges of practical interest 1/q is the dominating factor in (25). Hence, after several reductions of t2 the ratio P2/P1 may be well above zero, i.e., P2 may no longer be negligibly small.


B. Bit Error Probability

When the prime interest is the bit error probability rather than the block error probability, the picture may be different from the one sketched in the previous subsection. A decoding error in the bulk typically results in just a few erroneous bits in the data, but a decoding error in the prefix may completely destroy the data. This would ask for a stronger error protection of the prefix. Therefore, in this subsection, we analyze the bit error probability P_bit for the proposed scheme.

In general, for a block code of length n and Hamming distance d, the bit error probability can be well approximated by

P_bit ≈ (d/n) P_block,   (26)

where P_block is as given in (11). This approximation is based on the fact that in case erroneous decoding occurs it is most likely that the output codeword is still close to the original codeword. Since closest codewords differ in d bits and the total number of bits is n, the result follows.

As in the previous subsection, the analysis of the proposed scheme is more involved due to its multistep character. Again, for a received word (r2, r1) of length m + p, let e1 and e2 be the number of errors in the last m bits (the bulk) and the first p bits (the prefix), respectively. We consider various situations.
• If e1 ≤ t1 and e2 ≤ t2, then correction of all errors is guaranteed and the original data block is retrieved at the receiver.
• If e1 > t1 and e2 ≤ t2, then the balancing index is correctly retrieved from the prefix, but the number of errors in the bulk is beyond the error-correcting capability of C1. As argued in the previous paragraph, the resulting fraction of erroneous bits in the data block is about d1/m.
• If e2 > t2, then the balancing index retrieved from the prefix is wrong, i.e., ẑ ≠ z. As argued in the previous subsection, the final decoding result is wrong with high probability, and, moreover, the number of bit errors can be huge. The actual distribution of the number of errors depends on the implementation of Knuth's algorithm and the mapping φ2. When assuming that both z and ẑ are more or less uniformly distributed, it follows from standard probability theory that the expected value of the absolute difference between z and ẑ is m/3. Hence, the expected fraction of erroneous bits in the data word is 1/3.

Hence, we approximate the bit error probability by

P_bit ≈ c1 P1 + c2 P2,   (27)

where

c1 = d1/m   (28)

and

c2 = 1/3,   (29)

and where P1 and P2 are as given in (17) and (19), respectively. It thus follows that P_bit is approximately

P_bit ≈ (d1/m) C(m, t1 + 1) q^{t1+1} + (1/3) C(p, t2 + 1) q^{t2+1}.   (30)

Following a similar reasoning as in the previous subsection, an appropriate design strategy is choosing t1 and t2 such that both terms in (30) are in the same order of magnitude as the target bit error probability, while their sum is (just) below this target. Again, note that if p ≪ m, the coefficient in the first term in (30) is huge in comparison to the coefficient in the second term, and, therefore, the exponent of q in the second term can be considerably smaller than the exponent in the first term.
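The two terms of (30) can be evaluated directly; the snippet below is our illustration with hypothetical parameters, showing why the prefix term, with its large coefficient 1/3, limits how far t2 can be lowered:

```python
from math import comb

def bit_error_estimate(m, p, d1, t1, t2, q):
    """Return the bulk and prefix contributions of approximation (30):
    (d1/m) * C(m, t1+1) q^(t1+1)  and  (1/3) * C(p, t2+1) q^(t2+1)."""
    bulk = (d1 / m) * comb(m, t1 + 1) * q ** (t1 + 1)
    prefix = (1 / 3) * comb(p, t2 + 1) * q ** (t2 + 1)
    return bulk, prefix
```

For example, with m = 1000, p = 20, d1 = 9, t1 = 4, and q = 1e-4, lowering t2 from 2 to 1 makes the prefix term jump from well below the bulk term to well above it, whereas in the block error analysis (coefficient 1 instead of 1/3 versus d1/m) the same reduction was still harmless.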

If t1 and t2 are substantially smaller than p, then a more detailed general analysis is possible. To this end, evaluate (c2 P2)/(c1 P1) for the case t1 = t2 = t. Using (28), (29), (18), and (20), we find

(c2 P2)/(c1 P1) ≈ [m/(3 d1)] C(p, t + 1)/C(m, t + 1)   (31)
               ≈ [m/(3 d1)] (p/m)^{t+1},   (32)

where the approximation comes from (23). If p ≤ m/10, then (32) gives

(c2 P2)/(c1 P1) ⪅ [m/(3 d1)] 10^{−(t+1)},   (33)

which shows that c2 P2 is indeed orders of magnitude below c1 P1 if t ≥ 1 and m/(3 d1) ≪ 10^t. For any t1, keeping t1 fixed and reducing t2 by 1 causes a growth in c2 P2 by the factor given in (25). Hence, after several reductions of t2 the ratio (c2 P2)/(c1 P1) may be well above zero, i.e., c2 P2 may no longer be negligibly small. Assuming t1 = t2 = t, this final value is larger than for the block error probability case, since, for t1 = t2, (c2 P2)/(c1 P1) exceeds P2/P1 by a factor of m/(3 d1), while the growth factor when reducing t2 is the same in both cases.

In conclusion, when designing the scheme to achieve a certain P_bit at maximum rate, the focus should, as for the P_block case, not be on the final Hamming distance d, but on a careful choice of the error correction capabilities t1 and t2. Again, the latter can typically be smaller than the former, but not to the same extent as for the block error probability case.

V. CASE STUDIES

In this section, we illustrate the results obtained in this paper by working out two case studies. In the first case, various options for the constituent codes are considered and studied for one fixed channel. In the second case, one particular code for protecting the payload is evaluated for different channel conditions.

A. Case Study I: Shortened BCH Codes

Let the information block length be k = 750. We consider codes C1 with parameters [750 + 10 t1, 750] and Hamming distance d1 = 2 t1 + 1, obtained by shortening BCH codes of length 1023 [7]. For C2, we consider the shortest known balanced codes with cardinality at least 750 + 10 t1 and Hamming distance d2 = 2 t2 + 2. Such balanced codes are tabulated in [3], from which we collected the cardinalities of some short codes in Table I.

TABLE I. Cardinalities of the largest known balanced codes with length n and Hamming distance d [3].

TABLE II. Code parameters in the setting of Case Study I for t1 = t2.

An overview of the parameters of codes obtained by choosing t1 = t2 is provided in Table II. If t1 = t2 = 0, then it is found from Table I that a prefix of length 12 is required to represent the 750 possible balancing positions without error correction capabilities, as in the original Knuth case, leading to a code rate of 750/762, i.e., a normalized redundancy of 0.0157. If t1 = t2 = 1, then it is found from Table I that a prefix of length 16 is required to represent the 760 possible balancing positions in the BCH codeword with a single error correction capability, leading to a higher normalized redundancy of 0.0335. Further increasing the value of t1 = t2 leads to higher distances at the expense of higher redundancies, as can be checked from the table. For t1 = t2 = 3, C1 has length 780 and Hamming distance 7, while C2 has length 24 and Hamming distance 8. Thus, the code C has length 804, redundancy 54, normalized redundancy 54/804 ≈ 0.0672, and Hamming distance 8. In Table III, we compare the normalized redundancy of our scheme to the lower bound on the normalized redundancy of any scheme with the same length and error correction level. Note that the ratio is, as for Knuth's original method, close to 2, in fact a little bit less in case we have error-correcting capabilities. This factor, the price to be paid for simplicity, may even be smaller, since the comparison is against a lower bound on the minimum redundancy of which it is unknown whether it is achievable in case d ≥ 4.

TABLE III. Redundancy comparison in the setting of Case Study I for t1 = t2.

TABLE IV. Code parameters in the setting of Case Study I for t1 = 3 and varying t2.

TABLE V. Numerical evaluations of P1 and P2, for various choices of t1 and t2.

TABLE VI. Numerical evaluations of c1 P1 and c2 P2, for various choices of t1 and t2.

The proposed scheme also offers the option to provide unequal error protection to the bulk and the prefix. An overview of the parameters of codes obtained by fixing t1 = 3 and varying t2 is provided in Table IV. Choosing t2 = 0, i.e., providing no error correction capability to the prefix, gives a Hamming distance of only d = 2 at a lower redundancy. Note that increasing t2 up to the value of 3 increases both the Hamming distance (since d = min{2⌈d1/2⌉, 2⌈d2/2⌉} = d2 as long as d2 ≤ 8) and the redundancy. However, also note that further increasing t2 from 3 to 4 (or beyond) increases the redundancy, without the reward of an improved distance d (since it is stuck at 8 due to the fact that 2⌈d1/2⌉ = 2⌈7/2⌉ = 8).

Fig. 3. Probabilities P1 and P2 (for various t2), as a function of the channel error probability q, for Case Study II.

Fig. 4. Probabilities c1 P1 and c2 P2 (for various t2), as a function of the channel error probability q, for Case Study II.

Next, we study the block error probability for a fixed channel error probability q. In Table V, the values of P1 and P2 are displayed for various choices of t1 and t2. Let the target block error probability for our application be given, i.e., the error protection levels t1 and t2 should be chosen in such a way that P1 + P2 does not exceed this target. From Table V, we conclude that t1 should be (at least) equal to 3. From Table II, we see that choosing t2 = 3 as well would result in a code with normalized redundancy 0.0672 and Hamming distance 8, while we see from Table V that P1 + P2 then meets the target. However, from Tables IV and V, we can also conclude that keeping t1 = 3 and lowering t2 from 3 to only 1 results in (i) a normalized redundancy decrease to 0.0578, i.e., a reduction by 14%, (ii) a Hamming distance decrease to 4, and (iii) a block error probability still meeting the target. Hence, we obtain a rate increase while still meeting the performance requirement, in spite of the distance drop. Note that a further rate increase is not possible within this scheme, since the performance requirement is not met by the choice t2 = 0, i.e., by giving the prefix no error protection at all. Similar observations can be made for a stricter target block error probability, where we find that a larger t1 is required, but that a small t2 is still sufficient for the error protection of the prefix.
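The normalized redundancies 0.0672 and 0.0578 quoted above can be reproduced from the parameters as we read them (a [780, 750] shortened BCH bulk code with prefix lengths 24 and 16, respectively; this pairing is our reconstruction, not explicit in this excerpt):

```python
def normalized_redundancy(m, k, p):
    """(m - k + p) / (m + p): total redundancy over total code length."""
    return (m - k + p) / (m + p)

r_t2_3 = normalized_redundancy(780, 750, 24)   # t1 = t2 = 3
r_t2_1 = normalized_redundancy(780, 750, 16)   # t1 = 3, t2 = 1
assert round(r_t2_3, 4) == 0.0672
assert round(r_t2_1, 4) == 0.0578
assert round(1 - r_t2_1 / r_t2_3, 2) == 0.14   # the quoted 14% reduction
```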

Finally, we investigate the bit error probability for the same channel. In Table VI, the values of c1 P1 and c2 P2 are displayed for various choices of t1 and t2. Let the target bit error probability for a certain application be given, i.e., the error protection levels t1 and t2 should be chosen in such a way that c1 P1 + c2 P2 does not exceed this target. Note that a choice with t2 smaller than t1 still satisfies the requirement, but that t2 cannot be reduced as far as in the block error probability case. This indicates that when designing the scheme to achieve a certain target bit error probability, t2 could still be chosen smaller than t1, but not to the same extent as for the block error probability.

B. Case Study II: Reed-Muller Code RM(6,10)

In this subsection, we use the Reed-Muller code RM(6,10) [7] as the code C1 protecting the payload. The code length is m = 2^10 = 1024, the data block length is k = 848, and the Hamming distance is d1 = 2^4 = 16, and thus, the error correction level is t1 = 7. We investigate the choice of an appropriate code C2 protecting the prefix in the proposed scheme, for different channel error probabilities q.

The balanced code C2 should have a size of at least 1024 and a length as small as possible. For small values of the Hamming distance d2, the minimum length can be determined from Table I. For Hamming distances 2, 4, 6, 8, and 10, the minimum lengths are 14, 16, 22, 24, and 30, respectively. The error protection level is t2 = d2/2 − 1.
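The RM(6,10) parameters follow from the standard Reed-Muller formulas (length 2^v, dimension Σ_{i ≤ r} C(v, i), minimum distance 2^{v−r}); a quick check we add for the reader:

```python
from math import comb

def rm_params(r, v):
    """Length, dimension, and minimum distance of the
    Reed-Muller code RM(r, v)."""
    n = 2 ** v
    k = sum(comb(v, i) for i in range(r + 1))
    d = 2 ** (v - r)
    return n, k, d

n, k, d = rm_params(6, 10)   # (1024, 848, 16)
t1 = (d - 1) // 2            # error correction level 7
```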

The block error probability can be approximated by the sum of P1 and P2, as given in (17) and (19). The values of P1 and P2 (for various t2), as a function of the channel error probability q, are given in Fig. 3. When choosing the error protection level t2 = t1 = 7, P2 is many orders of magnitude below P1. Hence, in order to lower the redundancy, t2 should rather be chosen smaller. Also when lowering t2 from 7 to 4, and thus considerably decreasing the redundancy, P2 is still negligible compared to P1 for the whole range covered in the figure, so further reductions may be possible. The figure shows that the channel error probability q determines to which extent t2 can be reduced. The higher q, the lower t2 can be chosen while keeping P2 well below P1.

The bit error probability can be approximated by the sum of c1 P1 and c2 P2, which, according to (28) and (29), are (d1/m) P1 = (16/1024) P1 and (1/3) P2, respectively. The values of c1 P1 and c2 P2 (for various t2), as a function of q, are given in Fig. 4. Similar conclusions as for the block error probability can be drawn, i.e., the higher q, the lower t2 can be chosen while keeping c2 P2 well below c1 P1. However, the resulting value of t2 for the bit error probability case is higher than for the block error probability case, due to the fact that (c2 P2)/(c1 P1) exceeds P2/P1 by a factor of m/(3 d1) = 1024/48 ≈ 21.

VI. CONCLUSION

We have extended Knuth's balancing scheme with error-correcting capabilities. The approach is very general in the sense that any block code can be used to protect the payload, while the prefix of length p is protected by a constant-weight code of weight p/2. It has been demonstrated that, in order to meet a certain target block or bit error probability in an efficient way, the distances of the constituent codes may preferably be unequal. Hence, from the performance perspective, the overall Hamming distance is of minor importance. As for the original Knuth algorithm, the scheme's simplicity comes at the price of a somewhat higher redundancy than the most efficient but prohibitively complex code. Therefore, the proposed scheme is an attractive and simple alternative for achieving (long) balanced sequences with error correction properties.

REFERENCES

[1] E. Agrell, A. Vardy, and K. Zeger, "Upper bounds for constant-weight codes," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2373–2395, Nov. 2000.
[2] S. Al-Bassam and B. Bose, "Design of efficient error-correcting balanced codes," IEEE Trans. Computers, vol. 42, no. 10, pp. 1261–1266, Oct. 1993.
[3] A. E. Brouwer, Bounds for Binary Constant Weight Codes [Online]. Available: http://www.win.tue.nl/~aeb/codes/Andw.html
[4] K. A. S. Immink, Codes for Mass Data Storage Systems, 2nd ed. Eindhoven, The Netherlands: Shannon Foundation Publishers, 2004.
[5] K. A. S. Immink and J. H. Weber, "Very efficient balanced codes," IEEE J. Sel. Areas Commun., vol. 28, no. 2, pp. 188–192, Feb. 2010.
[6] D. E. Knuth, "Efficient balanced codes," IEEE Trans. Inf. Theory, vol. IT-32, no. 1, pp. 51–53, Jan. 1986.
[7] S. Lin and D. J. Costello, Jr., Error Control Coding, 2nd ed. Upper Saddle River, NJ: Pearson Prentice-Hall, 2004.
[8] A. Mazumdar, R. M. Roth, and P. O. Vontobel, "On linear balancing sets," in Proc. IEEE Int. Symp. Information Theory, Seoul, South Korea, Jun.–Jul. 2009, pp. 2699–2703.
[9] H. van Tilborg and M. Blaum, "On error-correcting balanced codes," IEEE Trans. Inf. Theory, vol. 35, no. 5, pp. 1091–1095, Sep. 1989.
[10] J. H. Weber and K. A. S. Immink, "Knuth's balanced code revisited," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1673–1679, Apr. 2010.
[11] H. Zhou, A. Jiang, and J. Bruck, "Error-correcting schemes with dynamic thresholds in nonvolatile memories," in Proc. IEEE Int. Symp. Information Theory, Saint Petersburg, Russia, Jul.–Aug. 2011, pp. 2109–2113.

Jos H. Weber (S'87–M'90–SM'00) was born in Schiedam, The Netherlands, in 1961. He received the M.Sc. (in mathematics, with honors), Ph.D., and MBT (Master of Business Telecommunications) degrees from Delft University of Technology, Delft, The Netherlands, in 1985, 1989, and 1996, respectively.

Since 1985, he has been with the Faculty of Electrical Engineering, Mathematics, and Computer Science of Delft University of Technology. Currently, he is an associate professor at the Wireless and Mobile Communications Group. He is the chairman of the WIC (Werkgemeenschap voor Informatie- en Communicatietheorie in de Benelux) and the secretary of the IEEE Benelux Chapter on Information Theory. He was a Visiting Researcher at the University of California at Davis, USA, the University of Johannesburg, South Africa, and the Tokyo Institute of Technology, Japan. His main research interests are in the areas of channel and network coding.

Kees A. Schouhamer Immink (M'81–SM'86–F'90) received his Ph.D. degree from the Eindhoven University of Technology. He founded and was named president of Turing Machines, Inc., in 1998. He has been, since 1994, an adjunct professor at the Institute for Experimental Mathematics, Essen University, Germany, and is affiliated with the Nanyang Technological University of Singapore. Immink designed coding techniques for a wealth of digital audio and video recording products, such as the Compact Disc, CD-ROM, CD-Video, Digital Compact Cassette system (DCC), DVD, Video Disc Recorder, and Blu-ray Disc. He received a Knighthood in 2000, a personal "Emmy" award in 2004, the 1996 IEEE Masaru Ibuka Consumer Electronics Award, the 1998 IEEE Edison Medal, 1999 AES Gold and Silver Medals, and the 2004 SMPTE Progress Medal. He was named a fellow of the IEEE, AES, and SMPTE, was inducted into the Consumer Electronics Hall of Fame, and was elected into the Royal Netherlands Academy of Sciences and the U.S. National Academy of Engineering. He served the profession as President of the Audio Engineering Society, Inc., New York, in 2003.

Hendrik C. Ferreira (SM'08) was born and educated in South Africa, where he received the D.Sc. (Eng.) degree from the University of Pretoria in 1980.

From 1980 to 1981, he was a postdoctoral researcher at the Linkabit Corporation in San Diego, CA. In 1983, he joined the Rand Afrikaans University, Johannesburg, South Africa, where he was promoted to professor in 1989 and served two terms as Chairman of the Department of Electrical and Electronic Engineering, from 1994 to 1999. He is currently a research professor at the University of Johannesburg. His research interests are in Digital Communications and Information Theory, especially Coding Techniques, as well as in Power Line Communications.

Dr. Ferreira is a past chairman of the Communications and Signal Processing Chapter of the IEEE South Africa section, and from 1997 to 2006 he was Editor-in-Chief of the Transactions of the South African Institute of Electrical Engineers. He has served as chairman of several conferences, including the international 1999 IEEE Information Theory Workshop in the Kruger National Park, South Africa, as well as the 2010 IEEE African Winter School on Information Theory and Communications.