IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 28, NO. 2, FEBRUARY 2010 1
Very Efficient Balanced Codes
Kees A. Schouhamer Immink and Jos H. Weber
Abstract—The prior art construction of sets of balanced
codewords by Knuth is attractive for its simplicity and absence
of look-up tables, but the redundancy of the balanced codes
generated by Knuth’s algorithm falls a factor of two short with
respect to the minimum required. We present a new construction,
which is simple, does not use look-up tables, and is less redundant
than Knuth’s construction. In the new construction, the user
word is modified in the same way as in Knuth’s construction,
that is by inverting a segment of user symbols. The prefix that
indicates which segment has been inverted, however, is encoded
in a different, more efficient, way.
Index Terms—Magnetic recording, optical recording, channel
capacity, constrained code, dc-free code, balanced code.
I. INTRODUCTION
SETS of bipolar codewords that have equal numbers of '1's and '-1's are usually called balanced codes. Balanced codes have been widely used in storage and communication channels. A survey of properties and methods for constructing balanced codes can be found in [1]. There is a trend towards high rate codes, which are made possible by codes with longer codewords. The implementation of such a code is not a simple task, as look-up tables for translating user words into channel words and vice versa are impractically large. Knuth published a simple algorithm for constructing balanced codes [2], which is very well suited for use with long codewords, since look-up tables are absent. Modifications and improvements of the generic scheme are discussed by Alon et al. [3], Al-Bassam & Bose [4], Tallini, Capocelli & Bose [5], and Weber & Immink [6]. The redundancy of a balanced code generated by Knuth’s algorithm falls a factor of two short with respect to the minimum required, i.e. the redundancy of a code that uses all balanced codewords of a given length [1]. An attempt by Weber & Immink [6] to compress the fixed length prefix to a variable length prefix with less redundancy was futile.
In this paper, we will study simple balanced code designs that require minimum redundancy, while, as in Knuth’s construction, the balanced codeword is obtained by a simple, algorithmic modification of the user word, and a prefix is added that carries sufficient information for the recipient to uniquely retrieve the user word. The way the user word is modified in the new construction is the same as in Knuth’s construction, that is by inverting a segment of user symbols. The prefix, however, is encoded and decoded in a different, more efficient, way. We start, in Section II, with a brief description of the prior art code construction by Knuth, followed, in Section III, by a description of the new construction. In Section IV we will compute some statistics, which will enable us, in Section V, to compute the redundancy of the new code construction. In Section VI, we will present some details regarding the implementation of the new algorithm. Section VII concludes the paper.
Manuscript received 12 January 2009; revised 17 July 2009. This project was supported by grant Theory and Practice of Coding and Cryptography, Award Number: NRF-CRP2-2007-03.
Kees A. Schouhamer Immink is with Turing Machines Inc., Willemskade 15b-d, 3016 DK Rotterdam, The Netherlands, and National Technological University of Singapore, Singapore (e-mail: immink@turing-machines.com).
Jos H. Weber is with TU Delft, IRCTR/CWPC, Mekelweg 4, 2628 CD Delft, The Netherlands (e-mail: J.H.Weber@ewi.tudelft.nl).
Digital Object Identifier 10.1109/JSAC.2010.1002xx.
II. KNUTH’S CODE CONSTRUCTION
The conventional Knuth algorithm runs as follows. The user data is arranged as a bipolar $m$-tuple $u = (u_1, \ldots, u_m)$, $u_i \in \{-1, 1\}$, $m$ even. (Knuth also presented code constructions for odd $m$, but they will not be discussed here.) We define, for $1 \le j \le m$, the bipolar $m$-tuple $u^j = (-u_1, -u_2, \ldots, -u_j, u_{j+1}, u_{j+2}, \ldots, u_m)$. Knuth showed that for any user data $u$ an index $j$ can be found such that the codeword $u^j$ is balanced, that is
$$-\sum_{i=1}^{j} u_i + \sum_{i=j+1}^{m} u_i = 0.$$
The index $j$ is not necessarily unique, as in general there are more positions where balance can be obtained [6]. Let the smallest index $j$, where $u^j$ is balanced, be denoted by $I(u)$. In other words, $I(u)$ is the smallest index $i$, $1 \le i \le m$, for which $u^i$ is balanced. The balanced codeword $x = u^{I(u)}$ plus a prefix, which suitably represents the index $I(u)$, is transmitted. The receiver receives the codeword $x$ plus prefix $I(u)$, and can thus uniquely undo the encoding step by forming $u = x^{I(u)}$. For an efficient code, the redundant prefix should be as small as possible. Knuth showed that in his best construction the redundancy $p$ is roughly equal to [2]
$$\log_2 m, \quad m \gg 1. \tag{1}$$
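As an illustration, the balancing step described above can be sketched in Python (used for the examples here instead of the MatLab notation of Section VI). The helper names `invert_prefix` and `balancing_index` are assumptions made for this sketch and do not come from [2]; words are represented as lists of +1 and -1.

```python
def invert_prefix(u, j):
    """Return u^j: a copy of u with its first j symbols inverted."""
    return [-s for s in u[:j]] + list(u[j:])


def balancing_index(u):
    """Return I(u), the smallest j in 1..m for which u^j is balanced."""
    total = sum(u)
    prefix_sum = 0
    for j, symbol in enumerate(u, start=1):
        prefix_sum += symbol
        # Inverting the first j symbols changes the word sum to total - 2 * prefix_sum.
        if total == 2 * prefix_sum:
            return j
    raise ValueError("no balancing index: the word length must be even")


u = [1, 1, 1, -1, 1, 1]          # user word '111011' ('0' stands for -1)
j = balancing_index(u)
x = invert_prefix(u, j)
print(j, x, sum(x))              # index, balanced codeword, word sum
print(invert_prefix(x, j) == u)  # the receiver undoes the inversion with the same index
```

The single pass works because the sum of $u^j$ only depends on the sum of $u$ and the running sum of its first $j$ symbols, so no candidate word has to be built while searching.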
The redundancy of a full set of balanced codewords of length $m$, $H_0$, equals
$$H_0 = m - \log_2 \binom{m}{m/2}, \tag{2}$$
and can be approximated by [2]
$$H_0 \approx \frac{1}{2} \log_2 m + 0.326, \quad m \gg 1. \tag{3}$$
We notice that, for large values of the codeword length $m$, the redundancy of Knuth-based codes is twice as high as that of codes that use 'full' sets of balanced codewords. It has been a continuing desideratum in data communication and storage systems to increase the capacity by using more efficient coding methods. In the next section, we will present the new coding technique.
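The factor of two can be made concrete with a short numerical check. The Python sketch below evaluates (2), its approximation (3), and Knuth's $\log_2 m$ from (1); the function name `full_set_redundancy` is an assumption of this sketch.

```python
from math import comb, log2


def full_set_redundancy(m):
    """H0 of (2): redundancy of the full set of balanced words of length m."""
    return m - log2(comb(m, m // 2))


for m in (64, 256, 1024):
    h0 = full_set_redundancy(m)
    approx = 0.5 * log2(m) + 0.326      # approximation (3)
    knuth = log2(m)                     # Knuth's redundancy (1)
    print(f"m={m:5d}  H0={h0:.4f}  approx={approx:.4f}  Knuth={knuth:.4f}")
```

For growing $m$ the ratio of the last column to the first approaches two, which is the gap the new prefix coding scheme aims to close.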
0733-8716/10/$25.00 © 2010 IEEE
III. NEW PREFIX CODING SCHEME FOR BALANCING CODEWORDS
Let $x = u^{I(u)}$ be a codeword balanced by Knuth’s method described above. Clearly, we have $u = x^{I(u)} \in \{x^1, \ldots, x^m\}$, and the index received makes it possible to uniquely single out the right member from the $m$ possible ones. The new encoder is based upon the observation that not all $m$ members of the set $\{x^1, \ldots, x^m\}$ can be legally associated with $x$, since, by definition, Knuth’s encoder takes the smallest index for balancing a user word. An $m$-tuple $x^j$ with $j > I(x^j)$ is not a bona fide word under Knuth’s rules. We now define the set of user words legally associated with $x$ by $\sigma_x = \{x^j : j = I(x^j)\}$. The cardinality of $\sigma_x$ will be denoted by $d(x) = |\sigma_x|$.
Example: Let $m = 6$. Then it can be verified that $\sigma_{000111} = \{100111, 110111, 111111, 111000\}$, where a '0' denotes the symbol value '-1'. In a similar way, $\sigma_{010101} = \{110101, 100101\}$. Hence $d(000111) = 4$ and $d(010101) = 2$.
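The example can be reproduced by applying the definition of $\sigma_x$ directly. The following Python sketch is a brute-force illustration only; the helper names (`sigma`, `to_word`, `to_bits`, and so on) are assumptions of this sketch, and '0' stands for the symbol value -1 as in the example.

```python
def invert_prefix(x, j):
    """Return x^j: a copy of x with its first j symbols inverted."""
    return [-s for s in x[:j]] + list(x[j:])


def balancing_index(u):
    """Smallest j in 1..m for which u^j is balanced (Knuth's I(u))."""
    total, prefix_sum = sum(u), 0
    for j, symbol in enumerate(u, start=1):
        prefix_sum += symbol
        if total == 2 * prefix_sum:
            return j
    raise ValueError("word length must be even")


def sigma(x):
    """All x^j with j = I(x^j), listed in order of increasing j."""
    candidates = (invert_prefix(x, j) for j in range(1, len(x) + 1))
    return [w for j, w in enumerate(candidates, start=1) if balancing_index(w) == j]


def to_word(bits):
    return [1 if b == "1" else -1 for b in bits]


def to_bits(word):
    return "".join("1" if s == 1 else "0" for s in word)


for bits in ("000111", "010101"):
    members = [to_bits(w) for w in sigma(to_word(bits))]
    print(bits, members, len(members))
```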
An efficient encoder will transmit the balanced word $x = u^{I(u)}$ plus an index that resolves the ambiguity about which word in $\sigma_x$ is meant. The average number of bits required to represent the index depends on the way the code construction is implemented. For a fixed-length-prefix construction it depends on the maximum size of $\sigma_x$, while in a variable-length-prefix construction it will depend on the average size of $\sigma_x$. Before formulating the key theorem, we offer some definitions. Let $z_k$ be the running sum of the first $k$, $k \le m$, symbols of $x$, or
$$z_k = \sum_{i=1}^{k} x_i,$$
and let $z_{\max} = \max\{z_k\}$ and $z_{\min} = \min\{z_k\}$.
Theorem 1:
$$d(x) = z_{\max} - z_{\min} + 1.$$
Proof. We have $x^i \notin \sigma_x$ if there is a $j$, where $(x^i)^j$, $1 \le j < i \le m$, is balanced. Since $(x^i)^j = (x_1, \ldots, x_j, -x_{j+1}, \ldots, -x_i, x_{i+1}, \ldots, x_m)$ we conclude that the sum of the elements of $(x^i)^j$ equals
$$\sum_{k=1}^{m} x_k - 2 \sum_{k=j+1}^{i} x_k.$$
Then, as $\sum_{k=1}^{m} x_k = 0$, we notice that $x^i \notin \sigma_x$ if there is a $j$, $1 \le j < i$, such that
$$\sum_{k=j+1}^{i} x_k = 0.$$
Or, in other words, $x^i \notin \sigma_x$ if there is a $j$, $1 \le j < i$, such that $z_i = z_j$. Let $Z$ denote the set of sum values $z_i$, $1 \le i \le m$, and let $V(Z)$ denote the number of distinct values of $Z$. Then it is immediate that $d(x) = V(Z)$. Since all possible values of $z_i$ between $z_{\min}$ and $z_{\max}$ are in $Z$, we conclude that
$$d(x) = V(Z) = z_{\max} - z_{\min} + 1.$$
This concludes the proof.
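Theorem 1 can be checked exhaustively for short words. The Python sketch below compares $d(x)$ counted from the definition of $\sigma_x$ with the running-sum span; all function names are assumptions of this sketch, and the check is limited to small $m$ because it enumerates every balanced word.

```python
from itertools import product


def invert_prefix(x, j):
    return [-s for s in x[:j]] + list(x[j:])


def balancing_index(u):
    total, prefix_sum = sum(u), 0
    for j, symbol in enumerate(u, start=1):
        prefix_sum += symbol
        if total == 2 * prefix_sum:
            return j
    raise ValueError("word length must be even")


def d_by_definition(x):
    """|sigma_x|: the number of indices j with I(x^j) = j."""
    return sum(1 for j in range(1, len(x) + 1)
               if balancing_index(invert_prefix(x, j)) == j)


def d_by_theorem1(x):
    """z_max - z_min + 1 over the running sums z_1, ..., z_m."""
    sums, z = [], 0
    for symbol in x:
        z += symbol
        sums.append(z)
    return max(sums) - min(sums) + 1


def theorem1_holds(m):
    balanced = (x for x in product((-1, 1), repeat=m) if sum(x) == 0)
    return all(d_by_definition(x) == d_by_theorem1(x) for x in balanced)


print(all(theorem1_holds(m) for m in (2, 4, 6, 8)))
```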
The following Theorem gives a lower and upper bound on the size $d(x)$.
Theorem 2:
$$2 \le d(x) \le \frac{m}{2} + 1.$$
Proof. Clearly, $z_1 = x_1 \ne 0$ and $z_m = 0$. Thus $d(x) = z_{\max} - z_{\min} + 1 \ge 2$. We now proceed with the upper bound $d(x) \le m/2 + 1$. Note that, since $z_0 = z_m = 0$, and $z_i = z_{i-1} \pm 1$ for all $i$, all values in $Z$ occur at least twice with the exception of a single maximum $z_{\max}$ and/or a single minimum $z_{\min}$. Thus
$$d(x) \le 2 + \frac{m-2}{2} = \frac{m}{2} + 1.$$
This concludes the proof.
The conventional Knuth scheme, which requires a prefix of length $\log_2 m$ bits, is less efficient than a new scheme based on Theorem 2, which shows that the maximum number of bits required to represent the index equals $\log_2(m/2 + 1)$. The average number of bits required to represent the index will be computed in the next section.
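The bounds of Theorem 2 and the resulting saving in prefix length can be illustrated for a small word length. This Python sketch enumerates all balanced words for $m = 10$ (a value chosen only to keep the enumeration short); `balanced_words` and `d` are names assumed for the sketch, and `d` uses the running-sum span of Theorem 1.

```python
from itertools import combinations
from math import log2


def balanced_words(m):
    """Yield every word of length m with m/2 symbols +1 and m/2 symbols -1."""
    for ones in combinations(range(m), m // 2):
        word = [-1] * m
        for i in ones:
            word[i] = 1
        yield word


def d(x):
    """d(x) via Theorem 1: span of the running sums z_1, ..., z_m."""
    sums, z = [], 0
    for symbol in x:
        z += symbol
        sums.append(z)
    return max(sums) - min(sums) + 1


m = 10
sizes = [d(x) for x in balanced_words(m)]
print(min(sizes), max(sizes), m // 2 + 1)   # observed bounds and m/2 + 1
print(log2(m), log2(m // 2 + 1))            # index sizes in bits: Knuth versus Theorem 2
```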
IV. COMPUTATION OF THE DISTRIBUTION OF d(x)
In this section, we will compute the distribution of $d(x)$, so that we can, in Section V, compute the redundancy of the new schemes. Let $P(u, m)$ denote the number of binary balanced words $x$ of length $m$ with $d(x) = u$. By definition and invoking Theorem 2, we have
$$\sum_{u=2}^{m/2+1} P(u, m) = \binom{m}{m/2}.$$
The computation of the distribution of $d(x)$ is related to the problem of computing the number of sequences (random walks) whose running sum remains within given limits, a problem first studied by Chien [7]. Chien studied bipolar sequences $\{x_i\}$, $x_i \in \{-1, 1\}$, where the running sum $z_i = z_{i-1} + x_i$, for any $i$, remains within the limits $N_1$ and $N_2$, where $N_1$ and $N_2$ are two (finite) constants, $N_2 > N_1$. The range of sum values a sequence may assume, denoted by
$$N = N_2 - N_1 + 1, \tag{4}$$
is often called the digital sum variation. Taking $z_i$ at any instant $i$ as the state of the stream $\{x_i\}$, then the bounds to $z_i$ define a set of $N$ allowable states. For the $N$-state source, an $N \times N$ connection matrix, $D_N$, is defined by $D_N(i, j) = 1$ if a transition from state $\sigma_i$ to state $\sigma_j$ is allowable and $D_N(i, j) = 0$ otherwise. The connection matrix $D_N$ for the channel having a bound to the number of assumed sum values is given by
$$\begin{aligned} D_N(i+1, i) = D_N(i, i+1) &= 1, \quad i = 1, 2, \ldots, N-1, \\ D_N(i, j) &= 0, \quad \text{otherwise}. \end{aligned} \tag{5}$$
The $(i, j)$-th entry of the $m$-th power of $D_N$ will be denoted by $D_N^m(i, j)$. The following Theorem will be helpful in computing $P(u, m)$.
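To make the connection matrix of (5) concrete, the Python sketch below builds $D_N$ as a nested list and raises it to the $m$-th power with plain integer arithmetic. The names `connection_matrix`, `mat_mul`, and `mat_pow` are assumptions of this sketch; indices are 0-based in the code, whereas the text counts states from 1.

```python
def connection_matrix(n):
    """D_N of (5): ones on the two off-diagonals, zeros elsewhere."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, exponent):
    """a raised to a non-negative integer power by repeated multiplication."""
    n = len(a)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(exponent):
        result = mat_mul(result, a)
    return result


d3 = connection_matrix(3)
d3_4 = mat_pow(d3, 4)
print(d3)
print(d3_4)                                  # entry (i, j): walks of length 4 from state i to state j
print(sum(d3_4[i][i] for i in range(3)))     # closed walks of length 4 within 3 states
```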
IMMINK and WEBER: VERY EFFICIENT BALANCED CODES 3
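Theorem 3: The number of balanced words $x$ of length $m$ with $d(x) = u$, $P(u, m)$, $2 \le u \le m/2 + 1$, is given by
$$P(u, m) = \sum_{i=1}^{u} D_u^m(i, i) - 2 \sum_{i=1}^{u-1} D_{u-1}^m(i, i) + \sum_{i=1}^{u-2} D_{u-2}^m(i, i), \quad 2 \le u \le \frac{m}{2} + 1.$$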
Proof. In order to calculate $P(u, m)$, we must count the number of balanced sequences $x$ of length $m$, whose running sum span equals $u$. The matrix entries $D_N^m(i, i)$ give the number of balanced sequences of length $m$, whose running sum span is at most $N$, that start and end with a given sum value $i$. Thus the count $D_N^m(i, i)$ includes words $x$ with $d(x) < N$. We may resolve this difficulty by observing that a balanced word $x$ with $d(x) = N$ has a unique starting (and ending) state. Namely, assume a word $x$ with the property $d(x) = N$, and let
$$z_k = z_0 + \sum_{i=1}^{k} x_i,$$
where $1 \le z_0 \le N$ denotes the initial value of the running sum. Then, by definition, $\max\{z_i\} - \min\{z_i\} + 1 = N$. The limiting values $z_{\max}$ and $z_{\min}$ are by definition the maximum and minimum sum values allowed within the $N$-state machine. Other values of $z_0$ are not allowed as they will lead to too high or too low a value of the running sum. We conclude that there is a unique starting (state) value $z_0$ for a sequence having the maximum running sum span $N$. Similarly, a word $x$ with $d(x) = N - 1$ may have two possible starting states, and a word $x$ with $d(x) = N - 2$ has three possible starting states, etc. As a result, we find
$$\sum_{i=1}^{u} D_u^m(i, i) = P(u, m) + 2P(u-1, m) + 3P(u-2, m) + 4P(u-3, m) + \cdots = \sum_{k=0}^{u-2} (k+1) P(u-k, m).$$
Then, after a simple manipulation, we find
$$P(u, m) = \sum_{i=1}^{u} D_u^m(i, i) - 2 \sum_{i=1}^{u-1} D_{u-1}^m(i, i) + \sum_{i=1}^{u-2} D_{u-2}^m(i, i), \quad 2 \le u \le \frac{m}{2} + 1. \tag{6}$$
This proves the theorem.
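As a sanity check of (6), the Python sketch below evaluates $P(u, m)$ from the traces of $D_N^m$ and compares the result with a direct enumeration for $m = 8$. The trace is computed by counting closed walks with a small dynamic program; the names `closed_walks`, `P`, and `brute_force_distribution` are assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations


def closed_walks(n, m):
    """trace(D_n^m): walks of length m that start and end in the same of n states."""
    total = 0
    for start in range(n):
        counts = [0] * n
        counts[start] = 1
        for _ in range(m):
            counts = [(counts[i - 1] if i > 0 else 0) +
                      (counts[i + 1] if i < n - 1 else 0) for i in range(n)]
        total += counts[start]
    return total


def P(u, m):
    """Second difference of the traces, as in (6)."""
    return closed_walks(u, m) - 2 * closed_walks(u - 1, m) + closed_walks(u - 2, m)


def brute_force_distribution(m):
    """Count balanced words by running-sum span (d(x) of Theorem 1)."""
    counts = Counter()
    for ones in combinations(range(m), m // 2):
        z, sums = 0, []
        for i in range(m):
            z += 1 if i in ones else -1
            sums.append(z)
        counts[max(sums) - min(sums) + 1] += 1
    return dict(counts)


m = 8
by_theorem = {u: P(u, m) for u in range(2, m // 2 + 2)}
print(by_theorem)
print(by_theorem == brute_force_distribution(m))
```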
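A useful property to compute powers of $D_N$ was derived by Salkuyeh [8], namely
$$D_N^m(i, j) = \frac{2}{N+1} \sum_{k=1}^{N} \lambda_k^m \sin\frac{ik\pi}{N+1} \sin\frac{jk\pi}{N+1}, \tag{7}$$
where
$$\lambda_i = 2\cos\frac{\pi i}{N+1}, \quad 1 \le i \le N,$$
are the eigenvalues of $D_N$ [1].

TABLE I
P(u, m) VERSUS u.
u           P(u, m)
2           2
3           2(2^{m/2} - 2)
m/2         m(m - 4), m > 4
m/2 + 1     m

The number $\sum_{i=1}^{N} D_N^m(i, i)$ can be calculated by invoking relation (7). After rearranging some terms, we find
$$\sum_{i=1}^{N} D_N^m(i, i) = \frac{2}{N+1} \sum_{k=1}^{N} \lambda_k^m \sum_{i=1}^{N} \sin^2\frac{ik\pi}{N+1} = \sum_{i=1}^{N} \lambda_i^m = 2^m \sum_{i=1}^{N} \cos^m\frac{\pi i}{N+1}. \tag{8}$$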
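Relations (7) and (8) are easy to evaluate numerically. The Python sketch below does so in floating point for small $N$ and $m$; the function names are assumptions of this sketch, and for large $m$ an exact integer method may be preferable because the terms $\lambda_k^m$ grow quickly.

```python
from math import cos, pi, sin


def eigenvalue(k, n):
    """lambda_k = 2 cos(pi k / (N + 1)) for the N x N connection matrix."""
    return 2 * cos(pi * k / (n + 1))


def power_entry(n, m, i, j):
    """D_N^m(i, j) according to (7); i and j are 1-based as in the text."""
    return 2 / (n + 1) * sum(
        eigenvalue(k, n) ** m * sin(i * k * pi / (n + 1)) * sin(j * k * pi / (n + 1))
        for k in range(1, n + 1))


def power_trace(n, m):
    """Sum of the diagonal of D_N^m according to (8)."""
    return sum(eigenvalue(k, n) ** m for k in range(1, n + 1))


# Entries of D_3^4 and two traces for m = 8, rounded to the nearest integer.
print([[round(power_entry(3, 4, i, j)) for j in (1, 2, 3)] for i in (1, 2, 3)])
print(round(power_trace(4, 8)), round(power_trace(5, 8)))
```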
With the above relations it is now straightforward to compute $P(u, m)$. For special values of $u$, we could derive simple relations that offer more insight. The first two cases were discussed previously.
• There are two codewords that achieve the minimum bound $d(x) = 2$, namely $x = (+1, -1, +1, -1, \ldots)$ and its inverse.
• There are $m$ codewords that achieve the upper bound, $d(x) = m/2 + 1$, namely the codewords starting with the maximum runlength of $m/2$ '+1's followed by $m/2$ '-1's, and the $m - 1$ circular shifts of that codeword.
• There are $2(2^{m/2} - 2)$ codewords $x$ with $d(x) = 3$, namely the $2^{m/2}$ codewords formed of combinations of the 2-bit words (+1,-1) and (-1,+1) minus the two codewords with $d(x) = 2$, plus the one-bit circular shifts of those codewords.
• There are $m(m - 4)$ codewords $x$ with $d(x) = m/2$, $m > 4$, namely the $m/2 - 2$ codewords starting with a runlength of $m/2 - 1$ '+1's followed by $i$ '-1's, a '+1', and $m/2 - i$ '-1's, $1 \le i \le m/2 - 2$, their inverse, and the $m - 1$ circular shifts of those codewords.
A survey of the above findings is shown in Table I.
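The closed-form counts listed above can be compared with a direct enumeration for short words. The Python sketch below does this for $m = 6, 8, 10$; the names `distribution` and `closed_forms` are assumptions of this sketch, and the range of $m$ is kept small because every balanced word is visited.

```python
from collections import Counter
from itertools import combinations


def distribution(m):
    """Number of balanced words of length m for each value of d(x)."""
    counts = Counter()
    for ones in combinations(range(m), m // 2):
        z, sums = 0, []
        for i in range(m):
            z += 1 if i in ones else -1
            sums.append(z)
        counts[max(sums) - min(sums) + 1] += 1   # d(x) by Theorem 1
    return counts


def closed_forms(m):
    """The four special cases of Table I (the case u = m/2 requires m > 4)."""
    return {2: 2,
            3: 2 * (2 ** (m // 2) - 2),
            m // 2: m * (m - 4),
            m // 2 + 1: m}


for m in (6, 8, 10):
    counts = distribution(m)
    agree = all(counts[u] == value for u, value in closed_forms(m).items())
    print(m, dict(sorted(counts.items())), agree)
```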
V. PERFORMANCE COMPUTATIONS
We first compute the average number of bits, $H$, required to represent the index. The quantity $H$ sets a theoretical limit, as it is not assumed that the prefix is balanced or has an integer number of bits. There are $d(x)$ different user words that are transformed into the balanced word $x$, so that we conclude
$$\sum_{u=2}^{m/2+1} u P(u, m) = 2^m. \tag{9}$$
The average number of bits, $H$, required to represent the index is given by
$$H = 2^{-m} \sum_{u=2}^{m/2+1} u P(u, m) \log_2 u. \tag{10}$$
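Relations (9) and (10) can be evaluated with exact integer counts. In the Python sketch below, $P(u, m)$ is obtained from Theorem 3 by counting closed walks on the $N$-state path; the function names (`closed_walks`, `P`, `average_prefix_bits`, `full_set_redundancy`) are assumptions of this sketch, and this is not necessarily how the paper's own computations were carried out.

```python
from functools import lru_cache
from math import comb, log2


@lru_cache(maxsize=None)
def closed_walks(n, m):
    """trace(D_n^m): closed walks of length m on a path with n states."""
    total = 0
    for start in range(n):
        counts = [0] * n
        counts[start] = 1
        for _ in range(m):
            counts = [(counts[i - 1] if i > 0 else 0) +
                      (counts[i + 1] if i < n - 1 else 0) for i in range(n)]
        total += counts[start]
    return total


def P(u, m):
    """Number of balanced words of length m with d(x) = u (Theorem 3)."""
    return closed_walks(u, m) - 2 * closed_walks(u - 1, m) + closed_walks(u - 2, m)


def average_prefix_bits(m):
    """H of (10)."""
    return sum(u * P(u, m) * log2(u) for u in range(2, m // 2 + 2)) / 2 ** m


def full_set_redundancy(m):
    """H0 of (2)."""
    return m - log2(comb(m, m // 2))


m = 64
print(sum(u * P(u, m) for u in range(2, m // 2 + 2)) == 2 ** m)   # identity (9)
print(average_prefix_bits(m), full_set_redundancy(m))
```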
Results of computations of the average prefix length, $H$, versus user word length $m$ are listed in Table II. As a reference we listed the base-line redundancy of full sets of balanced codewords, $H_0$. The difference between the average prefix length $H$ and $H_0$ is less than 1 percent.

TABLE II
AVERAGE PREFIX LENGTH, H, AND MINIMUM REDUNDANCY, H0 = m - log2 C(m, m/2), VERSUS USER WORD LENGTH m.
m       H        H0
64      3.3641   3.3314
128     3.8616   3.8286
256     4.3603   4.3272
512     4.8597   4.8265
1024    5.3594   5.3261
2048    5.8592   5.8259
4096    6.3591   6.3258
8192    6.8591   6.8258

The redundancy of the variable-length (VL) balanced code, $H$, as shown in Table II, is a theoretical minimum. As in Knuth’s prior art construction, the VL prefix should be balanced (or should compensate the unbalance of the codeword). To that end, for every integer $p$, $p > 0$, we define the integer function $B(p)$ as the smallest even integer $q$ such that
$$\binom{q}{q/2} \ge p.$$
Then, assuming that the VL index is mapped onto a balanced prefix, we find with a slight modification of (10) that
$$\hat{H} = 2^{-m} \sum_{u=2}^{m/2+1} u P(u, m) B(u), \tag{11}$$
where $\hat{H}$ denotes the redundancy of the new construction having balanced prefixes. Figure 1 shows results of computations.

[Fig. 1. Average prefix length as a function of $\log_2 m$ of the VL prefix Knuth scheme with balanced prefix. As a reference we plotted the minimum redundancy of Knuth’s construction $\log_2(m)$ and $\lceil \log_2(m) \rceil$. Axes: redundancy versus log m; legend: new scheme.]
As a reference we plotted the curves $\log_2(m)$ and $\lceil \log_2(m) \rceil$, which show the minimum redundancy and the integer-valued redundancy of Knuth’s construction. We may observe that for $m < 64$ the redundancy of the fixed-prefix Knuth scheme and that of the VL scheme do not significantly differ. For $m > 64$, we notice that the average redundancy of the new scheme is less than that of the classic Knuth scheme. We may approach the theoretical minimum by combining a plurality of codewords, say $K$. Each of the $K$ $m$-bit user words is balanced as discussed above, and the $K$ prefixes are combined into one 'super prefix'. The super prefix is balanced by a look-up table or, in case the super prefix is too long for table look-up, by Knuth’s method. Figure 2 shows some results of computations, where it is assumed that the prefixes of $K = 2$ and $K = 4$ words are combined. As a reference we plotted the curve of minimum redundancy as defined by (10). We note that the curve showing the average redundancy of $K = 4$ combined prefixes is only one bit away from the bound.

[Fig. 2. Average prefix length as a function of $\log_2 m$ of the VL prefix Knuth scheme combining $K = 1$, 2, or 4 $m$-bit codewords; all schemes have a balanced prefix. Axes: redundancy versus log m; curves: K=1, K=2, K=4, Bound.]

We will now take a look at the implementation of both the encoder and decoder that exploit the findings of Theorem 2.
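The balanced-prefix redundancy of (11) can be sketched in the same way as $H$. In the Python example below, `balanced_prefix_length` plays the role of $B(p)$; starting the search at $q = 2$ is an assumption of this sketch (it only matters for $p = 1$, which does not occur since $d(x) \ge 2$). The other names are likewise hypothetical, and $P(u, m)$ follows Theorem 3.

```python
from functools import lru_cache
from math import comb


def balanced_prefix_length(p):
    """B(p): smallest even q (searched from q = 2) with C(q, q/2) >= p."""
    q = 2
    while comb(q, q // 2) < p:
        q += 2
    return q


@lru_cache(maxsize=None)
def closed_walks(n, m):
    """trace(D_n^m): closed walks of length m on a path with n states."""
    total = 0
    for start in range(n):
        counts = [0] * n
        counts[start] = 1
        for _ in range(m):
            counts = [(counts[i - 1] if i > 0 else 0) +
                      (counts[i + 1] if i < n - 1 else 0) for i in range(n)]
        total += counts[start]
    return total


def P(u, m):
    return closed_walks(u, m) - 2 * closed_walks(u - 1, m) + closed_walks(u - 2, m)


def balanced_prefix_redundancy(m):
    """H-hat of (11): average length of a balanced variable-length prefix."""
    weighted = sum(u * P(u, m) * balanced_prefix_length(u)
                   for u in range(2, m // 2 + 2))
    return weighted / 2 ** m


print([balanced_prefix_length(p) for p in (2, 3, 6, 7, 20, 21)])
print(balanced_prefix_redundancy(8))
```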
VI. IMPLEMENTATION ISSUES
The two coding schemes, which are based on Theorem 2, may use a) a fixed length or b) a variable length (VL) prefix. In the first scheme, the prefix length is fixed as in the conventional scheme. Then the prefix must be able to uniquely encode at most $m/2 + 1$ indices, requiring less than $\log_2 m$ bits, so that it is less redundant than the conventional method. In the second scheme, where the prefix length depends on the user data, the prefix length varies between 1 and $\log_2(m/2 + 1)$ bits. On the average, the VL coding scheme will be more efficient than the first scheme. We will first describe the implementation of the encoder and decoder.
Encoder description: Assume the user data $u$ enters the encoder. The encoder computes, as in the classic Knuth method, the balancing position $I(u)$, and transmits the balanced word $x = u^{I(u)}$. The computation of the prefix is more involved. To that end, we first specify an order relation on $\sigma_x$. Then we compute the rank, $I_u$, $0 \le I_u \le d(x) - 1$, of $u$ in the ordered set $\sigma_x$, and uniquely translate the rank $I_u$ into a (preferably) balanced prefix. In a scheme with a fixed prefix length, the prefix must accommodate in the worst case $m/2 + 1$ values of $I_u$. For the scheme with variable prefix length, the prefix length depends on $x$, and must accommodate the $d(x)$ possible values of $I_u$. The (balanced) $m$-tuple $x$ and the prefix are transmitted to the receiver.
Decoder description: The decoder receives the codeword $x = u^{I(u)}$ and a suitable representation of $I_u$. In a scheme with fixed prefix length, the decoder can identify the prefix $I_u$. Then, the decoder generates the ordered set $\sigma_x$, and retrieves from $\sigma_x$ the member with rank $I_u$, and outputs that member. In a VL scheme, the decoder first receives $x$, and computes $d(x)$. The value of $d(x)$ identifies the prefix length. Then we decode the index $I_u$, generate the ordered set $\sigma_x$, and retrieve the member of $\sigma_x$ with index $I_u$. Note that in the VL scheme, the 'prefix' must follow the codeword, and normal people would therefore call it a 'suffix'. For heritage reasons, we will use Knuth’s term 'prefix', while it should be appreciated that in this context, the 'prefix' will be following the codeword.
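The encoder and decoder descriptions can be sketched end to end in Python. This is a minimal illustration under stated assumptions: the rank $I_u$ is passed as a plain integer (mapping it onto a balanced prefix is left out), members of $\sigma_x$ are ordered by inversion index, and the function names `encode` and `decode` are hypothetical. Membership of $\sigma_x$ is detected through first occurrences of running-sum values, as in the proof of Theorem 1.

```python
from itertools import product


def invert_prefix(x, j):
    return [-s for s in x[:j]] + list(x[j:])


def encode(u):
    """Return (x, rank): the balanced word u^I(u) and the rank of u in sigma_x."""
    total, prefix_sum, j = sum(u), 0, None
    for k, symbol in enumerate(u, start=1):
        prefix_sum += symbol
        if total == 2 * prefix_sum:      # smallest balancing index I(u)
            j = k
            break
    if j is None:
        raise ValueError("word length must be even")
    x = invert_prefix(u, j)
    seen, z, rank = set(), 0, 0
    for symbol in x[:j - 1]:             # running sums z_1 .. z_{I(u)-1} of x
        z += symbol
        if z not in seen:                # first occurrence: x^i is a member of sigma_x
            seen.add(z)
            rank += 1
    return x, rank


def decode(x, rank):
    """Return the member of sigma_x with the given rank."""
    seen, z = set(), 0
    for i, symbol in enumerate(x, start=1):
        z += symbol
        if z not in seen:
            if len(seen) == rank:        # (rank + 1)-th member found at index i
                return invert_prefix(x, i)
            seen.add(z)
    raise ValueError("rank out of range for this codeword")


m = 8
ok = all(decode(*encode(list(u))) == list(u) for u in product((-1, 1), repeat=m))
print(ok)
```

Both routines make one or two passes over the word, so this sketch avoids generating the whole ordered set $\sigma_x$ explicitly.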
Essential parts of the encoding and decoding operation are the computation of $d(x)$, the generation of the ordered set $\sigma_x$, and the computation of the index $I_u$. The complexity of a straightforward algorithm for computing $d(x)$ grows quadratically with increasing codeword length $m$. We will discuss some more efficient methods. Let $x = u^{I(u)}$ be a balanced word. The complexity of the following algorithm grows linearly with $m$. Define the set of running sum values as $Z = \{z_k\}$, $\forall k$, and define the binary vector $v = (v_{-m/2}, \ldots, v_{m/2})$ as $v_i = 1$, $i \in Z$, and $v_i = 0$, $i \notin Z$. As $d(x)$ equals the number of different values that $z_i$ assumes, which equals the weight of $v$, we find $d(x) = \sum v_i$. The following code implements the above in MatLab notation:
Algorithm 1
Input: Balanced codeword x(i), 1 ≤ i ≤ m.
Output: d(x).
v=zeros(1,m+1); % one flag per possible running sum value
z(1)=x(1); v(z(1)+m/2+1)=1;
for i=2:m; z(i)=z(i-1)+x(i); v(z(i)+m/2+1)=1; end;
dx=sum(v);
where $z(i)$ denotes the running sum, $x(i) \in \{-1, +1\}$ denotes the entries of codeword $x$, and $v(i)$ are the entries of a binary vector flagging the occurrence of the running sum values $z(i) + m/2 + 1$ (see footnote 1). The sum of the entries of $v$ equals $d(x)$.
We now define the ordering of the members of $\sigma_x$. Let $x^i$ and $x^j$ be elements of $\sigma_x$. We call $x^i$ less than $x^j$, in short $x^i < x^j$, if $i < j$. The rank of $v \in \sigma_x$, denoted by $I_v$, is defined to be the position of $v$ in the ordered list of members of $\sigma_x$, i.e., $I_v$ is the number of all $y$ in $\sigma_x$ with $y < v$. The following MatLab algorithm finds the rank, $I_u$, of the user word $u = x^{I(u)}$ in the ordered set $\sigma_x$ by counting the words less than $x^{I(u)}$.
Algorithm 2
Input: j = I(u), v(i), z(i), 1 ≤ i ≤ m, as defined in Algorithm 1.
Output: Iu = p.
p=0;
for i=1:j-1
  if v(z(i)+m/2+1)==1 % first occurrence of this running sum value
    p=p+1; v(z(i)+m/2+1)=0; % count it once, then clear the flag
  end
end
After execution of the routine, we find $I_u = p$.
1 The term m/2+1 is added since MatLab does not allow non-positive array indices.
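For readers without MatLab, Algorithms 1 and 2 can be ported to Python almost line by line. The sketch below is such a port under two assumptions: the function names `algorithm1` and `algorithm2` are hypothetical, and the index offset is $m/2$ rather than $m/2 + 1$ because Python lists start at index 0. The flag vector is copied inside `algorithm2` so that the caller's vector is left unchanged.

```python
def algorithm1(x):
    """Return (d(x), z, v): span size, running sums, and occurrence flags."""
    m = len(x)
    v = [0] * (m + 1)              # one flag per running sum value -m/2 .. m/2
    z, running = [], 0
    for symbol in x:
        running += symbol
        z.append(running)
        v[running + m // 2] = 1    # offset m/2 maps the sum to a valid list index
    return sum(v), z, v


def algorithm2(j, z, v):
    """Return the rank of x^j in sigma_x, where j = I(u)."""
    m = len(z)
    flags = list(v)                # work on a copy of the flag vector
    p = 0
    for i in range(j - 1):         # running sums z_1 .. z_{j-1}
        if flags[z[i] + m // 2] == 1:
            p += 1                 # first occurrence: one more member precedes x^j
            flags[z[i] + m // 2] = 0
    return p


x = [-1, -1, -1, 1, 1, 1]          # the word 000111 of the example in Section III
dx, z, v = algorithm1(x)
print(dx, z)
print([algorithm2(j, z, v) for j in (1, 2, 3, 6)])   # ranks of the four members of sigma_x
```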
VII. CONCLUSIONS
We have presented a new method for constructing sets of balanced bipolar codewords. The new construction is attractive as it does not use look-up tables and is less redundant than Knuth’s prior art construction. We have presented simple algorithms for computing the prefix, encoding, and decoding. We have analyzed the distribution of the prefix length, and determined the average efficiency of the new construction.
REFERENCES
[1] K.A.S. Immink, Codes for Mass Data Storage Systems, Second Edition, ISBN 90-74249-27-2, Shannon Foundation Publishers, Eindhoven, Netherlands, 2004.
[2] D.E. Knuth, 'Efficient Balanced Codes', IEEE Trans. Inform. Theory, vol. IT-32, no. 1, pp. 51-53, Jan. 1986.
[3] N. Alon, E.E. Bergmann, D. Coppersmith, and A.M. Odlyzko, 'Balancing Sets of Vectors', IEEE Trans. Inform. Theory, vol. IT-34, no. 1, pp. 128-130, Jan. 1988.
[4] S. Al-Bassam and B. Bose, 'On Balanced Codes', IEEE Trans. Inform. Theory, vol. IT-36, no. 2, pp. 406-408, March 1990.
[5] L.G. Tallini, R.M. Capocelli, and B. Bose, 'Design of some New Balanced Codes', IEEE Trans. Inform. Theory, vol. IT-42, pp. 790-802, May 1996.
[6] J.H. Weber and K.A.S. Immink, 'Knuth’s Balancing of Codewords Revisited', IEEE International Symposium on Information Theory, ISIT2008, pp. 1567-1571, Toronto, 6-11 July 2008.
[7] T.M. Chien, 'Upper Bound on the Efficiency of Dc-constrained Codes', Bell Syst. Tech. J., vol. 49, pp. 2267-2287, Nov. 1970.
[8] D.K. Salkuyeh, 'Positive Integer Powers of the Tri-diagonal Toeplitz Matrices', International Mathematical Forum, 1, no. 22, pp. 1061-1065, 2006.
Kees Schouhamer Immink received his PhD degree from the Eindhoven University of Technology. He founded and was named president of Turing Machines Inc. in 1998. He has been, since 1994, an adjunct professor at the Institute for Experimental Mathematics, Essen University, Germany, and is affiliated with the Nanyang Technological University of Singapore. Immink designed coding techniques of a wealth of digital audio and video recording products, such as Compact Disc, CD-ROM, CD-Video, Digital Compact Cassette system, DCC, Digital Versatile Disc, DVD, Video Disc Recorder, and Blu-ray Disc. He received a Knighthood in 2000, a personal Emmy award in 2004, the 1996 IEEE Masaru Ibuka Consumer Electronics Award, the 1998 IEEE Edison Medal, 1999 AES Gold and Silver Medals, and the 2004 SMPTE Progress Medal. He was named a fellow of the IEEE, AES, and SMPTE, and was inducted into the Consumer Electronics Hall of Fame, and elected into the Royal Netherlands Academy of Sciences and the US National Academy of Engineering. He served the profession as President of the Audio Engineering Society inc., New York, in 2003.
Jos H. Weber (S’87-M’90-SM’00) was born in
Schiedam, The Netherlands, in 1961. He received
the M.Sc. (in mathematics, with honors), Ph.D.,
and MBT (Master of Business Telecommunications)
degrees from Delft University of Technology, Delft,
The Netherlands, in 1985, 1989, and 1996, respec-
tively.
Since 1985 he has been with the Faculty of
Electrical Engineering, Mathematics, and Computer
Science of Delft University of Technology. Cur-
rently, he is an associate professor at the Wire-
less and Mobile Communications Group. He is the chairman of the WIC
(Werkgemeenschap voor Informatie- en Communicatietheorie in the Benelux)
and the secretary of the IEEE Benelux Chapter on Information Theory. He
was a Visiting Researcher at the University of California at Davis, USA,
the University of Johannesburg, South Africa, and the Tokyo Institute of
Technology, Japan. His main research interests are in the areas of channel
and network coding.
... The average redundancy of a VL tag scheme is less than that of the above fixed-length tag scheme. As the distribution of r(y) is the same as that of Knuth's code, we follow [22] for the computation of the redundancy of the VL tag scheme. The number of balanced words y of length n with r(y) = u, denoted by P (u, n), 2 ≤ u ≤ n/2 + 1, is given by [22] P (u, n) = D(u, n) − 2D(u − 1, n) + D(u − 2, n), (19) where ...
... As the distribution of r(y) is the same as that of Knuth's code, we follow [22] for the computation of the redundancy of the VL tag scheme. The number of balanced words y of length n with r(y) = u, denoted by P (u, n), 2 ≤ u ≤ n/2 + 1, is given by [22] P (u, n) = D(u, n) − 2D(u − 1, n) + D(u − 2, n), (19) where ...
... where v = n/(2u + 2) . The redundancy of the VL tag scheme, denoted by H, equals [22] H = 2 −n n/2+1 u=2 uP (u, n) log 2 u. ...
Preprint
Full-text available
We present and analyze a new systematic construction of bipolar balanced codes where each code word contains equally many −1's and +1's. The new code is minimally modified as the number of symbol changes made to the source word for translating it into a balanced code word is as small as possible. The balanced codes feature low redundancy and time complexity. Large look-up tables are avoided.
... Therefore, in [9], [10], Immink and Weber proposed balancing schemes that transmit variable-length prefixes and studied the average redundancy of their proposals. Specifically, in [9], Weber and Immink provided two variable-length balancing schemes whose average redundancy are asymptotically equal to log 2 n and 1 2 log 2 n + 0.936, respectively. ...
... Specifically, in [9], Weber and Immink provided two variable-length balancing schemes whose average redundancy are asymptotically equal to log 2 n and 1 2 log 2 n + 0.936, respectively. Later in [10], Immink and Weber proposed another variable-length balancing scheme which we study closely in this paper. In [10], Immink and Weber provided closed formulas for the average redundancy of their scheme and computed these values for n 8192. ...
... Later in [10], Immink and Weber proposed another variable-length balancing scheme which we study closely in this paper. In [10], Immink and Weber provided closed formulas for the average redundancy of their scheme and computed these values for n 8192. While numerically the redundancy values are close to the optimal value given in (3), a tight asymptotic analysis was not provided. ...
Preprint
Full-text available
We study and propose schemes that map messages onto constant-weight codewords using variable-length prefixes. We provide polynomial-time computable formulas that estimate the average number of redundant bits incurred by our schemes. In addition to the exact formulas, we also perform an asymptotic analysis and demonstrate that our scheme uses $\frac12 \log n+O(1)$ redundant bits to encode messages into length-$n$ words with weight $(n/2)+{\sf q}$ for constant ${\sf q}$.
... The generating function offers a tool for enumerating the balanced codes [44,45]. Encoding/decoding of balanced codes has attracted a considerable amount of research and engineering attention [46,47]. ...
... MAXIMUM LIKELIHOOD DECODING 5 83 dc/dc 2 -balanced codes in [45]. Encoding/decoding of balanced codes has attracted a considerable amount of research and engineering attention [46,47]. ...
... Improvements of the traditional Knuth's algorithm are considered in [16], [17]. Of interest is the new prefix coding technique for balancing codewords in [17] with reduced redundant than Knuth's construction. ...
... Improvements of the traditional Knuth's algorithm are considered in [16], [17]. Of interest is the new prefix coding technique for balancing codewords in [17] with reduced redundant than Knuth's construction. The number of bits required to represent the index was achieved by either a fixed prefix length (FPL) method or variable prefix length (VPL) method. ...
Article
Full-text available
Visible light communication (VLC) offers wireless communication within short-range based on wavelength converters and light-emitting diode (LED). In the VLC system, conventional forward error correction (FEC) codes are not guaranteed to provide flicker mitigation and dimming support. Consequently, modified coding schemes are introduced for reliable VLC. These methods require complicated coding structures, use of lookup tables, and the addition of large redundancy, resulting to increased computational complexity and low transmission efficiency. In this article, we propose a coding scheme that is flicker-free and enhances the transmission efficiency for VLC systems. The proposed scheme is based on polar codes (PC) and Knuth balancing code with enhanced prefix coding technique. The results show that the proposed algorithm exhibits improved transmission efficiency compared to the PC without and with run-length limited code, for dimming values 75% (or 25%) and 87.5% (or 12.5%). Also, the proposed scheme presents a significant bit error rate (BER) performance gain compared to the schemes in literature. The proposed scheme is flicker-free, provides a simple encoding structure, does not utilize lookup tables, generates minimal number of redundancies for energy efficiency. Thus, the approach is flexible, and it is more suitable for real-time VLC systems. INDEX TERMS Forward error correction, Knuth balancing codes, light-emitting diode, polar codes, visible light communication.
... Note that the phrase "balanced codes" might be used for other concepts in literature, e.g., in [21]. ...
Preprint
Full-text available
This is a manuscript of a chapter prepared for a book. The good codes possess large information length and large minimum distance. A class of codes is said to be asymptotically good if there exists a positive real $\delta$ such that, for any positive integer $N$ we can find a code in the class with code length greater than $N$, and with both the rate and the relative minimum distance greater than $\delta$. The linear codes over any finite field are asymptotically good. More interestingly, the (asymptotic) GV-bound is a phase transition point for the linear codes; i.e., asymptotically speaking, the parameters of most linear codes attain the GV-bound. It is a long-standing open question: whether or not the cyclic codes over a finite field (which are an important class of codes) are asymptotically good? However, from a long time ago the quasi-cyclic codes of index $2$ were proved to be asymptotically good. This chapter consists of some of our studies on the asymptotic properties of several classes of quasi-group codes. We'll explain the studies in a consistent and self-contained style. We begin with the classical results on linear codes. In many cases we consider the quasi-group codes over finite abelian groups (including the cyclic case as a subcase of course), and study their asymptotic properties along two directions: (1) the order of the group (the coindex) is fixed while the index is going to infinity; (2) the index is small while the order of the group (the coindex) is going to infinity. Finally we describe the story on dihedral codes. The dihedral groups are non-abelian but near to cyclic groups (they have cyclic subgroups of index $2$). The asymptotic goodness of binary dihedral codes was obtained in the beginning of this century, and extended to the general dihedral codes recently.
... The main reason is due to the provable difficulty of 2D-constraints compared to 1D-constraints. For example, consider certain weight-constrained codes such as the balanced codes or constant-weight codes, there are several efficient prior-art coding methods for designing 1Dcodes with optimal or almost optimal redundancy [14]- [17]. Here, almost optimal refers to the cases that the encoder's redundancy is at most a constant bit away from the optimal redundancy. ...
Conference Paper
In this work, given $n$ and $p > 0$, efficient encoding/decoding algorithms are presented for mapping arbitrary data to and from $n \times n$ binary arrays in which the weight of every row and every column is at most $pn$. Such a constraint, referred to as the $p$-bounded-weight constraint, is crucial for reducing the parasitic currents in crossbar resistive memory arrays, and has also been proposed for certain applications of holographic data storage. While low-complexity designs have been proposed in the literature only for the case $p = 1/2$, this work provides efficient coding methods that work for arbitrary values of $p$. The coding rate of our proposed encoder approaches the channel capacity for all $p$.
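As an illustration of the constraint defined in this abstract, the following minimal Python sketch checks whether a given square binary array satisfies it. It is only a verifier, not the encoder or decoder of the cited work; the function name `satisfies_bounded_weight` is hypothetical.

```python
def satisfies_bounded_weight(array, p):
    """Check the p-bounded-weight constraint on an n x n binary array.

    `array` is a list of n rows, each a list of n symbols in {0, 1}.
    Returns True if every row and every column has weight at most p * n.
    """
    n = len(array)
    if any(len(row) != n for row in array):
        raise ValueError("array must be square")
    limit = p * n
    # Row weights: number of ones in each row.
    rows_ok = all(sum(row) <= limit for row in array)
    # Column weights: zip(*array) iterates over the columns.
    cols_ok = all(sum(col) <= limit for col in zip(*array))
    return rows_ok and cols_ok
```

For example, a 4 x 4 identity array has row and column weights of 1, so it satisfies the constraint for p = 1/4 but not for any smaller p.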
... In the serial or sequential scheme, the prefix comprises the original sequence's weight; then the complementing is performed on the overall sequence (prefix and original sequence) up to the balancing point. Improvements and embellishments of Knuth's binary methods can be found in [5]-[10]. ...
Article
A simplified and efficient algorithm with parallel decoding capacity was presented by Knuth for balancing binary sequences (binary sequences are a combination of zeros and ones, making up a set of instructions and data that a computer understands). This study proposes a generalization of this algorithm for $q$-ary sequences (multiplexed sequences, clock-controlled sequences, geometric sequences). This new approach is also based on simplicity and parallel decoding for $q$-ary balanced codes. Furthermore, it has a fixed redundancy for short and long sequences that equals $\log_q k$, where $k$ is the sequence length, and no lookup tables are required.
... Since Shannon's 1948 paper [5], the design of CS codes has been an active research area where efficient CS codes that satisfy a great variety of constraints have been proposed. Although most CS codes in the literature are fixed-length codes [1]-[4], [6]-[16], recent advances show that variable-length CS codes have the potential to achieve higher code rates with simpler codebooks [17]-[26]. Since CS codes typically do not have strong error-correction capabilities, decoding of CS codes may result in error propagation. ...
Article
We study the ability of recently developed variable-length constrained sequence codes to determine codeword boundaries in the received sequence upon initial receipt of the sequence and if errors in the received sequence cause synchronization to be lost. We first investigate construction of these codes based on the finite state machine description of a given constraint, and develop new construction criteria to achieve high synchronization probabilities. Given these criteria, we propose a guided partial extension algorithm to construct variable-length constrained sequence codes with high synchronization probabilities. With this algorithm we construct new codes and determine the number of codewords and coded bits that are needed to recover synchronization once synchronization is lost. We consider a large variety of constraints including the runlength limited (RLL) constraint, the DC-free constraint, the Pearson constraint and constraints for inter-cell interference mitigation in flash memories. Simulation results show that the codes we construct exhibit excellent synchronization properties, often resynchronizing within a few bits.
Article
Bazzi and Mitter [4] showed that binary dihedral group codes are asymptotically good. In this paper we prove that the dihedral group codes over any finite field with strong duality property are asymptotically good. If the characteristic of the field is even, self-dual dihedral group codes are asymptotically good. If the characteristic of the field is odd, maximal self-orthogonal dihedral group codes and LCD dihedral group codes are asymptotically good.
Book
Preface - The advantages of digital audio and video recording have been appreciated for a long time and, of course, computers have long been operated in the digital domain. The advent of ever-cheaper and faster digital circuitry has made feasible the creation of high-end digital video and audio recorders, an impracticable possibility using previous generations of conventional analog hardware. The principal advantage that digital implementation confers over analog systems is that in a well-engineered digital recording system the sole significant degradation takes place at the initial digitization, and the quality lasts until the point of ultimate failure. In an analog system, quality is diminished at each stage of signal processing and the number of recording generations is limited. The quality of analog recordings, like the proverbial 'old soldier', just fades away.
Article
Let $n$ be an arbitrary integer and let $p$ be a prime factor of $n$. Denote by $\omega_1$ the primitive $p$-th root of unity $\omega_1 := e^{2\pi i/p}$. Define $\omega_i := \omega_1^i$ for $0 \le i \le p-1$ and $B := \{1, \omega_1, \ldots, \omega_{p-1}\}^n \subseteq \mathbb{C}^n$. Denote by $K(n;p)$ the minimum $k$ for which there exist vectors $v_1, \ldots, v_k \in B$ such that for any vector $\omega \in B$ there is an $i$, $1 \le i \le k$, such that $v_i \cdot \omega = 0$, where $v \cdot \omega$ is the usual scalar product of $v$ and $\omega$. Gröbner basis methods and a linear algebra proof give the lower bound $K(n;p) = n(p-1)$. Galvin posed the following problem: let $m = m(n)$ denote the minimal integer such that there exist subsets $A_1, \ldots, A_m$ of $\{1, \ldots, 4n\}$ with $|A_i| = 2n$ for each $1 \le i \le n$, such that for any subset $B \subseteq [4n]$ with $2n$ elements there is at least one $i$, $1 \le i \le m$, with $A_i \cap B$ having $n$ elements. We obtain here the result $m(p) \ge p$ for primes $p > 3$.
Book
Preface to the Second Edition About five years after the publication of the first edition, it was felt that an update of this text would be inescapable as so many relevant publications, including patents and survey papers, have been published. The author's principal aim in writing the second edition is to add the newly published coding methods, and discuss them in the context of the prior art. As a result about 150 new references, including many patents and patent applications, most of them less than five years old, have been added to the former list of references. Fortunately, the US Patent Office now follows the European Patent Office in publishing a patent application eighteen months after its first filing, and this policy clearly adds to the rapid access to this important part of the technical literature. I am grateful to many readers who have helped me to correct (clerical) errors in the first edition and also to those who brought new and exciting material to my attention. I have tried to correct every error that I found or that was brought to my attention by attentive readers, and seriously tried to avoid introducing new errors in the Second Edition. China is becoming a major player in the art of constructing, designing, and basic research of electronic storage systems. A Chinese translation of the first edition was published in early 2004. The author is indebted to Prof. Xu, Tsinghua University, Beijing, for taking the initiative for this Chinese version, and also to Mr. Zhijun Lei, Tsinghua University, for undertaking the arduous task of translating this book from English to Chinese. Clearly, this translation makes it possible that a billion more people will now have access to it. Kees A. Schouhamer Immink Rotterdam, November 2004
Conference Paper
In 1986, Don Knuth published a very simple algorithm for constructing sets of bipolar codewords with equal numbers of 1s and 0s, called balanced codes. Since look-up tables are absent, Knuth's algorithm is well suited for use with large codewords. The redundancy of Knuth's balanced codes is a factor of two larger than that of a code comprising the full set of balanced codewords. In our paper we will present results of our attempts to improve the performance of Knuth's balanced codes.
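The segment-inversion idea behind Knuth's algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction of any of the listed papers: words are lists of 0/1 symbols of even length, the function names `knuth_balance` and `knuth_restore` are hypothetical, and the encoding of the inversion index into a prefix (the part that determines the redundancy) is omitted.

```python
def knuth_balance(word):
    """Find k such that inverting the first k symbols of `word` balances it.

    `word` is a list of 0/1 symbols of even length n. Returns (k, balanced),
    where `balanced` has exactly n // 2 ones. Only the inversion step is
    shown; a real code must also convey k to the decoder in a prefix.
    """
    n = len(word)
    if n % 2:
        raise ValueError("word length must be even")
    weight = sum(word)  # weight of the word with the first k = 0 symbols inverted
    for k in range(n + 1):
        if weight == n // 2:
            return k, [1 - b for b in word[:k]] + list(word[k:])
        if k < n:
            # Inverting one more symbol changes the weight by exactly +1 or -1,
            # so the weight must pass through n // 2 on its way from w to n - w.
            weight += 1 - 2 * word[k]
    raise AssertionError("unreachable for even-length words")


def knuth_restore(k, balanced):
    """Undo the inversion of the first k symbols."""
    return [1 - b for b in balanced[:k]] + list(balanced[k:])
```

For instance, `knuth_balance([1, 1, 1, 1, 0, 1])` stops at k = 2, and `knuth_restore` with the same k recovers the original word.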
Article
In digital transmission systems, the transmission channel often does not pass d-c. This causes the well-known problem of baseline wander. One way to overcome this difficulty is to restrict the d-c content in the signal stream using suitably devised codes. It is shown that, for a d-c constrained code, the limiting efficiency is related to the number of allowable running digital sum states in a very simple way.
Article
We derive the limiting efficiencies of dc-constrained codes. Given bounds on the running digital sum (RDS), the best possible coding efficiency $\eta$, for a $K$-ary transmission alphabet, is $\eta = \log_2 \lambda_{\max} / \log_2 K$, where $\lambda_{\max}$ is the largest eigenvalue of a matrix which represents the transitions of the allowable states of RDS. Numerical results are presented for the three special cases of binary, ternary and quaternary alphabets.
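The formula in this abstract can be evaluated numerically for the binary case. The sketch below rests on assumptions not stated in the abstract: the alphabet is {-1, +1}, so the RDS moves by one step per symbol and the transition matrix is the adjacency matrix of a path on `num_states` nodes. The function name is hypothetical, and the shifted power iteration is simply a dependency-free way to obtain $\lambda_{\max}$, not the method of the cited paper.

```python
import math


def rds_efficiency_binary(num_states, iterations=2000):
    """Estimate eta = log2(lambda_max) / log2(K) for K = 2.

    Assumes bipolar symbols, so RDS state i connects only to i - 1 and i + 1.
    Power iteration is applied to A + I (A is the path adjacency matrix)
    because A alone has eigenvalues +lambda and -lambda of equal magnitude.
    """
    n = num_states
    if n < 2:
        raise ValueError("need at least two RDS states")
    x = [1.0] * n
    shifted = 0.0
    for _ in range(iterations):
        # y = (A + I) x for the tridiagonal path adjacency matrix A.
        y = [
            x[i]
            + (x[i - 1] if i > 0 else 0.0)
            + (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)
        ]
        shifted = max(y)  # converges to lambda_max + 1
        x = [v / shifted for v in y]
    lambda_max = shifted - 1.0
    return math.log2(lambda_max) / math.log2(2)
```

For a path graph the largest eigenvalue is known in closed form, $\lambda_{\max} = 2\cos(\pi/(N+1))$ for $N$ states, which gives a quick sanity check: three RDS states yield $\lambda_{\max} = \sqrt{2}$ and an efficiency of 1/2.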
Article
Coding schemes in which each codeword contains equally many zeros and ones are constructed in such a way that they can be efficiently encoded and decoded.
Article
A balanced code with $r$ check bits and $k$ information bits is a binary code of length $k+r$ and cardinality $2^k$ such that each codeword is balanced; that is, it has [(k+r)/2] 1's and [(k+r)/2] 0's. This paper contains new methods to construct efficient balanced codes. To design a balanced code, an information word with a low number of 1's or 0's is compressed and then balanced using the saved space. On the other hand, an information word having almost the same number of 1's and 0's is encoded using the single maps defined by Knuth's (1986) complementation method. Three different constructions are presented. Balanced codes with $r$ check bits and $k$ information bits with $k \le 2^{r+1} - 2$, $k \le 3 \times 2^r - 8$, and $k \le 5 \times 2^r - 10r + c(r)$, $c(r) \in \{-15, -10, -5, 0, +5\}$, are given, improving the constructions found in the literature. In some cases, the first two constructions have a parallel coding scheme.