
arXiv:1209.0744v1 [cs.IT] 4 Sep 2012


Balanced Modulation for Nonvolatile Memories

Hongchao Zhou, Anxiao (Andrew) Jiang, Member, IEEE, Jehoshua Bruck, Fellow, IEEE

Abstract—This paper presents a practical writing/reading scheme for nonvolatile memories, called balanced modulation, that minimizes the asymmetric component of errors. The main idea is to encode data using a balanced error-correcting code; when reading information from a block, the scheme adjusts the reading threshold so that the resulting word is also balanced or approximately balanced. Balanced modulation performs well for any cell-level distribution, introducing at most twice as many errors as the optimal reading threshold, and it can be easily implemented in current nonvolatile memory systems. Furthermore, we study the construction of balanced error-correcting codes, in particular balanced LDPC codes, which have very efficient encoding and decoding algorithms and are more efficient than prior constructions of balanced error-correcting codes.

Index Terms—Balanced Modulation, Balanced LDPC Codes,

Dynamic Reading Thresholds.

I. INTRODUCTION

NONVOLATILE memories, like EPROM, EEPROM, flash memory and phase-change memory (PCM), are memories that can keep their data content even without a power supply. This property enables them to be used in a wide range of applications, including cellphones, consumer electronics, automotive systems and computers. Many research studies have been carried out on nonvolatile memories because of their unique features, attractive applications and huge market demand.

An important challenge for most nonvolatile memories is data reliability. The stored data can be lost through many mechanisms, including cell heterogeneity, programming noise, write disturbance, read disturbance, etc. [2], [15]. Over the long term, the change in data has an asymmetric property. For example, the stored data in flash memories is represented by the voltage levels of transistors, which drift in one direction because of charge leakage. In PCM, another class of nonvolatile memories, the stored data is determined by the electrical resistance of the cells, which drifts due to thermally activated crystallization of the amorphous material [21]. All these mechanisms make the errors in nonvolatile memories heterogeneous, asymmetric, time-dependent and unpredictable. These properties bring substantial difficulties to researchers attempting to develop simple and efficient error-correcting schemes.

To date, existing coding schemes for nonvolatile memories commonly use fixed thresholds to read data. For instance, in flash memories, a threshold voltage level v is predetermined;

This work was supported in part by the NSF CAREER Award CCF-0747415, the NSF grant ECCS-0802107, and by an NSF-NRI award. This paper was presented in part at the IEEE International Symposium on Information Theory (ISIT), St. Petersburg, Russia, August 2011.
H. Zhou and J. Bruck are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125. Email: hzhou@caltech.edu, bruck@caltech.edu.
A. Jiang is with the Computer Science and Engineering Department, Texas A&M University, College Station, TX 77843. Email: ajiang@cse.tamu.edu.

Fig. 1. An illustration of the voltage distributions for bit “1” and bit “0” in flash memories: (a) at time 0; (b) at time T.

when reading data from a cell, it reads ‘1’ if the voltage level is higher than v, and ‘0’ otherwise. To increase data reliability, error-correcting codes such as Hamming codes, BCH codes, Reed-Solomon codes and LDPC codes are applied in nonvolatile memories to combat errors. Because of the asymmetric feature of nonvolatile memories, a fixed threshold usually introduces too many asymmetric errors after a long duration [14]; namely, the number of 1 → 0 errors is usually much larger than the number of 0 → 1 errors. To overcome the limitations of fixed reading thresholds in nonvolatile memories, dynamic thresholds are introduced in this paper. To better understand this, we use flash memories for illustration; see Fig. 1. The top figure is for newly written data, and the bottom figure is for old data that has been stored for a long time T. In the figures, assume the left curve indicates the voltage distribution for bit ‘0’ (a bit ‘0’ is written during programming) and the right curve indicates the voltage distribution for bit ‘1’. At time 0 (the moment after programming), it is best to set the threshold voltage to v = v1 for separating bits ‘1’ and ‘0’. But after a period of time, the voltage distribution will change. In this case, v1 is no longer the best choice, since it will introduce too many 1 → 0 errors. Instead, we can set the threshold voltage to v = v2 (see the second plot in the figure) to minimize the error probability. This also applies to other nonvolatile memories, such as PCMs.

Although the best dynamic reading thresholds lead to far fewer errors than fixed ones, certain difficulties exist in determining their values at a time t. One reason is that the exact level distributions for bits ‘1’ and ‘0’ at the current time are hard to obtain, due to the lack of time records, the heterogeneity of blocks, and the unpredictability of exceptions. Another possible method is to classify all the cell levels into two groups based on unsupervised clustering and then map them into ‘1’s and ‘0’s. But when the border between the ‘1’s and ‘0’s becomes fuzzy, clustering mistakes may cause a significant number of reading errors. In view of these considerations, in this paper we introduce a simple and practical writing/reading scheme for nonvolatile memories, called balanced modulation, which is based on the construction of balanced codes (or balanced error-correcting codes) and aims to minimize the asymmetric component of errors in the current block.

Balanced codes, whose codewords have an equal number of 1s and 0s, have been studied extensively in the literature. Knuth, in 1986, proposed a simple method of constructing balanced codes [10]. In his method, given an information word of k bits (k even), the encoder inverts the first i bits such that the modified word has an equal number of 1s and 0s. Knuth showed that such an integer i always exists, and it is represented by a balanced word of length p. A codeword then consists of the p-bit prefix word followed by the k-bit modified information word. For decoding, the decoder can easily retrieve the value of i and then recover the original information word by inverting the first i bits of the k-bit information word again. Knuth’s method was later improved or modified by many researchers [1], [9], [17], [19]. Based on balanced codes, we obtain the scheme of balanced modulation. It encodes the stored data as balanced codewords; when reading data from a block, it adjusts the reading threshold dynamically such that the resulting word is also balanced (namely, the number of 1s equals the number of 0s) or approximately balanced. We call this dynamic reading threshold a balancing threshold.
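Knuth’s balancing step can be sketched in a few lines. The following Python is an illustration of the inversion idea only; the encoding of i into a balanced p-bit prefix is omitted, and the function name is ours, not from [10]:

```python
def knuth_balancing_index(word):
    """Return the smallest i such that inverting the first i bits of
    `word` (a list of 0/1 bits of even length) yields a balanced word."""
    k = len(word)
    assert k % 2 == 0, "Knuth's method assumes an even length"
    for i in range(k + 1):
        flipped = [1 - b for b in word[:i]] + word[i:]
        if sum(flipped) == k // 2:   # equal numbers of 1s and 0s
            return i, flipped
    raise AssertionError("unreachable: such an i always exists")

i, balanced = knuth_balancing_index([1, 1, 1, 1, 0, 1])
# i == 2: inverting the first two bits gives [0, 0, 1, 1, 0, 1], weight 3
```

The existence of i follows from Knuth’s observation: as i goes from 0 to k, the weight changes by exactly 1 at each step and ends at k minus the initial weight, so it must pass through k/2.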

There are several benefits to applying balanced modulation in nonvolatile memories. First, it increases the safety gap between 1s and 0s. With a fixed threshold, the safety gap is determined by the minimum difference between the cell levels and the threshold. With balanced modulation, the safety gap is the minimum difference between the cell levels for 1 and those for 0. Since the cell level of an individual cell has a random distribution due to cell-programming noise [3], [11], the actual value of the charge level varies from one write to another. In this case, balanced modulation is more robust than the commonly used fixed-threshold approach in combating programming noise. Second, as discussed, balanced modulation is a very simple solution that minimizes the influence of cell-level drift. It was shown in [4] that cell-level drift in flash memories introduces the most dominant errors. Third, balanced modulation can efficiently reduce errors introduced by some other mechanisms, such as changes in external temperature and current leakage from other reading lines, which shift the cell levels in the same direction. In general, balanced modulation is a simple approach that minimizes the influence of noise asymmetries, and it can be easily implemented on current memory devices without hardware changes. The balanced condition on codewords enables us to select a much better threshold dynamically than the commonly used fixed threshold when reading data from a block.

The main contributions of the paper are:
1) We study balanced modulation as a simple, practical and efficient approach to minimizing the asymmetric component of errors in nonvolatile memories.
2) A new construction of balanced error-correcting codes, called balanced LDPC codes, is introduced and analyzed, which has a higher rate than prior constructions.
3) We investigate partial-balanced modulation, for its simplicity in constructing error-correcting codes, and then we extend our discussion from binary cells to multi-level cells.

II. SCOPE OF THIS PAPER

A. Performance and Implementation

In the first part of this paper, including Sections III, IV and V, we focus on the introduction and performance of balanced modulation. In particular, we demonstrate that balanced modulation introduces far fewer errors than the traditional approach based on fixed thresholds. For any cell-level distribution, the balancing threshold used in balanced modulation is near-optimal among all possible reading thresholds in terms of the total number of errors: it introduces at most twice as many errors as the best possible threshold. This enables balanced modulation to adapt to a variety of channel characteristics and makes it applicable to most types of nonvolatile memories. Beyond storage systems, balanced modulation can also be used in optical communication, where the strength of received signals shifts due to many factors, such as transmission distance, temperature, etc.

A practical and very attractive aspect of balanced modulation is that it can be easily implemented in current nonvolatile memory systems. The only change is that, instead of using a fixed threshold when reading a binary vector, the threshold is allowed to be adaptive. Fortunately, this operation can be implemented physically, making the process of data reading reasonably fast. In this case, the reading process is based on hard decisions.

If reading speed is less of a concern, we can use soft-decision decoding, namely, reading data without using a threshold. We demonstrate that the prior knowledge that the stored codeword is balanced is very useful: it helps us better estimate the current cell-level distributions, resulting in a better bit error rate.

B. Balanced LDPC Code

Balanced modulation can efficiently reduce the bit error rate when reading data from a block. A further question is how to construct balanced codes that are capable of correcting errors. We call such codes balanced error-correcting codes. Knuth’s method cannot correct errors. In [18], van Tilborg and Blaum presented a family of balanced binary error-correcting codes. The idea is to consider balanced blocks as symbols over an alphabet and to construct error-correcting codes over that alphabet by concatenating n blocks of length 2l each. Due to the constraint in the code construction, this method achieves only moderate rates. Error-correcting balanced codes with higher rates were presented by Al-Bassam and Bose in


Fig. 2. The diagram of balanced modulation: a message u is encoded by the balanced code encoder into a codeword x, which is programmed into the cells; after time t, the cell levels c are read with a balancing threshold v to obtain the word y, which is passed to the balanced code decoder.

[1]; however, their construction considers only the case where the number of errors is at most 4. In [12], Mazumdar, Roth, and Vontobel studied linear balancing sets, namely, balancing sets that are linear subspaces of F^n, which are applied to obtain coding schemes that combine balancing and error correction. Recently, Weber, Immink and Ferreira extended Knuth’s method to equip it with error-correcting capabilities [20]. Their idea is to assign different error protection levels to the prefix and the modified information word in Knuth’s construction, so their construction is a concatenation of two error-correcting codes with different error-correcting capabilities. In Section VI, we introduce a new construction of balanced error-correcting codes based on LDPC codes, called balanced LDPC codes. This construction has a simple encoding algorithm, and its decoding complexity based on the message-passing algorithm is asymptotically equal to the decoding complexity of the original (unbalanced) LDPC code. We demonstrate that the balanced LDPC code has error-correcting capability very close to that of the original (unbalanced) LDPC code.

C. Partial-Balanced Modulation and Its Extension

Our observation is that the task of constructing efficient balanced error-correcting codes with simple encoding and decoding algorithms is not simple, but it is much easier to construct error-correcting codes that are partially balanced, namely, where only a certain segment (or subsequence) of each codeword is balanced. Motivated by this observation, we propose a variant of balanced modulation, called partial-balanced modulation. When reading from a block, it adjusts the reading threshold such that the corresponding segment of the resulting word is balanced. Partial-balanced modulation has performance very close to that of balanced modulation, and it admits much simpler constructions of error-correcting codes. Another question that we address in the third part is how to extend the scheme of balanced modulation or partial-balanced modulation to nonvolatile memories with multi-level cells. Details are provided in Sections VII and VIII.

III. BALANCED MODULATION

For convenience, we consider different types of nonvolatile memories in the same framework, where data is represented by cell levels, such as voltages in flash memories and resistances in phase-change memories. The scheme of balanced modulation is sketched in Fig. 2. It can be divided into two steps: a programming step and a reading step.

(1) In the programming step, we encode data based on a balanced (error-correcting) code. Let k denote the dimension of the code and n denote the number of cells in a block. Given a message u ∈ {0, 1}^k, it is mapped to a balanced codeword x ∈ {0, 1}^n such that |x| = n/2, where |x| is the Hamming weight of x.
(2) In the reading step, let c = c1 c2 ... cn ∈ R^n be the current levels of the n cells to read. A balancing threshold v is determined based on c such that the resulting word, denoted by y = y1 y2 ... yn, is also balanced, namely, |y| = n/2. For each i ∈ {1, 2, ..., n}, yi = 1 if and only if ci ≥ v; otherwise yi = 0. By applying the decoder of the balanced (error-correcting) code, we get a binary output ũ, which is the message that we read from the block.
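As a concrete illustration of the two steps, the following Python sketch (our own toy model with drifted Gaussian cell levels, not from the paper) programs a balanced codeword and reads it back by treating the n/2 highest cell levels as 1s, which is exactly what a balancing threshold does:

```python
import random

def read_balanced(c):
    """Reading step: the n/2 cells with the highest levels are read as 1,
    the rest as 0, so the resulting word y is balanced by construction."""
    n = len(c)
    order = sorted(range(n), key=lambda i: c[i], reverse=True)
    y = [0] * n
    for i in order[:n // 2]:
        y[i] = 1
    return y

random.seed(1)
x = [1, 0] * 8                                 # a balanced codeword, n = 16
# After drift, bit-1 cells sit near 0.4, i.e. below the fixed threshold 1/2
c = [random.gauss(0.4 * b, 0.03) for b in x]   # cell-level vector
y = read_balanced(c)
assert y == x        # the balancing threshold still separates 1s from 0s
```

A fixed threshold at 1/2 would read the drifted bit-1 cells as 0s, while the balancing read recovers the codeword as long as the two level populations remain separable.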

Fig. 3. Cell-level distributions for 1 and 0, and the reading threshold v; the overlapping tails give the error counts N(1 → 0) and N(0 → 1).

Let us intuitively understand the function of balanced modulation based on the illustration in Fig. 3, which depicts the cell-level distributions for the cells that store 0 or 1. Given a reading threshold v, we let N(1 → 0) denote the number of 1 → 0 errors and N(0 → 1) denote the number of 0 → 1 errors, as the tails marked in the figure. Then
N(1 → 0) = |{i : xi = 1, yi = 0}|,
N(0 → 1) = |{i : xi = 0, yi = 1}|.
We are ready to see that
|y| = |x| − N(1 → 0) + N(0 → 1),
where |x| is the Hamming weight of x.
According to the definition, a balancing threshold is one that makes y balanced; hence,
N(1 → 0)(v) = N(0 → 1)(v),
i.e., a balancing threshold results in the same number of 1 → 0 errors and 0 → 1 errors.
We define Ne(v) as the total number of errors based on a reading threshold v; then
Ne(v) = N(1 → 0)(v) + N(0 → 1)(v).
If the cell-level distributions for the cells that store 1 and the cells that store 0 are known, then the balancing threshold


may not be the best reading threshold available; i.e., Ne(v) may not be minimized by the balancing threshold. Let vb denote the balancing threshold. As a comparison, we can consider an optimal threshold vo, defined by
vo = arg min_v Ne(v).
Unfortunately, it is almost impossible for us to know the cell-level distributions for the cells that store 1 and the cells that store 0 without knowing the original word x. In this sense, the optimal threshold vo is imaginary. Although we are not able to determine vo, the following result shows that the balancing threshold vb has performance comparable to that of vo. Even in the worst case, the number of errors introduced based on vb is at most twice that introduced by vo, implying the near-optimality of the balancing threshold vb.

Theorem 1. Given any balanced codeword x ∈ {0, 1}^n and cell-level vector c ∈ R^n, we have
Ne(vb) ≤ 2 Ne(vo).
Proof: Given the balancing threshold vb, the number of 0 → 1 errors equals the number of 1 → 0 errors; hence, the total number of errors is
Ne(vb) = 2 N(1 → 0)(vb) = 2 N(0 → 1)(vb).
If vo ≥ vb, the number of 1 → 0 errors satisfies N(1 → 0)(vo) ≥ N(1 → 0)(vb). Therefore,
Ne(vb) ≤ 2 N(1 → 0)(vo) ≤ 2 Ne(vo).
Similarly, if vo < vb, by considering only 0 → 1 errors, we reach the same conclusion.
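The factor-of-two bound is easy to sanity-check numerically. The sketch below (a hypothetical Gaussian drift model of our own choosing) brute-forces the imaginary optimal threshold vo over all candidate cut points and verifies Ne(vb) ≤ 2 Ne(vo):

```python
import random

def errors(x, c, v):
    """Total number of reading errors when threshold v is applied to c."""
    return sum((1 if ci >= v else 0) != b for b, ci in zip(x, c))

def balancing_threshold(c):
    """Cut between the two middle cell levels, yielding a balanced read."""
    s = sorted(c, reverse=True)
    n = len(c)
    return (s[n // 2 - 1] + s[n // 2]) / 2

random.seed(0)
n = 100
x = [1] * (n // 2) + [0] * (n // 2)            # a balanced codeword
c = [random.gauss(0.5 * b, 0.2) for b in x]    # heavily drifted, noisy levels
vb = balancing_threshold(c)
# brute-force vo: only thresholds at the cell levels themselves matter
vo = min(c + [max(c) + 1], key=lambda v: errors(x, c, v))
assert errors(x, c, vb) <= 2 * errors(x, c, vo)
```

Only thresholds placed at the observed cell levels (plus one above the maximum) need to be checked, because Ne(v) is piecewise constant between consecutive levels.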

Now we compare the balancing threshold vb with a fixed threshold, denoted by vf. As shown in Fig. 3, if we set the reading threshold to the fixed value vf = 1/2, it will introduce many more errors than the balancing threshold. Given a fixed threshold vf, after a long duration, we can characterize the storage channel as a binary asymmetric channel, as shown in Fig. 4(a), where p1 > p2. Balanced modulation is effectively a process of modifying the channel to make it symmetric. As a result, balanced modulation yields a binary symmetric channel with crossover probability p such that p2 < p < p1. When p2 ≪ p1, we have p − p2 ≪ p1 − p. In this case, the bit error rate is reduced from (p1 + p2)/2 to p, where p ≪ (p1 + p2)/2.

Fig. 4. Balanced modulation turns (a) a binary asymmetric channel with crossover probabilities p1 > p2 into (b) a binary symmetric channel with crossover probability p, where p2 < p < p1.

IV. BIT-ERROR-RATE ANALYSIS

To better understand the different types of reading thresholds as well as their performance, we study them from an expectation (statistical) perspective. Assume that we write n bits (including k ones) into a block at time 0. Let gt(v) denote the probability density function (p.d.f.) of the level of a cell that stores a bit 0 at time t, and let ht(v) denote the p.d.f. of the level of a cell that stores a bit 1 at time t. Then at time t, the bit error rate of the block based on a reading threshold v is given by
pe(v) = (1/2) ∫_v^∞ gt(u) du + (1/2) ∫_{−∞}^v ht(u) du.

According to our definition, a balancing threshold vb is chosen such that N(1 → 0)(vb) = N(0 → 1)(vb), i.e., the number of 1 → 0 errors is equal to the number of 0 → 1 errors. As the block length n becomes sufficiently large, we can approximate N(1 → 0)(vb) as (n/2) ∫_{−∞}^{vb} ht(u) du and approximate N(0 → 1)(vb) as (n/2) ∫_{vb}^∞ gt(u) du. So when n is large, we approximately have
∫_{vb}^∞ gt(u) du = ∫_{−∞}^{vb} ht(u) du.
In contrast, an optimal reading threshold vo is one that minimizes the total number of errors. When n is large, we approximately have
vo = arg min_v pe(v).
When gt(v) and ht(v) are continuous functions, the solutions for vo are
vo = ±∞ or gt(vo) = ht(vo).
That is, vo is one of the intersections of gt(v) and ht(v) or one of the infinity points.

In general, gt(v) and ht(v) vary across different nonvolatile memories and different blocks, and they have different dynamics over time. It is not easy to find a perfect model to characterize gt(v) and ht(v), but there are two trends over time, and the change of a cell level can be treated as a superposition of these two trends. First, due to cell-level drift, the difference between the means of gt(v) and ht(v) becomes smaller. Second, due to the existence of different types of noise and disturbance, their variances increase over time. To study the performance of balanced modulation, we consider the two effects separately in some simple scenarios.

Example 1. Let gt(v) = N(0, σ) and ht(v) = N(1 − t, σ), as illustrated in Fig. 5. We assume that the fixed threshold is vf = 1/2, which satisfies g0(vf) = h0(vf).
In the above example, the cell-level distribution corresponding to bit ‘1’ drifts but its variance does not change. We have
vb = vo = (1 − t)/2, vf = 1/2.
At time t, the bit error rate based on a reading threshold v is
pe(v) = (1/2) Φ(−v/σ) + (1/2) Φ(−(1 − t − v)/σ),
where Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt.


Fig. 5. An illustration of the first model with gt(v) = N(0, σ) and ht(v) = N(1 − t, σ).

Fig. 6. Bit error rates as functions of time t under the first model with gt(v) = N(0, σ) and ht(v) = N(1 − t, σ), for σ = 0.05 and σ = 0.15, comparing the fixed threshold with the balancing and optimal thresholds (which coincide).

For different selections of the reading threshold, pe(v) is plotted in Fig. 6. It shows that the balancing threshold and the optimal threshold have the same performance, which is much better than that of a fixed threshold. When cell levels drift, balanced modulation can significantly reduce the bit error rate of a block.
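The comparison behind Fig. 6 is straightforward to reproduce. The following Python evaluates pe(v) under the first model at one illustrative point (t = 0.6, σ = 0.15; these values are our choice, not from the paper) and confirms that the balancing threshold beats the fixed one:

```python
import math

def Phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_e(v, t, sigma):
    """Bit error rate under the first model: g_t = N(0, s), h_t = N(1 - t, s)."""
    return 0.5 * Phi(-v / sigma) + 0.5 * Phi(-(1.0 - t - v) / sigma)

sigma, t = 0.15, 0.6
fixed = p_e(0.5, t, sigma)                 # v_f = 1/2
balancing = p_e((1.0 - t) / 2, t, sigma)   # v_b = v_o = (1 - t)/2
assert balancing < fixed
```

At the balancing threshold the two tail probabilities are equal, so pe reduces to Φ(−(1 − t)/(2σ)), while the fixed threshold pays the full 1 → 0 tail of the drifted distribution.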

Example 2. Let gt(v) = N(0, σ) and ht(v) = N(1, σ + t), as illustrated in Fig. 7. We assume that the fixed threshold is vf = 1/2, which satisfies g0(vf) = h0(vf).

Fig. 7. An illustration of the second model with gt(v) = N(0, σ) and ht(v) = N(1, σ + t).

In this example, the variance of the cell-level distribution corresponding to bit ‘1’ increases as the time t increases. We have
e^{−vo²/(2σ²)} = (σ/(σ + t)) e^{−(1 − vo)²/(2(σ + t)²)}, vb = 1/(2 + t/σ), vf = 1/2.
At time t, the bit error rate based on a threshold v is
pe(v) = (1/2) Φ(−v/σ) + (1/2) Φ(−(1 − v)/(σ + t)),
which is plotted in Fig. 8 for the different thresholds. It shows that balancing thresholds introduce far fewer errors than fixed thresholds when bits ‘1’ and ‘0’ have different reliability (reflected by their variances), although they introduce slightly more errors than optimal thresholds.

Fig. 8. Bit error rates as functions of time t under the second model with gt(v) = N(0, σ) and ht(v) = N(1, σ + t), for σ = 0.05 and σ = 0.15, comparing the fixed, balancing and optimal thresholds.

In practice, the cell-level distributions at a time t are much more complex than these simple Gaussian distributions, and the errors introduced are due to many complex mechanisms. However, the above analysis based on two simple models is still useful, because the models reflect the trends of cell-level changes, which is helpful for analyzing the time-dependent errors in nonvolatile memories.

V. IMPLEMENTATION

Balanced modulation can be easily implemented on the current architecture of nonvolatile memories. The process described in the previous sections can be treated as a hard-decision approach, where a reading threshold is selected to separate all the cell levels into zeros and ones. In this section, we discuss a few methods of determining balancing thresholds quickly, as well as their implementations in nonvolatile memories. Furthermore, we discuss a soft-decision implementation of balanced modulation, in which we do not read data based on a reading threshold, and the decoder can access all the cell levels (the cell-level vector c) directly. In this case, we want to know how the prior information that the stored codeword is balanced can help us increase the success rate of decoding.

A. Balancing Threshold for Hard Decision

Given a block of n cells, assume their current levels are c = c1 c2 ... cn. Our problem is to determine a threshold vb such that exactly n/2 cells, or approximately n/2 cells, will be read as ones. A trivial method is to sort all the n cell levels in decreasing order such that c_{i1} ≥ c_{i2} ≥ ... ≥ c_{in}. Then vb = (c_{i_{n/2}} + c_{i_{n/2 + 1}})/2 is our desired balancing threshold. The disadvantage of this method is that it needs O(n log n) computation time, which may slow down the reading speed when n is large. To reduce the reading time, we would like the balancing threshold to be controlled by hardware.

Half-interval search is a simple approach for determining the balancing threshold. Assume it is known that vb ∈ [l1, l2] with l1 < l2. First, we set the reading threshold to (l1 + l2)/2, based on which a simple circuit can quickly detect the number of ones in the resulting word, denoted by k. If k < n/2, we reset the interval [l1, l2] to [l1, (l1 + l2)/2]. If k > n/2, we reset the interval [l1, l2] to [(l1 + l2)/2, l2]. We repeat this procedure until we get a reading threshold such that k = n/2 or l2 − l1 ≤ ε for a reading precision ε.
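The half-interval search translates directly into code. A minimal Python sketch (the function name is ours):

```python
def half_interval_threshold(c, l1, l2, eps=1e-9):
    """Half-interval search for a balancing threshold in [l1, l2]:
    stop when the read word has exactly n/2 ones or the interval is tiny."""
    n = len(c)
    while l2 - l1 > eps:
        v = (l1 + l2) / 2
        k = sum(1 for ci in c if ci >= v)   # ones in the resulting word
        if k == n // 2:
            return v
        if k < n // 2:
            l2 = v    # too few ones: the threshold must move down
        else:
            l1 = v    # too many ones: the threshold must move up
    return (l1 + l2) / 2

v = half_interval_threshold([0.1, 0.2, 0.3, 0.8, 0.9, 1.0], 0.0, 1.0)
# v == 0.5: the first probe already reads exactly three cells as 1
```

Each iteration needs only one threshold probe and one population count, both of which can be done by a simple circuit, so the loop matches the hardware procedure described above.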

B. Relaxed Balancing Threshold

Half-interval search is an iterative approach of determining

the balancing threshold such that the resulting word is well

balanced. To further reduce the reading time, we can relax the

constraint on the weight of the resulting word, namely, we can

let the number of ones in the resulting word be approximately

n

2, instead of accurately n

2.

For instance, we can simply set the balancing threshold to
vb = (Σ_{i=1}^n ci)/n = mean(c).
Obviously, such a vb reflects the cell-level drift, and it can be easily implemented by a simple circuit. More precisely, we can treat mean(c) as a first-order approximation and write vb as
vb = mean(c) + a (1/2 − mean(c))²,
where a is a constant depending on the noise model of the memory devices.

C. Prior Probability for Soft Decision

Reading data based on hard decision is preferred in non-

volatile memories, regarding to its advantages in reading

speed and computational complexity compared to soft decision

decoding. However, in some occasions, soft decision decoding

is still useful for increasing the decoding success rate. We

demonstrate that the prior knowledge that the stored code-

words are balanced can help us to better estimate the cell-level

probability distributions for 0or 1. Hence, it leads to a better

soft decoding performance.

We assume that given a stored bit, either 0 or 1, its cell level is Gaussian distributed. (We may also use other distribution models according to the physical properties of the memory devices; our goal is to obtain a better estimation of the model parameters.) Specifically, we assume that the cell-level probability distribution for 0 is N(u0, σ0) and the cell-level probability distribution for 1 is N(u1, σ1). Since the codewords are balanced, the probability of a cell storing 0 or 1 is equal, so we can describe the cell levels by a Gaussian mixture model. Our goal is to find the maximum-likelihood parameters u0, σ0, u1, σ1 based on the cell-level vector c, namely, the parameters that maximize
P(c | u0, σ0, u1, σ1).

The Expectation-Maximization (EM) algorithm is an iterative method that can easily find the maximum-likelihood parameters u0, σ0, u1, σ1. The EM iteration alternates between an expectation (E) step and a maximization (M) step. Let x = x1 x2 ... xn be the codeword stored in the current block, and let λt = [u0(t), σ0(t), u1(t), σ1(t)] be the estimate of the parameters in the t-th iteration. In the E-step, the algorithm computes the probability of each cell storing 0 or 1 based on the current parameter estimates; namely, for all i ∈ {1, 2, ..., n}, it computes
P(xi = k | ci, λt) = [(1/σk(t)) e^{−(ci − uk(t))²/(2σk(t)²)}] / [Σ_{k′=0}^1 (1/σ_{k′}(t)) e^{−(ci − u_{k′}(t))²/(2σ_{k′}(t)²)}].
In the M-step, it computes the parameters maximizing the likelihood given the probabilities obtained in the E-step. Specifically, for k ∈ {0, 1},
uk(t + 1) = [Σ_{i=1}^n P(xi = k | ci, λt) ci] / [Σ_{i=1}^n P(xi = k | ci, λt)],
σk(t + 1)² = [Σ_{i=1}^n P(xi = k | ci, λt)(ci − uk(t + 1))²] / [Σ_{i=1}^n P(xi = k | ci, λt)].
These parameter estimates are then used to determine the distribution of xi in the next E-step.
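The E- and M-steps above can be sketched as follows: a minimal Python implementation with equal component weights (P(0) = P(1) = 1/2, as the balanced codeword guarantees), a fixed initialization of our choosing, and synthetic data; this is an illustration, not the authors' code:

```python
import math, random

def em_balanced_gmm(c, iters=50):
    """EM for a two-component Gaussian mixture with equal weights
    (the stored codeword is balanced, so P(0) = P(1) = 1/2)."""
    u = [min(c), max(c)]                 # crude initialization, u[0] < u[1]
    s = [0.1, 0.1]
    for _ in range(iters):
        # E-step: posterior probability that cell i stores bit k
        post = []
        for ci in c:
            w = [math.exp(-(ci - u[k]) ** 2 / (2 * s[k] ** 2)) / s[k]
                 for k in range(2)]
            post.append([w[0] / (w[0] + w[1]), w[1] / (w[0] + w[1])])
        # M-step: re-estimate means and standard deviations
        for k in range(2):
            wk = sum(p[k] for p in post)
            u[k] = sum(p[k] * ci for p, ci in zip(post, c)) / wk
            s[k] = math.sqrt(
                sum(p[k] * (ci - u[k]) ** 2 for p, ci in zip(post, c)) / wk)
    return u, s

random.seed(2)
c = ([random.gauss(0.0, 0.05) for _ in range(50)]
     + [random.gauss(0.7, 0.05) for _ in range(50)])
(u0, u1), (s0, s1) = em_balanced_gmm(c)
assert abs(u0) < 0.1 and abs(u1 - 0.7) < 0.1
```

Because both priors are fixed at 1/2, the M-step only re-estimates means and variances, which is exactly the simplification the balanced constraint buys.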

Assume u0, σ0, u1, σ1 are the maximum-likelihood parameters, based on which we can calculate the log-likelihood ratio for each variable xi, that is,
λi = log [f(ci | xi = 0) / f(ci | xi = 1)] = [log(1/σ0) − (ci − u0)²/(2σ0²)] − [log(1/σ1) − (ci − u1)²/(2σ1²)],
where f is the probability density function. Based on the log-likelihood ratio of each variable xi, soft decoding algorithms can be applied to read the data, including message-passing algorithms [13], linear programming [6], etc. This will be discussed further in the next section for decoding balanced LDPC codes.

VI. BALANCED LDPC CODE

Balanced modulation can significantly reduce the bit error rate of a block in nonvolatile memories, but error correction is still necessary, so we study the construction of balanced error-correcting codes. In the programming step, we encode the information based on a balanced error-correcting code and write it into a block. In the reading step, the reading threshold is adjusted such that it yields a balanced, but possibly erroneous, word. We then pass this word to the decoder to retrieve the original information.

A. Construction

In this section, we introduce a simple construction of balanced error-correcting codes based on LDPC codes, called balanced LDPC codes. LDPC codes, first introduced by Gallager [7] in 1962 and rediscovered in the 1990s, achieve near-Shannon-bound performance and allow reasonable decoding complexity. Our construction of a balanced LDPC code is obtained by inverting the first i bits of each codeword of an LDPC code such that the codeword is balanced, where i differs from codeword to codeword. It is based on Knuth’s observation [10]: given an arbitrary binary word of length k with k even, one can always find an integer i with 0 ≤ i < k such that inverting the first i bits makes the word balanced. In contrast to the existing construction in [20], where i is stored and protected by a lower-rate balanced error-correcting code (the misdecoding of i may lead to catastrophic error propagation in the information word), we do not store i in our construction. The main idea is that there is enough redundancy in the codewords of LDPC codes to let us locate i, or at least find a small set that includes i with very high probability, even when some errors exist in the codewords. It is wasteful to store the value of i with a lower-rate balanced error-correcting code. As a result, our construction is more efficient than the recent construction proposed in [20].

Fig. 9. Encoding of balanced LDPC codes: the message u is encoded by the LDPC encoder into z, and the first i bits of z are inverted to obtain x.

Let u be the message to encode, with length k. According to the description above, the encoding procedure consists of two steps, as shown in Fig. 9:
1) Apply an (n, k) LDPC code L to encode the message u into a codeword of length n, denoted by z = Gu, where G is the generator matrix of L.
2) Find the minimal integer i in {0, 1, ..., n − 1} such that inverting the first i bits of z results in a balanced word
x = z + 1^i 0^{n−i},
where 1^i 0^{n−i} denotes a run of i ones followed by n − i zeros. We then denote x as φ(z). This word x is a codeword of the resulting balanced LDPC code, denoted by C.

Fig. 10. Demonstration of the decoding of balanced LDPC codes.
We see that a balanced LDPC code is constructed by simply balancing the codewords of an LDPC code, which we call the original LDPC code. Based on the procedure above, we can encode any message u of length k into a balanced codeword x of length n. The encoding procedure is very simple, but how do we decode a received word? We now focus on the decoding of this balanced LDPC code. Let y be an erroneous word received

by the decoder, then the output of the maximum likelihood

decoder is

ˆx = arg min

x∈C D(y,x),

where D(y,x)is the distance between yand xdepending

on the channel, for instance, Hamming distance for binary

symmetric channels.

The balanced code C is not a linear code, so the constraint x ∈ C is not easy to deal with. A simpler way is to think about the codeword z ∈ L that corresponds to x. By inverting the first j bits of y with 0 ≤ j < n, we can get a set of words S_y of size n, namely,

S_y = {y^(0), y^(1), ..., y^(n−1)},

in which

y^(j) = y + 1^j 0^{n−j}

for all j ∈ {0, 1, ..., n − 1}. Then there exists an i ∈ {0, 1, ..., n − 1} such that

y^(i) − z = y − x.

The output of the maximum-likelihood decoder is

(ẑ, î) = arg min_{z′ ∈ L, i′ ∈ {0,1,...,n−1}} D(y^(i′), z′),

subject to i′ being the minimum integer that makes z′ + 1^{i′} 0^{n−i′} balanced.

If we ignore the constraint that i has to be the minimum integer, then the output of the decoder is the codeword in L that has the minimum distance to S_y. Fig. 10 provides a simple demonstration: the solid circles are the codewords of the LDPC code L, and the triangles, connected by lines, are the words in S_y. Our goal is to find the solid circle closest to the set of triangles. This differs from the traditional decoding of linear codes, whose goal is to find the closest codeword to a single point.
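For intuition, this set-to-codeword decoding can be done by brute force on toy codes. Below is a hedged sketch (our own helper names) that scans every y^(j) against an explicit list of codewords; real decoders must avoid this n-fold search, as discussed later for erasure and symmetric channels.

```python
def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def ml_decode_balanced(y, codewords):
    """Exhaustive ML decoding over a BSC: minimize the Hamming distance
    between any codeword z of the underlying code L and any word
    y^(j) in S_y (y with its first j bits flipped). Returns (z, j).
    O(n^2 * |L|) time, so only usable for toy examples."""
    n = len(y)
    best = None
    for j in range(n):
        yj = [1 - b for b in y[:j]] + list(y[j:])
        for z in codewords:
            d = hamming(yj, z)
            if best is None or d < best[0]:
                best = (d, z, j)
    return best[1], best[2]
```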


B. An Extreme Case

LDPC codes achieve near-Shannon-bound performance. A natural question is whether balanced LDPC codes retain this property. Certain difficulties exist in proving it by following the method in [8] (Sections 2 and 3), since balanced LDPC codes are not linear codes and their distance distributions are not easy to characterize. Fortunately, the statement appears to be correct, because if the first i bits of a codeword have been inverted (assuming the integer i is unknown), then the codeword can be recovered with only a small cost, i.e., a very small number of additional redundant bits.

Let us consider the ensemble of (n, a, b) parity-check matrices given by Gallager [8], which have a ones in each column, b ones in each row, and zeros elsewhere. According to this construction, the matrix is divided into a submatrices, each containing a single 1 in each column. All the submatrices are random column permutations of a matrix that has a single one in each column and b ones in each row. As a result, we obtain (n, a, b) LDPC codes.

Theorem 2. Given a codeword z of an (n, a, b) LDPC code, we get

x = z + 1^i 0^{n−i}

by inverting the first i bits of z with 0 ≤ i < n. Let P_e(x) be the probability that z cannot be correctly recovered from x when i is unknown. As n → ∞,

P_e(x) → 0

for any integers a and b.

Proof: Let H be the parity-check matrix of the LDPC code, and let

y^(j) = x + 1^j 0^{n−j}

for all j ∈ {0, 1, ..., n − 1}. We can recover z from x if and only if

H y^(j) ≠ 0

for all j ≠ i with 0 ≤ j ≤ n − 1. Hence,

P_e(x) = P(∃ j ≠ i s.t. H y^(j) = 0) ≤ Σ_{j ≠ i} P(H y^(j) = 0).

Let us first consider the case j > i. We have H y^(j) = 0 if and only if

H (y^(j) + z) = 0,

where

y^(j) + z = 0^i 1^{j−i} 0^{n−j}.

So H y^(j) = 0 is equivalent to

H (0^i 1^{j−i} 0^{n−j}) = 0.

As described above, H is constructed from a submatrices; namely, we can write H as the vertical stack

H = [H_1; H_2; ...; H_a].

Let H_s be one of the a submatrices of H; then H_s contains a single one in each column and b ones in each row. It satisfies

H_s (0^i 1^{j−i} 0^{n−j}) = 0,

i.e., in each row of H_s there is an even number of ones from the (i+1)th column to the jth column.

According to the construction of (n, a, b) LDPC codes,

P(H_s (0^i 1^{j−i} 0^{n−j}) = 0) = P(H_s (1^{j−i} 0^{n−j+i}) = 0),

so we can use P(n, j − i) to denote P(H_s (0^i 1^{j−i} 0^{n−j}) = 0).

First, we consider the case that b is even. In this case,

P(n, j − i) = P(n, n − j + i).

Hence, without loss of generality, we can assume that j − i = d ≤ n/2.

It is easy to see that P(n, j − i) > 0 only if d is even. Assume that the one in the first column of H_s is in the tth row, and let u be the number of ones in the tth row among the first j − i columns. Then we can get

P(n, d) = Σ_{u=2,4,...} C(b−1, u−1) ((d−1)/(n−1))^{u−1} ((n−d)/(n−1))^{b−u} P(n−b, d−u),

where P(n, d) = 1 if n = d or d = 0.

If d < log n, then P(n, d) = O(log n / n). If log n ≤ d ≤ n/2, then

Σ_{u=2,4,...} C(b−1, u−1) ((d−1)/(n−1))^{u−1} ((n−d)/(n−1))^{b−u} ≤ (b−1)/b.

Iteratively, we can prove that

P(n, d) = O( ((b−1)/b)^{log n / (2b)} ).

Similarly, when j < i, we can get

P(H y^(j) = 0) ≤ P(n, i − j).

Finally, we have

P_e(x) ≤ Σ_{s=1}^{n−1−i} P(n, s) + Σ_{s=1}^{i} P(n, s) = O(log n / n).

So if b is even, P_e(x) → 0 as n → ∞.

If b is odd, then in each row there exists at least one 1 among the last n − j + i elements. As a result, n − j + i ≥ n/b. Using the same idea as above, we can also prove that P_e(x) → 0 as n → ∞.

So the statement in the theorem is true for any rate R = (b − a)/b < 1. This completes the proof.


The above theorem considers an extreme case: if the codeword of a balanced LDPC code contains no errors, then we can recover the original message with little cost in redundancy. It implies that balanced LDPC codes may achieve almost the same rates as the original unbalanced LDPC codes. In the following subsections, we discuss some decoding techniques for binary erasure channels and binary symmetric channels. Simulation results on these channels support the above statement.

C. Decoding for Erasure Channels

In this subsection, we consider binary erasure channels (BEC), in which a bit (0 or 1) is either successfully received or deleted, the latter denoted by "?". Let y ∈ {0, 1, ?}^n be a word received by a decoder after transmitting a codeword x ∈ C over a BEC. The key to decoding y is to determine the value of the integer i such that x can be obtained by inverting the first i bits of a codeword in L.

A simple idea is to search over all possible values of i, i.e., to decode all the words y^(0), y^(1), ..., y^(n−1) separately and select the best resulting codeword that satisfies all the constraints as the final output. This idea is straightforward, but it increases the computational complexity of decoding by a factor of n, which is not acceptable for most practical applications.

Our observation is that we might be able to determine the value of i, or at least find a feasible set that includes i, based on the unerased bits in y. For example, assume that one parity-check constraint of L is

x_{i1} + x_{i2} + x_{i3} + x_{i4} = 0.

If all of y_{i1}, y_{i2}, y_{i3}, y_{i4} are observed (not erased), then we can make the following statement about i:

(1) If y_{i1} + y_{i2} + y_{i3} + y_{i4} = 0, then i ∈ [0, i1) ∪ [i2, i3) ∪ [i4, n].

(2) If y_{i1} + y_{i2} + y_{i3} + y_{i4} = 1, then i ∈ [i1, i2) ∪ [i3, i4).
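Statements (1) and (2) can be checked mechanically: inverting the first i bits flips exactly the check positions that are ≤ i, so the parity of the check toggles each time i passes one of its positions. A small Python sketch (our own helper, using 1-based positions as in the text):

```python
def feasible_inversions(positions, observed_parity, n):
    """Values of i in {0, ..., n} consistent with one parity check.
    `positions` are the 1-based bit positions in the check and
    `observed_parity` is the XOR of the received bits at those positions.
    Inverting the first i bits flips the check bits at positions <= i."""
    out = set()
    for i in range(n + 1):
        flipped = sum(1 for p in positions if p <= i)
        if (observed_parity + flipped) % 2 == 0:
            out.add(i)
    return out
```

Intersecting these sets over several satisfied checks shrinks the feasible set for i, which is exactly what the message-passing rules described next do with the inversion set I.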

By combining this observation with the message-passing algorithm, we obtain a decoding algorithm for balanced LDPC codes over the BEC. As with the original LDPC code, we represent a balanced LDPC code as a sparse bipartite graph with n variable nodes and r check nodes, as shown in Fig. 11. Additionally, we add an inversion node representing the value, or the feasible set, of i. Let us describe a modified message-passing algorithm on this graph. In each round of the algorithm, messages are passed from variable nodes and inversion nodes to check nodes, and then from check nodes back to variable nodes and inversion nodes.

We use I to denote the feasible set consisting of all possible values of the integer i, called the inversion set. In the first round, we initialize the jth variable node to y_j ∈ {0, 1, ?} and initialize the inversion set to I = [0, n]. Then we pass messages and update the graph iteratively. In each round, we perform the following operations.

[Figure: a sparse bipartite graph with variable nodes x1, x2, ..., x8, check nodes, and an inversion node for i.]
Fig. 11. Graph for balanced LDPC codes.

(1) For each variable node v, if its value x_v is in {0, 1}, it sends x_v to all its check neighbors. If x_v = ? and any incoming message u is 0 or 1, it updates x_v to u and sends u to all its check neighbors. If x_v = ? and all the incoming messages are ?, it sends ? to all its check neighbors.

(2) For each check node c, assume the messages from its variable neighbors are x_{i1}, x_{i2}, ..., x_{ib}, where i1, i2, ..., ib are the indices of these variable nodes with i1 < i2 < ... < ib. Then we define

S_c^0 = [0, i1) ∪ [i2, i3) ∪ ...,
S_c^1 = [i1, i2) ∪ [i3, i4) ∪ ....

If all the incoming messages are in {0, 1}, then we update I in the following way: if x_{i1} + x_{i2} + ... + x_{ib} = 0, we update I to I ∩ S_c^0; otherwise, we update I to I ∩ S_c^1. In this case, the check node c is no longer useful, so we can remove it from the graph.

(3) For each check node c, if exactly one incoming message from its variable neighbors is x_j = ? and all the other incoming messages are in {0, 1}, we check whether I ⊆ S_c^0 or I ⊆ S_c^1. If I ⊆ S_c^0, then the check node sends the XOR of the other incoming messages (excluding ?) to x_j. If I ⊆ S_c^1, then it sends that XOR plus one to x_j. In this case, the check node c is also no longer useful, so we can remove it from the graph.

The procedure above continues until all erasures are filled in, or until no erasure is filled in the current iteration. Different from the message-passing decoding algorithm for LDPC codes, where in each iteration both variable nodes and check nodes are processed only once, here we process variable nodes once but check nodes twice in each iteration.

[Figure: the average size of the inversion set I versus the erasure probability p (0.1 to 0.45) for (120,2,6), (120,3,6) and (1200,3,6) balanced LDPC codes.]
Fig. 12. The average size of the inversion set I after iterations in the message-passing algorithm for decoding balanced LDPC codes.

If all erasures are filled in, x is the binary vector labeled on the variable nodes. In this case, if |I| = 1, then i is the only element in I, and we can get z ∈ L by calculating

z = x + 1^i 0^{n−i}.

If some erasures remain, we enumerate all the possible values in I for the integer i. Usually, |I| is small. A specific i leads to a feasible solution z if

(1) given I = {i}, the message-passing procedure above fills in all the erasures;

(2) x is balanced, namely, the numbers of ones and zeros among the variable nodes are equal;

(3) letting z = x + 1^i 0^{n−i}, i is the minimal integer in {0, 1, ..., n − 1} such that z + 1^i 0^{n−i} is balanced.

We say that a word y with erasures is uniquely decodable if and only if there exists an i ∈ I that leads to a feasible solution, and all such integers i result in the same solution z ∈ L. The following simple example demonstrates the decoding process.

Example 3. Based on Fig. 11, suppose the codeword x = 01111000 is transmitted over an erasure channel and the received word is y = 011110??.

In the first round of the decoding, we have

x^(1) = 011110??, I = [0, 8].

Considering the 2nd check node, we can update I as

I = {0, 1, 4, 5}.

Considering the 3rd check node, we can continue updating I as

I = I ∩ {1, 2, 6, 7, 8} = {1}.

Based on operation (3), we can fill in 0, 0 for the 7th and 8th variable nodes. Finally, we get z = 11111000 and i = 1.

Regarding the decoding algorithm described above, two important issues need to be considered: the decoding complexity of the algorithm and its performance.

[Figure: word error rate versus block length n (20 to 120), comparing (3,6) unbalanced and (3,6) balanced LDPC codes.]
Fig. 13. Word error rate of balanced LDPC codes and unbalanced LDPC codes when the erasure probability p = 0.35.

First, the decoding complexity of the algorithm depends strongly on the size of I when the iterations finish. Fig. 12 plots the average size of the inversion set I for three balanced LDPC codes. It shows that when the erasure probability is below a threshold, the size of I is smaller than a constant with very high probability. In this case, the decoding complexity of the balanced LDPC code is very close to that of the original unbalanced LDPC code.

The second issue is the performance of the decoding algorithm for balanced LDPC codes. In particular, we want to determine the cost in additional redundancy of correcting the inversion of the first i bits when i is unknown. Fig. 13 presents the word error rate of balanced LDPC codes and the corresponding original unbalanced LDPC codes for different block lengths. It is interesting to see that as the block length increases, the balanced LDPC codes and the original unbalanced LDPC codes have almost the same performance; that is, the cost of correcting the inversion of the first i bits is negligible.

D. Decoding for Symmetric Channels

In this subsection, we study and analyze the decoding of balanced LDPC codes over symmetric channels, including binary symmetric channels (BSC) and additive white Gaussian noise (AWGN) channels. Different from binary erasure channels (BEC), here we are not able to determine a small set that definitely includes the integer i. Instead, we want to find the most likely values of i. Before presenting our decoding algorithm, we first review the belief-propagation algorithm for decoding LDPC codes.

Belief propagation [13], where messages are passed iteratively across a factor graph, has been widely studied and recommended for the decoding of LDPC codes. In each iteration, each variable node passes messages (probabilities) to all the adjacent check nodes, and then each check node passes messages (beliefs) to all the adjacent variable nodes. Specifically, let m_vc^(ℓ) be the message passed from a variable node v to a check node c in the ℓth round of the algorithm, and let m_cv^(ℓ) be the message from a check node c to a variable node v. In the first round, m_vc^(0) is the log-likelihood of the node v conditioned on its observed value, i.e., log(P(y|x=0)/P(y|x=1)) for variable x and its observation y. This value is denoted by m_v. The iterative update procedure is described by the following equations:

m_vc^(ℓ) = m_v, if ℓ = 0;
m_vc^(ℓ) = m_v + Σ_{c′ ∈ N(v)\c} m_{c′v}^(ℓ−1), if ℓ ≥ 1;

m_cv^(ℓ) = 2 tanh^{−1} ( Π_{v′ ∈ N(c)\v} tanh( m_{v′c}^(ℓ) / 2 ) ),

where N(v) is the set of check nodes connected to variable node v and N(c) is the set of variable nodes connected to check node c. In practice, the belief-propagation algorithm stops after a certain number of iterations or when the passed likelihoods are close to certainty. Typically, for a BSC with crossover probability p, the log-likelihood m_v for each variable node v is a constant depending on p. Let x be the variable on v and let y be its observation; then

m_v = log((1−p)/p) if y = 0,
m_v = −log((1−p)/p) if y = 1.
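As a concrete reference for the update equations, here is a small Python sketch of one belief-propagation round in the log domain (dense loops over a full parity-check matrix for clarity; a real decoder would use sparse adjacency lists):

```python
import math

def bp_round(H, mv, m_cv=None):
    """One belief-propagation round in the log domain for a parity-check
    matrix H (list of 0/1 rows). mv[v] = log P(y|x=0)/P(y|x=1) for
    variable v; m_cv maps (check, var) -> previous check-to-variable
    message (None in round 0). Returns (m_vc, new m_cv)."""
    r, n = len(H), len(H[0])
    edges = [(c, v) for c in range(r) for v in range(n) if H[c][v]]
    # variable-to-check: channel LLR plus beliefs from the other checks
    m_vc = {}
    for c, v in edges:
        m_vc[(c, v)] = mv[v]
        if m_cv is not None:
            m_vc[(c, v)] += sum(m_cv[(c2, v)] for c2 in range(r)
                                if H[c2][v] and c2 != c)
    # check-to-variable: the tanh rule
    m_cv_new = {}
    for c, v in edges:
        prod = 1.0
        for v2 in range(n):
            if H[c][v2] and v2 != v:
                prod *= math.tanh(m_vc[(c, v2)] / 2.0)
        prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)  # keep atanh finite
        m_cv_new[(c, v)] = 2.0 * math.atanh(prod)
    return m_vc, m_cv_new
```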

Let us consider the decoding of balanced LDPC codes. Assume x ∈ C is a codeword of a balanced LDPC code, obtained by inverting the first i bits of a codeword z in an LDPC code L. The erroneous word received by the decoder is y ∈ Y^n for an alphabet Y. For example, Y = {0, 1} for BSC channels and Y = R for AWGN channels. Here, we consider a symmetric channel, i.e., a channel for which there exists a permutation π of the output alphabet Y such that (1) π^{−1} = π, and (2) P(y|1) = P(π(y)|0) for all y ∈ Y, where P(y|x) is the probability of observing y when the input bit is x.

The biggest challenge in decoding a received word y ∈ Y^n is the lack of information about where the inversion happens, i.e., the integer i. We let

y^(i) = π(y_1) π(y_2) ... π(y_i) y_{i+1} ... y_n

for all i ∈ {0, 1, ..., n − 1}. A simple idea is to search over all possibilities for the integer i from 0 to n − 1, i.e., to decode all the words

y^(0), y^(1), ..., y^(n−1)

separately. Assume their decoding outputs based on belief propagation are

ẑ^(0), ẑ^(1), ..., ẑ^(n−1);

then the final output of the decoder is ẑ = ẑ^(j) such that P(y^(j) | ẑ^(j)) is maximized. The drawback of this method is its high computational complexity, which is about n times the complexity of decoding the original unbalanced LDPC code. To reduce the computational complexity, we want to estimate the value of i in a simpler and faster way, even at the cost of a small loss in bit error rate.

The idea is that when we use belief propagation to decode a group of words y^(0), y^(1), ..., y^(n−1), some information can be used to roughly compare their goodness, namely, their distances to the nearest codewords. To find such information, for each word y^(i) (denoted by y below for simplicity), we run belief propagation for ℓ rounds (iterations), where ℓ is very small, e.g., ℓ = 2. There are several ways of estimating the goodness of y; we introduce one of them as follows. Given a word y, we define

λ(y, ℓ) = Σ_{c ∈ C} Π_{v ∈ N(c)} tanh( m_vc^(ℓ) / 2 ),

where C is the set of all the check nodes, N(c) is the set of neighbors of a check node c, and m_vc^(ℓ) is the message passed from a variable node v to a check node c in the ℓth round of the belief-propagation algorithm. Roughly, λ(y, ℓ) is a measure of the number of correct parity checks for the current assignment in belief propagation (after ℓ − 1 iterations). For instance,

λ(y, ℓ = 1) = α (r − 2|Hy|)

for a binary symmetric channel. In this expression, α is a constant, r = n − k is the number of redundant bits, and |Hy| is the number of ones in Hy, i.e., the number of unsatisfied parity checks.
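For ℓ = 1 over a BSC, the score reduces to counting unsatisfied parity checks, which makes the inversion estimate easy to prototype. A naive O(n²) sketch follows (our own names; Theorem 3 below shows the scan can be made O(n) by updating messages incrementally):

```python
def score_l1(H, y):
    """lambda(y, l=1) up to the constant alpha: r - 2*|Hy|, i.e. the number
    of satisfied minus unsatisfied parity checks for the word y."""
    unsat = sum(sum(row[v] * y[v] for v in range(len(y))) % 2 for row in H)
    return len(H) - 2 * unsat

def estimate_inversion(H, y):
    """Estimate i as the argmax over j of score_l1 on y^(j), the word y
    with its first j bits flipped."""
    n = len(y)
    scores = []
    for j in range(n):
        yj = [1 - b for b in y[:j]] + list(y[j:])
        scores.append(score_l1(H, yj))
    return max(range(n), key=lambda j: scores[j])
```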

Generally, the bigger λ(y^(j), ℓ) is, the more likely j = i is. So we can get the most likely i by calculating

î = arg max_{0 ≤ j ≤ n−1} λ(y^(j), ℓ).

Then we decode y^(î) as the final output. However, this procedure requires calculating λ(y^(j), ℓ) for all 0 ≤ j ≤ n − 1. The following theorem shows that this task can be finished in linear time if ℓ is a small constant.

Theorem 3. The task of computing all λ(y^(j), ℓ) with 0 ≤ j ≤ n − 1 can be finished in linear time if ℓ is a small constant.

Proof: First, we calculate λ(y^(0), ℓ). Based on the belief-propagation algorithm described above, this can be finished in O(n) time. In this step, we save all the messages, including m_v, m_cv^(l) and m_vc^(l), for all c ∈ C, v ∈ V and 1 ≤ l ≤ ℓ.

When we calculate λ(y^(1), ℓ), the only change in the inputs is m_{v1}, where v1 is the first variable node (the sign of m_{v1} is flipped). As a result, we do not have to recalculate all m_v, m_cv^(l), m_vc^(l) for all c ∈ C, v ∈ V and 1 ≤ l ≤ ℓ. Instead, we only need to update the messages that are related to m_{v1}. Note that the number of messages related to m_{v1} grows exponentially with ℓ, so the value of ℓ should be small. In this case, based on the calculation of λ(y^(0), ℓ), λ(y^(1), ℓ) can be calculated in constant time. Similarly, each λ(y^(j), ℓ) with 2 ≤ j ≤ n − 1 can be obtained iteratively in constant time.

Based on this process, we can compute all λ(y^(j), ℓ) with 0 ≤ j ≤ n − 1 in O(n) time.

To increase the success rate of decoding, we can also create a set of the most likely values of i, denoted by I_c. I_c consists of at most c local maxima with the highest values of λ(y^(j), ℓ). Here, we say that j ∈ {0, 1, 2, ..., n − 1} is a local maximum if and only if

λ(y^(j), ℓ) > λ(y^(j−1), ℓ) and λ(y^(j), ℓ) ≥ λ(y^(j+1), ℓ).

Note that I_1 = {î}, where î is the global maximum defined above. If c > 1, then for each j ∈ I_c we decode y^(j) separately and choose the output with the maximum likelihood as the final output of the decoder. It is easy to see that the modified belief-propagation algorithm above for balanced LDPC codes has asymptotically the same decoding complexity as the belief-propagation algorithm for LDPC codes, that is, O(n log n).

Fig. 14 shows the performance of the above algorithm for decoding balanced LDPC codes over a BSC, together with the performance of the belief-propagation algorithm for the original LDPC codes. We see that when ℓ = 2 and c = 4, the performance gap between the balanced (280,4,7) LDPC code and the unbalanced (280,4,7) LDPC code is very small. This comparison implies that the cost of correcting the inversion of the first i bits (when i is unknown) is small for LDPC codes.

Let us go back to the scheme of balanced modulation. The following examples give the log-likelihood of each variable node when the reading process is based on hard decisions and soft decisions, respectively. Based on them, we can apply the modified belief-propagation algorithm to balanced modulation.

Example 4. If the reading process is based on hard decisions, then it results in a binary symmetric channel with crossover probability p. In this case, letting y be the observation on a variable node v, the log-likelihood for v is

m_v = log((1−p)/p) if y = 0,
m_v = −log((1−p)/p) if y = 1.

Example 5. If the reading process is based on soft decisions, then we can approximate the cell-level distributions by Gaussian distributions, which are characterized by four parameters u_0, σ_0, u_1, σ_1. These parameters can be obtained from the cell-level vector c, following the steps in Subsection V-C. In this case, if the input of the decoder is y^(0) = y, then the log-likelihood of the ith variable node v is

m_v = λ_i = [ log(1/σ_0) − (c_i − u_0)² / (2σ_0²) ] − [ log(1/σ_1) − (c_i − u_1)² / (2σ_1²) ],

where c_i is the current level of the ith cell. If the input of the decoder is y^(j) (we do not need to compute its exact value), then the log-likelihood of the ith variable node v is

m_v = λ_i if i > j, and m_v = −λ_i if i ≤ j,

for all 0 ≤ i < n.

VII. PARTIAL-BALANCED MODULATION

Constructing balanced error-correcting codes is more difficult than constructing normal error-correcting codes. A natural question is whether we can design schemes that achieve performance similar to balanced modulation but admit simple error-correcting code constructions. With this motivation, we propose a variant of balanced modulation, called partial-balanced modulation. The main idea is to construct an error-correcting code whose codewords are partially balanced, namely, only a certain segment of each codeword is balanced. When reading information from a block, we adjust the reading threshold to make this segment of the resulting word balanced or approximately balanced.

[Figure: the data word is inverted in its first i bits to form the balanced part, then the index i and the redundancy of a systematic ECC are appended as the unbalanced part.]
Fig. 15. Partial-balanced code.

One way of constructing partial-balanced error-correcting codes is shown in Fig. 15. Given an information vector u of k bits (k even), according to Knuth's observation [10], there exists an integer i with 0 ≤ i < k such that inverting the first i bits of u results in a balanced word ũ. Since our goal is to construct a codeword that is only partially balanced, it is not necessary to represent i in a balanced form. Now, let i also denote its binary representation of length ⌈log₂ k⌉. To further correct potential errors, we consider [ũ, i] as the information part and add extra parity-check bits by applying a systematic error-correcting code, such as a BCH code or a Reed-Solomon code. As a result, we obtain a codeword x = [ũ, i, r], where r is the redundancy part. In this codeword, ũ is balanced and [i, r] is not balanced.

Note that in most data-storage applications, the bit error rate of a block is usually very small, and the application of modulation schemes can further reduce it. Hence, the number of errors in real applications is usually much smaller than the block length. In this case, the total length of [i, r] is smaller, or much smaller, than the code dimension k. As the block length n becomes large, say one thousand, the reading threshold determined by partial-balanced modulation is almost the same as the one determined by balanced modulation. One assumption that we made is that all the cells in the same block have similar noise properties. To make this assumption sound, we can reorder the bits in x = [ũ, i, r] such that the k cells storing ũ are (approximately) randomly distributed among all the n cells.

[Figure: word error rate (10^-4 to 10^0) versus the ratio of the number of errors to the block length (0.05 to 0.1), with curves for ℓ=1, c=1; ℓ=2, c=1; ℓ=2, c=2; ℓ=2, c=3; ℓ=2, c=4 (from top to bottom); and unbalanced LDPC codes.]
Fig. 14. Word error rate of (280,4,7) LDPC codes with at most 50 iterations.

Compared to balanced modulation, partial-balanced modulation can achieve almost the same performance, and its code construction is much easier (the constraints on the codewords are relaxed). The following two examples compare the partial-balanced modulation scheme with the traditional one based on a fixed threshold.

Example 6. Let us consider a nonvolatile memory with block length n = 255. To guarantee data reliability, each block has to correct 18 errors if the reading process is based on a fixed reading threshold. Assume the (255, 131) primitive BCH code is applied for correcting errors; then the data rate (defined as the ratio between the number of available information bits and the block length) is

131/255 = 0.5137.

Example 7. For the block discussed in the previous example, assume that it only needs to correct 8 errors under partial-balanced modulation. In this case, we can apply the (255, 191) primitive BCH code for correcting errors, and the data rate is

(191 − 8)/255 = 0.7176,

which is much higher than that obtained in the previous example.

The reading/decoding process of partial-balanced modulation is straightforward. First, the reading threshold v_b is adjusted such that, among the cells corresponding to ũ, there are k/2 cells, or approximately k/2 cells, with levels higher than v_b. Based on this reading threshold v_b, the whole block is read as a binary word y, which can be further decoded into [ũ, i] if the total number of errors is well bounded. Then we obtain the original message u by inverting the first i bits of ũ.

VIII. BALANCED CODES FOR MULTI-LEVEL CELLS

In order to maximize the storage capacity of nonvolatile memories, multi-level cells (MLCs) are used, where a cell with q discrete levels can store log₂ q bits [3]. Flash memories with 4 and 8 levels have been used in products, and MLCs with 16 levels have been demonstrated in prototypes. For PCMs, cells with 4 or more levels have been in development.

The idea of balanced modulation and partial-balanced modulation can be extended to multi-level cells. For instance, if each cell has 4 levels, we can construct a balanced code in which each codeword has the same number of 0s, 1s, 2s, and 3s. When reading data from the block, we adjust three reading thresholds such that the resulting word also has the same number of 0s, 1s, 2s, and 3s. The key question is how to construct balanced codes or partial-balanced codes for an alphabet of size q > 2.

A. Construction Based on Rank

A simple approach to constructing balanced codes in the nonbinary case is to treat the message as the rank of its codeword among all its permutations, in lexicographic order. If the message is u ∈ {0, 1}^k, then the codeword length n is the minimum integer such that n = qm and the multinomial coefficient

( qm choose m, m, ..., m ) = (qm)! / (m!)^q > 2^k.

The following examples demonstrate the encoding and decoding processes.

Example 8. Assume the message is u = 1010010010, of length 10, and q = 3. Since (9 choose 3, 3, 3) = 1680 > 2^10, we can convert u into a balanced word x of length 9 with alphabet size q = 3. Let S denote the set of all balanced words of length 9 with alphabet size q = 3. To map u into a word in S, we write u in its decimal form r = 658 and let r be the rank of x in S under the lexicographic order.

Consider the first symbol of x. In S, there are (8 choose 2, 3, 3) = 560 sequences starting with 0 (and likewise 560 starting with 1, and 560 starting with 2). Since 560 ≤ r < 560 + 560, the first symbol of x is 1, and we update r to r − 560 = 98, which is the rank of x among all the sequences starting with 1.

Consider the second symbol of x. There are (7 choose 2, 2, 3) = 210 sequences starting with 10, which is larger than r, so the second symbol of x is 0.

Repeating this process, we convert u into the balanced word x = 101202102.

Example 9. We use the same notation as in the above example. Given x = 101202102, it is easy to calculate its rank in S under the lexicographic order (via enumerative source coding [5]). It is

r = (8 choose 2, 3, 3) + (6 choose 1, 2, 3) + (5 choose 1, 1, 3) + (5 choose 2, 0, 3) + (3 choose 0, 1, 2) + (3 choose 1, 0, 2) + (2 choose 0, 1, 1)
  = 560 + 60 + 20 + 10 + 3 + 3 + 2 = 658,

where (8 choose 2, 3, 3) is the number of permutations of x starting with 0, (6 choose 1, 2, 3) is the number of permutations of x starting with 100, and so on.

Then from r we recover its binary representation u = 1010010010. In [16], Ryabko and Matchikina showed that if the length of x is n, then we can recover the message u in O(n log³ n log log n) time.
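Both directions of this rank-based construction are short to implement with exact integer arithmetic. The sketch below (function names are ours) mirrors Example 8 in `unrank_balanced` and Example 9 in `rank_balanced`:

```python
from math import factorial

def multinomial(counts):
    """(sum(counts))! / prod(c!) -- the number of words with these symbol counts."""
    out = factorial(sum(counts))
    for c in counts:
        out //= factorial(c)
    return out

def rank_balanced(word, q):
    """Lexicographic rank of `word` among all rearrangements of its symbols
    (for a balanced word: all words with n/q copies of each symbol)."""
    counts = [word.count(s) for s in range(q)]
    r = 0
    for sym in word:
        for s in range(sym):          # words with a smaller symbol in this slot
            if counts[s] > 0:
                counts[s] -= 1
                r += multinomial(counts)
                counts[s] += 1
        counts[sym] -= 1
    return r

def unrank_balanced(r, n, q):
    """Inverse map: the balanced word of length n and alphabet size q with rank r."""
    counts = [n // q] * q
    word = []
    for _ in range(n):
        for s in range(q):
            if counts[s] == 0:
                continue
            counts[s] -= 1
            block = multinomial(counts)   # completions if we place s here
            if r < block:
                word.append(s)
                break
            r -= block
            counts[s] += 1
    return word
```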

The above approach is simple and information-efficient, but its encoding is not computationally fast.

B. Generalizing Knuth's Construction

An alternative approach is to generalize Knuth's idea to the nonbinary case, due to its operational simplicity. Generally, assume that we are given a word u ∈ G_q^k with G_q = {0, 1, 2, ..., q − 1} and k = qm; our goal is to generalize Knuth's idea to balance u.

Let us consider a simple case, q = 4. Given a word u ∈ G_4^k, let n_i with 0 ≤ i ≤ 3 denote the number of occurrences of i in u. To balance all the cell levels, we first balance the total number of 0s and 1s so that n_0 + n_1 = 2m, which also yields n_2 + n_3 = 2m. To do this, we can treat 0 and 1 as one identical state and 2 and 3 as another identical state. By Knuth's idea, there always exists an integer i such that applying the operation (0 → 2, 1 → 3, 2 → 0, 3 → 1) to the first i symbols yields n_0 + n_1 = 2m. We then consider the subsequence consisting of 0s and 1s, whose length is 2m. Applying Knuth's idea, we can balance this subsequence. Similarly, we can balance the subsequence consisting of 2s and 3s. Consequently, we convert any word in G_4^k into a balanced word. In order to decode this word, three additional integers, each of length at most ⌈log k⌉ bits, need to be stored, indicating the locations where the operations were applied. The following example demonstrates this procedure.

Example 10. Assume u = 0110230210110003. We convert it into a balanced word with the following steps:

(1) Operating on the first 4 symbols of u yields 2332230210110003, in which n_0 + n_1 = 8.

(2) Consider the subsequence of 0s and 1s in 2332230210110003. Operating on the first bit of this subsequence (0 → 1, 1 → 0) yields 2332231210110003, in which n_0 = n_1 = 4.

(3) Consider the subsequence of 2s and 3s in 2332231210110003. Operating on the first 0 bits of this subsequence (2 → 3, 3 → 2), i.e., leaving it unchanged since it is already balanced, yields 2332231210110003, which is balanced.

To recover 0110230210110003 from 2332231210110003 (the inverse process), we need to record the three integers [4, 1, 0], whose binary lengths are [log₂ 16, log₂ 8, log₂ 8].
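The three steps of Example 10 can be traced with a direct Python sketch of the q = 4 procedure (names are ours; the index searches are naive linear scans, guaranteed to terminate by the intermediate-value argument used in the text):

```python
def balance_q4(u):
    """Generalized Knuth balancing over {0,1,2,3} (the q = 4 case).
    Returns the balanced word and the three recorded indices (i1, i2, i3)."""
    n = len(u)
    w = list(u)
    swap = {0: 2, 1: 3, 2: 0, 3: 1}        # step-1 operation
    i1 = 0
    # step 1: make the number of symbols in {0, 1} equal to n/2
    while sum(1 for s in w if s < 2) != n // 2:
        w[i1] = swap[w[i1]]
        i1 += 1

    def knuth_sub(symbols):
        """Balance the subsequence of `symbols` by flipping a prefix of it."""
        pos = [p for p, s in enumerate(w) if s in symbols]
        flip = {symbols[0]: symbols[1], symbols[1]: symbols[0]}
        j = 0
        while sum(1 for p in pos if w[p] == symbols[0]) != len(pos) // 2:
            w[pos[j]] = flip[w[pos[j]]]
            j += 1
        return j

    i2 = knuth_sub((0, 1))                  # step 2
    i3 = knuth_sub((2, 3))                  # step 3
    return w, (i1, i2, i3)
```

Running it on the word of Example 10 reproduces the recorded integers [4, 1, 0] and the balanced output 2332231210110003.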

It can be observed that the procedure above can be easily generalized to any q = 2^a with a ≥ 2. If m = 2^b with b ≥ a, then the number of bits needed to store the integers (locations) is

Σ_{j=0}^{log₂ q − 1} 2^j log₂(qm / 2^j) = (q − 1)ab − q(a − 2) − 2.

For instance, if q = 2³ = 8 and m = 2⁷ = 128, then k = 1024 and it requires 137 bits to represent the locations. These bits can be stored in 46 cells without balancing.

In fact, the above idea can be generalized to an arbitrary q > 2. For instance, when q = 3, given a word u ∈ G_3^{3m}, there exists an integer i such that u + 1^i 0^{3m−i} has exactly m 0s or m 1s. Without loss of generality, assume that it has exactly m 0s; then we can further balance the subsequence consisting of 1s and 2s. Finally, we get a balanced word with alphabet size 3. More generally, we have the following result.

Theorem 4. Given an alphabet size q = αβ with two integers α and β, divide all the levels into β groups, denoted by {0, β, 2β, ...}, {1, β + 1, 2β + 1, ...}, ..., {β − 1, 2β − 1, 3β − 1, ...}. Given any word u ∈ G_q^{qm}, there exists an integer i such that u + 1^i 0^{qm−i} has exactly αm symbols in one of the first β − 1 groups.


Proof: Let us denote the groups by S_0, S_1, ..., S_{β-1}. Given a sequence u, we use n_j to denote the number of symbols of u that belong to S_j, and we let n'_j denote the number of symbols of u + 1^{qm} that belong to S_j. It is easy to see that n'_{j+1} = n_j for all j ∈ {0, 1, ..., β - 1}, where (β - 1) + 1 = 0. We prove by contradiction that there exists j ∈ {0, 1, ..., β - 2} such that n_j ≥ αm ≥ n'_j or n_j ≤ αm ≤ n'_j. Assume this statement is not true; then for every j ∈ {0, 1, ..., β - 2}, either min(n_j, n'_j) > αm or max(n_j, n'_j) < αm. So if n_1 > αm, we get n_j > αm for all j ∈ {0, 1, ..., β - 1} iteratively. Similarly, if n_1 < αm, we get n_j < αm for all j ∈ {0, 1, ..., β - 1} iteratively. Both cases contradict the fact that sum_{j=0}^{β-1} n_j = αmβ = qm.

Note that the number of symbols of u + 1^i 0^{qm-i} that belong to S_j changes by at most 1 when i increases by one. So if there exists j ∈ {0, 1, ..., β - 2} such that n_j ≥ αm ≥ n'_j or n_j ≤ αm ≤ n'_j, there always exists an integer i such that u + 1^i 0^{qm-i} has exactly αm symbols in S_j.

This completes the proof.

Based on the above result, given any q, we can always split all the levels into two groups and balance them (i.e., make the number of symbols belonging to each group proportional to the number of levels in that group). Then we balance the levels within each group. Iterating this process, all the levels become balanced.
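The existence claim behind this iterative balancing (Theorem 4) is constructive: one can simply scan i = 0, 1, ..., qm and test the group counts. Below is a minimal sketch; the function name and the deliberately naive recount at each step are choices of this sketch, not the paper's.

```python
def find_shift_index(u, q, alpha, beta):
    """Scan i = 0..len(u): let v be u with 1 added (mod q) to its first i
    symbols, i.e., v = u + 1^i 0^(qm-i). Return (i, j) once the group
    S_j = {j, j + beta, j + 2*beta, ...} (symbols with x % beta == j), for
    some j < beta - 1, holds exactly alpha*m symbols. Theorem 4 guarantees
    such an i exists when len(u) = q*m and q = alpha*beta."""
    qm = len(u)
    m = qm // q
    target = alpha * m
    v = list(u)
    for i in range(qm + 1):
        counts = [0] * beta
        for x in v:
            counts[x % beta] += 1
        for j in range(beta - 1):
            if counts[j] == target:
                return i, j
        if i < qm:
            v[i] = (v[i] + 1) % q  # extend the incremented prefix by one
    raise AssertionError("no valid shift index; contradicts Theorem 4")
```

For example, with q = 3, α = 1, β = 3 and the all-zero word of length 6, incrementing a prefix of length 2 puts exactly m = 2 symbols into the group S_1.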

In order to recover the original message, it requires roughly

  (q - 1) log_2 q log_2 m

bits of additional information when m is large. If we store this additional information as a prefix using a shorter balanced code, then we get a generalized construction of Knuth's code. If we instead follow the steps in Section VII and further add parity-check bits, then we get a partial-balanced code with error-correcting capability, based on which we can implement partial-balanced modulation for multiple-level cells.

Now, if we have a code that uses the 'full' set of balanced codewords, then the redundancy is

  log_2 q^{qm} - log_2 C(qm; m, m, ..., m) ≃ ((q - log_2 q)/2) log_2 m

bits, where C(qm; m, m, ..., m) is the multinomial coefficient counting the balanced words. So given an alphabet size q, the redundancy of the above method is about 2(q - 1) log_2 q / (q - log_2 q) times as high as that of codes that use the 'full' set of balanced codewords. For q = 2, 3, 4, 5, ..., 10, we list these factors as follows:

  2.0000, 4.4803, 6.0000, 6.9361, 7.5694, 8.0351, 8.4000, 8.6995, 8.9539.

This shows that as q increases, the above method becomes less efficient in terms of information rate. How to construct balanced codes for a nonbinary alphabet in a simple, efficient and computationally fast way is still an open question. It is even more difficult to construct balanced error-correcting codes for nonbinary alphabets.
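As a sanity check, the factors listed above follow directly from the ratio 2(q - 1) log_2 q / (q - log_2 q); the short script below (written for verification only, not part of the construction) reproduces them.

```python
import math

def overhead_factor(q):
    """Ratio 2(q-1)log2(q) / (q - log2(q)) of the generalized Knuth
    construction's redundancy to that of a code using the full set
    of balanced codewords."""
    return 2 * (q - 1) * math.log2(q) / (q - math.log2(q))

# For q = 2..10 this should reproduce the factors listed in the text.
print([round(overhead_factor(q), 4) for q in range(2, 11)])
```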

IX. CONCLUSION

In this paper, we introduced balanced modulation for reading/writing in nonvolatile memories. Based on the construction of balanced codes or balanced error-correcting codes, balanced modulation minimizes the effect of asymmetric noise, especially that introduced by cell-level drift, and hence can significantly reduce the bit error rate in nonvolatile memories. Compared with other schemes, balanced modulation is easy to implement in current memory systems and does not require any assumptions about the cell-level distributions, which makes it very practical. Furthermore, we studied the construction of balanced error-correcting codes, in particular balanced LDPC codes, which have very efficient encoding and decoding algorithms and are more efficient than prior constructions of balanced error-correcting codes.

REFERENCES

[1] S. Al-Bassam and B. Bose, "On balanced codes," IEEE Trans. Inform. Theory, vol. 36, pp. 406–408, Mar. 1990.
[2] R. Bez, E. Camerlenghi, A. Modelli, and A. Visconti, "Introduction to flash memory," Proceedings of the IEEE, vol. 91, pp. 489–502, 2003.
[3] J. E. Brewer and M. Gill, Nonvolatile Memory Technologies with Emphasis on Flash, John Wiley & Sons, Hoboken, New Jersey, 2008.
[4] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, "Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis," in Proc. Design, Automation, and Test in Europe (DATE), 2012.
[5] T. M. Cover, "Enumerative source coding," IEEE Trans. Inform. Theory, vol. 19, no. 1, pp. 73–77, Jan. 1973.
[6] J. Feldman, M. J. Wainwright, and D. R. Karger, "Using linear programming to decode binary linear codes," IEEE Trans. Inform. Theory, vol. 51, pp. 954–972, Mar. 2005.
[7] R. Gallager, "Low density parity check codes," IRE Trans. Inform. Theory, vol. 8, no. 1, pp. 21–28, Jan. 1962.
[8] R. Gallager, Low Density Parity Check Codes, no. 21 in Research Monograph Series, Cambridge, MA: MIT Press, 1963.
[9] K. S. Immink and J. Weber, "Very efficient balanced codes," IEEE Journal on Selected Areas in Communications, vol. 28, pp. 188–192, 2010.
[10] D. E. Knuth, "Efficient balanced codes," IEEE Trans. Inform. Theory, vol. 32, no. 1, pp. 51–53, 1986.
[11] H. T. Lue et al., "Study of incremental step pulse programming (ISPP) and STI edge effect of BE-SONOS NAND flash," in Proc. IEEE Int. Symp. on Reliability Physics, pp. 693–694, May 2008.
[12] A. Mazumdar, R. M. Roth, and P. O. Vontobel, "On linear balancing sets," in Proc. IEEE Int. Symp. Information Theory, pp. 2699–2703, 2009.
[13] R. McEliece, D. MacKay, and J. Cheng, "Turbo decoding as an instance of Pearl's belief propagation algorithm," IEEE J. Sel. Areas Commun., vol. 16, no. 2, pp. 140–152, Feb. 1998.
[14] N. Mielke, T. Marquart, N. Wu, J. Kessenich, H. Belgal, E. Schares, F. Trivedi, E. Goodness, and L. R. Nevill, "Bit error rate in NAND flash memories," in Proc. IEEE Int. Reliability Physics Symposium, pp. 9–19, 2008.
[15] A. Pirovano, A. Redaelli, et al., "Reliability study of phase-change nonvolatile memories," IEEE Transactions on Device and Materials Reliability, vol. 4, pp. 422–427, 2004.
[16] B. Y. Ryabko and E. Matchikina, "Fast and efficient construction of an unbiased random sequence," IEEE Trans. Inform. Theory, vol. 46, pp. 1090–1093, 2000.
[17] L. G. Tallini, R. M. Capocelli, and B. Bose, "Design of some new balanced codes," IEEE Trans. Inform. Theory, vol. 42, pp. 790–802, May 1996.
[18] H. van Tilborg and M. Blaum, "On error-correcting balanced codes," IEEE Trans. Inform. Theory, vol. 35, no. 5, pp. 1091–1095, Sep. 1989.
[19] J. H. Weber and K. A. S. Immink, "Knuth's balanced code revisited," IEEE Trans. Inform. Theory, vol. 56, no. 4, pp. 1673–1679, Apr. 2010.
[20] J. Weber, K. S. Immink, and H. Ferreira, "Error-correcting balanced Knuth codes," IEEE Trans. Inform. Theory, vol. 58, no. 1, pp. 82–89, 2012.
[21] H. Wong, S. Raoux, S. Kim, J. Liang, J. P. Reifenberg, B. Rajendran, M. Asheghi, and K. E. Goodson, "Phase change memory," Proc. IEEE, vol. 98, no. 12, pp. 2201–2227, Dec. 2010.