Adaptive Read Thresholds for NAND Flash
Borja Peleato, Member, IEEE, Rajiv Agarwal, Member, IEEE, John Cioffi, Fellow, IEEE,
Minghai Qin, Member, IEEE, and Paul H. Siegel, Fellow, IEEE
Abstract—A primary source of increased read time on NAND
flash comes from the fact that in the presence of noise, the flash
medium must be read several times using different read threshold
voltages for the decoder to succeed. This paper proposes an
algorithm that uses a limited number of re-reads to characterize
the noise distribution and recover the stored information. Both
hard and soft decoding are considered. For hard decoding, the
paper attempts to find a read threshold minimizing bit-error-rate
(BER) and derives an expression for the resulting codeword-
error-rate. For soft decoding, it shows that minimizing BER and
minimizing codeword-error-rate are competing objectives in the
presence of a limited number of allowed re-reads, and proposes
a trade-off between the two.
The proposed method does not require any prior knowledge
about the noise distribution, but can take advantage of such
information when it is available. Each read threshold is chosen
based on the results of previous reads, following an optimal policy
derived through a dynamic programming backward recursion.
The method and results are studied from the perspective of an
SLC Flash memory with Gaussian noise but the paper explains
how the method could be extended to other scenarios.
Index Terms—Flash memory, multi-level memory, voltage
threshold, adaptive read, soft information, symmetric capacity.
A. Overview
The introduction of Solid State Drives (SSD) based on
NAND flash memories has revolutionized mobile, laptop, and
enterprise storage by offering random access to the information
with dramatically higher read throughput and power-efficiency
than hard disk drives. However, SSDs are considerably more
expensive, which poses an obstacle to their widespread use.
NAND flash manufacturers have tried to pack more data in
the same silicon area by scaling the size of the flash cells and
storing more bits in each of them, thus reducing the cost per
gigabyte (GB) and making flash more attractive to consumers,
but this cell-size shrinkage has come at the cost of reduced
performance. As cell-size shrinks to sub-16nm limits, noise
can cause the voltage residing on the cell at read time to be
significantly different from the voltage that was intended to
be stored at the time of write. Even in current state-of-the-art
19nm NAND, noise is significant towards the end of life of
the drive. One way to recover host data in the presence of
noise is to use advanced signal processing algorithms [1]–[4],
but excessive re-reads and post-read signal processing could
jeopardize the advantages brought by this technology.
B. Peleato is with Purdue University, West Lafayette, IN, 47907 USA (e-
mail: bpeleato@purdue.edu).
R. Agarwal and J. Cioffi are with Stanford University, Stanford, CA 94305
USA (e-mail: {rajivag, cioffi}@stanford.edu).
M. Qin and P. H. Siegel are with the University of California, San Diego,
La Jolla, CA 92093, USA (e-mail:{mqin, psiegel}@ucsd.edu)
Manuscript received January 25, 2015; revised June 29, 2015.
Typically, all post-read signal processing algorithms require
re-reads using different thresholds, but the default read thresh-
olds, which are good for voltage levels intended during write,
are often suboptimal for read-back of host data. Furthermore,
the noise in the stored voltages is random and depends on
several factors such as time, data, and temperature; so a fixed
set of read thresholds will not be optimal throughout the entire
life of the drive. Thus, finding optimal read thresholds in a
dynamic manner to minimize BER and speed up the post-
processing is essential.
The first half of the paper proposes an algorithm for charac-
terizing the distribution of the noise for each nominal voltage
level and estimating the read thresholds which minimize BER.
It also presents an analytical expression relating the BER
found using the proposed methods to the minimum possible
BER. Though BER is a useful metric for algebraic error
correction codes, the distribution of the number of errors is
also important. Some flash memory controllers use a weaker
decoder when the number of errors is small and switch to a
stronger one when the former fails, both for the same code
(e.g. bit-flipping and min-sum for decoding an LDPC code
[5]). The average read throughput and total power consumption
depends on how frequently each decoder is used. Therefore,
the distribution of the number of errors, which is also derived
here, is a useful tool for estimating NAND power consumption.
The second half of the paper modifies the proposed algo-
rithm to address the quality of the soft information generated,
instead of just the number of errors. In some cases, the BER is
too large for a hard decoder to succeed, even if the read is done
at the optimal threshold. It is then necessary to generate soft
information by performing multiple reads with different read
thresholds. The choice of read thresholds has a direct impact
on the quality of the soft information generated, which in turn
dictates the number of decoder iterations and the number of
re-reads required. The paper models the flash as a discrete
memoryless channel with mismatched decoding and attempts
to maximize its capacity through dynamic programming.
The overall scheme works as follows. First, the controller
reads with an initial threshold and attempts a hard-decoding
of the information. If the noise is weak and the initial
threshold was well chosen, the decoding succeeds and no fur-
ther processing is needed. Otherwise, the controller performs
additional reads with adaptively chosen thresholds to estimate
the mean and/or variance of the voltages for each level. These
estimates are in turn used to estimate the minimum feasible
BER and the corresponding optimal read threshold. The flash
controller then decides whether to perform an additional read
with that estimated threshold to attempt hard decoding again,
or directly attempt a more robust decoding of the information,
leveraging the previous reads to generate soft information.
B. Literature review
Most of the existing literature on optimizing the read
thresholds for NAND flash assumes that prior information on
the noise is available (e.g., [6]–[10]). Some methods, such
as the one proposed by Wang et al. in [11], assume complete
knowledge of the noise and choose the read thresholds so as to
maximize the mutual information between the values written
and read, while others attempt to predict the noise from the
number of program-erase (PE) cycles and then optimize the
read thresholds based on that prediction. An example of the
latter was proposed by Cai et al. in [12]. References [13] and
[14] also address threshold selection and error-correction.
However, in some practical cases there is no prior informa-
tion available, or the prior information is not accurate enough
to build a reliable noise model. In these situations, a common
approach is to perform several reads with different thresholds
searching for the one that returns an equal number of cells on
either side, i.e., the median between the two levels¹. However,
the median threshold is suboptimal in general, as was shown
in [1]. In [2] and [15] Zhou et al. proposed encoding the
data using balanced, asymmetric, or Berger codes to facilitate
the threshold selection. Balanced codes guarantee that all
codewords have the same number of ones and zeros, hence
narrowing the gap between the median and optimal thresholds.
Asymmetric and Berger codes, first described in [16], leverage
the known asymmetry of the channel to tolerate suboptimal
thresholds. Berger codes are able to detect any number of
unidirectional errors. In cases of significant leakage, where all
the cells reduce their voltage level, it is possible to perform
several reads with progressively decreasing thresholds until the
Berger code detects a low enough number of errors, and only
then attempt decoding to recover the host information.
Researchers have also proposed some innovative data rep-
resentation schemes with different requirements in terms of
read thresholds. For example, rank modulation [17]–[21] stores
information in the relative voltages between the cells instead
of using pre-defined voltage levels. The strategy of writing
data represented by rank modulation in parallel to flash
memories is studied in [22]. Theoretically, rank modulation
does not require actual read thresholds, but just comparisons
between the cell voltages. Unfortunately, there are a few
technological challenges that need to be overcome before
rank modulation becomes practical. Other examples include
constrained codes [23], [24]; write-once memory codes [25]–[27]; and other rewriting codes [28]. All these codes
impose restrictions on the levels that can be used during
a specific write operation. Since read thresholds need only
separate the levels being used, they can often take advantage
of these restrictions.
The scheme proposed in this paper is similar to those
described in [29] and [30] in that it assumes no prior in-
formation about the noise or data representation, but it is
significantly simpler and more efficient. We propose using
a small number of reads chosen by a dynamic program to
¹In many cases this threshold is not explicitly identified as the median cell voltage, but only implicitly as the solution of (t − µ1)/σ1 = (µ2 − t)/σ2, where (µ1, σ1) and (µ2, σ2) are the means and standard deviations of the level voltages.
simultaneously estimate the noise and recover the information,
instead of periodically testing multiple thresholds (as in [29])
or running a computationally intensive optimization algorithm
to perfect the model (as in [30]). A prior version of this paper
was published in [31], but the work presented here has been
significantly extended.
Cells in a NAND flash are organized in terms of pages,
which are the smallest units for write and read operations.
Writing the cells in a page is done through a program and
verify approach where voltage pulses are sent into the cells
until their stored voltage exceeds the desired one. Once a cell
has reached its desired voltage, it is inhibited from receiving
subsequent pulses and the programming of the other cells
in the page continues. However, the inhibition mechanism is
non-ideal and future pulses may increase the voltage of the
cell [12], creating write noise. The other two main sources
of noise are inter-cell interference (ICI), caused by interaction
between neighboring cells [32], and charges leaking out of the
cells with time and heat [33].
Some attempts have been made to model these sources
of noise as a function of time, voltage levels, amplitude
of the programming pulses, etc. Unfortunately, the noise is
temperature- and page-dependent as well as time- and data-
dependent [34]. Since the controller cannot measure those fac-
tors, it cannot accurately estimate the noise without performing
additional reads. This paper assumes that the overall noise
follows a Gaussian distribution for each level, as is common
in the literature, but assumes no prior knowledge about their
means or variances. Section VI will explain how the same idea
can be used when the noise is not Gaussian.
Reading the cells in a page is done by comparing their
stored voltage with a threshold voltage t. The read operation
returns a binary vector with one bit for each cell. Bits
corresponding to cells with voltage lower than t are 1 and those corresponding to cells with voltage higher than t are 0.
However, the aforementioned sources of voltage disturbance
can cause some cells to be misclassified, introducing errors in
the bit values read. The choice of a read threshold therefore
becomes important to minimize the BER in the reads.
In a b-bit MLC flash, each cell stores one of 2^b distinct predefined voltage levels. When each cell stores multiple bits, i.e., b ≥ 2, the mapping of information bits to voltage levels is done using Gray coding to ensure that only one bit changes between adjacent levels. Since errors almost always happen between adjacent levels, Gray coding minimizes the average BER. Furthermore, each of the b bits is assigned to
a different page, as shown in Fig. 1. This is done so as to
reduce the number of comparisons required to read a page. For
example, the lower page of a TLC (b = 3) flash can be read
by comparing the cell voltages with a single read threshold
located between the fourth and fifth levels, denoted by D in
Fig. 1. The first four levels encode a bit value 1 for the lower
page, while the last four levels encode a value 0. Unfortunately,
reading the middle and upper pages require comparing the cell
voltages with more read thresholds: two (B,F) for the middle
page and four (A,C,E,G) for the upper page.
Fig. 1. Typical mapping of bits to pages and levels in a TLC Flash memory, with read thresholds A through G separating the eight levels.
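The page structure above can be reproduced with a binary-reflected Gray code; the snippet below is an illustrative sketch (the exact bit-to-level assignment and bit polarity vary by manufacturer), checking that one page needs a single read threshold while the other two need two and four:

```python
def gray(n):
    """Binary-reflected Gray code: one common choice for the Fig. 1 mapping."""
    return (n >> 1) ^ n

codes = [gray(level) for level in range(8)]  # 3 bits per cell (TLC)

# Adjacent levels differ in exactly one bit, so a misread between
# neighboring levels corrupts only one of the three pages.
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:]))

def thresholds_for_bit(bit):
    """Level boundaries where this bit flips = read thresholds for its page."""
    bits = [(c >> bit) & 1 for c in codes]
    return [i for i in range(7) if bits[i] != bits[i + 1]]

# One page needs a single threshold, the others two and four,
# matching D, (B, F), and (A, C, E, G) in Fig. 1.
page_read_counts = sorted(len(thresholds_for_bit(b)) for b in range(3))
```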
Upper pages take longer to read than lower ones, but
the difference is not as large as it might seem. Flash chips
generally incorporate dedicated hardware for performing all
the comparisons required to read upper pages, without the
additional overhead that would arise from issuing multiple
independent read requests. The flash controller can then be
oblivious to the type of page being read. Read commands only
need to specify the page being read and a scalar parameter
representing the desired shift in the read thresholds from their
default value. If the page is a lower one, which employs only
one threshold, the scalar parameter is understood as the shift
in this threshold. If the page is an upper one, which employs
multiple read thresholds, their shifts are parameterized by the
scalar parameter. For example, a parameter value of ∆ when reading the middle page in Fig. 1 could shift thresholds B and F by ∆ and 3∆/2 mV, respectively. Then, cells whose voltage falls between the shifted thresholds B and F would be read as 0 and the rest as 1.
After fixing this parametrization, the flash controller views
all the pages in an MLC or TLC memory as independent
SLC pages with a single read shift parameter that needs
to be optimized. In theory, each low level threshold could
be independently optimized, but the large number of reads
and amount of memory required would render that approach
impractical. Hence, most of the paper will assume an SLC
architecture for the flash and Section VI will show how the
same method and results can be readily extended to memories
with more bits per cell.
Figure 2 (a) shows two overlapping Gaussian probability
density functions (pdfs), corresponding to the two voltage lev-
els to which cells can be programmed. Since data is generally
compressed before being written onto flash, approximately
the same number of cells is programmed to each level. The
figure also includes three possible read thresholds. Denoting by (µ1, σ1) and (µ2, σ2) the means and standard deviations of the two Gaussian distributions, the thresholds are: t_mean = (µ1 + µ2)/2, t_median = (µ1σ2 + µ2σ1)/(σ1 + σ2), and t*, which minimizes BER. If the noise variance were the same for both levels all three thresholds would be equal, but this is not the case in practice. The plot legend provides the BER obtained when reading with each of the three thresholds.
There exist several ways in which the optimal threshold, t*, can be found. A common approach is to perform several
reads by shifting the thresholds in one direction until the
decoding succeeds. Once the data has been recovered, it can be
compared with the read outputs to find the threshold yielding
the lowest BER [29]. However, this method can require a
large number of reads if the initial estimate is inaccurate,
Fig. 2. (a): Cell voltage pdfs in an SLC page, and BER for three different thresholds: t_mean = (µ1 + µ2)/2 is the average of the cell voltages (BER = 5.6%), t_median returns the same number of 1s and 0s (BER = 4.8%), and t* minimizes the BER (BER = 4.5%) and is located at the intersection of the two pdfs. (b): cdf corresponding to the pdf in (a).
which reduces read throughput, and additional memory to store
and compare the successive reads, which increases cost. The
approach taken in this paper consists of estimating (µ1, σ1) and (µ2, σ2) and deriving t* analytically. It will be shown
how this can be done with as few reads as possible, thereby
reducing read time. Furthermore, the mean and standard
deviation estimates can also be used for other tasks, such as
generating soft information for LDPC decoding.
A read operation with a threshold voltage t returns a binary vector with a one for each cell whose voltage level is lower than t and a zero otherwise. The fraction of ones in the read output is then equal to the probability of a randomly chosen cell having a voltage level below t. Consequently, a read with a threshold voltage t can be used to obtain a sample from the cumulative distribution function (cdf) of cell voltages at t,
illustrated in Fig. 2 (b).
The problem is then reduced to estimating the means and
variances of a mixture of Gaussians using samples from their
joint cdf. These samples will be corrupted by model, read, and
quantization noise. Model noise is caused by the deviation
of the actual distribution of cell voltages from a Gaussian
distribution. Read noise is caused by the intrinsic reading
mechanism of the flash, which can read some cells as storing
higher or lower voltages than they actually have. Quantization noise is caused by limited computational accuracy and rounding of the Gaussian cdf². All these sources of noise are collectively referred to as read noise in this paper. It is assumed to be zero mean, but no other restriction is imposed in our analysis.
It is desirable to devote as few reads as possible to the
estimation of (µ1, σ1) and (µ2, σ2). The accuracy of the
estimates would improve with the number of reads, but read
time would also increase. Since there are four parameters to
be estimated, at least four reads will be necessary. Section III
describes how the locations of the read thresholds should be
²Since the Gaussian cdf has no analytical expression, it is generally quantized and stored as a lookup table.
chosen in order to achieve accurate estimates and Section IV
extends the framework to consider how these reads could
be reused to obtain soft information for an LDPC decoder.
If the soft information obtained from the first four reads is
enough for the LDPC decoding to succeed, no additional
reads will be required, thereby reducing the total read time
of the flash. Section V proposes a dynamic programming
method for optimizing the thresholds for a desired objective.
Finally, Section VI explains how to extend the algorithm
for MLC or TLC memories, as well as for non-Gaussian
noise distributions. Section VII provides simulation results
to evaluate the performance of the proposed algorithms and
Section VIII concludes the paper.
A. Parameter estimation
Let t_i, i = 1, ..., 4 be four voltage thresholds used for reading a page and let y_i, i = 1, ..., 4 be the fraction of ones in the output vector for each of the reads, respectively. If (µ1, σ1) and (µ2, σ2) denote the voltage means and standard deviations for the cells programmed to the two levels, then

    y_i = (1/2) Q((µ1 − t_i)/σ1) + (1/2) Q((µ2 − t_i)/σ2) + n_{y_i},  i = 1, ..., 4,   (1)

where

    Q(x) = ∫_x^∞ (1/√(2π)) e^(−t²/2) dt   (2)

and n_{y_i} denotes the read noise associated with y_i. In theory,
it is possible to estimate (µ1, σ1) and (µ2, σ2) from (t_i, y_i), i = 1, ..., 4 by solving the system of non-linear equations in Eq. (1), but in practice the computational complexity could
be too large for some systems. Another possible approach
would be to restrict the estimates to a pre-defined set of values
and generate a lookup table for each combination. Finding
the table which best fits the samples would require negligible
time but the amount of memory required could render this
approach impractical for some systems. This section proposes and evaluates a progressive read algorithm that combines these two approaches, providing similar accuracy to the former and requiring only a standard normal (µ = 0, σ = 1) look-up table.
Progressive Read Algorithm: The key idea is to perform two reads at locations where one of the Q functions is known to be either close to 0 or close to 1. The problem with solving the system in Eq. (1) was that a sum of Q functions cannot be easily inverted. However, once one of the two Q functions is fixed at 0 or 1, the equation can be put in linear form using a standard normal table to invert the other Q function. The system of linear equations can then be solved to estimate the first mean and variance. Once the first mean and variance have been estimated they can be used to evaluate a Q function from each of the two remaining equations in Eq. (1), which can then be solved in a similar way. For example, if t1 and t2 are significantly smaller than µ2, then

    y_i ≈ (1/2) Q((µ1 − t_i)/σ1) + n_{y_i},  i = 1, 2,

and Eq. (1) can be solved for µ̂1 and σ̂1 to get

    σ̂1 = (t2 − t1) / (Q⁻¹(2y1) − Q⁻¹(2y2)),   µ̂1 = t1 + σ̂1 Q⁻¹(2y1).   (3)

Substituting these in the equations for the third and fourth reads and solving gives

    σ̂2 = (t4 − t3) / (Q⁻¹(z3) − Q⁻¹(z4)),   µ̂2 = t3 + σ̂2 Q⁻¹(z3),   (4)

where z_i = 2y_i − Q((µ̂1 − t_i)/σ̂1), i = 3, 4.
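As an illustration, the following sketch implements this two-stage estimation under the Gaussian model of Eq. (1); the Q and Q⁻¹ helpers and all level and threshold values are ours, and read noise is omitted so the estimates come out essentially exact:

```python
from statistics import NormalDist

_N = NormalDist()
Q = lambda x: 1.0 - _N.cdf(x)          # Q(x) = P(Z > x)
Qinv = lambda p: _N.inv_cdf(1.0 - p)   # inverse of Q

def read_fraction(t, mu1, s1, mu2, s2):
    """Fraction of ones returned by a read at threshold t (Eq. (1), noiseless):
    half the cells sit at each level, and cells below t read as 1."""
    return 0.5 * Q((mu1 - t) / s1) + 0.5 * Q((mu2 - t) / s2)

def progressive_read(t, y):
    """Estimate (mu1, s1, mu2, s2) from four reads (t[i], y[i]), where
    t[0], t[1] lie far below mu2 so the level-2 Q term is ~0 there."""
    x1, x2 = Qinv(2 * y[0]), Qinv(2 * y[1])
    s1 = (t[1] - t[0]) / (x1 - x2)               # Eq. (3)
    mu1 = t[0] + s1 * x1
    # cancel the estimated level-1 contribution from the last two reads
    z3 = 2 * y[2] - Q((mu1 - t[2]) / s1)
    z4 = 2 * y[3] - Q((mu1 - t[3]) / s1)
    x3, x4 = Qinv(z3), Qinv(z4)
    s2 = (t[3] - t[2]) / (x3 - x4)               # Eq. (4)
    mu2 = t[2] + s2 * x3
    return mu1, s1, mu2, s2

# illustrative SLC levels (1.0, 0.3) and (3.0, 0.4)
t = [0.7, 1.1, 2.0, 2.9]
y = [read_fraction(ti, 1.0, 0.3, 3.0, 0.4) for ti in t]
est = progressive_read(t, y)
```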
It could be argued that, since the pdfs are not known a
priori, it is not possible to determine two read locations where
one of the Qfunctions is close to 0 or close to 1. In practice,
however, each read threshold can be chosen based on the result
from the previous ones. For example, say the first randomly chosen read location returned y1 = 0.6. This read, if used for estimating the higher level distribution, will be a bad choice because there will be significant overlap from the lower level. Hence, a smart choice would be to obtain two reads for the lower level that are clear of the higher level by reading to the far left of t1. Once the lower level is canceled, the y1 = 0.6 read can be used in combination with a fourth read to the right of t1 to estimate the higher level distribution.
Once the mean and variance of both pdfs have been estimated, it is possible to derive an estimate for the read threshold minimizing the BER. The BER associated with a given read threshold t is given by

    BER(t) = (1/2) Q((t − µ1)/σ1) + (1/2) Q((µ2 − t)/σ2).   (5)

Setting its derivative equal to zero gives the following equation for the optimal threshold t*:

    (1/σ1) φ((t* − µ1)/σ1) = (1/σ2) φ((t* − µ2)/σ2),   (6)

where φ(x) = (2π)^(−1/2) e^(−x²/2). The optimal threshold t* is located at the point where both Gaussian pdfs intersect. An estimate t̂ for t* can be found from the quadratic equation

    ((t̂ − µ̂1)/σ̂1)² − ((t̂ − µ̂2)/σ̂2)² = 2 log(σ̂2/σ̂1),   (7)

which can be shown to be equivalent to solving Eq. (6) with (µ1, σ1) and (µ2, σ2) replaced by their estimated values.
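Solving Eq. (7) amounts to expanding the squares and keeping the quadratic root that falls between the two level means; a small sketch with illustrative parameters:

```python
import math

Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))

def ber(t, mu1, s1, mu2, s2):
    """Eq. (5): misread probability when reading at threshold t."""
    return 0.5 * Q((t - mu1) / s1) + 0.5 * Q((mu2 - t) / s2)

def threshold_estimate(mu1, s1, mu2, s2):
    """Root of Eq. (7) lying between the level means (assumes mu1 < mu2)."""
    if abs(s1 - s2) < 1e-12:
        return 0.5 * (mu1 + mu2)   # equal variances: pdfs cross at the midpoint
    a = 1 / s1**2 - 1 / s2**2
    b = -2 * (mu1 / s1**2 - mu2 / s2**2)
    c = (mu1 / s1)**2 - (mu2 / s2)**2 - 2 * math.log(s2 / s1)
    d = math.sqrt(b * b - 4 * a * c)
    return next(t for t in ((-b - d) / (2 * a), (-b + d) / (2 * a))
                if mu1 < t < mu2)

t_hat = threshold_estimate(1.0, 0.3, 3.0, 0.4)
```

With these parameters t̂ beats both t_mean and any small shift away from it, consistent with t* being the pdf intersection point.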
If some parameters are known, the number of reads can be reduced. For example, if µ1 is known, the first read can be replaced by t1 = µ1, y1 = 0.25 in the above equations. Similarly, if σ1 is known, (t1, y1) are not needed in Eqs. (3)-(4).
B. Error propagation
This subsection first studies how the choice of read locations affects the accuracy of the estimators (µ̂1, σ̂1), (µ̂2, σ̂2), and correspondingly t̂. Then it analyzes how the accuracy of t̂ translates into BER(t̂), and provides some guidelines as to how the read locations should be chosen. Without loss of generality, it will be assumed that (µ1, σ1) are estimated first using (t1, y1) and (t2, y2) according to the Progressive Read Algorithm described in Section III-A, and (µ2, σ2) are estimated in the second stage. In this case, Eq. (1) reduces to

    Q((µ1 − t_i)/σ1) = 2y_i − 2n_{y_i}   (8)

for i = 1, 2 and the estimates are given by Eqs. (3).
If the read thresholds are on the tails of the distributions, a small perturbation in the cdf value y could cause a significant change in Q⁻¹(y). This will in turn lead to a significant change in the estimates. Specifically, a first-order Taylor expansion of Q⁻¹(y + n_y) at y can be written as

    Q⁻¹(y + n_y) = x − n_y √(2π) e^(x²/2) + O(n_y²),

where x = Q⁻¹(y). Since the exponent of e is always positive, the first-order error term is minimized when x = 0, i.e., when the read is performed at the mean. The expressions for (µ̂1, σ̂1) and (µ̂2, σ̂2) as seen in Eqs. (3)-(4) use inverse Q functions, so the estimation error due to read noise will be reduced when the reads are done close to the means of the Gaussian distributions. The first-order Taylor expansion of Eq. (3) at σ1 is given by

    σ̂1 = σ1 + (σ1 / (x2 − x1)) (n2 − n1) + O(n1², n2²),   (9)

where x_i = Q⁻¹(2y_i) and

    n1 = 2√(2π) e^(x1²/2) n_{y1},   n2 = 2√(2π) e^(x2²/2) n_{y2}.   (10)
A similar expansion can be performed for µ̂1, obtaining

    µ̂1 = µ1 + (σ1 / (x2 − x1)) (x1 n2 − x2 n1) + O(n1², n2²).   (11)
Two different tendencies can be observed in the above expressions. On one hand, Eqs. (10) suggest that both t1 and t2 should be chosen close to µ1 so as to reduce the magnitude of n1 and n2. On the other hand, if t1 and t2 are very close together, the denominators in Eqs. (9) and (11) can become small, increasing the estimation error.
The error expansions for µ̂2, σ̂2, and t̂ are omitted for simplicity, but it can be shown that the dominant terms are linear in n_{y_i}, i = 1, ..., 4 as long as all n_{y_i} are small enough. The Taylor expansion for BER(t̂) at t* is

    BER(t̂) = BER(t*) + (1/2) BER''(t*) e_t² + O(e_t³)
            = BER(t*) + O(e_t²),   (12)
Fig. 3. The relative error in the mean, variance, and threshold estimates increases linearly with the read noise (slope = 1), but the BER error grows quadratically (slope = 2) and is negligible for a wide range of read noise values.
where t̂ = t* + e_t. The cancellation of the first-order term is justified by Eq. (6). Summarizing, the mean and variance estimation error increases linearly with the read noise, as does the deviation in the estimated optimal read threshold. The increase in BER, on the other hand, is free from linear terms. As long as the read noise is not too large, the resulting BER(t̂) is close to the minimum possible BER. The numerical simulations in Fig. 3 confirm these results.
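The quadratic insensitivity of the BER to small threshold errors can be checked numerically with a short sketch (illustrative Gaussian levels; the optimum is located by a fine scan): doubling the threshold offset should multiply the BER penalty by roughly four.

```python
import math

Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))

def ber(t, mu1=1.0, s1=0.3, mu2=3.0, s2=0.4):
    # half the cells at each level; errors are cells on the wrong side of t
    return 0.5 * Q((t - mu1) / s1) + 0.5 * Q((mu2 - t) / s2)

# locate t* by a fine scan between the level means
grid = [1.0 + 2.0 * i / 20000 for i in range(20001)]
t_star = min(grid, key=ber)

# a threshold offset e (linear in the read noise) raises the BER only as e^2
penalty = lambda e: ber(t_star + e) - ber(t_star)
ratio = penalty(0.02) / penalty(0.01)   # ~4 for a quadratic penalty
```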
In view of these results, it seems that the read thresholds
should be spread out over both pdfs but close to the levels’
mean voltages. Choosing the thresholds in this way will
reduce the error propagating from the reads to the estimates.
However, read thresholds can be chosen sequentially, using the
information obtained from each read in selecting subsequent
thresholds. Section V proposes a method for finding the
optimal read thresholds more precisely.
This section considers a new scenario where a layered
decoding approach is used for increased error-correction ca-
pability. After reading a page, the controller may first attempt
to correct any bit errors in the read-back codeword using
a hard decoder alone, typically a bit-flipping hard-LDPC
decoder [35]. Reading with the threshold t̂ found through Eq. (7) reduces the number of hard errors, but there are cases in which even BER(t̂) is too high for the hard decoder to
succeed. When this happens, the controller will attempt a
soft decoding, typically using a min-sum or sum-product soft
LDPC decoder.
Soft decoders are more powerful, but also significantly
slower and less power efficient than hard decoders. Conse-
quently, invoking soft LDPC decoding too often can signifi-
cantly impact the controller’s average read time. In order to
estimate the probability of requiring soft decoding, one must
look at the distribution of the number of errors, and not at BER
alone. For example, if the number of errors per codeword is
uniformly distributed between 40 and 60 and the hard decoder
can correct 75 errors, soft decoding will never be needed.
However, if the number of errors is uniformly distributed
between 0 and 100 (same BER), soft decoding will be required
to decode 25% of the reads. Section IV-A addresses this topic.
The error-correction capability of a soft decoder depends
heavily on the quality of the soft information at its input.
It is always possible to increase such quality by performing
Failure rate    pe = 0.008    pe = 0.01    pe = 0.012
α = 23              0.05          0.28          0.62
α = 25              0.016         0.15          0.46
α = 27              0.004         0.07          0.31
additional reads, but this decreases read throughput. Sec-
tion IV-B shows how the Progressive Read Algorithm from
the previous section can be modified to provide high quality
soft information.
A. Distribution of the number of errors
Let N be the number of bits in a codeword. Assuming that both levels are equally likely, the probability of error for any given bit, denoted pe, is given in Eq. (5). Errors can be considered independent, hence the number of them in a codeword follows a binomial distribution with parameters N and pe. Since N is usually large, it becomes convenient to approximate the binomial by a Gaussian distribution with mean N pe and variance N pe(1 − pe), or by a Poisson distribution with parameter N pe when N pe is small.
Under the Gaussian approximation paradigm, a codeword fails to decode with probability Q((α − N pe)/√(N pe(1 − pe))), where α denotes the number of bit errors that can be corrected. Table I shows that a small change in the value of α may significantly increase the frequency with which a stronger decoder is needed. This has a direct impact on the average power consumption of the controller. The distribution of bit errors can thus be used to judiciously choose a value of α in order to meet a power constraint.
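Under this Gaussian approximation the failure rate is a one-line computation. The codeword length behind Table I is not stated here; N = 2048 bits is our assumption, which roughly reproduces the tabulated values:

```python
import math

def failure_rate(N, pe, alpha):
    """P(#errors > alpha) for #errors ~ Binomial(N, pe), Gaussian approx."""
    Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))
    return Q((alpha - N * pe) / math.sqrt(N * pe * (1 - pe)))

# e.g. with N = 2048 (assumed), pe = 0.008 and alpha = 27 give ~0.004,
# close to the corresponding entry of Table I
rate = failure_rate(2048, 0.008, 27)
```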
B. Obtaining soft inputs
After performing Mreads on a page, each cell can be
classified as falling into one of the M+1 intervals between the
read thresholds. The problem of reliably storing information
on the flash is therefore equivalent to the problem of reliable
transmission over a discrete memoryless channel (DMC), such
as the one in Fig. 4. Channel inputs represent the levels to
which the cells are written, outputs represent read intervals,
and channel transition probabilities specify how likely it is
for cells programmed to a specific level to be found in each
interval at read time.
Fig. 4. DMC channel equivalent to the Flash read channel with four reads.
It is well known that the capacity of a channel is given by the
maximum mutual information between the input and output
over all input distributions (codebooks) [36]. In practice,
however, the code must be chosen at write time when the
channel is still unknown, making it impossible to adapt the
input distribution to the channel. Although some asymmetric
codes have been proposed (e.g. [15], [24], [37]), channel
inputs are equiprobable for most practical codes. The mutual information between the input and the output is then given by

    I(X;Y) = 1 + (1/2) Σ_{j=1}^{M+1} [ p_{1j} log(p_{1j}) + p_{2j} log(p_{2j}) − (p_{1j} + p_{2j}) log(p_{1j} + p_{2j}) ],   (13)

where p_{ij}, i = 1, 2, j = 1, ..., M+1 are the channel transition probabilities and the logarithms are base 2. For Gaussian noise, these transition probabilities can be found as

    p_{ij} = Q((µ_i − t_j)/σ_i) − Q((µ_i − t_{j−1})/σ_i),   (14)

where t_0 = −∞ and t_{M+1} = ∞.
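Both the transition probabilities and I(X;Y) follow directly from estimated level parameters. The sketch below (illustrative means and deviations, logs base 2) also shows that four reads spread over the uncertainty region yield a higher mutual information than a single read near the BER-optimal threshold:

```python
import math

Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))

def transition_probs(thresholds, mu, sigma):
    """Row of the DMC matrix for one level: P(cell falls in interval j),
    with boundaries t_0 = -inf < t_1 < ... < t_M < t_{M+1} = inf."""
    t = [-math.inf] + sorted(thresholds) + [math.inf]
    cdf = lambda x: Q((mu - x) / sigma)   # P(V < x)
    return [cdf(t[j]) - cdf(t[j - 1]) for j in range(1, len(t))]

def mutual_information(thresholds, mu1, s1, mu2, s2):
    """I(X;Y) in bits for the equivalent DMC with equiprobable inputs."""
    p1 = transition_probs(thresholds, mu1, s1)
    p2 = transition_probs(thresholds, mu2, s2)
    h = lambda p: -p * math.log2(p) if p > 0 else 0.0
    return sum(h(0.5 * (a + b)) - 0.5 * (h(a) + h(b)) for a, b in zip(p1, p2))

one_read = mutual_information([1.874], 1.0, 0.3, 3.0, 0.4)
four_reads = mutual_information([1.6, 1.8, 2.0, 2.2], 1.0, 0.3, 3.0, 0.4)
```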
The inputs to a soft decoder are given in the form of log-likelihood ratios (LLR). The LLR value associated with a read interval k is defined as LLR_k = log(p_{1k}/p_{2k}). When the mean and variance are known it is possible to obtain good LLR values by reading at the locations that maximize I(X;Y) [11], which tend to be in the so-called uncertainty region, where both pdfs are comparable. However, the mean and variance are generally not known and need to be estimated. Section III
provided some guidelines on how to choose read thresholds in
order to obtain accurate estimates, but those reads tend to pro-
duce poor LLR values. Hence, there are two opposing trends:
spreading out the reads over a wide range of voltage values
yields more accurate mean and variance estimates but degrades
the performance of the soft decoder, while concentrating the
reads on the uncertainty region provides better LLR values but
might yield inaccurate estimates which in turn undermine the
soft decoding.
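Given estimated transition probabilities, the mapping from read intervals to LLR values is direct; the sketch below (our own helper, with an illustrative clipping constant to keep empty intervals finite) follows the LLR_k definition above:

```python
from math import log

def llr_per_interval(p1, p2, eps=1e-12):
    """LLR_k = log(p_{1k} / p_{2k}) for each read interval k, computed from
    (possibly estimated) transition probabilities.  Probabilities are clipped
    at eps so that intervals containing no cells of one level stay finite."""
    return [log(max(a, eps) / max(b, eps)) for a, b in zip(p1, p2)]
```

Positive LLRs mark intervals more likely under the first level, negative ones the second; their magnitude reflects the decoder's confidence.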
Some flash manufacturers are already incorporating soft
read commands that return 3 or 4 bits of information for each
cell, but the thresholds for those reads are often pre-specified
and kept constant throughout the lifetime of the device.
Furthermore, most controller manufacturers use a pre-defined
mapping of read intervals to LLR values regardless of the
result of the reads. We propose adjusting the read thresholds
and LLR values adaptively to fit our channel estimates.
Our goal is to find the read locations that maximize the
probability of successful decoding when levels are equiproba-
ble and the decoding is done based on the estimated transition
probabilities. With this goal in mind, Section IV-C will derive
a bound for the (symmetric and mismatched) channel capacity
in this scenario and Section V will show how to choose the
read thresholds so as to maximize this bound. The error-free
coding rate specified by the bound will not be achievable in
practice due to finite code length, limited computational power,
etc., but the BER at the output of a decoder is closely related
to the capacity of the channel [38], [39]. The read thresholds
that maximize the capacity of the channel are generally the
same ones that minimize the BER, in practice.
C. Bound for maximum transmission rate
Shannon’s channel coding theorem states that all transmis-
sion rates below the channel capacity are achievable when the
channel is perfectly known to the decoder; unfortunately this
is not the case in practice. The channel transition probabilities
can be estimated by substituting the noise means and variances
\hat{µ}_1, \hat{µ}_2, \hat{σ}_1, \hat{σ}_2 into Eq. (14), but these estimates, denoted \hat{p}_{ij},
i = 1, 2, j = 1, ..., 5, are inaccurate. The decoder is therefore
not perfectly matched to the channel.
The subject of mismatched decoding has been of interest
since the 1970’s. The most notable early works are by Hui [40]
and Csiszár and Körner [41], who provided bounds on the
maximum transmission rates under several different condi-
tions. Merhav et al. [42] related those results to the concept of
generalized mutual information and, more recently, Scarlett et
al. [39] found bounds and error exponents for the finite code
length case. It is beyond the scope of this paper to perform
a detailed analysis of the mismatched capacity of a DMC
channel with symmetric inputs; the interested reader can refer
to the above references as well as [43]–[46]. Instead, we will
derive a simplified lower bound for the capacity of this channel
in the same scenario that has been considered throughout the
paper.

Theorem 1. The maximum achievable rate of transmission
with vanishing probability of error over a Discrete Memoryless
Channel with equiprobable binary inputs, output alphabet Y,
transition probabilities p_{ij}, i = 1, 2, j = 1, ..., |Y|, and
maximum-likelihood decoding according to a different set of
transition probabilities \hat{p}_{ij}, i = 1, 2, j = 1, ..., |Y|, is lower
bounded by

C_{P,\hat{P}} = 1 + (1/2) \sum_{j=1}^{|Y|} [ p_{1j} \log_2 \hat{p}_{1j} + p_{2j} \log_2 \hat{p}_{2j} − (p_{1j} + p_{2j}) \log_2(\hat{p}_{1j} + \hat{p}_{2j}) ].
Proof: Provided in the Appendix.
It is worth noting that C_{P,\hat{P}} is equal to the mutual infor-
mation given in Eq. (13) when the estimates are exact, and
decreases as the estimates become less accurate. In fact, the
probability of reading a given value y ∈ Y can be measured
directly as the fraction of cells mapped to the corresponding
interval, so it is usually the case that \hat{p}_{1k} + \hat{p}_{2k} = p_{1k} + p_{2k}.
The bound then becomes

C_{P,\hat{P}} = I(X;Y) − D(P||\hat{P}),

where I(X;Y) is the symmetric capacity of the channel with
matched ML decoding and D(P||\hat{P}) is the relative entropy
(also known as Kullback-Leibler distance) between the exact
and the estimated transition probabilities:

D(P||\hat{P}) = (1/2) \sum_{j=1}^{|Y|} [ p_{1j} \log_2(p_{1j}/\hat{p}_{1j}) + p_{2j} \log_2(p_{2j}/\hat{p}_{2j}) ].
In this case C_{P,\hat{P}} is a concave function of the transition prob-
abilities (p_{ij}, \hat{p}_{ij}), i = 1, 2, j = 1, ..., |Y|, since the relative
entropy is convex and the mutual information is concave [36].
The bound attains its maximum when the decoder is matched
to the channel (i.e., p_{ij} = \hat{p}_{ij} ∀ i, j) and the read thresholds
are chosen so as to maximize the mutual information between
X and Y, but that solution is not feasible for our problem.
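The decomposition of the bound into mutual information minus relative entropy can be checked numerically. In the sketch below (our function names; all probabilities assumed strictly positive, and the estimated interval probabilities matching the measured ones, as discussed above), the identity holds to machine precision:

```python
from math import log2

def _xlog2(x):
    return x * log2(x) if x > 0 else 0.0

def mutual_information(P):
    """Symmetric capacity I(X;Y) with matched decoding, in bits."""
    return 1.0 + 0.5 * sum(_xlog2(p1) + _xlog2(p2) - _xlog2(p1 + p2)
                           for p1, p2 in zip(P[0], P[1]))

def mismatched_bound(P, Phat):
    """Lower bound C_{P,Phat} of Theorem 1: decoding with the estimated
    probabilities Phat while the true channel is P (bits)."""
    return 1.0 + 0.5 * sum(p1 * log2(q1) + p2 * log2(q2)
                           - (p1 + p2) * log2(q1 + q2)
                           for p1, p2, q1, q2
                           in zip(P[0], P[1], Phat[0], Phat[1]))

def kl_divergence(P, Phat):
    """D(P || Phat) averaged over the two equiprobable inputs, in bits."""
    return 0.5 * sum(p1 * log2(p1 / q1) + p2 * log2(p2 / q2)
                     for p1, p2, q1, q2
                     in zip(P[0], P[1], Phat[0], Phat[1]))
```

When the estimates are exact the bound coincides with the matched mutual information; any estimation error subtracts exactly the relative entropy term.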
In practice, both the capacity of the underlying channel
and the accuracy of the estimates at the decoder depend on
the location of the read thresholds and cannot be maximized
simultaneously. Finding the read thresholds t_1, t_2, t_3, and t_4
that maximize C_{P,\hat{P}} is not straightforward, but it can be
done numerically. Section V describes a dynamic program-
ming algorithm for choosing each read threshold based on
prior information about the noise and the results of previous
reads.
In most practical cases, the flash controller has prior infor-
mation about the voltage distributions, based on the number
of PE cycles that the page has endured, its position within the
block, etc. This prior information is generally not enough to
produce accurate noise estimates, but it can be used to improve
the choice of read thresholds. We wish to determine a policy
to choose the optimal read thresholds sequentially, given the
prior information about the voltage distributions and the results
in previous reads.
This section proposes a dynamic programming framework
to find the read thresholds that maximize the expected value
of a user-defined reward function. If the goal is to minimize
the BER at the estimated threshold \hat{t}, as in Section III, an
appropriate reward would be 1 − BER(\hat{t}). If the goal is to
maximize the channel capacity, the reward could be chosen to
be I(X;Y) − D(P||\hat{P}), as shown in Section IV-C.
Let x = (µ_1, µ_2, σ_1, σ_2) and r_i = (t_i, y_i), i = 1, ..., 4, be
vector random variables, so as to simplify the notation. If the
read noise distribution f_n is known, the prior distribution for
x can be updated based on the result of each read r_i using
Bayes' rule and Eq. (1):

f_x(x | r_i) = K f_n(y_i − \hat{y}(t_i, x)) f_x(x),   (17)

where \hat{y}(t_i, x) denotes the noiseless read result given by
Eq. (1) and K is a normalization constant. Furthermore, let
R(r_1, r_2, r_3, r_4) denote the expected reward associated with
the reads r_1, ..., r_4, after updating the prior f_x accordingly. In
the following, we will use R to denote this function, omitting
the arguments for the sake of simplicity.
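As an illustration of this update, the sketch below runs one Bayes step on a discretized grid for a single unknown parameter: the level-1 mean µ_1, with σ_1 known and uniform read noise. The grid, noise width, and one-parameter simplification are our assumptions, not the paper's full four-parameter update:

```python
from math import erfc, sqrt

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def update_posterior(grid, prior, t, y, sigma1=0.12, noise=0.02):
    """One Bayes step for a read r = (t, y): y is the observed fraction of
    level-1 cells below threshold t, modeled as Q((mu1 - t)/sigma1) plus
    read noise uniform on (-noise, noise).  Grid values of mu1 whose
    prediction falls outside the noise band get zero likelihood."""
    post = [p * (1.0 if abs(y - Q((mu1 - t) / sigma1)) <= noise else 0.0)
            for mu1, p in zip(grid, prior)]
    K = sum(post)                      # normalization constant K
    return [p / K for p in post] if K > 0 else list(prior)
```

A single read at t = 1.07 with true µ_1 = 1.0 collapses a uniform prior on (0.75, 1.25) to a narrow band around 1.0; subsequent reads tighten it further.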
Choosing the fourth read threshold t_4 after the first three
reads r_1, ..., r_3 is relatively straightforward: t_4 should be
chosen so as to maximize the expected reward, given the
results of the previous three reads. Formally,
t^∗_4 = arg max_{t_4} E{R | r_1, ..., r_3, t_4},   (18)

where the expectation is taken with respect to (y_4, x) by
factoring their joint distribution in a similar way to Eq. (17).
This defines a policy π_4 for the fourth read, and a value V_3
for each possible state after the first three reads:

π_4(r_1, ..., r_3) = t^∗_4,   (19)
V_3(r_1, ..., r_3) = E{R | r_1, ..., r_3, t^∗_4}.   (20)
In practice, the read thresholds t_i and samples y_i can only
take a finite number of values, hence the number of feasible
arguments in these functions (states) is also finite. This number
can be fairly large, but it is only necessary to find the value
for a small number of them, those which have non-negligible
probability according to the prior f_x and value significantly
larger than 0. For example, states are invariant to permutations
of the reads, so they can always be reordered such that t_1 <
t_2 < t_3. Then, states which do not fulfill y_1 < y_2 < y_3 can be
ignored. If the number of states after discarding meaningless
ones is still too large, it is also possible to use approximations
for the policy and value functions [47], [48].
Equations (19) and (20) assign a value and a fourth read
threshold to each meaningful state after three reads. The same
idea, using a backward recursion, can be used to decide the
third read threshold and assign a value to each state after two
reads:

π_3(r_1, r_2) = arg max_{t_3} E{V_3(r_1, ..., r_3) | r_1, r_2, t_3},   (21)
V_2(r_1, r_2) = max_{t_3} E{V_3(r_1, ..., r_3) | r_1, r_2, t_3},   (22)
where the expectation is taken with respect to (y3,x). Simi-
larly, for the second read threshold,

π_2(r_1) = arg max_{t_2} E{V_2(r_1, r_2) | r_1, t_2},   (23)
V_1(r_1) = max_{t_2} E{V_2(r_1, r_2) | r_1, t_2},   (24)

where the expectation is taken with respect to (y_2, x). Finally,
the optimal value for the first read threshold is

t^∗_1 = arg max_{t_1} E{V_1(t_1, y_1) | t_1}.   (25)
These policies can be computed offline and then pro-
grammed in the memory controller. Typical controllers have
multiple modes tailored towards different conditions in terms
of number of PE cycles, whether an upper or lower page is
being read, etc. Each of these modes would have its own
prior distributions for (µ1, µ2, σ1, σ2), and would result in a
different policy determining where to perform each read based
on the previous results. Each policy can be stored as a partition
of the feasible reads, and value functions can be discarded,
so memory requirements are very reasonable. Section VII
presents an example illustrating this scheme.
Just like in Section III-A, the number of reads can be
reduced if some of the noise parameters or prior information
is available. The same backward recursion could be used to
optimize the choice of thresholds, but with fewer steps.
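To make the backward recursion concrete, here is a deliberately tiny two-read sketch. The setup is entirely our own toy example (a three-point parameter grid, binary read outcomes, a deterministic likelihood, and the posterior's peak probability as the reward); the paper's actual implementation uses four reads and a capacity-based reward:

```python
def posterior(prior, params, lik, t, y):
    """Bayes update of a discrete prior over the unknown parameter x."""
    post = [p * lik(y, t, x) for p, x in zip(prior, params)]
    K = sum(post)
    return [p / K for p in post] if K > 0 else list(prior)

def predictive(prior, params, lik, t, y):
    """P(outcome y | reading at threshold t) under the current prior."""
    return sum(p * lik(y, t, x) for p, x in zip(prior, params))

def plan_two_reads(params, prior, thresholds, outcomes, lik, reward):
    """Two-stage backward recursion: returns (t1, pi2), where pi2[y1] is the
    best second threshold after observing outcome y1 at the first read t1."""
    def stage2_value(pri):          # value of the best second read
        return max(sum(predictive(pri, params, lik, t2, y2)
                       * reward(posterior(pri, params, lik, t2, y2))
                       for y2 in outcomes)
                   for t2 in thresholds)

    def stage1_value(t1):           # expected value of reading first at t1
        return sum(predictive(prior, params, lik, t1, y1)
                   * stage2_value(posterior(prior, params, lik, t1, y1))
                   for y1 in outcomes)

    t1 = max(thresholds, key=stage1_value)
    pi2 = {}
    for y1 in outcomes:
        pri = posterior(prior, params, lik, t1, y1)
        pi2[y1] = max(thresholds,
                      key=lambda t2: sum(predictive(pri, params, lik, t2, y2)
                                         * reward(posterior(pri, params, lik, t2, y2))
                                         for y2 in outcomes))
    return t1, pi2
```

The inner stage mirrors Eqs. (21)-(24): each candidate second threshold is scored by its expected reward under the posterior left by the first read, and the first threshold is then chosen to maximize the expected second-stage value.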
Most of the paper has assumed that cells can only store
two voltage levels, with their voltages following Gaussian
distributions. This framework was chosen because it is the
most widely used in the literature, but the method described
can easily be extended to memories with more than two levels
and non-Gaussian noise distributions.
Section II explained how each wordline in a MLC (two
bits per cell, four levels) or TLC (three bits per cell, eight
levels) memory is usually divided into two or three pages
which are read independently as if the memory was SLC.
In that case, the proposed method can be applied without
any modifications. However, if the controller is capable of
simultaneously processing more than two levels per cell, it
is possible to accelerate the noise estimation by reducing the
number of reads. MLC and TLC memories generally have
dedicated hardware that performs multiple reads in the ranges
required to read the upper pages and returns a single binary
value. For example, reading the upper page of a TLC memory
with the structure illustrated in Fig. 1 requires four reads with
thresholds (A, C, E, G) but cells between A and C would
be indistinguishable from cells between E and G; all of them
would be read as 0. However, one additional read of the lower
page (D threshold) would allow the controller to tell them
apart.
Performing four reads (t_1, ..., t_4) on the upper page of
a TLC memory would entail comparing the cell voltages
against 16 different thresholds but obtaining only four bits of
information for each cell. The means and variances in Eqs. (3)-
(4) would correspond to mixtures of all the levels storing
the same bit value, assumed to be approximately Gaussian.
The same process would then be repeated for the middle and
lower page. A better approach, albeit more computationally
intensive, would be to combine reads from all three pages
and estimate each level independently. Performing one single
read of the lower page (threshold D), two of the middle page
(each involving two comparisons, with thresholds B and F) and
three of the upper page (each involving four comparisons, with
thresholds A, C, E, G) would theoretically provide more than
enough data to estimate the noise in all eight Gaussian levels.
A similar process can be used for MLC memories performing,
for example, two reads of the lower page and three of the upper
page.
Hence, five page reads are enough to estimate the noise
mean and variance in all 4 levels of an MLC memory and
6 page reads are enough for the 8 levels in a TLC memory.
Other choices for the pages to be read are also possible, but it
is useful to consider that lower pages have smaller probabilities
of error, so they often can be successfully decoded with fewer
reads. Additional reads could provide more precise estimates
and better LLR values for LDPC decoding.
There are papers suggesting that a Gaussian noise model
might not be accurate for some memories [49]. The proposed
scheme can also be extended to other noise distributions,
as long as they can be characterized by a small number
of parameters. Instead of the Q-function in Eq. (2), the
estimation should use the cumulative density function (cdf)
for the corresponding noise distribution. For example, if the
voltage distributions followed a Laplace instead of a Gaussian
distribution, Eq. (1) would become

y_i = (1/2) [ 1 − (1/2) e^{−(t_i − µ_1)/b_1} + (1/2) e^{−(µ_2 − t_i)/b_2} ] + n_y

for µ_1 ≤ t_i ≤ µ_2, and the estimator \hat{b}_1 of b_1 would become

\hat{b}_1 = (t_2 − t_1) / ( \log(1 − 2y_1) − \log(1 − 2y_2) )   (26)

when t_1, t_2 are significantly smaller than µ_2. Similar formulas
can be found to estimate the other parameters.
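As a numerical check of the estimator in Eq. (26), with illustrative parameter values of our own choosing and assuming the read result is the fraction of cells below the threshold for an equiprobable two-level Laplace mixture, two noiseless reads below µ_2 recover b_1 to within a few percent:

```python
from math import exp, log

def read_fraction_laplace(t, mu1, b1, mu2, b2):
    """Noiseless fraction of cells below t for two equiprobable Laplace
    levels (scale parameters b1, b2), valid for mu1 <= t <= mu2."""
    return 0.5 * ((1.0 - 0.5 * exp(-(t - mu1) / b1))
                  + 0.5 * exp(-(mu2 - t) / b2))

def estimate_b1(t1, y1, t2, y2):
    """Eq. (26): b1_hat = (t2 - t1) / (log(1 - 2*y1) - log(1 - 2*y2)),
    accurate when t1 and t2 are well below mu2."""
    return (t2 - t1) / (log(1.0 - 2.0 * y1) - log(1.0 - 2.0 * y2))
```

The residual error comes from the tail of the second level leaking below t_1 and t_2, which the derivation of Eq. (26) neglects.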
This section presents simulation results evaluating the per-
formance of the dynamic programming algorithm proposed in
Section V. Two scenarios will be considered, corresponding to
a fresh page with BER(t^∗) = 0.0015 and a worn-out page with
BER(t^∗) = 0.025. The mean voltage values for each level will
be the same in both scenarios, but the standard deviations will
differ. Specifically, µ_1 = 1 and µ_2 = 2 for both pages, but the
fresh page will be modeled using σ_1 = 0.12 and σ_2 = 0.22,
while the worn page will be modeled using σ_1 = 0.18
and σ_2 = 0.32. These values, however, are unknown to the
controller. The only information that it can use to choose the
read locations are uniform prior distributions on µ_1, µ_2, σ_1,
and σ_2, identical for both the fresh and the worn-out pages.
Specifically, µ_1 is known to be in the interval (0.75, 1.25), µ_2
in (1.8, 2.1), σ_1 in (0.1, 0.24), and σ_2 in (0.2, 0.36).
For each scenario, three different strategies for selecting the
read thresholds were evaluated. The first strategy, S1, tries to
obtain accurate noise estimates by spreading out the reads. The
second strategy, S2, concentrates all of them on the uncertainty
region, attempting to attain highly informative LLR values.
Finally, the third strategy, S3, follows the optimal policy
obtained by the dynamic programming recursion proposed in
Section V, with C_{P,\hat{P}} as the reward function. The three strategies
are illustrated in Fig. 5 and the results are summarized in
Table II, but before proceeding to their analysis we describe
the process employed to obtain S3.
The dynamic programming scheme assumed that read
thresholds were restricted to move in steps of 0.04, and
quantized all cdf measurements also in steps of 0.04 (making
the noise n_y from Eq. (1) uniform between −0.02 and 0.02).
Starting from these assumptions, Eqs. (19) and (20) were used
to find the optimal policy π_4 and expected value V_3 for all
meaningful combinations of (t_1, y_1, t_2, y_2, t_3, y_3), which were
on the order of 10^6 (very reasonable for offline computations).
The value function V_3 was then used in the backward recursion
to find the policies and values for the first three reads, as
explained in Section V. The optimal location for the first read,
in terms of maximum expected value for I(X;Y) − D(P||\hat{P})
after all four reads, was found to be t^∗_1 = 1.07. This read
resulted in y_1 = 0.36 for the fresh page and y_1 = 0.33 for
the worn page. The policy π_2 dictated that t_2 = 0.83 for
y_1 ∈ (0.34, 0.38), and t_2 = 1.63 for y_1 ∈ (0.3, 0.34), so those
were the next reads in each case. The third and fourth read
thresholds t_3 and t_4 were chosen similarly, according to the
corresponding policies.
Finally, as depicted in Fig. 5, the read thresholds were:

S1: t = (0.85, 1.15, 1.75, 2.125).
S2: t = (1.2, 1.35, 1.45, 1.6).
S3 (fresh page): t = (1.07, 0.83, 1.79, 1.31), resulting in y = (0.36, 0.04, 0.58, 0.496), respectively.
S3 (worn page): t = (1.07, 1.63, 1.19, 1.43), resulting in y = (0.33, 0.56, 0.43, 0.51), respectively.
For the fresh page, the policy dictates that the first three reads
should be performed well outside of the uncertainty region,
so as to obtain good estimates of the means and variances.
Then, the fourth read is performed as close as possible to
the BER-minimizing threshold.

TABLE II
Estimation errors and decoding failure rates for strategies S1, S2, and S3. Top: fresh page; bottom: worn-out page.

                                          S1      S2      S3
|\hat{µ} − µ|                             0.004   0.182   0.012
|\hat{σ} − σ|                             0.03    0.91    0.12
|\hat{t} − t^∗| / t^∗                     0.01    0.07    0.02
|BER(\hat{t}) − BER(t^∗)| / BER(t^∗)      0.1     1.4     0.11
LDPC fail rate                            1       0.15    0
Genie LDPC fail rate                      1       0       0

                                          S1      S2      S3
|\hat{µ} − µ|                             0.005   0.053   0.021
|\hat{σ} − σ|                             0.03    0.27    0.13
|\hat{t} − t^∗| / t^∗                     0.006   0.015   0.011
|BER(\hat{t}) − BER(t^∗)| / BER(t^∗)      0.003   0.009   0.007
LDPC fail rate                            1       0.19    0.05
Genie LDPC fail rate                      1       0       0.01

Since the overlap between
both levels is very small, soft decoding would barely provide
any gain over hard decoding. Picking the first three reads
for noise characterization regardless of their value towards
building LLRs seems indeed to be the best strategy. For the
worn-out page, the policy attempts to achieve a trade-off by
combining two reads away from the uncertainty region, good
for parameter estimation, with another two inside it to improve
the quality of the LLR values used for soft decoding.
Fig. 5. Read thresholds for strategies S1, S2 and S3 for a fresh and a
worn-out page.
Table II shows the relative error in our estimates and
sector failure rates averaged over 5000 simulation instances,
with read noise n_{y_i}, i = 1, ..., 4, uniformly distributed
between −0.02 and 0.02. The first three rows show the
relative estimation error of the mean, variance, and optimal
threshold. It can be observed that S1 provides the lowest
estimation error, while S2 produces clearly wrong estimates.
The estimates provided by S3 are noisier than those provided
by S1, but are still acceptable. The relative increase in BER
when reading at \hat{t} instead of at t^∗ is shown in the fourth row
of each table. It is worth noting that BER(\hat{t}) does not
increase significantly, even with inaccurate mean and variance
estimates. This validates the derivation in Section III-B.
Finally, the last two rows on each table show the failure
rate after 20 iterations of a min-sum LDPC decoder for two
different methods of obtaining soft information. The LDPC
code had 18% redundancy and codeword length equal to
35072 bits. The fifth row corresponds to LLR values obtained
using the mean and variance estimates from the Progressive
Read Algorithm and the last row, labeled “Genie LDPC”,
corresponds to using the actual values instead of the estimated
ones. It can be observed that strategy S1, which provided very
accurate estimates, always fails in the LDPC decoding. This
is due to the wide range of cell voltages that fall between the
middle two reads, being assigned an LLR value close to 0.
The fact that the “Genie LDPC” performs better with S2than
with S3shows that the read locations chosen by the former
are better. However, S3provides lower failure rates in the
more realistic case where the means and variances need to
be estimated using the same reads used to produce the soft
information.

In summary, S3 was found to be best from an LDPC
code point of view and S1 from a pure BER-minimizing
perspective. S2, as proposed in [11], is worse in both cases
unless the voltage distributions are known. When more than
four reads are allowed, all three schemes perform similarly.
After the first four reads, all the strategies have relatively
good estimates for the optimal threshold. Subsequent reads
are located close to the optimal threshold, achieving small
BER. Decoding failure rates are then limited by the channel
capacity, rather than by the location of the reads.
NAND flash controllers often require several re-reads using
different read thresholds to recover host data in the presence
of noise. In most cases, the controller tries to guess the noise
distribution based on the number of PE cycles and picks the
read thresholds based on that guess. However, unexpected
events such as excessive leakage or charge trapping can make
those thresholds suboptimal. This paper proposed algorithms
to reduce the total read time and sector failure rate by using a
limited number of re-reads to estimate the noise and improve
the read thresholds.
The overall scheme will work as follows. First, the con-
troller will generally have a prior estimation of what a good
read threshold might be. It will read at that threshold and
attempt a hard-decoding of the information. If the noise is
weak and the initial threshold was well chosen, this decoding
will succeed and no further processing will be needed. In
cases when this first decoding fails, the controller will perform
additional reads to estimate the mean and/or variance of the
voltage values for each level. These estimates will in turn
be used to estimate the minimum achievable BER and the
corresponding optimal read threshold. The flash controller
then decides whether to perform an additional read with this
threshold to attempt hard decoding again, or directly attempt a
more robust decoding of the information, for example LDPC,
leveraging the reads already performed to generate the soft
information.
The paper proposes using a dynamic programming back-
ward recursion to find a policy for progressively picking the
read thresholds based on the prior information available and
the results from previous reads. This scheme will allow us
to find the thresholds that optimize an arbitrary objective.
Controllers using hard decoding only (e.g., BCH) may wish to
find the read threshold providing minimum BER, while those
employing soft decoding (e.g., LDPC) will prefer to maximize
the capacity of the resulting channel. The paper provides an
approximation for the (symmetric and mismatched) capacity
of the channel and presents simulations to illustrate the per-
formance of the proposed scheme in such scenarios.
[1] B. Peleato and R. Agarwal, “Maximizing MLC NAND lifetime and
reliability in the presence of write noise,” in IEEE Int. Conf. on
Communications (ICC). IEEE, 2012, pp. 3752–3756.
[2] H. Zhou, A. Jiang, and J. Bruck, “Error-correcting schemes with
dynamic thresholds in nonvolatile memories,” in IEEE Int. Symp. on
Information Theory (ISIT). IEEE, 2011, pp. 2143–2147.
[3] B. Peleato, R. Agarwal, and J. Cioffi, “Probabilistic graphical model
for flash memory programming,” in IEEE Statistical Signal Processing
Workshop (SSP). IEEE, 2012, pp. 788–791.
[4] M. Asadi, X. Huang, A. Kavcic, and N. P. Santhanam, “Optimal detector
for multilevel NAND flash memory channels with intercell interference,”
IEEE J. Sel. Areas Commun., vol. 32, no. 5, pp. 825–835, May 2014.
[5] M. Anholt, N. Sommer, R. Dar, U. Perlmutter, and T. Inbar, “Dual ECC
decoder,” Apr. 23, 2013, US Patent 8,429,498.
[6] G. Dong, N. Xie, and T. Zhang, “On the use of soft-decision error-
correction codes in NAND flash memory,” IEEE Trans. Circuits Syst. I:
Reg. Papers, vol. 58, no. 2, pp. 429–439, Nov. 2011.
[7] F. Sala, R. Gabrys, and L. Dolecek, “Dynamic threshold schemes for
multi-level non-volatile memories,” IEEE Trans. Commun., vol. 61,
no. 7, pp. 2624–2634, Jul. 2013.
[8] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, “Error patterns in MLC
NAND flash memory: Measurement, characterization, and analysis,” in
Proc. Conf. Design, Automation and Test in Europe. IEEE, Mar. 2012,
pp. 521–526.
[9] Q. Li, A. Jiang, and E. F. Haratsch, “Noise modeling and capacity
analysis for NAND flash memories,” in IEEE Int. Symp. on Information
Theory (ISIT). IEEE, Jul. 2014, pp. 2262–2266.
[10] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, “Threshold voltage
distribution in MLC NAND flash memory: Characterization, analysis,
and modeling,” in Proc. Conf. Design, Automation and Test in Europe.
EDA Consortium, Mar. 2013, pp. 1285–1290.
[11] J. Wang, T. Courtade, H. Shankar, and R. Wesel, “Soft information for
LDPC decoding in flash: Mutual information optimized quantization,”
in IEEE Global Communications Conf. (GLOBECOM), 2011, pp. 5–9.
[12] Y. Cai, O. Mutlu, E. F. Haratsch, and K. Mai, “Program interference in
MLC NAND flash memory: Characterization, modeling, and mitigation,”
in IEEE Int. Conf. on Computer Design (ICCD). IEEE, 2013, pp. 123–
[13] R. Gabrys, F. Sala, and L. Dolecek, “Coding for unreliable flash memory
cells,” IEEE Commun. Lett., vol. 18, no. 9, pp. 1491–1494, Jul. 2014.
[14] R. Gabrys, E. Yaakobi, and L. Dolecek, “Graded bit-error-correcting
codes with applications to flash memory,” IEEE Trans. Inf. Theory,
vol. 59, no. 4, pp. 2315–2327, Apr. 2013.
[15] H. Zhou, A. Jiang, and J. Bruck, “Non-uniform codes for asymmetric
errors,” in IEEE Int. Symp. on Information Theory (ISIT) . IEEE, 2011.
[16] J. Berger, “A note on error detection codes for asymmetric channels,”
Inform. and Control, vol. 4, no. 1, pp. 68–73, 1961.
[17] A. Jiang, M. Schwartz, and J. Bruck, “Error-correcting codes for rank
modulation,” in IEEE Int. Symp. on Information Theory (ISIT) . IEEE,
2008, pp. 1736–1740.
[18] A. Jiang, R. Mateescu, M. Schwartz, and J. Bruck, “Rank modulation
for flash memories,” IEEE Trans. on Inf. Theory, vol. 55, no. 6, pp.
2659–2673, June 2009.
[19] E. En Gad, A. Jiang, and J. Bruck, “Compressed encoding for rank
modulation,” in IEEE Int. Symp. on Information Theory Proc. (ISIT).
IEEE, Aug. 2011, pp. 884–888.
[20] Q. Li, “Compressed rank modulation,” in 50th Annu. Allerton Conf. on
Communication, Control, and Computing (Allerton). IEEE, Oct. 2012,
pp. 185–192.
[21] E. E. Gad, E. Yaakobi, A. Jiang, and J. Bruck, “Rank-modulation
rewriting codes for flash memories,” in Proc. IEEE Int. Symp. on
Information Theory (ISIT). IEEE, Jul. 2013, pp. 704–708.
[22] M. Qin, A. A. Jiang, and P. H. Siegel, “Parallel programming of rank
modulation,” in Proc. IEEE Int. Symp. on Information Theory (ISIT).
IEEE, Jul. 2013, pp. 719–723.
[23] M. Qin, E. Yaakobi, and P. H. Siegel, “Constrained codes that mitigate
inter-cell interference in read/write cycles for flash memories,” IEEE J.
Sel. Areas Commun., vol. 32, no. 5, pp. 836–846, May 2014.
[24] S. Kayser and P. H. Siegel, “Constructions for constant-weight ici-free
codes,” in IEEE Int. Symp. on Information Theory (ISIT) . IEEE, Jul.
2014, pp. 1431–1435.
[25] R. Gabrys and L. Dolecek, “Constructions of nonbinary WOM codes
for multilevel flash memories,” IEEE Trans. Inf. Theory, vol. 61, no. 4,
pp. 1905–1919, Apr. 2015.
[26] E. Yaakobi, P. H. Siegel, A. Vardy, and J. K. Wolf, “Multiple error-
correcting WOM-codes,” IEEE Trans. on Inf. Theory, vol. 58, no. 4,
pp. 2220–2230, Apr. 2012.
[27] A. Bhatia, M. Qin, A. R. Iyengar, B. M. Kurkoski, and P. H. Siegel,
“Lattice-based WOM codes for multilevel flash memories,” IEEE J. Sel.
Areas Commun., vol. 32, no. 5, pp. 933–945, May 2014.
[28] Q. Li and A. Jiang, “Polar codes are optimal for write-efficient mem-
ories.” in 51th Annu. Allerton Conf. on Communication, Control, and
Computing (Allerton), 2013, pp. 660–667.
[29] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou,
C. Camp, T. Griffin, G. Tressler, and A. Walls, “Using adaptive read
voltage thresholds to enhance the reliability of MLC NAND flash
memory systems,” in Proc. 24th Great Lakes Symp. on VLSI. ACM,
2014, pp. 151–156.
[30] D.-H. Lee and W. Sung, “Estimation of NAND flash memory threshold
voltage distribution for optimum soft-decision error correction,” IEEE
Trans. Signal Process., vol. 61, no. 2, pp. 440–449, Jan. 2013.
[31] B. Peleato, R. Agarwal, J. Cioffi, M. Qin, and P. H. Siegel, “To-
wards minimizing read time for NAND flash,” in IEEE Global
Communications Conf. (GLOBECOM). IEEE, 2012, pp. 3219–3224.
[32] G. Dong, S. Li, and T. Zhang, “Using data postcompensation and
predistortion to tolerate cell-to-cell interference in MLC NAND flash
memory,” IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 57, no. 10, pp.
2718–2728, Oct. 2010.
[33] A. Torsi, Y. Zhao, H. Liu, T. Tanzawa, A. Goda, P. Kalavade, and
K. Parat, “A program disturb model and channel leakage current study
for sub-20nm NAND flash cells,” IEEE Trans. Electron Devices, vol. 58,
no. 1, pp. 11–16, Jan. 2011.
[34] E. Yaakobi, J. Ma, L. Grupp, P. Siegel, S. Swanson, and J. Wolf,
“Error characterization and coding schemes for flash memories,” in
GLOBECOM Workshops (GC Wkshps). IEEE, 2010, pp. 1856–1860.
[35] D. Nguyen, B. Vasic, and M. Marcellin, “Two-bit bit flipping decoding
of LDPC codes,” in IEEE Int. Symp. on Information Theory (ISIT).
IEEE, 2011, pp. 1995–1999.
[36] T. M. Cover and J. A. Thomas, Elements of Information Theory. John
Wiley & Sons, 2012.
[37] A. Berman and Y. Birk, “Constrained flash memory programming,” in
IEEE Int. Symp. on Information Theory (ISIT). IEEE, 2011, pp. 2128–
[38] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the
finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp.
2307–2359, May 2010.
[39] J. Scarlett, A. Martinez, and A. Guillén i Fàbregas, “Mismatched decoding:
Finite-length bounds, error exponents and approximations,” submitted
for publication. [Online: http://arxiv.org/abs/1303.6166], 2013.
[40] J. Y. N. Hui, “Fundamental issues of multiple accessing,” Ph.D.
dissertation, Mass. Inst. Technol., 1983.
[41] I. Csiszár and J. Körner, “Graph decomposition: A new key to coding
theorems,” IEEE Trans. Inf. Theory, vol. 27, no. 1, pp. 5–12, Jan. 1981.
[42] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai Shitz, “On informa-
tion rates for mismatched decoders,” IEEE Trans. Inf. Theory, vol. 40,
no. 6, pp. 1953–1967, Nov. 1994.
[43] A. Lapidoth, P. Narayan et al., “Reliable communication under channel
uncertainty,” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2148–2177,
Oct. 1998.
[44] V. B. Balakirsky, “A converse coding theorem for mismatched decoding
at the output of binary-input memoryless channels,” IEEE Trans. Inf.
Theory, vol. 41, no. 6, pp. 1889–1902, Nov. 1995.
[45] M. Alsan and E. Telatar, “Polarization as a novel architecture to
boost the classical mismatched capacity of B-DMCs,” arXiv preprint
arXiv:1401.6097, 2014.
[46] E. Arikan, “Channel polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
[47] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena
Scientific Belmont, MA, 1995, vol. 1, no. 2.
[48] Y. Wang, B. O’Donoghue, and S. Boyd, “Approximate dynamic pro-
gramming via iterated Bellman inequalities,” Int. Journal of Robust and
Nonlinear Control, 2014.
[49] T. Parnell, N. Papandreou, T. Mittelholzer, and H. Pozidis, “Modelling
of the threshold voltage distributions of sub-20nm NAND flash memory,”
in IEEE Global Communications Conf. (GLOBECOM). IEEE, 2014,
pp. 2351–2356.
Borja Peleato (S’12-M’13) is a Visiting Assistant Professor in the Electrical
and Computer Engineering department at Purdue University. He received
his B.S. degrees in telecommunications and mathematics from Universitat
Politecnica de Catalunya, Barcelona, Spain, in 2007, and his M.S. and Ph.D.
degrees in electrical engineering from Stanford University in 2009 and 2013,
respectively. He was a visiting student at the Massachusetts Institute of
Technology in 2006, and a Senior Flash Channel Architect with Proton
Digital Systems in 2013. His research interests include signal processing and
coding for non-volatile storage, convex optimization, and communications.
Dr. Peleato received a ”La Caixa” Graduate Fellowship in 2006.
Rajiv Agarwal completed his B.Tech. degree in electrical engineering from
I.I.T. Kanpur in 2003, and his M.S. and Ph.D. degrees from Stanford
University in 2005 and ??? respectively. He has worked at ??? and is currently
at ???. His academic interests include ???.
John M. Cioffi (S’77-M’78-SM’90-F’96) received the B.S. degree in electri-
cal engineering from the University of Illinois at Urbana-Champaign, Urbana,
IL, USA, in 1978 and the Ph.D. degree in electrical engineering from Stanford
University, Stanford, CA, USA, in 1984. He was with Bell Laboratories
in 1978–1984 and IBM Research in 1984–1986. Since 1986, he has been
with Stanford University, where he was a Professor in electrical engineering
and is currently an Emeritus Professor. He founded Amati Communications
Corporation in 1991 (purchased by TI in 1997) and was an Officer/Director
from 1991 to 1997. He is also an Adjunct Professor of computing/information
technology with King Abdulaziz University, Jeddah, Saudi Arabia. Currently,
he is also with the Board of Directors of ASSIA (Chairman and CEO),
Alto Beam, and the Marconi Foundation. He has published more than 600
papers and holds more than 100 patents, of which many are heavily licensed,
including key necessary patents for the international standards in ADSL,
VDSL, DSM, and WiMAX. His specific interests are in the area of high-
performance digital transmission. Prof. Cioffi was the recipient of the IEEE’s
Alexander Graham Bell and Millennium Medals (2010 and 2000); Member
Internet Hall of Fame (2014); Economist Magazine 2010 Innovations Award;
International Marconi Fellow (2006); Member, U.S. National and U.K. Royal
Academies of Engineering (2001, 2009); IEEE Kobayashi and Kirchmayer
Awards (2001 and 2014); IEEE Fellow (1996); IEE JJ Thomson Medal (2000);
1991 and 2007 IEEE Communications Magazine Best Paper; and numerous
conference best paper awards.
Minghai Qin (S’11) is a Research Principal Engineer in Storage Architecture
at HGST. He received the B.E. degree in electronic and electrical engineering
from Tsinghua University, Beijing, China, in 2009, and the Ph.D. degree in
electrical engineering from the University of California, San Diego, in 2014.
He was also associated with the Center for Magnetic Recording Research
(CMRR) from 2010 to 2014. His research interests include coding and signal
processing for non-volatile memories, polar codes implementation, and coding
for distributed storage.
Paul H. Siegel (M’82-SM’90-F’97) received the S.B. and Ph.D. degrees
in mathematics from the Massachusetts Institute of Technology (MIT),
Cambridge, in 1975 and 1979, respectively. He held a Chaim Weizmann
Postdoctoral Fellowship at the Courant Institute, New York University. He
was with the IBM Research Division in San Jose, CA, from 1980 to 1995.
He joined the faculty at the University of California, San Diego in July
1995, where he is currently Professor of Electrical and Computer Engineering
in the Jacobs School of Engineering. He is affiliated with the Center for
Magnetic Recording Research where he holds an endowed chair and served
as Director from 2000 to 2011. His primary research interests lie in the areas of
information theory and communications, particularly coding and modulation
techniques, with applications to digital data storage and transmission. Prof.
Siegel was a member of the Board of Governors of the IEEE Information
Theory Society from 1991 to 1996 and from 2009 to 2011. He was re-elected
for another 3-year term in 2012. He served as Co-Guest Editor of the May
1991 Special Issue on “Coding for Storage Devices” of the IEEE Transactions
on Information Theory. He served the same Transactions as Associate Editor
for Coding Techniques from 1992 to 1995, and as Editor-in-Chief from July
2001 to July 2004. He was also Co-Guest Editor of the May/September 2001
two-part issue on “The Turbo Principle: From Theory to Practice” of the IEEE
Journal on Selected Areas in Communications. Prof. Siegel was co-recipient,
with R. Karabed, of the 1992 IEEE Information Theory Society Paper Award
and shared the 1993 IEEE Communications Society Leonard G. Abraham
Prize Paper Award with B. H. Marcus and J.K. Wolf. With J. B. Soriaga and
H. D. Pfister, he received the 2007 Best Paper Award in Signal Processing
and Coding for Data Storage from the Data Storage Technical Committee of
the IEEE Communications Society. He holds several patents in the area of
coding and detection, and was named a Master Inventor at IBM Research
in 1994. He is an IEEE Fellow and a member of the National Academy of Engineering.
Proof: (Theorem 1) The proof is very similar to that for Shannon's Channel Coding Theorem, but a few changes will be introduced to account for the mismatched decoder. Let $X \in \{1,2\}^n$ denote the channel input and $Y \in \mathcal{Y}^n$ the channel output, with $X_i$ and $Y_i$ denoting their respective components for $i = 1, \ldots, n$. Throughout the proof, $\hat{P}(A)$ will denote the estimate for the probability of an event $A$ obtained using the transition probabilities $\hat{p}_{ij}$, $i = 1, 2$, $j = 1, \ldots, |\mathcal{Y}|$, to differentiate it from the exact probability $P(A)$ obtained using transition probabilities $p_{ij}$, $i = 1, 2$, $j = 1, \ldots, |\mathcal{Y}|$. The inputs are assumed to be symmetric, so $\hat{P}(X) = P(X)$ and $\hat{P}(X,Y) = P(X)\,\hat{P}(Y|X)$.

We start by generating $2^{nR}$ random binary sequences of length $n$ to form a random code $\mathcal{C}$ with rate $R$ and length $n$. After revealing the code $\mathcal{C}$ to both the sender and the receiver, a codeword $\mathbf{x}$ is chosen at random among those in $\mathcal{C}$ and transmitted. The conditional probability of receiving a sequence $\mathbf{y} \in \mathcal{Y}^n$ given the transmitted codeword $\mathbf{x}$ is given by $P(Y = \mathbf{y} \,|\, X = \mathbf{x}) = \prod_{i=1}^{n} p_{x_i y_i}$, where $x_i$ and $y_i$ denote the $i$-th components of $\mathbf{x}$ and $\mathbf{y}$, respectively.

The receiver then attempts to recover the codeword $\mathbf{x}$ that was sent. However, the decoder does not have access to the exact transition probabilities $p_{ij}$ and must use the estimated probabilities $\hat{p}_{ij}$ instead. When $p_{ij} = \hat{p}_{ij}$ $\forall\, i, j$, the optimal decoding procedure is maximum likelihood decoding (equivalent to maximum a posteriori decoding, since inputs are equiprobable). In maximum likelihood decoding, the decoder forms the estimate $\hat{\mathbf{x}} = \arg\max_{\mathbf{x} \in \mathcal{C}} \hat{P}(\mathbf{y}|\mathbf{x})$, where $\hat{P}(Y = \mathbf{y} \,|\, X = \mathbf{x}) = \prod_{i=1}^{n} \hat{p}_{x_i y_i}$ is the estimated likelihood of $\mathbf{x}$, given $\mathbf{y}$ was received.

Denote by $\hat{A}_\epsilon$ the set of length-$n$ sequences $\{(\mathbf{x}, \mathbf{y})\}$ whose estimated empirical entropies are $\epsilon$-close to the typical estimated entropies:

  $\hat{A}_\epsilon = \big\{ (\mathbf{x}, \mathbf{y}) \in \{1,2\}^n \times \mathcal{Y}^n :$   (27)
  $\ \ \big| -\tfrac{1}{n} \log P(X = \mathbf{x}) - 1 \big| < \epsilon,$   (28)
  $\ \ \big| -\tfrac{1}{n} \log \hat{P}(Y = \mathbf{y}) - \mu_Y \big| < \epsilon,$   (29)
  $\ \ \big| -\tfrac{1}{n} \log \hat{P}(X = \mathbf{x}, Y = \mathbf{y}) - \mu_{XY} \big| < \epsilon \,\big\},$   (30)

where $\mu_Y$ and $\mu_{XY}$ represent the expected values of $-\tfrac{1}{n} \log \hat{P}(Y)$ and $-\tfrac{1}{n} \log \hat{P}(X,Y)$, respectively, and the logarithms are in base 2. Hence,

  $\mu_Y = -\sum_{k=1}^{|\mathcal{Y}|} P(Y_i = k) \log \hat{P}(Y_i = k)$   (31)
  $\quad\ = -\sum_{k=1}^{|\mathcal{Y}|} \frac{p_{1k} + p_{2k}}{2} \log \frac{\hat{p}_{1k} + \hat{p}_{2k}}{2},$   (32)

  $\mu_{XY} = -\sum_{k=1}^{|\mathcal{Y}|} \sum_{b=1}^{2} P(X_i = b, Y_i = k) \log \hat{P}(X_i = b, Y_i = k)$   (33)
  $\quad\quad\ = -\sum_{k=1}^{|\mathcal{Y}|} \left( \frac{p_{1k}}{2} \log \frac{\hat{p}_{1k}}{2} + \frac{p_{2k}}{2} \log \frac{\hat{p}_{2k}}{2} \right),$   (34)

where the exact transition probabilities are used as weights in the expectation and the estimated ones are the variable values. In particular, $(\mathbf{x}, \mathbf{y}) \in \hat{A}_\epsilon$ implies that

  $\hat{P}(Y = \mathbf{y} \,|\, X = \mathbf{x}) > 2^{n(1 - \mu_{XY} - \epsilon)}$   (35)

and

  $\hat{P}(Y = \mathbf{y}) < 2^{-n(\mu_Y - \epsilon)}.$   (36)

We will say that a sequence $\mathbf{x} \in \{1,2\}^n$ is in $\hat{A}_\epsilon$ if it can be extended to a sequence $(\mathbf{x}, \mathbf{y}) \in \hat{A}_\epsilon$, and similarly for $\mathbf{y} \in \mathcal{Y}^n$.

First we show that, with high probability, the transmitted and received sequences $(\mathbf{x}, \mathbf{y})$ are in the $\hat{A}_\epsilon$ set. The weak law of large numbers states that for any given $\epsilon > 0$ there exists $n_0$ such that, for any codeword length $n > n_0$,

  $P\left( \big| -\tfrac{1}{n} \log P(X = \mathbf{x}) - 1 \big| \geq \epsilon \right) < \tfrac{\epsilon}{3},$   (37)
  $P\left( \big| -\tfrac{1}{n} \log \hat{P}(Y = \mathbf{y}) - \mu_Y \big| \geq \epsilon \right) < \tfrac{\epsilon}{3},$   (38)
  $P\left( \big| -\tfrac{1}{n} \log \hat{P}(X = \mathbf{x}, Y = \mathbf{y}) - \mu_{XY} \big| \geq \epsilon \right) < \tfrac{\epsilon}{3}.$   (39)

Applying the union bound to these events shows that, for $n$ large enough, $P\big( (\mathbf{x}, \mathbf{y}) \notin \hat{A}_\epsilon \big) < \epsilon$.

When a codeword $\mathbf{x} \in \{1,2\}^n$ is transmitted and $\mathbf{y} \in \mathcal{Y}^n$ is received, an error will occur if there exists another codeword $\mathbf{z} \in \mathcal{C}$ such that $\hat{P}(Y = \mathbf{y} \,|\, X = \mathbf{z}) \geq \hat{P}(Y = \mathbf{y} \,|\, X = \mathbf{x})$. The estimated likelihood of $\mathbf{x}$ is greater than $2^{n(1 - \mu_{XY} - \epsilon)}$ with probability at least $1 - \epsilon$, as was just shown. The other $2^{nR} - 1$ codewords in $\mathcal{C}$ are independent of the received sequence. For a given $\mathbf{y} \in \hat{A}_\epsilon$, let

  $S_{\mathbf{y}} = \left\{ \mathbf{x} \in \{1,2\}^n : \hat{P}(Y = \mathbf{y} \,|\, X = \mathbf{x}) \geq 2^{n(1 - \mu_{XY} - \epsilon)} \right\}$

denote the set of input sequences whose estimated likelihood is greater than $2^{n(1 - \mu_{XY} - \epsilon)}$. Then

  $1 = \sum_{\mathbf{x}} \hat{P}(X = \mathbf{x} \,|\, Y = \mathbf{y}) > |S_{\mathbf{y}}| \, 2^{n(1 - \mu_{XY} - \epsilon)} \, 2^{-n} \, 2^{n(\mu_Y - \epsilon)},$   (40)

which implies $|S_{\mathbf{y}}| < 2^{n(\mu_{XY} - \mu_Y + 2\epsilon)}$ for all $\mathbf{y} \in \hat{A}_\epsilon$.

If $(\mathbf{x}, \mathbf{y}) \in \hat{A}_\epsilon$, any other codeword causing an error must be in $S_{\mathbf{y}}$. Let $E_i$, $i = 1, \ldots, 2^{nR} - 1$, denote the event that the $i$-th codeword in the codebook $\mathcal{C}$ is in $S_{\mathbf{y}}$, and $F$ the event that $(\mathbf{x}, \mathbf{y})$ are in $\hat{A}_\epsilon$. The probability of error can be upper bounded by

  $P(\hat{\mathbf{x}} \neq \mathbf{x}) = P(F^c) P(\hat{\mathbf{x}} \neq \mathbf{x} \,|\, F^c) + P(F) P(\hat{\mathbf{x}} \neq \mathbf{x} \,|\, F)$   (41)
  $\quad \leq \epsilon \, P(\hat{\mathbf{x}} \neq \mathbf{x} \,|\, F^c) + \sum_i P(E_i)$   (42)
  $\quad \leq \epsilon + 2^{nR} \, |S_{\mathbf{y}}| \, 2^{-n}$   (43)
  $\quad \leq \epsilon + 2^{n(R + \mu_{XY} - \mu_Y - 1 + 2\epsilon)}.$   (44)

Consequently, as long as

  $R < \frac{1}{2} \sum_{k=1}^{|\mathcal{Y}|} \left( p_{1k} \log \hat{p}_{1k} + p_{2k} \log \hat{p}_{2k} \right) - \sum_{k=1}^{|\mathcal{Y}|} \frac{p_{1k} + p_{2k}}{2} \log \frac{\hat{p}_{1k} + \hat{p}_{2k}}{2} - 2\epsilon,$

for any $\delta > 0$ we can choose $\epsilon$ and $n_\epsilon$ so that for any $n > n_\epsilon$ the probability of error, averaged over all codewords and over all random codes of length $n$, is below $\delta$. By choosing a code with average probability of error below $\delta$ and discarding the worst half of its codewords, we can construct a code of rate $R - \frac{1}{n}$ and maximal probability of error below $2\delta$, proving the achievability of any rate below the bound $C_{P, \hat{P}}$ defined in Eq. (15). This concludes the proof.
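The rate bound above can be checked numerically. The short sketch below is an illustration added here (not code from the paper; the function name `mismatch_rate_bound` is made up): it evaluates $1 + \mu_Y - \mu_{XY}$ in bits per channel use from the true transition probabilities $p_{bk}$ and the decoder's estimates $\hat{p}_{bk}$ for a binary-input channel with uniform inputs. When the estimates are exact, the bound reduces to the mutual information of the channel, e.g. $1 - H_b(q)$ for a binary symmetric channel with crossover probability $q$.

```python
import math

def mismatch_rate_bound(p, p_hat):
    """Evaluate the achievable-rate bound R < 1 + mu_Y - mu_XY (bits per
    channel use) for uniform binary inputs and a mismatched maximum
    likelihood decoder that uses estimated transition probabilities.

    p, p_hat -- 2 x |Y| row-stochastic matrices (lists of lists):
    p[b][k] is the true probability of output k given input b, and
    p_hat[b][k] is the decoder's estimate. Estimates must be nonzero.
    """
    K = len(p[0])
    # mu_Y: expectation of -(1/n) log P_hat(Y), weighted by the true law
    mu_Y = -sum((p[0][k] + p[1][k]) / 2
                * math.log2((p_hat[0][k] + p_hat[1][k]) / 2)
                for k in range(K))
    # mu_XY: expectation of -(1/n) log P_hat(X, Y), weighted by the true law
    mu_XY = -sum(p[b][k] / 2 * math.log2(p_hat[b][k] / 2)
                 for b in range(2) for k in range(K))
    return 1 + mu_Y - mu_XY
```

For a binary symmetric channel with $q = 0.1$ and exact estimates, the bound evaluates to $1 - H_b(0.1) \approx 0.531$ bits; if the decoder instead assumes $q = 0.2$, it drops to roughly $0.478$ bits, illustrating the rate penalty caused by the mismatch.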