Pearson Codes

Jos H. Weber, Kees A. Schouhamer Immink, and Simon R. Blackburn

September 1, 2015
Abstract - The Pearson distance has been advocated for improving the error performance of noisy channels with unknown gain and offset. The Pearson distance can only fruitfully be used for sets of q-ary codewords, called Pearson codes, that satisfy specific properties. We will analyze constructions and properties of optimal Pearson codes. We will compare the redundancy of optimal Pearson codes with the redundancy of prior art T-constrained codes, which consist of q-ary sequences in which T pre-determined reference symbols appear at least once. In particular, it will be shown that for q ≤ 3 the 2-constrained codes are optimal Pearson codes, while for q ≥ 4 these codes are not optimal.

Key words: flash memory, digital optical recording, Non-Volatile Memory, NVM, Pearson distance.
1 Introduction
In non-volatile memories, such as floating gate memories, the data is
represented by stored charge, which can leak away from the floating gate.
Jos H. Weber is with Delft University of Technology, Delft, The Netherlands. Kees A. Schouhamer Immink is with Turing Machines Inc., Willemskade 15b-d, 3016 DK Rotterdam, The Netherlands. Simon R. Blackburn is with the Department of Mathematics, Royal Holloway University of London, Egham, Surrey TW20 0EX, United Kingdom.
This leakage may result in a shift of the offset or threshold voltage of
the memory cell. The amount of leakage depends on the time elapsed
between writing and reading the data. As a result, the offset between
different groups of cells may be very different, so that prior art automatic offset or gain control, which estimates the mismatch from the previously received data, cannot be applied. Methods to solve these difficulties in
Flash memories have been discussed in, for example, [4], [5], [6], [7]. In
optical disc media, such as the popular Compact Disc, DVD, and Blu-
ray disc, the retrieved signal depends on the dimensions of the written
features and upon the quality of the light path, which may be obscured by
fingerprints or scratches on the substrate. Fingerprints and scratches will
result in rapidly varying offset and gain variations of the retrieved signal.
Automatic gain and offset control in combination with dc-balanced codes
are applied albeit at the cost of redundancy [2], and thus improvements
to the art are welcome.
Immink & Weber [3] showed that detectors that use the Pearson distance offer immunity to offset and gain mismatch. The Pearson distance can only be used for a set of codewords with special properties, called a Pearson set or Pearson code. Let S be a codebook of chosen q-ary codewords x = (x1, x2, . . . , xn) over the q-ary alphabet Q = {0, 1, . . . , q − 1}, q ≥ 2, where n, the length of x, is a positive integer. Note that the alphabet symbols are to be treated as integers rather than as elements of Z_q. A Pearson code with maximum possible size given the parameters q and n is said to be optimal.
In Section 2, we set the stage with a description of Pearson distance
detection and the properties of the constrained codes used in conjunction
with it. Section 3 gives a description of T-constrained codes, a type of
code described in the prior art [3], used in conjunction with the Pearson
distance detector, while Section 4 offers a general construction of optimal
Pearson codes and a computation of their cardinalities. The rates of T-
constrained codes will be compared with optimal rates of Pearson codes.
In Section 5, we will describe our conclusions.
2 Preliminaries
We use the shorthand notation av + b = (av1 + b, av2 + b, . . . , avn + b). In [3], the authors suppose a situation where the sent codeword, x, is received as the vector r = a(x + ν) + b, ri ∈ R. The basic assumptions are that x is scaled by an unknown factor, called the gain, a > 0, offset by an unknown (dc-) offset, b, where a, b ∈ R, and corrupted by additive noise ν = (ν1, . . . , νn), where the νi ∈ R are noise samples with distribution N(0, σ²). Both quantities, gain and offset, do not vary from symbol to symbol, but are the same for all n symbols.
The receiver’s ignorance of the channel’s momentary gain and offset
may lead to massive performance degradation as shown, for example,
in [3] when a traditional detector, such as a threshold or maximum likelihood detector, is used. In the prior art, various methods have been
proposed to overcome this difficulty. In a first method, data reference, or
‘training’, patterns are multiplexed with the user data in order to ‘teach’
the data detection circuitry the momentary values of the channel’s char-
acteristics such as impulse response, gain, and offset. In a channel with
unknown gain and offset, we may use two reference symbol values, where
in each codeword, a first symbol is set equal to the lowest signal level
and a second symbol equal to the highest signal level. The positions and
amplitudes of the two reference symbols are known to the receiver. The
receiver can straightforwardly measure the amplitude of the retrieved
reference symbols, and normalize the amplitudes of the remaining sym-
bols of the retrieved codeword before applying detection. Clearly, the
redundancy of the method is two symbols per codeword.
In a second prior art method, codes satisfying equal balance and energy constraints [8], which are immune to gain and offset mismatch, have been advocated. The redundancy of these codes, denoted by r0, is approximated by [8]

r0 ≈ log_q n + log_q ((q² − 1)/√(q² − 4)) + log_q C, (1)

where C is a constant, independent of n, specified in [8].
In a recent contribution, Pearson distance detection is advocated since its redundancy is much less than that of balanced codes [3]. The Pearson distance between the vectors x and x̂ is defined by

δ(x, x̂) = 1 − ρ_{x,x̂}, (2)

where

ρ_{x,x̂} = Σ_{i=1}^{n} (x_i − x̄)(x̂_i − x̂̄) / (σ_x σ_x̂) (3)

is the (Pearson) correlation coefficient, and where, for a vector u, ū = (1/n) Σ_{i=1}^{n} u_i and σ_u = (Σ_{i=1}^{n} (u_i − ū)²)^{1/2}. Note that σ_x is closely related to, but not the same as, the standard deviation of x. The Pearson distance and Pearson correlation coefficient are well-known concepts in statistics and cluster analysis. Note that we have |ρ_{x,x̂}| ≤ 1 by a corollary of the Cauchy-Schwarz Inequality [9, Section IV.4.6], which implies that 0 ≤ δ(x, x̂) ≤ 2.

A minimum Pearson distance detector outputs the codeword

x_o = arg min_{x̂ ∈ S} δ(r, x̂).
As the Pearson distance is translation and scale invariant, that is,

δ(x, x̂) = δ(ax + b, x̂), a > 0,

we conclude that the Pearson distance between the vectors x and x̂ is independent of the channel's gain or offset mismatch, so that, as a result, the error performance of the minimum Pearson distance detector is immune to gain and offset mismatch. This virtue implies, however, that the minimum Pearson distance detector cannot be used in conjunction with arbitrary codebooks, since

δ(r, x̂) = δ(r, c1 + c2 x̂) (4)

for all c1, c2 ∈ R with c2 > 0. In other words, since a minimum Pearson distance detector cannot distinguish between the words x̂ and c1 + c2 x̂, the codewords must be taken from a codebook S ⊆ Q^n that guarantees unambiguous detection with the Pearson distance metric (2).
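The invariance and detection rules above can be sketched in a few lines of Python (an illustrative sketch, not the authors' implementation; vectors are plain tuples and the function names are ours):

```python
from statistics import mean

def pearson_distance(x, y):
    """Pearson distance: 1 minus the Pearson correlation coefficient."""
    xbar, ybar = mean(x), mean(y)
    sx = sum((xi - xbar) ** 2 for xi in x) ** 0.5
    sy = sum((yi - ybar) ** 2 for yi in y) ** 0.5
    rho = sum((xi - xbar) * (yi - ybar)
              for xi, yi in zip(x, y)) / (sx * sy)
    return 1.0 - rho

def mpd_detect(r, codebook):
    """Minimum Pearson distance detector: the codeword closest to r."""
    return min(codebook, key=lambda c: pearson_distance(r, c))

x = (0, 1, 2, 1)
r = tuple(0.8 * xi + 0.3 for xi in x)   # gain a = 0.8, offset b = 0.3
print(pearson_distance(x, r))           # essentially 0: gain/offset invariance
print(mpd_detect(r, [(0, 1, 2, 1), (0, 2, 1, 0)]))  # the sent codeword wins
```

Note that the computation fails for a constant vector (σ = 0), which is exactly why such words must be barred from the codebook.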
It is a well-known property of the Pearson correlation coefficient ρ_{x,x̂} that

ρ_{x,x̂} = 1

if and only if

x̂ = c1 + c2 x,

where the coefficients c1 and c2 > 0 are real numbers [9, Section IV.4.6]. It is further immediate, see (3), that the Pearson distance is undefined for codewords x with σ_x = 0. We coined the name Pearson code for a set of codewords that can be uniquely decoded by a minimum Pearson distance detector. We conclude that codewords in a Pearson code must satisfy two conditions, namely

Property A: If x ∈ S then c1 + c2 x ∉ S for all c1, c2 ∈ R with (c1, c2) ≠ (0, 1) and c2 > 0.

Property B: x = (c, c, . . . , c) ∉ S for all c ∈ R.
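For any finite candidate codebook, Properties A and B can be checked mechanically: for every ordered pair of codewords one tests whether one word is a positively-scaled affine image of the other. A sketch (illustrative only; the helper names are ours):

```python
def affine_image(x, y):
    """True if y = c1 + c2*x for some real c1 and c2 > 0, with y != x."""
    pivots = [k for k in range(len(x)) if x[k] != x[0]]
    if not pivots:                      # x constant: y must be constant too
        return len(set(y)) == 1 and y != x
    j = pivots[0]                       # solve for (c1, c2) from two positions
    c2 = (y[j] - y[0]) / (x[j] - x[0])
    c1 = y[0] - c2 * x[0]
    return (c2 > 0 and y != x and
            all(abs(y[k] - (c1 + c2 * x[k])) < 1e-9 for k in range(len(x))))

def is_pearson_code(S):
    """Check Properties A and B for a candidate codebook S (list of tuples)."""
    for x in S:
        if len(set(x)) == 1:            # Property B violated
            return False
        for y in S:
            if affine_image(x, y):      # Property A violated
                return False
    return True

# The second word is twice the first, so this set is not a Pearson code.
print(is_pearson_code([(0, 1, 2, 2), (0, 2, 4, 4)]))   # False
```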
In the remaining part of this paper, we will study constructions and properties of Pearson codes. In particular, we are interested in Pearson codes that are optimal in the sense of having the largest number of codewords for given parameters n and q. We will commence with a description of prior art T-constrained codes, a first example of Pearson codes.
3 T-constrained codes

T-constrained codes [1], denoted by S_{q,n}(a1, . . . , aT), consist of q-ary codewords of length n in which T, 0 < T ≤ q, preferred or reference symbols a1, . . . , aT ∈ Q must each appear at least once. Thus, each codeword (x1, x2, . . . , xn) in a T-constrained code satisfies

|{i : xi = j}| > 0 for each j ∈ {a1, . . . , aT}. (5)

The number of q-ary sequences of length n, N_T(q, n), in which T, T ≤ q, distinct symbols occur at least once equals [1]

N_T(q, n) = Σ_{i=0}^{T} (−1)^i (T choose i) (q − i)^n, n ≥ T. (6)
For example, we easily find for T= 1 and T= 2 that
N_1(q, n) = q^n − (q − 1)^n, (7)
N_2(q, n) = q^n − 2(q − 1)^n + (q − 2)^n. (8)
Clearly, the number of T-constrained sequences is not affected by the
choice of the specific T symbols we like to favor.
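Formula (6) is a standard inclusion-exclusion count, and for small parameters it can be cross-checked against brute-force enumeration (an illustrative sketch; function names are ours):

```python
from itertools import product
from math import comb

def N(T, q, n):
    """Equation (6): q-ary n-sequences containing T fixed symbols at least once."""
    return sum((-1) ** i * comb(T, i) * (q - i) ** n for i in range(T + 1))

def N_brute(T, q, n):
    """Direct count; by symmetry we may take the reference symbols 0, ..., T-1."""
    refs = set(range(T))
    return sum(1 for w in product(range(q), repeat=n) if refs <= set(w))

for T in (1, 2, 3):
    assert N(T, 4, 5) == N_brute(T, 4, 5)
print(N(1, 2, 6), N(2, 2, 6))   # 63 and 62: matches 2^n - 1 and 2^n - 2
```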
For the binary case, q= 2, we simply find that S2,n(0) is obtained by
removing the all-‘1’ word from Qn, that S2,n(1) is obtained by removing
the all-‘0’ word from Qn, and that S2,n(0,1) is obtained by removing both
the all-‘1’ and all-‘0’ words from Qn, where Q={0,1}. Hence, indeed,
N_1(2, n) = 2^n − 1,
N_2(2, n) = 2^n − 2.
The 2-constrained code S_{q,n}(0, q−1) is a Pearson code as it satisfies Properties A and B [3]. There are more examples of 2-constrained sets that are Pearson codes, such as S_{q,n}(0, 1). Note, however, that not all 2-constrained sets are Pearson codes. For example, S_{q,n}(0, 2) does not satisfy Property A if q ≥ 5, since, e.g., both (0, 1, 2, . . . , 2) and (0, 2, 4, . . . , 4) = 2 × (0, 1, 2, . . . , 2) are codewords.

It is obvious from Property B that the code S_{2,n}(0, 1) of size 2^n − 2 is the optimal binary Pearson code. For the ternary case, q = 3, it can easily be argued that S_{3,n}(0, 1), S_{3,n}(0, 2), and S_{3,n}(1, 2) are all optimal Pearson codes of size 3^n − 2^{n+1} + 1.

However, for q > 3 the 2-constrained sets such as S_{q,n}(0, 1), S_{q,n}(0, q−1), and S_{q,n}(q−2, q−1) are not optimal Pearson codes, except when n = 2. For example, for q = 4, it can be easily checked that the set S_{4,n}(0, 3) ∪ S_{3,n}(0, 1, 2) is a Pearson code. Its size equals N_2(4, n) + N_3(3, n) = 4^n − 3^n − 2^{n+1} + 3, which turns out to be the maximum possible size of any Pearson code for q = 4, as shown in the next section, where we will address the problem of constructing optimal Pearson codes for any value of q.
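The size claim for the q = 4 construction above can be verified by enumerating the union directly (a small sanity-check script; names are ours):

```python
from itertools import product

def T_constrained(q, n, refs):
    """S_{q,n}(refs): q-ary words of length n containing every symbol in refs."""
    return {w for w in product(range(q), repeat=n) if set(refs) <= set(w)}

n = 5
# The two parts are disjoint: every word of the first part contains a '3'.
code = T_constrained(4, n, (0, 3)) | T_constrained(3, n, (0, 1, 2))
assert len(code) == 4 ** n - 3 ** n - 2 ** (n + 1) + 3   # = 720, cf. Table 2
print(len(code))
```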
4 Optimal Pearson codes
For x = (x1, x2, . . . , xn) ∈ Q^n, let m(x) and M(x) denote the smallest and largest value, respectively, among the xi. Furthermore, in case x is not the all-zero word, let GCD(x) denote the greatest common divisor of the xi. For integers n, q ≥ 2, let P_{q,n} denote the set of all q-ary sequences x of length n satisfying the following properties:

1. m(x) = 0;
2. M(x) > 0;
3. GCD(x) = 1.

Theorem 1 For any n, q ≥ 2, P_{q,n} is an optimal Pearson code.
Proof. We will first show that P_{q,n} is a Pearson code. Property B is satisfied since any word in P_{q,n} contains at least one '0' and at least one symbol unequal to '0'. It can be shown that Property A holds by supposing that x ∈ P_{q,n} and x̂ = c1 + c2 x ∈ P_{q,n} for some c1, c2 ∈ R with c2 > 0. Clearly c1 = 0, since c1 ≠ 0 implies that m(x̂) ≠ 0. Then, since x̂ = c2 x, we infer that GCD(x̂) = c2 × GCD(x) = c2. Since, by definition, GCD(x̂) = 1, we have c2 = 1 and conclude x̂ = x, which proves that also Property A is satisfied. We conclude that P_{q,n} is a Pearson code.

We will now show that P_{q,n} is the largest among all Pearson codes. To that end, let S be any q-ary Pearson code of length n. We map all x ∈ S to x − m(x) and call the resulting code S′. Then, we map all words x in S′ to x/GCD(x). Note that both mappings are injective due to Property A, and that we do not end up with the all-zero word due to Property B. In fact, all words in the resulting code S′′ satisfy Properties 1)-3), and thus S′′, of size |S|, is a subset of P_{q,n}, which proves that P_{q,n} is optimal.
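Both the code P_{q,n} and the normalization map used in the proof are easy to realize in code (an illustrative sketch; the function names are ours):

```python
from itertools import product
from functools import reduce
from math import gcd

def P(q, n):
    """The set P_{q,n}: q-ary words with m(x) = 0, M(x) > 0, GCD(x) = 1."""
    return {x for x in product(range(q), repeat=n)
            if min(x) == 0 and max(x) > 0 and reduce(gcd, x) == 1}

def normalize(x):
    """The two-step map of the proof: subtract m(x), then divide by the GCD."""
    m = min(x)
    y = tuple(xi - m for xi in x)
    g = reduce(gcd, y)        # y is not all-zero for a word of a Pearson code
    return tuple(yi // g for yi in y)

assert len(P(2, 5)) == 2 ** 5 - 2            # matches (12) below
assert normalize((2, 4, 6, 2)) == (0, 1, 2, 0)
```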
From the definitions of T-constrained sets and P_{q,n} it follows that

S_{q,n}(0, 1) ⊆ P_{q,n} ⊆ S_{q,n}(0). (9)

In the following subsections, we will consider the cardinality and redundancy of P_{q,n}, and compare these to the corresponding results for T-constrained codes.
4.1 Cardinality
In this subsection, we study the size P_{q,n} = |P_{q,n}|. From (9), we have

N_2(q, n) ≤ P_{q,n} ≤ N_1(q, n). (10)

From Property B we have the trivial upper bound

P_{q,n} ≤ q^n − q, (11)

which is tight in case q = 2, as indicated in Section 3, i.e.,

P_{2,n} = 2^n − 2. (12)
In order to present expressions for larger values of q, we first prove the following lemma. We define P_{1,n} = 0.

Lemma 1 For any n ≥ 2 and q ≥ 3,

Σ_{2 ≤ i ≤ q, (i−1) | (q−1)} (P_{i,n} − P_{i−1,n}) = q^n − 2(q − 1)^n + (q − 2)^n, (13)

where the summation is over all integers i in the indicated range such that i − 1 is a divisor of q − 1.
Proof. For each i such that 2 ≤ i ≤ q and i − 1 is a divisor of q − 1, we define D_{i,n} as the set of all i-ary sequences y of length n satisfying m(y) = 0, M(y) = i − 1, and GCD(y) = 1. Let D denote the union of all these D_{i,n}.

The mapping ψ from S_{q,n}(0, q−1) to D, defined by dividing x ∈ S_{q,n}(0, q−1) by GCD(x), is a bijection. This follows by observing that, on the one hand, ψ(x) is a unique member of D_{(q−1)/GCD(x)+1, n}, while, on the other hand, any sequence y ∈ D_{i,n} is the image of ((q−1)/(i−1)) y ∈ S_{q,n}(0, q−1) under ψ.

Finally, the lemma follows by observing that |D_{i,n}| = P_{i,n} − P_{i−1,n} and |S_{q,n}(0, q−1)| = N_2(q, n) = q^n − 2(q−1)^n + (q−2)^n.
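The recursion implied by Lemma 1 can be implemented directly, since the i = q term of the sum is P_{q,n} − P_{q−1,n} (a sketch; the memoization decorator is a convenience of ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def P_size(q, n):
    """P_{q,n} from (13), starting from P_{2,n} = 2^n - 2 and P_{1,n} = 0."""
    if q == 1:
        return 0
    if q == 2:
        return 2 ** n - 2
    total = q ** n - 2 * (q - 1) ** n + (q - 2) ** n   # N_2(q, n)
    # Subtract the terms with i < q; the i = q term equals P_{q,n} - P_{q-1,n}.
    for i in range(2, q):
        if (q - 1) % (i - 1) == 0:
            total -= P_size(i, n) - P_size(i - 1, n)
    return total + P_size(q - 1, n)

assert P_size(4, 4) == 146                        # cf. Table 2
assert P_size(8, 5) == 8**5 - 7**5 - 4**5 + 3     # cf. Table 1, row q = 8
```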
We thus have with (13) a recursive expression for P_{q,n}. Starting from the result for q = 2 in (12), we can find P_{q,n} for any n and q. Expressions for the size of optimal Pearson codes, P_{q,n}, for 2 ≤ q ≤ 8, are tabulated in Table 1. The next theorem offers a closed formula for the size of optimal Pearson codes, P_{q,n}. We start with a definition.

For a positive integer d, the Möbius function µ(d) is defined [10, Chapter XVI] to be 0 if d is divisible by the square of a prime, and otherwise µ(d) = (−1)^k, where k is the number of distinct prime divisors of d.
Theorem 2 Let n and q be positive integers. Let P_{q,n} be the cardinality of an optimal q-ary Pearson code of length n. Then

P_{q,n} = Σ_{d=1}^{q−1} µ(d) ((⌊(q−1)/d⌋ + 1)^n − ⌊(q−1)/d⌋^n − 1). (14)
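Formula (14) is straightforward to evaluate; the sketch below implements the Möbius function by trial division and checks (14) against the entries of Table 1 (illustrative code, not from the paper):

```python
def mobius(d):
    """Moebius function mu(d) by trial division."""
    k, p = 0, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:       # squared prime factor
                return 0
            k += 1
        p += 1
    if d > 1:                    # one remaining prime factor
        k += 1
    return (-1) ** k

def P_closed(q, n):
    """Equation (14) for the size of an optimal Pearson code."""
    return sum(mobius(d) * (((q - 1) // d + 1) ** n - ((q - 1) // d) ** n - 1)
               for d in range(1, q))

assert P_closed(2, 6) == 2 ** 6 - 2
assert P_closed(5, 4) == 5 ** 4 - 4 ** 4 - 3 ** 4 + 2   # Table 1, q = 5
```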
We use the following well-known theorem (see [10, Section 16.5], for
example) in our proof of Theorem 2.
Theorem 3 Let F: R → R and G: R → R be functions such that

G(x) = Σ_{d=1}^{⌊x⌋} F(x/d)

for all positive x. Then

F(x) = Σ_{d=1}^{⌊x⌋} µ(d) G(x/d). (15)
Proof (of Theorem 2). For a non-negative real number x, define

I_x = {0, 1, . . . , ⌊x⌋} = Z ∩ [0, x].

Let V_x be the set of vectors of length n with entries in I_x that have at least one zero entry and at least one non-zero entry. Define G(x) = |V_x|. There are |I_x|^n vectors of length n with entries in I_x, of which (|I_x| − 1)^n have no zero entries and one is the all-zero vector. Since |I_x| = ⌊x⌋ + 1, we find that

G(x) = |I_x|^n − (|I_x| − 1)^n − 1 = (⌊x⌋ + 1)^n − ⌊x⌋^n − 1. (16)
Table 1: Size of optimal Pearson codes, P_{q,n}, for 2 ≤ q ≤ 8.

q   P_{q,n}
2   2^n − 2
3   3^n − 2^{n+1} + 1
4   4^n − 3^n − 2^{n+1} + 3
5   5^n − 4^n − 3^n + 2
6   6^n − 5^n − 3^n − 2^n + 4
7   7^n − 6^n − 4^n + 2^n + 1
8   8^n − 7^n − 4^n + 3
For a positive integer d, let V_{x,d} be the set of vectors c ∈ V_x such that GCD(c) = d. Since c ≠ 0, we see that 1 ≤ GCD(c) ≤ max_i{c_i} ≤ ⌊x⌋, and so V_x can be written as the disjoint union

V_x = ∪_{d=1}^{⌊x⌋} V_{x,d}.

Moreover, |V_{x,d}| = |V_{x/d,1}|, since the map taking c ∈ V_{x,d} to (1/d)c ∈ V_{x/d,1} is a bijection.

Define F(x) = |V_{x,1}|, so F(x) is the number of vectors c ∈ V_x such that GCD(c) = 1. Now,

G(x) = |V_x| = Σ_{d=1}^{⌊x⌋} |V_{x,d}| = Σ_{d=1}^{⌊x⌋} |V_{x/d,1}| = Σ_{d=1}^{⌊x⌋} F(x/d).

So, by Theorem 3, we deduce that (15) holds. Theorem 2 now follows from the fact that P_{q,n} = F(q−1), by combining (15) and (16).
After perusing Table 1, it appears that for q ≥ 4, P_{q,n} is roughly q^n − (q−1)^n. An intuitive justification is that among the q^n q-ary sequences of length n there are (q−1)^n sequences that do not contain 0, which is the most significant condition to avoid. All this is confirmed by the next corollary.

Corollary 1 For any positive integer q, we have that

P_{q,n} = q^n − (q−1)^n + O(⌈q/2⌉^n)

as n → ∞.

Proof. The d = 1 term in the sum on the right-hand side of (14) is q^n − (q−1)^n − 1, and the absolute values of the remaining terms are each bounded by ⌈q/2⌉^n, since

⌊(q−1)/d⌋ + 1 ≤ ⌊(q−1)/2⌋ + 1 ≤ ⌈q/2⌉

for all d ≥ 2.
As discussed above, the 2-constrained codes S_{q,n}(0, 1) and S_{q,n}(0, q−1) are Pearson codes. Therefore, it is of interest to compare P_{q,n} with the cardinality N_2(q, n) = q^n − 2(q−1)^n + (q−2)^n of 2-constrained codes. For q ≤ 3, we simply have P_{q,n} = N_2(q, n). For q ≥ 4, we infer from (8) and Corollary 1 that N_2(q, n) < P_{q,n}, with a possible exception for very small values of n. For all q ≥ 2 it holds that

P_{q,2} = N_2(q, 2) = 2, (17)

P_{q,3} = 6 Σ_{j=1}^{q−1} φ(j), (18)

where φ(j) is Euler's totient function that counts the totatives of j, i.e., the positive integers less than or equal to j that are relatively prime to j.
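Equations (17) and (18) can be confirmed by brute force for small q (an illustrative check; the helper names are ours):

```python
from itertools import product
from functools import reduce
from math import gcd

def phi(j):
    """Euler's totient: the number of 1 <= k <= j with gcd(k, j) = 1."""
    return sum(1 for k in range(1, j + 1) if gcd(k, j) == 1)

def P_brute(q, n):
    """|P_{q,n}| by direct enumeration of all q-ary words of length n."""
    return sum(1 for x in product(range(q), repeat=n)
               if min(x) == 0 and max(x) > 0 and reduce(gcd, x) == 1)

for q in range(2, 8):
    assert P_brute(q, 2) == 2                                     # equation (17)
    assert P_brute(q, 3) == 6 * sum(phi(j) for j in range(1, q))  # equation (18)
```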
We have computed the cardinalities N_1(q, n), N_2(q, n), and P_{q,n} by invoking (7), (8), and the expressions in Table 1. Table 2 lists the results of our computations for selected values of q and n.
4.2 Redundancy
As usual, the redundancy of a q-ary code C of length n is defined by n − log_q |C|. From (7), it follows that the redundancy of a 1-constrained code is

r1 = n − log_q(q^n − (q−1)^n) = −log_q(1 − ((q−1)/q)^n) ≈ (1/ln q) ((q−1)/q)^n, (19)
Table 2: N_2(q, n), P_{q,n}, and N_1(q, n) for selected values of q and n.

n  q  N_2(q,n)  P_{q,n}  N_1(q,n)
4  4    110       146      175
4  5    194       290      369
4  6    302       578      671
5  4    570       720      781
5  5   1320      1860     2101
5  6   2550      4380     4651
6  4   2702      3242     3367
6  5   8162     10802    11529
6  6  19502     30242    31031
7  4  12138     13944    14197
7  5  47544     59556    61741
7  6 140070    199500   201811
where the approximation follows from the well-known fact that ln(1 + a) ≈ a when a is close to 0. Similarly, from (8) we infer the redundancy of a 2-constrained code, namely

r2 = n − log_q(q^n − 2(q−1)^n + (q−2)^n) ≈ (2/ln q) ((q−1)/q)^n. (20)

Since the 2-constrained code S_{q,n}(0, 1) is optimal for q = 2, 3, the expression for r2 gives the minimum redundancy for any binary or ternary Pearson code. From Corollary 1, it follows for q ≥ 4 that the redundancy of optimal Pearson codes equals

rP = n − log_q(q^n − (q−1)^n + O(((q+1)/2)^n))
   = −log_q(1 − ((q−1)/q)^n + O(((q+1)/(2q))^n))
   ≈ (1/ln q) ((q−1)/q)^n + O(((q+1)/(2q))^n). (21)

In conclusion, for sufficiently large n, we have

rP = r2 (22)

if q = 2, 3, while

rP ≈ r1 ≈ r2/2 (23)

if q ≥ 4.
Figure 1 shows, as an example, the redundancies r1, r2, and rP versus n for q = 8 (the quantity rP was computed using the expression listed in Table 1). Note that the redundancy r2 decreases, while the redundancy of prior art balanced codes, r0, see (1), increases with increasing codeword length n. The curve of r0 versus n was not plotted in Figure 1 as the redundancy of balanced codes is much higher than that of Pearson codes. For example, an evaluation of (1) shows that the redundancy r0 = 2.79 for q = 8 and n = 10, while rP = 0.147 for the same parameters.
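The numbers quoted for q = 8 can be reproduced from (7), (8), and Table 1 (an illustrative computation; the function names are ours):

```python
from math import log

def r1(q, n):
    """Redundancy of a 1-constrained code, from (7)."""
    return n - log(q ** n - (q - 1) ** n, q)

def r2(q, n):
    """Redundancy of a 2-constrained code, from (8)."""
    return n - log(q ** n - 2 * (q - 1) ** n + (q - 2) ** n, q)

def rP8(n):
    """Redundancy of the optimal Pearson code for q = 8 (Table 1)."""
    return n - log(8 ** n - 7 ** n - 4 ** n + 3, 8)

print(round(rP8(10), 3))                  # 0.147, as quoted above
print(round(r2(8, 30) / r1(8, 30), 2))    # close to 2, cf. (23)
```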
5 Conclusions
We have studied sets of q-ary codewords of length n, coined Pearson
codes, that can be detected unambiguously by a detector based on the
Pearson distance. We have formulated the properties of codewords in
Pearson codes. We have presented constructions of optimal Pearson codes
and evaluated their cardinalities and redundancies. We conclude that, except for small values of q and/or n, the redundancy of optimal Pearson codes is almost the same as the redundancy of 1-constrained codes.
Figure 1: Redundancy r1, r2, and rP versus n for q = 8.
References
[1] K. A. S. Immink, "Coding Schemes for Multi-Level Flash Memories that are Intrinsically Resistant Against Unknown Gain and/or Offset Using Reference Symbols", Electronics Letters, vol. 50, pp. 20-22, 2014.
[2] K. A. S. Immink and J. H. Weber, "Very Efficient Balanced Codes", IEEE Journal on Selected Areas in Communications, vol. 28, pp. 188-192, 2010.
[3] K. A. S. Immink and J. H. Weber, "Minimum Pearson Distance Detection for Multi-Level Channels with Gain and/or Offset Mismatch", IEEE Trans. Inform. Theory, vol. 60, pp. 5966-5974, Oct. 2014.
[4] A. Jiang, R. Mateescu, M. Schwartz, and J. Bruck, “Rank Modula-
tion for Flash Memories”, IEEE Trans. Inform. Theory, vol. IT-55,
no. 6, pp. 2659-2673, June 2009.
[5] F. Sala, R. Gabrys, and L. Dolecek, "Dynamic Threshold Schemes for Multi-Level Non-Volatile Memories", IEEE Trans. on Commun., vol. 61, pp. 2624-2634, July 2013.
[6] H. Zhou, A. Jiang, and J. Bruck, "Error-correcting schemes with dynamic thresholds in nonvolatile memories", IEEE Int. Symposium on Inform. Theory (ISIT), St. Petersburg, July 2011.
[7] F. Sala, K. A. S. Immink, and L. Dolecek, “Error Control Schemes
for Modern Flash Memories: Solutions for Flash Deficiencies”, IEEE
Consumer Electronics Magazine, vol. 4 (1), pp. 66-73, Jan. 2015.
[8] K. A. S. Immink, "Coding Schemes for Multi-Level Channels with Unknown Gain and/or Offset Using Balance and Energy Constraints", IEEE International Symposium on Information Theory (ISIT), Istanbul, July 2013.
[9] A. M. Mood, F. A. Graybill, and D. C. Boes, Introduction to the
Theory of Statistics, Third Edition, McGraw-Hill, 1974.
[10] G. H. Hardy and E. M. Wright, An Introduction to the Theory of
Numbers, (5th Edition), Oxford University Press, Oxford, 1979.
... Such sets are called Pearson codes. In [10], optimal codes were presented, in the sense of having the largest number of codewords and thus minimum redundancy among all q-ary Pearson codes of fixed length n. However, the important issue of efficient coding procedures was not addressed. ...
... A code satisfying (5) is called a Pearson code [10]. Known constructions of Pearson codes read as follows. ...
... We denote this code by P(n, q). It is has been shown in [10] that this code is optimal in the sense that it has the largest number of codewords among all q-ary Pearson codes of length n. Another code which is of interest, though not being a Pearson code, is defined as follows. ...
Conference Paper
Full-text available
The recently proposed Pearson codes offer immunity against channel gain and offset mismatch. These codes have very low redundancy, but efficient coding procedures were lacking. In this paper, systematic Pearson coding schemes are presented. The redundancy of these schemes is analyzed for memoryless uniform sources. It is concluded that simple coding can be established at only a modest rate loss.
... As investigated in [3] and [5] for the case of (a, b, 0)immunity, the codebook should satisfy certain properties in order to allow the use of MPD detection and to prevent ambiguous decoding options. For the case of (a, b, c)-immunity, a new class of codes with the required properties will be presented in the next section. ...
... In order to work well with an MPD detector, the codebook should satisfy the following two requirements [3], [5]: (i) it should not contain vectors u with σ u = 0, since it follows from (3) that the Pearson distance is undefined for such u; (ii) the presence of a vector w in the codebook implies that all vectors c 1 w + c 2 1 with c 1 > 0, c 2 ∈ R, and (c 1 , c 2 ) = (1, 0), should not appear in the codebook because of (4). In our case, these requirements must hold for ∆S, since the MPD detector operates on the difference codebook. ...
We consider noisy data transmission channels with unknown scaling and varying offset mismatch. Minimum Pearson distance detection is used in cooperation with a difference operator, which offers immunity to such mismatch. Pair-constrained codes are proposed for unambiguous decoding, where in each codeword certain adjacent symbol pairs must appear at least once. We investigate the cardinality and redundancy of these codes.
... A second method, orthogonal to the first approach, is based on the premise that the detector should be designed in such a way that channel mismatch does not cause undue error performance degradation, so that, redundancy of training sequences, parameter estimation, and receiver adjustment are not needed or cannot be applied, for example, where offset/gain changes quickly from page to page. Minimum Pearson Distance (MPD) detection has been advocated since it has innate resistance, or is said to be immune, to unknown variations of the signal amplitude (gain) and offset of the received signal [4], [5], [6]. The authors assume that the offset is constant (uniform) for all symbols in the codeword. ...
... Clearly, Pearsondistance-based detection is (a, b) immune. The Pearson distance can only be used for a set of codewords with special properties, called a Pearson code [6]. We now show that employment of a set of mass-centered codewords will make Pearson-distance-based detection invariant to the parameter c, that is (c) immune, as well. ...
We consider the transmission and storage of data that use coded binary symbols over a channel, where a Pearsondistance-based detector is used for achieving resilience against additive noise, unknown channel gain, and varying offset. We study Minimum Pearson Distance (MPD) detection in conjunction with a set, S, of codewords satisfying a center-of-mass constraint. We investigate the properties of the codewords in S, compute the size of S, and derive its redundancy for asymptotically large values of the codeword length n. The redundancy of S is approximately 3/2 log2 n + α where α = log2 √π/24 =-1.467. for n odd and α =-0.467. for n even. We describe a simple encoding algorithm whose redundancy equals 2 log2 n + o(log n). We also compute the word error rate of the MPD detector when the channel is corrupted with additive Gaussian noise.
... Some of the most widely-recognized constraints include (d, k) RLL constraints that bound the number of logic zeros between consecutive logic ones to be between d and k, and DC-free constraints that bound the running digital sum (RDS) value of the encoded sequence, where RDS is the accumulation of encoded bit weights in a sequence given that a logic one has weight +1 and a logic zero has weight −1 [1]. Some other types of constraints include the Pearson constraint and constraints that mitigate inter-cell interference in flash memories [10], [13], [14], [17], [19]- [22]. CS encoders can be described by finite state machines (FSMs) consisting of states, edges and labels. ...
Full-text available
Constrained sequence (CS) codes, including fixed-length CS codes and variable-length CS codes, have been widely used in modern wireless communication and data storage systems. Sequences encoded with constrained sequence codes satisfy constraints imposed by the physical channel to enable efficient and reliable transmission of coded symbols. In this paper, we propose using deep learning approaches to decode fixed-length and variable-length CS codes. Traditional encoding and decoding of fixed-length CS codes rely on look-up tables (LUTs), which is prone to errors that occur during transmission. We introduce fixed-length constrained sequence decoding based on multiple layer perception (MLP) networks and convolutional neural networks (CNNs), and demonstrate that we are able to achieve low bit error rates that are close to maximum a posteriori probability (MAP) decoding as well as improve the system throughput. Further, implementation of capacity-achieving fixed-length codes, where the complexity is prohibitively high with LUT decoding, becomes practical with deep learning-based decoding. We then consider CNN-aided decoding of variable-length CS codes. Different from conventional decoding where the received sequence is processed bit-by-bit, we propose using CNNs to perform one-shot batch-processing of variable-length CS codes such that an entire batch is decoded at once, which improves the system throughput. Moreover, since the CNNs can exploit global information with batch-processing instead of only making use of local information as in conventional bit-by-bit processing, the error rates can be reduced. We present simulation results that show excellent performance with both fixed-length and variable-length CS codes that are used in the frontiers of wireless communication systems.
... The word error rate (WER) of 10,000 trials is shown as a function of the signal-to-noise ratio (SNR = −20 log 10 σ). Results are given for 2-constrained codes [12], [15], while a = 1.07 and b = 0.07. The simulations indicate that for this case Pearson distance decoding has a comparable performance as ML decoding, while Euclidean distance decoding performs considerably worse. ...
Conference Paper
Full-text available
Reliability is a critical issue for modern multi-level cell memories. We consider a multi-level cell channel model such that the retrieved data is not only corrupted by Gaussian noise, but hampered by scaling and offset mismatch as well. We assume that the intervals from which the scaling and offset values are taken are known, but no further assumptions on the distributions on these intervals are made. We derive maximum likelihood (ML) decoding methods for such channels, based on finding a codeword that has closest Euclidean distance to a specified set defined by the received vector and the scaling and offset parameters. We provide geometric interpretations of scaling and offset and also show that certain known criteria appear as special cases of our general setting.
... Such sets are called Pearson codes. In [8], optimal Pearson codes were presented, in the sense of having the largest number of codewords and thus minimum redundancy among all q-ary Pearson codes of fixed length n. Further, in [9] a decoder was proposed based on minimizing a weighted sum of Euclidean and Pearson distances. ...
Conference Paper
Full-text available
Data storage systems may not only be disturbed by noise. In some cases, the error performance can also be seriously degraded by offset mismatch. Here, channels are considered for which both the noise and offset are bounded. For such channels, Euclidean distance-based decoding, Pearson distance-based decoding, and Maximum Likelihood decoding are considered. In particular, for each of these decoders, bounds are determined on the magnitudes of the noise and offset intervals which lead to a word error rate equal to zero. Case studies with simulation results are presented confirming the findings.
... Since a minimum Pearson distance detector cannot deal with codewords c with σ c = 0 and cannot distinguish between the words c and c 1 1 + c 2 c, c 2 > 0, well-chosen words must be barred from Q n to guarantee unambiguous detection. Weber et al. [7] coined the name Pearson code for a set of codewords that can be uniquely decoded by a minimum Pearson distance detector. Codewords in a Pearson code S satisfy two conditions, namely ...
Conference Paper
Full-text available
We consider the transmission and storage of data that use coded symbols over a channel, where a Pearson-distance based detector is used for achieving resilience against unknown channel gain and offset, and corruption with additive noise. We discuss properties of binary Pearson codes, such as the Pearson noise distance that plays a key role in the error performance of Pearson-distance-based detection. We also compare the Pearson noise distance to the well-known Hamming distance, since the latter plays a similar role in the error performance of Euclidean distance- based detection.
... Therefore, effective coding and signal processing methods that deal with gain and offset mismatch of the received signals are critical for recovery of source information. Pearson codes, a class of constrained sequence codes that can be decoded with Pearson-distancebased detection which is immune to channel gain and offset mismatch, were proposed in [2] and [3]. Practical systematic Pearson codes were proposed in [4] where the authors studied fixed-to-fixed (FF) and variable-to-fixed (VF) mappings from source words to codewords. ...
Full-text available
Sequences encoded with Pearson codes are immune to channel gain and offset mismatch that cause performance loss in communication systems. In this paper, we introduce an efficient method of constructing capacity-approaching variable-length Pearson codes. We introduce a finite state machine (FSM) description of Pearson codes, and present a variable-length code construction process based on this FSM. We then analyze the code rate, redundancy and the convergence property of our codes. We show that our proposed codes have less redundancy than codes recently described in the literature and that they can be implemented in a straightforward fashion.
... Recently, Immink and Weber [6][7][8] elaborated the idea of T-constrained codes along with their special properties, and developed a detection algorithm based on minimum Pearson distance (MPD) instead of Euclidean distance. T-constrained codes satisfy the requirements of unambiguous detection imposed by the new Pearson-distance-based detector. ...
K.A.S. Immink and J.H. Weber recently defined and studied a channel with both gain and offset mismatch, modelling the behaviour of charge-leakage in flash memory. They proposed a decoding measure for this channel based on minimising Pearson distance (a notion from cluster analysis). The paper derives a formula for maximum likelihood decoding for this channel, and also defines and justifies a notion of minimum distance of a code in this context.
We will present coding techniques for transmission and storage channels with unknown gain and/or offset. It will be shown that a codebook of length-n q-ary codewords, S, where all codewords in S have equal balance and energy show an intrinsic resistance against unknown gain and/or offset. Generating functions for evaluating the size of S will be presented. We will present an approximate expression for the code redundancy for asymptotically large values of n.
The performance of certain transmission and storage channels, such as optical data storage and nonvolatile memory (flash), is seriously hampered by the phenomena of unknown offset (drift) or gain. We will show that minimum Pearson distance (MPD) detection, unlike conventional minimum Euclidean distance detection, is immune to offset and/or gain mismatch. MPD detection is used in conjunction with T-constrained codes that consist of q-ary codewords, where in each codeword T reference symbols appear at least once. We will analyze the redundancy of the new q-ary coding technique and compute the error performance of MPD detection in the presence of additive noise. Implementation issues of MPD detection will be discussed, and results of simulations will be given.
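Checking membership in a T-constrained code is straightforward; a minimal sketch, assuming the pre-determined reference symbols are given as a set (the choice of the extreme symbols 0 and 3 below is purely illustrative):

```python
def is_t_constrained(word, reference_symbols):
    """True iff every pre-determined reference symbol occurs
    at least once in the word."""
    return all(s in word for s in reference_symbols)

# 2-constrained code over the alphabet {0, 1, 2, 3},
# with 0 and 3 taken as the reference symbols:
print(is_t_constrained((0, 2, 3, 1), {0, 3}))  # True
print(is_t_constrained((1, 2, 2, 3), {0, 3}))  # False
```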
We explore a novel data representation scheme for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. The only allowed charge-placement mechanism is a "push-to-the-top" operation which takes a single cell of the set and makes it the top-charged cell. The resulting scheme eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. We present unrestricted Gray codes spanning all possible n-cell states and using only "push-to-the-top" operations, and also construct balanced Gray codes. We also investigate optimal rewriting schemes for translating arbitrary input alphabet into n-cell states which minimize the number of programming operations.
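The "push-to-the-top" primitive described above can be sketched on a permutation of cell indices, listed top-charged cell first (the representation is illustrative):

```python
def push_to_top(ranking, cell):
    """Apply one push-to-the-top operation: the chosen cell is charged
    above all others and becomes the top-ranked cell; the relative
    order of the remaining cells is unchanged."""
    return [cell] + [c for c in ranking if c != cell]

state = [2, 0, 1]             # cell 2 currently holds the highest charge
state = push_to_top(state, 1)
print(state)                  # [1, 2, 0]
```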
Predetermined fixed thresholds are commonly used in nonvolatile memories for reading binary sequences, but they usually result in significant asymmetric errors after a long duration, due to voltage or resistance drift. This motivates us to construct error-correcting schemes with dynamic reading thresholds, so that the asymmetric component of errors are minimized. In this paper, we discuss how to select dynamic reading thresholds without knowing cell level distributions, and present several error-correcting schemes. Analysis based on Gaussian noise models reveals that bit error probabilities can be significantly reduced by using dynamic thresholds instead of fixed thresholds, hence leading to a higher information rate.
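One simple instance of a dynamic reading threshold, assuming the number of 1s per codeword is known in advance (e.g. from a balanced code), is to read the highest-charged cells as 1; a sketch under that assumption:

```python
def dynamic_threshold_read(levels, weight):
    """Read a noisy cell-level vector as binary, assuming the codeword
    weight (number of 1s) is known.  The 'weight' highest-charged cells
    are read as 1, so a uniform drift of all levels is harmless."""
    order = sorted(range(len(levels)), key=lambda i: levels[i], reverse=True)
    bits = [0] * len(levels)
    for i in order[:weight]:
        bits[i] = 1
    return bits

# drifted levels: a fixed threshold at 0.5 would read everything as 0
levels = [0.35, 0.05, 0.40, 0.10]
print(dynamic_threshold_read(levels, 2))  # [1, 0, 1, 0]
```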
The prior art construction of sets of balanced codewords by Knuth is attractive for its simplicity and absence of look-up tables, but the redundancy of the balanced codes generated by Knuth's algorithm falls a factor of two short with respect to the minimum required. We present a new construction, which is simple, does not use look-up tables, and is less redundant than Knuth's construction. In the new construction, the user word is modified in the same way as in Knuth's construction, that is by inverting a segment of user symbols. The prefix that indicates which segment has been inverted, however, is encoded in a different, more efficient, way.
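The inversion step common to Knuth's construction and the new one can be sketched as follows; the prefix that encodes the inversion index k is omitted here:

```python
def knuth_balance(word):
    """Find an index k such that inverting the first k bits balances
    the word.  As k runs from 0 to n, the weight of the modified word
    changes by exactly 1 at each step, so for even n it must pass
    through n/2."""
    n = len(word)
    for k in range(n + 1):
        flipped = [1 - b for b in word[:k]] + word[k:]
        if 2 * sum(flipped) == n:      # balanced: equally many 0s and 1s
            return k, flipped
    raise ValueError("no balancing index exists (n must be even)")

k, balanced = knuth_balance([1, 1, 1, 1, 0, 1])
print(k, balanced)  # 2 [0, 0, 1, 1, 0, 1]
```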
Flash, already one of the dominant forms of data storage for mobile consumer devices, such as smartphones and media players, is experiencing explosive growth in cloud and enterprise applications. Flash devices offer very high access speeds, low power consumption, and physical resiliency. Our goal in this article is to provide a high-level overview of error correction for Flash. We begin by discussing Flash functionality and design, and introduce the nature of Flash deficiencies. We then describe the basics of error-correcting codes (ECCs), discuss BCH and LDPC codes in particular, and wrap up with further directions for Flash coding.
Coding schemes for storage channels, such as optical recording and non-volatile memory (Flash), with unknown gain and offset are presented. In its simplest case, the coding schemes guarantee that a symbol with a minimum value (floor) and a symbol with a maximum value (ceiling) are always present in a codeword, so that the detection system can estimate the momentary gain and offset. The results of the computer simulations show the performance of the new coding and detection methods in the presence of additive noise.
In non-volatile memories, reading stored data is typically done through the use of predetermined fixed thresholds. However, due to problems commonly affecting such memories, including voltage drift, overwriting, and inter-cell coupling, fixed threshold usage often results in significant asymmetric errors. To combat these problems, Zhou, Jiang, and Bruck recently introduced the notion of dynamic thresholds and applied them to the reading of binary sequences. In this paper, we explore the use of dynamic thresholds for multi-level cell (MLC) memories. We provide a general scheme to compute and apply dynamic thresholds and derive performance bounds. We show that the proposed scheme compares favorably with the optimal thresholding scheme. Finally, we develop limited-magnitude error-correcting codes tailored to take advantage of dynamic thresholds.