Simple Systematic Pearson Coding
Jos H. Weber∗
∗Delft University of Technology
The Netherlands
j.h.weber@tudelft.nl

Theo G. Swart∗∗
∗∗University of Johannesburg
South Africa
tgswart@uj.ac.za

Kees A. Schouhamer Immink∗∗∗
∗∗∗Turing Machines Inc.
The Netherlands
immink@turing-machines.com
Abstract—The recently proposed Pearson codes offer immunity
against channel gain and offset mismatch. These codes have very
low redundancy, but efficient coding procedures were lacking. In
this paper, systematic Pearson coding schemes are presented. The
redundancy of these schemes is analyzed for memoryless uniform
sources. It is concluded that simple coding can be established at
only a modest rate loss.
I. INTRODUCTION
Dealing with rapidly varying offset and/or gain is an important issue in signal processing for modern storage and
communication systems. For example, methods to solve these
difficulties in Flash memories have been discussed in, e.g., [7],
[9], and [11]. Also, in optical disc media, the retrieved signal
depends on the dimensions of the written features and upon
the quality of the light path, which may be obscured by
fingerprints or scratches on the substrate, leading to offset
and gain variations of the retrieved signal. Automatic gain
and offset control in combination with dc-balanced codes
are applied albeit at the cost of redundancy [4], and thus
improvements to the art are welcome.
Immink and Weber [5] showed that detectors that use the
Pearson distance offer immunity to offset and gain mismatch.
Use of the Pearson distance demands that the set of codewords
satisfies certain special properties. Such sets are called Pearson
codes. In [10], optimal codes were presented, in the sense of
having the largest number of codewords and thus minimum
redundancy among all q-ary Pearson codes of fixed length n.
However, the important issue of efficient coding procedures
was not addressed. In this paper, we present simple systematic
Pearson coding schemes, mapping sequences of information
symbols generated by a q-ary source to q-ary code sequences.
The redundancy of these coding schemes is analyzed for
memoryless sources generating q-ary symbols with equal
probability.
The remainder of this paper is organized as follows. In
Section II, we review the concepts of Pearson detection and
q-ary Pearson codes. Then, in Section III, we present our
systematic coding schemes and analyze their redundancy.
Finally, in Section IV, we draw conclusions.
II. PRELIMINARIES
A. Codes and Redundancies
Let C be a q-ary code of length n, i.e., C ⊆ Q^n, where Q = {0, 1, ..., q−1} is the code alphabet of size q ≥ 2. Here the alphabet symbols are to be treated as real numbers rather than as elements of Z_q. The cardinality of the code is denoted by M, i.e., M = |C|. Usually, the redundancy of code C is then defined as

n − log_q M. (1)

Actually, this assumes that all codewords are equally likely to be selected. In a more general setting, an arbitrary probability mass function (PMF) is specified on the codewords. Let the probability that codeword x_i ∈ C, 1 ≤ i ≤ M, is selected for transmission or storage be P_i. Since the average amount of information carried by a codeword is then −∑_{i=1}^{M} P_i log_q P_i symbols, the redundancy of code C with PMF {P_i} is

n + ∑_{i=1}^{M} P_i log_q P_i. (2)

In case P_i = 1/M for all i, (2) reduces to (1).
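As a small illustration (the helper below is ours, not from the paper), the redundancy of (2) can be evaluated directly from a codeword PMF; with a uniform PMF it collapses to (1):

```python
import math

def redundancy(n, q, pmf):
    """Redundancy, in q-ary symbols, of a length-n code whose codewords
    are selected with the given probabilities, following Eq. (2):
    n + sum_i P_i * log_q(P_i)."""
    assert abs(sum(pmf) - 1.0) < 1e-9   # PMF must sum to one
    return n + sum(p * math.log(p, q) for p in pmf if p > 0)

# Uniform PMF over M codewords: (2) reduces to (1), i.e., n - log_q(M).
M = 7
print(redundancy(3, 2, [1 / M] * M))    # 3 - log2(7), roughly 0.19
```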
B. Pearson Detection
For convenience, we use the shorthand notation av + b = (av_1 + b, av_2 + b, ..., av_n + b). A common assumption is that a transmitted codeword x is received as a vector r = a(x + ν) + b in R^n. Here a and b are unknown real numbers with a > 0, called the gain and the (dc-)offset, respectively. Moreover, ν is an additive noise vector, where the ν_i ∈ R are noise samples from a zero-mean Gaussian distribution. Note that both gain and offset do not vary from symbol to symbol, but are the same for the whole block of n symbols. The receiver’s ignorance of the channel’s momentary
gain and offset may lead to massive performance degradation
as shown, for example, in [5] when a traditional detector,
based on thresholds or the Euclidean distance, is used. In the
prior art, various methods have been proposed to overcome
this difficulty. In a first method, data reference, or ‘training’,
patterns are multiplexed with the user data in order to ‘teach’
the data detection circuitry the momentary values of the
channel’s characteristics such as impulse response, gain, and
offset. In a channel with unknown gain and offset, we may
use two reference symbol values, where in each codeword,
a first symbol is set equal to the lowest signal level and a
second symbol equal to the highest signal level. The positions
and amplitudes of the two reference symbols are known to
the receiver. The receiver can straightforwardly measure the
amplitude of the retrieved reference symbols, and normalize
the amplitudes of the remaining symbols of the retrieved
2016 IEEE International Symposium on Information Theory
978-1-5090-1805-5/16/$31.00 ©2016 IEEE 385
codeword before applying detection. Clearly, the redundancy
of the method is two symbols per codeword.
In a second prior art method, codes satisfying equal balance
and energy constraints [2], which are immune to gain and
offset mismatch, have been advocated. However, these codes
suffer from a rather high redundancy. In a recent contribution,
Pearson distance detection is advocated since its redundancy
is much less than that of balanced codes [5]. The Pearson distance between the vectors u and v is defined as follows. For a vector u, define ū = (1/n) ∑_{i=1}^{n} u_i and σ_u² = ∑_{i=1}^{n} (u_i − ū)². Note that σ_u is closely related to, but not the same as, the standard deviation of u. The (Pearson) correlation coefficient of u and v is defined by

ρ_{u,v} = ∑_{i=1}^{n} (u_i − ū)(v_i − v̄) / (σ_u σ_v), (3)

and the Pearson distance between u and v is given by

δ(u, v) = 1 − ρ_{u,v}. (4)

The Pearson distance and Pearson correlation coefficient are well-known concepts in statistics and cluster analysis. Since |ρ_{u,v}| ≤ 1, it holds that 0 ≤ δ(u, v) ≤ 2. The Pearson distance is translation and scale invariant, that is, δ(u, v) = δ(u, av + b), for any real numbers a and b with a > 0.
Upon receipt of a vector r, a minimum Pearson distance detector outputs the codeword arg min_{x∈C} δ(r, x). Since the
Pearson distance is translation and scale invariant, we conclude
that the Pearson distance between the received vector and
a codeword is independent of the channel’s gain or offset
mismatch, so that, as a result, the error performance of the
minimum Pearson distance detector is immune to gain and
offset mismatch, which is a big advantage in comparison to
Euclidean distance detectors. However, Pearson distance detectors are more sensitive to noise. Therefore, hybrid minimum
Pearson and Euclidean distance detectors have been proposed
[6] to deal with channels suffering from both significant noise
and gain/offset.
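The invariance property is easy to demonstrate numerically. The sketch below (ours, not from the paper; the toy codebook is only illustrative) computes the Pearson distance of (3)-(4) and shows that the minimum Pearson distance detector's decision is unchanged when the received vector is distorted by an arbitrary gain a > 0 and offset b:

```python
import math

def pearson_distance(u, v):
    """Pearson distance delta(u, v) = 1 - rho_{u,v}, per (3) and (4)."""
    n = len(u)
    ub, vb = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((x - ub) ** 2 for x in u))
    sv = math.sqrt(sum((y - vb) ** 2 for y in v))
    num = sum((x - ub) * (y - vb) for x, y in zip(u, v))
    return 1 - num / (su * sv)

def detect(r, code):
    """Minimum Pearson distance detector: arg min over the codebook."""
    return min(code, key=lambda x: pearson_distance(r, x))

code = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1)]  # toy codebook
r = (0.1, 0.9, 1.1)                                  # noisy version of (0, 1, 1)
a, b = 2.5, 0.7                                      # unknown gain and offset
scaled = tuple(a * x + b for x in r)
print(detect(r, code), detect(scaled, code))         # same decision either way
```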
C. Pearson Codes
Its immunity to gain and offset mismatch implies that the minimum Pearson distance detector cannot be used in conjunction with arbitrary codes, since δ(r, x) = δ(r, y) if y = c_1 + c_2 x, with c_1, c_2 ∈ R and c_2 positive. In other words, since a minimum Pearson distance detector cannot distinguish between the words x and y = c_1 + c_2 x, the codewords must be taken from a code C ⊆ Q^n that guarantees unambiguous detection with the Pearson distance metric (4) accordingly. Furthermore, note that codewords of the format x = (c, c, ..., c) should not be used, in order to avoid that σ_x = 0, which would lead to an undefined Pearson correlation coefficient. In conclusion, the following condition must be satisfied:

If x ∈ C then c_1 + c_2 x ∉ C for all c_1, c_2 ∈ R with (c_1, c_2) ≠ (0, 1) and c_2 ≥ 0. (5)
A code satisfying (5) is called a Pearson code [10]. Known constructions of Pearson codes read as follows.
• The set of all q-ary sequences of length n having at least one symbol ‘0’ and at least one symbol ‘1’. We denote this code by T(n, q). It is a member of the class of T-constrained codes [3], consisting of sequences in which T pre-determined reference symbols each appear at least once.
• The set of all q-ary sequences of length n having at least one symbol ‘0’, at least one symbol not equal to ‘0’, and having the greatest common divisor of the sequence symbols equal to ‘1’. We denote this code by P(n, q). It has been shown in [10] that this code is optimal in the sense that it has the largest number of codewords among all q-ary Pearson codes of length n.
Another code which is of interest, though not a Pearson code, is defined as follows.
• The set of all q-ary sequences of length n having at least one symbol ‘0’. We denote this code by Z(n, q). It is also a member of the class of T-constrained codes [3]. Due to the presence of the reference symbol ‘0’ it is resistant against offset mismatch.
Note that

T(n, q) ⊆ P(n, q) ⊆ Z(n, q). (6)

The cardinalities and redundancies (in the sense of (1)) of these three codes, as derived in [10], are given in Table I, where, for a positive integer d, the Möbius function µ(d) is defined [1, Chapter XVI] to be 0 if d is divisible by the square of a prime, and otherwise µ(d) = (−1)^k, where k is the number of distinct prime divisors of d.
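The Möbius-sum expression for the cardinality of P(n, q) in Table I can be checked against a direct count for small parameters. The snippet below (ours, for illustration) implements µ(d) by trial division and compares the formula with brute-force enumeration:

```python
from math import gcd
from itertools import product

def mobius(d):
    """Möbius function: 0 if d has a squared prime factor,
    else (-1)^k with k the number of distinct prime divisors."""
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0            # squared prime factor
            result = -result
        p += 1
    return -result if d > 1 else result

def card_P_formula(n, q):
    """|P(n, q)| via the Möbius sum of Table I."""
    return sum(mobius(d) * (((q - 1) // d + 1) ** n - ((q - 1) // d) ** n - 1)
               for d in range(1, q))

def card_P_brute(n, q):
    """Direct count: at least one '0' and GCD of the symbols equal to 1
    (which forces at least one nonzero symbol)."""
    count = 0
    for y in product(range(q), repeat=n):
        g = 0
        for s in y:
            g = gcd(g, s)
        if 0 in y and g == 1:
            count += 1
    return count

print(card_P_formula(4, 5), card_P_brute(4, 5))  # the two counts agree
```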
III. SYSTEMATIC CODING
As stated, the Pearson code P(n, q) is optimal in the sense of having the largest cardinality and thus the smallest redundancy. However, an easy coding procedure mapping information sequences to code sequences and vice versa is not evident at all. In this section, we propose easy coding procedures, possibly at the expense of a somewhat higher redundancy. We only use code sequences of a fixed length n, but for the information we consider both fixed-length and variable-length sequences. Hence, fixed-to-fixed (FF) as well as variable-to-fixed (VF) length coding schemes are proposed. For the source we make the common assumption that it is memoryless and that all q source symbols appear with equal probability 1/q. We start by introducing simple coding schemes resistant against offset mismatch only. Then we continue with similar procedures for Pearson coding.
A. Systematic Coding for Z(n, q)
The code Z(n, q) consists of all q-ary sequences of length n containing at least one symbol ‘0’. Its cardinality and redundancy are given in Table I. Here, we propose simple coding procedures systematically mapping q-ary information symbols to code sequences x = (x_1, x_2, ..., x_n) in Z(n, q). A well-known and extremely simple FF scheme, which we call ZFF(n, q), is to fill the code sequence x with n − 1 information symbols in the subsequence (x_1, x_2, ..., x_{n−1}) and to set
TABLE I
CARDINALITY AND REDUNDANCY OF THE CODES T(n, q), P(n, q), AND Z(n, q).

Code    | Cardinality                                              | Redundancy
T(n, q) | q^n − 2(q−1)^n + (q−2)^n                                 | −log_q(1 − 2((q−1)/q)^n + ((q−2)/q)^n) ≈ (2((q−1)/q)^n − ((q−2)/q)^n)/ln(q)
P(n, q) | ∑_{d=1}^{q−1} µ(d)((⌊(q−1)/d⌋ + 1)^n − ⌊(q−1)/d⌋^n − 1)  | −log_q(1 − ((q−1)/q)^n + O(((q+1)/(2q))^n)) ≈ (((q−1)/q)^n + O(((q+1)/(2q))^n))/ln(q)
        |   = q^n − (q−1)^n + O((q/2)^n) as n → ∞                  |
Z(n, q) | q^n − (q−1)^n                                            | −log_q(1 − ((q−1)/q)^n) ≈ ((q−1)/q)^n/ln(q)
x_n = 0. Due to the fixed last symbol, which acts as a reference, the redundancy of this method is 1.
Note that while the redundancy of Z(n, q) is decreasing in n, the redundancy of ZFF(n, q) remains 1. Next, we propose a systematic VF scheme, ZVF(n, q), for which the redundancy decreases in n:
1) Take n − 1 information symbols from the q-ary source and set these as (x_1, x_2, ..., x_{n−1}).
2) If x_i = 0 for at least one i with 1 ≤ i ≤ n − 1, then choose x_n to be a (new) information symbol; otherwise set x_n = 0.
It can easily be seen that the code sequence x is indeed in Z(n, q) and that the information symbols can be uniquely retrieved from x by checking whether it contains a zero in its first n − 1 positions: if ‘yes’, then all n code symbols are information symbols; if ‘no’, then only the first n − 1 code symbols are information symbols. Since the number of information symbols may vary from codeword to codeword (being either n or n − 1), while the length of the codewords is fixed at n, this can be considered a variable-to-fixed length coding procedure. All words in Z(n, q) can appear as code sequences, but not necessarily with equal probability. This leads to a redundancy as stated in the next theorem.
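The two-step procedure above admits a direct implementation. The sketch below (function names are ours, not from the paper) encodes a run of source symbols into one ZVF(n, q) codeword and inverts the mapping:

```python
def zvf_encode(source, n):
    """Encode one ZVF(n, q) codeword: take n-1 information symbols; the
    last position carries a further information symbol if a '0' already
    occurred, otherwise it is set to the reference value 0."""
    x = [next(source) for _ in range(n - 1)]
    x.append(next(source) if 0 in x else 0)
    return x

def zvf_decode(x):
    """Recover the information symbols: the last code symbol is
    informative iff the first n-1 positions contain a zero."""
    return list(x) if 0 in x[:-1] else list(x[:-1])

stream = iter([2, 3, 1, 0, 5])   # q-ary source symbols (q >= 6 here)
cw = zvf_encode(stream, 4)       # no zero among the first 3: pad with 0
print(cw, zvf_decode(cw))        # codeword [2, 3, 1, 0], info [2, 3, 1]
```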
Theorem 1. For a memoryless uniform q-ary source, the redundancy of coding scheme ZVF(n, q) is (1 − 1/q)^{n−1}.

Proof: This result can be obtained using (2), with the observations that (i) P_i = (1/q)^{n−1} for the (q − 1)^{n−1} code sequences x_i with no zeroes among the first n − 1 symbols and thus with last code symbol equal to zero, and (ii) P_i = (1/q)^n for the other q(q^{n−1} − (q − 1)^{n−1}) code sequences x_i with at least one zero among the first n − 1 symbols. Hence, the resulting redundancy is

n + ∑_{i=1}^{M} P_i log_q P_i
= n + (q − 1)^{n−1} (1/q)^{n−1} log_q (1/q)^{n−1} + q(q^{n−1} − (q − 1)^{n−1}) (1/q)^n log_q (1/q)^n
= (1 − 1/q)^{n−1}.
Another way to derive this result is to observe that the probability of the case that a sequence of n − 1 information symbols does not contain a zero, leading to one redundant symbol, is equal to (1 − 1/q)^{n−1}, while the opposite case leads to no redundancy at all. The weighted average

(1 − 1/q)^{n−1} × 1 + (1 − (1 − 1/q)^{n−1}) × 0 = (1 − 1/q)^{n−1}

then gives the redundancy of ZVF(n, q).

TABLE II
ZVF(3, 2) CODING FOR A MEMORYLESS UNIFORM BINARY SOURCE.

Info | Codeword ∈ Z(3, 2) | Probability | Redundancy
000  | 000                | 1/8         | 0
001  | 001                | 1/8         | 0
010  | 010                | 1/8         | 0
011  | 011                | 1/8         | 0
100  | 100                | 1/8         | 0
101  | 101                | 1/8         | 0
11   | 110                | 1/4         | 1
As an example, we consider scheme ZVF(3, 2) for a memoryless binary source producing zeroes and ones with equal probability. The seven codewords of Z(3, 2) are then used with probabilities as indicated in Table II, and thus the average redundancy is 1/4. This result can be obtained by applying (2), i.e., 3 + 6 × (1/8) log_2(1/8) + (1/4) log_2(1/4) = 1/4, or by directly applying Theorem 1, i.e., (1 − 1/2)^2 = 1/4. Note that achieving the somewhat lower redundancy 3 − log_2(7) ≈ 0.19 of the code Z(3, 2) as such would require all seven codewords to be used with probability 1/7, which does not naturally match the source statistics.
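Table II and both redundancy computations can be checked mechanically. The short script below (ours, for illustration) enumerates the equally likely 2-bit source prefixes of ZVF(3, 2), derives the codeword probabilities, and evaluates (2):

```python
from itertools import product
from math import log2

# Codeword probabilities of ZVF(3, 2): enumerate the 2-bit prefix; the
# third position is a fresh source bit if the prefix contains a '0',
# otherwise it is the forced reference symbol '0'.
probs = {}
for prefix in product((0, 1), repeat=2):
    if 0 in prefix:
        for b in (0, 1):                   # third symbol is information
            cw = prefix + (b,)
            probs[cw] = probs.get(cw, 0) + (1 / 2) ** 3
    else:                                  # prefix 11: codeword 110
        cw = prefix + (0,)
        probs[cw] = probs.get(cw, 0) + (1 / 2) ** 2

redundancy = 3 + sum(p * log2(p) for p in probs.values())
print(sorted(probs.items()))               # the 7 rows of Table II
print(redundancy)                          # 0.25, matching Theorem 1
```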
In conclusion, the redundancy of ZVF(n, q) is (1 − 1/q)^{n−1}, while the approximate redundancy of Z(n, q) is (1 − 1/q)^n / ln(q), as given in Table I. Hence, the redundancy of the proposed VF scheme ZVF(n, q) is roughly a factor

q ln(q)/(q − 1)

higher than the redundancy of Z(n, q). Note that this factor does not depend on the code length n, but only on the alphabet size q. For the binary case q = 2 this factor is 2 ln(2) ≈ 1.39, for the quaternary case q = 4 it is (4/3) ln(4) ≈ 1.85, while for large values of q it is roughly ln(q).
B. Systematic Pearson Coding
An extremely simple FF scheme, called TFF(n, q), resistant against both offset and gain mismatch, is to fill the first n − 2 positions in the code sequence x with information symbols and to reserve the last two symbols for reference purposes: x_{n−1} = 0 and x_n = 1. The resulting code sequence is in T(n, q) since it contains at least one ‘0’ and at least one ‘1’. The redundancy of this scheme is fixed at 2 symbols, but, again, it would be desirable to have a systematic scheme with a redundancy decreasing in the code length, preferably approaching zero for large values of n.
The first VF Pearson scheme we propose, called TVF(n, q), is similar to the VF scheme ZVF(n, q) presented in the previous subsection. It reads as follows.
1) Take n − 2 information symbols from the q-ary source and set these as (x_1, x_2, ..., x_{n−2}).
2) If x_i = 0 for at least one i with 1 ≤ i ≤ n − 2, then choose x_{n−1} to be a (new) information symbol; otherwise set x_{n−1} = 0.
3) If x_i = 1 for at least one i with 1 ≤ i ≤ n − 1, then choose x_n to be a (new) information symbol; otherwise set x_n = 1.
Since any code sequence obtained this way contains at least one ‘0’ and at least one ‘1’, it is a member of T(n, q). Also, the n − 2, n − 1, or n information symbols can easily be retrieved from the code sequence. The redundancy of this scheme is given in the next theorem.
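The three steps above translate directly into code. The sketch below (function names are ours) encodes one TVF(n, q) codeword and recovers the information symbols by re-checking the two conditions:

```python
def tvf_encode(source, n):
    """Encode one TVF(n, q) codeword: n-2 information symbols, then a
    position that is informative iff a '0' already occurred among them,
    then a position that is informative iff a '1' occurred among
    x_1, ..., x_{n-1}."""
    x = [next(source) for _ in range(n - 2)]
    x.append(next(source) if 0 in x else 0)   # step 2
    x.append(next(source) if 1 in x else 1)   # step 3
    return x

def tvf_decode(x):
    """Invert tvf_encode by re-evaluating the two conditions."""
    info = list(x[:-2])
    if 0 in x[:-2]:
        info.append(x[-2])
    if 1 in x[:-1]:
        info.append(x[-1])
    return info

stream = iter([3, 0, 2, 1, 7])
cw = tvf_encode(stream, 4)        # '0' occurs, '1' does not: one pad symbol
print(cw, tvf_decode(cw))         # codeword [3, 0, 2, 1], info [3, 0, 2]
```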
Theorem 2. For a memoryless uniform q-ary source, the redundancy of coding scheme TVF(n, q) is

((2q − 1)/q) ((q − 1)/q)^{n−2} + (1/q) ((q − 2)/q)^{n−2}.

Proof: The probability that a code sequence x has two redundant symbols is

(1 − 2/q)^{n−2}, (7)

which is the probability of having an information sequence of length n − 2 without zeroes and ones. Further, the probability that x has only a redundant symbol in position n − 1 is

(1 − 1/q)^{n−2} − (1 − 2/q)^{n−2}, (8)

which is the probability of having an information sequence of length n − 2 without zeroes but with at least one ‘1’. The probability that x has only a redundant symbol in position n is

((1 − 1/q)^{n−2} − (1 − 2/q)^{n−2})(1 − 1/q), (9)

where the first multiplicative term is the probability of having an information sequence of length n − 2 without ones but with at least one ‘0’ and the second multiplicative term is the probability that the information symbol in position n − 1 is not equal to ‘1’. Hence, the redundancy is two times the term in (7) plus the terms in (8) and (9), which gives the expression stated in the theorem.
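For small parameters, Theorem 2 can be verified by exhaustive enumeration. The check below (ours, for illustration) averages the number of redundant symbols over all equally likely source prefixes and compares the result with the closed-form expression:

```python
from itertools import product

def tvf_redundancy_formula(n, q):
    """Redundancy of TVF(n, q) as stated in Theorem 2."""
    return ((2 * q - 1) / q) * ((q - 1) / q) ** (n - 2) \
         + (1 / q) * ((q - 2) / q) ** (n - 2)

def tvf_redundancy_enum(n, q):
    """Expected number of redundant symbols, by enumerating all
    (n-1)-symbol source prefixes and applying the TVF rules."""
    total = 0
    for prefix in product(range(q), repeat=n - 1):
        head = prefix[:n - 2]          # symbols placed in x_1..x_{n-2}
        r = 0
        if 0 in head:
            x_nm1 = prefix[n - 2]      # position n-1 is informative
        else:
            x_nm1, r = 0, 1            # forced reference '0'
        if 1 not in head + (x_nm1,):
            r += 1                     # position n forced to '1'
        total += r
    return total / q ** (n - 1)

print(tvf_redundancy_formula(5, 4), tvf_redundancy_enum(5, 4))  # both 197/256
```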
The redundancy of TVF(n, q) as stated in Theorem 2 is, for large values of n, a factor

q(2q − 1) ln(q) / (2(q − 1)^2)

higher than the redundancy of T(n, q) as stated in Table I. For the binary case q = 2 this factor is 3 ln(2) ≈ 2.08, for the quaternary case q = 4 it is (14/9) ln(4) ≈ 2.16, while for large values of q it is roughly ln(q).
The second VF Pearson scheme we propose, called PVF(n, q), is based on relaxing the requirement of having both at least one ‘0’ and at least one ‘1’ in all code sequences to the requirement that all code sequences x contain at least one ‘0’ and have the greatest common divisor (GCD) of the x_i equal to one, i.e., GCD{x_1, ..., x_n} = 1. It reads as follows.
1) Take n − 2 information symbols from the q-ary source and set these as (x_1, x_2, ..., x_{n−2}).
2) If x_i = 0 for at least one i with 1 ≤ i ≤ n − 2, then choose x_{n−1} to be a (new) information symbol; otherwise set x_{n−1} = 0.
3) If GCD{x_1, ..., x_{n−1}} = 1, then choose x_n to be a (new) information symbol; otherwise set x_n = 1.
Any code sequence obtained in this way is a member of P(n, q). Again, the n − 2, n − 1, or n information symbols can easily be retrieved from the code sequence. For q = 2 and q = 3, the scheme PVF(n, q) is the same as TVF(n, q), since the condition that a sequence has a GCD of 1 is then equivalent to the condition that the sequence contains a ‘1’. Therefore, the redundancy is as stated in Theorem 2 in these cases. However, this is not the case if q ≥ 4, for which we give the redundancy of PVF(n, q) in the next theorem. First, we present a lemma, whose proof is summarized due to lack of space.
Lemma 1. For any fixed q ≥ 4, among the q^n q-ary sequences y of length n, there are
1) q^n − (q − 1)^n + O((q/2)^n) sequences with GCD(y) = 1 containing at least one ‘0’,
2) O((q/2)^n) sequences with GCD(y) ≠ 1 containing at least one ‘0’,
3) (q − 1)^n + O(((q − 1)/2)^n) sequences with GCD(y) = 1 containing no symbol ‘0’,
4) O(((q − 1)/2)^n) sequences with GCD(y) ≠ 1 containing no symbol ‘0’.

Proof: The first result was proved in [10]. Combining this with the fact that the number of q-ary sequences of length n containing at least one ‘0’ is q^n − (q − 1)^n gives the second result.
Using a well-known counting argument from, e.g., Section 16.5 in [1], it follows that the number of sequences of length n with symbols from {1, 2, ..., q − 1} and GCD equal to 1 is

∑_{d=1}^{q−1} µ(d) ⌊(q − 1)/d⌋^n = (q − 1)^n + O(((q − 1)/2)^n),

where µ(d) is the Möbius function already mentioned at the end of Subsection II-C. This proves the third result, which combined with the fact that the number of q-ary sequences of length n containing no symbol ‘0’ is (q − 1)^n also gives the fourth result.
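The Möbius-sum counting step in the proof of the third result is easy to verify exhaustively for small parameters. The snippet below (ours, for illustration) counts zero-free sequences with GCD 1 both directly and via the Möbius sum:

```python
from math import gcd
from itertools import product

def mobius(d):
    """Möbius function: 0 if d has a squared prime factor, else (-1)^k."""
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0
            result = -result
        p += 1
    return -result if d > 1 else result

def count_gcd1_no_zero(n, q):
    """Brute-force count of length-n sequences over {1, ..., q-1}
    whose symbols have GCD equal to 1."""
    count = 0
    for y in product(range(1, q), repeat=n):
        g = 0
        for s in y:
            g = gcd(g, s)
        if g == 1:
            count += 1
    return count

def count_mobius(n, q):
    """The same count via the Möbius sum used in the proof of Lemma 1."""
    return sum(mobius(d) * ((q - 1) // d) ** n for d in range(1, q))

print(count_gcd1_no_zero(4, 6), count_mobius(4, 6))  # the counts agree
```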
Theorem 3. For a memoryless uniform q-ary source, with fixed q ≥ 4, the redundancy of coding scheme PVF(n, q) is

((q − 1)/q)^{n−2} + O(((q/2)/q)^{n−2}).

Proof: The probability that a code sequence x has two redundant symbols is

O((((q − 1)/2)/q)^{n−2}), (10)

which is the probability of having an information sequence of length n − 2 without zeroes and with a GCD unequal to 1, as follows from result 4) in Lemma 1. Further, the probability that x has only a redundant symbol in position n − 1 is

((q − 1)/q)^{n−2} + O((((q − 1)/2)/q)^{n−2}), (11)

which is the probability of having an information sequence of length n − 2 without zeroes but with a GCD equal to 1, as follows from result 3) in Lemma 1. The probability that x has only a redundant symbol in position n is

O(((q/2)/q)^{n−2}), (12)

as follows from result 2) in Lemma 1. Hence, the redundancy is two times the term in (10) plus the terms in (11) and (12), which gives the expression stated in the theorem.
The redundancy of PVF(n, q) as stated in Theorem 3 is, for fixed q ≥ 4 and large values of n, a factor

(q/(q − 1))^2 ln(q)

higher than the redundancy of P(n, q) as stated in Table I. For the quaternary case q = 4 this factor is (16/9) ln(4) ≈ 2.46, while for large values of q it is roughly ln(q). Also, note that, again for fixed q ≥ 4 and large values of n, the redundancy of PVF(n, q) is a factor q/(q − 1) higher than the redundancy of ZVF(n, q).
IV. CONCLUSIONS
We have presented simple systematic q-ary coding schemes
which are resistant against offset as well as gain mismatch or
against offset mismatch only. Both coding for fixed and coding
for variable length source sequences have been considered,
resulting in FF and VF schemes of fixed code block length
n, respectively. We analyzed the redundancy of the proposed
schemes for memoryless uniform sources. The major findings
are summarized in Table III.
The redundancy of the Pearson schemes TVF(n, q) and PVF(n, q), resistant against offset as well as gain mismatch, approaches zero for large n, as desired. The redundancy for
TABLE III
APPROXIMATE REDUNDANCY OF THE CODES T(n, q), P(n, q), AND Z(n, q) AND THE RELATED FF AND VF SCHEMES, FOR LARGE n AND FIXED q ≥ 4.

Code    | Redundancy          | Red. FF | Red. VF
T(n, q) | 2((q−1)/q)^n/ln(q)  | 2       | ((2q−1)/q)((q−1)/q)^{n−2}
P(n, q) | ((q−1)/q)^n/ln(q)   | –       | ((q−1)/q)^{n−2}
Z(n, q) | ((q−1)/q)^n/ln(q)   | 1       | ((q−1)/q)^{n−1}
both schemes is equal if q = 2, 3, and the redundancy of the former scheme exceeds that of the latter by a factor of (2q − 1)/q if q ≥ 4. Furthermore, the redundancy of the Pearson scheme PVF(n, q) exceeds the redundancy of the ZVF(n, q) scheme, which offers immunity to offset mismatch only, by a factor of (2q − 1)/(q − 1) if q = 2, 3 and by a factor of only q/(q − 1) if q ≥ 4. The schemes TFF(n, q) and ZFF(n, q) offer extreme simplicity, using fixed training symbols in fixed positions, at the price of a redundancy which does not decrease with increasing n.
Finally, the redundancy of the presented TVF(n, q), PVF(n, q), and ZVF(n, q) schemes is slightly higher than the redundancy of their T(n, q), P(n, q), and Z(n, q) counterparts. However, note that the low redundancies of these codes as such are only achieved under the assumption that all their codewords are used with equal probability, which is hard to realize for memoryless uniform and other practical sources. In contrast, our VF schemes come with natural, simple coding mechanisms.
REFERENCES
[1] G. H. Hardy and E. M. Wright, An Introduction to the Theory of
Numbers (Fifth Edition), Oxford University Press, Oxford, 1979.
[2] K. A. S. Immink, “Coding Schemes for Multi-Level Channels with
Unknown Gain and/or Offset Using Balance and Energy constraints”,
IEEE Int. Symposium on Inform. Theory (ISIT), Istanbul, Turkey, July
2013.
[3] K. A. S. Immink, “Coding Schemes for Multi-Level Flash Memories that
are Intrinsically Resistant Against Unknown Gain and/or Offset Using
Reference Symbols”, Electronics Letters, vol. 50, pp. 20–22, 2014.
[4] K. A. S. Immink and J. H. Weber, “Very Efficient Balanced Codes”,
IEEE Journal on Selected Areas in Communications, vol. 28, pp. 188–
192, 2010.
[5] K. A. S. Immink and J. H. Weber, “Minimum Pearson Distance
Detection for Multi-Level Channels with Gain and/or Offset Mismatch”,
IEEE Trans. Inform. Theory, vol. 60, pp. 5966–5974, Oct. 2014.
[6] K. A. S. Immink and J. H. Weber, “Hybrid Minimum Pearson and
Euclidean Distance Detection”, IEEE Trans. Commun., vol. 63, no. 9,
pp. 3290–3298, Sept. 2015.
[7] A. Jiang, R. Mateescu, M. Schwartz, and J. Bruck, “Rank Modulation
for Flash Memories”, IEEE Trans. Inform. Theory, vol. 55, no. 6, pp.
2659–2673, June 2009.
[8] A. M. Mood, F. A. Graybill, and D. C. Boes, Introduction to the Theory
of Statistics (Third Edition), McGraw-Hill, 1974.
[9] F. Sala, K. A. S. Immink, and L. Dolecek, “Error Control Schemes
for Modern Flash Memories: Solutions for Flash Deficiencies”, IEEE
Consumer Electronics Magazine, vol. 4, no. 1, pp. 66–73, Jan. 2015.
[10] J. H. Weber, K. A. S. Immink, and S. R. Blackburn, “Pearson Codes”,
IEEE Trans. Inform. Theory, vol. 62, no. 1, pp. 131–135, Jan. 2016.
[11] H. Zhou, A. Jiang, and J. Bruck, “Error-correcting schemes with
dynamic thresholds in nonvolatile memories”, IEEE Int. Symposium on
Inform. Theory (ISIT), St. Petersburg, Russia, July 2011.
... The use of the Pearson distance requires that the set of codewords satisfies several specific properties. Such sets are called Pearson codes, which have attracted a lot of interest [51][52][53][54][55]. In [51], optimal Pearson codes are presented, in the sense of having the largest number of codewords and thus minimum redundancy among all q-ary Pearson codes of fixed length n. ...
... Properties of binary Pearson codes are discussed in [52,53], where the Pearson noise distance is compared to the well-known Hamming distance. A simple systematic Pearson coding scheme, that maps sequences of information symbols generated by a q-ary source to q-ary code sequences, is proposed in [54]. Construction of a particular kind of Pearson codes, i.e., T-constrained codes [49], using a finite state machine, is introduced in [55]. ...
... This simple codebook is used to demonstrate some important WER characteristics. Code book construction as such is referred to [54]. ...
... Here, we have an additional challenge, as just discussed at the end of previous section. A first priority, when decoding according to (4), is that δ * min > 0. Hence, the main focus in literature so far, see, e.g., [14], [15], has been on avoiding codeword pairs (u, v) with δ * (u, v) = 0. For the binary case, this leads to the code {0, 1} n \ {1} of size 2 n − 1. ...
... However, more in-depth research is required to check their actual performance in case of low or moderate SNR. As an example case, we investigate the performance of (a coset of) the [15,11,3] Hamming code H 4 , as presented in Subsection IV-C, in various scenarios. Simulation results are shown in Figures 1-4. ...
Article
Decoders minimizing the Euclidean distance between the received word and the candidate codewords are known to be optimal for channels suffering from Gaussian noise. However, when the stored or transmitted signals are also corrupted by an unknown offset, other decoders may perform better. In particular, applying the Euclidean distance on normalized words makes the decoding result independent of the offset. The use of this distance measure calls for alternative code design criteria in order to get good performance in the presence of both noise and offset. In this context, various adapted versions of classical binary block codes are proposed, such as (i) cosets of linear codes, (ii) (unions of) constant weight codes, and (iii) unordered codes. It is shown that considerable performance improvements can be achieved, particularly when the offset is large compared to the noise.
... Some of the most widely-recognized constraints include (d, k) RLL constraints that bound the number of logic zeros between consecutive logic ones to be between d and k, and DC-free constraints that bound the running digital sum (RDS) value of the encoded sequence, where RDS is the accumulation of encoded bit weights in a sequence given that a logic one has weight +1 and a logic zero has weight −1 [1]. Some other types of constraints include the Pearson constraint and constraints that mitigate inter-cell interference in flash memories [10], [13], [14], [17], [19]- [22]. CS encoders can be described by finite state machines (FSMs) consisting of states, edges and labels. ...
... For example, with a rate 0.5 (n = 1024, k = 512) ECC, one epoch consists of 2 512 possibilities of codewords of length 1024, which results in very large complexity and makes it difficult to train and implement DNN-based decoding in practical systems [28], [29], [31], [32]. However, we note that in FL CS decoding, this problem does not exist since CS source words are typically considerably shorter, possibly only up to a few dozen symbols [1], [6]- [17]. This property fits deep learning based-decoding well. ...
Preprint
Full-text available
Constrained sequence (CS) codes, including fixed-length CS codes and variable-length CS codes, have been widely used in modern wireless communication and data storage systems. Sequences encoded with constrained sequence codes satisfy constraints imposed by the physical channel to enable efficient and reliable transmission of coded symbols. In this paper, we propose using deep learning approaches to decode fixed-length and variable-length CS codes. Traditional encoding and decoding of fixed-length CS codes rely on look-up tables (LUTs), which is prone to errors that occur during transmission. We introduce fixed-length constrained sequence decoding based on multiple layer perception (MLP) networks and convolutional neural networks (CNNs), and demonstrate that we are able to achieve low bit error rates that are close to maximum a posteriori probability (MAP) decoding as well as improve the system throughput. Further, implementation of capacity-achieving fixed-length codes, where the complexity is prohibitively high with LUT decoding, becomes practical with deep learning-based decoding. We then consider CNN-aided decoding of variable-length CS codes. Different from conventional decoding where the received sequence is processed bit-by-bit, we propose using CNNs to perform one-shot batch-processing of variable-length CS codes such that an entire batch is decoded at once, which improves the system throughput. Moreover, since the CNNs can exploit global information with batch-processing instead of only making use of local information as in conventional bit-by-bit processing, the error rates can be reduced. We present simulation results that show excellent performance with both fixed-length and variable-length CS codes that are used in the frontiers of wireless communication systems.
... Codebook construction as such is beyond the scope of this paper. The interested reader is referred to [14]. ...
... where the fourth equality follows from r i = x i + v i + b, the first inequality follows from (14) and the last inequality from the fact that |v i + b| ≤ |v i | + |b| < α + β for all i. Hence, if decoding is based on minimizing (2), the transmitted codeword is always chosen as the decoding result, leading to a WER equal to zero. ...
Conference Paper
Full-text available
Data storage systems may not only be disturbed by noise. In some cases, the error performance can also be seriously degraded by offset mismatch. Here, channels are considered for which both the noise and offset are bounded. For such channels, Euclidean distance-based decoding, Pearson distance-based decoding, and Maximum Likelihood decoding are considered. In particular, for each of these decoders, bounds are determined on the magnitudes of the noise and offset intervals which lead to a word error rate equal to zero. Case studies with simulation results are presented confirming the findings.
... By invoking the new coding method, we are able to improve the rate efficiency to R 1 (2 q −1) = (q−2 −q+1 )/C or R 1 (2 q −2) = (q − 2 −q+2 )/C , see (6). For the VF code we find the same rate efficiency results, namely R vl (2 q − 1) = R 1 (2 q − 1) and R vl (2 q − 2) = R 1 (2 q − 2), respectively, which accords with the results presented in [17,18,20]. ...
Article
We present coding methods for generating ℓ-symbol constrained codewords taken from a set, S, of allowed codewords. In standard practice, the size of the set S, denoted by M = |S|, is truncated to an integer power of two, which may lead to a serious waste of capacity. We present an efficient and low-complexity coding method for avoiding the truncation loss, where the encoding is accomplished in two steps: first, a series of binary input (user) data is translated into a series of M-ary symbols in the alphabet M = {0, ..., M − 1}. Then, in the second step, the M-ary symbols are translated into a series of admissible ℓ-symbol words in S by using a small look-up table. The presented construction of Pearson codes and fixed-weight codes offers a rate close to capacity. For example, the presented 255B320B balanced code, where 255 source bits are translated into 32 10-bit balanced codewords, has a rate 0.1% below capacity.
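The two-step method above can be sketched in miniature. The following is an illustrative toy analogue of the 255B320B idea, not the cited construction itself: the function names and the small parameters (7 source bits into 3 balanced 4-bit words, M = C(4, 2) = 6, 6^3 = 216 ≥ 2^7 = 128) are assumptions chosen for readability.

```python
from itertools import combinations

def balanced_words(n):
    """All binary words of length n with equally many zeros and ones."""
    words = []
    for ones in combinations(range(n), n // 2):
        w = [0] * n
        for i in ones:
            w[i] = 1
        words.append(tuple(w))
    return words

def encode(bits, n, num_words):
    """Two-step encoding sketch: source bits -> fixed-length base-M expansion
    -> balanced words via a small look-up table, where M = |S| is the full
    number of balanced length-n words (no truncation of M to a power of two)."""
    lut = balanced_words(n)          # the small look-up table
    M = len(lut)
    value = int(bits, 2)
    digits = []
    for _ in range(num_words):       # fixed number of base-M digits
        digits.append(value % M)
        value //= M
    assert value == 0, "source block too long for these parameters"
    return [lut[d] for d in digits]

# 7 source bits -> 3 balanced 4-bit words.
codewords = encode("1011010", 4, 3)
print(codewords)  # -> [(1, 1, 0, 0), (0, 1, 1, 0), (1, 0, 0, 1)]
```

Decoding reverses the steps: look up each word's index in the table and re-assemble the base-M integer.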
... The Pearson constraint that is immune to unknown channel gain and offset can be regarded as a type of T-constrained code where each of the T pre-defined symbols appears at least once in every codeword [45]. As discussed in [14], [18], [24], a known construction for q-ary Pearson codes is to ensure that every q-ary codeword has at least one symbol "0" and one symbol "1". ...
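The construction just described is easy to enumerate for small parameters. A minimal sketch (helper name is an assumption, not from the cited works): keep exactly those q-ary words that contain both a "0" and a "1", and check the count against inclusion-exclusion.

```python
from itertools import product

def is_pearson_candidate(word):
    """Known T-constrained construction for Pearson codes: the word must
    contain at least one symbol 0 and at least one symbol 1."""
    return 0 in word and 1 in word

q, n = 3, 4
codebook = [w for w in product(range(q), repeat=n) if is_pearson_candidate(w)]

# Inclusion-exclusion: q^n - 2*(q-1)^n + (q-2)^n = 81 - 32 + 1 = 50
print(len(codebook))  # -> 50
```

The small gap between 50 and q^n = 81 illustrates the low redundancy of this constraint.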
Article
We study the ability of recently developed variable-length constrained sequence codes to determine codeword boundaries in the received sequence upon initial receipt of the sequence and if errors in the received sequence cause synchronization to be lost.We first investigate construction of these codes based on the finite state machine description of a given constraint, and develop new construction criteria to achieve high synchronization probabilities. Given these criteria, we propose a guided partial extension algorithm to construct variable-length constrained sequence codes with high synchronization probabilities. With this algorithm we construct new codes and determine the number of codewords and coded bits that are needed to recover synchronization once synchronization is lost.We consider a large variety of constraints including the runlength limited (RLL) constraint, the DC-free constraint, the Pearson constraint and constraints for inter-cell interference mitigation in flash memories. Simulation results show that the codes we construct exhibit excellent synchronization properties, often resynchronizing within a few bits.
... The Pearson correlation coefficient has the ability to overcome unknown gain and offset mismatch present in the storage or communication channel. In order to do so, some specific properties have to be considered while designing the codebook at the transmitter [2]. Such special codebook designs are called T-constrained codes. ...
Article
The wireless fading channel creates unknown impairments on the transmitted signal. Multiple antenna systems have been well studied for providing diversity and/or coding gain over fading channels. In this paper, a new, simpler form of signaling scheme is developed for Multiple Input Multiple Output (MIMO) channels. At the receiver, a minimum Pearson distance based detector (MPD) is employed with Slepian's algorithm, which uses a unique codebook design to estimate the transmitted codeword with fewer computations. The scheme is developed for BPSK/QPSK 2×2 and 4×4 MIMO systems over a time-invariant Rayleigh fading channel. Simulations show significant gains in terms of Bit Error Rate (BER) performance and computational complexity when compared with conventional space-time block coded MIMO systems.
Article
In many channels, the transmitted signals do not only face noise, but offset mismatch as well. In the prior art, maximum likelihood (ML) decision criteria have already been developed for noisy channels suffering from signal-independent offset. In this paper, such an ML criterion is considered for the case of binary signals suffering from Gaussian noise and signal-dependent offset. The signal dependency of the offset signifies that it may differ for distinct signal levels, i.e., the offset experienced by the zeroes in a transmitted codeword is not necessarily the same as the offset for the ones. Besides the ML criterion itself, an option to reduce the complexity is also considered. Further, a brief performance analysis is provided, confirming the superiority of the newly developed ML decoder over classical decoders based on the Euclidean or Pearson distances.
Article
This paper studies a deep learning (DL) framework to solve distributed non-convex constrained optimizations in wireless networks where multiple computing nodes, interconnected via backhaul links, desire to determine an efficient assignment of their states based on local observations. Two different configurations are considered: First, an infinite-capacity backhaul enables nodes to communicate in a lossless way, thereby obtaining the solution by centralized computations. Second, a practical finite-capacity backhaul leads to the deployment of distributed solvers equipped with quantizers for communication through the capacity-limited backhaul. The distributed nature and the non-convexity of the optimizations render the identification of the solution unwieldy. To handle them, deep neural networks (DNNs) are introduced to approximate an unknown computation for the solution accurately. In consequence, the original problems are transformed into training tasks of the DNNs subject to non-convex constraints, where existing DL libraries fail to extend straightforwardly. A constrained training strategy is developed based on the primal-dual method. For distributed implementation, a novel binarization technique at the output layer is developed for quantization at each node. Our proposed distributed DL framework is examined in various network configurations of wireless resource management. Numerical results verify the effectiveness of our proposed approach over existing optimization techniques.
Article
We consider the construction of capacity-approaching variable-length constrained sequence codes based on multi-state encoders that permit state-independent decoding. Based on the finite state machine description of the constraint, we first select the principal states and establish the minimal sets. By performing partial extensions and normalized geometric Huffman coding, efficient codebooks that enable state-independent decoding are obtained. We then extend this multi-state approach to a construction technique based on n-step FSMs. We demonstrate the usefulness of this approach by constructing capacity-approaching variable-length constrained sequence codes with improved efficiency and/or reduced implementation complexity to satisfy a variety of constraints, including the runlength-limited (RLL) constraint, the DC-free constraint, and the DC-free RLL constraint, with an emphasis on their application in visible light communications.
Article
The Pearson distance has been advocated for improving the error performance of noisy channels with unknown gain and offset. The Pearson distance can only fruitfully be used for sets of q-ary codewords, called Pearson codes, that satisfy specific properties. We will analyze constructions and properties of optimal Pearson codes. We will compare the redundancy of optimal Pearson codes with the redundancy of prior art T-constrained codes, which consist of q-ary sequences in which T pre-determined reference symbols appear at least once. In particular, it will be shown that for q ≤ 3 the 2-constrained codes are optimal Pearson codes, while for q ≥ 4 these codes are not optimal.
Conference Paper
We will present coding techniques for transmission and storage channels with unknown gain and/or offset. It will be shown that a codebook of length-n q-ary codewords, S, in which all codewords have equal balance and energy, shows an intrinsic resistance against unknown gain and/or offset. Generating functions for evaluating the size of S will be presented. We will present an approximate expression for the code redundancy for asymptotically large values of n.
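The equal-balance, equal-energy idea can be made concrete by brute-force enumeration for small q and n. This is a sketch under assumed conventions (balance = symbol sum, energy = sum of squares; the helper name is hypothetical), not the generating-function method of the cited work.

```python
from itertools import product
from collections import Counter

def balance_energy_classes(q, n):
    """Group all q-ary length-n words by (balance, energy) = (sum, sum of
    squares); a codebook drawn from a single class has the equal-balance,
    equal-energy property described above."""
    classes = Counter()
    for w in product(range(q), repeat=n):
        classes[(sum(w), sum(x * x for x in w))] += 1
    return classes

classes = balance_energy_classes(3, 4)
# The largest class gives the biggest codebook with this intrinsic resistance.
print(max(classes.values()))  # -> 12
```

For q = 3, n = 4 the largest class corresponds to the multiset {0, 0, 1, 2} and its 4!/2! = 12 permutations.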
Article
The performance of certain transmission and storage channels, such as optical data storage and nonvolatile memory (flash), is seriously hampered by the phenomena of unknown offset (drift) or gain. We will show that minimum Pearson distance (MPD) detection, unlike conventional minimum Euclidean distance detection, is immune to offset and/or gain mismatch. MPD detection is used in conjunction with T-constrained codes that consist of q-ary codewords, where in each codeword T reference symbols appear at least once. We will analyze the redundancy of the new q-ary coding technique and compute the error performance of MPD detection in the presence of additive noise. Implementation issues of MPD detection will be discussed, and results of simulations will be given.
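The immunity of MPD detection to gain and offset mismatch is easy to demonstrate numerically. A minimal sketch (the tiny codebook and function names are illustrative assumptions, not taken from the cited work): the Pearson distance is computed from centered, normalized vectors, so scaling and shifting the received word leaves it unchanged.

```python
def pearson_distance(r, v):
    """Pearson distance 1 - rho(r, v) between received vector r and codeword v."""
    n = len(r)
    rm = sum(r) / n
    vm = sum(v) / n
    num = sum((ri - rm) * (vi - vm) for ri, vi in zip(r, v))
    den = (sum((ri - rm) ** 2 for ri in r) *
           sum((vi - vm) ** 2 for vi in v)) ** 0.5
    return 1.0 - num / den

def mpd_detect(r, codebook):
    """Minimum Pearson distance detection: pick the closest codeword."""
    return min(codebook, key=lambda v: pearson_distance(r, v))

# A small Pearson-style codebook: no all-equal word, and no codeword is a
# positive-gain-plus-offset version of another.
codebook = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1)]

sent = (0, 1, 0)
received = tuple(2.5 * s + 0.7 for s in sent)  # unknown gain 2.5, offset 0.7
print(mpd_detect(received, codebook))  # -> (0, 1, 0)
```

With minimum Euclidean distance detection, the same gain and offset would pull the received word away from every codeword; the Pearson distance ignores both.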
Article
We explore a novel data representation scheme for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. The only allowed charge-placement mechanism is a "push-to-the-top" operation which takes a single cell of the set and makes it the top-charged cell. The resulting scheme eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. We present unrestricted Gray codes spanning all possible n-cell states and using only "push-to-the-top" operations, and also construct balanced Gray codes. We also investigate optimal rewriting schemes for translating an arbitrary input alphabet into n-cell states which minimize the number of programming operations.
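The "push-to-the-top" operation on a rank-modulation state can be sketched in a few lines (representation and function name are assumptions: the state lists cell indices from highest to lowest charge).

```python
def push_to_top(state, cell):
    """Rank-modulation 'push-to-the-top': recharge the given cell so it
    becomes the top-charged one; state lists cells from top to bottom."""
    s = list(state)
    s.remove(cell)       # take the cell out of its current rank
    return [cell] + s    # and place it above all others

# State (2, 0, 1): cell 2 highest, cell 1 lowest. Push cell 1 to the top:
print(push_to_top([2, 0, 1], 1))  # -> [1, 2, 0]
```

Note that the relative order of all other cells is preserved, which is why no discrete target levels (and hence no overshoot errors) are needed.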
Conference Paper
Predetermined fixed thresholds are commonly used in nonvolatile memories for reading binary sequences, but they usually result in significant asymmetric errors after a long duration, due to voltage or resistance drift. This motivates us to construct error-correcting schemes with dynamic reading thresholds, so that the asymmetric component of the errors is minimized. In this paper, we discuss how to select dynamic reading thresholds without knowing cell level distributions, and present several error-correcting schemes. Analysis based on Gaussian noise models reveals that bit error probabilities can be significantly reduced by using dynamic thresholds instead of fixed thresholds, hence leading to a higher information rate.
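One simple way to pick a reading threshold without knowing the cell level distributions is to derive it from the read values themselves. The sketch below (a hypothetical two-means split, not one of the cited schemes) shows how such a data-driven threshold tracks a common drift that would defeat a fixed threshold.

```python
def dynamic_threshold(readings):
    """Pick a reading threshold from the observed cell levels themselves
    (simple two-means split), so a common drift does not bias the decision."""
    t = sum(readings) / len(readings)        # initial guess: overall mean
    for _ in range(20):                      # Lloyd-style refinement
        lo = [r for r in readings if r < t]
        hi = [r for r in readings if r >= t]
        if not lo or not hi:
            break
        t_new = (sum(lo) / len(lo) + sum(hi) / len(hi)) / 2
        if abs(t_new - t) < 1e-9:
            break
        t = t_new
    return t

# All levels drifted upward by 0.8: a fixed threshold at 0.5 would read
# every cell as 1, while the dynamic threshold still separates the groups.
readings = [0.1 + 0.8, 0.2 + 0.8, 1.1 + 0.8, 0.9 + 0.8]
t = dynamic_threshold(readings)
bits = [int(r >= t) for r in readings]
print(bits)  # -> [0, 0, 1, 1]
```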
Article
The prior art construction of sets of balanced codewords by Knuth is attractive for its simplicity and absence of look-up tables, but the redundancy of the balanced codes generated by Knuth's algorithm falls a factor of two short with respect to the minimum required. We present a new construction, which is simple, does not use look-up tables, and is less redundant than Knuth's construction. In the new construction, the user word is modified in the same way as in Knuth's construction, that is by inverting a segment of user symbols. The prefix that indicates which segment has been inverted, however, is encoded in a different, more efficient, way.
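The shared balancing step of Knuth's construction and the new one can be sketched directly: invert a prefix of the user word until it is balanced. Such an index always exists for even length, since the disparity changes by ±2 with each extra inverted bit. (The function name and the linear search are illustrative assumptions; the cited papers differ in how the prefix index is encoded, not in this step.)

```python
def knuth_balance(word):
    """Knuth-style balancing: find an index k such that inverting the first
    k bits yields a word with equally many zeros and ones."""
    n = len(word)
    for k in range(n + 1):
        w = [1 - b for b in word[:k]] + list(word[k:])
        if sum(w) == n // 2:
            return k, w
    raise AssertionError("unreachable: some k always works for even n")

k, balanced = knuth_balance([1, 1, 1, 1, 0, 1])
print(k, balanced)  # -> 2 [0, 0, 1, 1, 0, 1]
```

The decoder only needs k to re-invert the segment, which is why the efficiency of the whole scheme hinges on how compactly k is represented.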
Article
Flash, already one of the dominant forms of data storage for mobile consumer devices, such as smartphones and media players, is experiencing explosive growth in cloud and enterprise applications. Flash devices offer very high access speeds, low power consumption, and physical resiliency. Our goal in this article is to provide a high-level overview of error correction for Flash. We will begin by discussing Flash functionality and design. We will introduce the nature of Flash deficiencies. Afterwards, we describe the basics of error-correcting codes (ECCs). We discuss BCH and LDPC codes in particular and wrap up the article with more directions for Flash coding.
Article
The reliability of mass storage systems, such as optical data recording and non-volatile memory (Flash), is seriously hampered by uncertainty of the actual value of the offset (drift) or gain (amplitude) of the retrieved signal. The recently introduced minimum Pearson distance detection is immune to unknown offset or gain, but this virtue comes at the cost of a lessened noise margin at nominal channel conditions. We will present a novel hybrid detection method, where we combine the outputs of the minimum Euclidean distance and Pearson distance detectors so that we may trade detection robustness versus noise margin. We will compute the error performance of hybrid detection in the presence of unknown channel mismatch and additive noise.
Article
Coding schemes for storage channels, such as optical recording and non-volatile memory (Flash), with unknown gain and offset are presented. In its simplest case, the coding schemes guarantee that a symbol with a minimum value (floor) and a symbol with a maximum (ceiling) value are always present in a codeword so that the detection system can estimate the momentary gain and the offset. The results of the computer simulations show the performance of the new coding and detection methods in the presence of additive noise.
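In the simplest case described above, the guaranteed floor and ceiling symbols let the detector estimate the momentary gain and offset directly from the extremes of the received word. A minimal sketch (function name and parameters are illustrative assumptions):

```python
def estimate_gain_offset(received, s_min=0, s_max=3):
    """Estimate momentary gain a and offset b from a received codeword that
    is guaranteed to contain the floor symbol s_min and ceiling symbol s_max,
    assuming the channel maps symbol s to a*s + b."""
    r_min, r_max = min(received), max(received)
    a = (r_max - r_min) / (s_max - s_min)
    b = r_min - a * s_min
    return a, b

sent = [0, 2, 3, 1]                       # contains floor 0 and ceiling 3
received = [1.5 * s + 0.4 for s in sent]  # unknown gain 1.5, offset 0.4
a, b = estimate_gain_offset(received)
print(round(a, 6), round(b, 6))  # -> 1.5 0.4
```

With a and b recovered, the detector can invert the channel, (r − b)/a, and apply conventional minimum Euclidean distance detection.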