An Unsupervised Learning Approach for Data
Detection in the Presence of Channel Mismatch and
Additive Noise
Kees A. Schouhamer Immink and Kui Cai
Abstract—We investigate machine learning based on clustering techniques that are suitable for the detection of encoded strings of q-ary symbols transmitted over a noisy channel with partially unknown characteristics. We consider the detection of the q-ary data as a classification problem, where objects are recognized from a corrupted vector obtained by an unknown corruption process. We first evaluate the error performance of the k-means clustering technique without constrained coding. Second, we apply constrained codes that create an environment that improves the detection reliability and allows a wider range of channel uncertainties.
Index Terms—Constrained coding, storage systems, non-volatile memories, Pearson distance, Euclidean distance, channel mismatch, Pearson code, k-means clustering, learning systems
I. INTRODUCTION
We present new techniques for the detection of q-ary data in the face of additive noise and unknown channel corruption caused by a slow change (drift) of some of the channel parameters. The new detection methods are based on the teachings of cluster analysis. An n-symbol q-ary word (x_1, ..., x_n), x_i ∈ {0, ..., q − 1}, is transmitted or stored, and the received word (r_1, ..., r_n) is corrupted by additive noise, intersymbol interference, and other unknown nuisances. Retrieving a replica of the original q-ary data is seen as the classification function (r_1, ..., r_n) → {0, ..., q − 1}. Machine learning and deep learning are techniques that are very suitable for classification tasks. The detection function is considered here as a classification problem, or object recognition, which is targeted by cluster analysis. Cluster analysis is an example of unsupervised machine learning, a common technique for statistical data analysis used in many fields, including pattern recognition, image analysis, information retrieval, data compression, and computer graphics [1].
We investigate a typical competitive learning algorithm, named the k-means clustering technique, which is an iterative process that implements the detection function given initial values of some basic parameters. The aim of the learning algorithm is to map n received symbols into k clusters, where in the case at hand the k clusters are associated with the q symbol values. The detector is ignorant of the number of different symbol values in the sent codeword, that is, k ≤ q.

Kees A. Schouhamer Immink is with Turing Machines Inc, Willemskade 15d, 3016 DK Rotterdam, The Netherlands. E-mail: immink@turing-machines.com.
Kui Cai is with Singapore University of Technology and Design (SUTD), 8 Somapah Rd, 487372, Singapore. E-mail: cai_kui@sutd.edu.sg.
This work is supported by a Singapore Agency of Science and Technology (A*Star) PSF research grant and SUTD-ZJU grant ZJURP1500102.
A major challenge in cluster analysis is the estimation of the optimal number of 'clusters' [2], [3]. The k-means clustering technique does not easily allow estimation of the number of (different) clusters, and therefore other means are needed to estimate the number of clusters, k. Due to the presence of vexatious codewords and channel distortion, the iteration process may not always converge to a proper solution. To solve this issue, we define constrained coding that may assist in creating an environment where the k-means clustering technique is a reliable detection technique, and the estimation of the number of clusters can be avoided.
In mass data storage devices, the user data are translated
into physical features that can be either electronic, magnetic,
optical, or of other nature [4]. Due to process variations,
the magnitude of the physical effect may deviate from the
nominal values, which may affect the reliable read-out of the
data. We may distinguish between two stochastic effects that
determine the process variations. On the one hand, we have
the unpredictable stochastic process variations, and on the
other hand, we may observe long-term effects, also stochastic,
due to various physical effects. For example, in non-volatile
memories (NVMs), such as floating gate memories, the data
is represented by stored charge. The stored charge can leak
away from the floating gate through the gate oxide or through
the dielectric. The amount of leakage depends on various
physical parameters, for example, the device temperature, the
magnitude of the charge, the quality of the gate oxide or
dielectric, and the time elapsed between writing and reading
the data.
The probability distribution of the recorded features changes
over time, and specifically the mean and the variance of
the distribution may change. The long-term effects are hard
to predict as they depend on, for example, the (average)
temperature of the storage device. An increase of the variance
over time may be seen as an increase of the noise level of
the storage channel, and it has a bearing on the detection
quality. The long-term deviations from the nominal means,
called offsets, can be estimated using an aging model, but,
clearly, the offsets depend on unpredictable parameters such as
temperature, humidity, etc., so that the prediction is inaccurate.
Various techniques have been advocated for improving the
detector resilience in case of channel mismatch when the
means and the variance of the recorded features distribution
have changed. Estimation of the unknown offsets may be readily achieved by using reference cells, i.e., redundant cells with known stored data. The method is often considered too expensive in terms of redundancy, and alternative methods with lower redundancy have been sought.
Alternatively, coding techniques can be applied to ease detection in case of channel mismatch. Specifically, balanced codes [5], [6], [7] and composition check codes [8], [9], preferably in conjunction with Slepian's optimal detection [10], offer resilience in the face of channel mismatch. These coding methods are often considered too expensive in terms of coding hardware and redundancy, specifically when high-speed applications are considered.
Detectors based on the Pearson distance instead of the traditional Euclidean distance are immune to channel mismatch [11]. For the binary case, q = 2, the redundancy is low and the complexity of the Pearson detector scales with n. However, the required number of operations grows exponentially with the length, n, and alphabet size, q, so that for larger values of n and q the method becomes impractical [12]. Alternative detection methods for larger q and n that are less costly in resources are welcome.
In this paper, we investigate detection schemes of q-ary, q > 2, codewords that are based on the results of modern cluster analysis. We assume distortion of the received symbols by additive noise, and we further assume that the channel characteristics are not completely known to sender and receiver. Detection is based on the observation of n symbols only; the observation of past or future symbols is not assumed.
We set the scene in Section II with preliminaries and a description of the mismatched channel model. Prior art detection schemes are discussed in Section III. In Section IV, we present a new detection method based on k-means clustering. Computer simulations are conducted to assess the error performance of the prior art and newly developed schemes. In Sections V and VI, we adopt a simple linear channel model where it is assumed that the gain and offset of the received signal are unknown. Computer simulations are conducted to assess the error performance of the detection schemes. Section VII concludes this paper.
II. PRELIMINARIES AND CHANNEL MODEL
We consider a communication codebook, S ⊆ Q^n, of selected n-symbol codewords x = (x_1, x_2, ..., x_n) over the q-ary alphabet Q = {0, ..., q − 1}, where n, the length of x, is a positive integer. The codeword, x ∈ S, is translated into physical features, where the logical symbols, i, are written at an average (physical) level i + b_i, where b_i ∈ R, 0 ≤ i ≤ q − 1, denotes the average deviation from the nominal or 'ideal' value. The average deviations, b_i, may slowly vary (drift) in time due to charge leakage or temperature change. The quantities b_i are average deviations, called offsets, from the nominal levels, and they are relatively small with respect to the assumed unity difference (or amplitude) between neighboring physical signal levels. For unambiguous detection, the average of the physical level associated with the logical symbol 'i' is assumed to be less than that associated with the logical symbol 'i + 1'. In other words, we have the premise

b_0 < 1 + b_1 < 2 + b_2 < ··· < q − 1 + b_{q−1}   (1)

or

b_{i−1} − b_i < 1,  1 ≤ i ≤ q − 1.   (2)
Assume a codeword, x, is sent. The symbols, r_i, of the retrieved vector r = (r_1, ..., r_n) are distorted by additive noise and given by

r_i = x_i + b_{x_i} + ν_i.   (3)

We first design a detector for the above case where the unknown offsets, b_i's, are uncorrelated. Thereafter, we distinguish two special cases, where the b_i's are correlated. For the first, general, case, we assume

b_i = (a − 1)i + b,   (4)

where a is an unknown attenuation, or gain, of the channel, and b is an unknown offset, a, b ∈ R. Using (3), we simply find

r_i = a x_i + b + ν_i.   (5)

In the offset-only case, a = 1, all b_i's are equal, or r_i = x_i + b + ν_i. We assume that the received vector, r, is corrupted by additive Gaussian noise ν = (ν_1, ..., ν_n), where ν_i ∈ R are zero-mean independent and identically distributed (i.i.d.) noise samples with normal distribution N(0, σ²). The quantity σ² ∈ R denotes the noise variance. The additive noise term may be caused by fabrication process variations or electronics (detector) noise.
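As an illustration, the linear mismatched channel (5) can be simulated in a few lines. This is a sketch under the paper's model assumptions; the function and parameter names below are ours, not the paper's:

```python
import numpy as np

def corrupt(x, a=1.0, b=0.0, sigma=0.0, rng=None):
    """Linear mismatched-channel model of (5): r_i = a*x_i + b + nu_i,
    with i.i.d. zero-mean Gaussian noise of standard deviation sigma."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    noise = rng.normal(0.0, sigma, size=x.shape) if sigma > 0 else 0.0
    return a * x + b + noise
```

With sigma = 0 the output is the noiseless drifted word a*x + b, which is the ambiguity the later sections address.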
III. PRIOR ART DETECTION SCHEMES
Below we discuss three prior art detection schemes and
relevant properties.
A. Fixed threshold detection (FTD)

The symbols of the received word, r_i, can be straightforwardly quantized to an integer, x̂_i ∈ Q, with a conventional fixed threshold detector (FTD), also called a symbol-by-symbol detector. The threshold function is denoted by x̂_i = Φ_ϑ(r_i), x̂_i ∈ Q, where the threshold vector ϑ = (ϑ_0, ..., ϑ_{q−2}) has q − 1 (real) elements, called thresholds or threshold levels. The threshold vector satisfies the order

ϑ_0 < ϑ_1 < ϑ_2 < ··· < ϑ_{q−2}.   (6)

The quantization function, Φ_ϑ(u), of the threshold detector is defined by

Φ_ϑ(u) = 0 if u < ϑ_0;  i if ϑ_{i−1} ≤ u < ϑ_i, 1 ≤ i ≤ q − 2;  q − 1 if u ≥ ϑ_{q−2}.   (7)

For a fixed threshold detector, the q − 1 detection threshold values, ϑ_i, are equidistant at the levels

ϑ_i = 1/2 + i,  0 ≤ i ≤ q − 2.   (8)
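With the equidistant thresholds (8), the quantizer (7) reduces to rounding to the nearest level and clipping to the alphabet. A minimal sketch (the function name is ours):

```python
import math

def ftd(r, q):
    """Fixed threshold detector of (7)-(8): quantize each received symbol
    to a level in {0, ..., q-1} using thresholds theta_i = i + 1/2."""
    out = []
    for u in r:
        # floor(u + 1/2) realizes the thresholds at i + 1/2 exactly,
        # assigning a symbol on a threshold to the upper level as in (7)
        level = math.floor(u + 0.5)
        out.append(min(max(level, 0), q - 1))
    return out
```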
Threshold detection is very attractive for its implementation simplicity. However, the error performance seriously degrades in the face of channel mismatch [11]. A detector that dynamically adjusts the thresholds is an alternative that offers solace in the face of channel mismatch. The next subsection describes a typical example.
B. Dynamic threshold detection (min-max detector)

We assume that the channel model is, see (5), r_i = a x_i + b + ν_i, where the gain, a > 0, and offset, b, are unknown parameters, except for the sign of a. In case S = Q^n, that is, all possible codewords are allowed, mismatch immune detection is not possible since such a detector cannot distinguish between the word x̂ and its shifted and scaled version ŷ = c_1 x̂ + c_2. A designer must judiciously select codewords from Q^n given adequate constraints that may enable mismatch immune detection. For example, we select for S those codewords where the symbols '0' and 'q − 1' must both be present at least once. For the binary case, q = 2, this implies a slight redundancy as only the all-1 and all-0 words have to be removed, see Subsection VI-A for details. Then, the detector can straightforwardly estimate the gain and offset by

â = (max_i r_i − min_i r_i) / (q − 1)   (9)

and

b̂ = min_i r_i,   (10)

where â and b̂ denote the estimates of the actual channel gain and offset [13]. The dynamic thresholds, denoted by ϑ̂_i, are scaled in a similar fashion as the received codeword, that is,

ϑ̂_i = â ϑ_i + b̂,  0 ≤ i ≤ q − 2.   (11)
It has been shown [13] that the min-max detector operates over a large range of unknown parameters a and b. However, since the estimates, â and b̂, are biased, the above dynamic threshold detector loses error performance with respect to the matched case, especially for larger codeword length n. The detector complexity scales linearly with n, as the principal cost is finding the maximum and minimum of the n received symbol values. Alternatively, detection based on the prior art Pearson distance, discussed in the next subsection, improves the error performance, but with mounting hardware requirements.
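The min-max estimates (9)-(11) can be sketched directly; validity rests on the constraint that every codeword contains at least one '0' and one 'q − 1' symbol, and the helper name is ours:

```python
def minmax_thresholds(r, q):
    """Min-max dynamic threshold detection, (9)-(11): estimate gain and
    offset from the extreme received values, then scale the fixed
    thresholds theta_i = i + 1/2 accordingly."""
    a_hat = (max(r) - min(r)) / (q - 1)   # gain estimate (9)
    b_hat = min(r)                        # offset estimate (10)
    # dynamic thresholds (11): scaled and shifted versions of (8)
    thresholds = [a_hat * (i + 0.5) + b_hat for i in range(q - 1)]
    return a_hat, b_hat, thresholds
```

For a noiseless word sent with a = 0.5 and b = 1 over q = 4 levels, the estimates recover a and b exactly; noise biases them toward the extremes, which is the weakness noted above.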
C. Pearson distance detection

Immink and Weber [11] advocated the Pearson distance instead of the conventional Euclidean distance for improving the error performance of a mismatched noisy channel. We first define two quantities, namely the vector average of the n-vector z

z̄ = (1/n) Σ_{i=1}^{n} z_i   (12)

and the (unnormalized) vector variance of z

σ_z² = Σ_{i=1}^{n} (z_i − z̄)².   (13)

The Pearson distance, δ_p(r, x̂), between the received vector r and a codeword x̂ ∈ S is defined by

δ_p(r, x̂) = 1 − ρ_{r,x̂},   (14)

where

ρ_{r,x̂} = Σ_{i=1}^{n} (r_i − r̄)(x̂_i − x̂̄) / (σ_r σ_x̂)   (15)

is the well-known (Pearson) correlation coefficient. It is assumed that both codewords x and x̂ are taken from a judiciously chosen codebook S, whose properties are explained in Subsection VI-A. The Pearson distance is not a metric in the strict mathematical sense, but in engineering parlance it is still called a distance since it provides a useful measure of similarity between vectors. A minimum Pearson distance detector outputs the codeword

x_o = arg min_{x̂ ∈ S} δ_p(r, x̂).   (16)
It can easily be verified that the minimization of δ_p(r, x̂), and thus x_o, is independent of both a and b, so that the detection quality is immune to unknown drift of the quantities a and b. The minimization operation (16) requires |S| computations, which is impractical for larger S. The number of computations can be reduced to K, the number of constant composition codes that constitute the codebook S, given by [14]

K = C(n + q − 3, q − 1),   (17)

where C(·,·) denotes the binomial coefficient. For the binary case, q = 2, we have K = n − 1, so that the detection algorithm (16) scales linearly with n. For the non-binary case, it is hard to compute or simulate the error performance of minimum Pearson distance detection in a relevant range of q and n as the number, K, of operations grows rapidly with both q and n. For example, for q = 4 and n = 64 we have K = 43,680 comparisons (16) per decoded codeword.
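An exhaustive minimum Pearson distance detector, (14)-(16), is easy to sketch for toy codebooks (it does not use the reduced constant-composition search of [14]; names are ours, and codewords must be non-constant so that σ_x̂ ≠ 0):

```python
import numpy as np

def pearson_detect(r, codebook):
    """Minimum Pearson distance detection, (14)-(16): return the codeword
    minimizing 1 - rho(r, x_hat) by exhaustive search over the codebook."""
    r = np.asarray(r, dtype=float)
    best, best_dist = None, np.inf
    for x in codebook:
        x = np.asarray(x, dtype=float)
        # Pearson correlation coefficient (15) with unnormalized variances (13)
        rho = np.sum((r - r.mean()) * (x - x.mean())) / (
            np.sqrt(np.sum((r - r.mean()) ** 2)) * np.sqrt(np.sum((x - x.mean()) ** 2)))
        if 1.0 - rho < best_dist:
            best_dist, best = 1.0 - rho, x
    return tuple(int(v) for v in best)
```

Because ρ is invariant under r → a r + b with a > 0, the decision is unchanged by gain and offset mismatch, which is the immunity claimed above.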
The three prior art detection methods discussed above have drawbacks in error performance and/or complexity, and to alleviate these drawbacks, viable alternatives are sought. In the next section, we propose and investigate a novel detection method with lower complexity requirements, which is based on clustering techniques.
IV. DATA DETECTION USING k-MEANS CLUSTERING
In the next subsection we describe the basic k-means clustering algorithm, and present results of simulations for the unmatched noisy channel.

A. Basic k-means clustering algorithm

The k-means clustering technique aims to partition the n received symbols into k sets V = {V_0, V_1, ..., V_{k−1}} so as to minimize the within-cluster sum of squares defined by

arg min_V Σ_{i=0}^{k−1} Σ_{r_j ∈ V_i} (r_j − μ_i)²,   (18)

where the centroid μ_i is the mean of the received symbols in cluster V_i, or

μ_i = (1/|V_i|) Σ_{r_j ∈ V_i} r_j.   (19)

The problem of choosing the correct number of clusters is hard, and numerous prior art publications are available to facilitate this choice [2], [3]. Here we assume that a cluster is associated with one of the k symbol values, that is, k = q.
The k-means clustering algorithm is an iteration process that finds a solution of (18). The initial sets V_i^{(1)}, 0 ≤ i ≤ k − 1, are empty. The superscript integer in parentheses, (t), denotes the iteration tally. We initialize the k centroids μ_i^{(1)}, 0 ≤ i ≤ k − 1, by a reasonable choice. For example, Forgy's method [15] randomly chooses k symbols (assuming k < n), r_i, from the received vector r, and uses these as the initial centroids μ_i^{(1)}. The choice of the initial centroids has a significant bearing on the error performance of the clustering detection technique. We do not follow Forgy's approach, and try, dependent on the specific situation at hand, to develop more suitable initial centroids μ_i^{(1)}. We assume that we order the centroids such that

μ_0^{(t)} < μ_1^{(t)} < ··· < μ_{q−1}^{(t)}.   (20)

After the initialization step, we iterate the next two steps until the symbol assignments no longer change.
Assignment step: Assign the n received symbols, r_i, to the k sets V_j^{(t+1)}. If r_i, 1 ≤ i ≤ n, is closest to μ_ℓ^{(t)}, or

ℓ = arg min_j (r_i − μ_j^{(t)})²,   (21)

then r_i is assigned to V_ℓ^{(t+1)}. The (temporary) decoded codeword, denoted by

x̂^{(t)} = (x̂_1^{(t)}, ..., x̂_n^{(t)}),   (22)

is found by

x̂_i^{(t)} = φ_{V^{(t)}}(r_i),  1 ≤ i ≤ n,   (23)

where φ_{V^{(t)}}(r_i) = j such that r_i ∈ V_j^{(t)}.

Updating step: Compute updated versions of the k means μ_j^{(t+1)}, j ∈ Q. An update of the new means μ_j^{(t+1)} is found by

μ_j^{(t+1)} = (1/|V_j^{(t+1)}|) Σ_{r_i ∈ V_j^{(t+1)}} r_i,  j ∈ Q,   (24)

where it is understood that if |V_j^{(t+1)}| = 0 then μ_j^{(t+1)} = μ_j^{(t)} (that is, no update).

After running the above routine until the temporary decoded word is unchanged, say at iteration step t = t_o, we have x̂^{(t_o−1)} = x̂^{(t_o)}. Then we have found the final estimate of the sent codeword, x_o = x̂^{(t_o)}. Bottou [16] showed that the k-means cluster algorithm always converges to a simple steady state, and limit cycles do not occur. It is possible, however, that the process reaches a local minimum of the within-cluster sum of squares (18).
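The two steps above can be sketched as a short detection routine. This is a sketch, not the authors' implementation: it uses the nominal-level initialization μ_i^{(1)} = i adopted in the simulations of Subsection IV-C, and stops when the decoded word no longer changes:

```python
import numpy as np

def kmeans_detect(r, q, centroids=None):
    """Basic k-means clustering detection of Section IV with k = q clusters:
    iterate the assignment step (21) and updating step (24) until the
    temporary decoded word (22)-(23) is unchanged."""
    r = np.asarray(r, dtype=float)
    mu = np.arange(q, dtype=float) if centroids is None else np.asarray(centroids, float)
    prev = None
    while True:
        # assignment step (21): each symbol goes to the nearest centroid
        x_hat = np.argmin((r[:, None] - mu[None, :]) ** 2, axis=1)
        if prev is not None and np.array_equal(x_hat, prev):
            return x_hat.tolist()
        prev = x_hat
        # updating step (24): recompute the mean of each non-empty cluster
        for j in range(q):
            if np.any(x_hat == j):
                mu[j] = r[x_hat == j].mean()
```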
B. Assignment step: relation with threshold detection

We take a closer look at the assignment step of the k-means clustering technique, given by (21). Considering the order (20) of the centroids μ_j^{(t)}, we simply infer that the symbol r_i lies between, say, μ_u^{(t)} ≤ r_i ≤ μ_{u+1}^{(t)}, 0 ≤ u ≤ q − 2. Thus

ℓ = arg min_{j ∈ {u, u+1}} (r_i − μ_j^{(t)})².

As

(r_i − μ_u^{(t)})² − (r_i − μ_{u+1}^{(t)})²
= 2 r_i (μ_{u+1}^{(t)} − μ_u^{(t)}) + (μ_u^{(t)})² − (μ_{u+1}^{(t)})²
= (μ_{u+1}^{(t)} − μ_u^{(t)}) (2 r_i − μ_u^{(t)} − μ_{u+1}^{(t)}),   (25)

we obtain

ℓ = u if r_i < (μ_{u+1}^{(t)} + μ_u^{(t)})/2, and ℓ = u + 1 if r_i > (μ_{u+1}^{(t)} + μ_u^{(t)})/2.   (26)

Using (7), we obtain

ℓ = arg min_j (r_i − μ_j^{(t)})² = Φ_ϑ̂(r_i),   (27)

where the threshold vector, ϑ̂, is given by

ϑ̂_i = (μ_{i+1}^{(t)} + μ_i^{(t)})/2,  0 ≤ i ≤ q − 2,   (28)

and the intermediate decoded vector, x̂^{(t)}, is given by

x̂_i^{(t)} = Φ_ϑ̂(r_i),  1 ≤ i ≤ n.   (29)

We conclude that the k-means cluster detection method is a dynamic threshold detector, where at each update the threshold vector, ϑ̂, is updated with the means of the members of each cluster using (24).

In the next section, we report on outcomes of computer simulations using channel model (3).
C. Results of simulations

We investigate the error performance of channel model (3), where we assume that the stochastic deviations from the means, b_i, i ∈ Q, are taken from a zero-mean continuous uniform distribution with variance σ_b². Thus, the b_i's lie within the range −√3 σ_b ≤ b_i ≤ √3 σ_b. We assume a uniform distribution to guarantee premise (2).

We simply initialize the centroids by μ_i^{(1)} = i, i ∈ Q, and iterate the assignment and updating steps as outlined above. Figure 1 shows outcomes of computer simulations for the case n = 64 and q = 4, where we compare the word error rate (WER) of conventional fixed threshold detection and the novel dynamic threshold detection based on k-means clustering classification versus the signal-to-noise ratio (SNR) defined by SNR = −20 log σ. We plotted two cases, namely σ_b = 0 (ideal channel) and σ_b = 0.1. As a further comparison we
TABLE I
HISTOGRAM OF THE NUMBER OF ITERATIONS FOR q = 4, n = 64, AND σ_b = 0.1.

t_o    SNR = 17 dB    SNR = 20 dB
1      91.43          99.70
2      8.37           0.30
3      0.20           0
Fig. 1. Word error rate (WER) of fixed threshold detection (FTD), curve (a'), and k-means clustering detection, curve (b'), versus SNR = −20 log σ (dB) for n = 64, q = 4, and σ_b = 0.1. Curves (a) and (b) are shown for the case σ_b = 0 (ideal channel). Curve (c) shows the upper bound (30) to the word error rate of a fixed threshold detector for an ideal noisy channel, q = 4 and n = 64.
plotted the upper bound of the word error rate of a threshold detector for an ideal additive noise channel, given by [11]

WER < (2(q − 1)/q) n Q(1/(2σ)).   (30)

We infer that in case the channel is ideal, σ_b = 0, the error performance of k-means clustering detection is close to the performance of both theory and simulation practice of conventional fixed threshold detection. In case the channel is not ideal, σ_b = 0.1, k-means clustering detection is superior to fixed threshold detection.
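The bound (30) is cheap to evaluate numerically; a sketch, assuming the usual Gaussian tail function Q(x) = ½ erfc(x/√2) (the function name is ours):

```python
import math

def wer_bound(n, q, sigma):
    """Upper bound (30) on the word error rate of fixed threshold
    detection over the ideal additive-noise channel."""
    # Gaussian tail function Q(x) = 0.5*erfc(x/sqrt(2)), evaluated at 1/(2*sigma)
    Q = 0.5 * math.erfc((1.0 / (2.0 * sigma)) / math.sqrt(2.0))
    return 2.0 * (q - 1) / q * n * Q
```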
The number of iterations, which is an important (time) complexity issue, depends on the integers q, n, and the signal-to-noise ratio, SNR. The convergence of the iteration process is guaranteed [16], but the speed of convergence is an open question that we studied by computer simulations. Table I shows results of simulations for the case q = 4, n = 64, and σ_b = 0.1 (the same parameters as used in the simulations depicted in Figure 1). At an SNR of 17 dB, around 91% of the received words are detected without further iterations. In 8% of the detected words, only one iteration of the threshold levels is needed. At an SNR of 20 dB, we found that virtually no iterations are required. Thus, since in the large majority of cases no iterations are needed, we conclude that at the cost of a slight additional (time) complexity, the proposed k-means clustering classification outperforms fixed threshold detection.
Fig. 2. Word error rate (WER) of fixed threshold detection (FTD) and k-means clustering detection (cluster) versus SNR for n = 64, q = 4, a = 0.95, and b = 0. As a reference, the upper bound (30) to the word error rate of a fixed threshold detector for the ideal noisy channel is shown.
V. UNKNOWN GAIN a AND OFFSET b (SMALL RANGE OF UNCERTAINTY)
In this section, we assume that the linear channel model, see (5), r_i = a x_i + b + ν_i, applies. In case the gain, a, is within a tolerance range close to unity and the tolerance range of the offset, b, is close to zero, we may directly apply the basic k-means clustering as outlined in the previous section. We require that both a and b are so close to their nominal values that a fixed threshold detector works correctly in the noiseless case. Then, the initialization, using the fixed threshold detector, furnishes sufficiently reliable data for the iterations to follow. From the definition of a fixed threshold detector, see (7), we simply derive the following tolerance ranges of a and b that guarantee a flawlessly operating threshold detector, namely

b < ϑ_0 = 1/2,
ϑ_{i−1} < ai + b < ϑ_i,  1 ≤ i ≤ q − 2,
a(q − 1) + b > ϑ_{q−2} = q − 3/2,   (31)

or

b < 1/2,
i − 1/2 < ai + b < i + 1/2,  1 ≤ i ≤ q − 2,
a(q − 1) + b > q − 3/2.   (32)
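The conditions (32) are easy to check mechanically for a given pair (a, b); a sketch, with a function name of our choosing:

```python
def ftd_tolerates(a, b, q):
    """Check the tolerance conditions (32) under which a fixed threshold
    detector decodes a noiseless word correctly despite gain a and offset b."""
    if not b < 0.5:
        return False
    for i in range(1, q - 1):
        # interior levels must stay within half a nominal level spacing
        if not (i - 0.5 < a * i + b < i + 0.5):
            return False
    # top level must stay above the highest threshold
    return a * (q - 1) + b > q - 1.5
```

For q = 4 this confirms that the pair (a, b) = (0.95, 0) used in Figure 2 lies inside the tolerance region, while a gain of 1.5 does not.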
Figure 2 shows outcomes of computer simulations, where we compare, for the case n = 64 and q = 4, the word error rate (WER) of fixed threshold detection and detection based on k-means clustering versus the signal-to-noise ratio (SNR), where the channel gain equals a = 0.95 and b = 0. We conclude that the cluster detector shows a greater resilience than the fixed threshold detector in the face of unknown gain, a, and additive noise.
In the above case, the parameters a and b are assumed to have a limited range of uncertainty. If, however, they have a wider tolerance range than prescribed by (32), it is not possible to unambiguously detect the codeword with a fixed threshold detector. The detector needs assistance, and constrained coding is applied to assist in overcoming this difficulty, as discussed in the next section.
VI. UNKNOWN GAIN a AND OFFSET b (LARGE RANGE OF UNCERTAINTY)
In this section, we focus on the situation where we anticipate that both parameters a and b have such a great range of possible values that a fixed threshold detector fails in the majority of cases, even in the noiseless case. In the next subsection, we show, by example, that in such a case it is impossible to distinguish between certain nettlesome situations, and constrained coding becomes a requirement to resolve the ambiguity.
A. Constrained coding

In order to cope with larger uncertainties of both parameters a and b, we face an ambiguity problem. For example, let q = 5, and let (2, 4, 4) be the received vector. Clearly, it is impossible to distinguish between the two choices, where the sent codeword is (2, 4, 4) and a = 1, or where it is (1, 2, 2) and a = 2. Let S be the adopted codebook; then we can cope with the above ambiguity if (2, 4, 4) ∈ S then (1, 2, 2) ∉ S, or vice versa. The name Pearson code was coined for a set of codewords that can be uniquely decoded by a detector immune to large uncertainties in both a > 0 and b [13]. Codewords in a Pearson code, S, satisfy two conditions, namely

Property A: If x ∈ S then c_1 + c_2 x ∉ S for all c_1, c_2 ∈ R with (c_1, c_2) ≠ (0, 1) and c_2 > 0.

Property B: x = (c, c, ..., c) ∉ S for all c ∈ R.

We adopt a Pearson code that has codewords with at least one '0' symbol and at least one 'q − 1' symbol. We may easily verify that such codewords satisfy Properties A and B. The number of allowable n-symbol codewords equals [13]

|S| = q^n − 2(q − 1)^n + (q − 2)^n,  q > 1.   (33)

For the binary case, q = 2, we simply find that |S| = 2^n − 2 (both the all-'1' and all-'0' words are deleted).
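The cardinality (33), an inclusion-exclusion count of the words containing at least one '0' and one 'q − 1', can be cross-checked against brute-force enumeration for small n and q (helper names are ours):

```python
from itertools import product

def pearson_code_size(n, q):
    """Cardinality (33) of the Pearson code whose codewords contain at
    least one '0' and at least one 'q-1' symbol (inclusion-exclusion)."""
    return q**n - 2 * (q - 1)**n + (q - 2)**n

def brute_force_size(n, q):
    """Count the same codewords by exhaustive enumeration (small n, q only)."""
    return sum(1 for w in product(range(q), repeat=n)
               if 0 in w and (q - 1) in w)
```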
B. Revised k-means clustering using min-max initialization
Here it is assumed that the parameters a and b are completely unknown, except for the sign of a, a > 0. Due to the large uncertainty, we cannot adopt the elementary choice of the initial values of the centroids μ_i^{(1)} as described in Section IV. We propose, following the min-max detector technique described in Subsection III-B, the choice of the initial centroids μ_i^{(1)} using the minimum, min_i r_i, and maximum value, max_i r_i, of the received symbols. The Pearson code guarantees at least one '0' symbol and also at least one 'q − 1' symbol in a codeword. The detector may therefore use the
Fig. 3. Word error rate (WER) for the case q = 4, n = 64, and gain a = 1.5 of a) the prior art min-max detector as described in Subsection III-B, b) the k-means clustering algorithm, and c) the upper bound (30) of an ideal fixed threshold detector. Note that the signal-to-noise ratio is defined by SNR = −20 log(σ/a). The error performance is independent of the offset b.
minimum and maximum value of the received symbols as anchor points defining the range of values of the symbols in the received vector. To that end, let

α_0 = min_i r_i and α_1 = max_i r_i.   (34)

The q initial centroids, μ_i^{(1)}, are found by the interpolation

μ_i^{(1)} = α_0 + (α_1 − α_0) i/(q − 1),  0 ≤ i ≤ q − 1.   (35)
Note that the above initialization step of the modified k-means clustering technique has the same effect as the scaling used in the min-max detector (11). Figure 3 shows results of computer simulations for the case q = 4 and n = 64 and a gain a = 1.5. For normalization purposes, we define the SNR by SNR = −20 log(σ/a). We compared the prior art min-max detector with the k-means clustering detection algorithm. The detector based on k-means clustering outperforms the prior art min-max detector.
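The min-max initialization (34)-(35) is a one-line interpolation between the extreme received values; a sketch with a name of our choosing:

```python
def minmax_centroids(r, q):
    """Initial centroids (34)-(35) for the revised k-means detector:
    interpolate q levels between the minimum and maximum received value,
    relying on the Pearson-code guarantee that every codeword contains
    both a '0' and a 'q-1' symbol."""
    lo, hi = min(r), max(r)                                   # (34)
    return [lo + (hi - lo) * i / (q - 1) for i in range(q)]   # (35)
```

For a noiseless word sent with a = 1.5 and b = 2 over q = 4 levels, the centroids land exactly on the drifted levels 2, 3.5, 5, 6.5.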
In the next subsection, we discuss a second modification to
the basic k-means clustering method using regression analysis.
C. Revised k-means clustering algorithm using regression analysis

We adopt a second modification to the clustering algorithm of Section IV. In the basic updating step (24), the k cluster centroids are updated by computing a new mean of the members in that cluster only. Here we assume that the linear channel model, r_i = a x_i + b + ν_i, described by (5) holds. We have investigated an alternative method for updating the centroids, μ_j^{(t+1)}, by applying the well-known linear regression model [17] that estimates the two coefficients a and b instead of the q centroids μ_i.
We start and initialize as described in the previous subsection, where the q initial centroids, μ_i^{(1)}, are found by the interpolation

μ_i^{(1)} = α_0 + (α_1 − α_0) i/(q − 1),  0 ≤ i ≤ q − 1,   (36)

where, as in (34),

α_0 = min_i r_i and α_1 = max_i r_i.   (37)

For the offset-only case, a = 1, we have

μ_i^{(1)} = α_0 + i,  i = 0, ..., q − 1.   (38)

After the initialization, we iterate the next two steps until equilibrium is reached.
Assignment step: Assign the n received symbols, r_i, to the k sets V_j^{(t+1)}. If r_i, 1 ≤ i ≤ n, is closest to μ_ℓ^{(t)}, or

ℓ = arg min_j (r_i − μ_j^{(t)})²,   (39)

then r_i is assigned to V_ℓ^{(t+1)}. The (temporary) decoded codeword, denoted by

x̂^{(t)} = (x̂_1^{(t)}, ..., x̂_n^{(t)}),   (40)

is found by

x̂_i^{(t)} = φ_{V^{(t)}}(r_i),  1 ≤ i ≤ n,   (41)

where φ_{V^{(t)}}(r_i) = j such that r_i ∈ V_j^{(t)}.
Updating step: Updates of the means μ_j^{(t+1)}, j ∈ Q, are found by a linear regression model that estimates the coefficients a and b. To that end, define the linear regression model

r̂_i = â^{(t)} x̂_i^{(t)} + b̂^{(t)},   (42)

where the (real-valued) regression coefficients â^{(t)} and b̂^{(t)}, chosen to minimize Σ_{i=1}^{n} (r_i − r̂_i)², denote the estimates of the unknown quantities a and b. The regression coefficients â^{(t)} and b̂^{(t)} are found by invoking the well-known linear regression method [17], and we find

b̂^{(t)} = r̄ − â^{(t)} x̂̄^{(t)}   (43)

and

â^{(t)} = Σ_{i=1}^{n} (r_i − r̄)(x̂_i^{(t)} − x̂̄^{(t)}) / σ²_{x̂^{(t)}} = (σ_r / σ_{x̂^{(t)}}) ρ_{r,x̂^{(t)}}.   (44)

We note that for all x ∈ S, σ_{x̂^{(t)}} ≠ 0 since Property B holds, see Subsection VI-A. The updated μ_i^{(t+1)}, i = 0, ..., q − 1, are found by the interpolation

μ_i^{(t+1)} = â^{(t)} i + b̂^{(t)}.   (45)

For the offset-only case, a = 1, we simply find

b̂^{(t)} = r̄ − x̂̄^{(t)},   (46)

and

μ_i^{(t+1)} = i + b̂^{(t)} = i + r̄ − x̂̄^{(t)}.   (47)
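The regression-based update (42)-(45) can be sketched as a least-squares fit of the received symbols against the temporary decoded word; the function name is ours:

```python
import numpy as np

def regression_centroids(r, x_hat, q):
    """Regression-based updating step (42)-(45): fit r_i ~ a*x_hat_i + b by
    least squares and place the new centroids at mu_i = a_hat*i + b_hat.
    Requires x_hat to be non-constant, as guaranteed by Property B."""
    r = np.asarray(r, dtype=float)
    x = np.asarray(x_hat, dtype=float)
    # slope estimate (44) using unnormalized (co)variances as in (13)
    a_hat = np.sum((r - r.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
    # intercept estimate (43)
    b_hat = r.mean() - a_hat * x.mean()
    # interpolated centroids (45)
    return [a_hat * i + b_hat for i in range(q)]
```

On a noiseless word with a = 1.5 and b = 2, the fit recovers the drifted levels exactly.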
We have conducted a myriad of computer simulations with the above algorithms. Figure 4 compares the error performance of the revised k-means clustering using min-max initialization with that of the revised k-means clustering using regression analysis for the case q = 16 and n = 64. The performance difference between the two cluster methods is independent of the unknown quantities a and b.
Fig. 4. Word error rate (WER) of a) revised k-means clustering using min-max initialization, and b) revised k-means clustering using the regression method, for the case q = 16 and n = 64.
VII. CONCLUSIONS

We have proposed and analyzed machine learning based on a k-means clustering technique as a detection method for encoded strings of q-ary symbols. We have analyzed the detection of distorted data retrieved from a data storage medium where user data is stored as physical features with q different levels. Due to manufacturing tolerances and ageing, the q levels differ from the desired, nominal, ones. Results of simulations have been presented, where the q unknown level differences, called offsets, are independent stochastic variables with a uniform probability distribution. We have also evaluated the error performance of the k-means clustering detection technique where the offsets are correlated and can be modelled as an unknown scale, or gain, and translation, or offset. At the cost of some additional (time) complexity, the proposed k-means clustering classification outperforms common prior art dynamic detection methods in the face of additive noise and channel mismatch.
Article
Coding schemes for storage channels, such as optical recording and non-volatile memory (Flash), with unknown gain and offset are presented. In its simplest case, the coding schemes guarantee that a symbol with a minimum value (floor) and a symbol with a maximum (ceiling) value are always present in a codeword so that the detection system can estimate the momentary gain and the offset. The results of the computer simulations show the performance of the new coding and detection methods in the presence of additive noise.
Article
In non-volatile memories, reading stored data is typically done through the use of predetermined fixed thresholds. However, due to problems commonly affecting such memories, including voltage drift, overwriting, and inter-cell coupling, fixed threshold usage often results in significant asymmetric errors. To combat these problems, Zhou, Jiang, and Bruck recently introduced the notion of dynamic thresholds and applied them to the reading of binary sequences. In this paper, we explore the use of dynamic thresholds for multi-level cell (MLC) memories. We provide a general scheme to compute and apply dynamic thresholds and derive performance bounds. We show that the proposed scheme compares favorably with the optimal thresholding scheme. Finally, we develop limited-magnitude error-correcting codes tailored to take advantage of dynamic thresholds.