Content uploaded by Kees Schouhamer Immink

Author content

All content in this area was uploaded by Kees Schouhamer Immink on Feb 28, 2020

Content may be subject to copyright.


An Unsupervised Learning Approach for Data Detection in the Presence of Channel Mismatch and Additive Noise

Kees A. Schouhamer Immink and Kui Cai

Abstract—We investigate machine learning based on clustering

techniques that are suitable for the detection of encoded strings

of q-ary symbols transmitted over a noisy channel with partially

unknown characteristics. We consider the detection of the q-ary

data as a classiﬁcation problem, where objects are recognized

from a corrupted vector, which is obtained by an unknown

corruption process. We first evaluate the error performance of the k-means clustering technique without constrained coding. Second, we apply constrained codes that create an environment that improves the detection reliability and allows a wider range of channel uncertainties.

Index Terms—Constrained coding, storage systems, non-

volatile memories, Pearson distance, Euclidean distance, channel

mismatch, Pearson code, k-means clustering, learning systems

I. INTRODUCTION

We present new techniques for the detection of q-ary data in the face of additive noise and unknown channel corruption by a slow change (drift) of some of the channel parameters. The new detection methods are based on the teachings of cluster analysis. An n-symbol q-ary word $(x_1, \ldots, x_n)$, $x_i \in \{0, \ldots, q-1\}$, is transmitted or stored, and the received word $(r_1, \ldots, r_n)$ is corrupted by additive noise, intersymbol interference, and other unknown nuisance. Retrieving a replica of the original q-ary data is seen as the classification function $(r_1, \ldots, r_n) \rightarrow \{0, \ldots, q-1\}$. Machine learning and deep learning are techniques that are very suitable for classification tasks. The detection function is considered here as a classification problem, or object recognition, which is targeted by cluster analysis. Cluster analysis is an example of unsupervised machine learning, a common technique for statistical data analysis that is used in many fields, including pattern recognition, image analysis, information retrieval, data compression, and computer graphics [1].

We investigate a typical competitive learning algorithm,

named k-means clustering technique, which is an iterative

process that implements the detection function given initial

values of some basic parameters. The aim of the learning algorithm is to map $n$ received symbols into $k$ clusters, where in the case at hand the $k$ clusters are associated with the

Kees A. Schouhamer Immink is with Turing Machines Inc, Willemskade 15d, 3016 DK Rotterdam, The Netherlands. E-mail: immink@turing-machines.com.

Kui Cai is with Singapore University of Technology and Design (SUTD), 8 Somapah Rd, 487372, Singapore. E-mail: cai kui@sutd.edu.sg.

This work is supported by Singapore Agency of Science and Technology (A*Star) PSF research grant and SUTD-ZJU grant ZJURP1500102.

$q$ symbol values. The detector is ignorant of the number of different symbol values in the sent codeword, that is, $k \le q$. A major challenge in cluster analysis is the estimation of the optimal number of 'clusters' [2], [3]. The k-means clustering technique does not readily yield an estimate of the number of (different) clusters, and therefore other means are needed to estimate the number of clusters, $k$. Due to the presence of vexatious codewords and channel distortion, the iteration process may not always converge to a proper solution. To solve this issue, we define constrained coding that may assist in creating an environment where the k-means clustering technique is a reliable detection technique, and the estimation of the number of clusters can be avoided.

In mass data storage devices, the user data are translated

into physical features that can be either electronic, magnetic,

optical, or of other nature [4]. Due to process variations,

the magnitude of the physical effect may deviate from the

nominal values, which may affect the reliable read-out of the

data. We may distinguish between two stochastic effects that

determine the process variations. On the one hand, we have

the unpredictable stochastic process variations, and on the

other hand, we may observe long-term effects, also stochastic,

due to various physical effects. For example, in non-volatile

memories (NVMs), such as ﬂoating gate memories, the data

is represented by stored charge. The stored charge can leak

away from the ﬂoating gate through the gate oxide or through

the dielectric. The amount of leakage depends on various

physical parameters, for example, the device temperature, the

magnitude of the charge, the quality of the gate oxide or

dielectric, and the time elapsed between writing and reading

the data.

The probability distribution of the recorded features changes

over time, and speciﬁcally the mean and the variance of

the distribution may change. The long-term effects are hard

to predict as they depend on, for example, the (average)

temperature of the storage device. An increase of the variance

over time may be seen as an increase of the noise level of

the storage channel, and it has a bearing on the detection

quality. The long-term deviations from the nominal means,

called offsets, can be estimated using an aging model, but,

clearly, the offsets depend on unpredictable parameters such as

temperature, humidity, etc, so that the prediction is inaccurate.

Various techniques have been advocated for improving the

detector resilience in case of channel mismatch when the

means and the variance of the recorded features distribution

have changed. Estimation of the unknown offsets may be


readily achieved by using reference cells, i.e., redundant cells

with known stored data. The method is often considered too

expensive in terms of redundancy, and alternative methods with lower redundancy have been sought.

Alternatively, coding techniques can be applied to ease detection in case of channel mismatch. Specifically, balanced codes [5], [6], [7] and composition check codes [8], [9], preferably in conjunction with Slepian's optimal detection [10], offer resilience in the face of channel mismatch. These coding methods are often considered too expensive in terms of coding hardware and redundancy, specifically when high-speed applications are considered.

Detectors based on the Pearson distance instead of the traditional Euclidean distance are immune to channel mismatch [11]. For the binary case, $q = 2$, the redundancy is low and the complexity of the Pearson detector scales linearly with $n$. For the non-binary case, however, the required number of operations grows rapidly with the length, $n$, and alphabet size, $q$, so that for larger values of $n$ and $q$ the method becomes impractical [12]. Alternative detection methods for larger $q$ and $n$ that are less costly in resources are welcome.

In this paper, we investigate detection schemes of q-ary, $q > 2$, codewords that are based on the results of modern cluster analysis. We assume that the received symbols are distorted by additive noise, and we further assume that the channel characteristics are not completely known to either sender or receiver. Detection is based on the observation of $n$ symbols only, and the observation of past or future symbols is not assumed.

We set the scene in Section II with preliminaries and a description of the mismatched channel model. Prior art detection schemes are discussed in Section III. In Section IV, we present a new detection scheme based on k-means clustering. Computer simulations are conducted to assess the error performance of the prior art and the newly developed schemes. In Sections V and VI, we adopt a simple linear channel model where it is assumed that the gain and offset of the received signal are unknown. Computer simulations are conducted to assess the error performance of the detection schemes. Section VII concludes this paper.

II. PRELIMINARIES AND CHANNEL MODEL

We consider a communication codebook, $\mathcal{S} \subseteq \mathcal{Q}^n$, of selected $n$-symbol codewords $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ over the q-ary alphabet $\mathcal{Q} = \{0, \ldots, q-1\}$, where $n$, the length of $\mathbf{x}$, is a positive integer. The codeword, $\mathbf{x} \in \mathcal{S}$, is translated into physical features, where the logical symbols, $i$, are written at an average (physical) level $i + b_i$, where $b_i \in \mathbb{R}$, $0 \le i \le q-1$, denotes the average deviation from the nominal or 'ideal' value. The average deviations, $b_i$, called offsets, may slowly vary (drift) in time due to charge leakage or temperature change. They are relatively small with respect to the assumed unity difference (or amplitude) between neighboring physical signal levels. For unambiguous detection, the average of the physical level associated with the logical symbol '$i$' is assumed to be less than that associated with the logical symbol '$i+1$'. In other words, we have the premise

$$b_0 < 1 + b_1 < 2 + b_2 < \cdots < q - 1 + b_{q-1} \tag{1}$$

or

$$b_{i-1} - b_i < 1, \quad 1 \le i \le q-1. \tag{2}$$

Assume a codeword, $\mathbf{x}$, is sent. The symbols, $r_i$, of the retrieved vector $\mathbf{r} = (r_1, \ldots, r_n)$ are distorted by additive noise and given by

$$r_i = x_i + b_{x_i} + \nu_i. \tag{3}$$

We first design a detector for the above case where the unknown offsets, $b_i$, are uncorrelated. Thereafter, we distinguish two special cases, where the $b_i$'s are correlated. For the first, general, case, we assume

$$b_i = (a - 1)i + b, \tag{4}$$

where $a$ is an unknown attenuation, or gain, of the channel, and $b$ is an unknown offset, $a, b \in \mathbb{R}$. Using (3), we simply find

$$r_i = a x_i + b + \nu_i. \tag{5}$$

In the offset-only case, $a = 1$, all $b_i$'s are equal, or $r_i = x_i + b + \nu_i$. We assume that the received vector, $\mathbf{r}$, is corrupted by additive Gaussian noise $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_n)$, where $\nu_i \in \mathbb{R}$ are zero-mean independent and identically distributed (i.i.d.) noise samples with normal distribution $\mathcal{N}(0, \sigma^2)$. The quantity $\sigma^2 \in \mathbb{R}$ denotes the noise variance. The additive noise term may be caused by fabrication process variations or electronics (detector) noise.
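As an illustration only, channel models (3) to (5) can be simulated in a few lines of Python. The function names, the seed, and the parameter values below are assumptions made for this sketch and are not taken from the simulations reported in this paper.

```python
import random


def linear_offsets(q, a, b):
    """Offsets b_i = (a - 1) * i + b of the gain/offset special case (4)."""
    return [(a - 1.0) * i + b for i in range(q)]


def simulate_channel(x, offsets, sigma, rng):
    """Return r_i = x_i + b_{x_i} + nu_i as in (3), with nu_i ~ N(0, sigma^2)."""
    return [xi + offsets[xi] + rng.gauss(0.0, sigma) for xi in x]


# Illustrative parameters (assumed, not from the paper).
rng = random.Random(1)          # fixed seed, only for repeatability
q, n = 4, 8
x = [rng.randrange(q) for _ in range(n)]
r = simulate_channel(x, linear_offsets(q, a=0.95, b=0.1), sigma=0.05, rng=rng)
```

With `sigma = 0`, the output reduces to $a x_i + b$, which is the noiseless form of (5).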

III. PRIOR ART DETECTION SCHEMES

Below we discuss three prior art detection schemes and

relevant properties.

A. Fixed threshold detection (FTD)

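The symbols of the received word, $r_i$, can be straightforwardly quantized to an integer, $\hat{x}_i \in \mathcal{Q}$, with a conventional fixed threshold detector (FTD), also called symbol-by-symbol detector. The threshold function is denoted by $\hat{x}_i = \Phi_{\vartheta}(r_i)$, $\hat{x}_i \in \mathcal{Q}$, where the threshold vector $\vartheta = (\vartheta_0, \ldots, \vartheta_{q-2})$ has $q-1$ (real) elements, called thresholds or threshold levels. The threshold vector satisfies the order

$$\vartheta_0 < \vartheta_1 < \vartheta_2 < \cdots < \vartheta_{q-2}. \tag{6}$$

The quantization function, $\Phi_{\vartheta}(u)$, of the threshold detector is defined by

$$\Phi_{\vartheta}(u) = \begin{cases} 0, & u < \vartheta_0, \\ i, & \vartheta_{i-1} \le u < \vartheta_i, \quad 1 \le i \le q-2, \\ q-1, & u \ge \vartheta_{q-2}. \end{cases} \tag{7}$$

For a fixed threshold detector, the $q-1$ detection threshold values, $\vartheta_i$, are equidistant at the levels

$$\vartheta_i = \frac{1}{2} + i, \quad 0 \le i \le q-2. \tag{8}$$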


Threshold detection is very attractive for its implementation

simplicity. However, the error performance seriously degrades

in the face of channel mismatch [11]. A detector that dynam-

ically adjusts the thresholds is an alternative that offers solace

in the face of channel mismatch. The next subsection describes

a typical example.
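The fixed threshold detector of (7) and (8) can be sketched as follows. This is a minimal illustration; the function names are hypothetical, and the standard-library `bisect_right` is used here as one convenient way to evaluate the quantization function.

```python
from bisect import bisect_right


def fixed_thresholds(q):
    """Equidistant thresholds of (8): theta_i = i + 1/2 for 0 <= i <= q-2."""
    return [i + 0.5 for i in range(q - 1)]


def quantize(u, thresholds):
    """Phi_theta(u) of (7): the number of thresholds that are <= u."""
    return bisect_right(thresholds, u)


def ftd(r, q):
    """Symbol-by-symbol fixed threshold detection of the received word r."""
    thresholds = fixed_thresholds(q)
    return [quantize(ri, thresholds) for ri in r]
```

Counting the thresholds that do not exceed $u$ reproduces the three branches of (7), including the convention that a value exactly on a threshold is assigned to the upper symbol.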

B. Dynamic threshold detection (min-max detector)

We assume that the channel model is, see (5), $r_i = a x_i + b + \nu_i$, where the gain, $a > 0$, and offset, $b$, are unknown parameters, except for the sign of $a$. In case $\mathcal{S} = \mathcal{Q}^n$, that is, all possible codewords are allowed, mismatch immune detection is not possible since such a detector cannot distinguish between the word $\hat{\mathbf{x}}$ and its shifted and scaled version $\hat{\mathbf{y}} = c_1 \hat{\mathbf{x}} + c_2$. A designer must judiciously select codewords from $\mathcal{Q}^n$ given adequate constraints that may enable mismatch immune detection. For example, we select for $\mathcal{S}$ those codewords where the symbols '0' and '$q-1$' must both be present at least once. For the binary case, $q = 2$, this implies a slight redundancy as only the all-1 and all-0 words have to be removed, see Subsection VI-A for details. Then, the detector can straightforwardly estimate the gain and offset by

$$\hat{a} = \frac{\max_i r_i - \min_i r_i}{q - 1} \tag{9}$$

and

$$\hat{b} = \min_i r_i, \tag{10}$$

where $\hat{a}$ and $\hat{b}$ denote the estimates of the actual channel gain and offset [13]. The dynamic thresholds, denoted by $\hat{\vartheta}_i$, are scaled in a similar fashion as the received codeword, that is,

$$\hat{\vartheta}_i = \hat{a} \vartheta_i + \hat{b}, \quad 0 \le i \le q-2. \tag{11}$$

It has been shown [13] that the min-max detector operates over a large range of unknown parameters $a$ and $b$. However, since the estimates, $\hat{a}$ and $\hat{b}$, are biased, the above dynamic threshold detector loses error performance with respect to the matched case, especially for larger codeword length $n$. The detector complexity scales linearly with $n$ as the principal cost is finding the maximum and minimum of the $n$ received symbol values. Alternatively, detection based on the prior art Pearson distance, discussed in the next subsection, improves the error performance, but with mounting hardware requirements.
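A minimal sketch of the min-max detector, following (9) to (11) with the fixed thresholds of (8), is given below. The function name is hypothetical, and the sketch assumes that the sent codeword contains both a '0' and a '$q-1$' symbol, so that the received maximum and minimum differ.

```python
from bisect import bisect_right


def min_max_detect(r, q):
    """Dynamic threshold detection with gain/offset estimates (9) and (10)."""
    a_hat = (max(r) - min(r)) / (q - 1)                  # estimate (9)
    b_hat = min(r)                                       # estimate (10)
    # Scaled thresholds (11), using theta_i = i + 1/2 from (8).
    thresholds = [a_hat * (i + 0.5) + b_hat for i in range(q - 1)]
    return [bisect_right(thresholds, ri) for ri in r]


# Noiseless usage example with an assumed gain 1.5 and offset 0.7.
x = [0, 3, 1, 2, 2, 0]
r = [1.5 * xi + 0.7 for xi in x]
x_hat = min_max_detect(r, q=4)
```

In the noiseless case the estimates are exact, so `x_hat` equals `x` for any $a > 0$ and any $b$.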

C. Pearson distance detection

Immink and Weber [11] advocated the Pearson distance instead of the conventional Euclidean distance for improving the error performance of a mismatched noisy channel. We first define two quantities, namely the vector average of the $n$-vector $\mathbf{z}$

$$\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i \tag{12}$$

and the (unnormalized) vector variance of $\mathbf{z}$

$$\sigma_z^2 = \sum_{i=1}^{n} (z_i - \bar{z})^2. \tag{13}$$

The Pearson distance, $\delta_p(\mathbf{r}, \hat{\mathbf{x}})$, between the received vector $\mathbf{r}$ and a codeword $\hat{\mathbf{x}} \in \mathcal{S}$ is defined by

$$\delta_p(\mathbf{r}, \hat{\mathbf{x}}) = 1 - \rho_{\mathbf{r}, \hat{\mathbf{x}}}, \tag{14}$$

where

$$\rho_{\mathbf{r}, \hat{\mathbf{x}}} = \frac{\sum_{i=1}^{n} (r_i - \bar{r})(\hat{x}_i - \overline{\hat{x}})}{\sigma_r \sigma_{\hat{x}}} \tag{15}$$

is the well-known (Pearson) correlation coefficient. It is assumed that both codewords $\mathbf{x}$ and $\hat{\mathbf{x}}$ are taken from a judiciously chosen codebook $\mathcal{S}$, whose properties are explained in Subsection VI-A. The Pearson distance is not a metric in the strict mathematical sense, but in engineering parlance it is still called a distance since it provides a useful measure of similarity between vectors. A minimum Pearson distance detector outputs the codeword

$$\mathbf{x}_o = \arg\min_{\hat{\mathbf{x}} \in \mathcal{S}} \delta_p(\mathbf{r}, \hat{\mathbf{x}}). \tag{16}$$

It can easily be verified that the minimization of $\delta_p(\mathbf{r}, \hat{\mathbf{x}})$, and thus $\mathbf{x}_o$, is independent of both $a$ and $b$, so that the detection quality is immune to unknown drift of the quantities $a$ and $b$. The minimization operation (16) requires $|\mathcal{S}|$ computations, which is impractical for larger $\mathcal{S}$. The number of computations can be reduced to $K$, the number of constant composition codes that constitute the codebook $\mathcal{S}$, given by [14]

$$K = \binom{n + q - 3}{q - 1}. \tag{17}$$

For the binary case, $q = 2$, we have $K = n - 1$, so that the detection algorithm (16) scales linearly with $n$. For the non-binary case, it is hard to compute or simulate the error performance of minimum Pearson distance detection in a relevant range of $q$ and $n$ as the number, $K$, of operations grows rapidly with both $q$ and $n$. For example, for $q = 4$ and $n = 64$ we have $K = 43{,}680$ comparisons (16) per decoded codeword.
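To make (12) to (17) concrete, the sketch below implements the Pearson distance, an exhaustive minimum Pearson distance detector over a small codebook, and the count $K$ of (17). It is an illustration under stated assumptions: the function names are hypothetical, the exhaustive search over $|\mathcal{S}|$ codewords is only feasible for small $n$ and $q$, and the received vector is assumed not to be constant (otherwise $\sigma_r = 0$).

```python
from itertools import product
from math import comb, sqrt


def pearson_distance(r, x):
    """delta_p(r, x) = 1 - rho_{r,x}, see (14) and (15)."""
    n = len(r)
    r_bar, x_bar = sum(r) / n, sum(x) / n
    num = sum((ri - r_bar) * (xi - x_bar) for ri, xi in zip(r, x))
    sig_r = sqrt(sum((ri - r_bar) ** 2 for ri in r))     # from (13)
    sig_x = sqrt(sum((xi - x_bar) ** 2 for xi in x))     # from (13)
    return 1.0 - num / (sig_r * sig_x)


def pearson_codebook(n, q):
    """All n-symbol words that contain at least one '0' and one 'q-1'."""
    return [w for w in product(range(q), repeat=n) if 0 in w and q - 1 in w]


def min_pearson_detect(r, codebook):
    """Exhaustive evaluation of (16)."""
    return min(codebook, key=lambda x: pearson_distance(r, x))


def pearson_search_size(n, q):
    """K of (17): the binomial coefficient C(n + q - 3, q - 1)."""
    return comb(n + q - 3, q - 1)
```

For instance, `pearson_search_size(64, 4)` evaluates to 43680, the figure quoted above for $q = 4$ and $n = 64$.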

The three prior art detection methods discussed above have drawbacks in error performance and/or complexity, and viable alternatives that alleviate these drawbacks are sought. In the next section, we propose and investigate a novel detection method with lower complexity, which is based on clustering techniques.

IV. DATA DETECTION USING k-MEANS CLUSTERING

In the next subsection we describe the basic k-means

clustering algorithm, and present results of simulations for the

unmatched noisy channel.

A. Basic k-means clustering algorithm

The k-means clustering technique aims to partition the $n$ received symbols into $k$ sets $V = \{V_0, V_1, \ldots, V_{k-1}\}$ so as to minimize the within-cluster sum of squares defined by

$$\arg\min_{V} \sum_{i=0}^{k-1} \sum_{r_j \in V_i} (r_j - \mu_i)^2, \tag{18}$$


where the centroid $\mu_i$ is the mean of the received symbols in cluster $V_i$, or

$$\mu_i = \frac{1}{|V_i|} \sum_{r_j \in V_i} r_j. \tag{19}$$

The problem of choosing the correct number of clusters is hard, and numerous prior art publications are available to facilitate this choice [2], [3]. Here we assume that each cluster is associated with one of the $q$ symbol values, that is, $k = q$.

The k-means clustering algorithm is an iterative process that finds a solution of (18). The initial sets $V_i^{(1)}$, $0 \le i \le k-1$, are empty. The superscript integer in parentheses, $(t)$, denotes the iteration tally. We initialize the $k$ centroids $\mu_i^{(1)}$, $0 \le i \le k-1$, by a reasonable choice. For example, Forgy's method [15] randomly chooses $k$ symbols (assuming $k < n$), $r_i$, from the received vector $\mathbf{r}$, and uses these as the initial centroids $\mu_i^{(1)}$, $0 \le i \le k-1$. The choice of the initial centroids has a significant bearing on the error performance of the clustering detection technique. We do not follow Forgy's approach, and try, dependent on the specific situation at hand, to develop more suitable initial centroids $\mu_i^{(1)}$. We assume that we order the centroids such that

$$\mu_0^{(t)} < \mu_1^{(t)} < \cdots < \mu_{q-1}^{(t)}. \tag{20}$$

After the initialization step, we iterate the next two steps until the symbol assignments no longer change.

• Assignment step: Assign the $n$ received symbols, $r_i$, to the $k$ sets $V_j^{(t+1)}$. If $r_i$, $1 \le i \le n$, is closest to $\mu_\ell^{(t)}$, or

$$\ell = \arg\min_{j} \left( r_i - \mu_j^{(t)} \right)^2, \tag{21}$$

then $r_i$ is assigned to $V_\ell^{(t+1)}$. The (temporary) decoded codeword, denoted by

$$\hat{\mathbf{x}}^{(t)} = (\hat{x}_1^{(t)}, \ldots, \hat{x}_n^{(t)}), \tag{22}$$

is found by

$$\hat{x}_i^{(t)} = \varphi_{V^{(t)}}(r_i), \quad 1 \le i \le n, \tag{23}$$

where $\varphi_{V^{(t)}}(r_i) = j$ such that $r_i \in V_j^{(t)}$.

• Updating step: Compute updated versions of the $k$ means $\mu_j^{(t+1)}$, $j \in \mathcal{Q}$. An update of the new means $\mu_j^{(t+1)}$ is found by

$$\mu_j^{(t+1)} = \frac{1}{|V_j^{(t+1)}|} \sum_{r_i \in V_j^{(t+1)}} r_i, \quad j \in \mathcal{Q}, \tag{24}$$

where it is understood that if $|V_j^{(t+1)}| = 0$, then $\mu_j^{(t+1)} = \mu_j^{(t)}$ (that is, no update).

After running the above routine until the temporary decoded word is unchanged, say at iteration step $t = t_o$, we have $\hat{\mathbf{x}}^{(t_o - 1)} = \hat{\mathbf{x}}^{(t_o)}$. Then we have found the final estimate of the sent codeword, $\mathbf{x}_o = \hat{\mathbf{x}}^{(t_o)}$. Bottou and Bengio [16] showed that the k-means clustering algorithm always converges to a simple steady state, and limit cycles do not occur. It is possible, however, that the process reaches a local minimum of the within-cluster sum of squares (18).
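The assignment and updating steps described above can be sketched for scalar received symbols as follows. This is a minimal illustration rather than the implementation used for the simulations: the function name and the `max_iter` safeguard are assumptions, and the initial centroids are supplied by the caller in increasing order as in (20).

```python
def kmeans_detect(r, centroids, max_iter=100):
    """One-dimensional k-means detection; returns (decoded word, centroids)."""
    mu = list(centroids)
    k = len(mu)
    prev = None
    x_hat = []
    for _ in range(max_iter):               # max_iter is an assumed safeguard
        # Assignment step (21): index of the nearest centroid.
        x_hat = [min(range(k), key=lambda j: (ri - mu[j]) ** 2) for ri in r]
        if x_hat == prev:                   # assignments unchanged: stop
            break
        # Updating step (24): mean of the members of each cluster;
        # an empty cluster keeps its previous centroid.
        for j in range(k):
            members = [ri for ri, xi in zip(r, x_hat) if xi == j]
            if members:
                mu[j] = sum(members) / len(members)
        prev = x_hat
    return x_hat, mu


# Usage with the simple initialization mu_i = i for q = 4 and assumed offsets.
r = [0.1, 0.95, 2.08, 2.9, 2.9, 2.08, 0.95, 0.1]
x_hat, mu = kmeans_detect(r, centroids=[0, 1, 2, 3])
```

Here the centroids move from the nominal levels to the observed cluster means, while the decoded word stays equal to the sent symbols.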

B. Assignment step: relation with threshold detection

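We take a closer look at the assignment step of the k-means clustering technique, given by (21). Considering the order (20) of the centroids $\mu_j^{(t)}$, we simply infer that the symbol $r_i$ lies between, say, $\mu_u^{(t)} \le r_i \le \mu_{u+1}^{(t)}$, $0 \le u \le q-2$. Thus

$$\ell = \arg\min_{j \in \{u, u+1\}} \left( r_i - \mu_j^{(t)} \right)^2.$$

As

$$\begin{aligned} \left( r_i - \mu_u^{(t)} \right)^2 - \left( r_i - \mu_{u+1}^{(t)} \right)^2 &= 2 r_i \left( \mu_{u+1}^{(t)} - \mu_u^{(t)} \right) + \left( \mu_u^{(t)} \right)^2 - \left( \mu_{u+1}^{(t)} \right)^2 \\ &= \left( \mu_{u+1}^{(t)} - \mu_u^{(t)} \right) \left( 2 r_i - \mu_u^{(t)} - \mu_{u+1}^{(t)} \right), \end{aligned} \tag{25}$$

we obtain

$$\ell = \begin{cases} u, & r_i < \dfrac{\mu_{u+1}^{(t)} + \mu_u^{(t)}}{2}, \\[2mm] u + 1, & r_i > \dfrac{\mu_{u+1}^{(t)} + \mu_u^{(t)}}{2}. \end{cases} \tag{26}$$

Using (7), we obtain

$$\ell = \arg\min_{j} \left( r_i - \mu_j^{(t)} \right)^2 = \Phi_{\hat{\vartheta}}(r_i), \tag{27}$$

where the threshold vector, $\hat{\vartheta}$, is given by

$$\hat{\vartheta}_i = \frac{\mu_{i+1}^{(t)} + \mu_i^{(t)}}{2}, \quad 0 \le i \le q-2, \tag{28}$$

and the intermediate decoded vector, $\hat{\mathbf{x}}^{(t)}$, is given by

$$\hat{x}_i^{(t)} = \Phi_{\hat{\vartheta}}(r_i), \quad 1 \le i \le n. \tag{29}$$

We conclude that the k-means cluster detection method is a dynamic threshold detector, where at each update the threshold vector, $\hat{\vartheta}$, is updated with the means of the members of each cluster using (24).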
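The equivalence stated in (27) and (28) can be checked numerically with a short sketch. The function names and the example centroids are assumptions chosen for illustration. Values that fall exactly on a midpoint are avoided, since (26) leaves that tie unspecified while (7) assigns it to the upper symbol.

```python
from bisect import bisect_right


def nearest_centroid(u, mu):
    """Assignment step (21): index of the centroid closest to u."""
    return min(range(len(mu)), key=lambda j: (u - mu[j]) ** 2)


def midpoint_thresholds(mu):
    """Dynamic thresholds (28): midpoints of neighboring centroids."""
    return [(mu[i] + mu[i + 1]) / 2 for i in range(len(mu) - 1)]


def threshold_assign(u, mu):
    """Quantization Phi of (7) evaluated with the thresholds of (28)."""
    return bisect_right(midpoint_thresholds(mu), u)


# Compare both assignments on a grid that does not hit any midpoint.
mu = [0.1, 0.9, 2.2, 3.05]                      # assumed ordered centroids
grid = [-1.0 + 0.013 * k for k in range(400)]
agree = all(nearest_centroid(u, mu) == threshold_assign(u, mu) for u in grid)
```

The comparison also covers values below the smallest and above the largest centroid, where the nearest centroid is the outermost one.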

In the next subsection, we report on outcomes of computer simulations using channel model (3).

C. Results of simulations

We investigate the error performance of channel model (3), where we assume that the stochastic deviations from the means, $b_i$, $i \in \mathcal{Q}$, are taken from a zero-mean continuous uniform distribution with variance $\sigma_b^2$. Thus, the $b_i$'s lie within the range $-\sqrt{3}\sigma_b \le b_i \le \sqrt{3}\sigma_b$. We assume a uniform distribution to guarantee premise (2).

We simply initialize the centroids by $\mu_i^{(1)} = i$, $i \in \mathcal{Q}$, and iterate the assignment and updating steps as outlined above. Figure 1 shows outcomes of computer simulations for the case $n = 64$ and $q = 4$, where we compare the word error rate (WER) of conventional fixed threshold detection and the novel dynamic threshold detection based on k-means clustering classification versus the signal-to-noise ratio (SNR) defined by SNR $= -20 \log \sigma$. We plotted two cases, namely $\sigma_b = 0$ (ideal channel) and $\sigma_b = 0.1$. As a further comparison we


TABLE I
HISTOGRAM OF THE NUMBER OF ITERATIONS FOR $q = 4$, $n = 64$, AND $\sigma_b = 0.1$.

$t_o$    SNR = 17 dB (%)    SNR = 20 dB (%)
1        91.43              99.70
2        8.37               0.30
3        0.20               0

Fig. 1. Word error rate (WER) of fixed threshold detection (FTD), curve (a'), and k-means clustering detection, curve (b'), versus SNR $= -20 \log \sigma$ (dB) for $n = 64$, $q = 4$, and $\sigma_b = 0.1$. Curves (a) and (b) are shown for the case $\sigma_b = 0$ (ideal channel). Curve (c) shows the upper bound (30) to the word error rate of a fixed threshold detector for an ideal noisy channel, $q = 4$ and $n = 64$.

plotted the upper bound of the word error rate of a threshold detector for an ideal additive noise channel, given by [11]

$$\mathrm{WER} < \frac{2(q-1)}{q}\, n\, Q\!\left( \frac{1}{2\sigma} \right). \tag{30}$$

We infer that in case the channel is ideal, $\sigma_b = 0$, the error performance of k-means clustering detection is close to the performance, in both theory and simulation, of conventional fixed threshold detection. In case the channel is not ideal, $\sigma_b = 0.1$, k-means clustering detection is superior to fixed threshold detection.
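The upper bound (30) is easy to evaluate numerically. The sketch below assumes that $Q(\cdot)$ denotes the Gaussian tail probability and that the logarithm in the SNR definition is to base 10, as is usual for decibels; neither is spelled out explicitly in the text above, and the function names are hypothetical.

```python
from math import erfc, sqrt


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x) (assumed meaning of Q)."""
    return 0.5 * erfc(x / sqrt(2.0))


def wer_upper_bound(q, n, snr_db):
    """Right-hand side of (30) for SNR = -20 log10(sigma), in dB (assumed base)."""
    sigma = 10.0 ** (-snr_db / 20.0)
    return 2.0 * (q - 1) / q * n * q_function(1.0 / (2.0 * sigma))


# Evaluate the bound over the SNR range shown in Fig. 1.
bounds = [(snr, wer_upper_bound(4, 64, snr)) for snr in (17.0, 18.0, 19.0, 20.0)]
```

The bound decreases monotonically with the SNR, in line with the shape of the reference curve.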

The number of iterations, which is an important (time) complexity issue, depends on the integers $q$, $n$, and the signal-to-noise ratio, SNR. The convergence of the iteration process is guaranteed [16], but the speed of convergence is an open question that we studied by computer simulations. Table I shows results of simulations for the case $q = 4$, $n = 64$, and $\sigma_b = 0.1$ (same parameters as used in the simulations depicted in Figure 1). At an SNR of 17 dB, around 91% of the received words are detected without further iterations. For about 8% of the detected words, only one iteration of the threshold levels is needed. At an SNR of 20 dB, we found that almost no iterations are required. Thus, since in the large majority of cases no iterations are needed, we conclude that at the cost of a slight additional (time) complexity, the proposed k-means clustering classification outperforms fixed threshold detection.

Fig. 2. Word error rate (WER) of fixed threshold detection (FTD) and k-means clustering detection (cluster) versus SNR (dB) for $n = 64$, $q = 4$, $a = 0.95$, and $b = 0$. As a reference, the upper bound (30) to the word error rate of a fixed threshold detector for the ideal noisy channel is shown (ideal).

V. UNKNOWN GAIN a AND OFFSET b (SMALL RANGE OF UNCERTAINTY)

In this section, we assume that the linear channel model, see (5), $r_i = a x_i + b + \nu_i$, applies. In case the gain, $a$, is within a tolerance range close to unity and the tolerance range of the offset, $b$, is close to zero, we may directly apply the basic k-means clustering as outlined in the previous section. We require that both $a$ and $b$ are so close to their nominal values that a fixed threshold detector works correctly in the noiseless case. Then, the initialization, using the fixed threshold detector, furnishes sufficiently reliable data for the iterations to follow. From the definition of a fixed threshold decoder, see (7), we simply derive the following tolerance ranges of $a$ and $b$ that guarantee a flawlessly operating threshold detector, namely

$$\begin{aligned} & b < \vartheta_0 = \tfrac{1}{2}, \\ & \vartheta_{i-1} < a i + b < \vartheta_i, \quad 1 \le i \le q-2, \\ & a(q-1) + b > \vartheta_{q-2} = q - \tfrac{3}{2}, \end{aligned} \tag{31}$$

or

$$\begin{aligned} & b < \tfrac{1}{2}, \\ & i - \tfrac{1}{2} < a i + b < i + \tfrac{1}{2}, \quad 1 \le i \le q-2, \\ & a(q-1) + b > q - \tfrac{3}{2}. \end{aligned} \tag{32}$$
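The tolerance conditions (32) translate directly into a small check. The function name is hypothetical, and the sketch only tests the noiseless levels $a i + b$ against the fixed decision regions.

```python
def within_ftd_tolerance(a, b, q):
    """True if gain a and offset b satisfy all three conditions of (32)."""
    if not b < 0.5:                                   # level of symbol 0
        return False
    for i in range(1, q - 1):                         # inner symbols 1..q-2
        if not (i - 0.5 < a * i + b < i + 0.5):
            return False
    return a * (q - 1) + b > q - 1.5                  # level of symbol q-1


# The parameters a = 0.95, b = 0 with q = 4 lie within the range.
ok_small_mismatch = within_ftd_tolerance(0.95, 0.0, 4)
# A gain of 1.5 with q = 4 violates the condition for i = 1.
ok_large_mismatch = within_ftd_tolerance(1.5, 0.0, 4)
```

When the check fails, a fixed threshold detector misreads at least one symbol value even without noise, which is the situation treated with constrained coding in Section VI.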

Figure 2 shows outcomes of computer simulations, where we compare, for the case $n = 64$ and $q = 4$, the word error rate (WER) of fixed threshold detection and detection based on k-means clustering versus the signal-to-noise ratio (SNR), where the channel gain equals $a = 0.95$ and the offset equals $b = 0$. We conclude that the cluster detector shows a greater resilience in the face of unknown gain, $a$, and additive noise than the fixed threshold detector.


In the above case, the parameters $a$ and $b$ are assumed to have a limited range of uncertainty. In case, however, they have a wider tolerance range than prescribed by (32), it is not possible to unambiguously detect the codeword with a fixed threshold detector. The detector needs assistance, and constrained coding is applied to assist in overcoming this difficulty as discussed in the next section.

VI. UNKNOWN GAIN a AND OFFSET b (LARGE RANGE OF UNCERTAINTY)

In this section, we focus on the situation where we anticipate that both parameters $a$ and $b$ have such a great range of possible values that a fixed threshold detector fails in the majority of cases, even in the noiseless case. In the next subsection, we show, by example, that in such a case it is impossible to distinguish between certain nettlesome situations, and constrained coding becomes a requirement to solve the ambiguity.

A. Constrained coding

When we have to cope with larger uncertainties of both parameters $a$ and $b$, we face an ambiguity problem. For example, let $q = 5$, and let $(2, 4, 4)$ be the received vector. Clearly, it is impossible to distinguish between the two choices, where the sent codeword is $(2, 4, 4)$ and $a = 1$, or where the sent codeword is $(1, 2, 2)$ and $a = 2$. Let $\mathcal{S}$ be the adopted codebook, then we can cope with the above ambiguity if $(2, 4, 4) \in \mathcal{S}$ implies $(1, 2, 2) \notin \mathcal{S}$, or vice versa. The name Pearson code was coined for a set of codewords that can be uniquely decoded by a detector immune to large uncertainties in both $a > 0$ and $b$ [13]. Codewords in a Pearson code, $\mathcal{S}$, satisfy two conditions, namely

• Property A: If $\mathbf{x} \in \mathcal{S}$ then $c_1 + c_2 \mathbf{x} \notin \mathcal{S}$ for all $c_1, c_2 \in \mathbb{R}$ with $(c_1, c_2) \ne (0, 1)$ and $c_2 > 0$.

• Property B: $\mathbf{x} = (c, c, \ldots, c) \notin \mathcal{S}$ for all $c \in \mathbb{R}$.

We adopt a Pearson code that has codewords with at least one '0' symbol and at least one '$q-1$' symbol. We may easily verify that such codewords satisfy Properties A and B. The number of allowable $n$-symbol codewords equals [13]

$$|\mathcal{S}| = q^n - 2(q-1)^n + (q-2)^n, \quad q > 1. \tag{33}$$

For the binary case, $q = 2$, we simply find that

$$|\mathcal{S}| = 2^n - 2$$

(both the all-'1' and all-'0' words are deleted).
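The count (33) can be verified by brute-force enumeration for small parameters. The sketch below is illustrative; the function names are hypothetical, and the enumeration is only practical for small $n$ and $q$.

```python
from itertools import product


def pearson_code_size(n, q):
    """Closed-form codebook size of (33)."""
    return q ** n - 2 * (q - 1) ** n + (q - 2) ** n


def enumerate_code(n, q):
    """All n-symbol q-ary words with at least one '0' and one 'q-1' symbol."""
    return [w for w in product(range(q), repeat=n) if 0 in w and (q - 1) in w]


# Compare the enumeration with (33) for a few small cases.
matches = all(
    len(enumerate_code(n, q)) == pearson_code_size(n, q)
    for n in range(1, 6)
    for q in range(2, 5)
)
```

The formula follows from inclusion-exclusion: from all $q^n$ words, remove those without a '0' and those without a '$q-1$', and add back the $(q-2)^n$ words that lack both.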

B. Revised k-means clustering using min-max initialization

Here it is assumed that the parameters $a$ and $b$ are completely unknown, except for the sign of $a$, $a > 0$. Due to the large uncertainty, we cannot adopt the elementary choice of the initial values of the centroids $\mu_i^{(1)}$ as described in Section IV. We propose, following the min-max detector technique described in Subsection III-B, the choice of the initial centroids $\mu_i^{(1)}$ using the minimum, $\min_i r_i$, and maximum value, $\max_i r_i$, of the received symbols. The Pearson code guarantees at least one '0' symbol and also at least one '$q-1$' symbol in a codeword. The detector may therefore use the

Fig. 3. Word error rate (WER) versus SNR (dB) for the case $q = 4$, $n = 64$ and gain $a = 1.5$ of a) prior art min-max detector as described in Subsection III-B, b) k-means clustering algorithm, and c) upper bound (30) of an ideal fixed threshold detector. Note that the signal-to-noise ratio is defined by SNR $= -20 \log(\sigma / a)$. The error performance is independent of the offset $b$.

minimum and maximum value of the received symbols as anchor points defining the range of values of the symbols in the received vector. To that end, let

$$\alpha_0 = \min_i r_i \quad \text{and} \quad \alpha_1 = \max_i r_i. \tag{34}$$

The $q$ initial centroids, $\mu_i^{(1)}$, are found by the interpolation

$$\mu_i^{(1)} = \alpha_0 + (\alpha_1 - \alpha_0) \frac{i}{q-1}, \quad 0 \le i \le q-1. \tag{35}$$

Note that the above initialization step of the modified k-means clustering technique has the same effect as the scaling used in the min-max detector (11). Figure 3 shows results of computer simulations for the case $q = 4$ and $n = 64$ and a gain $a = 1.5$. For normalization purposes, we define the SNR by SNR $= -20 \log(\sigma / a)$. We compared the prior art min-max detector (dynamic threshold detection) with the k-means clustering detection algorithm. The detector based on k-means clustering outperforms the prior art min-max detector.
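The min-max initialization (34), (35) and its relation to the scaled thresholds (11) can be illustrated as follows. The function names are hypothetical, and the check uses the fact that nearest-centroid assignment corresponds to thresholds at the midpoints of neighboring centroids, as derived in Subsection IV-B.

```python
def min_max_centroids(r, q):
    """Initial centroids of (35), interpolated between min and max of r."""
    alpha0, alpha1 = min(r), max(r)                    # anchor points (34)
    return [alpha0 + (alpha1 - alpha0) * i / (q - 1) for i in range(q)]


def min_max_thresholds(r, q):
    """Scaled thresholds (11) built from the estimates (9) and (10)."""
    a_hat = (max(r) - min(r)) / (q - 1)
    b_hat = min(r)
    return [a_hat * (i + 0.5) + b_hat for i in range(q - 1)]


# Midpoints of the initial centroids coincide with the min-max thresholds.
r = [0.7, 5.2, 2.2, 3.7]                               # assumed received values
mu = min_max_centroids(r, 4)
mids = [(mu[i] + mu[i + 1]) / 2 for i in range(len(mu) - 1)]
same = all(abs(m - t) < 1e-12 for m, t in zip(mids, min_max_thresholds(r, 4)))
```

The first assignment step of the revised algorithm therefore reproduces the decision of the min-max detector, and the subsequent iterations refine it.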

In the next subsection, we discuss a second modiﬁcation to

the basic k-means clustering method using regression analysis.

C. Revised k-means clustering algorithm using regression analysis

We adopt a second modification to the clustering algorithm of Section IV. In the basic updating step (24), the $k$ cluster centroids are updated by computing a new mean of the members in that cluster only. Here we assume that the linear channel model, $r_i = a x_i + b + \nu_i$, described by (5) holds. We have investigated an alternative method for updating the centroids, $\mu_j^{(t+1)}$, by applying the well-known linear regression model [17] that estimates the two coefficients $a$ and $b$ instead of the $q$ centroids $\mu_i$.

We start and initialize as described in the previous subsection, where the $q$ initial centroids, $\mu_i^{(1)}$, are found by the interpolation

$$\mu_i^{(1)} = \alpha_0 + (\alpha_1 - \alpha_0) \frac{i}{q-1}, \quad 0 \le i \le q-1, \tag{36}$$


where, as in (34),

$$\alpha_0 = \min_i r_i \quad \text{and} \quad \alpha_1 = \max_i r_i. \tag{37}$$

For the offset-only case, $a = 1$, we have

$$\mu_i^{(1)} = \alpha_0 + i, \quad i = 0, \ldots, q-1. \tag{38}$$

After the initialization, we iterate the next two steps until

equilibrium is reached.

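• Assignment step: Assign the $n$ received symbols, $r_i$, to the $k$ sets $V_j^{(t+1)}$. If $r_i$, $1 \le i \le n$, is closest to $\mu_\ell^{(t)}$, or

$$\ell = \arg\min_{j} \left( r_i - \mu_j^{(t)} \right)^2, \tag{39}$$

then $r_i$ is assigned to $V_\ell^{(t+1)}$. The (temporary) decoded codeword, denoted by

$$\hat{\mathbf{x}}^{(t)} = (\hat{x}_1^{(t)}, \ldots, \hat{x}_n^{(t)}), \tag{40}$$

is found by

$$\hat{x}_i^{(t)} = \varphi_{V^{(t)}}(r_i), \quad 1 \le i \le n, \tag{41}$$

where $\varphi_{V^{(t)}}(r_i) = j$ such that $r_i \in V_j^{(t)}$.

• Updating step: Updates of the means $\mu_j^{(t+1)}$, $j \in \mathcal{Q}$, are found by a linear regression model that estimates the coefficients $a$ and $b$. To that end, define the linear regression model

$$\hat{r}_i = \hat{a}^{(t)} \hat{x}_i^{(t)} + \hat{b}^{(t)}, \tag{42}$$

where the (real-valued) regression coefficients $\hat{a}^{(t)}$ and $\hat{b}^{(t)}$, chosen to minimize $\sum_{i=1}^{n} (r_i - \hat{r}_i)^2$, denote the estimates of the unknown quantities $a$ and $b$. The regression coefficients $\hat{a}^{(t)}$ and $\hat{b}^{(t)}$ are found by invoking the well-known linear regression method [17], and we find

$$\hat{b}^{(t)} = \bar{r} - \hat{a}^{(t)}\, \overline{\hat{x}^{(t)}} \tag{43}$$

and

$$\hat{a}^{(t)} = \frac{\sum_{i=1}^{n} (r_i - \bar{r}) \left( \hat{x}_i^{(t)} - \overline{\hat{x}^{(t)}} \right)}{\sigma_{\hat{x}^{(t)}}^2} = \frac{\sigma_r}{\sigma_{\hat{x}^{(t)}}}\, \rho_{\mathbf{r}, \hat{\mathbf{x}}^{(t)}}. \tag{44}$$

We note that for all $\mathbf{x} \in \mathcal{S}$, $\sigma_{\hat{x}^{(t)}} \ne 0$ since Property B holds, see Subsection VI-A. The updated $\mu_i^{(t+1)}$, $i = 0, \ldots, q-1$, are found by the interpolation

$$\mu_i^{(t+1)} = \hat{a}^{(t)} i + \hat{b}^{(t)}. \tag{45}$$

For the offset-only case, $a = 1$, we simply find

$$\hat{b}^{(t)} = \bar{r} - \overline{\hat{x}^{(t)}}, \tag{46}$$

and

$$\mu_i^{(t+1)} = i + \hat{b}^{(t)} = i + \bar{r} - \overline{\hat{x}^{(t)}}. \tag{47}$$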

We have conducted a myriad of computer simulations with the

above algorithms. Figure 4 compares the error performance

of the revised k-means clustering using min-max initializa-

tion versus the revised k-means clustering using regression

analysis for the case q= 16 and n= 64. The performance

difference between the two cluster methods is independent of

the unknown quantities aand b.
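The regression-based variant can be sketched as below, combining the min-max initialization (36) with the updating step (43) to (45). This is a minimal illustration under stated assumptions: the function names, the `max_iter` safeguard, and the early exit when the temporary decoded word is constant (so that the variance in (44) would be zero) are choices made for this sketch.

```python
def regression_update(r, x_hat, q):
    """Least-squares estimates (43), (44) and the new centroids (45)."""
    n = len(r)
    r_bar = sum(r) / n
    x_bar = sum(x_hat) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x_hat)      # unnormalized variance (13)
    s_rx = sum((ri - r_bar) * (xi - x_bar) for ri, xi in zip(r, x_hat))
    a_hat = s_rx / s_xx                                # gain estimate (44)
    b_hat = r_bar - a_hat * x_bar                      # offset estimate (43)
    return [a_hat * i + b_hat for i in range(q)], a_hat, b_hat


def regression_kmeans_detect(r, q, max_iter=100):
    """k-means detection with min-max initialization and regression updates."""
    alpha0, alpha1 = min(r), max(r)
    mu = [alpha0 + (alpha1 - alpha0) * i / (q - 1) for i in range(q)]   # (36)
    prev = None
    x_hat = []
    for _ in range(max_iter):                          # assumed safeguard
        # Assignment step (39): nearest centroid.
        x_hat = [min(range(q), key=lambda j: (ri - mu[j]) ** 2) for ri in r]
        if x_hat == prev or len(set(x_hat)) < 2:       # converged or degenerate
            break
        mu, a_hat, b_hat = regression_update(r, x_hat, q)
        prev = x_hat
    return x_hat


# Noiseless usage example with an assumed gain 1.5 and offset 0.7.
x = [0, 3, 1, 2, 2, 0, 3, 1]
r = [1.5 * xi + 0.7 for xi in x]
x_hat = regression_kmeans_detect(r, q=4)
```

Only two parameters are estimated per iteration instead of $q$ centroids, so every received symbol contributes to every updated centroid.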

Fig. 4. Word error rate (WER) versus SNR (dB) of a) revised k-means clustering using min-max initialization, and b) revised k-means clustering algorithm using regression method for the case $q = 16$ and $n = 64$.

VII. CONCLUSIONS

We have proposed and analyzed machine learning based on a k-means clustering technique as a detection method of encoded strings of q-ary symbols. We have analyzed the detection of distorted data retrieved from a data storage medium where user data is stored as physical features with $q$ different levels. Due to manufacturing tolerances and ageing, the $q$ levels differ from the desired, nominal, ones. Results of simulations have been presented, where the $q$ unknown level differences, called offsets, are independent stochastic variables with a uniform probability distribution. We have also evaluated the error performance of the k-means clustering detection technique for the case where the offsets are correlated and can be modelled as an unknown scale, or gain, and translation, or offset. At the cost of some additional (time) complexity, the proposed k-means clustering classification outperforms common prior art dynamic detection methods in the face of additive noise and channel mismatch.

REFERENCES

[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.

[2] R. Tibshirani, G. Walther, and T. Hastie, "Estimating the number of clusters in a data set via the gap statistic," Journal of the Royal Statistical Society, vol. 63, pp. 411-423, 2001.

[3] A. D. Gordon, Classification, 2nd edn., London: Chapman and Hall-CRC, 1999.

[4] K. A. S. Immink, "A Survey of Codes for Optical Disk Recording," IEEE J. Select. Areas Commun., vol. 19, no. 4, pp. 756-764, April 2001.

[5] K. A. S. Immink, "Coding Schemes for Multi-Level Channels with Unknown Gain and/or Offset Using Balance and Energy Constraints," IEEE International Symposium on Information Theory (ISIT), Istanbul, pp. 709-713, July 2013.

[6] H. Zhou, A. Jiang, and J. Bruck, "Balanced Modulation for Nonvolatile Memories," arXiv:1209.0744, Sept. 2012.

[7] B. Peleato, R. Agarwal, J. M. Cioffi, M. Qin, and P. H. Siegel, "Adaptive Read Thresholds for NAND Flash," IEEE Trans. Commun., vol. COM-63, pp. 3069-3081, Sept. 2015.

[8] F. Sala, R. Gabrys, and L. Dolecek, "Dynamic Threshold Schemes for Multi-Level Non-Volatile Memories," IEEE Trans. Commun., vol. COM-61, pp. 2624-2634, July 2013.

[9] K. A. S. Immink and K. Cai, "Composition Check Codes," IEEE Trans. Inform. Theory, vol. IT-64, pp. 249-256, Jan. 2018.


[10] D. Slepian, "Permutation Modulation," Proc. IEEE, vol. 53, pp. 228-236, March 1965.

[11] K. A. S. Immink and J. H. Weber, "Minimum Pearson Distance Detection for Multi-Level Channels with Gain and/or Offset Mismatch," IEEE Trans. Inform. Theory, vol. IT-60, pp. 5966-5974, Oct. 2014.

[12] K. A. S. Immink, K. Cai, and J. H. Weber, "Dynamic Threshold Detection Based on Pearson Distance Detection," IEEE Trans. Commun., vol. COM-66, no. 7, pp. 2958-2965, 2018.

[13] K. A. S. Immink, "Coding Schemes for Multi-Level Flash Memories that are Intrinsically Resistant Against Unknown Gain and/or Offset Using Reference Symbols," Electronics Letters, vol. 50, pp. 20-22, 2014.

[14] K. A. S. Immink and K. Cai, "Design of Capacity-Approaching Constrained Codes for DNA-based Storage Systems," IEEE Commun. Letters, vol. 22, pp. 224-227, Feb. 2018.

[15] E. W. Forgy, "Cluster analysis of multivariate data: efficiency versus interpretability of classifications," Biometrics, vol. 21, pp. 768-769, 1965.

[16] L. Bottou and Y. Bengio, "Convergence properties of the k-mean algorithms," Advances in Neural Information Processing Systems 7, pp. 585-592, MIT Press, 1995.

[17] R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, 5th ed. New York: Macmillan, 1995.