
Received January 6, 2020, accepted January 20, 2020, date of publication January 23, 2020, date of current version January 30, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.2968945

Dynamic Threshold Detection Using Clustering in the Presence of Channel Mismatch and Additive Noise

KEES A. SCHOUHAMER IMMINK 1, (Fellow, IEEE), AND KUI CAI 2, (Senior Member, IEEE)

1Turing Machines Inc., 3016 Rotterdam, The Netherlands

2Department of Science, Singapore University of Technology and Design (SUTD), Singapore 487372

Corresponding author: Kees A. Schouhamer Immink (immink@turing-machines.com)

This work was supported in part by the Singapore University of Technology and Design (SUTD) Start-Up Research Grant (SRG) under Grant SRLS15095, in part by the Singapore Ministry of Education Academic Research Fund Tier 2 under Grant MOE2016-T2-2-054, and in part by the RIE2020 Advanced Manufacturing and Engineering (AME) Programmatic under Grant A18A6b0057.

ABSTRACT We report on the feasibility of k-means clustering techniques for the dynamic threshold detection of encoded q-ary symbols transmitted over a noisy channel with partially unknown channel parameters. We first assess the performance of the k-means clustering technique without dedicated constrained coding. We then apply constrained codes that allow a wider range of channel uncertainties, thus improving the detection reliability.

INDEX TERMS Data storage systems, non-volatile memories, channel mismatch, k-means clustering, dynamic threshold detection.

I. INTRODUCTION

In data storage products, user data are written into physical attributes that can be magnetic [1], electronic [2], optical [3], or even biological such as DNA [4]. Due to inevitable process variations, called noise, the magnitude of the physical attributes may deviate from their nominal values. In addition, we have a long-term instability, called drift [5], which may lead over time to changes of the physical attributes. As the detector is ignorant of the long-term drift of the physical attributes, this leads to a phenomenon called channel mismatch. Channel mismatch without dynamic detection may lead to significant degradation of the error performance, as shown, for example, in [6]. As not all code blocks in a random-access memory are written at the same time, the drift at read time, which is a function of the time elapsed since writing the data blocks, may vary per code block. While process variations may vary from symbol to symbol, it is assumed here that the drift term is constant within a block of symbols, but may vary from code block to code block. This change of drift per code block precludes the use of conventional dynamic drift estimation, which is based on previously retrieved codewords. As a consequence, the detector, ignorant of the actual drift, must estimate the drift on a per-block basis and adjust its detector parameters to optimize its performance.

(The associate editor coordinating the review of this manuscript and approving it for publication was Wen-Long Chin.)

The detector's resilience to unknown mismatch caused by drift can be improved in various ways, for example, by employing coding techniques. Balanced codes [6]–[9] and composition check codes [10], [11], in conjunction with Slepian's optimal detection [12], offer excellent resilience in the face of channel mismatch on a per-block basis. These coding and signal processing techniques are, however, often considered too expensive in terms of code redundancy and hardware, in particular for high-speed applications.

Recently, Pearson distance detectors have been advocated as they are immune to mismatch on a per-block basis [6]. For the binary case, q = 2, the redundancy is attractively low and the complexity of the Pearson detector scales with n. Neural network-based dynamic threshold detection for non-volatile memories has been presented in [5]. For larger alphabet size, q, the number of arithmetical operations grows exponentially with the codeword length, n, so that these methods become impractical [13]. Alternative detection methods for larger q and n that are less costly in resources are therefore welcome.

We investigate new detection techniques for q-ary data, q > 2, conveyed over channels distorted by additive noise and unknown drift of the channel parameters. The new detection techniques are based on k-means cluster analysis [14]–[16], which classifies the n received symbols into q clusters. Detection with given initial estimates of the (unknown) channel parameters iterates to improved estimates of the sent n-symbol codeword. In this work, we focus on the application of the clustering technique to signal detection for non-volatile memory (NVM) technologies, which typically have microsecond to nanosecond read access times and ultra-low power consumption. We choose the k-means clustering algorithm to illustrate our ideas, since it is simple and effective, and can satisfy the high-speed and low-complexity requirements of NVMs.

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

Section II commences with preliminaries and a description of the channel model. Prior art detection schemes are discussed in Section III. In Section IV, we present a new detection technique based on k-means clustering, and the error performance of the prior art and new schemes is assessed. In Sections V and VI, we adopt a simple linear channel model for improving the detection quality.

II. PRELIMINARIES AND CHANNEL MODEL

Let S ⊆ Q^n be a codebook of n-symbol codewords x = (x_1, x_2, ..., x_n) over the q-ary alphabet Q = {0, ..., q−1}. The codeword, x ∈ S, is translated into physical attributes, where the logical symbols, x_i = k ∈ Q, are written at a nominal (physical) level k. Due to signal-dependent drift, which depends on the original physical level k, the physical level at read time equals k + b_k, where the drift term b_k ∈ R denotes the average deviation from the nominal value k. The q drift parameters b_k are average deviations from the nominal levels, and they are assumed to be relatively small with respect to the (normalized) unity difference between neighboring signal levels. Besides the distortion by unknown drift, the received symbols are distorted by unknown additive noise caused by random telegraph noise [17], [18] and fabrication process variations during the write operation. We are now in a position to write down the channel model.

Let the codeword, x, consisting of n q-ary symbols, be sent or stored. The symbols, r_i, of the retrieved vector r = (r_1, ..., r_n) are distorted by unknown additive noise and signal-dependent drift terms, and given by

r_i = x_i + b_{x_i} + ν_i,  1 ≤ i ≤ n.  (1)

The received word, r, is corrupted by (assumed) additive Gaussian noise ν = (ν_1, ..., ν_n), where ν_i ∈ R are zero-mean independent and identically distributed (i.i.d.) noise samples with normal distribution N(0, σ²). The quantity σ² ∈ R denotes the noise variance. Since the q signal-dependent drift terms, b_{x_i} = b_k, x_i = k ∈ Q, are fixed for an n-symbol word, and may vary per block of n symbols, previously received n-symbol blocks cannot be used to estimate the unknown drift terms. Thus, the estimation of the q real-valued b_k's and the n integer-valued x_i's must be accomplished using only the n observed r_i's, see (1), plus the assumptions on the distributions of the noise ν_i and the signal-dependent drift b_k.

Unambiguous solution of (1) is not possible for large values of the signal-dependent drift term b_k, and restrictions must be made on its range. For unambiguous detection we must avoid overlap between the physical level, k + b_k, associated with the logical symbol k ∈ Q, and its neighboring physical level k + 1 + b_{k+1} associated with the logical symbol k + 1. In an extreme example, we may have b_0 = 1 and b_1 = −1, so that, as a result, the logical values of the '0's and '1's and their representations by the physical levels are swapped, which should be avoided. Such an undesired situation is avoided if we stipulate that

b_0 < 1 + b_1 < 2 + b_2 < ··· < q − 1 + b_{q−1},  (2)

or, equivalently,

b_{k−1} − b_k < 1,  1 ≤ k ≤ q − 1.  (3)

We first design, in Section IV, a detector for the above model (1), where the q unknown offsets, b_k, are uncorrelated. Thereafter, in Sections V and VI, we distinguish two special cases, where the q unknown b_k's are correlated. For the first model, we assume

b_k = (a − 1)k + b,  (4)

where a is an unknown attenuation, or gain, of the channel, and b is an unknown offset, a, b ∈ R. Using (1), we simply find

r_i = a x_i + b + ν_i.  (5)

Note that for the binary case, q = 2, we can always rewrite (1) as (5), substituting b = b_0 and a = 1 + b_1 − b_0. The condition (3) implies a > 0. For the second model, called the offset-only model, a = 1, all q drift terms, b_k, are equal, so that we simply obtain r_i = x_i + b + ν_i.
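As a concrete illustration, channel model (1) and its linear special case (5) can be simulated in a few lines. This is a sketch under the paper's assumptions (bounded signal-dependent drift, additive Gaussian noise); the function name and parameter choices are ours, not the authors'.

```python
import numpy as np

def transmit(x, b, sigma, rng):
    """Channel model (1): r_i = x_i + b_{x_i} + nu_i.

    x     : integer codeword over the alphabet {0, ..., q-1}
    b     : length-q array of signal-dependent drift terms b_k
    sigma : standard deviation of the additive Gaussian noise
    """
    x = np.asarray(x)
    return x + b[x] + rng.normal(0.0, sigma, size=x.shape)

rng = np.random.default_rng(1)
q, n, sigma_b = 4, 8, 0.1
x = rng.integers(0, q, size=n)
# Drift terms from a zero-mean uniform distribution with variance sigma_b^2,
# as used in the simulations of Section IV-C.
b = rng.uniform(-np.sqrt(3) * sigma_b, np.sqrt(3) * sigma_b, size=q)
r = transmit(x, b, sigma=0.05, rng=rng)
```

For the linear model (5), one would instead set b_k = (a − 1)k + b for each level k, so that the same routine produces r_i = a x_i + b + ν_i.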

We start with a brief discussion of prior art detection methods.

III. PRIOR ART DETECTION SCHEMES

We discuss three prior art detection schemes and their relevant

attributes.

A. FIXED THRESHOLD DETECTION (FTD)

A conventional fixed threshold detector (FTD), also called a symbol-by-symbol detector, straightforwardly quantizes the n real-valued r_i's to integers, x̂_i ∈ Q. Let the threshold function be denoted by x̂_i = Φ_ϑ(r_i), x̂_i ∈ Q, where the threshold vector ϑ = (ϑ_0, ..., ϑ_{q−2}) has q − 1 (real) elements, called thresholds or threshold levels. The threshold vector satisfies the order

ϑ_0 < ϑ_1 < ϑ_2 < ··· < ϑ_{q−2}.  (6)

The quantization function, Φ_ϑ(u), of the threshold detector is defined by

Φ_ϑ(u) = 0 for u < ϑ_0;  Φ_ϑ(u) = i for ϑ_{i−1} ≤ u < ϑ_i, 1 ≤ i ≤ q − 2;  Φ_ϑ(u) = q − 1 for u ≥ ϑ_{q−2}.  (7)


For a fixed threshold detector the q − 1 detection thresholds, ϑ_i, are equidistant at the threshold levels

ϑ_i = 1/2 + i,  0 ≤ i ≤ q − 2.  (8)

Threshold detection has been applied for its implementation simplicity. However, we may pay the price for its simplicity elsewhere, as the error performance seriously degrades in the face of channel mismatch [6]. A detector that dynamically adjusts the threshold vector, ϑ, is a necessity that offers solace in the face of channel mismatch. A typical example of such a detector is described in the next subsection.
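A fixed threshold detector with the equidistant thresholds (8) reduces to a one-line quantizer. The sketch below (our own Python rendering, not the authors' code) implements (7) with a sorted search:

```python
import numpy as np

def fixed_thresholds(q):
    # Equidistant threshold levels (8): theta_i = 1/2 + i, 0 <= i <= q-2.
    return 0.5 + np.arange(q - 1)

def threshold_detect(r, thresholds):
    # Quantization function (7): each symbol estimate equals the number of
    # thresholds at or below r_i (side='right' makes the lower bound inclusive).
    return np.searchsorted(thresholds, np.asarray(r), side='right')
```

For example, with q = 4 the thresholds are (0.5, 1.5, 2.5), and a received value of 1.2 quantizes to symbol 1.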

B. DYNAMIC THRESHOLD DETECTION (MIN-MAX DETECTOR)

We assume that the channel model is, see (5), r_i = a x_i + b + ν_i, where the gain, a > 0, and offset, b, are unknown parameters, except for the sign of a. In case S = Q^n, that is, all possible codewords are allowed, mismatch-immune detection on a per-block basis is not possible since such a detector cannot distinguish between the word x̂ and its shifted and scaled version ŷ = c_1 x̂ + c_2. A designer must judiciously select codewords from Q^n with adequate constraints that enable mismatch-immune detection. For example, we select for S those codewords in which the symbols '0' and 'q − 1' are both present at least once. For the binary case, q = 2, this implies a slight redundancy as only the all-'0' and all-'1' words have to be barred, see Subsection VI-A for details. Then, the detector can straightforwardly estimate the gain and offset by

â = (max_i r_i − min_i r_i) / (q − 1)  (9)

and

b̂ = min_i r_i,  (10)

where â and b̂ denote the estimates of the actual channel gain and offset. The dynamic thresholds, denoted by ϑ̂_i, are scaled in a similar fashion as the received codeword, that is,

ϑ̂_i = â ϑ_i + b̂,  0 ≤ i ≤ q − 2.  (11)

It has been shown in [19] that the min-max detector operates satisfactorily over a large range of unknown parameters a and b. However, since â and b̂ are biased estimates of a and b, the above dynamic threshold detector loses error performance with respect to the ideal matched case, especially for larger codeword lengths n. The detector complexity scales linearly with n, as its principal cost is finding the maximum and minimum of the n received symbol values using (9) and (10). Alternatively, detection based on the prior art Pearson distance, discussed in the next subsection, improves the error performance, but with mounting hardware requirements.
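The min-max estimates (9)–(11) translate directly into code. A hypothetical sketch (our own function name), assuming the codebook guarantees at least one '0' and one 'q − 1' symbol per codeword:

```python
import numpy as np

def minmax_detect(r, q):
    """Min-max dynamic threshold detection, eqs. (9)-(11)."""
    r = np.asarray(r, dtype=float)
    a_hat = (r.max() - r.min()) / (q - 1)              # gain estimate (9)
    b_hat = r.min()                                    # offset estimate (10)
    # Scale the fixed thresholds (8) by the estimated gain and offset (11).
    theta = a_hat * (0.5 + np.arange(q - 1)) + b_hat
    return np.searchsorted(theta, r, side='right')
```

In the noiseless case the estimates are exact whenever the codeword contains both extreme symbols, so the detector recovers the sent word for any a > 0 and any b.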

C. PEARSON DISTANCE DETECTION

Immink and Weber [6] advocated the Pearson distance instead of the conventional Euclidean distance for improving the error performance of a mismatched noisy channel. We first define two quantities, namely the vector average of the n-vector z,

z̄ = (1/n) Σ_{i=1}^{n} z_i,  (12)

and the (unnormalized) vector variance of z,

σ_z² = Σ_{i=1}^{n} (z_i − z̄)².  (13)

The Pearson distance, δ_p(r, x̂), between the received vector r and a codeword x̂ ∈ S is defined by

δ_p(r, x̂) = 1 − ρ_{r,x̂},  (14)

where

ρ_{r,x̂} = Σ_{i=1}^{n} (r_i − r̄)(x̂_i − x̂̄) / (σ_r σ_{x̂})  (15)

is the (Pearson) correlation coefficient. It is assumed that both codewords x and x̂ are taken from a judiciously chosen codebook S, whose properties are explained in Subsection VI-A.

A detector based on the Pearson distance outputs the codeword

x_o = argmin_{x̂ ∈ S} δ_p(r, x̂).  (16)

It can easily be verified that the minimization of δ_p(r, x̂), and thus x_o, is independent of both a and b, so that the detection quality is immune to unknown drift of the quantities a and b on a per-block basis. The minimization operation (16) requires |S| computations, which is impractical for larger S. The number of computations can be reduced to K, the number of constant composition codes that constitute the codebook S, given by [6]

K = (n + q − 3 choose q − 1).  (17)

For the binary case, q = 2, we have K = n − 1, so that the detection algorithm (16) scales linearly with n. For the non-binary case, it is hard to compute or simulate the error performance of minimum Pearson distance detection in a relevant range of q and n as the number, K, of operations grows rapidly with both q and n.
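For small codebooks, the minimization (16) can be carried out exhaustively. The sketch below (illustrative Python; names are ours) implements (14)–(16) directly:

```python
import numpy as np
from itertools import product

def pearson_distance(r, x_hat):
    """Pearson distance (14): 1 minus the correlation coefficient (15)."""
    r = np.asarray(r, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    rc = r - r.mean()
    xc = x_hat - x_hat.mean()
    return 1.0 - rc.dot(xc) / np.sqrt(rc.dot(rc) * xc.dot(xc))

def pearson_detect(r, codebook):
    """Exhaustive minimization (16) over the codebook S."""
    return min(codebook, key=lambda x_hat: pearson_distance(r, x_hat))
```

Because δ_p depends on r only through its centered, scaled version, scaling r by a > 0 or adding an offset b leaves the minimizer unchanged, which is the immunity property claimed for (16).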

The three prior art detection methods discussed above have drawbacks in error performance and/or complexity; to alleviate these drawbacks, viable alternatives are sought. In the next section, we propose and investigate a novel detection method with lower complexity requirements, which is based on clustering techniques.

IV. DATA DETECTION USING K-MEANS CLUSTERING

In the next subsection we describe the basic k-means clustering algorithm, and present results of simulations for the mismatched noisy channel.


A. BASIC K-MEANS CLUSTERING ALGORITHM

We assume that a cluster is associated with one of the q, q > 2, symbol values. The k-means clustering technique, see for example [15], aims to partition the n received symbols r_i into k = q clusters (sets), V = {V_0, V_1, ..., V_{k−1}}, by minimizing the within-cluster sum of squares, defined by

argmin_V Σ_{i=0}^{k−1} Σ_{r_j ∈ V_i} (r_j − μ_i)²,  (18)

where the centroid μ_i is the mean of the received symbols in cluster V_i, or

μ_i = (1/|V_i|) Σ_{r_j ∈ V_i} r_j.  (19)

The binary case, q = 2, has been studied in the context of symbol timing recovery by Zhao et al. [14], where the two pulse amplitudes are unknown.

The k-means clustering algorithm is an iteration routine that finds a solution of (18). At the start of the iteration, the sets V_i^(1), 0 ≤ i ≤ k − 1, are void. The superscript integer in parentheses, (t), t = 1, 2, ..., denotes the iteration count. In addition, we initialize the k centroids μ_i^(1), 0 ≤ i ≤ k − 1, by a 'reasonable' choice. The choice of the initial centroids has a bearing on the word error rate and the speed of convergence of the iteration process. Forgy's method [20], for example, a well-known method described in the literature, randomly chooses k symbols (assuming k < n), r_i, from the n-symbol received vector r, and sets these as the initial centroids μ_i^(1), 1 ≤ i ≤ k. Forgy's approach is not followed here, as we developed improved initial centroids aiming to reduce the number of iterations and the word error rate.

Let the order of the centroids be given by

μ_0^(t) < μ_1^(t) < ··· < μ_{q−1}^(t).  (20)

We iterate the next two steps until the symbol assignments no longer change and the iteration routine has come to a halt.

• Assignment step: The n r_i's, 1 ≤ i ≤ n, are assigned to the k sets V_j^(t+1) as follows. If r_i is closest to the centroid μ_ℓ^(t), or

ℓ = argmin_j (r_i − μ_j^(t))²,  (21)

then r_i is assigned to cluster V_ℓ^(t+1). The elements of the (intermediate) decoded codeword, denoted by x̂^(t) = (x̂_1^(t), ..., x̂_n^(t)), are found by

x̂_i^(t) = φ_{V^(t)}(r_i),  1 ≤ i ≤ n,  (22)

where φ_{V^(t)}(r_i) = j such that r_i ∈ V_j^(t).

• Updating step: The updated means μ_j^(t+1), j ∈ Q, are found by

μ_j^(t+1) = (1/|V_j^(t+1)|) Σ_{r_i ∈ V_j^(t+1)} r_i,  j ∈ Q.  (23)

If |V_j^(t+1)| = 0 then we set μ_j^(t+1) = μ_j^(t) (that is, no update).

We run the above algorithm until the intermediate decoded word is unchanged at iteration step t = t_o, that is, we have x̂^(t_o−1) = x̂^(t_o). Then the final estimate of the sent codeword is x_o = x̂^(t_o). The k-means clustering algorithm always converges to a simple steady state, and limit cycles do not occur [21]. Note that the process may reach a local minimum of the within-cluster sum of squares (18).
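The two iterated steps can be sketched as follows. This is an illustrative Python rendering of (21)–(23), using the nominal initialization μ_i^(1) = i adopted in Section IV-C; it is not the authors' code, and the iteration cap is our own safeguard.

```python
import numpy as np

def kmeans_detect(r, q, max_iter=50):
    """Basic k-means clustering detection (Section IV).

    Starts from the nominal centroids mu_i = i and iterates the
    assignment step (21)-(22) and the updating step (23) until the
    intermediate decoded word no longer changes.
    """
    r = np.asarray(r, dtype=float)
    mu = np.arange(q, dtype=float)        # initial centroids mu_i^(1) = i
    x_prev = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid, eq. (21).
        x_hat = np.argmin((r[:, None] - mu[None, :]) ** 2, axis=1)
        if x_prev is not None and np.array_equal(x_hat, x_prev):
            break                          # decoded word unchanged: halt
        # Updating step: new centroid = cluster mean, eq. (23);
        # empty clusters keep their previous centroid.
        for j in range(q):
            members = r[x_hat == j]
            if members.size > 0:
                mu[j] = members.mean()
        x_prev = x_hat
    return x_hat, mu
```

At high SNR the first assignment is usually already correct, in line with the iteration histogram reported in Table 1.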

B. ASSIGNMENT STEP

Let us focus on the assignment step of the k-means clustering technique given by (21). Considering the order (20) of the centroids μ_j^(t), we conclude that the symbol r_i lies between, say, μ_u^(t) ≤ r_i ≤ μ_{u+1}^(t), 0 ≤ u ≤ q − 2. Thus, we have

ℓ = argmin_{j ∈ {u, u+1}} (r_i − μ_j^(t))².

As

(r_i − μ_u^(t))² − (r_i − μ_{u+1}^(t))² = 2 r_i (μ_{u+1}^(t) − μ_u^(t)) + (μ_u^(t))² − (μ_{u+1}^(t))² = (μ_{u+1}^(t) − μ_u^(t)) (2 r_i − μ_u^(t) − μ_{u+1}^(t)),  (24)

we obtain

ℓ = u if r_i < (μ_u^(t) + μ_{u+1}^(t))/2, and ℓ = u + 1 if r_i > (μ_u^(t) + μ_{u+1}^(t))/2.  (25)

Using (7), we obtain

ℓ = argmin_j (r_i − μ_j^(t))² = Φ_ϑ̂(r_i),  (26)

where

ϑ̂_i = (μ_i^(t) + μ_{i+1}^(t))/2,  0 ≤ i ≤ q − 2.  (27)

The vector x̂^(t) thus equals

x̂_i^(t) = Φ_ϑ̂(r_i),  1 ≤ i ≤ n.  (28)

From the above, we see that the k-means clustering detection method, following objective function (18), is a dynamic threshold detector, where the threshold vector, ϑ̂, is iteratively updated with the means of the members of each cluster using (23).
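The equivalence is easy to check numerically: nearest-centroid assignment (21) and threshold quantization with the midpoint thresholds (27) give identical decisions. A small self-check in Python (the random values are arbitrary and our own):

```python
import numpy as np

rng = np.random.default_rng(7)
mu = np.sort(rng.uniform(0, 3, size=4))        # ordered centroids, eq. (20)
theta = (mu[:-1] + mu[1:]) / 2                 # midpoint thresholds, eq. (27)
r = rng.uniform(-1, 4, size=1000)

nearest = np.argmin((r[:, None] - mu[None, :]) ** 2, axis=1)   # eq. (21)
quantized = np.searchsorted(theta, r, side='right')            # eq. (26)
assert np.array_equal(nearest, quantized)
```

(The two rules can differ only when r_i falls exactly on a midpoint, an event of probability zero for continuous noise.)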

C. SIMULATIONS RESULTS

We now investigate the error performance of the channel based on model (1). The q unknown drift parameters, b_k, k ∈ Q, are assumed to be taken from a zero-mean continuous uniform distribution with variance σ_b². Clearly, the parameters b_k are within the range −√3 σ_b ≤ b_k ≤ √3 σ_b. At the beginning of the iterative detection process, the centroids are set to μ_i^(1) = i, i ∈ Q, and the decoder iterates the assignment and updating steps discussed above.

FIGURE 1. Curve (a'): WER of fixed threshold detection (FTD), and curve (b'): k-means clustering detection versus SNR = −20 log σ (dB) for n = 64, q = 4, and σ_b = 0.1. Curves (a) and (b) show the same parameters, but σ_b = 0. Curve (c) shows the upper bound (29) on the word error rate of a fixed threshold detector for an ideal noisy channel, q = 4 and n = 64.

Figure 1 shows results of simulations for the case n = 64 and q = 4. We compare the word error rate (WER) of traditional fixed threshold detection and the newly developed dynamic threshold detection based on k-means clustering versus the signal-to-noise ratio (SNR), defined by SNR = −20 log σ. We display two cases of interest, namely σ_b = 0 (no variations, ideal channel) and σ_b = 0.1 (random drift parameters). We further display the upper bound on the word error rate of a threshold detector for the ideal additive noise channel [6],

WER < (2(q − 1)/q) · n · Q(1/(2σ)).  (29)

The error performance of k-means clustering detection for σ_b = 0 is close to that of conventional fixed threshold detection, in both theory and simulation. We infer that for σ_b = 0.1, k-means clustering detection is superior to fixed threshold detection.

An important (time) complexity issue is the number of iterations, which depends on the integers q, n, and the signal-to-noise ratio, SNR. Although the convergence of the iteration process is guaranteed [21], the speed of convergence, and thus the complexity, is an open question that we investigated by simulations. Results of simulations for n = 64, q = 4, and σ_b = 0.1 are shown in Table 1 (same parameters as used in the simulations depicted in Figure 1). At an SNR of 17 dB, around 91% of the received words are detected without further iterations. In 8% of the detected words, only a single iteration of the threshold levels is required before finishing the iteration process. Essentially no iterations are required at a higher SNR of 20 dB. As no iterations are needed in the large majority of cases, we infer that k-means clustering detection outperforms fixed threshold detection at the cost of only a slight additional (time) complexity.

TABLE 1. Histogram of the number of iterations for q = 4, n = 64, and σ_b = 0.1.

The situation is, however, slightly more subtle than can be inferred from perusing Figure 1, as a detector flaw is hidden behind the average values. We investigate this phenomenon for the binary, q = 2, case. If we look more closely at the error components, it turns out that codewords with a small or large weight, where the weight of a codeword is the number of 1's in that codeword, are much more prone to error than codewords with (almost) equal numbers of ones and zeros. Figure 2, Curve (a), shows for the k-means clustering method the WER as a function of the weight of the sent codeword, n = 12, for a binary, q = 2, ideal channel, σ_b = 0. We infer that words with a low or high weight (except the all-one and all-zero words) are more prone to error than words with a weight close to n/2, a so-called balanced word. The error performance of fixed threshold detection is independent of the weight of the sent codeword, see Figure 2, Curve (b).

FIGURE 2. WER of k-means clustering detection, curve (a), and fixed threshold detection (FTD), curve (b), versus the weight of the sent codeword at an SNR of 17 dB for n = 12, q = 2, and σ_b = 0 (ideal channel).

V. UNKNOWN GAIN AND OFFSET (SMALL RANGE OF UNCERTAINTY)

In this section, we assume that the channel model, see (5), r_i = a x_i + b + ν_i, applies. If the tolerance range of the gain, a, is close to unity and the tolerance range of the offset, b, is close to zero, we may directly apply the basic k-means clustering as outlined in the previous section. Both a and b must lie so close to their nominal values that a fixed threshold detector operates correctly in the noiseless case. Then, the initialization step, using the fixed threshold detector, furnishes sufficiently reliable data for the iterations to follow. From the definition of a fixed threshold detector, see (7), we simply derive the following tolerance ranges of a and b that guarantee a flawlessly operating threshold detector, namely

b < ϑ_0 = 1/2,
ϑ_{i−1} < a·i + b < ϑ_i,  1 ≤ i ≤ q − 2,
a(q − 1) + b > ϑ_{q−2} = q − 3/2,  (30)

or

b < 1/2,
i − 1/2 < a·i + b < i + 1/2,  1 ≤ i ≤ q − 2,
a(q − 1) + b > q − 3/2.  (31)

Results of computer simulations are displayed in Figure 3, where, for the case n = 64 and q = 4, the WER of fixed threshold detection and detection based on k-means clustering versus SNR are compared. The channel gain and offset are assumed to be a = 0.95 and b = 0, respectively. We conclude that the cluster detector shows a greater resilience to unknown gain, a, and additive noise than the fixed threshold detector, where the parameters a and b are assumed to have a limited range of uncertainty. If, however, they have a wider tolerance range than prescribed by (31), it is not possible to unambiguously detect the codeword with a fixed threshold detector. Then, the detector needs assistance, and constrained coding is applied to overcome this difficulty, as discussed in the next section.

FIGURE 3. WER of both k-means clustering detection (cluster) and fixed threshold detection versus SNR for n = 64, q = 4, a = 0.95, and b = 0. The upper bound on the word error rate of a fixed threshold detector for the ideal noisy channel given by (29) is shown as a reference.

VI. UNKNOWN GAIN AND OFFSET (LARGE RANGE OF UNCERTAINTY)

In this section, we focus on the situation where we anticipate that in (5) both parameters a and b have such a great range of possible values that a fixed threshold detector fails in the majority of cases. In the next subsection, we show, by example, that in such a case it is impossible to distinguish between certain nettlesome situations, and constrained coding becomes a requirement to resolve the ambiguity.

A. CONSTRAINED CODING

In order to cope with larger uncertainties of both parameters a and b, we face an ambiguity problem. For example, let q = 5, and let (2, 4, 4) be the received vector. Clearly, it is impossible to distinguish between the two choices, where the sent codeword is (2, 4, 4) and a = 1, or where it is (1, 2, 2) and a = 2. Let S be the adopted codebook; then we can cope with the above ambiguity if, when (2, 4, 4) ∈ S, then (1, 2, 2) ∉ S, or vice versa. The name Pearson code was coined [19] for a set of codewords that can be uniquely decoded by a detector immune to large uncertainties in both gain a, a > 0, and word offset b. Pearson code design can be found in [22]. Codewords in a Pearson code, S, satisfy two conditions, namely

• Property A: If x ∈ S then c_1 + c_2 x ∉ S for all c_1, c_2 ∈ R with (c_1, c_2) ≠ (0, 1) and c_2 > 0.
• Property B: x = (c, c, ..., c) ∉ S for all c ∈ R.

We adopt a Pearson code that has codewords with at least one '0' symbol and at least one 'q − 1' symbol. We may easily verify that such codewords satisfy Properties A and B. The number of allowable n-symbol codewords equals [19]

|S| = q^n − 2(q − 1)^n + (q − 2)^n,  q > 1.  (32)

For the binary case, q = 2, we simply find that |S| = 2^n − 2 (both the all-'1' and all-'0' words are deleted).
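The count (32) follows by inclusion-exclusion (subtract the words missing a '0' and those missing a 'q − 1', then add back the words missing both) and is easy to verify against brute-force enumeration for small n and q. An illustrative Python check with our own helper names:

```python
from itertools import product

def pearson_codebook(n, q):
    """All n-symbol q-ary words containing at least one 0 and one q-1."""
    return [w for w in product(range(q), repeat=n)
            if 0 in w and (q - 1) in w]

def codebook_size(n, q):
    # Closed-form count, eq. (32), by inclusion-exclusion.
    return q ** n - 2 * (q - 1) ** n + (q - 2) ** n

# Brute-force enumeration agrees with the closed form.
assert len(pearson_codebook(4, 3)) == codebook_size(4, 3)
```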

B. REVISED K-MEANS CLUSTERING USING MIN-MAX INITIALIZATION

Here it is assumed that the parameters a and b are completely unknown, except for the sign of a, a > 0. Due to the large uncertainty, we cannot adopt the elementary choice of the initial values of the centroids μ_i^(1) as described in Section IV. We propose, following the min-max detector technique described in Subsection III-B, to choose the initial centroids μ_i^(1) using the minimum, min_i r_i, and maximum value, max_i r_i, of the received symbols. The Pearson code guarantees at least one '0' symbol and at least one 'q − 1' symbol in a codeword. The detector may therefore use the minimum and maximum values of the received symbols as anchor points defining the range of values of the symbols in the received vector. To that end, let

α_0 = min_i r_i  and  α_1 = max_i r_i.  (33)

The q initial centroids, μ_i^(1), are found by the interpolation

μ_i^(1) = α_0 + (α_1 − α_0) i/(q − 1),  0 ≤ i ≤ q − 1.  (34)
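The initialization (33)–(34) is a one-liner; a sketch in Python with our own function name:

```python
import numpy as np

def minmax_centroids(r, q):
    """Initial centroids (34), interpolated between min and max (33)."""
    a0, a1 = np.min(r), np.max(r)
    return a0 + (a1 - a0) * np.arange(q) / (q - 1)
```

For example, a received block ranging from 0.4 to 4.9 with q = 4 yields the centroids (0.4, 1.9, 3.4, 4.9).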


Note that the above initialization step of the modified k-means clustering technique has the same effect as the scaling used in the min-max detector (11). Figure 4 shows results of computer simulations for the case q = 4 and n = 64 and a gain a = 1.5. For normalization purposes, we define the SNR by SNR = −20 log(σ/a). We compared the prior art min-max detector with the k-means clustering detection algorithm. The detector based on k-means clustering outperforms the prior art min-max detector.

FIGURE 4. WER for the case q = 4, n = 64 and gain a = 1.5 of a) prior art min-max detector as described in Subsection III-B, b) k-means clustering algorithm, and c) upper bound (29) of an ideal fixed threshold detector. Note that the signal-to-noise ratio is defined by SNR = −20 log(σ/a). The error performance is independent of the offset b.

In the next subsection, we discuss a second modification to the basic k-means clustering method using regression analysis.

C. REVISED K-MEANS CLUSTERING ALGORITHM USING REGRESSION ANALYSIS

We adopt a second modification to the clustering algorithm of Section IV. In the basic updating step (23), the k centroids are updated by computing a new mean of the members in that cluster only. Here we assume that the linear channel model, r_i = a x_i + b + ν_i, described by (5), holds. We have investigated an alternative method for updating the centroids, μ_j^(t+1), by applying the well-known linear regression model [23] that estimates the two coefficients a and b instead of the q centroids μ_i.

We start and initialize as described in the previous subsection, where the q initial centroids, μ_i^(1), are found by the interpolation

μ_i^(1) = α_0 + (α_1 − α_0) i/(q − 1),  0 ≤ i ≤ q − 1,  (35)

where, as in (33),

α_0 = min_i r_i  and  α_1 = max_i r_i.  (36)

For the offset-only case, a = 1, we have

μ_i^(1) = α_0 + i,  i = 0, ..., q − 1.  (37)

After the initialization, we iterate the next two steps until equilibrium is reached.

• Assignment step: Assign the n received symbols, r_i, to the k sets V_j^(t+1). If r_i, 1 ≤ i ≤ n, is closest to μ_ℓ^(t), or

ℓ = argmin_j (r_i − μ_j^(t))²,  (38)

then r_i is assigned to V_ℓ^(t+1). The (temporary) decoded codeword, denoted by

x̂^(t) = (x̂_1^(t), ..., x̂_n^(t)),  (39)

is found by

x̂_i^(t) = φ_{V^(t)}(r_i),  1 ≤ i ≤ n,  (40)

where φ_{V^(t)}(r_i) = j such that r_i ∈ V_j^(t).

• Updating step: Updates of the means μ_j^(t+1), j ∈ Q, are found by a linear regression model that estimates the coefficients a and b. To that end, define the linear regression model

r̂_i = â^(t) x̂_i^(t) + b̂^(t),  (41)

where the (real-valued) regression coefficients â^(t) and b̂^(t), chosen to minimize Σ_{i=1}^{n} (r_i − r̂_i)², denote the estimates of the unknown quantities a and b. The regression coefficients â^(t) and b̂^(t) are found by invoking the well-known linear regression method [23], and using (13) and (15) we find

â^(t) = Σ_{i=1}^{n} (r_i − r̄)(x̂_i^(t) − x̂̄^(t)) / Σ_{i=1}^{n} (x̂_i^(t) − x̂̄^(t))² = (σ_r / σ_{x̂^(t)}) ρ_{r,x̂^(t)}  (42)

and

b̂^(t) = r̄ − â^(t) x̂̄^(t).  (43)

We note that for all x ∈ S, σ_{x̂^(t)} ≠ 0 since Property B holds, see Subsection VI-A. The updated centroids μ_i^(t+1), i = 0, ..., q − 1, are found by the interpolation

μ_i^(t+1) = â^(t) i + b̂^(t).  (44)

For the offset-only case, a = 1, we simply find

b̂^(t) = r̄ − x̂̄^(t),  (45)

and

μ_i^(t+1) = i + b̂^(t) = i + r̄ − x̂̄^(t).  (46)
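The revised algorithm can be sketched as below (illustrative Python under the linear model (5); the function name and iteration cap are our own choices). Property B of the Pearson code guarantees that the decoded word is never constant, so the regression denominator in (42) is nonzero.

```python
import numpy as np

def regression_kmeans_detect(r, q, max_iter=50):
    """Revised k-means detection with regression-based centroid update.

    Initialization by min-max interpolation (35); the updating step fits
    r_i ~ a * x_i + b by least squares, eqs. (42)-(43), and places the
    centroids on the fitted line, eq. (44).
    """
    r = np.asarray(r, dtype=float)
    a0, a1 = r.min(), r.max()
    mu = a0 + (a1 - a0) * np.arange(q) / (q - 1)    # eq. (35)
    x_prev = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid, eq. (38).
        x_hat = np.argmin((r[:, None] - mu[None, :]) ** 2, axis=1)
        if x_prev is not None and np.array_equal(x_hat, x_prev):
            break
        # Updating step: least-squares fit of r on x_hat, eqs. (42)-(43).
        rc = r - r.mean()
        xc = x_hat - x_hat.mean()
        a_hat = rc.dot(xc) / xc.dot(xc)             # nonzero by Property B
        b_hat = r.mean() - a_hat * x_hat.mean()
        mu = a_hat * np.arange(q) + b_hat           # eq. (44)
        x_prev = x_hat
    return x_hat
```

In the noiseless case the min-max initialization is already exact whenever the codeword contains both anchor symbols '0' and 'q − 1', and the regression step then reproduces the true a and b.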

We have conducted a myriad of computer simulations with the above algorithms. Figure 5 compares the error performance of the revised k-means clustering using min-max initialization with that of the revised k-means clustering using regression analysis for the case q = 16 and n = 64. The performance difference between the two cluster methods is independent of the unknown quantities a and b.

FIGURE 5. WER of a) revised k-means clustering using min-max initialization, and b) revised k-means clustering algorithm using the regression method, both for q = 16 and n = 64.

VII. CONCLUSION

We have analyzed the k-means clustering technique as a dynamic threshold detection technique for encoded strings of q-ary symbols in the presence of signal-dependent offset and additive noise. We have evaluated the error performance of the k-means clustering detection technique for the case where the signal-dependent offsets are correlated and can be modelled as an unknown scale, or gain, and a translation, or offset. The proposed k-means clustering classification outperforms common prior art detection methods at the cost of only a slight increase in (time) complexity.

REFERENCES

[1] S. S. Garani, L. Dolecek, J. Barry, F. Sala, and B. Vasic, "Signal processing and coding techniques for 2-D magnetic recording: An overview," Proc. IEEE, vol. 106, no. 2, pp. 286–318, Feb. 2018.

[2] L. Dolecek and Y. Cassuto, "Channel coding for nonvolatile memory technologies: Theoretical advances and practical considerations," Proc. IEEE, vol. 105, no. 9, pp. 1705–1724, Sep. 2017.

[3] K. Immink, "A survey of codes for optical disk recording," IEEE J. Sel. Areas Commun., vol. 19, no. 4, pp. 756–764, Apr. 2001.

[4] G. M. Church, Y. Gao, and S. Kosuri, "Next-generation digital information storage in DNA," Science, vol. 337, no. 6102, p. 1628, Sep. 2012.

[5] Z. Mei, K. Cai, and X. Zhong, "Neural network-based dynamic threshold detection for non-volatile memories," in Proc. IEEE Int. Conf. Commun. (ICC), Shanghai, China, May 2019, pp. 1–6.

[6] K. A. S. Immink and J. H. Weber, "Minimum Pearson distance detection for multilevel channels with gain and/or offset mismatch," IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 5966–5974, Oct. 2014.

[7] H. Zhou, A. Jiang, and J. Bruck, "Balanced modulation for non-volatile memories," Sep. 2012, arXiv:1209.0744. [Online]. Available: https://arxiv.org/abs/1209.0744

[8] B. Peleato, R. Agarwal, J. M. Cioffi, M. Qin, and P. H. Siegel, "Adaptive read thresholds for NAND flash," IEEE Trans. Commun., vol. 63, no. 9, pp. 3069–3081, Sep. 2015.

[9] C. Cao and I. Fair, "Mitigation of inter-cell interference in flash memory with capacity-approaching variable-length constrained sequence codes," IEEE J. Sel. Areas Commun., vol. 34, no. 9, pp. 2366–2377, Sep. 2016.

[10] F. Sala, R. Gabrys, and L. Dolecek, "Dynamic threshold schemes for multi-level non-volatile memories," IEEE Trans. Commun., vol. 61, no. 7, pp. 2624–2634, Jul. 2013.

[11] K. A. Schouhamer Immink and K. Cai, "Composition check codes," IEEE Trans. Inf. Theory, vol. 64, no. 1, pp. 249–256, Jan. 2018.

[12] D. Slepian, "Permutation modulation," Proc. IEEE, vol. 53, no. 3, pp. 228–236, Mar. 1965.

[13] K. A. Schouhamer Immink, K. Cai, and J. H. Weber, "Dynamic threshold detection based on Pearson distance detection," IEEE Trans. Commun., vol. 66, no. 7, pp. 2958–2965, Jul. 2018.

[14] T. Zhao, A. Nehorai, and B. Porat, "K-means clustering-based data detection and symbol-timing recovery for burst-mode optical receiver," IEEE Trans. Commun., vol. 54, no. 8, pp. 1492–1501, Aug. 2006.

[15] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ, USA: Prentice-Hall, 1988.

[16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.

[17] F. M. Puglisi, N. Zagni, L. Larcher, and P. Pavan, "Random telegraph noise in resistive random access memories: Compact modeling and advanced circuit design," IEEE Trans. Electron Devices, vol. 65, no. 7, pp. 2964–2972, Jul. 2018.

[18] G. Dong, N. Xie, and T. Zhang, "Enabling NAND flash memory use soft-decision error correction codes at minimal read latency overhead," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 9, pp. 2412–2421, Sep. 2013.

[19] K. Immink, "Coding schemes for multi-level Flash memories that are intrinsically resistant against unknown gain and/or offset using reference symbols," Electron. Lett., vol. 50, no. 1, pp. 20–22, Jan. 2014.

[20] E. W. Forgy, "Cluster analysis of multivariate data: Efficiency versus interpretability of classifications," Biometrics, vol. 21, no. 3, pp. 768–769, 1965.

[21] L. Bottou and Y. Bengio, "Convergence properties of the k-means algorithms," in Proc. Adv. Neural Inf. Process. Syst. Cambridge, MA, USA: MIT Press, 1995, pp. 585–592.

[22] C. Cao and I. Fair, "Capacity-approaching variable-length Pearson codes," IEEE Commun. Lett., vol. 22, no. 7, pp. 1310–1313, Jul. 2018.

[23] R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, 5th ed. New York, NY, USA: Macmillan, 1995.

KEES A. SCHOUHAMER IMMINK (Fellow, IEEE) received the M.Eng. and Ph.D. degrees from the Eindhoven University of Technology, and the Ph.D. degree (Hons.) from the University of Johannesburg, in 2014. In 1998, he founded Turing Machines Inc., an innovative start-up focused on novel signal processing for DNA-based storage, where he is currently the President. He received the Knighthood, in 2000, the Personal Emmy Award, in 2004, the 2017 IEEE Medal of Honor, the 1999 AES Gold Medal, the 2004 SMPTE Progress Medal, the 2014 Eduard Rhein Prize for Technology, and the 2015 IET Faraday Medal. He was inducted into the Consumer Electronics Hall of Fame, and elected into the Royal Netherlands Academy of Sciences and the (U.S.) National Academy of Engineering. He served as the President of the Audio Engineering Society, New York, in 2003.

KUI CAI (Senior Member, IEEE) received the B.E. degree in information and control engineering from Shanghai Jiao Tong University, Shanghai, China, and the joint Ph.D. degree in electrical engineering from the Technical University of Eindhoven, The Netherlands, and the National University of Singapore. She is currently an Associate Professor with the Singapore University of Technology and Design (SUTD). Her main research interests are in the areas of coding theory, information theory, and signal processing for various data storage systems and digital communications. She received the 2008 IEEE Communications Society Best Paper Award in Coding and Signal Processing for Data Storage. She served as the Vice-Chair (Academia) of the IEEE Communications Society Data Storage Technical Committee (DSTC), from 2015 to 2016.
