
Properties of Binary Pearson Codes

Jos H. Weber
Delft University of Technology
Delft, The Netherlands
Email: j.h.weber@tudelft.nl

Kees A. Schouhamer Immink
Turing Machines Inc.
Rotterdam, The Netherlands
Email: immink@turing-machines.com

Abstract—We consider the transmission and storage of data that use coded symbols over a channel, where a Pearson-distance-based detector is used for achieving resilience against unknown channel gain and offset, and corruption with additive noise. We discuss properties of binary Pearson codes, such as the Pearson noise distance that plays a key role in the error performance of Pearson-distance-based detection. We also compare the Pearson noise distance to the well-known Hamming distance, since the latter plays a similar role in the error performance of Euclidean-distance-based detection.

I. INTRODUCTION

In mass data storage devices, the user data are translated into physical features that can be electronic, magnetic, optical, or of another nature. Due to process variations, the magnitude of the physical effect may deviate from the nominal values, which may affect the reliable read-out of the data. For example, over a long time, the charge in memory cells, which represents the stored data, may fade away; as a result, the main physical parameters change, resulting in channel mismatch and an increased error rate. It has been found that such retention errors are the dominant errors in solid-state memories.

The detector's ignorance of the exact value of the channel's main physical parameters [1], [2], [3], [4], a phenomenon called channel mismatch, may seriously degrade the error performance of a storage or transmission medium [5]. Researchers have been seeking methods that can withstand channel mismatch. Immink and Weber [5] advocated a novel data detection method based on the Pearson distance that offers invariance, "immunity", to offset and gain mismatch. They also showed the downside of Pearson-distance-based detection, namely that it is less resilient to additive noise than conventional Euclidean-distance-based detection.

In this paper, we investigate the relationship between the noise resilience of Euclidean distance versus Pearson distance detection. The outline of the paper is as follows. In Section II, we present the channel model under consideration, we recapitulate the relevant prior art of minimum Euclidean and Pearson distance detection, and we review the definition of Pearson codes. In Section III, we discuss properties of binary Pearson codes, such as lower and upper bounds on the error performance of detectors based on the Pearson distance, and the difference in noise resilience of minimum Euclidean versus minimum Pearson distance detection. Section IV concludes our paper.

II. BACKGROUND AND PRELIMINARIES

In this section we present some prior art, mainly from [5], and set the scene for the results of this paper.

A. Pearson distance

We start with the definition of two quantities of an n-vector of reals, z, namely the average of z, given by z̄ = (1/n) Σ_{i=1}^n z_i, and the (unnormalized) variance of z, given by

σ_z² = Σ_{i=1}^n (z_i − z̄)².  (1)

The Pearson distance between the vectors x and y in R^n is defined by

δ_p(x, y) = 1 − ρ_x,y,  (2)

where the (Pearson) correlation coefficient [6] is defined by

ρ_x,y = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / (σ_x σ_y).  (3)

It is immediate that the Pearson distance δ_p(x, y) is undefined if x or y has variance zero, i.e., is a 'constant' vector (c, c, ..., c) with c ∈ R. Further, note that the Pearson distance is not a regular metric, but a measure of similarity between the vectors x and y. It can easily be verified that the triangle inequality condition, δ_p(x, z) ≤ δ_p(x, y) + δ_p(y, z), is not always satisfied. For example, let n = 4 and let x = (0001), y = (0011), and z = (0010); then δ_p(x, z) = 1.3333, δ_p(x, y) = 0.4226, and δ_p(y, z) = 0.4226.
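The counterexample can be checked numerically. The following sketch implements (1)-(3); the helper name pearson_distance is ours, not from the paper:

```python
import math

def pearson_distance(x, y):
    """Pearson distance delta_p(x, y) = 1 - rho_{x,y}, per (2) and (3)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    num = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - xb) ** 2 for xi in x))  # unnormalized, per (1)
    sy = math.sqrt(sum((yi - yb) ** 2 for yi in y))
    return 1.0 - num / (sx * sy)

# The counterexample with n = 4 from the text:
x, y, z = (0, 0, 0, 1), (0, 0, 1, 1), (0, 0, 1, 0)
print(pearson_distance(x, z))   # 1.3333...
print(pearson_distance(x, y))   # 0.4226...
print(pearson_distance(y, z))   # 0.4226...
assert pearson_distance(x, z) > pearson_distance(x, y) + pearson_distance(y, z)
```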

B. Channel model

We assume a simple linear channel model where the sent codeword c, taken from a finite codebook S ⊂ R^n, is received as the real-valued vector

r = a(c + ν) + b·1,  (4)

where 1 is the all-one vector (1, 1, ..., 1) of length n, while a, a > 0, is an unknown gain, b ∈ R is an unknown offset, and ν = (ν_1, ..., ν_n) is additive noise with ν_i ∈ R being zero-mean independent and identically distributed (i.i.d.) noise samples with Gaussian distribution N(0, σ²), where σ² ∈ R denotes the variance. We assume that the parameters a and b vary slowly, so that during the transmission of the n symbols in a codeword the parameters a and b are fixed, but that these values may be different for the next transmitted codeword.

C. Detection

A minimum Pearson distance detector outputs a codeword according to the minimum distance decision rule

c_p = arg min_{ĉ ∈ S} δ_p(r, ĉ).  (5)

Due to the properties of the Pearson correlation coefficient such a detector is immune to gain and offset mismatch [5]. However, it is more sensitive to noise than the well-known minimum Euclidean distance detector, which outputs

c_e = arg min_{ĉ ∈ S} δ²(r, ĉ),  (6)

with

δ²(x, y) = Σ_{i=1}^n (x_i − y_i)²  (7)

being the squared Euclidean distance between x and y.
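The contrast between the decision rules (5) and (6) under channel (4) can be illustrated with a small numerical sketch. The codebook, noise sample, and helper names below are illustrative choices of ours, not from the paper; with a large unknown offset, the Euclidean detector errs while the Pearson detector does not:

```python
import math

def pearson_dist(x, y):
    """delta_p per (2)-(3)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return 1.0 - sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (sx * sy)

def sq_euclid_dist(x, y):
    """delta^2 per (7)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def detect(r, S, dist):
    """Minimum-distance decision rules (5) and (6)."""
    return min(S, key=lambda c: dist(r, c))

S = [(0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1)]    # illustrative codebook
c = (0, 0, 0, 1)                                  # sent codeword
a, b = 1.0, 1.0                                   # gain and (large) offset, unknown to the receiver
nu = (0.03, -0.02, 0.04, 0.01)                    # fixed noise sample, for reproducibility
r = [a * (ci + ni) + b for ci, ni in zip(c, nu)]  # channel (4)

print(detect(r, S, pearson_dist))    # (0, 0, 0, 1): correct despite the offset
print(detect(r, S, sq_euclid_dist))  # (0, 1, 1, 1): wrong, misled by the offset
```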

The computation of the probability that a minimum Pearson distance detector errs has been investigated in [5]. A principal finding is that it is not the (minimum) Pearson distance, δ_p(x, y), between codewords x and y, that governs the error probability, but a quantity called the Pearson noise distance, which is denoted by d(x, y). The squared Pearson noise distance, d²(x, y), between the vectors x and y is given by

d²(x, y) = 2σ_x² δ_p(x, y) = 2σ_x² (1 − ρ_x,y).  (8)

The union bound estimate of the word error rate (WER) is

WER_Pear ≤ (1/|S|) Σ_{x ∈ S} Σ_{y ∈ S, y ≠ x} Q(d(x, y)/(2σ)),  (9)

where Q(x) = (1/√(2π)) ∫_x^∞ e^{−u²/2} du is the well-known Q-function. Note the similarity with the union bound for the WER in case of a Euclidean detector, which reads

WER_Eucl ≤ (1/|S|) Σ_{x ∈ S} Σ_{y ∈ S, y ≠ x} Q(δ(x, y)/(2σ))  (10)

for an additive Gaussian noise channel, i.e., a channel as in (4) with the values of a and b being known to the receiver. We emphasize again that for the WER performance of the Pearson distance detector it does not matter whether the gain and offset values are known to the receiver or not, while the performance of the Euclidean distance detector quickly deteriorates when the gain and offset drift away from their ideal values a = 1 and b = 0 while being unknown to the receiver [5].

For small σ, the WER is dominated by the term with the smallest distance between any x and any different y ∈ S, so that

WER_Pear ≈ N_p,min Q(d_min/(2σ)),  σ ≪ 1,  (11)

where d_min = min_{x,y ∈ S, x ≠ y} d(x, y) and N_p,min is the number of codewords y (called nearest neighbors) at minimum Pearson noise distance d_min from x, averaged over all x ∈ S.
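The union bound (9) is straightforward to evaluate for a small codebook. The sketch below (helper names ours) uses the identity Q(x) = erfc(x/√2)/2 and, as an example codebook, all binary words of length 4 except the all-zero and all-one words:

```python
import math
from itertools import product

def Q(x):
    """Q-function: Q(x) = (1/sqrt(2 pi)) * integral_x^infty exp(-u^2/2) du."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def noise_dist(x, y):
    """Pearson noise distance d(x, y), per (8)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx2 = sum((xi - mx) ** 2 for xi in x)
    sy2 = sum((yi - my) ** 2 for yi in y)
    rho = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / math.sqrt(sx2 * sy2)
    return math.sqrt(max(0.0, 2 * sx2 * (1 - rho)))

def union_bound_wer(S, sigma):
    """Right-hand side of the union bound (9)."""
    return sum(Q(noise_dist(x, y) / (2 * sigma))
               for x in S for y in S if y != x) / len(S)

# Example codebook: {0,1}^4 without the all-zero and all-one words.
S = [v for v in product((0, 1), repeat=4) if 0 < sum(v) < 4]
for sigma in (0.05, 0.1, 0.2):
    print(sigma, union_bound_wer(S, sigma))   # bound grows with the noise level
```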

D. Pearson codes

In order to allow easy encoding and decoding operations, it is common to use a q-ary codebook S, i.e., S ⊆ Q^n with Q = {0, 1, ..., q − 1}. Since a minimum Pearson distance detector cannot deal with codewords c with σ_c = 0 and cannot distinguish between the words c and c_1·1 + c_2·c, c_2 > 0, well-chosen words must be barred from Q^n to guarantee unambiguous detection. Weber et al. [7] coined the name Pearson code for a set of codewords that can be uniquely decoded by a minimum Pearson distance detector. Codewords in a Pearson code S satisfy two conditions, namely

• Property A: If c ∈ S then c_1·1 + c_2·c ∉ S for all c_1, c_2 ∈ R with (c_1, c_2) ≠ (0, 1) and c_2 > 0;
• Property B: c·1 ∉ S for all c ∈ R.

For a binary Pearson code, i.e., q = 2, this implies that only two vectors must be barred, namely the all-'0' vector 0 and the all-'1' vector 1. Hence, the largest binary Pearson code of length n is

P_n = {0, 1}^n \ {0, 1}.  (12)
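A direct construction of P_n, e.g. for checking code sizes, can be sketched as follows (the function name is ours, not from the paper):

```python
from itertools import product

def largest_binary_pearson_code(n):
    """P_n = {0,1}^n without the all-'0' and all-'1' words, per (12)."""
    return [v for v in product((0, 1), repeat=n) if 0 < sum(v) < n]

# |P_n| = 2^n - 2:
for n in (2, 3, 4, 5):
    assert len(largest_binary_pearson_code(n)) == 2 ** n - 2
```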

However, in order to improve the error performance, it may be necessary to further restrict the codebook, particularly by avoiding codeword pairs with a small Pearson noise distance. In the next section we investigate properties of the Pearson (noise) distance and detector that provide more insight and as such could be useful in the process of designing good Pearson codes.

III. PROPERTIES OF BINARY PEARSON CODES

In this section, we study the important binary case, q = 2. Particularly, we will determine bounds on the Pearson noise distance and make comparisons with the Hamming distance.

First we give some notation. Let x and y be two n-vectors taken from the code S ⊂ {0, 1}^n. We define the integers

w_x = Σ_{i=1}^n x_i,  w_y = Σ_{i=1}^n y_i,  w_xy = Σ_{i=1}^n x_i y_i,  (13)

where w_x and w_y are the weights of the vectors x and y, respectively, and w_xy, the index of '1'-coincidence (or overlap) of the vectors x and y, denotes the number of indices i where x_i = y_i = 1. Note that all additions and multiplications in (13) are over the real numbers.

For clerical convenience, we define the real-valued function

φ_n(w_x, w_y, w_xy) = d(x, y).  (14)

Using (8) and the above definitions, we have

φ_n²(w_x, w_y, w_xy) = 2σ_x² (1 − (w_xy − w_x w_y / n) / (σ_x σ_y)),  (15)

where

σ_x² = w_x − w_x²/n  and  σ_y² = w_y − w_y²/n.  (16)

For all x, y ∈ P_n, the integer variables w_x, w_y, and w_xy satisfy

1 ≤ w_x, w_y ≤ n − 1,  (17)
max{w_x + w_y − n, 0} ≤ w_xy ≤ min{w_x, w_y}, and  (18)
w_xy ≤ w_x − 1 if x ≠ y and w_x = w_y.  (19)
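The fact that d(x, y) depends on x and y only through w_x, w_y, and w_xy, as expressed by (14), can be verified exhaustively for a small n; the sketch below (helper names ours) compares (15)-(16) against a direct evaluation of (8):

```python
import math
from itertools import product

def phi(n, wx, wy, wxy):
    """phi_n(w_x, w_y, w_xy), computed from (15) and (16)."""
    sx2 = wx - wx ** 2 / n
    sy2 = wy - wy ** 2 / n
    d2 = 2 * sx2 * (1 - (wxy - wx * wy / n) / math.sqrt(sx2 * sy2))
    return math.sqrt(max(0.0, d2))   # guard against tiny negative rounding

def noise_dist(x, y):
    """d(x, y) computed directly from (8)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx2 = sum((xi - mx) ** 2 for xi in x)
    sy2 = sum((yi - my) ** 2 for yi in y)
    rho = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / math.sqrt(sx2 * sy2)
    return math.sqrt(max(0.0, 2 * sx2 * (1 - rho)))

# Exhaustive check over P_6: both routes give the same value.
n = 6
Pn = [v for v in product((0, 1), repeat=n) if 0 < sum(v) < n]
for x in Pn:
    for y in Pn:
        wxy = sum(xi * yi for xi, yi in zip(x, y))
        assert abs(phi(n, sum(x), sum(y), wxy) - noise_dist(x, y)) < 1e-9
```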

In the next subsections we present the main results of this paper.

A. Bounds on the Pearson noise distance

Since the Pearson noise distance d(x, y) plays a crucial role in the performance of a Pearson code, we should investigate which values it can take. We start with a simple upper bound.

Theorem 1: For any two codewords x and y in P_n, n ≥ 2, it holds that

d²(x, y) ≤ 4σ_x² ≤ n if n is even, and d²(x, y) ≤ 4σ_x² ≤ n − 1/n if n is odd,

where equality holds in the first inequality if and only if y = 1 − x, while equality holds in the second inequality if and only if w_x = ⌊n/2⌋ or w_x = ⌈n/2⌉.

Proof. It is a well-known property of the Pearson correlation coefficient, ρ_u,v, of any two real-valued non-constant vectors u and v of the same length, that |ρ_u,v| ≤ 1 and also that ρ_u,v = −1 if and only if v = c_1·1 + c_2·u, where the coefficients c_1 and c_2, c_2 < 0, are real numbers [6, Sec. IV.4.6]. Hence, for any x ∈ P_n, d²(x, y) is maximized over all y ∈ P_n if and only if y = 1 − x, i.e., by setting y as the inverse of x. The results as stated in the theorem now easily follow by observing that

d²(x, 1 − x) = φ_n²(w_x, n − w_x, 0) = 4σ_x² = 4(w_x − w_x²/n)

and that the last expression is maximized if and only if w_x = ⌊n/2⌋ or w_x = ⌈n/2⌉.
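Theorem 1 can also be verified by brute force for small n; the following sketch (helper names ours) checks both the bound and its tightness:

```python
import math
from itertools import product

def noise_dist2(x, y):
    """Squared Pearson noise distance d^2(x, y), per (8)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx2 = sum((xi - mx) ** 2 for xi in x)
    sy2 = sum((yi - my) ** 2 for yi in y)
    rho = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / math.sqrt(sx2 * sy2)
    return 2 * sx2 * (1 - rho)

for n in (2, 3, 4, 5, 6, 7):
    Pn = [v for v in product((0, 1), repeat=n) if 0 < sum(v) < n]
    dmax2 = max(noise_dist2(x, y) for x in Pn for y in Pn if x != y)
    bound = n if n % 2 == 0 else n - 1 / n
    assert dmax2 <= bound + 1e-9        # the upper bound of Theorem 1 ...
    assert abs(dmax2 - bound) < 1e-9    # ... is achieved, by complementary pairs
```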

In case two codewords have equal weight, we have the following useful observation.

Lemma 1: For any two codewords x and y in P_n, n ≥ 2, of equal weight, it holds that

d²(x, y) = 2(w_x − w_xy).

Proof. From (14)-(16) and the fact w_x = w_y it follows that

d²(x, y) = 2σ_x² (1 − (w_xy − w_x²/n) / σ_x²) = 2(σ_x² − w_xy + w_x²/n) = 2(σ_x² − w_xy + w_x − σ_x²) = 2(w_x − w_xy),

which shows the stated result.

The minimum Pearson noise distance, d_min, between any two different codewords plays a key role in the evaluation of the error performance of the minimum Pearson detector, see (11). The next theorem shows that d_min of P_n equals φ_n(1, 2, 1). This was already conjectured in [5], but is now formally proved.

Theorem 2: For any two different codewords x and y in P_n, n ≥ 3, it holds that

d²(x, y) ≥ φ_n²(1, 2, 1) = 2((n − 1)/n)(1 − √((n − 2)/(2n − 2))),

where equality holds if and only if w_x = w_xy = 1, w_y = 2 or w_x = n − 1, w_y = w_xy = n − 2.

Proof. Our strategy is to look for three integers, w_x, w_y, and w_xy, that minimize the function φ_n(w_x, w_y, w_xy) under the constraints (17)-(19). Any two different codewords x and y having the found parameters will then minimize d(x, y). Since ρ_x,y = ρ_y,x, it follows from (8) that it holds for such x and y that σ_x² ≤ σ_y², i.e., w_x ≤ w_y ≤ n − w_x. Further, we may and will assume w_x ≤ n/2 since

d(x, y) = d(1 − x, 1 − y)  (20)

for all x and y in P_n.

With regard to the selection of the integer w_xy, it is straightforward from (15) that we should choose it as large as possible for any values of w_x and w_y. We distinguish between the cases w_x = w_y and w_x < w_y.

In case w_x = w_y, the value of w_xy is at most w_x − 1 since x ≠ y. Hence, from Lemma 1, we find d²(x, y) = 2(w_x − w_xy) ≥ 2. Note that the expression in the theorem is clearly smaller than 2.

In case w_x < w_y, the maximum value of w_xy is w_x. Note that w_x < w_y implies that 1 ≤ w_x ≤ ⌊(n − 1)/2⌋. We proceed with the selection of w_y. From (14)-(16), we have

φ_n²(w_x, w_y, w_x) = 2σ_x² (1 − α),  (21)

where

α² = (w_x − w_x w_y/n)² / (σ_x² σ_y²) = (1/w_y − 1/n) · w_x²/σ_x²,  α > 0.  (22)

It is immediate from (21) and (22) that, for any value of w_x, the function φ_n(w_x, w_y, w_x) is at a minimum when the factor 1/w_y − 1/n is at a maximum. We conclude that, for all w_x, the choice w_y = w_x + 1 minimizes (21). Subsequently, we substitute w_y = w_x + 1, and analyze the function

ψ_n(w_x) = φ_n²(w_x, w_x + 1, w_x) = 2σ_x² (1 − β)  (23)

in the single (integer) variable w_x, where, using (22), we write

β² = (1/(w_x + 1) − 1/n) · w_x²/σ_x² = w_x(n − 1 − w_x) / ((w_x + 1)(n − w_x)),  β > 0.  (24)

In order to determine the value of w_x ∈ {1, 2, ..., ⌊(n − 1)/2⌋} minimizing ψ_n(w_x), we consider the function f_n(w) which is obtained by replacing the discrete variable w_x in ψ_n(w_x) by the continuous variable w, with w ∈ [1, ⌊(n − 1)/2⌋]. We replace w_x by w in (24) as well and then express w in β, obtaining

w = (n − 1)/2 − (n/2) g_n(β),  (25)

where

g_n(β) = √(1 + 1/n² + 2(β² + 1)/(n(β² − 1))),  0 < β_0 ≤ β ≤ β_1 < 1,

β_0² = β²|_{w=1} = (n − 2)/(2n − 2),

Fig. 1. Minimum Pearson noise distance of P_n (horizontal axis: n, from 0 to 30; vertical axis: minimum Pearson noise distance, from 0.76 to 0.83).

and

β_1² = β²|_{w=⌊(n−1)/2⌋} = (⌊(n − 1)/2⌋ ⌈(n − 1)/2⌉) / (⌊(n + 1)/2⌋ ⌈(n + 1)/2⌉).

Note that g_n(β) is strictly decreasing with β on the interval under consideration, and thus w is a strictly increasing function of β. Next, we substitute (25) in f_n(w), and the resulting function with variable β is

h_n(β) = ((n² − 1)/(2n) − n g_n(β)²/2 − g_n(β))(1 − β) = (β − 1)/n + (β² + 1)/(β + 1) + (β − 1) g_n(β).  (26)

It is not hard to show that the three terms in (26) are all strictly increasing with β in the range β_0 ≤ β ≤ β_1, so that h_n(β) is at a minimum for β = β_0. Thus, φ_n(w_x, w_y, w_x) achieves a minimum when w_x = w_xy = 1 and w_y = 2. The expression stated in the theorem follows by substituting these parameters into (15). Because of (20), also the choice w_x = n − 1 and w_y = w_xy = n − 2 achieves this minimum. Finally, it follows from the strict monotonicity of the functions used in the above derivation that no other choices achieve the minimum Pearson noise distance.

Note that it follows from this theorem that, for large values of n, the minimum Pearson noise distance of the code P_n approaches √(2 − √2) ≈ 0.765. A graphical representation is provided in Figure 1.
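The closed form of Theorem 2 and its large-n limit can be checked by brute force over P_n for small n (helper names ours):

```python
import math
from itertools import product

def noise_dist(x, y):
    """Pearson noise distance d(x, y), per (8)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx2 = sum((xi - mx) ** 2 for xi in x)
    sy2 = sum((yi - my) ** 2 for yi in y)
    rho = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / math.sqrt(sx2 * sy2)
    return math.sqrt(max(0.0, 2 * sx2 * (1 - rho)))

def dmin_theorem2(n):
    """phi_n(1, 2, 1) = sqrt(2 (n-1)/n (1 - sqrt((n-2)/(2n-2)))), per Theorem 2."""
    return math.sqrt(2 * (n - 1) / n * (1 - math.sqrt((n - 2) / (2 * n - 2))))

for n in (3, 4, 5, 6, 7, 8):
    Pn = [v for v in product((0, 1), repeat=n) if 0 < sum(v) < n]
    dmin = min(noise_dist(x, y) for x in Pn for y in Pn if x != y)
    assert abs(dmin - dmin_theorem2(n)) < 1e-9

print(math.sqrt(2 - math.sqrt(2)))   # limit for large n: 0.7653...
```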

We conclude this subsection with a look at the number of codeword pairs having a certain Pearson noise distance between each other. In this respect, note that the number of pairs (x, y) with given values for w_x, w_y, and w_xy is

n! / (w_xy! (w_x − w_xy)! (w_y − w_xy)! (n − w_x − w_y + w_xy)!),  (27)

which easily follows from standard combinatorial arguments. For example, it follows from this result and Theorem 2 that the number of codeword pairs (x, y) in P_n at minimum Pearson noise distance is 2 × n!/(1! 0! 1! (n − 2)!) = 2n(n − 1). Hence, dividing this expression by the number of codewords gives N_p,min, which can be used, together with the minimum distance result from Theorem 2, in (11) to obtain an approximate value for the WER of a Pearson distance based detector.
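The count (27) and the resulting N_p,min can be computed as follows (a sketch with our own helper names):

```python
from math import factorial

def pair_count(n, wx, wy, wxy):
    """Number of ordered pairs (x, y) with the given w_x, w_y, w_xy, per (27)."""
    return factorial(n) // (factorial(wxy) * factorial(wx - wxy)
                            * factorial(wy - wxy) * factorial(n - wx - wy + wxy))

n = 6
# Pairs at minimum Pearson noise distance come in the two patterns of Theorem 2:
pairs = pair_count(n, 1, 2, 1) + pair_count(n, n - 1, n - 2, n - 2)
assert pairs == 2 * n * (n - 1)

# Average number of nearest neighbours N_p,min, used in approximation (11):
Np_min = pairs / (2 ** n - 2)
print(Np_min)
```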

B. Hamming versus squared Pearson noise distance

The Hamming distance between two vectors is an essential notion in coding theory, and a comparison between the properties of the Hamming and Pearson distances is therefore relevant. Since x_i, y_i ∈ {0, 1}, the Hamming distance equals the squared Euclidean distance, i.e.,

d_H(x, y) = Σ_{i=1}^n (x_i − y_i)² = w_x + w_y − 2w_xy.  (28)

It is essential that we define a fair yardstick for quantifying the noise resilience of minimum Euclidean and Pearson distance detection. To that end, we consider the ratio between the squared Pearson noise distance and the Hamming distance, denoted by g_x,y, i.e.,

g_x,y = d²(x, y) / d_H(x, y).  (29)

It follows from the WER analysis in Subsection II-C that this ratio being smaller than one implies that the Euclidean detector is more resilient to noise than the Pearson detector in case x is transmitted and y is considered as an alternative for x in the decoding process, and vice versa for this ratio being larger than one.

As a first observation, note that it follows from Lemma 1 and (28) that d²(x, y) = d_H(x, y), and thus g_x,y = 1, in case x and y are of equal weight. Evidently, there is no error performance difference between minimum Pearson and Euclidean detectors for codewords drawn from a constant weight set.

In the remainder of this subsection, we consider vectors x and y from P_n with the weight of x being fixed at w_x ∈ {1, 2, ..., n − 1} and the Hamming distance d_H(x, y) being fixed at d_H ∈ {1, 2, ..., n}. Since the overlap w_xy of x and y is expected to have a high impact on g_x,y, we consider two extreme options for w_xy in our analysis: 1) we choose w_y ∈ {1, 2, ..., n − 1} such that w_xy is as small as possible, 2) we choose w_y such that w_xy is as large as possible, in both cases under the constraints of the fixed values for the weight of x and the Hamming distance between x and y.

Case 1: It follows in a straightforward way that the minimal overlap of x and y is

w_xy = w_x − d_H if 1 ≤ d_H ≤ w_x − 1,
w_xy = 1 if d_H = w_x,
w_xy = 0 if w_x + 1 ≤ d_H ≤ n,  (30)

achieved for

w_y = w_x − d_H if 1 ≤ d_H ≤ w_x − 1,
w_y = 2 if d_H = w_x,
w_y = d_H − w_x if w_x + 1 ≤ d_H ≤ n.  (31)

Fig. 2. The ratios g_x,y and g_y,x in case n = 20, w_x = 6, 1 ≤ d_H ≤ 20, and w_y ∈ {1, 2, ..., 19} chosen such that w_xy is minimized (Case 1).

Fig. 3. The ratios g_x,y and g_y,x in case n = 20, w_x = 6, 1 ≤ d_H ≤ 20, and w_y ∈ {1, 2, ..., 19} chosen such that w_xy is maximized (Case 2).

Case 2: Similarly, we have that the maximal overlap of x and y is

w_xy = w_x if 1 ≤ d_H ≤ n − w_x − 1,
w_xy = w_x − 1 if d_H = n − w_x,
w_xy = n − d_H if n − w_x + 1 ≤ d_H ≤ n,  (32)

achieved for

w_y = w_x + d_H if 1 ≤ d_H ≤ n − w_x − 1,
w_y = n − 2 if d_H = n − w_x,
w_y = 2n − d_H − w_x if n − w_x + 1 ≤ d_H ≤ n.  (33)

The g-ratios can now be obtained from (29) by applying (14), (15), and (28). Figures 2 and 3 show, for Cases 1 and 2, respectively, the resulting g_x,y and g_y,x values for n = 20, w_x = 6, and 1 ≤ d_H ≤ 20.
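The curves can be reproduced from (15) and (28)-(31); the sketch below (helper names ours) tabulates Case 1 for n = 20, w_x = 6 and checks the common endpoint value of (34):

```python
import math

def phi2(n, wx, wy, wxy):
    """Squared phi_n, per (15)-(16)."""
    sx2 = wx - wx ** 2 / n
    sy2 = wy - wy ** 2 / n
    return 2 * sx2 * (1 - (wxy - wx * wy / n) / math.sqrt(sx2 * sy2))

def g_ratio(n, wx, wy, wxy):
    """g = d^2 / d_H, per (28) and (29)."""
    return phi2(n, wx, wy, wxy) / (wx + wy - 2 * wxy)

def case1(n, wx, dH):
    """(w_y, w_xy) minimizing the overlap, per (30)-(31)."""
    if dH <= wx - 1:
        return wx - dH, wx - dH
    if dH == wx:
        return 2, 1
    return dH - wx, 0

n, wx = 20, 6
for dH in range(1, n + 1):
    wy, wxy = case1(n, wx, dH)
    # g_{x,y} and, with the roles of x and y swapped, g_{y,x}:
    print(dH, round(g_ratio(n, wx, wy, wxy), 3), round(g_ratio(n, wy, wx, wxy), 3))

# At d_H = n both ratios equal 4*sigma_x^2/n = 0.84, per (34):
assert abs(g_ratio(n, wx, n - wx, 0) - 0.84) < 1e-9
```

Note that equal-weight pairs give g = 1, consistent with Lemma 1 and (28).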

Several interesting observations can be made from these figures. First of all, note that there are 'irregularities' for the g_y,x curves at d_H = w_x = 6 (Case 1) and d_H = n − w_x = 14 (Case 2). For Case 1 this can be explained as follows. From (30) we see that w_xy equals max{0, w_x − d_H}, except when d_H = w_x = 6, because this would imply w_y = 0 (impossible since 0 ∉ P_n). Hence, w_xy = 1 > 0 for d_H = 6, which leads to the observed notch. Similarly for Case 2 (using 1 ∉ P_n).

Further, we observe that all curves end at the same point. This is due to the fact that for d_H = n the only possible options for w_y and w_xy when w_x is given read w_y = n − w_x and w_xy = 0. The resulting value is

g_{x,1−x} = g_{1−x,x} = 4σ_x²/n = 4w_x(1 − w_x/n)/n  (34)

in general, and thus 0.84 for the example under consideration.

Finally, note that, as expected, the largest g-ratios are found in Case 1. Most strikingly, we see that these ratios may even exceed the value one (see Figure 2), suggesting that the noise resistance of the Pearson detector is higher than the noise resistance of the Euclidean detector for these cases. Of course, this cannot be true, since a Euclidean detector is well known to be optimal in case of Gaussian noise. Indeed, we observe that in all cases where g_x,y exceeds one, its counterpart g_y,x is smaller than one. Similarly, g_y,x > 1 implies g_x,y < 1. Since codeword pairs with smaller distances are dominant with respect to contributions to the WER, the overall result is still that, from the noise perspective, Euclidean detectors are superior to Pearson detectors, which is the price to be paid for the immunity of the latter detectors to gain and offset mismatches. The analysis as done in this paper can be exploited in the design of new Pearson codes, i.e., subsets of P_n, with a noise performance closer to the Euclidean case, by avoiding the selection of codeword pairs with small Pearson noise distances. It is clear that in order to increase the Pearson noise distance, the focus should not only be on increasing the Hamming distance, since these two distance measures certainly do not grow proportionally. Rather, the codeword weights must also be taken into account.

IV. CONCLUSIONS

We have investigated various properties of Pearson-distance-based detection and Pearson codes. For binary codes, we have derived upper and lower bounds on the Pearson noise distance and studied relations with the Hamming distance.

As possibilities for future work we identify (i) application of the findings in order to construct codes with an increased minimum Pearson noise distance and (ii) extension of the results to q-ary codes.

REFERENCES

[1] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou, C. Camp, T. Griffin, G. Tressler, and A. Walls, "Using Adaptive Read Voltage Thresholds to Enhance the Reliability of MLC NAND Flash Memory Systems," Proc. 24th Great Lakes Symp. on VLSI, ACM, pp. 151-156, 2014.
[2] Y. Emre, C. Yang, K. Sutaria, Y. Cao, and C. Chakrabarti, "Enhancing the reliability of STT-RAM through circuit and system level techniques," in Tech. Dig. IEEE Workshop on Signal Processing Systems (SiPS), Quebec City, Canada, pp. 125-130, Oct. 2012.
[3] A. Jiang, R. Mateescu, M. Schwartz, and J. Bruck, "Rank Modulation for Flash Memories," IEEE Trans. Inform. Theory, vol. IT-55, no. 6, pp. 2659-2673, June 2009.
[4] K. A. S. Immink, "A Survey of Codes for Optical Disk Recording," IEEE J. Select. Areas Commun., vol. 19, no. 4, pp. 756-764, April 2001.
[5] K. A. S. Immink and J. H. Weber, "Minimum Pearson Distance Detection for Multi-Level Channels with Gain and/or Offset Mismatch," IEEE Trans. Inform. Theory, vol. IT-60, pp. 5966-5974, Oct. 2014.
[6] A. M. Mood, F. A. Graybill, and D. C. Boes, Introduction to the Theory of Statistics, 3rd ed. New York, NY, USA: McGraw-Hill, 1974.
[7] J. H. Weber, K. A. S. Immink, and S. Blackburn, "Pearson Codes," IEEE Trans. Inform. Theory, vol. IT-62, no. 1, pp. 131-135, Jan. 2016.