
Minimum Pearson Distance Detection for Multilevel Channels With Gain and/or Offset Mismatch

Abstract

The performance of certain transmission and storage channels, such as optical data storage and nonvolatile memory (flash), is seriously hampered by the phenomena of unknown offset (drift) or gain. We will show that minimum Pearson distance (MPD) detection, unlike conventional minimum Euclidean distance detection, is immune to offset and/or gain mismatch. MPD detection is used in conjunction with T-constrained codes that consist of q-ary codewords, where in each codeword T reference symbols appear at least once. We will analyze the redundancy of the new q-ary coding technique and compute the error performance of MPD detection in the presence of additive noise. Implementation issues of MPD detection will be discussed, and results of simulations will be given.
Kees A. Schouhamer Immink, Fellow, IEEE, and Jos H. Weber, Senior Member, IEEE
Key words: Constant composition code, permutation code, rank modulation, flash memory, digital optical data storage, recording, Non-Volatile Memory, NVM, mismatch, adaptive equalisation, fading, Pearson distance, Euclidean distance.
I. INTRODUCTION
We consider a communication codebook, $S$, of chosen $q$-ary codewords $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ over the $q$-ary alphabet $\mathcal{Q} = \{0, 1, \ldots, q-1\}$, $q \geq 2$, where $n$, the length of $\mathbf{x}$, is a positive integer. Usually it is assumed that the primary impairment of a sent codeword $\mathbf{x}$ is additive Gaussian noise. Here, however, it is assumed that the received word $\mathbf{r} = a(\mathbf{x} + \boldsymbol{\nu}) + b$, $r_i \in \mathbb{R}$, is scaled by an unknown factor, called gain, $a \neq 1$, $a > 0$, shifted by an unknown (dc-)offset $b \neq 0$ (both quantities unknown to both sender and receiver), where $a, b \in \mathbb{R}$, and corrupted by additive noise $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_n)$, $\nu_i \in \mathbb{R}$. We use the shorthand notation $\mathbf{x} + b = (x_1 + b, x_2 + b, \ldots, x_n + b)$. The unknown channel gain and offset mismatch may lead to a devastating loss in performance, as shown, for example, in [1].
There are many examples of channels with offset and gain mismatch. In optical data storage, both the gain and offset depend on the reflective index of the disc surface and the dimensions of the written features [2]. Fingerprints on optical discs may result in rapid gain and offset variations of the retrieved signal. In baseband transmission channels, offset may arise as baseline wander, which is caused by the attenuation of the low frequencies of the channel (a.c. coupling) [3]. In wireless communication channels, path loss will result in unknown gain (fading) of the channel. Reading errors in solid-state (Flash) memories may originate from cell drift in aging devices [4].
Kees A. Schouhamer Immink is with Turing Machines Inc, Willemskade 15b-d, 3016 DK Rotterdam, The Netherlands. E-mail: immink@turing-machines.com.
Jos Weber is with Delft University of Technology, Delft, The Netherlands. E-mail: j.h.weber@tudelft.nl.
In optical disc data storage devices and non-volatile memories, constrained codes, specifically dc-free or balanced codes, have been used and/or proposed to counter the effects of offset and gain mismatch [1]. Jiang et al. [4] addressed a $q$-ary balanced coding technique, called rank modulation, for circumventing the difficulties with flash memories having aging offset levels. Zhou et al. [5], Sala et al. [6], and Immink [7] investigated the usage of balanced codes for enabling ‘dynamic’ reading thresholds in non-volatile memories.
Codewords taken from a $q$-ary balanced code can be retrieved by a minimum Euclidean distance detector, where the detection process of $q$-ary balanced codewords is intrinsically resistant to any offset and/or gain mismatch. Unfortunately, the redundancy price of $q$-ary balanced codes is unattractively high for small values of $n$ [8].
We propose the Pearson distance as an alternative to the Euclidean distance, since it is, as will be shown, intrinsically resistant to offset and gain mismatch. It will be shown that the Pearson distance measure can only be applied to codebooks with special properties, and constrained coding is therefore required. To that end, we propose $q$-ary $T$-constrained codes, where $T$, $0 < T \leq q$, preferred or reference symbols must appear at least once in every codeword [7]. The redundancy of $T$-constrained codes is much lower than that of prior art $q$-ary balanced codes, which makes Minimum Pearson Distance detection in conjunction with $T$-constrained codes an attractive alternative for practical applications.
We start in Section II with a discussion of the prior art Minimum Euclidean Distance (MED) detection of $q$-ary balanced codewords. In Section III, we will describe Minimum Pearson Distance (MPD) detection in conjunction with $T$-constrained codewords, and we will show that the detection performance is independent of gain and offset. Conventional minimum Euclidean distance detection offers optimum (maximum likelihood) error performance for the ‘ideal’ noisy channel, so Minimum Pearson Distance detection in the presence of additive noise will, by necessity, show a somewhat lessened noise immunity. In Section IV, we will analyze the error performance of MPD detection in the presence of additive Gaussian noise. Implementation issues of MPD detection are discussed in Section V. In Section VI, we will describe our conclusions.
II. PRIOR ART
Before we take a look at the characteristics of the proposed technology, we will discuss prior-art detection schemes that are based on the classical Euclidean distance metric. The codeword $\mathbf{x} \in S$ is sent and received as $\mathbf{r} = \mathbf{x} + \boldsymbol{\nu}$, where $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_n)$ is additive Gaussian noise. For each codeword $\hat{\mathbf{x}}$ in the codebook $S$ the receiver computes the (squared) Euclidean distance
$$\delta(\mathbf{r}, \hat{\mathbf{x}}) = \sum_{i=1}^{n} (r_i - \hat{x}_i)^2 \tag{1}$$
between the received signal vector $\mathbf{r}$ and the codeword $\hat{\mathbf{x}}$. After exhaustively computing all distances, the receiver decides that the codeword, $\mathbf{x}_o$, with minimum Euclidean distance to the received vector, $\mathbf{r}$, is the codeword sent, or in succinct form
$$\mathbf{x}_o = \arg\min_{\hat{\mathbf{x}} \in S} \delta(\mathbf{r}, \hat{\mathbf{x}}). \tag{2}$$
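As an illustration of (1) and (2), the following minimal sketch runs an exhaustive minimum Euclidean distance detector over a toy codebook. The codebook (all words with $q = 4$, $n = 3$), the fixed noise sample, and the gain and offset values are assumptions chosen for illustration only; they are not taken from the simulations reported here.

```python
from itertools import product


def euclidean_distance(r, x_hat):
    """Squared Euclidean distance between r and a candidate word, as in (1)."""
    return sum((ri - xi) ** 2 for ri, xi in zip(r, x_hat))


def med_detect(r, codebook):
    """Exhaustive minimum Euclidean distance detection, as in (2)."""
    return min(codebook, key=lambda x_hat: euclidean_distance(r, x_hat))


# Assumed toy codebook: all q-ary words of length n.
q, n = 4, 3
codebook = list(product(range(q), repeat=n))

sent = (3, 0, 2)
noise = (0.1, -0.2, 0.15)  # assumed, fixed noise sample

# Ideal channel r = x + nu.
r_ideal = [x + v for x, v in zip(sent, noise)]
detected_ideal = med_detect(r_ideal, codebook)

# Mismatched channel r = a(x + nu) + b with assumed gain and offset.
a, b = 1.3, 0.4
r_mismatch = [a * (x + v) + b for x, v in zip(sent, noise)]
detected_mismatch = med_detect(r_mismatch, codebook)

print(detected_ideal, detected_mismatch)
```

With these assumed values the detector recovers the sent word on the ideal channel but picks a different word once gain and offset are applied, which is the failure mode this section is concerned with.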
For a mismatch channel, we simply find
$$\delta(\mathbf{r}, \hat{\mathbf{x}}) = \sum_{i=1}^{n} (r_i - \hat{x}_i)^2 = \sum_{i=1}^{n} (a x'_i + b - \hat{x}_i)^2$$
or
$$\delta(\mathbf{r}, \hat{\mathbf{x}}) = -2a \sum_{i=1}^{n} x'_i \hat{x}_i + \sum_{i=1}^{n} (a x'_i + b)^2 - 2b \sum_{i=1}^{n} \hat{x}_i + \sum_{i=1}^{n} \hat{x}_i^2, \tag{3}$$
where $\mathbf{x}' = \mathbf{x} + \boldsymbol{\nu}$. Mismatch may lead to a warping of the Euclidean distance measure, which will lead to a loss in noise margin or even to a complete loss of the codeword.
Figure 1 shows results of computations and simulations of the word error rate (WER) of a communication system using minimum Euclidean distance detection with parameters $q = 4$ and $n = 8$ a) without mismatch, and b) with a gain, $a = 1.07$, and offset, $b = 0.07$, mismatch. We may notice that the error performance is seriously affected by the mismatch. The third curve shows the error performance of a new detection scheme proposed in Section III that offers intrinsic resistance to gain and/or offset mismatch at the cost of a reduced noise margin.
Fig. 1. Word error rate (WER) versus SNR (dB) for $q = 4$, $n = 8$: a) without channel mismatch, b) with channel mismatch $a = 1.07$ and $b = 0.07$, both using minimum Euclidean distance detection, and, as a comparison, c) for a new detection scheme that is immune to channel mismatch using minimum Pearson distance detection (see Section III).
In the prior art, coding techniques and alternative detection methods have been sought to offer solace to the aforementioned loss of performance caused by channel mismatch. For example, it has been known in the art [8] that balanced codes in conjunction with minimum Euclidean distance detection offer intrinsic resistance to gain and/or offset mismatch. By definition, all codewords $\mathbf{x}$ in a balanced code have the property that the symbol sum
$$\sum_{i=1}^{n} x_i = a_1$$
and symbol energy
$$\sum_{i=1}^{n} x_i^2 = a_2$$
are prescribed, where $a_1$ and $a_2$ are two positive integers selected by the code designer. Then, we simply find
$$\delta(\mathbf{r}, \hat{\mathbf{x}}) = -2a \sum_{i=1}^{n} x'_i \hat{x}_i + \sum_{i=1}^{n} (a x'_i + b)^2 - 2b a_1 + a_2. \tag{4}$$
The metric consists of one (first) term, $-2a \sum x'_i \hat{x}_i$, dependent on $\hat{\mathbf{x}}$ and three terms independent of $\hat{\mathbf{x}}$. Clearly, it does not matter for the outcome of the detection process (2) or detection performance if we scale (by a positive constant) or translate the metric function with a constant independent of the variable $\hat{\mathbf{x}}$. A scaled or translated version of the metric is equivalent to the original version of the metric. Metric equivalence will be denoted by the $\equiv$ sign. From (4), it is clear that in case all codewords satisfy the symbol sum and energy constraints, (1) can be rewritten as (note we presumed $a > 0$)
$$\delta(\mathbf{r}, \hat{\mathbf{x}}) \equiv -\sum_{i=1}^{n} x'_i \hat{x}_i. \tag{5}$$
Thus Euclidean distance detection of balanced codes is intrinsically resistant to channel mismatch as (5) is independent of the actual channel gain and offset. The sole disadvantage of the method, which makes it unattractive for practice, is the high redundancy of balanced codes [8].
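The argument leading to (5) can be checked numerically. The sketch below assumes a small balanced codebook, namely the 24 permutations of (0, 1, 2, 3), for which the symbol sum and the symbol energy are the same for every codeword. The noise sample and the (gain, offset) pairs are illustrative assumptions.

```python
from itertools import permutations


def med_detect(r, codebook):
    """Minimum Euclidean distance detection, as in (2)."""
    return min(codebook, key=lambda c: sum((ri - ci) ** 2 for ri, ci in zip(r, c)))


def correlation_detect(r, codebook):
    """Detection with the reduced metric (5): maximize sum of r_i * x_i."""
    return max(codebook, key=lambda c: sum(ri * ci for ri, ci in zip(r, c)))


# Assumed balanced codebook: every word has symbol sum 6 and symbol energy 14.
codebook = list(permutations((0, 1, 2, 3)))

sent = (2, 0, 3, 1)
noise = (0.2, -0.1, -0.3, 0.25)  # assumed, fixed noise sample
x_noisy = [x + v for x, v in zip(sent, noise)]

results = []
for a, b in [(1.0, 0.0), (1.5, 0.7), (0.6, -2.0)]:  # assumed mismatch values
    r = [a * xp + b for xp in x_noisy]
    results.append((med_detect(r, codebook), correlation_detect(r, codebook)))

print(results)
```

For every assumed gain and offset both detectors return the same word, in line with the observation that only the first term of (4) depends on the candidate codeword.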
In the next section, we will introduce the Minimum Pearson Distance detector, and we will show that it is an attractive alternative to the Euclidean distance for signal detection in the presence of both channel mismatch and additive noise, as it also offers intrinsic resistance to channel mismatch.
III. MINIMUM PEARSON DISTANCE DETECTION OF T-CONSTRAINED CODES
We will show that the Minimum Pearson Distance detector can only operate unambiguously in case the codebook, $S$, used satisfies specific constraints, and thus constrained codes are required. To that end, we will introduce a class of codes, called $T$-constrained codes, that satisfy the requirements of unambiguous detection using the new Pearson-distance-based detector [7]. First we will start with a presentation of the Minimum Pearson Distance detection method. Thereafter, we will discuss the various requirements to be imposed on the constrained code.
A. Minimum Pearson distance detection
It is assumed that an arbitrary codeword $\mathbf{x} \in S$ is sent, and that the received word, $\mathbf{r}$, is given by $\mathbf{r} = a(\mathbf{x} + \boldsymbol{\nu}) + b$. The Pearson distance, $\delta_2(\mathbf{r}, \hat{\mathbf{x}})$, between $\mathbf{r}$ and $\hat{\mathbf{x}}$ is defined by
$$\delta_2(\mathbf{r}, \hat{\mathbf{x}}) = 1 - \rho_{\mathbf{r},\hat{\mathbf{x}}}, \tag{6}$$
where
$$\rho_{\mathbf{r},\hat{\mathbf{x}}} = \frac{\sum_{i=1}^{n} (r_i - \bar{r})(\hat{x}_i - \bar{\hat{x}})}{\sigma_r \sigma_{\hat{x}}} \tag{7}$$
is the well-known (Pearson) correlation coefficient, and where we define two quantities, namely the average symbol value of $\hat{\mathbf{x}}$
$$\bar{\hat{x}} = \frac{1}{n} \sum_{i=1}^{n} \hat{x}_i, \tag{8}$$
and the (unnormalized) symbol value variance of $\hat{\mathbf{x}}$
$$\sigma_{\hat{x}}^2 = \sum_{i=1}^{n} (\hat{x}_i - \bar{\hat{x}})^2, \tag{9}$$
where it is assumed that both codewords $\mathbf{x}$ and $\hat{\mathbf{x}}$ are taken from a judiciously chosen codebook $S$, whose properties will be explained in the next subsection. The variance, $\sigma_r^2$, and average, $\bar{r}$, of the received vector are defined in the same way. The term $1 - \rho_{\mathbf{r},\hat{\mathbf{x}}}$ is often named the Pearson distance between the vectors $\mathbf{r}$ and $\hat{\mathbf{x}}$. As the Pearson correlation coefficient falls in the interval $[-1, 1]$, the Pearson distance lies in the interval $[0, 2]$.
The detector operates in the same vein as the conventional Euclidean distance detector: the detector computes for all codewords $\hat{\mathbf{x}} \in S$ the Pearson distance between $\mathbf{r}$ and $\hat{\mathbf{x}}$. The receiver decides that the codeword, $\mathbf{x}_o$, with minimum Pearson distance to the received vector, $\mathbf{r}$, is the codeword sent, or in succinct form
$$\mathbf{x}_o = \arg\min_{\hat{\mathbf{x}} \in S} \delta_2(\mathbf{r}, \hat{\mathbf{x}}). \tag{10}$$
Fig. 2. Scatter diagram of Euclidean distance, $\delta(\mathbf{x}, \hat{\mathbf{x}})$, (1) versus Pearson distance, $\delta_2(\mathbf{x}, \hat{\mathbf{x}})$, (13) for $n = 6$ and $q = 3$.
A key property of the Pearson distance, $1 - \rho_{\mathbf{r},\hat{\mathbf{x}}}$, is that it is invariant to changes in translation or scale (up to a sign) in the two vectors $\mathbf{r}$ and $\hat{\mathbf{x}}$. That is,
$$\rho_{\mathbf{r},\hat{\mathbf{x}}} = \rho_{c_1 + c_2 \mathbf{r},\hat{\mathbf{x}}}, \tag{11}$$
where $c_1$ and $c_2 > 0$ are constants. Clearly, the above property ensures that the detection outcome based on the Pearson distance (10) is intrinsically resistant to offset and gain mismatch.
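The definitions (6) to (10) and the invariance property (11) can be tried out with a few lines of code. The sketch below is an exhaustive detector under assumptions: the toy codebook holds the words with $q = 4$, $n = 3$ that contain both the symbols 0 and 3 (so that no candidate has zero variance), and the noise sample and mismatch values are illustrative.

```python
from itertools import product
from math import sqrt


def pearson_distance(r, x_hat):
    """Pearson distance 1 - rho, following (6)-(9)."""
    n = len(r)
    r_mean = sum(r) / n
    x_mean = sum(x_hat) / n
    sigma_r = sqrt(sum((ri - r_mean) ** 2 for ri in r))
    sigma_x = sqrt(sum((xi - x_mean) ** 2 for xi in x_hat))
    cov = sum((ri - r_mean) * (xi - x_mean) for ri, xi in zip(r, x_hat))
    return 1.0 - cov / (sigma_r * sigma_x)


def mpd_detect(r, codebook):
    """Exhaustive minimum Pearson distance detection, as in (10)."""
    return min(codebook, key=lambda c: pearson_distance(r, c))


# Assumed toy codebook: words that contain both 0 and q - 1.
q, n = 4, 3
codebook = [w for w in product(range(q), repeat=n) if 0 in w and (q - 1) in w]

sent = (3, 0, 2)
noise = (0.1, -0.2, 0.15)  # assumed, fixed noise sample

detections = []
distances = []
for a, b in [(1.0, 0.0), (1.3, 0.4), (0.5, -3.0)]:  # assumed mismatch values
    r = [a * (x + v) + b for x, v in zip(sent, noise)]
    detections.append(mpd_detect(r, codebook))
    distances.append(pearson_distance(r, (3, 0, 1)))

print(detections)
```

The detected word and the distance to any fixed candidate are the same for all assumed gain and offset values, up to floating-point rounding.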
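After some manipulation of (6), we may write down two equivalents of the Pearson distance measure, namely
$$\delta_2(\mathbf{r}, \hat{\mathbf{x}}) \equiv -\frac{1}{\sigma_{\hat{x}}} \sum_{i=1}^{n} r_i (\hat{x}_i - \bar{\hat{x}}) \tag{12}$$
and
$$\delta_2(\mathbf{r}, \hat{\mathbf{x}}) \equiv \sum_{i=1}^{n} \left( r_i - \frac{\hat{x}_i - \bar{\hat{x}}}{\sigma_{\hat{x}}} \right)^2. \tag{13}$$
The equivalence of (12) with (6) is straightforward. Eq. (13) can be rewritten as
$$\sum_{i=1}^{n} r_i^2 - \frac{2}{\sigma_{\hat{x}}} \sum_{i=1}^{n} r_i (\hat{x}_i - \bar{\hat{x}}) + \sum_{i=1}^{n} \left( \frac{\hat{x}_i - \bar{\hat{x}}}{\sigma_{\hat{x}}} \right)^2.$$
The first term does not depend on $\hat{\mathbf{x}}$ and, using (9), the third term equals one. After deleting the first and third term, we arrive at the equivalence of (12) with (13), and thus at the equivalence of (13) with (6). Eq. (13) shows an interesting relationship with respect to the conventional Euclidean distance (1), where the vector $\hat{\mathbf{x}}$ is translated by $\bar{\hat{x}}$ and scaled by $\sigma_{\hat{x}}$.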
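The equivalence of (6), (12) and (13) means that all three metrics rank candidate codewords in the same order. A minimal numerical check follows; the received vector and the list of candidates are assumptions chosen for illustration, and the helper names are ours.

```python
from math import sqrt


def normalized(x_hat):
    """Return (x_i - mean) / sigma, with the unnormalized sigma of (9)."""
    n = len(x_hat)
    mean = sum(x_hat) / n
    sigma = sqrt(sum((xi - mean) ** 2 for xi in x_hat))
    return [(xi - mean) / sigma for xi in x_hat]


def pearson_distance(r, x_hat):
    """Definition (6): 1 - rho."""
    r_mean = sum(r) / len(r)
    sigma_r = sqrt(sum((ri - r_mean) ** 2 for ri in r))
    u = normalized(x_hat)
    return 1.0 - sum((ri - r_mean) * ui for ri, ui in zip(r, u)) / sigma_r


def metric_12(r, x_hat):
    """Equivalent metric (12)."""
    return -sum(ri * ui for ri, ui in zip(r, normalized(x_hat)))


def metric_13(r, x_hat):
    """Equivalent metric (13)."""
    return sum((ri - ui) ** 2 for ri, ui in zip(r, normalized(x_hat)))


r = [1.2, 2.3, 0.4, 0.8]  # assumed received vector
candidates = [(1, 0, 0, 0), (0, 1, 0, 0), (1, 1, 0, 0), (0, 1, 0, 1)]

rankings = [
    sorted(candidates, key=lambda c: metric(r, c))
    for metric in (pearson_distance, metric_12, metric_13)
]
print(rankings[0])
```

The three rankings coincide, and (13) differs from twice (12) only by the candidate-independent amount $\sum r_i^2 + 1$.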
In Figure 2, we have plotted an illustrative scatter diagram of the Euclidean distance $\delta(\mathbf{x}, \hat{\mathbf{x}})$, (1), versus the Pearson distance $\delta_2(\mathbf{x}, \hat{\mathbf{x}})$, (13), for $n = 6$ and $q = 3$ for almost (see later) all possible vectors $\mathbf{x}$ and $\hat{\mathbf{x}}$. We notice quite some difference between the alternative distance measures. The Pearson distance measure is highly attractive for mismatch channels, but we must clear a few hurdles before we can fruitfully apply it in practice. It is immediate, see (7), that the Pearson distance is undefined for codewords $\mathbf{x}$ with $\sigma_x = 0$. In addition, as mentioned above, we have $\rho_{\mathbf{r},\hat{\mathbf{x}}} = \rho_{c_1 + c_2 \mathbf{r},\hat{\mathbf{x}}}$, so that a Pearson-distance-based detector cannot distinguish between the sent codewords $\mathbf{x}$ and $c_1 + c_2 \mathbf{x}$. We remedy this flaw by using constrained codes: in case $\mathbf{x} \in S$, all words $c_1 + c_2 \mathbf{x}$ other than $\mathbf{x}$ itself, where $c_1$ is any real number and $c_2$ is any positive real number, are excluded from $S$. In addition, the $q$ words $\mathbf{x} = (a, \ldots, a)$, $a \in \mathcal{Q}$, with $\sigma_x = 0$ must be barred from $S$. $T$-constrained codes, to be discussed below, satisfy all these requirements.
B. Description of T-constrained codes
$T$-constrained codes, presented in [7] for enabling simple dynamic threshold detection of $q$-ary codewords, consist of $q$-ary $n$-length codewords, where $T$, $0 < T \leq q$, preferred or reference symbols must appear at least once in a codeword. A set of $T$-constrained codewords is denoted by $S_T$. Of specific interest in this paper are two sets of $T$-constrained codes denoted by $S_1$ and $S_2$. The set $S_1$ has codewords where the symbol ‘0’ appears at least once, and the set $S_2$ contains codewords where both the symbols ‘0’ and ‘$q-1$’ appear at least once. The sizes of $S_1$ and $S_2$, respectively, equal [7]
$$|S_1| = q^n - (q-1)^n, \quad q > 1, \tag{14}$$
and
$$|S_2| = q^n - 2(q-1)^n + (q-2)^n, \quad q > 1. \tag{15}$$
For the binary case, $q = 2$, we simply find that
$$|S_1| = 2^n - 1$$
(the all-‘1’ word is deleted), and
$$|S_2| = 2^n - 2$$
(both the all-‘1’ and all-‘0’ words are deleted).
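The cardinalities (14) and (15) are easy to confirm by brute-force enumeration for small parameters. The sketch below does so; the function names are ours and the parameter ranges are arbitrary small values.

```python
from itertools import product


def enumerate_s1(q, n):
    """All q-ary words of length n that contain the symbol 0."""
    return [w for w in product(range(q), repeat=n) if 0 in w]


def enumerate_s2(q, n):
    """All q-ary words of length n that contain both 0 and q - 1."""
    return [w for w in product(range(q), repeat=n) if 0 in w and (q - 1) in w]


def size_s1(q, n):
    """Closed form (14)."""
    return q ** n - (q - 1) ** n


def size_s2(q, n):
    """Closed form (15)."""
    return q ** n - 2 * (q - 1) ** n + (q - 2) ** n


for q in range(2, 5):
    for n in range(2, 6):
        print(q, n, len(enumerate_s1(q, n)), len(enumerate_s2(q, n)))
```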
Very efficient implementations of high-rate $T$-constrained codes can be constructed with the nibble replacement method described in [9]. The code is based on an algorithm that recursively removes $w > 0$ disallowed $n$-symbol $q$-ary words in a string of $L$ $n$-symbol codewords while only one $q$-ary redundant symbol is added. The code rate of the nibble replacement equals $1 - 1/(nL)$, where
$$L \leq \frac{(q-1) q^{n-1}}{w}. \tag{16}$$
A conventional $n$-symbol block code can be constructed with a rate $1 - 1/n$, while the nibble replacement code makes it possible to construct a code with rate $1 - 1/(nL)$. For example, for a binary $T = 1$ code, $n = 10$, $w = 1$, the code rate is (at most) 5119/5120, while for a $T = 2$ code with the same parameters, $w = 2$, the code rate is (at most) 2559/2560. The rate of a conventional code would be 9/10, that is almost 10% more redundant.
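The rates quoted above follow directly from (16). The short calculation below reproduces them with exact rational arithmetic, assuming that $L$ is taken as the largest integer permitted by (16); the function names are ours.

```python
from fractions import Fraction


def max_block_count(q, n, w):
    """Largest integer L that satisfies (16)."""
    return ((q - 1) * q ** (n - 1)) // w


def nibble_rate(q, n, w):
    """Code rate 1 - 1/(nL) of the nibble replacement code for maximal L."""
    return 1 - Fraction(1, n * max_block_count(q, n, w))


def conventional_rate(n):
    """Rate 1 - 1/n of a conventional n-symbol block code."""
    return 1 - Fraction(1, n)


print(nibble_rate(2, 10, 1))   # binary T = 1 code, one disallowed word
print(nibble_rate(2, 10, 2))   # binary T = 2 code, two disallowed words
print(conventional_rate(10))
```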
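The $T$-constraints imposed imply the following properties of the codewords.
Property A: If $\mathbf{x} \in S_1$ then $\mathbf{x} + c \notin S_1$ for all non-zero $c \in \mathbb{R}$.
Proof: By definition, $\mathbf{x}$ has at least one position, say $k$, where $x_k = 0$. Hence if $c < 0$, then $x_k + c < 0$, and thus $\mathbf{x} + c \notin S_1$. If $c > 0$, then $x_i + c > 0$ for all $i$, and thus $\mathbf{x} + c \notin S_1$ since $\mathbf{x} + c$ does not contain the symbol ‘0’.
Property B: If $\mathbf{x} \in S_2$ then $c_1 + c_2 \mathbf{x} \notin S_2$ for all $c_1, c_2 \in \mathbb{R}$ with $(c_1, c_2) \neq (0, 1)$ and $c_2 > 0$.
Proof: Suppose $c_1 + c_2 \mathbf{x} \in S_2$. Since $c_2 > 0$ and $x_i \geq 0$ for all $i$, it follows that $c_1 \leq 0$ in order to have at least one ‘0’ in $c_1 + c_2 \mathbf{x}$. Since $c_1 < 0$ would result in at least one negative value in $c_1 + c_2 \mathbf{x}$ (in a position where $\mathbf{x}$ has a ‘0’), it follows that $c_1 = 0$ and thus $c_2 \neq 1$. If $0 < c_2 < 1$, then all symbols in $c_1 + c_2 \mathbf{x} = c_2 \mathbf{x}$ are smaller than $q - 1$, while if $c_2 > 1$, then at least one value in $c_1 + c_2 \mathbf{x}$ is larger than $q - 1$ (in a position where $\mathbf{x}$ has a ‘$q-1$’). In conclusion, $c_1 + c_2 \mathbf{x} \notin S_2$.
Property C: $\mathbf{x} = c \notin S_2$ for all $c \in \mathbb{R}$.
Proof: All codewords $\mathbf{x} \in S_2$ have at least one symbol equal to 0 and at least one symbol equal to $q - 1$. Since $q > 1$, we conclude that $\mathbf{x} = c \notin S_2$.
The above properties of the codebook $S_2$ are sufficient to solve the flaws of the Pearson distance measure discussed in the previous section. Firstly, Property C guarantees that (6) cannot be singular since $\sigma_x \neq 0$, $\mathbf{x} \in S_2$. Secondly, according to Property B we have for $\mathbf{x}$ and $\hat{\mathbf{x}} \in S_2$, $\hat{\mathbf{x}} \neq c_1 + c_2 \mathbf{x}$. Then in the noiseless case, $\boldsymbol{\nu} = 0$, the detector operates without errors: for any sent $\mathbf{x}$, the detector metric (6) is at a global minimum for $\hat{\mathbf{x}} = \mathbf{x}$, since according to a well-known property of the Pearson correlation coefficient $\rho_{\mathbf{x},\hat{\mathbf{x}}} = 1$ only for $\hat{\mathbf{x}} = c_1 + c_2 \mathbf{x}$, so that $\delta_2(\mathbf{x}, \mathbf{x}) = 0$ and $\delta_2(\mathbf{x}, \hat{\mathbf{x}}) > 0$, $\hat{\mathbf{x}} \neq \mathbf{x}$.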
C. Offset-mismatch channels
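Evidently, Minimum Pearson Distance detection presented above can be used for channels with offset mismatch only ($a = 1$). However, by a slight modification of the Pearson distance measure we may reduce the redundancy of the constrained code, and improve the resistance against additive noise. We propose the distance measure
$$\delta_1(\mathbf{r}, \hat{\mathbf{x}}) = \sum_{i=1}^{n} (r_i - \hat{x}_i + \bar{\hat{x}})^2, \tag{17}$$
where, as we may notice, we have deleted $\sigma_{\hat{x}}$ in (13). The advantage of the modified distance measure for offset-mismatch channels is that codewords can be drawn from $S_1$ instead of $S_2$, which reduces the code redundancy.
We first show that $\delta_1(\mathbf{r}, \hat{\mathbf{x}})$ is, as claimed, independent of the channel's offset $b$, and thereafter we show that in the noiseless case $\boldsymbol{\nu} = 0$, $\delta_1(\mathbf{x}, \hat{\mathbf{x}})$ has a unique minimum at $\hat{\mathbf{x}} = \mathbf{x} \in S_1$. Subsequently, we will show that for application of (17) it is sufficient to draw codewords from $S_1$.
We find
$$\delta_1(\mathbf{r}, \hat{\mathbf{x}}) = \sum_{i=1}^{n} (x'_i + b - \hat{x}_i + \bar{\hat{x}})^2 = \sum_{i=1}^{n} (x'_i - \hat{x}_i + \bar{\hat{x}})^2 + 2b \sum_{i=1}^{n} (x'_i - \hat{x}_i + \bar{\hat{x}}) + n b^2,$$
where $x'_i = x_i + \nu_i$. By definition (8), we have
$$\sum_{i=1}^{n} (x'_i - \hat{x}_i + \bar{\hat{x}}) = \sum_{i=1}^{n} x'_i,$$
so that
$$\delta_1(\mathbf{r}, \hat{\mathbf{x}}) \equiv \sum_{i=1}^{n} (x'_i - \hat{x}_i + \bar{\hat{x}})^2. \tag{18}$$
From the above it is immediate that the minimization of $\delta_1(\mathbf{r}, \hat{\mathbf{x}})$ is intrinsically resistant to the channel offset $b$.
In the noiseless case, $\boldsymbol{\nu} = 0$, the detector must operate error-free, so that for any $\mathbf{x}$ the detector metric $\delta_1(\mathbf{x}, \hat{\mathbf{x}})$ must have a unique minimum at $\hat{\mathbf{x}} = \mathbf{x}$, that is, $\delta_1(\mathbf{x}, \mathbf{x}) < \delta_1(\mathbf{x}, \hat{\mathbf{x}})$ for $\mathbf{x} \neq \hat{\mathbf{x}}$, and $\mathbf{x}, \hat{\mathbf{x}} \in S_1$. From (17), we conclude that $\delta_1(\mathbf{x}, \hat{\mathbf{x}}) = \delta_1(\mathbf{x}, \mathbf{x})$ is true if for all $i$
$$x_i = \hat{x}_i - \bar{\hat{x}} = \hat{x}_i + c, \tag{19}$$
where $c$ is any real number. However, Property A guarantees that the above is false for codewords in $S_1$, and thus $\delta_1(\mathbf{x}, \mathbf{x}) < \delta_1(\mathbf{x}, \hat{\mathbf{x}})$, $\mathbf{x} \neq \hat{\mathbf{x}}$. Properties B and C are not required, so that it is sufficient that we draw the codewords from $S_1$, which, clearly, has a positive bearing on the redundancy of the system.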
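The offset resistance of (17) can be observed in a small experiment. The sketch below assumes a toy codebook $S_1$ with $q = 3$, $n = 4$, a fixed noise sample, and a handful of offsets; all of these values are illustrative assumptions.

```python
from itertools import product


def delta1(r, x_hat):
    """Modified distance measure (17)."""
    mean = sum(x_hat) / len(x_hat)
    return sum((ri - xi + mean) ** 2 for ri, xi in zip(r, x_hat))


def detect_offset_only(r, codebook):
    """Pick the codeword that minimizes (17)."""
    return min(codebook, key=lambda c: delta1(r, c))


# Assumed toy codebook S1: words that contain the symbol 0.
q, n = 3, 4
s1 = [w for w in product(range(q), repeat=n) if 0 in w]

sent = (2, 0, 1, 2)
noise = (0.1, -0.15, 0.05, -0.1)  # assumed, fixed noise sample
x_noisy = [x + v for x, v in zip(sent, noise)]

detections = []
shifts = []
for b in [0.0, 0.8, -1.7, 5.0]:  # assumed offsets
    r = [xp + b for xp in x_noisy]
    detections.append(detect_offset_only(r, s1))
    # Change of the metric caused by the offset, for two different candidates;
    # the derivation of (18) says it is the same for every candidate.
    shifts.append((delta1(r, (0, 0, 0, 1)) - delta1(x_noisy, (0, 0, 0, 1)),
                   delta1(r, (2, 0, 1, 1)) - delta1(x_noisy, (2, 0, 1, 1))))

print(detections)
```

The offset changes every metric value by the same candidate-independent amount, so the detected word does not depend on $b$.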
In the next section, we will compute the error performance of the MPD detection method in the presence of additive noise.
IV. PERFORMANCE ANALYSIS
We will now focus on the detection of codewords conveyed over channels with gain and/or offset mismatch, where we assume that a randomly chosen codeword $\mathbf{x} \in S_T$ is sent and received as $\mathbf{r} = \mathbf{x} + \boldsymbol{\nu}$, where $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_n)$, and $\nu_i$ are noise samples with distribution $N(0, \sigma^2)$. Note that we dropped the scalars $a$ and $b$ since, as established above, the MPD detector performance is independent of those parameters.
We will start by computing the detection error performance for the simplest, ‘offset only’, case, $T = 1$, using (17) and codewords taken from codebook $S_1$.
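A. Case $T = 1$, offset mismatch
We assume $\mathbf{x} \in S_1$ is sent, and that the receiver applies (17) to compute the distance between the received vector $\mathbf{r} = \mathbf{x} + \boldsymbol{\nu}$ and all codewords in $S_1$. The receiver decides that the codeword $\mathbf{x}_o$ was sent if (17) attains its least value for $\hat{\mathbf{x}} = \mathbf{x}_o$, that is
$$\mathbf{x}_o = \arg\min_{\hat{\mathbf{x}} \in S_1} \delta_1(\mathbf{r}, \hat{\mathbf{x}}) = \arg\min_{\hat{\mathbf{x}} \in S_1} \sum_{i=1}^{n} (r_i - \hat{x}_i + \bar{\hat{x}})^2. \tag{20}$$
We have $r_i = x_i + \nu_i$, so that
$$\delta_1(\mathbf{r}, \hat{\mathbf{x}}) = \sum_{i=1}^{n} (x_i + \nu_i - \hat{x}_i + \bar{\hat{x}})^2.$$
The analysis is simplified by noticing that
$$\delta_1(\mathbf{r}, \hat{\mathbf{x}}) \equiv \sum_{i=1}^{n} (x_i + \nu_i - \hat{x}_i + \bar{\hat{x}} - \bar{x})^2.$$
Define $\mathbf{e} = \mathbf{x} - \hat{\mathbf{x}}$ and $\bar{e} = \bar{x} - \bar{\hat{x}}$, then we obtain the much simpler expression
$$\delta_1(\mathbf{r}, \hat{\mathbf{x}}) \equiv \sum_{i=1}^{n} (e_i - \bar{e} + \nu_i)^2.$$
The detector errs, i.e. $\mathbf{x}_o \neq \mathbf{x}$, if there is at least one codeword $\hat{\mathbf{x}} \in S_1$, $\hat{\mathbf{x}} \neq \mathbf{x}$, such that
$$\delta_1(\mathbf{r}, \hat{\mathbf{x}}) < \delta_1(\mathbf{r}, \mathbf{x}),$$
or
$$\sum_{i=1}^{n} (e_i - \bar{e} + \nu_i)^2 < \sum_{i=1}^{n} \nu_i^2 \tag{21}$$
or
$$2 \sum_{i=1}^{n} \nu_i (e_i - \bar{e}) + \sum_{i=1}^{n} (e_i - \bar{e})^2 < 0. \tag{22}$$
The left-hand side of (22) is a stochastic variable with distribution $N(\alpha_1, \beta_1 \sigma^2)$, where
$$\alpha_1 = \sum_{i=1}^{n} (e_i - \bar{e})^2$$
and
$$\beta_1 = 4 \sum_{i=1}^{n} (e_i - \bar{e})^2.$$
We define the square of the distance between the vectors $\mathbf{x}$ and $\hat{\mathbf{x}}$ by
$$d_1^2(\mathbf{x}, \hat{\mathbf{x}}) = \frac{4 \alpha_1^2}{\beta_1} = \sum_{i=1}^{n} (e_i - \bar{e})^2.$$
The probability that $\delta_1(\mathbf{r}, \hat{\mathbf{x}}) < \delta_1(\mathbf{r}, \mathbf{x})$ equals
$$Q\left( \frac{d_1(\mathbf{x}, \hat{\mathbf{x}})}{2\sigma} \right),$$
where the $Q$-function is defined by
$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-u^2/2} \, du.$$
The word error rate (WER) over all coded sequences $\mathbf{x}$ is now upperbounded by
$$\mathrm{WER} < \frac{1}{|S_1|} \sum_{\mathbf{x} \in S_1} \sum_{\hat{\mathbf{x}} \neq \mathbf{x}} Q\left( \frac{d_1(\mathbf{x}, \hat{\mathbf{x}})}{2\sigma} \right). \tag{23}$$
Define the square of the minimum distance between any possible pair of codewords in $S_1$ by
$$d_{\min,1}^2 = \min_{\mathbf{x}, \hat{\mathbf{x}} \in S_1,\ \mathbf{x} \neq \hat{\mathbf{x}}} d_1^2(\mathbf{x}, \hat{\mathbf{x}}) = \min_{\mathbf{e} \neq 0} \sum_{i=1}^{n} (e_i - \bar{e})^2. \tag{24}$$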
Then, for asymptotically large signal-to-noise ratios, i.e. for $\sigma \ll 1$, the word error rate is overbounded by [10]
$$\mathrm{WER} < N_1 Q\left( \frac{d_{\min,1}}{2\sigma} \right), \tag{25}$$
where $N_1$ is the average number of pairs of codewords (neighbors) at minimum distance, $d_{\min,1}$. The minimum distance, $d_{\min,1}$, is called (asymptotic) coding loss. We simply find
$$d_{\min,1} = \sqrt{1 - \frac{1}{n}}. \tag{26}$$
The computation of the average number of neighboring codewords $\hat{\mathbf{x}}$ of $\mathbf{x}$ in $S_1$ at minimum distance $d_{\min,1}$ is a formidable combinatorics exercise. We may, however, compute an approximation to $N_1$ by using the following observation. Clearly, a pair of codewords $\hat{\mathbf{x}}$ and $\mathbf{x}$ at unity Euclidean distance $\delta(\hat{\mathbf{x}}, \mathbf{x})$ is also a pair of codewords at minimum distance $d_{\min,1}$. For a full set of $q^n$ codewords, a neighbor at unity Euclidean distance from $\mathbf{x}$ is obtained by adding or subtracting a ‘one’ from each of the $n$ symbols in $\mathbf{x}$, unless the symbol equals 0 or $q - 1$, in which case we can only add a one (if $x_i = 0$) or subtract a one (if $x_i = q - 1$). Then the average number of pairs of codewords at unity Euclidean distance equals $2n(q-1)/q$. Since the code set $S_1$ is obtained by deleting a relatively small number of codewords from the full set, we will approximate the average number of neighbors, $N_1$, at minimum distance $d_{\min,1}$ in $S_1$ by
$$N_1 \approx \frac{2n(q-1)}{q}. \tag{27}$$
For small values of $n$ and $q > 2$ the computation of $N_1$ is feasible by exhaustively computing the distance between each pair of codewords. We found that in the range of (small) values of $n$ and $q$ investigated the approximation is of sufficient accuracy for engineering applications.
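For small parameters, the exhaustive computation mentioned above is only a few lines of code. The sketch below evaluates $d_1^2$ for every ordered pair of distinct codewords in $S_1$ with exact rational arithmetic, and returns the minimum squared distance together with the average number of codewords at that distance, so that the latter can be compared with (27). The function names and the parameter choices are ours.

```python
from fractions import Fraction
from itertools import product


def d1_squared(x, x_hat):
    """Squared distance sum (e_i - e_mean)^2 with e = x - x_hat."""
    n = len(x)
    e = [a - b for a, b in zip(x, x_hat)]
    e_mean = Fraction(sum(e), n)
    return sum((ei - e_mean) ** 2 for ei in e)


def min_distance_and_neighbors(q, n):
    """Exhaustive search over all ordered pairs of distinct words in S1."""
    s1 = [w for w in product(range(q), repeat=n) if 0 in w]
    distances = [d1_squared(x, y) for x in s1 for y in s1 if x != y]
    d_min_sq = min(distances)
    avg_neighbors = Fraction(distances.count(d_min_sq), len(s1))
    return d_min_sq, avg_neighbors


for q, n in [(2, 3), (2, 4), (3, 3), (4, 3)]:
    d_min_sq, avg_neighbors = min_distance_and_neighbors(q, n)
    approx = Fraction(2 * n * (q - 1), q)  # approximation (27)
    print(q, n, d_min_sq, float(avg_neighbors), float(approx))
```

In each of these cases the minimum squared distance equals $1 - 1/n$, in agreement with (26).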
By combining (25), (26), and (27), we obtain an upperbound to the word error rate (WER)
$$\mathrm{WER} < \frac{2n(q-1)}{q} Q\left( \frac{1}{2\sigma} \sqrt{1 - \frac{1}{n}} \right). \tag{28}$$
Figure 3 shows the WER, computed using the above upperbound (28), as a function of the signal-to-noise ratio (SNR), where the quantity SNR is defined by
$$\mathrm{SNR\,(dB)} = -20 \log_{10} \sigma.$$
Results are presented in Figure 3 for two binary cases, $n = 4$ and $n = 12$. Computer simulations of the detection process were conducted for assessing the accuracy of (28). The dotted lines show the results obtained by computer simulations, which compare fairly well with (28).
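Upperbound (28) is straightforward to evaluate. The sketch below expresses the $Q$-function through the complementary error function, $Q(x) = \frac{1}{2}\operatorname{erfc}(x/\sqrt{2})$, and assumes the SNR convention $\mathrm{SNR\,(dB)} = -20 \log_{10} \sigma$; the function names and the SNR grid are ours.

```python
from math import erfc, sqrt


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2.0))


def wer_bound_offset(q, n, snr_db):
    """Upperbound (28) for offset-resistant detection (T = 1)."""
    sigma = 10.0 ** (-snr_db / 20.0)   # assumed convention: SNR(dB) = -20 log10(sigma)
    d_min = sqrt(1.0 - 1.0 / n)        # minimum distance (26)
    neighbors = 2.0 * n * (q - 1) / q  # approximation (27)
    return neighbors * q_function(d_min / (2.0 * sigma))


for snr_db in (15, 16, 17, 18, 19):
    print(snr_db, wer_bound_offset(2, 4, snr_db), wer_bound_offset(2, 12, snr_db))
```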
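Fig. 3. Word error rate (WER) of offset-resistant detection as a function of the signal-to-noise ratio (SNR) for the binary case, $q = 2$, and $n = 4$ and $n = 12$, computed using upperbound (28). The dotted lines show results of computer simulations.
B. Case $T = 2$, offset and gain mismatch
We will now examine the error performance of the Pearson-distance-based detection scheme. The receiver uses the Pearson distance (6) for the evaluation of the received word, where we assume that $\mathbf{x} \in S_2$ is sent, and received as $\mathbf{r} = \mathbf{x} + \boldsymbol{\nu}$. The receiver errs if there is at least one codeword $\hat{\mathbf{x}} \neq \mathbf{x}$, $\hat{\mathbf{x}} \in S_2$, such that
$$\delta_2(\mathbf{r}, \hat{\mathbf{x}}) < \delta_2(\mathbf{r}, \mathbf{x}).$$
After working out using (12), we obtain
$$-\sum_{i=1}^{n} r_i \frac{\hat{x}_i - \bar{\hat{x}}}{\sigma_{\hat{x}}} < -\sum_{i=1}^{n} r_i \frac{x_i - \bar{x}}{\sigma_x}.$$
We substitute $r_i = x_i + \nu_i$, and obtain
$$\sum_{i=1}^{n} (x_i + \nu_i)(a_i - \hat{a}_i) < 0, \tag{29}$$
where
$$a_i = \frac{x_i - \bar{x}}{\sigma_x}$$
and
$$\hat{a}_i = \frac{\hat{x}_i - \bar{\hat{x}}}{\sigma_{\hat{x}}}.$$
The left-hand side of inequality (29) is a stochastic variable with distribution $N(\alpha_2, \beta_2 \sigma^2)$, where
$$\alpha_2 = \sum_{i=1}^{n} x_i (a_i - \hat{a}_i)$$
and
$$\beta_2 = \sum_{i=1}^{n} (a_i - \hat{a}_i)^2.$$
Since
$$\sum_{i=1}^{n} a_i = \sum_{i=1}^{n} \hat{a}_i = 0,$$
we have
$$\alpha_2 = \sum_{i=1}^{n} x_i (a_i - \hat{a}_i) = \sum_{i=1}^{n} (x_i - \bar{x})(a_i - \hat{a}_i) = \sum_{i=1}^{n} (x_i - \bar{x}) \left( \frac{x_i - \bar{x}}{\sigma_x} - \frac{\hat{x}_i - \bar{\hat{x}}}{\sigma_{\hat{x}}} \right) = \sigma_x (1 - \rho_{\mathbf{x},\hat{\mathbf{x}}}). \tag{30}$$
In a similar fashion, we find
$$\beta_2 = 2(1 - \rho_{\mathbf{x},\hat{\mathbf{x}}}). \tag{31}$$
Define the square of the distance, $d_2^2(\mathbf{x}, \hat{\mathbf{x}})$, between the codewords $\mathbf{x}$ and $\hat{\mathbf{x}}$ by
$$d_2^2(\mathbf{x}, \hat{\mathbf{x}}) = \frac{4 \alpha_2^2}{\beta_2} = 2 \sigma_x^2 (1 - \rho_{\mathbf{x},\hat{\mathbf{x}}}). \tag{32}$$
The distance, $d_2^2(\mathbf{x}, \hat{\mathbf{x}})$, is not symmetric in the vectors $\mathbf{x}$ and $\hat{\mathbf{x}}$, that is, in general, $d_2^2(\mathbf{x}, \hat{\mathbf{x}}) \neq d_2^2(\hat{\mathbf{x}}, \mathbf{x})$. The quantity, $d_2^2(\mathbf{x}, \hat{\mathbf{x}})$, is the product of the Pearson distance between the vectors $\mathbf{x}$ and $\hat{\mathbf{x}}$, which is symmetric in $\mathbf{x}$ and $\hat{\mathbf{x}}$, and the word variance of the sent word $\mathbf{x}$. Clearly, sent words $\mathbf{x}$ that have a small symbol value variance are more prone to error than words with a large symbol value variance.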
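The asymmetry of (32) is easy to see on a concrete pair. The sketch below evaluates $d_2^2$ in both directions for two binary words of length four; the chosen pair is an illustrative assumption and the function names are ours.

```python
from math import sqrt


def mean_and_variance(x):
    """Average symbol value and unnormalized variance, as in (8) and (9)."""
    mean = sum(x) / len(x)
    return mean, sum((xi - mean) ** 2 for xi in x)


def pearson_correlation(x, y):
    """Pearson correlation coefficient, as in (7)."""
    x_mean, x_var = mean_and_variance(x)
    y_mean, y_var = mean_and_variance(y)
    cov = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return cov / sqrt(x_var * y_var)


def d2_squared(x_sent, x_hat):
    """Squared distance (32): 2 * sigma_x^2 * (1 - rho), x_sent being the sent word."""
    _, x_var = mean_and_variance(x_sent)
    return 2.0 * x_var * (1.0 - pearson_correlation(x_sent, x_hat))


x = (1, 0, 0, 0)   # variance 3/4
y = (1, 1, 0, 0)   # variance 1
print(d2_squared(x, y), d2_squared(y, x))
```

Both evaluations share the same Pearson distance, so their ratio equals the ratio of the two word variances; the word with the smaller variance is the more vulnerable one when sent.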
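The word error rate (WER) over all coded sequences $\mathbf{x}$ is upperbounded by
$$\mathrm{WER} < \frac{1}{|S_2|} \sum_{\mathbf{x} \in S_2} \sum_{\hat{\mathbf{x}} \neq \mathbf{x}} Q\left( \frac{d_2(\mathbf{x}, \hat{\mathbf{x}})}{2\sigma} \right). \tag{33}$$
We define the minimum squared distance between any pair of codewords in $S_2$, $d_{\min,2}^2$, by
$$d_{\min,2}^2 = \min_{\mathbf{x}, \hat{\mathbf{x}} \in S_2,\ \mathbf{x} \neq \hat{\mathbf{x}}} d_2^2(\mathbf{x}, \hat{\mathbf{x}}) = \min_{\mathbf{x}, \hat{\mathbf{x}} \in S_2,\ \mathbf{x} \neq \hat{\mathbf{x}}} 2 \sigma_x^2 (1 - \rho_{\mathbf{x},\hat{\mathbf{x}}}). \tag{34}$$
At high signal-to-noise ratios, the word error rate (WER) is overbounded by [10]
$$\mathrm{WER} < N_2 Q\left( \frac{\alpha_2}{\sqrt{\beta_2}\, \sigma} \right) = N_2 Q\left( \frac{d_{\min,2}}{2\sigma} \right), \tag{35}$$
where $N_2$ is the (average) number of nearest neighbors. We will now compute the minimum distance, $d_{\min,2}^2$, and the number of nearest neighbors, $N_2$.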
In order to find the minimum distance, $d_{\min,2}$, we must compute the distance between any pair of codewords $\mathbf{x}, \hat{\mathbf{x}} \in S_2$, which implies an exhaustive domain search of size around $|S_2|^2$. We may significantly reduce the number of evaluations by the following observations.
It is not difficult to see that any permutation of the symbols of a $T$-constrained $n$-length $q$-ary codeword yields again a $T$-constrained codeword. A constant composition code is a set of $n$-length $q$-ary codewords where the number of occurrences of each symbol within a codeword is the same for each codeword [11]. These specialize to constant weight codes in the binary case, and permutation codes in the case that each symbol occurs exactly once.
Define the composition vector $\mathbf{w}(\mathbf{x}) = (w_0, \ldots, w_{q-1})$ of $\mathbf{x}$, where the $q$ entries $w_j$, $j \in \mathcal{Q}$, of $\mathbf{w}$ indicate the number of occurrences of the symbol $j \in \mathcal{Q}$ in $\mathbf{x}$. Thus for a sequence $\mathbf{x}$, we denote the number of appearances of the symbol $j$ by
$$w_j(\mathbf{x}) = |\{ i : x_i = j \}| \quad \text{for } j = 0, 1, \ldots, q-1. \tag{36}$$
It is immediate that $\sum_j w_j = n$ and $w_j \in \{0, \ldots, n\}$. A constant composition code will be denoted by $S_{\mathbf{w}}$, where $\mathbf{w}$ is the composition vector that characterizes the code. Clearly, the codebook $S_T$ is the union of $K$ constant composition codes, denoted by $S_{\mathbf{w}_i}$, $1 \leq i \leq K$, whose composition vector
$$\mathbf{w}_i = (w_0, \ldots, w_j, \ldots, w_{q-1})_i, \quad 1 \leq i \leq K,$$
has $w_0 > 0$ for $T = 1$, and both $w_0 > 0$ and $w_{q-1} > 0$ for $T = 2$. The number, $K$, of constant composition codes equals
$$K = \binom{n + q - 1 - T}{q - 1}. \tag{37}$$
For the binary case, $q = 2$, we simply find
$$K = n + 1 - T.$$
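Formula (37) can be confirmed by listing the admissible composition vectors directly. The sketch below counts all vectors $(w_0, \ldots, w_{q-1})$ with $\sum_j w_j = n$ that satisfy the positivity conditions stated above and compares the count with the binomial coefficient; the function names and parameter ranges are ours.

```python
from itertools import product
from math import comb


def count_compositions(q, n, T):
    """Count composition vectors of T-constrained words (T = 1 or T = 2)."""
    count = 0
    for w in product(range(n + 1), repeat=q):
        if sum(w) != n:
            continue
        if w[0] == 0:                 # symbol 0 must occur (T = 1 and T = 2)
            continue
        if T == 2 and w[q - 1] == 0:  # symbol q - 1 must occur as well (T = 2)
            continue
        count += 1
    return count


def k_formula(q, n, T):
    """Closed form (37)."""
    return comb(n + q - 1 - T, q - 1)


for q in range(2, 5):
    for n in range(2, 7):
        for T in (1, 2):
            print(q, n, T, count_compositions(q, n, T), k_formula(q, n, T))
```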
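Each constant composition code is characterized by one codeword of the code, called pivot word, denoted by $\mathbf{x}_{p_i}$, $1 \leq i \leq K$. The pivot word $\mathbf{x}_{p_i}$ is, by definition, the largest word in the lexicographical ordering (the largest symbols first) of the codewords in $S_{\mathbf{w}_i}$. Thus, for a pivot word we have $\mathbf{x}_{p_i} \in S_{\mathbf{w}_i}$ and $\mathbf{x}_{p_i} = (x_{p_i,1}, x_{p_i,2}, \ldots, x_{p_i,n})$, where
$$x_{p_i,1} \geq x_{p_i,2} \geq x_{p_i,3} \geq \ldots \geq x_{p_i,n-1} \geq x_{p_i,n}, \quad 1 \leq i \leq K.$$
By definition we have for $T = 2$, $x_{p_i,1} = q - 1$ and $x_{p_i,n} = 0$, $1 \leq i \leq K$. The remaining codewords in $S_T$ consist of all distinct sequences that can be formed by permuting the order of the $n$ symbols that form the $K$ pivot words $\mathbf{x}_{p_i}$, $1 \leq i \leq K$.
Let the codewords $\mathbf{x} \in S_{\mathbf{w}_i}$ and $\hat{\mathbf{x}} \in S_{\mathbf{w}_j}$, $i \neq j$, then we have
$$d_2^2(\mathbf{x}, \hat{\mathbf{x}}) = 2 \sigma_x^2 (1 - \rho_{\mathbf{x},\hat{\mathbf{x}}}) = A - B \sum_{i=1}^{n} x_i \hat{x}_i,$$
where
$$A = 2 \sigma_x^2 \left( 1 + \frac{n \bar{x} \bar{\hat{x}}}{\sigma_x \sigma_{\hat{x}}} \right)$$
and
$$B = \frac{2 \sigma_x}{\sigma_{\hat{x}}}$$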
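are positive constants. Since the average symbol value and symbol value variance of codewords from the same constant composition code are all equal, the only degree of freedom we have for minimizing $d_2^2(\mathbf{x}, \hat{\mathbf{x}})$ is permuting the symbols in $\mathbf{x}$ and $\hat{\mathbf{x}}$ in such a way that the inner product $\sum_{i=1}^{n} x_i \hat{x}_i$ is maximized. It has been shown by Slepian [12] that $\sum_{i=1}^{n} x_i \hat{x}_i$ is maximized by pairing the largest symbol of $\mathbf{x}$ with the largest symbol of $\hat{\mathbf{x}}$, the second largest symbol of $\mathbf{x}$ with the second largest symbol of $\hat{\mathbf{x}}$, etc. Since, by definition, the pivot words are sorted by descending order of the symbol values, we conclude that the minimum inter-subset distance is given by
$$\min_{\mathbf{x} \in S_{\mathbf{w}_i},\ \hat{\mathbf{x}} \in S_{\mathbf{w}_j}} d_2(\mathbf{x}, \hat{\mathbf{x}}) = d_2(\mathbf{x}_{p_i}, \mathbf{x}_{p_j}), \quad i \neq j.$$
Define the minimum intra-subset distance by
$$d_2(\mathbf{x}_{p_i}, \mathbf{x}_{p_i}) = \min_{\mathbf{x}, \hat{\mathbf{x}} \in S_{\mathbf{w}_i},\ \mathbf{x} \neq \hat{\mathbf{x}}} d_2(\mathbf{x}, \hat{\mathbf{x}}),$$
then it is immediate that
$$d_{\min,2} = \min_{1 \leq i,j \leq K} d_2(\mathbf{x}_{p_i}, \mathbf{x}_{p_j}). \tag{38}$$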
A second reduction in the number of distance evaluations can be made by observing that
$$d_2(\mathbf{x}, \hat{\mathbf{x}}) = d_2(q - 1 - \mathbf{x}, q - 1 - \hat{\mathbf{x}}). \tag{39}$$
The above observation implies that we may limit the distance evaluations to (sent) words $\mathbf{x}$, whose number of ‘$q-1$’ symbols is less than the number of ‘0’ symbols. For other values of $q$, $q > 2$, the pivot words start with a series of ‘$q-1$’ symbols, followed by a series of ‘$q-2$’ symbols etc., and end with a series of ‘0’ symbols. The series of symbols may be of zero length, except, however, for the symbol values ‘$q-1$’ and ‘0’, where the series must have a length of at least unity.
Figure 4 shows results of computations, where we plotted $20 \log(d_{\min,2})$ (dB) as a function of $n$ with $q = 2, 3, 4$ as a parameter. After perusing the diagram, we notice that the variation of $d_{\min,2}$ as a function of $n$ diminishes with larger values of $q$. We further notice that with increasing $n$ the minimum distance $d_{\min,2}$ converges to a limiting value. In our computations, we found (but we could not prove it for all $q$ and $n$) that the codewords $\mathbf{x} = (q-1, q-2, 0, \ldots, 0)$ and $\hat{\mathbf{x}} = (q-1, q-1, 0, \ldots, 0)$, and their inverses $(q-1, q-1, \ldots, q-1, 1, 0)$ and $(q-1, q-1, \ldots, q-1, 0, 0)$ are at minimum squared distance $d_{\min,2}^2 = \min 2 \sigma_x^2 (1 - \rho_{\mathbf{x},\hat{\mathbf{x}}})$, $\mathbf{x}, \hat{\mathbf{x}} \in S_2$. For $q = 2$, the codeword $(1, 0, 0, \ldots)$ and its inverse $(1, \ldots, 1, 0)$ each have $n - 1$ nearest neighbors. This also holds for the $n$ permutations of $(1, 0, 0, \ldots)$ plus its inverse $(0, 1, 1, \ldots)$, so that, for $q = 2$, the total number of pairs of codewords at minimum distance equals $2n(n-1)$. For $q > 2$, the codeword $(q-1, q-2, 0, \ldots)$ and its $n(n-1)$ permutations each have only one codeword at minimum distance. The same holds for its inverse $(0, 1, q-1, q-1, \ldots)$, so that for $q > 2$ there are $2n(n-1)$ pairs of codewords at minimum distance. We conclude with (35) that
$$\mathrm{WER} < \frac{2n(n-1)}{|S_2|} Q\left( \frac{d_{\min,2}}{2\sigma} \right), \quad \sigma \ll 1. \tag{40}$$
Fig. 4. Minimum squared distance, $20 \log(d_{\min,2})$ (dB), as a function of the codeword length $n$ with $q = 2, 3, 4$ as a parameter. Although lines have been drawn, the curves consist of discrete points.
Fig. 5. Word error rate (WER) of offset&gain-resistant detection as a function of the signal-to-noise ratio (SNR) for $q = 2$, $n = 4$ and $12$. The diagram shows the error performance computed by upperbound (33) and computer simulations (dotted lines).
Upperbound (40), where it is assumed that minimum distance error events dominate the performance, is far from accurate in the range $\mathrm{WER} > 10^{-8}$. A better approximation to the word error rate is given by upperbound (33), where the full distance profile is used. Figure 5 shows the word error rate computed using upperbound (33) for $q = 2$ and $n = 4, 12$, as a function of the signal-to-noise ratio. The computer simulations (dotted lines) compare fairly well with upperbound (33).
V. IMPLEMENTATION ISSUES
In the above we did not yet address the implementation
of the Minimum Pearson Distance detector, and we tacitly
assumed that the receiver computes the Pearson distance
for all codewords in the codebook before it may give its
verdict on the word sent. Below we will show that we
may reduce the number of distance computations to the
number of constant composition subsets, $K$, that constitute
the full set $S_T$. This means for $q = 2$ and $T = 2$,
for example, that instead of $2^n - 2$ for a full set, only
$n - 1$ computations of the Pearson distance are required.

TABLE I
PIVOT WORDS, AVERAGE SYMBOL VALUE, SYMBOL VALUE
VARIANCE, AND DISTANCE FOR THE CASE DESCRIBED IN EXAMPLE 1.

$i$   $x_{p_i}$    $\bar{x}_{p_i}$   $\sigma^2_{x_{p_i}}$   $\delta_2(r, x_{p_i})$
1     (1,0,0,0)    1/4               3/4                    5.93
2     (1,1,0,0)    2/4               1                      6.23
3     (1,1,1,0)    3/4               3/4                    6.74
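The saving in distance computations quoted above can be checked by brute-force enumeration for small $n$. The sketch below is illustrative only. It assumes that $S_2$ consists of all $q$-ary words of length $n$ in which both reference symbols $0$ and $q-1$ occur at least once, and it counts one pivot word per multiset of symbols. The function name `count_codewords_and_pivots` is hypothetical.

```python
from itertools import combinations_with_replacement, product

def count_codewords_and_pivots(n, q):
    """Return (number of codewords, number of pivot words) for S_2.

    Assumption: a word belongs to S_2 when both reference symbols
    0 and q-1 appear in it at least once.
    """
    def has_refs(word):
        return 0 in word and (q - 1) in word

    # Every ordered word counts as a separate codeword.
    codewords = sum(1 for w in product(range(q), repeat=n) if has_refs(w))
    # One pivot word per constant composition subset, i.e. per multiset.
    pivots = sum(1 for w in combinations_with_replacement(range(q), n)
                 if has_refs(w))
    return codewords, pivots

print(count_codewords_and_pivots(4, 2))  # (14, 3): 2^4 - 2 words, n - 1 pivots
```

For $q = 2$ the counts agree with $2^n - 2$ and $n - 1$. For larger $q$ the enumeration grows as $q^n$, so it is only meant as a sanity check for short codewords.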
As discussed earlier, Slepian [12] showed that a single permutation
modulation code may be very efficiently decoded
by applying a sorting algorithm to the entries of the received
signal vector. Slepian's method can be extended to our case
incorporating a plurality of $K$ constant composition codes.
The receiver has tabulated the $K$ pivot words, where the
symbols of the pivot words are, as described above, sorted
in descending order, i.e., the symbols with the largest values
first. Detection of the received vector, $r$, is accomplished by
invoking the following two-step procedure.
1) The $n$ symbols of the received word, $r$, are sorted,
largest to smallest, in the same way as taught in
Slepian's prior art.
2) Compute $\delta_2(r, x_{p_i})$ or $\delta_1(r, x_{p_i})$, $1 \le i \le K$, using
(12) or (17), for all $K$ pivot words. The receiver
decides that a word taken from $S_{w_k}$, $1 \le k \le K$,
was sent in case the pivot word $x_{p_k}$ is at minimum
distance to the received vector $r$. Now that we
have ascertained that the sent word is a member of
$S_{w_k}$, we can decode the received word by Slepian's
procedure for single permutation codes.
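The two-step procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The distance is taken as $\delta_2(r, x) = \sum_i \left(r_i - (x_i - \bar{x})/\sigma_x\right)^2$ with $\sigma_x^2 = \sum_i (x_i - \bar{x})^2$, a form assumed here because it is consistent with the values listed in Table I. The function names `pearson_distance` and `detect` are hypothetical.

```python
import math

def pearson_distance(r, x):
    """Assumed form of delta_2: sum_i (r_i - (x_i - mean(x)) / sigma_x)^2,
    with sigma_x^2 = sum_i (x_i - mean(x))^2 (x must not be constant)."""
    mean_x = sum(x) / len(x)
    sigma_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    return sum((ri - (xi - mean_x) / sigma_x) ** 2 for ri, xi in zip(r, x))

def detect(r, pivots):
    """Two-step detection: sort r, pick the nearest pivot, undo the sort."""
    # Step 1: indices of r ordered from largest to smallest received value.
    order = sorted(range(len(r)), key=lambda i: r[i], reverse=True)
    r_sorted = [r[i] for i in order]
    # Step 2: one distance computation per pivot word (K in total).
    best = min(pivots, key=lambda p: pearson_distance(r_sorted, p))
    # Slepian-style decoding: the j-th largest received symbol is assigned
    # the j-th symbol of the chosen (descending-sorted) pivot word.
    decoded = [0] * len(r)
    for j, i in enumerate(order):
        decoded[i] = best[j]
    return tuple(decoded)

pivots = [(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0)]  # q = 2, n = 4
r = [0.3, 1.4, 1.1, 0.2]
print(detect(r, pivots))                         # (0, 1, 1, 0)
print(detect([2 * v + 5 for v in r], pivots))    # (0, 1, 1, 0)
```

The second call applies a gain of 2 and an offset of 5 to the same vector; under the assumed distance the decision does not change, in line with the claimed immunity to gain and offset mismatch.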
Below we will describe a simple example for the gain-and-offset
mismatch channel to illustrate the detection routine.
Example 1: Let $n = 4$ and $q = 2$, then there are 14
codewords in $S_2$ and 3 subsets of constant composition.
Assume the received vector is $r = (1.2, 2.3, 0.4, 0.8)$. First
we sort the symbols of $r$ largest to smallest, and obtain
$(2.3, 1.2, 0.8, 0.4)$, and compute $\delta_2(r, x_{p_i})$ for each pivot
word $x_{p_i}$. Table I lists $x_{p_i}$, $i = 1, 2, 3$, the average symbol
value of the pivot words, the symbol value variance of the
pivot words, and the distance $\delta_2(r, x_{p_i})$ computed using
(13). We conclude that the pivot word $(1, 0, 0, 0)$ has the
smallest distance, 5.93, to the received vector $r$, so that
the detector, following Slepian's sorting algorithm, assigns
the largest symbol in $r$ to a '1' and the three other symbols
to '0's. Thus, the receiver decides that the word $(0, 1, 0, 0)$
was sent.
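The numbers of Example 1 can be reproduced with a short script. This is a sketch under an assumption: equation (13) is not restated in this section, so the distance is taken as $\delta_2(r, x) = \sum_i \left(r_i - (x_i - \bar{x})/\sigma_x\right)^2$ with $\sigma_x^2 = \sum_i (x_i - \bar{x})^2$, which matches the three distances listed in Table I. The script also compares the pivot-based decision with an exhaustive search over all 14 codewords.

```python
import math
from itertools import product

def pearson_distance(r, x):
    """Assumed form of delta_2 (chosen to match the Table I values)."""
    mean_x = sum(x) / len(x)
    sigma_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    return sum((ri - (xi - mean_x) / sigma_x) ** 2 for ri, xi in zip(r, x))

r = [1.2, 2.3, 0.4, 0.8]
r_sorted = sorted(r, reverse=True)            # [2.3, 1.2, 0.8, 0.4]
pivots = [(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0)]

# Three distance computations, one per pivot word.
table = [round(pearson_distance(r_sorted, p), 2) for p in pivots]
print(table)  # [5.93, 6.23, 6.74]

# Exhaustive search over all 14 words of S_2 (all-0 and all-1 excluded).
codebook = [w for w in product((0, 1), repeat=4) if 0 < sum(w) < 4]
best = min(codebook, key=lambda w: pearson_distance(r, w))
print(best)   # (0, 1, 0, 0)
```

Both routes select the word $(0, 1, 0, 0)$, but the pivot-based route needs 3 distance computations instead of 14.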
VI. CONCLUSIONS
We have studied coding and detection techniques for q-level
channels with gain and/or offset mismatch. We have
analyzed a new technique, where codeword detection is
based on the Pearson distance. The proposed detection
technique is used in conjunction with T-constrained codes
consisting of q-ary codewords, where in each codeword T
reference symbols appear at least once. We have shown that
the new detection technique is intrinsically resistant to offset
and/or gain mismatch. The redundancy of T-constrained
codes is much lower than that of prior art balanced codes
that offer offset and gain mismatch resistant detection,
which makes the new codes more attractive for practical
applications. We have analyzed the error performance of
the Pearson Distance detection technique in the presence of
additive noise. Results of computer simulations of the noisy
channel have been compared with analytical expressions of
the word error rates.
REFERENCES
[1] K.A.S. Immink, 'Coding Methods for High-Density Optical Recording',
Philips J. Res., vol. 41, pp. 410-430, 1986.
[2] G. Bouwhuis, J. Braat, A. Huijser, J. Pasman, G. van Rosmalen, and
K.A.S. Immink, Principles of Optical Disc Systems, Adam Hilger
Ltd, Bristol and Boston, 1985.
[3] K.W. Cattermole, 'Principles of Digital Line Coding', Int. Journal of
Electronics, vol. 55, pp. 3-33, July 1983.
[4] A. Jiang, R. Mateescu, M. Schwartz, and J. Bruck, 'Rank Modulation
for Flash Memories', IEEE Trans. Inform. Theory, vol. IT-55, no. 6,
pp. 2659-2673, June 2009.
[5] H. Zhou, A. Jiang, and J. Bruck, 'Error-correcting schemes with
dynamic thresholds in nonvolatile memories', Int. Symposium on
Inform. Theory (ISIT), St Petersburg, July 2011.
[6] F. Sala, R. Gabrys, and L. Dolecek, 'Dynamic Threshold Schemes
for Multi-Level Non-Volatile Memories', IEEE Trans. on Commun.,
vol. 61, pp. 2624-2634, July 2013.
[7] K.A.S. Immink, 'Coding Schemes for Multi-Level Flash Memories
that are Intrinsically Resistant Against Unknown Gain and/or Offset
Using Reference Symbols', Electronics Letters, vol. 50, pp. 20-22,
2014.
[8] K.A.S. Immink, 'Coding Schemes for Multi-Level Channels with
Unknown Gain and/or Offset Using Balance and Energy constraints',
IEEE International Symposium on Information Theory (ISIT),
Istanbul, July 2013.
[9] K.A.S. Immink, 'High-Rate Maximum Runlength Constrained Coding
Schemes Using Nibble Replacement', IEEE Trans. Inform. Theory,
vol. IT-58, no. 10, pp. 6572-6580, Oct. 2012.
[10] G.D. Forney Jr., 'Maximum-Likelihood Sequence Estimation of Digital
Sequences in the Presence of Intersymbol Interference', IEEE
Trans. Inform. Theory, vol. IT-18, pp. 363-378, May 1972.
[11] W. Chu, C.J. Colbourn, and P. Dukes, 'On Constant Composition
Codes', Discrete Applied Mathematics, vol. 154, no. 6,
pp. 912-929, 15 April 2006.
[12] D. Slepian, 'Permutation Modulation', Proc. IEEE, vol. 53, pp.
228-236, March 1965.
Kees Schouhamer Immink (M'81-SM'86-F'90) received his
PhD degree from the Eindhoven University of Technology.
He was with Philips Research Labs in Eindhoven from
1968 till 1998. In 1998, he founded Turing Machines Inc, a
company that has been successful in applying the tenets of
information theory to digital data storage and transmission.
He is, since 1994, an adjunct professor at the Institute
for Experimental Mathematics, Duisburg-Essen University,
Germany.
Immink designed coding techniques of virtually all
consumer-type digital audio and video recording products,
such as Compact Disc, CD-ROM, CD-Video, Digital
Audio Tape recorder, Digital Compact Cassette system,
DCC, Digital Versatile Disc, DVD, Video Disc Recorder,
and Blu-ray Disc. He received widespread recognition
for his many contributions to the technologies of video,
audio, and data recording. He received a Knighthood in
2000, a personal Emmy award in 2004, the 1996 IEEE
Masaru Ibuka Consumer Electronics Award, the 1998
IEEE Edison Medal, the 1999 AES Gold Medal, the
2004 SMPTE Progress Medal, and the Eduard Rhein
Prize for Technology in 2014. He received the Golden
Jubilee Award for Technological Innovation by the IEEE
Information Theory Society in 1998. He was named a
fellow of the IEEE, AES, and SMPTE, and was inducted
into the Consumer Electronics Hall of Fame, and elected
into the Royal Netherlands Academy of Sciences and the
US National Academy of Engineering. He received an
honorary doctorate from the University of Johannesburg in
2014. He served the profession as President of the Audio
Engineering Society inc., New York, in 2003.
Jos H. Weber (S’87-M’90-SM’00) was born in Schiedam,
The Netherlands, in 1961. He received the M.Sc. (in math-
ematics, with honors), Ph.D., and MBT (Master of Busi-
ness Telecommunications) degrees from Delft University
of Technology, Delft, The Netherlands, in 1985, 1989, and
1996, respectively.
Since 1985 he has been with the Faculty of Electrical
Engineering, Mathematics, and Computer Science of Delft
University of Technology. Currently, he is an associate
professor in the Department of Intelligent Systems. He is the
chairman of the WIC (Werkgemeenschap voor Informatie-
en Communicatietheorie in the Benelux) and the secretary
of the IEEE Benelux Chapter on Information Theory. He
was a Visiting Researcher at the University of California
at Davis, USA, the University of Johannesburg, South
Africa, the Tokyo Institute of Technology, Japan, and EPFL,
Switzerland. His main research interests are in the areas of
channel and network coding.
Article
The role of line coding is to convert source data to a digital form resistant to noise in combination with such other impairments as a specific medium may suffer (notably intersymbol interference, digit timing jitter and carrier phase error), while being reasonably economical in the use of bandwidth. This paper discusses the nature and role of various constraints on code words and word sequences, including those commonly used on metallic lines, optical fibres, carrier channels and radio links ; and gives some examples from each of these applications. It should serve both as a general review of the subject and as an introduction to the companion papers on specific topics.