New Approaches in Ordinal Pattern Representations for Multivariate Time Series
Marisa Mohr,1,2,* Florian Wilhelm,1 Mattis Hartwig,2 Ralf Möller,2 Karsten Keller3
1 inovex GmbH, 76131 Karlsruhe, Germany
2 Institute of Information Systems, University of Lübeck, 23562 Lübeck, Germany
3 Institute of Mathematics, University of Lübeck, 23562 Lübeck, Germany
* Correspondence: mohr@ifis.uni-luebeck.de
Abstract
Many practical applications involve classification tasks on
time series data, e.g., the diagnosis of cardiac insufficiency
by evaluating the recordings of an electrocardiogram. Since
most machine learning algorithms for classification are not
capable of dealing with time series directly, mappings of time
series to scalar values, also called representations, are applied
before using these algorithms. Finding efficient mappings,
which capture the characteristics of a time series is subject
of the field of representation learning and especially valu-
able in cases of few data samples. Time series representations
based on information theoretic entropies are a proven and
well-established approach. Since this approach assumes a total ordering, it is directly applicable only to univariate time series, which makes it difficult to use in many real-world applications dealing with multiple measurements at the same time.
Some extensions were established which also cope with mul-
tivariate time series data, but none of the existing approaches
take into account potential correlations between the movements of the variables. In this paper, we propose two new approaches that consider the correlation between multiple variables and outperform state-of-the-art algorithms on real-world data sets.
1 Introduction
Time series data is part of many real-world applications,
e.g., in weather prediction, stock markets, energy produc-
tion, medical recordings, sales and websites activity, polit-
ical or sociological factors. Classification of time series is
a challenging and important subject, e.g., to diagnose cardiac insufficiency by evaluating the recordings of an electrocardiogram (ECG). Classical machine learning models for classification, such as k-nearest neighbors, support vector machines, or random forests, cannot process time series directly. It is necessary to extract scalar-valued representations (or features) from time series before using these algorithms.
The choice of suitable representations strongly influences
the quality of a model and its predictions. Ideally, represen-
tations are chosen without domain knowledge of specialists
and yet contain as much information as possible. An ap-
proach for automatic extraction of representations is given
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
by the application of deep neural networks, which learn ap-
propriate representations of time series within their hidden
layers (Franceschi, Dieuleveut, and Jaggi 2019). Deep neu-
ral networks, however, require many data samples for model
training, which are rarely available in real-world tasks.
In general, the goal is to find efficient mappings from a
time series to a scalar representation that capture as many
characteristics of the time series as possible. Information-theoretic entropies are promising because they rely on an encoding that preserves information content (Amigó 2010). The concept
of Permutation Entropy (PE) has already been successfully
used in time series analysis and classification on many uni-
variate real-world data sets (Antonelli, Meschino, and Bal-
larin 2019; J. Weck et al. 2014; Xue et al. 2019). PE uses
total ordering of data points in a univariate time series to
encode the ups and downs of neighboring points.
The concept of PE in its original form is not able to han-
dle multivariate time series, because of the inability to deter-
mine a total order between vector-valued time points. Nev-
ertheless, in real-world applications we often have to deal
with high-dimensional multivariate time series. For exam-
ple, medical measurements stored as ECG data are usually
not determined from a single electrode, but from multiple
electrodes. There are extensions of PE that also cope with multivariate time series data, but none of the existing approaches takes into account potential correlations between the simultaneous movements of the variables over time.
This paper introduces two new approaches considering
simultaneous movement and correlations of variables. Sec-
tion 2 introduces the ordinal pattern formalism and a defini-
tion of PE. Section 3 presents related work. Section 4 details
our new approaches. To demonstrate their merit, we show in Section 5 that our approaches outperform existing ones in classification tasks on many data sets from the UEA Multivariate Time Series Classification (MTSC) archive (Bagnall et al. 2018). In the last section, we discuss limitations as well as further research.
2 Preliminaries
For using entropies as representations, time series observations are encoded as sequences of symbolic abstractions. As far as current research is concerned, there are two general approaches to symbolization. Classical symbolization approaches use threshold values and data-range partitioning for symbol assignment, such as the well-known SAX representation introduced by Chiu, Keogh, and Lonardi (2003). The ordinal pattern symbolization approach, describing ups and downs, is based on Bandt and Pompe (2002). The formalism and the advantages of this symbolization scheme are introduced in the following.

Figure 1: All possible ordinal patterns of order $d = 3$: $(0,1,2), (2,1,0), (1,0,2), (2,0,1), (0,2,1), (1,2,0)$.

Figure 2: Ordinal pattern determination of order $d = 3$ and time delay $\tau = 20$ in a univariate time series.
2.1 Ordinal Pattern Symbolization
Ordinal patterns describe the total order between two or
more neighbors, coded by permutations.
Definition 1 (Univariate Ordinal Pattern). A vector $(x_1, \ldots, x_d) \in \mathbb{R}^d$ has ordinal pattern $(r_1, \ldots, r_d) \in \mathbb{N}^d$ of order $d \in \mathbb{N}$ if $x_{r_1} \ge \ldots \ge x_{r_d}$ and $r_{l-1} > r_l$ in the case $x_{r_{l-1}} = x_{r_l}$.
Note that equality of two values within a pattern is not allowed; in the case of ties, the newer value is regarded as the smaller one. Figure 1 shows all possible ordinal patterns of order $d = 3$ of a vector $(x_1, x_2, x_3)$. To symbolize a time series $(x_1, x_2, \ldots, x_T) \in \mathbb{R}^T$, each time point $t \in \{d, \ldots, T\}$ is assigned its ordinal pattern of order $d$. The order $d$ is chosen much smaller than the length $T$ of the time series in order to look at small windows of the time series and their distributions of up and down movements. To access overarching trends, delayed behavior is of interest. The time delay $\tau \in \mathbb{N}$ is the delay between successive points in the symbol sequences. Different delays reveal different details of the structure of a time series. Figure 2 visualizes the ordinal pattern determination of order $d = 3$ and time delay $\tau = 20$ at four different time points in a univariate time series.
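As an illustration, the pattern extraction described above can be sketched in a few lines of Python (a sketch of Definition 1 with 0-based indexing, not the authors' implementation):

```python
def ordinal_pattern(x, t, d=3, tau=1):
    """Ordinal pattern of order d and time delay tau at time point t
    (0-based, requires t >= (d-1)*tau), following Definition 1:
    indices are sorted by descending value; ties are broken by the
    rule r_{l-1} > r_l, i.e., for equal values the larger index
    comes first."""
    window = [x[t - (d - 1 - k) * tau] for k in range(d)]
    return tuple(sorted(range(d), key=lambda i: (-window[i], -i)))
```

For the window $(4, 2, 7)$ this yields the pattern $(2, 0, 1)$, since the largest value sits at index 2.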
The ordinal approach has notable advantages in application. First of all, the method is conceptually simple. Second, it is not necessary to have prior knowledge about the
data range or type of time series. Third, the ordinal approach
supports robust and fast implementations (Keller et al. 2017;
Piek, Stolz, and Keller 2019). Fourth, it allows an easier es-
timation of a good symbolization scheme compared to the
classical symbolization approaches (Keller, Maksymenko,
and Stolz 2015; Stolz and Keller 2017).
2.2 Ordinal Pattern Distributions
Not the ordinal patterns themselves, but their distribution in different parts of a univariate time series $(x_t)_{t=1}^{T}$ is of interest. Thus, each pattern is identified with exactly one of the ordinal pattern symbols $j = 1, 2, \ldots, d!$. Using the distribution of symbols, the entropy of ordinal pattern symbols is calculated as in the following definition.

Definition 2 (Permutation Entropy (PE)). The permutation entropy of order $d \in \mathbb{N}$ and delay $\tau \in \mathbb{N}$ of a univariate time series $(x_t)_{t=1}^{T}$, $T \in \mathbb{N}$, is defined by
$$\mathrm{PE}(d, \tau) = -\sum_{j=1}^{d!} p_j^{\tau,d} \ln p_j^{\tau,d}, \qquad (1)$$
where
$$p_j^{\tau,d} = \frac{\#\{t \mid (x_{t-(d-1)\tau}, \ldots, x_{t-\tau}, x_t) \text{ has pattern } j\}}{T-(d-1)\tau} \qquad (2)$$
is the relative frequency of ordinal pattern $j$ in the time series.
Depending on the area of research, entropy is a measure for quantifying inhomogeneity, impurity, complexity, uncertainty, or unpredictability. In this paper, we use PE as a representation (also known as a feature) in a learning model; it measures the complexity of a time series in terms of the irregularity of the ordinal patterns occurring in it. For a time series with maximally random ordinal pattern symbols (resulting in a uniform pattern distribution due to uniqueness), PE equals $\ln(d!)$. For a time series with regular patterns, e.g., in the case of monotony, PE is equal to zero (Amigó 2010).
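Definition 2 translates directly into code. The following is a minimal sketch, not one of the optimized implementations discussed by Keller et al. (2017) or Piek, Stolz, and Keller (2019):

```python
from collections import Counter
from math import log, factorial

def permutation_entropy(x, d=3, tau=1):
    """PE(d, tau) per Definition 2: entropy of the relative frequencies
    of ordinal patterns over all t in {(d-1)*tau, ..., T-1} (0-based)."""
    def pattern(t):
        w = [x[t - (d - 1 - k) * tau] for k in range(d)]
        # descending order; ties broken as in Definition 1
        return tuple(sorted(range(d), key=lambda i: (-w[i], -i)))

    n = len(x) - (d - 1) * tau            # number of evaluated time points
    counts = Counter(pattern(t) for t in range((d - 1) * tau, len(x)))
    return -sum((c / n) * log(c / n) for c in counts.values())
```

A monotone series yields PE of zero, and no series of order $d$ can exceed $\ln(d!)$, matching the bounds stated above.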
A multivariate time series $((x_t^i)_{i=1}^{m})_{t=1}^{T}$ has more than one time-dependent variable. Each variable $x^i$ for $i \in \{1, \ldots, m\}$ depends not only on its past values but also has some dependency on the other variables. Considering two time points $(x_t^i)_{i=1}^{m}$ and $(x_{t+1}^i)_{i=1}^{m}$ with $m$ variables, it is not possible to establish a total order between them. A total order is only possible if $x_t^i > x_{t+1}^i$ or $x_t^i < x_{t+1}^i$ for all $i \in \{1, \ldots, m\}$. Therefore, there is no trivial generalization of the PE algorithm to the multivariate case.
3 Related Work
In order to determine ordinal patterns on multivariate time
series, two cases have been discussed in literature so far:
1. The multivariate time series is projected into ordinal space directly by either
a. determining univariate ordinal patterns between values in time of a single variable $i$ (row marked in blue in Figure 3), and then averaging over all $m$ variables, or
b. determining univariate ordinal patterns between values in time of a single variable $i$ (box marked in green in Figure 3), storing all $m$ patterns at a fixed time point $t$ in one matrix, or
Figure 3: Two possibilities of univariate ordinal pattern de-
termination in a multivariate time series. Case 1 shows ig-
noring variables (blue) and time (red) dependencies respec-
tively. Our first proposed approach considers both (green).
Case 2 shows a previous dimensionality reduction (yellow).
c. determining univariate ordinal patterns between values of all variables at a single time point $t$ (column marked in red in Figure 3), and then averaging over all $T$ time points.
2. The multivariate time series is reduced to a single-
dimensional projection first, and then transformed into or-
dinal space.
Consequently, Definition 2 can be used for PE calculation in all cases. The first approach 1(a) is implemented by Keller and Lauffer (1999) as Pooled Permutation Entropy and hereafter referred to as PEpool. Morabito et al. (2012) expand the concept of PEpool by taking into account variations over multiple scales, named Multivariate Multi-Scale Permutation Entropy (MMSPE). Some theoretical basis for 1(b) is given by Keller (2012) and Antoniouk, Keller, and Maksymenko (2013). A variant of approach 1(c) is implemented by He, Sun, and Wang (2016) as Multivariate Permutation Entropy (MVPE). The second case is implemented by Rayan, Mohammad, and Ali (2019). They propose to reduce the number of variables $m$ first by applying different distance measures between all points of the time series and a reference point before calculating PE on the reduced univariate time series. Depending on the distance measure and reference point, we denote the calculated PE variants as PEeucl (Euclidean distance with reference point $(x_0^i)_{i=1}^{m}$), PEmanh (Manhattan distance with reference point $(x_0^i)_{i=1}^{m}$), and PEnorm (Euclidean distance with reference point 0). Note that the length $T$ of the time series must be the same in all variable dimensions in order to perform a dimensionality reduction.
The well-known Matrix Profile by Yeh et al. (2016) pursues
a different goal and is therefore not considered in this paper.
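The distance-based reduction of Rayan, Mohammad, and Ali (2019) can be sketched as follows; function and parameter names are our own illustration, not theirs:

```python
from math import sqrt

def reduce_by_distance(X, ref=None, metric="euclid"):
    """Collapse an m-variate time series X (m rows of length T) into a
    univariate series of distances to a reference point.  ref=None uses
    the reference point 0 (the PEnorm variant); passing the first time
    point [row[0] for row in X] corresponds to PEeucl / PEmanh."""
    m = len(X)
    if ref is None:
        ref = [0.0] * m
    out = []
    for t in range(len(X[0])):
        diffs = [X[i][t] - ref[i] for i in range(m)]
        if metric == "euclid":
            out.append(sqrt(sum(v * v for v in diffs)))
        else:  # Manhattan distance
            out.append(sum(abs(v) for v in diffs))
    return out
```

PE from Definition 2 is then computed on the returned univariate series.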
4 Ordinal Pattern Representations
Considering Correlations of Variables
The simultaneous movement pattern of several variables over time is important information that should be encoded in a mathematical representation. For example, both ECG and arterial blood pressure (ABP) signals contain information about cardiac status, which can be used for the diagnosis of diseases. The depolarization of the ventricles and the contraction of the large ventricular muscles of the human heart (the so-called
Figure 4: Medical time series (ECG signal II and ABP) from
a patient of the MIMIC III waveform database with identifi-
cation id 3900006 0029 published by Johnson et al. (2016).
QRS complex) can be observed in an ECG signal as the largest deflection, the central and most visually obvious part of the tracing, as shown in the top signal in Figure 4. Shortly after the electrical activity, the blood pressure rises, as shown in the second signal in Figure 4. Both movements of the variables depend on each other and should be considered together (red boxes in Figure 4), rather than separately.
In the following, we propose two new approaches taking into account the correlation of two or more variables. In Section 4.1, we introduce our first approach, adapting the first case in Section 3 by a natural extension of dimensions that includes both variable and time. In Section 4.2, we present a second approach following the second case in Section 3 through an improved dimensionality reduction via Principal Component Analysis.
4.1 Permutation Entropy Based on Multivariate
Ordinal Pattern
The intuitive idea is to store the univariate ordinal patterns of
all variables at a time point ttogether into one multivariate
pattern.
Definition 3 (Multivariate Ordinal Pattern). A matrix $(x_1, \ldots, x_d) \in \mathbb{R}^{m \times d}$ has multivariate ordinal pattern
$$\begin{pmatrix} r_{11} & \cdots & r_{1d} \\ \vdots & \ddots & \vdots \\ r_{m1} & \cdots & r_{md} \end{pmatrix} \in \mathbb{N}^{m \times d} \qquad (3)$$
of order $d \in \mathbb{N}$ if $x_{r_{i1}} \ge \ldots \ge x_{r_{id}}$ for all $i = 1, \ldots, m$, and $r_{i,l-1} > r_{il}$ in the case $x_{r_{i,l-1}} = x_{r_{il}}$.
Figure 5 shows all possible multivariate ordinal patterns of order $d = 2$ and number of variables $m = 2$.
With the natural extension of Definition 1, which leads to Definition 3, it is possible to apply the PE algorithm of Definition 2 in its original form to multivariate time series. We denote by PEmulti the PE that has been calculated on multivariate ordinal patterns. Obviously, this approach has the disadvantage that the number of possible patterns increases exponentially with the number of variables $m$, to $(d!)^m$ in total. In this setting, we quickly end up with undersampling and a uniform distribution of patterns, resulting in maximum complexity. Nevertheless, with a small order $d$ and a sufficiently large length $T$ of the time series, more information can be obtained than with the averaged variant PEpool. The following evaluation confirms this.

Figure 5: All possible multivariate ordinal patterns of order $d = 2$ and variable dimension $m = 2$.
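A minimal sketch of PEmulti under these definitions (our illustration, written for lists of variable rows, not the authors' code):

```python
from collections import Counter
from math import log

def pe_multi(X, d=2, tau=1):
    """Permutation entropy on multivariate ordinal patterns (Definition 3).
    X is a list of m rows, one per variable, each of length T.  A pattern
    at time t stacks the univariate patterns of all m variables; up to
    (d!)**m distinct patterns are possible."""
    def pattern(w):
        # univariate pattern, ties broken as in Definition 1
        return tuple(sorted(range(len(w)), key=lambda i: (-w[i], -i)))

    T = len(X[0])
    n = T - (d - 1) * tau
    counts = Counter(
        tuple(pattern(row[t - (d - 1) * tau:t + 1:tau]) for row in X)
        for t in range((d - 1) * tau, T)
    )
    return -sum((c / n) * log(c / n) for c in counts.values())
```

Two jointly monotone variables produce a single repeated multivariate pattern and hence zero entropy, illustrating the regular case.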
4.2 Permutation Entropy Based on Principal
Component Analysis
The disadvantages of the approach in the previous section limit its applicability to real-world data sets. In order to minimize the combinatorial possibilities of ordinal patterns, we consider the univariate ordinal pattern case. The idea is to transform a multivariate time series $((x_t^i)_{i=1}^{m})_{t=1}^{T}$ into a univariate time series $(x'_t)_{t=1}^{T}$ and then to calculate the PE from Definition 2 on the reduced time series as usual. To order time points based on single-dimensional values, Rayan, Mohammad, and Ali (2019) propose using the Euclidean or Manhattan distance between all points of the time series and a reference point for the reduction. The calculated distances can be interpreted as the strength of a signal or time series, on the basis of which PE is then calculated. This approach does not take into account correlations between variables.
The aim, however, should be to retain as much information about variable correlations as possible in order to understand their simultaneous movement. Principal Component Analysis (PCA) is a well-known method that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables by an orthogonal transformation. The total variance describes one kind of characteristic of the time series (Hastie, Tibshirani, and Friedman 2009). For $m$-dimensional data, there are $m$ orthogonal basis vectors. The variance of the data points along these basis vectors is the total variance of the data. If the first $r < m$ basis vectors cover a sufficiently large percentage of the total variance, then the principal components represented by these $r$ basis vectors suffice to capture the information content of the data. Keeping only the first $r$ principal components gives the truncated transformation $X'_r = X W_r$, where $W_r \in \mathbb{R}^{m \times r}$ is a matrix of weights whose columns are the eigenvectors of $X^T X$ corresponding to the $r$ largest eigenvalues in descending order.
For applying PCA, we assume that the time series $((x_t^i)_{i=1}^{m})_{t=1}^{T}$ is stationary for all $i = 1, \ldots, m$. An appropriate algorithm to reduce the data by PCA can be found in all popular statistical textbooks, e.g., in Hastie, Tibshirani, and Friedman (2009). We denote by PEPCA the PE that has been calculated on the single-dimensional projection by PCA; we set $r = 1$ to reduce a multivariate to a univariate time series. The reduction from a high number of variables $m$ to only one dimension can lead to a high loss of variance information if the first principal component explains only little of the variance of the data. In order to show that adding further principal components does not lead to better results, we denote by PEPCA2 the calculation of PEmulti on the first two principal components. A possible explanation for this could be the orthogonality of the principal components.
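The reduction to the first principal component can be sketched with NumPy; this is a generic eigendecomposition-based PCA, assumed to match the textbook procedure rather than any specific routine used by the authors:

```python
import numpy as np

def pca_reduce(X, r=1):
    """Project an m-variate time series X (array of shape m x T) onto its
    first r principal components; r=1 yields a univariate series."""
    Xc = X - X.mean(axis=1, keepdims=True)         # center each variable
    cov = Xc @ Xc.T / (X.shape[1] - 1)             # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending eigenvalues
    W = eigvecs[:, np.argsort(eigvals)[::-1][:r]]  # top-r eigenvectors (m x r)
    return W.T @ Xc                                # r x T projection
```

For perfectly correlated variables the first component carries the entire variance, so no information is lost in this reduction.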
5 Experimental Results
We have conducted several experiments to investigate the relevance of the different ordinal pattern representations for multivariate time series. We consider a representation as good if it is flexibly and reliably applicable to different real-world data sets. To show flexibility, we compare the representations on 25 different real-world use cases. To show reliability, we compare the accuracy of seven different PE variants. Higher accuracy means that a representation identifies the underlying explanatory factors better than other representations and that discriminatory properties can be identified as useful inputs for supervised predictors (Bengio, Courville, and Vincent 2012). We challenge our ordinal pattern representations in classification tasks on the
UEA MTSC archive, a collection of different time series
data sets from many real-world cases, released in October
2018 by Bagnall et al. (2018). The archive consists of 30
data sets with a wide range of series lengths, dimensions
and cases from human activity recognition, motion classi-
fication, ECG classification, electroencephalography (EEG)
and magnetoencephalography (MEG) classification to audio
spectra classification and many others. In the following setting, 5 of the 30 data sets of the archive could not be used without compromising comparability, due to differing lengths $T$ of the time series across their $m$ variables.
5.1 Deep Dive: AtrialFibrillation Classification
Before we start with a general evaluation of different data
sets, we give a detailed insight in one specific data set out
of the 30 for a better understanding. The AtrialFibrillation
data set contains two-channel ECG recordings for predicting
spontaneous termination of atrial fibrillation (AF). The class
labels are: t, s and n. Class t contains data, where the AF
terminates at the latest within one second after the recording
ending. Class s is described as an AF that self terminates at
least one minute after the recording ending. In Class n, the
AF does not terminate for at least one hour after the record-
ing of the data. An example of the recordings for each class
is shown in Figure 6. More details are in Moody (2004). Be-
low we examine the separability of the three classes based
on the different PE variants PEpool, PEmulti, and PEPCA. The variance ratio of the first principal component is greater than 74.17% for every training sample $l = 1, \ldots, n_{\text{train}}$, where $n_{\text{train}}$ is the number of training samples. So a dimensionality reduction via PCA can
shows boxplots for the calculated values of three different
PE-variants for each class. While PEpool does not allow any
separability of the classes, PEmulti allows classes n and t to
be separated relatively well. An even better separation of
classes s and t is achieved by PEPCA.
Figure 6: Example of 2-channel recordings for all three
classes.
Figure 7: Boxplots of the values of three different PE variants of order $d = 2$ and delay $\tau = 1$ for classes n (blue), s (orange), and t (green).
5.2 A Comparison of PE-variants on the UEA
Data Sets
We perform a classification on the UEA MTSC data sets to compare the ability of separation of the seven ordinal pattern representations PEpool, PEmulti, PEeucl, PEmanh, PEnorm, PEPCA, and PEPCA2 on arbitrary data sets. The initial benchmarking on the UEA MTSC data sets by Bagnall et al. (2018) uses the standard 1-nearest-neighbor (1-NN) classifier with three different distance functions. To ensure comparability, we also use the 1-NN classifier. As model input, each of the seven PE variants was used individually for the evaluation. Finding an optimal order $d$ and time delay $\tau$ is a challenging research problem (Riedl, Müller, and Wessel 2013; Myers and Khasawneh 2019). We have therefore done an extensive hyperparameter optimization.
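Such an optimization can be sketched as follows; `feature_fn` stands for any of the PE variants, and the grid ranges are illustrative, not the ones actually searched in the experiments:

```python
def one_nn_accuracy(features, labels):
    """Leave-one-out accuracy of a 1-NN classifier on scalar features."""
    correct = 0
    for i, (f, y) in enumerate(zip(features, labels)):
        nearest = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: abs(features[j] - f),
        )
        correct += labels[nearest] == y
    return correct / len(features)

def grid_search(series_list, labels, feature_fn, orders=(2, 3), delays=(1, 2)):
    """Return (accuracy, d, tau) of the best-scoring hyperparameters,
    where feature_fn(x, d, tau) computes a scalar PE feature."""
    best = None
    for d in orders:
        for tau in delays:
            feats = [feature_fn(x, d, tau) for x in series_list]
            acc = one_nn_accuracy(feats, labels)
            if best is None or acc > best[0]:
                best = (acc, d, tau)
    return best
```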
Table 1 lists the accuracy of the performed 1-NN classification on all data sets, together with train size $n_{\text{train}}$, number of variables $m$, length $T$ of the time series, and number of classes $C$. The highest accuracy scores per data set are shown in bold. In addition, the names of data sets whose benchmark accuracy we outperform are in italics.
PEmulti achieves the best results on two data sets with a low number $m$ of variables and a high length $T$ of the time series, which confirms that this representation is successful in this special case, and in particular more successful than its averaged variant PEpool. PEPCA outperforms the other state-of-the-art PE variants in most cases. The results show that a dimensionality reduction by decorrelation of variables is more successful than one based on distance measures. The accuracy scores of PEPCA2 confirm that adding more principal components does not produce better results than PEPCA.
6 Conclusion and Future Work
Throughout this paper, we have discussed ordinal pattern
representations for multivariate time series both from the
viewpoint of their theoretical foundation and their applica-
tion. We have shown that our approaches, especially PEPCA,
outperform state-of-the-art algorithms on many real-world
data sets. PEmulti is a valuable representation in the case of a small number $m$ of variables and a high length $T$ of the time series. Due to the improvement in prediction results and the easy handling, integration into toolboxes for representation learning on multivariate time series is highly desirable.
Furthermore, we will study whether an optimized method for dealing with PCA in the case of high-dimension, low-sample-size (HDLSS) data, suggested by Yata and Aoshima (2012), can improve our results for PEPCA. In addition, it may be worth taking a closer look at the individual PE of all $m$ principal components to understand how movements along the decorrelated variables of the multivariate time series change.
The advantages of ordinal patterns for the analysis and prediction of time series are well known in research. The approach presented here has the disadvantage that the representation and the prediction model are chosen independently. We currently work on adapting the ordinal pattern approach to automatic representation learning. Instead of first calculating PE and applying an ML model in a second step, this will allow both learning the representations and using them for a specific task in one step.
References
Amigó, J. 2010. Permutation Complexity in Dynamical Systems: Ordinal Patterns, Permutation Entropy and All That. Springer Series in Synergetics. Berlin Heidelberg: Springer-Verlag.
Antonelli, A.; Meschino, G.; and Ballarin, V. 2019. Mammographic Density Estimation Through Permutation Entropy. In Lhotska, L.; Sukupova, L.; Lacković, I.; and Ibbott, G. S., eds., World Congress on Medical Physics and Biomedical Engineering 2018, IFMBE Proceedings, 135–141. Singapore: Springer.
Antoniouk, A.; Keller, K.; and Maksymenko, S. 2013.
Kolmogorov-Sinai entropy via separation properties of order-
generated sigma-algebras. Discrete and Continuous Dynamical
Systems 34(5):1793–1809.
Bagnall, A.; Dau, H. A.; Lines, J.; Flynn, M.; Large, J.; Bostrom,
A.; Southam, P.; and Keogh, E. 2018. The UEA multivariate time
series classification archive, 2018. arXiv:1811.00075 [cs, stat].
Bandt, C., and Pompe, B. 2002. Permutation Entropy: A Natu-
ral Complexity Measure for Time Series. Physical Review Letters
88(17).
Bengio, Y.; Courville, A.; and Vincent, P. 2012. Representation
Learning: A Review and New Perspectives. arXiv:1206.5538 [cs].
Chiu, B.; Keogh, E.; and Lonardi, S. 2003. Probabilistic Discovery
of Time Series Motifs. In Proceedings of the Ninth ACM SIGKDD
International Conference on Knowledge Discovery and Data Min-
ing, KDD ’03, 493–498. New York, NY, USA: ACM.
Franceschi, J.-Y.; Dieuleveut, A.; and Jaggi, M. 2019. Unsuper-
vised Scalable Representation Learning for Multivariate Time Se-
ries. arXiv:1901.10738 [cs, stat].
Hastie, T.; Tibshirani, R.; and Friedman, J. 2009. The Elements
of Statistical Learning: Data Mining, Inference, and Prediction,
Second Edition. Springer Series in Statistics. New York: Springer-
Verlag, 2 edition.
1-NN based on
Dataset ntrain m T C PEpool PEmulti PEeucl PEmanh PEnorm PEPCA PEPCA2
ArticularyWordR. 275 9 144 25 0.137 0.073 0.116 0.11 0.11 0.127 0.064
AtrialFibr. 15 2 640 3 0.667 0.6 0.6 0.6 0.666 0.8 0.4
BasicMotions 40 6 100 4 0.625 0.525 0.525 0.45 0.475 0.675 0.45
DuckDuckGeese 50 1345 270 5 0.3 0.2 0.28 0.24 0.26 0.36 0.28
EigenWorms 128 6 17984 5 0.557 0.611 0.578 0.584 0.578 0.595 0.443
Epilepsy 137 3 206 4 0.5 0.478 0.507 0.48 0.485 0.514 0.384
ERing 30 4 65 6 0.392 0.351 0.385 0.374 0.351 0.381 0.274
EthanolCon. 261 3 1751 4 0.323 0.312 0.304 0.323 0.330 0.331 0.300
FaceDetection 5890 144 62 2 0.504 0.500 0.508 0.506 0.504 0.512 0.510
FingerMovem. 360 28 50 2 0.57 0.54 0.6 0.62 0.55 0.58 0.56
HandMovem. 160 10 400 4 0.419 0.419 0.392 0.392 0.432 0.364 0.405
Handwriting 150 3 152 26 0.072 0.06 0.075 0.069 0.074 0.071 0.051
Heartbeat 204 61 405 2 0.692 0.722 0.682 0.712 0.698 0.731 0.727
Libras 180 2 45 15 0.322 0.266 0.311 0.295 0.35 0.277 0.117
LSST 2459 6 36 14 0.238 0.212 0.201 0.236 0.198 0.260 0.309
MotorImagery 278 64 3000 2 0.52 0.56 0.54 0.49 0.55 0.57 0.56
NATOPS 180 24 51 6 0.239 0.2 0.25 0.272 0.272 0.266 0.2
PEMS-SF 267 963 144 7 0.671 0.173 0.563 0.647 0.524 0.676 0.266
PenDigits 7494 2 8 10 0.203 0.190 0.201 0.198 0.176 0.229 0.107
PhonemeSpectra 3315 11 217 39 0.060 0.052 0.049 0.045 0.060 0.061 0.034
RacketSports 151 6 30 4 0.348 0.296 0.355 0.349 0.368 0.381 0.362
SelfR.SCP1 268 6 896 2 0.580 0.556 0.659 0.662 0.607 0.611 0.539
SelfR.SCP2 200 7 1152 2 0.588 0.577 0.561 0.572 0.584 0.6 0.577
StandWalkJ. 12 4 2500 3 0.666 0.733 0.666 0.533 0.733 0.666 0.533
UWaveGest. 120 3 315 8 0.243 0.234 0.225 0.216 0.206 0.244 0.184
Table 1: Accuracy scores of multivariate PE variants on 25 UEA MTSC data sets.
He, S.; Sun, K.; and Wang, H. 2016. Multivariate permuta-
tion entropy and its application for complexity analysis of chaotic
systems. Physica A: Statistical Mechanics and its Applications
461:812–823.
Weck, P. J.; Schaffner, D.; Brown, M.; and Wicks, R. 2014. Permutation Entropy and Statistical Complexity Analysis of Turbulence in Laboratory Plasmas and the Solar Wind. arXiv.
Johnson, A. E. W.; Pollard, T. J.; Shen, L.; Lehman, L.-w. H.; Feng,
M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Anthony Celi, L.; and
Mark, R. G. 2016. MIMIC-III, a freely accessible critical care
database. Scientific Data 3:160035.
Keller, K., and Lauffer, H. 1999. Symbolic analysis of high-
dimensional time series. In Int. J. Bifurcation Chaos, 2657–2668.
Keller, K.; Mangold, T.; Stolz, I.; and Werner, J. 2017. Permutation
Entropy: New Ideas and Challenges. Entropy 19(3):134.
Keller, K.; Maksymenko, S.; and Stolz, I. 2015. Entropy determi-
nation based on the ordinal structure of a dynamical system. Dis-
crete and Continuous Dynamical Systems - Series B 20(10):3507–
3524.
Keller, K. 2012. Permutations and the Kolmogorov-Sinai entropy.
Discrete & Continuous Dynamical Systems - A 32(3):891.
Moody, G. 2004. Spontaneous termination of atrial fibrillation: A
challenge from physionet and computers in cardiology 2004. In
Computers in Cardiology, 2004, 101–104.
Morabito, F. C.; Labate, D.; La Foresta, F.; Bramanti, A.; Mora-
bito, G.; and Palamara, I. 2012. Multivariate Multi-Scale Per-
mutation Entropy for Complexity Analysis of Alzheimer’s Disease
EEG. Entropy 14(7):1186–1202.
Myers, A., and Khasawneh, F. 2019. On the Automatic Param-
eter Selection for Permutation Entropy. arXiv:1905.06443 [nlin,
physics:physics].
Piek, A. B.; Stolz, I.; and Keller, K. 2019. Algorithmics, Pos-
sibilities and Limits of Ordinal Pattern Based Entropies. Entropy
21(6):547.
Rayan, Y.; Mohammad, Y.; and Ali, S. A. 2019. Multidimensional Permutation Entropy for Constrained Motif Discovery. In Nguyen, N. T.; Gaol, F. L.; Hong, T.-P.; and Trawiński, B., eds., Intelligent Information and Database Systems, Lecture Notes in Computer Science, 231–243. Cham: Springer International Publishing.
Riedl, M.; Müller, A.; and Wessel, N. 2013. Practical considerations of permutation entropy: A tutorial review. The European Physical Journal Special Topics 222.
Stolz, I., and Keller, K. 2017. A General Symbolic Approach to
Kolmogorov-Sinai Entropy. Entropy 19(12):675.
Xue, X.; Li, C.; Cao, S.; Sun, J.; and Liu, L. 2019. Fault Diagnosis
of Rolling Element Bearings with a Two-Step Scheme Based on
Permutation Entropy and Random Forests. Entropy 21(1):96.
Yata, K., and Aoshima, M. 2012. Effective PCA for high-
dimension, low-sample-size data with noise reduction via geomet-
ric representations. Journal of Multivariate Analysis 105(1):193–
215.
Yeh, C.-C. M.; Zhu, Y.; Ulanova, L.; Begum, N.; Ding, Y.; Dau,
H. A.; Silva, D. F.; Mueen, A.; and Keogh, E. 2016. Matrix Pro-
file I: All Pairs Similarity Joins for Time Series: A Unifying View
That Includes Motifs, Discords and Shapelets. In 2016 IEEE 16th
International Conference on Data Mining (ICDM), 1317–1322.