
Multi-dimensional dynamic time warping for gesture recognition


G.A. ten Holt (a, b), M.J.T. Reinders (a), E.A. Hendriks (a)
(a) Information and Communication Theory Group,
Delft University of Technology,
Mekelweg 4, 2628 CD, Delft, The Netherlands
(b) Human Information Communication Design,
Delft University of Technology,
Landbergstraat 8, 2628 CC, Delft, The Netherlands
Keywords: Dynamic Time Warping, pattern recognition, multi-dimensional time series
Abstract
We present an algorithm for Dynamic Time Warping (DTW) on multi-dimensional time series (MD-DTW). The algorithm utilises all dimensions to find the best synchronisation. It is compared to ordinary DTW, where a single dimension is used for aligning the series. Both one-dimensional and multi-dimensional DTW are also tested when derivatives instead of feature values are used for calculating the warp. MD-DTW performed best in finding a known ground truth under noisy conditions. The algorithms were also used to perform simple classification of a set of 121 gestures. MD-DTW performed as well as or better than any single dimension in all tasks. In general, DTW on feature derivatives gave better results than DTW on feature values.
1 Introduction
There are various problem areas where signals
need to be synchronised. When two time signals
are compared, or when a pattern is sought in a
larger stream of data, one of the signals may be
warped in a non-linear way by shrinking or expand-
ing along its time axis. Simple point-to-point compar-
ison then gives unrealistic results, because one might
be comparing different relative parts of the same sig-
nal/pattern. In these cases, some sort of synchroni-
sation is needed. Figure 1 illustrates the difference
between point-to-point comparison and comparison
aided by synchronisation.
Figure 1: The importance of synchronisation. If signals are simply compared at each time instance (dotted arrow), they may be in different relative phases, giving unrealistic differences. Synchronisation ensures proper comparison (solid arrow).
Dynamic Time Warping (DTW) [5] has long been used to find the optimal alignment of two signals. The DTW algorithm calculates the distance between each possible pair of points out of two signals in terms of their associated feature values. It uses these distances to calculate a cumulative distance matrix and finds the least expensive path through this matrix. This path represents the ideal warp - the synchronisation of the two signals which causes the feature distance between their synchronised points to be minimised. Usually, the signals are normalised and smoothed before the distances between points are calculated.
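The cumulative-matrix step described above can be illustrated with a minimal sketch. It assumes the common unconstrained step pattern (horizontal, vertical and diagonal moves) and a precomputed distance matrix; the function name `dtw_path` and the toy signals are illustrative and not taken from the paper.

```python
import numpy as np

def dtw_path(dist):
    """Return (total cost, warp path) for a precomputed M x N distance matrix."""
    m, n = dist.shape
    # cum[i, j] = cost of the cheapest path from (0, 0) to (i, j)
    cum = np.full((m, n), np.inf)
    cum[0, 0] = dist[0, 0]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                cum[i - 1, j] if i > 0 else np.inf,
                cum[i, j - 1] if j > 0 else np.inf,
                cum[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            cum[i, j] = dist[i, j] + best_prev
    # Backtrack from the last cell to recover the least expensive path
    i, j = m - 1, n - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cum[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cum[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((cum[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cum[-1, -1], path

# Toy 1D example: b is a copy of a with its first sample repeated
a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])
dist = np.abs(a[:, None] - b[None, :])
cost, path = dtw_path(dist)
```

Each pair `(i, j)` in `path` states that point `i` of the first signal is synchronised with point `j` of the second. The double loop makes the quadratic cost in the series length explicit.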
DTW has been used in various fields, such as
speech recognition [7], data mining [3], and move-
ment recognition [1, 2]. Previous work in the field of
DTW mainly focused on speeding up the algorithm,
the complexity of which is quadratic in the length
of the series. Examples are applying constraints to
DTW [4], approximation of the algorithm [8] and
lower bounding techniques [3]. [4] proposed a form
of DTW called Derivative DTW (DDTW). Here, the
distances calculated are not between the feature val-
ues of the points, but between their associated first-
order derivatives. In this way, synchronisation is
based on shape characteristics (slopes, peaks) rather
than simple values. Most work, however, only con-
sidered one-dimensional series.
We work on gesture recognition. Gestures are
recorded by cameras and multiple features are ex-
tracted at each time instance, giving us multi-
dimensional time series. We therefore investigated
techniques for the synchronisation of such series.
Multi-dimensional (time) series are series in which
multiple measurements are made simultaneously.
Such series have a K-dimensional vector of feature
values for each (time) instance of the series. They can
be synchronised by simply picking one dimension to
perform DTW with and warping the complete series
according to the warp found in this dimension. How-
ever, in many cases, all dimensions will contain infor-
mation needed for synchronisation. We therefore pro-
pose multi-dimensional DTW (MD-DTW) for syn-
chronising such series. An extension of DTW into
2 dimensions was proposed by [9], but not systemat-
ically tested. In [2] a type of MD-DTW is described,
but only used for fixed-length series. In the next sec-
tions, we explain our MD-DTW algorithm and test it
on our gesture dataset. We compare MD-DTW to reg-
ular 1D-DTW on various dimensions. We also looked
at using derivatives instead of the values themselves
(DDTW) and at combining both approaches in MD-DTW.
2 Multi-Dimensional Dynamic Time Warping
2.1 Algorithm
Multi-dimensional series consist of a number of
measurements made at each instance. The number
of measurements is the dimensionality of the series,
the number of time instances its length. Note that
multi-dimensional series need not be time signals; any
situation in which several measurements are made
simultaneously depending on one variable gives a
multi-dimensional series. In this paper, we assume
that measurements are stored in a matrix, in which
columns are features and rows are time instances.
Take two series A and B. DTW involves the creation of a matrix in which the distance between every possible combination of time instances A(i), B(j) is stored. This distance is calculated in terms of the feature values of the points. Various norms are possible. In 1D-DTW, the distance is usually calculated by taking the absolute or the squared distance between the feature values of each combination of points. For MD-DTW, a distance measure for two K-dimensional points must be calculated. This distance can be any p-norm. We use the 1-norm, i.e. the sum of the absolute differences in all dimensions. To combine different dimensions in this way, it is necessary to normalise each dimension to a zero mean and unit variance. For this, the dimensions must be comparable. If for instance one dimension contains real-valued measurements and one is binary, comparing them directly is not possible and a more sophisticated distance measure must be found. The MD-DTW algorithm is shown in figure 2.
The MD-DTW Algorithm
Let A, B be two series of dimension K and length M, N respectively.
1. Normalise each dimension of A and B separately to a zero mean and unit variance.
2. If desired, smooth each dimension with a Gaussian filter.
3. Fill the M by N distance matrix D according to:
\[ D(i, j) = \sum_{k=1}^{K} |A(i, k) - B(j, k)| \]
4. Use this distance matrix to find the best synchronisation with the regular DTW algorithm.
Figure 2: The MD-DTW algorithm
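The normalisation and distance-matrix steps can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the optional smoothing step is omitted, and the random series merely stand in for gesture data.

```python
import numpy as np

def normalise(series):
    """Scale each column (dimension) to zero mean and unit variance."""
    return (series - series.mean(axis=0)) / series.std(axis=0)

def md_distance_matrix(a, b):
    """D[i, j] = sum over k of |a[i, k] - b[j, k]| (the 1-norm)."""
    # Broadcasting builds an (M, N, K) array of per-dimension differences
    return np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)

rng = np.random.default_rng(0)
A = normalise(rng.normal(size=(5, 3)))   # M = 5, K = 3
B = normalise(rng.normal(size=(7, 3)))   # N = 7, K = 3
D = md_distance_matrix(A, B)             # shape (5, 7)
```

The resulting matrix can be handed to any ordinary DTW routine, which is the only place where MD-DTW differs from 1D-DTW. Note that `normalise` divides by the standard deviation, so a constant dimension would have to be removed beforehand.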
The benefits of MD-DTW can be seen when multi-
dimensional series are considered that have synchro-
nisation information distributed over different dimen-
sions. Take the artificial 2D series shown in fig-
ure 3(a). It is clear that for the first half (in time) of
the series, dimension 1 is useful for finding the cor-
rect synchronisation, whereas dimension 2 is uninfor-
mative. The converse is true for the second half of
the series. If we were to perform 1D-DTW on this se-
ries using dimension 1, the result would be as shown
in figure 3(b). The second half of the series is uni-
formly synchronised, since there is no information for
1D-DTW to work with. But it can be seen that for dimension 2, this is not the ideal synchronisation. 1D-DTW on dimension 2 gives a similar (but converse) result. MD-DTW takes both dimensions into account in finding the optimal synchronisation. The result is a synchronisation that is as ideal as possible for both dimensions, as shown in figure 3(c). This is the advantage of MD-DTW over regular DTW.
Figure 3: The necessity for MD-DTW. (a) shows two artificial 2D time series of equal mean and variance. Dimension 1 is shown in column 1, dimension 2 in column 2. The series contain synchronisation information in both dimensions. If 1D-DTW is performed using the first dimension, the result is suboptimal for dimension 2, as illustrated in (b). Note that the peaks and valleys in dimension 2 are not aligned properly. 1D-DTW on dimension 2 gives a similar suboptimal match for dimension 1. MD-DTW takes both dimensions into account and finds the best synchronisation (c).
Though the series in figure 3 is artificial, the sit-
uation it depicts is not unrealistic. Figure 4 shows a
similar situation from a real-world multi-dimensional
series: x- and y-co-ordinates of the right hand for the
Dutch sign for acrobat. Here, the y-co-ordinate is in-
formative for the first part of the series, and the x-
co-ordinate for the second part. To get both peaks
properly matched, MD-DTW is necessary.
2.2 DTW on Derivatives
[4] argue that for synchronising shape-
characteristics of series (such as peaks and slopes),
it is beneficial to perform DTW on the first-order
derivatives of the feature values (DDTW). Since we
want to perform such shape-matching in each of our
dimensions, we considered this option for the MD-
DTW algorithm. In our case, this meant taking the
first-order derivative in each dimension separately.
This gives us information about the slopes and peaks
in each dimension. The series were first smoothed
in each dimension with a Gaussian filter (σ = 5) to diminish noise effects. Then an approximation of the derivative was taken in each dimension using the filter
\[ \mathrm{der}(a(t)) = \frac{a(t+1) - a(t-1)}{2}. \]
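A minimal sketch of the smoothing and derivative filters for a single dimension follows. The paper does not say how the Gaussian kernel is truncated or how the series ends are treated, so the truncation radius, the edge padding and the one-sided differences at the two end points are assumptions made here for illustration; the function names are hypothetical.

```python
import numpy as np

def gaussian_smooth(x, sigma=5.0):
    """Smooth a 1-D signal with a truncated, normalised Gaussian kernel."""
    radius = int(4 * sigma)  # truncation radius: an assumption
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # Repeat the edge values so the output has the same length as the input
    padded = np.pad(x, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def central_derivative(x):
    """der(a(t)) = (a(t+1) - a(t-1)) / 2 for interior points."""
    d = np.empty_like(x, dtype=float)
    d[1:-1] = (x[2:] - x[:-2]) / 2.0
    # One-sided differences at the two ends: an assumption
    d[0] = x[1] - x[0]
    d[-1] = x[-1] - x[-2]
    return d
```

For a multi-dimensional series, both functions would be applied to each column separately, in line with the per-dimension treatment described above.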
Now we can perform two types of MD-DTW: on
the feature values and on their derivatives. As a third
type, we took the derivatives and added them to the
series as extra dimensions, doubling the dimensional-
ity of the series. In this setting, both the feature values
themselves and their derivatives are taken into consid-
eration when searching for the ideal warp. We tested
our algorithm in all three settings. They are denoted
as S(ignal), D(erivative) and SD(signal+derivative).
In each case, the series were smoothed and nor-
malised before (MD)DTW was commenced.
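The three settings differ only in which feature matrix is passed to (MD)DTW. The sketch below assembles them for a series stored as a (time instances) x (dimensions) array. It uses `np.gradient`, which applies the same central difference in the interior of the series; the function name `make_settings` is hypothetical, and smoothing and normalisation are left out for brevity.

```python
import numpy as np

def make_settings(series):
    """Return the S, D and SD feature matrices for a (T, K) series."""
    signal = np.asarray(series, dtype=float)
    # np.gradient uses central differences in the interior of axis 0
    deriv = np.gradient(signal, axis=0)
    # SD appends the derivatives as K extra dimensions, giving 2K columns
    return {"S": signal, "D": deriv, "SD": np.hstack([signal, deriv])}
```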
2.3 Dimension Selection
To compare series with various dimensions, it is
necessary to normalise the dimensions. However, this
is a problem if a dimension contains only noise, for example position data gathered on the non-dominant hand in a one-handed gesture. Normalisation will enlarge the noise in this dimension to the same proportions as the informative data in other dimensions (such as dominant hand positions). The noise will then heavily influence the synchronisation. This is undesirable. We therefore perform dimension selection before commencing normalisation and synchronisation.
Figure 4: Sign for ‘acrobat’, a real-world example of synchronisation information distributed over dimensions. The solid line is the x-co-ordinate, the dotted line the y-co-ordinate. Each has information where the other is fairly constant.
In dimension selection, the variance in each dimension of a multi-dimensional series is calculated. If the variance falls below a certain threshold, the dimension is not taken into account in the synchronisation process, nor in later calculations on features. In this way, dimensions with hardly any variance, that probably consist of noise, are disregarded. The variance threshold was empirically determined for our dataset by calculating the variance of known noise-dimensions (inert hands). It is also possible to weight dimensions rather than simply select or discard them.
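Dimension selection amounts to a variance test per column, carried out before normalisation. The sketch below is illustrative only: the function names are hypothetical and the threshold is a free parameter, since the value used for the gesture dataset is not reported.

```python
import numpy as np

def select_dimensions(series, threshold):
    """Return indices of columns whose variance is at least `threshold`."""
    variances = np.var(series, axis=0)
    return np.flatnonzero(variances >= threshold)

def common_dimensions(a, b, threshold):
    """Dimensions that pass the variance threshold in both series."""
    return np.intersect1d(select_dimensions(a, threshold),
                          select_dimensions(b, threshold))
```

Only the columns returned by `common_dimensions` would then be normalised and used to build the distance matrix for a given pair of series.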
In the next section, we give a number of tests in
which we compared MD-DTW in various settings
(S,D,SD) to regular DTW (1D-DTW) in the same set-
tings on various dimensions. For all tests, the same
filters for smoothing and derivatives were used.
3 Experimental Results
We tested the MD-DTW algorithm in different
ways. First, we wanted to assess the accuracy of var-
ious DTW algorithms on a known ground truth. For
this purpose, we created artificial warpings on a num-
ber of series and stored the warps. We then applied
various versions of (MD)DTW to the warped series
and compared the synchronisations calculated to the
stored true synchronisations.
Secondly, we tested our algorithm in the domain of
gesture recognition. We performed simple classifica-
tion on a set of 121 gestures by comparing unknown
examples to class prototypes. We used the synchroni-
sation found by the various algorithms to determine
at which time-points features should be compared.
Better synchronisation should give more appropriate
feature comparison and therefore higher classification
scores. Data for all tests was retrieved from our ges-
ture dataset, which is described below.
3.1 Dataset
In our experiments, we used a dataset of gestures.
The gestures are signs from the standard vocabulary
of Sign Language of the Netherlands. The gestures
were recorded with 2 cameras in stereo position. Six
features were automatically extracted from each pair
of frames, resulting in a multi-dimensional series.
The extracted features were the 3D positions (x, y, z), relative to the head, of the left and right hand. Each gesture was stored in a (gesture length) × 6 matrix. The gestures varied in length; the average length was 91 ± 16 frames.
Our dataset consists of 121 different gestures.
Each gesture was recorded from 67 different persons
(all right-handed), giving us 67 examples for each
of the 121 classes (for 9 classes, there were only 66
examples). In addition, there is a set of prototypes
consisting of one example per class. These exam-
ples were hand-picked on the grounds of being a cor-
rect version of the class they represented. They were
not optimised with respect to any of the classification
methods mentioned below.
Many gestures in our set are one-handed, in which
case the left hand is inert. In the two-handed gestures,
the left hand in most cases copies the right. For this
reason, we only tested 1D-DTW based on right hand
features, since in almost all cases the left hand would
either be less informative (inert) or equally informa-
tive (copying).
3.2 Artificially Warped series
We created an artificially warped series as follows:
we took a multi-dimensional series (a gesture) and
copied it. In the copy, we chose a random anchor
point. This point was shifted 20 time instances to
the left or right (direction was also chosen randomly).
The adjacent points were shifted by a fraction of 20 points; the fraction decreased with a point’s increasing distance to the anchor point (following a Gaussian curve).
This created a localised time-warp. The warp was the
same in all dimensions. After warping the time axis,
the values in each dimension were re-interpolated to
the original time axis. This gave us a warped series of
which the correct warping to the original series was
known. We will refer to the artificially warped series
as the distorted series.
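The warping procedure can be sketched as follows. The width of the Gaussian fall-off is not given in the text, so the `width` parameter (and its default of 15 time instances) is an assumption, chosen large enough that a shift of 20 does not fold the time axis; the function name is hypothetical.

```python
import numpy as np

def warp_series(series, anchor, shift=20.0, width=15.0):
    """Apply a localised time warp to a (T, K) series.

    Returns the distorted series and, for every original index i, the
    position on the distorted time axis where s(i) ends up (ground truth).
    """
    series = np.asarray(series, dtype=float)
    t = np.arange(series.shape[0], dtype=float)
    # The anchor moves by `shift`; neighbours move by a Gaussian fraction of it
    new_t = t + shift * np.exp(-((t - anchor) ** 2) / (2 * width**2))
    # np.interp needs increasing sample positions
    if np.any(np.diff(new_t) <= 0):
        raise ValueError("warp folds the time axis; increase width")
    # Re-interpolate each dimension back onto the original time axis
    distorted = np.column_stack(
        [np.interp(t, new_t, series[:, k]) for k in range(series.shape[1])]
    )
    return distorted, new_t
```

A negative `shift` moves the anchor to the left. The second return value plays the role of the stored true synchronisation. Samples that fall outside the warped time range take the nearest end value, which is how `np.interp` behaves by default.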
After warping, the original and the distorted series
were normalised. Uniform zero-mean noise was then
added to both to create some differences in feature
values. We used three levels of noise variance: 0, 0.01
and 0.05 (the normalised series had a unit variance).
Figure 5 shows an example of an artificial warping
(zero noise).
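Uniform noise with a prescribed variance requires the matching interval width. A uniform distribution on [-a, a] has variance a²/3, so a noise variance v corresponds to a = sqrt(3v); for v = 0.05 this gives a ≈ 0.39. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def add_uniform_noise(series, variance, rng):
    """Add zero-mean uniform noise with the requested variance."""
    if variance == 0:
        return series.copy()
    # U(-a, a) has variance a**2 / 3, so a = sqrt(3 * variance)
    half_width = np.sqrt(3.0 * variance)
    return series + rng.uniform(-half_width, half_width, size=series.shape)
```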
Figure 5: Example of an artificially warped series (only 1 dimension is shown). Three sets of markers are plotted: the original feature values, the new, shifted feature values (o), and the values re-interpolated at the original time axis (+).
We first smoothed both series in each dimension with a Gaussian filter (σ = 5). We then applied various versions of DTW to the original and distorted series and stored the calculated warps. We calculated the goodness of a warp as follows: let the original series be s, the distorted series s'. Let GT denote the ground truth warp, and W the warp found by the algorithm. Let W: s(i) → s'(j) denote that W warps s(i) onto s'(j). Then the average aberration e is given by
\[ e = \frac{1}{M} \sum_{i=1}^{M} |j - j'| \]
where M = length of W, GT: s(i) → s'(j) and W: s(i) → s'(j').
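The aberration measure can be computed directly from a warp path. In the sketch below the found warp is assumed to be a list of index pairs (i, j') and the ground truth an array giving the true position j for each original index i; both representations, the function name and the toy values are illustrative choices.

```python
import numpy as np

def average_aberration(found_path, ground_truth):
    """Mean |j - j'| over the pairs (i, j') of the found warp path.

    ground_truth[i] is the true position j of s(i) in the distorted series.
    """
    i_idx = np.array([i for i, _ in found_path])
    j_found = np.array([j for _, j in found_path], dtype=float)
    return float(np.mean(np.abs(ground_truth[i_idx] - j_found)))

gt = np.array([0.0, 1.0, 3.0, 4.0])
path = [(0, 0), (1, 1), (2, 2), (2, 3), (3, 4)]
e = average_aberration(path, gt)  # one pair is off by 1 frame out of 5 pairs
```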
We performed this operation on one random ex-
ample of each class in our set, and took the median
of the errors of all 121 examples (we took the median
to be robust against outlier classes). We repeated this
20 times. The means and standard deviations over
these 20 runs are given in table 1. It can be seen that
MD-DTW performs better than any 1D-DTW vari-
ant. Adding more noise enlarges the difference be-
tween MD-DTW and the one-dimensional variants.
More noise makes the correct warp more difficult to
find, because even the correct warp will display dif-
ferences in feature values. MD-DTW has the advan-
tage that it has multiple dimensions in which the same
warp is present, whereas the noise is different in each
dimension, canceling out to some extent. For 1D-
DTW, Derivative DTW appears to improve the re-
sults, but only for conditions with much noise. For
MD-DTW, this is not the case (probably because it is
already more robust against noise). For noiseless con-
ditions, Derivative DTW causes deterioration. The
reason may be that the derivative is more sensitive to re-interpolation, which causes some noise.
3.3 Application in Gesture Classification
In our domain, we want to classify gestures by
comparing their feature values. MD-DTW should
help us find the correct correspondences of time
points between different gesture examples, so that the
appropriate features will be compared.
To test the merits of MD-DTW over ordinary DTW
in this respect, we executed a few simple classification
tasks using various versions of DTW for synchronisa-
tion. The tasks are entirely equal in all other respects.
3.3.1 Nearest Neighbour Classification
One basic way of testing classification performance is
the nearest neighbour (NN) scheme. We used the pro-
totypes from our dataset as the training set (one pro-
totype per class). All other gestures were used as the
test set. In the test, we simply warped each test gesture on each prototype. We then calculated the feature distance between test gesture and prototype in the following way: let A be the test gesture, B the prototype, and W the calculated warp. Then
\[ \mathrm{Dist}(A, B) = \frac{1}{N} \sum_{(i, i') \in W} \sum_{j \in k} |A(i, j) - B(i', j)| \]
where N = length of W, k is a list of dimensions that are selected for both A and B, e.g. [1,3,5] (see section 2.3), and W: A(i) → B(i'). A test gesture was given the class of the prototype to which it had the smallest distance.
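The nearest-prototype rule can be sketched as below. The alignment routine is passed in as a function, so any DTW variant can be plugged in. Dividing the summed distance by the warp length is an assumption about the normalisation, and all names are hypothetical.

```python
import numpy as np

def warp_distance(a, b, path, dims):
    """Mean over the warp path of the 1-norm distance on selected dimensions."""
    i_idx = np.array([i for i, _ in path])
    j_idx = np.array([j for _, j in path])
    diffs = np.abs(a[i_idx][:, dims] - b[j_idx][:, dims])
    return diffs.sum() / len(path)

def nearest_prototype(test, prototypes, align, dims):
    """Return the label of the prototype with the smallest warp distance.

    `prototypes` maps label -> series; `align(a, b)` returns a warp path
    as a list of (i, i') index pairs.
    """
    distances = {
        label: warp_distance(test, proto, align(test, proto), dims)
        for label, proto in prototypes.items()
    }
    return min(distances, key=distances.get)
```

In the paper the selected dimensions depend on the pair of series being compared; here `dims` is passed in as a fixed list to keep the sketch short.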
Our test set consisted of 8 098 examples in 121
classes. We took 6 000 random samples from this set
and computed the error of the nearest neighbour clas-
sification for several different methods of synchroni-
sation. This process was repeated 20 times. The mean
error and standard deviation over these 20 runs are
given in table 2.
The errors in table 2 are large, which is to be ex-
pected for a NN with only one training example per
class. However, we are not interested in classification
performance so much as in comparing classification
performance when various forms of DTW are used
to achieve synchronisation. Table 2 shows that when synchronisation is calculated with MD-DTW, performance is better than with DTW synchronisation based on a single dimension. The best performance outright is given by MD-DTW on the first derivatives of the dimensions. It performs significantly better than any other DTW variant. For each DTW variant, the Derivative condition performed better than the Signal or SD condition. All significances were calculated with a one-sided T-test, p < .05.
DTW on                     Noise   r.hand X      r.hand Y      r.hand Z      MD-DTW
Signal (S)                 0       0.14 (0.01)   0.13 (0.01)   0.13 (0.01)   0.10 (0.00)
                           0.01    0.54 (0.06)   0.45 (0.06)   0.27 (0.04)   0.12 (0.01)
                           0.05    1.17 (0.08)   1.16 (0.07)   0.84 (0.06)   0.30 (0.04)
Derivative (D)             0       0.31 (0.03)   0.34 (0.02)   0.29 (0.02)   0.22 (0.01)
                           0.01    0.58 (0.05)   0.54 (0.06)   0.38 (0.04)   0.23 (0.01)
                           0.05    0.95 (0.05)   1.01 (0.05)   0.78 (0.06)   0.39 (0.03)
Signal + Derivative (SD)   0       n.a.          n.a.          n.a.          0.12 (0.01)
                           0.01    n.a.          n.a.          n.a.          0.13 (0.01)
                           0.05    n.a.          n.a.          n.a.          0.25 (0.03)
Table 1: Average aberrations of the ground truth in frames of the calculated warp. Test series were normalised gestures. The aberrations were calculated 20 times for the entire gesture set. The values shown are the means and standard deviations over 20 runs. Noise level indicates the variance of the noise (the gestures were normalised to unit variance). Each type of DTW was performed both on the feature values (S) and the feature derivatives (D). MD-DTW was also performed on both combined (SD).
3.3.2 Warp Distances as Dissimilarity Features
In the previous section, we used the distance between
a gesture example and a prototype, calculated for var-
ious synchronisations, to determine the class of the
example. A more sophisticated method is to use the
distances of an example to each prototype as (new)
dissimilarity features [6] of the example and repre-
sent each example by its pattern of distances to all
prototypes. This gives us a different dataset: we still
have 67 examples for each of the 121 classes, but now,
each example has 121 features, namely, its distances
to each of the prototypes. The values of these features
will differ for different synchronisations. Therefore,
the examples have different feature values for each
DTW variant.
When we regard the dissimilarities as ordinary fea-
tures, we can train and test ordinary classifiers on our
new dataset. The performance of a few classifiers for
different DTW conditions was evaluated. Each clas-
sifier was trained and tested 20 times on random par-
titionings of the dataset. The ratio of training set and
test set was 3:1. Mean and standard deviation of the classification error over these 20 runs are shown in table 3 (first two columns).
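Building the dissimilarity representation and a random 3:1 partition can be sketched as follows. The distance function is passed in as an argument and stands for the warp-based distance of the previous section; the function names are hypothetical and the classifiers themselves are not shown.

```python
import numpy as np

def dissimilarity_features(examples, prototypes, distance):
    """Return an (n_examples, n_prototypes) matrix of distances."""
    return np.array([[distance(x, p) for p in prototypes] for x in examples])

def split_3_to_1(n, rng):
    """Randomly partition indices 0..n-1 into training and test sets (3:1)."""
    order = rng.permutation(n)
    cut = (3 * n) // 4
    return order[:cut], order[cut:]
```

With 121 prototypes every example becomes a 121-dimensional feature vector, and a different DTW variant yields a different feature matrix for the same examples.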
The linear density-based classifier gives the better
performance, but we are more interested in the rela-
tive performance for the various forms of DTW and
MD-DTW. MD-DTW outperforms 1D-DTW when the x- or z-co-ordinate is used for synchronisation. For 1D-DTW on the y-co-ordinate, performance is equal to that of MD-DTW both for the S and for the D condition. Possibly the y-co-ordinate is the most informative dimension for most gestures in our dataset when it comes to synchronisation. But there is much variation in the performance per gesture. For certain gestures, the y-co-ordinate performs worse than MD-DTW, e.g. gestures which hardly vary in y-co-ordinate.
trained and tested one-class classifiers for each ges-
ture in the set. This entails training a classifier to dis-
tinguish one class from everything else. We used a
k-means classifier set to reject maximally 1% of the
target class. We trained and tested it for each class in
turn, for each DTW variant, and repeated the process
20 times. The results are too extensive to show here.
The average error over all classes was around 0.35
for all DTW variants, but it varied greatly between
classes (s.d. 0.2). We looked at the performance
per class and counted for each 1D-DTW variant the
number of classes for which it had the lowest error
and also performed significantly better than any MD-DTW variant. In other words, we counted the classes for which a 1D-DTW variant would perform better than MD-DTW. These numbers are given in the rightmost column in table 3. (There were 4 classes for which an MD-DTW variant was better than all one-dimensional variants.)
The results from the one-class study show us that in certain cases 1D-DTW can perform better than MD-DTW, and in many cases there is at least one dimension which will perform comparably well. However, it also shows us that the best dimension differs per gesture. It is therefore impossible to pick one dimension that will perform best for the entire set.
DTW on                     r.hand X        r.hand Y        r.hand Z        MD-DTW
Signal (S)                 0.791 (0.006)   0.791 (0.006)   0.831 (0.004)   0.748 (0.006)
Derivative (D)             0.722 (0.007)   0.699 (0.006)   0.759 (0.005)   0.690 (0.006)
Signal + Derivative (SD)   n.a.            n.a.            n.a.            0.699 (0.006)
Table 2: Classification error of the Nearest Neighbour algorithm using various DTW variants and conditions for synchronisation. Each was tested 20 times on random samples (size 6 000) of the data. The values shown are the means and standard deviations over 20 runs. The minimum error (0.690, MD-DTW on derivatives) was significantly lower than the other errors (one-sided T-test, p < .05).
DTW on                     Type of DTW   Linear density-based   5-nearest neighbour   One-class k-means [# better than MD-DTW]
Signal (S)                 r.hand X      0.411 (0.012)          0.504 (0.009)         10
                           r.hand Y      0.334 (0.008)          0.473 (0.009)         5
                           r.hand Z      0.468 (0.010)          0.553 (0.008)         0
                           MD-DTW        0.329 (0.011)          0.471 (0.009)         -
Derivative (D)             r.hand X      0.377 (0.010)          0.478 (0.011)         18
                           r.hand Y      0.297 (0.008)          0.432 (0.007)         14
                           r.hand Z      0.432 (0.007)          0.543 (0.009)         3
                           MD-DTW        0.295 (0.009)          0.432 (0.010)         -
Signal + Derivative (SD)   MD-DTW        0.299 (0.008)          0.441 (0.010)         -
Table 3: Classification errors of several classifiers in dissimilarity space for various DTW variants and conditions. Each classifier was trained and tested 20 times on random partitionings of the data. The values shown are the means and standard deviations over 20 runs. The minimum error per column and all that did not differ significantly from it are shown in bold (one-sided T-test, p < .05). The rightmost column indicates performance for k-means one-class classifiers, trained to reject maximally 1% of the target class. The numbers shown are the number of classes (out of 121) for which the 1D-DTW variant performed significantly better than any MD-DTW variant (one-sided T-test, p < .05).
For the two regular classifiers, best classification
performance outright on the entire set was given by
the MD-DTW- and y-co-ordinate derivatives. There is
no significant difference between these two for either
classifier. For MD-DTW and for each DTW variant
the Derivative condition performed significantly bet-
ter than the Signal condition. All significances were
tested with a one-sided T-test, p < .05.
4 Discussion
We discussed a novel technique for synchronis-
ing multi-dimensional series. The MD-DTW algo-
rithm is an extension of the regular DTW algorithm
that takes all dimensions into account when finding
the optimal synchronisation between two series. We
tested MD-DTW against one-dimensional DTW, and
also looked at the performance of both algorithms
when first-order derivatives were used instead of fea-
ture values.
When testing against a known ground truth, MD-
DTW, as expected, showed an advantage over 1D-
DTW when more noise was added. In various clas-
sification tasks, MD-DTW usually outperformed 1D-
DTW on all dimensions. Sometimes, the performance
of the best 1D-DTW dimension equaled that of MD-
DTW. However, since there are cases for which the
best dimension does not give good results, and since
MD-DTW performance is always equal to or better
than that of a single dimension (taken over the entire
set), using MD-DTW is preferable.
MD-DTW has the disadvantage of being more ex-
pensive than DTW in its calculation of the distance
matrix. The extra processing time is linear in the
number of dimensions. For large or high-dimensional
datasets, it may therefore be worth the effort to dis-
cover the best single dimension, possibly per class,
and use 1D-DTW. The best dimension in our dataset
was not the one with the largest variance, so it is not
immediately clear how it should be found. Empirical
testing on a training set is probably the best way. For
smaller or low-dimensional datasets, MD-DTW is a
better option.
For each single dimension and for MD-DTW,
Derivative DTW gave the better performance. For our
gesture dataset, top performance was given by MD-
DTW on derivatives, sometimes equaled by DTW on
the y-co-ordinate derivatives. We therefore conclude
that for our dataset, MD-DTW on derivatives is the
best synchronisation method.
References
[1] Andrea Corradini. Dynamic time warping for off-line recognition of a small gesture vocabulary. In Proceedings of the IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, page 83, July-August 2001.
[2] D.M. Gavrila and L.S. Davis. Towards 3-d model-based tracking and recognition of human movement: a multi-view approach. In IEEE International Workshop on Automatic Face- and Gesture Recognition, pages 272–277. IEEE Computer Society, Zurich, June 1995.
[3] E. Keogh and C.A. Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3):358–386, 2005.
[4] Eamonn Keogh and M.J. Pazzani. Derivative dynamic time warping. In First SIAM Intl. Conf. on Data Mining, Chicago, Illinois, 2001.
[5] J. Kruskal and M. Liberman. The symmetric time warping problem: From continuous to discrete. In Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, pages 125–161. Addison-Wesley Publishing Co., Reading, Massachusetts, 1983.
[6] Elzbieta Pekalska and Robert P.W. Duin. The dissimilarity representation for pattern recognition: foundations and applications. World Scientific, NJ, London, 2005.
[7] L. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. Prentice Hall PTR, Englewood Cliffs, NJ, 1993.
[8] Stan Salvador and Philip Chan. FastDTW: Toward accurate dynamic time warping in linear time and space. In KDD Workshop on Mining Temporal and Sequential Data, 2004.
[9] Michail Vlachos, Marios Hadjieleftheriou, Dimitrios Gunopulos, and Eamonn Keogh. Indexing multi-dimensional time-series with support for multiple distance measures. In Proceedings of the 9th ACM SIGKDD int. conf. on Knowledge discovery and data mining, Washington, D.C., August 2003.
... A good summary on the development in this field can be found in [8]. Many methods have been proposed in the literature, including discriminant analysis [9], hypothesis testing [10], singular value decomposition (SVD) [11], common PCA [12], dynamic time warping (DTW) [13,14], Mahalanobis distance-based DTW [15], etc. However, all of these methods fail to consider the matrix data structure inherent in MTS data. ...
... When n < d, it is cheaper to compute G by (13). Let ...
In recent years, the methods on matrix-based or bilinear discriminant analysis (BLDA) have received much attention. Despite their advantages, it has been reported that the traditional vector-based regularized LDA (RLDA) is still quite competitive and could outperform BLDA on some benchmark datasets. Nevertheless, it is also noted that this finding is mainly limited to image data. In this paper, we propose regularized BLDA (RBLDA) and further explore the comparison between RLDA and RBLDA on another type of matrix data, namely multivariate time series (MTS). Unlike image data, MTS typically consists of multiple variables measured at different time points. Although many methods for MTS data classification exist within the literature, there is relatively little work in exploring the matrix data structure of MTS data. Moreover, the existing BLDA can not be performed when one of its within-class matrices is singular. To address the two problems, we propose RBLDA for MTS data classification, where each of the two within-class matrices is regularized via one parameter. We develop an efficient implementation of RBLDA and an efficient model selection algorithm with which the cross validation procedure for RBLDA can be performed efficiently. Experiments on a number of real MTS data sets are conducted to evaluate the proposed algorithm and compare RBLDA with several closely related methods, including RLDA and BLDA. The results reveal that RBLDA achieves the best overall recognition performance and the proposed model selection algorithm is efficient; Moreover, RBLDA can produce better visualization of MTS data than RLDA.
... Pitch-pitch, roll-roll, yaw-yaw, roll-yaw, yaw-roll DTW) distance between the head and torso sequences. Both segments were linearly interpolated to keep the number of data points constant across sequences [68,69]. Pitch, roll, yaw: time-normalized number of peaks [71]. ...
The acquisition of postural control is an elaborate process, which relies on the balanced integration of multisensory inputs. Current models suggest that young children rely on an ‘en-block’ control of their upper body before sequentially acquiring a segmental control around the age of 7, and that they resort to the former strategy under challenging conditions. While recent works suggest that a virtual sensory environment alters visuomotor integration in healthy adults, little is known about the effects on younger individuals. Here we show that this default coordination pattern is disrupted by an immersive virtual reality framework where a steering role is assigned to the trunk, which causes 6- to 8-year-olds to employ an ill-adapted segmental strategy. These results provide an alternate trajectory of motor development and emphasize the immaturity of postural control at these ages.
This paper develops a multi-dimensional Dynamic Time Warping (DTW) algorithm to identify varying lead-lag relationships between two different time series. Specifically, this manuscript contributes to the literature by improving the use of DTW for lead-lag estimation. Our two-step procedure computes the multi-dimensional DTW alignment with the aid of shapeDTW and then uses the output to extract the estimated time-varying lead-lag relationship between the original time series. Our extensive simulation study then analyses the performance of the algorithm compared to the state-of-the-art methods Thermal Optimal Path (TOP), Symmetric Thermal Optimal Path (TOPS), Rolling Cross-Correlation (RCC), Dynamic Time Warping (DTW), and Derivative Dynamic Time Warping (DDTW). We observe that the algorithm strongly outperforms these methods in efficiency, robustness, and feasibility.
The application of robots in mechanical assembly increases the efficiency of industrial production. With the requirements of flexible manufacturing, it has become a research hotspot to accomplish diversified assembly operations safely and efficiently in unstructured environments. In recent years, many advanced robot assembly strategies have been proposed. Fault monitoring and strategy performance evaluation have also attracted more attention. To promote the development of robotic assembly, this paper systematically reviews the recent research in this field. According to the assembly process, the review separates the research contents into target recognition and searching, compliant strategies for fine insertion motion and fault monitoring. The characteristics of each method are summarized. Furthermore, a performance evaluation for assembly strategies is proposed with typical metrics. We surveyed the classical benchmarks to provide support for standardized performance evaluation. Finally, the challenges and potential directions are discussed.
Cough is a major symptom of respiratory-related diseases. There exists a tremendous amount of work on detecting coughs from audio, but there has been no effort to identify coughs solely from inertial measurement unit (IMU) data. Coughing causes motion across the whole body, especially in the neck and head. Therefore, head motion data captured during coughing by a head-worn IMU sensor could be leveraged to detect coughs using a template matching algorithm. In time series template matching problems, K-Nearest Neighbors (KNN) combined with an elastic distance measure (especially Dynamic Time Warping (DTW)) achieves outstanding performance. However, it is often regarded as prohibitively time-consuming. The Nearest Centroid Classifier was proposed in response, but its accuracy is compromised because only one centroid is obtained for each class. The Centroid-based Classifier performs clustering and averaging for each cluster, but requires the number of clusters to be set manually. We propose a novel self-tuning multi-centroid template-matching algorithm, which can automatically adjust the number of clusters to balance accuracy and inference time. Through experiments conducted on synthetic datasets and a real-world earbud-based cough dataset, we demonstrate the superiority of our proposed algorithm and present the results of cough detection with a single accelerometer sensor on the earbuds platform. Clinical relevance: Coughing is a ubiquitous symptom of pulmonary disease, especially for patients with COPD and asthma. This work explores the possibility, and presents the results, of cough detection using an IMU sensor embedded in earables.
Computing trajectory similarity is a fundamental operation in movement analytics, required in search, clustering, and classification of trajectories, for example. Yet the range of different but interrelated trajectory similarity measures can be bewildering for researchers and practitioners alike. This paper describes a systematic comparison and methodical exploration of trajectory similarity measures. Specifically, this paper compares five of the most important and commonly used similarity measures: dynamic time warping (DTW), edit distance (EDR), longest common subsequence (LCSS), discrete Fréchet distance (DFD), and Fréchet distance (FD). The paper begins with a thorough conceptual and theoretical comparison. This comparison highlights the similarities and differences between measures in connection with six different characteristics, including their handling of a relative versus absolute time and space, tolerance to outliers, and computational efficiency. The paper further reports on an empirical evaluation of similarity in trajectories with contrasting properties: data about constrained bus movements in a transportation network, and the unconstrained movements of wading birds in a coastal environment. A set of four experiments: a. creates a measurement baseline by comparing similarity measures to a single trajectory subjected to various transformations; b. explores the behavior of similarity measures on network-constrained bus trajectories, grouped based on spatial and on temporal similarity; c. assesses similarity with respect to known behavioral annotations (flight and foraging of oystercatchers); and d. compares bird and bus activity to examine whether they are distinguishable based solely on their movement patterns. The results show that in all instances both the absolute value and the ordering of similarity may be sensitive to the choice of measure. 
In general, all measures were more able to distinguish spatial differences in trajectories than temporal differences. The paper concludes with a high-level summary of advice and recommendations for selecting and using trajectory similarity measures in practice, with conclusions spanning our three complementary perspectives: conceptual, theoretical, and empirical.
The dynamic time warping (DTW) algorithm is able to find the optimal alignment between two time series. It is often used to determine time series similarity, classification, and to find corresponding regions between two time series. DTW has a quadratic time and space complexity that limits its use to only small time series data sets. In this paper we introduce FastDTW, an approximation of DTW that has a linear time and space complexity. FastDTW uses a multilevel approach that recursively projects a solution from a coarse resolution and refines the projected solution. We prove the linear time and space complexity of FastDTW both theoretically and empirically. We also analyze the accuracy of FastDTW compared to two other existing approximate DTW algorithms: Sakoe-Chiba Bands and Data Abstraction. Our results show a large improvement in accuracy over the existing methods.
The problem of indexing time series has attracted much interest. Most algorithms used to index time series utilize the Euclidean distance or some variation thereof. However, it has been forcefully shown that the Euclidean distance is a very brittle distance measure. Dynamic time warping (DTW) is a much more robust distance measure for time series, allowing similar shapes to match even if they are out of phase in the time axis. Because of this flexibility, DTW is widely used in science, medicine, industry and finance. Unfortunately, however, DTW does not obey the triangular inequality and thus has resisted attempts at exact indexing. Instead, many researchers have introduced approximate indexing techniques or abandoned the idea of indexing and concentrated on speeding up sequential searches. In this work, we introduce a novel technique for the exact indexing of DTW. We prove that our method guarantees no false dismissals and we demonstrate its vast superiority over all competing approaches in the largest and most comprehensive set of time series indexing experiments ever undertaken.
Dynamic Time Warping (DTW) has a quadratic time and space complexity that limits its use to small time series. In this paper we introduce FastDTW, an approximation of DTW that has a linear time and space complexity. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution. We prove the linear time and space complexity of FastDTW both theoretically and empirically. We also analyze the accuracy of FastDTW by comparing it to two other types of existing approximate DTW algorithms: constraints (such as Sakoe-Chiba Bands) and abstraction. Our results show a large improvement in accuracy over existing methods.
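The quadratic cost described above comes from filling an n-by-m table of cumulative alignment costs. The following is a minimal Python sketch of exact DTW with an optional band constraint in the spirit of a Sakoe-Chiba band. It is not FastDTW; the function name `dtw_distance`, the `radius` parameter, and the absolute difference used as local cost are illustrative assumptions rather than details taken from the abstract.

```python
import math


def dtw_distance(a, b, radius=None):
    """DTW distance between two 1-D sequences (illustrative sketch).

    radius: hypothetical band half-width around the diagonal i == j;
    None means no constraint (full quadratic table).
    """
    n, m = len(a), len(b)
    if radius is None:
        radius = max(n, m)
    # The band must be at least |n - m| wide, or cell (n, m) is unreachable.
    radius = max(radius, abs(n - m))

    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        lo = max(1, i - radius)
        hi = min(m, i + radius)
        for j in range(lo, hi + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

With `radius=None` every cell is filled, which is the quadratic time and space behaviour the abstract refers to. A small `radius` restricts the work to a strip around the diagonal at the price of possibly missing the optimal warp.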
We focus on the visual sensory information to recognize human activity in the form of hand-arm movements from a small, predefined vocabulary. We accomplish this task by means of a matching technique that determines the distance between the unknown input and a set of previously defined templates. A dynamic time warping algorithm is used to perform the time alignment and normalization by computing a temporal transformation that allows the two signals to be matched. The system is trained with finite video sequences of single gesture performances whose start and end points are accurately known. Preliminary experiments are performed off-line and result in a recognition accuracy of up to 92%.
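Template matching of this kind can be sketched as a nearest-template search under a DTW distance. The Python sketch below assumes each gesture is a sequence of equal-length feature tuples and uses the Euclidean distance between frames as local cost, so all dimensions contribute to the alignment. The names `md_dtw` and `classify`, and the toy templates in the usage note, are hypothetical and not taken from the system described in the abstract.

```python
import math


def md_dtw(a, b):
    """DTW distance between two sequences of feature tuples (sketch).

    Only two rows of the cost table are kept, so memory is linear in len(b).
    """
    m = len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for frame_a in a:
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            local = math.dist(frame_a, b[j - 1])  # Euclidean, all dimensions
            curr[j] = local + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def classify(sample, templates):
    """Return the label of the template closest to `sample` under md_dtw."""
    return min(templates, key=lambda label: md_dtw(sample, templates[label]))
```

For example, with `templates = {"line": [(0, 0), (1, 1), (2, 2)], "flat": [(0, 0), (1, 0), (2, 0)]}`, a sample that lingers on its first frame, `[(0, 0), (0, 0), (1, 1), (2, 2)]`, is still assigned to `"line"`, because the warp absorbs the repeated frame.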
In this paper we describe our work on 3-D model-based tracking and recognition of human movement from real images. Our system has two major components. The first component takes real image sequences acquired from multiple views and recovers the 3-D body pose at each time instant. The pose-recovery problem is formulated as a search problem and entails finding the pose parameters of a graphical human model for which its synthesized appearance is most similar to the actual appearance of the real human in the multi-view images. Currently, we use a best-first search technique and chamfer matching as a fast similarity measure between synthesized and real edge images. The second component of our system deals with the representation and recognition of human movement patterns. The recognition of human movement patterns is considered as a classification problem involving the matching of a test sequence with several reference sequences representing prototypical activities. A variation of dynamic ti...
Although most time-series data mining research has concentrated on providing solutions for a single distance function, in this work we motivate the need for a single index structure that can support multiple distance measures. Our specific area of interest is the efficient retrieval and analysis of trajectory similarities. Trajectory datasets are very common in environmental applications, mobility experiments, video surveillance and are especially important for the discovery of certain biological patterns. Our primary similarity measure is based on the Longest Common Subsequence (LCSS) model, that offers enhanced robustness, particularly for noisy data, which are encountered very often in real world applications. However, our index is able to accommodate other distance measures as well, including the ubiquitous Euclidean distance, and the increasingly popular Dynamic Time Warping (DTW). While other researchers have advocated one or other of these similarity measures, a major contribution of our work is the ability to support all these measures without the need to restructure the index. Our framework guarantees no false dismissals and can also be tailored to provide much faster response time at the expense of slightly reduced precision/recall. The experimental results demonstrate that our index can help speed-up the computation of expensive similarity measures such as the LCSS and the DTW.
In this paper we address both of these problems by introducing a modification of DTW. The crucial difference is in the features we consider when attempting to find the correct warping. Rather than use the raw data, we consider only the (estimated) local derivatives of the data.
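The excerpt does not say how the local derivatives are estimated. As a minimal sketch, the Python function below assumes one simple estimator: the average of the slope from the left neighbour and the slope of the line through both neighbours. The name `estimate_derivatives` and the choice to copy the first and last estimates to the endpoints are illustrative assumptions.

```python
def estimate_derivatives(q):
    """Estimate a local derivative for every point of a 1-D sequence (sketch).

    Assumed estimator for interior points:
        d[i] = ((q[i] - q[i-1]) + (q[i+1] - q[i-1]) / 2) / 2
    The endpoints reuse the estimate of their nearest interior neighbour,
    so the output has the same length as the input.
    """
    if len(q) < 3:
        raise ValueError("need at least three points to estimate derivatives")
    interior = [
        ((q[i] - q[i - 1]) + (q[i + 1] - q[i - 1]) / 2) / 2
        for i in range(1, len(q) - 1)
    ]
    return [interior[0]] + interior + [interior[-1]]
```

The resulting sequence can be passed to any DTW routine in place of the raw values, so the warp is driven by local shape (rising, falling, flat) rather than by absolute level.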
J. Kruskal and M. Liberman. The symmetric time warping problem: From continuous to discrete. In Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, pages 125–161. Addison-Wesley Publishing Co., Reading, Massachusetts, 1983.