
Multi-Dimensional Dynamic Time Warping for Gesture Recognition

G.A. ten Holt (a,b), M.J.T. Reinders (a), E.A. Hendriks (a)

(a) Information and Communication Theory Group, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
(b) Human Information Communication Design, Delft University of Technology, Landbergstraat 8, 2628 CC, Delft, The Netherlands

g.a.tenholt@tudelft.nl, m.j.t.reinders@tudelft.nl, e.a.hendriks@tudelft.nl

Keywords: Dynamic Time Warping, pattern recognition, multi-dimensional time series

Abstract

We present an algorithm for Dynamic Time Warping (DTW) on multi-dimensional time series (MD-DTW). The algorithm utilises all dimensions to find the best synchronisation. It is compared to ordinary DTW, where a single dimension is used for aligning the series. Both one-dimensional and multi-dimensional DTW are also tested when derivatives instead of feature values are used for calculating the warp. MD-DTW performed best in finding a known ground truth under noisy conditions. The algorithms were also used to perform simple classification of a set of 121 gestures. MD-DTW performed as well as or better than any single dimension in all tasks. In general, DTW on feature derivatives gave better results than DTW on feature values.

1 Introduction

There are various problem areas where signals need to be synchronised. When two time signals are compared, or when a pattern is sought in a larger stream of data, one of the signals may be warped in a non-linear way by shrinking or expanding along its time axis. Simple point-to-point comparison then gives unrealistic results, because one might be comparing different relative parts of the same signal/pattern. In these cases, some sort of synchronisation is needed. Figure 1 illustrates the difference between point-to-point comparison and comparison aided by synchronisation.

Figure 1: The importance of synchronisation. If signals are simply compared at each time instance (dotted arrow), they may be in different relative phases, giving unrealistic differences. Synchronisation ensures proper comparison (solid arrow).

Dynamic Time Warping (DTW) [5] has long been used to find the optimal alignment of two signals. The DTW algorithm calculates the distance between each possible pair of points out of two signals in terms of their associated feature values. It uses these distances to calculate a cumulative distance matrix and finds the least expensive path through this matrix. This path represents the ideal warp: the synchronisation of the two signals which causes the feature distance between their synchronised points to be minimised. Usually, the signals are normalised and smoothed before the distances between points are calculated.

DTW has been used in various fields, such as speech recognition [7], data mining [3], and movement recognition [1, 2]. Previous work in the field of DTW mainly focused on speeding up the algorithm, the complexity of which is quadratic in the length of the series. Examples are applying constraints to DTW [4], approximation of the algorithm [8] and lower bounding techniques [3]. [4] proposed a form of DTW called Derivative DTW (DDTW). Here, the distances calculated are not between the feature values of the points, but between their associated first-order derivatives. In this way, synchronisation is based on shape characteristics (slopes, peaks) rather than simple values. Most work, however, only considered one-dimensional series.

We work on gesture recognition. Gestures are recorded by cameras and multiple features are extracted at each time instance, giving us multi-dimensional time series. We therefore investigated techniques for the synchronisation of such series. Multi-dimensional (time) series are series in which multiple measurements are made simultaneously. Such series have a K-dimensional vector of feature values for each (time) instance of the series. They can be synchronised by simply picking one dimension to perform DTW with and warping the complete series according to the warp found in this dimension. However, in many cases, all dimensions will contain information needed for synchronisation. We therefore propose multi-dimensional DTW (MD-DTW) for synchronising such series. An extension of DTW into 2 dimensions was proposed by [9], but not systematically tested. In [2] a type of MD-DTW is described, but only used for fixed-length series. In the next sections, we explain our MD-DTW algorithm and test it on our gesture dataset. We compare MD-DTW to regular 1D-DTW on various dimensions. We also looked at using derivatives instead of the values themselves (DDTW) and at combining both approaches in MD-DTW.

2 Multi-Dimensional Dynamic Time Warping

2.1 Algorithm

Multi-dimensional series consist of a number of measurements made at each instance. The number of measurements is the dimensionality of the series, the number of time instances its length. Note that multi-dimensional series need not be time signals: any situation in which several measurements are made simultaneously, depending on one variable, gives a multi-dimensional series. In this paper, we assume that measurements are stored in a matrix, in which columns are features and rows are time instances.

Take two series A and B. DTW involves the creation of a matrix in which the distance between every possible combination of time instances A(i) ↔ B(j) is stored. This distance is calculated in terms of the feature values of the points. Various norms are possible. In 1D-DTW, the distance is usually calculated by taking the absolute or the squared distance between the feature values of each combination of points. For MD-DTW, a distance measure for two K-dimensional points must be calculated. This distance can be any p-norm. We use the 1-norm, i.e. the sum of the absolute differences in all dimensions. To combine different dimensions in this way, it is necessary to normalise each dimension to a zero mean and unit variance. For this, the dimensions must be comparable. If for instance one dimension contains real-valued measurements and one is binary, comparing them directly is not possible and a more sophisticated distance measure must be found. The MD-DTW algorithm is shown in figure 2.

The MD-DTW Algorithm
Let A, B be two series of dimension K and length M, N respectively.
• Normalise each dimension of A and B separately to a zero mean and unit variance
• If desired, smooth each dimension with a Gaussian filter
• Fill the M by N distance matrix D according to:

      D(i, j) = Σ_{k=1..K} |A(i, k) − B(j, k)|

• Use this distance matrix to find the best synchronisation with the regular DTW algorithm

Figure 2: The MD-DTW algorithm.
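The steps of figure 2 can be sketched in code as follows. This is a minimal illustration assuming NumPy, not the authors' implementation: the optional Gaussian smoothing step is omitted, and the dynamic programme allows the usual diagonal, horizontal and vertical steps.

```python
import numpy as np

def md_dtw(A, B):
    """MD-DTW sketch: A is an M x K matrix, B is N x K
    (rows are time instances, columns are features).

    Returns the accumulated cost and the warp path as (i, j) pairs.
    Assumes near-constant (noise-only) dimensions were already removed
    by dimension selection, so per-dimension std is non-zero.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Normalise each dimension to zero mean and unit variance.
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    B = (B - B.mean(axis=0)) / B.std(axis=0)
    M, N = len(A), len(B)
    # Point-to-point distances: 1-norm over the K dimensions.
    D = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)
    # Regular DTW dynamic programme on the cumulative cost matrix.
    C = np.full((M, N), np.inf)
    C[0, 0] = D[0, 0]
    for i in range(M):
        for j in range(N):
            if i == 0 and j == 0:
                continue
            C[i, j] = D[i, j] + min(
                C[i - 1, j] if i > 0 else np.inf,
                C[i, j - 1] if j > 0 else np.inf,
                C[i - 1, j - 1] if (i > 0 and j > 0) else np.inf)
    # Backtrack the least expensive path.
    i, j = M - 1, N - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((C[i - 1, j - 1], (i - 1, j - 1)))
        if i > 0:
            steps.append((C[i - 1, j], (i - 1, j)))
        if j > 0:
            steps.append((C[i, j - 1], (i, j - 1)))
        _, (i, j) = min(steps)
        path.append((i, j))
    return C[M - 1, N - 1], path[::-1]
```

Setting K = 1 recovers ordinary 1D-DTW, so the same sketch covers both variants compared in the experiments.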

The benefits of MD-DTW can be seen when multi-dimensional series are considered that have synchronisation information distributed over different dimensions. Take the artificial 2D series shown in figure 3(a). It is clear that for the first half (in time) of the series, dimension 1 is useful for finding the correct synchronisation, whereas dimension 2 is uninformative. The converse is true for the second half of the series. If we were to perform 1D-DTW on this series using dimension 1, the result would be as shown in figure 3(b). The second half of the series is uniformly synchronised, since there is no information for 1D-DTW to work with. But it can be seen that for dimension 2, this is not the ideal synchronisation. 1D-DTW on dimension 2 gives a similar (but converse) result. MD-DTW takes both dimensions into account in finding the optimal synchronisation. The result is a synchronisation that is as ideal as possible for both dimensions, as shown in figure 3(c). This is the advantage of MD-DTW over regular DTW.

Figure 3: The necessity for MD-DTW. (a) shows two artificial 2D time series of equal mean and variance. Dimension 1 is shown in column 1, dimension 2 in column 2. The series contain synchronisation information in both dimensions. If 1D-DTW is performed using the first dimension, the result is suboptimal for dimension 2, as illustrated in (b). Note that the peaks and valleys in dimension 2 are not aligned properly. 1D-DTW on dimension 2 gives a similar suboptimal match for dimension 1. MD-DTW takes both dimensions into account and finds the best synchronisation (c).

Though the series in figure 3 is artificial, the situation it depicts is not unrealistic. Figure 4 shows a similar situation from a real-world multi-dimensional series: x- and y-co-ordinates of the right hand for the Dutch sign for acrobat. Here, the y-co-ordinate is informative for the first part of the series, and the x-co-ordinate for the second part. To get both peaks properly matched, MD-DTW is necessary.

2.2 DTW on Derivatives

[4] argue that for synchronising shape characteristics of series (such as peaks and slopes), it is beneficial to perform DTW on the first-order derivatives of the feature values (DDTW). Since we want to perform such shape-matching in each of our dimensions, we considered this option for the MD-DTW algorithm. In our case, this meant taking the first-order derivative in each dimension separately. This gives us information about the slopes and peaks in each dimension. The series were first smoothed in each dimension with a Gaussian filter (σ = 5) to diminish noise effects. Then an approximation of the derivative was taken in each dimension using the filter

      der(a(t)) = (a(t + 1) − a(t − 1)) / 2.

Now we can perform two types of MD-DTW: on the feature values and on their derivatives. As a third type, we took the derivatives and added them to the series as extra dimensions, doubling the dimensionality of the series. In this setting, both the feature values themselves and their derivatives are taken into consideration when searching for the ideal warp. We tested our algorithm in all three settings. They are denoted as S (signal), D (derivative) and SD (signal + derivative). In each case, the series were smoothed and normalised before (MD-)DTW was commenced.
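The smoothing-plus-derivative step can be sketched as below. The paper specifies only σ = 5 and the central-difference filter; the kernel truncation at 3σ and the boundary handling are assumptions of this sketch.

```python
import numpy as np

def smoothed_derivative(series, sigma=5.0):
    """Per-dimension Gaussian smoothing followed by the central-difference
    filter der(a(t)) = (a(t + 1) - a(t - 1)) / 2.

    `series` is a length x K matrix.  Kernel truncation (3 sigma) and
    endpoint handling (one-sided differences) are sketch assumptions.
    """
    x = np.asarray(series, dtype=float)
    # Truncated, normalised Gaussian kernel.
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    # Smooth each dimension separately.
    smoothed = np.column_stack([
        np.convolve(x[:, k], kernel, mode='same') for k in range(x.shape[1])])
    # np.gradient applies exactly (a(t+1) - a(t-1)) / 2 at interior points.
    return np.gradient(smoothed, axis=0)
```

For the SD condition, the result would simply be concatenated to the original matrix along the feature axis, doubling K.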

2.3 Dimension Selection

To compare series with various dimensions, it is necessary to normalise the dimensions. However, this is a problem if a dimension contains only noise, for example position data gathered on the non-dominant hand in a one-handed gesture. Normalisation will enlarge the noise in this dimension to the same proportions as the informative data in other dimensions (such as dominant hand positions). The noise will then heavily influence the synchronisation. This is undesirable. We therefore perform dimension selection before commencing normalisation and synchronisation.

Figure 4: Sign for 'acrobat', a real-world example of synchronisation information distributed over dimensions. The solid line is the x-co-ordinate, the dotted line the y-co-ordinate. Each has information where the other is fairly constant.

In dimension selection, the variance in each dimension of a multi-dimensional series is calculated. If the variance falls below a certain threshold, the dimension is not taken into account in the synchronisation process, nor in later calculations on features. In this way, dimensions with hardly any variance, which probably consist of noise, are disregarded. The variance threshold was empirically determined for our dataset by calculating the variance of known noise dimensions (inert hands). It is also possible to weight dimensions rather than simply select or discard them.

In the next section, we give a number of tests in which we compared MD-DTW in various settings (S, D, SD) to regular DTW (1D-DTW) in the same settings on various dimensions. For all tests, the same filters for smoothing and derivatives were used.
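Variance-based dimension selection can be sketched as follows, assuming the series is stored as a length × K NumPy matrix. The threshold value is dataset-specific and must be determined empirically, as described above.

```python
import numpy as np

def select_dimensions(series, threshold):
    """Return indices of dimensions whose variance exceeds `threshold`.

    Low-variance dimensions (e.g. the inert hand in a one-handed
    gesture) are assumed to contain only noise and are dropped before
    normalisation and synchronisation.  The threshold is dataset-specific.
    """
    x = np.asarray(series, dtype=float)
    return [k for k in range(x.shape[1]) if x[:, k].var() > threshold]
```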

3 Experimental Results

We tested the MD-DTW algorithm in different ways. First, we wanted to assess the accuracy of various DTW algorithms on a known ground truth. For this purpose, we created artificial warpings on a number of series and stored the warps. We then applied various versions of (MD-)DTW to the warped series and compared the synchronisations calculated to the stored true synchronisations.

Secondly, we tested our algorithm in the domain of gesture recognition. We performed simple classification on a set of 121 gestures by comparing unknown examples to class prototypes. We used the synchronisation found by the various algorithms to determine at which time-points features should be compared. Better synchronisation should give more appropriate feature comparison and therefore higher classification scores. Data for all tests was retrieved from our gesture dataset, which is described below.

3.1 Dataset

In our experiments, we used a dataset of gestures. The gestures are signs from the standard vocabulary of Sign Language of the Netherlands. The gestures were recorded with 2 cameras in stereo position. Six features were automatically extracted from each pair of frames, resulting in a multi-dimensional series. The extracted features were the 3D positions (x, y, z), relative to the head, of the left and right hand. Each gesture was stored in a (gesture length) × 6 matrix. The gestures varied in length; the average length was 91 ± 16 frames.

Our dataset consists of 121 different gestures. Each gesture was recorded from 67 different persons (all right-handed), giving us 67 examples for each of the 121 classes (for 9 classes, there were only 66 examples). In addition, there is a set of prototypes consisting of one example per class. These examples were hand-picked on the grounds of being a correct version of the class they represented. They were not optimised with respect to any of the classification methods mentioned below.

Many gestures in our set are one-handed, in which case the left hand is inert. In the two-handed gestures, the left hand in most cases copies the right. For this reason, we only tested 1D-DTW based on right-hand features, since in almost all cases the left hand would either be less informative (inert) or equally informative (copying).

3.2 Artificially Warped Series

We created an artificially warped series as follows: we took a multi-dimensional series (a gesture) and copied it. In the copy, we chose a random anchor point. This point was shifted 20 time instances to the left or right (the direction was also chosen randomly). The adjacent points were shifted a fraction of 20 points; the fraction decreased with a point's increased distance to the anchor point (following a Gaussian curve). This created a localised time-warp. The warp was the same in all dimensions. After warping the time axis, the values in each dimension were re-interpolated to the original time axis. This gave us a warped series of which the correct warping to the original series was known. We will refer to the artificially warped series as the distorted series.

After warping, the original and the distorted series were normalised. Uniform zero-mean noise was then added to both to create some differences in feature values. We used three levels of noise variance: 0, 0.01 and 0.05 (the normalised series had unit variance). Figure 5 shows an example of an artificial warping (zero noise).
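The distortion procedure can be sketched as below. The width of the Gaussian fall-off (`sigma`) and the monotonicity safeguard before re-interpolation are assumptions of this sketch; the paper does not specify either.

```python
import numpy as np

def distort(series, shift=20, sigma=10.0, rng=None):
    """Create an artificially warped (distorted) copy of `series`
    (a length x K matrix).

    A random anchor point is shifted `shift` time instances left or
    right; neighbouring points move by a Gaussian-decaying fraction of
    that shift, identically in all dimensions, and each dimension is
    then re-interpolated back onto the original time axis.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(series, dtype=float)
    n = len(x)
    t = np.arange(n, dtype=float)
    anchor = rng.integers(n)
    direction = rng.choice([-1, 1])
    # New (warped) positions of the original samples: a localised shift.
    warped_t = t + direction * shift * np.exp(
        -(t - anchor) ** 2 / (2 * sigma ** 2))
    # Keep the warped axis in range and monotonic for interpolation.
    warped_t = np.maximum.accumulate(np.clip(warped_t, 0, n - 1))
    # Re-interpolate each dimension at the original time instances.
    return np.column_stack([np.interp(t, warped_t, x[:, k])
                            for k in range(x.shape[1])])
```

Because `warped_t` is stored, the ground-truth correspondence between the original and the distorted series remains known, as required for the aberration measurements below.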

Figure 5: Example of an artificially warped series (only 1 dimension is shown). The * markers indicate the original feature values, the o markers the new, shifted feature values, and the + markers show the values re-interpolated at the original time instances.

We first smoothed both series in each dimension with a Gaussian filter (σ = 5). We then applied various versions of DTW to the original and distorted series and stored the calculated warps. We calculated the goodness of a warp as follows: let the original series be s, the distorted series s'. Let GT denote the ground truth warp, and W the warp found by the algorithm. Let W: s(i) → s'(j) denote that W warps s(i) onto s'(j). Then the average aberration e is given by

      e = (1/M) Σ_{i=1..M} |j − j'|

where M is the length of W, GT: s(i) → s'(j) and W: s(i) → s'(j').
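The average aberration e can be computed from two warp paths as sketched below. Where the ground truth pairs one s(i) with several s'(j), taking the mean of those j is our assumption; the paper leaves this case open.

```python
import numpy as np

def average_aberration(gt_path, warp_path):
    """Average aberration e = (1/M) * sum_i |j - j'|, where the ground
    truth warps s(i) onto s'(j) and the found warp warps s(i) onto s'(j').

    Both paths are lists of (i, j) index pairs; M is the length of the
    found warp.  One-to-many ground-truth steps are averaged (assumption).
    """
    gt = {}
    for i, j in gt_path:
        gt.setdefault(i, []).append(j)
    errors = [abs(np.mean(gt[i]) - j_prime) for i, j_prime in warp_path]
    return float(np.mean(errors))
```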

We performed this operation on one random example of each class in our set, and took the median of the errors of all 121 examples (we took the median to be robust against outlier classes). We repeated this 20 times. The means and standard deviations over these 20 runs are given in table 1. It can be seen that MD-DTW performs better than any 1D-DTW variant. Adding more noise enlarges the difference between MD-DTW and the one-dimensional variants. More noise makes the correct warp more difficult to find, because even the correct warp will display differences in feature values. MD-DTW has the advantage that it has multiple dimensions in which the same warp is present, whereas the noise is different in each dimension, cancelling out to some extent. For 1D-DTW, Derivative DTW appears to improve the results, but only for conditions with much noise. For MD-DTW, this is not the case (probably because it is already more robust against noise). For noiseless conditions, Derivative DTW causes deterioration. The reason may be that the derivative is more sensitive to re-interpolation, causing some noise.

3.3 Application in Gesture Classiﬁcation

In our domain, we want to classify gestures by comparing their feature values. MD-DTW should help us find the correct correspondences of time points between different gesture examples, so that the appropriate features will be compared.

To test the merits of MD-DTW over ordinary DTW in this respect, we executed a few simple classification tasks using various versions of DTW for synchronisation. The tasks are entirely equal in all other respects.

3.3.1 Nearest Neighbour Classification

One basic way of testing classification performance is the nearest neighbour (NN) scheme. We used the prototypes from our dataset as the training set (one prototype per class). All other gestures were used as the test set. In the test, we simply warped each test gesture on each prototype. We then calculated the feature distance between test gesture and prototype in the following way: let A be the test gesture, B the prototype, and W the calculated warp. Then

      Dist(A, B) = Σ_{i=1..N} Σ_{j ∈ k} |A(i, j) − B(i', j)|

where N is the length of W, k is the list of dimensions that are selected for both A and B, e.g. [1, 3, 5] (see section 2.3), and W: A(i) → B(i'). A test gesture was given the class of the prototype to which it had the smallest distance.
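The distance computation and nearest-prototype assignment can be sketched as follows. The warp path comes from whichever DTW variant is under test, and the helper names are illustrative, not from the paper.

```python
import numpy as np

def warped_distance(A, B, warp_path, selected_dims):
    """Dist(A, B): sum of absolute feature differences over the warp,
    restricted to the dimensions selected for both series.

    `warp_path` is a list of (i, i2) pairs with W: A(i) -> B(i2).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return float(sum(abs(A[i, j] - B[i2, j])
                     for i, i2 in warp_path for j in selected_dims))

def nearest_prototype(test_gesture, prototypes, align, selected_dims):
    """Return the index of the prototype with the smallest warped distance.

    `align(A, B)` must return a warp path; any of the DTW variants
    under test can be plugged in here.
    """
    dists = [warped_distance(test_gesture, p, align(test_gesture, p),
                             selected_dims) for p in prototypes]
    return int(np.argmin(dists))
```

The same per-prototype distances also serve as the dissimilarity features used in section 3.3.2.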

Our test set consisted of 8 098 examples in 121 classes. We took 6 000 random samples from this set and computed the error of the nearest neighbour classification for several different methods of synchronisation. This process was repeated 20 times. The mean error and standard deviation over these 20 runs are given in table 2.

The errors in table 2 are large, which is to be expected for a NN with only one training example per class. However, we are not interested in classification performance so much as in comparing classification performance when various forms of DTW are used to achieve synchronisation. Table 2 shows that when synchronisation is calculated with MD-DTW, performance is better than with DTW synchronisation based on a single dimension. The best performance outright is given by MD-DTW on the first derivatives of the dimensions. It performs significantly better than any other DTW variant. For each DTW variant, the Derivative condition performed better than the Signal or SD condition. All significances were calculated with a one-sided T-test, p < .05.

DTW on                    Noise level  1D-DTW r.hand X  1D-DTW r.hand Y  1D-DTW r.hand Z  MD-DTW
Signal (S)                0            0.14 (0.01)      0.13 (0.01)      0.13 (0.01)      0.10 (0.00)
                          0.01         0.54 (0.06)      0.45 (0.06)      0.27 (0.04)      0.12 (0.01)
                          0.05         1.17 (0.08)      1.16 (0.07)      0.84 (0.06)      0.30 (0.04)
Derivative (D)            0            0.31 (0.03)      0.34 (0.02)      0.29 (0.02)      0.22 (0.01)
                          0.01         0.58 (0.05)      0.54 (0.06)      0.38 (0.04)      0.23 (0.01)
                          0.05         0.95 (0.05)      1.01 (0.05)      0.78 (0.06)      0.39 (0.03)
Signal + Derivative (SD)  0            n.a.             n.a.             n.a.             0.12 (0.01)
                          0.01         n.a.             n.a.             n.a.             0.13 (0.01)
                          0.05         n.a.             n.a.             n.a.             0.25 (0.03)

Table 1: Average aberrations (in frames) from the ground truth of the calculated warp. Test series were normalised gestures. The aberrations were calculated 20 times for the entire gesture set. The values shown are the means and standard deviations over 20 runs. Noise level indicates the variance of the noise (the gestures were normalised to unit variance). Each type of DTW was performed both on the feature values (S) and the feature derivatives (D). MD-DTW was also performed on both combined (SD).

3.3.2 Warp Distances as Dissimilarity Features

In the previous section, we used the distance between a gesture example and a prototype, calculated for various synchronisations, to determine the class of the example. A more sophisticated method is to use the distances of an example to each prototype as (new) dissimilarity features [6] of the example and represent each example by its pattern of distances to all prototypes. This gives us a different dataset: we still have 67 examples for each of the 121 classes, but now each example has 121 features, namely its distances to each of the prototypes. The values of these features will differ for different synchronisations. Therefore, the examples have different feature values for each DTW variant.

When we regard the dissimilarities as ordinary features, we can train and test ordinary classifiers on our new dataset. The performance of a few classifiers for different DTW conditions was evaluated. Each classifier was trained and tested 20 times on random partitionings of the dataset. The ratio of training set to test set was 3:1. The mean and variance of the classification error over these 20 runs are shown in table 3 (first two columns).

The linear density-based classifier gives the better performance, but we are more interested in the relative performance for the various forms of DTW and MD-DTW. MD-DTW outperforms 1D-DTW when the x- or z-co-ordinate is used for synchronisation. For 1D-DTW on the y-co-ordinate, performance is equal to that of MD-DTW both for the S and for the D condition. Possibly the y-co-ordinate is the most informative dimension for most gestures in our dataset when it comes to synchronisation. But there is much variation in the performance per gesture. For certain gestures, the y-co-ordinate performs worse than MD-DTW, e.g. gestures which hardly vary in y-co-ordinate.

To investigate the performance per gesture, we trained and tested one-class classifiers for each gesture in the set. This entails training a classifier to distinguish one class from everything else. We used a k-means classifier set to reject maximally 1% of the target class. We trained and tested it for each class in turn, for each DTW variant, and repeated the process 20 times. The results are too extensive to show here. The average error over all classes was around 0.35 for all DTW variants, but it varied greatly between classes (s.d. ≈ 0.2). We looked at the performance per class and counted for each 1D-DTW variant the number of classes for which it had the lowest error and also performed significantly better than any MD-DTW variant. In other words, we counted the classes for which a 1D-DTW variant would perform better than MD-DTW. These numbers are given in the rightmost column in table 3. (There were 4 classes for which an MD-DTW variant was better than all one-dimensional variants.)

The results from the one-class study show us that in certain cases 1D-DTW can perform better than MD-DTW, and in many cases there is at least one dimension which will perform comparably well. However, it also shows us that the best dimension differs per gesture. It is therefore impossible to pick one dimension that will perform best for the entire set.

DTW on                    1D-DTW r.hand X  1D-DTW r.hand Y  1D-DTW r.hand Z  MD-DTW
Signal (S)                0.791 (0.006)    0.791 (0.006)    0.831 (0.004)    0.748 (0.006)
Derivative (D)            0.722 (0.007)    0.699 (0.006)    0.759 (0.005)    0.690 (0.006)
Signal + Derivative (SD)  n.a.             n.a.             n.a.             0.699 (0.006)

Table 2: Classification error of the Nearest Neighbour algorithm using various DTW variants and conditions for synchronisation. Each was tested 20 times on random samples (size 6 000) of the data. The values shown are the means and standard deviations over 20 runs. The minimum error is shown in bold. It was significantly lower than the other errors (one-sided T-test, p < .05).

DTW on                    Type of DTW      Linear density-based  5-nearest neighbour  One-class k-means [# better than MD-DTW]
Signal (S)                1D-DTW r.hand X  0.411 (0.012)         0.504 (0.009)        10
                          1D-DTW r.hand Y  0.334 (0.008)         0.473 (0.009)        5
                          1D-DTW r.hand Z  0.468 (0.010)         0.553 (0.008)        0
                          MD-DTW           0.329 (0.011)         0.471 (0.009)        -
Derivative (D)            1D-DTW r.hand X  0.377 (0.010)         0.478 (0.011)        18
                          1D-DTW r.hand Y  0.297 (0.008)         0.432 (0.007)        14
                          1D-DTW r.hand Z  0.432 (0.007)         0.543 (0.009)        3
                          MD-DTW           0.295 (0.009)         0.432 (0.010)        -
Signal + Derivative (SD)  MD-DTW           0.299 (0.008)         0.441 (0.010)        -

Table 3: Classification errors of several classifiers in dissimilarity space for various DTW variants and conditions. Each classifier was trained and tested 20 times on random partitionings of the data. The values shown are the means and standard deviations over 20 runs. The minimum error per column and all that did not differ significantly from it are shown in bold (one-sided T-test, p < .05). The rightmost column indicates performance for k-means one-class classifiers, trained to reject maximally 1% of the target class. The numbers shown are the number of classes (out of 121) for which the 1D-DTW variant performed significantly better than any MD-DTW variant (one-sided T-test, p < .05).

For the two regular classifiers, the best classification performance outright on the entire set was given by MD-DTW and 1D-DTW on the y-co-ordinate, both on derivatives. There is no significant difference between these two for either classifier. For MD-DTW and for each DTW variant, the Derivative condition performed significantly better than the Signal condition. All significances were tested with a one-sided T-test, p < .05.

4 Discussion

We discussed a novel technique for synchronising multi-dimensional series. The MD-DTW algorithm is an extension of the regular DTW algorithm that takes all dimensions into account when finding the optimal synchronisation between two series. We tested MD-DTW against one-dimensional DTW, and also looked at the performance of both algorithms when first-order derivatives were used instead of feature values.

When testing against a known ground truth, MD-DTW, as expected, showed an advantage over 1D-DTW when more noise was added. In various classification tasks, MD-DTW usually outperformed 1D-DTW on all dimensions. Sometimes, the performance of the best 1D-DTW dimension equalled that of MD-DTW. However, since there are cases for which the best dimension does not give good results, and since MD-DTW performance is always equal to or better than that of a single dimension (taken over the entire set), using MD-DTW is preferable.

MD-DTW has the disadvantage of being more expensive than DTW in its calculation of the distance matrix. The extra processing time is linear in the number of dimensions. For large or high-dimensional datasets, it may therefore be worth the effort to discover the best single dimension, possibly per class, and use 1D-DTW. The best dimension in our dataset was not the one with the largest variance, so it is not immediately clear how it should be found. Empirical testing on a training set is probably the best way. For smaller or low-dimensional datasets, MD-DTW is a better option.

For each single dimension and for MD-DTW, Derivative DTW gave the better performance. For our gesture dataset, top performance was given by MD-DTW on derivatives, sometimes equalled by DTW on the y-co-ordinate derivatives. We therefore conclude that for our dataset, MD-DTW on derivatives is the best synchronisation method.

References

[1] Andrea Corradini. Dynamic time warping for off-line recognition of a small gesture vocabulary. In Proceedings of the IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, page 83, July-August 2001.

[2] D.M. Gavrila and L.S. Davis. Towards 3-d model-based tracking and recognition of human movement: a multi-view approach. In IEEE International Workshop on Automatic Face- and Gesture-Recognition, pages 272-277. IEEE Computer Society, Zurich, June 1995.

[3] E. Keogh and C.A. Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3):358-386, 2005.

[4] Eamonn Keogh and M.J. Pazzani. Derivative dynamic time warping. In First SIAM Intl. Conf. on Data Mining, Chicago, Illinois, 2001.

[5] J. Kruskall and M. Liberman. The symmetric time warping problem: From continuous to discrete. In Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, pages 125-161. Addison-Wesley Publishing Co., Reading, Massachusetts, 1983.

[6] Elzbieta Pekalska and Robert P.W. Duin. The Dissimilarity Representation for Pattern Recognition: Foundations and Applications. World Scientific, NJ, London, 2005.

[7] L. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. Prentice Hall PTR, Englewood Cliffs, NJ, 1993.

[8] Stan Salvador and Philip Chan. FastDTW: Toward accurate dynamic time warping in linear time and space. In KDD Workshop on Mining Temporal and Sequential Data, 2004.

[9] Michail Vlachos, Marios Hadjieleftheriou, Dimitrios Gunopulos, and Eamonn Keogh. Indexing multi-dimensional time-series with support for multiple distance measures. In Proceedings of the 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Washington, D.C., August 2003.