International Journal of Semantic Computing
© World Scientific Publishing Company
Efficient Query Processing in 3D Motion Capture Gesture Databases
via the Gesture Matching Distance

Christian Beecks¹, Marwan Hassani¹, Bela Brenger², Jennifer Hinnell³, Daniel Schüller², Irene Mittelberg², Thomas Seidl¹

¹ Data Management and Exploration Group, RWTH Aachen University, Germany
{beecks,hassani,seidl}@cs.rwth-aachen.de

² Natural Media Lab, RWTH Aachen University, Germany
{brenger,schueller,mittelberg}@humtec.rwth-aachen.de

³ Department of Linguistics, University of Alberta, Canada
hinnell@ualberta.ca
Received (30 01 2016)
Revised (24 02 2016)
One of the most fundamental challenges when accessing gestural patterns in 3D motion capture databases is the definition of spatiotemporal similarity. While distance-based similarity models such as the Gesture Matching Distance on gesture signatures are able to leverage the spatial and temporal characteristics of gestural patterns, their applicability to large 3D motion capture databases is limited due to their high computational complexity. To this end, we present a lower bound approximation of the Gesture Matching Distance that can be utilized in an optimal multi-step query processing architecture in order to support efficient query processing. We investigate the performance in terms of accuracy and efficiency based on 3D motion capture databases and show that our approach is able to achieve an increase in efficiency of more than one order of magnitude with a negligible loss in accuracy. In addition, we discuss different applications in the digital humanities in order to highlight the significance of similarity search approaches in the research field of gestural pattern analysis.

Keywords: efficient query processing; spatiotemporal data; 3D motion capture data; gestural patterns; gesture signature; gesture matching distance; dynamic time warping.
1. Introduction
3D motion capture data is a specific type of multimedia data that is mainly used to
record movements of humans, animals, or objects over time. This type of data has
found widespread utilization in academia and industry, for instance, for entertaining
purposes, medical applications, film-making, and video game development. One of
the major advantages of 3D motion capture data is the capability of expressing spatiotemporal dynamics with the highest possible accuracy [1]. This property makes
3D motion capture data particularly useful for research into the domain of gestural
pattern analysis.
A gestural pattern can be understood as a kinetic action involving hand, arm,
and body configurations or movements over a certain period of time. A gestural
pattern is represented either by extracting its characteristic features or by utilizing
the raw three-dimensional movement traces, the so-called trajectories. In order to
maintain the high degree of exactness provided by utilizing 3D motion capture data,
we represent gestural patterns by means of gesture signatures [1]. Gesture signatures
are multidimensional trajectory representations which facilitate gestural pattern
analysis with arbitrarily high exactness. Gesture signatures are able to adapt to the
individual spatial and temporal properties of gestural patterns by allowing these
patterns to differ in the number of included trajectories and their lengths as well as
in the weighting scheme indicating the inherent relevance of the trajectories. In fact,
gesture signatures provide an adaptable model-free approach which supports lazy
query-dependent evaluation, i.e., no time-intensive training phase is needed prior
to query processing.
In order to leverage the spatial and temporal characteristics of gestural patterns,
we utilize the Gesture Matching Distance [1] for the similarity comparison of two ges-
ture signatures. The Gesture Matching Distance is a distance-based similarity mea-
sure which quantifies the degree of dissimilarity between two dierently structured
gesture signatures by matching similar trajectories within the gesture signatures
according to their spatial and temporal characteristics. To this end, the Gesture
Matching Distance is parameterized with a distance measure between individual
trajectories, such as the Dynamic Time Warping [2,3], the Levenshtein Distance
[4], the Minimal Variance Matching [5], the Longest Common Subsequence [6,7],
the Edit Distance with Real Penalty [8], the Edit Distance on Real Sequences [9],
or the Mutual Nearest Point Distance [10].
Although the Gesture Matching Distance enables a user-customizable and adap-
tive similarity definition, it is accompanied by a high computation time complexity.
The computation time complexity for a single distance computation between two
gesture signatures is quadratic in the number of the underlying trajectories. Thus
the applicability of this spatiotemporal similarity measure is limited to small-to-
moderate 3D motion capture databases.
In this paper, we aim to counteract this efficiency issue and present a lower
bound approximation of the Gesture Matching Distance [11] that can be utilized
in an optimal multi-step query processing architecture [12]. Besides the theoreti-
cal investigation of this approximation, we benchmark the performance in terms
of accuracy and efficiency and empirically show that the proposed lower bound approximation is able to achieve an increase in efficiency of more than one order of magnitude with a negligible loss in accuracy. In addition, we discuss different applications in digital humanities in order to highlight the significance of similarity search approaches in the research field of gestural pattern analysis.
The paper is structured as follows: Section 2 outlines related work with a par-
ticular focus on gestural pattern similarity by means of gesture signatures and the
Gesture Matching Distance. In Section 3, we investigate the lower bound approxi-
mation of the Gesture Matching Distance. The optimal multi-step query processing
algorithm is presented in Section 4. Experimental results are reported in Section
5, and a discussion of different applications in digital humanities with a particular
focus on gesture research is included in Section 6. The conclusions are given in
Section 7.
2. Related Work
2.1. Gesture Signatures
A gesture signature [1] is a lossless spatiotemporal representation of a gestural pattern which comprises different movement traces, the so-called trajectories. A trajec-
tory can be thought of as a finite sequence of points in a multidimensional space.
As we consider the three-dimensional Euclidean space $\mathbb{R}^3$, we define a trajectory $t \in \mathbb{T}$ as:

$$t: \{1,\ldots,n\} \to \mathbb{R}^3, \qquad (1)$$

where $t(i) = (x_i, y_i, z_i) \in \mathbb{R}^3$ represents the coordinates of the movement trace at time $i \in \{1,\ldots,n\}$. The trajectory space $\mathbb{T} = \bigcup_{k \in \mathbb{N}} \{t \mid t: \{1,\ldots,k\} \to \mathbb{R}^3\}$ denotes the set of all finite trajectories.
Since a gestural pattern typically involves more than one trajectory within a certain period of time, we aggregate these trajectories by means of a gesture signature $S \in \mathbb{R}^{\mathbb{T}}$ which is defined as:

$$S: \mathbb{T} \to \mathbb{R}_{\geq 0} \quad \text{subject to} \quad |\{t \in \mathbb{T} \mid S(t) \neq 0\}| < \infty. \qquad (2)$$

A gesture signature is a function from the trajectory space $\mathbb{T}$ into the real numbers $\mathbb{R}$. It assigns each trajectory a non-negative weight indicating its relevance with respect to the corresponding gestural pattern. Possible weighting schemes include uniform weighting, motion distance weighting, and motion variance weighting [1]. The latter two reflect the overall movement and vividness of a trajectory, respectively.
2.2. Gesture Signature Distance Functions
Gestural patterns typically maintain a high degree of idiosyncrasy meaning that
the involved trajectories are almost unique. In order to quantify a similarity value
between two dierently structured gestural patterns, Beecks et al. [1] have investi-
gated the idea of matching similar trajectories within the gestural patterns accord-
ing to their spatial and temporal characteristics. To this end, the trajectories are
compared by means of a trajectory distance function, such as the Dynamic Time
Warping Distance, and the distances between matching trajectories are accumu-
lated accordingly. Thus, given two gesture signatures $S_1, S_2 \in \mathbb{R}^{\mathbb{T}}$ and a trajectory distance function $\delta: \mathbb{T} \times \mathbb{T} \to \mathbb{R}_{\geq 0}$, the Gesture Matching Distance between $S_1$ and $S_2$ is defined as:

$$\mathrm{GMD}_{\delta}(S_1, S_2) = \sum_{(t_1, t_2) \in m^{\delta\text{-NN}}_{S_1 \to S_2}} S_1(t_1) \cdot \delta(t_1, t_2) \; + \sum_{(t_2, t_1) \in m^{\delta\text{-NN}}_{S_2 \to S_1}} S_2(t_2) \cdot \delta(t_2, t_1), \qquad (3)$$

where the nearest-neighbor matching $m^{\delta\text{-NN}}_{S_1 \to S_2} \subseteq \mathbb{T} \times \mathbb{T}$ between $S_1$ and $S_2$ is defined as $m^{\delta\text{-NN}}_{S_1 \to S_2} = \{(t_1, t_2) \in \mathbb{T} \times \mathbb{T} \mid S_1(t_1) > 0 \wedge S_2(t_2) > 0 \wedge t_2 = \mathrm{argmin}_{t \in \mathbb{T}}\, \delta(t_1, t)\}$.

The Gesture Matching Distance increases with decreasing similarity of the matching trajectories. The computation time complexity is quadratic in the number of trajectories, i.e. a single distance computation lies in $\mathcal{O}(|\{t \in \mathbb{T} \mid S_1(t) > 0\}| \cdot |\{t \in \mathbb{T} \mid S_2(t) > 0\}| \cdot \mathcal{O}(\delta))$, where $\mathcal{O}(\delta)$ denotes the computation time complexity of the trajectory distance function $\delta$.
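The following Python sketch illustrates the matching-and-accumulation step of Eq. (3) for signatures represented as lists of (trajectory, weight) pairs; the trajectory distance `delta` is a placeholder, and `naive_delta` is only a toy stand-in, not one of the trajectory distance functions discussed in this paper.

```python
import numpy as np

def gmd(sig1, sig2, delta):
    """Sketch of the Gesture Matching Distance (Eq. 3): every weighted trajectory
    of one signature is matched to its nearest neighbor in the other signature,
    and the weighted distances of both matching directions are accumulated."""
    def directed(src, dst):
        total = 0.0
        for t, w in src:
            total += w * min(delta(t, u) for u, _ in dst)   # nearest-neighbor matching
        return total
    return directed(sig1, sig2) + directed(sig2, sig1)

# toy trajectory distance (a stand-in only, not one of the measures cited above)
def naive_delta(t, u):
    n = min(len(t), len(u))
    return float(np.mean(np.linalg.norm(t[:n] - u[:n], axis=1)))

sig_a = [(np.random.rand(40, 3), 0.5), (np.random.rand(60, 3), 0.5)]  # (trajectory, weight) pairs
sig_b = [(np.random.rand(50, 3), 1.0)]
print(gmd(sig_a, sig_b, naive_delta))
```

The two nested loops over the trajectories of both signatures make the quadratic complexity noted above directly visible.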
In addition to the Gesture Matching Distance, other applicable signature dis-
tance functions [13,14,15,16,17] are the transformation-based Earth Mover’s Dis-
tance [17], the correlation-based Signature Quadratic Form Distance [18], the
matching-based Hausdorff Distance [19] and its variants [20,21] as well as the Sig-
nature Matching Distance [22].
2.3. Trajectory Distance Functions
Fundamental to the question of how to model spatiotemporal similarity between
gestural patterns comprising one or more trajectories, is the question of how to
determine similarity between two individual trajectories. A common approach to
comparing two trajectories is based on Dynamic Time Warping [2,3]. The idea of
this approach is to fit the trajectories to each other by aligning their coincident
similar points and accumulating the corresponding point-wise distances. Given two
trajectories $t_n: \{1,\ldots,n\} \to \mathbb{R}^3$ and $t_m: \{1,\ldots,m\} \to \mathbb{R}^3$ and a point-wise distance function $\delta: \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$, the Dynamic Time Warping Distance between $t_n$ and $t_m$ is defined as:

$$\mathrm{DTW}(t_n, t_m) = \delta(t_n(n), t_m(m)) + \min \begin{cases} \mathrm{DTW}(t_{n-1}, t_{m-1}) \\ \mathrm{DTW}(t_n, t_{m-1}) \\ \mathrm{DTW}(t_{n-1}, t_m) \end{cases} \qquad (4)$$

with

$$\mathrm{DTW}(t_0, t_0) = 0 \qquad (5)$$
$$\mathrm{DTW}(t_i, t_0) = \infty \quad \forall\, 1 \leq i \leq n \qquad (6)$$
$$\mathrm{DTW}(t_0, t_j) = \infty \quad \forall\, 1 \leq j \leq m. \qquad (7)$$

The Dynamic Time Warping Distance is defined recursively by minimizing the distances between replicated points of the trajectories. In this way, the distance $\delta$ assesses the spatial proximity of two points while the Dynamic Time Warping Distance preserves their temporal order within the trajectories. By utilizing Dynamic Programming, the computation time complexity of the Dynamic Time Warping Distance lies in $\mathcal{O}(n \cdot m)$.
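A minimal dynamic-programming sketch of Eqs. (4)-(7) in Python, assuming the Euclidean norm as the point-wise distance $\delta$; the function name and test data are illustrative only.

```python
import numpy as np

def dtw(t, u):
    """Dynamic Time Warping between two 3D trajectories of shape (n, 3) and (m, 3),
    computed by dynamic programming in O(n * m); Euclidean point-wise distance assumed."""
    n, m = len(t), len(u)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0                                        # base case (Eq. 5)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(t[i - 1] - u[j - 1])   # point-wise distance
            # recurrence (Eq. 4): extend the cheapest of the three predecessor alignments
            D[i, j] = cost + min(D[i - 1, j - 1], D[i, j - 1], D[i - 1, j])
    return D[n, m]

t = np.column_stack([np.linspace(0.0, 1.0, 40)] * 3)
u = np.column_stack([np.linspace(0.0, 1.0, 55) ** 2] * 3)
print(dtw(t, u))
```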
In addition to Dynamic Time Warping described above, spatiotemporal similar-
ity between trajectories can be assessed for instance by the Levenshtein Distance
[4], the Minimal Variance Matching [5], the Longest Common Subsequence [6,7],
the Edit Distance with Real Penalty [8], the Edit Distance on Real Sequences [9],
or the Mutual Nearest Point Distance [10].
2.4. Other Approaches to Gestural Pattern Similarity
Gestural patterns are mainly investigated in terms of gesture recognition, which
aims at recognizing meaningful expressions of human motion including hand, arm,
face, head, and body movements [23]. Many surveys [24,25,26,27,23,28,29,30,31]
have been released in recent years, providing an extensive overview of the many
facets of gesture recognition. Most approaches either rely on 2D video capture tech-
nology and, thus, computer vision techniques, cf. [32,33], or on 3D motion capture
technology, which provides higher accuracy and thus more potential for precise spa-
tiotemporal similarity search. A recent survey of vision-based gesture recognition
approaches can be found in [28]. Frequently encountered approaches for recogniz-
ing manual gestural patterns are based on Hidden Markov Models [34,35,36,37] or
more generally Dynamic Bayesian Networks [38]. More recent approaches are based
for instance on Feature Fusion [39], on Dynamic Time Warping [40,41], on Longest
Common Subsequences [42], or on Neural Networks [43].
3. Lower Bound Approximation of the Gesture Matching Distance
In this section, we present the lower bound approximation of the Gesture Matching
Distance. In order to derive this approximation, we will first investigate the theo-
retical properties of the underlying nearest-neighbor matching and then show how
these findings lead to our proposal.
Suppose we are given two gesture signatures $S_1, S_2 \in \mathbb{R}^{\mathbb{T}}$ and a trajectory distance function $\delta: \mathbb{T} \times \mathbb{T} \to \mathbb{R}_{\geq 0}$ that determines the dissimilarity between two individual trajectories. In general, a nearest-neighbor matching $m^{\delta\text{-NN}}_{S_1 \to S_2} \subseteq \mathbb{T} \times \mathbb{T}$ assigns each trajectory $t \in \mathbb{T}$ from gesture signature $S_1$ to one or more trajectories $u, v, \ldots \in \mathbb{T}$ from gesture signature $S_2$. If the trajectories $u, v, \ldots$ are equally distant to trajectory $t$, i.e. if it holds that $\delta(t, u) = \delta(t, v) = \ldots$, trajectory $t$ is matched to several nearest neighbors. In practice, however, the uniqueness of the distances between different trajectories most likely leads to exactly one nearest neighbor. If this is not the case, we assume that one of the nearest neighbors is selected non-deterministically. Based on this assumption, each nearest-neighbor matching $m^{\delta\text{-NN}}_{S_1 \to S_2}$ between two non-empty gesture signatures $S_1$ and $S_2$ satisfies the following properties:

Left totality:
$$\forall t \in \mathbb{T}, \exists u \in \mathbb{T}: S_1(t) > 0 \Rightarrow (t, u) \in m^{\delta\text{-NN}}_{S_1 \to S_2} \qquad (8)$$

Right uniqueness:
$$\forall t, u, v \in \mathbb{T}: (t, u) \in m^{\delta\text{-NN}}_{S_1 \to S_2} \wedge (t, v) \in m^{\delta\text{-NN}}_{S_1 \to S_2} \Rightarrow u = v \qquad (9)$$
Intuitively, each trajectory $t \in \mathbb{T}$ that contributes to gesture signature $S_1$, i.e. which has a positive weight $S_1(t) > 0$, is matched to exactly one trajectory $u \in \mathbb{T}$ with $S_2(u) > 0$ from gesture signature $S_2$. These properties of a nearest-neighbor matching hold true irrespective of the underlying trajectory distance function $\delta$. Thus, by replacing the trajectory distance function $\delta$ with another one, pairs of matching trajectories are subject to change, as shown in the following lemma.
Lemma 1. Let $S_1, S_2 \in \mathbb{R}^{\mathbb{T}}$ be two gesture signatures and $\delta, \delta': \mathbb{T} \times \mathbb{T} \to \mathbb{R}_{\geq 0}$ be two trajectory distance functions. For the nearest-neighbor matchings $m^{\delta\text{-NN}}_{S_1 \to S_2}$ and $m^{\delta'\text{-NN}}_{S_1 \to S_2}$ between $S_1$ and $S_2$ it holds that:

$$\forall t \in \mathbb{T}, \exists u, v \in \mathbb{T}: (t, u) \in m^{\delta\text{-NN}}_{S_1 \to S_2} \Leftrightarrow (t, v) \in m^{\delta'\text{-NN}}_{S_1 \to S_2} \qquad (10)$$
Proof. Let $S_1(t) = 0$. By definition of the nearest-neighbor matching it holds that $(t, u) \notin m^{\delta\text{-NN}}_{S_1 \to S_2}$ and that $(t, v) \notin m^{\delta'\text{-NN}}_{S_1 \to S_2}$.

Let $S_1(t) > 0$. Suppose that $\exists u \in \mathbb{T}$ such that $(t, u) \in m^{\delta\text{-NN}}_{S_1 \to S_2}$. By definition of $m^{\delta\text{-NN}}_{S_1 \to S_2}$ it then holds that $S_2(u) > 0$. Thus, $|\{t \in \mathbb{T} \mid S_2(t) \neq 0\}| > 0$. Consequently, by replacing $\delta$ with $\delta'$ there exists at least one trajectory $v \in \mathbb{T}$ with $S_2(v) > 0$ that minimizes $\delta'(t, v)$. Therefore, $(t, v) \in m^{\delta'\text{-NN}}_{S_1 \to S_2}$. Suppose that $\forall u \in \mathbb{T}$ it holds that $(t, u) \notin m^{\delta\text{-NN}}_{S_1 \to S_2}$. Due to the fact that $S_1(t) > 0$ it follows that $\{t \in \mathbb{T} \mid S_2(t) \neq 0\} = \emptyset$. Consequently, by replacing $\delta$ with $\delta'$ it follows that $\forall v \in \mathbb{T}$ it holds that $(t, v) \notin m^{\delta'\text{-NN}}_{S_1 \to S_2}$. This gives us the statement.
Lemma 1 states that each trajectory $t$ from gesture signature $S_1$ that matches a trajectory $u$ from gesture signature $S_2$ according to a distance function $\delta$ also matches a trajectory $v$ according to a distance function $\delta'$. Due to the right uniqueness of the nearest-neighbor matching, we conclude that each pair of matching trajectories $(t, u) \in m^{\delta\text{-NN}}_{S_1 \to S_2}$ has exactly one counter pair $(t, v) \in m^{\delta'\text{-NN}}_{S_1 \to S_2}$. This fact is summarized in the following corollary.
Corollary 1. Let $S_1, S_2 \in \mathbb{R}^{\mathbb{T}}$ be two gesture signatures and $\delta, \delta': \mathbb{T} \times \mathbb{T} \to \mathbb{R}_{\geq 0}$ be two trajectory distance functions. For each $(t, u) \in m^{\delta\text{-NN}}_{S_1 \to S_2}$ there exists exactly one $(t, v) \in m^{\delta'\text{-NN}}_{S_1 \to S_2}$.
The corollary above implies that the cardinality of the nearest-neighbor matching between two gesture signatures $S_1$ and $S_2$ is fixed, i.e. it holds that $|m^{\delta\text{-NN}}_{S_1 \to S_2}| = |m^{\delta'\text{-NN}}_{S_1 \to S_2}|$ for any trajectory distance functions $\delta$ and $\delta'$.

What remains to be shown is that the substitution of a trajectory distance function $\delta: \mathbb{T} \times \mathbb{T} \to \mathbb{R}_{\geq 0}$ with a lower bound $\delta_{LB}: \mathbb{T} \times \mathbb{T} \to \mathbb{R}_{\geq 0}$, which is a function that satisfies the property $\delta_{LB}(u, v) \leq \delta(u, v)$ for all trajectories $u, v \in \mathbb{T}$, will lead to a lower bound approximation of the Gesture Matching Distance. To this end, we provide the following lemma.
Lemma 2. Let $S_1, S_2 \in \mathbb{R}^{\mathbb{T}}$ be two gesture signatures and $\delta: \mathbb{T} \times \mathbb{T} \to \mathbb{R}_{\geq 0}$ be a trajectory distance function with lower bound $\delta_{LB}: \mathbb{T} \times \mathbb{T} \to \mathbb{R}_{\geq 0}$, i.e. it holds that $\forall u, v \in \mathbb{T}: \delta_{LB}(u, v) \leq \delta(u, v)$. The nearest-neighbor matching satisfies the following property:

$$(t, u) \in m^{\delta_{LB}\text{-NN}}_{S_1 \to S_2} \wedge (t, v) \in m^{\delta\text{-NN}}_{S_1 \to S_2} \Rightarrow \delta_{LB}(t, u) \leq \delta(t, v) \qquad (11)$$
Proof. Suppose it holds that $(t, u) \in m^{\delta_{LB}\text{-NN}}_{S_1 \to S_2}$ and that $(t, v) \in m^{\delta\text{-NN}}_{S_1 \to S_2}$. By definition of the nearest-neighbor matching it then holds that $u = \mathrm{argmin}_{t' \in \mathbb{T} \wedge S_2(t') > 0}\, \delta_{LB}(t, t')$ and that $v = \mathrm{argmin}_{t' \in \mathbb{T} \wedge S_2(t') > 0}\, \delta(t, t')$. Since it holds that $\min_{t' \in \mathbb{T} \wedge S_2(t') > 0} \delta_{LB}(t, t') \leq \min_{t' \in \mathbb{T} \wedge S_2(t') > 0} \delta(t, t')$ it follows that $\delta_{LB}(t, u) \leq \delta(t, v)$.
Combining Corollary 1 and Lemma 2 finally leads to the proposal, as shown in
the following theorem.
Theorem 1. (Lower Bound Approximation)
Let $S_1, S_2 \in \mathbb{R}^{\mathbb{T}}$ be two gesture signatures and $\delta: \mathbb{T} \times \mathbb{T} \to \mathbb{R}_{\geq 0}$ be a trajectory distance function. For any lower bound $\delta_{LB}: \mathbb{T} \times \mathbb{T} \to \mathbb{R}_{\geq 0}$ of $\delta$ it holds that:

$$\mathrm{GMD}_{\delta_{LB}}(S_1, S_2) \leq \mathrm{GMD}_{\delta}(S_1, S_2) \qquad (12)$$
Proof. The Gesture Matching Distance is defined as $\mathrm{GMD}_{\delta}(S_1, S_2) = \sum_{(t_1, t_2) \in m^{\delta\text{-NN}}_{S_1 \to S_2}} S_1(t_1) \cdot \delta(t_1, t_2) + \sum_{(t_2, t_1) \in m^{\delta\text{-NN}}_{S_2 \to S_1}} S_2(t_2) \cdot \delta(t_2, t_1)$. By lower-bounding $\delta$ with $\delta_{LB}$ the number of summands stays the same, since each $(t_1, t_2) \in m^{\delta\text{-NN}}_{S_1 \to S_2}$ and $(t_2, t_1) \in m^{\delta\text{-NN}}_{S_2 \to S_1}$ is replaced by exactly one $(t_1, t_2') \in m^{\delta_{LB}\text{-NN}}_{S_1 \to S_2}$ and $(t_2, t_1') \in m^{\delta_{LB}\text{-NN}}_{S_2 \to S_1}$, respectively (cf. Corollary 1). According to Lemma 2 it additionally holds that $\delta(t_1, t_2) \geq \delta_{LB}(t_1, t_2')$ and $\delta(t_2, t_1) \geq \delta_{LB}(t_2, t_1')$. We thus conclude that $\sum_{(t_1, t_2) \in m^{\delta\text{-NN}}_{S_1 \to S_2}} S_1(t_1) \cdot \delta(t_1, t_2) \geq \sum_{(t_1, t_2') \in m^{\delta_{LB}\text{-NN}}_{S_1 \to S_2}} S_1(t_1) \cdot \delta_{LB}(t_1, t_2')$ and that $\sum_{(t_2, t_1) \in m^{\delta\text{-NN}}_{S_2 \to S_1}} S_2(t_2) \cdot \delta(t_2, t_1) \geq \sum_{(t_2, t_1') \in m^{\delta_{LB}\text{-NN}}_{S_2 \to S_1}} S_2(t_2) \cdot \delta_{LB}(t_2, t_1')$, which gives us the statement.
Theorem 1 shows that the lower bound approximation of the Gesture Matching Distance is attributed to the properties of its inherent trajectory distance function. How the resulting lower bound approximation is utilized in order to process queries in gestural pattern databases arising from 3D motion capture data efficiently is shown in the following section.
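As an illustration of Theorem 1, the following sketch (reusing the `gmd` and `dtw` sketches from Section 2) checks the lower-bounding property empirically with a deliberately crude trajectory lower bound, namely the distance between the last points of the trajectories, which lower-bounds the DTW recursion above because all accumulated point-wise costs are non-negative. The setup is an assumption for illustration only, not an experiment from the paper.

```python
import numpy as np

# reuses the gmd() and dtw() sketches from Section 2

def endpoint_lb(t, u):
    """Crude but valid lower bound of the DTW sketch above: DTW accumulates
    non-negative point-wise costs and always aligns the two last points."""
    return float(np.linalg.norm(t[-1] - u[-1]))

rng = np.random.default_rng(0)
for _ in range(100):
    sig_a = [(np.cumsum(rng.normal(size=(rng.integers(20, 60), 3)), axis=0), 0.5)
             for _ in range(2)]
    sig_b = [(np.cumsum(rng.normal(size=(rng.integers(20, 60), 3)), axis=0), 1.0)]
    # Theorem 1: substituting the trajectory distance by a lower bound
    # lower-bounds the Gesture Matching Distance
    assert gmd(sig_a, sig_b, endpoint_lb) <= gmd(sig_a, sig_b, dtw) + 1e-9
```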
4. Efficient Query Processing with the Gesture Matching Distance
A fundamental approach underlying many query processing approaches is the op-
timal multi-step algorithm [12]. The idea of this algorithm consists in processing
distance-based k-nearest-neighbor queries in multiple interleaved steps, where each
step incrementally generates a candidate object with respect to a lower bound ap-
proximation which is subsequently refined by means of the exact distance function
until the final results are obtained. The algorithm is optimal, i.e., the number of
exact distance computations is minimized.
Algorithm 1 Optimal Multi-Step k-NN
1: procedure NN_k(Q, GMD_LB, GMD, D)
2:     R ← ∅
3:     filterRanking ← ranking(Q, GMD_LB, D)
4:     S ← filterRanking.next()
5:     while GMD_LB(Q, S) ≤ max_{P∈R} GMD(Q, P) do
6:         if |R| < k then
7:             R ← R ∪ {S}
8:         else if GMD(Q, S) ≤ max_{P∈R} GMD(Q, P) then
9:             R ← R ∪ {S}
10:            R ← R \ {argmax_{P∈R} GMD(Q, P)}
11:        S ← filterRanking.next()
12:    return R
As shown in Algorithm 1, the first step consists in generating a ranking with respect to a query gesture signature $Q \in \mathbb{R}^{\mathbb{T}}$ by means of the lower bound approximation $\mathrm{GMD}_{LB}$ (cf. line 3). Afterwards, this ranking is processed until $\mathrm{GMD}_{LB}$ exceeds the exact distance of the $k$th-nearest neighbor (cf. line 5), i.e. until it holds that $\mathrm{GMD}_{LB}(Q, S) > \max_{P \in R} \mathrm{GMD}(Q, P)$. The algorithm updates the result set $R$ as long as gesture signatures $S \in D$ with smaller distances $\mathrm{GMD}(Q, S) \leq \max_{P \in R} \mathrm{GMD}(Q, P)$ have been found (cf. line 8).
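A compact Python sketch of this filter-and-refine loop is given below; `gmd_lb` and `gmd_exact` are placeholders for the lower bound approximation and the exact Gesture Matching Distance, and the implementation details (e.g. the heap-based result set) are illustrative rather than taken from the paper.

```python
import heapq

def multistep_knn(query, database, gmd_lb, gmd_exact, k):
    """Sketch of the optimal multi-step k-NN algorithm (Algorithm 1): candidates are
    generated in ascending order of the lower bound and refined with the exact
    distance until the lower bound exceeds the current k-th nearest-neighbor distance."""
    ranking = sorted(database, key=lambda s: gmd_lb(query, s))   # filter ranking (line 3)
    result = []                                                  # min-heap of (-exact_distance, id, object)
    for s in ranking:
        if len(result) >= k and gmd_lb(query, s) > -result[0][0]:
            break                                                # pruning condition (line 5)
        d = gmd_exact(query, s)                                  # refinement step
        if len(result) < k:
            heapq.heappush(result, (-d, id(s), s))
        elif d < -result[0][0]:
            heapq.heapreplace(result, (-d, id(s), s))            # replace current worst result
    return [s for _, _, s in sorted(result, reverse=True)]       # neighbors in ascending distance

```

In the setting of this paper, the filter distance corresponds to the lower bound approximation of the Gesture Matching Distance derived in Section 3 and the refinement distance to the exact Gesture Matching Distance.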
We utilize the optimal multi-step algorithm as described above in order to efficiently query gesture signatures in 3D motion capture databases. To this end, we additionally subject the Dynamic Time Warping Distance to a bandwidth constraint, which limits the maximum permissible time difference between two aligned points of the trajectories, and lower-bound this variant, denoted as $\mathrm{DTW}_t$, by $\mathrm{LB}_{\mathrm{Keogh}}$ [44]. The advantage of this lower bound is its low computation time complexity, which is linear in the length of the trajectories. In fact, we approximate $\mathrm{GMD}_{\mathrm{DTW}}$ by $\mathrm{GMD}_{\mathrm{DTW}_t}$, which is then lower-bounded by means of $\mathrm{GMD}_{\mathrm{LB}_{\mathrm{Keogh}}}$. The performance of this approach with respect to the qualities of accuracy and efficiency is empirically investigated in the following section.
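The following sketch shows one way such an LB_Keogh-style envelope bound can be computed for 3D trajectories; it assumes the two trajectories have equal length (e.g. after resampling), which is a simplification not spelled out in the text, and it is a sketch under those assumptions rather than the authors' implementation.

```python
import numpy as np

def lb_keogh(query, cand, band):
    """Sketch of an LB_Keogh-style envelope lower bound for band-constrained DTW on
    3D trajectories, assuming equal-length trajectories. Each candidate point is
    charged its Euclidean distance to the axis-aligned envelope of the query points
    reachable within the bandwidth; runtime is linear in the trajectory length
    for a fixed bandwidth."""
    query, cand = np.asarray(query, dtype=float), np.asarray(cand, dtype=float)
    n = len(query)
    total = 0.0
    for i in range(n):
        lo, hi = max(0, i - band), min(n, i + band + 1)
        lower = query[lo:hi].min(axis=0)          # per-dimension lower envelope
        upper = query[lo:hi].max(axis=0)          # per-dimension upper envelope
        nearest = np.clip(cand[i], lower, upper)  # closest point inside the envelope box
        total += np.linalg.norm(cand[i] - nearest)
    return total
```

Plugged into the Gesture Matching Distance sketch from Section 2 as the trajectory distance, such an envelope bound yields a lower bound of the Gesture Matching Distance over the band-constrained DTW (cf. Theorem 1).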
5. Performance Analysis
In this section, we benchmark the accuracy and efficiency of the Gesture Matching Distance and its lower bound approximation by using the following two different spatiotemporal 3D motion capture databases.
Fig. 1. Three example gestural patterns of different spatiotemporal movement types: (a) gesture of type spiral (linguistic anchor "do run", habitual aspect, path/scale image schema, 3 stroke phases), (b) gesture of type circle (linguistic anchor "continue", progressive aspect, iteration image schema, 3 stroke phases), and (c) gesture of type straight (linguistic anchor "from the point of where I was to the end", habitual aspect, path image schema, 1 stroke phase). Blue trajectories indicate the main movements of the gestural patterns. Images are taken from [1].
The natural media corpus of 3D motion capture data [1] comprises three-
dimensional motion capture data streams arising from eight participants during a
guided conversation. The participants were equipped with a multitude of reflective
markers which were attached to the body and in particular to the hands. The motion
of the markers was tracked optically via cameras at a frequency of 100 Hz by making
use of the Nexus Motion Capture Software from VICON. For evaluation purposes,
we used the right wrist marker and two markers attached to the right thumb and
right index finger each. The gestures arising within the conversation were classified
by domain experts according to the following types of movement: spiral, circle, and
straight. Example gestures of these movement types are sketched in Figure 1, which
has been taken from [1]. A total of 20 gesture signatures containing five trajectories
each was obtained from the 3D motion capture data streams. The trajectories of
the gesture signatures have been normalized to the interval $[0,1]^3 \subset \mathbb{R}^3$.
In addition to the 3D motion capture database described above, we utilized
the 3D Iconic Gesture Dataset^a [45]. This dataset comprises 1,739 iconic gestures
from 29 participants depicting entities, objects, and actions. Based on the provided
3D skeleton motion capture data, which was recorded via Microsoft Kinect, we
randomly extracted up to 10,000 gesture signatures including between 2 and 10 trajectories in the three-dimensional Euclidean space $\mathbb{R}^3$ with a duration between 0.5 and 2.0 seconds. We additionally normalized the trajectories to the interval $[0,1]^3 \subset \mathbb{R}^3$.
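One plausible reading of this normalization step, shown as a Python sketch; the function name and the joint min-max scaling over all trajectories of a signature are assumptions, not details given in the text.

```python
import numpy as np

def normalize_unit_cube(trajectories):
    """Sketch: jointly min-max normalize a set of 3D trajectories into [0,1]^3;
    whether scaling is applied per signature or globally is an assumption here."""
    points = np.vstack(trajectories)
    lo, hi = points.min(axis=0), points.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)        # guard against flat dimensions
    return [(t - lo) / span for t in trajectories]
```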
In the first series of experiments, we evaluated the accuracy of the proposed
lower bound approximation of the Gesture Matching Distance in order to inves-
tigate the question of whether our proposal is able to find similar spatiotemporal
patterns within 3D motion capture data streams accurately. To this end, we selected
^a http://projects.ict.usc.edu/3dig/
Fig. 2. Average dissimilarity values shown as a function of time with respect to gestural patterns of movement type spiral. Reddish time intervals depict gestural patterns included in the 3D motion capture data streams. The average dissimilarity of the exact Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}}$ is shown by the blue line, while the average dissimilarity of the approximate Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$ is shown by the green dotted line.
different movement types and computed dissimilarity plots with respect to different
gestural query patterns arising from the corresponding movement types in the natu-
ral media corpus. Based on the provided ground truth, we include one dissimilarity
plot for the movement type spiral, which is shown in Figure 2, and two dissimilarity
plots for the movement types straight and circle, which are shown in Figure 3 and
Figure 4, respectively. The corresponding gestural patterns included in the 3D mo-
tion capture data streams are highlighted by means of reddish time intervals. The
average dissimilarity values of the exact Gesture Matching Distance based on the Dynamic Time Warping Distance $\mathrm{GMD}_{\mathrm{DTW}}$ are shown by blue lines, while the average dissimilarity values of the approximate Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$, where we fixed the maximum permissible time difference $t \in \mathbb{N}$ to a value of 10, are shown by green dotted lines.
As can be seen in the figures, the approximate Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$ shows a behavior similar to the exact Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}}$. Both are able to respond to the corresponding queries with low dissimilarity values. In fact, the maximum absolute difference of dissimilarity values between $\mathrm{GMD}_{\mathrm{DTW}}$ and $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$ is below a value of 0.242, while the average deviation is below a value of 0.11. We thus conclude that the approximate Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$ is able to compete with the non-approximate Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}}$ in terms of accuracy.
In the second series of experiments, we evaluated the effect of the bandwidth constraint applied to the Dynamic Time Warping Distance, where we fixed the maximum permissible time difference $t \in \mathbb{N}$ to a value of 10. The precision in percentage of the approximate Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$ with respect to the exact $\mathrm{GMD}_{\mathrm{DTW}}$ is summarized in Figure 5. The precision values are depicted as a function of the gesture signature length and the number of trajectories for different
Fig. 3. Average dissimilarity values shown as a function of time with respect to gestural patterns of movement type straight. Reddish time intervals depict gestural patterns included in the 3D motion capture data streams. The average dissimilarity of the exact Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}}$ is shown by the blue line, while the average dissimilarity of the approximate Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$ is shown by the green dotted line.
databases extracted from the 3D Iconic Gesture Dataset comprising 2k, 5k, and 10k
gesture signatures. As can be seen in the figure, the precision values decrease with
an increase in the length of the gesture signatures. At a gesture signature length of
0.5 seconds, the average precision stays at approximately 100%, which is reduced to
approximately 93% when utilizing gesture signatures with a length of 2.0 seconds.
An increase in the number of trajectories of the gesture signatures does not neces-
sarily degenerate the performance of our approach. As observed empirically, gesture
signatures comprising 6 trajectories always yield the highest precision values. This
effect might be caused by the underlying movement traces of the corresponding trajectories. To sum up, the performance in terms of average precision of our proposal stays above 97%. Thus, the loss in accuracy is less than 3%, which is negligible in view of the increase in efficiency.
In the third series of experiments, we evaluated the query processing efficiency when utilizing the optimal multi-step algorithm as presented in Section 4 with the proposed lower bound approximation derived in Section 3. To this end, we investigated
Fig. 4. Average dissimilarity values shown as a function of time with respect to gestural patterns of movement type circle. Reddish time intervals depict gestural patterns included in the 3D motion capture data streams. The average dissimilarity of the exact Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}}$ is shown by the blue line, while the average dissimilarity of the approximate Gesture Matching Distance $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$ is shown by the green dotted line.
the average query response times needed for processing 100-nearest-neighbor
queries in a database of 10k gesture signatures. As before, the length of the gesture
signatures and the number of trajectories included in the gesture signatures are
varied. The average query processing times in seconds are reported in Table 1. In
general, the query response time increases by extending the length or the number
of trajectories of the gesture signatures. As can be seen in the table, the sequential
scan by means of the Gesture Matching Distance based on Dynamic Time Warping
Distance $\mathrm{GMD}_{\mathrm{DTW}}$ shows the highest query response time. By utilizing gesture signatures comprising 10 trajectories with a length of 2 seconds, $\mathrm{GMD}_{\mathrm{DTW}}$ needs more than 176 seconds on average for query processing. This query processing time is reduced by more than one order of magnitude when processing the queries with the optimal multi-step algorithm based on the proposed lower bound approximation $\mathrm{GMD}_{\mathrm{LB}_{\mathrm{Keogh}}}$. For the aforementioned parameters, our approach is able to complete query processing in 14 seconds on average. By increasing the size of the database to 100k gesture signatures comprising 10 trajectories with a length of 2 seconds each,
Fig. 5. Precision values in percentage of $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$ in comparison to $\mathrm{GMD}_{\mathrm{DTW}}$ as a function of gesture signature length and the number of trajectories. The database size is varied among 2k, 5k, and 10k gesture signatures.
the average query response times for the sequential scan with $\mathrm{GMD}_{\mathrm{DTW}}$ and that with $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$ are approximately 1170 seconds and 445 seconds, respectively, whereas the optimal multi-step algorithm with the proposed lower bound approximation $\mathrm{GMD}_{\mathrm{LB}_{\mathrm{Keogh}}}$ takes approximately 36 seconds. Thus, our approach is more than 12 times faster than the sequential scan with $\mathrm{GMD}_{\mathrm{DTW}_{t=10}}$ and more than 30 times faster than the sequential scan with $\mathrm{GMD}_{\mathrm{DTW}}$.
To conclude, the proposed lower bound approximation is able to achieve an
increase in efficiency of more than one order of magnitude with a negligible loss in accuracy and thus enables efficient similarity search for gestural patterns in 3D
motion capture databases. How the proposed approaches are utilized in the digital
humanities, and in particular within the domain of gestural pattern analysis, is
discussed by means of two research use cases in the following section.
6. Applications in Digital Humanities: Two Gesture Research Use
Cases
Recent studies in linguistics and cognitive science have shown that the work-
ings of the human mind are intricately bound to the workings of the human
body [46,47,48,49]. Concomitantly, research has shown that co-speech behavior
is highly conventionalized and intricately tied to structures in the speech stream
[50,51,52,53,54,55]. Together, these conventionalized multimodal constructions
[56,57,58] of speech and body movement convey the semantics and pragmatics
of the message. Moving away from a long-time focus on text or speech in isolation,
linguists – especially cognitive linguists – have increasingly targeted full-bodied,
Table 1. Average query response time in seconds for processing 100-nearest-neighbor queries in a database of 10k gesture signatures.

signature length | trajectories | opt. multi-step GMD_LBKeogh | seq. scan GMD_DTW(t=10) | seq. scan GMD_DTW
2.0s | 10 | 14.0 | 67.5 | 176.4
2.0s |  8 |  8.4 | 42.4 | 110.7
2.0s |  6 |  3.6 | 24.1 |  64.1
2.0s |  4 |  1.6 | 12.1 |  33.0
2.0s |  2 |  0.5 |  3.1 |   8.3
1.5s | 10 |  8.3 | 44.3 |  97.0
1.5s |  8 |  5.0 | 29.3 |  62.3
1.5s |  6 |  2.2 | 16.2 |  34.2
1.5s |  4 |  1.3 |  9.6 |  20.3
1.5s |  2 |  0.3 |  2.3 |   5.0
1.0s | 10 |  4.6 | 21.4 |  34.0
1.0s |  8 |  5.0 | 22.7 |  36.0
1.0s |  6 |  1.3 | 10.0 |  15.8
1.0s |  4 |  0.8 |  5.9 |   9.2
1.0s |  2 |  0.2 |  1.5 |   2.4
0.5s | 10 |  2.8 | 14.8 |  16.1
0.5s |  8 |  1.6 |  7.9 |   8.2
0.5s |  6 |  0.8 |  5.4 |   6.0
0.5s |  4 |  0.3 |  1.6 |   1.8
0.5s |  2 |  0.1 |  0.6 |   0.7
multimodal interaction as their object of study. However, due to the challenges
inherent in studying multimodal data, which requires recording and tediously an-
notating full conversations, gesture studies to date have largely been based on case
studies of one or two individuals in conversation. With recent advances in digital
data, such as the availability of a few large video-based language corpora that pro-
vide audio-video streams with time-aligned closed-captioning text^b, linguists now
have hundreds of spontaneous conversations available to them for study. However,
the annotation of body movement remains complex and highly reliant on qualitative
measures based on an annotator’s visual examination of a video played at reduced
speed.
3D motion capture, however, provides a radically different lens through which to
both capture and examine multimodal data. Here we describe two research programs
^b For example, UCLA's Little Red Hen lab is an international research team using a proprietary online corpus of more than 200,000 hours of broadcast television: https://sites.google.com/site/distributedlittleredhen/home
that use 3D motion capture data to investigate certain structures in embodied
representations (i.e. gesture, head movement, and other modalities) and how these
are co-articulated with structures in the speech stream. The aim of each study is the
understanding of the interaction between conceptual structure, linguistic structure (i.e. the speech), and embodied/physical structure.
6.1. Aspectual contours: Matching verb types with gestural
movement types
In the study presented in [59,60], 3D motion capture data was used to investigate
the gestural profiles corresponding to linguistic utterances conveying the grammatical phenomenon known as aspect – or the linguistic phenomenon that captures how speakers modulate the "temporal contour" of an event [61]. Contour implies a shape
in space, making aspect a natural grammatical category through which to explore
the trajectories of co-speech gestures. Aspect encodes the ways in which an event can
be construed dynamically by performing additional computations without losing the
character of the original event [62]. For example, inherent in the meaning of verbs
in English such as sneeze is the interpretation of the event as a bounded, punctual,
single episode. However, aspect is dynamic and can be altered in interaction with
grammatical elements that have aspectual force. Using a normally bounded verb
such as sneeze in a progressive construction (-ing) renders the event unbounded and
yields an iterative interpretation, as in He is sneezing or He keeps sneezing [63,64].
In a study of the co-speech gestures associated with aspect-marking auxiliary
verbs in English, [53] examined constructions in English headed by the auxiliary co-
verbs continue, keep, start, stop and quit (e.g. keep sneezing, stop talking, quit smoking, etc.). The goal of the study was to determine if gestures correlated with these
auxiliary constructions were conventionalized across speakers, and if so, what the
conventionalized features of the gestures are for each construction. Results showed a
statistically significant correlation between both the timing and the form of the ges-
ture and the aspect marked in the auxiliary verb in the speech stream. The gesture
profile of the auxiliary keep, for example, was characterized by longer onset times
(i.e., a greater latency between onset of gesture and onset of the auxiliary verb in
the speech) and repeated gesture strokes, many of which were cyclic or spiral in
trajectory. This study and others [65,60] have noted that prototypical movement
profiles are readily recognizable in co-speech gesture given certain linguistic cues.
In a follow-up study [59,60], 3D motion capture data was used to explore the
forms that emerged as aspect-marking profiles in [53]. The motion capture data
increased the sophistication of the analysis by allowing us to investigate the degree
of similarity between the gesture profiles corresponding to spoken utterances, as
well as providing more nuanced visualizations of the movement traces and tempo-
ral dynamics of these gestures. Eight native speakers of North American English
were recorded in the Natural Media Lab of RWTH Aachen University in casual
conversation with a confederate, who remained the same for all participants. Par-
ticipants recounted and interpreted a short movie and conversed about topics such
as habits and hobbies, resulting in approximately six hours of recorded interaction.
To analyze the data, we identified those discourse sequences in which the trajec-
tory, direction, and form of the gesture trace (circle, spiral, arc, etc.) reflected one of
the conventionalized, aspectually-charged forms established in the previous research
[53].
The computational analyses of gesture similarity by means of a distance-based similarity model [1] enable us to recognize in a quantitative manner (rather than relying on visual assessment) which trajectory type a gesture has. This proves most useful in differentiating forms, for example, a spiral and a circle, which differ only in displacement in space for the former, or lack thereof for the latter. Such a distinction is difficult to make unequivocally using manual annotation, which relies on a
researcher’s observation of a video (possibly involving poor camera angles of the
gesture) played at reduced speed.
6.2. Establishing multimodal clusters in dialogic travel-planning
data
Brenger et al. [66] investigated multimodal constructions that may be observed
when interlocutors utilize their gesture spaces for spatial-geographical orientation
during collaborative travel planning (e.g. planning an Interrail trip through Eu-
rope). The basis for the study was the Multimodal Speech & Kinetic Action Cor-
pus (MuSKA), compiled in the Natural Media Lab of RWTH Aachen University
[67,68]. As in the previous use case, several data streams were recorded and aligned
in the Natural Media Lab (audio, video and 3D motion capture data), though in
this study the recorded conversations were informal dialogues between friends. In
speech, indicating potential travel destinations and routes typically involves the
use of highly context-dependent indexical expressions such as certain functional
closed-class items [63] or shifters [69]. Examples include prepositions, pronouns,
demonstratives, and connectors. The assumption underlying this study was that, in
spoken German discourse, the use of place names and indexical expressions – such
as prepositions (e.g., nach (to), von (from), bei (at)) and locative or directional
adverbials (e.g., da (there), hier (here), rüber (over)) – would correlate with distinct
kinds of gestures, namely locating and routing gestures.
More specifically, the study’s target structures were prepositional phrases such
as constructions combining prepositions and adverbials (e.g., PREP + ADV such as nach hier, nach da (to here/there)) or prepositions and nouns (e.g., PREP + N such as von Norden (from the north), nach Paris (to Paris)). We also included adverbial phrases comprising locative and directional adverbials such as ADV_locative + ADV_directional (e.g., da rüber (over there), hier hin (to here)).
The "travel planning" sub-corpus contains 60 minutes of annotated discourse
data, with speech transcripts coded for shifters and the adverbial and prepositional
phrases in which they occur. The video data were coded for gestural shifts exhibiting
locating or routing functions. In three dialogues (42 minutes in total), 300 gesture-
accompanied occurrences of locative prepositions and adverbials were identified (130 place names; 170 combinations of prepositions with either locative or directional adverbials, i.e. PREP + ADV_locative or ADV_directional). Regarding spatial orientation
and gestural charting, we observed two main strategies: a) indicating places (cities,
countries) through locating gestures; and b) tracing trajectories through routing ges-
tures. We hypothesized that whereas prepositional phrases entailing place names or
locative adverbs correlate with indexical locating gestures, deictic adverbial phrases
may co-occur with both locating gestures and routing gestures containing specific
directional movement information that is not necessarily specified in the concurrent
speech [70,71,72,52]. In addition to analyzing gestural patterns and multimodal
clusters with the help of the computational methods presented in this paper, we
are currently working on appropriate ways to visualize the data in the form of heat
maps.
6.3. Insights and Future Directions
In both case studies outlined here – and indeed throughout gesture studies, whether
working with motion capture data or video data – the methodology continues to
require manual searching of corpus data and annotated ELAN files for linguistic phrases and then comparing the corresponding gestures to each other in terms of their spatiotemporal similarity. Thus, in the travel-planning study, the main effort
lay in manually identifying spatiotemporal aspects and properties of correspond-
ing gestures that allowed them to be regarded as routing or placing gestures – a
very time-consuming venture. However, the strength of the approach presented in
this paper is in its goal of integrating the various audio and video data streams
and annotated transcripts into a motion-capture driven multimodal database that
can be efficiently searched with the types of query processing algorithms presented
here. This would enable the semi-automatic search for gestures’ spatial and tempo-
ral characteristics with their co-occurring linguistic structures. The computational
identification of inter-gestural similarity would dramatically speed up the search
process and thus enable future gesture researchers to explore larger corpora than is
currently possible with manual searching. With regards to computational gesture
signatures, these investigations could additionally be extended to the identification
of any gestural pattern – locating and routing gestures, aspect-marking forms, and
others, rather than relying on the linguistic target phrases that currently drive the
searches.
Part of the value of interdisciplinary approaches to complex, dynamic multi-
modal data such as the collaborations presented here lies in the reciprocity of the
collaborations: not only are speech and gesture data in multiple streams a welcome,
and increasingly necessary, challenge for computer scientists, the computational ap-
proach is also increasingly crucial for linguists and gesture researchers. For instance,
one prerequisite to identifying relatively stable patterns of correlated linguistic and
gestural structures is that the multimodal cluster is used with a relatively high fre-
quency. The computational methods applied to multimodal speech and gesture data
suggested here would thus also enable linguists and gesture researchers to contribute
to the advancement of the still young area of multimodal cluster analysis and thus
to predict communicative behavior in certain utterance contexts. The inclusion of
an aligned similarity search for syntactic structures and phrases, coupled with the
presented similarity search for kinetic movement patterns, could take this promis-
ing venture one step further when it comes to identifying time-elastic multimodal
clusters in larger multimodal corpora.
7. Conclusions
In this paper, we have addressed the issue of efficiently accessing gestural patterns in
3D motion capture data based on spatiotemporal similarity. To this end, we modeled
gestural patterns by means of gesture signatures and investigated a lower bound
approximation of the Gesture Matching Distance. Our approach is able to achieve
an increase in efficiency of more than one order of magnitude with a negligible loss
in accuracy. We thus claim that the proposed distance-based approach to gestural
pattern analysis enables the semi-automatic investigation of large heterogeneous
motion capture data archives.
References
[1] C. Beecks, M. Hassani, J. Hinnell, D. Sch¨uller, B. Brenger, I. Mittelberg, and T. Seidl,
“Spatiotemporal similarity search in 3d motion capture gesture streams,” in Proceed-
ings of the 14th International Symposium on Spatial and Temporal Databases (SSTD
2015), 2015.
[2] J. Blackburn and E. Ribeiro, “Human motion recognition using isomap and dynamic
time warping,” in Human Motion–Understanding, Modeling, Capture and Animation.
Springer, 2007, pp. 285–298.
[3] J. Yang, Y. Li, and K. Wang, “A new descriptor for 3d trajectory recognition via
modified cdtw,” in Automation and Logistics (ICAL), 2010 IEEE International Con-
ference on. IEEE, 2010, pp. 37–42.
[4] M. Hahn, L. Kr¨uger, and C. W¨ohler, “3d action recognition and long-term prediction
of human motion,” in Computer Vision Systems. Springer, 2008, pp. 23–32.
[5] L. J. Latecki, V. Megalooikonomou, Q. Wang, R. Lakaemper, C. A. Ratanamahatana,
and E. Keogh, “Elastic partial matching of time series,” in Knowledge Discovery in
Databases: PKDD 2005. Springer, 2005, pp. 577–584.
[6] M. Vlachos, M. Hadjieleftheriou, D. Gunopulos, and E. Keogh, “Indexing multi-
dimensional time-series with support for multiple distance measures,” in Proceedings
of the ninth ACM SIGKDD international conference on Knowledge discovery and
data mining. ACM, 2003, pp. 216–225.
[7] M. Vlachos, G. Kollios, and D. Gunopulos, “Elastic translation invariant matching
of trajectories,” Machine Learning, vol. 58, no. 2-3, pp. 301–334, 2005.
[8] L. Chen and R. Ng, “On the marriage of lp-norms and edit distance,” in Proceedings
of the Thirtieth international conference on Very large data bases-Volume 30. VLDB
Endowment, 2004, pp. 792–803.
[9] L. Chen, M. T. Özsu, and V. Oria, "Robust and fast similarity search for moving ob-
ject trajectories,” in Proceedings of the 2005 ACM SIGMOD international conference
on Management of data. ACM, 2005, pp. 491–502.
[10] S. Fang and H. Chan, “Human identification by quantifying similarity and dissim-
ilarity in electrocardiogram phase space,” Pattern Recognition, vol. 42, no. 9, pp.
1824–1831, 2009.
[11] C. Beecks, M. Hassani, F. Obeloer, and T. Seidl, "Efficient distance-based gestural
pattern mining in spatiotemporal 3d motion capture databases,” in Proceedings of
the 15th International Conference on Data Mining Workshops,2015.
[12] T. Seidl and H.-P. Kriegel, “Optimal multi-step k-nearest neighbor search,” in Pro-
ceedings of the ACM SIGMOD International Conference on Management of Data,
1998, pp. 154–165.
[13] C. Beecks, “Distance-based similarity models for content-based multimedia
retrieval,” Ph.D. dissertation, RWTH Aachen University, 2013. [Online]. Available:
http://darwin.bth.rwth-aachen.de/opus3/volltexte/2013/4807/
[14] C. Beecks, S. Kirchhoff, and T. Seidl, "On stability of signature-based similarity mea-
sures for content-based image retrieval,” Multimedia Tools and Applications,vol.71,
no. 1, pp. 349–362, 2014.
[15] C. Beecks and T. Seidl, “On stability of adaptive similarity measures for content-
based image retrieval,” in Proceedings of the International Conference on Multimedia
Modeling, 2012, pp. 346–357.
[16] C. Beecks, M. S. Uysal, and T. Seidl, “A comparative study of similarity measures
for content-based multimedia retrieval,” in Proceedings of the IEEE International
Conference on Multimedia and Expo, 2010, pp. 1552–1557.
[17] Y. Rubner, C. Tomasi, and L. J. Guibas, “The earth mover’s distance as a metric
for image retrieval,” International Journal of Computer Vision, vol. 40, no. 2, pp.
99–121, 2000.
[18] C. Beecks, M. S. Uysal, and T. Seidl, “Signature quadratic form distance,” in Pro-
ceedings of the ACM International Conference on Image and Video Retrieval, 2010,
pp. 438–445.
[19] F. Hausdorff, Grundzüge der Mengenlehre. Von Veit, 1914.
[20] D. P. Huttenlocher, G. A. Klanderman, and W. Rucklidge, “Comparing images us-
ing the Hausdorff distance," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 15, no. 9, pp. 850–863, 1993.
[21] B. G. Park, K. M. Lee, and S. U. Lee, “Color-based image retrieval using perceptually
modified Hausdorff distance," EURASIP Journal of Image and Video Processing, vol.
2008, pp. 4:1–4:10, 2008.
[22] C. Beecks, S. Kirchhoff, and T. Seidl, "Signature matching distance for content-based
image retrieval,” in Proceedings of the ACM International Conference on Multimedia
Retrieval, 2013, pp. 41–48.
[23] S. Mitra and T. Acharya, “Gesture recognition: A survey,” Trans. Sys. Man
Cyber Part C, vol. 37, no. 3, pp. 311–324, May 2007. [Online]. Available:
http://dx.doi.org/10.1109/TSMCC.2007.893280
[24] N. A. Ibraheem and R. Z. Khan, “Article: Survey on various gesture recognition tech-
nologies and techniques,” International Journal of Computer Applications,vol.50,
no. 7, pp. 38–44, 2012.
[25] R. Z. Khan and N. A. Ibraheem, “Survey on gesture recognition for hand image
postures.” 2012, pp. 110–121.
[26] J. LaViola, “A survey of hand posture and gesture recognition techniques and tech-
nology,” Brown University, Providence, RI,1999.
[27] J. Liu and M. Kavakli, “A survey of speech-hand gesture recognition for the
development of multimodal interfaces in computer games,” in Proceedings of the
IEEE International Conference on Multimedia and Expo, 2010, pp. 1564–1569.
[Online]. Available: http://dx.doi.org/10.1109/ICME.2010.5583252
[28] S. S. Rautaray and A. Agrawal, “Vision based hand gesture recognition for human
computer interaction: a survey,” Artificial Intelligence Review, vol. 43, no. 1, pp. 1–54,
2015.
[29] S. Ruffieux, D. Lalanne, E. Mugellini, and O. A. Khaled, "A survey of datasets for
human gesture recognition,” in Human-Computer Interaction. Advanced Interaction
Modalities and Techniques - 16th International Conference, ser. Lecture Notes in
Computer Science, vol. 8511. Springer, 2014, pp. 337–348.
[30] R. Watson, “A survey of gesture recognition techniques,” Trinity College Dublin,
Department of Computer Science, Tech. Rep., 1993.
[31] Y. Wu and T. S. Huang, “Vision-based gesture recognition: A review,” in Gesture-
based communication in human-computer interaction. Springer, 1999, pp. 103–115.
[32] T. B. Moeslund and E. Granum, “A survey of computer vision-based human motion
capture,” Computer vision and image understanding, vol. 81, no. 3, pp. 231–268,
2001.
[33] T. B. Moeslund, A. Hilton, and V. Kr¨uger, “A survey of advances in vision-based
human motion capture and analysis,” Computer vision and image understanding,vol.
104, no. 2, pp. 90–126, 2006.
[34] C. Keskin, A. Erkan, and L. Akarun, “Real time hand tracking and 3d gesture recog-
nition for interactive interfaces using hmm,” ICANN/ICONIPP, vol. 2003, pp. 26–29,
2003.
[35] Y. Nam and K. Wohn, “Recognition of hand gestures with 3d, nonlinear arm move-
ment,” Pattern recognition letters, vol. 18, no. 1, pp. 105–113, 1997.
[36] A. Psarrou, S. Gong, and M. Walter, “Recognition of human gestures and behaviour
based on motion trajectories,” Image and Vision Computing, vol. 20, no. 5, pp. 349–
358, 2002.
[37] H.-I. Suk, B.-K. Sin, and S.-W. Lee, “Hand gesture recognition based on dynamic
bayesian network framework,” Pattern Recognition, vol. 43, no. 9, pp. 3059–3072,
2010.
[38] ——, “Recognizing hand gestures using dynamic bayesian network,” in Automatic
Face & Gesture Recognition, 2008. FG '08. 8th IEEE International Conference on. IEEE, 2008, pp. 1–6.
[39] J. Cheng, C. Xie, W. Bian, and D. Tao, “Feature fusion for 3d hand gesture recogni-
tion by learning a shared hidden space,” Pattern Recognition Letters, vol. 33, no. 4,
pp. 476–484, 2012.
[40] T. Arici, S. Celebi, A. S. Aydin, and T. T. Temiz, “Robust gesture
recognition using feature pre-processing and weighted dynamic time warping,”
Multimedia Tools Appl., vol. 72, no. 3, pp. 3045–3062, 2014. [Online]. Available:
http://dx.doi.org/10.1007/s11042-013-1591-9
[41] S. Bodiroža, G. Doisy, and V. V. Hafner, "Position-invariant, real-time gesture recog-
nition based on dynamic time warping,” in Proceedings of the 8th ACM/IEEE inter-
national conference on Human-robot interaction. IEEE Press, 2013, pp. 87–88.
[42] H. Stern, M. Shmueli, and S. Berman, “Most discriminating segment–longest common
subsequence (mdslcs) algorithm for dynamic hand gesture classification,” Pattern
Recognition Letters, vol. 34, no. 15, pp. 1980–1989, 2013.
[43] H. Hasan and S. Abdul-Kareem, “Static hand gesture recognition using neural net-
works,” Artificial Intelligence Review, vol. 41, no. 2, pp. 147–181, 2014.
[44] E. J. Keogh, “Exact indexing of dynamic time warping,” in Proceedings of 28th In-
ternational Conference on Very Large Data Bases, 2002, pp. 406–417.
[45] A. Sadeghipour, L.-P. Morency, and S. Kopp, “Gesture-based object recognition us-
ing histograms of guiding strokes,” in Proceedings of the British Machine Vision
Conference, 2012.
[46] B. Bergen and K. Wheeler, “Grammatical aspect and mental simulation,” Brain and
Language, vol. 112, no. 3, pp. 150–158, 2010.
[47] R. W. Gibbs Jr, Embodiment and cognitive science. Cambridge University Press,
2005.
[48] C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill, and J. Bressem, Eds., Body – Language – Communication: An International Handbook on Multimodality in Human Interaction: Vol. 2, ser. Handbücher zur Sprach- und Kommunikationswissenschaft. Berlin and Boston: De Gruyter Mouton, 2014, vol. 38.2.
[49] C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill, and S. Teßendorf, Body – Language – Communication: An International Handbook on Multimodality in Human Interaction (Handbooks of Linguistics and Communication Science 38). Berlin and Boston: De Gruyter Mouton, 2013.
[50] J. Bressem, “A linguistic perspective on the notation of form features in gestures,” in Body – Language – Communication: Vol. 1, ser. Handbücher zur Sprach- und Kommunikationswissenschaft, C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill, and S. Teßendorf, Eds. Berlin and Boston: De Gruyter Mouton, 2013, vol. 38.1, pp. 1079–1098.
[51] C. Debras, “L’expression multimodale du positionnement interactionnel (multimodal stance-taking),” Ph.D. dissertation, 2013.
[52] E. Fricke, Origo, Geste und Raum. Mouton de Gruyter, 2007.
[53] J. Hinnell, “Multimodal aspectual constructions in North American English: A corpus analysis of aspect in co-speech gesture using Little Red Hen,” in International Society of Gesture Studies (ISGS), 2014.
[54] I. Mittelberg, “The exbodied mind: Cognitive-semiotic principles as motivating forces in gesture,” in Body – Language – Communication: Vol. 1, ser. Handbücher zur Sprach- und Kommunikationswissenschaft, C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill, and S. Teßendorf, Eds. De Gruyter Mouton, 2013, vol. 38.1, pp. 750–779.
[55] S. Schoonjans, “Modalpartikeln als multimodale Konstruktionen. Eine korpusbasierte Kookkurrenzanalyse von Modalpartikeln und Gestik im Deutschen,” Ph.D. dissertation, 2014.
[56] A. E. Goldberg, Constructions at work: The nature of generalization in language. Oxford and New York: Oxford University Press, 2006. [Online]. Available: http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=215621
[57] F. Steen and M. Turner, “Multimodal construction grammar,” in Language and the Creative Mind. CSLI, 2013, pp. 255–274.
[58] E. Zima, “Gibt es multimodale Konstruktionen? Eine Studie zu [V (motion) in circles]
und [all the way from X PREP Y],” 2014.
[59] J. Hinnell, C. Beecks, M. Hassani, T. Seidl, and I. Mittelberg, “Multimodal auxiliary constructions in English: A quantitative image-schema analysis of aspectual contours in gesture,” in 12th Conference on Conceptual Structure, Discourse and Language (CSDL), 2014.
[60] I. Mittelberg, J. Hinnell, C. Beecks, M. Hassani, and T. Seidl, “Emerging grammar
in gesture: A motion-capture data analysis of image-schematic aspectual contours
in North American English speaker-gesturers,” in International Cognitive Linguistics
Conference (ICLC), 2015.
[61] B. Comrie, Aspect: An introduction to the study of verbal aspect and related problems. Cambridge University Press, 1976, vol. 2.
[62] W. Frawley, Linguistic Semantics. Lawrence Erlbaum Associates, 1992. [Online].
Available: https://books.google.de/books?id=uyavMKhIfV8C
[63] L. Talmy, Toward a Cognitive Semantics. MIT Press, 2000.
[64] B. Heine and T. Kuteva, World Lexicon of Grammaticalization. Cambridge University Press, 2002. [Online]. Available: https://books.google.de/books?id=Ua3vSiz0gaEC
[65] A. Cienki, Image schemas and mimetic schemas in cognitive linguistics and gesture
studies, ser. Benjamins Current Topics. John Benjamins Publishing Company,
2015. [Online]. Available: https://books.google.de/books?id=wxCqCgAAQBAJ
[66] B. Brenger, D. Schüller, M. Priesters, and I. Mittelberg, “3D heat maps of multimodal travel planning: Correlating prepositional and adverbial phrases with locating and routing gestures,” accepted abstract for the International Society for Gesture Studies (ISGS) Conference, 2016.
[67] B. Brenger, “Head gestures in dialogue – identification and computational analysis of motion-capture data profiles of speakers’ and listeners’ communicative action,” 2015.
[68] B. Brenger and I. Mittelberg, “Shakes, nods and tilts: Motion-capture data profiles of speakers’ and listeners’ head gestures,” in Proceedings of the 3rd Gesture and Speech in Interaction (GESPIN) Conference, 2015.
[69] R. Jakobson, “Shifters, verbal categories and the Russian verb,” in Word and Language, ser. Selected Writings. De Gruyter, 1971, vol. II. [Online]. Available: https://books.google.de/books?id=ASkcAAAAIAAJ
[70] H. H. Clark, “Pointing and placing,” in Pointing. Where language, culture, and cog-
nition meet, S. Kita, Ed. Lawrence Erlbaum Assoc., 2003, pp. 243–268.
[71] K. Cooperrider and R. Núñez, “Across time, across the body: Transversal temporal gestures,” Gesture, vol. 9, no. 2, pp. 181–206, 2009.
[72] K. R. Coventry, T. Tenbrink, and J. E. Bateman, Spatial language and dialogue.
Oxford University Press, 2009, vol. 3.