Perspectives on Real-time Computation
of Movement Coarticulation
Frédéric Bevilacqua
Ircam - Centre Pompidou
Paris, France
Baptiste Caramiaux
McGill University
Montreal, QC, Canada
Paris, France
Jules Françoise
School of Interactive Arts
and Technologies
Simon Fraser University
Surrey, Canada
ABSTRACT
We discuss the notion of movement coarticulation, which has been studied in several fields such as motor control, music performance, and animation. In gesture recognition, movement coarticulation is generally viewed as a problematic transition between "gestures". We propose instead to treat movement coarticulation as an informative element of skilled practice, and to explore its computational modeling. We show that established probabilistic models need to be extended to accurately take movement coarticulation into account, and we propose research questions towards such a goal.
Author Keywords
Coarticulation; Movement; Gesture; Recognition; Motor
ACM Classification Keywords
H.5. Information Interfaces and Presentation (e.g. HCI): Multimedia Information Systems; G.3 Probability and Statistics: Time series analysis; J.5 Arts and Humanities: Performing arts (e.g., dance, music)
INTRODUCTION
Coarticulation is a well-known phenomenon in speech production, occurring when a sound segment is influenced by its context, such as the preceding and following sound segments in a word or sentence¹. This problem has been widely studied and modeled for both speech recognition and generation [14]. While coarticulation has also been described in movement sequences, it remains largely overlooked. In Human-Computer Interaction (HCI), and particularly in gesture-based interaction, the phenomenon of coarticulation is often considered as
¹ We note that the word coarticulation is also used to describe the occurrence of two different modalities, for example voice and gesture. In this paper, we use the term coarticulation as it is generally used in speech production.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise,
or republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. Request permissions from permissions@acm.org.
MOCO’16, July 05 - 07, 2016, Thessaloniki, GA, Greece
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-4307-7/16/07...$15.00
a “problem”, since it perturbs the performance of gestural vocabularies and can reduce recognition rate [1].
Coarticulation relies on the existence and formalization of constitutive segments. For instance, speech is often analyzed as a finite number of ordered phonological sound segments, whose ordering is language-dependent. Coarticulation can be observed and measured because the alteration of phonemes remains consistent over a large vocabulary. For movement, such segments become highly variable across individuals and context-dependent, making their formalization inherently more complex. Motor theorists proposed the notion of movement primitives [3] as basic units (typically, patterns of movement kinematics) that can be sequenced to execute a complex movement. Within this framework, movement coarticulation has been linked to motor skills that involve the selection of movement primitives, their ordering, and their accurate execution [29].
In this context, an important challenge is the computational modeling of coarticulated movements, with the aim of leveraging users' motor skills rather than discarding them for the sake of accuracy in gesture recognition systems. This paper discusses problems and prospects of computational modeling of coarticulation for the field of movement and computing. In particular, we propose to include aspects of motor control theories. We first recall important references on movement coarticulation spanning different disciplines. Second, we present computational approaches in interactive systems. We then illustrate typical coarticulation phenomena occurring with simple gestural inputs, and analyze these phenomena through the lens of existing computational models. Finally, we briefly discuss the results and propose perspectives for computational movement models involving coarticulation.
Coarticulation has been studied and formalized in speech perception [30] and recognition [25, 4], in communication, animation, and embodied conversational agents [21], as well as in sign language, both in pure synthesis and in motion-retrieval-based, data-driven sign-language synthesis [11, 26, 17].
In motor control, coarticulation has been studied using simple tasks and movements [15, 27, 29, 23]. Typically, coarticulation occurs in sensorimotor learning when movement primitives (understood as basic units such as patterns of movement kinematics [3]) are fused into a larger phrase. Such a phrase does not appear as a series of separate events [12], but arises from an intermediate-level command that encompasses several events as one, called a "chunk". Chunk boundaries are characterised by higher motor variability and probability of errors. This behavioural characterisation is supported by recent work in neuroscience proposing a hierarchical motor representation underlying expert performance (see for example the recent review [8]). As a corollary, coarticulation is generally considered the result of anticipatory behaviour: the execution of a unit is planned ahead, and the movement appears to start before the end of the previous unit [29].
The concept of coarticulation has also found an echo in the study of musical gestures, and in particular instrumental gestures [13, 20, 22, 2, 12]. Godøy [12] proposes an informative review of coarticulation in music performance, where the notion of small movement units can be linked to existing musical events such as notes or sounds. Such links between score events and segmented instrumentalists' gestures have also been examined using computational models [7]. Motor theories and cognition should be considered here, since instrumental gestures imply learning skilled movements and anticipation. Yet important work remains to be conducted in this area.
Finally, in dance, motor skill learning is acknowledged as a fundamental aspect of the practice, which makes dance a promising field for investigating coarticulation. Nevertheless, to our knowledge, the notion of coarticulation in dance has been little studied from a computational perspective. Here the challenges are two-fold: to define what constitutes a movement segment that can then be used in computing systems, and to design computing systems able to understand higher-level representations of complex dance movements, coherent with embodied cognitive mechanisms [16].
In this work, we consider gesture-based interactive systems in which movement analysis must be achieved in real-time. In such cases, the challenge is to identify and characterize segments from a continuous stream of motion data while simultaneously accounting for their context-dependent variations. In this section, we introduce the types of models that we are considering for real-time computation of coarticulation.
In the literature, spotting and classifying gestures in a continuous stream of movement data is usually called continuous gesture recognition. State-space temporal models are typically used for continuous gesture recognition because they can take into account both variability in execution and temporal dependencies in the signal [19]. For example, Conditional Random Fields (CRFs) have proven successful for such a task [32, 18]. Moreover, with respect to coarticulation, CRFs can take into account contextual information such as the segments preceding and following a given segment. However, as described by Morency et al. [18], standard implementations of CRFs can only be used for offline analysis of bounded continuous streams, preventing their use in interactive systems. Song et al. [28] extended these models with online spotting within a sliding window, but without explicitly addressing coarticulation effects.
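To make the spotting task concrete, the following is a deliberately naive sliding-window sketch (our own illustration, with made-up function names and parameters, not the CRF-based models cited above): each window of the incoming stream is labeled with the nearest reference template.

```python
import numpy as np

def spot_gestures(stream, templates, window=20, hop=5):
    """Naive online spotting: slide a fixed window over the stream and label
    each window with the nearest template (Euclidean distance after a crude
    resampling). `templates` maps a label to a trajectory of shape (T, d)."""
    def resample(x, n):
        # nearest-sample resampling keeps the sketch short
        idx = np.floor(np.linspace(0, len(x) - 1, n)).astype(int)
        return x[idx]
    labels = []
    for start in range(0, len(stream) - window + 1, hop):
        win = stream[start:start + window]
        dists = {name: np.linalg.norm(resample(t, window) - win)
                 for name, t in templates.items()}
        labels.append(min(dists, key=dists.get))
    return labels
```

Such a scheme ignores coarticulation entirely: windows straddling a transition are forced into one of the two classes, which is precisely the limitation discussed in this section.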
Considering gesture-based interactive systems, we believe that the challenge is to go beyond continuous gesture recognition (i.e., spotting) by characterizing the gestures' execution, and in particular their coarticulation [5]. Importantly, physical movements are dynamic phenomena that encode features directly linked to expressivity. This is particularly true for coarticulatory movements, which are prone to unconscious, cognition-induced changes in dynamics (e.g. chunking) as well as to voluntary (conscious) continuous variations.
In our previous work, we proposed Bayesian state-space models able to infer in real-time which gesture is performed and characteristics of its execution. We proposed two main approaches for this problem: a template-based continuous state-space model [6], and a variant of hidden Markov models [10, 9]. The former is able to track modulations of recorded templates; the latter is able to learn statistically relevant gesture variances. As the two approaches offer complementary views of potential variability in movement execution, we inspect in the next section how these models can inform on the coarticulatory content of gesture sequences.
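A minimal sketch in the spirit of the template-based follower [6] may help fix ideas; the state variables, noise levels, and function names below are illustrative assumptions on our part, not the published model. Each particle carries a phase in [0, 1] and a relative speed along a recorded template, and is weighted by how well the template at its phase matches the incoming observation.

```python
import numpy as np

rng = np.random.default_rng(0)

def track_template(template, observations, n_particles=500, obs_sigma=0.05):
    """Particle-filter template following (simplified sketch).
    Returns the estimated phase (position within the template) per time step."""
    T = len(template)
    phase = rng.uniform(0.0, 0.1, n_particles)   # start near the beginning
    speed = rng.normal(1.0, 0.1, n_particles)    # relative playback speed
    phases = []
    for obs in observations:
        # propagate: advance each particle along the template, with noise
        speed += rng.normal(0.0, 0.02, n_particles)
        phase = np.clip(phase + speed / T + rng.normal(0.0, 0.005, n_particles),
                        0.0, 1.0)
        # weight: Gaussian likelihood of the observation given the template point
        pred = template[(phase * (T - 1)).astype(int)]
        w = np.exp(-np.sum((pred - obs) ** 2, axis=1) / (2 * obs_sigma ** 2))
        w = w / w.sum() if w.sum() > 0 else np.full(n_particles, 1.0 / n_particles)
        # estimate the phase, then resample to avoid degeneracy
        phases.append(float(np.sum(w * phase)))
        idx = rng.choice(n_particles, n_particles, p=w)
        phase, speed = phase[idx], speed[idx]
    return phases
```

The speed variable is what lets such a follower absorb timing modulations; the published model additionally estimates geometric variations.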
This section aims to illustrate a typical case of gesture coar-
ticulation and the associated challenges for real-time analysis
and continuous gesture recognition.
Movement Measurement
We recorded a set of executions of two gestures drawn on a trackpad, using Cycling'74 Max² with the external fingerpinger³ to measure the trajectories. We also used the MuBu⁴ library for Max [24] in order to record and save the captured gesture data.
We chose two gestures typically used in gesture-based interactive systems (see for example [31]). The first gesture is a "V" (Figure 1, a); the second is an "O" (Figure 1, b). These two gestures are then sequenced in two different ways: Gesture 1 – Stop – Gesture 2, and Gesture 1 – Gesture 2 (no pause between the two gesture executions). The four gestures are depicted in Figure 1. Each of these gestures is repeated 10 times.
The two ways of performing the transition between gesture 1 and gesture 2 illustrate different aspects of coarticulation. In the first case, the coarticulatory effect is minimised since the movement stops at the transition. In the second case, coarticulatory effects intervene since the movement is not allowed to stop, and fused boundaries between segments must be taken into account.
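The recorded data is not reproduced here; as a stand-in, the two shapes and the two sequencing conditions can be simulated with made-up coordinates (a sketch, not the actual trackpad recordings):

```python
import numpy as np

def gesture_v(n=50):
    """Two-dimensional 'V' trajectory: a down-stroke followed by an up-stroke."""
    half = n // 2
    down = np.stack([np.linspace(0.0, 0.5, half),
                     np.linspace(1.0, 0.0, half)], axis=1)
    up = np.stack([np.linspace(0.5, 1.0, n - half),
                   np.linspace(0.0, 1.0, n - half)], axis=1)
    return np.concatenate([down, up])

def gesture_o(n=50):
    """Two-dimensional 'O' trajectory: a closed circle."""
    t = np.linspace(0.0, 2.0 * np.pi, n)
    return np.stack([0.5 + 0.5 * np.cos(t), 0.5 + 0.5 * np.sin(t)], axis=1)

def sequence(pause=True, n=50, pause_len=20):
    """Concatenate V and O, optionally holding still in between ('Stop')."""
    v, o = gesture_v(n), gesture_o(n)
    if pause:
        hold = np.repeat(v[-1:], pause_len, axis=0)  # motionless at transition
        return np.concatenate([v, hold, o])
    return np.concatenate([v, o])  # coarticulated: no rest between segments
```

In real recordings the coarticulated condition would of course also smooth and deform the boundary region, which this idealized concatenation does not capture.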
Real-Time Inference of Coarticulated Gestures
We analyze the coarticulation between the two gestures through the two probabilistic models introduced above.
³ by Michael and Max Egger: research/multitouch-external-for-maxmsp/
Figure 1: The gestures used in the experiment and several instances of their sequences. The two basic gestures are two-
dimensional trajectories representing the symbols V (a) and O (b). Figures (c) and (d) respectively represent a sequence of
gestures 1 and 2, either with a pause between, or rapidly without break.
The first model is an adaptive template-based following system [6]. In this case we first learn, as the template, the whole sequence of gestures 1 and 2 without coarticulation, i.e. performed with a short stop between them (Figure 2a-left). Then, we observe how the adaptive model matches the coarticulated sequence (Figure 2a-middle). Figure 2a also shows the alignment computed between the two gesture segments (Figure 2a-middle and right). The adaptive following system can track the coarticulated figure, and the transition between the two gestures appears clearly in the alignment (Figure 2a-right).
Such an approach nevertheless requires recording, and thus 'learning', the complete pair of gestures, or at least their transition, similarly to speech recognition where diphones are considered.
Next, we consider how coarticulation can be modeled when a statistical model learns each gesture separately. We use a hierarchical hidden Markov model that encodes each gesture through a learning procedure [9]. Gestures 1 and 2 are learned from the 10 examples of their isolated performance, and a 10-state HMM is built for each gesture. The analysis is performed online on the same sequences. Figure 2b shows the results of the real-time continuous gesture recognition for the coarticulated sequences.
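As a rough sketch of the discrete-state approach, one can build a left-to-right Gaussian HMM per gesture and run the forward (filtering) recursion online; the construction and parameter choices below are our own simplification, not the hierarchical model of [9].

```python
import numpy as np

def make_lr_hmm(template, n_states=10):
    """Left-to-right HMM sketch: Gaussian emissions centered on successive
    segments of a reference trajectory; each state stays or moves right."""
    segs = np.array_split(template, n_states)
    means = np.array([s.mean(axis=0) for s in segs])
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        A[i, i] = 0.5
        A[i, min(i + 1, n_states - 1)] += 0.5
    return means, A

def forward_step(alpha, obs, means, A, sigma=0.1):
    """One online filtering step: predict with A, update with Gaussian
    likelihoods. Returns the state posterior and the log-evidence increment."""
    lik = np.exp(-np.sum((means - obs) ** 2, axis=1) / (2 * sigma ** 2))
    alpha = (alpha @ A) * lik
    s = alpha.sum()
    if s == 0:  # underflow guard for observations far from all states
        return np.full(len(alpha), 1.0 / len(alpha)), -np.inf
    return alpha / s, np.log(s)

def progress(alpha):
    """Expected normalized state index, read as 'time progression' in [0, 1]."""
    n = len(alpha)
    return float(np.dot(alpha, np.arange(n)) / (n - 1))
```

Running both per-gesture models in parallel and comparing cumulative log-evidence gives a crude continuous recognizer, with `progress` playing the role of the decoded time progression plotted in Figure 2b.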
The HMM approach is a discrete-state model, which allows for a sharp transition between the two gestures. Nevertheless, we observed that the transition cannot always be defined as a unique point, as illustrated in Figure 2b-right.
Precisely, in the case of the HMM-like approach, the model is able to consistently follow the first gesture (the time progression evolves continuously from 0 to 1). However, the transition between the two gestures results in short-term recognition errors. This problem is typical of real-time continuous gesture recognition, where the ambiguities of coarticulation result in 'jumps' of the recognition until one gesture is resolved after the transition is complete.
The example presented in this paper illustrates that both models can account for coarticulation in a first approximation. Importantly, boundaries are fuzzy, and an interpolation (or extrapolation) strategy is needed. Usually, constraints on transitions are imposed on the user in order to ensure better recognition results, as in the case where a pause is respected between the two gestures. Nevertheless, we advocate improving movement modeling to better take coarticulation into account. In the first model, this would mean interpolating between two templates by allowing cross-gestural dependencies. Methods proposed in computer animation (e.g. Gibet et al. [11]) could be interesting to evaluate in this context. In the second model, cross-gestural dependencies could also be envisaged, but this would require examples of such dependencies in order to capture their structure.
It can indeed be argued that a large database containing several ways of performing coarticulated gestures would improve recognition even in the presence of coarticulation. However, our point here is different: coarticulation intrinsically contains important information that we should be able to characterize per se in our computational models, and exploit in the interaction. In particular, coarticulation can inform on the degree of expertise in skilled movements, as usually found in music and dance. Moreover, it would be beneficial to relate computational models to the notion of chunking [8]. Coarticulation typically occurs within a chunk, and segmentation marks could be detected between chunks.
It is important to note that the 2D example we provide is only representative of the research questions we propose. The use of 3D trajectories and other movement modalities such as acceleration data would pose additional issues. Generalising the methodology to 3D movement remains an important goal of this research.
As a perspective, we propose to take into account the following points for computational movement models:
Movement anticipation implies taking into account the influence of preceding segments when describing forthcoming ones.
(a) Tracking test with the template-based method using particle filtering. The blue figure is the template performed with a pause (V – Stop – O) and the red figure shows the performed coarticulated gesture (no pause between V and O). The central figure shows the spatial alignment between the two figures, and the right figure shows the temporal alignment, where the transition is clearly visible.
(b) Recognition test with the hierarchical HMM. The left figure shows the coarticulated gestures (no pause between V and O). The color corresponds to the recognition results: blue for V and red for O. The right figure shows the recognition over time: the blue curve shows the time progression (as decoded by the model) during the V, followed by the time progression during the O. In both figures, the transition is ambiguous.
Figure 2: Results of (a) gesture tracking using the template-based approach and (b) continuous gesture recognition using a hierarchical hidden Markov model.
Low-level segments are concatenated into longer phrases through practice and learning, which produces movement variations over time that alter the initial shapes of the segments and contain expressive features. Thus, movement features and vocabularies must be considered as evolving over time and prone to user idiosyncrasies.
As fused boundaries between segments might appear through practice, segmentation should be adaptive. Hybrid segmentation-interpolation approaches should be considered.
One approach is to consider a fully Bayesian movement representation that would take into account cross-gesture uncertainty, various time scales, and time courses of anticipation.
Other approaches are however possible, and we hope that this 'perspective' paper can contribute to triggering important discussions on this topic.
ACKNOWLEDGMENTS
This work was supported by the Marie Skłodowska-Curie Action of the European Union (H2020-MSCA-IF-2014, IF-GF, grant agreement no. 659232), the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d'Avenir programme under reference ANR-11-IDEX-0004-02, and by the Moving Stories research partnership (SSHRC Award 31639884).
REFERENCES
1. Bhuyan, M., Kumar, D. A., MacDorman, K. F., and Iwahori, Y. A novel set of features for continuous hand gesture recognition. Journal on Multimodal User Interfaces 8, 4 (2014), 333–343.
2. Bianco, T., Freour, V., Rasamimanana, N., Bevilacqua, F., and Caussé, R. On gestural variation and coarticulation effects in sound control. In Lecture Notes in Computer Science, vol. 5934. Springer, 2009, 134–145.
3. Bizzi, E., Mussa-Ivaldi, F. A., and Giszter, S.
Computations underlying the execution of movement: a
biological perspective. Science 253, 5017 (1991),
4. Bush, B. O. Modeling coarticulation in continuous
speech. PhD thesis, Oregon Health and Science
University, 2015.
5. Caramiaux, B., Bevilacqua, F., and Tanaka, A. Beyond
recognition: using gesture variation for continuous
interaction. In CHI’13 Extended Abstracts on Human
Factors in Computing Systems, ACM (2013),
6. Caramiaux, B., Montecchio, N., Tanaka, A., and
Bevilacqua, F. Adaptive gesture recognition with
variation estimation for interactive systems. ACM
Transactions on Interactive Intelligent Systems (TiiS) 4,
4 (2015), 18.
7. Caramiaux, B., Wanderley, M. M., and Bevilacqua, F.
Segmenting and parsing instrumentalists’ gestures.
Journal of New Music Research 41, 1 (2012), 13–29.
8. Diedrichsen, J., and Kornysheva, K. Motor skill learning
between selection and execution. Trends in Cognitive
Sciences 19, 4 (2015), 227–233.
9. Françoise, J., Roby-Brami, A., Riboud, N., and
Bevilacqua, F. Movement sequence analysis using
hidden markov models: A case study in tai chi
performance. In Proceedings of the 2nd International
Workshop on Movement and Computing (MOCO’15),
ACM (2015), 29–36.
10. Françoise, J., Schnell, N., Borghesi, R., and Bevilacqua,
F. Probabilistic models for designing motion and sound
relationships. In Proceedings of the 2014 International
Conference on New Interfaces for Musical Expression
(2014), 287–292.
11. Gibet, S., Lebourque, T., and Marteau, P.-F. High-level
specification and animation of communicative gestures.
Journal of Visual Languages & Computing 12, 6 (2001),
12. Godøy, R. I. Understanding coarticulation in musical
experience. In Sound, Music, and Motion. Springer,
2013, 535–547.
13. Godøy, R. I., Jensenius, A., and Nymoen, K. Chunking
in music by coarticulation. Acta Acustica united with
Acustica 96, 4 (2010), 690–700.
14. Hardcastle, W. J., and Hewlett, N. Coarticulation:
Theory, data and techniques. Cambridge University
Press, 2006.
15. Iskarous, K., Mooshammer, C., Hoole, P., Recasens, D.,
Shadle, C. H., Saltzman, E., and Whalen, D. The
coarticulation/invariance scale: Mutual information as a
measure of coarticulation resistance, motor synergy, and
articulatory invariance. The Journal of the Acoustical
Society of America 134, 2 (2013), 1271–1282.
16. Kirsh, D. Embodied cognition and the magical future of
interaction design. ACM Transactions on
Computer-Human Interaction (TOCHI) 20, 1 (2013), 3.
17. Li, S., Wang, L., and Kong, D. Synthesis of sign
language co-articulation based on key frames.
Multimedia Tools and Applications 74, 6 (2015),
18. Morency, L.-P., Quattoni, A., and Darrell, T.
Latent-dynamic discriminative models for continuous
gesture recognition. In Computer Vision and Pattern
Recognition, 2007. CVPR’07. IEEE Conference on,
IEEE (2007), 1–8.
19. Murphy, K. P. Machine learning: a probabilistic
perspective. MIT press, 2012.
20. Palmer, C., and Deutsch, D. Music performance:
Movement and coordination. In The psychology of music
(Third Edition), D. Deutsch, Ed. Elsevier Press
Amsterdam, 2013, 405–422.
21. Pelachaud, C. Communication and coarticulation in
facial animation. PhD thesis, University of
Pennsylvania, 1991.
22. Rasamimanana, N. H., and Bevilacqua, F. Effort-based
analysis of bowing movements: evidence of anticipation
effects. The Journal of New Music Research 37, 4
(2009), 339 – 351.
23. Säfström, D., Flanagan, J. R., and Johansson, R. S. Skill
learning involves optimizing the linking of action
phases. Journal of neurophysiology 110, 6 (2013),
24. Schnell, N., Röbel, A., Schwarz, D., Peeters, G., and
Borghesi, R. MuBu & Friends - Assembling Tools for
Content Based Real-Time Interactive Audio Processing
in Max/MSP. In Proceedings of the International
Computer Music Conference (ICMC) (2009).
25. Schultz, T., and Wand, M. Modeling coarticulation in
emg-based continuous speech recognition. Speech
Communication 52, 4 (2010), 341–353.
26. Segouat, J. A study of sign language coarticulation.
SIGACCESS Access. Comput., 93 (Jan. 2009), 31–38.
27. Shah, A., Barto, A. G., and Fagg, A. H. A dual process
account of coarticulation in motor skill acquisition.
Journal of motor behavior 45, 6 (2013), 531–549.
28. Song, Y., Demirdjian, D., and Davis, R. Continuous
body and hand gesture recognition for natural
human-computer interaction. ACM Transactions on
Interactive Intelligent Systems (TiiS) 2, 1 (2012), 5.
29. Sosnik, R., Hauptmann, B., Karni, A., and Flash, T.
When practice leads to co-articulation: the evolution of
geometrically defined movement primitives.
Experimental Brain Research 156, 4 (2004), 422–438.
30. Viswanathan, N., Magnuson, J. S., and Fowler, C. A.
Information for coarticulation: Static signal properties
or formant dynamics? Journal of Experimental
Psychology: Human Perception and Performance 40, 3
(2014), 1228.
31. Wobbrock, J., Wilson, A., and Li, Y. Gestures without
libraries, toolkits or training: a $1 recognizer for user
interface prototypes. In Proceedings of the 20th annual
ACM symposium on User interface software and
technology, ACM (2007), 159–168.
32. Yang, R., and Sarkar, S. Detecting coarticulation in sign
language using conditional random fields. In Pattern
Recognition, 2006. ICPR 2006. 18th International
Conference on, vol. 2, IEEE (2006), 108–112.
... This contextual smearing of events due to coarticulation is in a sense what makes music (and spoken language) sound "natural, " and recuperating the "original" events behind the smeared manifestations of coarticulation can be very challenging. We have seen some interesting work in this direction (Bevilacqua et al., 2016), and furthermore, this recuperation could also resemble recovering underlying intermittency components from emergent continuous sound and motion. ...
... We focus on complex gesture sequences similar to the ones applied in movement-based music systems. Typically, such gestures cannot simply be reduced as a mere sequence of units due to gestural co-articulation [6], i.e. boundaries of each units tend to blur. Therefore, it is necessary to consider motor variability over the whole sequence, occurring over several learning sessions. ...
Full-text available
With the increasing interest in movement sonification and expressive gesture-based interaction, it is important to understand which factors contribute to movement learning and how. We explore the effects of movement sonification and users’ musical background on motor variability in complex gesture learning. We contribute an empirical study in which musicians and non-musicians learn two gesture sequences over three days, with and without movement sonification. Results show the interlaced interaction effects of these factors and how they unfold in the three-day learning process. For gesture 1, which is fast and dynamic with a direct “action-sound” sonification, movement sonification induces higher variability for both musicians and non-musicians on day 1. While musicians reduce this variability to a similar level as no auditory feedback condition on day 2 and day 3, non-musicians remain to have significantly higher variability. Across three days, musicians also have significantly lower variability than non-musicians. For gesture 2, which is slow and smooth with an “action-music” metaphor, there are virtually no effects. Based on these findings, we recommend future studies to take into account participants’ musical background, consider longitudinal study to examine these effects on complex gestures, and use awareness when interpreting the results given a specific design of gesture and sound.
... This contextual smearing of events due to coarticulation is in a sense what makes music (and spoken language) sound "natural, " and recuperating the "original" events behind the smeared manifestations of coarticulation can be very challenging. We have seen some interesting work in this direction (Bevilacqua et al., 2016), and furthermore, this recuperation could also resemble recovering underlying intermittency components from emergent continuous sound and motion. ...
Full-text available
The aim of this paper is to present principles of constraint-based sound-motion objects in music performance. Sound-motion objects are multimodal fragments of combined sound and sound-producing body motion, usually in the duration range of just a few seconds, and conceived, produced, and perceived as intrinsically coherent units. Sound-motion objects have a privileged role as building blocks in music because of their duration, coherence, and salient features and emerge from combined instrumental, biomechanical, and motor control constraints at work in performance. Exploring these constraints and the crucial role of the sound-motion objects can enhance our understanding of generative processes in music and have practical applications in performance, improvisation, and composition.
... As such, the processes of assembling and of navigating the diagram generate their value through the relationships they reveal and their inherent potential for experiencing new configurations: "The diagrammatic or abstract machine does not function to represent, even something real, but rather constructs a real that is yet to come, a new type of reality. " [20,142] Model As a starting point, it is interesting to note that research on movement-and-computing is carried out about existing phenomena but also from artificial models that are derived from them, but that do not necessarily represent them specifically [9]. The artificial models in question are those that attempt to structure movement analysis in a specific manner, for example by formalising Laban's Effort categories into algorithms for computing them [58], or that simulate movements based on avatars (body-models) [37] or other physical models [8]. ...
Conference Paper
This article investigates fundamental questions and methodological issues concerning research on movement and computing. Through a process of mapping of the various approaches and phases of research in this domain, it attempts to construct a coherent picture and overview of the research field. A series of questions arise that are discussed with the intent of anchoring and directing future research across different disciplines. In order to better apprehend the complexity of movement, gesture, action, and physical performance, and their role as topic of scientific, scholarly as well artistic research practices, an extension of the disciplinary and methodological framework is proposed. The juxtaposition of the diverse approaches and goals, and the extension of the research can indicate novel axes for generating techniques, methods, and ultimately knowledge. Based on this insight, a reflection on the potential of a wider cross-mediating research practice concludes this article.
... Considering DBN as a tool to model dependences between dynamic variables, a first challenge would be to formalize the problem of co-articulation as a DBN, where coarticulation is defined as the fusion of small-scale events into phrase-level segments (Godøy et al., 2010). This first challenge would require the careful definition of the relevant variables and their dependences, as well as the need for a dataset embedding co-articulatory elements (Bevilacqua et al., 2016). ...
Conference Paper
Full-text available
Movement sequences are essential to dance and expressive movement practice; yet, they remain underexplored in movement and computing research, where the focus on short gestures prevails. We propose a method for movement sequence analysis based on motion trajectory synthesis with Hidden Markov Models. The method uses Hidden Markov Regression for jointly synthesizing motion feature trajectories and their associated variances, that serves as basis for investigating performers' consistency across executions of a movement sequence. We illustrate the method with a use-case in Tai Chi performance, and we further extend the approach to cross-modal analysis of vocalized movements.
Full-text available
Music performance represents a complex example of auditory scene analysis, temporal expectancies, attention, and auditory memory processes. This chapter discusses two novel research developments in music performance: (1) body movement and its relation to sounded performance and (2) sensorimotor processes in ensemble performance (measurements of two or more performers). Motion capture measurements of performers’ body movements have documented constraints shaped by three primary factors: sensorimotor integration processes, biomechanical properties, and the performers’ expressive intentions. Recent measurements of ensemble performance, compared with solo performance, document the influence of sensory feedback from oneself versus other performers and constraints that arise from interactions among ensemble members. Important individual differences are reviewed in how musicians adapt to their partners in ensemble settings. In summary, developments in measurement techniques and time series analyses have yielded intriguing insights in the online use of multiple sensory systems during music performance, which are reviewed in this chapter.
Learning motor skills evolves from the effortful selection of single movement elements to their combined fast and accurate production. We review recent trends in the study of skill learning which suggest a hierarchical organization of the representations that underlie such expert performance, with premotor areas encoding short sequential movement elements (chunks) or particular component features (timing/spatial organization). This hierarchical representation allows the system to utilize elements of well-learned skills in a flexible manner. One neural correlate of skill development is the emergence of specialized neural circuits that can produce the required elements in a stable and invariant fashion. We discuss the challenges in detecting these changes with fMRI.
Applications requiring the natural use of the human hand as a human–computer interface motivate research on continuous hand gesture recognition. Gesture recognition depends on gesture segmentation to locate the starting and end points of meaningful gestures while ignoring unintentional movements. Unfortunately, gesture segmentation remains a formidable challenge because of unconstrained spatiotemporal variations in gestures and the coarticulation and movement epenthesis of successive gestures. Furthermore, errors in hand image segmentation cause the estimated hand motion trajectory to deviate from the actual one. This research moves toward addressing these problems. Our approach entails using gesture spotting to distinguish meaningful gestures from unintentional movements. To avoid the effects of variations in a gesture’s motion chain code (MCC), we propose instead to use a novel set of features: the (a) orientation and (b) length of an ellipse least-squares fitted to motion-trajectory points and (c) the position of the hand. The features are designed to support classification using conditional random fields. To evaluate the performance of the system, 10 participants signed 10 gestures several times each, providing a total of 75 instances per gesture. To train the system, 50 instances of each gesture served as training data and 25 as testing data. For isolated gestures, the recognition rate using the MCC as a feature vector was only 69.6 % but rose to 96.0 % using the proposed features, a 26.1 % improvement. For continuous gestures, the recognition rate for the proposed features was 88.9 %. These results show the efficacy of the proposed method.
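The ellipse features above can be approximated with a principal-component fit of a trajectory window; the sketch below uses this as a stand-in for the paper's least-squares ellipse (the exact fitting procedure is not specified here), applied to a synthetic diagonal trajectory:

```python
import numpy as np

# Sketch of extracting the two ellipse features from a window of 2-D
# motion-trajectory points. A principal-component fit is used here as
# a stand-in for a least-squares ellipse: the major-axis angle gives
# the orientation and the principal spread gives the length.

def ellipse_features(points):
    """points: (N, 2) array of (x, y) trajectory samples."""
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues ascending
    major = eigvecs[:, -1]                   # principal axis
    # an axis has no direction: fold the angle into [0, pi)
    orientation = np.arctan2(major[1], major[0]) % np.pi
    length = 2.0 * np.sqrt(eigvals[-1])      # ~ major-axis scale
    return orientation, length

# synthetic trajectory along the 45-degree diagonal with slight wobble
t = np.linspace(0, 1, 50)
pts = np.stack([t, t + 0.01 * np.sin(20 * t)], axis=1)
theta, length = ellipse_features(pts)
```

Unlike a motion chain code, these features are invariant to small point-wise jitter in the trajectory, which is consistent with the robustness argument made in the abstract.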
This article presents a gesture recognition/adaptation system for human-computer interaction applications that goes beyond activity classification and that, as a complement to gesture labeling, characterizes the movement execution. We describe a template-based recognition method that simultaneously aligns the input gesture to the templates using a Sequential Monte Carlo inference technique. Contrary to standard template-based methods based on dynamic programming, such as Dynamic Time Warping, the algorithm has an adaptation process that tracks gesture variation in real time. The method continuously updates, during execution of the gesture, the estimated parameters and recognition results, which offers key advantages for continuous human-machine interaction. The technique is evaluated in several different ways: Recognition and early recognition are evaluated on 2D onscreen pen gestures; adaptation is assessed on synthetic data; and both early recognition and adaptation are evaluated in a user study involving 3D free-space gestures. The method is robust to noise, and successfully adapts to parameter variation. Moreover, it performs recognition as well as or better than nonadapting offline template-based methods.
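The following toy sketch illustrates the particle-filter flavor of such template alignment: each particle carries a phase (position along a 1-D template) and a speed, and is reweighted against incoming samples. All parameter values and the resampling scheme are illustrative, not the published algorithm:

```python
import numpy as np

# Toy Sequential Monte Carlo alignment against a gesture template:
# particles diffuse in (phase, speed) space and are resampled by how
# well the template value at their phase explains the live sample.

rng = np.random.default_rng(1)
template = np.sin(np.linspace(0, np.pi, 100))   # reference 1-D gesture
N = 200                                         # number of particles

def run(observations, phase, speed):
    estimates = []
    for obs in observations:
        # propagate with small diffusion on speed and phase
        speed = speed + rng.normal(0, 1e-4, N)
        phase = np.clip(phase + speed + rng.normal(0, 1e-3, N), 0, 1)
        # weight particles by observation likelihood
        idx = (phase * (len(template) - 1)).astype(int)
        w = np.exp(-0.5 * ((obs - template[idx]) / 0.05) ** 2)
        w /= w.sum()
        # multinomial resampling
        keep = rng.choice(N, size=N, p=w)
        phase, speed = phase[keep], speed[keep]
        estimates.append(phase.mean())
    return estimates

# feed the first quarter of the template back as live input
estimates = run(template[:25], np.zeros(N), np.full(N, 1.0 / 100))
```

Because the posterior over speed is carried along with the phase, a performer executing the gesture faster or slower than the template is tracked without re-running a global alignment, which is the real-time advantage over Dynamic Time Warping noted in the abstract.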
This article reports on developments conducted in the framework of the SampleOrchestrator project. We assembled for this project a set of tools allowing for the interactive real-time synthesis of automatically analysed and annotated audio files. Rather than a specific technique, we present a set of components that support a variety of different interactive real-time audio processing approaches, such as beat-shuffling, sound morphing, and audio musaicing. We focus in particular on the design of the central element of these developments: an optimised data structure for the consistent storage of audio samples, descriptions, and annotations, closely linked to the SDIF file standard.
Conference Paper
The term coarticulation designates the fusion of small-scale events, such as single sounds and single sound-producing actions, into larger units of combined sound and body motion, resulting in qualitatively new features at what we call the chunk timescale in music, typically in the 0.5–5 s duration range. Coarticulation has been extensively studied in linguistics, and to a certain extent in other domains of human body motion as well as in robotics, but so far not much in music; the main aim of this paper is therefore to provide a background for how we can explore coarticulation in both the production and perception of music. The contention is that coarticulation in music should be understood as based on a number of physical, biomechanical and cognitive constraints, and that coarticulation is an essential factor in the shaping of several perceptually salient features of music.
Modeling coarticulation in speech has been largely limited to short sequences and/or limited phonetic context. We introduce a methodology for modeling both formant frequency and bandwidth in continuous speech, allowing examination of sentence-level coarticulation. The model represents continuous trajectories as a combination of overlapping local trajectories, which are represented by a weighted addition of acoustic event targets by sigmoidal coarticulation functions characterized by slope and position. Estimation is achieved using a combination of hill-climbing and grid-search, with global target, joint slope for identical contexts, and local position parameters. We evaluate model performance for two speakers using an intelligibility test that compares vocoded model output to a purely vocoded and a natural condition.
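A hedged numerical sketch of this trajectory model: a formant track is formed as a normalized, sigmoid-weighted addition of per-event targets, with each sigmoid parameterized by a position and a slope. The targets and parameter values below are invented for illustration:

```python
import numpy as np

# Overlapping local trajectories: each acoustic event contributes a
# target value, activated by a sigmoidal coarticulation function with
# its own position (in time) and slope; the track is the normalized
# weighted addition of the targets.

def sigmoid(t, position, slope):
    return 1.0 / (1.0 + np.exp(-slope * (t - position)))

targets = np.array([700.0, 1200.0, 500.0])   # Hz, per acoustic event
positions = np.array([0.1, 0.4, 0.7])        # s, event onsets
slopes = np.array([40.0, 40.0, 40.0])        # transition sharpness

def formant_track(t):
    # activation of event i = its onset sigmoid minus the next onset
    on = np.stack([sigmoid(t, p, s) for p, s in zip(positions, slopes)])
    act = on.copy()
    act[:-1] -= on[1:]
    act = np.clip(act, 0.0, None)
    act /= act.sum(axis=0)                   # normalized weights
    return (targets[:, None] * act).sum(axis=0)

t = np.linspace(0, 1, 200)
f1 = formant_track(t)
```

Lowering a slope widens the region where two targets are simultaneously active, which is precisely how the model expresses stronger coarticulation between adjacent events.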
Co-articulation is a language phenomenon. In sign language (SL), it takes the form of interaction among adjacent signs, which results in variations of signs from their standard configurations. The standard configuration is the appearance of a sign when it occurs in isolation, without context. Without co-articulation, virtual-character-based SL animation is a simple concatenation of signs: the character's movement is mechanical, lacking fluency and realism. This paper presents a key-frame-based SL co-articulation animation scheme targeting the three most important elements of co-articulation: hand shape, hand position and SL speed. To generate co-articulation, the motion data of sequentially occurring signs is parsed to identify the hand shapes and positions they contain. Co-articulation is then achieved by modifying the motion data according to the interaction between adjacent hand shapes and adjacent hand positions. SL speed acts as an adjusting parameter that dynamically influences co-articulation: different expression speeds lead to different degrees of co-articulation.
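One possible reading of the speed-dependent blending, sketched in Python (the paper's actual key-frame modification rules are not reproduced here; the window size and blend strength are invented): the tail of the outgoing sign's hand-position track is pulled toward the incoming sign, over a window that widens with signing speed:

```python
import numpy as np

# Toy speed-dependent co-articulation between two signs: the end of
# sign A anticipates sign B's first pose, and the start of sign B
# carries over sign A's last pose; faster signing widens the blend
# window, i.e. produces a stronger degree of co-articulation.

def coarticulate(sign_a, sign_b, speed):
    """sign_a, sign_b: (T, 3) hand-position key frames; speed > 0."""
    n = int(round(min(len(sign_a), len(sign_b)) * 0.2 * speed))
    a, b = sign_a.astype(float), sign_b.astype(float)
    if n > 0:
        w = np.linspace(0.0, 0.5, n)[:, None]   # blend strength ramp
        a[-n:] = (1 - w) * a[-n:] + w * sign_b[0]          # anticipation
        b[:n] = (1 - w[::-1]) * b[:n] + w[::-1] * sign_a[-1]  # carry-over
    return a, b

# two static signs: hand at the origin, then hand one unit to the right
sign_a = np.tile([0.0, 0.0, 0.0], (20, 1))
sign_b = np.tile([1.0, 0.0, 0.0], (20, 1))
a_fast, b_fast = coarticulate(sign_a, sign_b, speed=1.5)
a_slow, b_slow = coarticulate(sign_a, sign_b, speed=0.5)
```

The fast rendering deviates from the standard configurations over more frames than the slow one, mirroring the abstract's claim that expression speed dynamically adjusts the degree of co-articulation.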