Designing Gestures for Continuous Sonic Interaction

Atau Tanaka, Balandino Di Donato, and
Embodied Audiovisual Interaction Group, Goldsmiths, University of London, SE14 6NW, London, UK

Music and Technology Department, HKU University of the Arts, 3582 VB, Utrecht, Netherlands
ABSTRACT
We present a system that allows users to try different ways
to train neural networks and temporal modelling to asso-
ciate gestures with time-varying sound. We created a soft-
ware framework for this and evaluated it in a workshop-
based study. We build upon research in sound tracing
and mapping-by-demonstration to ask participants to de-
sign gestures for performing time-varying sounds using a
multimodal, inertial measurement (IMU) and muscle sens-
ing (EMG) device. We presented the user with two classical
techniques from the literature, Static Position regression
and Hidden Markov based temporal modelling, and pro-
pose a new technique for capturing gesture anchor points
on the ﬂy as training data for neural network based regres-
sion, called Windowed Regression. Our results show trade-
oﬀs between accurate, predictable reproduction of source
sounds and exploration of the gesture-sound space. Several
users were attracted to our windowed regression technique.
This paper will be of interest to musicians engaged in going
from sound design to gesture design and oﬀers a workﬂow
for interactive machine learning.
Author Keywords
Sonic Interaction Design, Interactive Machine Learning,

CCS Concepts
• Human-centered computing → Empirical studies in interaction design; • Applied computing → Sound and music computing;
1. INTRODUCTION
Designing gestures for the articulation of dynamic sound
synthesis is a key part of the preparation of a performance
with a DMI. Traditionally this takes place through a care-
ful and manual process of mapping. Strategies for mapping,
including “one-to-many” and “many-to-one”  are funda-
mental techniques in NIME. The ﬁeld of embodied music
cognition looks at the relationship between corporeal action
and music . The notion of sonic aﬀordances draws upon
the notion of aﬀordance from environmental psychology 
to look at how a sound may invite action .
Licensed under a Creative Commons Attribution
4.0 International License (CC BY 4.0). Copyright
remains with the author(s).
NIME’19, June 3-6, 2019, Federal University of Rio Grande do Sul,
Porto Alegre, Brazil.
Sound tracing is an exercise where a sound is given as
a stimulus to study evoked gestural response . Sound
tracing has been used as a starting point for techniques of
“mapping-by-demonstration” . While these studies look
at the articulation of gesture in response to sounds, they
focus on evoked gesture. In the ﬁeld of sonic interaction de-
sign, embodied interaction has been used to design sounds.
This includes techniques applying interactive technologies to traditions of Foley or vocalisation that invoke the body in the design of sounds.
The synthesis of time-varying sounds and the exploration
of timbral spaces is a practice at the heart of computer mu-
sic research. Wessel’s seminal work in the ﬁeld deﬁnes tim-
bre space in a Cartesian plane . Momeni has proposed
interactive techniques for exploring timbre spaces .
Neural networks can be trained for regression tasks by
providing examples of inputs associated with desired out-
puts. In systems for interactive machine learning, like Wek-
inator , this is implemented by associating positions in 3D
space to synthesised sound output. Once a model is trained,
the user performs by moving between (and beyond) the ex-
ample positions to create dynamic sound by gestures. While
performance is dynamic, the training is based on poses as-
sociated with sound synthesis parameters that are ﬁxed for
each input example. Here we call this approach "static regression".
Time-varying gestures can be modelled by probabilistic
approaches, such as Hidden Markov Models. In perfor-
mance, live input is compared to transition states of the
model, allowing the algorithm to track where in the exam-
ple gesture the input is. This approach is commonly referred
to as temporal modelling.
We present a system for designing gestures to perform
time-varying synthesised sound. It extends the notion of
mapping-by-demonstration in a practical setting by en-
abling users to capture gesture while listening to sound, and
then to train diﬀerent machine learning models. It asso-
ciates the authoring of gesture to interactive sound synthe-
sis and in so doing, explores the connection between sound
design and gesture design. The technique uses commonly
available tools for musical performance and machine learn-
ing and assumes no specialist knowledge of machine learn-
ing. It will be useful for artists wishing to create gestures
for interactive music performances in which gestural input
articulates dynamic synthesised sound where the associa-
tion of gesture and sound is not made by direct mapping,
but mediated by machine learning.
We propose an automated technique for training a neu-
ral network with a windowed set of anchor points captured
on the ﬂy from a dynamic gesture made in response to a
sound tracing stimulus. We call this technique Windowed
Regression and evaluate it alongside static regression and
temporal modelling to gain insight into its usefulness in a
gesture design task.
This paper is organised as follows. In the next section,
we survey related work in the area of machine learning of
musical gesture. In Section 3, we present the architecture of
our system, its techniques of sound design, machine learning
and the proposed workﬂow. Section 4 presents a workshop-
based evaluation. This is followed by a discussion to gather
insight from user experiences.
2. RELATED WORK
Fiebrink established an interactive machine learning (IML)
workﬂow for musicians carrying out classiﬁcation and re-
gression tasks with gestural input driving sound synthesis
output where users are able to edit, delete, and add to train-
ing datasets interactively . In a typical workﬂow with
Wekinator, a regression task would be trained by static pos-
tures. Scurto  proposes a method of extracting examples
from dynamic performances in response to sonic stimuli.
Caramiaux  uses Canonical Correlation Analysis to
study evoked gestures in response to sound stimuli and ex-
plores the diﬀerent movement-sound relationships evoked
by "causal" and "non-causal" sounds. In the latter, users
trace the sound’s frequency/amplitude morphology.
Nymoen  conducted a large scale sound tracing study
relating gesture features (position, velocity, acceleration) to
sound features such as loudness, brightness and pitch, and
found a direct relationship between spectral centroid and
vertical motion. When the movement of pitch was opposite
to the motion of the spectral centroid, participants were
more likely to move their hands following the pitch. When
listening to noisy sounds, participants performed gestures
that were characterised by a higher acceleration.
Françoise studied different probabilistic models in
mapping-by-demonstration. He uses two kinds of mod-
elling, Gaussian Mixture Models (GMM), and Hierarchi-
cal Hidden Markov Models (HHMM) and uses each in two
diﬀerent ways: 1.) to model gesture itself (single mode),
and 2.) to model gesture along with the associated sound
(multimodal). GMMs provide a probabilistic classiﬁcation
of gesture or regression based on a gesture-sound relation-
ship, while HMM-based approaches create temporal mod-
els either of the gesture by itself or of the gesture-sound
association. We adopt his HHMM approach as one of the
algorithms used in our proposed system.
There are an increasing number of machine learning software packages for interactive music applications. While these tools expose machine learning technologies to artists, they still require configuration and integration into a music composition or performance system.
One part of our proposed system is a scriptable interface
where the user can assign gesture features to feed Wek-
inator, and select synthesis parameters to be controlled
by Wekinator’s output. We provide a generic Wekinator
project that runs in the background and is controlled by our scripting system.
3. THE SYSTEM
We developed our system using Cycling’74 Max, Fiebrink’s
Wekinator for neural network regression, and the HHMM
object from IRCAM’s MuBu library for temporal modelling.
Our system is modular, composed of three (3) blocks:
1. A scriptable sensor input and gesture feature extraction module
2. A scriptable synthesiser controller with breakpoint en-
velopes to dynamically send selected parameters to
the machine learning module
3. A machine learning training module to capture gesture
training sets and remotely control Wekinator
3.1.1 Sensor input & feature extraction
For this study, we capture gesture using a Thalmic Labs
Myo, using its electromyogram (EMG) muscle sensing and
inertial measurement unit (IMU) gross movement and ori-
entation sensing. To extract orientation from the IMU, we
capture Euler Angles (x, y, z) of the forearm. We calculate
the ﬁrst order diﬀerences (xd, yd, zd) of these angles, which
are correlated with direction and speed of displacement, and
augment our regression feature vector with historical data.
We detect gesture power  by tracking muscle exertion,
following the amplitude envelope of four (of the Myo’s 8)
EMG channels with a Bayesian ﬁlter .
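The feature vector described above can be sketched numerically. The sketch below is illustrative Python (the actual system runs in Max); the function names are ours, and a simple one-pole envelope follower stands in for the Bayesian filter applied to the EMG channels.

```python
# Illustrative sketch of the gesture feature vector: IMU Euler angles,
# their first-order differences (direction and speed of displacement),
# and amplitude envelopes of four EMG channels.

def first_order_diff(curr, prev):
    """First-order differences (xd, yd, zd) of the Euler angles."""
    return tuple(c - p for c, p in zip(curr, prev))

def emg_envelope(prev_env, sample, alpha=0.9):
    """One-pole amplitude envelope follower per EMG channel
    (a simplified stand-in for the Bayesian filter)."""
    return alpha * prev_env + (1.0 - alpha) * abs(sample)

def feature_vector(euler, prev_euler, emg_envs):
    """Concatenate angles, their differences, and the 4 EMG envelopes."""
    return list(euler) + list(first_order_diff(euler, prev_euler)) + list(emg_envs)

fv = feature_vector((10.0, 20.0, 30.0), (9.0, 18.0, 27.0), [0.1, 0.2, 0.3, 0.4])
```

Any subset of such features can then be scripted as inputs to Wekinator, as described below.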
The sendrcv scripting system we propose allows the user
to select any number of features to be sent to Wekinator
as inputs. In this way, the proposed system is not speciﬁc
to the Myo and can be used with other sensors and input
feature extraction algorithms.
3.1.2 Synthesizer playback
We used a general purpose software synthesizer, SCP by
Manuel Poletti. This synthesizer is controlled by our break-
point envelope-based playback system. We chose to design
sounds that transition between four ﬁxed anchor points
(start, two intermediate points, and end) that represent
ﬁxed synthesis parameters. The envelope interpolates be-
tween these ﬁxed points. The temporal evolution of sound
is captured as diﬀerent states in the breakpoint editor whose
envelopes run during playback, feeding both synthesizer and
Wekinator. Any of the parameters can be assigned to break-
point envelopes to be controlled during playback.
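The interpolation performed by the breakpoint playback can be illustrated as follows. This is a hedged sketch in Python, not the Max implementation: one synthesis parameter holds fixed values at the four anchor points, and playback interpolates linearly between them over the sound's duration.

```python
# Linear interpolation of one synthesis parameter across four
# equally spaced anchor points (start, two intermediates, end).

def envelope_value(anchors, t, duration):
    """Value of the parameter envelope at time t in [0, duration].

    anchors: parameter values at the anchor points (here 4 of them).
    """
    n = len(anchors) - 1                      # number of segments
    pos = max(0.0, min(t / duration, 1.0)) * n
    i = min(int(pos), n - 1)                  # current segment index
    frac = pos - i                            # position within segment
    return anchors[i] + frac * (anchors[i + 1] - anchors[i])

# e.g. a (hypothetical) grain-size parameter, halfway through a 10 s sound
value = envelope_value([0.0, 0.5, 0.2, 1.0], 5.0, 10.0)
```

During playback one such envelope runs per assigned parameter, feeding both the synthesizer and Wekinator.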
The sounds are customisable. For the workshop, we cre-
ated two sounds with granular synthesis and one sound us-
ing a looping sample synthesizer. These sound trajectories
are reproduced during the gesture design and model train-
ing phases of our workﬂow (section 3.2). In performance
a model maps sensor data to synthesis parameters, allow-
ing users to reproduce the designed sounds or explore sonic
space around the existing sounds.
3.1.3 Wekinator communications
We developed a scripting system, sendrcv, in Max that al-
lows modularity and high-level use of the system. Sendrcv
is a conﬁgurable scaling and mapping abstraction that sets
up assignable sends and receives between Wekinator, the
gesture features that feed it, and the synthesis parameters
it controls. On input, it allows the user to select gesture
features to be recorded by Wekinator. On output, each
instantiation makes a bridge between a parameter in the
synthesizer and the model output.
Sendrcv is invoked with network ports as arguments, al-
lowing multiple sensor inputs and synthesizers to be used in
parallel with a corresponding number of Wekinator projects.
It is instantiated with a unique name so messages can be addressed specifying the gesture feature or synthesizer parameter that it feeds or controls. It is bidirectional, allowing the use of a synthesizer's user interface or the Wekinator
sliders to author sounds. The relevant range of a synthesizer
parameter can be deﬁned in the script and is normalised to
a floating point value in the range 0.0-1.0. This allows a Wekinator project to be agnostic to synthesizer specifics.
Other scripting features include throttling the data rate us-
ing speedlim, and a ramp destination time for Max’s line
object. A typical setup script is:
; 6448weki01 sendrcv mysend;
6448weki01 arg myarg;
6448weki01 min 0;
6448weki01 max 127;
6448weki01 speedlim 10;
6448weki01 time 10;
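Numerically, the mapping that such a script configures amounts to range scaling in both directions. The sketch below is illustrative Python (sendrcv itself is a Max abstraction), using the min 0 / max 127 range from the script above.

```python
# Sketch of sendrcv's range handling: a synth parameter with a scripted
# [min, max] range is normalised to 0.0-1.0 for Wekinator, and model
# output is scaled back to the parameter's range.

def to_wekinator(value, lo, hi):
    """Normalise a synth parameter value into 0.0-1.0."""
    return (value - lo) / float(hi - lo)

def from_wekinator(norm, lo, hi):
    """Scale a 0.0-1.0 model output back to the synth parameter range."""
    return lo + norm * (hi - lo)
```

Because both directions are handled by the abstraction, a Wekinator project only ever sees normalised values, regardless of which synthesizer it controls.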
The sound and gesture design workflows are described below in Section 3.3.
3.2 Machine Learning Approaches
Four diﬀerent approaches to machine learning (ML) are
used in the system. We provide three diﬀerent ways to
train neural networks for regression, each using the same al-
gorithm and topology, but varying in the way training data
are captured. A fourth method uses HHMMs for tempo-
ral modelling, which we chose because it can track progress
inside of a gesture.
3.2.1 Static Regression
In the ﬁrst approach, after designing the sound-gesture in-
teraction through the sound tracing exercise, users segment
their gestural performance into four discrete poses, or an-
chor points. These points coincide with breakpoints in
the synthesis parameters (section 3.1.2). Training data are
recorded by pairing sensor data from static poses with ﬁxed
synthesis parameters. These data are used to train a re-
gression model, so in performance participants can explore
a continuous mapping between the deﬁned training points.
We refer to this technique as Static Regression.
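The pairing of training data in this approach can be sketched as follows. This is an illustrative Python sketch (our names, not the Max/Wekinator code): each held pose contributes repeated sensor frames, all labelled with that anchor's fixed synthesis parameters.

```python
# Building a Static Regression training set: sensor frames recorded
# while holding each of the 4 poses are paired with that anchor's
# fixed synthesis parameter vector.

def build_static_dataset(pose_frames, anchor_params):
    """pose_frames: one list of feature vectors per anchor pose.
    anchor_params: one synthesis parameter vector per anchor.
    Returns (inputs, targets) suitable for training a regressor."""
    inputs, targets = [], []
    for frames, params in zip(pose_frames, anchor_params):
        for fv in frames:
            inputs.append(fv)
            targets.append(params)   # same target for every frame of a pose
    return inputs, targets
```

A neural network trained on such pairs then interpolates continuously between (and beyond) the anchor poses in performance.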
3.2.2 Temporal Modelling
In the second approach, we train temporal models, speciﬁ-
cally Hierarchical Hidden Markov Models implemented with
MuBu . HHMMs are used to automatically segment a
gesture into 10 equal-sized states, each represented by a
Gaussian Mixture Model. In performance, the output of an
HHMM is used to step along the synthesis parameter time-
line. Here, we refer to this technique as Temporal Modelling.
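The way a temporal model's output drives playback can be illustrated simply. In this hedged sketch (illustrative Python, not the MuBu implementation), the HHMM's estimate of which of the 10 states the live gesture is in is mapped to a position on the synthesis parameter timeline.

```python
# Stepping along the synthesis timeline from an HHMM state estimate:
# the gesture is segmented into 10 equal-sized states, and each state
# index maps to the centre of its segment of the sound's duration.

N_STATES = 10

def state_to_time(state, duration):
    """Map a state index (0..N_STATES-1) to a playback time."""
    seg = duration / N_STATES
    return seg * state + seg / 2.0
```

As the model tracks progress through the example gesture, the resulting time value indexes the parameter envelopes described in Section 3.1.2.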
3.2.3 Whole Regression
In a third approach, we train a neural network using input
and output data generated during the whole duration of the
sound. We call this algorithm Whole Regression.
3.2.4 Windowed Regression
Finally, we propose our method: training a neural network
with gestural data and synthesis parameters from four tem-
poral windows centred around the four ﬁxed anchor points
in the sound. Anchor points are deﬁned as points in time
where there is a breakpoint in the functions that gener-
ate synthesis parameters over time (red circles in Figure 1).
This includes the beginning and end of the sound, as well
as two equally spaced intermediate points. Training data
are recorded during windows that are centred around the anchor points and have a size of 1/6 of the whole duration of the given sound (grey areas in Figure 1). We call this technique Windowed Regression.
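The data selection behind this technique can be sketched as follows. This is an illustrative Python sketch under the definitions above: four anchor times (start, two equally spaced intermediate points, end), each with a training window of 1/6 of the sound's duration centred on it.

```python
# Windowed Regression data selection: only samples falling inside a
# window of duration/6 centred on one of the four anchor points are
# kept as training data.

def anchor_times(duration):
    """Start, two equally spaced intermediate points, and end."""
    return [0.0, duration / 3.0, 2.0 * duration / 3.0, duration]

def in_training_window(t, duration):
    """True if time t falls inside any anchor's training window."""
    half = duration / 12.0        # half of the duration/6 window
    return any(abs(t - a) <= half for a in anchor_times(duration))
```

Samples passing this test are recorded with their concurrent synthesis parameters, automatically capturing salient examples from a single dynamic sound tracing.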
3.3 Workflow
The workflow is divided into four top-level activities: Sound design, Gesture design, Machine training and Performance.
While we present them here in order, they and the steps
within them can be carried out iteratively and interactively.
Figure 1: Windowed Regression. The red circles represent the four anchor points, and the grey zones show the window of recorded data around each anchor point.
3.3.1 Sound design
In the sound design phase of our workﬂow, users use their
preferred synthesizer to author sounds. They select salient
synthesis parameters that will be modulated in the temporal
evolution of the sound. These parameters are scripted in
sendrcv. A sound trajectory is then composed of four anchor
points. The user then records these anchor points using the
Envelope window of our system. They create a variant on
their sound, select the breakpoint to which they would like
to assign it (0 −3), and click Set (Fig. 2).
Figure 2: The Envelope window showing progress
bar, sound duration, anchor point selection, set but-
ton above, and several envelopes below.
In this way, short (<10 second) sounds can be created with dynamic parameter envelopes that are suitable for sound tracing.
3.3.2 Gesture design
The gesture design part of the system (Fig. 3) enables the
user to choose between the diﬀerent ML approaches men-
tioned above (Section 3.2). The user selects a sound to
preview in the left part of the panel. In the current ver-
sion, there are three (3) authored sounds that can be pre-
viewed, each with four (4) anchor points. The Play button
below the progress bar changes name contextually to play
the time-varying sound trajectory or one of the selected
anchor points. In this way the user can conceive their ges-
ture by sound tracing, practice executing it while listening
to the sound, and ﬁnd salient anchor points in the gesture
that correspond to anchor points in the sound.
3.3.3 Model training
Once the gestures are designed, the user can train their
choice of ML algorithms. Figure 4 shows the logical se-
quence. First, the user decides whether to work with an-
chor points in a static regression approach or using dynamic
gesture in one of three time-based approaches. In the lat-
ter case, they choose from whole or windowed regression or
temporal modelling. This part is seen in the middle pane
of the interface in Fig. 3. Once the algorithm is chosen,
the user proceeds with training using the right panel. The
Record button records examples. If a dynamic algorithm
is chosen, this will play the selected sound, and the user records a training set by sound tracing, in the same way they designed the gesture.

Figure 3: The machine learning training panel, with selection of sounds (with possible selection of anchor point for static regression) (Left), selection of ML algorithm (Centre), and Record, Play, Train, and Clear Dataset buttons (Right).

If the user has chosen Static Regression, they select an anchor point on the left, hold the pose associated with the anchor point, and then click
the Record button. This is repeated for each of the anchor
points. At any point, the user can Clear their recording (the C button) to re-take their gesture or
their posture. The Data Set Size ﬁeld shows the number of
samples recorded. If they are happy with their recording,
the user then trains a model by clicking the T button.
Figure 4: The machine training decision tree, where the user selects static regression, one of two types of dynamic regression, or temporal modelling.
4. WORKSHOP EVALUATION
We organised a half-day workshop where we presented the
software and asked participants to explore each approach
to machine learning. We collected qualitative data in the
form of video capturing participants’ experience using our
proposed system. Data were analysed by adopting Open
and Axial Coding Methods .
The workshop was not meant to be a tutorial on ML tech-
niques nor a primer on sonic interaction design. We, there-
fore, recruited participants who were creative practitioners
in music, dance, or computational art, who had some prior
exposure to topics of embodied interaction and ML. We
recruited ﬁve (5) participants (3 female, 2 male). Three
were Computational Arts Masters students with interest in
dance technology, one was a recent undergraduate Creative
Computing graduate, and one was a PhD student in live
Figure 5: A workshop participant demonstrating
We provided the hardware and software system on lab com-
puters. We also provided three (3) sounds that had been
prepared for the study:
A A Theremin-like whistling sound with a frequency tra-
jectory ending in rapid vibrato
B A rhythmic sound of repeating bells where speed and
pitch were modulated
C Scrubbing of a pop song where granular synthesis al-
lowed time stretching
By providing the sounds, the workshop focused on the Ges-
ture Design segment of the workﬂow described above.
We focused on Sound A, the frequency trajectory of the
whistling tone. Participants listened to the sound, design-
ing their gesture by sound tracing. They then tried Whole
Regression. In the second task, participants were asked to
think about breaking their gesture down into anchor points
to train the system by Static Regression. Task three con-
sisted of trying Windowed Regression and Temporal Mod-
elling. We ﬁnished with a free exploration segment where
the participants tried the other two sounds with algorithms
of their choosing.
Four of ﬁve participants designed a gesture for Sound A
that was consistent with theory from sound tracing; they
followed the amplitude/frequency morphology of the sound
with sweeping arm gestures and muscle tension. One par-
ticipant designed her gesture with a drawing on paper (Fig.
6). Participants tried to represent the wobbly vibrato at the
end of the sound in diﬀerent ways: by wiggling their ﬁngers,
ﬂapping their hands, or making a ﬁst. P1 commented on
Whole Regression where interaction with the sound "became
embodied, it was giving me, and I was giving it.”
The participants responded diﬀerently to decomposing
their gesture into anchor points for Static Regression. For
P1 this meant that she "could be more precise." P2 identified
what she called, “natural” points along her paper sketch as
anchors. These included key points like the turn of a line,
but also the middle of a smooth curve (Fig. 6). P3 felt
that this technique had a less ﬂuid response, like triggering
diﬀerent “samples”. P4 found it diﬃcult to decompose her
smooth gesture into constituent anchors: “It was diﬃcult to
have the four anchor points... Sure the sound was divided
up in diﬀerent pitches but..”. P5 felt that “the connection
between the sound and the movement was not as close [as
using Whole Regression].” P1 took this as a creative op-
portunity, “I had the possibility to reinvent the transitions.”
Figure 6: Gesture design by drawing. P2 in Task 1
(Left), then Task 2 with anchor points (Right).
With Temporal modelling, P1 seemed to track the orien-
tation of her arm more than her hand gestures. P3 found
it to be "too discrete" and P4 found it "super choppy." P5
remarked, “you could hear the transitions, it was less ﬂuid
than number one [Whole regression]. It was steppy.”
Three participants (P1, P3, P4) had positive reactions
to our Windowed Regression technique. P1 used it with
Sound B (a rhythmic bell) in a gesture consisting of waving
her hand out and twisting her wrist while moving her arm
from frontwards to upwards. After trying and clearing the
recording four times, she perfected her gesture by paying
attention to shoulder position and ﬁnger tension. P3 and
P4 chose Windowed Regression with Sound C (a scrubbed
and ﬁltered sample of a pop song). P3 “performed” with
it in a playful manner: “What I was trying to do was...
to separate out the bits.” P4 played with the “acceleration
of the gesture... because of the sound [song], that’s more
a continuous sound and movement, so I worked more with
the acceleration." P1 and P3 felt that this technique enabled
them to reproduce the sound accurately but at the same
time also to explore new sonic possibilities.
In the free exploration segment of the workshop, four out
of ﬁve participants (P2, P3, P4 and P5) presented their
explorations with Sound B (rhythmic bells). P5 trained a
Static Regression model with diﬀerent spatial positions of
the arm. P3 did similarly and attempted to add striking
gestures to follow the rhythmic accelerando. P2 associated
speed of movement to bell triggering using Temporal Mod-
elling. She tried with the arm in a ﬁxed position and again
by changing orientation, and felt that the latter worked bet-
ter for her. P2 showed how she used the bell sound with
Whole Regression. She performed a zig-zag like movement,
and explored the quiet moments she could attain through
stillness, similar to the work of Jensenius et al.
Participants were interested in going beyond reproducing
the sound trajectory they had traced, exploring the expres-
sivity of a given technique and responding to variations of
gesture within and outside the designed gesture. Sound B
(rhythmic bell) was the most diﬃcult sample to reproduce
faithfully but gave more expressivity; P5 said "it gave the
best interaction... the most surprising results.”
5. DISCUSSION AND CONCLUSIONS
We have presented a system for designing gesture that implements four related machine learning techniques. We presented those techniques in a workshop without giving their names or technical details of how each algorithm worked.
The only indication about how static modelling diﬀered
from the dynamic techniques was that participants were
asked to train the system using gesture anchor points. In
this sense, this study was not a comparison of diﬀerent mod-
elling techniques. In the release version of our system1, we
expose the names of the algorithms in the UI, making a
direct comparison possible.
The workﬂow aﬀorded by our system enables the user,
without specialist knowledge of ML and without directly
conﬁguring and operating ML algorithms, to enter into a
musically productive gesture design activity following the
IML paradigm. Our system is aimed at musicians and
artists who might imagine incorporating embodied interac-
tion and machine learning into a performance. The work-
shop participants represented such a user group: they were
comfortable with digital technologies, but did not have spe-
ciﬁc technical knowledge of feature extraction or machine
learning. However, they were articulate in describing what
they experienced and insightful in discerning the diﬀerent
kinds of gesture-sound interaction each algorithm aﬀorded.
The intuitive way in which our users explored the different algorithms meant they sometimes trained models that did not perform as expected. Without visibility into the
data and how an algorithm was processing it, it is diﬃcult
to know how to alter one’s approach when training a new
model. While sometimes unpredictable performance was a
positive eﬀect, it was more commonly viewed as an error.
Three users (P3, P4, P5) felt that Static Regression did
not result in smooth interaction. This may be due to large
amounts of training data and a possible overﬁtting eﬀect.
We took this into consideration in a design iteration of the system, adding an auto-stop feature to the static pose recording so that it stops after 200 samples.
Participants on the whole conﬁrmed ﬁndings of sound
tracing studies. They followed the amplitude/frequency
morphology of a sound when it was non-causal. When they sought to trace a more causal type of sound such as the
bell hits, they tried to make striking gestures. Such gestures
would be missed by a regression algorithm. Meanwhile, a
temporal model would have diﬃculty tracking the repeti-
tive looping nature of such a gesture. While in the output
of the neural network, modulation of the sample loop-end
point caused an accelerando in the bell rhythm, a striking
rhythm on input was not modelled.
Meanwhile having multiple input modalities (EMG and
IMU) gave the users multiple dimensions on which to trace
sound morphology. With a single modality, like motion cap-
ture in Cartesian space, it can be unclear whether a gesture
like raising the arms is tracing rising frequency or amplitude
or both. By using muscle tension and orientation indepen-
dently, we saw that our users used the IMU to follow pitch
contour, and muscle tension to follow intensity of sound – be
they in amplitude or eﬀects like the nervous vibrato at the
end of the whistling Theremin-like tone. This is consistent
with Nymoen’s observation on the change in sound tracing
strategies as users encounter noisier sounds . While Ny-
moen sees increased acceleration, here the EMG modality
allows an eﬀort dimension in sound tracing that does not
have to follow pitch or spectral centroid.
While the workshop focused on the gesture design work-
ﬂow, we imagine users will be interested in designing sounds
along with performance gestures, and training models ac-
cordingly. We hope our method of designing sounds with
trajectories is eﬀective. However, authoring sounds using
only four anchor points may be frustrating for some. If the
number of anchor points is too few, our system could be
expanded to accommodate more. However, in the current
version, anchor points are synchronous. It is possible that
sound designers would not want parameters to have break-
points at the same points in time. Future development will
involve integrating our system into full musical performance
environments, incorporating multiple sounds and gestures,
providing an interface for saving and loading models, and
accounting for performance issues such as fatigue.
In demonstrations of machine learning for artists, tuto-
rials often focus on the rapid prototyping advantages of
the IML paradigm. In a desire to get artists up and run-
ning with regression and modelling techniques, examples
are recorded quickly and trained on random variations of
synthesizer sounds. The focus is on speed and ease of use.
Scurto found that the serendipity this causes can bring a
certain creative satisfaction . However, we can imagine
that once comfortable with the record-train-perform-iterate
IML loop, that composers and performers will want to work
with speciﬁc sounds or choreographies of movement. It is
here that sound design and gesture design meet. Our sys-
tem provides a sound and gesture design front end to IML
that connects the two via sound tracing.
Participants in our workshop were concerned about the
ﬂuidity of response of the ML algorithms. They discussed
the choice of algorithms as a trade-oﬀ between faithfully
reproducing the traced sound and giving them a space of
exploration to produce new, unexpected ways to articulate
the sounds. In this way, they began to examine the ges-
ture/sound aﬀordances of the diﬀerent approaches to re-
gression and temporal modelling our system oﬀered. We
might say that this enabled them to exploit IML for a ges-
tural exploration of Wessel’s timbre space.
This paper presented a system that brings techniques of sound tracing and IML to sound and gesture design for authoring continuous embodied sonic interaction. It introduced established techniques of static regression and temporal modelling and proposed a hybrid approach, called
Windowed Regression, to track time-varying sound and as-
sociated gesture to automatically train a neural network
with salient examples. Workshop participants responded
favourably to Windowed Regression, ﬁnding it ﬂuid and ex-
pressive. They were successful in using our system in an it-
erative workﬂow to design gestures in response to dynamic,
time-varying sound synthesis. We hope that this system and
associated techniques will be of interest to artists preparing
performances with time-based media and machine learning.
6. ACKNOWLEDGMENTS
We acknowledge our funding body H2020-EU.1.1. - EX-
CELLENT SCIENCE - European Research Council (ERC)
- ERC-2017-Proof of Concept (PoC) - Project name:
BioMusic - Project ID: 789825.
7. REFERENCES
A. Altavilla, B. Caramiaux, and A. Tanaka. Towards
gestural sonic aﬀordances. In Proc. NIME, Daejeon,
 J. Bullock and A. Momeni. ml.lib: Robust,
cross-platform, open-source machine learning for max
and pure data. In Proc. NIME, pages 265–270, Baton
Rouge, Louisiana, USA, 2015.
 B. Caramiaux, F. Bevilacqua, and N. Schnell.
Towards a gesture-sound cross-modal analysis. In
Gesture in Embodied Communication and
Human-Computer Interaction, pages 158–170, Berlin,
 B. Caramiaux, M. Donnarumma, and A. Tanaka.
Understanding gesture expressivity through muscle
sensing. ACM Transactions on Computer-Human
Interaction (TOCHI), 21(6):31, 2015.
 B. Caramiaux, P. Susini, T. Bianco, et al. Gestural
embodiment of environmental sounds : an
experimental study. In Proc. NIME, pages 144–148,
Oslo, Norway, 2011.
 J. M. Corbin and A. L. Strauss. Basics of Qualitative
Research: Techniques and Procedures for Developing
Grounded Theory. SAGE, Fourth edition, 2015.
 S. Delle Monache and D. Rocchesso. To embody or
not to embody: A sound design dilemma. In Machine
Sounds, Sound Machines. XXII Colloquium of Music
Informatics, Venice, Italy, 2018.
 R. Fiebrink and P. R. Cook. The Wekinator: a system
for real-time, interactive machine learning in music.
In Proc. ISMIR, Utrecht, Netherlands, 2010.
 R. Fiebrink, P. R. Cook, and D. Trueman. Human
model evaluation in interactive supervised learning. In
Proc. CHI, pages 147–156, Vancouver, BC, Canada,
 R. Fiebrink and H. Scurto. Grab-and-play mapping:
Creative machine learning approaches for musical
inclusion and exploration. In Proc. ICMC, pages
J. Françoise, N. Schnell, R. Borghesi, and
F. Bevilacqua. Probabilistic models for designing
motion and sound relationships. In Proc. NIME,
pages 287–292, London, UK, 2014.
J. Françoise. Motion-sound mapping by
demonstration. PhD thesis, UPMC, 2015.
 J. Gibson. Theory of aﬀordances. In The ecological
approach to visual perception. Lawrence Erlbaum
 N. Gillian and J. A. Paradiso. The Gesture
Recognition Toolkit. The Journal of Machine
Learning Research, 15(1):3483–3487, 2014.
 A. Hunt and M. M. Wanderley. Mapping performer
parameters to synthesis engines. Organised Sound,
 A. R. Jensenius, V. E. Gonzalez Sanchez,
A. Zelechowska, and K. A. V. Bjerkestrand. Exploring
the myo controller for sonic microinteraction. In Proc.
NIME, pages 442–445, Copenhagen, Denmark, 2017.
 M. Leman. Embodied music cognition and mediation
technology. MIT Press, 2008.
 A. Momeni and D. Wessel. Characterizing and
controlling musical material intuitively with geometric
models. In Proc. NIME, pages 54–62, Montreal,
 K. Nymoen, J. Torresen, R. I. Godøy, and A. R.
Jensenius. A statistical approach to analyzing sound
tracings. In Speech, Sound and Music Processing:
Embracing Research in India, pages 120–145, Berlin,
A. Parkinson, M. Zbyszyński, and F. Bernardo.
Demonstrating interactive machine learning tools for
rapid prototyping of gestural instruments in the
browser. In Proc. Web Audio Conference, London,
 T. D. Sanger. Bayesian ﬁltering of myoelectric signals.
Journal of neurophysiology, 97(2):1839–1845, 2007.
 D. L. Wessel. Timbre space as a musical control
structure. Computer Music Journal, pages 45–52,
M. Zbyszyński, M. Grierson, M. Yee-King, et al.
Rapid prototyping of new instruments with codecircle.
In Proc. NIME, Copenhagen, Denmark, 2017.