
Action-sound Latency and the Perceived Quality of Digital Musical Instruments: Comparing Professional Percussionists and Amateur Musicians

Queen Mary University of London, London,
United Kingdom
Asynchrony between tactile and auditory
feedback (action-sound latency) when playing a musical
instrument is widely recognized as disruptive to musical
performance. In this paper we present a study that
assesses the effects of delayed auditory feedback on the
timing accuracy and judgments of instrument quality
for two groups of participants: professional percussio-
nists and non-percussionist amateur musicians. The
amounts of delay tested in this study are relatively small
in comparison to similar studies of auditory delays in
a musical context (0 ms, 10 ms, 10 ms ± 3 ms, 20 ms).
We found that both groups rated the zero latency con-
dition as higher quality for a series of quality measures
in comparison to 10 ms ± 3 ms and 20 ms latency, but
did not show a significant difference in rating between
10 ms latency and zero latency. Professional percussio-
nists were more aware of the latency conditions and
showed less variation of timing under the latency con-
ditions, although this ability decreased as the temporal
demands of the task increased. We compare our find-
ings from each group and discuss them in relation to
latency in interactive digital systems more generally and
experimentally similar work on sensorimotor control
and rhythmic performance.
Received July 4, 2017, accepted March 28, 2018.
Key words: audiotactile latency, sensorimotor synchro-
nization, quality judgments, percussionists, digital
musical instruments
Playing a musical instrument is a highly developed sensorimotor skill, where
years of training and theoretical knowledge are
brought together into the nuanced and expressive con-
trol required for musical performance. Delayed feedback
(be it auditory, visual or tactile) can cause disruption to
this sensorimotor control. Previous studies have shown
that delayed auditory feedback (DAF) during music per-
formance can disrupt musical production, primarily by
increasing the variability of timing (Pfordresher, 2006;
Pfordresher & Palmer, 2002; Yates, 1963). This disruption
varies with delay length and similar effects have been
shown for DAF in speech (Howell, 2001).
In the field of human computer interaction, delayed
feedback has mostly been studied as system latency (the
asynchrony between a control gesture and a system’s
corresponding response) and jitter (the variability of
this asynchrony). Latency is a fundamental issue affect-
ing interactive digital systems and has long been recog-
nized as potentially harmful to a user’s experience of
control (MacKenzie & Ware, 1993; Meehan, Razzaque,
Whitton, & Brooks, 2003): even if accuracy of temporal
performance is not impacted, the qualitative experience
of the user may be negatively impacted (Kaaresoja,
Anttila, & Hoggan, 2011). The way latency and jitter
affect a user has been shown to vary greatly depending
on the nature of the interaction
(Annett, 2014); for example, direct or indirect touch,
tapping or swiping on a touchscreen (Jota, Ng, Dietz,
& Wigdor, 2013).
In this paper we present a study that investigates the
effects of small amounts of action-sound latency and
jitter (10 ms, 10 ms ± 3 ms, 20 ms) on the interaction of
musicians with a digital percussion instrument. We
assess both the musicians’ judgments of instrument
quality under different latency conditions and their tim-
ing accuracy. Two groups of participants took part in
this study: non-percussionist amateur musicians and
professional percussionists. Our aim with this research
was to assess the impact of relatively small amounts of
delay on the fluency and quality of the interaction, even
when the auditory feedback is not perceived as detached
from the action that produced it (the commonly
accepted threshold for perceived audiotactile simultane-
ity can vary between 20 ms and 70 ms; Occelli, Spence,
& Zampini, 2011). We also examine whether extensive
rhythmical training and the demands of the musical
task affect the influence of DAF on performance.
Action-sound Latency, Perceived Instrument Quality 109
Context of Research
The altered auditory feedback (AAF) paradigm has
been used extensively in music psychology to investigate
the importance of auditory information for the execu-
tion of control sequences on musical instruments.
Delayed auditory feedback (DAF) is a common form
of AAF where the onset of auditory feedback is delayed
by a fixed amount in relation to the action that pro-
duced it (Black, 1951). While the contents of anticipated
feedback events are usually maintained in DAF, with
only the synchrony of perception and action being
affected, there are other types of AAF that alter the
contents of auditory feedback while maintaining syn-
chrony. For example, experiments have been conducted
on digital keyboards where the AAF consists of shifting
pitches to disrupt expectations of pitch arrangements
on the keyboard (Pfordresher, 2003, 2008; Pfordresher
& Palmer, 2006) or randomizing pitch (Finney, 1997;
Pfordresher, 2005).
Each kind of alteration to auditory feedback disrupts
performance in different ways and to different extents.
Recent research on delayed feedback suggests that asyn-
chronies between action and feedback primarily disrupt
the timing of actions, not their sequencing (the produc-
tion of melodic sequences) (Pfordresher, 2003). Pitch
alterations, on the other hand, disrupt the accuracy of
production and not timing (Pfordresher, 2003). The
point of maximal disruption caused by asynchronies
(the amount of delay, above which no significant
increase in disruption is seen) has been the focus of
much research. Generally, disruption increases as the
delay increases up to a certain point, and then reaches
asymptote (Gates, Bradshaw, & Nettleton, 1974, found
an asymptote around 270 ms in music performance).
However, rather than an absolute time discrepancy, the
degree of disruption caused by asynchronies depends on
when it occurs in the interonset interval (IOI) in rhyth-
mic performance, and reflects the phase relationships
between onsets of auditory feedback relative to the IOI
between actions (key presses for example) (Pfordresher,
2003).
A common interpretation of disruption from delayed
feedback is the sensorimotor conflict hypothesis. The
proposal is that delayed feedback interferes with the
planned timing of actions (Pfordresher, 2006) or their
execution (Howell, 2001) due to shared representations
for perception and action (MacKay, 1987). Delayed
feedback causes disruption by conflicting temporally
with the expected timing of a planned movement (Pfor-
dresher & Dalla Bella, 2011), in this case the expected
sound that should result from an action. The magnitude
of disruption most likely depends on the perceptual
salience of the delayed feedback (Stenneken, Prinz,
Cole, Paillard, & Aschersleben, 2006).
Our concern is with audiotactile interactions: the tight
coupling between auditory and tactile feedback systems
has been recognized (Occelli et al., 2011), as has its
increased temporal resolution of synchrony perception
in comparison to audio-visual and tactile-visual (Fuji-
saki & Nishida, 2009). Whereas the importance of audi-
tory feedback for musical performance is evident, given
the primary aural focus of music as a cultural practice,
tactile feedback has been shown to play an important
role in the control of timing during music performance
(Goebl & Palmer, 2008) and expert performers have
been shown to depend less on auditory feedback and
more on tactile feedback than non-expert performers
during the performance of sequential movements (van
der Steen, Molendijk, Altenmüller, & Furuya, 2014).
High temporal acuity is shared by both hearing and
touch. In terms of temporal precision hearing is the
most accurate of our senses: two stimuli of equal sub-
jective intensity were perceived as being temporally dis-
crete if separated by ca. 2 ms for monaural and binaural
stimulation (Levitin et al., 2000), touch being less accu-
rate (ca. 10–12 ms; Gescheider, 1966) but still better
than sight (ca. 25 ms; Kietzman & Sutton, 1968).
Multisensory integration is the process by which the
human nervous system merges available sensory infor-
mation into unique perceptual events (Calvert, Spence,
& Stein, 2004). Joining stimuli received through sepa-
rate sensory channels can take place between stimuli
that are temporally asynchronous, but which fall within
the ‘‘temporal window’’ of integration (Meredith,
Nemitz, & Stein, 1987). For audiotactile stimuli this can
vary from tens to hundreds of milliseconds wide
depending on various factors to do with the location,
magnitude and content of the stimuli (Occelli et al.,
2011). An important measure is the point of subjective
simultaneity: the ‘‘amount of time by which one stimu-
lus has to precede or follow the other in order for the
two stimuli to be perceived as simultaneous’’ (Spence &
Parise, 2010, p. 365).
110 Robert H. Jack, Adib Mehrabi, Tony Stockman, & Andrew McPherson
While many studies of DAF deal with amounts of
delay of 50 ms and above, research on sensorimotor
synchronization (typically in the guise of non-musical
finger tapping studies) has yielded sophisticated models
of sensorimotor timing with auditory feedback on the
level of tens of milliseconds (see Repp & Su, 2013, for
a review). Tapping studies are useful for examining how
movements are synchronized with an auditory stimulus
and help enrich our understanding of sensory pathways
and the weighting of signals in cross-modal perception.
Levitin et al. (2000) and Adelstein, Begault, Anderson,
Wenzel, and Field (2003) investigated the perceptual
asynchrony threshold values for an active audiotactile
interaction situation (playing a drum). The threshold
value of 42 ms was reported by Levitin, whereas Adelstein
et al. reported thresholds varying between 18 ms and
31 ms depending on the stimulus duration. These
studies and others report that some
participants had very low threshold values (ca. 10 ms),
particularly musicians (Adelstein et al., 2003; Levitin
et al., 2000; Repp & Doggett, 2007).
When tapping along to a metronome beat, participants
commonly exhibit a negative mean asynchrony (NMA):
they anticipate the beat and strike early by between 30
and 10 ms (see Aschersleben, 2002; Repp, 2000; Repp &
Doggett, 2007). A measure of NMA and the variability of
this asynchrony are a common way of assessing the tem-
poral accuracy of a performer (Repp & Su, 2013). The
bidirectional influence of auditory and movement infor-
mation is evident in many simple tapping studies where
auditory information guides motor timing: audiotactile
stimuli with delays to auditory feedback cause anticipa-
tions to increase with the amount of delay (Aschersleben
& Prinz, 1997) when delay is gradually introduced up to
70 ms in a tapping test, whereas NMA reduces with deaf-
ferented participants with only auditory and visual feed-
back (Stenneken et al., 2006).
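The NMA measure described above can be computed from tap and beat onset times in a few lines. The following is an illustrative sketch of our own (not code from any cited study; function names are ours): each tap is paired with its nearest metronome beat, and the mean and standard deviation of the signed asynchronies give the NMA and its variability.

```python
# Illustrative sketch: negative mean asynchrony (NMA) and its
# variability from tap and metronome onset times (in ms).
# A negative mean indicates that taps anticipate the beat.

def asynchronies(tap_times, beat_times):
    """Pair each tap with its nearest beat; return signed
    asynchronies (tap - beat) in ms."""
    return [min((abs(t - b), t - b) for b in beat_times)[1]
            for t in tap_times]

def nma_and_sd(tap_times, beat_times):
    asyncs = asynchronies(tap_times, beat_times)
    n = len(asyncs)
    mean = sum(asyncs) / n
    sd = (sum((a - mean) ** 2 for a in asyncs) / n) ** 0.5
    return mean, sd

# Taps that consistently anticipate a 500 ms isochronous beat:
beats = [i * 500.0 for i in range(8)]
taps = [b - 20.0 for b in beats]  # constant 20 ms anticipation
mean, sd = nma_and_sd(taps, beats)
# mean is -20.0 (an NMA of 20 ms); sd is 0.0
```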
This measure can be significantly affected by adapta-
tion to asynchrony (see Vroomen & Keetels, 2010, for
a review). The adaptation process is typically evaluated
by measuring participants’ perceptions of crossmodal
simultaneity both before and after an exposure period,
where there is a constant feedback delay between the
stimuli presented in the two modalities. Vroomen and
Keetels (2010) describe this as a widening of the tem-
poral window for multisensory integration. The tempo-
ral window for audiotactile integration has been shown
to widen in response to a relatively short exposure to
asynchronously presented tactile and auditory stimuli
in the case of passive tactile perception (Navarra, Soto-
Faraco, & Spence, 2007).
Musicians have been recognized as better than nonmu-
sicians across a range of timing dependent tasks. In
duration-based tasks where the duration of two inter-
vals are compared musicians outperform nonmusicians
(Rammsayer & Altenmüller, 2006). Musicians also show
a superior ability to distinguish timing changes within
isochronous sequences (Lim, Bradshaw, Nicholls, &
Altenmüller, 2003), which is particularly true of percus-
sionists who demonstrate the highest accuracy of all
musician groups (Ehrlé & Samson, 2005). The NMA
when tapping to an isochronous sequence is also nota-
bly smaller for amateur musicians in comparison to
nonmusicians (10–30 ms vs. 20–80 ms) (Aschersleben,
2002; Repp & Doggett, 2007). Further differences
between instrument speciality have been demonstrated:
participants with high levels of rhythm-based musical
expertise (in particular, percussionists) display superior
synchronization abilities (smaller NMAs with less vari-
ability in tapping tasks) when compared to other musi-
cians and nonmusicians (Cameron & Grahn, 2014;
Krause, Pollok, & Schnitzler, 2010; Manning & Schutz,
2016). Dahl (2000) reported that professional percussio-
nists demonstrated a variation of mean synchronization
error of between 10–40 ms, which equated to 2–8% of
the associated tempo. Even lower synchronization error
in professional drummers has been reported by Kilchen-
mann and Senn (2011; between 3 ms and 35 ms depend-
ing on motor effector, part of the drum kit and rhythmic
‘‘feel’’) and Hellmer and Madison (2015; below 5 ms).
The effects of asynchronous multisensory feedback have
also been extensively studied in the field of human-
computer interaction, where latency and jitter are
unavoidable side effects of digital systems and their lin-
kages between virtual and physical worlds. Due to the
current proliferation of touchscreen technologies, much
recent research has focused on acceptable levels of latency
in such devices: Ng et al. (2012) have shown that visual-
tactile latency well under 10 ms can affect user preference,
even when no delay is perceived. When examining mul-
tisensory latency in touchscreen buttons Kaaresoja, Brew-
ster, and Lantz (2014) suggest that latency should be
lowest for the tactile channel (5–50 ms), followed by audio
(20–70 ms) and finally the visual (30–85 ms). However,
this is task dependent: for direct touch systems the thresh-
old of noticeable visual-tactile latency has been shown to
be as low as 69 ms for tapping and 11 ms for dragging
(Deber, Jota, Forlines, & Wigdor, 2015).
Auditory delays within an acoustic musical context are
multifaceted and commonplace. Lago and Kon (2004)
point out the variability of the effects of such delays and
their dependence on instrument, style of music, and
spatial positioning: in ensemble playing, for example,
due to the distance between players and the speed of
sound (Chafe & Gurevich, 2004). Lester and Boley
(2007) provide a comprehensive overview of the effects
of latency during a live sound monitoring situation in
a recording studio and found that sensitivity to latency
was highly dependent on instrument type and monitor-
ing style (in-ear versus wedge monitors). As latency
increased it became less of a spectrum-affecting issue and
more of a temporal perception issue (above 6.5 ms caused
temporal smearing with certain instruments for in-ear
monitoring). The low thresholds found in their paper are
in part due to the specifics of the live monitoring situa-
tion where acoustic and delayed sound are combined.
The close coupling of action to sound via the virtual
mechanism of the computer is of prime importance to
building compelling digital musical instruments (DMIs).
Wright, Cassidy, and Zbyszynski (2004) stated that a few
milliseconds of latency and jitter can make the difference
between a responsive, expressive, satisfying real-time
computer music instrument and a rhythm-impeding
frustration. Due to the complexity of the sensorimotor
control that a musician has over an instrument and the
high demands of musical performance, we believe that
DMI design is a good testing ground for understanding
the effect of latency and jitter in human computer inter-
action more broadly, complementing research done in
relation to musical disruption caused by DAF.
Latency has been identified as a barrier to virtuosic
engagement by obstructing a fluent interaction with the
instrument (Magnusson & Mendieta, 2007; O’Modhrain,
2011; Wright et al., 2004). DMI designers have often
been recommended to aim to create instruments that
support the kind of interaction possible with acoustic
instruments; tools that foster a relationship between
gesture and sound that is both intuitive yet complex
(O’Modhrain, 2011). Latency does not generally affect
acoustic instruments that produce sound in reaction
to action instantaneously as the sound producing mech-
anism and control interface are one and the same.
Wessel and Wright (2002) suggest that DMIs should
aim for a latency of less than 10 ms with less than
1 ms jitter. McPherson, Jack, and Moro (2016) demon-
strated that Wessel and Wright’s guideline is still not
met by many toolkits commonly used to create DMIs.
With digital musical instruments there are many parts
of the system that introduce latency and jitter between
action and sound: buffering in hardware and software,
latency in the audio code itself (from frequency domain
processing for example), transmission delay between
sensors and audio engine due to USB connection,
latency induced by smoothing or signal conditioning
of the sensor input (McPherson et al., 2016; Wright
et al., 2004). These factors can combine to create a sig-
nificant delay between performer action and resultant
sound and impede what Wessel and Wright (2002)
describe as the development of control intimacy between
performer and instrument. Control intimacy has been
described by Fels (2004) as the perceived match of the
behavior of an instrument and a performer’s control of
that instrument, a concept that is connected to the
notion of ergoticity (Luciani, Florens, Couroussé, &
Castet, 2009): the preservation of energy through both dig-
ital and physical components of a system, maintaining
an ecologically valid causal link between action and
sound to foster an embodied engagement with the
instrument (Essl & O’Modhrain, 2006).
During performance with digital musical instru-
ments, latency has been shown to affect accuracy of
control and to be identified in different ways. Instru-
ments with continuous gestural control, for example,
have been shown to be less sensitive to latency: for
a theremin, where no physical contact is made with the
instrument, the just noticeable difference of latency was
shown to be around 20–30 ms, with latencies as high as
100 ms going undetected during the performance of
slow passages (Mäki-Patola & Hämäläinen, 2004). Dahl
and Bresin (2001), in a study with ‘‘in-air’’ percussive
digital musical instruments without tactile feedback,
found that a latency of 40 ms negatively impacted tim-
ing accuracy but that up to around 55 ms performers
were able to compensate for the latency by increasing
their anticipation (moving their strike earlier) when
latency was gradually introduced.
In the present study, a group of highly trained profes-
sional percussionists and a group of non-percussionists
(with varying levels of musical experience) evaluated the
effects of variable levels of delayed auditory feedback on
a novel digital percussion instrument. Our aim is to
investigate the differences in the effects of latency and
jitter on timing accuracy and perceived instrument
quality between the two groups, and to understand the
influence, if any, of specialized training in rhythm-based
musical practices on these measures.
There are, however, some exceptions where latency is built into the
mechanism of an instrument. In the case of a piano, the delay between
a key reaching the key bottom and the hammer striking the string can be
around 35 ms for pp notes and -5 ms for ff notes. These figures do not
include the key travel time (the time elapsed between initial touch and the
key reaching the key bottom), which for pressed touch can be greater than
100 ms for pp notes and 25 ms for ff notes (Askenfelt & Jansson, 1988).
The challenge of measuring multimodal delays in inter-
active systems has been explored by many, mostly ori-
ented towards touchscreen interactions (Kaaresoja &
Brewster, 2010; Schultz & van Vugt, 2016; Xia et al.,
2014). In the present study, in order to counter the
common problems of latency in a DMI and to have
sufficient control over the exact amount of latency and
jitter, we chose Bela as the basis of our instrument, an
open source platform designed for real time, ultra-low
latency audio
and sensor processing. Bela provides sub-millisecond
latency and near jitter-free synchronization (within 25
μs) of audio and sensor data (McPherson & Zappi,
2015), making it a suitable platform for controlling the
exact amount of latency present in a DMI. Measure-
ments of its performance can be found as part of the
latency tests of common platforms used to create DMIs
conducted by McPherson et al. (2016).
The instrument. We built a self-contained percussive
digital musical instrument, the playing surface of which
consists of eight ceramic tiles of varying sizes (see
Figure 1). The instrument represents a ''simplest case''
digital percussion instrument with one dimension of
control: discrete velocity triggering of samples. Each of
the ceramic tiles has a piezo disc vibration sensor
attached to the back with pliable scotch mounting tape.
Mounts for the tiles were created from laser cut plywood
and each tile was suspended at its antinodes, allowing
it to vibrate freely when struck, ensuring a strong
signal to the vibration sensor (Sheffield & Gurevich,
2015). A layer of 3 mm rubber foam was glued to the
back of each tile to further condition the signal while
attenuating the acoustic resonance of the tile.
The piezo sensors are connected via a voltage biasing
circuit to the analog inputs of the Bela board. A striking
action on the tiles induces vibration in the tile, which is
passed through signal conditioning routines and a peak
detection algorithm (detailed in the next section) before
being used to trigger samples of Gamelan percussion
instruments. The intensity of the strike is measured
when a peak is detected and mapped to the amplitude
of the sample playback.
Filter group delay and peak detection. The peak detec-
tion routine includes a DC blocking filter, full-wave
rectification and a moving average filter. Strikes were
detected by looking for peaks in the sensor readings
using an algorithm that looks for an increase in the
reading followed by a downward trend once the reading
is above a minimum threshold. When a peak is detected,
the amplitude of the strike is measured and then
assigned to the sample appropriate to the tile. Our syn-
thesis engine had enough computation power to play 40
simultaneous samples and we used an oldest-out voice
stealing algorithm if all voices became allocated, to
allow for fast repeated strikes.
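The conditioning and peak-detection chain described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the DC-blocking coefficient, window length, and threshold are placeholder values, and the real system runs sample-by-sample on the Bela board rather than over buffered lists.

```python
# Illustrative reconstruction of the described chain: DC blocking,
# full-wave rectification, moving-average smoothing, then a peak test
# (a rise above a minimum threshold followed by a downward trend).

def dc_block(x, r=0.995):
    """One-pole/one-zero DC-blocking filter."""
    y, prev_x, prev_y = [], 0.0, 0.0
    for s in x:
        out = s - prev_x + r * prev_y
        y.append(out)
        prev_x, prev_y = s, out
    return y

def moving_average(x, n=20):
    """Simple FIR smoother (this is the filter whose group delay
    contributes to the total latency budget)."""
    y, window = [], []
    for s in x:
        window.append(s)
        if len(window) > n:
            window.pop(0)
        y.append(sum(window) / len(window))
    return y

def detect_peaks(x, threshold=0.1):
    """Indices where the smoothed, rectified signal is above the
    threshold, was rising, and turns downward."""
    smoothed = moving_average([abs(s) for s in dc_block(x)])
    peaks = []
    for i in range(1, len(smoothed) - 1):
        if (smoothed[i] > threshold
                and smoothed[i] > smoothed[i - 1]
                and smoothed[i] >= smoothed[i + 1]):
            # smoothed[i] would be mapped to sample playback amplitude
            peaks.append(i)
    return peaks
```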
The audio output on Bela used a sample rate of 44.1
kHz and a buffer size of eight samples. The analog
inputs used for the piezo discs were sampled at 22.05
kHz, synchronously with the audio clock. The total
action-sound latency consists of the duration of the
two buffers (360 μs) plus the conversion latency of the
sigma-delta audio codec (430 μs). In addition, there is
the group delay of the FIR filter (moving aver-
age) that was used to smooth the piezo signal over 20
samples before the peak detection of 1 sample, result-
ing in 250 μs delay. As the analog inputs and audio
outputs are synchronized on an individual sample
level, jitter between them is no more than 25 μs
(McPherson et al., 2016). In the present study the
sound of the instrument was monitored directly
through noise-cancelling headphones. We conducted
a test of the headphones to ensure that the noise-
cancelling function was not introducing additional
latency and found that when the noise cancelling was
turned on there was an additional 100 μs latency in
comparison to the analog signal path. We take this
total of 1.2 ms latency as our Condition A, which
we call the ‘‘zero latency’’ condition, as the distance
between the tiles and the ears would normally con-
tribute around 2 ms of acoustic latency due to the
speed of sound.
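The latency budget above can be checked with a short back-of-envelope calculation. The component figures are taken from the text and read as microseconds, since only then do they sum to the stated 1.2 ms total; only the buffer term is derived from the sample rate and buffer size.

```python
# Back-of-envelope check of the 'zero latency' budget (microseconds).
SAMPLE_RATE = 44100   # audio output rate used in the study
BUFFER_SIZE = 8       # samples per buffer

buffer_us = 2 * BUFFER_SIZE / SAMPLE_RATE * 1e6  # two buffers, ~363 us
codec_us = 430        # sigma-delta codec conversion (stated)
filter_us = 250       # FIR smoothing group delay (stated)
headphone_us = 100    # noise-cancelling circuitry (measured)

total_ms = (buffer_us + codec_us + filter_us + headphone_us) / 1000
# total_ms is roughly 1.14, i.e., the ~1.2 ms 'zero latency' condition
```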
FIGURE 1. The instrument built from eight ceramic tiles with piezo
discs attached to them to sense vibrations.
Sound mapping. The use of a piezo vibration sensor
naturally gives us an ergotic link (Luciani et al., 2009)
between the force of a strike and the amplitude of the
sound output. The response curve of this sensor is lin-
ear, unlike other commonly used sensors in electronic
percussion instruments like force sensitive resistors
(FSRs). For a full review of sensors commonly used in
percussion instruments, see Medeiros and Wanderley
(2014) and Tindale, Kapur, Tzanetakis, Driessen, and
Schloss (2005). By using this sensor we were able to
naturally preserve the relationship between physical
energy at the input and perceived physical energy at the
output by producing a linear software relationship
between input level and output level.
In the present study four sample sets were used, each
consisting of eight individual samples that were
assigned to each of the eight ceramic tiles. All samples
had equal duration, pitch variation, and perceptual
attack time. The four sample sets were further divided
into two groups characterized by the perceptual acoustic
features of their attack transients. The difference
between these two groups was in the spectral centroid
during the initial strike: they could be broadly described
as ‘‘bright’’ or ‘‘dull’’ sounding (striking a metallic bar
with a hard beater versus striking a metallic bar with
a padded beater). Pitch height was preserved on the
instrument for each sample set, with the lowest
pitched notes mapped to the largest tile on the left hand
side and the highest pitched note to the smallest tile on
the right hand side, increasing vertically for each third of
the instrument (see Figure 1).
The peak detection and triggering routine remained
constant throughout the experiment while the latency
condition and sample set changed. Throughout the
experiment the raw signals from the instrument were
recorded onto an SD card by Bela for later analysis. This
included the signal from each of the eight piezo disks
attached to the tiles, the audio output, and the metro-
nome or backing track that the participants were mon-
itoring through headphones in the second and third
parts of the study.
Presentation. This study was conducted in a sound-
isolated studio. The instrument was mounted on a key-
board stand whose height the participants could adjust
for comfort. On a podium next to the instrument there
was a laptop where the participants input their
responses and changed the settings of the instrument.
Participants monitored the instrument directly through
noise cancelling headphones (Bose QC-25). White noise
was played in the room through a PA system at a level
(50 dB) where all acoustic sound from the instrument
was inaudible when the participant was performing. This
was to avoid participants hearing any excess sound com-
ing through air conduction from their contact with the
instrument, focusing their attention on the sound that
was presented through the headphones and their haptic
experience of the strike.
Two groups of participants took part in the study we
present. The first group (referred to as ‘‘non-percussio-
nists’’ or ‘‘NP’’ from now on) consisted of 11 partici-
pants (3 female) whose age was between 26 and 45 years
and who were recruited from our university depart-
ment. All members of this group had musical experi-
ence but none were professional. Eight of the 11
participants classified themselves as instrumentalists
and the other three as electronic musicians. None of
this group had received training in percussion. These
participants had varying degrees of music training
(0–15 years; M = 9.2, SD = 4.5). All but two of the
participants had used a computer to make music, with
six of the participants regularly using the combination
of a hardware controller and software instrument to
compose and/or perform music.
The second group (referred to as ‘‘professional
percussionists’’ or ‘‘PP’’ from now on) consisted of 10
participants (1 female) whose age was between 26 and
35 years. They had completed at least a bachelor's degree
in performance specializing in percussion and were
working professionally, either as a performer in an
orchestra, as a session musician, or in education. This
group had between 10 and 20 years of formal percussion
training (M = 13.8, SD = 2.5). All participants in this
group had received training on a second instrument
(2–15 years; M = 11.0, SD = 3.5), most commonly
piano in the case of six participants. Both groups
reported normal hearing and normal or corrected-to-
normal vision. This experiment met ethics standards
according to our university’s ethics board.
Four variable latency and jitter conditions were tested:
1) Condition A: 'zero' latency, 2) Condition B: 10 ms
latency, 3) Condition C: 20 ms latency, and 4) Condition
D: 10 ms ± 3 ms latency (simulated jitter).
These conditions were created by delaying the sound
triggered by a detected strike by a set number of samples
and were verified on an oscilloscope. In the jitter con-
dition each strike was assigned a random latency
between 7 ms and 13 ms. We chose these three specific
latency conditions based on a recent series of measure-
ments conducted by McPherson et al. (2016) of com-
mon techniques used to create digital musical
instruments. We also deliberately chose our maximum
latency condition (20 ms) to be within the thresholds of
simultaneity perception for audio-haptic stimuli as
found by Adelstein et al. (2003) of around 24 ms. This
was to focus our findings on the effects of latency and
jitter when a delay is not necessarily perceived between
action and sound.
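The four conditions could be realized as per-strike delays expressed in samples; the paper states only that the triggered sound was delayed by a set number of samples and verified on an oscilloscope, so the sketch below (including the function name) is our own illustration of that scheme.

```python
# Sketch of the latency conditions as per-strike sample delays.
import random

SAMPLE_RATE = 44100  # audio output rate used in the study

def delay_in_samples(condition):
    """Return the trigger-to-sound delay in samples for a condition."""
    if condition == "A":      # 'zero' latency: no added delay
        latency_ms = 0.0
    elif condition == "B":    # 10 ms fixed latency
        latency_ms = 10.0
    elif condition == "C":    # 20 ms fixed latency
        latency_ms = 20.0
    elif condition == "D":    # 10 ms +/- 3 ms: new random draw per strike
        latency_ms = random.uniform(7.0, 13.0)
    else:
        raise ValueError(condition)
    return round(latency_ms * SAMPLE_RATE / 1000)
```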
The study lasted for approximately one hr and 15 min
and consisted of two sections followed by a structured
interview. Participants were video and audio recorded
throughout the experiment.
Part 1: Quality assessment. In order to evaluate the
participants’ subjective impression of quality of the dif-
ferent latency conditions on the instrument, we decided
to use a method that involved participants rating the
conditions in comparison to one another for a series
of quality attributes. In this part of the study, latency
conditions B, C, and D were always compared to con-
dition A. This part of the experiment was inspired by
Fontana et al.’s (2015) study on the subjective evalua-
tion of vibrotactile cues on a keyboard. In their study
the impact of different vibrotactile feedback routines on
the perceived quality of a digital piano is assessed. Our
methodology and analysis in Part 1 took a similar route.
In this first section participants were presented with the
instrument and advised to freely improvise while switch-
ing between two settings: a and b. Their task, for each
pair of a and b, was to comparatively rate the two settings
according to four quality metrics (Responsiveness, Nat-
uralness, Temporal Control, General Preference) drawn
from studies on subjective quality assessments of acoustic
instruments (Saitis, Giordano, Fritz, & Scavone, 2012)
and based on the qualities we hypothesized would be
most relevant to the changing latency conditions. Once
they had decided on the comparative ratings of the two
settings they then moved onto the next pair.
Stimuli and conditions. Between a and b we changed
both the latency condition and sample set. We delib-
erately wanted to mask the changing latency condi-
tions to evaluate whether the latency conditions were
perceivable by the participants when they were not
instructed to focus on the amount of latency present.
When starting the study participants were instructed
simply to compare the different settings on the instru-
ment according to the attributes and to try and not
base their ratings on preference for a sample set alone.
The fact that latency would be changing was not disclosed to participants.
Experiment procedure. The instrument was self-
contained, dealing with all the sensor and audio
processing via Bela, allowing participants to monitor
the instrument directly via noise cancelling headphones.
To switch between a and b, we used a separate laptop
that hosted a graphical user interface built in Pure Data,
which communicated with the Bela board via UDP,
allowing participants to switch between settings at will
(see Figure 2). For each pair, the zero latency condition
(A) was assigned to either a or b in a weighted random
order, while the other setting in the pair always
contained a latency condition (B, C, or D). Two different
sample sets were also selected in a weighted random
order for a and b. There were 12
such pairs, again presented in a weighted random order
for each participant, ensuring that each sample set was
in the zero latency position three times per participant.
This meant that each participant rated each pair of
latency conditions four times, each time with a different
sample set assigned to each of the conditions. Partici-
pants were advised to take around 35 min to complete
the evaluation of the 12 pairs.
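As a concrete illustration, the pairing constraints described above (12 pairs, condition A always present, each latency condition compared four times, each sample set in the zero latency position three times) can be expressed as a schedule generator. The sketch below is a hypothetical reconstruction in Python, not the authors' actual randomization code, and the sample-set names are invented:

```python
import random

def make_schedule(seed=0):
    """Sketch of a balanced pairing schedule: 12 trials, each comparing the
    zero-latency condition "A" against one of "B", "C", "D" (4 trials each).
    Each of four sample sets appears three times on the zero-latency side."""
    rng = random.Random(seed)
    latency_conditions = ["B", "C", "D"] * 4        # each compared 4 times
    zero_side_sets = ["s1", "s2", "s3", "s4"] * 3   # each set 3x with A
    rng.shuffle(latency_conditions)
    rng.shuffle(zero_side_sets)
    trials = []
    for cond, a_set in zip(latency_conditions, zero_side_sets):
        # a different sample set accompanies the latency condition
        other_set = rng.choice([s for s in ("s1", "s2", "s3", "s4") if s != a_set])
        # condition A is randomly assigned to slot 'a' or 'b' on each trial
        if rng.random() < 0.5:
            trials.append({"a": ("A", a_set), "b": (cond, other_set)})
        else:
            trials.append({"a": (cond, other_set), "b": ("A", a_set)})
    return trials
```

Shuffling both lists independently stands in for the paper's "weighted random order"; the exact weighting scheme is not specified in the text.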
Participants also input their ratings on the accompa-
nying laptop via a graphical user interface that consisted
of slider inputs for each attribute using a Continuous
Category Rating scale (CCR), a rating widely used in
subjective quality assessments of interfaces (recommen-
dation ITU-T P.800). While rating the settings, partici-
pants were instructed to improvise freely with no
restrictions on their chosen style. Participants moved
the slider on the continuous scale to rate the relative
merits of the two settings (see Figure 3). The scale had
the following titles along its length: a is much better than
b, both a and b are equal, and b is much better than a.
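The endpoints of the CCR scale can be read as a signed preference between the two settings. The mapping below is illustrative only (the study analyzed the raw slider positions); the function name and the [-1, 1] range are our own:

```python
def preference(ccr_value):
    """Interpret a Continuous Category Rating slider position (0-100):
    0 = 'a is much better than b', 50 = 'both a and b are equal',
    100 = 'b is much better than a'. Returns a signed preference in
    [-1, 1], with negative values favoring setting a."""
    if not 0 <= ccr_value <= 100:
        raise ValueError("CCR value must be in [0, 100]")
    return (ccr_value - 50) / 50.0
```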
FIGURE 2. Experimental set-up with instrument and accompanying laptop for changing settings.
Action-sound Latency, Perceived Instrument Quality 115
Part 2: Timing accuracy. In order to evaluate the
impact of the latency conditions on the temporal per-
formance of the participants, we used a synchronization
task where they were instructed to play along with a metronome
under each latency condition. A metronome at
120 bpm was played through the headphones. The par-
ticipant was then instructed to tap along with the beat
using a single tile, dividing the metronome beat into
progressively smaller chunks: every crotchet (quarter
note)—which is equivalent to the 120 bpm of the met-
ronome—then every quaver (eighth note), then every
semiquaver (sixteenth note). They performed each of
these tapping exercises for at least four bars, paused,
and then moved onto the next. They repeated the whole
task three times for each latency condition and then
moved onto the next condition. Each of the four latency
conditions was presented in a weighted random order
and the sample set remained the same across participants.
Our methodology in this part of the study was
derived from Fujii et al.’s study (2011) on synchroniza-
tion of drum kit playing.
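The inter-onset intervals (IOIs) implied by each beat division follow directly from the tempo arithmetic; a minimal sketch:

```python
def ioi_ms(bpm, divisions_per_beat):
    """Inter-onset interval in milliseconds for a given tempo and beat
    division: 60,000 ms per minute divided by taps per minute."""
    return 60000.0 / (bpm * divisions_per_beat)

# At 120 bpm: crotchets every 500 ms, quavers every 250 ms,
# semiquavers every 125 ms.
```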
Part 3: Structured interview. To conclude the exper-
iment, a structured interview lasting between 10 and 20
min was conducted. The interview was conducted in
front of the instrument and demonstrations were
encouraged from the participants. The following themes
were discussed in each case: 1) general impression of the
instrument, including the styles of playing that worked
well and not; 2) techniques used to distinguish between
aand bin Part 1, the free improvisation; 3) whether
they noticed what was changing between settings,
besides sample set; and 4) their experience of latency
as an issue in musical performance.
The differences in subjective judgments of instrument
quality were evaluated by looking at the quality ratings
from Part 1 of the study.
Statistics. In our analysis and in Figure 4 condition
A (zero latency) is always a (zero on the y-axis) for
legibility, although in the study it was randomly
assigned to either a or b. For each group (professional
percussionist, non-percussionist) we fitted separate
linear mixed effect regression (LMER) models with
fixed effects of quality (responsiveness, naturalness,
temporal control, general preference) and condition
(10 ms, 20 ms, 10 ms ± 3 ms), and random intercepts
for each participant. The models were fitted using the
lme4 (Bates, Mächler, Bolker, & Walker, 2015) package
for R (R Core Team, 2017). We conducted a full
factorial Type III ANOVA on each LMER model, with
Satterthwaites’s degrees of freedom approximation
from the lmerTest package (Kuznetsova, Brockhoff,
& Christensen, 2017).
Group 1: Non-percussionists. Figure 4a shows the
median and IQR for all participants in this group (Fig-
ure 4c shows the mean and standard error). On average
condition A (the zero latency condition) was rated more
positively for all qualities than conditions C and D, the
20 ms and 10 ms ± 3 ms conditions, respectively. We
found a significant effect of condition, F(2, 517) = 7.37.
A post hoc analysis on each factor shows that the effect of condition
is driven by a significant difference between 10 ms ± 3 ms
and 10 ms, Z = 3.42, p < .01, and 20 ms and 10 ms,
Z = 3.21, p < .01 (all p values were adjusted using the
Benjamini and Hochberg false discovery rate correction;
FDR = 5%).
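The Benjamini and Hochberg correction used here ranks the p values and scales each by m/rank, enforcing monotonicity from the largest p value down. A minimal sketch of the standard step-up procedure (the authors' analysis was in R, so this Python version is illustrative):

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the original
    order: adjusted[i] = min over ranks j >= rank(i) of p_(j) * m / j."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    prev = 1.0
    # walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        val = min(prev, p_values[i] * m / rank)
        adjusted[i] = val
        prev = val
    return adjusted
```

This matches the behavior of R's `p.adjust(..., method = "BH")`; a result is significant at FDR = 5% when its adjusted value falls below .05.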
Group 2: Professional percussionists. Figure 4b shows
the median and IQR for all participants in this group
(Figure 4d shows the mean and standard error). For the
professionals, we found significant effects of condition,
F(2, 470) = 4.82, p < .01, and quality, F(3, 470) = 4.98,
p < .01. A Tukey post hoc analysis on each factor shows
that the effect of condition was driven by a significant
difference between 10 ms ± 3 ms and 10 ms, Z = 3.12,
p < .01, and a significant difference between 20 ms
and 10 ms, Z = 2.54, p < .05 (all p values were
adjusted using the Benjamini and Hochberg false
discovery rate correction; FDR = 5%).
FIGURE 3. Continuous input slider for rating the settings in comparison to one another.
116 Robert H. Jack, Adib Mehrabi, Tony Stockman, & Andrew McPherson
Influence of sample set. For both groups we tested to
ensure that sample set was not having an overriding effect
on quality ratings (i.e., that participants were basing
their ratings on sample set alone). When fitting the
LMER models, we also included sample set as a fixed
effect and found no significant effect, so we were able to
discount this as a factor.
In this paper our analysis of timing performance focuses
only on task 2, playing with a metronome. For this
analysis we compared the onset of the strike against the
FIGURE 4. (a) and (b) show the median and IQR of all responses from both participant groups. (c) and (d) show the mean and standard error of all
responses from both participant groups. 0 on the y-axis corresponds to ‘‘a is much better than b,’’ 100 to ‘‘b is much better than a,’’ and 50 to ‘‘both a
and b are equal.’’ Note that in this representation 0 always means that condition A (zero latency) is preferred to the other latency condition it is being
compared to.
onset of the metronome tone, looking for the difference
between the timing of the strike on the tile and the
metronome tone rather than the audio output of the
instrument, which had added latency under certain con-
ditions. The onset of each strike relative to that of the
metronome was defined as the synchronization error
(SE). The value was negative when the onset of the
strike preceded that of the metronome and positive
when the strike onset lagged behind the metronome.
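A sketch of the SE computation as defined above; pairing each strike with the nearest metronome tone is our assumption, since the matching rule is not spelled out here:

```python
def synchronization_errors(strike_onsets, metronome_onsets):
    """Pair each strike with the nearest metronome tone and return the
    signed synchronization error (SE) in seconds: negative when the
    strike precedes the tone, positive when it lags behind."""
    errors = []
    for strike in strike_onsets:
        nearest = min(metronome_onsets, key=lambda tone: abs(strike - tone))
        errors.append(strike - nearest)
    return errors
```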
Statistics. For the modeling we fitted an LMER
model with fixed effects of group (non-percussionists,
professional percussionists), temporal division
(crotchet, quaver, semiquaver) and condition (10 ms,
20 ms, 10 ms ± 3 ms), and random intercepts for each
participant. As with the quality judgment analysis, the
significance of each fixed effect was tested using a full
factorial Type III ANOVA on the LMER model, with
Satterthwaites’s degrees of freedom approximation.
Typical distribution. Figure 5 shows the typical dis-
tribution of strikes of both groups for each tempo mea-
sure and each latency condition. Figure 6 presents the
median and interquartile range (IQR) for all latency
conditions for both groups. For the NP group we
excluded one participant from our analysis due to their
having a mean synchronization error (MSE) more than
30% greater than the group MSE, leaving 10 participants
in this group. All 10 participants in the PP group had
an MSE within this threshold.
FIGURE 5. Distribution of strikes for both groups showing the spread of the timing of their strikes during the synchronization task. The medium gray
bar components reflect overlap between non-percussionists and professional percussionists.
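The exclusion rule above can be sketched as follows; comparing MSE magnitudes via absolute value is our assumption:

```python
def exclude_outliers(mse_by_participant, threshold=0.30):
    """Flag participants whose mean synchronization error magnitude
    exceeds the group mean MSE by more than `threshold` (30% in the
    study). Illustrative sketch of the exclusion rule, not the
    authors' code."""
    group_mse = sum(abs(v) for v in mse_by_participant.values()) / len(mse_by_participant)
    cutoff = group_mse * (1 + threshold)
    return [p for p, v in mse_by_participant.items() if abs(v) > cutoff]
```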
Synchronization error. Figure 6 shows the median
and IQR for all participants under each latency condi-
tion and division. We found a significant effect of condition,
F(3, 2289) = 5.88, p < .001, and division, F(2,
2289) = 3.73, p < .05. A Tukey post hoc analysis
showed that the effect of condition is driven by a significant
difference between 10 ms and zero latency,
Z = 3.56, p < .001, and between 20 ms and zero
latency, Z = 3.73, p < .001. A smaller and marginally
significant difference was seen between 10 ms ± 3 ms
and zero latency (p = .08). We also tested for interactions
between each fixed effect (by fitting new models
with interaction terms), and found a significant
els with interaction terms), and found a significant
interaction between group and division, F(2, 2288) ¼
12.52, p< .001. This effect is shown in Figure 7, where
synchronization error is negatively correlated with IOI
for the NP group, and the opposite effect is observed
for PP. A post hoc analysis of the interaction contrasts
between all factors of group and condition was con-
ducted using the phia package (De Rosario-Martinez,
2015) for R. This showed a significant interaction
between group and all three division factors: crotchet-
quaver, c
(1) ¼4.38, p<.05,crotchet-semiquaver,
(1) ¼24.98, p< .001, and quaver-semiquaver,
(1) ¼7.82, p<.01.
To assess the effect of division for each group, we refitted
separate models for each group, with fixed effects of
condition and division, and random intercepts for each
participant. In the case of the non-percussionists this
showed a significant effect of division, F(2, 1483) =
17.31, p < .001. A Tukey post hoc analysis showed that
the effect of division is driven by a significant difference
between crotchet-semiquaver, Z = 5.94, p < .001, and
between quaver-semiquaver, Z = 3.70, p < .001. In the
case of the professional percussionists this showed a significant
effect of condition, F(3, 806) = 8.82, p < .001. A
Tukey post hoc analysis showed that the effect of condition
is, as expected from the interactions above, driven
by a significant difference between 10 ms and zero
latency, Z = 3.04, p < .01, 20 ms and zero latency,
Z = 4.56, p < .001, and 10 ms ± 3 ms and zero
latency, Z = 4.30, p < .001.
FIGURE 6. Median and IQRs of synchronization error for the first rhythmic task for all latency conditions for both groups.
FIGURE 7. Interaction contrasts between division and group for synchronization error.
Variability of synchronization error. To evaluate the
variability of timing accuracy we refitted the above-mentioned
model but with standard deviation of the
synchronization error as the dependent variable. We
observed heteroskedasticity in the residuals of the fitted
model, which we rectified using a log transform of the
dependent variable (sderror). We found a significant
effect of group, F(1, 20) = 16.57, p < .001, division,
F(2, 220) = 5.46, p < .01, condition, F(3, 220) = 4.24,
p < .01, and an interaction between condition and division,
F(6, 220) = 3.50, p < .01. The mean standard
deviation between groups (across all conditions and divisions)
is 0.02 for non-percussionists and 0.01 for percussionists:
a difference of around 50%.
Upon testing for interaction contrasts between condition
and division we found the significant interactions
are between 20 ms and each of the crotchet-semiquaver,
χ²(1) = 13.16, p < .01, and quaver-semiquaver, χ²(1) =
8.44, p < .05, contrasts. This can be seen in Figure 8.
We noted a medium but nonsignificant positive corre-
lation between error and standard deviation of error
(i.e., as error decreases, so does its variation).
The structured interviews conducted at the end of the
study were annotated and then coded using a thematic
analysis framework (Braun & Clarke, 2006). Our coding
strategy aimed to identify the major themes that related
to latency perception and judgments of instrument
quality. Other themes that came from these interviews,
relating to style, the constraints of the instrument, and
the evolution of gesture over the duration of the study,
have been presented elsewhere (Jack, Stockman, &
McPherson, 2017).
Awareness of latency. Latency perception was the first
theme we investigated; whether the settings with latency
were perceived as having a delay or not. Only 3 out of
the 11 participants stated that there was latency or delay
changing between the settings. This suggests that either
the amounts of latency were small enough to not be
perceived as a delay for 8 of the 11 participants or that
the changing sample sets masked the changing latency
conditions. When asked what was changing between
settings aside from the sample set participants generally
reported a changing responsiveness and level of
dynamic control: they described shifting triggering
thresholds at times, that the instrument was catching
less of their strikes under certain settings, or that the
dynamic range of the instrument was expanding and
contracting, factors that were not in fact changing. In the
quality ratings from Part 1 of the study we saw
the zero latency condition receiving more positive ratings
than the 10 ms ± 3 ms and 20 ms latency conditions
for the attribute Temporal Control. This
suggests that a disruption to the temporal behavior
of the instrument was identified even if its cause was
not established as delayed auditory feedback. Some
participants also acknowledged that under certain con-
ditions they were struggling to maintain timing
although, again, they did not specifically identify that
a delay or latency was the cause:
‘‘ ...One was very difficult to keep some sort of
stable timing on, while the other one just clicked for
some reason and made a lot more sense.’’ (Partici-
pant 4)
‘‘On the second one (condition A) I didn’t have to
put much thought into it or didn’t have to tap myself
in or anything. It was just there under my finger
tips.’’ (Participant 10)
FIGURE 8. Interaction contrasts between condition and division for the standard deviation of synchronization error.
‘‘ ...I was playing very fast passages and seeing if it
captures all the notes. In some of the settings it
wasn’t tracking well but in others it was.’’ (Partici-
pant 8)
These quotes point towards the complexity of the
‘‘response’’ of the instrument: this term does not seem
to have been reduced to how fast the instrument
responded, but rather concerns how much of the participant’s
playing was translated into sound by the instrument.
Judgments seem to be based on how participants
felt the instrument was reflecting the energy they put in.
In addition to the above reports, there were also
multimodal effects of the latency conditions reported,
where the perceived effort required to play a note
increased with latency.
Reported effects of latency during the study. Four of
the 11 participants reported that under certain latency
conditions they felt they needed to strike the instrument
with more force to get the instrument to respond in the
way they wanted.
‘‘I also noticed that I had to put more energy into
one or other of the pairs to get a sound from the
instrument.’’ (Participant 9)
For these four participants we analyzed the variation
in striking velocity across latency conditions to test
whether force of strike was influenced by latency condition.
We found that for these participants there was
indeed an increased mean velocity of strike for 20 ms
and 10 ms ± 3 ms latency in comparison to the zero
latency condition (Jack, Stockman, & McPherson, 2016).
Awareness of latency. In general the professional per-
cussionists were more aware of the latency conditions
than the non-percussionists, with 9 out of 10 mention-
ing it as the changing factor between settings.
‘‘I felt like some of them were a bit more ‘on top’
[...]sometimes you felt like it wasn’t instantaneous
and you’re not connected to it.’’ (Participant 4)
‘‘The latency also changed as well and they [the
sample set] weren’t necessarily related.’’ (Partici-
pant 1)
‘‘ ...Sometimes there was a bit of a delay, sometimes
the note was behind the strike.’’ (Participant 3)
They were also more conscious of latency as an issue
that faces digital musical instruments. In some cases this
came from their experience of using digital samplers in
live performance or from experiences of home record-
ing with a backing track.
‘‘I have a set of TD Roland drums, and they have
latency, it’s very slight but I definitely notice it, it’s
more than it would be from an acoustic kit defi-
nitely.’’ (Participant 5)
Many of the participants also explicitly mentioned
latency as a negative factor in an instrument’s design
that impedes their performance.
‘‘ ...With percussionists, we’re so used to, you hit it
and, bang, it’s there. So any kind of delay is a bit
disconcerting.’’ (Participant 7)
‘‘When it sits on top it’s a lot more enjoyable to play.
I know when that happens you tend to forget that
you’re playing something, and you tend to explore,
you make music then, rather than trying to work out
the instrument.’’ (Participant 4)
Ability to adjust to action-sound latency. Participants
also spoke of their ability to adjust to the changing
latency conditions naturally and without too much
active thought when they were freely improvising with-
out a metronome.
‘‘Because I’ve got experience of adjusting I was able
to adjust to what I was hearing. I do that naturally.
When playing acoustic instrument you listen to
what’s coming out and you adjust to it. It’s always
a tiny little difference, you can adjust naturally. You
deal with it. You can get by.’’ (Participant 3)
‘‘We do have experience with working with delay
and trying to think about that. You need to com-
pensate so that you don’t sound late. You don’t really
think about it much normally, it’s too much if you
think about it, has to be by feel.’’ (Participant 7)
‘‘I was adjusting very quickly. If I was doing them all
with a click track I think the response I gave would be
different as I would actually feel myself trying to
adjust to where the beat was when there was latency,
like this I just did it without thinking.’’ (Participant 5)
Experience with acoustic instruments. One of the
main differences between the two groups was their
awareness and experience with latency. From the PP
group there were many comparisons made between
latency in a digital instrument and the timing adjustments
that orchestral percussionists have to make as they
switch their position in the orchestra or switch the
acoustic instrument they’re playing. Talk of ‘‘sitting
behind the beat,’’ ‘‘sitting in front of the beat,’’ and ‘‘sit-
ting right on the beat’’ described how the percussionists
conceptualize the microadjustments they make to their
timing in order to ensure that the conductor (and audi-
ence) hears them as in time with the rest of the ensemble.
When asked how they manage to adjust their playing
like this, most stated that they had no idea how they
actually did it, it was something that they had learned
from being told by a conductor or other performers that
they were coming in early or late and at this point in
their careers it was just a necessary part of their role that
they were able to do without thinking. They mentioned
that the dress rehearsal before a concert was the most
important in terms of making this adjustment as their
timing needs to be tuned to their position in the ensem-
ble and the acoustics of the room. Latencies within the
range of 10 ms to 40 ms are common in ensemble play-
ing due to the distance between players (Chafe, Cáceres,
& Gurevich, 2010). The importance of playing in time is
heightened by the rhythmical importance of the percus-
sion section and the impulsive style of playing.
‘‘If you’re sitting at the back of the orchestra the
physical sound getting to the front takes longer as
you’re so far back. And for certain instruments this
can take even longer. A lot of the time you have to
play a little bit ahead or behind the beat to make sure
it fits with everything else.’’ (Participant 3)
Percussionists must also adjust to the mechanical
action of the instrument they are playing. Professional
percussionists are multi-instrumentalists who are
expected to master and be able to switch between many
different instruments in a matter of seconds. This brings
with it the ability to switch playing techniques quickly
and to adjust playing style to the specific action of an
instrument—what the percussionists referred to as an
instrument speaking early or late. Tambourine was given
as an example of an instrument that sounds late, as were
tubular bells and timpani. Examples of instruments that
speak early included triangle and other metallic instru-
ments played with hard beaters. The notion of how an
instrument speaks seems to be related to the frequency range
of the instrument but also to surface hardness, the action
of the instrument (triangle versus church bell for exam-
ple), and striking type (hard versus soft beaters, played
with the hands or not), although conflicting examples
were given by different percussionists.
‘‘In the case of this instrument [the instrument used
in this study] it’s to do with the fact that it’s hard.
I know that hard surfaces sound immediately, whereas
floppy surfaces sound later, like timpani. I guess
that’s just sort of Pavlovian – it’s hard, it’s going to
sound quickly.’’ (Participant 7)
In terms of quality judgments, both groups seem to be
generally in agreement. The results from Part 1 suggest
that latency of 20 ms and 10 ms ± 3 ms can degrade the
perceived quality of an instrument in terms of temporal
control and general preference, even when the amount
of latency is too small to be perceived as a delay by the
performer. This is in agreement with findings from
Kaaresoja et al. (2011) when evaluating the impact of
audiotactile latency on user interaction with touchscreens.
The fact that condition D (10 ms ± 3 ms latency)
was rated in a similarly negative manner as condition C
(20 ms latency) in relation to the zero latency condition,
but that condition B (10 ms latency) did not receive
similar negative ratings, highlights the importance of
stable as well as low latency. This points to an agreement
with Wessel and Wright’s (2002) recommendations of
10 ms latency with 1 ms of jitter as a goal for digital
musical instruments.
None of the participants in this experiment per-
formed with a mean degree of accuracy in Part 2 that
was better than the jitter amount (±3 ms), yet this
condition was still rated negatively. This suggests that
subtle variation in the stability of the temporal response
of an instrument can be detected by performers even if
they cannot perform with a degree of accuracy that is
less than the jitter amount. These findings, alongside
previous work (Repp & Su, 2013), suggest that the
amount of acceptable latency and jitter does not corre-
spond directly to the limits of sensorimotor accuracy
possible by the player.
In the non-percussionist group, 3 of the 11 participants
identified latency, or delay, as the changing factor
between settings. For the other 8 participants the differ-
ence between settings was reported as a changing trig-
gering threshold or dynamic range, both of which
remained identical throughout the study.
In the professional percussionist group, 9 of the 10
participants reported latency or delay as the changing
factor between settings. It seems that this group was
much more aware of latency and its causes from their
experience as orchestral players, and were generally
better at discussing it, as can be seen in the examples
presented from the structured interview. Making micro-
adjustments to the timing of their performance in rela-
tion to an ensemble or to their instrument is a common
part of a professional percussionist’s role as a musician.
This may explain the difference between the groups,
alongside the superior synchronization ability
(Cameron & Grahn, 2014) and timing acuity (Ehrlé &
Samson, 2005) of percussionists, whether from their
extensive training on a rhythm-based instrument or
natural ability.
Effect of latency and beat division on mean
synchronization error. We found significant differences
between the effects of latency and beat division on the
timing ability of both groups in this study. There was
a significant effect of condition for both groups and
significant differences (i.e., interactions) between group
and beat division. For the NP group we found that
increasing division of the beat was affecting the accu-
racy of their playing, whereas latency condition showed
no significant effect: we observed the MSE and variation
of MSE increasing as the beat division increased. This
suggests that the error in their temporal performance
increased as they were required to strike faster. We
found no significant effect of latency condition on tim-
ing accuracy for this group.
The opposite seems to be the case for the PP group:
timing accuracy was significantly affected by latency
condition and not by beat division in most cases. For
this group the zero latency condition had a significantly
lower MSE in comparison to the other three latency
conditions across beat divisions. The standard deviation
of MSE under each latency condition did not differ
significantly for each beat division except for the
20 ms latency condition at the smallest beat division (semiquaver),
as can be seen in Figure 8. This suggests that for the
larger divisions, this group did not find latency disrup-
tive to timing accuracy: when they were required to play
at a speed above a certain threshold (IOI of 125 ms) the
latency became detrimental to their performance. From
our findings this was only the case with the 20 ms
latency condition, which would equate to 16% of the
IOI of 125 ms when playing semiquavers at 120 bpm,
well above the variation in timing accuracy from pro-
fessional percussionists that has been previously
reported (Dahl & Bresin, 2001).
Mean synchronization error. Generally, we observed
a higher degree of variation in the MSE of the NP group
in comparison to the PP group, as can be seen in Figure
6. This agrees with the findings of Manning and Schutz
(2016) that participants with high levels of rhythm-
based training (particularly percussionists) show supe-
rior timing abilities (MSE and variability of MSE) and
temporal acuity in comparison to other musicians and nonmusicians.
The group means of the MSE for the zero latency
condition across all metronome divisions ranged from 17 to 7
ms for NP and from 15 to 12 ms for PP. The mean
standard deviation ranged from 20 to 33 ms for NP and
8 to 12 ms for PP. The MSE and SD for the NP group
were larger than that found by Fujii et al. (2011) in their
study with highly trained percussionists where a mean
synchronization error of 13 to 10 ms was achieved for
a metronome with standard deviations of 10 to 16 ms,
whereas the MSE and SD of MSE for the PP are roughly
aligned with these findings. The MSEs of both groups
were smaller than those reported in previous tapping
studies with nonmusicians in which MSE was usually
around -20 to -80 ms, while for the NP group they were
roughly equivalent to the performance of amateur musi-
cians: 10 to 30 ms (Aschersleben, 2002; Repp & Doggett,
2007). The values of MSE for the PP group in this study
were smaller when compared with the finger tapping
study of Gerard and Rosenfeld (1995), who found an
MSE of -25 ms in professional percussionists.
A further analysis step that falls beyond the scope of
this paper would be to investigate systematic synchro-
nization errors in the performances of each of our
groups. In this respect part of the synchronization error
that we observed could be attributed to systematic and
reoccurring time deviances (Hellmer & Madison, 2015).
Adaptation and negative mean asynchrony. In the PP
group we also saw an increase in negative mean asyn-
chrony during the crotchet and quaver beat divisions
that partially reflected the amount of latency being
added to the instrument. We observed an increase in
MSE of approximately 5 ms and 10 ms for the 10 ms
and 20 ms latency conditions, respectively. This can be
seen quite clearly in Figure 6. It seems that there was
a degree of compensation in relation to the latency con-
dition but it was not an anticipation of the full latency
amount (i.e., moving a strike 20 ms earlier when 20 ms
of latency was present to bring the auditory feedback in
line with the metronome). Anticipation of strike to
match sound has been observed by others when intro-
ducing larger amounts of delay to auditory feedback
(Aschersleben & Prinz, 1997; Dahl & Bresin, 2001; Stenneken
et al., 2006). These anticipation effects were not
observed with the NP group for any latency condition.
We could also hypothesize that as a result of our exper-
imental method, where the amount of latency was chan-
ged regularly between conditions, the adaptation as
reported in other studies did not have enough time to
fully occur (Vroomen & Keetels, 2010).
Regardless of training, participants generally agreed on
judgments of perceived instrument quality. In the case
of the non-percussionists, even if the latency was not
perceived as a delay, its effect on the fluency of interac-
tion with the instrument was recognized by the partici-
pants. Timing accuracy in the non-percussionist group
was not significantly affected by the latency condition,
yet this group rated 20 ms and 10 ms ± 3 ms latency
negatively in comparison to the zero latency condition.
From the structured interviews there were reports that
certain conditions felt ‘‘under the fingers,’’ whereas with
others the connection between action and sound was
not as clear. This highlights the subtlety of the effects of
latency and the specific demands of percussion instruments,
where sound is the result of direct, unmediated contact.
In general the PP group was much more aware of
latency and able to identify it as the changing factor
between settings, and talk explicitly about adjusting for
latency. This is perhaps due to their extensive rhythmi-
cal training and expertise in switching between instru-
ments with different actions. Some of the percussionists
spoke about the changing latency conditions as the
changing action of the instrument: whether the instru-
ment would sound ‘‘late’’ or their playing would be
‘‘right on top’’ of the beat, allowing them to forget the
instrument and concentrate on making music. This
connects with ideas of instrument transparency: Nijs,
Lesaffre, and Leman (2009) propose musical instru-
ments as mediators between gesture and sound output.
Transparency in this mediation is the point where the
performer doesn’t need to focus attention on the indi-
vidual operations of manipulating an instrument,
instead focusing on higher-level musical intentions.
Latency in this interaction might be understood as a bar-
rier to transparency.
Latency perception and the effects of latency vary
widely dependent on the nature of the musical task,
style of playing, instrument, and individual experience
of the performer. From our study we cannot determine the acceptable amount of latency that digital musical instruments should aim for in general. Moreover, as our sample size is relatively small, our results should be interpreted with a degree of caution: statistical power is necessarily limited by the number of participants in each group. Our aim with this study, rather, is
to highlight the effects of small amounts of latency on
the perceived quality of an instrument, an effect that
we propose as similar to the degradation of feelings of
presence in VR situations: latency as ‘‘a cause for
reduction of suspension of disbelief’’ (Allison, Harris,
Jenkin, Jasiobedzka, & Zacher, 2001). In the case of
digital musical instruments, the notion of ‘‘presence’’ is perhaps best equated with the ergotic aspects of an instrument: how energy is maintained in the digital system in the translation of action to sound (Luciani et al., 2009). Latency stands as a barrier to the fluency of
interaction that digital musical instruments should
aim to foster.
Concluding Remarks
We have presented a study that investigated the impact
of latency and jitter on the temporal accuracy of perfor-
mance and judgments of instrument quality for two
groups of participants: professional percussionists and
non-percussionists (with varying amounts of musical
experience). The studies involved quality assessments
of a novel percussive instrument with variable latency
(zero, 10 ms, 10 ms ± 3 ms, 20 ms), temporal accuracy
tests and structured interviews.
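The four feedback conditions summarized above can be sketched as a simple transformation of strike-onset times (a toy illustration, not the experimental software; the condition values are those of the study):

```python
import random

# Toy sketch of the study's four feedback conditions: auditory feedback
# is delayed by a fixed latency plus, in the jitter condition, a random
# offset drawn uniformly from +/-3 ms.
CONDITIONS = {
    "zero":     (0.0, 0.0),
    "10ms":     (10.0, 0.0),
    "10ms±3ms": (10.0, 3.0),
    "20ms":     (20.0, 0.0),
}

def feedback_times(strike_times_ms, latency_ms, jitter_ms, rng=random):
    """Return the times at which auditory feedback would sound (ms)."""
    return [t + latency_ms + rng.uniform(-jitter_ms, jitter_ms)
            for t in strike_times_ms]

strikes = [0.0, 500.0, 1000.0]               # hypothetical strikes at 120 BPM
latency, jitter = CONDITIONS["10ms±3ms"]
print(feedback_times(strikes, latency, jitter))  # each onset shifted by 10 +/- 3 ms
```

The jitter condition differs from the fixed-latency conditions only in that the shift applied to each individual strike is unpredictable, which is what makes it impossible to compensate for by anticipation.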
In terms of judgments of instrument quality, we
found that both groups showed a preference for 0 ms
in comparison to 10 ms ± 3 ms and 20 ms latency.
Importantly, the 0 ms and 10 ms latency conditions
show no significant difference in rating for either group.
This suggests that a stable latency of 10 ms is acceptable
to performers of a DMI where 20 ms is not. The 10 ms ± 3 ms latency condition was rated in a similarly negative way to the 20 ms latency condition, suggesting that the addition of a random jitter of ± 3 ms is enough to negatively affect the perceived quality of an instrument. Our results support
the recommendation put forward by Wessel and Wright
(2002) that DMI designers should aim for a latency of
10 ms or below with a jitter of 1 ms or less. However, our
findings cannot tell us exactly what the minimum
threshold of acceptable latency is, except that it must
be somewhere between 10 ms and 20 ms.
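One reason the 10–20 ms range matters in practice is that buffering in a typical audio pipeline can exceed it on its own. A rough budget check against the Wessel and Wright figure (a sketch with hypothetical configuration values, not a measurement of any particular system) might look like:

```python
# Rough audio-path latency budget for a digital musical instrument.
# Hypothetical values; real systems add sensor-scan and driver overheads.

def buffer_latency_ms(block_size, sample_rate, n_periods=2):
    """Latency contributed by audio buffering alone, in milliseconds."""
    return 1000.0 * block_size * n_periods / sample_rate

# A common desktop configuration: 256-sample blocks, double-buffered, 44.1 kHz
desktop = buffer_latency_ms(256, 44100)   # ~11.6 ms: over budget before sensors
# A low-latency embedded configuration: 16-sample blocks at 44.1 kHz
embedded = buffer_latency_ms(16, 44100)   # well under 1 ms

BUDGET_MS = 10.0  # threshold suggested by Wessel and Wright (2002)
for name, latency in [("desktop", desktop), ("embedded", embedded)]:
    print(f"{name}: {latency:.1f} ms -> {'OK' if latency <= BUDGET_MS else 'over budget'}")
```

Embedded environments designed for musical interaction, such as that described by McPherson and Zappi (2015), achieve their low action-to-sound latency largely by running very small block sizes of this kind.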
Ability to perceive latency varied between groups, as
did the impact on temporal performance. Generally
professional percussionists were more aware of the
latency conditions and better able to adjust for them
in their playing, although this ability decreased as the
temporal demands of the task increased. We have seen
that latency negatively affects judgments of instrument
quality even when the amount of latency is not detectable
as a delay and has no impact on timing performance.
124 Robert H. Jack, Adib Mehrabi, Tony Stockman, & Andrew McPherson
Latency can degrade the illusion of action translating to sound, a factor that is central to expressive and skilled control of digital musical instruments. In this study we
have demonstrated the effects of latency on two different
groups of musicians and found marked differences
between each group in terms of disruption to timing
accuracy, and the ability to identify latency. Both groups
were in agreement as to the impact of latency on the
quality of the instrument in question. This suggests that
the influence of latency on the perceived quality of a dig-
ital system does not hinge on the temporal acuity of the
user, rather, it is something that can degrade the fluency
of the interaction regardless of level of skill.
Author Note
Correspondence concerning this article should be
addressed to Robert Jack, 110C Deptford High Street,
London, United Kingdom, SE8 4NS. E-mail:
References
E. M., & FIELD, M. (2003). Sensitivity to haptic-audio asynchrony. In S. Oviatt (Ed.), Proceedings of the 5th International Conference on Multimodal Interfaces (pp. 73–76). Vancouver, Canada: ACM.
ALLISON, R. S., HARRIS, L. R., JENKIN, M., JASIOBEDZKA, U., & ZACHER, J. E. (2001). Tolerance of temporal delay in virtual environments. In H. Takemura & K. Kiyokawa (Eds.), Proceedings of Institute of Electrical and Electronics Engineers (IEEE) Virtual Reality (pp. 247–254). Yokohama, Japan: IEEE.
ANNETT, M., NG, A., DIETZ, P., BISCHOF, W. F., & GUPTA, A. (2014). How low should we go? Understanding the perception of latency while inking. In P. Kry & A. Bunt (Eds.), Proceedings of Graphics Interface (pp. 167–174). Montreal, Canada: ACM.
ASCHERSLEBEN, G. (2002). Temporal control of movements in sensorimotor synchronization. Brain and Cognition, 48, 66–79.
ASCHERSLEBEN, G., & PRINZ, W. (1997). Delayed auditory feedback in synchronization. Journal of Motor Behavior, 29, 35–46.
ASKENFELT, A., & JANSSON, E. V. (1988). From touch to string vibrations – the initial course of the piano tone. Department for Speech Music and Hearing, Quarterly Progress and Status Report, 29, 31–109.
BATES, D., MÄCHLER, M., BOLKER, B., & WALKER, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48.
BLACK, J. W. (1951). The effect of delayed side-tone upon vocal rate and intensity. Journal of Speech and Hearing Disorders, 16.
BRAUN, V., & CLARKE, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3, 77–101.
CALVERT, G., SPENCE, C., & STEIN, B. E. (2004). The handbook of multisensory processes. Cambridge, MA: MIT Press.
CAMERON, D. J., & GRAHN, J. A. (2014). Enhanced timing abilities in percussionists generalize to rhythms without a musical beat. Frontiers in Human Neuroscience, 8, 1003.
CHAFE, C., CÁCERES, J.-P., & GUREVICH, M. (2010). Effect of temporal separation on synchronization in rhythmic performance. Perception, 39, 982–992.
CHAFE, C., & GUREVICH, M. (2004). Network time delay and ensemble accuracy: Effects of latency, asymmetry. In B. McQuaide (Ed.), Proceedings of Audio Engineering Society (AES) Convention 117. San Francisco, USA: AES.
DAHL, S. (2000). The playing of an accent – preliminary observations from temporal and kinematic analysis of percussionists. Journal of New Music Research, 29(3).
DAHL, S., & BRESIN, R. (2001). Is the player more influenced by the auditory than the tactile feedback from the instrument? In M. Fernström (Ed.), Proceedings of Digital Audio Effects (DAFX-01) (pp. 194–197). Limerick, Ireland: DAFX.
DE ROSARIO-MARTINEZ, H. (2015). phia: Post-hoc interaction analysis [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=phia (R package version 0.2-1)
DEBER, J., JOTA, R., FORLINES, C., & WIGDOR, D. (2015). How much faster is fast enough? In B. Begole & J. Kim (Eds.), Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (pp. 1827–1836). Seoul, Republic of Korea: ACM.
EHRLÉ, N., & SAMSON, S. (2005). Auditory discrimination of anisochrony: Influence of the tempo and musical backgrounds of listeners. Brain and Cognition, 58, 133–147.
ESSL, G., & O'MODHRAIN, S. (2006). An enactive approach to the design of new tangible musical instruments. Organised Sound, 11, 285–296.
FELS, S. (2004). Designing for intimacy: Creating new interfaces for musical expression. Proceedings of the Institute of Electrical and Electronics Engineers (IEEE), 92, 672–685.
FINNEY, S. A. (1997). Auditory feedback and musical keyboard performance. Music Perception, 15, 153–174.
KLAUER, G., MALAVOLTA, L., ET AL. (2015). Rendering and subjective evaluation of real vs. synthetic vibrotactile cues on a digital piano keyboard. In J. Timoney (Ed.), Proceedings of the Sound and Music Computing Conference. Maynooth, Ireland: SMC.
Y., & ODA, S. (2011). Synchronization error of drum kit playing with a metronome at different tempi by professional drummers. Music Perception, 28, 491–503.
FUJISAKI, W., & NISHIDA, S. (2009). Audio-tactile superiority over visuo-tactile and audio-visual combinations in the temporal resolution of synchrony perception. Experimental Brain Research, 198(2–3), 245–259.
Effect of different delayed auditory feedback intervals on a music performance task. Perception and Psychophysics, 15, 21–25.
GERARD, C., & ROSENFELD, M. (1995). Musical expertise and temporal regulation. Année Psychologique, 95.
GESCHEIDER, G. A. (1966). Resolving of successive clicks by the ears and skin. Journal of Experimental Psychology, 71.
GOEBL, W., & PALMER, C. (2008). Tactile feedback and timing accuracy in piano performance. Experimental Brain Research, 186, 471–479.
HELLMER, K., & MADISON, G. (2015). Quantifying microtiming patterning and variability in drum kit recordings. Music Perception, 33, 147–162.
HOWELL, P. (2001). A model of timing interference to speech
control in normal and altered listening conditions applied to
the treatment of stuttering. In B. Maassen, W. Hulstijn, R. Kent, H. Peters, & P. H. van Lieshout (Eds.), Speech motor control in normal and disordered speech (pp. 291–294). Nijmegen: Uitgeverij.
JACK, R. H., STOCKMAN, T., & MCPHERSON, A. (2016). Effect of latency on performer interaction and subjective quality assessment of a digital musical instrument. In J. Fagerlönn (Ed.), Proceedings of Audio Mostly (pp. 116–123). Norrköping, Sweden: ACM.
JACK, R. H., STOCKMAN, T., & MCPHERSON, A. (2017). Rich gesture, reduced control: The influence of constrained mappings on performance technique. In M. Gillies (Ed.), Proceedings of the 4th International Conference on Movement Computing. London, United Kingdom: ACM.
JOTA, R., NG, A., DIETZ, P., & WIGDOR, D. (2013). How fast is fast enough? A study of the effects of latency in direct-touch pointing tasks. In S. Brewster & S. Bødker (Eds.), Proceedings of the Special Interest Group on Computer–Human Interaction (SIGCHI) Conference on Human Factors in Computing Systems (pp. 2291–2300). Paris, France: ACM.
KAARESOJA, T., ANTTILA, E., & HOGGAN, E. (2011). The effect of tactile feedback latency in touchscreen interaction. In C. Basdogan (Ed.), IEEE World Haptics Conference (pp. 65–70). Istanbul, Turkey: IEEE.
KAARESOJA, T., & BREWSTER, S. (2010). Feedback is... late: Measuring multimodal delays in mobile device touchscreen interaction. In W. Gao, C. Lee, & J. Yang (Eds.), International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction. Beijing, China: ACM.
KAARESOJA, T., BREWSTER, S., & LANTZ, V. (2014). Towards the temporally perfect virtual button: Touch-feedback simultaneity and perceived quality in mobile touchscreen press interactions. ACM Transactions on Applied Perception, 11(2), Article 9.
KIETZMAN, M. L., & SUTTON, S. (1968). The interpretation of two-pulse measures of temporal resolution in vision. Vision Research, 8, 287–302.
KILCHENMANN, L., & SENN, O. (2011). ‘‘Play in time, but don’t play time’’: Analyzing timing profiles in drum performances. In A. Williamon, D. Edwards, & L. Bartel (Eds.), Proceedings of the International Symposium on Performance Science (pp. 593–598). Toronto, Canada: ISPS.
KRAUSE, V., POLLOK, B., & SCHNITZLER, A. (2010). Perception in action: The impact of sensory information on sensorimotor synchronization in musicians and non-musicians. Acta Psychologica, 133, 28–37.
KUZNETSOVA, A., BROCKHOFF, P. B., & CHRISTENSEN, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26.
LAGO, N., & KON, F. (2004). The quest for low latency. In M. Gurevich (Ed.), Proceedings of the International Computer Music Conference (pp. 33–36). Miami, USA: ICMC.
LESTER, M., & BOLEY, J. (2007). The effects of latency on live sound monitoring. Audio Engineering Society Convention, 123, 1–20.
E., & DUBOIS, D. M. (2000). The perception of cross-modal
simultaneity (or ‘‘the Greenwich observatory problem’’ revis-
ited). In D. Dubois (Ed.), AIP Conference Proceedings (pp. 323–
329). Liege, Belgium: AIP.
E. (2003). Perceptual differences in sequential stimuli across
patients with musician’s and writer’s cramp. Movement
Disorders, 18, 1286–1293.
LUCIANI, A., FLORENS, J.-L., COUROUSSÉ, D., & CASTET, J. (2009). Ergotic sounds: A new way to improve playability, believability and presence of virtual musical instruments. Journal of New Music Research, 38, 309–323.
MACKAY, D. G. (1987). The organization of perception and
action: A theory for language and other cognitive skills. Berlin,
Germany: Springer-Verlag.
MACKENZIE, I. S., & WARE, C. (1993). Lag as a determinant of human performance in interactive systems. In B. Arnold, G. van der Veer, & T. White (Eds.), Proceedings of the Interact’93 and Chi’93 Conference on Human Factors in Computing Systems (pp. 488–493). Amsterdam, Netherlands: ACM.
MAGNUSSON, T., & MENDIETA, E. H. (2007). The acoustic, the digital and the body: A survey on musical instruments. In C. Parkinson & E. Singer (Eds.), Proceedings of the 7th International Conference on New Interfaces for Musical Expression (pp. 94–99). New York, USA: NIME.
MÄKI-PATOLA, T., & HÄMÄLÄINEN, P. (2004). Latency tolerance for gesture controlled continuous sound instrument without tactile feedback. In M. Gurevich (Ed.), Proceedings of the International Computer Music Conference (pp. 1–5). Miami, USA: ICMC.
MANNING, F. C., & SCHUTZ, M. (2016). Trained to keep a beat: Movement-related enhancements to timing perception in percussionists and non-percussionists. Psychological Research, 80, 532–542.
MCPHERSON, A., JACK, R. H., & MORO, G. (2016). Action-sound latency: Are our tools fast enough? In S. Wilkie & E. Benetos (Eds.), Proceedings of the International Conference on New Interfaces for Musical Expression. Brisbane, Australia: NIME.
MCPHERSON, A., & ZAPPI, V. (2015). An environment for submillisecond-latency audio and sensor processing on BeagleBone Black. In B. Kostek & U. Zanghieri (Eds.), Audio Engineering Society Convention 138. Warsaw, Poland: AES.
MEDEIROS, C. B., & WANDERLEY, M. M. (2014). A comprehensive review of sensors and instrumentation methods in devices for musical expression. Sensors, 14, 13556–13591.
MEEHAN, M., RAZZAQUE, S., WHITTON, M. C., & BROOKS, F. P. (2003). Effect of latency on presence in stressful virtual environments. In J. Chen, B. Froehlich, B. Loftin, U. Neumann, & H. Takemura (Eds.), Proceedings Virtual Reality (pp. 141–148). Los Angeles, USA: IEEE.
MEREDITH, M. A., NEMITZ, J. W., & STEIN, B. E. (1987). Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. Journal of Neuroscience, 7.
NAVARRA, J., SOTO-FARACO, S., & SPENCE, C. (2007). Adaptation to audiotactile asynchrony. Neuroscience Letters.
NG, A., LEPINSKI, J., WIGDOR, D., SANDERS, S., & DIETZ, P. (2012). Designing for low-latency direct-touch input. In H. Benko & C. Latulipe (Eds.), Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology (pp. 453–464). Cambridge, MA, USA: ACM.
NIJS, L., LESAFFRE, M., & LEMAN, M. (2009). The musical instrument as a natural extension of the musician. In M. Castellango & H. Genevois (Eds.), Proceedings of the 5th Conference of Interdisciplinary Musicology (pp. 132–133). Paris, France: LAM-Institut Jean Le Rond d’Alembert.
OCCELLI, V., SPENCE, C., & ZAMPINI, M. (2011). Audiotactile interactions in temporal perception. Psychonomic Bulletin and Review, 18, 429–454.
O'MODHRAIN, S. (2011). A framework for the evaluation of digital musical instruments. Computer Music Journal, 35(1), 28–42.
PFORDRESHER, P., & PALMER, C. (2002). Effects of delayed auditory feedback on timing of music performance. Psychological Research, 66, 71–79.
PFORDRESHER, P. Q. (2003). Auditory feedback in music perfor-
mance: Evidence for a dissociation of sequencing and timing.
Journal of Experimental Psychology: Human Perception and
Performance,29, 949–964.
PFORDRESHER, P. Q. (2005). Auditory feedback in music perfor-
mance: The role of melodic structure and musical skill. Journal
of Experimental Psychology: Human Perception and
Performance,31, 1331–1345.
PFORDRESHER, P. Q. (2006). Coordination of perception and
action in music performance. Advances in Cognitive
Psychology,2, 183–198.
PFORDRESHER, P. Q. (2008). Auditory feedback in music perfor-
mance: The role of transition-based similarity. Journal of
Experimental Psychology: Human Perception and Performance,
34, 708–725.
PFORDRESHER, P. Q., & DALLA BELLA, S. (2011). Delayed auditory feedback and movement. Journal of Experimental Psychology: Human Perception and Performance, 37(2).
PFORDRESHER, P. Q., & PALMER, C. (2006). Effects of hearing the past, present, or future during music performance. Attention, Perception, and Psychophysics, 68, 362–376.
R CORE TEAM. (2017). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
RAMMSAYER, T., & ALTENMÜLLER, E. (2006). Temporal information processing in musicians and nonmusicians. Music Perception, 24, 37–48.
REPP, B. H. (2000). Compensation for subliminal timing per-
turbations in perceptual-motor synchronization. Psychological
Research,63, 106–128.
REPP, B. H., & DOGGETT, R. (2007). Tapping to a very slow beat: A comparison of musicians and nonmusicians. Music Perception, 24, 367–376.
REPP, B. H., & SU, Y. H. (2013). Sensorimotor synchronization: A review of recent research (2006–2012). Psychonomic Bulletin and Review, 20, 403–452.
SAITIS, C., GIORDANO, B. L., FRITZ, C., & SCAVONE, G. P. (2012). Perceptual evaluation of violins: A quantitative analysis of preference judgments by experienced players. Journal of the Acoustical Society of America, 132.
SCHULTZ, B. G., & VAN VUGT, F. T. (2016). Tap Arduino: An Arduino microcontroller for low-latency auditory feedback in sensorimotor synchronization experiments. Behavior Research Methods, 48, 1591–1607.
SHEFFIELD, E., & GUREVICH, M. (2015). Distributed mechanical actuation of percussion instruments. In E. Berdahl (Ed.), Proceedings of the International Conference on New Interfaces for Musical Expression (pp. 11–15). Louisiana, USA: NIME.
SPENCE, C., & PARISE, C. (2010). Prior-entry: A review. Consciousness and Cognition, 19, 364–379.
STENNEKEN, P., PRINZ, W., COLE, J., PAILLARD, J., & ASCHERSLEBEN, G. (2006). The effect of sensory feedback on the timing of movements: Evidence from deafferented patients. Brain Research, 1084, 123–131.
SCHLOSS, A. (2005). A comparison of sensor strategies for
capturing percussive gestures. In S. Fels (Ed.), Proceedings of
the 2005 Conference on New Interfaces for Musical Expression
(pp. 200–203). Vancouver, Canada: NIME.
FURUYA, S. (2014). Expert pianists do not listen: the
expertise-dependent influence of temporal perturbation on
the production of sequential movements. Neuroscience,269,
VROOMEN, J., & KEETELS, M. (2010). Perception of intersensory synchrony: A tutorial review. Attention, Perception, and Psychophysics, 72, 871–884.
WESSEL, D., & WRIGHT, M. (2002). Problems and prospects for intimate musical control of computers. Computer Music Journal, 26(3), 11–14.
WRIGHT, M., CASSIDY, R. J., & ZBYSZYNSKI, M. F. (2004). Audio and gesture latency measurements on Linux and OSX. In M. Gurevich (Ed.), International Computer Music Conference (pp. 423–429). Miami, USA: ICMC.
K., & WIGDOR, D. (2014). Zero-latency tapping: Using hover information to predict touch locations and eliminate touchdown latency. In M. Dontcheva & D. Wigdor (Eds.), Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (pp. 205–214). Honolulu, USA: ACM.
YATES, A. J. (1963). Recent empirical and theoretical approaches to the experimental manipulation of speech in normal subjects and in stammerers. Behaviour Research and Therapy, 1.
... In terms of perceived quality of stimuli, it has been reported that audiovisual misalignment exceeding 20 ms causes discomfort, particularly if the audio precedes the video [31,32]. In musical contexts, where temporal accuracy is extremely important, perceived quality of sound even in non-professionals has been reported to deteriorate with as little latency as 10 ms [33]. Therefore, although the latencies reported by Zhdanov et al. [25] are certainly low enough for smooth social interaction and accurate audiovisual integration of speech stimuli, further reduction in audiovisual latencies and misalignment is still desirable. ...
... Site-tosite audio signal latency was about 3 ms, in either direction, which is on par with the speed of transmission of telephone landline audio signals [25]. Moreover, audio latency was completely jitter free, and well below reported thresholds for human detection of musical quality deterioration, indicating that our system would additionally be suitable for communication paradigms based on musical stimuli [33]. Finally, the video signals had short latencies (60-100 ms) and small jitter (SD: 6.57 ms). ...
Full-text available
Communication is one of the most important abilities in human society, which makes clarification of brain functions that underlie communication of great importance to cognitive neuroscience. To investigate the rapidly changing cortical-level brain activity underlying communication, a hyperscanning system with both high temporal and spatial resolution is extremely desirable. The modality of magnetoencephalography (MEG) would be ideal, but MEG hyperscanning systems suitable for communication studies remain rare. Here, we report the establishment of an MEG hyperscanning system that is optimized for natural, real-time, face-to-face communication between two adults in sitting positions. Two MEG systems, which are installed 500m away from each other, were directly connected with fiber optic cables. The number of intermediate devices was minimized, enabling transmission of trigger and auditory signals with almost no delay (1.95–3.90 μ s and 3 ms, respectively). Additionally, video signals were transmitted at the lowest latency ever reported (60–100 ms). We furthermore verified the function of an auditory delay line to synchronize the audio with the video signals. This system is thus optimized for natural face-to-face communication, and additionally, music-based communication which requires higher temporal accuracy is also possible via audio-only transmission. Owing to the high temporal and spatial resolution of MEG, our system offers a unique advantage over existing hyperscanning modalities of EEG, fNIRS, or fMRI. It provides novel neuroscientific methodology to investigate communication and other forms of social interaction, and could potentially aid in the development of novel medications or interventions for communication disorders.
... In other words, exploring approaches to XAI for the arts could both inform the design of XAI for more safety-critical systems and lead to more intuitive and engaging co-creative AI systems. For example, music interaction provides an opportunity to study a system's sensitivity to time-critical parameters since real-time, understandable feedback is critical for musicians in co-creating with digital instruments [27,28]. ...
... Furthermore, the real-time interaction creates a sensation of "playing" the model and helps to recreate other familiar musical interfaces through the use of the piano-roll notation. Real-time feedback provides musicians with an assurance that their input is being received, increases accuracy in timing during use, and positively influences their perceptions of the quality and usability of a system [27]. Comparing this to other generative music systems which often take input at the command line, the use of pads and sliders and note visualisation on a piano-roll (commonplace in digital musical interfaces) is more intuitive and typical of musical interaction. ...
Conference Paper
Full-text available
Explainable AI has the potential to support more interactive and fluid co-creative AI systems which can creatively collaborate with people. To do this, creative AI models need to be amenable to debugging by offering eXplainable AI (XAI) features which are inspectable, understandable, and modifiable. However, currently there is very little XAI for the arts. In this work, we demonstrate how a latent variable model for music generation can be made more explainable; specifically we extend MeasureVAE which generates measures of music. We increase the explainability of the model by: i) using latent space regularisation to force some specific dimensions of the latent space to map to meaningful musical attributes, ii) providing a user interface feedback loop to allow people to adjust dimensions of the latent space and observe the results of these changes in real-time, iii) providing a visualisation of the musical attributes in the latent space to help people understand and predict the effect of changes to latent space dimensions. We suggest that in doing so we bridge the gap between the latent space and the generated musical outcomes in a meaningful way which makes the model and its outputs more explainable and more debuggable. The code repository can be found at: Exploring_XAI_in_GenMus_via_LSR
... Literature [9] applied RNN to the time-series recommendation task and designed an RNN model (GRU4Rec) based on the gated recurrent unit (GRU). Literature [10] preprocesses sequences based on GRU4Rec, and embeds Dropout for data augmentation to reduce the overfitting problem. Literature [11] proposes a hierarchical RNN model that incorporates cross-session information into the recommendation model to capture intrasession and intersession dependencies. ...
Full-text available
In order to improve the effect of mixed music recommendation, this study combines music genes to construct a mixed music recommendation system. From the analysis of the complexity of each joint inference algorithm, GP + RKF has the highest complexity compared with the other three joint inference algorithms. Moreover, this study verifies it through the running time of the simulation experiment, using the growth method as the way of mutation. In addition, while adopting the optimal individual retention strategy, this study makes the eigenvalues of the input matrix IX all fall within the unit circle or unit circumference and makes the maximum fitness value of the individuals in the population equal to the global optimal fitness value. Finally, this study constructs an intelligent system. Through the experimental research, it can be seen that the hybrid music recommendation system based on the fusion of music genes proposed in this study has a good music recommendation effect.
... Dans le champ de l'informatique musicale, ce problème est souvent esquivé au prétexte que l'oreille humaine ne peut généralement distinguer des déplacements rythmiques ou des latences inférieures à 10 ms [11,18]. Nous considérons cet argument comme faible pour plusieurs raisons : -Il ignore le fait que des décalages de phase de moins de 10 ms, bien qu'elle ne soient pas perçues comme des distorsions rythmiques, produisent des différences audibles tel que des filtrages en peigne, ou des artefacts de spatialisation comme l'effet Haas [15]. ...
... With the rapid development of computer technology, computercentric information processing plays an increasingly important role, and digital audio processing technology has also been rapidly developed. Different from the analog audio processing technology, the digital audio processing technology transforms the analog signal into a series of digital signals to be stored and transmitted after discretization in time and quantization in amplitude [2]. When the audio signal becomes digital, all processing is actually a digital processing. ...
Full-text available
In order to improve the feature extraction effect of digital music and improve the efficiency of music retrieval, this paper combines digital technology to analyze music waveforms, extract music features, and realize digital processing of music features. Taking the extraction of waveform music file features as the starting point, this paper combines the digital music feature extraction algorithm to build a music feature extraction model and conducts an in-depth analysis of the digital music waveform extraction process. In addition, by setting the threshold, the linear difference between the sampling points on both sides of the threshold on the leading edge of the waveform is used to obtain the overthreshold time. From the experimental research results, it can be seen that the music feature extraction model based on digital music waveform analysis proposed in this paper has good results.
... The effect of a given ATL may also depend on training and the instrument being played. For example, percussionists appear able to maintain tempo when experiencing both moderate and extreme delays while other instrumentalists (e.g., harpists and flutists) are potentially more affected due to melodic and agogic constraints (Jack et al., 2018;Delle Monache et al., 2019). Past work has also revealed differences in the effects of ATL among instruments with melodic constraints (Bartlette et al., 2006), as well as variations due to instrument entropy (Rottondi et al., 2015) and reverberation (Carôt et al., 2009;Farner et al., 2009). ...
Full-text available
Today’s audio, visual, and internet technologies allow people to interact despite physical distances, for casual conversation, group workouts, or musical performance. Musical ensemble performance is unique because interaction integrity critically depends on the timing between each performer’s actions and when their acoustic outcomes arrive. Acoustic transmission latency (ATL) between players is substantially longer for networked music performance (NMP) compared to traditional in-person spaces where musicians can easily adapt. Previous work has shown that longer ATLs slow the average tempo in ensemble performance, and that asymmetric co-actor roles and empathy-related traits affect coordination patterns in joint action. Thus, we are interested in how musicians collectively adapt to a given latency and how such adaptation patterns vary with their task-related and person-related asymmetries. Here, we examined how two pianists performed duets while hearing each other’s auditory outcomes with an ATL of 10, 20, or 40 ms. To test the hypotheses regarding task-related asymmetries, we designed duets such that pianists had: (1) a starting or joining role and (2) a similar or dissimilar musical part compared to their co-performer, with respect to pitch range and melodic contour. Results replicated previous clapping-duet findings showing that longer ATLs are associated with greater temporal asynchrony between partners and increased average tempo slowing. While co-performer asynchronies were not affected by performer role or part similarity, at the longer ATLs starting performers displayed slower tempos and smaller tempo variability than joining performers. This asymmetry of stability vs. flexibility between starters and joiners may sustain coordination, consistent with recent joint action findings. Our data also suggest that relative independence in musical parts may mitigate ATL-related challenges. 
Additionally, there may be a relationship between co-performer differences in empathy-related personality traits such as locus of control and coordination during performance under the influence of ATL. Incorporating the emergent coordinative dynamics between performers could help further innovation of music technologies and composition techniques for NMP.
... Musicians can often exhibit an instrument-specific style of movement that helps them to maintain the rhythm while playing [22].For computational accuracy when making the evaluation, we considered two cases: Firstly, that the beats that were exactly predicted by the algorithms and secondly, that the beats that lay within a bound of 100ms on either side of the ground truth beat's timestamp. The value of 100ms was chosen as since the videos are 29.75 frames per second (FPS), it means within three video frames either side of the audio beat event there must be a corresponding video beat event [23]. On executing the evaluation, we found that the best detection approach depends mostly on the types of instrument played and on the number of musicians playing, as shown in Fig 6. ...
Conference Paper
Full-text available
Musical performance is an expressive art form where musicians interact with each other using auditory and non-verbal information. This paper aims to discover a robust technique that can identify musical phases (beats) through visual cues derived from a musician's body movements captured through camera sensors. A multi-instrumental dataset was used to carry out a comparative study of two different approaches: (a) motiongram, and (b) pose-estimation, to detect phase from body sway. Decomposition and filtering algorithms were used to clean and fuse multiple signals. The final representations were analysed, from which estimates of the beat, based on a 'trust factor', were obtained. Motiongram and pose estimation were each found useful depending on the musical instrument, as some playing gestures stimulate more movement in the players than others. Overall, the results were most promising for the motiongram, which performed well where string instruments were used. The spatial-derivative technique based on human pose estimation was consistent for woodwind instruments, where only a small degree of motion was observed.
... The SoundBeam has been successful because of the ease of learnability of the novel vibroacoustic "beam" controller, although it can be problematic for users with limited movement [30]. Indeed, our own assessment of the SoundBeam found high latencies (greater than 100 ms) when the RF switches were used wirelessly which makes cognitive and motor usability difficult: large asynchronies between tactile and auditory feedback are disruptive to professional musicians during performance [52,53]. The SoundBeam also needs training for modification and troubleshooting because of the complex proprietary submenus. ...
Purpose: For older adults in aged care, group music-making can bring numerous physical and psychological benefits, ultimately improving their quality of life. However, personalising music-making to optimise these benefits is often difficult given their diverse ages, experiences, abilities, cognitive and motor skills, and their experience with music technology. Materials and methods: In this study, we conducted a 10-week group music-making intervention with twenty participants in an aged-care home, using a prototype digital musical instrument that we iteratively refined following a user-centred design approach based on direct resident feedback. The prototype instrument adopted a novel method for errorless learning in music-making settings, which we also refined by increasing the difficulty level of the instrument's operation. We also assessed the residents' engagement with the sessions by obtaining feedback from caregivers and facilitators. Results: Residents' enjoyment decreased as the complexity (difficulty) of our errorless learning implementation increased. We also found that resident engagement increased when changes to the prototype digital musical instrument were provided, but not when residents were giving feedback. Participation over the course of the intervention and the number of songs played during each session also enhanced engagement. Conclusions: Overall, our results show the intervention was beneficial to residents, although we note some areas of enhancement for further interventions in designing prototype musical instruments for group music-making in aged-care settings.
IMPLICATIONS FOR REHABILITATION
• Older adults positively engage with novel music technology, and do so increasingly over subsequent sessions. Repeated sessions may have the potential to enhance longer-term adoption of technologies as well as any rehabilitative effects of the group music-making activity.
• There is significant potential for residents with different abilities to all make music together, although to maximise the sustainability of the devices, the sessions, and the subsequent rehabilitative benefits, residents must be given the right adaptation for individual interfaces that balances ambition and ability.
• Rapid DMI prototyping positively enhances engagement among older adults, suggesting that in the case of a custom DMI, an upgrade schedule should be aligned with key rehabilitative milestones. Similarly, in the case of pre-developed digital music systems, resident exposure to new features or functionality should be strategically introduced, so as to maximise engagement for key phases of resident rehabilitation.
Pre-print version of the book "Sonic Interactions in Virtual Environments" in press for Springer's Human-Computer Interaction Series, Open Access license. The pre-print editors' copy of the book can be found at - full book info:
Conference Paper
Full-text available
This paper presents an observational study of the interaction of professional percussionists with a simplified hand percussion instrument. We reflect on how the sound-producing gestural language of the percussionists developed over the course of an hour session, focusing on the elements of their gestural vocabulary that remained in place at the end of the session, and on those that ceased to be used. From these observations we propose a model of movement-based digital musical instruments as a projection downwards from a multidimensional body language to a reduced set of sonic features or behaviours. Many factors of an instrument's design, above and beyond the mapping of sensor degrees of freedom to dimensions of control, condition the way this projection downwards happens. We argue that there exists a world of richness of gesture beyond that which the sensors capture, but which can be implicitly captured by the design of the instrument through its physicality, constituent materials and form. We provide a case study of this model in action.
Conference Paper
Full-text available
The importance of low and consistent latency in interactive music systems is well-established. So how do commonly-used tools for creating digital musical instruments and other tangible interfaces perform in terms of latency from user action to sound output? This paper examines several common configurations where a microcontroller (e.g. Arduino) or wireless device communicates with a computer-based sound generator (e.g. Max/MSP, Pd). We find that, perhaps surprisingly, almost none of the tested configurations meet generally-accepted guidelines for latency and jitter. To address this limitation, the paper presents a new embedded platform, Bela, which is capable of complex audio and sensor processing at submillisecond latency.
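The latency and jitter figures this kind of study reports reduce to the mean and spread of per-event action-to-sound delays. A minimal sketch of that computation (illustrative, not the paper's measurement code; inputs are paired timestamps in milliseconds):

```python
import statistics

def latency_stats(action_times_ms, sound_times_ms):
    """Mean action-to-sound latency and jitter (sample standard
    deviation) from paired action and sound-onset timestamps."""
    latencies = [s - a for a, s in zip(action_times_ms, sound_times_ms)]
    return statistics.mean(latencies), statistics.stdev(latencies)
```

Both numbers matter: a configuration can have acceptable mean latency yet fail on jitter, which is the distinction the abstract draws.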
Full-text available
One of the frequent questions by users of the mixed model function lmer of the lme4 package has been: How can I get p values for the F and t tests for objects returned by lmer? The lmerTest package extends the 'lmerMod' class of the lme4 package by overloading the anova and summary functions to provide p values for tests of fixed effects. We have implemented Satterthwaite's method for approximating degrees of freedom for the t and F tests. We have also implemented the construction of Type I - III ANOVA tables. Furthermore, one may also obtain the summary as well as the anova table using the Kenward-Roger approximation for denominator degrees of freedom (based on the KRmodcomp function from the pbkrtest package). Some other convenient mixed model analysis tools, such as a step method that performs backward elimination of nonsignificant effects - both random and fixed - calculation of population means, and multiple comparison tests together with plot facilities, are provided by the package as well.
Conference Paper
Full-text available
When designing digital musical instruments the importance of low and consistent action-to-sound latency is widely accepted. This paper investigates the effects of latency (0–20 ms) on instrument quality evaluation and performer interaction. We present findings from an experiment conducted with musicians who performed on a percussive digital musical instrument with variable amounts of latency. Three latency conditions were tested against a zero latency condition: 10 ms, 20 ms, and 10 ms ± 3 ms jitter. The zero latency condition was rated significantly more positively than the 10 ms with jitter and 20 ms latency conditions in six quality measures, emphasising the importance of not only low, but stable latency in digital musical instruments. There was no significant difference in rating between the zero latency condition and the 10 ms condition. A quantitative analysis of timing accuracy in a metronome task under latency conditions showed no significant difference in mean synchronisation error. This suggests that the 20 ms and 10 ms with jitter latency conditions degrade subjective impressions of an instrument, but without significantly affecting the timing performance of our participants. These findings are discussed in terms of control intimacy and instrument transparency.
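The mean synchronisation error used in such metronome tasks is simply the signed average of tap-minus-click asynchronies; negative values indicate the anticipation tendency typical of synchronization tapping. A hedged sketch (names are illustrative, not from the paper):

```python
def mean_sync_error(tap_times, click_times):
    """Signed mean asynchrony (tap minus click) in seconds;
    negative values mean taps anticipate the metronome click."""
    errors = [t - c for t, c in zip(tap_times, click_times)]
    return sum(errors) / len(errors)
```

Comparing this statistic across latency conditions is what supports the abstract's claim that subjective quality degraded while objective timing did not.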
Full-text available
Timing abilities are often measured by having participants tap their finger along with a metronome and presenting tap-triggered auditory feedback. These experiments predominantly use electronic percussion pads combined with software (e.g., FTAP or Max/MSP) that records responses and delivers auditory feedback. However, these setups involve unknown latencies between tap onset and auditory feedback and can sometimes miss responses or record multiple, superfluous responses for a single tap. These issues may distort measurements of tapping performance or affect the performance of the individual. We present an alternative setup using an Arduino microcontroller that addresses these issues and delivers low-latency auditory feedback. We validated our setup by having participants (N = 6) tap on a force-sensitive resistor pad connected to the Arduino and on an electronic percussion pad with various levels of force and tempi. The Arduino delivered auditory feedback through a pulse-width modulation (PWM) pin connected to a headphone jack or a wave shield component. The Arduino's PWM (M = 0.6 ms, SD = 0.3) and wave shield (M = 2.6 ms, SD = 0.3) demonstrated significantly lower auditory feedback latencies than the percussion pad (M = 9.1 ms, SD = 2.0), FTAP (M = 14.6 ms, SD = 2.8), and Max/MSP (M = 15.8 ms, SD = 3.4). The PWM and wave shield latencies were also significantly less variable than those from FTAP and Max/MSP. The Arduino missed significantly fewer taps, and recorded fewer superfluous responses, than the percussion pad. The Arduino captured all responses, whereas at lower tapping forces, the percussion pad missed more taps. Regardless of tapping force, the Arduino outperformed the percussion pad. Overall, the Arduino is a high-precision, low-latency, portable, and affordable tool for auditory experiments.
Conference Paper
Full-text available
The perceived properties of a digital piano keyboard were studied in two experiments involving different types of vibrotactile cues in connection with sonic feedback. The first experiment implemented a free playing task in which subjects had to rate the perceived quality of the instrument according to five attributes: Dynamic control, Richness, Engagement, Naturalness, and General preference. The second experiment measured performance in timing and dynamic control in a scale playing task. While the vibrating condition was preferred over the standard non-vibrating setup in terms of perceived quality, no significant differences were observed in timing and dynamics accuracy. Overall, these results must be considered preliminary to an extension of the experiment involving repeated measurements with more subjects.
Human performers introduce temporal variability in their performance of music. The variability consists of both long-range tempo changes and micro-timing variability, that is, note-to-note level deviations from the nominal beat time. In many contexts, micro-timing is important for achieving certain preferred characteristics in a performance, such as hang, drive, or groove; but this variability is also, to some extent, stochastic. In this paper, we present a method for quantifying the micro-timing variability. First, we transcribed drum performance audio files into empirical data using a very precise onset detection system. Second, we separated the micro-timing variability into two components: systematic variability (SV), defined as recurrent temporal patterns, and residual variability (RV), defined as the residual, unexplained temporal deviation. The method was evaluated using computer-performed audio drum tracks and the results show a slight overestimation of the variability magnitude, but proportionally correct ratios between SV and RV. Thereafter two data sets were analyzed: drum performances from a MIDI drum kit and real-life drum performances from professional drum recordings. The results from these data sets show that up to 65 percent of the total micro-timing variability can be explained by recurring and consistent patterns.
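The SV/RV split described in this abstract can be illustrated by averaging deviations per metrical position across bars (the recurring pattern) and treating what remains as residual. This is a simplified sketch of the idea, not the authors' method; inputs are note-level timing deviations in milliseconds:

```python
import statistics

def decompose_microtiming(deviations, beats_per_bar):
    """Systematic variability: the recurring mean deviation at each
    beat position; residuals: what that pattern leaves unexplained."""
    by_position = [[] for _ in range(beats_per_bar)]
    for i, dev in enumerate(deviations):
        by_position[i % beats_per_bar].append(dev)
    sv_pattern = [statistics.mean(p) for p in by_position]
    residuals = [dev - sv_pattern[i % beats_per_bar]
                 for i, dev in enumerate(deviations)]
    return sv_pattern, residuals
```

The variance of the reconstructed pattern versus the variance of the residuals then gives the proportion of micro-timing variability explained by recurring structure, analogous to the up-to-65-percent figure reported.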
This paper presents a new environment for ultra-low-latency processing of audio and sensor data on embedded hardware. The platform, which is targeted at digital musical instruments and audio effects, is based on the low-cost BeagleBone Black single-board computer. A custom expansion board features stereo audio and 8 channels each of 16-bit ADC and 16-bit DAC for sensors and actuators. In contrast to typical embedded Linux approaches, the platform uses the Xenomai real-time kernel extensions to achieve latency as low as 80 microseconds, making the platform suitable for the most demanding of low-latency audio tasks. The paper presents the hardware, software, evaluation and applications of the system.