Analytic vs. holistic approaches for the live search of
sound presets using graphical interpolation
Gwendal Le Vaillant
University of Mons, Numediart
(and IRISIB Lab, HE2B)
31 boulevard Dolez
7000 Mons, Belgium
glevaillant@he2b.be
Thierry Dutoit
University of Mons
Numediart, TCTS Lab
31 boulevard Dolez
7000 Mons, Belgium
thierry.dutoit@umons.ac.be
Rudi Giot
HE2B-ISIB
IRISIB Lab
150 rue Royale
1000 Brussels, Belgium
rgiot@he2b.be
ABSTRACT
The comparative study presented in this paper focuses on two
approaches for the search of sound presets using a specific geometric
touch app. The first approach is based on independent sliders on screen
and is called analytic. The second is based on interpolation between
presets represented by polygons on screen and is called holistic.
Participants had to listen to, memorize, and search for sound presets
characterized by four parameters. Ten different configurations of
sound synthesis and processing were presented to each participant,
once for each approach. The performance scores of 28 participants
(not including early testers) were computed using two measurements:
the search duration, and the parametric distance between the reference
and answered presets.
Compared to the analytic sliders-based interface, the holistic
interpolation-based interface demonstrated a significant performance
improvement for 60% of sound synthesizers. The other 40% led to
equivalent results for the analytic and holistic interfaces. Using sliders,
expert users performed nearly as well as they did with interpolation.
Beginners and intermediate users struggled more with sliders, while
the interpolation allowed them to get quite close to experts’ results.
Author Keywords
Presets, interpolation, graphical, touch, polygons, holistic, analytic,
sliders, mapping, working memory.
CCS Concepts
• Applied computing → Arts and humanities → Sound and music computing; • Human-centered computing → Human computer interaction (HCI) → Interaction devices → Touch screens; • Human-centered computing → Human computer interaction (HCI) → HCI theory, concepts and models.
1. INTRODUCTION
In the domain of parameter mapping for sound synthesis and processing, the most basic method is the one-to-one mapping [8]. It consists in assigning independent controls to parameters of a sound process; e.g. a large number of sliders or knobs can be one-to-one assigned to synthesis or filtering parameters. If each output
depends on several inputs, the mapping can be called many-to-many.
Other methods map fewer control parameters to the numerous
synthesis parameters, because a reduced set of possible inputs can be
more interesting for a performer [6]. To achieve this few-to-many
parameters mapping [8], it is possible to rely on graphical interpolation
of presets. The basic principle is to internally assign presets (i.e. sets of defined values for all the parameters of the controlled process) to
geometric shapes on a screen. Later, when a user moves a cursor
between these shapes, an interpolation is computed between the
underlying presets. The interpolation engine outputs values of all
parameters of the controlled synthesis process.
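As a concrete illustration of this principle, the following Python sketch blends full preset vectors using normalized weights. It only illustrates few-to-many mapping in general: the way the actual app derives the weights from the cursor position and the polygon geometry is not reproduced here.

```python
# Minimal sketch of few-to-many preset interpolation: each preset is a
# full vector of synthesis parameters, and the engine outputs a weighted
# blend of all presets. Weights are assumed to be normalized (sum to 1).

def interpolate_presets(presets, weights):
    """Return the weighted blend of full parameter vectors."""
    n_params = len(presets[0])
    return [sum(w * p[i] for w, p in zip(weights, presets))
            for i in range(n_params)]

# Four example presets of four parameters each (values in [0, 1]):
presets = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]

# A cursor halfway between the first two shapes:
print(interpolate_presets(presets, [0.5, 0.5, 0.0, 0.0]))
# -> [0.5, 0.5, 0.0, 0.0]
```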
Many systems currently offer graphical presets interpolation features. A widespread one is the Nodes object included in Max/MSP [3], which represents presets as disks. The system used for the present
experiment is based on a specific controller app, running on an iPad,
which allows free-form polygonal representations of presets. This is
very useful for comparing different preset-based interactions, e.g. using linear sliders or complex polygons, within a single app on a single touch screen.
The present work aims at measuring the performance of users confronted with preset-search tasks, in order to compare two different geometric approaches. All presets consist of four parameters. The first approach, called analytic [7], lets subjects directly control the synthesis parameters from four independent sliders. The second approach is based on graphical interpolation between four presets. It is called holistic [7] because subjects manipulate each preset (linked to a shape) as a whole.
Several experiments have already been conducted in the fields of
graphical presets interpolation (subsection 2.1) and analytic vs. holistic
mapping strategies (subsection 2.2). The first difference between this experiment and previous ones is the use of a single touch app for all graphical representations (see subsection 3.1). This removes the risk of introducing bias when using several different physical interfaces.
Another contribution of this experiment is the nature of the preset-
retrieving tasks. First, subjects had to listen to and memorize melodic and rhythmic loops from widely used synths (see subsection 3.2).
Secondly, they had to search for values of the synthesis parameters that
give a similar result. Their performance scores depend on:
• the parametric distance between answered parameter values and reference parameter values;
• the search duration (the shorter the better).
This process is similar to what an artist might do during an interactive music performance, a live DJ set, etc. Recent research on
Auditory Working Memory (AWM, see subsection 2.3) was taken
into account for the conception and realization of the experiment.
Results are presented in section 4 and discussed in section 5.
An additional contribution of this work is the search for correlations
between subjects’ expertise level in sound synthesis and processing,
and their performance for the given tasks.
2. STATE-OF-THE-ART
2.1 Graphical presets interpolation
An extensive listing of graphical presets interpolators has been
recently published by Gibson and Polfreman [4] and will not be
detailed here. The geometric interpolation method used in the
present experiment does not belong to that list, because it was
initially used for sound spatialization [11] only. It was developed
by the authors of this paper and has been recently repurposed as
a generic controller app with OSC (Open Sound Control) output.
For the specific topic of comparing graphical presets
interpolators, Gibson and Polfreman [4] proposed a framework
including existing solutions. They concluded that it was usable
for future formal experiments, without referring to previously
published ones. This framework was not used here because the presented experiment began a few months before their
publication. Moreover, our system brings free-form polygons
and useful network-related configuration features.
Later, Gibson and Polfreman [5] published a study of mouse-trace recordings for various visualizations of presets, all based on
simple shapes (blank screen, points, or disks on screen). Their
experiment resembles the one presented in this paper but does
not quantify user performances and relies on a preset-based,
holistic approach only.
2.2 Analytic vs. holistic approaches
Humans can apprehend musical systems using analytic and/or
holistic cognitive modes [7]. Analytic thinking tends to decompose input parameters and their effects. Holistic thinking is harder to describe; in the current context, it could mean that a system, from its inputs to its perceived outputs, is considered as a whole. These two cognitive modes are not
activated in the same way in everyone’s mind, e.g. Nisbett et al.
[17] speculated that they are influenced by social systems.
The main experiments comparing the analytic and holistic
cognitive modes for musical performances were conducted by
Hunt and Kirk [7]. Subjects had to reproduce parametric
trajectories using three different interfaces. The first and second
interfaces were made of four sliders with a one-to-one mapping
to synthesis parameters. The first relied on mouse-controlled on-
screen sliders, the second relied on hardware sliders. Both incited
subjects to think analytically. The third used a combination of physical sliders and mouse movements and buttons, with complex mappings to four sound parameters. Many users
developed a ‘feel’ for this interface and the authors considered
that users were thinking holistically.
Very briefly, the holistic interface gave better results only when reference parameters changed simultaneously. Users also
generally found it more ‘fun’ and engaging. When parameters
changed sequentially, better performances were obtained using
the analytic interfaces. Some subjects expressed a preference for
the analytic thinking.
These results are very interesting and call for complementary
experiments. They were obtained from different physical
interfaces, whereas the presented work aims at providing new
data about a modern touch interface, and about different geometric controllers on a single interface.
2.3 Auditory Working Memory (AWM)
As our experiment involves human auditory memory, it is necessary to briefly describe its organization, even though it is very complex. In terms of duration, auditory memory can be roughly divided into three categories:
• Long-term memory, in which information is encoded and stored [20]. This memory will not be involved much in the experiment, as subjects did not know the sounds before taking part in it.
• Short-term memory [2] (also called AWM), which stores information for a limited amount of time. It allows active mental manipulation of information and is the most involved in this experiment.
• Auditory sensory memory [2] (echoic memory), very brief and unconscious, which does not involve active mental manipulation of sensory traces.
AWM and echoic memory have been extensively studied, but
their durations still cannot be precisely defined. The nonverbal
echoic memory retention interval is considered to be in the range
of a few seconds [15], but “possibly up to 60s”. The AWM
lasts longer but information deteriorates with time. Soemer and
Saito [19] measured that the retention of auditory nonverbal
information was slightly better for a 3s than for a 12s interval.
The number of items that can be manipulated simultaneously
in the AWM is still being studied. A widely accepted figure is
four, according to Cowan's studies [2]. However, the AWM seems
to be more complex and “flexibly distributed among all items in
memory” according to a recent publication [14].
3. EXPERIMENT PROTOCOL
3.1 General organization
3.1.1 Population
Twenty-eight individuals (eight female) from 19 to 60 years old
took part in the final experiment and were included in the results.
One more person took part in it but was removed from the results
due to a technical issue. Moreover, four individuals (one female)
took part in the alpha and beta versions of the experiment to
calibrate various parameters. Their performances were not
included in the results.
One subject suffers from humeral agenesis (very short arms). However, he/she could perform the task normally, with good results, thanks to the touch interface on a rather small tablet screen. Thus, these results were fully integrated into the final data.
Among the total of 33 participants, around 10 were colleagues
or students of the authors. The population consisted of professional and amateur musicians, sound designers, engineers,
and people professionally unrelated to the domains of music or
engineering.
To start the experiment, the subjects had to read and accept a
consent form on the main computer. All results remain fully
anonymous. Videos were recorded but focused only on the
subjects’ hands on the tablet.
3.1.2 Graphical interpolation touch app
In order to study the analytic and holistic approaches, two
different representations can be displayed on the tablet using the
touch app. For a fair comparison, the analytic (Figure 1) and
holistic (Figure 2) views must contain the same number of items,
which is four, in coherence with Cowan's 'magical four' law [2].
Figure 1. Analytic slider-based representation
The analytic, decomposed representation consists of a basic one-to-one mapping from four graphical sliders to four
parameters. These parameters are described in subsection 3.2. In
the example of Figure 1, the parameter assigned to the yellow
slider has a 100% value and the one linked to the green slider has
a 50% value. The two other parameters have a 0% value.
The holistic representation (Figure 2) assigns presets instead
of parameters to the shapes on screen. Each preset is made of one
parameter at a 100% value, all others at 0%.
When moving the cursor (white dot) from a shape to another
overlapping shape, a smooth interpolation is computed between
the underlying presets. The method for computing interpolation
weights of polygons ensures the continuity of output values [19].
For example, on the left of Figure 2, interpolation weights are 50% for the 'green' preset, 50% for the 'purple' preset, and 0% for the two others. On the right, the interpolation weight is 100% for the 'red' preset and 0% for the others.
Figure 2. Two different preset interpolations obtained from
the same holistic interface
Colors of all graphical items on screen are randomized during
the experiment, and do not correspond to any particular
parameter or preset. With the sliders, cursors are constrained to
vertical lines. For the holistic view, the unique cursor can move
freely inside the square area delimited by the four polygons.
For a given synthesizer, both search steps must begin from
identical parameter values. To ensure this, the interpolation
cursor is initially positioned at the center of the screen, and
analytic sliders are initialized to a 25% value.
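As a small check of this initialization, assuming the centered cursor yields equal 25% weights for the four one-hot presets described above, both interfaces indeed produce the same starting parameter vector:

```python
# Sketch: identical starting values for both interfaces (assumes equal
# interpolation weights of 25% when the cursor is at the screen center).
presets = [[1.0, 0.0, 0.0, 0.0],   # each preset: one parameter at 100%
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
weights = [0.25, 0.25, 0.25, 0.25]  # cursor at the center

start_holistic = [sum(w * p[i] for w, p in zip(weights, presets))
                  for i in range(4)]
start_analytic = [0.25] * 4         # all sliders initialized to 25%
assert start_holistic == start_analytic
```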
3.1.3 Twenty listening and search cycles
The main experiment contained 10 different synthesizers with
their reference presets (see subsubsection 3.2.1). Each
synthesizer was presented two times: once with the sliders-based
interface, once with the interpolation. The core of the experiment
then contains 20 cycles of listening and search, for an
approximate duration of 20 minutes.
Durations of the different steps of a cycle are presented in
Table 1. They were defined considering scientific literature on
AWM and using observations and feedback from alpha and beta
tests. We observed that subjects lose the auditory trace of a reference sound when they spend too long refining their results. Thus, we tried several time limits for the search step. We finally chose 35s, which gives enough time to explore the interface and refine the result if necessary. Subjects were encouraged to give their answer (by pushing a green physical button) sooner in order to improve their performance score.
The white noise might help erase sensory traces in memory [19], in order to improve the independence of performances between cycles. The ordering was randomized, but at least 3 cycles separated the analytic and holistic presentations of a given synthesizer (a sketch of such a constrained shuffle is given after Table 1).
Table 1. Durations of the steps of a cycle (seconds)

Pause | Listening | Pause | Search  | Pause | White noise
  8   |    20     |   5   | 35 max. |  1.5  |      6
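The constrained shuffle mentioned above can be implemented by rejection sampling, as in the following sketch (the paper does not specify the exact randomization procedure, so this is only one plausible implementation):

```python
# Plausible sketch of the cycle ordering: shuffle the 20 cycles until
# the two presentations (S=Sliders, I=Interpolation) of each synthesizer
# are separated by at least 3 other cycles.
import random

def make_order(n_synths=10, min_between=3, seed=42):
    rng = random.Random(seed)
    cycles = [(s, iface) for s in range(n_synths) for iface in ("S", "I")]
    while True:
        rng.shuffle(cycles)
        positions = {}
        for i, (s, _) in enumerate(cycles):
            positions.setdefault(s, []).append(i)
        # |i - j| > min_between leaves at least min_between cycles between
        if all(abs(i - j) > min_between for i, j in positions.values()):
            return cycles

print(make_order())
```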
Prior to the 20 main experiment cycles, subjects had to test the whole setup. This test consisted of two trial cycles using two different synthesizers. One cycle presented a sliders-based
interface, the other presented an interpolation-based interface. At
the end of the experiment, subjects were asked a few questions
via a detailed form. The whole experiment lasted 30 minutes
maximum.
3.2 Sound parameters
3.2.1 Nature of sound loops and parameters
The controllable parameters were common but very diverse, e.g. gains,
waveforms, dry/wet ratios, cut-off frequencies, etc. They were
parameters of synthesizers and effects (low- or high-pass filters,
mixers, delays, reverbs, choruses, etc.) from Arturia Analog Lab [1]
and built-in Reaper [18] plug-ins. Subjects were only informed about
the possible nature of parameters, but not about their specific names.
Each synthesizer was played from its own MIDI loop (maximum
duration 5s), at its own tempo, in its own Reaper track. All reference
presets could be reached from both analytic and holistic interfaces. The
preset search tasks were tried and criticized by alpha and beta testers,
in order to obtain pleasant sounds and reasonable levels of difficulty.
To prevent extreme unmusical variations, a rescaling of parameter values could be applied inside the touch app.
3.2.2 Subjects’ performances measurements
A subject’s performance depended only on the validated final
result, not on trajectories on the touch screen. The formula for
computing performances does not include psychoacoustic
differences between the result and reference sounds but relies on
parametric differences. We make the assumption that plug-in controls are calibrated such that a linear-scale control has a perceptually uniform effect; e.g. a frequency slider from 0 to 1 controls a physical frequency on a logarithmic scale.
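For instance, under this assumption, a linear slider driving a cut-off frequency would be mapped exponentially (the 20 Hz to 20 kHz bounds below are illustrative, not taken from the experiment):

```python
# Illustration of the calibration assumption: a linear slider in [0, 1]
# controls a frequency on a logarithmic (perceptually uniform) scale.
F_MIN, F_MAX = 20.0, 20000.0   # illustrative bounds

def slider_to_hz(x):
    return F_MIN * (F_MAX / F_MIN) ** x

print(slider_to_hz(0.0))   # 20.0 Hz
print(slider_to_hz(0.5))   # ~632.5 Hz (geometric midpoint)
print(slider_to_hz(1.0))   # 20000.0 Hz
```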
The total error $\bar{E}$ is the normalized sum of the four parametric errors (differences between target and user values of the parameters). Given the search duration $t$ (in seconds), the performance $P$ is:

$$P(\bar{E}, t) = \max\left(0,\ 1 - \frac{\bar{E}}{0.55}\right)\cdot\left(1 - \frac{t}{87}\right)$$
Figure 3. Performance evaluation function (surface plot)
By considering both the speed and error, this performance
function evaluates a live situation, with a linear decrease on both
axes. It has been calibrated using data from alpha and beta
experiments. To prevent good scores for very fast but quite
wrong answers, an average error of 0.55 gives a 0% score.
Nonetheless, to encourage subjects to search as fast as possible, the maximum performance decreases with time. The 87s factor was chosen after the 0.55 threshold, in order to obtain an average score around 50%.
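The performance function, as reconstructed above, can be implemented directly; the sketch below reproduces its two linear factors (the exact formula and the alternative calibrations of subsection 5.1 are available in the published Python scripts, see section 10):

```python
# Performance function matching the reconstruction of subsection 3.2.2:
# a linear decrease along both the error axis and the duration axis.

def performance(avg_error, duration_s, e_max=0.55, t_max=87.0):
    error_factor = max(0.0, 1.0 - avg_error / e_max)  # 0% at a 0.55 error
    time_factor = 1.0 - duration_s / t_max            # max score decays with time
    return error_factor * time_factor

print(performance(0.0, 0.0))    # perfect instantaneous answer -> 1.0
print(performance(0.55, 10.0))  # error at the threshold -> 0.0
print(performance(0.10, 35.0))  # accurate but slow answer -> reduced score
```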
3.3 Hardware setup
3.3.1 General hardware setup
Figure 4. Connections between computers and peripherals
While subjects performed the preset search tasks on the tablet,
Reaper ran synthesizers and filters on the main computer. To play and stop loops, the manager app (also running on the main computer) controlled Reaper via local OSC messages. The main screen's UI
was dynamic and depended on the cycle step: it displayed initial
and final forms, countdowns, progress bars, scores, etc.
Once launched, the system was autonomous. Besides the OSC connection from the iPad app to Reaper, the experiment manager kept a TCP/IP connection open to the iPad, to control the touch app and monitor its status. It also opened a local OSC connection to Reaper to control it.
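To illustrate these links, the sketch below sends such messages with the python-osc package. The address patterns and ports are hypothetical, invented for illustration: the actual OSC configuration of the manager app, Reaper, and MIEM Play is not given in the paper.

```python
# Hypothetical OSC traffic (addresses and ports are invented for
# illustration; they are not the actual configuration of the system).
from pythonosc.udp_client import SimpleUDPClient

reaper = SimpleUDPClient("127.0.0.1", 8000)   # local OSC link to Reaper
reaper.send_message("/play", 1.0)             # e.g. start the current loop

# The touch app could emit interpolated parameter values like this:
params_out = SimpleUDPClient("127.0.0.1", 8001)
params_out.send_message("/interp/output", [0.5, 0.5, 0.0, 0.0])
```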
Participants could always see the main computer’s screen and
the tablet. They were seated in an adjustable office chair, inside a cabin lined with acoustic foam, itself covered with black curtains in order to minimize visual stimuli. They were wearing
Sennheiser PXC 480 noise-cancelling headphones connected to
a Focusrite USB audio interface.
Figure 5. Subject during a search phase
The tablet screen was black during all steps but the search one.
During this step, a large white bar (Figure 5) displayed the
remaining time. The small bar (screen bottom) is the global
progress bar. In other cases, the main screen displayed various
information such as a countdown or the last performance score.
3.3.2 System latencies
High latencies might impair user performances and must be
estimated for such an experiment. Values from 20 to 30ms are considered acceptable for instrumental musical applications [10].
Commercial touch screens present high latencies (dozens of
milliseconds) and 15% of subjects notice visual latencies of
40ms for touch pointing tasks [9].
Using a custom latency measurement system (not published
yet) based on a microcontroller, LEDs and fast photodiodes, the
‘touch drag’ latency was estimated for visual and audio feedback. For this experiment, the drag latency is more relevant than the ‘touch tap’ audio latency (which is nonetheless easier to measure, e.g. using a single microphone [12]).
The latency from a movement on the touch screen to audio feedback was estimated at 49ms (SD=10ms, n=1073).
Considering the number of interfaces involved, it remains quite
low thanks to the highly reactive iPad touch screen, the efficient
C++ implementation and the reliable wired network. The latency from a touch move to visual feedback was estimated at 78ms (SD=5ms, n=1073). This rather high figure comes from the internal management of graphics buffers, which could be improved.
Although these latencies might slightly decrease users’
performances, they were not considered as an issue by subjects
(see subsubsection 4.3.1).
3.4 Gamification
A literature review by Lumsden et al. [13] states that introducing
game elements in user tasks improves their motivation and
enjoyment, reduces test anxiety and increases long-term
engagement. Ninaus et al. [16] concluded that “game elements
facilitated the individuals’ performances closer to their
maximum working memory capacity”.
Thus, the user interfaces and the general organization of this experiment were designed following some basic principles of gamification. According to the theory of the ‘state
of flow’ [16], a gamified experiment should provide clear goals,
feedback, playability and a sense of control.
4. RESULTS
4.1 Sliders vs. interpolation comparison
In general, the interpolation-based interface allows a significant
performance improvement over sliders (Figure 7). The overall
average performance is 47%.
Figure 7. Histograms (and kernel density estimates) of all
measured performances, sorted by control interface
However, Figure 6 shows that results actually vary depending
on the synthesizer and its associated reference preset. For six of
them (IDs 3, 4, 5, 6, 7, 8), performances obtained from the
interpolation-based interface are significantly higher (p-values <
0.001, Wilcoxon signed-rank test). Median performance values are about twice as high as those obtained from the sliders-based interface. For three synthesizers (IDs 0, 2, 9), the
interpolation seems to slightly improve performances, but results
are not significant (p-values are 0.221, 0.055 and 0.122,
respectively). Synthesizer 1 presents quite similar results for
both interfaces (p-value = 0.923).
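Such paired, non-parametric comparisons can be reproduced with SciPy, as sketched below on synthetic placeholder scores (the real per-subject data is in the repository linked in section 10):

```python
# Sketch of the paired comparison: Wilcoxon signed-rank test on the 28
# per-subject scores of one synthesizer, for both interfaces.
# The scores below are synthetic placeholders, not the measured data.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
scores_sliders = rng.uniform(0.0, 0.8, size=28)
scores_interp = scores_sliders + rng.uniform(0.0, 0.4, size=28)

stat, p_value = wilcoxon(scores_sliders, scores_interp)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.4f}")
```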
4.2 Effect of subjects’ expertise level
Figure 8 represents average performances, sorted by the subjects' self-estimated level of expertise in digital sound synthesis and processing. This level was estimated by the subjects themselves in the final questionnaire, using detailed descriptions of all levels (from 1 to 4). This figure contains the 28 average user
performances obtained from the sliders-based interface,
compared to the 28 average performances from interpolation. It
also shows the best polynomial fits for these sets of points: a 2nd
order fit for sliders data, a 1st order fit for interpolation data. The
2nd order fit for sliders data minimizes the root mean square error and maximizes the $R^2$ coefficient of determination.
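The polynomial fits and the $R^2$ criterion can be computed as follows (a sketch; the level and score arrays below are placeholders standing in for the 28 per-subject averages):

```python
# Sketch of fitting the trends of Figure 8 and comparing fit orders
# through the coefficient of determination (placeholder data).
import numpy as np

levels = np.array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4])                    # placeholder
scores = np.array([.45, .40, .35, .38, .33, .42, .50, .48, .60, .62])

def r_squared(x, y, order):
    coeffs = np.polyfit(x, y, order)
    residuals = y - np.polyval(coeffs, x)
    return 1.0 - residuals.var() / y.var()

for order in (1, 2):
    print(f"order {order}: R^2 = {r_squared(levels, scores, order):.3f}")
```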
Average performances using the interpolation interface are quite consistent across all expertise levels, though they seem to rise slowly as the level increases. Experts present the smallest
performance difference between the two approaches. Results
from the sliders interface show a pronounced performance
increase between levels 3 and 4, but also a slight decrease from
level 1 to level 2. This is unexpected and will be discussed in
subsection 5.4.
Figure 6. Measured performances of all subjects, sorted by synthesizer ID and interface (S=Sliders, I=Interpolation)
Figure 8. Average user performances, sorted by their self-
estimated expertise level about sound synthesis and effects
4.3 Subjects’ opinion
4.3.1 General opinion on the experiment
All participants gave positive feedback on the experiment and described the sounds and interfaces as ‘fun’ or ‘enjoyable’.
When asked, many expressed a feeling of being in the ‘state of
flow’ [16]: they felt immersed in their tasks. None of them
noticed the audio latency, and the whole system was described
as ‘reactive’. Thus, their performances should be close to the best
they could do.
Some participants were nonetheless a bit frustrated by the low performance scores displayed on the screen when they forgot or could not find the preset. Some also reported being a bit stressed by the
bar displaying the remaining time. These two elements seem to
be downsides of gamification and might have lowered
performance results. A few people felt some fatigue at the end.
4.3.2 Sliders vs. interpolation interfaces
Figure 9. Answers to the questions:
Which interface is the […]?
At the end of the experiment, subjects were asked to select their
preferred interface depending on four criteria (Figure 9). Their
interface of choice was generally the interpolation, but a few users expressed a preference for the sliders. Seven participants considered the sliders to be the most precise.
5. DISCUSSION
5.1 Performance evaluation function
The function described in subsection 3.2.2 relies on the (0.55, 87s) pair of calibration values, and many other pairs were tried during post-processing. For instance, two alternative (error, duration) calibrations give the same 47% overall average performance, but do not significantly change the comparative results of section 4.2.
A calibration that evaluates the error only (with no time penalty) gives the same 47% overall average score, and does not significantly change results either. A calibration that evaluates the speed only, with a 47% average score as well, leads to differences for synthesizers 0 and 9 only: the interpolation method shows a significant improvement.
However, these last two calibrations do not properly evaluate the task, because participants were explicitly told to give an accurate and fast answer. For detailed results about all the
different evaluations, please refer to the provided data and
Python scripts (section 10).
5.2 True holistic/analytic representations?
One might rightfully ask whether the sliders- and interpolation-
based interfaces are truly analytic and holistic, respectively.
Regarding the sliders, all subjects easily understood during the trials that each slider has its own effect, independent of the others. This corresponds to an analytic situation.
The question is more complex for the interpolation. Thanks to the complexity of the presented sounds, most people did not realize that each preset consisted of one parameter at a 100% value and three parameters at 0%. They were then navigating
between sound presets, which is a holistic cognitive mode.
However, several experts reported noticing a link between presets and some parameters (for some synthesizers, not all of them). Some of them were then trying to think analytically
about the holistic interpolation, maybe because it was more usual
and natural to them. This might explain why performances with
both methods are quite similar for experts (Figure 8) and
indicates that the polygon-based interface was not 100% holistic
for all participants. Nonetheless, all subjects (experts included)
had to navigate in-between the shapes of the interpolation view
at some point. Thus, purely analytical thinking was probably not possible. The interpolation is therefore considered more holistic than analytic, even for level-4 participants.
5.3 Holistic vs. analytic comparison
Results from Figure 7 and Figure 6 show a general performance
improvement for the holistic approach. The analytic method
showed no clear improvement over the other one, which confirms earlier studies [8]. It is also consistent with Cowan's ‘magical 4’ law [2]: the task of manipulating four independent
sliders tends to saturate our AWM capacity, while the
interpolation cursor might be a complex but single item. This
reasoning is true as long as participants actually use a holistic
cognitive mode when playing with the interpolation cursor.
This general improvement was nonetheless not obvious. For
example, in a very general context, Nisbett et al. [17] mention
that Western societies tend to be more analytical while East Asian societies adopt a more holistic approach. Not all people
are prone to a more holistic thinking.
From the participants’ point of view, the holistic control was
clearly perceived as the fastest and the most intuitive. It was in
general considered more precise and was preferred by a large
majority of individuals. However, this preference was certainly
influenced by better performances: people tend to prefer what they succeed at. In a different experiment, the analytic approach might be preferred if it gave better scores.
A major downside of the holistic approach is the time needed
to prepare presets. Because this task had been carried out by the authors, this inconvenience does not influence the opinion results.
5.4 Expertise-performance relations
In Figure 8, the general and progressive increase of performance seems quite natural. The more experienced an individual is, the more likely he or she is to identify sound characteristics, differences and similarities. That said, it is very interesting to observe that the holistic interface allows non-expert participants to perform almost as well as experts.
The slider-based performance rise between levels 2 and 4 is
not surprising either, because amateurs and professionals are
used to this common kind of analytic interface. The unforeseen result is the rather good performance of level-1 subjects using sliders. Thanks to the performance evaluation function (which gives a 0.0 score for a 0.55 error, see subsubsection 3.2.2), this cannot come from random answers (‘beginner's luck’).
This interesting phenomenon might come from their total lack
of analytic training in sound and music. By combining measured data with remarks from participants, we speculate that this lack forces these individuals into some kind of intuitive exploration
of sound spaces, which eventually gives better results than a pure
analytic exploration. The lower performance of level-2 subjects
might come from a desire to fully analyze all sound
characteristics, without having the training to do so. Their AWM
might reach its maximum capacity, which causes a loss of the auditory trace of the reference sound and leads to incorrectly answered presets.
This speculation of course needs further research to be
confirmed. The fit curves presented in Figure 8 require more data to become really meaningful and usable, because the orders of the best polynomial fits could change.
6. CONCLUSION
This comparative study presented measurements of user
performances for preset search tasks. Twenty-eight individuals
had to listen to sound loops played from synthesizers and effects,
memorize them, then search for the values of four parameters of
the sound synthesis process. The search tasks were carried out
on two different controllers, based on usual sliders (analytic) or
based on graphical interpolations between polygon shapes
(holistic). These two representations were presented in the same low-latency touch app, MIEM Play, developed by the authors.
User performances were computed from the time they needed
to finish the search task, and from the parametric distance
between the reference and answered values. Results show that
the holistic approach led to much better performances for 60%
of the ten sounds presented, and similar performances for the
remaining 40%. Links to anonymous recorded data and the OSC-
controller touch app are available in section 10.
Moreover, average performances were sorted by participants’
expertise in sound synthesis and processing. As expected, the
more experienced users generally got better scores. However, the
holistic approach allowed neophytes to get results very close to experts' performances. An interesting variation was observed for the least experienced users on the analytic interface: they did not get the lowest scores. These observed trends, however,
require a larger set of data to be confirmed.
This experiment was entirely based on presets made of four
parameters. This figure is widely accepted as the approximate number of items that can be actively manipulated simultaneously in our AWM, but it is currently being questioned. Thus, some other
analytic-holistic comparative studies are planned in order to
obtain data about higher numbers of parameters. The touch app
used here allows free-form representation of presets, so it can
also be employed to formally compare different holistic
interpolation-based graphical approaches.
7. ACKNOWLEDGMENTS
We would like to thank Jean-Luc Boevé for proof-reading this
paper and for his ideas, and all participants for their contribution.
8. COMPLIANCE WITH ETHICAL
STANDARDS
This research was funded by IRISIB, a public research institute.
All participants read and validated a digital consent form,
informing them of the anonymous usage of stored data for
scientific purposes only.
9. REFERENCES
[1] Arturia. 2020. Analog Lab Overview. Retrieved from
https://www.arturia.com/products/analog-classics/analoglab
[2] N. Cowan. 2001. The magical number 4 in short-term
memory: A reconsideration of mental storage capacity.
Behavioral and brain sciences 24, 1 (Feb. 2001), 87-114.
[3] Cycling ’74. 2020. Retrieved from https://cycling74.com
[4] D. Gibson and R. Polfreman. 2019. A framework for the
development and evaluation of graphical interpolation for
synthesizer parameter mappings. In Proceedings of the Sound
and Music Computing Conference (SMC ’19). Malaga, Spain.
[5] D. Gibson and R. Polfreman. 2019. A Journey in
(Interpolated) Sound: Impact of Different Visualizations in
Graphical Interpolators. In Proceedings of the 14th International
Audio Mostly Conference (AM'19) (Nottingham, UK). 215-218.
[6] C. Goudeseune. 2002. Interpolated mappings for musical
instruments. Organised Sound 7, 2 (Aug. 2002), 85-96.
[7] A. Hunt and R. Kirk. 2000. Mapping Strategies for Musical
Performance. In Wanderley, M.M. and Battier, M. eds. Trends
in Gestural Control of Music. IRCAM Centre Pompidou, Paris,
France, 231-258.
[8] A. Hunt, M.M. Wanderley and M. Paradis. 2002. The
importance of parameter mapping in electronic instrument
design. In Proceedings of the 2002 Conference on New
Instruments for Musical Expression (Dublin, Ireland).
[9] R. Jota, A. Ng, P. Dietz and D. Wigdor. 2013. How fast is
fast enough? A study of the effects of latency in direct-touch
pointing tasks. In Proceedings of the SIGCHI conference on
human factors in computing systems (CHI ’13) (Paris, France).
ACM Press, New-York, NY, 2291-2300.
[10] N. Lago and F. Kon. 2004. The Quest for Low Latency. In
Proceedings of the International Computer Music Conference
(ICMC 04) (Miami, FL). 423-429.
[11] G. Le Vaillant and R. Giot. 2014. Multi-touch Interface for
Acousmatic Music Spatialization. In Proceedings of the
International Computer Music Conference (Athens, Greece).
[12] G. Le Vaillant, G. Villée and T. Dutoit. 2017. Portable C++
Framework for Low-Latency Musical Touch Interaction with
Geometrical Shapes. In Proceedings of the International
Computer Music Conference (Shanghai, China).
[13] J. Lumsden, E. A. Edwards, N. S. Lawrence, D. Coyle and
M. R. Munafò. 2016. Gamification of cognitive assessment and
cognitive training: a systematic review of applications and
efficacy. JMIR serious games 4, 2 (Jul. 2016), e11.
[14] W. J. Ma, M. Husain and P. M. Bays. 2014. Changing
concepts of working memory. Nature neuroscience 17, 3 (Mar.
2014), 347.
[15] M. A. Nees. 2016. Have we forgotten auditory sensory
memory? Retention intervals in studies of nonverbal auditory
working memory. Frontiers in psychology 7 (Dec. 2016), 1892.
[16] M. Ninaus, G. Pereira, R. Stefitz, R. Prada, A. Paiva, C.
Neuper and G. Wood. 2015. Game elements improve
performance in a working memory training task. International
Journal of Serious Games 2, 1 (Feb. 2015), 3-16.
[17] R. E. Nisbett, K. Peng, I. Choi and A. Norenzayan. (2001).
Culture and Systems of Thought: Holistic Versus Analytic
Cognition. Psychological Review 108, 2 (Apr. 2001), 291-310.
[18] Reaper Digital Audio Workstation. 2020. Retrieved from
https://www.reaper.fm
[19] A. Soemer and S. Saito. 2015. Maintenance of auditory-
nonverbal information in working memory. Psychonomic
bulletin & review 22, 6 (Dec. 2015), 1777-1783.
[20] E. Tulving. 1972. Episodic and semantic memory. In E. Tulving and W. Donaldson (Eds.), Organization of Memory. Academic Press.
10. DATA AND TOUCH APP
The MIEM Play app is open-source and freely available on app
stores (https://miem.laras.be). Anonymous experiment data and
Python processing scripts are available at:
https://github.com/gwendal-le-vaillant/MIEM_Experiments