Composing space in the space: an Augmented and Virtual Reality sound
spatialization system
Giovanni Santini
Hong Kong Baptist University
ABSTRACT

This paper describes a tool for gesture-based control of sound spatialization in Augmented and Virtual Reality (AR and VR). While the increased precision and availability of sensors of all kinds has made possible, in the last twenty years, the development of a considerable number of interfaces for gestural control of sound spatialization, their integration with VR and AR has not been fully explored yet. These technologies provide an unprecedented level of interaction, immersivity and ease of use by letting the user visualize and modify the position, trajectory and behaviour of sound sources in 3D space. Like VR/AR painting programs, the application allows the user to draw lines that function as 3D automations for spatial motion. The system also stores information about the movement speed and directionality of the sound source. Additional parameters can be controlled from a virtual menu. The possibility of alternating between AR and VR allows the user to switch between different environments (the actual space where the system is located or a virtual one). Virtual places can also be connected to different room parameters inside the spatialization algorithm.
1. INTRODUCTION

Sound spatialization has been used as a resource for musical expression at least since Willaert's production at the Basilica di San Marco in Venice (mid 16th century) [1]. More recently, since the first implementations of electronic music and especially in the past few decades, with the development of advanced sound spatialization algorithms (e.g., Vector-Based Amplitude Panning (VBAP) [2] and Higher Order Ambisonics (HOA) [3]), spatial sound has become a key element of the compositional syntax for an increasing number of composers: "space as a finality in music expression" (Leo Kupper in [4]) and "space as a compositional language" [5].
Since the first experiments by Pierre Schaeffer in the early 50s [1], one of the key aspects has been the control of the trajectories of sound sources (i.e., how to manipulate position coordinates through a "high-level" interface), along with the composition of many other parameters that can affect sound perception (e.g., directivity, aperture of the sound source and room characteristics).

Copyright: © 2019 Giovanni Santini et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Many solutions have been developed that provide some form of graphic editing/automation. In order to achieve intuitiveness and ease of use in a context where a large number of parameters comes into play, some specific form of gestural input has often been deployed. Gestural interfaces include tablets or gamepads [6, 7], gesture recognition through camera input, both for visible light and infrared [8, 9], or other sensors [10]. More extensive reviews can be found in [11] and [12].
A further distinction can be made between real-time sound spatialization systems and off-line studio editing applications: the latter group includes systems responding to the needs of computer-aided composition, i.e., intuitive controls to be connected to the development of a musical structure [13, 6]. Real-time control systems can often be referred to as DMIs (Digital Musical Instruments [11, 14]) and more specifically as Spatialization Instruments, defined as "a Digital Musical Instrument, which has the capability of manipulating the spatial dimension of the produced sound, independently of its capability of producing or manipulating the other sound dimensions" [12].
Notwithstanding the great differentiation in functionalities and implementation details, all the cited input models result in some kind of symbolic representation that does not show the sound source in its exact position in space. In other words, none of those systems lets the user see and control the sound trajectory "as it is". Overcoming such limitations might provide better control, as "[...] devices whose control structures match the perceptual structure of the task will allow better user performances" ([15], referring to [16]).

In the case of Spatialization Instruments, "matching the perceptual structure of the task" would mean seeing exactly where the sound source is positioned in space.¹
The recent advancements in VR and AR technologies provide the background for representing the sound location. The described tool allows the user to represent and control the behaviour of sound sources in a 3D immersive space, as well as to edit other sound source parameters and to store, save and recall those data. Such automations can be modified after creation. The representation of positioning is in real-world scale and has a reduced level of abstraction, prioritizing as much as possible intuitiveness and the matching of visual objects to sound behaviour.

¹ The limitations of direction and distance perception (which would counteract the idea of clear identification of sound source position and trajectory) will be discussed later.

Proceedings of the 16th Sound & Music Computing Conference
ISBN 978-84-09-08518-7 ISSN 2518-3672
The Augmented Reality implementation allows the user to see and place sources in the real space. The VR mode provides interaction with virtual environments. Different (real and virtual) locations can be linked to different audio room settings inside the spatialization algorithm.
The system is developed through the interaction of two main components:

- an AR/VR project developed in Unity3D for the HTC Vive Pro headset;
- a Max/MSP patch dedicated to sound spatialization using Spat (Ircam tools).

The two programs talk to each other through the OSC (Open Sound Control) protocol.

The system has been tested in the LIATe (Lab for Immersive Arts and Technology) at Hong Kong Baptist University, with a 24.2-channel setup.
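Since the two programs share only the OSC protocol, every frame of communication reduces to a small binary message. As a minimal illustration of what travels over the wire (pure Python; the address pattern `/source/<id>/xyz` is an assumption, since the paper does not publish its actual OSC namespace):

```python
import struct

def _pad(data: bytes) -> bytes:
    """Null-terminate and pad to a 4-byte boundary, as OSC requires."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_position_message(source_id: int, x: float, y: float, z: float) -> bytes:
    """Encode one OSC message carrying a source position as three
    big-endian 32-bit floats (the address pattern is hypothetical)."""
    address = _pad(f"/source/{source_id}/xyz".encode())
    typetags = _pad(b",fff")  # one 'f' type tag per float argument
    return address + typetags + struct.pack(">fff", x, y, z)

msg = osc_position_message(1, 1.0, 1.2, -0.5)
```

In the real system such messages would be produced by a Unity OSC library and parsed by Max's OSC objects; the sketch only shows the shape of the per-frame data.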
Figure 1. 10 sources distributed over the sound spatialization setup in the LIATe, shown in the Max object.
2.1 The Unity Project
The AR session is implemented in Unity for the HTC Vive Pro, currently the only headset allowing both VR and AR. The input comes from the two Vive controllers, which have 6-DOF (Degrees Of Freedom) motion tracking. The right controller allows the positioning of one sound source at a time through parenting (an operation by which a virtual object is linked in position and rotation to another object). By moving the controller and pressing the back trigger, the user can create/modify the trajectory of the selected sound source. Such a trajectory is shown as a line drawn in the air. As a child,² a source can be given an offset with respect to the parent controller, thus translating and magnifying the movement of the controller (for example, by shifting the sound source one meter above the controller on the Y axis, a 360° rotation of the controller would create a 2 m diameter circle centered on the controller).

² A parent is the object providing the reference coordinate system, while a child is a virtual object whose coordinates are referred to the coordinates of the parent.
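The geometry of this parent-child offset can be sketched in a few lines. The fragment below is only an illustration (Unity performs this through its transform hierarchy); it reproduces the example above, assuming the rotation happens about the controller's forward axis:

```python
import math

def child_world_position(parent_pos, roll_deg, local_offset):
    """World position of a child, given the parent's position, the
    parent's roll about its forward axis, and the child's local offset.
    A plain 2D rotation in the X-Y plane suffices for the example."""
    a = math.radians(roll_deg)
    ox, oy, oz = local_offset
    wx = ox * math.cos(a) - oy * math.sin(a)
    wy = ox * math.sin(a) + oy * math.cos(a)
    px, py, pz = parent_pos
    return (px + wx, py + wy, pz + oz)

# A source parented 1 m above the controller: a full roll of the
# controller sweeps the source along a circle of 2 m diameter.
points = [child_world_position((0, 0, 0), d, (0, 1, 0))
          for d in range(0, 360, 10)]
```

Every point in `points` lies at distance 1 m from the controller, i.e. on a circle of 2 m diameter, matching the paper's example.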
Figure 2. Point of view 1 on a combination of sources and trajectories.
Figure 3. Point of view 2 on the same combination.
The position of the sound sources (updated frame by frame) is sent through OSC to Max/MSP (which performs the sound spatialization). In the current state of development, the application allows the user to control up to 10 sound sources at the same time.

The left controller can move an additional sound source. Furthermore, it has a User Interface (UI) attached, allowing for the selection of different tools (the UI only sends OSC commands to Max/MSP, which actually performs the processing). The UI allows:

- shifting the sound source from the parent controller (along the three axes);
- selecting and soloing (if needed) different sound sources and assigning different trajectories (recognizable by different colors);
- changing the aperture and yaw of the selected source;
- choosing the spatialization algorithm;
- changing the room (as a VR room);
- storing and recalling trajectories, and changing trajectories after drawing.
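Since the UI only emits OSC commands, each menu item ultimately resolves to an address that Max/MSP acts on. The mapping below is purely hypothetical (the paper does not document its command namespace), but it sketches how the tool list above could be dispatched:

```python
# Hypothetical mapping of UI tools to OSC addresses; the real
# namespace used by the system is not published in the paper.
UI_COMMANDS = {
    "offset":    "/source/{id}/offset",    # shift from parent controller
    "solo":      "/source/{id}/solo",
    "aperture":  "/source/{id}/aperture",
    "yaw":       "/source/{id}/yaw",
    "algorithm": "/spat/algorithm",
    "room":      "/spat/room",
    "store":     "/trajectory/{id}/store",
    "recall":    "/trajectory/{id}/recall",
}

def ui_command(tool: str, source_id: int = 0) -> str:
    """Resolve a menu selection to the OSC address it would send."""
    return UI_COMMANDS[tool].format(id=source_id)
```

For example, selecting "yaw" with source 3 active would resolve to `/source/3/yaw` in this hypothetical scheme.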
Figure 4. The Menu attached to the controller.
Sound sources are visualized as spheres of different colors; when they move, they either follow a trajectory or are moved by a controller. A trajectory is not followed at a fixed speed: the speed changes according to the original gesture (every trajectory is originally drawn with a gesture). If the sound source's duration is longer than the trajectory's (e.g., the sound source is 3 seconds and the trajectory is 2 seconds long), the sound source is left static on the last point of the trajectory. However, the gesture representation can always be edited in real time by pressing the trigger of the controller. Thus, the user can freely adjust a trajectory to the sound source it is related to.
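Because speed is part of the recorded gesture, a trajectory is essentially a list of time-stamped positions, replayed by interpolation and held at its last point when the sound outlasts it. A minimal sketch of such a store (names are hypothetical; the actual system keeps this data in Max "coll" objects):

```python
import bisect

class Trajectory:
    """Store gesture samples as (time, position) pairs and replay them
    with the original timing; past the last sample the source holds
    its final position, as described above."""

    def __init__(self):
        self.times = []   # monotonically increasing time stamps (s)
        self.points = []  # (x, y, z) tuples

    def record(self, t, pos):
        self.times.append(t)
        self.points.append(pos)

    def position_at(self, t):
        if not self.times:
            raise ValueError("empty trajectory")
        if t <= self.times[0]:
            return self.points[0]
        if t >= self.times[-1]:
            return self.points[-1]  # hold the last point
        i = bisect.bisect_right(self.times, t)
        t0, t1 = self.times[i - 1], self.times[i]
        u = (t - t0) / (t1 - t0)
        p0, p1 = self.points[i - 1], self.points[i]
        return tuple(a + u * (b - a) for a, b in zip(p0, p1))

traj = Trajectory()
for t, p in [(0.0, (0.0, 0.0, 0.0)), (1.0, (1.0, 0.0, 0.0)),
             (2.0, (1.0, 1.0, 0.0))]:
    traj.record(t, p)
```

Querying `traj.position_at(1.5)` interpolates halfway along the second segment; querying beyond 2.0 s keeps returning the final point, mirroring the hold behaviour described above.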
2.2 Sound spatialization
The OSC bundles sent out from Unity are received by a Max/MSP patch based on Spat (Ircam tools). As both Unity and Spat use a coordinate system where 1 corresponds to 1 meter, the passage from one system to the other does not require remapping, except for coordinate system alignment.
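Concretely, that alignment reduces to an axis swap. The sketch below makes the assumed conventions explicit (Unity is left-handed with X right, Y up, Z forward; Spat's default navigation convention is right-handed with X right, Y front, Z up; this should be verified against the actual Spat configuration):

```python
def unity_to_spat(x, y, z):
    """Convert a Unity position to Spat's navigation convention.

    Assumption: Unity (left-handed, Y up) maps onto Spat (right-handed,
    Z up) by swapping the Y and Z axes. Both systems use metres, so no
    scaling is needed, as noted in the text.
    """
    return (x, z, y)
```

For example, a source 2 m above the floor and 3 m in front of the listener in Unity, `(0, 2, 3)`, becomes `(0, 3, 2)` in this assumed Spat frame.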
While the AR/VR project in Unity can be considered the front-end of the application, all the core functions are actually implemented in Max/MSP, and most of them control Spat parameters (position, sound source aperture, yaw, etc.).
The system uses different "coll" objects (one for each sound source) in order to store, save and recall trajectory information.
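A "coll" object reads plain text of the form `index, values;`. A trajectory could therefore be serialized as below; the per-entry layout (position plus a time stamp) is an assumption for illustration, not the system's documented format:

```python
def trajectory_to_coll(samples):
    """Render a list of (x, y, z, t) samples as lines in the text
    format Max's 'coll' object reads: 'index, values;' per entry."""
    lines = []
    for i, (x, y, z, t) in enumerate(samples, start=1):
        lines.append(f"{i}, {x} {y} {z} {t};")
    return "\n".join(lines)

coll_text = trajectory_to_coll([(0.0, 1.0, 0.0, 0.0),
                                (0.5, 1.0, 0.0, 0.1)])
```

Writing this text to a file lets a `coll` object reload the trajectory later, which is one plausible way the store/recall functions described above could work.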
Different spatialization algorithms are available (e.g. 3D
VBAP, HOA and binaural) [17], and their use is left to the
discretion of the user.
Sources moving along trajectories can also be saved as
audio tracks.
The presented application is based on a relation between virtual object position and sound source position; therefore, a critical issue must be considered: distance estimation and the correspondence of visual and aural movements.
As [18] shows, vision-based distance estimation of a virtual object presents problems in an AR environment. While angular positioning is rather precise, distance tends to be underestimated. The study evaluates numerous rendering strategies for virtual objects (such as aerial perspective,³ cast shadows⁴ and shading⁵). The authors find, through two specifically designed experiments, that by far the most effective rendering strategy for reducing the underestimation of distance consists of casting shadows on the floor (rendered shadows are created by a virtual source of light perpendicular to the floor). In fact, in the two experiments, cast shadows proved to increase accuracy in distance estimation by 90% and 18% respectively.
For audio discrimination, as shown in [19] and [20], many parameters and spectral cues come into play: sound level, direct-to-reverberant ratio (DRR), spectral shape (e.g., low-pass filtering of frequencies as a function of distance), binaural cues like Interaural Time Differences (ITDs) and Interaural Level Differences (ILDs), dynamic cues (motion) and familiarity with the sound. Even though such cues are important for giving an idea of distance, a precise estimation of the perceived distance is problematic. In fact, given the complexity of the overall perceptual system and the dependency of recognition upon many different factors, including the conformation of the venue itself, distance perception is biased and tends towards underestimation.

[19] also shows that the presence of a visual cue can help in focusing the position of a sound source (sometimes producing ventriloquism, the phenomenon that occurs when a listener mistakenly adjusts the perception of sound localization to the position of the visual cue).
Moreover, the discrimination of the behaviour of sources is made problematic by other effects: for instance, a sound tends to be more sharply localized when its position coincides with that of a real speaker. Another phenomenon, named flickering in [5], consists in the impossibility for our hearing to discriminate position under very fast source movement, or rather, the tendency to ignore most of a trajectory, focusing only on some discontinuous points in space.
Figure 5. The same configuration of Figure 2 and 3 but in
VR (and with down-cast shadows).
According to [19] and [20], the simultaneous presence of both visual and aural cues helps in discriminating the position, distance and behaviour of a sound source; as down-cast shadows⁶ help to achieve a correct estimation of virtual object positions, they further increase the precision of source localization.

³ Increased haziness of colors with increasing distance.
⁴ Renderings of virtual shadows on the floor.
⁵ Defining the reflectance properties of a virtual object.
The presented tool allows the user to control sound spatialization in an immersive environment, providing visualization of sound sources' positions and trajectories. It allows fast testing of spatial compositional solutions and real-time control over numerous spatialization parameters. It can be used live as a Spatialization Instrument or off-line as a sort of (limited) Digital Audio Workstation (DAW).

As pointed out in [6], some gesture-controlled (real-time) systems might fall short for what concerns large-scale conception and compositional organization, especially in relation to musical structures that might prescind from bodily gestures. For this reason, a future improvement should include the possibility to edit trajectories in a computer-aided composition context as well.
The Spatialization Instrument described might seem to follow from a naïve assumption: that sound trajectories can be perceived with the same clarity as our visual perception (i.e., that the two representations, visual and aural, of a movement are, to some extent, precise and identical). As already shown in [5], [19] and [20], even under the most ideal hearing conditions, perceived distance appears to be "a biased estimate of physical source distance" [19]. As the perception of distance (but also of behaviour over time) is influenced by the spectral characteristics of sound, the proposed system can be useful as a tool for "fast prototyping", but cannot solve the problems inherent to sound spatialization, which in numerous cases require an approach tailored to different sound sources, sound fields and timbres.
In addition to the source-trajectory approach shown in this paper, another resource might be found in a spectral spatialization approach. One possible idea would consist of distributing different frequency bands of one audio file across the space as if they were different sound sources, and providing each band with dynamic movements. While such an approach could not achieve single-bin accuracy while maintaining intuitiveness of use, bin grouping based on psychoacoustic perception (such as Bark bands [21]) would certainly be possible. It would therefore be possible to obtain a fluctuating timbral environment by organizing the movement of different Bark bands inside one timbre.
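As a sketch of this grouping, the fragment below uses Zwicker's critical-band-rate approximation; the formula is standard [21], but its use here is only an illustration of the proposed idea, not an implementation from the paper:

```python
import math

def hz_to_bark(f_hz):
    """Zwicker's approximation of critical-band rate (Bark)."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))

def bin_to_band(bin_index, fft_size, sample_rate):
    """Group an FFT bin into one of ~24 Bark bands; each band could
    then be driven as an independent spatial source."""
    f_hz = bin_index * sample_rate / fft_size
    return int(hz_to_bark(f_hz))
```

Grouping the bins of a 1024-point FFT at 48 kHz this way yields roughly 25 bands instead of hundreds of independent sources, keeping the number of movable objects within the perceptually meaningful range.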
Moreover, a future study will address the assessment of the usability and usefulness of the tool with both trained musicians and untrained people.
The paper has described a VR/AR immersive system for sound spatialization. It allows real-time control over the position, trajectory and other parameters of different sound sources, visualized as spheres. Trajectories are visualized as virtual strokes.

⁶ Down-casting shadows in AR requires a 3D scan of the environment. The HTC Vive Pro has the capability to do so, but the range is rather limited and subject to visual artifacts. In VR, shadows are easy to represent properly.
The Digital Instrument mapping is intuitive, as sounds' positions and trajectories mirror the gestures of the player. These gestures can be translated in space and scaled (a small movement can result in a shift of several meters). A simple UI attached to the left controller allows the user to change different parameters and options (spatialization algorithm, sound source, aperture, yaw, etc.). The application can also be used as a tool for automating trajectories and can be useful for electroacoustic composition. Data about source movements can be stored as text in "coll" objects; spatialized sound files can also be exported as audio tracks.
The switch from AR to VR changes the environment where virtual sources are visualized from the real world to a VR landscape. This possibility to switch makes it easier to render on the floor the shadows of the virtual objects representing sound sources. As [18] shows, such shadows, rendered under the objects with a virtual light perpendicular to the floor, increase the accuracy of the estimation of virtual object positions.
The intuitiveness of the system is enhanced by the simultaneous presence of both visual (representation of sound sources and trajectories) and aural cues. On the other hand, such close mimicking between sound and visual behaviour might induce a simplistic approach (as if the localization of sound sources could always be perfectly accurate). The user should always consider some degree of inaccuracy due to the intrinsic characteristics of sound spatialization: the understanding of source positioning is influenced by many parameters, such as intensity, direct-to-reverberant ratio and spectral EQ. Consequently, in numerous circumstances, a case-by-case approach should be considered.
REFERENCES

[1] R. Zvonar, "A history of spatial music," CEC, 1999.
[2] V. Pulkki, “Virtual Sound Source Positioning Using
Vector Base Amplitude Panning,” Audio Engineering
Society, 1997.
[3] D. Malham, “Higher order ambisonic systems for the
spatialisation of sound,” in 1999 International Com-
puter Music Conference (ICMC), 1999.
[4] R. Normandeau, "Timbre spatialisation: The medium is the space," Organised Sound, vol. 14, no. 3, 2009.
[5] T. Schmele, “Exploring 3D Audio as a New Musi-
cal Language,” Master’s Thesis, Universitat Pompeu
Fabra, 2011.
[6] J. Garcia, J. Bresson, and T. Carpentier, "Towards interactive authoring tools for composing spatialization," in 2015 IEEE Symposium on 3D User Interfaces (3DUI), 2015.
[7] K. Bredies, N. A. Mann, J. Ahrens, M. Geier, S. Spors,
and M. Nischt, “The multi-touch SoundScape ren-
derer,” in Proceedings of the working conference on
Advanced visual interfaces - AVI ’08, 2008.
Proceedings of the 16th Sound & Music Computing Conference
ISBN 978-84-09-08518-7 ISSN 2518-3672
[8] D. Copeland, "The NAISA Spatialization System," 2014. [Online]. Available: http://www.darrencopeland.net/web2/?page_id=400
[9] W. Fohl and M. Nogalski, "A Gesture Control Interface for a Wave Field Synthesis System," in Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), 2013.
[10] M. L. Hedges, "An investigation into the use of intuitive control interfaces and distributed processing for enhanced three dimensional sound localization," Master's thesis, Rhodes University, 2015.
[11] A. Pysiewicz and S. Weinzierl, "Instruments for Spatial Sound Control in Real Time Music Performances. A Review," in Musical Instruments in the 21st Century. Singapore: Springer, 2017, pp. 273–296.
[12] A. Pérez-López, "Real-Time 3D Audio Spatialization Tools for Interactive Performance," Universitat Pompeu Fabra, Barcelona, p. 38, 2014.
[13] R. Gottfried, “SVG to OSC transcoding as a platform
for notational Praxis and electronic performance,” in
Proceedings of the International Conference on Tech-
nologies for Music Notation and Representation, Paris,
2015, pp. 154–161.
[14] J. Malloch, D. Birnbaum, E. Sinyor, and M. M. Wan-
derley, “Towards a new conceptual framework for dig-
ital musical instruments,” in Proceedings of the 9th In-
ternational Conference on Digital Audio Effects, 2006.
[15] M. M. Wanderley and N. Orio, “Evaluation of input
devices for musical expression: Borrowing tools from
HCI,” Computer Music Journal, 2002.
[16] R. J. K. Jacob, L. E. Sibert, D. C. McFarlane, and M. P. Mullen, Jr., "Integrality and separability of input devices," ACM Trans. Comput.-Hum. Interact., vol. 1, no. 1, pp. 3–26, Mar. 1994.
[17] T. Carpentier, M. Noisternig, and O. Warusfel, "Twenty years of Ircam Spat: looking back, looking forward," in International Computer Music Conference Proceedings, 2015.
[18] C. Diaz, M. Walker, D. A. Szafir, and D. Szafir, "Designing for depth perceptions in augmented reality," in 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Oct 2017, pp. 111–122.
[19] P. Zahorik, D. S. Brungart, and A. W. Bronkhorst, “Au-
ditory distance perception in humans: A summary of
past and present research,” Acta Acustica united with
Acustica, 2005.
[20] A. J. Kolarik, B. C. Moore, P. Zahorik, S. Cirstea, and
S. Pardhan, “Auditory distance perception in humans:
a review of cues, development, neuronal bases, and ef-
fects of sensory loss,” Attention, Perception, and Psy-
chophysics, 2016.
[21] E. Zwicker, “Subdivision of the audible frequency
range into critical bands (frequenzgruppen),” The Jour-
nal of the Acoustical Society of America, vol. 33, no. 2,
pp. 248–248, 1961.