
Acousticave: Auralisation Models and Applications in Virtual Reality Environments

PACS: 43.55-Ka
Guilherme Campos, Paulo Dias, José Vieira, Jorge Santos, Catarina Mendonça, João Pedro Lamas,
Nuno Silva, Sérgio Lopes
Departamento de Electrónica, Telecomunicações e Informática (DETI), Universidade de Aveiro
Campus Universitário de Santiago, 3810-193 AVEIRO – Portugal
Tel: +351 234 370 355 Fax: +351 234 378 157
Instituto de Engenharia Electrónica e Informática de Aveiro (IEETA), Portugal
Departamento de Psicologia Básica – CIPsi, Braga, Universidade do Minho, Portugal
Centro Algoritmi, Guimarães, Universidade do Minho, Portugal
Laboratório de Visualização e Percepção (LVP), Centro de Computação Gráfica (CCG),
Guimarães, Universidade do Minho, Portugal
Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland
Instituto de Telecomunicações (IT), Aveiro, Portugal
{guilherme.campos, paulo.dias, jnvieira},
{mendonca.catarina,, nunomiguel06, sergioilopes}
This communication is an overview of the FCT-funded three-year research project ‘AcousticAVE
– Auralisation Models and Applications in Virtual Reality Environments’, a collaboration between
the Universities of Aveiro (UA - IEETA) and Minho (UM - CIPsi, LVP). The project involved the
development of auralisation software based on the image-source method accommodating
dynamic scenarios, with real-time tracking of source/listener motion and listener head orientation.
This software supported psychophysical research at the CAVE-like facilities of UM’s Visualisation
and Perception Lab (LVP). This included an investigation on learning effects in spatial audio
perception using non-individualised HRTF sets and distance and time-to-passage (TTP)
perception experiments.
The importance of Virtual Reality (VR) has grown rapidly in recent years, with an ever-increasing
range of applications in the most diverse areas. However, most efforts in the design and
development of VR systems have been directed at providing visual immersion. The development
of increasingly convincing models demands that other senses also be considered, especially
hearing – we are still very much in the ‘silent era’ of VR.
A joint initiative of the Universities of Aveiro (UA) and Minho (UM), the research project
‘AcousticAVE – Auralisation Models and Applications in Virtual Reality Environments’ aimed
precisely at integrating visual and aural immersion in VR environments. It received funding from
the FCT (Portuguese Foundation for Science and Technology) for a three-year period which
ended in April 2014 (PTDC/EEA-ELC/112137/2009). Prof. Damian Murphy, from the U. of York
(UK) and, at a later stage, Prof. Juan Miguel Navarro, from UCAM (Spain), acted as external
project consultants.
The project involved essentially two work packages. At UA, an engineering team from IEETA
(Institute of Electronics and Informatics Engineering) tackled the software and hardware
implementation of room acoustic simulation and auralisation models.
At the Visualisation and Perception Lab (LVP) of UM’s Graphics Computing Centre (CCG) in
Guimarães, a team of Psychophysics researchers from CIPsi (Psychology Research Centre)
were primarily concerned with the application side of the project, exploring and testing the
developed auralisation systems and software packages at their CAVE-like facilities. These
include a 3m-by-9m continuous projection screen comprising 3 panels with a DLP (Digital Light
Processing) projector per panel for flexible configuration (0°, 90° or 135°), a treadmill
synchronised with the 3D visual scene to allow walking in the virtual environment and an infra-red
motion capture system for user tracking. It was decided that sound presentation should be
binaural, in view of the technical requirements (room acoustic correction, equipment noise
control…) and costs of multi-speaker alternatives such as Ambisonics or Wave-Field Synthesis.
A ‘customer-supplier’ relationship was established between the two teams, inasmuch as the
model development work at UA aimed at meeting the requirements of the VR experiments
designed by the UM team. The main research focus was on the role of aural cues in the perception
of human motion, commonly referred to as Biomotion (see Figure 1). Biomotion perception
experiments imply dynamic scenarios; the models must provide control over virtual source
movement and respond to changes in listener head orientation and position, tracked in real-time.
Also, since perception cues are obviously not only aural but also visual, it must be possible to
integrate the two; in fact, their interdependence and sensitivity to synchrony are points of
particular research interest. For these reasons, controlling the relative latency of the two channels
(visual and aural) is also crucial. These two requirements (real-time tracking and accurate audio-
visual synchronisation) posed the most significant engineering and signal processing challenges.
Figure 1 – A ‘point-light walker’: biomotion perception experiments are often
based on video projections using simplified human representations (avatars)
of this kind. Integration of aural stimuli (e.g. avatar step sounds) requires not
only correct spatialisation (so that they can be perceived to have been emitted
at the desired points in space) but also accurate synchronisation with the
corresponding visual stimuli.
2.1 Virtual Microphone Positioning
The first auralisation tool developed in the context of this project [1] was designed to simulate the
process of microphone positioning in audio recordings. The application is based on a dense,
regular grid of impulse responses pre-recorded on the room region under study for a given sound
source position – see Figure 2.
The desired microphone trajectory is specified using the mouse cursor on a diagram representing
that region. Each block of output sound is obtained by convolving the anechoic stream
representing the sound source with the room impulse response (RIR) corresponding to the
position currently selected. The convolution engine uses a very efficient block-based variation on
the overlap-add method, especially suited to accommodate the change of RIR filter on successive
audio output blocks; a short cross-fade is applied to suppress audible RIR transition glitches.
This program allows real-time operation with room impulse responses of virtually unlimited
duration. However, being based on in-situ measurement of monaural RIRs, it does not lend
itself directly to model-based binaural auralisation.
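The block-wise processing described above can be illustrated with a minimal Python sketch. It is not the engine from [1]: plain time-domain convolution replaces the efficient block-based scheme, the cross-fade is applied to the input of the block in which the RIR changes, and the names `convolve` and `auralise_blocks` are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution; returns len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def auralise_blocks(blocks, rirs):
    """Overlap-add with one RIR per input block and a one-block cross-fade.

    blocks: equal-length lists of anechoic samples.
    rirs:   equal-length impulse responses; rirs[k] is the one selected
            while blocks[k] is being processed (illustrative assumption).
    """
    n = len(blocks[0])
    out = [0.0] * (n * len(blocks) + len(rirs[0]) - 1)
    prev = rirs[0]
    for k, (x, h) in enumerate(zip(blocks, rirs)):
        if h == prev:
            parts = [convolve(x, h)]
        else:
            # RIR changed: fade the block out of the old filter and into
            # the new one, so the transition is spread over n samples.
            ramp = [(i + 1) / n for i in range(n)]
            x_old = [xi * (1.0 - w) for xi, w in zip(x, ramp)]
            x_new = [xi * w for xi, w in zip(x, ramp)]
            parts = [convolve(x_old, prev), convolve(x_new, h)]
        # Overlap-add: each block's tail spills into the following blocks.
        for part in parts:
            for i, v in enumerate(part):
                out[k * n + i] += v
        prev = h
    return out
```

With an unchanged RIR the result is identical to convolving the whole stream at once, which is a convenient sanity check for any variant of this scheme.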
Figure 2 – Mechanical platform used for recording a 2-dimensional RIR grid.
2.2 LibAAVE Auralisation Library
The development of the main software package for auralisation in interactive virtual reality
environments built on a line of research on audio-visual VR initiated at IEETA with the MEng
dissertation project ‘Virtual Hall’ [2]. A demo from this project using a low-cost head-mounted
display (HMD) system was awarded the first prize at the 2007 Audio Technology contest of the
Portuguese section of the Audio Engineering Society (AES). An important potential application
envisaged for this kind of system was explored in [3].
In order to ease the gradual refinement of the room modelling and auralisation functions and the
integration of new features, the software package was shaped as a library named LibAAVE and
made freely available under a public license. LibAAVE
is presented in some detail in [4] and [5]. It is designed to take a geometry model of the virtual
room in .obj format and allow arbitrary movement of both sources and listener. Its basic operation
principles are illustrated in Figure 3. Two functional blocks can be distinguished: acoustic model
processing and audio processing.
Figure 3 – Overall LibAAVE operation structure.
[Block diagram: the inputs are the 3D model (.OBJ file), the sound source positions and audio
streams, and the head orientation (inertial head tracker); for each source 1 ... n, the sound paths
are computed and processed through delay, attenuation, re-sampling and HRTF stages.]
The acoustic model processing block is based on simple, well-established geometric acoustic
models: the direct sound and reflections from each primary source are computed dynamically by
the mirror-image source (MIS) method, taking into account the acoustic properties of the room
(extracted automatically from its 3D geometry model), the sound source trajectories and listener
position, tracked in real time. The result is a set of sources (primary sources plus respective
mirror-images up to a certain order specified by the user). Source ‘visibility’ tests are applied to
determine which of these need be considered.
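The core geometric operation of the MIS method is the reflection of a source across a wall plane. The sketch below shows it for the simplest case, an axis-aligned rectangular ("shoebox") room with one corner at the origin. That geometry is an assumption made for brevity (LibAAVE takes arbitrary .obj models), the visibility tests are omitted, and the function names are hypothetical.

```python
import math


def mirror_point(source, plane_point, unit_normal):
    """Reflect a 3D point across the plane through plane_point with unit_normal."""
    # Signed distance from the source to the plane.
    d = sum((s - p) * n for s, p, n in zip(source, plane_point, unit_normal))
    return tuple(s - 2.0 * d * n for s, n in zip(source, unit_normal))


def first_order_images(source, room_dims):
    """Six first-order image sources for a room spanning [0, Lx] x [0, Ly] x [0, Lz]."""
    images = []
    for axis, length in enumerate(room_dims):
        for wall_coord in (0.0, length):
            plane_point = [0.0, 0.0, 0.0]
            plane_point[axis] = wall_coord
            normal = [0.0, 0.0, 0.0]
            normal[axis] = 1.0
            images.append(mirror_point(source, plane_point, normal))
    return images


def path_length(image, listener):
    """Length of the reflected path equals the image-to-listener distance."""
    return math.dist(image, listener)
```

Higher reflection orders follow by mirroring each image source again across the remaining walls, which is why the number of candidates grows so quickly with order.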
The audio processing block generates binaural output sound, suited for headphone or earphone
presentation, by adding the contributions from all the ‘visible’ sources. The anechoic sound
streams from each source are filtered according to their propagation path characteristics. In
particular, directional cues are obtained by applying head-related transfer-function (HRTF) filters
according to the propagation angles relative to the listener head, worked out in real time with the
help of a head-tracking device. An Intersense InertiaCube BT was used during the development
stage at IEETA. Different HRTF sets can be selected, taken from public-domain databases,
namely the KEMAR-based set from MIT Media Lab [6] and CIPIC [7]. As in the virtual microphone
positioning system [1], cross-fading is applied to avoid audible HRTF transition glitches.
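As an illustration of the per-path parameters involved, the following sketch derives delay, attenuation and head-relative angles for one (real or image) source. Several choices are assumptions rather than LibAAVE's actual conventions: a speed of sound of 343 m/s, 1/r amplitude attenuation, head rotation about the vertical axis only, and azimuth measured counter-clockwise from straight ahead. The function name `path_parameters` is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def path_parameters(source, listener, head_yaw_deg, sample_rate=44100):
    """Return (delay in samples, gain, azimuth, elevation) for one sound path.

    Angles are in degrees, relative to a head that is rotated by
    head_yaw_deg about the vertical (z) axis.
    """
    dx = source[0] - listener[0]
    dy = source[1] - listener[1]
    dz = source[2] - listener[2]
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)

    delay_samples = dist / SPEED_OF_SOUND * sample_rate
    gain = 1.0 / max(dist, 1e-3)  # 1/r law, clamped to avoid division by zero

    world_azimuth = math.degrees(math.atan2(dy, dx))
    # Subtract the head yaw and wrap the result into [-180, 180).
    azimuth = (world_azimuth - head_yaw_deg + 180.0) % 360.0 - 180.0

    if dist > 0.0:
        ratio = max(-1.0, min(1.0, dz / dist))
        elevation = math.degrees(math.asin(ratio))
    else:
        elevation = 0.0
    return delay_samples, gain, azimuth, elevation
```

The resulting azimuth and elevation would then select the nearest measured HRTF pair from the chosen set, while the fractional delay motivates the re-sampling stage.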
The MIS method is only suited to real-time simulation of the very early part of the RIR. In other
words, for real-time operation the user must specify a sufficiently low maximum reflection order
(typically n<6 for a single primary source on the platforms tested); this is because the
computational cost of checking source visibility increases exponentially with reflection order [5].
The late RIR must therefore be simulated by other means. The reverberation tail solutions
adopted in LibAAVE are described in [8]. Acceptable sound quality was obtained with the Dattorro
algorithm, but a feedback delay network (FDN) was preferred, as it allows frequency-dependent
LibAAVE includes functions to support visualisation of the virtual room based on its .obj geometry
model. Figure 4 illustrates one of the various interactive applications developed to demonstrate
its operation.
Figure 4 – LibAAVE demonstration GUI: the movements of the virtual source (loudspeaker) and listener
(face) within the room are controlled using the mouse; head orientation can be controlled using the mouse
or a head-tracking device (Intersense InertiaCube BT). The lines between them represent the sound paths from
‘visible’ sources up to the reflection order specified by the user. The binaural output at the listener position
is played in real time on headphones.
Significant effort was put into documentation, to ease future use and refinement. Along with the
code, a report including examples of how to build auralisation models is available at UA’s software
repository. Demonstration code and videos are available.
2.3 User Tracking and Auralisation with iOS Devices
The integration of LibAAVE with an ultrasonic indoor localisation system to track listener position
was tested in [9]. The localisation system can be based on mobile devices (e.g. smartphones)
and, in contrast with the Vicon motion capture system used at LVP (see section 3), requires only
a few small, inexpensive ultrasound beacons installed in the room, making it a much more
portable alternative, especially useful for audio augmented reality applications. A prototype was
built and tested using an iOS smartphone equipped with accelerometer, gyroscope and
magnetometer, which allow inertial head-tracking with automatic drift correction. LibAAVE is
currently being ported to iOS, since the available memory resources also allow the auralisation
software to be run from the device itself [10].
2.4 3D Data Acquisition for Room Acoustic Modelling
Practical usage of an auralisation package demands efficient tools for feeding its acoustic model
with the relevant room configuration data, namely 3D geometry and acoustic properties of the
boundary materials. In the LibAAVE case, this means creating an appropriate .OBJ room model.
The problem of acquiring real room data, especially important for validation purposes, was
addressed in [11] using the Microsoft Kinect sensor and the Kinect Fusion application to generate
polygonal models of the room boundary surfaces. Tools to help automate the identification of
surface materials and assignment of acoustic absorption/reflection coefficients to each polygon,
as required in geometric modelling, were developed with the Visualisation Toolkit (VTK). A
voxelisation algorithm was also developed to generate, based on the surface polygonal model, a
3D node grid covering the volume of the room. This is useful for physical room acoustic modelling,
whose application is envisaged in future LibAAVE developments. With the information on room
surface absorption (from the polygonal model) and volume (from the 3D grid), it is possible to
estimate reverberation time (RT) automatically using Sabine’s formula and shape the
reverberation tail accordingly. A summary of this work can be found in [12]. The data acquisition
and modelling algorithms were tested on a meeting room at IEETA, as illustrated in Figure 5.
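For reference, Sabine's formula in metric units gives the reverberation time as 0.161 V / A, where V is the room volume in cubic metres and A is the total absorption, i.e. the sum of each surface area multiplied by its absorption coefficient. A minimal sketch follows; the function name and the numerical values in the usage note are illustrative and are not taken from the IEETA meeting room.

```python
def sabine_reverberation_time(volume_m3, surfaces):
    """Estimate reverberation time in seconds with Sabine's formula.

    volume_m3: room volume in cubic metres (e.g. from the voxel grid).
    surfaces:  iterable of (area_m2, absorption_coefficient) pairs,
               one per polygon or per group of polygons.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption
```

For a hypothetical 5 m x 4 m x 3 m room with a floor (20 m², 0.3), a ceiling (20 m², 0.6) and walls (54 m², 0.1), the total absorption is 23.4 m² and the estimate is roughly 0.41 s.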
Figure 5 – Polygonal model of room boundary surfaces (left) and corresponding voxelisation (right).
Over the course of the project, successive versions of LibAAVE, as well as simpler tools for offline
generation of binaural sound (e.g. a MATLAB application developed early in the project for
synchronised avatar step sound generation) were installed, configured and adapted at LVP to suit
the specific needs of each audio or audio-visual perception experiment.
An important task was to feed LibAAVE with real-time listener/source position and head
orientation data from LVP’s Vicon motion-capture system, which is based on infra-red cameras
tracking reflective stickers fixed to the relevant targets (in this case listeners and sound sources).
Vicon’s software development kit (SDK), a C library allowing communication with Vicon’s
applications (Nexus, Blade and Tracker), was used for that purpose. Although not ideal in terms
of latency and precision, head-orientation detection with the Vicon system is possible using a set
of 3 stickers to define the head’s reference axes.
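One way to derive head reference axes from three tracked points is sketched below. The marker layout (left, right and front of the head) and the axis convention (x forward, y left, z up) are assumptions made for illustration; the arrangement used at LVP is not specified here. The code does not call the Vicon SDK and only processes marker coordinates that have already been retrieved.

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalise(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def head_frame(left, right, front):
    """Return (origin, x_forward, y_left, z_up) from three marker positions."""
    origin = tuple((l + r) / 2.0 for l, r in zip(left, right))
    y_left = _normalise(_sub(left, right))
    forward_hint = _sub(front, origin)
    # z is perpendicular to the marker plane; x completes a right-handed frame.
    z_up = _normalise(_cross(forward_hint, y_left))
    x_forward = _cross(y_left, z_up)
    return origin, x_forward, y_left, z_up


def head_yaw_deg(x_forward):
    """Rotation of the forward axis about the vertical, in degrees."""
    return math.degrees(math.atan2(x_forward[1], x_forward[0]))
```

Because the forward axis is rebuilt from the cross products, small errors in the front marker position do not break the orthogonality of the frame.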
In March 2014, at the second of two ‘Auralisation Models and Applications’ workshops organised
to showcase the project, the LibAAVE-Vicon motion capture integration was demonstrated with a
real-time audio-visual VR simulation of a musical quartet.
As explained in the Introduction, controlling the relative latency of the audio and video chains is a
crucial requirement, not least because audio-visual synchrony perception is a research issue in
its own right; take, for instance, [17] and [18]. With the help of a Brüel & Kjær Pulse data
acquisition system and appropriate instrumentation, a device was implemented to analyse the
degree of synchrony of different signals (electric triggers, motion-capture events, markers on the
auralised sounds and projected images...) and employed in experiments to measure the
processing latencies of the audio, video and motion-capture signal chains of the VR environment.
The results were documented in an LVP internal report and, based on it, a guide was prepared
on how to perform this kind of measurement and assess/adjust synchrony.
4.1 Adaptation to Non-Individualised HRTFs
Like any auralisation package based on geometric room acoustics, LibAAVE relies on HRTF
filtering to impart the directional cues that allow 3D source localisation. Since obtaining
individualised HRTF filter sets would pose very serious practical difficulties, HRTF sets measured
on dummy heads with ‘average’ characteristics are used instead. As mentioned before, two such
HRTF sets were adopted in LibAAVE [6][7]. It is therefore extremely important to assess the
perceptual effectiveness of HRTF processing and understand to what extent the use of non-
individualised HRTFs might hinder spatial sound perception.
The issue was addressed from a learning perspective [13][14]. A set of experiments showed that
mere exposure to virtual sounds processed with non-individualised HRTFs did not improve the
subjects’ performance in sound source localisation, but short training periods involving active
learning and feedback led to significantly better results. These findings indicate that using
auralisation with non-individualised HRTF should always be preceded by a learning period. This
work, on the basis of an earlier presentation at the 129th AES Convention, was selected for
publication in the AES Journal [15].
An additional set of experiments was devised to investigate this learning effect in further detail.
The experiments involved three groups of subjects and a careful schedule of azimuth/elevation
localisation training and test sessions over the course of one month. This made it possible to
study the persistence in time (memorisation) of the learning effect, its dependence upon the type
of sound source and its decomposition in terms of azimuth, elevation and their cross-dependence
(i.e. how azimuth localisation training affects elevation localisation performance and vice-versa).
Externalisation effects were also studied. The results showed that sound localisation with
altered cues is easily trained and subject to generalisation effects across space and sound source
type: a brief training session with a restricted set of sounds and source directions is enough to
improve localisation performance for trained and untrained sounds in trained and untrained
directions. The learning effects are persistent; they can still be observed one month after training,
especially in azimuth localisation. Externalisation levels are also increased by training, although
not directly related to localisation accuracy levels [16].
4.2 Sound Presentation
The choice between headphones and earphones is an unresolved debate. In order to obtain some
guidance regarding this issue, a few experiments were carried out to compare localisation
performance with in-ear phones (Etymotic ER-4B) and headphones (Sennheiser HD 650)
already available at LVP. This unpublished work involved training using analogous methodologies
to those employed in the HRTF studies. The experiments were repeated under different noise
levels; in both cases, the global average localisation error was lower with headphones.
4.3 Depth and Time-To-Passage (TTP) Perception Cues
Significant work was dedicated to the investigation of distance (depth) perception [17][18]. As
mentioned before, the interplay between aural and visual cues highlights the importance of
controlling audio-visual synchrony. The selective control of reflection orders implemented in the
auralisation tools was an important feature in the experimental work leading to [19].
The work on the perception of “time to passage” (TTP) and “time to collision” (TTC) of looming
sounds involved experiments with sources of various types travelling different distances at
different velocities and with different occlusion rates [20][21]. This required the auralisation tools
to be configured with an HRTF database including near-field measurements; the choice fell on
the database from the TU of Berlin which comprises measurements at 0.5m, 1m, 2m and 3m [22].
Possible refinements to the acoustic modelling algorithms used in LibAAVE include, for example:
Adoption of frequency-dependent acoustic absorption coefficients;
Dynamic adjustment of the maximum reflection order;
Quantisation of source path delays so that only primary source FFTs need be calculated;
Optimisation of source visibility checking algorithms;
Partial pre-calculation and/or less frequent updating of source visibility;
Combination of MIS with ray-tracing or beam-tracing techniques.
Since the goal is to maximise model accuracy while ensuring real-time operation, any modification
must be assessed in terms of both perceptual and computational impact.
LibAAVE can benefit enormously from parallel processing through functional decomposition into
two threads (room model and audio) and/or data decomposition within each thread. Both are
highly parallelisable, since sources can be processed independently from one another.
We also plan to test potentially more accurate techniques (namely physical modelling) and
develop novel hybrid models through combination of techniques. One possibility along those lines
is adapting the Virtual Microphone Positioning algorithm to use a grid of Ambisonics RIRs
(obtainable in a single run of a DWM model) instead of a grid of measured monaural RIRs. HRTF
processing can then be applied to perform Ambisonics-binaural conversion and allow headphone
presentation. The MEng dissertation [23] represented an initial step in that direction.
The PhD project already underway to build on the work developed on audiovisual perception [24]
is an example of the numerous threads that can be pursued on the Psychophysics research front.
The results on HRTF learning effects (arguably the most significant novel contribution from
AcousticAVE) call for larger-scale experiments to allow further investigation on the underlying
mechanisms and influential factors. Pursuing the (yet unpublished) work on headphone vs.
earphone sound presentation (section 4.2) could be very valuable in this regard.
On the application front, we intend to create a permanent demo with the prototype mentioned in
2.3, which shows huge practical application potential. Porting LibAAVE to iOS is the most
immediate task; an MEng dissertation was proposed to tackle it.
Developing room model configuration tools along the lines discussed in 2.4 is essential to promote
practical applications. Acoustic archaeology and walkthrough auralisation in cultural heritage sites
are among the most promising.
[1] Barker T, Campos G, Dias P, Vieira J, Mendonça C, Santos J (2012) ‘Real-Time Auralisation System
for Virtual Microphone Positioning’. 15th International Conference on Digital Audio Effects
(DAFx-12), York, UK, September 17-21, pp. 137-143.
[2] Casaleiro R (2008) ‘Sala de Espectáculos Virtual’. MEng dissertation. Dept. of Electronics,
Telecommunications and Informatics, University of Aveiro.
[3] Dias P, Campos G, Casaleiro R, Seco R, Santos V, Santos B S (2008) ‘3D Reconstruction and
Auralization of the “Painted Dolmen” of Antelas’. Electronic Imaging Conference 2008 (EI 2008),
SPIE Vol. 6805, 68050Y, Three-Dimensional Image Capture and Applications 2008, San Jose,
California, USA, January 28-29.
[4] Oliveira A, Campos G, Dias P, Vieira J, Santos J, Mendonça C (2013) ‘Aplicação de Auralização em
Tempo Real’. 11th Congress of AES Brasil, S. Paulo, Brasil, May 7-9, pp. 98-101.
[5] Oliveira A, Campos G, Dias P, Murphy D, Vieira J, Mendonça C, Santos J (2013) ‘Real-Time
Dynamic Image-Source Implementation for Auralisation’. 16th International Conference on
Digital Audio Effects (DAFx-13), Maynooth, Ireland, September 2-6, pp. 368-372.
[6] Gardner B, Martin K (2000) ‘HRTF Measurements of a KEMAR Dummy-Head Microphone’.
[7] Algazi, V. R., Duda, R. O., Thompson, D. M. (2001) ‘The CIPIC HRTF database’. IEEE Workshop on
Applications of Signal Processing to Audio and Acoustics, New York, October 21-24, pp. 99-102.
[8] Silva N, Oliveira A, Dias P, Campos G, Vieira J, Santos J (2014) ‘Auralização em Tempo Real para
Ambientes Virtuais Dinâmicos’. 12th Congress of AES Brasil, S. Paulo, Brasil, May 13-15.
[9] Lopes S, Oliveira A, Vieira J, Campos G, Dias P, Costa R (2013) ‘Real-Time Audio Augmented
Reality System for Pervasive Applications’. 11th Congress of AES Brasil, S. Paulo, Brasil, May 7-9.
[10] Lopes S, Vieira J, Campos G, Dias P (2014) ‘Sistema de Realidade Aumentada Áudio 3D para
Dispositivos iOS’. 12th Congress of AES Brasil, S. Paulo, Brasil, May 13-15.
[11] Pereira J (2013) ‘Aquisição e tratamento de dados 3D para modelação acústica de salas’. MEng
dissertation. Dept. of Electronics, Telecommunications and Informatics, University of Aveiro.
[12] Pereira J, Silva N, Dias P, Campos G, Vieira J (2014) ‘Aquisição e tratamento de dados 3D para
modelação acústica de salas’. 12th Congress of AES Brasil, S. Paulo, Brasil, May 13-15.
[13] Mendonça C, Santos J, Campos G, Dias P, Vieira J (2012) ‘On the Adaptation to
Non-Individualised HRTF Auralisations: a Longitudinal Study’. AES 45th International Conference,
Helsinki, Finland, March 1-4.
[14] Mendonça C (2012) ‘Audiovisual Perception of Biological Motion’. PhD thesis. School of
Psychology, University of Minho.
[15] Mendonça C, Campos G, Dias P, Vieira J, Ferreira J, Santos J (2012) ‘On the Improvement of
Auditory Accuracy with Non-Individualized HRTF-based Sounds’. J. Audio Eng. Soc. 60(10), pp.
821-830, October.
[16] Mendonça C, Campos G, Dias P, Santos, J (2013) ‘Learning Auditory Space: Generalization and
Long-Term Effects’. PLoS ONE 8(10). doi: 10.1371/journal.pone.0077900
[17] Silva C (2011) ‘Perceiving Audiovisual Synchrony as a Function of Stimulus Distance’. MSc
dissertation. School of Psychology, University of Minho.
[18] Silva C, Mendonça C, Mouta S, Silva R, Campos JC, Santos J (2013) ‘Depth Cues and Perceived
Audiovisual Synchrony of Biological Motion’. PLoS ONE 8(11). doi:10.1371/journal.pone.0080096
[19] Mendonça C, Lamas J, Barker T, Campos G, Dias P, Pulkki V, Silva C, Santos J (2013) ‘Reflection
orders and auditory distance’. 21st International Congress on Acoustics (ICA 2013), Montreal,
Canada, June 2-7 (POMA Vol. 19, 050041).
[20] Silva R (2013) ‘Judging Time-to-Passage of looming sounds’. MSc dissertation. School of
Psychology, University of Minho.
[21] Silva R, Mouta S, Mendonça C, Lamas J, Silva C, Santos J (2013). ‘The role of acoustic cues in time-
to-passage judgments: Judging time-to-passage of looming sounds’. 5th Iberian Conference on
Perception (CIP), A Coruña, Spain.
[22] Wierstorf H, Geier M, Raake A, Spors S (2011). ‘A Free Database of Head-Related Impulse
Response Measurements in the Horizontal Plane with Multiple Distances’.
[23] Santos F (2012) ‘Auralização Binaural com HRTF e Descodificação de Ambisonics’. MEng
dissertation. Dept. of Electronics, Telecommunications and Informatics, University of Aveiro.
[24] Silva, C. (2012). ‘Audiovisual Perception in a Virtual World: An Application of Human-Computer
Interaction Evaluation to the Development of Immersive Environments’. PhD Thesis Proposal.
School of Psychology, University of Minho.