On the Improvement of Localization Accuracy with
Non-individualized HRTF-Based Sounds
CATARINA MENDONÇA,1 AES Associate Member (Catarina.Mendonca@ccg.pt), GUILHERME CAMPOS,2 AES Full Member,
PAULO DIAS,2 JOSÉ VIEIRA,2 AES Full Member, JOÃO P. FERREIRA,1 AND JORGE A. SANTOS1

1 University of Minho: School of Psychology; Centro Algoritmi; Centro de Computação Gráfica, Guimarães, Portugal
2 University of Aveiro, Department of Electronics, Telecommunications and Informatics, Aveiro, Portugal
Auralization is a powerful tool to increase the realism and sense of immersion in Virtual
Reality environments. The Head Related Transfer Function (HRTF) filters commonly used
for auralization are non-individualized, as obtaining individualized HRTFs poses very serious
practical difficulties. It is therefore extremely important to understand to what extent this
hinders sound perception. In this paper we address this issue from a learning perspective.
In a set of experiments, we observed that mere exposure to virtual sounds processed with
generic HRTF did not improve the subjects’ performance in sound source localization, but
short training periods involving active learning and feedback led to significantly better results.
We propose that using auralization with non-individualized HRTF should always be preceded
by a learning period.
0 INTRODUCTION
Binaural auralization is the process of spatializing
sounds. The aim is to accurately simulate acoustic
environments and provide vivid and compelling auditory
experiences. It has applications in many fields; examples
range from flight control systems to tools for helping the
visually impaired. It also has a strong potential for vir-
tual reality (VR) applications and in the entertainment in-
dustry. This acoustic simulation should take into account
the influence of the listener’s anatomy on the sounds.
In fact, the interaction of sound waves with the listener’s
body—particularly torso, head, pinnae (outer ears), and ear
canals—has extremely important effects on sound localiza-
tion, notably interaural time and level differences (ITD and
ILD, respectively), the main cues for static source localiza-
tion. Such effects can be measured as a binaural impulse
response for the corresponding source position, known as
Head Related Impulse Response (HRIR), or by its Fourier
transform, the Head Related Transfer Function (HRTF). It
is possible to appropriately spatialize headphone-delivered
sounds by processing anechoic recordings of the source ma-
terial through the HRTF filters corresponding to the desired
virtual source position [1][2][3][4].
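This filtering operation is straightforward to sketch in code. The snippet below is a minimal illustration, not the authors' implementation; the file names, the HRIR storage format, and the 45° example position are placeholder assumptions (the experiments below use the MIT KEMAR set [20]).

```python
# A minimal sketch of binaural spatialization by HRIR convolution.
# File names, the HRIR storage format, and the 45-degree example
# position are illustrative assumptions; the paper uses the MIT
# KEMAR measurements [20].
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs, mono = wavfile.read("anechoic_source.wav")   # dry (anechoic) source
hrir_left = np.load("hrir_az45_left.npy")        # HRIR pair measured for the
hrir_right = np.load("hrir_az45_right.npy")      # desired virtual position

# Convolving the dry signal with each ear's HRIR imposes the ITD, ILD,
# and spectral cues measured for that source direction.
left = fftconvolve(mono, hrir_left)
right = fftconvolve(mono, hrir_right)

binaural = np.stack([left, right], axis=1)
binaural /= np.max(np.abs(binaural))             # normalize to avoid clipping
wavfile.write("binaural_az45.wav", fs, binaural.astype(np.float32))
```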
Since they depend on anatomic features such as the
size and shape of head and ears, HRTFs vary considerably
from person to person. From this fact emerged two distinct
auralization approaches: individualized HRTFs, made
for each listener from their own individual features, and
generic/averaged HRTFs. Given the between-subject
variability in HRTFs, it is arguable that all spatial audio
simulations should use individualized HRTFs. However,
this is extremely difficult to obtain in practice; HRTF
recordings are effortful and expensive, requiring anechoic
rooms, arrays of speakers (or accurate speaker positioning
systems), miniature microphones, and specialized software
and technicians. Alternatively, generic sets are measured
on manikins or head-and-torso systems equipped with
artificial pinnae designed to approximate an "average"
human subject as closely as possible. Several additional efforts
exist to minimize the differences between generic and in-
dividualized HRTFs [1,5,6], and some mixed models have
been proposed [7]. However, the debate between the merits
and trade-offs of individualized/generic auralizations still
persists.
On the one hand, there is still not enough data on the
efficiency of the generic (non-individualized) HRTFs in re-
placing the individualized ones. Not all information in an
individualized HRTF is perceptually relevant [8]. It has been
suggested that satisfactory auralization can be obtained us-
ing generic HRTFs [6]. Wenzel et al. [9] compared the
localization accuracy when listening to external free-field
acoustic sources and to virtual sounds filtered by non-
individualized HRTFs. Several front-back and up-down
confusions were found, but there was overall similarity be-
tween the results obtained in the two test situations. A sim-
ilar result was found in the auralization of speech signals
[10], as most listeners were able to obtain useful azimuth
information from speech filtered with non-individualized
HRTFs.
On the other hand, there are indications that the listen-
ing effects of individualized HRTF-based systems do dif-
fer from the generic ones [11,12]. There is a significant
increase in the feeling of presence when virtual sounds
are processed with individualized binaural filters instead
of generic HRTFs [13]. In a study that compared real
life listening with real head recordings and artificial head
recordings [14], it was found that localization accuracy
with recordings is worse than in real life, and that artifi-
cial heads are worse than real head recordings. Interest-
ingly, there was a clear learning effect over the period
of five days. There had been some previous suggestions
that the perception of spatial sound with non-individualized
HRTFs might change over time. Begault and Wenzel [10]
observed several individual differences, which suggested
that some listeners were able to adapt more easily to the
spectral cues of the non-individualized HRTFs than oth-
ers. Asano et al. [15] claimed that reversal errors decreased
as subjects adapted to the unfamiliar cues in static ane-
choic stimuli. Huang and collaborators [16] argued that there
were listening improvements over time in loudspeaker-
displayed sounds convolved with non-individualized
HRTFs.
In this context, our primary research question in this
paper is: can humans learn to accurately localize sound
sources processed with HRTF sets different from their own?
There is evidence that the mature brain is not immutable,
but instead holds the capacity for reorganization as a con-
sequence of sensory pattern changes or behavioral training
[17]. Shinn-Cunningham and Durlach [18] trained listeners
with “supernormal” cues, which resulted from the spectral
intensification of the peak frequencies. With repeated test-
ing, during a single session, subjects adapted to the altered
relationship between auditory cues and spatial position.
Hofman et al. [19] addressed the consequences of manipulating
spectral cues over long periods of time (19 days) by fitting
molds to the outer ears of the subjects. Elevation cues
(which, in static listening, depend exclusively on monaural
cues) were initially disrupted. These elevation errors were
greatly reduced after several weeks, suggesting that sub-
jects learned to associate the new patterns with positions in
space.
The broad intention of this study was to assess how
training may influence the use of non-individualized static
HRTFs. Our main concern was assuring that users of
such generically spatialized sounds become able to fully
enjoy their listening experiences in as little time as
possible.
Three experiments were designed to answer the ques-
tions: Do listeners spontaneously improve accuracy with-
out feedback in short periods of time? (experiment 1); and
Can the adaptation process be accelerated by applying
feedback? (experiments 2 and 3).
1 GENERAL METHODS
1.1 Participants
The main experiment comprised a series of successive
tests. In all experiments, only naïve and inexperienced
young adults participated. All had normal hearing, ver-
ified by standard audiometric screening at 500, 750, 1000,
1500, and 2000 Hz. All auditory thresholds were below
10 dB HL and none had significant interaural differences
(threshold differences were below 5 dB HL at the target
frequencies). Each experiment had four participants.
Experiments 2 and 3 shared the same participants: half
started with Experiment 2 and half with Experiment 3.
1.2 Stimuli
The stimuli in all experiments consisted of pink noise
sounds auralized at several positions in space. The original
(anechoic) sound was convolved with the HRTF pair corre-
sponding to the desired source position. The resulting pair
of signals—for the left and the right ear—was then repro-
duced through earphones. No fade-in/fade-out was applied
at the start/end of the signal.
The HRTF set was recorded with a KEMAR dummy-head
microphone at the Massachusetts Institute of Technology
[20]. Sounds were reproduced with a Realtek Intel 82801BA
sound card and presented through Etymotic ER-4B MicroPro
in-ear earphones.
All sounds were presented pseudo-randomly: the order was
randomized while assuring the same number of events for
each stimulus. Each stimulus lasted three seconds, with a
one-second inter-stimulus interval.
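As an illustration of this presentation scheme, the sketch below builds blocks with equal event counts per stimulus and plays them in shuffled order with the timing described above. The stimulus list and playback callback are assumptions standing in for the experiment software, not the authors' code.

```python
# A sketch of the presentation scheme: equal event counts per stimulus
# within each block, shuffled order, 3 s stimuli, 1 s inter-stimulus
# interval. The stimulus list and playback callback are assumptions.
import random
import time

AZIMUTHS = [0, 45, -45, 90, -90, 135, -135, 180]   # experiment 1 positions
REPETITIONS_PER_BLOCK = 10

def make_block(stimuli, repetitions, seed=None):
    """Return a shuffled trial list with the same event count per stimulus."""
    trials = list(stimuli) * repetitions
    random.Random(seed).shuffle(trials)
    return trials

def run_block(trials, play_stimulus):
    for azimuth in trials:
        play_stimulus(azimuth)   # assumed blocking playback, 3 s per stimulus
        time.sleep(1.0)          # 1 s inter-stimulus interval

block = make_block(AZIMUTHS, REPETITIONS_PER_BLOCK, seed=0)   # 80 trials
```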
2 EXPERIMENT 1
This experiment intended to assess the localization ac-
curacy of inexperienced subjects as they became gradually
more familiarized with the non-individualized HRTF pro-
cessed sounds. We tested their ability to locate different
azimuth sounds in the horizontal plane in 10 consecutive
experimental sessions (blocks), while not providing any
feedback on the accuracy of their responses. We analyzed
the evolution of the subjects’ performance across blocks.
2.1 Method
Sounds were auralized at eight different azimuths: 0°
(front), 180° (back), 90° (left and right), 45° (left and right),
and 135° (left and right). They had constant elevation (0°)
and distance (1 m).
There were 10 blocks, each with 10 repetitions of each
stimulus; each block thus comprised a total of 80 sounds
per subject. Participants were told to indicate the perceived
sound source location for each stimulus.
The answers were recorded by selecting, on a touch
screen, one of the eight possible stimulus positions.
Fig. 1. Percentage of correct answers by azimuth (400 trials per
value).
2.2 Results
The average accuracy of azimuth localization was above
chance (65% correct answers) in all cases, but no ceiling
performances were observed. The left and right 90° sounds
were on average the most accurately located, with a correct
response rate of 78% (see fig. 1). Similarly to what had been
found in previous studies [8][9], there were several front-
back confusions, which account for the lower accuracy at 0°
(62% correct answers), 180° (43%), left/right 45° (60%),
and left/right 135° (69%).
Despite the data presented in fig. 1, we should not con-
sider the analysis by azimuth representative, as each listener
revealed different tendencies and biases. Indeed, there were
lateral asymmetries in accuracy, and no fixed pattern in
the error typology. For instance, some participants failed to
answer correctly to the 45° stimuli due to front-back con-
fusions, whereas others failed due to confusions with the
90° sounds. To further analyze these effects a more com-
prehensive study, with more participants, would be needed.
Analyzing the participants’ average performance over
time (fig. 2), we see that the overall accuracy remained
constant. There were individual differences between par-
ticipants. Listener 1 was the least accurate (50.4% correct
answers), listeners 2 and 3 performed near average (61.9%
and 71.1%, respectively), and listener 4 had the best az-
imuth localization performance (85.1%).
The linear regression results revealed a slope coefficient
close to zero (0.04), meaning almost no change in the per-
centage of correct responses. The correlation for the con-
catenated data confirmed that the experimental block number
did not account for the listeners’ accuracies (r² = 0.00,
p = 0.66). The analysis of each individual participant’s
correlations revealed that none obtained a significant effect
of block number on correct responses. This result was
further supported by hypothesis testing (F(9,3) = 0.087,
p = 0.99), which revealed no significant interactions
between block and each listener’s correct answers.
Fig. 2. Percentage of correct answers by experimental block and
linear regression (80 trials per dot).
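For readers wishing to reproduce this kind of analysis, the sketch below shows the pooled regression of percent-correct on block number using SciPy. The accuracy values are random placeholders, not the paper's data; the per-listener correlations and the ANOVA reported above are omitted.

```python
# A sketch of the pooled learning analysis: regress percent-correct on
# block number. The accuracy values are random placeholders, not the
# paper's data; per-listener correlations and the ANOVA are omitted.
import numpy as np
from scipy import stats

blocks = np.tile(np.arange(1, 11), 4)   # 10 blocks for each of 4 listeners
accuracy = np.random.default_rng(0).uniform(50, 85, size=blocks.size)

res = stats.linregress(blocks, accuracy)
print(f"slope = {res.slope:.2f}")                      # near zero: no trend
print(f"r^2 = {res.rvalue ** 2:.2f}, p = {res.pvalue:.2f}")
```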
2.3 Discussion
Our results reveal that naïve participants are able to lo-
calize sounds at several azimuths. However, this ability is
neither high nor consistent among subjects. Furthermore,
throughout the exposure blocks, their accuracy does not
evolve, leading to the conclusion that simple exposure is
not enough for significant localization improvement in short
periods of time.
In view of these conclusions, a second experiment was
developed where, in the same amount of time, listeners
were trained to identify sound source locations.
3 EXPERIMENT 2
In experiment 2, we tested the participants’ accuracy in
localizing sounds at several azimuths before and after a
short training program. In this program, we selected only
a small number of sounds and trained listeners on them
through active learning and response feedback.
3.1 Method
All stimuli were auralized varying in azimuth, with ele-
vation (0°) and distance (1 m) fixed. Azimuths ranged from
the front of the subject’s head to their right ear, spaced at
6° intervals (from 6° left to 96° right). Only these azimuths
were used, to assure that other effects, such as front-back
biases (a subject’s tendency to perceive sounds in the back
area rather than in the front) and individual lateral accuracy
asymmetries (a subject’s tendency to be more accurate for
right-sided sounds), did not emerge, as they were not the
focus of our study. Stimuli ranged from 6° left to 96° right,
rather than from 0° to 90°, to avoid reducing the response
options at the endpoints, which would artificially increase
the accuracy at these azimuths.
Fig. 3. Touch screen in the pre-test and post-test (A). Touch screen
in the training program (B).
3.2 Procedure
The experiment started with a pre-test. In the pre-test, all
sounds were presented with four repetitions each. Partici-
pants had to indicate, on a continuum displayed on a touch
screen (fig. 3A, blue area), the point in space where they
estimated the sound source to be.
After the pre-test, participants engaged in a training pe-
riod. The trained sounds corresponded to the frontal (0°),
lateral (90°), and three intermediate azimuths (21°, 45°,
and 66°) (see white areas in fig. 3B).
The training comprised the following steps.
Active Learning: Participants were presented with a
sound player where they could hear the training sounds
at will. To select the sounds, there were several but-
tons on the screen, arranged according to the corresponding
source position in space. The participants were informed
that they had five minutes to practice and that afterwards
they would be tested.
Passive Feedback: After the five minutes of active learn-
ing, participants heard the training sounds and had to point
to their location on a touch screen (fig. 3B). After each trial,
they were told the correct answer. The passive feedback
period continued until participants answered correctly in
80 percent of the trials (5 consecutive repetitions of all
stimuli with at least 20 correct answers).
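The stopping criterion can be made concrete with a short sketch. The rolling 25-trial window below (5 consecutive repetitions of the 5 trained stimuli, at least 20 correct) is our reading of the criterion just described; the two callbacks are assumptions standing in for the experiment software.

```python
# A sketch of the passive-feedback stage: repetitions of the five trained
# azimuths continue until at least 20 of the last 25 responses are correct
# (5 consecutive repetitions of all stimuli, i.e., the 80% criterion).
import random
from collections import deque

TRAINED_AZIMUTHS = [0, 21, 45, 66, 90]

def passive_feedback(present_and_collect, give_feedback):
    """present_and_collect(az) -> bool (was the response correct?)."""
    recent = deque(maxlen=25)            # rolling window: 5 reps x 5 stimuli
    while True:
        order = list(TRAINED_AZIMUTHS)
        random.shuffle(order)
        for az in order:
            correct = present_and_collect(az)
            give_feedback(az)            # tell the listener the true position
            recent.append(correct)
        if len(recent) == 25 and sum(recent) >= 20:
            return                       # 80% criterion reached
```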
When the training period ended, participants performed a
post-test, an experiment equal to the pre-test for comparison
purposes.
3.3 Results
3.3.1 Pre-Test
Results from the pre-test and post-test sessions are dis-
played in figs. 4 and 5.
Observing the average subjective localization for each
stimulus (fig. 4), we find there were no major shifts in
the perceived location of each sound. In the pre-test there
were greater deviations of the azimuth mean values in the
central area, where sounds tended to be perceived more to
the right, and in the right area, where the opposite effect
occurred. Standard deviations were much larger in the
central area, which might reflect greater uncertainty rather
than a significant shift in the perceived stimulus location.
[Fig. 4 graphic: mean response position (azimuth, deg) plotted against stimulus position (azimuth, deg); Pre-Test and Post-Test panels.]
Fig. 4. Mean azimuth localization tendencies for each stimulus
and standard deviations between subjects (120 trials per value).
In fig. 5, medium gray and dark gray bars display the
average distance (in degrees) to the given stimulus posi-
tion. Light gray bars display the mean theoretical error (in
degrees) that would be obtained if participants responded
randomly.
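One way to compute such a chance baseline, assuming responses drawn uniformly at random over the response continuum (taken here as -6° to 96°), is the closed-form expectation sketched below. The paper does not specify its exact computation, so this is one plausible reading rather than the authors' method.

```python
# A sketch of the chance baseline: expected absolute error at each
# stimulus azimuth if responses were drawn uniformly at random over the
# response continuum, assumed here to span -6 to 96 degrees.
import numpy as np

LO, HI = -6.0, 96.0                     # assumed response range (deg)

def expected_random_error(s, lo=LO, hi=HI):
    # E|U - s| for U ~ Uniform(lo, hi): integrating |u - s| analytically
    # gives ((s - lo)^2 + (hi - s)^2) / (2 (hi - lo)).
    return ((s - lo) ** 2 + (hi - s) ** 2) / (2.0 * (hi - lo))

for s in np.arange(-6, 97, 6):
    print(f"{s:4.0f} deg -> chance-level error {expected_random_error(s):5.1f} deg")
```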
Analyzing the pre-test error results (fig. 5, pre-test bars),
we observe that azimuth localization is easier for frontal
stimuli: the average error is below 5 degrees. The absence
of rear stimuli, which prevented any front-back confusions,
might help explain these results. As in experiment 1, lis-
teners were fairly precise in identifying lateral source posi-
tions. Sounds were most difficult to locate at intermediate
azimuths (between 40° and 60°). For these positions, pre-
test localization was at chance level, revealing an overall
inability of the subjects to accurately identify such sound
positions.
On average, participants missed the stimulus position in
the pre-test by 15.67°.
It is noteworthy that it was precisely in the central area
that the larger standard deviations occurred (fig. 4), consistent
with the larger errors found for this area in fig. 5.
3.3.2 Training Period
The training sessions were very successful for all partic-
ipants. All took less than 30 minutes and, on average, they
lasted 22 minutes.
Learning curves are displayed in fig. 6, where individual
azimuth localization accuracy is plotted as a function of the
time elapsed since the start of the training period.
All participants reached the 80% criterion. Despite the
differences in learning speed, a monotonic progression
was observed for all of them.
Fig. 5. Average response error in the Pre-Test and Post-Test sessions (120 trials per value), and theoretical error level if listeners responded
randomly.
Fig. 6. Individual accuracy evolutions in the azimuth localization
training sessions.
3.3.3 Post-Test
The post-test results (fig. 5, post-test bars) revealed a large
error reduction of 7.23° on average, from 15.67° in the pre-
test to 8.44° in the post-test. Despite individual differences,
all participants revealed similar learning effects.
This difference was statistically significant in a paired-
samples t-test (t(287) = 14.94, p < 0.001). The error re-
duction was most expressive at the intermediate azimuths,
where the average error decreased by 20 degrees. Analyzing
the trained azimuths (0°, 21°, 45°, 66°, 90°), we observe
that performance enhancement was substantial not only for
these stimuli, but also for others, not trained. As an example,
the largest error reduction was obtained at the 48° azimuth,
an untrained stimulus. In contrast, the 90° azimuth, a trained
one, revealed similar results in both sessions.
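A minimal sketch of this comparison is given below, assuming one error value per matched trial pair (288 pairs, consistent with the reported 287 degrees of freedom); the error arrays are placeholders, not the measured data.

```python
# A sketch of the pre/post comparison: a paired-samples t-test over
# matched trial errors. 288 pairs are assumed (consistent with the
# reported t(287)); the error arrays are placeholders, not real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pre_error = rng.normal(15.67, 10.0, size=288)    # placeholder pre-test errors
post_error = rng.normal(8.44, 8.0, size=288)     # placeholder post-test errors

t, p = stats.ttest_rel(pre_error, post_error)
print(f"t({pre_error.size - 1}) = {t:.2f}, p = {p:.3g}")
```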
Looking at the average localization tendencies (fig. 4), in
the post-test listeners became slightly more precise, espe-
cially for the right-area azimuths, and variability across
subjects was reduced in the central area.
3.4 Discussion
In this experiment we trained subjects in azimuth lo-
calization. We found that listeners learned quickly, in a
short amount of time. They improved their localization
ability not only for the trained azimuths, but also for others.
These findings allow us to conclude that localization abili-
ties trained at some stimulus positions generalize to other,
untrained, auditory positions.
4 EXPERIMENT 3
In experiment 3, an elevation localization task was car-
ried out using the same methodology as in experiment 2.
Static elevation is known to be perceived less accurately
than azimuth, probably because it does not benefit from
as many binaural cues as azimuth. This experiment was
designed to investigate whether or not the learning effect
found in experiment 2 could be attributed to an improved
interpretation of the binaural information contained in the
HRTFs.
4.1 Method
In this experiment, the stimuli varied in elevation, but
not in azimuth (0°) or distance (1 m). They ranged from
the front of the listeners’ head (0° elevation) to the top
(90° elevation) in 10° intervals. Participants were aware
that no back stimuli were present, but no instruction was
provided regarding stimuli below 0°.
Fig. 7. Touch screen in the pre-test and post-test (A). Touch screen
in the training period (B).
Fig. 8. Average response error in the Pre-Test and Post-Test
sessions (120 trials per value) and theoretical response errors if
listeners responded randomly.
4.2 Procedure
Experiment 3 followed the same procedure as experi-
ment 2.
In the training period, the sounds were positioned at
elevations of 0°, 40°, and 90°. Fig. 7 shows the touch screen
used in the pre-test and post-test sessions (A), as well as the
touch screen with the three trained elevations (B).
4.3 Results
4.3.1 Pre-Test
Fig. 8 presents the average distance (in degrees) between
the subjects’ answers and the stimulus elevations in the pre-
and post-test sessions. It also shows the theoretical errors
that would be obtained if subjects responded at chance.
These bars reflect the mean error if subjects responded
only between 0° and 90° elevation. In fact, participants did
sometimes answer below 0°, making the response distribution
asymmetric. Nevertheless, only symmetric predictions are
presented.
[Fig. 9 graphic: mean response position (elevation, deg) plotted against stimulus position (elevation, deg); Pre-test and Post-Test panels.]
Fig. 9. Mean elevation localization tendencies for each stimulus
and standard deviations between subjects (120 trials per value).
In the pre-test session the average error was 40.8°,
close to the random-response error. The subjects were unable
to identify the target sound position at any elevation; the
worst results were for the frontal (0°) stimuli (55° average
error). Overall, participants were less accurate in estimating
a sound position in elevation than in azimuth.
Regarding where each elevation sound was perceived in
space (fig. 9), we observe that prior to training sounds were
not accurately located, clustering around intermediate loca-
tions with large standard deviations. Standard deviations
were larger for frontal stimuli, but all sounds tended to be
perceived at higher elevations.
4.3.2 Training Period
Training sessions were faster than those of experiment
2, as there were only three trained elevations. On average,
they took 17 minutes (fig. 10).
Only one subject (listener 3) did not progress as expected.
After 10 minutes of testing, this subject was still making ex-
cessive mistakes and was allowed a second active learning
phase (5 minutes), after which the 80 percent criterion was
rapidly achieved.
4.3.3 Post-Test
The post-test results were better than those of the pre-test
for all subjects (figs. 8 and 9). This difference was signifi-
cant in a paired-samples t-test (t(159) = 16.678, p < 0.001).
The average error decreased by 14.75 degrees, to a mean of
26.5° (fig. 8), an effect larger than that found in experiment 2.
Fig. 10. Individual accuracy evolutions in the elevation training
sessions.
The training effect was most expressive for the upper stim-
uli, namely at the 80°, 40°, and 50° elevations. Among these
stimuli, the only trained one was at 40°. On the other hand,
errors for sounds at 0° elevation, a trained stimulus, revealed
no significant decrease in the post-test session. Similarly to
what was found in experiment 2, training was effective and
generalized well to other stimuli.
Regarding where sounds were perceived in space (fig. 9),
there was an improvement of localization accuracy for all
stimuli, along with a decrease in standard deviations, except
for sounds below 40°.
4.4 Discussion
In this experiment we trained subjects in an elevation
localization task. As in the second experiment, we found
that listeners learned quite quickly. There was an overall
improvement in localization accuracy for the upper eleva-
tions. Sounds at lower elevations did not improve with
training. This result might be interpreted as a general
inability to accurately interpret these sounds, but it might
also be a methodological artifact. As the test subjects were
not aware that no stimuli existed below 0°, some responded
below it, artificially skewing the response distribution and
therefore inflating the mean error. An additional experiment
controlling for this awareness would be needed to obtain
conclusive results.
In general, listeners improved their localization ability
not only for the trained elevations but also for others. These
findings bring further support to the assumption that the
learning achieved for specific sound positions might be
transferred to other, untrained, positions.
5 FINAL DISCUSSION
In this paper we were specifically interested in better
understanding the evolution in perceptual accuracy as a
subject familiarizes with non-individualized HRTFs. We in-
tended to understand if listeners adapt spontaneously with-
out feedback in a reasonably short time and/or if we could
somehow accelerate the adaptation process.
In experiment 1, we addressed the listeners’ adaptation
to static non-individualized azimuth sounds without feed-
back. Throughout 10 short consecutive experimental ses-
sions, we measured the percentage of correct answers in
position identification. Results revealed an overall absence
of performance improvement in all subjects. We concluded
that simple exposure is not enough for significant accuracy
improvement in short periods of time. Such exposure learn-
ing had been suggested in previous works [9][12][13] in an
attempt to explain individual differences in accuracy results.
Our results did not reveal such effects. Adaptation was, how-
ever, demonstrated before in [19], but over long periods of
time (weeks) and with spatial feedback, as the participants
of those experiments carried the molds inside their ears in
their daily lives during the whole period.
Pursuing the aim of preparing untrained listeners to
take full advantage of non-individualized HRTFs, we de-
signed a second experiment in which subjects could train
with sample sounds in a short program combining active
learning and feedback. In a pre-test, participants revealed
good localization abilities for frontal stimuli but performed
very poorly at the intermediate (40° to 60°) azimuths. After
the training sessions, in a post-test, all azimuths were iden-
tified above chance, with results significantly better than the
pre-test ones. More importantly, the training benefit was
observed not only for the trained sample azimuths but also
generalized to other stimulus positions. In interpreting these
results, one might argue that an overall learning of the new
HRTF-based cues took place and was then applied to the
untrained stimuli.
In experiment 3, we tested the same training program
with stimuli varying in elevation and with fixed azimuth.
Elevation alone is known to be poorly perceived in space,
compared to azimuth, mostly because it relies less on
binaural cues, as ITD and ILD values are fixed. Results in
the pre-test of this experiment revealed poor source local-
ization ability at almost all elevations, particularly from 0°
to 50°. Indeed, with unfamiliar HRTF filters, auralized
sounds carried little elevation information for the untrained
subjects. A large difference was found in the post-test,
where some localization ability arose. Again, the perfor-
mance benefit generalized across stimuli and was not
restricted to the trained elevations. This finding further
supports the assumption that the new HRTF-shaped spectral
cues were indeed learned.
In both experiments 2 and 3, the training sessions lasted
approximately 20 minutes. Longer training sessions might
have led to larger performance improvements. We stress,
however, that in preparing listeners for auralized interfaces,
time should not be the criterion. In our sessions, each par-
ticipant revealed a different profile and learned at a different
speed. Fixing a goal (such as 80% accuracy) provides a way
of assuring that all listeners reach an acceptable adaptation.
In this paper we used the term localization as a con-
ceptual tool to assess how listeners learn and adapt to
non-individualized HRTF-based sounds. We addressed lo-
calization as the ability to point to a given sound source in
space. Therefore, we cannot reach conclusions about the
underlying neural processes and, more precisely, we cannot
assess how subjects actually formed the percept of each
given sound in space. There might be pattern-learning pro-
cesses involved, or other cognitive factors that we cannot
account for. Nevertheless, we stress that there was a benefit
of training not only for the trained sounds but also for
others; an adaptation or learning process did take place.
We conclude that, for binaural auralization using generic
HRTFs, it is possible to significantly improve the audi-
tory performance of an untrained listener in a short pe-
riod of time. However, natural adaptation to static stimuli
is unlikely to occur in a timely manner. Without any train-
ing, several source positions are poorly localized. In view of
this, we argue that testing of virtual sounds processed through
non-individualized HRTFs should always consider possible
learning or adaptation effects.
Future studies in this field should test a range of dif-
ferent stimulus sounds and also focus on the endurance
of the learned capabilities over time, generalization lim-
its, and the training effects on the final auditory virtual
experience.
6 ACKNOWLEDGMENTS
This work was supported by FCT - Portuguese Founda-
tion for Science and Technology (SFRH/BD/36345/2007
and PTDC/TRA/67859/2006) and by FEDER funding
(FCOMP-01-0124-FEDER-022674).
We thank Prof. Damian Murphy (U. York) for his avail-
ability and support on methodological issues. We also thank
Prof. Pedro Arezes for his technical support.
7 REFERENCES
[1] J. Blauert (ed.), Communication Acoustics (Germany:
Springer-Verlag, 2005).
[2] F. Wightman and D. Kistler, “Measurement Valida-
tion of Human HRTFs for Use in Hearing Research,” Acta
Acustica united with Acustica, 91, pp. 429–439 (2005).
[3] G. Plenge “On the Difference between Localization
and Lateralization,” J. Acoust. Soc. Am., 56, pp. 944–951
(1974).
[4] F. L. Wightman, D. J. Kistler, and M. E. Perkins, “A
New Approach to the Study of Human Sound Localization,”
in Directional Hearing, W. Yost, G. Gourevich, Eds. (New
York: Springer-Verlag, 1987).
[5] D. Hammershøi and H. Møller, “Sound Transmission
To and Within the Human Ear Canal,” J. Acoust. Soc. Am.,
100 (1), pp. 408–427 (1996).
[6] J. C. Middlebrooks, “Individual Differences in
External-Ear Transfer Functions Reduced by Scaling Fre-
quency,” J. Acoust. Soc. Am., 106 (3), pp. 1480–1492
(1999).
[7] K. J. Faller II, A. Barreto, and M. Adjouadi, “Aug-
mented Hankel Total Least-Squares Decomposition of
Head-Related Transfer Functions,” J. Audio Eng. Soc.,
vol. 58, pp. 3–21 (2010 Jan./Feb.).
[8] J. Breebaart, F. Nater, and A. Kohlrausch, “Spectral
and Spatial Parameter Resolution Requirements for Para-
metric Filter-Bank-Based HRTF Processing,” J. Audio Eng.
Soc., vol. 58, pp. 126–140 (2010 Mar.).
[9] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L.
Wightman, “Localization Using Nonindividualized Head-
Related Transfer Functions,” J. Acoust. Soc. Am., 94,
pp. 111–123 (1993).
[10] D. R. Begault and E. M. Wenzel. “Headphone Lo-
calization of Speech,” Hum. Fact., 35 (2), pp. 361–376
(1993).
[11] T. Papadopoulos and P. A. Nelson, “Choice of In-
verse Filter Design Parameters in Virtual Acoustics Imag-
ing Systems,” J. Audio Eng. Soc., vol. 58, pp. 22–35 (2010
Jan./Feb.).
[12] E. Blanco-Martin and F. J. Casajus-Quiros, “Ob-
jective Measurement of Sound Event Localization in Hor-
izontal and Median Planes,” J. Audio Eng. Soc., vol. 59,
pp. 124–136 (2011 Mar.).
[13] A. Väljamäe, P. Larsson, D. Västfjäll, and M. Kleiner,
“Auditory Presence, Individualized Head-Related Trans-
fer Functions, and Illusory Ego-Motion in Virtual Environ-
ments,” Proceedings of the Seventh Annual Workshop on
Presence, Spain (2004).
[14] P. Minnaar, S. K. Olesen, F. Christensen, and H.
Møller. “Localization with Binaural Recordings from Ar-
tificial and Human Heads,” J. Audio Eng. Soc., vol. 49,
pp. 323–336 (2001 May).
[15] F. Asano, Y. Suzuki, and T. Sone, “Role of Spectral
Cues in Median Plane Localization,” J. Acoust. Soc. Am.,
88, pp. 159–168 (1990).
[16] J. Huang, H. Li, A. Saji, K. Tanno, and T. Watanabe,
“The Learning Effect of HRTF Based 3-D Sound Percep-
tion with a Horizontally Arranged 8-Loudspeaker System,”
presented at the 129th Convention of the Audio Engineering
Society (2010 Nov.), convention paper 8274.
[17] C. D. Gilbert, “Adult Cortical Dynamics,” Physiol.
Rev., 78, pp. 467–485 (1998).
[18] B. G. Shinn-Cunningham, N. I. Durlach, and R. M.
Held, “Adapting to Supernormal Auditory Location Cues.
I. Bias and Resolution.” J. Acoust. Soc. Am., 103, pp. 3656–
3666 (1998).
[19] P. M. Hofman, J. G. A. Van Riswick, and A. J. Van
Opstal, “Relearning Sound Localization with New Ears,”
Nat. Neurosci., 1, pp. 417–421 (1998).
[20] B. Gardner and K. Martin, “HRTF Measurements
of a KEMAR Dummy-Head Microphone,” MIT Media Lab
Perceptual Computing Technical Report #280 (1994).
THE AUTHORS
Catarina Mendonça, Guilherme Campos, José Vieira, Paulo Dias, João Pedro Ferreira, Jorge Almeida Santos
Catarina Mendonça is a psychologist with an MSc in
cognitive sciences and a Ph.D. in experimental psychol-
ogy and cognitive sciences. She currently works at the Uni-
versity of Minho as a research coordinator in perception,
interaction, and usability. Her areas of expertise are hu-
man perception of multimodal displays, perception-action
loops, human adaptability, and user interfaces. Since 2006
she has worked in the Laboratory of Visualization and
Perception as a researcher. From 2008 to 2011 she worked
as a researcher both in the School of Psychology, Univer-
sity of Minho, and in the Faculty of Medicine, University
of Coimbra. At the University of Minho she was a teaching
assistant from the school years 2007/2008 to 2010/2011 in
the discipline of “laboratory of perception” and an invited
teaching assistant both in 2008/2009 and in 2010/2011,
having lectured “Psychology of Perception.” Over the last
few years, she has published on the theme of human
perception and participated in several funded research
projects (e.g., research projects Biomotion (PTDC/SAU-
BEB/68455/2006), Noiseless (PTDC/TRA/67859/2006),
and Acousticave (PTDC/EEA-ELC/112137/2009)) and in
international collaborations.
Guilherme Campos graduated in electrical and computer
engineering at the Faculty of Engineering of the Univer-
sity of Porto (FEUP) in 1989, with a specialization in
control systems. He worked for several companies in elec-
tronic system development, factory and building automa-
tion projects, and quality assurance and became technical
director of a manufacturer of electrical equipment with a
voting seat at the national electrical standards committees.
He gained scholarships from FCT (Portugal’s Foundation
for Science and Technology) to study at the University of
York (UK), where he completed an MSc in music technol-
ogy (Dept. of Music and Electronics) in 1997 and a Ph.D.
(Dept. of Electronics) in 2003 on room acoustic modelling
using 3-D digital waveguide meshes. He moved back to
Portugal to take a post of assistant professor at the Depart-
ment of Electronics, Telecommunications and Informatics
(DETI) of the University of Aveiro and pursue research in
audio signal processing at its affiliated Institute of Electron-
ics and Telematics Engineering (IEETA). His main current
research interests are: 3-D digital waveguide modeling and
its practical application to room acoustic simulation and
real-time audio virtual reality; parallel computing and de-
velopment of specific hardware for physical modeling; au-
dio pattern recognition for computer-aided stethoscopy.
José Vieira received a diploma in electrical engineer-
ing in 1988 from the University of Coimbra. In 2000 he
received a Ph.D. in electrical engineering from the Uni-
versity of Aveiro. He has been a professor of electrical
engineering at the University of Aveiro since 1991, and
is also a researcher at the IEETA Institute. He has been
the president of the AES Portugal Section since 2003. His
major research interests are in the fields of digital audio,
signal reconstruction, digital fountains, and compressed
sensing.
Paulo Dias was born in France in 1975. He went to
the University of Aveiro, Portugal, where he graduated
in electronics and telecommunications in 1998, being
awarded the Alcatel prize “Prémio Engenheiro José
Ferreira Pinto Basto” for the best student of electronics
and telecommunications engineering of the University of
Aveiro (1998). After graduation he got a one-year grant
within the EC-Camera Research and Training Network to
start working on 3D reconstruction at the Joint Research
Centre in Italy. He spent three more years in Italy with a
Ph.D. grant from the Portuguese Foundation for Science
and Technology to continue his work on 3D reconstruction
and fusion of intensity and range information. In Septem-
ber 2003 he finished his Ph.D. at the University of Aveiro
with the thesis “3D Reconstruction of Real World Scenes
Using Laser and Intensity Data.” He then started teaching at
the University of Aveiro as an assistant professor within the
Department of Electronics, Telecommunications and Infor-
matics. He is also involved in several works and projects
within his research unit in Aveiro (IEETA, Instituto de
Engenharia Electrónica e Telemática de Aveiro) related to
3D reconstruction, virtual reality, computer vision, com-
puter graphics, visualization, and combination and fusion
of data from multiple sensors.
João Pedro Ferreira received a degree in electronics
engineering and an MSc degree in robotics, control sys-
tems, and automation from the University of Minho,
Guimarães, Portugal, in 2010. In 2011 he started his
Ph.D. in technology and information systems, develop-
ing his thesis in the area of social signals processing
in human-computer interaction. Currently he works at
the University of Minho as researcher on the social sig-
nals processing project. He has been collaborating on
projects related to human-robot and human-computer in-
teraction, usability, digital signal processing applied to
sound and image, computer programming languages, and
mathematical models for machine learning and intelligent
systems.
Prof. Dr. Jorge Almeida Santos is an associate profes-
sor in the Department of Psychology at the University of
Minho, Portugal. He is a member of the national research
and UM centers CIPsi (psychology) and Algoritmi (engi-
neering). His research domains and interests are: visual
perception, perception-action and intermodal processes
(visual, auditory, tactile), virtual reality and immersive sys-
tems, human factors, safety, and health. He is the coordi-
nator of the Laboratory of Visualization and Perception
(http://webs.psi.uminho.pt/lvp) and a member of the Ad-
ministration Board of the Center for Computer Graphics
(http://www.ccg.pt), a nonprofit RTD institution, and a
member of international networks such as the former
ZDGVnet and the forthcoming GraphicsMediaNet.
The role of spectral cues in the sound source to ear transfer function in median plane sound localization is investigated in this paper. At first, transfer functions were measured and analyzed. Then, these transfer functions were used in experiments where sounds from a source on the median plane were simulated and presented to subjects through headphones. In these simulation experiments, the transfer functions were smoothed by ARMA models with different degrees of simplification to investigate the role of microscopic and macroscopic patterns in the transfer functions for median plane localization. The results of the study are summarized as follows: (1) For front-rear judgment, information derived from microscopic peaks and dips in the low-frequency region (below 2 kHz) and the macroscopic patterns in the high-frequency region seems to be utilized; (2) for judgment of elevation angle, major cues exist in the high-frequency region above 5 kHz. The information in macroscopic patterns is utilized instead of that in small peaks and dips.