Audio Engineering Society
Convention Paper
Presented at the 134th Convention
2013 May 4-7, Rome, Italy

This paper was peer-reviewed as a complete manuscript for presentation at this Convention. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.
A modular framework for the analysis and synthesis of Head-Related Transfer Functions

Michele Geronazzo, Simone Spagnol, and Federico Avanzini
Dept. of Information Engineering, University of Padova, Via Gradenigo 6/B, 35131 Padova, Italy
Correspondence should be addressed to Michele Geronazzo (geronazzo@dei.unipd.it)
ABSTRACT

The paper gives an overview of a number of tools for the analysis and synthesis of head-related transfer functions (HRTFs) that we have developed in the past four years at the Department of Information Engineering, University of Padova, Italy. The main objective of our study in this context is the progressive development of a collection of algorithms for the construction of a totally synthetic personal HRTF set, replacing both cumbersome and tedious individual HRTF measurements and the exploitation of inaccurate non-individual HRTF sets. Our research methodology is highlighted, along with the multiple possibilities of present and future research offered by such tools.
1. INTRODUCTION

In recent years spatial sound has become increasingly important in a plethora of application domains. Spatial rendering of sound is especially recognized to greatly enhance the effectiveness of auditory human-computer interfaces [1], particularly in those cases where the visual interface is limited in extension and/or resolution (as in mobile applications [2]). Furthermore, it helps improve the sense of presence in augmented/virtual reality systems and adds engagement to computer games. With these ever-growing application domains in mind, headphone-based reproduction systems driven by head-tracking devices, if properly designed, allow tailoring immersive and realistic auditory scenes to any user without the need for expensive and cumbersome loudspeaker-based systems.
This paper gives a brief overview of a number of tools for the analysis and synthesis of head-related transfer functions (HRTFs, i.e., the frequency- and location-dependent acoustic transfer functions between the sound source and the eardrum of a listener [3]) that we have developed, highlighting in particular our research methodology along with the diverse possibilities of present and future research offered by the mentioned tools. The main objective of our study in this context is the progressive development of a collection of algorithms for the construction of a totally synthetic HRTF set suitable for real-time rendering of custom spatial audio, taking the listener's anthropometric parameters as the sole input to the audio chain. Our modeling philosophy is the child of the structural approach by Brown and Duda [4]: the total contribution of the listener's body to the HRTF is split into smaller blocks or modules, and each module contains a measured, reconstructed, or synthetic response, as will be made clearer throughout the paper. Our approach differs from recent trends in HRTF customization in that, in principle, no self-tuning of parameters or selection of responses from databases is required of the listener.
After the short picture of past and current research on HRTF-related issues given in Section 2, Section 3 will introduce the currently adopted framework for HRTF rendering and customization, including all of the hardware and software tools at our disposal. Section 4 will then discuss in more detail the structural modus operandi, with a focus on the multiple current and future research directions made possible by the available tools (further expanded in the concluding Section 5).
2. BACKGROUND

In order to enable authentic auditory experiences, the correct sound pressure level (SPL) due to one or more acoustic sources positioned in a virtual space shall be reproduced at the eardrums of a listener by the pair of headphones. In the literature, sound transmission from the headphone to the eardrum is often represented through an analogue circuit model [5], reported in Fig. 1. With reference to such a model, the current aim of our research is the correct reproduction of the sound pressure P6 at the entrance of the blocked ear canal of the listener due to a sound source placed around him/her, even though nothing prevents a future extension of these studies to circuit points nearer to the eardrum impedance as soon as proper tools are available (e.g., HRTF measurements at the eardrum, ear canal models, etc.).
Fig. 1: Circuit model of sound transmission from the headphone to the eardrum (after [5]).

The classical solution that best approximates ideality involves the use of individual HRTFs measured on the listener himself/herself. By convolving a desired monophonic sound signal with a pair of personal head-related impulse responses (HRIRs), one per ear, adequately compensated for headphone-induced spectral coloration(1), one can reach almost the same localization accuracy as in free-field listening conditions [8], especially when head motion and/or artificial reverberation [9] are considered [10, 11].
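As an illustration of this classical solution, static binaural rendering reduces to two convolutions. The following Matlab sketch assumes a hypothetical file hrirs.mat holding a measured, headphone-compensated HRIR pair; all file and variable names are placeholders, not part of our framework:

    % Minimal sketch: static binaural rendering by HRIR convolution.
    load('hrirs.mat', 'hrir_l', 'hrir_r');   % hypothetical HRIR pair
    [x, fs] = audioread('mono_source.wav');  % monophonic input signal
    yL = conv(x, hrir_l(:));                 % left-ear signal
    yR = conv(x, hrir_r(:));                 % right-ear signal
    y  = [yL, yR] / max(abs([yL; yR]));      % normalize to avoid clipping
    sound(y, fs);                            % playback over headphones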
Unfortunately, obtaining personal HRTF data for a vast number of users is simply impracticable, because specific hardware, anechoic spaces, and long collection times are strictly required [4]. This is the main reason why non-individual HRTFs, usually measured on anthropomorphic mannequins [12], are often preferred in practice. The drawback with non-individual HRTFs is that such peculiar transfer functions likely never match the listener's unique anthropometry, and especially his/her outer ear [13], resulting in frequent localization errors such as front/back reversals [14], elevation angle misperception [15], and inside-the-head localization [16, 17].
Consequently, computational models of HRTFs parameterized on the anthropometry of the listener, or tunable by the listener himself/herself, have surfaced in the past two decades.

(1) Headphones, when used for the reproduction of binaural signals, have to be equalized if high localization accuracy is needed. Unfortunately, the transfer function between headphone and eardrum varies heavily from person to person and with small displacements of the headphone itself. Such variation is particularly marked in the high-frequency range, where important elevation cues generally lie. Thus, an inaccurate compensation likely leads to spectral colorations that affect both source elevation perception and sound externalization [6]. Although various techniques have been proposed in order to face such a delicate issue (see [7] for a review), modeling the correct equalization filter is still a hot open research theme.
Fig. 2: A generic structural HRTF model.

According to Brown and Duda [4], these models can be classified into three groups:
1. pole/zero models: filter design, system identification, and neural network techniques are applied in order to fit multiparameter models to experimental data (e.g., [18]);

2. series expansions, based e.g. on principal component analysis (PCA) [19] or surface spherical harmonics (SSH) [20], applied to collections of HRIRs or HRTFs;

3. structural models: the contributions of the listener's head, pinnae, shoulders, and torso to the HRTF are isolated and arranged in different filter structures, each accounting for some well-defined physical phenomenon, as Fig. 2 roughly sketches. The linearity of these contributions allows reconstruction of the global HRTF from a proper combination of all the considered effects [21].
Although recent trends in HRTF customization have mainly focused on series expansions with self-tuning of weights [22, 23] or simply on non-individualized HRTF selection [24, 25, 26], structural HRTF modeling remains the most attractive alternative from the viewpoints of both computational efficiency and physical meaning: parameters of the rendering blocks sketched in Fig. 2 can be estimated from real data, fitted to low-order filter structures, and finally related to meaningful anthropometric measurements. Our methodology follows the structural modeling approach, subtly generalized so as to allow for the inclusion of measured or extrapolated data in one or more of its components. For instance, one might desire to combine a filter model of the pinna with the measured HRTF of a generic pinnaless mannequin, or to feed a post-processed HRTF including the response of the pinna alone (the pinna-related transfer function, PRTF [27]) into a snowman filter model [28]. We will refer to this approach as mixed structural modeling throughout the remainder of this paper.
3. THE FRAMEWORK

The first logical distinction among the fundamental components of the presented framework regards the file system level. The database folder acts as the main data container, while all of the algorithms that extract relevant features from the available data are stored in the analysis folder. The synthesis folder contains the tools for spatial audio rendering, designed and developed with an eye to real-time constraints. Finally, the experimental tools for subjective evaluation of models and the related data are organized in the evaluation folder. The UML-like diagram in Fig. 3 depicts all the components building the framework.

Fig. 3: UML-like representation of the research framework structure.
3.1. HRIR and HpIR Databases

The included data is in the form of several sets of head-related impulse responses (HRIRs) recorded for a high number of subjects in different spatial locations, and of sets of headphone impulse responses (HpIRs) for the characterization of different headphone models used for compensation in the reproduction process. Beside the full-body HRIR repository, a similar container includes the partial head-related impulse responses (pHRIRs) recorded by isolating specific body parts (e.g., pinna-related impulse responses (PRIRs) measured on the isolated pinna, or impulse responses measured on a pinnaless mannequin) or resulting from the decomposition carried out by the algorithms we will mention in the next subsection.
There exist a number of publicly available HRIR databases, the most notable of which are the CIPIC HRTF database [29](2) and the LISTEN HRIR database [30](3). The main differences among these and other databases concern the type of stimulus used, the spatial grid of the measured HRIRs, and the microphone configuration (blocked or open ear canal, distance from the eardrum, etc.).

An attempt to unify such variability resulted in the MARL-NYU data format [31], into which the CIPIC, LISTEN, FIU [32](4), and KEMAR MIT [33](5) databases were stored and organized. Our contribution to the standardization process begun in [31] lies in the introduction of some missing relevant data:

1. the raw HRIR data beside the already available compensated version;
2. the labeling of each HRIR's onset sample;
3. the management of heterogeneous spatial grids.

The general database organization was also applied to the Aalto [34](6), ARI(7), and PKU&IOA [35](8) HRTF databases, and to the Aalto PRTF database [36](9), which collects pinna-related impulse responses in the mid-sagittal plane as pHRIRs. More details about the organization of the HRIR, pHRIR, and HpIR repositories can be found in [37].
(2) http://interface.cipic.ucdavis.edu/
(3) http://recherche.ircam.fr/equipes/salles/listen/
(4) http://dsp.eng.fiu.edu/HRTFDB/main.htm
(5) http://sound.media.mit.edu/resources/KEMAR.html
(6) http://www.acoustics.hut.fi/go/aes133-hrtf/
(7) http://www.kfs.oeaw.ac.at
(8) http://www.cis.pku.edu.cn/auditory/Staff/Dr.Qu.files/Qu-HRTF-Database.html
(9) http://www.dei.unipd.it/~spagnols/PRTFdb.zip
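By way of illustration, one standardized repository record could collect the three additions listed above alongside the response data. The Matlab sketch below is hypothetical: the field names are ours for illustration, not the repository's actual schema, which is documented in [37]:

    % Hypothetical HRIR repository record (illustrative field names only).
    nDir = 1250;  nSmp = 200;                % directions, samples per HRIR
    entry.subjectID = 'CIPIC_003';           % source database and subject
    entry.fs        = 44100;                 % sampling rate (Hz)
    entry.grid      = zeros(nDir, 2);        % [azimuth, elevation] pairs,
                                             % possibly non-uniform (item 3)
    entry.hrir_raw  = zeros(nSmp, nDir, 2);  % raw responses (item 1), L/R
    entry.hrir_comp = zeros(nSmp, nDir, 2);  % compensated responses
    entry.onset     = zeros(nDir, 2);        % onset sample labels (item 2)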
3.2. Signal Analysis Tools

The analysis folder contains all the Matlab scripts and data structures exploitable for HRTF analysis. Our typical workflow follows an analysis-by-synthesis paradigm where the step-by-step modeling of salient features plays a significant role in the analysis of the acoustic signal. A notable instance of such a paradigm is represented by the PRTF separation algorithm [38], which iteratively extrapolates the reflective component of a PRTF while keeping its resonant structure intact by direct subtraction of multi-notch filter structures. A similar algorithm, used in [39], separates the near- and far-field contributions of a rigid sphere approximating the head, allowing the two contributions to be modeled independently through different filter structures.

Other kinds of algorithms represent the "boundaries" of such a methodology. For instance, an image processing algorithm that extracts the relevant anthropometric parameters from a picture of the pinna [40] is currently being developed. A script for PCA modeling of HRTF data, which helps in understanding the degree of variability of the transfer functions with respect to specific features, is available [41]. Last but not least, headphone equalization algorithms implementing various inverse filtering techniques are included.
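A heavily simplified sketch of such an analysis-by-synthesis loop is given below: it iteratively locates the deepest notch with respect to a smoothed resonant envelope and subtracts a second-order notch response until no deep notch remains. The moving-average envelope, the -3 dB stopping threshold, the fixed Q, and the use of an RBJ-style peaking section are our simplifying assumptions here, not the published algorithm of [38]:

    % Sketch of an analysis-by-synthesis PRTF decomposition (cf. [38]).
    % H: complex PRTF samples; f: frequency axis (Hz); fs: sampling rate.
    function [res_dB, refl_dB] = prtf_separate_sketch(H, f, fs)
    mag_dB  = 20*log10(abs(H(:)));           % PRTF magnitude in dB
    refl_dB = zeros(size(mag_dB));           % accumulated multi-notch part
    for k = 1:10                             % at most 10 notches
        work = mag_dB - refl_dB;             % current residual PRTF
        env  = movmean(work, 64);            % crude resonant envelope
        [depth, idx] = min(work - env);      % deepest notch vs. envelope
        if depth > -3, break; end            % stop when notches are shallow
        % second-order peaking section with negative gain acting as a notch
        A = 10^(depth/40); w0 = 2*pi*f(idx)/fs; alpha = sin(w0)/8;  % Q = 4
        b = [1+alpha*A, -2*cos(w0), 1-alpha*A];
        a = [1+alpha/A, -2*cos(w0), 1+alpha/A];
        refl_dB = refl_dB + 20*log10(abs(freqz(b, a, f(:), fs)));
    end
    res_dB = mag_dB - refl_dB;               % resonant component estimate
    end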
3.3. Synthesized Audio Rendering

The audio engine, stored in the synthesis folder, includes four modules organized in separate subfolders:

- model: real-time realizations of the synthetic structural components;
- components: a collection of tools that perform real-time convolutions between audio files and HRIRs/pHRIRs;
- headphones: a management tool for headphone compensation filters;
- extra: a utility bundle for I/O operations, sensors, and basic binaural processing tools.

Various combinations of one or more instances of each module are possible in order to realize a candidate version of the 3D audio engine. All instances are implemented in Pure Data(10), a graphical programming environment for audio processing, in the form of C/C++ externals. All the tentative prototypes are catalogued in a further folder (the rendering folder), each accompanied by a descriptor file including the list of the modules used in that instance.

(10) http://puredata.info/
The intrinsic modularity of our approach leads us to the implementation of one structural filter block for each relevant body part:

- a pinna filter realization that acts as a synthetic PRTF, consisting of a peak and notch filter structure [42, 43] where each filter is fed by three parameters (peak/notch central frequency, bandwidth, and gain), each stored in a configuration file indexed by the current elevation angle (a sketch of one such filter section is given after this list);

- a spherical model of the head taking into account the far-field and near-field scattering effects around its rigid body; the parametrization is made on the sphere radius, selected as a weighted combination of the listener's head dimensions [44], and the near-field contribution is reduced down to a first-order shelving filter [39];

- a spherical torso approximation (as in the snowman model [28]) that models elevation cues at low frequencies.
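As an example of how one rendering block consumes its parameters, a single peak/notch section can be realized as a standard second-order peaking equalizer. This sketch assumes RBJ-style coefficients and illustrative parameter names; the actual filter structures are those of [42, 43]:

    % One synthetic-PRTF section as a second-order peaking equalizer
    % (sketch; fc = central frequency in Hz, bw = bandwidth in Hz,
    % g_dB = gain in dB, positive for a peak and negative for a notch).
    function [b, a] = peak_notch(fc, bw, g_dB, fs)
    A     = 10^(g_dB/40);
    w0    = 2*pi*fc/fs;
    alpha = sin(w0) * bw / (2*fc);           % from Q = fc/bw
    b = [1+alpha*A, -2*cos(w0), 1-alpha*A];
    a = [1+alpha/A, -2*cos(w0), 1+alpha/A];
    b = b/a(1);  a = a/a(1);                 % normalized coefficients
    end

One such section is instantiated per peak/notch, with the parameter triplet read from the elevation-indexed configuration file and filter(b, a, x) applying the section to the input block.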
An expanded mention goes to the contents of the extra folder, in which a collection of third-party utility tools is kept updated and integrated in our prototyping environment, mainly developed in Matlab, Pure Data, and C/C++ libraries. A basic external for the rendering process is CW binaural [45], which implements real-time convolution of sound inputs with selected HRIRs. It has the peculiar feature of being able to load an arbitrary discrete set of HRIRs in .wav format and to realize different kinds of interpolation between adjacent spatial positions. This tool is at the basis of our dynamic 3D audio rendering system, where the successful transposition of dynamic sources into a virtual world not only depends on the accuracy of the interpolation scheme but is also heavily conditioned by the quality of the motion tracking system. In decreasing order of degree of immersion, a PhaseSpace Impulse MoCap system, a head-pose estimation system via webcam with the faceAPI software library, and a Trivisio Colibri wireless inertial motion tracker mounted on top of a pair of headphones are already integrated in our environment.

Fig. 4: A GUI for localization experiments.

Fig. 5: A high-level technical description of a typical experimental scenario.
3.4. Experimental Environment

An environment for subjective localization tests is stored in the evaluation folder. A GUI in Matlab, reported in the screenshot of Fig. 4, offers the main environment for the playback and judgment of the HRTF data and models to be evaluated. The subject listens to the sound stimulus, interactively selects the perceived sound location and/or other properties, and switches to the next sound stimulus.
Fig. 6: Typical research workflow towards a complete structural model.
The data exchange between the Matlab environment and the audio engine is granted by the OSC (Open Sound Control) protocol, running on top of UDP. Fig. 5 reports a technical description of a typical experiment, where the pnet library and the dumpOSC external are responsible for communication on the Matlab side and on the Pure Data side, respectively. The basic features for the experimenter, such as subject and session management, are also available. Subjects' records and their personal information can be manipulated on demand. Each experimental session is stored in an independent file labeled by a session number id. The task info struct contains the descriptive information of the task and the timestamps for each trial conducted within the session. The latter field operates as a primary key to read and understand the experimental data stored in a table specifically designed for the purpose of the experiment. Common statistical analysis software can directly import the already organized data.
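For concreteness, the Matlab side of such an exchange can be sketched as below, assuming the pnet UDP API; the OSC address pattern, the port numbers, and the float payload are invented for illustration:

    % Sending one OSC message ('/stimulus/next' with a float argument)
    % from Matlab to Pure Data through the pnet UDP library (sketch).
    sock = pnet('udpsocket', 9000);               % local UDP socket
    msg  = [oscstr('/stimulus/next'), oscstr(',f'), ...
            fliplr(typecast(single(1.0), 'uint8'))];  % big-endian float32
                                                  % (assumes little-endian host)
    pnet(sock, 'write', msg);
    pnet(sock, 'writepacket', '127.0.0.1', 9001); % dumpOSC listens here
    pnet(sock, 'close');

    function p = oscstr(s)                        % null-pad to 4-byte words,
    p = uint8([s, zeros(1, 4 - mod(numel(s), 4))]);  % as OSC strings require
    end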
4. MODUS OPERANDI

The research process aims to build a completely customizable structural model through subsequent refinements, starting from a selection of recorded HRIRs and moving towards a totally synthetic filter model. The intermediate steps are balanced mixtures of selected pHRIRs and synthetic structural components.

A candidate mixed structural model is described by a set of parameters and components; the evaluation step guides the exclusion of certain combinations of such components. The obtained 3D audio engine shall maximize the efficiency-to-parametrization rate: this means that synthetic submodels with a high customization degree but poor localization performance compel us to conceive a different or modified submodel. In such a case, a previous version of that submodel, whether it be synthetic, directly recorded, or extracted by analysis, remains the best option for the component's acoustic contribution up to that point. As soon as the partial selection process reaches its maximum, we will have the optimal solution for our complete structural model. We now describe in more detail our typical research workflow, referring to Fig. 6 throughout the whole section.
4.1. HRTF Selection

The simplest HRTF selection possible is realized by choosing the same set of HRTFs for all of the listeners. Whether it be the best available localizer or a mannequin with mean anatomical features, no prediction can be made on the localization performance of a specific listener. Insight into the relation between localization cues and anthropometric features can guide the selection process, as in [46]. If measurements of pinna dimensions for each subject in the considered HRTF database are available, a simple "best match" with directly measured pinna dimensions of a common listener can be exploited. Eq. (1) shows a possible distance function on the left pinna applied to the CIPIC HRTF database:

s = arg min_{i=1...N} | d_1^i + d_2^i + d_4^i - p_l |,   (1)
where N is the number of subjects in the database, d_k is the vector collecting the k-th anthropometric feature of the pinna across subjects, i.e., cavum concha height (d_1), cymba concha height (d_2), and fossa height (d_4) respectively, and p_l is the distance from the superior internal helix border to the intertragic incisure of the l-th listener. At the end of this procedure the s-th subject is declared the best-matching elevation localizer for listener l, and this correspondence is saved in the experimental environment.
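In code, the selection of Eq. (1) reduces to a one-line search over the anthropometric matrix; the sketch below assumes CIPIC-style data (one row per subject, columns following the d_1...d_8 convention) and illustrative names:

    % Best-matching subject according to Eq. (1) (sketch).
    % D: anthropometric matrix, one row per database subject;
    % p_l: listener's helix-to-intertragic-incisure distance.
    function s = select_pinna_subject(D, p_l)
    span   = D(:,1) + D(:,2) + D(:,4);   % d1 + d2 + d4 per subject
    [~, s] = min(abs(span - p_l));       % index of the closest match
    end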
4.2. Structural HRTF Selection & Modeling

The keen observer might criticize the restriction of the latter selection procedure to the sole contribution of the pinna: of course, there exists no guarantee of accuracy in azimuth localization. However, the above procedure can be applied to the Aalto PRTF database or, alternatively, to a collection of mid-sagittal PRTFs extracted in the analysis step. On the other hand, the selection of a best-matching contribution from the head, by means of recorded impulse responses (such as the pinnaless KEMAR HRIRs) or of extracted ITD (interaural time difference) / ILD (interaural level difference) information, may adopt the same principle. Such an alternative leads to a finer selection of each structural component and is at the base of the progression of our mixed structural modeling approach.
HRTF sets resulting from further refinements of selection criteria should now be compared to our candidate synthetic filter models, the parameters of which are strictly related to the anthropometric quantities used for HRTF selection. The simplest example is the spherical model optimization in [44], where the sphere radius is customized through an empirical equation derived from ITD measurements of a population of 25 subjects. Eq. (2) reports the weighted sum of head dimensions used to calculate the optimal sphere radius, while Eq. (3) represents a trivial ITD selection:

a_opt = w_1 X_1 + w_2 X_2 + w_3 X_3,   (2)

s = arg min_{i=1...N} | a_opt^i - a_opt^l |,   (3)
where X_1 is the head half-width, X_2 the head half-height, X_3 the head half-length, w_j is the j-th weight, a_opt^i is the optimal sphere radius for the i-th database subject, and a_opt^l is the optimal sphere radius of the l-th listener. At the end of this procedure we obtain two alternatives to test and compare: a parameterized spherical filter model and a selected set of real ITDs.
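Both alternatives build on the same radius estimate. A sketch of Eqs. (2)-(3) follows, assuming the regression coefficients of [44] (approximately w = [0.51, 0.019, 0.18] plus a 3.2 cm constant term, which Eq. (2) above omits); treat the exact values as an assumption of this sketch:

    % Sphere-radius estimation (Eq. 2) and subject selection (Eq. 3).
    % X_l:  [X1 X2 X3] = head half-width/half-height/half-length (cm);
    % X_db: N x 3 matrix with the same measurements per database subject.
    function s = select_itd_subject(X_l, X_db)
    w = [0.51; 0.019; 0.18];  c = 3.2;    % weights and offset after [44]
    a_l  = X_l  * w + c;                  % optimal radius, listener (cm)
    a_db = X_db * w + c;                  % optimal radii, database subjects
    [~, s] = min(abs(a_db - a_l));        % Eq. (3): closest-radius subject
    end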
Restricting our attention to the cited structural components (head and pinna), 3x3 instances of mixed structural models already arise from the combination of the following alternatives:

- pinna structural block: measured KEMAR PRTFs, Aalto PRTF database selection, or selection of extracted PRTFs from HRTF databases;

- head structural block: measured pinnaless KEMAR HRTFs, selection of extracted ITDs, or parameterized spherical filter model.
4.3. Evaluation

The candidate models are subjected to three complementary evaluations:

- objective evaluation: signal-related error metrics such as spectral distortion and spectral cross-correlation;

- auditory model evaluation: auditory filterbanks and statistical prediction models [47];

- subjective evaluation: listening tests of localization and realism.
The space of possible structural model instances is reduced by a two-stage evaluation procedure, made of a single-component and a full-model evaluation. The single-component evaluation focuses on the minimization of the error E_model in Fig. 6, defined as the mean localization error of the best synthetic version available. A dimensionally reduced localization space (e.g., mid-sagittal plane data only) supports this early stage. The full-model evaluation takes the best representative solutions for each structural component in order to test the combined effects and the orthogonality of the models within full-space 3D virtual scenes. The minimization of E_mix, defined as the mean localization error of the mixed structural model, leads the mixing process.
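As a concrete instance of the first class of metrics, spectral distortion between a measured response and its modeled counterpart can be computed as the RMS log-magnitude difference over a band of interest; this is a minimal sketch with illustrative names:

    % RMS spectral distortion (dB) between a reference HRTF H and a model
    % Hm, restricted to the band [f1, f2] (sketch).
    function sd = spectral_distortion(H, Hm, f, f1, f2)
    in = f >= f1 & f <= f2;                   % band of interest
    e  = 20*log10(abs(H(in)) ./ abs(Hm(in))); % log-magnitude error (dB)
    sd = sqrt(mean(e.^2));                    % RMS value
    end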
5. DISCUSSION AND PERSPECTIVES

The research framework presented in this paper answers both the requirement of structural modularity and that of systematic HRTF model evaluation. A well-defined modus operandi, with the aim of designing new synthetic filter models and HRIR/pHRIR selection processes, is expected to progressively set the bar closer and closer to a complete structural HRTF model. The modular approach can also be extended to a multimodal domain where the integration of a 3D audio rendering engine with other sensory modalities such as video and haptics (e.g., Phantom devices) requires an evaluation in terms of integration, cross-augmentation, and/or sensory substitution. A possible technical solution is the X3D ISO standard, an XML-based file format for representing 3D virtual environments, together with H3DAPI (http://www.h3dapi.org/) as the handler for unified graphic and haptic scene graphs.
Some possible future directions that well represent further improvements and developments of new tools for such a work methodology are now listed:

- the definition of a standardized format for anthropometric features (possibly borrowed from biometric research) so as to integrate this information in the HRTF database;

- a study of the acoustic contribution of the pinna outside the mid-sagittal plane, which requires a more complex analysis than that performed in [38];

- the introduction of an ear canal model to approximate the correct P7 (see Fig. 1) at the eardrum, along with a possible formalization of a structural HpIR model;

- the manipulation of reliable and complex auditory models so as to facilitate a systematic exclusion procedure for HRIR/pHRIR distance functions and parameters of the synthetic models' components;

- the extension of the mixed structural model through the inclusion of computer-simulated HRIRs/pHRIRs calculated from mesh models of human heads [48], spatially discretized so as to be included in our unified database.
6. REFERENCES

[1] D. R. Begault. 3-D Sound for Virtual Reality and Multimedia. Academic Press Professional, Inc., Cambridge, MA, USA, 1994.

[2] A. Härmä, J. Jakka, M. Tikander, M. Karjalainen, T. Lokki, J. Hiipakka, and G. Lorho. Augmented reality audio for mobile and wearable appliances. J. Audio Eng. Soc., 52(6):618-639, June 2004.

[3] C. I. Cheng and G. H. Wakefield. Introduction to head-related transfer functions (HRTFs): Representations of HRTFs in time, frequency, and space. J. Audio Eng. Soc., 49(4):231-249, April 2001.

[4] C. P. Brown and R. O. Duda. A structural model for binaural sound synthesis. IEEE Trans. Speech Audio Process., 6(5):476-488, September 1998.

[5] H. Møller. Fundamentals of binaural technology. Appl. Acoust., 36:171-218, 1992.

[6] B. Masiero and J. Fels. Perceptually robust headphone equalization for binaural reproduction. In Proc. 130th Conv. Audio Eng. Soc., London, UK, May 2011.

[7] Z. Schärer and A. Lindau. Evaluation of equalization methods for binaural signals. In Proc. 126th Conv. Audio Eng. Soc., Munich, Germany, May 2009.

[8] A. W. Bronkhorst. Localization of real and virtual sound sources. J. Acoust. Soc. Am., 98(5):2542-2553, November 1995.

[9] V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith, and J. S. Abel. Fifty years of artificial reverberation. IEEE Trans. Audio, Speech, Lang. Process., 20(5):1421-1448, July 2012.

[10] D. R. Begault, E. M. Wenzel, and M. R. Anderson. Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual speech source. J. Audio Eng. Soc., 49(10):904-916, October 2001.

[11] F. L. Wightman and D. J. Kistler. Resolution of front-back ambiguity in spatial hearing by listener and source movement. J. Acoust. Soc. Am., 105(5):2841-2853, May 1999.

[12] M. D. Burkhard and R. M. Sachs. Anthropometric manikin for acoustic research. J. Acoust. Soc. Am., 58(1):214-222, July 1975.

[13] A. Abaza, A. Ross, C. Hebert, M. A. F. Harrison, and M. S. Nixon. A survey on ear biometrics. ACM Trans. Embedded Computing Systems, 9(4):39:1-39:33, March 2010.

[14] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L. Wightman. Localization using nonindividualized head-related transfer functions. J. Acoust. Soc. Am., 94(1):111-123, July 1993.

[15] H. Møller, M. F. Sørensen, C. B. Jensen, and D. Hammershøi. Binaural technique: Do we need individual recordings? J. Audio Eng. Soc., 44(6):451-469, June 1996.

[16] G. Plenge. On the differences between localization and lateralization. J. Acoust. Soc. Am., 56(3):944-951, September 1974.

[17] D. S. Brungart. Near-field virtual audio displays. Presence, 11(1):93-106, February 2002.

[18] E. C. Durant and G. H. Wakefield. Efficient model fitting using a genetic algorithm: Pole-zero approximations of HRTFs. IEEE Trans. Speech Audio Process., 10(1):18-27, January 2002.

[19] D. J. Kistler and F. L. Wightman. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. J. Acoust. Soc. Am., 91(3):1637-1647, March 1992.

[20] M. J. Evans, J. A. S. Angus, and A. I. Tew. Analyzing head-related transfer function measurements using surface spherical harmonics. J. Acoust. Soc. Am., 104(4):2400-2411, October 1998.

[21] V. R. Algazi, R. O. Duda, R. P. Morrison, and D. M. Thompson. Structural composition and decomposition of HRTFs. In Proc. IEEE Work. Appl. Signal Process., Audio, Acoust., pages 103-106, New Paltz, New York, USA, October 2001.

[22] S. Hwang, Y. Park, and Y. Park. Modeling and customization of head-related impulse responses based on general basis functions in time domain. Acta Acustica united with Acustica, 94(6):965-980, November 2008.

[23] K. H. Shin and Y. Park. Enhanced vertical perception through head-related impulse response customization based on pinna response tuning in the median plane. IEICE Trans. Fundamentals, E91-A(1):345-356, January 2008.

[24] B. U. Seeber and H. Fastl. Subjective selection of non-individual head-related transfer functions. In Proc. 2003 Int. Conf. Auditory Display (ICAD03), pages 259-262, Boston, MA, USA, July 2003.

[25] R. H. Y. So, B. Ngan, A. Horner, J. Braasch, J. Blauert, and K. L. Leung. Toward orthogonal non-individualised head-related transfer functions for forward and backward directional sound: Cluster analysis and an experimental study. Ergonomics, 53(6):767-781, June 2010.

[26] B. F. G. Katz and G. Parseihian. Perceptually based head-related transfer function database optimization. J. Acoust. Soc. Am., 131(2):EL99-EL105, February 2012.

[27] V. C. Raykar, R. Duraiswami, and B. Yegnanarayana. Extracting the frequencies of the pinna spectral notches in measured head related impulse responses. J. Acoust. Soc. Am., 118(1):364-374, July 2005.

[28] V. R. Algazi, R. O. Duda, and D. M. Thompson. The use of head-and-torso models for improved spatial sound synthesis. In Proc. 113th Conv. Audio Eng. Soc., pages 1-18, Los Angeles, CA, USA, October 2002.

[29] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano. The CIPIC HRTF database. In Proc. IEEE Work. Appl. Signal Process., Audio, Acoust., pages 1-4, New Paltz, New York, USA, October 2001.

[30] G. Eckel. Immersive audio-augmented environments - the LISTEN project. In Proc. 5th IEEE Int. Conf. Info. Visualization (IV'01), pages 571-573, Los Alamitos, CA, USA, July 2001.

[31] A. Andreopoulou and A. Roginska. Towards the creation of a standardized HRTF repository. In Proc. 131st Conv. Audio Eng. Soc., New York, NY, USA, October 2011.

[32] N. Gupta, A. Barreto, M. Joshi, and J. C. Agudelo. HRTF database at FIU DSP lab. In Proc. 35th IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP 2010), pages 169-172, Dallas, TX, USA, March 2010.

[33] W. G. Gardner and K. D. Martin. HRTF measurements of a KEMAR. J. Acoust. Soc. Am., 97(6):3907-3908, June 1995.

[34] J. Gómez Bolaños and V. Pulkki. HRIR database with measured actual source direction data. In Proc. 133rd Conv. Audio Eng. Soc., San Francisco, CA, USA, October 2012.

[35] T. Qu, Z. Xiao, M. Gong, Y. Huang, X. Li, and X. Wu. Distance-dependent head-related transfer functions measured with high spatial resolution using a spark gap. IEEE Trans. Audio, Speech, Lang. Process., 17(6):1124-1132, August 2009.

[36] S. Spagnol, M. Hiipakka, and V. Pulkki. A single-azimuth pinna-related transfer function database. In Proc. 14th Int. Conf. Digital Audio Effects (DAFx-11), pages 209-212, Paris, France, September 2011.

[37] M. Geronazzo, F. Granza, S. Spagnol, and F. Avanzini. A standardized repository of head-related and headphone transfer function data. In Proc. 134th Conv. Audio Eng. Soc., Rome, Italy, May 2013.

[38] M. Geronazzo, S. Spagnol, and F. Avanzini. Estimation and modeling of pinna-related transfer functions. In Proc. 13th Int. Conf. Digital Audio Effects (DAFx-10), pages 431-438, Graz, Austria, September 2010.

[39] S. Spagnol, M. Geronazzo, and F. Avanzini. Hearing distance: A low-cost model for near-field binaural effects. In Proc. EUSIPCO 2012 Conf., pages 2005-2009, Bucharest, Romania, September 2012.

[40] S. Spagnol, M. Geronazzo, and F. Avanzini. Fitting pinna-related transfer functions to anthropometry for binaural sound rendering. In Proc. IEEE Int. Work. Multi. Signal Process. (MMSP'10), pages 194-199, Saint-Malo, France, October 2010.

[41] S. Spagnol and F. Avanzini. Real-time binaural audio rendering in the near field. In Proc. 6th Int. Conf. Sound and Music Computing (SMC09), pages 201-206, Porto, Portugal, July 2009.

[42] S. Spagnol, M. Geronazzo, and F. Avanzini. On the relation between pinna reflection patterns and head-related transfer function features. IEEE Trans. Audio, Speech, Lang. Process., 21(3):508-520, March 2013.

[43] M. Geronazzo, S. Spagnol, and F. Avanzini. A head-related transfer function model for real-time customized 3-D sound rendering. In Proc. INTERPRET Work., SITIS 2011 Conf., pages 174-179, Dijon, France, November-December 2011.

[44] V. R. Algazi, C. Avendano, and R. O. Duda. Estimation of a spherical-head model from anthropometry. J. Audio Eng. Soc., 49(6):472-479, June 2001.

[45] D. Doukhan and A. Sédès. CW binaural: A binaural synthesis external for Pure Data. In Proc. 3rd Puredata Int. Conv. (PdCon09), São Paulo, Brazil, July 2009.

[46] J. C. Middlebrooks. Individual differences in external-ear transfer functions reduced by scaling in frequency. J. Acoust. Soc. Am., 106(3):1480-1492, September 1999.

[47] E. H. A. Langendijk and A. W. Bronkhorst. Contribution of spectral cues to human sound localization. J. Acoust. Soc. Am., 112(4):1583-1596, October 2002.

[48] B. F. G. Katz. Boundary element method calculation of individual head-related transfer function. I. Rigid model calculation. J. Acoust. Soc. Am., 110(5):2440-2448, November 2001.