Audio Engineering Society
Convention Paper
Presented at the 134th Convention
2013 May 4–7 Rome, Italy
This paper was peer-reviewed as a complete manuscript for presentation at this Convention. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.
A modular framework for the analysis and synthesis of Head-Related Transfer Functions
Michele Geronazzo1, Simone Spagnol1, and Federico Avanzini1
1Dept. of Information Engineering, University of Padova, Via Gradenigo 6/B, 35131 Padova, Italy
Correspondence should be addressed to Michele Geronazzo (geronazzo@dei.unipd.it)
ABSTRACT
The paper gives an overview of a number of tools for the analysis and synthesis of head-related transfer functions (HRTFs) that we have developed over the past four years at the Department of Information Engineering, University of Padova, Italy. The main objective of our study in this context is the progressive development of a collection of algorithms for the construction of a totally synthetic personal HRTF set, replacing both cumbersome and tedious individual HRTF measurements and the use of inaccurate non-individual HRTF sets. Our research methodology is highlighted, along with the multiple possibilities of present and future research offered by such tools.
1. INTRODUCTION
In recent years spatial sound has become increasingly important in a plethora of application domains. Spatial rendering of sound is especially recognized to greatly enhance the effectiveness of auditory human-computer interfaces [1], particularly in those cases where the visual interface is limited in extension and/or resolution (as in mobile applications [2]). Furthermore, it helps improve the sense of presence in augmented/virtual reality systems and adds engagement to computer games. With an eye to these ever-growing application domains, headphone-based reproduction systems driven by head-tracking devices, if properly designed, allow tailoring immersive and realistic auditory scenes to any user without the need for expensive and cumbersome loudspeaker-based systems.
This paper gives a brief overview of a number of tools for the analysis and synthesis of head-related transfer functions (HRTFs, i.e., the frequency- and location-dependent acoustic transfer functions between the sound source and the eardrum of a listener [3]) that we have developed, highlighting in particular our research methodology along with the diverse possibilities of present and future research offered by the mentioned tools. The main objective of our study in this context is the progressive development of a collection of algorithms for the construction of a totally synthetic HRTF set suitable for real-time rendering of custom spatial audio, taking the listener's anthropometric parameters as the sole input to the audio chain. Our modeling philosophy descends from the structural approach by Brown and Duda [4]: the total contribution of the listener's body to the HRTF is split into smaller blocks or modules, and each module contains a measured, reconstructed, or synthetic response, as will be made clearer throughout the paper. Our approach differs from recent trends in HRTF customization in that, in principle, no self-tuning of parameters or selection of responses from databases is required of the listener.
After the brief overview of past and current research on HRTF-related issues given in Section 2, Section 3 will introduce the currently adopted framework for HRTF rendering and customization, including all of the hardware and software tools at our disposal. Section 4 will then discuss in more detail the structural modus operandi, with a focus on the multiple current and future research directions made possible by the available tools (further expanded in the conclusive Section 5).
2. BACKGROUND
In order to enable authentic auditory experiences, the correct sound pressure level (SPL) due to one or more acoustic sources positioned in a virtual space shall be reproduced at the eardrums of a listener by a pair of headphones. In the literature, sound transmission from the headphone to the eardrum is often represented through an analogue circuit model [5], reported in Fig. 1. With reference to such a model, the current aim of our research is the correct reproduction of the sound pressure P6 at the entrance of the blocked ear canal of the listener due to a sound source placed around him/her, even though nothing prevents a future extension of these studies to circuit points nearer to the eardrum as soon as proper tools are available (e.g., HRTF measurements at the eardrum, ear canal models, etc.).
The classical solution that best approximates ideality involves the use of individual HRTFs measured on the listener himself/herself. By convolving a desired monophonic sound signal with a pair of personal head-related impulse responses (HRIRs), one per ear, adequately compensated for headphone-induced spectral coloration1, one can reach almost the same localization accuracy as in free-field listening conditions [8], especially when head motion and/or artificial reverberation [9] are considered [10, 11].

Fig. 1: Circuit model of sound transmission from the headphone to the eardrum (after [5]), comprising the headphone impedance Z_headphone, a transmission line for the ear canal (Z_ear_canal), the eardrum impedance Z_eardrum, and the pressures P5, P6, and P7.
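As a minimal sketch of this rendering step in Matlab (hypothetical variable names; headphone compensation is assumed to have been applied to the HRIRs beforehand):

    % x: monophonic input signal; hrir_l, hrir_r: the listener's HRIRs
    % for the desired source position; fs: sampling rate of the HRIR set.
    y_l = conv(x, hrir_l);          % left-ear signal
    y_r = conv(x, hrir_r);          % right-ear signal
    soundsc([y_l(:), y_r(:)], fs);  % play the binaural pair over headphones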
Unfortunately, obtaining personal HRTF data for a vast number of users is simply impracticable, because specific hardware, anechoic spaces, and long collection times are strictly required [4]. This is the main reason why non-individual HRTFs, usually measured on anthropomorphic mannequins [12], are often preferred in practice. The drawback with non-individual HRTFs is that such transfer functions likely never match the listener's unique anthropometry, and especially his/her outer ear [13], resulting in frequent localization errors such as front/back reversals [14], elevation angle misperception [15], and inside-the-head localization [16, 17].
Consequently, computational models of HRTFs parameterized on the anthropometry of the listener, or tunable by the listener himself/herself, have surfaced

1Headphones, when used for the reproduction of binaural signals, have to be equalized if high localization accuracy is needed. Unfortunately, the transfer function between headphone and eardrum varies heavily from person to person and with small displacements of the headphone itself. Such variation is particularly marked in the high-frequency range, where important elevation cues generally lie. Thus, an inaccurate compensation likely leads to spectral colorations that affect both source elevation perception and sound externalization [6]. Although various techniques have been proposed in order to face such a delicate issue (see [7] for a review), modeling the correct equalization filter is still a hot open research topic.
in the past two decades. According to Brown and Duda [4], these models can be classified into three groups:

1. pole/zero models: filter design, system identification, and neural network techniques are applied in order to fit multiparameter models to experimental data (e.g., [18]);

2. series expansions, based e.g. on principal component analysis (PCA) [19] or surface spherical harmonics (SSH) [20], applied to collections of HRIRs or HRTFs;

3. structural models: the contributions of the listener's head, pinnae, shoulders, and torso to the HRTF are isolated and arranged in different filter structures, each accounting for some well-defined physical phenomenon, as Fig. 2 roughly sketches. The linearity of these contributions allows reconstruction of the global HRTF from a proper combination of all the considered effects [21] (see the sketch below).

Fig. 2: A generic structural HRTF model: the input x[n] is processed by head, torso, and pinna blocks for each ear, yielding the binaural outputs y(l)[n] and y(r)[n].
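Owing to the linearity noted in the third class, a global HRIR can be approximated, in the simplest cascade arrangement, by convolving the partial responses of the individual blocks; a minimal Matlab sketch with hypothetical variable names:

    % h_head, h_torso, h_pinna: partial HRIRs of the isolated body parts
    % for a given direction; cascading linear blocks amounts to convolving
    % their impulse responses.
    hrir = conv(conv(h_head, h_torso), h_pinna);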
Although recent trends in HRTF customization have mainly focused on series expansions with self-tuning of weights [22, 23] or simply on non-individualized HRTF selection [24, 25, 26], structural HRTF modeling remains the most attractive alternative from the viewpoints of both computational efficiency and physical meaning: parameters of the rendering blocks sketched in Fig. 2 can be estimated from real data, fitted to low-order filter structures, and finally related to meaningful anthropometric measurements. Our methodology follows the structural modeling approach, subtly generalized so as to allow for the inclusion of measured or extrapolated data in one or more of its components. For instance, one may desire to combine a filter model of the pinna with the measured HRTF of a generic pinnaless mannequin, or to feed a post-processed HRTF including the response of the pinna alone (pinna-related transfer function, PRTF [27]) into a snowman filter model [28]. We will refer to this approach as mixed structural modeling throughout the remainder of this paper.

Fig. 3: UML-like representation of the research framework structure.
3. THE FRAMEWORK
The first logical distinction among the fundamental components of the presented framework regards the file system level. The database folder acts as the main data container, while all of the algorithms that extract relevant features from the available data are stored in the analysis folder. The synthesis folder contains the tools for spatial audio rendering, designed and developed with an eye to real-time constraints. Finally, the experimental tools for subjective evaluation of models and the related data are organized in the evaluation folder. The UML-like diagram in Fig. 3 depicts all the components composing the framework.
3.1. HRIR and HpIR Databases
The included data is in the form of several sets of head-related impulse responses (HRIRs) recorded for a high number of subjects at different spatial locations, and sets of headphone impulse responses (HpIRs) for the characterization of different headphone models used for compensation in the reproduction process. Beside the full-body HRIR repository, a similar container includes the partial head-related impulse responses (pHRIRs) recorded by isolating specific body parts (e.g., pinna-related impulse responses (PRIRs) measured on the isolated pinna, or impulse responses measured on a pinnaless mannequin) or resulting from the decomposition carried out by the algorithms we will mention in the next subsection.
There exist a number of publicly available HRIR databases, the most notable of which are the CIPIC HRTF database [29]2 and the LISTEN HRIR database [30]3. The main differences among these and other databases concern the type of stimulus used, the spatial grid of the measured HRIRs, and the microphone configuration (blocked or open ear canal, distance from the eardrum, etc.).

An attempt to unify such variability resulted in the MARL-NYU data format [31], into which the CIPIC, LISTEN, FIU [32]4, and KEMAR MIT [33]5 databases were stored and organized. Our contribution to the standardization process begun in [31] lies in the introduction of some missing relevant data:

1. the raw HRIR data beside the already available compensated version;

2. the labeling of each HRIR's onset sample;

3. the management of heterogeneous spatial grids.

The general database organization was also applied to the Aalto [34]6, ARI7, and PKU&IOA [35]8 HRTF databases, and to the Aalto PRTF database [36]9, which collects pinna-related impulse responses in the mid-sagittal plane as pHRIRs. More details about the organization of the HRIR, pHRIR, and HpIR repositories can be found in [37].
2http://interface.cipic.ucdavis.edu/
3http://recherche.ircam.fr/equipes/salles/listen/
4http://dsp.eng.fiu.edu/HRTFDB/main.htm
5http://sound.media.mit.edu/resources/KEMAR.html
6http://www.acoustics.hut.fi/go/aes133-hrtf/
7http://www.kfs.oeaw.ac.at
8http://www.cis.pku.edu.cn/auditory/Staff/Dr.Qu.files/Qu-HRTF-Database.html
9http://www.dei.unipd.it/~spagnols/PRTFdb.zip
3.2. Signal Analysis Tools
The analysis folder contains all the Matlab scripts and data structures exploitable for HRTF analysis. Our typical workflow follows an analysis-by-synthesis paradigm where the step-by-step modeling of salient features plays a significant role in the analysis of the acoustic signal. A notable instance of such a paradigm is the PRTF separation algorithm [38], which iteratively extrapolates the reflective component of a PRTF while keeping its resonant structure intact, by direct subtraction of multi-notch filter structures. A similar algorithm, used in [39], separates the near- and far-field contributions of a rigid sphere approximating the head, allowing the two contributions to be modeled independently through different filter structures.
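A deliberately crude sketch of the separation idea (the iterative multi-notch procedure of [38] is considerably more refined): treat a smoothed log-magnitude PRTF as the resonant component and the negative residual as the reflective (notch) component.

    % prir: pinna-related impulse response; nfft: FFT length.
    H   = 20 * log10(abs(fft(prir, nfft)));  % PRTF magnitude in dB
    H   = H(1:nfft/2);                       % positive frequencies only
    res = movmean(H, 15);                    % crude resonant envelope estimate
    ref = min(H - res, 0);                   % notch residual (dB, <= 0)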
Other kinds of algorithms represent the "boundaries" of such a methodology. For instance, an image processing algorithm that extracts the relevant anthropometric parameters from a picture of the pinna [40] is currently being developed. A script for PCA modeling of HRTF data, which helps in understanding the degree of variability of the transfer functions with respect to specific features, is available [41]. Last but not least, headphone equalization algorithms implementing various inverse filtering techniques are included.
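As an illustration of the PCA modeling step, a decomposition of log-magnitude HRTFs might look as follows (hypothetical variable names; pca is from the Matlab Statistics Toolbox):

    % hrirs: matrix with one measured HRIR per row; nfft: FFT length.
    M = 20 * log10(abs(fft(hrirs, nfft, 2)));  % log-magnitude HRTFs
    M = M(:, 1:nfft/2);                        % positive frequencies only
    [coeff, score, latent] = pca(M);           % principal components
    expl = cumsum(latent) / sum(latent);       % cumulative variance explained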
3.3. Synthesized Audio Rendering
The audio engine, stored in the synthesis folder, includes four modules organized in separate subfolders:

• model: real-time realizations of the synthetic structural components;

• components: collection of tools that perform real-time convolutions between audio files and HRIRs/pHRIRs;

• headphones: management tool for headphone compensation filters;

• extra: utility bundle for I/O operations, sensors, and basic binaural processing tools.
Various combinations of one or more instances for each module are possible in order to realize a candidate version of the 3D audio engine. All instances are implemented in Pure Data10, a graphical programming environment for audio processing, in the form of C/C++ externals. All the tentative prototypes are catalogued in a further folder (the rendering folder), each accompanied by a descriptor file including the list of the modules used in that instance.

10http://puredata.info/
The intrinsic modularity of our approach leads us to the implementation of one structural filter block for each relevant body part:

• a pinna filter realization that acts as a synthetic PRTF, consisting of a peak and notch filter structure [42, 43] where each filter is fed by three parameters (peak/notch center frequency, bandwidth, and gain), each stored in a configuration file indexed by the current elevation angle (a possible second-order realization is sketched after this list);

• a spherical model of the head taking into account the far-field and near-field scattering effects around its rigid body. The parametrization is based on the sphere radius, selected as a weighted combination of the listener's head dimensions [44], while the near-field contribution is reduced to a first-order shelving filter [39];

• a spherical torso approximation (as in the snowman model [28]) that models elevation cues at low frequencies.
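As an illustration of the pinna block, one possible second-order peak/notch section in Matlab (an RBJ-style peaking biquad; a generic realization, not necessarily the exact structure of [42, 43]):

    % fc: center frequency (Hz), bw: bandwidth (Hz), g: gain in dB
    % (g < 0 yields a notch), fs: sampling rate. Parameter values are
    % read from the per-elevation configuration files mentioned above.
    function [b, a] = peak_notch(fc, bw, g, fs)
        A     = 10^(g / 40);        % amplitude from dB gain
        w0    = 2 * pi * fc / fs;   % normalized center frequency
        Q     = fc / bw;            % quality factor from bandwidth
        alpha = sin(w0) / (2 * Q);
        b = [1 + alpha*A, -2*cos(w0), 1 - alpha*A];  % numerator
        a = [1 + alpha/A, -2*cos(w0), 1 - alpha/A];  % denominator
    end

Cascading one such section per peak/notch, with coefficients updated as the elevation angle changes, yields a synthetic PRTF of this kind.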
A special mention goes to the contents of the extra folder, in which a collection of third-party utility tools is kept updated and integrated in our prototyping environment, mainly developed with Matlab, Pure Data, and C/C++ libraries. A basic external for the rendering process is CW binaural∼ [45], which implements real-time convolution of sound inputs with selected HRIRs. It has the peculiar feature of being able to load an arbitrary discrete set of HRIRs in .wav format and realize different kinds of interpolation between adjacent spatial positions. This tool is at the basis of our dynamic 3D audio rendering system, where the successful transposition of dynamic sources into a virtual world not only depends on the accuracy of the interpolation scheme but is also heavily conditioned by the quality of the motion tracking system. In decreasing order of degree of immersion, a PhaseSpace Impulse MoCap system, a head-pose estimation system via webcam with the faceAPI software library, and a Trivisio Colibri wireless inertial motion tracker mounted on top of a pair of headphones are already integrated in our environment.
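The basic idea behind HRIR interpolation can be sketched as a simple crossfade between the responses of two adjacent measured positions (the schemes offered by CW binaural∼ are more elaborate):

    % hrir1, hrir2: HRIRs measured at azimuths az1 < az2 (same ear);
    % az: target azimuth, with az1 <= az <= az2.
    w = (az - az1) / (az2 - az1);               % normalized position in [0,1]
    hrir_interp = (1 - w) * hrir1 + w * hrir2;  % crossfaded response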
3.4. Experimental Environment
An environment for subjective localization tests is stored in the evaluation folder. A Matlab GUI offers the main environment for the playback and judgment of the HRTF data and models to be evaluated, reported in the screenshot of Fig. 4. The subject listens to the sound stimulus, interactively selects the perceived sound location and/or other properties, and switches to the next sound stimulus.

Fig. 4: A GUI for localization experiments.

Fig. 5: A high-level technical description of a typical experimental scenario.
Fig. 6: Typical research workflow towards a complete structural model: HRIRs and pHRIRs from the DATABASE feed the ANALYSIS stage, which provides extracted pHRIRs; the SYNTHESIS stage comprises HRTF SELECTION (yielding the best subject) and the MIXED STRUCTURAL MODEL (yielding the best mixed structural model), guided by single-component (E_model) and full-model (E_mix) EVALUATION.
The data exchange between the Matlab environment and the audio engine is provided by the OSC (Open Sound Control) protocol, running on top of UDP. Fig. 5 reports a technical description of a typical experiment, where the pnet library and the dumpOSC external are responsible for communication on the Matlab side and on the Pure Data side, respectively.
The basic features for the experimenter, such as subject and session management, are also available. Subjects' records and their personal information can be manipulated on demand. Each experimental session is stored in an independent file labeled by a session number id. The task info struct contains the descriptive information of the task and the timestamps for each trial conducted within the session. The latter field operates as a primary key to read and interpret the experimental data stored in a table specifically designed for the purpose of the experiment. Common statistical analysis software can directly import the already organized data.
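A hypothetical sketch of one such session record (illustrative field names only):

    session.id        = 17;     % session number id
    session.subject   = 'S03';  % key into the subjects' records
    session.task_info = struct('description', 'mid-sagittal localization', ...
                               'timestamps', trial_times);  % one per trial
    session.data      = trials; % experiment-specific table, e.g. one row
                                % per trial: target and response directions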
4. MODUS OPERANDI
The research process aims to build a completely customizable structural model through subsequent refinements, starting from a selection of recorded HRIRs and moving towards a totally synthetic filter model. The intermediate steps are balanced mixtures of selected pHRIRs and synthetic structural components.

A candidate mixed structural model is described by a set of parameters and components; the evaluation step guides the exclusion of certain combinations of such components. The obtained 3D audio engine shall maximize the efficiency-to-parametrization rate: this means that synthetic submodels with a high customization degree but poor localization performance compel us to conceive a different or modified submodel. In such a case, a previous version of that submodel, whether synthetic, directly recorded, or extracted by analysis, remains the best option for the component's acoustic contribution up to that point. As soon as the partial selection process reaches its maximum, we will have the optimal solution for our complete structural model. We now describe in more detail our typical research workflow, referring to Fig. 6 throughout the whole section.
4.1. HRTF Selection
The simplest HRTF selection possible is realized by choosing the same set of HRTFs for all of the listeners. Whether it be the best available localizer or a mannequin with mean anatomical features, no prediction can be made on the localization performance of a specific listener. An insight into the relation between localization cues and anthropometric features can guide the selection process, as in [46]. If measurements of pinna dimensions for each subject in the considered HRTF database are available, a simple "best match" with directly measured pinna dimensions of a common listener can be exploited. Eq. (1) shows a possible distance function on the left pinna applied to the CIPIC HRTF database:

s = \arg\min_{i=1,\dots,N} \left| d_1^i + d_3^i + d_4^i - p_l \right|,    (1)
where N is the number of subjects in the database, d_k is the vector associated with the k-th anthropometric feature of the pinna, i.e., cavum concha height (d_1), cymba concha height (d_3), and fossa height (d_4) respectively, and p_l is the distance from the superior internal helix border to the intertragic incisure of the l-th listener. At the end of this procedure the s-th subject is declared the best-matching elevation localizer for listener l, and this correspondence is saved in the experimental environment.
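A sketch of this selection over the CIPIC anthropometry (hypothetical variable names):

    % D: N x K matrix of pinna features, D(i,k) = d_k for subject i;
    % p_l: the listener's measured helix-to-intertragic-incisure distance.
    dist   = abs(D(:,1) + D(:,3) + D(:,4) - p_l);  % Eq. (1), per subject
    [~, s] = min(dist);                            % best-matching subject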
4.2. Structural HRTF Selection & Modeling
A keen observer may object to the restriction of the latter selection procedure to the sole contribution of the pinna: of course, there is no guarantee of accuracy in azimuth localization. However, the former procedure can be applied to the Aalto PRTF database or, alternatively, to a collection of mid-sagittal PRTFs extracted in the analysis step. On the other side, the selection of a best-matching contribution of the head, by means of recorded impulse responses (such as the pinnaless KEMAR HRIRs) or of extracted ITD (interaural time difference) / ILD (interaural level difference) information, may adopt the same principle. Such an alternative leads to a finer selection of each structural component and is at the base of the progression of our mixed structural modeling approach.
HRTF sets resulting from further refinements of selection criteria should now be compared to our candidate synthetic filter models, whose parameters are strictly related to the anthropometric quantities used for HRTF selection. The simplest example is the spherical model optimization in [44], where the sphere radius is customized through an empirical equation derived from ITD measurements of a population of 25 subjects. Eq. (2) reports the weighted sum of head dimensions used to calculate the optimal sphere radius, while Eq. (3) represents a trivial ITD selection:

a_{opt} = w_1 X_1 + w_2 X_2 + w_3 X_3,    (2)

s = \arg\min_{i=1,\dots,N} \left| a_{opt}^i - a_{opt}^l \right|,    (3)

where X_1 is head half-width, X_2 head half-height, and X_3 head half-length, w_j is the j-th weight, a_{opt}^i is the optimal sphere radius for the i-th database subject, and a_{opt}^l is the optimal sphere radius of the l-th listener. At the end of this procedure we obtain two alternatives to test and compare: a parameterized spherical filter model and a selected set of real ITDs.
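A sketch of this procedure (the weights w are those estimated in [44]; here they are left as inputs):

    % X: N x 3 matrix with head half-width, half-height, and half-length
    % per database subject; x_l: the same three measurements of the listener.
    a_opt  = X * w(:);               % optimal radius per subject, Eq. (2)
    a_l    = x_l(:)' * w(:);         % the listener's optimal radius
    [~, s] = min(abs(a_opt - a_l));  % best-matching subject, Eq. (3)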
Restricting our attention to the cited structural components (head and pinna), 3 × 3 instances of mixed structural models already arise from the combination of the following alternatives:

• pinna structural block: measured KEMAR PRTFs, Aalto PRTF database selection, selection of extracted PRTFs from HRTF databases;

• head structural block: measured pinnaless KEMAR HRTFs, selection of extracted ITDs, parameterized spherical filter model.
4.3. Evaluation
The candidate models are subjected to three complementary evaluations:

• objective evaluation: signal-related error metrics such as spectral distortion and spectral cross-correlation (a sketch of the former follows this list);

• auditory model evaluation: auditory filterbanks and statistical prediction models [47];

• subjective evaluation: listening tests of localization and realism.
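As an example of the first class, one common definition of the spectral distortion between a measured HRTF H and its model approximation Hm, evaluated over K frequency bins:

    % H, Hm: spectra sampled at the same K frequency bins.
    SD = sqrt(mean((20 * log10(abs(H) ./ abs(Hm))).^2));  % in dB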
The space of possible structural model instances is reduced by a two-stage evaluation procedure, made of a single-component and a full-model evaluation. The single-component evaluation focuses on the minimization of the error E_model in Fig. 6, defined as the mean localization error of the best synthetic version available. A dimensionally reduced localization space (e.g., mid-sagittal plane data only) supports this early stage. The full-model evaluation takes the best representative solutions for each structural component in order to test the combined effects and the orthogonality of the models within full-space 3D virtual scenes. The minimization of E_mix, defined as the mean localization error of the mixed structural model, leads the mixing process.
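Both error measures can be computed, for instance, as the average great-circle angle between target and response directions:

    % T, R: unit direction vectors (one per trial) in the rows of two
    % M x 3 matrices, targets and responses respectively.
    ang = acosd(max(min(sum(T .* R, 2), 1), -1));  % per-trial error (degrees)
    E   = mean(ang);                               % E_model or E_mix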
5. DISCUSSION AND PERSPECTIVES
The research framework presented in this paper answers both the requirements of structural modularity and of systematic HRTF model evaluation. A well-defined modus operandi, with the aim of designing new synthetic filter models and HRIR/pHRIR selection processes, is expected to progressively set the bar closer and closer to a complete structural HRTF model. The modular approach can also be extended to a multimodal domain, where the integration of a 3D audio rendering engine with other sensory modalities such as video and haptics (e.g., Phantom devices) requires an evaluation in terms of integration, cross-augmentation, and/or sensory substitution. A possible technical solution is the X3D ISO standard XML-based file format for representing 3D virtual environments, with H3DAPI11 as the handler for unified graphic and haptic scene graphs.
Some possible future directions that well represent further improvements and developments of new tools for such a work methodology are now listed:

• the definition of a standardized format for anthropometric features (possibly borrowed from biometric research) so as to integrate this information in the HRTF databases;

• a study of the acoustic contribution of the pinna outside the mid-sagittal plane, which requires a more complex analysis than that performed in [38];

• the introduction of an ear canal model to approximate the correct P7 (see Fig. 1) at the eardrum, along with a possible formalization of a structural HpIR model;

• the manipulation of reliable and complex auditory models so as to facilitate a systematic exclusion procedure for HRIR/pHRIR distance functions and parameters of the synthetic models' components;

• the extension of the mixed structural model through the inclusion of computer-simulated HRIRs/pHRIRs calculated from mesh models of human heads [48], spatially discretized so as to be included in our unified database.

11http://www.h3dapi.org/
6. REFERENCES

[1] D. R. Begault. 3-D Sound for Virtual Reality and Multimedia. Academic Press Professional, Inc., Cambridge, MA, USA, 1994.

[2] A. Härmä, J. Jakka, M. Tikander, M. Karjalainen, T. Lokki, J. Hiipakka, and G. Lorho. Augmented reality audio for mobile and wearable appliances. J. Audio Eng. Soc., 52(6):618–639, June 2004.

[3] C. I. Cheng and G. H. Wakefield. Introduction to head-related transfer functions (HRTFs): Representations of HRTFs in time, frequency, and space. J. Audio Eng. Soc., 49(4):231–249, April 2001.

[4] C. P. Brown and R. O. Duda. A structural model for binaural sound synthesis. IEEE Trans. Speech Audio Process., 6(5):476–488, September 1998.

[5] H. Møller. Fundamentals of binaural technology. Appl. Acoust., 36:171–218, 1992.

[6] B. Masiero and J. Fels. Perceptually robust headphone equalization for binaural reproduction. In Proc. 130th Conv. Audio Eng. Soc., London, UK, May 2011.

[7] Z. Schärer and A. Lindau. Evaluation of equalization methods for binaural signals. In Proc. 126th Conv. Audio Eng. Soc., Munich, Germany, May 2009.

[8] A. W. Bronkhorst. Localization of real and virtual sound sources. J. Acoust. Soc. Am., 98(5):2542–2553, November 1995.

[9] V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith, and J. S. Abel. Fifty years of artificial reverberation. IEEE Trans. Audio, Speech, Lang. Process., 20(5):1421–1448, July 2012.

[10] D. R. Begault, E. M. Wenzel, and M. R. Anderson. Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual speech source. J. Audio Eng. Soc., 49(10):904–916, October 2001.

[11] F. L. Wightman and D. J. Kistler. Resolution of front-back ambiguity in spatial hearing by listener and source movement. J. Acoust. Soc. Am., 105(5):2841–2853, May 1999.

[12] M. D. Burkhard and R. M. Sachs. Anthropometric manikin for acoustic research. J. Acoust. Soc. Am., 58(1):214–222, July 1975.

[13] A. Abaza, A. Ross, C. Hebert, M. A. F. Harrison, and M. S. Nixon. A survey on ear biometrics. ACM Trans. Embedded Computing Systems, 9(4):39:1–39:33, March 2010.

[14] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L. Wightman. Localization using nonindividualized head-related transfer functions. J. Acoust. Soc. Am., 94(1):111–123, July 1993.

[15] H. Møller, M. F. Sørensen, C. B. Jensen, and D. Hammershøi. Binaural technique: Do we need individual recordings? J. Audio Eng. Soc., 44(6):451–469, June 1996.

[16] G. Plenge. On the differences between localization and lateralization. J. Acoust. Soc. Am., 56(3):944–951, September 1974.

[17] D. S. Brungart. Near-field virtual audio displays. Presence, 11(1):93–106, February 2002.

[18] E. C. Durant and G. H. Wakefield. Efficient model fitting using a genetic algorithm: Pole-zero approximations of HRTFs. IEEE Trans. Speech Audio Process., 10(1):18–27, January 2002.

[19] D. J. Kistler and F. L. Wightman. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. J. Acoust. Soc. Am., 91(3):1637–1647, March 1992.

[20] M. J. Evans, J. A. S. Angus, and A. I. Tew. Analyzing head-related transfer function measurements using surface spherical harmonics. J. Acoust. Soc. Am., 104(4):2400–2411, October 1998.

[21] V. R. Algazi, R. O. Duda, R. P. Morrison, and D. M. Thompson. Structural composition and decomposition of HRTFs. In Proc. IEEE Work. Appl. Signal Process., Audio, Acoust., pages 103–106, New Paltz, New York, USA, October 2001.

[22] S. Hwang, Y. Park, and Y. Park. Modeling and customization of head-related impulse responses based on general basis functions in time domain. Acta Acustica united with Acustica, 94(6):965–980, November 2008.

[23] K. H. Shin and Y. Park. Enhanced vertical perception through head-related impulse response customization based on pinna response tuning in the median plane. IEICE Trans. Fundamentals, E91-A(1):345–356, January 2008.

[24] B. U. Seeber and H. Fastl. Subjective selection of non-individual head-related transfer functions. In Proc. 2003 Int. Conf. Auditory Display (ICAD03), pages 259–262, Boston, MA, USA, July 2003.

[25] R. H. Y. So, B. Ngan, A. Horner, J. Braasch, J. Blauert, and K. L. Leung. Toward orthogonal non-individualised head-related transfer functions for forward and backward directional sound: Cluster analysis and an experimental study. Ergonomics, 53(6):767–781, June 2010.

[26] B. F. G. Katz and G. Parseihian. Perceptually based head-related transfer function database optimization. J. Acoust. Soc. Am., 131(2):EL99–EL105, February 2012.

[27] V. C. Raykar, R. Duraiswami, and B. Yegnanarayana. Extracting the frequencies of the pinna spectral notches in measured head related impulse responses. J. Acoust. Soc. Am., 118(1):364–374, July 2005.

[28] V. R. Algazi, R. O. Duda, and D. M. Thompson. The use of head-and-torso models for improved spatial sound synthesis. In Proc. 113th Conv. Audio Eng. Soc., pages 1–18, Los Angeles, CA, USA, October 2002.

[29] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano. The CIPIC HRTF database. In Proc. IEEE Work. Appl. Signal Process., Audio, Acoust., pages 1–4, New Paltz, New York, USA, October 2001.

[30] G. Eckel. Immersive audio-augmented environments - the LISTEN project. In Proc. 5th IEEE Int. Conf. Info. Visualization (IV'01), pages 571–573, Los Alamitos, CA, USA, July 2001.

[31] A. Andreopoulou and A. Roginska. Towards the creation of a standardized HRTF repository. In Proc. 131st Conv. Audio Eng. Soc., New York, NY, USA, October 2011.

[32] N. Gupta, A. Barreto, M. Joshi, and J. C. Agudelo. HRTF database at FIU DSP lab. In Proc. 35th IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP 2010), pages 169–172, Dallas, TX, USA, March 2010.

[33] W. G. Gardner and K. D. Martin. HRTF measurements of a KEMAR. J. Acoust. Soc. Am., 97(6):3907–3908, June 1995.

[34] J. Gómez Bolaños and V. Pulkki. HRIR database with measured actual source direction data. In Proc. 133rd Conv. Audio Eng. Soc., San Francisco, CA, USA, October 2012.

[35] T. Qu, Z. Xiao, M. Gong, Y. Huang, X. Li, and X. Wu. Distance-dependent head-related transfer functions measured with high spatial resolution using a spark gap. IEEE Trans. Audio, Speech, Lang. Process., 17(6):1124–1132, August 2009.

[36] S. Spagnol, M. Hiipakka, and V. Pulkki. A single-azimuth pinna-related transfer function database. In Proc. 14th Int. Conf. Digital Audio Effects (DAFx-11), pages 209–212, Paris, France, September 2011.

[37] M. Geronazzo, F. Granza, S. Spagnol, and F. Avanzini. A standardized repository of head-related and headphone transfer function data. In Proc. 134th Conv. Audio Eng. Soc., Rome, Italy, May 2013.

[38] M. Geronazzo, S. Spagnol, and F. Avanzini. Estimation and modeling of pinna-related transfer functions. In Proc. 13th Int. Conf. Digital Audio Effects (DAFx-10), pages 431–438, Graz, Austria, September 2010.

[39] S. Spagnol, M. Geronazzo, and F. Avanzini. Hearing distance: A low-cost model for near-field binaural effects. In Proc. EUSIPCO 2012 Conf., pages 2005–2009, Bucharest, Romania, September 2012.

[40] S. Spagnol, M. Geronazzo, and F. Avanzini. Fitting pinna-related transfer functions to anthropometry for binaural sound rendering. In Proc. IEEE Int. Work. Multi. Signal Process. (MMSP'10), pages 194–199, Saint-Malo, France, October 2010.

[41] S. Spagnol and F. Avanzini. Real-time binaural audio rendering in the near field. In Proc. 6th Int. Conf. Sound and Music Computing (SMC09), pages 201–206, Porto, Portugal, July 2009.

[42] S. Spagnol, M. Geronazzo, and F. Avanzini. On the relation between pinna reflection patterns and head-related transfer function features. IEEE Trans. Audio, Speech, Lang. Process., 21(3):508–520, March 2013.

[43] M. Geronazzo, S. Spagnol, and F. Avanzini. A head-related transfer function model for real-time customized 3-D sound rendering. In Proc. INTERPRET Work., SITIS 2011 Conf., pages 174–179, Dijon, France, November-December 2011.

[44] V. R. Algazi, C. Avendano, and R. O. Duda. Estimation of a spherical-head model from anthropometry. J. Audio Eng. Soc., 49(6):472–479, June 2001.

[45] D. Doukhan and A. Sédès. CW binaural∼: A binaural synthesis external for Pure Data. In Proc. 3rd Puredata Int. Conv. (PdCon09), São Paulo, Brazil, July 2009.

[46] J. C. Middlebrooks. Individual differences in external-ear transfer functions reduced by scaling in frequency. J. Acoust. Soc. Am., 106(3):1480–1492, September 1999.

[47] E. H. A. Langendijk and A. W. Bronkhorst. Contribution of spectral cues to human sound localization. J. Acoust. Soc. Am., 112(4):1583–1596, October 2002.

[48] B. F. G. Katz. Boundary element method calculation of individual head-related transfer function. I. Rigid model calculation. J. Acoust. Soc. Am., 110(5):2440–2448, November 2001.