All content in this area was uploaded by Julien De Muynke on Oct 23, 2019
Tools for the production of spatial audio within BINCI
André Kruh-Elendt¹, André Fiebig¹, Roland Sottek¹, and Julien De Muynke²
¹HEAD acoustics GmbH, 52134 Herzogenrath, Germany,
Email: {andre.kruh-elendt, andre.fiebig, roland.sottek}@head-acoustics.de
²Eurecat, Centre Tecnològic de Catalunya, Multimedia Technologies Group, 08005 Barcelona, Spain,
Email: julien.demuynke@eurecat.org
The research leading to these results has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No 732130 – BINCI project
Introduction
With the recent introduction of support for immersive
audiovisual technologies by some of the major content
sharing platforms, such as YouTube and Facebook, and
the establishment of standards for the efficient encoding
and transmission of spatial audio [1], interest in the
composition and use of these audio formats has
extended beyond the academic and research fields to
include artists, sound engineers and professional content
creators. Among the existing spatial audio formats,
binaural technology has the advantage of being suited
to the reproduction of immersive audio content across
multiple platforms and devices, requiring only a pair of
headphones. Nonetheless, support for spatial audio
reproduction using loudspeaker arrays and interoperability
between formats are still expected.
With this in mind, the need for an integrated and flexi-
ble solution for the production of spatial audio becomes
clear. The development of such a set of tools, and their
incorporation into established audio production work-
flows, are the main goals of the BINCI project. In this
paper, an overview of the tools being developed is given.
Furthermore, an HRTF selection process and the employ-
ment of room impulse responses recorded with spherical
microphone arrays are highlighted as relevant techniques
adopted in the project.
Spatial audio production tools
Within BINCI, two main software modules are being de-
veloped, each one targeting specific user-groups and tasks
in the production and delivery of spatial audio content.
A brief overview of these tools is given below, while a
more detailed description can be found in [2].
The Binaural Home Studio (BHS) is a suite of pro-
duction and post-production tools targeting sound engi-
neers and creators involved in the process of designing,
editing and mixing audio with Digital Audio Worksta-
tions (DAWs) in typical studio environments.
The BHS is composed of four main components: the
Audio Processing Server, an ambisonics-based processing
unit performing encoding, manipulation and rendering
operations on the audio input; a set of DAW Plug-ins,
which provide a simple graphical interface to control the
processing performed in the server; the Virtual Sound
Card, serving as a multichannel audio interface between
the DAW and the audio processing server; and a Visualizer,
which gives graphical feedback to the end-user about
the current spatial configuration of the sound scene and
can be used to synchronize the produced audio with 360°
video for audiovisual applications.
The use of a modular structure with plugins controlling
the background processing server allows a seamless in-
tegration of the spatial encoding and rendering engines
into the workflows typically used for the production of
stereo and multichannel content. Sound source panning
and clustering, and the application of modulation and
audio effects are some of the sound manipulations known
from typical audio production environments which have
been extended for spatial audio in the set of BINCI tools.
Fig. 1: Example of DAW plugins developed as part of BINCI
The Binaural Player (BP) is a cross-platform dynamic
binaural rendering module designed for reproduction
of ambisonics sound scenes. The BP is designed
primarily for playback of content created with the BHS;
nonetheless, B-Format audio files and streams are also
accepted as input. This ensures compatibility with
applications and platforms which support spatial audio
formats.
Fig. 2: Structure and main components of the Binaural Home Studio
By using ambisonics as an intermediate format, both in
the BHS and the BP, sound scene manipulations, such
as rotations, can be performed efficiently. Furthermore,
rendering audio to a three-dimensional loudspeaker array
is possible, extending the flexibility for creators and
end-users to monitor and listen to the created content.
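To illustrate why rotations are cheap in the ambisonics domain, the following is a minimal sketch of a rotation about the vertical axis for a first-order sound scene. It is not taken from the BINCI tools: the function names are hypothetical, and the traditional B-Format convention (channels W, X, Y, Z with X pointing to the front, Y to the left and a gain of 1/sqrt(2) on W) is an assumption.

```python
import math

def encode_first_order(sample, azimuth_deg):
    """Encode one mono sample as a horizontal first-order B-Format frame.

    Assumed convention: (W, X, Y, Z), X to the front, Y to the left,
    W scaled by 1/sqrt(2). Azimuth is counter-clockwise from the front.
    """
    az = math.radians(azimuth_deg)
    return (sample / math.sqrt(2.0),
            sample * math.cos(az),
            sample * math.sin(az),
            0.0)

def rotate_yaw(frame, angle_deg):
    """Rotate a first-order frame about the vertical axis.

    Only X and Y are mixed by a 2x2 rotation; W (omnidirectional) and
    Z (vertical) are left unchanged. Positive angles turn the scene
    counter-clockwise, i.e. towards the listener's left.
    """
    w, x, y, z = frame
    a = math.radians(angle_deg)
    return (w,
            x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)

def compensate_head_yaw(frame, head_yaw_deg):
    """Counter-rotate the scene so that sources stay fixed in the room."""
    return rotate_yaw(frame, -head_yaw_deg)
```

With these definitions, a source encoded straight ahead ends up on the listener's right after the head turns 90° to the left. The cost per audio frame is four multiplications, independent of the number of sources in the scene.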
All tools developed within BINCI use dynamic binaural
rendering (i.e. with head tracking) by default and apply
headphone equalization whenever a matching filter for
the employed headphone model is available. By combining
these techniques with the use of high resolution
HRTFs, an HRTF selection process and the application
of acoustic room information, localization errors are
reduced and the realism of binaural scenes is improved.
HRTF selection and individualization
The BHS and BP modules perform binaural rendering
using a database of high spatial resolution HRTFs (2.5°
in both azimuth and elevation) measured for 25 adults.
Head and torso dimensions extracted according to [3] for
79 individuals were used to select the 25 measured
subjects. The selection was based on an iterative
clustering procedure intended to maximize the variance
of the extracted geometrical dimensions among the selected
adults. The extracted anthropometric features for the
selected subjects accompany each of the corresponding
HRTF sets. In addition to the measured HRTFs, the
potential for modelling HRTFs based on analytical models
as described in [4] is currently being evaluated as a
solution for obtaining HRTFs for children.
With the purpose of providing a more realistic or plausible
binaural experience for the listener, a fast HRTF
selection and individualization procedure is included as
part of the tools developed in BINCI. The individualization
procedure consists of a pre-selection step, where
a reduced number of HRTF sets are presented to the
end-user based on the input of easily measurable head
dimensions, and an Interaural Time Difference (ITD)
adjustment step based on the method described in [5],
where the ITDs of the pre-selected HRTFs can be adjusted
until a stable sound source is perceived at a given
location. Finally, the selected and adjusted HRTF set is
exported for use in the binaural rendering process.
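The two steps can be sketched as follows. This is only an illustration under stated assumptions: the database entries, the choice of two head dimensions, the Euclidean distance used for ranking and all function names are hypothetical, since the paper does not specify how the pre-selection is computed. The ITD adjustment is reduced here to a single scaling factor applied to all ITDs of a set.

```python
import math

# Hypothetical database: HRTF set id -> (head width, head depth) in cm.
# The values are invented for illustration only.
HRTF_SETS = {
    "subject_01": (14.0, 18.5),
    "subject_02": (15.2, 19.8),
    "subject_03": (16.1, 20.4),
    "subject_04": (15.0, 21.0),
}

def preselect(head_dims, database, count=2):
    """Return the ids of the `count` sets closest to the user's head.

    Closeness is the Euclidean distance between head dimensions
    (an assumption, not necessarily the criterion used in BINCI).
    """
    ranked = sorted(database,
                    key=lambda set_id: math.dist(head_dims, database[set_id]))
    return ranked[:count]

def scale_itds(itds_us, factor):
    """Scale every ITD (in microseconds) by a user-controlled factor."""
    return [itd * factor for itd in itds_us]

# Usage: offer the two best-matching sets, then stretch the ITDs by 10 %.
candidates = preselect((15.1, 20.0), HRTF_SETS)
adjusted = scale_itds([-600.0, 0.0, 600.0], 1.1)
```

In an interactive setting, the factor would be changed by the listener until the source is perceived as stable while the head moves.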
Fig. 3: Extraction of anthropometrical parameters for one of
the 79 subjects
The described ITD scaling and adjustment procedure
can be applied to ITD values extracted from the HRIRs
using either the threshold method or the minimum-phase
cross-correlation method [6]. Alternatively, ITD values
can be modelled for any azimuth and elevation angle
using the provided head dimensions, thereby eliminating
discontinuities and exaggerated values for certain positions.
Further investigations in the project will provide
information about the perceptual differences between
extracted and modelled ITDs and the possible advantages
of either approach, singling out one of them for the final
individualization procedure.
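Both ways of obtaining ITDs can be sketched in a few lines. The extraction below is a simple onset-threshold estimate, with a threshold of 10 % of the peak chosen arbitrarily for illustration. The model is Woodworth's spherical-head formula for the horizontal plane, used here as a stand-in because the paper does not state which model is employed; the default head radius and all function names are assumptions.

```python
import math

def onset_index(hrir, threshold_ratio=0.1):
    """Index of the first sample reaching a fraction of the peak magnitude."""
    peak = max(abs(v) for v in hrir)
    if peak == 0.0:
        raise ValueError("HRIR contains only zeros")
    limit = threshold_ratio * peak
    for index, value in enumerate(hrir):
        if abs(value) >= limit:
            return index

def itd_threshold(hrir_left, hrir_right, sample_rate):
    """Extracted ITD in seconds, positive if the left ear is reached first."""
    return (onset_index(hrir_right) - onset_index(hrir_left)) / sample_rate

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Modelled ITD in seconds for a distant source in the horizontal plane.

    Woodworth's formula (a / c) * (theta + sin(theta)), valid for
    azimuths between -90 and 90 degrees, positive to the left.
    """
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))
```

The modelled curve is smooth by construction, whereas the extracted values depend on the chosen threshold and on the quality of each measured HRIR.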
Fig. 4: Comparison of modelled and extracted ITD values
for one subject and all measured azimuth positions
in the horizontal plane
Spatial Room Impulse Responses
One important feature of the spatial audio production
tools developed in BINCI is the capability to simulate
different environments (i.e. rooms). This is achieved by
convolving the audio input from the DAW with a filter
describing the acoustic properties of the selected
environment for different source-receiver configurations,
a so-called room impulse response (RIR). To increase
compatibility with the employed spatial (ambisonics) format,
sets of RIRs have been recorded using spherical
(ambisonics) microphone arrays as so-called Spatial Room
Impulse Responses (SRRs). Some examples of ambisonics
RIRs have already been published in [7].
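The convolution step can be sketched as follows, assuming a mono input signal and an SRR stored as one impulse response per ambisonics channel. The function names are hypothetical, and direct-form convolution is used only for readability; it is an assumption that a real-time implementation would rely on a faster, FFT-based scheme.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out

def apply_srr(dry_signal, srr_channels):
    """Convolve a mono signal with every ambisonics channel of an SRR.

    Returns one reverberated signal per ambisonics channel, i.e. an
    ambisonics signal of the same order as the SRR.
    """
    return [convolve(dry_signal, channel) for channel in srr_channels]
```

Because the result is again an ambisonics signal, it can be rotated and rendered with the same tools as any other sound scene in this format.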
The filters are thereby separated into directional components,
each of which contains an RIR corresponding to a
specific direction in space. The number of components
depends on the ambisonics order of the SRRs, which in
turn depends on the model of ambisonics microphone
used for the measurements: the Soundfield DSF-2 MKII and
Sennheiser Ambeo microphones are of order 1, the Zylia
ZM-1 is of order 3 and the MH Acoustics Eigenmike is of
order 4.
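For full-sphere ambisonics of order N, the number of components is (N + 1)². The short sketch below applies this relation to the microphone orders listed above; the function and variable names are hypothetical.

```python
def ambisonics_channel_count(order):
    """Number of components of a full-sphere ambisonics signal: (N + 1)^2."""
    if order < 0:
        raise ValueError("ambisonics order must be non-negative")
    return (order + 1) ** 2

# Ambisonics orders as stated in the text.
MICROPHONE_ORDERS = {
    "Soundfield DSF-2 MKII": 1,
    "Sennheiser Ambeo": 1,
    "Zylia ZM-1": 3,
    "MH Acoustics Eigenmike": 4,
}

# Number of directional components per microphone model.
COMPONENTS = {name: ambisonics_channel_count(order)
              for name, order in MICROPHONE_ORDERS.items()}
```

Under this relation, a first-order SRR has 4 components, a third-order SRR 16 and a fourth-order SRR 25.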
The main advantage of using ambisonics RIRs (SRRs)
over omnidirectional RIRs is that the acoustic properties
of the room are reproduced in accordance with the
listener’s head orientation with respect to the room. As an
example, a very asymmetric room like a long and narrow
corridor has very different echo patterns along the
X and the Y axis. Depending on whether the user is
looking along the X or Y axis, the series of echoes coming
from the sides (90° and 270°) and those coming from the
front and the back (0° and 180°) should differ.
The head-tracking based dynamic binaural synthesis
achieved by BINCI tools allows for a continuous change
of orientation of the RIR as the listener’s head rotates.
It should be noted that a convenient way to store, exchange
and use SRR data is proposed in [8] as a new
convention of the SOFA file format [9].
Summary
An overview of the current development status of the
spatial audio production tools developed in BINCI has been
given. Some of the most relevant technologies employed
in the project and the advantages they provide to members
of the creative industry have been highlighted. Further
improvement and integration of the software solutions
developed in BINCI are the next steps in the project.
Furthermore, the current version of the production tools
is being used by selected content creators with the purpose
of evaluating its usability in creative environments and
demonstrating its capabilities to the end-user. To this end,
the Fundació Joan Miró in Barcelona, the Alte Pinakothek
in Munich and St. Andrews Castle in St. Andrews
have agreed to serve as demonstration sites for BINCI,
illustrating the potential benefit of binaural content for
museums and providing an option for large-scale
usability tests.
Acknowledgements
The authors would like to thank all partners involved
in the BINCI project for the productive and cooperative
work. Additionally, we thank Andreas Herweg, Frederic
Allion and Matthias Reffgen for their support in the
development of software prototypes, the optimization of
computation models and the execution of measurements,
respectively.
The work presented in this document is part of BINCI, a
Horizon 2020 innovation project funded by the European
Union under the grant agreement No. 732130.
References
[1] J. Herre, J. Hilpert, A. Kuntz, and J. Plogsties,
“MPEG-H 3D Audio: The New Standard for Coding
of Immersive Spatial Audio,” IEEE Journal of Selected
Topics in Signal Processing, vol. 9, no. 5, pp. 770–779, 2015.
[2] A. Garriga, M. E. Fuenmayor, and M. Caballero,
“Binaural tools for 3D audio production at home,”
NEM Summit, pp. 1–3, 2017.
[3] V. R. Algazi, R. O. Duda, and D. M. Thompson,
“The CIPIC HRTF Database,” Signal Processing,
no. October, pp. 99–102, 2001.
[4] R. Sottek and K. Genuit, “Physical modeling of in-
dividual HRTFs (head related transfer functions),”
DAGA 1999, 1999.
[5] A. Lindau, J. Estrella, and S. Weinzierl,
“Individualization of dynamic binaural synthesis by real time
manipulation of the ITD,” Proc. of the 128th AES
Convention, 2010.
[6] B. F. G. Katz and M. Noisternig, “A comparative
study of interaural time delay estimation methods,”
J. Acoust. Soc. Am., vol. 135, no. 6, 2014.
[7] “The openair database,” http://www.openairlib.net.
[8] A. Pérez and J. De Muynke, “Ambisonics Directional
Room Impulse Response as a new Convention of the
Spatially Oriented Format for Acoustics,” Engineering
Brief, AES Convention 2018, to be published.
[9] “SOFA (spatially oriented format for acoustics)
https://www.sofaconventions.org.”