AAAA 2021 Budapest - 9th Congress of the Alps Adria Acoustics Association
Scientific Society for Optics, Acoustics, Motion Pictures and Theatre
23-24 September 2021, Budapest, Hungary
COMPARISON OF TWO METHODS FOR UPMIXING B-FORMAT ROOM IMPULSE
RESPONSES TO HIGHER ORDER AMBISONICS
Gergely Firtha¹, Csaba Huszty¹
¹ ENTEL Engineering Research & Consulting Ltd., Inspired Acoustics division
Abstract: Ambisonics is a well-established technique for capturing, storing and reproducing spatial sound. B-format refers to a commonly established first order Ambisonics audio storage format, frequently used in 3D spatial audio applications due to the widespread availability of supporting recording hardware and panning strategies. However, modern multichannel loudspeaker systems and surround formats (Dolby Atmos, Auro-3D, etc.) allow the reproduction of soundscapes with increased spatial detail, which requires the involvement of Higher Order Ambisonics (HOA) signals. The present paper discusses and compares two techniques for extrapolating HOA signals from first order audio for the special case of measured room impulse responses, resulting in an enhanced spatial resolution of the reproduced sound.
Keywords: Spatial Audio, Ambisonics, B-format, Convolutional reverberation
1. INTRODUCTION
The spatially correct recreation of the acoustical properties of reverberant spaces has been the subject of research for the last several decades. The most straightforward, though computationally demanding, strategy is convolution reverberation, relying on the measurement of the room impulse response (RIR): once the impulse response of the acoustical environment, along with directional information, is captured, reverberation can be performed by its convolution with an arbitrary input signal [1, 2]. Spatial information, i.e. the directions of arrival in the reverberant sound field, is captured by recording the target field via a microphone array, while reproduction is possible either by driving a multichannel loudspeaker distribution or by direct downmixing to headphones. Consequently, a suitable audio format is required for storing the measured spatial audio data.
Ambisonics, a concept introduced by Gerzon et al. [3, 4], is a surround sound format. Unlike the most common channel-based audio formats containing the direct loudspeaker feeds, Ambisonic signals represent the captured sound field in terms of a series of spherical functions up to a given order N, centered at the microphone position. Each spherical function, termed a spherical harmonic (and thus each Ambisonic signal), corresponds to a given virtual microphone directivity, with the number of sidelobes increasing with the spherical harmonic order. For a specific microphone array a suitable encoding strategy is required in order to arrive at the Ambisonics representation [5], and similarly, on the reproduction side the Ambisonic signals have to be decoded to the given loudspeaker array (see e.g. [6]). For convolution reverberation the Ambisonics representation is favourable: once the Ambisonics representation of the measured RIR is known, the convolution may be performed in the Ambisonics domain, followed by the decoding of the loudspeaker feeds to the actual loudspeaker layout. The number of simultaneous convolutions is, therefore, determined by the Ambisonics order, independently of the number of loudspeakers used for reproduction.
To date the most widespread Ambisonic format is First Order Ambisonics (FOA, N = 1), most commonly referred to as the B-format signals, giving a limited spatial resolution representation of the sound field [5]. Higher Order Ambisonics (HOA) signals contain the spherical harmonic decomposition of the captured field up to a higher order (N > 1), with the spatial resolution and the area of correct reproduction (i.e. the size of the sweet spot) increasing with the HOA order. Therefore, HOA ensures a highly enhanced reproduction fidelity. Although commercially available microphone arrays already exist for measuring HOA signals up to the 4th order [7], in the case of real-time convolutional reverberation the application of FOA is still more feasible [8]. On the other hand, modern multichannel loudspeaker systems and surround formats (Dolby Atmos, Auro-3D, etc.) allow the reproduction of soundscapes with increased spatial detail, involving Higher Order Ambisonics signals. Besides the enhanced spatial resolution, the involvement of higher order Ambisonics signals also increases the area in which correct localization is ensured; furthermore, it improves the perceived spatial depth mapping, i.e. ensures a clearer separation between foreground and background sound [5]. Therefore, a computationally cheap FOA to HOA extrapolation method, allowing the approximation of HOA signals after convolutional reverberation, is of great interest. The present contribution discusses and compares two extrapolation methods as solutions for this problem.
2. MATHEMATICAL PRELIMINARIES
2.1. B-format and Higher Order Ambisonics
Assume an arbitrary pressure field P(φ, θ, t), measured for the sake of simplicity on a spherical surface centered at the origin, with (φ, θ) denoting the azimuth and elevation angles of the spherical coordinate system. The sound field along the sphere can be expanded into the series of spherical harmonics [9], reading as

P(\varphi, \theta, t) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}(t)\, Y_{lm}(\varphi, \theta),    (1)

where Y_{lm} are the spherical harmonics of the l-th order, given by

Y_{lm}(\varphi, \theta) = N_{l,|m|}\, P_{l,|m|}(\sin\theta) \begin{cases} \cos(m\varphi), & m \ge 0 \\ \sin(|m|\varphi), & m < 0 \end{cases}    (2)

with P_{l,|m|} denoting the associated Legendre polynomials and N_{l,|m|} a suitable normalization term. The lower order spherical harmonics are illustrated in Figure 1. For the sake of brevity, in the following a shorthand, linear indexing notation Y_n is also used.
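The paper does not spell out which linear index convention is meant; a common choice in Ambisonics software is the ACN (Ambisonic Channel Number) ordering, sketched below as an assumption.

```python
def acn_index(l, m):
    """ACN linear index n for spherical harmonic order l and degree m.
    The ACN convention itself is an assumption; the paper leaves the
    indexing scheme unspecified."""
    return l * l + l + m

def acn_to_lm(n):
    """Inverse mapping: recover (l, m) from the linear index n."""
    l = int(n ** 0.5)
    return l, n - l * l - l
```

With this convention the four B-format channels correspond to n = 0, 1, 2, 3.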
The spectral coefficients A_{lm}(t) are the Ambisonics signals, formally obtained from the spherical harmonic transform (SHT) of the sound field, A_{lm}(t) = SHT{P(φ, θ, t)}. In practice the sound field is sampled at discrete microphone positions and discrete time instants. The numerical evaluation of the SHT in order to obtain the Ambisonic signals is out of the scope of the present paper.

In most practical applications it is exploited that the radius of measurement is small compared to the wavelength. Assuming kr → 0 as a limiting case, the SHT in the center of the array reads as

A_{lm}(t) = \int_{\Omega} P(\varphi, \theta, t)\, Y_{lm}(\varphi, \theta)\, \mathrm{d}\Omega    (3)
Fig. 1. Magnitude of the lower order spherical harmonic functions.
with Ω denoting the full solid angle. In practice the series is truncated at a given order N, with truncation at N = 1 resulting in B-format signals. Since the spatial resolution of the spherical harmonics increases with the harmonic order, B-format signals allow the representation of the sound field with only coarse spatial detail, which is still sufficient for most practical applications. Furthermore, the truncation order directly defines the radius around the center inside which the series expansion correctly describes the sound field under consideration.
For specific sound field models the SHT is known analytically. As the simplest example, the SHT of a plane wave propagating into the direction (φ_pw, θ_pw) is given by

A_{lm}(t) = s(t)\, Y_{lm}(\varphi_{\mathrm{pw}}, \theta_{\mathrm{pw}}),    (4)

with s(t) being the time signal propagating as a plane wave.
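The following is a minimal sketch of the plane-wave encoding in (4), assuming real-valued spherical harmonics in ACN ordering with orthonormal (N3D-like) normalization and without the Condon-Shortley phase; these conventions are assumptions, since the paper does not fix them.

```python
import numpy as np
from scipy.special import lpmv, factorial

def real_sh(l, m, azi, elev):
    """Real-valued spherical harmonic Y_lm for azimuth/elevation in radians."""
    am = abs(m)
    norm = np.sqrt((2 * l + 1) / (4 * np.pi)
                   * factorial(l - am) / factorial(l + am))
    # lpmv includes the Condon-Shortley phase; cancel it for the Ambisonics convention
    leg = (-1.0) ** am * lpmv(am, l, np.sin(elev))
    if m > 0:
        trig = np.sqrt(2.0) * np.cos(m * azi)
    elif m < 0:
        trig = np.sqrt(2.0) * np.sin(am * azi)
    else:
        trig = 1.0
    return norm * leg * trig

def encode_plane_wave(s, azi, elev, order):
    """Eq. (4): encode the time signal s of a plane wave from (azi, elev)
    into Ambisonic channels up to the given order (ACN channel ordering)."""
    gains = [real_sh(l, m, azi, elev)
             for l in range(order + 1) for m in range(-l, l + 1)]
    return np.outer(gains, s)   # shape: ((order+1)^2, len(s))
```

Both upmixing methods discussed below re-encode estimated plane wave components with exactly this kind of gain vector.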
Finally, in the reproduction stage, for a given loudspeaker layout the loudspeaker feeds have to be calculated from the Ambisonics representation. This may be performed by applying a suitable decoding/panning strategy, which is, again, out of the scope of the present treatise.
2.2. Reverberation with measured RIRs
Convolutional reverberation relies on the measurement of the impulse response of the given acoustic environment, termed the room impulse response (RIR). Based on the measured room impulse response h(t), the reverberant pressure field at the given receiver position for an arbitrary excitation signal x(t) can be calculated by the convolution

p(t) = (h * x)(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t - \tau)\, \mathrm{d}\tau.    (5)
A typical RIR measurement can be observed in Figure 2. As shown in this figure, the RIR is composed of the direct sound, arriving directly from the source position, followed by a series of impulses reflected in the reverberant environment [10]. The latter, reverberant part can be further subdivided into a time regime containing early reflections with well-defined directions of arrival (DOA) and a noise-like diffuse tail of late reflections.

Fig. 2. Result of a typical room impulse response measurement in the time (a) and frequency (b) domain, used as an example in the following sections. The RIR was measured in a relatively small concert venue.
In order to recreate not only the pressure field but also its spatial characteristics, i.e. the directions of arrival in the reverberant field, the RIRs are captured not at a single microphone position but at multiple, suitably chosen locations in space. By assuming that the microphones are located close to each other at the spherical angles (φ, θ), e.g. along a spherical surface with a sufficiently small radius, the spherical harmonic decomposition may be performed, denoted by

h_{lm}(t) = \mathrm{SHT}\{ h(\varphi, \theta, t) \}.    (6)

As a further step, since both the SHT and the convolution are linear operations, convolutional reverberation may be performed directly in the spherical harmonic domain, with the number of convolution operations defined by the actual SHT order. Finally, the actual loudspeaker feeds are obtained from the Ambisonic representation of the reverberant signals by an appropriate decoding method.
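As a minimal sketch of this processing chain, under the assumption that the Ambisonic RIR is stored as a simple channel-by-sample array, the channel-wise convolution of eq. (5) can be written as:

```python
import numpy as np
from scipy.signal import fftconvolve

def ambisonic_reverb(dry, rir_lm):
    """Convolve a dry mono signal with each Ambisonic RIR channel.
    rir_lm: array of shape (n_channels, n_taps), e.g. 4 channels for B-format."""
    return np.stack([fftconvolve(dry, h) for h in rir_lm])
```

The number of convolutions equals the number of Ambisonic channels, e.g. four for B-format, independently of how many loudspeakers the result is later decoded to.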
3. UPMIXING B-FORMAT RIRS TO HOA SIGNALS
3.1. Problem statement
As previously discussed, in most practical applications the SHT of the measured room impulse responses h(φ, θ, t) is calculated up to the first order, resulting in so-called B-format RIRs. The four B-format RIR signals correspond to the measured impulse responses captured by virtual microphones with the directivities shown in the rows l = 0 and l = 1 of Figure 1. Obviously, restricting the number of convolutions in the reverberation stage to four channels keeps the computational demand feasible. However, modern multichannel surround speaker layouts allow the accurate reproduction of even higher spherical harmonic orders, with finer spatial resolution, an extended reproduction area and improved spatial depth mapping.
In the following, two extrapolation methods are discussed that allow the approximation of higher order Ambisonics RIR signals from B-format RIRs. A basic requirement for the following FOA to HOA extrapolation methods is to enhance the directional information of the direct sound and the early reflections in the room impulse response, while minimizing the introduced colouration artifacts.

First, an extrapolation method proposed in [11, 5] is discussed, which is then extended in the subsequent sections.
3.2. The original extrapolation method
The technique relies on the analysis and resynthesis of the original B-format RIR signals. Both the analysis and resynthesis steps require an appropriate sound field model, chosen a priori. In the following it is assumed that the sound field captured by the microphones consists of a series of plane waves arriving at different time instants from different spatial directions. This simple sound field model holds for most practical cases, i.e. when the wavefront has travelled over a sufficiently large distance from either the sound source or the reflecting surface to the microphone.
¹ More precisely, the related literature proposes to band-limit the FOA signals to the frequency range in which the directional mapping is correct; above this range spatial aliasing occurs due to the non-zero radius of the microphone array.
By considering a plane wave propagation model, the signal processing scheme consists of the following steps:
- In the first step a unique direction of arrival (φ(t), θ(t)) is assigned to each input RIR sample. Hence, in the signal processing scheme each sample is considered to be an individual plane wave component arriving at the receiver position. This step requires a suitable DOA estimation method, based on the spatial information carried by the B-format signals.
- As a following step, the time history s(t) of each plane wave component has to be estimated samplewise.
- Once a plane wave component is identified for each RIR sample, the spherical harmonic representation of the RIR can be re-expressed, since the SHT of a plane wave is known analytically, given by (4).
Based on the foregoing, the extrapolated HOA RIR signals are given by

\tilde{h}_{lm}(t) = s(t)\, Y_{lm}\big(\varphi(t), \theta(t)\big),    (7)

where both the samplewise plane wave time history and the DOAs may be estimated by various methods. Generally, the former is simply taken to be the omnidirectional Ambisonic signal (l = 0, m = 0) [5], given by

s(t) = h_{00}(t).    (8)
For the direction of arrival, several approximations have been given in the related literature. As the simplest, and computationally cheapest, approach the DOA is approximated as the angle of the time-domain intensity vector [12]: the intensity vector, more generally defined in the STFT domain [13, 14], points in the direction of the local energy flow and can be calculated directly from the FOA signals¹, reading as

\mathbf{I}(t) \propto h_{00}(t) \begin{bmatrix} h_{1,1}(t) \\ h_{1,-1}(t) \\ h_{1,0}(t) \end{bmatrix}, \qquad \big(\varphi(t), \theta(t)\big) = \angle\,\mathbf{I}(t),    (9)

with the first order components corresponding to the Cartesian x, y and z directions.
The described method ensures a simple upmixing solution from FOA to HOA signals by panning each FOA sample as an individual plane wave. However, the method
suffers from several high-frequency coloration artifacts in practice. The reason behind these artifacts is that Equation (7) describes the multiplication of the omnidirectional microphone signal by a wideband panning function Y_{lm}(φ(t), θ(t)). The panning function is a wideband, noise-like function even if the DOA is strictly band-limited, owing to the non-linear transformation introduced by the DOA appearing in the argument of the spherical harmonic functions. Obviously, the corresponding spectral convolution in the Fourier domain irreversibly corrupts the high-frequency content: typically, the long decays of low frequencies are smeared into the high-frequency region, resulting in a spectral brightening of the diffuse tail. The problem is more severe in higher order signals, but it is clearly audible even in resynthesized first order Ambisonics signals [12, 11, 15].
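For reference, the following is a minimal sketch of the per-sample panning scheme of eqs. (7)-(9). The channel ordering (W, X, Y, Z), the normalization, and the helper sh_vec (returning the real spherical harmonic vector of length (order+1)^2 for a given direction, e.g. built from real_sh above) are assumptions rather than the paper's exact implementation.

```python
import numpy as np

def upmix_per_sample(foa, order, sh_vec):
    """Original upmix (eqs. (7)-(9)): per-sample intensity-vector DOA,
    then re-encoding of the omnidirectional channel as a plane wave.
    foa: array of shape (4, T) holding the (W, X, Y, Z) RIR channels."""
    w, x, y, z = foa
    # eq. (9): time-domain intensity vector and its direction angles
    ix, iy, iz = w * x, w * y, w * z
    azi = np.arctan2(iy, ix)
    elev = np.arctan2(iz, np.hypot(ix, iy))
    # eqs. (7)-(8): pan each omni sample into the estimated direction
    hoa = np.zeros(((order + 1) ** 2, foa.shape[1]))
    for t in range(foa.shape[1]):
        hoa[:, t] = sh_vec(azi[t], elev[t]) * w[t]
    return hoa
```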
In order to overcome this limitation, several modifications have been introduced in the related literature. A straightforward strategy is to manually modify the spectral envelope of the extrapolated Ambisonics signals, based on the analytically known correct spectral decay of the higher order signals and the measured spectral envelope of the omnidirectional signal. The approach requires the subband decomposition of each Ambisonics signal and the calculation of the envelope of each subband signal. This solution, therefore, introduces a significant increase of computational cost into the signal processing chain. A further description of the subband equalization technique is found in [5].

In the following, a computationally cheap alternative method is introduced.
3.3. Improvement of the extrapolation technique
In order to improve the frequency performance of the above upmixing method, the following extension is proposed: instead of a single plane wave component, each sample is approximated as the sum of multiple plane waves arriving from different directions. The extrapolated RIR hence reads as

\tilde{h}_{lm}(t) = \sum_{i=0}^{I-1} \underbrace{s_i(t)\, Y_{lm}\big(\varphi_i(t), \theta_i(t)\big)}_{\tilde{h}^{(i)}_{lm}(t)},    (10)

where s_i(t) is the signal sample carried by the i-th plane wave component arriving from the direction (φ_i(t), θ_i(t)). The motivation behind this formulation is that, statistically, each sample of the diffuse tail contains multiple reflections arriving from different directions. On the other hand, regions of the RIR with a particular direction of arrival may be separated into useful data and background noise by the proposed technique.
The plane wave components can be calculated from the four measured FOA channels, denoted by h^{(0)}_{lm}(t), iteratively, in a matching pursuit manner:
- Similarly to the original approach, in each i-th iteration a single dominant plane wave component is assumed in each time sample:

\tilde{h}^{(i)}_{lm}(t) = s_i(t)\, Y_{lm}\big(\varphi_i(t), \theta_i(t)\big).    (11)

In each iteration step the plane wave time signal and direction have to be approximated, as discussed later.
- Once a plane wave component is approximated, its contribution is subtracted from the measured signals before the next iteration:

h^{(i+1)}_{lm}(t) = h^{(i)}_{lm}(t) - s_i(t)\, Y_{lm}\big(\varphi_i(t), \theta_i(t)\big).    (12)
In order to arrive at a fair approximation, both the dominant plane wave direction (φ_i(t), θ_i(t)) and the corresponding time history s_i(t) have to be approximated based on the available measurement information, i.e. based on the FOA (first order) signals.

For the sake of simplicity, the direction of arrival is estimated according to (9). As a simple extension, the estimated DOAs are band-limited by simple moving average filtering with a given filter length in order to control the frequency performance of the approximation. Since each sample is expressed as the sum of plane waves, the temporal correlation introduced into the DOA of the individual plane wave components does not result in an overall correlation of the diffuse tail. The effect of this filtering is further elaborated in the next section.
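A possible implementation of the smoothed DOA estimate is sketched below. Applying the moving average to the Cartesian intensity components before converting to angles is an implementation choice made here to avoid angle wrap-around; the paper itself only states that the estimated DOAs are smoothed.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def smoothed_doa(w, x, y, z, filter_len):
    """Moving-average smoothed DOA estimate from the B-format channels."""
    ix = uniform_filter1d(w * x, filter_len)
    iy = uniform_filter1d(w * y, filter_len)
    iz = uniform_filter1d(w * z, filter_len)
    azi = np.arctan2(iy, ix)
    elev = np.arctan2(iz, np.hypot(ix, iy))
    return azi, elev
```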
Once the DOA of a single plane wave component is approximated, the corresponding plane wave time history has to be found. This can be performed by minimizing the error of the approximation over the four FOA signals at each time sample:

E_i(t) = \sum_{n=0}^{3} \Big( h^{(i)}_{n}(t) - s_i(t)\, Y_{n}\big(\varphi_i(t), \theta_i(t)\big) \Big)^2,    (13)

with h^{(i)}_{n}(t) denoting the measured FOA signals (i.e. the residual in the i-th iteration), indexed by the linear spherical harmonic index n. The time history that minimizes the error function can be obtained by finding the root of the partial derivative of (13), as explained in detail in the Appendix, resulting in

s_i(t) = \frac{\sum_{n=0}^{3} h^{(i)}_{n}(t)\, Y_{n}\big(\varphi_i(t), \theta_i(t)\big)}{\sum_{n=0}^{3} Y_{n}\big(\varphi_i(t), \theta_i(t)\big)^2}.    (14)

The above equation basically describes a normalized (since the denominator is constant) inverse spherical harmonic transform into the direction (φ_i(t), θ_i(t)).
Note that the DOA could be approximated in a similar manner, by setting the derivative of the error function with respect to the angle variables to zero. However, the DOA obtained in this way correlates highly with that of the much simpler and numerically stable estimator discussed in the foregoing, which still ensures the convergence of the iteration.
In summary, the algorithm pans portions of each sample of the Ambisonics signal into the dominant directions in space and re-encodes them as plane waves into higher Ambisonics orders. The following section compares the performance of the original and the proposed B-format upmixing methods.
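The complete iteration can be sketched as follows. The helpers sh_vec (real SH vector for a direction, with the first four entries being the FOA part) and doa_estimate (eq. (9), optionally smoothed as above) are assumptions standing in for details the paper leaves to the implementer.

```python
import numpy as np

def upmix_matching_pursuit(foa, order, n_waves, sh_vec, doa_estimate):
    """Proposed iterative upmix (eqs. (10)-(14)).
    foa: (4, T) B-format RIR; returns an ((order+1)^2, T) HOA RIR."""
    residual = foa.astype(float)
    hoa = np.zeros(((order + 1) ** 2, foa.shape[1]))
    for i in range(n_waves):
        azi, elev = doa_estimate(residual)          # eq. (9) on the current residual
        for t in range(foa.shape[1]):
            y = sh_vec(azi[t], elev[t])             # full-order SH vector
            y_foa = y[:4]                           # first order (B-format) part
            s = residual[:, t] @ y_foa / (y_foa @ y_foa)   # eq. (14)
            hoa[:, t] += s * y                      # eq. (10): accumulate component
            residual[:, t] -= s * y_foa             # eq. (12): update residual
    return hoa
```

Choosing n_waves and the DOA smoothing length trades reconstruction accuracy against computational cost, as discussed in the next section.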
4. COMPARISON OF THE ORIGINAL AND THE EXTENDED
METHODS
Both discussed methods rely on the decomposition of the original B-format signals into plane wave signals and directions, from which the entire Ambisonics representation, including the original first order signals, is re-synthesized. First, the resynthesized FOA signals are compared with the original, measured B-format signals.
An important difference between the original and the extended method is the following: by definition, the original method reconstructs the zeroth order Ambisonics signal perfectly; the first order signals, containing the directional information, are however heavily corrupted by high-frequency spectral distortion. Although the envelope correction compensates for the introduced high-frequency noise, the directional data still differ significantly from the original measurement data, resulting in colourized loudspeaker feeds after the decoding stage.
On the other hand, the proposed method expresses both the omnidirectional channel and the first order signals as series expansions. Due to the truncation of the expansion, a minor discrepancy is present between the original and the resynthesized FOA signals. Experiments showed that even a low practical truncation order already results in a feasibly low reconstruction error. Obviously, by increasing the series expansion order this error can be further suppressed.
Fig. 3. Comparison of the original and reconstructed omnidirectional (l = 0) signals by the proposed method. Figure (a) depicts the time domain signals, while (b) illustrates the error of reconstruction.
The above statements are verified by Figures 3 and 4. Figure 3 depicts the comparison of the original measured and the resynthesized omnidirectional signal. It is verified that even a low number of identified plane wave components results in insignificant error levels. Investigation of Figure 3 (b) reveals that the largest relative error levels are introduced in the late reverberation intervals. This is a direct consequence of the fact that the diffuse tail contains no particular directions of arrival; rather, even a single sample consists of impulses arriving from numerous directions in space. Hence, these time regions could be approximated better by increasing the truncation order of the plane wave components.
Fig. 4. Comparison of the original and reconstructed first order signals by the proposed and the original method. Figure (a) depicts the time domain signals, (b) illustrates the error of reconstruction, and (c) shows the frequency content of the first order RIRs.
Figure 4 (a-b) shows the reconstruction of one particular first order signal. It is demonstrated that, unlike the original approach, the proposed technique resynthesizes the first order signals with only minor introduced error. Moreover, the original approach suffers from heavy frequency distortion artifacts, as revealed in Figure 4 (c), depicting the frequency content of the first order channel under discussion. This convolutional frequency distortion results in clearly audible artifacts when the Ambisonics signals are decoded to an irregular loudspeaker layout, e.g. to a standard 5.1 loudspeaker ensemble.
As a further important aspect, the frequency content of the reproduced sound field is investigated when the Ambisonics signals are decoded to an irregular loudspeaker layout. In the present example the FOA signals are upmixed to 4th order HOA signals and decoded to a 22.2 loudspeaker layout. Figure 5 depicts the reverberation time, measured in the center of the virtual loudspeaker array. As reported in the related literature, without spectral envelope compensation the original approach would exhibit an increased spectral brightness, originating from the increase of high-frequency components in the diffuse tail and manifesting in an increased reverberation time at high frequencies. This artifact is compensated well by the spectral decay recovery strategy. However, the listening tests revealed that heavy colouration can still be perceived in the center of the loudspeaker array due to the frequency distortion, being an inherent property of the original approach.
The figure also presents the reverberation time resulting from the proposed approach with different DOA smoothing filter lengths. The measured data indicate that without any DOA smoothing applied, the presented approach also suffers from a slight frequency performance degradation, which is also verified by the listening tests, although subjectively to a significantly lower degree than the original method without spectral envelope compensation. The spectral brightening can be efficiently suppressed by applying smoothing to the DOA estimation: a moderate smoothing filter length is sufficient in order to achieve minimal spectral brightening compared to the original measured B-format signals, at the price of an increased computational cost and a minor increase of temporal correlation in the diffuse tail. Hence, in the present approach the DOA smoothing filter length is a crucial parameter for controlling the spectral brightness distortion.
Fig. 5. Reverberation time of the reproduced sound field using the original B-format signals and the upmixed signals obtained by the original and the new, proposed method with different DOA smoothing filter lengths.
5. CONCLUSION
The present paper dealt with extrapolation methods for generating higher order Ambisonics signals from a measured first order representation. The paper introduced a novel approach based on an existing extrapolation method. The main idea of the approach is the re-expansion of each B-format sample into a series of plane waves, each described by a time history sample and a direction. The plane wave components are calculated in an iterative matching pursuit manner, by assuming a single dominant plane wave in each iteration. The directions of arrival are estimated as the angle of the intensity vector of the captured sound field, while the corresponding signal content is obtained by the ISHT of the sound field into the direction of interest.

The proposed approach was compared with the existing extrapolation method. Numerical simulations verified that, with a suitable choice of parameters, the introduced technique is capable of upmixing Ambisonics signals while introducing no noticeable spectral degradation, at a significantly lower computational cost than the original method. Furthermore, the presented approach reconstructs the zeroth and first order signals, i.e. the measured bases of the approximation, with only minor error, while the original method inherently colours even the first order signals, i.e. its decomposition and resynthesis does not return the original signals.
6. APPENDIX
6.1. Derivative of the error function with respect to s(t)
In the following, the plane wave time signal that minimizes the error function (13) is expressed. For the sake of brevity the iteration index i is omitted. The error function can be expressed as

E(t) = \sum_{n} \Big( h_{n}(t) - s(t)\, Y_{n}\big(\varphi(t), \theta(t)\big) \Big)^2 = \sum_{n} \Big( h_{n}(t)^2 - 2\, h_{n}(t)\, s(t)\, Y_{n}\big(\varphi(t), \theta(t)\big) + s(t)^2\, Y_{n}\big(\varphi(t), \theta(t)\big)^2 \Big).    (15)

The error can be minimized by setting its partial derivative to zero, noting that the quadratic form is bounded from below, so that the stationary point is a minimum:

\frac{\partial E(t)}{\partial s(t)} = \sum_{n} \Big( -2\, h_{n}(t)\, Y_{n}\big(\varphi(t), \theta(t)\big) + 2\, s(t)\, Y_{n}\big(\varphi(t), \theta(t)\big)^2 \Big) = 0.    (16)

Solving the equation for s(t) yields the plane wave time history

s(t) = \frac{\sum_{n} h_{n}(t)\, Y_{n}\big(\varphi(t), \theta(t)\big)}{\sum_{n} Y_{n}\big(\varphi(t), \theta(t)\big)^2}.    (17)
7. REFERENCES
[1] V. Pulkki, T. Lokki, and D. Rocchesso. DAFX: Digital Audio Effects, chapter 5, pages 139–183. John Wiley & Sons, Ltd., 2011.
[2] A. V. Oppenheim and R. W. Schafer. Discrete-Time Signal Processing. Prentice Hall Press, USA, 3rd edition, 2009.
[3] M. A. Gerzon. Periphony: With-height sound reproduction. Journal of the Audio Engineering Society, 21(1):2–10, February 1973.
[4] M. A. Gerzon. Practical periphony: The reproduction of full-sphere sound. Journal of the Audio Engineering Society, February 1980.
[5] F. Zotter and M. Frank. Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality. Springer, 1st edition, 2019.
[6] F. Zotter and M. Frank. All-round ambisonic panning and decoding. Journal of the Audio Engineering Society, 60(10):807–820, 2012.
[7] J. Meyer and G. Elko. A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield. In 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages II-1781–II-1784, 2002.
[8] M. A. Gerzon. The design of precisely coincident microphone arrays for stereo and surround sound. Journal of the Audio Engineering Society, March 1975.
[9] E. G. Williams. Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press, London, 1st edition, 1999.
[10] H. Kuttruff. Room Acoustics. Applied Science Publishers, London, 2nd edition, 1979.
[11] M. Zaunschirm, M. Frank, and F. Zotter. BRIR synthesis using first-order microphone arrays. Journal of the Audio Engineering Society, May 2018.
[12] S. Tervo, J. Pätynen, A. Kuusinen, and T. Lokki. Spatial decomposition method for room impulse responses. Journal of the Audio Engineering Society, 61(1/2):17–28, January 2013.
[13] J. Merimaa and V. Pulkki. Spatial impulse response rendering I: Analysis and synthesis. Journal of the Audio Engineering Society, 53(12):1115–1127, December 2005.
[14] V. Pulkki and J. Merimaa. Spatial impulse response rendering II: Reproduction of diffuse sound and listening tests. Journal of the Audio Engineering Society, 54(1/2):3–20, January/February 2006.
[15] M. Frank and F. Zotter. Spatial impression and directional resolution in the reproduction of reverberation. In Fortschritte der Akustik - DEGA, Aachen, Germany, 2016.
Periphony (sound reproduction in both vertical and horizontal directions around a listener) may be recorded among others, via practical two-, four-, and nine-channel systems. matrix parameters and microphone techniques are described for 19 different systems, and a design procedure for other periphonic systems is given. Amplitude and energy directional resolution are discussed, as is compatibility with current horizontal-only systems.