2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 18–21, 2009, New Paltz, NY
SOUNDFIELD RENDERING WITH LOUDSPEAKER ARRAYS THROUGH MULTIPLE
BEAM SHAPING
F. Antonacci, A. Calatroni, A. Canclini, A. Galbiati, A. Sarti, S. Tubaro
Politecnico di Milano
Dipartimento di Elettronica ed Informazione
Piazza Leonardo da Vinci, 32, 20133 Milano, Italy
ABSTRACT
This paper proposes a method for the acoustic rendering of a virtual environment based on a geometric decomposition of the wavefield into multiple elementary acoustic beams, all reconstructed with a loudspeaker array. The point of origin, the orientation and the aperture of each beam are computed according to the geometry of the virtual environment that we want to render and to the location of the sources. Space-time filters are computed with a Least Squares approach to render the desired beam. Experimental results show the feasibility as well as the critical issues of the proposed algorithm.
Index Terms— wavefield rendering; acoustic beams; loudspeaker arrays
1. INTRODUCTION
Consider the problem of rendering the acoustics of a virtual environment in a low-reverberation (dry) room. In order to obtain this result, we need to modify the soundfield by adding the contributions of the waves reflected from the walls of the virtual environment. In order to do so, we think of the soundfield as the superposition of beams originating from a number of virtual sources. Decompositions of the wavefield into elementary components are quite common in the literature: Ambisonics decomposes the soundfield through spherical harmonic functions [1], while Wave Field Synthesis (WFS) [2], [3] exploits the Huygens principle, which states that any wavefront can be decomposed into a superposition of elementary spherical wavefronts emitted from secondary sources. Each loudspeaker, therefore, is independently controlled to operate as a secondary source. In this paper we focus on the rendering of soundfields through a superposition of beams originating from multiple image sources. In order to do so, we need to be able to render an acoustic beam and to compute the parameters of all beams according to the acoustics of the virtual environment we want to render.
As far as the rendering of a single beam is concerned, in [4] the authors propose an algorithm that controls the direction of the beam. However, this methodology cannot accurately control the shape of the beampattern, only its direction. Moreover, the authors focus on the far-field case. Generalized Sidelobe Cancelling (GSC) [5] introduces multiple constraints on the beampattern, but the number of constraints is limited by the number of loudspeakers, which places some limits on the accuracy of the beamshaping. An interesting solution for the shaping of an arbitrary beampattern can be found in [6]. Here the authors design the near-field beamshaper from a far-field one. The validity of the beampattern over a broad band makes this algorithm very interesting. However, its computational cost prevents us from using this methodology for our purposes.

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 226007.
In this paper we propose an alternative technique to simulate, by means of an array of M loudspeakers, the arbitrary beampattern of a virtual source. More specifically, we define N test points in a listening area of arbitrary shape. We impose that the wavefield at the test points best approximates the wavefield produced by the virtual source with the specified beampattern. This condition yields a system of N equations whose unknowns are the M loudspeaker weights, which we solve through Singular Value Decomposition. We use beam tracing [7] to compute the configuration of beams given the geometry of the virtual environment we want to render and the position of the real source. The extension of the algorithm to multiple image sources is done by summing up the loudspeaker weights for each virtual source. The technique is extended to wideband signals by finding a filter for each loudspeaker through a frequency sampling approach.
The paper is structured as follows: Section 2 illustrates the problem and gives an overview of the background. Section 3 describes the proposed solution and illustrates the extension to broadband signals and to multiple virtual sources. Section 4 provides some experimental results to show the feasibility of the proposed approach. Finally, Section 5 draws some conclusions.
2. PROBLEM STATEMENT AND BACKGROUND
As said above, we conceive the soundfield as a superposition of acoustic beams. In this Section we formulate the problem of rendering a single acoustic beam and we provide an overview of the techniques available in the literature that address the same problem.
Consider the problem of rendering, within a predetermined area of interest (listening area), the presence of a source that emits the signal s(t). We do so with a uniform linear array of m = 1, ..., M loudspeakers placed in p_1, ..., p_M. The listening area is sampled in points a_1, ..., a_N that are arbitrarily located.
We first consider a narrowband source. In Section 3 we extend our method to wideband sources. The sound pressure p_n(t) at the test points a_1, ..., a_N is described by the equation:

p_n(t) = g_n^T h s(t) = Ψ_n s(t) ,    (1)

where s(t) is the source signal, h = [H_1, H_2, ..., H_M]^T is the vector of complex coefficients applied to the loudspeakers
and g_n = [g(p_1, a_n), ..., g(p_M, a_n)]^T is the juxtaposition of the Green's functions from each loudspeaker to the listening point a_n [8]:

g(p_m, a_n) = (1 / (4π ||p_m − a_n||)) e^{−jω ||p_m − a_n|| / c} ,    (2)

where c is the sound speed and ω is the frequency of s(t). The term Ψ_n in eq. (1) is the spatial response of the loudspeaker array in a_n. The goal of traditional beamshaping techniques is to leave the signal undistorted in a_n (Ψ_n = 1) while attenuating the field in the N − 1 remaining points.
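The propagation model of eq. (2) can be sketched in a few lines of NumPy; the function name and the default sound speed of 343 m/s are our own choices, not the paper's:

```python
import numpy as np

def green(p_m, a_n, omega, c=343.0):
    """Free-field Green's function between loudspeaker position p_m and
    test point a_n, as in eq. (2): the amplitude decays as 1/(4*pi*r) and
    the phase accumulates with the propagation delay r/c."""
    r = np.linalg.norm(np.asarray(p_m, float) - np.asarray(a_n, float))
    return np.exp(-1j * omega * r / c) / (4.0 * np.pi * r)
```

Stacking green(p_m, a_n, omega) over m gives the propagation vector g_n, and stacking over n gives the propagation matrix used later in the paper.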
Our goal, however, is quite different, as we aim at controlling the shape of the beampattern instead of minimizing the energy emitted by the loudspeaker array.
Generalized Sidelobe Cancelling (GSC) [5] allows us to impose multiple constraints at the same time on the emitting directions. In GSC the maximum number of constraints is limited by the number of loudspeakers M. In order to obtain a smooth spatial response, however, one has to impose a much higher number of constraints. Moreover, traditional techniques fail when considering near-field beams, as the juxtaposition of the propagation vectors for all the test points produces an ill-conditioned propagation matrix that yields an unstable filter.
In the next Section we will show how our technique addresses both problems. We will also extend the algorithm to wideband signals and multiple virtual sources, generating the desired soundfield as a superposition of elementary beams.
3. PROPOSED SOLUTION
As shown in the previous Section, we need design criteria different from the state of the art in order to achieve the desired beampattern: instead of minimizing the energy of the beamshaper, we are interested in controlling the wavefield over the listening area.
The locations of the test points a_n, n = 1, ..., N and of the loudspeakers p_m, m = 1, ..., M in the listening area can be chosen at random. Figure 1 shows the notation we will use throughout the rest of the paper.

Figure 1: Notation used in the proposed method for near-field beamshaping. The virtual source is located in s and emits a beam towards the direction θ with angular aperture φ; the points p_1, ..., p_M denote the loudspeaker array; a_1, ..., a_N are the test points.

As stated above, the algorithm we present in this paper is conceived for wideband signals. More specifically, we find the spatial filter for each loudspeaker through a frequency sampling approach: complex weights are found for a set of frequencies in the range of interest. Complex weights for intermediate frequencies are found through interpolation. We will give further details at the end of the Section. In a first stage, for the sake of notational simplicity, we omit the dependency of all variables on frequency. As already written in eq. (1), the spatial response of the loudspeaker array in a_n is
Ψ_n = g_n^T h ,

where h is the spatial filter of the loudspeaker array and g_n is the propagation vector from p_1, ..., p_M to a_n. As stated above, our goal is to render the acoustic beam emitted by a virtual source placed in s that emits towards the direction θ with an angular aperture φ. The desired response in a_n is

Ψ_n = g(s, a_n) Θ(θ, φ, α_n) ,

where Θ(θ, φ, α_n) is the radiation pattern of the virtual source and α_n is the angle under which the n-th listening point is seen from s, as depicted in Figure 1. Although in Section 4 we use a Gaussian beampattern, we remark that this is only a design choice that does not prevent us from using a custom function. Our goal is to approximate the desired beampattern through the spatial response of the loudspeaker array, which means that
g_n^T h = g(s, a_n) Θ(θ, φ, α_n) ,    (3)

where h = [H_1, H_2, ..., H_M]^T is the coefficient vector and g_n = [g(p_1, a_n), g(p_2, a_n), ..., g(p_M, a_n)]^T is the juxtaposition of the Green's functions from the m-th emitter to the considered listening point.
If we consider all the listening points at once, we obtain the following matrix formulation:

G h = r_d ,    (4)
where r_d = [g(s, a_1) Θ(θ, φ, α_1), ..., g(s, a_N) Θ(θ, φ, α_N)]^T is the desired response and G = [g_1, ..., g_N]^T is the N × M propagation matrix from each loudspeaker to each test point. We observe that, in order to obtain a smooth beampattern, we use N ≫ M; thus we have to use Least Squares-like techniques to obtain h. The methodology we adopt to solve eq. (4) is related to inverse problem theory [9]. The system in eq. (4) is overdetermined and admits no exact solution. However, an estimation ĥ of the vector h can be calculated by introducing the pseudo-inverse of the matrix G:

G^+ = (G^H G)^{-1} G^H .

The loudspeaker weight vector is approximated by:

ĥ = G^+ r_d = (G^H G)^{-1} G^H r_d .    (5)

In general G ĥ ≠ r_d; however, ĥ represents the best solution to the problem in the least squares sense.
The matrix G^H G is positive definite and, therefore, invertible. However, nothing guarantees that it will be well conditioned. In order to avoid instability problems, a reconditioning of G^H G is needed. We do so through an SVD decomposition:

G^H G = U Σ V^H ,    (6)

where U and V collect, respectively, the left and right singular vectors, and Σ = diag(σ_1, ..., σ_M) is the diagonal matrix of singular values, with σ_1 ≥ σ_2 ≥ ... ≥ σ_M. In order to perform the reconditioning, we seek the greatest index k which guarantees that σ_k/σ_1 ≥ 0.01. We retain the first k columns and rows of the matrices U, V and Σ. The approximate inverse matrix is therefore

(G^H G)^{-1} ≈ V_k Σ_k^{-1} U_k^H .    (7)
The SVD inversion of the matrix G^H G is a costly operation. However, we observe that a change in either the radiation pattern of the virtual source or its position corresponds only to a change in the vector r_d of the desired response, as the matrix G is composed of the Green's functions from each loudspeaker to each test point. As a consequence, the SVD inverse of G^H G may easily be precomputed once the positions of loudspeakers and test points are known.
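The least-squares solve of eq. (5) with the SVD reconditioning of eqs. (6)-(7) can be sketched as follows; this is a minimal NumPy illustration (the names are ours), not the authors' implementation:

```python
import numpy as np

def beam_filter(G, r_d, sv_ratio=0.01):
    """Estimate the loudspeaker weights h_hat = (G^H G)^-1 G^H r_d (eq. 5),
    inverting G^H G through a truncated SVD (eqs. 6-7): singular values with
    sigma_k / sigma_1 below sv_ratio are discarded."""
    A = G.conj().T @ G                      # normal matrix G^H G (M x M)
    U, s, Vh = np.linalg.svd(A)             # A = U diag(s) V^H, s descending
    k = int(np.sum(s / s[0] >= sv_ratio))   # greatest retained index
    A_inv = (Vh[:k].conj().T / s[:k]) @ U[:, :k].conj().T   # V_k S_k^-1 U_k^H
    return A_inv @ (G.conj().T @ r_d)
```

Since A_inv depends only on the positions of loudspeakers and test points, it can be precomputed once and reused for every desired response r_d, as noted above.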
3.1. Extension to multiple image sources and wideband signals
As stated in Section 1, our final goal is the rendering of a complex soundfield represented as a superposition of beams generated by a set of image sources. Since the system described by eq. (4) is linear, its extension to multiple image sources turns out to be quite straightforward, using the superposition principle. Consider Z image sources characterized by their positions and radiation patterns. According to eq. (4), let G_z, h_z and r_dz be, respectively, the propagation matrix, the coefficient vector and the radiation pattern for the z-th image source, leading to the system G_z h_z = r_dz. Analogously, ĥ_z is the approximated solution for each of the systems in eq. (4). We obtain the global coefficient vector as the superposition of the individual coefficient vectors:
ĥ_TOT = Σ_{z=1}^{Z} G_z^+ r_dz = Σ_{z=1}^{Z} (G_z^H G_z)^{-1} G_z^H r_dz .    (8)
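A sketch of the superposition in eq. (8), under the assumption that the propagation matrix is shared by all image sources (it only involves loudspeaker and test-point positions), so its pseudo-inverse is computed once; NumPy's pinv stands in here for the paper's reconditioned SVD inverse:

```python
import numpy as np

def total_filter(G, desired_responses):
    """Eq. (8): the global coefficient vector h_TOT is the sum of the
    per-image-source least-squares solutions G^+ r_dz."""
    G_plus = np.linalg.pinv(G)      # computed once, reused for every source
    return sum(G_plus @ r_dz for r_dz in desired_responses)
```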
From the filter computed according to eq. (8) we find the complex weight vectors ĥ_TOT^(k) for the frequencies f_1, ..., f_K. From ĥ_TOT^(k), k = 1, ..., K, we obtain a set of filters whose frequency responses are F_1(f_l), ..., F_M(f_l), l = 1, ..., L, for each loudspeaker through parabolic interpolation of the amplitude and cubic interpolation of the phase. Even if we work with wideband signals, we preserve the spatial Nyquist criterion, which means that the maximum operating frequency is limited by the reciprocal distance of the loudspeakers: f_max < c / (2d), where d is the distance between emitters.
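The spatial Nyquist bound and the interpolation of weights between constrained frequencies can be sketched as below; for simplicity we interpolate magnitude and unwrapped phase linearly, whereas the paper uses parabolic and cubic interpolation respectively:

```python
import numpy as np

def f_max(d, c=343.0):
    """Spatial Nyquist limit f_max < c / (2 d) for emitter spacing d."""
    return c / (2.0 * d)

def interp_weight(f, f_k, h_k):
    """Complex weight at an intermediate frequency f, given the weights h_k
    computed at the constrained frequencies f_k; magnitude and phase are
    interpolated separately to avoid artifacts of complex interpolation."""
    mag = np.interp(f, f_k, np.abs(h_k))
    phase = np.interp(f, f_k, np.unwrap(np.angle(h_k)))
    return mag * np.exp(1j * phase)
```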
4. EXPERIMENTAL RESULTS
In this Section we show some simulations in order to illustrate the accuracy of the presented rendering method. We consider the experimental setup depicted in Fig. 2, which consists of:
• a circular array with a radius of 1 m, composed of M = 32 equally spaced omnidirectional emitters;
• a listening area including N = 5000 points uniformly distributed in a circular region concentric with the array, with a radius of 0.9 m;
• a virtual room of dimensions w × w.
The position and configuration of the Z beams is computed with fast beam tracing [7] starting from the position s_0 of a real source inside the listening area. The radiation pattern function Θ(θ, φ, α_n) used for the beamshaping is a Gaussian function. In particular, the variance and the center of the Gaussian windows are determined according to the direction θ and aperture φ of the beams to be rendered.

Figure 2: Experimental setup for the simulations: a circular array is composed of loudspeakers placed in p_1, ..., p_M; the gray shaded region within the array is the listening area; the w × w rectangle depicts the virtual environment; the real source is placed in s_0; with reference to Fig. 1, the points s_1, ..., s_Z denote a set of virtual image sources emitting beams towards the directions θ_z and with angular apertures φ_z.
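One plausible realization of the Gaussian radiation pattern Θ(θ, φ, α_n) is sketched below; the exact mapping from the aperture φ to the window's variance is not specified in the paper, so treating φ as the standard deviation is our assumption:

```python
import numpy as np

def gaussian_pattern(theta, phi, alpha_n):
    """Gaussian beampattern: unit gain towards the beam direction theta,
    decaying with the angular distance of the test-point direction alpha_n;
    the aperture phi is used as the standard deviation (an assumption)."""
    return np.exp(-0.5 * ((alpha_n - theta) / phi) ** 2)
```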
For each simulation two pictures will be shown: the absolute value S(q) of the theoretical soundfield (where q is a point inside the listening area) as if it were generated by the virtual image sources, and the absolute value Ŝ(q) of the rendered soundfield. The metric we use to evaluate the accuracy of the rendered soundfield is the Root Mean Square Error (RMSE), computed as follows:

E_RMSE = sqrt( Σ_{i=1}^{Q} [S(q_i) − Ŝ(q_i)]^2 / Q ) ,

where Q is the number of points constituting the soundfield images.
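The RMSE metric above amounts to the following short function (a direct transcription, with S and S_hat the sampled field magnitudes):

```python
import numpy as np

def rmse(S, S_hat):
    """Root mean square error between the theoretical and the rendered
    soundfield magnitudes over the Q image points."""
    S, S_hat = np.ravel(S), np.ravel(S_hat)
    return np.sqrt(np.mean((S - S_hat) ** 2))
```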
The first experiment we conduct aims at finding the optimum number K of frequencies to be used for the extension to wideband signals. A monochromatic signal at the frequency of 550 Hz is emitted. This frequency is intermediate between two adjacent constrained frequencies f_k and f_{k+1}. As a consequence, we expect that, increasing the frequency step, we obtain a lower accuracy of the rendered soundfield. The frequency step Δf = f_{k+1} − f_k ranges from 10 Hz to 100 Hz. Figure 3 shows the result: we observe that for Δf ≤ 40 Hz the error is almost constant.

Figure 3: Plot of E_RMSE against the frequency step, averaged over different positions of the real source s_0.

For the following tests we keep the real source position fixed in s_0 = (0.7 m, 0.7 m). The reflection coefficient of the room's walls is
0.7. We consider wall reflections up to the second order, which yields Z = 16. We use a frequency step of 40 Hz, which gives K = 77 constrained frequencies in the range [330 Hz, 3370 Hz].
We first analyze the behavior of the system at a constrained frequency (530 Hz) in a 10 m × 10 m room. The resulting RMSE is 4.58 dB. Figure 4.(a) shows S(q); Ŝ(q) is shown in Figure 4.(b). We will see later that, for a different configuration of the virtual room, the error introduced by some virtual sources s_z is higher than the RMSE of the soundfield. This is due to the fact that for some configurations the beam is oriented towards directions that make it difficult to obtain the desired soundfield, since the mainlobe falls outside the listening area.

Figure 4: Plot of S(q) and Ŝ(q) for a 10 m × 10 m room at the constrained frequency of 530 Hz.
We now repeat the previous experiment in a 3 m × 3 m room. Fig. 5.(a) depicts S(q) while Fig. 5.(b) shows Ŝ(q); the resulting RMSE is 3.13 dB. The comparison between Fig. 5 and Fig. 4 and their RMSE values suggests that reducing the size of the virtual environment improves the performance of the system. We confirm this trend experimentally.

Figure 5: Plot of S(q) and Ŝ(q) for a 3 m × 3 m room at the constrained frequency of 530 Hz.
We now test the performance of the rendering system in a 3 m × 3 m room at a non-constrained frequency. We choose a test frequency of 550 Hz, which is intermediate between the constrained frequencies of 530 Hz and 570 Hz. Figure 6.(a) shows S(q), while Figure 6.(b) shows Ŝ(q). For this experiment we obtain E_RMSE = 3.38 dB: even if the working frequency is not constrained, no significant distortion appears due to the interpolation process.
Finally, Table 1 separates the contribution to the soundfield error due to each virtual source for the same configuration of Fig. 5. The main lobes of the beams generated by the sources s_5 and s_16 fall outside the listening area; therefore they are not perceptually relevant. This observation motivates us, as future work, to find a heuristic that prunes the tree of virtual sources, retaining only those that are significant.

Figure 6: Plot of S(q) and Ŝ(q) for a 3 m × 3 m room at the unconstrained frequency of 550 Hz.

Table 1: Error of Ŝ(q) due to the rendering of each virtual source for the same configuration of Fig. 6. The beams generated by the sources s_5 and s_16 are not relevant for the perception of the soundfield, hence their error values are indicated with the symbol X. All values are E_RMSE [dB].

s_1: 2.22    s_5: X       s_9: 9.41     s_13: 2.45
s_2: 1.92    s_6: 2.46    s_10: 2.32    s_14: 7.74
s_3: 1.90    s_7: 7.99    s_11: 2.42    s_15: 2.44
s_4: 2.23    s_8: 2.42    s_12: 10.08   s_16: X
5. CONCLUSIONS
In this paper we have proposed a technique for rendering the acoustics of a virtual environment in a dry room. In particular, we have used a decomposition of the wavefield into elementary beams. Experimental results have shown the feasibility and some critical issues of the algorithm. We are currently working on a real-time demonstrator and on new solutions that work in a reverberant room.
6. REFERENCES
[1] P. Fellgett, "Ambisonics. Part one: General system description," Studio Sound, vol. 40, no. 1, pp. 20–22, Aug. 1975.
[2] A. J. Berkhout, "A holographic approach to acoustic control," J. Audio Eng. Soc., vol. 36, pp. 977–995, Dec. 1988.
[3] S. Spors, H. Teutsch, and R. Rabenstein, "High-quality acoustic rendering with wave field synthesis," in Vision, Modeling, and Visualization, Nov. 2002, pp. 101–108.
[4] B. Pueo, J. Escolano, and M. Roma, "Precise control of beam direction and beamwidth of linear loudspeaker arrays," in Proc. Sensor Array and Multichannel Signal Processing Workshop, June 2004, pp. 538–541.
[5] L. Griffiths and C. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Trans. Antennas Propag., vol. 30, no. 1, pp. 27–34, Jan. 1982.
[6] R. A. Kennedy, T. D. Abhayapala, and D. B. Ward, "Broadband nearfield beamforming using a radial beampattern transformation," IEEE Transactions on Signal Processing, vol. 46, no. 8, Aug. 1998.
[7] F. Antonacci, M. Foco, A. Sarti, and S. Tubaro, "Fast tracing of acoustic beams and paths through visibility lookup," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 4, pp. 812–824, May 2008.
[8] S. Spors, "Spatial aliasing artifacts produced by linear loudspeaker arrays used for wave field synthesis," in Proc. of IEEE ISCCSP 2006, Marrakech, Morocco, March 2006.
[9] A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation. Society for Industrial and Applied Mathematics, 2004.