
Reproducing Personal Sound Zones Using a Hybrid Synthesis of Dynamic and Parametric Loudspeakers

Jacob Donley, Christian Ritz and W. Bastiaan Kleijn
School of Electrical, Computer and Telecommunications Engineering, University of Wollongong,
Wollongong, NSW, 2522 Australia, E-mail: jrd089@uowmail.edu.au and critz@uow.edu.au
School of Engineering and Computer Science, Victoria University of Wellington,
Wellington, 6140 New Zealand, E-mail: bastiaan.kleijn@ecs.vuw.ac.nz
Abstract—This paper proposes a hybrid approach to personal sound zones utilising multizone soundfield reproduction techniques and parametric loudspeakers. Crossover filters that switch between the reproduction methods are designed through an analysis of aliasing artifacts in multizone reproductions. By realising the designed crossover filters, the wideband acoustic contrast between zones is significantly improved. The trade-off between acoustic contrast and the bandwidth of the reproduced soundfield is investigated. Results show that, by incorporating the proposed hybrid model, the entire wideband spectrum is free of spatial aliasing, with a mean acoustic contrast of at least 54.2 dB, an improvement of up to 24.2 dB over a non-hybrid approach, using as few as 16 dynamic loudspeakers and one parametric loudspeaker.
I. INTRODUCTION
Personal sound environments, such as those provided by multizone soundfield reproduction (MSR) [1] and parametric loudspeakers (PL) [2], are of interest in applications such as vehicle cabin entertainment/communication systems, cinema surround sound systems, multi-participant teleconferencing and personal audio in restaurants/cafés. As well as creating a target bright zone, it is sometimes also desirable to create a second, quiet zone. In this case, it is important to maximise the acoustic contrast (energy ratio) between zones whilst minimising the error in the bright zone. However, research in the area has shown performance limitations related to audio leaking between zones, known as interzone audio interference, which limits the bandwidth over which low-error, high-contrast personalised audio can be delivered.
The concept of personal sound from controlling multiple loudspeakers has been around since 1997 [3]. A method to maximise the ratio of energy between two regions was proposed later, in 2002 [4], and was termed Acoustic Contrast Control [1]. Early multizone reproduction techniques subsequently made use of least-squares pressure matching [5] and cylindrical harmonic expansion [6]. Further research has improved spatial reproduction accuracy by utilising planarity [7] and orthogonal basis planewaves [8].
Scenarios in which a finite number of loudspeakers are used as secondary sources for soundfield reproduction are limited to accurate reproduction below a (spatial aliasing) frequency [9]. A fundamental issue with MSR using discrete secondary sources is that spatial aliasing induces so-called grating lobes, which can interfere across zones [10]. Recent research [6], [8], [11] suggests that a full circular array of 300 loudspeakers is required to reproduce audio up to 8 kHz with high acoustic contrast.
PLs, on the other hand, are capable of providing high directivity at high frequencies [12] and were first theorised in 1963 [13]. PLs have gained interest due to their high directivity despite a physical size comparable to that of dynamic (conventional) loudspeakers. Practical implementations have shown that PLs can provide immersive spatial audio [14], [15]; however, neither of these hybrid approaches uses MSR with dynamic loudspeakers or considers spatial aliasing. When compared to MSR from dynamic loudspeakers, PLs lack directivity at low frequencies [12], exhibit higher Total Harmonic Distortion (THD) [2], [16] and can pose potential health risks due to the high Sound Pressure Level (SPL) of the ultrasonic carrier frequency [2].
A hybrid system utilising the better aspects of both MSRs
and PLs would allow for high acoustic contrast at low and high
frequencies. Reproduction of speech soundfields [11], [17],
[18] would require low carrier SPL in PLs due to the low
energy of high frequency components in speech [19], thus
reducing related health risks. Further, frequency-dependent PL
distortions are less of a problem at higher frequencies [16].
In this paper, novel contributions are made through an analytical approach to a hybrid MSR and PL system with application to personal sound zones. A zone-dependent crossover filter is designed to shift the loudspeaker signals between the MSR and PL in the frequency domain. A wideband acoustic contrast is presented for the hybrid system, and the trade-off between the acoustic contrast, crossover frequency and reproduced bandwidth is discussed.
Section II explains the MSR layout and soundfield reproduction aliasing. Section III gives a brief overview of the PL directivity model used in this work. In Section IV, a hybrid method is formulated for MSR and PL reproduction of personal sound zones. Results and discussion are given in Section V, and conclusions are drawn in Section VI.
II. MULTIZONE SOUNDFIELD REPRODUCTION (MSR)
In this section a general MSR layout is described along with
a description of a recent MSR technique. The aliasing which
occurs from reproductions with spatial discretisation artifacts
is also explained for later use in the hybrid model.
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for
resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
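[Fig. 1 diagram: reproduction region $D$ of radius $R$ containing $D_b$, $D_q$ and $D_u$; loudspeaker arc of angle $\phi_L$ at radius $R_l$; PL at radius $R_v$ and angle $\psi_c$, directed at angle $\psi$, with aperture $2d$.]
Fig. 1. MSR layout for a circular loudspeaker array (green) with a companion PL (red) for hybrid soundfield reproduction in $D_b$.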
In this work, the acoustical brightness contrast between two zones, $D_b$ and $D_q$, is defined as
$$\zeta_R(k) = \frac{d_q \int_{D_b} |S^a_R(\mathbf{x},k)|^2 \, d\mathbf{x}}{d_b \int_{D_q} |S^a_R(\mathbf{x},k)|^2 \, d\mathbf{x}}, \tag{1}$$
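where $d_b$ and $d_q$ are the areas (sizes) of $D_b$ and $D_q$, respectively. The mean square error (MSE) between the desired soundfield, $S_d(\mathbf{x},k)$, and the actual reproduced soundfield, $S^a_R(\mathbf{x},k)$, is [6], [20]
$$\epsilon_R(k) = \frac{\int_{D_b} |S_d(\mathbf{x},k) - S^a_R(\mathbf{x},k)|^2 \, d\mathbf{x}}{\int_{D_b} |S_d(\mathbf{x},k)|^2 \, d\mathbf{x}}, \tag{2}$$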
which is used to measure reproduction accuracy. These measures can be used for any actual soundfield, $S^a_R(\mathbf{x},k)$, created with any reproduction technique, $R$, such as MSR, PL or any combination thereof.
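To make (1) and (2) concrete, the sketch below evaluates both measures from sampled soundfields. It assumes the sample points are spread uniformly over each zone, so each integral is approximated by the zone area times the mean of the samples; the areas $d_b$ and $d_q$ then cancel in (1). The function names are hypothetical, and results are returned in dB as in Section V.

```python
import numpy as np

def acoustic_contrast_db(s_bright, s_quiet):
    """Discretised Eq. (1) in dB.

    s_bright, s_quiet: complex soundfield samples S^a_R(x, k) inside D_b and D_q.
    Uniform sampling is assumed, so int_D |S|^2 dx ~= area(D) * mean(|S|^2)
    and the zone areas d_b, d_q cancel.
    """
    energy_bright = np.mean(np.abs(s_bright) ** 2)
    energy_quiet = np.mean(np.abs(s_quiet) ** 2)
    return 10 * np.log10(energy_bright / energy_quiet)

def mse_db(s_desired, s_actual):
    """Discretised Eq. (2) in dB, with both fields sampled at the same points in D_b."""
    error_energy = np.sum(np.abs(s_desired - s_actual) ** 2)
    return 10 * np.log10(error_energy / np.sum(np.abs(s_desired) ** 2))

# Toy example: the quiet zone is 40 dB down and the bright zone is 10 % low.
field = np.exp(1j * np.linspace(0, 6, 50))
print(acoustic_contrast_db(field, 0.01 * field))  # approximately 40 dB
print(mse_db(field, 0.9 * field))                 # approximately -20 dB
```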
A. MSR Layout
The geometry of a generic MSR layout is depicted in Fig. 1 for a circular array with a companion PL. An MSR reproduction region, $D$, of radius $R$ is shown and contains three sub-regions called the bright, quiet and unattended zones, labelled $D_b$, $D_q$ and $D_u = D \setminus (D_b \cup D_q)$, respectively.
The centre of $D$ is the origin relative to which all other geometrical locations are defined. The centres of $D_b$ and $D_q$ have radius and angle pair polar coordinates $(r_{zb}, \beta)$ and $(r_{zq}, \alpha)$, respectively. The radii of $D_b$ and $D_q$ are $r_b$ and $r_q$, respectively, and the direction of the soundfield within the regions is $\theta$ and $\vartheta$, respectively. The MSR loudspeaker arc has a centre located at $(R_l, \phi_c)$ and subtends an angle of $\phi_L$. The directional PL has a centre located at $(R_v, \psi_c)$ and is directed at an angle of $\psi$ clockwise from the origin. In practice, the PL is a circular array of transducers, with effective radius $d$, protruding normal to the reproduction plane. In this work, the imaginary unit is $i = \sqrt{-1}$ and the Euclidean norm is denoted by $\|\cdot\|$. The wavenumber $k = 2\pi f/c$ is interchanged with frequency, $f$, under the assumption that the speed of sound, $c$, is constant.
B. MSR Technique
An infinite set of planewaves arriving from every angle is capable of entirely describing any arbitrary desired soundfield [21]. A soundfield fulfilling the wave equation is, in this work, denoted by the function $S(\mathbf{x},k)$, where $\mathbf{x} \in D$ is an arbitrary spatial sampling point. As shown in the orthogonal basis expansion approach [8], [20] to MSR, an additional spatial weighting function, $w(\mathbf{x})$, can be used to set the relative importance of the zones. The weighted MSR soundfield function used in this work can be written as
$$S(\mathbf{x},k) = \sum_j P_j(k)\, F_j(\mathbf{x},k), \tag{3}$$
where the orthogonal wavefields, $F_j(\mathbf{x},k)$, have coefficients, $P_j(k)$, for a given weighting function and desired soundfield, $S_d(\mathbf{x},k)$; and $j \in \{1, \dots, J\}$, where $J$ is the number of basis planewaves [8].
The complex loudspeaker weights used to reproduce the soundfield in the (temporal) frequency domain are [22], [8]
$$U_l(k) = \sum_{\bar{m}=-M}^{M} \frac{2 e^{i\bar{m}\phi_l}\, \phi_s \sum_j P_j(k)\, i^{\bar{m}} e^{-i\bar{m}\rho_j}}{i\pi H^{(1)}_{\bar{m}}(kR_l)}, \tag{4}$$
where $\rho_j = (j-1)\Delta\rho$ are the wavefield angles, $\Delta\rho = 2\pi/J$, $\phi_l$ is the angle of the $l$th dynamic loudspeaker from 0°, $\phi_s$ is the angular spacing of the loudspeakers, $H^{(1)}_\nu(\cdot)$ is a $\nu$th-order Hankel function of the first kind and the modal truncation length [8] is
$$M = \lceil kR \rceil. \tag{5}$$
Here, $P_j(k)$ is chosen to minimise the difference between the desired soundfield and the actual soundfield [8].
The actual soundfield from MSR is the result of the superposition of all individual loudspeaker responses,
$$S^a_{\mathrm{MSR}}(\mathbf{x},k) = G_{\mathrm{MSR}}(k) \sum_l U_l(k)\, T(\mathbf{x}, \mathbf{l}_l, k), \tag{6}$$
where $G_{\mathrm{MSR}}(k)$ is introduced as an arbitrary weighting for hybrid soundfields (described later in Section IV-A), the loudspeaker's 2-D acoustic transfer function (ATF) is
$$T(\mathbf{x}, \mathbf{l}_l, k) = \frac{i}{4} H^{(1)}_0\!\left(k\|\mathbf{x} - \mathbf{l}_l\|\right), \tag{7}$$
and $\mathbf{l}_l$ is the position of the $l$th dynamic loudspeaker. Setting $G_{\mathrm{MSR}}(k) = 1$ in (6) renders the multizone soundfield.
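A minimal Python sketch of (4)–(7) follows, using `scipy.special.hankel1` for $H^{(1)}_\nu$. So that the example can check itself, it assumes a full circular array ($\phi_s = 2\pi/L$) and a single basis planewave ($J = 1$, $P_1 = 1$, $\rho_1 = 0$) rather than the semicircular array of Section V; the function names and argument layout are hypothetical. Under these assumptions the reproduced field near the centre should closely match the target planewave travelling along the positive $x$-axis.

```python
import numpy as np
from scipy.special import hankel1

def loudspeaker_weights(k, R_l, phi_l, phi_s, P, rho, M):
    """Eq. (4): complex weights U_l(k) for loudspeakers on a circle of radius R_l.

    phi_l : (L,) loudspeaker angles (rad); phi_s : angular spacing (rad)
    P, rho: (J,) basis-planewave coefficients P_j(k) and angles rho_j (rad)
    M     : modal truncation length from Eq. (5)
    """
    m = np.arange(-M, M + 1)
    # sum_j P_j(k) i^m e^{-i m rho_j}, one value per mode m
    pw = (1j ** m) * (np.exp(-1j * np.outer(m, rho)) @ P)
    modal = 2 * phi_s * pw / (1j * np.pi * hankel1(m, k * R_l))
    # U_l(k) = sum_m e^{i m phi_l} * modal_m
    return np.exp(1j * np.outer(phi_l, m)) @ modal

def msr_soundfield(x, k, speaker_xy, U, G=1.0):
    """Eqs. (6)-(7): superpose 2-D ATFs (i/4) H_0^(1)(k ||x - l_l||) weighted by U_l."""
    dist = np.linalg.norm(x[:, None, :] - speaker_xy[None, :, :], axis=-1)
    T = 0.25j * hankel1(0, k * dist)
    return G * (T @ U)

# Assumed test geometry: full circle of 64 loudspeakers, R_l = 1.3 m, R = 1.0 m.
L, R_l, R, k = 64, 1.3, 1.0, 10.0
phi_l = 2 * np.pi * np.arange(L) / L
speakers = R_l * np.column_stack([np.cos(phi_l), np.sin(phi_l)])
M = int(np.ceil(k * R))  # Eq. (5)
U = loudspeaker_weights(k, R_l, phi_l, 2 * np.pi / L, np.array([1.0]), np.array([0.0]), M)
x = np.array([[0.0, 0.0], [0.1 * np.cos(0.3), 0.1 * np.sin(0.3)]])
S = msr_soundfield(x, k, speakers, U)
print(np.abs(S - np.exp(1j * k * x[:, 0])).max())  # small reproduction error
```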
C. Soundfield Reproduction Aliasing
A fundamental issue with reproducing soundfields using a limited number of loudspeakers is spatial aliasing, which gives rise to grating lobes that may intrude into the quiet zone at higher frequencies [10]. Because high acoustic contrast may be lost above the aliasing frequency, this phenomenon reduces the bandwidth of reproducible soundfields with high acoustic contrast. For a part-circle array, the minimum number of dynamic loudspeakers required before aliasing problems begin to occur is given by [6], [8]
$$L \geq \frac{\phi_L (2M+1)}{2\pi} + 1. \tag{8}$$
Substituting (5) into (8) and rearranging to find an approximation for the upper frequency limit, $k = k_u$, gives
$$k_u = \frac{2\pi(L-1) - \phi_L}{2R_0\phi_L}, \tag{9}$$
where $R_0$ is used instead of $R$; $R_0$ is the radius of the smallest circle concentric with $D$ that encompasses all zones. The upper frequency from (9) agrees with [10] and depends on the number of loudspeakers, the reproduction radius and the angle subtended by the loudspeaker arc.
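As a worked example of (9), the sketch below converts $k_u$ to a frequency via $k = 2\pi f/c$ for the loudspeaker counts used in Section V. It assumes a semicircular arc ($\phi_L = \pi$), $c = 343$ m/s and $R_0 = r_{zb} + r_b = 0.9$ m; the last value is our reading of "the smallest circle concentric with $D$ encompassing all zones" for that geometry. The function names are hypothetical.

```python
import math

def aliasing_wavenumber(L, phi_L, R0):
    """Eq. (9): approximate upper (spatial-aliasing) wavenumber k_u in rad/m."""
    return (2 * math.pi * (L - 1) - phi_L) / (2 * R0 * phi_L)

def aliasing_frequency(L, phi_L, R0, c=343.0):
    """Convert k_u to Hz using k = 2*pi*f/c."""
    return aliasing_wavenumber(L, phi_L, R0) * c / (2 * math.pi)

# Semicircular arc (phi_L = 180 deg); R0 = 0.6 m + 0.3 m is an assumption.
for L in (16, 24, 32, 134):
    print(L, round(aliasing_frequency(L, math.pi, 0.9)), "Hz")
```

Under these assumptions the crossover frequencies come out at roughly 0.88, 1.36, 1.85 and 8.04 kHz, so the $L = 134$ case is alias free up to about 8 kHz.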
III. PARAMETRIC LOUDSPEAKER (PL)
A few PL directivity models are reviewed in this section as
well as common disadvantages of PLs. The disadvantages are
discussed with regard to speech soundfields, further motivating
the use of a hybrid model for such applications.
A. Directivity Models
The literature provides a handful of directivity models for PLs, which are algorithmic approximations of the pressure at different angles. Earlier models include Westervelt's directivity (WD) [13] and the product directivity (PD) [23], [24]; however, these models do not accurately match the measured directivity of a PL. Recently, a convolutional directivity (CD) model combining WD and PD was proposed [12], [25]; it correlates better with measured directivity and is used in this work.
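The actual soundfield reproduced by the PL, where the PL is located at $\mathbf{p}$, is defined in this work as
$$S^a_{\mathrm{PL}}(\mathbf{x},k) = G_{\mathrm{PL}}(k)\, E(\mathbf{x},k)\, D(\mathbf{x},k)\, e^{ik\|\mathbf{x}-\mathbf{p}\|}, \tag{10}$$
where $G_{\mathrm{PL}}(k)$ is introduced as an arbitrary weighting for hybrid soundfields (described later in Section IV-A), $D(\mathbf{x},k)$ is the CD and the directivity coefficient is
$$E(\mathbf{x},k) = \frac{\tilde{\beta} k^2}{4\pi \tilde{\alpha}_s \tilde{\rho}_0 \|\mathbf{x}-\mathbf{p}\| c^2}, \tag{11}$$
where $\tilde{\beta}$ is the coefficient of non-linearity, $\tilde{\alpha}_s$ is the sum of the absorption coefficients for both primary frequencies and $\tilde{\rho}_0$ is the density of the medium.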
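The CD is defined as the convolution between the PD and WD, with the linear convolution operator $*$, as [12], [25]
$$D(\mathbf{x},k) = \left[D_G(\mathbf{x},k_c)\, D_G(\mathbf{x},k_c+k)\right] * D_W(\mathbf{x},k), \tag{12}$$
where $k_c$ is the ultrasonic carrier frequency and $D_G(\mathbf{x},\hat{k})$ is the Gaussian directivity [24]
$$D_G(\mathbf{x},\hat{k}) = e^{\left(\frac{i}{2} d\hat{k}\tan(\rho_x+\Psi)\right)^2}, \tag{13}$$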
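where $\rho_x$ is the angle of the vector $\mathbf{x}-\mathbf{p}$ from 0°, $\Psi = -(\psi+\psi_c-\pi)$, and the WD is [25]
$$D_W(\mathbf{x},k) = \frac{\tilde{\alpha}_s}{\sqrt{\tilde{\alpha}_s^2 + k^2\tan^4(\rho_x+\Psi)}}. \tag{14}$$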
The far-field PL soundfield can then be found by using (11) and (12) in (10) with $G_{\mathrm{PL}}(k) = 1$. However, as $k$ decreases, $S^a_{\mathrm{PL}}(\mathbf{x},k)$ approaches that of a point source and $\zeta_{\mathrm{PL}}(k)$ is consequently reduced. It is assumed in this work that the PL is designed such that grating lobes are negligible [26] and that, for different virtual source locations, multiple steerable PL arrays can be used [15], [26].
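The directivity model in (12)–(14) can be sketched numerically as follows, using the Section V parameters ($k_c$ at 40 kHz, $d = 6.18$ cm, $\tilde{\alpha}_s = 2.328$ m$^{-1}$). The sketch works directly in the off-axis angle $\theta = \rho_x + \Psi$, writes (13) in its equivalent real form $e^{-(d\hat{k}\tan\theta/2)^2}$, discretises the convolution in (12) on a uniform angle grid with `numpy.convolve`, and normalises the result to unit on-axis gain; the grid, the normalisation and the function names are our assumptions rather than details given in the paper.

```python
import numpy as np

def westervelt_directivity(theta, k, alpha_s):
    """Eq. (14): D_W = alpha_s / sqrt(alpha_s^2 + k^2 tan^4(theta))."""
    return alpha_s / np.sqrt(alpha_s ** 2 + k ** 2 * np.tan(theta) ** 4)

def gaussian_directivity(theta, k_hat, d):
    """Eq. (13): exp((i/2 d k_hat tan(theta))^2) = exp(-(d k_hat tan(theta) / 2)^2)."""
    return np.exp(-(0.5 * d * k_hat * np.tan(theta)) ** 2)

def convolution_directivity(theta, k, k_c, d, alpha_s):
    """Eq. (12): [D_G(k_c) D_G(k_c + k)] * D_W(k) on a uniform, symmetric angle grid."""
    product = gaussian_directivity(theta, k_c, d) * gaussian_directivity(theta, k_c + k, d)
    cd = np.convolve(product, westervelt_directivity(theta, k, alpha_s), mode="same")
    return cd / cd.max()  # unit on-axis gain (normalisation is an assumption)

c = 343.0
k_c = 2 * np.pi * 40e3 / c            # carrier wavenumber (Section V)
k = 2 * np.pi * 1e3 / c               # 1 kHz audio wavenumber
theta = np.linspace(-1.5, 1.5, 601)   # off-axis angle rho_x + Psi (rad), avoiding +-pi/2
cd = convolution_directivity(theta, k, k_c, d=0.0618, alpha_s=2.328)
```

Repeating the last line for smaller `k` gives a broader beam, which is the low-frequency loss of directivity noted above.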
B. PLs for Speech Soundfields
While PLs have been studied extensively over the years, there are still some drawbacks when it comes to reproducing loud and clear audible sound. Audible reproductions from PLs are known to require a large carrier SPL (>110 dB) for typical speech conversation levels of 60 dBA, which poses potential inadvertent health risks [2]. Fortunately, in speech soundfield applications, high SPLs from the PL are not necessary for the high-frequency components of speech (above roughly 2 kHz) [19]; further, harmonic distortions are lower above this frequency [16]. If the PL is positioned so that the far-field demodulated audio [27] overlays $D_b$, and assuming that high SPL from the PL is not required over $D_b$, the health risks from the PL could be argued to be negligible.
IV. HYBRID MSR AND PL SYSTEM
A hybrid MSR and PL system is presented in this section
for use in personal sound zone applications. A crossover filter
is designed to switch target audio in the (temporal) frequency
domain to each of the constituent reproduction techniques.
A. Crossover Filter Design
Ideally, the low-frequency acoustic contrast of $S^a_{\mathrm{MSR}}(\mathbf{x},k)$ would be combined with the high-frequency acoustic contrast of $S^a_{\mathrm{PL}}(\mathbf{x},k)$ for personal sound zones. The weightings $G_{\mathrm{MSR}}(k)$ and $G_{\mathrm{PL}}(k)$ are introduced in (6) and (10), respectively, in order to facilitate a hybrid soundfield, $S^a_H(\mathbf{x},k)$. When composing a hybrid soundfield, it is natural to limit spectral distortion of the reproduction at the crossover frequency; for this, we propose the use of Linkwitz-Riley (LR) filters. Here, a low-pass $\hat{n}$th-order LR filter, with a roll-off of $6\hat{n}$ dB/octave, is a pair of cascaded Butterworth filters
$$H^q_{\mathrm{LR}}(k) = \left[B_{\hat{n}/2}(k/k_u)\right]^{-2}, \tag{15}$$
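where $B_{\hat{n}/2}$ is the Butterworth polynomial of order $\hat{n}/2$ and $k_u$ from (9) is suggested as the crossover frequency. The matching LR high-pass filter is
$$H^p_{\mathrm{LR}}(k) = \left[B_{\hat{n}/2}(k_u/k)\right]^{-2}, \tag{16}$$
and together the crossover magnitude response is
$$\left|H^q_{\mathrm{LR}}(k) + H^p_{\mathrm{LR}}(k)\right| = 1. \tag{17}$$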
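The LR pair in (15)–(17) can be checked numerically. The sketch below builds the normalised Butterworth polynomial from its standard left-half-plane poles and, as an assumption about the intended notation, evaluates it on the imaginary axis ($s = ik/k_u$), which is what makes (17) hold. With $\hat{n} = 12$ (Butterworth order 6) the low-pass and high-pass sum to an all-pass response and each is 6 dB down at $k_u$. Function names are hypothetical.

```python
import numpy as np

def butterworth_poly(s, order):
    """Normalised Butterworth polynomial B_order(s) = prod_k (s - p_k)."""
    kk = np.arange(1, order + 1)
    # Standard Butterworth poles on the unit circle in the left half-plane
    poles = np.exp(1j * np.pi * (2 * kk + order - 1) / (2 * order))
    return np.prod(np.asarray(s)[..., None] - poles, axis=-1)

def linkwitz_riley(k, k_u, n_hat=12):
    """Eqs. (15)-(16): LR low-pass H^q and high-pass H^p of order n_hat (k > 0)."""
    s = 1j * np.asarray(k, dtype=float) / k_u  # assumption: s = i k / k_u
    b_order = n_hat // 2
    H_q = butterworth_poly(s, b_order) ** -2      # low-pass, used as G_MSR in (18)
    H_p = butterworth_poly(1 / s, b_order) ** -2  # high-pass, used as G_PL in (19)
    return H_q, H_p

k_u = 16.11                      # e.g. k_u from (9) for L = 16 (rad/m)
k = np.linspace(0.5, 150, 400)   # wavenumbers up to roughly 8 kHz
H_q, H_p = linkwitz_riley(k, k_u)
print(np.abs(H_q + H_p).min(), np.abs(H_q + H_p).max())  # both close to 1
```

The all-pass property relies on $\hat{n}/2$ being even, which holds for the $\hat{n} = 12$ filters used in Section V.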
In this work, the arbitrary MSR weighting is set to
$$G_{\mathrm{MSR}}(k) = H^q_{\mathrm{LR}}(k), \tag{18}$$
and the arbitrary PL weighting is
$$G_{\mathrm{PL}}(k) = H^p_{\mathrm{LR}}(k). \tag{19}$$
Using the new weights from (18) and (19) in (6) and (10), respectively, a hybrid soundfield, denoted $H$, is defined as the superposition of a set of reproduction methods, $\mathcal{R}$ (in this work the cardinality of $\mathcal{R}$ is 2), as
$$S^a_H(\mathbf{x},k) = \sum_{R \in \mathcal{R}} \frac{d_b\, |G_R(k)|\, S^a_R(\mathbf{x},k)}{\int_{D_b} |S^a_R(\mathbf{x},k)|\, d\mathbf{x}}, \tag{20}$$
where each component soundfield is normalised to its mean amplitude over $D_b$. The hybrid measures $\zeta_H(k)$ and $\epsilon_H(k)$ are obtained by using $S^a_H(\mathbf{x},k)$ in place of $S^a_R(\mathbf{x},k)$ in (1) and (2), respectively.
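A discretised version of (20) is sketched below. It assumes uniformly spaced samples, so that $d_b / \int_{D_b}|S^a_R|\,d\mathbf{x}$ reduces to one over the mean magnitude of the samples inside $D_b$. The dictionary-based interface and the name `hybrid_soundfield` are hypothetical: `fields` holds the component soundfields of (6) and (10) sampled at the same points, and `gains` holds $G_R(k)$ at one wavenumber.

```python
import numpy as np

def hybrid_soundfield(fields, gains, in_bright):
    """Discretised Eq. (20).

    fields   : dict R -> complex samples of S^a_R(x, k) at common points x
    gains    : dict R -> G_R(k) at the same wavenumber k
    in_bright: boolean mask selecting the samples that lie inside D_b
    """
    S_H = np.zeros(in_bright.shape, dtype=complex)
    for R, S_R in fields.items():
        # Normalise each component to its mean amplitude over D_b, then weight by |G_R(k)|
        S_H += abs(gains[R]) * S_R / np.mean(np.abs(S_R[in_bright]))
    return S_H

# Toy example at the crossover, where |H^q_LR| = |H^p_LR| = 0.5.
mask = np.array([True, True, False, False])          # first two samples are in D_b
fields = {"MSR": np.array([2.0, 2.0, 0.2, 0.2]), "PL": np.array([3.0, 3.0, 0.3, 0.3])}
S_H = hybrid_soundfield(fields, {"MSR": 0.5, "PL": 0.5}, mask)
```

Because each component is normalised over $D_b$ before weighting, the two methods contribute equal bright-zone amplitude whenever their crossover gains are equal, regardless of their raw output levels.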
B. Loudspeaker Signals
The time-domain loudspeaker signals (unmodulated for a PL) are defined in general in this section for the reproduction of speech input signals, $y(n)$. The discrete Fourier transform of the $g$th overlapping windowed frame of $y(n)$ is $\tilde{Y}_g(k)$. The overlapping windowed frame of each loudspeaker signal is
$$\tilde{Q}_{Rlg}(k) = \tilde{Y}_g(k)\, G_R(k)\, U_l(k), \tag{21}$$
$$\tilde{q}_{Rlg}(n) = \frac{1}{K} \sum_{m=0}^{K-1} \tilde{Q}_{Rlg}(k_m \hat{f})\, e^{icnk_m}, \tag{22}$$
where $k_m \triangleq 2\pi m/(cK)$, the number of frequencies is $K$, the maximum frequency is $\hat{f}$, and each loudspeaker signal, $q_{Rl}(n)$, for a particular $R$, is reconstructed by performing overlap-add reconstruction with the synthesis window on $\tilde{q}_{Rlg}(n)$. For the case where there is a single loudspeaker, $l \in \{1\}$, for a given $R$, such as for the PL in this work, $U_l(k) = 1$ is used.
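The frame-wise processing of (21)–(22) amounts to frequency-domain filtering with overlap-add. The sketch below is one way to realise it with NumPy's real FFT. The periodic square-root Hann window pair, the 50 % hop and the argument `H` (the products $G_R(k)U_l(k)$ sampled at the FFT bin frequencies) are our assumptions, since the paper does not specify them, and `loudspeaker_signal` is a hypothetical name.

```python
import numpy as np

def loudspeaker_signal(y, H, frame_len=512):
    """Sketch of Eqs. (21)-(22) followed by overlap-add reconstruction.

    y : speech input signal y(n), 1-D real array
    H : complex per-bin gains G_R(k) U_l(k), length frame_len // 2 + 1
    """
    hop = frame_len // 2
    n = np.arange(frame_len)
    # Periodic sqrt-Hann analysis and synthesis windows: their product is a
    # Hann window, which sums to one at 50 % overlap.
    win = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len))
    y_pad = np.concatenate([np.zeros(frame_len), y, np.zeros(frame_len)])
    q = np.zeros(len(y_pad))
    for start in range(0, len(y_pad) - frame_len + 1, hop):
        Y_g = np.fft.rfft(win * y_pad[start:start + frame_len])  # Y~_g(k)
        Q_g = Y_g * H                                            # Eq. (21)
        q_g = np.fft.irfft(Q_g, n=frame_len)                     # Eq. (22)
        q[start:start + frame_len] += win * q_g                  # overlap-add
    return q[frame_len:frame_len + len(y)]

rng = np.random.default_rng(1)
y = rng.standard_normal(4000)
q = loudspeaker_signal(y, np.ones(257))  # unity gains reproduce y
```

Because no zero-padding is applied, a frequency-varying `H` implements circular rather than linear convolution within each frame; a practical implementation would zero-pad each frame to at least the frame length plus the filter length.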
V. RESULTS AND DISCUSSION
A. Experimental Setup
Simulations were carried out using the geometry shown in Fig. 1 with $r_{zb} = r_{zq} = 0.6$ m, $r_b = r_q = 0.3$ m, $R = 1.0$ m and $\alpha = \beta/3 = 90°$. The desired soundfield angle was $\theta = 0°$ and, in this work, $w(\mathbf{x})$ was set to 1 in $D_b$, 100 in $D_q$ and 0.05 in $D_u$, based on [8], [11], [20]. The target soundfield in $D_b$ was a virtual point source located at the centre of the PL, and $D_q$ was set to be quiet. The loudspeakers had $R_l = R_v = 1.3$ m, $\phi_L = 180°$, $\phi_c = 180°$ and $\psi = \psi_c - 180° = 27.5°$. The speed of sound in air was $c = 343\ \mathrm{m\,s^{-1}}$.
The PL was designed with $k_c = 2\pi(40\ \mathrm{kHz})/c$, $\tilde{\beta} = 1.2$, $\tilde{\alpha}_s = 2.328\ \mathrm{m^{-1}}$, $\tilde{\rho}_0 = 1.225\ \mathrm{kg\,m^{-3}}$ and $d = 6.18$ cm. In this work, it was assumed that the PL had an ultrasonic transducer spacing of less than 4.3 cm [26], thus avoiding spatial aliasing. The LR filters used to reproduce $S^a_{\mathrm{MSR}}(\mathbf{x},k)$ and $S^a_{\mathrm{PL}}(\mathbf{x},k)$ had order $\hat{n} = 12$. The number of MSR loudspeakers used was $L \in \{16, 24, 32, 134\}$, and the corresponding crossover frequency $k_u$ was found from (9). For comparison, $L = 134$ was chosen so that MSR alone reproduces the speech with no spatial aliasing. The hybrid reproduction method used $\mathcal{R} = \{\mathrm{MSR}, \mathrm{PL}\}$ to find $S^a_H(\mathbf{x},k)$ using (20).
B. Wideband Spatial Error Reduction
Figure 2 shows $\epsilon_{\mathrm{MSR}}(k)$, $\epsilon_{\mathrm{PL}}(k)$ and $\epsilon_H(k)$, computed from (2), in (E)–(H) as dashed green, dashed red and solid blue lines, respectively. The crossover frequencies are shown as vertical dash-dot black lines. For the proposed hybrid approach, it can be seen in Fig. 2 that $\epsilon_H$ was on average similar to the aliasing-free $\epsilon_{\mathrm{MSR}}$. Table I confirms this by showing that, on average, $\epsilon_H$ was slightly less than $\epsilon_{\mathrm{MSR}}$. While this was partly due to the low MSE of the PL at lower frequencies, acoustic contrast was also reduced when using a PL at those lower frequencies, as seen in Fig. 2 (A)–(D). The trade-off between MSE and acoustic contrast is shown in Table I, where $\epsilon_H$ reduces with $L$.

TABLE I
WIDEBAND MEAN $\epsilon_R$ AND $\zeta_R$ COMPARISONS AS A FUNCTION OF THE NUMBER OF DYNAMIC LOUDSPEAKERS ($L$) FOR ONE PL

| $L$ | $\epsilon_{\mathrm{MSR}}$ (dB) | $\epsilon_{\mathrm{PL}}$ (dB) | $\epsilon_H$ (dB) | $\zeta_{\mathrm{MSR}}$ (dB) | $\zeta_{\mathrm{PL}}$ (dB) | $\zeta_H$ (dB) |
|-----|-------|-------|-------|------|------|------|
| 16  | −27.2 | −40.7 | −32.5 | 30.0 | 40.4 | 54.2 |
| 24  | −32.7 | −40.7 | −31.7 | 38.1 | 40.4 | 58.1 |
| 32  | −33.7 | −40.7 | −31.6 | 43.5 | 40.4 | 60.3 |
| 134 | −36.4 | −40.7 | −35.6 | 79.6 | 40.4 | 79.3 |
C. Wideband Acoustic Contrast Improvement
Figure 2 shows $\zeta_{\mathrm{MSR}}(k)$, $\zeta_{\mathrm{PL}}(k)$ and $\zeta_H(k)$, computed from (1), in (A)–(D) as dashed green, dashed red and solid blue lines, respectively. The crossover frequencies are shown as vertical dash-dot black lines, which clearly indicate the point where $\zeta_{\mathrm{MSR}}(k)$ begins to decrease due to spatial aliasing. Note that the multizone occlusion problem [1], [11] (should it occur) may be difficult to overcome with one PL; however, the MSR grating lobes interfere less over $D_q$ during this phenomenon.
Fig. 2 also shows that the bandwidth with high acoustic contrast becomes limited as $L$ is reduced. The mean acoustic contrast over the wideband bandwidth for all reproduction techniques is given in Table I, from which the mean improvement of the hybrid method can be deduced. While the MSR mean acoustic contrast decreased significantly due to spatial aliasing, from 79.6 dB to 30.0 dB, that of the proposed hybrid method decreased only to 54.2 dB. For all reduced loudspeaker cases, the hybrid approach outperformed both the MSR and PL methods. The maximum improvement was 24.2 dB, when $L = 16$, and in all cases the wideband acoustic contrast remained at or above 54.2 dB, despite the fundamental spatial aliasing that occurred.
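The improvement figures quoted above follow directly from the Table I entries. The short Python check below, a reader's aid rather than part of the method, recomputes the hybrid gains over MSR and PL for the reduced-loudspeaker cases ($L = 16, 24, 32$).

```python
# Wideband mean acoustic contrast (dB) from Table I, reduced-loudspeaker cases only.
zeta_db = {
    16: {"MSR": 30.0, "PL": 40.4, "H": 54.2},
    24: {"MSR": 38.1, "PL": 40.4, "H": 58.1},
    32: {"MSR": 43.5, "PL": 40.4, "H": 60.3},
}

gain_over_msr = {L: round(v["H"] - v["MSR"], 1) for L, v in zeta_db.items()}
gain_over_pl = {L: round(v["H"] - v["PL"], 1) for L, v in zeta_db.items()}
worst_hybrid = min(v["H"] for v in zeta_db.values())

print(max(gain_over_msr.values()), max(gain_over_pl.values()), worst_hybrid)
# 24.2 19.9 54.2
```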
VI. CONCLUSIONS
This paper has proposed a hybrid approach to personal sound zones, including speech soundfields. An analytical solution to the combination of MSR and PL soundfields has been presented, along with a robust crossover filter design. The crossover filter is analytically derived from the geometry of the soundfield layout whilst taking into account spatial aliasing artifacts. Simulation results show that significant improvements in acoustic contrast over non-hybrid MSR and PL soundfields, of 24.2 dB and 19.9 dB respectively, are achievable. The proposed hybrid method also yields a mean wideband acoustic contrast of at least 54.2 dB with as few as 16 dynamic loudspeakers and a single PL. Topics for future work include improving speech intelligibility contrast (SIC) and quality in private speech sound zones using hybrid techniques.
[Fig. 2 plots, titled "Acoustic Contrast and Mean Squared Error for Reproduction Methods": panels (A)–(D) show Acoustic Contrast (dB) and panels (E)–(H) show Mean Squared Error (dB) against Frequency (kHz, 0.1–8) for L = 16, 24, 32 and 134; legend: MSR, PL, H, $k_u$.]
Fig. 2. Results are shown for three reproduction methods and four values of $L$. Acoustic contrast results ($\zeta_{\mathrm{MSR}}$, $\zeta_{\mathrm{PL}}$ and $\zeta_H$) are shown in (A)–(D). Mean squared error results ($\epsilon_{\mathrm{MSR}}$, $\epsilon_{\mathrm{PL}}$ and $\epsilon_H$) are shown in (E)–(H). The case where $L = 134$ is alias free up to 8 kHz.
REFERENCES
[1] T. Betlehem, W. Zhang, M. Poletti, and T. D. Abhayapala, “Personal
Sound Zones: Delivering interface-free audio to multiple listeners,” IEEE
Signal Process. Mag., vol. 32, pp. 81–91, 2015.
[2] W.-S. Gan, J. Yang, and T. Kamakura, “A review of parametric acoustic
array in air,” Appl. Acoust., vol. 73, no. 12, pp. 1211–1219, Dec. 2012.
[3] W. F. Druyvesteyn and J. Garas, “Personal sound,” J. Audio Eng. Soc.,
vol. 45, no. 9, pp. 685–701, 1997.
[4] J.-W. Choi and Y.-H. Kim, “Generation of an acoustically bright zone
with an illuminated region using multiple sources,” J. Acoust. Soc. Am.,
vol. 111, no. 4, pp. 1695–1700, 2002.
[5] M. Poletti, “An investigation of 2-D multizone surround sound systems,”
in Proc. 125th Conv. Audio Eng. Soc. Audio Eng. Soc., 2008, pp. 1–9.
[6] Y. J. Wu and T. D. Abhayapala, “Spatial multizone soundfield reproduc-
tion: Theory and design,” IEEE Trans. Audio, Speech, Lang. Process.,
vol. 19, pp. 1711–1720, 2011.
[7] P. Coleman, P. Jackson, M. Olik, and J. A. Pedersen, “Personal audio
with a planar bright zone,” J. Acoust. Soc. Am., vol. 136, pp. 1725–1735,
2014.
[8] W. Jin, W. B. Kleijn, and D. Virette, “Multizone soundfield reproduction
using orthogonal basis expansion,” in Int. Conf. on Acoust., Speech and
Signal Process. (ICASSP). IEEE, 2013, pp. 311–315.
[9] S. Spors, H. Wierstorf, A. Raake, F. Melchior, M. Frank, and F. Zotter,
“Spatial sound with loudspeakers and its perception: A review of the
current state,” Proc. IEEE, vol. 101, no. 9, pp. 1920–1938, 2013.
[10] F. Winter, J. Ahrens, and S. Spors, “On Analytic Methods for 2.5-D
Local Sound Field Synthesis Using Circular Distributions of Secondary
Sources,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24,
no. 5, pp. 914–926, May 2016.
[11] J. Donley, C. Ritz, and W. B. Kleijn, “Improving speech privacy in
personal sound zones,” in Int. Conf. on Acoust., Speech and Signal
Process. (ICASSP). IEEE, 2016, pp. 311–315.
[12] C. Shi, Y. Kajikawa, and W.-S. Gan, “An overview of directivity control
methods of the parametric array loudspeaker,” APSIPA Trans. Signal
Inform. Process., pp. 1–30, Dec. 2014.
[13] P. J. Westervelt, “Parametric acoustic array,” J. Acoust. Soc. Am., vol. 35,
no. 4, pp. 535–537, 1963.
[14] Y. Sugibayashi, S. Kurimoto, D. Ikefuji, M. Morise, and T. Nishiura,
“Three-dimensional acoustic sound field reproduction based on hybrid
combination of multiple parametric loudspeakers and electrodynamic
subwoofer,” Appl. Acoust., vol. 73, no. 12, pp. 1282–1288, Dec. 2012.
[15] C. Shi, E.-L. Tan, and W.-S. Gan, “Hybrid immersive three-dimensional
sound reproduction system with steerable parametric loudspeakers,” in
Proc. Meetings Acoust., vol. 19, 2013, pp. 1–6.
[16] C. Shi and Y. Kajikawa, “A comparative study of preprocessing methods
in the parametric loudspeaker,” in Asia-Pacific Signal & Inform. Process.
Assoc. Annu. Summit and Conf. (APSIPA ASC). IEEE, 2014, pp. 1–5.
[17] J. Donley and C. Ritz, “An efficient approach to dynamically weighted
multizone wideband reproduction of speech soundfields,” in China
Summit & Int. Conf. Signal and Inform. Process. (ChinaSIP). IEEE,
2015, pp. 60–64.
[18] J. Donley and C. Ritz, “Multizone reproduction of speech soundfields:
A perceptually weighted approach,” in Asia-Pacific Signal & Inform.
Process. Assoc. Annu. Summit and Conf. (APSIPA ASC). IEEE, 2015,
pp. 342–345.
[19] Artificial voices. ITU-T Standard P.50, 1999.
[20] W. Jin and W. B. Kleijn, “Theory and design of multizone soundfield
reproduction using sparse methods,” IEEE/ACM Trans. Audio, Speech,
Lang. Process., vol. 23, pp. 2343–2355, 2015.
[21] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield
Acoustical Holography. Academic Press, 1999.
[22] Y. J. Wu and T. D. Abhayapala, “Theory and design of soundfield
reproduction using continuous loudspeaker concept,” IEEE Trans. Audio,
Speech, Lang. Process., vol. 17, pp. 107–116, 2009.
[23] H. O. Berktay and D. J. Leahy, “Farfield performance of parametric
transmitters,” J. Acoust. Soc. Am., vol. 55, no. 3, pp. 539–546, 1974.
[24] C. Shi and W.-S. Gan, “Product directivity models for parametric
loudspeakers,” J. Acoust. Soc. Am., vol. 131, no. 3, pp. 1938–1945,
2012.
[25] C. Shi and Y. Kajikawa, “A convolution model for computing the far-
field directivity of a parametric loudspeaker array,” J. Acoust. Soc. Am.,
vol. 137, no. 2, pp. 777–784, 2015.
[26] C. Shi and W.-S. Gan, “Grating lobe elimination in steerable
parametric loudspeaker,” IEEE Trans. Ultrason. Ferroelectr. Freq.
Control, vol. 58, no. 2, pp. 437–450, 2011.
[27] F. Farias and W. Abdulla, “On Rayleigh distance and absorption length
of parametric loudspeakers,” in Asia-Pacific Signal & Inform. Process.
Assoc. Annu. Summit and Conf. (APSIPA ASC). IEEE, 2015, pp. 1262–
1265.
... There are three main approaches for SFR: Wave Field Synthesis (WFS) [1], [2], Higher Order Ambisonics (HOA) [3], [4] and Pressure Matching (PM) [5]- [7]. The first approach, the WFS, is based on the Huygens-Fresnel principle, which means that any wavefront can be synthesized from a series of elementary spherical waves. ...
... The last experiment in this subsection compares the effect of the number of matching points M toward the NMSE. We fix the volume of listening cube as 1m 3 and increase the total number of matching points M when calculate the placement and weights of the loudspeakers. As illustrated in Figure 9, the x-axis label 'Cond 1' represents M=125 (5×5×5), 'Cond 2' represents M=512 (8×8×8), 'Cond 3' represents M=1000 (10×10×10), 'Cond 4' represents M=1728 (12×12×12), 'Cond 5' represents M=3375 (15×15×15), 'Cond 6' represents M=8000 (20×20×20), and 'Cond 7' represents M=15625 (25×25×25). ...
... By checking Figure 11(a), our proposed AL-LS SFR model has an analogous NMSE compared with CMP-based method and both of them outperform other methods when pmax=0. 3. Figure 11(b) depicts that the proposed SFR approach works better than the reference methods when f>800Hz with pmax=2. ...
Article
Full-text available
This paper proposes a 3-dimensional (3D) sound field reproduction (SFR) approach through the combination of Alternating Direction Method of Multipliers (ADMM) based Lasso and regularized Least-Square (LS). The proposed SFR method is split into two parts through the pressure matching optimi-zation of loudspeaker positions and the computation of driving signals. At the first part, a plurality of can-didate positions of loudspeakers in planar array are given and, then, the active speaker selection method is proposed based on ADMM complex Lasso algorithm for selecting the optimal loudspeaker positions. Af-terwards, regularized Least-Square (LS) is adopted to calculate the selected loudspeaker weights and con-trol the total power. The numerical simulation experiments demonstrate that the proposed SFR scheme out-performs the existing sparse loudspeakers’ placement and weight optimization algorithms especially in un-der-sampled sound fields. Meanwhile, the evaluations also confirmed that the proposed method could sig-nificantly reduce the computational complexity of the active loudspeaker selection compared to the state-of-the-art Lasso-based SFR. Effectively, the proposed method uses a relatively small number of loudspeakers for a satisfying reproduction quality.
... Parametric loudspeakers have recently been considered for personal use, such as realizing private sound spaces. [10][11][12] In addition, the parametric effect has been applied in the development of an omnidirectional audible sound source, 13) and the active use of this effect in a near field has also been investigated. 14) To develop a parametric loudspeaker system that consists of modulators, power amplifiers, and ultrasonic emitters, we need to evaluate the demodulated sound transformed from primary ultrasound waves using theoretical prediction and measurements. ...
Article
This study evaluates the accuracy of demodulated sound measurements using a condenser microphone in the near field of a parametric loudspeaker system. Microphones with different sensitivities placed at incidence angles of 0° and 90° were used to measure demodulation frequency components without special acoustic filters. The measured components were compared with theoretical predictions. The results show that the measured sound pressure using microphones placed at 0° was up to several tens of decibels larger than the theoretical predictions and significantly inaccurate in the near field. This was due to the nonlinear response of the microphone, which had high sensitivity at primary sound frequencies, inducing spurious signals. This result suggests that using a microphone with low sensitivity at primary sound frequencies placed at an appropriate angle that reduces sensitivity improves parametric sound measurement accuracy.
... • Parametric loudspeakers are incorporated into a multizone soundfield reproduction scenario using a novel hybrid crossover approach. The hybrid approach is designed to improve acoustic contrast above the spatial aliasing frequency in multizone soundfield reproduction scenarios [121]. ...
Thesis
Full-text available
The experience and utility of personal sound is a highly sought after characteristic of shared spaces. Personal sound allows individuals, or small groups of individuals, to listen to separate streams of audio content without external interruption from a third-party. The desired effects of personal acoustic environments can also be areas of minimal sound, where quiet spaces facilitate an effortless mode of communication. These characteristics have become exceedingly difficult to produce in busy environments such as cafes, restaurants, open plan offices and entertainment venues. The concept of, and the ability to provide, spaces of such nature has been of significant interest to researchers in the past two decades. This thesis answers open questions in the area of personal sound reproduction using loudspeaker arrays, which is the active reproduction of soundfields over extended spatial regions of interest. We first provide a review of the mathematical foundations of acoustics theory, single zone and multiple zone soundfield reproduction, as well as background on the human perception of sound. We then introduce novel approaches for the integration of psychoacoustic models in multizone soundfield reproductions and describe implementations that facilitate the efficient computation of complex soundfield synthesis. The psychoacoustic based zone weighting is shown to considerably improve soundfield accuracy, as measured by the soundfield error, and the proposed computational methods are shown capable of providing several orders of magnitude better performance with insignificant effects on synthesis quality. Consideration is then given to the enhancement of privacy and quality in personal sound zones and in particular on the effects of unwanted sound leaking between zones. Optimisation algorithms, along with a priori estimations of cascaded zone leakage filters, are then established so as to provide privacy between the sound zones without diminishing quality. 
Simulations and real-world experiments are performed, using linear and part-circle loudspeaker arrays, to confirm the practical feasibility of the proposed privacy and quality control techniques. The experiments show that good quality and confidential privacy are achievable simultaneously. The concept of personal sound is then extended to the active suppression of speech across loudspeaker boundaries. Novel suppression techniques are derived for linear and planar loudspeaker boundaries, which are then used to simulate the reduction of speech levels over open spaces and suppression of acoustic reflections from walls. The suppression is shown to be as effective as passive fibre panel absorbers. Finally, we propose a novel ultrasonic parametric and electrodynamic loudspeaker hybrid design for acoustic contrast enhancement in multizone reproduction scenarios and show that significant acoustic contrast can be achieved above the fundamental spatial aliasing frequency.
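Acoustic contrast, the figure of merit used throughout this work, is commonly computed as the ratio of the spatially averaged squared pressure in the bright zone to that in the quiet zone, expressed in dB. The following is a minimal sketch assuming complex pressures sampled at discrete control points in each zone at a single frequency; the sample values are hypothetical.

```python
import numpy as np

def acoustic_contrast_db(p_bright, p_quiet):
    """Ratio of mean squared pressure in the bright zone to that in the
    quiet zone, in dB. Inputs are complex pressures at points in each zone."""
    e_bright = np.mean(np.abs(np.asarray(p_bright)) ** 2)
    e_quiet = np.mean(np.abs(np.asarray(p_quiet)) ** 2)
    return 10.0 * np.log10(e_bright / e_quiet)

# Hypothetical example: unit pressure in the bright zone, 0.1 in the quiet zone
p_b = np.ones(50, dtype=complex)
p_q = 0.1 * np.ones(50, dtype=complex)
print(round(acoustic_contrast_db(p_b, p_q), 1))  # 20.0
```

Averaging over points, rather than using a single microphone, makes the measure describe the whole zone; a wideband figure would additionally average or report this value per frequency.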
... substituting the truncation length, Ď M , and rearranging gives [40] " ...
Article
Reproducing zones of personal sound is a challenging signal processing problem which has garnered considerable research interest in recent years. In this work we introduce an extended method for multizone soundfield reproduction that overcomes issues with speech privacy and quality. Measures of Speech Intelligibility Contrast (SIC) and speech quality are used as cost functions in an optimisation of speech privacy and quality. Novel spatial and (temporal) frequency domain speech masker filter designs are proposed to accompany the optimisation process. Spatial masking filters are designed using multizone soundfield algorithms which are dependent on the target speech multizone reproduction. Combinations of estimates of acoustic contrast and long-term average speech spectra are proposed to provide equal masking influence on speech privacy and quality. Spatial aliasing specific to multizone soundfield reproduction geometry is further accounted for by analytically derived low-pass filters. Simulated and real-world experiments are conducted to verify the performance of the proposed method using semi-circular and linear loudspeaker arrays. Simulated implementations of the proposed method show that significant speech intelligibility contrast and speech quality are achievable between zones. A range of Perceptual Evaluation of Speech Quality (PESQ) Mean Opinion Scores (MOS) that indicate good quality are obtained while at the same time providing confidential privacy as indicated by SIC. The simulations also show that the method is robust to variations in the speech, virtual source location, array geometry and number of loudspeakers. Real-world experiments confirm the practicality of the proposed methods by showing that good quality and confidential privacy are achievable.
... It can be seen from Fig. 4 that the mean suppression reaches a peak of −9.1 dB near 400 Hz and maintains mean suppression below −7.5 dB from 365 Hz to 730 Hz. Future work could include investigating the control above the spatial Nyquist frequency by either increasing the loudspeaker density or using hybrid loudspeaker and ANC systems [30,31]. ...
Conference Paper
In this paper, we investigate the effects of compensating for wave-domain filtering delay in an active speech control system. An active control system utilising wave-domain processed basis functions is evaluated for a linear array of dipole secondary sources. The target control soundfield is matched, in a least squares sense, using orthogonal wavefields to a predicted future target soundfield. Filtering is implemented using a block-based short-time signal processing approach, which induces an inherent delay. We present an autoregressive method for predictively compensating for the filter delay. An approach to choosing the block length that maximises soundfield control is proposed, trading off soundfield reproduction accuracy against prediction accuracy. Results show that the choice of block length has a significant effect on the active suppression of speech.
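The delay compensation above relies on predicting the signal ahead by one processing block with an autoregressive (AR) model. The sketch below shows generic least-squares AR prediction on a single signal; it is not the paper's predictor, which operates on soundfield data and may be fitted differently. The sample rate, block length of 32 samples, model order and test signal are all hypothetical.

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares AR fit: x[n] ~ a[0] x[n-1] + ... + a[order-1] x[n-order]."""
    x = np.asarray(x, dtype=float)
    # Row for sample n holds the `order` preceding samples, most recent first
    X = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a

def predict_ahead(x, a, steps):
    """Extrapolate `steps` future samples by feeding predictions back in."""
    history = list(np.asarray(x, dtype=float)[-len(a):])
    out = []
    for _ in range(steps):
        past = np.array(history[-len(a):][::-1])  # most recent first
        nxt = float(a @ past)
        out.append(nxt)
        history.append(nxt)
    return np.array(out)

# Hypothetical use: predict one block (the filter delay) beyond the input
fs, block = 16_000, 32
n = np.arange(512)
x = np.sin(2 * np.pi * 440 * n / fs) + 0.5 * np.sin(2 * np.pi * 1000 * n / fs)
a = fit_ar(x[:480], order=4)          # two sinusoids obey an exact AR(4) model
x_future = predict_ahead(x[:480], a, steps=block)
```

For a sum of two sinusoids the AR(4) model is exact, so the prediction matches the true continuation; for speech the error grows with the prediction horizon, which is the trade-off against block length described above.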
Conference Paper
This paper proposes two methods for providing speech privacy between spatial zones in anechoic and reverberant environments. The methods are based on masking the content leaked between regions. The masking is optimised to maximise the speech intelligibility contrast (SIC) between the zones. The first method uses a uniform masker signal that is combined with desired multizone loudspeaker signals and requires acoustic contrast between zones. The second method computes a space-time domain masker signal in parallel with the loudspeaker signals so that the combination of the two emphasises the spectral masking in the targeted quiet zone. Simulations show that it is possible to achieve a significant SIC in anechoic environments whilst maintaining speech quality in the bright zone.
Conference Paper
In this paper a method for the reproduction of multizone speech soundfields using perceptual weighting criteria is proposed. Psychoacoustic models are used to derive a space-time-frequency weighting function to control leakage of perceptually unimportant energy from the bright zone into the quiet zone. This is combined with a method for regulating the number of basis planewaves used in the reproduction to allow for an efficient implementation using a codebook of predetermined weights based on desired soundfield energy in the zones. The approach is capable of reducing the mean squared error for reproduced speech in the bright zone by 10.5 dB. Results also show that the approach leads to a significant reduction in the spatial error within the bright zone whilst requiring 65% less loudspeaker signal power for the case where the soundfield in this zone is in line with, and hence partially directed to, the quiet zone.
Conference Paper
This paper proposes and evaluates an efficient approach for practical reproduction of multizone soundfields for speech sources. The reproduction method, based on a previously proposed approach, utilises weighting parameters to control the soundfield reproduced in each zone whilst minimising the number of loudspeakers required. Proposed here is an interpolation scheme for predicting the weighting parameter values of the multizone soundfield model, which otherwise require significant computational effort to obtain. It is shown that initial computation time can be reduced by a factor of 1024 with an error of only −85 dB in the reproduced soundfield relative to reproduction without interpolated weighting parameters. The perceptual impact on the quality of the speech reproduced using the method is also shown to be negligible. By using pre-saved soundfields determined using the proposed approach, practical reproduction of dynamically weighted multizone soundfields of wideband speech could be achieved in real time. Index Terms— multizone soundfield reproduction, wideband multizone soundfield, weighted multizone soundfield, look-up tables (LUT), interpolation, sound field synthesis (SFS)
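The abstract does not specify the interpolation scheme, so the following is only a minimal sketch of the general look-up-table idea, assuming one weighting parameter that varies with frequency and is linearly interpolated between precomputed entries. The frequency grid and table values are placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical look-up table: weighting parameter values precomputed (by the
# expensive optimisation) on a coarse frequency grid. Placeholder numbers.
lut_freqs_hz = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
lut_weights = np.array([0.90, 0.75, 0.55, 0.40, 0.30])

def interpolated_weight(freq_hz):
    """Linearly interpolate the weighting parameter at arbitrary frequencies.
    Queries outside the table are clamped to the end values (np.interp default)."""
    return np.interp(freq_hz, lut_freqs_hz, lut_weights)

query = np.array([150.0, 600.0, 3000.0])
print(interpolated_weight(query))
```

Only the table entries require the expensive computation; every other frequency costs one interpolation, which is where the speed-up of such schemes comes from.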
Conference Paper
The parametric loudspeaker is a directional sound reproduction device that makes use of parametric sound generation. A sound beam is formed as a result of nonlinear interactions between ultrasonic beams. The advantage of the parametric loudspeaker is that it can transmit a sound beam as narrow as that of a conventional loudspeaker from a smaller emitter. Owing to this advantage, parametric loudspeakers are readily applied in a variety of sound field control applications, such as the creation of personal listening spots, spatial audio reproduction, and active noise control. However, the parametric loudspeaker has a long-standing drawback: harmonic and intermodulation distortions are byproducts of parametric sound generation. Hence, a comparative study of six preprocessing methods, including two proposed in this paper, is carried out. The harmonic and intermodulation distortions are demonstrated experimentally.
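To illustrate why preprocessing matters, the sketch below compares two textbook baselines, double-sideband AM (DSB-AM) and square-root AM (SRAM); they are not claimed to be the two methods proposed in the paper. Under Berktay's far-field approximation the demodulated audio is proportional to the second time derivative of the squared envelope, so the 2f0 content of that squared envelope indicates second-harmonic distortion. The sample rate, carrier frequency, tone and modulation index `m` are assumptions.

```python
import numpy as np

def dsb_envelope(s, m=0.8):
    """Double-sideband AM envelope: 1 + m*s."""
    return 1.0 + m * np.asarray(s)

def sram_envelope(s, m=0.8):
    """Square-root AM envelope: sqrt(1 + m*s); requires 1 + m*s >= 0."""
    return np.sqrt(1.0 + m * np.asarray(s))

def modulate(envelope, fc, fs):
    """Apply the envelope to an ultrasonic sine carrier at fc Hz."""
    t = np.arange(len(envelope)) / fs
    return envelope * np.sin(2 * np.pi * fc * t)

def second_harmonic_level(envelope, f0, fs):
    """Magnitude of the 2*f0 component of the squared envelope, which drives
    the demodulated audio under Berktay's far-field approximation."""
    sq = envelope ** 2
    spectrum = np.abs(np.fft.rfft(sq))
    return spectrum[int(round(2 * f0 * len(sq) / fs))]

fs, f0, fc = 192_000, 1_000, 40_000   # assumed: 1 kHz tone, 40 kHz carrier
t = np.arange(1920) / fs              # 10 ms: a whole number of tone periods
s = np.sin(2 * np.pi * f0 * t)
ultrasound = modulate(sram_envelope(s), fc, fs)  # signal that would drive the emitter
print(second_harmonic_level(dsb_envelope(s), f0, fs))   # large: (m*s)^2 term
print(second_harmonic_level(sram_envelope(s), f0, fs))  # ~0 up to rounding
```

Squaring the SRAM envelope restores 1 + m*s exactly, which removes the (m*s)^2 term responsible for the DSB-AM second harmonic; the cost is a wider ultrasonic bandwidth, which practical emitters cannot fully reproduce.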
Conference Paper
This paper investigates the propagation of primary and secondary waves produced by parametric loudspeakers of different sizes. The theory of the parametric acoustic array describes the nonlinear interaction of waves as being confined to the near field, but the nonlinearities may persist into the far field, producing different results. Four simulations were performed to compare the performance of loudspeakers with different Rayleigh distances. Based on the simulations, a new design was proposed to overcome the mismatch between absorption length and Rayleigh distance. Controlling these distances is proposed as a way to improve the performance of the parametric loudspeaker, taking into account the required scope and the intended application.
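The Rayleigh distance compared above has a standard closed form for a baffled circular piston of radius a, R0 = k a^2 / 2 = pi a^2 / lambda. The sketch below evaluates it for a few emitter radii, assuming a circular emitter, c = 343 m/s and a 40 kHz primary frequency; the radii are hypothetical. The absorption length, 1/alpha for the primary-wave attenuation coefficient alpha, is not computed because alpha depends on temperature and humidity.

```python
import numpy as np

def rayleigh_distance(radius_m, freq_hz, c=343.0):
    """Rayleigh distance R0 = k a^2 / 2 = pi a^2 / lambda of a baffled
    circular piston of radius a; beyond R0 the beam is roughly far field."""
    wavelength = c / freq_hz
    return np.pi * radius_m ** 2 / wavelength

for a in (0.02, 0.05, 0.10):  # hypothetical emitter radii in metres
    print(f"a = {a:.2f} m -> R0 = {rayleigh_distance(a, 40e3):.2f} m at 40 kHz")
```

Because R0 grows with the square of the radius, doubling the emitter size quadruples the near-field length, which is why emitter size has to be matched against the absorption length.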
Article
Local sound field synthesis allows for synthesizing a given desired sound field inside a limited target region such that the field is free of considerable spatial aliasing artifacts. Spatial aliasing artifacts are a consequence of overlaps due to unavoidable repetitions of the space-spectral coefficients of the secondary source driving function. We analyze various conceivable analytic ways of restricting the bandwidth of the spatial spectrum of the driving function such that considerable overlapping is prevented: local spatial bandlimitation (A), spectral windowing (B), and local spatial bandlimitation plus spectral windowing (C). While solution B is computationally significantly more efficient than A and C, it provides only limited control over the spatial location around which the aliasing-free region evolves. Solutions A and C provide more flexibility and higher accuracy, and both achieve largely identical results, so the spectral windowing after the local spatial bandlimitation may be skipped. We present a detailed analysis of the properties of the spatial aliasing artifacts arising in the synthesis of a virtual plane wave. We establish a procedure for predicting the maximum possible size of the aliasing-free target region depending on its location and on the propagation direction of the desired sound field. The results can help reduce regularization in numerical solutions, as they represent physical limitations that can be considered in the choice of parameters.
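A simpler, global version of the aliasing condition is helpful context for the local analysis above. For an infinite uniform linear array with spacing d driven to synthesize a plane wave at angle theta from broadside, the driving function's spatial spectrum sits at k sin(theta) and is repeated every 2 pi / d; the first repetition becomes propagating once f exceeds c / (d (1 + |sin theta|)). The sketch below evaluates that textbook condition; the spacing is hypothetical, and the local-synthesis analysis in the abstract yields more refined, region-dependent limits.

```python
import numpy as np

def aliasing_frequency(spacing_m, theta_rad=np.pi / 2, c=343.0):
    """Lowest frequency at which the first spectral repetition of a sampled
    plane-wave driving function propagates, for an infinite uniform linear
    array: f = c / (d * (1 + |sin(theta)|)), theta measured from broadside.
    theta = 90 degrees gives the worst case c / (2 d)."""
    return c / (spacing_m * (1.0 + abs(np.sin(theta_rad))))

d = 0.1  # hypothetical loudspeaker spacing in metres
print(aliasing_frequency(d))         # endfire (worst case): c / (2 d)
print(aliasing_frequency(d, 0.0))    # broadside: c / d
```

The direction dependence mirrors the abstract's point that the aliasing-free behaviour depends on the propagation direction of the desired field.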
Article
Surround sound systems can produce a desired sound field over an extended region of space by using higher order Ambisonics. One application of this capability is the production of multiple independent soundfields in separate zones. This paper investigates multi-zone surround systems for the case of two dimensional reproduction. A least squares approach is used for deriving the loudspeaker weights for producing a desired single frequency wave field in one of N zones, while producing silence in the other N-1 zones. It is shown that reproduction in the active zone is more difficult when an inactive zone is in-line with the virtual sound source and the active zone. Methods for controlling this problem are discussed.
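The least squares formulation described above can be sketched for two zones at a single frequency: stack the loudspeaker-to-control-point transfer functions of both zones, ask for a plane wave in the bright zone and zero pressure in the quiet zone, and solve the regularised normal equations. This is an illustrative sketch, not the paper's derivation, which works with 2D modal expansions. Assumptions: free-field 3D point-source transfer functions in the horizontal plane, Tikhonov regularisation with a hypothetical factor `reg`, and a hypothetical geometry in which the plane wave travels perpendicular to the line joining the zones, so the quiet zone is not in line with the bright zone.

```python
import numpy as np

def greens_point(src, pts, k):
    """Free-field point-source transfer functions exp(-1j k r) / (4 pi r),
    shape (n_points, n_sources)."""
    r = np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

def disc_points(center, radius, n_rings=3, n_per_ring=12):
    """Control points: the zone centre plus concentric rings inside the zone."""
    pts = [center]
    ang = 2 * np.pi * np.arange(n_per_ring) / n_per_ring
    for i in range(1, n_rings + 1):
        r = radius * i / n_rings
        pts.extend(center + r * np.column_stack([np.cos(ang), np.sin(ang)]))
    return np.array(pts)

def solve_weights(G_b, G_q, d_b, reg=1e-5):
    """Tikhonov-regularised least squares:
    min_w ||G_b w - d_b||^2 + ||G_q w||^2 + lam ||w||^2."""
    G = np.vstack([G_b, G_q])
    d = np.concatenate([d_b, np.zeros(G_q.shape[0], dtype=complex)])
    lam = reg * np.linalg.norm(G, 2) ** 2
    A = G.conj().T @ G + lam * np.eye(G.shape[1])
    return np.linalg.solve(A, G.conj().T @ d)

# Hypothetical geometry: 32 loudspeakers on a 2 m circle, two 0.3 m zones
c, f = 343.0, 250.0
k = 2 * np.pi * f / c
phi = 2 * np.pi * np.arange(32) / 32
spk = 2.0 * np.column_stack([np.cos(phi), np.sin(phi)])
bright = disc_points(np.array([-0.6, 0.0]), 0.3)
quiet = disc_points(np.array([0.6, 0.0]), 0.3)

G_b, G_q = greens_point(spk, bright, k), greens_point(spk, quiet, k)
d_b = np.exp(-1j * k * bright[:, 1])  # unit plane wave travelling in +y
w = solve_weights(G_b, G_q, d_b)

contrast_db = 10 * np.log10(np.mean(np.abs(G_b @ w) ** 2) / np.mean(np.abs(G_q @ w) ** 2))
print(f"acoustic contrast: {contrast_db:.1f} dB")
```

Rotating the plane wave to travel along the x axis, through the bright zone and into the quiet zone, reproduces the in-line case that the abstract identifies as more difficult.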
Article
Multizone soundfield reproduction over an extended spatial region is a challenging problem in acoustic signal processing. We introduce a method of reproducing a multizone soundfield within a desired region in reverberant environments. It is based on the identification of the acoustic transfer function (ATF) from the loudspeaker over the desired reproduction region using a limited number of microphone measurements. We assume that the soundfield is sparse in the domain of planewave decomposition and identify the ATF using sparse methods. The estimates of the ATFs are then used to derive the optimal least-squares solution for the loudspeaker filters that minimize the reproduction error over the entire reproduction region. Simulations confirm that the method leads to a significantly reduced number of required microphones for accurate multizone sound reproduction, while it also facilitates the reproduction over a wide frequency range. Practical experiments are used to verify the sparse planewave representation of the reverberant soundfield in a real-world listening environment.
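The identification step assumes the ATF over the region is a sparse combination of plane waves, so a few microphone samples can determine it. The sketch below uses orthogonal matching pursuit (OMP) as a generic sparse solver; the abstract does not say which sparse method the paper uses, so OMP is a swapped-in technique. The frequency, microphone layout, 5 degree direction grid and "true" coefficients are all hypothetical, and the measurements are noiseless.

```python
import numpy as np

def planewave_dictionary(pts, k, n_dirs):
    """Columns are unit-amplitude plane waves exp(-1j k u.x) sampled at the
    given 2D points, for n_dirs directions u evenly spaced on a circle."""
    phi = 2 * np.pi * np.arange(n_dirs) / n_dirs
    u = np.column_stack([np.cos(phi), np.sin(phi)])  # (n_dirs, 2)
    return np.exp(-1j * k * pts @ u.T)               # (n_points, n_dirs)

def omp(Phi, y, n_atoms):
    """Orthogonal matching pursuit: greedily pick the atom most correlated with
    the residual, then refit all selected atoms by least squares."""
    residual = y.copy()
    support = []
    coef = np.zeros(0, dtype=complex)
    for _ in range(n_atoms):
        corr = np.abs(Phi.conj().T @ residual)
        corr[support] = 0.0                  # never reselect an atom
        support.append(int(np.argmax(corr)))
        sub = Phi[:, support]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ coef
    x = np.zeros(Phi.shape[1], dtype=complex)
    x[support] = coef
    return x

rng = np.random.default_rng(0)
k = 2 * np.pi * 500.0 / 343.0                # 500 Hz (hypothetical)
mics = rng.uniform(-0.5, 0.5, size=(30, 2))  # 30 mics in a 1 m square
Phi = planewave_dictionary(mics, k, 72)      # 5 degree direction grid
x_true = np.zeros(72, dtype=complex)
x_true[[5, 40]] = [1.0, 0.5j]                # e.g. direct path plus one reflection
y = Phi @ x_true                             # noiseless "measured" ATF samples
x_hat = omp(Phi, y, n_atoms=2)

# The sparse estimate predicts the ATF at unmeasured points inside the region
new_pts = rng.uniform(-0.5, 0.5, size=(5, 2))
atf_pred = planewave_dictionary(new_pts, k, 72) @ x_hat
```

The estimated ATFs for each loudspeaker would then populate the transfer matrices of a least-squares multizone solver; with closely spaced grid directions the dictionary is highly coherent, so exact recovery of the true directions is not guaranteed.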