All content in this area was uploaded by Jacob Donley on Oct 13, 2017

Reproducing Personal Sound Zones Using a Hybrid Synthesis of Dynamic and Parametric Loudspeakers

Jacob Donley∗, Christian Ritz∗ and W. Bastiaan Kleijn†

∗School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW, 2522 Australia, E-mail: jrd089@uowmail.edu.au and critz@uow.edu.au

†School of Engineering and Computer Science, Victoria University of Wellington, Wellington, 6140 New Zealand, E-mail: bastiaan.kleijn@ecs.vuw.ac.nz

Abstract—This paper proposes a hybrid approach to personal sound zones utilising multizone soundfield reproduction techniques and parametric loudspeakers. Crossover filters are designed, to switch between reproduction methods, through analytical analysis of aliasing artifacts in multizone reproductions. By realising the designed crossover filters, wideband acoustic contrast between zones is significantly improved. The trade-off between acoustic contrast and the bandwidth of the reproduced soundfield is investigated. Results show that by incorporating the proposed hybrid model the whole wideband bandwidth is spatial-aliasing free with a mean acoustic contrast consistently above 54.2 dB, an improvement of up to 24.2 dB from a non-hybrid approach, with as few as 16 dynamic loudspeakers and one parametric loudspeaker.

I. INTRODUCTION

Personal sound environments, such as provided by multizone soundfield reproduction (MSR) [1] and parametric loudspeakers (PL) [2], are of interest in applications such as vehicle cabin entertainment/communication systems, cinema surround sound systems, multi-participant teleconferencing and personal audio in restaurant/cafés. As well as creating a target bright zone, it is sometimes also desired to create a second, quiet zone. In this case, it is important to ensure that the acoustic contrast (energy ratio) between zones is maximised whilst ensuring the error in the bright zone is minimised. However, research in the area has shown performance limitations related to audio leaking between zones, known as interzone audio interference, which limits the bandwidth of low error, high acoustic contrast, personalised audio.

The concept of personal sound from controlling multiple loudspeakers has been around since 1997 [3]. A method [4] was proposed later, in 2002, to maximise the ratio of energy between two regions, which was termed Acoustic Contrast Control [1]. Subsequent multizone reproduction techniques made use of least-squares pressure matching [5] and cylindrical harmonic expansion [6]. Further research has improved spatial reproduction accuracy by utilising planarity [7] and orthogonal basis planewaves [8].

Scenarios where a finite number of loudspeakers are used as secondary sources for soundfield reproduction are limited to accurate reproduction below a (spatial aliasing) frequency [9]. A fundamental issue with MSR using discrete secondary sources is that the spatial aliasing induces so-called grating lobes which can interfere across zones [10]. Recent research [6], [8], [11] suggests a full circle array of ≈300 loudspeakers is required to reproduce audio up to 8 kHz with high acoustic contrast.

PLs, on the other hand, are capable of providing high directivity at high frequencies [12] and were first theorised in 1963 [13]. PLs have gained interest due to their high directivity with a relatively small physical size which is comparable to dynamic (conventional) loudspeakers. Practical implementations have shown PLs can provide immersive spatial audio [14], [15]; however, neither of these hybrid approaches uses MSR with dynamic loudspeakers or considers spatial aliasing. When compared to MSR from dynamic loudspeakers, PLs lack directivity at low frequencies [12], contain higher Total Harmonic Distortion (THD) [2], [16] and can have potential health risks due to the high Sound Pressure Level (SPL) of the ultrasonic carrier frequency [2].

A hybrid system utilising the better aspects of both MSRs and PLs would allow for high acoustic contrast at low and high frequencies. Reproduction of speech soundfields [11], [17], [18] would require low carrier SPL in PLs due to the low energy of high frequency components in speech [19], thus reducing related health risks. Further, frequency dependent PL distortions are less of a problem at higher frequencies [16].

In this paper novel contributions are made through an analytical approach to a hybrid MSR and PL system with application to personal sound zones. A zone dependent crossover filter is designed to shift the loudspeaker signals between the MSR and PL in the frequency domain. A wideband acoustic contrast is presented for the hybrid system and the trade-off between the acoustic contrast, crossover frequency and reproduced bandwidth is discussed.

Section II explains the MSR layout and soundfield reproduction aliasing. Section III gives a brief overview of the PL directivity model used in this work. In Section IV a hybrid method is formulated for MSR and PL reproduction of personal sound zones, with results and discussion in Section V and conclusions in Section VI.

II. MULTIZONE SOUNDFIELD REPRODUCTION (MSR)

In this section a general MSR layout is described along with a description of a recent MSR technique. The aliasing which occurs from reproductions with spatial discretisation artifacts is also explained for later use in the hybrid model.

© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or

future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for

resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Fig. 1. MSR layout for a circular loudspeaker array (green) with a companion PL (red) for hybrid soundfield reproduction in $D_b$.

In this work, the acoustical brightness contrast between two zones, $D_b$ and $D_q$, is defined as

$$\zeta_R(k) = \frac{d_q \int_{D_b} |S^a_R(\mathbf{x}, k)|^2 \, d\mathbf{x}}{d_b \int_{D_q} |S^a_R(\mathbf{x}, k)|^2 \, d\mathbf{x}}, \tag{1}$$

where $d_b$ and $d_q$ are the areas (sizes) of $D_b$ and $D_q$, respectively. The mean square error (MSE) between the desired soundfield, $S^d(\mathbf{x}, k)$, and the actual reproduced soundfield, $S^a_R(\mathbf{x}, k)$, is [6], [20]

$$\epsilon_R(k) = \frac{\int_{D_b} |S^d(\mathbf{x}, k) - S^a_R(\mathbf{x}, k)|^2 \, d\mathbf{x}}{\int_{D_b} |S^d(\mathbf{x}, k)|^2 \, d\mathbf{x}}, \tag{2}$$

which is used to measure reproduction accuracy. These measures can be used for any actual soundfield, $S^a_R(\mathbf{x}, k)$, created with any reproduction technique, $R$, such as MSR, PL or any combination thereof.
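As a minimal sketch of how (1) and (2) might be evaluated numerically, the integrals can be approximated by sums over uniformly spaced sample points, in which case the area factors reduce to per-zone means. The function names and the sample pressures below are hypothetical and are not taken from the paper.

```python
import math

def acoustic_contrast_db(bright, quiet):
    """Eq. (1) on uniform samples: mean |S|^2 over the bright zone divided by
    mean |S|^2 over the quiet zone, expressed in dB."""
    mean_bright = sum(abs(s) ** 2 for s in bright) / len(bright)
    mean_quiet = sum(abs(s) ** 2 for s in quiet) / len(quiet)
    return 10.0 * math.log10(mean_bright / mean_quiet)

def mse_db(desired, actual):
    """Eq. (2) on uniform samples of the bright zone, expressed in dB."""
    err = sum(abs(d - a) ** 2 for d, a in zip(desired, actual))
    ref = sum(abs(d) ** 2 for d in desired)
    return 10.0 * math.log10(err / ref)

# Hypothetical complex pressures at a few sample points (illustration only).
desired_bright = [1 + 0j, 1j, -1 + 0j]
actual_bright = [0.9 * s for s in desired_bright]  # 10% amplitude error
actual_quiet = [0.01 + 0j, 0.01j]

contrast = acoustic_contrast_db(actual_bright, actual_quiet)
error = mse_db(desired_bright, actual_bright)
```

With these made-up values the 10% amplitude error corresponds to an MSE of −20 dB, and the contrast is roughly 39 dB.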

A. MSR Layout

The geometry of a generic MSR layout is depicted in Fig. 1 for a circular array with a companion PL. An MSR reproduction region, $D$, of radius $R$ is shown and contains three sub-regions called the bright, quiet and unattended zone, labelled $D_b$, $D_q$ and $D_u = D \setminus (D_b \cup D_q)$, respectively. The centre of $D$ is the origin from which other geometrical locations are related. The centres of $D_b$ and $D_q$ have radius and angle pair polar coordinates $(r_{zb}, \beta)$ and $(r_{zq}, \alpha)$, respectively. The radius of $D_b$ and $D_q$ is $r_b$ and $r_q$, respectively, and the direction of the soundfield within the regions is $\theta$ and $\vartheta$, respectively. The MSR loudspeaker arc has a centre located at $(R_l, \phi_c)$ and subtends an angle of $\phi_L$. The directional PL has a centre located at $(R_v, \psi_c)$ and is directed at an angle of $\psi$ clockwise from the origin. In practice, the PL is a circular array of transducers, with effective radius $d$, protruding normal to the reproduction plane. In this work, the imaginary unit is $i = \sqrt{-1}$ and the Euclidean norm is denoted with $\|\cdot\|$. The wavenumber $k = 2\pi f/c$ is interchanged with frequency, $f$, under the assumption that the speed of sound, $c$, is constant.

B. MSR Technique

An infinite set of planewaves arriving from every angle is capable of entirely describing any arbitrary desired soundfield [21]. A soundfield fulfilling the wave equation, in this work, is denoted by the function $S(\mathbf{x}, k)$, where $\mathbf{x} \in D$ is an arbitrary spatial sampling point. As shown in the orthogonal basis expansion approach [8], [20] to MSR, an additional spatial weighting function, $w(\mathbf{x})$, can be used to set relative importance between zones. The weighted MSR soundfield function used in this work can be written as

$$S(\mathbf{x}, k) = \sum_j P_j(k) F_j(\mathbf{x}, k), \tag{3}$$

where the orthogonal wavefields, $F_j(\mathbf{x}, k)$, have coefficients, $P_j(k)$, for a given weighting function and desired soundfield, $S^d(\mathbf{x}, k)$; and $j \in \{1, \ldots, J\}$ where $J$ is the number of basis planewaves [8].

The complex loudspeaker weights used to reproduce the soundfield in the (temporal) frequency domain are [22], [8]

$$U_l(k) = \sum_{\bar{m}=-M}^{M} \frac{2 e^{i\bar{m}\phi_l} \Delta\phi_s \sum_j P_j(k) \, i^{\bar{m}} e^{-i\bar{m}\rho_j}}{i\pi H^{(1)}_{\bar{m}}(kR_l)}, \tag{4}$$

where $\rho_j = (j-1)\Delta\rho$ are the wavefield angles, $\Delta\rho = 2\pi/J$, $\phi_l$ is the angle of the $l$th dynamic loudspeaker from 0°, $\Delta\phi_s$ is the angular spacing of the loudspeakers, $H^{(1)}_{\nu}(\cdot)$ is a $\nu$th-order Hankel function of the first kind and the modal truncation length [8] is

$$M = \lceil kR \rceil. \tag{5}$$

Here, $P_j$ is chosen to minimise the difference between the desired soundfield and the actual soundfield [8].

The actual soundfield from MSR is the result from superposition of all individual loudspeaker responses

$$S^a_{\mathrm{MSR}}(\mathbf{x}, k) = G_{\mathrm{MSR}}(k) \sum_l U_l(k) T(\mathbf{x}, \mathbf{l}_l, k), \tag{6}$$

where $G_{\mathrm{MSR}}(k)$ is introduced as an arbitrary weighting for hybrid soundfields (described later in IV-A), the loudspeaker's 2-D acoustic transfer function (ATF) is

$$T(\mathbf{x}, \mathbf{l}_l, k) = \frac{i}{4} H^{(1)}_0(k\|\mathbf{x} - \mathbf{l}_l\|), \tag{7}$$

and $\mathbf{l}_l$ is the position of the $l$th dynamic loudspeaker. Setting $G_{\mathrm{MSR}}(k) = 1$ in (6) will render the multizone soundfield.

C. Soundfield Reproduction Aliasing

A fundamental issue with reproducing soundfields using a limited number of loudspeakers is spatial aliasing which gives rise to grating lobes which may impede the quiet zone at higher frequencies [10]. Due to this phenomenon, the bandwidth of reproducible soundfields with high acoustic contrast (which may be lost above the aliasing frequency) is reduced. For a part-circle array, the minimum number of dynamic loudspeakers to use before aliasing problems begin to occur is given by [6], [8]

$$L \geq \frac{\phi_L(2M + 1)}{2\pi} + 1. \tag{8}$$

Substituting (5) into (8) and rearranging to find an approximation for the upper frequency limit, $k = k_u$, gives

$$k_u = \frac{2\pi(L - 1) - \phi_L}{2R'\phi_L}, \tag{9}$$

where, instead of $R$, $R'$ is used which is the radius of the smallest circle concentric with $D$ encompassing all zones. The upper frequency from (9) agrees with [10] and is dependent on the number of loudspeakers, the reproduction radius and the angle subtending the loudspeaker arc.
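The relations in (5), (8) and (9) can be checked numerically with a short sketch such as the following. It assumes $R' = r_{zb} + r_b = 0.9$ m for the geometry given in Section V (this value is inferred from the setup, not stated in the text), and the function names are illustrative only.

```python
import math

C = 343.0  # speed of sound in m/s, as used in Section V

def modal_truncation(f, radius, c=C):
    """Eq. (5): M = ceil(k R) with k = 2 pi f / c."""
    return math.ceil(2.0 * math.pi * f / c * radius)

def min_loudspeakers(f, radius, phi_L, c=C):
    """Eq. (8): smallest integer L with L >= phi_L (2M + 1) / (2 pi) + 1."""
    M = modal_truncation(f, radius, c)
    return math.ceil(phi_L * (2 * M + 1) / (2.0 * math.pi) + 1.0)

def aliasing_frequency(L, r_prime, phi_L, c=C):
    """Eq. (9) converted to Hz: f_u = k_u c / (2 pi)."""
    k_u = (2.0 * math.pi * (L - 1) - phi_L) / (2.0 * r_prime * phi_L)
    return k_u * c / (2.0 * math.pi)

R_PRIME = 0.6 + 0.3  # assumed R' = r_zb + r_b (metres)
PHI_L = math.pi      # 180 degree loudspeaker arc

for L in (16, 24, 32, 134):
    print(L, round(aliasing_frequency(L, R_PRIME, PHI_L)), "Hz")
```

Under these assumptions the upper frequencies come out at roughly 0.88 kHz, 1.4 kHz, 1.9 kHz and 8.1 kHz for $L$ = 16, 24, 32 and 134, and `min_loudspeakers(8000.0, R_PRIME, PHI_L)` gives 134, which is consistent with the alias-free case used for comparison in Section V.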

III. PARAMETRIC LOUDSPEAKER (PL)

A few PL directivity models are reviewed in this section as well as common disadvantages of PLs. The disadvantages are discussed with regard to speech soundfields, further motivating the use of a hybrid model for such applications.

A. Directivity Models

The literature provides a handful of directivity models for PLs which are algorithmic approximations of the pressure at different angles. Earlier models include Westervelt's directivity (WD) [13] and product directivity (PD) [23], [24], though these models do not accurately match measured directivity from a PL. Recently a convolutional directivity (CD) model, used in this work, was proposed [12], [25] utilising both WD and PD, which has better correlation to measured directivity.

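The actual soundfield reproduced by the PL, where the PL is located at $\mathbf{p}$, is defined in this work as

$$S^a_{\mathrm{PL}}(\mathbf{x}, k) = G_{\mathrm{PL}}(k) E(\mathbf{x}, k) D(\mathbf{x}, k) e^{ik\|\mathbf{x} - \mathbf{p}\|}, \tag{10}$$

where $G_{\mathrm{PL}}(k)$ is introduced as an arbitrary weighting for hybrid soundfields (described later in IV-A), $D(\mathbf{x}, k)$ is the CD and the directivity coefficient is

$$E(\mathbf{x}, k) = \frac{\tilde{\beta} k^2}{4\pi \tilde{\alpha}_s \tilde{\rho}_0 \|\mathbf{x} - \mathbf{p}\| c^2}, \tag{11}$$

where $\tilde{\beta}$ is the coefficient of non-linearity, $\tilde{\alpha}_s$ is the sum of the absorption coefficients for both primary frequencies and $\tilde{\rho}_0$ is the density of the medium.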

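The CD is defined as the convolution between the PD and WD with the linear convolution operator, $*$, as [12], [25]

$$D(\mathbf{x}, k) = [D_G(\mathbf{x}, k_c) D_G(\mathbf{x}, k_c + k)] * D_W(\mathbf{x}, k), \tag{12}$$

where $k_c$ is the ultrasonic carrier frequency, $D_G(\mathbf{x}, \hat{k})$ is the Gaussian directivity [24]

$$D_G(\mathbf{x}, \hat{k}) = e^{\left(\frac{i}{2} d \hat{k} \tan(\rho_x + \Psi)\right)^2}, \tag{13}$$

where $\rho_x$ is the angle of vector $\mathbf{x} - \mathbf{p}$ from 0°, $\Psi = (\psi + \psi_c - \pi)$ and WD is [25]

$$D_W(\mathbf{x}, k) = \frac{\tilde{\alpha}_s}{\sqrt{\tilde{\alpha}_s^2 + k^2 \tan^4(\rho_x + \Psi)}}. \tag{14}$$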

The far-field PL soundfield can then be found using (11) and (12) in (10) with $G_{\mathrm{PL}}(k) = 1$. However, as $k$ decreases $S^a_{\mathrm{PL}}(\mathbf{x}, k)$ approaches that of a point source and $\zeta_{\mathrm{PL}}(k)$ is consequently reduced. It is assumed in this work that the PL is designed such that grating lobes are negligible [26] and, for different virtual source locations, multiple steerable PL arrays can be used [15], [26].
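The directivity terms in (12) to (14) can be illustrated with a small numerical sketch. The angle argument stands for $\rho_x + \Psi$. The 1° angular grid, the truncation of the convolution to the grid length and the on-axis normalisation are assumptions made here for illustration and may differ from the procedure in [25].

```python
import math

def gaussian_directivity(angle, k_hat, d):
    """Eq. (13): exp(((i/2) d k_hat tan(angle))^2) = exp(-(d k_hat tan(angle) / 2)^2)."""
    return math.exp(-((0.5 * d * k_hat * math.tan(angle)) ** 2))

def westervelt_directivity(angle, k, alpha_s):
    """Eq. (14): alpha_s / sqrt(alpha_s^2 + k^2 tan^4(angle))."""
    return alpha_s / math.sqrt(alpha_s ** 2 + (k ** 2) * math.tan(angle) ** 4)

def convolution_directivity(k, k_c, d, alpha_s):
    """Eq. (12) on a 1 degree grid from -89 to 89 degrees, normalised on-axis.
    Grid, truncation and normalisation are assumptions of this sketch."""
    angles = [math.radians(a) for a in range(-89, 90)]
    centre = len(angles) // 2  # index of 0 degrees
    pd = [gaussian_directivity(a, k_c, d) * gaussian_directivity(a, k_c + k, d)
          for a in angles]
    wd = [westervelt_directivity(a, k, alpha_s) for a in angles]
    cd = []
    for i in range(len(angles)):
        acc = 0.0
        for j in range(len(angles)):
            m = i - j + centre  # shift so that index 'centre' means zero lag
            if 0 <= m < len(angles):
                acc += pd[j] * wd[m]
        cd.append(acc)
    return [v / cd[centre] for v in cd]

c = 343.0
k_c = 2.0 * math.pi * 40e3 / c  # 40 kHz carrier, as in Section V
k = 2.0 * math.pi * 4e3 / c     # 4 kHz audio component (example value)
cd = convolution_directivity(k, k_c, d=0.0618, alpha_s=2.328)
```

Here `cd[89]` is the on-axis value (equal to one after normalisation) and the entries on either side give the relative level at 1° steps off axis.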

B. PLs for Speech Soundfields

While PLs have been studied extensively over the years there are still some drawbacks when it comes to reproducing loud and clear audible sound. Audible reproductions from PLs are known to require a large carrier SPL (>110 dB) for typical speech conversation levels of ≈60 dBA, which has potential inadvertent health risks [2]. Fortunately, for applications of speech soundfields, high SPLs from the PL are not necessary for high frequency (≳2 kHz) components of speech [19]; further, harmonic distortions are lower above this frequency [16]. Taking into account the PL location so that the far-field demodulated audio [27] overlays $D_b$, and under the assumption that high SPL from the PL is not required over $D_b$, health risks from the PLs could be argued to be negligible.

IV. HYBRID MSR AND PL SYSTEM

A hybrid MSR and PL system is presented in this section for use in personal sound zone applications. A crossover filter is designed to switch target audio in the (temporal) frequency domain to each of the constituent reproduction techniques.

A. Crossover Filter Design

Ideally the combination of low and high frequency acoustic contrast from $S^a_{\mathrm{MSR}}(\mathbf{x}, k)$ and $S^a_{\mathrm{PL}}(\mathbf{x}, k)$, respectively, is desired for personal sound zones. The weightings, $G_{\mathrm{MSR}}(k)$ and $G_{\mathrm{PL}}(k)$, are introduced in (6) and (10), respectively, in order to facilitate a hybrid soundfield, $S^a_H(\mathbf{x}, k)$. When composing a hybrid soundfield it is natural to limit spectral distortion of the reproduction at the crossover frequency; for this, we propose the use of Linkwitz-Riley (LR) filters. Here, a low-pass $\hat{n}$th order LR filter with a roll-off of $6\hat{n}$ dB/octave is a cascaded Butterworth filter

$$H^q_{\mathrm{LR}}(k) = B_{\hat{n}/2}(k/k_u)^{-2}, \tag{15}$$

where $B_{\hat{n}/2}$ are Butterworth polynomials of order $\hat{n}/2$ and $k_u$ from (9) is suggested as the crossover frequency. The matching LR high-pass is

$$H^p_{\mathrm{LR}}(k) = B_{\hat{n}/2}(k_u/k)^{-2} \tag{16}$$

and together the crossover magnitude response is

$$\left| H^q_{\mathrm{LR}}(k) + H^p_{\mathrm{LR}}(k) \right| = 1. \tag{17}$$
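The flat summed response in (17) can be verified numerically with a sketch like the one below. It evaluates the normalised Butterworth polynomial from its left-half-plane roots at $s = ik/k_u$ for the low-pass and at $1/s$ for the high-pass. This is one common way to realise an LR crossover and is an assumption about how (15) and (16) are implemented. The crossover value `K_U` is an arbitrary example.

```python
import cmath
import math

def butterworth_poly(order, s):
    """Normalised Butterworth polynomial B_order(s), built from its
    left-half-plane roots on the unit circle."""
    value = 1.0 + 0.0j
    for m in range(1, order + 1):
        root = cmath.exp(1j * math.pi * (2 * m + order - 1) / (2 * order))
        value *= s - root
    return value

def lr_lowpass(k, k_u, n_hat=12):
    """Eq. (15): two cascaded Butterworth low-pass sections of order n_hat / 2."""
    s = 1j * k / k_u
    return butterworth_poly(n_hat // 2, s) ** -2

def lr_highpass(k, k_u, n_hat=12):
    """Eq. (16): the same prototype with s replaced by 1 / s (requires k > 0)."""
    s = 1j * k / k_u
    return butterworth_poly(n_hat // 2, 1.0 / s) ** -2

K_U = 16.0  # example crossover wavenumber in rad/m (hypothetical value)
for k in (0.5 * K_U, K_U, 4.0 * K_U):
    total = lr_lowpass(k, K_U) + lr_highpass(k, K_U)
    print(k, round(abs(total), 6))
```

For $\hat{n} = 12$ the two branches are in phase, each is 6 dB down at $k = k_u$, and the magnitude of their sum stays at one. For orders where $\hat{n}/2$ is odd, one branch would need its polarity inverted to obtain the same flat sum.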

In this work, the arbitrary MSR weighting is set to

$$G_{\mathrm{MSR}}(k) = H^q_{\mathrm{LR}}(k), \tag{18}$$

and the arbitrary PL weighting is

$$G_{\mathrm{PL}}(k) = H^p_{\mathrm{LR}}(k). \tag{19}$$

Using the new weights from (18) and (19) in (6) and (10), respectively, a hybrid, $H$, soundfield is defined as the superposition of a set of reproduction methods, $\mathcal{R}$ (in this work the cardinality of $\mathcal{R}$ is 2), as

$$S^a_H(\mathbf{x}, k) = \sum_{R \in \mathcal{R}} \frac{d_b |G_R(k)| S^a_R(\mathbf{x}, k)}{\int_{D_b} |S^a_R(\mathbf{x}, k)| \, d\mathbf{x}}, \tag{20}$$

where each component soundfield is normalised to the mean amplitude over $D_b$. $\zeta_R(k)$ and $\epsilon_R(k)$ can be evaluated using $S^a_H(\mathbf{x}, k)$ in place of $S^a_R(\mathbf{x}, k)$ in (1) and (2), respectively.
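A sampled version of (20) might look like the following sketch, where the integral over $D_b$ is approximated by a mean over bright-zone sample points. The dictionary layout, the function name and the pressure values are hypothetical, and the gains would in practice come from (18) and (19).

```python
def hybrid_soundfield(fields, gains, bright_idx):
    """Eq. (20) at one frequency: scale each component field by |G_R| divided by
    its mean amplitude over the bright-zone samples, then superpose."""
    n_points = len(next(iter(fields.values())))
    hybrid = [0j] * n_points
    for name, field in fields.items():
        mean_amp = sum(abs(field[i]) for i in bright_idx) / len(bright_idx)
        scale = abs(gains[name]) / mean_amp
        for i in range(n_points):
            hybrid[i] += scale * field[i]
    return hybrid

# Hypothetical pressures at three points: two in the bright zone, one in the quiet zone.
fields = {
    "MSR": [2 + 0j, 2j, 0.02 + 0j],
    "PL": [0.5 + 0j, 0.5j, 0.05 + 0j],
}
gains = {"MSR": 1.0, "PL": 0.0}  # well below the crossover only MSR contributes
bright_idx = [0, 1]

hybrid = hybrid_soundfield(fields, gains, bright_idx)
```

The normalisation means that each method contributes with unit mean amplitude over the bright zone before the crossover gains are applied, so the two methods are level-matched at the crossover.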

B. Loudspeaker Signals

The time domain loudspeaker signals (unmodulated for a PL) are defined in general in this section for the reproduction of speech input signals, $y(n)$. The discrete Fourier transform of the $g$th overlapping windowed frame of $y(n)$ is $\tilde{Y}_g(k)$. The overlapping windowed frame of each loudspeaker signal is

$$\tilde{Q}_{Rlg}(k) = \tilde{Y}_g(k) G_R(k) U_l(k), \tag{21}$$

$$\tilde{q}_{Rlg}(n) = \frac{1}{K} \sum_{m=0}^{K-1} \tilde{Q}_{Rlg}(k_m \hat{f}) e^{icnk_m}, \tag{22}$$

where $k_m \triangleq 2\pi m/cK$, the number of frequencies is $K$, the maximum frequency is $\hat{f}$ and each loudspeaker signal, $q_{Rl}(n)$, for a particular $R$, is reconstructed by performing overlap-add reconstruction with the synthesis window on $\tilde{q}_{Rlg}(n)$. For the case where there is a single loudspeaker, $l = \{1\}$, for a given $R$, such as for the PL in this work, $U_l(k) = 1$ is used.
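The frame-based processing in (21) and (22) can be sketched with a naive DFT and overlap-add as below. The square-root Hann analysis and synthesis windows, the 50% overlap and the short frame length are assumptions chosen for illustration, since the window is not specified in this section. The per-bin `weights` stand for $G_R(k)U_l(k)$ sampled on the DFT grid.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform."""
    K = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * m * n / K) for n in range(K))
            for m in range(K)]

def idft(X):
    """Naive inverse DFT, matching the 1/K sum in Eq. (22)."""
    K = len(X)
    return [sum(X[m] * cmath.exp(2j * math.pi * m * n / K) for m in range(K)) / K
            for n in range(K)]

def loudspeaker_signal(y, weights, frame_len=8):
    """Weight each windowed frame in the frequency domain (Eq. (21)), transform
    back (Eq. (22)) and overlap-add with a synthesis window."""
    hop = frame_len // 2
    # Square-root periodic Hann: analysis * synthesis windows sum to one at 50% overlap.
    win = [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len))
           for n in range(frame_len)]
    out = [0.0] * len(y)
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = [win[n] * y[start + n] for n in range(frame_len)]
        shaped = [Y * w for Y, w in zip(dft(frame), weights)]
        frame_out = idft(shaped)
        for n in range(frame_len):
            # Real part only: weights are assumed conjugate-symmetric.
            out[start + n] += win[n] * frame_out[n].real
    return out

y = [math.sin(0.3 * n) for n in range(32)]  # stand-in for a speech signal
flat = [1.0] * 8                            # G_R(k) U_l(k) = 1 in every bin
q = loudspeaker_signal(y, flat)
```

With flat weights the interior samples of `q` reproduce `y`, which is a quick sanity check that the window pair and hop size give perfect reconstruction before frequency-dependent weights are applied.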

V. RESULTS AND DISCUSSION

A. Experimental Setup

Simulations were carried out using the geometry shown in Fig. 1 with $r_{zb} = r_{zq} = 0.6$ m, $r_b = r_q = 0.3$ m, $R = 1.0$ m and $\alpha = \beta/3 = 90^\circ$. The desired soundfield angle was $\theta = 0^\circ$ and in this work $w(\mathbf{x})$ was set to one in $D_b$, 100 in $D_q$ and 0.05 in $D_u$ based on [8], [11], [20]. The target soundfield in $D_b$ was a virtual point source located at the centre of the PL and $D_q$ was set to be quiet. The loudspeakers had $R_l = R_v = 1.3$ m, $\phi_L = 180^\circ$, $\phi_c = 180^\circ$ and $\psi = \psi_c - 180^\circ = 27.5^\circ$. The speed of sound in air was $c = 343$ m s$^{-1}$.

The PL was designed with $k_c = 2\pi(40\ \mathrm{kHz})/c$, $\tilde{\beta} = 1.2$, $\tilde{\alpha}_s = 2.328$ m$^{-1}$, $\tilde{\rho}_0 = 1.225$ kg m$^{-3}$ and $d = 6.18$ cm. In this work, it was assumed that the PL had ultrasonic transducer spacing less than 4.3 cm [26], thus avoiding spatial aliasing. The LR filters used to reproduce $S^a_{\mathrm{MSR}}(\mathbf{x}, k)$ and $S^a_{\mathrm{PL}}(\mathbf{x}, k)$ had order $\hat{n} = 12$. The number of MSR loudspeakers used was $L = \{16, 24, 32, 134\}$ where $k_u$ was found from (9). To compare with MSR, $L = 134$ was chosen to reproduce the speech with no spatial aliasing. The hybrid reproduction method used $\mathcal{R} = \{\mathrm{MSR}, \mathrm{PL}\}$ to find $S^a_H(\mathbf{x}, k)$ using (20).

B. Wideband Spatial Error Reduction

Figure 2 shows $\epsilon_{\mathrm{MSR}}(k)$, $\epsilon_{\mathrm{PL}}(k)$ and $\epsilon_H(k)$ computed from (2) in (E)–(H) as dashed green, dashed red and solid blue lines, respectively. The crossover frequencies are the vertical dash-dot black lines. Comparing the proposed hybrid approach, it can be seen in Fig. 2 that $\epsilon_H$ was on average similar to the aliasing free MSR. Table I confirms this by showing that, on average, $\epsilon_H$ was slightly less than $\epsilon_{\mathrm{MSR}}$. While this was partly due to the low MSE of PL at lower frequencies, acoustic contrast was also reduced when using a PL at those lower frequencies as seen in Fig. 2 (A)–(D). The trade-off between MSE and acoustic contrast is shown in Table I where $\epsilon_H$ reduces with $L$.

TABLE I
WIDEBAND MEAN $\epsilon_R$ AND $\zeta_R$ COMPARISONS AS A FUNCTION OF THE NUMBER OF DYNAMIC LOUDSPEAKERS ($L$) FOR ONE PL

  L   |  $\epsilon_R$ (dB)         |  $\zeta_R$ (dB)
      |  MSR     PL      H      |  MSR    PL     H
  16  |  −27.2   −40.7   −32.5  |  30.0   40.4   54.2
  24  |  −32.7   −40.7   −31.7  |  38.1   40.4   58.1
  32  |  −33.7   −40.7   −31.6  |  43.5   40.4   60.3
  134 |  −36.4   −40.7   −35.6  |  79.6   40.4   79.3

C. Wideband Acoustic Contrast Improvement

Figure 2 shows $\zeta_{\mathrm{MSR}}(k)$, $\zeta_{\mathrm{PL}}(k)$ and $\zeta_H(k)$, computed from (1), in (A)–(D) as dashed green, dashed red and solid blue lines, respectively. The crossover frequencies are the vertical dash-dot black lines which clearly indicate the point where $\zeta_{\mathrm{MSR}}(k)$ begins to decrease due to spatial aliasing. Note that the multizone occlusion problem [1], [11] (should it occur) may be difficult to overcome with one PL; however, the MSR grating lobes interfere less over $D_q$ during this phenomenon.

Also shown in Fig. 2 is the limited bandwidth with high acoustic contrast when reducing $L$. The mean acoustic contrast over the wideband bandwidth for all reproduction techniques is given in Table I and the mean improvement using the hybrid method can be deduced. While the MSR mean acoustic contrast decreased significantly, from 79.6 dB to 30.0 dB, due to spatial aliasing, the proposed hybrid method decreased to only 54.2 dB. For all reduced loudspeaker cases the hybrid approach outperformed both MSR and PL methods. The maximum improvement was 24.2 dB when $L = 16$ and for all cases the wideband acoustic contrast remained above 54.2 dB, despite the fundamental spatial aliasing that occurred.
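The improvements quoted above follow directly from the acoustic contrast columns of Table I. The short sketch below reproduces the arithmetic. The variable names are illustrative, and the values are copied from the table.

```python
# Mean wideband acoustic contrast in dB from Table I: L -> (MSR, PL, hybrid).
TABLE_I_CONTRAST = {
    16: (30.0, 40.4, 54.2),
    24: (38.1, 40.4, 58.1),
    32: (43.5, 40.4, 60.3),
    134: (79.6, 40.4, 79.3),
}

# Improvement of the hybrid method over each non-hybrid method, per array size.
gain_over_msr = {L: round(h - msr, 1) for L, (msr, pl, h) in TABLE_I_CONTRAST.items()}
gain_over_pl = {L: round(h - pl, 1) for L, (msr, pl, h) in TABLE_I_CONTRAST.items()}

reduced = (16, 24, 32)  # arrays whose aliasing frequency lies below 8 kHz
best_over_msr = max(gain_over_msr[L] for L in reduced)
best_over_pl = max(gain_over_pl[L] for L in reduced)
print(best_over_msr, best_over_pl)
```

The largest gain over MSR occurs at $L = 16$ (24.2 dB) and the largest gain over the PL alone occurs at $L = 32$ (19.9 dB).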

VI. CONCLUSIONS

This paper has proposed a hybrid approach to personal sound zones, including speech soundfields. An analytical solution to the combination of MSR and PL soundfields is presented along with a solution to a robust crossover filter. The crossover filter is analytically derived from the geometry of the soundfield layout whilst taking into account spatial aliasing artifacts. Experimental results show that a significant improvement in acoustic contrast from non-hybrid MSR and PL soundfields of 24.2 dB and 19.9 dB, respectively, is achievable. The proposed hybrid method also yields mean wideband acoustic contrast consistently above 54.2 dB with as few as 16 dynamic loudspeakers and a single PL. Some topics for future work are improving speech intelligibility contrast (SIC) and quality in private speech sound zones using hybrid techniques.

Fig. 2. Results are shown for three reproduction methods and four $L$. Acoustic contrast results ($\zeta_{\mathrm{MSR}}$, $\zeta_{\mathrm{PL}}$ and $\zeta_H$) are shown in (A)–(D). Mean squared error results ($\epsilon_{\mathrm{MSR}}$, $\epsilon_{\mathrm{PL}}$ and $\epsilon_H$) are shown in (E)–(H). The case where $L = 134$ is alias free up to 8 kHz. [Figure title: Acoustic Contrast and Mean Squared Error for Reproduction Methods. Panels (A)/(E): $L = 16$; (B)/(F): $L = 24$; (C)/(G): $L = 32$; (D)/(H): $L = 134$. Horizontal axis: Frequency (kHz), 0.1 to 8. Vertical axes: Acoustic Contrast (dB), 0 to 140, and Mean Squared Error (dB), −50 to 0. Legend: MSR, PL, H, $k_u$.]

REFERENCES

[1] T. Betlehem, W. Zhang, M. Poletti, and T. D. Abhayapala, "Personal Sound Zones: Delivering interface-free audio to multiple listeners," IEEE Signal Process. Mag., vol. 32, pp. 81–91, 2015.
[2] W.-S. Gan, J. Yang, and T. Kamakura, "A review of parametric acoustic array in air," Appl. Acoust., vol. 73, no. 12, pp. 1211–1219, Dec. 2012.
[3] W. F. Druyvesteyn and J. Garas, "Personal sound," J. Audio Eng. Soc., vol. 45, no. 9, pp. 685–701, 1997.
[4] J.-W. Choi and Y.-H. Kim, "Generation of an acoustically bright zone with an illuminated region using multiple sources," J. Acoust. Soc. Am., vol. 111, no. 4, pp. 1695–1700, 2002.
[5] M. Poletti, "An investigation of 2-D multizone surround sound systems," in Proc. 125th Conv. Audio Eng. Soc. Audio Eng. Soc., 2008, pp. 1–9.
[6] Y. J. Wu and T. D. Abhayapala, "Spatial multizone soundfield reproduction: Theory and design," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, pp. 1711–1720, 2011.
[7] P. Coleman, P. Jackson, M. Olik, and J. A. Pedersen, "Personal audio with a planar bright zone," J. Acoust. Soc. Am., vol. 136, pp. 1725–1735, 2014.
[8] W. Jin, W. B. Kleijn, and D. Virette, "Multizone soundfield reproduction using orthogonal basis expansion," in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP). IEEE, 2013, pp. 311–315.
[9] S. Spors, H. Wierstorf, A. Raake, F. Melchior, M. Frank, and F. Zotter, "Spatial sound with loudspeakers and its perception: A review of the current state," Proc. IEEE, vol. 101, no. 9, pp. 1920–1938, 2013.
[10] F. Winter, J. Ahrens, and S. Spors, "On Analytic Methods for 2.5-D Local Sound Field Synthesis Using Circular Distributions of Secondary Sources," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 5, pp. 914–926, May 2016.
[11] J. Donley, C. Ritz, and W. B. Kleijn, "Improving speech privacy in personal sound zones," in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP). IEEE, 2016, pp. 311–315.
[12] C. Shi, Y. Kajikawa, and W.-S. Gan, "An overview of directivity control methods of the parametric array loudspeaker," APSIPA Trans. Signal Inform. Process., pp. 1–30, Dec. 2014.
[13] P. J. Westervelt, "Parametric acoustic array," J. Acoust. Soc. Am., vol. 35, no. 4, pp. 535–537, 1963.
[14] Y. Sugibayashi, S. Kurimoto, D. Ikefuji, M. Morise, and T. Nishiura, "Three-dimensional acoustic sound field reproduction based on hybrid combination of multiple parametric loudspeakers and electrodynamic subwoofer," Appl. Acoust., vol. 73, no. 12, pp. 1282–1288, Dec. 2012.
[15] C. Shi, E.-L. Tan, and W.-S. Gan, "Hybrid immersive three-dimensional sound reproduction system with steerable parametric loudspeakers," in Proc. Meetings Acoust., vol. 19, 2013, pp. 1–6.
[16] C. Shi and Y. Kajikawa, "A comparative study of preprocessing methods in the parametric loudspeaker," in Asia-Pacific Signal & Inform. Process. Assoc. Annu. Summit and Conf. (APSIPA ASC). IEEE, 2014, pp. 1–5.
[17] J. Donley and C. Ritz, "An efficient approach to dynamically weighted multizone wideband reproduction of speech soundfields," in China Summit & Int. Conf. Signal and Inform. Process. (ChinaSIP). IEEE, 2015, pp. 60–64.
[18] J. Donley and C. Ritz, "Multizone reproduction of speech soundfields: A perceptually weighted approach," in Asia-Pacific Signal & Inform. Process. Assoc. Annu. Summit and Conf. (APSIPA ASC). IEEE, 2015, pp. 342–345.
[19] Artificial voices. ITU-T Standard P.50, 1999.
[20] W. Jin and W. B. Kleijn, "Theory and design of multizone soundfield reproduction using sparse methods," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, pp. 2343–2355, 2015.
[21] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press, 1999.
[22] Y. J. Wu and T. D. Abhayapala, "Theory and design of soundfield reproduction using continuous loudspeaker concept," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, pp. 107–116, 2009.
[23] H. O. Berktay and D. J. Leahy, "Farfield performance of parametric transmitters," J. Acoust. Soc. Am., vol. 55, no. 3, pp. 539–546, 1974.
[24] C. Shi and W.-S. Gan, "Product directivity models for parametric loudspeakers," J. Acoust. Soc. Am., vol. 131, no. 3, pp. 1938–1945, 2012.
[25] C. Shi and Y. Kajikawa, "A convolution model for computing the far-field directivity of a parametric loudspeaker array," J. Acoust. Soc. Am., vol. 137, no. 2, pp. 777–784, 2015.
[26] C. Shi and W.-S. Gan, "Grating lobe elimination in steerable parametric loudspeaker," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 2, pp. 437–450, 2011.
[27] F. Farias and W. Abdulla, "On Rayleigh distance and absorption length of parametric loudspeakers," in Asia-Pacific Signal & Inform. Process. Assoc. Annu. Summit and Conf. (APSIPA ASC). IEEE, 2015, pp. 1262–1265.