
ACTIVE SPEECH CONTROL USING WAVE-DOMAIN PROCESSING WITH A LINEAR WALL OF DIPOLE SECONDARY SOURCES

Jacob Donley⋆, Christian Ritz⋆ and W. Bastiaan Kleijn†

⋆School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Australia
†School of Engineering and Computer Science, Victoria University of Wellington, New Zealand

ABSTRACT

In this paper, we investigate the effects of compensating for wave-domain filtering delay in an active speech control system. An active control system utilising wave-domain processed basis functions is evaluated for a linear array of dipole secondary sources. The target control soundfield is matched in a least squares sense using orthogonal wavefields to a predicted future target soundfield. Filtering is implemented using a block-based short-time signal processing approach which induces an inherent delay. We present an autoregressive method for predictively compensating for the filter delay. An approach to block-length choice that maximises the soundfield control is proposed for a trade-off between soundfield reproduction accuracy and prediction accuracy. Results show that block-length choice has a significant effect on the active suppression of speech.

Index Terms— spatial audio, personal sound, active noise control, noise barrier, delay compensation, speech emission control.

1. INTRODUCTION

Personal sound [1] has been a topic of great interest to researchers in recent years. Spatial regions of controlled sound can be created using loudspeaker arrays, and superposition of soundwaves can be used to actively control sound over space [2]. Active Noise Control (ANC) is a technique that allows secondary sources in electro-acoustic systems to reproduce destructive soundfields, thus reducing the energy levels of primary soundfields. The resultant suppressed soundfields have been successfully employed in several applications, including noise-cancelling headphones [3] and ANC in vehicle cabins [4, 5, 6]. Offices, libraries, teleconferencing rooms, restaurants and cafes may also benefit from ANC over broad spatial areas where physical partitions could be replaced with an active loudspeaker array.

ANC systems typically comprise a reference signal and/or an error signal, which are fed forward and/or backward, respectively, to an algorithm for generating loudspeaker signals [2]. Hybrid systems exist that incorporate both feedforward and feedback techniques [7, 8]. Least Mean Squares (LMS) and Filtered-x LMS (FxLMS) control methods work by adaptively minimising the error signal in a least squares sense [9, 10]. Multichannel systems with numerous microphones inside, or near, the control space often use adaptive algorithms to minimise the error over the region [10, 11].

More recent techniques have been shown to be more accurate by measuring acoustic pressures on boundaries and using the Kirchhoff-Helmholtz integral to determine the soundfield [12, 13, 14]. Sampling the boundary that encloses the space with microphones allows the target soundfield to be estimated in the wave-domain. This extends the multipoint method by synthesising the entire spatial area and minimising the error over large spaces [13, 14].

In order to perform wave-domain analysis it is necessary to transform received signals into the (temporal) frequency domain, where basis functions are a function of the wavenumber and spatial locations [12, 15]. This transformation induces a delay, as numerous samples are required to analyse the signal with high resolution in the frequency domain. Adaptive algorithms overcome this issue by automatically compensating for any errors received at the error microphones [9, 13, 14]. In scenarios where microphones are not placed inside the control region, it is necessary to account for delay by other means. Linear prediction with pitch repetition has been shown to be viable for active speech cancellation with short predictions, up to 2 ms, and at discrete points in a space [16]. However, the predictions do not predict a regular speech frame of length around 16 ms, and cancellation occurs only in the vicinity of the control points.

The active control of sound over a linear array has been envisioned [17] using interconnected control units consisting of a microphone, directional loudspeaker and processing modules. However, the interconnection and modules do not model the received signals on the boundary in the wave-domain and perform only a phase inversion, which is less robust to soundfield variation. Linear arrays [18] have also been investigated for the improvement of noise barriers [19, 20], which aim to reduce diffraction of sound over a physical barrier by minimising the pressure at points in space, usually modelled on a plane spanning height and width. The use of linear arrays, without a physical barrier, for control over large spatial areas using recently advanced wave-domain processing is explored in this work.

As a baseline study, we analyse the delay caused by transforming reference ANC signals to the wave-domain using a block-based signal processing approach. We propose an autoregressive transform-delay compensator in conjunction with an inverse filter that together produce a virtual source soundfield used in wavefield decomposition to minimise the energy residual of a control soundfield. Through analysis of the soundfield suppression we show that an optimal block-length can be chosen for active speech control using wave-domain filtering without error microphones in the control region. The optimal block-length is used in a simulated acoustic environment with dipole secondary sources in a linear array. Acting as an active wall, we show that the optimal block-length, along with the dipole sources, provides significant cancellation of traversing speech waves with minimal reproduction towards the primary source.

A description of the error-minimised control soundfield synthesis using basis wavefields is given in Section 2. An explanation of dipole-modelled soundfield reproduction using synthesised loudspeaker weights is given in Section 3. The short-time block-based signal processing approach with autoregressive and geometric delay compensation is presented in Section 4, with results, analysis, discussion and conclusions in Sections 5 and 6.

© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Fig. 1. Active control layout for a linear dipole array (blue) directed to the right. The microphone (red) is used to predict the unwanted speech source crossing the array.

2. WAVE-DOMAIN SOUNDFIELD SUPPRESSION

This section derives an expression for loudspeaker weights which reproduce a soundfield that minimises the residual energy over a control region, $D_c$. The active control layout and the wave-domain solution to minimise residual energy are described.

2.1. Active Control Layout and Definitions

The proposed system using a linear dipole array is shown in Fig. 1, where the loudspeakers form an active wall between a talker and a target quiet zone. The reproduction region for the soundfield, $D$, with spatial sampling points $\mathbf{x} \in D$, has a radius of $R_D$ and contains a control subregion, $D_c \subseteq D$, of radius $r_c$. The centre of the loudspeaker array is located at angle ${}^{s}\phi$ and distance ${}^{s}R$. The length of the loudspeaker array is ${}^{s}D$ and it is designed to reproduce a soundfield for a virtual point source located at $\mathbf{v}$. In this work we refer to the external source that is to be controlled as the talker, with location $\mathbf{t} \equiv \mathbf{v} \equiv (r_t, \theta_t)$. We assume $\mathbf{t}$ is known, or can be reliably estimated with multiple microphones; thus a single reference microphone suffices and is placed at the centre of the loudspeaker array with location $\mathbf{z} \equiv ({}^{s}R, {}^{s}\phi)$. Loudspeaker locations are $\mathbf{l}_l \equiv (r_l, \phi_l)$ for $l \in \llbracket {}^{s}L \rrbracket$, where ${}^{s}L$ is the number of loudspeakers, $k = 2\pi f/c$ is the wavenumber and $c$ is the speed of sound in air. The Euclidean norm is denoted using $\|\cdot\|$, $i = \sqrt{-1}$ and sets of indices are $\llbracket A \rrbracket \triangleq \{x : x \in \mathbb{N}_0,\, x < A\}$.

2.2. Soundfield Control Technique

The goal is to find coefficients for a set of basis functions that minimise the residual energy of a control soundfield, $S_c(\mathbf{x};k)$, and an arbitrary talker soundfield, $S_t(\mathbf{x};k)$. A simple solution is to perform an orthogonalisation on a set of plane-wave basis functions, which produces a well-conditioned triangular matrix and a set of orthogonal basis functions. Expansion coefficients for the orthogonal basis functions can then be easily solved with an inner product.

Any arbitrary soundfield can be completely defined by an orthogonal set of solutions of the Helmholtz equation [21]. An arbitrary 2D soundfield function that satisfies the wave equation, such as $S_c(\mathbf{x};k) : D \times \mathbb{R} \to \mathbb{C}$, can be written as

$$S_c(\mathbf{x};k) = \sum_{g \in \llbracket G \rrbracket} E_{g,m} F_g(\mathbf{x};k), \qquad (1)$$

where $\{F_g\}_{g \in \llbracket G \rrbracket}$ is the set of orthogonal basis functions, $m \in \llbracket N \rrbracket$ are $N$ frequency indices, the expansion coefficients for a particular frequency are $E_{g,m}$ and $G$ is the number of basis functions [22]. Solving the inner product $E_{g,m} = \langle S_t(\mathbf{x};k), F_g(\mathbf{x};k) \rangle$ yields the $E_{g,m}$ that minimise

$$\min_{E_{g,m},\; g \in \llbracket G \rrbracket,\; m \in \llbracket N \rrbracket} \Big\| \sum_{g} E_{g,m} F_g(\mathbf{x};k) + S_t(\mathbf{x};k) \Big\|^2, \qquad (2)$$

where $\|X\|^2 = \langle X, X \rangle$. The set of orthogonal basis functions, $\{F_g\}_{g \in \llbracket G \rrbracket}$, can be found by implementing an orthogonalisation on a set of plane waves, $P_h(\mathbf{x};k) = e^{ik\mathbf{x} \cdot \boldsymbol{\rho}_h}$, where $\boldsymbol{\rho}_h \equiv (1, \rho_h)$, $\rho_h = (h-1)\Delta\rho$ and $\Delta\rho = 2\pi/G$. A Gram-Schmidt process gives the orthogonalised basis functions, which results in [22]

$$F_g(\mathbf{x};k) = \sum_{h \in \llbracket G \rrbracket} R_{hg,m} P_h(\mathbf{x};k), \qquad (3)$$

such that $\langle F_i(\mathbf{x};k), F_j(\mathbf{x};k) \rangle = \delta_{ij}$, where $R_{hg}$ is the $(h,g)$th element of the lower triangular matrix, $\mathbf{R}$. Substituting (3) in (1) yields

$$S_c(\mathbf{x};k) = \sum_{h \in \llbracket G \rrbracket} Q_{h,m} P_h(\mathbf{x};k), \qquad (4)$$

where $Q_{h,m} = \sum_{g \in \llbracket G \rrbracket} E_{g,m} R_{hg,m}$ are the plane-wave coefficients used to construct an approximation of the control soundfield.
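The orthogonalisation in (3) and the inner-product solution for the expansion coefficients can be sketched numerically. The following is an illustration, not the paper's implementation: it represents the region by discrete sample points, so that inner products become vector dot products, and uses a reduced QR factorisation as a numerically stable Gram-Schmidt.

```python
import numpy as np

def orthogonal_basis(points, k, G):
    """Plane-wave basis P_h sampled at 2D points, orthogonalised via QR
    (a numerically stable Gram-Schmidt); R is the triangular matrix."""
    angles = np.arange(G) * 2 * np.pi / G            # Delta_rho = 2*pi/G
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    P = np.exp(1j * k * points @ dirs.T)             # columns are P_h(x; k)
    Q, R = np.linalg.qr(P)                           # Q columns: orthonormal F_g
    return P, Q, R

rng = np.random.default_rng(0)
points = rng.uniform(-1, 1, (500, 2))                # discrete samples of D
k, G = 2 * np.pi * 500 / 343, 16                     # wavenumber at 500 Hz
P, Q, R = orthogonal_basis(points, k, G)

# Target field: an arbitrary combination of the plane waves (stand-in for S_t).
St = P @ rng.standard_normal(G)

E = Q.conj().T @ St                                  # inner-product coefficients
Sc = -(Q @ E)                                        # control field cancelling S_t
residual = np.linalg.norm(St + Sc) / np.linalg.norm(St)
```

Because the target field here lies in the span of the plane waves, the residual of (2) vanishes to machine precision; for a field outside the span, the same projection gives the least-squares optimum.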

3. LOUDSPEAKER WEIGHTS

In this section, the loudspeaker signals needed for soundfield reproduction with monopole and dipole sources are described.

3.1. Monopole Secondary Source Weights

To reproduce $S_c(\mathbf{x};k)$ with minimal error to $S_t(\mathbf{x};k)$, loudspeaker weights are found in the (temporal) frequency domain [23, 24, 25]

$$Q_l(k) = \frac{2\Delta\phi_s}{i\pi} \sum_{\check{m} = -\check{M}}^{\check{M}} \sum_{h \in \llbracket G \rrbracket} \frac{i^{\check{m}}\, e^{i\check{m}(\phi_l - \rho_h)}}{H^{(1)}_{\check{m}}(r_l k)}\, Q_{h,m}, \qquad (5)$$

where $\Delta\phi_s = 2\tan^{-1}({}^{s}D / 2{}^{s}R)\, /\, {}^{s}L$ approximates the angular spacing of the $\mathbf{l}_l$ for a linear array, $H^{(1)}_\nu(\cdot)$ is a $\nu$th-order Hankel function of the first kind and $\check{M} = \lceil kR_D \rceil$ is the modal truncation length [24]. However, monopole sources produce acoustic energy in all directions, which may be undesirable as it would present an artificial echo towards $\mathbf{t}$.

3.2. Dipole Secondary Source Weights

To reproduce a soundfield with reduced acoustic energy presented towards the talker, dipole sources are modelled to reproduce predominantly over $D$. The loudspeakers at $\mathbf{l}_l$ with weights $Q_l(k)$ are split into two point sources at $\mathbf{l}_{l,s}$ for $s \in \llbracket 2 \rrbracket$ with weights $Q_{l,s}(k)$. The dipole source pair locations are given by

$$\mathbf{l}_{l,s} = \mathbf{l}_l + \big(\ddot{d}/2,\; {}^{s}\phi - s\pi\big), \qquad (6)$$

where $\ddot{d}$ is the distance between the dipole point sources. The objective of each dipole source pair is to reproduce a wave which constructs in the direction $(1, {}^{s}\phi - \pi)$ from $\mathbf{l}_l$ and de-constructs in the direction $(1, {}^{s}\phi)$ from $\mathbf{l}_l$, whilst maintaining the same amplitude and phase as a monopole source in the constructive direction. This can be accomplished by phase shifting and amplitude panning the monopole loudspeaker weights with the following [21, 26]

$$Q_{l,s}(k) \triangleq \frac{Q_l(k)\, e^{i(-1)^s (k\ddot{d} - \pi)/2}}{2k\ddot{d}}, \qquad (7)$$

where as $\ddot{d}$ becomes small, the $\mathbf{l}_{l,s}$ approach ideal dipole sources.

4. SHORT-TIME SIGNAL PROCESSING

In order to reproduce a control soundfield, a time-domain control signal is filtered using $Q_{l,s}(k)$ in the (temporal) frequency domain and inverse transformed back to the time-domain to yield the set of loudspeaker signals. Here, a block-based approach is used. This section investigates the inherent time delay that is induced during the filtering process due to the wave-domain transformation used to compute the loudspeaker weights of (7).

4.1. Block Processing

An input signal, $v(n)$, broken into blocks (frames) using an analysis windowing function, $w(n)$, of length $M$, results in the $a$th windowed frame:

$$\mathring{v}_a(n) \triangleq v(n + aR)\, w(n), \qquad (8)$$

where $n \in \mathbb{Z}$ is the sample number in time, $a \in \mathbb{Z}$ is the frame index and $R \le M$ is the step size in samples. The $a$th frame is transformed to the frequency domain to give the $a$th spectral frame as $\mathring{V}_a(k_m) = \sum_{n \in \llbracket N \rrbracket} \mathring{v}_a(n)\, e^{-icnk_m/2\dot{f}}$, where $k_m \triangleq 2\pi \dot{f} m / cN$ and the frame is oversampled with $N \ge M + L - 1$ for a filter length $L$.

Each spectral frame is filtered using $Q_{l,s}(k)$ from (7) up to the maximum frequency, $\dot{f}$, and inverse transformed to the time-domain

$$\mathring{q}_{a,l,s}(n) = \Re\Big\{ \frac{1}{N} \sum_{m \in \llbracket N \rrbracket} Q_{l,s}(k_m)\, \mathring{V}_a(k_m)\, e^{icnk_m/2\dot{f}} \Big\}, \qquad (9)$$

$\forall n \in \llbracket N \rrbracket$, where $\Re\{\cdot\}$ returns the real part of its argument, after which a synthesis window, $w(n)$, equivalent to the analysis window, is applied to yield the weighted output

$$q^w_{a,l,s}(n) = \mathring{q}_{a,l,s}(n - aR)\, w(n - aR). \qquad (10)$$

The weighted output, $q^w_{a,l,s}(n)$, is added to the accumulated output signal, $q_{l,s}(n)$, for each dipole source. The analysis and synthesis windows are chosen so that $\sum_{a \in \mathbb{Z}} w(n - aR)^2 = 1,\ \forall n \in \mathbb{Z}$.
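The overlap-add chain of (8)-(10) can be sketched with an identity filter standing in for $Q_{l,s}(k)$. A square-root Hann analysis/synthesis pair at 50% overlap satisfies the stated condition $\sum_a w(n - aR)^2 = 1$, giving perfect reconstruction away from the signal edges (a minimal sketch, not the full wave-domain filter):

```python
import numpy as np

M = 256                 # block length
R = M // 2              # 50% overlap step size
n = np.arange(M)
w = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / M))   # square-root (periodic) Hann

rng = np.random.default_rng(1)
v = rng.standard_normal(4 * M)                       # input signal v(n)

q = np.zeros_like(v)
for a in range((len(v) - M) // R + 1):
    frame = v[a * R : a * R + M] * w                 # analysis window, eq. (8)
    # identity "filter": transform and inverse transform, eq. (9) with Q = 1
    frame = np.fft.irfft(np.fft.rfft(frame, n=M), n=M)
    q[a * R : a * R + M] += frame * w                # synthesis window + overlap-add, (10)

# Away from the edges the COLA condition gives perfect reconstruction.
err = np.max(np.abs(q[M:-M] - v[M:-M]))
```

With a non-trivial $Q_{l,s}(k_m)$ the frame would also be zero-padded to $N \ge M + L - 1$ before the transform, as stated above, to avoid circular-convolution wrap-around.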

4.2. Autoregression Parameter Estimation

The soundfield filtering process induces a delay of $M$ samples to build the current $a$th frame, $\mathring{v}_a(n)$, from (8), which is essential for accurate reproduction. To perform active control, it is necessary to find $R$ future samples of the accumulated $q_{l,s}(n)$ that estimate $v(n)$.

Forecasting the input signal's future values can be accomplished using an autoregressive (AR) linear predictive filter. Assuming the signal is unknown after the current time, $n$, the AR parameters, $\hat{a}_j$, are estimated using $B > P$ known past samples with

$$\epsilon(n + \grave{b} + 1) = v(n + \grave{b} + 1) + \sum_{j \in \llbracket P \rrbracket} \hat{a}_j\, v(n + \grave{b} - j), \qquad (11)$$

$\forall \grave{b} \in \mathcal{B}$, where $\mathcal{B} = \{-B, \ldots, P - 1\}$, $\{\epsilon(n + \grave{b} + 1)\}_{\grave{b} \in \mathcal{B}}$ are prediction errors, the predictor order is $P$ and $j \in \llbracket P \rrbracket$ are the coefficient indices. Stable AR coefficients, $\hat{a}_j$, can be estimated using the autocorrelation method [27, 28] (equivalent to the Yule-Walker method) by approximating the minimisation of the expectation of $|\epsilon(n + \grave{b} + 1)|^2,\ \forall \grave{b} \in \mathbb{Z}$, where, prior to minimisation, $v(n + \grave{b} + 1)$ is windowed with $\tilde{w}(\grave{b})$, assuming $\{\tilde{w}(\grave{b})\}_{\grave{b} \notin \{-B, \ldots, -1\}} = 0$, to give $\tilde{v}(\grave{b})$. Multiplying (11) by $v(n + \grave{b} - \grave{q}),\ \grave{q} \in \llbracket P \rrbracket$ and taking the expectation gives the Yule-Walker (YW) equations, $\sum_{j \in \llbracket P \rrbracket} \hat{r}_{\grave{q} - j}\, \hat{a}_j = -\hat{r}_{\grave{q}}$. We estimate the $j$th autocorrelation, $r_j$, as $\hat{r}_j \triangleq B^{-1} \sum_{\grave{b} = j}^{P-1} \tilde{v}(\grave{b})\, \tilde{v}(\grave{b} - j)$. The YW equations can be written in matrix form as $\hat{\mathbf{R}} \hat{\mathbf{a}} = -\hat{\mathbf{r}}$, where $\hat{\mathbf{a}} = [\hat{a}_0, \ldots, \hat{a}_{P-1}]^T$, $\hat{\mathbf{r}} = [\hat{r}_0, \ldots, \hat{r}_{P-1}]^T$ and the estimated autocorrelation matrix, $\hat{\mathbf{R}}$, has a Toeplitz structure allowing for an efficient solution.
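A minimal sketch of the autocorrelation (Yule-Walker) estimation follows, assuming the biased autocorrelation estimate and a direct solve of the Toeplitz normal equations; the paper's exact windowing of the past samples is omitted, and the synthetic AR(2) process is an illustrative assumption.

```python
import numpy as np

def yule_walker(v_past, P):
    """Estimate AR parameters a_j from B past samples via the autocorrelation
    (Yule-Walker) method. The Toeplitz normal equations R a = -r are solved
    directly; a Levinson-Durbin recursion would exploit the structure."""
    B = len(v_past)
    r = np.array([v_past[j:] @ v_past[:B - j] for j in range(P + 1)]) / B
    R = np.array([[r[abs(i - j)] for j in range(P)] for i in range(P)])  # Toeplitz
    return np.linalg.solve(R, -r[1:])   # predictor: v(n+1) ~= -sum_j a_j v(n-j)

# Fit to a synthetic AR(2) process and check the recovered coefficients.
rng = np.random.default_rng(2)
true_a = np.array([-1.6, 0.81])         # poles at radius 0.9, well inside unit circle
v = np.zeros(20000)
for n in range(2, len(v)):
    v[n] = -true_a[0] * v[n - 1] - true_a[1] * v[n - 2] + 0.01 * rng.standard_normal()
a = yule_walker(v[-4000:], P=2)
```

The sign convention matches (11): the one-step prediction is $\hat{v}(n+1) = -\sum_j \hat{a}_j v(n-j)$, so the recovered coefficients should approximate `true_a`.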

4.3. Filter-Delay Compensation

Once the $\hat{a}_j$ are estimated following Section 4.2, $v(n)$ can be extrapolated by

$$v(n + \acute{b} + 1) = -\sum_{j \in \llbracket P \rrbracket} \hat{a}_j\, v(n + \acute{b} - j),\quad \forall \acute{b} \in \llbracket \acute{M} \rrbracket, \qquad (12)$$

where $\{v(n + \acute{b} + 1)\}_{\acute{b} \in \llbracket \acute{M} \rrbracket}$ are $\acute{M}$ future estimates of $v(n)$. From (8), $\mathring{v}_a(n)$ is an estimated future windowed frame when $\acute{M} \ge M$. The estimated $\mathring{v}_a(n)$ and partially estimated $\{\mathring{v}_{a - \grave{a} - 1}(n)\}_{\grave{a} \in \llbracket M/R - 1 \rrbracket}$ are transformed, filtered, inverse transformed and windowed through (9) and (10). Adding $q^w_{a,l,s}(n)$ to the previous frames obtains $R$ future estimated samples for the output loudspeaker signals, $q_{l,s}(n)$. The procedures of Section 4.2 and Section 4.3 are repeated every $R$ samples, including the estimation of the $\hat{a}_j$.
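The extrapolation of (12) is a running AR recursion seeded with the last $P$ known samples. A minimal sketch, checked on a noiseless AR signal that its own coefficients extrapolate exactly:

```python
import numpy as np

def extrapolate(v, a, M_future):
    """Extrapolate M_future samples with the AR recursion of (12):
    v(n + b + 1) = -sum_j a_j v(n + b - j)."""
    P = len(a)
    buf = list(v[-P:])                       # most recent P known samples
    out = []
    for _ in range(M_future):
        nxt = -sum(a[j] * buf[-1 - j] for j in range(P))
        out.append(nxt)
        buf.append(nxt)
    return np.array(out)

# A noiseless AR(2) signal is reproduced exactly by its own recursion.
a = np.array([-1.6, 0.81])
v = np.zeros(200)
v[0], v[1] = 1.0, 0.9
for n in range(2, len(v)):
    v[n] = -a[0] * v[n - 1] - a[1] * v[n - 2]
pred = extrapolate(v[:100], a, 100)
err = np.max(np.abs(pred - v[100:]))
```

For real speech the coefficients are only estimates, so the prediction error grows with the extrapolation length; this is the source of the block-length trade-off examined in Section 5.3.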

4.4. Geometric-Delay Compensation

The control soundfield modelling requires a virtual source location and signal. In this work, the reference microphone recording, $z(n)$, located at $\mathbf{z}$, is an attenuated and time-delayed version of $v(n)$. Under the assumption of free-space propagation and that the talker location, $\mathbf{t}$, is known, or can be reliably estimated, the talker signal is found by

$$v(n) = \Re\Bigg\{ \frac{1}{N} \sum_{m \in \llbracket N \rrbracket} \frac{4 \big\{ \sum_{n \in \llbracket N \rrbracket} z(n)\, e^{-icnk_m/2\dot{f}} \big\}}{i H^{(1)}_0(k_m \|\mathbf{v} - \mathbf{z}\|)}\, e^{icnk_m/2\dot{f}} \Bigg\}, \qquad (13)$$

where $z(n)$ is inverse filtered in the frequency domain with $N$ sufficiently large compared to the time-delay. For the purpose of soundfield control, $\mathbf{t} \equiv \mathbf{v}$ and $v(n)$ is also the virtual source signal.

4.5. Loudspeaker Signals and Reproduction

Upon receiving the reference signal, $z(n)$, the final dipole loudspeaker signals, $q_{l,s}(n)$, are produced by firstly compensating for the geometric delay with (13) to obtain $v(n)$. The virtual source signal is then extrapolated by $\acute{M}$ future estimates computed with (12). The estimated $v(n)$ is transformed to the frequency domain after (8). The dipole loudspeaker weights, $Q_{l,s}(k)$, are computed with (7) through (5) after $Q_{h,m}$ is found via (2) and (3).

For the reproduction, the $Q_{l,s}(k)$ are used as filters via (9) to obtain $q_{l,s}(n)$. The actual reproduced control soundfield is given by

$$S_c(\mathbf{x};k) = \sum_{l \in \llbracket {}^{s}L \rrbracket,\; s \in \llbracket 2 \rrbracket,\; n \in \mathbb{Z}} q_{l,s}(n)\, e^{-icnk/2\dot{f}}\, T(\mathbf{x}, \mathbf{l}_{l,s}; k), \qquad (14)$$

$\forall \mathbf{x} \in D_c$, where the 2D acoustic transfer function for each source is $T(\mathbf{x}, \mathbf{l}; k) = \frac{i}{4} H^{(1)}_0(k \|\mathbf{l} - \mathbf{x}\|)$. Note that $S_c(\mathbf{x};k)$ depends on $v(n)$.

5. RESULTS AND DISCUSSION

5.1. Experimental Setup

For evaluation, the layout of Fig. 1 is used with $R_D = {}^{s}R = 1\,\mathrm{m}$, $r_c = 0.9\,\mathrm{m}$, ${}^{s}\phi = \pi$ and ${}^{s}D = 2.1\,\mathrm{m}$. There are ${}^{s}L = 18$ dipole speaker pairs with $\ddot{d} \ll 1/k_{\max} = 2.73\,\mathrm{cm}$ spacing [21, 26], where $k_{\max} = 2\pi (2\,\mathrm{kHz})/c$ and $c = 343\,\mathrm{m\,s^{-1}}$. Spatial aliasing in the soundfield reproduction begins to occur near $2\,\mathrm{kHz}$, which reduces the control capability. All signals are sampled at a rate of $16\,\mathrm{kHz}$ with a frame step of $R = 0.5M$ for 50% overlapping, and $M = \{64, 128, 192, 256, 320, 384, 448, 512\}$ are the window lengths in samples. A prediction of $\acute{M} = M$ future samples is made using $B = 2M$ past samples with an order of $P = M$. The window, $w(n)$, is a square root Hann window. The location of the talker is $\mathbf{t} = (2\,\mathrm{m}, \pi)$ and the speech samples used to evaluate the performance were obtained from the TIMIT corpus [29]. Twenty files were randomly chosen such that the selection was constrained to have a male to female speaker ratio of 50:50.

Fig. 2. The pressure field for an ideal periodic cancellation at 1 kHz when the linear dipole array is inactive (A) and active (B).

5.2. Soundfield Suppression

In order to evaluate the suppression of the control system, 32 virtual microphones are placed in random locations throughout $D_c$. The actual control and talker soundfields, $S_c(\mathbf{x};k)$ and $S_t(\mathbf{x};k)$, respectively, are approximated over $D_c$ using the 32 virtual recordings. To gauge the performance of the system, the normalised acoustic suppression between $S_c(\mathbf{x};k)$ and $S_t(\mathbf{x};k)$ is defined as

$$\zeta(k) \triangleq \frac{\int_{D_c} \big| S_t(\mathbf{x};k) + S_c(\mathbf{x};k) \big|\, d\mathbf{x}}{\int_{D_c} \big| S_t(\mathbf{x};k) \big|\, d\mathbf{x}}, \qquad (15)$$

where $S_c(\mathbf{x};k)$ is from (14) and, in this work, for simplicity, $S_t(\mathbf{x};k) = \sum_{n \in \mathbb{Z}} v(n)\, e^{-icnk/2\dot{f}}\, \frac{i}{4} H^{(1)}_0(k \|\mathbf{v} - \mathbf{x}\|)$. $\zeta(k)$ is found from (15) for a range of frequencies from 100 Hz to 8 kHz.

The real part of $S_t(\mathbf{x};k)$ is shown in Fig. 2 at 1 kHz for when $S_c(\mathbf{x};k)$ is active and inactive, as an example. Fig. 2 clearly shows significant suppression on only one side of the linear dipole array, providing a large quiet zone across the wall of loudspeakers. It is also apparent that, by not strictly sampling the entire boundary of the control region for the Kirchhoff-Helmholtz integral, the loudspeaker array does not restrict the movement of a listener in and out of $D$.
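A discrete version of (15) over the virtual microphone samples can be sketched as follows. The conversion to decibels as $20\log_{10}$ of the amplitude ratio is an assumption, as the conversion convention is not stated explicitly above, and the sampled fields are synthetic stand-ins:

```python
import numpy as np

def suppression_db(S_t, S_c):
    """Normalised acoustic suppression of (15), approximated over the virtual
    microphone samples: residual |S_t + S_c| relative to |S_t|, in dB."""
    zeta = np.sum(np.abs(S_t + S_c)) / np.sum(np.abs(S_t))
    return 20 * np.log10(zeta)

rng = np.random.default_rng(4)
S_t = rng.standard_normal(32) + 1j * rng.standard_normal(32)  # talker field samples

perfect = suppression_db(S_t, -S_t + 1e-12)     # near-ideal cancellation
partial = suppression_db(S_t, -0.9 * S_t)       # 90% amplitude match -> -20 dB
inactive = suppression_db(S_t, np.zeros(32))    # no control field -> 0 dB
```

Negative values indicate suppression, matching the sign convention of the figures: an inactive array gives 0 dB, and a control field matching 90% of the talker amplitude gives -20 dB.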

5.3. Synthesis and Prediction Accuracy Trade-off

A trade-off between soundfield reproduction accuracy and prediction accuracy is apparent in Fig. 3, which shows the mean suppression from 156 Hz to 2 kHz. Assuming the signal is known (equivalent to a perfect prediction), as shown in blue in Fig. 3, a longer block length provides better control, although a longer (and presumably therefore less accurate) prediction is then required. A smaller block length is expected to perform worse as it results in fewer analysis frequencies in the wave domain and, hence, is filtered with less accuracy. Using a larger block length overcomes this issue and, assuming perfect prediction, is capable of $-18.8\,\mathrm{dB}$ of suppression on average over $D_c$ with a 32 ms block length. However, with the prediction necessary to overcome the filtering delay, as shown in red in Fig. 3, the longer prediction results in less suppression. The peak suppression occurs with a 12 ms block length and $-5.74\,\mathrm{dB}$ of suppression on average.

Fig. 3. The mean suppression, $\zeta$, computed using 1/6th octave band means from 156 Hz to 2 kHz over $2.54\,\mathrm{m}^2$, for an actual future block in blue and a predicted one in red. 95% confidence intervals are shown.

Choosing the block length which attains maximum suppression from Fig. 3 has the potential to provide the best suppression for wave-domain processed soundfield control. The optimal block length in this case is 12 ms and the suppression for this block length is shown per frequency in Fig. 4. The downward trend in Fig. 4 as frequency decreases from 2 kHz suggests that the control from the predicted block performs best at lower frequencies. The increase below 156 Hz and the peak near 300 Hz are due to the finite length filter causing a loss of reproduction accuracy. It can be seen from Fig. 4 that the mean suppression reaches a peak of $-9.1\,\mathrm{dB}$ near 400 Hz and remains below $-7.5\,\mathrm{dB}$ from 365 Hz to 730 Hz. Future work could include investigating the control above the spatial Nyquist frequency by either increasing the loudspeaker density or using hybrid loudspeaker and ANC systems [30, 31].

Fig. 4. The suppression, $\zeta(k)$, for a 12 ms block length from 100 Hz to 8 kHz over $2.54\,\mathrm{m}^2$. 95% confidence intervals are shaded red and blue. The bandwidth where spatial aliasing occurs is shaded grey.

6. CONCLUSIONS

We have investigated the effects of autoregressive delay compensation on active speech control when using wave-domain processing to improve active control over large spatial regions. A system has been proposed using a linear array of secondary dipole sources which uses autoregressive prediction with wavefield decompositions to minimise residual soundfield energy. The proposed system is capable of a significant mean speech suppression of $-18.8\,\mathrm{dB}$ with an ideally predicted 32 ms block over a large $2.54\,\mathrm{m}^2$ area. Through analysis of the proposed control system, a trade-off between reproduction accuracy and prediction accuracy has been shown to exist. A predicted block with an optimal length of 12 ms has been shown to provide a mean suppression of $-5.74\,\mathrm{dB}$ over a $2.54\,\mathrm{m}^2$ area.

7. REFERENCES

[1] T. Betlehem, W. Zhang, M. Poletti, and T. D. Abhayapala, "Personal Sound Zones: Delivering interface-free audio to multiple listeners," IEEE Signal Process. Mag., vol. 32, pp. 81–91, 2015.

[2] Y. Kajikawa, W.-S. Gan, and S. M. Kuo, "Recent advances on active noise control: open issues and innovative applications," APSIPA Trans. Signal Inform. Process., vol. 1, pp. 1–21, 2012.

[3] S. M. Kuo, S. Mitra, and W.-S. Gan, "Active noise control system for headphone applications," IEEE Trans. Control Syst. Technol., vol. 14, no. 2, pp. 331–335, 2006.

[4] T. J. Sutton, S. J. Elliott, A. M. McDonald, and T. J. Saunders, "Active control of road noise inside vehicles," Noise Control Eng. J., vol. 42, no. 4, 1994.

[5] H. Sano, T. Inoue, A. Takahashi, K. Terai, and Y. Nakamura, "Active control system for low-frequency road noise combined with an audio system," IEEE Trans. Speech Audio Process., vol. 9, no. 7, pp. 755–763, 2001.

[6] J. Cheer and S. J. Elliott, "The design and performance of feedback controllers for the attenuation of road noise in vehicles," Int. J. Acoust. Vibration, vol. 19, no. 3, pp. 155–164, 2014.

[7] Y. Xiao and J. Wang, "A new feedforward hybrid active noise control system," IEEE Signal Process. Lett., vol. 18, no. 10, pp. 591–594, 2011.

[8] N. V. George and G. Panda, "On the development of adaptive hybrid active noise control system for effective mitigation of nonlinear noise," Signal Process., vol. 92, no. 2, pp. 509–516, 2012.

[9] O. J. Tobias and R. Seara, "Leaky delayed LMS algorithm: stochastic analysis for Gaussian data and delay modeling error," IEEE Trans. Signal Process., vol. 52, no. 6, pp. 1596–1606, 2004.

[10] I. T. Ardekani and W. H. Abdulla, "Adaptive signal processing algorithms for creating spatial zones of quiet," Digital Signal Process., vol. 27, pp. 129–139, 2014.

[11] S. Elliott, Signal Processing for Active Control. Academic Press, 2000.

[12] S. Spors and H. Buchner, "Efficient massive multichannel active noise control using wave-domain adaptive filtering," in Int. Symp. Commun., Control Signal Process. (ISCCSP). IEEE, 2008, pp. 1480–1485.

[13] J. Zhang, W. Zhang, and T. D. Abhayapala, "Noise cancellation over spatial regions using adaptive wave domain processing," in Workshop Applicat. Signal Process. Audio Acoust. (WASPAA). IEEE, 2015, pp. 1–5.

[14] J. Zhang, T. D. Abhayapala, P. N. Samarasinghe, W. Zhang, and S. Jiang, "Sparse complex FxLMS for active noise cancellation over spatial regions," in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP). IEEE, 2016, pp. 524–528.

[15] W. Jin, "Adaptive reverberation cancelation for multizone soundfield reproduction using sparse methods," in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP). IEEE, 2016, pp. 509–513.

[16] K. Kondo and K. Nakagawa, "Speech emission control using active cancellation," Speech Commun., vol. 49, no. 9, pp. 687–696, Sep. 2007.

[17] L. Athanas, "Open air noise cancellation," U.S. Patent 2011/0274283 A1, Nov. 10, 2011.

[18] J. Ahrens and S. Spors, "Sound field reproduction using planar and linear arrays of loudspeakers," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2038–2050, 2010.

[19] C. R. Hart and S.-K. Lau, "Active noise control with linear control source and sensor arrays for a noise barrier," J. Sound Vibration, vol. 331, no. 1, pp. 15–26, 2012.

[20] W. Chen, H. Min, and X. Qiu, "Noise reduction mechanisms of active noise barriers," Noise Control Eng. J., vol. 61, no. 2, pp. 120–126, 2013.

[21] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press, 1999.

[22] W. Jin and W. B. Kleijn, "Theory and design of multizone soundfield reproduction using sparse methods," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, pp. 2343–2355, 2015.

[23] Y. J. Wu and T. D. Abhayapala, "Theory and design of soundfield reproduction using continuous loudspeaker concept," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, pp. 107–116, 2009.

[24] W. Jin, W. B. Kleijn, and D. Virette, "Multizone soundfield reproduction using orthogonal basis expansion," in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP). IEEE, 2013, pp. 311–315.

[25] J. Donley, C. Ritz, and W. B. Kleijn, "Improving speech privacy in personal sound zones," in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP). IEEE, 2016, pp. 311–315.

[26] F. Dunn, W. M. Hartmann, D. M. Campbell, N. H. Fletcher, and T. Rossing, Springer Handbook of Acoustics. Springer, 2015.

[27] K. K. Paliwal and W. B. Kleijn, "Quantization of LPC parameters," in Speech Coding and Synthesis. Elsevier Science Inc., 1995, ch. 12, pp. 433–466.

[28] P. Stoica and R. L. Moses, Spectral Analysis of Signals. Upper Saddle River, NJ: Pearson Prentice Hall, 2005.

[29] J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallett, N. Dahlgren, and V. Zue, "TIMIT acoustic-phonetic continuous speech corpus," Linguistic Data Consortium, 1993.

[30] J. Donley, C. Ritz, and W. B. Kleijn, "Reproducing personal sound zones using a hybrid synthesis of dynamic and parametric loudspeakers," in Asia-Pacific Signal & Inform. Process. Assoc. Annu. Summit and Conf. (APSIPA ASC). IEEE, Dec. 2016, pp. 1–5.

[31] K. Tanaka, C. Shi, and Y. Kajikawa, "Binaural active noise control using parametric array loudspeakers," Applied Acoustics, vol. 116, pp. 170–176, Jan. 2017.