ACTIVE SPEECH CONTROL USING WAVE-DOMAIN PROCESSING WITH A LINEAR
WALL OF DIPOLE SECONDARY SOURCES
Jacob Donley, Christian Ritz and W. Bastiaan Kleijn
School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Australia
School of Engineering and Computer Science, Victoria University of Wellington, New Zealand
ABSTRACT
In this paper, we investigate the effects of compensating for wave-domain filtering delay in an active speech control system. An active control system utilising wave-domain processed basis functions is evaluated for a linear array of dipole secondary sources. The target control soundfield is matched in a least squares sense using orthogonal wavefields to a predicted future target soundfield. Filtering is implemented using a block-based short-time signal processing approach which induces an inherent delay. We present an autoregressive method for predictively compensating for the filter delay. An approach to block-length choice that maximises the soundfield control is proposed for a trade-off between soundfield reproduction accuracy and prediction accuracy. Results show that block-length choice has a significant effect on the active suppression of speech.
Index Terms: spatial audio, personal sound, active noise control, noise barrier, delay compensation, speech emission control.
1. INTRODUCTION
Personal sound [1] has been a topic of great interest to researchers in recent years. Spatial regions of controlled sound can be created using loudspeaker arrays, and superposition of soundwaves can be used to actively control sound over space [2]. Active Noise Control (ANC) is a technique that allows secondary sources in electro-acoustic systems to reproduce destructive soundfields, thus reducing the energy levels of primary soundfields. The resultant suppressed soundfields have been successfully employed in several applications, including noise-cancelling headphones [3] and ANC in vehicle cabins [4, 5, 6]. Offices, libraries, teleconferencing rooms, restaurants and cafes may also benefit from ANC over broad spatial areas where physical partitions could be replaced with an active loudspeaker array.
ANC systems typically comprise a reference signal and/or an error signal, which are fed forward and/or backward, respectively, to an algorithm that generates the loudspeaker signals [2]. Hybrid systems exist that incorporate both feedforward and feedback techniques [7, 8]. Least Mean Squares (LMS) and Filtered-x LMS (FxLMS) control methods work by adaptively minimising the error signal in a least squares sense [9, 10]. Multichannel systems with numerous microphones inside, or near, the control space often use adaptive algorithms to minimise the error over the region [10, 11].
More recent techniques have been shown to be more accurate by measuring acoustic pressures on boundaries and using the Kirchhoff-Helmholtz integral to determine the soundfield [12, 13, 14]. Sampling the boundary that encloses the space with microphones allows the target soundfield to be estimated in the wave-domain. This extends the multipoint method by synthesising the entire spatial area and minimising the error over large spaces [13, 14].
In order to perform wave-domain analysis it is necessary to transform received signals into the (temporal) frequency domain, where basis functions are a function of the wavenumber and spatial locations [12, 15]. This transformation induces a delay because numerous samples are required to analyse the signal with high resolution in the frequency domain. Adaptive algorithms overcome this issue by automatically compensating for any errors received at the error microphones [9, 13, 14]. In scenarios where microphones are not placed inside the control region, it is necessary to account for the delay by other means. Linear prediction with pitch repetition has been shown to be viable for active speech cancellation with short predictions, up to 2 ms, and at discrete points in a space [16]. However, those predictions do not span a regular speech frame of around 16 ms, and cancellation occurs only in the vicinity of the control points.
The active control of sound over a linear array has been envisioned [17] using interconnected control units, each consisting of a microphone, a directional loudspeaker and processing modules. However, the interconnection and modules do not model the received signals on the boundary in the wave-domain and perform only a phase inversion, which is less robust to soundfield variation. Linear arrays [18] have also been investigated for the improvement of noise barriers [19, 20], which aim to reduce diffraction of sound over a physical barrier by minimising the pressure at points in space, usually modelled on a plane spanning height and width. The use of linear arrays, without a physical barrier, for control over large spatial areas using recently advanced wave-domain processing is explored in this work.
As a baseline study, we analyse the delay caused by transforming reference ANC signals to the wave-domain using a block-based signal processing approach. We propose an autoregressive transform-delay compensator in conjunction with an inverse filter that together produce a virtual source soundfield used in wavefield decomposition to minimise the residual energy of a control soundfield. Through analysis of the soundfield suppression we show that an optimal block-length can be chosen for active speech control using wave-domain filtering without error microphones in the control region. The optimal block-length is used in a simulated acoustic environment with dipole secondary sources in a linear array. With the array acting as an active wall, we show that the optimal block-length, along with the dipole sources, provides significant cancellation of traversing speech waves with minimal reproduction towards the primary source.
A description of the error minimised control soundfield synthesis using basis wavefields is given in section 2. An explanation of dipole modelled soundfield reproduction using synthesised loudspeaker weights is given in section 3. The short-time block-based signal processing approach with autoregressive and geometric delay compensation is presented in section 4 with results, analysis, discussion and conclusions in sections 5 and 6.
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future
media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Fig. 1. Active control layout for a linear dipole array (blue) directed to the right. The microphone (red) is used to predict the unwanted speech source crossing the array.
2. WAVE-DOMAIN SOUNDFIELD SUPPRESSION
This section derives an expression for loudspeaker weights which reproduce a soundfield that minimises the residual energy over a control region, $\mathbb{D}_c$. The active control layout and wave-domain solution to minimise residual energy are described.
2.1. Active Control Layout and Definitions
The proposed system using a linear dipole array is shown in Fig. 1, where the loudspeakers form an active wall between a talker and a target quiet zone. The reproduction region for the soundfield, $\mathbb{D}$, with spatial sampling points $\mathbf{x} \in \mathbb{D}$, has a radius of $R_{\mathbb{D}}$ and contains a control subregion, $\mathbb{D}_c \subseteq \mathbb{D}$, of radius $r_c$. The centre of the loudspeaker array is located at angle $\bar{\phi}$ and distance $\bar{R}$. The length of the loudspeaker array is $\bar{D}$ and the array is designed to reproduce a soundfield for a virtual point source located at $\mathbf{v}$. In this work we refer to the external source that is to be controlled as the talker, with location $\mathbf{t} \equiv \mathbf{v} \equiv (r_t, \theta_t)$.
We assume $\mathbf{t}$ is known, or can be reliably estimated with multiple microphones, thus a single reference microphone suffices and is placed at the centre of the loudspeaker array with location $\mathbf{z} \equiv (\bar{R}, \bar{\phi})$. Loudspeaker locations are $\mathbf{l}_l \equiv (r_l, \phi_l)$ for $l \in \llbracket \bar{L} \rrbracket$, where $\bar{L}$ is the number of loudspeakers, $k = 2\pi f/c$ is the wavenumber and $c$ is the speed of sound in air. The Euclidean norm is denoted using $\|\cdot\|$, $i = \sqrt{-1}$ and sets of indices are $\llbracket A \rrbracket \triangleq \{x : x \in \mathbb{N}_0,\ x < A\}$.
2.2. Soundfield Control Technique
The goal is to find coefficients for a set of basis functions that minimise the residual energy of a control soundfield, $S^c(\mathbf{x};k)$, and an arbitrary talker soundfield, $S^t(\mathbf{x};k)$. A simple solution is to perform an orthogonalisation on a set of plane-wave basis functions that produces a well-conditioned triangular matrix and a set of orthogonal basis functions. Expansion coefficients for the orthogonal basis functions can be easily solved with an inner product.
Any arbitrary soundfield can be completely defined by an orthogonal set of solutions of the Helmholtz equation [21]. An arbitrary 2D soundfield function that satisfies the wave equation, such as $S^c(\mathbf{x};k) : \mathbb{D} \times \mathbb{R} \to \mathbb{C}$, can be written as
$$S^c(\mathbf{x};k) = \sum_{g \in \llbracket G \rrbracket} E_{g,m} F_g(\mathbf{x};k), \tag{1}$$
where $\{F_g\}_{g \in \llbracket G \rrbracket}$ is the set of orthogonal basis functions, $m \in \llbracket N \rrbracket$ are $N$ frequency indices, the expansion coefficients for a particular frequency are $E_{g,m}$ and $G$ is the number of basis functions [22].
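Solving the inner product $E_{g,m} = -\langle S^t(\mathbf{x};k), F_g(\mathbf{x};k) \rangle$ yields the $E_{g,m}$ that minimise
$$\min_{E_{g \in \llbracket G \rrbracket,\, m \in \llbracket N \rrbracket}} \Big\| \sum_{g} E_{g,m} F_g(\mathbf{x};k) + S^t(\mathbf{x};k) \Big\|^2, \tag{2}$$
where $\|X\|^2 = \langle X, X \rangle$. The set of orthogonal basis functions, $\{F_g\}_{g \in \llbracket G \rrbracket}$, can be found by implementing an orthogonalisation on a set of planewaves, $P_h(\mathbf{x};k) = e^{ik\mathbf{x} \cdot \boldsymbol{\rho}_h}$, where $\boldsymbol{\rho}_h \equiv (1, \rho_h)$, $\rho_h = (h-1)\Delta\rho$ and $\Delta\rho = 2\pi/G$. A Gram-Schmidt process gives the orthogonalised basis functions, which results in [22]
$$F_g(\mathbf{x};k) = \sum_{h \in \llbracket G \rrbracket} R_{hg,m} P_h(\mathbf{x};k), \tag{3}$$
such that $\langle F_i(\mathbf{x};k), F_j(\mathbf{x};k) \rangle = \delta_{ij}$, where $R_{hg,m}$ is the $(h,g)$th element of the lower triangular matrix, $\mathbf{R}$. Substituting (3) in (1) yields
$$S^c(\mathbf{x};k) = \sum_{h \in \llbracket G \rrbracket} Q_{h,m} P_h(\mathbf{x};k), \tag{4}$$
where $Q_{h,m} = \sum_{g \in \llbracket G \rrbracket} E_{g,m} R_{hg,m}$ are the plane-wave coefficients used to construct an approximation of the control soundfield.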
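The orthogonalisation and projection steps of this subsection can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the inner product is approximated by a mean over grid points in the control region, the plane-wave angles start at zero, a single off-grid plane wave stands in for the talker soundfield, and all numeric values (`G`, the 500 Hz frequency, the grid step) are hypothetical choices.

```python
import cmath
import math

def inner(x, y):
    """Discrete inner product <x, y>: mean of x * conj(y) over the sample points."""
    return sum(a * b.conjugate() for a, b in zip(x, y)) / len(x)

# Assumed illustrative values (not the paper's experimental configuration).
c = 343.0                      # speed of sound in air (m/s)
k = 2 * math.pi * 500.0 / c    # wavenumber at 500 Hz
G = 12                         # number of plane-wave basis functions
r_c = 0.9                      # radius of the sampled control region (m)

# Sample points on a regular grid inside the circular region.
steps = [i * 0.1 for i in range(-9, 10)]
pts = [(x, y) for x in steps for y in steps if math.hypot(x, y) <= r_c]

def plane_wave(angle):
    """Unit-amplitude plane wave exp(i k x . rho) sampled at pts."""
    ux, uy = math.cos(angle), math.sin(angle)
    return [cmath.exp(1j * k * (x * ux + y * uy)) for x, y in pts]

P = [plane_wave(2 * math.pi * h / G) for h in range(G)]
S_t = plane_wave(0.3)  # stand-in talker field (assumption: a plane wave)

# Modified Gram-Schmidt. R[g][h] tracks the weight of plane wave h in F_g,
# so that F_g = sum_h R[g][h] * P[h], mirroring the role of R_{hg} in (3).
F, R = [], []
for g in range(G):
    f = list(P[g])
    coef = [0j] * G
    coef[g] = 1 + 0j
    for Fj, Rj in zip(F, R):
        proj = inner(f, Fj)
        f = [a - proj * b for a, b in zip(f, Fj)]
        coef = [a - proj * b for a, b in zip(coef, Rj)]
    norm = math.sqrt(inner(f, f).real)
    F.append([a / norm for a in f])
    R.append([a / norm for a in coef])

# Expansion coefficients: the minus sign makes the control field oppose S_t,
# matching the plus sign inside the norm of (2).
E = [-inner(S_t, Fg) for Fg in F]

# Plane-wave coefficients, as in the definition below (4).
Q = [sum(E[g] * R[g][h] for g in range(G)) for h in range(G)]

# Residual field S_t + S_c evaluated directly from the plane-wave coefficients.
residual = [s + sum(Q[h] * P[h][i] for h in range(G)) for i, s in enumerate(S_t)]
energy_t = inner(S_t, S_t).real
energy_res = inner(residual, residual).real
print("residual energy fraction:", energy_res / energy_t)
```

Because the basis is orthonormal, the residual energy equals the talker energy minus the sum of the squared coefficient magnitudes, so it can never exceed the uncontrolled energy; how small it becomes depends on how many plane waves are used relative to the size of the region in wavelengths.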
3. LOUDSPEAKER WEIGHTS
In this section, the loudspeaker signals needed for soundfield reproduction with monopole and dipole sources are described.
3.1. Monopole Secondary Source Weights
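To reproduce $S^c(\mathbf{x};k)$ with minimal error to $S^t(\mathbf{x};k)$, loudspeaker weights are found in the (temporal) frequency domain [23, 24, 25]
$$Q_l(k) = 2\Delta\phi_s \sum_{\overline{m}=-\overline{M}}^{\overline{M}} \sum_{h \in \llbracket G \rrbracket} \frac{i^{\overline{m}} e^{i\overline{m}(\phi_l - \rho_h)}}{H^{(1)}_{\overline{m}}(r_l k)} Q_{h,m}, \tag{5}$$
where $\Delta\phi_s = 2\tan^{-1}(\bar{D}/2\bar{R})/\bar{L}$ approximates the angular spacing of $\mathbf{l}_l$ for a linear array, $H^{(1)}_{\nu}(\cdot)$ is a $\nu$th-order Hankel function of the first kind and $\overline{M} = \lceil k R_{\mathbb{D}} \rceil$ is the modal truncation length [24]. However, monopole sources produce acoustic energy in all directions, which may be undesirable as it would present an artificial echo towards $\mathbf{t}$.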
3.2. Dipole Secondary Source Weights
To reproduce a soundfield with reduced acoustic energy presented towards the talker, dipole sources are modelled to reproduce predominantly over $\mathbb{D}$. The loudspeakers at $\mathbf{l}_l$ with weights $Q_l(k)$ are split into two point sources at $\mathbf{l}_{l,s}$ for $s \in \llbracket 2 \rrbracket$ with weights $Q_{l,s}(k)$. The dipole source pair locations are given by
$$\mathbf{l}_{l,s} = \mathbf{l}_l + (\ddot{d}/2,\ \bar{\phi}), \tag{6}$$
where $\ddot{d}$ is the distance between the dipole point sources. The objective of each dipole source pair is to reproduce a wave which constructs in the direction $(1, \bar{\phi} - \pi)$ from $\mathbf{l}_l$ and de-constructs in the direction $(1, \bar{\phi})$ from $\mathbf{l}_l$ whilst maintaining the same amplitude and phase as a monopole source in the constructive direction. This can be accomplished by phase shifting and amplitude panning the monopole loudspeaker weights with the following [21, 26]
$$Q_{l,s}(k) \triangleq Q_l(k) \frac{e^{i(-1)^s (k\ddot{d} - \pi)/2}}{2k\ddot{d}}, \tag{7}$$
where, as $\ddot{d}$ becomes small, $\mathbf{l}_{l,s}$ approach ideal dipole sources.
4. SHORT-TIME SIGNAL PROCESSING
In order to reproduce a control soundfield, a time-domain control signal is filtered using $Q_{l,s}(k)$ in the (temporal) frequency domain and inverse transformed back to the time-domain to yield the set of loudspeaker signals. Here, a block-based approach is used. This section investigates the inherent time delay that is induced during the filtering process due to the wave-domain transformation used to compute the loudspeaker weights of (7).
4.1. Block Processing
An input signal, $v(n)$, broken into blocks (frames) using an analysis windowing function, $w(n)$, of length $M$, results in an $a$th windowed frame:
$$\tilde{v}_a(n) \triangleq v(n + aR)\, w(n), \tag{8}$$
where $n \in \mathbb{Z}$ is the sample number in time, $a \in \mathbb{Z}$ is the frame index and $R \le M$ is the step size in samples. The $a$th frame is transformed to the frequency domain to give the $a$th spectral frame as $\tilde{V}_a(k_m) = \sum_{n \in \llbracket N \rrbracket} \tilde{v}_a(n)\, e^{-icnk_m/2\dot{f}}$, where $k_m \triangleq 2\pi\dot{f}m/cN$ and the frame is oversampled with $N \ge M + L - 1$ for a filter length $L$.
Each spectral frame is filtered using $Q_{l,s}(k)$ from (7) up to the maximum frequency, $\dot{f}$, and inverse transformed to the time-domain
$$\tilde{q}_{a,l,s}(n) = \Re\Big\{ \frac{1}{N} \sum_{m \in \llbracket N \rrbracket} Q_{l,s}(k_m)\, \tilde{V}_a(k_m)\, e^{icnk_m/2\dot{f}} \Big\}, \tag{9}$$
$n \in \llbracket N \rrbracket$, where $\Re\{\cdot\}$ returns the real part of its argument, after which a synthesis window, $w(n)$, equivalent to the analysis window, is applied to yield the weighted output
$$q^{w}_{a,l,s}(n) = \tilde{q}_{a,l,s}(n - aR)\, w(n - aR). \tag{10}$$
The weighted output, $q^{w}_{a,l,s}(n)$, is added to the accumulated output signal, $q_{l,s}(n)$, for each dipole source. The analysis and synthesis windows are chosen so that $\sum_{a \in \mathbb{Z}} w(n - aR)^2 = 1,\ \forall n \in \mathbb{Z}$.
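The window condition above can be verified numerically for the square root Hann window with 50% overlap that is used later in the evaluation. This is a small sketch, assuming the periodic form of the Hann window; the helper names are hypothetical.

```python
import math

def sqrt_hann(M):
    """Square root of a periodic Hann window of length M."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / M)) for n in range(M)]

def squared_overlap_sum(M, R, n):
    """Sum over frames a of w(n - a R)^2 for a window of length M and step R."""
    w = sqrt_hann(M)
    total = 0.0
    a_min = (n - M) // R          # earlier frames cannot cover sample n
    for a in range(a_min, n // R + 1):
        idx = n - a * R
        if 0 <= idx < M:
            total += w[idx] ** 2
    return total

M = 192          # 12 ms at 16 kHz
R = M // 2       # 50% overlap
sums = [squared_overlap_sum(M, R, n) for n in range(4 * M)]
print(min(sums), max(sums))
```

With the same window applied at analysis and synthesis, each sample is weighted by $w^2$ in every frame that covers it, so a unit sum means that an unfiltered signal would pass through the overlap-add unchanged.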
4.2. Autoregression Parameter Estimation
The soundfield filtering process induces a delay of $M$ samples, because $M$ samples are needed to build the current $a$th frame, $\tilde{v}_a(n)$, from (8), and this frame length is essential for accurate reproduction. To perform active control, it is necessary to find $R$ future samples of the accumulated $q_{l,s}(n)$ that estimate $v(n)$.
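Forecasting the input signal's future values can be accomplished using an autoregressive (AR) linear predictive filter. Assuming the signal is unknown after the current time, $n$, the AR parameters, $\hat{a}_j$, are estimated using $B > P$ known past samples with
$$\epsilon(n + \grave{b} + 1) = v(n + \grave{b} + 1) + \sum_{j \in \llbracket P \rrbracket} \hat{a}_j\, v(n + \grave{b} - j), \tag{11}$$
$\grave{b} \in \mathcal{B}$, where $\mathcal{B} = \{-B, \ldots, P-1\}$, $\{\epsilon(n + \grave{b} + 1)\}_{\grave{b} \in \mathcal{B}}$ are prediction errors, the predictor order is $P$ and $j \in \llbracket P \rrbracket$ are the coefficient indices. Stable AR coefficients, $\hat{a}_j$, can be estimated using the autocorrelation method [27, 28] (equivalent to the Yule-Walker method) by approximating the minimisation of the expectation of $|\epsilon(n + \grave{b} + 1)|^2,\ \forall \grave{b} \in \mathbb{Z}$ where, prior to minimisation, $v(n + \grave{b} + 1)$ is windowed with $\bar{w}(\grave{b})$, assuming $\{\bar{w}(\grave{b})\}_{\grave{b} \notin \{-B, \ldots, -1\}} = 0$, to give $\bar{v}(\grave{b})$. Multiplying (11) by $v(n + \grave{b} - \check{b}),\ \check{b} \in \llbracket P \rrbracket$ and taking the expectation gives the Yule-Walker (YW) equations, $\sum_{j \in \llbracket P \rrbracket} r_{\check{b}-j}\, \hat{a}_j = -r_{\check{b}}$. We estimate the $j$th autocorrelation, $r_j$, as $\hat{r}_j \triangleq B^{-1} \sum_{\grave{b}=j}^{-1} \bar{v}(\grave{b})\, \bar{v}(\grave{b} - j)$. The YW equations can be written in matrix form as $\hat{\mathbf{R}}\hat{\mathbf{a}} = -\hat{\mathbf{r}}$ where $\hat{\mathbf{a}} = [\hat{a}_0, \ldots, \hat{a}_{P-1}]^T$, $\hat{\mathbf{r}} = [\hat{r}_0, \ldots, \hat{r}_{P-1}]^T$ and the estimated autocorrelation matrix, $\hat{\mathbf{R}}$, has a Toeplitz structure allowing for an efficient solution.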
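The Toeplitz structure is what allows the Levinson-Durbin recursion to solve the YW equations in $O(P^2)$ operations. The sketch below shows the autocorrelation method on a synthetic signal and is not the authors' code. It makes several assumptions: the windowed segment is indexed from $0$ to $B-1$, a Hann window stands in for $\bar{w}$, the right-hand side uses lags $1$ to $P$ (the textbook convention), and the test signal is a seeded synthetic AR(2) process rather than speech.

```python
import math
import random

def autocorrelation(v_bar, P):
    """Biased autocorrelation estimates r[0..P] of a windowed segment."""
    B = len(v_bar)
    return [sum(v_bar[b] * v_bar[b - j] for b in range(j, B)) / B
            for j in range(P + 1)]

def levinson_durbin(r, P):
    """Solve sum_j r[|q - j|] a[j] = -r[q + 1], q = 0..P-1, for a[0..P-1].

    Returns the coefficients and the final prediction error power.
    """
    a = [1.0] + [0.0] * P          # a[0] = 1 is the implicit leading coefficient
    err = r[0]
    for i in range(1, P + 1):
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        refl = -acc / err          # reflection coefficient, |refl| < 1
        prev = a[:]
        for k in range(1, i):
            a[k] = prev[k] + refl * prev[i - k]
        a[i] = refl
        err *= 1.0 - refl * refl
    return a[1:], err

# Synthetic AR(2) test signal (assumption: stands in for a speech segment).
rng = random.Random(0)
B, P = 2048, 2
v = [0.0, 0.0]
for _ in range(B):
    v.append(1.6 * v[-1] - 0.8 * v[-2] + rng.gauss(0.0, 1.0))
segment = v[-B:]

# Window the known past samples before estimating the autocorrelation.
window = [0.5 - 0.5 * math.cos(2 * math.pi * b / B) for b in range(B)]
v_bar = [s * w for s, w in zip(segment, window)]

r = autocorrelation(v_bar, P)
a_hat, err = levinson_durbin(r, P)
print("estimated coefficients:", a_hat)
```

With the sign convention of (11), the generating recursion used here corresponds to coefficients near $-1.6$ and $0.8$, so the printed estimates should land in that neighbourhood, up to estimation error from the finite windowed segment.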
4.3. Filter-Delay Compensation
Once the $\hat{a}_j$ are estimated following section 4.2, $v(n)$ can be extrapolated by
$$v(n + \acute{b} + 1) = -\sum_{j \in \llbracket P \rrbracket} \hat{a}_j\, v(n + \acute{b} - j), \quad \acute{b} \in \llbracket \widehat{M} \rrbracket, \tag{12}$$
where $\{v(n + \acute{b} + 1)\}_{\acute{b} \in \llbracket \widehat{M} \rrbracket}$ are $\widehat{M}$ future estimates of $v(n)$. From (8), $\tilde{v}_a(n)$ is an estimated future windowed frame when $\widehat{M} \ge M$. The estimated $\tilde{v}_a(n)$ and partially estimated $\{\tilde{v}_{a-\grave{a}-1}(n)\}_{\grave{a} \in \llbracket M/R - 1 \rrbracket}$ are transformed, filtered, inverse transformed and windowed through (9) and (10). Adding $q^{w}_{a,l,s}(n)$ to the previous frames obtains $R$ future estimated samples for the output loudspeaker signals, $q_{l,s}(n)$. The procedures of section 4.2 and section 4.3 are repeated every $R$ samples, including the estimation of $\hat{a}_j$.
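The recursive extrapolation can be sketched in a few lines. This is an illustration under assumptions, not the paper's implementation: each new estimate is fed back as if it were a known sample, the sign convention follows (11) so the prediction is the negated weighted sum, and the test signal is a pure 200 Hz sinusoid whose exact second-order coefficients are known in closed form.

```python
import math

def extrapolate(v_past, a_hat, M_hat):
    """Extend v_past by M_hat samples using AR coefficients a_hat.

    Each new sample is -(a_hat[0]*v[n] + a_hat[1]*v[n-1] + ...), and is
    appended to the buffer so that later predictions can use it.
    """
    P = len(a_hat)
    buf = list(v_past)
    for _ in range(M_hat):
        nxt = -sum(a_hat[j] * buf[-1 - j] for j in range(P))
        buf.append(nxt)
    return buf[len(v_past):]

fs = 16000.0
omega = 2 * math.pi * 200.0 / fs           # 200 Hz tone at a 16 kHz sample rate
a_hat = [-2.0 * math.cos(omega), 1.0]      # exact AR(2) coefficients of a sinusoid
past = [math.cos(omega * n) for n in range(64)]

M_hat = 192                                # 12 ms of future samples
future = extrapolate(past, a_hat, M_hat)
truth = [math.cos(omega * n) for n in range(64, 64 + M_hat)]
max_err = max(abs(f - t) for f, t in zip(future, truth))
print("largest extrapolation error:", max_err)
```

A stationary sinusoid is the best case for such a predictor. Speech is non-stationary, so estimates degrade as the horizon $\widehat{M}$ grows, which is the prediction side of the trade-off examined in section 5.3.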
4.4. Geometric-Delay Compensation
The control soundfield modelling requires a virtual source location and signal. In this work, the reference microphone recording, $z(n)$, located at $\mathbf{z}$, is an attenuated and time delayed version of $v(n)$. Under the assumption of free-space and that the talker location, $\mathbf{t}$, is known, or can be reliably estimated, the talker signal is found by
$$v(n) = \Re\Big\{ \frac{1}{N} \sum_{m \in \llbracket N \rrbracket} \frac{4 \big\{ \sum_{n \in \llbracket N \rrbracket} z(n)\, e^{-icnk_m/2\dot{f}} \big\}}{i H^{(1)}_0(k_m \|\mathbf{v} - \mathbf{z}\|)}\, e^{icnk_m/2\dot{f}} \Big\}, \tag{13}$$
where $z(n)$ is inverse filtered in the frequency domain with $N$ sufficiently large compared to the time-delay. For the purpose of soundfield control, $\mathbf{t} \equiv \mathbf{v}$ and $v(n)$ is also the virtual source signal.
4.5. Loudspeaker Signals and Reproduction
Upon receiving the reference signal, $z(n)$, the final dipole loudspeaker signals, $q_{l,s}(n)$, are produced by firstly compensating for the geometric-delay with (13) to obtain $v(n)$. The virtual source signal is then extrapolated by $\widehat{M}$ future estimates computed with (12). The estimated $v(n)$ is transformed to the frequency domain after (8). The dipole loudspeaker weights, $Q_{l,s}(k)$, are computed with (7) through (5) after $Q_{h,m}$ is found via (2) and (3).
For the reproduction, $Q_{l,s}(k)$ are used as filters via (9) to obtain $q_{l,s}(n)$. The actual reproduced control soundfield is given by
$$S^c(\mathbf{x};k) = \sum_{l \in \llbracket \bar{L} \rrbracket,\, s \in \llbracket 2 \rrbracket,\, n \in \mathbb{Z}} q_{l,s}(n)\, e^{-icnk/2\dot{f}}\, T(\mathbf{x}, \mathbf{l}_{l,s}; k), \tag{14}$$
$\mathbf{x} \in \mathbb{D}_c$, where the 2D acoustic transfer function for each source is $T(\mathbf{x}, \mathbf{l}; k) = \frac{i}{4} H^{(1)}_0(k \|\mathbf{l} - \mathbf{x}\|)$. Note, $S^c(\mathbf{x};k)$ depends on $v(n)$.
5. RESULTS AND DISCUSSION
5.1. Experimental Setup
For evaluation, the layout of Fig. 1 is used with $R_{\mathbb{D}} = \bar{R} = 1\,\mathrm{m}$, $r_c = 0.9\,\mathrm{m}$, $\bar{\phi} = \pi$ and $\bar{D} = 2.1\,\mathrm{m}$. There are $\bar{L} = 18$ dipole speaker pairs with a spacing, $\ddot{d}$, of $1/k_{\max} = 2.73\,\mathrm{cm}$ [21, 26], where $k_{\max} = 2\pi(2\,\mathrm{kHz})/c$ and $c = 343\,\mathrm{m\,s^{-1}}$. Spatial aliasing in the soundfield reproduction begins to occur near 2 kHz, which reduces the control capability. All signals are sampled at a rate of 16 kHz with a frame step of $R = 0.5M$ for 50% overlapping, and $M \in \{64, 128, 192, 256, 320, 384, 448, 512\}$ are the window lengths in samples. A prediction of $\widehat{M} = M$ future samples is made using $B = 2M$ past samples with an order of $P = M$. The window, $w(n)$, is a square root Hann window. The location of the talker is $\mathbf{t} = (2\,\mathrm{m}, \pi)$ and the speech samples used to evaluate the performance were obtained from the TIMIT corpus [29]. Twenty files were randomly chosen such that the selection was constrained to have a male to female speaker ratio of 50:50.
Fig. 2. The pressure field for an ideal periodic cancellation at 1 kHz when the linear dipole array is inactive (A) and active (B).
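Several of the quantities in this setup follow directly from the stated parameters and can be recomputed as a sanity check. The sketch below uses only values given in this section; the propagation delay between talker and microphone is a derived illustration (it assumes free-space propagation and the microphone at the array centre) rather than a figure reported in the paper.

```python
import math

c = 343.0          # speed of sound in air (m/s)
fs = 16000.0       # sample rate (Hz)
f_max = 2000.0     # frequency that defines k_max (Hz)

k_max = 2 * math.pi * f_max / c
dipole_spacing_cm = 100.0 / k_max                       # 1 / k_max in centimetres

window_lengths = (64, 128, 192, 256, 320, 384, 448, 512)
block_lengths_ms = [1000.0 * M / fs for M in window_lengths]

control_area_m2 = math.pi * 0.9 ** 2                    # area of the r_c = 0.9 m region

def polar_to_xy(r, angle):
    """Convert polar coordinates (r, angle) to Cartesian (x, y)."""
    return (r * math.cos(angle), r * math.sin(angle))

talker = polar_to_xy(2.0, math.pi)      # t = (2 m, pi)
mic = polar_to_xy(1.0, math.pi)         # z at the array centre (1 m, pi)
distance = math.hypot(talker[0] - mic[0], talker[1] - mic[1])
delay_samples = distance / c * fs       # talker-to-microphone propagation delay

print(round(dipole_spacing_cm, 2), "cm dipole spacing")
print(block_lengths_ms, "ms block lengths")
print(round(control_area_m2, 2), "m^2 control area")
print(round(delay_samples, 1), "samples of geometric delay")
```

The block lengths of 4 ms to 32 ms and the 2.54 m² area match the axis range of Fig. 3 and the area quoted in its caption.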
5.2. Soundfield Suppression
In order to evaluate the suppression of the control system, 32 virtual microphones are placed in random locations throughout $\mathbb{D}_c$. The actual control and talker soundfields, $S^c(\mathbf{x};k)$ and $S^t(\mathbf{x};k)$, respectively, are approximated over $\mathbb{D}_c$ using the 32 virtual recordings. To gauge the performance of the system, the normalised acoustic suppression between $S^c(\mathbf{x};k)$ and $S^t(\mathbf{x};k)$ is defined as
$$\zeta(k) \triangleq \frac{\int_{\mathbb{D}_c} |S^t(\mathbf{x};k) + S^c(\mathbf{x};k)|\, d\mathbf{x}}{\int_{\mathbb{D}_c} |S^t(\mathbf{x};k)|\, d\mathbf{x}}, \tag{15}$$
where $S^c(\mathbf{x};k)$ is from (14) and, in this work, for simplicity, $S^t(\mathbf{x};k) = \sum_{n \in \mathbb{Z}} v(n)\, e^{-icnk/2\dot{f}}\, \frac{i}{4} H^{(1)}_0(k \|\mathbf{v} - \mathbf{x}\|)$. $\zeta(k)$ is found from (15) for a range of frequencies from 100 Hz to 8 kHz.
The real part of $S^t(\mathbf{x};k)$ is shown in Fig. 2 at 1 kHz for when $S^c(\mathbf{x};k)$ is active and inactive, as an example. Fig. 2 clearly shows significant suppression on only one side of the linear dipole array, providing a large quiet zone across the wall of loudspeakers. It is also apparent that, by not strictly sampling the entire boundary of the control region for the Kirchhoff-Helmholtz integral, the loudspeaker array does not restrict the movement of a listener in and out of $\mathbb{D}$.
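With a finite set of virtual microphones, the integrals in (15) reduce to sums over the sampled pressures. The following sketch shows that discrete form under stated assumptions: the field values are made-up complex numbers rather than simulated recordings, and the conversion to decibels uses $20\log_{10}\zeta$, which is an assumed convention for an amplitude ratio and is not stated explicitly in the text.

```python
import math

def suppression(S_t, S_c):
    """Discrete form of (15): summed |S_t + S_c| over summed |S_t|."""
    num = sum(abs(t + c) for t, c in zip(S_t, S_c))
    den = sum(abs(t) for t in S_t)
    return num / den

def to_db(zeta):
    """Express the ratio in decibels (assumes an amplitude ratio, 20 log10)."""
    return 20.0 * math.log10(zeta)

# Hypothetical talker pressures at four virtual microphones.
S_t = [1.0 + 0.5j, -0.3 + 0.8j, 0.6 - 0.2j, -0.9 - 0.4j]

# A control field that cancels 90% of the talker field at every microphone.
S_c = [-0.9 * p for p in S_t]
S_off = [0j for _ in S_t]           # control system inactive

zeta_on = suppression(S_t, S_c)
zeta_off = suppression(S_t, S_off)
print(to_db(zeta_on), to_db(zeta_off))
```

In this form a value of one (0 dB) means no change and smaller values mean more suppression, which is consistent with the negative decibel axes of Fig. 3 and Fig. 4.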
5.3. Synthesis and Prediction Accuracy Trade-off
A trade-off between soundfield reproduction accuracy and prediction accuracy is apparent in Fig. 3, which shows mean suppression from 156 Hz to 2 kHz. Assuming the signal is known (equivalent to a perfect prediction), as shown in blue in Fig. 3, a longer block length provides better control, but in practice it requires a longer (and presumably therefore less accurate) prediction. A smaller block length is expected to perform worse as it results in fewer analysis frequencies in the wave domain and, hence, is filtered with less accuracy. Using a larger block length overcomes this issue and, assuming perfect prediction, is capable of 18.8 dB of suppression on average over $\mathbb{D}_c$ with a 32 ms block length. However, with the necessary prediction to overcome the filtering delay, as shown in red in Fig. 3, the longer prediction results in less suppression. The peak suppression occurs with a 12 ms block length and 5.74 dB of suppression on average.
Fig. 3. The mean suppression, $\zeta$, computed using 1/6th octave band means from 156 Hz to 2 kHz over 2.54 m² for an actual future block in blue and predicted in red. 95% confidence intervals are shown. Axes: block length (ms) against suppression (dB); series: Predicted Signal, Actual Signal.
Fig. 4. The suppression, $\zeta(k)$, for a 12 ms block length from 100 Hz to 8 kHz over 2.54 m². 95% confidence intervals are shaded red and blue. The bandwidth where spatial aliasing occurs is shaded grey. Axes: frequency (kHz) against suppression (dB); series: Predicted Signal, Actual Signal.
Choosing the block length which attains maximum suppression in Fig. 3 has the potential to provide the best suppression for wave-domain processed soundfield control. The optimal block length in this case is 12 ms and the suppression for this block length is shown per frequency in Fig. 4. The downward trend in Fig. 4 as frequency decreases from 2 kHz suggests that the control from the predicted block performs best at lower frequencies. The increase below 156 Hz and the peak near 300 Hz are due to the finite length filter causing a loss of reproduction accuracy. It can be seen from Fig. 4 that the mean suppression reaches a peak of 9.1 dB near 400 Hz and remains stronger than 7.5 dB from 365 Hz to 730 Hz. Future work could include investigating the control above the spatial Nyquist frequency by either increasing the loudspeaker density or using hybrid loudspeaker and ANC systems [30, 31].
6. CONCLUSIONS
We have investigated the effects of autoregressive delay compensation on active speech control when using wave-domain processing to improve active control over large spatial regions. A system has been proposed using a linear array of secondary dipole sources which uses autoregressive prediction with wavefield decompositions to minimise residual soundfield energy. The proposed system is capable of a significant mean speech suppression of 18.8 dB with an ideally predicted 32 ms block over a large 2.54 m² area. Through analysis of the proposed control system, a trade-off between reproduction accuracy and prediction accuracy has been shown to exist. A predicted block with an optimal length of 12 ms has been shown to provide a mean suppression of 5.74 dB over a 2.54 m² area.
7. REFERENCES
[1] T. Betlehem, W. Zhang, M. Poletti, and T. D. Abhayapala, “Personal Sound Zones: Delivering interface-free audio to multiple listeners,” IEEE Signal Process. Mag., vol. 32, pp. 81–91, 2015.
[2] Y. Kajikawa, W.-S. Gan, and S. M. Kuo, “Recent advances on active noise control: open issues and innovative applications,” APSIPA Trans. Signal Inform. Process., vol. 1, pp. 1–21, 2012.
[3] S. M. Kuo, S. Mitra, and W.-S. Gan, “Active noise control system for headphone applications,” IEEE Trans. Control Syst. Technol., vol. 14, no. 2, pp. 331–335, 2006.
[4] T. J. Sutton, S. J. Elliott, A. M. McDonald, and T. J. Saunders, “Active control of road noise inside vehicles,” Noise Control Eng. J., vol. 42, no. 4, 1994.
[5] H. Sano, T. Inoue, A. Takahashi, K. Terai, and Y. Nakamura, “Active control system for low-frequency road noise combined with an audio system,” IEEE Trans. Speech Audio Process., vol. 9, no. 7, pp. 755–763, 2001.
[6] J. Cheer and S. J. Elliott, “The design and performance of feedback controllers for the attenuation of road noise in vehicles,” Int. J. Acoust. Vibration, vol. 19, no. 3, pp. 155–164, 2014.
[7] Y. Xiao and J. Wang, “A new feedforward hybrid active noise control system,” IEEE Signal Process. Lett., vol. 18, no. 10, pp. 591–594, 2011.
[8] N. V. George and G. Panda, “On the development of adaptive hybrid active noise control system for effective mitigation of nonlinear noise,” Signal Process., vol. 92, no. 2, pp. 509–516, 2012.
[9] O. J. Tobias and R. Seara, “Leaky delayed LMS algorithm: stochastic analysis for gaussian data and delay modeling error,” IEEE Trans. Signal Process., vol. 52, no. 6, pp. 1596–1606, 2004.
[10] I. T. Ardekani and W. H. Abdulla, “Adaptive signal processing algorithms for creating spatial zones of quiet,” Digital Signal Process., vol. 27, pp. 129–139, 2014.
[11] S. Elliott, Signal processing for active control. Academic Press, 2000.
[12] S. Spors and H. Buchner, “Efficient massive multichannel active noise control using wave-domain adaptive filtering,” in Int. Symp. Commun., Control Signal Process. (ISCCSP). IEEE, 2008, pp. 1480–1485.
[13] J. Zhang, W. Zhang, and T. D. Abhayapala, “Noise cancellation over spatial regions using adaptive wave domain processing,” in Workshop Applicat. Signal Process. Audio Acoust. (WASPAA). IEEE, 2015, pp. 1–5.
[14] J. Zhang, T. D. Abhayapala, P. N. Samarasinghe, W. Zhang, and S. Jiang, “Sparse complex FxLMS for active noise cancellation over spatial regions,” in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP). IEEE, 2016, pp. 524–528.
[15] W. Jin, “Adaptive reverberation cancelation for multizone soundfield reproduction using sparse methods,” in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP). IEEE, 2016, pp. 509–513.
[16] K. Kondo and K. Nakagawa, “Speech emission control using active cancellation,” Speech Commun., vol. 49, no. 9, pp. 687–696, Sep. 2007.
[17] L. Athanas, “Open air noise cancellation,” U.S. Patent 2011/0274283 A1, Nov. 10, 2011.
[18] J. Ahrens and S. Spors, “Sound field reproduction using planar and linear arrays of loudspeakers,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2038–2050, 2010.
[19] C. R. Hart and S.-K. Lau, “Active noise control with linear control source and sensor arrays for a noise barrier,” Journal of Sound and Vibration, vol. 331, no. 1, pp. 15–26, 2012.
[20] W. Chen, H. Min, and X. Qiu, “Noise reduction mechanisms of active noise barriers,” Noise Control Eng. J., vol. 61, no. 2, pp. 120–126, 2013.
[21] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press, 1999.
[22] W. Jin and W. B. Kleijn, “Theory and design of multizone soundfield reproduction using sparse methods,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, pp. 2343–2355, 2015.
[23] Y. J. Wu and T. D. Abhayapala, “Theory and design of soundfield reproduction using continuous loudspeaker concept,” IEEE Trans. Audio, Speech, Lang. Process., vol. 17, pp. 107–116, 2009.
[24] W. Jin, W. B. Kleijn, and D. Virette, “Multizone soundfield reproduction using orthogonal basis expansion,” in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP). IEEE, 2013, pp. 311–315.
[25] J. Donley, C. Ritz, and W. B. Kleijn, “Improving speech privacy in personal sound zones,” in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP). IEEE, 2016, pp. 311–315.
[26] F. Dunn, W. M. Hartmann, D. M. Campbell, N. H. Fletcher, and T. Rossing, Springer handbook of acoustics. Springer, 2015.
[27] K. K. Paliwal and W. B. Kleijn, “Quantization of LPC parameters,” in Speech Coding and Synthesis. Elsevier Science Inc., 1995, ch. 12, pp. 433–466.
[28] P. Stoica and R. L. Moses, Spectral analysis of signals. Upper Saddle River, NJ: Pearson Prentice Hall, 2005.
[29] J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallett, N. Dahlgren, and V. Zue, “TIMIT acoustic-phonetic continuous speech corpus,” Linguistic Data Consortium, 1993.
[30] J. Donley, C. Ritz, and W. B. Kleijn, “Reproducing Personal Sound Zones Using a Hybrid Synthesis of Dynamic and Parametric Loudspeakers,” in Asia-Pacific Signal & Inform. Process. Assoc. Annu. Summit and Conf. (APSIPA ASC). IEEE, Dec. 2016, pp. 1–5.
[31] K. Tanaka, C. Shi, and Y. Kajikawa, “Binaural active noise control using parametric array loudspeakers,” Applied Acoustics, vol. 116, pp. 170–176, Jan. 2017.
... • An active speech control method for the cancellation of speech across loudspeaker barriers is proposed [119]. ...
... • Soundfield reproduction loudspeaker weights are extended to dipole weights for speech suppression across active acoustic barriers [119]. ...
... • A novel autoregressive model is proposed for predicting non-stationary speech and is used to compensate for real-time filter delay in active soundfield control systems [119]. ...
Thesis
Full-text available
The experience and utility of personal sound is a highly sought after characteristic of shared spaces. Personal sound allows individuals, or small groups of individuals, to listen to separate streams of audio content without external interruption from a third-party. The desired effects of personal acoustic environments can also be areas of minimal sound, where quiet spaces facilitate an effortless mode of communication. These characteristics have become exceedingly difficult to produce in busy environments such as cafes, restaurants, open plan offices and entertainment venues. The concept of, and the ability to provide, spaces of such nature has been of significant interest to researchers in the past two decades. This thesis answers open questions in the area of personal sound reproduction using loudspeaker arrays, which is the active reproduction of soundfields over extended spatial regions of interest. We first provide a review of the mathematical foundations of acoustics theory, single zone and multiple zone soundfield reproduction, as well as background on the human perception of sound. We then introduce novel approaches for the integration of psychoacoustic models in multizone soundfield reproductions and describe implementations that facilitate the efficient computation of complex soundfield synthesis. The psychoacoustic based zone weighting is shown to considerably improve soundfield accuracy, as measured by the soundfield error, and the proposed computational methods are shown capable of providing several orders of magnitude better performance with insignificant effects on synthesis quality. Consideration is then given to the enhancement of privacy and quality in personal sound zones and in particular on the effects of unwanted sound leaking between zones. Optimisation algorithms, along with a priori estimations of cascaded zone leakage filters, are then established so as to provide privacy between the sound zones without diminishing quality. 
Simulations and real-world experiments are performed, using linear and part-circle loudspeaker arrays, to confirm the practical feasibility of the proposed privacy and quality control techniques. The experiments show that good quality and confidential privacy are achievable simultaneously. The concept of personal sound is then extended to the active suppression of speech across loudspeaker boundaries. Novel suppression techniques are derived for linear and planar loudspeaker boundaries, which are then used to simulate the reduction of speech levels over open spaces and suppression of acoustic reflections from walls. The suppression is shown to be as effective as passive fibre panel absorbers. Finally, we propose a novel ultrasonic parametric and electrodynamic loudspeaker hybrid design for acoustic contrast enhancement in multizone reproduction scenarios and show that significant acoustic contrast can be achieved above the fundamental spatial aliasing frequency.
Article
Due to strong inter-channel interference in multichannel active noise control (ANC), there are fundamental problems associated with the filter adaptation, and online secondary path modelling remains a major challenge. This paper proposes a wave-domain adaptation algorithm for multichannel ANC with online secondary path modelling to cancel tonal noise over an extended region of a 2D plane in a reverberant room. The design is based on exploiting the diagonal-dominance property of the secondary path in the wave domain. The proposed wave-domain secondary path model is applicable to both concentric and non-concentric circular loudspeaker and microphone array placement and is also robust against array positioning errors. Normalized least mean squares-type algorithms are adopted for adaptive feedback control. Computational complexity is analyzed and compared with the conventional time-domain and frequency-domain multichannel ANC. Through simulation-based verification in comparison with existing methods, the proposed algorithm demonstrates more efficient adaptation with low-level auxiliary noise.
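As a rough illustration of the normalized least mean squares (NLMS) update mentioned in the abstract above, the following is a minimal single-channel, time-domain sketch. It is not the wave-domain multichannel algorithm of that paper: the function name `nlms_identify`, the three-tap path `true_path`, the step size `mu` and the regulariser `eps` are all assumptions chosen for illustration, and the example only identifies a noise-free FIR path from a white input.

```python
import random


def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Adapt FIR weights w with NLMS so that w filtered over x tracks d."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    errors = []
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]                        # shift in the new sample
        y_n = sum(wi * bi for wi, bi in zip(w, buf))  # current filter output
        e_n = d_n - y_n                               # a priori error
        norm = eps + sum(b * b for b in buf)          # input power normalisation
        w = [wi + mu * e_n * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e_n)
    return w, errors


# Hypothetical "unknown" path to be modelled (illustrative values only).
random.seed(0)
true_path = [0.6, -0.3, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [
    sum(h * x[n - k] for k, h in enumerate(true_path) if n - k >= 0)
    for n in range(len(x))
]

w_hat, err = nlms_identify(x, d, num_taps=len(true_path))
```

Dividing the update by the instantaneous input power makes the effective step size independent of the input level, which is the property that distinguishes NLMS from plain LMS.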
Conference Paper
In this paper, we compare the performance of two active dereverberation techniques using a planar array of microphones and loudspeakers. The two techniques are based on a solution to the Kirchhoff-Helmholtz Integral Equation (KHIE). We adapt a Wave Field Synthesis (WFS) based method to the application of real-time 3D dereverberation by using a low-latency pre-filter design. The use of First-Order Differential (FOD) models is also proposed as an alternative to the use of monopoles with WFS, one which does not assume knowledge of the room geometry or primary sources. The two methods are compared by observing the suppression of reflections off a single active wall over the volume of a room in the time and (temporal) frequency domain. The FOD method provides better suppression of reflections than the WFS based method but at the expense of using higher order models. The equivalent absorption coefficients are comparable to passive fibre panel absorbers.
Conference Paper
This paper proposes a hybrid approach to personal sound zones utilising multizone soundfield reproduction techniques and parametric loudspeakers. Crossover filters are designed, to switch between reproduction methods, through analysis of aliasing artifacts in multizone reproductions. By realising the designed crossover filters, wideband acoustic contrast between zones is significantly improved. The trade-off between acoustic contrast and the bandwidth of the reproduced soundfield is investigated. Results show that by incorporating the proposed hybrid model the whole wideband bandwidth is free of spatial aliasing, with a mean acoustic contrast consistently above 54.2 dB, an improvement of up to 24.2 dB over a non-hybrid approach, with as few as 16 dynamic loudspeakers and one parametric loudspeaker.
Conference Paper
This paper proposes two methods for providing speech privacy between spatial zones in anechoic and reverberant environments. The methods are based on masking the content leaked between regions. The masking is optimised to maximise the speech intelligibility contrast (SIC) between the zones. The first method uses a uniform masker signal that is combined with desired multizone loudspeaker signals and requires acoustic contrast between zones. The second method computes a space-time domain masker signal in parallel with the loudspeaker signals so that the combination of the two emphasises the spectral masking in the targeted quiet zone. Simulations show that it is possible to achieve a significant SIC in anechoic environments whilst maintaining speech quality in the bright zone.
Article
Sound rendering is increasingly being required to extend over certain regions of space for multiple listeners, known as personal sound zones, with minimum interference to listeners in other regions. In this article, we present a systematic overview of the major challenges that have to be dealt with for multizone sound control in a room. Sound control over multiple zones is formulated as an optimization problem, and a unified framework is presented to compare two state-of-the-art sound control techniques. While conventional techniques have been focusing on point-to-point audio processing, we introduce a wave-domain sound field representation and active room compensation for sound pressure control over a region of space. The design of directional loudspeakers is presented and the advantages of using arrays of directional sources are illustrated for sound reproduction, such as better control of sound fields over wide areas and reduced total number of loudspeaker units, thus making it particularly suitable for establishing personal sound zones.
Article
Active noise control systems offer a potential method of reducing the weight of acoustic treatments in vehicles and, therefore, of increasing fuel efficiency. The commercialisation of active noise control has not been widespread, however, partly due to the cost of implementation. This paper investigates the design and performance of feedback road noise control systems, which could be implemented cost-effectively by using the car audio loudspeakers as control sources and low-cost microphones as error sensors. Three feedback control systems of increasing complexity are investigated: a single-input single-output (SISO) controller; a SISO controller employing weighted arrays of error sensors and control sources; and a fully-coupled multi-input multi-output (MIMO) controller. For each of the three controllers, robustness and disturbance enhancement constraints are defined. By formulating the three controllers using an Internal Model Control (IMC) architecture, and using frequency discretisation, the constrained optimisation problems are solvable using sequential quadratic programming. The performance of the three controllers and the associated design methods are first evaluated in a simulated environment, which allows the physical limits on performance to be understood. Finally, to validate the results in the simulated environment, the performance of the three controllers has been calculated using data measured in a car cabin. It is shown that the fully-coupled MIMO controller is able to achieve significant low frequency road noise control, at the expense of increased implementation complexity compared to the SISO and SISO weighted transducer array feedback controllers.
Book
Signal Processing for Active Control sets out the signal processing and automatic control techniques that are used in the analysis and implementation of active systems for the control of sound and vibration. After reviewing the performance limitations introduced by physical aspects of active control, Stephen Elliott presents the calculation of the optimal performance and the implementation of adaptive real time controllers for a wide variety of active control systems. Active sound and vibration control are technologically important problems with many applications. 'Active control' means controlling disturbance by superimposing a second disturbance on the original source of disturbance. Put simply, initial noise + other specially-generated noise or vibration = silence [or controlled noise]. This book presents a unified approach to techniques that are used in the analysis and implementation of different control systems. It includes practical examples at the end of each chapter to illustrate the use of various approaches. This book is intended for researchers, engineers, and students in the field of acoustics, active control, signal processing, and electrical engineering.
Article
This paper reports the binaural active noise control (ANC) system developed to deal with factory noise. The control points are located in the vicinity of the left and right ears of a worker sitting along the production line. Due to the complicated safety requirements in the factory, secondary sources and error microphones are not allowed to be placed near the worker. Therefore, the proposed ANC system employs the feedforward structure and adopts the parametric array loudspeakers (PALs) as the secondary sources. The PAL is a type of directional loudspeaker that generates a much narrower sound field as compared to the conventional loudspeaker. Once the proposed ANC system has been trained offline, the error microphones can be removed. The performance of the binaural ANC system is successfully demonstrated based on a digital signal processor (DSP) implementation.
Conference Paper
This paper proposes wave-domain adaptive processing for noise cancellation within a large spatial region. We use fundamental solutions of the Helmholtz wave equation as basis functions to express the noise field over a spatial region and apply the wave-domain processing directly to the decomposition coefficients to control the entire region. A feedback control system is implemented, where only a single microphone array is placed at the boundary of the control region to measure the residual signals, and a loudspeaker array is used to generate the anti-noise signals. We develop the adaptive wave-domain filtered-x least mean square algorithm. Simulation results show that, using the proposed method, the noise over the entire control region can be significantly reduced with fast convergence in both free-field and reverberant environments.
Article
Multizone soundfield reproduction over an extended spatial region is a challenging problem in acoustic signal processing. We introduce a method of reproducing a multizone soundfield within a desired region in reverberant environments. It is based on the identification of the acoustic transfer function (ATF) from the loudspeaker over the desired reproduction region using a limited number of microphone measurements. We assume that the soundfield is sparse in the domain of planewave decomposition and identify the ATF using sparse methods. The estimates of the ATFs are then used to derive the optimal least-squares solution for the loudspeaker filters that minimize the reproduction error over the entire reproduction region. Simulations confirm that the method leads to a significantly reduced number of required microphones for accurate multizone sound reproduction, while it also facilitates the reproduction over a wide frequency range. Practical experiments are used to verify the sparse planewave representation of the reverberant soundfield in a real-world listening environment.