378 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 2, FEBRUARY 2013
Generation of Isolated Wideband Sound Fields Using
a Combined Two-stage Lasso-LS Algorithm
Nasim Radmanesh, Student Member, IEEE, and Ian S. Burnett, Senior Member, IEEE
Abstract—The prohibitive number of speakers required for
the reproduction of isolated soundfields is the major limitation
preventing solution deployment. This paper addresses the pro-
vision of personal soundfields (zones) to multiple listeners using
a limited number of speakers with an underlying assumption of
fixed virtual sources. For such multizone systems, optimization
of speaker positions and weightings is important to reduce the
number of active speakers. Typically, single stage optimization is
performed, but in this paper a new two-stage pressure matching
optimization is proposed for wideband sound sources. In the first
stage, the least-absolute shrinkage and selection operator (Lasso)
is used to select the speakers’ positions for all sources and fre-
quency bands. A second stage then optimizes reproduction using
all selected speakers on the basis of a regularized least-squares
(LS) algorithm. The performance of the new, two-stage approach
is investigated for different reproduction angles, frequency range
and variable total speaker weight powers. The results demonstrate
that using two-stage Lasso-LS optimization can give up to 69 dB
improvement in the mean squared error (MSE) over a single-stage
LS in the reproduction of two isolated audio signals within control
zones using e.g. 84 speakers.
Index Terms—Isolated sound fields, Lasso, least-squares, multizone,
wideband.
I. INTRODUCTION
THERE is a range of applications for the reproduction of
multiple isolated wideband sound fields using limited
numbers of speakers. Examples are the provision of private
sound spaces during video conferencing and in communal
areas such as medical consulting rooms, museums, planes and
cars. All of these applications can have the location of virtual
sources fixed such that the choice of speaker positions can then
be based on the positioning and size of the listening zones.
This provides more efficient sound reproduction than can be
achieved using a uniformly spaced array.
One approach for the generation of a desired soundfield in
one zone (known as a bright or active zone) and silence in an
adjacent zone (known as a dark or quiet zone) is active control
of sound [1]. Previous research using that approach involved the
maximization of acoustic energy contrast [2] or acoustic energy
Manuscript received March 19, 2012; revised August 06, 2012 and October
11, 2012; accepted November 05, 2012. Date of publication November 15,
2012; date of current version December 10, 2012. The associate editor coordi-
nating the review of this manuscript and approving it for publication was Dr.
Rongshan Yu.
The authors are with the School of Electrical and Computer Engineering,
RMIT University, Melbourne, VIC 3001, Australia (e-mail: nasim.rad-
manesh@rmit.edu.au).
Digital Object Identifier 10.1109/TASL.2012.2227736
difference [3] between the bright and dark zones using eigen-
value analysis. Similar scenarios are investigated in [4]–[6] for
the generation of personal audio systems. The practical imple-
mentation of a personal audio system is investigated in [5] to
generate acoustic isolation in adjacent seats in aircraft and in [6]
to provide personal sound fields for viewers using a line array
attached to a 17 in. monitor display. Moreover, the authors of [7]
presented broadband beamforming techniques using a speaker
array for focusing the sound to the user. Finally, the regularized
least squares (LS) pressure matching approach was introduced
in [8] for sound reproduction in multiple, isolated zones. The
performance of a multizone system using LS pressure matching
was further investigated in [9] for multiple conversation repro-
duction in a multi user environment.
There are limitations to accurate wideband sound repro-
duction in a multizone system using a practical number of
speakers. Firstly, the number of speakers required for accurate
sound generation increases with the size of the reproduction
area and frequency range [10], [11]. In [9], the present authors
employed 300 speakers around a circle of radius 2 m for signals
up to 4 kHz. Furthermore, delivering wideband signals (e.g.
speech signals) to listeners in multiple zones is a complicated
scenario that limits the reproduction performance of systems with
few speakers. This is most critical when the zones are in line
[8], [9]. Hence, this paper targets reduction of speaker count
through improved optimization.
Whereas the above mentioned techniques control the com-
plex weights of speakers at fixed locations on a uniformly
spaced array, the present work controls both the speaker loca-
tions and their complex weights to achieve a high performance
multizone system using a minimum number of speakers. Use of
a limited number of speakers and selection of the LS-optimal
speaker locations for maximum pressure matching at micro-
phone positions is a non-convex problem which is in general
NP-hard [12]. Using the least-absolute shrinkage and selection
operator (Lasso) [13], the problem can be converted to a convex
problem. It can then be solved with $\ell_1$-penalization using the
least-angle regression (LARS) algorithm [14], which computes
the entire Lasso coefficient path, or with a low-complexity
procedure such as the coordinate descent method [15]. Using a
convex $\ell_1$ norm, Lasso produces zero-valued weights and thus
generates a reduced set of speakers.
For wideband sound, a single-stage Lasso approach is less
accurate than single-stage LS because Lasso does not employ
all selected speakers to reproduce all frequencies and sources.
Thus, this paper proposes a new two-stage Lasso-LS algorithm.
The first stage of the algorithm uses the selectivity of Lasso to
choose an optimal subset of speakers across all frequency bands,
1558-7916/$31.00 © 2012 IEEE
RADMANESH AND BURNETT: GENERATION OF ISOLATED WIDEBAND SOUND FIELDS 379
while LS is then employed in a second stage to optimize the
weightings for that subset. It should be noted that a preliminary
analysis of the proposed technique was presented by the authors
in [16], while the current paper provides full implementation de-
tails and rigorous analysis of the two-stage Lasso-LS algorithm
for isolated wideband sound generation.
This paper is structured as follows: Section II explains the
multizone system and the use of a pressure matching approach
for the design of speaker weights. In Section III, the single-stage
regularized LS and Lasso methods are outlined and simulation
results using these techniques for multizone wideband sound re-
production presented. In Section IV, the new two-stage, com-
bined Lasso-LS optimization algorithm is discussed and its per-
formance results in comparison to single-stage regularized LS
and Lasso approaches are provided. Finally, in Section V we
conclude the paper with a discussion of the potential future
directions.
II. MULTIZONE SOUNDFIELD GENERATION
To investigate the performance of the multizone system, it is
assumed here that the sound field propagates under free field
conditions, virtual sources and speakers are considered to be
point sources, and that all zones, virtual sources and speakers are
located in the same plane. In the following analysis the aim is to
generate isolated sound fields for $S$ wideband
sources (with constituent frequencies $f_n$, $n = 1, \dots, N$) in
$Z$ zones within the speaker array, with the radius and angle of the
$s$th source being $r_s$ and $\phi_s$. For this paper, the task is to generate
a desired field for every source in one corresponding active zone
and to suppress it effectively in the other $Z - 1$ zones (silent
zones) using an array of $L$ speakers located on an arc of 180°.
This corresponds to the example application scenarios described
previously.
Fig. 1 illustrates the task scenario with the reproduction zones
located at radius from the origin and the th zone’s angle
given by . All zones are located within a semicircle of ra-
dius surrounded by an array of speakers placed on a semi-
circle of radius . Each zone is of radius with a covering
of matching points distributed uniformly over a Euclidean
grid. For each source frequency, , a pressure matching ap-
proach is performed to control the complex sound pressure at
the matching points within the zones. While the pressure
amplitude is directly controlled within the zones, the pres-
sure outside of the control zones is limited by control of the
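total speaker weight power. Assuming a harmonic time dependency,
the pressure $p(\mathbf{x}_m; s, f_n)$ produced by the $L$ speakers at a given
matching point $\mathbf{x}_m$ is given by [8]:
$$p(\mathbf{x}_m; s, f_n) = \sum_{l=1}^{L} w_l(s, f_n)\, G(\mathbf{x}_m | \mathbf{y}_l, f_n) \tag{1}$$
where $w_l(s, f_n)$ is the $l$th speaker weight for reproduction of the
$s$th source at frequency $f_n$ and $G(\mathbf{x}_m | \mathbf{y}_l, f_n)$ is the Green's function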
Fig. 1. Diagram of reproduction of isolated sound fields in a multizone system
using an arc of speakers.
which relates the pressure amplitude of the th speaker and the
pressure at the matching point according to:
(2)
where $k = 2\pi f_n / c$
is the acoustic wave number and $c$ the speed of sound propagation
in air. $\mathbf{y}_l$ and $\mathbf{x}_m$ are respectively
the vector positions of the speakers and matching points in polar
coordinates. The desired sound field of a virtual
source located at to be reproduced e.g. in the
first zone [8] is then given by:
(3)
where relates the pressure amplitude of the th source
and the pressure at the matching point, the first zone is covered
by the first matching points and is the sound
field attenuation in inactive zones.
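The speaker weights $w_l(s, f_n)$ can be estimated by equating
the reproduced soundfield given by (1) at the $M$ matching points
with the desired field given by (3) according to:
$$\mathbf{G}\,\mathbf{w} = \mathbf{p}_d \tag{4}$$
where $\mathbf{w}$ is the $L$ by 1 vector of speaker weights $w_l(s, f_n)$,
$l = 1, \dots, L$, $\mathbf{p}_d$ is the $M$ by 1 vector of desired sound
pressures at the matching points and $\mathbf{G}$ is the $M$ by $L$ matrix
of the 2-D Green's function given by:
$$\mathbf{G} = \begin{bmatrix} G(\mathbf{x}_1 | \mathbf{y}_1, f_n) & \cdots & G(\mathbf{x}_1 | \mathbf{y}_L, f_n) \\ \vdots & \ddots & \vdots \\ G(\mathbf{x}_M | \mathbf{y}_1, f_n) & \cdots & G(\mathbf{x}_M | \mathbf{y}_L, f_n) \end{bmatrix} \tag{5}$$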
The following section examines the use of the LS and Lasso
methods to solve (4).
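As an illustration of how the quantities in (4) and (5) might be assembled numerically, the following is a minimal sketch rather than the authors' implementation. It assumes a 3-D free-field point-source kernel $e^{ikR}/(4\pi R)$ in place of the Green's function of (2), a speed of sound of 343 m/s, and hypothetical helper names (`polar_to_xy`, `green_matrix`, `desired_pressure`). The 60 dB attenuation in inactive zones follows the value quoted in Section III-C.

```python
import numpy as np

def polar_to_xy(radius, angle_rad):
    """Convert polar coordinates to Cartesian (x, y) pairs."""
    return np.stack([radius * np.cos(angle_rad), radius * np.sin(angle_rad)], axis=-1)

def green_matrix(points_xy, sources_xy, freq_hz, c=343.0):
    """M-by-L transfer matrix; entry (m, l) maps source l to matching point m.

    Assumption: 3-D free-field point-source kernel exp(ikR) / (4 pi R);
    the paper's own Green's function (2) could be substituted here.
    """
    k = 2.0 * np.pi * freq_hz / c                      # acoustic wave number
    diff = points_xy[:, None, :] - sources_xy[None, :, :]
    R = np.linalg.norm(diff, axis=-1)                  # M-by-L distances
    return np.exp(1j * k * R) / (4.0 * np.pi * R)

def desired_pressure(points_xy, zone_of_point, source_xy, active_zone,
                     freq_hz, attenuation_db=60.0):
    """Desired field: the virtual source's field in its active zone,
    attenuated by `attenuation_db` at matching points in the other zones."""
    g = green_matrix(points_xy, source_xy[None, :], freq_hz)[:, 0]
    gain = np.where(zone_of_point == active_zone, 1.0,
                    10.0 ** (-attenuation_db / 20.0))
    return gain * g

# Hypothetical geometry: 5 speakers on a 180-degree arc of radius 2 m.
speakers = polar_to_xy(np.full(5, 2.0), np.linspace(0.0, np.pi, 5))
points = np.array([[0.0, 0.5], [0.1, 0.5], [0.0, -0.5]])
zones = np.array([0, 0, 1])                            # zone index per point
G = green_matrix(points, speakers, 1000.0)
p_d = desired_pressure(points, zones, np.array([0.0, 2.0]), 0, 1000.0)
```

The matrix `G` and vector `p_d` then play the roles of the Green's function matrix and desired pressure vector in (4).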
III. SINGLE-STAGE SPEAKER WEIGHT ESTIMATION
A. Single-stage LS Weight Estimation
The regularized LS approach is a robust solution to multizone
sound generation from all directions normally using a uniformly
spaced array of speakers. In this method, for generation of a
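frequency, $f_n$, of source $s$ the speaker weights, $\mathbf{w}$, are determined
by minimizing the squared error between the desired and
reproduced field with a power constraint, such that:
$$\min_{\mathbf{w}} \; \|\mathbf{p}_d - \mathbf{G}\mathbf{w}\|_2^2 + \lambda_{LS}\,\|\mathbf{w}\|_2^2 \tag{6}$$
where $\mathbf{G}$ and $\mathbf{p}_d$ are the Green's function matrix and desired
pressure vector of (4), $\|\cdot\|_2$ is the $\ell_2$-norm, $\lambda_{LS}$ is the LS penalty parameter
and $\|\mathbf{w}\|_2^2$ is the total speaker weight power. Adjusting
the penalty parameter $\lambda_{LS}$ between zero and infinity changes
the solution from a LS error solution to minimization of the
total speaker weight power only. Therefore, the value of $\lambda_{LS}$
should be optimized to minimize the error while sufficiently
controlling the total speaker weight power. The solution of (6)
is given by (7) when the matrix $\mathbf{G}$ is tall, i.e. for $M > L$:
$$\mathbf{w} = \left(\mathbf{G}^H \mathbf{G} + \lambda_{LS}\,\mathbf{I}\right)^{-1} \mathbf{G}^H \mathbf{p}_d \tag{7}$$
where $(\cdot)^H$ is the conjugate transpose and $\mathbf{I}$ is the $L$ by $L$ identity
matrix.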
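The closed-form regularized LS solution can be sketched in a few lines. This is an illustrative example, not the authors' code. It assumes the standard ridge form $\mathbf{w} = (\mathbf{G}^H\mathbf{G} + \lambda\mathbf{I})^{-1}\mathbf{G}^H\mathbf{p}_d$, and the names `regularized_ls` and `weight_power` are hypothetical.

```python
import numpy as np

def regularized_ls(G, p_d, lam):
    """Minimize ||p_d - G w||^2 + lam * ||w||^2 for complex G and p_d."""
    L = G.shape[1]
    A = G.conj().T @ G + lam * np.eye(L)      # L-by-L regularized normal matrix
    return np.linalg.solve(A, G.conj().T @ p_d)

def weight_power(w):
    """Total speaker weight power, taken here as the squared l2-norm of w."""
    return float(np.vdot(w, w).real)

# Random stand-in data (20 matching points, 6 speakers), for illustration only.
rng = np.random.default_rng(0)
G = rng.standard_normal((20, 6)) + 1j * rng.standard_normal((20, 6))
p_d = rng.standard_normal(20) + 1j * rng.standard_normal(20)
w_small = regularized_ls(G, p_d, 1e-3)
w_large = regularized_ls(G, p_d, 10.0)
```

Increasing the penalty parameter trades reproduction error for lower weight power, which is the behaviour described in the text: `weight_power(w_large)` is smaller than `weight_power(w_small)`.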
B. Single-Stage Lasso Weight Estimation
To reproduce the desired sound field of a virtual point source,
the LS approach allocates power to all speakers of a regularly-
spaced array. In such an array, the number of speakers required
for accurate sound generation increases with the size of the
reproduction area and frequency range [10], [11]. When the
number of speakers is limited, the LS-optimal speaker locations
must be selected for the best soundfield reproduction of the virtual
source. To reproduce the desired soundfield for each frequency,
$f_n$, of source $s$, a small subset of the $L$ candidate
speakers must be activated. From the model selection
approaches in [17], one could enforce a desirably small
number of speakers by computing their weights as the solution
of the following optimization problem:
$$\min_{\mathbf{w}} \; \|\mathbf{p}_d - \mathbf{G}\mathbf{w}\|_2^2 + \lambda_0\,\|\mathbf{w}\|_0 \tag{8}$$
where the $\ell_0$-norm $\|\mathbf{w}\|_0$ is the number of nonzero weights
in $\mathbf{w}$ and $\lambda_0$ is the penalty parameter. This is a non-convex
problem which is NP-hard [12] and requires exhaustive searches
over all subsets of columns of the Green's function matrix $\mathbf{G}$
[18]. To overcome this problem, the Lasso algorithm replaces
the non-convex $\ell_0$-norm with the convex $\ell_1$ norm and the complex
speaker weights can then be calculated from:
$$\min_{\mathbf{w}} \; \|\mathbf{p}_d - \mathbf{G}\mathbf{w}\|_2^2 + \lambda_1\,\|\mathbf{w}\|_1 \tag{9}$$
where $\|\mathbf{w}\|_1 = \sum_{l=1}^{L} |w_l|$ is the $\ell_1$ norm and $\lambda_1$ is the preselected Lasso penalty
parameter. Larger values of $\lambda_1$ produce fewer nonzero speaker
weights and (9) can be solved using a coordinate descent method
in the frequency domain [19]. In this algorithm, all speaker
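weights, $w_l$, $l = 1, \dots, L$, are updated individually at each
iteration. If $\mathbf{g}_l$ denotes the $l$th column of $\mathbf{G}$ (the 2-D Green's
function matrix), the error of the $l$th speaker at the $i$th iteration
is calculated by removing from the desired pressure vector $\mathbf{p}_d$ the effect of prior speaker
entries $w_j^{(i)}$, $j < l$, in that $i$th iteration and the
following speaker entries $w_j^{(i-1)}$, $j > l$, in
the $(i-1)$th iteration:
$$\mathbf{e}_l^{(i)} = \mathbf{p}_d - \sum_{j=1}^{l-1} \mathbf{g}_j\, w_j^{(i)} - \sum_{j=l+1}^{L} \mathbf{g}_j\, w_j^{(i-1)} \tag{10}$$
Using the updated error, the $l$th speaker weight at the $i$th iteration
is then given by:
$$w_l^{(i)} = \frac{\mathbf{g}_l^H \mathbf{e}_l^{(i)}}{\|\mathbf{g}_l\|_2^2} \left(1 - \frac{\lambda_1}{2\,|\mathbf{g}_l^H \mathbf{e}_l^{(i)}|}\right)_+ \tag{11}$$
where $(x)_+ = x$ for $x > 0$, and $(x)_+ = 0$ for $x \le 0$, and $\lambda_1$ is the Lasso penalty parameter. $w_l^{(i)}$ is
the unique solution of (9) with respect to the $l$th weight, which is written compactly for two
cases depending on the value of $|\mathbf{g}_l^H \mathbf{e}_l^{(i)}|$. When $|\mathbf{g}_l^H \mathbf{e}_l^{(i)}| \le \lambda_1 / 2$,
there is no stationary point over the differentiable region and
$w_l^{(i)} = 0$ is the unique global minimum whereas for
$|\mathbf{g}_l^H \mathbf{e}_l^{(i)}| > \lambda_1 / 2$, the complex stationary point is the unique
global minimum [19]. The algorithm is allowed to iterate until
(9) converges to its global optimum as guaranteed in [20].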
The coordinate descent algorithm is a fast convex optimization
solver which provides the solution of (9) for a specific value of
the Lasso penalty parameter $\lambda_1$. However, the LARS algorithm [14] can also be used to solve
(9) in cases where many values of $\lambda_1$ are of interest. In a sound
reproduction scenario, the LARS algorithm gives a range of $\lambda_1$
values along the solution path of (9) from which the best may
be chosen for a particular number of active speakers.
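The coordinate descent procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The cost is taken to be $\|\mathbf{p}_d - \mathbf{G}\mathbf{w}\|_2^2 + \lambda\|\mathbf{w}\|_1$ with an unweighted squared error, which puts the threshold at $\lambda/2$. The names `lasso_cd` and `lasso_objective` and the stopping rule are hypothetical.

```python
import numpy as np

def lasso_objective(G, p_d, w, lam):
    """Squared reproduction error plus l1 penalty on the complex weights."""
    return float(np.linalg.norm(p_d - G @ w) ** 2 + lam * np.sum(np.abs(w)))

def lasso_cd(G, p_d, lam, n_iter=200, tol=1e-10):
    """Cyclic coordinate descent for the complex-valued Lasso."""
    L = G.shape[1]
    w = np.zeros(L, dtype=complex)
    col_energy = np.sum(np.abs(G) ** 2, axis=0)   # ||g_l||^2 for each column
    residual = p_d.astype(complex)                # equals p_d - G w for w = 0
    for _ in range(n_iter):
        max_change = 0.0
        for l in range(L):
            g = G[:, l]
            e = residual + g * w[l]               # error with speaker l removed
            c = np.vdot(g, e)                     # g^H e (vdot conjugates g)
            mag = abs(c)
            if mag <= lam / 2.0:
                w_new = 0.0                       # weight is shrunk to zero
            else:                                 # complex soft threshold
                w_new = (mag - lam / 2.0) / col_energy[l] * (c / mag)
            residual = e - g * w_new
            max_change = max(max_change, abs(w_new - w[l]))
            w[l] = w_new
        if max_change < tol:                      # hypothetical stopping rule
            break
    return w

# Random stand-in data (30 matching points, 12 candidate speakers).
rng = np.random.default_rng(0)
G = rng.standard_normal((30, 12)) + 1j * rng.standard_normal((30, 12))
p_d = rng.standard_normal(30) + 1j * rng.standard_normal(30)
lam_max = 2.0 * np.max(np.abs(G.conj().T @ p_d))  # above this, w is all zero
w_sparse = lasso_cd(G, p_d, 0.3 * lam_max)
```

The residual is kept up to date after every coordinate, so each update uses the newest values of the earlier weights and the previous-sweep values of the later ones, as in the error definition of (10).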
C. Reproduction Error
In order to compare the error performance of different algo-
rithms, a reproduction MSE, , generated by every source
at frequency in each zone is calculated as:
(12)
where is the area of each zone and and ,
,are respectively the desired and
reproduced soundfields in the area for the th source at fre-
quency .
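A zone MSE in decibels might be evaluated numerically as sketched below. Two points are assumptions of this sketch and are not taken from (12): the error energy is normalized by the desired-field energy, and the area integral is approximated by a sum over sample points in the zone. The function name `mse_db` is hypothetical.

```python
import numpy as np

def mse_db(p_desired, p_reproduced):
    """Normalized mean squared error in dB over a set of sample points.

    Assumption: error energy is divided by desired-field energy, and the
    sample points stand in for the area integral over the zone.
    """
    err = np.sum(np.abs(p_desired - p_reproduced) ** 2)
    ref = np.sum(np.abs(p_desired) ** 2)
    return 10.0 * np.log10(err / ref)

# A reproduced field with a uniform 10 % amplitude error, for illustration.
p_des = np.exp(1j * np.linspace(0.0, 3.0, 50))
p_rep = 0.9 * p_des
level = mse_db(p_des, p_rep)
```

Under this normalization a 10 % amplitude error corresponds to an MSE of about -20 dB.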
For evaluation of the error performance within the non-optimized
area (NOA), (which is the area outside of the zones
and confined by the circle of radius ), an ideal soundfield
attenuation of 60 dB is assumed (equal to the
inactive zones attenuation). Thus, the MSE , in the NOA
area can be calculated as:
(13)
where and , ,
are respectively the soundfield produced by the virtual source
and by the speakers with weights at frequency in the
NOA area.
The total mean squared error, of sources at frequency
within the considered area is then calculated as:
(14)
The following subsection compares the performance of the
single-stage LS and single-stage Lasso algorithms for isolated
sound reproduction of two wideband sources.
D. Simulation Result
Wideband Sound Reproduction Using Single-Stage LS and
Single-Stage Lasso: Throughout this paper, the zones have
fixed locations at from the origin and zone angles
are and . It is assumed that zone 1 is the
target zone for source 1 and zone 2 the target zone for source 2.
The speaker and source radii are considered to be
and the number of matching points used in each zone of radius
is . In this section, the performance of the
single-stage Lasso algorithm is compared to the single-stage
LS for isolated sound reproduction of wideband sources
located at and in corresponding
zones. The test wideband signals considered here comprise
center band frequencies of one-third octave bands
from 100 Hz to 16 kHz. For isolated sound generation of
wideband sources using single-stage Lasso, Lasso
problems were solved to select 46 different sets of speakers
from candidate positions. The Lasso penalty param-
eter was fixed across the range of frequencies. The
unified 46 sets of selected speakers give a total of
active speakers for reproduction of both wideband sources
using single-stage Lasso. Fig. 2(b) shows the location of all
active speakers selected in the single-stage Lasso al-
gorithm. For a fair comparison between the single-stage Lasso
and single-stage LS in wideband sound reproduction, the same
number of active speakers was employed for the LS method
at a comparable total speaker weight power .
Speakers used in the single-stage LS were arranged in a uni-
formly-spaced array as demonstrated in Fig. 2(a). For clarity,
Figs. 3 and 4 compare the performance of the single-stage reg-
ularized LS and Lasso methods at a comparative total speaker
weight power for reproduction of two selected frequencies
Hz from source1 and kHz from source2. In
Fig. 3, the speaker weights are calculated for reproduction of
both the discrete frequencies and . For reproduction of both
wideband sources using single-stage Lasso, a set of
speakers were used but only a subset of them were powered
for the reproduction of discrete frequencies or , whereas in
the single-stage LS approach, all speakers were active
for the generation of each discrete frequency or . It can be
seen that single-stage Lasso is less accurate than single-stage
LS for wideband sound generation. Table I lists the mean error
within the zones and the NOA for the sound reproduction
in Fig. 3 at comparable total speaker weight power for both
methods. This table demonstrates that single stage regularized
LS performance is up to 9 dB better than single stage Lasso.
Fig. 4 illustrates the squared error generated within the circle of
radius for reproduction of the soundfields in Fig. 3.
Fig. 2. Speaker locations for (a) single-stage LS and (b) single-stage Lasso,
sources located at and , the number of speakers used
in the reproduction of two wideband sources is .
Fig. 3. Sound field visualization and speaker weights using (a) single-stage LS
and (b) single-stage Lasso. Two wideband sources are located at
and (Single frequencies of Hz and kHz shown
for clarity). In both methods, the number of speakers used in the reproduction
of two wideband sources is .
The error generated by Lasso can be seen almost everywhere
within the circle (Fig. 4(b)) whereas the LS error is lowest in
the vicinity of the zones (Fig. 4(a)).
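The count of 46 speaker sets quoted in this section is consistent with two sources and the one-third octave bands between 100 Hz and 16 kHz, as the short check below suggests. The list of nominal center frequencies is the commonly tabulated preferred series. The base-2 formula $1000 \cdot 2^{k/3}$ and the variable names are assumptions of this sketch.

```python
# Nominal one-third octave center frequencies from 100 Hz to 16 kHz.
NOMINAL_THIRD_OCTAVE_HZ = [
    100, 125, 160, 200, 250, 315, 400, 500, 630, 800,
    1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000,
    10000, 12500, 16000,
]

# Exact base-2 centers 1000 * 2^(k/3), k = -10 .. 12, for comparison.
exact_centers_hz = [1000.0 * 2.0 ** (k / 3.0) for k in range(-10, 13)]

n_bands = len(NOMINAL_THIRD_OCTAVE_HZ)      # number of frequency bands
n_sources = 2                               # two wideband sources, as in the text
n_lasso_problems = n_sources * n_bands      # one Lasso problem per source and band
```

With 23 bands per source, two sources give the 46 Lasso problems and 46 speaker sets mentioned above.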
IV. TWO-STAGE SPEAKER WEIGHT ESTIMATION
A. Two Stage, Combined Lasso-LS Optimization
In this section a new two-stage, combined Lasso-LS algo-
rithm (Fig. 5) is proposed for wideband sound reproduction
with an underlying assumption of fixed virtual sources. For a
given set of virtual source positions and a large set of po-
tential positions, the LS-optimal speaker locations for the lim-
ited number of speakers must be selected for maximum pressure
matching at the microphones. This is a non-convex problem
Fig. 4. Reproduction squared error (dB) within the circle of radius using (a)
single-stage LS and (b) single-stage Lasso. Two wideband sources are located at
and (Squared error for generation of single frequencies
Hz from source1 and kHz from source2 shown for clarity).
The number of active speakers in both methods is identical i.e. .
TABLE I
THE MEAN ERROR (ME) OF FIG. 3
which can be converted to a convex problem using Lasso as
explained in the previous section. Since Lasso guarantees the
global minimum solution of the convex problem using $\ell_1$-penalization,
a first-stage Lasso optimization was employed to select
a subset of speakers across all frequency bands. In other
words, for the present problem, Lasso is used to find the most
efficient locations in terms of achieving the lowest reproduction
MSE for a limited number of speakers. However, single-stage
Lasso selects a different set of active speakers for each discrete
frequency, of every source . Thus, the selected speakers for
the reproduction of each frequency represent only a subset of
all speakers selected for the generation of all wideband sources.
Multizone system performance can thus be improved by activating
all selected speakers in a second stage LS optimization
and performing complex weighting optimization for all frequencies
and sources. In this paper, second stage regularized LS estimation
is proposed as it is theoretically guaranteed to result in
the lowest MSE for the selected set of speakers.
Fig. 5. The two-stage Lasso-LS algorithm.
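In the first stage of the Lasso-LS algorithm, $SN$ Lasso problems
are solved to determine all active speakers used for reproduction
of $S$ wideband sound fields with constituent
frequencies $f_n$, $n = 1, \dots, N$. In this paper the center
band frequencies of one-third octave bands [21] from 100 Hz
to 16 kHz were used to select active speakers in the first stage
Lasso algorithm. The first-stage Lasso penalty parameter determines
the number of selected active speakers. The larger the
first-stage penalty parameter, $\lambda_1$, is made, the lower the number
of speakers selected. The columns of matrix $\mathbf{W}$ are the solutions
of the $SN$ Lasso problems for sound reproduction of $S$ sources comprising
$N$ frequency bins such that:
$$\mathbf{W} = \begin{bmatrix} w_1(1, f_1) & \cdots & w_1(1, f_N) & \cdots & w_1(S, f_1) & \cdots & w_1(S, f_N) \\ \vdots & & \vdots & & \vdots & & \vdots \\ w_L(1, f_1) & \cdots & w_L(1, f_N) & \cdots & w_L(S, f_1) & \cdots & w_L(S, f_N) \end{bmatrix} \tag{15}$$
where $\mathbf{W}$ is the $L$ by $SN$ matrix of the speaker weights, with $w_l(s, f_n)$ the weight of the
$l$th of $L$ candidate speakers for source $s$ at frequency $f_n$. The $l$th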
entry of the total speaker weights vector is then calculated
from:
(16)
The $L$ by 1 vector of total speaker weights (one entry per candidate
speaker position) is the output of the first stage algorithm. The locations of the active
speakers to be used in the second stage are then extracted on
the basis of the nonzero entries of this vector. The number of those
nonzero entries thus determines the number of active
speakers to be used in the second stage.
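The extraction of active speaker locations from the first-stage weight matrix can be sketched as below. This is an illustrative example only. It assumes the total weight of each candidate speaker is the sum of its weight magnitudes across all columns. The exact form of (16) may differ, although any aggregate that is zero only when every entry in the row is zero selects the same speakers. The name `select_active_speakers` and the tolerance are hypothetical.

```python
import numpy as np

def select_active_speakers(W, tol=1e-12):
    """Return indices of candidate speakers used by at least one Lasso solution.

    W is the L-by-(S*N) matrix whose columns are first-stage Lasso solutions.
    Assumption: rows are aggregated by summed magnitudes.
    """
    total = np.sum(np.abs(W), axis=1)      # total weight per candidate speaker
    active = np.flatnonzero(total > tol)   # union of the column supports
    return active, total

# Toy example: 6 candidate positions, 4 Lasso solutions.
W = np.zeros((6, 4), dtype=complex)
W[1, 0] = 0.5j
W[4, 2] = -0.2
W[1, 3] = 1.0
active, total = select_active_speakers(W)
```

In this toy case candidate speakers 1 and 4 are retained for the second stage, and all other positions are discarded.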
In the second stage, the non-uniformly spaced arc of active
speakers is used. In this stage, all selected speakers are
utilized for sound reproduction of all constituent frequencies,
$f_n$, $n = 1, \dots, N$, of the $S$ wideband sources using LS optimization.
The number of LS problems to be solved for generation of
the isolated audio signals is thus $SN$. The second-stage penalty parameter
limits the power of the second-stage LS solution.
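The two stages can be combined as in the following sketch, given as a hedged illustration rather than the authors' implementation. It assumes one (G, p_d) pair per source and frequency, the Lasso cost $\|\mathbf{p}_d - \mathbf{G}\mathbf{w}\|_2^2 + \lambda\|\mathbf{w}\|_1$ solved by a fixed number of coordinate descent sweeps, and a ridge solve on the selected columns. All function names are hypothetical.

```python
import numpy as np

def lasso_cd(G, p_d, lam, n_iter=100):
    """Complex Lasso by cyclic coordinate descent (threshold lam / 2)."""
    w = np.zeros(G.shape[1], dtype=complex)
    energy = np.sum(np.abs(G) ** 2, axis=0)
    r = p_d.astype(complex)
    for _ in range(n_iter):
        for l in range(G.shape[1]):
            e = r + G[:, l] * w[l]
            c = np.vdot(G[:, l], e)
            mag = abs(c)
            w[l] = 0.0 if mag <= lam / 2 else (mag - lam / 2) / energy[l] * c / mag
            r = e - G[:, l] * w[l]
    return w

def ridge(G, p_d, lam):
    """Regularized LS solution on the given columns."""
    A = G.conj().T @ G + lam * np.eye(G.shape[1])
    return np.linalg.solve(A, G.conj().T @ p_d)

def two_stage_lasso_ls(problems, lam_lasso, lam_ls):
    """problems: list of (G, p_d), one per source and frequency, sharing
    the same L candidate speaker positions (columns of G)."""
    L = problems[0][0].shape[1]
    # Stage 1: one Lasso problem per (source, frequency); keep the union.
    W1 = np.column_stack([lasso_cd(G, p, lam_lasso) for G, p in problems])
    active = np.flatnonzero(np.sum(np.abs(W1), axis=1) > 0)
    # Stage 2: every selected speaker is used for every problem.
    W2 = np.zeros((L, len(problems)), dtype=complex)
    for j, (G, p) in enumerate(problems):
        W2[active, j] = ridge(G[:, active], p, lam_ls)
    return active, W1, W2

# Random stand-in problems (20 matching points, 10 candidate positions).
rng = np.random.default_rng(1)
problems = []
for _ in range(3):
    G = rng.standard_normal((20, 10)) + 1j * rng.standard_normal((20, 10))
    p = rng.standard_normal(20) + 1j * rng.standard_normal(20)
    problems.append((G, p))
lam = 0.5 * min(2 * np.max(np.abs(G.conj().T @ p)) for G, p in problems)
active, W1, W2 = two_stage_lasso_ls(problems, lam, 1e-8)
```

Because the second stage may use every selected speaker for every problem, its matching-point residual for a given problem should not exceed that of the corresponding first-stage Lasso solution when the second-stage penalty is small.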
The important facet of the proposed technique is the loca-
tion optimization of speakers using Lasso sparsity before op-
timization of the sound field control parameters (e.g. speaker
weights) using a LS pressure matching approach. The Lasso
optimization could also be combined with other sound repro-
duction techniques such as various beamforming methods [8],
[22] to generate the desired directivity patterns in the zones. In
such scenarios, the Lasso algorithm would guarantee the LS-optimal
speaker placement using a convex $\ell_1$ norm, while the
beamformer would control the directivity pattern of selected
speakers. This technique could enable a wideband beamformer
to achieve a controllable directivity pattern while minimizing
the total number of speakers using the selectivity of Lasso.
B. Simulation Results
This section employs the same configuration of multizone
surround system used in Section III. The performance of the
two-stage, combined Lasso-LS algorithm is compared to the
single-stage LS approach for sound reproduction of
wideband sources in zones. One-third octave bands
with center band frequencies from 100 Hz to 16 kHz were used
in the first stage Lasso algorithm and sets of active
speakers out of candidate positions were selected
(corresponding to constituent frequencies of
wideband sources). The unified 46 sets of speakers give a total
of active speaker locations to be used in the second-stage al-
gorithm for reproduction of both wideband sources. Figs. 6(b)
and 9(b) show the location of all active speakers selected in the
Lasso-LS algorithm for two different scenarios which will be
discussed in the next section. For a fair comparison between
the single-stage LS and two-stage Lasso-LS algorithm in terms
of wideband sound reproduction, the same number of active
speakers selected in the Lasso-LS algorithm was employed to
evaluate the LS method at a comparable total speaker weight
power. Speakers used in the single-stage LS are arranged in a
uniformly-spaced array as demonstrated in Figs. 6(a) and 9(a).
In the following, different aspects are considered for the perfor-
mance assessment such as virtual source angles, total speaker
weight power and frequency.
Virtual Source Angles: Two distinct scenarios for virtual
source angles were considered so as to investigate the perfor-
mance of the two-stage Lasso-LS algorithm using a limited
number of speakers. In the first scenario, the virtual sources are
close to each other and in the middle of the semicircle array of
candidate positions (at and ). In the second
scenario, the virtual sources (at and )
are further apart and compared to the first scenario they are
closer to the edge of the semicircle array. In both scenarios,
the penalty parameter used for the first stage Lasso was fixed
across the range of frequencies . Using the Lasso-LS
Fig. 6. Speaker locations for (a) single-stage LS and (b) two-stage Lasso-LS,
sources located at and , the number of speakers used in the
reproduction of two wideband sources is .
approach, speaker positions are selected during the first stage
Lasso algorithm which selects (from all candidate positions)
a number of positions closer to the virtual sources; thus, the
selection of active speakers in the first stage depends on the
virtual source angles.
Scenario 1: Virtual Sources Located at and
:In this scenario the virtual sources are close to each other
and active speakers selected across frequency in the Lasso-LS
algorithm are close to both sources as illustrated in Fig. 6(b).
In practice, a subset of selected speakers close to both virtual
sources is utilized for Lasso-LS sound reproduction of both
sources. For a fair comparison between the single-stage LS and
two-stage Lasso-LS algorithm in wideband sound reproduction,
the same number of active speakers selected in the Lasso-LS al-
gorithm was employed in the LS method at a comparable total
speaker weight power on a uniformly spaced array as illustrated
in Fig. 6(a) .
Fig. 7 illustrates the resulting soundfield and the corre-
sponding speaker weights of the LS and Lasso-LS algorithms
for scenario 1. This figure shows for clarity the generation of
a selected low frequency from source1 Hz and
a higher frequency from source2 kHz. Table II
demonstrates that in scenario 1, the flexibility of the Lasso-LS
algorithm in locating a limited number of speakers (e.g.
) at the LS-optimal positions can result in up to 14 dB
and 31 dB improvement over a single-stage LS approach in the
reproduction of frequencies and , respectively, within the
control zones. However, a regularly-spaced array is up to 2 dB
more successful in terms of limiting the error within the NOA.
This is because the speaker locations are not selected for the
best desired sound reproduction in the NOA and thus they are
not located at the best spots to generate minimum error in this
area. Fig. 8 illustrates the squared error generated within the
circle of radius for reproduction of the soundfields in Fig. 7.
Fig. 8(a) shows that the single-stage LS error is lowest in the
vicinity of the zones in comparison to the error generated in the
Lasso-LS algorithm (Fig. 8(b)).
Fig. 7. Sound field visualization and speaker weights using (a) single-stage LS
and (b) two-stage Lasso-LS. Two wideband sources are located at
and . (Single frequencies of Hz and kHz shown for
clarity). In both methods the number of active speakers is .
Fig. 8. Reproduction squared error (dB) within the circle of radius using (a)
single-stage LS and (b) two-stage Lasso-LS. Two wideband sources are located
at and (Squared error for generation of single frequencies
Hz from source1 and kHz from source2 shown for clarity).
The number of active speakers in both methods is identical i.e. .
Scenario 2: Virtual Sources Located at and
: In this scenario, where the virtual sources are located further
apart, the active speakers in the Lasso-LS approach are
TABLE II
THE MEAN ERROR (ME) OF FIG. 7
Fig. 9. Speaker locations for (a) single-stage LS and (b) two-stage Lasso-LS,
sources located at and , the number of speakers used in
the reproduction of two wideband sources is .
selected only for the reproduction of one of the sources as il-
lustrated in Fig. 9(b). For the LS sound reproduction approach,
similarly to the previous scenario a regularly-spaced array of
speakers is used as demonstrated in Fig. 9(a). Fig. 10
compares the above methods for scenario 2 at selected frequen-
cies Hz and kHz and at comparable total
speaker weight power. Table III shows that using
active speakers for scenario 2, Lasso-LS is up to 7 dB and
19 dB more accurate than a LS approach in generation of the
selected low frequency and the higher frequency respec-
tively. These results show that, in scenario 2 (similarly to sce-
nario 1), the Lasso-LS algorithm improves the performance of
the multizone system in terms of reproduction at both low and
higher frequencies within the control zones but not within the
NOA. Fig. 11 illustrates the squared error generated within the
circle of radius for reproduction of the soundfields in Fig. 10.
In Fig. 11(a) similarly to Fig. 8(a), the LS error is lowest in the
vicinity of the zones in comparison to the error generated in the
Lasso-LS algorithm (Fig. 11(b)).
The MSE Versus Total Speaker Weight Power: Fig. 12 shows
that the two-stage, combined Lasso-LS technique far outper-
forms the single-stage LS in reproduction of a selected low fre-
quency tone Hz and a high frequency tone
kHz of virtual wideband source1 located at .
In the Lasso-LS algorithm, speaker locations are se-
lected out of candidate positions on the basis of
two wideband sources located at and
and using a fixed Lasso penalty parameter (Fig. 2(b)).
Fig. 10. Sound field visualization and speaker weights using (a) single-stage
LS and (b) two-stage Lasso-LS. Two wideband sources are located at
and (single frequencies of Hz and kHz
shown for clarity). In both methods the number of active speakers is
.
Fig. 11. Reproduction squared error (dB) within the circle of radius using
(a) single-stage LS and (b) two-stage Lasso-LS. Two wideband sources are lo-
cated at and (Squared error for generation of single
frequencies Hz from source1 and kHz from source2 shown
for clarity). The number of active speakers in both methods is identical i.e.
.
In the single-stage LS approach the same number of speakers
TABLE III
THE MEAN ERROR (ME) OF FIG. 10
Fig. 12. The MSE vs total speaker weight power for generation of selected
frequencies (a) Hz and (b) kHz of source1 at with
zone1 as the target zone for this source and zone2 as the corresponding silent
zone. In the Lasso-LS algorithm, the speaker locations are selected considering
two wideband sources located at and . The number of
active speakers in both methods is identical i.e. .
are used in a uniformly-spaced array as demon-
strated in Fig. 2(a). The total speaker weight powers for the
regularized LS and Lasso-LS techniques are varied by tuning,
respectively, the LS penalty parameter, and the Lasso-LS
second stage penalty parameter, . In Fig. 12, Lasso-LS generates
e.g. the selected low frequency Hz in zone1
and zone2 respectively with 39 dB and 20 dB less MSE than
single-stage LS at a power (Fig. 12(a)) and generates
the selected high frequency kHz within zone1 and
zone2 with 48 dB and 49 dB less MSE at a power
(Fig. 12(b)) using active speakers. The reason for
the Lasso-LS algorithm’s dramatic advantage over single-stage
LS within the control zones is the capability of the Lasso-LS al-
gorithm to control both the speaker locations and their complex
weights for maximum pressure matching at microphone positions.
Fig. 12 demonstrates, however, that the Lasso-LS algorithm
performs no better than the LS method in the NOA.
This is because the speaker locations are not selected to
optimize sound reproduction in the NOA, and so are not
placed where a limited number of speakers would minimize
the error in this area.
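The two-stage idea discussed above (Lasso to select speaker positions, then regularized LS to re-fit the weights of the selected speakers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the free-field point-source transfer function, the ISTA solver used for the first stage, the geometry in the demo, and all function names (`transfer_matrix`, `lasso_ista`, `regularized_ls`, `lasso_ls`, `power`) are hypothetical choices introduced here.

```python
import cmath
import math


def transfer_matrix(mics, speakers, k):
    """G[m][l]: pressure at mic m from unit-weight speaker l.
    Assumed model: free-field point source exp(-jkr)/(4*pi*r)."""
    return [[cmath.exp(-1j * k * math.dist(m, s)) / (4 * math.pi * math.dist(m, s))
             for s in speakers] for m in mics]


def matvec(G, w):
    return [sum(g * x for g, x in zip(row, w)) for row in G]


def rmatvec(G, r):
    """Conjugate-transpose product G^H r."""
    return [sum(G[m][l].conjugate() * r[m] for m in range(len(G)))
            for l in range(len(G[0]))]


def lasso_ista(G, p, lam, iters=500):
    """Stage 1: minimise 0.5*||G w - p||^2 + lam*||w||_1 with ISTA."""
    # Step 1/||G||_F^2 is safe because ||G||_2 <= ||G||_F.
    step = 1.0 / sum(abs(g) ** 2 for row in G for g in row)
    w = [0j] * len(G[0])
    for _ in range(iters):
        r = [a - b for a, b in zip(matvec(G, w), p)]
        z = [x - step * g for x, g in zip(w, rmatvec(G, r))]
        # Complex soft-thresholding: shrink the magnitude, keep the phase.
        w = [x * max(0.0, 1.0 - step * lam / abs(x)) if abs(x) > 0 else 0j for x in z]
    return w


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def regularized_ls(G, p, mu):
    """Stage 2: minimise ||G w - p||^2 + mu*||w||^2 via the normal equations."""
    n = len(G[0])
    A = [[sum(G[m][i].conjugate() * G[m][j] for m in range(len(G))) + (mu if i == j else 0)
          for j in range(n)] for i in range(n)]
    return solve(A, rmatvec(G, p))


def lasso_ls(G, p, lam, mu):
    """Select speakers with the Lasso, then re-fit their weights with regularized LS."""
    active = [l for l, x in enumerate(lasso_ista(G, p, lam)) if abs(x) > 0]
    if not active:
        return active, []
    G_act = [[row[l] for l in active] for row in G]
    return active, regularized_ls(G_act, p, mu)


def power(w):
    """Total speaker weight power, sum of |w_l|^2."""
    return sum(abs(x) ** 2 for x in w)


if __name__ == "__main__":
    k = 2 * math.pi * 500 / 343.0  # wavenumber at 500 Hz, assuming c = 343 m/s
    speakers = [(2 * math.cos(math.pi * i / 11), 2 * math.sin(math.pi * i / 11))
                for i in range(12)]                      # hypothetical candidate arc
    mics = [(0.1 * i - 0.25, 0.0) for i in range(6)]     # hypothetical matching points
    G = transfer_matrix(mics, speakers, k)
    p = [row[0] for row in transfer_matrix(mics, [(3.0, 1.0)], k)]  # virtual source
    lam = 0.1 * max(abs(x) for x in rmatvec(G, p))
    active, w = lasso_ls(G, p, lam, mu=1e-4)
    print(len(active), "active speakers, total weight power", power(w))
```

In this sketch, raising `mu` lowers the total weight power of the second-stage solution at the cost of a larger matching error, which is the kind of trade-off swept in Fig. 12; raising `lam` reduces the number of active speakers.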
The MSE Versus Frequency: Fig. 13(a) shows that the Lasso-LS
algorithm outperforms the single-stage LS across all frequencies
from 100 Hz to 16 kHz for reproduction of virtual wideband
source1 located at . Fig. 13(b) demonstrates
the corresponding total speaker weight power versus frequency
and shows that the total speaker weight power is comparable at
frequencies over 200 Hz, while the penalty parameters are kept
constant across frequency (The LS penalty parameter is ,
the Lasso first-stage penalty parameter is and the LS
second stage penalty parameter is ). As can be seen, for
both zones, there is an LS peak error at kHz, which results
from the use of a solution with lower energy than the minimum
energy solution. However, locating the same number of
speakers at the LS-optimal positions in the Lasso-LS approach
provides solutions with the energy required for accurate
multizone sound reproduction across frequency. The MSE generated
at kHz using LS and Lasso-LS is 17 dB and
76 dB in zone 1 and 20 dB and 89 dB in zone 2, respectively,
at the total speaker weight power, . Furthermore,
as the frequency increases from 700 Hz to 16 kHz, the Lasso-LS
algorithm outperforms the single-stage LS by a further 20 dB.
As the frequency decreases below 500 Hz, the performance of the
two methods becomes more comparable.
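The link between solution energy and reproduction error noted above can be made explicit for a regularized LS solution. The following is a sketch in assumed notation, which may differ from the symbols used elsewhere in the paper: $\mathbf{G}$ is the speaker-to-microphone transfer matrix of rank $r$ with singular values $\sigma_i$ and left and right singular vectors $\mathbf{u}_i$, $\mathbf{v}_i$; $\mathbf{p}$ is the desired pressure vector; $\mu$ is the LS penalty parameter; and $\mathbf{p}_\perp$ is the component of $\mathbf{p}$ outside the range of $\mathbf{G}$.

```latex
\begin{align}
\mathbf{w}_\mu &= \arg\min_{\mathbf{w}} \|\mathbf{G}\mathbf{w}-\mathbf{p}\|_2^2 + \mu\|\mathbf{w}\|_2^2
  = \sum_{i=1}^{r} \frac{\sigma_i}{\sigma_i^2+\mu}\,(\mathbf{u}_i^H\mathbf{p})\,\mathbf{v}_i, \\
\|\mathbf{w}_\mu\|_2^2 &= \sum_{i=1}^{r} \frac{\sigma_i^2}{(\sigma_i^2+\mu)^2}\,|\mathbf{u}_i^H\mathbf{p}|^2, \\
\|\mathbf{G}\mathbf{w}_\mu-\mathbf{p}\|_2^2 &= \sum_{i=1}^{r} \frac{\mu^2}{(\sigma_i^2+\mu)^2}\,|\mathbf{u}_i^H\mathbf{p}|^2 + \|\mathbf{p}_\perp\|_2^2 .
\end{align}
```

The total weight power decreases and the residual increases monotonically with $\mu$. As $\mu \to 0$ the power tends to that of the minimum-energy LS solution, $\sum_i |\mathbf{u}_i^H\mathbf{p}|^2/\sigma_i^2$, so a regularized solution whose power falls below this value necessarily carries a larger error than the LS minimum.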
V. CONCLUSIONS
The aim of this work was to generate isolated wideband
soundfields in multiple listening spaces while minimizing the
number of speakers required. The paper demonstrates signif-
icant performance improvements within control zones over
an LS matching approach through the use of a new two-stage
Lasso-LS optimization approach. The latter exploits the selectivity
of Lasso to maximize the performance of a restricted
set of speakers reproducing fixed, wideband virtual sources.
The results show that using the proposed two-stage Lasso-LS
optimization for wideband sound reproduction can result in
up to 69 dB improvement in MSE within control zones over
a single-stage LS optimization. In addition, the performance
advantage of the Lasso-LS approach over a single-stage LS algorithm
is accentuated at higher frequencies, with gains
of over 20 dB in the experiments. Finally, the work shows that
limited arcs of e.g. 84 speakers can be used to successfully
create a multi-zone system for multiple users (effectively,
personal audio spaces). This makes the techniques appropriate
for realistic soundfield installations. Work is currently focused
on further reduction in the number of speakers by providing a
bank of candidate positions corresponding to frequency ranges.
This will lead to extensions of the approach to generation of
personal spaces in three-dimensional environments using a
Fig. 13. (a) The MSE measured in the control zones vs frequency and (b) total
speaker weight power vs frequency for source1 at with zone1 as
the target zone for this source and zone2 as the corresponding silent zone. In the
Lasso-LS algorithm, the speaker locations are selected considering two wide-
band sources located at and . The number of active
speakers in both methods is identical i.e. .
small set of speakers. A comparison of Lasso-LS pressure
matching with Lasso-beamforming techniques could also be
the topic of future work. Beamforming, however, does not
produce an exact desired field as it does not implement pressure
matching.
REFERENCES
[1] P. A. Nelson and S. J. Elliott, Active Control of Sound. New York:
Academic, 1993.
[2] J. Choi and Y. Kim, “Generation of an acoustically bright zone with an
illuminated region using multiple sources,” J. Acoust. Soc. Amer., vol.
111, no. 4, pp. 1695–1700, Apr. 2002.
[3] M. Shin, S. Lee, F. Fazi, P. Nelson, D. Kim, S. Wang, K. Park, and
J. Seo, “Maximization of acoustic energy difference between two
spaces,” J. Acoust. Soc. Amer., vol. 128, no. 1, pp. 121–131, Jul. 2010.
[4] W. F. Druyvesteyn and J. Garas, “Personal sound,” J. Audio Eng. Soc.,
vol. 45, no. 9, pp. 685–701, Sep. 1997.
[5] S. J. Elliott and M. Jones, “Active headrest for personal audio,” J.
Acoust. Soc. Amer., vol. 119, no. 5, pp. 2702–2709, May 2006.
[6] J.-H. Chang, C.-H. Lee, J.-Y. Park, and Y.-H. Kim, “A realization of
sound focused personal audio system using acoustic contrast control,”
J. Acoust. Soc. Amer., vol. 125, no. 4, pp. 2091–2097, Apr. 2009.
[7] I. Tashev, J. Droppo, M. Seltzer, and A. Acero, “Robust design of wide-
band loudspeaker arrays,” in Proc. IEEE Int. Conf. Acoust., Speech,
Signal Process., ICASSP’08, Las Vegas, NV, Mar 30–Apr 4 2008.
[8] M. Poletti, “An investigation of 2D multizone surround sound sys-
tems,” in Proc. AES 125th Convention. Audio Eng. Society, San Fran-
cisco, CA, Oct. 2008.
[9] N. Radmanesh and I. S. Burnett, “Reproduction of independent narrowband
soundfields in a multizone surround system and its extension to
speech signal sources,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal
Process., ICASSP’11, Prague, Czech Republic, May 22–27, 2011.
[10] D. B. Ward and T. D. Abhayapala, “Reproduction of a plane-wave
sound field using an array of loudspeakers,” IEEE Trans. Speech Audio
Process., vol. 9, no. 6, pp. 697–707, Sep. 2001.
[11] M. Poletti, “Robust two-dimensional surround sound reproduction for
nonuniform loudspeaker layouts,” J. Audio Eng. Soc., vol. 55, no. 7/8,
pp. 598–610, Jul./Aug. 2007.
[12] B. K. Natarajan, “Sparse approximate solutions to linear systems,”
SIAM J. Comput., vol. 24, no. 2, pp. 227–234, 1995.
[13] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. R.
Statist. Soc., Ser. B, vol. 58, no. 1, pp. 267–288, 1996.
[14] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle re-
gression,” Ann. Statist., vol. 32, no. 2, pp. 407–499, 2004.
[15] J. H. Friedman, T. Hastie, H. Hoefling, and R. Tibshirani, “Pathwise
coordinate optimization,” Ann. Appl. Statist., vol. 1, no. 2, pp. 302–332,
2007.
[16] N. Radmanesh and I. S. Burnett, “Wideband sound reproduction in a
2D multizone system using a combined two-stage lasso-LS algorithm,”
in Proc. IEEE Sens. Array and Multichannel Signal Process. Work-
shop, SAM 2012, Hoboken, NJ, Jun 17–20, 2012.
[17] L. Breiman, “Heuristics of instability and stabilization in model selec-
tion,” Ann. Statist., vol. 24, pp. 2350–2383, 1996.
[18] E. J. Candès and Y. Plan, “Near-ideal model selection by $\ell_1$ minimization,”
Ann. Statist., vol. 37, no. 5A, pp. 2145–2177, 2009.
[19] G. N. Lilis, D. Angelosante, and G. B. Giannakis, “Sound field reproduction
using the lasso,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18,
no. 8, pp. 1902–1912, Nov. 2010.
[20] S. Sardy, A. Bruce, and P. Tseng, “Block coordinate relaxation methods
for nonparametric wavelet denoising,” J. Comput. Graph. Statist., vol.
9, no. 2, pp. 361–379, Jun. 2000.
[21] Specification for Octave-Band and Fractional-Octave-Band Analog
and Digital Filters, ANSI S1.11-2004, Feb. 2004, Standards Secretariat
Acoustical Society of America.
[22] S. Haykin, Array Signal Processing. Englewood Cliffs, NJ: Prentice-
Hall, 1995.
Nasim Radmanesh (M’09) received the B.E. degree
in electrical engineering in 2005 from K.N. Toosi
University of Technology, Tehran, Iran. She received
the M.E. degree in electronics engineering in 2008
from RMIT University, Melbourne, Australia, where
she is currently pursuing the Ph.D. degree, supported
in part by the Australian Research Council (ARC).
Her research interests include multichannel sound
reproduction and optimization techniques for array
processing with emphasis on compressed sensing.
Ian Burnett (M’87–SM’02) received the Ph.D.
degree in 1992 from the University of Bath, Bath,
UK. He is currently Professor and Head of School
of Electrical and Computer Engineering, RMIT
University, Melbourne, Australia. His research
interests include speech processing, 3D audio re-
production, recording and transmission, semantic
media content description, 3D video processing and
quality of multimedia experience. He is a member
of the editorial board of IEEE Multimedia, and was
previously Chair of MDS at MPEG.