378 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 2, FEBRUARY 2013

Generation of Isolated Wideband Sound Fields Using

a Combined Two-stage Lasso-LS Algorithm

Nasim Radmanesh, Student Member, IEEE, and Ian S. Burnett, Senior Member, IEEE

Abstract—The prohibitive number of speakers required for

the reproduction of isolated soundﬁelds is the major limitation

preventing solution deployment. This paper addresses the pro-

vision of personal soundﬁelds (zones) to multiple listeners using

a limited number of speakers with an underlying assumption of

ﬁxed virtual sources. For such multizone systems, optimization

of speaker positions and weightings is important to reduce the

number of active speakers. Typically, single stage optimization is

performed, but in this paper a new two-stage pressure matching

optimization is proposed for wideband sound sources. In the first

stage, the least-absolute shrinkage and selection operator (Lasso)

is used to select the speakers’ positions for all sources and fre-

quency bands. A second stage then optimizes reproduction using

all selected speakers on the basis of a regularized least-squares

(LS) algorithm. The performance of the new, two-stage approach

is investigated for different reproduction angles, frequency range

and variable total speaker weight powers. The results demonstrate

that using two-stage Lasso-LS optimization can give up to 69 dB

improvement in the mean squared error (MSE) over a single-stage

LS in the reproduction of two isolated audio signals within control

zones using e.g. 84 speakers.

Index Terms—Isolated sound fields, lasso, least-squares, multizone, wideband.

I. INTRODUCTION

THERE is a range of applications for the reproduction of

multiple isolated wideband sound ﬁelds using limited

numbers of speakers. Examples are the provision of private

sound spaces during video conferencing and in communal

areas such as medical consulting rooms, museums, planes and

cars. All of these applications can have the location of virtual

sources ﬁxed such that the choice of speaker positions can then

be based on the positioning and size of the listening zones.

This provides more efﬁcient sound reproduction than can be

achieved using a uniformly spaced array.

One approach for the generation of a desired soundﬁeld in

one zone (known as a bright or active zone) and silence in an

adjacent zone (known as a dark or quiet zone) is active control

of sound [1]. Previous research using that approach involved the

maximization of acoustic energy contrast [2] or acoustic energy

Manuscript received March 19, 2012; revised August 06, 2012 and October

11, 2012; accepted November 05, 2012. Date of publication November 15,

2012; date of current version December 10, 2012. The associate editor coordi-

nating the review of this manuscript and approving it for publication was Dr.

Rongshan Yu.

The authors are with the School of Electrical and Computer Engineering,

RMIT University, Melbourne, VIC 3001, Australia (e-mail: nasim.rad-

manesh@rmit.edu.au).

Digital Object Identiﬁer 10.1109/TASL.2012.2227736

difference [3] between the bright and dark zones using eigen-

value analysis. Similar scenarios are investigated in [4]–[6] for

the generation of personal audio systems. The practical imple-

mentation of a personal audio system is investigated in [5] to

generate acoustic isolation in adjacent seats in aircraft and in [6]

to provide personal sound ﬁelds for viewers using a line array

attached to a 17 in. monitor display. Moreover, the authors of [7]

presented broadband beamforming techniques using a speaker

array for focusing the sound to the user. Finally, the regularized

least squares (LS) pressure matching approach was introduced

in [8] for sound reproduction in multiple, isolated zones. The

performance of a multizone system using LS pressure matching

was further investigated in [9] for multiple conversation repro-

duction in a multi user environment.

There are limitations to accurate wideband sound repro-

duction in a multizone system using a practical number of

speakers. Firstly, the number of speakers required for accurate

sound generation increases with the size of the reproduction

area and frequency range [10], [11]. In [9], the present authors

employed 300 speakers around a circle of radius 2 m for signals

up to 4 kHz. Furthermore, delivering wideband signals (e.g.

speech signals) to listeners in multiple zones is a complicated

scenario which restricts limited speaker system reproduction

performance. This is most critical when the zones are in line

[8], [9]. Hence, this paper targets reduction of speaker count

through improved optimization.

Whereas the above mentioned techniques control the com-

plex weights of speakers at ﬁxed locations on a uniformly

spaced array, the present work controls both the speaker loca-

tions and their complex weights to achieve a high performance

multizone system using a minimum number of speakers. Use of

a limited number of speakers and selection of the LS-optimal

speaker locations for maximum pressure matching at micro-

phone positions is a non-convex problem which is in general

NP-hard [12]. Using the least-absolute shrinkage and selection

operator (Lasso) [13], the problem can be converted to a convex

problem. It can then be solved with $\ell_1$-penalization using the
least-angle regression (LARS) algorithm [14], which computes
the entire Lasso coefficient path, or with a low-complexity
procedure such as the coordinate descent method [15]. Using a
convex $\ell_1$ norm, Lasso produces zero-valued weights and thus

generates a reduced set of speakers.

For wideband sound, a single-stage Lasso approach is less

accurate than single-stage LS because Lasso does not employ

all selected speakers to reproduce all frequencies and sources.

Thus, this paper proposes a new two-stage Lasso-LS algorithm.

The ﬁrst stage of the algorithm uses the selectivity of Lasso to

choose an optimal subset of speakers across all frequency bands,

1558-7916/$31.00 © 2012 IEEE


while LS is then employed in a second stage to optimize the

weightings for that subset. It should be noted that a preliminary

analysis of the proposed technique was presented by the authors

in [16], while the current paper provides full implementation de-

tails and rigorous analysis of the two-stage Lasso-LS algorithm

for isolated wideband sound generation.

This paper is structured as follows: Section II explains the

multizone system and the use of a pressure matching approach

for the design of speaker weights. In Section III, the single-stage

regularized LS and Lasso methods are outlined and simulation

results using these techniques for multizone wideband sound re-

production presented. In Section IV, the new two-stage, com-

bined Lasso-LS optimization algorithm is discussed and its per-

formance results in comparison to single-stage regularized LS

and Lasso approaches are provided. Finally, in Section V we

conclude the paper with a discussion of the potential future

directions.

II. MULTIZONE SOUNDFIELD GENERATION

To investigate the performance of the multizone system, it is

assumed here that the sound ﬁeld propagates under free ﬁeld

conditions, virtual sources and speakers are considered to be

point sources, and that all zones, virtual sources and speakers are

located in the same plane. In the following analysis the aim is to

generate isolated sound fields for $S$ wideband
sources (with constituent frequencies $f_n$, $n = 1, \ldots, N$) in $Z$
zones within the speaker array, with the radius and angle of the
$s$th source being $r_s$ and $\phi_s$. For this paper, the task is to generate
a desired field for every source in one corresponding active zone
and to suppress it effectively in the other $Z - 1$ zones (silent
zones) using an array of $L$ speakers located on an arc of $180^\circ$.

This corresponds to the example application scenarios described

previously.

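Fig. 1 illustrates the task scenario with the reproduction zones located at radius $r_z$ from the origin and the $z$th zone's angle given by $\phi_z$. All zones are located within a semicircle of radius $r$ surrounded by an array of $L$ speakers placed on a semicircle of radius $r_l$. Each zone is of radius $r_0$ with a covering of $M$ matching points distributed uniformly over a Euclidean grid. For each source frequency, $f_n$, a pressure matching approach is performed to control the complex sound pressure at the $ZM$ matching points within the $Z$ zones. While the pressure amplitude is directly controlled within the $Z$ zones, the pressure outside of the control zones is limited by control of the total speaker weight power. Assuming a time dependency $e^{-i\omega t}$, the pressure produced by the $L$ speakers at a given matching point $\mathbf{x}_m$ is given by [8]:

$$p^s(\mathbf{x}_m, k_n) = \sum_{l=1}^{L} w_l^s(k_n)\, G(\mathbf{x}_m \mid \mathbf{y}_l, k_n) \tag{1}$$

where $w_l^s(k_n)$ is the $l$th speaker weight for reproduction of the $s$th source at frequency $f_n$ and $G(\mathbf{x}_m \mid \mathbf{y}_l, k_n)$ is the Green's function which relates the pressure amplitude of the $l$th speaker and the pressure at the matching point $\mathbf{x}_m$ according to:

$$G(\mathbf{x}_m \mid \mathbf{y}_l, k_n) = \frac{i}{4} H_0^{(1)}\!\left(k_n \lVert \mathbf{x}_m - \mathbf{y}_l \rVert\right) \tag{2}$$

where $H_0^{(1)}(\cdot)$ is the zeroth-order Hankel function of the first kind, $k_n = 2\pi f_n / c$ is the acoustic wave number and $c$ the speed of sound propagation in air. $\mathbf{y}_l = (r_l, \phi_l)$ and $\mathbf{x}_m = (r_m, \phi_m)$ are respectively the vector positions of the speakers and matching points in polar coordinates. The desired sound field $p_d^s(\mathbf{x}_m, k_n)$ of a virtual source located at $\mathbf{y}_s = (r_s, \phi_s)$ to be reproduced e.g. in the first zone [8] is then given by:

$$p_d^s(\mathbf{x}_m, k_n) = \begin{cases} G(\mathbf{x}_m \mid \mathbf{y}_s, k_n), & m = 1, \ldots, M \\ a\, G(\mathbf{x}_m \mid \mathbf{y}_s, k_n), & m = M + 1, \ldots, ZM \end{cases} \tag{3}$$

where $G(\mathbf{x}_m \mid \mathbf{y}_s, k_n)$ relates the pressure amplitude of the $s$th source and the pressure at the matching point $\mathbf{x}_m$, the first zone is covered by the first $M$ matching points and $a$ is the sound field attenuation in inactive zones.

Fig. 1. Diagram of reproduction of isolated sound fields in a multizone system using an arc of speakers.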

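The speaker weights $w_l^s(k_n)$ can be estimated by equating the reproduced soundfield given by (1) at the matching points with the desired field given by (3) according to:

$$\mathbf{G}\mathbf{w} = \mathbf{p}_d \tag{4}$$

where $\mathbf{w}$ is the $L$ by 1 vector of speaker weights $w_l^s(k_n)$, $l = 1, \ldots, L$, $\mathbf{p}_d$ is the $ZM$ by 1 vector of desired sound pressures at the matching points and $\mathbf{G}$ is the $ZM$ by $L$ matrix of the 2-D Green's function given by:

$$\mathbf{G} = \begin{bmatrix} G(\mathbf{x}_1 \mid \mathbf{y}_1, k_n) & \cdots & G(\mathbf{x}_1 \mid \mathbf{y}_L, k_n) \\ \vdots & \ddots & \vdots \\ G(\mathbf{x}_{ZM} \mid \mathbf{y}_1, k_n) & \cdots & G(\mathbf{x}_{ZM} \mid \mathbf{y}_L, k_n) \end{bmatrix} \tag{5}$$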

The following section examines the use of the LS and Lasso

methods to solve (4).
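The quantities in (4) can be assembled numerically. The following is a minimal sketch, assuming the outgoing-wave convention $(i/4) H_0^{(1)}(kd)$ for the 2-D Green's function, a speed of sound of 343 m/s, and SciPy's `hankel1`; the function names and the small geometry are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.special import hankel1

C_SOUND = 343.0  # assumed speed of sound in air (m/s)


def polar_to_xy(r, phi):
    """Convert polar coordinates (radius, angle in radians) to x/y pairs."""
    r, phi = np.broadcast_arrays(np.asarray(r, float), np.asarray(phi, float))
    return np.stack([r * np.cos(phi), r * np.sin(phi)], axis=-1)


def green_matrix(points_xy, sources_xy, freq_hz, c=C_SOUND):
    """2-D free-field Green's function, shape (num_points, num_sources).

    Entries are (i/4) * H0^(1)(k * distance), the convention assumed here.
    """
    k = 2.0 * np.pi * freq_hz / c
    diff = points_xy[:, None, :] - sources_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return 0.25j * hankel1(0, k * dist)


def desired_pressure(points_xy, source_xy, freq_hz, num_active, attenuation):
    """Virtual-source field: unchanged at the first `num_active` points,
    scaled by `attenuation` at the remaining (silent-zone) points."""
    p = green_matrix(points_xy, source_xy[None, :], freq_hz)[:, 0]
    scale = np.full(len(points_xy), attenuation)
    scale[:num_active] = 1.0
    return scale * p


# Hypothetical geometry: 5 speakers on a 180-degree arc of radius 2 m,
# two matching points in an active zone and two in a silent zone.
speakers = polar_to_xy(2.0, np.linspace(0.0, np.pi, 5))
points = np.array([[0.5, 0.5], [0.6, 0.5], [-0.5, 0.5], [-0.6, 0.5]])
virtual_source = polar_to_xy(2.0, np.pi / 3)

G = green_matrix(points, speakers, freq_hz=500.0)
p_d = desired_pressure(points, virtual_source, 500.0,
                       num_active=2, attenuation=1e-3)
print(G.shape, p_d.shape)
```

An attenuation of `1e-3` corresponds to 60 dB in amplitude; it is used here only as an example value.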


III. SINGLE-STAGE SPEAKER WEIGHT ESTIMATION

A. Single-stage LS Weight Estimation

The regularized LS approach is a robust solution to multizone

sound generation from all directions normally using a uniformly

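spaced array of speakers. In this method, for generation of a frequency, $f_n$, of source $s$, the speaker weights, $\mathbf{w}$, are determined by minimizing the squared error between the desired and reproduced field with a power constraint, such that:

$$\min_{\mathbf{w}} \; \lVert \mathbf{G}\mathbf{w} - \mathbf{p}_d \rVert_2^2 + \lambda \lVert \mathbf{w} \rVert_2^2 \tag{6}$$

where $\lVert \cdot \rVert_2$ is the $\ell_2$-norm, $\lambda$ is the LS penalty parameter and $\lVert \mathbf{w} \rVert_2^2$ is the total speaker weight power. Adjusting the penalty parameter $\lambda$ between zero and infinity changes the solution from a LS error solution to minimization of the total speaker weight power only. Therefore, the value of $\lambda$ should be optimized to minimize the error while sufficiently controlling the total speaker weight power. The solution of (6) is given by (7) when the matrix $\mathbf{G}$ is tall, i.e. for $ZM > L$:

$$\mathbf{w} = \left( \mathbf{G}^H \mathbf{G} + \lambda \mathbf{I} \right)^{-1} \mathbf{G}^H \mathbf{p}_d \tag{7}$$

where $(\cdot)^H$ is the conjugate transpose and $\mathbf{I}$ is the $L$ by $L$ identity matrix.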
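The closed-form regularized LS solution can be checked numerically. The sketch below is illustrative only: a random complex matrix stands in for the Green's function matrix, and the function name `regularized_ls` is hypothetical. It solves the normal equations with a linear solve rather than an explicit inverse.

```python
import numpy as np


def regularized_ls(G, p_d, penalty):
    """Return w minimizing ||G w - p_d||_2^2 + penalty * ||w||_2^2."""
    num_speakers = G.shape[1]
    gram = G.conj().T @ G + penalty * np.eye(num_speakers)
    return np.linalg.solve(gram, G.conj().T @ p_d)


# Random stand-in for a tall Green's function matrix (12 points, 4 speakers).
rng = np.random.default_rng(0)
G = rng.standard_normal((12, 4)) + 1j * rng.standard_normal((12, 4))
p_d = rng.standard_normal(12) + 1j * rng.standard_normal(12)

penalty = 0.1
w = regularized_ls(G, p_d, penalty)

# At the minimizer the gradient G^H (G w - p_d) + penalty * w vanishes.
gradient = G.conj().T @ (G @ w - p_d) + penalty * w
print("gradient norm:", np.linalg.norm(gradient))
print("total speaker weight power:", np.linalg.norm(w) ** 2)
```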

B. Single-Stage Lasso Weight Estimation

To reproduce the desired sound ﬁeld of a virtual point source,

the LS approach allocates power to all speakers of a regularly-

spaced array. In such an array, the number of speakers required

for accurate sound generation increases with the size of the

reproduction area and frequency range [10], [11]. When the

number of speakers is limited, the LS-optimal speaker locations

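must be selected for the best soundfield reproduction of the virtual source. To reproduce the desired soundfield for each frequency, $f_n$, of source $s$, the $L_a$ speakers from $L$ candidate speakers must be activated. From the model selection approaches in [17], one could enforce a desirably small number of speakers by computing their weights as the solution of the following optimization problem:

$$\min_{\mathbf{w}} \; \lVert \mathbf{G}\mathbf{w} - \mathbf{p}_d \rVert_2^2 + \lambda_0 \lVert \mathbf{w} \rVert_0 \tag{8}$$

where the $\ell_0$-norm $\lVert \mathbf{w} \rVert_0$ is the number of nonzero weights in $\mathbf{w}$ and $\lambda_0$ is the penalty parameter. This is a non-convex problem which is NP-hard [12] and requires exhaustive searches over all subsets of columns of the Green's function matrix $\mathbf{G}$ [18]. To overcome this problem, the Lasso algorithm replaces the non-convex $\ell_0$-norm with the convex $\ell_1$ norm and the complex speaker weights can then be calculated from:

$$\min_{\mathbf{w}} \; \lVert \mathbf{G}\mathbf{w} - \mathbf{p}_d \rVert_2^2 + \lambda_1 \lVert \mathbf{w} \rVert_1 \tag{9}$$

where $\lVert \cdot \rVert_1$ is the $\ell_1$ norm and $\lambda_1$ is the preselected Lasso penalty parameter. Larger values of $\lambda_1$ produce fewer nonzero speaker weights and (9) can be solved using a coordinate descent method in the frequency domain [19]. In this algorithm, all speaker weights, $w_l$, $l = 1, \ldots, L$, are updated individually at each iteration. If $\mathbf{g}_l$ denotes the $l$th column of $\mathbf{G}$ (the 2-D Green's function matrix), the error of the $l$th speaker at the $i$th iteration is calculated by removing from $\mathbf{p}_d$ the effect of prior speaker entries $w_j^{(i)}$, $j = 1, \ldots, l - 1$, in that $i$th iteration and the following speaker entries $w_j^{(i-1)}$, $j = l + 1, \ldots, L$, in the $(i-1)$th iteration:

$$\mathbf{e}_l^{(i)} = \mathbf{p}_d - \sum_{j=1}^{l-1} \mathbf{g}_j w_j^{(i)} - \sum_{j=l+1}^{L} \mathbf{g}_j w_j^{(i-1)} \tag{10}$$

Using the updated error, the $l$th speaker weight at the $i$th iteration is then given by:

$$w_l^{(i)} = \frac{\mathbf{g}_l^H \mathbf{e}_l^{(i)}}{\lvert \mathbf{g}_l^H \mathbf{e}_l^{(i)} \rvert} \cdot \frac{\left( \lvert \mathbf{g}_l^H \mathbf{e}_l^{(i)} \rvert - \lambda_1 / 2 \right)_+}{\lVert \mathbf{g}_l \rVert_2^2} \tag{11}$$

where $(x)_+ = x$ for $x > 0$, and $(x)_+ = 0$ for $x \le 0$. $w_l^{(i)}$ is the unique solution of (9) in the $l$th coordinate, which is written compactly for two cases depending on the value of $\lvert \mathbf{g}_l^H \mathbf{e}_l^{(i)} \rvert$. When $\lvert \mathbf{g}_l^H \mathbf{e}_l^{(i)} \rvert \le \lambda_1 / 2$, there is no stationary point over the differentiable region and $w_l^{(i)} = 0$ is the unique global minimum, whereas for $\lvert \mathbf{g}_l^H \mathbf{e}_l^{(i)} \rvert > \lambda_1 / 2$, the complex stationary point is the unique global minimum [19]. The algorithm is allowed to iterate until (9) converges to its global optimum as guaranteed in [20].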

The coordinate descent algorithm is a fast convex optimization solver which provides the solution of (9) for a specific value of $\lambda_1$. However, the LARS algorithm [14] can also be used to solve (9) in cases where many values of $\lambda_1$ are of interest. In a sound reproduction scenario, the LARS algorithm gives a range of $\lambda_1$ values along the solution path of (9) from which the best may be chosen for a particular number of active speakers.
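The cyclic coordinate descent update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the objective $\lVert \mathbf{G}\mathbf{w} - \mathbf{p}_d \rVert_2^2 + \lambda_1 \lVert \mathbf{w} \rVert_1$ (so the threshold is $\lambda_1 / 2$), uses a random matrix in place of the Green's function matrix, and the function name and stopping rule are hypothetical.

```python
import numpy as np


def lasso_coordinate_descent(G, p_d, penalty, num_sweeps=200, tol=1e-10):
    """Minimize ||G w - p_d||_2^2 + penalty * ||w||_1 over complex w."""
    num_speakers = G.shape[1]
    col_energy = np.sum(np.abs(G) ** 2, axis=0)   # ||g_l||_2^2 per column
    w = np.zeros(num_speakers, dtype=complex)
    residual = p_d.astype(complex)                # p_d - G w, with w = 0
    for _ in range(num_sweeps):
        largest_change = 0.0
        for l in range(num_speakers):
            g = G[:, l]
            # Error with speaker l removed, using the newest other weights.
            e = residual + g * w[l]
            z = np.vdot(g, e)                     # g^H e
            mag = abs(z)
            if mag <= penalty / 2.0:
                new_w = 0.0                       # weight is switched off
            else:
                # Complex soft threshold: keep the phase, shrink the magnitude.
                new_w = (z / mag) * (mag - penalty / 2.0) / col_energy[l]
            residual = e - g * new_w
            largest_change = max(largest_change, abs(new_w - w[l]))
            w[l] = new_w
        if largest_change < tol:
            break
    return w


# Random stand-in problem: the target field is built from two columns only.
rng = np.random.default_rng(1)
G = rng.standard_normal((20, 10)) + 1j * rng.standard_normal((20, 10))
p_d = (1.0 + 1.0j) * G[:, 1] - 0.5j * G[:, 6]
w = lasso_coordinate_descent(G, p_d, penalty=2.0, num_sweeps=2000)
print("active candidates:", np.flatnonzero(np.abs(w) > 0))
```

Keeping a running residual avoids recomputing the full sum over all other speakers at every coordinate update.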

C. Reproduction Error

In order to compare the error performance of different algo-

rithms, a reproduction MSE, , generated by every source

at frequency in each zone is calculated as:

(12)

where is the area of each zone and and ,

,are respectively the desired and

reproduced soundﬁelds in the area for the th source at fre-

quency .

For evaluation of the error performance within the non-optimized
area (NOA) (which is the area outside of the $Z$ zones
and confined by the circle of radius $r$), an ideal soundfield
attenuation of 60 dB is assumed (equal to the inactive
zones attenuation). Thus, the MSE in the NOA

area can be calculated as:

(13)

where and , ,

are respectively the soundﬁeld produced by the virtual source

and by the speakers with weights at frequency in the

NOA area.


The total mean squared error, of sources at frequency

within the considered area is then calculated as:

(14)

The following subsection compares the performance of the

single-stage LS and single-stage Lasso algorithms for isolated

sound reproduction of two wideband sources.

D. Simulation Result

Wideband Sound Reproduction Using Single-Stage LS and

Single-Stage Lasso: Throughout this paper, the zones have

ﬁxed locations at from the origin and zone angles

are and . It is assumed that zone 1 is the

target zone for source 1 and zone 2 the target zone for source 2.

The speaker and source radii are considered to be

and the number of matching points used in each zone of radius

is . In this section, the performance of the

single-stage Lasso algorithm is compared to the single-stage

LS for isolated sound reproduction of wideband sources

located at and in corresponding

zones. The test wideband signals considered here comprise
$N = 23$ center band frequencies of one-third octave bands
from 100 Hz to 16 kHz. For isolated sound generation of
$S = 2$ wideband sources using single-stage Lasso, $SN = 46$ Lasso
problems were solved to select 46 different sets of speakers

from candidate positions. The Lasso penalty param-

eter was ﬁxed across the range of frequencies. The

uniﬁed 46 sets of selected speakers give a total of

active speakers for reproduction of both wideband sources

using single-stage Lasso. Fig. 2(b) shows the location of all

active speakers selected in the single-stage Lasso al-

gorithm. For a fair comparison between the single-stage Lasso

and single-stage LS in wideband sound reproduction, the same

number of active speakers was employed for the LS method

at a comparable total speaker weight power .

Speakers used in the single-stage LS were arranged in a uni-

formly-spaced array as demonstrated in Fig. 2(a). For clarity,

Figs. 3 and 4 compare the performance of the single-stage reg-

ularized LS and Lasso methods at a comparative total speaker

weight power for reproduction of two selected frequencies

Hz from source1 and kHz from source2. In

Fig. 3, the speaker weights are calculated for reproduction of

both the discrete frequencies and . For reproduction of both

wideband sources using single-stage Lasso, a set of

speakers were used but only a subset of them were powered

for the reproduction of discrete frequencies or , whereas in

the single-stage LS approach, all speakers were active

for the generation of each discrete frequency or . It can be

seen that single-stage Lasso is less accurate than single-stage

LS for wideband sound generation. Table I lists the mean error

within the zones and the NOA for the sound reproduction

in Fig. 3 at comparable total speaker weight power for both

methods. This table demonstrates that single stage regularized

LS performance is up to 9 dB better than single stage Lasso.

Fig. 4 illustrates the squared error generated within the circle of

radius for reproduction of the soundﬁelds in Fig. 3.

Fig. 2. Speaker locations for (a) single-stage LS and (b) single-stage Lasso,

sources located at and , the number of speakers used

in the reproduction of two wideband sources is .

Fig. 3. Sound ﬁeld visualization and speaker weights using (a) single-stage LS

and (b) single-stage Lasso. Two wideband sources are located at

and (Single frequencies of Hz and kHz shown

for clarity). In both methods, the number of speakers used in the reproduction

of two wideband sources is .

The error generated by Lasso can be seen almost everywhere

within the circle (Fig. 4(b)) whereas the LS error is lowest in

the vicinity of the zones (Fig. 4(a)).
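The count of 46 Lasso problems quoted in this subsection follows from the number of one-third octave bands between 100 Hz and 16 kHz. A short check, assuming the standard nominal center frequencies and two wideband sources:

```python
# Nominal one-third octave band center frequencies (Hz), 100 Hz to 16 kHz.
CENTER_FREQS_HZ = [
    100, 125, 160, 200, 250, 315, 400, 500, 630, 800,
    1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000,
    10000, 12500, 16000,
]

num_sources = 2  # two wideband virtual sources
num_problems = num_sources * len(CENTER_FREQS_HZ)

print(len(CENTER_FREQS_HZ), num_problems)  # prints "23 46"
```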

IV. TWO-STAGE SPEAKER WEIGHT ESTIMATION

A. Two Stage, Combined Lasso-LS Optimization

In this section a new two-stage, combined Lasso-LS algo-

rithm (Fig. 5) is proposed for wideband sound reproduction

with an underlying assumption of ﬁxed virtual sources. For a

given set of virtual source positions and a large set of po-

tential positions, the LS-optimal speaker locations for the lim-

ited number of speakers must be selected for maximum pressure

matching at the microphones. This is a non-convex problem


Fig. 4. Reproduction squared error (dB) within the circle of radius $r$ using (a)

single-stage LS and (b) single-stage Lasso. Two wideband sources are located at

and (Squared error for generation of single frequencies

Hz from source1 and kHz from source2 shown for clarity).

The number of active speakers in both methods is identical i.e. .

TABLE I

THE MEAN ERROR (ME) OF FIG. 3

which can be converted to a convex problem using Lasso as

explained in the previous section. Since Lasso guarantees the

global minimum solution of the convex problem using $\ell_1$-penalization,
a first-stage Lasso optimization was employed to select
a subset of speakers across all frequency bands. In other

words, for the present problem, Lasso is used to ﬁnd the most

efﬁcient locations in terms of achieving the lowest reproduction

MSE for a limited number of speakers. However, single-stage

Lasso selects a different set of active speakers for each discrete

frequency, $f_n$, of every source $s$. Thus, the selected speakers for

the reproduction of each frequency represent only a subset of

all speakers selected for the generation of all wideband sources.

Multizone system performance can thus be improved by ac-

tivating all selected speakers in a second stage LS optimiza-

tion and performing complex weighting optimization for all fre-

quency sources. In this paper, second stage regularized LS esti-

Fig. 5. The two-stage Lasso-LS algorithm.

mation is proposed as it is theoretically guaranteed to result in

the lowest MSE for the selected set of speakers.

In the first stage of the Lasso-LS algorithm, $SN$ Lasso problems are solved to determine all active speakers used for reproduction of $S$ wideband sound fields with constituent frequencies $f_n$, $n = 1, \ldots, N$. In this paper the center band frequencies of one-third octave bands [21] from 100 Hz to 16 kHz were used to select active speakers in the first stage Lasso algorithm. The first-stage Lasso penalty parameter determines the number of selected active speakers. The larger the first-stage penalty parameter, $\lambda_1$, is made, the lower the number of speakers selected. The columns of matrix $\mathbf{W}$ are the solutions of the $SN$ Lasso problems for sound reproduction of $S$ sources comprising $N$ frequency bins such that:

$$\mathbf{W} = \begin{bmatrix} w_1^1(k_1) & \cdots & w_1^1(k_N) & \cdots & w_1^S(k_1) & \cdots & w_1^S(k_N) \\ \vdots & & \vdots & & \vdots & & \vdots \\ w_L^1(k_1) & \cdots & w_L^1(k_N) & \cdots & w_L^S(k_1) & \cdots & w_L^S(k_N) \end{bmatrix} \tag{15}$$

where $\mathbf{W}$ is the $L$ by $SN$ matrix of the speaker weights. The $l$th entry of the total speaker weights vector $\mathbf{w}_t$ is then calculated from:

$$w_t(l) = \sum_{j=1}^{SN} \lvert W_{lj} \rvert, \quad l = 1, \ldots, L \tag{16}$$

The $L$ by 1 vector of total speaker weights, $\mathbf{w}_t$, is the output of the first stage algorithm. The locations of the active speakers to be used in the second stage are then extracted on the basis of the nonzero entries of $\mathbf{w}_t$. The number of those nonzero entries in $\mathbf{w}_t$ thus determines the number of active speakers, $L_a$, to be used in the second stage.
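The selection step at the end of the first stage amounts to taking the union of the supports of all Lasso solutions. A minimal sketch follows; it assumes the total weight of a candidate is the sum of its weight magnitudes over all source/frequency pairs, and the function name and toy matrix are hypothetical.

```python
import numpy as np


def select_active_speakers(W, tol=0.0):
    """Return indices of candidates used by at least one Lasso solution.

    W has one row per candidate position and one column per
    source/frequency pair.
    """
    total = np.sum(np.abs(W), axis=1)   # assumed form of the total weights
    return np.flatnonzero(total > tol), total


# Toy example: 8 candidate positions, 3 Lasso solutions.
W = np.zeros((8, 3), dtype=complex)
W[1, 0] = 0.4 + 0.1j
W[5, 0] = -0.2j
W[5, 1] = 0.3
W[6, 2] = -0.1 + 0.1j

active, total = select_active_speakers(W)
print(active)  # prints "[1 5 6]"
```

Any nonnegative per-row measure (sum of magnitudes, sum of squared magnitudes) yields the same set of nonzero entries, so the selected positions do not depend on that choice.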


In the second stage, the non-uniformly spaced arc of $L_a$ active speakers is used. In this stage, all selected speakers are utilized for sound reproduction of all constituent frequencies, $f_n$, $n = 1, \ldots, N$, of $S$ wideband sources using LS optimization. The number of LS problems to be solved for generation of $S$ isolated audio signals is thus $SN$. The penalty parameter $\lambda_2$ limits the power of the second-stage LS solution.
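The second stage can be sketched as one regularized LS solve per source/frequency pair, restricted to the columns of the selected speakers. The code below is an illustration under stated assumptions: random matrices stand in for the per-frequency Green's function matrices, the set of active indices is hypothetical, and `second_stage_ls` is not a name from the paper.

```python
import numpy as np


def second_stage_ls(G_list, p_list, active, penalty):
    """Regularized LS on the selected speakers for every problem.

    G_list[j] is the full candidate matrix for problem j and p_list[j]
    the matching desired-pressure vector. Returns an array of shape
    (num_active, num_problems).
    """
    weights = []
    for G, p_d in zip(G_list, p_list):
        G_a = G[:, active]                              # keep selected columns
        gram = G_a.conj().T @ G_a + penalty * np.eye(len(active))
        weights.append(np.linalg.solve(gram, G_a.conj().T @ p_d))
    return np.stack(weights, axis=1)


rng = np.random.default_rng(0)
num_points, num_candidates, num_problems = 10, 8, 3
G_list = [rng.standard_normal((num_points, num_candidates))
          + 1j * rng.standard_normal((num_points, num_candidates))
          for _ in range(num_problems)]
p_list = [rng.standard_normal(num_points) + 1j * rng.standard_normal(num_points)
          for _ in range(num_problems)]

active = np.array([1, 5, 6])   # hypothetical first-stage selection
W2 = second_stage_ls(G_list, p_list, active, penalty=1e-2)
print(W2.shape)  # prints "(3, 3)"
```

Unlike the single-stage Lasso solutions, every column of `W2` is generally dense, so all selected speakers contribute at every frequency.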

The important facet of the proposed technique is the loca-

tion optimization of speakers using Lasso sparsity before op-

timization of the sound ﬁeld control parameters (e.g. speaker

weights) using a LS pressure matching approach. The Lasso

optimization could also be combined with other sound repro-

duction techniques such as various beamforming methods [8],

[22] to generate the desired directivity patterns in the zones. In

such scenarios, the Lasso algorithm would guarantee the LS-optimal
speaker placement using a convex $\ell_1$ norm, while the

beamformer would control the directivity pattern of selected

speakers. This technique could enable a wideband beamformer

to achieve a controllable directivity pattern while minimizing

the total number of speakers using the selectivity of Lasso.

B. Simulation Results

This section employs the same conﬁguration of multizone

surround system used in Section III. The performance of the

two-stage, combined Lasso-LS algorithm is compared to the

single-stage LS approach for sound reproduction of $S = 2$
wideband sources in $Z = 2$ zones. One-third octave bands
with center band frequencies from 100 Hz to 16 kHz were used
in the first stage Lasso algorithm and $SN = 46$ sets of active
speakers out of the $L$ candidate positions were selected
(corresponding to $N = 23$ constituent frequencies of $S = 2$
wideband sources). The unified 46 sets of speakers give a total
of $L_a$ active speaker locations to be used in the second-stage
algorithm for reproduction of both wideband sources. Figs. 6(b)

and 9(b) show the location of all active speakers selected in the

Lasso-LS algorithm for two different scenarios which will be

discussed in the next section. For a fair comparison between

the single-stage LS and two-stage Lasso-LS algorithm in terms

of wideband sound reproduction, the same number of active

speakers selected in the Lasso-LS algorithm was employed to

evaluate the LS method at a comparable total speaker weight

power. Speakers used in the single-stage LS are arranged in a

uniformly-spaced array as demonstrated in Figs. 6(a) and 9(a).

In the following, different aspects are considered for the perfor-

mance assessment such as virtual source angles, total speaker

weight power and frequency.

Virtual Source Angles: Two distinct scenarios for virtual

source angles were considered so as to investigate the perfor-

mance of the two-stage Lasso-LS algorithm using a limited

number of speakers. In the ﬁrst scenario, the virtual sources are

close to each other and in the middle of the semicircle array of

candidate positions (at and ). In the second

scenario, the virtual sources (at and )

are further apart and compared to the ﬁrst scenario they are

closer to the edge of the semicircle array. In both scenarios,

the penalty parameter used for the ﬁrst stage Lasso was ﬁxed

across the range of frequencies . Using the Lasso-LS

Fig. 6. Speaker locations for (a) single-stage LS and (b) two-stage Lasso-LS,

sources located at and , the number of speakers used in the

reproduction of two wideband sources is .

approach, speaker positions are selected during the ﬁrst stage

Lasso algorithm which selects (from all candidate positions)

a number of positions closer to the virtual sources; thus, the

selection of active speakers in the ﬁrst stage depends on the

virtual source angles.

Scenario 1: Virtual Sources Located at and

:In this scenario the virtual sources are close to each other

and active speakers selected across frequency in the Lasso-LS

algorithm are close to both sources as illustrated in Fig. 6(b).

In practice, a subset of selected speakers close to both virtual

sources is utilized for Lasso-LS sound reproduction of both

sources. For a fair comparison between the single-stage LS and

two-stage Lasso-LS algorithm in wideband sound reproduction,

the same number of active speakers selected in the Lasso-LS al-

gorithm was employed in the LS method at a comparable total

speaker weight power on a uniformly spaced array as illustrated

in Fig. 6(a).

Fig. 7 illustrates the resulting soundﬁeld and the corre-

sponding speaker weights of the LS and Lasso-LS algorithms

for scenario 1. This ﬁgure shows for clarity the generation of

a selected low frequency from source1 Hz and

a higher frequency from source2 kHz. Table II

demonstrates that in scenario 1, the ﬂexibility of the Lasso-LS

algorithm in locating a limited number of speakers (e.g.

) at the LS-optimal positions can result in up to 14 dB

and 31 dB improvement over a single-stage LS approach in the

reproduction of frequencies and , respectively, within the

control zones. However, a regularly-spaced array is up to 2 dB

more successful in terms of limiting the error within the NOA.

This is because the speaker locations are not selected for the

best desired sound reproduction in the NOA and thus they are

not located at the best spots to generate minimum error in this

area. Fig. 8 illustrates the squared error generated within the

circle of radius for reproduction of the soundﬁelds in Fig. 7.

Fig. 8(a) shows that the single-stage LS error is lowest in the

vicinity of the zones in comparison to the error generated in the

Lasso-LS algorithm (Fig. 8(b)).


Fig. 7. Sound ﬁeld visualization and speaker weights using (a) single-stage LS

and (b) two-stage Lasso-LS. Two wideband sources are located at

and . (Single frequencies of Hz and kHz shown for

clarity). In both methods the number of active speakers is .

Fig. 8. Reproduction squared error (dB) within the circle of radius $r$ using (a)

single-stage LS and (b) two-stage Lasso-LS. Two wideband sources are located

at and (Squared error for generation of single frequencies

Hz from source1 and kHz from source2 shown for clarity).

The number of active speakers in both methods is identical i.e. .

Scenario 2: Virtual Sources Located at and

: In this scenario, where the virtual sources are located further
apart, the active speakers in the Lasso-LS approach are

TABLE II

THE MEAN ERROR (ME) OF FIG. 7

Fig. 9. Speaker locations for (a) single-stage LS and (b) two-stage Lasso-LS,

sources located at and , the number of speakers used in

the reproduction of two wideband sources is .

selected only for the reproduction of one of the sources as il-

lustrated in Fig. 9(b). For the LS sound reproduction approach,

similarly to the previous scenario a regularly-spaced array of

speakers is used as demonstrated in Fig. 9(a). Fig. 10

compares the above methods for scenario 2 at selected frequen-

cies Hz and kHz and at comparable total

speaker weight power. Table III shows that using

active speakers for scenario 2, Lasso-LS is up to 7 dB and

19 dB more accurate than a LS approach in generation of the

selected low frequency and the higher frequency respec-

tively. These results show that, in scenario 2 (similarly to sce-

nario 1), the Lasso-LS algorithm improves the performance of

the multizone system in terms of reproduction at both low and

higher frequencies within the control zones but not within the

NOA. Fig. 11 illustrates the squared error generated within the

circle of radius for reproduction of the soundﬁelds in Fig. 10.

In Fig. 11(a) similarly to Fig. 8(a), the LS error is lowest in the

vicinity of the zones in comparison to the error generated in the

Lasso-LS algorithm (Fig. 11(b)).

The MSE Versus Total Speaker Weight Power: Fig. 12 shows

that the two-stage, combined Lasso-LS technique far outper-

forms the single-stage LS in reproduction of a selected low fre-

quency tone Hz and a high frequency tone

kHz of virtual wideband source1 located at .

In the Lasso-LS algorithm, speaker locations are se-

lected out of candidate positions on the basis of

two wideband sources located at and

and using a ﬁxed Lasso penalty parameter (Fig. 2(b)).


Fig. 10. Sound ﬁeld visualization and speaker weights using (a) single-stage

LS and (b) two-stage Lasso-LS. Two wideband sources are located at

and (single frequencies of Hz and kHz

shown for clarity). In both methods the number of active speakers is

.

Fig. 11. Reproduction squared error (dB) within the circle of radius using

(a) single-stage LS and (b) two-stage Lasso-LS. Two wideband sources are lo-

cated at and (Squared error for generation of single

frequencies Hz from source1 and kHz from source2 shown

for clarity). The number of active speakers in both methods is identical i.e.

.

In the single-stage LS approach the same number of speakers

TABLE III

THE MEAN ERROR (ME) OF FIG. 10

Fig. 12. The MSE vs total speaker weight power for generation of selected

frequencies (a) Hz and (b) kHz of source1 at with

zone1 as the target zone for this source and zone2 as the corresponding silent

zone. In the Lasso-LS algorithm, the speaker locations are selected considering

two wideband sources located at and . The number of

active speakers in both methods is identical i.e. .

are used in a uniformly-spaced array as demon-

strated in Fig. 2(a). The total speaker weight powers for the

regularized LS and Lasso-LS techniques are varied by tuning,

respectively, the LS penalty parameter, $\lambda$, and the Lasso-LS
second stage penalty parameter, $\lambda_2$. In Fig. 12, Lasso-LS generates
e.g. the selected low frequency Hz in zone1

and zone2 respectively with 39 dB and 20 dB less MSE than

single-stage LS at a power (Fig. 12(a)) and generates

the selected high frequency kHz within zone1 and

zone2 with 48 dB and 49 dB less MSE at a power

(Fig. 12(b)) using active speakers. The reason for

the Lasso-LS algorithm’s dramatic advantage over single-stage

LS within the control zones is the capability of the Lasso-LS al-

gorithm to control both the speaker locations and their complex

weights for maximum pressure matching at microphone posi-

tions. Fig. 12 demonstrates, however, that the Lasso-LS algo-

rithm performance in the NOA is not better than the LS method.

This is because the speaker locations are not selected for the

best desired sound reproduction in the NOA and thus are not

located at the best spots to generate minimum error using a lim-

ited number of speakers in this area.
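The two-stage idea described above (select speaker positions first, then re-optimize the complex weights of only the selected speakers) can be sketched for a single frequency as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the names `soft_threshold`, `lasso_ista`, `regularized_ls` and `two_stage_lasso_ls` are hypothetical, the matrix `H` is assumed to hold the transfer functions from candidate speaker positions to microphone positions, `p` is the desired pressure at the microphones, and the Lasso stage is solved with a plain proximal-gradient (ISTA) iteration chosen for brevity. The paper selects positions jointly over all sources and frequency bands, which this sketch does not attempt.

```python
import numpy as np

def soft_threshold(z, t):
    """Complex soft-thresholding: shrink magnitudes by t, keep phases."""
    mag = np.abs(z)
    return z * np.maximum(1.0 - t / np.maximum(mag, 1e-15), 0.0)

def lasso_ista(H, p, lam, n_iter=500):
    """Approximately minimize 0.5*||p - H w||^2 + lam*||w||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(H, 2) ** 2  # 1 / (largest singular value)^2
    w = np.zeros(H.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = H.conj().T @ (H @ w - p)     # gradient of the quadratic term
        w = soft_threshold(w - step * grad, step * lam)
    return w

def regularized_ls(H, p, lam):
    """Closed-form regularized LS weights: (H^H H + lam I)^-1 H^H p."""
    n = H.shape[1]
    return np.linalg.solve(H.conj().T @ H + lam * np.eye(n), H.conj().T @ p)

def two_stage_lasso_ls(H, p, lam1, lam2):
    """Stage 1 picks active candidates; stage 2 refits their weights."""
    w1 = lasso_ista(H, p, lam1)
    active = np.flatnonzero(np.abs(w1) > 0)  # indices of selected speakers
    w = np.zeros(H.shape[1], dtype=complex)
    if active.size:
        w[active] = regularized_ls(H[:, active], p, lam2)
    return w, active

# Toy data (assumed, not from the paper): 30 microphones, 60 candidates.
rng = np.random.default_rng(0)
H = rng.standard_normal((30, 60)) + 1j * rng.standard_normal((30, 60))
p = H[:, [5, 20, 41]] @ np.array([1.0, -0.5j, 0.8])
w, active = two_stage_lasso_ls(H, p, lam1=5.0, lam2=1e-3)
print(len(active), "of", H.shape[1], "candidate positions kept")
```

In this sketch the first-stage penalty `lam1` controls how many candidates survive, while the second-stage penalty `lam2` only limits the weight power of the surviving speakers, mirroring the separate roles of the two penalty parameters in the text.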

The MSE Versus Frequency: Fig. 13(a) shows that the Lasso-LS

algorithm outperforms the single-stage LS across all frequencies

from 100 Hz to 16 kHz for reproduction of virtual wideband

source1 located at . Fig. 13(b) demonstrates

the corresponding total speaker weight power versus frequency

and shows that the total speaker weight power is comparable at

frequencies over 200 Hz, while the penalty parameters are kept

constant across frequency (The LS penalty parameter is ,

the Lasso ﬁrst-stage penalty parameter is and the LS

second stage penalty parameter is ). As can be seen, for

both zones, there is an LS peak error at kHz, which results

from the use of a solution with lower energy than the minimum

energy solution. However, locating the same number of

speakers at the LS-optimal positions in the Lasso-LS approach

provides solutions with enough energy required for accurate

multizone sound reproduction across frequency. The MSE gen-

erated at kHz using LS and Lasso-LS is 17 dB and

76 dB in zone 1 and 20 dB and 89 dB in zone 2, respectively,

at the total speaker weight power, . Furthermore,

as the frequency increases from 700 Hz to 16 kHz, the Lasso-LS

algorithm outperforms the single-stage LS by a further 20 dB.

At low frequencies, as the frequency decreases below 500 Hz, the

performance of the two methods becomes more comparable.
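The relationship between the penalty parameter, the total speaker weight power and the reproduction error discussed above can be illustrated with a small numerical sweep. The sketch below is illustrative only: the function names, the random transfer matrix `H` and the desired pressure vector `p` are assumptions, the total weight power is taken as the sum of squared weight magnitudes, and the MSE is taken as the unnormalized mean squared pressure error in dB, which may differ from the exact definition used in the paper.

```python
import numpy as np

def regularized_ls(H, p, lam):
    """Closed-form regularized LS weights: (H^H H + lam I)^-1 H^H p."""
    n = H.shape[1]
    return np.linalg.solve(H.conj().T @ H + lam * np.eye(n), H.conj().T @ p)

def weight_power(w):
    """Total speaker weight power, assumed here to be sum of |w_n|^2."""
    return float(np.sum(np.abs(w) ** 2))

def mse_db(H, w, p):
    """Mean squared pressure error at the microphones, in dB."""
    err = p - H @ w
    return float(10.0 * np.log10(np.mean(np.abs(err) ** 2)))

# Toy data (assumed): 40 microphone positions, 12 active speakers.
rng = np.random.default_rng(0)
H = rng.standard_normal((40, 12)) + 1j * rng.standard_normal((40, 12))
p = rng.standard_normal(40) + 1j * rng.standard_normal(40)

# Sweep the penalty parameter and record (penalty, power, MSE in dB).
sweep = []
for lam in [1e-3, 1e-1, 1e1, 1e3]:
    w = regularized_ls(H, p, lam)
    sweep.append((lam, weight_power(w), mse_db(H, w, p)))

for lam, power, err in sweep:
    print(f"lam={lam:g}  power={power:.4g}  MSE={err:.2f} dB")
```

For regularized least squares, a larger penalty always lowers the weight power and raises the residual error, so plotting MSE against weight power (as in Fig. 12) traces out this trade-off for a fixed set of speaker positions.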

V. CONCLUSIONS

The aim of this work was to generate isolated wideband

soundﬁelds in multiple listening spaces while minimizing the

number of speakers required. The paper demonstrates signif-

icant performance improvements within control zones over

an LS matching approach through the use of a new, two-stage

Lasso-LS optimization approach. The latter exploits the selectivity

of Lasso to maximize the performance of a restricted

set of speakers reproducing ﬁxed, wideband virtual sources.

The results show that using the proposed two-stage Lasso-LS

optimization for wideband sound reproduction can result in

up to 69 dB improvement in MSE within control zones over

a single-stage LS optimization. In addition, the performance advantage

of the Lasso-LS approach over a single-stage LS algorithm

is accentuated at higher frequencies, with performance gains

of over 20 dB in experiments. Finally, the work shows that

limited arcs of e.g. 84 speakers can be used to successfully

create a multi-zone system for multiple users (effectively,

personal audio spaces). This makes the techniques appropriate

for realistic soundﬁeld installations. Work is currently focused

on further reduction in the number of speakers by providing a

bank of candidate positions corresponding to frequency ranges.

This will lead to extensions of the approach to generation of

personal spaces in three-dimensional environments using a

small set of speakers. A comparison of Lasso-LS pressure

matching with Lasso-beamforming techniques could also be

the topic of future work. Beamforming, however, does not

produce an exact desired field as it does not implement pressure

matching.

Fig. 13. (a) The MSE measured in the control zones vs frequency and (b) total

speaker weight power vs frequency for source1 at with zone1 as

the target zone for this source and zone2 as the corresponding silent zone. In the

Lasso-LS algorithm, the speaker locations are selected considering two wideband

sources located at and . The number of active

speakers in both methods is identical i.e. .

REFERENCES

[1] P. A. Nelson and S. J. Elliott, Active Control of Sound. New York:

Academic, 1993.

[2] J. Choi and Y. Kim, “Generation of an acoustically bright zone with an

illuminated region using multiple sources,” J. Acoust. Soc. Amer., vol.

111, no. 4, pp. 1695–1700, Apr. 2002.

[3] M. Shin, S. Lee, F. Fazi, P. Nelson, D. Kim, S. Wang, K. Park, and

J. Seo, “Maximization of acoustic energy difference between two

spaces,” J. Acoust. Soc. Amer., vol. 128, no. 1, pp. 121–131, Jul. 2010.

[4] W. F. Druyvesteyn and J. Garas, “Personal sound,” J. Audio Eng. Soc.,

vol. 45, no. 9, pp. 685–701, Sep. 1997.

[5] S. J. Elliott and M. Jones, “Active headrest for personal audio,” J.

Acoust. Soc. Amer., vol. 119, no. 5, pp. 2702–2709, May 2006.

[6] J.-H. Chang, C.-H. Lee, J.-Y. Park, and Y.-H. Kim, “A realization of

sound focused personal audio system using acoustic contrast control,”

J. Acoust. Soc. Amer., vol. 125, no. 4, pp. 2091–2097, Apr. 2009.

[7] I. Tashev, J. Droppo, M. Seltzer, and A. Acero, “Robust design of wide-

band loudspeaker arrays,” in Proc. IEEE Int. Conf. Acoust., Speech,

Signal Process., ICASSP’08, Las Vegas, NV, Mar 30–Apr 4 2008.

[8] M. Poletti, “An investigation of 2D multizone surround sound sys-

tems,” in Proc. AES 125th Convention. Audio Eng. Society, San Fran-

cisco, CA, Oct. 2008.

[9] N. Radmanesh and I. S. Burnett, “Reproduction of independent narrowband

soundfields in a multizone surround system and its extension to

speech signal sources,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal

Process., ICASSP’11, Prague, Czech Republic, May 22–27, 2011.

[10] D. B. Ward and T. D. Abhayapala, “Reproduction of a plane-wave

sound ﬁeld using an array of loudspeakers,” IEEE Trans. Speech Audio

Process., vol. 9, no. 6, pp. 697–707, Sep. 2001.

[11] M. Poletti, “Robust two-dimensional surround sound reproduction for

nonuniform loudspeaker layouts,” J. Audio Eng. Soc., vol. 55, no. 7/8,

pp. 598–610, Jul./Aug. 2007.

[12] B. K. Natarajan, “Sparse approximate solutions to linear systems,”

SIAM J. Comput., vol. 24, no. 2, pp. 227–234, 1995.

[13] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. R.

Statist. Soc., Ser. B, vol. 58, no. 1, pp. 267–288, 1996.

[14] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle re-

gression,” Ann. Statist., vol. 32, no. 2, pp. 407–499, 2004.

[15] J. H. Friedman, T. Hastie, H. Hoeﬂing, and R. Tibshirani, “Pathwise

coordinate optimization,” Ann. Appl. Statist., vol. 2, no. 1, pp. 302–332,

2007.

[16] N. Radmanesh and I. S. Burnett, “Wideband sound reproduction in a

2D multizone system using a combined two-stage lasso-LS algorithm,”

in Proc. IEEE Sens. Array and Multichannel Signal Process. Work-

shop, SAM 2012, Hoboken, NJ, Jun 17–20, 2012.

[17] L. Breiman, “Heuristics of instability and stabilization in model selec-

tion,” Ann. Statist., vol. 24, pp. 2350–2383, 1996.

[18] E. J. Candès and Y. Plan, “Near-ideal model selection by ℓ1 minimization,”

Ann. Statist., vol. 37, no. 5A, pp. 2145–2177, 2009.

[19] G. N. Lilis, D. Angelosante, and G. B. Giannakis, “Sound ﬁeld repro-

duction using the lasso,” IEEE Trans. Speech Audio Process., vol. 18,

no. 8, pp. 1902–1912, Nov. 2010.

[20] S. Sardy, A. Bruce, and P. Tseng, “Block coordinate relaxation methods

for nonparametric wavelet denoising,” J. Comput. Graph. Statist., vol.

9, no. 2, pp. 361–379, Jun. 2000.

[21] Speciﬁcation for Octave-Band and Fractional-Octave-Band Analog

and Digital Filters, ANSI S1.11-2004, Feb. 2004, Standards Secretariat

Acoustical Society of America.

[22] S. Haykin, Array Signal Processing. Englewood Cliffs, NJ: Prentice-

Hall, 1995.

Nasim Radmanesh (M’09) received the B.E. degree

in electrical engineering in 2005 from K.N. Toosi

University of Technology, Tehran, Iran. She received

the M.E. degree in electronics engineering in 2008

from RMIT University, Melbourne, Australia, where

she is currently pursuing the Ph.D. degree, supported

in part by the Australian Research Council (ARC).

Her research interests include multichannel sound

reproduction and optimization techniques for array

processing with emphasis on compressed sensing.

Ian Burnett (M’87–SM’02) received the Ph.D.

degree in 1992 from the University of Bath, Bath,

UK. He is currently Professor and Head of School

of Electrical and Computer Engineering, RMIT

University, Melbourne, Australia. His research

interests include speech processing, 3D audio re-

production, recording and transmission, semantic

media content description, 3D video processing and

quality of multimedia experience. He is a member

of the editorial board of IEEE Multimedia, and was

previously Chair of MDS at MPEG.