Multizone Reproduction of Speech Soundfields: A Perceptually Weighted Approach

Abstract and Figures

In this paper, a method for the reproduction of multizone speech soundfields using perceptual weighting criteria is proposed. Psychoacoustic models are used to derive a space-time-frequency weighting function to control leakage of perceptually unimportant energy from the bright zone into the quiet zone. This is combined with a method for regulating the number of basis planewaves used in the reproduction to allow for an efficient implementation using a codebook of predetermined weights based on desired soundfield energy in the zones. The approach is capable of reducing the mean squared error for reproduced speech in the bright zone by 10.5 dB. Results also show that the approach leads to a significant reduction in the spatial error within the bright zone whilst requiring 65% less loudspeaker signal power for the case where the soundfield in this zone is in line with, and hence partially directed to, the quiet zone.
Jacob Donley and Christian Ritz
School of Electrical, Computer and Telecommunications Engineering, University of Wollongong
Wollongong, NSW, Australia, 2522
E-mail: jrd089@uowmail.edu.au, critz@uow.edu.au
I. INTRODUCTION
Spatial audio reproduction gives listeners a full experience of the acoustic environment, including the sound source, and has been extended to multizone soundfield reproduction, originally proposed in [1], which provides audio in spatially separated regions from a single set of loudspeakers. Multizone systems may also be used for suppressing, or cancelling, audio outside a targeted listening zone [2]. The multizone approach has many applications, such as the creation of personal sound zones in multi-participant teleconferencing, entertainment/cinema and vehicle cabins, where the zones are optimised to provide one or many listeners with individual acoustic material [3].
In order to keep the sound zones personal it is necessary to minimise the interzone interference so that each listener's experience remains individual. Some of the earlier methods treat the interference with hard constraints and attempt to remove it completely [1], [4]. This results in zones that are mostly free of interference. However, it is difficult to achieve in situations where a desired soundfield in the bright zone is obscured by, or directed towards, another zone, as the system requires reproduction signals many times the amplitude of what is reproduced within any zone. This is known as the occlusion problem [1], [3], [5] and has been dealt with in various ways, such as the control of planarity [6], orthogonal basis planewaves [7] and alleviated zone constraints [7], [8]. Requiring large signals relative to the levels reproduced in the zones means the system directs its energy inefficiently, with most sound energy present in unattended regions. This may be undesirable when listeners move between sound zones and could put unnecessary strain on loudspeaker drivers. More recent work has focused on alleviating the constraint so that interference (or leakage) is allowed into other zones, with the amount controlled by a weighting function [7], [8]. Allowing sound to leak into other zones can improve the practicality of the system but decreases the individuality of the zones.
Existing methods focus on single frequency soundfields, although there has been work attempting to create multizone soundfields for wideband speech [9]. More recently, work has been done [10] to extend a method [7] to the reproduction of weighted wideband speech soundfields whilst efficiently maintaining the weighting function in the spatial, time and frequency domains. This allows for dynamic weighting of the zones, as well as of individual frequency components in time, thus allowing each zone’s acoustic content to be controlled.
The control of acoustic components to enhance the perception of a signal has been researched thoroughly for applications such as compression [11]. The relationship between the quality in the bright zone and interference in other zones has been subjectively tested [12]. However, the occlusion problem is not directly addressed and the planarity control does not consider human perception. Hence, perceptual models are employed in this work in order to enhance the experience in personal sound zones, especially where the occlusion problem is present. Leaked sound energy is treated as unwanted noise in other zones and controlled such that it is perceptually less noticeable, as indicated by established psychoacoustic models.
We begin with an explanation of the weighted multizone soundfield method used in this work in Section II. Psychoacoustic models are introduced in Section III, as well as the need for regulating the number of weighted basis planewaves. Results of the perceptual weighting and conclusions are given in Section IV and Section V, respectively.
II. WEIGHTED MULTIZONE WIDEBAND SOUNDFIELDS
The multizone soundfield reproduction layout considered in this work is shown in Fig. 1 and contains a reproduction region, $D$, which has a radius $R$. The reproduction region consists of three zones called the bright, quiet and unattended zones, which are represented by $D_b$, $D_q$ and $D \cap (D_b \cup D_q)'$, respectively. The centres of $D_b$ and $D_q$ are at a distance of $r_z$ from the centre of $D$ and each of these zones has a radius of $r$. Loudspeakers are positioned at a distance of $R_l$ from the centre of $D$ on an arc of angle $\phi_L$ which starts at angle $\phi$, and reproduce planewave speech soundfields in $D_b$ with an angle of $\theta$.
[Figure 1: layout of the reproduction region with the unattended, bright and quiet zones labelled.]
Fig. 1: A weighted multizone soundfield reproduction layout is shown. The figure depicts a situation where the desired soundfield in the bright zone is partially directed towards the quiet zone causing the occlusion problem.
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
In the method of weighting multizone soundfields [7], a spatial weighting filter as a function of space, $w(\mathbf{x})$, is used to control the reproduction of sound within each of the zones. Subsequent work [10] extended this approach to allow for space-time-frequency dependent weighting functions, $w(\mathbf{x}, n, k)$, which can be adapted based on the signal characteristics of the target soundfield. We denote $w_b$, $w_q$ and $w_u$ as the weighting functions for $\mathbf{x}_b \in D_b$, $\mathbf{x}_q \in D_q$ and $\mathbf{x}_u \in D \cap (D_b \cup D_q)'$, respectively. The reproduced soundfield pressure at any point in the reproduction region is defined as the sum of space-time-frequency dependent weighted soundfield values [10],
$$\hat{p}_w(\mathbf{x}, n) = \sum_{k}^{K} S^a_w(\mathbf{x}, n, k) \tag{1}$$
where $S^a_w(\mathbf{x}, n, k) = f\big(S^d(\mathbf{x}, n, k), w(\mathbf{x}, n, k)\big)$ is a reproduced soundfield, which is derived as a function of a desired soundfield, $S^d(\mathbf{x}, n, k)$, and a weighting function, $w(\mathbf{x}, n, k)$, using the approaches outlined in [7], [10]. Here, $\mathbf{x}$ is a given position, $n$ is a given time and $k$ is a given frequency. $S^a_w(\mathbf{x}, n, k)$ is summed over $K$ different sinusoidal components. In this work $k = 2\pi f / c$ and $c = 343$ m s$^{-1}$.
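As a small numerical illustration of these definitions, the sketch below converts frequencies to wavenumbers with $k = 2\pi f / c$ and sums $K$ single-frequency components at one position, in the manner of (1). The ideal unit-amplitude planewave used for each component and all function names are assumptions made for illustration; the weighted soundfields $S^a_w$ of [7], [10] are not reproduced here.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # c in m/s, as stated in the text


def wavenumber(frequency_hz):
    """Return k = 2*pi*f/c in rad/m."""
    return 2.0 * math.pi * frequency_hz / SPEED_OF_SOUND


def planewave(x, y, k, theta):
    """Assumed ideal unit-amplitude planewave arriving at angle theta (rad)."""
    return cmath.exp(1j * k * (x * math.cos(theta) + y * math.sin(theta)))


def summed_pressure(x, y, frequencies_hz, theta):
    """Sum the single-frequency components at position (x, y), mirroring (1)."""
    return sum(planewave(x, y, wavenumber(f), theta) for f in frequencies_hz)


k_2khz = wavenumber(2000.0)               # roughly 36.6 rad/m
wavelength_2khz = 2.0 * math.pi / k_2khz  # roughly 0.17 m
```

At 2 kHz the wavelength is about 0.17 m, which is smaller than the 0.3 m zone radius used later in the evaluation.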
In (1), $w(\mathbf{x}, n, k)$ allows soundfield components to be weighted independently in space and time. It is then possible to define the reproduced space-time-frequency domain signal for a particular input as [10],
$$\hat{Y}_w(\mathbf{x}, n, k) = \left| S^a_w(\mathbf{x}, n, k) \right| Y(n, k) \tag{2}$$
where $\hat{Y}_w(\mathbf{x}, n, k)$ is the time-frequency signal at an arbitrary location, $\mathbf{x}$, in the reproduction region, $D$, $Y(n, k)$ is obtained from the short-time Fourier transform of the windowed frame of input $y(n)$ and $|\cdot|$ denotes the absolute value. Using overlap-add reconstruction we can obtain the time-domain signal at any point in $D$, where a different weighting function can be used for each space-time-frequency point. The weighting function can now be used to control the content leaked into the quiet zone in the space-time-frequency domain.
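The processing chain described above (window, transform, scale each bin by a soundfield magnitude, overlap-add) can be sketched as follows. This is a minimal pure-Python sketch under several assumptions: a naive DFT and a short frame stand in for the FFT, the per-bin gains (the hypothetical `gains` argument, playing the role of the soundfield magnitude at one position) are held constant over time, and a periodic Hamming window with 50% overlap is used, whose overlapped copies sum to the constant 1.08.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform, adequate for a short demo frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT, returning the real part of each time sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def reproduce_at_point(y, gains, frame_len):
    """Scale each STFT bin of y by a real gain and overlap-add the frames.

    gains[k] holds the assumed soundfield magnitude for bins 0..frame_len/2;
    frame_len is assumed even.
    """
    hop = frame_len // 2
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * t / frame_len)
              for t in range(frame_len)]
    # Mirror the gains onto the negative-frequency bins so the output stays real.
    full_gains = [gains[k] if k <= hop else gains[frame_len - k]
                  for k in range(frame_len)]
    out = [0.0] * len(y)
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = [y[start + t] * window[t] for t in range(frame_len)]
        weighted = [g * s for g, s in zip(full_gains, dft(frame))]
        for t, value in enumerate(idft_real(weighted)):
            # 1.08 is the sum of two periodic Hamming windows offset by half a frame.
            out[start + t] += value / 1.08
    return out
```

With all gains set to one, the interior samples (those covered by two frames) reproduce the input, which is a convenient sanity check before applying frequency-dependent gains.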
III. PSYCHOACOUSTIC WEIGHTING MODELS
The capability of controlling the energy leakage between zones allows the weighting function to become dependent on the signal being reproduced. For instance, the leaked audio spectrum may be controlled, altered, suppressed or designed to be masked by another spectrum. From this, psychoacoustic modelling can be applied to the weighting function in order to reduce the perceptual effect of the leakage in the quiet zone.
A. The Hearing Threshold
The benefit of using zone weighting is that the hard constraint of zero energy is alleviated and sound energy may be allowed to leak into the quiet zone. However, this means the quiet zone is no longer completely quiet.
Because of the human threshold of hearing in quiet, a quiet zone could be redefined as one in which the sound pressure level is imperceptible. A weighted multizone system would then be practical (owing to the relieved constraint) while remaining perceptually quiet. The threshold in quiet has been well established with frequency dependent functions that provide a good approximation [11], [13].
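To give a sense of what a frequency dependent threshold-in-quiet function looks like, the sketch below uses a closed-form approximation that is widely quoted in perceptual audio coding and usually attributed to Terhardt. It is an assumption made for illustration: the evaluation in this paper takes the threshold from ISO 226 [13], which is tabulated rather than given by this formula.

```python
import math


def threshold_in_quiet_db_spl(frequency_hz):
    """Approximate threshold in quiet (dB SPL) for a frequency in Hz.

    Closed-form approximation attributed to Terhardt; a stand-in for the
    ISO 226 threshold used in the paper.
    """
    f_khz = frequency_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


# The curve is high at low frequencies and dips below 0 dB SPL near 3.3 kHz.
low = threshold_in_quiet_db_spl(100.0)
mid = threshold_in_quiet_db_spl(1000.0)
dip = threshold_in_quiet_db_spl(3300.0)
```

Leakage kept below such a curve at every frequency would, under this model, be inaudible in a silent environment.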
Using the new space-time-frequency domain weighting it is possible to apply the threshold in quiet approximation to (2), where $w(\mathbf{x}, n, k)$ is chosen so that the output in the quiet zone, $\hat{Y}_w(\mathbf{x}_q, n, k)$, is as close to the threshold in quiet as possible. Then, using the codebook method [10], $w(\mathbf{x}, n, k)$ can be chosen to minimise the difference,
$$\min_{w} \left| \hat{Y}_w(\mathbf{x}_q, n, k) - A(\mathbf{x}_q, n, k) \right| \tag{3}$$
where $A(\mathbf{x}_q, n, k)$ is a space-time-frequency dependent function describing the perceptual criteria. In this work Sound Pressure Level (SPL) in dB is relative to the threshold of hearing, $p_r = 20$ µPa.
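The selection in (3) can be illustrated with a toy codebook lookup. In this sketch the codebook is a hypothetical dictionary mapping candidate quiet zone weights to the quiet zone pressure magnitude they produce at one time-frequency point, and the comparison is made in dB SPL relative to 20 µPa; the codebook structure of [10] is not reproduced, and the numbers are invented for the example.

```python
import math

REFERENCE_PRESSURE = 20e-6  # p_r in Pa


def spl_db(pressure_pa):
    """Sound pressure level in dB relative to 20 micropascals."""
    return 20.0 * math.log10(abs(pressure_pa) / REFERENCE_PRESSURE)


def select_weight(codebook, target_db):
    """Pick the weight whose quiet-zone level is closest to the target level.

    codebook: hypothetical mapping of weight -> quiet-zone pressure magnitude (Pa).
    target_db: perceptual target level A for this time-frequency point (dB SPL).
    """
    return min(codebook, key=lambda w: abs(spl_db(codebook[w]) - target_db))


# Invented example entries: larger weights give lower leaked levels.
example_codebook = {1e-2: 2e-3, 1e0: 2e-4, 1e2: 2e-5, 1e4: 2e-6}
chosen_near_threshold = select_weight(example_codebook, 3.4)
chosen_with_masker = select_weight(example_codebook, 25.0)
```

A higher target level, for instance one raised by a masker, lets the search settle on a smaller weight, which is the mechanism the following subsection exploits.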
B. Spreading Functions to Reduce Multizone Error
Analysing the weighted multizone reproductions in [7] reveals that a larger weighting increases the error in the bright zone whilst suppressing the quiet zone. The spatial reproduction in the bright zone is more accurate when the weighting is eased for the quiet zone, allowing more energy to leak. Conversely, if the quiet zone is controlled so that minimal energy leaks into it, the bright zone reproduction becomes erroneous.
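The spatial errors shown in Fig. 2 are calculated from [7]:
$$\epsilon_b(n, k) = \frac{\int_{D_b} \left| S^d(\mathbf{x}, n, k) - S^a_w(\mathbf{x}, n, k) \right|^2 d\mathbf{x}}{\int_{D_b} \left| S^d(\mathbf{x}, n, k) \right|^2 d\mathbf{x}} \tag{4}$$
where $\epsilon_b(n, k)$ is the spatial error in the bright zone.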
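A discrete version of the spatial error in (4) can be sketched by replacing the integrals over the bright zone with sums over uniformly spaced pressure samples, so that the area elements cancel. The function name, the conversion to decibels with 10 log10 and the sample values are assumptions made for illustration.

```python
import math


def spatial_error_db(desired, reproduced):
    """Normalised squared error between two sampled complex soundfields, in dB.

    desired, reproduced: equal-length sequences of complex pressures taken at
    the same uniformly spaced points in the bright zone. The reproduced field
    must differ from the desired one, otherwise the logarithm is undefined.
    """
    numerator = sum(abs(d - a) ** 2 for d, a in zip(desired, reproduced))
    denominator = sum(abs(d) ** 2 for d in desired)
    return 10.0 * math.log10(numerator / denominator)


# Invented example: a reproduction with a uniform 10% amplitude shortfall.
desired_field = [1 + 0j, 1j, -1 + 0j, -1j]
reproduced_field = [0.9 * p for p in desired_field]
error_db = spatial_error_db(desired_field, reproduced_field)  # about -20 dB
```

A 10% amplitude error everywhere corresponds to a normalised error of 0.01, or about -20 dB, which helps when reading the error axis of Fig. 2.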
[Figure 2: three panels plotted against Frequency (Hz) on a logarithmic axis from 10^2 to 10^4. Bright Zone SPL (dB): Desired SPL, Threshold in Quiet. Bright Zone Spatial Error (dB): Max Error, Actual Error, Min Error. Quiet Zone SPL (dB): Max SPL, Leaked SPL, Desired SPL, Min SPL.]
Fig. 2: Multizone soundfield reproduction with perceptual weighting in the quiet zone. The desired bright zone signal is an equal loudness curve at 30 phon [13] and a 2 kHz masker signal at 30 dB SPL is present in the quiet zone. The red and green dashed lines show the worst and best case scenarios, respectively. The bright zone error is calculated using (4). The “Leaked SPL” shows the result after controlling the interzone interference with $w_q$.
[Figure 3: Quiet Zone SPL Limits in Codebook, SPL (dB) against Frequency (Hz) on a logarithmic axis from 10^2 to 10^4, with curves for N = 80 and regulated N.]
Fig. 3: The maximum (solid line) and minimum (dash-dot line) levels in a codebook that the weighting function provides for $w_q = 10^{-2}$ to $10^{4}$. The number of basis planewaves, $N$, used to generate the codebooks is either a constant number, 80 (red lines), or a regulated number (blue lines).
The work in [7] shows that for a frequency of 2 kHz the spatial error is greater than $-5$ dB when the quiet zone is occluded by the bright zone and has a large weight (equivalent to $w_q = 10$). However, the spatial error is less than $-20$ dB when the weight is alleviated (equivalent to $w_q = 0.1$).
In Fig. 2 it is shown that using a spreading function to mask apparent sounds in the “target” quiet zone can reduce the error of the reproduction in the bright zone. This is because we can safely allow the sound energy at particular frequencies to leak into the quiet zone with no perceptual effect. If the “target” quiet zone contained many different frequency components, then it is possible that the bright zone energy could be leaked into the quiet zone imperceptibly, thus reducing the error in the bright zone to a minimum.
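How a masker raises the permissible leakage level around its own frequency can be sketched with a spreading function on the Bark scale. The sketch below uses the Schroeder spreading function and Zwicker's Bark approximation as stand-ins; the exact ISO/IEC MPEG Psychoacoustic Model 2 form used in the paper is not reproduced, and `offset_db` is a hypothetical masker-to-threshold offset left at zero.

```python
import math


def hz_to_bark(frequency_hz):
    """Zwicker's approximation of the Bark scale."""
    return (13.0 * math.atan(0.00076 * frequency_hz)
            + 3.5 * math.atan((frequency_hz / 7500.0) ** 2))


def spreading_db(delta_bark):
    """Schroeder spreading function in dB (maskee Bark minus masker Bark)."""
    shifted = delta_bark + 0.474
    return 15.81 + 7.5 * shifted - 17.5 * math.sqrt(1.0 + shifted * shifted)


def masked_level_db(frequency_hz, masker_hz, masker_db, offset_db=0.0):
    """Level below which leakage at frequency_hz is assumed masked."""
    delta = hz_to_bark(frequency_hz) - hz_to_bark(masker_hz)
    return masker_db + spreading_db(delta) - offset_db


def allowed_leakage_db(frequency_hz, masker_hz, masker_db, quiet_threshold_db):
    """Target level: the larger of the threshold in quiet and the masked level."""
    return max(quiet_threshold_db,
               masked_level_db(frequency_hz, masker_hz, masker_db))


# A 2 kHz masker at 30 dB SPL, as in the Fig. 2 scenario.
at_masker = masked_level_db(2000.0, 2000.0, 30.0)  # close to 30 dB
above = masked_level_db(4000.0, 2000.0, 30.0)
below = masked_level_db(1000.0, 2000.0, 30.0)
```

The spreading function falls off more steeply towards lower frequencies than towards higher ones, so a masker offers more headroom for leakage above its own frequency than below it.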
C. Maintaining Broad Control for Wideband Speech
Inaccurate reproductions can be caused by spatial aliasing and poorly conditioned matrices, which cause the control range of the weighting function per frequency to become reduced and less accurate, as can be seen in Fig. 3. In order to maintain accurate estimation of the effect of different zone weights, the number of basis planewaves needed to reproduce the soundfields can be regulated [10].
Fig. 3 shows that with a regulated $N$ the difference in level that a given $w_q$ can provide in the quiet zone is improved for low and high frequencies and the response is smoother for low frequencies. $N = 80$ is well balanced between spatial aliasing and ill-conditioning for 2 kHz [7], [10].
IV. RESULTS
A. Multizone Reproduction Evaluation Setup
The multizone soundfield layout of Fig. 1 is evaluated, where $r = 0.3$ m, $r_z = 0.6$ m, $R = 1$ m, $R_l = 1.5$ m, $\theta = \sin^{-1}(r / 2r_z) \approx 14.5°$ and $\pi \approx 3.14159$ rad. The value of $\theta$ is chosen such that an evanescent planewave with instant decay would interfere with half the quiet zone [7], [10]. This choice results in a slight occlusion problem where the range of weighting control is larger than for no occlusion and full occlusion. Signals sampled at 16 kHz are converted to the time-frequency domain using a Hamming window (50% overlap) and a Fast Fourier transform (FFT) of length 1024.
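The numbers in this setup can be checked with a few lines of arithmetic. The sketch assumes that the analysis window length equals the FFT length of 1024 samples, which the text does not state explicitly; the variable names are illustrative.

```python
import math

zone_radius = 0.3      # r in metres
zone_offset = 0.6      # r_z in metres
sample_rate = 16000    # Hz
fft_length = 1024      # samples; window length assumed equal to this

# Planewave angle: theta = asin(r / (2 * r_z)), close to 14.5 degrees.
theta_deg = math.degrees(math.asin(zone_radius / (2.0 * zone_offset)))

hop_length = fft_length // 2                   # 50% overlap -> 512 samples
bin_spacing_hz = sample_rate / fft_length      # frequency resolution
frame_ms = 1000.0 * fft_length / sample_rate   # frame duration
hop_ms = 1000.0 * hop_length / sample_rate     # frame advance
```

Under these assumptions each frame spans 64 ms, frames advance by 32 ms and the frequency bins are 15.625 Hz apart.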
For the evaluation we use the efficient method of codebooks described in [10] to store pre-determined weighted soundfield values to be used for a given setup or wideband reproduction. The codebooks are constructed for a reproduction where $L = 65$ and $\phi_L = 2\pi$, which for this particular setup is free of significant aliasing problems in the quiet zone below approximately 8 kHz.
The codebooks are built with spatial pressure samples for all $\mathbf{x} \in D_b \cup D_q$, with each soundfield zone approximated from 2724 samples. The zone weights are chosen as $w_b = 1$ and $w_u = 0.05$ following [7], [10] and the variable weight is $w_q$. Then, using (3), $w_q$ is chosen to match the quiet zone to a given level, $A(\mathbf{x}_q, n, k)$. In this work we choose $A(\mathbf{x}_q, n, k)$ to be the threshold in quiet using the ISO 226 standard [13] with additional masking curves using the ISO/IEC MPEG Psychoacoustic Model 2 spreading function [11].
Speech files for the evaluation are taken from the TIMIT
corpus [14] where 20 files are chosen randomly. The male to
female speaker ratio of these files is 50 : 50. Evaluations using
these speech files are shown with 95% confidence intervals.
B. Reduced Bright Zone Error from Psychoacoustic Masking
[Figure 4: MSE of Weighted Reproduced Speech, MSE (dB) from -85 to -65 against Weight from 10^-2 to 10^4, Bright Zone curve.]
Fig. 4: The MSE of reproduced speech files in the bright zone for different uniform weighting functions ($w_q$).
Fig. 5: Difference between $S^d(\mathbf{x}_b, n, k)$ and $S^a_w(\mathbf{x}_b, n, k)$ for $f = 2$ kHz. A and B show the magnitude difference and C and D show the phase difference. A and C are for $w_q = 10^{-2}$ and B and D are for $w_q = 10^{4}$.
The error induced from the multizone reproduction of the speech soundfields is evaluated using the Mean Squared Error (MSE) of the reproduced speech, where the reference signal for the MSE is the original speech signal. To obtain an approximation of the reproduced speech, the mean of the simulated spatial pressure samples obtained with the approach of Section III is used across $D_b$ and $D_q$.
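The speech MSE used in this evaluation can be sketched as follows: the signals simulated at the sample points of a zone are averaged into one signal, and the mean squared difference from the original speech is expressed in decibels. The function names, the use of 10 log10 and the toy signals are assumptions made for illustration.

```python
import math


def zone_mean(signals):
    """Average equal-length signals taken at several points in a zone."""
    count = len(signals)
    return [sum(samples) / count for samples in zip(*signals)]


def mse_db(reference, reproduced):
    """Mean squared error between two equal-length signals, in dB.

    The signals must differ, otherwise the logarithm is undefined.
    """
    squared = [(a - b) ** 2 for a, b in zip(reference, reproduced)]
    return 10.0 * math.log10(sum(squared) / len(squared))


# Invented example: a reproduction offset from the reference by 0.001.
original = [0.0, 0.1, -0.1, 0.2]
point_signals = [[s + 0.002 for s in original], list(original)]
approximation = zone_mean(point_signals)
error_db = mse_db(original, approximation)  # about -60 dB
```

Averaging over the zone discards the spatial variation of the error, which is why the spatial error of (4) is reported alongside this measure.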
Upon analysing the MSE of different reproduced speech files it becomes apparent that the majority of error measured in the bright zone from the reproduction is in the spatial domain. The sampling theory used to obtain the reproduced speech means that spatial information is neglected. However, (4) evaluates the spatial error and is similar to the measure of planarity [6]. This means that the application of perceptual criteria primarily reduces the spatial error of the multizone reproduction.
The maximum improvement in MSE of the bright zone reproduced speech is 10.5 dB, from $-69.8$ dB for $w_q = 10^{4}$ to $-80.3$ dB for $w_q = 10^{-2}$, and can be seen in Fig. 4. Even though there is a difference of 10.5 dB, the MSE in the reproduced speech is minimal. However, the maximum improvement in spatial error for the bright zone, $\epsilon_b$, averaged for all frequencies is 24.0 dB, from $-7.4$ dB for $w_q = 10^{4}$ to $-31.5$ dB for $w_q = 10^{-2}$, and can be seen in Fig. 2.
Also shown in Fig. 2 is that a 2 kHz masker signal in the quiet zone can allow the spatial error in the bright zone to be reduced. This reduction in spatial error is depicted in Fig. 5, where the perceptual weighting uses $w_q = 10^{-2}$ instead of $w_q = 10^{4}$, which gives a smaller difference between the desired soundfield and reproduced soundfield. In Fig. 5 the magnitude difference is calculated from $|S^d| - |S^a_w|$ and the phase difference from $\arg(S^d / S^a_w)$. The equivalent improvement in $\epsilon_b$ and required loudspeaker power due to the perceptual weighting is 28 dB and 65% less, respectively.
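The reported improvements can be related to one another with simple decibel arithmetic. The sketch assumes that the MSE and spatial error values are negative decibel levels, as the axes of Fig. 4 and Fig. 2 indicate, and that the 65% figure is a reduction in signal power; the variable names are illustrative.

```python
import math

# Bright zone speech MSE (dB) for the large and small quiet zone weights.
mse_large_weight_db = -69.8
mse_small_weight_db = -80.3
mse_improvement_db = mse_large_weight_db - mse_small_weight_db

# Frequency-averaged bright zone spatial error (dB) for the same two weights.
spatial_large_weight_db = -7.4
spatial_small_weight_db = -31.5
spatial_improvement_db = spatial_large_weight_db - spatial_small_weight_db

# A 65% reduction in loudspeaker signal power expressed in decibels.
remaining_power_ratio = 1.0 - 0.65
power_change_db = 10.0 * math.log10(remaining_power_ratio)
```

The MSE endpoints give the stated 10.5 dB. The rounded spatial error endpoints give 24.1 dB, which differs from the reported 24.0 dB by an amount consistent with rounding, and a 65% power reduction corresponds to roughly 4.6 dB less loudspeaker signal power.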
V. CONCLUSIONS
In this paper we have proposed a method for perceptually weighting multizone speech soundfields which can reduce error in bright zones, especially when the occlusion problem is apparent. We have shown the need for regulating the number of basis planewaves used for the reproduction. Perceptual weighting is shown to improve the MSE for reproduced speech in the bright zone from $-69.8$ dB to $-80.3$ dB and to significantly reduce the spatial error on average from $-7.4$ dB to $-31.5$ dB whilst requiring less power. Future work includes testing methods for maximising the speech intelligibility difference and privacy between zones in multizone speech soundfields.
REFERENCES
[1] M. Poletti, “An Investigation of 2-D Multizone Surround Sound Systems,” in Audio Engineering Society Convention 125. Audio Engineering Society, Oct. 2008.
[2] W. Jin and W. Kleijn, “Multizone soundfield reproduction in reverberant rooms using compressed sensing techniques,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2014, pp. 4728–4732.
[3] T. Betlehem, W. Zhang, M. Poletti, T. D. Abhayapala, and others, “Personal Sound Zones: Delivering interface-free audio to multiple listeners,” Signal Processing Magazine, IEEE, vol. 32, no. 2, pp. 81–91, 2015.
[4] Y. J. Wu and T. D. Abhayapala, “Spatial multizone soundfield reproduction: Theory and design,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1711–1720, 2011.
[5] T. Betlehem and P. D. Teal, “A constrained optimization approach for multi-zone surround sound,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2011, pp. 437–440.
[6] P. Coleman, P. J. Jackson, M. Olik, and J. A. Pedersen, “Personal audio with a planar bright zone,” vol. 136, no. 4, pp. 1725–1735.
[7] W. Jin, W. B. Kleijn, and D. Virette, “Multizone soundfield reproduction using orthogonal basis expansion,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013, pp. 311–315.
[8] H. Chen, T. D. Abhayapala, and W. Zhang, “Enhanced sound field reproduction within prioritized control region,” in INTER-NOISE and NOISE-CON Congress and Conference Proceedings, vol. 249. Institute of Noise Control Engineering, 2014, pp. 4055–4064.
[9] N. Radmanesh and I. S. Burnett, “Generation of isolated wideband sound fields using a combined two-stage lasso-ls algorithm,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 2, pp. 378–387, 2013.
[10] J. Donley and C. Ritz, “An efficient approach to dynamically weighted multizone wideband reproduction of speech soundfields,” in China Summit & International Conference on Signal and Information Processing (ChinaSIP). IEEE, 2015, pp. 60–64.
[11] M. Bosi and R. E. Goldberg, Introduction to digital audio coding and standards. Springer, 2003.
[12] K. Baykaner, P. Coleman, R. Mason, P. J. Jackson, J. Francombe, M. Olik, and S. Bech, “The relationship between target quality and interference in sound zone,” vol. 63, no. 1, pp. 78–89.
[13] ISO, “226:2003: Acoustics - Normal equal-loudness-level contours,” International Organization for Standardization, 2003.
[14] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, “DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1,” vol. 93, p. 27403.
... In the speech privacy control problem, these effects must be analysed across both listening zones, as a beneficial increase to the intelligibility in the bright zone is detrimental to the objectives in the dark zone, and vice-versa. Donley and Ritz [82] have approached the problem of reproducing multi-zone speech sound fields by considering the perceptual relevance of speech material leaking from one zone into another. ...
... They argue that the perceived performance of a personal audio system can be adversely affected if the conventional constraint of setting the dark zone pressure to zero is applied [82]. Alternatively, they suggest allowing a certain level of leakage from the bright zone into the dark zone, and using the resulting improvement in system efficiency to improve the quality of reproduction. ...
... The work of Donley and Ritz into perceptually weighted multi-zone control [82] was later developed through the use of objective speech intelligibility metrics to directly evaluate inter-zone privacy [8]. A filtered random noise masking signal was focussed into the dark zone in order to improve speech intelligibility contrast without significant degradation of the bright zone signal, measured using PESQ [95]. ...
Thesis
Full-text available
Multi-zone sound field control allows individuals to listen to personalised audio content whilst sharing a physical space. Applications of this technology include home entertainment, audio reproduction in public spaces such as museums, shops or exhibitions, and providing areas where the privacy of sensitive communication can be safeguarded without the need for physical barriers. The problem of transmitting a speech signal to a single listener and reducing the intelligibility of that signal elsewhere is the focus of the present thesis. The motivation behind the presented experiments and simulations is to identify the practical trade-offs that must be considered in the design of these "Speech Privacy Control" systems. Conventional personal audio systems use loudspeaker array processing to produce a bright zone for the intended user of the system and a dark zone where silence is desired. However, established performance metrics and system optimisation techniques do not necessarily yield privacy for the target listener, as attenuated speech may remain intelligible within the dark zone. A system is proposed that focusses a synthetic masking signal into the dark zone to selectively reduce the intelligibility of the leaked speech. Privacy is ensured by adjusting the masker to meet pre-defined constraints on the speech intelligibility in each zone. This design methodology utilises information from speech intelligibility tests and subjective preference evaluations in order to improve the utility and acceptability of such systems for all nearby listeners. In addition to the design of the masking signal, the performance of a speech privacy control system is affected by the loudspeaker array design and the location of the listening zones. These effects are explored using experimental measurements of a loudspeaker array in a room, and the results are used to select two system configurations for additional evaluation using listening tests. 
The perceived performance of a system is also affected by the surrounding acoustic environment, notably due to reverberation and background noise, which may change over time. The effects of room reverberation are investigated using image source simulations and acoustical measurements within a room, and the performance is evaluated in terms of the achievable level of acoustic contrast, the difference in speech intelligibility between zones, and the masking signal levels that are required to achieve privacy. A proposal is made to further enhance privacy by combining the effects of background noise and artificial masking signals. This method reduces the level of acoustic contrast that is required to achieve a given level of privacy, compared to the case where the masking is provided by the background noise alone.
... Techniques that improve performance in these situations optimise over spatial regions with planarity [13], basis plane-waves [14] and reduced constraints [14], [15]. Recent work [14]- [16] has shown that spatial weighting of importance for each zone can be used to control the amount of leakage and improve the performance of the multizone reproduction system. ...
... Multizone soundfield reproductions designed for mono-frequent soundfields have been extended to wideband soundfields including speech [8], [16]- [18]. Recent research has investigated the perceptual quality of multizone soundfields [2] and methods have been proposed to improve the quality using psychoacoustic models [16]. ...
... Multizone soundfield reproductions designed for mono-frequent soundfields have been extended to wideband soundfields including speech [8], [16]- [18]. Recent research has investigated the perceptual quality of multizone soundfields [2] and methods have been proposed to improve the quality using psychoacoustic models [16]. In this paper, we address open questions on the perception of leakage and what this means for speech privacy amongst zones. ...
Article
Full-text available
Reproducing zones of personal sound is a challenging signal processing problem which has garnered considerable research interest in recent years. We introduce in this work an extended method to multizone soundfield reproduction which overcomes issues with speech privacy and quality. Measures of Speech Intelligibility Contrast (SIC) and speech quality are used as cost functions in an optimisation of speech privacy and quality. Novel spatial and (temporal) frequency domain speech masker filter designs are proposed to accompany the optimisation process. Spatial masking filters are designed using multizone soundfield algorithms which are dependent on the target speech multizone reproduction. Combinations of estimates of acoustic contrast and long term average speech spectra are proposed to provide equal masking influence on speech privacy and quality. Spatial aliasing specific to multizone soundfield reproduction geometry is further considered in analytically derived low-pass filters. Simulated and real-world experiments are conducted to verify the performance of the proposed method using semi-circular and linear loudspeaker arrays. Simulated implementations of the proposed method show that significant speech intelligibility contrast and speech quality is achievable between zones. A range of Perceptual Evaluation of Speech Quality (PESQ) Mean Opinion Scores (MOS) that indicate good quality are obtained while at the same time providing confidential privacy as indicated by SIC. The simulations also show that the method is robust to variations in the speech, virtual source location, array geometry and number of loudspeakers. Real-world experiments confirm the practicality of the proposed methods by showing that good quality and confidential privacy are achievable.
... More recently, work has been done [10] to extend a method [3] to the reproduction of weighted wideband speech soundfields by using the spatial weighting function. This is shown in [11] to allow each zone's acoustic content to be controlled by dynamic space-time-frequency weighting. ...
... If the leaked speech is at a level below the threshold of hearing then it may be expected to start becoming inaudible and/or masked. To reproduce clear speech in a weighted multizone soundfield at a level of 60 dBA in a zone, known as the 'bright' zone, the level of leaked speech in the quiet zone could be reduced to around 30 dBA to 35 dBA [8,11] which is still well above the threshold of hearing (≈ 0 dBA). ...
Conference Paper
Full-text available
This paper proposes two methods for providing speech privacy between spatial zones in anechoic and reverberant environments. The methods are based on masking the content leaked between regions. The masking is optimised to maximise the speech intelligibility contrast (SIC) between the zones. The first method uses a uniform masker signal that is combined with desired multizone loudspeaker signals and requires acoustic contrast between zones. The second method computes a space-time domain masker signal in parallel with the loudspeaker signals so that the combination of the two emphasises the spectral masking in the targeted quiet zone. Simulations show that it is possible to achieve a significant SIC in anechoic environments whilst maintaining speech quality in the bright zone.
... A hybrid system utilising the better aspects of both MSRs and PLs would allow for high acoustic contrast at low and high frequencies. Reproduction of speech soundfields [11], [17], [18] would require low carrier SPL in PLs due to the low energy of high frequency components in speech [19], thus reducing related health risks. Further, frequency dependent PL distortions are less of a problem at higher frequencies [16]. ...
Conference Paper
Full-text available
This paper proposes a hybrid approach to personal sound zones utilising multizone soundfield reproduction techniques and parametric loudspeakers. Crossover filters are designed, to switch between reproduction methods, through analytical analysis of aliasing artifacts in multizone reproductions. By realising the designed crossover filters, wideband acoustic contrast between zones is significantly improved. The trade-off between acoustic contrast and the bandwidth of the reproduced soundfield is investigated. Results show that by incorporating the proposed hybrid model the whole wideband bandwidth is spatial-aliasing free with a mean acoustic contrast consistently above 54.2dB, an improvement of up to 24.2dB from a non-hybrid approach, with as few as 16 dynamic loudspeakers and one parametric loudspeaker.
Thesis
Full-text available
The experience and utility of personal sound is a highly sought after characteristic of shared spaces. Personal sound allows individuals, or small groups of individuals, to listen to separate streams of audio content without external interruption from a third-party. The desired effects of personal acoustic environments can also be areas of minimal sound, where quiet spaces facilitate an effortless mode of communication. These characteristics have become exceedingly difficult to produce in busy environments such as cafes, restaurants, open plan offices and entertainment venues. The concept of, and the ability to provide, spaces of such nature has been of significant interest to researchers in the past two decades. This thesis answers open questions in the area of personal sound reproduction using loudspeaker arrays, which is the active reproduction of soundfields over extended spatial regions of interest. We first provide a review of the mathematical foundations of acoustics theory, single zone and multiple zone soundfield reproduction, as well as background on the human perception of sound. We then introduce novel approaches for the integration of psychoacoustic models in multizone soundfield reproductions and describe implementations that facilitate the efficient computation of complex soundfield synthesis. The psychoacoustic based zone weighting is shown to considerably improve soundfield accuracy, as measured by the soundfield error, and the proposed computational methods are shown capable of providing several orders of magnitude better performance with insignificant effects on synthesis quality. Consideration is then given to the enhancement of privacy and quality in personal sound zones and in particular on the effects of unwanted sound leaking between zones. Optimisation algorithms, along with a priori estimations of cascaded zone leakage filters, are then established so as to provide privacy between the sound zones without diminishing quality. 
Simulations and real-world experiments are performed, using linear and part-circle loudspeaker arrays, to confirm the practical feasibility of the proposed privacy and quality control techniques. The experiments show that good quality and confidential privacy are achievable simultaneously. The concept of personal sound is then extended to the active suppression of speech across loudspeaker boundaries. Novel suppression techniques are derived for linear and planar loudspeaker boundaries, which are then used to simulate the reduction of speech levels over open spaces and suppression of acoustic reflections from walls. The suppression is shown to be as effective as passive fibre panel absorbers. Finally, we propose a novel ultrasonic parametric and electrodynamic loudspeaker hybrid design for acoustic contrast enhancement in multizone reproduction scenarios and show that significant acoustic contrast can be achieved above the fundamental spatial aliasing frequency.
Conference Paper
This paper proposes and evaluates an efficient approach for practical reproduction of multizone soundfields for speech sources. The reproduction method, based on a previously proposed approach, utilises weighting parameters to control the soundfield reproduced in each zone whilst minimising the number of loudspeakers required. Proposed here is an interpolation scheme for predicting the weighting parameter values of the multizone soundfield model that otherwise requires significant computational effort. It is shown that initial computation time can be reduced by a factor of 1024 with only -85 dB of error in the reproduced soundfield relative to reproduction without interpolated weighting parameters. The perceptual impact on the quality of the speech reproduced using the method is also shown to be negligible. By using pre-saved soundfields determined using the proposed approach, practical reproduction of dynamically weighted multizone soundfields of wideband speech could be achieved in real-time.
Index Terms: multizone soundfield reproduction, wideband multizone soundfield, weighted multizone soundfield, look-up tables (LUT), interpolation, sound field synthesis (SFS)
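The look-up-table idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state the table's axes or the interpolation rule, so the single frequency axis, the linear interpolation, the function name `interpolate_lut` and all table values below are assumptions made purely for illustration, not the paper's scheme.

```python
from bisect import bisect_right


def interpolate_lut(grid, values, x):
    """Linearly interpolate values tabulated on an ascending grid.

    Queries outside the grid are clamped to the nearest end value.
    """
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    hi = bisect_right(grid, x)   # first grid index strictly above x
    lo = hi - 1
    t = (x - grid[lo]) / (grid[hi] - grid[lo])
    return (1.0 - t) * values[lo] + t * values[hi]


# Hypothetical coarse table: a precomputed quantity stored at four frequencies.
freqs_hz = [250.0, 500.0, 1000.0, 2000.0]
stored = [0.2, 0.4, 0.8, 0.6]

# Query between two stored entries instead of recomputing from scratch.
estimate = interpolate_lut(freqs_hz, stored, 750.0)
```

The trade-off sketched here is the general one behind any look-up table: a coarse grid is computed once, and intermediate values are estimated cheaply at run time at the cost of some interpolation error.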
Article
Sound zone systems aim to produce regions within a room where listeners may consume separate audio programs with minimal acoustical interference. Often, there is a trade-off between the acoustic contrast achieved between the zones and the fidelity of the reproduced audio program (the target quality). An open question is whether reducing contrast (i.e., allowing greater interference) can improve target quality. The planarity control sound zoning method can be used to improve spatial reproduction, though at the expense of decreased contrast. Hence, this can be used to investigate the relationship between target quality (which is affected by the spatial presentation) and distraction (which is related to the perceived effect of interference). An experiment was conducted investigating target quality and distraction and examining their relationship with overall quality within sound zones. Sound zones were reproduced using acoustic contrast control, planarity control, and pressure matching applied to a circular loudspeaker array. Overall quality was related to target quality and distraction, each having a similar magnitude of effect; however, the result was dependent upon program combination. The highest mean overall quality was a compromise between distraction and target quality, with energy arriving from up to 15 degrees either side of the target direction.
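Acoustic contrast, as used in the abstract above, is commonly defined as the ratio of the spatially averaged squared pressure in the bright zone to that in the quiet zone, expressed in decibels. The sketch below computes it from sampled complex pressures; the function names and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import cmath
import math


def mean_square_pressure(pressures):
    """Spatially averaged squared pressure magnitude over the sample points."""
    return sum(abs(p) ** 2 for p in pressures) / len(pressures)


def acoustic_contrast_db(bright, quiet):
    """Acoustic contrast in dB: bright-zone over quiet-zone mean energy."""
    ratio = mean_square_pressure(bright) / mean_square_pressure(quiet)
    return 10.0 * math.log10(ratio)


# Hypothetical complex pressures sampled at four points in each zone.
bright = [cmath.rect(1.0, 0.1 * n) for n in range(4)]  # magnitude 1.0
quiet = [cmath.rect(0.1, 0.3 * n) for n in range(4)]   # magnitude 0.1

contrast = acoustic_contrast_db(bright, quiet)
```

With a tenfold difference in pressure magnitude between the zones, the energy ratio is 100, which corresponds to a contrast of 20 dB.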
Article
Sound rendering is increasingly being required to extend over certain regions of space for multiple listeners, known as personal sound zones, with minimum interference to listeners in other regions. In this article, we present a systematic overview of the major challenges that have to be dealt with for multizone sound control in a room. Sound control over multiple zones is formulated as an optimization problem, and a unified framework is presented to compare two state-of-the-art sound control techniques. While conventional techniques have been focusing on point-to-point audio processing, we introduce a wave-domain sound field representation and active room compensation for sound pressure control over a region of space. The design of directional loudspeakers is presented and the advantages of using arrays of directional sources are illustrated for sound reproduction, such as better control of sound fields over wide areas and reduced total number of loudspeaker units, thus making it particularly suitable for establishing personal sound zones.
Article
Reproduction of multiple sound zones, in which personal audio programs may be consumed without the need for headphones, is an active topic in acoustical signal processing. Many approaches to sound zone reproduction do not consider control of the bright zone phase, which may lead to self-cancellation problems if the loudspeakers surround the zones. Conversely, control of the phase in a least-squares sense comes at a cost of decreased level difference between the zones and frequency range of cancellation. Single-zone approaches have considered plane wave reproduction by focusing the sound energy into a point in the wavenumber domain. In this article, a planar bright zone is reproduced via planarity control, which constrains the bright zone energy to impinge from a narrow range of angles via projection into a spatial domain. Simulation results using a circular array surrounding two zones show the method to produce superior contrast to the least-squares approach, and superior planarity to the contrast maximization approach. Practical performance measurements obtained in an acoustically treated room verify the conclusions drawn under free-field conditions.
Conference Paper
A recent approach to surround sound is to perform exact control of the sound field over a region of space. Here, the driving signals for an array of loudspeakers are chosen to create a desired sound field over an extended area. An interesting subtopic is multi-zone surround sound, where two or more listeners can experience totally independent sound fields. However, multi-zone surround sound is a challenge because implementation can be very non-robust. We formulate multi-zone sound reproduction as a convex optimization problem, where the sound energy leakage into other listener zones is limited to fixed levels, and a constraint is placed on the loudspeaker weights to improve the robustness. An interior point algorithm is devised for computing the loudspeaker weights, and its performance is compared with least squares approaches of multi-zone reproduction in typical two-zone cases.
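A constrained problem of the kind described in the abstract above might be written as follows. This is only a sketch in generic pressure-matching notation: the symbols $\mathbf{H}_b$ and $\mathbf{H}_q$ (transfer matrices from the loudspeakers to sample points in the bright and quiet zones), $\mathbf{p}_d$ (desired bright-zone pressures), $\mathbf{w}$ (loudspeaker weights) and the limits $\epsilon_q$ and $W_{\max}$ are assumptions, and the choice of bright-zone error as the objective is inferred rather than stated in the abstract.

```latex
\begin{aligned}
\min_{\mathbf{w}} \quad & \left\lVert \mathbf{H}_b \mathbf{w} - \mathbf{p}_d \right\rVert_2^2 \\
\text{subject to} \quad & \left\lVert \mathbf{H}_q \mathbf{w} \right\rVert_2^2 \le \epsilon_q , \\
& \left\lVert \mathbf{w} \right\rVert_2^2 \le W_{\max} .
\end{aligned}
```

Both the objective and the constraints are convex quadratics in $\mathbf{w}$, which is what makes an interior point method applicable.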
Article
Surround sound systems can produce a desired sound field over an extended region of space by using higher order Ambisonics. One application of this capability is the production of multiple independent soundfields in separate zones. This paper investigates multi-zone surround systems for the case of two-dimensional reproduction. A least squares approach is used for deriving the loudspeaker weights for producing a desired single-frequency wave field in one of N zones, while producing silence in the other N-1 zones. It is shown that reproduction in the active zone is more difficult when an inactive zone is in-line with the virtual sound source and the active zone. Methods for controlling this problem are discussed.
Article
Higher-order ambisonics has been identified as a robust technique for synthesizing a desired sound field. However, the synthesis algorithm requires a large number of secondary sources to derive the optimal results for large reproduction regions and over high operating frequencies. This paper proposes an enhanced method for synthesizing the sound field using a relatively small number of secondary sources which allows improved synthesis accuracy for certain sub-regions of the zone of interest. This method introduces the spherical harmonic translation into the mode matching algorithm to acquire a uniform modal-domain representation of the sound fields within different sub-regions. Then by changing the weighting of each region, the least mean squares solution can be easily controlled to cater for certain prioritized reproduction requirements. Simulations show that this technique can effectively improve the matching accuracy of a given sub-region, while only slightly increasing the global reproduction error. This method is shown to be especially effective in the situations where the number of secondary sources is limited.
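The effect of per-region weighting on a least squares solution can be sketched generically. The notation below is an assumption and is not the paper's modal-domain formulation: $\mathbf{A}_i$ maps the secondary-source weights $\mathbf{w}$ to the field representation in sub-region $i$, $\mathbf{b}_i$ is the desired representation there, and $\gamma_i \ge 0$ is the priority given to that sub-region.

```latex
\min_{\mathbf{w}} \; \sum_{i=1}^{Q} \gamma_i \left\lVert \mathbf{A}_i \mathbf{w} - \mathbf{b}_i \right\rVert_2^2
\quad \Longrightarrow \quad
\mathbf{w} = \Bigl( \sum_{i=1}^{Q} \gamma_i \mathbf{A}_i^{H} \mathbf{A}_i \Bigr)^{-1}
\sum_{i=1}^{Q} \gamma_i \mathbf{A}_i^{H} \mathbf{b}_i
```

The closed form holds when the summed matrix is invertible. Raising $\gamma_i$ for one sub-region pulls the solution toward a better match there, at the expense of the match in the others.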
Conference Paper
We introduce a method of reproducing a multizone soundfield within the desired region in a reverberant room. It is based on determining the acoustic transfer function (ATF) from the loudspeakers over the reproduction region using a limited number of microphones. We assume that the soundfield is sparse in the Helmholtz solution domain and find the ATF using a compressed-sensing approach. This sparseness assumption facilitates finding the optimal characterization of the original sound over the reproduction region based on scarce sound pressure measurements. The outcome of the first stage is then used to derive the optimal least-squares solution for the loudspeaker filter that minimizes the reproduction error over the whole reproduction region. Simulations confirm that the method leads to a significant reduction in the number of required microphones for accurate multizone sound reproduction, while it also facilitates the reproduction over a wide frequency range.
Conference Paper
We introduce a method for 2-D spatial multizone soundfield reproduction based on describing the desired multizone soundfield as an orthogonal expansion of basis functions over the desired reproduction region. This approach finds the solution to the Helmholtz equation that is closest to the desired soundfield in a weighted least squares sense. The orthogonal basis set is formed using QR factorization, taking as input a suitable set of solutions of the Helmholtz equation. The coefficients of the Helmholtz solution wavefields can then be calculated, reducing the multizone sound reproduction problem to the reconstruction of a set of basis wavefields over the desired region. The method facilitates its application with a more practical loudspeaker configuration. The approach is shown to be effective for both accurately reproducing sound in the selected bright zone and minimizing sound leakage into the predefined quiet zone.