Conference Paper

An efficient approach to dynamically weighted multizone wideband reproduction of speech soundfields

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

This paper proposes and evaluates an efficient approach for practical reproduction of multizone soundfields for speech sources. The reproduction method, based on a previously proposed approach, utilises weighting parameters to control the soundfield reproduced in each zone whilst minimising the number of loudspeakers required. Proposed here is an interpolation scheme for predicting the weighting parameter values of the multizone soundfield model, which otherwise require significant computational effort to determine. It is shown that initial computation time can be reduced by a factor of 1024 with only −85 dB of error in the reproduced soundfield relative to reproduction without interpolated weighting parameters. The perceptual impact on the quality of the speech reproduced using the method is also shown to be negligible. By using pre-saved soundfields determined using the proposed approach, practical reproduction of dynamically weighted multizone soundfields of wideband speech could be achieved in real-time.
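The interpolation idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical table of parameter values precomputed on a coarse grid over two axes (labelled here as frequency and zone weight) and uses plain bilinear interpolation to predict values between grid points. The grid, the choice of axes, the function names and the stand-in `expensive_model` are all assumptions made for illustration.

```python
import bisect
import math


def expensive_model(freq_hz, zone_weight):
    # Hypothetical stand-in for the costly per-point computation;
    # any smooth function serves to illustrate the idea.
    return math.log10(freq_hz) * math.sqrt(zone_weight)


def build_table(freqs, weights):
    # Evaluate the costly model only on the coarse grid.
    return [[expensive_model(f, w) for w in weights] for f in freqs]


def _bracket(axis, x):
    # Index i such that axis[i] <= x <= axis[i + 1], clamped to the grid.
    i = bisect.bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def interpolate(table, freqs, weights, f, w):
    # Bilinear interpolation between the four surrounding grid points.
    i, j = _bracket(freqs, f), _bracket(weights, w)
    tf = (f - freqs[i]) / (freqs[i + 1] - freqs[i])
    tw = (w - weights[j]) / (weights[j + 1] - weights[j])
    low = table[i][j] * (1 - tw) + table[i][j + 1] * tw
    high = table[i + 1][j] * (1 - tw) + table[i + 1][j + 1] * tw
    return low * (1 - tf) + high * tf


# Assumed coarse grid; the values are illustrative only.
coarse_freqs = [250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]
coarse_weights = [1.0, 10.0, 100.0, 1000.0, 10000.0]
table = build_table(coarse_freqs, coarse_weights)

# On a grid point the interpolated value coincides with the model.
predicted = interpolate(table, coarse_freqs, coarse_weights, 500.0, 100.0)
exact = expensive_model(500.0, 100.0)

# Between grid points the table gives a cheap approximation.
approx = interpolate(table, coarse_freqs, coarse_weights, 700.0, 50.0)
```

In a scheme of this kind the saving in initial computation is the ratio of fine-grid to coarse-grid points, and the interpolation error depends on how smoothly the parameter varies between the stored points.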

Thesis
The experience and utility of personal sound is a highly sought-after characteristic of shared spaces. Personal sound allows individuals, or small groups of individuals, to listen to separate streams of audio content without external interruption from a third party. The desired effects of personal acoustic environments can also be areas of minimal sound, where quiet spaces facilitate an effortless mode of communication. These characteristics have become exceedingly difficult to produce in busy environments such as cafes, restaurants, open-plan offices and entertainment venues. The concept of, and the ability to provide, spaces of such nature has been of significant interest to researchers in the past two decades. This thesis answers open questions in the area of personal sound reproduction using loudspeaker arrays, which is the active reproduction of soundfields over extended spatial regions of interest. We first provide a review of the mathematical foundations of acoustics theory, single zone and multiple zone soundfield reproduction, as well as background on the human perception of sound. We then introduce novel approaches for the integration of psychoacoustic models in multizone soundfield reproductions and describe implementations that facilitate the efficient computation of complex soundfield synthesis. The psychoacoustic-based zone weighting is shown to considerably improve soundfield accuracy, as measured by the soundfield error, and the proposed computational methods are shown to be capable of providing several orders of magnitude better performance with insignificant effects on synthesis quality. Consideration is then given to the enhancement of privacy and quality in personal sound zones and in particular to the effects of unwanted sound leaking between zones. Optimisation algorithms, along with a priori estimations of cascaded zone leakage filters, are then established so as to provide privacy between the sound zones without diminishing quality.
Simulations and real-world experiments are performed, using linear and part-circle loudspeaker arrays, to confirm the practical feasibility of the proposed privacy and quality control techniques. The experiments show that good quality and confidential privacy are achievable simultaneously. The concept of personal sound is then extended to the active suppression of speech across loudspeaker boundaries. Novel suppression techniques are derived for linear and planar loudspeaker boundaries, which are then used to simulate the reduction of speech levels over open spaces and suppression of acoustic reflections from walls. The suppression is shown to be as effective as passive fibre panel absorbers. Finally, we propose a novel ultrasonic parametric and electrodynamic loudspeaker hybrid design for acoustic contrast enhancement in multizone reproduction scenarios and show that significant acoustic contrast can be achieved above the fundamental spatial aliasing frequency.
Article
Reproducing zones of personal sound is a challenging signal processing problem which has garnered considerable research interest in recent years. We introduce in this work an extended method for multizone soundfield reproduction which overcomes issues with speech privacy and quality. Measures of Speech Intelligibility Contrast (SIC) and speech quality are used as cost functions in an optimisation of speech privacy and quality. Novel spatial and (temporal) frequency domain speech masker filter designs are proposed to accompany the optimisation process. Spatial masking filters are designed using multizone soundfield algorithms which are dependent on the target speech multizone reproduction. Combinations of estimates of acoustic contrast and long-term average speech spectra are proposed to provide equal masking influence on speech privacy and quality. Spatial aliasing specific to multizone soundfield reproduction geometry is further considered in analytically derived low-pass filters. Simulated and real-world experiments are conducted to verify the performance of the proposed method using semi-circular and linear loudspeaker arrays. Simulated implementations of the proposed method show that significant speech intelligibility contrast and speech quality are achievable between zones. A range of Perceptual Evaluation of Speech Quality (PESQ) Mean Opinion Scores (MOS) that indicate good quality are obtained while at the same time providing confidential privacy as indicated by SIC. The simulations also show that the method is robust to variations in the speech, virtual source location, array geometry and number of loudspeakers. Real-world experiments confirm the practicality of the proposed methods by showing that good quality and confidential privacy are achievable.
Article
Sound fields are essentially band-limited phenomena, both temporally and spatially. This implies that a spatially sampled sound field respecting the Nyquist criterion is effectively equivalent to its continuous original. We describe Sound Field Reconstruction (SFR), a technique that uses the previously stated observation to express the reproduction of a continuous sound field as an inversion of the discrete acoustic channel from a loudspeaker array to a grid of control points. The acoustic channel is inverted using truncated singular value decomposition (SVD) in order to provide optimal sound field reproduction subject to a limited effort constraint. Additionally, a detailed procedure for obtaining loudspeaker driving signals that involves selection of active loudspeakers, coverage of the listening area with control points, and frequency-domain FIR filter design is described. Extensive simulations comparing SFR with Wave Field Synthesis show that on average, SFR provides higher sound field reproduction accuracy.
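A truncated-SVD inversion of this kind can be sketched for a single frequency as follows. This is an illustrative sketch rather than the SFR procedure itself: it assumes a free-field point-source model for the acoustic channel, a hypothetical linear array and control grid, and a relative truncation threshold `rel_tol` chosen arbitrarily. All names and values are assumptions.

```python
import numpy as np


def green(points, sources, k):
    # Assumed channel model: free-field point source, exp(-jkr) / (4 pi r).
    r = np.linalg.norm(points[:, None, :] - sources[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)


def tsvd_solve(G, p_des, rel_tol=1e-3):
    # G: (control points x loudspeakers) channel matrix.
    # p_des: desired pressures at the control points.
    U, s, Vh = np.linalg.svd(G, full_matrices=False)
    # Discard singular values that are small relative to the largest one;
    # inverting them would demand very large loudspeaker effort.
    keep = s > rel_tol * s[0]
    coeffs = (U[:, keep].conj().T @ p_des) / s[keep]
    return Vh[keep].conj().T @ coeffs, int(keep.sum())


# Hypothetical geometry: 16 loudspeakers on a line, a 7 x 7 control grid.
k = 2 * np.pi * 500.0 / 343.0  # wavenumber at 500 Hz, c = 343 m/s
speakers = np.stack([np.linspace(-1.5, 1.5, 16), np.full(16, 2.0)], axis=1)
gx, gy = np.meshgrid(np.linspace(-0.3, 0.3, 7), np.linspace(-0.3, 0.3, 7))
control = np.stack([gx.ravel(), gy.ravel()], axis=1)
virtual_source = np.array([[0.0, 4.0]])

G = green(control, speakers, k)
p_des = green(control, virtual_source, k)[:, 0]

q, rank_used = tsvd_solve(G, p_des)
rel_err = np.linalg.norm(G @ q - p_des) / np.linalg.norm(p_des)
```

Raising `rel_tol` keeps fewer singular values, which lowers the norm of the driving signals `q` at the cost of a larger reproduction error; this is the trade-off that a limited-effort constraint controls.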
Article
The prohibitive number of speakers required for the reproduction of isolated soundfields is the major limitation preventing solution deployment. This paper addresses the provision of personal soundfields (zones) to multiple listeners using a limited number of speakers with an underlying assumption of fixed virtual sources. For such multizone systems, optimization of speaker positions and weightings is important to reduce the number of active speakers. Typically, single stage optimization is performed, but in this paper a new two-stage pressure matching optimization is proposed for wideband sound sources. In the first stage, the least-absolute shrinkage and selection operator (Lasso) is used to select the speakers' positions for all sources and frequency bands. A second stage then optimizes reproduction using all selected speakers on the basis of a regularized least-squares (LS) algorithm. The performance of the new, two-stage approach is investigated for different reproduction angles, frequency range and variable total speaker weight powers. The results demonstrate that using two-stage Lasso-LS optimization can give up to 69 dB improvement in the mean squared error (MSE) over a single-stage LS in the reproduction of two isolated audio signals within control zones using e.g. 84 speakers.
Article
The Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT contains speech from 630 speakers representing 8 major dialect divisions of American English, each speaking 10 phonetically-rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic, and word transcriptions, as well as speech waveform data for each spoken sentence. The release of TIMIT contains several improvements over the Prototype CD-ROM released in December, 1988: (1) full 630-speaker corpus, (2) checked and corrected transcriptions, (3) word-alignment transcriptions, (4) NIST SPHERE-headered waveform files and header manipulation software, (5) phonemic dictionary, (6) new test and training subsets balanced for dialectal and phonetic coverage, and (7) more extensive documentation.
Conference Paper
While higher order ambisonic approaches can be used to generate multiple zone soundfields, this paper adopts a Least Squares matching approach which provides a more flexible formulation. The base approach, adopted from [1], computes speaker weights which allow for the placement of single sources in the soundfield. In this paper the approach is extended firstly to two multi-frequency sources and then to narrowband speech signals. The results for multi-frequency sources explore the zonal soundfield errors resulting from varied source positions. For speech signals, the approach provides a potential solution for multiple conversation reproduction in a multi-user environment. The paper results indicate that the approach is feasible for zones which do not suffer occlusion effects from other zones. However, for more versatile multizone soundfield reproduction a 3D approach is recommended.
Article
Surround sound systems can produce a desired sound field over an extended region of space by using higher order Ambisonics. One application of this capability is the production of multiple independent soundfields in separate zones. This paper investigates multi-zone surround systems for the case of two dimensional reproduction. A least squares approach is used for deriving the loudspeaker weights for producing a desired single frequency wave field in one of N zones, while producing silence in the other N-1 zones. It is shown that reproduction in the active zone is more difficult when an inactive zone is in-line with the virtual sound source and the active zone. Methods for controlling this problem are discussed.
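A least squares derivation of loudspeaker weights for one active zone and one silent zone can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's formulation: loudspeakers are modelled as far-field plane-wave sources on a circle, the zones are sampled at a few hypothetical points, and a small Tikhonov term `reg` is added for numerical stability. The geometry, names and values are all assumptions.

```python
import numpy as np


def zone_points(centre, radius, n=6):
    # Sample an n x n grid and keep the points inside a disc around `centre`.
    ax = np.linspace(-radius, radius, n)
    gx, gy = np.meshgrid(ax, ax)
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)
    return pts[np.linalg.norm(pts, axis=1) <= radius] + centre


def plane_waves(points, angles, k):
    # Column l is a unit plane wave with direction vector (cos, sin) of angles[l].
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.exp(1j * k * points @ directions.T)


k = 2 * np.pi * 1000.0 / 343.0  # single frequency: 1 kHz, c = 343 m/s
speaker_angles = np.linspace(0, 2 * np.pi, 32, endpoint=False)
bright = zone_points(np.array([-0.6, 0.0]), 0.3)  # active zone
quiet = zone_points(np.array([0.6, 0.0]), 0.3)    # silent zone

# Stack both zones: desired plane wave in the active zone, zero elsewhere.
G = plane_waves(np.vstack([bright, quiet]), speaker_angles, k)
source_angle = np.array([np.pi / 2])
p_des = np.concatenate(
    [plane_waves(bright, source_angle, k)[:, 0], np.zeros(len(quiet))]
)

# Regularised least squares via the normal equations.
reg = 1e-3
A = G.conj().T @ G + reg * np.eye(G.shape[1])
w = np.linalg.solve(A, G.conj().T @ p_des)

# Mean energy in each zone and the resulting acoustic contrast in dB.
p = G @ w
bright_energy = np.mean(np.abs(p[: len(bright)]) ** 2)
quiet_energy = np.mean(np.abs(p[len(bright):]) ** 2)
contrast_db = 10 * np.log10(bright_energy / quiet_energy)
```

Setting `source_angle` to 0 places both zones along the wave's direction of travel, which corresponds to the in-line arrangement the abstract identifies as difficult; comparing `contrast_db` for the two settings is one way to explore that effect in this simplified model.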
Article
Higher-order ambisonics has been identified as a robust technique for synthesizing a desired sound field. However, the synthesis algorithm requires a large number of secondary sources to derive the optimal results for large reproduction regions and over high operating frequencies. This paper proposes an enhanced method for synthesizing the sound field using a relatively small number of secondary sources which allows improved synthesis accuracy for certain sub-regions of the zone of interest. This method introduces the spherical harmonic translation into the mode matching algorithm to acquire a uniform modal-domain representation of the sound fields within different sub-regions. Then by changing the weighting of each region, the least mean squares solution can be easily controlled to cater for certain prioritized reproduction requirements. Simulations show that this technique can effectively improve the matching accuracy of a given sub-region, while only slightly increasing the global reproduction error. This method is shown to be especially effective in situations where the number of secondary sources is limited.
Conference Paper
We introduce a method for 2-D spatial multizone soundfield reproduction based on describing the desired multizone soundfield as an orthogonal expansion of basis functions over the desired reproduction region. This approach finds the solution to the Helmholtz equation that is closest to the desired soundfield in a weighted least squares sense. The orthogonal basis set is formed by applying QR factorization to a suitable set of solutions of the Helmholtz equation. The coefficients of the Helmholtz solution wavefields can then be calculated, reducing the multizone sound reproduction problem to the reconstruction of a set of basis wavefields over the desired region. The method lends itself to application with a more practical loudspeaker configuration. The approach is shown to be effective both for accurately reproducing sound in the selected bright zone and for minimizing sound leakage into the predefined quiet zone.
Article
Spatial multizone soundfield reproduction over an extended region of open space is a complex and challenging problem in acoustic signal processing. In this paper, we provide a framework to recreate 2-D spatial multizone soundfields using a single array of loudspeakers which encompasses all spatial regions of interest. The reproduction is based on the derivation of an equivalent global soundfield consisting of a number of individual multizone soundfields. This is achieved by using spatial harmonic coefficients translation between coordinate systems. A multizone soundfield reproduction problem is then reduced to the reproduction over the entire region. An important advantage of this approach is the full use of the available dimensionality of the soundfield. This paper provides quantitative performances of a 2-D multizone system and reveals some fundamental limits on 2-D multizone soundfield reproduction. The extensions of the multizone soundfield reproduction design in reverberant rooms are also included.
Book
Intended for use as both a textbook and a reference, "Fourier Acoustics" develops the theory of sound radiation uniquely from the viewpoint of Fourier Analysis. This powerful perspective of sound radiation provides the reader with a comprehensive and practical understanding which will enable him or her to diagnose and solve sound and vibration problems in the 21st Century. As a result of this perspective, "Fourier Acoustics" is able to present thoroughly and simply, for the first time in book form, the theory of nearfield acoustical holography, an important technique which has revolutionised the measurement of sound. Relying little on material outside the book, "Fourier Acoustics" will be invaluable as a graduate level text as well as a reference for researchers in academia and industry. It covers the physics of wave propagation and sound vibration in homogeneous media. It deals with acoustics, such as radiation of sound and radiation from vibrating surfaces; inverse problems, such as the theory of nearfield acoustical holography; and the mathematics of specialized functions, such as spherical harmonics.
Article
Reproduction of a soundfield is a fundamental problem in acoustic signal processing. A common approach is to use an array of loudspeakers to reproduce the desired field where the least-squares method is used to calculate the loudspeaker weights. However, the least-squares method involves matrix inversion which may lead to errors if the matrix is poorly conditioned. In this paper, we use the concept of a theoretical continuous loudspeaker on a circle to derive the discrete loudspeaker aperture functions while avoiding matrix inversion. In addition, the aperture function obtained through the continuous loudspeaker method reveals the underlying structure of the solution as a function of the desired soundfield, the loudspeaker positions, and the frequency. This concept can also be applied to 3-D soundfield reproduction using spherical harmonics analysis with a spherical array. Results are verified through computer simulations.
Conference Paper
Spatial multizone soundfield reproduction is a difficult problem, which has many potential applications. This paper provides a framework to recreate 2D spatial multizone soundfields using an array of loudspeakers. We derive the desired global soundfield by translating individual desired soundfields to a single global co-ordinate system and applying appropriate angular window functions. We reveal some of the fundamental limits of 2D multizone soundfield reproduction. We show that the ability of multizone reproduction is dependent on (i) maximum radius of multizones, (ii) window length (size, and nature), and (iii) radial distance to the furthermost zone. We illustrate the framework by designing and simulating a two dimensional two zone soundfield.
Conference Paper
Previous objective speech quality assessment models, such as bark spectral distortion (BSD), the perceptual speech quality measure (PSQM), and measuring normalizing blocks (MNB), have been found to be suitable for assessing only a limited range of distortions. A new model has therefore been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss and variable delay. Known as perceptual evaluation of speech quality (PESQ), it is the result of integration of the perceptual analysis measurement system (PAMS) and PSQM99, an enhanced version of PSQM. PESQ is expected to become a new ITU-T recommendation P.862, replacing P.861, which specified PSQM and MNB.