Article

Performance analysis of dynamic acoustic source separation in reverberant rooms

Abstract

We study the effect of reverberation and source movement on the performance of blind source separation and deconvolution (BSSD) algorithms. Using the model of statistical room acoustics, we derive theoretical performance measures for a class of unmixing algorithms when these are used in a reverberant room. We specifically investigate two cases: 1) separation of only the direct paths and 2) unmixing of the full reverberant paths. We develop closed-form performance measures that depend on the geometry used and the chosen unmixing system. These measures allow us to draw general conclusions on the robustness of typical BSSD algorithms to source movement. Results indicate that the performance of systems that show very good separation in static reverberant environments is significantly reduced when sources move, degrading to that of simple direct-path separation.
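The two cases in the abstract can be illustrated at a single frequency with a 2x2 mixing matrix H = D + R, where D holds the direct paths and R the reverberant components. The sketch below is not the paper's closed-form measure. The geometry, the values in R, the speed of sound, and all function names are assumptions chosen for illustration. It compares the signal-to-interference ratio (SIR) at one output when the unmixing system inverts only D (case 1) and when it inverts the full H for static sources (case 2).

```python
import cmath
import math

C = 343.0  # speed of sound in m/s (assumed)

def direct_path(dist, freq):
    """Free-field transfer function of one direct path (4*pi factor dropped)."""
    k = 2 * math.pi * freq / C
    return cmath.exp(-1j * k * dist) / dist

def inv2(m):
    """Inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add2(x, y):
    """Sum of two 2x2 matrices."""
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]

def sir_db(g):
    """SIR at output 0 of the global system g = W H, in dB."""
    leak = abs(g[0][1]) ** 2
    return math.inf if leak == 0 else 10 * math.log10(abs(g[0][0]) ** 2 / leak)

freq = 1000.0
# Hypothetical geometry: rows are microphones, columns are sources,
# entries are built from source-to-microphone distances in metres.
D = [[direct_path(1.0, freq), direct_path(1.5, freq)],
     [direct_path(1.5, freq), direct_path(1.0, freq)]]
# Hypothetical reverberant components (fixed values, for illustration only).
R = [[0.20 + 0.10j, -0.15 + 0.05j],
     [0.05 - 0.20j, 0.10 + 0.15j]]
H = add2(D, R)

sir_direct = sir_db(mul2(inv2(D), H))  # case 1: invert direct paths only
sir_full = sir_db(mul2(inv2(H), H))    # case 2: invert the full paths
print(f"direct-path unmixing: {sir_direct:.1f} dB")
print(f"full unmixing (static sources): {sir_full:.1f} dB")
```

With static sources the full inverse cancels the interference up to rounding error, while direct-path unmixing leaves the reverberant leakage untouched. Replacing R with different values after computing the full inverse gives a rough feel for the moving-source case the abstract describes.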

... In reverberant rooms, Talantzis et al. [14] proposed dynamic acoustic source separation performance. Here, every reverberant unmixing path is explored. ...
Article
A microphone positioned far away observes speech signals with little acoustic interference, in terms of both reverberation and noise. As a result, the quality of the speech degrades, and blind source separation (BSS) from the obtained speech samples and blind dereverberation (BD) are the most challenging issues. BSS and BD were examined separately in previous studies. This study proposes a novel approach for both BD and BSS. Using the discrete Fourier transform (DFT), the time-domain signals are converted into equivalent frequency-domain signals with a fast Fourier transform. The lightweight Convolutional Neural Network (CNN)-based Quantum Teaching–Learning-Based Optimization (QTLBO) algorithm, called lightweight CNN-QTLBO, removes reverberation prior to BSS. Next, Principal Component Discriminant Power-based Linear Discriminant Analysis (PCDP-LDA) is applied for blind source separation. In the comparative results, the proposed technique performed better in terms of direct-to-reverberation ratio (DRR), signal-to-interference ratio (SIR), and target-to-interference ratio (TIR) than other existing techniques. The proposed technique accurately separates the original signals from the mixed reverberant signals.
... Gaussianity of the source signal, often used in blind source separation algorithms, can be applied to define an information-theoretical measure called mutual information to enhance estimated TDOA under reverberant conditions [61,62]. Sound features corresponding to the excitation source of the speech production mechanism are claimed to be robust to noise and reverberation. ...
Conference Paper
In this paper we present a novel signal processing algorithm and array for sound source localization in an enclosed area. This method, which has some similarity to the human eye structure, consists of a novel hemispherical microphone array with microphones on the shell and one microphone in the sphere center. A signal processing scheme creates, in parallel, special closeness functions for each microphone direction on the shell in the time domain. By choosing the microphone directions corresponding to the highest closeness function values and applying a linear weighted spatial average over those directions, we estimate the sound source direction.
... In the multichannel case, the non-minimum phase problem is eliminated and exact inversion can be achieved [2,8]. However, it has been observed that exact equalization is of limited value in practice, when the RTF estimates contain even moderate errors [1,3,9]. Various alternatives have been proposed for improving robustness to RTF inaccuracies [10,11,12]. ...
Conference Paper
Equalization of room transfer functions (RTFs) is important in many speech and audio processing applications. It is a challenging problem because RTFs are several thousand taps long and non-minimum phase and in practice only approximate measurements of the RTFs are available. In this paper, we present a subband multichannel least squares method for equalization of RTFs which is computationally efficient and less sensitive to inaccuracies in the measured RTFs compared to its fullband counterpart. Experimental results using simulated impulse responses demonstrate the performance of the algorithm.
... Identification of systems with long impulse responses is of major importance in many applications, including acoustic echo cancellation [1], [2], relative transfer function (RTF) identification [3], dereverberation [4], [5], blind source separation [6], [7] and beamforming in reverberant environments [8], [9]. ...
Article
In this paper, we investigate the influence of crossband filters on a system identifier implemented in the short-time Fourier transform (STFT) domain. We derive analytical relations between the number of crossband filters, which are useful for system identification in the STFT domain, and the power and length of the input signal. We show that increasing the number of crossband filters does not necessarily imply a lower steady-state mean-square error (mse) in subbands. The number of useful crossband filters depends on the power ratio between the input signal and the additive noise signal. Furthermore, it depends on the effective length of the input signal employed for system identification, which is restricted to enable tracking capability of the algorithm during time variations in the system. As the power of the input signal increases or as the time variations in the system become slower, a larger number of crossband filters may be utilized. The proposed subband approach is compared to the conventional fullband approach and to the commonly used subband approach that relies on the multiplicative transfer function (MTF) approximation. The comparison is carried out in terms of mse performance and computational complexity. Experimental results verify the theoretical derivations and demonstrate the relations between the number of useful crossband filters and the power and length of the input signal.
Chapter
A class of reverberant speech enhancement techniques involve processing of the linear prediction residual signal following Linear Predictive Coding (LPC). These approaches are based on the assumption that reverberation is mainly confined to the prediction residual and affects the LPC coefficients to a lesser extent. This chapter begins with a study on the effects of reverberation on the LPC parameters where mathematical tools from statistical room acoustics are used in the analysis. Consequently, a general framework for dereverberation using LPC is formulated and several existing methods utilizing this approach are reviewed. Finally, a specific method for processing a reverberant prediction residual is presented in detail. This method uses a combination of spatial averaging and larynx cycle-based temporal averaging. Experiments with a microphone array in a small office demonstrate the dereverberation and noise suppression of the spatiotemporal averaging method, showing up to a 5 dB improvement in segmental SRR and 0.33 in the normalized Bark spectral distortion score.
Article
This paper deals with the problem of blind separation of sources (BSS). In the literature, one can find many Independent Component Algorithms (ICA) to solve the BSS. To demonstrate the performances of their algorithms, researchers often use different methods or performance indexes depending on their source signals and their applications. Many methods and performance indexes cannot be used to compare two different algorithms applied to different signals. Most of the widely used performance indexes or methods are mentioned and discussed hereafter. We also give many examples to show limitations or drawbacks of some performance indexes or methods.
Article
Image methods are commonly used for the analysis of the acoustic properties of enclosures. In this paper we discuss the theoretical and practical use of image techniques for simulating, on a digital computer, the impulse response between two points in a small rectangular room. The resulting impulse response, when convolved with any desired input signal, such as speech, simulates room reverberation of the input signal. This technique is useful in signal processing or psychoacoustic studies. The entire process is carried out on a digital computer so that a wide range of room parameters can be studied with accurate control over the experimental conditions. A FORTRAN implementation of this model has been included.
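As an illustration of the image technique described above, the following is a minimal Python sketch rather than the FORTRAN implementation the abstract mentions. It assumes a single reflection coefficient `beta` shared by all six walls, nearest-sample rounding of arrival times, and a speed of sound of 343 m/s. The function name, parameters, and room values are hypothetical.

```python
import math
from itertools import product

def image_method_rir(room, src, mic, beta, order, fs, length, c=343.0):
    """Impulse response of a rectangular room built from image sources.

    room, src, mic: (x, y, z) in metres; beta: wall reflection coefficient
    (same for all six walls, an assumption); order: maximum image index per
    axis; fs: sample rate in Hz; length: number of output samples.
    """
    h = [0.0] * length
    for n in product(range(-order, order + 1), repeat=3):
        for p in product((0, 1), repeat=3):
            # Image position and number of wall reflections, summed over axes.
            img = [(1 - 2 * p[a]) * src[a] + 2 * n[a] * room[a] for a in range(3)]
            refl = sum(abs(n[a] - p[a]) + abs(n[a]) for a in range(3))
            dist = math.dist(img, mic)
            tap = int(round(dist * fs / c))  # nearest-sample arrival time
            if tap < length:
                h[tap] += beta ** refl / (4 * math.pi * dist)
    return h

# Hypothetical room, source, and microphone positions in metres.
room, src, mic = (5.0, 4.0, 3.0), (1.0, 1.0, 1.5), (3.0, 2.0, 1.5)
h = image_method_rir(room, src, mic, beta=0.8, order=2, fs=8000, length=2048)
direct_tap = next(i for i, v in enumerate(h) if v != 0.0)
print("direct path arrives at sample", direct_tap)
```

Convolving `h` with a dry speech signal would then simulate the reverberant signal at the microphone, as the abstract describes.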
Conference Paper
This paper describes a robust real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and nonstationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in real time.
Conference Paper
We examine the problem of blind audio source separation using independent component analysis (ICA). In order to separate audio sources recorded in a real recording environment, we need to model the mixing process as convolutional. Many methods have been introduced for separating convolved mixtures, the most successful of which require working in the frequency domain. This paper proposes a fixed-point algorithm for performing fast frequency domain ICA, as well as a method to increase the stability and enhance the performance of previous frequency domain ICA algorithms.
Conference Paper
In this paper we present a new on-line Blind Signal Separation method capable of separating convolutive speech signals of moving speakers in highly reverberant rooms. The separation network used is a recurrent network which performs separation of convolutive speech mixtures in the time domain, without any prior knowledge of the propagation media, based on the Maximum Likelihood Estimation (MLE) principle. The proposed method proved able to improve significantly (by more than 10% in all adverse mixing situations) the performance of a continuous phoneme-based speech recognition system and can therefore be used as a front-end to separate simultaneous speech of speakers who are moving in arbitrary directions in reverberant rooms. 1. INTRODUCTION Humans have the ability to focus their listening attention on a single talker among a din of conversations and background noise, and recognize a specific voice, known as the "cocktail party effect". The problem of Blind source separation (BSS) c...
Article
Blind separation of independent sources from their convolutive mixtures is a problem in many real world multi-sensor applications. In this paper we present a solution to this problem based on the information maximization principle, which was recently proposed by Bell and Sejnowski for the case of blind separation of instantaneous mixtures. We present a feedback network architecture capable of coping with convolutive mixtures, and we derive the adaptation equations for the adaptive filters in the network by maximizing the information transferred through the network. Examples using speech signals are presented to illustrate the algorithm. 1 INTRODUCTION Blind source separation (BSS) denotes observing mixtures of independent sources, and by making use of these mixture signals only and nothing else, recovering the original signals. Due to a number of interesting applications in communications and speech and medical signal processing BSS has received recently lots of attention. Most of th...
Article
Blind source separation (BSS) of audio signals in echoic environments such as an office room is still a very challenging problem. Here we approach the problem from a practical perspective and shed light on how robust a two channel echoic parametric demixing can get. We assume that an oracle (i.e. a perfect estimator) provides a truncated estimate of the mixing FIR filters for a given source configuration. This way we can study the properties of a parametric demixer using the adjoint of the truncated mixing matrix. For several degrees of truncation, we compute how the separation SNR varies as a function of the uncertainty of the true source position. The true source position is uniformly distributed within a sphere of radius R around an assumed position, to reflect the fact that parameters of interest are imprecisely estimated. Simulations of artificial echoic mixings show that the higher order demixing filters have little robustness to position uncertainties (and therefore to errors of estimation) while the overall performance remains almost constant beyond the second order approximation. This should represent a guideline for what is practically achievable with a class of BSS techniques in echoic environments.
Article
N wideband sources recorded using N closely spaced receivers can feasibly be separated based only on second order statistics when using a physical model of the mixing process. In this case we show that the parameter estimation problem can be essentially reduced to considering directions of arrival and attenuations of each signal. The paper presents two demixing methods operating in the time and frequency domain and experimentally shows that it is always possible to demix signals arriving at different angles. Moreover, one can use spatial cues to solve the channel selection problem and a post-processing Wiener filter to ameliorate the artifacts caused by demixing. 1 Introduction Blind source separation (BSS) is capable of dramatic results when used to separate mixtures of independent signals. The method relies on simultaneous recordings of signals from two or more input sensors and separates the original sources purely on the basis of statistical independence between them. Unfortunatel...
Article
Source separation arises in a surprising number of signal processing applications, from speech recognition to EEG analysis. In the square linear blind source separation problem without time delays, one must find an unmixing matrix which can detangle the result of mixing n unknown independent sources through an unknown n × n mixing matrix. The recently introduced ICA blind source separation algorithm (Baram and Roth 1994; Bell and Sejnowski 1995) is a powerful and surprisingly simple technique for solving this problem. ICA is all the more remarkable for performing so well despite making absolutely no use of the temporal structure of its input! This paper presents a new algorithm, contextual ICA, which derives from a maximum likelihood density estimation formulation of the problem. cICA can incorporate arbitrarily complex adaptive history-sensitive source models, and thereby make use of the temporal structure of its input. This allows it to separate in a number of situations where s...
Article
Acoustic crosstalk cancellation systems create a virtual audio environment by using loudspeakers to deliver appropriate binaural signals to the listener. Typically, the system is designed to equalize the direct-path transfer functions between the loudspeakers and the ears. In this paper statistical room acoustics is used to derive a closed-form expression that predicts the performance of such a system when used in a reverberant environment, and the expression is verified through simulations. The results of this paper enable designers to undertake a preliminary analysis of how well a given crosstalk cancellation system will perform in a reverberant environment, without resorting to time-consuming measurements or image-model simulations.
Article
We study and explore the limitations of methods for blind separation of a mixture of multiple speakers in a real reverberant environment. To support our results, we analyze a frequency-domain method, which achieves blind source separation (BSS) by transforming the time-domain convolutive problem to multiple short-term problems in the frequency domain. We show that treating the problem independently at different frequency bins introduces a "permutation inconsistency" problem, which becomes worse as the length of room impulse response increases. Our studies prove that the ideas proposed in the existing literature are not capable of effectively handling this problem and a need exists for its satisfactory solution. We speculate that time-domain BSS techniques may also suffer from an equivalent permutation inconsistency problem when long un-mixing filters are used.
Conference Paper
In this paper, we investigate the performance of an unmixing system obtained by frequency domain Blind Source Separation (BSS) based on Independent Component Analysis (ICA). Since ICA is based on statistics, i.e., it only attempts to make outputs independent, it is not easy to predict what is going on in a BSS system. We therefore investigate the detailed components in the processed signals of a whole BSS system by measuring four impulse responses of the system. In particular, we focus on the direct sound and reverberation in the target and jammer signals. As a result, we reveal that the direct sound and reverberation of the jammer can be reduced compared to a null beamformer (NBF), while the reverberation of the target cannot be reduced.
Conference Paper
We consider the revision of a previously derived theoretical framework for the robustness of multi-channel sound equalization in reverberant environments when non-exact equalizers are used. Using results from image model simulations, we demonstrate the degradation in performance of an equalization system as the sound source moves from its nominal position inside the enclosure. We show that performance can be controlled to vary between a lower bound, derived when direct-path equalization is used, and a higher one produced when exact equalization is used.
Article
A method is presented for simulating the impulse response between an acoustic source and multiple microphones in a reverberant room. The method is similar to the image method described by Allen and Berkley [J. Acoust. Soc. Am. 65, 943-950 (1979)] but includes modifications to simulate received echo arrival time accurately. The essential modification is to represent each received echo as a low-pass-filtered impulse at the correct arrival time. Using this "low-pass impulse" method, reverberant rooms can be simulated with sufficient accuracy to investigate multiple-microphone systems that are sensitive to interchannel phase.
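The "low-pass impulse" idea can be sketched as follows: instead of rounding each echo to the nearest sample, the echo is written into the impulse response as a short band-limited pulse centred on its exact, possibly fractional, arrival time. The abstract does not specify the filter, so the Hann-windowed sinc, the cutoff at the Nyquist frequency, and the `half_width` of 8 samples used here are assumptions. The function name is hypothetical.

```python
import math

def add_lowpass_impulse(h, delay, amp, half_width=8):
    """Add an echo at a fractional sample delay as a Hann-windowed sinc."""
    centre = int(round(delay))
    start = max(0, centre - half_width)
    stop = min(len(h), centre + half_width + 1)
    for n in range(start, stop):
        t = n - delay  # offset from the true arrival time, in samples
        if abs(t) >= half_width:
            continue  # outside the window support
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
        h[n] += amp * window * sinc
    return h

# An integer delay collapses to an ordinary single-sample impulse.
h_int = add_lowpass_impulse([0.0] * 32, delay=10, amp=0.5)
# A fractional delay spreads the echo over neighbouring samples.
h_frac = add_lowpass_impulse([0.0] * 32, delay=10.5, amp=1.0)
print([round(v, 3) for v in h_frac[8:14]])
```

Because the pulse is centred on the true arrival time, small differences in path length between microphones are preserved as phase differences rather than being lost to rounding.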
Article
A theoretical framework is established, for the robustness of multichannel sound equalization in reverberant environments. Using results from statistical room acoustics, a closed-form expression is derived that predicts the degradation in performance of an equalization system as the sound source moves from its nominal position inside the enclosure. The presented analysis also provides means of identifying the performance bounds that can be expected when using such a system in an actual room. Using extensive computer simulations, the effect of physical parameters such as the relative positions of the source and the receivers, as well as effects of different design parameters are investigated. Based on the conditions imposed by these parameters, it is shown that, depending on the array geometry and the exact form of the equalizers, slight performance gains can be expected as the number of receivers is increased.
Conference Paper
Using statistical room acoustics, we investigate the performance of blind source separation and deconvolution (BSSD) algorithms when used in a reverberant room. We focus on the case where one of the sources moves, and examine the relative impact of source movement and room reverberation on the expected performance. We derive theoretical expressions, and verify these through image model simulations.
Conference Paper
We utilise an information theoretic criterion for exploratory projection pursuit (EPP) and have shown that maximisation by natural gradient ascent of the divergence of a multivariate distribution from normality, using the negentropy as a distance measure, yields a generalised independent component analysis (ICA). By considering a Gram-Charlier approximation of the latent probability density functions (PDF) we develop a generalised neuron nonlinearity which can be considered as a conditional mean estimator of the underlying independent components. The unsupervised learning rule developed is shown to asymptotically exhibit the Bussgang property and as such produces output data with independent components, irrespective of whether the independent latent variables are sub-gaussian or super-gaussian. Improved convergence speeds are reported when momentum terms are introduced into the learning
Article
Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, the separation performance is still not good enough. In particular, when the impulse responses are long, performance is highly limited. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it is not good to be constrained by the condition T>P, where T is the frame length of the DFT and P is the length of the room impulse responses. We show that there is an optimum frame size that is determined by the trade-off between maintaining the number of samples in each frequency bin to estimate statistics and covering the whole reverberation. We also clarify the reason for the poor performance of BSS in long reverberant environments, highlighting that the framework of BSS works as two sets of frequency-domain adaptive beamformers. Although BSS can reduce reverberant sounds to some extent like adaptive beamformers, they mainly remove the sounds from the jammer direction. This is the reason for the difficulty of BSS in reverberant environments.
Article
This paper investigates the robustness of sound equalization using a room response inverse filter with respect to changing or uncertain source or microphone positions. It is shown that due to the variations of the transfer function from point to point in a room, even small changes in the source or microphone position of just a few tenths of the acoustic wavelength can cause large degradations in the equalized room response. The robustness problem is especially acute at high frequencies, which are known to carry some important attributes of the speech signal. The spatial extent of equalization, derived from the statistical-average properties of sound transmission in rooms, is illustrated by computer simulations which corroborate the theoretical results presented.
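To see why the problem is most acute at high frequencies, it helps to convert a fraction of a wavelength into a physical distance. The short calculation below assumes a speed of sound of 343 m/s and uses one tenth of a wavelength as an illustrative fraction. The function name and the chosen frequencies are hypothetical.

```python
C = 343.0  # speed of sound in air in m/s (assumed, roughly room temperature)

def displacement_for_fraction(freq_hz, fraction=0.1):
    """Displacement in metres equal to `fraction` of the wavelength at freq_hz."""
    return fraction * C / freq_hz

for f in (250, 1000, 4000):
    cm = 100 * displacement_for_fraction(f)
    print(f"{f} Hz: one tenth of a wavelength = {cm:.2f} cm")
```

The displacement shrinks in inverse proportion to frequency, so a movement that is negligible at low frequencies can be a sizeable fraction of a wavelength in the upper speech band.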
Article
Identification of an unknown system and recovery of the input signals from observations of the outputs of an unknown multiple-input, multiple-output linear system are considered. Attention is focused on the two-channel case, in which the outputs of a 2×2 linear time invariant system are observed. The approach consists of reconstructing the input signals by assuming that they are statistically uncorrelated and imposing this constraint on the signal estimates. In order to restrict the set of solutions, additional information on the true signal generation and/or on the form of the coupling systems is incorporated. Specific algorithms are developed and tested. As a special case, these algorithms suggest a potentially interesting modification of Widrow's (1975) least-squares method for noise cancellation, where the reference signal contains a component of the desired signal.
Article
The strengths and limitations of correlation-based signal processing methods are discussed. The definitions, properties, and computation of higher-order statistics and spectra, with emphasis on the bispectrum and trispectrum are presented. Parametric and nonparametric expressions for polyspectra of linear and nonlinear processes are described. The applications of higher-order spectra in signal processing are discussed.
Article
This paper addresses the issue of separating multiple speakers from mixtures of these that are obtained using multiple microphones in a room. An adaptive blind signal separation algorithm, which is entirely based on second-order statistics, is derived. One of the advantages of this algorithm is that no parameters need to be tuned. Moreover, an extension of the algorithm that can simultaneously deal with blind signal separation and echo cancellation is derived. Experiments with real recordings have been carried out, showing the effectiveness of the algorithm for real-world signals.
Article
In this paper, we investigate the separation and dereverberation performance of frequency domain Blind Source Separation (BSS) based on Independent Component Analysis (ICA) by measuring impulse responses of a system. Since ICA is a statistical method, i.e., it only attempts to make outputs independent, it is not easy to predict what is going on in a BSS system physically. We therefore investigate the detailed components in the processed signals of a whole BSS system from a physical and acoustical viewpoint. In particular, we focus on the direct sound and reverberation in the target and jammer signals. As a result, we reveal that the direct sound of a jammer can be removed and the reverberation of the jammer can be reduced to some degree by BSS, while the reverberation of the target cannot be reduced. Moreover, we show that a long frame length causes pre-echo noise, and this damages the quality of the separated signal.
P. M. Petersen, "Simulating the response of multiple microphones to a single acoustic source in a reverberant room," J. Acoust. Soc. Amer., vol. 80, no. 5, pp. 1527-1529, 1986.