ABSTRACT: A spatial filter with L linear constraints, based on instantaneous narrowband direction-of-arrival (DOA) estimates, was recently proposed to obtain a desired spatial response for at most L sound sources. In noisy and reverberant environments, it becomes difficult to obtain reliable instantaneous DOA estimates and hence the desired spatial response. In this work, we develop a Bayesian approach to spatial filtering that is more robust to DOA estimation errors. The resulting filter is a weighted sum of spatial filters pointed at a discrete set of DOAs, with the relative contribution of each filter determined by the posterior distribution of the discrete DOAs given the microphone signals. In addition, the proposed spatial filter is able to reduce both reverberation and noise. The required diffuse sound power is estimated using the posterior distribution of the discrete set of DOAs. Simulation results demonstrate the ability of the proposed filter to achieve strong suppression of the undesired signal components with a small amount of signal distortion in noisy and reverberant conditions.
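The posterior-weighted combination described above can be sketched as follows; the per-DOA filters, likelihood values, and grid size are illustrative placeholders, not the paper's actual quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics, n_doas = 4, 8

# One pre-computed spatial filter per candidate DOA on the discrete grid
# (random placeholders standing in for, e.g., filters pointed at each DOA).
W = rng.standard_normal((n_doas, n_mics)) + 1j * rng.standard_normal((n_doas, n_mics))

# Unnormalised log-likelihoods of each discrete DOA given the microphone
# signals (placeholder values standing in for the model-based likelihood).
log_lik = rng.standard_normal(n_doas)
prior = np.full(n_doas, 1.0 / n_doas)

# Posterior over the DOA grid, normalised stably by subtracting the max log-likelihood.
post = prior * np.exp(log_lik - log_lik.max())
post /= post.sum()

# Resulting filter: posterior-weighted sum of the per-DOA filters.
w_bayes = post @ W  # shape: (n_mics,)
```

When the posterior concentrates on a single DOA, the combined filter reduces to the filter pointed at that DOA; under DOA uncertainty, the weighting spreads the response across nearby candidates.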
ABSTRACT: Flexible and efficient spatial sound acquisition and subsequent processing are of paramount importance in communication and assisted listening devices such as mobile phones, hearing aids, smart TVs, and emerging wearable devices (e.g., smart watches and glasses). In application scenarios where the number of sound sources quickly varies, sources move, and nonstationary noise and reverberation are commonly encountered, it remains a challenge to capture sounds in such a way that they can be reproduced with a high and invariable sound quality. In addition, the objective in terms of what needs to be captured, and how it should be reproduced, depends on the application and on the user's preferences. Parametric spatial sound processing has been around for two decades and provides a flexible and efficient solution to capture, code, and transmit spatial sounds, as well as to manipulate and reproduce them.
Full-text · Article · Mar 2015 · IEEE Signal Processing Magazine
ABSTRACT: Extracting desired source signals in noisy and reverberant environments is required in many hands-free communication systems. In practical situations, where the position and number of active sources may be unknown and time-varying, conventional implementations of spatial filters do not provide sufficiently good performance. Recently, informed spatial filters have been introduced that incorporate almost instantaneous parametric information on the sound field, thereby enabling adaptation to new acoustic conditions and moving sources. In this contribution, we propose a spatial filter which generalizes the recently proposed informed linearly constrained minimum variance filter and informed minimum mean square error filter. The proposed filter uses multiple direction-of-arrival estimates and second-order statistics of the noise and diffuse sound. To determine those statistics, an optimal diffuse power estimator is proposed that outperforms state-of-the-art estimators. Extensive performance evaluation demonstrates the effectiveness of the proposed filter in dynamic acoustic conditions. For this purpose, we have considered a challenging scenario which consists of quickly moving sound sources during double-talk. The performance of the proposed spatial filter was evaluated in terms of objective measures including segmental signal-to-reverberation ratio and log spectral distance, and by means of a listening test confirming the objective results.
Full-text · Article · Dec 2014 · IEEE/ACM Transactions on Audio, Speech, and Language Processing
ABSTRACT: The signal-to-diffuse ratio (SDR), which describes the power ratio between the direct and diffuse components of a sound field, is an important parameter in many applications. This paper proposes a power-based SDR estimator that considers the auto power spectral densities obtained from noisy directional microphones. Compared to recently proposed estimators that exploit the spatial coherence between two microphones, the power-based estimator is more robust at lower frequencies, provided that the microphone directivities are known with sufficiently high accuracy. The proposed estimator can incorporate more than two microphones and can therefore provide accurate SDR estimates independently of the direction-of-arrival of the direct sound. We further propose a method to determine the optimal microphone orientations for a given set of directional microphones. Simulations show the practical applicability.
ABSTRACT: When capturing speech in a multi-talker telecommunication scenario, it is desirable to keep the enhanced signal at an equal loudness level for each speaker. Single-channel automatic gain control systems are not able to adjust the levels of different talkers when they are simultaneously active. In this work, an automatic spatial gain control (ASGC) algorithm is proposed that adjusts the directional response of an existing informed spatial filter such that the direct sound of multiple sources can be kept at a constant desired loudness level at the output. The spatial filter additionally reduces diffuse sound and ambient noise. It is shown that the proposed ASGC works well within the tested scenario and is able to adjust the levels of different speakers even during double-talk.
ABSTRACT: Microphone arrays are typically used to extract the direct sound of sound sources while suppressing noise and reverberation. Applications such as immersive spatial sound reproduction commonly also require an estimate of the reverberant sound. A linearly constrained minimum variance filter, of which one of the constraints is related to the spatial coherence of the assumed reverberant sound field, is proposed to obtain an estimate of the sound pressure of the reverberant field. The proposed spatial filter provides an almost omnidirectional directivity pattern with spatial nulls for the directions-of-arrival of the direct sound. The filter is computationally efficient and outperforms existing methods.
ABSTRACT: Traditional spatial sound acquisition aims at capturing a sound field with multiple microphones such that at the reproduction side a listener can perceive the sound image as it was at the recording location. Standard techniques for spatial sound acquisition usually use spaced omnidirectional microphones or coincident directional microphones. Alternatively, microphone arrays and spatial filters can be used to capture the sound field. From a geometric point of view, the perspective of the sound field is fixed when using such techniques. In this paper, a geometry-based spatial sound acquisition technique is proposed to compute virtual microphone signals that manifest a different perspective of the sound field. The proposed technique uses a parametric sound field model that is formulated in the time-frequency domain. It is assumed that each time-frequency instant of a microphone signal can be decomposed into one direct and one diffuse sound component. It is further assumed that the direct component is the response of a single isotropic point-like source (IPLS) of which the position is estimated for each time-frequency instant using distributed microphone arrays. Given the sound components and the position of the IPLS, it is possible to synthesize a signal that corresponds to a virtual microphone at an arbitrary position and with an arbitrary pick-up pattern.
Full-text · Article · Dec 2013 · IEEE Transactions on Audio Speech and Language Processing
ABSTRACT: In hands-free communication applications, the main goal is to capture desired sounds, while reducing noise and interfering sounds. However, for natural-sounding telepresence systems, the spatial sound image should also be preserved. Using a recently proposed method for generating the signal of a virtual microphone (VM), one can recreate the sound image from an arbitrary point of view in the sound scene (e.g., close to a desired speaker), while being able to place the physical microphones outside the sound scene. In this paper, we present a method for synthesizing a VM signal in noisy and reverberant environments, where the estimation of the required direct and diffuse sound components is performed using two multichannel linear filters. The direct sound component is estimated using a multichannel Wiener filter, while the diffuse sound component is estimated using a linearly constrained minimum variance filter followed by a single-channel Wiener filter. Simulations in a noisy and reverberant environment show the applicability of the proposed method for sound acquisition in a scenario in which two microphone arrays are installed in a large TV.
ABSTRACT: Extracting sound sources in noisy and reverberant conditions remains a challenging task that is commonly found in modern communication systems. In this work, we consider the problem of obtaining a desired spatial response for at most L simultaneously active sound sources. The proposed spatial filter is obtained by minimizing the diffuse plus self-noise power at the output of the filter subject to L linear constraints. In contrast to earlier works, the L constraints are based on instantaneous narrowband direction-of-arrival estimates. In addition, a novel estimator for the diffuse-to-noise ratio is developed that exhibits a sufficiently high temporal and spectral resolution to achieve both dereverberation and noise reduction. The presented results demonstrate that an optimal tradeoff between maximum white noise gain and maximum directivity is achieved.
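Minimizing the output power subject to L linear constraints admits the standard LCMV closed form w = Φ⁻¹A(AᴴΦ⁻¹A)⁻¹g. A minimal numerical sketch, where the diffuse-plus-noise PSD matrix and the steering vectors are random placeholders rather than quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
M, L = 6, 2  # microphones, linear constraints

# Diffuse-plus-self-noise PSD matrix (Hermitian positive-definite placeholder).
X = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi = X @ X.conj().T + np.eye(M)

# Constraint matrix: one steering vector per instantaneous DOA estimate,
# with desired responses g (here: pass source 1, null source 2).
A = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
g = np.array([1.0, 0.0])

# Closed-form LCMV solution: w = Phi^-1 A (A^H Phi^-1 A)^-1 g.
Pi = np.linalg.solve(Phi, A)               # Phi^-1 A, without forming the inverse
w = Pi @ np.linalg.solve(A.conj().T @ Pi, g)

# By construction, the linear constraints hold exactly: A^H w = g.
```

Solving linear systems instead of explicitly inverting Φ is the usual numerically preferable formulation.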
ABSTRACT: We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application to spatial audio object coding; detection and tracking of faces; estimation of head poses and visual focus of attention; detection and localisation of verbal and paralinguistic events; and the association and fusion of these different events. Combined, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (such as a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.
Preview · Article · Aug 2013 · Advances in Multimedia
ABSTRACT: The instantaneous direction-of-arrival (DOA) of sound is a crucial parameter in the analysis of acoustic scenes. More complex acoustic scenes are often modelled as a sum of two plane waves, which requires a DOA estimator that can provide two DOAs per time and frequency. This paper proposes an approach which computes two DOAs per time and frequency from a B-format microphone signal. The proposed estimator outperforms the state-of-the-art approach by also considering diffuse sound and microphone noise in the signal model. The estimator includes an averaging process prior to computing the DOA estimates, as well as a fallback solution in case no valid results can be obtained. Simulation results demonstrate the practical applicability of the presented algorithm.
ABSTRACT: The diffuseness of sound fields has previously been estimated in the spatial domain using the spatial coherence between a pair of microphones (omnidirectional or first-order). In this paper, we propose a diffuseness estimator for spherical microphone arrays based on the coherence between eigenbeams, which result from a spherical harmonic decomposition of the sound field. The weighted averaging of the diffuseness estimates over all eigenbeam pairs is shown to significantly reduce the variance of the estimates, particularly in fields with low diffuseness.
ABSTRACT: Many applications in spatial sound recording and processing model the sound scene as a sum of directional and diffuse sound components. The power ratio between both components, i.e., the signal-to-diffuse ratio (SDR), represents an important measure for algorithms which aim at performing robustly in reverberant environments. This contribution discusses SDR estimation from the spatial coherence between two arbitrary first-order directional microphones. First, the spatial coherence is expressed as a function of the SDR. For most microphone setups, the spatial coherence is a complex function whose absolute value and phase both contain relevant information on the SDR. Second, the SDR estimator is derived from the spatial coherence function. The estimator is discussed for different practical microphone setups including coincident setups of arbitrary first-order directional microphones and spaced setups of identical first-order directional microphones. An unbiased SDR estimation requires noiseless coherence estimates as well as information on the direction-of-arrival of the directional sound, which usually has to be estimated. Nevertheless, measurement results verify that the proposed estimator is applicable in practice and provides accurate results.
Full-text · Article · Oct 2012 · The Journal of the Acoustical Society of America
ABSTRACT: Describing a sound field with a single audio signal and parametric side information offers an efficient way of storing, transmitting, or reproducing the recorded spatial sound. In many approaches, the parametric representation is based on a simple sound field model assuming for each time-frequency instance a single plane wave and diffuse sound. When multiple sources are active, we thus have to assume that the source signals do not overlap much in the time-frequency domain so that the single-wave model is still satisfied. In this paper, we show that a single-wave model is easily violated even for sparse signals such as speech. When the parametrized spatial sound is reproduced, these violations degrade the spatial quality. We further study whether the well-known measure of approximate W-disjoint orthogonality can be applied to detect the relevant model violations. Index Terms: spatial audio coding, parametric spatial audio processing, W-disjoint orthogonality
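The idea of approximate W-disjoint orthogonality can be sketched as follows: for two source spectrograms, measure the fraction of time-frequency bins in which one source carries nearly all of the energy. The dominance threshold and the measure itself are illustrative choices, not the paper's exact definition:

```python
import numpy as np

def wdo_fraction(S1, S2, dominance_db=20.0):
    """S1, S2: magnitude spectrograms of the two sources (same shape).
    Returns the fraction of bins where one source is >= dominance_db above the other."""
    eps = 1e-12  # avoid log of zero in silent bins
    ratio_db = 20.0 * np.log10((np.abs(S1) + eps) / (np.abs(S2) + eps))
    dominated = np.abs(ratio_db) >= dominance_db
    return dominated.mean()

# Perfectly disjoint toy example: the sources never overlap in any bin.
S1 = np.array([[1.0, 0.0], [0.0, 1.0]])
S2 = np.array([[0.0, 1.0], [1.0, 0.0]])
print(wdo_fraction(S1, S2))  # 1.0 -- fully W-disjoint
```

For real speech mixtures the fraction is below one, and bins where neither source dominates are exactly the single-wave model violations the abstract discusses.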
ABSTRACT: The signal-to-reverberant ratio (SRR) is an important parameter in several applications such as speech enhancement, dereverberation, and parametric spatial audio coding. In this contribution, an SRR estimator is derived from the direction-of-arrival-dependent complex spatial coherence function computed from two omnidirectional microphones. It is shown that, by employing a computationally inexpensive DOA estimator, the proposed SRR estimator outperforms existing approaches.
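A hedged sketch of this style of coherence-based estimation, assuming the common mixed-field model in which the observed coherence is an SRR-weighted average of the plane-wave coherence e^{jkd·cosθ} and the diffuse-field coherence sinc(kd); the model and parameter values are illustrative, not the paper's exact derivation:

```python
import numpy as np

def srr_from_coherence(gamma, k, d, theta):
    """Estimate the SRR from the complex spatial coherence gamma between two
    omnidirectional microphones (spacing d [m], wavenumber k [rad/m]), given a
    DOA estimate theta [rad] of the direct sound.
    Assumed model: gamma = (SRR * e^{jkd cos(theta)} + sinc(kd)) / (SRR + 1)."""
    gamma_dir = np.exp(1j * k * d * np.cos(theta))  # plane-wave coherence
    gamma_diff = np.sinc(k * d / np.pi)             # diffuse field: sin(kd)/(kd)
    return np.real((gamma_diff - gamma) / (gamma - gamma_dir))

# Self-consistency check: synthesise a coherence from a known SRR and recover it.
k = 2 * np.pi * 1000 / 343          # wavenumber at 1 kHz
d, theta, srr_true = 0.04, np.deg2rad(30), 3.0
gamma = (srr_true * np.exp(1j * k * d * np.cos(theta)) + np.sinc(k * d / np.pi)) / (srr_true + 1)
print(round(srr_from_coherence(gamma, k, d, theta), 3))  # 3.0
```

In practice, gamma would be a short-time coherence estimate, and noise in that estimate (plus DOA errors) biases the result, which is why the abstract stresses the need for a reliable DOA estimator.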
ABSTRACT: Measuring the degree of diffuseness of a sound field is crucial in many modern parametric spatial audio techniques. In these applications, intensity-based diffuseness estimators are particularly convenient, as the sound intensity can also be used to obtain, e.g., the direction of arrival of the sound. This contribution reviews different diffuseness estimators, comparing them under the conditions found in practice, i.e., with arrays of noisy microphones and with the expectation operators substituted by finite temporal averages. The estimators show similar performance, each with specific advantages and disadvantages depending on the scenario. Furthermore, the paper derives an estimator and highlights the possibility of using spatial averaging to improve the temporal resolution of the estimates.
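A common intensity-based diffuseness estimator of the kind reviewed above is psi = 1 - ||<I>|| / (c <E>), with I the active intensity, E the energy density, and <.> a finite temporal average. The sketch below illustrates this formula on a synthetic plane wave; the signal and averaging length are illustrative:

```python
import numpy as np

rho0, c = 1.2, 343.0  # air density [kg/m^3], speed of sound [m/s]

def diffuseness(p, u):
    """p: (T,) complex pressure spectra; u: (T, 3) complex particle-velocity spectra.
    Returns psi in [0, 1]: 0 for a single plane wave, approaching 1 in a diffuse field."""
    I = 0.5 * np.real(np.conj(p)[:, None] * u)                 # active intensity per frame
    E = 0.25 * rho0 * np.sum(np.abs(u) ** 2, axis=1) \
        + 0.25 * np.abs(p) ** 2 / (rho0 * c ** 2)              # energy density per frame
    return 1.0 - np.linalg.norm(I.mean(axis=0)) / (c * E.mean())

# Single plane wave: particle velocity aligned with propagation, |u| = |p| / (rho0 * c).
T = 64
p = np.exp(1j * 2 * np.pi * np.arange(T) / T)
u = (p / (rho0 * c))[:, None] * np.array([1.0, 0.0, 0.0])
print(round(diffuseness(p, u), 6))  # 0.0 for a plane wave
```

With finite averages over noisy microphones, the estimate fluctuates around the true value, which is exactly the variance issue that spatial averaging is meant to reduce.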
No preview · Article · Mar 2012 · The Journal of the Acoustical Society of America
ABSTRACT: The direction-dependent analysis of impulse response measurements using spherical microphone arrays can provide a universal basis for binaural auralization. A new method using dual-radius open-sphere arrays is proposed to overcome limitations in practical realizations of such arrays. Different methods to combine the two radii have been analyzed and are presented. A plane-wave decomposition in conjunction with a high-resolution HRTF database is used to generate a binaural auralization, in which the different designs are simulated under ideal and real conditions. The results have been evaluated in a quality-grading experiment. It is shown that the dual-radius cardioid design is an effective method to enhance the perceived quality in comparison to conventional spherical array designs.