O Thiergart

Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Bavaria, Germany

Publications (23) · 20.53 Total impact

  • ABSTRACT: The signal-to-diffuse ratio (SDR), which describes the power ratio between the direct and diffuse components of a sound field, is an important parameter in many applications. This paper proposes a power-based SDR estimator which considers the auto power spectral densities obtained with noisy directional microphones. Compared to recently proposed estimators that exploit the spatial coherence between two microphones, the power-based estimator is more robust at lower frequencies, provided that the microphone directivities are known with sufficiently high accuracy. The proposed estimator can incorporate more than two microphones and can therefore provide accurate SDR estimates independently of the direction-of-arrival of the direct sound. We further propose a method to determine the optimal microphone orientations for a given set of directional microphones. Simulations demonstrate the practical applicability.
    ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 05/2014
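    A hedged sketch of how such a power-based estimator can be set up (the least-squares formulation and all names are illustrative assumptions, not the paper's exact derivation): the auto PSD of a first-order microphone with pattern D(theta) = a + (1 - a)cos(theta) is a linear mix of the direct and diffuse powers, so two or more microphones with known patterns yield a small linear problem per time-frequency bin.
    ```python
    import numpy as np

    def power_based_sdr(auto_psd, noise_psd, a, doa):
        """auto_psd, noise_psd: (M,) measured and (known) noise PSDs at one bin;
        a: (M,) pattern parameters; doa: (M,) direct-sound angle relative to
        each microphone's look direction [rad]."""
        d_gain = (a + (1.0 - a) * np.cos(doa)) ** 2        # direct-sound power gain
        q_gain = a ** 2 + (1.0 - a) ** 2 / 3.0             # diffuse-field power gain
        A = np.column_stack((d_gain, q_gain))
        # auto_psd - noise_psd ~= d_gain * S_dir + q_gain * S_diff
        s_dir, s_diff = np.linalg.lstsq(A, auto_psd - noise_psd, rcond=None)[0]
        return max(s_dir, 0.0) / max(s_diff, 1e-12)        # linear-scale SDR

    # Example: three cardioids (a = 0.5) oriented 120 degrees apart
    print(power_based_sdr(np.array([1.2, 0.6, 0.4]), np.full(3, 0.01),
                          np.full(3, 0.5), np.deg2rad([0.0, 120.0, 240.0])))
    ```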
  • Oliver Thiergart, Emanuel A. P. Habets
    ABSTRACT: Microphone arrays are typically used to extract the direct sound of sound sources while suppressing noise and reverberation. Applications such as immersive spatial sound reproduction commonly also require an estimate of the reverberant sound. A linearly constrained minimum variance (LCMV) filter, one of whose constraints is related to the spatial coherence of the assumed reverberant sound field, is proposed to obtain an estimate of the sound pressure of the reverberant field. The proposed spatial filter provides an almost omnidirectional directivity pattern with spatial nulls in the directions-of-arrival of the direct sound. The filter is computationally efficient and outperforms existing methods.
    IEEE Signal Processing Letters 04/2014; 21(5). · 1.67 Impact Factor
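    A minimal sketch of an LCMV filter in the spirit described above, assuming a spherically isotropic diffuse field: the direct-sound DOAs receive spatial nulls while the diffuse coherence vector (taken relative to microphone 1) is passed with unit gain. The example array and all names are illustrative.
    ```python
    import numpy as np

    def reverberant_lcmv(phi, steering_vecs, gamma_diff):
        """phi: (M, M) PSD matrix to minimize against; steering_vecs: (M, L)
        direct-sound steering vectors; gamma_diff: (M,) diffuse coherence."""
        C = np.column_stack((gamma_diff, steering_vecs))   # constraint matrix
        g = np.zeros(C.shape[1], dtype=complex)
        g[0] = 1.0                                         # unit gain on diffuse sound
        phi_inv_C = np.linalg.solve(phi, C)
        return phi_inv_C @ np.linalg.solve(C.conj().T @ phi_inv_C, g)

    # Example: uniform linear array, one direct plane wave to null
    M, d, f, c = 4, 0.03, 2000.0, 343.0
    k = 2 * np.pi * f / c
    pos = np.arange(M) * d
    a = np.exp(-1j * k * pos * np.cos(np.deg2rad(60)))[:, None]  # steering vector
    gamma = np.sinc(k * pos / np.pi)                             # sin(kd)/(kd) coherence
    w = reverberant_lcmv(np.eye(M) + 0j, a, gamma)
    print(np.abs(w.conj() @ a), np.abs(w.conj() @ gamma))        # ~0 and ~1
    ```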
  • ABSTRACT: Traditional spatial sound acquisition aims at capturing a sound field with multiple microphones such that, at the reproduction side, a listener can perceive the sound image as it was at the recording location. Standard techniques for spatial sound acquisition usually use spaced omnidirectional microphones or coincident directional microphones. Alternatively, microphone arrays and spatial filters can be used to capture the sound field. From a geometric point of view, the perspective of the sound field is fixed when using such techniques. In this paper, a geometry-based spatial sound acquisition technique is proposed to compute virtual microphone signals that manifest a different perspective of the sound field. The proposed technique uses a parametric sound field model formulated in the time-frequency domain. It is assumed that each time-frequency instant of a microphone signal can be decomposed into one direct and one diffuse sound component. It is further assumed that the direct component is the response of a single isotropic point-like source (IPLS) whose position is estimated for each time-frequency instant using distributed microphone arrays. Given the sound components and the position of the IPLS, it is possible to synthesize a signal that corresponds to a virtual microphone at an arbitrary position and with an arbitrary pick-up pattern.
    IEEE Transactions on Audio, Speech, and Language Processing 12/2013; 21(12):2583-2594. · 1.68 Impact Factor
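    The synthesis step can be illustrated as follows; this is a simplified sketch assuming point-source 1/r decay and a first-order pick-up pattern, with names that are not the paper's notation: given per-bin direct and diffuse components at a reference position plus the estimated IPLS position, the direct sound is propagated to the virtual microphone (VM) position and weighted by the pattern.
    ```python
    import numpy as np

    def vm_signal(s_dir, s_diff, p_ipls, p_ref, p_vm, look_dir, a, k):
        """s_dir, s_diff: complex direct/diffuse components at the reference
        position; p_*: 3-D positions; look_dir: VM look direction (unit vector);
        a: first-order pattern parameter; k: wavenumber [rad/m]."""
        r_ref = np.linalg.norm(p_ipls - p_ref)
        r_vm = np.linalg.norm(p_ipls - p_vm)
        # propagate the direct sound from the reference point to the VM position
        s_dir_vm = s_dir * (r_ref / r_vm) * np.exp(-1j * k * (r_vm - r_ref))
        # apply the first-order pattern D = a + (1 - a) cos(angle to the IPLS)
        cos_angle = (p_ipls - p_vm) @ look_dir / r_vm
        pattern = a + (1.0 - a) * cos_angle
        return pattern * s_dir_vm + s_diff     # diffuse sound kept unchanged

    print(vm_signal(1.0 + 0j, 0.1 + 0j, np.array([2.0, 1.0, 0.0]),
                    np.zeros(3), np.array([1.5, 1.0, 0.0]),
                    np.array([1.0, 0.0, 0.0]), 0.5, 2 * np.pi * 1000 / 343))
    ```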
  • ABSTRACT: In hands-free communication applications, the main goal is to capture desired sounds, while reducing noise and interfering sounds. However, for natural-sounding telepresence systems, the spatial sound image should also be preserved. Using a recently proposed method for generating the signal of a virtual microphone (VM), one can recreate the sound image from an arbitrary point of view in the sound scene (e.g., close to a desired speaker), while being able to place the physical microphones outside the sound scene. In this paper, we present a method for synthesizing a VM signal in noisy and reverberant environments, where the estimation of the required direct and diffuse sound components is performed using two multichannel linear filters. The direct sound component is estimated using a multichannel Wiener filter, while the diffuse sound component is estimated using a linearly constrained minimum variance filter followed by a single-channel Wiener filter. Simulations in a noisy and reverberant environment show the applicability of the proposed method for sound acquisition in a scenario in which two microphone arrays are installed in a large TV.
    Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics; 10/2013
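    For the direct-sound component, a rank-one multichannel Wiener filter (MWF) under a single-plane-wave assumption can be sketched as below; the closed form via the matrix inversion lemma is standard, but the paper's exact formulation may differ.
    ```python
    import numpy as np

    def direct_mwf(phi_s, d, phi_u):
        """phi_s: direct-sound PSD; d: (M,) steering vector (reference mic 1);
        phi_u: (M, M) PSD matrix of diffuse sound plus noise."""
        phi_u_inv_d = np.linalg.solve(phi_u, d)
        gain = phi_s / (1.0 + phi_s * np.real(d.conj() @ phi_u_inv_d))
        return gain * d[0].conj() * phi_u_inv_d   # rank-one MWF (inversion lemma)

    rng = np.random.default_rng(3)
    d = np.exp(-1j * 2 * np.pi * rng.random(4)); d = d / d[0]  # normalized steering
    h = direct_mwf(2.0, d, np.eye(4) + 0j)
    print(np.abs(h.conj() @ d))   # 8/9 here; approaches 1 as the input SNR grows
    ```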
  • O Thiergart, M Taseska, E A P Habets
    ABSTRACT: Extracting sound sources in noisy and reverberant conditions remains a challenging task that is commonly found in modern communication systems. In this work, we consider the problem of obtaining a desired spatial response for at most L simultaneously active sound sources. The proposed spatial filter is obtained by minimizing the diffuse plus self-noise power at the output of the filter subject to L linear constraints. In contrast to earlier works, the L constraints are based on instantaneous narrowband direction-of-arrival estimates. In addition, a novel estimator for the diffuse-to-noise ratio is developed that exhibits a sufficiently high temporal and spectral resolution to achieve both dereverberation and noise reduction. The presented results demonstrate that an optimal tradeoff between maximum white noise gain and maximum directivity is achieved.
    Proc. European Signal Processing Conf. (EUSIPCO); 09/2013
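    A hedged sketch of the constrained filter: with steering vectors A from the L instantaneous DOA estimates and desired responses g, the filter minimizes the diffuse-plus-self-noise power, and the estimated diffuse-to-noise ratio (DNR) controls the tradeoff between maximum white noise gain (DNR -> 0) and maximum directivity (large DNR). Setup and names are illustrative.
    ```python
    import numpy as np

    def informed_filter(A, g, gamma_diff, dnr):
        """A: (M, L) steering vectors of the L DOAs; g: (L,) desired responses;
        gamma_diff: (M, M) diffuse coherence matrix; dnr: diffuse-to-noise ratio."""
        M = A.shape[0]
        phi_u = dnr * gamma_diff + np.eye(M)      # diffuse + self-noise (scaled)
        phi_inv_A = np.linalg.solve(phi_u, A)
        return phi_inv_A @ np.linalg.solve(A.conj().T @ phi_inv_A, g)

    # Example: pass a source at 40 deg, null a source at 100 deg (linear array)
    M, d, k = 4, 0.03, 2 * np.pi * 2000 / 343
    pos = np.arange(M) * d
    A = np.exp(-1j * k * np.outer(pos, np.cos(np.deg2rad([40, 100]))))
    Gamma = np.sinc(k * np.abs(pos[:, None] - pos[None, :]) / np.pi)
    w = informed_filter(A, np.array([1.0, 0.0]), Gamma, dnr=10.0)
    print(np.abs(w.conj() @ A))   # ~[1, 0]
    ```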
  • ABSTRACT: We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application to spatial audio object coding; detection and tracking of faces; estimation of head poses and visual focus of attention; detection and localisation of verbal and paralinguistic events; and the association and fusion of these different events. Combined, they represent multimodal streams with audio objects and semantic video objects, and provide semantic information for stream manipulation systems (such as a virtual director). Various experiments have been performed to evaluate the performance of the system. The results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing the different modalities in this scenario.
    Advances in Multimedia 08/2013; 2013.
  • O Thiergart, E A P Habets
    Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP); 05/2013
  • O Thiergart, E A P Habets
    ABSTRACT: The instantaneous direction-of-arrival (DOA) of sound is a crucial parameter in the analysis of acoustic scenes. More complex acoustic scenes are often modelled as a sum of two plane waves, which requires a DOA estimator that can provide two DOAs per time and frequency. This paper proposes an approach which computes two DOAs per time and frequency from a B-format microphone signal. The proposed estimator outperforms the state-of-the-art approach by also considering diffuse sound and microphone noise in the signal model. The estimator includes an averaging process prior to computing the DOA estimates, as well as a fallback solution in case no valid results can be obtained. Simulation results demonstrate the practical applicability of the presented algorithm.
    Proc. IEEE Convention of Electrical and Electronics Engineers in Israel (IEEEI); 11/2012
  • ABSTRACT: The diffuseness of sound fields has previously been estimated in the spatial domain using the spatial coherence between a pair of microphones (omnidirectional or first-order). In this paper, we propose a diffuseness estimator for spherical microphone arrays based on the coherence between eigenbeams, which result from a spherical harmonic decomposition of the sound field. The weighted averaging of the diffuseness estimates over all eigenbeam pairs is shown to significantly reduce the variance of the estimates, particularly in fields with low diffuseness.
    Proc. IEEE Convention of Electrical and Electronics Engineers in Israel (IEEEI); 11/2012
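    A toy sketch of the underlying idea, with an assumed (power-based) pairwise weighting rather than the paper's derived weights: in a purely diffuse field, eigenbeams of different order/degree are mutually incoherent, while a single plane wave renders them fully coherent, so 1 - |coherence| per pair indicates diffuseness and averaging over pairs reduces the variance.
    ```python
    import numpy as np

    def eigenbeam_diffuseness(E):
        """E: (num_beams, num_frames) eigenbeam STFT signals at one frequency bin."""
        P = E @ E.conj().T / E.shape[1]               # PSD matrix across eigenbeams
        p = np.real(np.diag(P))
        num = den = 0.0
        for i in range(len(p)):
            for j in range(i + 1, len(p)):
                coh = np.abs(P[i, j]) / np.sqrt(p[i] * p[j])  # pairwise coherence
                w = p[i] * p[j]                       # assumed power-based weight
                num += w * (1.0 - coh)
                den += w
        return num / den

    rng = np.random.default_rng(0)
    a = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    s = rng.standard_normal(64) + 1j * rng.standard_normal(64)
    print(eigenbeam_diffuseness(np.outer(a, s)))      # ~0: single plane wave
    E_diff = rng.standard_normal((4, 64)) + 1j * rng.standard_normal((4, 64))
    print(eigenbeam_diffuseness(E_diff))              # high: mutually incoherent
    ```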
  • ABSTRACT: Many applications in spatial sound recording and processing model the sound scene as a sum of directional and diffuse sound components. The power ratio between both components, i.e., the signal-to-diffuse ratio (SDR), represents an important measure for algorithms which aim at performing robustly in reverberant environments. This contribution discusses SDR estimation from the spatial coherence between two arbitrary first-order directional microphones. First, the spatial coherence is expressed as a function of the SDR. For most microphone setups, the spatial coherence is a complex function in which both the absolute value and the phase contain relevant information on the SDR. Second, the SDR estimator is derived from the spatial coherence function. The estimator is discussed for different practical microphone setups, including coincident setups of arbitrary first-order directional microphones and spaced setups of identical first-order directional microphones. An unbiased SDR estimation requires noiseless coherence estimates as well as information on the direction-of-arrival of the directional sound, which usually has to be estimated. Nevertheless, measurement results verify that the proposed estimator is applicable in practice and provides accurate results.
    The Journal of the Acoustical Society of America 10/2012; 132(4):2337-2346. · 1.65 Impact Factor
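    The estimator operates on the complex spatial coherence, which in practice must be estimated from the microphone STFT signals; a minimal sketch using recursive PSD averaging is given below (the smoothing constant is an assumption). The inversion from coherence to the power ratio is sketched, for the omnidirectional special case, after the ICASSP 2012 entry further down.
    ```python
    import numpy as np

    def complex_coherence(X1, X2, alpha=0.9):
        """X1, X2: per-frame STFT coefficients of one frequency bin."""
        phi11 = phi22 = 1e-12                 # auto PSDs (small init avoids 0/0)
        phi12 = 0.0 + 0.0j                    # cross PSD
        for x1, x2 in zip(X1, X2):
            phi11 = alpha * phi11 + (1 - alpha) * abs(x1) ** 2
            phi22 = alpha * phi22 + (1 - alpha) * abs(x2) ** 2
            phi12 = alpha * phi12 + (1 - alpha) * x1 * np.conj(x2)
        return phi12 / np.sqrt(phi11 * phi22)

    rng = np.random.default_rng(5)
    x = rng.standard_normal(200) + 1j * rng.standard_normal(200)
    print(abs(complex_coherence(x, x)))       # identical signals -> ~1
    ```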
  • O Thiergart, E A P Habets
    ABSTRACT: Describing a sound field with a single audio signal and parametric side information offers an efficient way of storing, transmitting, or reproducing the recorded spatial sound. In many approaches, the parametric representation is based on a simple sound field model assuming, for each time-frequency instance, a single plane wave plus diffuse sound. When multiple sources are active, we thus have to assume that the source signals do not overlap much in the time-frequency domain so that the single-wave model is still satisfied. In this paper, we show that a single-wave model is easily violated even for sparse signals such as speech. When the parametrized spatial sound is reproduced, these violations degrade the spatial quality. We further study whether the well-known measure of approximate W-disjoint orthogonality can be applied to detect the relevant model violations. Index Terms: Spatial audio coding, parametric spatial audio processing, W-disjoint orthogonality
    Proc. Intl. Workshop Acoust. Echo Noise Control (IWAENC); 09/2012
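    A rough sketch in the spirit of Yilmaz and Rickard's approximate W-disjoint orthogonality (WDO) measure; the masking rule and details here are illustrative, not necessarily the exact measure used in the paper. A binary mask selects the bins where source 1 dominates, and WDO combines how much of source 1 the mask preserves with how much interference leaks through.
    ```python
    import numpy as np

    def approximate_wdo(S1, S2):
        """S1, S2: (freq, frames) STFT magnitudes of the two source signals."""
        mask = (np.abs(S1) > np.abs(S2)).astype(float)   # source-1-dominant bins
        p1, p2 = np.abs(S1) ** 2, np.abs(S2) ** 2
        psr = np.sum(mask * p1) / np.sum(p1)             # preserved-signal ratio
        sir = np.sum(mask * p1) / max(np.sum(mask * p2), 1e-12)
        return psr - psr / sir                           # 1 = perfectly disjoint

    rng = np.random.default_rng(1)
    S1, S2 = rng.rayleigh(size=(2, 128, 50))             # toy non-sparse spectra
    print(approximate_wdo(S1, S2))                       # well below 1: overlap
    ```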
  • O Thiergart, G Del Galdo, E A P Habets
    ABSTRACT: The signal-to-reverberant ratio (SRR) is an important parameter in several applications such as speech enhancement, dereverberation, and parametric spatial audio coding. In this contribution, an SRR estimator is derived from the direction-of-arrival (DOA) dependent complex spatial coherence function computed via two omnidirectional microphones. It is shown that, by employing a computationally inexpensive DOA estimator, the proposed SRR estimator outperforms existing approaches.
    Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP); 03/2012
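    A minimal sketch of the inversion behind such coherence-based estimators: for two spaced omnidirectional microphones, the direct-sound coherence is the pure phase term exp(j k d cos(theta)) (with theta from a DOA estimator) and the diffuse-field coherence is sin(kd)/(kd); solving the coherence mixing model for the power ratio yields the SRR. Taking the real part is a simplification assumed here.
    ```python
    import numpy as np

    def srr_from_coherence(gamma, k, d, theta):
        g_dir = np.exp(1j * k * d * np.cos(theta))  # direct (plane-wave) coherence
        g_diff = np.sinc(k * d / np.pi)             # diffuse coherence sin(kd)/(kd)
        return max(np.real((g_diff - gamma) / (gamma - g_dir)), 0.0)

    # Self-test with a synthetic coherence generated from the same mixing model
    k, d, theta, true_srr = 2 * np.pi * 1000 / 343, 0.05, np.deg2rad(45), 3.0
    gamma = ((true_srr * np.exp(1j * k * d * np.cos(theta)) + np.sinc(k * d / np.pi))
             / (true_srr + 1.0))
    print(srr_from_coherence(gamma, k, d, theta))   # recovers 3.0
    ```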
  • ABSTRACT: Measuring the degree of diffuseness of a sound field is crucial in many modern parametric spatial audio techniques. In these applications, intensity-based diffuseness estimators are particularly convenient, as the sound intensity can also be used to obtain, e.g., the direction of arrival of the sound. This contribution reviews different diffuseness estimators, comparing them under the conditions found in practice, i.e., with arrays of noisy microphones and with the expectation operators substituted by finite temporal averages. The estimators show similar performance, each with specific advantages and disadvantages depending on the scenario. Furthermore, the paper derives an estimator and highlights the possibility of using spatial averaging to improve the temporal resolution of the estimates.
    The Journal of the Acoustical Society of America 03/2012; 131(3):2141-51. · 1.65 Impact Factor
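    The classical intensity-based estimator reviewed here can be sketched compactly, with the expectations replaced by finite temporal averages: psi = 1 - ||<I>|| / (c <E>), where I is the active sound intensity and E the sound-field energy density, both computed from sound pressure and particle velocity.
    ```python
    import numpy as np

    RHO0, C = 1.2, 343.0   # air density [kg/m^3], speed of sound [m/s]

    def intensity_diffuseness(p, v):
        """p: (num_frames,) complex pressure; v: (3, num_frames) particle velocity."""
        intensity = 0.5 * np.real(p[None, :] * np.conj(v)).mean(axis=1)  # <I>
        energy = 0.25 * (np.abs(p) ** 2 / (RHO0 * C ** 2)
                         + RHO0 * np.sum(np.abs(v) ** 2, axis=0)).mean() # <E>
        return 1.0 - np.linalg.norm(intensity) / (C * energy)

    # Single plane wave: v = n * p / (rho0 * c) -> diffuseness ~ 0
    rng = np.random.default_rng(2)
    p = rng.standard_normal(256) + 1j * rng.standard_normal(256)
    n = np.array([[1.0], [0.0], [0.0]])
    print(intensity_diffuseness(p, n * p / (RHO0 * C)))   # ~0
    ```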
  • ABSTRACT: Directional Audio Coding (DirAC) is an efficient technique to capture and reproduce spatial sound on the basis of a downmix audio signal, the direction of arrival, and the diffuseness of the sound. In practice, these parameters are determined using arrays of omnidirectional microphones. The main drawback of such configurations is that the estimates are reliable only in a certain frequency range, which depends on the array size. To overcome this problem and cover large bandwidths, we propose concentric arrays of different sizes. We derive optimal joint estimators for the DirAC parameters with respect to the mean squared error. We address the problem of choosing the optimal array sizes for specific applications such as teleconferencing, and we verify our findings with measurements.
    Fraunhofer IIS. 01/2012;
  • ABSTRACT: The importance of telecommunication continues to grow in our everyday lives. An ambitious goal for developers is to provide the most natural way of audio communication by giving users the impression of being located next to each other. MPEG Spatial Audio Object Coding (SAOC) is a technology for coding, transmitting, and interactively reproducing spatial sound scenes on any conventional multi-loudspeaker setup (e.g., ITU 5.1). This paper describes how Directional Audio Coding (DirAC) can be used as a recording front-end for SAOC-based teleconference systems to capture acoustic scenes and to extract the individual objects (talkers). By introducing a novel DirAC to SAOC parameter transcoder, a highly efficient way of combining both technologies is presented that enables interactive, object-based spatial teleconferencing.
    Fraunhofer IIS. 01/2012;
  • ABSTRACT: Methods for spatial audio processing are becoming more important as the variety of multichannel audio applications is steadily increasing. Directional Audio Coding (DirAC) represents a well-proven technique to capture and reproduce spatial sound on the basis of a downmix audio signal and parametric side information, namely the direction of arrival and the diffuseness of the sound. In addition to spatial audio reproduction, the DirAC parameters can be exploited further. In this paper, we propose a computationally efficient approach to determine the position of sound sources based on DirAC parameters. It is shown that the proposed localization method provides reliable estimates even in reverberant environments. The approach also allows trading off localization accuracy against the tracking performance for moving sound sources.
    Fraunhofer IIS. 01/2012;
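    A sketch of the core geometric step (the diffuseness-based selection of reliable time-frequency bins and any tracking are omitted): each array contributes a DOA, i.e., a ray from its known position, and the source position is taken as the point closest to all rays in the least-squares sense.
    ```python
    import numpy as np

    def triangulate(origins, directions):
        """origins: (N, 3) array positions; directions: (N, 3) unit DOA vectors."""
        A = np.zeros((3, 3))
        b = np.zeros(3)
        for o, u in zip(origins, directions):
            P = np.eye(3) - np.outer(u, u)     # projector orthogonal to the ray
            A += P
            b += P @ o
        return np.linalg.solve(A, b)           # least-squares ray intersection

    src = np.array([2.0, 1.0, 0.0])
    origins = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
    dirs = np.array([(src - o) / np.linalg.norm(src - o) for o in origins])
    print(triangulate(origins, dirs))           # ~[2, 1, 0]
    ```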
  • ABSTRACT: The direction-dependent analysis of impulse response measurements using spherical microphone arrays can deliver a universal basis for binaural auralization. A new method using dual-radius open-sphere arrays is proposed to overcome limitations in practical realizations of such arrays. Different methods of combining the two radii have been analyzed and are presented. A plane wave decomposition in conjunction with a high-resolution HRTF database is used to generate a binaural auralization, wherein the different designs are simulated under ideal and real conditions. The results have been evaluated in a quality grading experiment. It is shown that the dual-radius cardioid design is an effective method to enhance the perceived quality in comparison to conventional spherical array designs.
    Fraunhofer IIS. 01/2012;
  • ABSTRACT: Directional Audio Coding (DirAC) represents an efficient scheme to analyze and reproduce spatial sound; the coded stream consists of a single-channel audio signal and a few parameters. Estimation of these parameters ideally relies on figure-of-eight microphones and one omnidirectional microphone. The figure-of-eight directivity can be efficiently approximated by first-order differential microphone arrays. However, due to spatial sampling, there are considerable deviations from the required directivity at high frequencies. These deviations lead to incorrect spatial parameter estimates; in particular, the instantaneous direction-of-arrival (DOA) becomes biased and, beyond a certain aliasing frequency, ambiguous. Parameters like the DOA drive subsequent processing units, e.g., for spatial filtering. In this paper we propose a novel strategy to compensate the bias and ambiguity not directly with respect to the DOA but with respect to the output parameters of a spatial filtering processing unit. Simulation results confirm the benefits of the novel method, which does not involve additional computational load at run-time.
    2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 06/2011
  • ABSTRACT: Conventional recording techniques for spatial audio are limited by the fact that the spatial image obtained is always relative to the position in which the microphones have been physically placed. In many applications, however, it is desired to place the microphones outside the sound scene and yet be able to capture the sound from an arbitrary perspective. This contribution proposes a method to place a virtual microphone at an arbitrary point in space by computing a signal perceptually similar to the one which would have been picked up if the microphone had been physically placed in the sound scene. The method relies on a parametric model of the sound field based on point-like isotropic sound sources. The required geometrical information is gathered by two or more distributed microphone arrays. Measurement results demonstrate the applicability of the proposed method and reveal its limitations.
    2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA); 05/2011
  • ABSTRACT: The diffuseness of sound can be estimated with practical microphone setups by considering the spatial coherence between two microphone signals. In applications where small arrays of omnidirectional microphones are preferred, the diffuseness estimation is impaired by a high signal coherence in diffuse fields at lower frequencies, which is particularly problematic when carrying out the estimation with high temporal resolution. Therefore, we propose to exploit the spatial coherence between two virtual first-order microphones derived from the omnidirectional array. This represents a flexible method to accurately estimate the diffuseness in high-SNR regions at lower frequencies with high temporal resolution.
    IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2011, New Paltz, NY, USA, October 16-19, 2011; 01/2011
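    A sketch of the virtual-microphone construction, under the usual small-spacing assumptions: the sum of the two omni signals approximates the pressure, the scaled difference approximates the pressure gradient, and the two combine into back-to-back virtual cardioids whose coherence is then evaluated (the mapping from coherence to diffuseness is omitted here).
    ```python
    import numpy as np

    def virtual_cardioids(P1, P2, k, d):
        """P1, P2: omni STFT coefficients at one bin; k: wavenumber; d: spacing [m]."""
        b = 0.5 * (P1 + P2)                   # virtual omni (pressure)
        g = (P1 - P2) / (1j * k * d)          # virtual dipole (finite difference)
        return 0.5 * (b + g), 0.5 * (b - g)   # back-to-back virtual cardioids

    def magnitude_coherence(X, Y):
        num = np.abs(np.mean(X * np.conj(Y)))
        return num / np.sqrt(np.mean(np.abs(X) ** 2) * np.mean(np.abs(Y) ** 2))

    # Axial plane wave: fully coherent -> |coherence| ~ 1, i.e., low diffuseness
    rng = np.random.default_rng(4)
    k, d = 2 * np.pi * 500 / 343, 0.02
    s = np.exp(1j * 2 * np.pi * rng.random(128))
    c1, c2 = virtual_cardioids(s * np.exp(1j * k * d / 2),
                               s * np.exp(-1j * k * d / 2), k, d)
    print(magnitude_coherence(c1, c2))        # ~1 for the plane wave
    ```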