
Evaluation of generalized cross-correlation methods for direction of arrival estimation using two microphones in real environments

Multimedia & Multimodal Processing Research Group, Telecommunication Engineering Department, Polytechnic School, University of Jaén, Spain
Applied Acoustics (Impact Factor: 1.07). 08/2012; 73(8). DOI: 10.1016/j.apacoust.2012.02.002

ABSTRACT The localization of sound sources, and particularly of speech, has numerous industrial applications. This has motivated a continuous effort to develop robust direction-of-arrival estimation algorithms that overcome the limitations imposed by real scenarios, such as multiple reflections and undesirable noise sources. Time-difference-of-arrival methods, and in particular generalized cross-correlation approaches, have been widely investigated in acoustic signal processing, but the technical literature offers little evaluation of them in real environments when only two microphones are used. In this work, four generalized cross-correlation methods for localizing speech sources with two microphones have been analyzed in different real scenarios with a stationary noise source. Furthermore, these scenarios have been acoustically characterized in order to relate the behavior of the cross-correlation methods to the acoustic properties of noisy scenarios. The scope of this study is not only to assess the accuracy and reliability of a set of well-known localization algorithms, but also to determine how the acoustic properties of the room under analysis decisively influence the final results, by incorporating into the analysis factors beyond the reverberation time and signal-to-noise ratio. The results outline the influence of the analyzed acoustic properties on the performance of these methods.
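As an illustration of the time-difference-of-arrival approach the abstract refers to, the sketch below estimates the inter-microphone delay with GCC-PHAT, one of the classical generalized cross-correlation weightings studied in this literature. This is a minimal NumPy sketch run on a synthetic delayed signal, not the implementation evaluated in the paper; the function name `gcc_phat` and the toy parameters (16 kHz sampling rate, 25-sample delay) are assumptions for illustration only.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the time difference of arrival of y relative to x using
    the generalized cross-correlation with phase transform (GCC-PHAT).
    A positive result means y lags x."""
    n = len(x) + len(y)                  # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = Y * np.conj(X)
    R /= np.abs(R) + 1e-12               # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # reorder so that lag 0 sits at index max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Toy check: white noise delayed by 25 samples at fs = 16 kHz
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
delay = 25
x = s
y = np.concatenate((np.zeros(delay), s[:-delay]))
tau = gcc_phat(x, y, fs)
print(tau * fs)   # recovered delay in samples, close to 25
```

The PHAT weighting discards the magnitude of the cross-spectrum and keeps only its phase, which is what gives the method its well-known robustness to reverberation at the price of sensitivity at low signal-to-noise ratios; the other weightings compared in studies like this one differ only in the normalization applied to `R`.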

  • ABSTRACT: Estimating the direction of a sound source is an important technique in various engineering fields, including intelligent robots and surveillance systems. In a household where a user's voice and noises emitted from electric appliances originate from arbitrary directions in 3-D space, robots need to recognize the directions of multiple sound sources in order to interact effectively with the user. This paper proposes an ear-based localization system using two artificial robot ears, each consisting of a spiral-shaped pinna and two microphones, for application in humanoid robots. Four microphones are asymmetrically placed on the left and right sides of the head. The proposed localization algorithm is based on a spatially mapped generalized cross-correlation function, which is transformed from the time domain to the space domain by means of a measured inter-channel time difference map. To validate the proposed localization method, two experiments (single- and multiple-source cases) were conducted using male speech. In the single-source case, with the exception of laterally biased sources, localization was achieved with an error of less than 10°. In the multiple-source environment, one source was fixed at the front and the other changed its direction; the error rates for the localization of the fixed and moving sources were 0% and 36.9%, respectively, within an error bound of 15°.
    Applied Acoustics 03/2014; 77:49–58. DOI:10.1016/j.apacoust.2013.10.001 · 1.07 Impact Factor
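Under the far-field assumption common to such two-microphone systems, each time difference of arrival maps to an azimuth angle through the array geometry. A minimal sketch of that mapping follows; the function name `tdoa_to_azimuth` and the 10 cm baseline are illustrative assumptions, not values taken from either paper.

```python
import math

def tdoa_to_azimuth(tau, mic_distance, c=343.0):
    """Far-field approximation: convert a time difference of arrival
    tau (seconds) between two microphones separated by mic_distance
    (meters) into an azimuth angle (degrees) measured from broadside."""
    arg = c * tau / mic_distance
    arg = max(-1.0, min(1.0, arg))       # clamp against measurement noise
    return math.degrees(math.asin(arg))

# A 0.2 ms lead across a 10 cm baseline
print(tdoa_to_azimuth(2e-4, 0.10))      # roughly 43 degrees
```

Because the arcsine compresses near ±90°, angular resolution degrades for laterally biased sources, which is consistent with the larger errors reported for such sources in the abstract above.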
  • ABSTRACT: One of the main issues in the field of social robotics is endowing robots with the ability to direct attention to the people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those in video-conference rooms, and may therefore run into difficulties when constrained to the sensors with which a robot can be equipped. Moreover, within the scope of interactive autonomous robots, there is a lack of evaluation of the benefits of audio-visual attention mechanisms, compared to audio-only or visual-only approaches, in real scenarios. Most tests have been conducted in controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information through Bayesian inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared against the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.
    Sensors 06/2014; 14(6):9522-9545. DOI:10.3390/s140609522 · 2.05 Impact Factor
  • ABSTRACT: Sound source localization using a two-microphone array is an active area of research, with considerable potential for use in video conferencing, mobile devices, and robotics. Based on the observed time differences of arrival between sound signals, a probability distribution of the source locations is considered in order to estimate the actual source positions. However, such algorithms assume a given number of sound sources. This paper describes an updated research account of the solution presented in Escolano et al. [J. Acoust. Soc. Am. 132(3), 1257-1260 (2012)], where nested sampling is used to explore a probability distribution of the source position using a Laplacian mixture model, which allows both the number and positions of speech sources to be inferred. This paper presents different experimental setups and scenarios to demonstrate the viability of the proposed method, which is compared with some of the most popular sampling methods, demonstrating that nested sampling is an accurate tool for speech localization.
    The Journal of the Acoustical Society of America 02/2014; 135(2):742-753. DOI:10.1121/1.4861356 · 1.56 Impact Factor
