Fig 4 - uploaded by Vladimir Tourbabin
DOA estimation performance of the MUSIC and SDD algorithms as a function of frequency and the SH order.


Source publication
Conference Paper
One of the important tasks of a humanoid-robot auditory system is speaker localization. It is used for the construction of the surrounding acoustic scene and as an input for additional processing methods. Localization is usually required to operate indoors under high reverberation levels. Recently, an algorithm for speaker localization under these...

Citations

... The functionality of SSL on a robotic platform could be useful in several situations, for instance, locating human speakers without visual contact and mapping an unknown acoustic environment [4]. It has been achieved by various methodologies, such as head-related transfer function (HRTF) based time-difference-of-arrival (TDOA), space-domain distance (SDD), acoustic beamforming, etc. [5], [6], [7], [8], [9]. Among those techniques, acoustic beamforming is widely used to obtain the sound map of a measured field in industrial applications such as transport pass-by noise localisation and machine fault detection [4], [10], [11], [12], [13]. ...
Article
Constrained by its physical geometry, a microphone array has limited lower and upper frequency bounds and a limited scanning area. For service robots, which are mobile, achieving a wider working frequency range with a global view requires a virtually larger and denser array, which can be realised using non-synchronous measurements beamforming with a movable microphone array prototype. However, even when using the state-of-the-art method, it is challenging to localise multiple broadband sources, owing to the difficulty in selecting an appropriate operating frequency without any prior information about the target signal. Therefore, this letter proposes a tensor-completion-based non-synchronous measurements method for broadband multiple-sound-source localisation. The tensor data structure of the broadband signal is analysed, and an alternating direction method based on multiplier optimisation with a tensor multi-norm constraint is proposed. This algorithm can provide a sound map with a distinct global view of three different speech signal sources with high accuracy. Compared with the matrix-based optimisation method, the proposed method can significantly reduce the mean square error of the estimated source location.
... The LSDD-DPD test was implemented according to [14]. The DPD test method was implemented according to [30], with the spherical harmonics coefficients of the plane-wave density estimated up to spherical harmonics order N = 1 and with averaging over 3 time frames and 15 frequencies to construct the correlation matrix. For all tested methods the minimal operating frequency was limited to 1 kHz due to the array aperture. ...
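The averaging step mentioned in this excerpt can be illustrated with a minimal Python sketch. It assumes the array signals are already available as complex coefficients per channel, time frame and frequency bin (4 channels would correspond to spherical harmonics order N = 1). The function names `local_correlation` and `dominance_ratio` are hypothetical, and the singular-value ratio is shown only as one commonly used direct-path dominance measure, not as the exact procedure of [30] or [14].

```python
import numpy as np

def local_correlation(X, t0, f0, n_frames=3, n_freqs=15):
    """Average x x^H over a time-frequency window.

    X has shape (n_channels, n_frames_total, n_freqs_total) and holds
    complex coefficients (assumed input layout, not taken from the source).
    """
    block = X[:, t0:t0 + n_frames, f0:f0 + n_freqs]
    if block.shape[1] != n_frames or block.shape[2] != n_freqs:
        raise ValueError("window exceeds the available time-frequency range")
    # Treat every (frame, frequency) pair in the window as one snapshot.
    snapshots = block.reshape(X.shape[0], -1)
    return snapshots @ snapshots.conj().T / snapshots.shape[1]

def dominance_ratio(R):
    """Ratio of the largest to the second-largest singular value of R."""
    s = np.linalg.svd(R, compute_uv=False)  # returned in descending order
    return s[0] / max(s[1], np.finfo(float).tiny)
```

A large ratio suggests that a single wavefront dominates the window; the threshold that separates "dominant" from "not dominant" is application dependent and is not specified here.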
Article
The coherent signal subspace method (CSSM) enables the direction-of-arrival (DoA) estimation of coherent sources with subspace localization methods. The focusing process that aligns the signal subspaces within a frequency band to its central frequency is central to the CSSM. Within current focusing approaches, a direction-independent focusing approach may be more suitable for reverberant environments since no initial estimation of the sources' DoAs is required. However, these methods use integrals over the steering function, and cannot be directly applied to arrays around complex scattering structures, such as robot heads. In this paper, current direction-independent focusing methods are extended to arrays for which the steering function is available only for selected directions, typically in a numerical form. Spherical harmonics decomposition of the steering function is then employed to formulate several aspects of the focusing error. A case of two coherent sources is studied and guidelines for the selection of the frequency smoothing bandwidth are suggested. The performance of the proposed methods is then investigated for an array that is mounted on a robot head. The focusing process is integrated within the direct-path dominance (DPD) test method for speaker localization, originally designed for spherical arrays, extending its application to arrays with arbitrary configurations. Finally, experiments with real data verify the feasibility of the proposed method to successfully estimate the DoAs of multiple speakers under real-world conditions.
... Combination of EB-MUSIC with the latter as a preprocessing stage for source counting was also shown to work well [9]. Similar bin selection approaches have also been proposed [10,11,12]. ...
... The DPD test shows robustness to both reverberant and noisy environments [4]. Several variants of the method were proposed, including alternative methods for bin selection [4], different DOA estimation algorithms [5], [6], and various methods for fusing the estimates from different bins [7]-[10]. Although DPD test based methods show good performance under reverberation, they have been developed for processing in the spherical harmonics domain and are restricted to microphone arrays with a spherical configuration. ...
... Another way to estimate a single DOA in 2 dimensions is presented in [104], called the space-domain distance (SDD) method. It relies on a distance metric that is applied between the captured time-frequency (TF) bin and a calculated TF bin that estimates what would have been captured if the source had been located in a given direction. ...
... The SFT coefficients are estimated via measurements or simulations and can be used to pick the TF bins that contain information of the direct-path signal given a pre-defined DOA. The proposed SDD metric in [104] measures the distance between the direct-path TF bins that were calculated from the input signals and the same TF bin using the SFT coefficients given a pre-defined DOA. Given this metric, a grid-search is carried out to find the direction that minimizes it. ...
... A reverberation-robust approach is presented in [104] and discussed in Section 4.1. It carries out the SDD method and defines the signal subspace solely as the eigenvector with the largest eigenvalue. ...
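The three excerpts above describe a grid search that compares a one-dimensional signal subspace with the response expected from each candidate direction. The following Python sketch illustrates that idea under stated assumptions: the steering vectors come from a free-field plane-wave model of a small planar array (a stand-in for the measured or simulated coefficients mentioned in the excerpts), and the distance is the residual left after projecting the principal eigenvector onto each steering vector. The names `steering_matrix` and `sdd_like_doa` are hypothetical, and the exact metric used in [104] may differ.

```python
import numpy as np

def steering_matrix(angles_rad, mic_xy, freq_hz, c=343.0):
    """Free-field plane-wave steering vectors, shape (n_mics, n_angles).

    Assumed model for illustration only; the sign convention of the
    phase is arbitrary as long as it is used consistently.
    """
    k = 2.0 * np.pi * freq_hz / c
    directions = np.stack([np.cos(angles_rad), np.sin(angles_rad)])
    return np.exp(1j * k * (mic_xy @ directions))

def sdd_like_doa(R, A):
    """Return the grid index minimising the subspace distance, and all distances."""
    # eigh returns eigenvalues in ascending order, so the last column
    # is the eigenvector with the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(R)
    u = eigvecs[:, -1]
    A_unit = A / np.linalg.norm(A, axis=0, keepdims=True)
    # Squared cosine of the angle between u and each steering vector.
    proj = np.abs(A_unit.conj().T @ u) ** 2
    dist = np.sqrt(np.clip(1.0 - proj, 0.0, None))
    return int(np.argmin(dist)), dist

# Example: six microphones on a circle of radius 5 cm, 2 kHz, 1-degree grid.
mic_angles = 2.0 * np.pi * np.arange(6) / 6
mic_xy = 0.05 * np.stack([np.cos(mic_angles), np.sin(mic_angles)], axis=1)
grid = np.deg2rad(np.arange(360))
A = steering_matrix(grid, mic_xy, 2000.0)
```

Because the distance is invariant to a complex scaling of the eigenvector, the unknown source amplitude and phase do not need to be estimated separately in this sketch.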
Article
Sound source localization (SSL) in a robotic platform has been essential in the overall scheme of robot audition. It allows a robot to locate a sound source by sound alone. It has an important impact on other robot audition modules, such as source separation, and it enriches human–robot interaction by complementing the robot’s perceptual capabilities. The main objective of this review is to thoroughly map the current state of the SSL field for the reader and provide a starting point to SSL in robotics. To this effect, we present: the evolution and historical context of SSL in robotics; an extensive review and classification of SSL techniques and popular tracking methodologies; different facets of SSL as well as its state-of-the-art; evaluation methodologies used for SSL; and a set of challenges and research motivations.
Conference Paper
Algorithms for acoustic source localization and tracking are essential for a wide range of applications such as personal assistants, smart homes, tele-conferencing systems, hearing aids, or autonomous systems. Numerous algorithms have been proposed for this purpose; however, they have not so far been evaluated and compared against each other using a common database. The IEEE-AASP Challenge on sound source localization and tracking (LOCATA) provides a novel, comprehensive data corpus for the objective benchmarking of state-of-the-art algorithms on sound source localization and tracking. The data corpus comprises six tasks ranging from the localization of a single static sound source with a static microphone array to the tracking of multiple moving speakers with a moving microphone array. It contains real-world multichannel audio recordings, obtained by hearing aids, microphones integrated in a robot head, a planar and a spherical microphone array in an enclosed acoustic environment, as well as positional information about the involved arrays and sound sources represented by moving human talkers or static loudspeakers.
Conference Paper
Auditory systems of humanoid robots usually acquire the surrounding sound field by means of microphone arrays. These arrays can undergo motion related to the robot’s activity. The conventional approach to dealing with this motion is to stop the robot during sound acquisition. This approach avoids changing the positions of the microphones during the acquisition and reduces the robot’s ego-noise. However, stopping the robot can interfere with the naturalness of its behaviour. Moreover, the potential performance improvement due to motion of the sound acquiring system cannot be attained. This potential is analysed in the current paper. The analysis considers two different types of motion: (i) rotation of the robot’s head and (ii) limb gestures. The study presented here combines both theoretical and numerical simulation approaches. The results show that rotation of the head improves the high-frequency performance of the microphone array positioned on the head of the robot. This is complemented by the limb gestures, which improve the low-frequency performance of the array positioned on the torso and limbs of the robot.