Conference Paper

Blind alignment of asynchronously recorded signals for distributed microphone array

Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan
DOI: 10.1109/ASPAA.2009.5346505 Conference: Applications of Signal Processing to Audio and Acoustics, 2009. WASPAA '09. IEEE Workshop on
Source: IEEE Xplore

ABSTRACT In this paper, aiming to utilize independent recording devices as a distributed microphone array, we present a novel method for aligning recorded signals while localizing the microphones and sources. Unlike a conventional microphone array, signals recorded by independent devices have different time origins, and the microphone positions are generally unknown. To estimate both from the recorded signals alone, the time differences between channels are detected for each source; these still include the differences of the time origins, and an objective function defined by their squared errors is minimized. To this end, simple iterative update rules are derived through an auxiliary-function approach. The validity of the approach is evaluated in a simulation experiment.
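The idea in the abstract can be illustrated with a toy least-squares fit (all names, values, and step sizes below are hypothetical): arrival times recorded by independent devices include unknown per-device time origins, and microphone positions, source positions, and time origins are fitted jointly to inter-channel time differences. Per-source centering of the residuals removes the unknown emission times. The paper derives auxiliary-function update rules; plain gradient descent stands in for that solver here.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 340.0                                     # speed of sound [m/s]
mics_true = rng.uniform(0.0, 5.0, (4, 2))     # true microphone positions
srcs_true = rng.uniform(0.0, 5.0, (6, 2))     # true source positions
offs_true = rng.uniform(0.0, 0.1, 4)          # per-device time origins [s]

dist_true = np.linalg.norm(mics_true[:, None] - srcs_true[None], axis=-1)
obs = dist_true / C + offs_true[:, None]      # observed arrival times (I x K)

def centered_residual(m, s, o):
    dist = np.maximum(np.linalg.norm(m[:, None] - s[None], axis=-1), 1e-9)
    r = dist / C + o[:, None] - obs
    # Centering over the mic axis keeps only inter-channel differences,
    # so the unknown per-source emission times drop out.
    return r - r.mean(axis=0), dist

def objective(m, s, o):
    rc, _ = centered_residual(m, s, o)
    return float(np.sum(rc ** 2))

# Perturbed initial guesses; the offsets start at zero.
m = mics_true + 0.3 * rng.standard_normal(mics_true.shape)
s = srcs_true + 0.3 * rng.standard_normal(srcs_true.shape)
o = np.zeros(4)

obj0 = objective(m, s, o)
for _ in range(300):
    rc, dist = centered_residual(m, s, o)
    unit = (m[:, None] - s[None]) / dist[..., None]
    g = (rc / C)[..., None] * unit            # position-gradient terms
    m -= 100.0 * g.sum(axis=1)                # step sizes chosen by hand
    s += 100.0 * g.sum(axis=0)
    o -= 0.02 * rc.sum(axis=1)
obj1 = objective(m, s, o)
```

The solution is only determined up to a global translation/rotation and a common time shift, but the objective itself is unaffected by that gauge freedom and decreases steadily under the descent steps.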

  • ABSTRACT: This paper addresses the online calibration of an asynchronous microphone array for robots. Conventional microphone array technologies require many measurements of transfer functions to calibrate microphone locations, and a multi-channel A/D converter for inter-microphone synchronization. We solve these two problems with a framework that combines Simultaneous Localization and Mapping (SLAM) and beamforming in an online manner. To do this, we regard the estimation of microphone locations, the sound source location, and the microphone clock differences as corresponding to mapping, self-localization, and observation errors in SLAM, respectively. In our framework, the SLAM process calibrates the locations and clock differences of the microphones every time the array observes a sound such as a human handclap, and a beamforming process serves as a cost function for deciding the convergence of calibration by localizing the sound with the estimated locations and clock differences. After calibration, beamforming is used for sound source localization. We implemented a prototype system using Extended Kalman Filter (EKF) based SLAM and Delay-and-Sum Beamforming (DS-BF). The experimental results showed that microphone locations and clock differences were estimated properly within 10–15 sound events (handclaps), and that the error of sound source localization with the estimated information was less than the grid size of the beamformer, i.e., the theoretically lowest error was attained.
    2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2011, San Francisco, CA, USA, September 25-30, 2011; 01/2011
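A minimal EKF sketch of this calibration idea (the setup is hypothetical and heavily simplified): the state holds one microphone's 2-D position and its clock offset, and each sound event, a clap at a position assumed known here, yields one arrival-time observation z = |mic - src|/c + offset + noise. The paper's full system also estimates the source location inside SLAM and uses beamforming as a convergence check; neither is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
C = 340.0                                    # speed of sound [m/s]
mic_true = np.array([2.0, 1.5])              # microphone to calibrate
off_true = 0.05                              # its clock offset [s]

x = np.array([1.0, 0.5, 0.0])                # state guess: [mx, my, offset]
P = np.diag([1.0, 1.0, 0.01])                # initial state covariance
R = (1e-4) ** 2                              # arrival-time noise variance

err0 = np.linalg.norm(x[:2] - mic_true)
for _ in range(30):                          # one simulated handclap per step
    src = rng.uniform(0.0, 6.0, 2)           # clap position (assumed known)
    z = (np.linalg.norm(mic_true - src) / C + off_true
         + 1e-4 * rng.standard_normal())     # observed arrival time
    d = x[:2] - src
    dist = np.linalg.norm(d)
    if dist < 0.5:                           # skip claps too close to linearize
        continue
    zhat = dist / C + x[2]                   # predicted observation h(x)
    H = np.array([d[0] / (C * dist), d[1] / (C * dist), 1.0])  # Jacobian of h
    S = H @ P @ H + R                        # innovation variance
    K = P @ H / S                            # Kalman gain
    x = x + K * (z - zhat)                   # state update
    P = (np.eye(3) - np.outer(K, H)) @ P     # covariance update
err1 = np.linalg.norm(x[:2] - mic_true)
```

With the state static and no process noise, the filter acts as a recursive least-squares fit; claps from varied directions make the position and the clock offset jointly observable.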
  • ABSTRACT: In recent years ad-hoc microphone arrays have become ubiquitous, and the capture hardware and quality are increasingly sophisticated. Ad-hoc arrays hold vast potential for audio applications, but they are inherently asynchronous, i.e., a temporal offset exists in each channel, and furthermore the device locations are generally unknown. The data is therefore not directly suitable for traditional microphone array applications such as source localization and beamforming. This work presents a least squares method for temporal offset estimation of a static ad-hoc microphone array. The method utilizes the captured audio content without the need to emit calibration signals, provided that a sufficient number of sound sources surround the array during the recording. The Cramér-Rao lower bound of the estimator is given, and the effect of a limited number of surrounding sources on the solution accuracy is investigated. A practical implementation is then presented using non-linear filtering with automatic parameter adjustment. Simulations over a range of reverberation and noise levels demonstrate the algorithm's robustness. Using smartphones, an average RMS error of 3.5 samples (at 48 kHz) was reached when the algorithm's assumptions were met.
    IEEE Transactions on Audio, Speech, and Language Processing 01/2013; 21(11):2393-2402.
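The least-squares offset estimation can be sketched as follows (a toy version with invented values): pairwise offset observations between channels, which in the real method come from correlating the captured audio over many surrounding source events, are stacked into an overdetermined linear system and solved for the per-device offsets, with one device pinned as the time reference.

```python
import numpy as np

rng = np.random.default_rng(1)
offs_true = np.array([0.0, 0.012, -0.007, 0.020])   # offsets [s]; mic 0 is reference

# Noisy pairwise offset observations o_i - o_j (simulated here; the paper
# extracts them from the recorded content itself, without calibration signals).
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
meas = np.array([offs_true[i] - offs_true[j] + 1e-4 * rng.standard_normal()
                 for i, j in pairs])

# Least-squares system: one row per pair, plus a gauge row pinning mic 0 to zero.
A = np.zeros((len(pairs) + 1, 4))
for r, (i, j) in enumerate(pairs):
    A[r, i], A[r, j] = 1.0, -1.0
A[-1, 0] = 1.0
b = np.append(meas, 0.0)
est, *_ = np.linalg.lstsq(A, b, rcond=None)
```

The gauge row is needed because only offset differences are observable; any common shift of all offsets leaves the measurements unchanged.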
  • ABSTRACT: Ad-hoc arrays formed by mobile devices are increasingly available for capturing audio and video at social events. Using spatial signal processing algorithms, e.g., beamforming, with the microphone signals of such arrays is hindered by the unknown locations of the devices and the lack of temporal synchronization between them. While self-calibration methods can be applied to estimate these missing parameters, they typically impose restrictions and require time to converge. Time difference of arrival (TDOA) values contain source-related spatial information, and they have previously been used in source localization and tracking. In this work, relative time-of-arrival (TOA) is proposed for estimating source spatial information. The method is then applied to beamforming with ad-hoc arrays. Simulations and measurements with smartphones are used to test the accuracy of the proposed TOA estimators. Speech captured by a smartphone array is then beamformed using the TOA estimators. Results show that Kalman filter based TOA steering achieves enhancement performance similar to using the ground-truth TOA.
    Digital Signal Processing (DSP), 2013 18th International Conference on; 01/2013
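The final beamforming step described above can be sketched in a few lines (a toy example with pure-delay channels, no noise or reverberation, and the per-channel TOA values assumed already estimated; the paper obtains them via Kalman filtering): each channel is advanced by its TOA so the source is time-aligned across devices, then the channels are averaged.

```python
import numpy as np

t = np.arange(256, dtype=float)
pulse = np.exp(-0.5 * ((t - 40.0) / 4.0) ** 2)       # short pulse from the source
toas = np.array([0, 7, 13, 21])                      # per-channel TOA in samples
chans = np.stack([np.roll(pulse, d) for d in toas])  # what each device records

# Delay-and-sum with TOA steering: undo each channel's delay, then average.
aligned = np.stack([np.roll(c, -d) for c, d in zip(chans, toas)]).mean(axis=0)
naive = chans.mean(axis=0)                           # no steering, for contrast
```

With correct steering the pulses add coherently and the beamformer output peaks at the pulse amplitude; without it the misaligned pulses smear and the peak drops.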
