Conference Paper

Correlogram template matching for time-delay estimation

Hewlett-Packard Labs., Palo Alto, CA, USA
DOI: 10.1109/ICASSP.2011.5946981
Conference: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Source: IEEE Xplore

ABSTRACT We propose a correlogram-based time-delay estimation method that uses signals modeled as the output of the cochlea, where low-level signal processing takes place in the human auditory system. Using a normalized correlogram that preserves time-delay patterns invariant to speech features such as formants, we employ two-dimensional template matching for time-delay estimation. Experimental results show that our method outperforms both a traditional correlogram-based method and GCC-PHAT, especially for short analysis windows in a moderately reverberant environment.
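As a rough illustration of the building block behind correlogram-based time-delay estimation, the sketch below estimates the delay between two signals as the lag that maximizes their cross-correlation. It is not the paper's normalized-correlogram template matching, and it is not GCC-PHAT; it covers a single channel with no cochlear filtering. The function names, the white-noise test signal, and the 5-sample delay are illustrative assumptions.

```python
import random


def cross_correlation(left, right, max_lag):
    """Return {lag: sum over i of left[i] * right[i + lag]} for |lag| <= max_lag."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):  # skip samples shifted outside the window
                total += left[i] * right[j]
        result[lag] = total
    return result


def estimate_delay(left, right, max_lag):
    """Pick the lag (in samples) with the largest cross-correlation value."""
    corr = cross_correlation(left, right, max_lag)
    return max(corr, key=corr.get)


# Hypothetical test signal: white noise, with the right channel delayed by 5 samples.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
true_delay = 5
left = source
right = [0.0] * true_delay + source[:-true_delay]

estimated = estimate_delay(left, right, max_lag=20)
```

A correlogram would repeat this computation in every frequency channel of a cochlear filterbank, giving a two-dimensional (channel by lag) pattern rather than a single correlation curve.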

  • ABSTRACT: Multiple sound signals, such as speech and interfering noises, can be fairly well separated, localized, and interpreted by human listeners with normal binaural hearing. The computational model presented here, based on earlier cochlear modeling work, is a first step at approaching human levels of performance on the localization and separation tasks. This combination of cochlear and binaural models, implemented as real-time algorithms, could provide the front end for a robust sound interpretation system such as a speech recognizer. The cochlear model used is basically a bandpass filterbank with frequency channels corresponding to places on the basilar membrane; filter outputs are half-wave rectified and amplitude-compressed, maintaining fine time resolution. In the binaural model, outputs of corresponding frequency channels from the two ears are combined by cross-correlation. Peaks in the short-time cross-correlation functions are then interpreted as direction. With appropriate preprocessing, the correlation peaks integrate cues based on signal phase, envelope modulation, onset time, and loudness. Based on peaks in the correlation functions, sources can be recognized, localized, and tracked. Through quickly varying gains, sound fragments are separated into streams representing different sources. Preliminary tests of the algorithms are very encouraging.
    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '83); 05/1983
  • ABSTRACT: This report describes an implementation of a cochlear model proposed by Roy Patterson [Patterson1992]. Like a previous report [Slaney1988], this document is an electronic notebook written using a software package called Mathematica™ [Wolfram 1988]. This report describes the filter bank and its implementation. The filter bank is designed as a set of parallel bandpass filters, each tuned to a different frequency. This report extends previous work by deriving an even more efficient implementation of the Gammatone filter bank, and by showing the MATLAB™ code to design and implement an ERB filter bank based on Gammatone filters.
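
To make the idea of a bank of parallel bandpass filters concrete, here is a minimal sketch of one Gammatone channel computed directly from its impulse response. It assumes the commonly cited Glasberg and Moore ERB approximation and a bandwidth factor of 1.019; the report's exact parameters and its more efficient implementation are not reproduced here. The function names, the 25 ms duration, and the peak normalization are illustrative choices.

```python
import math


def erb_bandwidth(center_hz):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg and Moore form)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def gammatone_impulse_response(center_hz, sample_rate, duration_s=0.025, order=4):
    """Sample t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), scaled to unit peak."""
    b = 1.019 * erb_bandwidth(center_hz)  # assumed bandwidth scaling factor
    n_samples = round(duration_s * sample_rate)
    response = []
    for n in range(n_samples):
        t = n / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        response.append(envelope * math.cos(2.0 * math.pi * center_hz * t))
    peak = max(abs(v) for v in response)
    return [v / peak for v in response]


# One channel centered at 1 kHz, sampled at 16 kHz (hypothetical settings).
h = gammatone_impulse_response(1000.0, 16000)
```

A full filter bank would repeat this for a set of center frequencies spaced along the ERB scale and convolve the input signal with each channel's response.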