Conference Paper

Dynamic segmental vector quantization in isolated-word speech recognition

Dept. of Comput. Eng., Kyung Hee Univ., Yongin-Si, South Korea
DOI: 10.1109/ISSPIT.2004.1433722 · Conference: Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2004)
Source: IEEE Xplore


The standard vector quantization (VQ) approach, which uses a single vector quantizer for the entire duration of each utterance class, suffers from two limitations: 1) high computational cost for large codebook sizes and 2) lack of explicit characterization of the sequential behavior of speech. Both disadvantages can be remedied by treating each utterance class as a concatenation of several information subsources, each of which is represented by its own VQ codebook. With this approach, segmentation schemes obviously need to be investigated; we call this VQ approach dynamic segmental vector quantization (DSVQ). This paper shows how to design DSVQ with several effective segmentation schemes. Improved performance is observed when this approach is applied on its own or combined with hidden Markov models (HMMs) in isolated-word speech recognition.
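The classification rule implied by this subsource view can be sketched as follows. This is a minimal illustration, not the paper's method: the uniform segment split stands in for the dynamic segmentation schemes the paper investigates, and the squared-Euclidean distortion is an assumed distance measure.

```python
def nearest_distortion(frame, codebook):
    """Squared Euclidean distance from one feature frame to its nearest codeword."""
    return min(sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook)

def dsvq_distortion(frames, segment_codebooks):
    """Distortion of an utterance against one class modeled as a sequence of
    segment codebooks. A uniform split is used here as a placeholder for the
    paper's dynamic segmentation."""
    n_seg = len(segment_codebooks)
    bounds = [round(i * len(frames) / n_seg) for i in range(n_seg + 1)]
    total = 0.0
    for i, codebook in enumerate(segment_codebooks):
        seg = frames[bounds[i]:bounds[i + 1]]
        if seg:
            total += sum(nearest_distortion(f, codebook) for f in seg) / len(seg)
    return total

def classify(frames, class_models):
    """Pick the class whose codebook sequence yields the lowest total distortion."""
    return min(class_models, key=lambda c: dsvq_distortion(frames, class_models[c]))
```

Because each segment codebook only covers one portion of the utterance, individual codebooks can stay small (addressing cost) while their ordering captures the sequential structure a single global codebook ignores.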

  • ABSTRACT: In this paper, we discuss the use of weighted filter bank analysis (WFBA) to increase the discriminating ability of mel frequency cepstral coefficients (MFCCs). The WFBA emphasizes the peak structure of the log filter bank energies (LFBEs) obtained from filter bank analysis while attenuating the components with lower energy in a simple, direct, and effective way. Experimental results for recognition of continuous Mandarin telephone speech indicate that the WFBA-based cepstral features are more robust than those derived by employing the standard filter bank analysis and some widely used cepstral liftering and frequency filtering schemes, in both channel-distorted and noisy conditions.
    IEEE Signal Processing Letters 04/2001; 8(3):70-73. DOI: 10.1109/97.905943
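The peak-emphasizing idea behind WFBA can be illustrated with a short sketch. The exact weighting function of the cited paper is not reproduced here; min-max-normalizing each log filter bank energy and using it as a multiplicative weight is an illustrative assumption.

```python
def weight_lfbe(lfbe, gamma=1.0):
    """Illustrative WFBA-style weighting (assumed form, not the paper's exact
    formula): scale each log filter bank energy by its min-max-normalized
    value, so spectral peaks are preserved and low-energy bands are
    attenuated before the cepstral (DCT) stage."""
    lo, hi = min(lfbe), max(lfbe)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat spectrum
    return [e * ((e - lo) / span) ** gamma for e in lfbe]
```

With this form, the largest LFBE passes through unchanged, the smallest is zeroed out, and `gamma` (a hypothetical knob) controls how sharply intermediate bands are attenuated.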
  • ABSTRACT: Automatic speech recognition (ASR) is an emerging field with the goal of creating a more natural man/machine interface. The single largest obstacle to widespread use of ASR technology is robustness to noise. Since human speech recognition greatly outperforms current ASR systems in noisy environments, ASR systems seek to improve noise robustness by drawing on biological inspiration. Most ASR front ends employ mel frequency cepstral coefficients (mfcc), a filter bank-based algorithm whose filters are spaced on a perceptually motivated linear-log frequency scale. However, filter bandwidth is set by filter spacing and not through biological motivation. The coupling of filter bandwidth to other filter bank parameters (frequency range, number of filters) has led to variations of the original algorithm with different filter bandwidths. In this work, a novel extension to mfcc is introduced which decouples filter bandwidth from the other filter bank parameters by employing the known relationship between filter center frequency and critical bandwidth of the human auditory system. The new algorithm, called human factor cepstral coefficients (hfcc), is shown to outperform the original mfcc and two popular variations in ASR experiments with various noise sources.
    The Journal of the Acoustical Society of America 01/2002; 112(5). DOI: 10.1121/1.4779137
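The center-frequency-to-critical-bandwidth relationship referred to above is commonly taken from the Moore and Glasberg equivalent-rectangular-bandwidth (ERB) approximation; a minimal sketch of decoupling bandwidth from filter spacing follows, where the `e_factor` scaling knob is an assumption for illustration.

```python
def erb_hz(fc_hz):
    """Moore & Glasberg approximation of the equivalent rectangular
    bandwidth (ERB) of the auditory filter centered at fc_hz, in Hz."""
    return 6.23e-6 * fc_hz ** 2 + 93.39e-3 * fc_hz + 28.52

def filter_bandwidth(fc_hz, e_factor=1.0):
    """Bandwidth tied to center frequency via the ERB curve rather than to
    filter spacing; e_factor is a hypothetical widening/narrowing scale."""
    return e_factor * erb_hz(fc_hz)
```

Under this scheme, changing the number of filters or the frequency range moves the center frequencies but leaves each filter's bandwidth governed by the auditory ERB curve alone.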

