Conference Paper

Dynamic segmental vector quantization in isolated-word speech recognition

Dept. of Comput. Eng., Kyung Hee Univ., Yongin-Si, South Korea
DOI: 10.1109/ISSPIT.2004.1433722
Conference: Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2004)
Source: IEEE Xplore

ABSTRACT The standard vector quantization (VQ) approach, which uses a single vector quantizer for the entire duration of the utterances of each class, suffers from two limitations: 1) high computational cost for large codebook sizes and 2) no explicit characterization of the sequential behavior of the utterance. Both disadvantages can be remedied by treating each utterance class as a concatenation of several information subsources, each represented by its own VQ codebook. This approach requires a scheme for segmenting each utterance into these subsources, and we call the resulting VQ approach dynamic segmental vector quantization (DSVQ). This paper shows how to design DSVQ with several effective segmentation schemes. Better performance was observed when DSVQ was applied on its own or combined with a hidden Markov model (HMM) in isolated-word speech recognition.
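To make the idea of "a concatenation of subsources, each with its own codebook" concrete, the following is a minimal sketch of DSVQ-style scoring. It assumes squared Euclidean distortion, left-to-right segments that each cover at least one frame, and segment boundaries chosen by dynamic programming to minimize total distortion; the paper's own segmentation schemes and codebook training (for example, LBG/k-means per segment) are not reproduced here. The function names `dsvq_distortion` and `classify` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]
Codebook = List[Vector]


def sq_dist(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance (assumed distortion measure).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_dist(frame: Vector, codebook: Codebook) -> float:
    # Distortion of a frame quantized to its nearest codeword.
    return min(sq_dist(frame, c) for c in codebook)


def dsvq_distortion(frames: List[Vector], codebooks: List[Codebook]) -> float:
    """Minimum total distortion of `frames` against an ordered list of
    segment codebooks. Boundaries are found by dynamic programming,
    assuming left-to-right segments of at least one frame each."""
    inf = float("inf")
    T, S = len(frames), len(codebooks)
    if T < S:
        return inf  # not enough frames to give every segment one frame
    # d[t][s]: distortion of frame t under segment codebook s
    d = [[min_dist(f, cb) for cb in codebooks] for f in frames]
    # D[t][s]: best total distortion of frames 0..t with frame t in segment s
    D = [[inf] * S for _ in range(T)]
    D[0][0] = d[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = D[t - 1][s]                          # remain in segment s
            advance = D[t - 1][s - 1] if s > 0 else inf  # enter segment s
            D[t][s] = d[t][s] + min(stay, advance)
    return D[T - 1][S - 1]  # all frames consumed, last segment reached


def classify(frames: List[Vector], models: Dict[str, List[Codebook]]) -> str:
    # Pick the word class whose segment codebooks give the lowest distortion.
    return min(models, key=lambda name: dsvq_distortion(frames, models[name]))


if __name__ == "__main__":
    # Toy 1-D "words": one rises from 0 to 5, the other falls from 5 to 0.
    models = {
        "up": [[(0.0,)], [(5.0,)]],
        "down": [[(5.0,)], [(0.0,)]],
    }
    frames = [(0.0,), (0.1,), (4.9,), (5.0,)]
    print(classify(frames, models))
```

With a single segment (one codebook per class), `dsvq_distortion` reduces to the standard VQ score, the sum of nearest-codeword distortions over the whole utterance; splitting the class into ordered segments is what lets the score reflect the sequential behavior that a single codebook ignores.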
