Conference Paper

Computing Mel-frequency cepstral coefficients on the power spectrum

Lehrstuhl für Inf. VI, Rheinisch-Westfälische Tech. Hochschule Aachen
DOI: 10.1109/ICASSP.2001.940770
Conference: Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on, Volume: 1
Source: IEEE Xplore

ABSTRACT We present a method to derive Mel-frequency cepstral coefficients
directly from the power spectrum of a speech signal. We show that
omitting the filterbank in signal analysis does not affect the word
error rate. The presented approach simplifies the speech recognizer's
front end by merging subsequent signal analysis steps into a single one.
It avoids possible interpolation and discretization problems and results
in a compact implementation. We show that frequency warping schemes like
vocal tract normalization can be integrated easily into our concept
without additional computational effort. Recognition test results
obtained with the RWTH large vocabulary speech recognition system are
presented for two different corpora: the German VerbMobil II dev99
corpus, and the English North American Business News 94 20k development
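
One way to picture "cepstral coefficients directly from the power spectrum" is a cosine transform of the log power spectrum evaluated on a Mel-warped frequency axis, with no triangular filterbank in between. The sketch below is an illustration under stated assumptions, not the paper's implementation: the Mel formula, the Hamming window, the trapezoidal integration, and all function names (mel, warped_axis, warped_cosine_transform, cepstrum_from_frame) are choices made here for the example.

```python
import numpy as np

def mel(f_hz):
    # A widely used mel-scale formula; assumed here, not taken from the abstract.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def warped_axis(n_samples, sample_rate):
    # FFT bin frequencies mapped to the mel scale and normalized so that
    # the Nyquist frequency corresponds to pi.
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    return np.pi * mel(freqs) / mel(sample_rate / 2.0)

def warped_cosine_transform(log_power, warped, n_ceps):
    # c_k = (1/pi) * integral of log_power * cos(k * w) over the warped axis w,
    # approximated with the trapezoidal rule on the non-uniform grid.
    k = np.arange(n_ceps)[:, None]
    integrand = log_power[None, :] * np.cos(k * warped[None, :])
    widths = np.diff(warped)
    areas = 0.5 * (integrand[:, 1:] + integrand[:, :-1]) * widths
    return areas.sum(axis=1) / np.pi

def cepstrum_from_frame(frame, sample_rate, n_ceps=13):
    # Hypothetical front end: window, power spectrum, log, warped cosine transform.
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    log_power = np.log(power + 1e-10)  # small floor avoids log(0)
    warped = warped_axis(len(frame), sample_rate)
    return warped_cosine_transform(log_power, warped, n_ceps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # 25 ms of noise at 16 kHz
    print(cepstrum_from_frame(frame, 16000))
```

In this sketch a different warping (for example a speaker-dependent one, as in vocal tract normalization) would only change warped_axis, which is one reading of why such schemes could be integrated without extra cost.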

  • Source
    ABSTRACT: We report our efforts in handling situations in text-to-speech synthesis where a particular phonemic or syllabic context is not available in the corpus. The idea is to replace such a context with another one that is 'similar'. The 'similarity' of phones or syllables lies in the inability of listeners to distinguish them when placed in a particular context. Such phones were found linguistically in two South Indian languages, Tamil and Telugu, by performing listening tests, and acoustically, through a phone classification experiment with Mel-frequency cepstral coefficients as features. A maximum likelihood classifier is used to find the most misrecognized phones. Both frame-level and phone-level classifications were performed to find such phones. The classification experiments were performed on a Tamil corpus of 1027 sentences and on the TIMIT corpus.
    3rd Language and Technology Conference, Poznan, Poland; 10/2007
  • Source
    ABSTRACT: Acoustic mismatch between the training and testing phases severely degrades speech recognition results. This problem has limited the development of real-world nonspecific applications, as testing conditions are highly variable or even unpredictable during the training process. Therefore the background noise has to be removed from the noisy speech signal to increase the signal intelligibility and to reduce listener fatigue. Enhancement techniques applied as pre-processing stages to such systems remarkably improve recognition results. In this paper, a novel approach is used to enhance the perceived quality of the speech signal when the additive noise cannot be directly controlled. Instead of controlling the background noise, we propose to reinforce the speech signal so that it can be heard more clearly in noisy environments. The subjective evaluation shows that the proposed method improves the perceptual quality of speech in various noisy environments. In some cases speaking may be more convenient than typing, even for rapid typists: many mathematical symbols are missing from the keyboard but can be easily spoken and recognized. Therefore, the proposed system can be used in an application designed for mathematical symbol recognition (especially symbols not available on the keyboard) in schools.
    International Journal of Engineering Science and Technology; 01/2011
  • Source
    ABSTRACT: We have shown previously that vocal tract normalization (VTN) results in a linear transformation in the cepstral domain. In this paper we show that Mel-frequency warping can equally well be integrated into the framework of VTN as a linear transformation on the cepstrum. We show examples of transformation matrices to obtain VTN-warped Mel-frequency cepstral coefficients (VTN-MFCC) as a linear transformation of the original MFCC and discuss the effect of Mel-frequency warping on the Jacobian determinant of the transformation matrix. Finally we show that there is a strong interdependence of VTN and Maximum Likelihood Linear Regression (MLLR) for the case of Gaussian emission probabilities.
    8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003; 01/2003
