Conference Paper

Computing Mel-frequency cepstral coefficients on the power spectrum

Lehrstuhl für Inf. VI, Rheinisch-Westfälische Tech. Hochschule Aachen
DOI: 10.1109/ICASSP.2001.940770
Conference: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), Proceedings, Volume 1
Source: IEEE Xplore

ABSTRACT: We present a method to derive Mel-frequency cepstral coefficients
directly from the power spectrum of a speech signal. We show that
omitting the filterbank in signal analysis does not affect the word
error rate. The presented approach simplifies the speech recognizer's
front end by merging successive signal analysis steps into a single one.
It avoids possible interpolation and discretization problems and results
in a compact implementation. We show that frequency warping schemes such as
vocal tract normalization can easily be integrated into our approach
without additional computational effort. Recognition test results
obtained with the RWTH large vocabulary speech recognition system are
presented for two different corpora: the German VerbMobil II dev99
corpus and the English North American Business News 94 20k development
corpus.
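The abstract does not give the formulas, so the following is only a minimal sketch of the general idea under stated assumptions: the cosine transform of the log power spectrum is evaluated as an integral over a mel-warped frequency axis, c_k = (1/π) ∫₀^π log P(θ) cos(kθ) dθ, so every FFT bin enters directly and no filterbank is needed. A vocal tract normalization (VTN) warping factor is simply composed with the mel warping before the integral. The mel formula 2595·log10(1 + f/700), the Hamming window, the piecewise-linear VTN warping with a breakpoint at 7/8 of the Nyquist frequency, the number of coefficients, and the names `mel`, `vtn_warp` and `mfcc_from_power_spectrum` are all illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def mel(f_hz):
    # Common mel-scale formula; an assumption here, the paper may define it differently.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def vtn_warp(f_hz, alpha, f_max):
    # Hypothetical piecewise-linear VTN warping: scale by alpha below the
    # breakpoint f0, then a straight line that maps f_max back onto f_max.
    # Stays monotone as long as alpha * f0 < f_max (alpha < 8/7 here).
    f = np.asarray(f_hz, dtype=float)
    f0 = 0.875 * f_max  # illustrative breakpoint at 7/8 of the Nyquist frequency
    upper = alpha * f0 + (f_max - alpha * f0) * (f - f0) / (f_max - f0)
    return np.where(f <= f0, alpha * f, upper)


def mfcc_from_power_spectrum(frame, fs, n_ceps=13, alpha=1.0):
    # Approximates c_k = (1/pi) * integral_0^pi log P(theta) cos(k theta) dtheta,
    # where theta is the VTN- and mel-warped frequency rescaled to [0, pi].
    # No filterbank: every FFT bin enters the integral directly.
    frame = np.asarray(frame, dtype=float)  # assumes an even frame length
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-10)  # floor avoids log(0)

    f_max = fs / 2.0
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)  # linear bin frequencies, 0..f_max
    # Warp the frequency axis: VTN first, then mel, then map f_max to pi.
    theta = np.pi * mel(vtn_warp(freqs, alpha, f_max)) / mel(f_max)

    k = np.arange(n_ceps)[:, None]
    integrand = log_power[None, :] * np.cos(k * theta[None, :])
    # Trapezoidal rule on the non-uniform theta grid; the bin spacing in theta
    # plays the role of the Jacobian of the warping, so no resampling is needed.
    d_theta = np.diff(theta)
    areas = 0.5 * (integrand[:, 1:] + integrand[:, :-1]) * d_theta
    return areas.sum(axis=1) / np.pi
```

For a 25 ms frame at 16 kHz (400 samples), `mfcc_from_power_spectrum(frame, fs=16000, alpha=1.08)` returns 13 coefficients for one speaker-specific warping factor, and `alpha=1.0` makes the VTN step the identity. Because the warping only changes the `theta` grid, speaker normalization adds no extra pass over the spectrum. In this sketch the spectral smoothing that a filterbank would otherwise provide comes only from keeping the first `n_ceps` coefficients; how the paper handles this may differ.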
