Conference Paper

Computing Mel-frequency cepstral coefficients on the power spectrum

Lehrstuhl für Inf. VI, Rheinisch-Westfälische Tech. Hochschule Aachen
DOI: 10.1109/ICASSP.2001.940770
Conference: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), Proceedings, Volume 1
Source: IEEE Xplore

ABSTRACT

We present a method to derive Mel-frequency cepstral coefficients
directly from the power spectrum of a speech signal. We show that
omitting the filterbank in signal analysis does not affect the word
error rate. The presented approach simplifies the speech recognizer's
front end by merging subsequent signal analysis steps into a single one.
It avoids possible interpolation and discretization problems and results
in a compact implementation. We show that frequency warping schemes like
vocal tract normalization can be integrated easily into our concept
without additional computational effort. Recognition test results
obtained with the RWTH large vocabulary speech recognition system are
presented for two different corpora: the German VerbMobil II dev99
corpus and the English North American Business News 94 20k development
corpus.
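
To make the idea of the abstract concrete, here is a minimal sketch of how cepstral coefficients could be computed from a power spectrum without a filterbank: the cosine transform is evaluated directly on a Mel-warped frequency axis. It is not the authors' implementation. The function names (`mel`, `warped_cepstrum`), the particular Mel formula, the trapezoidal integration, and the log floor are all assumptions made for illustration.

```python
import math


def mel(f_hz):
    """A commonly used Mel-scale formula (assumed; the abstract gives none)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def warped_cepstrum(power_spectrum, sample_rate, num_coeffs, warp=mel):
    """Cepstral coefficients from a one-sided power spectrum, no filterbank.

    Approximates c_k = (1/pi) * integral of log P * cos(k * w') dw',
    where w' is the warped frequency axis scaled so that Nyquist maps to pi.
    `warp` is any monotonic frequency-warping function (hypothetical hook).
    """
    n_bins = len(power_spectrum)
    nyquist = sample_rate / 2.0

    # Warped position of every FFT bin on the interval [0, pi].
    scale = math.pi / warp(nyquist)
    warped = [scale * warp(nyquist * n / (n_bins - 1)) for n in range(n_bins)]

    # Log power spectrum; the floor avoids log(0) and is an arbitrary choice.
    log_p = [math.log(max(p, 1e-12)) for p in power_spectrum]

    coeffs = []
    for k in range(num_coeffs):
        acc = 0.0
        # Trapezoidal rule over the non-uniformly spaced warped axis.
        for n in range(n_bins - 1):
            width = warped[n + 1] - warped[n]
            left = log_p[n] * math.cos(k * warped[n])
            right = log_p[n + 1] * math.cos(k * warped[n + 1])
            acc += 0.5 * (left + right) * width
        coeffs.append(acc / math.pi)
    return coeffs
```

Because the warping enters only through the bin positions, a different monotonic function could be passed as `warp` without changing the cost of the loop. That is one plausible reading of how a warping scheme such as vocal tract normalization might be folded in, stated here as an assumption rather than as the paper's procedure.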

Available from: Sirko Molau, Jul 08, 2014
    • "The modulus of Fourier transform is extracted and the magnitude spectrum is obtained as |X| which is a matrix of size N × P. The magnitude spectrum is warped according to the Mel scale in order to adapt the frequency resolution to the properties of the human ear [12]. Note that the Mel (φ_f) and the linear frequency (l_f) [13] "
    ABSTRACT: Mel Frequency Cepstral Coefficients (MFCCs) are the most popularly used speech features in most speech and speaker recognition applications. In this work, we propose a modified Mel filter bank to extract MFCCs from subsampled speech. We also propose a stronger metric which effectively captures the correlation between MFCCs of original speech and MFCC of resampled speech. It is found that the proposed method of filter bank construction performs distinguishably well and gives recognition performance on resampled speech close to recognition accuracies on original speech.
    Full-text · Article · Oct 2014
    • "Several methods for MFCC extraction have been proposed by [19] [20] [21] "
    ABSTRACT: Mel Frequency Cepstral Coefficients (MFCCs) features have been the strongest candidate for work on automatic speech recognition. An alternative to MFCCs can be the use of features based on Discrete Wavelet Transform. This paper compares the performance of an automatic speech recognition framework based on MFCCs and DWT features. The framework uses Urdu isolated words corpus and the training and test data remain the same for both types of features. The classification has been achieved using Linear Discriminant Analysis.
    No preview · Conference Paper · Jan 2013
    • "Each AOV consists of typically 39 Mel-Frequency Cepstral Coefficients (MFCCs) which are created from a frame of input speech by applying a series of transforms [15] [16] as shown in Figure 3. The MFCCs contain all the necessary acoustic information of one frame of input speech. "
    ABSTRACT: A scalable large vocabulary, speaker independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1, reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms which coupled with a backend search of 5000 words has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133 MHz.
    Full-text · Article · Jan 2011 · International Journal of Reconfigurable Computing