Conference Paper

Computing Mel-frequency cepstral coefficients on the power spectrum

Lehrstuhl für Inf. VI, Rheinisch-Westfälische Tech. Hochschule Aachen
DOI: 10.1109/ICASSP.2001.940770
Conference: Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on, Volume: 1
Source: IEEE Xplore

ABSTRACT We present a method to derive Mel-frequency cepstral coefficients
directly from the power spectrum of a speech signal. We show that
omitting the filterbank in signal analysis does not affect the word
error rate. The presented approach simplifies the speech recognizer's
front end by merging subsequent signal analysis steps into a single one.
It avoids possible interpolation and discretization problems and results
in a compact implementation. We show that frequency warping schemes like
vocal tract normalization can be integrated easily in our concept
without additional computational effort. Recognition test results
obtained with the RWTH large vocabulary speech recognition system are
presented for two different corpora: the German VerbMobil II dev99
corpus, and the English North American Business News 94 20k development
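
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the mel formula, the function names, the Hamming window, the linear warping factor `alpha` (a stand-in for vocal tract normalization), and the trapezoidal integration are all assumptions made for illustration. The sketch takes the log power spectrum of one frame and applies a cosine transform directly on a mel-warped frequency axis, with no triangular filterbank in between.

```python
import numpy as np


def hz_to_mel(f_hz):
    # A commonly used mel-scale formula (assumption: the abstract does not give one).
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)


def cepstrum_from_power_spectrum(frame, sample_rate, n_ceps=13, alpha=1.0):
    """Hypothetical sketch: cepstral coefficients without a filterbank.

    alpha is an illustrative linear frequency-warping factor; the warped
    frequency is clipped at the Nyquist frequency.
    """
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    log_power = np.log(power + 1e-10)  # small floor avoids log(0)

    # Frequency of each FFT bin, warped and mapped onto [0, pi] via the mel scale.
    nyquist = sample_rate / 2.0
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    warped_hz = np.minimum(alpha * freqs, nyquist)
    theta = np.pi * hz_to_mel(warped_hz) / hz_to_mel(nyquist)

    # Cosine transform evaluated on the warped axis:
    # c_k ~ (1/pi) * integral of log_power(theta) * cos(k * theta) d(theta),
    # approximated with the trapezoidal rule on the non-uniform grid.
    k = np.arange(n_ceps)[:, None]
    integrand = np.cos(k * theta[None, :]) * log_power[None, :]
    d_theta = np.diff(theta)
    return 0.5 * np.sum((integrand[:, 1:] + integrand[:, :-1]) * d_theta, axis=1) / np.pi
```

Because the warping only changes where the cosine basis is evaluated, changing `alpha` in this sketch adds no extra processing stage, which is in the spirit of the abstract's remark about integrating frequency warping at no additional cost.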

  • Source
    ABSTRACT: We report our efforts to handle situations in text-to-speech synthesis where a particular phonemic or syllabic context is not available in the corpus. The idea is to replace such a context with another one that is 'similar'. The 'similarity' of phones or syllables lies in the inability of listeners to distinguish them when placed in a particular context. Such phones were found linguistically in two South Indian languages, Tamil and Telugu, by performing listening tests, and acoustically, through a phone classification experiment with Mel Frequency Cepstral Coefficients as features. A maximum likelihood classifier is used to find the most frequently misrecognized phones. Both frame-level and phone-level classifications were performed to find such phones. The classification experiments were performed on a Tamil corpus of 1027 sentences and on the TIMIT corpus.
    3rd Language and Technology Conference, Poznan, Poland; 10/2007
  • Source
    ABSTRACT: A scalable, large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms, which, coupled with a backend search of 5000 words, has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133 MHz.
    International Journal of Reconfigurable Computing; 01/2011
  • Source
    ABSTRACT: Acoustic mismatch between the training and testing phases severely degrades speech recognition results. This problem has limited the development of real-world nonspecific applications, as testing conditions are highly variable or even unpredictable during the training process. Therefore, the background noise has to be removed from the noisy speech signal to increase signal intelligibility and to reduce listener fatigue. Enhancement techniques applied as pre-processing stages to the systems remarkably improve recognition results. In this paper, a novel approach is used to enhance the perceived quality of the speech signal when the additive noise cannot be directly controlled. Instead of controlling the background noise, we propose to reinforce the speech signal so that it can be heard more clearly in noisy environments. The subjective evaluation shows that the proposed method improves the perceptual quality of speech in various noisy environments. In some cases speaking may be more convenient than typing, even for rapid typists: many mathematical symbols are missing from the keyboard but can be easily spoken and recognized. Therefore, the proposed system can be used in an application designed for mathematical symbol recognition (especially symbols not available on the keyboard) in schools.
    International Journal of Engineering Science and Technology; 01/2011
