Conference Paper

Computing Mel-frequency cepstral coefficients on the power spectrum

Lehrstuhl für Informatik VI, Rheinisch-Westfälische Technische Hochschule Aachen
DOI: 10.1109/ICASSP.2001.940770
Conference: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), Proceedings, Volume 1
Source: IEEE Xplore

ABSTRACT We present a method to derive Mel-frequency cepstral coefficients
directly from the power spectrum of a speech signal. We show that
omitting the filterbank in signal analysis does not affect the word
error rate. The presented approach simplifies the speech recognizer's
front end by merging consecutive signal analysis steps into a single one.
It avoids possible interpolation and discretization problems and results
in a compact implementation. We show that frequency warping schemes like
vocal tract normalization can be integrated easily into our concept
without additional computational effort. Recognition test results
obtained with the RWTH large vocabulary speech recognition system are
presented for two different corpora: the German VerbMobil II dev99
corpus, and the English North American Business News 94 20k development
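
To make the idea in the abstract concrete, the following is a minimal sketch of one plausible reading: the cosine transform is evaluated on a Mel-warped frequency axis, directly over the log power spectrum, with no triangular filterbank in between. It is not the paper's exact formulation. The Mel formula, the Hamming window, the trapezoidal integration, and all names (`hz_to_mel`, `mel_cepstrum_from_power_spectrum`, `num_ceps`, `floor`) are assumptions made for illustration.

```python
import numpy as np


def hz_to_mel(f_hz):
    """A common Mel-scale formula (assumed; the abstract does not specify one)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)


def mel_cepstrum_from_power_spectrum(frame, sample_rate, num_ceps=13, floor=1e-10):
    """Cepstral coefficients from the log power spectrum, without a filterbank.

    Hypothetical helper: integrates log|X|^2 * cos(k * w) over a Mel-warped
    axis w in [0, pi] instead of summing filterbank outputs.
    """
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2        # power spectrum, bins 0..N/2
    log_power = np.log(np.maximum(power, floor))      # floor avoids log(0)

    # Map each DFT bin frequency onto a warped axis running from 0 to pi.
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    warped = np.pi * hz_to_mel(freqs) / hz_to_mel(sample_rate / 2.0)

    # Trapezoidal rule on the warped axis: the uneven bin spacing after
    # warping serves as the integration weight.
    ks = np.arange(num_ceps)[:, None]
    integrand = log_power[None, :] * np.cos(ks * warped[None, :])
    d_warped = np.diff(warped)
    mids = 0.5 * (integrand[:, 1:] + integrand[:, :-1])
    return np.sum(mids * d_warped, axis=1) / np.pi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(512)                  # stand-in for a 32 ms frame at 16 kHz
    print(mel_cepstrum_from_power_spectrum(frame, 16000))
```

In a sketch like this, an additional frequency warping such as vocal tract normalization would amount to composing one more function into `warped`, which is consistent with the abstract's claim that such schemes add no extra computation. The specific warping function is not given here and is left out.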

  • ABSTRACT: The paper presents an analysis of speaker activity in online recordings from Internet radio. The proposed system was developed in the Matlab environment. Our research is based on four one-hour public debates acquired from Internet radio. Seven to eight speakers participate in each recording (including one presenter). Speaker recognition was performed on short utterances to facilitate real-time processing. The speaking time of each politician was calculated using a Gaussian mixture model (GMM) algorithm. The influence of the MPEG Layer 3 compression algorithm on Mel-frequency cepstral coefficients (MFCCs) is described. The neighborhood of the speaker models was analyzed using the ISOMAP algorithm.
    IEEE International Conference on Signal Processing: Algorithms, Architectures, Arrangements, and Applications, Poznań, Poland; 09/2014
  • ABSTRACT: Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech and speaker recognition applications. In this work, we propose a modified Mel filter bank to extract MFCCs from subsampled speech. We also propose a stronger metric that effectively captures the correlation between the MFCCs of the original speech and the MFCCs of the resampled speech. The proposed method of filter bank construction performs well and gives recognition performance on resampled speech close to the recognition accuracy on the original speech.
