Conference Paper

Limited Training Data Robust Speech Recognition Using Kernel-Based Acoustic Models

Dept. of Electr. Eng. & Inf. Technol., Otto-von-Guericke-Univ., Magdeburg
DOI: 10.1109/ICASSP.2006.1660226
Conference: Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, Volume: 1
Source: IEEE Xplore

ABSTRACT Contemporary automatic speech recognition uses hidden Markov models (HMMs) to model the temporal structure of speech, with one HMM per phonetic unit. The states of the HMMs are associated with state-conditional probability density functions (PDFs), which are typically realized as mixtures of Gaussian PDFs (GMMs). Training GMMs is error-prone, especially when the amount of training data is limited. This paper evaluates two new methods of modeling state-conditional PDFs, using probabilistically interpreted support vector machines and kernel Fisher discriminants. Extensive experiments on the RM1 (P. Price et al., 1988) corpus yield substantially improved recognition rates compared to traditional GMMs. Owing to their generalization ability, the new methods reduce the word error rate by up to 13% using the complete training set and by up to 33% when the training set size is reduced.

  • Source
    ABSTRACT: A Support Vector Machine (SVM) is a promising machine learning technique that has generated a lot of interest in the pattern recognition community in recent years. The greatest asset of an SVM is its ability to construct nonlinear decision regions in a discriminative fashion. This paper describes an application of SVMs to two speech data classification experiments: 11 vowels spoken in isolation and 16 phones extracted from spontaneous telephone speech. The best performance achieved on the spontaneous speech classification task is a 51% error rate using an RBF kernel. This is comparable to frame-level classification achieved by other nonlinear modeling techniques such as artificial neural networks (ANN).
    The 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November - 4th December 1998; 01/1998
  • Source
    ABSTRACT: Multi-class classification problems can be efficiently solved by partitioning the original problem into sub-problems involving only two classes: for each pair of classes, a (potentially small) neural network is trained using only the data of these two classes. We show how to combine the outputs of the two-class neural networks in order to obtain posterior probabilities for the class decisions. The resulting probabilistic pairwise classifier is part of a handwriting recognition system which is currently applied to check reading. We present results on real world data bases and show that, from a practical point of view, these results compare favorably to other neural network approaches. 1 Introduction Generally, a pattern classifier consists of two main parts: a feature extractor and a classification algorithm. Both parts have the same ultimate goal, namely to transform a given input pattern into a representation that is easily interpretable as a class decision. In the case of feedforwar...
    06/1998;
  • Source
    Annals Eugen. 01/1936; 7:179-188.