Limited Training Data Robust Speech Recognition Using Kernel-Based Acoustic Models
Contemporary automatic speech recognition uses hidden Markov models (HMMs) to model the temporal structure of speech, where one HMM is used for each phonetic unit. The states of the HMMs are associated with state-conditional probability density functions (PDFs), which are typically realized as Gaussian mixture models (GMMs). Training of GMMs is error-prone, especially if the amount of training data is limited. This paper evaluates two new methods of modeling state-conditional PDFs using probabilistically interpreted support vector machines and kernel Fisher discriminants. Extensive experiments on the RM1 corpus (P. Price et al., 1988) yield substantially improved recognition rates compared to traditional GMMs. Due to their generalization ability, our new methods reduce the word error rate by up to 13% using the complete training set and by up to 33% when the training set size is reduced.
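The baseline the abstract compares against, state-conditional PDFs realized as Gaussian mixtures, can be sketched as follows. This is a minimal NumPy illustration under assumed diagonal covariances, not the paper's implementation; all function and parameter names are ours.

```python
import numpy as np

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance Gaussian mixture at point x.

    weights: (K,) mixture weights summing to 1,
    means, variances: (K, D) per-component parameters,
    x: (D,) observation vector (e.g. one frame of acoustic features).
    Illustrative only; an HMM would hold one such mixture per state.
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    d = x.shape[0]
    # Per-component joint log-density: log weight + diagonal Gaussian log-pdf.
    log_comp = (
        np.log(weights)
        - 0.5 * d * np.log(2 * np.pi)
        - 0.5 * np.sum(np.log(variances), axis=1)
        - 0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    )
    # Log-sum-exp over components for numerical stability.
    m = np.max(log_comp)
    return m + np.log(np.sum(np.exp(log_comp - m)))
```

The parameters of each mixture are usually fit with expectation-maximization, which is where the data-scarcity problem the abstract describes arises; the paper's kernel-based alternatives replace these generative state likelihoods with discriminatively trained, probabilistically calibrated scores.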