Conference Proceeding

Fast speaker adaptation of artificial neural networks for automatic speech recognition

TCTS-MULTITEL, Faculte Polytech. de Mons
In proceedings of: Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on, Volume: 3; 02/2000; DOI: 10.1109/ICASSP.2000.862102; ISBN: 0-7803-6293-4
Source: IEEE Xplore

ABSTRACT This paper presents a fast speaker adaptation technique for
automatic speech recognition systems that use artificial neural networks
(ANNs) to estimate hidden Markov model (HMM) state probabilities.
Speaker-adapted ANNs are first obtained from the training data using
affine transformations in the feature space. As in the “eigenvoice”
approach, principal component analysis (PCA) is then applied to these
transformation matrices. The first few eigenvectors span a
low-dimensional space that captures most of the inter-speaker
variability of the training set. During operation, these eigenvectors
constrain the optimization of the transformation matrix for each new
speaker. This optimization is performed by steepest descent, with
gradients obtained by backpropagation through the speaker-independent
ANN. We used state-of-the-art hybrid HMM/ANN systems trained on the
Phonebook database. Supervised adaptation experiments with different
amounts of data show that the new technique outperforms standard
linear regression in the feature space: with only 20 words of adaptation
data, results show a 15% relative decrease in the word error rate.
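
The pipeline outlined in the abstract (PCA over per-speaker affine transforms, then steepest descent restricted to the leading eigenvectors, with gradients backpropagated through a frozen network) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the one-hidden-layer tanh network, the cross-entropy loss, the `[A | b]` flattening, and the names `pca_basis`, `loss_and_input_grad`, and `adapt` are all assumptions made for the example.

```python
import numpy as np

def pca_basis(transforms, k):
    """PCA over flattened per-speaker transforms [A | b], shape (S, D*(D+1))."""
    mean = transforms.mean(axis=0)
    # Rows of vt are the principal directions of the centered transforms.
    _, _, vt = np.linalg.svd(transforms - mean, full_matrices=False)
    return mean, vt[:k]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_input_grad(net, x, labels):
    """Cross-entropy of a frozen MLP (assumed architecture) and its
    gradient with respect to the network inputs, via backpropagation."""
    W1, c1, W2, c2 = net
    h = np.tanh(x @ W1 + c1)
    p = softmax(h @ W2 + c2)
    n = x.shape[0]
    loss = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    dz = p.copy()
    dz[np.arange(n), labels] -= 1.0
    dz /= n
    dh = (dz @ W2.T) * (1.0 - h ** 2)
    return loss, dh @ W1.T  # gradient w.r.t. inputs, shape (N, D)

def adapt(net, mean, basis, x, labels, steps=50, lr=0.01):
    """Steepest descent on the k eigen-coefficients w of the transform
    theta = mean + w @ basis; the network weights are never updated."""
    d = x.shape[1]
    w = np.zeros(basis.shape[0])
    losses = []
    for _ in range(steps):
        theta = (mean + w @ basis).reshape(d, d + 1)
        A, b = theta[:, :d], theta[:, d]
        loss, g = loss_and_input_grad(net, x @ A.T + b, labels)
        losses.append(loss)
        # Chain rule: dL/dA = sum_n g_n x_n^T, dL/db = sum_n g_n.
        grad_theta = np.hstack([g.T @ x, g.sum(axis=0)[:, None]]).ravel()
        w -= lr * (basis @ grad_theta)  # project onto the eigen-directions
    return w, losses
```

Because only the `k` coefficients in `w` are estimated for a new speaker, rather than all `D*(D+1)` entries of the affine transform, far fewer adaptation frames are needed, which is the motivation the abstract gives for constraining the search to the eigenvector subspace.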


Keywords

15% relative decrease
affine transformations
automatic speech recognition systems
different amounts
fast speaker adaptation technique
feature space
gradients
inter-speaker variability
Markov models
new speakers
paper presents
PCA
principal components analysis
small-dimensional space
speaker independent ANN
Speaker-adapted ANNs
state-of-the-art hybrid HMM/ANN systems
training data
transformation matrices
word error rate