Isolated word recognition using modular recurrent neural networks
ABSTRACT This paper describes a novel method of using recurrent neural networks (RNNs) for isolated word recognition. Each word in the target vocabulary is modeled by a fully connected recurrent network. To recognize an input utterance, the best-matching word is determined based on the temporal output response of its network. The system is trained in two stages. First, the RNN speech models (RSMs) are trained independently to capture the essential static and temporal characteristics of individual words. This is done with an iterative re-segmentation training algorithm that automatically gives the optimal phonetic segmentation for each training utterance. The second stage involves mutually discriminative training among the RSMs, aimed at minimizing the probability of misclassification. A series of simulation experiments has been performed to demonstrate the effectiveness of the proposed recognition method. For the recognition of (A) 20 English words, (B) 11 Cantonese digits and (C) 58 Cantonese CV syllables, the top-1 accuracies are 91.9%, 93.6% and 87.1%, respectively.
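The recognition step summarized in the abstract (one recurrent network per vocabulary word, with the best-matching word chosen from the networks' temporal output responses) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the class name `WordRNN`, the random untrained weights, the use of unit 0 as the output unit, and the scoring rule (mean output activation over all frames). The two-stage training procedure is not shown.

```python
import math
import random


class WordRNN:
    """Toy fully connected recurrent network standing in for one word model."""

    def __init__(self, n_inputs, n_units, seed):
        rng = random.Random(seed)
        # Hypothetical random weights; a real model would be trained.
        # Every unit sees all inputs and every unit's previous activation.
        self.w_in = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_units)]
        self.w_rec = [[rng.uniform(-1, 1) for _ in range(n_units)]
                      for _ in range(n_units)]
        self.n_units = n_units

    def response(self, frames):
        """Return the per-frame activation of unit 0 (assumed output unit)."""
        state = [0.0] * self.n_units
        outputs = []
        for x in frames:
            new_state = []
            for i in range(self.n_units):
                net = sum(w * xi for w, xi in zip(self.w_in[i], x))
                net += sum(w * s for w, s in zip(self.w_rec[i], state))
                new_state.append(1.0 / (1.0 + math.exp(-net)))  # sigmoid
            state = new_state
            outputs.append(state[0])
        return outputs


def score(model, frames):
    """Assumed scoring rule: mean output activation over the utterance."""
    out = model.response(frames)
    return sum(out) / len(out)


def recognize(models, frames):
    """Return the vocabulary word whose network scores highest."""
    return max(models, key=lambda word: score(models[word], frames))


# Hypothetical three-word vocabulary and a three-frame feature sequence.
models = {w: WordRNN(n_inputs=3, n_units=4, seed=i)
          for i, w in enumerate(["yes", "no", "stop"])}
utterance = [[0.1, 0.5, -0.2], [0.3, 0.1, 0.0], [-0.4, 0.2, 0.6]]
print(recognize(models, utterance))
```

Because each word has its own network, the vocabulary can in principle be extended by adding a model rather than retraining a single monolithic classifier; with the untrained weights used here, the printed word is arbitrary.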