Conference Paper

Exploiting complementary aspects of phonological features in automatic speech recognition

McGill Univ., Montreal
DOI: 10.1109/ASRU.2007.4430082 Conference: Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Source: IEEE Xplore

ABSTRACT: This paper presents techniques for exploiting complementary information contained in multiple definitions of phonological feature systems. Three feature systems, differing in their structure and in the acoustic-phonetic features they represent, are considered. For each phonological feature system, a two-stage process is implemented: a mechanism for frame-level phonological feature detection, followed by a mechanism for decoding phoneme sequences from the detected features. Two methods are investigated for integrating these features with MFCC-based ASR systems. First, phonological feature and MFCC-based systems are combined in a lattice re-scoring paradigm. Second, confusion network based system combination (CNC) is used to combine phone networks derived from phonological distinctive feature (PDF) and MFCC-based systems. It is shown, using both methods, that phone error rates can be reduced by as much as 15% relative to the phone error rates obtained for any individual feature stream.

  • Source
    ABSTRACT: In this paper, we study the effect of using different phonological feature sets for detection-based automatic speech recognition in phone recognition tasks. Three phonological feature sets derived from different underlying phonological theories are investigated. Our experiments were conducted on the TIMIT database. By comparing the oracle phone recognition results achieved by assuming that all the phonological features are correctly detected based on each feature set, we show that selecting an appropriate phonological feature set is crucial to the performance of detection-based ASR. The highly accurate oracle phone recognition results show that the performance of the CRF-based backend, which is commonly used in detection-based ASR, is very satisfactory. Comparison of the oracle phone recognition results and the real phone recognition results indicates that investigation of high-accuracy front-end detectors is a key issue in improving the performance of detection-based ASR.
    Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on; 01/2009
  • Source
    ABSTRACT: Recent theoretical developments in neuroscience suggest that sublexical speech processing occurs via two parallel processing pathways. According to this Dual Stream Model of Speech Processing, speech is processed both as sequences of speech sounds and as sequences of articulations. We attempt to revise the “beads-on-a-string” paradigm of Hidden Markov Models in Automatic Speech Recognition (ASR) by implementing a system for dual stream speech recognition. A baseline recognition system is enhanced by modeling articulations as sequences of syllables. An efficient model complementary to HMMs is developed by formulating Dynamic Time Warping (DTW) as a probabilistic model. The DTW Model (DTWM) is improved by enriching syllable templates with constrained covariance matrices, data imputation, clustering and mixture modeling. The resulting dual stream system is evaluated on the N-Best Southern Dutch Broadcast News benchmark. Promising results are obtained for DTWM classification and ASR tests. We provide a discussion of the remaining problems in implementing dual stream speech recognition. Keywords: Syllabification, DTW, DTWM, Syllable, Articulatory, Dual stream model, Speech recognition
    International Journal of Speech Technology 12/2010; 13(4):219-230.
  • Source
    ABSTRACT: In this paper, we study the effect of using different phonological feature sets for detection-based automatic speech recognition in phone recognition tasks. Three phonological feature sets derived from different underlying phonological theories are investigated. Our experiments were conducted on the TIMIT database. By comparing the oracle phone recognition results achieved by assuming that all the phonological features are correctly detected based on each feature set, we show that selecting an appropriate phonological feature set is crucial to the performance of detection-based ASR. The highly accurate oracle phone recognition results show that the performance of the CRF-based backend, which is commonly used in detection-based ASR, is very satisfactory. Comparison of the oracle phone recognition results and the real phone recognition results indicates that investigation of high-accuracy front-end detectors is a key issue in improving the performance of detection-based ASR. Index Terms: Detection-based ASR, phonological feature system, result fusion, speech recognition. 1. INTRODUCTION Currently, detection-based automatic speech recognition (ASR) is a popular research topic in fields related to ASR. Because human beings often understand speech by integrating multiple knowledge sources from the bottom up, detection-based ASR systems attempt to reduce the gap between human speech
    01/2008;
