Sub-word modeling of out of vocabulary words in spoken term detection
ABSTRACT: This paper compares sub-word based methods for the spoken term detection (STD) task and for phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones and multigrams. The maximal length and pruning of multigrams were investigated first. Then two constrained methods of multigram training were proposed. We evaluated on the NIST STD06 dev-set CTS data. The conclusion is that the proposed method improves phone accuracy by more than 9% relative and STD accuracy by more than 7% relative.
Source available from: Dong Wang
ABSTRACT: A major challenge faced by a spoken term detection (STD) system is the detection of out-of-vocabulary (OOV) terms. Although a subword-based STD system is able to detect OOV terms, performance reduction is always observed compared to in-vocabulary terms. One challenge that OOV terms bring to STD is the pronunciation uncertainty. A commonly used approach to address this problem is a soft matching procedure, and the other is the stochastic pronunciation modelling (SPM) proposed by the authors. In this paper we compare these two approaches, and combine them using a discriminative decision strategy. Experimental results demonstrated that SPM and soft match are highly complementary, and their combination gives significant performance improvement to OOV term detection.
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on; 04/2010
Conference Paper: Stochastic pronunciation modelling for spoken term detection.
ABSTRACT: A major challenge faced by a spoken term detection (STD) system is the detection of out-of-vocabulary (OOV) terms. Although a subword-based STD system is able to detect OOV terms, performance reduction is always observed compared to in-vocabulary terms. Current approaches to STD do not acknowledge the particular properties of OOV terms, such as pronunciation uncertainty. In this paper, we use a stochastic pronunciation model to deal with the uncertain pronunciations of OOV terms. By considering all possible term pronunciations, predicted by a joint-multigram model, we observe a significant performance improvement.
Index Terms: joint-multigram, pronunciation model, spoken term detection, speech recognition
INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, September 6-10, 2009; 01/2009
Conference Paper: Discriminative score normalization for keyword search decision
ABSTRACT: Many keyword search (KWS) systems make “hit/false alarm (FA)” decisions based on the lattice-based posterior probability, which is incomparable across keywords. Therefore, score normalization is essential for a KWS system. In this paper, we investigate the integration of two novel features, ranking-score and relative-to-max, into a discriminative score normalization method. These features are extracted by considering all competing hypotheses of a putative detection. A metric-based normalization method is also applied as a post-processing step to further optimize the term-weighted value (TWV) evaluation metric. We report empirical improvements over standard baselines using the Vietnamese data from IARPA's Babel program in the NIST OpenKWS13 Evaluation setup.
ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 05/2014
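The term-weighted value (TWV) named in the last abstract can be illustrated with a minimal sketch. It assumes the commonly cited NIST STD 2006 formulation: per-term miss and false-alarm probabilities, averaged over terms that occur in the reference, with a false-alarm weight of 999.9. The names `TermCounts` and `term_weighted_value` and the counts in the usage note are hypothetical and are not taken from any of the papers listed here.

```python
from dataclasses import dataclass
from typing import Iterable

# False-alarm weight from the NIST STD 2006 setup (assumed here; other
# evaluations may use a different value).
BETA = 999.9


@dataclass
class TermCounts:
    """Hypothetical per-term tallies at one decision threshold."""
    n_true: int      # reference occurrences of the term
    n_correct: int   # detections that match a reference occurrence
    n_spurious: int  # detections that match nothing (false alarms)


def term_weighted_value(counts: Iterable[TermCounts],
                        speech_seconds: float,
                        beta: float = BETA) -> float:
    """Return 1 - mean(P_miss + beta * P_FA) over terms with n_true > 0."""
    # Terms that never occur in the reference have no defined P_miss,
    # so they are left out of the average.
    scored = [c for c in counts if c.n_true > 0]
    if not scored:
        raise ValueError("no term occurs in the reference")
    total = 0.0
    for c in scored:
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated as one per second of speech,
        # minus the true occurrences.
        p_fa = c.n_spurious / (speech_seconds - c.n_true)
        total += p_miss + beta * p_fa
    return 1.0 - total / len(scored)
```

With made-up counts, a system that finds every occurrence with no false alarms scores 1.0, and a system that outputs nothing scores 0.0. Because the false-alarm term is scaled by a large weight, even a few spurious detections of a rare term can pull the value down sharply, which is one reason per-keyword score normalization matters for this metric.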