Sub-word modeling of out of vocabulary words in spoken term detection
ABSTRACT This paper compares sub-word based methods for the spoken term detection (STD) task and for phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones and multigrams. The maximal length and pruning of multigrams were investigated first. Then two constrained methods of multigram training were proposed. We evaluated on the NIST STD06 dev-set CTS data. The conclusion is that the proposed method improves phone accuracy by more than 9% relative and STD accuracy by more than 7% relative.
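To make the role of the maximal multigram length and of pruning concrete, the following is a minimal sketch of segmenting a phone string into multigrams with a Viterbi search under a unigram multigram model. It is an illustration only, not the training procedure proposed in the paper: the function name `segment`, the parameter `max_len`, and the toy inventory with its probabilities are all hypothetical. Pruning corresponds here to simply removing units from the inventory.

```python
import math

def segment(phones, logprob, max_len=3):
    """Most likely split of `phones` into multigrams (variable-length phone units).

    `logprob` maps a tuple of phones to its log-probability (hypothetical
    inventory); units longer than `max_len` are never considered.
    """
    n = len(phones)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of phones[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last unit
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            unit = tuple(phones[start:end])
            if unit not in logprob or best[start] == -math.inf:
                continue  # unit pruned from the inventory, or prefix unreachable
            score = best[start] + logprob[unit]
            if score > best[end]:
                best[end], back[end] = score, start
    if best[n] == -math.inf:
        raise ValueError("no segmentation with the given inventory")
    # Follow the back-pointers from the end of the string to recover the units.
    units, end = [], n
    while end > 0:
        units.append(tuple(phones[back[end]:end]))
        end = back[end]
    return units[::-1]

# Toy inventory (assumed values, for illustration only).
toy = {
    ("k",): math.log(0.2),
    ("ae",): math.log(0.2),
    ("t",): math.log(0.2),
    ("k", "ae"): math.log(0.3),
    ("ae", "t"): math.log(0.1),
}
two_unit = segment(["k", "ae", "t"], toy, max_len=2)
one_unit = segment(["k", "ae", "t"], toy, max_len=1)
```

With `max_len=1` the segmentation falls back to single phones, which is why the maximal length directly controls how word-like the sub-word units can become.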
- Source: Available from Nancy F. Chen
- "Each audio of the speech corpus is automatically segmented and then passed to a large vocabulary continuous speech recognition (LVCSR) system to produce the corresponding word lattice. These lattices are then indexed using techniques such as weighted finite state transducer (WFST) framework or N-gram indexing. In search phase, keywords in the textual format are searched on the index to produce a list of putative detections."
Conference Paper: Discriminative score normalization for keyword search decision
ABSTRACT: Many keyword search (KWS) systems make “hit/false alarm (FA)” decisions based on the lattice-based posterior probability, which is incomparable across keywords. Therefore, score normalization is essential for a KWS system. In this paper, we investigate the integration of two novel features, ranking-score and relative-to-max, into a discriminative score normalization method. These features are extracted by considering all competing hypotheses of a putative detection. A metric-based normalization method is also applied as a post-processing step to further optimize the term-weighted value (TWV) evaluation metric. We report empirical improvements over standard baselines using the Vietnamese data from IARPA's Babel program in the NIST OpenKWS13 Evaluation setup.
ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 05/2014
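The term-weighted value named in the abstract above can be sketched as follows. This is a minimal illustration assuming the NIST STD 2006 style definition, TWV = 1 - mean over terms of (P_miss + β · P_FA) with β = 999.9 and one non-target trial per second of speech; the class `TermCounts`, the function `term_weighted_value`, and the example counts are hypothetical and not taken from the cited paper.

```python
from dataclasses import dataclass

@dataclass
class TermCounts:
    n_true: int         # reference occurrences of the term
    n_correct: int      # correct detections (hits)
    n_false_alarm: int  # spurious detections

def term_weighted_value(counts, speech_seconds, beta=999.9):
    """TWV = 1 - mean over terms of (P_miss + beta * P_FA).

    Assumes the NIST STD 2006 convention: the number of non-target
    trials for a term is `speech_seconds - n_true`.
    """
    costs = []
    for c in counts:
        if c.n_true == 0:
            continue  # terms with no reference occurrence are skipped
        p_miss = 1.0 - c.n_correct / c.n_true
        p_fa = c.n_false_alarm / (speech_seconds - c.n_true)
        costs.append(p_miss + beta * p_fa)
    if not costs:
        raise ValueError("no term has reference occurrences")
    return 1.0 - sum(costs) / len(costs)

# Hypothetical counts: one term fully detected, one term half detected.
twv_example = term_weighted_value(
    [TermCounts(10, 10, 0), TermCounts(4, 2, 0)], speech_seconds=3600.0
)
```

Because the cost is averaged per term rather than per occurrence, a rare keyword weighs as much as a frequent one, which is one reason raw posterior scores need normalization across keywords before a single decision threshold is applied.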
- "A typical STD system comprises an ASR subsystem for lattice generation and a STD subsystem for term detection, as illustrated in Figure 1. State-of-the-art STD systems include those reported in ."
ABSTRACT: A major challenge faced by a spoken term detection (STD) system is the detection of out-of-vocabulary (OOV) terms. Although a subword-based STD system is able to detect OOV terms, performance reduction is always observed compared to in-vocabulary terms. One challenge that OOV terms bring to STD is the pronunciation uncertainty. A commonly used approach to address this problem is a soft matching procedure, and the other is the stochastic pronunciation modelling (SPM) proposed by the authors. In this paper we compare these two approaches, and combine them using a discriminative decision strategy. Experimental results demonstrated that SPM and soft match are highly complementary, and their combination gives significant performance improvement to OOV term detection.
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on; 04/2010
- "We take this approach in the work reported here. Other types of subword units under investigation include word-fragments, particles, graphones, multigrams, syllables and graphemes. Both in-vocabulary (INV) terms and OOV terms can be retrieved in the same way by a subword-based system, based on their pronunciations."
Conference Paper: Stochastic pronunciation modelling for spoken term detection.
ABSTRACT: A major challenge faced by a spoken term detection (STD) system is the detection of out-of-vocabulary (OOV) terms. Although a subword-based STD system is able to detect OOV terms, performance reduction is always observed compared to in-vocabulary terms. Current approaches to STD do not acknowledge the particular properties of OOV terms, such as pronunciation uncertainty. In this paper, we use a stochastic pronunciation model to deal with the uncertain pronunciations of OOV terms. By considering all possible term pronunciations, predicted by a joint-multigram model, we observe a significant performance improvement.
Index Terms: joint-multigram, pronunciation model, spoken term detection, speech recognition
INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, September 6-10, 2009; 01/2009