Conference Paper

Sub-word modeling of out of vocabulary words in spoken term detection

Fac. of Inf. Technol., Brno Univ. of Technol., Brno
DOI: 10.1109/SLT.2008.4777893
Conference: Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Source: IEEE Xplore

ABSTRACT: This paper compares sub-word based methods for the spoken term detection (STD) task and for phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones and multigrams. The maximal length and pruning of multigrams were investigated first. Then two constrained methods of multigram training were proposed. We evaluated on the NIST STD06 dev-set CTS data. The conclusion is that the proposed method improves phone accuracy by more than 9% relative and STD accuracy by more than 7% relative.

  • ABSTRACT: In this paper, we focus on the problem of searching for out-of-vocabulary (OOV) terms in spoken term detection (STD). The phone-level fragment is adopted as the speech recognition decoding unit. Furthermore, we optimize the phone-level fragment in the speech recognition system by adding a word-position marker. An inverted triphone index is then built to implement fuzzy search for OOV terms. In the term detection confidence measure procedure, we present a method based on a multi-layer perceptron (MLP) to complement the lattice-based confidence measure. Experimental results indicate that the optimization of the fragment can give a 3% relative increase in Actual Term Weighted Value (ATWV) for OOV terms. The confidence measure based on the MLP could provide another relative improvement of 5.5% in ATWV.
    Audio, Language and Image Processing (ICALIP), 2012 International Conference on; 01/2012
  • ABSTRACT: This paper addresses the search for out-of-vocabulary (OOV) query terms in the spoken term detection (STD) task. The phone-level fragment with a word-position marker is adopted as the speech recognition decoding unit. The triphone confusion matrix (TriCM) is then used to expand the query space to compensate for speech recognition errors. We also propose a new approach to constructing the triphone confusion matrix, using a smoothing method similar to the Katz method to solve the data sparseness problem. Experimental results on the NIST STD06 eval-set conversational telephone speech (CTS) corpus indicate that the triphone confusion matrix can provide a relative improvement of 12% in actual term weighted value (ATWV).
    Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on; 01/2012
  • ABSTRACT: Spoken term detection (STD) is a task for open-vocabulary search in large recordings of speech. Although detection performance for in-vocabulary (INV) terms has improved greatly, detection performance for out-of-vocabulary (OOV) terms is still disappointing. In this paper, we propose to combine fragment-based and syllable-based search into a hybrid STD system for OOV terms. Syllables are knowledge-based sub-word units, whereas fragments are data-driven. We first compare their modeling ability for OOVs. Considering the potential complementarity between them, we explore two methods of fusion: index fusion (combining the triphone indexes of a fragment-based and a syllable-based system) and result fusion (merging the search results of the two systems). After result fusion, we achieve a 9.4% relative improvement in actual term weighted value (ATWV) on the NIST STD06 English conversational telephone speech (CTS) EvalSet.
    Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on; 01/2012
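
The abstracts above report their gains in actual term weighted value (ATWV). As a rough sketch of how that metric is usually computed, the block below assumes the NIST STD 2006 definition: one minus the average, over terms that occur in the reference, of the miss probability plus a weighted false-alarm probability. The names `TermCounts`, `atwv` and `relative_improvement` are hypothetical helpers for illustration, and the weight 999.9 is assumed from that evaluation plan; none of this is taken from the papers listed here.

```python
from dataclasses import dataclass

# Assumed NIST STD06 weight: (C/V) * (1/Pr_term - 1) with C/V = 0.1, Pr_term = 1e-4.
BETA = 999.9


@dataclass
class TermCounts:
    """Per-term scoring counts (hypothetical container for illustration)."""
    n_true: int      # occurrences of the term in the reference
    n_correct: int   # detections matching a reference occurrence
    n_spurious: int  # detections matching nothing (false alarms)


def atwv(counts, speech_seconds, beta=BETA):
    """Term weighted value at the system's actual decision threshold."""
    # Terms with no reference occurrence have an undefined miss rate and are skipped.
    scored = [c for c in counts if c.n_true > 0]
    if not scored:
        raise ValueError("no term occurs in the reference")
    penalty = 0.0
    for c in scored:
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = c.n_spurious / (speech_seconds - c.n_true)
        penalty += p_miss + beta * p_fa
    return 1.0 - penalty / len(scored)


def relative_improvement(baseline, improved):
    """Relative gain in percent, the form in which the abstracts quote results."""
    return 100.0 * (improved - baseline) / baseline
```

Under this definition a system that finds every occurrence with no false alarms scores 1.0, and a system that outputs nothing scores 0.0. Because of the large weight on false alarms, a single spurious detection of a rare term can cost more than several misses.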