Conference Proceeding

Sub-word modeling of out of vocabulary words in spoken term detection

Fac. of Inf. Technol., Brno Univ. of Technol., Brno
01/2009; DOI: 10.1109/SLT.2008.4777893; pp. 273-276. In proceedings of: Spoken Language Technology Workshop, 2008 (SLT 2008), IEEE
Source: IEEE Xplore

ABSTRACT This paper compares sub-word based methods for the spoken term detection (STD) task and for phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones, and multigrams. We first investigated the maximal length and pruning of multigrams, and then proposed two constrained methods of multigram training. We evaluated on the NIST STD06 dev-set CTS data. We conclude that the proposed method improves phone accuracy by more than 9% relative and STD accuracy by more than 7% relative.
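
To make the roles of "maximal length" and "pruning" concrete, the following is a minimal sketch of how a phone string can be split into multigrams (variable-length phone sequences) by a Viterbi search over a unigram multigram inventory. It is a generic illustration, not the training procedure proposed in the paper: the function name `segment`, the toy inventory, and all probabilities are assumptions made up for this example. Here `max_len` stands in for the maximal multigram length, and pruning is modelled simply by a unit being absent from the inventory.

```python
import math

def segment(phones, logprob, max_len=3):
    """Viterbi segmentation of a phone sequence into multigram units.

    phones  : list of phone symbols
    logprob : dict mapping tuples of phones to log-probabilities
              (hypothetical inventory; pruned units are simply absent)
    max_len : maximal multigram length in phones
    """
    n = len(phones)
    best = [-math.inf] * (n + 1)  # best[i]: best score for phones[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last unit
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            unit = tuple(phones[start:end])
            if unit in logprob and best[start] + logprob[unit] > best[end]:
                best[end] = best[start] + logprob[unit]
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("no segmentation possible with this inventory")
    # Follow the back-pointers from the end to recover the units.
    units, i = [], n
    while i > 0:
        units.append(tuple(phones[back[i]:i]))
        i = back[i]
    return units[::-1], best[n]

# Toy inventory with made-up probabilities (assumption, not from the paper).
toy_logprob = {
    ("k",): math.log(0.1),
    ("ae",): math.log(0.1),
    ("t",): math.log(0.1),
    ("k", "ae"): math.log(0.2),
    ("ae", "t"): math.log(0.05),
    ("k", "ae", "t"): math.log(0.01),
}

units, score = segment(["k", "ae", "t"], toy_logprob, max_len=3)
single_units, _ = segment(["k", "ae", "t"], toy_logprob, max_len=1)
```

With this toy inventory the best split is `("k", "ae")` followed by `("t",)`, while `max_len=1` reduces the multigrams to plain phones, the other sub-word unit compared in the abstract.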

Keywords: 7% relative; 9% relative; constrained methods; maximal length; multigram training; multigrams; NIST STD06 dev-set CTS data; out-of-vocabulary words; phone accuracy; phone recognition; phones; sub-word; sub-word units; words