ABSTRACT: We present a style-specific language model adaptation method for Korean conversational speech recognition. Compared with written text corpora, conversational speech differs in content and style, showing filled pauses, word omission, and contraction; in Korean spontaneous speech these phenomena are related to function words and depend on the preceding or following words. Since obtaining sufficient data for training a language model is often difficult in a conversational domain, language model adaptation with large out-of-domain data is useful. For style-specific language model adaptation, we first estimate an in-domain dependent n-gram model by relevance weighting of out-of-domain text data according to style and content similarity. Here, style is represented by n-gram based tf*idf similarity. Second, we train an in-domain language model that includes a disfluency model. Recognition results show that n-gram based tf*idf similarity weighting effectively reflects style differences and that disfluencies can serve as good predictors of the neighboring words.
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on; 11/2003
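
The relevance-weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the smoothed idf, the use of cosine similarity, the choice of bigrams, and the helper names (`tfidf_vectors`, `relevance_weights`, `weighted_ngram_counts`) are hypothetical and are not taken from the paper. The idea shown is only that out-of-domain documents whose n-gram tf*idf profile resembles the in-domain data contribute more to the adapted n-gram counts.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Map each tokenized document to a sparse {ngram: tf*idf} vector.

    Assumption: idf is smoothed as log((1 + N) / (1 + df)) + 1.
    """
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    num_docs = len(docs)
    return [
        {g: tf * (math.log((1 + num_docs) / (1 + doc_freq[g])) + 1.0)
         for g, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[g] for g, w in u.items() if g in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance_weights(in_domain, out_docs, n=2):
    """Weight each out-of-domain document by its similarity to in-domain text."""
    vectors = tfidf_vectors([in_domain] + out_docs, n)
    return [cosine(vectors[0], v) for v in vectors[1:]]


def weighted_ngram_counts(out_docs, weights, n=2):
    """Accumulate n-gram counts, scaling each document by its relevance weight."""
    totals = Counter()
    for doc, weight in zip(out_docs, weights):
        for gram, count in Counter(ngrams(doc, n)).items():
            totals[gram] += weight * count
    return totals


# Toy English data standing in for Korean text (illustrative only).
in_domain = "uh i mean you know it is fine".split()
out_docs = [
    "you know i mean it is okay".split(),
    "the committee approved the annual budget report".split(),
]
weights = relevance_weights(in_domain, out_docs)
counts = weighted_ngram_counts(out_docs, weights)
```

In this toy run the conversational document shares bigrams with the in-domain sample and receives a positive weight, while the written-style document shares none and is weighted to zero. The weighted counts could then feed any standard n-gram estimator; the disfluency model mentioned in the abstract is not covered by this sketch.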