ABSTRACT: Standard statistical language models use n-grams to capture local
dependencies, or use dynamic modeling techniques to track dependencies
within an article. In this paper, we investigate a new statistical
language model that captures topic-related dependencies of words within
and across sentences. First, we develop a topic-dependent,
sentence-level mixture language model which takes advantage of the topic
constraints in a sentence or article. Second, we introduce
topic-dependent dynamic adaptation techniques in the framework of the
mixture model, using n-gram caches and content word unigram caches.
Experiments with the static (or unadapted) mixture model on the North
American Business (NAB) task show a 21% reduction in perplexity and a
3-4% improvement in recognition accuracy over a general n-gram model,
giving a larger gain than that obtained with supervised dynamic cache
modeling. Further experiments on the Switchboard corpus also showed a
small improvement in performance with the sentence-level mixture model.
Cache modeling techniques introduced in the mixture framework
contributed a further 14% reduction in perplexity and a small
improvement in recognition accuracy on the NAB task for both supervised
and unsupervised adaptation.
IEEE Transactions on Speech and Audio Processing, 02/1999 · 2.29 Impact Factor
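
To make the idea of a sentence-level mixture concrete, the following is a minimal sketch and not the authors' implementation. It assumes each topic is represented by a toy bigram table, that the mixture weights are fixed, and that unseen bigrams receive a small floor probability in place of real smoothing. The names `sentence_log_prob`, `topic_models`, `topic_weights`, and `floor` are hypothetical. The point it illustrates is that the topic weight is applied once per sentence, so a single topic must account for every word in that sentence, rather than the components being re-mixed at each word.

```python
import math

def sentence_log_prob(words, topic_models, topic_weights, floor=1e-6):
    """Log-probability of one sentence under a sentence-level mixture.

    Computes log( sum_k weight_k * prod_i P_k(w_i | w_{i-1}) ).
    topic_models: list of dicts mapping (prev_word, word) -> probability
                  (toy bigram tables; an assumption of this sketch).
    topic_weights: positive mixture weights that sum to 1.
    floor: stand-in probability for unseen bigrams (not real smoothing).
    """
    per_topic = []
    for bigrams, weight in zip(topic_models, topic_weights):
        logp = math.log(weight)          # topic weight enters once per sentence
        prev = "<s>"                     # sentence-start symbol
        for w in words:
            logp += math.log(bigrams.get((prev, w), floor))
            prev = w
        per_topic.append(logp)
    # log-sum-exp over topics for numerical stability
    m = max(per_topic)
    return m + math.log(sum(math.exp(lp - m) for lp in per_topic))

def perplexity(sentences, topic_models, topic_weights):
    """Per-word perplexity of a list of tokenized sentences."""
    total_logp = sum(
        sentence_log_prob(s, topic_models, topic_weights) for s in sentences
    )
    n_words = sum(len(s) for s in sentences)
    return math.exp(-total_logp / n_words)

# Toy two-topic example with made-up probabilities.
finance = {("<s>", "stocks"): 0.5, ("stocks", "rose"): 0.4}
sports = {("<s>", "stocks"): 0.01, ("stocks", "rose"): 0.01}
sentence = ["stocks", "rose"]
prob = math.exp(sentence_log_prob(sentence, [finance, sports], [0.5, 0.5]))
ppl = perplexity([sentence], [finance, sports], [0.5, 0.5])
```

In this toy case the sentence probability is 0.5 * (0.5 * 0.4) + 0.5 * (0.01 * 0.01), so the finance component dominates. The cache-based adaptation mentioned in the abstract is not modeled here.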