Publications (2)
ABSTRACT: Maximum entropy (ME) techniques have been used successfully to combine different sources of linguistically meaningful constraints in language models. However, most current ME models can be applied only to small corpora, since the computational cost of training them on large corpora is prohibitive; the problem is especially severe when non-local dependencies are considered. In this paper, we show how to train and use topic-dependent ME models efficiently for a very large corpus, Broadcast News (BN). Training time is greatly reduced by hierarchical training and divide-and-conquer approaches, and computation at recognition time is simplified by pre-normalizing the denominators of the ME model. We report new speech recognition results showing improvement with the topic model over the standard N-gram model on the Broadcast News task.
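The pre-normalization idea can be illustrated with the usual conditional maximum entropy (log-linear) form P(w | h) = exp(Σ_i λ_i f_i(h, w)) / Z(h): the denominator Z(h) is computed once per history equivalence class offline, so evaluating the model at recognition time requires no sum over the vocabulary. Below is a minimal Python sketch under that assumption; the feature set, weights, and function names are illustrative only, not the features or training procedure used in the paper.

```python
import math

def features(history, word):
    """Active (feature_name, value) pairs for a (history, word) event.
    Hypothetical unigram/bigram features for illustration."""
    feats = [("unigram:" + word, 1.0)]
    if history:
        feats.append(("bigram:" + history[-1] + "_" + word, 1.0))
    return feats

def score(weights, history, word):
    """Sum of lambda_i * f_i(h, w) over the active features."""
    return sum(weights.get(name, 0.0) * val for name, val in features(history, word))

def precompute_Z(weights, vocab, histories):
    """Pre-normalize: cache the denominator Z(h) for every history class,
    so probability lookups at recognition time avoid summing over the vocabulary."""
    return {h: sum(math.exp(score(weights, h, w)) for w in vocab) for h in histories}

def prob(weights, Z, history, word):
    """P(w | h) using the precomputed normalizer for this history class."""
    return math.exp(score(weights, history, word)) / Z[history]
```

With Z cached this way, each lookup costs only the handful of active features for (h, w), which is the sense in which pre-normalizing the denominators simplifies computation when the model is used.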
ABSTRACT: We present a maximum entropy approach to topic-sensitive language modeling. By classifying the training data into different parts according to topic, extracting topic-sensitive unigram features, and combining these new features with conventional N-grams, we build a topic-sensitive bigram model. This model improves both perplexity and word error rate.

Keywords: maximum entropy, language modeling, topic dependency

1 Introduction
The goal of a language model is to assign a probability distribution over the words that may follow a given word history. Since it is impractical to enumerate all possible histories, we need to classify them into equivalence classes to reduce model complexity. In conventional N-gram models, which are based on a Markov assumption, the equivalence classes are determined by the last (N-1) words of the history, so all suffixes of that (N-1)-word string are regarded as features of the model. One of the drawbacks of N-gram models is that they...
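To make the feature combination concrete, here is a rough sketch of a topic-sensitive log-linear model in the same spirit: conventional unigram and bigram features are combined with topic-conditioned unigram features that fire when a word occurs under a given topic. The feature names, topic labels, and toy normalization below are illustrative assumptions, not the paper's implementation or training recipe.

```python
import math

def topic_features(topic, history, word):
    """Conventional N-gram features plus a topic-sensitive unigram feature
    (hypothetical names, for illustration only)."""
    feats = [("uni:" + word, 1.0)]
    if history:
        feats.append(("bi:" + history[-1] + "|" + word, 1.0))
    # Topic-sensitive unigram feature: fires when this word appears under this topic.
    feats.append(("topic_uni:" + topic + ":" + word, 1.0))
    return feats

def log_linear_prob(weights, vocab, topic, history, word):
    """P(w | h, topic) under a conditional maximum entropy (log-linear) model."""
    def s(w):
        return sum(weights.get(f, 0.0) * v for f, v in topic_features(topic, history, w))
    z = sum(math.exp(s(w)) for w in vocab)  # normalizer over the vocabulary
    return math.exp(s(word)) / z
```

Because all features live in one log-linear model, the topic-dependent unigram evidence and the standard bigram evidence are combined through the trained weights rather than by ad hoc interpolation, which is the appeal of the maximum entropy formulation described in the abstract.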