Jun Wu

Johns Hopkins University, Baltimore, MD, United States

Publications (9)

0 Total impact

  • ABSTRACT: The Structured Language Model (SLM) recently introduced by Chelba and Jelinek is a powerful general formalism for exploiting syntactic dependencies in a left-to-right language model for applications such as speech and handwriting recognition, spelling correction, machine translation, etc. Unlike traditional N-gram models, optimal smoothing techniques -- discounting methods and hierarchical structures for back-off -- are still being developed for the SLM. In the SLM, the statistical dependencies of a word on immediately preceding words, preceding syntactic heads, non-terminal labels, etc., are parameterized as overlapping N-gram dependencies. Statistical dependencies in the parser and tagger used by the SLM also have N-gram like structure. Deleted interpolation has been used to combine these N-gram like models. We demonstrate on two different corpora -- WSJ and Switchboard -- that more recent modified back-off strategies and nonlinear interpolation methods considerably lower the perplexity of the SLM. Improvement in word error rate is also demonstrated on the Switchboard corpus.
    12/2002;
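    A minimal sketch of deleted interpolation for plain N-grams, as mentioned in the abstract above; this is an illustration only, not the SLM's actual parameterization, and the interpolation weights are placeholders that would normally be tuned on held-out data.

    ```python
    from collections import Counter

    def train_counts(sentences):
        """Collect unigram, bigram, and trigram counts from tokenized sentences."""
        uni, bi, tri = Counter(), Counter(), Counter()
        for words in sentences:
            padded = ["<s>", "<s>"] + list(words) + ["</s>"]
            uni.update(padded)  # padding symbols included, which is fine for a sketch
            for i in range(2, len(padded)):
                w2, w1, w = padded[i - 2], padded[i - 1], padded[i]
                bi[(w1, w)] += 1
                tri[(w2, w1, w)] += 1
        return uni, bi, tri

    def interpolated_prob(w, w1, w2, uni, bi, tri, lambdas=(0.6, 0.3, 0.1)):
        """Deleted interpolation: a linear mixture of the relative-frequency
        estimates of each N-gram order (the weights here are illustrative)."""
        l3, l2, l1 = lambdas
        total = sum(uni.values())
        p1 = uni[w] / total if total else 0.0
        p2 = bi[(w1, w)] / uni[w1] if uni[w1] else 0.0
        p3 = tri[(w2, w1, w)] / bi[(w2, w1)] if bi[(w2, w1)] else 0.0
        return l3 * p3 + l2 * p2 + l1 * p1
    ```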
  • Source
    ABSTRACT: Maximum entropy (ME) techniques have been successfully used to combine different sources of linguistically meaningful constraints in language models. However, most of the current ME models can only be used for small corpora, since the computational load in training ME models for large corpora is unbearable. This problem is especially severe when non-local dependencies are considered. In this paper, we show how to train and use topic-dependent ME models efficiently for a very large corpus, Broadcast News (BN). The training time is greatly reduced by hierarchical training and divide-and-conquer approaches. The computation in using the model is also simplified by pre-normalizing the denominators of the ME model. We report new speech recognition results showing improvement with the topic model relative to the standard N-gram model for the Broadcast News task.
    11/2001;
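    The pre-normalization trick described above can be read as follows for a conditional ME (log-linear) model: the denominator Z(h) depends only on the history, so it can be computed once per distinct history and cached. A rough sketch, assuming a toy feature set (one bigram feature plus one topic-unigram feature); the feature templates and history layout are illustrative, not the paper's.

    ```python
    import math

    def score(history, word, weights):
        """Sum the weights of the active features for (history, word).
        `history` is assumed to be a (topic_label, previous_word) pair."""
        topic, prev = history
        return (weights.get(("bigram", prev, word), 0.0)
                + weights.get(("topic_unigram", topic, word), 0.0))

    def precompute_normalizers(histories, vocab, weights):
        """Pre-normalization: cache Z(h) for every distinct history seen in
        training, so applying the model later is a lookup plus one exponentiation."""
        return {h: sum(math.exp(score(h, w, weights)) for w in vocab) for h in histories}

    def prob(word, history, weights, z_table):
        """Conditional ME model: P(w | h) = exp(score(h, w)) / Z(h)."""
        return math.exp(score(history, word, weights)) / z_table[history]
    ```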
  • Source
    ABSTRACT: The Structured Language Model (SLM) recently introduced by Chelba and Jelinek is a powerful general formalism for exploiting syntactic dependencies in a left-to-right language model for applications such as speech and handwriting recognition, spelling correction, machine translation, etc. Unlike traditional N-gram models, optimal smoothing techniques - discounting methods and hierarchical structures for back-off - are still being developed for the SLM. In the SLM, the statistical dependencies of a word on immediately preceding words, preceding syntactic heads, non-terminal labels, etc., are parameterized as overlapping N-gram dependencies. Statistical dependencies in the parser and tagger used by the SLM also have N-gram like structure. Deleted interpolation has been used to combine these N-gram like models. We demonstrate on two different corpora - WSJ and Switchboard - that more recent modified back-off strategies and nonlinear interpolation methods considerably lower the perplexity of the SLM. Improvement in word error rate is also demonstrated on the Switchboard corpus.
    EUROSPEECH 2001 Scandinavia, 7th European Conference on Speech Communication and Technology, 2nd INTERSPEECH Event, Aalborg, Denmark, September 3-7, 2001; 01/2001
  • Source
    ABSTRACT: Maximum entropy language modeling techniques combine different sources of statistical dependence, such as syntactic relationships, topic cohesiveness and collocation frequency, in a unified and effective language model. These techniques, however, are also computationally very intensive, particularly during model estimation, compared to the more prevalent alternative of interpolating several simple models, each capturing one type of dependency. In this paper we present ways which significantly reduce this complexity by reorganizing the required computations. We show that in the case of a model with N-gram constraints, each iteration of the parameter estimation algorithm requires the same amount of computation as estimating a comparable back-off N-gram model. In general, the computational cost of each iteration in model estimation is linear in the number of distinct "histories" seen in the training corpus, times a model-class dependent factor. The reorganization focuses mainly on reducing this...
    11/2000;
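    The "linear in the number of distinct histories" point above can be illustrated by how the model-expected feature counts for one estimation iteration are accumulated: loop over each distinct history once, weighted by its frequency, instead of over every training position. The feature template and the `prob` callback below are hypothetical stand-ins.

    ```python
    from collections import Counter

    def expected_feature_counts(history_counts, vocab, prob):
        """Model-expected feature counts for one estimation iteration.
        `history_counts` maps each distinct history to its training frequency;
        `prob(word, history)` is any callable returning P(w | h)."""
        expectations = Counter()
        for history, count in history_counts.items():
            for word in vocab:
                p = prob(word, history)
                # One illustrative feature template: (previous word, next word).
                expectations[("bigram", history[-1], word)] += count * p
        return expectations
    ```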
  • Source
    Jun Wu, S. Khudanpur
    ABSTRACT: The use of syntactic structure in general and heads of syntactic constituents in particular has recently been shown to be beneficial for statistical language modeling. The paper provides an insightful analysis of this role of syntactic structure. It is shown that the predictive power of syntactic heads is mostly complementary to the predictive power of N-grams: they help in positions where an intervening phrase or clause separates the heads from the word being predicted, making the N-gram a poor predictor. Furthermore, a significant portion of this predictive power comes in the form of a more sophisticated back-off effect via the syntactic categories (nonterminal tags) of the heads. Finally, it is shown that using the categories of the syntactic heads is better than using the categories (part-of-speech tags) of the two preceding words, confirming that it is the syntactic analysis and not just the improved back-off strategy which leads to improvements over N-gram models. Experimental results for perplexity and word error rate are presented on the Switchboard corpus to support this analysis.
    Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on; 02/2000
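    A toy illustration (not from the paper's data) of the kind of position described above, where an intervening clause makes the N-gram context a poor predictor while the exposed syntactic head stays informative:

    ```python
    def ngram_context(words, i, n=3):
        """The (n-1) words immediately preceding position i (the trigram context)."""
        return tuple(words[max(0, i - n + 1):i])

    # Predicting the verb "barked" after an intervening relative clause.
    sentence = "the dog that chased the cat barked".split()
    i = sentence.index("barked")
    print(ngram_context(sentence, i))  # ('the', 'cat') -- a misleading predictor
    # A head-conditioned model instead sees "dog", the head of the noun phrase
    # "the dog that chased the cat", which is a far better predictor of "barked".
    ```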
  • Source
    ABSTRACT: A new statistical language model is presented which combines collocational dependencies with two important sources of long-range statistical dependence: the syntactic structure and the topic of a sentence. These dependencies or constraints are integrated using the maximum entropy technique. Substantial improvements are demonstrated over a trigram model in both perplexity and speech recognition accuracy on the Switchboard task. A detailed analysis of the performance of this language model is provided in order to characterize the manner in which it performs better than a standard N-gram model. It is shown that topic dependencies are most useful in predicting words which are semantically related by the subject matter of the conversation. Syntactic dependencies on the other hand are found to be most helpful in positions where the best predictors of the following word are not within N-gram range due to an intervening phrase or clause. It is also shown that these two methods individually enhance an N-gram model in complementary ways and the overall improvement from their combination is nearly additive.
    Computer Speech & Language. 01/2000;
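    The combination described above differs from interpolating separate models: in a maximum entropy model, the N-gram, syntactic, and topic constraints contribute factors to a single distribution under one normalizer. A minimal sketch of that form, where `features(h, w)` is a hypothetical callable returning the active (feature_name, value) pairs from the three templates:

    ```python
    import math

    def me_prob(word, history, weights, features, vocab):
        """Single ME model: each active feature contributes a factor
        exp(weight * value), so the constraint sources combine multiplicatively
        under one normalizer rather than being linearly interpolated."""
        def unnorm(w):
            return math.exp(sum(weights.get(name, 0.0) * value
                                for name, value in features(history, w)))
        z = sum(unnorm(w) for w in vocab)
        return unnorm(word) / z
    ```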
  • Source
    ABSTRACT: A new language model is presented which incorporates local N-gram dependencies with two important sources of long-range dependencies: the syntactic structure and the topic of a sentence. These dependencies or constraints are integrated using the maximum entropy method. Substantial improvements are demonstrated over a trigram model in both perplexity and speech recognition accuracy on the Switchboard task. It is shown that topic dependencies are most useful in predicting words which are semantically related by the subject matter of the conversation. Syntactic dependencies on the other hand are found to be most helpful in positions where the best predictors of the following word are not within N-gram range due to an intervening phrase or clause. It is also shown that these two methods individually enhance an N-gram model in complementary ways and the overall improvement from their combination is nearly additive.
    05/1999;
  • Source
    S. Khudanpur, Jun Wu
    ABSTRACT: A compact language model which incorporates local dependencies in the form of N-grams and long distance dependencies through dynamic topic conditional constraints is presented. These constraints are integrated using the maximum entropy principle. Issues in assigning a topic to a test utterance are investigated. Recognition results on the Switchboard corpus are presented showing that with a very small increase in the number of model parameters, reductions in word error rate and language model perplexity are achieved over trigram models. Some analysis follows, demonstrating that the gains are even larger on content-bearing words. The results are compared with those obtained by interpolating topic-independent and topic-specific N-gram models. The framework presented here extends easily to incorporate other forms of statistical dependencies such as syntactic word-pair relationships or hierarchical topic constraints.
    Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing 03/1999; 1:553-556.
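    A rough sketch of the two ingredients discussed above: assigning a topic to a test utterance (here simply by maximum unigram log-likelihood, which is one plausible scheme rather than necessarily the paper's) and the interpolation baseline that the ME model is compared against (the interpolation weight is an illustrative value).

    ```python
    import math

    def assign_topic(utterance, topic_unigrams, floor=1e-7):
        """Pick the topic whose unigram model gives the utterance the highest
        log-likelihood; `topic_unigrams` maps topic -> {word: probability}."""
        def loglik(model):
            return sum(math.log(model.get(w, floor)) for w in utterance)
        return max(topic_unigrams, key=lambda t: loglik(topic_unigrams[t]))

    def interpolated_topic_prob(word, history, p_topic, p_general, lam=0.3):
        """Baseline: linear interpolation of a topic-specific N-gram model
        with a topic-independent one."""
        return lam * p_topic(word, history) + (1 - lam) * p_general(word, history)
    ```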
  • Source
    ABSTRACT: We present a maximum entropy approach to topic sensitive language modeling. By classifying the training data into different parts according to topic, extracting topic sensitive unigram features, and combining these new features with conventional N-grams in language modeling, we build a topic sensitive bigram model. This model improves both perplexity and word error rate. Keywords: maximum entropy, language modeling, topic dependency.
    07/1998;
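    The "topic sensitive unigram features" step above can be sketched as selecting (topic, word) pairs whose in-topic relative frequency clearly exceeds the corpus-wide one; the ratio test and threshold below are simple stand-ins for whatever selection criterion the paper actually used.

    ```python
    from collections import Counter

    def topic_unigram_features(docs_by_topic, threshold=2.0):
        """Return (topic, word) pairs whose in-topic relative frequency is at
        least `threshold` times the corpus-wide relative frequency."""
        topic_counts = {t: Counter(w for doc in docs for w in doc)
                        for t, docs in docs_by_topic.items()}
        global_counts = Counter()
        for counts in topic_counts.values():
            global_counts.update(counts)
        g_total = sum(global_counts.values())
        features = []
        for topic, counts in topic_counts.items():
            t_total = sum(counts.values())
            for w, n in counts.items():
                if n / t_total >= threshold * (global_counts[w] / g_total):
                    features.append((topic, w))
        return features
    ```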

Publication Stats

136 Citations

Institutions

  • 1999–2000
    • Johns Hopkins University
      • Center for Speech and Language Processing
      Baltimore, MD, United States