A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation.

Journal of Machine Learning Research - Proceedings Track 01/2009; 5:607-614.
Source: DBLP

ABSTRACT: In this paper we present a doubly hierarchical Pitman-Yor process language model. Its bottom layer of hierarchy consists of multiple hierarchical Pitman-Yor process language models, one each for some number of domains. The novel top layer of hierarchy consists of a mechanism to couple together multiple language models such that they share statistical strength. Intuitively, this sharing results in the "adaptation" of a latent shared language model to each domain. We introduce a general formalism capable of describing the overall model, which we call the graphical Pitman-Yor process, and explain how to perform Bayesian inference in it. We present encouraging language model domain adaptation results that both illustrate the potential benefits of our new model and suggest new avenues of inquiry.
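The basic building block in such models is the Pitman-Yor predictive probability, which interpolates discounted counts with a base distribution (the base distribution is where statistical strength is shared between levels of the hierarchy). The sketch below shows only this standard two-parameter predictive rule; the function name, data layout, and the use of per-word table counts are illustrative assumptions, not the paper's implementation:

```python
def pitman_yor_predictive(counts, tables, discount, strength, base):
    """Predictive probability under a two-parameter Pitman-Yor process.

    counts[w]  - number of observations of word w in this restaurant
    tables[w]  - number of tables serving w (Chinese-restaurant bookkeeping)
    discount   - Pitman-Yor discount parameter d, 0 <= d < 1
    strength   - concentration parameter theta
    base(w)    - base-measure probability of w (the parent distribution)
    """
    total_c = sum(counts.values())
    total_t = sum(tables.values())

    def prob(w):
        c = counts.get(w, 0)
        t = tables.get(w, 0)
        # Discounted count mass plus mass routed to the base distribution.
        return ((c - discount * t) +
                (strength + discount * total_t) * base(w)) / (strength + total_c)

    return prob
```

With no observations the predictive distribution falls back to the base measure exactly, which is what lets an empty domain inherit the shared language model.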

  • ABSTRACT: Word usage is influenced by diverse factors, including topic, genre and various speaker/author characteristics. To characterize these aspects of language, we introduce the “Multi-Factor Sparse Plus Low Rank” exponential language model, which allows supervised joint training of arbitrary overlapping factor-specific model components. This flexible architecture has the advantage of being highly interpretable. The elements of sparse parameter matrices can be viewed as factor-dependent corrections (e.g. topic- or speaker-dependent phenomena). In topic modeling experiments on conversational telephone speech, we obtain modest perplexity reductions over an n-gram baseline and demonstrate topic-dependent keyword extraction that leads to a 13% (absolute) improvement in precision over TFIDF. We also show how keywords can be jointly learned for speakers, roles and topics in a study of Supreme Court oral arguments.
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on; 01/2013
  • ABSTRACT: We present power low rank ensembles, a flexible framework for n-gram language modeling where ensembles of low rank matrices and tensors are used to obtain smoothed probability estimates of words in context. Our method is a generalization of n-gram modeling to non-integer n, and includes standard techniques such as absolute discounting and Kneser-Ney smoothing as special cases. On English and Russian evaluation sets, we obtain noticeably lower perplexities relative to state-of-the-art modified Kneser-Ney and class-based n-gram models.
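For context on the special cases mentioned in this abstract, interpolated absolute discounting subtracts a fixed discount from each observed bigram count and redistributes the freed mass to a lower-order backoff. The following is a minimal sketch of that standard technique only (not the power low rank method itself), with an illustrative maximum-likelihood unigram backoff:

```python
from collections import Counter

def absolute_discount_bigram(tokens, d=0.75):
    """Interpolated absolute-discounting bigram model.

    P(w | prev) = max(c(prev, w) - d, 0) / c(prev)
                  + lambda(prev) * P_unigram(w)
    where lambda(prev) = d * N1+(prev) / c(prev) recycles the
    discounted mass, and N1+(prev) counts distinct continuations.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def prob(w, prev):
        c_prev = unigrams[prev]
        if c_prev == 0:
            return unigrams[w] / total  # unseen context: pure backoff
        n_continuations = sum(1 for (p, _) in bigrams if p == prev)
        lam = d * n_continuations / c_prev
        discounted = max(bigrams[(prev, w)] - d, 0) / c_prev
        return discounted + lam * (unigrams[w] / total)

    return prob
```

Because the discounted mass is exactly returned through the interpolation weight, the conditional distribution still sums to one over the vocabulary.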

