A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation.

Journal of Machine Learning Research - Proceedings Track 01/2009; 5:607-614.
Source: DBLP

ABSTRACT In this paper we present a doubly hierarchi- cal Pitman-Yor process language model. Its bottom layer of hierarchy consists of multi- ple hierarchical Pitman-Yor process language models, one each for some number of do- mains. The novel top layer of hierarchy con- sists of a mechanism to couple together mul- tiple language models such that they share statistical strength. Intuitively this sharing results in the "adaptation" of a latent shared language model to each domain. We intro- duce a general formalism capable of describ- ing the overall model which we call the graph- ical Pitman-Yor process and explain how to perform Bayesian inference in it. We present encouraging language model domain adapta- tion results that both illustrate the potential benefits of our new model and suggest new avenues of inquiry.

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Traditional n -gram language models are widely used in state-of-the-art large vocabulary speech recognition systems. This simple model suffers from some limitations, such as overfitting of maximum-likelihood estimation and the lack of rich contextual knowledge sources. In this paper, we exploit a hierarchical Bayesian interpretation for language modeling, based on a nonparametric prior called Pitman-Yor process. This offers a principled approach to language model smoothing, embedding the power-law distribution for natural language. Experiments on the recognition of conversational speech in multiparty meetings demonstrate that by using hierarchical Bayesian language models, we are able to achieve significant reductions in perplexity and word error rate.
    IEEE Transactions on Audio Speech and Language Processing 12/2010; · 1.68 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: We propose an unbounded-depth, hierarchi- cal, Bayesian nonparametric model for dis- crete sequence data. This model can be estimated from a single training sequence, yet shares statistical strength between subse- quent symbol predictive distributions in such a way that predictive performance general- izes well. The model builds on a specific pa- rameterization of an unbounded-depth hier- archical Pitman-Yor process. We introduce analytic marginalization steps (using coagu- lation operators) to reduce this model to one that can be represented in time and space linear in the length of the training sequence. We show how to perform inference in such a model without truncation approximation and introduce fragmentation operators nec- essary to do predictive inference. We demon- strate the sequence memoizer by using it as a language model, achieving state-of-the-art results.
    Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009; 01/2009
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: We present power low rank ensembles, a flexible framework for n-gram language modeling where ensembles of low rank matrices and tensors are used to obtain smoothed probability estimates of words in context. Our method is a generalization of n-gram modeling to non-integer n, and includes standard techniques such as absolute discounting and Kneser-Ney smoothing as special cases. On English and Russian evaluation sets, we obtain noticeably lower perplexities relative to state-of-the-art modified Kneser-Ney and class-based n-gram models.


Available from