A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation.

Journal of Machine Learning Research - Proceedings Track 01/2009; 5:607-614.
Source: DBLP

ABSTRACT In this paper we present a doubly hierarchical Pitman-Yor process language model. Its bottom layer of hierarchy consists of multiple hierarchical Pitman-Yor process language models, one each for some number of domains. The novel top layer of hierarchy consists of a mechanism to couple together multiple language models such that they share statistical strength. Intuitively this sharing results in the "adaptation" of a latent shared language model to each domain. We introduce a general formalism capable of describing the overall model which we call the graphical Pitman-Yor process and explain how to perform Bayesian inference in it. We present encouraging language model domain adaptation results that both illustrate the potential benefits of our new model and suggest new avenues of inquiry.
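The building block shared by the models above is the two-parameter Pitman-Yor process. As a rough illustration (not the paper's implementation), the predictive probability of a word in one Pitman-Yor "restaurant" combines a discounted count term with a back-off to a parent distribution; in the hierarchical model that parent is itself another Pitman-Yor restaurant. The function and variable names below are our own:

```python
def pitman_yor_predictive(word, counts, tables, base, discount=0.5, strength=1.0):
    """Predictive probability of `word` under a two-parameter Pitman-Yor restaurant.

    counts: dict word -> number of customers (observed tokens)
    tables: dict word -> number of tables serving that word
    base:   dict word -> back-off probability (parent distribution)
    """
    c_total = sum(counts.values())
    t_total = sum(tables.values())
    c_w = counts.get(word, 0)
    t_w = tables.get(word, 0)
    # Seated-customer term: each of the t_w tables gives up a discount d...
    direct = max(c_w - discount * t_w, 0.0) / (strength + c_total)
    # ...and the held-out mass (theta + d * t_total) is sent to the parent.
    backoff = (strength + discount * t_total) / (strength + c_total) * base[word]
    return direct + backoff

# Toy restaurant over a three-word vocabulary with a uniform parent.
vocab = ["a", "b", "c"]
base = {w: 1.0 / len(vocab) for w in vocab}
counts = {"a": 3, "b": 1}
tables = {"a": 2, "b": 1}
probs = {w: pitman_yor_predictive(w, counts, tables, base) for w in vocab}
```

Note that the discount moves probability mass from frequent words toward the back-off distribution, which is what gives Pitman-Yor models their power-law behavior.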

  • Source
    ABSTRACT: Traditional n-gram language models are widely used in state-of-the-art large vocabulary speech recognition systems. This simple model suffers from some limitations, such as overfitting of maximum-likelihood estimation and the lack of rich contextual knowledge sources. In this paper, we exploit a hierarchical Bayesian interpretation for language modeling, based on a nonparametric prior called Pitman-Yor process. This offers a principled approach to language model smoothing, embedding the power-law distribution for natural language. Experiments on the recognition of conversational speech in multiparty meetings demonstrate that by using hierarchical Bayesian language models, we are able to achieve significant reductions in perplexity and word error rate.
    IEEE Transactions on Audio, Speech, and Language Processing, 12/2010.
  • Source
    ABSTRACT: Exact Bayesian network inference exists for Gaussian and multinomial distributions. For other kinds of distributions, approximations or restrictions on the kind of inference done are needed. In this paper we present generalized networks of Dirichlet distributions, and show how, using the two-parameter Poisson-Dirichlet distribution and Gibbs sampling, one can do approximate inference over them. This involves integrating out the probability vectors but leaving auxiliary discrete count vectors in their place. We illustrate the technique by extending standard topic models to "structured" documents, where the document structure is given by a Bayesian network of Dirichlets.
  • Source
    ABSTRACT: We present power low rank ensembles, a flexible framework for n-gram language modeling where ensembles of low rank matrices and tensors are used to obtain smoothed probability estimates of words in context. Our method is a generalization of n-gram modeling to non-integer n, and includes standard techniques such as absolute discounting and Kneser-Ney smoothing as special cases. On English and Russian evaluation sets, we obtain noticeably lower perplexities relative to state-of-the-art modified Kneser-Ney and class-based n-gram models.
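The last abstract notes that absolute discounting arises as a special case of the power low rank framework. As a point of reference for that baseline, here is a minimal sketch (our illustration, with hypothetical names) of interpolated absolute discounting for a bigram model: each observed bigram count is reduced by a fixed discount d, and the freed-up mass is redistributed to a lower-order unigram model:

```python
from collections import Counter

def absolute_discount_bigram(prev, word, bigram_counts, unigram_probs, d=0.75):
    """Interpolated absolute-discounting estimate of P(word | prev)."""
    context = {w: c for (p, w), c in bigram_counts.items() if p == prev}
    total = sum(context.values())
    if total == 0:
        return unigram_probs[word]  # unseen context: fall back entirely
    n_types = len(context)  # distinct continuations observed after `prev`
    discounted = max(context.get(word, 0) - d, 0.0) / total
    # The held-out mass d * n_types / total interpolates with the unigram model.
    lam = d * n_types / total
    return discounted + lam * unigram_probs[word]

# Toy corpus; counts are gathered directly from adjacent token pairs.
tokens = "a b a b a c".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)
total_uni = sum(unigrams.values())
unigram_probs = {w: c / total_uni for w, c in unigrams.items()}
p = {w: absolute_discount_bigram("a", w, bigram_counts, unigram_probs) for w in unigrams}
```

Kneser-Ney smoothing refines this by replacing the unigram probabilities with continuation counts; the paper's framework generalizes both by allowing non-integer interpolation orders.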

