# A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation.

Journal of Machine Learning Research - Proceedings Track 01/2009; 5:607-614.

Source: DBLP

Traditional n-gram language models are widely used in state-of-the-art large vocabulary speech recognition systems. This simple model suffers from some limitations, such as overfitting of maximum-likelihood estimation and the lack of rich contextual knowledge sources. In this paper, we exploit a hierarchical Bayesian interpretation for language modeling, based on a nonparametric prior called the Pitman-Yor process. This offers a principled approach to language model smoothing that captures the power-law distribution of natural language. Experiments on the recognition of conversational speech in multiparty meetings demonstrate that by using hierarchical Bayesian language models, we are able to achieve significant reductions in perplexity and word error rate.
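
To make the smoothing idea concrete, the following is a minimal sketch of the standard Pitman-Yor predictive probability in its Chinese restaurant representation, for a single context. It is not taken from the paper: the function name, the argument names, and the toy counts are assumptions chosen for illustration, and a full hierarchical model would obtain `base_prob` recursively from the shorter context rather than from a fixed uniform distribution.

```python
def pitman_yor_predictive(word, customers, tables, discount, strength, base_prob):
    """Predictive probability of `word` under one Pitman-Yor restaurant.

    customers: dict mapping word -> number of customers (token count)
    tables:    dict mapping word -> number of tables serving that word
    discount:  discount parameter d, with 0 <= d < 1
    strength:  strength parameter theta, with theta > -d
    base_prob: probability of `word` under the base (back-off) distribution
    All names here are illustrative assumptions, not an API from the paper.
    """
    total_customers = sum(customers.values())
    total_tables = sum(tables.values())
    if total_customers == 0:
        # An empty restaurant falls back entirely on the base distribution.
        return base_prob

    # Each table serving `word` has the discount subtracted from its count.
    seen_mass = max(customers.get(word, 0) - discount * tables.get(word, 0), 0.0)
    # The mass removed by discounting, plus the strength, goes to the base.
    backoff_mass = strength + discount * total_tables
    return (seen_mass + backoff_mass * base_prob) / (strength + total_customers)


# Toy example: four-word vocabulary with a uniform base distribution.
vocab = ["a", "b", "c", "d"]
customers = {"a": 3, "b": 1}
tables = {"a": 2, "b": 1}
probs = {
    w: pitman_yor_predictive(w, customers, tables, 0.5, 1.0, 1.0 / len(vocab))
    for w in vocab
}
```

In this toy setting the probabilities over the vocabulary sum to one, and unseen words receive mass only through the base distribution, which is the behaviour that links the Pitman-Yor process to discount-based smoothing.
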
##### Conference Paper: A stochastic memoizer for sequence data.

We propose an unbounded-depth, hierarchical, Bayesian nonparametric model for discrete sequence data. This model can be estimated from a single training sequence, yet shares statistical strength between subsequent symbol predictive distributions in such a way that predictive performance generalizes well. The model builds on a specific parameterization of an unbounded-depth hierarchical Pitman-Yor process. We introduce analytic marginalization steps (using coagulation operators) to reduce this model to one that can be represented in time and space linear in the length of the training sequence. We show how to perform inference in such a model without truncation approximation and introduce fragmentation operators necessary to do predictive inference. We demonstrate the sequence memoizer by using it as a language model, achieving state-of-the-art results.

We present power low rank ensembles, a flexible framework for n-gram language modeling where ensembles of low rank matrices and tensors are used to obtain smoothed probability estimates of words in context. Our method is a generalization of n-gram modeling to non-integer n, and includes standard techniques such as absolute discounting and Kneser-Ney smoothing as special cases. On English and Russian evaluation sets, we obtain noticeably lower perplexities relative to state-of-the-art modified Kneser-Ney and class-based n-gram models.
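
For reference, the sketch below shows plain interpolated absolute discounting for a bigram model, one of the standard techniques the abstract names as a special case. It is not the power low rank ensemble method itself. The function name, the default discount of 0.75, and the toy corpus are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_absolute_discounting(tokens, discount=0.75):
    """Return prob(word, prev) for an interpolated absolute-discounting bigram model.

    Illustrative sketch only: backs off to the maximum-likelihood unigram
    distribution and assumes 0 < discount < 1.
    """
    unigram = Counter(tokens)
    total = sum(unigram.values())
    bigram = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        bigram[prev][word] += 1

    def prob(word, prev):
        p_unigram = unigram[word] / total
        followers = bigram.get(prev)
        if not followers:
            # Unseen context: use the unigram estimate directly.
            return p_unigram
        context_count = sum(followers.values())
        # Subtract a fixed discount from every observed bigram count.
        discounted = max(followers[word] - discount, 0.0) / context_count
        # Redistribute the removed mass according to the unigram distribution.
        backoff_weight = discount * len(followers) / context_count
        return discounted + backoff_weight * p_unigram

    return prob


# Toy corpus; the tokens are arbitrary placeholders.
tokens = "a b a c a b".split()
prob = train_bigram_absolute_discounting(tokens)
p_b_given_a = prob("b", "a")
```

Kneser-Ney smoothing keeps the same discounting step but replaces the unigram back-off with a distribution based on how many distinct contexts each word follows.
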

