
# A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation

Journal of Machine Learning Research - Proceedings Track 01/2009; 5:607-614.

Source: DBLP


**ABSTRACT:** Traditional n-gram language models are widely used in state-of-the-art large-vocabulary speech recognition systems. This simple model suffers from limitations such as the overfitting of maximum-likelihood estimation and the lack of rich contextual knowledge sources. In this paper, we exploit a hierarchical Bayesian interpretation of language modeling, based on a nonparametric prior called the Pitman-Yor process. This offers a principled approach to language model smoothing, embedding the power-law distribution of natural language. Experiments on the recognition of conversational speech in multiparty meetings demonstrate that by using hierarchical Bayesian language models we are able to achieve significant reductions in perplexity and word error rate.

IEEE Transactions on Audio, Speech and Language Processing, 12/2010; Impact Factor: 1.68.
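The power-law smoothing the abstract describes can be sketched as a recursive back-off: each context discounts its observed counts and interpolates with its parent (shorter) context. A minimal Python illustration, using the common one-table-per-word-type approximation to the hierarchical Pitman-Yor process; the function name and the values of the discount `d` and concentration `theta` are illustrative, not from the paper:

```python
from collections import Counter

def py_prob(word, context, counts, vocab_size, d=0.75, theta=1.0):
    """Pitman-Yor-style smoothed P(word | context), backing off recursively.

    counts: dict mapping a context tuple to a Counter of next-word counts.
    The base of the hierarchy (context=None) is uniform over the vocabulary.
    """
    if context is None:
        return 1.0 / vocab_size
    parent = None if len(context) == 0 else context[1:]
    c = counts.get(context, Counter())
    total = sum(c.values())
    if total == 0:  # unseen context: defer entirely to the parent
        return py_prob(word, parent, counts, vocab_size, d, theta)
    types = len(c)  # distinct words seen after this context ("tables")
    discounted = max(c[word] - d, 0.0) / (theta + total)
    backoff_weight = (theta + d * types) / (theta + total)
    return discounted + backoff_weight * py_prob(word, parent, counts,
                                                 vocab_size, d, theta)

# Toy bigram model from a tiny corpus.
corpus = "the cat sat on the mat the cat ate".split()
counts = {(): Counter(corpus)}
for prev, cur in zip(corpus, corpus[1:]):
    counts.setdefault((prev,), Counter())[cur] += 1

vocab = set(corpus)
p = py_prob("cat", ("the",), counts, len(vocab))
```

Because every level gives up `d` units of mass per observed word type, frequent events are discounted and the freed mass flows toward the power-law tail via the parent distribution; summing `py_prob` over the vocabulary for any fixed context yields 1.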
##### Conference Paper: A stochastic memoizer for sequence data


**ABSTRACT:** We propose an unbounded-depth, hierarchical, Bayesian nonparametric model for discrete sequence data. This model can be estimated from a single training sequence, yet shares statistical strength between subsequent symbol predictive distributions in such a way that predictive performance generalizes well. The model builds on a specific parameterization of an unbounded-depth hierarchical Pitman-Yor process. We introduce analytic marginalization steps (using coagulation operators) to reduce this model to one that can be represented in time and space linear in the length of the training sequence. We show how to perform inference in such a model without truncation approximation and introduce the fragmentation operators necessary to do predictive inference. We demonstrate the sequence memoizer by using it as a language model, achieving state-of-the-art results.

Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), Montreal, Quebec, Canada, June 14-18, 2009.
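The unbounded-depth idea can be illustrated, very loosely, by backing off from the longest context present in a single training string with a Pitman-Yor-style discount at each level. This toy sketch ignores the memoizer's linear-space suffix-tree representation and its coagulation/fragmentation machinery; all names and the discount value are assumptions for illustration:

```python
from collections import Counter

def next_symbol_counts(train, context):
    """Counts of symbols that follow `context` anywhere in the training string."""
    k = len(context)
    return Counter(train[i + k] for i in range(len(train) - k)
                   if train[i:i + k] == context)

def unbounded_prob(symbol, history, train, alphabet, d=0.5):
    """Discounted back-off over contexts of unbounded depth (toy sketch)."""
    if history is None:  # base of the hierarchy: uniform over the alphabet
        return 1.0 / len(alphabet)
    parent = None if len(history) == 0 else history[1:]
    c = next_symbol_counts(train, history)
    total = sum(c.values())
    if total == 0:  # context never seen: defer to the next-shorter context
        return unbounded_prob(symbol, parent, train, alphabet, d)
    backoff_weight = d * len(c) / total
    return max(c[symbol] - d, 0.0) / total + \
        backoff_weight * unbounded_prob(symbol, parent, train, alphabet, d)

train = "abracadabra"
alphabet = sorted(set(train))
p = unbounded_prob("c", "abra", train, alphabet)
```

Prediction conditions on as much of the history as the single training sequence supports, then smoothly falls back through every shorter suffix down to a uniform base; the recursion keeps each conditional distribution normalized.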

**ABSTRACT:** We present power low rank ensembles, a flexible framework for n-gram language modeling in which ensembles of low-rank matrices and tensors are used to obtain smoothed probability estimates of words in context. Our method is a generalization of n-gram modeling to non-integer n, and includes standard techniques such as absolute discounting and Kneser-Ney smoothing as special cases. On English and Russian evaluation sets, we obtain noticeably lower perplexities relative to state-of-the-art modified Kneser-Ney and class-based n-gram models.

12/2013.
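Two of the ingredients, an element-wise power applied to counts followed by a low-rank approximation, can be sketched on a bigram count matrix. This is a toy analogue of the framework, not the paper's ensemble construction; the function name and parameter defaults are illustrative:

```python
import numpy as np

def low_rank_smooth(counts, power=0.5, rank=1):
    """Smooth a bigram count matrix: element-wise power, then a truncated-SVD
    rank-`rank` approximation, renormalized into row-stochastic form."""
    M = np.power(counts.astype(float), power)       # "power" step on the counts
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approximation
    approx = np.clip(approx, 0.0, None)             # clip any tiny negatives
    return approx / approx.sum(axis=1, keepdims=True)

# Zero counts receive mass from rows with a similar co-occurrence profile.
counts = np.array([[4, 0, 1],
                   [0, 2, 0],
                   [1, 1, 1]])
P = low_rank_smooth(counts)
```

With `rank=1` every row becomes proportional to the same dominant profile, a unigram-like back-off; larger ranks stay closer to the raw (power-transformed) counts, which is the sense in which such a family can interpolate between standard smoothing schemes.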
