Article
Low Rank Language Models for Small Training Sets
Department of Electrical Engineering, University of Washington, Seattle, WA, USA
IEEE Signal Processing Letters (impact factor: 1.39), 10/2011, pp. 489-492
DOI: 10.1109/LSP.2011.2160850
Keywords
joint probability distributions
language model smoothing techniques
low rank language model
low rank tensor representation
optimizes likelihood
out-of-domain model
perplexity reduction
rank constraint
standard smoothing techniques
tasks