Conference Proceeding

Improvements to the Sequence Memoizer.

01/2010; In Proceedings of: Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, held 6-9 December 2010, Vancouver, British Columbia, Canada.
Source: DBLP
  • ABSTRACT: A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model.
    Advances in Neural Information Processing Systems 13 (NIPS 2000), Denver, CO, USA, 2000. (A hedged code sketch of such a joint embedding/next-word model appears after this list.)
  • ABSTRACT: The implementation of collapsed Gibbs samplers for non-parametric Bayesian models is non-trivial, requiring considerable book-keeping. Goldwater et al. (2006a) presented an approximation which significantly reduces the storage and computation overhead, but we show here that their formulation was incorrect and, even after correction, is grossly inaccurate. We present an alternative formulation which is exact and can be computed easily. However, this approach does not work for hierarchical models, for which case we present an efficient data structure which has a better space complexity than the naive approach.
    ACL 2009, Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2-7 August 2009, Singapore, Short Papers. (A generic illustration of the table book-keeping involved appears after this list.)
  • ABSTRACT: We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stick-breaking process, and a generalization of the Chinese restaurant process that we refer to as the "Chinese restaurant franchise." We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
    Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476), 1566-1581. (A truncated sketch of the weight-sharing construction appears after this list.)
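
The first abstract above describes learning a distributed representation for each word jointly with a next-word probability function. The following is a minimal, hedged sketch of that idea in NumPy: the vocabulary size, context length, and layer sizes are illustrative choices (not values from the paper), and only the forward pass is shown, with no training loop.

```python
import numpy as np

# Hypothetical sizes for illustration only: vocabulary, embedding dimension,
# number of context words, hidden units.
V, d, n_ctx, h = 1000, 30, 2, 50

rng = np.random.default_rng(0)
C = rng.normal(scale=0.1, size=(V, d))           # one embedding row per word
W1 = rng.normal(scale=0.1, size=(n_ctx * d, h))  # concatenated context -> hidden
b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, V))          # hidden -> next-word scores
b2 = np.zeros(V)

def next_word_probs(context_ids):
    """Distribution over the next word given the previous n_ctx word ids."""
    x = np.concatenate([C[i] for i in context_ids])  # look up and concatenate embeddings
    hidden = np.tanh(x @ W1 + b1)
    scores = hidden @ W2 + b2
    scores -= scores.max()                           # softmax with numerical stability
    p = np.exp(scores)
    return p / p.sum()

# Example: distribution over the next word after the (hypothetical) context (12, 7).
p = next_word_probs([12, 7])
print(p.shape, round(float(p.sum()), 6))   # (1000,) 1.0
```

Because the embeddings C are shared across all contexts, unseen word sequences made of words similar to seen ones receive similar hidden activations, which is the generalization mechanism the abstract refers to.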
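The second abstract concerns the book-keeping needed when implementing collapsed samplers for Chinese-restaurant-process based models. As a generic illustration only (not that paper's corrected formulation or its compact data structure), the sketch below shows the "naive" representation such improvements address: every table's customer count is stored and scanned explicitly when seating a customer under a Pitman-Yor prior.

```python
import random

def pitman_yor_seating(n_customers, discount=0.5, concentration=1.0, seed=0):
    """Seat customers sequentially; return the list of per-table counts."""
    rng = random.Random(seed)
    tables = []                      # tables[k] = number of customers at table k
    for n in range(n_customers):
        total = n + concentration
        u = rng.random() * total
        # Existing table k is chosen with probability (count_k - discount) / total.
        for k, count in enumerate(tables):
            u -= count - discount
            if u < 0:
                tables[k] += 1
                break
        else:
            # New table with probability (concentration + discount * #tables) / total.
            tables.append(1)
    return tables

counts = pitman_yor_seating(1000)
print(len(counts), "tables; largest table has", max(counts), "customers")
```

Storing every table count per context is what makes the naive approach memory-hungry in hierarchical models; the paper's contribution is an exact reformulation and a more space-efficient data structure for exactly this kind of state.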
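The third abstract describes the hierarchical Dirichlet process, in which group-level mixing weights are tied through a shared global measure. Below is a hedged, truncated sketch of that sharing: the global weights come from stick-breaking, and each group's weights are drawn centred on them. The truncation level and the finite-Dirichlet approximation for the group-level draw are simplifications made here for brevity; they are not the paper's inference algorithm, which uses the Chinese restaurant franchise and related MCMC schemes.

```python
import numpy as np

rng = np.random.default_rng(1)

def gem_weights(gamma, K):
    """Truncated stick-breaking weights beta_1..beta_K, leftover mass folded into the last atom."""
    v = rng.beta(1.0, gamma, size=K)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    beta = v * remaining
    beta[-1] += 1.0 - beta.sum()
    return beta

# Illustrative hyperparameters: global and group-level concentrations,
# truncation level, number of groups.
gamma, alpha, K, n_groups = 1.0, 5.0, 20, 3

beta = gem_weights(gamma, K)                     # global weights over shared atoms
pi = rng.dirichlet(alpha * beta, size=n_groups)  # each group's weights, centred on beta

# Every group places mass on the same K shared atoms, so mixture components
# are shared across groups while each group keeps its own proportions.
print(beta.round(3))
print(pi.round(3))
```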
