Conference Paper

Improvements to the Sequence Memoizer.

Conference: Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada.
Source: DBLP

ABSTRACT

The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the "mysterious" coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements.
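The abstract refers to the hierarchical Pitman-Yor machinery underlying the sequence memoizer. As a point of reference, below is a minimal sketch of the standard Pitman-Yor predictive recursion such models build on: prediction in a context interpolates between the counts stored at that context and the prediction of its parent (the context with the oldest symbol dropped). The dictionary-based representation and the names c, t, d, theta are illustrative assumptions, not the paper's implementation; in particular, the memory-efficient representation proposed in the paper is not reproduced here.

```python
# Minimal illustrative sketch (not the paper's implementation) of the
# Pitman-Yor predictive recursion behind hierarchical Pitman-Yor / sequence
# memoizer models.  c[u][s]: customer counts, t[u][s]: table counts,
# d: discount, theta: concentration ("strength"), base: base distribution.
# d and theta are shared across contexts here for brevity; in the actual
# models they are depth-dependent.

def predict(u, s, c, t, d, theta, base):
    """Return P(s | context u), backing off to successively shorter contexts."""
    if u is None:                                  # backed off past the empty context
        return base(s)
    parent = None if len(u) == 0 else u[1:]        # drop the oldest symbol
    c_u = sum(c.get(u, {}).values())
    t_u = sum(t.get(u, {}).values())
    c_us = c.get(u, {}).get(s, 0)
    t_us = t.get(u, {}).get(s, 0)
    parent_prob = predict(parent, s, c, t, d, theta, base)
    if c_u == 0:                                   # nothing observed here: defer to parent
        return parent_prob
    return (max(c_us - d * t_us, 0.0)
            + (theta + d * t_u) * parent_prob) / (theta + c_u)
```

In the original sequence memoizer the strength parameters are fixed to zero and only the discounts are free; the enlarged range of hyperparameters mentioned in the abstract relaxes this restriction while keeping inference tractable.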

Full-text preview available from: pascal-network.org
  • Source
    • "However , it has been shown that going beyond finite order in a Markov model improves language modelling because natural language embodies a large array of long range depepndencies (Wood et al., 2009a). While infinite order Markov models have been extensively explored for language modelling (Gasthaus and Teh, 2010; Wood et al., 2011), this has not yet been done for structure prediction. "
    ABSTRACT: Linguistic structures exhibit a rich array of global phenomena; however, commonly used Markov models are unable to describe these phenomena adequately due to their strong locality assumptions. We propose a novel hierarchical model for structured prediction over sequences and trees which exploits global context by conditioning each generation decision on an unbounded context of prior decisions. This builds on the success of Markov models but without imposing a fixed bound, in order to better represent global phenomena. To facilitate learning of this large and unbounded model, we use a hierarchical Pitman-Yor process prior which provides a recursive form of smoothing. We propose prediction algorithms based on A* and Markov Chain Monte Carlo sampling. Empirical results demonstrate the potential of our model compared to baseline finite-context Markov models on part-of-speech tagging and syntactic parsing.
    Full-text · Article · Mar 2015
  • Source
    • "Thus, the sequence memoizers employed in this work comprise a HPYP model with its strength parameters h n set equal to zero, h n = 0, "n (note though that a wider family of distributions can be also considered for similar computational efficiency purposes (Gasthaus and Teh, 2011)). Then, given a context u, the sequence memoizer determines which of its prefixes i.e., p(u),p(p(u)), and so on, has no other children than the one appearing within u, and removes them from the recursions in (5), based on Theorem 1. "
    ABSTRACT: Generative models for sequential data are usually based on the assumption of temporal dependencies described by a first-order Markov chain. To ameliorate this shallow modeling assumption, several authors have proposed models with higher-order dependencies. However, the practical applicability of these approaches is hindered by their prohibitive computational costs in most cases. In addition, most existing approaches give rise to model training algorithms with objective functions that entail multiple spurious local optima, thus requiring application of tedious countermeasures to avoid getting trapped in bad model estimates. In this paper, we devise a novel margin-maximizing model with a convex objective function that allows for capturing infinitely long temporal dependencies in sequential datasets. This is effected by utilizing a recently proposed nonparametric Bayesian model of label sequences with infinitely long temporal dependencies, namely the sequence memoizer, and training our model using margin maximization and a versatile mean-field-like approximation to allow for increased computational efficiency. As we experimentally demonstrate, the devised margin-maximizing construction of our model, which leads to a convex optimization scheme without any spurious local optima, combined with the capacity of our model to capture long and complex temporal dependencies, allows for exceptional pattern recognition performance in several applications.
    Full-text · Dataset · Sep 2013
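The prefix-removal step described in the quoted passage above amounts to collapsing non-branching, unobserved nodes of the context tree, so that only branching and observed contexts are represented explicitly and the predictive recursion can skip straight over the removed nodes. Below is a minimal sketch of such a compaction on a plain dictionary-based trie; the Node layout and the collapse function are illustrative assumptions, not code from the cited papers or from the sequence memoizer itself.

```python
# Illustrative sketch: path-compress a context trie by splicing out internal
# nodes that are unobserved and have exactly one child, as in the
# prefix-removal step quoted above.  Layout and names are assumptions.

class Node:
    def __init__(self):
        self.children = {}   # edge label (tuple of symbols) -> child Node
        self.counts = {}     # sufficient statistics stored at this context

def collapse(node):
    """Recursively splice out chain nodes (one child, no counts of their own),
    concatenating the skipped symbols onto the edge label, so the recursion
    over prefixes can jump directly between branching contexts (cf. Theorem 1
    in the quoted passage)."""
    new_children = {}
    for label, child in node.children.items():
        collapse(child)                                  # compact the subtree first
        while len(child.children) == 1 and not child.counts:
            (sub_label, grandchild), = child.children.items()
            label = label + sub_label                    # remember the skipped symbols
            child = grandchild
        new_children[label] = child
    node.children = new_children
    return node
```

This is the standard path-compression idea behind compacted suffix trees; in the sequence memoizer it is the coagulation and fragmentation properties mentioned in the main abstract that justify marginalizing out the removed nodes without changing the model's predictions.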
  • Source
    • "Thus, the sequence memoizers employed in this work comprise a HPYP model with its strength parameters n set equal to zero, n ¼ 0, 8n (note, though, that a wider family of distributions can be also considered for similar computational efficiency purposes [11]). Then, given a context u u u u, the sequence memoizer determines which of its prefixes i.e., ðu u u uÞ, ððu u u uÞÞ, and so on, has no other children than the one appearing within u u u u, and removes them from the recursions in (19), based on Theorem 1. "
    ABSTRACT: Sequential data labeling is a fundamental task in machine learning applications, with speech and natural language processing, activity recognition in video sequences, and biomedical data analysis being characteristic examples, to name just a few. The conditional random field (CRF), a log-linear model representing the conditional distribution of the observation labels, is one of the most successful approaches for sequential data labeling and classification, and has lately received significant attention in machine learning as it achieves superb prediction performance in a variety of scenarios. Nevertheless, existing CRF formulations can capture only one- or few-timestep interactions and neglect higher-order dependences, which are potentially useful in many real-life sequential data modeling applications. To resolve these issues, in this paper we introduce a novel CRF formulation, based on the postulation of an energy function which entails infinitely long time-dependences between the modeled data. Building blocks of our novel approach are: 1) the sequence memoizer (SM), a recently proposed nonparametric Bayesian approach for modeling label sequences with infinitely long time dependences, and 2) a mean-field-like approximation of the model marginal likelihood, which allows for the derivation of computationally efficient inference algorithms for our model. The efficacy of the so-obtained infinite-order CRF (CRF∞) model is experimentally demonstrated.
    Full-text · Dataset · Aug 2013
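For concreteness, here is a schematic of what an "infinite-order" CRF conditional looks like when each potential is allowed to see the entire label history, in contrast to a k-th order CRF whose potentials see only the previous k labels. This is a generic illustration written from the abstract above, not the authors' exact energy function or notation.

```latex
% Schematic only: potentials may depend on the whole label history y_{1:t-1},
% whereas a k-th order CRF restricts them to y_{t-k:t-1}.
p(y_{1:T} \mid x_{1:T})
  = \frac{1}{Z(x_{1:T})}
    \exp\Big( \sum_{t=1}^{T} \psi\big(y_t,\, y_{1:t-1},\, x_{1:T}\big) \Big),
\qquad
Z(x_{1:T}) = \sum_{y'_{1:T}} \exp\Big( \sum_{t=1}^{T} \psi\big(y'_t,\, y'_{1:t-1},\, x_{1:T}\big) \Big).
```

As described in the abstract, the role of the sequence memoizer in this construction is to supply the infinitely long dependence on the label history in a form that the mean-field-like approximation can handle efficiently.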