Conference Paper
Improvements to the Sequence Memoizer.
Conference: Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada.
Source: DBLP

 "However , it has been shown that going beyond finite order in a Markov model improves language modelling because natural language embodies a large array of long range depepndencies (Wood et al., 2009a). While infinite order Markov models have been extensively explored for language modelling (Gasthaus and Teh, 2010; Wood et al., 2011), this has not yet been done for structure prediction. "
ABSTRACT: Linguistic structures exhibit a rich array of global phenomena; however, commonly used Markov models are unable to adequately describe these phenomena due to their strong locality assumptions. We propose a novel hierarchical model for structured prediction over sequences and trees which exploits global context by conditioning each generation decision on an unbounded context of prior decisions. This builds on the success of Markov models but without imposing a fixed bound, in order to better represent global phenomena. To facilitate learning of this large and unbounded model, we use a hierarchical Pitman-Yor process prior which provides a recursive form of smoothing. We propose prediction algorithms based on A* and Markov Chain Monte Carlo sampling. Empirical results demonstrate the potential of our model compared to baseline finite-context Markov models on part-of-speech tagging and syntactic parsing.
 "Thus, the sequence memoizers employed in this work comprise a HPYP model with its strength parameters h n set equal to zero, h n = 0, "n (note though that a wider family of distributions can be also considered for similar computational efficiency purposes (Gasthaus and Teh, 2011)). Then, given a context u, the sequence memoizer determines which of its prefixes i.e., p(u),p(p(u)), and so on, has no other children than the one appearing within u, and removes them from the recursions in (5), based on Theorem 1. "
Dataset: Margin-maximizing classification of sequential data with infinitely-long temporal dependencies
ABSTRACT: Generative models for sequential data are usually based on the assumption of temporal dependencies described by a first-order Markov chain. To ameliorate this shallow modeling assumption, several authors have proposed models with higher-order dependencies. However, the practical applicability of these approaches is hindered by their prohibitive computational costs in most cases. In addition, most existing approaches give rise to model training algorithms with objective functions that entail multiple spurious local optima, thus requiring application of tedious countermeasures to avoid getting trapped in bad model estimates. In this paper, we devise a novel margin-maximizing model with convex objective function that allows for capturing infinitely-long temporal dependencies in sequential datasets. This is effected by utilizing a recently proposed nonparametric Bayesian model of label sequences with infinitely-long temporal dependencies, namely the sequence memoizer, and training our model using margin maximization and a versatile mean-field-like approximation to allow for increased computational efficiency. As we experimentally demonstrate, the devised margin-maximizing construction of our model, which leads to a convex optimization scheme without any spurious local optima, combined with the capacity of our model to capture long and complex temporal dependencies, allows for obtaining exceptional pattern recognition performance in several applications.
 "Thus, the sequence memoizers employed in this work comprise a HPYP model with its strength parameters n set equal to zero, n ¼ 0, 8n (note, though, that a wider family of distributions can be also considered for similar computational efficiency purposes [11]). Then, given a context u u u u, the sequence memoizer determines which of its prefixes i.e., ðu u u uÞ, ððu u u uÞÞ, and so on, has no other children than the one appearing within u u u u, and removes them from the recursions in (19), based on Theorem 1. "
ABSTRACT: Sequential data labeling is a fundamental task in machine learning applications, with speech and natural language processing, activity recognition in video sequences, and biomedical data analysis being characteristic examples, to name just a few. The conditional random field (CRF), a log-linear model representing the conditional distribution of the observation labels, is one of the most successful approaches for sequential data labeling and classification, and has lately received significant attention in machine learning as it achieves superb prediction performance in a variety of scenarios. Nevertheless, existing CRF formulations can capture only one- or few-timestep interactions and neglect higher-order dependencies, which are potentially useful in many real-life sequential data modeling applications. To resolve these issues, in this paper we introduce a novel CRF formulation, based on the postulation of an energy function which entails infinitely long time dependencies between the modeled data. Building blocks of our novel approach are: 1) the sequence memoizer (SM), a recently proposed nonparametric Bayesian approach for modeling label sequences with infinitely long time dependencies, and 2) a mean-field-like approximation of the model marginal likelihood, which allows for the derivation of computationally efficient inference algorithms for our model. The efficacy of the so-obtained infinite-order CRF (CRF∞) model is experimentally demonstrated.