Conference Paper

Improvements to the Sequence Memoizer.

Conference: Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada.
Source: DBLP
  • [Show abstract] [Hide abstract]
    ABSTRACT: Sequential data labeling is a fundamental task in machine learning applications, with speech and natural language processing, activity recognition in video sequences, and biomedical data analysis being characteristic examples, to name just a few. The conditional random field (CRF), a log-linear model representing the conditional distribution of the observation labels, is one of the most successful approaches for sequential data labeling and classification, and has lately received significant attention in machine learning as it achieves superb prediction performance in a variety of scenarios. Nevertheless, existing CRF formulations can capture only one- or few-timestep interactions and neglect higher order dependences, which are potentially useful in many real-life sequential data modeling applications. To resolve these issues, in this paper we introduce a novel CRF formulation, based on the postulation of an energy function which entails infinitely long time-dependences between the modeled data. Building blocks of our novel approach are: 1) the sequence memoizer (SM), a recently proposed nonparametric Bayesian approach for modeling label sequences with infinitely long time dependences, and 2) a mean-field-like approximation of the model marginal likelihood, which allows for the derivation of computationally efficient inference algorithms for our model. The efficacy of the so-obtained infinite-order CRF ($({\rm CRF}^{\infty })$) model is experimentally demonstrated.
    IEEE Transactions on Software Engineering 06/2013; 35(6):1523-1534. · 2.29 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Generative models for sequential data are usually based on the assumption of temporal dependencies described by a first-order Markov chain. To ameliorate this shallow modeling assumption, several authors have proposed models with higher-order dependencies. However, the practical applicability of these approaches is hindered by their prohibitive computational costs in most cases. In addition, most existing approaches give rise to model training algorithms with objective functions that entail multiple spurious local optima, thus requiring application of tedious countermeasures to avoid getting trapped to bad model estimates. In this paper, we devise a novel margin-maximizing model with convex objective func-tion that allows for capturing infinitely-long temporal dependencies in sequential datasets. This is effected by utilizing a recently proposed nonparametric Bayesian model of label sequences with infinitely-long temporal dependencies, namely the sequence memoizer, and training our model using mar-gin maximization and a versatile mean-field-like approximation to allow for increased computational efficiency. As we experimentally demonstrate, the devised margin-maximizing construction of our model, which leads to a convex optimization scheme, without any spurious local optima, combined with the capacity of our model to capture long and complex temporal dependencies, allow for obtaining exceptional pattern recognition performance in several applications.
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: We interpret results from a study where data was modeled using constant space approxi-mations to the sequence memoizer. The sequence memoizer (SM) is a non-constant-space, Bayesian nonparametric model in which the data are the sufficient statistic in the stream-ing setting. We review approximations to the probabilistic model underpinning the SM that yield the computational asymptotic complexities necessary for modeling very large (streaming) datasets with fixed computational resource. Results from modeling a benchmark corpus are shown for both the effectively parametric, approximate models and the fully nonparametric SM. We find that the approximations perform nearly as well in terms of predictive likelihood. We argue from this single example that, due to the lack of sufficiency, Bayesian nonparametric models may, in general, not be suitable as models of streaming data, and propose that nonstationary parametric models and estimators for the same inspired by Bayesian nonparametric models may be worth investigating more fully.


Available from