Conference Paper
Improvements to the Sequence Memoizer.
Conference: Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada.
Source: DBLP

Conference Paper: Deplump for Streaming Data.
ABSTRACT: We present a general-purpose, lossless compressor for streaming data. This compressor is based on the deplump probabilistic compressor for batch data. Approximations to the inference procedure used in the probabilistic model underpinning deplump are introduced that yield the computational asymptotics necessary for stream compression. We demonstrate the performance of this streaming deplump variant relative to the batch compressor on a benchmark corpus and find that it performs equivalently well despite these approximations. We also explore the performance of the streaming variant on corpora that are too large to be compressed by batch deplump and demonstrate excellent compression performance.
2011 Data Compression Conference (DCC 2011), 29-31 March 2011, Snowbird, UT, USA; 01/2011
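The core idea behind probabilistic compressors like deplump is that a sequence model plus an entropy coder achieves a code length equal to the model's predictive log-loss: each symbol costs -log2 p(symbol | context) bits. As a rough illustration of that principle (not the deplump algorithm itself; the smoothed bigram model and parameter names here are hypothetical stand-ins for its far richer unbounded-context model):

```python
import math
from collections import defaultdict

def ideal_compressed_bits(data, alphabet_size=256, alpha=0.5):
    """Ideal code length (in bits) achieved by coupling an adaptive
    bigram model with an arithmetic coder: each symbol costs
    -log2 p(symbol | context) bits under the model at encode time."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    context = None
    for sym in data:
        # Additive smoothing keeps unseen symbols at nonzero probability.
        p = (counts[context][sym] + alpha) / (totals[context] + alpha * alphabet_size)
        bits += -math.log2(p)
        counts[context][sym] += 1
        totals[context] += 1
        context = sym
    return bits

# Highly repetitive data is predicted well, so its ideal code length
# falls far below the raw 8 bits per byte.
repetitive = b"abab" * 1000
print(ideal_compressed_bits(repetitive) / len(repetitive), "bits/symbol")
```

The streaming question the abstract addresses is orthogonal to this sketch: the model's state (`counts` here) must stay bounded as the stream grows, which is what the approximate inference procedures provide.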
ABSTRACT: We interpret results from a study where data was modeled using constant-space approximations to the sequence memoizer. The sequence memoizer (SM) is a non-constant-space, Bayesian nonparametric model in which the data are the sufficient statistic in the streaming setting. We review approximations to the probabilistic model underpinning the SM that yield the computational asymptotic complexities necessary for modeling very large (streaming) datasets with fixed computational resources. Results from modeling a benchmark corpus are shown for both the effectively parametric, approximate models and the fully nonparametric SM. We find that the approximations perform nearly as well in terms of predictive likelihood. We argue from this single example that, due to the lack of sufficiency, Bayesian nonparametric models may, in general, not be suitable as models of streaming data, and propose that nonstationary parametric models, and estimators for the same inspired by Bayesian nonparametric models, may be worth investigating more fully.
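The tension described above is that the SM's state grows with the data, while a streaming model must fit in fixed memory. A minimal sketch of the general "effectively parametric" idea, a context model that simply stops allocating new contexts once a fixed budget is reached (the class, its parameters, and the eviction-free policy are all hypothetical illustrations, not the paper's actual approximations):

```python
class BoundedContextModel:
    """Sketch of a constant-space approximation: a finite-order context
    predictor that refuses to allocate new contexts past a fixed budget,
    so memory stays O(max_contexts) regardless of stream length."""

    def __init__(self, order=2, max_contexts=10, alphabet_size=256, alpha=0.5):
        self.order = order
        self.max_contexts = max_contexts
        self.alphabet_size = alphabet_size
        self.alpha = alpha
        self.counts = {}    # context tuple -> {symbol: count}
        self.history = ()   # last `order` symbols seen

    def _context(self):
        return self.history[-self.order:]

    def prob(self, sym):
        """Smoothed predictive probability of `sym` in the current context."""
        table = self.counts.get(self._context(), {})
        total = sum(table.values())
        return (table.get(sym, 0) + self.alpha) / (total + self.alpha * self.alphabet_size)

    def update(self, sym):
        ctx = self._context()
        if ctx in self.counts or len(self.counts) < self.max_contexts:
            self.counts.setdefault(ctx, {})
            self.counts[ctx][sym] = self.counts[ctx].get(sym, 0) + 1
        # else: budget exhausted -- skip allocation, keep memory fixed.
        self.history = (self.history + (sym,))[-self.order:]

model = BoundedContextModel(order=2, max_contexts=10)
for b in b"to be or not to be " * 50:
    model.update(b)
# The memory bound holds no matter how long the stream runs.
print(len(model.counts), "contexts stored (cap: 10)")
```

A fully nonparametric model would instead grow `counts` without bound, which is exactly why the abstract questions its suitability for streams.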
ABSTRACT: Sequential data labeling is a fundamental task in machine learning applications, with speech and natural language processing, activity recognition in video sequences, and biomedical data analysis being characteristic examples, to name just a few. The conditional random field (CRF), a log-linear model representing the conditional distribution of the observation labels, is one of the most successful approaches for sequential data labeling and classification, and has lately received significant attention in machine learning as it achieves superb prediction performance in a variety of scenarios. Nevertheless, existing CRF formulations can capture only one- or few-time-step interactions and neglect higher-order dependences, which are potentially useful in many real-life sequential data modeling applications. To resolve these issues, in this paper we introduce a novel CRF formulation, based on the postulation of an energy function which entails infinitely long time-dependences between the modeled data. Building blocks of our novel approach are: 1) the sequence memoizer (SM), a recently proposed nonparametric Bayesian approach for modeling label sequences with infinitely long time dependences, and 2) a mean-field-like approximation of the model marginal likelihood, which allows for the derivation of computationally efficient inference algorithms for our model. The efficacy of the so-obtained infinite-order CRF (CRF∞) model is experimentally demonstrated.
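To see the limitation the abstract targets, consider the standard linear-chain CRF: its energy function only couples adjacent labels, so p(y | x) cannot directly express longer-range label dependences. A toy sketch under assumed potentials (the function names, the 2-label/3-position setup, and the brute-force normalizer are all illustrative, not the paper's model):

```python
import math
import itertools

def chain_energy(labels, unary, pairwise):
    """Energy of a linear-chain CRF: per-position (unary) terms plus
    terms coupling only adjacent labels -- one-time-step interactions."""
    e = sum(unary[t][y] for t, y in enumerate(labels))
    e += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
    return e

def conditional_prob(labels, unary, pairwise, num_labels=2):
    """p(y | x) = exp(-E(y)) / Z, normalizing by brute force over all
    label sequences (exponential in length; fine for toy sizes only)."""
    T = len(labels)
    Z = sum(math.exp(-chain_energy(y, unary, pairwise))
            for y in itertools.product(range(num_labels), repeat=T))
    return math.exp(-chain_energy(labels, unary, pairwise)) / Z

# Hypothetical potentials: 3 positions, 2 labels; lower energy = preferred.
unary = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
pairwise = [[0.0, 0.5], [0.5, 0.0]]
print(conditional_prob((0, 1, 0), unary, pairwise))
```

The CRF∞ construction replaces the pairwise term with an energy depending on the entire label history (via the SM), which is precisely what this pairwise form cannot represent; the mean-field-like approximation then restores tractable inference.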