
Online Learning of Noisy Data

Department of Information Science (Dipartimento di Scienze dell'Informazione), Università degli Studi di Milano, Milan, Italy
IEEE Transactions on Information Theory, January 2012. DOI: 10.1109/TIT.2011.2164053
Source: IEEE Xplore

ABSTRACT: We study online learning of linear and kernel-based predictors when individual examples are corrupted by random noise, and both the examples and the noise type can be chosen adversarially and change over time. We begin with the setting where some auxiliary information on the noise distribution is provided, and we wish to learn predictors with respect to the squared loss. Depending on the auxiliary information, we show how one can learn linear and kernel-based predictors using just one or two noisy copies of each example. We then turn to a general setting where virtually nothing is known about the noise distribution, and one wishes to learn with respect to general losses using linear and kernel-based predictors. We show how this can be achieved using a random, essentially constant number of noisy copies of each example. Allowing multiple copies cannot be avoided: indeed, we show that learning becomes impossible when only one noisy copy of each instance can be accessed. To obtain our results we introduce several novel techniques, some of which might be of independent interest.
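To make the two-copies idea concrete, here is a minimal sketch, assuming zero-mean Gaussian noise, the squared loss, and plain online gradient descent; the paper's actual estimators depend on the auxiliary noise information, and all function names and parameters below are illustrative. The point being shown is that two independent noisy copies of the same instance yield an unbiased estimate of the clean gradient.

```python
import numpy as np

def noisy_copy(x, noise_std, rng):
    """Return the instance x corrupted by independent zero-mean Gaussian noise."""
    return x + rng.normal(0.0, noise_std, size=x.shape)

def ogd_two_copies(stream, dim, noise_std, lr=0.05, seed=0):
    """Online gradient descent on the squared loss when only noisy copies of
    each instance are available.  Two independent copies give an unbiased
    gradient estimate: E[2 (<w, x1> - y) x2] = 2 (<w, x> - y) x, because the
    noise in each copy is zero-mean and the copies are independent."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for x, y in stream:
        x1 = noisy_copy(x, noise_std, rng)  # copy used in the residual term
        x2 = noisy_copy(x, noise_std, rng)  # independent copy as the direction
        w -= lr * 2.0 * (w @ x1 - y) * x2   # unbiased for the clean gradient
    return w

# Toy usage: recover a fixed linear target observed only through noise.
rng = np.random.default_rng(1)
w_star = np.array([1.0, -2.0, 0.5])
stream = [(x, float(w_star @ x)) for x in rng.normal(size=(2000, 3))]
print(np.round(ogd_two_copies(stream, dim=3, noise_std=0.5), 2))
```

Because the gradient estimate is only unbiased rather than exact, the step size trades off convergence speed against the extra variance injected by the noise.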

  • ABSTRACT: We consider the problem of online estimation of an arbitrary real-valued signal corrupted by zero-mean noise using linear estimators. The estimator is required to iteratively predict the underlying signal based on the current and several most recent noisy observations, and its performance is measured by the mean square error. We design and analyze an algorithm for this task whose total square error on any interval of the signal equals that of the best fixed filter in hindsight for that interval, plus an additional term whose dependence on the total signal length is only logarithmic. This bound is asymptotically tight and resolves the question of Moon and Weissman [“Universal FIR MMSE filtering,” IEEE Trans. Signal Process., vol. 57, no. 3, pp. 1068-1083, 2009]. Furthermore, the algorithm runs in time linear in the number of filter coefficients, whereas previous constructions required at least quadratic time.
    IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1595-1604, April 2013. (An illustrative sketch of this filtering setting appears after this list.)
  • ABSTRACT: We consider the most common variants of linear regression, including Ridge, Lasso, and support-vector regression, in a setting where the learner may observe only a fixed number of attributes of each example at training time. We present simple and efficient algorithms for these problems: for Lasso and Ridge regression, they need the same total number of attributes (up to constants) as full-information algorithms to reach a given accuracy. For support-vector regression, we require exponentially fewer attributes than the state of the art. This resolves an open problem recently posed by Cesa-Bianchi et al. (2010). Experiments support the theoretical bounds, with performance superior to the state of the art.
    06/2012. (An illustrative sketch of this attribute-sampling setting appears after this list.)
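For the filtering entry above, here is a minimal sketch of the setting, not the cited linear-time algorithm: a linear FIR filter is updated by online gradient descent, and the squared error against the next noisy sample serves as an unbiased proxy for the clean error, since zero-mean noise only adds a constant to the expected loss. The window here uses only past observations, a simplification of the cited setting, and all names are illustrative.

```python
import numpy as np

def online_fir(noisy, k=8, lr=0.02):
    """Sketch of online FIR filtering: predict the signal at time t from the
    k most recent noisy observations with a linear filter h, then update h
    by gradient descent on the squared error against the *noisy* sample.
    With zero-mean noise, the expected noisy error equals the expected clean
    error plus a constant, so its gradient is an unbiased proxy."""
    h = np.zeros(k)
    preds = np.zeros(len(noisy))
    for t in range(k, len(noisy)):
        window = noisy[t - k:t]       # several most recent observations
        preds[t] = h @ window
        err = preds[t] - noisy[t]     # feedback uses the noisy sample only
        h -= lr * 2.0 * err * window
    return preds, h

# Toy usage: a slowly varying sine observed through additive Gaussian noise.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 40, 4000))
noisy = clean + rng.normal(0, 0.3, size=clean.shape)
preds, h = online_fir(noisy)
print("MSE vs clean signal:", np.mean((preds[2000:] - clean[2000:]) ** 2))
```

The cited algorithm additionally guarantees logarithmic regret on every interval and runs in linear time in the number of filter coefficients; this plain gradient sketch offers neither guarantee.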
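For the limited-attribute entry, here is a sketch assuming uniform attribute sampling with importance weighting and the squared loss. It mirrors the standard unbiased-estimation approach in this line of work rather than the cited paper's exact algorithms, and the names below are illustrative.

```python
import numpy as np

def sampled_estimate(x, j, rng):
    """Observe only j attributes of x (uniform, with replacement) and return
    an importance-weighted vector satisfying E[x_hat] = x."""
    d = len(x)
    idx = rng.integers(0, d, size=j)
    x_hat = np.zeros(d)
    for i in idx:
        x_hat[i] += x[i] * d / j    # importance weight for uniform sampling
    return x_hat

def limited_attribute_regression(data, d, j=2, lr=0.01, seed=0):
    """Regression under a budget of 2*j observed attributes per example:
    two independent sampled estimates keep the squared-loss gradient
    2 (<w, x> - y) x unbiased, mirroring the two-copies trick above."""
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    for x, y in data:
        x1 = sampled_estimate(x, j, rng)  # estimate used in the residual
        x2 = sampled_estimate(x, j, rng)  # independent estimate as direction
        w -= lr * 2.0 * (w @ x1 - y) * x2
    return w

# Toy usage: only 2 of 3 attributes (twice) are observed per example.
rng = np.random.default_rng(2)
w_star = np.array([1.0, -2.0, 0.5])
data = [(x, float(w_star @ x)) for x in rng.normal(size=(20000, 3))]
print(np.round(limited_attribute_regression(data, d=3, j=2), 2))
```

The sampling inflates gradient variance by roughly a factor of d/j, which is why the step size is kept small; the cited algorithms control this variance more carefully to match full-information attribute budgets.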
