Article

The generalization performance of ERM algorithm with strongly mixing observations

Machine Learning (Impact Factor: 1.89). 06/2009; 75(3):275-295. DOI: 10.1007/s10994-009-5104-z
Source: DBLP

ABSTRACT

Generalization performance is a central concern of theoretical research in machine learning. Most previous bounds describing
the generalization ability of the Empirical Risk Minimization (ERM) algorithm are based on independent and identically distributed
(i.i.d.) samples. In order to study the generalization performance of the ERM algorithm with dependent observations, we first
establish an exponential bound on the rate of relative uniform convergence of the ERM algorithm with exponentially strongly
mixing observations; we then derive generalization bounds and prove that the ERM algorithm with exponentially strongly
mixing observations is consistent. The main results obtained in this paper not only extend the previously known results for
i.i.d. observations to the case of exponentially strongly mixing observations, but also improve the previous results for strongly
mixing samples. Because the ERM algorithm is usually time-consuming and prone to overfitting when the hypothesis space is highly
complex, as an application of our main results we also explore a new strategy for implementing the ERM algorithm in a hypothesis
space of high complexity.
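
As a rough illustration of the setting studied in the abstract (ERM run on dependent rather than i.i.d. samples), the following sketch performs empirical risk minimization over a small finite hypothesis class on observations drawn from an AR(1) process, which is exponentially strongly mixing when |phi| < 1. The hypothesis class, target function, and parameters are invented for illustration; this is not the paper's construction, and in particular not its strategy for high-complexity hypothesis spaces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observations from an AR(1) process; for |phi| < 1 such a process is
# geometrically (i.e., exponentially) strongly mixing.
def ar1_samples(n, phi=0.6, sigma=1.0):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + sigma * rng.normal()
    return x

n = 500
x = ar1_samples(n)
y = np.sin(x) + 0.1 * rng.normal(size=n)  # noisy target f*(x) = sin(x)

# Finite hypothesis class {f_c(x) = sin(c * x)} indexed by a grid of c values.
candidates = np.linspace(0.1, 2.0, 40)

# ERM: choose the hypothesis with the smallest empirical squared loss.
risks = np.array([np.mean((y - np.sin(c * x)) ** 2) for c in candidates])
c_erm = candidates[np.argmin(risks)]
print(f"ERM selects c = {c_erm:.3f} with empirical risk {risks.min():.4f}")
```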

  • "There have been extensive game-theoretic models of advertisers' bidding behavior (Cary et al. 2007; Chakrabarty, Zhou, and Lukose 2007; Zhou and Lukose). These models generally assume that advertisers are fully rational and have full access to information. This assumption is far from reality, so machine learning methods based on minimizing prediction loss on advertisers' historical bidding data (Cui et al. 2011; Xu et al. 2013; He et al. 2013) have recently been employed."
    ABSTRACT: Machine learning algorithms have been applied to predict agent behaviors in real-world dynamic systems, such as advertiser behaviors in sponsored search and worker behaviors in crowdsourcing. The behavior data in these systems are generated by live agents: once the systems change due to the adoption of the prediction models learnt from the behavior data, agents will observe and respond to these changes by changing their own behaviors accordingly. As a result, the behavior data will evolve and will not be identically and independently distributed, posing great challenges to the theoretical analysis of machine learning algorithms for behavior prediction. To tackle this challenge, in this paper we propose to use a Markov Chain in Random Environments (MCRE) to describe the behavior data, and perform generalization analysis of the machine learning algorithms on this basis. Since the one-step transition probability matrix of an MCRE depends on both previous states and the random environment, conventional techniques for generalization analysis cannot be directly applied. To address this issue, we propose a novel technique that transforms the original MCRE into a higher-dimensional time-homogeneous Markov chain. The new Markov chain involves more variables but is more regular, and thus easier to deal with. We prove the convergence of the new Markov chain as time approaches infinity. We then prove a generalization bound for machine learning algorithms on behavior data generated by the new Markov chain, which depends on both the Markovian parameters and the covering number of the function class composed of the loss function for behavior prediction and the behavior prediction model. To the best of our knowledge, this is the first work that performs generalization analysis on data generated by complex processes in real-world dynamic systems.
    Article · Apr 2014
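
The central construction in the abstract above, lifting a Markov chain in a random environment to a higher-dimensional time-homogeneous chain, can be illustrated in the simplest case where the environment itself evolves as a finite Markov chain. In the toy sketch below, all matrices are invented and the construction is only an assumed simplification of the paper's technique: the joint chain Z_t = (X_t, E_t) has a one-step transition matrix that no longer depends on t.

```python
import numpy as np

# Toy MCRE: state space {0, 1}, environment space {0, 1}.
# P_env[e] is the transition matrix of X when the environment is e.
P_env = np.array([
    [[0.9, 0.1],   # environment 0: a sticky chain
     [0.2, 0.8]],
    [[0.5, 0.5],   # environment 1: a near-uniform chain
     [0.5, 0.5]],
])
# The environment evolves as its own Markov chain with matrix Q.
Q = np.array([[0.95, 0.05],
              [0.10, 0.90]])

nx, ne = 2, 2
# Lifted chain Z_t = (X_t, E_t) on nx*ne states, indexed z = x * ne + e.
P_joint = np.zeros((nx * ne, nx * ne))
for x in range(nx):
    for e in range(ne):
        for x2 in range(nx):
            for e2 in range(ne):
                P_joint[x * ne + e, x2 * ne + e2] = P_env[e][x, x2] * Q[e, e2]

# Each row sums to 1, so P_joint is a valid (time-homogeneous) stochastic matrix.
assert np.allclose(P_joint.sum(axis=1), 1.0)

# Convergence as t -> infinity: the stationary distribution is the left
# eigenvector of P_joint for eigenvalue 1.
vals, vecs = np.linalg.eig(P_joint.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()
print("stationary distribution of the lifted chain:", np.round(pi, 4))
```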
  • "Unlike these works, in this paper we study the generalization ability of FLD based on dependent samples. Many dependent (non-i.i.d.) sampling mechanisms (e.g., α-mixing, β-mixing) have been studied in the machine learning literature (see [1]–[4], [11]–[22])."
    ABSTRACT: Fisher linear discriminant (FLD) is a well-known method for dimensionality reduction and classification that projects high-dimensional data onto a low-dimensional space in which the data achieve maximum class separability. Previous works describing the generalization ability of FLD have usually been based on the assumption of independent and identically distributed (i.i.d.) samples. In this paper, we go far beyond this classical framework by studying the generalization ability of FLD based on Markov sampling. We first establish bounds on the generalization performance of FLD based on uniformly ergodic Markov chain (u.e.M.c.) samples, and prove that FLD based on u.e.M.c. samples is consistent. Following the enlightening idea of Markov chain Monte Carlo methods, we also introduce a Markov sampling algorithm for FLD that generates u.e.M.c. samples from a given dataset of finite size. Through simulation studies and numerical experiments on benchmark repositories using FLD, we find that FLD based on u.e.M.c. samples generated by Markov sampling can provide smaller misclassification rates than i.i.d. samples.
    Article · Feb 2013 · IEEE Transactions on Neural Networks and Learning Systems
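
Independently of the sampling mechanism, the FLD projection discussed above has a well-known closed form: w is proportional to S_W^{-1}(mu_1 - mu_0), the difference of class means whitened by the within-class scatter matrix. Below is a minimal NumPy sketch on invented Gaussian data; it illustrates standard FLD only, not the paper's Markov sampling algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two Gaussian classes in 5 dimensions (illustrative data only).
n, d = 200, 5
X0 = rng.normal(loc=0.0, size=(n, d))
X1 = rng.normal(loc=1.0, size=(n, d))

# Class means and pooled within-class scatter matrix S_W.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
S_W = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)

# FLD direction w = S_W^{-1} (mu1 - mu0); classify by thresholding the
# projection at the midpoint of the projected class means.
w = np.linalg.solve(S_W, mu1 - mu0)
threshold = 0.5 * (mu0 + mu1) @ w

def predict(X):
    return (X @ w > threshold).astype(int)

err = 0.5 * ((predict(X0) == 1).mean() + (predict(X1) == 0).mean())
print(f"training misclassification rate: {err:.3f}")
```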
  • "Steinwart and Christmann [18] considered the fast learning rates of the regularized empirical risk minimization algorithm for α-mixing processes. Zou et al. [19] established bounds on the generalization performance of the ERM algorithm with strongly mixing observations. There are many definitions of non-independent sequences in [8], but in this paper we restrict our analysis to the case where the training samples of least-square regularized regression algorithms are Markov chains, for the following reasons: first, Markov chain samples arise frequently and naturally in applications, especially in biological (DNA or protein) sequence analysis, speech recognition, character recognition, content-based web search and market prediction."
    ABSTRACT: Previous works describing the generalization of the least-square regularized regression algorithm are usually based on the assumption of independent and identically distributed (i.i.d.) samples. In this paper we go far beyond this classical framework by studying the generalization of the least-square regularized regression algorithm with Markov chain samples. We first establish a novel concentration inequality for uniformly ergodic Markov chains; we then establish bounds on the generalization of the least-square regularized regression algorithm with uniformly ergodic Markov chain samples, and show that the algorithm with such samples is consistent.
    Article · Apr 2012 · Journal of Mathematical Analysis and Applications
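
The learning algorithm analyzed in the abstract above is standard least-square regularized (ridge) regression; what changes is the sampling assumption. The sketch below fits ridge regression to samples produced by a small uniformly ergodic Markov chain (every transition probability is positive, so the chain mixes geometrically). The chain, feature map, and target are invented for illustration and are not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(2)

# A uniformly ergodic toy chain: 3 states, all transition probabilities
# strictly positive.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
states = np.array([-1.0, 0.0, 1.0])   # real-valued embedding of the states

# Sample a trajectory x_1, ..., x_n from the chain.
n, s = 400, 0
xs = np.empty(n)
for t in range(n):
    s = rng.choice(3, p=P[s])
    xs[t] = states[s]
ys = 2.0 * xs + 0.5 + 0.1 * rng.normal(size=n)   # noisy linear target

# Least-square regularized regression: minimize ||y - Xw||^2 + lam * ||w||^2.
X = np.column_stack([xs, np.ones(n)])            # feature map: [x, 1]
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ ys)
print(f"estimated slope {w[0]:.3f}, intercept {w[1]:.3f}")
```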