
The generalization performance of ERM algorithm with strongly mixing observations

Machine Learning (Impact Factor: 1.69). 06/2009; 75(3):275-295. DOI: 10.1007/s10994-009-5104-z
Source: DBLP

ABSTRACT The generalization performance is the main concern of theoretical research in machine learning. The main previous bounds describing the generalization ability of the Empirical Risk Minimization (ERM) algorithm are based on independent and identically distributed (i.i.d.) samples. In order to study the generalization performance of the ERM algorithm with dependent observations, we first establish an exponential bound on the rate of relative uniform convergence of the ERM algorithm with exponentially strongly mixing observations, and then we obtain generalization bounds and prove that the ERM algorithm with exponentially strongly mixing observations is consistent. The main results obtained in this paper not only extend the previously known results for i.i.d. observations to the case of exponentially strongly mixing observations, but also improve the previous results for strongly mixing samples. Because the ERM algorithm is usually very time-consuming and overfitting may occur when the complexity of the hypothesis space is high, as an application of our main results we also explore a new strategy for implementing the ERM algorithm in a hypothesis space of high complexity.
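The ERM principle analyzed in the abstract can be illustrated with a minimal sketch (not the paper's construction): we minimize the empirical risk over a small finite hypothesis class on a toy AR(1) sample — a Gaussian AR(1) process with coefficient of absolute value below 1 is a standard example of an exponentially strongly mixing sequence. All names, the grid of slopes, and the sample size are illustrative assumptions.

```python
import numpy as np

def empirical_risk(h, X, y):
    """Average squared loss of hypothesis h on the sample."""
    return np.mean((h(X) - y) ** 2)

def erm(hypotheses, X, y):
    """Index of the hypothesis with minimal empirical risk over a finite class."""
    risks = [empirical_risk(h, X, y) for h in hypotheses]
    return int(np.argmin(risks))

# Toy dependent sample: an AR(1) process, a standard example of an
# exponentially strongly mixing sequence (|coefficient| < 1).
rng = np.random.default_rng(0)
n = 500
z = np.zeros(n)
for t in range(1, n):
    z[t] = 0.5 * z[t - 1] + rng.normal()
X, y = z[:-1], z[1:]            # predict the next value from the current one

# Finite hypothesis class: linear predictors h_a(x) = a * x over a grid of slopes.
slopes = np.linspace(-1.0, 1.0, 41)
hypotheses = [(lambda a: (lambda X: a * X))(a) for a in slopes]
best_slope = slopes[erm(hypotheses, X, y)]
```

Despite the dependence in the sample, the empirical minimizer lands near the true autoregression coefficient 0.5, which is the kind of consistency the paper's bounds quantify.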

Available from: Luoqing Li, Feb 27, 2014
  • Source
    ABSTRACT: Machine learning algorithms have been applied to predict agent behaviors in real-world dynamic systems, such as advertiser behaviors in sponsored search and worker behaviors in crowdsourcing. The behavior data in these systems are generated by live agents: once the systems change due to the adoption of the prediction models learnt from the behavior data, agents will observe and respond to these changes by changing their own behaviors accordingly. As a result, the behavior data will evolve and will not be independently and identically distributed, posing great challenges to the theoretical analysis of machine learning algorithms for behavior prediction. To tackle this challenge, in this paper we propose to use a Markov Chain in Random Environments (MCRE) to describe the behavior data, and perform generalization analysis of machine learning algorithms on this basis. Since the one-step transition probability matrix of an MCRE depends on both the previous state and the random environment, conventional techniques for generalization analysis cannot be applied directly. To address this issue, we propose a novel technique that transforms the original MCRE into a higher-dimensional time-homogeneous Markov chain. The new Markov chain involves more variables but is more regular, and thus easier to deal with. We prove the convergence of the new Markov chain as time approaches infinity. We then prove a generalization bound for machine learning algorithms on the behavior data generated by the new Markov chain, which depends on both the Markovian parameters and the covering number of the function class composed of the loss function for behavior prediction and the behavior prediction model. To the best of our knowledge, this is the first work that performs generalization analysis on data generated by complex processes in real-world dynamic systems.
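The state-augmentation step described in that abstract — turning a chain whose transitions depend on a random environment into a single time-homogeneous chain on a product space — can be sketched as follows. The environment is assumed here to evolve as its own Markov chain and to move before the behavior state; the matrices and the order of moves are illustrative assumptions, and the paper's construction may differ in detail.

```python
import numpy as np

# Behavior chain whose transition matrix depends on a (here: Markov) random
# environment, folded into one time-homogeneous chain on (behavior, environment).
P_env = np.array([[0.9, 0.1],
                  [0.2, 0.8]])              # environment transitions (assumed)
P_beh = [np.array([[0.7, 0.3],
                   [0.4, 0.6]]),            # behavior transitions, environment 0
         np.array([[0.2, 0.8],
                   [0.5, 0.5]])]            # behavior transitions, environment 1

nb, ne = 2, 2                               # behavior states, environment states
P = np.zeros((nb * ne, nb * ne))
for x in range(nb):
    for e in range(ne):
        for x2 in range(nb):
            for e2 in range(ne):
                # assumed order of moves: the environment updates first, then
                # the behavior state moves under the new environment
                P[x * ne + e, x2 * ne + e2] = P_env[e, e2] * P_beh[e2][x, x2]
```

Because every entry of `P` is positive here, the augmented chain is irreducible and aperiodic, so its powers converge to a matrix of identical stationary rows — mirroring the convergence-as-time-approaches-infinity result the abstract states.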
  • Source
    ABSTRACT: Fisher linear discriminant (FLD) is a well-known method for dimensionality reduction and classification that projects high-dimensional data onto a low-dimensional space where the data achieve maximum class separability. Previous works describing the generalization ability of FLD have usually been based on the assumption of independent and identically distributed (i.i.d.) samples. In this paper, we go far beyond this classical framework by studying the generalization ability of FLD based on Markov sampling. We first establish bounds on the generalization performance of FLD based on uniformly ergodic Markov chain (u.e.M.c.) samples, and prove that FLD based on u.e.M.c. samples is consistent. Following the enlightening idea of Markov chain Monte Carlo methods, we also introduce a Markov sampling algorithm for FLD that generates u.e.M.c. samples from a given data set of finite size. Through simulation studies and numerical studies on benchmark repositories using FLD, we find that FLD based on u.e.M.c. samples generated by Markov sampling can provide smaller misclassification rates than i.i.d. samples.
    IEEE Transactions on Neural Networks and Learning Systems 02/2013; 24(2):288-300. DOI:10.1109/TNNLS.2012.2230406 · 4.37 Impact Factor
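As a minimal sketch of the two-class FLD projection that abstract builds on — the paper's Markov sampling scheme is specific to it and not reproduced — assuming the standard formulation w = Sw⁻¹(μ₁ − μ₀) on toy i.i.d. Gaussian data (all data and parameters here are illustrative):

```python
import numpy as np

def fld_direction(X, y):
    """Two-class Fisher discriminant direction: w = Sw^{-1} (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)  # within-class scatter
    return np.linalg.solve(Sw, mu1 - mu0)

# Toy i.i.d. Gaussian classes (the paper replaces these with u.e.M.c. samples).
rng = np.random.default_rng(1)
X0 = rng.normal([0.0, 0.0], 1.0, size=(200, 2))
X1 = rng.normal([3.0, 3.0], 1.0, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

w = fld_direction(X, y)
# Classify by projecting onto w and thresholding at the midpoint of the
# projected class means.
thresh = ((X0 @ w).mean() + (X1 @ w).mean()) / 2
pred = (X @ w > thresh).astype(int)
```

The paper's question is then how the misclassification rate of exactly this projection behaves when the training sample is drawn by Markov sampling rather than i.i.d.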
  • Source
    ABSTRACT: The previously known works describing the generalization of the least-square regularized regression algorithm are usually based on the assumption of independent and identically distributed (i.i.d.) samples. In this paper we go far beyond this classical framework by studying the generalization of the least-square regularized regression algorithm with Markov chain samples. We first establish a novel concentration inequality for uniformly ergodic Markov chains; we then establish bounds on the generalization of the least-square regularized regression algorithm with uniformly ergodic Markov chain samples, and show that the algorithm with uniformly ergodic Markov chains is consistent.
    Journal of Mathematical Analysis and Applications 04/2012; 388(1):333–343. DOI:10.1016/j.jmaa.2011.11.032 · 1.12 Impact Factor
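The least-square regularized regression (ridge) estimator that abstract studies can be sketched on a Markov chain sample. An AR(1) sequence is used below only as a simple geometrically ergodic stand-in for the uniformly ergodic chains in the paper; the regularization parameter and data are illustrative assumptions.

```python
import numpy as np

def ridge(X, y, lam):
    """Least-square regularized solution: argmin ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Markov chain sample: an AR(1) sequence, a simple geometrically ergodic
# stand-in for the uniformly ergodic Markov chain samples in the abstract.
rng = np.random.default_rng(2)
n = 1000
z = np.zeros(n)
for t in range(1, n):
    z[t] = 0.6 * z[t - 1] + 0.1 * rng.normal()

X = z[:-1].reshape(-1, 1)       # regress the next state on the current one
y = z[1:]
w = ridge(X, y, lam=1e-3)       # recovers roughly the true coefficient 0.6
```

Consistency in the paper's sense means that, as the chain sample grows, this regularized estimator's risk approaches the best achievable risk despite the dependence between observations.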