Predicting Nonstationary Time Series with Multi-scale Gaussian Processes Model.
ABSTRACT The Gaussian processes (GP) model has been successfully applied to the prediction of nonstationary time series. Because the model's covariance function contains undetermined hyperparameters, finding their maximum likelihood values usually suffers from either sensitivity to initial conditions or a large computational cost. To overcome these pitfalls and at the same time achieve better prediction performance, this paper proposes a novel multi-scale Gaussian processes (MGP) model. In the MGP model, the covariance function is constructed from a scaling function and its dilations and translations, which makes the optimal value of the hyperparameter easy to determine. Although computing the covariance function takes somewhat more time, MGP takes much less time to determine the hyperparameter, so its total training time is competitive with that of GP. Experiments demonstrate that MGP predicts better than GP. They also show that the performance of MGP is comparable to that of the support vector machine (SVM), and that both outperform radial basis function (RBF) networks.
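The abstract does not give the covariance formula, so the following is only a minimal sketch of the general idea, not the authors' construction. It assumes a hat-shaped (linear B-spline) scaling function, a per-scale weight of 2^(-j), and a kernel formed as the inner product of the dilated and translated features; the names `scaling_function`, `multiscale_features`, `multiscale_kernel`, and `gp_predict` are hypothetical, and the noise variance is fixed by hand rather than estimated.

```python
import numpy as np

def scaling_function(t):
    # Assumed scaling function: linear B-spline (hat) supported on [-1, 1].
    return np.maximum(0.0, 1.0 - np.abs(t))

def multiscale_features(x, scales, x_min, x_max):
    # Evaluate phi(2^j * x - k) for every scale j and integer translation k.
    x = np.asarray(x, dtype=float)
    columns = []
    for j in scales:
        s = 2.0 ** j
        # Translations chosen so the shifted copies cover [x_min, x_max].
        k = np.arange(np.floor(s * x_min) - 1, np.ceil(s * x_max) + 2)
        weight = np.sqrt(2.0 ** (-j))  # assumed weighting, not from the paper
        columns.append(weight * scaling_function(s * x[:, None] - k[None, :]))
    return np.hstack(columns)

def multiscale_kernel(xa, xb, scales=(0, 1, 2, 3), x_min=0.0, x_max=1.0):
    # Inner product of feature vectors, positive semidefinite by construction.
    fa = multiscale_features(xa, scales, x_min, x_max)
    fb = multiscale_features(xb, scales, x_min, x_max)
    return fa @ fb.T

def gp_predict(x_train, y_train, x_test, noise_var=1e-2):
    # Standard GP regression equations, solved through a Cholesky factor.
    K = multiscale_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_star = multiscale_kernel(x_test, x_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(multiscale_kernel(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var

# Toy nonstationary series: a chirp whose local frequency grows with time.
x_train = np.linspace(0.0, 1.0, 40)
y_train = np.sin(2.0 * np.pi * x_train ** 2)
x_test = np.linspace(0.05, 0.95, 5)
mean, var = gp_predict(x_train, y_train, x_test)
```

Because the kernel in this sketch is a plain feature inner product, the only free quantity left is the noise variance, which loosely mirrors the abstract's point that little hyperparameter search is needed; the paper's actual hyperparameter and its selection rule may differ.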
- ABSTRACT: We propose a network architecture which uses a single internal layer of locally-tuned processing units to learn both classification tasks and real-valued function approximations (Moody and Darken 1988). We consider training such networks in a completely supervised manner, but abandon this approach in favor of a more computationally efficient hybrid learning method which combines self-organized and supervised learning. Our networks learn faster than backpropagation for two reasons: the local representations ensure that only a few units respond to any given input, thus reducing computational overhead, and the hybrid learning rules are linear rather than nonlinear, thus leading to faster convergence. Unlike many existing methods for data analysis, our network architecture and learning rules are truly adaptive and are thus appropriate for real-time use. Neural Computation 01/1989; 1(2):281-294. · 1.76 Impact Factor
- ABSTRACT: I describe a framework for interpreting Support Vector Machines (SVMs) as maximum a posteriori (MAP) solutions to inference problems with Gaussian Process priors. This probabilistic interpretation can provide intuitive guidelines for choosing a 'good' SVM kernel. Beyond this, it allows Bayesian methods to be used for tackling two of the outstanding challenges in SVM classification: how to tune hyperparameters (the misclassification penalty C, and any parameters specifying the kernel) and how to obtain predictive class probabilities rather than the conventional deterministic class label predictions. Hyperparameters can be set by maximizing the evidence; I explain how the latter can be defined and properly normalized. Both analytical approximations and numerical methods (Monte Carlo chaining) for estimating the evidence are discussed. I also compare different methods of estimating class probabilities, ranging from simple evaluation at the MAP or posterior average to full averaging ov... Machine Learning 12/2000; · 1.47 Impact Factor
- ABSTRACT: We consider the problem of calculating learning curves (i.e., average generalization performance) of Gaussian processes used for regression. On the basis of a simple expression for the generalization error, in terms of the eigenvalue decomposition of the covariance function, we derive a number of approximation schemes. We identify where these become exact and compare with existing bounds on learning curves; the new approximations, which can be used for any input space dimension, generally get substantially closer to the truth. We also study possible improvements to our approximations. Finally, we use a simple exactly solvable learning scenario to show that there are limits of principle on the quality of approximations and bounds expressible solely in terms of the eigenvalue spectrum of the covariance function. Neural Computation 07/2002; 14(6):1393-428. · 1.76 Impact Factor