José R. Dorronsoro

  • Autonomous University of Madrid

About

170 Publications
16,183 Reads
1,967 Citations
Current institution
Autonomous University of Madrid


Publications (170)
Article
Multi-Task Learning tries to improve the learning process of different tasks by solving them simultaneously. A popular Multi-Task Learning formulation for SVM is to combine common and task-specific parts. Other approaches rely on using a Graph Laplacian regularizer. Here we propose a combination of these two approaches that can be applied to L1, L2...
Chapter
Full-text available
By their very nature, regression problems can be transformed into classification problems by discretizing their target variable. Within this perspective, in this work we investigate the possibility of improving the performance of deep machine learning models in regression scenarios through a training strategy that combines different classification...
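A minimal sketch of the discretization idea, assuming equal-frequency (quantile) bins and mapping each predicted class back to the mean target of its bin. Both choices, and the helper names quantile_edges, discretize and class_centers, are illustrative assumptions, not the training strategy proposed in the chapter.

```python
from bisect import bisect_right


def quantile_edges(y, n_classes):
    """Interior bin edges splitting the sorted targets into roughly equal-count classes."""
    ys = sorted(y)
    n = len(ys)
    # Assumed rule: edge k is the element at position k*n/n_classes of the sorted targets.
    return [ys[(k * n) // n_classes] for k in range(1, n_classes)]


def discretize(y, edges):
    """Map each real-valued target to a class index in 0..len(edges)."""
    return [bisect_right(edges, v) for v in y]


def class_centers(y, labels, n_classes):
    """Mean target per class, used to turn a predicted class back into a number."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for v, c in zip(y, labels):
        sums[c] += v
        counts[c] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

For example, with targets [0.1, 0.4, 0.5, 0.9, 1.3, 2.0] and three classes the edges are [0.5, 1.3] and the labels [0, 0, 1, 1, 2, 2].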
Chapter
Multi-Task Learning (MTL) aims at improving the learning process by solving different tasks simultaneously. Two general approaches for neural MTL are hard and soft information sharing during training. Here we propose two new approaches to neural MTL. The first one uses a common model to enforce a soft sharing learning of the tasks considered. The s...
Chapter
Multi-Task Learning aims at improving the learning process by solving different tasks simultaneously. The approaches to Multi-Task Learning can be categorized as feature-learning, regularization-based and combination strategies. Feature-learning approximations are more natural for deep models while regularization-based ones are usually designed for...
Chapter
In Ordinal Regression (OR) class labels contain ranking information about the underlying samples and, thus, the goal is not only to minimize classification errors but also the rank distance of misclassified patterns. Thus, while class rankings are not metric values, they add a regression-like character to OR. Within this perspective, we propose her...
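To illustrate why the rank distance of misclassified patterns matters, the hypothetical helpers below compare the plain misclassification rate with the mean absolute error over class ranks, assuming classes are coded as integers 0 to K-1.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of patterns whose predicted rank differs from the true rank."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_rank_error(y_true, y_pred):
    """Average distance between true and predicted ranks (classes coded 0..K-1)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

With true ranks [0, 1, 2, 3], the predictions [1, 1, 2, 3] and [3, 1, 2, 3] both misclassify one pattern (rate 0.25), but their mean absolute rank errors are 0.25 and 0.75 respectively, so only the rank-aware measure penalizes the larger ordinal mistake.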
Chapter
Multi-Task Learning (MTL) aims at solving different tasks simultaneously to obtain better models. Some Support Vector Machine (SVM) formulations for the MTL context involve the combination of common and task-independent models, where we can also use a homogeneous graph over the tasks to impose pairwise connections between the independent models....
Conference Paper
Full-text available
Modern Deep Neural Network backends allow great flexibility in defining network architectures. This allows for multiple outputs, each with its own specific loss, which can make them more suitable for particular goals. In this work we shall explore this possibility for classification networks which will combine the categorical cross-entropy loss, typical...
Article
Quite often a machine learning problem lends itself to being split into several well-defined subproblems, or tasks. The goal of Multi-Task Learning (MTL) is to leverage the joint learning of the problem from two different perspectives: on the one hand, a single, overall model, and on the other hand task-specific models. In this way, the found solution b...
Article
We propose an improved version of the SMO algorithm for training classification and regression SVMs, based on a Conjugate Descent procedure. This new approach only involves a modest increase in the computational cost of each iteration but, in turn, usually results in a substantial decrease in the number of iterations required to converge to a given...
Article
Full-text available
Given the impact of renewable sources on the overall energy production, accurate predictions are becoming essential, with machine learning becoming a very important tool in this context. In many situations, the prediction problem can be divided into several tasks, more or less related to each other but each with its own particularities. Multitask le...
Chapter
Black-box optimization aims to find the optimum of an unknown function only by evaluating it over different points in the space. An important application of black-box optimization in Machine Learning is the computationally expensive tuning of the hyper-parameters, which requires trying different configurations and measuring the validation error over...
Chapter
The detection of anomalies, i.e. of those points found in a dataset but which do not seem to be generated by the underlying distribution, is crucial in machine learning. Their presence is likely to make model predictions not as accurate as we would like; thus, they should be identified before any model is built which, in turn, may require the optim...
Chapter
The goal of Multi-Task Learning (MTL) is to achieve better generalization by using data from different sources. MTL Support Vector Machines (SVMs) embrace this idea in two main ways: by using a combination of common and task-specific parts, or by fitting individual models adding a graph Laplacian regularization that defines different degrees of task rel...
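For reference, one standard form of a graph Laplacian regularizer over task-specific weight vectors v_1, ..., v_T is given below. It assumes symmetric, nonnegative task-relation weights A_rs; the chapter's exact weighting and scaling are not stated in the abstract. The identity shows that penalizing weighted pairwise distances between task models is the same as a quadratic form in the Laplacian L = D - A.

```latex
\Omega(v_1,\dots,v_T)
  = \frac{1}{2}\sum_{r=1}^{T}\sum_{s=1}^{T} A_{rs}\,\lVert v_r - v_s\rVert^2
  = \sum_{r=1}^{T}\sum_{s=1}^{T} L_{rs}\,\langle v_r, v_s\rangle,
\qquad L = D - A,\quad D = \operatorname{diag}\Big(\sum_{s=1}^{T} A_{rs}\Big)_{r=1}^{T}.
```

Larger A_rs pulls the models of tasks r and s closer together, which is how the regularizer encodes different degrees of task relatedness.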
Article
Full-text available
Satellite-measured radiances are obviously of great interest for photovoltaic (PV) energy prediction. In this work we will use them together with clear sky irradiance estimates for the nowcasting of PV energy productions over peninsular Spain. We will feed them directly into two linear Machine Learning models, Lasso and linear Support Vector Regres...
Article
Kernel-based techniques have become a common way of describing the local and global relationships of data samples that are generated in real-world processes. In this research, we focus on a multi-scale kernel based technique named Auto-adaptive Laplacian Pyramids (ALP). This method can be useful for function approximation and interpolation. ALP is...
Article
Full-text available
Kernel-based Support Vector Machines (SVMs), one of the most popular machine learning models, usually achieve top performance in two-class classification and regression problems. However, their training cost is at least quadratic in sample size, which makes them unsuitable for large-sample problems. Deep Neural Networks (DNNs), on the other hand, with a cost...
Preprint
Full-text available
We propose an improved version of the SMO algorithm for training classification and regression SVMs, based on a Conjugate Descent procedure. This new approach only involves a modest increase in the computational cost of each iteration but, in turn, usually results in a substantial decrease in the number of iterations required to converge to a given...
Article
The increasing presence of photovoltaic (PV) generation in the energy mix demands improved forecasting tools which can be updated on an almost continuous basis. Satellite-based information lends itself naturally to this purpose and here it is used to nowcast hourly PV energy production for horizons up to six hours over Peninsular Spain and two isla...
Chapter
Multi-task learning (MTL) is a powerful framework that makes it possible to exploit the similarities between several machine learning tasks to improve on the solutions given by independent task-specific models. Support Vector Machines (SVMs) are well suited for this and Cai et al. have proposed additive MTL SVMs, where the final model corresponds to the...
Conference Paper
Full-text available
Support Vector Machines (SVMs) are among the most popular machine learning models for supervised problems and have proved to achieve great performance in a broad range of prediction tasks. However, they can suffer from scalability issues when working with large sample sizes, a common situation in the big data era. On the other hand, Deep Neural Netwo...
Article
Full-text available
While being one of the first and most elegant tools for dimensionality reduction, Fisher linear discriminant analysis (FLDA) is not currently considered among the top methods for feature extraction or classification. In this paper, we will review two recent approaches to FLDA, namely, least squares Fisher discriminant analysis (LSFDA) and regulariz...
Chapter
We consider wind energy prediction by Support Vector Regression (SVR) with generalized Gaussian Process kernels, proposing a validation–based kernel choice which will be then used in two prediction problems instead of the standard Gaussian ones. The resulting model beats a Gaussian SVR in one problem and ties in the other. Furthermore, besides the...
Chapter
Two problems when using Numerical Weather Prediction features in Machine Learning are the high dimensionality inherent to the current high-resolution models, and the high correlation of the features, which can affect the performance of learning machines such as Multilayer Perceptrons (MLPs). In this work we propose to reduce the dimension of the problem u...
Conference Paper
Full-text available
Classification over imbalanced datasets is a highly interesting topic given that many real-world classification problems present a concrete class with a much smaller number of patterns than the others. In this work we shall explore the use of large, fully connected and potentially deep MLPs in such problems. We will consider simple MLPs, with ReLU...
Article
Full-text available
General noise cost functions have been recently proposed for support vector regression (SVR). When applied to tasks whose underlying noise distribution is similar to the one assumed for the cost function, these models should perform better than classical \(\epsilon\)-SVR. On the other hand, uncertainty estimates for SVR have received a somewhat lim...
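For reference, classical \(\epsilon\)-SVR uses Vapnik's \(\epsilon\)-insensitive loss, which ignores residuals smaller than \(\epsilon\); the squared loss shown next to it is the one that corresponds (up to constants and scaling) to Gaussian noise. This sketch covers only these two standard losses, not the general noise cost functions studied in the article.

```python
def eps_insensitive_loss(residual, eps):
    """Vapnik's epsilon-insensitive loss: zero inside the tube |r| <= eps, linear outside."""
    return max(0.0, abs(residual) - eps)


def squared_loss(residual):
    """Squared loss; up to constants and scaling, the negative log-likelihood of Gaussian noise."""
    return 0.5 * residual ** 2
```

A residual of 0.05 with \(\epsilon = 0.1\) costs nothing under the \(\epsilon\)-insensitive loss, while a residual of -0.3 costs 0.2; the squared loss penalizes every nonzero residual.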
Article
Full-text available
Deep Learning models have recently received considerable attention because of their very powerful modeling abilities, particularly on inputs that have an intrinsic one- or two-dimensional structure that can be captured and exploited by convolutional layers. In this work we will apply Deep Neural Networks (DNNs) to two problems, wind energy and daily sola...
Conference Paper
Numerical weather prediction (NWP) ensembles, i.e., probabilistic variants of NWP forecasts, can be a useful tool to improve the quality of renewable energy predictions as well as to provide useful estimates of uncertainty in NWP–based energy forecasts. In this work we will consider the application of the NWP ensembles provided by the European Cen...
Article
Many important linear sparse models have the Lasso problem at their core, for which the GLMNet algorithm is often considered the current state of the art. Recently M. Jaggi has observed that Constrained Lasso (CL) can be reduced to an SVM-like problem, for which the LIBSVM library provides very efficient algorithms. This suggests that it could als...
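As background on the Lasso baseline, GLMNet is built around cyclic coordinate descent with a soft-thresholding update. The sketch below shows only that core update for the objective (1/(2n))||y - Xw||^2 + lam*||w||_1 without an intercept, omitting GLMNet's regularization path, active sets and screening rules. It is not the SVM-based reduction the article studies, and the function names are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrinks z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))||y - Xw||^2 + lam*||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    r = list(y)  # residual y - Xw, kept up to date after every coordinate change
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # rho = (1/n) x_j^T (r + x_j w_j): correlation of x_j with the partial residual
            rho = sum(X[i][j] * (r[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                w[j] = new_wj
    return w
```

On an orthogonal design each coordinate is solved exactly in one pass; for instance, X = [[1, 0], [0, 1], [1, 0], [0, 1]], y = [2, -1, 2, -1] and lam = 0.25 give w = [1.5, -0.5], shrunk from the least-squares solution [2, -1].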
Article
The ability of ensemble models to retain the bias of their learners while decreasing their individual variance has long made them quite attractive in a number of classification and regression problems. In this work we will study the application of Random Forest Regression (RFR), Gradient Boosted Regression (GBR) and Extreme Gradient Boosting (XGB)...
Conference Paper
In this work we will study the use of satellite-measured irradiances as well as clear sky radiance estimates as features for the nowcasting of photovoltaic energy productions over Peninsular Spain. We will work with three Machine Learning models (Lasso, and linear and Gaussian Support Vector Regression, SVR) plus a simple persistence model. We consid...
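The persistence model is the usual nowcasting baseline: the forecast for time t + h is simply the value observed at time t. A minimal sketch, assuming an evenly spaced series and mean absolute error as the score; the function names are hypothetical and the paper's exact baseline variant is not specified in the abstract.

```python
def persistence_forecast(series, horizon):
    """Persistence baseline: the forecast for time t + horizon is the value observed at t.

    Returns (target_index, forecast) pairs for every target that has an
    observation `horizon` steps earlier.
    """
    return [(t + horizon, series[t]) for t in range(len(series) - horizon)]


def mean_absolute_error(pairs, series):
    """Average absolute difference between forecasts and the actual values."""
    return sum(abs(series[i] - f) for i, f in pairs) / len(pairs)
```

For the hourly series [0, 2, 5, 6, 4, 1] and a one-step horizon, persistence yields an MAE of 2.2, a reference any learned nowcasting model should beat.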
Conference Paper
Full-text available
Building uncertainty estimates is still an open problem for most machine learning regression models. On the other hand, general noise–dependent cost functions have been recently proposed for Support Vector Regression, SVR, which should be more effective when applied to regression problems whose underlying noise distribution follows the one assumed...
Conference Paper
Full-text available
Fisher Discriminant Analysis’ linear nature and the usual eigen-analysis approach to its solution have limited the application of its underlying elegant idea. In this work we will take advantage of some recent partially equivalent formulations based on standard least squares regression to develop a simple Deep Neural Network (DNN) extension of Fish...
Conference Paper
Satellite–measured irradiances can be an interesting source of information for the nowcasting of solar energy productions. Here we will consider the Machine Learning based prediction at hour H of the aggregated photovoltaic (PV) energy of Peninsular Spain using the irradiances measured by Meteosat’s visible and infrared channels at hours \(H, H-1,...
Conference Paper
We revisit Nesterov’s Accelerated Gradient (NAG) procedure for the SVM dual problem and propose a strictly monotone version of NAG that is capable of accelerating the second order version of the SMO algorithm. The computational cost of the resulting Nesterov Accelerated SMO (NA–SMO) is twice that of SMO, so the reduction in the numb...
Thesis
Full-text available
While Support Vector Regression, SVR, is one of the algorithms of choice in modeling problems, construction of its error intervals seems to have received less attention. In addition, general noise cost functions for SVR have been recently proposed and proved to be more effective when a noise distribution that fits the data is properly chosen. Taki...
Article
The ability of ensemble models to retain the bias of their learners while decreasing their individual variance has long made them quite attractive in a number of classification and regression problems. Moreover, when trees are used as learners, the relative simplicity of the resulting models has led to a renewed interest in them for Big Data problem...
Conference Paper
Full-text available
While Support Vector Regression, SVR, is one of the algorithms of choice in modeling problems, construction of its error intervals seems to have received less attention. On the other hand, general noise cost functions for SVR have been recently proposed. Taking this into account, this paper describes a direct approach to build error intervals for d...
Article
The constant expansion of solar energy has made the accurate forecasting of radiation an important issue. In this work we apply Support Vector Regression (SVR), Gradient Boosted Regression (GBR), Random Forest Regression (RFR) as well as a hybrid method to combine them to downscale and improve 3-h accumulated radiation forecasts provided by Numeric...
Conference Paper
Full-text available
In this work we will apply some of the Deep Learning models that are currently obtaining state of the art results in several machine learning problems to the prediction of wind energy production. In particular, we will consider both deep, fully connected multilayer perceptrons with appropriate weight initialization, and also convolutional neural ne...
Article
In this paper we will prove a linear convergence rate for the extension of the Mitchell, Dem’yanov and Malozemov (MDM) algorithm for solving the Nearest Point Problem (NPP). While linear convergence proofs for the related (but different) SMO method intended for SVM training require that the kernel matrix be positive definite, no such assumption is...
Article
The growing interest in big data problems implies the need for unsupervised methods for data visualization and dimensionality reduction. Diffusion Maps (DM) is a recent technique that can capture the lower dimensional geometric structure underlying the sample patterns in a way which can be made to be independent of the sampling distribution. Moreov...
Conference Paper
Full-text available
In this work we first explore the use of Support Vector Regression to forecast day-ahead daily and 3-hourly aggregated photovoltaic (PV) energy production in Spain using as inputs Numerical Weather Prediction forecasts of global horizontal radiation and total cloud cover. We then introduce an empirical “clear sky” PV energy curve that we use to dis...
Conference Paper
Full-text available
Spectral Clustering and Diffusion Maps are currently the leading methods for advanced clustering or dimensionality reduction. However, they require the eigenanalysis of a sample’s graph Laplacian L, something very costly for moderately sized samples and prohibitive for very large ones. We propose to build a low rank approximation to L using essenti...
Article
Full-text available
Non-linear dimensionality reduction techniques such as manifold learning algorithms have become a common way of processing and analyzing high-dimensional patterns that often have a target value attached. Their application to new points consists of two steps: first, embedding the new data point into the low dimensional space and then, estimating th...
Article
Full-text available
The growing presence of solar energy in the electrical systems of many countries has made its accurate forecasting an important issue. In this work we will explore the application of Support Vector Regression (SVR), an advanced Machine Learning modelling tool, to forecast the daily photovoltaic generation of Spain. Given the very large geographical...
Conference Paper
Full-text available
We discuss how to build sparse one-hidden-layer MLPs by replacing the standard ℓ2 weight decay penalty on all weights with an ℓ1 penalty on the linear output weights. We will propose an iterative two-step training procedure where the output weights are found using the FISTA proximal optimization algorithm to solve a Lasso-like problem and the hidden weights...
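A minimal sketch of FISTA for a Lasso-like problem 0.5*||Xw - y||^2 + lam*||w||_1, the kind of subproblem the abstract describes for the output weights. It assumes the hidden-layer outputs have already been computed and stored as the rows of X; the hidden-weight step of the paper's two-step procedure is not shown. The step size 1/L uses the squared Frobenius norm of X, a safe but possibly loose upper bound on the largest eigenvalue of X^T X. Function names are hypothetical.

```python
import math


def soft_threshold_vec(v, t):
    """Componentwise proximal operator of t*||.||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def matvec(A, x):
    """A x for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """A^T r for a list-of-rows matrix A."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def fista_lasso(X, y, lam, n_iter=500):
    """FISTA for 0.5*||Xw - y||^2 + lam*||w||_1."""
    d = len(X[0])
    # Squared Frobenius norm: an upper bound on the gradient's Lipschitz constant.
    L = sum(v * v for row in X for v in row)
    w = [0.0] * d
    z = w[:]  # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        resid = [a - b for a, b in zip(matvec(X, z), y)]
        grad = rmatvec(X, resid)
        # Proximal gradient step taken from the extrapolated point.
        w_new = soft_threshold_vec([zi - gi / L for zi, gi in zip(z, grad)], lam / L)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [wn + ((t - 1.0) / t_new) * (wn - wo) for wn, wo in zip(w_new, w)]
        w, t = w_new, t_new
    return w
```

The momentum term (t - 1)/t_new is what distinguishes FISTA from plain proximal gradient (ISTA) and gives its faster worst-case rate.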
Chapter
We introduce the Group Total Variation (GTV) regularizer, a modification of Total Variation that uses the ℓ2,1 norm instead of the ℓ1 one to deal with multidimensional features. When used as the only regularizer, GTV can be applied jointly with iterative convex optimization algorithms such as FISTA. This requires computing its proximal operator...
Article
Full-text available
Non-linear dimensionality reduction techniques such as manifold learning algorithms have become a common way of processing and analyzing high-dimensional patterns that often have a target value attached. Their application to new points consists of two steps: first, embedding the new data point into the low dimensional space and then, estimating th...
Article
Full-text available
High-content screening (HCS) allows the exploration of complex cellular phenotypes by automated microscopy and is increasingly being adopted for small interfering RNA genomic screening and phenotypic drug discovery. We introduce a series of cell-based evaluation metrics that have been implemented and validated in a mono-parametric HCS for regulator...
Conference Paper
We introduce the Group Total Variation (GTV) regularizer, a modification of Total Variation that uses the ℓ2,1 norm instead of the ℓ1 one to deal with multidimensional features. When used as the only regularizer, GTV can be applied jointly with iterative convex optimization algorithms such as FISTA. This requires computing its proximal operator wh...
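To make the ℓ1 versus ℓ2,1 distinction concrete, the sketch below evaluates both penalties (not their proximal operators) on a one-dimensional chain of positions, each carrying a multidimensional feature vector. The chain layout and function names are illustrative assumptions.

```python
import math


def tv_l1(x):
    """Standard (anisotropic) TV: absolute differences summed coordinate by coordinate."""
    return sum(abs(b - a) for prev, cur in zip(x, x[1:]) for a, b in zip(prev, cur))


def gtv(x):
    """Group TV: the l2 norm of each consecutive difference vector, summed (an l2,1 norm)."""
    return sum(
        math.sqrt(sum((b - a) ** 2 for a, b in zip(prev, cur)))
        for prev, cur in zip(x, x[1:])
    )
```

For x = [[0, 0], [3, 4], [3, 4]], TV gives 3 + 4 = 7 while GTV gives 5: the ℓ2 norm treats the whole feature vector as one group, so all of its coordinates tend to change, or stay constant, together between neighbouring positions.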
Conference Paper
It is well known that the dual function value sequence generated by SMO has a linear convergence rate when the kernel matrix is positive definite and sublinear convergence is also known to hold for a general matrix. In this paper we will prove that, when applied to hard-margin, i.e., linearly separable SVM problems, a linear convergence rate holds...
Conference Paper
Full-text available
The prediction and management of wind power ramps is currently receiving considerable attention, as it is a crucial issue for both system operators and wind farm managers. However, this is still an issue far from being solved and in this work we will address it as a classification problem working with delay vectors of the wind power time series and applyin...
Conference Paper
Full-text available
The increasing importance of solar energy has made the accurate forecasting of radiation an important issue. In this work we apply Support Vector Regression to downscale and improve 3-hour accumulated radiation forecasts for two locations in Spain. We use either direct 3-hour SVR-refined forecasts or we build first global accumulated daily predicti...
Conference Paper
In this work we will apply sparse linear regression methods to forecast wind farm energy production using numerical weather prediction (NWP) features over several pressure levels, a problem where pattern dimension can become very large. We shall place sparse regression in the context of proximal optimization, which we shall briefly review, and we s...
Conference Paper
Full-text available
In this work we will apply Diffusion Maps (DM), a recent technique for dimensionality reduction and clustering, to build local models for wind energy forecasting. We will compare ridge regression models for K–means clusters obtained over DM features, against the models obtained for clusters constructed over the original meteorological data or princ...
Article
In this brief, we give a new proof of the asymptotic convergence of the sequential minimal optimization (SMO) algorithm for both the most violating pair and second order rules to select the pair of coefficients to be updated. The proof is more self-contained, shorter, and simpler than previous ones and has a different flavor, partially building upo...
Conference Paper
In this work we will analyze and apply to the prediction of wind energy some of the best known regularized linear regression algorithms, such as Ordinary Least Squares, Ridge Regression and, particularly, Lasso, Group Lasso and Elastic-Net that also seek to impose a certain degree of sparseness on the final models. To achieve this goal, some of the...
Conference Paper
In this paper we will describe a simple proof of a linear convergence rate for the MDM algorithm that solves the Minimum Norm Problem (MNP). Linear convergence rates have been shown for the SMO algorithm, but the proofs require specific assumptions and are rather involved. We will follow a different approach, with a more geometric flavor. While as...
Conference Paper
Full-text available
Diffusion Maps is a powerful new technique for dimensionality reduction that can capture geometric structure while taking into account the data distribution. In this work we will apply it to time and spatial compression of numerical weather forecasts, showing how it is capable of greatly reducing the initial dimension while still capturing relevant infor...
Conference Paper
In the Echo State Networks (ESN) and, more generally, Reservoir Computing paradigms (a recent approach to recurrent neural networks), linear readout weights, i.e., linear output weights, are the only ones actually learned under training. The standard approach for this is SVD-based pseudo-inverse linear regression. Here it will be compared with two...
Conference Paper
Sequential Minimal Optimization (SMO) can be regarded as the state-of-the-art approach in non-linear Support Vector Machines training, being the method of choice in the successful LIBSVM software. Its optimization procedure is based on updating only a couple of the problem coefficients per iteration, until convergence. In this paper we notice that...
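To make "updating only a couple of the problem coefficients per iteration" concrete, here is a sketch of the classical analytic update for one pair, following Platt's formulation for classification SVMs with box constraints 0 ≤ α ≤ C and a precomputed kernel matrix. Working-set selection, the bias update and the error-cache refresh are deliberately omitted; the function name smo_pair_update is hypothetical, and this is not the modification the paper proposes.

```python
def smo_pair_update(alpha, y, K, E, i, j, C):
    """Analytic SMO update of the pair (alpha[i], alpha[j]) with 0 <= alpha <= C.

    E[k] = f(x_k) - y[k] are the current prediction errors and K is the kernel matrix.
    Returns the new (alpha_i, alpha_j); sum_k alpha[k] * y[k] is left unchanged.
    """
    eta = K[i][i] + K[j][j] - 2.0 * K[i][j]  # curvature along the feasible direction
    if eta <= 0.0:
        raise ValueError("non-positive curvature; not handled in this sketch")
    # Feasible range for alpha[j] implied by the box and the equality constraint.
    if y[i] != y[j]:
        lo, hi = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        lo, hi = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    aj = alpha[j] + y[j] * (E[i] - E[j]) / eta  # unconstrained optimum along the line
    aj = min(hi, max(lo, aj))  # clip to the feasible segment
    ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)
    return ai, aj
```

For two one-dimensional patterns x = 1 (label +1) and x = -1 (label -1) with a linear kernel, starting from α = (0, 0), a single pair update reaches the hard-margin solution α = (0.5, 0.5), i.e. w = 1.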
Article
Support vector regression (SVR) is a powerful tool in modeling and prediction tasks with widespread application in many areas. The most representative algorithms to train SVR models are Shevade et al.'s Modification 2 and Lin's WSS1 and WSS2 methods in the LIBSVM library. Both are variants of standard SMO in which the updating pairs selected are th...
Conference Paper
In this paper we establish a framework for the convergence of two algorithms for solving the Nearest Point Problem in Reduced Convex Hulls (RCH-NPP), namely the RCH-GSK method proposed in [1] and the RCH-MDM method suggested in [2]. This framework allows us to show the asymptotic convergence of both methods in a very simple way. Moreover, it allows...
Conference Paper
Least–Squares Support Vector Machines (LS–SVMs) have been a successful alternative model for classification and regression Support Vector Machines (SVMs), and have been used in a wide range of applications. In spite of this, only limited effort has been devoted to designing efficient algorithms for the training of this class of models, in clear contrast to t...
Article
The nearest point problem (NPP), i.e., finding the closest points between two disjoint convex hulls, has two classical solutions, the Gilbert–Schlesinger–Kozinec (GSK) and Mitchell–Dem’yanov–Malozemov (MDM) algorithms. When the convex hulls do intersect, NPP has to be stated in terms of reduced convex hulls (RCHs), made up of convex pattern combina...
Conference Paper
Least-Squares Support Vector Machines (LS-SVMs) have been successfully applied in many classification and regression tasks. Their main drawback is the lack of sparseness of the final models. Thus, a procedure to sparsify LS-SVMs is a frequent desideratum. In this paper, we adapt to the LS-SVM case a recent work for sparsifying classical SVM classif...
Article
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería informática. January 2004. Includes bibliography: pp. 169-172.
Conference Paper
Building upon Gilbert’s convergence proof of his algorithm to solve the Minimum Norm Problem, we establish a framework where a much simplified version of his proof allows us to prove the convergence of two algorithms for solving the Nearest Point Problem for disjoint convex hulls, namely the GSK and the MDM algorithms, as well as the convergence o...
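A minimal sketch of Gilbert's algorithm for the Minimum Norm Problem, i.e. finding the point of the convex hull of a finite set closest to the origin. Each step picks the vertex minimizing the support function and does an exact line search on the segment towards it. This is Gilbert's method, not MDM (which instead shifts weight between two vertices), and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gilbert_min_norm(points, n_iter=1000, tol=1e-12):
    """Approximate the point of conv(points) with minimum Euclidean norm."""
    x = list(points[0])
    for _ in range(n_iter):
        # Vertex minimizing <x, p> over the hull.
        p = min(points, key=lambda q: dot(x, q))
        diff = [a - b for a, b in zip(x, p)]
        gap = dot(x, diff)  # <x, x - p> >= 0; zero means x is already optimal
        if gap <= tol:
            break
        # Exact minimizer of ||x - t * diff||^2 over t in [0, 1].
        t = min(1.0, gap / dot(diff, diff))
        x = [a - t * d for a, d in zip(x, diff)]
    return x
```

For the hull of (1, 1) and (1, -1) a single line search lands on the closest point (1, 0), and the next iteration detects a zero gap and stops. In general the iterates only converge asymptotically, which is why convergence proofs such as Gilbert's are needed.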
Conference Paper
Second order SMO represents the state–of–the–art in SVM training for moderate size problems. In it, the solution is attained by solving a series of subproblems which are optimized w.r.t. just a pair of multipliers. In this paper we will illustrate how SMO works in a two stage fashion, setting first the values of the bounded multipliers to the penalt...
Conference Paper
Scaled Convex Hulls (SCHs) have been recently proposed by Liu et al. as the basis of a method to build linear classifiers that, when extended to kernel settings, provides an alternative approach to more established methods such as SVMs. Here we show how to adapt the Mitchell-Dem'yanov-Malozemov (MDM) algorithm to build such SCH-based classifiers by...
Conference Paper
Full-text available
Least Squares Support Vector Machines (LS-SVMs) were proposed by replacing the inequality constraints inherent to L1-SVMs with equality constraints. So far this idea has only been suggested for a least squares (L2) loss. We describe how this can also be done for the sum-of-slacks (L1) loss, yielding a new classifier (Least 1-Norm SVMs) which gives s...
Article
Full-text available
A genetic algorithm for optimal tuning of a linear controller is presented in this note. This method is applied to the networked control of a high-performance drilling process, an example of a class of complex electromechanical processes. A multi-objective optimisation criterion is presented for maximising the tool's working life and the material r...
Article
Full-text available
This paper reports on the evaluation of different machine learning techniques for the automated classification of coding gene sequences obtained from several organisms in terms of their functional role as adhesins. Diverse, biologically-meaningful, sequence-based features were extracted from the sequences and used as inputs to the in silico predict...
Conference Paper
We give a new proof of the convergence of the SMO algorithm for SVM training over linearly separable problems that partly builds on the one by Mitchell et al. for the convergence of the MDM algorithm to find the point of a convex set closest to the origin. Our proof relies on a simple derivation of SMO that we also present here and, while less gene...
Article
Full-text available
Implicit Wiener series are a powerful tool to build Volterra representations of time series with any degree of non-linearity. A natural question is then whether higher order representations yield more useful models. In this work we shall study this question for ECoG data channel relationships in epileptic seizure recordings, considering whether qua...
Article
Finding optimal model parameters is usually a crucial task in engineering applications of classification and modelling. The exponential cost of linear search on a parameter grid of a given precision rules it out in all but the simplest problems and random algorithms such as uniform design or the covariance matrix adaptation-evolution strategy (CMA-E...
Conference Paper
Shevade et al.'s Modification 2 is one of the most widely used algorithms to build Support Vector Regression (SVR) models. It selects as a size-2 working set the index pair giving the maximum KKT violation and combines it with the updating heuristics of Smola and Schölkopf, enforcing at each training iteration an \(\alpha_i \alpha^*_i = 0\) cond...
Article
Fast SVM training is an important goal for which many proposals have been given in the literature. In this work we will study from a geometrical point of view the presence, in both the Mitchell–Demyanov–Malozemov (MDM) algorithm and Platt's Sequential Minimal Optimization, of training cycles, that is, the repeated selection of some concrete updatin...
Conference Paper
Full-text available
In this work we will give explicit formulae for the application of Rosen's gradient projection method to SVM training that leads to a very simple implementation. We shall experimentally show that the method provides good descent directions that result in fewer training iterations, particularly when high precision is required. However, a naive kerneli...
Conference Paper
It is well known that linear slack penalty SVM training is equivalent to solving the Nearest Point Problem (NPP) over the so-called μ-Reduced Convex Hulls, that is, convex combinations of the positive and negative samples with coefficients bounded by a μ < 1 value. In this work we give a simple approach to the classical Gilbert-Schlesinger-Kozinec...
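The μ-Reduced Convex Hull can be made concrete with a small check: a point belongs to it if it is a convex combination of the samples whose coefficients are each at most μ. With μ < 1 no single sample can carry all the weight, which is what shrinks the hull away from its extreme points. The function below is a hypothetical illustration of that definition only, not the GSK variant the paper develops.

```python
def reduced_hull_point(points, coeffs, mu, tol=1e-12):
    """Return sum_i coeffs[i] * points[i] after checking the mu-reduced convex hull constraints."""
    if any(c < -tol or c > mu + tol for c in coeffs):
        raise ValueError("each coefficient must lie in [0, mu]")
    if abs(sum(coeffs) - 1.0) > tol:
        raise ValueError("coefficients must sum to 1")
    dim = len(points[0])
    return [sum(c * p[k] for c, p in zip(coeffs, points)) for k in range(dim)]
```

With the four corners of the square [0, 2] x [0, 2] and μ = 0.5, the midpoint (1, 0) of the bottom edge is reachable, but a corner such as (0, 0) is not, since it would need a single coefficient equal to 1.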
Conference Paper
Full-text available
SVM training is usually discussed under two different algorithmic points of view. The first one is provided by decomposition methods such as SMO and SVMLight while the second one encompasses geometric methods that try to solve a Nearest Point Problem (NPP), the Gilbert–Schlesinger–Kozinec (GSK) and Mitchell–Demyanov–Malozemov (MDM) algorithms being...
Conference Paper
While usually SVM training tries to solve the dual of the standard SVM minimization problem, alternative algorithms that solve the Nearest Point Problem (NPP) for the convex hulls of the positive and negative samples have been shown to also provide effective SVM training. They are variants of the Mitchell–Demyanov–Malozemov (MDM) algorithm and alth...
