Conference Paper

Learning Non-Linear Combinations of Kernels.

Conference: Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, Vancouver, British Columbia, Canada.
Source: DBLP

ABSTRACT: This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in a few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets, demonstrating the effectiveness of our technique.
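The learning-kernel procedure the abstract describes can be sketched in code. This is a minimal illustration, not the authors' implementation: it uses a quadratic (degree-2) combination of base kernels, minimizes the reduced objective F(mu) = y^T (K_mu + lam*I)^{-1} y by gradient descent, and projects onto the simplex; the step size, iteration count, and simplex constraint are assumptions made for the sketch.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian RBF kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def learn_quadratic_kernel(Ks, y, lam=1.0, eta=0.01, n_iter=50):
    """Projected gradient descent on mu for
        F(mu) = y^T (K_mu + lam I)^{-1} y,
    where K_mu = (sum_k mu_k K_k) squared elementwise, i.e. the
    degree-2 polynomial combination sum_{k,l} mu_k mu_l (K_k * K_l)."""
    p, n = len(Ks), len(y)
    mu = np.ones(p) / p
    for _ in range(n_iter):
        base = sum(m * K for m, K in zip(mu, Ks))
        K_mu = base * base                       # elementwise square
        alpha = np.linalg.solve(K_mu + lam * np.eye(n), y)
        # dF/dmu_k = -alpha^T (dK_mu/dmu_k) alpha, with
        # dK_mu/dmu_k = 2 * K_k * base (elementwise)
        grad = np.array([-alpha @ ((2 * Kk * base) @ alpha) for Kk in Ks])
        mu = np.maximum(mu - eta * grad, 0.0)    # project: nonnegativity
        if mu.sum() > 0:
            mu /= mu.sum()                       # project: simplex (assumed)
    return mu
```

The Hadamard (elementwise) product of positive semidefinite matrices is again positive semidefinite, so the quadratic combination K_mu remains a valid kernel for any nonnegative mu.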

  • ABSTRACT: Kernel approximation via nonlinear random feature maps is widely used to speed up kernel machines. There are two main challenges for conventional kernel approximation methods. First, before performing kernel approximation, a good kernel has to be chosen, and picking a good kernel is a very challenging problem in itself. Second, high-dimensional maps are often required in order to achieve good performance. This leads to high computational cost both in generating the nonlinear maps and in the subsequent learning and prediction process. In this work, we propose to optimize the nonlinear maps directly with respect to the classification objective in a data-dependent fashion. The proposed approach achieves kernel approximation and kernel learning in a joint framework, leading to much more compact maps without hurting performance. As a by-product, the same framework can also be used to obtain more compact maps that approximate a known kernel. We also introduce Circulant Nonlinear Maps, which use a circulant-structured projection matrix to speed up the nonlinear maps for high-dimensional data.
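The kernel approximation this entry builds on can be illustrated with the classic random Fourier feature map, which approximates an RBF kernel with inner products of low-dimensional nonlinear features. This is a standard sketch of the baseline technique, not the paper's data-dependent method; the dimensions and gamma value are arbitrary choices for the example.

```python
import numpy as np

def random_fourier_features(X, D, gamma, rng):
    """Map X (n x d) to D features whose inner products approximate
    the RBF kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Z = random_fourier_features(X, D=2000, gamma=0.1, rng=rng)
K_approx = Z @ Z.T                               # n x n approximate kernel
sq = np.sum(X**2, axis=1)
K_exact = np.exp(-0.1 * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
```

The circulant idea mentioned in the abstract replaces the dense Gaussian matrix W with a circulant-structured one, so the projection X @ W can be computed with FFTs instead of a dense matrix multiply; the sketch above uses the unstructured dense version.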
  • ABSTRACT: A traditional and intuitively appealing Multi-Task Multiple Kernel Learning (MT-MKL) method is to optimize the sum (thus, the average) of objective functions with a (partially) shared kernel function, which allows information sharing amongst tasks. We point out that the obtained solution corresponds to a single point on the Pareto Front (PF) of a Multi-Objective Optimization (MOO) problem, which considers the concurrent optimization of all task objectives involved in the Multi-Task Learning (MTL) problem. Motivated by this last observation, and arguing that the former approach is heuristic, we propose a novel Support Vector Machine (SVM) MT-MKL framework that considers an implicitly-defined set of conic combinations of task objectives. We show that solving our framework produces solutions along a path on the aforementioned PF and that it subsumes the optimization of the average of objective functions as a special case. Using algorithms we derived, we demonstrate through a series of experimental results that the framework is capable of achieving better classification performance when compared to other similar MTL approaches.
    IEEE Transactions on Neural Networks and Learning Systems, 01/2015; 26(1):51-61. DOI: 10.1109/TNNLS.2014.2309939
  • ABSTRACT: In recent years, different methods based on kernels have been used with success in a variety of tasks such as classification. However, in the typical use of these methods, the choice of the optimal kernel is crucial to improve the performance of a specific task. So, instead of selecting a single kernel, multiple kernel learning (MKL) has been proposed, which uses a combination of kernels where the weight of each kernel is optimized in the training stage. MKL methods use kernels in linear, nonlinear or data-dependent combinations. Methods based on MKL have performed better than methods using a single kernel such as the Support Vector Machine (SVM). In this article, we propose a new MKL method based on a local (data-dependent) and nonlinear combination of different kernels, using a gating model to select the appropriate kernel function. We call our proposal localized nonlinear multiple kernel learning (LNLMKL). In our experiments on binary microarray classification, different kernels were used in SVM, and different kernel combinations were used for our proposal and for the other MKL methods. Finally, we report the results of these experiments using eight high-dimensional microarray data sets, demonstrating that our proposal performed better than the other methods analyzed.
    2012 XXXVIII Conferencia Latinoamericana En Informatica (CLEI); 10/2012
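The gating-model idea in the entry above can be sketched as a locally weighted combination of base kernels, where a softmax gate chooses kernel weights per data point. This is a minimal illustration of the general gated-combination construction, not the LNLMKL algorithm itself; the gate parameterization V and the choice of base kernels are assumptions made for the example.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax."""
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gated_kernel(Ks, X, V):
    """Locally combined kernel
        K(x_i, x_j) = sum_k g_k(x_i) K_k(x_i, x_j) g_k(x_j),
    with gate g(x) = softmax(V^T x) weighting kernels per point.
    Each term is D K_k D with D = diag(g_k), so K stays PSD."""
    G = softmax(X @ V)                 # n x p gate values
    n = X.shape[0]
    K = np.zeros((n, n))
    for k, Kk in enumerate(Ks):
        K += np.outer(G[:, k], G[:, k]) * Kk
    return K

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
V = rng.normal(size=(4, 2))            # one gate column per base kernel
Ks = [X @ X.T, (X @ X.T + 1.0) ** 2]   # linear and degree-2 polynomial kernels
K = gated_kernel(Ks, X, V)
```

In a full method the gate parameters V would be learned jointly with the classifier; here they are fixed at random only to show the construction.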
