Conference Paper

Learning Non-Linear Combinations of Kernels

Conference: Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, Vancouver, British Columbia, Canada.
Source: DBLP


This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique.
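The optimization described in the abstract can be sketched concretely. The following minimal example is not the authors' reference implementation: it assumes a quadratic (degree-2), non-negative combination of base kernels, the kernel ridge regression objective F(mu) = y^T (K_mu + lambda I)^{-1} y, and a simple feasible set {mu >= 0, ||mu||_2 <= Lambda} standing in for the constraint region used in the paper. The function names, step size, and choice of Gaussian base kernels are illustrative assumptions.

```python
# Minimal sketch, assuming a quadratic combination of base kernels,
#     K_mu = (sum_k mu_k K_k) o (sum_l mu_l K_l)   ("o" = elementwise product),
# the KRR objective F(mu) = y^T (K_mu + lam*I)^{-1} y, and the feasible set
# {mu >= 0, ||mu||_2 <= Lambda}.  Not the paper's exact formulation.
import numpy as np

def combined_kernel(mu, base_kernels):
    """Quadratic combination: Hadamard square of the linear combination."""
    lin = sum(m * K for m, K in zip(mu, base_kernels))
    return lin * lin, lin

def project(mu, Lambda):
    """Project onto the set {mu >= 0, ||mu||_2 <= Lambda}."""
    mu = np.maximum(mu, 0.0)
    norm = np.linalg.norm(mu)
    return mu if norm <= Lambda else mu * (Lambda / norm)

def learn_kernel(base_kernels, y, lam=1.0, Lambda=1.0, eta=0.1, n_iter=50):
    """Projection-based gradient descent on F(mu)."""
    n, p = len(y), len(base_kernels)
    mu = project(np.ones(p) / p, Lambda)
    for _ in range(n_iter):
        K_mu, lin = combined_kernel(mu, base_kernels)
        alpha = np.linalg.solve(K_mu + lam * np.eye(n), y)  # (K_mu + lam*I)^{-1} y
        # dK_mu/dmu_k = 2 * lin o K_k, and dF/dmu_k = -alpha^T (dK_mu/dmu_k) alpha
        grad = np.array([-(alpha @ ((2.0 * lin * K_k) @ alpha))
                         for K_k in base_kernels])
        mu = project(mu - eta * grad, Lambda)
    return mu

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

    def rbf(X, gamma):  # Gaussian base kernels with arbitrary widths
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    base = [rbf(X, g) for g in (0.1, 1.0, 10.0)]
    print("learned mu:", learn_kernel(base, y))
```

The gradient used here, dF/dmu_k = -alpha^T (dK_mu/dmu_k) alpha with alpha = (K_mu + lambda I)^{-1} y, follows from differentiating the matrix inverse; projecting after each step keeps mu in the assumed feasible set.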

Citations:
    • "A variant of this method, known as hierarchical multiple learning [10] attempts to learn a linear combination of an exponential number of basic kernels, represented as a product of sums. At the same time, non-linear combination of polynomial kernels are also studied in [11]. "
    ICASSP 2015; 04/2015
    • "Works have been proposed to optimize the hyperparameters of a kernel function [10] [29], and finding the best way of combining multiple kernels, i.e., Multiple Kernel Learning (MKL) [4] [3] [18] [12]. A summary of MKL can be found in [20]. "
    ABSTRACT: Kernel approximation via nonlinear random feature maps is widely used in speeding up kernel machines. There are two main challenges for the conventional kernel approximation methods. First, before performing kernel approximation, a good kernel has to be chosen. Picking a good kernel is a very challenging problem in itself. Second, high-dimensional maps are often required in order to achieve good performance. This leads to high computational cost in both generating the nonlinear maps and in the subsequent learning and prediction process. In this work, we propose to optimize the nonlinear maps directly with respect to the classification objective in a data-dependent fashion. The proposed approach achieves kernel approximation and kernel learning in a joint framework. This leads to much more compact maps without hurting the performance. As a by-product, the same framework can also be used to achieve more compact kernel maps to approximate a known kernel. We also introduce Circulant Nonlinear Maps, which uses a circulant-structured projection matrix to speed up the nonlinear maps for high-dimensional data.
    (A hedged sketch of a circulant-style random feature map in this spirit follows this list.)
    • "The generalization bound and other theoretical aspects of the L p -norm MKL method is extensively studied in [16]. Besides searching for the optimal constraints on θ, several other works have been proposed, such as using a Group-Lasso type regularizer [35] and an L 1 -norm within-group / L s -norm (s ≥ 1) group-wise regularizer [1], nonlinearly combined MKL [6], MKL with localized θ [10], MKL with hyperkernels [22], MKL based on the radii of minimum enclosing balls [9] and other methods, such as the ones of [23], [29] and [33], to name a few. A thorough survey of MKL is given in [11]. "
    ABSTRACT: A traditional and intuitively appealing Multi-Task Multiple Kernel Learning (MT-MKL) method is to optimize the sum (thus, the average) of objective functions with (partially) shared kernel function, which allows information sharing amongst tasks. We point out that the obtained solution corresponds to a single point on the Pareto Front (PF) of a Multi-Objective Optimization (MOO) problem, which considers the concurrent optimization of all task objectives involved in the Multi-Task Learning (MTL) problem. Motivated by this last observation and arguing that the former approach is heuristic, we propose a novel Support Vector Machine (SVM) MT-MKL framework, that considers an implicitly-defined set of conic combinations of task objectives. We show that solving our framework produces solutions along a path on the aforementioned PF and that it subsumes the optimization of the average of objective functions as a special case. Using algorithms we derived, we demonstrate through a series of experimental results that the framework is capable of achieving better classification performance, when compared to other similar MTL approaches.
    IEEE Transactions on Neural Networks and Learning Systems 01/2015; 26(1):51-61. DOI: 10.1109/TNNLS.2014.2309939
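The kernel-approximation abstract in the list above mentions circulant-structured projections for speeding up nonlinear (random Fourier) feature maps. The sketch below is a generic illustration of that idea, not code from the cited paper: the class name CirculantRandomFeatures, the Gaussian-kernel target exp(-gamma ||x - y||^2), and all parameter choices are assumptions made for the example.

```python
# Hedged illustration: a random Fourier feature map for a Gaussian kernel,
# with the dense random projection replaced by a circulant matrix applied via
# FFT, so each projection costs O(d log d) instead of O(d^2).
# Not the cited paper's implementation; names and parameters are illustrative.
import numpy as np

class CirculantRandomFeatures:
    def __init__(self, dim, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # A single random vector defines the circulant projection; random sign
        # flips are a common trick to decorrelate its (shared-entry) rows.
        self.w = rng.normal(scale=np.sqrt(2.0 * gamma), size=dim)
        self.signs = rng.choice([-1.0, 1.0], size=dim)
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=dim)
        self.dim = dim

    def _project(self, X):
        # circulant(w) @ x computed as an FFT-based circular convolution
        return np.real(np.fft.ifft(np.fft.fft(self.w) *
                                   np.fft.fft(X * self.signs, axis=1), axis=1))

    def transform(self, X):
        return np.sqrt(2.0 / self.dim) * np.cos(self._project(X) + self.b)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, gamma = 64, 0.1
    X = 0.3 * rng.normal(size=(5, d))
    phi = CirculantRandomFeatures(d, gamma).transform(X)
    approx = phi @ phi.T  # approximate Gram matrix
    exact = np.exp(-gamma * ((X[:, None] - X[None, :]) ** 2).sum(-1))
    # Agreement holds only up to Monte Carlo noise with a single circulant draw.
    print(np.round(approx, 2))
    print(np.round(exact, 2))
```

Because the projection matrix is circulant, it can be applied with FFTs and stored as a single length-d vector, which is where the speed-up for high-dimensional data comes from.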

