Conference Paper

Learning Non-Linear Combinations of Kernels

Conference: Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, Vancouver, British Columbia, Canada.
Source: DBLP

ABSTRACT: This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in a few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets, demonstrating the effectiveness of our technique.
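The quadratic (degree-2) case of such a polynomial kernel combination can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and toy data are invented here, and the combination weights `mu` are fixed rather than learned by the paper's projection-based gradient descent.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian RBF base kernel matrix."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def quadratic_combination(base_grams, mu):
    """Degree-2 polynomial combination of base kernels: the elementwise
    (Hadamard) square of the nonnegative linear combination sum_k mu_k K_k.
    By the Schur product theorem this stays positive semidefinite."""
    lin = sum(m * K for m, K in zip(mu, base_grams))
    return lin ** 2

def krr_dual_solution(K, y, lam):
    """Kernel ridge regression dual coefficients: alpha = (K + lam*I)^{-1} y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

# Toy regression problem with three RBF base kernels at different widths.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

base_grams = [rbf_kernel(X, X, g) for g in (0.1, 1.0, 10.0)]
mu = np.array([0.5, 0.3, 0.2])  # nonnegative weights (fixed here; the paper
                                # learns them by projected gradient descent)
K = quadratic_combination(base_grams, mu)
alpha = krr_dual_solution(K, y, lam=1e-2)
train_pred = K @ alpha
```

The point of the elementwise square is that a degree-2 combination induces products of base kernels, giving a strictly richer family than the linear combinations used in standard MKL.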

ABSTRACT: Selecting the optimal kernel is an important and difficult challenge in applying kernel methods to pattern recognition. To address this challenge, multiple kernel learning (MKL) aims to learn a kernel from a combination of base kernel functions that performs optimally on the task. In this paper, we propose a novel MKL-themed approach to combine base kernels that are multiplicatively shaped with low-rank positive semidefinite matrices. The proposed approach generalizes several popular MKL methods and thus provides more flexibility in modeling data. Computationally, we show how these low-rank matrices can be learned efficiently from data using convex quadratic programming. Empirical studies on several standard benchmark datasets for MKL show that the new approach often improves prediction accuracy, statistically significantly, over very competitive single-kernel and other MKL methods.

ABSTRACT: Kernel approximation via nonlinear random feature maps is widely used in speeding up kernel machines. There are two main challenges for conventional kernel approximation methods. First, before performing kernel approximation, a good kernel has to be chosen; picking a good kernel is a very challenging problem in itself. Second, high-dimensional maps are often required in order to achieve good performance, which leads to high computational cost both in generating the nonlinear maps and in the subsequent learning and prediction process. In this work, we propose to optimize the nonlinear maps directly with respect to the classification objective in a data-dependent fashion. The proposed approach achieves kernel approximation and kernel learning in a joint framework, leading to much more compact maps without hurting performance. As a by-product, the same framework can also be used to obtain more compact kernel maps that approximate a known kernel. We also introduce Circulant Nonlinear Maps, which use a circulant-structured projection matrix to speed up the nonlinear maps for high-dimensional data.
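The conventional random-feature pipeline that this line of work builds on, together with the FFT trick behind circulant projections, can be sketched as follows. This is an illustrative reconstruction under standard assumptions (Gaussian random Fourier features), not the paper's method; the full circulant scheme additionally involves randomization not shown here.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Random map z(x) = sqrt(2/D) * cos(W x + b); inner products
    z(x)^T z(x') approximate the Gaussian kernel exp(-gamma*||x-x'||^2)."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)

def circulant_matvec(X, c):
    """Multiply each row of X by the circulant matrix generated by c,
    via FFT-based circular convolution: O(d log d) per row instead of
    O(d^2) for a dense projection matrix."""
    return np.real(np.fft.ifft(np.fft.fft(X, axis=1) * np.fft.fft(c)))

rng = np.random.default_rng(1)
n, d, D = 200, 16, 2048
X = rng.normal(size=(n, d))
gamma = 0.5

# Conventional dense Gaussian projection (the baseline circulant maps speed up).
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Z = random_fourier_features(X, W, b)

K_approx = Z @ Z.T
K_exact = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
mean_abs_err = np.abs(K_approx - K_exact).mean()

# Circulant projection of the data: one FFT per row replaces a dense matvec.
c = rng.normal(size=d)
X_proj = circulant_matvec(X, c)
```

The approximation error shrinks roughly as 1/sqrt(D), which is why high-dimensional maps are needed in the conventional approach and why compact, learned maps are attractive.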

ABSTRACT: In this paper, we give a new generalization error bound of Multiple Kernel Learning (MKL) for a general class of regularizations. Our main target in this paper is dense-type regularizations, including ℓp-MKL, which imposes ℓp-mixed-norm regularization instead of ℓ1-mixed-norm regularization. According to recent numerical experiments, sparse regularization does not necessarily show good performance compared with dense-type regularizations. Motivated by this fact, this paper gives a general theoretical tool to derive fast learning rates that is applicable to arbitrary mixed-norm-type regularizations in a unifying manner. As a by-product of our general result, we show a fast learning rate of ℓp-MKL that is the tightest among existing bounds. We also show that our general learning rate achieves the minimax lower bound. Finally, we show that, when the complexities of candidate reproducing kernel Hilbert spaces are inhomogeneous, dense-type regularization shows a better learning rate compared with sparse ℓ1 regularization.
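In standard MKL notation, the ℓp-mixed-norm regularized objective contrasted with the ℓ1 case above can be written as follows; this is our reconstruction of the generic form, not the paper's exact formulation:

```latex
\min_{f_m \in \mathcal{H}_m}\;
\frac{1}{n}\sum_{i=1}^{n} \ell\!\Big(\sum_{m=1}^{M} f_m(x_i),\; y_i\Big)
\;+\; \lambda \Big(\sum_{m=1}^{M} \|f_m\|_{\mathcal{H}_m}^{p}\Big)^{1/p}
```

Here ℓ is the loss, the H_m are the candidate reproducing kernel Hilbert spaces, and λ is a regularization parameter; p = 1 recovers sparse ℓ1-MKL, while p > 1 gives the dense-type regularizations whose learning rates the abstract analyzes.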

