Combined SVM-Based Feature Selection and Classification.

Machine Learning (Impact Factor: 1.47). 01/2005; 61:129-150. DOI: 10.1007/s10994-005-1505-9
Source: DBLP

ABSTRACT Feature selection is an important combinatorial optimisation problem in the context of supervised pattern classification. This paper presents four novel continuous feature selection approaches directly minimising the classifier performance. In particular, we include linear and nonlinear Support Vector Machine classifiers. The key ideas of our approaches are additional regularisation and embedded nonlinear feature selection. To solve our optimisation problems, we apply difference of convex functions programming which is a general framework for non-convex continuous optimisation. Experiments with artificial data and with various real-world problems including organ classification in computed tomography scans demonstrate that our methods accomplish the desired feature selection and classification performance simultaneously.
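As a hedged illustration of the difference of convex functions (DC) idea mentioned in the abstract, the sketch below runs the generic DC algorithm (DCA) on a one-dimensional toy objective. The objective f(x) = x^4 - 2x^2, its split into g(x) = x^4 and h(x) = 2x^2, and all function names are assumptions chosen for illustration; they are not the paper's feature selection formulations.

```python
import math


def f(x):
    """Non-convex toy objective, written as g(x) - h(x)."""
    return x ** 4 - 2.0 * x ** 2


def dca_minimise(x0, iterations=50):
    """Generic DCA on f = g - h with g(x) = x**4 and h(x) = 2*x**2.

    Each step linearises the concave part -h at the current iterate
    and solves the remaining convex subproblem in closed form.
    """
    x = x0
    for _ in range(iterations):
        y = 4.0 * x  # gradient of h at the current iterate
        # Convex subproblem: minimise x**4 - y*x, i.e. solve 4*x**3 = y.
        x = math.copysign(abs(y / 4.0) ** (1.0 / 3.0), y)
    return x


x_pos = dca_minimise(0.5)    # starts right of zero
x_neg = dca_minimise(-0.5)   # starts left of zero
print(x_pos, f(x_pos))
print(x_neg, f(x_neg))
```

In this toy case the iterates move monotonically towards the minimiser on the same side as the starting point, which shows that DCA only finds a critical point that depends on initialisation. A start at exactly zero stays at zero.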

  • ABSTRACT: To propose a new flexible and sparse classifier that results in interpretable decision support systems. Support vector machines (SVMs) for classification are very powerful methods to obtain classifiers for complex problems. Although the performance of these methods is consistently high and non-linearities and interactions between variables can be handled efficiently when using non-linear kernels such as the radial basis function (RBF) kernel, their use in domains where interpretability is an issue is hampered by their lack of transparency. Many feature selection algorithms have been developed to allow for some interpretation, but the impact of the different input variables on the prediction still remains unclear. Alternative models using additive kernels are restricted to main effects, reducing their usefulness in many applications. This paper proposes a new approach to expand the RBF kernel into interpretable and visualizable components, including main and two-way interaction effects. In order to obtain a sparse model representation, an iterative l1-regularized parametric model using the interpretable components as inputs is proposed. Results on toy problems illustrate the ability of the method to select the correct contributions and an improved performance over standard RBF classifiers in the presence of irrelevant input variables. For a 10-dimensional x-or problem, an SVM using the standard RBF kernel obtains an area under the receiver operating characteristic curve (AUC) of 0.947, whereas the proposed method achieves an AUC of 0.997. The latter additionally identifies the relevant components. In a second 10-dimensional artificial problem, the underlying class probability follows a logistic regression model. An SVM with the RBF kernel results in an AUC of 0.975, as opposed to 0.994 for the presented method. The proposed method is applied to two benchmark datasets: the Pima Indian diabetes and the Wisconsin Breast Cancer dataset. In both cases the AUC is comparable to that of the standard method (0.826 versus 0.826 and 0.990 versus 0.996) and to those reported in the literature. The selected components are consistent with different approaches reported in other work. However, this method is able to visualize the effect of each of the components, allowing for interpretation of the learned logic by experts in the application domain. This work proposes a new method to obtain flexible and sparse risk prediction models. The proposed method performs as well as a support vector machine using the standard RBF kernel, but has the additional advantage that the resulting model can be interpreted by experts in the application domain.
    Artificial Intelligence in Medicine 10/2013; · 1.65 Impact Factor
  • ABSTRACT: This paper considers a class of feature selecting support vector machines (SVMs) based on Lq-norm regularization, where q ∈ (0,1). The standard SVM [Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer, NY.] minimizes the hinge loss function subject to the L2-norm penalty. Recently, L1-norm SVM (L1-SVM) [Bradley, P., Mangasarian, O., 1998. Feature selection via concave minimization and support vector machines. In: Machine Learning Proceedings of the Fifteenth International Conference (ICML98). Citeseer, pp. 82–90.] was suggested for feature selection and has gained great popularity since its introduction. L0-norm penalization would result in more powerful sparsification, but exact solution is NP-hard. This raises the question of whether fractional-norm (Lq for q between 0 and 1) penalization can yield benefits over the existing L1, and approximated L0 approaches for SVMs. The major obstacle to answering this is that the resulting objective functions are non-convex. This paper addresses the difficult optimization problems of fractional-norm SVM by introducing a new algorithm based on the Difference of Convex functions (DC) programming techniques [Pham Dinh, T., Le Thi, H., 1998. A DC optimization algorithm for solving the trust-region subproblem. SIAM J. Optim. 8 (2), 476–505. Le Thi, H., Pham Dinh, T., 2008. A continuous approach for the concave cost supply problem via DC programming and DCA. Discrete Appl. Math. 156 (3), 325–338.], which efficiently solves a reweighted L1-SVM problem at each iteration. Numerical results on seven real world biomedical datasets support the effectiveness of the proposed approach compared to other commonly-used sparse SVM methods, including L1-SVM, and recent approximated L0-SVM approaches.
    Computational Statistics & Data Analysis 11/2013; 67:136–148. · 1.30 Impact Factor
  • ABSTRACT: Many feature selection methods are available in the literature, driven by the availability of data with hundreds of variables and hence very high dimension. Feature selection methods provide a way of reducing computation time, improving prediction performance, and gaining a better understanding of the data in machine learning or pattern recognition applications. In this paper we provide an overview of some of the methods present in the literature. The objective is to provide a generic introduction to variable elimination which can be applied to a wide array of machine learning problems. We focus on Filter, Wrapper and Embedded methods. We also apply some of the feature selection techniques to standard datasets to demonstrate their applicability.
    Computers & Electrical Engineering 01/2014; 40(1):16-28. · 0.93 Impact Factor
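
To make the Filter category named in the last abstract concrete, here is a minimal sketch of a filter-style ranking: each feature is scored on its own, without training a classifier, and the features are then sorted by score. The choice of absolute Pearson correlation as the score, the toy data, and the names `pearson` and `rank_features` are assumptions for illustration, not taken from any of the listed papers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, labels):
    """Score each feature column by |correlation with the label|, best first."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append(abs(pearson(column, labels)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 tracks the label, feature 1 is constant,
# feature 2 is only weakly related to the label.
rows = [
    [0.1, 1.0, 0.3],
    [0.2, 1.0, 0.9],
    [0.9, 1.0, 0.4],
    [0.8, 1.0, 0.9],
]
labels = [0, 0, 1, 1]

order, scores = rank_features(rows, labels)
print(order)
print([round(s, 3) for s in scores])
```

A wrapper method would instead retrain a classifier on candidate feature subsets, and an embedded method would select features during training itself, as the sparse SVM formulations in the abstracts above do.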