Combined SVM-Based Feature Selection and Classification.

Universität Mannheim, Mannheim, Baden-Württemberg, Germany
Machine Learning (Impact Factor: 1.69). 11/2005; 61(1-3):129-150. DOI: 10.1007/s10994-005-1505-9
Source: DBLP

ABSTRACT Feature selection is an important combinatorial optimisation problem in the context of supervised pattern classification. This paper presents four novel continuous feature selection approaches directly minimising the classifier performance. In particular, we include linear and nonlinear Support Vector Machine classifiers. The key ideas of our approaches are additional regularisation and embedded nonlinear feature selection. To solve our optimisation problems, we apply difference of convex functions programming which is a general framework for non-convex continuous optimisation. Experiments with artificial data and with various real-world problems including organ classification in computed tomography scans demonstrate that our methods accomplish the desired feature selection and classification performance simultaneously.
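The "embedded" idea above — selecting features while training the classifier rather than in a separate preprocessing step — can be illustrated with a minimal sketch. The code below trains an \(\ell_1\)-penalised squared-hinge linear classifier by proximal gradient descent (ISTA) on synthetic data; the data, penalty, and solver are illustrative assumptions, not the paper's DC-programming formulation.

```python
# Illustrative sketch: embedded feature selection via an l1-penalised
# squared-hinge linear classifier trained with proximal gradient
# descent (ISTA). NOT the paper's DC-programming method.
import numpy as np

rng = np.random.default_rng(0)
n, d, d_informative = 200, 20, 5

# Synthetic data: only the first 5 features carry the label signal.
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:d_informative] = 2.0
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(n))

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: shrinks small entries to exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

w = np.zeros(d)
lam, step = 0.05, 0.01
for _ in range(2000):
    margin = y * (X @ w)
    # Gradient of the mean squared-hinge loss.
    grad = -(2.0 / n) * X.T @ (y * np.maximum(1.0 - margin, 0.0))
    # The l1 proximal step zeroes out uninformative weights,
    # so training and feature selection happen simultaneously.
    w = soft_threshold(w - step * grad, step * lam)

selected = np.flatnonzero(w)
accuracy = np.mean(np.sign(X @ w) == y)
print("selected features:", selected)
print("training accuracy: %.2f" % accuracy)
```

The key design point, shared with the embedded approaches in the paper, is that sparsity is produced by the training objective itself rather than by a separate filter or wrapper stage.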

  • ABSTRACT: This paper presents a new feature selection technique based on rough sets and the bat algorithm (BA). BA is attractive for feature selection in that the bats discover good feature combinations as they fly within the feature-subset space. Compared with GAs, BA does not need complex operators such as crossover and mutation; it requires only primitive and simple mathematical operators, and is computationally inexpensive in terms of both memory and runtime. A fitness function based on rough sets is designed as the optimisation target; it incorporates both the classification accuracy and the number of selected features, and hence balances classification performance against reduction size. The paper makes use of four initialisation strategies for starting the optimisation and studies their effect on bat performance; the strategies reflect forward selection, backward selection, and a combination of both. Experiments on UCI data sets compare the proposed algorithm with GA-based and PSO-based approaches to rough-set feature reduction. The results on different data sets show that the bat algorithm is efficient for rough-set-based feature selection, and that the rough-set-based fitness function yields better classification while keeping the number of selected features small.
    9th International Conference on Computer Engineering & Systems (ICCES), Cairo, Egypt; 12/2014
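The accuracy/reduction trade-off described in this entry can be sketched as a simple weighted fitness. The weight `alpha` and the plugged-in accuracy values below are illustrative assumptions, not the paper's exact rough-set dependency measure.

```python
# Illustrative fitness of the kind described above: a weighted sum of
# classification quality and reduction size. The weighting scheme is
# an assumption for illustration, not the paper's exact formula.

def fitness(subset, accuracy, n_total, alpha=0.9):
    """Higher is better: reward accuracy, penalise large feature subsets."""
    assert subset, "empty subsets are not valid candidates"
    reduction = 1.0 - len(subset) / n_total
    return alpha * accuracy + (1.0 - alpha) * reduction

# Two candidate subsets with equal accuracy: the smaller one wins.
small = fitness({0, 3}, accuracy=0.90, n_total=10)
large = fitness({0, 1, 2, 3, 4}, accuracy=0.90, n_total=10)
print(small > large)
```

Because the accuracy term dominates (alpha close to 1), subset size only breaks ties between similarly accurate candidates, which matches the stated goal of balancing classification performance against reduction size.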
  • ABSTRACT: We develop an exact penalty approach for feature selection in machine learning via the zero-norm (\(\ell _{0}\)) regularisation problem. Using a new result on exact penalty techniques, we equivalently reformulate the original problem as a Difference of Convex (DC) functions program. This approach permits us to treat all the existing convex and nonconvex approximations of the zero-norm in a unified view within the DC programming and DCA framework. An efficient DCA scheme is investigated for the resulting DC program. The algorithm is implemented for feature selection in SVMs; it requires solving one linear program at each iteration and enjoys interesting convergence properties. We perform an empirical comparison with some nonconvex approximation approaches and show, using several datasets from the UCI repository and the NIPS 2003 feature selection challenge, that the proposed algorithm is efficient in both feature selection and classification.
    Machine Learning 01/2014; DOI:10.1007/s10994-014-5455-y · 1.69 Impact Factor
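For background, one standard way to cast a zero-norm surrogate as a DC program is the exponential approximation of Bradley and Mangasarian (the surrogate treated in this paper may differ):

\[
\|w\|_0 \;\approx\; \sum_{i=1}^{d} \bigl(1 - e^{-\theta |w_i|}\bigr), \qquad \theta > 0,
\]

where each term admits the DC decomposition

\[
1 - e^{-\theta|t|} \;=\; \underbrace{\theta|t|}_{g(t)\ \text{convex}} \;-\; \underbrace{\bigl(\theta|t| - 1 + e^{-\theta|t|}\bigr)}_{h(t)\ \text{convex}} .
\]

At each DCA iteration, \(h\) is linearised at the current iterate \(w^k\), leaving a convex subproblem in which the surrogate reduces to a weighted \(\ell_1\) term; for a linear SVM loss that subproblem is a linear program, consistent with the one-LP-per-iteration scheme described in the abstract.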
  • ABSTRACT: This paper studies the problem of feature selection in the context of the Semi-Supervised Support Vector Machine (S3VM). The zero norm, a natural concept for dealing with sparsity, is used for feature selection. Because of two nonconvex terms (the loss function on unlabeled data and the ℓ0 term), we face an NP-hard optimisation problem. Two continuous approaches based on DC (Difference of Convex functions) programming and DCA (DC Algorithms) are developed. The first is a DC approximation approach that approximates the ℓ0-norm by a DC function. The second is an exact reformulation approach based on exact penalty techniques in DC programming. All the resulting optimisation problems are DC programs for which DCA is investigated. Several common sparsity-inducing functions are considered, and six versions of DCA are developed. Empirical experiments on several benchmark datasets show the efficiency of the proposed algorithms in both feature selection and classification.
    Neurocomputing 04/2015; 153. DOI:10.1016/j.neucom.2014.11.051 · 2.01 Impact Factor
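The nonconvexity attributed above to the unlabeled-data loss can be made concrete with the symmetric hinge commonly used in S3VMs; the sketch below is a generic illustration of that loss, not this paper's exact formulation.

```python
# Illustrative sketch of the two loss terms in S3VM training.
# The symmetric hinge on unlabeled points is what makes the
# objective nonconvex (it is a "tent" peaking at the boundary).
import numpy as np

def hinge_labeled(y, score):
    """Standard hinge loss for a labeled point (convex in score)."""
    return np.maximum(0.0, 1.0 - y * score)

def hinge_unlabeled(score):
    """Symmetric hinge for an unlabeled point: penalises scores near
    the decision boundary regardless of their sign (nonconvex)."""
    return np.maximum(0.0, 1.0 - np.abs(score))

scores = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(hinge_unlabeled(scores))  # largest at the boundary, zero outside the margin
```

Since the symmetric hinge pushes unlabeled points away from the decision boundary on either side, combining it with an ℓ0 term yields the doubly nonconvex problem the abstract describes, which is why DC decompositions are a natural fit.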