Chapter

Abstract

Feature Selection (FS) is a core part of the data processing pipeline. The use of ensembles in FS is a relatively new approach that aims to introduce more diversity into the set of selected features, yielding better performance as well as more robust and accurate results. An aggregation step combines the output of each FS method and generates a single feature subset. In this paper, a novel ensemble method for FS, "EFSCAT", is proposed, which ranks all the features and then clusters the most related ones. To reduce the size of each ranking, an automatic threshold is introduced in every ranker. This added thresholding step improves computational efficiency because it cuts off the low-ranking features produced by each ranker. Mean-shift clustering is then used to combine the results of the rankers, which makes the aggregation step very time-efficient. EFSCAT makes the classification more robust and stable.
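The abstract outlines a rank-threshold-cluster-aggregate pipeline. Below is a minimal Python sketch of one plausible reading of that pipeline, assuming scikit-learn rankers and mean-shift clustering; the automatic per-ranker threshold is replaced by a simple above-mean cutoff, since the abstract does not specify it.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.cluster import MeanShift

X, y = make_classification(n_samples=200, n_features=30, random_state=0)

# Step 1: several rankers score every feature (scores normalised to 0-1).
rankers = [lambda X, y: f_classif(X, y)[0],
           lambda X, y: mutual_info_classif(X, y, random_state=0)]
scores = np.array([r(X, y) for r in rankers])          # (rankers, features)
scores = (scores - scores.min(axis=1, keepdims=True)) / (
    np.ptp(scores, axis=1, keepdims=True) + 1e-12)

# Step 2: automatic threshold per ranker (placeholder: keep above-mean scores).
kept = scores >= scores.mean(axis=1, keepdims=True)
candidates = np.where(kept.any(axis=0))[0]             # kept by any ranker

# Step 3: mean-shift clusters the surviving features by their score profiles.
labels = MeanShift().fit_predict(scores[:, candidates].T)

# Step 4: aggregate by keeping the cluster with the highest mean score.
best = max(set(labels), key=lambda c: scores[:, candidates[labels == c]].mean())
selected = candidates[labels == best]
print("selected features:", selected)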


Article
Harris Hawks Optimizer (HHO) is one of the many recent algorithms in the field of metaheuristics. The HHO algorithm mimics the cooperative behavior of Harris hawks and their foraging behavior in nature, called the surprise pounce. HHO benefits from a small number of control parameters, simplicity of implementation, and a high level of exploration and exploitation. To alleviate the drawbacks of this algorithm, a modified version called Nonlinear based Chaotic Harris Hawks Optimization (NCHHO) is proposed in this paper. NCHHO uses chaotic and nonlinear control parameters to improve HHO's optimization performance. The main goal of using chaotic maps in the proposed method is to improve the exploratory behavior of HHO. In addition, this paper introduces a nonlinear control parameter to adjust HHO's exploratory and exploitative behaviors. The proposed NCHHO algorithm shows improved performance using a variety of chaotic maps, which were implemented to identify the most effective one, and was tested on several well-known benchmark functions. The paper also considers solving an Internet of Vehicles (IoV) optimization problem that showcases the applicability of NCHHO to large-scale, real-world problems. The results demonstrate that the NCHHO algorithm is very competitive, and often superior, compared to the other algorithms. In particular, NCHHO provides 92% better results on average in solving the uni-modal and multi-modal functions with problem dimensions of D = 30 and 50, whereas, on higher-dimensional problems, the proposed algorithm shows 100% consistent improvement with D = 100 and 1000 compared to other algorithms. In solving the IoV problem, the success rate was 62.5%, which is substantially better in comparison with the state-of-the-art algorithms. To this end, the proposed NCHHO algorithm demonstrates a promising method that can be widely used in different applications, bringing benefits to industries and businesses in solving the optimization problems they experience daily, such as resource allocation, information retrieval, finding the optimal path for sending data over networks, path planning, and many other applications.
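As an illustration of the two ingredients named here, the sketch below pairs a logistic chaotic map with a nonlinear (cosine) decay of HHO's escaping-energy parameter. The specific map and schedule are assumptions for illustration; the exact NCHHO update rules are not reproduced.

import math

def logistic_map(x):
    """Classic logistic chaotic map on (0, 1), r = 4."""
    return 4.0 * x * (1.0 - x)

def escape_energy(t, T, E0, chaotic):
    """HHO-style escaping energy with a nonlinear (cosine) decay replacing
    the original linear one, perturbed by a chaotic value in [0, 1]."""
    decay = math.cos(math.pi / 2 * (t / T))   # nonlinear: 1 -> 0 over the run
    return 2.0 * E0 * chaotic * decay

x = 0.7  # chaotic state (avoid the map's fixed points 0 and 0.75)
for t in range(5):
    x = logistic_map(x)
    print(t, round(escape_energy(t, T=100, E0=1.0, chaotic=x), 4))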
Article
Full-text available
Feature selection is an important issue in data mining, and it is used to reduce the dimensionality of feature sets. Web spam detection is one of the research fields of data mining. Given the increasing amount of information available in virtual space and users' need to search, search engines and the algorithms they use play an important role in ranking. Web spam is an illegal method to fraudulently increase the rank of internet pages by deceiving the algorithms of search engines, so it is essential to use an efficient detection method. Up to now, many methods have been proposed to combat web spam. An ensemble feature selection method is proposed in this paper to detect web spam. Content features of the standard WEBSPAM-UK2007 dataset are used for evaluation. A Bayes network classifier is used along with a 70-30% training-testing split of the dataset. The presented results show that the AUC of this method is higher than that of the other methods reported in this paper. Moreover, the best values of the evaluation metrics in our proposed method are optimal in comparison to the other methods reported in this paper. In addition, it improves classification metrics in comparison to basic feature selection methods.
Article
Full-text available
In credit scoring, feature selection aims at removing irrelevant data to improve the performance of the scorecard and its interpretability. Standard techniques treat feature selection as a single-objective task and rely on statistical criteria such as correlation. Recent studies suggest that using profit-based indicators may improve the quality of scoring models for businesses. We extend the use of profit measures to feature selection and develop a multi-objective wrapper framework based on the NSGA-II genetic algorithm with two fitness functions: the Expected Maximum Profit (EMP) and the number of features. Experiments on multiple credit scoring data sets demonstrate that the proposed approach develops scorecards that can yield a higher expected profit using fewer features than conventional feature selection strategies.
Article
Full-text available
Feature selection is an essential technique to reduce the dimensionality problem in data mining tasks. Traditional feature selection algorithms fail to scale to large feature spaces. This paper proposes a new method to solve the dimensionality problem in which clustering is integrated with a correlation measure to produce a good feature subset. First, irrelevant features are eliminated using the k-means clustering method, and then non-redundant features are selected from each cluster using a correlation measure. The proposed method is evaluated on microarray and text datasets, and the results are compared with other renowned feature selection methods using a Naïve Bayes classifier. To verify the accuracy of the proposed method with different numbers of relevant features, a percentage-wise criterion is used. The experimental results reveal the efficiency and accuracy of the proposed method.
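A rough sketch of the two-stage idea, assuming standardised feature columns as the clustering input and absolute class correlation as the relevance measure (both are our assumptions, not the paper's exact choices):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, random_state=1)
cols = StandardScaler().fit_transform(X).T            # one row per feature

# Stage 1: group features with k-means; similar features land together.
labels = KMeans(n_clusters=5, n_init=10, random_state=1).fit_predict(cols)

# Stage 2: keep the feature in each cluster most correlated with the class.
relevance = np.abs([np.corrcoef(f, y)[0, 1] for f in cols])
selected = [max(np.where(labels == c)[0], key=lambda i: relevance[i])
            for c in range(5)]
print("selected features:", sorted(selected))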
Article
Full-text available
The present paper is Part 2 in this series of two papers. In Part 1 we provided an introduction to Multiple Classifier Systems (MCS) with a focus into the fundamentals: basic nomenclature, key elements, architecture, main methods, and prevalent theory and framework. Part 1 then overviewed the application of MCS to the particular field of multimodal biometric person authentication in the last 25 years, as a prototypical area in which MCS has resulted in important achievements. Here in Part 2 we present in more technical detail recent trends and developments in MCS coming from multimodal biometrics that incorporate context information in an adaptive way. These new MCS architectures exploit input quality measures and pattern-specific particularities that move apart from general population statistics, resulting in robust multimodal biometric systems. Similarly as in Part 1, methods here are described in a general way so they can be applied to other information fusion problems as well. Finally, we also discuss here open challenges in biometrics in which MCS can play a key role.
Article
Full-text available
Ensemble methods have delivered exceptional performance in various applications. However, this exceptional performance is achieved at the expense of heavy storage requirements and slower predictions. Ensemble pruning aims at reducing the complexity of this popular learning paradigm without worsening its performance. This paper presents an efficient and effective ordering-based ensemble pruning method which ranks all the base classifiers with respect to a maximum relevancy maximum complementary (MRMC) measure. The MRMC measure evaluates the base classifier's classification ability as well as its complementariness to the ensemble, and thereby a set of accurate and complementary base classifiers can be selected. Moreover, an evaluation function that deliberately favors the candidate sub-ensembles with better performance in classifying low-margin instances is also proposed. Experiments performed on 25 benchmark datasets demonstrate the effectiveness of our proposed method.
Article
Full-text available
In this paper we present a new algorithm for parameter-free clustering by mode seeking. Mode seeking, especially in the form of the mean shift algorithm, is a widely used strategy for clustering data, but at the same time prone to poor performance if the parameters are not chosen correctly. We propose to form a clustering ensemble consisting of repeated and bootstrapped runs of the recent kNN mode seeking algorithm, an algorithm which is faster than ordinary mean shift and better suited to high-dimensional data. This creates a mode seeking clustering algorithm that is robust with respect to the choice of parameters and high-dimensional input spaces, while at the same time inheriting all other strengths of mode seeking in general. We demonstrate promising results on a number of synthetic and real data sets.
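For concreteness, here is a compact sketch of the kNN mode seeking building block that the ensemble repeats over bootstrapped runs: each point links to the neighbour with the highest kNN density estimate, and following the links yields the modes. The density estimate and tie handling are our simplifications, not the paper's exact formulation.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_mode_seeking(X, k=10):
    dist, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    density = 1.0 / (dist[:, -1] + 1e-12)   # inverse distance to k-th neighbour
    # Each point points at the densest point in its neighbourhood (maybe itself).
    parent = idx[np.arange(len(X)), np.argmax(density[idx], axis=1)]
    # Follow the pointer graph until every point reaches a mode (fixed point).
    while not np.array_equal(parent, parent[parent]):
        parent = parent[parent]
    return parent                            # cluster label = index of its mode

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
print(np.unique(knn_mode_seeking(X, k=15)))  # one mode index per cluster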
Article
Full-text available
Most of the available feature selection techniques in the literature are classifier-bound, meaning that a group of features is tied to the performance of a specific classifier, as in wrapper and hybrid approaches. Our objective in this study is to select a set of generic features not tied to any classifier, based on the proposed framework. This framework uses attribute clustering and feature ranking techniques in a pipeline in order to remove redundant features. On each uncovered cluster, signal-to-noise ratio, t-statistics and significance analysis of microarray are independently applied to select the top-ranked features. Both filter and evolutionary wrapper approaches have been considered for feature selection, and the data set with the selected features is given to an ensemble of predefined, statistically different classifiers. The class labels of the test data are determined using the majority voting technique. Moreover, with the aforesaid objectives, this paper focuses on obtaining a stable result out of various classification models. Further, a comparative analysis has been performed to study the classification accuracy and computational time of the current approach and evolutionary wrapper techniques. It gives a better insight into the features, further enhancing the classification accuracy with less computational time.
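The signal-to-noise ratio ranking applied on each cluster reduces, in the two-class case, to |mu1 - mu0| / (sigma1 + sigma0) per feature. A small sketch on synthetic data:

import numpy as np

def snr_scores(X, y):
    """Per-feature signal-to-noise ratio for a two-class problem."""
    X0, X1 = X[y == 0], X[y == 1]
    return np.abs(X1.mean(axis=0) - X0.mean(axis=0)) / (
        X1.std(axis=0) + X0.std(axis=0) + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100)
X[y == 1, 0] += 2.0                          # make feature 0 discriminative
print(np.argsort(snr_scores(X, y))[::-1])    # feature 0 should rank first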
Article
Full-text available
Crow search algorithm (CSA) is a new nature-inspired algorithm proposed by Askarzadeh in 2016. The main inspiration of CSA comes from the mechanism crows use to hide their food. Like most optimization algorithms, CSA suffers from a low convergence rate and entrapment in local optima. In this paper, a novel meta-heuristic optimizer, namely the chaotic crow search algorithm (CCSA), is proposed to overcome these problems. The proposed CCSA is applied to optimize the feature selection problem for 20 benchmark datasets. Ten chaotic maps are employed during the optimization process of CSA. The performance of CCSA is compared with other well-known and recent optimization algorithms. Experimental results reveal the capability of CCSA to find an optimal feature subset which maximizes classification performance and minimizes the number of selected features. Moreover, the results show that CCSA is superior to CSA and the other algorithms. In addition, the experiments show that the sine chaotic map is the most appropriate map to significantly boost the performance of CSA.
Article
Full-text available
When solving many machine learning problems such as classification, there exists a large number of input features. However, not all features are relevant for solving the problem, and sometimes including irrelevant features may deteriorate the learning performance. Therefore, it is essential to select the most relevant features, which is known as feature selection. Many feature selection algorithms have been developed, including evolutionary algorithms and particle swarm optimization (PSO) algorithms, to find a subset of the most important features for accomplishing a particular machine learning task. However, the traditional PSO does not perform well for large-scale optimization problems, which degrades its effectiveness for feature selection when the number of features dramatically increases. In this paper, we propose to use a very recent PSO variant, known as the competitive swarm optimizer (CSO), which was dedicated to large-scale optimization, for solving high-dimensional feature selection problems. In addition, the CSO, which was originally developed for continuous optimization, is adapted to perform feature selection, which can be considered a combinatorial optimization problem. An archive technique is also introduced to reduce the computational cost. Experiments on six benchmark datasets demonstrate that, compared to the canonical PSO-based and a state-of-the-art PSO variant for feature selection, the proposed CSO-based feature selection algorithm not only selects a much smaller number of features but also results in better classification performance.
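The pairwise competition at the core of CSO, adapted to feature selection by thresholding continuous positions at 0.5, can be sketched as follows. The fitness function here is a stand-in for the cross-validated classification performance the paper uses, and the parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n, d, phi = 20, 15, 0.1                       # swarm size, features, social factor
X = rng.random((n, d)); V = np.zeros((n, d))

def fitness(mask):                            # placeholder objective: few features
    return -mask.sum()                        # (real use: classification accuracy)

for _ in range(50):
    mean_pos = X.mean(axis=0)
    pairs = rng.permutation(n).reshape(-1, 2) # random pairwise competitions
    for a, b in pairs:
        fa, fb = fitness(X[a] > 0.5), fitness(X[b] > 0.5)
        w, l = (a, b) if fa >= fb else (b, a) # winner survives, loser learns
        r1, r2, r3 = rng.random((3, d))
        V[l] = r1 * V[l] + r2 * (X[w] - X[l]) + phi * r3 * (mean_pos - X[l])
        X[l] = np.clip(X[l] + V[l], 0, 1)

best = X[np.argmax([fitness(x > 0.5) for x in X])] > 0.5
print("selected:", np.where(best)[0])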
Article
This paper proposes a new feature selection framework for machine learning-based phishing detection systems, called the Hybrid Ensemble Feature Selection (HEFS). In the first phase of HEFS, a novel Cumulative Distribution Function gradient (CDF-g) algorithm is exploited to produce primary feature subsets, which are then fed into a data perturbation ensemble to yield secondary feature subsets. The second phase derives a set of baseline features from the secondary feature subsets by using a function perturbation ensemble. The overall experimental results suggest that HEFS performs best when it is integrated with the Random Forest classifier, where the baseline features correctly distinguish 94.6% of phishing and legitimate websites using only 20.8% of the original features. In another experiment, the baseline features (10 in total) utilised with Random Forest outperform the set of all features (48 in total) used with SVM, Naive Bayes, C4.5, JRip, and PART classifiers. HEFS also shows promising results when benchmarked using another well-known phishing dataset from the University of California Irvine (UCI) repository. Hence, HEFS is a highly desirable and practical feature selection technique for machine learning-based phishing detection systems.
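A hedged interpretation of a CDF-gradient style cutoff (not the paper's exact CDF-g algorithm): normalise the sorted feature scores into a cumulative curve and cut where the marginal contribution of the next ranked feature falls below the average contribution.

import numpy as np

scores = np.sort(np.random.default_rng(2).exponential(size=48))[::-1]
cdf = np.cumsum(scores) / scores.sum()
gain = np.diff(cdf, prepend=0.0)             # contribution of each ranked feature
cutoff = int(np.argmax(gain < gain.mean()))  # first below-average contribution
print(f"keep top {cutoff} of {len(scores)} features")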
Article
The emergence of the “curse of dimensionality” issue as a result of high-dimensional datasets deteriorates the capability of learning algorithms and also incurs high memory and computational costs. Selecting features by discarding redundant and irrelevant ones is a crucial machine learning technique aimed at reducing the dimensionality of these datasets, which improves the performance of the learning algorithm. Feature selection has been extensively applied in many application areas relevant to expert and intelligent systems, such as data mining and machine learning. Although many algorithms have been developed so far, they are still unsatisfying when confronting high-dimensional data. This paper presents a new hybrid filter-based feature selection algorithm based on a combination of clustering and the modified Binary Ant System (BAS), called FSCBAS, to overcome the search space and high-dimensional data processing challenges efficiently. This model provides both global and local search capabilities between and within clusters. In the proposed method, inspired by the genetic algorithm and simulated annealing, a damped mutation strategy is introduced that avoids falling into local optima, and a new redundancy reduction policy adopted to estimate the correlation between the selected features further improves the algorithm. The proposed method can be applied in many expert system applications, such as microarray data processing, text classification and image processing on high-dimensional data, to handle the high dimensionality of the feature space and improve classification performance simultaneously. The performance of the proposed algorithm was compared to that of state-of-the-art feature selection algorithms using different classifiers on real-world datasets. The experimental results confirmed that the proposed method reduces computational complexity significantly and achieves better performance than the other feature selection methods.
Article
Online streaming feature selection, as a new approach which deals with feature streams in an online manner, has attracted much attention in recent years and played a critical role in dealing with high-dimensional problems. However, most of the existing online streaming feature selection methods need the domain information before learning and specifying the parameters in advance. It is hence a challenge to select unified and optimal parameters before learning for all different types of data sets. In this paper, we define a new Neighborhood Rough Set relation with adapted neighbors named the Gap relation and propose a new online streaming feature selection method based on this relation, named OFS-A3M. OFS-A3M does not require any domain knowledge and does not need to specify any parameters in advance. With the “maximal-dependency, maximal-relevance and maximal-significance” evaluation criteria, OFS-A3M can select features with high correlation, high dependency and low redundancy. Experimental studies on fifteen different types of data sets show that OFS-A3M is superior to traditional feature selection methods with the same numbers of features and state-of-the-art online streaming feature selection algorithms in an online manner.
Article
Feature selection (FS) has become a significant part of the data processing pipeline. Recently, ensemble FS has emerged as a new methodology that promises to improve FS robustness and performance. In this paper, we propose several ensemble FS methods built on voting aggregation schemes such as plurality vote, single transferable vote, Borda count, and novel weighted Borda count. Additionally, we present the new concept of clustering FS methods prior to building ensembles using a mean-shift clustering algorithm. The proposed methods are examined using three accuracy measures: the ability to correctly identify relevant features, FS stability, and influence on classification. The ensembles and clustered ensembles based on a weighted Borda count show very balanced performance, achieving quality results in all investigated measures and outperforming the other methods examined.
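The Borda count aggregation named here can be sketched in a few lines: each ranker awards d - rank points to every feature, and the points are summed. The weighted Borda variant presumably scales each ranker's points by a quality weight; that weight is omitted here.

import numpy as np

rankings = np.array([[0, 2, 1, 3, 4],        # each row: feature indices,
                     [1, 0, 2, 4, 3],        # best first, from one ranker
                     [0, 1, 3, 2, 4]])
d = rankings.shape[1]
points = np.zeros(d)
for r in rankings:
    points[r] += np.arange(d, 0, -1)         # d points for 1st place, ..., 1 for last
print("consensus ranking:", np.argsort(points)[::-1])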
Article
Similar to feature selection over completely labeled data, the aim of feature selection over partially labeled data (semi-supervised feature selection) is also to find a feature subset which satisfies the intended constraint. Nevertheless, two difficulties may emerge in semi-supervised feature selection: (1) labels are incomplete, since labeled and unlabeled samples coexist in the data; (2) the explanation of the selected feature subset is not clear. These two problems are therefore the main focus of our research. Firstly, the unlabeled samples can be predicted through various semi-supervised learning methods. Secondly, the Local Neighborhood Decision Error Rate is proposed to construct multiple fitness functions for evaluating the significance of each candidate feature. Such a mechanism not only realizes an ensemble selector in the process of feature selection, but also ensures that the qualified feature subset brings lower decision errors. Then, a heuristic algorithm is redesigned to execute feature selection. Finally, by testing nine different ratios (10%, 20%, …, 90%) of labeled samples in the data, the experimental results demonstrate that our approach is superior to previous work, mainly because: (1) the qualified feature subset derived by our approach provides better classification performance; (2) our feature selection process requires less time.
Article
Ensemble learning is a prolific field in machine learning, since it is based on the assumption that combining the output of multiple models is better than using a single model, and it usually provides good results. It has normally been employed for classification, but it can also be used to improve other disciplines such as feature selection. Feature selection consists of selecting the features that are relevant for a problem and discarding those that are irrelevant or redundant, with the main goal of improving classification accuracy. In this work, we provide the reader with the basic concepts necessary to build an ensemble for feature selection, as well as reviewing up-to-date advances and commenting on the future trends that are yet to be faced.
Article
Feature selection plays a critical role in classification problems. Feature selection methods intend to retain relevant features and eliminate redundant features. This work focuses on feature selection methods based on information theory. By analyzing the composition of feature relevancy, we believe that a good feature selection method should maximize new classification information while minimizing feature redundancy. Therefore, a novel feature selection method named Composition of Feature Relevancy (CFR) is proposed. To evaluate CFR, we conduct experiments on eight real-world data sets and two different classifiers (Naïve-Bayes and Support Vector Machine). Our method outperforms five other competing methods in terms of average classification accuracy and highest classification accuracy.
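The "maximise new classification information, minimise feature redundancy" rule is, in spirit, the classic greedy relevance-minus-redundancy selection. The sketch below uses that generic mRMR-style criterion as a stand-in rather than the paper's exact CFR measure.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

X, y = make_classification(n_samples=300, n_features=10, random_state=3)
Xd = (X > np.median(X, axis=0)).astype(int)        # crude discretisation
relevance = mutual_info_classif(Xd, y, discrete_features=True, random_state=3)

selected, remaining = [], list(range(Xd.shape[1]))
for _ in range(4):
    def score(f):
        if not selected:
            return relevance[f]
        # Average mutual information with already-selected features = redundancy.
        red = np.mean([mutual_info_score(Xd[:, f], Xd[:, s]) for s in selected])
        return relevance[f] - red
    best = max(remaining, key=score)
    selected.append(best); remaining.remove(best)
print("selected:", selected)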
Article
Feature selection is generally considered a very important step in any pattern recognition process. Its aim is to reduce the computational cost of the classification task, in an attempt to increase, or at least not reduce, the classification performance. In the framework of handwriting recognition, the large variability of the handwriting of different writers makes the selection of appropriate feature sets even more complex, and it has been widely investigated. Although promising, the results achieved so far present several limitations, including the computational complexity, the dependence on the adopted classifiers, and the difficulty of evaluating the interactions among features. In this study, we tried to overcome some of the above drawbacks by adopting a feature-ranking-based technique: we considered different univariate measures to produce a feature ranking, and we proposed a greedy search approach for choosing the feature subset able to maximize the classification results. In the experiments, we considered one of the most effective and widely used sets of features in handwriting recognition to verify whether our approach allows us to obtain good classification results with a reduced set of features. The experimental results, obtained using standard real-world databases of handwritten characters, confirmed the effectiveness of our proposal.
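The ranking-plus-greedy-search idea can be sketched as scanning prefixes of a univariate ranking and keeping the prefix with the best cross-validated accuracy. The synthetic data, classifier, and step size below are our choices for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=30, n_informative=8,
                           random_state=4)
order = np.argsort(f_classif(X, y)[0])[::-1]       # univariate ranking

best_k, best_acc = 1, 0.0
for k in range(1, 31, 2):                          # greedy scan over prefixes
    acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                          X[:, order[:k]], y, cv=3).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc
print(f"best prefix: {best_k} features, accuracy {best_acc:.3f}")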
Article
Feature selection ensemble methods are a recent approach aiming at adding diversity in sets of selected features, improving performance and obtaining more robust and stable results. However, using an ensemble introduces the need for an aggregation step to combine all the output methods that conform the ensemble. Besides, when trying to improve computational efficiency, ranking methods that order all initial features are preferred, and so an additional thresholding step is also mandatory. In this work two different ensemble designs based on ranking methods are described. The main difference between them is the order in which the combination and thresholding steps are performed. In addition, a new automatic threshold based on the combination of three data complexity measures is proposed and compared with traditional thresholding approaches based on retaining a fixed percentage of features. The behavior of these methods was tested, according to the SVM classification accuracy, with satisfactory results, for three different scenarios: synthetic datasets and two types of real datasets (where sample size is much higher than feature size, and where feature size is much higher than sample size).
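The two designs described here differ only in the order of the thresholding and combination steps. A sketch with a fixed-percentage threshold (the paper's automatic threshold based on data complexity measures is not reproduced):

import numpy as np

scores = np.random.default_rng(5).random((4, 12))   # 4 rankers, 12 features
keep = 0.25                                         # retain top 25%
k = int(keep * scores.shape[1])

# Design A: combine first (mean score), then threshold once.
combined = scores.mean(axis=0)
sel_a = np.argsort(combined)[::-1][:k]

# Design B: threshold each ranker first, then combine by vote frequency.
votes = np.zeros(scores.shape[1])
for row in scores:
    votes[np.argsort(row)[::-1][:k]] += 1
sel_b = np.argsort(votes)[::-1][:k]
print(sorted(sel_a), sorted(sel_b))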
Article
Feature selection, which is used to choose a subset of relevant features, has attracted considerable attention in recent years. Typical feature selection approaches include traditional filters, mutual information based methods, clustering based methods and hybrid methods. As many feature selection methods cannot achieve the best features effectively and efficiently, a new hybrid feature selection method is proposed in this paper. First, the drawbacks of some existing feature relevance measurements are analyzed, and a component co-occurrence based feature relevance measurement is proposed. Then, the implementation of the proposed feature selection is given: (1) the samples are preprocessed, and two feature subsets are obtained by using two different optimal filters. (2) A feature weight based union operation is proposed to merge the obtained feature subsets. (3) As the hierarchical agglomerative clustering algorithm can produce clusters of high quality without requiring the cluster number, it is applied to obtain the final feature subset using a predetermined threshold. In the experiments, two typical classifiers, support vector machine and K-nearest neighbor, are used on eight datasets (Lung-cancer, Breast-cancer-wisconsin, Arrhythmia, Arcene, CNAE-9, Madelon, Spambase and KDD-cup-1999), and 10-fold cross-validation is carried out with the F1 measure. Experimental results show that the performance of the proposed feature relevance measurement is superior to those of traditional methods. In addition, the proposed feature selection outperforms many existing typical methods in classification accuracy and execution speed, illustrating its effectiveness in achieving the best features.
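Step (3) can be sketched with SciPy's agglomerative clustering: cut the dendrogram at a predetermined distance threshold and keep one representative per cluster. The feature vectors, weights, and threshold below are illustrative stand-ins.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(6)
F = rng.random((10, 50))                  # 10 candidate features as vectors
weights = rng.random(10)                  # union weights from the two filters

Z = linkage(F, method="average")
labels = fcluster(Z, t=3.0, criterion="distance")   # predetermined threshold
selected = [max(np.where(labels == c)[0], key=lambda i: weights[i])
            for c in np.unique(labels)]
print("final subset:", sorted(selected))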
Article
Fuzzy data occurs frequently in the fields of decision making, social sciences, and control theory. We consider the problem of clustering fuzzy data along with automatic component number detection and feature selection. A model selection criterion called minimum message length is used to address the problem of component number selection. The Bayesian framework can be adopted here, by applying an explicit prior distribution over the parameter values. We discuss both uninformative and informative priors. For the latter, a gradient descent algorithm for automatic optimization of the prior hyper-parameters is presented. The problem of simultaneous feature selection involves ordering the discriminative features according to their relative importance, and at the same time eliminating non-discriminative features. The feature selection problem is also formulated as a parameter estimation problem by extending the concept of feature saliency. Then the estimation can be computed simultaneously with the clustering steps. By combining the clustering, the cluster number detection and the feature selection into one estimation problem, we modified the fuzzy Expectation-Maximization (EM) algorithm to perform all of the estimation. Evaluation criteria are proposed and empirical study results are reported to showcase the efficacy of our proposals.
Article
In an era of social media and connectivity, web users are becoming increasingly enthusiastic about interacting, sharing, and working together through online collaborative media. More recently, this collective intelligence has spread to many different areas, with a growing impact on everyday life, such as in education, health, commerce and tourism, leading to an exponential growth in the size of the social Web. However, the distillation of knowledge from such unstructured Big data is an extremely challenging task. Consequently, the semantic and multimodal content of the present-day Web, whilst well suited for human use, remains barely accessible to machines. In this work, we explore the potential of a novel semi-supervised learning model based on the combined use of random projection scaling, as part of a vector space model, and support vector machines to perform reasoning on a knowledge base. The latter is developed by merging a graph representation of commonsense with a linguistic resource for the lexical representation of affect. Comparative simulation results show a significant improvement in tasks such as emotion recognition and polarity detection, and pave the way for the development of future semi-supervised learning approaches to big social data analytics.
Article
Feature selection is a popular data pre-processing step. The aim is to remove some of the features in a data set with minimum information loss, leading to a number of benefits including faster running time and easier data visualisation. In this paper we introduce two unsupervised feature selection algorithms. These make use of a cluster-dependent feature-weighting mechanism reflecting the within-cluster degree of relevance of a given feature. Those features with a relatively low weight are removed from the data set. We compare our algorithms to two other popular alternatives using a number of experiments on both synthetic and real-world data sets, with and without added noisy features. These experiments demonstrate our algorithms clearly outperform the alternatives.
Article
Sparse coding and dictionary learning have recently gained great interest in signal, image and audio processing applications by representing each problem instance with a sparse set of atoms. This also allows us to obtain different representations of feature sets in machine learning problems, so different feature views for classifier ensembles can be obtained using sparse coding. On the other hand, unlabelled data is nowadays abundant, and active learning methods with single classifiers and classifier ensembles have received great interest. In this study, Random Subspace Dictionary Learning (RDL) and Bagging Dictionary Learning (BDL) algorithms are examined by learning ensembles of dictionaries through feature/instance subspaces. Besides, ensembles of dictionaries are evaluated under an active learning framework as promising models, named Active Random Subspace Dictionary Learning (ARDL) and Active Bagging Dictionary Learning (ABDL) algorithms. The active learning methods are compared with their Support Vector Machine counterparts. The experiments on eleven datasets from the UCI and OpenML repositories have shown that selecting instance and feature subspaces for the dictionary learning model increases the number of correctly classified instances for most of the data sets, while SVM has superiority over all of the applied models. Furthermore, using an active learner generally increases the chance of improved classification performance as the number of iterations increases.
Article
Multi-label document classification is a typical challenge in many real-world applications. Multi-label ranking is a common approach, but existing studies usually disregard the effects of context and the relationships among labels during the scoring process. In this paper, we propose a Long Short-Term Memory (LSTM)-based multi-label ranking model for document classification, namely LSTM(Formula presented.), consisting of repLSTM, an adaptive data representation process, and rankLSTM, a unified learning-ranking process. In repLSTM, a supervised LSTM is used to learn the document representation by incorporating the document labels. In rankLSTM, the order of the document labels is rearranged in accordance with a semantic tree, in which the semantics are compatible with and appropriate to the sequential learning of LSTM. The model can be trained as a whole by sequentially predicting labels. Connectionist Temporal Classification is performed in rankLSTM to address the error propagation for a variable number of labels in each document. Moreover, a variety of document classification experiments conducted on three typical datasets reveal the impressive performance of our proposed approach.
Article
The feature selection problem in data mining is addressed here by proposing a bi-objective genetic algorithm based feature selection method. Boundary region analysis of rough set theory and multivariate mutual information of information theory are used as the two objective functions in the proposed work, to select only precise and informative data from the data set. The data set is sampled with a replacement strategy, and the method is applied to determine non-dominated feature subsets from each sampled data set. Finally, an ensemble of such bi-objective genetic algorithm based feature selectors is developed with the help of parallel implementations to produce a much more generalized feature subset. Individual feature selector outputs are aggregated using a novel dominance-based principle to produce the final feature subset. The proposed work is validated on a repository dedicated to feature selection datasets as well as on UCI machine learning repository datasets, and the experimental results are compared with related state-of-the-art feature selection methods to show the effectiveness of the proposed ensemble feature selection method.
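The dominance relation that drives both the bi-objective search and the aggregation of non-dominated subsets is simple to state in code. The candidate subsets and objective values below are illustrative, not taken from the paper.

def dominates(a, b):
    """True if solution a Pareto-dominates b (both objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Hypothetical feature subsets with (precision proxy, information proxy) scores.
candidates = {frozenset({0, 3}): (0.81, 0.40),
              frozenset({1, 3}): (0.78, 0.55),
              frozenset({0, 1, 3}): (0.80, 0.38)}

front = [s for s, f in candidates.items()
         if not any(dominates(g, f) for t, g in candidates.items() if t != s)]
print("non-dominated subsets:", [sorted(s) for s in front])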
Article
In feature selection, the most important features must be chosen so as to decrease their number while retaining their discriminatory information. Within this context, a novel feature selection method based on an ensemble of wrappers is proposed and applied to automatically select features for fish age classification. The effectiveness of this procedure has been tested on an Atlantic cod database for different powerful statistical learning classifiers. The subsets based on few selected features, e.g. otolith weight and fish weight, are particularly noticeable given current biological findings and practices in fishery research, and the classification results obtained with them outperform those of previous studies in which a manual feature selection was performed.
Article
Ensemble classification is a well-established approach that involves fusing the decisions of multiple predictive models. A similar “ensemble logic” has been recently applied to challenging feature selection tasks aimed at identifying the most informative variables (or features) for a given domain of interest. In this work, we discuss the rationale of ensemble feature selection and evaluate the effects and the implications of a specific ensemble approach, namely the data perturbation strategy. Basically, it consists in combining multiple selectors that exploit the same core algorithm but are trained on different perturbed versions of the original data. The real potential of this approach, still object of debate in the feature selection literature, is here investigated in conjunction with different kinds of core selection algorithms (both univariate and multivariate). In particular, we evaluate the extent to which the ensemble implementation improves the overall performance of the selection process, in terms of predictive accuracy and stability (i.e., robustness with respect to changes in the training data). Furthermore, we measure the impact of the ensemble approach on the final selection outcome, i.e. on the composition of the selected feature subsets. The results obtained on ten public genomic benchmarks provide useful insight on both the benefits and the limitations of such ensemble approach, paving the way to the exploration of new and wider ensemble schemes.
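The data perturbation strategy itself is straightforward to sketch: run the same core selector on bootstrap resamples and keep features by selection frequency, which also doubles as a stability measure. The selector, resample count, and consensus threshold below are our choices for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=25, n_informative=5,
                           random_state=7)
rng = np.random.default_rng(7)

counts = np.zeros(X.shape[1])
for b in range(30):                                   # 30 perturbed versions
    idx = rng.integers(0, len(y), size=len(y))        # bootstrap resample
    scores = mutual_info_classif(X[idx], y[idx], random_state=b)
    counts[np.argsort(scores)[::-1][:5]] += 1         # core selector: top 5

stable = np.where(counts >= 0.6 * 30)[0]              # 60% consensus threshold
print("frequently selected features:", stable)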
Robust clustering using a kNN mode seeking ensemble. Pattern Recognition.
  • J. N. Myhre
A ranking-based feature selection approach for handwritten character recognition.
  • N. D. Cilia