Article

A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data

IEEE Transactions on Knowledge and Data Engineering (Impact Factor: 2.07). 01/2011; 25(99):1-1. DOI: 10.1109/TKDE.2011.181
Source: IEEE Xplore

ABSTRACT

A fast clustering-based feature selection algorithm, FAST, is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature, i.e., the one most strongly related to the target classes, is selected from each cluster to form the final subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-spanning tree (MST) clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study: extensive experiments compare FAST with several representative feature selection algorithms with respect to four types of well-known classifiers, before and after feature selection. The results on 35 publicly available real-world high-dimensional image, microarray, and text datasets demonstrate that FAST not only produces smaller subsets of features but also improves the performance of all four types of classifiers.
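For readers who want the shape of the two-step procedure described above, the following is a minimal Python sketch, not the authors' implementation. It assumes discrete (integer-coded) features and labels; the symmetric uncertainty (SU) measure matches the paper's T-Relevance (SU between a feature and the class) and F-Correlation (SU between a pair of features), while the 2.0 - SU edge weights and the use of SciPy's minimum_spanning_tree routine are illustrative choices.

    # Minimal FAST-style sketch (not the authors' code). Assumes X holds
    # discrete, integer-coded features; the (2.0 - SU) edge weights and
    # SciPy's MST routine are implementation assumptions.
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

    def entropy(x):
        _, counts = np.unique(x, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def symmetric_uncertainty(x, y):
        # SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging over [0, 1]
        hx, hy = entropy(x), entropy(y)
        if hx + hy == 0.0:
            return 0.0
        _, counts = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
        p = counts / counts.sum()
        hxy = -np.sum(p * np.log2(p))  # joint entropy H(X, Y)
        return 2.0 * (hx + hy - hxy) / (hx + hy)

    def fast_select(X, y):
        n = X.shape[1]
        # T-Relevance: SU between each feature and the class
        t_rel = np.array([symmetric_uncertainty(X[:, f], y) for f in range(n)])
        # F-Correlation: SU between every pair of features
        su = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                su[i, j] = su[j, i] = symmetric_uncertainty(X[:, i], X[:, j])
        # MST over the complete feature graph; 2.0 - SU keeps all weights
        # positive, so strongly correlated features are cheap to connect
        mst = minimum_spanning_tree(2.0 - su).tocoo()
        # Drop tree edges whose F-Correlation is below the T-Relevance of
        # both endpoints; the remaining forest's components are the clusters
        adj = np.zeros((n, n))
        for i, j in zip(mst.row, mst.col):
            if not (su[i, j] < t_rel[i] and su[i, j] < t_rel[j]):
                adj[i, j] = adj[j, i] = 1
        n_clusters, labels = connected_components(adj, directed=False)
        # From each cluster, keep the feature most strongly tied to the class
        return sorted(int(max(np.where(labels == c)[0], key=lambda f: t_rel[f]))
                      for c in range(n_clusters))

Calling fast_select(X, y) on an integer-coded dataset returns one representative feature index per cluster, mirroring the subset-construction step described in the abstract.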

    • "FAST is compared with other Feature Selection algorithms like FCBF, ReliefF etc, with respect to classifiers, namely the probability based Naïve-Bayes, rule based RIPPER, instance based IB1 and tree based C4.5. FAST relatively produces smaller subsets and also improves the performance of above four types of classifiers [6]. For prediction and diagnosis of various diseases with good accuracy Data Mining techniques are widely used. "

    Article · Oct 2015
    • "DF), term frequency (TF), inverse document frequency (IDF), information gain (IG), mutual information (MI), Compactness, First Appearance (FA) [15] "

    Article · Oct 2015
    • "If the class-relevance of a feature is lower than that of another and the correlation between them, it would be identified as a redundant features and thus to be removed. Recently, an extension of FCBF, namely fast clustering-based feature selection algorithm (FAST), is proposed [28]. In this algorithm, features are firstly divided into clusters. "
    ABSTRACT: Feature selection has attracted significant attention in data mining and machine learning over the past decades. Many existing feature selection methods eliminate redundancy by measuring the pairwise inter-correlation of features, while ignoring the complementariness of features and inter-correlations among more than two features. In this study, a modification term concerning feature complementariness is introduced into the evaluation criterion of features. Additionally, to identify the interference effect of already-selected false positives (FPs), the redundancy-complementariness dispersion is also taken into account to adjust the measurement of pairwise inter-correlation of features. To illustrate the effectiveness of the proposed method, classification experiments are conducted with four frequently used classifiers on ten datasets. The classification results verify the superiority of the proposed method over seven representative feature selection methods.
    Article · Jul 2015 · Knowledge-Based Systems
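The redundancy rule quoted in the last entry is FCBF's approximate-Markov-blanket test: a feature f is discarded when some already-selected, more class-relevant feature g satisfies SU(g, f) >= SU(f, C). Below is a minimal sketch under the same assumptions as the FAST sketch above, reusing its symmetric_uncertainty helper; the delta relevance threshold is illustrative.

    # Sketch of FCBF-style redundancy removal (not the original code);
    # reuses symmetric_uncertainty from the FAST sketch above.
    import numpy as np

    def fcbf_select(X, y, delta=0.0):
        n = X.shape[1]
        su_c = np.array([symmetric_uncertainty(X[:, f], y) for f in range(n)])
        # Consider features above the relevance threshold, most relevant first
        order = [f for f in np.argsort(-su_c) if su_c[f] > delta]
        selected = []
        for f in order:
            # f is redundant if an already-selected (hence at least as
            # relevant) feature correlates with f at least as strongly as
            # f correlates with the class
            if all(symmetric_uncertainty(X[:, g], X[:, f]) < su_c[f]
                   for g in selected):
                selected.append(f)
        return selected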