A Fast Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data

IEEE Transactions on Knowledge and Data Engineering (Impact Factor: 1.82). 01/2011; 25(99):1 - 1. DOI: 10.1109/TKDE.2011.181
Source: IEEE Xplore

ABSTRACT A fast clustering-based feature selection algorithm, FAST, is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to the target classes is selected from each cluster to form a subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-spanning tree clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study. Extensive experiments are carried out to compare FAST and several representative feature selection algorithms with respect to four types of well-known classifiers before and after feature selection. The results, on 35 publicly available real-world high-dimensional image, microarray, and text data sets, demonstrate that FAST not only produces smaller subsets of features but also improves the performance of the four types of classifiers.
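The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the correlation measure or the rule for splitting the tree into clusters, so the sketch assumes absolute Pearson correlation as the relatedness measure and a hypothetical distance threshold `cut` for removing tree edges. The function names (`pearson`, `minimum_spanning_tree`, `select_features`) are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def minimum_spanning_tree(weights):
    """Prim's algorithm on a dense symmetric weight matrix; returns (u, v, w) edges."""
    n = len(weights)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge into each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, weights[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges


def select_features(columns, target, cut=0.5):
    """Cluster features via an MST, then keep one representative per cluster.

    columns: list of feature columns; target: numeric class labels.
    cut: hypothetical threshold; MST edges with distance above it are removed.
    """
    n = len(columns)
    # Step 1: distance between features is 1 - |correlation| (an assumed measure).
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    kept = [(u, v) for u, v, w in minimum_spanning_tree(dist) if w <= cut]

    # Connected components of the pruned tree are the clusters (union-find).
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for u, v in kept:
        root[find(u)] = find(v)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Step 2: per cluster, pick the feature most correlated with the target.
    relevance = [abs(pearson(col, target)) for col in columns]
    return sorted(max(members, key=lambda i: relevance[i])
                  for members in clusters.values())
```

Calling `select_features(columns, target)` returns the indices of the selected features, one per cluster. The dense Prim implementation is quadratic in the number of features, which keeps the sketch short; a production version for high-dimensional data would likely need a sparser graph or a library routine.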


Available from: Qinbao Song, Nov 23, 2014
  • Source
    ABSTRACT: Since the beginning of the Internet, Internet Service Providers (ISPs) have needed to treat users' traffic differently according to agreements between ISPs and customers. This procedure, known as Quality of Service (QoS) Management, has not changed much in recent years (DiffServ and Deep Packet Inspection have been the most commonly chosen mechanisms). However, the steady growth of Internet users and services, together with the application of recent Machine Learning techniques, opens up the possibility of going one step further in the smart management of network traffic. In this paper, we first survey current tools and techniques for QoS Management. Then we introduce clustering and classification Machine Learning techniques for traffic characterization, along with the concept of Quality of Experience. Finally, with all these components, we present a new framework that will manage Quality of Service in a smart way in a telecom Big Data based scenario, for both mobile and fixed communications.
    Workshop on Big Data Applications and Principles (BIGDAP 2014), Madrid (Spain); 09/2014
  • Source
    ABSTRACT: Feature selection in data mining minimizes the complexity involved in selecting meaningful attributes from a given data set. Attribute selection can be broadly categorized into the filter approach and the wrapper approach. The filter approach filters the meaningful attributes from the data set, whereas the wrapper approach creates a wrapper-like coverage between the meaningful and meaningless attributes. These two categories can be further classified into heuristic and complete search, meta-heuristic, and artificial neural network methods. The objective of this paper is to survey the common key steps involved in feature selection and to describe the insights into feature subset selection proposed by various researchers. Experiments are conducted to select the best algorithms for attribute subset selection with the rough dataset, and the results are discussed.