Inducing multi-objective clustering ensembles with genetic programming

Federal University of São Carlos, Sorocaba Campus, Rod. João Leme dos Santos, Km 110, Bairro Itinga, 18052-780 Sorocaba, SP, Brazil
Neurocomputing (Impact Factor: 2.08). 12/2010; 74(1):494-498. DOI: 10.1016/j.neucom.2010.09.014
Source: DBLP


Recent years have witnessed a growing interest in two advanced strategies for the data clustering problem: clustering ensembles and multi-objective clustering. In this paper, we present a genetic programming based approach that can be regarded as a hybrid of these strategies, allowing different hierarchical clustering ensembles to be evolved simultaneously while taking complementary validity indices into account. Results of computational experiments conducted with artificial and real datasets indicate that, in most cases, at least one of the Pareto optimal partitions returned by the proposed approach compares favorably with, or is on par with, the consensual partitions yielded by two well-known clustering ensemble methods in terms of clustering quality, as gauged by the corrected Rand index.
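
To make the evaluation criterion concrete, the following is a minimal sketch of the corrected Rand index (commonly called the adjusted Rand index) between two flat partitions of the same objects. It is an illustration under stated assumptions, not the authors' code: the function name `corrected_rand_index`, the label-list input format, and the handling of degenerate cases are all choices made here.

```python
from collections import Counter
from math import comb


def corrected_rand_index(labels_a, labels_b):
    """Corrected (adjusted) Rand index between two flat partitions.

    Hypothetical helper for illustration; each argument is a sequence
    holding one cluster label per object, in the same object order.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same objects")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two objects: no pairs to disagree on (assumed convention).
        return 1.0

    # Pairs of objects placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately (marginals).
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2         # upper bound on agreement
    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster
        # (assumed convention: treat as perfect agreement).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The index equals 1 when the two partitions coincide up to a relabeling of clusters, is close to 0 for chance-level agreement, and can be negative when agreement is worse than chance. For example, `corrected_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0.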

Available from: André Coelho
  • ABSTRACT: In real-world problems we encounter situations where patterns are described by blocks (families) of features, each of which comes with well-defined semantics. For instance, in spatiotemporal data the spatial coordinates of the objects (say, x–y coordinates) form one block, while the temporal part forms another collection of features. When clustering objects described by families of features, it is intuitively justifiable to expect the blocks to play different roles and contribute differently to the clustering process, while the clustering should still reflect the overall structure of the data set. To address this issue, we introduce an agreement based fuzzy clustering, that is, a fuzzy clustering with blocks of features. Detailed investigations are carried out for the well-known fuzzy C-means (FCM) algorithm. We propose an extended version of FCM in which a composite distance function is endowed with adjustable weights (parameters) quantifying the impact of each block of features. A global evaluation criterion is used to assess the quality of the obtained results. It is treated as a fitness function when optimizing the weights with particle swarm optimization (PSO). The behavior of the proposed method is investigated on synthetic and real-world data as well as in a case study.
    Article · Mar 2014 · Neurocomputing
  • ABSTRACT: The partitional clustering concept started with the K-means algorithm, which was published in 1957. Since then, many classical partitional clustering algorithms based on the gradient descent approach have been reported. The 1990s kick-started a new era in cluster analysis with the application of nature inspired metaheuristics. Nearly two decades have passed since the initial formulation, and researchers have developed numerous new algorithms in this field. This paper presents an up-to-date review of all major nature inspired metaheuristic algorithms employed to date for partitional clustering. Further, key issues in applying various metaheuristics to the clustering problem, and major application areas, are discussed.
    Article · Jun 2014 · Swarm and Evolutionary Computation
  • ABSTRACT: In this paper, a new multi-objective genetic programming (GP) approach with a diversity preserving mechanism and a real number alteration operator is presented and successfully used for Pareto optimal modelling of complex non-linear systems from input–output data. Two different input–output data sets, one from a non-linear mathematical model and one from an explosive cutting process, are considered separately in three-objective optimisation processes. The conflicting objective functions considered for these Pareto optimisations are the training error (TE), the prediction error (PE), and the tree length (TL), that is, the complexity of the network, of the GP models. These three-objective optimisations lead to non-dominated choices of GP-type models for both cases, representing the trade-offs among the objective functions. The Pareto fronts of such GP models therefore exhibit the trade-off among the conflicting objectives and provide different non-dominated optimal choices of GP-type models. Moreover, the results show that no significant improvement in TE and PE occurs when the TL of the corresponding GP model exceeds certain values.
    Article · Jun 2014 · International Journal of Systems Science