Neurocomputing 74 (2010) 494–498. DOI: 10.1016/j.neucom.2010.09.014

Letters
Inducing multi-objective clustering ensembles with genetic programming
André L.V. Coelho a,*, Everlândio Fernandes a, Katti Faceli b
a Graduate Program in Applied Informatics, Center of Technological Sciences, University of Fortaleza, Av. Washington Soares, 1321/J30, 60811-905 Fortaleza, CE, Brazil
b Federal University of São Carlos, Sorocaba Campus, Rod. João Leme dos Santos, Km 110, Bairro Itinga, 18052-780 Sorocaba, SP, Brazil
article info
Article history:
Received 11 December 2009
Received in revised form
13 August 2010
Accepted 14 September 2010
Communicated by A. Abraham
Available online 16 October 2010
Keywords:
Cluster analysis
Ensembles
Multi-objective optimization
Genetic programming
Abstract

The recent years have witnessed a growing interest in two advanced strategies to cope with the data clustering problem, namely, clustering ensembles and multi-objective clustering. In this paper, we present a genetic programming based approach that can be considered a hybrid of these strategies, thereby allowing different hierarchical clustering ensembles to be evolved simultaneously, taking into account complementary validity indices. Results of computational experiments conducted with artificial and real datasets indicate that, in most of the cases, at least one of the Pareto optimal partitions returned by the proposed approach compares favorably with, or is on par with, the consensual partitions yielded by two well-known clustering ensemble methods in terms of clustering quality, as gauged by the corrected Rand index.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
In a nutshell, the goal of clustering is to partition a set of objects
into groups (clusters) so that objects assigned to the same group are
more akin to each other than to objects from distinct groups [14]. Over
the years, several clustering algorithms have been conceived, each
with its own set of parameters and producing data partitions in
consonance with a specific clustering criterion [9]. Although these
algorithms have been widely adopted in many fields, they usually
display shortcomings, such as: sensitivity to parameter settings
[9]; requirement that the number of clusters be set a priori [9,12];
and difficulty in uncovering partitions with different types of
clusters [8,10]. Moreover, it is well known that there is no
algorithm, optimizing a unique criterion, able to reveal all types
of relevant structures that may be simultaneously present in the
dataset under analysis [5].
Recently, multi-objective clustering (MOC) [8,10] and clustering
ensembles (CE) [12,13] have emerged as two promising strategies
to cope with the abovementioned limitations. MOC focuses on the
simultaneous optimization of a number of clustering criteria to
generate a set of alternative structures possibly representing
diverse interpretations (views) of the data [8], while CE aims at
improving the overall clustering quality by reconciling information
coming from partitions produced by different clustering algo-
rithms or even by different runs of the same algorithm [12,13].
In this brief paper, we present a novel approach hybridizing CE
and MOC strategies in a manner as to combine their positive
aspects. The approach is founded on a Pareto-based version of
genetic programming (GP) [4], which is employed to evolve a
population of clustering ensemble models taking into account
complementary validity measures (namely, overall deviation and
connectivity [8]) as optimization criteria. By this means, different
structures present in the data and related to different clustering
criteria can be simultaneously revealed. Each clustering ensemble
model is, in fact, a hierarchy of consensus functions (CF) applied
over a subset of previously generated base partitions. These base
partitions can be created through the application of traditional
clustering algorithms [9] and take part in the GP grammar as
terminal symbols. On the other hand, different CF are already available in the literature, such as the cluster-based similarity partitioning algorithm (CSPA), hyper-graph partitioning algorithm (HGPA), meta-clustering algorithm (MCLA), and supra-consensus (SC), all proposed by Strehl and Ghosh [12], as well as the hybrid bipartite graph formulation (HBGF), conceived by Fern and Brodley [7]; these can be recruited to compose the GP grammar as non-terminal symbols. Since these consensus functions are nonlinear in
nature and exploit different aspects of the base partitions in order
to generate a consensus one, it is expected that their arrangement
into hierarchies and operating over partitions with different types
of clusters should bring about improvements in terms of clustering
accuracy.
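For concreteness, a hierarchy of consensus functions over base partitions can be sketched as a randomly grown tree, with base-partition indices as terminal symbols and CF names as non-terminals. The symbol names, the terminal-set size, and the growth probability below are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical GP grammar: non-terminals are consensus functions with
# variable arity; terminals index previously generated base partitions.
CONSENSUS_FUNCTIONS = ["CSPA", "HGPA", "MCLA", "HBGF"]  # non-terminal symbols
N_BASE_PARTITIONS = 8                                    # terminals 0..7 (assumed)

def random_tree(depth, rng, arities=(2, 3)):
    """Grow a random CF hierarchy; leaves pick base partitions at random."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(N_BASE_PARTITIONS)   # terminal: a partition index
    cf = rng.choice(CONSENSUS_FUNCTIONS)
    n_args = rng.choice(arities)                  # N_A arguments per CF node
    return (cf, [random_tree(depth - 1, rng, arities) for _ in range(n_args)])

rng = random.Random(42)
tree = random_tree(depth=3, rng=rng)
print(tree)
```

Because every node evaluates to a partition, any subset of the base partitions may (or may not) appear as leaves of a given individual, matching the text's observation that good subsets can be searched for automatically.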
In the sequel, we outline the main components and steps of the
novel approach. Then, we report on computational experiments
conducted over datasets with varying structures, whereby the
quality of the partitions returned by the GP-based approach is
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/neucom
Neurocomputing
0925-2312/$ - see front matter & 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2010.09.014
n
Corresponding author. Tel.: +55 85 34773268; fax: +55 85 34773061.
E-mail addresses: acoelho@unifor.br, coelho.alv@gmail.com (A.L.V. Coelho),
everlandio@gmail.com (E. Fernandes), katti@ufscar.br (K. Faceli).
Neurocomputing 74 (2010) 494–498
Page 1
assessed and compared with the quality of the partitions produced
by SC and HBGF.
2. Multi-objective clustering ensembles via genetic programming
Fig. 1 summarizes the main steps behind the novel approach.
Firstly, a range of data partitions of varying quality, with distinct
levels of refinement, and possibly involving a large assortment of
cluster types and densities, should be generated at the outset and
then incorporated as terminals of the GP grammar. These base
partitions can be produced by methods associated with different
clustering criteria, by several runs of the same algorithm (with
diverse initial seeds/parameter configurations), and/or by cluster-
ing different subsamplings of the original dataset with the same
algorithm [5,13,14]. Together, one or more CF, such as those
aforementioned, should be implemented as non-terminal symbols
to operate over the base partitions while assembling the GP
individuals.
Having specified the GP grammar, the next step is to randomly generate a population P with N_P tree-like individuals. This initial
population (like the others of subsequent iterations) may contain
trees with different shapes and complexities (i.e., with varying
numbers of nodes, levels, and nested fusions), and each of these
trees should be interpreted in turn in order to have its fitness
assessed. It is worth pointing out that there is no imposition that all
base partitions available in the terminal set be simultaneously
merged into each individual. This means that good subsets of base
partitions to be fused can be automatically searched for while
assembling the hierarchies whereas those of poor quality can be
evolutionarily avoided, discarded or replaced. Besides, the interpretation of each tree is performed from the bottom up, meaning that the deeper CF nodes (that is, those closer to the leaves) are applied first, and their results (partitions) are propagated up the hierarchy to serve as inputs (arguments) to the CF nodes closer to the root.
The number of arguments (partitions), N_A, of each consensus
function may vary and, since the outcome of any CF is a partition,
the application of several of these functions in a hierarchy will
always end up with a partition as result. This property fulfills the
closure requirement of GP [4]. Moreover, due to the usually
nonlinear character of these functions, the consensus partitions
yielded by the recursive application of fusion operators over
different subsets of base partitions (branches of the trees) can be
significantly different, and of better quality, than those produced
through a unique application of a single CF over all base partitions
available (which is the typical case for clustering ensembles [12]).
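The bottom-up interpretation can be sketched recursively. Since HBGF and the other named CFs are not reproduced here, a simple co-association (evidence-accumulation) consensus stands in for any CF in this sketch; like the real CFs, it maps a set of label vectors to a single label vector, which is the closure property the text mentions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def coassoc_consensus(partitions, k):
    """Stand-in CF: average co-association matrix, then cut an
    average-linkage dendrogram into k clusters."""
    P = np.asarray(partitions)                 # (n_partitions, n_samples)
    n = P.shape[1]
    co = np.mean([(p[:, None] == p[None, :]) for p in P], axis=0)
    dist = 1.0 - co                            # dissimilarity from co-association
    iu = np.triu_indices(n, k=1)               # condensed form for linkage
    Z = linkage(dist[iu], method="average")
    return fcluster(Z, t=k, criterion="maxclust") - 1

def evaluate(tree, base_partitions, k):
    """Interpret a CF hierarchy bottom-up: leaves are base partitions,
    internal nodes fuse their children's (consensus) partitions."""
    if isinstance(tree, int):
        return base_partitions[tree]
    _cf_name, children = tree
    fused = [evaluate(c, base_partitions, k) for c in children]
    return coassoc_consensus(fused, k)

# Toy base partitions over four samples (illustrative only).
base = [np.array([0, 0, 1, 1]), np.array([0, 0, 0, 1]), np.array([1, 1, 0, 0])]
labels = evaluate(("HBGF", [0, ("HBGF", [1, 2])]), base, k=2)
print(labels)
```

The recursion makes explicit why nested fusions over different branches can differ from a single application of one CF over all base partitions.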
For the multi-objective assessment of the quality of the con-
sensual partitions yielded by the GP individuals, distinct validity
indices should be employed [14]. So far, we have resorted to overall
deviation and connectivity due to their complementary roles [8]:
While the former rates the levels of cluster compactness and
encourages the induction of dense/spherical groups, the latter
gauges the levels of sample connectivity within groups, yielding
clusters with arbitrary shapes. These measures are internal indices
and assume no prior knowledge on the structure underlying the
data. Once the quality of the consensual partitions represented by the individuals has been assessed, the population is stratified into groups, called fronts, following a strategy compliant with NSGA-II [2]. The fitness of an individual in a front is proportional to the front's rank, meaning that a minimization process is in course. In
particular, the first front contains those individuals that are not
dominated by any other member of the current population, that is,
it is composed of those consensus partitions representing the best
compromises to overall deviation and connectivity.
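The two criteria can be sketched as follows. The formulations use the common definitions from the multi-objective clustering literature (overall deviation as summed distances to cluster centroids; connectivity as a penalty of 1/j for each of the L nearest neighbors placed in a different cluster), which we assume match the paper's usage of [8]; the toy data and L value are illustrative.

```python
import numpy as np

def overall_deviation(X, labels):
    """Sum of distances from each sample to its cluster centroid
    (minimized by compact, spherical clusters)."""
    dev = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        dev += np.linalg.norm(pts - pts.mean(axis=0), axis=1).sum()
    return dev

def connectivity(X, labels, L=5):
    """Penalty 1/j for the j-th nearest neighbor of a sample falling in a
    different cluster (minimized by well-connected clusters)."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                # a sample is not its own neighbor
    conn = 0.0
    for i in range(n):
        nbrs = np.argsort(D[i])[:L]
        conn += sum(1.0 / (j + 1) for j, nb in enumerate(nbrs)
                    if labels[nb] != labels[i])
    return conn

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
print(overall_deviation(X, labels), connectivity(X, labels, L=2))
```

Both are internal indices: they use only X and the labels, never a ground-truth structure, which is why they can drive the evolutionary search.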
After fitness assignment and population stratification, new
individuals (offspring) are iteratively created and inserted into
the population by applying standard genetic operators (crossover
and mutation) over some individuals (parents) selected from the
current generation [4].

Fig. 1. Flowchart with the main steps of the proposed GP-based approach: create base partitions; generate the initial population; assign fitness and rank the population into fronts; select parents and apply genetic operators to create offspring; compute crowding distances; replace the population; repeat until the maximum number of generations is reached.

At each step, the choice of the genetic operator and the parent(s) is done probabilistically until the pool
of offspring is complete. Parental selection is influenced by the
fitness value and the crowding distance. The latter, borrowed from
NSGA-II [2], estimates the density of the region of the search space
where each GP individual resides: The higher the density, the
higher the value of this parameter. The newborn trees have the
quality of their partitions measured, the augmented population is
stratified again, and then all its members receive a new fitness
value. The population of the next generation will be composed of
the best N_P individuals.
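The front-based stratification and the crowding distance borrowed from NSGA-II can be sketched for two minimized objectives as follows; this is a simplified didactic version, not the full NSGA-II machinery of [2].

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_nondominated_fronts(objs):
    """Stratify objective vectors into fronts; front 0 is non-dominated."""
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def crowding_distance(objs, front):
    """Estimate local density per individual; boundary solutions get inf."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[0])):
        order = sorted(front, key=lambda i: objs[i][k])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = objs[order[-1]][k] - objs[order[0]][k] or 1.0
        for lo, i, hi in zip(order, order[1:], order[2:]):
            dist[i] += (objs[hi][k] - objs[lo][k]) / span
    return dist

# Toy population: (overall deviation, connectivity) pairs, both minimized.
objs = [(1.0, 5.0), (2.0, 3.0), (3.0, 1.0), (3.0, 4.0), (4.0, 4.0)]
fronts = fast_nondominated_fronts(objs)
print(fronts)
```

Note that a lower front rank means better fitness here, which is the minimization convention stated in the text; the crowding distance then breaks ties in favor of less crowded regions.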
This process of partition assessment, population stratification,
and offspring creation is repeated until the maximum number of
generations N_G is reached. Then, the consensus partitions associated with the individuals belonging to the first front of the last population (which should be interpreted as an approximation of the Pareto optimal set) are returned as the final result of the GP
evolutionary process. It is expected that these partitions represent
the best compromises of the validity indices adopted and, since
each index evaluates different properties of the partitions, the
resulting set of partitions should be representative of the different
types of structures underlying the dataset.
3. Computational experiments
To assess the potential of the proposed method, a prototype was implemented with the help of the GPLAB toolbox [11], computational experiments were conducted, and preliminary results are reported here. We have chosen GPLAB mostly because of its
highly modular structure (whereby functions are modeled as ‘‘plug
and play’’ devices), making it an easily extendable tool. Moreover,
this toolbox is equipped with several good functionalities, such as
pre-made functions and terminals for building trees, different
modes of tree initialization, a number of genetic operators, offline
and runtime graphical facilities, among others [11].
We compare the performance of the novel approach with that of
HBGF and SC. Table 1 summarizes the 10 datasets adopted in the experiments. Each dataset has n data samples, d attributes, and at least N_S underlying structures, which are known a priori.
Although other unknown structures may be hidden in these
datasets, the known structures have been used to check the
effectiveness of our approach. Overall, there are 22 known struc-
tures to be revealed, each structure with a certain number of
clusters k and representing a well-defined partition. In Table 1, the numbers of clusters associated with the N_S structures related to a given dataset are given within parentheses.
This repertory of datasets has been specifically selected so as to
encompass a broad range of characteristics. Four datasets, namely,
ds2c2sc13, ds3c3sc6, ds4c2sc8, and spiralsquare, were artificially
synthesized to contain heterogeneous structures in different levels
of resolution [5]. They are two-dimensional in nature so as to
make the visualization of clusters and partitions easy. Two other datasets, iris and glass, are standard benchmarks taken from the UCI repository [1]. As these two datasets have been
extensively used, it is possible to compare the results reported here
with those delivered in the related literature. The remaining
datasets (golub, proteins, leukemia, and lung) are related to
real-life, bioinformatics problems, have large dimensions, and
are described elsewhere [6].
After preliminary tuning, the GP-based approach has been configured with N_P = 50, N_G = 10, and N_A ∈ {2, 3}. This means that
each CF in the hierarchies could operate over two or three
partitions, with the trees of the initial population being generated
via the full method [4]. As CF, we have adopted only HBGF in the
experiments reported here. The base partitions have been gener-
ated with the following algorithms, configured with different
values for their control parameters: k-means and hierarchical
single linkage (which are biased toward cluster compactness),
and hierarchical average linkage and shared nearest neighbors
(rooted in connectivity) [5,9,14]. Moreover, for each dataset, base partitions with the number of clusters varying in the range [k_min, k_max] have been produced with these methods, where k_min and k_max equal, respectively, the smallest number of clusters and twice the largest number of clusters among those of the known
structures. By this means, a large assortment of cluster types could
be produced, and so the size of the terminal set of the GP grammar
has varied for each dataset.
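The base-partition generation described above can be sketched with scikit-learn: k-means and the single/average hierarchical linkages are available directly, while shared nearest neighbors would need a separate implementation (omitted here). Using iris with an assumed k_min = 3 and k_max = 6 is purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, AgglomerativeClustering

X = load_iris().data
k_min, k_max = 3, 6   # smallest known k, and twice the largest known k (assumed)

base_partitions = []
for k in range(k_min, k_max + 1):
    # Algorithm biased toward compactness:
    base_partitions.append(KMeans(n_clusters=k, n_init=10,
                                  random_state=0).fit_predict(X))
    # Hierarchical linkages (single is also compactness-related per the text;
    # average linkage is rooted in connectivity):
    for link in ("single", "average"):
        base_partitions.append(
            AgglomerativeClustering(n_clusters=k, linkage=link).fit_predict(X))

print(len(base_partitions))   # terminal-set size for this dataset
```

Each label vector in `base_partitions` would then become one terminal symbol of the GP grammar, so the terminal-set size indeed varies with the dataset's known k range.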
Since the application of both SC and HBGF requires the number
of clusters of the consensual partition as input, we have adopted the
same range as above for deriving the results of these methods.
Moreover, to comparatively assess the quality of the final partitions
created by the contestants, the corrected Rand (CR) index has been
used [14]. This external validity criterion is attractive because it is insensitive to the number of clusters in a structure, and it measures the similarity between two partitions as follows: values close to 0 indicate random partitions, while values close to 1 indicate a perfect match
between the partitions. So, for each CE method, the CR value
between the best resulting partitions, according to each known
structure, and the corresponding known structure has been
calculated. In this regard, it should be emphasized that the
comparison of the partitions produced by these algorithms with
the known structures is conducted only for the purpose of assessing the potential of the proposed approach. In practical applications to datasets whose underlying structures are unknown, partition quality could instead be assessed using internal validation indices. However, it is worth remembering that such
indices are usually biased towards different clustering criteria.
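The corrected Rand index described above corresponds to the adjusted Rand index, and the comparison step can be sketched with scikit-learn's implementation; the label vectors below are toy examples, not partitions from the experiments.

```python
from sklearn.metrics import adjusted_rand_score

known = [0, 0, 0, 1, 1, 2, 2, 2]          # a known structure (ground truth)
consensus = [1, 1, 1, 0, 0, 2, 2, 2]      # same grouping under other label names
random_like = [0, 1, 0, 1, 0, 1, 0, 1]    # an unrelated labelling

# Perfect match up to relabelling:
print(adjusted_rand_score(known, consensus))
# Low value: close to chance-level agreement (can even be slightly negative):
print(adjusted_rand_score(known, random_like))
```

Because the index is corrected for chance, it can compare partitions whose numbers of clusters differ, which is exactly why it suits the per-structure evaluation used here.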
Table 2 shows the performance results achieved by the contest-
ants for all datasets. For SC and HBGF, the mean values of the CR
index have been obtained over all partitions these methods have
produced by varying the number of clusters, as mentioned above.
Conversely, as our approach is stochastic in nature, 30 runs of the
GP process have been performed for each dataset and the average
CR value has been calculated over the best partition from the Pareto
optimal set of the last population in each run. In those cases where
more than one underlying structure is available, the CR value
is calculated for each structure separately.
As one can notice, the GP-based approach has performed very well, comparing favorably with, or at least being on par with, SC and HBGF in all but five structures. This means that in more than 75% of the
cases, a consensus partition with better quality (as measured by CR)
could be discovered for a given structure of a dataset. Although in
some cases the gains may seem low, one should bear in mind that
CR is a highly sensitive index, which implies that a small increase in
its value may correspond to a significant improvement in terms of
partition quality. Moreover, to demonstrate that the GP-based
approach has outperformed both SC and HBGF in a statistically
significant way, we have resorted to the Friedman test and the Nemenyi test, since these non-parametric statistical
tests are suitable to compare the performance of different learning
algorithms when applied over multiple datasets (for a thorough
discussion on these tests, please refer to [3]).

Table 1
Configuration of the datasets used in the experiments.

Dataset       n     d    N_S (k)          Dataset    n    d     N_S (k)
ds2c2sc13     588   2    3 (2, 5, 13)     iris       150  4     1 (3)
ds3c3sc6      905   2    2 (3, 6)         golub      72   3571  4 (2, 3, 4, 2)
ds4c2sc8      485   2    2 (2, 8)         proteins   698  125   2 (4, 27)
spiralsquare  2000  2    2 (2, 6)         leukemia   327  271   2 (3, 7)
glass         214   9    3 (2, 5, 6)      lung       197  1000  1 (4)

By applying the Friedman test first, we could perceive that the differences in performance exhibited by the algorithms were in fact statistically
significant at a level of 0.05. Conversely, the application of Nemenyi
post-test with the same level of significance revealed that the
GP-based method has indeed prevailed over SC and HBGF.
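The statistical comparison can be sketched with scipy's Friedman test; a Nemenyi post-hoc test is available in third-party packages such as scikit-posthocs and is omitted here. The CR scores below are illustrative placeholders, not the values from Table 2.

```python
from scipy.stats import friedmanchisquare

# Hypothetical per-structure CR scores for the three contestants
# (placeholder values; in practice, use the actual per-structure CR means).
sc   = [0.69, 0.99, 0.79, 0.89, 0.60, 0.30, 0.85]
hbgf = [0.89, 0.91, 0.77, 0.89, 0.57, 0.38, 0.89]
gp   = [1.00, 1.00, 0.78, 0.94, 0.65, 0.34, 0.91]

stat, p = friedmanchisquare(sc, hbgf, gp)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```

The Friedman test ranks the methods within each structure (each row) before aggregating, which is why it is suitable for comparing learning algorithms over multiple datasets, as discussed in [3].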
4. Final remarks
In this paper, we have presented and empirically assessed a
GP-based approach for the multi-objective induction of hierarchical
clustering ensembles. By making use of CE individuals arranged in the form of trees, the approach allows novel hierarchical consensus functions to be automatically designed on the basis of primitive consensus functions already available in the literature. Moreover, by structuring and evolving the population on a multi-objective basis, considering complementary validity indices, the approach is
capable of finding high-quality partitions that are representative of
the different structures available for a given dataset. The results
achieved with computational experiments conducted over a range of
datasets have confirmed the feasibility of the proposed approach,
which has demonstrated better performance in comparison with two
well-known CE methods (viz., SC and HBGF).
As future work, a more thorough theoretical analysis of the
GP-based approach shall be undertaken with the purpose of
highlighting the aspects that make it distinctive from other related
methods, such as the multi-objective clustering with automatic
K-determination (MOCK) algorithm [8] and the multi-objective
clustering ensemble (MOCLE) algorithm [5,6]. A systematic empiri-
cal comparison with these methods is also underway, whereby
other configurations of the GP framework presented here have
been devised and assessed.
Acknowledgements
The work of the first author has been financially sponsored by
CNPq/Brazil, under Grant #312934/2009-2.
References
[1] A. Asunción, D.J. Newman, UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html, 2007.
[2] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multi-objective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197.
[3] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30.
[4] A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd ed.,
Springer, 2007.
[5] K. Faceli, A.C.P.L.F. de Carvalho, M.C.P. de Souto, Multi-objective clustering
ensemble, Int. J. Hybrid Intell. Syst. 4 (3) (2008) 145–156.
[6] K. Faceli, M.C.P. de Souto, D.S.A. de Araújo, A.C.P.L.F. de Carvalho, Multi-objective clustering ensemble for gene expression data analysis, Neurocomputing 72 (13–15) (2009) 2763–2774.
[7] X.Z. Fern, C.E. Brodley, Solving cluster ensemble problems by bipartite graph partitioning, in: Proceedings of the International Conference on Machine Learning, ACM International Conference Proceeding Series, Banff, Canada, 2004.
[8] J. Handl, J. Knowles, An evolutionary approach to multiobjective clustering,
IEEE Trans. Evol. Comput. 11 (1) (2007) 56–76.
[9] A.K. Jain, M. Murty, P. Flynn, Data clustering: a review, ACM Comput. Surv. 31
(3) (1999) 264–323.
[10] M. Law, A. Topchy, A.K. Jain, Multiobjective data clustering, in: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2004,
pp. 424–430.
[11] S. Silva, GPLAB: a genetic programming toolbox for MATLAB, version 3, University of Coimbra, 2007.
[12] A. Strehl, J. Ghosh, Cluster ensembles: a knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res. 3 (2002) 583–617.
[13] A. Topchy, A.K. Jain, W. Punch, Clustering ensembles: models of consensus and
weak partitions, IEEE Trans. Pattern Anal. Mach. Intell. 27 (12) (2005)
1866–1881.
[14] R. Xu, D.C. Wunsch, Clustering, Wiley-IEEE Press, 2008.
André L.V. Coelho received the B.Sc. degree in Computer Engineering in 1996, and earned the M.Sc. and Ph.D.
degrees in Electrical Engineering in 1998 and 2004,
respectively, all from the State University of Campinas
(Unicamp), Brazil. He has a record of publications
related to the themes of machine learning, data mining,
computational intelligence, metaheuristics, and mul-
tiagent systems. He is a member of ACM and has served
as a reviewer for a number of scientific conferences and
journals. Currently, he is an adjunct professor affiliated
with the Graduate Program in Applied Informatics at
the University of Fortaleza, Ceará, Brazil.
Table 2
Performance (average ± standard deviation of CR values) of the CE algorithms; best results were highlighted in the original.

Dataset        Structure     SC             HBGF            GP
ds2c2sc13      E1 (k=2)      0.6910±0.058   0.8874±0.1428   1.0000±0.0000
               E2 (k=5)      0.9920±0.022   0.9111±0.0000   1.0000±0.0000
               E3 (k=13)     0.7860±0.042   0.7703±0.0023   0.7777±0.0097
ds3c3sc6       E1 (k=3)      0.8900±0.023   0.8900±0.0000   0.9365±0.0200
               E2 (k=6)      0.5969±0.044   0.5662±0.0000   0.6484±0.0212
ds4c2sc8       E1 (k=2)      0.2958±0.078   0.3756±0.0567   0.3364±0.0371
               E2 (k=8)      0.8460±0.058   0.8940±0.0000   0.9076±0.0136
spiralsquare   E1 (k=2)      1.0000±0.0000  1.0000±0.0000   1.0000±0.0000
               E2 (k=6)      0.6381±0.062   0.7312±0.0222   0.6667±0.0004
glass          E1 (k=2)      0.6340±0.048   0.6605±0.0002   0.6938±0.0249
               E2 (k=5)      0.4764±0.039   0.5044±0.0000   0.5729±0.0226
               E3 (k=6)      0.2240±0.016   0.2394±0.0000   0.2808±0.018
iris           E1 (k=3)      0.7591±0.018   0.7592±0.000    0.7592±0.0000
golub          E1 (k=2)      0.7430±0.175   0.8421±0.0000   0.8815±0.0822
               E2 (k=3)      0.6370±0.129   0.7903±0.0531   0.8773±0.0242
               E3 (k=4)      0.5890±0.124   0.6506±0.0176   0.6778±0.0506
               E4 (k=2)      0.1100±0.074   0.0067±0.0091   0.0491±0.0139
proteins       E1 (k=4)      0.3150±0.020   0.3311±0.0000   0.3366±0.0123
               E2 (k=27)     0.1330±0.006   0.1230±0.0051   0.1403±0.0056
leukemia       E1 (k=3)      0.3150±0.046   0.3625±0.0354   0.2881±0.0088
               E2 (k=7)      0.6590±0.053   0.5603±0.0007   0.7808±0.0059
lung           E1 (k=4)      0.4380±0.071   0.5174±0.0533   0.7928±0.0433
Everlândio Fernandes received the B.Sc. degree in Computer Science in 2002 from the Federal University of Rio Grande do Norte, Brazil. He also holds an M.Sc. degree in
Applied Informatics from the University of Fortaleza
(2009). His areas of interest are clustering, committee
machines, and evolutionary algorithms.
Katti Faceli received the B.Sc., M.Sc. and Ph.D. degrees
in Computer Science in, respectively, 1998, 2001, and
2006, all from the University of São Paulo, Brazil. Currently, she is an Associate Professor at the Federal University of São Carlos, Campus Sorocaba, Brazil. Her main
research interests include machine learning, hybrid
intelligent systems, cluster analysis, ensembles, feature
selection and bioinformatics.