# Inducing multi-objective clustering ensembles with genetic programming

Neurocomputing 74 (1) (December 2010) 494–498. DOI: 10.1016/j.neucom.2010.09.014

André L.V. Coelho (a,*), Everlândio Fernandes (a), Katti Faceli (b)

(a) Graduate Program in Applied Informatics, Center of Technological Sciences, University of Fortaleza, Av. Washington Soares, 1321/J30, 60811-905 Fortaleza, CE, Brazil
(b) Federal University of São Carlos, Sorocaba Campus, Rod. João Leme dos Santos, Km 110, Bairro Itinga, 18052-780 Sorocaba, SP, Brazil

Article info

Article history: Received 11 December 2009; received in revised form 13 August 2010; accepted 14 September 2010; available online 16 October 2010. Communicated by A. Abraham.

Keywords: Cluster analysis; Ensembles; Multi-objective optimization; Genetic programming

Abstract

The recent years have witnessed a growing interest in two advanced strategies to cope with the data clustering problem, namely, clustering ensembles and multi-objective clustering. In this paper, we present a genetic programming based approach that can be considered a hybrid of these strategies, allowing different hierarchical clustering ensembles to be evolved simultaneously, taking into account complementary validity indices. Results of computational experiments conducted with artificial and real datasets indicate that, in most cases, at least one of the Pareto optimal partitions returned by the proposed approach compares favorably with, or is on par with, the consensual partitions yielded by two well-known clustering ensemble methods in terms of clustering quality, as gauged by the corrected Rand index.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

In a nutshell, the goal of clustering is to partition a set of objects into groups (clusters) so that objects assigned to the same group are more akin to each other than to those from distinct groups [14]. Over the years, several clustering algorithms have been conceived, each with its own set of parameters and producing data partitions in consonance with a specific clustering criterion [9]. Although these algorithms have been widely adopted in many fields, they usually display shortcomings, such as: sensitivity to parameter settings [9]; the requirement that the number of clusters be set a priori [9,12]; and difficulty in uncovering partitions with different types of clusters [8,10]. Moreover, it is well known that no algorithm optimizing a single criterion is able to reveal all types of relevant structures that may be simultaneously present in the dataset under analysis [5].

Recently, multi-objective clustering (MOC) [8,10] and clustering ensembles (CE) [12,13] have emerged as two promising strategies to cope with the abovementioned limitations. MOC focuses on the simultaneous optimization of a number of clustering criteria to generate a set of alternative structures possibly representing diverse interpretations (views) of the data [8], while CE aims at improving the overall clustering quality by reconciling information coming from partitions produced by different clustering algorithms or even by different runs of the same algorithm [12,13].

In this brief paper, we present a novel approach hybridizing the CE and MOC strategies so as to combine their positive aspects. The approach is founded on a Pareto-based version of genetic programming (GP) [4], which is employed to evolve a population of clustering ensemble models taking into account complementary validity measures (namely, overall deviation and connectivity [8]) as optimization criteria. By this means, different structures present in the data and related to different clustering criteria can be simultaneously revealed. Each clustering ensemble model is, in fact, a hierarchy of consensus functions (CF) applied over a subset of previously generated base partitions. These base partitions can be created through the application of traditional clustering algorithms [9] and take part in the GP grammar as terminal symbols. On the other hand, different CF are already available in the literature—such as the cluster-based similarity partitioning algorithm (CSPA), the hyper-graph partitioning algorithm (HGPA), the meta-clustering algorithm (MCLA), and supra-consensus (SC), all proposed by Strehl and Ghosh [12], as well as the hybrid bipartite graph formulation (HBGF), conceived by Fern and Brodley [7]—and they can be recruited to compose the GP grammar as non-terminal symbols. Since these consensus functions are nonlinear in nature and exploit different aspects of the base partitions in order to generate a consensus one, it is expected that arranging them into hierarchies operating over partitions with different types of clusters should bring about improvements in terms of clustering accuracy.

In the sequel, we outline the main components and steps of the novel approach. Then, we report on computational experiments conducted over datasets with varying structures, whereby the quality of the partitions returned by the GP-based approach is assessed and compared with the quality of the partitions produced by SC and HBGF.

* Corresponding author. Tel.: +55 85 34773268; fax: +55 85 34773061. E-mail addresses: acoelho@unifor.br, coelho.alv@gmail.com (A.L.V. Coelho), everlandio@gmail.com (E. Fernandes), katti@ufscar.br (K. Faceli).

2. Multi-objective clustering ensembles via genetic programming

Fig. 1 summarizes the main steps behind the novel approach. Firstly, a range of data partitions of varying quality, with distinct levels of refinement, and possibly involving a large assortment of cluster types and densities, should be generated at the outset and then incorporated as terminals of the GP grammar. These base partitions can be produced by methods associated with different clustering criteria, by several runs of the same algorithm (with diverse initial seeds/parameter configurations), and/or by clustering different subsamplings of the original dataset with the same algorithm [5,13,14]. In addition, one or more CF, such as those aforementioned, should be implemented as non-terminal symbols to operate over the base partitions while assembling the GP individuals.

Having specified the GP grammar, the next step is to randomly generate a population P with N_P tree-like individuals. This initial population (like those of subsequent iterations) may contain trees with different shapes and complexities (i.e., with varying numbers of nodes, levels, and nested fusions), and each of these trees should be interpreted in turn in order to have its fitness assessed. It is worth pointing out that there is no imposition that all base partitions available in the terminal set be simultaneously merged into each individual. This means that good subsets of base partitions to be fused can be automatically searched for while assembling the hierarchies, whereas those of poor quality can be evolutionarily avoided, discarded, or replaced. Besides, the interpretation of each tree is performed bottom-up, meaning that the deeper CF nodes (that is, those closer to the leaves) are applied first, and their results (partitions) are propagated up the hierarchy to serve as inputs (arguments) to the shallower CF nodes.

The number of arguments (partitions), N_A, of each consensus function may vary and, since the outcome of any CF is a partition, the application of several of these functions in a hierarchy will always end up with a partition as its result. This property fulfills the closure requirement of GP [4]. Moreover, due to the usually nonlinear character of these functions, the consensus partitions yielded by the recursive application of fusion operators over different subsets of base partitions (branches of the trees) can be significantly different from, and of better quality than, those produced through a single application of one CF over all available base partitions (which is the typical case for clustering ensembles [12]).
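For illustration only, the bottom-up interpretation and the closure property can be sketched as follows (a minimal Python sketch, not the authors' GPLAB implementation; `toy_consensus` is a hypothetical stand-in for a real consensus function such as HBGF):

```python
# Illustrative sketch: a GP individual is a tree whose leaves are base
# partitions (label lists) and whose internal nodes are consensus functions
# (CF). Interpretation is bottom-up: every subtree evaluates to a partition,
# which is the GP closure property described in the text.

def toy_consensus(partitions):
    """Placeholder CF: co-association majority vote.

    Two objects end up together iff they are co-clustered in more than half
    of the input partitions. Real CFs (CSPA, HGPA, MCLA, HBGF) are far more
    sophisticated; this stub only preserves closure.
    """
    n = len(partitions[0])
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            labels[i] = next_label
            next_label += 1
        for j in range(i + 1, n):
            votes = sum(p[i] == p[j] for p in partitions)
            if votes * 2 > len(partitions) and labels[j] == -1:
                labels[j] = labels[i]
    return labels

def evaluate(tree):
    """Bottom-up interpretation: ('CF', child, child, ...) or a leaf partition."""
    if isinstance(tree, tuple) and tree[0] == "CF":
        return toy_consensus([evaluate(child) for child in tree[1:]])
    return tree  # leaf: a base partition (list of cluster labels)

# Three base partitions of five objects; fuse two, then fuse with the third.
p1, p2, p3 = [0, 0, 1, 1, 1], [0, 0, 1, 1, 0], [1, 1, 0, 0, 0]
individual = ("CF", ("CF", p1, p2), p3)
consensus = evaluate(individual)
```

Whatever the tree shape, `evaluate` always returns a flat partition, which is exactly what the closure requirement demands of the grammar.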

For the multi-objective assessment of the quality of the consensual partitions yielded by the GP individuals, distinct validity indices should be employed [14]. So far, we have resorted to overall deviation and connectivity due to their complementary roles [8]: while the former rates the level of cluster compactness and encourages the induction of dense/spherical groups, the latter gauges the level of sample connectivity within groups, favoring clusters with arbitrary shapes. These measures are internal indices and assume no prior knowledge of the structure underlying the data. Once the quality of the consensual partitions represented by the individuals has been assessed, the population is stratified into groups, called fronts, following a strategy compliant with NSGA-II [2]. The fitness of an individual in a front is proportional to the front's rank, meaning that a minimization process is in course. In particular, the first front contains those individuals that are not dominated by any other member of the current population; that is, it is composed of those consensus partitions representing the best compromises between overall deviation and connectivity.
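The two objectives can be sketched as follows (an illustrative Python sketch following the usual MOCK-style definitions [8], not the authors' code; the toy dataset and the neighborhood size L are assumptions for demonstration):

```python
# Both indices are internal (no ground truth) and are to be minimized.

def overall_deviation(data, labels):
    """Sum of distances from each point to its cluster centroid (compactness)."""
    clusters = {}
    for point, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(point)
    dev = 0.0
    for points in clusters.values():
        dim = len(points[0])
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        dev += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) ** 0.5
                   for p in points)
    return dev

def connectivity(data, labels, L=2):
    """Penalty whenever one of the L nearest neighbors of a point lies in a
    different cluster; the j-th violated neighbor contributes 1/(j+1)."""
    conn = 0.0
    for i, p in enumerate(data):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(p, q)), j)
                       for j, q in enumerate(data) if j != i)
        for rank, (_, j) in enumerate(dists[:L]):
            if labels[j] != labels[i]:
                conn += 1.0 / (rank + 1)
    return conn

# Two tight groups: a coherent labeling scores better on both objectives
# than one that splits them arbitrarily.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
good = [0, 0, 0, 1, 1]
bad = [0, 1, 0, 1, 0]
```

The complementarity is visible in the definitions: overall deviation rewards compact, centroid-centered groups, whereas connectivity only asks that neighboring points stay together, so it tolerates arbitrarily shaped clusters.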

After fitness assignment and population stratification, new individuals (offspring) are iteratively created and inserted into the population by applying standard genetic operators (crossover and mutation) over some individuals (parents) selected from the current generation [4]. At each step, the choice of the genetic operator and the parent(s) is done probabilistically until the pool of offspring is complete. Parental selection is influenced by the fitness value and the crowding distance. The latter, borrowed from NSGA-II [2], estimates the density of the region of the search space where each GP individual resides: the higher the density, the higher the value of this parameter. The newborn trees have the quality of their partitions measured, the augmented population is stratified again, and then all its members receive a new fitness value. The population of the next generation will be composed of the best N_P individuals.

Fig. 1. Flowchart with the main steps of the proposed GP-based approach (create base partitions; generate initial population; fitness assignment and front-based population ranking; selection and genetic operators for offspring creation; crowding distance computation and population replacement; repeat until the maximum number of generations is reached).
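The crowding distance mentioned above can be sketched as follows (an illustrative Python sketch of the standard NSGA-II formulation [2], not the authors' code; the sample front values are assumptions):

```python
# NSGA-II crowding distance for a set of individuals on one front.
# Each individual carries its objective values, here the pair
# (overall deviation, connectivity). Sparse regions get large distances,
# which biases parental selection toward less crowded areas.

def crowding_distance(objectives):
    """Return one distance per individual; boundary points get infinity."""
    n = len(objectives)
    dist = [0.0] * n
    num_obj = len(objectives[0])
    for m in range(num_obj):
        order = sorted(range(n), key=lambda i: objectives[i][m])
        lo, hi = objectives[order[0]][m], objectives[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for rank in range(1, n - 1):
            left = objectives[order[rank - 1]][m]
            right = objectives[order[rank + 1]][m]
            dist[order[rank]] += (right - left) / (hi - lo)
    return dist

# Four individuals on one front: extremes are always kept; of the two
# interior points, the one in the sparser region gets the larger distance.
front = [(0.0, 1.0), (0.2, 0.8), (0.3, 0.7), (1.0, 0.0)]
d = crowding_distance(front)
```

Note the inverted reading with respect to the text: the text speaks of the density of a region (higher density, higher crowding), while the NSGA-II quantity computed here grows with sparsity; selection prefers individuals with a larger crowding distance.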

This process of partition assessment, population stratification, and offspring creation is repeated until the maximum number of generations, N_G, is reached. Then, the consensus partitions associated with the individuals belonging to the first front of the last population (which should be interpreted as an approximation to the Pareto optimal set) are returned as the final result of the GP evolutionary process. It is expected that these partitions represent the best compromises between the validity indices adopted and, since each index evaluates different properties of the partitions, the resulting set of partitions should be representative of the different types of structures underlying the dataset.
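The front stratification and the extraction of the returned first front can be sketched as follows (an illustrative Python sketch; NSGA-II's fast non-dominated sort [2] is more efficient than this direct implementation, and the objective values are assumptions):

```python
# Pareto stratification for two minimization objectives
# (overall deviation, connectivity).

def dominates(a, b):
    """a dominates b iff a is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def stratify(objectives):
    """Split individual indices into successive non-dominated fronts.

    Front 0 (the approximation to the Pareto optimal set returned by the
    evolutionary process) contains the individuals dominated by no one.
    """
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

# Each pair is (overall deviation, connectivity) of one consensus partition.
objs = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
fronts = stratify(objs)
first_front = fronts[0]  # the partitions returned as the final result
```

Here the first three individuals trade the two objectives off against each other and form the first front, while the remaining two are dominated and fall into lower-ranked fronts.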

3. Computational experiments

To assess the potential of the proposed method, a prototype was implemented with the help of the GPLAB toolbox [11], computational experiments have been conducted, and preliminary results are reported here. We have chosen GPLAB mostly because of its highly modular structure (whereby functions are modeled as "plug and play" devices), making it an easily extendable tool. Moreover, this toolbox is equipped with several useful features, such as pre-made functions and terminals for building trees, different modes of tree initialization, a number of genetic operators, and offline and runtime graphical facilities, among others [11].

We compare the performance of the novel approach with that of HBGF and SC. Table 1 summarizes the 10 datasets adopted in the experiments. Each dataset has n data samples, d attributes, and at least N_S underlying structures, which are known a priori. Although other unknown structures may be hidden in these datasets, the known structures have been used to check the effectiveness of our approach. Overall, there are 22 known structures to be revealed, each with a certain number of clusters k and representing a well-defined partition. In Table 1, the numbers of clusters associated with the N_S structures of a given dataset are given within parentheses.

This repertoire of datasets has been specifically selected so as to encompass a broad range of characteristics. Four datasets, namely, ds2c2sc13, ds3c3sc6, ds4c2sc8, and spiralsquare, were artificially synthesized to contain heterogeneous structures at different levels of resolution [5]. They are two-dimensional, which makes the visualization of clusters and partitions easy. Two other datasets, iris and glass, are well-known benchmarks taken from the UCI repository [1]. As these two datasets have been extensively used, it is possible to compare the results reported here with those delivered in the related literature. The remaining datasets (golub, proteins, leukemia, and lung) are related to real-life bioinformatics problems, are high-dimensional, and are described elsewhere [6].

After preliminary tuning, the GP-based approach has been configured with N_P = 50, N_G = 10, and N_A ∈ {2, 3}. This means that each CF in the hierarchies could operate over two or three partitions, with the trees of the initial population being generated via the full method [4]. As CF, we have adopted only HBGF in the experiments reported here. The base partitions have been generated with the following algorithms, configured with different values for their control parameters: k-means and hierarchical single linkage (which are biased toward cluster compactness), and hierarchical average linkage and shared nearest neighbors (rooted in connectivity) [5,9,14]. Moreover, for each dataset, base partitions with the number of clusters varying in the range [k_min, k_max] have been produced with these methods, where k_min and k_max equal, respectively, the smallest number of clusters and twice the largest number of clusters among those of the known structures. By this means, a large assortment of cluster types could be produced, and so the size of the terminal set of the GP grammar varied for each dataset.
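The derivation of [k_min, k_max] and the sweep over it can be sketched as follows (an illustrative Python sketch; the minimal Lloyd-style `kmeans` below is a simplified stand-in for the actual base clusterers, and the toy dataset is an assumption):

```python
# Deriving the base-partition range [k_min, k_max] from the known structures
# and sweeping it to populate the terminal set of the GP grammar.
import random

def kmeans(data, k, iters=20, seed=0):
    """Bare-bones Lloyd iteration; stand-in for the paper's base clusterers."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centers[c])))
                  for p in data]
        # Recompute centers from the assigned members.
        for c in range(k):
            members = [p for p, l in zip(data, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

# Known structures of ds2c2sc13 have 2, 5, and 13 clusters (Table 1):
known_k = [2, 5, 13]
k_min, k_max = min(known_k), 2 * max(known_k)

# One base partition per value of k; these become GP terminal symbols.
data = [(float(i), float(i % 3)) for i in range(40)]
base_partitions = {k: kmeans(data, k) for k in range(k_min, k_max + 1)}
```

For ds2c2sc13 this yields k_min = 2 and k_max = 26, so 25 base partitions per algorithm/parameter setting enter the terminal set.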

Since the application of both SC and HBGF requires the number of clusters of the consensual partition as input, we have adopted the same range as above for deriving the results of these methods. Moreover, to comparatively assess the quality of the final partitions created by the contestants, the corrected Rand (CR) index has been used [14]. This external validity criterion is valued for its insensitivity to the number of clusters in a structure, and it measures the similarity between two partitions as follows: values close to 0 indicate random partitions, while values close to 1 indicate a perfect match between the partitions. So, for each CE method, the CR value between the best resulting partition for each known structure and the corresponding known structure has been calculated. In this regard, it should be emphasized that the comparison of the partitions produced by these algorithms with the known structures is conducted only for the purpose of assessing the potential of the proposed approach. In a practical application to datasets for which the underlying structures are unknown, the quality could instead be assessed by using internal validation indices. However, it is worth remembering that such indices are usually biased toward different clustering criteria.
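The CR index can be sketched as follows (an illustrative Python sketch of the standard chance-corrected Rand index; the example labelings are assumptions):

```python
# Corrected (adjusted) Rand index between two partitions, computed from
# their contingency table. Values near 0 indicate agreement no better than
# chance; 1 indicates a perfect match (up to label renaming).
from math import comb

def corrected_rand(labels_a, labels_b):
    n = len(labels_a)
    # Contingency table between the two partitions.
    table = {}
    for a, b in zip(labels_a, labels_b):
        table[(a, b)] = table.get((a, b), 0) + 1
    row, col = {}, {}
    for (a, b), cnt in table.items():
        row[a] = row.get(a, 0) + cnt
        col[b] = col.get(b, 0) + cnt
    sum_ij = sum(comb(c, 2) for c in table.values())
    sum_a = sum(comb(c, 2) for c in row.values())
    sum_b = sum(comb(c, 2) for c in col.values())
    expected = sum_a * sum_b / comb(n, 2)   # chance correction term
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 0.0
    return (sum_ij - expected) / (max_index - expected)

# Cluster names do not matter, only the grouping itself:
perfect = corrected_rand([0, 0, 1, 1], [1, 1, 0, 0])
```

Because of the chance correction, the index stays near zero for unrelated partitions regardless of how many clusters each one has, which is exactly the insensitivity property exploited in the comparison above.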

Table 2 shows the performance results achieved by the contestants for all datasets. For SC and HBGF, the mean values of the CR index have been obtained over all partitions these methods have produced by varying the number of clusters, as mentioned above. Conversely, as our approach is stochastic in nature, 30 runs of the GP process have been performed for each dataset and the average CR value has been calculated over the best partition from the Pareto optimal set of the last population in each run. In those cases where more than one underlying structure is available, the CR value is calculated for each structure separately.

As one can notice, the GP-based approach has performed very well, comparing favorably with or at least being on par with SC and HBGF in all but five structures. This means that, in more than 75% of the cases, a consensus partition of better quality (as measured by CR) could be discovered for a given structure of a dataset. Although in some cases the gains may seem low, one should bear in mind that CR is a highly sensitive index, which implies that a small increase in its value may correspond to a significant improvement in terms of partition quality. Moreover, to verify that the GP-based approach has outperformed both SC and HBGF in a statistically significant way, we have resorted to the Friedman and Nemenyi tests, since these non-parametric statistical tests are suitable for comparing the performance of different learning algorithms applied over multiple datasets (for a thorough discussion of these tests, please refer to [3]). By applying the Friedman test first, we could perceive that the differences in performance exhibited by the algorithms were in fact statistically significant at a level of 0.05. Subsequently, the application of the Nemenyi post-test with the same level of significance revealed that the GP-based method has indeed prevailed over SC and HBGF.

Table 1. Configuration of the datasets used in the experiments.

| Dataset | n | d | N_S (k) |
| --- | --- | --- | --- |
| ds2c2sc13 | 588 | 2 | 3 (2, 5, 13) |
| ds3c3sc6 | 905 | 2 | 2 (3, 6) |
| ds4c2sc8 | 485 | 2 | 2 (2, 8) |
| spiralsquare | 2000 | 2 | 2 (2, 6) |
| glass | 214 | 9 | 3 (2, 5, 6) |
| iris | 150 | 4 | 1 (3) |
| golub | 72 | 3571 | 4 (2, 3, 4, 2) |
| proteins | 698 | 125 | 2 (4, 27) |
| leukemia | 327 | 271 | 2 (3, 7) |
| lung | 197 | 1000 | 1 (4) |
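The Friedman statistic underlying this comparison can be sketched as follows (an illustrative Python sketch per the formulation surveyed by Demšar [3]; the CR values below are hypothetical, and the comparison of the statistic against a chi-square critical value, as well as the Nemenyi post-test, is omitted):

```python
# Friedman test statistic for comparing k algorithms over N datasets.
# Algorithms are ranked per dataset (rank 1 = best CR), ties receive the
# average of the ranks they span, and the statistic aggregates the ranks.

def friedman_statistic(scores):
    """scores[i][j]: CR of algorithm j on dataset/structure i (higher = better)."""
    N, k = len(scores), len(scores[0])
    avg_ranks = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            tied = [j for j in order[pos:] if row[j] == row[order[pos]]]
            mean_rank = pos + (len(tied) + 1) / 2   # average rank for ties
            for j in tied:
                avg_ranks[j] += mean_rank / N
            pos += len(tied)
    chi2 = 12 * N / (k * (k + 1)) * (sum(r * r for r in avg_ranks)
                                     - k * (k + 1) ** 2 / 4)
    return avg_ranks, chi2

# Hypothetical CR values for (SC, HBGF, GP) on four structures.
scores = [(0.69, 0.89, 1.00), (0.99, 0.91, 1.00),
          (0.89, 0.89, 0.94), (0.60, 0.57, 0.65)]
ranks, chi2 = friedman_statistic(scores)
```

The statistic is then compared against the chi-square (or the F-distribution variant) critical value; a significant result licenses the Nemenyi post-test, which declares two algorithms different when their average ranks differ by more than a critical distance.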

4. Final remarks

In this paper, we have presented and empirically assessed a GP-based approach for the multi-objective induction of hierarchical clustering ensembles. By representing CE individuals as trees, the approach allows novel hierarchical consensus functions to be automatically designed on the basis of primitive consensus functions already available in the literature. Moreover, by structuring and evolving the population on a multi-objective basis, considering complementary validity indices, the approach is capable of finding high-quality partitions that are representative of the different structures available for a given dataset. The results achieved with computational experiments conducted over a range of datasets have confirmed the feasibility of the proposed approach, which has demonstrated better performance in comparison with two well-known CE methods (viz., SC and HBGF).

As future work, a more thorough theoretical analysis of the GP-based approach shall be undertaken with the purpose of highlighting the aspects that distinguish it from other related methods, such as the multi-objective clustering with automatic K-determination (MOCK) algorithm [8] and the multi-objective clustering ensemble (MOCLE) algorithm [5,6]. A systematic empirical comparison with these methods is also underway, whereby other configurations of the GP framework presented here have been devised and assessed.

Acknowledgements

The work of the first author has been financially sponsored by CNPq/Brazil, under Grant #312934/2009-2.

References

[1] A. Asunción, D.J. Newman, UCI machine learning repository, http://www.ics.uci.edu/~mlearn/MLRepository.html, 2007.
[2] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multi-objective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197.
[3] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30.
[4] A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd ed., Springer, 2007.
[5] K. Faceli, A.C.P.L.F. de Carvalho, M.C.P. de Souto, Multi-objective clustering ensemble, Int. J. Hybrid Intell. Syst. 4 (3) (2008) 145–156.
[6] K. Faceli, M.C.P. de Souto, D.S.A. de Araújo, A.C.P.L.F. de Carvalho, Multi-objective clustering ensemble for gene expression data analysis, Neurocomputing 72 (13–15) (2009) 2763–2774.
[7] X.Z. Fern, C.E. Brodley, Solving cluster ensemble problems by bipartite graph partitioning, in: Proceedings of the International Conference on Machine Learning, ACM International Conference Proceeding Series, Banff, Canada, 2004.
[8] J. Handl, J. Knowles, An evolutionary approach to multiobjective clustering, IEEE Trans. Evol. Comput. 11 (1) (2007) 56–76.
[9] A.K. Jain, M. Murty, P. Flynn, Data clustering: a review, ACM Comput. Surv. 31 (3) (1999) 264–323.
[10] M. Law, A. Topchy, A.K. Jain, Multiobjective data clustering, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2004, pp. 424–430.
[11] S. Silva, GPLAB—a genetic programming toolbox for MATLAB, version 3, University of Coimbra, 2007.
[12] A. Strehl, J. Ghosh, Cluster ensembles—a knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res. 3 (2002) 583–617.
[13] A. Topchy, A.K. Jain, W. Punch, Clustering ensembles: models of consensus and weak partitions, IEEE Trans. Pattern Anal. Mach. Intell. 27 (12) (2005) 1866–1881.
[14] R. Xu, D.C. Wunsch, Clustering, Wiley-IEEE Press, 2008.

André L.V. Coelho received the B.Sc. degree in Computer Engineering in 1996, and earned the M.Sc. and Ph.D. degrees in Electrical Engineering in 1998 and 2004, respectively, all from the State University of Campinas (Unicamp), Brazil. He has a record of publications related to the themes of machine learning, data mining, computational intelligence, metaheuristics, and multiagent systems. He is a member of ACM and has served as a reviewer for a number of scientific conferences and journals. Currently, he is an adjunct professor affiliated with the Graduate Program in Applied Informatics at the University of Fortaleza, Ceará, Brazil.

Table 2. Performance (average ± standard deviation of CR values) of the CE algorithms; best results are shown in bold.

| Dataset | Structure | SC | HBGF | GP |
| --- | --- | --- | --- | --- |
| ds2c2sc13 | E1 (k=2) | 0.6910 ± 0.058 | 0.8874 ± 0.1428 | **1.0000 ± 0.0000** |
| | E2 (k=5) | 0.9920 ± 0.022 | 0.9111 ± 0.0000 | **1.0000 ± 0.0000** |
| | E3 (k=13) | **0.7860 ± 0.042** | 0.7703 ± 0.0023 | 0.7777 ± 0.0097 |
| ds3c3sc6 | E1 (k=3) | 0.8900 ± 0.023 | 0.8900 ± 0.0000 | **0.9365 ± 0.0200** |
| | E2 (k=6) | 0.5969 ± 0.044 | 0.5662 ± 0.0000 | **0.6484 ± 0.0212** |
| ds4c2sc8 | E1 (k=2) | 0.2958 ± 0.078 | **0.3756 ± 0.0567** | 0.3364 ± 0.0371 |
| | E2 (k=8) | 0.8460 ± 0.058 | 0.8940 ± 0.0000 | **0.9076 ± 0.0136** |
| spiralsquare | E1 (k=2) | **1.0000 ± 0.0000** | **1.0000 ± 0.0000** | **1.0000 ± 0.0000** |
| | E2 (k=6) | 0.6381 ± 0.062 | **0.7312 ± 0.0222** | 0.6667 ± 0.0004 |
| glass | E1 (k=2) | 0.6340 ± 0.048 | 0.6605 ± 0.0002 | **0.6938 ± 0.0249** |
| | E2 (k=5) | 0.4764 ± 0.039 | 0.5044 ± 0.0000 | **0.5729 ± 0.0226** |
| | E3 (k=6) | 0.2240 ± 0.016 | 0.2394 ± 0.0000 | **0.2808 ± 0.018** |
| iris | E1 (k=3) | 0.7591 ± 0.018 | **0.7592 ± 0.000** | **0.7592 ± 0.0000** |
| golub | E1 (k=2) | 0.7430 ± 0.175 | 0.8421 ± 0.0000 | **0.8815 ± 0.0822** |
| | E2 (k=3) | 0.6370 ± 0.129 | 0.7903 ± 0.0531 | **0.8773 ± 0.0242** |
| | E3 (k=4) | 0.5890 ± 0.124 | 0.6506 ± 0.0176 | **0.6778 ± 0.0506** |
| | E4 (k=2) | **0.1100 ± 0.074** | 0.0067 ± 0.0091 | 0.0491 ± 0.0139 |
| proteins | E1 (k=4) | 0.3150 ± 0.020 | 0.3311 ± 0.0000 | **0.3366 ± 0.0123** |
| | E2 (k=27) | 0.1330 ± 0.006 | 0.1230 ± 0.0051 | **0.1403 ± 0.0056** |
| leukemia | E1 (k=3) | 0.3150 ± 0.046 | **0.3625 ± 0.0354** | 0.2881 ± 0.0088 |
| | E2 (k=7) | 0.6590 ± 0.053 | 0.5603 ± 0.0007 | **0.7808 ± 0.0059** |
| lung | E1 (k=4) | 0.4380 ± 0.071 | 0.5174 ± 0.0533 | **0.7928 ± 0.0433** |


Everlândio Fernandes received the B.Sc. degree in Computer Science from the Federal University of Rio Grande do Norte, Brazil, in 2002. He also holds an M.Sc. degree in Applied Informatics from the University of Fortaleza (2009). His areas of interest are clustering, committee machines, and evolutionary algorithms.

Katti Faceli received the B.Sc., M.Sc. and Ph.D. degrees in Computer Science in 1998, 2001, and 2006, respectively, all from the University of São Paulo, Brazil. Currently, she is an Associate Professor at the Federal University of São Carlos, Campus Sorocaba, Brazil. Her main research interests include machine learning, hybrid intelligent systems, cluster analysis, ensembles, feature selection, and bioinformatics.

