Article

Coordination of Cluster Ensembles via Exact Methods

IEEE Transactions on Pattern Analysis and Machine Intelligence (impact factor: 4.91). 03/2011; DOI:10.1109/TPAMI.2010.85 pp.279 - 293
Source: IEEE Xplore

ABSTRACT We present a novel optimization-based method for combining cluster ensembles for the class of problems with intracluster criteria, such as Minimum-Sum-of-Squares-Clustering (MSSC). We propose a simple and efficient algorithm, called EXAMCE, for this class of problems that is inspired by a Set-Partitioning formulation of the original clustering problem. We prove some theoretical properties of the solutions produced by our algorithm, and in particular that, under general assumptions, though the algorithm recombines solution fragments so as to find the solution of a Set-Covering relaxation of the original formulation, it is guaranteed to find better solutions than the ones in the ensemble. For the MSSC problem in particular, a prototype implementation of our algorithm found a new solution better than the best previously known for 21 of the test instances of the 40-instance TSPLIB benchmark data sets used in, and found a worse-quality solution than the best known only five times. For other published benchmark data sets where the optimal MSSC solution is known, we match the optimal solutions. The algorithm is particularly effective when the number of clusters is large, in which case it is able to escape the local minima found by K-means type algorithms by recombining the solutions in a Set-Covering context. We also establish the stability of the algorithm with extensive computational experiments, by showing that multiple runs of EXAMCE for the same clustering problem instance produce high-quality solutions whose Adjusted Rand Index is consistently above 0.95. Finally, in experiments utilizing external criteria to compute the validity of clustering, EXAMCE is capable of producing results that are comparable in quality to those of the best known clustering algorithms.
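
The recombination idea in the abstract can be illustrated with a toy sketch. This is not the EXAMCE algorithm itself: it assumes a tiny data set, pools the clusters found in an ensemble of partitions, and replaces the exact Set-Covering optimization with plain exhaustive enumeration of Set-Partitioning solutions. The names `cluster_cost`, `partition_cost`, and `recombine`, as well as the data, are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def cluster_cost(points, cluster):
    """MSSC cost of one cluster: squared distances of its points to their centroid."""
    members = [points[i] for i in cluster]
    dim = len(members[0])
    centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
    return sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))

def partition_cost(points, partition):
    """MSSC cost of a whole partition (sum over its clusters)."""
    return sum(cluster_cost(points, c) for c in partition)

def recombine(points, ensemble, k):
    """Pick k pooled clusters that partition all points at minimum total cost.

    Assumption: brute-force enumeration stands in for an exact integer
    programming solver, so this only scales to very small pools.
    """
    pool = sorted({frozenset(c) for part in ensemble for c in part}, key=sorted)
    costs = {c: cluster_cost(points, c) for c in pool}
    n = len(points)
    best, best_cost = None, float("inf")
    for combo in combinations(pool, k):
        # k clusters form a partition iff their sizes sum to n and they cover all points
        if sum(len(c) for c in combo) != n:
            continue
        if len(frozenset().union(*combo)) != n:
            continue
        total = sum(costs[c] for c in combo)
        if total < best_cost:
            best, best_cost = combo, total
    return [sorted(c) for c in best], best_cost

# Hypothetical 1-D data: four well-separated pairs of points.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,), (30.0,), (31.0,)]
# Two imperfect ensemble members, each getting only half of the pairs right.
part_a = [[0, 1], [2, 3], [4, 5, 6], [7]]
part_b = [[0], [1, 2, 3], [4, 5], [6, 7]]

best, best_cost = recombine(points, [part_a, part_b], k=4)
print(best, best_cost)
print(partition_cost(points, part_a), partition_cost(points, part_b))
```

Because every ensemble member is itself a feasible selection from the pool, the recombined partition can never cost more than the best member; here it combines the good fragments of both members into a partition that neither contains.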


Keywords

Adjusted Rand Index
 
algorithm recombines solution fragments
 
clustering problem instance
 
efficient algorithm called EXAMCE
 
experiments utilizing external criteria
 
extensive computational experiments
 
general assumptions
 
high-quality solutions
 
intracluster criteria
 
K-means type algorithms
 
known clustering algorithms
 
Minimum-Sum-of-Squares-Clustering
 
novel optimization-based method
 
optimal MSSC solution
 
original clustering problem
 
published benchmark data sets
 
Set-Covering context
 
solutions
 
test instances
 
worse-quality solution
 

I.T. Christou