Conference Paper

Sum-of-Squares Based Cluster Validity Index and Significance Analysis.

DOI: 10.1007/978-3-642-04921-7_32
Conference: Adaptive and Natural Computing Algorithms, 9th International Conference, ICANNGA 2009, Kuopio, Finland, April 23-25, 2009, Revised Selected Papers
Source: DBLP

ABSTRACT Different clustering algorithms produce different results on a given data set because most clustering algorithms are
sensitive to their input parameters and to the structure of the data. Evaluating the result of a clustering algorithm, known as
cluster validity, is one of the problems in cluster analysis. In this paper, we build a framework for the cluster validity
process and propose a sum-of-squares based index for cluster validation. Within the framework, we use resampling to analyze
the stability of the clustering algorithm and the certainty of the cluster validity index. For homogeneous data based on
independent variables, the proposed cluster validity index is effective in comparison with some other commonly used indexes.
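
The abstract does not give the formula of the proposed index, so the following is only a rough sketch of the general idea: score each candidate number of clusters with a sum-of-squares based index and use resampling to see how stable the selection is. The Calinski-Harabasz index is swapped in here as a well-known sum-of-squares index; it is not the index proposed in the paper. The function names (`kmeans_labels`, `sum_of_squares`, `calinski_harabasz`, `bootstrap_votes`), the bootstrap scheme, and the plain k-means used for clustering are all assumptions made for illustration.

```python
import numpy as np

def sum_of_squares(X, labels):
    """Return (SSW, SSB): within- and between-cluster sums of squares."""
    grand_mean = X.mean(axis=0)
    ssw = ssb = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        center = members.mean(axis=0)
        ssw += ((members - center) ** 2).sum()
        ssb += len(members) * ((center - grand_mean) ** 2).sum()
    return ssw, ssb

def kmeans_labels(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means with restarts; returns labels of the lowest-SSW run."""
    pool = np.unique(X, axis=0)  # distinct points, so initial centroids differ
    best_labels, best_ssw = None, np.inf
    for _restart in range(n_init):
        centroids = pool[rng.choice(len(pool), size=k, replace=False)]
        for _step in range(n_iter):
            dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Keep the old centroid if a cluster happens to become empty.
            updated = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(updated, centroids):
                break
            centroids = updated
        ssw, _ = sum_of_squares(X, labels)
        if ssw < best_ssw:
            best_labels, best_ssw = labels, ssw
    return best_labels

def calinski_harabasz(X, labels):
    """Stand-in index: (SSB / (k - 1)) / (SSW / (n - k)); larger is better."""
    n, k = len(X), len(np.unique(labels))
    ssw, ssb = sum_of_squares(X, labels)
    if k < 2 or k >= n or ssw == 0.0:
        return float("-inf")  # index undefined for degenerate partitions
    return (ssb / (k - 1)) / (ssw / (n - k))

def bootstrap_votes(X, ks, n_boot, rng):
    """Count how often each k maximizes the index over bootstrap resamples."""
    votes = {k: 0 for k in ks}
    for _ in range(n_boot):
        sample = X[rng.integers(0, len(X), size=len(X))]  # draw with replacement
        scores = {k: calinski_harabasz(sample, kmeans_labels(sample, k, rng))
                  for k in ks}
        votes[max(scores, key=scores.get)] += 1
    return votes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: three well-separated Gaussian blobs in two dimensions.
    X = np.vstack([rng.normal(c, 1.0, size=(50, 2))
                   for c in [(0, 0), (10, 0), (0, 10)]])
    print(bootstrap_votes(X, ks=[2, 3, 4, 5], n_boot=20, rng=rng))
```

In this sketch, a vote count concentrated on one value of k suggests that the index's choice is stable under resampling, while votes spread across several values suggest an uncertain choice.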

  • ABSTRACT: Dermoscopy is a noninvasive skin imaging technique that permits visualization of features of pigmented melanocytic neoplasms that are not discernible by examination with the naked eye. Color information is indispensable for the clinical diagnosis of malignant melanoma, the most deadly form of skin cancer. For this reason, most of the currently accepted dermoscopic scoring systems either directly or indirectly incorporate color as a diagnostic criterion. For example, both the asymmetry, border, colors, and dermoscopic structures (ABCD) rule of dermoscopy and the more recent color, architecture, symmetry, and homogeneity (CASH) algorithm include the number of clinically significant colors in their calculation of malignancy scores. In this paper, we present a machine learning approach to the automated quantification of clinically significant colors in dermoscopy images. Given a true-color dermoscopy image with $N$ colors, we first reduce the number of colors in this image to a small number $K$, i.e., $K \ll N$, using the $K$-means clustering algorithm incorporating a spatial term. The optimal $K$ value for the image is estimated separately using five commonly used cluster validity criteria. We then train a symbolic regression algorithm using the estimates given by these criteria, which are calculated on a set of 617 images. Finally, the mathematical equation given by the regression algorithm is used for two-class (benign versus malignant) classification. The proposed approach yields a sensitivity of 62% and a specificity of 76% on an independent test set of 297 images.
    IEEE Systems Journal 09/2014; 8(3):980-984. DOI:10.1109/JSYST.2014.2313671 · 1.75 Impact Factor
  • ABSTRACT: In this paper, we consider the problem of unsupervised clustering (vector quantization) of multidimensional numerical data. We propose a new method for determining an optimal number of clusters in the data set. The method is based on parametric modeling of the quantization error. The model parameter can be treated as the effective dimensionality of the data set. The proposed method was tested with artificial and real numerical data sets, and the results of the experiments demonstrate empirically not only the effectiveness of the method but also its ability to cope with difficult cases where other known methods fail.
    Pattern Recognition 03/2015; 48(3). DOI:10.1016/j.patcog.2014.09.017 · 2.58 Impact Factor
  • ABSTRACT: Contrary to most of the existing 3D shape clustering methods, in which all the objects in a dataset must be assigned to clusters, in this paper we pursue an incomplete but reliable unsupervised clustering solution. The central idea lies in obtaining coherent 3D shape groups using a consensus between different similarity measures which are defined in a common 3D shape representation framework. Our goal, therefore, is to extract some consistent groups of objects, treating an incomplete classification, if it occurs, as a natural result. The Weighted Cone Curvature (WCC) is defined as an overall feature which synthesizes a set of curvature levels on the nodes of a standard triangular mesh representation. The WCC concept is used to define a master descriptor called an RC-Image on which up to eight similarity measures are defined. A hierarchical clustering process is then carried out for all the measures and evaluated by means of a clustering confidence measure. Finally, a consensus between the best measures is achieved to provide a coherent group of objects. The proposed clustering approach has been tested on a set of mesh models belonging to a wide variety of free-shape objects, yielding promising results. The results of our experiments demonstrate that both the 3D shape descriptor used and the clustering strategy proposed might be useful for future developments in the unsupervised grouping field.
    Pattern Recognition 01/2014; 47(1):402-417. DOI:10.1016/j.patcog.2013.07.006 · 2.58 Impact Factor
