Conference Paper

Sum-of-Squares Based Cluster Validity Index and Significance Analysis.

DOI: 10.1007/978-3-642-04921-7_32
Conference: Adaptive and Natural Computing Algorithms, 9th International Conference, ICANNGA 2009, Kuopio, Finland, April 23-25, 2009, Revised Selected Papers
Source: DBLP

ABSTRACT Different clustering algorithms achieve different results on certain data sets because most clustering algorithms are sensitive
to their input parameters and to the structure of the data sets. Evaluating the result of a clustering algorithm, known as cluster
validity, is one of the problems in cluster analysis. In this paper, we build a framework for the cluster validity process and
propose a sum-of-squares based index for the purpose of cluster validity. We use a resampling method within the framework to analyze
the stability of the clustering algorithm and the certainty of the cluster validity index. For homogeneous data based on
independent variables, the proposed cluster validity index is effective in comparison to some other commonly used indexes.
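
The abstract does not define the proposed index, so the following is only a minimal sketch of the two quantities that sum-of-squares based validity indexes are commonly built from: the within-cluster sum of squares (SSW) and the between-cluster sum of squares (SSB). The function names (`centroid`, `sq_dist`, `sum_of_squares`) and the toy data are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_of_squares(points: List[Point],
                   labels: List[Hashable]) -> Tuple[float, float]:
    """Return (SSW, SSB) for a hard partition of the points.

    SSW: squared distances from each point to its own cluster centroid.
    SSB: cluster-size-weighted squared distances from each cluster
         centroid to the overall centroid.
    """
    clusters: Dict[Hashable, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    overall = centroid(points)
    ssw = 0.0
    ssb = 0.0
    for members in clusters.values():
        c = centroid(members)
        ssw += sum(sq_dist(p, c) for p in members)
        ssb += len(members) * sq_dist(c, overall)
    return ssw, ssb


# Toy data (hypothetical): two tight pairs of points far apart along x.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
good_ssw, good_ssb = sum_of_squares(points, [0, 0, 1, 1])
bad_ssw, bad_ssb = sum_of_squares(points, [0, 1, 0, 1])
```

For any partition of the same data, SSW + SSB equals the total sum of squares, so a partition that lowers SSW necessarily raises SSB. A validity index of this family typically combines the two quantities with the number of clusters; the specific combination used in the paper is not stated in the abstract.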

  • ABSTRACT: In this paper we consider the problem of unsupervised clustering of multidimensional numerical data. We propose a new method for determining the optimal number of clusters in a data set, based on a parametric model of a Rate-Distortion curve. The proposed method can be used in conjunction with any suitable clustering algorithm. It was tested on artificial and real numerical data sets, and the experimental results demonstrate empirically not only the effectiveness of the method but also its ability to cope with "difficult" cases where other known methods failed.
    Proceedings of the 9th international conference on Image Analysis and Recognition - Volume Part I; 06/2012
  • ABSTRACT: Contrary to most of the existing 3D shape clustering methods, in which all the objects in a dataset must be classified in clusters, in this paper we tackle an incomplete but reliable unsupervised clustering solution. The central idea lies in obtaining coherent 3D shape groups using a consensus between different similarity measures which are defined in a common 3D shape representation framework. Our goal, therefore, is to extract some consistent groups of objects, considering the incomplete classification, if this occurs, as a natural result. The Weighted Cone Curvature (WCC) is defined as an overall feature which synthesizes a set of curvature levels on the nodes of a standard triangular mesh representation. The WCC concept is used to define a master descriptor called an RC-Image on which up to eight similarity measures are defined. A hierarchical clustering process is then carried out for all the measures and evaluated by means of a clustering confidence measure. Finally, a consensus between the best measures is achieved to provide a coherent group of objects. The proposed clustering approach has been tested on a set of mesh models belonging to a wide variety of free-shape objects, yielding promising results. The results of our experiments demonstrate that both the 3D shape descriptor used and the clustering strategy proposed might be useful for future developments in the unsupervised grouping field.
    Pattern Recognition 01/2014; 47(1):402-417. · 2.58 Impact Factor
  • ABSTRACT: This paper presents a framework to process and analyze data from a pulse oximeter which remotely measures pulse rate and blood oxygen saturation from a set of individuals. Using case-based reasoning (CBR) as the backbone of the framework, records are analyzed and categorized according to their similarity. Record collection has been performed using a personalized health profiling approach in which participants wore a pulse oximeter sensor for a fixed period of time and performed specific activities for pre-determined intervals. Using a variety of feature extraction methods in the time, frequency and time-frequency domains, together with data processing techniques, the data is fed into a CBR system which retrieves the most similar cases and generates an alarm according to the case outcomes. The system has been compared with an expert's classification, and a 90% match is achieved between the expert's and the CBR classification. Again, considering the clustered measurements, the CBR approach classifies 93% correctly for both the pulse rate and oxygen saturation. Along with the proposed methodology, this paper provides a basis on which the system can be used in the analysis of continuous health monitoring and as a suitable method in home/remote monitoring systems.
    ISRN Artificial Intelligence 04/2013
