Conference Paper

Sum-of-Squares Based Cluster Validity Index and Significance Analysis

DOI: 10.1007/978-3-642-04921-7_32
Conference: Adaptive and Natural Computing Algorithms, 9th International Conference, ICANNGA 2009, Kuopio, Finland, April 23-25, 2009, Revised Selected Papers
Source: DBLP


Different clustering algorithms produce different results on the same data set because most clustering algorithms are sensitive
to their input parameters and to the structure of the data. Evaluating the result of a clustering algorithm, known as cluster
validity, is therefore one of the central problems in cluster analysis. In this paper, we build a framework for the cluster
validity process and propose a sum-of-squares based index for cluster validity. Within the framework, we use a resampling
method to analyze the stability of the clustering algorithm and the certainty of the cluster validity index. For homogeneous
data based on independent variables, the proposed cluster validity index is effective compared with several commonly used indexes.
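
To make the idea of a sum-of-squares based index concrete, here is a minimal Python sketch. The abstract does not state the formula, so the form used here, M * SSW / SSB (number of clusters times the within-cluster sum of squares, divided by the between-cluster sum of squares), is an assumption and may differ from the paper's definition. The function names are hypothetical and chosen only for illustration. Under this assumed form, a lower value indicates more compact and better separated clusters.

```python
from typing import Dict, Hashable, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_of_squares_index(points: List[Point], labels: List[Hashable]) -> float:
    """Assumed index form: M * SSW / SSB (lower is better).

    SSW: squared distances from each point to its own cluster centroid.
    SSB: cluster-size-weighted squared distances from each cluster
         centroid to the centroid of the whole data set.
    """
    # Group points by cluster label.
    clusters: Dict[Hashable, List[Point]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    overall = centroid(points)
    ssw = 0.0
    ssb = 0.0
    for members in clusters.values():
        c = centroid(members)
        ssw += sum(squared_distance(p, c) for p in members)
        ssb += len(members) * squared_distance(c, overall)

    if ssb == 0.0:
        raise ValueError("SSB is zero; the index is undefined for this partition")
    return len(clusters) * ssw / ssb


# Two tight groups of points, scored under a sensible and a poor labeling.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = sum_of_squares_index(data, [0, 0, 1, 1])
poor = sum_of_squares_index(data, [0, 1, 0, 1])
```

In a resampling setting such as the one the abstract describes, the same function could be evaluated on repeated subsamples of the data to see how much the index value, and the number of clusters it favors, varies from sample to sample.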



Available from: Pasi Fränti, Oct 04, 2015
  • "Two well-known clustering algorithms, k-means and single linkage hierarchical clustering, are applied [29] [30], and up to 5 clusters are generated by each algorithm. In order to make meaningful clusters and also find the best clustering method, four validation indices, Silhouette coefficient (SC) [31], Dunn's index (DI) [34], Calinski-Harabasz index (CH) [32], and WB index (WB) [33], are compared. As it can be seen from Figure 7, for pulse rate (a) and oxygen saturation (b), k-means clustering is better than single linkage hierarchical clustering."
    ABSTRACT: This paper presents a framework for processing and analyzing data from a pulse oximeter that remotely measures pulse rate and blood oxygen saturation for a set of individuals. With case-based reasoning (CBR) as the backbone of the framework, records are analyzed and categorized according to their similarity. Records were collected using a personalized health profiling approach in which participants wore a pulse oximeter sensor for a fixed period of time and performed specific activities for pre-determined intervals. After feature extraction in the time, frequency, and time-frequency domains and further data processing, the data are fed into a CBR system that retrieves the most similar cases and generates an alarm according to the case outcomes. The system was compared with an expert's classification, and a 90% match was achieved between the expert's and the CBR classification. When the clustered measurements are considered, the CBR approach classifies 93% correctly for both pulse rate and oxygen saturation. In addition to the proposed methodology, this paper provides a basis for using the system in the analysis of continuous health monitoring and as a suitable method in home/remote monitoring systems.
    04/2013; DOI:10.1155/2013/380239
  • ABSTRACT: External validity measures in cluster analysis evaluate how well the clustering results match prior knowledge about the data. However, it is typically infeasible to obtain such prior knowledge in practical unsupervised learning problems such as cluster analysis. In this paper, we extend the external validity measures for both hard and soft partitions by a resampling method, where no prior information is needed. To reduce the time burden caused by the resampling method, we incorporate two approaches into the proposed method: (i) extending external validity measures for soft partitions with a computational time of O(M^2 N); (ii) an efficient sub-sampling method with a time complexity of O(N). The proposed method is then applied and reviewed in determining the number of clusters in cluster analysis. Experimental results have demonstrated that the proposed method is very effective in determining the number of clusters.
    01/2011; DOI:10.1109/ISDA.2011.6121777
  • ICIAR; 01/2012