Conference Paper

Sum-of-Squares Based Cluster Validity Index and Significance Analysis.

DOI: 10.1007/978-3-642-04921-7_32
Conference: Adaptive and Natural Computing Algorithms, 9th International Conference, ICANNGA 2009, Kuopio, Finland, April 23-25, 2009, Revised Selected Papers
Source: DBLP

ABSTRACT Different clustering algorithms produce different results on a given data set because most clustering algorithms are sensitive
to their input parameters and to the structure of the data. Evaluating the result of a clustering algorithm, known as cluster
validity, is one of the problems in cluster analysis. In this paper, we build a framework for the cluster validity process and
propose a sum-of-squares based index for the purpose of cluster validity. We use a resampling method in the framework to analyze
the stability of the clustering algorithm and the certainty of the cluster validity index. For homogeneous data based on
independent variables, the proposed cluster validity index is effective in comparison to some other commonly used indexes.
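
The abstract does not give the formula of the proposed index, so the following is only a minimal sketch of the general idea: compute the within-cluster (SSW) and between-cluster (SSB) sums of squares for a labelled partition, combine them into a ratio, and use bootstrap resampling to see how much that ratio varies. The function names `sum_of_squares`, `ratio_index`, and `bootstrap_index`, and the specific ratio k * SSW / SSB, are assumptions made for illustration and should not be read as the paper's definition.

```python
import random
import statistics


def sum_of_squares(points, labels):
    """Return (SSW, SSB) for a labelled partition of equal-length point lists."""
    n, dim = len(points), len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    ssw = ssb = 0.0
    for members in clusters.values():
        size = len(members)
        centroid = [sum(p[d] for p in members) / size for d in range(dim)]
        # Within-cluster: squared distance of each member to its centroid.
        ssw += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
        # Between-cluster: size-weighted squared distance of centroid to overall mean.
        ssb += size * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
    return ssw, ssb


def ratio_index(points, labels):
    """Illustrative index k * SSW / SSB (an assumption); lower means tighter, better separated."""
    ssw, ssb = sum_of_squares(points, labels)
    if ssb == 0.0:
        return float("inf")  # all centroids coincide, e.g. a single cluster
    return len(set(labels)) * ssw / ssb


def bootstrap_index(points, labels, rounds=200, seed=0):
    """Mean and standard deviation of the index over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(points)
    values = []
    for _ in range(rounds):
        picks = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in picks]
        if len(set(sample_labels)) < 2:
            continue  # skip degenerate resamples that contain only one cluster
        values.append(ratio_index([points[i] for i in picks], sample_labels))
    if not values:
        raise ValueError("no usable resamples")
    return statistics.mean(values), statistics.pstdev(values)


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
    print(ratio_index(pts, [0, 0, 1, 1]))   # partition that follows the two groups
    print(ratio_index(pts, [0, 1, 0, 1]))   # partition that cuts across them
    print(bootstrap_index(pts, [0, 0, 1, 1]))
```

Note that this sketch resamples a fixed labelling, which only probes the variability of the index. Assessing the stability of a clustering algorithm, as the abstract describes, would also require re-running the algorithm on each resample.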



Available from: Pasi Fränti, Jun 21, 2015
  • Source
    ABSTRACT: This paper presents a framework to process and analyze data from a pulse oximeter which remotely measures pulse rate and blood oxygen saturation from a set of individuals. Using case-based reasoning (CBR) as the backbone of the framework, records are analyzed and categorized according to their similarity. Records were collected using a personalized health profiling approach in which participants wore a pulse oximeter sensor for a fixed period of time and performed specific activities for predetermined intervals. Using a variety of feature extraction methods in the time, frequency, and time-frequency domains, together with data processing techniques, the data is fed into a CBR system which retrieves the most similar cases and generates an alarm according to the case outcomes. The system has been compared with an expert's classification, and a 90% match is achieved between the expert's and the CBR classification. When the clustered measurements are considered, the CBR approach classifies 93% correctly for both pulse rate and oxygen saturation. Along with the proposed methodology, this paper provides a basis on which the system can be used in the analysis of continuous health monitoring and as a suitable method in home/remote monitoring systems.
    04/2013; DOI:10.1155/2013/380239
  • Source
    ABSTRACT: The analysis of the gender wage gap has been an active subject within socio-economic domains around the world. Much of this gap occurs at the upper rungs of the organizational ladder, even among females with credentials or achievements to their names. This research attempts to answer the gender wage gap questions related to Lebanon by applying an in-depth cluster analysis to data gathered on all the employees of two Lebanese financial institutions. The results indicate that Lebanon, like other countries of the world, suffers from serious discrimination, seen in the considerable differences between wages paid for classical "men's jobs" and those paid for classical "women's jobs". Moreover, this study shows that factors common to both genders, including years of experience, age, educational level, and position, generally cannot explain this significant wage gap, which may imply that the gap is due to culture, traditions, and weak governmental policies.
  • ABSTRACT: Dermoscopy is a noninvasive skin imaging technique, which permits visualization of features of pigmented melanocytic neoplasms that are not discernible by examination with the naked eye. Color information is indispensable for the clinical diagnosis of malignant melanoma, the most deadly form of skin cancer. For this reason, most of the currently accepted dermoscopic scoring systems either directly or indirectly incorporate color as a diagnostic criterion. For example, both the asymmetry, border, colors, and dermoscopic (ABCD) rule of dermoscopy and the more recent color, architecture, symmetry, and homogeneity (CASH) algorithm include the number of clinically significant colors in their calculation of malignancy scores. In this paper, we present a machine learning approach to the automated quantification of clinically significant colors in dermoscopy images. Given a true-color dermoscopy image with $N$ colors, we first reduce the number of colors in this image to a small number $K$, i.e., $K \ll N$, using the $K$-means clustering algorithm incorporating a spatial term. The optimal $K$ value for the image is estimated separately using five commonly used cluster validity criteria. We then train a symbolic regression algorithm using the estimates given by these criteria, which are calculated on a set of 617 images. Finally, the mathematical equation given by the regression algorithm is used for two-class (benign versus malignant) classification. The proposed approach yields a sensitivity of 62% and a specificity of 76% on an independent test set of 297 images.
    IEEE Systems Journal 09/2014; 8(3):980-984. DOI:10.1109/JSYST.2014.2313671 · 1.75 Impact Factor