Multifractal-based cluster hierarchy optimisation algorithm

Article in International Journal of Business Intelligence and Data Mining 3(4):353-374 · January 2008
DOI: 10.1504/IJBIDM.2008.022734 · Source: DBLP
A cluster is a collection of data objects that are similar to one another and dissimilar to the objects in other clusters. In a real-life data set, the many initial clusters produced by a clustering run will themselves be more or less similar to one another. An analyst who knows nothing about these similarities will have difficulty carrying out further analysis, so it is valuable to analyse them and to construct a hierarchy over the initial clusters. Traditional clustering methods are ill-suited to this cluster postprocessing problem because they favour spherical clusters, rest on impractical assumptions, and require multiple scans of the data set. Based on multifractal theory, we propose the MultiFractal-based Cluster Hierarchy Optimisation (MFCHO) algorithm, which integrates cluster similarity with cluster shape and cluster distribution to construct a cluster hierarchy tree from the disjoint initial clusters. An elementary analysis of the time and space complexity of the MFCHO algorithm is presented. Several comparative experiments on synthetic and real-life data sets show the performance and effectiveness of MFCHO.
  • ABSTRACT: Clustering is a widely used knowledge discovery technique. It helps uncover structures in data that were not previously known. The clustering of large data sets has received a lot of attention in recent years; however, clustering is still a challenging task, since many published algorithms fail to scale well with the size of the data set and the number of dimensions that describe the points, or to find clusters of arbitrary shape, or to deal effectively with the presence of noise. In this paper, we present a new clustering algorithm based on the fractal properties of the data sets. The new algorithm, which we call Fractal Clustering (FC), places points incrementally in the cluster for which the change in the fractal dimension after adding the point is the least. This is a very natural way of clustering points, since points in the same cluster have a great degree of self-similarity among them (and much less self-similarity with respect to points in other clusters). FC requires one scan of the data, is suspendable at will, providing the best answer possible at that point, and is incremental. We show via experiments that FC effectively deals with large data sets, high dimensionality and noise, and is capable of recognizing clusters of arbitrary shape. Categories and Subject Descriptors: I.5.3 [Computing Methodologies]: Pattern Recognition - Clustering. General Terms: Fractals.
    Full-text · Conference Paper · Aug 2000 · WSEAS Transactions on Information Science and Applications
  • ABSTRACT: Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g. representative points, arbitrary shaped clusters), but also the intrinsic clustering structure. For medium sized data sets, the cluster-ordering can be represented graphically and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure offering additional insights into the distribution and correlation of the data.
    Full-text · Conference Paper · Jun 1999
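The Fractal Clustering abstract above states the assignment rule in one sentence: a new point goes to the cluster whose fractal dimension changes least when the point is added. The following is a minimal sketch of that rule only, not the authors' implementation. It assumes a plain box-counting estimate of the dimension, a fixed set of box sizes, and in-memory NumPy arrays; the function names `box_counting_dimension` and `assign_point` are hypothetical, and the sketch recomputes the estimate from scratch rather than maintaining it incrementally.

```python
import numpy as np


def box_counting_dimension(points, box_sizes=(0.5, 0.25, 0.125, 0.0625)):
    """Estimate the box-counting dimension of a point set (assumed estimator)."""
    pts = np.asarray(points, dtype=float)
    counts = []
    for r in box_sizes:
        # Index of the grid cell of side r that contains each point.
        cells = np.floor(pts / r).astype(np.int64)
        # N(r): number of distinct occupied cells at this scale.
        counts.append(len(np.unique(cells, axis=0)))
    # The dimension estimate is the slope of log N(r) against log(1/r).
    log_inv_r = np.log(1.0 / np.asarray(box_sizes))
    slope, _intercept = np.polyfit(log_inv_r, np.log(counts), 1)
    return float(slope)


def assign_point(clusters, point):
    """Append `point` to the cluster whose estimated dimension changes least.

    `clusters` is a list of (n_i, d) arrays and is updated in place.
    Returns the index of the chosen cluster.
    """
    point = np.asarray(point, dtype=float)
    changes = []
    for members in clusters:
        before = box_counting_dimension(members)
        after = box_counting_dimension(np.vstack([members, point]))
        changes.append(abs(after - before))
    best = int(np.argmin(changes))
    clusters[best] = np.vstack([clusters[best], point])
    return best
```

Called with a list of initial clusters, `assign_point(clusters, p)` returns the index of the cluster that received `p`. A fuller treatment would also need the noise handling and one-scan bookkeeping that the abstract mentions, which this sketch leaves out.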