Conference Paper

DHC: A Density-Based Hierarchical Clustering Method for Time Series Gene Expression Data

Dept. of Comput. Sci., State Univ. of New York, Buffalo, NY, USA
DOI: 10.1109/BIBE.2003.1188978
Conference: 3rd IEEE International Symposium on BioInformatics and BioEngineering (BIBE 2003), 10-12 March 2003, Bethesda, MD, USA
Source: DBLP

ABSTRACT terns in underlying data, have proved to be useful in finding co-expressed genes. Clustering the time series gene expression data is an important task in bioinformatics research and biomedical applications. Recently, some clustering methods have been adapted or proposed. However, some concerns still remain, such as the robustness of the mining methods, as well as the quality and the interpretability of the mining results. In this paper, we tackle the problem of effectively clustering time series gene expression data by proposing algorithm DHC, a density-based, hierarchical clustering method. We use a density-based approach to identify the clusters such that the clustering results are of high quality and robustness. Moreover, the mining result is in the form of a density tree, which uncovers the embedded clusters in a data set. The inner structures, the borders and the outliers of the clusters can be further investigated using the attraction tree, which is an intermediate result of the mining. With these two trees, the internal structure of the data set can be visualized effectively. Our empirical evaluation using some real-world data sets shows that the method is effective, robust and scalable. It matches the ground truth provided by bioinformatics experts very well in the sample data sets.

  • ABSTRACT: Clustering is an essential tool in data mining that has drawn enormous attention. In this paper, we present a new clustering algorithm with the help of a Voronoi diagram. Here the clusters are formed by considering the neighboring Voronoi cells. The points belonging to the closer Voronoi cells are merged to form the clusters. The similarity of the points is measured based on the Euclidean distance between neighboring points, and hence it is not necessary to compare the distances from one point to all other points of the given set. We perform various experiments using many synthetic and biological data sets. The experimental results demonstrate the significance of the proposed method.
    Proceedings of the 2011 international conference on Advanced Computing, Networking and Security; 12/2011
  • ABSTRACT: Recent advances in DNA microarray technology, also known as gene chips, allow measuring the expression of thousands of genes in parallel under multiple experimental conditions [1]. This technology is having a significant impact on genomic studies. Disease diagnosis, drug discovery and toxicological research benefit from microarray technology. Arrays are now widely used in basic biomedical research for mRNA expression profiling and are increasingly being used to explore patterns of gene expression in clinical research.
  • ABSTRACT: Clustering and classification hierarchies are organizational structures of a set of objects. Multiple hierarchies may be derived over the same set of objects, which makes distance computation between hierarchies an important task. In this paper, we model the classification and clustering hierarchies as rooted, leaf-labeled, unordered trees. We propose a novel distance metric, Split-Order distance, to evaluate the organizational structure difference between two hierarchies over the same set of leaf objects. Split-Order distance reflects the order in which subsets of the tree leaves are differentiated from each other and can be used to explain the relationships between the leaf objects. We also propose an efficient algorithm for computing Split-Order distance between two trees in O(n^2 d^4) time, where n is the number of leaves, and d is the maximum number of children of any node. Our experiments on both real and synthetic data demonstrate the efficiency and effectiveness of our algorithm.
    Proceedings of the 21st International Conference on Scientific and Statistical Database Management; 06/2009

