Conference Paper

DHC: A Density-Based Hierarchical Clustering Method for Time Series Gene Expression Data

Dept. of Comput. Sci., State Univ. of New York, Buffalo, NY, USA
DOI: 10.1109/BIBE.2003.1188978
Conference: 3rd IEEE International Symposium on BioInformatics and BioEngineering (BIBE 2003), 10-12 March 2003, Bethesda, MD, USA
Source: DBLP

ABSTRACT: [...] patterns in underlying data, have proved to be useful in finding co-expressed genes. Clustering the time series gene expression data is an important task in bioinformatics research and biomedical applications. Recently, some clustering methods have been adapted or proposed. However, some concerns still remain, such as the robustness of the mining methods, as well as the quality and the interpretability of the mining results. In this paper, we tackle the problem of effectively clustering time series gene expression data by proposing algorithm DHC, a density-based, hierarchical clustering method. We use a density-based approach to identify the clusters such that the clustering results are of high quality and robustness. Moreover, the mining result is in the form of a density tree, which uncovers the embedded clusters in a data set. The inner structures, the borders and the outliers of the clusters can be further investigated using the attraction tree, which is an intermediate result of the mining. By these two trees, the internal structure of the data set can be visualized effectively. Our empirical evaluation using some real-world data sets shows that the method is effective, robust and scalable. It matches the ground truth provided by bioinformatics experts very well in the sample data sets.
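
The abstract does not say how density or attraction are computed, so the following is only a rough illustration of the general idea and not the authors' DHC algorithm. It assumes a Gaussian-kernel density estimate, a rule that each expression profile is attracted to its nearest denser profile, and a simple distance cutoff for splitting the resulting tree into clusters. The names `attraction_tree`, `cut_into_clusters`, `bandwidth` and `max_edge` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def attraction_tree(profiles, bandwidth=1.0):
    """Link every profile to its nearest denser profile (assumed rule).

    Returns (density, parent); parent[i] is None for the single root.
    """
    n = len(profiles)
    # Assumed density estimate: sum of Gaussian kernels over all profiles.
    density = [
        sum(math.exp(-0.5 * (dist(p, q) / bandwidth) ** 2) for q in profiles)
        for p in profiles
    ]
    parent = [None] * n
    for i in range(n):
        # Ties in density are broken by index, so the links cannot form a cycle.
        denser = [j for j in range(n) if (density[j], -j) > (density[i], -i)]
        if denser:
            parent[i] = min(denser, key=lambda j: dist(profiles[i], profiles[j]))
    return density, parent


def cut_into_clusters(profiles, parent, max_edge):
    """Label each profile by the node reached when links longer than max_edge are cut."""
    def root(i):
        while parent[i] is not None and dist(profiles[i], profiles[parent[i]]) <= max_edge:
            i = parent[i]
        return i

    return [root(i) for i in range(len(profiles))]


# Toy data: two well-separated groups of three-point time series.
profiles = [
    (0, 0, 0), (0, 1, 0), (1, 0, 0),
    (10, 10, 10), (10, 11, 10), (11, 10, 10),
]
density, parent = attraction_tree(profiles)
labels = cut_into_clusters(profiles, parent, max_edge=2.0)
print(labels)
```

In this sketch the uncut tree plays a role loosely similar to the attraction tree mentioned in the abstract, in that low-density profiles hanging off long links are natural candidates for borders or outliers. How the paper actually derives the density tree from it is not stated here.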

  • ABSTRACT: Clustering is an essential tool in data mining that has drawn enormous attention. In this paper, we present a new clustering algorithm with the help of a Voronoi diagram. Here the clusters are formed by considering the neighboring Voronoi cells. The points belonging to the closer Voronoi cells are merged to form the clusters. The similarity of the points is measured based on the Euclidean distance of the neighboring points, and hence it is not necessary to compare the distances from one point to all other points of the given set. We perform various experiments using many synthetic and biological data sets. The experimental results demonstrate the significance of the proposed method.
    Proceedings of the 2011 international conference on Advanced Computing, Networking and Security; 12/2011
  • ABSTRACT: Recent advances in DNA microarray technology, also known as gene chips, allow measuring the expression of thousands of genes in parallel under multiple experimental conditions [1]. This technology is having a significant impact on genomic studies. Disease diagnosis, drug discovery and toxicological research benefit from the microarray technology. Arrays are now widely used in basic biomedical research for mRNA expression profiling and are increasingly being used to explore patterns of gene expression in clinical research.
  • ABSTRACT: With the advancement of high-throughput biotechnologies, biological data describing DNA, RNA, protein, and metabolite biomolecules are generated faster than ever. A huge amount of information is being produced and collected. Bioinformatics uses information technology to facilitate the discovery of new knowledge from large sets of various biological data at the molecular level. Within various applications of information technology, clustering has long played an important role. Clustering is an exploratory tool for analyzing large datasets and ...
    Statistical Bioinformatics: A Guide for Life and Biomedical Science Researchers, Edited by Jae K. Lee, 06/2010: chapter Clustering: Unsupervised Learning in Large Biological Data; Wiley-Blackwell, ISBN: 9780470567647

