Conference Paper

DHC: A Density-Based Hierarchical Clustering Method for Time Series Gene Expression Data

Dept. of Comput. Sci., State Univ. of New York, Buffalo, NY, USA
DOI: 10.1109/BIBE.2003.1188978
Conference: 3rd IEEE International Symposium on BioInformatics and BioEngineering (BIBE 2003), 10-12 March 2003, Bethesda, MD, USA
Source: DBLP

ABSTRACT: [...] patterns in underlying data, have proved to be useful in finding co-expressed genes. Clustering time series gene expression data is an important task in bioinformatics research and biomedical applications. Recently, some clustering methods have been adapted or proposed. However, some concerns still remain, such as the robustness of the mining methods, as well as the quality and the interpretability of the mining results. In this paper, we tackle the problem of effectively clustering time series gene expression data by proposing algorithm DHC, a density-based, hierarchical clustering method. We use a density-based approach to identify the clusters such that the clustering results are of high quality and robustness. Moreover, the mining result is in the form of a density tree, which uncovers the embedded clusters in a data set. The inner structures, the borders and the outliers of the clusters can be further investigated using the attraction tree, which is an intermediate result of the mining. With these two trees, the internal structure of the data set can be visualized effectively. Our empirical evaluation using some real-world data sets shows that the method is effective, robust and scalable. It matches the ground truth provided by bioinformatics experts very well in the sample data sets.
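
The abstract does not give the definitions behind the attraction tree, so the following is only a loose sketch of the general idea, not the DHC algorithm itself. It assumes a Gaussian-kernel density estimate, links each expression profile to its nearest neighbor of strictly higher density, and then cuts long links to obtain clusters. The function names `attraction_tree` and `cut_clusters` and the parameters `sigma` and `max_edge` are hypothetical and do not come from the paper.

```python
import numpy as np

def attraction_tree(X, sigma=1.0):
    """Link each profile to its nearest neighbor of strictly higher density.

    X: array of shape (n_genes, n_time_points).
    Returns (density, parent, dist); parent[i] == -1 marks a root.
    Simplified illustration only, not the paper's definition.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between expression profiles
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Assumed Gaussian-kernel density (each point also counts itself)
    density = np.exp(-(dist ** 2) / (2.0 * sigma ** 2)).sum(axis=1)
    parent = np.full(len(X), -1)
    for i in range(len(X)):
        higher = np.where(density > density[i])[0]
        if higher.size:
            parent[i] = higher[np.argmin(dist[i, higher])]
    return density, parent, dist

def cut_clusters(parent, dist, max_edge):
    """Drop links longer than max_edge and label the resulting subtrees."""
    kept = np.array([p if p >= 0 and dist[i, p] <= max_edge else -1
                     for i, p in enumerate(parent)])
    labels = np.full(len(parent), -1)
    next_label = 0
    for i in range(len(parent)):
        # Walk up the remaining links to the root of this point's subtree
        root = i
        while kept[root] >= 0:
            root = kept[root]
        if labels[root] < 0:
            labels[root] = next_label
            next_label += 1
        labels[i] = labels[root]
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic groups of 5-point time series (made-up data)
    X = np.vstack([rng.normal(0.0, 0.3, (20, 5)),
                   rng.normal(5.0, 0.3, (20, 5))])
    _, parent, dist = attraction_tree(X, sigma=1.0)
    print(cut_clusters(parent, dist, max_edge=3.0))
```

Because every link points toward strictly higher density, the links cannot form a cycle, so walking up to a root always terminates. The choice of kernel, of `sigma`, and of the cut threshold are all assumptions of this sketch.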

  • ABSTRACT: Recent advances in DNA microarray technology, also known as gene chips, allow measuring the expression of thousands of genes in parallel under multiple experimental conditions [1]. This technology is having a significant impact on genomic studies. Disease diagnosis, drug discovery and toxicological research benefit from microarray technology. Arrays are now widely used in basic biomedical research for mRNA expression profiling and are increasingly being used to explore patterns of gene expression in clinical research.
  • ABSTRACT: Cluster analysis is a fundamental data exploration technique for which a variety of algorithms exist, arising in many different areas of the literature. One of the most popular clustering algorithms is fuzzy c-means, which minimises the weighted within-group sum of squares functional. However, this algorithm has a major drawback in that it will only find clusters of the same simple convex shape. In reality, datasets can contain clusters of a variety of shapes, including complex non-convex structures whose geometry cannot be described analytically. This paper proposes a new data-induced metric to be used in the fuzzy c-means functional that results in an algorithm able to find clusters of unknown arbitrary shapes. The proposed metric is based on density information obtained using the Delaunay triangulation. The resulting optimisation problem is non-smooth, and the alternating optimisation technique used in fuzzy c-means can no longer be applied. Instead, we have made use of a recent method of non-smooth optimisation to solve the problem.
  • ABSTRACT: With the advancement of high-throughput biotechnologies, biological data describing DNA, RNA, protein, and metabolite biomolecules are generated faster than ever. A huge amount of information is being produced and collected. Bioinformatics uses information technology to facilitate the discovery of new knowledge from large sets of various biological data at the molecular level. Within various applications of information technology, clustering has long played an important role. Clustering is an exploratory tool for analyzing large datasets and ...
    Statistical Bioinformatics: A Guide for Life and Biomedical Science Researchers, edited by Jae K. Lee, 06/2010: chapter Clustering: Unsupervised Learning in Large Biological Data; Wiley-Blackwell, ISBN: 9780470567647

