Conference Paper

DHC: A Density-Based Hierarchical Clustering Method for Time Series Gene Expression Data

Dept. of Comput. Sci., State Univ. of New York, Buffalo, NY, USA
DOI: 10.1109/BIBE.2003.1188978 Conference: 3rd IEEE International Symposium on BioInformatics and BioEngineering (BIBE 2003), 10-12 March 2003, Bethesda, MD, USA
Source: DBLP


terns in underlying data, have proved to be useful in finding co-expressed genes. Clustering time series gene expression data is an important task in bioinformatics research and biomedical applications. Recently, some clustering methods have been adapted or proposed. However, some concerns still remain, such as the robustness of the mining methods, as well as the quality and the interpretability of the mining results. In this paper, we tackle the problem of effectively clustering time series gene expression data by proposing the DHC algorithm, a density-based, hierarchical clustering method. We use a density-based approach to identify the clusters so that the clustering results are of high quality and robustness. Moreover, the mining result is in the form of a density tree, which uncovers the embedded clusters in a data set. The inner structures, the borders and the outliers of the clusters can be further investigated using the attraction tree, which is an intermediate result of the mining. With these two trees, the internal structure of the data set can be visualized effectively. Our empirical evaluation using some real-world data sets shows that the method is effective, robust and scalable. It matches the ground truth provided by bioinformatics experts very well in the sample data sets.
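
The abstract does not define how density or attraction is computed, so the following is only a rough sketch of the general idea of an attraction tree, not the DHC algorithm itself. It assumes a simplified density (the number of neighbours within a fixed `radius`) and assumes that each object is attracted to its nearest denser object when that object lies within `radius`. The function names, the `radius` parameter and the toy data are all hypothetical.

```python
import math

def attraction_tree(points, radius):
    """Return (density, parent); parent[i] is None when point i is a root.

    Assumed simplification: density is a neighbour count, and a point is
    attracted to its nearest denser point if that point is within `radius`.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
               for i in range(n)]
    parent = []
    for i in range(n):
        # Candidates are strictly "denser" points; ties in density are broken
        # by lower index so the attraction links can never form a cycle.
        cands = [j for j in range(n)
                 if (density[j], -j) > (density[i], -i)]
        best = min(cands, key=lambda j: dist[i][j]) if cands else None
        if best is not None and dist[i][best] <= radius:
            parent.append(best)
        else:
            parent.append(None)  # local density maximum or isolated point
    return density, parent

def cluster_labels(parent):
    """Label every point with the root of the tree it belongs to."""
    def root(i):
        while parent[i] is not None:
            i = parent[i]
        return i
    return [root(i) for i in range(len(parent))]

# Toy data: two dense groups and one isolated point (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10),
          (5, 20)]
density, parent = attraction_tree(points, radius=1.5)
labels = cluster_labels(parent)
print(labels)
```

In this toy run, each tree corresponds to one dense region, and a root with zero density (the isolated point) can be read as an outlier candidate. A real implementation would need the density and attraction definitions from the paper itself.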

  • Source
    • "For example, Eisen et al. (1998) applied an agglomerative algorithm called UPGMA (Unweighted Pair Group Method with Arithmetic mean) and adopted a method to graphically represent the clustered data set, while (Alon et al., 1999) split the genes through a divisive approach, called the deterministic-annealing algorithm. A variation of the hierarchical clustering algorithm is proposed in Jiang et al. (2003). The authors have applied a Density-based Hierarchical Clustering method (DHC) on two datasets for which the true partition is known. "
    Knowledge Discovery Practices and Emerging Applications of Data Mining: Trends and New Domains, 09/2013: pages 23; Igi Publishing.
  • Source
    • "[1] proposed an algorithm that adopts the idea of the k-means to cluster genes by replacing the distance measure with the interdependence redundancy measure. A variation of the hierarchical clustering algorithm is proposed in [21]. The basic idea is to consider a cluster as a high-dimensional dense area, where data objects are attracted with each other. "
    ABSTRACT: Evaluating clustering results is a fundamental task in microarray data analysis, due to the lack of sufficient biological knowledge to know in advance the true partition of genes. Many quality indexes for gene clustering evaluation have been proposed. A critical issue in this domain is to compare and aggregate quality indexes to select the best clustering algorithm and the optimal parameter setting for a dataset. Furthermore, due to the huge amount of data generated by microarray experiments and the requirement of external resources such as ontologies to compute biological indexes, another critical issue is the performance decline in terms of execution time. Thus, the distributed computation of algorithms and quality indexes becomes essential. Addressing these issues, this paper presents the MicroClAn framework, a distributed system to evaluate and compare clustering algorithms using the most exploited quality indexes. The best solution is selected through a two-step ranking aggregation of the ranks produced by quality indexes. A new index oriented to the biological validation of microarray clustering results is also introduced. Several scheduling strategies integrated in the framework allow tasks to be distributed in the grid environment to optimize the completion time. Experimental results show the effectiveness of our aggregation strategy in identifying the best rank among different clustering algorithms. Moreover, our framework achieves good performance in terms of completion time with few computational resources.
    Journal of Parallel and Distributed Computing 03/2013; 73(3):360-370. DOI:10.1016/j.jpdc.2012.09.008 · 1.18 Impact Factor
  • Source
    • "Gene expression data often contain embedded and intersecting clusters the identification of which is very tough (Jiang et al., 2003). The Density-Based Hierarchical clustering method (DHC) (Jiang et al., 2003) can identify embedded clusters in the dataset even in presence of outliers and can effectively visualize the internal structure of the data set. In (Das et al., 2010), a density based method (DGC) is presented for clustering gene expression data using a two-objective function. "
    ABSTRACT: This paper presents an effective parameter-less graph based clustering technique (GCEPD). GCEPD produces highly coherent clusters in terms of various cluster validity measures. The technique finds highly coherent patterns containing genes with high biological relevance. Experiments with real life datasets establish that the method produces clusters that are significantly better than other similar algorithms in terms of various quality measures.
    International Journal of Bioinformatics Research and Applications 03/2012; 8(1-2):18-37. DOI:10.1504/IJBRA.2012.045974