Conference Paper

A General Approach to Mining Quality Pattern-Based Clusters from Microarray Data.

DOI: 10.1007/11408079_18 Conference: Database Systems for Advanced Applications, 10th International Conference, DASFAA 2005, Beijing, China, April 17-20, 2005, Proceedings
Source: DBLP

ABSTRACT Pattern-based clustering has broad applications in microarray data analysis, customer segmentation, e-business data analysis, and other areas. However, pattern-based clustering often returns a large number of highly overlapping clusters, which makes it hard for users to identify interesting patterns in the mining results. Moreover, there is no general model for pattern-based clustering: different kinds of patterns, or different measures of pattern coherence, may require different algorithms. In this paper, we address both problems by proposing a general quality-driven approach to mining the top-k quality pattern-based clusters. We evaluate this quality-driven approach on real-world microarray data sets. The experimental results show that our method is general, effective, and efficient.
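The abstract does not spell out its quality measure or coherence model, so the following is only a minimal Python sketch under stated assumptions. It borrows the pScore coherence test from the earlier pCluster model (a 2×2 submatrix over genes x, y and conditions a, b is coherent when |(d_xa − d_xb) − (d_ya − d_yb)| ≤ δ), uses a hypothetical quality score equal to the number of genes times the number of conditions, and keeps the k best coherent candidates. The names `pscore`, `is_pcluster`, `quality`, and `top_k_clusters` are illustrative and do not come from the paper.

```python
import heapq
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]                 # rows = genes, columns = conditions
Cluster = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (gene indices, condition indices)


def pscore(m: Matrix, x: int, y: int, a: int, b: int) -> float:
    """pScore of the 2x2 submatrix on genes x, y and conditions a, b (pCluster-style)."""
    return abs((m[x][a] - m[x][b]) - (m[y][a] - m[y][b]))


def is_pcluster(m: Matrix, genes: Sequence[int], conds: Sequence[int], delta: float) -> bool:
    """A (genes, conds) submatrix is coherent if every 2x2 pScore is at most delta."""
    for x, y in combinations(genes, 2):
        for a, b in combinations(conds, 2):
            if pscore(m, x, y, a, b) > delta:
                return False
    return True


def quality(cluster: Cluster) -> int:
    """Hypothetical quality score: cluster size (#genes x #conditions)."""
    genes, conds = cluster
    return len(genes) * len(conds)


def top_k_clusters(m: Matrix, candidates: List[Cluster], delta: float, k: int) -> List[Cluster]:
    """Keep only coherent candidates, then return the k with the highest quality."""
    coherent = [c for c in candidates if is_pcluster(m, c[0], c[1], delta)]
    return heapq.nlargest(k, coherent, key=quality)


if __name__ == "__main__":
    m = [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 3.0, 4.0, 5.0],  # gene 0 shifted by +1: pScore 0 with gene 0 everywhere
        [5.0, 1.0, 9.0, 0.0],  # unrelated pattern
    ]
    candidates = [
        ((0, 1), (0, 1, 2, 3)),
        ((0, 2), (0, 1)),
        ((0, 1, 2), (0, 1)),
        ((0, 1), (0, 2)),
    ]
    print(top_k_clusters(m, candidates, delta=1.0, k=2))
    # → [((0, 1), (0, 1, 2, 3)), ((0, 1), (0, 2))]
```

Candidate enumeration is left out here; in practice candidates would come from a pattern-based clustering algorithm. Replacing `quality` with another scoring function changes the ranking without touching the coherence test, which mirrors the abstract's point that the quality measure should be pluggable.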

  • ABSTRACT: Mining subspace clusters from DNA microarrays could help researchers identify the genes that jointly contribute to a disease, where a subspace cluster is a subset of genes whose expression levels are similar under a subset of conditions. Because a DNA microarray contains far more genes than conditions, previously proposed algorithms that compute the maximum dimension sets (MDSs) for every pair of genes take a long time to mine subspace clusters. In this article, we propose the Large Itemset-Based Clustering (LISC) algorithm for mining subspace clusters. Instead of constructing MDSs for every pair of genes, we construct MDSs only for every pair of conditions. We then transform the task of finding maximal gene sets into the problem of mining large itemsets from the condition-pair MDSs. Since we are interested only in subspace clusters whose gene sets are as large as possible, we focus on gene sets that have reasonably large support values in the condition-pair MDSs. Our simulation results show that the proposed algorithm requires less processing time than previously proposed algorithms that construct gene-pair MDSs.
    Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 06/2009; 16(5):745-68. · 1.69 Impact Factor
  • ABSTRACT: Extensive studies have shown that mining microarray data sets is important in bioinformatics research and biomedical applications. In this paper, we explore a novel type of gene-sample-time microarray data set that records the expression levels of various genes under a set of samples over a series of time points. In particular, we propose mining coherent gene clusters from such data sets. Each cluster contains a subset of genes and a subset of samples such that the genes are coherent on those samples along the time series. Coherent gene clusters may identify the samples corresponding to particular phenotypes (e.g., diseases) and suggest candidate genes correlated with those phenotypes. We present two efficient algorithms, Sample-Gene Search and Gene-Sample Search, to mine the complete set of coherent gene clusters. We empirically evaluate the performance of our approaches on both a real microarray data set and synthetic data sets. The results show that our approaches are both efficient and effective at finding meaningful coherent gene clusters.
    Knowledge and Information Systems 01/2007; 13:305-335. · 2.23 Impact Factor
  • ABSTRACT: Pattern-based clustering, which captures the similarity of the patterns exhibited by objects in a subset of dimensions, has broad applications in DNA microarray data analysis, customer segmentation, e-business data analysis, and other areas. However, pattern-based clustering often returns a large number of highly overlapping clusters, which makes it hard for users to identify interesting patterns in the huge mining results. Moreover, there is no general measure for evaluating the quality of the clusters that pattern-based clustering produces. In this paper, we discuss the factors that cause high overlap, perform error analysis and pattern weighting, and propose qScore as a key parameter for evaluating cluster quality. An algorithm based on qScore is presented to address the high-overlap problem and obtain better-quality clustering results.
    Eighth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2011, 26-28 July 2011, Shanghai, China; 01/2011
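The LISC idea in the first related abstract above (build MDSs per condition pair, then mine large itemsets over them) can be sketched in Python as follows. This is not the authors' implementation: it assumes that a condition-pair MDS is a maximal group of at least two genes whose expression differences between the two conditions lie within a window of width `delta`, treats each such group as one transaction, and finds gene sets with high support using a plain level-wise (Apriori-style) search. The function names and the `delta` and `min_support` parameters are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = genes, columns = conditions


def condition_pair_sets(m: Matrix, delta: float) -> List[FrozenSet[int]]:
    """For every condition pair (a, b), emit each maximal group of at least two genes
    whose differences m[g][a] - m[g][b] span at most delta (assumed MDS definition)."""
    n_genes, n_conds = len(m), len(m[0])
    transactions: List[FrozenSet[int]] = []
    for a, b in combinations(range(n_conds), 2):
        diffs = sorted((m[g][a] - m[g][b], g) for g in range(n_genes))
        j, prev_j = 0, -1
        for i in range(n_genes):
            j = max(j, i)
            # Extend the window [i, j] while the spread of differences stays within delta.
            while j + 1 < n_genes and diffs[j + 1][0] - diffs[i][0] <= delta:
                j += 1
            # The window is maximal only if it reaches further right than the previous one.
            if j > prev_j and j > i:
                transactions.append(frozenset(g for _, g in diffs[i : j + 1]))
            prev_j = j
    return transactions


def frequent_gene_sets(
    transactions: List[FrozenSet[int]], min_support: int, min_size: int = 2
) -> Dict[FrozenSet[int], int]:
    """Level-wise search for gene sets contained in at least min_support transactions."""
    level = [frozenset([g]) for g in sorted({g for t in transactions for g in t})]
    result: Dict[FrozenSet[int], int] = {}
    size = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = [c for c, s in counts.items() if s >= min_support]
        if size >= min_size:
            result.update((c, counts[c]) for c in frequent)
        size += 1
        # Join frequent sets that differ in exactly one gene to form the next candidates.
        level = list({x | y for x, y in combinations(frequent, 2) if len(x | y) == size})
    return result


if __name__ == "__main__":
    m = [
        [1.0, 2.0, 3.0],
        [2.0, 3.0, 4.0],
        [3.0, 4.0, 5.0],
        [9.0, 0.0, 4.0],  # gene 3 does not follow the shared pattern
    ]
    ts = condition_pair_sets(m, delta=0.5)
    found = frequent_gene_sets(ts, min_support=3)
    print(sorted((sorted(s), n) for s, n in found.items()))
    # → [([0, 1], 3), ([0, 1, 2], 3), ([0, 2], 3), ([1, 2], 3)]
```

The point of the design, as the abstract describes it, is that the number of condition pairs is small compared with the number of gene pairs, so the transaction set stays small; the candidate-join step here skips Apriori's subset-pruning optimization, which affects speed but not the result.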
