Clustering methods for the analysis of DNA microarray data

Source: CiteSeer

ABSTRACT: It is now possible to simultaneously measure the expression of thousands of genes during cellular differentiation and response, through the use of DNA microarrays. A major statistical task is to understand the structure in the data that arise from this technology. In this paper we review various methods of clustering, and illustrate how they can be used to arrange both the genes and cell lines from a set of DNA microarray experiments. The methods discussed are global clustering techniques including hierarchical, K-means, and block clustering, and tree-structured vector quantization. Finally, we propose a new method for identifying structure in subsets of both genes and cell lines that are potentially obscured by the global clustering approaches.

1 Introduction

DNA microarrays and other high-throughput methods for analyzing complex nucleic acid samples make it now possible to measure rapidly, efficiently and accurately the levels of virtually all genes expressed in a biologi...

  • ABSTRACT: In this paper, we present a new statistical pattern recognition method for classifying cytotoxic cellular responses to toxic agents. The advantage of the proposed method is that it quickly assesses the toxicity level of an unclassified toxic agent on human health by bringing cytotoxic cellular responses with similar patterns (mode of action, MoOA) into the same class. The proposed method is a model-based hierarchical classification approach incorporating principal component analysis (PCA) and functional data analysis (FDA). The cytotoxic cell responses are represented by multi-concentration time-dependent cellular response profiles (TCRPs), which are dynamically recorded using the xCELLigence real-time cell analysis high-throughput (RTCA HT) system. The classification results obtained using our algorithm show satisfactory discrimination and are validated against biological facts by examining common chemical mechanisms of action in treated human hepatocellular carcinoma cells (HepG2).
    Computational biology and chemistry 02/2014; 49C:23-35. · 1.37 Impact Factor
  • ABSTRACT: Analyzing microarray data represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit similar behavior under the conditions tested. Biclustering emerges as an improvement on classical clustering since it relaxes the constraints for grouping, allowing genes to be evaluated under only a subset of the conditions rather than all of them. However, this technique is not appropriate for the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at several time points. We present the TriGen algorithm, a genetic algorithm that finds triclusters of gene expression that take into account the experimental conditions and the time points simultaneously. We have used TriGen to mine synthetic datasets as well as datasets from yeast (Saccharomyces cerevisiae) cell cycle and human inflammation and host response to injury experiments. TriGen has proved capable of extracting groups of genes with similar patterns in subsets of conditions and times, and these groups have been shown to be related in terms of their functional annotations extracted from the Gene Ontology.
    Neurocomputing 05/2014; · 2.01 Impact Factor