Article

A new clustering approach for learning transcriptional modules

DISCO - Department of Computer Science, Systems and Communication, University of Milano Bicocca, Consorzio Milano Ricerche, Milan, Italy.
International Journal of Data Mining and Bioinformatics (Impact Factor: 0.66). 01/2012; 6(3):304-23. DOI: 10.1504/IJDMB.2012.049248
Source: PubMed

ABSTRACT In modern biology, there has been an explosion of genomic data from multiple sources, such as measurements of RNA levels, gene sequences, annotations and interaction data. These heterogeneous data provide important information that should be integrated through suitable learning methods aimed at elucidating regulatory networks. We propose an iterative relational clustering procedure for finding modules of co-regulated genes. The approach integrates information on known transcription factor (TF)-gene interactions with gene expression data to find clusters of genes that share a common regulatory program. Results obtained on two well-known gene expression data sets from Saccharomyces cerevisiae are presented.
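The abstract does not spell out the clustering procedure itself. As a rough illustration only, the Python sketch below shows one way an iterative clustering can couple the two data sources the abstract names: each gene has an expression profile and a 0/1 vector of known TF bindings. Every cluster is summarized by a mean expression profile and a "regulatory program" (the TFs bound by a majority of its members). Genes are then reassigned to minimize squared expression distance plus `lam` times the Hamming distance to the program, and the loop repeats until assignments stop changing. This is not the authors' method. The cost function, the majority-vote program, the weight `lam`, and all function names are assumptions made for this sketch.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Binary = List[List[int]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of TFs on which two 0/1 binding vectors disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


def summarize(expr: Matrix, binding: Binary, labels: List[int], k: int,
              rng: random.Random) -> Tuple[Matrix, Binary]:
    """Per cluster: mean expression profile and majority-vote TF program."""
    n = len(expr)
    centroids: Matrix = []
    programs: Binary = []
    for c in range(k):
        members = [i for i in range(n) if labels[i] == c]
        if not members:  # empty cluster: reseed it with one random gene
            members = [rng.randrange(n)]
        size = len(members)
        centroids.append([sum(expr[i][j] for i in members) / size
                          for j in range(len(expr[0]))])
        # Hypothetical rule: a TF is in the program if it binds > half the members.
        programs.append([1 if 2 * sum(binding[i][t] for i in members) > size else 0
                         for t in range(len(binding[0]))])
    return centroids, programs


def relational_cluster(expr: Matrix, binding: Binary, k: int, lam: float = 1.0,
                       max_iter: int = 50, seed: int = 0) -> Tuple[List[int], Binary]:
    """Hypothetical iterative clustering on expression plus TF-binding data.

    Returns a cluster label per gene and a 0/1 TF program per cluster.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(len(expr))]
    for _ in range(max_iter):
        centroids, programs = summarize(expr, binding, labels, k, rng)
        # Reassign each gene to the cluster with the lowest combined cost.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(expr[i], centroids[c])
                + lam * hamming(binding[i], programs[c]))
            for i in range(len(expr))
        ]
        if new_labels == labels:  # assignments are stable: converged
            return labels, programs
        labels = new_labels
    return labels, summarize(expr, binding, labels, k, rng)[1]
```

For example, with three genes expressed highly and bound by a first TF, and three expressed lowly and bound by a second TF, `relational_cluster(expr, binding, k=2)` should place each triple in its own cluster, with programs `[1, 0]` and `[0, 1]`. Raising `lam` makes shared TF binding weigh more heavily against expression similarity.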
