Article

Integrating quality-based clustering of microarray data with Gibbs sampling for the discovery of regulatory motifs

07/2002
Source: CiteSeer

ABSTRACT In microarray experiments, genes exhibiting a similar expression profile are potentially coregulated. Clustering identifies such groups of coexpressed genes, whose upstream regions can then be searched for putative regulatory elements. We present two algorithms and an interactive web-based user interface that integrate cluster analysis and motif finding for the analysis of microarray data. Starting from the expression data, our adaptive quality-based clustering algorithm defines groups of tightly coexpressed genes. The upstream region of each gene is then retrieved based on its accession number and gene name. Once the upstream regions are identified, the sequences are analyzed with a Gibbs sampling motif finder to identify over-represented motifs. Our implementation (called Motif Sampler) allows the use of higher-order models for the sequence background. This methodology can be used through our INCLUSive web interface at the following URL: http://www.esat.kuleuven.ac.be/~dna/BioI/Software.html Keywords: microarrays, clustering, motif finding, Gibbs sampling.
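To make the idea of a higher-order model for the sequence background concrete, the following is a minimal sketch, not the Motif Sampler implementation. It assumes the background is a Markov chain of order `order` estimated from a set of background sequences with a simple pseudocount; the function names, the pseudocount, and the training data are all hypothetical choices made for illustration.

```python
from collections import defaultdict
import math

ALPHABET = "ACGT"


def train_background(sequences, order=3, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from background sequences.

    Returns a function prob(context, base). The pseudocount is an
    illustrative smoothing choice, not taken from the article.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            context = seq[i - order:i]
            counts[context][seq[i]] += 1.0

    def prob(context, base):
        ctx = counts.get(context, {})  # unseen contexts fall back to uniform
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob


def background_log_likelihood(segment, context, prob):
    """Log-probability of `segment` given the `context` bases preceding it."""
    order = len(context)
    history = context
    logp = 0.0
    for base in segment:
        logp += math.log(prob(history, base))
        # Slide the context window forward by one base.
        history = (history + base)[-order:] if order else ""
    return logp


# Toy usage: an order-1 model trained on a strictly alternating sequence.
prob = train_background(["ACACACAC"], order=1)
score = background_log_likelihood("CA", "A", prob)
```

In a motif finder, a score of this kind would typically be compared against the probability of the same segment under the motif model, so that segments poorly explained by the background stand out.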

  • Source
    Article: The value of prior knowledge in discovering motifs with MEME.
    ABSTRACT: MEME is a tool for discovering motifs in sets of protein or DNA sequences. This paper describes several extensions to MEME which increase its ability to find motifs in a totally unsupervised fashion, but which also allow it to benefit when prior knowledge is available. When no background knowledge is asserted, MEME obtains increased robustness from a method for determining motif widths automatically, and from probabilistic models that allow motifs to be absent in some input sequences. On the other hand, MEME can exploit prior knowledge about a motif being present in all input sequences, about the length of a motif and whether it is a palindrome, and (using Dirichlet mixtures) about expected patterns in individual motif positions. Extensive experiments are reported which support the claim that MEME benefits from, but does not require, background knowledge. The experiments use seven previously studied DNA and protein sequence families and 75 of the protein families documented in the Prosite database of sites and patterns, Release 11.1.
    Proceedings / ... International Conference on Intelligent Systems for Molecular Biology; ISMB. International Conference on Intelligent Systems for Molecular Biology 02/1995; 3:21-9.
  • Source
    Article: A genome-wide transcriptional analysis of the mitotic cell cycle.
    ABSTRACT: Progression through the eukaryotic cell cycle is known to be both regulated and accompanied by periodic fluctuation in the expression levels of numerous genes. We report here the genome-wide characterization of mRNA transcript levels during the cell cycle of the budding yeast S. cerevisiae. Cell cycle-dependent periodicity was found for 416 of the 6220 monitored transcripts. More than 25% of the 416 genes were found directly adjacent to other genes in the genome that displayed induction in the same cell cycle phase, suggesting a mechanism for local chromosomal organization in global mRNA regulation. More than 60% of the characterized genes that displayed mRNA fluctuation have already been implicated in cell cycle period-specific biological roles. Because more than 20% of human proteins display significant homology to yeast proteins, these results also link a range of human genes to cell cycle period-specific biological functions.
    Molecular Cell 08/1998; 2(1):65-73. · 14.18 Impact Factor
  • Source
    Article: Adaptive quality-based clustering of gene expression profiles.
    ABSTRACT: Microarray experiments generate a considerable amount of data which, when analyzed properly, helps us gain a huge amount of biologically relevant information about global cellular behaviour. Clustering (grouping genes with similar expression profiles) is one of the first steps in data analysis of high-throughput expression measurements. A number of clustering algorithms have proved useful to make sense of such data. These classical algorithms, though useful, suffer from several drawbacks (e.g. they require the predefinition of arbitrary parameters like the number of clusters; they force every gene into a cluster despite a low correlation with other cluster members). In the following we describe a novel adaptive quality-based clustering algorithm that tackles some of these drawbacks. We propose a heuristic iterative two-step algorithm: First, we find in the high-dimensional representation of the data a sphere where the "density" of expression profiles is locally maximal (based on a preliminary estimate of the radius of the cluster; this is the quality-based approach). In a second step, we derive an optimal radius of the cluster (adaptive approach) so that only the significantly coexpressed genes are included in the cluster. This estimation is achieved by fitting a model to the data using an EM-algorithm. By inferring the radius from the data itself, the biologist is freed from finding an optimal value for this radius by trial-and-error. The computational complexity of this method is approximately linear in the number of gene expression profiles in the data set. Finally, our method is successfully validated using existing data sets. http://www.esat.kuleuven.ac.be/~thijs/Work/Clustering.html
    Bioinformatics 06/2002; 18(5):735-46. · 5.47 Impact Factor
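
The first step of the clustering abstract above (locating a sphere of fixed radius where the density of expression profiles is locally maximal) can be illustrated with a rough sketch. This is not the authors' procedure: it swaps in a plain flat-kernel mean-shift iteration, takes the preliminary radius as given, and omits the second, EM-based step that adapts the radius. The function names, the starting point, and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def locate_dense_sphere(profiles, start, radius, max_iter=100, tol=1e-9):
    """Move a sphere of fixed radius towards a local density maximum.

    Each iteration re-centres the sphere on the mean of the profiles it
    currently contains (flat-kernel mean shift, an illustrative stand-in).
    Returns the final centre and the profiles inside the sphere.
    """
    center = list(start)
    for _ in range(max_iter):
        members = [p for p in profiles if euclidean(p, center) <= radius]
        if not members:
            break  # empty sphere: nothing to re-centre on
        new_center = [sum(col) / len(members) for col in zip(*members)]
        shift = euclidean(new_center, center)
        center = new_center
        if shift < tol:
            break  # centre has stopped moving
    members = [p for p in profiles if euclidean(p, center) <= radius]
    return center, members


# Toy usage: three tightly grouped profiles and one outlier.
profiles = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
center, members = locate_dense_sphere(profiles, start=(0.2, 0.2), radius=1.0)
```

Because the radius is fixed here, the outlier is simply left unclustered; in the method described in the abstract, the radius itself would subsequently be re-estimated from the data.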


Keywords

accession number
adaptive quality-based clustering algorithm
clustering
define groups
following URL
higher-order models
INCLUSive web interface
integrate cluster analysis
interactive web-based user interface
microarray data
microarray experiments
microarrays
Motif Sampler
over-represented motifs
putative regulatory elements
sequence background
sequences
similar expression profile
upstream region
upstream regions