Article

The XXmotif web server for eXhaustive, weight matriX-based motif discovery in nucleotide sequences.

Gene Center, Department of Biochemistry, and Center for Integrated Protein Science Munich (CIPSM), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 25, 81377 Munich, Germany.
Nucleic Acids Research (Impact Factor: 8.28). 06/2012; 40(Web Server issue):W104-9. DOI: 10.1093/nar/gks602
Source: PubMed

ABSTRACT The discovery of regulatory motifs enriched in sets of DNA or RNA sequences is fundamental to the analysis of a great variety of functional genomics experiments. These motifs usually represent binding sites of proteins or non-coding RNAs, which are best described by position weight matrices (PWMs). We have recently developed XXmotif, a de novo motif discovery method that is able to directly optimize the statistical significance of PWMs. XXmotif can also score conservation and positional clustering of motifs. The XXmotif server provides (i) a list of significantly overrepresented motif PWMs with web logos and E-values; (ii) a graph with color-coded boxes indicating the positions of selected motifs in the input sequences; (iii) a histogram of the overall positional distribution for selected motifs and (iv) a page for each motif with all significant motif occurrences, their P-values for enrichment, conservation and localization, their sequence contexts and coordinates. Free access: http://xxmotif.genzentrum.lmu.de.
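For readers unfamiliar with position weight matrices, the following is a minimal sketch of how a PWM describes a binding site and how it can be used to score a sequence. It is not XXmotif's algorithm, which optimizes the statistical significance of PWMs directly. The count matrix, the uniform background, the pseudocount and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy counts for a 4-bp motif with consensus GATA
# (one dict per motif position; values are base counts from aligned sites).
COUNTS = [
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 1, "T": 0},
]

# Assumed uniform background model; real analyses usually estimate this.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def counts_to_pwm(counts, background, pseudocount=1.0):
    """Convert base counts to log2-odds weights against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in column
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position weights for a window as wide as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (start, score) of the best-scoring window, or None if the
    sequence is shorter than the motif. Assumes uppercase A/C/G/T only."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = score_window(pwm, sequence[start:start + width])
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    pwm = counts_to_pwm(COUNTS, BACKGROUND)
    print(best_hit(pwm, "CCGATACC"))
```

A positive score means the window resembles the motif more than the background; turning such scores into P-values or E-values requires a statistical model that this sketch does not attempt.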

  • ABSTRACT: Statistics in ranked lists is useful in analysing molecular biology measurement data, such as differential expression, resulting in ranked lists of genes, or ChIP-Seq, which yields ranked lists of genomic sequences. State-of-the-art methods study fixed motifs in ranked lists of sequences. More flexible models such as position weight matrix (PWM) motifs are more challenging in this context, partially because it is not clear how to avoid the use of arbitrary thresholds. To assess the enrichment of a PWM motif in a ranked list we use a second ranking on the same set of elements induced by the PWM. Possible orders of one ranked list relative to another can be modelled as permutations. Due to sample space complexity, it is difficult to accurately characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top parts of two uniformly and independently drawn permutations. We further demonstrate advantages of this approach using our software implementation, mmHG-Finder, which is publicly available, to study PWM motifs in several datasets. In addition to validating known motifs, we found GC-rich strings to be enriched amongst the promoter sequences of long non-coding RNAs that are specifically expressed in thyroid and prostate tissue samples and observed a statistical association with tissue-specific CpG hypo-methylation. We develop tight bounds that can be calculated in polynomial time. We demonstrate utility of mutual enrichment in motif search and assess performance for synthetic and biological datasets. We suggest that thyroid and prostate-specific long non-coding RNAs are regulated by transcription factors that bind GC-rich sequences, such as EGR1, SP1 and E2F3. We further suggest that this regulation is associated with DNA hypo-methylation.
    Algorithms for Molecular Biology 04/2014; 9(1):11. · 1.61 Impact Factor
  • ABSTRACT: MEME-ChIP is a web-based tool for analyzing motifs in large DNA or RNA data sets. It can analyze peak regions identified by ChIP-seq, cross-linking sites identified by CLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP performs two complementary types of de novo motif discovery: weight matrix-based discovery for high accuracy; and word-based discovery for high sensitivity. Motif enrichment analysis using DNA or RNA motifs from human, mouse, worm, fly and other model organisms provides even greater sensitivity. MEME-ChIP's interactive HTML output groups and aligns significant motifs to ease interpretation. This protocol takes less than 3 h, and it provides motif discovery approaches that are distinct and complementary to other online methods.
    Nature Protocols 06/2014; 9(6):1428-50. · 7.96 Impact Factor
  • ABSTRACT: The TGF-beta signaling pathway is a fundamental pathway in the living cell, which plays a key role in many central cellular processes. The complex and sometimes contradicting mechanisms by which TGF-beta yields phenotypic effects are not yet completely understood. In this study we investigated and compared the transcriptional response profile of TGF-beta1 stimulation in different cell types. For this purpose, extensive experiments were performed and time-course microarray data were generated in human and mouse parenchymal liver cells, human mesenchymal stromal cells and mouse hematopoietic progenitor cells at different time points. We applied a panel of bioinformatics methods to our data to uncover common patterns in the dynamic gene expression response in the respective cells.
    BMC Systems Biology 05/2014; 8(1):55. · 2.98 Impact Factor
