The XXmotif web server for eXhaustive, weight matriX-based motif discovery in nucleotide sequences

Gene Center, Department of Biochemistry, and Center for Integrated Protein Science Munich (CIPSM), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 25, 81377 Munich, Germany.
Nucleic Acids Research (Impact Factor: 9.11). 06/2012; 40(Web Server issue):W104-9. DOI: 10.1093/nar/gks602


The discovery of regulatory motifs enriched in sets of DNA or RNA sequences is fundamental to the analysis of a great variety
of functional genomics experiments. These motifs usually represent binding sites of proteins or non-coding RNAs, which are
best described by position weight matrices (PWMs). We have recently developed XXmotif, a de novo motif discovery method that is able to directly optimize the statistical significance of PWMs. XXmotif can also score conservation
and positional clustering of motifs. The XXmotif server provides (i) a list of significantly overrepresented motif PWMs with
web logos and E-values; (ii) a graph with color-coded boxes indicating the positions of selected motifs in the input sequences; (iii) a histogram
of the overall positional distribution for selected motifs and (iv) a page for each motif with all significant motif occurrences,
their P-values for enrichment, conservation and localization, their sequence contexts and coordinates. Free access:
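As a rough illustration of what a position weight matrix is, the sketch below builds a log-odds PWM from per-position base counts and scans a sequence for its best-scoring window. This is not XXmotif's algorithm, which according to the abstract optimizes the statistical significance of PWMs directly. The function names, the pseudocount, the uniform background, and the example counts are all assumptions made for illustration, and the code assumes sequences contain only A, C, G, and T.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, background=None, pseudocount=1.0):
    """Convert per-position base counts into a log2-odds PWM.

    counts: list of dicts mapping each base to its observed count.
    background: assumed base frequencies (uniform if not given).
    """
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm

def best_match(pwm, sequence):
    """Return (score, offset) of the highest-scoring window, or None."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(pwm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Hypothetical counts for a 4-bp motif resembling "TATA" (illustrative only).
counts = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 0, "T": 1},
]
pwm = log_odds_pwm(counts)
score, offset = best_match(pwm, "GGCGTATAGCCG")
```

A positive log-odds score means the window is more likely under the motif model than under the background. Turning such scores into P-values or E-values, as the server reports, needs a statistical model that this sketch does not attempt.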

    • "De novo motif discovery in Ttk69 regions was performed with XXmotif (Luehr et al., 2012) using 200 bp repeat-masked regions centered on the ChIP peak. TOMTOM (Gupta et al., 2007) was used (default parameters, but AT/GC content was set to 0.3/0.2) to match discovered motifs to known Drosophila PWM databases (FlyFactorSurvey, FlyRegv2, idmmpmm2009 and dmmpmm2009) with P-value ≤0.05. "
    ABSTRACT: Molecular models of cell fate specification typically focus on the activation of specific lineage programs. However, the concurrent repression of unwanted transcriptional networks is also essential to stabilize certain cellular identities, as shown in a number of diverse systems and phyla. Here, we demonstrate that this dual requirement also holds true in the context of Drosophila myogenesis. By integrating genetics and genomics, we identified a new role for the pleiotropic transcriptional repressor Tramtrack69 in myoblast specification. Drosophila muscles are formed through the fusion of two discrete cell types: founder cells (FCs) and fusion-competent myoblasts (FCMs). When tramtrack69 is removed, FCMs appear to adopt an alternative muscle FC-like fate. Conversely, ectopic expression of this repressor phenocopies muscle defects seen in loss-of-function lame duck mutants, a transcription factor specific to FCMs. This occurs through Tramtrack69-mediated repression in FCMs, whereas Lame duck activates a largely distinct transcriptional program in the same cells. Lineage-specific factors are therefore not sufficient to maintain FCM identity. Instead, their identity appears more plastic, requiring the combination of instructive repressive and activating programs to stabilize cell fate.
    Full-text · Article · Jul 2014 · Journal of Cell Science
    • "We analyzed DE genes with respect to overrepresented sequence motifs in their promoter regions with the XXmotif tool [39]. Significant motifs were then compared to known position weight matrices (TRANSFAC) of transcription factors (TFs) via STAMP [40]. "
    ABSTRACT: Background: The TGF-β signaling pathway is a fundamental pathway in the living cell, which plays a key role in many central cellular processes. The complex and sometimes contradicting mechanisms by which TGF-β yields phenotypic effects are not yet completely understood. In this study we investigated and compared the transcriptional response profile of TGF-β1 stimulation in different cell types. For this purpose, extensive experiments are performed and time-course microarray data are generated in human and mouse parenchymal liver cells, human mesenchymal stromal cells and mouse hematopoietic progenitor cells at different time points. We applied a panel of bioinformatics methods on our data to uncover common patterns in the dynamic gene expression response in respective cells. Results: Our analysis revealed a quite variable and multifaceted transcriptional response profile of TGF-β1 stimulation, which goes far beyond the well-characterized classical TGF-β1 signaling pathway. Nonetheless, we could identify several commonly affected processes and signaling pathways across cell types and species. In addition our analysis suggested an important role of the transcription factor EGR1, which appeared to have a conserved influence across cell-types and species. Validation via an independent dataset on A549 lung adenocarcinoma cells largely confirmed our findings. Network analysis suggested explanations, how TGF-β1 stimulation could lead to the observed effects. Conclusions: The analysis of dynamical transcriptional response to TGF-β treatment experiments in different human and murine cell systems revealed commonly affected biological processes and pathways, which could be linked to TGF-β1 via network analysis. This helps to gain insights about TGF-β pathway activities in these cell systems and its conserved interactions between the species and tissue types.
    Full-text · Article · May 2014 · BMC Systems Biology
    • "Then, variants of a predefined IUPAC motif were planted at the top 64 sequences of the dataset. We compared the motifs found by mmHG-Finder to those obtained by using three other methods: the standard MEME program [28], DREME [29], and XXmotif [30]. Selected results of this comparison are summarized in Figure 2, and the full output is shown in Additional file 1. Evidently, mmHG-Finder outperformed all the other three tools on the synthetic examples, which contained degenerate motifs. "
    ABSTRACT: Statistics in ranked lists is useful in analysing molecular biology measurement data, such as differential expression, resulting in ranked lists of genes, or ChIP-Seq, which yields ranked lists of genomic sequences. State of the art methods study fixed motifs in ranked lists of sequences. More flexible models such as position weight matrix (PWM) motifs are more challenging in this context, partially because it is not clear how to avoid the use of arbitrary thresholds. To assess the enrichment of a PWM motif in a ranked list we use a second ranking on the same set of elements induced by the PWM. Possible orders of one ranked list relative to another can be modelled as permutations. Due to sample space complexity, it is difficult to accurately characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top parts of two uniformly and independently drawn permutations. We further demonstrate advantages of this approach using our software implementation, mmHG-Finder, which is publicly available, to study PWM motifs in several datasets. In addition to validating known motifs, we found GC-rich strings to be enriched amongst the promoter sequences of long non-coding RNAs that are specifically expressed in thyroid and prostate tissue samples and observed a statistical association with tissue specific CpG hypo-methylation. We develop tight bounds that can be calculated in polynomial time. We demonstrate utility of mutual enrichment in motif search and assess performance for synthetic and biological datasets. We suggest that thyroid and prostate-specific long non-coding RNAs are regulated by transcription factors that bind GC-rich sequences, such as EGR1, SP1 and E2F3. We further suggest that this regulation is associated with DNA hypo-methylation.
    Full-text · Article · Apr 2014 · Algorithms for Molecular Biology
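
The mmHG-Finder excerpt above describes planting variants of a predefined IUPAC motif in the top 64 sequences of a dataset. A minimal sketch of such a benchmark setup follows. The excerpt does not say how the variants were generated or where they were placed, so the uniform sampling of each degenerate position, the overwrite-in-place planting, the example motif `TGASTCA`, the sequence length, and all function names are assumptions.

```python
import random
import re

# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def sample_variant(motif, rng):
    """Draw one concrete sequence compatible with an IUPAC motif."""
    return "".join(rng.choice(IUPAC[symbol]) for symbol in motif)

def plant(sequences, motif, top_n, rng):
    """Overwrite a random window in each of the first top_n sequences."""
    planted = []
    for index, seq in enumerate(sequences):
        if index < top_n and len(seq) >= len(motif):
            start = rng.randrange(len(seq) - len(motif) + 1)
            variant = sample_variant(motif, rng)
            seq = seq[:start] + variant + seq[start + len(motif):]
        planted.append(seq)
    return planted

def iupac_regex(motif):
    """Compile an IUPAC motif into an equivalent regular expression."""
    return re.compile("".join("[" + IUPAC[symbol] + "]" for symbol in motif))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
background = ["".join(rng.choice("ACGT") for _ in range(50)) for _ in range(100)]
planted = plant(background, "TGASTCA", top_n=64, rng=rng)  # hypothetical motif
pattern = iupac_regex("TGASTCA")
hits = sum(1 for seq in planted[:64] if pattern.search(seq))
```

Every one of the first 64 sequences then contains at least one match to the motif, while the remaining sequences are left unchanged and match only by chance.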