Article

Assessing computational tools for the discovery of transcription factor binding sites.

Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, Washington 98195-2350, USA.
Nature Biotechnology (Impact Factor: 39.08). 02/2005; 23(1):137-44. DOI: 10.1038/nbt1053
Source: PubMed

ABSTRACT: The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.
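An assessment of this kind depends on how a tool's predicted binding sites are scored against known ones. The following is a minimal sketch of standard nucleotide-level accuracy statistics: sensitivity (nSn), positive predictive value (nPPV), performance coefficient (nPC) and correlation coefficient (nCC). It assumes that sites are given as half-open (sequence_id, start, end) intervals lying inside their sequences. The input format and the function names `covered` and `nucleotide_stats` are illustrative assumptions, not the benchmark's own code.

```python
import math
from typing import Dict, Iterable, Set, Tuple

# Assumed site representation: (sequence_id, start, end), end exclusive.
Site = Tuple[str, int, int]


def covered(sites: Iterable[Site]) -> Set[Tuple[str, int]]:
    """Expand site intervals into the set of (sequence_id, position) nucleotides they cover."""
    return {(seq, pos) for seq, start, end in sites for pos in range(start, end)}


def nucleotide_stats(known: Iterable[Site], predicted: Iterable[Site],
                     seq_lengths: Dict[str, int]) -> Dict[str, float]:
    """Score predicted sites against known sites nucleotide by nucleotide.

    Assumes every site lies within its sequence, so the true-negative count
    is the total sequence length minus the other three categories.
    """
    k = covered(known)
    p = covered(predicted)
    tp = len(k & p)   # nucleotides in both a known and a predicted site
    fp = len(p - k)   # predicted but not known
    fn = len(k - p)   # known but missed
    tn = sum(seq_lengths.values()) - tp - fp - fn

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero for degenerate inputs.
        return num / den if den else 0.0

    return {
        "nSn": ratio(tp, tp + fn),
        "nPPV": ratio(tp, tp + fp),
        "nPC": ratio(tp, tp + fn + fp),
        "nCC": ratio(tp * tn - fn * fp,
                     math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))),
    }


if __name__ == "__main__":
    stats = nucleotide_stats(known=[("s1", 10, 20)],
                             predicted=[("s1", 15, 25)],
                             seq_lengths={"s1": 100})
    print(stats)
```

In the usage example, the known site (positions 10 to 19) and the prediction (positions 15 to 24) in a 100-nt sequence overlap by 5 nucleotides, so nSn and nPPV are both 0.5. Counting at the nucleotide level gives credit for partial overlaps. A site-level measure would instead ask only whether each known site is hit.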

Available from: Graziano Pesole, Jul 05, 2015
  • ABSTRACT: In Arabidopsis thaliana, the root apex is protected from aluminum (Al) rhizotoxicity by the excretion of malate, an Al chelator, through Al-activated malate transporter 1 (AtALMT1). AtALMT1 expression is fundamentally regulated by the zinc finger protein STOP1 (SENSITIVE TO PROTON RHIZOTOXICITY 1), but other transcription factors also contribute, enabling Al-inducible expression with a broad dynamic range. In this study, we characterized multiple cis-elements in the AtALMT1 promoter that interact with transcription factors. In planta complementation assays of AtALMT1 driven by 5'-truncated promoters of different lengths showed that the promoter region between -540 and 0 (the first ATG) complemented the Al-sensitive phenotype of atalmt1 and thus contains the cis-elements essential for AtALMT1 expression and Al tolerance. Analysis of overrepresented octamers identified eight regions within this promoter segment that contain potential cis-elements involved in Al induction and STOP1 regulation. Mutation of a position around -297 from the first ATG completely abolished AtALMT1 expression and the Al response. In vitro binding assays showed that this region contains the STOP1 binding site, which is recognized by four zinc finger domains of the protein. Other positions were characterized as cis-elements through which expression is regulated by repressors, activators, and a transcription factor that determines root-tip expression of AtALMT1. Based on the consensus sequences of known cis-elements, we identified CAMTA2 as an activator of AtALMT1 expression. Al-inducible expression of AtALMT1 shifted transcription start sites, increasing the abundance of transcripts with a shortened 5' untranslated region. The present analyses identified multiple mechanisms that regulate AtALMT1 expression. Copyright © 2015, American Society of Plant Biologists.
  • ABSTRACT: Sequence discovery tools play a central role in several fields of computational biology. In transcription factor binding studies, most existing motif-finding algorithms are computationally demanding and may not scale to the increasingly large datasets produced by modern high-throughput sequencing technologies. We present FastMotif, a new motif discovery algorithm built on a recent machine learning technique known as the method of moments. Because it is based on spectral decompositions, our method is robust to model misspecification and is not prone to locally optimal solutions. The result is an extremely fast algorithm designed for the analysis of big sequencing data. On HT-SELEX data, FastMotif extracts motif profiles that match those computed by various state-of-the-art algorithms, but an order of magnitude faster. We provide a theoretical and numerical analysis of the algorithm's robustness and discuss its sensitivity to the free parameters. The MATLAB code of FastMotif is available from http://lcsb-portal.uni.lu/bioinformatics. © The Author (2015). Published by Oxford University Press. All rights reserved.
    Bioinformatics 07/2014; DOI:10.1093/bioinformatics/btv208 · 4.62 Impact Factor
  • ABSTRACT: Transcriptional regulation plays an important role in establishing gene expression profiles during development or in response to (a)biotic stimuli. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity, and identifying individual TFBSs in genome sequences is a major goal in inferring regulatory networks. We have developed a phylogenetic footprinting approach for the identification of conserved noncoding sequences (CNSs) across 12 dicot plants. Whereas both alignment- and non-alignment-based techniques have been applied to identify functional motifs in a multispecies context, our method accounts for incomplete motif conservation as well as high sequence divergence between related species. We identified 69,361 footprints associated with 17,895 genes. By integrating known TFBSs obtained from the literature and experimental studies, we used the CNSs to compile a gene regulatory network in Arabidopsis thaliana containing 40,758 interactions, two-thirds of which act through binding events located in DNase I hypersensitive sites. This network shows significant enrichment for in vivo targets of known regulators, and its overall quality was confirmed using five different biological validation metrics. Finally, by integrating detailed expression and function information, we demonstrate how static CNSs can be converted into condition-dependent regulatory networks, offering opportunities for regulatory gene annotation.
    The Plant Cell 07/2014; 26(7). DOI:10.1105/tpc.114.127001 · 9.58 Impact Factor
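The AtALMT1 abstract above locates candidate cis-elements by computing overrepresented octamers. That abstract does not specify the background model or the statistic used, so the following is only a minimal sketch under assumed choices. Octamers are counted on the given strand only. Each foreground octamer is scored by its frequency relative to a background set, with an add-one pseudocount. The function names `kmer_counts` and `overrepresented` and the scoring rule are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

ALPHABET = set("ACGT")


def kmer_counts(seqs: Iterable[str], k: int = 8) -> Counter:
    """Count overlapping k-mers (octamers by default) on the given strand only."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= ALPHABET:  # skip windows containing N or other symbols
                counts[kmer] += 1
    return counts


def overrepresented(foreground: Iterable[str], background: Iterable[str],
                    k: int = 8, pseudo: float = 1.0,
                    top: int = 10) -> List[Tuple[str, float]]:
    """Rank foreground k-mers by their frequency ratio against a background set.

    Hypothetical scoring: the background frequency gets a pseudocount spread
    over all 4**k possible k-mers, so k-mers absent from the background do not
    cause a division by zero.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    scores = {}
    for kmer, n in fg.items():
        fg_freq = n / fg_total
        bg_freq = (bg[kmer] + pseudo) / (bg_total + pseudo * 4 ** k)
        scores[kmer] = fg_freq / bg_freq
    # Highest enrichment first; keep only the requested number of k-mers.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    promoters = ["ACGTACGTACGT"]     # toy foreground (hypothetical)
    background = ["GGGGGGGGGG"]      # toy background (hypothetical)
    for kmer, score in overrepresented(promoters, background):
        print(kmer, round(score, 1))
```

A real promoter analysis would typically also count reverse complements and apply a statistical test rather than rank by a raw ratio. Both are omitted here to keep the sketch short.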