Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites

Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, Washington 98195-2350, USA.
Nature Biotechnology (Impact Factor: 41.51). 02/2005; 23(1):137-44. DOI: 10.1038/nbt1053
Source: PubMed


The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

  • Source
    • "promoter function (Kobayashi et al., 2013a). In addition, several bioinformatic procedures have been developed to predict cis-elements (Tompa et al., 2005; Zou et al., 2011). For example, we previously developed a procedure for cis-element prediction using a microarray dataset that computed the relative appearance ratio (RAR) of the octamers (i.e. the frequency of a particular octamer in the grouped genes relative to that in the genome-wide genes) as a predictive index (Yamamoto et al., 2011b). "
    ABSTRACT: In Arabidopsis thaliana the root apex is protected from aluminum (Al) rhizotoxicity by excretion of malate, an Al-chelator, by Al-activated malate transporter 1 (AtALMT1). AtALMT1 expression is fundamentally regulated by the STOP1 (Sensitive TO Proton rhizotoxicity 1) zinc finger protein, but other transcription factors have roles that enable Al-inducible expression with a broad dynamic range. In this study, we characterized multiple cis-elements in the AtALMT1 promoter that interact with transcription factors. In planta complementation assays of AtALMT1 driven by 5' truncated promoters of different lengths showed that the promoter region between -540 and 0 (the first ATG) restored the Al-sensitive phenotype of atalmt1 and thus contains cis-elements essential for AtALMT1 expression for Al tolerance. Computation of overrepresented octamers showed that eight regions in this promoter region contained potential cis-elements involved in Al induction and STOP1 regulation. Mutation in a position around -297 from the first ATG completely inactivated AtALMT1 expression and Al response. In vitro binding assays showed that this region contained the STOP1 binding site, which accounted for the recognition by four zinc finger domains of the protein. Other positions were characterized as cis-elements that regulated expression by repressors and activators, and a transcription factor that determines root-tip expression of AtALMT1. From the consensus of known cis-elements, we identified CAMTA2 to be an activator of AtALMT1 expression. Al-inducible expression of AtALMT1 changed transcription starting sites, which increased the abundance of transcripts with a shortened 5' untranslated region. The present analyses identified multiple mechanisms that regulate AtALMT1 expression. Copyright © 2015, American Society of Plant Biologists.
  • Source
    • "and the relative frequency $f(b_i, j)$ of the nucleotide $b_i$ appearing at the $j$th position (i.e., $j$th term) computing according to $S$, define a matrix $M = [f(b_i, j)]_{4 \times k}$ whose $(i, j)$-entry is $f(b_i, j)$, called the position frequency matrix-based motif model of $S$, shortly, the PFM-based motif model of $S$ [31] [24] [42] [36] [32] [14] [29] [37]. Note that in the initial paper [31], the notation $f(b, j)$ was used for $f(b_i, j)$. "
    ABSTRACT: Site-specific DNA–protein interactions can be studied by using experimental and computational methods. Understanding the relationship between these two perspectives and finding ways to improve both is a major challenge of modern molecular biology. Computational approaches for finding DNA motifs are well recognized as useful tools to biologists, which greatly help in saving experimental time and cost in wet laboratories. Over these years, many motif search algorithms and Web-based tools have been developed based on computational intelligence systems and data mining techniques in light of classical electronic computers. In order to use quantum computers in DNA motif model discovering, a related quantum algorithm is necessary. In this paper, based on the quantum adiabatic theorem, an adiabatic quantum algorithm and its application to DNA motif model discovery are obtained in terms of the MISCORE-based motif score. The proposed method can be used to deal with discovering of DNA motif models with big data by a quantum computer, e.g. D-Wave.
    Information Sciences 11/2014; 296. DOI:10.1016/j.ins.2014.10.057 · 4.04 Impact Factor
  • Source
    • "Motif Finding. The literature on sequence motif discovery is vast. We refer to [29] [30] [31] [32] for reviews and additional references. There are two main classes of motif finding algorithms, probabilistic and word-based. "
    ABSTRACT: Sequence discovery tools play a central role in several fields of computational biology. In the framework of Transcription Factor binding studies, most of the existing motif finding algorithms are computationally demanding, and they may not be able to support the increasingly large datasets produced by modern high-throughput sequencing technologies. We present FastMotif, a new motif discovery algorithm that is built on a recent machine learning technique referred to as Method of Moments. Based on spectral decompositions, our method is robust to model misspecifications and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. On HT-Selex data, FastMotif extracts motif profiles that match those computed by various state-of-the-art algorithms, but one order of magnitude faster. We provide a theoretical and numerical analysis of the algorithm's robustness and discuss its sensitivity with respect to the free parameters. The Matlab code of FastMotif is available from © The Author (2015). Published by Oxford University Press. All rights reserved.
    Bioinformatics 07/2014; 31(16). DOI:10.1093/bioinformatics/btv208 · 4.98 Impact Factor
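
The first excerpt above describes a relative appearance ratio (RAR): the frequency of an octamer in a group of genes relative to its frequency across genome-wide genes. A minimal Python sketch of that idea follows. The function names, the sliding-window counting, and the normalization by total window count are assumptions made for illustration; the excerpt does not specify how the cited procedure normalizes or filters octamers.

```python
from collections import Counter


def kmer_frequencies(sequences, k=8):
    """Return the relative frequency of every k-mer over all sliding windows.

    Assumption: frequency = count of the k-mer / total number of windows.
    """
    counts = Counter()
    total = 0
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
            total += 1
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items()}


def relative_appearance_ratio(group_seqs, background_seqs, k=8):
    """Ratio of each k-mer's frequency in the group to its background frequency.

    k-mers absent from the background are skipped to avoid division by zero
    (an illustrative choice, not taken from the cited work).
    """
    group = kmer_frequencies(group_seqs, k)
    background = kmer_frequencies(background_seqs, k)
    return {
        kmer: freq / background[kmer]
        for kmer, freq in group.items()
        if kmer in background
    }
```

A ratio well above 1 marks an octamer that is enriched in the grouped promoters relative to the background set, which is the sense in which the excerpt calls RAR a predictive index.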
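
The PFM-based motif model quoted in the second excerpt, $M = [f(b_i, j)]_{4 \times k}$, can be illustrated with a short Python sketch. It assumes that $S$ is a collection of aligned, equal-length, uppercase binding sites and that the rows are ordered A, C, G, T; the function name and the row order are illustrative choices, not taken from the cited paper.

```python
NUCLEOTIDES = "ACGT"  # assumed row order b_1..b_4


def position_frequency_matrix(sites):
    """Build the 4 x k matrix whose (i, j) entry is f(b_i, j).

    f(b_i, j) is the fraction of sites that have nucleotide b_i at position j.
    """
    if not sites:
        raise ValueError("need at least one site")
    k = len(sites[0])
    if any(len(s) != k for s in sites):
        raise ValueError("all sites must have the same length k")
    n = len(sites)
    return [
        [sum(1 for s in sites if s[j] == b) / n for j in range(k)]
        for b in NUCLEOTIDES
    ]
```

For the four sites `ACGT`, `ACGA`, `TCGT` and `ACCT`, the entry for A at the first position is 3/4 and the entry for C at the second position is 1, and every column sums to 1 because each site contributes exactly one nucleotide per position.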