Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites

Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, Washington 98195-2350, USA.
Nature Biotechnology (Impact Factor: 41.51). 02/2005; 23(1):137-44. DOI: 10.1038/nbt1053
Source: PubMed


The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.


Available from: Graziano Pesole
  • Source
    • "Traditionally, these regulatory sites were determined by labor-intensive experiments such as DNA footprinting [3] or gel electrophoresis [4]. Various computational approaches have been developed to predict TF binding sites in silico [5]. A number of high-throughput experimental technologies were also developed recently to determine protein-DNA binding affinity extensively. "
    ABSTRACT: With the prevalence of chromatin immunoprecipitation (ChIP) with sequencing (ChIP-Seq) technology, massive ChIP-Seq data has been accumulated. The ChIP-Seq technology measures the genome-wide occupancy of DNA-binding proteins in vivo. It is well-known that different DNA-binding protein occupancies may result in a gene being regulated in different conditions (e.g. different cell types). To fully understand a gene's function, it is essential to develop probabilistic models on multiple ChIP-Seq profiles for deciphering the gene transcription causalities. In this work, we propose and describe two probabilistic models. Assuming the conditional independence of different DNA-binding proteins' occupancies, the first method (SignalRanker) is developed as an intuitive method for ChIP-Seq genome-wide signal profile inference. Unfortunately, such an assumption may not always hold in some gene regulation cases. Thus, we propose and describe another method (FullSignalRanker) which does not make the conditional independence assumption. The proposed methods are compared with other existing methods on ENCODE ChIP-Seq datasets, demonstrating its regression and classification ability. The results suggest that FullSignalRanker is the best-performing method for recovering the signal ranks on the promoter and enhancer regions. In addition, FullSignalRanker is also the best-performing method for peak sequence classification. We envision that SignalRanker and FullSignalRanker will become important in the era of next generation sequencing. FullSignalRanker program is available on the following website: ∼ wkc/FullSignalRanker/.
    Full-text · Article · Dec 2015 · IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • Source
    • "promoter function (Kobayashi et al., 2013a). In addition, several bioinformatic procedures have been developed to predict cis-elements (Tompa et al., 2005; Zou et al., 2011). For example, we previously developed a procedure for cis-element prediction using a microarray dataset that computed the relative appearance ratio (RAR) of the octamers (i.e. the frequency of a particular octamer in the grouped genes relative to that in the genome-wide genes) as a predictive index (Yamamoto et al., 2011b). "
    ABSTRACT: In Arabidopsis thaliana the root apex is protected from aluminum (Al) rhizotoxicity by excretion of malate, an Al-chelator, by Al-activated malate transporter 1 (AtALMT1). AtALMT1 expression is fundamentally regulated by the STOP1 (Sensitive TO Proton rhizotoxicity 1) zinc finger protein, but other transcription factors have roles that enable Al-inducible expression with a broad dynamic range. In this study, we characterized multiple cis-elements in the AtALMT1 promoter that interact with transcription factors. In planta complementation assays of AtALMT1 driven by 5' truncated promoters of different lengths showed that the promoter region between -540 and 0 (the first ATG) restored the Al-sensitive phenotype of atalm1 and thus contains cis-elements essential for AtALMT1 expression for Al tolerance. Computation of overrepresented octamers showed that eight regions in this promoter region contained potential cis-elements involved in Al induction and STOP1 regulation. Mutation in a position around -297 from the first ATG completely inactivated AtALMT1 expression and Al response. In vitro binding assays showed that this region contained the STOP1 binding site, which accounted for the recognition by four zinc finger domains of the protein. Other positions were characterized as cis-elements that regulated expression by repressors and activators, and a transcription factor that determines root-tip expression of AtALMT1. From the consensus of known cis-elements, we identified CAMTA2 to be an activator of AtALMT1 expression. Al-inducible expression of AtALMT1 changed transcription starting sites, which increased the abundance of transcripts with a shortened 5' untranslated region. The present analyses identified multiple mechanisms that regulate AtALMT1 expression. Copyright © 2015, American Society of Plant Biologists.
    Full-text · Article · Jan 2015
  • Source
    • "and the relative frequency $f(b_i, j)$ of the nucleotide $b_i$ appearing at the $j$th position (i.e., $j$th term) computing according to $S$, define a matrix $M = [f(b_i, j)]_{4 \times k}$ whose $(i, j)$-entry is $f(b_i, j)$, called the position frequency matrix-based motif model of $S$, shortly, the PFM-based motif model of $S$ [31] [24] [42] [36] [32] [14] [29] [37]. Note that in the initial paper [31], the notation $f(b, j)$ was used for $f(b_i, j)$. "
    ABSTRACT: Site-specific DNA–protein interactions can be studied by using experimental and computational methods. Understanding the relationship between these two perspectives and finding ways to improve both is a major challenge of modern molecular biology. Computational approaches for finding DNA motifs are well recognized as useful tools to biologists, which greatly help in saving experimental time and cost in wet laboratories. Over these years, many motif search algorithms and Web-based tools have been developed based on computational intelligence systems and data mining techniques in light of classical electronic computers. In order to use quantum computers in DNA motif model discovering, a related quantum algorithm is necessary. In this paper, based on the quantum adiabatic theorem, an adiabatic quantum algorithm and its application to DNA motif model discovery are obtained in terms of the MISCORE-based motif score. The proposed method can be used to deal with discovering of DNA motif models with big data by a quantum computer, e.g. D-Wave.
    Full-text · Article · Nov 2014 · Information Sciences
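
The position frequency matrix described in the last excerpt above can be illustrated with a minimal sketch. It assumes that S is a set of aligned binding-site sequences of equal length k over the alphabet A, C, G, T, and that rows follow that fixed nucleotide order; the function name and the example sequences are hypothetical and do not come from any of the cited papers.

```python
NUCLEOTIDES = "ACGT"  # assumed row order b_1..b_4; the excerpt does not fix one


def position_frequency_matrix(sequences):
    """Return a 4 x k matrix M where M[i][j] is the relative frequency
    of nucleotide NUCLEOTIDES[i] at position j across the sequences."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    seqs = [s.upper() for s in sequences]
    k = len(seqs[0])
    if any(len(s) != k for s in seqs):
        raise ValueError("all sequences must have the same length k")
    n = len(seqs)
    # One row per nucleotide, one column per position in the motif.
    return [
        [sum(1 for s in seqs if s[j] == b) / n for j in range(k)]
        for b in NUCLEOTIDES
    ]


# Hypothetical aligned sites, for illustration only.
example_sites = ["ACGT", "ACGA", "TCGT", "ACCT"]
pfm = position_frequency_matrix(example_sites)
print(pfm[0])  # row for 'A' → [0.75, 0.0, 0.0, 0.25]
```

Each column of the resulting matrix sums to 1 when every sequence contains only A, C, G, or T, which is a quick sanity check on the input alignment.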