Ab initio prediction of transcription factor binding sites.

Department of Biomedical Engineering, High-Throughput Biology Center, Johns Hopkins University, Baltimore, MD 21218, USA.
Pacific Symposium on Biocomputing 02/2007; DOI: 10.1142/9789812772435_0046
Source: PubMed

ABSTRACT Transcription factors are DNA-binding proteins that control gene transcription by binding specific short DNA sequences. Experiments that identify transcription factor binding sites are often laborious and expensive, and the binding sites of many transcription factors remain unknown. We present a computational scheme to predict the binding sites directly from transcription factor sequence using all-atom molecular simulations. This method is a computational counterpart to recent high-throughput experimental technologies that identify transcription factor binding sites (ChIP-chip and protein-dsDNA binding microarrays). The only requirement of our method is an accurate 3D structural model of a transcription factor-DNA complex. We apply free energy calculations by thermodynamic integration to compute the change in binding energy of the complex due to a single base pair mutation. By calculating the binding free energy differences for all possible single mutations, we construct a position weight matrix for the predicted binding sites that can be directly compared with experimental data. As water-bridged hydrogen bonds between the transcription factor and DNA often contribute to the binding specificity, we include explicit solvent in our simulations. We present successful predictions for the yeast MAT-alpha2 homeodomain and GCN4 bZIP proteins. Water-bridged hydrogen bonds are found to be more prevalent than direct protein-DNA hydrogen bonds at the binding interfaces, indicating why empirical potentials with implicit water may be less successful in predicting binding. Our methodology can be applied to a variety of DNA-binding proteins.
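The step from per-mutation binding free energy differences to a position weight matrix can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes each base's probability at a position is its Boltzmann weight exp(-ΔΔG/kT) normalized over the four bases, a mapping the abstract does not spell out. The function name `pwm_from_ddg`, the input layout, and the numerical ΔΔG values are all hypothetical.

```python
import math

BASES = "ACGT"
# Approximate value of kT at 298 K in kcal/mol (assumed temperature).
KT_KCAL_PER_MOL = 0.593


def pwm_from_ddg(ddg_by_position, kt=KT_KCAL_PER_MOL):
    """Convert binding free energy differences into a position weight matrix.

    ddg_by_position: one dict per binding-site position, mapping each base
    to its ddG in kcal/mol relative to the reference base (ddG = 0).
    Assumption: base probabilities follow normalized Boltzmann weights.
    """
    pwm = []
    for ddg in ddg_by_position:
        # Lower (more favorable) ddG gives a larger weight.
        weights = {base: math.exp(-ddg[base] / kt) for base in BASES}
        total = sum(weights.values())
        pwm.append({base: weights[base] / total for base in BASES})
    return pwm


# Made-up illustrative values, not results from the paper.
example_ddg = [
    {"A": 0.0, "C": 1.2, "G": 0.8, "T": 2.0},  # position 1: A is the reference
    {"A": 1.5, "C": 0.0, "G": 0.0, "T": 1.5},  # position 2: C and G equally good
]
example_pwm = pwm_from_ddg(example_ddg)

for index, column in enumerate(example_pwm, start=1):
    row = "  ".join(f"{base}={column[base]:.3f}" for base in BASES)
    print(f"position {index}: {row}")
```

Each column sums to one, so the result can be set side by side with an experimentally derived matrix; a different temperature or normalization would change the numbers but not the overall shape of the calculation.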

  • ABSTRACT: This chapter briefly summarizes the topics in this volume.
    Sub-cellular biochemistry 01/2011; 52:1-6. DOI:10.1007/978-90-481-9069-0_1
  • ABSTRACT: Sequence-specific DNA recognition by gene regulatory proteins is critical for proper cellular functioning. The ability to predict the DNA binding preferences of these regulatory proteins from their amino acid sequence would greatly aid in reconstruction of their regulatory interactions. Structural modeling provides one route to such predictions: by building accurate molecular models of regulatory proteins in complex with candidate binding sites, and estimating their relative binding affinities for these sites using a suitable potential function, it should be possible to construct DNA binding profiles. Here, we present a novel molecular modeling protocol for protein-DNA interfaces that borrows conformational sampling techniques from de novo protein structure prediction to generate a diverse ensemble of structural models from small fragments of related and unrelated protein-DNA complexes. The extensive conformational sampling is coupled with sequence space exploration so that binding preferences for the target protein can be inferred from the resulting optimized DNA sequences. We apply the algorithm to predict binding profiles for a benchmark set of eleven C2H2 zinc finger transcription factors, five of known and six of unknown structure. The predicted profiles are in good agreement with experimental binding data; furthermore, examination of the modeled structures gives insight into observed binding preferences.
    Nucleic Acids Research 02/2011; 39(11):4564-76. DOI:10.1093/nar/gkr048 · 8.81 Impact Factor
  • ABSTRACT: An accurate, predictive understanding of protein-DNA binding specificity is crucial for the successful design and engineering of novel protein-DNA binding complexes. In this review, we summarize recent studies that use atomistic representations of interfaces to predict protein-DNA binding specificity computationally. Although methods with limited structural flexibility have proven successful at recapitulating consensus binding sequences from wild-type complex structures, conformational flexibility is likely important for design and template-based modeling, where non-native conformations need to be sampled and accurately scored. A successful application of such computational modeling techniques in the construction of the TAL-DNA complex structure is discussed. With continued improvements in energy functions, solvation models, and conformational sampling, we are optimistic that reliable and large-scale protein-DNA binding prediction and engineering is a goal within reach.
    Current Opinion in Structural Biology 07/2012; 22(4):397-405. DOI:10.1016/ · 8.75 Impact Factor
