Article

Prediction of TF target sites based on atomistic models of protein-DNA complexes.

Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, España.
BMC Bioinformatics (impact factor: 2.75). 11/2008; 9:436. DOI: 10.1186/1471-2105-9-436
Source: PubMed

ABSTRACT The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence.
Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models.
Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.
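The abstract does not spell out how interaction potentials become sequence motifs, so the following is only a minimal sketch of one common way to do it: Boltzmann-weighting per-position base energies into a probability matrix. The function names, the `kT` parameter, and the energy values are all hypothetical and are not taken from the paper.

```python
import math

BASES = "ACGT"

def energies_to_pwm(energies, kT=1.0):
    """Convert per-position base energies into a probability matrix.

    energies: list of dicts mapping each base to a hypothetical
    interaction energy (lower means more favorable).
    kT: assumed scaling factor; not a value from the paper.
    """
    pwm = []
    for column in energies:
        # Shift by the column minimum for numerical stability.
        e_min = min(column[b] for b in BASES)
        weights = {b: math.exp(-(column[b] - e_min) / kT) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: weights[b] / total for b in BASES})
    return pwm

def consensus(pwm):
    """Return the most probable base at each position."""
    return "".join(max(col, key=col.get) for col in pwm)

# Made-up energies for a three-position site (illustration only).
example = [
    {"A": -2.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": -1.5, "T": 0.0},
    {"A": 0.0, "C": -0.5, "G": 0.0, "T": -1.0},
]
pwm = energies_to_pwm(example)
print(consensus(pwm))
```

In this sketch, a larger `kT` flattens each column toward a uniform distribution, while a smaller one sharpens it toward the lowest-energy base.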

  • Article: Intermolecular and intramolecular readout mechanisms in protein-DNA recognition.
    ABSTRACT: Protein-DNA recognition plays an essential role in the regulation of gene expression. Regulatory proteins are known to recognize specific DNA sequences directly through atomic contacts (intermolecular readout) and/or indirectly through the conformational properties of the DNA (intramolecular readout). However, little is known about the respective contributions made by these so-called direct and indirect readout mechanisms. We addressed this question by making use of information extracted from a structural database containing many protein-DNA complexes. We quantified the specificity of intermolecular (direct) readout by statistical analysis of base-amino acid interactions within protein-DNA complexes. The specificity of the intramolecular (indirect) readout due to DNA was quantified by statistical analysis of the sequence-dependent DNA conformation. Systematic comparison of these specificities in a large number of protein-DNA complexes revealed that both intermolecular and intramolecular readouts contribute to the specificity of protein-DNA recognition, and that their relative contributions vary depending upon the protein-DNA complexes. We demonstrated that combination of the intermolecular and intramolecular energies derived from the statistical analyses leads to enhanced specificity, and that the combined energy could explain experimental data on binding affinity changes caused by base mutations. These results provided new insight into the relationship between specificity and structure in the process of protein-DNA recognition, which would lead to prediction of specific protein-DNA binding sites.
    Journal of Molecular Biology 04/2004; 337(2):285-94. · 4.00 Impact Factor
  • Article: Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity.
    ABSTRACT: We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often follow (though not always) a universal code of amino acid-base recognition and the effects of amino acid mutations can be most easily understood for these interactions.
    Journal of Molecular Biology 08/2002; 320(5):991-1009. · 4.00 Impact Factor
  • Article: Structural analysis of conserved base-pairs in protein-DNA complexes.
    ABSTRACT: Understanding of protein-DNA interactions is crucial for prediction of the DNA-binding specificity of transcription factors and design of novel DNA-binding proteins. In this paper we develop a novel approach to analysis of protein-DNA interactions. We bring together structures of protein-DNA complexes and data on evolution of the DNA binding sites. This allows us to reveal the features of protein-DNA complexes that are conserved in evolution and, hence, are more important in specific recognition. The main result of this study is that base-pairs that have more interactions with the protein are more conserved in evolution. We also observe that for most of the studied proteins hydrogen bonds and hydrophobic interactions alone cannot explain the pattern of evolutionary conservation in the binding site. Implications for prediction of the DNA-binding specificity are discussed. Introduction Protein-DNA interactions are central for the regulation of gene expression in a cell. Up t...
    03/2001;


Keywords

accurate sequence motifs
atomic-detail structural information
atomistic statistical model
binding specificities
cognate target sites
computational method
contact counts
current approaches
different structural superfamilies
essential role
eukaryotic TFs
gene expression
genomic cis-regulatory elements
interface side-chain rotamers
modeling TF specific recognition
primary sequence
reference structural method
TF binding sites
TF-DNA complex
transcription factors

Vladimir Espinosa Angarica