Accuracy of Protein-Protein Binding Sites in High-Throughput Template-Based Modeling

University of Kansas, Lawrence, Kansas, United States
PLoS Computational Biology (Impact Factor: 4.62). 04/2010; 6(4):e1000727. DOI: 10.1371/journal.pcbi.1000727
Source: PubMed


The accuracy of protein structures, particularly their binding sites, is essential for the success of modeling protein complexes. Computationally inexpensive methodology is required for genome-wide modeling of such structures. For systematic evaluation of potential accuracy in high-throughput modeling of binding sites, a statistical analysis of target-template sequence alignments was performed for a representative set of protein complexes. For most of the complexes, alignments containing all residues of the interface were found. The full interface alignments were obtained even in the case of poor alignments where a relatively small part of the target sequence (as low as 40%) aligned to the template sequence, with a low overall alignment identity (<30%). Although such poor overall alignments might be considered inadequate for modeling of whole proteins, the alignment of the interfaces was strong enough for docking. In the set of homology models built on these alignments, one third of those ranked 1 by a simple sequence identity criterion had RMSD < 5 Å, the accuracy suitable for low-resolution template-free docking. Such models corresponded to multi-domain target proteins, whereas for single-domain proteins the best models had 5 Å < RMSD < 10 Å, the accuracy suitable for less sensitive structure-alignment methods. Overall, approximately 50% of complexes with the interfaces modeled by high-throughput techniques had accuracy suitable for meaningful docking experiments. This percentage will grow with the increasing availability of co-crystallized protein-protein complexes.
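
The alignment quantities named in the abstract (fraction of the target aligned, overall identity, and whether the interface residues are covered) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the gapped-string alignment representation, the function names, the example sequences, and the choice to compute identity over aligned positions only. It is not the authors' pipeline. The RMSD bins reuse the thresholds quoted in the abstract; the label for models above 10 Å is also an assumption.

```python
from typing import Dict, Set


def alignment_stats(target_aln: str, template_aln: str,
                    interface: Set[int]) -> Dict[str, float]:
    """Summarize a pairwise alignment given as two equal-length gapped rows.

    target_aln / template_aln use '-' for gaps (hypothetical representation).
    interface holds 0-based indices, in the ungapped target sequence,
    of the interface residues.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")

    target_pos = -1          # index in the ungapped target sequence
    aligned = 0              # target residues paired with a template residue
    identical = 0            # aligned pairs with the same residue type
    interface_aligned = 0    # interface residues that are aligned

    for t, s in zip(target_aln, template_aln):
        if t == "-":
            continue         # gap in the target: no target residue here
        target_pos += 1
        if s == "-":
            continue         # target residue has no template counterpart
        aligned += 1
        if t == s:
            identical += 1
        if target_pos in interface:
            interface_aligned += 1

    target_len = target_pos + 1
    return {
        "coverage": aligned / target_len if target_len else 0.0,
        # Assumption: identity is measured over aligned positions only.
        "identity": identical / aligned if aligned else 0.0,
        "interface_coverage": (interface_aligned / len(interface)
                               if interface else 0.0),
    }


def docking_suitability(rmsd: float) -> str:
    """Bin a model by RMSD (in angstroms) using the abstract's thresholds."""
    if rmsd < 5.0:
        return "low-resolution template-free docking"
    if rmsd < 10.0:
        return "structure-alignment methods"
    return "outside the ranges quoted in the abstract"  # assumed label


# Hypothetical example: only 60% of the target is aligned,
# yet all three interface residues are covered.
stats = alignment_stats("MKTAYIAKQR", "--TAYLAK--", {3, 4, 6})
```

In this toy case the alignment covers only part of the target while still containing every interface residue, which is the situation the abstract describes for poor overall alignments.
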



Available from: Petras Kundrotas, Jan 10, 2014
  • "Template-based prediction approaches reduce the solution space of the docking approaches [2] on the premise that PPI sites are relatively conserved throughout proteins with similar sequence and structural features [28]. With the template-based approaches, high-throughput modeling of PPI sites based on protein docking have been shown with accuracy feasible for low to medium resolution models [34]."
    ABSTRACT: Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. 
    The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
    PLoS ONE 06/2012; 7(6):e37706. DOI:10.1371/journal.pone.0037706 · 3.23 Impact Factor
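
The residue-based measures reported in the abstract above (Matthews correlation coefficient, accuracy, precision, sensitivity, specificity) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions; the function name and the example counts are hypothetical and do not reproduce the paper's data.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives,
    false negatives (e.g. residues labeled interface vs. non-interface).
    """
    def ratio(num: float, den: float) -> float:
        # Return 0.0 for an empty denominator instead of raising.
        return num / den if den else 0.0

    total = tp + fp + tn + fn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, total),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # also called recall
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=50, fp=10, tn=30, fn=10)
```

Unlike accuracy, the Matthews correlation coefficient accounts for all four cells, which makes it informative when interface residues are a minority class.
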
  • "Shoemaker et al. [69] have recently developed a web server for predicting protein binding sites by inspecting homologous proteins with similar structures. Based on a statistical analysis of target-template sequence alignments on a benchmark dataset of 329 two-chain complexes, Kundrotas and Vakser [70] have shown that it is possible to obtain high quality alignment of interface residues even when the overall alignment quality is rather poor. Specifically, they concluded that in approximately 50% of the complexes considered, the overall accuracy of the modelled interfaces was good enough for guiding docking."
    ABSTRACT: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins.
    The partner-specific classifier, PS-HomPPI, can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available online. Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.
    BMC Bioinformatics 06/2011; 12(1):244. DOI:10.1186/1471-2105-12-244 · 2.58 Impact Factor
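
The general idea of homology-based interface inference described in the abstract above can be sketched as label transfer through a pairwise alignment: a query residue is predicted to be an interface residue when it is aligned to an interface residue of a homolog. This is a simplified illustration only, not the HomPPI algorithm; the function name, the gapped-string representation, and the example sequences are all hypothetical.

```python
from typing import Set


def transfer_interface_labels(query_aln: str, homolog_aln: str,
                              homolog_interface: Set[int]) -> Set[int]:
    """Map interface labels from a homolog onto a query via an alignment.

    query_aln / homolog_aln: equal-length gapped rows ('-' = gap).
    homolog_interface: 0-based indices (ungapped homolog sequence)
    of known interface residues.
    Returns 0-based indices (ungapped query sequence) predicted as interface.
    """
    if len(query_aln) != len(homolog_aln):
        raise ValueError("alignment rows must have equal length")

    q_pos = h_pos = -1       # positions in the ungapped sequences
    predicted: Set[int] = set()
    for q_char, h_char in zip(query_aln, homolog_aln):
        if q_char != "-":
            q_pos += 1
        if h_char != "-":
            h_pos += 1
        # Transfer a label only where both rows hold a residue.
        if q_char != "-" and h_char != "-" and h_pos in homolog_interface:
            predicted.add(q_pos)
    return predicted


# Hypothetical example: homolog residue 2 faces a gap in the query,
# so only homolog residues 3 and 5 transfer their labels.
predicted = transfer_interface_labels("MK-TAYIA", "MRQTA-LA", {2, 3, 5})
```

A practical method would also need the sequence similarity criteria the abstract mentions to decide which homologs are trustworthy; the sketch omits that step.
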
  • ABSTRACT: We present an overview of the MultiG program, an open research program addressing issues from end-user requirements on distributed multimedia applications to medium access protocols for multi-gigabit networks based on optical fibers and wireless extensions to portable workstations, Walkstations. The program is growing into a national Swedish effort conducted in broad cooperation between academia and industry with substantial support from public sources. The spirit of the program is similar to that of the US program for the establishment of a National Information Infrastructure. SGN, the Stockholm Gigabit Network, a gigabit testbed based on dark fibers, is being built connecting 8 nodes in the Greater Stockholm area. The testbed may be extended nationally during the next few years.
    Global Data Networking, 1993. Proceedings; 01/1994