Accuracy of Protein-Protein Binding Sites in High-Throughput Template-Based Modeling

University of Kansas, Lawrence, Kansas, United States
PLoS Computational Biology (Impact Factor: 4.83). 04/2010; 6(4):e1000727. DOI: 10.1371/journal.pcbi.1000727
Source: PubMed

ABSTRACT The accuracy of protein structures, particularly their binding sites, is essential for the success of modeling protein complexes. Computationally inexpensive methodology is required for genome-wide modeling of such structures. For systematic evaluation of potential accuracy in high-throughput modeling of binding sites, a statistical analysis of target-template sequence alignments was performed for a representative set of protein complexes. For most of the complexes, alignments containing all residues of the interface were found. The full interface alignments were obtained even in the case of poor alignments where a relatively small part of the target sequence (as low as 40%) aligned to the template sequence, with a low overall alignment identity (<30%). Although such poor overall alignments might be considered inadequate for modeling of whole proteins, the alignment of the interfaces was strong enough for docking. In the set of homology models built on these alignments, one third of those ranked 1 by a simple sequence identity criterion had RMSD < 5 Å, the accuracy suitable for low-resolution template-free docking. Such models corresponded to multi-domain target proteins, whereas for single-domain proteins the best models had 5 Å < RMSD < 10 Å, the accuracy suitable for less sensitive structure-alignment methods. Overall, approximately 50% of complexes with the interfaces modeled by high-throughput techniques had accuracy suitable for meaningful docking experiments. This percentage will grow with the increasing availability of co-crystallized protein-protein complexes.
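
The quantities discussed in the abstract (fraction of the target aligned, overall alignment identity, and whether the interface residues fall inside the alignment) can be illustrated with a minimal Python sketch. The function names, the gapped-string input format, and the definition of identity as identical columns divided by aligned columns are assumptions made for illustration, not the authors' published procedure. The RMSD bins follow the thresholds quoted in the abstract; the handling of the exact boundary values is also an assumption.

```python
def alignment_stats(target_aln, template_aln, interface_positions):
    """Summarize a pairwise target-template alignment (hypothetical helper).

    target_aln, template_aln: equal-length gapped strings, '-' marks a gap.
    interface_positions: 0-based indices into the ungapped target sequence.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")

    target_len = sum(c != "-" for c in target_aln)
    aligned = 0             # columns where both sequences have a residue
    identical = 0           # aligned columns with the same residue
    aligned_target = set()  # ungapped target indices covered by the template
    pos = -1                # running index into the ungapped target

    for t, m in zip(target_aln, template_aln):
        if t == "-":
            continue
        pos += 1
        if m == "-":
            continue
        aligned += 1
        aligned_target.add(pos)
        if t == m:
            identical += 1

    iface = set(interface_positions)
    return {
        # fraction of target residues aligned to the template
        "coverage": aligned / target_len if target_len else 0.0,
        # assumed definition: identical columns / aligned columns
        "identity": identical / aligned if aligned else 0.0,
        # fraction of interface residues that fall inside the alignment
        "interface_coverage": len(iface & aligned_target) / len(iface) if iface else 0.0,
    }


def docking_suitability(rmsd):
    """Bin a model by RMSD (in angstroms) using the abstract's thresholds."""
    if rmsd < 5.0:
        return "low-resolution template-free docking"
    if rmsd < 10.0:  # boundary handling at exactly 5 and 10 is an assumption
        return "structure-alignment methods"
    return "outside the ranges discussed"


stats = alignment_stats("ACDEFGHIKL", "--DEYGH---", [2, 3, 5])
print(stats)
```

In this toy case only half of the target aligns to the template, yet all three interface residues are covered, which is the situation the abstract describes for poor overall alignments.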

Available from: Petras Kundrotas, Jan 10, 2014
  • ABSTRACT: We present an overview of the MultiG program, an open research program addressing issues from end-user requirements on distributed multimedia applications to medium access protocols for multi-gigabit networks based on optical fibers and wireless extensions to portable workstations, Walkstations. The program is growing into a national Swedish effort conducted in broad cooperation between academia and industry with substantial support from public sources. The spirit of the program is similar to that of the US program for the establishment of a National Information Infrastructure. SGN, the Stockholm Gigabit Network, a gigabit testbed based on dark fibers, is being built connecting 8 nodes in the Greater Stockholm area. The testbed may be extended nationally during the next few years.
    Global Data Networking, 1993. Proceedings; 01/1994
  • ABSTRACT: Over the last decade, structural genomics (SG) programs have developed many novel methodologies for faster and more accurate structure determination. These new tools and approaches led to the determination of thousands of protein structures. The generation of enormous amounts of experimental data resulted in significant improvements in the understanding of many biological processes at the molecular level. However, the amount of data collected so far is so large that traditional analysis methods are limiting the rate of extraction of biological and biochemical information from 3D models. This situation has prompted us to review the challenges that remain unmet by SG, as well as the areas in which the potential impact of SG could exceed what has been achieved so far.
    Current Opinion in Structural Biology 10/2010; 20(5):587-97. DOI:10.1016/ · 8.75 Impact Factor
  • ABSTRACT: Current homology modeling methods for predicting protein-protein interactions (PPIs) have difficulty in the "twilight zone" (<40%) of sequence identities. Threading methods extend coverage further into the twilight zone by aligning primary sequences for a pair of proteins to a best-fit template complex to predict an entire three-dimensional structure. We introduce a threading approach, iWRAP, which focuses only on the protein interface. Our approach combines a novel linear programming formulation for interface alignment with a boosting classifier for interaction prediction. We demonstrate its efficacy on SCOPPI, a classification of PPIs in the Protein Databank, and on the entire yeast genome. iWRAP provides significantly improved prediction of PPIs and their interfaces in stringent cross-validation on SCOPPI. Furthermore, by combining our predictions with a full-complex threader, we achieve a coverage of 13% for the yeast PPIs, which is close to a 50% increase over previous methods at a higher sensitivity. As an application, we effectively combine iWRAP with genomic data to identify novel cancer-related genes involved in chromatin remodeling, nucleosome organization, and ribonuclear complex assembly. iWRAP is available at
    Journal of Molecular Biology 02/2011; 405(5):1295-310. DOI:10.1016/j.jmb.2010.11.025 · 4.33 Impact Factor