Article

Accurate Prediction of Docked Protein Structure Similarity

Abstract

One of the major challenges for protein-protein docking methods is to accurately discriminate native-like structures. The protein docking community agrees on the existence of a relationship between various favorable intermolecular interactions (e.g., van der Waals, electrostatic, and desolvation forces) and the similarity of a conformation to its native structure. Different docking algorithms often formulate this relationship as a weighted sum of selected terms and calibrate their weights against specific training data to evaluate and rank candidate structures. However, the exact form of this relationship is unknown, and the accuracy of such methods is impaired by the pervasiveness of false positives. Unlike conventional scoring functions, we propose a novel machine learning approach that not only ranks the candidate structures relative to each other but also indicates how similar each candidate is to the native conformation. We trained the AccuRMSD neural network on an extensive dataset using the back-propagation learning algorithm. Our method predicted the RMSDs of unbound docked complexes with a 0.4 Å error margin.
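The conventional weighted-sum formulation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the scoring function of any particular docking program: the term names, weights, and decoy values are hypothetical.

```python
# Hypothetical scoring terms and weights, chosen only for illustration.
WEIGHTS = {"vdw": 1.0, "electrostatic": 0.5, "desolvation": 0.25}


def weighted_sum_score(terms, weights=WEIGHTS):
    """Return the weighted sum of energy-like terms (lower is better)."""
    return sum(weights[name] * terms[name] for name in weights)


def rank_candidates(candidates, weights=WEIGHTS):
    """Sort candidate names from best (lowest score) to worst."""
    return sorted(
        candidates,
        key=lambda name: weighted_sum_score(candidates[name], weights),
    )


# Hypothetical decoys with made-up term values.
candidates = {
    "decoy_a": {"vdw": -4.0, "electrostatic": -2.0, "desolvation": 4.0},
    "decoy_b": {"vdw": -8.0, "electrostatic": 2.0, "desolvation": 0.0},
    "decoy_c": {"vdw": -1.0, "electrostatic": -1.0, "desolvation": 2.0},
}
ranking = rank_candidates(candidates)
```

A score of this kind only orders the candidates relative to each other. It does not say how far any of them is from the native structure, which is the gap the RMSD-predicting approach in the abstract aims to close.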

... In our previous work we proposed AccuRefiner [24], a novel method to refine docked protein-protein complexes. AccuRefiner utilizes AccuRMSD [25,26], a ranking tool we developed to predict the RMSD* (RMSD* is used to distinguish predicted values from actual RMSD values) of a docked refinement candidate with respect to the native structure. The previous version of AccuRMSD used a trained two-layer neural network to approximate the complex relationship between a diverse set of scoring function terms and the RMSD of a docked structure. ...
... We employed AccuRMSD [25,26] to accurately discriminate refinement candidates produced from putative docked protein complexes. AccuRMSD predicts the RMSD* of each refinement candidate with respect to the native conformation. ...
... In the following subsections, we provide an overview of AccuRMSD and its validation on an extensive test set of 12,500 refinement candidates. More details about AccuRMSD can be found in [26]. It should be mentioned that finding ways to improve our decoys, possibly by introducing more flexibility and energy consideration, is part of on-going research. ...
Article
One of the major challenges for protein docking methods is to accurately discriminate native-like structures from false positives. Docking methods are often inaccurate and the results have to be refined and re-ranked to obtain native-like complexes and remove outliers. In previous work, we introduced AccuRefiner, a machine learning based tool for refining protein-protein complexes. Given a docked complex, the refinement tool produces a small set of refined versions of the input complex, with lower root-mean-square-deviation (RMSD) of atomic positions with respect to the native structure. The method employs a unique ranking tool that accurately predicts the RMSD of docked complexes with respect to the native structure. In this work, we use a deep learning network with a similar set of features and five layers. We show that a properly trained deep learning network can accurately predict the RMSD of a docked complex with an average error margin of 1.40 Å, by approximating the complex relationship between a wide set of scoring function terms and the RMSD of a docked structure. The network was trained on 35,000 unbound docking complexes generated by RosettaDock. We tested our method on 25 different putative docked complexes, also produced by RosettaDock, for five proteins that were not included in the training data. The results demonstrate that the high accuracy of the ranking tool enables AccuRefiner to consistently choose the refinement candidates with lower RMSD values compared to the coarsely docked input structures.
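For reference, the RMSD of atomic positions used as the prediction target can be computed as in the sketch below. It assumes that the two structures are already superimposed and that atoms are listed in the same order. The superposition step and the choice of atoms are not specified in the abstract and are left out here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples.

    Assumes a one-to-one atom correspondence and pre-superimposed structures.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical two-atom example: the docked copy is shifted by 2 units along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
value = rmsd(native, docked)
```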
... This Contribution: The objective of this paper is two-fold: first, based on our previous work [1,5,12], we describe a new SVR-based machine learning approach to predict the lRMSD of a set of candidate complexes with respect to their native conformation. Our method includes evolutionary conservation information in addition to physico-chemical interactions. ...
Conference Paper
Discriminating native-like complexes from false positives with high accuracy is one of the biggest challenges in protein-protein docking. The relationship between various favorable intermolecular interactions (e.g., van der Waals, electrostatic, and desolvation forces) and the similarity of a conformation to its native structure is commonly agreed upon, though the precise nature of this relationship is not well known. Existing protein-protein docking methods typically formulate this relationship as a weighted sum of selected terms and tune their weights by introducing a training set with which they evaluate and rank candidate complexes. Despite improvements in recent docking methods, they still produce a large number of false positives, which often leads to incorrect prediction of complex binding. Using machine learning, we implemented an approach that not only ranks candidate complexes relative to each other, but also predicts how similar each candidate is to the native conformation. We built a Support Vector Regressor (SVR) using physico-chemical features and evolutionary conservation. We trained and tested the model on extensive datasets of complexes generated by three state-of-the-art docking methods. The set of docked complexes was generated from 79 different protein-protein complexes in both the rigid and medium categories of the Protein-Protein Docking Benchmark v.5. We were able to generally outperform the built-in scoring functions of the docking programs we used to generate the complexes, attesting to the potential of our approach in predicting the correct binding of protein-protein complexes.
Article
In proteins, certain amino acids may play a critical role in determining their structure and function. Examples include flexible regions, which allow domain motions, and highly conserved residues on functional interfaces, which play a role in binding and interaction with other proteins. Detecting these regions facilitates the analysis and simulation of protein rigidity and conformational changes, and aids in characterizing protein–protein binding. We present a protocol that combines graph-theory rigidity analysis and machine-learning-based methods for predicting critical residues in proteins. Our approach combines amino-acid specific information and data obtained by two complementary methods. One method, KINARI, performs graph-based analysis to find rigid clusters of amino acids in a protein, while the other method relies on evolutionary conservation scores to find functional interfaces in proteins. Our machine learning model combines both methods, in addition to amino acid type and solvent-accessible surface area.
Article
Protein–protein interactions mediate several cellular functions, which can be understood from the information obtained using the three-dimensional structures of protein–protein complexes and binding affinity data. This review focuses on computational aspects of predicting the best native-like complex structure and binding affinities. The first part covers the prediction of protein–protein complex structures and the advantages of conformational searching and scoring functions in protein–protein docking. The second part is devoted to various aspects of protein–protein interaction thermodynamics, such as databases for binding affinities and other thermodynamic parameters, computational methods to predict the binding affinity using either the three-dimensional structures of complexes or amino acid sequences, and change in binding affinities of the complexes upon mutations. We provide the latest developments on protein–protein docking and binding affinity studies along with a list of available computational resources for understanding protein–protein interactions.
Article
We introduce a protein docking refinement method that accepts complexes consisting of any number of monomeric units. The method uses a scoring function based on a tight coupling between evolutionary conservation, geometry and physico-chemical interactions. Understanding the role of protein complexes in the basic biology of organisms heavily relies on the detection of protein complexes and their structures. Different computational docking methods have been developed for this purpose; however, these methods are often not accurate, and their results need to be further refined to improve the geometry and the energy of the resulting complexes. Also, although complexes in nature often have more than two monomers, most docking methods focus on dimers because the computational complexity increases exponentially with the addition of monomeric units. Our results show that the refinement scheme can efficiently handle complexes with more than two monomers by biasing the results towards complexes with native interactions, filtering out false positive results. Our refined complexes have better IRMSDs with respect to the known complexes and lower energies than the initial docked structures. Evolutionary conservation information allows us to bias our results towards possible functional interfaces, and the probabilistic selection scheme helps us to escape local energy minima. We aim to incorporate our refinement method in a larger framework which also enables docking of multimeric complexes given only monomeric structures.
Article
Elucidating the three-dimensional structure of a higher-order molecular assembly formed by interacting molecular units, a problem commonly known as docking, is central to unraveling the molecular basis of cellular activities. Though protein assemblies are ubiquitous in the cell, it is currently challenging to predict the native structure of a protein assembly in silico. This work proposes HopDock, a novel search algorithm for protein-protein docking. HopDock efficiently obtains an ensemble of low-energy dimeric configurations, also known as decoys, that can be effectively used by ab-initio docking protocols. HopDock is based on the Basin Hopping (BH) framework which perturbs the structure of a dimeric configuration and then follows it up with an energy minimization to explicitly sample a local minimum of a chosen energy function. This process is repeated in order to sample consecutive energy minima in a trajectory-like fashion. HopDock employs both geometry and evolutionary conservation analysis to narrow down the interaction search space of interest for the purpose of efficiently obtaining a diverse decoy ensemble. A detailed analysis and a comparative study on seventeen different dimers shows HopDock obtains a broad view of the energy surface near the native dimeric structure and samples many near-native configurations. The results show that HopDock has high sampling capability and can be employed to effectively obtain a large and diverse ensemble of decoy configurations that can then be further refined in greater structural detail in ab-initio docking protocols.
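The perturb-then-minimize loop of the Basin Hopping framework described above can be sketched on a toy problem. This is not HopDock: the one-dimensional energy function, the derivative-free local minimizer, the jump size, and the Metropolis-style acceptance rule are all assumptions made for illustration.

```python
import math
import random


def energy(x):
    """Toy 1D energy surface with several local minima (hypothetical)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(x, step=0.1, tol=1e-6):
    """Crude derivative-free descent: halve the step when no neighbor improves."""
    while step > tol:
        best = min((x, x - step, x + step), key=energy)
        if best == x:
            step /= 2.0
        else:
            x = best
    return x


def basin_hopping(x0, n_steps=200, jump=1.5, temperature=0.5, seed=0):
    """Sample consecutive local minima; return the best one and the trajectory."""
    rng = random.Random(seed)
    current = local_minimize(x0)
    best = current
    minima = [current]
    for _ in range(n_steps):
        # Perturb the current minimum, then relax into the nearby basin.
        trial = local_minimize(current + rng.uniform(-jump, jump))
        delta = energy(trial) - energy(current)
        # Assumed Metropolis-style acceptance: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial
            minima.append(current)
            if energy(current) < energy(best):
                best = current
    return best, minima


best_x, sampled_minima = basin_hopping(x0=4.0)
```

The list `sampled_minima` plays the role of the decoy ensemble in this toy setting: each entry is a local minimum reached after one perturbation and one minimization.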
Article
Large-scale analyses of protein-protein interactions based on coarse-grain molecular docking simulations and binding site predictions resulting from evolutionary sequence analysis are possible and realizable on hundreds of proteins with varied structures and interfaces. We demonstrated this on the 168 proteins of the Mintseris Benchmark 2.0. On the one hand, we evaluated the quality of the interaction signal and the contribution of docking information compared to evolutionary information, showing that the combination of the two improves partner identification. On the other hand, since protein interactions usually occur in crowded environments with several competing partners, we carried out a thorough analysis of the interactions of proteins with true partners but also with non-partners to evaluate whether proteins in the environment, competing with the true partner, affect its identification. We found three populations of proteins: strongly competing, never competing, and interacting with different levels of strength. Populations and levels of strength are numerically characterized and provide a signature for the behavior of a protein in the crowded environment. We showed that partner identification, to some extent, does not depend on the competing partners present in the environment, that certain biochemical classes of proteins are intrinsically easier to analyze than others, and that small proteins are not more promiscuous than large ones. Our approach brings to light that the knowledge of the binding site can be used to reduce the high computational cost of docking simulations with no consequence in the quality of the results, demonstrating the possibility to apply coarse-grain docking to datasets made of thousands of proteins. A comparison with all available large-scale analyses aimed at partner prediction is provided.
We release the complete decoys set issued by coarse-grain docking simulations of both true and false interacting partners, and their evolutionary sequence analysis leading to binding site predictions. Download site: http://www.lgm.upmc.fr/CCDMintseris/
Article
Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is a need to explore the differences and commonalities of these methods with each other, as well as with functions developed in the fields of molecular dynamics and homology modelling. We present an evaluation of 115 scoring functions on an unbound docking decoy benchmark covering 118 complexes for which a near-native solution can be found, yielding top 10 success rates of up to 58%. Hierarchical clustering is performed, so as to group together functions which identify near-natives in similar subsets of complexes. Three set theoretic approaches are used to identify pairs of scoring functions capable of correctly scoring different complexes. This shows that functions in different clusters capture different aspects of binding and are likely to work together synergistically. All functions designed specifically for docking perform well, indicating that functions are transferable between sampling methods. We also identify promising methods from the field of homology modelling. Further, differential success rates by docking difficulty and solution quality suggest a need for flexibility-dependent scoring. Investigating pairs of scoring functions, the set theoretic measures identify known scoring strategies as well as a number of novel approaches, indicating promising augmentations of traditional scoring methods. Such augmentation and parameter combination strategies are discussed in the context of the learning-to-rank paradigm.
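A top-N success rate such as the one reported above is commonly computed as the fraction of complexes for which at least one near-native pose appears among the N best-scored poses. The sketch below assumes that definition. The complex names and labels are hypothetical toy data.

```python
def top_n_success_rate(ranked_labels_by_complex, n=10):
    """Fraction of complexes with at least one near-native pose in the top n.

    ranked_labels_by_complex maps a complex id to a list of booleans ordered
    from best-scored to worst-scored pose; True marks a near-native pose.
    """
    if not ranked_labels_by_complex:
        raise ValueError("no complexes given")
    hits = sum(
        1 for labels in ranked_labels_by_complex.values() if any(labels[:n])
    )
    return hits / len(ranked_labels_by_complex)


# Hypothetical toy data: three complexes, five ranked poses each.
toy = {
    "complex_1": [False, True, False, False, False],
    "complex_2": [False, False, False, False, True],
    "complex_3": [False, False, False, False, False],
}
rate_top2 = top_n_success_rate(toy, n=2)
rate_top5 = top_n_success_rate(toy, n=5)
```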
Article
Detection of protein complexes and their structures is crucial for understanding their role in the basic biology of organisms. Computational docking methods can provide researchers with a good starting point for the analysis of protein complexes. However, these methods are often not accurate and their results need to be further refined to improve interface packing. In this paper, we introduce a refinement method that incorporates evolutionary information into a novel scoring function by employing Evolutionary Trace (ET)-based scores. Our method also takes Van der Waals interactions into account to avoid atomic clashes in refined structures. We tested our method on docked candidates of eight protein complexes and the results suggest that the proposed scoring function helps bias the search toward complexes with native interactions. We show a strong correlation between evolutionary-conserved residues and correct interface packing. Our refinement method is able to produce structures with better lRMSD (least RMSD) with respect to the known complexes and lower energies than initial docked structures. It also helps to filter out false-positive complexes generated by docking methods, by detecting little or no conserved residues on false interfaces. We believe this method is a step toward better ranking and prediction of protein complexes.
Article
The evolutionary trace (ET) is the single most validated approach to identify protein functional determinants and to target mutational analysis, protein engineering and drug design to the most relevant sites of a protein. It applies to the entire proteome; its predictions come with a reliability score; and its results typically reach significance in most protein families with 20 or more sequence homologs. In order to identify functional hot spots, ET scans a multiple sequence alignment for residue variations that correlate with major evolutionary divergences. In case studies this enables the selective separation, recoding, or mimicry of functional sites and, on a large scale, this enables specific function predictions based on motifs built from select ET-identified residues. ET is therefore an accurate, scalable and efficient method to identify the molecular determinants of protein function and to direct their rational perturbation for therapeutic purposes. Public ET servers are located at: http://mammoth.bcm.tmc.edu/.
Article
The docking field has come of age. The time is ripe to present the principles of docking, reviewing the current state of the field. Two reasons are largely responsible for the maturity of the computational docking area. First, the early optimism that the very presence of the "correct" native conformation within the list of predicted docked conformations signals a near solution to the docking problem, has been replaced by the stark realization of the extreme difficulty of the next scoring/ranking step. Second, in the last couple of years more realistic approaches to handling molecular flexibility in docking schemes have emerged. As in folding, these derive from concepts abstracted from statistical mechanics, namely, populations. Docking and folding are interrelated. From the purely physical standpoint, binding and folding are analogous processes, with similar underlying principles. Computationally, the tools developed for docking will be tremendously useful for folding. For large, multidomain proteins, domain docking is probably the only rational way, mimicking the hierarchical nature of protein folding. The complexity of the problem is huge. Here we divide the computational docking problem into its two separate components. As in folding, solving the docking problem involves efficient search (and matching) algorithms, which cover the relevant conformational space, and selective scoring functions, which are both efficient and effectively discriminate between native and non-native solutions. It is universally recognized that docking of drugs is immensely important. However, protein-protein docking is equally so, relating to recognition, cellular pathways, and macromolecular assemblies. Proteins function when they are bound to other molecules. Consequently, we present the review from both the computational and the biological points of view. 
Although large, it covers only partially the extensive body of literature, relating to small (drug) and to large protein-protein molecule docking, to rigid and to flexible. Unfortunately, when reviewing these, a major difficulty in assessing the results is the non-uniformity in the formats in which they are presented in the literature. Consequently, we further propose a way to rectify it here.
Article
ClusPro (http://nrc.bu.edu/cluster) represents the first fully automated, web-based program for the computational docking of protein structures. Users may upload the coordinate files of two protein structures through ClusPro's web interface, or enter the PDB codes of the respective structures, which ClusPro will then download from the PDB server (http://www.rcsb.org/pdb/). The docking algorithms evaluate billions of putative complexes, retaining a preset number with favorable surface complementarities. A filtering method is then applied to this set of structures, selecting those with good electrostatic and desolvation free energies for further clustering. The program output is a short list of putative complexes ranked according to their clustering properties, which is automatically sent back to the user via email.
Article
Structural details of protein–protein interactions are invaluable for understanding and deciphering biological mechanisms. Computational docking methods aim to predict the structure of a protein–protein complex given the structures of its single components. Protein flexibility and the absence of robust scoring functions pose a great challenge in the docking field. Due to these difficulties most of the docking methods involve a two-tier approach: coarse global search for feasible orientations that treats proteins as rigid bodies, followed by an accurate refinement stage that aims to introduce flexibility into the process. The FireDock web server, presented here, is the first web server for flexible refinement and scoring of protein–protein docking solutions. It includes optimization of side-chain conformations and rigid-body orientation and allows a high-throughput refinement. The server provides a user-friendly interface and a 3D visualization of the results. A docking protocol consisting of a global search by PatchDock and a refinement by FireDock was extensively tested. The protocol was successful in refining and scoring docking solution candidates for cases taken from docking benchmarks. We provide an option for using this protocol by automatic redirection of PatchDock candidate solutions to the FireDock web server for refinement. The FireDock web server is available at http://bioinfo3d.cs.tau.ac.il/FireDock/.
Article
The RosettaDock server (http://rosettadock.graylab.jhu.edu) identifies low-energy conformations of a protein–protein interaction near a given starting configuration by optimizing rigid-body orientation and side-chain conformations. The server requires two protein structures as inputs and a starting location for the search. RosettaDock generates 1000 independent structures, and the server returns pictures, coordinate files and detailed scoring information for the 10 top-scoring models. A plot of the total energy of each of the 1000 models created shows the presence or absence of an energetic binding funnel. RosettaDock has been validated on the docking benchmark set and through the Critical Assessment of PRedicted Interactions blind prediction challenge.
Article
Basic backpropagation, which is a simple method now being widely used in areas like pattern recognition and fault diagnosis, is reviewed. The basic equations for backpropagation through time, and applications to areas like pattern recognition involving dynamic systems, systems identification, and control are discussed. Further extensions of this method, to deal with systems other than neural networks, systems involving simultaneous equations, or true recurrent networks, and other practical issues arising with the method are described. Pseudocode is provided to clarify the algorithms. The chain rule for ordered derivatives, the theorem which underlies backpropagation, is briefly discussed. The focus is on designing a simpler version of backpropagation which can be translated into computer code and applied directly by neural network users.
Article
Protein-protein docking methods aim to compute the correct bound form of two or more proteins. One of the major challenges for docking methods is to accurately discriminate native-like structures. The protein docking community agrees on the existence of a relationship between various favorable intermolecular interactions (e.g., van der Waals, electrostatic, and desolvation forces) and the similarity of a conformation to its native structure. Different docking algorithms often formulate this relationship as a weighted sum of selected terms and calibrate their weights against specific training data to evaluate and rank candidate structures. However, the exact form of this relationship is unknown, and the accuracy of such methods is impaired by the pervasiveness of false positives. Unlike conventional scoring functions, we propose a novel machine learning approach that not only ranks the candidate structures relative to each other but also indicates how similar each candidate is to the native conformation. We trained the AccuRMSD neural network on an extensive dataset using the back-propagation learning algorithm and achieved RMSD prediction accuracy with less than 1 Å error margin on 19,600 test samples.
Article
A coarse-grained protein model implemented in the ATTRACT protein-protein docking program has been employed to predict protein-protein complex structures in CAPRI rounds 22-27. For six targets, acceptable or better quality solutions have been submitted, corresponding to ~60% of all targets. For one target, promising results on the prediction of the hydration structure at the protein-protein interface have been achieved. New approaches for rapid flexible refinement have been developed based on a combination of atomistic representation of the bonded geometry and a coarse-grained description of non-bonded interactions. Possible further improvements of the docking approach, in particular at the scoring and the flexible refinement steps, are discussed. Proteins 2013. © 2013 Wiley Periodicals, Inc.
Chapter
This paper presents a generalization of the perceptron learning procedure for learning the correct sets of connections for arbitrary networks. The rule, called the generalized delta rule, is a simple scheme for implementing a gradient descent method for finding weights that minimize the sum squared error of the system's performance. The major theoretical contribution of the work is the procedure called error propagation, whereby the gradient can be determined by individual units of the network based only on locally available information. The major empirical contribution of the work is to show that the problem of local minima is not serious in this application of gradient descent. Keywords: Learning; Networks; Perceptrons; Adaptive systems; Learning machines; Back propagation
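The gradient descent scheme described above can be sketched for a small network with one hidden layer. The network size, learning rate, sigmoid hidden units, linear output unit, and the OR toy dataset are all assumptions chosen for illustration, not details taken from the chapter.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Return hidden activations and the (linear) network output."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    output = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return hidden, output


def sum_squared_error(data, params):
    return sum((forward(x, *params)[1] - t) ** 2 for x, t in data)


def train(data, n_hidden=3, lr=0.1, epochs=2000, seed=0):
    """Online gradient descent on the squared error, one sample at a time."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    for _ in range(epochs):
        for x, t in data:
            hidden, y = forward(x, w_hidden, b_hidden, w_out, b_out)
            delta_out = y - t  # error signal at the linear output unit
            # Propagate the error back through the sigmoid hidden units,
            # using the output weights from before this update.
            delta_hidden = [
                delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                for j in range(n_hidden)
            ]
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                b_hidden[j] -= lr * delta_hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
            b_out -= lr * delta_out
    return w_hidden, b_hidden, w_out, b_out


# Toy dataset: logical OR of two binary inputs.
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]
initial_error = sum_squared_error(data, train(data, epochs=0))
final_error = sum_squared_error(data, train(data))
```

Each weight update uses only the activation feeding into the weight and the error signal of the unit it feeds, which is the locality property the abstract attributes to error propagation.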
Article
Docking algorithms simulate protein-protein association in molecular assemblies such as protease-inhibitor or antigen-antibody complexes by reconstituting the complexes from their component molecules. They not only efficiently retrieve native structures but also select a number of non-native structures with structural and physicochemical features that were assumed to be unique to the native complexes. Some of these ‘false positives’ may deserve further examination in experimental studies of protein-protein recognition.
Article
We present the derivation of a new molecular mechanical force field for simulating the structures, conformational energies, and interaction energies of proteins, nucleic acids, and many related organic molecules in condensed phases. This effective two-body force field is the successor to the Weiner et al. force field and was developed with some of the same philosophies, such as the use of a simple diagonal potential function and electrostatic potential fit atom centered charges. A 10-12 function for representing hydrogen bonds is no longer necessary due to the improved performance of the new charge model and new van der Waals parameters. These new charges are determined using a 6-31G basis set and restrained electrostatic potential (RESP) fitting and have been shown to reproduce interaction energies, free energies of solvation, and conformational energies of simple small molecules to a good degree of accuracy. Furthermore, the new RESP charges exhibit less variability as a function of the molecular conformation used in the charge determination. The new van der Waals parameters have been derived from liquid simulations and include hydrogen parameters which take into account the effects of any geminal electronegative atoms. The bonded parameters developed by Weiner et al. were modified as necessary to reproduce experimental vibrational frequencies and structures. Most of the simple dihedral parameters have been retained from Weiner et al., but a complex set of phi and psi parameters which do a good job of reproducing the energies of the low-energy conformations of glycyl and alanyl dipeptides has been developed for the peptide backbone.
Book
Introduction to Bioinformatics starts off by introducing the topic. It then looks at genetics and genomes. It moves on to consider the panorama of life. The text also considers alignments and phylogenetic trees. There is a chapter on structural bioinformatics and drug discovery. The text also examines scientific publications and archives, particularly media, content, access, and presentation. Artificial intelligence is considered as well, in addition to machine learning. There is an introduction to systems biology that follows towards the end. The book's final chapters look at metabolic pathways and control of organization.
Article
We updated our protein-protein docking benchmark to include complexes that became available since our previous release. As before, we only considered high-resolution complex structures that are nonredundant at the family-family pair level, for which the X-ray or NMR unbound structures of the constituent proteins are also available. Benchmark 4.0 adds 52 new complexes to the 124 cases of Benchmark 3.0, representing an increase of 42%. Thus, benchmark 4.0 provides 176 unbound-unbound cases that can be used for protein-protein docking method development and assessment. Seventeen of the newly added cases are enzyme-inhibitor complexes, and we found no new antigen-antibody complexes. Classifying the new cases according to expected difficulty for protein-protein docking algorithms gives 33 rigid body cases, 11 cases of medium difficulty, and 8 cases that are difficult. Benchmark 4.0 listings and processed structure files are publicly accessible at http://zlab.umassmed.edu/benchmark/.
Article
Docking algorithms build multimolecular assemblies based on the subunit structures. "Unbound" docking, which starts with the free molecules and allows for conformation changes, may be used to predict the structure of a protein-protein complex. This requires at least two steps, a rigid-body search that determines the relative position and orientation of the subunits, and a refinement step. The methods developed in the past twenty years yield native-like models in most cases, but always with many false positives that must be filtered out, and they fail when the conformation changes are large. CAPRI (Critical Assessment of PRedicted Interactions) is a community-wide experiment set up to monitor progress in the field. It offers participants the opportunity to test their methods in blind predictions that are assessed against an unpublished experimental structure. The models submitted by predictor groups are judged depending on how well they reproduce the geometry and the residue-residue contacts seen in the target structure. In nine years of CAPRI, 42 target complexes have been subjected to prediction based on the components' unbound structures. Good models have been submitted for 28 targets, and prediction has failed on 6. Both these successes and these failures have been fruitful, as they stimulated participant groups to develop new score functions to identify native-like solutions, and new algorithms that allow the molecules to be flexible during docking.
Article
Analysis of trajectories from our rigid-body dynamics simulation package, BioSimz, is used to find regions on the surface of unbound proteins that form frequent and tenacious encounter complexes with their binding partner. Binding partners are significantly more likely to sojourn around true binding regions than around the remainder of the protein surface. This information is used to restrict the search space for flexible protein-protein docking using our SwarmDock algorithm, reducing the computational expense of docking, and improving or matching the ranking of successfully docked poses for all but four of 26 test cases. Running the simulations with external crowder proteins, at near physiological concentration, further enhances the binding region, compared to simulations without external crowders. Information gleaned from these simulations can give mechanistic insights into binding events. The application of these techniques to CAPRI targets 32 and 38-40 is discussed.
Article
The design of an ideal scoring function for protein-protein docking that would also predict the binding affinity of a complex is one of the challenges in structural proteomics. Such a scoring function would open the route to in silico, large-scale annotation and prediction of complete interactomes. Here we present a protein-protein binding affinity benchmark consisting of binding constants (K_d values) for 81 complexes. This benchmark was used to assess the ability of nine commonly used scoring algorithms, along with a free-energy prediction algorithm, to predict binding affinities. Our results reveal a poor correlation between binding affinity and scores for all algorithms tested. However, the diversity and validity of the benchmark are highlighted when binding affinity data are categorized according to the methodology by which they were determined. By further classifying the complexes into low, medium and high affinity groups, significant correlations emerge, some of which are retained after dividing the data into more classes, showing the robustness of these correlations. Despite this, accurate prediction of binding affinity remains outside our reach due to the large associated standard deviations of the average score within each group. All the above-mentioned observations indicate that improvements of existing scoring functions or design of new consensus tools will be required for accurate prediction of the binding affinity of a given protein-protein complex. The benchmark developed in this work will serve as an indispensable source to reach this goal.
Article
Protein-protein binding is one of the critical events in biology, and knowledge of the three-dimensional structures of protein complexes is of fundamental importance for the biochemical study of pharmacologic compounds. The past two decades have seen the emergence of a large variety of algorithms designed to predict the structures of protein-protein complexes, a procedure named docking. Computational methods, if accurate and reliable, could play an important role, both to infer functional properties and to guide new experiments. Despite the outstanding progress of the methodologies developed in this area, a few problems still prevent protein-protein docking from being a widespread practice in the structural study of proteins. In this review we focus our attention on the principles that govern docking, namely the algorithms used for searching and scoring, which are usually referred to as the docking problem. We also focus our attention on the use of a flexible description of the proteins under study and the use of biological information such as the localization of the hot spots, the important residues for protein-protein binding. The most common docking software packages are described too.
Article
The majority of soluble and membrane-bound proteins in modern cells are symmetrical oligomeric complexes with two or more subunits. The evolutionary selection of symmetrical oligomeric complexes is driven by functional, genetic, and physicochemical needs. Large proteins are selected for specific morphological functions, such as formation of rings, containers, and filaments, and for cooperative functions, such as allosteric regulation and multivalent binding. Large proteins are also more stable against denaturation and have a reduced surface area exposed to solvent when compared with many individual, smaller proteins. Large proteins are constructed as oligomers for reasons of error control in synthesis, coding efficiency, and regulation of assembly. Symmetrical oligomers are favored because of stability and finite control of assembly. Several functions limit symmetry, such as interaction with DNA or membranes, and directional motion. Symmetry is broken or modified in many forms: quasisymmetry, in which identical subunits adopt similar but different conformations; pleomorphism, in which identical subunits form different complexes; pseudosymmetry, in which different molecules form approximately symmetrical complexes; and symmetry mismatch, in which oligomers of different symmetries interact along their respective symmetry axes. Asymmetry is also observed at several levels. Nearly all complexes show local asymmetry at the level of side chain conformation. Several complexes have reciprocating mechanisms in which the complex is asymmetric, but, over time, all subunits cycle through the same set of conformations. Global asymmetry is only rarely observed. Evolution of oligomeric complexes may favor the formation of dimers over complexes with higher cyclic symmetry, through a mechanism of prepositioned pairs of interacting residues. 
However, examples have been found for all of the crystallographic point groups, demonstrating that functional need can drive the evolution of any symmetry.
Article
The structure determination of protein-protein complexes is a rather tedious and lengthy process, by both NMR and X-ray crystallography. Several methods based on docking to study protein complexes have also been well developed over the past few years. Most of these approaches are not driven by experimental data but are based on a combination of energetics and shape complementarity. Here, we present an approach called HADDOCK (High Ambiguity Driven protein-protein Docking) that makes use of biochemical and/or biophysical interaction data such as chemical shift perturbation data resulting from NMR titration experiments or mutagenesis data. This information is introduced as Ambiguous Interaction Restraints (AIRs) to drive the docking process. An AIR is defined as an ambiguous distance between all residues shown to be involved in the interaction. The accuracy of our approach is demonstrated with three molecular complexes. For two of these complexes, for which both the complex and the free protein structures have been solved, NMR titration data were available. Mutagenesis data were used in the last example. In all cases, the best structures generated by HADDOCK, that is, the structures with the lowest intermolecular energies, were the closest to the published structure of the respective complexes (within 2.0 Å backbone RMSD).
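The backbone RMSD quoted above, and the RMSD that AccuRMSD is trained to predict, is a root-mean-square deviation over matched atom positions. The following is a minimal sketch, assuming the two structures have already been superimposed and that their atoms are listed in the same order; the function name `rmsd` and the tuple-based coordinate format are illustrative choices, not taken from any of the cited tools.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two matched coordinate sets.

    Assumption: both sets are already superimposed and atom i of `model`
    corresponds to atom i of `reference`. No alignment is performed here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    # Sum of squared differences over every coordinate of every atom pair.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # every atom moved 1 Å along x
print(rmsd(shifted, reference))
```

In practice the structures would first be optimally superimposed (for example with the Kabsch algorithm) and restricted to backbone atoms before this calculation is applied.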
Article
Protein conformational change is an important consideration in ligand-docking screens, but it is difficult to predict. A simple way to account for protein flexibility is to soften the criterion for steric fit between ligand and receptor. A more comprehensive but more expensive method would be to sample multiple receptor conformations explicitly. Here, these two approaches are compared. A "soft" scoring function was created by attenuating the repulsive term in the Lennard-Jones potential, allowing for a closer approach between ligand and protein. The standard, "hard" Lennard-Jones potential was used for docking to multiple receptor conformations. The Available Chemicals Directory (ACD) was screened against two cavity sites in the T4 lysozyme. These sites undergo small but significant conformational changes on ligand binding, making them good systems for soft docking. The ACD was also screened against the drug target aldose reductase, which can undergo large conformational changes on ligand binding. We evaluated the ability of the scoring functions to identify known ligands from among the over 200 000 decoy molecules in the database. The soft potential was always better at identifying known ligands than the hard scoring function when only a single receptor conformation was used. Conversely, the soft function was worse at identifying known leads than the hard function when multiple receptor conformations were used. This was true even for the cavity sites and was especially true for aldose reductase. To test the multiple-conformation method predictively, we screened the ACD for molecules that preferentially docked to the expanded conformation of aldose reductase, known to bind larger ligands. Six novel molecules that ranked among the top 0.66% of hits from the multiple-conformation calculation, but ranked relatively poorly in the soft docking calculation, were tested experimentally for enzyme inhibition. 
Four of these six inhibited the enzyme, the best with an IC50 of 8 µM. Although ligands can get better scores in soft docking, the same is also true for decoys. The improved ranking of such decoys can come at the expense of true ligands.
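The idea of a "soft" potential described in this abstract can be illustrated with a small sketch. It assumes the common 12-6 form of the Lennard-Jones potential and attenuates the repulsive term by a constant factor; the parameter `repulsion_scale` and its value are hypothetical, and the cited study may have softened the potential in a different way.

```python
def lennard_jones(r: float, epsilon: float = 1.0, r_min: float = 1.0,
                  repulsion_scale: float = 1.0) -> float:
    """12-6 Lennard-Jones energy with an optionally attenuated repulsive term.

    repulsion_scale = 1.0 gives the standard ("hard") potential, whose
    minimum is -epsilon at r = r_min. Values below 1.0 weaken the r^-12
    repulsion (an illustrative way to "soften" the potential).
    """
    ratio6 = (r_min / r) ** 6
    return epsilon * (repulsion_scale * ratio6 ** 2 - 2.0 * ratio6)


# At a distance closer than the optimum, the hard potential penalizes
# the contact while the softened one still tolerates it.
close_contact = 0.8
print(lennard_jones(close_contact))                        # hard
print(lennard_jones(close_contact, repulsion_scale=0.25))  # soft
```

The sketch also shows why decoys benefit as much as true ligands: any molecule that makes a slightly too-close contact is penalized less, whether or not it binds.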
Article
The high-resolution prediction of protein-protein docking can now create structures with atomic-level accuracy. This progress arises from both improvements in the rapid sampling of conformations and increased accuracy of binding free energy calculations. Consequently, the quality of models submitted to the blind prediction challenge CAPRI (Critical Assessment of PRedicted Interactions) has steadily increased, including complexes predicted from homology structures of one binding partner and complexes with atomic accuracy at the interface. By exploiting experimental information, docking has created model structures for real applications, even when confronted with challenges such as moving backbones and uncertain monomer structures. Work remains to be done in docking large or flexible proteins, ranking models consistently, and producing models accurate enough to allow computational design of higher affinities or specificities.
Article
Evolutionary trace report_maker offers a new type of service for researchers investigating the function of novel proteins. It pools, from different sources, information about protein sequence, structure and elementary annotation, and superimposes on that background inference about the evolutionary behavior of individual residues, using the real-valued evolutionary trace method. As its only input it takes a Protein Data Bank identifier or UniProt accession number, and returns a human-readable document in PDF format, supplemented by the original data needed to reproduce the results quoted in the report. Availability: Evolutionary trace reports are freely available for academic users at Author Webpage. Contact: {imihalek,ires,lichtarge}@bcm.tmc.edu
Article
Protein-protein docking algorithms provide a means to elucidate structural details for presently unknown complexes. Here, we present and evaluate a new method to predict protein-protein complexes from the coordinates of the unbound monomer components. The method employs a low-resolution, rigid-body, Monte Carlo search followed by simultaneous optimization of backbone displacement and side-chain conformations using Monte Carlo minimization. Up to 10^5 independent simulations are carried out, and the resulting "decoys" are ranked using an energy function dominated by van der Waals interactions, an implicit solvation model, and an orientation-dependent hydrogen bonding potential. Top-ranking decoys are clustered to select the final predictions. Small-perturbation studies reveal the formation of binding funnels in 42 of 54 cases using coordinates derived from the bound complexes and in 32 of 54 cases using independently determined coordinates of one or both monomers. Experimental binding affinities correlate with the calculated score function and explain the predictive success or failure of many targets. Global searches using one or both unbound components predict at least 25% of the native residue-residue contacts in 28 of the 32 cases where binding funnels exist. The results suggest that the method may soon be useful for generating models of biologically important complexes from the structures of the isolated components, but they also highlight the challenges that must be met to achieve consistent and accurate prediction of protein-protein interactions.
Article
Protein-protein docking requires fast and effective methods to quickly discriminate correct from incorrect predictions generated by initial-stage docking. We have developed and tested a scoring function that utilizes detailed electrostatics, van der Waals, and desolvation to rescore initial-stage docking predictions. Weights for the scoring terms were optimized for a set of test cases, and this optimized function was then tested on an independent set of nonredundant cases. This program, named ZRANK, is shown to significantly improve the success rate over the initial ZDOCK rankings across a large benchmark. The number of test cases with No. 1 ranked hits increased from 2 to 11 and from 6 to 12 when predictions from two ZDOCK versions were considered. ZRANK can be applied either as a refinement protocol in itself or as a preprocessing stage to enrich the well-ranked hits prior to further refinement.
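Rescoring with a weighted sum of energy terms, as described in this abstract and in the main article's abstract, can be sketched as follows. The term names, weights, and pose values below are invented for illustration only; they are not ZRANK's actual terms or calibrated weights.

```python
from typing import Dict, List

# Hypothetical weights; a real scoring function calibrates these on training data.
ILLUSTRATIVE_WEIGHTS: Dict[str, float] = {
    "vdw": 1.0,
    "electrostatics": 0.5,
    "desolvation": 1.0,
}


def weighted_score(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear combination of per-pose energy terms (lower is better)."""
    return sum(weights[name] * terms[name] for name in weights)


def rerank(poses: Dict[str, Dict[str, float]],
           weights: Dict[str, float]) -> List[str]:
    """Return pose identifiers sorted from best (lowest) to worst score."""
    return sorted(poses, key=lambda pose_id: weighted_score(poses[pose_id], weights))


# Made-up energy terms for two candidate poses.
poses = {
    "pose_a": {"vdw": -10.0, "electrostatics": -5.0, "desolvation": 2.0},
    "pose_b": {"vdw": -2.0, "electrostatics": -1.0, "desolvation": 0.0},
}
print(rerank(poses, ILLUSTRATIVE_WEIGHTS))
```

A score of this kind only orders the candidates relative to each other; it does not say how close any of them is to the native structure, which is the gap the main article's RMSD-predicting approach aims to address.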
Article
The accurate scoring of rigid-body docking orientations represents one of the major difficulties in protein-protein docking prediction. Other challenges are the development of faster and more efficient sampling methods and the introduction of receptor and ligand flexibility during simulations. Overall, good discrimination of near-native docking poses from the very early stages of rigid-body protein docking is an essential step before applying more costly interface refinement to the correct docking solutions. Here we explore a simple approach to scoring of rigid-body docking poses, which has been implemented in a program called pyDock. The scheme is based on Coulombic electrostatics with a distance-dependent dielectric constant, and implicit desolvation energy with atomic solvation parameters previously adjusted for rigid-body protein-protein docking. This scoring function is not highly dependent on the specific geometry of the docking poses and therefore can be used in rigid-body docking sets generated by a variety of methods. We have tested the procedure in a large benchmark set of 80 unbound docking cases. The method is able to detect a near-native solution from 12,000 docking poses and place it within the 100 lowest-energy docking solutions in 56% of the cases, in a completely unrestricted manner and without any other additional information. More specifically, a near-native solution will lie within the top 20 solutions in 37% of the cases. The simplicity of the approach allows for a better understanding of the physical principles behind protein-protein association, and provides a fast tool for the evaluation of large sets of rigid-body docking poses in search of the near-native orientation.
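A Coulombic term with a distance-dependent dielectric can be sketched as below. The sketch assumes a linear dielectric of the form eps(r) = k * r with k = 4, a common choice in the docking literature; the abstract does not state the functional form, so both the form and the value of `dielectric_slope` are assumptions, and the function names are illustrative rather than pyDock's API.

```python
import math
from typing import Sequence, Tuple

# Coulomb constant in kcal*Å/(mol*e^2), for charges in units of e and distances in Å.
COULOMB_CONSTANT = 332.0636

Atom = Tuple[float, Tuple[float, float, float]]  # (partial charge, (x, y, z))


def coulomb_pair(q1: float, q2: float, r: float,
                 dielectric_slope: float = 4.0) -> float:
    """Pairwise Coulomb energy with eps(r) = dielectric_slope * r (assumed form)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric_slope * r * r)


def electrostatic_energy(receptor: Sequence[Atom], ligand: Sequence[Atom],
                         dielectric_slope: float = 4.0) -> float:
    """Sum the pairwise term over all receptor-ligand atom pairs."""
    total = 0.0
    for q1, pos1 in receptor:
        for q2, pos2 in ligand:
            total += coulomb_pair(q1, q2, math.dist(pos1, pos2), dielectric_slope)
    return total


# Two opposite unit charges 2 Å apart (made-up example).
receptor = [(1.0, (0.0, 0.0, 0.0))]
ligand = [(-1.0, (2.0, 0.0, 0.0))]
print(electrostatic_energy(receptor, ligand))
```

Because the dielectric grows with distance, the energy falls off as 1/r^2 instead of 1/r, which damps long-range interactions without an explicit solvent model.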
Article
We have developed a new method to predict protein-protein complexes based on the shape complementarity of the molecular surfaces, along with sequence conservation obtained by evolutionary trace (ET) analysis. The docking is achieved by optimization of an objective function that evaluates the degree of shape complementarity weighted by the conservation of the interacting residues. The optimization is carried out using a genetic algorithm in combination with Monte Carlo sampling. We applied this method to CAPRI targets and evaluated the performance systematically. Our method achieved native-like predictions in several cases. In addition, we have analyzed the feasibility of the ET method for docking simulations, and found that the conservation information was useful only in a limited category of proteins (signal-related proteins and enzymes).
Akbal-Delibas, B., Pomplun, M., and Haspel, N. 2015. AccuRefiner: A machine learning guided refinement method for protein-protein docking. Proceedings of the 7th International Conference on Bioinformatics and Computational Biology.