Binding Site Prediction for Protein-Protein Interactions and Novel Motif Discovery using Re-occurring Polypeptide Sequences

School of Computer Science, Carleton University, Ottawa, ON K1S5B6, Canada.
BMC Bioinformatics (Impact Factor: 2.58). 06/2011; 12:225. DOI: 10.1186/1471-2105-12-225
Source: PubMed


While there are many methods for predicting protein-protein interaction, very few can determine the specific site of interaction on each protein. Characterization of the specific sequence regions mediating interaction (binding sites) is crucial for an understanding of cellular pathways. Experimental methods often report false binding sites due to experimental limitations, while computational methods tend to require data which is not available at the proteome-scale. Here we present PIPE-Sites, a novel method of protein specific binding site prediction based on pairs of re-occurring polypeptide sequences, which have been previously shown to accurately predict protein-protein interactions. PIPE-Sites operates at high specificity and requires only the sequences of query proteins and a database of known binary interactions with no binding site data, making it applicable to binding site prediction at the proteome-scale.
PIPE-Sites was evaluated using a dataset of 265 yeast and 423 human interacting proteins pairs with experimentally-determined binding sites. We found that PIPE-Sites predictions were closer to the confirmed binding site than those of two existing binding site prediction methods based on domain-domain interactions, when applied to the same dataset. Finally, we applied PIPE-Sites to two datasets of 2347 yeast and 14,438 human novel interacting protein pairs predicted to interact with high confidence. An analysis of the predicted interaction sites revealed a number of protein subsequences which are highly re-occurring in binding sites and which may represent novel binding motifs.
PIPE-Sites is an accurate method for predicting protein binding sites and is applicable to the proteome-scale. Thus, PIPE-Sites could be useful for exhaustive analysis of protein binding patterns in whole proteomes as well as discovery of novel binding motifs. PIPE-Sites is available online at

Download full-text


Available from: James Robert Green, Oct 02, 2015
46 Reads
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Computational prediction of residues that participate in protein-protein interactions is a difficult task, and state of the art methods have shown only limited success in this arena. One possible problem with these methods is that they try to predict interacting residues without incorporating information about the partner protein, although it is unclear how much partner information could enhance prediction performance. To address this issue, the two following comparisons are of crucial significance: (a) comparison between the predictability of inter-protein residue pairs, i.e., predicting exactly which residue pairs interact with each other given two protein sequences; this can be achieved by either combining conventional single-protein predictions or making predictions using a new model trained directly on the residue pairs, and the performance of these two approaches may be compared: (b) comparison between the predictability of the interacting residues in a single protein (irrespective of the partner residue or protein) from conventional methods and predictions converted from the pair-wise trained model. Using these two streams of training and validation procedures and employing similar two-stage neural networks, we showed that the models trained on pair-wise contacts outperformed the partner-unaware models in predicting both interacting pairs and interacting single-protein residues. Prediction performance decreased with the size of the conformational change upon complex formation; this trend is similar to docking, even though no structural information was used in our prediction. An example application that predicts two partner-specific interfaces of a protein was shown to be effective, highlighting the potential of the proposed approach. Finally, a preliminary attempt was made to score docking decoy poses using prediction of interacting residue pairs; this analysis produced an encouraging result.
    PLoS ONE 12/2011; 6(12):e29104. DOI:10.1371/journal.pone.0029104 · 3.23 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The molecular network sustained by different types of interactions among proteins is widely manifested as the fundamental driving force of cellular operations. Many biological functions are determined by the crosstalk between proteins rather than by the characteristics of their individual components. Thus, the searches for protein partners in global networks are imperative when attempting to address the principles of biology. We have developed a web-based tool "Sequence-based Protein Partners Search" (SPPS) to explore interacting partners of proteins, by searching over a large repertoire of proteins across many species. SPPS provides a database containing more than 60,000 protein sequences with annotations and a protein-partner search engine in two modes (Single Query and Multiple Query). Two interacting proteins of human FBXO6 protein have been found using the service in the study. In addition, users can refine potential protein partner hits by using annotations and possible interactive network in the SPPS web server. SPPS provides a new type of tool to facilitate the identification of direct or indirect protein partners which may guide scientists on the investigation of new signaling pathways. The SPPS server is available to the public at
    PLoS ONE 01/2012; 7(1):e30938. DOI:10.1371/journal.pone.0030938 · 3.23 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: Proteins are the bricks and mortar of cells, playing structural and functional roles. In order to perform their function, they interact with each other as well as with other biomolecules such as DNA or RNA. Therefore, to fathom the function of a protein, we require knowing its partners and the atomic details of its interactions (i.e., the structure of the complex). However, the amount of protein interactions with an experimentally determined three-dimensional structure is scarce. Therefore, computational techniques such as homology modeling are foremost to fill this gap. Protein interactions can be modeled using as templates the interactions of homologous proteins, if the structure of the complex is known, or using docking methods. In both approaches, the estimation of the quality of models is essential. There are several ways to address this problem. In this review, we focus on the use of knowledge-based potentials for the analysis of protein interactions. We describe the procedure to derive statistical potentials and split them into different energetic terms that can be used for different purposes. We extensively discuss the fields where knowledge-based potentials have been successfully applied to (1) model protein-protein, protein-DNA, and protein-RNA interactions and (2) predict binding sites (in the protein and in the DNA). Moreover, we provide ready-to-use resources for docking and benchmarking protein interactions.
    Advances in Protein Chemistry and Structural Biology 03/2014; 94:77-120. DOI:10.1016/B978-0-12-800168-4.00004-4 · 3.04 Impact Factor
Show more