Conference Paper

Structural Prediction of Protein-Protein Interactions in Saccharomyces cerevisiae

Kansas State Univ., Manhattan
DOI: 10.1109/BIBE.2007.4375729
Conference: Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Source: IEEE Xplore

ABSTRACT Protein-protein interactions (PPI) refer to the associations between proteins and to the study of these associations. Several approaches have been used to address the problem of predicting PPI. Some are based on biological features extracted from a protein sequence (such as amino acid composition and GO terms); others use relational and structural features extracted from the PPI network, which can be represented as a graph. Our approach falls into the second category. We adapt a general approach to graph feature extraction that has previously been applied to collaborative recommendation of friends in social networks. Several structural features are identified based on the PPI graph and used to learn classifiers for predicting new interactions. Two datasets containing Saccharomyces cerevisiae PPI are used to test the proposed approach. Both datasets were assembled from the Database of Interacting Proteins (DIP). We assembled the first dataset directly from DIP in April 2006, while the second dataset has been used in previous studies, which makes it easy to compare our approach with previous approaches. Several classifiers are trained using the structural features extracted from the interaction graph. The results show good performance (accuracy, sensitivity and specificity), indicating that the structural features are highly predictive with respect to PPI.
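The abstract does not list the specific structural features used, so the following is only a minimal sketch of the general idea: it treats the PPI network as an undirected graph and computes a few common link-prediction features (degrees, common neighbors, Jaccard coefficient) for a candidate protein pair. The feature choice, the function names `build_adjacency` and `pair_features`, and the protein labels are assumptions made for illustration, not the authors' method.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def pair_features(adj, u, v):
    """Structural features for the candidate pair (u, v).

    The pair itself is excluded from both neighborhoods so that the
    features do not leak whether u and v are already linked.
    """
    nu = adj.get(u, set()) - {v}
    nv = adj.get(v, set()) - {u}
    common = nu & nv
    union = nu | nv
    return {
        "degree_u": len(nu),
        "degree_v": len(nv),
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
    }


# Hypothetical toy network; the labels A-D are placeholders, not real proteins.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
toy_adj = build_adjacency(toy_edges)
features = pair_features(toy_adj, "A", "D")
print(features)
```

Feature dictionaries like this one, computed for known interacting pairs and for sampled non-interacting pairs, could then serve as input rows for any standard classifier.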

  • ABSTRACT: MOTIVATION: An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. The expectation is that this will provide a fuller appreciation of cellular processes and networks at the protein level, ultimately leading to a better understanding of disease mechanisms and suggesting new means for intervention. This paper addresses the question: can protein-protein interactions be predicted directly from primary structure and associated data? Using a diverse database of known protein interactions, a Support Vector Machine (SVM) learning system was trained to recognize and predict interactions based solely on primary structure and associated physicochemical properties. RESULTS: Inductive accuracy of the trained system, defined here as the percentage of correct protein interaction predictions for previously unseen test sets, averaged 80% for the ensemble of statistical experiments. Future proteomics studies may benefit from this research by proceeding directly from the automated identification of a cell's gene products to prediction of protein interaction pairs.
    Bioinformatics 06/2001; 17(5):455-60. DOI:10.1093/bioinformatics/17.5.455 · 4.62 Impact Factor
  • ABSTRACT: The Database of Interacting Proteins aims to integrate the diverse body of experimental evidence on protein-protein interactions into a single, easily accessible online database. Because the reliability of experimental evidence varies widely, methods of quality assessment have been developed and utilized to identify the most reliable subset of the interactions. This CORE set can be used as a reference when evaluating the reliability of high-throughput protein-protein interaction data sets, for development of prediction methods, as well as in studies of the properties of protein interaction networks.
    Nucleic Acids Research 02/2004; 32(Database issue):D449-51. DOI:10.1093/nar/gkh086 · 8.81 Impact Factor
  • ABSTRACT: Motivation: Proteins play a fundamental role in every process within the cell. Understanding how proteins interact, and the functional units they are part of, is important to furthering our knowledge of the entire biological process. There has been a growing amount of work, both experimental and computational, on determining the protein-protein interaction network. Recently, researchers have had success treating this as a relational learning problem. Results: In this work, we further this investigation, proposing several novel relational features for predicting protein-protein interactions. These features can be used in any classifier. Our approach allows large and complex networks to be analyzed and is an alternative to more expensive relational methods. We show that we are able to achieve an accuracy of 81.7% when predicting new links from noisy high-throughput data.

