Article

Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction.

Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, Andhra Pradesh, India.
PLoS ONE (impact factor: 4.09). 01/2012; 7(7):e42057. DOI:10.1371/journal.pone.0042057
Source: PubMed

ABSTRACT Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to share an evolutionary history. This history can be traced for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs across multiple genomes, known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of orthologous protein-coding genes in the same cluster or operon; they are collectively known as genomic context methods. A further method, mirrortree, is based on the similarity of the phylogenetic trees of two interacting proteins. Comprehensive performance analyses of these methods have frequently been reported in the literature. However, very few studies provide insight into the effect of reference genome selection on the detection of meaningful protein interactions.
We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled from 565 bacteria according to the phylogenetic diversity of the organisms and the relationships between them. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in the DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods.
High performance in predicting protein-protein interactions was achievable with as few as 100-150 of the 565 bacterial genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that, to obtain good performance, it is better to sample a few genomes from related genera of prokaryotes out of the large number of available genomes. Moreover, such sampling allows 50-100 genomes to be selected for comparable prediction accuracy when computational resources are limited.
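The dependence of these scores on the reference genome set can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy genome labels (`g1` to `g6`), the presence/absence profiles and the distance values are all hypothetical, and Pearson correlation is assumed as the similarity measure for both phylogenetic profiles and mirrortree-style distance comparisons.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def profile_similarity(profile_a, profile_b, reference_genomes):
    """Phylogenetic profiling: correlate presence (1) / absence (0) of
    homologs of two proteins over the chosen reference genomes only."""
    xs = [profile_a[g] for g in reference_genomes]
    ys = [profile_b[g] for g in reference_genomes]
    return pearson(xs, ys)


def mirrortree_score(dist_a, dist_b, reference_genomes):
    """Mirrortree-style score: correlate the pairwise evolutionary distances
    between orthologs of protein A with those of protein B."""
    genomes = sorted(reference_genomes)
    pairs = [(g, h) for i, g in enumerate(genomes) for h in genomes[i + 1:]]
    xs = [dist_a[p] for p in pairs]
    ys = [dist_b[p] for p in pairs]
    return pearson(xs, ys)


# Hypothetical toy data: homolog presence/absence in six reference genomes.
profile_a = {"g1": 1, "g2": 1, "g3": 0, "g4": 0, "g5": 1, "g6": 0}
profile_b = {"g1": 1, "g2": 1, "g3": 0, "g4": 0, "g5": 1, "g6": 1}

all_genomes = ["g1", "g2", "g3", "g4", "g5", "g6"]
sampled_genomes = ["g1", "g2", "g3", "g4", "g5"]  # one genome left out

score_all = profile_similarity(profile_a, profile_b, all_genomes)
score_sampled = profile_similarity(profile_a, profile_b, sampled_genomes)

# Hypothetical ortholog distances, keyed by sorted genome pairs.
dist_a = {("g1", "g2"): 0.1, ("g1", "g3"): 0.4, ("g2", "g3"): 0.5}
dist_b = {pair: 2 * d for pair, d in dist_a.items()}  # same tree shape, scaled
tree_score = mirrortree_score(dist_a, dist_b, ["g1", "g2", "g3"])

print(score_all, score_sampled, tree_score)
```

In this toy case the same protein pair receives a different profile score depending on which genomes are included, which is the kind of effect the study quantifies across its six reference genome sets.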

  • Article: Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners.
    ABSTRACT: Recent advances in high-throughput experimental methods for the identification of protein interactions have resulted in a large amount of diverse data that are somewhat incomplete and contradictory. As valuable as they are, such experimental approaches studying protein interactomes have certain limitations that can be complemented by the computational methods for predicting protein interactions. In this review we describe different approaches to predict protein interaction partners as well as highlight recent achievements in the prediction of specific domains mediating protein-protein interactions. We discuss the applicability of computational methods to different types of prediction problems and point out limitations common to all of them.
    PLoS Computational Biology 05/2007; 3(4):e43. · 5.22 Impact Factor
  • Article: Evolution of biomolecular networks: lessons from metabolic and protein interactions.
    ABSTRACT: Despite only becoming popular at the beginning of this decade, biomolecular networks are now frameworks that facilitate many discoveries in molecular biology. The nodes of these networks are usually proteins (specifically enzymes in metabolic networks), whereas the links (or edges) are their interactions with other molecules. These networks are made up of protein-protein interactions or enzyme-enzyme interactions through shared metabolites in the case of metabolic networks. Evolutionary analysis has revealed that changes in the nodes and links in protein-protein interaction and metabolic networks are subject to different selection pressures owing to distinct topological features. However, many evolutionary constraints can be uncovered only if temporal and spatial aspects are included in the network analysis.
    Nature Reviews Molecular Cell Biology 11/2009; 10(11):791-803. · 39.12 Impact Factor
  • Article: Getting connected: analysis and principles of biological networks.
    ABSTRACT: The execution of complex biological processes requires the precise interaction and regulation of thousands of molecules. Systematic approaches to study large numbers of proteins, metabolites, and their modification have revealed complex molecular networks. These biological networks are significantly different from random networks and often exhibit ubiquitous properties in terms of their structure and organization. Analyzing these networks provides novel insights in understanding basic mechanisms controlling normal cellular processes and disease pathologies.
    Genes & Development 06/2007; 21(9):1010-24. · 11.66 Impact Factor

Keywords

archaeal genomes
available genomes
comparable accuracy
computational methods
computational resources
Escherichia coli
functional protein-protein interactions
functionally interacting proteins
genomic context methods
good performance
meaningful protein interactions
multiple genomes
new insights
phylogenetic diversity
phylogenetic trees
prediction methods
protein pairs
protein-protein interactions
reference genome selection
reference genomes