Article

Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs.

Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.
Genome Research (impact factor: 13.61). 06/2004; 14(6):1107-18. DOI:10.1101/gr.1774904
Source: PubMed

ABSTRACT: Proteins function mainly through interactions, especially with DNA and other proteins. While some large-scale interaction networks are now available for a number of model organisms, their experimental generation remains difficult. Consequently, interolog mapping--the transfer of interaction annotation from one organism to another using comparative genomics--is of significant value. Here we quantitatively assess the degree to which interologs can be reliably transferred between species as a function of the sequence similarity of the corresponding interacting proteins. Using interaction information from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Helicobacter pylori, we find that protein-protein interactions can be transferred when a pair of proteins has a joint sequence identity >80% or a joint E-value <10^-70. (These "joint" quantities are the geometric means of the identities or E-values for the two pairs of interacting proteins.) We generalize our interolog analysis to protein-DNA binding, finding such interactions are conserved at specific thresholds between 30% and 60% sequence identity depending on the protein family. Furthermore, we introduce the concept of a "regulog"--a conserved regulatory relationship between proteins across different species. We map interologs and regulogs from yeast to a number of genomes with limited experimental annotation (e.g., Arabidopsis thaliana) and make these available through an online database at http://interolog.gersteinlab.org. Specifically, we are able to transfer approximately 90,000 potential protein-protein interactions to the worm. We test a number of these in two-hybrid experiments and are able to verify 45 overlaps, which we show to be statistically significant.
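The "joint" quantities described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the input convention (one percent identity and one E-value per homolog pair), and the log-space computation of the E-value mean are assumptions made for illustration. Only the geometric-mean definition and the two thresholds (>80% joint identity, <10^-70 joint E-value) come from the abstract.

```python
import math


def joint_identity(identity_a: float, identity_b: float) -> float:
    """Geometric mean of two percent identities (each in the range 0-100)."""
    return math.sqrt(identity_a * identity_b)


def joint_evalue(evalue_a: float, evalue_b: float) -> float:
    """Geometric mean of two E-values, computed in log space.

    Working with logarithms avoids underflow when both E-values are tiny.
    An E-value reported as exactly 0 is treated as a joint E-value of 0
    (an assumption of this sketch).
    """
    if evalue_a <= 0 or evalue_b <= 0:
        return 0.0
    return 10 ** ((math.log10(evalue_a) + math.log10(evalue_b)) / 2)


def is_transferable(identity_a: float, identity_b: float,
                    evalue_a: float, evalue_b: float,
                    min_identity: float = 80.0,
                    max_evalue: float = 1e-70) -> bool:
    """Apply the abstract's thresholds: joint identity >80% OR joint E-value <1e-70."""
    return (joint_identity(identity_a, identity_b) > min_identity
            or joint_evalue(evalue_a, evalue_b) < max_evalue)


# Hypothetical example: protein A matches its homolog at 90% identity,
# protein B matches its homolog at 85% identity.
print(is_transferable(90.0, 85.0, 1e-50, 1e-40))
```

Because the geometric mean is pulled down by the weaker of the two matches, a pair in which one protein is highly conserved and the other is only distantly related will not pass the identity threshold on the strength of the first protein alone.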

  • Article: Comparative assessment of large-scale data sets of protein-protein interactions.
    ABSTRACT: Comprehensive protein-protein interaction maps promise to reveal many aspects of the complex regulatory network underlying cellular function. Recently, large-scale approaches have predicted many new protein interactions in yeast. To measure their accuracy and potential as well as to identify biases, strengths and weaknesses, we compare the methods with each other and with a reference set of previously reported protein interactions.
    Nature 06/2002; 417(6887):399-403. · 36.28 Impact Factor
  • Article: MIPS: a database for genomes and protein sequences.
    ABSTRACT: The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried, near Munich, Germany, continues its longstanding tradition to develop and maintain high quality curated genome databases. In addition, efforts have been intensified to cover the wealth of complete genome sequences in a systematic, comprehensive form. Bioinformatics, supporting national as well as European sequencing and functional analysis projects, has resulted in several up-to-date genome-oriented databases. This report describes growing databases reflecting the progress of sequencing the Arabidopsis thaliana (MATDB) and Neurospora crassa genomes (MNCDB), the yeast genome database (MYGD) extended by functional analysis data, the database of annotated human EST-clusters (HIB) and the database of the complete cDNA sequences from the DHGP (German Human Genome Project). It also contains information on the up-to-date database of complete genomes (PEDANT), the classification of protein sequences (ProtFam) and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database. These databases can be accessed through the MIPS WWW server (http://www.mips.biochem.mpg.de).
    Nucleic Acids Research 02/2000; 28(1):37-40. · 8.03 Impact Factor
  • Article: A genomic perspective on protein families.
    ABSTRACT: In order to extract the maximum amount of information from the rapidly accumulating genome sequences, all conserved genes need to be classified according to their homologous relationships. Comparison of proteins encoded in seven complete genomes from five major phylogenetic lineages and elucidation of consistent patterns of sequence similarities allowed the delineation of 720 clusters of orthologous groups (COGs). Each COG consists of individual orthologous proteins or orthologous sets of paralogs from at least three lineages. Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG. This relation automatically yields a number of functional predictions for poorly characterized genomes. The COGs comprise a framework for functional and evolutionary genome analysis.
    Science 11/1997; 278(5338):631-7. · 31.20 Impact Factor

Keywords

60% sequence identity
 
Arabidopsis thaliana
 
Caenorhabditis elegans
 
corresponding interacting proteins
 
different species
 
Drosophila melanogaster
 
Helicobacter pylori
 
interacting proteins
 
interaction annotation
 
interactions
 
joint sequence identity >80%
 
large-scale interaction networks
 
limited experimental annotation
 
model organisms
 
online database
 
protein family
 
protein-protein interactions
 
sequence similarity
 
two-hybrid experiments
 
verify 45 overlaps