Resolving the Ortholog Conjecture: Orthologs Tend to Be Weakly, but Significantly, More Similar in Function than Paralogs

University of California Davis, United States of America
PLoS Computational Biology (Impact Factor: 4.62). 05/2012; 8(5):e1002514. DOI: 10.1371/journal.pcbi.1002514
Source: PubMed


The function of most proteins is not determined experimentally, but is extrapolated from homologs. According to the "ortholog conjecture", or standard model of phylogenomics, protein function changes rapidly after duplication, leading to paralogs with different functions, while orthologs retain the ancestral function. We report here that a comparison of experimentally supported functional annotations among homologs from 13 genomes mostly supports this model. We show that to analyze GO annotation effectively, several confounding factors need to be controlled: authorship bias, variation of GO term frequency among species, variation of background similarity among species pairs, and propagated annotation bias. After controlling for these biases, we observe that orthologs have generally more similar functional annotations than paralogs. This is especially strong for sub-cellular localization. We observe only a weak decrease in functional similarity with increasing sequence divergence. These findings hold over a large diversity of species; notably orthologs from model organisms such as E. coli, yeast or mouse have conserved function with human proteins.

Available from: Romain Studer
    • "One of the benefits of extensive and accurate prediction of orthologs across plant species is the ability to project functional annotation between pairs of orthologous genes on the assumption that orthologues generally retain function between species (Altenhoff et al., 2012 "
    ABSTRACT: Recent developments in DNA sequencing have enabled the large and complex genomes of many crop species to be determined for the first time, even those previously intractable due to their polyploid nature. Indeed, over the course of the last two years, the genome sequences of several commercially important cereals, notably barley and bread wheat, have become available, as well as those of related wild species. While still incomplete, comparison to other, more completely assembled species suggests that coverage of genic regions is likely to be high. Ensembl Plants ( is an integrative resource organising, analysing and visualising genome-scale information for important crop and model plants. Available data includes reference genome sequence, variant loci, gene models and functional annotation. For variant loci, individual and population genotypes, linkage information and, where available, phenotypic information, are shown. Comparative analyses are performed on DNA and protein sequence alignments. The resulting genome alignments and gene trees, representing the implied evolutionary history of the gene family, are made available for visualisation and analysis. Driven by the use case of bread wheat, specific extensions to the analysis pipelines and web interface have recently been developed to support polyploid genomes. Data in Ensembl Plants is accessible through a genome browser incorporating various specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These interfaces are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests and pollinators, facilitating the study of the plant in its environment. © The Author(s) 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
    Full-text · Article · Nov 2014 · Plant and Cell Physiology
    • "Several consequent studies suggested that GO annotations should be used to test the OC hypothesis with a great caution [30, 31] or even should not be used for this purpose [32]. A general consensus is that GO annotations are compatible with the OC hypothesis [30, 32], although Altenhoff and coworkers suggested that GO annotations are better compatible with the “uniform” model [31]. "
    ABSTRACT: Ongoing debates about functional importance of gene duplications have been recently intensified by a heated discussion of the "ortholog conjecture" (OC). Under the OC, which is central to functional annotation of genomes, orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of gene ontology (GO) annotations and expression profiles, among within-species paralogs compared to orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. Subsequent studies suggested that the OC appears to be generally valid when applied to mammalian evolution but the complete picture of evolution of gene expression also has to incorporate lineage-specific aspects of paralogy. The observed complexity of gene expression evolution after duplication can be explained through selection for gene dosage effect combined with the duplication-degeneration-complementation model. This paper discusses expression divergence of recent duplications occurring before functional divergence of proteins encoded by duplicate genes.
    Full-text · Article · Aug 2014
    • "For each collected homeologous pair, an orthogroup was constructed consisting of the homeologous pair and their orthologs in other plant species, since orthology relationships provide the most accurate representation of the followed evolutionary history (Fawcett et al. 2009; Altenhoff et al. 2012; Gabaldon and Koonin 2013). We used Inparanoid (v4.1) (Ostlund et al. 2010) with default parameter settings to detect orthologs. "
    ABSTRACT: Ancient whole genome duplications (WGDs), also referred to as paleopolyploidizations, have been reported in most evolutionary lineages. Their attributed role remains a major topic of discussion, ranging from an evolutionary dead end to a road towards evolutionary success, with evidence supporting both fates. Previously, based on dating WGDs in a limited number of plant species, we found a clustering of angiosperm paleopolyploidizations around the Cretaceous-Paleogene (K-Pg) extinction event about 66 million years ago. Here, we revisit this finding, which has proven controversial, by combining genome sequence information for many more plant lineages and using more sophisticated analyses. We included 38 full genome sequences and three transcriptome assemblies in a Bayesian evolutionary analysis framework that incorporates uncorrelated relaxed clock methods and fossil uncertainty. In accordance with earlier findings, we demonstrate a strongly non-random pattern of genome duplications over time with many WGDs clustering around the K-Pg boundary. We interpret these results in the context of recent studies on invasive polyploid plant species, and suggest that polyploid establishment is promoted during times of environmental stress. We argue that considering the evolutionary potential of polyploids in light of the environmental and ecological conditions present around the time of polyploidization could mitigate the stark contrast in the proposed evolutionary fates of polyploids.
    Full-text · Article · May 2014 · Genome Research