Article

Large-scale assignment of orthology: back to phylogenetics? Genome Biol 9:235

Bioinformatics and Genomics Program, Center for Genomic Regulation, Doctor Aiguader 88, Barcelona, Spain.
Genome Biology (Impact Factor: 10.47). 11/2008; 9(10):235. DOI: 10.1186/gb-2008-9-10-235
Source: PubMed

ABSTRACT Reliable orthology prediction is central to comparative genomics. Although orthology is defined by phylogenetic criteria, most automated prediction methods are based on pairwise sequence comparisons. Recently, automated phylogeny-based orthology prediction has emerged as a feasible alternative for genome-wide studies.

  • Source
    • "A number of other alternative orthology inference pipelines also suffer from using similarity measurements as approximations to directly infer orthology (Li et al. 2003; Roure et al. 2007; Schreiber et al. 2009; Altenhoff et al. 2011, 2013). Given the incomplete and noisy nature of transcriptomic and low-coverage genomic data, orthology is best inferred by using phylogenies to separate paralogs and orthologs after homology has been established (Gabaldón 2008). A variety of tree-based orthology inference methods have been developed."
    ABSTRACT: Orthology inference is central to phylogenomic analyses. Phylogenomic data sets commonly include transcriptomes and low-coverage genomes that are incomplete and contain errors and isoforms. These properties can severely violate the underlying assumptions of orthology inference with existing heuristics. We present a procedure that uses phylogenies for both homology and orthology assignment. The procedure first uses similarity scores to infer putative homologs that are then aligned, constructed into phylogenies, and pruned of spurious branches caused by deep paralogs, misassembly, frame shifts, or recombination. These final homologs are then used to identify orthologs. We explore four alternative tree-based orthology inference approaches, of which two are new. These accommodate gene and genome duplications as well as gene tree discordance. We demonstrate these methods in three published data sets including the grape family, Hymenoptera, and millipedes with divergence times ranging from ca. 100 Ma to over 400 Ma. The procedure significantly increased the completeness and accuracy of the inferred homologs and orthologs. We also found that data sets that are more recently diverged and/or include more high-coverage genomes had more complete sets of orthologs. To explicitly evaluate sources of conflicting phylogenetic signals, we applied serial jackknife analyses of gene regions keeping each locus intact. The methods described here can scale to over 100 taxa. They have been implemented in python with independent scripts for each step, making it easy to modify or incorporate them into existing pipelines. All scripts are available from https://bitbucket.org/yangya/phylogenomic_dataset_construction.
    Molecular Biology and Evolution 08/2014; 31(11). DOI:10.1093/molbev/msu245 · 14.31 Impact Factor
  • Source
    • "Amino acid sequences were aligned using MAFFT, and gene trees were obtained with RAxML as described earlier. To overcome common biases related to poorly resolved phylogenies (Hahn 2007), we used an approach similar to that described as the species-overlap method (Gabaldón 2008). When faced with disagreement between the gene and species trees, we used a conservative criterion that takes into account short branch lengths and the known problems of incomplete lineage sorting that lead to inconsistencies across genes in the position of D. willistoni (Tamura et al. 2004; Obbard et al. 2012) and the relationships among D. yakuba, D. erecta, and the melanogaster cluster (Pollard et al. 2006)."
    ABSTRACT: Gene turnover rates and the evolution of gene family sizes are important aspects of genome evolution. Here, we use curated sequence data of the major chemosensory gene families from Drosophila - the gustatory receptor (GR), odorant receptor (OR), ionotropic receptor (IR), and odorant binding protein (OBP) families - to conduct a comparative analysis among families, exploring different methods to estimate gene birth and death rates, including an ad hoc simulation study. Remarkably, we found that the state-of-the-art methods may produce very different rate estimates, which may lead to disparate conclusions regarding the evolution of chemosensory gene family sizes in Drosophila. Among biological factors, we found that a peculiarity of D. sechellia's gene turnover rates was a major source of bias in global estimates, whereas gene conversion had negligible effects for the families analyzed herein. Turnover rates vary considerably among families, subfamilies and ortholog groups, although all analyzed families were quite dynamic in terms of gene turnover. Computer simulations showed that the methods that use ortholog group information appear to be the most accurate for the Drosophila chemosensory families. Most importantly, these results reveal the potential of rate heterogeneity among lineages to severely bias some turnover rate estimation methods and the need of further evaluating the performance of these methods in a more diverse sampling of gene families and phylogenetic contexts. Using branch-specific codon substitution models, we find further evidence of positive selection in recently duplicated genes, which attests to a non-neutral aspect of the gene birth-and-death process.
    Genome Biology and Evolution 06/2014; 6(7). DOI:10.1093/gbe/evu130 · 4.53 Impact Factor
  • Source
    • "GBSSI sequences of some diploid FEVRE samples showed evidence of two divergent types that were assumed to be the product of a duplication event. Following a 'species-overlap rule' (Gabaldón, 2008), we inferred that a node was associated with a speciation event if its branches had mutually exclusive sets of diploid species; in contrast, a node with overlapping sets of diploid species was associated with a duplication event. Polyploid samples were excluded from this analysis because homoeologous copies from widely divergent parents might be placed on different branches, hence increasing the number of spurious duplication events."
    ABSTRACT: The fine-leaved Loliinae is one of the temperate grass lineages that is richest in number of evolutionary switches from perennial to annual life-cycle, and also shows one of the most complex reticulate patterns involving distinct diploid and allopolyploid lineages. Eight distinct annual lineages, that have traditionally been placed in the genus Vulpia and in other fine-leaved ephemeral genera, have apparently emerged from different perennial Festuca ancestors. The phenotypically similar Vulpia taxa have been reconstructed as polyphyletic, with polyploid lineages showing unclear relationships to their purported diploid relatives. Interspecific and intergeneric hybridization is, however, rampant across different lineages. An evolutionary analysis based on cloned nuclear low-copy GBSSI (Granule-Bound Starch Synthase I) and multicopy ITS (Internal Transcribed Spacer) sequences has been conducted on representatives of most Vulpia species and other fine-leaved lineages, using Bayesian consensus and agreement trees, networking split graphs and species tree-based approaches, to disentangle their phylogenetic relationships and to identify the parental genome donors of the allopolyploids. Both data sets were able to reconstruct a congruent phylogeny in which Vulpia was resolved as polyphyletic from at least three main ancestral diploid lineages. These, in turn, participated in the origin of the derived allopolyploid Vulpia lineages together with other Festuca-like, Psilurus-like and some unknown genome donors. Long-distance dispersal events were inferred to explain the polytopic origin of the Mediterranean and American Vulpia lineages.
    Molecular Phylogenetics and Evolution 06/2014; 79. DOI:10.1016/j.ympev.2014.06.009 · 4.02 Impact Factor
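
The species-overlap rule quoted in the excerpt above can be illustrated with a short sketch. It is not code from any of the cited papers. It assumes a strictly binary gene tree written as nested Python tuples, with leaves named `species|gene`; the function name `classify_nodes` and the toy tree are hypothetical. A node is labelled a duplication when the species sets of its two child branches overlap, and a speciation when they are mutually exclusive.

```python
def classify_nodes(node, events):
    """Return the set of species below `node`.

    For every internal node, append a tuple
    (event, left_species, right_species) to `events`, visiting
    children before parents (post-order).
    Assumption: leaves are strings of the form "species|gene" and
    every internal node is a 2-tuple (binary tree).
    """
    if isinstance(node, str):
        # Leaf: the species name is the part before the "|" separator.
        return {node.split("|")[0]}

    left, right = node
    left_species = classify_nodes(left, events)
    right_species = classify_nodes(right, events)

    # Species-overlap rule: any shared species implies a duplication.
    if left_species & right_species:
        event = "duplication"
    else:
        event = "speciation"
    events.append((event, sorted(left_species), sorted(right_species)))

    return left_species | right_species


# Hypothetical toy gene tree: two paralogous clades (A1, A2), each with a
# human and a mouse copy, plus a single fly outgroup gene.
tree = ((("human|A1", "mouse|A1"), ("human|A2", "mouse|A2")), "fly|A")

events = []
all_species = classify_nodes(tree, events)

for event, left_sp, right_sp in events:
    print(event, left_sp, right_sp)
```

In this toy tree, the node joining the A1 and A2 clades has human and mouse on both sides, so it is labelled a duplication; the remaining nodes separate disjoint species sets and are labelled speciations. Real gene trees may contain polytomies or poorly supported branches, which this sketch does not handle.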