EnsemblCompara GeneTrees: Complete, Duplication-Aware Phylogenetic Trees in Vertebrates

EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
Genome Research (Impact Factor: 14.63). 12/2008; 19(2):327-35. DOI: 10.1101/gr.073585.107


We have developed a comprehensive gene-oriented phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline that handles clustering, multiple alignment, and tree generation, including for large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree-building methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach with clustering approaches for ortholog prediction and found a large increase in coverage with the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project.
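
The abstract contrasts tree-based ortholog prediction with clustering. The sketch below shows how a duplication-aware gene tree can yield ortholog and paralog calls, assuming the simple species-overlap rule: an internal node is labeled a duplication when its two child subtrees share at least one species, and each gene pair takes the label of its lowest common ancestor. This is a simplification, not TreeBeST's reconciliation against a species tree, and `overlap_score` is an illustrative ratio rather than either of the paper's metrics. The tree, gene IDs, and species labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Binary gene-tree node; leaves carry a gene ID and its species."""
    gene: str | None = None
    species: str | None = None
    children: list[Node] = field(default_factory=list)

    def leaves(self) -> list[Node]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def species_set(self) -> set[str]:
        return {leaf.species for leaf in self.leaves()}


def is_duplication(node: Node) -> bool:
    """Species-overlap rule: children sharing any species implies a duplication."""
    left, right = node.children
    return bool(left.species_set() & right.species_set())


def overlap_score(node: Node) -> float:
    """Illustrative ratio: shared species / all species under the node."""
    left, right = node.children
    a, b = left.species_set(), right.species_set()
    return len(a & b) / len(a | b)


def homology_calls(root: Node) -> dict[tuple[str, str], str]:
    """Label every gene pair by the event at its lowest common ancestor."""
    calls: dict[tuple[str, str], str] = {}

    def visit(node: Node) -> None:
        if not node.children:
            return
        left, right = node.children
        label = "paralog" if is_duplication(node) else "ortholog"
        # Each left-by-right leaf pair has this node as its LCA.
        for a in left.leaves():
            for b in right.leaves():
                calls[tuple(sorted((a.gene, b.gene)))] = label
        for child in node.children:
            visit(child)

    visit(root)
    return calls


def leaf(gene: str, species: str) -> Node:
    return Node(gene=gene, species=species)


# Hypothetical family: one duplication before the human/mouse split.
tree = Node(children=[
    Node(children=[leaf("human_A", "human"), leaf("mouse_A", "mouse")]),
    Node(children=[leaf("human_B", "human"), leaf("mouse_B", "mouse")]),
])
calls = homology_calls(tree)
print(calls[("human_A", "mouse_A")])  # ortholog
print(calls[("human_A", "mouse_B")])  # paralog
print(overlap_score(tree))  # 1.0
```

A clustering approach would place all four genes in one group; the tree additionally separates the two ortholog pairs from the cross-copy paralog pairs.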



Available from: Jessica Severin, Mar 03, 2014
    • "Generating a phylogenetic profile for a human gene involves first identifying its orthologs in other species (homologs derived vertically from a common ancestor and expected to share the same function [Koonin, 2005]). Orthology inference is a mature field, with a large number of graph-based (clustering based on sequence similarity scores, e.g., BLAST) and tree-based (reconciliation of gene trees inferred from sequence similarity with the species tree) algorithms (Huerta-Cepas et al., 2014; Li et al., 2003; Powell et al., 2014; Schreiber et al., 2014; Tatusov et al., 1997; Vilella et al., 2009)."
    ABSTRACT: Information about functional connections between genes can be derived from patterns of coupled loss of their homologs across multiple species. This comparative approach, termed phylogenetic profiling, has been successfully used to infer genetic interactions in bacteria and eukaryotes. Rapid progress in sequencing eukaryotic species has enabled the recent phylogenetic profiling of the human genome, resulting in systematic functional predictions for uncharacterized human genes. Importantly, groups of co-evolving genes reveal widespread modularity in the underlying genetic network, facilitating experimental analyses in human cells as well as comparative studies of conserved functional modules across species. This strategy is particularly successful in identifying novel metabolic proteins and components of multi-protein complexes. The targeted sequencing of additional key eukaryotes and the incorporation of improved methods to generate and compare phylogenetic profiles will further boost the predictive power and utility of this evolutionary approach to the functional analysis of gene interaction networks.
    • "We annotated TFs with an activator or repressor role using the GO terms 'positive regulation of transcription' and 'negative regulation of transcription' (Ashburner et al. 2000). We used the paralogous gene dating information in Ensembl Compara (Vilella et al. 2009) to extract human TF families whose duplication time was consistent with the 2R-WGD at the base of the vertebrates before the formation of jawed vertebrates (Euteleostomi paralogy type). For each gene family member, orthologous sequences from Mus musculus (mouse), Gallus gallus (chicken), and Danio rerio (zebrafish) were retrieved from Ensembl."
    ABSTRACT: The high regulatory complexity of vertebrates has been related to two closely spaced whole genome duplications (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized in diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families originated during the 2R-WGD contain LCRs. The number of gene duplicates with LCRs is 155 out of 550 analyzed (28%), about twice as many as the number of single copy genes with LCRs (15 out of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9. In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes.
    Molecular Biology and Evolution 04/2015; 32(9). DOI:10.1093/molbev/msv103 · 9.11 Impact Factor
    • "Information on gene trees and gene order were downloaded from Ensembl v.57 (Flicek et al., 2013) for all available genomes (51 species). For yeasts, the gene order information was obtained from Genolevures for 11 species (Sherman et al., 2009); gene trees were built using TreeBest (Vilella et al., 2009). The ancestral genome reconstruction method computes pairwise comparisons of gene order for all pairs of species that are informative for the ancestor of interest; i.e., the ancestor is on the pathway between both species in the phylogenetic tree."
    ABSTRACT: Genomic rearrangements are a major source of evolutionary divergence in eukaryotic genomes, a cause of genetic diseases and a hallmark of tumor cell progression, yet the mechanisms underlying their occurrence and evolutionary fixation are poorly understood. Statistical associations between breakpoints and specific genomic features suggest that genomes may contain elusive "fragile regions" with a higher propensity for breakage. Here, we use ancestral genome reconstructions to demonstrate a near-perfect correlation between gene density and evolutionary rearrangement breakpoints. Simulations based on functional features in the human genome show that this pattern is best explained as the outcome of DNA breaks that occur in open chromatin regions coming into 3D contact in the nucleus. Our model explains how rearrangements reorganize the order of genes in an evolutionary neutral fashion and provides a basis for understanding the susceptibility of "fragile regions" to breakage.
    Cell Reports 03/2015; 10(11). DOI:10.1016/j.celrep.2015.02.046 · 8.36 Impact Factor
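
Two of the citing works rely on simple species-tree operations: dating a duplication by the taxon of the most recent common ancestor of the species beneath it (e.g. a paralog pair dated to Euteleostomi), and selecting species pairs whose connecting path passes through an ancestor of interest. The sketch below implements both, assuming a species tree stored as a child-to-parent map. The pruned four-species topology is a toy, not Ensembl's taxonomy. The dating rule also assumes no gene loss, since losing a copy in the earliest-branching lineage would make a duplication look younger. Neither function reproduces the cited pipelines.

```python
from collections.abc import Iterable

# Pruned toy species tree stored as child -> parent (hypothetical subset).
PARENT = {
    "human": "Mammalia", "mouse": "Mammalia",
    "Mammalia": "Amniota", "chicken": "Amniota",
    "Amniota": "Euteleostomi", "zebrafish": "Euteleostomi",
}


def ancestors(taxon: str) -> list[str]:
    """The taxon itself followed by every ancestor up to the root."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca(taxa: Iterable[str]) -> str:
    """Lowest common ancestor of one or more taxa."""
    taxa = list(taxa)
    shared = set(ancestors(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(ancestors(taxon))
    # Walking upward from any taxon, the first shared node is the lowest one.
    return next(a for a in ancestors(taxa[0]) if a in shared)


def duplication_age(species_below: Iterable[str]) -> str:
    """Date a duplication node by the LCA of the species beneath it."""
    return lca(species_below)


def on_path(ancestor: str, s1: str, s2: str) -> bool:
    """True if `ancestor` lies on the species-tree path joining s1 and s2."""
    meet = lca([s1, s2])
    descends = ancestor in ancestors(s1) or ancestor in ancestors(s2)
    # The path never rises above the meeting point of s1 and s2.
    return descends and meet in ancestors(ancestor)


print(duplication_age(["human", "mouse", "chicken", "zebrafish"]))  # Euteleostomi
print(on_path("Amniota", "human", "chicken"))  # True
print(on_path("Amniota", "human", "mouse"))  # False
```

Under the criterion quoted above, the human/mouse pair is uninformative for the amniote ancestor because their path turns at Mammalia and never reaches Amniota.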