EnsemblCompara GeneTrees: Complete, Duplication-Aware Phylogenetic Trees in Vertebrates

EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
Genome Research (Impact Factor: 14.63). 12/2008; 19(2):327-35. DOI: 10.1101/gr.073585.107
Source: PubMed


We have developed a comprehensive gene-oriented phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline that handles clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project.

Available from: Jessica Severin, Mar 03, 2014
    • "We also performed an orthology analysis by sequence comparison between grapevine MADS-box deduced proteins and proteins from 16 plant species retrieved at in order to obtain information about the possible function of those genes (Fig. 3). The objective was to identify the sequences that show orthology in a given species with a minimal level of complexity (one ortholog in each species, or orthology one-to-one as previously described [43])."
    ABSTRACT: Background: MADS-box genes encode transcription factors that are involved in developmental control and signal transduction in eukaryotes. In plants, they are associated with numerous developmental processes, most notably those related to reproductive development: flowering induction, specification of inflorescence and flower meristems, establishment of flower organ identity, as well as regulation of fruit, seed and embryo development. Genomic analyses of MADS-box genes in different plant species are providing new relevant information on the function and evolution of this transcription factor family. We have performed a true genome-wide analysis of the complete set of MADS-box genes in grapevine (Vitis vinifera), analyzed their expression patterns and established their phylogenetic relationships (including MIKC* and type I MADS-box) with genes from 16 other plant species. This study was integrated with previous work on the family in grapevine. Results: A total of 90 MADS-box genes were detected in the grapevine reference genome by completing current gene annotations with a genome-wide analysis based on sequence similarity. We performed a thorough curation of all gene models and combined the results with gene expression information, including RNAseq data, to clarify the expression of newly identified genes and improve their functional characterization. Curated data were uploaded to the ORCAE database for grapevine as part of the grapevine genome curation effort. This approach resulted in the identification of 30 additional MADS-box genes. Among them, ten new MIKCC genes were identified, including a potential new group of short proteins similar to the SVP protein subfamily. The MIKC* subgroup contains six genes in grapevine that can be grouped in the S (4 genes) and P (2 genes) clades, showing less redundancy than that observed in Arabidopsis thaliana. The expression pattern of these genes in grapevine is compatible with a role in male gametophyte development. Most of the newly identified genes belong to the type I MADS-box genes and were classified as members of the Mα and Mγ subclasses. Our analyses indicate that only a few type I genes in grapevine have homologs in other species and that species-specific clades appeared in both the Mα and Mγ subclasses. On the other hand, as deduced from the phylogenetic analysis with other plant species, genes that can be crucial for development of the central cell, endosperm and embryo seem to be conserved in plants. Conclusions: The genome analysis of MADS-box genes in grapevine, the characterization of their expression patterns and the phylogenetic analysis with other plant species allowed the identification of new MADS-box genes not yet described in other plant species, as well as a basic characterization of their possible roles, particularly in the case of type I and MIKC* genes. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2398-7) contains supplementary material, which is available to authorized users.
    Full-text · Article · Dec 2016 · BMC Genomics
    • "We confirmed that CG14619, which is a CCG [37,38], is the orthologue of mUsp2 by protein sequence alignments (S1 Fig) and that dUsp2-kd suffer from subviability ([33]; Fig 2C–2F, S1 and S2 Files). Interestingly, dUsp2 and dUsp8, which is a core clock cogwheel in Drosophila, are paralogues and possibly differentially evolved towards output effector and core cogwheel status, respectively [36,48]. We found no significant alteration of the circadian free-running period either in the clock-neuron-specific knockdown of dUsp2 in Drosophila or in our Usp2-KO mouse (Fig 1, Fig 2A and 2B, Table 1)."
    ABSTRACT: The mammalian circadian clock influences most aspects of physiology and behavior through the transcriptional control of a wide variety of genes, mostly in a tissue-specific manner. About 20 clock-controlled genes (CCGs) oscillate in virtually all mammalian tissues and are generally considered core clock components. One of them is Ubiquitin-Specific Protease 2 (Usp2), whose status remains controversial, as it may be a cogwheel regulating the stability or activity of core cogwheels or an output effector. We report here that Usp2 is a clock output effector related to bodily Ca2+ homeostasis, a feature that is conserved across evolution. Drosophila with a whole-body knockdown of the orthologue of Usp2, CG14619 (dUsp2-kd), predominantly die during pupation but are rescued by dietary Ca2+ supplementation. Usp2-KO mice show hyperabsorption of dietary Ca2+ in the small intestine, likely due to strong overexpression of the membrane scaffold protein NHERF4, a regulator of the Ca2+ channel TRPV6 mediating dietary Ca2+ uptake. In this tissue, USP2-45 is found in membrane fractions and negatively regulates NHERF4 protein abundance in a rhythmic manner at the protein level. In clock mutant animals (Cry1/Cry2-dKO), rhythmic USP2-45 expression is lost, as is that of NHERF4, confirming the inverse relationship between USP2-45 and NHERF4 protein levels. Finally, USP2-45 interacts in vitro with NHERF4 and endogenous Clathrin Heavy Chain. Taken together, these data prompt us to define USP2-45 as the first clock output effector acting at the post-translational level at cell membranes and possibly regulating membrane permeability of Ca2+.
    Full-text · Article · Jan 2016 · PLoS ONE
    • "Generating a phylogenetic profile for a human gene involves first identifying its orthologs in other species (homologs derived vertically from a common ancestor and expected to share the same function [Koonin, 2005]). Orthology inference is a mature field, with a large number of graph-based (clustering based on sequence similarity scores, e.g., BLAST) and tree-based (reconciliation of gene trees inferred from sequence similarity with the species tree) algorithms (Huerta-Cepas et al., 2014; Li et al., 2003; Powell et al., 2014; Schreiber et al., 2014; Tatusov et al., 1997; Vilella et al., 2009). "
    ABSTRACT: Information about functional connections between genes can be derived from patterns of coupled loss of their homologs across multiple species. This comparative approach, termed phylogenetic profiling, has been successfully used to infer genetic interactions in bacteria and eukaryotes. Rapid progress in sequencing eukaryotic species has enabled the recent phylogenetic profiling of the human genome, resulting in systematic functional predictions for uncharacterized human genes. Importantly, groups of co-evolving genes reveal widespread modularity in the underlying genetic network, facilitating experimental analyses in human cells as well as comparative studies of conserved functional modules across species. This strategy is particularly successful in identifying novel metabolic proteins and components of multi-protein complexes. The targeted sequencing of additional key eukaryotes and the incorporation of improved methods to generate and compare phylogenetic profiles will further boost the predictive power and utility of this evolutionary approach to the functional analysis of gene interaction networks.
    Full-text · Article · Aug 2015