EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates.

EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
Genome Research (Impact Factor: 13.85). 12/2008; 19(2):327-35. DOI: 10.1101/gr.073585.107
Source: PubMed

ABSTRACT We have developed a comprehensive gene-oriented phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline that handles clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project.

  • Source
    ABSTRACT: The evolution of animals involved acquisition of an emergent gene repertoire for gastrulation. Whether loss of genes also co-evolved with this developmental reprogramming has not yet been addressed. Here, we identify twenty-four genetic functions that are retained in fungi and choanoflagellates but undetectable in animals. These lost genes encode: (i) sixteen distinct biosynthetic functions; (ii) the two ancestral eukaryotic ClpB disaggregases, Hsp78 and Hsp104, which function in the mitochondria and cytosol, respectively; and (iii) six other assorted functions. We present computational and experimental data that are consistent with a joint function for the differentially localized ClpB disaggregases, and with the possibility of a shared client/chaperone relationship between the mitochondrial Fe/S homoaconitase encoded by the lost LYS4 gene and the two ClpBs. Our analyses lead to the hypothesis that the evolution of gastrulation-based multicellularity in animals led to efficient extraction of nutrients from dietary sources, loss of natural selection for maintenance of energetically expensive biosynthetic pathways, and subsequent loss of their attendant ClpB chaperones.
    PLoS ONE 01/2015; 10(2):e0117192. DOI:10.1371/journal.pone.0117192 · 3.53 Impact Factor
  • Source
    ABSTRACT: Evolutionary processes that disrupt gene organisation in eukaryotic genomes fall into two categories: changes in the order of genes, known as rearrangements, and changes in the gene content of the genome through duplications, deletions and gains. The mechanisms through which these events arise, and their functional and selective impact on genomes, are poorly understood. This thesis covers two projects. In the first, we investigated the distribution of evolutionary rearrangement breakpoints between an ancestral genome and its modern descendants. This distribution was modelled according to local genomic characteristics to highlight the factors that influence the probability of breakage. Our results show that the distribution of breakpoints can be explained simply as a function of intergenic spacer length, although this function is non-linear, contrary to classical random expectations. The distribution of breakpoints in genomes seems to be linked mainly to structural properties and is only marginally affected by selective constraints. It might reflect the chromatin structure of the genome. The second project is part of the sequencing effort for the zebrafish genome and provides an overview of the organisation of this genome. Teleost fish genomes are anciently duplicated, and the analysis focuses on the consequences of this duplication. Results show that the zebrafish genome displays a fairly typical teleost genome organisation. Genes retained in two copies after the whole-genome duplication belong to specific functional categories and are biased towards genes already conserved as duplicates after the 1R and 2R duplication events that took place early in vertebrate history.
    09/2012, Degree: Doctoral thesis (Thèse de Doctorat), Supervisor: Hugues Roest Crollius
  • Source
    ABSTRACT: More reliable and faster prediction methods are needed to interpret the enormous amounts of data generated by sequencing and genome projects. We have developed a new computational tool, PON-P2, for classification of amino acid substitutions in human proteins. The method is a machine learning-based classifier that groups variants into pathogenic, neutral and unknown classes on the basis of a random forest probability score. PON-P2 is trained using pathogenic and neutral variants obtained from VariBench, a database of benchmark variation datasets. PON-P2 utilizes information about evolutionary conservation of sequences, physical and biochemical properties of amino acids, GO annotations and, if available, functional annotations of variation sites. Extensive feature selection was performed to identify 8 informative features from a total of 622. PON-P2 consistently showed superior performance in comparison to existing state-of-the-art tools. In the 10-fold cross-validation test, its accuracy and MCC are 0.90 and 0.80, respectively, and in the independent test they are 0.86 and 0.71, respectively. The coverage of PON-P2 is 61.7% in the 10-fold cross-validation and 62.1% in the test dataset. PON-P2 is a powerful tool for screening harmful variants and for ranking and prioritizing variants for experimental characterization. It is very fast, making it capable of analyzing large variant datasets. PON-P2 is freely available at
    PLoS ONE 02/2015; 10(2):e0117380. DOI:10.1371/journal.pone.0117380 · 3.53 Impact Factor
