Thesis

Phylogenomics and comparative genomics in ant-eating mammals

Abstract

The phenomenon of evolutionary convergence is a fascinating process in which distantly related species independently acquire similar characteristics in response to similar selective pressures. Ant- and termite-eating mammals are among the most famous examples of morphological convergence. Indeed, this particular lifestyle evolved in five distinct lineages of mammals: the aardvark (Tubulidentata), the aardwolf (Carnivora), the anteaters (Pilosa), the giant armadillo (Cingulata), and the pangolins (Pholidota). To better understand the evolution of these organisms, several approaches were developed in this thesis. First, I present an original strategy to characterize the precise diet of myrmecophagous mammals, taking advantage of metagenomic sequencing data generated from fecal samples and a reference mitogenomic database of termites and ants. Second, with the final objective of detecting molecular convergence at the genomic scale in ant-eating mammals, we generated nine high-quality mammalian genomes using Oxford Nanopore Technologies sequencing. The different strategies developed, from the set-up of MinION sequencing to the annotation of the resulting assemblies, are presented together with a first case study illustrating the use of two of these new reference genomes for species delineation. Finally, I present comparative transcriptomic analyses of salivary glands and other organs in ant-eating mammals, suggesting that historical contingency and molecular evolutionary tinkering of chitinase genes played a major role in the convergent evolution of myrmecophagy.

Article
Full-text available
The Zoonomia Project is investigating the genomics of shared and specialized traits in eutherian mammals. Here we provide genome assemblies for 131 species, of which all but 9 are previously uncharacterized, and describe a whole-genome alignment of 240 species of considerable phylogenetic diversity, comprising representatives from more than 80% of mammalian families. We find that regions of reduced genetic diversity are more abundant in species at a high risk of extinction, discern signals of evolutionary selection at high resolution and provide insights from individual reference genomes. By prioritizing phylogenetic diversity and making data available quickly and without restriction, the Zoonomia Project aims to support biological discovery, medical research and the conservation of biodiversity.
Preprint
Full-text available
In a context of ongoing biodiversity erosion, obtaining genomic resources from wildlife is becoming essential for conservation. The thousands of yearly mammalian roadkill could potentially provide a useful source material for genomic surveys. To illustrate the potential of this underexploited resource, we used roadkill samples to sequence reference genomes and study the genomic diversity of the bat-eared fox (Otocyon megalotis) and the aardwolf (Proteles cristata), for which subspecies have been defined based on similar disjunct distributions in Eastern and Southern Africa. By developing an optimized DNA extraction protocol, we successfully obtained long reads using the Oxford Nanopore Technologies (ONT) MinION device. For the first time in mammals, we obtained two reference genomes with high contiguity and gene completeness by combining ONT long reads with Illumina short reads using hybrid assembly. Based on re-sequencing data from a few other roadkill samples, the comparison of the genetic differentiation between our two pairs of subspecies to that of pairs of well-defined species across Carnivora showed that the two subspecies of aardwolf might warrant species status (P. cristata and P. septentrionalis), whereas the two subspecies of bat-eared fox might not. Moreover, using these data, we conducted demographic analyses that revealed similar trajectories between Eastern and Southern populations of both species, suggesting that their population sizes have been shaped by similar environmental fluctuations. Finally, we obtained a well-resolved genome-scale phylogeny for Carnivora with evidence for incomplete lineage sorting among the three main arctoid lineages. Overall, our cost-effective strategy opens the way for large-scale population genomic studies and phylogenomics of mammalian wildlife using roadkill.
Article
Full-text available
Background. Ecological adaptations of mammals are reflected in the morphological diversity of their feeding apparatus, which includes differences in tooth crown morphologies, variation in snout size, or changes in muscles of the feeding apparatus. The adaptability of their feeding apparatus allowed them to optimize resource exploitation in a wide range of habitats. The combination of computer-assisted X-ray microtomography (µ-CT) with contrast-enhancing staining protocols has bolstered the reconstruction of three-dimensional (3D) models of muscles. This new approach allows for accurate descriptions of muscular anatomy, as well as the quick measurement of muscle volumes and fiber orientation. Ant- and termite-eating (myrmecophagy) represents a case of extreme feeding specialization, which is usually accompanied by tooth reduction or complete tooth loss, snout elongation, acquisition of a long vermiform tongue, and loss of the zygomatic arch. Many of these traits evolved independently in distantly-related mammalian lineages. Previous reports on South American anteaters (Vermilingua) have shown major changes in the masticatory, intermandibular, and lingual muscular apparatus. These changes have been related to a functional shift in the role of upper and lower jaws in the evolutionary context of their complete loss of teeth and masticatory ability. Methods. We used an iodine staining solution (I2KI) to perform contrast-enhanced µ-CT scanning on heads of the pygmy (Cyclopes didactylus), collared (Tamandua tetradactyla) and giant (Myrmecophaga tridactyla) anteaters. We reconstructed the musculature of the feeding apparatus of the three extant anteater genera using 3D reconstructions complemented with classical dissections of the specimens. 
We performed a description of the musculature of the feeding apparatus in the two morphologically divergent vermilinguan families (Myrmecophagidae and Cyclopedidae) and compared it to the association of morphological features found in other myrmecophagous placentals. Results. We found that pygmy anteaters (Cyclopes) present a relatively larger and architecturally more complex temporal musculature than that of collared (Tamandua) and giant (Myrmecophaga) anteaters, but show a reduced masseter musculature, including the loss of the deep masseter. The loss of this muscle concurs with the loss of the jugal bone in Cyclopedidae. We show that anteaters, pangolins, and aardvarks present distinct anatomies despite morphological and ecological convergences.
Article
Full-text available
Speciation rates vary considerably among lineages, and our understanding of what drives the rapid succession of speciation events within young adaptive radiations remains incomplete [1–11]. The cichlid fish family provides a notable example of such variation, with many slowly speciating lineages as well as several exceptionally large and rapid radiations [12]. Here, by reconstructing a large phylogeny of all currently described cichlid species, we show that explosive speciation is solely concentrated in species flocks of several large young lakes. Increases in the speciation rate are associated with the absence of top predators; however, this does not sufficiently explain explosive speciation. Across lake radiations, we observe a positive relationship between the speciation rate and enrichment of large insertion or deletion polymorphisms. Assembly of 100 cichlid genomes within the most rapidly speciating cichlid radiation, which is found in Lake Victoria, reveals exceptional ‘genomic potential’: hundreds of ancient haplotypes bear insertion or deletion polymorphisms, many of which are associated with specific ecologies and shared with ecologically similar species from other older radiations elsewhere in Africa. Network analysis reveals fundamentally non-treelike evolution through recombining old haplotypes, and the origins of ecological guilds are concentrated early in the radiation. Our results suggest that the combination of ecological opportunity, sexual selection and exceptional genomic potential is the key to understanding explosive adaptive radiation. Analyses of the genomes of cichlid species reveal that the combination of ecological opportunity, sexual selection and exceptional genomic potential is the key to understanding explosive adaptive radiation in cichlids.
Article
Full-text available
Angiosperms have become the dominant terrestrial plant group by diversifying for ~145 million years into a broad range of environments. During the course of evolution, numerous morphological innovations arose, often preceded by whole genome duplications (WGD). The mustard family (Brassicaceae), a successful angiosperm clade with ~4000 species, has been diversifying into many evolutionary lineages for more than 30 million years. Here we develop a species inventory, analyze morphological variation, and present a maternal, plastome-based genus-level phylogeny. We show that increased morphological disparity, despite an apparent absence of clade-specific morphological innovations, is found in tribes with WGDs or diversification rate shifts. Both are important processes in Brassicaceae, resulting in an overall high net diversification rate. Character states show frequent and independent gain and loss, and form varying combinations. Therefore, Brassicaceae pave the way to concepts of phylogenetic genome-wide association studies to analyze the evolution of morphological form and function.
Article
Full-text available
The leaf-nosed bats (Phyllostomidae) are outliers among chiropterans with respect to the unusually high diversity of dietary strategies within the family. Salivary glands, owing to their functions and high ultrastructural variability among lineages, are proposed to have played an important role during the phyllostomid radiation. To identify genes underlying salivary gland functional diversification, we sequenced submandibular gland transcriptomes from phyllostomid species representative of divergent dietary strategies. From the assembled transcriptomes, we performed an array of selection tests and gene expression analyses to identify signatures of adaptation. Overall, we identified an enrichment of immunity-related gene ontology terms among 53 genes evolving under positive selection. Lineage-specific selection tests revealed several endomembrane system genes under selection in the vampire bat. Many genes that respond to insulin were under selection, and differentially expressed genes pointed to modifications of amino acid synthesis pathways in plant-visitors. Results indicate salivary glands have diversified in various ways across a functionally diverse clade of mammals in response to niche specializations.
Article
Full-text available
Conflicting relationships have been found between diversification rate and temperature across disparate clades of life. Here, we use a supermatrix comprising nearly 20,000 species of rosids (a clade of ~25% of all angiosperm species) to understand global patterns of diversification and its climatic association. Our approach incorporates historical global temperature, assessment of species' temperature niche, and two broad-scale characterizations of tropical versus non-tropical niche occupancy. We find the diversification rates of most subclades dramatically increased over the last 15 million years (Myr) during cooling associated with global expansion of temperate habitats. Climatic niche is negatively associated with diversification rates, with tropical rosids forming older communities and experiencing speciation rates ~2-fold below rosids in cooler climates. Our results suggest long-term cooling had a disproportionate effect on non-tropical diversification rates, leading to dynamic young communities outside of the tropics, while relative stability in tropical climes led to older, slower-evolving but still species-rich communities.
Article
Full-text available
Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species tree-aware methods also leverage information from a putative species tree. However, only a few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data pre-processing (e.g., computing bootstrap trees), and rely on approximations and heuristics that limit the degree of tree space exploration. Here we present GeneRax, the first maximum likelihood species tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene-level events, such as duplication, transfer, and loss, relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared to competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical datasets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1099 Cyanobacteria families in eight minutes on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax.
Article
Full-text available
Numerous pairs of evolutionarily divergent mammalian species have been shown to produce hybrid offspring. In some cases, F1 hybrids are able to produce F2s through matings with F1s. In other instances, the hybrids are only able to produce offspring themselves through backcrosses with a parent species owing to unisexual sterility (Haldane's Rule). Here, we explicitly tested whether genetic distance, computed from mitochondrial and nuclear genes, can be used as a proxy to predict the relative fertility of the hybrid offspring resulting from matings between species of terrestrial mammals. We assessed the proxy's predictive power using a well-characterized felid hybrid system, and applied it to modern and ancient hominins. Our results revealed a small overlap in mitochondrial genetic distance values that distinguish species pairs whose calculated distances fall within two categories: those whose hybrid offspring follow Haldane's Rule, and those whose hybrid F1 offspring can produce F2s. The strong correlation between genetic distance and hybrid fertility demonstrated here suggests that this proxy can be employed to predict whether the hybrid offspring of two mammalian species will follow Haldane's Rule.
Article
Full-text available
Most proteins associate into multimeric complexes with specific architectures [1,2], which often have functional properties such as cooperative ligand binding or allosteric regulation [3]. No detailed knowledge is available about how any multimer and its functions arose during evolution. Here we use ancestral protein reconstruction and biophysical assays to elucidate the origins of vertebrate haemoglobin, a heterotetramer of paralogous α- and β-subunits that mediates respiratory oxygen transport and exchange by cooperatively binding oxygen with moderate affinity. We show that modern haemoglobin evolved from an ancient monomer and characterize the historical ‘missing link’ through which the modern tetramer evolved: a noncooperative homodimer with high oxygen affinity that existed before the gene duplication that generated distinct α- and β-subunits. Reintroducing just two post-duplication historical substitutions into the ancestral protein is sufficient to cause strong tetramerization by creating favourable contacts with more ancient residues on the opposing subunit. These surface substitutions markedly reduce oxygen affinity and even confer cooperativity, because an ancient linkage between the oxygen binding site and the multimerization interface was already an intrinsic feature of the protein’s structure. Our findings establish that evolution can produce new complex molecular structures and functions via simple genetic mechanisms that recruit existing biophysical features into higher-level architectures.
Article
Full-text available
Background: Whilst much sequencing effort has focused on key mammalian model organisms such as mouse and human, little is known about the relationship between genome sequencing techniques for non-model mammals and genome assembly quality. This is especially relevant to non-model mammals, where the samples to be sequenced are often degraded and of low quality. A key aspect when planning a genome project is the choice of sequencing data to generate. This decision is driven by several factors, including the biological questions being asked, the quality of DNA available, and the availability of funds. Cutting-edge sequencing technologies now make it possible to achieve highly contiguous, chromosome-level genome assemblies, but rely on high-quality high molecular weight DNA. However, funding is often insufficient for many independent research groups to use these techniques. Here we use a range of different genomic technologies generated from a roadkill European polecat (Mustela putorius) to assess various assembly techniques on this low-quality sample. We evaluated different approaches for de novo assemblies and discuss their value in relation to biological analyses. Results: Generally, assemblies containing more data types achieved better scores in our ranking system. However, when accounting for misassemblies, this was not always the case for Bionano and low-coverage 10x Genomics (for scaffolding only). We also find that the extra cost associated with combining multiple data types is not necessarily associated with better genome assemblies. Conclusions: The high degree of variability between each de novo assembly method (assessed from the 7 key metrics) highlights the importance of carefully devising the sequencing strategy to be able to carry out the desired analysis. 
Adding more data to genome assemblies does not always result in better assemblies, so it is important to understand the nuances of genomic data integration explained here, in order to obtain cost-effective value for money when sequencing genomes.
Article
Full-text available
We implement two measures for quantifying genealogical concordance in phylogenomic datasets: the gene concordance factor (gCF) and the novel site concordance factor (sCF). For every branch of a reference tree, gCF is defined as the percentage of "decisive" gene trees containing that branch. This measure is already in wide usage, but here we introduce a package that calculates it while accounting for variable taxon coverage among gene trees. sCF is a new measure defined as the percentage of decisive sites supporting a branch in the reference tree. gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of underlying disagreement among loci and sites. An easy to use implementation and tutorial is freely available in the IQ-TREE software package (http://www.iqtree.org).
Article
Full-text available
Thanks to the development of high‐throughput sequencing technologies, target enrichment sequencing of nuclear ultraconserved DNA elements (UCEs) now makes it routine to infer phylogenetic relationships from thousands of genomic markers. Recently, it has been shown that mitochondrial DNA (mtDNA) is frequently sequenced alongside the targeted loci in such capture experiments. Despite its broad evolutionary interest, mtDNA is rarely assembled and used in conjunction with nuclear markers in capture‐based studies. Here, we developed MitoFinder, a user‐friendly bioinformatic pipeline, to efficiently assemble and annotate mitogenomic data from hundreds of UCE libraries. As a case study, we used ants (Formicidae), for which 501 UCE libraries have been sequenced whereas only 29 mitogenomes are available. We compared the efficiency of four different assemblers (IDBA‐UD, MEGAHIT, MetaSPAdes, and Trinity) for assembling both UCE and mtDNA loci. Using MitoFinder, we show that metagenomic assemblers, in particular MetaSPAdes, are well suited to assemble both UCEs and mtDNA. Mitogenomic signal was successfully extracted from all 501 UCE libraries, allowing species identification to be confirmed using CO1 barcoding. Moreover, our automated procedure retrieved 296 cases in which the mitochondrial genome was assembled in a single contig, thus increasing the number of available ant mitogenomes by an order of magnitude. By leveraging the power of metagenomic assemblers, MitoFinder provides an efficient tool to extract complementary mitogenomic data from UCE libraries, making it possible to test for potential mito‐nuclear discordance. Our approach is potentially applicable to other sequence capture methods, transcriptomic data, and whole genome shotgun sequencing in diverse taxa.
Article
Full-text available
Plastomes of parasitic and mycoheterotrophic plants show different degrees of reduction depending on the plants’ level of heterotrophy and host dependence in comparison to photoautotrophic sister species, and the amount of time since heterotrophic dependence was established. In all but the most recent heterotrophic lineages, this reduction involves substantial decrease in genome size and gene content and sometimes alterations of genome structure. Here, we present the first plastid genome of the holoparasitic genus Prosopanche, which shows clear signs of functionality. The plastome of Prosopanche americana has a length of 28,191 bp and contains only 24 unique genes, i.e., 14 ribosomal protein genes, four ribosomal RNA genes, five genes coding for tRNAs and three genes with other or unknown function (accD, ycf1, ycf2). The inverted repeat has been lost. Despite the split of Prosopanche and Hydnora about 54 million years ago, the level of genome reduction is strikingly congruent between the two holoparasites, although highly dissimilar nucleotide sequences are observed. Our results lead to two possible evolutionary scenarios that will be tested in the future with a larger sampling: 1) a Hydnoraceae plastome, similar to those of Hydnora and Prosopanche today, existed already in the most recent common ancestor and has not changed much with respect to gene content and structure, or 2) the genome similarities we observe today are the result of two independent evolutionary trajectories leading to almost the same endpoint. The first hypothesis would be most parsimonious, whereas the second would point to taxon-dependent essential gene sets for plants released from photosynthetic constraints.
Article
Full-text available
Background: Arthropods comprise the largest and most diverse phylum on Earth and play vital roles in nearly every ecosystem. Their diversity stems in part from variations on a conserved body plan, resulting from and recorded in adaptive changes in the genome. Dissection of the genomic record of sequence change enables broad questions regarding genome evolution to be addressed, even across hyper-diverse taxa within arthropods. Results: Using 76 whole genome sequences representing 21 orders spanning more than 500 million years of arthropod evolution, we document changes in gene and protein domain content and provide temporal and phylogenetic context for interpreting these innovations. We identify many novel gene families that arose early in the evolution of arthropods and during the diversification of insects into modern orders. We reveal unexpected variation in patterns of DNA methylation across arthropods and examples of gene family and protein domain evolution coincident with the appearance of notable phenotypic and physiological adaptations such as flight, metamorphosis, sociality, and chemoperception. Conclusions: These analyses demonstrate how large-scale comparative genomics can provide broad new insights into the genotype to phenotype map and generate testable hypotheses about the evolution of animal diversity.
Article
Full-text available
Background: The lion (Panthera leo) is one of the most popular and iconic feline species on the planet, yet in spite of its popularity, the last century has seen massive declines for lion populations worldwide. Genomic resources for endangered species represent an important way forward for the field of conservation, enabling high-resolution studies of demography, disease, and population dynamics. Here, we present a chromosome-level assembly from a captive African lion from the Exotic Feline Rescue Center (Center Point, IN) as a resource for current and subsequent genetic work of the sole social species of the Panthera clade. Results: Our assembly is composed of 10x Genomics Chromium data, Dovetail Hi-C, and Oxford Nanopore long-read data. Synteny is highly conserved between the lion, other Panthera genomes, and the domestic cat. We find variability in the length of runs of homozygosity across lion genomes, indicating contrasting histories of recent and possibly intense inbreeding and bottleneck events. Demographic analyses reveal similar ancient histories across all individuals during the Pleistocene except the Asiatic lion, which shows a more rapid decline in population size. We show a substantial influence of the reference genome choice on the inference of demographic history and heterozygosity. Conclusions: We demonstrate that the choice of reference genome is important when comparing heterozygosity estimates across species, and that estimates inferred from different references should not be compared to each other. In addition, estimates of heterozygosity or the amount or length of runs of homozygosity should not be taken as reflective of a species, as these can differ substantially among individuals. This high-quality genome will greatly aid in the continuing research and conservation efforts for the lion, which is rapidly moving towards becoming a species in danger of extinction.
Preprint
Full-text available
Plants and their specialized flower visitors provide valuable insights into the evolutionary consequences of species interactions. In particular, antagonistic interactions between insects and plants have often been invoked as a major driver of diversification. Here we use a tropical community of palms and their specialized insect flower visitors to understand whether antagonisms lead to higher population divergence. Interactions between the palms Syagrus coronata and Syagrus botryophora and the weevils that visit their flowers range from brood pollination to florivory and commensalism. We use genomics to test the role of insect-host interactions in the early stages of diversification of nine species of beetles associated with these palms by using a model of isolation by environment. We find a surprising number of cryptic species, which in pollinating weevils coexist across a broad geographical range but are always associated with different hosts for non-pollinators. The degree to which insect populations are structured by the genetic divergence of plant populations varies. This variation is uncorrelated with the kind of interaction, showing that, at least in this system, antagonistic interactions are not associated with higher genetic differentiation. It is likely that more general aspects of host use, affecting plant-associated insects regardless of the outcomes of their interactions, are more important drivers of population divergence.
Article
Full-text available
Here, we present a major advance of the OrthoFinder method. This extends OrthoFinder's high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomics statistics. Each output is benchmarked on appropriate real or simulated datasets, and where comparable methods exist, OrthoFinder is equivalent to or outperforms these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder's comprehensive phylogenetic analysis is achieved with equivalent speed and scalability to the fastest, score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.
Article
Full-text available
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), and data updates are made available four times a year.
Article
Full-text available
A major goal of phylogenetic systematics is to understand both the patterns of diversification and the processes by which these patterns are formed. Few studies have focused on the ancient, species-rich Magnoliales clade and its diversification pattern. Within Magnoliales, the pantropically distributed Annonaceae are by far the most genus-rich and species-rich family-level clade, with c. 110 genera and c. 2,400 species. We investigated the diversification patterns across Annonaceae and identified traits that show varied associations with diversification rates using a time-calibrated phylogeny of 835 species (34.6% sampling) and 11,211 aligned bases from eight regions of the plastid genome (rbcL, matK, ndhF, psbA-trnH, trnL-F, atpB-rbcL, trnS-G, and ycf1). Twelve rate shifts were identified using BAMM: in Annona, Artabotrys, Asimina, Drepananthus, Duguetia, Goniothalamus, Guatteria, Uvaria, Xylopia, the tribes Miliuseae and Malmeeae, and the Desmos-Dasymaschalon-Friesodielsia-Monanthotaxis clade. TurboMEDUSA and method-of-moments estimator analyses showed largely congruent results. A positive relationship between species richness and diversification rate is revealed using PGLS. Our results show that the high species richness in Annonaceae is likely the result of recent increased diversification rather than the steady accumulation of species via the 'museum model'. We further explore the possible role of selected traits (habit, pollinator trapping, floral sex expression, pollen dispersal unit, anther septation, and seed dispersal unit) in shaping diversification patterns, based on inferences of BiSSE, MuSSE, HiSSE, and FiSSE analyses. Our results suggest that the liana habit, the presence of circadian pollinator trapping, androdioecy, and the dispersal of seeds as single-seeded monocarp fragments are closely correlated with higher diversification rates; pollen aggregation and anther septation, in contrast, are associated with lower diversification rates.
Article
Full-text available
Identifying the genetic mechanisms of adaptation requires the elucidation of links between the evolution of DNA sequence, phenotype, and fitness [1]. Convergent evolution can be used as a guide to identify candidate mutations that underlie adaptive traits [2, 3, 4], and new genome editing technology is facilitating functional validation of these mutations in whole organisms [1, 5]. We combined these approaches to study a classic case of convergence in insects from six orders, including the monarch butterfly (Danaus plexippus), that have independently evolved to colonize plants that produce cardiac glycoside toxins [6, 7, 8, 9, 10, 11]. Many of these insects evolved parallel amino acid substitutions in the α-subunit (ATPα) of the sodium pump (Na⁺/K⁺-ATPase) [7, 8, 9, 10, 11], the physiological target of cardiac glycosides [12]. Here we describe mutational paths involving three repeatedly changing amino acid sites (111, 119 and 122) in ATPα that are associated with cardiac glycoside specialization [13, 14]. We then performed CRISPR–Cas9 base editing on the native Atpα gene in Drosophila melanogaster flies and retraced the mutational path taken across the monarch lineage [11, 15]. We show in vivo, in vitro and in silico that the path conferred resistance and target-site insensitivity to cardiac glycosides [16], culminating in triple mutant ‘monarch flies’ that were as insensitive to cardiac glycosides as monarch butterflies. ‘Monarch flies’ retained small amounts of cardiac glycosides through metamorphosis, a trait that has been optimized in monarch butterflies to deter predators [17, 18, 19]. The order in which the substitutions evolved was explained by amelioration of antagonistic pleiotropy through epistasis [13, 14, 20, 21, 22]. Our study illuminates how the monarch butterfly evolved resistance to a class of plant toxins, eventually becoming unpalatable, and changing the nature of species interactions within ecological communities [2, 6, 7, 8, 9, 10, 11, 15, 17, 18, 19].
Article
Full-text available
Robust estimates of divergence times are essential for downstream analyses, yet this robustness is still rarely assessed. We generated a time-calibrated genus-level phylogeny of butterflies (Papilionoidea), including 994 taxa, up to 10 gene fragments and an unprecedented set of 12 fossils and 10 host-plant node calibration points. We compared marginal priors and posterior distributions to assess the relative importance of the former on the latter. This approach revealed a strong influence of the set of priors on the root age, but for most calibrated nodes posterior distributions shifted from the marginal prior, indicating significant information in the molecular data set. Using a very conservative approach we estimated an origin of butterflies at 107.6 Ma, approximately equivalent to the latest Early Cretaceous, with a credibility interval ranging from 89.5 Ma (mid Late Cretaceous) to 129.5 Ma (mid Early Cretaceous). In addition, we tested the effects of changing fossil calibration priors, tree prior, different sets of calibrations and different sampling fractions, but our estimate remained robust to these alternative assumptions. With 994 genera, this tree provides a comprehensive source of secondary calibrations for studies on butterflies.
Preprint
Full-text available
Besides macaques, baboons are the most commonly used nonhuman primate in biomedical research. Despite this importance, the genomic resources for baboons are quite limited. In particular, the current baboon reference genome Panu_3.0 is a highly fragmented, reference-guided (i.e., not fully de novo) assembly, and its poor quality inhibits our ability to conduct downstream genomic analyses. Here we present a truly de novo genome assembly of the olive baboon (Papio anubis) that uses data from several recently developed single-molecule technologies. Our assembly, Panubis1.0, has an N50 contig size of ~1.46 Mb (as opposed to 139 Kb for Panu_3.0), and has single scaffolds that span each of the 20 autosomes and the X chromosome. We highlight multiple lines of evidence (including Bionano Genomics data, pedigree linkage information, and linkage disequilibrium data) suggesting that there are several large assembly errors in Panu_3.0, which have been corrected in Panubis1.0.
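The N50 contig size quoted above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below shows one common way to compute it; the function name and the toy contig lengths are hypothetical and are not taken from the Panubis1.0 study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary length units), for illustration only.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))  # prints 8
```

Because the statistic depends only on the distribution of contig lengths, a higher N50 indicates a more contiguous assembly but says nothing about whether the contigs are correct.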
Article
Full-text available
Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. Conclusions: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
Article
Full-text available
As the most species-rich class of tetrapod vertebrates, Aves possesses diverse feeding habits, with multiple origins of insectivory, carnivory, frugivory, nectarivory, granivory and omnivory. Since digestive enzymes mediate and limit energy and nutrient uptake, we hypothesized that genes encoding digestive enzymes have undergone adaptive evolution in birds. To test this general hypothesis, we identified 16 digestive enzyme genes (including seven carbohydrase genes (hepatic amy, pancreatic amy, salivary amy, agl, g6pc, gaa and gck), three lipase genes (cyp7a1, lipf and pnlip), two protease genes (ctrc and pgc), two lysozyme genes (lyz and lyg) and two chitinase genes (chia and chit1)) from the available genomes of 48 bird species. Among these 16 genes, three (salivary amy, lipf and chit1) were not found in any of the 48 avian genomes, which was further supported by our synteny analysis. Of the remaining 13 genes, eight were single-copy and five (chia, gaa, lyz, lyg and pgc) were multi-copy. Moreover, the multi-copy genes gaa, lyg and pgc were predicted to exhibit functional divergence among copies. Positively selected sites were detected in all of the analyzed digestive enzyme genes, except agl, g6pc, gaa and gck, suggesting that different diets may have favored differences in catalytic capacities of these enzymes. Furthermore, the analysis also revealed that the pancreatic amylase gene and one of the lipase genes (cyp7a1) have higher ω (the ratio of nonsynonymous to synonymous substitution rates) values in species consuming a larger amount of seeds and meat, respectively, indicating intense selection. In addition, the gck carbohydrase gene in species consuming a smaller amount of seeds, fruits or nectar, and a lipase gene (pnlip) in species consuming less meat were found to be under relaxed selection.
Thus, gene loss, gene duplication, functional divergence, positive selection and relaxed selection have collectively shaped the evolution of digestive enzymes in birds, and the evolutionary flexibility of these enzymes may have facilitated their dietary diversification.
Article
Full-text available
Environmental change can create opportunities for increased rates of lineage diversification, but continued species accumulation has been hypothesized to lead to slowdowns via competitive exclusion and niche partitioning. Such density-dependent models imply tight linkages between diversification and trait evolution, but there are plausible alternative models. Little is known about the association between diversification and key ecological and phenotypic traits at broad phylogenetic and spatial scales. Do trait evolutionary rates coincide with rates of diversification, are there lags among these rates, or is diversification niche-neutral? To address these questions, we combine a deeply sampled phylogeny for a major flowering plant clade—Saxifragales—with phenotype and niche data to examine temporal patterns of evolutionary rates. The considerable phenotypic and habitat diversity of Saxifragales is greatest in temperate biomes. Global expansion of these habitats since the mid-Miocene provided ecological opportunities that, with density-dependent adaptive radiation, should result in simultaneous rate increases for diversification, niche, and phenotype, followed by decreases with habitat saturation. Instead, we find that these rates have significantly different timings, with increases in diversification occurring at the mid-Miocene Climatic Optimum (∼15 Mya), followed by increases in niche and phenotypic evolutionary rates by ∼5 Mya; all rates increase exponentially to the present. We attribute this surprising lack of temporal coincidence to initial niche-neutral diversification followed by ecological and phenotypic divergence coincident with more extreme cold and dry habitats that proliferated into the Pleistocene. A lack of density-dependence contrasts with investigations of other cosmopolitan lineages, suggesting alternative patterns may be common in the diversification of temperate lineages.
Article
Full-text available
Motivation: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture, and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. Results: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared to RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-Tree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. Availability: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Full-text available
Background: More than 3,000 species of octocorals (Cnidaria, Anthozoa) inhabit an expansive range of environments, from shallow tropical seas to the deep-ocean floor. They are important foundation species that create coral “forests,” which provide unique niches and 3-dimensional living space for other organisms. The octocoral genus Renilla inhabits sandy, continental shelves in the subtropical and tropical Atlantic and eastern Pacific Oceans. Renilla is especially interesting because it produces secondary metabolites for defense, exhibits bioluminescence, and produces a luciferase that is widely used in dual-reporter assays in molecular biology. Although several anthozoan genomes are currently available, the majority of these are hexacorals. Here, we present a de novo assembly of an azooxanthellate shallow-water octocoral, Renilla muelleri. Findings: We generated a hybrid de novo assembly using MaSuRCA v.3.2.6. The final assembly included 4,825 scaffolds and a haploid genome size of 172 megabases (Mb). A BUSCO assessment found 88% of metazoan orthologs present in the genome. An Augustus ab initio gene prediction found 23,660 genes, of which 66% (15,635) had detectable similarity to annotated genes from the starlet sea anemone, Nematostella vectensis, or to the Uniprot database. Although the R. muelleri genome may be smaller (172 Mb minimum size) than other publicly available coral genomes (256–448 Mb), the R. muelleri genome is similar to other coral genomes in terms of the number of complete metazoan BUSCOs and predicted gene models. Conclusions: The R. muelleri hybrid genome provides a novel resource for researchers to investigate the evolution of genes and gene families within Octocorallia and more widely across Anthozoa. It will be a key resource for future comparative genomics with other corals and for understanding the genomic basis of coral diversity.
Preprint
Full-text available
Species richness varies considerably across the tree of life, which can only be explained by heterogeneous rates of diversification (speciation and extinction). Previous approaches use phylogenetic trees to estimate branch-specific diversification rates. However, all previous approaches disregard diversification-rate shifts on extinct lineages although 99% of species that ever existed are now extinct. Here we describe a lineage-specific birth-death-shift process where lineages, both extant and extinct, may have heterogeneous rates of diversification. To facilitate probability computation we discretize the base distribution on speciation and extinction rates into k rate categories. The fixed number of rate categories allows us to extend the theory of state-dependent speciation and extinction models (e.g., BiSSE and MuSSE) to compute the probability of an observed phylogeny given the set of speciation and extinction rates. To estimate branch-specific diversification rates, we develop two independent and theoretically equivalent approaches: numerical integration with stochastic character mapping and data-augmentation with reversible-jump Markov chain Monte Carlo sampling. We validate the implementation of the two approaches in RevBayes using simulated data and an empirical example study of primates. In the empirical example, we show that estimates of the number of diversification-rate shifts are, unsurprisingly, very sensitive to the choice of prior distribution. In contrast, branch-specific diversification rate estimates are less sensitive to the assumed prior distribution on the number of diversification-rate shifts and consistently infer an increased rate of diversification for Old World Monkeys. Additionally, we observe that as few as 10 diversification-rate categories are sufficient to approximate a continuous base distribution on diversification rates.
In conclusion, our implementation of the lineage-specific birth-death-shift model in RevBayes provides biologists with a method to estimate branch-specific diversification rates under a mathematically consistent model.
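The discretization step mentioned in this abstract, replacing a continuous base distribution on rates with k rate categories, can be illustrated with a minimal sketch. It assumes a lognormal base distribution and represents each of k equal-probability bins by its median; both choices, along with the function and parameter names, are assumptions made for illustration and are not claimed to match the RevBayes implementation.

```python
import math
from statistics import NormalDist


def lognormal_rate_categories(k, log_mean, log_sd):
    """Approximate a lognormal rate distribution by k discrete categories.

    The distribution is cut into k bins of equal probability and each bin is
    represented by its median, i.e. the quantile at probability (i + 0.5) / k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    rates = []
    for i in range(k):
        p = (i + 0.5) / k                # midpoint probability of bin i
        z = standard_normal.inv_cdf(p)   # matching standard-normal quantile
        rates.append(math.exp(log_mean + log_sd * z))
    return rates


# Hypothetical parameters: 10 categories around a median rate of 0.1.
for rate in lognormal_rate_categories(10, math.log(0.1), 0.5):
    print(round(rate, 4))
```

With a fixed, finite set of categories, each lineage's rate becomes a discrete state, which is what lets the abstract's approach reuse the machinery of state-dependent models such as BiSSE and MuSSE.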
Preprint
Full-text available
Studying the activity of distributed neuronal circuits at a cellular resolution in vertebrates is very challenging due to the size and optical turbidity of their brains. We recently presented Danionella translucida, a close relative of zebrafish, as a model organism suited for studying large-scale neural network interactions in adult individuals. Danionella remains transparent throughout its life, has the smallest known vertebrate brain and possesses a rich repertoire of complex behaviours. Here we sequenced, assembled and annotated the Danionella translucida genome employing a hybrid Illumina/Nanopore read library as well as RNA-seq of embryonic, larval and adult mRNA. We achieved high assembly continuity using low-coverage long-read data and annotated a large fraction of the transcriptome. This dataset will pave the way for molecular research and targeted genetic manipulation of the smallest known vertebrate brain.
Article
Full-text available
Loss or reduction of teeth has occurred independently in all major clades of mammals [1]. This process is associated with specialized diets, such as myrmecophagy and filter feeding [2, 3], and led to an extensive rearrangement of the mandibular anatomy. The mandibular canal enables lower jaw innervation through the passage of the inferior alveolar nerve (IAN) [4, 5]. In order to innervate teeth, the IAN projects ascending branches directly through tooth roots [5, 6], bone trabeculae [6], or bone canaliculi (i.e., dorsal canaliculi) [7]. Here, we used micro-computed tomography (μ-CT) scans of mandibles, from eight myrmecophagous species with reduced dentition and 21 non-myrmecophages, to investigate the evolutionary fate of dental innervation structures following convergent tooth regression in mammals. Our observations provide strong evidence for a link between the presence of tooth loci and the development of dorsal canaliculi. Interestingly, toothless anteaters present dorsal canaliculi and preserve intact tooth innervation, while equally toothless pangolins do not. We show that the internal mandibular morphology of anteaters has a closer resemblance to that of baleen whales [7] than to pangolins. This is despite masticatory apparatus resemblances that have made anteaters and pangolins a textbook example of convergent evolution. Our results suggest that early tooth loci innervation [8] is required for maintaining the dorsal innervation of the mandible and underline the sensory role of the dorsal canaliculi in the context of mediolateral mandibular movements. This study presents a unique example of convergent redeployment of the tooth developmental pathway to a strictly sensory function following tooth regression in anteaters and baleen whales.
Article
Full-text available
Background: Multiple Sequence Alignments (MSAs) are the starting point of molecular evolutionary analyses. Errors in MSAs generate a non-historical signal that can lead to incorrect inferences. Therefore, numerous efforts have been made to reduce the impact of alignment errors, by improving alignment algorithms and by developing methods to filter out poorly aligned regions. However, MSAs do not only contain alignment errors, but also primary sequence errors. Such errors may originate from sequencing errors, from assembly errors, or from erroneous structural annotations (such as incorrect intron/exon boundaries). Even though their existence is acknowledged, the impact of primary sequence errors on evolutionary inference is poorly characterized. Results: In a first step to fill this gap, we have developed a program called HmmCleaner, which detects and eliminates these errors from MSAs. It uses profile hidden Markov models (pHMM) to identify sequence segments that poorly fit their MSA and selectively removes them. We assessed its performance using > 700 amino-acid MSAs from prokaryotes and eukaryotes, in which we introduced several types of simulated primary sequence errors. The sensitivity of HmmCleaner towards simulated primary sequence errors was > 95%. In a second step, we compared the impact of segment-filtering software (HmmCleaner and PREQUAL) relative to commonly used block-filtering software (BMGE and trimAl) on evolutionary analyses. Using real data from vertebrates, we observed that segment-filtering methods improve the quality of evolutionary inference more than the currently used block-filtering methods. The former were especially effective at improving branch length inferences, and at reducing the false positive rate during detection of positive selection. Conclusions: Segment-filtering methods such as HmmCleaner accurately detect simulated primary sequence errors. Our results suggest that these errors are more detrimental than alignment errors.
However, they also show that stochastic (sampling) error is predominant in single-gene evolutionary inferences. Therefore, we argue that MSA filtering should focus on segment instead of block removal and that more studies are required to find the optimal balance between accuracy improvement and stochastic error increase brought by data removal. Electronic supplementary material: The online version of this article (10.1186/s12862-019-1350-2) contains supplementary material, which is available to authorized users.
Article
Full-text available
1. Phylogenetic studies are increasingly reliant on next-generation sequencing (NGS). Transcriptomic and hybrid-enrichment sequencing techniques remain the most prevalent methods for phylogenomic data collection due to their relatively low demands on computing power and sequencing costs, compared to whole genome shotgun sequencing (WGS). However, the transcriptome-based method is constrained by the availability of fresh materials, and hybrid enrichment is limited by the genomic resources necessary for probe design, especially for non-model organisms. 2. We present a novel WGS-based pipeline for extracting essential phylogenomic markers through rapid de novo genome assembly from low-coverage genome data, employing a series of computationally efficient bioinformatic tools. We tested the pipeline on a Hexapoda dataset and a more focused Phthiraptera dataset (genome sizes 0.1–2 Gbp), and further investigated the effects of sequencing depth on target assembly success rate based on raw data of six insect genomes (0.1–1 Gbp). 3. Each genome assembly was completed in 2–24 hours on desktop PCs. We extracted 872–1,615 near-universal single-copy orthologs (BUSCOs) per species. This method also enables development of ultraconserved element (UCE) probe sets; we generated probes for Phthiraptera based on our WGS assemblies, containing 55,030 baits targeting 2,832 loci, from which we extracted 2,125–2,272 UCEs. Resulting phylogenetic trees all agreed with currently accepted topologies, indicating that markers produced by our methods were valid for phylogenomic studies. We also showed that 10–20× sequencing coverage was sufficient to produce hundreds to thousands of targeted loci from BUSCO sets, and even lower coverage (5×) was sufficient for UCEs. 4. Our study demonstrates the feasibility of conducting phylogenomics from low-coverage WGS for a wide range of organisms without reference genomes.
This new approach has major advantages in data collection, particularly in reducing sequencing cost and computing consumption, while expanding loci choices.
Article
Full-text available
The evolutionary history of a gene helps predict its function and relationship to phenotypic traits. Although sequence conservation is commonly used to decipher gene function and assess medical relevance, methods for functional inference from comparative expression data are lacking. Here, we use RNA-seq across seven tissues from 17 mammalian species to show that expression evolution across mammals is accurately modeled by the Ornstein–Uhlenbeck process, a commonly proposed model of continuous trait evolution. We apply this model to identify expression pathways under neutral, stabilizing, and directional selection. We further demonstrate novel applications of this model to quantify the extent of stabilizing selection on a gene’s expression, parameterize the distribution of each gene’s optimal expression level, and detect deleterious expression levels in expression data from individual patients. Our work provides a statistical framework for interpreting expression data across species and in disease.
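The Ornstein–Uhlenbeck (OU) process named in this abstract is commonly written as dX = α(θ − X) dt + σ dW, where θ is the optimal trait value, α the strength of the pull toward it (stabilizing selection) and σ the magnitude of random drift. The sketch below simulates one trajectory with a simple Euler–Maruyama scheme. The symbols, parameter values and function name are illustrative assumptions; the study fits the model along a phylogeny, which this single-lineage sketch does not attempt.

```python
import math
import random


def simulate_ou(x0, theta, alpha, sigma, dt, n_steps, rng=None):
    """Simulate one OU trajectory with the Euler-Maruyama scheme.

    x0 is the starting value, theta the optimum, alpha the strength of the
    pull toward the optimum, sigma the noise magnitude, dt the time step.
    Returns a list of n_steps + 1 values, beginning with x0.
    """
    rng = rng or random.Random()
    x = x0
    path = [x]
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Deterministic pull toward theta plus a scaled random perturbation.
        x += alpha * (theta - x) * dt + sigma * math.sqrt(dt) * noise
        path.append(x)
    return path


# Hypothetical log-expression level starting far from its optimum of 5.0.
trajectory = simulate_ou(x0=0.0, theta=5.0, alpha=1.0, sigma=0.5,
                         dt=0.01, n_steps=2000, rng=random.Random(1))
print(len(trajectory))
```

Setting σ to zero removes the drift term, so the trajectory decays exponentially toward θ. Larger α keeps values closer to the optimum, which is the intuition behind using the model to quantify stabilizing selection on expression.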
Article
Full-text available
Delineating species is a difficult and seemingly uninteresting issue that is still essential to address. Taxonomic methodology is heterogeneous according to the taxa and scientists involved due to the disparate data quality and quantity and disagreements over the species concept. This has negative impacts on basic and applied research. Genomic data substantially enhance our understanding of the speciation process but do not provide a ubiquitous solution to the species problem. The relevance of comparative approaches in speciation research has nevertheless recently been demonstrated. I suggest moving towards a more unified taxonomic classification through a reference-based decision procedure.
Article
Full-text available
The rise of Neogene C4 grasslands is one of the most drastic changes recently experienced by the biosphere. A central, and widely debated, hypothesis posits that Neogene grasslands acted as a major adaptive zone for herbivore lineages. We test this hypothesis with a novel model system, the Sesamiina stemborer moths and their associated host-grasses. Using a comparative phylogenetic framework integrating paleoenvironmental proxies we recover a negative correlation between the evolutionary trajectories of insects and plants. Our results show that paleoenvironmental changes generated opposing macroevolutionary dynamics in this insect-plant system and call into question the role of grasslands as a universal adaptive cradle. This study illustrates the importance of implementing environmental proxies in diversification analyses to disentangle the relative impacts of biotic and abiotic drivers of macroevolutionary dynamics.
Article
Full-text available
eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes that were selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed together with their associated sequence alignments, phylogenies, HMM models and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite doubling the number of genomes, the quality of orthology assignments and functional annotations (80% coverage) has persisted without significant changes across this update. Finally, we improved eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for downloading or via API queries at http://eggnog.embl.de.
Article
Full-text available
PANTHER (Protein Analysis Through Evolutionary Relationships, http://pantherdb.org) is a resource for the evolutionary and functional classification of genes from organisms across the tree of life. We report the improvements we have made to the resource during the past two years. For evolutionary classifications, we have added more prokaryotic and plant genomes to the phylogenetic gene trees, expanding the representation of gene evolution in these lineages. We have refined many protein family boundaries, and have aligned PANTHER with the MEROPS resource for protease and protease inhibitor families. For functional classifications, we have developed an entirely new PANTHER GO-slim, containing over four times as many Gene Ontology terms as our previous GO-slim, as well as curated associations of genes to these terms. Lastly, we have made substantial improvements to the enrichment analysis tools available on the PANTHER website: users can now analyze over 900 different genomes, using updated statistical tests with false discovery rate corrections for multiple testing. The overrepresentation test is also available as a web service, for easy addition to third-party sites.
Article
Full-text available
The evolutionary history of the wolf-like canids of the genus Canis has been heavily debated, especially regarding the number of distinct species and their relationships at the population and species level [1–6]. We assembled a dataset of 48 resequenced genomes spanning all members of the genus Canis except the black-backed and side-striped jackals, encompassing the global diversity of seven extant canid lineages. This includes eight new genomes, including the first resequenced Ethiopian wolf (Canis simensis), one dhole (Cuon alpinus), two East African hunting dogs (Lycaon pictus), two Eurasian golden jackals (Canis aureus), and two Middle Eastern gray wolves (Canis lupus). The relationships between the Ethiopian wolf, African golden wolf, and golden jackal were resolved. We highlight the role of interspecific hybridization in the evolution of this charismatic group. Specifically, we find gene flow between the ancestors of the dhole and African hunting dog and admixture between the gray wolf, coyote (Canis latrans), golden jackal, and African golden wolf. Additionally, we report gene flow from gray and Ethiopian wolves to the African golden wolf, suggesting that the African golden wolf originated through hybridization between these species. Finally, we hypothesize that coyotes and gray wolves carry genetic material derived from a “ghost” basal canid lineage. Gopalakrishnan et al. present evidence of pervasive gene flow among species of the genus Canis. In addition to previously known admixture events, they find evidence of gene flow from a “ghost” canid, related to the dhole, into the ancestor of the gray wolf and coyote. Further, they suggest that the African golden wolf is a species of hybrid origin.
Article
Full-text available
Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. Availability and implementation: The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.
Preprint
The extent to which evolutionary outcomes reflect the unpredictable influences of chance and contingency is a central but unanswered question in evolutionary biology. A precise characterization requires evolutionary trajectories to be repeated multiple times under identical environmental conditions from multiple starting points across history, a scenario that rarely, if ever, occurs in nature. Here we combine continuous experimental evolution with ancestral protein reconstruction and manipulative genetic experiments to identify the causes and consequences of chance and contingency in the genetic outcomes of molecular evolution. By repeatedly evolving ancestral proteins in the B-cell lymphoma-2 (BCL-2) family of apoptosis regulators to acquire the same protein-protein interaction specificities that evolved during history, we found that contingency and chance interact to make sequence evolution increasingly unpredictable over phylogenetic timescales. Although replicates from the same starting genotype sometimes share mutations - indicating partial predictability - there are multiple alternative sets of changes that can alter specificity, and chance decides which of these paths is taken. Contingency has a stronger effect: when trajectories are initiated from different starting points, outcomes are even more divergent, because substitutions that occurred during phylogenetic history repeatedly changed the potential of other mutations to confer new binding specificities. The impact of contingency increased steadily with phylogenetic distance and magnified the effects of chance, resulting in a >3-fold increase in genetic variance among evolutionary trajectories initiated from different starting points across the timescale of metazoan evolution. Our findings show how a particular cascade of chance evolutionary steps throughout history makes the outcomes of molecular evolution increasingly idiosyncratic and unpredictable, even under strong selection.
Article
Linking interspecific interactions (e.g., mutualism, competition, predation, parasitism) to macroevolution (evolutionary change on deep timescales) is a key goal in biology. The role of species interactions in shaping macroevolutionary trajectories has been studied for centuries and remains a cutting-edge topic of current research. However, despite its deep historical roots, classic and current approaches to this topic are highly diverse. Here, we combine historical and contemporary perspectives on the study of ecological interactions in macroevolution, synthesizing ideas across eras to build a zoomed-out picture of the big questions at the nexus of ecology and macroevolution. We discuss the trajectory of this important and challenging field, dividing research into work done before the 1970s, research between 1970 and 2005, and work done since 2005. We argue that in response to long-standing questions in paleobiology, evidence accumulated to date has demonstrated that biotic interactions (including mutualism) can influence lineage diversification and trait evolution over macroevolutionary timescales, and we outline major open questions for future research in the field. Expected final online publication date for the Annual Review of Ecology, Evolution, and Systematics, Volume 51 is November 2, 2020. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Article
Although cases of independent adaptation to the same dietary niche have been documented in mammalian ecology, the molecular correlates of such shifts are seldom known. Here we used genome‐wide analyses of molecular evolution to examine two lineages of bats that, from an insectivorous ancestor, have both independently evolved obligate frugivory: the Old World family Pteropodidae and the neotropical subfamily Stenodermatinae. New genome assemblies from two neotropical fruit bats (Artibeus jamaicensis and Sturnira hondurensis) provide a framework for comparisons with Old World fruit bats. Comparative genomics of 10 bat species encompassing dietary diversity across the phylogeny revealed convergent molecular signatures of frugivory in both multi‐gene family evolution and single‐copy genes. Evidence for convergent molecular adaptations associated with frugivorous diets includes the composition of three subfamilies of olfactory receptor genes, losses of three bitter taste receptor genes, losses of two digestive enzyme genes, and convergent amino acid substitutions in several metabolic genes. By identifying suites of adaptations associated with the convergent evolution of frugivory, our analyses both reveal the extent of molecular mechanisms under selection in dietary shifts, and will facilitate future studies of molecular ecology in mammals.
Article
Motivation: When different lineages of organisms independently adapt to similar environments, selection often acts repeatedly upon the same genes, leading to signatures of convergent evolutionary rate shifts at these genes. With the increasing availability of genome sequences for organisms displaying a variety of convergent traits, the ability to identify genes with such convergent rate signatures would enable new insights into the molecular basis of these traits. Results: Here we present the R package RERconverge, which tests for association between relative evolutionary rates of genes and the evolution of traits across a phylogeny. RERconverge can perform associations with binary and continuous traits, and it contains tools for visualization and enrichment analyses of association results. Availability: RERconverge source code, documentation, and a detailed usage walk-through are freely available at https://github.com/nclark-lab/RERconverge. Datasets for mammals, Drosophila, and yeast are available at https://bit.ly/2J2QBnj. Supplementary information: Supplementary information, containing detailed vignettes for usage of RERconverge, are available at Bioinformatics online.
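The general idea of associating relative evolutionary rates with a trait can be sketched in a few lines of Python. This is a simplified illustration, not the RERconverge package (which is written in R) and not its actual estimation procedure or statistics. Here a "relative rate" is assumed to be the residual of a gene's branch lengths regressed on genome-wide average branch lengths, and the association with a binary trait is assumed to be a plain Pearson correlation. All function names and the toy values are hypothetical.

```python
from statistics import mean


def ols_residuals(x, y):
    """Residuals of y regressed on x by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


if __name__ == "__main__":
    # Toy data, one value per branch of a shared tree topology (made up).
    average_lengths = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]  # genome-wide average
    gene_lengths = [0.05, 0.10, 0.25, 0.20, 0.25, 0.40]     # one gene's tree
    trait = [0, 0, 1, 0, 0, 1]                              # 1 = convergent lineage

    relative_rates = ols_residuals(average_lengths, gene_lengths)
    print(pearson(relative_rates, trait))
```

In this toy case the two branches carrying the trait have positive residuals, so the gene appears accelerated in the convergent lineages. A real analysis would repeat the test for every gene and correct for multiple testing.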
Article
Evolutionary relationships have remained unresolved in many well-studied groups, even though advances in next-generation sequencing and analysis, using approaches such as transcriptomics, anchored hybrid enrichment, or ultraconserved elements, have brought systematics to the brink of whole genome phylogenomics. Recently, it has become possible to sequence the entire genomes of numerous non-biological models in parallel at reasonable cost, particularly with shotgun sequencing. Here we identify orthologous coding sequences from whole-genome shotgun sequences, which we then use to investigate the relevance and power of phylogenomic relationship inference and time-calibrated tree estimation. We study an iconic group of butterflies - swallowtails of the family Papilionidae - that has remained phylogenetically unresolved, with continued debate about the timing of their diversification. Low-coverage whole genomes were obtained using Illumina shotgun sequencing for all genera. Genome assembly coupled to BLAST-based orthology searches allowed extraction of 6,621 orthologous protein-coding genes for 45 Papilionidae species and 16 outgroup species (with 32% missing data after cleaning phases). Supermatrix phylogenomic analyses were performed with both maximum-likelihood (IQ-TREE) and Bayesian mixture models (PhyloBayes) for amino acid sequences, which produced a fully resolved phylogeny providing new insights into controversial relationships. Species tree reconstruction from gene trees was performed with ASTRAL and SuperTriplets and recovered the same phylogeny. We estimated gene and site concordance factors to complement traditional node-support measures, which strengthens the robustness of inferred phylogenies.
Bayesian estimates of divergence times based on a reduced dataset (760 orthologs and 12% missing data) indicate a mid-Cretaceous origin of Papilionoidea around 99.2 million years ago (Ma) (95% credibility interval: 68.6-142.7 Ma) and Papilionidae around 71.4 Ma (49.8-103.6 Ma), with subsequent diversification of modern lineages well after the Cretaceous-Paleogene event. These results show that shotgun sequencing of whole genomes, even when highly fragmented, represents a powerful approach to phylogenomics and molecular dating in a group that has previously been refractory to resolution.
Article
Mutualisms – cooperative interactions among different species – are known to influence global biodiversity. Nevertheless, theoretical and empirical work has led to divergent hypotheses about how mutualisms modulate diversity. We ask here when and how mutualisms influence species richness. Our synthesis suggests that mutualisms can promote or restrict species richness depending on mutualist function, the level of partner dependence, and the specificity of the partnership. These characteristics, which themselves are influenced by environmental and geographic variables, regulate species richness at different scales by modulating speciation, extinction, and community coexistence. Understanding the relative impact of these mechanisms on species richness will require the integration of new phylogenetic comparative models as well as the manipulation and monitoring of experimental communities and their resulting interaction networks.
Article
Knowledge of the internal phylogeny and evolutionary history of ants (Formicidae), the world's most species-rich clade of eusocial organisms, has dramatically improved since the advent of molecular phylogenetics. A number of relationships at the subfamily level, however, remain uncertain. Key unresolved issues include placement of the root of the ant tree of life and the relationships among the so-called poneroid subfamilies. Here we assemble a new data set to attempt a resolution of these two problems and carry out divergence dating, focusing on the age of the root node of crown Formicidae. For the phylogenetic analyses we included data from 110 ant species, including the key species Martialis heureka. We focused taxon sampling on non-formicoid lineages of ants to gain insight about deep nodes in the ant phylogeny. For divergence dating we retained a subset of 62 extant taxa and 42 fossils in order to approximate diversified sampling in the context of the fossilized birth-death process. We sequenced 11 nuclear gene fragments for a total of ∼7.5 kb and investigated the DNA sequence data for the presence of among-taxon compositional heterogeneity, a property known to mislead phylogenetic inference, and for its potential to affect the rooting of the ant phylogeny. We found sequences of the Leptanillinae and several outgroup taxa to be rich in adenine and thymine (51% average AT content) compared to the remaining ants (45% average). To investigate whether this heterogeneity could bias phylogenetic inference we performed outgroup removal experiments, analysis of compositionally homogeneous sites, and a simulation study. We found that compositional heterogeneity indeed appears to affect the placement of the root of the ant tree but has limited impact on more recent nodes. 
Our findings have implications for outgroup choice in phylogenetics, which should be made not only on the basis of close relationship to the ingroup, but should also take into account sequence divergence and other properties relative to the ingroup. We put forward a hypothesis regarding the rooting of the ant phylogeny, in which Martialis and the Leptanillinae together constitute a clade that is sister to all other ants. After correcting for compositional heterogeneity this emerges as the best-supported hypothesis of relationships at deep nodes in the ant tree. The results of our divergence dating under the fossilized birth-death process and diversified sampling suggest that the crown Formicidae originated during the Albian or Aptian ages of the Lower Cretaceous (103–124 Ma). In addition, we found support for monophyletic poneroids comprising the subfamilies Agroecomyrmecinae, Amblyoponinae, Apomyrminae, Paraponerinae, Ponerinae, and Proceratiinae, and well-supported relationships among these subfamilies except for the placement of Proceratiinae and (Amblyoponinae + Apomyrminae). Our phylogeny also highlights the non-monophyly of several ant genera, including Protanilla and Leptanilla in the Leptanillinae, Proceratium in the Proceratiinae, and Cryptopone, Euponera, and Mesoponera within the Ponerinae.
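The compositional comparison reported in this abstract (average AT content of one group of sequences versus another) rests on a simple per-sequence calculation, sketched below in Python. The function names and the toy sequences are hypothetical and are not taken from the study. Gaps and ambiguity codes are assumed to be excluded from the denominator.

```python
def at_content(sequence):
    """Fraction of A and T among unambiguous bases (A, C, G, T).

    Gaps and ambiguity codes such as '-' or 'N' are ignored (an assumption).
    """
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    return (counts["A"] + counts["T"]) / total


def mean_at_content(sequences):
    """Average AT content over a group of sequences."""
    values = [at_content(seq) for seq in sequences]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy alignment rows standing in for two groups of taxa (made-up data).
    group_a = ["AATTAATTGC", "ATATATGCAT"]
    group_b = ["GGCCATGCGC", "GCGCATATGC"]
    print(mean_at_content(group_a), mean_at_content(group_b))
```

A marked difference between group means, as in the abstract, is the kind of among-taxon compositional heterogeneity that can mislead phylogenetic inference and motivates analyses restricted to compositionally homogeneous sites.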
Article
We use the genomes of 160 insect species to test the hypothesis that the size of detoxifying enzyme families is greater in species using more chemically diverse food resources. Phylogenetically appropriate contrasts in subsamples of the data generally support the hypothesis. We find relatively high numbers of cytochrome P450, glutathione S-transferase and carboxyl/choline esterase genes in omnivores and herbivores feeding on chemically complex tissues and relatively low numbers of these genes in specialists on relatively simple diets, including plant sap, nectar and pollen, and blood. Among Lepidoptera feeding on green plant tissue and Condylognatha feeding on sap we also find more of these genes in highly polyphagous species, many of which are major agricultural pests. These genomic signatures of food resource use are consistent with the hypothesis that some taxa are preadapted for insecticide resistance evolution.