Article

PSMC (Pairwise Sequentially Markovian Coalescent) analysis of RAD (Restriction site Associated DNA) sequencing data

Authors:
• Aarhus University, Aarhus Campus

Abstract

The Pairwise Sequentially Markovian Coalescent (PSMC) method uses the genome sequence of a single individual to estimate demographic history covering a time span of thousands of generations. Although the method was originally designed for whole-genome data, we here use simulations to investigate its applicability to RAD (Restriction site Associated DNA) data aligned to a reference genome. We find that RAD data can potentially be used for PSMC analysis, but at present with limitations. The key factor is the proportion (p) of the genome that the RAD data covers. In our simulations, a proportion of 10% can still retain a substantial amount of coalescent information, whereas for 1% estimation becomes unreliable. The performance depends strongly on mutation rate (μ) and recombination rate (r) and is proportional to μ*p/r. When the value of this term is low, increasing the amount of data and the number of iterations helps restore the power of the estimation. We subsequently analyze one whole-genome-sequenced and 17 RAD-sequenced threespine sticklebacks (Gasterosteus aculeatus) from a lake in Greenland. The whole genome sequence suggests a relatively recent expansion and decline ca. 4,000-40,000 generations ago, possibly reflecting postglacial expansion and founding of the lake population. RAD data in which chromosomes from 10 individuals are combined identify a similar pattern. Our study provides guidance about the use of PSMC analysis and suggests measures that can improve its utility for RAD data. Finally, the study shows that RAD loci in general contain coalescent information that can be used for developing more targeted methods.


... Both PSMC and MSMC can be used with restriction site-associated DNA (RAD) data (Liu & Hansen, 2017). RAD sequencing is a reduced-representation method that gives the sequences of regions flanking the cutting sites of a chosen restriction enzyme (Miller, Dunham, Amores, Cresko, & Johnson, 2007). ...
... The smaller the fraction of the genome that this subset covers, the greater the reduction in accuracy and increase in variance. As with inferences from other reduced data sets, the demographic curve obtained from RAD data is flatter, with peaks and troughs that are less pronounced (Liu & Hansen, 2017). ...
... Based on evidence from simulations, a rule of thumb is that PSMC can recover the broad shape of the demographic curve if μp/r > 0.5, where μ is the mutation rate, p is the fraction of the genome covered by the RAD sequencing, and r is the recombination rate (Liu & Hansen, 2017). In practice, when RAD data are used for demographic inference, the read length and sampling density should be maximized. ...
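The rule of thumb quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from any of the cited studies: the function names, the way p is approximated (sequenced bases divided by genome size, assuming non-overlapping loci), and all numerical values in the example are illustrative assumptions. Only the quantity μp/r and the 0.5 threshold come from the quoted text.

```python
"""Illustrative check of the mu * p / r rule of thumb for PSMC on RAD data."""


def rad_coverage_fraction(n_loci: int, read_length: int, genome_size: int) -> float:
    """Approximate p as sequenced bases / genome size.

    Assumption: loci do not overlap, so covered bases = n_loci * read_length.
    The result is capped at 1.0.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return min(1.0, n_loci * read_length / genome_size)


def psmc_information_ratio(mu: float, p: float, r: float) -> float:
    """Return mu * p / r.

    mu and r must be expressed in the same units
    (for example, per site per generation).
    """
    if r <= 0:
        raise ValueError("r must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return mu * p / r


def passes_rule_of_thumb(mu: float, p: float, r: float, threshold: float = 0.5) -> bool:
    """True if mu * p / r exceeds the threshold (0.5 in the quoted rule of thumb)."""
    return psmc_information_ratio(mu, p, r) > threshold


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the calculation.
    p = rad_coverage_fraction(n_loci=50_000, read_length=100, genome_size=500_000_000)
    ratio = psmc_information_ratio(mu=1e-8, p=p, r=1e-8)
    ok = passes_rule_of_thumb(mu=1e-8, p=p, r=1e-8)
    print(f"p = {p:.3f}, mu*p/r = {ratio:.3f}, passes rule of thumb: {ok}")
```

Because p scales with both the number of loci and the read length, the same calculation shows why maximizing read length and sampling density raises the ratio.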
Article
Full-text available
A common goal of population genomics and molecular ecology is to reconstruct the demographic history of a species of interest. A pair of powerful tools based on the sequentially Markovian coalescent have been developed to infer past population sizes using genome sequences. These methods are most useful when sequences are available for only a limited number of genomes and when the aim is to study ancient demographic events. The results of these analyses can be difficult to interpret accurately, because doing so requires some understanding of their theoretical basis and of their sensitivity to confounding factors. In this practical review, we explain some of the key concepts underpinning the pairwise and multiple sequentially Markovian coalescent methods (PSMC and MSMC, respectively). We relate these concepts to the use and interpretation of these methods, and we explain how the choice of different parameter values by the user can affect the accuracy and precision of the inferences. Based on our survey of 100 PSMC studies and 30 MSMC studies, we describe how the two methods are used in practice. Readers of this article will become familiar with the principles, practice, and interpretation of the sequentially Markovian coalescent for inferring demographic history.
... To determine whether the demographic history of threespine stickleback fish could influence the ability to detect recombination hotspots, hotspots were assayed in simulated haplotypes with known recombination profiles and demographic histories. Demographic histories used in the simulations were based on the estimated histories of Lake Washington and Puget Sound, modeled using a Pairwise Sequentially Markovian Coalescent (PSMC) process with default parameters (Li and Durbin 2011; Liu and Hansen 2017). PSMC was run on all individuals from both populations and confidence intervals were estimated on 100 bootstrap replicates. ...
... Our estimates of effective population size over time revealed that Lake Washington and Puget Sound did not experience similar fluctuations. Both populations began with effective population sizes that largely parallel those observed in other threespine stickleback fish populations (Liu and Hansen 2017; Ravinet et al. 2018). Puget Sound then experienced a larger population expansion roughly 18,000 years ago, followed by a decrease in population size 8,000 years ago. ...
Article
Full-text available
Meiotic recombination is a highly conserved process that has profound effects on genome evolution. At a fine scale, recombination rates can vary drastically across genomes, often localized into small recombination “hotspots” with highly elevated rates, surrounded by regions with little recombination. In most species studied, the location of hotspots within genomes is highly conserved across broad evolutionary timescales. The main exception to this pattern is in mammals, where hotspot location can evolve rapidly among closely related species and even among populations within a species. Hotspot position in mammals is controlled by the gene Prdm9, whereas in species with conserved hotspots, a functional Prdm9 is typically absent. Due to a limited number of species where recombination rates have been estimated at a fine scale, it remains unclear whether hotspot conservation is always associated with the absence of a functional Prdm9. Threespine stickleback fish (Gasterosteus aculeatus) are an excellent model to examine the evolution of recombination over short evolutionary timescales. Using an LD-based approach, we found recombination rates indeed varied at a fine scale across the genome, with many regions organized into narrow hotspots. Hotspots had highly divergent landscapes between stickleback populations, where only ∼15% of these hotspots were shared.
Our results indicate fine-scale recombination rates may be diverging between closely related populations of threespine stickleback fish. Interestingly, we found only a weak association of a PRDM9 binding motif within hotspots, which suggests threespine stickleback fish may possess a novel mechanism for targeting recombination hotspots at a fine scale.
... Other Hidden Markov Model (HMM) based methods can infer TMRCA from the complete chromosome information, such as the multiple Sequentially Markovian Coalescent (MSMC) [17] and the Pairwise Sequentially Markovian Coalescent (PSMC) [18]. As a computational method, PSMC relies on the distribution of TMRCA between alleles along a diploid individual's genome [19]. PSMC estimates the historical effective population size from genome-scale data of a single individual [20]. ...
Article
Full-text available
Background: Inferring historical population admixture events yields essential insights into a species' demographic history. Methods are available to infer admixture events in demographic history with extant genetic data from multiple sources. Due to the deficiency in ancient population genetic data, a method for admixture inference from a single source is lacking. The Pairwise Sequentially Markovian Coalescent (PSMC) estimates the historical effective population size from lineage genomes of a single individual, based on the distribution of the most recent common ancestor between the diploid’s alleles. However, PSMC does not infer the admixture event. Results: Here, we proposed eSMC, an extended PSMC model for admixture inference from a single source. We evaluated our model’s performance on both in silico data and real data. We simulated population admixture events at an admixture time range from 5 kya to 100 kya (5 years/generation) with population admix ratios of 1:1, 2:1, 3:1, and 4:1, respectively. The root mean square error is ±7.61 kya for all experiments.
Then we applied our method to infer the historical admixture events in human, donkey and goat populations. The estimated admixture time for both Han and Tibetan individuals ranges from 60 kya to 80 kya (25 years/generation), while the estimated admixture times for the domesticated donkeys and the goats ranged from 40 kya to 60 kya (8 years/generation) and 40 kya to 100 kya (6 years/generation), respectively. The estimated admixture times were concordant with the time that domestication occurred in human history. Conclusion: Our eSMC effectively infers the time of the most recent admixture event in history from a single individual’s genomics data. The source code of eSMC is hosted at https://github.com/zachary-zzc/eSMC .
... We inferred variation in the population size of the soursop based on the observed heterozygosity in the diploid genome using PSMC (Liu and Hansen 2017), removing shorter scaffolds (Strijk et al. 2021), assuming a generation time of 15 years (Collevatti et al. 2014). ...
Chapter
The Annonaceae family contains important tropical crops, but the number of species used commercially is limited, and development of other promising species for cultivation is hindered by a lack of genomic resources to support the building of breeding programmes. The family is part of the magnoliids, an ancient lineage of angiosperms for which evolutionary relationships with other major clades have remained unclear. To provide novel resources to both plant breeders and evolutionary research, we described the chromosome-level genome assembly of the soursop (Annona muricata L.), using DNA data generated with PacBio and Illumina short-read technology, in combination with 10XGenomics, BioNano data, and Hi-C sequencing.
To disentangle key angiosperm relationships, we reconstructed phylogenomic trees comparing a wider sampling of available angiosperm genomes and reveal that the soursop represents a genomic mosaic supporting different evolutionary histories, with scaffolds almost exclusively supporting singular topologies. However, coalescent methods and a majority of genes support magnoliids as sister to monocots and eudicots, where previously published whole genome-based studies remained inconclusive. The soursop genome highlights the need for more early diverging angiosperm genomes and critical assessment of the suitability of such genomes for inferring evolutionary history. The soursop is the first genome assembled in Annonaceae and supports further studies of floral evolution in magnoliids, whilst providing an essential resource for delineating relationships of major lineages at the base of the angiosperms. Both genome-assisted improvement in promising Annonaceae fruit crops and conservation efforts will be strengthened by the availability of the soursop genome. The genome assembly as a community resource will further strengthen the role of Annonaceae as a model group for research on the ecology, evolution, and domestication potential of tropical species in pomology and agroforestry.
... The psmcfa output was visualized in R, using a modified version of the plotPsmc.r script supplied by Liu and Hansen (2017) with mutation rates of 2.23e-09 sites/year (1.784e-08 sites/generation) inferred for the Siberian ibex (Chen et al., 2019). To explore the effect of different mutation rates on the demographic trajectories, we also doubled and halved the mutation rate for the PSMC analysis (Figure S1A and B). ...
Article
Full-text available
Population bottlenecks can have dramatic consequences for the health and long‐term survival of a species.
Understanding of historic population size and standing genetic variation prior to a contraction allows estimating the impact of a bottleneck on the species' genetic diversity. Although historic population sizes can be modelled based on extant genomics, uncertainty is high for the last 10‐20 millennia. Hence, integrating ancient genomes provides a powerful complement to retrace the evolution of genetic diversity through population fluctuations. Here, we recover 15 high‐quality mitogenomes of the once nearly extinct Alpine ibex spanning 8601 BP to 1919 CE and combine these with 60 published modern whole genomes. Coalescent demography simulations based on modern whole genomes indicate population fluctuations coinciding with the last major glaciation period. Using our ancient and historic mitogenomes, we investigate the more recent demographic history of the species and show that mitochondrial haplotype diversity was reduced to a fifth of the pre‐bottleneck diversity with several highly differentiated mitochondrial lineages having co‐existed historically. The main collapse of mitochondrial diversity coincides with elevated human population growth during the last 1‐2 kya. After recovery, one lineage was spread and nearly fixed across the Alps due to recolonization efforts. Our study highlights that a combined approach integrating genomic data of ancient, historic and extant populations unravels major long‐term population fluctuations from the emergence of a species through its near extinction up to the recent past.
... The script plotPsmc.r (Liu & Hansen, 2017) was used for plotting the inferred historical dynamics of Ne. Similarly, "pseudodiploid" samples were analysed. ...
Article
Analyzing variation in a species’ genomic diversity can provide insights into its historical demography, biogeography and population structure, and thus, its ecology and evolution.
Although such studies are rarely undertaken for parasites, they can be highly revealing because of the parasite’s coevolutionary relationships with hosts. Modes of reproduction and transmission are thought to be strong determinants of genomic diversity for parasites and vary widely among microsporidia (fungal‐related intracellular parasites), which are known to have high intraspecific genetic diversity and interspecific variation in genome architecture. Here we explore genomic variation in the microsporidium Hamiltosporidium, a parasite of the freshwater crustacean Daphnia magna, looking especially at which factors contribute to nucleotide variation. Genomic samples from 18 Eurasian populations and a new, long‐read based reference genome were used to determine the roles that reproduction mode, transmission mode and geography play in determining population structure and demographic history. We demonstrate two main H. tvaerminnensis lineages and a pattern of isolation‐by‐distance, but note an absence of congruence between these two parasite lineages and the two Eurasian host lineages. We suggest a comparatively recent parasite spread through Northern Eurasian host populations after a change from vertical to mixed‐mode transmission and the loss of sexual reproduction. While gaining knowledge about the ecology and evolution of this focal parasite, we also identify common features that shape variation in genomic diversity for many parasites, e.g., distinct modes of reproduction and the intertwining of host–parasite demographies.
... The demographic history for FAWs from America (n = 10, three from AA, three from AB, two from AC, and two from AD), Africa (n = 10, four from ET, four from KE and two from SA) and China (n = 10, five from YN and five from GX) was inferred using a hidden Markov model approach following the pairwise sequentially Markovian coalescent (PSMC) (Liu and Hansen, 2017) based on SNP distribution.
The parameters were set as follows: "-N25 -t15 -r5 -p 4+25*2+4+6", "-d 13 -D 80", "-q20". ...
Article
Full-text available
The fall armyworm (FAW), Spodoptera frugiperda, is a destructive pest native to America and has recently become an invasive insect pest in China. Because of its rapid spread and great risks in China, understanding the genetic background and pesticide resistance of FAW is urgent and essential for developing effective management strategies. Here, we assembled a chromosome-level genome of a male FAW (SFynMstLFR) and compared resequencing results of the populations from America, Africa, and China. Strain identification of 163 individuals collected from America, Africa and China showed that both C and R strains were found in the American populations, while only the C strain was found in the Chinese and African populations. Moreover, population genomics analysis showed that populations from Africa and China are closely related, with significant genetic differentiation from American populations. Taken together, the FAWs that invaded China most likely originated from Africa. Comparative genomics analysis showed that the cytochrome p450 gene family is extremely expanded to 425 members in FAW, of which 283 genes are specific to FAW. Treatments of Chinese populations with twenty-three pesticides showed variant patterns of transcriptome profiles, and several detoxification genes such as AOX, UGT and GST specifically responded to the pesticides. These findings will be useful in developing effective strategies for management of FAW in China and other invaded areas.
... We noticed low genetic diversity and extremely positive Tajima's D in the moso bamboo population (Supplementary Table 7), so we carried out a demographic analysis to infer historical population changes that resulted in the current population. First, we used the pairwise sequential Markovian coalescent (PSMC) [31] to investigate the trends of changes in the relatively remote history.
As expected, we obtained unsegregated PSMC curves for individuals from five phylogenetic groups (Fig. 3e), all of which showed a rapid decline in the effective population size (Ne) of the moso bamboo population during the last glacial period (115,000-11,700 years ago). ...
Article
Full-text available
Moso bamboo (Phyllostachys edulis) is an economically and ecologically important nontimber forestry species. Further development of this species as a sustainable bamboo resource has been hindered by a lack of population genome information. Here, we report a moso bamboo genomic variation atlas of 5.45 million single-nucleotide polymorphisms (SNPs) from whole-genome resequencing of 427 individuals covering 15 representative geographic areas. We uncover low genetic diversity, high genotype heterozygosity, and genes under balancing selection underlying moso bamboo population adaptation. We infer its demographic history with one bottleneck and its recently small population without a rebound. We define five phylogenetic groups and infer that one group probably originated by a single-origin event from East China. Finally, we conduct genome-wide association analysis of nine important property-related traits to identify candidate genes, many of which are involved in cell wall, carbohydrate metabolism, and environmental adaptation. These results provide a foundation and resources for understanding moso bamboo evolution and the genetic mechanisms of agriculturally important traits. Moso bamboo is an economically and ecologically important nontimber forestry species. Here, the authors analyze 427 genomes collected from 15 representative geographic areas, and identify genes under balancing selection, putative patterns of historic demography, and candidate genes associated with important traits.
... Stairway Plot v2 (Liu and Fu, 2020) was used to infer temporal changes in the population size (Ne) for each species.
A common mutation rate of 1.0 × 10⁻⁸ per site per generation was used following Liu and Hansen (2017) due to the lack of a precise SNP mutation rate reported for Cycas, and 40 years was set as a generation (International Union for Conservation of Nature, 2020) in the present study. All of the samples for C. bifida, C. taiwaniana, and C. dolichophylla were kept due to their small sample size, whereas we downsampled 20 individuals per species for C. changjiangensis, C. balansae, and C. szechuanensis to generate the no-missing SNP dataset. ...
Article
Full-text available
Cycads represent one of the most ancestral living seed plants as well as one of the most threatened plant groups in the world. South China is a major center and potential origin of Cycas, the most rapidly diversified lineage of cycads. However, genome-wide diversity of Cycas remains poorly understood due to the challenge of generating genomic markers associated with their inherent large genomes. Here, we perform a comprehensive conservation genomic study based on restriction-site associated DNA sequencing (RADseq) data in six representative species of Cycas in South China. Consistently low genetic diversity and strong genetic differentiation were detected across species. Both phylogenetic inference and genetic structure analysis via several methods revealed generally congruent groups among the six Cycas species. The analysis with ADMIXTURE showed low mixing of genetic composition among species, while individuals of C. dolichophylla exhibited substantial genetic admixture with C. bifida, C. changjiangensis, and C. balansae. Furthermore, the results from Treemix, f4-statistic, and ABBA-BABA test were generally consistent and revealed the complex patterns of interspecific gene flow. Relatively strong signals of hybridization were detected between C. dolichophylla and C. szechuanensis, and the ancestor of C. taiwaniana and C. changjiangensis.
Distinct patterns of demographic history were inferred for these species by Stairway Plot, and our results suggested that both climate fluctuation and frequent geological activities during the late Pleistocene exerted deep impacts on the population dynamics of these species in South China. Finally, we explore the practical implications of our findings for the development of conservation strategies in Cycas. The present study demonstrates the efficiency of RADseq for conservation genomic studies on non-model species with large and complex genomes. Given the great significance of cycads as a radical transition in the evolution of plant biodiversity, our study provides important insights into the mechanisms of diversification in such recently radiated living fossil taxa.
... If sampling more localities is unfeasible as may be the case in the Antarctic realm, it can be beneficial to instead invest in high density sequencing (as in several markers per linkage group). With sufficient genome coverage even advanced coalescent modeling is possible using RRS data [112]. ...
Article
Full-text available
Background: Genome-wide data are invaluable to characterize differentiation and adaptation of natural populations. Reduced representation sequencing (RRS) subsamples a genome repeatedly across many individuals. However, RRS requires careful optimization and fine-tuning to deliver high marker density while being cost-efficient. The number of genomic fragments created through restriction enzyme digestion and the sequencing library setup must match to achieve sufficient sequencing coverage per locus. Here, we present a workflow based on published information and computational and experimental procedures to investigate and streamline the applicability of RRS.
Results: In an iterative process genome size estimates, restriction enzymes and size selection windows were tested and scaled in six classes of Antarctic animals (Ostracoda, Malacostraca, Bivalvia, Asteroidea, Actinopterygii, Aves). Achieving high marker density would be expensive in amphipods, the malacostracan target taxon, due to the large genome size. We propose alternative approaches such as mitogenome or target capture sequencing for this group. Pilot libraries were sequenced for all other target taxa. Ostracods, bivalves, sea stars, and fish showed overall good coverage and marker numbers for downstream population genomic analyses. In contrast, the bird test library produced low coverage and few polymorphic loci, likely due to degraded DNA. Conclusions: Prior testing and optimization are important to identify which groups are amenable for RRS and where alternative methods may currently offer better cost-benefit ratios. The steps outlined here are easy to follow for other non-model taxa with little genomic resources, thus stimulating efficient resource use for the many pressing research questions in molecular ecology.
... The W. mirabilis population size history was inferred using the Pairwise Sequentially Markovian Coalescent model (PSMC, version 0.6.5-r67) (Li & Durbin, 2011; Liu & Hansen, 2017). Analysis was carried out with default parameters. ...
Article
Full-text available
Welwitschia mirabilis, which is endemic to the Namib Desert, is the only living species within the Welwitschiaceae family. This species has an extremely long lifespan of up to 2 000 years and bears a single pair of opposite leaves that persist whilst alive. However, the underlying genetic mechanisms and evolution of the species remain poorly elucidated. Here, we report on a chromosome-level genome assembly for W. mirabilis, with a 6.30 Gb genome sequence and contig N50 of 27.50 Mb. In total, 39 019 protein-coding genes were predicted from the genome.
Two brassinosteroid-related genes (BRI1 and CYCD3), key regulators of cell division and elongation, were strongly selected in W. mirabilis and may contribute to their long ever-growing leaves. Furthermore, 29 gene families in the MAPK signaling pathway showed significant expansion, which may contribute to the desert adaptations of the plant. Three positively selected genes (EHMT1, EIF4E, SOD2) may be involved in the mechanisms leading to long lifespan. Based on molecular clock dating and fossil calibrations, the divergence time of W. mirabilis and Gnetum montanum was estimated at ~123.5 million years ago. Reconstruction of population dynamics from genome data coincided well with the aridification of the Namib Desert. The genome sequence detailed in the current study provides insight into the evolution of W. mirabilis and should be an important resource for further study on gnetophyte and gymnosperm evolution.
... If sampling more localities is unfeasible as may be the case in the Antarctic realm, it can be beneficial to instead invest in high density sequencing (as in several markers per linkage group). With sufficient genome coverage even advanced coalescent modeling is possible using RRS data (108). ...
Preprint
Full-text available
Genome-wide data are invaluable to characterize differentiation and adaptation of natural populations. Reduced representation sequencing (RRS) subsamples a genome repeatedly across many individuals. However, RRS requires careful optimization and fine-tuning to deliver high marker density while being cost-efficient. The number of genomic fragments created through restriction enzyme digestion and the sequencing library setup must match to achieve sufficient sequencing coverage per locus. Here, we present a workflow based on published information and computational and experimental procedures to investigate and streamline the applicability of RRS.
In an iterative process genome size estimates, restriction enzymes and size selection windows were tested and scaled in six classes of Antarctic animals (Ostracoda, Malacostraca, Bivalvia, Asteroidea, Actinopterygii, Aves). Achieving high marker density would be expensive in amphipods, the malacostracan target taxon, due to the large genome size. We propose alternative approaches such as mitogenome or target capture sequencing for this group. Pilot libraries were sequenced for all other target taxa. Ostracods, bivalves, sea stars, and fish showed overall good coverage and marker numbers for downstream population genomic analyses. In contrast, the bird test library produced low coverage and few polymorphic loci, likely due to degraded DNA. Prior testing and optimization are important to identify which groups are amenable for RRS and where alternative methods may currently offer better cost-benefit ratios. The steps outlined here are easy to follow for other non-model taxa with little genomic resources, thus stimulating efficient resource use for the many pressing research questions in molecular ecology.
... to reconstruct introduction histories (Liu & Hansen, 2017; Sherpa et al., 2018). ...
Article
Full-text available
Biological invasions, the establishment and spread of non‐native species in new regions, can have extensive economic and environmental consequences. Increased global connectivity accelerates introduction rates, while climate and land‐cover changes may decrease the barriers to invasive populations' spread.
A detailed knowledge of the invasion history, including assessing source populations, routes of spread, number of independent introductions, and the effects of genetic bottlenecks and admixture on the establishment success, adaptive potential, and further spread, is crucial from an applied perspective to mitigate socio‐economic impacts of invasive species, as well as for addressing fundamental questions on the evolutionary dynamics of the invasion process. Recent advances in genomics together with the development of geographic information systems provide unprecedented large genetic and environmental datasets at global and local scales to link population genomics, landscape ecology and species distribution modelling into a common framework to study the invasion process. Although the factors underlying population invasiveness have been extensively reviewed, analytical methods currently available to optimally combine molecular and environmental data for inferring invasive population demographic parameters and predicting further spreading are still under development. In this review we focus on the few recent insect invasion studies that combine different datasets and approaches to show how integrating genetic, observational, ecological and environmental data paves the way to a more integrative biological invasion science. We provide guidelines to study the evolutionary dynamics of invasions at each step of the invasion process, and conclude on the benefits of including all types of information and up‐to‐date analytical tools from different research areas into a single framework.
... Thus, the IICR as inferred by PSMC was plotted over time with the x-axis scaled using a per generation mutation rate for buffalo of 1.5e−8 and a generation time of 7.5 years (the estimated per site per year mutation rate of 2.0e−9 by Chen et al. [46] converts to 1.5e−8 for a generation time of 7.5 years).
PSMC plots were constructed in R v3.6.2 [43], using ggplot2 v3.3.0 [60], by editing the script of Emily Humble (available at: https://github.com/elhumble/SHO_analysis_2020), which uses the plotPsmc R function from Liu and Hansen [61]. A link to all code used in this study is provided in the data availability section. ... Article Full-text available Genomes retain records of demographic changes and evolutionary forces that shape species and populations. Remnant populations of African buffalo (Syncerus caffer) in South Africa, with varied histories, provide an opportunity to investigate signatures left in their genomes by past events, both recent and ancient. Here, we produce 40 low coverage (7.14×) genome sequences of Cape buffalo (S. c. caffer) from four protected areas in South Africa. Genome-wide heterozygosity was the highest for any mammal for which these data are available, while differences in individual inbreeding coefficients reflected the severity of historical bottlenecks and current census sizes in each population. PSMC analysis revealed multiple changes in Ne between approximately one million and 20 thousand years ago, corresponding to paleoclimatic changes and Cape buffalo colonisation of southern Africa. The results of this study have implications for buffalo management and conservation, particularly in the context of the predicted increase in aridity and temperature in southern Africa over the next century as a result of climate change. ... We used PSMC (Liu & Hansen, 2017) to infer the variation in population size of the soursop based on the observed heterozygosity in the diploid genome. As PSMC was shown to perform reliably for scaffolds >100 kb, we removed shorter scaffolds from the assembly. ... Article Full-text available The flowering plant family Annonaceae includes important commercially grown tropical crops, but development of promising species is hindered by a lack of genomic resources to build breeding programs.
Annonaceae are part of the magnoliids, an ancient lineage of angiosperms for which evolutionary relationships with other major clades remain unclear. To provide resources to breeders and evolutionary researchers, we report a chromosome‐level genome assembly of the soursop (Annona muricata). We assembled the genome using 444.32 Gb of DNA sequences (676X sequencing depth) from PacBio and Illumina short‐reads, in combination with 10XGenomics and Bionano data (v1). A total of 949 scaffolds were assembled to a final size of 656.77 Mb, with a scaffold N50 of 3.43 Mb (v1), and then further improved to seven pseudo‐chromosomes using Hi‐C sequencing data (v2; scaffold N50: 93.2 Mb, total size in chromosomes: 639.6 Mb). Heterozygosity was very low (0.06%), while repeat sequences accounted for 54.87% of the genome, and 23,375 protein‐coding genes with an average of 4.79 exons per gene were annotated using de novo, RNA‐seq and homology‐based approaches. Reconstruction of the historical population size showed a slow continuous contraction, likely related to Cenozoic climate changes. The soursop is the first genome assembled in Annonaceae, supporting further studies of floral evolution in magnoliids, providing an essential resource for delineating relationships of ancient angiosperm lineages. Both genome‐assisted improvement and conservation efforts will be strengthened by the availability of the soursop genome. As a community resource, this assembly will further strengthen the role of Annonaceae as model species for research on the ecology, evolution and domestication potential of tropical species in pomology and agroforestry. ... OL and BZ showed a strong correlation since 100,000 years ago. At 20,000 years, the effective population size of TST increased [65,66]. ...
Article Full-text available The identification of genome-wide selection signatures can provide insights on the mechanisms of natural and/or artificial selection and uncover genes related to biological functions and/or phenotypes. Tibetan sheep are an important livestock in Tibet, providing meat and wool for Tibetans who are renowned for breeding livestock that adapt well to high altitudes. Using whole-genome sequences with an effective sequencing depth of 5×, we investigated the genomic diversity and structure and identified selection signatures of White Tibetan, Oula and Poll Dorset sheep. We obtained 30,163,679 Single Nucleotide Polymorphisms (SNPs) and 5,388,372 indels benchmarked against the ovine Oar_v4.0 genome assembly. Next, using FST, ZHp and XP-EHH approaches, we identified selection signatures spanning a set of candidate genes, including HIF1A, CAPN3, PRKAA1, RXFP2, TRHR and HOXA10, that are associated with pathways and GO categories putatively related to hypoxia responses, meat traits and disease resistance. Candidate genes and GO terms associated with coat color were also identified. Finally, quantification of blood physiological parameters revealed higher levels of mean corpuscular hemoglobin measurement and mean corpuscular hemoglobin concentration in Tibetan sheep compared with Poll Dorset, suggesting a greater oxygen-carrying capacity in the Tibetan sheep and thus better adaptation to high-altitude hypoxia. In conclusion, this study provides a greater understanding of genome diversity and variations associated with adaptive and production traits in sheep. ... Historical effective population size of T. bleekeri was estimated using Pairwise Sequentially Markovian Coalescent (PSMC) v0.6.5 software [74]. We used the data for whole-genome variants of individuals for the genome assembly. ... Article Full-text available Background Intense stresses caused by high-altitude environments may result in noticeable genetic adaptions in native species.
Studies of genetic adaptations to high elevations have been largely limited to terrestrial animals. How fish adapt to high-elevation environments is largely unknown. Triplophysa bleekeri, an endemic fish inhabiting high-altitude regions, is an excellent model to investigate the genetic mechanisms of adaptation to the local environment. Here, we assembled a chromosomal genome sequence of T. bleekeri, with a size of ∼628 Mb (contig and scaffold N50 of 3.1 and 22.9 Mb, respectively). We investigated the origin and environmental adaptation of T. bleekeri based on 21,198 protein-coding genes in the genome. Results Compared with fish species living at low altitudes, gene families associated with lipid metabolism and immune response were significantly expanded in the T. bleekeri genome. Genes involved in DNA repair exhibit positive selection for T. bleekeri, Triplophysa siluroides, and Triplophysa tibetana, indicating that adaptive convergence in Triplophysa species occurred at the positively selected genes. We also analyzed whole-genome variants among samples from 3 populations. The results showed that populations separated by geological and artificial barriers exhibited obvious differences in genetic structures, indicating that gene flow is restricted between populations. Conclusions These results will help us expand our understanding of environmental adaptation and genetic diversity of T. bleekeri and provide valuable genetic resources for future studies on the evolution and conservation of high-altitude fish species such as T. bleekeri. ... The demographic history for FAWS from America (n = 10, three from AA, three from AB, two from AC, and two from AD), Africa (n = 10, four from ET, four from KE and two from SA) and China (n = 10, five from YN and five from GX) was inferred using a hidden Markov model approach following the pairwise sequentially Markovian coalescence (PSMC) (Liu and Hansen, 2017) based on SNP distribution. 
The parameters were set as follows: "-N25 -t15 -r5 -p 4+25*2+4+6", "-d 13 -D 80", "-q20". ... Article Full-text available The fall armyworm (FAW), Spodoptera frugiperda, is a destructive pest native to America and has recently become an invasive insect pest in China. Because of its rapid spread and great risks in China, understanding of FAW genetic background and pesticide resistance is urgent and essential to develop effective management strategies. Here, we assembled a chromosome-level genome of a male FAW (SFynMstLFR) and compared resequencing results of the populations from America, Africa, and China. Strain identification of 163 individuals collected from America, Africa and China showed that both C and R strains were found in the American populations, while only C strain was found in the Chinese and African populations. Moreover, population genomics analysis showed that populations from Africa and China are closely related and show significant genetic differentiation from American populations. Taken together, the FAW that invaded China most likely originated from Africa. Comparative genomics analysis showed that the cytochrome p450 gene family is extremely expanded to 425 members in FAW, of which 283 genes are specific to FAW. Treatments of Chinese populations with twenty-three pesticides showed varying patterns of transcriptome profiles, and several detoxification genes such as AOX, UGT and GST responded specifically to the pesticides. These findings will be useful in developing effective strategies for management of FAW in China and other invaded areas. ...
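The `-p` argument in the PSMC parameter string quoted in the fall armyworm excerpt above is a compact pattern describing how atomic time intervals are grouped into free population-size parameters. As a rough sketch (the function name is hypothetical and this is not part of PSMC itself), the pattern can be expanded in Python to count both quantities; a term `a*b` is read as `a` parameters that each span `b` intervals, and a bare term `b` as one parameter spanning `b` intervals.

```python
def parse_psmc_pattern(pattern):
    """Return (n_atomic_intervals, n_free_parameters) for a PSMC -p pattern."""
    n_intervals = 0
    n_params = 0
    # Tolerate stray whitespace, which often creeps in when patterns are copied.
    for term in pattern.replace(" ", "").split("+"):
        if "*" in term:
            repeats, width = (int(x) for x in term.split("*"))
        else:
            repeats, width = 1, int(term)
        n_intervals += repeats * width   # atomic intervals covered by this term
        n_params += repeats              # one free parameter per group
    return n_intervals, n_params


intervals, params = parse_psmc_pattern("4+25*2+4+6")
print(intervals, params)  # 64 atomic intervals, 28 free parameters
```

Merging intervals at the recent and ancient ends of the pattern is a common way to limit overfitting where few recombination events are expected to be informative.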
With the accelerating development and accessibility of high-throughput sequencing (HTS), a number of new methods have become available that make use of reduced representation (e.g., RAD sequencing) or whole-genome sequencing data for tracking detailed demographic history since the LGM (e.g., Li & Durbin, 2011;Liu & Hansen, 2017;Liu & Fu, 2015), which have indicated that different species, or even different lineages within species, might have had varying sensitivities to the temperature decline during the LGM. Many threatened species or lineages became extinct or failed to recover to pre-LGM population sizes (e.g., Mays et al., 2018;Yang et al., 2018), but others were able to recover at the end of the LGM and rapidly expanded with Holocene climate warming (e.g., Chattopadhyay et al., 2019;Ye et al., 2018). ... Article Full-text available Genetic stochasticity and bottlenecking in the course of Pleistocene glaciations have been indicated to threaten the survival of local endemics. However, the mechanisms by which local endemic species balance the influences of these two events remain poorly understood. Here, we generated a ddRAD-seq dataset, mined mitochondrial sequences and constructed ecological niche models (ENMs) for the island endemic water strider Metrocoris esakii (Hemiptera:Gerridae). We found that M. esakii comprised three divergent lineages (i.e., north, central and south) isolated by geographical barriers and generally experienced population declines with the constriction of suitable areas during the Last Glacial Maximum (LGM). Further demographic model testing and stairway plots revealed a history of recent gene flow among the neighbouring lineages and rapid recovery at the end of the LGM, indicating that M. esakii at least had the potential for an adaptive response to the population fragmentation and bottlenecking. 
The northern lineage did not show genetic bottlenecking during the LGM, which was probably due to its large effective population size (Ne) from migration, which improved its adaptive potential. Relative to the ddRAD-seq dataset, the demographic results based on mitochondrial sequences were less conclusive, showing weak differentiation and oversimplified demographic trajectories for the three genetic lineages. Overall, this study provides some degree of optimism for the survival of island endemic water striders from a demographic perspective, but further evaluation of their extinction risk under the impacts of human activities is required. ... We used PSMC (Liu and Hansen 2017) to infer the variation in population size of the soursop based on the observed heterozygosity in the diploid genome. As PSMC was shown to perform reliably for scaffolds >100 kb, we removed shorter scaffolds from the assembly. ... ... Additionally, large portions of the reference genomes covered up to 97% of the plastome and 69% of the reference mitochondrial genome. These results are expected with RADseq data as reads will rarely cover the entire reference because of the use of restriction enzymes (Liu & Hansen, 2017). These results indicate that our RADseq protocol is also effective at recovering large portions of the plastome and mitochondrial genome, without reducing the effectiveness and reliability of RADseq for population genetics or phylogenetic inference (Fitz-Gibbon, Hipp, Pham, Manos, & Sork, 2017). ... Article Full genome sequencing of organisms with large and complex genomes is intractable and cost ineffective under most research budgets. Cycads (Cycadales) represent one of the oldest lineages of the extant seed plants and, partly due to their age, have incredibly large genomes up to ~60Gbp. Restriction site associated DNA sequencing (RADseq) offers an approach to find genome‐wide informative markers and has proven to be effective with both model and non‐model organisms.
We tested the application of RADseq using ezRAD across all ten genera of the Cycadales including an example dataset of Cycas calcicola representing 72 samples from natural populations. Using previously available plastid and mitochondrial genomes as references, reads were mapped recovering plastid and mitochondrial genome regions and nuclear markers for all of the genera. De novo assembly generated up to 138,407 high‐depth clusters and up to 1,705 phylogenetically informative loci for the genera, and 4,421 loci for the example assembly of C. calcicola. The number of loci recovered by de novo assembly was lower than in previous RADseq studies, yet still sufficient for downstream analysis. However, the number of markers could be increased by relaxing our assembly parameters, especially for the C. calcicola dataset. Our results demonstrate the successful application of RADseq across the Cycadales to generate a large number of markers for all genomic compartments, despite the large number of plastids present in a typical plant cell. Our modified protocol was adapted to be applied to cycads and other organisms with large genomes to yield many informative genome‐wide markers. This article is protected by copyright. All rights reserved. ... We used PSMC (Liu and Hansen 2017) to infer the variation in population size of the soursop based on the observed heterozygosity in the diploid genome. As PSMC was shown to perform reliably for scaffolds >100 kb, we removed shorter scaffolds from the assembly. ... Preprint Full-text available Deep relationships and the sequence of divergence among major lineages of angiosperms (magnoliids, monocots and eudicots) remain ambiguous and differ depending on analytical approaches and datasets used. Complete genomes potentially provide opportunities to resolve these uncertainties, but two recently published magnoliid genomes instead deliver further conflicting signals.
To disentangle key angiosperm relationships, we report a high-quality draft genome for the soursop (Annona muricata, Annonaceae). We reconstructed phylogenomic trees and show that the soursop represents a genomic mosaic supporting different histories, with scaffolds almost exclusively supporting single topologies. However, coalescent methods and a majority of genes support magnoliids as sister to monocots and eudicots, where previous whole genome-based studies remained inconclusive. This result is clear and consistent with recent studies using plastomes. The soursop genome highlights the need for more early diverging angiosperm genomes and critical assessment of the suitability of such genomes for inferring evolutionary history. ... However, our estimates of effective population size over time revealed that Lake Washington and Puget Sound did not experience similar fluctuations. Both populations began with effective population sizes that largely parallel those observed in other threespine stickleback fish populations (Liu and Hansen 2017; Ravinet et al. 2018). Puget Sound then experienced a larger population expansion roughly 18,000 years ago, followed with a decrease in population size at approximately 8,000 years ago. ... Preprint Full-text available Meiotic recombination is a highly conserved process that has profound effects on genome evolution. Recombination rates can vary drastically at a fine-scale across genomes and often localize to small recombination 'hotspots' with highly elevated rates surrounded by regions with little recombination. Hotspot targeting to specific genomic locations is variable across species. In some mammals, hotspots have divergent landscapes between closely related species which is directed by the binding of the rapidly evolving protein, PRDM9. In many species outside of mammals, hotspots are generally conserved and tend to localize to regions with open chromatin such as transcription start sites.
It remains unclear if the location of recombination hotspots diverge in taxa outside of mammals. Threespine stickleback fish (Gasterosteus aculeatus) are an excellent model to examine the evolution of recombination over short evolutionary timescales. Using an LD-based approach, we found recombination rates varied at a fine-scale across the genome, with many regions organized into narrow hotspots. Hotspots had divergent landscapes between stickleback populations, where only ~15% were shared, though part of this divergence could be due to demographic history. Additionally, we did not detect a strong association of PRDM9 with recombination hotspots in threespine stickleback fish. Our results suggest fine-scale recombination rates may be diverging between closely related populations of threespine stickleback fish and argue for additional molecular characterization to verify the extent of the divergence. ... Population growth older than 100 kya, perhaps older than 1 Mya, was detected in all analysed samples. Note that sudden or rapid population size changes may present as apparent gradual growth using PSMC (Li & Durbin, 2011;Liu & Hansen, 2017). Therefore, in our case, what appears in the PSMC plot as gradual growth concluding ≈150 kya could be caused by older, more rapid growth. ... ... Genomic data also offer the prospect of reconstructing population histories from a single contemporary genome, a complex task that is virtually impossible with patchy or nonexistent observational and fossil data. Two particularly exciting approaches are the pairwise and multiple sequentially Markovian coalescent models (SMCs; Li and Durbin 2011; Schiffels and Durbin 2014 see also Salmona et al. 2017) that were developed for whole-genome data and have recently been applied to GBS data (Liu and Hansen 2017). Using SMC models, Orlando et al. 
(2013) compared the genomes of five domestic horse breeds, a Late Pleistocene horse, Przewalski's horse, and a donkey to reconstruct the demographic history of the modern horse. ... Chapter Humans have long relied on ungulates for food, clothing, manual labor, and transportation. Ungulates were among the first species to be domesticated and managed in the wild, but more than one-third of species are currently of conservation concern. Starting in the late twentieth century, ungulate research and management began employing genetic tools to assess attributes like the degree of population structure, inbreeding, and variation in functionally important genes. As sequencing technology advanced, research on ungulates shifted to now assay variation across the entire genome. More than 20 ungulates have had their genome assembled with a mean length of 2.6 Gb and N50 of 26 Mb. Genomic studies have provided deeper insights into the evolutionary relationships among giraffes and bovids, while camelids and horses have had their entire species demographic histories reconstructed using novel Markovian coalescent models. Moreover, artificial and natural selection has left clear signatures on ungulate genomes with high-throughput sequencing techniques being used to identify the genetic basis to important phenotypic traits. Novel assembly strategies and genomic assays are regularly being employed on ungulates, and research on this ecological and economically valuable group will help chart the course of the emerging field of wildlife genomics. Article Full-text available Islands are natural laboratories for studying patterns and processes of evolution. Research on island endemic birds has revealed elevated speciation rates and rapid phenotypic evolution in several groups (e.g., white-eyes, Darwin's finches). However, understanding the evolutionary processes behind these patterns requires an understanding of how genotypes map to novel phenotypes. 
To date, there are few high-quality reference genomes for species found on islands. Here, we sequence the genome of one of Ernst Mayr's ‘great speciators’, the collared kingfisher (Todiramphus chloris collaris). Utilizing high molecular weight DNA and linked-read sequencing technology, we assembled a draft high-quality genome with highly contiguous scaffolds (scaffold N50 = 19 Mb). Based on universal single-copy orthologues (BUSCO), we estimated a gene space completeness of 96.6% for the draft genome assembly. Population demographic history analyses reveal a distinct pattern of contraction and expansion in population size throughout the Pleistocene. Comparative genomic analysis of gene family evolution revealed that species-specific and rapidly expanding gene families in the collared kingfisher (relative to other Coraciiformes) are mainly involved in the ErbB signaling pathway and focal adhesion. Todiramphus kingfishers are a species-rich group that has become a focus of speciation research. This draft genome will be a platform for future taxonomic, phylogeographic, and speciation research in the group. For example, target genes will enable testing of changes in sensory structures associated with changes in vision and taste genes across kingfishers. Article Rubus corchorifolius (“Shanmei” or mountain berry, 2n = 14) is widely distributed in China, and its fruit has high nutritional and medicinal value. Here, we report a high-quality chromosome-scale genome assembly of Shanmei, with a size of 215.69 Mb and encompassing 26,696 genes. Genome comparisons among Rosaceae species showed that Shanmei and Fupenzi (Rubus chingii Hu) were most closely related, followed by blackberry (Rubus occidentalis), and that environmental adaptation-related genes were significantly expanded in the Shanmei genome. 
Further resequencing of 101 samples of Shanmei collected from four regions in the provinces of Yunnan, Hunan, Jiangxi, and Sichuan in China revealed that the Hunan population of Shanmei possessed the highest diversity and represented the more ancestral population. Moreover, the Yunnan population underwent strong selection based on nucleotide diversity, linkage disequilibrium, and the historical effective population size analyses. Furthermore, genes from candidate genomic regions that showed strong divergence were significantly enriched in flavonoid biosynthesis and plant hormone signal transduction, indicating the genetic basis of adaptation of Shanmei to the local environment. The high-quality genome sequences and the variome dataset of Shanmei provide valuable resources for breeding applications and for elucidating the genome evolution and ecological adaptation of Rubus species. Article Full-text available Kangaroo rats in the genus Dipodomys are found in a variety of habitat types in western North America, including deserts, arid and semi-arid grasslands, and scrublands. Many Dipodomys species are experiencing strong population declines due to increasing habitat fragmentation, with two species listed as federally endangered. The precarious state of many Dipodomys populations, including those occupying extreme environments, make species of this genus valuable subjects for studying the impacts of habitat degradation and fragmentation on population genomic patterns and for characterizing the genomic bases of adaptation to harsh conditions. To facilitate exploration of such questions, we assembled and annotated a reference genome for the banner-tailed kangaroo rat (D. spectabilis) using PacBio HiFi sequencing reads, providing a more contiguous genomic resource than two previously assembled Dipodomys genomes. Using the HiFi data for D. spectabilis and publicly available sequencing data for two other Dipodomys species (D. ordii and D. 
stephensi), we demonstrate the utility of this new assembly for studies of congeners by conducting inference of historic effective population sizes (Ne) and linking these patterns to the species’ current extinction risk statuses. The genome assembly presented here will serve as a valuable resource for population and conservation genomic studies of Dipodomys species, comparative genomic research within mammals and rodents, and investigations into genomic adaptation to extreme environments and changing landscapes. Preprint Full-text available Kangaroo rats in the genus Dipodomys are found in a variety of habitat types in western North America, including deserts, arid and semi-arid grasslands, and scrublands. Many Dipodomys species are experiencing strong population declines due to increasing habitat fragmentation, with two species listed as federally endangered. The precarious state of many Dipodomys populations, including those occupying extreme environments, make species of this genus valuable subjects for studying the impacts of habitat degradation and fragmentation on population genomic patterns and for characterizing the genomic bases of adaptation to harsh conditions. To facilitate exploration of such questions, we assembled and annotated a reference genome for the banner-tailed kangaroo rat (D. spectabilis) using PacBio HiFi sequencing reads, providing a more contiguous genomic resource than two previously assembled Dipodomys genomes. Using the HiFi data for D. spectabilis and publicly available sequencing data for two other Dipodomys species (D. ordii and D. stephensi), we demonstrate the utility of this new assembly for studies of congeners by conducting inference of historic effective population sizes (Ne) and linking these patterns to the species’ current extinction risk statuses.
The genome assembly presented here will serve as a valuable resource for population and conservation genomic studies of Dipodomys species, comparative genomic research within mammals and rodents, and investigations into genomic adaptation to extreme environments and changing landscapes. Significance statement Kangaroo rats in the genus Dipodomys occur in a wide variety of habitat types, ranging from scrublands to arid deserts, and are increasingly impacted by habitat fragmentation with populations of many species in strong decline. To facilitate population and conservation genomic studies of Dipodomys species, we generated the first reference genome assembly for the extensively studied banner-tailed kangaroo rat (D. spectabilis) from long read PacBio sequencing data. The genome assembly presented here will serve as a valuable resource for studies of Dipodomys species (which have long served as ecological and physiological models for the study of osmoregulation), comparative genomic surveys of mammals and rodents, and investigations into genomic adaptation to extreme environments and changing landscapes. Preprint Full-text available Population bottlenecks can have dramatic consequences for the health and long-term survival of a species. A recent bottleneck event can also largely obscure our understanding of standing variation prior to the contraction. Historic population sizes can be modeled based on extant genomics, however uncertainty increases with the severity of the bottleneck. Integrating ancient genomes provides a powerful complement to retrace the evolution of genetic diversity through population fluctuations. Here, we recover 15 high-quality mitogenomes of the once nearly extinct Alpine ibex spanning 8601 BP to 1919 CE and combine these with 60 published modern genomes. Coalescent demography simulations based on modern genomes indicate population fluctuations matching major climatic change over the past millennia.
Using ancient genomes, we show that mitochondrial haplotype diversity has been reduced to a fifth of the pre-bottleneck diversity with several highly differentiated mitochondrial lineages having co-existed historically. The main collapse of mitochondrial diversity coincided with human settlement expansions in the Middle Ages. The near extinction severely reduced the mitochondrial diversity. After recovery, one lineage was spread and nearly fixed across the Alps due to recolonization efforts. Contrary to expectations, we show that a second ancestral mitochondrial lineage has survived in an isolated population further south. Our study highlights that a combined approach integrating genomic data of ancient, historic and extant populations unravels major long-term population fluctuations. Preprint Full-text available Analyzing variation in a species' genomic diversity can provide insights into its historical demography, biogeography and population structure, and thus, its ecology and evolution. Although such studies are rarely undertaken for parasites, they can be highly revealing because of the parasite's coevolutionary relationships with hosts. Modes of reproduction and transmission are thought to be strong determinants of genomic diversity for parasites and vary widely among microsporidia (fungal-related intracellular parasites), which are known to have high intraspecific genetic diversity and interspecific variation in genome architecture. Here we explore genomic variation in the microsporidium Hamiltosporidium, a parasite of the freshwater crustacean Daphnia magna, looking especially at which factors contribute to nucleotide variation. Genomic samples from 18 Eurasian populations and a new, long-read based reference genome were used to determine the roles that reproduction mode, transmission mode and geography play in determining population structure and demographic history. We demonstrate two main H. 
tvaerminnensis lineages and a pattern of isolation-by-distance, but note an absence of congruence between these two parasite lineages and the two Eurasian host lineages. We suggest a comparatively recent parasite spread through Northern Eurasian host populations after a change from vertical to mixed-mode transmission and the loss of sexual reproduction. While gaining knowledge about the ecology and evolution of this focal parasite, we also identify common features that shape variation in genomic diversity for many parasites, e.g., distinct modes of reproduction and the intertwining of host-parasite demographies. Thesis The phenomenon of evolutionary convergence is a fascinating process in which distantly related species independently acquire similar characteristics in response to similar selective pressures. Ant- and termite-eating mammals are among the most famous examples of morphological convergence. Indeed, this particular lifestyle evolved in five distinct lineages of mammals: the aardvark (Tubulidentata), the aardwolf (Carnivora), the anteaters (Pilosa), the giant armadillo (Cingulata), and the pangolins (Pholidota). To better understand the evolution of these organisms, several approaches were developed in this thesis. First, I present an original strategy to characterize the precise diet of myrmecophagous mammals taking advantage of metagenomic sequencing data generated from fecal samples and a reference mitogenomic database of termites and ants. Second, with the final objective of detecting molecular convergence at the genomic scale in ant-eating mammals, we generated nine high quality mammalian genomes using Oxford Nanopore technologies. The different strategies developed from the set-up of MinION sequencing to annotation of the resulting assemblies are presented together with a first case study illustrating the use of two of these new reference genomes for species delineation.
Finally, I present comparative transcriptomic analyses of salivary glands and other organs in ant-eating mammals suggesting that historical contingency and molecular evolutionary tinkering of chitinase genes played a major role in the convergent evolution of myrmecophagy.
Thesis
Full-text available
Interspecific hybridisation, the breeding between distinct species, can contribute to species extinction due to wasted reproductive potential, outbreeding depression, and introgression of genetic material mediated by backcrossing. Incomplete reproductive barriers can facilitate interspecific hybridisation as previously isolated species come into contact with one another. Interspecific hybridisation is relatively common among birds, but anthropogenic impacts that increase the incidence of such hybridisation between threatened native species and non-threatened species are of conservation concern due to the risks of genetic swamping, which at its most extreme may result in species extinction. While the impacts of interspecific hybridisation have previously been assessed using small numbers of genetic markers, new genomic sequencing developments now facilitate implementation of genome-wide reassessments providing greater resolution of analyses. The critically endangered kakī (black stilt; Himantopus novaezelandiae) is one such species that can benefit from these new genomic data. Anthropogenic habitat change and introduction of mammalian predators resulted in the decline of this Aotearoa New Zealand endemic wading bird during the 1900s. An intense population bottleneck resulting in an ephemeral sex-bias among the remaining kakī contributed to hybridisation with the self-introduced poaka (the Aotearoa New Zealand population of the Australian pied stilt; H. himantopus leucocephalus), a congeneric species previously thought to have diverged from a common ancestor with kakī one million years ago.
Intensive conservation management including captive breeding for translocation and predator control has increased kakī numbers from ~23 adults in 1981 to approximately 169 wild adults in 2020. Previous genetic studies identified minimal evidence of introgression of poaka genetic material into kakī, and determined that moderate outbreeding depression in combination with stochastic processes likely limited introgression. These data informed the kakī captive breeding for translocation programme with the aim of maintaining genetic integrity. However, re-evaluation using genomic data was recommended for kakī. Using high-throughput sequencing techniques, I sequenced and assembled the first reference genomes for kakī and Australian pied stilts as tools for use in analyses of introgression. The kakī mitochondrial genome was also assembled to facilitate comparisons of contemporary and historic stilt diversity, showing that conservation management aimed at maximising genetic diversity has largely maintained mitochondrial diversity despite kakī decline, identifying three mitochondrial haplotypes present among contemporary kakī. Kakī and poaka are well-differentiated, and are estimated to have diverged from a common ancestor approximately 750,000 years ago based on Bayesian analysis of mitochondrial data. In addition, the analysis of high-resolution genomic markers generated from approximately 65% of contemporary wild kakī detected no introgression from poaka to kakī despite past hybridisation. These findings confirm the results of previous genetic analysis of introgression and the success of past conservation management. As kakī recovery continues, these combined findings will be used by the New Zealand Department of Conservation’s Kakī Recovery Programme to further maintain the genetic integrity of kakī. 
Overall, the genomic resources developed here have facilitated the transition from using genetic data to genomic data for kakī recovery, and contribute to our understanding of the impacts of anthropogenic hybridisation on a critically endangered taonga species.
Article
Full-text available
In a context of ongoing biodiversity erosion, obtaining genomic resources from wildlife is essential for conservation. The thousands of yearly mammalian roadkill provide a useful source material for genomic surveys. To illustrate the potential of this underexploited resource, we used roadkill samples to study the genomic diversity of the bat-eared fox (Otocyon megalotis) and the aardwolf (Proteles cristatus), both having subspecies with similar disjunct distributions in Eastern and Southern Africa. First, we obtained reference genomes with high contiguity and gene completeness by combining Nanopore long reads and Illumina short reads. Then, we showed that the two subspecies of aardwolf might warrant species status (P. cristatus and P. septentrionalis) by comparing their genome-wide genetic differentiation to pairs of well-defined species across Carnivora with a new Genetic Differentiation index (GDi) based on only a few resequenced individuals. Finally, we obtained a genome-scale Carnivora phylogeny including the new aardwolf species.
Article
The question of whether spatial aspects of evolution differ in marine versus terrestrial realms has endured since Ernst Mayr’s 1954 essay on marine speciation. Marine systems are often suggested to support larger and more highly connected populations, but quantitative comparisons with terrestrial systems have been lacking. Here, we compared the population histories of marine and terrestrial elapid snakes using the Pairwise Sequentially Markovian Coalescent (PSMC) model to track historical fluctuations in species’ effective population sizes (Ne) from individual whole-genome sequences.
To do this we generated a draft genome for the olive sea snake (Aipysurus laevis) and analysed this alongside six published elapid genomes and their sequence reads (marine species Hydrophis curtus, H. melanocephalus and Laticauda laticaudata; terrestrial species Pseudonaja textilis, Naja naja and Notechis scutatus). Counter to the expectation that marine species should show higher overall Ne and less pronounced fluctuations in Ne, our analyses reveal demographic patterns that are highly variable among species and do not clearly correspond to major ecological divisions. At deeper time intervals, the four marine elapids appear to have experienced relatively stable Ne, while each terrestrial species shows a prominent upturn in Ne starting at ~4 mya followed by an equally strong decline. However, over the last million years, all seven species show strong and divergent fluctuations. Estimates of Ne in the most recent intervals (~10 kya) are lowest in two of four marine species (H. melanocephalus and Laticauda), and do not correspond to contemporary range sizes in marine or terrestrial taxa.
Article
Species interactions, such as pollination, parasitism and predation, form the basis of functioning ecosystems. The origins and resilience of such interactions therefore merit attention. However, fossils only occasionally document ancient interactions, and phylogenetic methods are blind to recent interactions. Is there some other way to track shared species experiences? “Comparative demography” examines when pairs of species jointly thrived or declined. By forging links between ecology, epidemiology, and evolutionary biology, this method sheds light on biological adaptation, species resilience, and ecosystem health. Here, we describe how this method works, discuss examples, and suggest future directions in hopes of inspiring interest, imitators, and critics.
Article
Full-text available
High-elevation organisms experience shared environmental challenges that include low oxygen availability, cold temperatures, and intense UV radiation. Consequently, repeated evolution of the same genetic mechanisms may occur across high-elevation taxa. To test this prediction, we investigated the extent to which the same biochemical pathways, genes, or sites were subject to parallel molecular evolution for 12 Andean hummingbird species (family: Trochilidae) representing several independent transitions to high elevation across the phylogeny. Across high-elevation species, we discovered parallel evolution for several pathways and genes with evidence of positive selection. In particular, positively selected genes were frequently part of cellular respiration, metabolism, or cell death pathways. To further examine the role of elevation in our analyses, we compared results for low- and high-elevation species and tested different thresholds for defining elevation categories. In analyses with different elevation thresholds, positively selected genes reflected similar functions and pathways, even though there were almost no specific genes in common. For example, EPAS1 (HIF2α), which has been implicated in high-elevation adaptation in other vertebrates, shows a signature of positive selection when high-elevation is defined broadly (> 1500 m), but not when defined narrowly (> 2500 m). While a few biochemical pathways and genes change predictably as part of hummingbird adaptation to high-elevation conditions, independent lineages have rarely adapted via the same substitutions.
Article
Full-text available
Heterogeneous genomic divergence between populations may reflect selection, but should also be seen in conjunction with gene flow and drift, particularly population bottlenecks. Marine and freshwater threespine stickleback (Gasterosteus aculeatus) populations often exhibit different lateral armor plate morphs.
Moreover, strikingly parallel genomic footprints across different marine-freshwater population pairs are interpreted as parallel evolution and gene reuse. Nevertheless, in some geographic regions like the North Sea and Baltic Sea different patterns are observed. Freshwater populations in coastal regions are often dominated by marine morphs, suggesting that gene flow overwhelms selection, and genomic parallelism may also be less pronounced. We used RAD sequencing for analyzing 28,888 SNPs in two marine and seven freshwater populations in Denmark, Europe. Freshwater populations represented a variety of environments: river populations accessible to gene flow from marine sticklebacks and large and small isolated lakes with and without fish predators. Sticklebacks in an accessible river environment showed minimal morphological and genome-wide divergence from marine populations, supporting the hypothesis of gene flow overriding selection. Allele frequency spectra suggested bottlenecks in all freshwater populations, and particularly two small lake populations. However, genomic footprints ascribed to selection could nevertheless be identified. No genomic regions were consistent freshwater-marine outliers, and parallelism was much lower than in other comparable studies. Two genomic regions previously described to be under divergent selection in freshwater and marine populations were outliers between different freshwater populations. We ascribe these patterns to stronger environmental heterogeneity among freshwater populations in our study as compared to most other studies, although the demographic history involving bottlenecks should also be considered in the interpretation of results. This article is protected by copyright. All rights reserved.
Article
Full-text available
Global climate fluctuations have significantly influenced the distribution and abundance of biodiversity [1].
During unfavorable glacial periods, many species experienced range contraction and fragmentation, expanding again during interglacials [2-4]. An understanding of the evolutionary consequences of both historical and ongoing climate changes requires knowledge of the temporal dynamics of population numbers during such climate cycles. Variation in abundance should have left clear signatures in the patterns of intraspecific genetic variation in extant species, from which historical effective population sizes (Ne) can be estimated [3]. We analyzed whole-genome sequences of 38 avian species in a pairwise sequentially Markovian coalescent (PSMC, [5]) framework to quantitatively reveal changes in Ne from approximately 10 million to 10 thousand years ago. Significant fluctuations in Ne over time were evident for most species. The most pronounced pattern observed in many species was a severe reduction in Ne coinciding with the beginning of the last glacial period (LGP). Among species, Ne varied by at least three orders of magnitude, exceeding 1 million in the most abundant species. Several species on the IUCN Red List of Threatened Species showed long-term reduction in population size, predating recent declines. We conclude that cycles of population expansions and contractions have been a common feature of many bird species during the Quaternary period, likely coinciding with climate cycles. Population size reduction should have increased the risk of extinction but may also have promoted speciation. Species that have experienced long-term declines may be especially vulnerable to recent anthropogenic threats. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Article
Full-text available
Genomes in the mist
The mountain gorilla is an iconic species that is at high risk of extinction. Xue et al. have sequenced 13 gorillas from two different populations to probe their genetic diversity.
The genomes show large tracts of homozygosity and the loss of highly deleterious genetic variants, indicating population bottlenecks and inbreeding. This loss of genetic diversity appears to have started over 20,000 years ago and may have been caused by changes in climate and human-associated effects. Science, this issue p. 242
Article
Full-text available
The honeybee Apis mellifera has major ecological and economic importance. We analyze patterns of genetic variation at 8.3 million SNPs, identified by sequencing 140 honeybee genomes from a worldwide sample of 14 populations at a combined total depth of 634×. These data provide insight into the evolutionary history and genetic basis of local adaptation in this species. We find evidence that population sizes have fluctuated greatly, mirroring historical fluctuations in climate, although contemporary populations have high genetic diversity, indicating the absence of domestication bottlenecks. Levels of genetic variation are strongly shaped by natural selection and are highly correlated with patterns of gene expression and DNA methylation. We identify genomic signatures of local adaptation, which are enriched in genes expressed in workers and in immune system– and sperm motility–related genes that might underlie geographic variation in reproduction, dispersal and disease resistance. This study provides a framework for future investigations into responses to pathogens and climate change in honeybees.
Article
Full-text available
Application of high throughput sequencing platforms in the field of ecology and evolutionary biology is developing quickly since the introduction of efficient methods to reduce genome complexity. Numerous approaches for genome complexity reduction have been developed using different combinations of restriction enzymes, library construction strategies and fragment size selection.
As a result, the choice of which techniques to use may become cumbersome, because it is difficult to anticipate the number of loci resulting from each method. We develop SimRAD, an R package that performs in silico restriction enzyme digests and fragment size selection as implemented in most restriction associated DNA polymorphism and genotyping by sequencing methods. In silico digestion is performed on a reference genome or on a randomly generated DNA sequence when no reference genome sequence is available. SimRAD accurately predicts the number of loci under alternative protocols when a reference genome sequence is available for the targeted species (or a close relative) but may be unreliable when no reference genome is available. SimRAD is also useful for fine-tuning a given protocol to adjust the number of targeted loci. Here, we outline the functionality of SimRAD and provide an illustrative example of the use of the package (available on the CRAN at http://cran.r-project.org/web/packages/SimRAD). This article is protected by copyright. All rights reserved.
Article
Full-text available
Ecosystem function and resilience is determined by the interactions and independent contributions of individual species. Apex predators play a disproportionately determinant role through their influence and dependence on the dynamics of prey species. Their demographic fluctuations are thus likely to reflect changes in their respective ecological communities and habitat. Here, we investigate the historical population dynamics of the killer whale based on draft nuclear genome data for the Northern Hemisphere and mtDNA data worldwide. We infer a relatively stable population size throughout most of the Pleistocene, followed by an order of magnitude decline and bottleneck during the Weichselian glacial period. Global mtDNA data indicate that while most populations declined, at least one population retained diversity in a stable, productive ecosystem off southern Africa.
We conclude that environmental changes during the last glacial period promoted the decline of a top ocean predator, that these events contributed to the pattern of diversity among extant populations, and that the relatively high diversity of a population currently in productive, stable habitat off South Africa suggests a role for ocean productivity in the widespread decline.
Article
Full-text available
Average age and maximum life span of breeding adult three-spined sticklebacks (Gasterosteus aculeatus) were determined in eight Fennoscandian localities with the aid of skeletochronology. The average age varied from 1.8 to 3.6 years, and maximum life span from three to six years depending on the locality. On average, fish from marine populations were significantly older than those from freshwater populations, but variation within habitat types was large. We also found significant differences in mean body size among different habitat types and populations, but only the population differences remained significant after accounting for variation due to age effects. These results show that generation length and longevity in three-spined sticklebacks can vary significantly from one locality to another, and that population differences in mean body size cannot be explained as a simple consequence of differences in population age structure. We also describe a nanistic population from northern Finland exhibiting long life span and small body size.
Article
Full-text available
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods.
For simple scenarios, our approach compares favorably in terms of accuracy and speed with ∂a∂i, the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.
Article
Full-text available
Bowtie 1 is a fast and memory-efficient program for aligning short reads to mammalian genomes. Burrows-Wheeler indexing allows Bowtie to align more than 25 million 35-bp reads per CPU hour to the human genome in a memory footprint of as little as 1.1 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a quality-aware search algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve greater alignment speed. Bowtie is free, open source software available for download from http://bowtie.cbcb.umd.edu.
The Burrows-Wheeler Transformation of a text T, BWT(T), is constructed as follows. The Burrows-Wheeler Matrix of T is the matrix whose rows are all distinct cyclic rotations of T sorted lexicographically ($ is "less than" all other characters). BWT(T) is the sequence of characters in the last column of this matrix.
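The construction described above can be illustrated with a minimal Python sketch. It is not taken from Bowtie: the function name `bwt` and the choice to append the `$` sentinel inside the function are assumptions made for illustration, and sorting full rotations is the naive approach, suitable only for short strings.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via the sorted rotation matrix.

    Assumes `sentinel` does not occur in `text` and sorts before every
    other character (true for "$" against letters and digits in ASCII).
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    t = text + sentinel
    # Rows of the Burrows-Wheeler matrix: all cyclic rotations, sorted.
    matrix = sorted(t[i:] + t[:i] for i in range(len(t)))
    # BWT(T) is the last column of that matrix.
    return "".join(row[-1] for row in matrix)


if __name__ == "__main__":
    print(bwt("banana"))  # annb$aa
```

Real aligners avoid materialising the matrix and instead derive the same last column from a suffix array, which keeps memory proportional to the text length.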
Article
Full-text available
Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.
Article
Full-text available
Throughout history, the population size of modern humans has varied considerably due to changes in environment, culture, and technology. More accurate estimates of population size changes, and when they occurred, should provide a clearer picture of human colonization history and help remove confounding effects from natural selection inference. Demography influences the pattern of genetic variation in a population, thus genomic data of multiple individuals sampled from one or more present-day populations contain valuable information about the past demographic history. Recently, Li and Durbin developed a coalescent-based hidden Markov model, called PSMC, for a pair of chromosomes (or one diploid individual) to estimate past population sizes. This is an efficient, useful approach, but its accuracy in the very recent past is hampered by the fact that, because of the small sample size, only few coalescence events occur in that period. Multiple genomes from the same population contain more information about the recent past, but are also more computationally challenging to study jointly in a coalescent framework. Here, we present a new coalescent-based method that can efficiently infer population size changes from multiple genomes, providing access to a new store of information about the recent past. Our work generalizes the recently developed sequentially Markov conditional sampling distribution framework, which provides an accurate approximation of the probability of observing a newly sampled haplotype given a set of previously sampled haplotypes. Simulation results demonstrate that we can accurately reconstruct the true population histories, with a significant improvement over the PSMC in the recent past. We apply our method, called diCal (Demographic Inference using Composite Approximate Likelihood), to the genomes of multiple human individuals of European and African ancestry to obtain a detailed population size change history during recent times.
Article
Full-text available
The ability to efficiently and accurately determine genotypes is a keystone technology in modern genetics, crucial to studies ranging from clinical diagnostics, to genotype-phenotype association, to reconstruction of ancestry and the detection of selection. To date, high capacity, low cost genotyping has been largely achieved via "SNP chip" microarray-based platforms which require substantial prior knowledge of both genome sequence and variability, and once designed are suitable only for those targeted variable nucleotide sites. This method introduces substantial ascertainment bias and inherently precludes detection of rare or population-specific variants, a major source of information for both population history and genotype-phenotype association. Recent developments in reduced-representation genome sequencing experiments on massively parallel sequencers (commonly referred to as RAD-tag or RADseq) have brought direct sequencing to the problem of population genotyping, but increased cost and procedural and analytical complexity have limited their widespread adoption. Here, we describe a complete laboratory protocol, including a custom combinatorial indexing method, and accompanying software tools to facilitate genotyping across large numbers (hundreds or more) of individuals for a range of markers (hundreds to hundreds of thousands). Our method requires no prior genomic knowledge and achieves per-site and per-individual costs below that of current SNP chip technology, while requiring similar hands-on time investment, comparable amounts of input DNA, and downstream analysis times on the order of hours. Finally, we provide empirical results from the application of this method to both genotyping in a laboratory cross and in wild populations. Because of its flexibility, this modified RADseq approach promises to be applicable to a diversity of biological questions in a wide range of organisms.
Article
Full-text available
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research. © 2010 Macmillan Publishers Limited. All rights reserved.
Article
Full-text available
Marine stickleback fish have colonized and adapted to thousands of streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high-quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of twenty additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine–freshwater divergence. Our results indicate that reuse of globally shared standing genetic variation, including chromosomal inversions, has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine–freshwater evolution, but regulatory changes appear to predominate in this well known example of repeated adaptive evolution in nature.
Article
Full-text available
Advancements in next-generation sequencing technology have enabled whole genome re-sequencing in many species providing unprecedented discovery and characterization of molecular polymorphisms. There are limitations, however, to next-generation sequencing approaches for species with large complex genomes such as barley and wheat. Genotyping-by-sequencing (GBS) has been developed as a tool for association studies and genomics-assisted breeding in a range of species including those with complex genomes. GBS uses restriction enzymes for targeted complexity reduction followed by multiplex sequencing to produce high-quality polymorphism data at a relatively low per sample cost. Here we present a GBS approach for species that currently lack a reference genome sequence. We developed a novel two-enzyme GBS protocol and genotyped bi-parental barley and wheat populations to develop a genetically anchored reference map of identified SNPs and tags. We were able to map over 34,000 SNPs and 240,000 tags onto the Oregon Wolfe Barley reference map, and 20,000 SNPs and 367,000 tags on the Synthetic W9784 × Opata85 (SynOpDH) wheat reference map. To further evaluate GBS in wheat, we also constructed a de novo genetic map using only SNP markers from the GBS data. The GBS approach presented here provides a powerful method of developing high-density markers in species without a sequenced genome while providing valuable tools for anchoring and ordering physical maps and whole-genome shotgun sequence. Development of the sequenced reference genome(s) will in turn increase the utility of GBS data enabling physical mapping of genes and haplotype imputation of missing data. Finally, as a result of low per-sample costs, GBS will have broad application in genomics-assisted plant breeding programs.
Article
Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.
Article
The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. We implemented the Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with the Burrows-Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is approximately 10-20x faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignments in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net.
Article
Single nucleotide polymorphism (SNP) discovery and genotyping are essential to genetic mapping. There remains a need for a simple, inexpensive platform that allows high-density SNP discovery and genotyping in large populations. Here we describe the sequencing of restriction-site associated DNA (RAD) tags, which identified more than 13,000 SNPs, and mapped three traits in two model organisms, using less than half the capacity of one Illumina sequencing run. We demonstrated that different marker densities can be attained by choice of restriction enzyme. Furthermore, we developed a barcoding system for sample multiplexing and fine mapped the genetic basis of lateral plate armor loss in threespine stickleback by identifying recombinant breakpoints in F2 individuals. Barcoding also facilitated mapping of a second trait, a reduction of pelvic structure, by in silico re-sorting of individuals. To further demonstrate the ease of the RAD sequencing approach we identified polymorphic markers and mapped an induced mutation in Neurospora crassa. Sequencing of RAD markers is an integrated platform for SNP discovery and genotyping. This approach should be widely applicable to genetic mapping in a variety of organisms.
Article
Major phenotypic changes evolve in parallel in nature by molecular mechanisms that are largely unknown. Here, we use positional cloning methods to identify the major chromosome locus controlling armor plate patterning in wild threespine sticklebacks. Mapping, sequencing, and transgenic studies show that the Ectodysplasin (EDA) signaling pathway plays a key role in evolutionary change in natural populations and that parallel evolution of stickleback low-plated phenotypes at most freshwater locations around the world has occurred by repeated selection of Eda alleles derived from an ancestral low-plated haplotype that first appeared more than two million years ago. Members of this clade of low-plated alleles are present at low frequencies in marine fish, which suggests that standing genetic variation can provide a molecular basis for rapid, parallel evolution of dramatic phenotypic change in nature.
Article
Many previous estimates of the mutation rate in humans have relied on screens of visible mutants. We investigated the rate and pattern of mutations at the nucleotide level by comparing pseudogenes in humans and chimpanzees to (i) provide an estimate of the average mutation rate per nucleotide, (ii) assess heterogeneity of mutation rate at different sites and for different types of mutations, (iii) test the hypothesis that the X chromosome has a lower mutation rate than autosomes, and (iv) estimate the deleterious mutation rate. Eighteen processed pseudogenes were sequenced, including 12 on autosomes and 6 on the X chromosome. The average mutation rate was estimated to be ~2.5 × 10⁻⁸ mutations per nucleotide site or 175 mutations per diploid genome per generation. Rates of mutation for both transitions and transversions at CpG dinucleotides are one order of magnitude higher than mutation rates at other sites. Single nucleotide substitutions are 10 times more frequent than length mutations. Comparison of rates of evolution for X-linked and autosomal pseudogenes suggests that the male mutation rate is 4 times the female mutation rate, but provides no evidence for a reduction in mutation rate that is specific to the X chromosome. Using conservative calculations of the proportion of the genome subject to purifying selection, we estimate that the genomic deleterious mutation rate (U) is at least 3. This high rate is difficult to reconcile with multiplicative fitness effects of individual mutations and suggests that synergistic epistasis among harmful mutations may be common.
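The two figures in this abstract are linked by simple arithmetic, which the following minimal Python sketch illustrates. The diploid genome size of 7 × 10^9 bp is an assumption: it is not stated in the abstract, but it is the value implied by the two reported numbers. The function name is hypothetical.

```python
# Per-site mutation rate reported in the abstract (per generation).
MU_PER_SITE = 2.5e-8

# ASSUMPTION: diploid human genome size in base pairs. Not given in the
# abstract; chosen because it is the value implied by the reported figures.
DIPLOID_GENOME_BP = 7.0e9


def mutations_per_diploid_genome(mu_per_site: float, genome_bp: float) -> float:
    """Expected number of new mutations per diploid genome per generation."""
    return mu_per_site * genome_bp


new_mutations = mutations_per_diploid_genome(MU_PER_SITE, DIPLOID_GENOME_BP)
print(round(new_mutations))
```

Under that assumed genome size, the product reproduces the reported figure of about 175 new mutations per diploid genome per generation.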
Article
We analyzed 81 whole genome sequences of threespine sticklebacks from Pacific North America, Greenland and Northern Europe, representing 16 populations. Principal component analysis of nuclear SNPs grouped populations according to geographical location, with Pacific populations being more divergent from each other relative to European and Greenlandic populations. Analysis of mitogenome sequences showed Northern European populations to represent a single phylogeographical lineage, whereas Greenlandic and particularly Pacific populations showed admixture between lineages. We estimated demographic history using a genome-wide coalescence with recombination approach. The Pacific populations showed gradual population expansion starting > 100 Kya, possibly reflecting persistence in cryptic refuges near the present distributional range, although we do not rule out possible influence of ancient admixture. Sharp population declines ca. 14-15 Kya were suggested to reflect founding of freshwater populations by marine ancestors. In Greenland and Northern Europe demographic expansion started ca. 20-25 Kya coinciding with the end of the Last Glacial Maximum. In both regions marine and freshwater populations started to show different demographic trajectories ca. 8-9 Kya, suggesting that this was the time of recolonization. In Northern Europe this estimate was surprisingly late, but found support in subfossil evidence for presence of several freshwater fish species but not sticklebacks 12 Kya. The results demonstrate distinctly different demographic histories across geographical regions with potential consequences for adaptive processes. They also provide empirical support for previous assumptions about freshwater populations being founded independently from large, coherent marine populations, a key element in the Transporter Hypothesis invoked to explain the widespread occurrence of parallel evolution across freshwater stickleback populations.
Article
Identifying the genetic structure of a species and the factors that drive it are important first steps in modern population management, in part because populations evolving from separate ancestral sources may possess potentially different characteristics. This is especially true for climate-sensitive species such as pikas, where the delimitation of distinct genetic units and the characterization of population responses to contemporary and historical environmental pressures is of particular interest. We combine a restriction-associated DNA sequencing (RADSeq) dataset containing 4,156 single nucleotide polymorphisms with ecological niche models (ENMs) of present and past habitat suitability to characterize population composition and evaluate the effects of historical range shifts, contemporary climates, and landscape factors on gene flow in Collared Pikas, which are found in Alaska and adjacent regions of northwestern Canada and are the lesser-studied of North America's two pika species. The results suggest that contemporary environmental factors contribute little to current population connectivity. Instead, genetic diversity is strongly shaped by the presence of three ancestral lineages isolated during the Pleistocene (~148 and 52 kya). Based on ENMs and genetic data, populations originating from a northern refugium experienced longer-term stability whereas both southern lineages underwent population expansion – contradicting the southern stability and northern expansion patterns seen in many other taxa. Current populations are comparable with respect to generally low diversity within populations and little to no recent admixture. The predominance of divergent histories structuring populations implies that if we are to understand and manage pika populations we must specifically assess and accurately account for the forces underlying genetic similarity.
Article
The processes leading up to species extinctions are typically characterized by prolonged declines in population size and geographic distribution, followed by a phase in which populations are very small and may be subject to intrinsic threats, including loss of genetic diversity and inbreeding [1]. However, whether such genetic factors have had an impact on species prior to their extinction is unclear [2, 3]; examining this would require a detailed reconstruction of a species' demographic history as well as changes in genome-wide diversity leading up to its extinction. Here, we present high-quality complete genome sequences from two woolly mammoths (Mammuthus primigenius). The first mammoth was sequenced at 17.1-fold coverage and dates to ∼4,300 years before present, representing one of the last surviving individuals on Wrangel Island. The second mammoth, sequenced at 11.2-fold coverage, was obtained from an ∼44,800-year-old specimen from the Late Pleistocene population in northeastern Siberia. The demographic trajectories inferred from the two genomes are qualitatively similar and reveal a population bottleneck during the Middle or Early Pleistocene, and a more recent severe decline in the ancestors of the Wrangel mammoth at the end of the last glaciation. A comparison of the two genomes shows that the Wrangel mammoth has a 20% reduction in heterozygosity as well as a 28-fold increase in the fraction of the genome that comprises runs of homozygosity. We conclude that the population on Wrangel Island, which was the last surviving woolly mammoth population, was subject to reduced genetic diversity shortly before it became extinct.
Article
Inferring demographic history is an important task in population genetics. Many existing inference methods are based on predefined simplified population models, which are more suitable for hypothesis testing than exploratory analysis. We developed a novel model-flexible method called stairway plot, which infers changes in population size over time using SNP frequency spectra. This method is applicable for whole-genome sequences of hundreds of individuals. Using extensive simulation, we demonstrate the usefulness of the method for inferring demographic history, especially recent changes in population size. We apply the method to the whole-genome sequence data of 9 populations from the 1000 Genomes Project and show a pattern of fluctuations in human populations from 10,000 to 200,000 years ago.
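The SNP frequency spectrum that this method takes as input can be tabulated directly from phased haplotypes. The Python sketch below shows only that tabulation step, not the stairway plot inference itself. It assumes that each haplotype is a list of 0/1 values with 1 marking the derived allele, so the result is an unfolded spectrum. The function name and the toy data are hypothetical.

```python
from collections import Counter
from typing import List


def site_frequency_spectrum(haplotypes: List[List[int]]) -> List[int]:
    """Unfolded SFS: entry k-1 is the number of sites where exactly k of the
    n sampled haplotypes carry the derived allele (coded as 1), k = 1..n-1."""
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Count derived alleles at each site across all sampled haplotypes.
    derived_counts = Counter(
        sum(hap[site] for hap in haplotypes) for site in range(n_sites)
    )
    # Monomorphic sites (0 or n derived copies) are excluded from the SFS.
    return [derived_counts.get(k, 0) for k in range(1, n)]


# Toy data (hypothetical): 4 haplotypes, 5 sites.
toy_haplotypes = [
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
sfs = site_frequency_spectrum(toy_haplotypes)
print(sfs)  # three singletons, no doubletons, one site with 3 derived copies
```

When the ancestral state is unknown, the spectrum is usually folded by minor-allele count instead; that variant is not shown here.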
Article
The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model ancestral relationships under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20,000-30,000 years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The multiple sequentially Markovian coalescent (MSMC) analyzes the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago and give information about human population history as recent as 2,000 years ago, including the bottleneck in the peopling of the Americas and separations within Africa, East Asia and Europe.
Article
Genetic markers — heritable polymorphisms that can be measured in one or more populations of individuals — lie at the heart of modern genetics and enable the study of important questions in population genetics, ecological genetics and evolution. In 2003, Luikart et al.1 wrote: ...
Article
High-throughput sequencing technologies are revolutionizing the life sciences. The past 12 months have seen a burst of genome sequences from non-model organisms, in each case representing a fundamental source of data of significant importance to biological research. This has bearing on several aspects of evolutionary biology, and we are now beginning to see patterns emerging from these studies. These include significant heterogeneity in the rate of recombination that affects adaptive evolution and base composition, the role of population size in adaptive evolution, and the importance of expansion of gene families in lineage-specific adaptation. Moreover, resequencing of population samples (population genomics) has enabled the identification of the genetic basis of critical phenotypes and cast light on the landscape of genomic divergence during speciation.
Article
Reduced representation genome sequencing such as restriction-site-associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single-nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the European eel using the RAD sequencing approach; the markers were simultaneously identified and scored in a genome-wide scan of 30 individuals. Whereas genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome-wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4779 different contigs and 19 703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Results varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long-term effective population size was estimated to range between 132 000 and 1 320 000, which is much higher than previous estimates based on microsatellite loci. The generated SNP resource consisting of 82 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
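The reported range of effective population size is consistent with the standard neutral expectation for a diploid population, π = 4·Ne·μ, rearranged to Ne = π / (4μ). The Python sketch below works through that arithmetic. The two mutation rates (10^-8 and 10^-9 per site per generation) are assumptions: the abstract does not state them, and they are used here only because they reproduce the reported bounds. The function name is hypothetical.

```python
# Average nucleotide diversity reported in the abstract.
PI = 0.00529

# ASSUMPTION: bracketing per-site, per-generation mutation rates. Not stated
# in the abstract; chosen because they reproduce the reported Ne range.
MU_HIGH = 1e-8
MU_LOW = 1e-9


def long_term_ne(pi: float, mu: float) -> float:
    """Long-term effective size of a diploid population from pi = 4 * Ne * mu."""
    return pi / (4.0 * mu)


ne_lower = long_term_ne(PI, MU_HIGH)  # higher mutation rate -> smaller Ne
ne_upper = long_term_ne(PI, MU_LOW)   # lower mutation rate -> larger Ne
print(f"Ne between about {ne_lower:,.0f} and {ne_upper:,.0f}")
```

A tenfold uncertainty in the mutation rate carries straight through to a tenfold range in Ne, which is why the estimate is reported as an interval.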
Article
Remains of fishes, birds and mammals are rarely reported from Quaternary deposits in Greenland. The oldest remains come from Late Pliocene and Early Pleistocene deposits and comprise Atlantic cod, hare, rabbit and ringed seal. Interglacial and interstadial deposits have yielded remains of cod, little auk, collared lemming, ringed seal, reindeer and bowhead whale. Early and Mid-Holocene finds include capelin, polar cod, red fish, sculpin, three-spined stickleback, Lapland longspur, Arctic hare, collared lemming, wolf, walrus, ringed seal, reindeer and bowhead whale. It is considered unlikely that vertebrates could survive in Greenland during the peak of the last glaciation, but many species had probably already immigrated in the Early Holocene.
Article
The history of human population size is important for understanding human evolution. Various studies have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck. Here, with fewer assumptions on population size changes, we present a more detailed history of human population sizes between approximately ten thousand and a million years ago, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male (YH), a Korean male (SJK), three European individuals (J. C. Venter, NA12891 and NA12878 (ref. 9)) and two Yoruba males (NA18507 (ref. 10) and NA19239). We infer that European and Chinese populations had very similar population-size histories before 10-20 kyr ago. Both populations experienced a severe bottleneck 10-60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure. We also infer that the differentiation of genetically modern humans may have started as early as 100-120 kyr ago, but considerable genetic exchanges may still have occurred until 20-40 kyr ago.
Article
The advent of next-generation sequencing (NGS) has revolutionized genomic and transcriptomic approaches to biology. These new sequencing tools are also valuable for the discovery, validation and assessment of genetic markers in populations. Here we review and discuss best practices for several NGS methods for genome-wide genetic marker development and genotyping that use restriction enzyme digestion of target genomes to reduce the complexity of the target. These new methods -- which include reduced-representation sequencing using reduced-representation libraries (RRLs) or complexity reduction of polymorphic sequences (CRoPS), restriction-site-associated DNA sequencing (RAD-seq) and low coverage genotyping -- are applicable to both model organisms with high-quality reference genome sequences and, excitingly, to non-model species with no existing genomic data.
Article
A Monte Carlo computer program is available to generate samples drawn from a population evolving according to a Wright–Fisher neutral model. The program assumes an infinite-sites model of mutation, and allows recombination, gene conversion, symmetric migration among subpopulations, and a variety of demographic histories. The samples produced can be used to investigate the sampling properties of any sample statistic under these neutral models. Availability: The source code for the program (in the language C) is available at http://home.uchicago.edu/~rhudson1/source/mksamples.html. Contact: rr-hudson{at}uchicago.edu
Article
The coalescent with recombination describes the distribution of genealogical histories and resulting patterns of genetic variation in samples of DNA sequences from natural populations. However, using the model as the basis for inference is currently severely restricted by the computational challenge of estimating the likelihood. We discuss why the coalescent with recombination is so challenging to work with and explore whether simpler models, under which inference is more tractable, may prove useful for genealogy-based inference. We introduce a simplification of the coalescent process in which coalescence between lineages with no overlapping ancestral material is banned. The resulting process has a simple Markovian structure when generating genealogies sequentially along a sequence, yet has very similar properties to the full model, both in terms of describing patterns of genetic variation and as the basis for statistical inference.
Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data
• R. N. Gutenkunst
• R. D. Hernandez
• S. H. Williamson
• C. D. Bustamante
Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD (2009) Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics 5, e1000695.
Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data
• Gutenkunst