Preprint

Evolution of barrier loci at an intermediate stage of speciation with gene flow

Abstract

Understanding the origin of new species is a central goal in evolutionary biology. Diverging lineages often evolve highly heterogeneous patterns of differentiation; however, the underlying mechanisms are not well understood. We used an integrated approach to investigate evolutionary processes governing genetic differentiation between the hybridizing campions (Silene dioica (L.) Clairv. and S. latifolia Poiret). Demographic modeling indicated that the two species diverged with continuous gene flow. The best-supported scenario with heterogeneity in both migration rate and effective population size suggested that 5% of the loci evolved without gene flow. Differentiation (FST) and sequence divergence (dXY) were correlated and both tended to peak in the middle of most linkage groups, consistent with reduced gene flow at highly differentiated loci. Highly differentiated loci further exhibited signatures of selection, and differentiation was significantly elevated around previously identified QTLs associated with assortative mating. In between-species population pairs, isolation by distance was stronger for genomic regions with low between-species differentiation than for highly differentiated regions that may contain barrier loci. Moreover, differentiation landscapes within and between species were only weakly correlated, suggesting that the interplay of background selection and conserved genomic features is not the dominant determinant of genetic differentiation in these lineages. Instead, our results suggest that divergent selection drove the evolution of barrier loci and the genomic landscape of differentiation between the two species, consistent with predictions for speciation in the face of gene flow.
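As an illustration of the two summary statistics named in the abstract, the following minimal sketch computes windowed dXY and FST from per-site allele frequencies at biallelic sites. It assumes a Hudson-style ratio estimator (FST = 1 - mean within-population diversity / mean dXY) without finite-sample corrections; the function names and the input layout are hypothetical, and this is not necessarily the estimator or pipeline the authors used.

```python
def dxy_site(p1, p2):
    """Probability that two alleles, one drawn from each population, differ."""
    return p1 * (1 - p2) + p2 * (1 - p1)


def pi_site(p):
    """Expected within-population heterozygosity (no sample-size correction)."""
    return 2 * p * (1 - p)


def window_stats(freqs1, freqs2):
    """Return (mean dXY, FST) for one window.

    freqs1 and freqs2 hold the frequency of the same allele at each site
    in population 1 and population 2 (hypothetical input layout).
    """
    pairs = list(zip(freqs1, freqs2))
    mean_dxy = sum(dxy_site(a, b) for a, b in pairs) / len(pairs)
    mean_pi = sum((pi_site(a) + pi_site(b)) / 2 for a, b in pairs) / len(pairs)
    # Ratio of averages: share of between-population divergence that is
    # not explained by diversity within populations.
    fst = 1 - mean_pi / mean_dxy if mean_dxy > 0 else float("nan")
    return mean_dxy, fst
```

Note that dXY is normally averaged over all sites in a window, including invariant ones; passing only variable sites gives a per-SNP value instead. FST is relative to within-population diversity and dXY is absolute, which is why a correlation between the two is informative about reduced gene flow.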

Article
Sympatric speciation illustrates how natural and sexual selection may create new species in isolation without geographic barriers. However, recent genomic reanalyses of classic examples of sympatric speciation reveal complex histories of secondary gene flow from outgroups into the radiation. In contrast, the rich theoretical literature on this process distinguishes among a diverse range of models based on simple genetic histories and different types of reproductive isolating barriers. Thus, there is a need to revisit how to connect theoretical models of sympatric speciation and their predictions to empirical case studies in the face of widespread gene flow. Here, theoretical differences among different types of sympatric speciation and speciation‐with‐gene‐flow models are reviewed and summarized, and genomic analyses are proposed for distinguishing which models apply to case studies based on the timing and function of adaptive introgression. Investigating whether secondary gene flow contributed to reproductive isolation is necessary to test whether predictions of theory are ultimately borne out in nature. Sympatric speciation means two different things to empirical and theoretical biologists. Recent genomic analyses of classic sympatric speciation examples reveal complex histories of secondary gene flow from outgroups. It is argued that reconciling diverse theoretical models with existing empirical examples requires investigating the role that gene flow played in the process.
Article
Sequencing reduced‐representation libraries of restriction‐site associated DNA (RADseq) to identify single nucleotide polymorphisms (SNPs) is quickly becoming a standard methodology for molecular ecologists. Because of the scale of RADseq data sets, putative loci cannot be assessed individually, making the process of filtering noise and correctly identifying biologically meaningful signal more difficult. Artifacts introduced during library preparation and/or bioinformatic processing of SNP data can create patterns that are incorrectly interpreted as indicative of population structure or natural selection. Therefore, it is crucial to carefully consider types of errors that may be introduced during laboratory work and data processing, and how to minimize, detect, and remove these errors. Here, we discuss issues inherent to RADseq methodologies that can result in artifacts during library preparation and locus reconstruction, resulting in erroneous SNP calls and ultimately, genotyping error. Further, we describe steps that can be implemented to create a rigorously filtered data set consisting of markers accurately representing independent loci and compare the effect of different combinations of filters on four RAD data sets. Finally, we stress the importance of publishing raw sequence data along with final filtered data sets in addition to detailed documentation of filtering steps and quality control measures.
Article
Speciation is a continuous process and analysis of species pairs at different stages of divergence provides insight into how it unfolds. Previous genomic studies on young species pairs have revealed peaks of divergence and heterogeneous genomic differentiation. Yet less known is how localised peaks of differentiation progress to genome-wide divergence during the later stages of speciation in the presence of persistent gene flow. Spanning the speciation continuum, stickleback species pairs are ideal for investigating how genomic divergence builds up during speciation. However, attention has largely focused on young postglacial species pairs, with little knowledge of the genomic signatures of divergence and introgression in older stickleback systems. The Japanese stickleback species pair, composed of the Pacific Ocean three-spined stickleback (Gasterosteus aculeatus) and the Japan Sea stickleback (G. nipponicus), which co-occur in the Japanese islands, is at a late stage of speciation. Divergence likely started well before the end of the last glacial period and crosses between Japan Sea females and Pacific Ocean males result in hybrid male sterility. Here we use coalescent analyses and Approximate Bayesian Computation to show that the two species split approximately 0.68–1 million years ago but that they have continued to exchange genes at a low rate throughout divergence. Population genomic data revealed that, despite gene flow, a high level of genomic differentiation is maintained across the majority of the genome. However, we identified multiple, small regions of introgression, occurring mainly in areas of low recombination rate. Our results demonstrate that a high level of genome-wide divergence can establish in the face of persistent introgression and that gene flow can be localized to small genomic regions at the later stages of speciation with gene flow.
Article
The genetic basis of parallel evolution of similar species is of great interest in evolutionary biology. In the adaptive radiation of Lake Victoria cichlid fishes, sister species with either blue or red-back male nuptial coloration have evolved repeatedly, often associated with shallower and deeper water, respectively. One such case is the blue and red-backed Pundamilia species, for which we recently showed that a young species pair may have evolved through "hybrid parallel speciation". Coalescent simulations suggested that the older species P. pundamilia (blue) and P. nyererei (red-back) admixed in the Mwanza Gulf and that new "nyererei-like" and "pundamilia-like" species evolved from the admixed population. Here, we use genome scans to study the genomic architecture of differentiation, and assess the influence of hybridization on the evolution of the younger species pair. For each of the two species pairs, we find over 300 genomic regions, widespread across the genome, which are highly differentiated. A subset of the most strongly differentiated regions of the older pair are also differentiated in the younger pair. These shared differentiated regions often show parallel allele frequency differences, consistent with the hypothesis that admixture-derived alleles were targeted by divergent selection in the hybrid population. However, two thirds of the genomic regions that are highly differentiated between the younger species are not highly differentiated between the older species, suggesting independent evolutionary responses to selection pressures. Our analyses reveal how divergent selection on admixture-derived genetic variation can facilitate new speciation events.
Article
Heterogeneous patterns of genomic differentiation are commonly documented between closely related populations and there is considerable interest in identifying factors that contribute to their formation. These factors could include genomic features (e.g., areas of low recombination) that promote processes like linked selection (positive or purifying selection that affects linked neutral sites) at specific genomic regions. Examinations of repeatable patterns of differentiation across population pairs can provide insight into the role of these factors. Birds are well suited for this work, as genome structure is conserved across this group. Accordingly, we reestimated relative (FST) and absolute (dXY) differentiation between eight sister pairs of birds that span a broad taxonomic range using a common pipeline. Across pairs, there were modest but significant correlations in window-based estimates of differentiation (up to 3% of variation explained for FST and 26% for dXY), supporting a role for processes at conserved genomic features in generating heterogeneous patterns of differentiation; processes specific to each episode of population divergence likely explain the remaining variation. The role genomic features play was reinforced by linear models identifying several genomic variables (e.g., gene densities) as significant predictors of FST and dXY repeatability. FST repeatability was higher among pairs that were further along the speciation continuum (i.e., more reproductively isolated) providing further insight into how genomic differentiation changes with population divergence; early stages of speciation may be dominated by positive selection that is different between pairs but becomes integrated with processes acting according to shared genomic features as speciation proceeds.
Article
Genetic differentiation between divergent populations is often greater in chromosome centers than in chromosome peripheries. Commonly overlooked, this broad-scale differentiation pattern is sometimes ascribed to heterogeneity in crossover rate and hence linked selection within chromosomes, but the underlying mechanisms remain incompletely understood. A literature survey across 46 organisms reveals that most eukaryotes indeed exhibit a reduced crossover rate in chromosome centers relative to the peripheries. Using simulations of populations diverging into ecologically different habitats through sorting of standing genetic variation, we demonstrate that such chromosome-scale heterogeneity in crossover rate, combined with polygenic divergent selection, causes stronger hitchhiking and especially barriers to gene flow across chromosome centers. Without requiring selection on new mutations, this rapidly leads to elevated population differentiation in the low-crossover centers relative to the high-crossover peripheries of chromosomes (‘Chromosome Center-Biased Differentiation’, CCBD). Using simulated and empirical data, we then show that strong CCBD between populations can provide evidence of polygenic adaptive divergence with gene flow. We further demonstrate that chromosome-scale heterogeneity in crossover rate impacts analyses beyond that of population differentiation, including the inference of phylogenies and parallel adaptive evolution among populations, the detection of genetic loci under selection, and the interpretation of the strength of selection on genomic regions. Overall, our results call for a greater appreciation of chromosome-scale heterogeneity in crossover rate in evolutionary genomics.
Article
Identifying genomic regions underlying adaptation in extant lineages is key to understanding the trajectories along which biodiversity evolves. However, this task is complicated by evolutionary processes that obscure and mimic footprints of positive selection. Particularly, the long-term effects of linked selection remain underappreciated and difficult to account for. Based on patterns emerging from recent research on the evolution of differentiation across the speciation continuum, I illustrate how long-term linked selection affects the distribution of differentiation along genomes. I then argue that a comparative population genomics framework that exploits emergent features of long-term linked selection can help overcome shortcomings of traditional genome scans for adaptive evolution, but needs to account for the temporal dynamics of differentiation landscapes.
Article
Genome-wide screens of genetic variation within and between populations can reveal signatures of selection implicated in adaptation and speciation. Genomic regions with low genetic diversity and elevated differentiation reflective of locally reduced effective population sizes (Ne) are candidates for barrier loci contributing to population divergence. Yet, such candidate genomic regions need not arise as a result of selection promoting adaptation or advancing reproductive isolation. Linked selection unrelated to lineage-specific adaptation or population divergence can generate comparable signatures. It is challenging to distinguish between these processes, particularly when diverging populations share ancestral genetic variation. In this study, we took a comparative approach using population assemblages from distant clades assessing genomic parallelism of variation in Ne. Utilizing population-level polymorphism data from 444 re-sequenced genomes of three avian clades spanning 50 million years of evolution we tested whether population genetic summary statistics reflecting genome-wide variation in Ne would co-vary among populations within clades, and importantly, also among clades where lineage sorting has been completed. All statistics including population-scaled recombination rate (ρ), nucleotide diversity (π) and measures of genetic differentiation between populations (FST, PBS, dxy) were significantly correlated across all phylogenetic distances. Moreover, genomic regions with elevated levels of genetic differentiation were associated with inferred peri-centromeric and sub-telomeric regions. The phylogenetic stability of diversity landscapes and stable association with genomic features support a role of linked selection not necessarily associated with adaptation and speciation in shaping patterns of genome-wide heterogeneity in genetic diversity.
Article
Author Summary Isolated populations accumulate genetic differences across their genomes as they diverge, whereas gene flow between populations counteracts divergence and tends to restore genetic homogeneity. Speciation proceeds by the accumulation at specific loci of mutations that reduce the fitness of hybrids, therefore preventing gene flow—the so-called species barriers. Importantly, species barriers are expected to act locally within the genome, leading to the prediction of a mosaic pattern of genetic differentiation between populations at intermediate levels of divergence—the genic view of speciation. At the same time, linked selection also contributes to speed up differentiation in low-recombining and gene-dense regions. We used a modelling approach that accounts for both sources of genomic heterogeneity and explored a wide continuum of genomic divergence made by 61 pairs of species/populations in animals. Our analysis provides a unifying picture of the relationship between molecular divergence and ability to exchange genes. We show that the "grey zone" of speciation—the intermediate state in which species definition is controversial—spans from 0.5% to 2% of molecular divergence, with these thresholds being independent of species life history traits and ecology. Semi-isolated species, between which alleles can be exchanged at some but not all loci, are numerous, with the earliest species barriers being detected at divergences as low as 0.075%. These results have important implications regarding taxonomy, conservation biology, and the management of biodiversity.
Article
Ecological speciation is the evolution of reproductive isolation as a consequence of direct divergent natural selection or ecologically mediated divergent sexual selection. While the genomic signature of the former has been extensively studied in recent years, only few examples exist for genomic differentiation where environment-dependent sexual selection has played an important role. Here, we describe a very young (~90 years old) population of threespine sticklebacks exhibiting phenotypic and genomic differentiation between two habitats within the same pond. We show that differentiation among habitats is limited to male throat color and nest type, traits known to be subject to sexual selection. Divergence in these traits mirrors divergence in much older benthic and limnetic stickleback species pairs from North American Westcoast lakes, which also occur in sympatry but are strongly reproductively isolated from each other. We demonstrate that in our population, differences in throat color and breeding have been stable over a decade, but in contrast to North American benthic and limnetic stickleback species, these mating trait differences are not accompanied by divergence in morphology related to feeding, predator defense or swimming performance. Using genome-wide SNP data, we find multiple genomic islands with moderate differentiation spread across several chromosomes, whereas the rest of the genome is undifferentiated. The islands contain potential candidate genes involved in visual perception of color. Our results suggest that phenotypic and multichromosome genomic divergence of these morphs was driven by environment-dependent sexual selection, demonstrating incipient speciation after only a few decades of divergence in sympatry.
Article
Genomic islands are clusters of loci with elevated divergence that are commonly found in population genomic studies of local adaptation and speciation. One explanation for their evolution is that linkage between selected alleles confers a benefit, which increases the establishment probability of new mutations that are linked to existing locally adapted polymorphisms. Previous theory suggested there is only limited potential for the evolution of islands via this mechanism, but involved some simplifying assumptions that may limit the accuracy of this inference. Here, we extend previous analytical approaches to study the effect of linkage on the establishment probability of new mutations and identify parameter regimes that are most likely to lead to evolution of islands via this mechanism. We show how the interplay between migration and selection affects the establishment probability of linked vs. unlinked alleles, the expected maximum size of genomic islands, and the expected time required for their evolution. Our results agree with previous studies, suggesting that this mechanism alone is unlikely to be a general explanation for the evolution of genomic islands. However, this mechanism could occur more readily if there were other pre-adaptations to reduce local rates of recombination or increase the local density of mutational targets within the region of the island. We also show that island formation via erosion following secondary contact is much more rapid than island formation from de novo mutations, suggesting that this mechanism may be more likely.
Article
Genome-wide patterns of genetic divergence reveal mechanisms of adaptation under gene flow. Empirical data show that divergence is mostly concentrated in narrow genomic regions. This pattern may arise because differentiated loci protect nearby mutations from gene flow, but recent theory suggests this mechanism is insufficient to explain the emergence of concentrated differentiation during biologically realistic timescales. Critically, earlier theory neglects an inevitable consequence of genetic drift: stochastic loss of local genomic divergence. Here we demonstrate that the rate of stochastic loss of weak local differentiation increases with recombination distance to a strongly diverged locus and, above a critical recombination distance, local loss is faster than local 'gain' of new differentiation. Under high migration and weak selection this critical recombination distance is much smaller than the total recombination distance of the genomic region under selection. Consequently, divergence between populations increases by net gain of new differentiation within the critical recombination distance, resulting in tightly-linked clusters of divergence. The mechanism responsible is the balance between stochastic loss and gain of weak local differentiation, a mechanism acting universally throughout the genome. Our results will help to explain empirical observations and lead to novel predictions regarding changes in genomic architectures during adaptive divergence.
Article
Despite the global economic and ecological importance of forest trees, the genomic basis of differential adaptation and speciation in tree species is still poorly understood. Populus tremula and P. tremuloides are two of the most widespread tree species in the Northern Hemisphere. Using whole-genome re-sequencing data of 24 P. tremula and 22 P. tremuloides individuals, we find that the two species diverged ~2.2-3.1 million years ago, coinciding with the severing of the Bering land bridge and the onset of dramatic climatic oscillations during the Pleistocene. Both species have experienced substantial population expansions following long-term declines after species divergence. We detect widespread and heterogeneous genomic differentiation between species, and in accordance with the expectation of allopatric speciation, coalescent simulations suggest that neutral evolutionary processes can account for most of the observed patterns of genetic differentiation. However, there is an excess of regions exhibiting extreme differentiation relative to those expected under demographic simulations, which is indicative of the action of natural selection. Overall genetic differentiation is negatively associated with recombination rate in both species, providing strong support for a role of linked selection in generating the heterogeneous genomic landscape of differentiation between species. Finally, we identify a number of candidate regions and genes that may have been subject to positive and/or balancing selection during the speciation process.
Article
Speciation events often occur in rapid bursts of diversification, but the ecological and genetic factors that promote these radiations are still much debated. Using whole transcriptomes from all 13 species in the ecologically and reproductively diverse wild tomato clade (Solanum sect. Lycopersicon), we infer the species phylogeny and patterns of genetic diversity in this group. Despite widespread phylogenetic discordance due to the sorting of ancestral variation, we date the origin of this radiation to approximately 2.5 million years ago and find evidence for at least three sources of adaptive genetic variation that fuel diversification. First, we detect introgression both historically between early-branching lineages and recently between individual populations, at specific loci whose functions indicate likely adaptive benefits. Second, we find evidence of lineage-specific de novo evolution for many genes, including loci involved in the production of red fruit color. Finally, using a "PhyloGWAS" approach, we detect environment-specific sorting of ancestral variation among populations that come from different species but share common environmental conditions. Estimated across the whole clade, small but substantial and approximately equal fractions of the euchromatic portion of the genome are inferred to contribute to each of these three sources of adaptive genetic variation. These results indicate that multiple genetic sources can promote rapid diversification and speciation in response to new ecological opportunity, in agreement with our emerging phylogenomic understanding of the complexity of both ancient and recent species radiations.
Article
Cichlids diverge within a crater lake. It is not clear how populations diversify and new species form at the genomic level, especially when they coexist in the same location. Malinsky et al. investigated how two ecomorphs of cichlid fish in a small lake in Tanzania are diversifying relative to each other. Although there is gene flow between the two forms, major regions of genetic divergence, known as genomic islands, separate the populations. Within these islands, the authors found genes likely to be associated with mate choice, supporting the idea that genetic changes related to breeding preferences are the first to diverge during speciation.
Article
The patterns of genomic divergence during ecological speciation are shaped by a combination of evolutionary forces. Processes such as genetic drift, local reduction of gene flow around genes causing reproductive isolation, hitchhiking around selected variants, variation in recombination and mutation rates are all factors that can contribute to the heterogeneity of genomic divergence. On the basis of 60 fully sequenced three-spined stickleback genomes, we explore these different mechanisms explaining the heterogeneity of genomic divergence across five parapatric lake and river population pairs varying in their degree of genetic differentiation. We find that divergent regions of the genome are mostly specific for each population pair, while their size and abundance are not correlated with the extent of genome-wide population differentiation. In each pair-wise comparison, an analysis of allele frequency spectra reveals that 25-55% of the divergent regions are consistent with a local restriction of gene flow. Another large proportion of divergent regions (38-75%) appears to be mainly shaped by hitchhiking effects around positively selected variants. We provide empirical evidence that alternative mechanisms determining the evolution of genomic patterns of divergence are not mutually exclusive, but rather act in concert to shape the genome during population differentiation, a first necessary step towards ecological speciation.
Article
The European sea bass (Dicentrarchus labrax) is a temperate zone euryhaline teleost of prime importance for aquaculture and fisheries. This species is subdivided into two naturally hybridizing lineages, one inhabiting the north-eastern Atlantic Ocean and the other the Mediterranean and Black seas. Here, we provide a high-quality chromosome-scale assembly of its genome that shows a high degree of synteny with the more highly derived teleosts. We find expansions of gene families specifically associated with ion and water regulation, highlighting adaptation to variation in salinity. We further generate a genome-wide variation map through RAD-sequencing of Atlantic and Mediterranean populations. We show that variation in local recombination rates strongly influences the genomic landscape of diversity within and differentiation between lineages. Comparing predictions of alternative demographic models to the joint allele-frequency spectrum indicates that genomic islands of differentiation between sea bass lineages were generated by varying rates of introgression across the genome following a period of geographical isolation.
Article
Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for non-model organisms with large effective population sizes and high levels of genetic polymorphism. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is due to the fact that dDocent quality trims instead of filtering, incorporates both forward and reverse reads (including reads with INDEL polymorphisms) in assembly, mapping, and SNP calling. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com.
Article
The importance, extent, and mode of interspecific gene flow for the evolution of species has long been debated. Characterization of genomic differentiation in a classic example of hybridization between all-black carrion crows and gray-coated hooded crows identified genome-wide introgression extending far beyond the morphological hybrid zone. Gene expression divergence was concentrated in pigmentation genes expressed in gray versus black feather follicles. Only a small number of narrow genomic islands exhibited resistance to gene flow. One prominent genomic region (<2 megabases) harbored 81 of all 82 fixed differences (of 8.4 million single-nucleotide polymorphisms in total) linking genes involved in pigmentation and in visual perception—a genomic signal reflecting color-mediated prezygotic isolation. Thus, localized genomic selection can cause marked heterogeneity in introgression landscapes while maintaining phenotypic divergence.
Article
Although many NGS read pre-processing tools already existed, we could not find any tool or combination of tools which met our requirements in terms of flexibility, correct handling of paired-end data, and high performance. We have developed Trimmomatic as a more flexible and efficient pre-processing tool, which could correctly handle paired-end data. The value of NGS read pre-processing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output which is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested. Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available from http://www.usadellab.org/cms/index.php?page=trimmomatic. Contact: usadel@bio1.rwth-aachen.de. Supplementary information: Manual and source code are available from http://www.usadellab.org/cms/index.php?page=trimmomatic.
Article
Most speciation events probably occur gradually, without complete and immediate reproductive isolation, but the full extent of gene flow between diverging species has rarely been characterized on a genome-wide scale. Documenting the extent and timing of admixture between diverging species can clarify the role of geographic isolation in speciation. Here we use new methodology to quantify admixture at different stages of divergence in Heliconius butterflies, based on whole genome sequences of 31 individuals. Comparisons between sympatric and allopatric populations of H. melpomene, H. cydno and H. timareta revealed a genome-wide trend of increased shared variation in sympatry, indicative of pervasive interspecific gene flow. Up to 40% of 100 kb genomic windows clustered by geography rather than by species, demonstrating that a very substantial fraction of the genome has been shared between sympatric species. Analyses of genetic variation shared over different time intervals suggested that admixture between these species has continued since early in speciation. Alleles shared between species during recent time intervals displayed higher levels of linkage disequilibrium than those shared over longer time intervals, suggesting that this admixture took place at multiple points during divergence and is probably ongoing. The signal of admixture was significantly reduced around loci controlling divergent wing patterns, as well as throughout the Z chromosome, consistent with strong selection for Müllerian mimicry and with known Z-linked hybrid incompatibility. Overall these results show that species divergence can occur in the face of persistent and genome-wide admixture over long periods of time.
Article
Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.
Article
The restriction-site associated DNA sequencing (RAD-seq) method takes full advantage of next-generation sequencing technology. By clustering paired-end short reads into groups with their own unique tags, the RAD-seq assembly problem is divided into subproblems. Quickly and accurately clustering and assembling millions of RAD-seq reads with sequencing errors, different levels of heterozygosity and repetitive sequences is a challenging problem. Rainbow was developed to provide an ultra-fast and memory-efficient solution for clustering and assembling short reads produced by RAD-seq. First, Rainbow clusters reads using a spaced seed method. Then, Rainbow implements a strategy similar to heterozygote calling to divide potential groups into haplotypes in a top-down manner. Along a guided tree, it then iteratively merges sibling leaves in a bottom-up manner if they are similar enough. Here, similarity is defined by comparing the second reads of a RAD segment. This approach tries to collapse heterozygotes while discriminating repetitive sequences. Finally, Rainbow uses a greedy algorithm to locally assemble merged reads into contigs. Rainbow outputs not only the optimal but also suboptimal assembly results. Based on simulated data and a real guppy RAD-seq dataset, we show that Rainbow is more competent than the other tools in dealing with RAD-seq data. Rainbow is written in C and its source code is freely available at http://sourceforge.net/projects/bio-rainbow/files/. Contact: ruanjue@gmail.com.
Article
Many aspects of sex chromosome evolution are common to both plants and animals [1], but the process of Y chromosome degeneration, where genes on the Y become non-functional over time, may be much slower in plants due to purifying selection against deleterious mutations in the haploid gametophyte [2, 3]. Testing for differences in Y degeneration between the kingdoms has been hindered by the absence of accurate age estimates for plant sex chromosomes. Here, we used genome resequencing to estimate the spontaneous mutation rate and the age of the sex chromosomes in white campion (Silene latifolia). Screening of single nucleotide polymorphisms (SNPs) in parents and 10 F1 progeny identified 39 de novo mutations and yielded a rate of 7.31 × 10⁻⁹ (95% confidence interval: 5.20 × 10⁻⁹ − 8.00 × 10⁻⁹) mutations per site per haploid genome per generation. Applying this mutation rate to the synonymous divergence between homologous X- and Y-linked genes (gametologs) gave age estimates of 11.00 and 6.32 million years for the old and young strata, respectively. Based on SNP segregation patterns, we inferred which genes were Y-linked and found that at least 47% are already dysfunctional. Applying our new estimates for the age of the sex chromosomes indicates that the rate of Y degeneration in S. latifolia is nearly 2-fold slower when compared to animal sex chromosomes of a similar age. Our revised estimates support Y degeneration taking place more slowly in plants, a discrepancy that may be explained by differences in the life cycles of animals and plants. Krasovec et al. measure the mutation rate in white campion and find that it is remarkably similar across three plant orders. Based on the mutation rate, the sex chromosomes in white campion are estimated to be ∼11 million years old. The Y chromosome of white campion is demonstrated to be degrading at a slower rate compared to animals.
Article
The evolution of reproductive barriers and their underlying genetic architecture is of central importance for the formation of new species. Reproductive barriers can be controlled either by few large‐effect loci suggesting strong selection on key traits, or by many small‐effect loci, consistent with gradual divergence or with selection on polygenic or multiple traits. Genetic coupling between reproductive barrier loci further promotes divergence, particularly divergence with ongoing gene flow. In this study, we investigated the genetic architectures of ten morphological, phenological and life history traits associated with reproductive barriers between the hybridizing sister species Silene dioica and S. latifolia; both are dioecious with XY sex determination. We used quantitative trait locus (QTL) mapping in two reciprocal F2 crosses. One to six QTLs per trait, including nine major QTLs (PVE>20%) were detected on 11 of the 12 linkage groups. We found strong evidence for coupling of QTLs for uncorrelated traits and for an important role of sex chromosomes in the genetic architectures of reproductive barrier traits. Unexpectedly, QTLs detected in the two F2 crosses differed largely, despite limited phenotypic differences between them and sufficient statistical power. The widely dispersed genetic architectures of traits associated with reproductive barriers suggest gradual divergence or multifarious selection. Coupling of the underlying QTLs likely promoted divergence with gene flow in this system. The low congruence of QTLs between the two crosses further points to variable and possibly redundant genetic architectures of traits associated with reproductive barriers, with important implications for the evolutionary dynamics of divergence and speciation. This article is protected by copyright. All rights reserved.
Article
Adaptation to new environments often occurs in the face of gene flow. Under these conditions, gene flow and recombination can impede adaptation by breaking down linkage disequilibrium between locally adapted alleles. Theory predicts that this decay can be halted or slowed if adaptive alleles are tightly linked in regions of low recombination, potentially favoring divergence and adaptive evolution in these regions over others. Here, we compiled a global genomic dataset of over 1300 individual threespine stickleback from 52 populations and compared the tendency for adaptive alleles to occur in regions of low recombination between populations that diverged with or without gene flow. In support of theory, we found that putatively adaptive alleles (FST and dXY outliers) tend to occur more often in regions of low recombination in populations where divergent selection and gene flow have jointly occurred. This result remained significant when we employed different genomic window sizes, controlled for the effects of mutation rate and gene density, controlled for overall genetic differentiation, varied the genetic map used to estimate recombination, and used a continuous (rather than discrete) measure of geographic distance as proxy for gene flow/shared ancestry. We argue that our study provides the first statistical evidence that gene flow per se shapes genomic patterns of differentiation by increasing divergence in regions of low recombination.
Article
Speciation can involve a transition from a few genetic loci that are resistant to gene flow to genome-wide differentiation. However, only limited data exist concerning this transition and the factors promoting it. Here, we study phases of speciation using data from >100 populations of 11 species of Timema stick insects. Consistent with early phases of genic speciation, adaptive colour-pattern loci reside in localized genetic regions of accentuated differentiation between populations experiencing gene flow. Transitions to genome-wide differentiation are also observed with gene flow, in association with differentiation in polygenic chemical traits affecting mate choice. Thus, intermediate phases of speciation are associated with genome-wide differentiation and mate choice, but not growth of a few genomic islands. We also find a gap in genomic differentiation between sympatric taxa that still exchange genes and those that do not, highlighting the association between differentiation and complete reproductive isolation. Our results suggest that substantial progress towards speciation may involve the alignment of multi-faceted aspects of differentiation.
Article
DNA sequence diversity in genes in the partially sex-linked pseudoautosomal region (PAR) of the sex chromosomes of the plant Silene latifolia is higher than expected from within-species diversity of other genes. This could be the footprint of sexually antagonistic (SA) alleles that are maintained by balancing selection in a PAR gene (or genes) and affect polymorphism in linked genome regions. SA selection is predicted to occur during sex chromosome evolution, but it is important to test whether the unexpectedly high sequence polymorphism could be explained without it, purely by the combined effects of partial linkage with the sex-determining region and the population's demographic history, including possible introgression from S. dioica. To test this, we applied approximate Bayesian computation (ABC)-based model choice to autosomal sequence diversity data, to find the most plausible scenario for the recent history of S. latifolia and then to estimate the posterior density of the most relevant parameters. We then used these densities to simulate variation to be expected at PAR genes. We conclude that an excess of variants at high frequencies at PAR genes should arise in S. latifolia populations only for genes with strong associations with fully sex-linked genes, which requires closer linkage with the fully sex-linked region than that estimated for the PAR genes where apparent deviations from neutrality were observed. These results support the need to invoke selection to explain the S. latifolia PAR gene diversity, and encourage further work to test the possibility of balancing selection due to sexual antagonism.
Article
As populations diverge, genetic differences accumulate across the genome. Spurred by rapid developments in sequencing technology, genome-wide population surveys of natural populations promise insights into the evolutionary processes and the genetic basis underlying speciation. Although genomic regions of elevated differentiation are the focus of searches for 'speciation genes', there is an increasing realization that such genomic signatures can also arise by alternative processes that are not related to population divergence, such as linked selection. In this Review, we explore methodological trends in speciation genomic studies, highlight the difficulty in separating processes related to speciation from those emerging from genome-wide properties that are not related to reproductive isolation, and provide a set of suggestions for future work in this area.
Article
In order to investigate the role of differential adaptation for the evolution of reproductive barriers, we conducted a multi-site transplant experiment with the dioecious sister species Silene dioica and S. latifolia and their hybrids. Crosses within species as well as reciprocal first-generation (F1) and second-generation (F2) interspecific hybrids were transplanted into six sites, three within each species' habitat. Survival and flowering were recorded over 4 yr. At all transplant sites, the local species outperformed the foreign species, reciprocal F1 hybrids performed intermediately and F2 hybrids underperformed in comparison to F1 hybrids (hybrid breakdown). Females generally had slightly higher cumulative fitness than males in both within- and between-species crosses and we thus found little evidence for Haldane's rule acting on field performance. The strength of selection against F1 and F2 hybrids as well as hybrid breakdown increased with increasing strength of habitat adaptation (i.e. the relative fitness difference between the local and the foreign species) across sites. Our results suggest that differential habitat adaptation led to ecologically dependent post-zygotic reproductive barriers and drives divergence and speciation in this Silene system.
Article
Speciation often involves repeated episodes of genetic contact between divergent populations before reproductive isolation (RI) is complete. Whole genome sequencing (WGS) holds great promise for unravelling the genomic bases of speciation. We have studied two ecologically divergent, hybridizing species of the 'model tree' genus Populus (poplars, aspens, cottonwoods), Populus alba and P. tremula, using >8.6 million Single Nucleotide Polymorphisms (SNPs) from WGS of population pools. We used the genomic data to (1) scan these species' genomes for regions of elevated and reduced divergence, (2) assess key aspects of their joint demographic history based on genome-wide site frequency spectra (SFS), (3) infer the potential roles of adaptive and deleterious coding mutations in shaping the genomic landscape of divergence. We identified numerous small, unevenly distributed genome regions without fixed polymorphisms despite high overall genomic differentiation. The joint SFS was best explained by ancient and repeated gene flow and allowed pinpointing candidate interspecific migrant tracts. The direction of selection (DoS) differed between genes in putative migrant tracts and the remainder of the genome, thus indicating the potential roles of adaptive divergence and segregating deleterious mutations on the evolution and breakdown of RI. Genes affected by positive selection during divergence were enriched for several functionally interesting groups, including well known candidate 'speciation genes' involved in plant innate immunity. Our results suggest that adaptive divergence affects RI in these hybridizing species mainly through intrinsic and demographic processes. Integrating genomic with molecular data holds great promise for revealing the effects of particular genetic pathways on speciation.
Article
The disproportionately large involvement of the X-chromosome in the isolation of closely related species (the large-X effect) has been reported for many animals, where X-linked genes are mostly hemizygous in the heterogametic sex. The expression of deleterious recessive mutations is thought to drive the frequent involvement of the X-chromosome in hybrid sterility, as well as to reduce interspecific gene flow for X-linked genes. Here, we evaluate the role of the X-chromosome in the speciation of two closely related plant species – the white and red campions (Silene latifolia and S. dioica) – that hybridize widely across Europe. The two species evolved separate sexes and sex chromosomes relatively recently (~10⁷ years), and unlike most animal species, most X-linked genes have intact Y-linked homologs. We demonstrate that the X-linked genes show a very small and insignificant amount of interspecific gene flow, while gene flow involving autosomal loci is significant and sufficient to homogenise the gene pools of the two species. These findings are consistent with the hypothesis of the large-X effect in Silene and comprise the first report of this effect in plants. Non-hemizygosity of many X-linked genes in Silene males indicates that exposure of recessive mutations to selection may not be essential for the occurrence of the large-X effect. Several possible causes of the large-X effect in Silene are discussed.
Article
Population genetic models predict that alleles with small selection coefficients may be swamped by migration and will not contribute to local adaptation. But if most alleles contributing to standing variation are of small effect, how does local adaptation proceed? Here I review predictions of population and quantitative genetic models and use individual-based simulations to illustrate how the architecture of local adaptation depends on the genetic redundancy of the trait, the maintenance of standing genetic variation (VG), and the susceptibility of alleles to swamping. Even when population genetic models predict swamping for individual alleles, considerable local adaptation can evolve at the phenotypic level if there is sufficient VG. However, in such cases the underlying architecture of divergence is transient: FST is low across all loci, and no locus makes an important contribution for very long. Because this kind of local adaptation is mainly due to transient frequency changes and allelic covariances, these architectures will be difficult, if not impossible, to detect using current approaches to studying the genomic basis of adaptation. Even when alleles are large and resistant to swamping, architectures can be highly transient if genetic redundancy and mutation rates are high. These results suggest that drift can play a critical role in shaping the architecture of local adaptation, both through eroding VG and affecting the rate of turnover of polymorphisms with redundant phenotypic effects.
Article
Speciation is a continuous process during which genetic changes gradually accumulate in the genomes of diverging species. Recent studies have documented highly heterogeneous differentiation landscapes, with distinct regions of elevated differentiation ('differentiation islands') widespread across genomes. However, it remains unclear which processes drive the evolution of differentiation islands, how the differentiation landscape evolves as speciation advances, and ultimately how differentiation islands are related to speciation. Here, we addressed these questions based on population genetic analyses of 200 re-sequenced genomes from 10 populations of four Ficedula flycatcher sister species. We show that a heterogeneous differentiation landscape starts emerging among populations within species and that differentiation islands evolve recurrently in the very same genomic regions among independent lineages. Contrary to expectations from models that interpret differentiation islands as genomic regions involved in reproductive isolation that are shielded from gene flow, patterns of sequence divergence (dxy and relative node depth) do not support a major role of gene flow in the evolution of the differentiation landscape in these species. Instead, as predicted by models of linked selection, genome-wide variation in diversity and differentiation can be explained by variation in recombination rate and the density of targets for selection. We thus conclude that the heterogeneous landscape of differentiation in Ficedula flycatchers evolves mainly as the result of background selection and selective sweeps in genomic regions of low recombination. Our results emphasize the necessity of incorporating linked selection as a null model to identify genome regions involved in adaptation and speciation.
Article
The relative importance of floral versus ecological isolation in preventing hybridisation in plant species remains unknown, primarily due to a paucity of detailed data from a range of systems. We examined floral isolation between Silene dioica and S. latifolia (Caryophyllaceae) in southern England by measuring gene flow across the species boundary using allozymes, and by assessing interspecific transfer of fluorescent dye powders to simulate pollination. Allozyme studies of wild populations demonstrated that gene flow between S. dioica and S. latifolia is considerable since the two species could not be distinguished at the loci studied, in sharp contrast to their distinct morphologies. Pollination studies using fluorescent dye powders and direct observation of insect behaviour concurred in that although there was a degree of assortative mating it was insufficient in itself to prevent introgression. Fluorescent dye studies also suggest that pollination rates of hybrids are similar to parental types and that they provide a bridge for gene flow since they are visited freely by the main pollinators of both S. dioica (bumblebees) and S. latifolia (moths). We conclude that although floral isolation and spatial segregation may be important contributory factors, morphological differences between species are probably maintained primarily by strong selective forces associated with habitat (ecological isolation).
Article
The use of molecular data to reconstruct the history of divergence and gene flow between populations of closely related taxa represents a challenging problem. It has been proposed that the long-standing debate about the geography of speciation can be resolved by comparing the likelihoods of a model of isolation with migration and a model of secondary contact. However, data are commonly only fit to a model of isolation with migration and rarely tested against the secondary contact alternative. Furthermore, most demographic inference methods have neglected variation in introgression rates and assume that the gene flow parameter (Nm) is similar among loci. Here, we show that neglecting this source of variation can give misleading results. We analysed DNA sequences sampled from populations of the marine mussels, Mytilus edulis and M. galloprovincialis, across a well-studied mosaic hybrid zone in Europe and evaluated various scenarios of speciation, with or without variation in introgression rates, using an Approximate Bayesian Computation (ABC) approach. Models with heterogeneous gene flow across loci always outperformed models assuming equal migration rates irrespective of the history of gene flow being considered. By incorporating this heterogeneity, the best-supported scenario was a long period of allopatric isolation during the first three-quarters of the time since divergence followed by secondary contact and introgression during the last quarter. By contrast, constraining migration to be homogeneous failed to discriminate among any of the different models of gene flow tested. Our simulations thus provide statistical support for the secondary contact scenario in the European Mytilus hybrid zone that the standard coalescent approach failed to confirm. Our results demonstrate that genomic variation in introgression rates can have profound impacts on the biological conclusions drawn from inference methods and needs to be incorporated in future studies.
Article
The metaphor of “genomic islands of speciation” was first used to describe heterogeneous differentiation among loci between the genomes of closely related species. The biological model proposed to explain these differences was that the regions showing high levels of differentiation were resistant to gene flow between species, while the remainder of the genome was being homogenized by gene flow and consequently showed lower levels of differentiation. However, the conditions under which such differentiation can occur at multiple unlinked loci are restrictive; additionally, essentially all previous analyses have been carried out using relative measures of divergence, which can be misleading when regions with different levels of recombination are compared. Here we test the model of differential gene flow by asking whether absolute divergence is also higher in the previously identified “islands.” Using five species-pairs for which full sequence data is available, we find that absolute measures of divergence are not higher in genomic islands. Instead, in all cases examined we find reduced diversity in these regions, a consequence of which is that relative measures of divergence are abnormally high. These data therefore do not support a model of differential gene flow among loci, though islands of relative divergence may represent loci involved in local adaptation. Simulations using the program IMa2 further suggest that inferences of any gene flow may be incorrect in many comparisons. We instead present an alternative explanation for heterogeneous patterns of differentiation, one in which post-speciation selection generates patterns consistent with multiple aspects of the data.
Article
A long-standing problem in evolutionary biology has been determining whether and how gradual, incremental changes at the gene level can account for rapid speciation and bursts of adaptive radiation. Using genome-scale computer simulations, we extend previous theory showing how gradual adaptive change can generate nonlinear population transitions, resulting in the rapid formation of new, reproductively isolated species. We show that these transitions occur via a mechanism rooted in a basic property of biological heredity: the organization of genes in genomes. Genomic organization of genes facilitates two processes: (i) the buildup of statistical associations among large numbers of genes, and (ii) the action of divergent selection on persistent combinations of alleles. When a population has accumulated a critical amount of standing, divergently selected variation, the combination of these two processes allows many mutations of small effect to act synergistically and precipitously split one population into two discontinuous, reproductively isolated groups. Periods of allopatry, chromosomal linkage among loci, and large-effect alleles can facilitate this process under some conditions, but are not required for it. Our results complement and extend existing theory on alternative stable states during population divergence, distinct phases of speciation, and the rapid emergence of multilocus barriers to gene flow. The results are thus a step toward aligning population genomic theory with modern empirical studies.
Article
Speciation is a fundamental evolutionary process, the knowledge of which is crucial for understanding the origins of biodiversity. Genomic approaches are an increasingly important aspect of this research field. We review current understanding of genome-wide effects of accumulating reproductive isolation and of genomic properties that influence the process of speciation. Building on this work, we identify emergent trends and gaps in our understanding, propose new approaches to more fully integrate genomics into speciation research, translate speciation theory into hypotheses that are testable using genomic tools and provide an integrative definition of the field of speciation genomics.
Article
BWA-MEM is a new alignment algorithm for aligning sequence reads or long query sequences against a large reference genome such as human. It automatically chooses between local and end-to-end alignments, supports paired-end reads and performs chimeric alignment. The algorithm is robust to sequencing errors and applicable to a wide range of sequence lengths from 70bp to a few megabases. For mapping 100bp sequences, BWA-MEM shows better performance than several state-of-art read aligners to date. Availability and implementation: BWA-MEM is implemented as a component of BWA, which is available at http://github.com/lh3/bwa. Contact: hengli@broadinstitute.org
Article
The closely related dioecious herbs Silene latifolia and Silene dioica are widespread and predominantly sympatric in Europe. The species are interfertile, but morphologically and ecologically distinct. A study of large-scale patterns of plastid DNA (polymerase chain reaction–restriction fragment length polymorphism) haplotypes in a sample of 198 populations from most of the European ranges of both species revealed extensive interspecific haplotype sharing. Four of the 28 detected haplotypes were frequent (found in > 40 populations) and widespread. Three of these frequent haplotypes occurred in both species and the geographic distribution of each haplotype was broadly congruent in both species. Each of these three shared and widespread haplotypes is likely to have colonized central and/or northern Europe after the last glaciation from one or more refugial areas in southern Europe. Interspecific hybridization and plastid introgression within refugial regions and/or during the early stages of postglacial expansion is the most plausible explanation for the broadly similar distribution patterns of the shared, frequent chloroplast haplotypes in the two species. The fourth frequent, widespread haplotype was absent from S. latifolia and almost entirely restricted to Nordic S. dioica. It is most likely that this haplotype spread into the Nordic countries from a central or northern European source or from a refugial area in Russia. © 2009 The Linnean Society of London, Botanical Journal of the Linnean Society, 2009, 161, 153–170.
Article
The direct detection of haplotypes from short-read DNA sequencing data requires changes to existing small-variant detection methods. Here, we develop a Bayesian statistical framework which is capable of modeling multiallelic loci in sets of individuals with non-uniform copy number. We then describe our implementation of this framework in a haplotype-based variant detector, FreeBayes.
Article
Many recent studies of intraspecific geographic variation in maternally inherited chloroplast DNA (cpDNA) in European trees have revealed haplotype distributions that can be interpreted in terms of scenarios of postglacial migration and range expansion. However, there is still a lack of comparable information from widespread herb species. In the present study, we investigated the geographic distribution of cpDNA variation in 124 populations, covering a large part of the range of the widespread, dioecious, European herb, Silene dioica. PCR-RFLP analysis revealed 24 different cpDNA haplotypes. As in the majority of European tree species, the large-scale geographic distributions of the most common S. dioica haplotypes suggest that the species colonized Europe from more than one geographic source. Material from 16 populations of S. latifolia and five hybrid populations was also included in the study for comparative purposes. Five out of seven haplotypes detected in S. latifolia were shared with S. dioica. The similarity of the geographic distributions of the shared haplotypes in both species is consistent with a history of past and/or recent interspecific hybridization and introgression between these closely related plants. The two haplotypes detected only in S. latifolia were present in populations in the Mediterranean region – on the southern margin of the species’ area of sympatry, or outside the range of S. dioica.