ABSTRACT: Inexpensive short-read sequencing technologies applied to reduced representation genomes are revolutionizing genetic research, especially population genetics analysis, by allowing the genotyping of massive numbers of single-nucleotide polymorphisms (SNPs) for large numbers of individuals and populations. Restriction site-associated DNA (RAD) sequencing is a recent technique based on the characterization of genomic regions flanking restriction sites. One of its potential drawbacks is the presence of polymorphism within the restriction site, which makes it impossible to observe the associated SNP allele (i.e. allele dropout, ADO). To investigate the effect of ADO on genetic variation estimated from RAD markers, we first mathematically derived measures of the effect of ADO on allele frequencies as a function of different parameters within a single population. We then used RAD data sets simulated using a coalescence model to investigate the magnitude of biases induced by ADO on the estimation of expected heterozygosity and F_ST under a simple demographic model of divergence between two populations. We found that ADO tends to cause genetic variation to be overestimated both within and between populations. Assuming a mutation rate per nucleotide between 10^-9 and 10^-8, this bias remained low for most studied combinations of divergence time and effective population size, except for large effective population sizes. Averaging F_ST values over multiple SNPs, for example, by sliding window analysis, did not correct ADO biases. We briefly discuss possible solutions to filter the most problematic cases of ADO using read coverage to detect markers with a large excess of null alleles.
ABSTRACT: Approximate Bayesian Computation has been successfully used in population
genetics models to bypass the calculation of the likelihood. These algorithms
provide an accurate estimator by comparing the observed dataset to a sample of
datasets simulated from the model. Although parallelization is easily achieved,
computation times for assuring a suitable approximation quality of the
posterior distribution are still long. To alleviate this issue, we propose a
sequential algorithm adapted from Del Moral et al. (2012) which runs twice as
fast as traditional ABC algorithms. Its parameters are calibrated to minimize
the number of simulations from the model.
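As a rough illustration of the idea of comparing the observed dataset to datasets simulated from the model, here is a minimal sketch of the basic (non-sequential) ABC rejection sampler in Python. It is not the sequential algorithm proposed in the abstract. The toy model (normal data with unknown mean and unit variance), the uniform prior on [-5, 5], the sample mean as summary statistic, and the names `simulate` and `abc_rejection` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Simulate n observations from the toy model N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_sims=5000, keep_fraction=0.01, seed=42):
    """Basic ABC rejection: keep the prior draws whose simulated
    summary statistic lies closest to the observed one."""
    rng = random.Random(seed)
    obs_summary = statistics.fmean(observed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)  # draw a parameter from the prior
        sim_summary = statistics.fmean(simulate(theta, len(observed), rng))
        draws.append((abs(sim_summary - obs_summary), theta))
    draws.sort()  # smallest distance to the observed summary first
    n_keep = max(1, int(n_sims * keep_fraction))
    return [theta for _, theta in draws[:n_keep]]


# Pseudo-observed data generated with a true mean of 2.0 (hypothetical value).
observed = simulate(2.0, 50, random.Random(0))
posterior_sample = abc_rejection(observed)
posterior_mean = statistics.fmean(posterior_sample)
print(f"approximate posterior mean: {posterior_mean:.2f}")
```

The likelihood is never evaluated: only forward simulation is required. The cost is the large number of simulations discarded at each run, which is the inefficiency that sequential schemes aim to reduce.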
ABSTRACT: Comparison of demo-genetic models using Approximate Bayesian Computation (ABC) is an active research field. Although large numbers of populations and models (i.e. scenarios) can be analysed with ABC using molecular data obtained from various marker types, methodological and computational issues arise when these numbers become too large. Moreover, Robert et al. (Proceedings of the National Academy of Sciences of the United States of America, 2011, 108, 15112) have shown that the conclusions drawn from ABC model comparison cannot be trusted per se and require additional simulation analyses. Monte Carlo inferential techniques to empirically evaluate confidence in scenario choice are very time-consuming, however, when the numbers of summary statistics (Ss) and scenarios are large. We here describe a methodological innovation to process efficient ABC scenario probability computation using linear discriminant analysis (LDA) on Ss before computing logistic regression. We used simulated pseudo-observed data sets (pods) to assess the main features of the method (precision and computation time) in comparison with traditional probability estimation using raw (i.e. not LDA transformed) Ss. We also illustrate the method on real microsatellite data sets produced to make inferences about the invasion routes of the coccinellid Harmonia axyridis. We found that scenario probabilities computed from LDA-transformed and raw Ss were strongly correlated. Type I and II errors were similar for both methods. The faster probability computation that we observed (speed gain around a factor of 100 for LDA-transformed Ss) substantially increases the ability of ABC practitioners to analyse large numbers of pods and hence provides a manageable way to empirically evaluate the power available to discriminate among a large set of complex scenarios.
ABSTRACT: Effective population size (Ne) is a central concept in evolutionary biology and conservation genetics. It predicts rates of loss of neutral genetic variation, fixation of deleterious and favourable alleles, and the increase of inbreeding experienced by a population. A method exists for the estimation of Ne from the observed linkage disequilibrium between unlinked loci in a population sample. While an increasing number of studies have applied this method in natural and managed populations, its reliability has not yet been evaluated. We developed a computer program to calculate this estimator of Ne using the most widely used linkage disequilibrium algorithm and used simulations to show that this estimator is strongly biased when the sample size is small (<100) and below the true Ne. This is probably due to the linkage disequilibrium generated by the sampling process itself and the inadequate correction for this phenomenon in the method. Results suggest that Ne estimates derived using this method should be regarded with caution in many cases. To improve the method’s reliability and usefulness we propose a way to determine whether a given sample size exceeds the population Ne and can therefore be used for the computation of an unbiased estimate.
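To make the role of the sampling correction concrete, the following Python sketch uses a commonly cited approximation for unlinked loci under random mating, E[r²] ≈ 1/(3Ne) + 1/S, where S is the sample size. The abstract does not state the formula it evaluates, so this approximation and the function names `expected_r2` and `ld_ne_estimate` are assumptions for illustration, not the authors' program.

```python
import math


def expected_r2(ne, sample_size):
    """Approximate expected squared LD correlation between unlinked loci:
    a drift term 1/(3*Ne) plus a sampling term 1/S (assumed approximation)."""
    return 1.0 / (3.0 * ne) + 1.0 / sample_size


def ld_ne_estimate(mean_r2, sample_size):
    """Invert the approximation: subtract the sampling term, then solve for Ne.
    Returns infinity when sampling noise accounts for all observed LD."""
    drift_r2 = mean_r2 - 1.0 / sample_size
    if drift_r2 <= 0:
        return math.inf
    return 1.0 / (3.0 * drift_r2)


# With a sample much smaller than Ne, almost all expected LD comes from sampling.
r2 = expected_r2(ne=1000, sample_size=50)
sampling_share = (1.0 / 50) / r2
print(f"expected r2 = {r2:.5f}, share due to sampling = {sampling_share:.1%}")
```

Under this approximation, when S is far below Ne the drift signal is a small remainder after a large subtraction, so any inaccuracy in the sampling correction translates into a large error in the estimate. That is consistent with the caution expressed in the abstract.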
ABSTRACT: The spider mite Tetranychus evansi is an emerging pest of solanaceous crops worldwide. Like many other emerging pests, its small size, confusing taxonomy, complex history of associations with humans, and propensity to start new populations from small inocula, make the study of its invasion biology difficult. Here, we use recent developments in Approximate Bayesian Computation (ABC) and variation in multi-locus genetic markers to reconstruct the complex historical demography of this cryptic invasive pest. By distinguishing among multiple pathways and timing of introductions, we find evidence for the "bridgehead effect", in which one invasion serves as source for subsequent invasions. Tetranychus evansi populations in Europe and Africa resulted from at least three independent introductions from South America and involved mites from two distinct sources in Brazil, corresponding to highly divergent mitochondrial DNA lineages. Mites from southwest Brazil (BR-SW) colonized the African continent, and from there Europe through two pathways in a "bridgehead" type pattern. One pathway resulted in a widespread invasion, not only to Europe, but also to other regions in Africa, southern Europe and eastern Asia. The second pathway involved the mixture with a second introduction from BR-SW leading to an admixed population in southern Spain. Admixture was also detected between invasive populations in Portugal. A third introduction from the Brazilian Atlantic region resulted in only a limited invasion in Europe. This study illustrates that ABC methods can provide insights into, and distinguish among, complex invasion scenarios. These processes are critical not only in understanding the biology of invasions, but also in refining management strategies for invasive species.
For example, while reported observations of the mite and outbreaks in the invaded areas were largely consistent with estimates of geographical expansion from the ABC approach, historical observations failed to recognize the complex pathways involved and the corresponding effects on genetic diversity.
PLoS ONE 01/2012; 7(4):e35601. · 3.73 Impact Factor
ABSTRACT: Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models. Grelaud et al. [(2009) Bayesian Anal 3:427-442] advocated the use of ABC for model choice in the specific case of Gibbs random fields, relying on an intermodel sufficiency property to show that the approximation was legitimate. We implemented ABC model choice in a wide range of phylogenetic models in the Do It Yourself-ABC (DIY-ABC) software [Cornuet et al. (2008) Bioinformatics 24:2713-2719]. We now present arguments as to why the theoretical arguments for ABC model choice are missing, because the algorithm involves an unknown loss of information induced by the use of insufficient summary statistics. The approximation error of the posterior probabilities of the models under comparison may thus be unrelated to the computational effort spent in running an ABC algorithm. We then conclude that additional empirical verifications of the performance of the ABC procedure, such as those available in DIY-ABC, are necessary to conduct model choice.
Proceedings of the National Academy of Sciences 08/2011; 108(37):15112-7. · 9.74 Impact Factor
ABSTRACT: The little fire ant, Wasmannia auropunctata, displays a peculiar breeding system polymorphism. Classical haplo-diploid sexual reproduction between reproductive individuals occurs in some populations, whereas, in others, queens and males reproduce clonally. Workers are produced sexually and are sterile in both clonal and sexual populations. The evolutionary fate of the clonal lineages depends strongly on the underlying mechanisms allowing reproductive individuals to transmit their genomes to subsequent generations. We used several queen-offspring data sets to estimate the rate of transition from heterozygosity to homozygosity associated with recombination events at 33 microsatellite loci in thelytokous parthenogenetic queen lineages and compared these rates with theoretical expectations under various parthenogenesis mechanisms. We then used sexually produced worker families to define linkage groups for these 33 loci and to compare meiotic recombination rates in sexual and parthenogenetic queens. Our results demonstrate that queens from clonal populations reproduce by automictic parthenogenesis with central fusion. These same parthenogenetic queens produce normally segregating meiotic oocytes for workers, which display much lower rates of recombination (by a factor of 45) than workers produced by sexual queens. These low recombination rates also concern the parthenogenetic production of queen offspring, as indicated by the very low rates of transition from heterozygosity to homozygosity observed (from 0% to 2.8%). We suggest that the combination of automixis with central fusion and a major decrease in recombination rates allows clonal queens to benefit from thelytoky while avoiding the potential inbreeding depression resulting from the loss of heterozygosity during automixis.
In sterile workers, the strong decrease of recombination rates may also facilitate the conservation over time of some coadapted allelic interactions within chromosomes that might confer an adaptive advantage in habitats disturbed by human activity, where clonal populations of W. auropunctata are mostly found.
Molecular Biology and Evolution 03/2011; 28(9):2591-601. · 10.35 Impact Factor
ABSTRACT: Approximate Bayesian computation (ABC) has become an essential tool for the
analysis of complex stochastic models. Earlier, Grelaud et al. (2009) advocated
the use of ABC for Bayesian model choice in the specific case of Gibbs random
fields, relying on an inter-model sufficiency property to show that the
approximation was legitimate. Having implemented ABC-based model choice in a
wide range of phylogenetic models in the DIY-ABC software (Cornuet et al.,
2008), we now present theoretical background as to why a generic use of ABC for
model choice is ungrounded, since it depends on an unknown amount of
information loss induced by the use of insufficient summary statistics. The
approximation error of the posterior probabilities of the models under
comparison may thus be unrelated to the computational effort spent in running
an ABC algorithm. We then conclude that additional empirical verifications of
the performance of the ABC procedure, such as those available in DIY-ABC, are
necessary to conduct model choice.
ABSTRACT: We developed a spatially explicit model of a bioinvasion and used an approximate Bayesian computation (ABC) framework to make various inferences from a combination of genetic (microsatellite genotypes), historical (first observation dates) and geographical (spatial coordinates of introduction and sampled sites) information. Our method aims to discriminate between alternative introduction scenarios and to estimate posterior densities of demographically relevant parameters of the invasive process. The performance of our landscape-ABC method is assessed using simulated data sets differing in their information content (genetic and/or historical data). We apply our methodology to the recent introduction and spatial expansion of the cane toad, Bufo marinus, in northern Australia. We find that, at least in the context of cane toad invasion, historical data are more informative than genetic data for discriminating between introduction scenarios. However, the combination of historical and genetic data provides the most accurate estimates of demographic parameters. For the cane toad, we find some evidence for a strong bottleneck prior to introduction, a small initial number of founder individuals (about 15), a large population growth rate (about 400% per generation), a standard deviation of dispersal distance of 19 km per generation and a high invasion speed at equilibrium (50 km per year). Our approach strengthens the application of the ABC method to the field of bioinvasion by allowing statistical inferences to be made on the introduction and the spatial expansion dynamics of invasive species using a combination of various relevant sources of information.
ABSTRACT: Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether it is used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics.
ABSTRACT: Approximate Bayesian computation (ABC) is a recent flexible class of Monte-Carlo algorithms increasingly used to make model-based inference on complex evolutionary scenarios that have acted on natural populations. The software DIYABC offers a user-friendly interface allowing non-expert users to consider population histories involving any combination of population divergences, admixtures and population size changes. We here describe and illustrate new developments of this software that mainly include (i) inference from DNA sequence data in addition to or separately from microsatellite data, (ii) the possibility to analyze five categories of loci considering balanced or unbalanced sex ratios: autosomal diploid, autosomal haploid, X-linked, Y-linked and mitochondrial, and (iii) the possibility to perform model checking computation to assess the "goodness-of-fit" of a model, a feature of ABC analysis that has been so far neglected.
We used controlled simulated data sets generated under evolutionary scenarios involving various divergence and admixture events to evaluate the effect of mixing autosomal microsatellite, mtDNA and/or nuclear autosomal DNA sequence data on inferences. This evaluation included the comparison of competing scenarios and the quantification of their relative support, and the estimation of parameter posterior distributions under a given scenario. We also considered a set of scenarios often compared when making ABC inferences on the routes of introduction of invasive species to illustrate the interest of the new model checking option of DIYABC to assess model misfit.
Our new developments of the integrated software DIYABC should be particularly useful to make inference on complex evolutionary scenarios involving both recent and ancient historical events and using various types of molecular markers in diploid or haploid organisms. They offer a handy way for non-expert users to achieve model checking computation within an ABC framework, hence filling up a gap of ABC analysis. The software DIYABC V1.0 is freely available at http://www1.montpellier.inra.fr/CBGP/diyabc.
ABSTRACT: Recent studies of the routes of worldwide introductions of alien organisms suggest that many widespread invasions could have stemmed not from the native range, but from a particularly successful invasive population, which serves as the source of colonists for remote new territories. We here call this phenomenon the invasive bridgehead effect. Evaluating the likelihood of such a scenario is heuristically challenging. We solved this problem by using approximate Bayesian computation methods to quantitatively compare complex invasion scenarios based on the analysis of population genetics (microsatellite variation) and historical (first observation dates) data. We applied this approach to the Harlequin ladybird Harmonia axyridis (HA), a coccinellid native to Asia that was repeatedly introduced as a biocontrol agent without becoming established for decades. We show that the recent burst of worldwide invasions of HA followed a bridgehead scenario, in which an invasive population in eastern North America acted as the source of the colonists that invaded the European, South American and African continents, with some admixture with a biocontrol strain in Europe. This demonstration of a mechanism of invasion via a bridgehead has important implications both for invasion theory (i.e., a single evolutionary shift in the bridgehead population versus multiple changes in case of introduced populations becoming invasive independently) and for ongoing efforts to manage invasions by alien organisms (i.e., heightened vigilance against invasive bridgeheads).
PLoS ONE 01/2010; 5(3):e9743. · 3.73 Impact Factor
ABSTRACT: The Adaptive Multiple Importance Sampling (AMIS) algorithm is aimed at an
optimal recycling of past simulations in an iterated importance sampling
scheme. The difference with earlier adaptive importance sampling
implementations like Population Monte Carlo is that the importance weights of
all simulated values, past as well as present, are recomputed at each
iteration, following the technique of the deterministic multiple mixture
estimator of Owen and Zhou (2000). Although the convergence properties of the
algorithm cannot be fully investigated, we demonstrate through a challenging
banana shape target distribution and a population genetics example that the
improvement brought by this technique is substantial.
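The recycling step described above, in which the importance weights of all past and present draws are recomputed at each iteration against the mixture of every proposal used so far, can be sketched in Python as follows. This is a one-dimensional toy under stated assumptions: the target N(3, 1), the Gaussian proposal family, equal sample sizes per iteration, the moment-matching update, the floor on the proposal standard deviation, and the function names are all illustrative choices, not taken from the paper.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def target_pdf(x):
    """Toy target density (an assumption for illustration): N(3, 1)."""
    return normal_pdf(x, 3.0, 1.0)


def amis(n_iter=6, n_per_iter=500, init_mean=0.0, init_sd=5.0, seed=1):
    rng = random.Random(seed)
    proposals = []  # (mean, sd) of every proposal used so far
    samples = []    # all simulated values, past and present
    mean, sd = init_mean, init_sd
    for _ in range(n_iter):
        proposals.append((mean, sd))
        samples.extend(rng.gauss(mean, sd) for _ in range(n_per_iter))
        # Recompute ALL weights against the equal-weight mixture of all
        # proposals (deterministic multiple mixture), not only the latest one.
        weights = []
        for x in samples:
            mixture = sum(normal_pdf(x, m, s) for m, s in proposals) / len(proposals)
            weights.append(target_pdf(x) / mixture)
        total = sum(weights)
        # Adapt the next proposal by weighted moment matching.
        mean = sum(w * x for w, x in zip(weights, samples)) / total
        var = sum(w * (x - mean) ** 2 for w, x in zip(weights, samples)) / total
        sd = max(math.sqrt(var), 0.1)  # floor avoids a degenerate proposal
    return samples, weights, mean, sd


samples, weights, est_mean, est_sd = amis()
print(f"estimated mean = {est_mean:.2f}, estimated sd = {est_sd:.2f}")
```

Because early draws from the wide initial proposal keep contributing with updated weights, none of the simulation effort is thrown away. The price is that the cost of reweighting grows with the number of iterations.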
ABSTRACT: Sequential techniques can enhance the efficiency of the approximate Bayesian computation algorithm, as in Sisson et al.'s (2007) partial rejection control version. While this method is based upon the theoretical works of Del Moral et al. (2006), the application to approximate Bayesian computation results in a bias in the approximation to the posterior. An alternative version based on genuine importance sampling arguments bypasses this difficulty, in connection with the population Monte Carlo method of Cappé et al. (2004), and it includes an automatic scaling of the forward kernel. When applied to a population genetics example, it compares favourably with two other versions of the approximate algorithm.
ABSTRACT: Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC. AVAILABILITY: The software DIY ABC is freely available at http://www.montpellier.inra.fr/CBGP/diyabc.
ABSTRACT: The honey bee is a key model for social behavior and this feature led to the selection of the species for genome sequencing. A genetic map is a necessary companion to the sequence. In addition, because there was originally no physical map for the honey bee genome project, a meiotic map was the only resource for organizing the sequence assembly on the chromosomes.
We present the genetic (meiotic) map here and describe the main features that emerged from comparison with the sequence-based physical map. The genetic map of the honey bee is saturated and the chromosomes are oriented from the centromeric to the telomeric regions. The map is based on 2,008 markers and is about 40 Morgans (M) long, resulting in a marker density of one every 2.05 centiMorgans (cM). For the 186 megabases (Mb) of the genome mapped and assembled, this corresponds to a very high average recombination rate of 22.04 cM/Mb. Honey bee meiosis shows a relatively homogeneous recombination rate along and across chromosomes, as well as within and between individuals. Interference is higher than inferred from the Kosambi function of distance. In addition, numerous recombination hotspots are dispersed over the genome.
The very large genetic length of the honey bee genome, its small physical size and an almost complete genome sequence with a relatively low number of genes suggest a very promising future for association mapping in the honey bee, particularly as the existence of haploid males allows easy bulk segregant analysis.
ABSTRACT: Two independent genome projects for the honey bee, a microsatellite linkage map and a genome sequence assembly, interactively produced an almost complete organization of the euchromatic genome. Assembly 4.0 now includes 626 scaffolds that were ordered and oriented into chromosomes according to the framework provided by the third-generation linkage map (AmelMap3). Each construct was used to control the quality of the other. The co-linearity of markers in the sequence and the map is almost perfect and argues in favor of the high quality of both.
ABSTRACT: We introduce here a Bayesian analysis of a classical admixture model in which all parameters are simultaneously estimated. Our approach follows the approximate Bayesian computation (ABC) framework, relying on massive simulations and a rejection-regression algorithm. Although computationally intensive, this approach can easily deal with complex mutation models and partially linked loci, and it can be thoroughly validated without much additional computation cost. Compared to a recent maximum-likelihood (ML) method, the ABC approach leads to similarly accurate estimates of admixture proportions in the case of recent admixture events, but it is found superior when the admixture is more ancient. All other parameters of the admixture model, such as the divergence time between parental populations, the admixture time, and the population sizes, are also well estimated, unlike with the ML method. The use of partially linked markers does not introduce any particular bias in the estimation of admixture, but ML confidence intervals are found to be too narrow if linkage is not specifically accounted for. The application of our method to an artificially admixed domestic bee population from northwest Italy suggests that the admixture occurred in the last 10-40 generations and that the parental Apis mellifera and A. ligustica populations were completely separated since the last glacial maximum.