Jean-Marie Cornuet’s research while affiliated with Centre de Biologie et de Gestion des Populations and other places


Publications (187)


Application of ABC to Infer the Genetic History of Pygmy Hunter-Gatherer Populations from Western Central Africa
  • Book

August 2018 · 33 Reads · 1 Citation

In evolutionary biology, approximate Bayesian computation (ABC) methods are well adapted to the complex models of species and population history in which serial or independent divergence events, change of population sizes, and genetic admixture, or migration events are often suspected. Here, we present the results of using a set of recent ABC-based methods to analyse a microsatellite genetic dataset composed of Pygmy populations from Western Central Africa and non-Pygmy human populations, partly stemming from the dataset originally investigated in Verdu et al. (2009). Although archaeological remains attest to Homo sapiens’ presence in the Congo Basin for at least 40,000 years, the demographic history of these groups, including divergence and admixture, remains little known. In this chapter, we will successively describe (1) the observed microsatellite dataset for which one wants to conduct a Bayesian analysis, the set(s) of models we compared, with their parameters and associated priors, and the way datasets were simulated for ABC analyses; (2) the model choice analyses we carried out using a recently developed method, ABC random forest (Pudlo et al., 2015); (3) the estimates we obtained for Pygmy historical and demographic parameters under the most likely model; and (4) model-posterior checking to evaluate the goodness of fit between the final inferred genetic history and the observed dataset. We found a probable recent (about 4,900 YBP) common origin of all Western Central African Pygmy populations, despite the vast cultural, morphological, and genetic diversity observed today among these populations. We also confirmed recent asymmetrical and heterogeneous genetic introgressions from non-Pygmies into each Pygmy population. These results are consistent with previous population genetics studies on Pygmies from Western Central Africa by Verdu et al. (2009), Batini et al. (2011), and Patin et al. (2009).


Application of approximate Bayesian computation to infer the genetic history of Pygmy hunter-gatherers populations from Western Central Africa
  • Book
  • Full-text available

January 2018 · 215 Reads · 6 Citations

Figure 2: Human SNP data: projection of the reference table on the first four LDA axes. Colors correspond to model indices. The location of the additional datasets is indicated by a large black star.
Figure 15: Six scenarios of evolution of four Human populations genotyped at 50,000 SNPs. The
Figure S4: Projections on the LDA axes of the simulations from the reference table for the controlled SNP example. Colors correspond to model indices: black for Model 1, blue for Model 2 and orange for Model 3. The locations of both simulated pseudo-observed datasets, which are analyzed as if they were truly observed data, are indicated by green and red stars.
Figure S6: Projections on the LDA axes of the simulations from the reference table for the controlled microsatellite example. Colors correspond to model indices: black for Model 1, blue for Model 2 and orange for Model 3. The locations of both simulated pseudo-observed datasets are indicated by green and red stars.
Figure S8: Projections on the first four LDA axes of simulations from the reference table of the Harlequin ladybird analysis. Colors correspond to model indices. The location of the real observed dataset for the Harlequin ladybird is indicated by a black star.


Reliable ABC model choice via random forests

November 2015 · 583 Reads · 377 Citations

Bioinformatics

Motivation: Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. Results: We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP for a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. Availability: The proposed methodologies are implemented in the R package abcrf available on the CRAN.
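The idea of recasting model choice as a classification problem on a simulated reference table can be illustrated with a minimal, dependency-free sketch. It is not the abcrf implementation: the two toy models, the summary statistics, and all function names are hypothetical, and a simple k-nearest-neighbour vote stands in for the random forest classifier described in the abstract.

```python
import math
import random
import statistics
from collections import Counter

def summaries(data):
    """Summary statistics of a dataset: (mean, standard deviation)."""
    return (statistics.fmean(data), statistics.stdev(data))

def simulate(model, n, rng):
    """Two hypothetical competing models that differ only in their spread."""
    mu = rng.uniform(-1.0, 1.0)          # assumed shared prior on the mean
    sd = 1.0 if model == 0 else 3.0      # assumed model-specific spread
    return [rng.gauss(mu, sd) for _ in range(n)]

def build_reference_table(n_per_model, n, rng):
    """Simulate (model index, summaries) pairs under each model."""
    return [(m, summaries(simulate(m, n, rng)))
            for m in (0, 1) for _ in range(n_per_model)]

def choose_model(table, s_obs, k):
    """Classify the observed summaries by majority vote of the k nearest simulations.

    The k-NN vote is a stand-in for a random forest, used here only to keep
    the sketch free of third-party dependencies.
    """
    nearest = sorted(table, key=lambda row: math.dist(row[1], s_obs))[:k]
    votes = Counter(m for m, _ in nearest)
    return votes.most_common(1)[0][0], votes

rng = random.Random(7)
table = build_reference_table(n_per_model=500, n=50, rng=rng)
# Pseudo-observed dataset simulated under model 1 and treated as if observed.
pseudo_observed = summaries(simulate(1, 50, rng))
predicted_model, votes = choose_model(table, pseudo_observed, k=25)
```

The vote counts in this sketch are not posterior probabilities; the abstract describes estimating the posterior probability of the selected model in a separate second stage.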


Release of a new version of the computer program DIYABC 2_1_0 and associated user manual document

July 2015 · 1,676 Reads

We have recently made available (July 2015) a new version of the computer program DIYABC (DIYABC v2.1.0), which represents a user-friendly approach to approximate Bayesian computation for inference on population history using molecular markers. This new version includes several major and minor improvements. The document proposed here includes six sections: A/ Major improvements, B/ Minor improvements, C/ Availability, D/ Reference to cite, E/ Details of the major improvements included in the program DIYABC v2.1.0, and F/ New user manual document for DIYABC v2.1.0.


DIYABCv2.0: a software to make Approximate Bayesian Computation inferences about population history using Single Nucleotide Polymorphism, DNA sequence and microsatellite data

January 2014 · 1,539 Reads · 1,074 Citations

Bioinformatics

DIYABC is a software package for a comprehensive analysis of population history using approximate Bayesian computation (ABC) on DNA polymorphism data. Version 2.0 implements a number of new features and analytical methods. It allows: (i) the analysis of single nucleotide polymorphism (SNP) data at a large number of loci, in addition to microsatellite and DNA sequence data; (ii) efficient Bayesian model choice using linear discriminant analysis on summary statistics; and (iii) the serial launching of multiple post-processing analyses. DIYABC v2.0 also includes a user-friendly graphical interface with various new options. It can be run on three operating systems: GNU/Linux, Microsoft Windows and Apple OS X. Availability: Freely available with a detailed notice document and example projects to academic users from: http://www1.montpellier.inra.fr/CBGP/diyabc Contact: estoup@supagro.inra.fr.


The effect of RAD allele dropout on the estimation of genetic variation within and between populations

June 2013 · 717 Reads · 284 Citations

Inexpensive short-read sequencing technologies applied to reduced representation genomes are revolutionizing genetic research, especially population genetics analysis, by allowing the genotyping of massive numbers of single-nucleotide polymorphisms (SNPs) for large numbers of individuals and populations. Restriction site-associated DNA (RAD) sequencing is a recent technique based on the characterization of genomic regions flanking restriction sites. One of its potential drawbacks is the presence of polymorphism within the restriction site, which makes it impossible to observe the associated SNP allele (i.e. allele dropout, ADO). To investigate the effect of ADO on genetic variation estimated from RAD markers, we first mathematically derived measures of the effect of ADO on allele frequencies as a function of different parameters within a single population. We then used RAD data sets simulated using a coalescence model to investigate the magnitude of biases induced by ADO on the estimation of expected heterozygosity and F_ST under a simple demographic model of divergence between two populations. We found that ADO tends to cause genetic variation to be overestimated both within and between populations. Assuming a mutation rate per nucleotide between 10^-9 and 10^-8, this bias remained low for most studied combinations of divergence time and effective population size, except for large effective population sizes. Averaging F_ST values over multiple SNPs, for example, by sliding window analysis, did not correct ADO biases. We briefly discuss possible solutions to filter the most problematic cases of ADO using read coverage to detect markers with a large excess of null alleles.


Table 1 : Comparison of computational cost of the different ABC-SMC schemes.
Figure 4: Population scenario for the honeybees dataset 
Figure 5: Estimates of the posterior distributions of the θ_i's for five independent replicates
Figure 7: Time factor for the population genetics example
Efficient learning in ABC algorithms

October 2012 · 212 Reads · 9 Citations

Approximate Bayesian Computation has been successfully used in population genetics models to bypass the calculation of the likelihood. These algorithms provide an accurate estimator by comparing the observed dataset to a sample of datasets simulated from the model. Although parallelization is easily achieved, computation times for assuring a suitable approximation quality of the posterior distribution are still long. To alleviate this issue, we propose a sequential algorithm adapted from Del Moral et al. (2012) which runs twice as fast as traditional ABC algorithms. Its parameters are calibrated to minimize the number of simulations from the model.
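The basic principle mentioned in this abstract, comparing the observed dataset to datasets simulated from the model, can be sketched with the simplest ABC rejection sampler. This is only an illustration under stated assumptions, not the sequential algorithm proposed in the paper: the Gaussian toy model, the uniform prior, the tolerance, and all names are hypothetical.

```python
import random
import statistics

def simulate(theta, n, rng):
    """Simulate n observations from the assumed toy model N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def abc_rejection(observed, n_sims, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance`
    of the observed summary (here, the sample mean)."""
    s_obs = statistics.fmean(observed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)   # draw from the assumed uniform prior
        s_sim = statistics.fmean(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)       # the likelihood is never evaluated
    return accepted

rng = random.Random(1)
observed = simulate(2.0, 50, rng)        # pseudo-observed data, true theta = 2
posterior_sample = abc_rejection(observed, n_sims=5000, tolerance=0.1, rng=rng)
posterior_mean = statistics.fmean(posterior_sample)
```

Only a small fraction of the prior draws is accepted in a scheme like this, which gives a sense of why the number of simulations drives the computation time.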


Fig. 1 Probability estimations of scenario 5 computed using linear discriminant analysis (LDA)-transformed or raw summary statistics for 500 pods simulated under scenario 5 (10 scenarios compared). (a) Pearson's correlation coefficient between probability estimations = 0.940 (95% CI = [0.928, 0.949]). Solid line: y = x; dotted line: linear regression line y = 0.818436x + 0.004878. (b) 95% CIs (i.e. 2.5% and 97.5% quantiles) for each probability values obtained from either LDA-transformed summary statistics (black lines) or raw summary statistics (grey lines).
Table 1 Type I and II error rates estimated for different numbers of simulated data sets
Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics

May 2012 · 428 Reads · 121 Citations

Comparison of demo-genetic models using Approximate Bayesian Computation (ABC) is an active research field. Although large numbers of populations and models (i.e. scenarios) can be analysed with ABC using molecular data obtained from various marker types, methodological and computational issues arise when these numbers become too large. Moreover, Robert et al. (Proceedings of the National Academy of Sciences of the United States of America, 2011, 108, 15112) have shown that the conclusions drawn on ABC model comparison cannot be trusted per se and require additional simulation analyses. Monte Carlo inferential techniques to empirically evaluate confidence in scenario choice are very time-consuming, however, when the numbers of summary statistics (Ss) and scenarios are large. We here describe a methodological innovation to compute ABC scenario probabilities efficiently by applying linear discriminant analysis (LDA) to the Ss before computing the logistic regression. We used simulated pseudo-observed data sets (pods) to assess the main features of the method (precision and computation time) in comparison with traditional probability estimation using raw (i.e. not LDA-transformed) Ss. We also illustrate the method on real microsatellite data sets produced to make inferences about the invasion routes of the coccinellid Harmonia axyridis. We found that scenario probabilities computed from LDA-transformed and raw Ss were strongly correlated. Type I and II errors were similar for both methods. The faster probability computation that we observed (speed gain around a factor of 100 for LDA-transformed Ss) substantially increases the ability of ABC practitioners to analyse large numbers of pods and hence provides a manageable way to empirically evaluate the power available to discriminate among a large set of complex scenarios.
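The LDA step described above can be sketched for the simplest case of two scenarios and two summary statistics, where the discriminant axis has a closed form. This is a minimal illustration, not the DIYABC implementation: the simulated summary statistics and every function name are hypothetical, and the subsequent logistic regression step is omitted.

```python
import random
import statistics

def lda_direction(class0, class1):
    """Fisher discriminant direction w = Sw^{-1} (m1 - m0) for 2-D points."""
    def mean(points):
        return [statistics.fmean(p[i] for p in points) for i in range(2)]

    def scatter(points, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Pooled within-class scatter matrix Sw.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Closed-form inverse of the 2x2 matrix Sw, applied to the mean difference.
    return [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]

def project(point, w):
    """Project a vector of summary statistics onto the discriminant axis."""
    return point[0] * w[0] + point[1] * w[1]

rng = random.Random(3)
# Hypothetical summary statistics simulated under two scenarios.
scenario0 = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(300)]
scenario1 = [(rng.gauss(2.0, 1.0), rng.gauss(1.0, 1.0)) for _ in range(300)]
w = lda_direction(scenario0, scenario1)
z0 = [project(p, w) for p in scenario0]
z1 = [project(p, w) for p in scenario1]
```

With more scenarios, LDA yields at most one fewer axis than the number of scenarios, so the regression step operates on far fewer variables than the raw Ss; that dimension reduction is the plausible source of the speed gain reported in the abstract.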


Test of Colonisation Scenarios Reveals Complex Invasion History of the Red Tomato Spider Mite Tetranychus evansi

April 2012 · 507 Reads · 76 Citations

The spider mite Tetranychus evansi is an emerging pest of solanaceous crops worldwide. Like many other emerging pests, its small size, confusing taxonomy, complex history of associations with humans, and propensity to start new populations from small inocula, make the study of its invasion biology difficult. Here, we use recent developments in Approximate Bayesian Computation (ABC) and variation in multi-locus genetic markers to reconstruct the complex historical demography of this cryptic invasive pest. By distinguishing among multiple pathways and timing of introductions, we find evidence for the "bridgehead effect", in which one invasion serves as source for subsequent invasions. Tetranychus evansi populations in Europe and Africa resulted from at least three independent introductions from South America and involved mites from two distinct sources in Brazil, corresponding to highly divergent mitochondrial DNA lineages. Mites from southwest Brazil (BR-SW) colonized the African continent, and from there Europe through two pathways in a "bridgehead" type pattern. One pathway resulted in a widespread invasion, not only to Europe, but also to other regions in Africa, southern Europe and eastern Asia. The second pathway involved the mixture with a second introduction from BR-SW leading to an admixed population in southern Spain. Admixture was also detected between invasive populations in Portugal. A third introduction from the Brazilian Atlantic region resulted in only a limited invasion in Europe. This study illustrates that ABC methods can provide insights into, and distinguish among, complex invasion scenarios. These processes are critical not only in understanding the biology of invasions, but also in refining management strategies for invasive species. For example, while reported observations of the mite and outbreaks in the invaded areas were largely consistent with estimates of geographical expansion from the ABC approach, historical observations failed to recognize the complex pathways involved and the corresponding effects on genetic diversity.


Figure S1

April 2012 · 12 Reads

Schematic representation of the 24 competing introduction scenarios considered for the inference of the introduction routes of Tetranychus evansi in Africa and Europe tested by ABC analysis. Four populations were considered in each analysis as identified by the clustering structure analysis (see text and Figure 1): Pop1 – BR-SW is the native population from southwest Brazil, Pop2 – AF corresponds to African samples, Pop3 – EU corresponds to European samples pro parte, Pop4 – MED corresponds to Mediterranean samples (Andalusia in southern Spain, Tunisia and Crete). N2, N3, N4 correspond to the numbers of founder individuals and were assumed to be different in all introduced populations. NB, NA, NM, NS correspond to the stable effective population sizes in Pop1 – BR-SW, Pop2 – AF, Pop3 – EU and Pop4 – MED, respectively. The time of event (ti), in number of generations, corresponds to the time at which an introduced population has diverged from its source population; the duration of the initial bottleneck (db) was assumed to be the same in all the introduced populations. Time 0 is the sampling date. The admixture rate is r relative to population Pop1 – BR-SW and 1-r relative to either population Pop2 – AF or Pop3 – EU. We assumed that all populations evolved as isolated demes and no exchange of migrants occurred after the introduction. All parameters with associated prior distributions are described in Table S2. (TIF)


Citations (48)


... The evolutionary scenario concerns four populations observed at the present time linked through three divergence and two admixture events involving two unobserved populations. The same dataset has also been analyzed in Cornuet et al. (2006) with a slightly different population scenario. The dimension of Θ is 15 and the one of the summary statistics vector S(·) is 30. ...

Reference:

Optimal parallelization of a sequential approximate Bayesian computation algorithm
Inférence bayésienne dans des scénarios évolutifs complexes avec populations mélangées: application à l’abeille domestique
  • Citing Book
  • October 2006

... GenAlEx 6.5 program was used to detect the allele numbers (N_A), effective allele numbers (N_E), the frequencies of alleles, intra-population diversity of alleles, and pairwise comparisons of the isolates (18). The expected heterozygosity (H_E) of loci was calculated using the Arlequin 3.11 (20). In multiple loci, haplotype overlaps were determined with GenAlEx 6.5 (18). ...

Bayesian analysis of an admixture model with mutations and arbitrarily linked markers
  • Citing Article
  • January 2005

... Model checking revealed good congruence between observed data and data simulated under scenario 2 from the posterior predictive distribution and the prior distribution (Supplementary Figure S8). Only a few observed summary statistics were found in the tails of distributions of summary statistics computed from the simulated data (5 out of 37 among which none had a tail-area probability lower than 0.001; Supplementary Table S6), thus indicating good reliability for this most supported model (Estoup et al. 2018). Historical parameter estimates under scenario 2 suggested that the founders of the Corsican ancestral population could have been introduced around 8000 years BP (T_i), before splitting (T_2) 606 years BP into Cinto and Bavella populations (Table 2, Supplementary Figure S9). ...

Application of approximate Bayesian computation to infer the genetic history of Pygmy hunter-gatherers populations from Western Central Africa

... ABC provides a statistical framework for using model simulations to approximate posterior distributions when likelihood functions are not available (Lintusaari et al., 2016). RFs are an ensemble estimation method from machine learning which have previously been combined with ABC to estimate parameter values (Raynal et al., 2018) and rank candidate models (Pudlo et al., 2015). ...

Reliable ABC model choice via random forests

Bioinformatics

... We created three such samples of 219 individuals to represent our three experimental populations and we allowed these three populations to evolve via drift with no migration for 3 generations. For each simulation, we assumed a recombination rate of 22 cM/Mb (Beye, et al. 2006; Solignac, et al. 2007) and a mutation rate of 3×10^-9 (Liu, et al. 2017). For each N_0 we simulated 100000 1Kbp windows and sampled 84 total chromosomes across the three populations, representing our experimental sampling scheme. ...

A third generation microsatellite linkage map of the honeybee, Apis mellifera, and its comparison with the genome sequence.
  • Citing Article
  • July 2007

... This little tsunami of articles was a leap forward in the quest for unravelling the mechanisms behind the evolutionary success of honeybees and other social insects. The genome annotation revealed several surprises; for example, a very low number of transposons, and over 600 genes missing in Drosophila but found in mammals, nematodes and even in yeast, including disease-related genes (Honey Bee Genome Consortium, 2006). Another aspect of HBGP that provided opportunities for many studies in honeybees was the discovery of two types of DNA methyltransferases (DNMTs), two paralogs of DNMT1 and one copy of DNMT3 suggesting that unlike D. melanogaster and C. elegans, A. mellifera can methylate its genome. ...

Insight into social insects from the genome of the honeybee Apis mellifera

Nature

... The lowered competitive ability of tadpoles from invasion front populations may also arise for reasons other than trade-offs with growth. Invasion front populations have a lowered genetic diversity in comparison with range core populations (Estoup et al., 2004), an effect likely due to spatial sorting and serial founder effects (Slatkin & Excoffier, 2012). Theory also predicts that deleterious alleles can become fixed on invasion fronts, causing an accumulation of deleterious effects during range expansion ("expansion load"). ...

Genetic analysis of complex demographic scenarios: Spatially expanding populations of the cane toad, Bufo marinus
  • Citing Article
  • September 2004

Evolution

... To explore the possible evolutionary scenarios of the C. pruni complex, an ABC (Approximate Bayesian Computation) analysis was performed on the data of the COI-t gene (i.e., 212 species A sequences and 332 species B sequences) using DIYABC 2.1 software (Cornuet et al. 2008; Cornuet et al. 2014). It was used to compare several competing hypotheses on demographic history, population divergence and expansion of C. pruni populations at the Western Palearctic scale. ...

DIYABCv2.0: a software to make Approximate Bayesian Computation inferences about population history using Single Nucleotide Polymorphism, DNA sequence and microsatellite data

Bioinformatics

... It is well-known that Algorithm A1 is inefficient since all draws are generated independently from the prior. More efficient sampling schemes can also be implemented by replacing the accept/reject step with more informative proposals (Marin et al., 2012; Sisson et al., 2007; Beaumont et al., 2009); see chapter 4 of Sisson et al. (2018) for a review. While these methods allow us to obtain more readily accessible accepted draws, each accepted draw is still from the posterior in (1). ...

Adaptivity for ABC algorithms: the ABC-PMC scheme
  • Citing Article
  • January 2011

... Having used cross-over detection and a genetic map for contig scaffolding, we could estimate the total genetic map for AMelMel, which was approximately 50 Morgans long, giving an average recombination rate in the genome of 23 cM/Mb, close to the first estimates based on RAPD and microsatellite genetic maps [36][37][38][39] and to the most recent estimates based on SNPs [11,40] (Table 1). However, although we used the same sequencing dataset as in Liu et al. [16], we found a drastic reduction in recombination rate between our genetic map and the one they initially published, which was 37 cM/Mb (Table 1). ...

A microsatellite-based linkage map of the honeybee