Renaud Vitalis’s research while affiliated with Université de Montpellier and other places


Publications (104)


Estimating hierarchical F–statistics from Pool–Seq data
  • Preprint

November 2024 · 16 Reads

Mathieu Gautier · Renaud Vitalis

Introduced over seventy years ago, F-statistics have been and remain central to population and evolutionary genetics. Among them, FST is one of the most commonly used descriptive statistics in empirical studies, notably to characterize the structure of genetic polymorphisms within and between populations, to shed light on the evolutionary history of populations, or to identify marker loci under differential selection for adaptive traits. However, the use of FST in simplified population models can overlook important hierarchical structures, such as geographic or temporal subdivisions, potentially leading to misleading interpretations and increasing false positives in genome scans for adaptive differentiation. Hierarchical F-statistics have been introduced to account for multiple pre-defined levels of population structure. Several estimators have also been proposed, including robust ones implemented in the popular R package hierfstat. Nevertheless, these were primarily designed for individual genotyping data and can be computationally intensive for large genomic datasets. In this study, we extend previous work by developing unbiased method-of-moments estimators for hierarchical F-statistics tailored for Pool-Seq data, a cost-effective alternative to individual genome sequencing. These Pool-Seq estimators have been developed in an ANOVA framework, using definitions based on identity-in-state probabilities. The new estimators have been implemented in an updated version of the R package poolfstat, together with estimators for sample allele count data derived from individual genotyping data. We validate and compare the performance of these estimators through extensive simulations under a hierarchical island model. Finally, we apply these estimators to real Pool-Seq data from Drosophila melanogaster populations, demonstrating their usefulness in revealing population structure and identifying loci with high differentiation within or between groups of subpopulations and associated with spatial or temporal genetic variation.


Footprints of worldwide adaptation in structured populations of D. melanogaster through the expanded DEST 2.0 genomic resource

November 2024 · 176 Reads · 1 Citation

Mathieu Gautier · [...] · Josefa Gonzalez

Large-scale genomic resources can place genetic variation into an ecologically informed context. To advance our understanding of the population genetics of the fruit fly Drosophila melanogaster, we present an expanded release of the community-generated population genomics resource Drosophila Evolution over Space and Time (DEST 2.0; https://dest.bio/). This release includes 530 high-quality pooled libraries from flies collected across six continents over more than a decade (2009-2021), most at multiple time points per year; 211 of these libraries are sequenced and shared here for the first time. We used this enhanced resource to elucidate several aspects of the species' demographic history and identify novel signs of adaptation across spatial and temporal dimensions. We showed that patterns of secondary contact, originally characterized in North America, are replicated in South America and Australia. We also found that the spatial genetic structure of populations is stable over time, but that drift due to seasonal contractions of population size causes populations to diverge over time. We identified signals of adaptation that vary between continents in genomic regions associated with xenobiotic resistance, consistent with independent adaptation to common pesticides. Moreover, by analyzing samples collected during spring and fall across Europe, we provide new evidence for seasonal adaptation related to loci associated with pathogen response. We have also released an updated version of the DEST genome browser, a useful tool for studying spatio-temporal patterns of genetic variation in this classic model system.


The invasion of the African continent by Pseudocercospora fijiensis. (a) First reports of black leaf streak disease in Africa, adapted from Blomme et al. (2013), and populations sampled. The countries are colored according to the date of first report of the disease, the darker the more recent report. (b) Historical data mention the introduction into Gabon of infected plant material brought from Asia, in 1978, leading to the main hypothesis of an invasion of Africa from Gabon. (c) A potential observation of the disease reported in Zambia in 1973 suggests that a more central introduction might have occurred before that into Gabon, but this observation was never officially validated. Gabon might therefore not be the introduction point. (d) The origin of Central and East African contaminations remains unknown, but a later independent introduction into a coastal area of East Africa in 1987 (Pemba Island, Tanzania) might have led to the spread of the disease along the East African coast and toward the interior.
Scenarios compared in ABC‐RF analysis. Scenario MI (Multiple Introductions) corresponds to the scenarios with two independent introductions into Africa from the same Asian origin while scenarios SI (Single Introduction) corresponds to the scenarios where Africa was invaded through a single introduction event from Asia. Scenario SII (Single introduction, Independent contaminations) represents the scenario where, after a single introduction from Asia, the African continent has been invaded following two independent contaminations. Scenarios SIS12 and SIS21 (Single introduction, Successive contaminations) correspond to the scenarios where after a single introduction into Africa, the continent was then colonized through successive contaminations. SEA is a sampled South East Asian population, sea an unsampled South East Asian population, afi the unsampled African populations, and AFi the sampled African populations. Thin lines reflect effective population size reductions (bottlenecks). Solid thick lines represent sampled populations and hollow lines represent unsampled populations. Black lines correspond to ancestral South East Asian population divergences and gray lines correspond to events occurring within Africa. We used a ‘d’ as a prefix to name parameters of length of time: d_mig/d_nomig, the length of time with/without migration between AF1 and AF2, d_Gsea/dGaf, the period of time of the established unsampled (Ghost) sea and af populations (i.e., for the af populations, after the reduction of population size that followed the introduction). Bottleneck durations are expressed using the ‘db’ prefix. Parameters with names starting with a ‘T' are the timing of events, the origin of populations (T_Gsea, T_Gaf and T_AF) and the end of the bottleneck period (Tb_Gaf and Tb_AF). For the definition and priors, see Appendix S1
Bayesian clustering of African multilocus microsatellite haplotypes, using structure for K = 2 to K = 5. Each individual is represented by a vertical line, divided into up to K‐colored segments representing the individual's estimated likelihood of membership of each of the K clusters. Vertical black lines separate individuals from the different populations of origin, as indicated by population abbreviations under the bar plot (detailed in Table 1)
Revisiting the historical scenario of a disease dissemination using genetic data and Approximate Bayesian Computation methodology: The case of Pseudocercospora fijiensis invasion in Africa

April 2023 · 120 Reads

The reconstruction of geographic and demographic scenarios of dissemination for invasive pathogens of crops is a key step toward improving the management of emerging infectious diseases. Nowadays, the reconstruction of biological invasions typically combines genetic and historical information to test different hypotheses of colonization. The Approximate Bayesian Computation framework and its recent Random Forest development (ABC-RF) have been successfully used in evolutionary biology to decipher multiple histories of biological invasions. Yet, for some organisms, typically plant pathogens, historical data may not be reliable, notably because of the difficulty of identifying the organism and the delay between its introduction and its first mention. We investigated the history of the invasion of Africa by the fungal pathogen of banana Pseudocercospora fijiensis, by testing the historical hypothesis against other plausible hypotheses. We analyzed the genetic structure of eight populations from six eastern and western African countries, using 20 microsatellite markers, and tested competing scenarios of population foundation using the ABC-RF methodology. We do find evidence for an invasion front consistent with the historical hypothesis, but also for the existence of another front never mentioned in historical records. We question the historical introduction point of the disease on the continent. Crucially, our results illustrate that even if ABC-RF inferences may sometimes fail to infer a single, well-supported scenario of invasion, they can be helpful in rejecting unlikely scenarios, which can prove very useful for shedding light on disease dissemination routes.


Figure 1. Correlations between the relative frequency of FBNSV genome segments and that of their respective mRNAs. The data used here are those from Trial A, Experiment 1 (see Materials and Methods). Each panel shows the correlation between the relative frequency of an FBNSV segment and the relative frequency of the corresponding mRNA. Data points, linear regressions, correlation coefficients, and P-values are shown in blue and red for FBNSV infecting faba bean and alfalfa, respectively. '***', '**', and '*' correspond to P-value ≤ 0.001, 0.01, and 0.05, respectively. The dotted line illustrates a slope of 1.
Figure 2. Radar plot of FBNSV genome and transcriptome formulas in Trial A (Experiment 1). The median relative frequencies of each FBNSV segment (left) or of their corresponding transcripts (right) are represented on one of the eight axes composing the radar plot (formulas calculated from the sixteen faba bean and twenty-eight alfalfa plants in Trial A). The FBNSV formulas observed in faba bean and alfalfa are represented in blue and red, respectively. SDs are represented by colored bars. The distances between the transcriptome formulas observed in faba bean and alfalfa are significantly smaller than those between the corresponding genome formulas (Table 1).
Gene copy number variations at the within-host population level modulate gene expression in a multipartite virus

June 2022 · 65 Reads · 18 Citations

Virus Evolution

Multipartite viruses have a segmented genome, with each segment encapsidated separately. In all multipartite virus species for which the question has been addressed, the distinct segments reproducibly accumulate at a specific and host-dependent relative frequency, defined as the 'genome formula'. Here, we test the hypothesis that the multipartite genome organization facilitates the regulation of gene expression via changes of the genome formula and thus via gene copy number variations. In a first experiment, the faba bean necrotic stunt virus (FBNSV), whose genome is composed of eight DNA segments each encoding a single gene, was inoculated into faba bean or alfalfa host plants, and the relative concentrations of the DNA segments and their corresponding messenger RNAs (mRNAs) were monitored. In each of the two host species, our analysis consistently showed that the genome formula variations modulate gene expression, the concentration of each genome segment linearly and positively correlating to that of its cognate mRNA but not of the others. In a second experiment, twenty parallel FBNSV lines were transferred from faba bean to alfalfa plants. Upon host switching, the transcription rate of some genome segments changes, but the genome formula is modified in a way that compensates for these changes and maintains a similar ratio between the various viral mRNAs. Interestingly, a deep-sequencing analysis of these twenty FBNSV lineages demonstrated that the host-related genome formula shift operates independently of DNA-segment sequence mutation. Together, our results indicate that nanoviruses are plastic genetic systems, able to transiently adjust gene expression at the population level in changing environments, by modulating the copy number but not the sequence of each of their genes.


Average flowering time per family for the two sampling years and the two vernalization treatments. Short vernalization is in gray and long vernalization in black. The large dots and the horizontal lines stand for the average flowering date for each vernalization treatment, for the years 1987 (dotted lines) or 2009 (dashed lines). Black crossing lines indicate that the reaction norms differ between families, as expected if genotype × environment interactions are significant
Selection gradients for flowering time. Established as the relationship between the genetic value for flowering time (family average, in degree.days) and the genetic value for relative fitness (family average of the relative number of seeds), for each sampling year and vernalization treatment. Lines stand for the linear regression
Analyses of the “realized fitness,” estimated as the absolute change in frequency of the MLGs through time. MLGs with residual heterozygosity were removed from this analysis. (a) Relationship with the average number of seeds produced by plants of a given MLG in the greenhouse. (b) Selection gradient for flowering time. Each point stands for the average flowering date for a given MLG. The black regression lines are estimated using all points (n = 48; a: slope = 5×10‐5 points of frequency per seed p = .094; b: slope = −0.0002 95% confidence interval: −0.0006; 0.0001 p = .179). This includes MLGs that were not observed in 1987 (black dots), for which the change in frequency is necessarily always positive. The dotted lines are the regression lines for the analysis restricted to the MLGs present in 1987 (white dots only; n = 12; a: slope = 0.0002 p = .024; b: slope = −0.0009 95% confidence interval: −0.0017; −0.0002 p = .038). Q–Q plots for the selection gradients are provided in Figure S3
Test of selection for increasing values of Ne. p‐Value, defined as the proportion of simulated datasets where the slope of the selection gradient is steeper than the observed slope, for the simulations of drift alone (a) considering all the homozygous MLGs (n = 48) or (b) considering only the MLGs that were already present in 1987 (n = 12). The dotted line indicates the 0.05 threshold value for significance. The vertical dashed line is the effective size estimated using the temporal FST and considering the 16 microsatellite loci as independent (Ne = 19; p = .182 with n = 48 (a); p = .047 with n = 12 (b))
Hypotheses for the expected selective pressure on flowering time under climate change. (a) Selective response expected under the hypothesis that the phenotypic optimum for flowering date remains the same. The selective response is expected in the opposite direction compared to the plastic response to increased temperatures. This corresponds to the countergradient hypothesis. (b) Selective response expected under the hypothesis that the phenotypic optimum for flowering date is displaced with climate change and that it becomes advantageous to flower earlier. The selective response is expected in the same direction as the plastic response to increased temperatures. This corresponds to the cogradient hypothesis. (c) Selective response expected under the hypothesis that flowering time is under directional selection
Evolution of flowering time in a selfing annual plant: Roles of adaptation and genetic drift

January 2022 · 106 Reads · 5 Citations

Resurrection studies are a useful tool to measure how phenotypic traits have changed in populations through time. If these trait modifications correlate with the environmental changes that occurred during the time period, it suggests that the phenotypic changes could be a response to selection. Selfing, through its reduction of effective size, could challenge the ability of a population to adapt to environmental changes. Here, we used a resurrection study to test for adaptation in a selfing population of Medicago truncatula, by comparing the genetic composition and flowering times across 22 generations. We found evidence for evolution toward earlier flowering times by about two days and a peculiar genetic structure, typical of highly selfing populations, where some multilocus genotypes (MLGs) are persistent through time. We used the change in frequency of the MLGs through time as a multilocus fitness measure and built a selection gradient that suggests evolution toward earlier flowering times. Yet, a simulation model revealed that the observed change in flowering time could be explained by drift alone, provided the effective size of the population is small enough (


Power and limits of selection genome scans on temporal data from a selfing population

November 2021 · 37 Reads · 5 Citations

Peer Community Journal

Tracking genetic changes of populations through time allows a more direct study of the evolutionary processes acting on the population than a single contemporary sample. Several statistical methods have been developed to characterize the demography and selection from temporal population genetic data. However, these methods are usually developed under the assumption of outcrossing reproduction and might not be applicable when there is substantial selfing in the population. Here, we focus on a method to detect loci under selection based on a genome scan of temporal differentiation, adapting it to the particularities of selfing populations. Selfing reduces the effective recombination rate and can extend hitch-hiking effects to the whole genome, erasing any local signal of selection on a genome scan. Therefore, selfing is expected to reduce the power of the test. By means of simulations, we evaluate the performance of the method under scenarios of adaptation from new mutations or standing variation at different rates of selfing. We find that the detection of loci under selection in predominantly selfing populations remains challenging even with the adapted method. Still, selective sweeps from standing variation on predominantly selfing populations can leave some signal of selection around the selected site thanks to historical recombination before the sweep. Under this scenario, ancestral advantageous alleles at low frequency leave the strongest local signal, while new advantageous mutations leave no local footprint of the sweep.


Figure 6: Estimated f 3 statistics with their 95% confidence intervals for the allele count and 30X Pool-Seq data sets
Figure 10: Admixture graphs resulting from the positioning of BR-Pal onto the scaffold tree of native and Hawaiian populations (Figure 7a) with BIC less than 6 units higher than the BIC with the best fitting graph (within red box and represented in Figure 9B). For each population, the graph (as obtained with the function add.leaf) is displayed together with i) the worst fitted f-statistics and its associated Z-score; and ii) the difference of BIC of the graph with the graphs displaying the best fitting graph (∆BIC) as a measure of support. For all the graphs, the fitted edge lengths are in drift units (x1,000) since drift.scaling argument was set to TRUE.
Figure 12: Admixture graphs resulting from the positioning of US-Col, US-Nca, US-Sdi, US-Wat and US-Wis onto the am.scaf scaffold graph that relates the scaffold tree of native populations and BR-Pal and US-Sok (red frame). The best fitting graphs (as obtained with the function add.leaf) are displayed together with i) the worst fitted f-statistics and their associated Z-score; and ii) the difference of their BIC with respect to the graphs displaying the second lowest BIC (∆ BIC ) as a measure of support. The target populations are highlighted in yellow. For all the graphs, the fitted edge lengths are in drift units (x1,000) since drift.scaling argument was set to TRUE.
f ‐Statistics estimation and admixture graph construction with Pool‐Seq or allele count data using the R package poolfstat

November 2021 · 288 Reads · 62 Citations

Molecular Ecology Resources

By capturing various patterns of the structuring of genetic variation across populations, f‐statistics have proved highly effective for the inference of demographic history. Such statistics are defined as covariance of SNP allele frequency differences among sets of populations without requiring haplotype information and are hence particularly relevant for the analysis of pooled sequencing (Pool‐Seq) data. We here propose a reinterpretation of the F (and D) parameters in terms of probability of gene identity and derive from this unified definition unbiased estimators for both Pool‐Seq data and standard allele count data obtained from individual genotypes. We implemented these estimators in a new version of the R package poolfstat, which now includes a wide range of inference methods: (i) three‐population test of admixture; (ii) four‐population test of treeness; (iii) F4‐ratio estimation of admixture rates; and (iv) fitting, visualization and (semi‐automatic) construction of admixture graphs. A comprehensive evaluation of the methods implemented in poolfstat on both simulated Pool‐Seq (with various sequencing coverages and error rates) and allele count data confirmed the accuracy of these approaches, even for the most cost‐effective Pool‐Seq design involving relatively low sequencing coverages. We further analyzed a real Pool‐Seq dataset of 14 populations of the invasive species Drosophila suzukii, which allowed refining both the demographic history of native populations and the invasion routes followed by this emblematic pest. Our new package poolfstat provides the community with a user‐friendly and efficient all‐in‐one tool to unravel complex population genetic histories from large‐size Pool‐Seq or allele count SNP data.
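As a rough illustration of the covariance definition given in the abstract above, the following Python sketch computes naive plug-in f2, f3 and f4 values from per-SNP population allele frequencies. It is not the poolfstat implementation: the function names and the toy frequencies are hypothetical, and these plain averages ignore the sampling-bias corrections that the unbiased estimators described in the paper are designed to provide.

```python
from statistics import mean

# Naive plug-in f-statistics computed from per-SNP allele frequencies.
# Each argument is a list of frequencies (one value per SNP) for one population.
# Assumption: frequencies are treated as known; no finite-sample correction.

def f2(p_a, p_b):
    """Average squared allele frequency difference between A and B."""
    return mean((a - b) ** 2 for a, b in zip(p_a, p_b))

def f3(p_c, p_a, p_b):
    """Average of (c - a) * (c - b); negative values suggest C is admixed."""
    return mean((c - a) * (c - b) for c, a, b in zip(p_c, p_a, p_b))

def f4(p_a, p_b, p_c, p_d):
    """Average of (a - b) * (c - d); near zero if (A,B) and (C,D) form a tree."""
    return mean((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))

# Hypothetical toy data: population C is an equal mixture of A and B.
p_a = [0.1, 0.5, 0.9, 0.3]
p_b = [0.7, 0.2, 0.4, 0.6]
p_c = [(a + b) / 2 for a, b in zip(p_a, p_b)]

print("f2(A,B)   =", f2(p_a, p_b))
print("f3(C;A,B) =", f3(p_c, p_a, p_b))  # negative for this mixed population
```

With an exact 50/50 mixture, f3(C;A,B) equals minus one quarter of f2(A,B), which is why a significantly negative f3 is read as evidence of admixture in the three-population test.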


Figure 3. Comparison of poolfstat and AdmixTools estimates across 250 simulated allele count datasets (AC m≥1%). A) All estimates of the 15 basis f-statistics taking P1 as the reference population and corresponding to 5 f2 of the form (P1,Px) and the 10 f3 of the form (P1;Px,Py) (with x = 2, .., 6; y = 3, .., 6 and y > x). B) All block-jackknife estimates of the covariance matrix Q of the 15 basis f-statistics (15 error variances and 105 error covariances). C) All estimates of the 60 f3 (scaled f3) and their associated Z-scores (D). E) All estimates of the 45 D-statistics (scaled f4) and their associated Z-scores (F). For each comparison, the Mean Absolute Difference (MAD) between the parameter estimates of the two programs is given in the upper left corner of the plots. In A), C) and E), poolfstat estimates correspond to block-jackknife means (i.e., they only include SNPs eligible for block-jackknife). The given MAD' value is the MAD between AdmixTools and poolfstat estimates that include all SNPs (see documentation for the compute.fstats function). In D), a consistency score β is also given and was computed as the proportion of Z-scores < −1.65 (i.e., significant three-population test of admixture at a 5% threshold) with both programs among the n = 216 ones significant in at least one of the two programs. Similarly, in F), the given consistency score β is computed as the proportion of absolute Z-scores < 1.96 (i.e., passing the four-population treeness test at a 5% threshold) with both programs among the n = 1,912 ones with an absolute Z-score < 1.96 in at least one of the two programs.
Figure 4. Distribution of the estimated drift-scaled lengths for all the branches in Figure 1 simulated scenario using admixture graph fitting (as implemented in the fit.graph function of poolfstat) for different types of data with a 5% threshold on the overall SNP MAF. Each box plot summarizes the distribution of the 250 estimated lengths of each of the ten branches obtained from the analysis of either allele count dataset ("Counts") or one of the five different simulated Pool-Seq read count datasets ("PSλX") with different mean coverages (λ = 30; 50; 75; 100; and 200) as generated from the genotyping data simulated under the scenario depicted in Figure 1. Pool-Seq read count data were generated with no sequencing errors ( = 0) in A) and D) and with a sequencing error rate of = 1 and = 2.5 in panel B) and C) respectively (Table S1). In D), the read count data were analyzed as allele counts, which corresponds to a bad practice. Note that the two branches coming from the root are combined since the position of the root is not identifiable by the model (i.e., τ P8↔P9 = τ P8↔R + τ P9↔R ). Note that the box plots obtained from the analysis of count data are replicated in each panel for comparison purposes. For each branch, a red dotted line indicates the underlying simulated value. For Pool-Seq data, the overall MAF was estimated from read counts.
f -statistics estimation and admixture graph construction with Pool-Seq or allele count data using the R package poolfstat

May 2021 · 342 Reads · 5 Citations

By capturing various patterns of the structuring of genetic variation across populations, f-statistics have proved highly effective for the inference of demographic history. Such statistics are defined as covariance of SNP allele frequency differences among sets of populations without requiring haplotype information and are hence particularly relevant for the analysis of pooled sequencing (Pool-Seq) data. We here propose a reinterpretation of the F (and D) parameters in terms of probability of gene identity and derive from this unified definition unbiased estimators for both Pool-Seq and standard allele count data obtained from individual genotypes. We implemented these estimators in a new version of the R package poolfstat, which now includes a wide range of inference methods: (i) three-population test of admixture; (ii) four-population test of treeness; (iii) F4-ratio estimation of admixture rates; and (iv) fitting, visualization and (semi-automatic) construction of admixture graphs. A comprehensive evaluation of the methods implemented in poolfstat on both simulated Pool-Seq (with various sequencing coverages and error rates) and allele count data confirmed the accuracy of these approaches, even for the most cost-effective Pool-Seq design involving low sequencing coverages. We further analyzed a real Pool-Seq dataset of 14 populations of the invasive species Drosophila suzukii, which allowed refining both the demographic history of native populations and the invasion routes followed by this emblematic pest. Our new package poolfstat provides the community with a user-friendly and efficient all-in-one tool to unravel complex population genetic histories from large-size Pool-Seq or allele count SNP data.


Extending Approximate Bayesian Computation with Supervised Machine Learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest

May 2021 · 310 Reads · 123 Citations

Molecular Ecology Resources

Simulation‐based methods such as Approximate Bayesian Computation (ABC) are well‐adapted to the analysis of complex scenarios of populations and species genetic history. In this context, supervised machine learning (SML) methods provide attractive statistical solutions to conduct efficient inferences about scenario choice and parameter estimation. The Random Forest methodology (RF) is a powerful ensemble of SML algorithms used for classification or regression problems. RF allows conducting inferences at a low computational cost, without preliminary selection of the relevant components of the ABC summary statistics, and bypassing the derivation of ABC tolerance levels. We have implemented a set of RF algorithms to process inferences using simulated datasets generated from an extended version of the population genetic simulator implemented in DIYABC v2.1.0. The resulting computer package, named DIYABC Random Forest v1.0, integrates two functionalities into a user‐friendly interface: the simulation under custom evolutionary scenarios of different types of molecular data (microsatellites, DNA sequences or SNPs) and RF treatments including statistical tools to evaluate the power and accuracy of inferences. We illustrate the functionalities of DIYABC Random Forest v1.0 for both scenario choice and parameter estimation through the analysis of pseudo‐observed and real datasets corresponding to pool‐sequencing and individual‐sequencing SNP datasets. Because of the properties inherent to the implemented RF methods and the large feature vector (including various summary statistics and their linear combinations) available for SNP data, DIYABC Random Forest v1.0 can efficiently contribute to the analysis of large SNP datasets to make inferences about complex population genetic histories.


Evolution of flowering time in a selfing annual plant: Roles of adaptation and genetic drift

August 2020 · 126 Reads · 1 Citation

Resurrection studies are a useful tool to measure how phenotypic traits have changed in populations, and they allow testing whether these trait modifications are a response to selection caused by an environmental change. Selfing, through its reduction of effective size, could challenge the ability of a population to adapt to environmental changes. Here, we used a resurrection study to test for adaptation in a selfing population of Medicago truncatula, by comparing the genetic composition and flowering across 22 generations. We found evidence for evolution towards earlier flowering times by about two days and a peculiar genetic structure, typical of highly selfing populations, where some multilocus genotypes (MLGs) are persistent through time. We used the change in frequency of the MLGs through time as a multilocus fitness measure and built a selection gradient that suggests evolution towards earlier flowering times. Yet, a simulation model revealed that the observed change in flowering time could be explained by drift alone, provided the effective size of the population is small enough (<150). These analyses suffer from the difficulty of estimating the effective size in a highly selfing population, where effective recombination is severely reduced.
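The drift-only reasoning in the abstract above can be sketched with a toy Wright-Fisher simulation in Python. This is a simplified illustration under stated assumptions, not the authors' simulation model: each MLG is treated as a clonal lineage (complete selfing), each generation is a multinomial resampling of Ne individuals, and the test statistic is the shift in mean flowering date rather than the slope of a selection gradient. All names, frequencies and trait values are hypothetical.

```python
import random

def simulate_drift(freqs, ne, generations, rng):
    """Resample MLG frequencies for a number of generations with no selection."""
    current = list(freqs)
    labels = range(len(current))
    for _ in range(generations):
        # Draw ne offspring; each picks its parent MLG in proportion to frequency.
        draws = rng.choices(labels, weights=current, k=ne)
        counts = [0] * len(current)
        for d in draws:
            counts[d] += 1
        current = [c / ne for c in counts]
    return current

def mean_trait(freqs, trait_values):
    """Population mean of a trait, weighting each MLG by its frequency."""
    return sum(f * t for f, t in zip(freqs, trait_values))

def drift_p_value(freqs, trait_values, observed_shift, ne, generations,
                  replicates, seed=0):
    """Share of drift-only replicates whose mean shift is at least as large
    (in absolute value) as the observed shift."""
    rng = random.Random(seed)
    start = mean_trait(freqs, trait_values)
    hits = 0
    for _ in range(replicates):
        end = mean_trait(simulate_drift(freqs, ne, generations, rng), trait_values)
        if abs(end - start) >= abs(observed_shift):
            hits += 1
    return hits / replicates

# Hypothetical starting MLG frequencies and flowering dates (in days).
start_freqs = [0.4, 0.3, 0.2, 0.1]
flowering_days = [60.0, 62.0, 65.0, 58.0]

for ne in (20, 150, 1000):
    p = drift_p_value(start_freqs, flowering_days, observed_shift=2.0,
                      ne=ne, generations=22, replicates=500)
    print(f"Ne = {ne}: proportion of drift-only runs with shift >= 2 days = {p}")
```

Running the loop over several values of Ne mirrors the logic of asking how small the effective size must be for drift alone to produce a shift of the observed magnitude.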


Citations (67)


... Using these SNPs as diagnostic markers, we will subsequently estimate inversion frequencies in pooled resequencing [Pool-Seq; 62] data, where individuals with uncertain inversion status are pooled prior to DNA sequencing. In particular, we will utilize the DEST v.2.0 dataset [63,64], which is a collection of pooled whole-genome sequencing data from more than 700 D. melanogaster population samples, densely collected world-wide through time and space. Using the inversion-specific marker SNPs, we will estimate the inversion frequencies of our two focal inversions in the Pool-Seq data of each population sample and test how inversions influence genome-wide linkage disequilibrium and population structure. ...

Reference:

The influence of chromosomal inversions on genetic variation and clinal patterns in genomic data of Drosophila melanogaster
Footprints of worldwide adaptation in structured populations of D. melanogaster through the expanded DEST 2.0 genomic resource

... In general, gene copy number variation strongly impacts the gene expression and phenotype of the organisms (20), thereby adjusting their gene copy number (CN) to respond to new conditions (21-26). However, while gene expression modulation based on relative genomic element frequencies has been demonstrated in viruses such as the faba bean necrotic stunt virus (FBNSV), a multipartite DNA virus with eight genomic segments (27), to date, achieving specific SGFs in a specific infectious context has not been associated with clear benefits. ...

Gene copy number variations at the within-host population level modulate gene expression in a multipartite virus

Virus Evolution

... Similar heritability values were also reported in many other legume species, including, inter alia, Glycine max 57,58, Medicago truncatula 59,60, Vigna unguiculata 61 and Cicer arietinum 62. It should be noted that the occurrence of vernalization decreases heritability values for flowering time and this reduction is proportional to the length of vernalization 59. Therefore, heritability values calculated for field observations with some effective vernalization days are usually lower than those obtained for a controlled environment without any vernalization. ...

Evolution of flowering time in a selfing annual plant: Roles of adaptation and genetic drift

... It is important to note that the pooled sequencing approach generally weakens the power of selection tests compared to cases where haplotype-level information is included (Kessner et al. 2013). Regardless of whether tests are applied to single SNPs or haplotypes, the search for sites under selection should be compromised whenever genetic draft in highly selfing populations causes even the largest possible changes in allele frequency to be mundane (Navascues et al. 2021). ...

Power and limits of selection genome scans on temporal data from a selfing population
  • Citing Article
  • November 2021

Peer Community Journal

... frequency estimates. The theoretical and analytical frameworks existing in Pool-seq, a population genetic tool which involves sequencing a pooled mixture of DNA from multiple individuals together in a cost-effective way to infer genetic metrics without needing individual genotyping, can be adapted towards eDNA-based population studies in the future (Czech et al., 2024; Gautier et al., 2013, 2022). ...

f-statistics estimation and admixture graph construction with Pool-Seq or allele count data using the R package poolfstat

Molecular Ecology Resources

... only a few programs can handle allele count data, for example, Popoolation (Kofler et al., 2011) and CRISP (Bansal, 2010) for SNP calling, Plink (Chang et al., 2015; Purcell et al., 2007) and the R package poolfstat (Gautier et al., 2021; Hivert et al., 2018) for population genetics or GEMMA (Zhou and Stephens, 2012) and LDAK (Speed et al., 2020) for association studies. However, when considering eusocial insects from the same colony as a pool we might break underlying assumptions made by these models. In ...

f-statistics estimation and admixture graph construction with Pool-Seq or allele count data using the R package poolfstat

... We evaluated the support for different possible scenarios of population divergence and admixture using coalescence-based Approximate Bayesian Computation (ABC) and supervised machine learning methods implemented in DIYABC-RF (Random Forest) v.1.1.1 [83]. To reduce the computing load, we kept SNPs with no missing among individuals (--max-missing = 1) using vcftools v0.1.13 ...

Extending Approximate Bayesian Computation with Supervised Machine Learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest

Molecular Ecology Resources

... Linkage disequilibrium (LD) was calculated with vcftools and bifurcation plots with the R package "rehh" [76]. We sampled 95 unrelated individuals for five representative populations of the 1000 Genomes dataset and calculated bifurcation plots for the haplotypes carrying the derived C allele of rs4751440. ...

rehh 2.0: a reimplementation of the R package rehh to detect positive selection from haplotype structure
  • Citing Preprint
  • August 2016

... We used DIYABC Random Forest v. 1.0 [60], which uses Approximate Bayesian Computation to evaluate different evolutionary scenarios, to infer colonization pathways. For all scenarios, training sets were generated using 2000 simulations per model. ...

Extending Approximate Bayesian Computation with Supervised Machine Learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest

... Thus, because of drift, a population with a small effective size can lose an allele even though it is favored by natural selection. [...] purging by selection is weak (Whitlock, 2000; Olivieri et al., 2010). Individual fitness will therefore decrease with genetic drift, which tends to increase the probability of fixation of deleterious mutations within the population (Whitlock, 2000). ...

Génétique et évolution des populations et des métapopulations
  • Citing Book
  • January 2016