Article

# Inference of Population Structure Using Multilocus Genotype Data


## Abstract

We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci—e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.
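The probabilistic assignment described in the abstract can be illustrated with a minimal sketch. It assumes the allele frequencies of each population are already known, that each population is in Hardy-Weinberg equilibrium, that loci are independent, and that there is no admixture. The method itself treats the frequencies as unknown and infers them jointly with the assignments, so this is only an illustration of the assignment step. The function name, the input layout, and the `eps` floor are hypothetical.

```python
import numpy as np

def assignment_posterior(genotype, freqs, prior=None, eps=1e-12):
    """Posterior Pr(population | genotype) for one diploid individual.

    genotype: list of (a, b) allele-index pairs, one pair per locus.
    freqs:    freqs[k][l] is a sequence of allele frequencies for
              population k at locus l (assumed known; hypothetical input).
    eps:      floor on frequencies to avoid log(0) (an assumption of this sketch).
    """
    K = len(freqs)
    prior = np.full(K, 1.0 / K) if prior is None else np.asarray(prior, dtype=float)
    log_lik = np.zeros(K)
    for k in range(K):
        for l, (a, b) in enumerate(genotype):
            p = np.clip(np.asarray(freqs[k][l], dtype=float), eps, None)
            # Hardy-Weinberg within a population, loci treated as independent.
            # The factor 2 for heterozygotes is identical for every k and cancels.
            log_lik[k] += np.log(p[a]) + np.log(p[b])
    log_post = np.log(prior) + log_lik
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Two populations, two biallelic loci (made-up frequencies).
freqs = [[[0.9, 0.1], [0.8, 0.2]],
         [[0.1, 0.9], [0.2, 0.8]]]
post = assignment_posterior([(0, 0), (0, 1)], freqs)
print(post)  # first entry is about 0.988
```

With more loci the per-locus log-likelihoods add up, which is why a modest number of informative markers can already give sharp assignments.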


... A Factorial Correspondence Analysis (FCA) was conducted in GENETIX v. 4.0 (Belkhir et al., 1996) for each dataset to identify the most distinctive samples, which could be wrongly genotyped. STRUCTURE v. 2.3.4 (Pritchard, Stephens and Donnelly, 2000) was used to evaluate the possible differences between the three datasets in genetic clustering of individuals caused by the inclusion of genotypes of different reliability levels. A total of five independent simulations with 50,000 Markov Chain ...
... The results were processed in Structure Harvester v. 0.6.94 (Earl and vonHoldt, 2012) and the most probable K was estimated using the Evanno method (Evanno, Regnaut and Goudet, 2005) and the posterior probability of K (Pritchard, Stephens and Donnelly, 2000). ...
... Population substructure was estimated by a FCA carried out in Genetix (Belkhir et al., 1996) and using STRUCTURE v. 2.3.4 (Pritchard, Stephens and Donnelly, 2000). ...
... Genetic gradients and irregularly collected samples may lead to wrong inferences on hierarchical structure and biased estimates of the true number of subpopulations (Schwartz and McKelvey 2009; Puechmaille 2016), and the number of individuals included in each population also affects the assignment (Vähä and Primmer 2006; Duminil et al. 2006). An important aspect of STRUCTURE is the capability of using reference samples, which are defined as individuals of known origin that are used to classify additional individuals of unknown origin (Pritchard et al. 2000). These learning samples ensure that clusters correspond to predefined groups and can improve the accuracy of the inference. ...
... A dataset of reference samples from the results produced by STRUCTURE software (Pritchard et al. 2000) to classify 29 European chestnut populations into four genetic groups (Fernández-Cruz and Fernández-López 2016) was used to classify sweet chestnut varieties. These reference samples include 96 genotypes from the Atlantic genetic cluster, 151 genotypes from the Cantabrian genetic cluster, 123 genotypes from the Western Mediterranean genetic cluster and 65 from the Eastern Mediterranean genetic cluster. ...
... Varieties were first assigned to the four predefined genetic clusters using STRUCTURE 2.3.4 (Pritchard et al. 2000). In this program, when using reference samples, including a few individuals without a predefined population is highly recommended so that alpha is set to a sensible value. ...
Article
Sweet chestnut is a valuable species, highly managed for centuries for nut and wood production, whose genetic structure was affected by translocations. In this study, we selected a total of 51 genetically different clonal varieties from Galicia (NW of the Iberian Peninsula), Central Iberian Peninsula, France and Italy that were genotyped at 9 microsatellites. Almost all Galician varieties include at least two accessions with the same genotype. Several datasets of reference samples, from 29 natural or naturalized populations, were used to classify them into several groups. Genetic distances among varieties reflected their cultivation areas. Almost all Galician varieties cultivated in orchards were grouped in a single cluster, except for ‘Famosa’, ‘Longal’, ‘Garrida’ and ‘Presa’, which were assigned to the Central Iberian group, and ‘Luguesa’ and ‘Carrelao’, which were assigned to the French-Italian varieties. The Bayesian analysis with reference samples identified a group of varieties that could be autochthonous in Galicia because they were assigned to the Atlantic or the Cantabrian cluster. Other varieties from the Galician inner mountains that belong to the Mediterranean cluster may have been translocated, because this gene pool was found previously in several populations in the Iberian and Italian Peninsulas. Additionally, a large number of hybrid varieties between the Western Mediterranean cluster and the Atlantic or the Cantabrian cluster were found. Further analysis indicated that these Mediterranean varieties could have originated in Mercurín, in the Central Iberian Peninsula or in the Italian Peninsula, and that ‘Luguesa’ and ‘Puga de Afora’ may have been translocated from France or Italy. The results of this work provide valuable information for a more efficient use of sweet chestnut genetic resources.
... The genetic structure (Q) prediction was analyzed with the program STRUCTURE version 2.2 (Pritchard et al., 2000) to identify clusters of genetically similar lines. The process of STRUCTURE organizes the diversity panel into n clusters to form a structure matrix (Q-matrix). ...
... The marker-trait association was measured by incorporating Q + K matrices using TASSEL version v2.0.1 [36]. The association analysis was performed with 50,000 permutations for the correction of multiple testing [37]. Markers with a P-value ≤ 0.05 were regarded as significant. ...
Article
Background: Rice (Oryza sativa L.) is one of the staple foods worldwide. To feed the growing population, the improvement of rice cultivars is important. To make improvements in a rice breeding program, it is imperative to understand the similarities and differences among existing rice accessions and so assess their genetic diversity. Previous studies demonstrated the existence of abundant elite genes in rice landraces. A genome-wide association study (GWAS) was performed for yield and yield-related traits to characterize the genetic diversity. Design: Experimental study. Methods and results: A total of 204 SSR markers were used, with 17 SSRs located on each chromosome of the rice genome. Diversity was analyzed using several genetic parameters, i.e., the total number of alleles (TNA), polymorphic information content (PIC), and gene diversity, computed with PowerMarker; the values per marker ranged from 2 to 9, 0.332 to 0.887, and 0.423 to 0.900, respectively, across the whole genome. The population structure analysis identified four main groups. Marker-trait association (MTA) analysis identified several markers associated with many agronomically important traits. These results will be very useful for the selection of potential parents, recombinants, and MTAs that govern the improvement and development of new high-yielding rice varieties. Conclusions: Analysis of diversity in germplasm is important for the improvement of cultivars in a breeding program. In the present study, diversity was analyzed with different methods, and enormous diversity was found in the studied rice germplasm. The structure analysis found 4 genetic groups in the existing germplasm. A total of 129 marker-trait associations (MTAs) were found in this study.
... To gain further insight into the structure of VACV populations, we used the program STRUCTURE, which has been widely applied to infer ancestry and admixture patterns [21][22][23]. STRUCTURE relies on a Bayesian statistical model for clustering genotypes into populations without prior information on their genetic relatedness. ...
... The standardized index of association ($I_A^S$) generated by LIAN was used for the interpretation of LD, with zero meaning linkage equilibrium. For the VACV dataset, we obtained an $I_A^S$ of 0.0379, therefore allowing the application of population structure analyses as implemented in the STRUCTURE (v.2.3.4) suite [21]. To run STRUCTURE, we first estimated the allele frequency spectrum parameter (λ) by running the program with K = 1, as suggested [22]. ...
Article
Vaccinia virus (VACV) was used for smallpox eradication, but its ultimate origin remains unknown. The genetic relationships among vaccine stocks are also poorly understood. We analyzed 63 vaccine strains of different origins, as well as horsepox virus (HPXV). Results indicated the genetic diversity of VACV is intermediate between variola and cowpox viruses, and that mutation contributed more than recombination to VACV evolution. STRUCTURE identified 9 contributing subpopulations and showed that the lowest drift was experienced by the ancestry components of Tian Tan and HPXV/Mütter/Mulford genomes. Subpopulations that experienced very strong drift include those that contributed the ancestry of MVA and IHD-W, in good agreement with the very long passage history of these vaccines. Another highly drifted population contributed the full ancestry of viruses sampled from human/cattle infections in Brazil and, partially, to IOC clones, strongly suggesting that the recurrent infections in Brazil derive from the spillback of IOC to the feral state.
... The populations were tested using the software STRUCTURE v.2.3.4 (Pritchard et al., 2000). ...
... In addition, they establish a link between the number of clusters detectable by the STRUCTURE algorithm and the number of significant PCA axes. I-7-3-3. Methods with an explicit model. Another approach is to assume that the genetic data can be explained by a probabilistic model whose parameters are unknown. One of the first explicit model-based methods for detecting the genetic structure of populations was proposed by Pritchard et al. (2000). It is implemented in the first version of the STRUCTURE software. ...
Thesis
In West Africa, Onchocerca volvulus, the causative agent of onchocerciasis in humans, is transmitted by species of the Simulium damnosum s.l. complex. However, intraspecific genetic variability and its consequences for vectorial capacity have been little studied in Côte d’Ivoire. This study describes the use of microsatellite markers to assess gene flow among populations from three different epidemiological settings in western Côte d’Ivoire and to study the genetic structure of S. damnosum s.l. populations in light of climatic and environmental change. The towns of Soubré in the southwest, Bouaflé in the centre-west and Touba in the northwest of Côte d’Ivoire were the blackfly collection areas for the genetic studies. Flies were captured on human bait from 7:00 to 18:00 on three consecutive days from December 2016 to October 2017. Four (04) microsatellite markers were used to characterize the individuals of the three populations. All four loci proved polymorphic, with an average of 9 alleles per locus. Two specific alleles, 190 bp and 290 bp, were abundant, with frequencies of 0.46% and 0.58%, respectively. Genetic analyses revealed a significant deviation from Hardy-Weinberg proportions, a significant heterozygote deficit and weak genetic differentiation (FST = 0.046, P = 0.024) in all populations. High interspecific variability appears to be a general characteristic of S. damnosum s.l. However, hierarchical clustering based on allele similarity showed three genetic groups, each made up of individuals from all three populations. Consequently, these results suggest that no barrier exists between blackfly populations at these three sites. This study showed that the loci give independent estimates of the genetic parameters. The H3-4 locus contributes to genetic differentiation among populations.
... The genetic structure within northern dragonhead was assessed using ParallelStructure v. 2.3.4 (Besnier & Glover, 2013;Pritchard et al., 2000), an R-based implementation of the common STRUCTURE algorithm, on XSEDE at CIPRES Science Gateway v. 3.1 (Miller et al., 2011). For the NOR dataset, we tested K from 1 to 40, with 20 replicates for each value of K. ...
... Their Mantel test revealed a positive correlation between genetic distance and geographical distance (R = 0.56, p = .001). The analysis software STRUCTURE, which we have used herein, assumes that markers are not linked and that populations are panmictic (Pritchard et al., 2000). Hence, our STRUCTURE results should be interpreted with caution, as IBD violates the assumption of freely distributed genotypes. ...
Article
We have studied the spatiotemporal genetic change in the northern dragonhead, a plant species that has experienced a drastic population decline and habitat loss in Europe. We have added a temporal perspective to the monitoring of northern dragonhead in Norway by genotyping herbarium specimens up to 200 years old. We have also assessed whether northern dragonhead has achieved its potential distribution in Norway. To obtain the genotype data from 130 herbarium specimens collected from 1820 to 2008, mainly from Norway (83) but also beyond (47), we applied a microfluidic array consisting of 96 SNP markers. To assess temporal genetic change, we compared our new genotype data with existing data from modern samples. We used sample metadata and observational records to model the species' environmental niche and potential distribution in Norway. Our results show that the SNP array successfully genotyped all included herbarium specimens. Hence, with the appropriate design procedures, the SNP array technology appears highly promising for genotyping old herbarium specimens. The captured genetic diversity correlates negatively with distance from Norway. The historical‐modern comparisons reveal similar genetic structure and diversity across space and limited genetic change through time in Norway, providing no signs of any regional bottleneck (i.e., spatiotemporal stasis). The regional areas in Norway have remained genetically divergent, however, both from each other and more so from populations outside of Norway, rendering continued protection of the species in Norway relevant. The ENM results suggest that northern dragonhead has not fully achieved its potential distribution in Norway and corroborate that the species is anchored in warmer and drier habitats. Our study provides new insight to guide conservation priorities for the charismatic flowering plant called northern dragonhead (Dracocephalum ruyschiana). It also showcases a fruitful integration of methods and data sources that jointly enables a holistic species assessment covering space and time.
... Two indices (Fsc, representing the proportion of variation due to differences among populations within a group, and Fct, representing the proportion of variation due to differences between groups of populations) were used to test whether the grouping criterion is significant or not at the 5% level (resampling with 10′000 permutations). A Bayesian approach was used to infer K, the optimal number of groups in the datasets (Pritchard et al. 2000). This method uses multi-locus genotypes and allows finding the best number of populations in a dataset, assuming that each inferred population follows panmixia. ...
... For microsatellite data, the Structure software version 2.3.4 (Pritchard et al. 2000) was used to compute the admixture rates per individual according to the K clusters, and the Structure Harvester software (Earl and vonHoldt 2012) was used to determine the most likely K using Evanno's DeltaK method (Evanno et al. 2005). Each run, from K = 2 to K = 10, was repeated 20 times and performed with 100′000 burn-in iterations and 100′000 subsequent Markov chain Monte Carlo iterations. ...
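The DeltaK calculation mentioned in this excerpt can be sketched as follows. This is a minimal illustration, assuming the statistic is computed as the absolute second difference of the mean ln P(D|K) across replicate runs divided by the standard deviation of ln P(D|K) at that K. The function name and the input layout are hypothetical, and the numbers in the usage example are made up.

```python
import numpy as np

def evanno_delta_k(lnp_by_k):
    """Delta K for each interior K.

    lnp_by_k: dict mapping K -> list of ln P(D|K) values from replicate runs
              (hypothetical input layout).
    Returns a dict K -> Delta K for every K that has both neighbours K-1 and
    K+1 and a non-zero standard deviation across replicates.
    """
    ks = sorted(lnp_by_k)
    mean = {k: float(np.mean(lnp_by_k[k])) for k in ks}
    sd = {k: float(np.std(lnp_by_k[k], ddof=1)) for k in ks}
    delta = {}
    for k in ks:
        if (k - 1) in mean and (k + 1) in mean and sd[k] > 0:
            # |L(K+1) - 2 L(K) + L(K-1)| on the replicate means
            second_diff = abs(mean[k + 1] - 2.0 * mean[k] + mean[k - 1])
            delta[k] = second_diff / sd[k]
    return delta

# Made-up log-likelihoods: two replicates per K.
runs = {1: [-1000.0, -1002.0], 2: [-800.0, -802.0],
        3: [-790.0, -792.0], 4: [-785.0, -787.0]}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

Because the statistic needs both neighbours, it is undefined for the smallest and largest K tested, so it cannot by itself indicate K = 1.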
Article
Trachycarpus fortunei (Arecaceae: Coryphoideae) is an Asian palm that was introduced during the nineteenth century in southern Switzerland and northern Italy as an ornamental plant. In recent decades, the palm has become an aggressive invasive species in the region. Before this study, the genetic structure and diversity of the naturalised populations were unknown. We aimed at understanding the dynamics of invasion and at comparing the results obtained with two types of markers. This genetic approach aimed at tracing back as far as possible the source of invasive populations by comparing historical information found in the literature with invasive genetic patterns. The genetic diversity was analysed using eight microsatellites (five were developed for that purpose) and 31′000 SNPs identified through GBS analyses. Genetic analyses were carried out for 200 naturalised individuals sampled from 21 populations in the Canton Ticino (Switzerland) and the provinces of Lombardy and Piedmont (Italy). The observed general panmixia indicates that the expansion of T. fortunei is active in its naturalised areas. The genetic pattern found for both SNPs and microsatellites appears to be related to the colonisation process, with a lack of geographic structure and bottleneck signatures occurring at the colonisation front, far from historical sites. This study gives a better understanding of the expansion of T. fortunei and adds new insights to its ecology.
... Genetic structure was also explored using a model-based clustering method implemented in STRUCTURE 2.3.4 (Pritchard et al., 2000). An admixture model with no prior information about sampling location was used with a burn-in period of 50 000 iterations, followed by 100 000 Markov chain Monte Carlo (MCMC) replicates for K = 1 to K = 8 clusters and 10 ...
... The * indicates significant values after Bonferroni sequential correction. (Pritchard et al., 2000) showing the most likely number of clusters (K) present in Pinna nobilis populations across the Gulf of Lion resulting from the application of the method described by (A) Evanno et al. (2005) and (B) by Puechmaille (2016). Pemberton, 2008). ...
Thesis
Coastal marine systems are generally discontinuous and made up of a mosaic of different underwater seascapes, which creates sometimes highly fragmented distributions in the species that colonize them. Coastal marine species are therefore structured as networks of populations connected to one another through larval dispersal. Understanding the functioning of, and connectivity among, the populations of a species is essential for adapting conservation strategies. The fan mussel, Pinna nobilis, is a species endemic to the Mediterranean Sea that now faces a major crisis threatening its survival. Since October 2016, mass mortalities caused by a parasitic protozoan, Haplosporidium pinnae, have been reported in its populations throughout the Mediterranean Sea. This is an unprecedented event, in terms of both mortality rate (close to 100%) and speed of spread, and it could lead to the extinction of the species. Focusing on the Occitan coast, this thesis provides knowledge on the biology and ecology of the species, as well as on its functioning and the processes that maintain its populations, in order to propose conservation priorities. We thus highlighted the diversity of habitats colonized by the species and the importance of lagoons, which host nearly 90% of the fan mussels on the Occitan coast and appear to serve as refuge habitat for the species by limiting infestation by the parasite. Using newly developed microsatellite markers, we showed a very homogeneous genetic structure along the whole coast, which implies a certain level of connectivity and suggests that a large part of the species' genetic diversity remains preserved in the lagoons. Focusing on the population of Peyrefite Bay, in the Réserve Naturelle Marine de Cerbère-Banyuls, and using a parentage analysis, we provided knowledge on the demographic dynamics and replenishment processes of the species. Taken together, this thesis makes it possible to define recommendations that will be useful for implementing appropriate conservation measures, which are essential for the survival of the species.
... To analyse D. foliosa for patterns of genetic differentiation, I constructed a maximum parsimony network for the plastid haplotypes using the computer program Network 10.2.0.0 (Fluxus technology 2007), and I investigated the microsatellite data for potential clustering of nuclear genotypes using the computer program Structure 2.3.4 (Pritchard et al. 2000). For the plastid loci, I assumed a stepwise mutation model and weighted all mutations equally, irrespective of the amount of variability shown at the different marker sites. ...
... Fragmentation of the habitat might result in reduction in gene flow, smaller population sizes and increased levels of inbreeding, which in turn might lead to loss of genetic diversity and an increased risk of population loss and extinction. (Pritchard et al. 2000). Online Resource 7. Delta K values for K=2 to K=9 in the material studied of Dactylorhiza foliosa as estimated by the Evanno method (Evanno et al. 2005) and calculated in the computer program Structure Harvester (Earl and vonHoldt 2012). ...
Article
Oceanic islands have isolated biota, which typically include many endemic species. However, island endemics are vulnerable due to small population sizes, and they are often threatened by habitat destruction or by introduced pests and predators. Adequate conservation planning requires good information on genetic variability and population structure, also when seemingly viable species are considered. Here, I analysed the genetic structure in the terrestrial orchid Dactylorhiza foliosa, which is endemic to Madeira. This species is a characteristic component of evergreen laurel forests occupying the northern slopes of the island. Levels of diversity in both the plastid genome and in the nuclear genome were comparable to levels of diversity found in congeners growing in continental regions. Within populations, plants separated by distances up to 256 m shared plastid haplotypes significantly more often than plants at random, but when nuclear markers were considered, only plants growing closer than eight metres were significantly more closely related. Analysis of plastid marker variation revealed that gene dispersal by seeds is not sufficiently large to counterbalance the accumulation of mutations that build up divergence between the most distant populations. However, differentiation in the nuclear genome was considerably smaller, suggesting that gene dispersal by pollen is much more efficient than gene dispersal by seeds in D. foliosa. The overall pollen to seed dispersal ratio, mp/ms, was 7.30. Considering genetic parameters, conditions for long-term persistence of D. foliosa on Madeira seem to be good.
... We examined the genetic structure using a Bayesian population assignment method of the clustering software STRUCTURE (Pritchard et al., 2000). The Bayesian method considers data and parameters as random variables having specific distributions, called a-priori distribution. ...
Article
The tar spot complex (TSC) is a devastating disease of maize ( Zea mays L.), occurring in 17 countries throughout Central, South, and North America and the Caribbean, and can cause grain yield losses of up to 80%. As yield losses from the disease continue to intensify in Central America, Phyllachora maydis , one of the causal pathogens of TSC, was first detected in the United States in 2015, and in 2020 in Ontario, Canada. Both the distribution and yield losses due to TSC are increasing, and there is a critical need to identify the genetic resources for TSC resistance. The Seeds of Discovery Initiative at CIMMYT has sought to combine next-generation sequencing technologies and phenotypic characterization to identify valuable alleles held in the CIMMYT Germplasm Bank for use in germplasm improvement programs. Individual landrace accessions of the “Breeders' Core Collection” were crossed to CIMMYT hybrids to form 918 unique accessions topcrosses (F1 families) which were evaluated during 2011 and 2012 for TSC disease reaction. A total of 16 associated SNP variants were identified for TSC foliar leaf damage resistance and increased grain yield. These variants were confirmed by evaluating the TSC reaction of previously untested selections of the larger F1 testcross population (4,471 accessions) based on the presence of identified favorable SNPs. We demonstrated the usefulness of mining for donor alleles in Germplasm Bank accessions for newly emerging diseases using genomic variation in landraces.
... For identification of genetic clustering of I. parviflora populations, Bayesian analysis was performed with Structure v.2.3.1 [66], for significance of clustering patterns using Evanno et al. [67] methodology. The a priori number of clusters selected was set to K = 1-21, the maximum expected number of clusters corresponded to the number of analyzed populations. ...
Article
Currently, there is an increasing focus on understanding the interactions between genetic features of the invader and environmental factors that ensure the success of invasion. The objective of our study was to evaluate the genetic diversity of Lithuanian populations of highly invasive small balsam (Impatiens parviflora) by amplified fragment length polymorphism (AFLP) markers and to relate molecular data to biotope features defined by employing neighboring species of herbaceous plants. Low polymorphism of I. parviflora populations was observed at AFLP loci. Hierarchical analysis of molecular variance did not reveal differentiation of populations depending on biotope, geography, or road types. Bayesian analyses of AFLP data demonstrated many genetic clusters. Our results suggest multiple introductions of I. parviflora into Lithuania. The polymorphism of AFLP loci of populations significantly correlated with the total coverage by herbaceous plants in the sites. Defined by principal component analysis, the variability of study sites was most related to the coverage of herbaceous plants and least related to the molecular features of I. parviflora populations. The sites with I. parviflora were classified into agricultural scrubland, riparian forest, and urban forest biotopes. Of them, urban forest was distinguished by the highest coverage of I. parviflora and the lowest Ellenberg indicatory values for light, soil acidity, and richness in nutrients.
... The number of genetic clusters and individual assignments were examined within each dataset (2011 and 2017) separately using Bayesian clustering in STRUCTURE v.2.3.4 (Pritchard et al., 2000) and discriminant analysis of principal components (DAPC) in adegenet v.2.1.3 (Jombart et al., 2010). ...
Article
Habitat loss and fragmentation can lead to smaller and more isolated populations and reduce genetic diversity and evolutionary potential. Conservation programs can benefit from including monitoring of genetic factors in fragmented populations to help inform restoration and management. We assessed genetic diversity and structure among four major populations of the Cactus Wren (Campylorhynchus brunneicapillus) in San Diego County in 2011–2012 and again in 2017–2019, using 22 microsatellite loci. We found a significant decline in heterozygosity in one population (San Pasqual) and a decline in allelic richness and effective population size in another (Sweetwater). Genetic diversity in the remaining two populations was not significantly different over time. Local diversity declined despite evidence of dispersal among some populations. Approximately 12% of genetically determined family groups (parents, offspring, siblings) included one or more members sampled in different territories with distances ranging from 0.2 to 10 km. All but one inferred dispersal events occurred within the same genetic population. Population structure remained relatively stable, although genetic differentiation tended to increase in the later sampling period. Simulations suggest that at currently estimated effective sizes, populations of Cactus Wrens will continue to lose genetic diversity for many generations, even if gene flow among them is enhanced. However, the rate of loss of heterozygosity could be reduced with increased gene flow. Habitat restoration may help bolster local population sizes and allelic richness over the long term, whereas translocation efforts from source populations outside of San Diego may be needed to restore genetic diversity in the short term. The Cactus Wren is a rare and fragmentation sensitive species in coastal southern California, USA. 
Two of four known aggregations of coastal Cactus Wrens on conserved lands in San Diego County showed significant declines in genetic diversity between 2011 and 2019 and all four aggregations have low effective population sizes. Forward‐time simulations show that assisted gene flow among these sites could slow the loss of heterozygosity but not overall allelic richness. Resource managers could use this information to assess whether conservation goals are being met and support management actions including translocation of individuals or eggs to preserve genetic diversity in the short term and support habitat restoration to boost local population size over longer time periods.
... To examine the population structure, principal component analysis (PCA) of the filtered and imputed genotypic data was conducted using Genomic Association and Prediction Integrated Tool (GAPIT) v3.0 (Wang and Zhang 2021) in R (R Core Team 2014). We also assessed the population stratification using a Bayesian model-based clustering program, STRUCTURE v2.3.4, assuming an Admixture model (Pritchard et al. 2000). We used ten subgroups (K = 1-10) with ten independent runs for each subgroup using a burn-in period of 10,000 iterations followed by 10,000 Monte-Carlo iterations. ...
Article
Genetic dissection of yield component traits including spike and kernel characteristics is essential for the continuous improvement in wheat yield. Genome-wide association studies (GWAS) have been frequently used to identify genetic determinants for spike and kernel-related traits in wheat, though none have been employed in hard winter wheat (HWW) which represents a major class in US wheat acreage. Further, most of these studies relied on assembled diversity panels instead of adapted breeding lines, limiting the transferability of results to practical wheat breeding. Here we assembled a population of advanced/elite breeding lines and well-adapted cultivars and evaluated over four environments for phenotypic analysis of spike and kernel traits. GWAS identified 17 significant multi-environment marker–trait associations (MTAs) for various traits, representing 12 putative quantitative trait loci (QTLs), with five QTLs affecting multiple traits. Four of these QTLs mapped on three chromosomes 1A, 5B, and 7A for spike length, number of spikelets per spike (NSPS), and kernel length are likely novel. Further, a highly significant QTL was detected on chromosome 7AS that has not been previously associated with NSPS and putative candidate genes were identified in this region. The allelic frequencies of important quantitative trait nucleotides (QTNs) were deduced in a larger set of 1,124 accessions which revealed the importance of identified MTAs in the US HWW breeding programs. The results from this study could be directly used by the breeders to select the lines with favorable alleles for making crosses, and reported markers will facilitate marker-assisted selection of stable QTLs for yield components in wheat breeding.
... Finally, for both datasets, we used PLINK v.1.90b6.6 [50] to remove SNPs with a pairwise squared correlation (r²) greater than 50% within sliding windows of 50 SNPs at 10 SNP increments between windows [51]. This was done to reduce the impact of linkage between SNPs on our examinations of population clustering and admixture [52]. ...
Article
Understanding patterns of diversification, genetic exchange, and pesticide resistance in arthropod disease vectors is necessary for effective population management. With the availability of next-generation sequencing technologies, one of the best approaches for surveying such patterns involves the simultaneous genotyping of many samples for a large number of genetic markers. To this end, the targeting of gene sequences of known function can be a cost-effective strategy. One insect group of substantial health concern are the mosquito taxa that make up the Culex pipiens complex. Members of this complex transmit damaging arboviruses and filariae worms to humans, as well as other pathogens such as avian malaria parasites that are detrimental to birds. Here we describe the development of a targeted, gene-based assay for surveying genetic diversity and population structure in this mosquito complex. To test the utility of this assay, we sequenced samples from several members of the complex, as well as from distinct populations of the relatively under-studied Culex quinquefasciatus. The data generated was then used to examine taxonomic divergence and population clustering between and within these mosquitoes. We also used this data to investigate genetic variants present in our samples that had previously been shown to correlate with insecticide-resistance. Broadly, our gene capture approach successfully enriched the genomic regions of interest, and proved effective for facilitating examinations of taxonomic divergence and geographic clustering within the Cx. pipiens complex. It also allowed us to successfully survey genetic variation associated with insecticide resistance in Culex mosquitoes. This enrichment protocol will be useful for future studies that aim to understand the genetic mechanisms underlying the evolution of these ubiquitous and increasingly damaging disease vectors.
... To investigate the structure of the studied walnut genotypes, the Bayesian model, implemented in Structure 2.3.1 software (Pritchard et al. 2000), was applied. Ten independent replicates were performed, setting the number of subpopulations (k) from 1 to 10, burn in period and MCMC iterations. ...
Article
Full-text available
Persian walnut (Juglans regia L.) is an economically important and worldwide-distributed walnut species. The objective of this study was to evaluate the genetic diversity and population structure of superior spring frost tolerant genotypes of Persian walnut, which were identified from native populations of East Azerbaijan province located in the Northwest region of Iran using Inter simple sequence repeat (ISSR) markers. The effective number of alleles, Shannon’s information index, and expected heterozygosity were calculated. Then, cluster analyses of identified genotypes were performed using Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Principal Coordinate Analysis methods. Finally, structure analysis was performed on regional populations. The highest and lowest effective number of alleles, Shannon’s information index, and expected heterozygosity were calculated for ISSR primers CCC (GT)7 and (AG)8 T, respectively. The STRUCTURE analysis was consistent with UPGMA clustering and principal components analysis, which determined two main sub-populations. Population A consisted of Azarshahr and Tasouj genotypes, and population B comprised mainly of genotypes collected from Ajabshir and Marand counties. Comparative analysis of genetic variation among spring frost tolerant and superior genotypes of Persian walnut revealed higher genetic diversity in Azarshahr population compared with others. These results provide valuable information regarding the distribution, diversity and structures of superior spring frost tolerant genotypes of Persian walnut for future breeding, propagation, and conservation programs.
... The GenAlEx version 6.5 software was used to calculate the diversity indices, and molecular variance between and within populations. Using the STRUCTURE version 2.3.4 software, a model evaluation for population structure was performed to determine the genetic structure (Evanno et al. 2005; Pritchard et al. 2000). Admixture models, correlated allele frequencies, and a burn-in period of 50,000 iterations followed by 500,000 Markov chain Monte Carlo (MCMC) repetitions were used to investigate the population structure. ...
Article
Full-text available
Intraspecific genetic diversity study is a helpful tool for genetic improvement and germplasm conservation initiatives. In this context, the current study analyzed the population genetic diversity and structure of 89 accessions collected from 11 populations in four different northern Iraqi provinces using conserved DNA-derived polymorphism (CDDP) and inter-simple sequence repeats (ISSR) molecular markers. CDDP and ISSR revealed 105 and 179 polymorphic bands, respectively, with an average of 10.50 bands per primer for CDDP and 8.52 fragments per primer for ISSR. All the primers exhibited polymorphic information content values greater than 0.50. Shannon’s information index (0.43) and expected heterozygosity (0.28) were both the highest in the KNOX-1 primer. Based on CDDP, ISSR, and CDDP + ISSR data, dendrogram analysis of populations revealed the presence of two genetic clusters, which were subsequently sub-clustered. The slight similarity between the geographic distribution of Q. aegilops populations and their clustering pattern was stated. The genetic dissimilarity among populations ranged from 0.13 to 0.34 for CDDP, 0.11 to 0.39 for the ISSR, and 0.13 to 0.36 for the CDDP + ISSR combination. In the model-based structure analysis, both markers and their combinations showed a similar clustering trend, with two major genetic clusters. A moderate relationship was observed between the structure and cluster patterns in terms of the distribution of populations within the clusters. The highest fixation index values (0.43 for CDDP markers and 0.39 for ISSR markers) were recorded by the second cluster. The analysis of molecular variance revealed high genetic variation within regions than between them, as well as significant gene exchange between regions. 
The Sulaimani-Sharbazher (SSH) and Erbil-Shaqlawa (ESH) populations had the highest values of Shannon’s information index (0.36 for SSH and 0.35 for ESH, according to CDDP data) and expected heterozygosity (0.23 for SSH and 0.24 for ESH, according to ISSR data). There was a significant association between CDDP and ISSR dissimilarity matrices. The supplied data can be used by producers and scientists to improve the preservation and rational use of wild Q. aegilops populations. By selecting a small number of individuals from diverse populations, ex and in situ conservation may be an appropriate method for adequately capturing the total genetic diversity.
... The appropriate number of clusters was determined by calculating the delta K value (Evanno et al. 2005). A second run was performed using the delta K value assigned in the program; these analyzes were run with a length of burning period of 50,000 and number of MCMC reps after burn-in of 200,000 (Pritchard et al. 2000). ...
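The delta K statistic mentioned in this excerpt can be illustrated with a minimal sketch. It assumes the common formulation in which ΔK is the absolute second-order difference of the mean log-likelihood across replicate runs, divided by the standard deviation of the log-likelihood at that K. The function name `evanno_delta_k` and the input layout (a dictionary from K to a list of per-run ln P(X|K) values) are hypothetical and are not taken from the cited papers or from any STRUCTURE tooling.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k is a hypothetical input layout: it maps each tested K to the
    list of ln P(X|K) values reported by the replicate runs at that K.
    """
    delta_k = {}
    for k in sorted(lnp_by_k):
        # Delta K is undefined at the smallest and largest K tested.
        if (k - 1) not in lnp_by_k or (k + 1) not in lnp_by_k:
            continue
        runs = lnp_by_k[k]
        if len(runs) < 2:
            continue  # a standard deviation needs at least two runs
        spread = stdev(runs)
        if spread == 0:
            continue  # avoid division by zero when all runs agree exactly
        # Second-order difference of the mean log-likelihood around K.
        second_diff = (
            mean(lnp_by_k[k + 1]) - 2 * mean(runs) + mean(lnp_by_k[k - 1])
        )
        delta_k[k] = abs(second_diff) / spread
    return delta_k
```

Under this sketch, the K with the largest value in the returned dictionary would be the candidate number of clusters. Because ΔK cannot be computed for the smallest K tested, it cannot by itself support K = 1.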
Article
Full-text available
Triatoma mexicana is an important vector of Trypanosoma cruzi—the etiological agent of Chagas disease. This triatomine species occurs in central Mexico, but little is known about its genetic variability. Using Cyt-b gene as a genetic marker, in this study, we determined the population genetic structure of T. mexicana collected from the States of Hidalgo, Guanajuato, and Queretaro where populations are largely peridomiciliary. A Bayesian approach was performed for the design of phylogenies, median-joining networks, and clustering among populations of T. mexicana. Our results show that the Hidalgo population was the most distinct, with the highest genetic and haplotypic variation (Hd = 0.963, π = 0.06129, and ɵ = 0.05469). Moderate gene flow (Nm) was determined among populations of Hidalgo and Queretaro. Populations from the three states showed differentiation (FST) values ranging from 0.22 to 0.3, suggesting an important genetic differentiation. The phylogenetic analysis showed the presence of five well-defined groups, as well as the haplotype network, where 24 haplotypes were observed forming five haplogroups with high mutational steps among them: 68 (Hgo-W2), 26 (Qto), 59 (Hgo-M), 44 (Hgo-W1), and 46 (Gto). Genetic isolation was apparently inferred in the Guanajuato population; however, the Mantel test did not show correlation between genetic (FST) and geographic (km) distances (p = 0.05). The STRUCTURE analyses showed seven genetic clusters and it was observed that a single cluster predominates in each sampled location. However, genetic admixture was detected in four localities. Our results show evidence that there are multiple species within the collected sampling area.
... To visualize the genetic structures of the moray pairs, we constructed median-joining haplotype networks of individual genes for each species pair in PopART v. 1.7 (Leigh & Bryant, 2015). We also implemented a Bayesian clustering based on three concatenated data sets of sequences in STRUCTURE v. 2.3.4 (Pritchard et al., 2000), including mitochondrial genes (COI and cyt b), nuclear genes (EGR3 and Rh), and the combination of all the genes. An admixture model was used to determine the best number of clusters that fit the data sets (K), and the admixture proportions among individuals (Q) may indicate hybridization and introgression. ...
Article
Phenovariant is a pair of populations or species with distinct phenotypes but little to no genetic divergence, which may have resulted from strong assortative mating, hybridization, or incomplete lineage sorting that changes in the phenotype have exceeded the genotype. Previous studies had mainly tackled this issue on small, colourful reef fishes, but little effort has been made to the barely seen and solitary taxa like the morays (family Muraenidae). In the present study, three species pairs of sympatric morays from Taiwanese waters revealed as phenovariant in the mitochondrial cytochrome oxidase I (COI) were examined from the molecular and morphological perspectives, with three dissimilar scenarios discovered: (1) no hybridization was found between the morphologically most distinct Gymnothorax intesi and Gymnothorax neglectus; (2) Gymnothorax kidako bidirectionally hybridized with Gymnothorax prionodon instead of its closely related species Gymnothorax pseudokidako; and (3) frequently bidirectional hybridization and introgression were detected between Gymnothorax pseudothyrsoideus and Gymnothorax reevesii, a species pair that overlapped in all the morphometric and meristic characters. Nevertheless, the principal component analysis indicated that each species pair has evolved significant morphological divergence, further supporting their taxonomic validities. Our results document the natural hybridization of marine eel taxa for the first time and reveal a diverse evolutionary process among the morays.
... Bayesian clustering approach was implemented using STRUCTURE 2.3.4 tool to investigate the subpopulation structure based on an "admixture" model [32]. It is a model-based clustering algorithm to identify genetic clusters in the form of K (sub-population) values. ...
Article
Full-text available
Chickpea is an inexpensive source of protein, minerals, and vitamins to the poor people living in arid and semi-arid regions of Southern Asia and Sub-Saharan Africa. New chickpea cultivars with enhanced levels of protein, Fe and Zn content are a medium-term strategy for supplying essential nutrients for human health and reducing malnutrition. In the current study, a chickpea reference set of 280 accessions, including landraces, breeding lines, and advanced cultivars, was evaluated for grain protein, Fe, Zn content and agronomic traits over two seasons. Using a mid-density 5k SNP array, 4603 highly informative SNPs distributed across the chickpea genome were used for GWAS analysis. Population structure analysis revealed three subpopulations (K = 3). Linkage disequilibrium (LD) was extensive, and LD decay was relatively low. A total of 20 and 46 marker-trait associations (MTAs) were identified for grain nutrient and agronomic traits, respectively, using FarmCPU and BLINK models. Of which seven SNPs for grain protein, twelve for Fe, and one for Zn content were distributed on chromosomes 1, 4, 6, and 7. The marker S4_4477846 on chr4 was found to be co-associated with grain protein over seasons. The markers S1_11613376 and S1_2772537 co-associated with grain Fe content under NSII and pooled seasons and S7_9379786 marker under NSI and pooled seasons. The markers S4_31996956 co-associated with grain Fe and days to maturity. SNP annotation of associated markers were found to be related to gene functions of metal ion binding, transporters, protein kinases, transcription factors, and many more functions involved in plant metabolism along with Fe and protein homeostasis. The identified significant MTAs has potential use in marker-assisted selection for developing nutrient-rich chickpea cultivars after validation in the breeding populations.
... Rare bands may be caused by mutations combined with selection pressure, gene flow, and drift. They are not desirable in association studies (Pritchard et al., 2000), but desirable in cultivar identification; therefore, this population may not be appropriate for marker-trait association studies. Table (3) revealed the total number of bands for each primer combination, which ranged from 32 bands for the primer Em2R and me3F to 53 bands for the primer Em3R and me4F. The percentage of polymorphism ranged from 78% for the primer Em2R and me3F to 96% for the primer Em3R and me4F. The lowest number of unique bands was observed in Em2R and me3F (18), while the highest number was detected in the Em3R and me4F primer (42). ...
Article
Full-text available
Sequence-related amplified polymorphism (SRAP) markers were used to detect molecular marker polymorphisms among five parents and four crosses of citrus and their relatives in Aurantioidea. Four SRAP primer combinations produced a total of 160 polymorphic fragments with an average of 40 per primer combination and the an-average polymorphism information content (PIC) of 0.86. The unweighted pair group method arithmetic average (UPGMA) analysis demonstrated that the accessions had a similarity range from 0.35 in the cross between Lemon and Clementine to 0.43 in the Grapefruit parent with a mean of 0.37. The dendrogram separated the parents and the resulted crosses of Citrus species into two main sub-clusters with a similarity value of 0.37. Only one member of the first sub-cluster which is Clementine or the parent of all the resulted crosses. In the second main sub-cluster, Only one member of the first sub-sub-cluster which is Grapefruit or the parent of one cross. The second sub-sub-cluster has consisted of one parent separated alone (Succari parent) and another sub-cluster. This sub-cluster is formed from the sub-sub-cluster including the parent Cleopatra mandarin and the resulting from cross Cleopatra mandarin x Clementine. The last sub-cluster has consisted of one group containing the parent Lemon and the resulted cross Lemon x Clementine. The other group consisted of two crosses; Grapefruit x Clementine and Succari x Clementine.
... 2.3.4. (Pritchard et al. 2000) was used to estimate the most likely number of ancestral genotypes (K) within the entire sample, and to estimate the proportions of each ancestral genotype in Balkan chamois individuals. We ran the analysis allowing for admixture and correlated allele frequency with ten independent runs for each K between 1 and 7 with a burn-in 500,000 steps followed by 10^5 Markov chain Monte Carlo (MCMC) iterations. ...
Article
Full-text available
The translocation of wild animal species became a common practice worldwide to re-establish local populations threatened with extinction. Archaeological data confirm that chamois once lived in the Biokovo Mountain but, prior to their reintroduction in the 1960s, there was no written evidence of their recent existence in the area. The population was reintroduced in the period 1964–1969, when 48 individuals of Balkan chamois from the neighbouring mountains in Bosnia and Herzegovina were released. The main objective of this study was to determine the accuracy of the existing historical data on the origin of the Balkan chamois population from the Biokovo Mountain and to assess the genetic diversity and population structure of the source and translocated populations 56 years after reintroduction. Sixteen microsatellite loci were used to analyse the genetic structure of three source chamois populations from Prenj, Čvrsnica and Čabulja Mountains and from Biokovo Mountain. Both STRUCTURE and GENELAND analyses showed a clear separation of the reintroduced population on Biokovo from Prenj’s chamois and considerable genetic similarity between the Biokovo population and the Čvrsnica-Čabulja population. This suggests that the current genetic composition of the Biokovo population does not derive exclusively from Prenj, as suggested by the available literature and personal interviews, but also from Čvrsnica and Čabulja. GENELAND analysis recognised the Balkan chamois from Prenj as a separate cluster, distinct from the populations of Čvrsnica and Čabulja. Our results thus highlight the need to implement genetic monitoring of both reintroduced and source populations of endangered Balkan chamois to inform sustainable management and conservation strategies in order to maximise the chances of population persistence.
... 2.4.4 (R Core Team 2020, The R Project for Statistical Computing) R software was used to perform principal coordinate analysis (PCoA). The population structure analysis of 61 C. mas genotypes with 80 loci was derived from Bayesian model-based clustering, implemented in the program Structure v.2.3.4 (Pritchard et al. 2000). The algorithm was derived from mixed models of independent values (K) consisting of 1-10 approximate groups of 50,000 iterations of the Markov chain Monte Carlo (MCMC) after 100,000 burn-in cycles. ...
Article
Full-text available
Cornus mas L. is a type of fruit preferred by consumers due to its rich bioactive compounds, attractive appearance, unique taste, high biological activities, sensory properties and nutritional properties. Morphological and molecular characterization of 61 C. mas genotypes collected from the flora of Bolu province was carried out in the current study. According to the two-year average data, the fruit and core weights of the genotypes showed significant variation with ranges from 1.44 to 3.37 g and from 0.19 to 1.13 g, respectively. The soluble solids content, pH, and titratable acidity values changed between 10.37 and 21.22%, 3.45 and 2.44, and 1.01% and 2.46%, respectively. Inter-primer binding site (iPBS) retrotransposon markers were evaluated for genetic variation among C. mas genotypes for the first time worldwide. Five iPBS markers amplified 80 fragments, 60 polymorphic (75%) with an average of 12 polymorphic bands per primer. Each of the selected iPBS markers supplied adequate separation power. Polymorphism information content and resolution power of markers ranged from 0.18 to 0.28 and from 3.57 to 8.43, with averages of 0.24 and 5.52, respectively. The iPBS primer 2378 had the highest polymorphism rate value (88.89%), whereas iPBS primers 2242 and 2232 had the lowest (66.67%) phylogenetic analysis grouped genotypes into three main groups. The unweighted pair groups method using arithmetic averages, principal coordinate, and structure analyses confirmed a high level of genetic diversity among the investigated genotypes in this work. The findings will help to plant breeders to characterize C. mas genotypes.
... STRUCTURE, PCoA, DAPC), as suggested previously (Pearse and Crandall 2004). Firstly, the Bayesian clustering method and Markov chain Monte Carlo (MCMC) simulation implemented in STRUCTURE version 2.3.4 (Pritchard et al. 2000) were used to infer the most probable number of genetic clusters without a priori definition of populations. The analyses were run using an admixture model and correlated allele frequencies with a burn-in period of 250,000 replicates and a sampling period of 750,000 replicates for the number of clusters (K) from one to six with ten independent runs for each K. ...
Article
Full-text available
After the last glacial, the Carpathian Basin was repopulated from either eastward or northward colonisation routes for various species; one of these was the emblematic member of the European megafauna, the red deer, Cervus elaphus. We analysed 303 red deer individuals from the middle of the region, in seven Hungarian game reserves, at ten microsatellite loci (C01, C229, T26, T108, T123, T156, T172, T193, T501, T507), to investigate the genetic diversity of these subpopulations. We discovered high levels of genetic diversity of red deer subpopulations; allelic richness values ranging 4.99-7.01, observed heterozygosity 0.729-0.800, polymorphic information content 0.722-0.806, and Shannon's information index 1.668-2.064. Multi-locus analyses indicated population admixtures of various degrees that corresponded to geographical location, and complex genetic structures were shown by clustering. Populations in the southwestern and the northeastern parts of the region formed two highly separated groups, and the red deer from populations in between them were highly admixed (in western Pannonia/Transdanubia, where the Danube flows into the Carpathian Basin). This pattern corresponds to the distribution of mitochondrial as well as Y-chromosome lineages. Assignment tests showed that a large fraction of individuals (29.4%) are found outside of their population of origin, indicating that the dispersal of red deer is rather common, which could be expected considering the life course of the species.
... The dendrogram was depicted using MEGA 5 [55]. To examine the genetic structure of the analyzed Mediterranean breeds and to assess the degree to which breeds differ from each other, two methods were completed using the Discriminant Analysis of Principal Components (DAPC) implemented in the adegenet package R [56] and the Bayesian clustering, implemented by the STRUCTURE version 2.3.4 software [57] based on the most likely number of clusters (K) in the dataset. For this last method, the optimum number of clusters fitted to the data was established following the ∆K method [58]. ...
Article
Full-text available
In this study, the genetic relationship and the population structure of western Mediterranean basin native sheep breeds is investigated, analyzing Maghrebian, Central Italian, and Venetian sheep with a highly informative microsatellite markers panel. The phylogeographical analysis, between breeds’ differentiation level (Wright’s fixation index), gene flow, ancestral relatedness measured by molecular coancestry, genetic distances, divergence times estimates and structure analyses, were revealed based on the assessment of 975 genotyped animals. The results unveiled the past introduction and migration history of sheep in the occidental Mediterranean basin since the early Neolithic. Our findings provided a scenario of three westward sheep migration phases fitting properly to the westward Neolithic expansion argued by zooarcheological, historical and human genetic studies.
... Phenotypic cluster-and population-based genetic diversity, analysis of molecular variance (AMOVA), principal coordinate analysis (PCoA), and Wright's F statistics were calculated using GenAlEx 6.5 [39]. The Bayesian clustering approach in the STRUCTURE 2.3.4 program was used to analyze the population structure [40], with 10 runs for each K (assumed number of subpopulations [1−10]) and Monte Carlo chain replicates of 100,000 iterations, and the burn-in period for every run was 100,000 steps. A Structure Harvester (http://taylor0.biology.ucla.edu/structureHarvester/) ...
Article
Full-text available
This study investigated the genetic diversity of bread-wheat genotypes using canopy reflectance-based vegetation indices (VIs) and simple sequence repeat (SSR) marker-based genotyping for drought tolerance. A total of 56 wheat genotypes were assessed using phenotypic traits (combination of VIs and yield traits) and 30 SSR markers. The data of the phenotypic traits were averaged over two growing seasons under irrigated and drought-stressed conditions. The hierarchical clustering of the wheat genotypes unveiled three drought-tolerant groups. Cluster 1 genotypes showed minimal phenotypic alterations, conferring superior drought tolerance and yield stability than clusters 2 and 3. The polymorphism information content values for the SSR markers ranged from 0.434 to 0.932, averaging 0.83. A total of 458 alleles (18.32 alleles per locus) were detected, with the most polymorphic markers, wmc177 and wms292, having the most alleles (24). A comparative study of SSR diversity among phenotypic clusters indicated that genotypes under cluster 1 had higher genetic diversity (0.879) and unique alleles (47%), suggesting their potential in future breeding programs. The unweighted neighbor-joining tree grouped the wheat genotypes into five major clusters. Wheat genotypes from all phenotypic clusters were distributed throughout all SSR-based clusters, indicating that genetically heterogeneous genotypes were allocated to different drought-tolerant groups. However, SSR-based clusters and model-based populations showed significant co-linearity (86.7%). The findings of the present study suggest that combining reflectance-based indirect phenotyping with SSR-based genotyping might be an effective technique for assessing genetic diversity to improve the drought tolerance of bread-wheat genotypes.
... edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure.html) (Pritchard et al. 2000). Briefly, the number of groups (K) was set to 1-10 and each K value was simulated 10 times. ...
Article
Full-text available
Huangqi (Astragalus spp.) is a versatile herb that possesses several therapeutic effects against a variety of diseases, especially lung diseases. The aim of this study was to establish a core collection of Astragalus germplasm resources based on 10 simple sequence repeat (SSR) markers. We used 380 samples of Astragalus collected from different regions to a core Astragalus collection using five different methods, including PowerCore-based M strategy, CoreFinder-based M strategy, Core Hunter-based stepwise sampling, PowerMarker-based simulated annealing algorithm based on allele maximization, and PowerMarker-based simulated annealing algorithm based on maximizing genetic diversity. Among these methods, the CoreFinder-based M strategy was found to be the most suitable approach as it preserved all the alleles, and most of the genetic diversity parameters of the constructed core collections were higher than those of the initial collection. Additional analyses demonstrated that the genetic diversity of the core collection was similar to that of the initial collection. Further, phylogenetic trees indicated that the population structure of the core collection was similar to that of the initial collection. In addition, our results showed that the optimal grouping value of K was 2. The construction of a core collection is beneficial for the understanding, management, and utilization of Astragalus. Moreover, this study will serve as a valuable reference for constructing core collections of other plants and fungi.
... Population structure analysis for the analysed tea germplasm was done using Bayesian Markov Chain Monte Carlo model (MCMC) using STRUCTURE V 2.3.4 (Pritchard et al. 2000). Three runs were performed for each population (K) set from 1 to 10. ...
Article
Full-text available
Tea [Camellia sinensis (L.) Kuntze] has primarily been improved by selections and controlled hybridizations. In India, the genetic improvement programs are largely led by United Planters Association of South India (UPASI). Tea has robust vegetative propagation and several high yielding commercial elite tea clones released by UPASI have been cultivated across the world. In a previous study, we analysed 42 elite UPASI tea clones using cytological and molecular analysis (Sharma and Raina, Int J Tea Sci 5:21–28, 2006). Present work analysed the same clones using Random amplified polymorphic DNA (RAPD) and Inter simple sequence repeat (ISSR) markers to document the genetic diversity and delineate the genetically distinct superior tea clones. A total of 447 and 116 bands were generated with 52 RAPD and 27 ISSR primers, out of which 395 and 70 bands, respectively were observed to be polymorphic. RAPD markers outcompeted ISSRs when compared against various genetic diversity attributes. An overall low Nei’s gene diversity (0.23 and 0.19) and higher value of gene flow (6.5 and 5.0) with both markers indicated narrow genetic base for the clones. Dendrograms delineated 42 clones into three major clusters whereas population STRUCTURE analysis clustered them into 6 subpopulations without discrete morphotype based grouping. Presence of many admixtures in STRUCTURE indicates towards diverse genetic ancestry of the analysed tea clones. A high level of genetic variation (90.48%) was revealed with analysis of molecular variance (AMOVA) within populations as compared to a low (9.52%) level among populations. A few superior clones were found to be genetically distinct than others and can be fruitfully used in future tea breeding programme.
... For the MCC panel, principal component analysis was conducted using EIGENSOFT software (Price et al. 2006). The population structure was examined using the STRUCTURE V2.3.4 software with a burn-in period at 20,000 iterations and a run of 200,000 replications of Markov Chain Monte Carlo (MCMC) (Pritchard et al. 2000). A phylogenetic tree was constructed based on the p-distance using the software MEGA V10.2.5 with 1000 bootstrap replications (Tamura et al. 2021). ...
Article
Full-text available
Key message Genetic architecture controlling grain lutein content of common wheat was investigated through an integration of genome-wide association study (GWAS) and linkage analysis. Putative candidate genes involved in carotenoid metabolism and regulation were identified, which provide a basis for gene cloning and development of nutrient-enriched wheat varieties through molecular breeding. Abstract Lutein, known as ‘the eye vitamin’, is an important component of wheat nutritional and end-use quality. However, the genetic manipulation of grain lutein content (LUC) in common wheat has not previously been well studied. Here, quantitative trait loci (QTL) associated with the LUC measured by high performance liquid chromatography (HPLC) were first identified by integrating a genome-wide association study (GWAS) and linkage mapping. A Chinese wheat mini-core collection (MCC) of 262 accessions and a doubled haploid (DH) population derived from Jinchun 7 and L1219 were genotyped using the 90K SNP array. A total of 124 significant marker-trait associations (MTAs) on all 21 wheat chromosomes except for 1A, 4D, and 5B that formed 58 QTL were detected. Among them, six stable QTL were identified on chromosomes 2AL, 2DS, 3BL, 3DL, 7AL, and 7BS. Meanwhile, three of the ten QTL identified in the DH population, QLuc.5A.1 and QLuc.5A.2 on chromosome 5AL and QLuc.6A.2 on 6AS, were stable and independently explained 5.58–10.86% of the phenotypic variation. The QLuc.6A.2 region colocalized with two MTAs identified by GWAS. Moreover, 71 carotenoid metabolism-related candidate genes were identified, and the allelic effects were analyzed in the MCC panel based on the 90K array. Results revealed that the genes CYP97A3 (Chr. 6B) and CCD1 (Chr. 5A) were significantly associated with LUC. Additionally, the gene PSY3 (QLuc.5A.1) and several candidate genes involved in the methylerythritol 4-phosphate (MEP) pathways colocalized with stable QTL regions. 
The present study provides potential targets for future functional gene exploration and molecular breeding in common wheat.
... We used STRUCTURE 2.3.4 (Pritchard et al., 2000) to analyze the genetic structure of the 40 A. vulneraria populations. To estimate the number of genetic clusters (K), we carried out ten independent runs with K = 1-20 with 10^6 Markov chain Monte Carlo (MCMC) ...
Article
Full-text available
The abundant centre model (ACM) predicts that the suitability of environmental conditions for a species decreases from the centre of its distribution toward its range periphery and, consequently, its populations will become scarcer, smaller and more isolated, resulting in lower genetic diversity and increased differentiation. However, little is known about whether genetic diversity shows similar patterns along elevational and latitudinal gradients with similar changes in important environmental conditions. Using microsatellite markers, we studied the genetic diversity and structure of 20 populations each of Anthyllis vulneraria along elevational gradients in the Alps from the valleys to the elevational limit (2500 m) and along a latitudinal gradient (2500 km) from Central Europe to the range margin in northern Scandinavia. Both types of gradients corresponded to an 11.5°C difference in mean annual temperature. Genetic diversity strongly declined and differentiation increased with latitude in line with the predictions of the ACM. However, as population size did not decline with latitude and genetic diversity was not related to population size in A. vulneraria, this pattern is not likely to be due to less favorable conditions in the North, but due to serial founder effects during the post‐glacial recolonization process. Genetic diversity was not related to elevation, but we found significant isolation by distance along both gradients, although the elevational gradient was shorter by orders of magnitude. Subarctic populations differed genetically from alpine populations indicating that the northern populations did not originate from high elevational Alpine ones. Our results support the notion that postglacial latitudinal colonization over large distances resulted in a larger loss of genetic diversity than elevational range shifts. 
The lack of genetic diversity in subarctic populations may threaten their long‐term persistence in the face of climate change, whereas alpine populations could benefit from gene flow from low‐elevation populations. We studied the genetic diversity and population structure of 40 populations of the widespread plant species Anthyllis vulneraria (kidney vetch) along an elevational gradient in the Alps and a latitudinal gradient from Central Europe to northern Scandinavia. Genetic diversity strongly decreased and genetic differentiation among populations increased with latitude, suggesting that founder effects played an important role in the establishment of high‐latitude populations during the post‐glacial recolonization of formerly glaciated areas. Our results also indicate that subarctic populations differed genetically from alpine populations suggesting that the northern populations did not originate from high elevational Alpine ones.
Article
To analyze genetic diversity in 10 species of Ranunculaceae. The genetic diversity and genetic structure of 10 species of Ranunculaceae in 22 populations in Luoyang and surrounding areas were analyzed using primers selected by ISSR molecular markers. The 12 selected primers amplified a total of 116 clear bands, and the proportion of polymorphic bands was 98.1%. The average polymorphism information content (PIC) of the primers was 0.9478. The results of genetic diversity analysis showed that the Shannon information index (I) of 22 populations of Ranunculaceae plants was 0.4367±0.1904, and Nei’s genetic diversity index (H) was 0.2807±0.1481. The above results showed rich polymorphism in all 12 primers, and very rich genetic diversity in the 10 species of Ranunculaceae from 22 populations. The gene flow Nm was 0.3096 and genetic differentiation index Gst was 0.5997, indicating that genetic differentiation mainly derived from diversity within populations, with less gene communication among populations. The Mantel test showed positive correlation between genetic distance and geographical distance (r = 0.2530, P < 0.01). Cluster analysis, principal coordinate analysis (PCoA) and population cluster analysis yielded broadly consistent clustering results showing that individuals of the same germplasm were closely related, tending to be clustered into one group first; the second grouping was arranged according to the geographical distance. The genetic diversity of 10 species of Ranunculaceae in 22 populations is very rich. The variation among 22 populations is large, which indicates that the 10 species of Ranunculaceae have a strong ability to adapt to the environment. The combination of the three methods can improve the accuracy of cluster analysis of wild Ranunculaceae samples. This study lays the foundation for rational utilization and resource management of Ranunculaceae.
Chapter
Forensic DNA typing using short tandem repeat (STR) markers has been the mainstay of forensic identification for over 20 years although there has been a significant improvement in sensitivity over that time. This and the development of newer tools to assist forensic identification are discussed in this chapter. The increased sensitivity has led to an increased complexity in the DNA profiles obtained from crime scene material and the development of probabilistic tools to enable scientists to consider whether a particular individual has contributed their DNA to a mixture of several people, avoiding interpretation bias. Advances in technologies have led to the development of rapid DNA instruments, particularly for use on arrest to search against national databases, and sequencing of STRs provided through massively parallel sequencing instrumentation to provide greater discrimination. The latter also facilitates the parallel analysis of single‐nucleotide polymorphisms, and these can be used not only for identification but also to provide additional intelligence on externally visible characteristics of an individual, such as hair, eye and skin colour, and their biogeographical ancestry. The increased use of publicly available genetic information given for genealogy has been used to solve historic cold cases, and this development is discussed along with its inherent ethical considerations.
Article
Savalia savaglia is an Atlantic-Mediterranean zoantharian species with a patchy geographic and bathymetric distribution. Due to its longevity, S. savaglia may form large-sized colonies which play a crucial role in the ecosystem as habitat formers. Despite its ecological importance, little is known about the population structure and intraspecific genetic diversity of this species. Using ddRAD-Seq genotyping, we obtained genome-wide single nucleotide polymorphisms (SNPs) from 50 S. savaglia individuals collected at different depths (8–60 m) and localities across the Mediterranean Sea (Marseille, Sardinia, Puglia and Montenegro) and eastern Atlantic (Portugal). Our molecular observations were discussed with the reproductive behaviour of the species to understand the observed patterns of connectivity and gene flow. These results highlight the presence of three main genetic clusters (Marseille; Sardinia; and Montenegro + Portugal + Puglia), with some of the Mediterranean individuals being genetically closer to the Atlantic population rather than to other Mediterranean populations. The strong linkage disequilibrium recorded across loci and the detection of clonal individuals in the shallow populations suggest that asexual reproduction seems to be the dominant reproductive strategy among the S. savaglia populations sampled at lower depths. Our work highlights the potential of genome-wide SNP data to study the reproductive behaviour in species such as S. savaglia that are difficult to investigate in the field. The genetic connectivity data obtained in this study can be used in the future to better guide the development of effective management and conservation plans.
Article
Lecanosticta acicola is a pine needle pathogen causing brown spot needle blight that results in premature needle shedding with considerable damage described in North America, Europe, and Asia. Microsatellite and mating type markers were used to study the population genetics, migration history, and reproduction mode of the pathogen, based on a collection of 650 isolates from 27 countries and 26 hosts across the range of L. acicola. The presence of L. acicola in Georgia was confirmed in this study. Migration analyses indicate there have been several introduction events from North America into Europe. However, some of the source populations still appear to remain unknown. The populations in Croatia and western Asia appear to originate from genetically similar populations in North America. Intercontinental movement of the pathogen was reflected in an identical haplotype occurring on two continents, in North America (Canada) and Europe (Germany). Several shared haplotypes between European populations further suggests more local pathogen movement between countries. Moreover, migration analyses indicate that the populations in northern Europe originate from more established populations in central Europe. Overall, the highest genetic diversity was observed in south‐eastern USA. In Europe, the highest diversity was observed in France, where the presence of both known pathogen lineages was recorded. Less than half of the observed populations contained mating types in equal proportions. Although there is evidence of some sexual reproduction taking place, the pathogen spreads predominantly asexually and through anthropogenic activity. The pine needle pathogen Lecanosticta acicola has been introduced into Europe on several separate occasions with human activity supporting the pathogen's onwards spread from already established European populations into new areas.
Article
Propylea japonica (Coleoptera: Coccinellidae) is a natural enemy insect with a wide range of predation in mainland China and is commonly used in pest management. However, its genetic pattern (i.e., genetic variation, genetic structure, and historical population dynamics) is still unclear, impeding the development of biological control of insect pests. Population genetic research has the potential to optimize strategies at different stages of the biological control processes. This study used 23 nuclear microsatellite sites and mitochondrial COI genes to investigate the population genetics of Propylea japonica based on 462 specimens collected from 30 sampling sites in China. The microsatellite dataset showed a moderate level of genetic diversity, but the mitochondrial genes showed a high level of genetic diversity. Populations from the Yellow River basin were more genetically diverse than those in the Yangtze River basin. P. japonica has not yet formed a significant geographically genealogical structure in China, but there was a population structure signal to some extent, which may be caused by frequent gene flow between populations. The species has experienced population expansion after a bottleneck, potentially thanks to the tri-trophic plant-insect-natural enemy relationship. Knowledge of population genetics is of importance in using predators to control pests. Our study complements existing knowledge of an important natural predator in agroecosystems through estimating its genetic diversity and population differentiation and speculating about historical dynamics.
Article
Castanopsis sclerophylla (Lindl.) Schott. is a canopy tree species of evergreen broad-leaved forests in subtropical China. In this study, the genetic diversity and population structure of C. sclerophylla were investigated by using chloroplast DNA sequences and nuclear microsatellite markers. Permutation tests with chloroplast DNA sequences indicated the presence of phylogeographic structure in C. sclerophylla. Based on nuclear microsatellite markers, Bayesian clustering analysis revealed eastern-to-western differentiation in C. sclerophylla, and the analysis of molecular variance suggested population divergence has arisen along the Xuefeng, Luoxiao, and Wuyi mountain ranges. The approximate Bayesian computation demonstrated that the genetic diversity pattern of C. sclerophylla could be explained by geographic isolation followed by secondary contact. Ecological niche modelling showed that distribution of C. sclerophylla shrank southward at the Last Glacial Maximum and expanded northward at the Mid Holocene. These results suggested that the uplift of the Xuefeng, Luoxiao, and Wuyi mountain ranges and the interglacial–glacial climate change shaped the genetic diversity of C. sclerophylla. The Luoxiao mountain range should be considered as a key conservation unit of C. sclerophylla due to its higher level of genetic diversity. Our study supplies important information for prioritizing the conservation and sustainable utilization of C. sclerophylla, and provides insight on the dynamics of evergreen broad-leaved forests in subtropical China.
Article
Full-text available
Tragopogon is an Old World genus with 150 species. The Mediterranean, the Middle East, and Eastern Europe are the distribution centers of this genus. The genus has 26 species in Iran, of which 11 are endemic. The morphology of 32 species and molecular markers (ISSR, ITS, cpDNA) of 22 species of the genus Tragopogon were investigated. Despite the anatomical and molecular studies done around the world, the exact classification of this genus is not clear due to the high number of cryptic species, hybridization, polyploidy, and rapid diversification. The purpose of these studies is to classify the genus, determine interspecific relationships, and determine the important morphological characteristics in taxon differentiation. We conclude that the sections Rubriflori, Sosnowskya, Chromopappus, Majores, Angustissimi, and Krascheninnikovia introduced in Flora Iranica are confirmed by our morphometric and molecular studies. The section Profundisulcati in Flora Iranica is confirmed based on morphometric data. The species T. jesdianus, T. porphyrocephalus, T. rezaiyensis, and T. stroterocarpus are not assigned to any section in Flora Iranica; we classify them in the Rubriflori section. The cpDNA dendrogram is not useful for classification in this genus, and chloroplast sequences are very similar among Tragopogon species. Therefore, the use of cpDNA markers in the classification of this genus is not recommended. ISSR and ITS molecular markers are useful for classifying the genus Tragopogon.
Article
An experiment was conducted to analyse a germplasm panel of 92 Cucumis sp. accessions predominantly from India for fruit acidity and validate the involvement of ‘CmpH’ locus for this trait through design of suitable marker and association analysis. Among 92 accessions, 28 accessions were found to be sour with fruit pH < 5.0 and titratable acidity value >0.20%. We designed a primer pair ‘Dupl‐12’ flanking the causal polymorphism of 12 bp duplication in the earlier reported candidate gene ‘CmpH’ for fruit acidity. It amplified two types of alleles (180 and 192 bp) depending upon the absence or presence of 12 bp duplication and was codominant in nature. This was used to perform association analysis in the germplasm panel and was found to be significantly associated with titratable acidity (R2 = 73.43%) and pH (R2 = 70.76%), validating the role of ‘CmpH’ gene for fruit acidity. ‘Dupl‐12’ marker can therefore be employed for marker‐assisted transfer of acidity into elite backgrounds.
Article
Lavandin (Lavandula × intermedia Emeric ex Loisel.) was brought to the Island of Hvar (Croatia) in the 1920s, coinciding with the beginning of large-scale cultivation of lavandin in France. Although the cultivation of lavandin and the production of essential oils are of great importance worldwide, the genetic diversity of lavandin has been little studied. We performed an AFLP-based genetic analysis that included the landraces ‘Bila’ and ‘Budrovka’ and two lavandin cultivars from France ‘Grosso’ and ‘Abrialis’, as well as the parental species of the hybrid (L. angustifolia and L. latifolia). Distance-based cluster analysis revealed the existence of the third landrace, named ‘Budrovka Sveti Nikola’. This result was confirmed by the model-based cluster analyses implemented in STRUCTURE and BAPS, where the optimal number of clusters was three. ‘Budrovka’ clearly separated from all other samples, while ‘Bila’ and ‘Budrovka Sveti Nikola’ showed some degree of admixture, indicating ancestral polyclonality. The landrace ‘Bila’ showed higher polymorphism than ‘Budrovka’ and ‘Budrovka Sveti Nikola’. Analysis of molecular variance (AMOVA) showed that genetic diversity (56.63%) was higher within landraces than among (43.37%). This research will provide a basis for conservation of the Island landraces and will help in the establishment of a high-quality regional brand.
Article
Chrysoporthe deuterocubensis is the causal agent of a serious canker disease of Eucalyptus spp. in tropical and sub‐tropical areas of Asia. Where the disease occurs, Eucalyptus spp. are non‐native and the native hosts of the pathogen have not been conclusively determined. The chance discovery of C. deuterocubensis on native Melastoma malabathricum growing in the understory of Eucalyptus trees in Indonesia raised the possibility that this woody shrub could be a native host of C. deuterocubensis, which would then have undergone a host jump to Eucalyptus. Four populations of C. deuterocubensis were collected from cankers on Eucalyptus and Melastoma trees in Indonesia and Malaysia. These were subjected to population genetic analyses using microsatellite markers. The genetic diversity for isolates in all four populations was relatively low ($\overline{H}$ = 0.166–0.316) and unlikely to represent a native source. The maximum percentage of genotypic diversity was high for all four populations (Gmax = 60.0–71.5%) irrespective of the host. Structure analyses and haplotype networks showed no structure related to the host plant species. Overall, the results failed to provide evidence that the indigenous M. malabathricum is the native host or origin of C. deuterocubensis infections on the Eucalyptus trees studied.
Article
Eye color prediction based on an individual’s genetic information is of interest in the field of forensic genetics. In recent years, researchers have studied different genes and markers associated with this externally visible characteristic and have developed methods for its prediction. The IrisPlex represents a validated tool for homogeneous populations, though its applicability in populations of mixed ancestry is limited, mainly regarding the prediction of intermediate eye colors. With the aim of validating the applicability of this system in an admixed population from Argentina (n = 302), we analyzed the six single nucleotide variants used in that multiplex for eye color and four additional SNPs, and evaluated its prediction ability. We also performed a genotype–phenotype association analysis. This system proved to be useful when dealing with the extreme ends of the eye color spectrum (blue and brown) but presented difficulties in determining the intermediate phenotypes (green), which were found in a large proportion of our population. We concluded that these genetic tools should be used with caution in admixed populations and that more studies are required in order to improve the prediction of intermediate phenotypes.
Article
Microsatellites have been a workhorse of evolutionary genetic studies for decades and are still commonly in use for estimating signatures of genetic diversity at the population and species level across a multitude of taxa. Yet, the very high mutation rate of these loci is a double-edged sword, conferring great sensitivity at shallow levels of analysis (e.g. paternity analysis) but yielding considerable uncertainty for deeper evolutionary comparisons. For the present study, we used reduced representation genome-wide data (restriction site-associated DNA sequencing (RADseq)) to test for patterns of interspecific hybridization previously characterized using microsatellite data in a contact zone between two closely related mouse lemur species in Madagascar (Microcebus murinus and Microcebus griseorufus). We revisit this system by examining populations in, near, and far from the contact zone, including many of the same individuals that had previously been identified as hybrids with microsatellite data. Surprisingly, we find no evidence for admixed nuclear ancestry. Instead, re-analyses of microsatellite data and simulations suggest that previously inferred hybrids were false positives and that the program NewHybrids can be particularly sensitive to erroneously inferring hybrid ancestry. Combined with results from coalescent-based analyses and evidence for local syntopic co-occurrence, we conclude that the two mouse lemur species are in fact completely reproductively isolated, thus providing a new understanding of the evolutionary rate whereby reproductive isolation can be achieved in a primate.
Article
Understanding the genetic basis of quality-related traits contributes to the improvement of grain protein concentration (GPC), grain starch concentration (GSC), and wet gluten concentration (WGC) in wheat. A genome-wide association study (GWAS) based on a mixed linear model (MLM) was performed on 236 wheat accessions, including 160 cultivars and 76 landraces, using a 55K single nucleotide polymorphism (SNP) array in multiple environments. A total of twelve stable QTL/SNPs controlling different quality traits were identified in this population in at least two environments under stripe rust stress; three, seven, and two QTLs associated with GPC, GSC, and WGC, respectively, were characterized and located on chromosomes 1B, 1D, 2A, 2B, 2D, 3B, 3D, 5D, and 7D, with phenotypic variation explained (PVE) ranging from 4.2 to 10.7%. Compared with the previously reported QTLs/genes, five QTLs (QGsc.sicau-1BL, QGsc.sicau-1DS, QGsc.sicau-2DL.1, QGsc.sicau-2DL.2, QWgc.sicau-5DL) were potentially novel. KASP markers for SNPs AX-108770574 and AX-108791420 on chromosome 5D associated with wet gluten concentration were successfully developed. The wet gluten concentration of the cultivars containing the A-allele in AX-108770574 and T-allele in AX-108791420 was highly significantly (P<0.01) higher than that of the landraces containing the G-allele or C-allele in each of the environments. The developed and validated KASP markers could be utilized in molecular breeding aiming to improve the quality in wheat.
Article
The widespread human commensal blindsnake species Indotyphlops braminus is currently the only known obligate parthenogenetic snake species. It is also known to be triploid. However, much of these data is from specimens collected outside India which is the native range of this species. Polyploidy and parthenogenesis are often associated with hybridization in amphibians and lizards. In this study, we generated nuclear and mitochondrial data from multiple Indotyphlops lineages from across peninsular India and investigated the possible hybrid origin of I. braminus. Species delimitation suggested three putative species, one of which was I. pammeces and the other two morphologically matched I. braminus. One of these was confined to the wet zone (high rainfall areas) while the other was largely distributed in the dry zone. There was wide discordance in the relationships between these lineages across markers and different tree building approaches suggesting past or ongoing geneflow. The statistical test for hybridization also implied geneflow across these three lineages. Furthermore, the dry zone I. braminus appears to be true I. braminus as the topotypic material falls within this clade. These results suggest that the widespread, commensal, and parthenogenetic Indotyphlops is a separate species from I. braminus, and further investigation is required to determine diagnostic morphological characters for a species description.
Article
Hybridization between diverging lineages is associated with the generation and loss of species diversity, introgression, adaptation, and changes in reproductive mode, but it is unknown when and why it results in these divergent outcomes. We estimate a comprehensive evolutionary network for the largest group of unisexual vertebrates and use it to understand the evolutionary outcomes of hybridization. Our results show that rates of introgression between species decrease with time since divergence and suggest that species must attain a threshold of evolutionary divergence before hybridization results in transitions to unisexuality. Rates of hybridization also predict genome-wide patterns of genetic diversity in whiptail lizards. These results distinguish among models for hybridization that have not previously been tested and suggest that the evolutionary outcomes can be predictable.
Article
This study investigated the genetic variability existing within the indigenous sheep population of Benin. Hair samples from 681 unrelated sheep collected across the 10 phytogeographic zones of Benin were genotyped using a set of 12 microsatellite markers. Genetic diversity indices, a Bayesian ancestral admixture model, Self-Organizing Map (SOM), and Discriminant Analysis of Principal Components (DAPC) were used to assess both the genetic diversity and spatial structure of the population under study. The polymorphism information content (PIC) recorded for each microsatellite marker used was greater than 0.50 with an average value of 0.70. The average number of alleles (12.17) and heterozygosity values (He = 0.73; Ho = 0.68) obtained suggest high genetic diversity within the Beninese sheep population. The population genetic structure analysis revealed two ancestral sheep populations in Benin, namely, the Djallonké and Sahelian sheep. Furthermore, SOM and DAPC identified four clusters of sheep within the 10 phytogeographic zones, including two Djallonké and two Sahelian subpopulations with some admixture. Djallonké sheep were predominant in the humid zones of southern Benin, with one subpopulation mainly present in the Pobè and Oueme Valley zones, and the second in Coastal, Plateau, and Zou zones. The latter subpopulation seemed the most admixed, with a mean Djallonké ancestral proportion of 86.15%. Sahelian sheep were predominant in the phytogeographic zones of northern Benin. Sheep from the driest North zone constituted the first Sahelian subpopulation and had a higher Sahelian ancestry proportion (89.81%) compared to the second subpopulation (69.29%), which mostly consisted of sheep from the other phytogeographic zones of the North.
Article
The ancestry of each locus of the genome can be estimated (local ancestry) based on sequencing or genotyping information together with reference panels of ancestral source populations. The lengths of those ancestry-specific genomic segments are commonly used to understand migration waves and admixture events. In short time scales, it is often of interest to determine the existence of the most recent unadmixed ancestor from a specific population t generations ago. We built a hypothesis test to determine if an individual has an ancestor belonging to a target ancestral population t generations ago based on the lengths of the ancestry-specific segments at an individual level. We applied this test on a data set that includes 20 Uruguayan admixed individuals to estimate for each one how many generations ago the most recent indigenous ancestor lived. As this method tests each individual separately, it is particularly suited to small sample sizes, such as our study or ancient genome samples.
Article
Full-text available
The Acacia senegal complex of species is formed by a set of species belonging to the subgenus Aculeiferum. Morphological differentiation is very often difficult within this complex, especially between the closest species. The objective of this study is to differentiate species of A. senegal based on genetic markers. A total of 217 samples collected from five natural stands in Niger were analyzed using 11 microsatellite markers. A strong genetic structure was observed with three major groups. Such genetic structuring confirms the heterogeneity of the samples that would come from different species. A large genetic diversity has been revealed in the five populations. The allelic richness ranges from 4.20 to 5.71, the observed heterozygosity rate Ho varies from 0.48 to 0.76 and the expected heterozygosity rate He varies from 0.50 to 0.53. The genetic markers used here may help differentiate species of the A. senegal complex and quantify their level of genetic diversity. Key words: Acacia senegal, microsatellites, diversity, genetic structure, Niger.
Article
Human-mediated dispersal of animals often acts to bring populations that have been separated for substantial periods of evolutionary time (e.g. millions of years) in their native range into contact in their introduced range. Whether these taxa successfully interbreed in the introduced range provides information on the strength of reproductive isolation amongst them. The invasive delicate skink (Lampropholis delicata) has been accidentally introduced to Lord Howe Island from four genetically divergent (>2 million years) regions of the species’ native range in eastern Australia. We used mitochondrial DNA and microsatellite data to investigate whether the individuals from four of the native-range source regions are interbreeding on Lord Howe Island. Our analyses indicate that intraspecific hybridisation among individuals from all four native-range source regions is occurring. Although there is little evidence for hybrids in the northern end of Lord Howe Island (proportion of hybrids: 0–0.02; n = 31), there is a high proportion of hybrids in the central (0.33–0.69; n = 59) and southern regions (0.38–0.75; n = 8) of the island. Given the strong evidence for interbreeding among all four native-range source regions examined, and the relatively minor morphological, life-history and phenotypic variation among them, we suggest that the delicate skink should continue to be treated as a single, widespread, but variable species.
Article
Full-text available
While anadromous salmonids reproduce in fresh water, most harvests occur at sea. Effective genetic management requires knowledge of the stock (source population) composition of the harvest. This is accomplished with genetic stock identification (GSI), which compares the genotypes of harvested fish with those of freshwater stocks, assuming that all candidate stocks are identified and that their allele frequencies are known exactly. We develop methods that: (1) allow for sampling error in allele frequencies of candidate stocks, and (2) evaluate the possibility of unsampled contributing stocks. Composition analysis for chinook salmon (Oncorhynchus tshawytscha) collected for the Bonneville Dam egg bank program in 1980 and 1981 shows that about 10% of both harvests were from the Deschutes River and about 90% from the Hanford Reach area. Contributions from lower Columbia and Snake River stocks or from unidentified sources were limited.
Article
Full-text available
As currently defined, DNA fingerprint profiles do not uniquely identify individuals. For criminal cases involving DNA evidence, forensic scientists evaluate the conditional probability that an unknown, but distinct, individual matches the crime sample, given that the defendant matches. Estimates of the conditional probability of observing matching profiles are based on reference populations maintained by forensic testing laboratories. Each of these databases is heterogeneous, being composed of subpopulations of different heritages. This heterogeneity has an impact on the weight of the evidence. A hierarchical Bayes model is formulated that incorporates the key physical characteristics inherent in these data. With the help of Markov chain Monte Carlo sampling, levels of heterogeneity are estimated for three major ethnic groups in the database of Lifecodes Corporation.
Article
Full-text available
Genetic variation at hypervariable loci is being used extensively for linkage analysis and individual identification, and may be useful for inter-population studies. Here we show that polymorphic microsatellites (primarily CA repeats) allow trees of human individuals to be constructed that reflect their geographic origin with remarkable accuracy. This is achieved by the analysis of a large number of loci for each individual, in spite of the small variations in allele frequencies existing between populations. Reliable evolutionary relationships could also be established in comparisons among human populations but not among great ape species, probably because of constraints on allele length variation. Among human populations, diversity of microsatellites is highest in Africa, which is in contrast to other nuclear markers and supports the hypothesis of an African origin for humans.
Article
Full-text available
A method is proposed for allowing for the effects of population differentiation, and other factors, in forensic inference based on DNA profiles. Much current forensic practice ignores, for example, the effects of coancestry and inappropriate databases and is consequently systematically biased against defendants. Problems with the 'product rule' for forensic identification have been highlighted by several authors, but important aspects of the problems are not widely appreciated. This arises in part because the match probability has often been confused with the relative frequency of the profile. Further, the analogous problems in paternity cases have received little attention. The proposed method is derived under general assumptions about the underlying population genetic processes. Probabilities relevant to forensic inference are expressed in terms of a single parameter whose values can be chosen to reflect the specific circumstances. The method is currently used in some UK courts and has important advantages over the 'Ceiling Principle' method, which has been criticized on a number of grounds.
Article
Full-text available
Attempts to study the genetic population structure of large mammals are often hampered by the low levels of genetic variation observed in these species. Polar bears have particularly low levels of genetic variation with the result that their genetic population structure has been intractable. We describe the use of eight hypervariable microsatellite loci to study the genetic relationships between four Canadian polar bear populations: the northern Beaufort Sea, southern Beaufort Sea, western Hudson Bay, and Davis Strait-Labrador Sea. These markers detected considerable genetic variation, with average heterozygosity near 60% within each population. Interpopulation differences in allele frequency distribution were significant between all pairs of populations, including two adjacent populations in the Beaufort Sea. Measures of genetic distance reflect the geographic distribution of populations, but also suggest patterns of gene flow which are not obvious from geography and may reflect movement patterns of these animals. Distribution of variation is sufficiently different between the Beaufort Sea populations and the two more eastern ones that the region of origin for a given sample can be predicted based on its expected genotype frequency using an assignment test. These data indicate that gene flow between local populations is restricted despite the long-distance seasonal movements undertaken by polar bears.
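The assignment test mentioned in this abstract can be illustrated with a minimal sketch. It assumes Hardy-Weinberg proportions within each locus and independence across loci, and it is not necessarily the exact procedure used in the study. The population names, allele labels, and frequencies below are hypothetical toy values, and the function names are illustrative only.

```python
import math


def genotype_log_likelihood(genotype, allele_freqs):
    """Log probability of a diploid multilocus genotype in one population.

    genotype: list of (allele_a, allele_b) tuples, one per locus.
    allele_freqs: list of dicts, one per locus, mapping allele -> frequency.
    Assumes Hardy-Weinberg proportions and independent loci (an assumption
    of this sketch); all frequencies used must be non-zero.
    """
    total = 0.0
    for (a, b), freqs in zip(genotype, allele_freqs):
        p, q = freqs[a], freqs[b]
        # Homozygote: p^2; heterozygote: 2pq.
        prob = p * p if a == b else 2.0 * p * q
        total += math.log(prob)
    return total


def assign(genotype, populations):
    """Return the population with the highest genotype likelihood, plus all scores."""
    scores = {
        name: genotype_log_likelihood(genotype, freqs)
        for name, freqs in populations.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical allele frequencies at two loci in two populations.
populations = {
    "north": [{"A": 0.8, "B": 0.2}, {"C": 0.7, "D": 0.3}],
    "east": [{"A": 0.3, "B": 0.7}, {"C": 0.2, "D": 0.8}],
}
individual = [("A", "A"), ("C", "D")]

best, scores = assign(individual, populations)
```

In practice, allele frequencies are estimated from samples, so an allele unseen in a population needs some correction (for example a small pseudo-count) before the logarithm is taken.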
Article
Full-text available
In DNA profile analysis, uncertainty arises due to a number of factors such as sampling error, single bands and correlations within and between loci. One of the most important of these factors is kinship: criminal and innocent suspect may share one or more bands through identity by descent from a common ancestor. Ignoring this uncertainty is consistently unfair to innocent suspects. The effect is usually small, but may be important in some cases. The report of the US National Research Committee proposed a complicated, ad-hoc and overly-conservative method of dealing with some of these problems. We propose an alternative approach which addresses directly the effect of kinship. Whilst remaining conservative, it is simple, logically coherent and makes efficient use of the data.
Article
Full-text available
Immigration is an important force shaping the social structure, evolution, and genetics of populations. A statistical method is presented that uses multilocus genotypes to identify individuals who are immigrants, or have recent immigrant ancestry. The method is appropriate for use with allozymes, microsatellites, or restriction fragment length polymorphisms (RFLPs) and assumes linkage equilibrium among loci. Potential applications include studies of dispersal among natural populations of animals and plants, human evolutionary studies, and typing zoo animals of unknown origin (for use in captive breeding programs). The method is illustrated by analyzing RFLP genotypes in samples of humans from Australian, Japanese, New Guinean, and Senegalese populations. The test has power to detect immigrant ancestors, for these data, up to two generations in the past even though the overall differentiation of allele frequencies among populations is low.
Article
Full-text available
We analyzed the European genetic contribution to 10 populations of African descent in the United States (Maywood, Illinois; Detroit; New York; Philadelphia; Pittsburgh; Baltimore; Charleston, South Carolina; New Orleans; and Houston) and in Jamaica, using nine autosomal DNA markers. These markers either are population-specific or show frequency differences >45% between the parental populations and are thus especially informative for admixture. European genetic ancestry ranged from 6.8% (Jamaica) to 22.5% (New Orleans). The unique utility of these markers is reflected in the low variance associated with these admixture estimates (SEM 1.3%-2.7%). We also estimated the male and female European contribution to African Americans, on the basis of informative mtDNA (haplogroups H and L) and Y Alu polymorphic markers. Results indicate a sex-biased gene flow from Europeans, the male contribution being substantially greater than the female contribution. mtDNA haplogroups analysis shows no evidence of a significant maternal Amerindian contribution to any of the 10 populations. We detected significant nonrandom association between two markers located 22 cM apart (FY-null and AT3), most likely due to admixture linkage disequilibrium created in the interbreeding of the two parental populations. The strength of this association and the substantial genetic distance between FY and AT3 emphasize the importance of admixed populations as a useful resource for mapping traits with different prevalence in two parental populations.
Article
Full-text available
We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. We follow Dempster in examining the posterior distribution of the log-likelihood under each model, from which we derive measures of fit and complexity (the effective number of parameters). These may be combined into a Deviance Information Criterion (DIC), which is shown to have an approximate decision-theoretic justification. Analytic and asymptotic identities reveal the measure of complexity to be a generalisation of a wide range of previous suggestions, with particular reference to the neural network literature. The contributions of individual observations to fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. The procedure is illustrated in a number of examples, and throughout it is emphasised that the required quantities are trivial to compute in a Markov chain Monte Carlo analysis, and require no analytic work for new...
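The abstract does not spell out the quantities it names. As a reading aid, the standard definitions usually associated with the Deviance Information Criterion are given here; the symbols $y$ (data), $\theta$ (parameters), and $\bar{\theta}$ (posterior mean of $\theta$) are notation assumed for this sketch.

$$
D(\theta) = -2 \log p(y \mid \theta) + \text{const}, \qquad
\bar{D} = \mathrm{E}_{\theta \mid y}\left[ D(\theta) \right],
$$

$$
p_D = \bar{D} - D(\bar{\theta}), \qquad
\mathrm{DIC} = \bar{D} + p_D = D(\bar{\theta}) + 2\,p_D .
$$

Here $\bar{D}$ measures fit, $p_D$ is the effective number of parameters, and both can be estimated by averaging over draws from a Markov chain Monte Carlo run.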
Article
The utilization of DNA evidence in cases of forensic identification has become widespread over the last few years. The strength of this evidence against an individual standing trial is typically presented in court in the form of a likelihood ratio (LR) or its reciprocal (the profile match probability). The value of this LR will vary according to the nature of the genetic relationship between the accused and other possible perpetrators of the crime in the population. This paper develops ideas and methods for analysing data and evaluating LRs when the evidence is based on short tandem repeat profiles, with special emphasis placed on a Bayesian approach. These are then applied in the context of a particular quadruplex profiling system used for routine case-work by the UK Forensic Science Service.
Article
In the context of Bayes estimation via Gibbs sampling, with or without data augmentation, a simple approach is developed for computing the marginal density of the sample data (marginal likelihood) given parameter draws from the posterior distribution. Consequently, Bayes factors for model comparisons can be routinely computed as a by-product of the simulation. Hitherto, this calculation has proved extremely challenging. Our approach exploits the fact that the marginal density can be expressed as the prior times the likelihood function over the posterior density. This simple identity holds for any parameter value. An estimate of the posterior density is shown to be available if all complete conditional densities used in the Gibbs sampler have closed-form expressions. To improve accuracy, the posterior density is estimated at a high density point, and the numerical standard error of resulting estimate is derived. The ideas are applied to probit regression and finite mixture models.
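The identity described in this abstract can be written out as follows. The symbols are assumptions chosen for illustration: $y$ is the data, $\theta$ the parameter, $f(y \mid \theta)$ the likelihood, $\pi(\theta)$ the prior, and $\theta^{*}$ a fixed high-density point.

$$
m(y) = \frac{f(y \mid \theta)\,\pi(\theta)}{\pi(\theta \mid y)} \quad \text{for any } \theta,
$$

$$
\log \hat{m}(y) = \log f(y \mid \theta^{*}) + \log \pi(\theta^{*}) - \log \hat{\pi}(\theta^{*} \mid y),
$$

where $\hat{\pi}(\theta^{*} \mid y)$ is the estimate of the posterior density at $\theta^{*}$ obtained from the Gibbs output.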
Article
We provide a detailed, introductory exposition of the Metropolis-Hastings algorithm, a powerful Markov chain method to simulate multivariate distributions. A simple, intuitive derivation of this method is given along with guidance on implementation. Also discussed are two applications of the algorithm, one for implementing acceptance-rejection sampling when a blanketing function is not available and the other for implementing the algorithm with block-at-a-time scans. In the latter situation, many different algorithms, including the Gibbs sampler, are shown to be special cases of the Metropolis-Hastings algorithm. The methods are illustrated with examples.
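A minimal sketch of one special case, random-walk Metropolis with a symmetric Gaussian proposal, may help make the algorithm concrete. It is not the exposition's own code. The target density, step size, seed, and burn-in length are arbitrary choices made for illustration.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalised log density."""
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Symmetric proposal, so the Hastings ratio reduces to the
        # ratio of target densities at the proposed and current points.
        y = x + rng.gauss(0.0, step)
        log_q = log_target(y)
        # Accept with probability min(1, p(y) / p(x)).
        if rng.random() < math.exp(min(0.0, log_q - log_p)):
            x, log_p = y, log_q
            accepted += 1
        # A rejected proposal repeats the current state.
        samples.append(x)
    return samples, accepted / n_samples


def log_std_normal(x):
    # Standard normal log density, up to an additive constant.
    return -0.5 * x * x


samples, acc_rate = metropolis_hastings(
    log_std_normal, x0=5.0, n_samples=20000, step=2.0
)
kept = samples[2000:]  # discard an initial stretch as burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
```

Only an unnormalised density is needed, because the normalising constant cancels in the acceptance ratio.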
Article
New methodology for fully Bayesian mixture analysis is developed, making use of reversible jump Markov chain Monte Carlo methods that are capable of jumping between the parameter subspaces corresponding to different numbers of components in the mixture. A sample from the full joint distribution of all unknown variables is thereby generated, and this can be used as a basis for a thorough presentation of many aspects of the posterior distribution. The methodology is applied here to the analysis of univariate normal mixtures, using a hierarchical prior model that offers an approach to dealing with weak prior information while avoiding the mathematical pitfalls of using improper priors in the mixture context.
Article
Recently founded populations represent an enormous challenge for genetic analysis: new populations are often genetically impoverished, making it hard to find sufficiently variable markers, and what little variation is present tends to be ancestral, rendering phylogenetic methods inappropriate. Recently, novel genetic markers and new statistical analyses have made multilocus genotyping an invaluable tool in the fledgling field of nonequilibrium population genetics. Such advances are not of mere academic interest but address questions of great economic, medical and conservation significance.
Article
To test hypotheses about the origin of modern humans, we analyzed mtDNA sequences, 30 nuclear restriction-site polymorphisms (RSPs), and 30 tetranucleotide short tandem repeat (STR) polymorphisms in 243 Africans, Asians, and Europeans. An evolutionary tree based on mtDNA displays deep African branches, indicating greater genetic diversity for African populations. This finding, which is consistent with previous mtDNA analyses, has been interpreted as evidence for an African origin of modern humans. Both sets of nuclear polymorphisms, as well as a third set of trinucleotide polymorphisms, are highly consistent with one another but fail to show deep branches for African populations. These results, which represent the first direct comparison of mtDNA and nuclear genetic data in major continental populations, undermine the genetic evidence for an African origin of modern humans.
Article
Our goal is to infer, from human genetic data, general patterns as well as details of human evolutionary history. Here we present the results of an analysis of genetic data at the level of the individual. A tree relating 144 individuals from 12 human groups of Africa, Asia, Europe, and Oceania, inferred from an average of 75 DNA polymorphisms/individual, is remarkable in that most individuals cluster with other members of their regional group. In order to interpret this tree, we consider the factors that influence the tree pattern, including the number of genetic loci examined, the length of population isolation, the sampling process, and the extent of gene flow among groups. Understanding the impact of these factors enables us to infer details of human evolutionary history that might otherwise remain undetected. Our analyses indicate that some recent ancestor(s) of each of a few of the individuals tested may have immigrated. In general, the populations within regional groups appear to have been isolated from one another for <25,000 years. Regional groups may have been isolated for somewhat longer.
Article
We examine the issue of population stratification in association-mapping studies. In case-control studies of association, population subdivision or recent admixture of populations can lead to spurious associations between a phenotype and unlinked candidate loci. Using a model of sampling from a structured population, we show that if population stratification exists, it can be detected by use of unlinked marker loci. We show that the case-control-study design, using unrelated control individuals, is a valid approach for association mapping, provided that marker loci unlinked to the candidate locus are included in the study, to test for stratification. We suggest guidelines as to the number of unlinked marker loci to use.
Article
Richardson and Green (1997) present a method of performing a Bayesian analysis of data from a finite mixture distribution with an unknown number of components. Their method is a Markov Chain Monte Carlo (MCMC) approach, which makes use of the "reversible jump" methodology described by Green (1995). We describe an alternative MCMC method which views the parameters of the model as a (marked) point process, extending methods suggested by Ripley (1977) to create a Markov birth-death process with an appropriate stationary distribution. Our method is easy to implement, even in the case of data in more than one dimension, and we illustrate it on both univariate and bivariate data.
Keywords: Bayesian analysis, Birth-death process, Markov process, MCMC, Mixture model, Model Choice, Reversible Jump, Spatial point process
1 Introduction
Finite mixture models are typically used to model data where each observation is assumed to have arisen from one of k groups, each group being suitably modelle...
Article
Markov chain Monte Carlo methods for Bayesian computation have until recently been restricted to problems where the joint distribution of all variables has a density with respect to some fixed standard underlying measure. They have therefore not been available for application to Bayesian model determination, where the dimensionality of the parameter vector is typically not fixed. This paper proposes a new framework for the construction of reversible Markov chain samplers that jump between parameter subspaces of differing dimensionality, which is flexible and entirely constructive. It should therefore have wide applicability in model determination problems. The methodology is illustrated with applications to multiple change-point analysis in one and two dimensions, and to a Bayesian comparison of binomial experiments.
Article
In a Bayesian analysis of finite mixture models, parameter estimation and clustering are sometimes less straightforward than might be expected. In particular, the common practice of estimating parameters by their posterior mean, and summarising joint posterior distributions by marginal distributions, often leads to nonsensical answers. This is due to the so-called "label-switching" problem, which is caused by symmetry in the likelihood of the model parameters. A frequent response to this problem is to remove the symmetry using artificial identifiability constraints. We demonstrate that this fails in general to solve the problem, and describe an alternative class of approaches, relabelling algorithms, which arise from attempting to minimise the posterior expected loss under a class of loss functions. We describe in detail one particularly simple and general relabelling algorithm, and illustrate its success in dealing with the label-switching problem on two examples. KEYWORDS: ...
• DiCiccio, T., R. Kass, A. Raftery and L. Wasserman. Computing Bayes factors by posterior simulation and asymptotic approximations.
• Ewens, W. J., and R. S. Spielman. The transmission/disequilibrium test: history, subdivision, and admixture.
• Galbusera, P., L. Lens, E. Waiyaki, T. Schenck and E. Matthysen. Effective population size and gene flow in the globally, critically endangered Taita thrush, Turdus helleri.
• Hudson, R. R., 1990 Gene genealogies and the coalescent process, pp. 1-44 in Oxford Surveys in Evolutionary Biology, Vol. 7, edited by D. Futuyma and J. Antonovics. Oxford University Press, Oxford.
• Jorde, L. B., M. J. Bamshad, W. S. Watkins, R. Zenger, A. E. Fraley et al., 1995 Origins and affinities of modern humans: a comparison of mitochondrial and nuclear genetic data. Am. J. Hum. Genet. 57: 523-538.
• Mountain, J. L., and L. L. Cavalli-Sforza, 1997 Multilocus genotypes, a tree of individuals, and human evolutionary history. Am. J. Hum. Genet. 61: 705-718.
• Raftery, A. E., 1996 Hypothesis testing and model selection, pp. [...], edited by W. R. Gilks, S. Richardson and D. J. Spiegelhalter. Chapman & Hall, London.
• Rannala, B., and J. L. Mountain, 1997 Detecting immigration by using multilocus genotypes. Proc. Natl. Acad. Sci. USA 94: 9197-9201.
• Richardson, S., and P. J. Green, 1997 On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. [...]
• Spiegelhalter, D. J., N. G. Best and B. P. Carlin. Bayesian deviance, the effective number of parameters, and the comparison of arbitrarily complex models.
• Stephens, M., 2000b Dealing with label-switching in mixture models. J. R. Stat. Soc. Ser. B (in press).

Communicating editor: M. K. Uyenoyama

[...] chains considered in this article. Furthermore, for sufficiently large $c$, $\theta^{(m)}, \theta^{(m+c)}, \theta^{(m+2c)}, \ldots$ will be reasonably independent samples from $\pi(\theta)$. The value of $m$ used is often referred to as the burn-in period of the chain; $c$ is often referred to as the thinning interval.

In general it is very difficult to know how large $m$ and $c$ should be. The values required to obtain reliable results depend heavily on the amount of correlation between successive states of the Markov chain. If successive [...] required, possibly rendering the method impracticable. [...] sufficiently large, and the strategy we adopt here, is to simulate several realizations of the Markov chain, each starting from a different value of $\theta^{(0)}$. If $m$ and $c$ are sufficiently large, then the results obtained should be independent of $\theta^{(0)}$ and should therefore be similar for [...]

[...] Suppose that $\theta$ may be partitioned into $\theta = (\theta_1, \ldots, \theta_r)$, and that although it is not possible to simulate from $\pi(\theta)$ directly, it is possible to simulate a random value of $\theta_i$ directly from the full conditional distribution $\pi(\theta_i \mid \theta_1, \theta_2, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_r)$ for $i = 1, 2, \ldots,$ [...]
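The ideas of a burn-in period, a thinning interval, several chains started from different initial values, and sampling each component from its full conditional distribution can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the sampler used in the article: the target is a standard bivariate normal with correlation `rho`, chosen because its full conditionals are known in closed form, and the values of `m`, `c`, the starting points, and the seeds are arbitrary.

```python
import random


def gibbs_bivariate_normal(rho, theta0, m, c, n_keep, seed):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    m is the burn-in length and c the thinning interval: the states at
    iterations m, m + c, m + 2c, ... are kept, n_keep of them in total.
    """
    rng = random.Random(seed)
    x, y = theta0
    sd = (1.0 - rho * rho) ** 0.5
    kept = []
    total = m + c * (n_keep - 1) + 1
    for t in range(total):
        # Draw each component from its full conditional given the other:
        # x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        if t >= m and (t - m) % c == 0:
            kept.append((x, y))
    return kept


# Several realizations of the chain, each from a different starting value.
starts = [(-10.0, 10.0), (0.0, 0.0), (10.0, -10.0)]
chain_means = []
for i, start in enumerate(starts):
    draws = gibbs_bivariate_normal(
        rho=0.8, theta0=start, m=500, c=10, n_keep=2000, seed=i
    )
    chain_means.append(sum(x for x, _ in draws) / len(draws))

# If m and c are large enough, the chains should agree with one another.
spread = max(chain_means) - min(chain_means)
```

A small `spread` across chains is consistent with adequate burn-in and thinning, although agreement between chains does not by itself prove convergence.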