A Markov Chain Monte Carlo Approach for Joint Inference of Population Structure and Inbreeding Rates From Multilocus Genotype Data

Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA.
Genetics (Impact Factor: 5.96). 08/2007; 176(3):1635-51. DOI: 10.1534/genetics.107.072371
Source: PubMed


Nonrandom mating induces correlations in allelic states within and among loci that can be exploited to understand the genetic structure of natural populations (Wright 1965). For many species, it is of considerable interest to quantify the contribution of two forms of nonrandom mating to patterns of standing genetic variation: inbreeding (mating among relatives) and population substructure (limited dispersal of gametes). Here, we extend the popular Bayesian clustering approach STRUCTURE (Pritchard et al. 2000) for simultaneous inference of inbreeding or selfing rates and population-of-origin classification using multilocus genetic markers. This is accomplished by eliminating the assumption of Hardy-Weinberg equilibrium within clusters and, instead, calculating expected genotype frequencies on the basis of inbreeding or selfing rates. We demonstrate the need for such an extension by showing that selfing leads to spurious signals of population substructure using the standard STRUCTURE algorithm with a bias toward spurious signals of admixture. We gauge the performance of our method using extensive coalescent simulations and demonstrate that our approach can correct for this bias. We also apply our approach to understanding the population structure of the wild relative of domesticated rice, Oryza rufipogon, an important partially selfing grass species. Using a sample of n = 16 individuals sequenced at 111 random loci, we find strong evidence for existence of two subpopulations, which correlates well with geographic location of sampling, and estimate selfing rates for both groups that are consistent with estimates from experimental data (s ≈ 0.48-0.70).
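
As a rough illustration of replacing Hardy-Weinberg proportions with inbreeding-adjusted ones, the sketch below uses the standard textbook relations for a biallelic locus: the equilibrium inbreeding coefficient under partial selfing, F = s / (2 - s), and the genotype frequencies p^2 + Fpq, 2pq(1 - F), q^2 + Fpq. The function names are hypothetical, and this is not claimed to be the likelihood used in the paper itself, which this page does not reproduce.

```python
def inbreeding_coefficient(selfing_rate):
    """Equilibrium inbreeding coefficient F = s / (2 - s) under partial selfing.

    Textbook relation; assumed here for illustration only.
    """
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must lie in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def genotype_frequencies(p, F):
    """Expected (AA, Aa, aa) frequencies for allele frequency p and inbreeding coefficient F.

    With F = 0 these reduce to the Hardy-Weinberg proportions p^2, 2pq, q^2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    homozygote_excess = F * p * q
    return (p * p + homozygote_excess,
            2.0 * p * q * (1.0 - F),
            q * q + homozygote_excess)


if __name__ == "__main__":
    # Selfing rates spanning the range quoted in the abstract (0.48 to 0.70).
    for s in (0.0, 0.48, 0.70):
        F = inbreeding_coefficient(s)
        print(s, round(F, 3), genotype_frequencies(0.5, F))
```

Under this sketch, heterozygosity falls by a factor of (1 - F) relative to random mating, which is the kind of deficit a Hardy-Weinberg-based clustering model could misread as population substructure.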

    • "Therefore, we conclude that from our study, the result of the Structure analysis, K = 2 populations, is an artifact of the Evanno's ad hoc statistic ΔK (Evanno et al., 2005) and V. rufidulum subpopulations represent single genetic cluster across KY and TN. These findings also support the contention of Gao et al. (2007) and Gilbert et al. (2012) that the program Structure should not be used with inbreeding populations. As with other obligate outcrossing species, the overall heterozygote deficit and deviation from HWE proportions found in this study may be explained by the ability of this V. rufidulum to mate "
    ABSTRACT: Viburnum rufidulum is a deciduous tree native to North America that has four-season appeal, which provides commercial horticultural value. In addition, the plant has unique and attractive red pubescence on leaf buds and petioles, common to no other Viburnum species. As habitat undergoes development and subsequent fragmentation of native plant populations, it is important to have baseline genetic information for this species. Little is known about the genetic diversity within populations of V. rufidulum. In this study, seven microsatellite loci were used to measure genetic diversity, population structure, and gene flow of 235 V. rufidulum trees collected from 17 locations in Kentucky and Tennessee. The genotype data were used to infer population genetic structure using the program InStruct and to construct an unweighted pair group method with arithmetic mean dendrogram. A single population was indicated by the program InStruct and the dendrogram clustered the locations into two groups; however, little bootstrap support was evident. Observed and expected heterozygosity were 0.49 and 0.78, respectively. Low-to-moderate genetic differentiation (F_ST = 0.06) with evidence of gene flow (Nm = 4.82) was observed among 17 populations of V. rufidulum. A significant level of genetic diversity was evident among V. rufidulum populations with most of the genetic variations among individual trees (86.37%) rather than among populations (13.63%), and a Mantel test revealed significant correlation between genetic and geographical distance (r = 0.091, P = 0.001). The microsatellites developed herein provide an initial assessment or a baseline of genetic diversity for V. rufidulum in a limited area of the southeastern region of the United States. The markers are a genetic resource and can be of assistance in breeding programs, germplasm assessment, and future studies of V. rufidulum populations, as this is the first study to provide genetic diversity data for this native species.
    No preview · Article · Nov 2015 · Journal of the American Society for Horticultural Science
    • "The best K value was evaluated by ΔK as proposed by Evanno et al. (2005). Because the Bayesian approach is under the null model of panmixia, in which each locus is at Hardy-Weinberg equilibrium and is independent of the others (Gao et al. 2007), it may overestimate K values if linkage occurs among some genotypes. Therefore, the genetic structure of P. striiformis f. sp. "
    ABSTRACT: Puccinia striiformis f. sp. tritici (Pst) is the causal pathogen of interregional epidemics of wheat stripe rust in China via long-distance migration. Gansu Province serves as a putative inoculum center providing oversummering inoculum, while the Sichuan Basin Area serves as a region providing huge amounts of overwintering inoculum. Thus, the relationship between these two regions in population exchange and migration becomes important in the prediction of interregional epidemics. In this study, we compared the population genetic structure and race composition between Gansu and Sichuan Basin populations to infer their migration relationships. A total of 526 isolates, spanning 3 years, were genotyped using 8 pairs of amplified fragment length polymorphism markers and a subset of 98 isolates were inoculated onto 19 Chinese differentials to perform the race analysis. Twenty-three common races and twenty-six shared genotypes supplied molecular evidence for migration between Gansu and Sichuan Basin populations. Bayesian assignment and principal component analysis revealed that the genetic group assignment of the Sichuan Basin populations (10SB and 11SB) changed in the spring to align with the fall Gansu populations in the prior seasons (09GS and 10GS), which indicated an asymmetric migration from Gansu Province to the Sichuan Basin Area. The linkage disequilibrium and the parsimony tree length permutation test revealed a strong annual recombination signal in the Gansu populations and an inconsistent signal in the Sichuan Basin populations.
    Full-text · Article · Oct 2015 · Phytopathology
    • "David et al. (2007) extend the approach of Enjalbert and David (2000) to accommodate errors in scoring heterozygotes as homozygotes. A primary objective of InStruct (Gao et al. 2007) is the estimation of admixture. It extends the widely-used program structure (Pritchard et al. 2000), which bases the estimation of admixture on disequilibria of various forms, by accounting for disequilibria due to selfing. "
    ABSTRACT: We present a Bayesian method for characterizing the mating system of populations reproducing through a mixture of self-fertilization and random outcrossing. Our method uses patterns of genetic variation across the genome as a basis for inference about reproduction under pure hermaphroditism, gynodioecy, and a model developed to describe the self-fertilizing killifish Kryptolebias marmoratus. We extend the standard coalescence model to accommodate these mating systems, accounting explicitly for multilocus identity disequilibrium, inbreeding depression, and variation in fertility among mating types. We incorporate the Ewens Sampling Formula (ESF) under the infinite-alleles model of mutation to obtain a novel expression for the likelihood of mating system parameters. Our Markov chain Monte Carlo (MCMC) algorithm assigns locus-specific mutation rates, drawn from a common mutation rate distribution that is itself estimated from the data using a Dirichlet Process Prior model. Our sampler is designed to accommodate additional information, including observations pertaining to the sex ratio, the intensity of inbreeding depression, and other aspects of reproduction. It can provide joint posterior distributions for the population-wide proportion of uniparental individuals, locus-specific mutation rates, and the number of generations since the most recent outcrossing event for each sampled individual. Further, estimation of all basic parameters of a given model permits estimation of functions of those parameters, including the proportion of the gene pool contributed by each sex and relative effective numbers.
    Full-text · Article · Sep 2015 · Genetics