A Markov Chain Monte Carlo Approach for Joint Inference of Population Structure and Inbreeding Rates From Multilocus Genotype Data

Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA.
Genetics (Impact Factor: 5.96). 08/2007; 176(3):1635-51. DOI: 10.1534/genetics.107.072371
Source: PubMed


Nonrandom mating induces correlations in allelic states within and among loci that can be exploited to understand the genetic structure of natural populations (Wright 1965). For many species, it is of considerable interest to quantify the contribution of two forms of nonrandom mating to patterns of standing genetic variation: inbreeding (mating among relatives) and population substructure (limited dispersal of gametes). Here, we extend the popular Bayesian clustering approach STRUCTURE (Pritchard et al. 2000) for simultaneous inference of inbreeding or selfing rates and population-of-origin classification using multilocus genetic markers. This is accomplished by eliminating the assumption of Hardy-Weinberg equilibrium within clusters and, instead, calculating expected genotype frequencies on the basis of inbreeding or selfing rates. We demonstrate the need for such an extension by showing that selfing leads to spurious signals of population substructure using the standard STRUCTURE algorithm with a bias toward spurious signals of admixture. We gauge the performance of our method using extensive coalescent simulations and demonstrate that our approach can correct for this bias. We also apply our approach to understanding the population structure of the wild relative of domesticated rice, Oryza rufipogon, an important partially selfing grass species. Using a sample of n = 16 individuals sequenced at 111 random loci, we find strong evidence for existence of two subpopulations, which correlates well with geographic location of sampling, and estimate selfing rates for both groups that are consistent with estimates from experimental data (s approximately 0.48-0.70).
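
The idea of replacing Hardy-Weinberg proportions with expected genotype frequencies that depend on an inbreeding or selfing rate can be illustrated with the standard textbook relations for a single biallelic locus. The sketch below is not the authors' implementation: it assumes the classical equilibrium relation F = s / (2 - s) between the selfing rate s and the inbreeding coefficient F, and the function names are hypothetical.

```python
def selfing_to_inbreeding(s: float) -> float:
    """Equilibrium inbreeding coefficient under partial selfing.

    Assumes the classical relation F = s / (2 - s); the paper's own
    parameterization may differ.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing rate must lie in [0, 1]")
    return s / (2.0 - s)


def genotype_frequencies(p: float, F: float) -> dict:
    """Expected genotype frequencies at a biallelic locus.

    p is the frequency of allele A and F the inbreeding coefficient.
    With F = 0 these reduce to Hardy-Weinberg proportions.
    """
    if not 0.0 <= p <= 1.0 or not 0.0 <= F <= 1.0:
        raise ValueError("p and F must lie in [0, 1]")
    q = 1.0 - p
    return {
        "AA": p * p + F * p * q,        # excess homozygotes
        "Aa": 2.0 * p * q * (1.0 - F),  # heterozygote deficit
        "aa": q * q + F * p * q,        # excess homozygotes
    }


# Illustrative values only: a selfing rate of 0.5 and allele frequency 0.5.
F = selfing_to_inbreeding(0.5)
freqs = genotype_frequencies(0.5, F)
```

Under these assumptions, a higher selfing rate raises F and shifts probability mass from heterozygotes to homozygotes, which is the departure from Hardy-Weinberg equilibrium that a clustering method assuming random mating cannot account for.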

    • "David et al. (2007) extend the approach of Enjalbert and David (2000) to accommodate errors in scoring heterozygotes as homozygotes. A primary objective of InStruct (Gao et al. 2007) is the estimation of admixture. It extends the widely-used program structure (Pritchard et al. 2000), which bases the estimation of admixture on disequilibria of various forms, by accounting for disequilibria due to selfing."
    ABSTRACT: We present a Bayesian method for characterizing the mating system of populations reproducing through a mixture of self-fertilization and random outcrossing. Our method uses patterns of genetic variation across the genome as a basis for inference about reproduction under pure hermaphroditism, gynodioecy, and a model developed to describe the self-fertilizing killifish Kryptolebias marmoratus. We extend the standard coalescence model to accommodate these mating systems, accounting explicitly for multilocus identity disequilibrium, inbreeding depression, and variation in fertility among mating types. We incorporate the Ewens Sampling Formula (ESF) under the infinite-alleles model of mutation to obtain a novel expression for the likelihood of mating system parameters. Our Markov chain Monte Carlo (MCMC) algorithm assigns locus-specific mutation rates, drawn from a common mutation rate distribution that is itself estimated from the data using a Dirichlet Process Prior model. Our sampler is designed to accommodate additional information, including observations pertaining to the sex ratio, the intensity of inbreeding depression, and other aspects of reproduction. It can provide joint posterior distributions for the population-wide proportion of uniparental individuals, locus-specific mutation rates, and the number of generations since the most recent outcrossing event for each sampled individual. Further, estimation of all basic parameters of a given model permits estimation of functions of those parameters, including the proportion of the gene pool contributed by each sex and relative effective numbers.
    Genetics 09/2015; DOI:10.1534/genetics.115.179093 · 5.96 Impact Factor
    • "e frequencies. Considering K = 1 to 20, 10^5 burn-in steps and 10^6 iterations of MCMC algorithm were run 10 times per K. The optimal number of clusters was estimated using ΔK method described by Evanno et al. (2005) and computed with STRUCTURE HARVESTER online (Earl & vonHoldt, 2011). Contrary to STRUCTURE, the INSTRUCT software (Gao et al., 2007) takes inbreeding into account and does not require the assumption of HWE within clusters."
    ABSTRACT: 1. In the South-West Indian Ocean, the honeybee Apis mellifera is found on several islands including the Seychelles archipelago. This archipelago is located 1120 km North of Madagascar, where the endemic African subspecies A. m. unicolor occurs. The genetic diversity of the honeybee populations in the Seychelles islands has never been studied, yet this species interacts with highly endemic and indigenous flora. 2. A total of 186 honeybee colonies from the three main islands: Mahé, Praslin and La Digue were collected. In addition, 107 individuals from Madagascar (A. m. unicolor) and 49 from Italy (A. m. ligustica) were analysed as reference populations. The maternal lineages were assessed using PCR-RFLP (n = 342) and sequencing (n = 121) of the mtDNA COI–COII intergenic region. Intra-Seychelles nuclear genetic diversity and structure were analysed using 15 microsatellites while comparison with reference populations was done using 14 loci. 3. All Seychellian colonies had mtDNA sequences characteristic of the African evolutionary lineage. Two sub-lineages were detected: AI sub-lineage (A1) was dominant (96.7%) on all islands and mostly represented by the subspecies A. m. unicolor, while Z sub-lineage was observed in six colonies from two islands. No mtDNA characteristic of imported European lineages was detected. 4. Nuclear genetic diversity was high and structured, suggesting restricted gene flow between islands of the archipelago. High nuclear similarities were found among the Seychellian and A. m. unicolor populations, yet significant genetic differentiation was observed. The A. m. ligustica reference population was highly differentiated from the Seychellian honeybee populations.
    Insect Conservation and Diversity 08/2015; DOI:10.1111/icad.12138 · 2.17 Impact Factor
    • "function. PCA is free from many of the population genetics assumptions underlying STRUCTURE (Gao et al. 2007; Jombart et al. 2009) and can be more useful with continuous patterns of differentiation (e.g. isolation by distance – IBD; Engelhardt & Stephens 2010) than STRUCTURE."
    ABSTRACT: Identifying the genetic structure of a species and the factors that drive it are important first steps in modern population management, in part because populations evolving from separate ancestral sources may possess potentially different characteristics. This is especially true for climate-sensitive species such as pikas, where the delimitation of distinct genetic units and the characterization of population responses to contemporary and historical environmental pressures is of particular interest. We combine a restriction-associated DNA sequencing (RADSeq) dataset containing 4,156 single nucleotide polymorphisms with ecological niche models (ENMs) of present and past habitat suitability to characterize population composition and evaluate the effects of historical range shifts, contemporary climates, and landscape factors on gene flow in Collared Pikas, which are found in Alaska and adjacent regions of northwestern Canada and are the lesser-studied of North America's two pika species. The results suggest that contemporary environmental factors contribute little to current population connectivity. Instead, genetic diversity is strongly shaped by the presence of three ancestral lineages isolated during the Pleistocene (~148 and 52 kya). Based on ENMs and genetic data, populations originating from a northern refugium experienced longer-term stability whereas both southern lineages underwent population expansion – contradicting the southern stability and northern expansion patterns seen in many other taxa. Current populations are comparable with respect to generally low diversity within populations and little to no recent admixture. The predominance of divergent histories structuring populations implies that if we are to understand and manage pika populations we must specifically assess and accurately account for the forces underlying genetic similarity.
    Molecular Ecology 06/2015; 24(14). DOI:10.1111/mec.13270 · 6.49 Impact Factor