A Markov chain Monte Carlo approach for joint inference of population structure and inbreeding rates from multilocus genotype data.

Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA.
Genetics (Impact Factor: 4.87). 08/2007; 176(3):1635-51. DOI: 10.1534/genetics.107.072371
Source: PubMed

ABSTRACT Nonrandom mating induces correlations in allelic states within and among loci that can be exploited to understand the genetic structure of natural populations (Wright 1965). For many species, it is of considerable interest to quantify the contribution of two forms of nonrandom mating to patterns of standing genetic variation: inbreeding (mating among relatives) and population substructure (limited dispersal of gametes). Here, we extend the popular Bayesian clustering approach STRUCTURE (Pritchard et al. 2000) for simultaneous inference of inbreeding or selfing rates and population-of-origin classification using multilocus genetic markers. This is accomplished by eliminating the assumption of Hardy-Weinberg equilibrium within clusters and, instead, calculating expected genotype frequencies on the basis of inbreeding or selfing rates. We demonstrate the need for such an extension by showing that selfing leads to spurious signals of population substructure using the standard STRUCTURE algorithm with a bias toward spurious signals of admixture. We gauge the performance of our method using extensive coalescent simulations and demonstrate that our approach can correct for this bias. We also apply our approach to understanding the population structure of the wild relative of domesticated rice, Oryza rufipogon, an important partially selfing grass species. Using a sample of n = 16 individuals sequenced at 111 random loci, we find strong evidence for existence of two subpopulations, which correlates well with geographic location of sampling, and estimate selfing rates for both groups that are consistent with estimates from experimental data (s approximately 0.48-0.70).
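The replacement of Hardy-Weinberg proportions with inbreeding-adjusted genotype frequencies can be illustrated with a minimal sketch. It assumes a single biallelic locus and the standard equilibrium relation between selfing rate s and inbreeding coefficient F, F = s / (2 - s). The function names are hypothetical, and this is not the likelihood implemented in the paper, only the textbook calculation the abstract alludes to.

```python
def inbreeding_from_selfing(s):
    """Equilibrium inbreeding coefficient F = s / (2 - s) for selfing rate s.

    Assumption: the population is at equilibrium under partial selfing.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing rate must lie in [0, 1]")
    return s / (2.0 - s)


def genotype_frequencies(p, f):
    """Expected genotype frequencies at a biallelic locus.

    p is the frequency of allele A and f the inbreeding coefficient.
    With f = 0 these reduce to Hardy-Weinberg proportions.
    """
    if not 0.0 <= p <= 1.0 or not 0.0 <= f <= 1.0:
        raise ValueError("p and f must lie in [0, 1]")
    q = 1.0 - p
    return {
        "AA": p * p * (1.0 - f) + p * f,
        "Aa": 2.0 * p * q * (1.0 - f),
        "aa": q * q * (1.0 - f) + q * f,
    }


if __name__ == "__main__":
    # Selfing rates spanning the range reported in the abstract.
    for s in (0.0, 0.48, 0.70):
        f = inbreeding_from_selfing(s)
        print(s, round(f, 3), genotype_frequencies(0.3, f))
```

Heterozygote frequency shrinks by the factor (1 - F) as selfing increases, which is the deficit that a model assuming Hardy-Weinberg equilibrium within clusters can misread as substructure or admixture.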

  • Source
    ABSTRACT: Pea (Pisum sativum L.), a major pulse crop grown for its protein-rich seeds, is an important component of agroecological cropping systems in diverse regions of the world. New breeding challenges imposed by global climate change and new regulations urge pea breeders to undertake more efficient methods of selection and better take advantage of the large genetic diversity present in the Pisum sativum genepool. Diversity studies conducted so far in pea used Simple Sequence Repeat (SSR) and Retrotransposon Based Insertion Polymorphism (RBIP) markers. Recently, SNP marker panels have been developed that will be useful for genetic diversity assessment and marker-assisted selection. A collection of diverse pea accessions, including landraces and cultivars of garden, field or fodder peas as well as wild peas, was characterised at the molecular level using newly developed SNP markers, as well as SSR markers and RBIP markers. The three types of markers were used to describe the structure of the collection and revealed different pictures of the genetic diversity among the collection. SSR showed the fastest rate of evolution and RBIP the slowest rate of evolution, pointing to their contrasted mode of evolution. SNP markers were then used to predict three phenotypes recorded for the collection: the date of flowering (BegFlo), the number of seeds per plant (Nseed) and thousand seed weight (TSW). Different statistical methods were tested, including the LASSO (Least Absolute Shrinkage and Selection Operator), PLS (Partial Least Squares), SPLS (Sparse Partial Least Squares), Bayes A, Bayes B and GBLUP (Genomic Best Linear Unbiased Prediction) methods, and the structure of the collection was taken into account in the prediction. Despite a limited number of 331 markers used for prediction, TSW was reliably predicted. The development of marker-assisted selection has not reached its full potential in pea until now. This paper shows that the high-throughput SNP arrays that are being developed will most probably allow for a more efficient selection in this species.
    BMC Genomics 02/2015; 16(1):105. DOI:10.1186/s12864-015-1266-1 · 4.04 Impact Factor
  • Source
    ABSTRACT: The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modeling assumptions, compares results across different pre-determined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the post-processing of results of model-based population structure analyses. For analyzing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp, and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology. This article is protected by copyright. All rights reserved.
    Molecular Ecology Resources 02/2015; DOI:10.1111/1755-0998.12387 · 7.43 Impact Factor
  • Source
    ABSTRACT: We used a Diversity Fixed Foundation Set comprising 48 inbred lines of Brassica juncea and representing all the adaption zones of the crop for association mapping. Extensive phenotypic variations were observed for all the grain yield components and root traits under both irrigated and restricted moisture conditions. The genotypes differed in their responses to moisture stress. Trait averages declined numerically under restricted moisture conditions when compared to a standard irrigation schedule. Canonical analysis demonstrated the importance of primary branches and seed size as the traits of significance for drought susceptibility index. Microsatellite markers (158), representing all the 18 chromosomes, were used to assess the population structure, linkage disequilibrium (LD) in the association panel and marker–trait associations (MTAs). A comparison of four association models [general linear model: GLM(Q-matrix/Q), mixed linear model: MLM(Q + kinship matrix/K), GLM(principal components/PC) and MLM(PC + K)] showed that GLM(PC) and MLM(PC + K), incorporating principal components and kinship matrix, were the best models. Maximum proportions of significant results were observed in the models GLM(PC) and MLM(PC + K). MLM was preferred as there were fewer false positives than GLM. Thirteen significant associations were detected between the molecular markers and agronomic traits. Of these, seven were identified under normal moisture conditions, and six under restricted moisture conditions. Marker–trait associations included four markers associated with grain yield, three with seed size, two with secondary branches and one marker each with plant height, root diameter and root length. A single marker, SB1822-1, was repeatedly detected for seed size and grain yield, and was localized at 17.5 cM (centiMorgans) on chromosome B3. Marker SB3872-3 revealed a significant effect under normal moisture conditions on seed size (R² = 15.16%) at 60.9 cM on chromosome B5 during the first year. Among the favorable alleles, SB1822-1 had the average positive phenotypic effect for seed size and grain yield. Marker cnu316-3 had maximum positive phenotypic effects on grain yield.
    Molecular Breeding 01/2015; 35(48). · 3.25 Impact Factor

