Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples

Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America.
PLoS Genetics (Impact Factor: 7.53). 03/2010; 6(3):e1000866. DOI: 10.1371/journal.pgen.1000866
Source: PubMed


As we move forward from the current generation of genome-wide association (GWA) studies, additional cohorts of different ancestries will be studied to increase power, fine map association signals, and generalize association results to additional populations. Knowledge of genetic ancestry as well as population substructure will become increasingly important for GWA studies in populations of unknown ancestry. Here we propose genotyping pooled DNA samples using genome-wide SNP arrays as a viable option to efficiently and inexpensively estimate admixture proportion and identify ancestry informative markers (AIMs) in populations of unknown origin. We constructed DNA pools from African American, Native Hawaiian, Latina, and Jamaican samples and genotyped them using the Affymetrix 6.0 array. Aided by individual genotype data from the African American cohort, we established quality control filters to remove poorly performing SNPs and estimated allele frequencies for the remaining SNPs in each panel. We then applied a regression-based method to estimate the proportion of admixture in each cohort using the allele frequencies estimated from pooling and populations from the International HapMap Consortium as reference panels, and identified AIMs unique to each population. In this study, we demonstrated that genotyping pooled DNA samples yields estimates of admixture proportion that are both consistent with our knowledge of population history and similar to those obtained by genotyping known AIMs. Furthermore, through validation by individual genotyping, we demonstrated that pooling is quite effective for identifying SNPs with large allele frequency differences (i.e., AIMs) and that these AIMs are able to differentiate two closely related populations (HapMap JPT and CHB).
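The regression step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact model, so the two-way admixture model below is an assumption: the pooled allele frequency at each SNP is treated as a weighted mix of two reference-panel frequencies, and the weight is found by least squares. The function names `estimate_admixture` and `candidate_aims`, the 0.5 frequency-difference threshold, and the toy frequencies are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def estimate_admixture(pool: Sequence[float],
                       ref_a: Sequence[float],
                       ref_b: Sequence[float]) -> float:
    """Least-squares estimate of the ancestry fraction from panel A.

    Assumed model (not taken from the paper):
        pool[i] = alpha * ref_a[i] + (1 - alpha) * ref_b[i] + noise
    which rearranges to a regression through the origin:
        pool[i] - ref_b[i] = alpha * (ref_a[i] - ref_b[i])
    """
    if not (len(pool) == len(ref_a) == len(ref_b)):
        raise ValueError("all frequency vectors must have the same length")
    num = 0.0
    den = 0.0
    for p, a, b in zip(pool, ref_a, ref_b):
        d = a - b               # frequency difference between the panels
        num += d * (p - b)
        den += d * d
    if den == 0.0:
        raise ValueError("reference panels are identical at every SNP")
    # Clamp to [0, 1] so the result is a valid proportion.
    return min(1.0, max(0.0, num / den))


def candidate_aims(freq_x: Sequence[float],
                   freq_y: Sequence[float],
                   min_delta: float = 0.5) -> List[int]:
    """Indices of SNPs whose allele frequencies differ by at least min_delta.

    The 0.5 default is an illustrative threshold, not the paper's.
    """
    return [i for i, (x, y) in enumerate(zip(freq_x, freq_y))
            if abs(x - y) >= min_delta]


# Toy, noise-free data: four SNPs, true ancestry fraction 0.8 from panel A.
ref_a = [0.10, 0.80, 0.50, 0.30]
ref_b = [0.70, 0.20, 0.50, 0.90]
pool = [0.8 * a + 0.2 * b for a, b in zip(ref_a, ref_b)]

alpha = estimate_admixture(pool, ref_a, ref_b)
aims = candidate_aims(ref_a, ref_b)
```

SNPs where the two panels have equal frequencies (the third SNP here) contribute nothing to the estimate, which is why markers with large frequency differences are the informative ones. Real pooled estimates carry measurement error, so an actual analysis would need many more SNPs and the kind of quality-control filtering the abstract describes.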



Available from: Terrence Forrester
    • "Several earlier studies have described the identification of new or validation of known phenotype-associated genetic loci by allelotyping using older SNP genotyping array versions [2-4]. Recently, findings from allelotyping using newer generations of arrays that enable genome-wide SNP coverage, such as the Affymetrix Genome-Wide Human SNP Array 6.0, were also published [5,6]. In an allelotyping study, the estimated allele frequencies are subject to various quantitative technical errors related to pooling of the DNA samples. "
    ABSTRACT: Genome-wide association studies (GWAS) using array-based genotyping technology are widely used to identify genetic loci associated with complex diseases or other phenotypes. The costs of GWAS projects based on individual genotyping are still comparatively high and increase with the size of study populations. Genotyping using pooled DNA samples, also referred to as the allelotyping approach, offers an alternative at affordable costs. In the present study, data from 100 DNA samples individually genotyped with the Affymetrix Genome-Wide Human SNP Array 6.0 were used to estimate the error of the pooling approach by comparing the results with those obtained using the same array type but DNA pools each composed of 50 of the same samples. Newly developed and established methods for signal intensity correction were applied. Furthermore, the relative allele intensity signals (RAS) obtained by allelotyping were compared to the corresponding values derived from individual genotyping. Similarly, differences in RAS values between pools were determined and compared. Regardless of the intensity correction method applied, the pooling-specific error of the pool intensity values was larger for single pools than for the comparison of the intensity values of two pools, which reflects the scenario of a case-control study. Using 50 pooled samples and analyzing 10,000 SNPs with a minor allele frequency of >1% and applying the best correction method for the corresponding type of comparison, the 90% quantile (median) of the pooling-specific absolute error of the RAS values for single sub-pools and the SNP-specific difference in allele frequency comparing two pools was 0.064 (0.026) and 0.056 (0.021), respectively. Correction of the RAS values reduced the error of the RAS values when analyzing single pool intensities. We developed a new correction method with high accuracy but low computational costs. Correction of RAS, however, only marginally reduced the error of true differences between two sample groups and those obtained by allelotyping. Exclusion of SNPs with a minor allele frequency of <=1% notably reduced the pooling-specific error. Our findings allow for improving the estimation of the pooling-specific error and may help in designing allelotyping studies using the Affymetrix Genome-Wide Human SNP Array 6.0.
    BMC Genomics 07/2013; 14(1):506. DOI:10.1186/1471-2164-14-506 · 3.99 Impact Factor
    • "However, despite the popularity of DNA pooling in genetic association studies, only few studies to date have utilized allelotyping approach to characterize inter-population variation e.g. [28]. "
    ABSTRACT: Background: New sequencing technologies have tremendously increased the number of known molecular markers (single nucleotide polymorphisms; SNPs) in a variety of species. Concurrently, improvements to genotyping technology have now made it possible to efficiently genotype large numbers of genome-wide distributed SNPs, enabling genome-wide association studies (GWAS). However, genotyping significant numbers of individuals with large numbers of SNPs remains prohibitively expensive for many research groups. A possible solution to this problem is to determine allele frequencies from pooled DNA samples. Such ‘allelotyping’ has been presented as a cost-effective alternative to individual genotyping and has become popular in human GWAS. In this article we have tested the effectiveness of DNA pooling to obtain accurate allele frequency estimates for Atlantic salmon (Salmo salar L.) populations using an Illumina SNP-chip. Results: In total, 56 Atlantic salmon DNA pools from 14 populations were analyzed on an Atlantic salmon SNP-chip containing probes for 5568 SNP markers, 3928 of which were bi-allelic. We developed an efficient quality control filter which enables exclusion of loci showing a high error rate and a minor allele frequency (MAF) close to zero. After applying multiple quality control filters we obtained allele frequency estimates for 3631 bi-allelic loci. We observed high concordance (r > 0.99) between allele frequency estimates derived from individual genotyping and DNA pools. Our results also indicate that even relatively small DNA pools (35 individuals) can provide accurate allele frequency estimates for a given sample. Conclusions: Despite the higher level of variation associated with array replicates compared to pool construction, we suggest that both sources of variation should be taken into account. This study demonstrates that DNA pooling allows fast and high-throughput determination of allele frequencies in Atlantic salmon, enabling cost-efficient identification of informative markers for discrimination of populations at various geographical scales, as well as identification of loci controlling ecologically and economically important traits.
    BMC Genomics 01/2013; 14(1):12. DOI:10.1186/1471-2164-14-12 · 3.99 Impact Factor
    • "Efficient genetic mapping requires the ability to screen large sets of markers in a cost-efficient manner. This is also true for genome-wide association studies (GWASs), which are broadly used in mammalian systems but have only recently been implemented in plants (Aranzana et al. 2005; Gupta et al. 2005; Chiang et al. 2010; Rafalski 2002). Although many technologies that seem to be more accurate than SFP arrays are available, such as Illumina bead array or even some combinations of direct sequencing (Walsh et al. 2010), the SFP approach still has one major advantage: it can be used for variation discovery without prior knowledge of the variation at a reasonable price (this can also be done by direct sequencing, but the cost is prohibitive)."
    ABSTRACT: The availability of sequence information for many plants has opened the way to advanced genetic analysis in many non-model plants. Nevertheless, exploration of genetic variation on a large scale and its use as a tool for the identification of traits of interest are still rare. In this study, we combined a bulk segregation approach with our own-designed microarrays to map the pH locus that influences fruit pH in melon. Using these technologies, we identified a set of markers that are genetically linked to the pH trait. Further analysis using a set of melon cultivars demonstrated that some of these markers are tightly linked to the pH trait throughout our germplasm collection. These results validate the utility of combining microarray technology with a bulk segregation approach in mapping traits of interest in non-model plants.
    Theoretical and Applied Genetics 10/2012; 126(2). DOI:10.1007/s00122-012-1983-7 · 3.79 Impact Factor