Comprehensive evaluation of imputation performance in African Americans

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA.
Journal of Human Genetics (Impact Factor: 2.46). 05/2012; 57(7):411-21. DOI: 10.1038/jhg.2012.43
Source: PubMed


Imputation of genome-wide single-nucleotide polymorphism (SNP) arrays to a larger known reference panel of SNPs has become a standard and essential part of genome-wide association studies. However, little is known about the behavior of imputation in African Americans with respect to the different imputation algorithms, the reference population(s) and the reference SNP panels used. Genome-wide SNP data (Affymetrix 6.0) from 3207 African American samples in the Atherosclerosis Risk in Communities Study (ARIC) were used to systematically evaluate imputation quality and yield. Imputation was performed with the imputation algorithms MACH, IMPUTE and BEAGLE using several combinations of three HapMap III reference panels (ASW, YRI and CEU) and 1000 Genomes Project panels (pilot 1 YRI June 2010 release; EUR and AFR August 2010 and June 2011 releases) with SNP data on chromosomes 18, 20 and 22. About 10% of the directly genotyped SNPs from each chromosome were masked, and SNPs common between the reference panels were used for evaluating the imputation quality using two statistical metrics: concordance accuracy and Cohen's kappa (κ) coefficient. The dependencies of these metrics on the minor allele frequencies (MAF) and specific genotype categories (minor allele homozygotes, heterozygotes and major allele homozygotes) were thoroughly investigated to determine the best panel and method for imputation in African Americans. In addition, the power to detect imputed SNPs associated with simulated phenotypes was studied using the mean genotype of each masked SNP in the imputed data. Our results indicate that the genotype concordances after stratification into each genotype category and Cohen's κ coefficient are considerably better at differentiating imputation performance than the traditionally used total concordance statistic, and both statistics improved with increasing MAF irrespective of the imputation method.
We also find that MACH and IMPUTE performed equally well and consistently better than BEAGLE irrespective of the reference panel used. Of the various combinations of reference panels, for both HapMap III and 1000 Genomes Project reference panels, the multi-ethnic panels had better imputation accuracy than panels containing samples from a single ethnic group. The most recent 1000 Genomes Project release (June 2011) yielded a substantially higher number of imputed SNPs than HapMap III and performed as well as or better than the best combined HapMap III reference panels and previous releases of the 1000 Genomes Project.
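
The metrics named in the abstract can be illustrated with a minimal sketch. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and that the true and best-guess imputed genotypes of one masked SNP are available as equal-length lists. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter

GENOTYPES = (0, 1, 2)  # assumed coding: number of minor alleles


def total_concordance(true, imputed):
    """Fraction of samples whose imputed genotype equals the true genotype."""
    return sum(t == i for t, i in zip(true, imputed)) / len(true)


def concordance_by_genotype(true, imputed):
    """Concordance within each true genotype class (empty classes are omitted)."""
    result = {}
    for g in GENOTYPES:
        calls = [i for t, i in zip(true, imputed) if t == g]
        if calls:
            result[g] = sum(i == g for i in calls) / len(calls)
    return result


def cohens_kappa(true, imputed):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(true)
    observed = total_concordance(true, imputed)
    true_counts, imputed_counts = Counter(true), Counter(imputed)
    # Chance agreement from the marginal genotype frequencies of both call sets.
    expected = sum(
        (true_counts[g] / n) * (imputed_counts[g] / n) for g in GENOTYPES
    )
    if expected == 1.0:
        # Both call sets are monomorphic for the same genotype; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy low-MAF SNP: 18 major allele homozygotes and 2 heterozygotes,
    # with every sample imputed as a major allele homozygote.
    true = [0] * 18 + [1] * 2
    imputed = [0] * 20
    print(total_concordance(true, imputed))
    print(concordance_by_genotype(true, imputed))
    print(cohens_kappa(true, imputed))
```

In this toy case the total concordance is 0.9 even though no heterozygote is imputed correctly, while the heterozygote concordance and κ are both 0. This is the kind of difference that the stratified concordances and κ are reported to expose better than total concordance.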



Available from: Dan E Arking, Dec 05, 2014
  • "The 1000 Genomes project [18] now offers a wider range of reference populations that provide a better match of allele frequencies and linkage disequilibrium patterns for admixed populations, such as African Americans from the Southwest United States (ASW). However, there has been limited evaluation of the imputation performance of the newer reference populations in admixed study populations [8]. The present study offers a thorough evaluation of SNP genotype imputation performance in African Americans, by comparing imputation results using four different imputation software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes (February 2012 release) populations that are either closely or more broadly related to African Americans."
    ABSTRACT: Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels.
While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low frequency SNPs.
    PLoS ONE 11/2012; 7(11):e50610. DOI:10.1371/journal.pone.0050610 · 3.23 Impact Factor
  • ABSTRACT: Importance: Genetic variants associated with susceptibility to late-onset Alzheimer disease are known for individuals of European ancestry, but whether the same or different variants account for the genetic risk of Alzheimer disease in African American individuals is unknown. Identification of disease-associated variants helps identify targets for genetic testing, prevention, and treatment. Objective: To identify genetic loci associated with late-onset Alzheimer disease in African Americans. Design, Setting, and Participants: The Alzheimer Disease Genetics Consortium (ADGC) assembled multiple data sets representing a total of 5896 African Americans (1968 case participants, 3928 control participants) 60 years or older that were collected between 1989 and 2011 at multiple sites. The association of Alzheimer disease with genotyped and imputed single-nucleotide polymorphisms (SNPs) was assessed in case-control and in family-based data sets. Results from individual data sets were combined to perform an inverse variance–weighted meta-analysis, first with genome-wide analyses and subsequently with gene-based tests for previously reported loci. Main Outcomes and Measures: Presence of Alzheimer disease according to standardized criteria. Results: Genome-wide significance in fully adjusted models (sex, age, APOE genotype, population stratification) was observed for a SNP in ABCA7 (rs115550680, allele = G; frequency, 0.09 cases and 0.06 controls; odds ratio [OR], 1.79 [95% CI, 1.47-2.12]; P = 2.2 × 10^-9), which is in linkage disequilibrium with SNPs previously associated with Alzheimer disease in Europeans (0.8 < D′ < 0.9). The effect size for the SNP in ABCA7 was comparable with that of the APOE ε4–determining SNP rs429358 (allele = C; frequency, 0.30 cases and 0.18 controls; OR, 2.31 [95% CI, 2.19-2.42]; P = 5.5 × 10^-47).
Several loci previously associated with Alzheimer disease but not reaching significance in genome-wide analyses were replicated in gene-based analyses accounting for linkage disequilibrium between markers and correcting for number of tests performed per gene (CR1, BIN1, EPHA1, CD33; 0.0005 < empirical P < .001). Conclusions and Relevance: In this meta-analysis of data from African American participants, Alzheimer disease was significantly associated with variants in ABCA7 and with other genes that have been associated with Alzheimer disease in individuals of European ancestry. Replication and functional validation of this finding are needed before this information is used in clinical settings.
    JAMA The Journal of the American Medical Association 04/2013; 309(14):1483. DOI:10.1001/jama.2013.2973 · 35.29 Impact Factor
  • ABSTRACT: Numerous studies have examined gene × environment interactions (G × E) in cognitive and behavioral domains. However, these studies have been limited in that they have not been able to directly assess differential patterns of gene expression in the human brain. Here, we assessed G × E interactions using two publicly available datasets to determine whether DNA variation is associated with post-mortem brain gene expression changes based on smoking behavior, a biobehavioral construct that is part of a complex system of genetic and environmental influences. We conducted an expression quantitative trait locus (eQTL) study on two independent human brain gene expression datasets assessing G × E for selected psychiatric genes and smoking status. We employed linear regression to model the significance of the Gene × Smoking interaction term, followed by meta-analysis across datasets. Overall, we observed that the effect of DNA variation on gene expression is moderated by smoking status. Expression of 16 genes was significantly associated with single nucleotide polymorphisms that demonstrated G × E effects. The strongest finding (p = 1.9 × 10^-11) was neurexin 3-alpha (NRXN3), a synaptic cell-cell adhesion molecule involved in maintenance of neural connections (such as the maintenance of smoking behavior). Other significant G × E associations include four glutamate genes. This is one of the first studies to demonstrate G × E effects within the human brain. In particular, this study implicated NRXN3 in the maintenance of smoking. The effect of smoking on NRXN3 expression and downstream behavior differs by SNP genotype, indicating that DNA profiles based on SNPs could be useful in understanding the effects of smoking behaviors. These results suggest that better measurement of psychiatric conditions and the environment in post-mortem brain studies may yield an important avenue for understanding the biological mechanisms of G × E interactions in psychiatry.
    Journal of Child Psychology and Psychiatry 08/2013; 54(10). DOI:10.1111/jcpp.12119 · 6.46 Impact Factor