
Comprehensive evaluation of imputation performance in African Americans

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA.
Journal of Human Genetics (Impact Factor: 2.53). 05/2012; 57(7):411-21. DOI: 10.1038/jhg.2012.43
Source: PubMed

ABSTRACT Imputation of genome-wide single-nucleotide polymorphism (SNP) arrays to a larger known reference panel of SNPs has become a standard and essential part of genome-wide association studies. However, little is known about the behavior of imputation in African Americans with respect to the different imputation algorithms, the reference population(s) and the reference SNP panels used. Genome-wide SNP data (Affymetrix 6.0) from 3207 African American samples in the Atherosclerosis Risk in Communities Study (ARIC) were used to systematically evaluate imputation quality and yield. Imputation was performed with the imputation algorithms MACH, IMPUTE and BEAGLE using several combinations of three reference panels of HapMap III (ASW, YRI and CEU) and 1000 Genomes Project (pilot 1 YRI June 2010 release, EUR and AFR August 2010 and June 2011 releases) panels with SNP data on chromosomes 18, 20 and 22. About 10% of the directly genotyped SNPs from each chromosome were masked, and SNPs common between the reference panels were used for evaluating the imputation quality using two statistical metrics: concordance accuracy and Cohen's kappa (κ) coefficient. The dependencies of these metrics on the minor allele frequencies (MAF) and specific genotype categories (minor allele homozygotes, heterozygotes and major allele homozygotes) were thoroughly investigated to determine the best panel and method for imputation in African Americans. In addition, the power to detect imputed SNPs associated with simulated phenotypes was studied using the mean genotype of each masked SNP in the imputed data. Our results indicate that the genotype concordances after stratification into each genotype category and Cohen's κ coefficient are considerably better equipped to differentiate imputation performance than the traditionally used total concordance statistic, and both statistics improved with increasing MAF irrespective of the imputation method. We also find that MACH and IMPUTE performed equally well and consistently better than BEAGLE irrespective of the reference panel used. Of the various combinations of reference panels, for both HapMap III and 1000 Genomes Project reference panels, the multi-ethnic panels had better imputation accuracy than panels containing samples from only a single ethnic group. The most recent 1000 Genomes Project release (June 2011) yielded a substantially higher number of imputed SNPs than HapMap III and performed as well as or better than the best combined HapMap III reference panels and previous releases of the 1000 Genomes Project.
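
The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and that the true genotypes of the masked SNPs are known; the function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def concordance_by_category(true_gt, imputed_gt):
    """Return (overall concordance, {genotype: concordance}) for one SNP.

    Genotypes are assumed to be coded as minor-allele counts 0, 1 or 2.
    """
    if not true_gt or len(true_gt) != len(imputed_gt):
        raise ValueError("genotype lists must be non-empty and of equal length")
    totals = Counter(true_gt)
    agree = Counter(t for t, i in zip(true_gt, imputed_gt) if t == i)
    overall = sum(agree.values()) / len(true_gt)
    per_category = {g: agree[g] / totals[g] for g in totals}
    return overall, per_category


def cohens_kappa(true_gt, imputed_gt):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if not true_gt or len(true_gt) != len(imputed_gt):
        raise ValueError("genotype lists must be non-empty and of equal length")
    n = len(true_gt)
    p_obs = sum(t == i for t, i in zip(true_gt, imputed_gt)) / n
    true_freq, imp_freq = Counter(true_gt), Counter(imputed_gt)
    # Chance agreement: product of the marginal genotype frequencies.
    p_chance = sum(true_freq[g] * imp_freq[g] for g in true_freq) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # kappa is undefined when both sides are monomorphic
    return (p_obs - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    # Toy low-MAF SNP: 95 major homozygotes, 5 heterozygotes,
    # with every sample imputed as a major homozygote.
    truth = [0] * 95 + [1] * 5
    imputed = [0] * 100
    print(concordance_by_category(truth, imputed))
    print(cohens_kappa(truth, imputed))
```

In this toy case the total concordance is 0.95 even though every heterozygote is imputed incorrectly, while the heterozygote concordance and κ are both 0. This is one way to see why the stratified and chance-corrected statistics can separate methods that total concordance cannot.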

Available from: Dan E Arking, Dec 05, 2014
  • ABSTRACT: Importance: Genetic variants associated with susceptibility to late-onset Alzheimer disease are known for individuals of European ancestry, but whether the same or different variants account for the genetic risk of Alzheimer disease in African American individuals is unknown. Identification of disease-associated variants helps identify targets for genetic testing, prevention, and treatment. Objective: To identify genetic loci associated with late-onset Alzheimer disease in African Americans. Design, Setting, and Participants: The Alzheimer Disease Genetics Consortium (ADGC) assembled multiple data sets representing a total of 5896 African Americans (1968 case participants, 3928 control participants) 60 years or older that were collected between 1989 and 2011 at multiple sites. The association of Alzheimer disease with genotyped and imputed single-nucleotide polymorphisms (SNPs) was assessed in case-control and in family-based data sets. Results from individual data sets were combined to perform an inverse variance–weighted meta-analysis, first with genome-wide analyses and subsequently with gene-based tests for previously reported loci. Main Outcomes and Measures: Presence of Alzheimer disease according to standardized criteria. Results: Genome-wide significance in fully adjusted models (sex, age, APOE genotype, population stratification) was observed for a SNP in ABCA7 (rs115550680, allele = G; frequency, 0.09 cases and 0.06 controls; odds ratio [OR], 1.79 [95% CI, 1.47-2.12]; P = 2.2 × 10^−9), which is in linkage disequilibrium with SNPs previously associated with Alzheimer disease in Europeans (0.8 < D′ < 0.9). The effect size for the SNP in ABCA7 was comparable with that of the APOE ε4–determining SNP rs429358 (allele = C; frequency, 0.30 cases and 0.18 controls; OR, 2.31 [95% CI, 2.19-2.42]; P = 5.5 × 10^−47). Several loci previously associated with Alzheimer disease but not reaching significance in genome-wide analyses were replicated in gene-based analyses accounting for linkage disequilibrium between markers and correcting for number of tests performed per gene (CR1, BIN1, EPHA1, CD33; 0.0005 < empirical P < .001). Conclusions and Relevance: In this meta-analysis of data from African American participants, Alzheimer disease was significantly associated with variants in ABCA7 and with other genes that have been associated with Alzheimer disease in individuals of European ancestry. Replication and functional validation of this finding are needed before this information is used in clinical settings.
    JAMA The Journal of the American Medical Association 04/2013; 309(14):1483. DOI:10.1001/jama.2013.2973 · 30.39 Impact Factor
  • ABSTRACT: Twenty-first century nurse clinicians, scientists, and educators must be informed of and become proficient in genetic competencies to provide the best available evidence-based patient care. This article presents a historical context and basic applications of genetics, along with the attendant legal and ethical issues, to provide a framework for understanding genetics and the genomics applications used in clinical nursing practice. The implications of genomics are relevant to all areas of nursing practice, including risk assessment, education, clinical management, and future research.
    Nursing Clinics of North America 12/2013; 48(4):499-522. DOI:10.1016/j.cnur.2013.08.006 · 0.59 Impact Factor
  • ABSTRACT: Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels. While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low frequency SNPs.
    PLoS ONE 11/2012; 7(11):e50610. DOI:10.1371/journal.pone.0050610 · 3.53 Impact Factor
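
The abstracts above refer to the mean genotype of an imputed SNP and to a correlation between imputed and true genotypes. A minimal Python sketch of both ideas follows. It assumes each imputed genotype is given as posterior probabilities for 0, 1 and 2 copies of the minor allele, and it computes the empirical squared Pearson correlation against known (masked) genotypes. This is not the r2hat estimator that MaCH reports, which is estimated without the true genotypes; the function names are hypothetical.

```python
from statistics import mean


def expected_dosage(probs):
    """Mean genotype from posterior probabilities (p0, p1, p2)."""
    p0, p1, p2 = probs
    return p1 + 2.0 * p2


def dosage_r2(dosages, true_gt):
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    if len(dosages) < 2 or len(dosages) != len(true_gt):
        raise ValueError("need two or more samples and equal-length inputs")
    mx, my = mean(dosages), mean(true_gt)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, true_gt))
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in true_gt)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either side has no variation
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical posterior probabilities for four samples at one masked SNP.
    posteriors = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.3, 0.7), (0.8, 0.2, 0.0)]
    truth = [0, 1, 2, 0]
    dosages = [expected_dosage(p) for p in posteriors]
    print(dosages)
    print(dosage_r2(dosages, truth))
```

Working with the mean genotype rather than the single most likely genotype keeps the imputation uncertainty in the value that is passed to a downstream association test.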