Comprehensive evaluation of imputation performance in African Americans

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA.
Journal of Human Genetics (Impact Factor: 2.53). 05/2012; 57(7):411-21. DOI: 10.1038/jhg.2012.43
Source: PubMed

ABSTRACT Imputation of genome-wide single-nucleotide polymorphism (SNP) arrays to a larger known reference panel of SNPs has become a standard and an essential part of genome-wide association studies. However, little is known about the behavior of imputation in African Americans with respect to the different imputation algorithms, the reference population(s) and the reference SNP panels used. Genome-wide SNP data (Affymetrix 6.0) from 3207 African American samples in the Atherosclerosis Risk in Communities Study (ARIC) were used to systematically evaluate imputation quality and yield. Imputation was performed with the imputation algorithms MACH, IMPUTE and BEAGLE using several combinations of three reference panels of HapMap III (ASW, YRI and CEU) and 1000 Genomes Project (pilot 1 YRI June 2010 release, EUR and AFR August 2010 and June 2011 releases) panels with SNP data on chromosomes 18, 20 and 22. About 10% of the directly genotyped SNPs from each chromosome were masked, and SNPs common between the reference panels were used for evaluating the imputation quality using two statistical metrics: concordance accuracy and Cohen's kappa (κ) coefficient. The dependencies of these metrics on the minor allele frequencies (MAF) and specific genotype categories (minor allele homozygotes, heterozygotes and major allele homozygotes) were thoroughly investigated to determine the best panel and method for imputation in African Americans. In addition, the power to detect imputed SNPs associated with simulated phenotypes was studied using the mean genotype of each masked SNP in the imputed data. Our results indicate that the genotype concordances after stratification into each genotype category and Cohen's κ coefficient differentiate imputation performance considerably better than the traditionally used total concordance statistic, and that both statistics improved with increasing MAF irrespective of the imputation method.
We also find that MACH and IMPUTE performed equally well, and consistently better than BEAGLE, irrespective of the reference panel used. Of the various combinations of reference panels, for both HapMap III and 1000 Genomes Project reference panels, the multi-ethnic panels had better imputation accuracy than those containing samples from only a single ethnic group. The most recent 1000 Genomes Project release (June 2011) yielded a substantially higher number of imputed SNPs than HapMap III and performed as well as or better than the best combined HapMap III reference panels and previous releases of the 1000 Genomes Project.
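
The abstract does not give the formulas behind the two metrics. As a minimal sketch of how they could be computed for one masked SNP, the Python below uses the textbook definitions of concordance and Cohen's κ, which may differ in detail from what the authors implemented. The 0/1/2 genotype coding (minor-allele count), the function names and the toy data are all assumptions made for illustration.

```python
from collections import Counter

# Assumed coding: each genotype is the minor-allele count (0, 1 or 2).
# true_g holds the masked (directly genotyped) calls, imputed_g the
# best-guess imputed calls for the same samples, in the same order.

def total_concordance(true_g, imputed_g):
    """Fraction of samples whose imputed genotype equals the true genotype."""
    matches = sum(t == i for t, i in zip(true_g, imputed_g))
    return matches / len(true_g)

def concordance_by_genotype(true_g, imputed_g):
    """Concordance computed separately within each true genotype category."""
    totals, hits = Counter(), Counter()
    for t, i in zip(true_g, imputed_g):
        totals[t] += 1
        hits[t] += int(t == i)
    return {g: hits[g] / totals[g] for g in totals}

def cohens_kappa(true_g, imputed_g):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(true_g)
    p_obs = total_concordance(true_g, imputed_g)
    true_freq, imp_freq = Counter(true_g), Counter(imputed_g)
    # Chance agreement from the marginal genotype frequencies.
    p_exp = sum(true_freq[g] * imp_freq[g] for g in true_freq) / (n * n)
    if p_exp == 1.0:
        return float("nan")  # kappa is undefined when only one category occurs
    return (p_obs - p_exp) / (1.0 - p_exp)

# Toy low-MAF SNP: 18 major-allele homozygotes and 2 heterozygotes,
# with every sample imputed as a major-allele homozygote.
true_g = [0] * 18 + [1] * 2
imputed_g = [0] * 20

print(total_concordance(true_g, imputed_g))
print(concordance_by_genotype(true_g, imputed_g))
print(cohens_kappa(true_g, imputed_g))
```

In this toy case the total concordance is 0.9 even though no heterozygote is called correctly, whereas the heterozygote concordance and κ are both 0. That is the kind of difference the stratified concordances and κ are reported to expose for low-MAF SNPs.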

  • ABSTRACT: Multiple imputation based on chained equations (MICE) is an alternative method for handling missing genotypes that can use genetic and nongenetic auxiliary data to inform the imputation process. Previously, MICE was successfully tested on strongly linked genetic data. We have now tested it on data of the HBA2 gene which, by the experimental design used in a malaria association study in Tanzania, shows a high missing data percentage and is weakly linked with the remaining genetic markers in the data set. We constructed different imputation models and studied their performance under different missing data conditions. Overall, MICE failed to accurately predict the true genotypes. However, using the best imputation model for the data, we obtained unbiased estimates for the genetic effects, and association signals of the HBA2 gene on malaria positivity. When the whole data set was analyzed with the same imputation model, the association signal increased from 0.80 before imputation to 2.70 after imputation. Conversely, postimputation estimates for the genetic effects remained the same in relation to the complete case analysis but showed increased precision. We argue that these postimputation estimates are reasonably unbiased, as a result of a good study design based on matching key socio-environmental factors.
    Annals of Human Genetics 07/2014; 78(4):277-89. DOI:10.1111/ahg.12065 · 1.93 Impact Factor
  • ABSTRACT: During the past 2 years, next-generation sequencing studies have revolutionized the field of genetic association studies. We review the concomitant evolution of statistical methods. As much of the genetic variability identified with sequencing is extremely rare, many new methods have been developed for rare variant association studies. Sequencing data available as a result of large public projects are also being integrated with genome-wide association study (GWAS) chip data to improve genotype imputation. A further trend in recent methodological development has been the use of the linear mixed effect model (LMM). LMMs are used for rare variant association to handle effect heterogeneity. They are also used more generally in GWAS to account for population structure. Many rare variant association tests have been developed to analyze the genetic variation discovered with large-scale DNA sequencing; however, no single approach outperforms others under all disease models and power tends to be low. Sequencing data are also contributing to improved imputation of uncommon genetic variants, although imputation of rare variants remains a challenge. The appropriate correction for population structure in rare variant analyses remains unclear; specialized adjustment techniques may be necessary.
    Current Opinion in Allergy and Clinical Immunology 08/2013; DOI:10.1097/ACI.0b013e3283648f68 · 3.40 Impact Factor
  • ABSTRACT: Melanoma is the most serious type of skin cancer and one of the most common cancers in the world. Advanced melanoma is often resistant to conventional therapies and has high potential for metastasis and low survival rates. Vemurafenib is a small molecule inhibitor of the BRAF serine-threonine kinase recently approved by the United States Food and Drug Administration to treat patients with metastatic and unresectable melanomas that carry an activating BRAF (V600E) mutation. Many clinical trials evaluating other therapeutic uses of vemurafenib are still ongoing. The ATP-binding cassette (ABC) transporters are membrane proteins with important physiological and pharmacological roles. Collectively, they transport and regulate levels of physiological substrates such as lipids, porphyrins and sterols. Some of them also remove xenobiotics and limit the oral bioavailability and distribution of many chemotherapeutics. The overexpression of three major ABC drug transporters is the most common mechanism for acquired resistance to anticancer drugs. In this review, we highlight some of the recent findings related to the effect of ABC drug transporters such as ABCB1 and ABCG2 on the oral bioavailability of vemurafenib, problems associated with treating melanoma brain metastases and the development of acquired resistance to vemurafenib in cancers harboring the BRAF (V600E) mutation.
    04/2014; DOI:10.1016/j.apsb.2013.12.001
