Genotype imputation for African Americans using data from HapMap phase II versus 1000 genomes projects.

Division of Biostatistics, Washington University in St. Louis, School of Medicine, St. Louis, Missouri 63110-1093, USA.
Genetic Epidemiology (impact factor: 3.44). 05/2012; 36(5):508-16. DOI:10.1002/gepi.21647
Source: PubMed

ABSTRACT Genotype imputation infers genotypes at untyped single nucleotide polymorphisms (SNPs) that are present on a reference panel, such as those from the HapMap Project. It is widely used to increase statistical power and to compare results across studies that use different genotyping platforms. Imputation for African American populations is challenging because their linkage disequilibrium blocks are shorter and because admixture means that no ideal reference panel is available. In this paper, we evaluated three imputation strategies for African Americans. The intersection strategy used a combined panel consisting of SNPs polymorphic in both CEU and YRI. The union strategy used a panel consisting of SNPs polymorphic in either CEU or YRI. The merge strategy merged results from two separate imputations, one using CEU and the other using YRI. Because investigators are increasingly using data from the 1000 Genomes (1KG) Project for genotype imputation, we evaluated both 1KG-based and HapMap-based imputations. We used 23,707 SNPs from chromosomes 21 and 22 on the Affymetrix SNP Array 6.0, genotyped for 1,075 HyperGEN African Americans. We found that 1KG-based imputations provided a substantially larger number of variants than HapMap-based imputations: about three times as many common variants and eight times as many rare and low-frequency variants. This higher yield is expected because the 1KG panel includes more SNPs. Accuracy rates using 1KG data were slightly lower than those using HapMap data before filtering, but slightly higher after filtering. The union strategy provided the highest imputation yield with the second-highest accuracy. The intersection strategy provided the lowest imputation yield but the highest accuracy. The merge strategy provided the lowest imputation accuracy. We observed that SNPs polymorphic only in CEU had much lower accuracy, which reduced the accuracy of the union strategy. Our findings suggest that 1KG-based imputations can facilitate discovery of significant associations for SNPs across the whole minor allele frequency (MAF) spectrum. Because the 1KG Project is still under way, we expect that later versions will provide better imputation performance.
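
The three strategies described in the abstract can be illustrated with a toy sketch. It is not the authors' pipeline: the SNP IDs, allele frequencies, quality scores, and the helper names `polymorphic_sites` and `merge_calls` are all hypothetical. The abstract also does not say how the two separate imputations were merged, so the rule used here (keep the call with the higher quality score) is purely an assumption made for illustration.

```python
from typing import Dict, Set, Tuple

def polymorphic_sites(maf_by_snp: Dict[str, float]) -> Set[str]:
    """Return the SNP IDs whose minor allele frequency is above zero."""
    return {snp for snp, maf in maf_by_snp.items() if maf > 0.0}

# Toy minor allele frequencies per reference population (made-up values).
ceu_maf = {"snpA": 0.20, "snpB": 0.00, "snpC": 0.05, "snpD": 0.10}
yri_maf = {"snpA": 0.35, "snpB": 0.15, "snpC": 0.00, "snpE": 0.25}

ceu_sites = polymorphic_sites(ceu_maf)
yri_sites = polymorphic_sites(yri_maf)

# Intersection strategy: SNPs polymorphic in both CEU and YRI.
intersection_panel = ceu_sites & yri_sites
# Union strategy: SNPs polymorphic in either CEU or YRI.
union_panel = ceu_sites | yri_sites
# SNPs polymorphic only in CEU, the subset the abstract reports as less accurate.
ceu_only = ceu_sites - yri_sites

Call = Tuple[str, float]  # (imputed genotype, quality score)

def merge_calls(ceu_calls: Dict[str, Call], yri_calls: Dict[str, Call]) -> Dict[str, Call]:
    """Merge strategy sketch: combine two separate imputation outputs.

    ASSUMPTION: when both runs impute the same SNP, keep the call with the
    higher quality score. The source does not specify the merging rule.
    """
    merged = dict(ceu_calls)
    for snp, call in yri_calls.items():
        if snp not in merged or call[1] > merged[snp][1]:
            merged[snp] = call
    return merged

# Hypothetical outputs of a CEU-based and a YRI-based imputation run.
ceu_calls = {"snpA": ("AG", 0.80), "snpC": ("CC", 0.60)}
yri_calls = {"snpA": ("AG", 0.95), "snpB": ("TT", 0.90)}
merged_calls = merge_calls(ceu_calls, yri_calls)
```

In this toy data the intersection panel is the smallest and the union panel the largest, which mirrors the yield ordering reported in the abstract. The sketch says nothing about accuracy, which depends on the imputation software and the filtering applied.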

Keywords

1,075 HyperGEN African Americans
1KG Project
1KG-based imputations
Accuracy rates
African Americans
genotype imputation
HapMap Project
HapMap-based imputations
highest accuracy
highest imputation yield
imputation performance
intersection strategy
low-frequency variants
lower accuracy
lowest imputation accuracy
lowest imputation yield
merge strategy
next highest accuracy
separate imputations
SNPs polymorphic