Hardy-Weinberg analysis of a large set of published association studies reveals genotyping error and a deficit of heterozygotes across multiple loci.
ABSTRACT In genetic association studies, deviation from Hardy-Weinberg equilibrium (HWD) can be due to recent admixture or selection at a locus, but is most commonly due to genotyping errors. In addition to its utility for identifying potential genotyping errors in individual studies, here we report that HWD can be useful in detecting the presence, magnitude and direction of genotyping error across multiple studies. If there is a consistent genotyping error at a given locus, larger studies, in general, will show more evidence for HWD than small studies. As a result, for loci prone to genotyping errors, there will be a correlation between HWD and the study sample size. By contrast, in the absence of consistent genotyping errors, there will be a chance distribution of p-values among studies without correlation with sample size. We calculated the evidence for HWD at 17 separate polymorphic loci investigated in 325 published genetic association studies. In the full set of studies, there was a significant correlation between HWD and locus-standardised sample size (p = 0.001). For 14/17 of the individual loci, there was a positive correlation between extent of HWD and sample size, with the evidence for two loci (5-HTTLPR and CTSD) rising to the level of statistical significance. Among single nucleotide polymorphisms (SNPs), 15/23 studies that deviated significantly from Hardy-Weinberg equilibrium (HWE) did so because of a deficit of heterozygotes. The inbreeding coefficient (F(is)) is a measure of the degree and direction of deviation from HWE. Among studies investigating SNPs, there was a significant correlation between F(is) and HWD (R = 0.191; p = 0.002), indicating that the greater the deviation from HWE, the greater the deficit of heterozygotes. By contrast, for repeat variants, only one in five studies that deviated significantly from HWE showed a deficit of heterozygotes and there was no significant correlation between F(is) and HWD.
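As an illustration of the two quantities used above, the following minimal sketch computes a goodness-of-fit test for HWE and the inbreeding coefficient F(is) from genotype counts at a biallelic locus. It assumes a one-degree-of-freedom chi-square test without continuity correction, which may differ from the test actually used in the study; the function name `hwe_stats` and its arguments are hypothetical.

```python
import math


def hwe_stats(n_AA, n_Aa, n_aa):
    """Return (chi2, p_value, f_is) for a biallelic locus from genotype counts.

    Assumption: 1-df chi-square goodness-of-fit test, no continuity correction.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Allele frequencies estimated from the observed genotypes.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; HWE cannot be tested")

    # Expected counts under Hardy-Weinberg proportions p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # F(is) = 1 - H_obs / H_exp; positive values mean a heterozygote deficit.
    f_is = 1.0 - (n_Aa / n) / (2 * p * q)
    return chi2, p_value, f_is


# Made-up counts with fewer heterozygotes than expected.
chi2, p_value, f_is = hwe_stats(50, 20, 30)
print(f"chi2={chi2:.2f}  p={p_value:.2g}  F(is)={f_is:.3f}")
```

For a biallelic locus this statistic equals n × F(is)², so a fixed F(is) yields stronger evidence for HWD as the sample size n grows, which is the relationship the abstract relies on.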
These results indicate the presence of HWD across multiple loci, with the magnitude of the deviation varying substantially from locus to locus. For SNPs, HWD tends to be due to a deficit of heterozygotes, indicating that allelic dropout may be the most prevalent genotyping error.
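To make the link between allelic dropout and a heterozygote deficit concrete, here is a minimal sketch under a deliberately simplified model that is an assumption of this example, not the study's method: each true heterozygote is miscalled as a homozygote with probability `d`, with either allele equally likely to be lost. All function names are hypothetical, and the chi-square value is computed from expected genotype frequencies, ignoring sampling noise.

```python
def genotype_freqs_with_dropout(p, d):
    """Observed genotype frequencies when heterozygotes are miscalled at rate d.

    Assumption: the population is in HWE with allele frequency p, and a
    dropped-out heterozygote is scored as either homozygote with equal chance.
    """
    q = 1.0 - p
    het = 2 * p * q * (1.0 - d)
    hom_A = p * p + p * q * d
    hom_a = q * q + p * q * d
    return hom_A, het, hom_a


def f_is(hom_A, het, hom_a):
    """Inbreeding coefficient from genotype frequencies: 1 - H_obs / H_exp."""
    p = hom_A + het / 2
    return 1.0 - het / (2 * p * (1.0 - p))


def expected_chi2(n, p, d):
    """Approximate HWE chi-square for n individuals, using chi2 = n * F(is)^2."""
    return n * f_is(*genotype_freqs_with_dropout(p, d)) ** 2


# The same dropout rate gives stronger evidence for HWD in larger studies.
for n in (100, 400, 1600):
    print(n, round(expected_chi2(n, p=0.3, d=0.1), 2))
```

Under this model the observed allele frequency is unchanged and F(is) equals the dropout rate `d`, so the expected chi-square rises in proportion to sample size. This is consistent with the reported correlation between HWD and sample size at error-prone loci.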