September 22, 2010 9:39 WSPC - Proceedings Trim Size: 11in x 8.5in EZ-JA˙PSB2011-revision
AN EVALUATION OF POWER TO DETECT LOW-FREQUENCY VARIANT
ASSOCIATIONS USING ALLELE-MATCHING TESTS THAT ACCOUNT
E. ZEGGINI∗ and J.L. ASIMIT
Wellcome Trust Sanger Institute, Hinxton, CB10 1HH, UK
There is growing interest in the role of rare variants in multifactorial disease etiology, and increasing
evidence that rare variants are associated with complex traits. Single SNP tests are underpowered
in rare variant association analyses, so locus-based tests must be used. Quality scores at both the
SNP and genotype level are available for sequencing data, but they are rarely accounted for. A
locus-based method that has high power in the presence of rare variants is extended to incorporate
such quality scores as weights, and its power is compared with the original method via a simulation
study. Preliminary results suggest that taking uncertainty into account does not improve the power.
Keywords: Allele-Matching; Rare variants; Locus-based method; Quality scores; Sequencing
There is increasing interest in the role of rare variants in multifactorial disease etiology,
and the evidence that rare variants are associated with complex traits is steadily expanding.
Although each individual rare variant occurs at low frequency, the frequency with which any
rare variant is present makes rare variants collectively common. Under the multiple rare variant
(MRV) hypothesis, the effects of multiple rare variants with moderate to high penetrance
combine to increase the risk of most common inherited diseases. At the other extreme is
the common disease common variant (CDCV) hypothesis, which states that most common
complex diseases are due to a few common variants with moderately small effects. The
most likely scenario is that a combination of both common and rare variants contributes to
In most genome-wide association (GWA) studies only variants with minor allele frequency
(MAF) greater than 1-5% are followed up, and the focus tends to be on identifying common
disease variants that are associated with complex diseases. However, this approach is limited
since only 5-10% of the heritable component of disease is explained by the many genetic
variations identified as having strong evidence of disease association in GWA studies. This
suggests that a fruitful direction is to search for associations with multiple rare variants.
By design, SNP genotyping panels often focus on common SNPs, so that they only contain
a relatively small number of rare variants. This leads to a common issue in rare variant
analyses, in that on most platforms there is an insufficient number of rare variants (Table 1).
There appears to be a clear difference in the effects of rare variants in comparison to SNPs
of higher frequency, with rare variants having stronger effects. According to the odds ratios
(OR) for common and rare variants identified in published studies, most common-disease
associated variants have ORs between 1.1 and 1.4 with only a few above 2, while the majority
of the identified rare variants to date have an OR greater than 2 and a mean of 3.74. In
likely due to the fact that our region of 150 kb contains almost 350 SNPs, which are all jointly
tested. This illustrates a caveat of this multi-marker testing approach.
To examine type I error, we also ran a null simulation in which the relative risk is set to 1.
However, we only consider the neighborhood region, because of the extremely
low power observed for the entire region. At the 5% level both methods are found to be quite
conservative, with AMELIA and KBAT having respective type I errors of 0.00502 and 0.00401.
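As a minimal sketch of how an empirical type I error of this kind can be obtained from null replicates, the rejection rate is the fraction of null p-values at or below the nominal level. The function name and the p-values below are hypothetical and purely illustrative; they are not taken from the simulation study, and the standard error uses a simple binomial approximation.

```python
from math import sqrt


def empirical_type1_error(p_values, alpha=0.05):
    """Return (rate, standard_error) of rejections among null replicates.

    p_values: p-values from replicates simulated under the null
              (relative risk = 1); hypothetical input for illustration.
    alpha:    nominal significance level.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one null replicate")
    # Count replicates in which the null hypothesis is (falsely) rejected.
    rejections = sum(1 for p in p_values if p <= alpha)
    rate = rejections / n
    # Binomial standard error of the estimated rejection rate.
    se = sqrt(rate * (1 - rate) / n)
    return rate, se


# Illustrative p-values from 10 hypothetical null replicates.
null_p = [0.51, 0.03, 0.76, 0.22, 0.94, 0.48, 0.67, 0.12, 0.88, 0.35]
rate, se = empirical_type1_error(null_p, alpha=0.05)
print(f"empirical type I error = {rate:.3f} (SE {se:.3f})")
```

A test is conservative at level alpha when this estimate falls clearly below alpha, which is the pattern reported for both methods here.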
In the short simulation study presented here, incorporating quality scores of SNPs and
genotypes, as in AMELIA, led to a decrease in power, with the difference largest
for a small number of SNPs. The relatively low power of the two methods may be due to the
fact that almost 350 SNPs are being tested jointly, of which only one is causal.
This suggests that the multi-marker approach may be best suited to smaller regions, or
applied after some filtering to reduce the number of SNPs that are jointly tested. For example, when
the focus is on low-frequency variants, the analysis may include only those with a MAF below
a certain threshold, such as 0.05. We note that the replications in which the signal was identified
only by KBAT tend to have a causal SNP with a high SNP quality score. In such situations,
allowing for uncertainty that is not present may inadvertently dilute the power to detect the signal.
In the simple simulations examined, the power of AMELIA appears to be lower than that of
KBAT, and both tests are conservative with similar error rates. We are extending our methods
further to achieve greater power.
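The MAF-based filtering mentioned above can be sketched as follows. This is only an illustration under stated assumptions: genotypes are coded as minor-allele counts 0/1/2 with None for missing calls, the function names and the toy data are hypothetical, and dropping monomorphic SNPs is a choice made here rather than something specified in the text.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from 0/1/2 allele counts; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each called individual contributes two alleles.
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def filter_low_frequency(snp_genotypes, maf_threshold=0.05):
    """Keep SNPs with 0 < MAF < maf_threshold; returns {snp_id: maf}."""
    kept = {}
    for snp_id, genos in snp_genotypes.items():
        maf = minor_allele_frequency(genos)
        # Assumption: monomorphic SNPs (MAF == 0) carry no signal and are dropped.
        if maf is not None and 0 < maf < maf_threshold:
            kept[snp_id] = maf
    return kept


# Toy data for 20 hypothetical individuals (not from the study).
toy = {
    "snpA": [1] + [0] * 19,       # one heterozygote: MAF = 1/40
    "snpB": [1] * 10 + [0] * 10,  # common variant:   MAF = 10/40
    "snpC": [0] * 20,             # monomorphic:      MAF = 0
}
kept = filter_low_frequency(toy, maf_threshold=0.05)
print(kept)
```

Only the SNPs that pass the filter would then be passed to the joint multi-marker test, reducing the number of markers tested together.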
1. Bodmer W and Bonilla C. (2008). Common and rare variants in multifactorial susceptibility to
common diseases. Nature Genetics 40: 695-701.
2. Pritchard JK and Cox NJ. (2002). The allelic architecture of human disease genes: common disease
common variant ... or not? Human Molecular Genetics 11: 2417-2423.
3. Schork NJ, Murray SS, Frazer KA, and Topol EJ. (2009). Common vs rare allele hypotheses for
complex diseases. Current Opinion in Genetics & Development 19: 212-219.
4. Mukhopadhyay I, Feingold E, Weeks DE, and Thalamuthu A. (2010). Association Tests Using
Kernel-Based Measures of Multi-Locus Genotype Similarity Between Individuals. Genetic
Epidemiology 34: 213-221.
5. Schaid DJ, McDonnell SK, Hebbring SJ, Cunningham JM, and Thibodeau SN. (2005). Nonpara-
metric Tests of Association of Multiple Genes with Human Disease. American Journal of Human
Genetics 76: 780-793.
6. Wessel J and Schork NJ. (2006). Generalized genomic distance-based regression methodology for
multilocus association analysis. American Journal of Human Genetics 79: 792-806.
7. Montana G. (2005). HapSim: a simulation tool for generating haplotype data with pre-specified
allele frequencies and LD coefficients. Bioinformatics 21: 4309-4311.