Power analysis for genome-wide association studies
Robert J Klein
Address: Program in Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY, USA
Email: Robert J Klein - firstname.lastname@example.org
Background: Genome-wide association studies are a promising new tool for deciphering the
genetics of complex diseases. To choose the proper sample size and genotyping platform for such
studies, power calculations that take into account genetic model, tag SNP selection, and the
population of interest are required.
Results: The power of genome-wide association studies can be computed using a set of tag SNPs
and a large number of genotyped SNPs in a representative population, such as those available through the
HapMap project. As expected, power increases with increasing sample size and effect size. Power
also depends on the tag SNPs selected. In some cases, more power is obtained by genotyping more
individuals at fewer SNPs than fewer individuals at more SNPs.
Conclusion: Genome-wide association studies should be designed thoughtfully, with the choice
of genotyping platform and sample size being determined from careful power calculations.
One goal of modern human genetics is to identify the
genetic variants that predispose individuals to develop
common, complex diseases. It has been proposed that
population-based association studies will be more powerful
than traditional family-based linkage methods in identifying
such high-frequency, low-penetrance alleles.
Such studies require the genotypes of a large number of
polymorphisms (usually single nucleotide polymorphisms
[SNPs]) across the genome, each of which is tested for
association with the phenotype of interest. As originally
proposed, this would be a direct test of association, in
which the functional mutation is presumed to be genotyped.
An alternate approach to association studies takes
advantage of the correlation between SNPs, called linkage
disequilibrium (LD), that can occur due to the genealogical
history of the polymorphisms. In this approach,
often called indirect association, one SNP is genotyped
and used to infer indirectly the genotypes at other SNPs
with which it is in high LD. As one genotyped SNP,
called a "tag" SNP, can be in LD with numerous other
SNPs, far fewer SNPs (10^5 – 10^6) would need to be genotyped
to capture the common variation in the genome.
Recent advances in genotyping technology make such
studies feasible [4,5] and the first results of such studies
are being published [6-10].
One key question in designing such studies is the choice
of tag SNPs. Numerous methods for choosing the best set
of tagging SNPs have been developed and compared.
One common measure evaluates the pairwise LD, measured
by r^2, between the tag SNPs and all other SNPs.
The value r^2 is the squared correlation coefficient between two SNPs.
It is a useful measure because, if N individuals are needed
for a specific power with a direct test of association, N/r^2
individuals would be needed for an indirect test of association.
Sets of tag SNPs are usually compared by their
"coverage," or fraction of variants in the genome that are
Published: 28 August 2007
BMC Genetics 2007, 8:58 doi:10.1186/1471-2156-8-58
Received: 22 November 2006
Accepted: 28 August 2007
This article is available from: http://www.biomedcentral.com/1471-2156/8/58
© 2007 Klein; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
I am grateful to Jurg Ott, in whose lab the bulk of this work was performed;
Joe Garsetti from Illumina for help in obtaining the list of SNPs on the Illumina
chips; and Sara Hamon for critical comments on the manuscript. This
work was performed while RJK was a postdoctoral fellow funded by
F32HG003681 from NIH.
1. Risch N, Merikangas K: The future of genetic studies of complex
human diseases. Science 1996, 273:1516-1517.
2. Pritchard JK, Przeworski M: Linkage disequilibrium in humans:
models and data. Am J Hum Genet 2001, 69:1-14.
3. Hirschhorn JN, Daly MJ: Genome-wide association studies for
common diseases and complex traits. Nat Rev Genet 2005,
4. Matsuzaki H, Dong S, Loi H, Di X, Liu G, Hubbell E, Law J, Berntsen
T, Chadha M, Hui H, Yang G, Kennedy GC, Webster TA, Cawley S,
Walsh PS, Jones KW, Fodor SPA, Mei R: Genotyping over 100,000
SNPs on a pair of oligonucleotide arrays. Nat Methods 2004,
5. Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS: A
genome-wide scalable SNP genotyping assay using microarray
technology. Nat Genet 2005, 37(5):549-554.
6. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning
AK, Sangiovanni JP, Mane SM, Mayne ST, Bracken MB, Ferris FL, Ott
J, Barnstable C, Hoh J: Complement factor H polymorphism in
age-related macular degeneration.
7. Herbert A, Gerry NP, McQueen MB, Heid IM, Pfeufer A, Illig T,
Wichmann HE, Meitinger T, Hunter D, Hu FB, Colditz G, Hinney A,
Hebebrand J, Koberwitz K, Zhu X, Cooper R, Ardlie K, Lyon H,
Hirschhorn JN, Laird NM, Lenburg ME, Lange C, Christman MF: A
common genetic variant is associated with adult and childhood
obesity. Science 2006, 312(5771):279-283.
8. Arking DE, Pfeufer A, Post W, Kao WH, Newton-Cheh C, Ikeda M,
West K, Kashuk C, Akyol M, Perz S, Jalilzadeh S, Illig T, Gieger C, Guo
CY, Larson MG, Wichmann HE, Marban E, O'Donnell CJ, Hirschhorn
JN, Kaab S, Spooner PM, Meitinger T, Chakravarti A: A common
genetic variant in the NOS1 regulator NOS1AP modulates
cardiac repolarization. Nat Genet 2006.
9. Maraganore DM, de Andrade M, Lesnick TG, Strain KJ, Farrer MJ,
Rocca WA, Pant PV, Frazer KA, Cox DR, Ballinger DG: High-resolution
whole-genome association study of Parkinson disease.
Am J Hum Genet 2005, 77(5):685-693.
10. Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ,
Steinhart AH, Abraham C, Regueiro M, Griffiths A, Dassopoulos T,
Bitton A, Yang H, Targan S, Datta LW, Kistner EO, Schumm LP, Lee
A, Gregersen PK, Barmada MM, Rotter JI, Nicolae DL, Cho JH: A
Genome-Wide Association Study Identifies IL23R as an
Inflammatory Bowel Disease
11. Ke X, Miretti MM, Broxholme J, Hunt S, Beck S, Bentley DR, Deloukas
P, Cardon LR: A comparison of tagging methods and their tagging
space. Hum Mol Genet 2005, 14(18):2757-2767.
12. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA:
Selecting a maximally informative set of single-nucleotide
polymorphisms for association analyses using linkage
disequilibrium. Am J Hum Genet 2004, 74(1):106-120.
Pe'er I, de Bakker PI, Maller J, Yelensky R, Altshuler D, Daly MJ:
Evaluating and improving power in whole-genome association
studies using fixed marker sets. Nat Genet 2006, 38(6):663-667.
Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG,
Frazer KA, Cox DR: Whole-genome patterns of common DNA
variation in three human populations. Science 2005,
Barrett JC, Cardon LR: Evaluating coverage of genome-wide
association studies. Nat Genet 2006, 38(6):659-662.
Skol AD, Scott LJ, Abecasis GR, Boehnke M: Joint analysis is more
efficient than replication-based analysis for two-stage
genome-wide association studies.
Jorgenson E, Witte JS: Coverage and Power in Genomewide
Association Studies. Am J Hum Genet 2006, 78:.
de Bakker PI, Yelensky R, Pe'er I, Gabriel SB, Daly MJ, Altshuler D:
Efficiency and power in genetic association studies. Nat Genet
The International HapMap Consortium: A haplotype map of the
human genome. Nature 2005, 437(7063):1299-1320.
Lin S, Chakravarti A, Cutler DJ: Exhaustive allelic transmission
disequilibrium tests as a new approach to genome-wide
association studies. Nat Genet 2004, 36(11):1181-1188.
Roeder K, Bacanu SA, Wasserman L, Devlin B: Using linkage
genome scans to improve power of association in genome
scans. Am J Hum Genet 2006, 78(2):243-252.
Mitra SK: On the limiting power function of the frequency
chi-square test. Ann Math Stat 1958, 29:1221-1233.
Gordon D, Finch SJ, Nothnagel M, Ott J: Power and sample size
calculations for case-control genetic association tests when
errors are present: application to single nucleotide
polymorphisms. Hum Hered 2002, 54(1):22-33.
Nat Genet 2006,
Additional file 1
Power of genome-wide association studies with various parameters. Each
line of the file contains the power of a genome-wide association study con-
ducted with the specified HapMap population, genetic model, and sample
size (N) based on the SNPs present in a variety of commercially available