Power and sample size calculations for genetic case/control studies using gene-centric SNP maps: application to human chromosomes 6, 21, and 22 in three populations.
ABSTRACT: Power and sample size calculations are critical parts of any research design for genetic association. We present a method that utilizes haplotype frequency information and average marker-marker linkage disequilibrium on SNPs typed in and around all genes on a chromosome. The test statistic used is the classic likelihood ratio test applied to haplotypes in case/control populations. Haplotype frequencies are computed through specification of genetic model parameters. Power is determined by computation of the test's non-centrality parameter. Power per gene is computed as a weighted average of the power assuming each haplotype is associated with the trait. We apply our method to genotype data from dense SNP maps across three entire chromosomes (6, 21, and 22) for three different human populations (African-American, Caucasian, Chinese), three different models of disease (additive, dominant, and multiplicative) and two trait allele frequencies (rare, common). We perform a regression analysis using these factors, average marker-marker disequilibrium, and the haplotype diversity across the gene region to determine which factors most significantly affect average power for a gene in our data. Also, as a 'proof of principle' calculation, we perform power and sample size calculations for all genes within 100 kb of the PSORS1 locus (chromosome 6) for a previously published association study of psoriasis. Results of our regression analysis indicate that four highly significant factors that determine average power to detect association are: disease model, average marker-marker disequilibrium, haplotype diversity, and the trait allele frequency. These findings may have important implications for the design of well-powered candidate gene association studies.
Our power and sample size calculations for the PSORS1 gene appear consistent with published findings, namely that there is substantial power (>0.99) for most genes within 100 kb of the PSORS1 locus at the 0.01 significance level.
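The power computation summarized above (likelihood ratio test on haplotypes, power from the non-centrality parameter, per-gene power as a weighted average) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the non-centrality parameter is approximated by the expected likelihood ratio statistic for a two-group comparison of haplotype frequencies, that the test has (number of haplotypes - 1) degrees of freedom, and that the per-gene weights are supplied by the caller, since the abstract does not state them. All function names and the example frequencies are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return min(1.0, math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total)

def chi2_quantile(p, df):
    """Chi-square quantile by bisection on chi2_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def noncentral_chi2_power(ncp, df, alpha):
    """Power = P(noncentral chi-square(df, ncp) > critical value at level alpha).

    Uses the Poisson-mixture representation of the noncentral chi-square.
    Assumes ncp is moderate (exp(-ncp / 2) must not underflow).
    """
    crit = chi2_quantile(1.0 - alpha, df)
    weight = math.exp(-ncp / 2.0)  # Poisson(ncp / 2) probability of j = 0
    cdf, cum_weight, j = 0.0, 0.0, 0
    while cum_weight < 1.0 - 1e-12 and j < 1000:
        cdf += weight * chi2_cdf(crit, df + 2 * j)
        cum_weight += weight
        j += 1
        weight *= (ncp / 2.0) / j
    return 1.0 - cdf

def lrt_ncp(case_freqs, control_freqs, n_case_chrom, n_control_chrom):
    """Approximate non-centrality: expected LRT statistic for cases vs. controls.

    ASSUMPTION: a common large-sample approximation, not necessarily the
    formula used in the paper.
    """
    total = n_case_chrom + n_control_chrom
    ncp = 0.0
    for p, q in zip(case_freqs, control_freqs):
        pooled = (n_case_chrom * p + n_control_chrom * q) / total
        if p > 0:
            ncp += 2.0 * n_case_chrom * p * math.log(p / pooled)
        if q > 0:
            ncp += 2.0 * n_control_chrom * q * math.log(q / pooled)
    return ncp

def weighted_gene_power(powers, weights):
    """Per-gene power as a weighted average of per-haplotype powers."""
    return sum(w * pw for w, pw in zip(weights, powers)) / sum(weights)

if __name__ == "__main__":
    # Hypothetical haplotype frequencies for one gene (three haplotypes).
    cases = [0.50, 0.30, 0.20]
    controls = [0.40, 0.35, 0.25]
    ncp = lrt_ncp(cases, controls, n_case_chrom=1000, n_control_chrom=1000)
    print(noncentral_chi2_power(ncp, df=len(cases) - 1, alpha=0.01))
```

In this sketch, a full per-gene calculation would call `lrt_ncp` and `noncentral_chi2_power` once per candidate trait-associated haplotype, with case and control frequencies derived from the chosen disease model, and then combine the results with `weighted_gene_power`.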
- Source available from: Krina T Zondervan. Nature Reviews Genetics 03/2004; 5(2):89-100. · 41.06 Impact Factor
- ABSTRACT: Recently, attention has focused on the use of whole-genome linkage disequilibrium (LD) studies to map common disease genes. Such studies would employ a dense map of single nucleotide polymorphisms (SNPs) to detect association between a marker and disease. Construction of SNP maps is currently underway. An essential issue yet to be settled is the required marker density of such maps. Here, I use population simulations to estimate the extent of LD surrounding common gene variants in the general human population as well as in isolated populations. Two main conclusions emerge from these investigations. First, a useful level of LD is unlikely to extend beyond an average distance of roughly 3 kb in the general population, which implies that approximately 500,000 SNPs will be required for whole-genome studies. Second, the extent of LD is similar in isolated populations unless the founding bottleneck is very narrow or the frequency of the variant is low (<5%). Nature Genetics 06/1999; 22(2):139-44. · 35.21 Impact Factor
- ABSTRACT: Recent progress in the development of single-nucleotide polymorphism (SNP) maps within genes and across the genome provides a valuable tool for fine-mapping and has led to the suggestion of genomewide association studies to search for susceptibility loci for complex traits. Test statistics for genome association studies that consider a single marker at a time, ignoring the linkage disequilibrium between markers, are inefficient. In this study, we present a generalized T² statistic for association studies of complex traits, which can utilize multiple SNP markers simultaneously and considers the effects of multiple disease-susceptibility loci. This generalized T² statistic is a corollary to that originally developed for multivariate analysis and has a close relationship to discriminant analysis and common measures of genetic distance. We evaluate the power of the generalized T² statistic and show it to be greater than or equal to that of the traditional χ² test of association and of a similar haplotype-test statistic. Finally, examples are given to evaluate the performance of the proposed T² statistic for association studies using simulated and real data. The American Journal of Human Genetics 06/2002; 70(5):1257-68. · 11.20 Impact Factor