Power and sample size calculations for genetic case/control studies using gene-centric SNP maps: application to human chromosomes 6, 21, and 22 in three populations.

Applied Biosystems, Foster City, CA 94404, USA.
Human Heredity (Impact Factor: 1.64). 02/2005; 60(1):43-60. DOI: 10.1159/000087918
Source: PubMed

ABSTRACT: Power and sample size calculations are critical parts of any research design for genetic association. We present a method that utilizes haplotype frequency information and average marker-marker linkage disequilibrium on SNPs typed in and around all genes on a chromosome. The test statistic used is the classic likelihood ratio test applied to haplotypes in case/control populations. Haplotype frequencies are computed through specification of genetic model parameters. Power is determined by computation of the test's non-centrality parameter. Power per gene is computed as a weighted average of the power assuming each haplotype is associated with the trait. We apply our method to genotype data from dense SNP maps across three entire chromosomes (6, 21, and 22) for three different human populations (African-American, Caucasian, Chinese), three different models of disease (additive, dominant, and multiplicative) and two trait allele frequencies (rare, common). We perform a regression analysis using these factors, average marker-marker disequilibrium, and the haplotype diversity across the gene region to determine which factors most significantly affect average power for a gene in our data. Also, as a 'proof of principle' calculation, we perform power and sample size calculations for all genes within 100 kb of the PSORS1 locus (chromosome 6) for a previously published association study of psoriasis. Results of our regression analysis indicate that four highly significant factors determine average power to detect association: disease model, average marker-marker disequilibrium, haplotype diversity, and trait allele frequency. These findings may have important implications for the design of well-powered candidate gene association studies. Our power and sample size calculations for the PSORS1 gene appear consistent with published findings, namely that there is substantial power (>0.99) for most genes within 100 kb of the PSORS1 locus at the 0.01 significance level.
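
The two computational steps named in the abstract, power from a non-centrality parameter and per-gene power as a weighted average over haplotypes, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a test with a single degree of freedom, for which the noncentral chi-square tail has a closed form in terms of the standard normal distribution; a haplotype likelihood ratio test over more than two haplotypes has more degrees of freedom and would need a general noncentral chi-square routine. The function names `power_1df` and `weighted_gene_power`, and the choice of weights, are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_1df(ncp, alpha=0.01):
    """Power of a 1-df chi-square test with non-centrality parameter `ncp`.

    Assumption: 1 degree of freedom, so the statistic is (Z + sqrt(ncp))**2
    with Z standard normal, and the tail probability is exact.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # sqrt of the chi-square critical value
    shift = sqrt(ncp)
    upper = 1 - _STD_NORMAL.cdf(z_crit - shift)  # P(Z + shift > z_crit)
    lower = _STD_NORMAL.cdf(-z_crit - shift)     # P(Z + shift < -z_crit)
    return upper + lower


def weighted_gene_power(powers, weights):
    """Weighted average of per-haplotype powers.

    Assumption: `weights` are non-negative and not all zero (for example,
    haplotype frequencies); they are normalized here.
    """
    total = sum(weights)
    return sum(p * w for p, w in zip(powers, weights)) / total


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study.
    per_haplotype = [power_1df(ncp) for ncp in (2.0, 8.0, 15.0)]
    print(weighted_gene_power(per_haplotype, [0.5, 0.3, 0.2]))
```

With `ncp = 0` the function returns the significance level itself, which is a quick sanity check that the critical value and the tail calculation agree.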

  • Source
    ABSTRACT: The progress in and success of biomedical research over the past century was built on the foundation outlined in R.A. Fisher’s The Design of Experiments (1935), which described the theory and methodological approach to designing research studies. A key tenet of Fisher’s treatise, widely adopted by the research community, is randomization, the process of assigning individuals to random groups or treatments. Comparing outcomes or responses between these groups yields “risk factors” called population attributable risks (PAR), which are statistical estimates of the percentage reduction in disease if the risk were avoided or in the case of genetic associations, if the gene variant were not present in the population.
    International Journal for Vitamin and Nutrition Research 10/2012; 82(5):333-341. DOI:10.1024/0300-9831/a000128 · 1.00 Impact Factor
  • Source
    ABSTRACT: Recently, many new loci associated with type 2 diabetes have been uncovered by genetic association studies and genome-wide association studies. As more reports are made, particularly with respect to varying ethnicities, there is a need to determine more precisely the effect sizes in each major racial group. In addition, some reports have claimed ethnic-specific associations with alternative single-nucleotide polymorphisms (SNPs), and to that end there has been a degree of confusion. We conducted a meta-analysis using an additive genetic model. Eight polymorphisms in 155 studies with 121174 subjects (53385 cases and 67789 controls) were addressed in this meta-analysis. Significant associations were found between type 2 diabetes and rs7903146, rs12255372, rs11196205, rs7901695, rs7895340 and rs4506565, with summary odds ratios (ORs) (95% confidence interval) of 1.39 (1.34-1.45), 1.33 (1.27-1.40), 1.20 (1.14-1.26), 1.32 (1.25-1.39), 1.21 (1.13-1.29) and 1.39 (1.29-1.49), respectively. In addition, no significant associations were found between the two polymorphisms (rs290487 and rs11196218) and type 2 diabetes. The summary ORs for the six statistically significant associations (P < 0.05) were further evaluated by estimating the false-positive report probability, with results indicating that all of the six significant associations were considered noteworthy, and may plausibly be true associations. Significant associations were found between the six polymorphisms (rs7903146, rs12255372, rs11196205, rs7901695, rs7895340 and rs4506565) in the TCF7L2 gene and type 2 diabetes risk, and the other two polymorphisms (rs11196218 and rs290487) were not found to be significantly associated with type 2 diabetes. Subgroups analyses show that significant associations are not found between the six SNPs (rs7903146, rs12255372, rs11196205, rs7901695, rs7895340, and rs4506565) and the type 2 diabetes in some ethnic populations.
    Mutagenesis 11/2012; DOI:10.1093/mutage/ges048 · 3.50 Impact Factor
  • Source
    ABSTRACT: The allele frequencies of markers as well as linkage disequilibrium (LD) can be changed in cases due to the LD between markers and the disease allele, exhibiting spurious associations of markers. To identify the true association, classical statistical tests for dealing with confounders have been applied to draw a conclusion as to whether the association of variants comes from LD with the known disease allele. However, a more direct test considering LD using estimated haplotype frequencies may be more efficient. The null hypothesis is that the different allele frequencies of a variant between cases and controls come solely from the increased disease allele frequency and the LD relationship with the disease allele. The haplotype frequencies of controls are estimated using the expectation maximization (EM) algorithm from the genotype data. The estimated frequencies are applied to calculate the expected haplotype frequencies in cases corresponding to the increase or decrease of the causative or protective alleles. The suggested method was applied to previously published data, and several APOE variants showed association with Alzheimer's disease independent from the APOE variant, rs429358, regardless of LD showing significant simulated p-values. The test results support the possibility that there may be more than one common disease variant in a locus.
    01/2007; 5(2).