Klein RJ. Power analysis for genome-wide association studies. BMC Genetics 8: 58

Program in Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY, USA.
BMC Genetics (Impact Factor: 2.4). 02/2007; 8(1):58. DOI: 10.1186/1471-2156-8-58
Source: PubMed


Genome-wide association studies are a promising new tool for deciphering the genetics of complex diseases. To choose the proper sample size and genotyping platform for such studies, power calculations that take into account genetic model, tag SNP selection, and the population of interest are required.
The power of genome-wide association studies can be computed using a set of tag SNPs and a large number of genotyped SNPs in a representative population, such as available through the HapMap project. As expected, power increases with increasing sample size and effect size. Power also depends on the tag SNPs selected. In some cases, more power is obtained by genotyping more individuals at fewer SNPs than fewer individuals at more SNPs.
Genome-wide association studies should be designed thoughtfully, with the choice of genotyping platform and sample size being determined from careful power calculations.
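The dependence of power on sample size, effect size, number of SNPs tested, and tag SNP coverage can be illustrated with a small calculation. The following is a minimal sketch and not the method used in the paper: it assumes a two-sided allelic (two-proportion) test with a normal approximation, a Bonferroni correction over `n_tests` SNPs, a control allele frequency that stands in for the population frequency, and the common approximation that imperfect tagging scales the noncentrality parameter by r². All function and parameter names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def case_allele_freq(p_control, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases, n_controls, p_control, odds_ratio,
                       n_tests=1, r2=1.0, alpha=0.05):
    """Approximate power of a two-sided allelic test at one SNP.

    Assumptions (not taken from the paper): normal approximation,
    Bonferroni correction over n_tests, and tagging at r2 < 1 shrinking
    the noncentrality parameter by a factor of r2.
    """
    p_case = case_allele_freq(p_control, odds_ratio)
    # Each individual contributes two alleles to the frequency estimate.
    variance = (p_case * (1.0 - p_case) / (2 * n_cases)
                + p_control * (1.0 - p_control) / (2 * n_controls))
    # Mean of the z statistic; sqrt(r2) because the NCP is the squared mean.
    shift = abs(p_case - p_control) / variance ** 0.5 * r2 ** 0.5
    # Bonferroni-corrected two-sided critical value.
    z_crit = _STD_NORMAL.inv_cdf(1.0 - (alpha / n_tests) / 2.0)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Two hypothetical designs: fewer individuals on a denser panel with
    # better tagging, versus more individuals on a sparser panel.
    dense = allelic_test_power(1000, 1000, 0.2, 1.5, n_tests=500_000, r2=0.9)
    sparse = allelic_test_power(2000, 2000, 0.2, 1.5, n_tests=100_000, r2=0.7)
    print(f"dense panel, 1000 cases:  power = {dense:.3f}")
    print(f"sparse panel, 2000 cases: power = {sparse:.3f}")
```

The r² values and panel sizes in the usage example are made up for illustration; in practice they would come from tag SNP coverage estimated in a reference population such as HapMap.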

Available from: PubMed Central · License: CC BY
    • "The small sample size of the GWAS was insufficient considering that for the detection of markers with small effects on quantitative traits several thousands of individuals are required (Klein, 2007; Spencer et al., 2009). Thus, it was not surprising that no effects were significant after correction for multiple testing. "
    ABSTRACT: We present results from a genome-wide association study (GWAS) and a single-marker association study. The GWAS was performed with the Illumina PorcineSNP60 BeadChip from which five markers were selected for a validation analysis. Genetic effects were estimated for feed intake, weight gain and traits of fat and muscle tissue in German Landrace boars kept on performance test stations. The GWAS was carried out in a population of 288 boars and the validation study for another 432 boars. No statistically significant effect was found in the GWAS after adjusting for multiple testing. Effects of two markers, which were genome-wide significant before correction for multiple testing (P<0.00005), could be confirmed in the validation study. The major allele of marker ALGA0056781 on SSC1 was positively associated with both higher weight gain and fat deposition. The effect on live-weight gain was 2.25 g/day in the GWAS (P=0.0003) and 3.73 g/day in the validation study (P=0.01), and for back fat thickness 0.15 mm in the GWAS (P<0.0001) and 0.20 mm in the validation study (P=0.02). The marker had similar effects on test-day weight gain (GWAS: 3.85 g/day, P=0.001; validation study: 6.80 g/day, P=0.003) and back fat area (GWAS: 0.27 cm², P<0.0001; validation study: 0.35 cm², P=0.03). Marker ASGA0056782 on SSC13 was associated with live-weight gain. The major allele had negative effects in both studies (GWAS: -4.88 g/day, P<0.0001; validation study: -3.75 g/day, P=0.02). The effects of these two markers would have been excluded based on the GWAS alone but were shown to be significantly trait associated in the validation study indicating a false-negative result. The G protein-coupled receptor 126 (GPR126) gene ∼200kb down-stream of marker ALGA0001781 was shown to be associated with human height, and thus might explain the association with weight gain in pigs.
Several traits were affected in an economically desired direction by the minor allele of the markers, pointing to the possibility of improvement through further selection.
    Full-text · Article · Mar 2014 · Journal of Animal Science
    • "Because the fluorescent signal of SNPs may vary with DNA samples and assays, the incidence of poor allele clustering such as multiple or indistinguishable clusters of samples may occur and lead to inaccurate allele calls. Based on our standard to define SNPs with successful allele calls, the rate of successful allele calls using the SoySNP50K beadchip was 91%, which is comparable to the rate of 77–91% reported in human populations using the Illumina Infinium 550 K beadchip [36], 92% in cattle using Illumina Infinium assay BovineSNP50 [37] and 81% in horse using the Illumina commercial assay EquineSNP50 [38]. "
    ABSTRACT: The objective of this research was to identify single nucleotide polymorphisms (SNPs) and to develop an Illumina Infinium BeadChip that contained over 50,000 SNPs from soybean (Glycine max L. Merr.). A total of 498,921,777 reads 35-45bp in length were obtained from DNA sequence analysis of reduced representation libraries from several soybean accessions which included six cultivated and two wild soybean (G. soja Sieb. et Zucc.) genotypes. These reads were mapped to the soybean whole genome sequence and 209,903 SNPs were identified. After applying several filters, a total of 146,161 of the 209,903 SNPs were determined to be ideal candidates for Illumina Infinium II BeadChip design. To equalize the distance between selected SNPs, increase assay success rate, and minimize the number of SNPs with low minor allele frequency, an iteration algorithm based on a selection index was developed and used to select 60,800 SNPs for Infinium BeadChip design. Of the 60,800 SNPs, 50,701 were targeted to euchromatic regions and 10,000 to heterochromatic regions of the 20 soybean chromosomes. In addition, 99 SNPs were targeted to unanchored sequence scaffolds. Of the 60,800 SNPs, a total of 52,041 passed Illumina's manufacturing phase to produce the SoySNP50K iSelect BeadChip. Validation of the SoySNP50K chip with 96 landrace genotypes, 96 elite cultivars and 96 wild soybean accessions showed that 47,337 SNPs were polymorphic and generated successful SNP allele calls. In addition, 40,841 of the 47,337 SNPs (86%) had minor allele frequencies ≥10% among the landraces, elite cultivars and the wild soybean accessions. A total of 620 and 42 candidate regions which may be associated with domestication and recent selection were identified, respectively. 
The SoySNP50K iSelect SNP beadchip will be a powerful tool for characterizing soybean genetic diversity and linkage disequilibrium, and for constructing high resolution linkage maps to improve the soybean whole genome sequence assembly.
    Full-text · Article · Jan 2013 · PLoS ONE
    • "Recently, genome-wide association studies (GWASs) using thousands of cases and controls reported many susceptibility SNPs for 237 human traits by the end of June, 2011 ( Since a GWAS evaluates hundreds of thousands of SNP markers, it requires a much larger sample size to achieve an adequate statistical power [14-18]. In genetic association studies, the observed signal for association is referred to be statistically significant if the p-value is less than a preset threshold value (α) of 0.05 to reject a null hypothesis of genetic association. "
    ABSTRACT: A sample size with sufficient statistical power is critical to the success of genetic association studies to detect causal genes of human complex diseases. Genome-wide association studies require much larger sample sizes to achieve an adequate statistical power. We estimated the statistical power with increasing numbers of markers analyzed and compared the sample sizes that were required in case-control studies and case-parent studies. We computed the effective sample size and statistical power using Genetic Power Calculator. An analysis using a larger number of markers requires a larger sample size. Testing a single-nucleotide polymorphism (SNP) marker requires 248 cases, while testing 500,000 SNPs and 1 million markers requires 1,206 cases and 1,255 cases, respectively, under the assumption of an odds ratio of 2, 5% disease prevalence, 5% minor allele frequency, complete linkage disequilibrium (LD), 1:1 case/control ratio, and a 5% error rate in an allelic test. Under a dominant model, a smaller sample size is required to achieve 80% power than other genetic models. We found that a much lower sample size was required with a strong effect size, common SNP, and increased LD. In addition, studying a common disease in a case-control study of a 1:4 case-control ratio is one way to achieve higher statistical power. We also found that case-parent studies require more samples than case-control studies. Although we have not covered all plausible cases in study design, the estimates of sample size and statistical power computed under various assumptions in this study may be useful to determine the sample size in designing a population-based genetic association study.
    Full-text · Article · Jun 2012
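The growth in required sample size with the number of markers tested, described in the abstract above, can be approximated without a dedicated tool. This is a rough sketch, not the Genetic Power Calculator procedure: it assumes a Bonferroni-corrected two-sided z test and the standard approximation that the required sample size is proportional to (z_{1-α/2} + z_{power})², ignoring the negligible opposite tail. The function name is hypothetical.

```python
from statistics import NormalDist


def sample_size_inflation(n_tests, alpha=0.05, power=0.80):
    """Factor by which the required sample size grows when the
    significance level alpha is Bonferroni-corrected for n_tests tests,
    relative to a single test at the same power.

    Assumes required N is proportional to (z_crit + z_power) ** 2.
    """
    nd = NormalDist()
    z_power = nd.inv_cdf(power)
    z_single = nd.inv_cdf(1.0 - alpha / 2.0)
    z_multi = nd.inv_cdf(1.0 - alpha / (2.0 * n_tests))
    return ((z_multi + z_power) / (z_single + z_power)) ** 2


if __name__ == "__main__":
    for m in (1, 500_000, 1_000_000):
        print(f"{m:>9} tests: inflation factor {sample_size_inflation(m):.2f}")
```

The factors printed can be set against the ratios of the case counts quoted in the abstract (1,206 / 248 and 1,255 / 248) as a sanity check on the approximation.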