Two-stage designs for gene-disease association studies with sample size constraints
ABSTRACT Gene-disease association studies based on case-control designs may often be used to identify candidate polymorphisms (markers) conferring disease risk. If a large number of markers are studied, genotyping all markers on all samples is inefficient in resource utilization. Here, we propose an alternative two-stage method to identify disease-susceptibility markers. In the first stage all markers are evaluated on a fraction of the available subjects. The most promising markers are then evaluated on the remaining individuals in Stage 2. This approach can be cost effective since markers unlikely to be associated with the disease can be eliminated in the first stage. Using simulations we show that, when the markers are independent and when they are correlated, the two-stage approach provides a substantial reduction in the total number of marker evaluations for a minimal loss of power. The power of the two-stage approach is evaluated when a single marker is associated with the disease, and in the presence of multiple disease-susceptibility markers. As a general guideline, the simulations over a wide range of parametric configurations indicate that evaluating all the markers on 50% of the individuals in Stage 1 and evaluating the most promising 10% of the markers on the remaining individuals in Stage 2 provides near-optimal power while resulting in a 45% decrease in the total number of marker evaluations.
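
The 45% figure in the guideline above can be checked by counting marker evaluations directly. The following is a minimal sketch, not the authors' simulation code: the function names and the study size (1,000 subjects, 5,000 markers) are hypothetical, and it assumes every subject is genotyped exactly once per marker carried into that subject's stage.

```python
def two_stage_evaluations(n_subjects, n_markers, stage1_fraction, carry_fraction):
    """Count marker evaluations in a two-stage design.

    Stage 1: all markers on a fraction of the subjects.
    Stage 2: the carried-forward markers on the remaining subjects.
    """
    n_stage1 = round(n_subjects * stage1_fraction)
    n_stage2 = n_subjects - n_stage1
    m_stage2 = round(n_markers * carry_fraction)
    return n_stage1 * n_markers + n_stage2 * m_stage2


def relative_saving(n_subjects, n_markers, stage1_fraction, carry_fraction):
    """Fractional reduction relative to genotyping every marker on every subject."""
    one_stage = n_subjects * n_markers
    two_stage = two_stage_evaluations(
        n_subjects, n_markers, stage1_fraction, carry_fraction
    )
    return 1 - two_stage / one_stage


# Hypothetical study size; the 50% / 10% split is the guideline from the abstract.
saving = relative_saving(1000, 5000, stage1_fraction=0.5, carry_fraction=0.1)
print(f"Reduction in marker evaluations: {saving:.0%}")
```

In general the two-stage cost is a fraction f + (1 - f) * c of the one-stage cost, where f is the Stage 1 share of subjects and c is the share of markers carried forward, so 0.5 + 0.5 * 0.1 = 0.55 and the saving is 45% regardless of study size.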

 "Although the costs of whole-genome genotyping are decreasing with the high-throughput biological technology, the total costs for a GWAS are still very expensive due to the thousands of sampling units and huge amounts of single-nucleotide polymorphisms. In order to save the costs, the two-stage design and the corresponding statistical analysis where all the SNPs are genotyped in Stage 1 on a portion of the samples and the promising SNPs with small p-values (e.g., <0.001) based on some efficient tests are further screened on the remaining subjects, are often adopted in practice (e.g., [11] [12] [13] [14] [15]). "
Dataset: CMMM2013843563

 "Approach of minimization costs in two-stage design was proposed by Elston et al. [3] for linkage analysis. Later this approach was transferred to association analysis by Satagopan et al. [12][13][14]. Optimization of the design consists in choosing the proportion of samples between two stages and critical values in such a manner as to minimize the total cost for specified genome-wide significance level and power [9][15][21][5] [8][16][10]. The start point of present work was a paper of Nguyen et al [10], where an optimal robust two-stage design using the MAX3 test were considered. "
ABSTRACT: Genome-wide Association Studies (GWAS) require large phenotyping and genotyping costs. Two-stage design can be efficient to reduce genotyping costs: on the first stage some disease associated SNP are detected and these associations are checked on the second stage with reliable significance level. This procedure decreases the number of genotyped SNP on the second stage, thus the genotyping costs will be less than genotyping costs of one-stage design. Modern genotyping technologies allow using 96 and 384 well plates. Thus the number of individuals should be proportional to well plate size. Monte Carlo simulation was used to find optimal number of well plates and critical values on the first and second stages. We also found that the costs have inverse relationship to Kullback-Leibler divergence between cases and controls distributions under alternative hypothesis.
Applied Methods of Statistical Analysis. Simulations and Statistical Inference (AMSA 2013) International Conference, Novosibirsk, Russia; 09/2013
 "False negative rates are increased by multiple factors that cause systematic biases, and such biases reduce statistical power [26]. The statistical power of 80% is used widely to avoid false negative associations and to determine a cost-effective sample size in large-scale association studies [7, 22, 23]. However, many researchers tend to overlook the importance of statistical power and sample size calculations. "
ABSTRACT: A sample size with sufficient statistical power is critical to the success of genetic association studies to detect causal genes of human complex diseases. Genome-wide association studies require much larger sample sizes to achieve an adequate statistical power. We estimated the statistical power with increasing numbers of markers analyzed and compared the sample sizes that were required in case-control studies and case-parent studies. We computed the effective sample size and statistical power using Genetic Power Calculator. An analysis using a larger number of markers requires a larger sample size. Testing a single-nucleotide polymorphism (SNP) marker requires 248 cases, while testing 500,000 SNPs and 1 million markers requires 1,206 cases and 1,255 cases, respectively, under the assumption of an odds ratio of 2, 5% disease prevalence, 5% minor allele frequency, complete linkage disequilibrium (LD), 1:1 case/control ratio, and a 5% error rate in an allelic test. Under a dominant model, a smaller sample size is required to achieve 80% power than other genetic models. We found that a much lower sample size was required with a strong effect size, common SNP, and increased LD. In addition, studying a common disease in a case-control study of a 1:4 case-control ratio is one way to achieve higher statistical power. We also found that case-parent studies require more samples than case-control studies. Although we have not covered all plausible cases in study design, the estimates of sample size and statistical power computed under various assumptions in this study may be useful to determine the sample size in designing a population-based genetic association study.
06/2012; 10(2):117-22. DOI:10.5808/GI.2012.10.2.117
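
The abstract above reports that the required number of cases grows with the number of markers tested. One common reason for such growth is multiple-testing correction, which tightens the per-test significance threshold as more markers are analyzed. The sketch below assumes a Bonferroni correction of a 5% family-wise error rate. The abstract does not state which correction was used, so this is an illustration of the general effect and not a reproduction of the reported sample sizes. The function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction (assumed here)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


# Marker counts taken from the abstract; the 0.05 family-wise rate is an assumption.
for n_markers in (1, 500_000, 1_000_000):
    alpha = bonferroni_threshold(0.05, n_markers)
    print(f"{n_markers:>9,} markers: per-test alpha = {alpha:.1e}")
```

A smaller per-test threshold requires a larger sample to keep power at a fixed level such as 80%, which matches the direction of the trend described in the abstract.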