Two-Stage Designs for Gene-Disease Association Studies with Sample Size Constraints

Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA.
Biometrics (Impact Factor: 1.57). 10/2004; 60(3):589-97. DOI: 10.1111/j.0006-341X.2004.00207.x
Source: PubMed


Gene-disease association studies based on case-control designs may often be used to identify candidate polymorphisms (markers) conferring disease risk. If a large number of markers are studied, genotyping all markers on all samples is inefficient in resource utilization. Here, we propose an alternative two-stage method to identify disease-susceptibility markers. In the first stage all markers are evaluated on a fraction of the available subjects. The most promising markers are then evaluated on the remaining individuals in Stage 2. This approach can be cost effective since markers unlikely to be associated with the disease can be eliminated in the first stage. Using simulations we show that, when the markers are independent and when they are correlated, the two-stage approach provides a substantial reduction in the total number of marker evaluations for a minimal loss of power. The power of the two-stage approach is evaluated when a single marker is associated with the disease, and in the presence of multiple disease-susceptibility markers. As a general guideline, the simulations over a wide range of parametric configurations indicate that evaluating all the markers on 50% of the individuals in Stage 1 and evaluating the most promising 10% of the markers on the remaining individuals in Stage 2 provides near-optimal power while resulting in a 45% decrease in the total number of marker evaluations.
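The 45% figure in the guideline follows from simple counting, which the short sketch below works through. It assumes that one "marker evaluation" means genotyping one marker on one subject. The function names and the study size (1,000 subjects, 3,000 markers) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def two_stage_evaluations(n_subjects, n_markers, stage1_fraction, carry_fraction):
    """Count marker evaluations in a two-stage design.

    Assumption: one evaluation = one marker genotyped on one subject.
    stage1_fraction: share of subjects genotyped on all markers in Stage 1.
    carry_fraction:  share of markers carried forward to Stage 2.
    """
    stage1 = stage1_fraction * n_subjects * n_markers
    stage2 = (1 - stage1_fraction) * n_subjects * carry_fraction * n_markers
    return stage1 + stage2


def relative_saving(stage1_fraction, carry_fraction):
    """Fractional reduction versus genotyping every marker on every subject."""
    return 1 - (stage1_fraction + (1 - stage1_fraction) * carry_fraction)


# Illustrative (hypothetical) study size, with the abstract's guideline settings.
n_subjects, n_markers = 1000, 3000
one_stage = n_subjects * n_markers
two_stage = two_stage_evaluations(n_subjects, n_markers, 0.5, 0.1)

print(f"one-stage evaluations: {one_stage:,}")
print(f"two-stage evaluations: {two_stage:,.0f}")
print(f"relative saving: {relative_saving(0.5, 0.1):.0%}")
```

With half the subjects in Stage 1 and 10% of markers carried forward, the two-stage count is 0.5 + 0.5 × 0.1 = 0.55 of the one-stage count, regardless of study size. The sketch covers only the cost side; the power comparison reported in the abstract comes from the authors' simulations and is not reproduced here.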

  • Source
    • "Compared to one-stage designs that all case–control samples are genotyped on the whole panel of SNPs (usually 500,000 to 1 million SNPs) at a time, a well-constructed two-stage design can substantially reduce genotyping workload and cost (Goll and Bauer, 2007; Kwak et al., 2009; Satagopan et al., 2004; Skol et al., 2007; Thomas et al., 2004; Wang et al., 2006; Yu et al., 2007; Zheng et al., 2008; Zuo et al., 2006). It should be noted, however, that the two-stage design under the framework of GWASs is different from the two-stage designs for phase II clinical trials for comparing two treatments in which a homogeneous samples of patients is entered sequentially and a termination rule is desired (O'Brien and Fleming, 1979; Pocock, 1977; Simon, 1989). "
    ABSTRACT: In genome-wide association studies (GWASs) to detect the disease-associated genetic variants, two-stage design has received much attention because of its cost effectiveness and high efficiency. Under the framework of a two-stage design, it has been shown that joint analysis is more powerful than replication-based analysis. Several robust tests have been proposed for joint analysis to handle the problem of unknown genetic mode of inheritance. However, existing joint analysis of combining test statistics from both stages might suffer from a loss of efficiency if the combined test statistics are not sufficient or the weight of the statistic for each stage is not appropriate. In this paper, we propose a new strategy for joint analysis by combining the raw data rather than the test statistics across stages and construct a robust MAX3-based test for two-staged GWASs, which can make full use of the information of the data from both stages. Our numerical results show that the proposed procedure is more powerful and computationally much faster than the existing joint analysis procedures. An application to a type 2 diabetes data set is used to illustrate the proposed approach.
    Full-text · Article · Dec 2014 · Communication in Statistics- Simulation and Computation
  • Source
    • "Although the costs of whole-genome genotyping are decreasing with the high-throughput biological technology, the total costs for a GWAS are still very expensive due to the thousands of sampling units and huge amounts of single-nucleotide polymorphisms. In order to save the costs, the two-stage design and the corresponding statistical analysis where all the SNPs are genotyped in Stage 1 on a portion of the samples and the promising SNPs with small P-values (e.g., <0.001) based on some efficient tests are further screened on the remaining subjects, are often adopted in practice (e.g., [11] [12] [13] [14] [15]). "

    Full-text · Dataset · Jan 2014
  • Source
    • "Approach of minimization costs in two-stage design was proposed by Elston et al. [3] for linkage analysis. Later this approach was transferred to association analysis by Satagopan et al. [12][13][14]. Optimization of the design consists in choosing the proportion of samples between two stages and critical values in such a manner as to minimize the total cost for specified genome-wide significance level and power [9][15][21][5] [8][16][10]. The start point of present work was a paper of Nguyen et al [10], where an optimal robust two-stage design using the MAX3 test were considered. "
    ABSTRACT: Genome-wide Association Studies (GWAS) require large phenotyping and genotyping costs. Two-stage design can be efficient to reduce genotyping costs: on the first stage some disease associated SNP are detected and these associations are checked on the second stage with reliable significance level. This procedure decreases the number of genotyped SNP on the second stage, thus the genotyping costs will be less than genotyping costs of one-stage design. Modern genotyping technologies allow using 96 and 384 well plates. Thus the number of individuals should be proportional to well plate size. Monte Carlo simulation was used to find optimal number of well plates and critical values on the first and second stages. We also found that the costs have inverse relationship to Kullback-Leibler divergence between cases and controls distributions under alternative hypothesis.
    Full-text · Conference Paper · Sep 2013