Two-stage designs for gene-disease association studies with sample size constraints

Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA.
Biometrics (Impact Factor: 1.52). 10/2004; 60(3):589-97. DOI: 10.1111/j.0006-341X.2004.00207.x
Source: PubMed

ABSTRACT Gene-disease association studies based on case-control designs may often be used to identify candidate polymorphisms (markers) conferring disease risk. If a large number of markers are studied, genotyping all markers on all samples is inefficient in resource utilization. Here, we propose an alternative two-stage method to identify disease-susceptibility markers. In the first stage all markers are evaluated on a fraction of the available subjects. The most promising markers are then evaluated on the remaining individuals in Stage 2. This approach can be cost effective since markers unlikely to be associated with the disease can be eliminated in the first stage. Using simulations we show that, when the markers are independent and when they are correlated, the two-stage approach provides a substantial reduction in the total number of marker evaluations for a minimal loss of power. The power of the two-stage approach is evaluated when a single marker is associated with the disease, and in the presence of multiple disease-susceptibility markers. As a general guideline, the simulations over a wide range of parametric configurations indicate that evaluating all the markers on 50% of the individuals in Stage 1 and evaluating the most promising 10% of the markers on the remaining individuals in Stage 2 provides near-optimal power while resulting in a 45% decrease in the total number of marker evaluations.
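
The 45% figure in the guideline can be checked with simple arithmetic. The sketch below is only an illustration and assumes that cost is proportional to the number of (marker, subject) genotype evaluations; the function name `evaluation_fraction` and its parameters are hypothetical and do not come from the paper.

```python
def evaluation_fraction(stage1_sample_fraction, stage2_marker_fraction):
    """Return the two-stage genotyping effort as a fraction of the one-stage effort.

    Assumption: effort is counted as marker evaluations, i.e. one unit per
    (marker, subject) pair, and a one-stage design evaluates all markers
    on all subjects.
    """
    # Stage 1: all markers on a fraction of the subjects.
    stage1 = stage1_sample_fraction
    # Stage 2: the selected fraction of markers on the remaining subjects.
    stage2 = stage2_marker_fraction * (1.0 - stage1_sample_fraction)
    return stage1 + stage2


# Guideline from the abstract: 50% of subjects in Stage 1,
# top 10% of markers carried forward to Stage 2.
fraction = evaluation_fraction(0.5, 0.1)
reduction = 1.0 - fraction
print(f"Fraction of one-stage evaluations: {fraction:.2f}")
print(f"Reduction in marker evaluations: {reduction:.0%}")
```

With these inputs, Stage 1 uses 0.50 of the one-stage effort and Stage 2 adds 0.10 × 0.50 = 0.05, for a total of 0.55, which matches the 45% decrease stated in the abstract.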

  • Source
    ABSTRACT: The two-stage design is a common cost-effective approach for genome-wide association studies. The first stage serves as a screening to identify a subset of single-nucleotide polymorphisms (SNPs) from 100,000 to 500,000 SNPs using a fraction of case-control samples. In the second stage, only the selected SNPs are genotyped using the remaining case-control samples. On the other hand, DNA pooling is another common strategy to save genotyping cost. In this article, we propose a method using DNA pooling in the first stage and genotype-based analysis in the second stage. A joint analysis to combine both stages is applied to a two-stage design with DNA pooling when the underlying genetic model is known. When the genetic model is unknown, we use a robust procedure in the joint analysis by applying genetic model selection in the second stage based on the difference of Hardy-Weinberg disequilibrium coefficients between cases and controls. Performance of our method and comparison with other approaches are investigated by simulation studies.
    Statistica Sinica 01/2009; 19(4). · 1.23 Impact Factor
  • Source
    ABSTRACT: Genome-wide association studies (GWAS) incur large phenotyping and genotyping costs. A two-stage design can reduce genotyping costs: in the first stage, candidate disease-associated SNPs are detected, and in the second stage these associations are verified at a reliable significance level. This procedure decreases the number of SNPs genotyped in the second stage, so the genotyping costs are lower than those of a one-stage design. Modern genotyping technologies use 96- and 384-well plates, so the number of individuals should be proportional to the well-plate size. Monte Carlo simulation was used to find the optimal number of well plates and the critical values for the first and second stages. We also found that the costs are inversely related to the Kullback-Leibler divergence between the case and control distributions under the alternative hypothesis.
    Applied Methods of Statistical Analysis. Simulations and Statistical Inference (AMSA 2013) International Conference, Novosibirsk, Russia; 09/2013