To Correct or Not to Correct-and How

Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, CA
Epidemiology (Cambridge, Mass.) (Impact Factor: 6.2). 11/2012; 23(6):912-3. DOI: 10.1097/EDE.0b013e31826cc1b3
Source: PubMed
  • ABSTRACT: Misclassification of phenotype status can seriously affect accuracy in association studies, including studies of genetic risk factors. A common problem is the classification of participants as nondiseased because of insufficient diagnostic workup or because participants have not been followed up long enough to develop disease. Some validated predictive models may have high discrimination in predicting disease. We suggest that information from such models can be used to predict the risk that a nondiseased participant will eventually develop disease and to recode the status of participants predicted to be at highest risk. We evaluate conditions under which recoding results in a maximal net improvement in the accuracy of phenotype classification. Net improvement is expected only when the positive likelihood ratio of the predictive model is larger than the inverse of the odds of disease among apparently nondiseased controls. We conducted simulations to probe the impact of reclassification on the power to detect new risk factors under several scenarios of classification accuracy of the previously developed models. We also apply this framework to a validated model of progression to advanced age-related macular degeneration that uses genetic and nongenetic variables (area under the curve = 0.915). In the training cohort (n = 2,937) and a separate validation cohort (n = 1,227), 195-272 and 78-91 nonprogressor participants, respectively, were reclassified as progressors. Correction of phenotype misclassification based on highly informative predictive models may be helpful in identifying additional genetic and other risk factors, when there are validated risk factors that provide strong discriminating ability.
    Epidemiology (Cambridge, Mass.) 09/2012; 23(6):902-9. DOI:10.1097/EDE.0b013e31826c3129 · 6.20 Impact Factor
  • Epidemiology (Cambridge, Mass.) 11/2012; 23(6):910-1. DOI:10.1097/EDE.0b013e31826cc118 · 6.20 Impact Factor
  • ABSTRACT: We report a set of tools to estimate the number of susceptibility loci and the distribution of their effect sizes for a trait on the basis of discoveries from existing genome-wide association studies (GWASs). We propose statistical power calculations for future GWASs using estimated distributions of effect sizes. Using reported GWAS findings for height, Crohn's disease and breast, prostate and colorectal (BPC) cancers, we determine that each of these traits is likely to harbor additional loci within the spectrum of low-penetrance common variants. These loci, which can be identified from sufficiently powerful GWASs, together could explain at least 15-20% of the known heritability of these traits. However, for BPC cancers, which have modest familial aggregation, our analysis suggests that risk models based on common variants alone will have modest discriminatory power (63.5% area under curve), even with new discoveries.
    Nature Genetics 07/2010; 42(7):570-5. DOI:10.1038/ng.610 · 29.35 Impact Factor
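
The first abstract above states that recoding yields a net improvement only when the positive likelihood ratio of the predictive model exceeds the inverse of the odds of disease among apparently nondiseased controls. The sketch below is one way to check that condition numerically; it is an illustration, not the authors' code. The function names, the parameter latent_prevalence (the assumed fraction of apparent controls who are truly diseased), and the example numbers are all hypothetical. It assumes that every control flagged by the model is recoded, so the net gain is the truly diseased controls correctly recoded minus the truly nondiseased controls wrongly recoded.

```python
import math


def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity); infinite if specificity is 1."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    if specificity == 1.0:
        return math.inf
    return sensitivity / (1.0 - specificity)


def recoding_improves_accuracy(sensitivity, specificity, latent_prevalence):
    """Check the stated condition: LR+ > 1 / odds of disease among controls.

    latent_prevalence is a hypothetical input: the assumed fraction of
    apparently nondiseased controls who are in fact diseased.
    """
    if not (0.0 < latent_prevalence < 1.0):
        raise ValueError("latent_prevalence must lie strictly between 0 and 1")
    odds = latent_prevalence / (1.0 - latent_prevalence)
    return positive_likelihood_ratio(sensitivity, specificity) > 1.0 / odds


def net_corrected_fraction(sensitivity, specificity, latent_prevalence):
    """Fraction of controls correctly recoded minus fraction wrongly recoded.

    Assumes every control flagged by the model is recoded as diseased.
    A positive value means recoding reduces misclassification overall.
    """
    true_fixes = sensitivity * latent_prevalence
    new_errors = (1.0 - specificity) * (1.0 - latent_prevalence)
    return true_fixes - new_errors


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    for spec in (0.99, 0.90):
        ok = recoding_improves_accuracy(0.80, spec, 0.05)
        net = net_corrected_fraction(0.80, spec, 0.05)
        print(f"specificity={spec}: improves={ok}, net fraction={net:+.4f}")
```

Under these assumptions the two functions agree by construction: the net fraction is positive exactly when the likelihood-ratio condition holds, since sensitivity × p > (1 − specificity) × (1 − p) rearranges to LR+ > (1 − p) / p.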