Response: Re: Discriminatory Accuracy From Single-Nucleotide Polymorphisms in Models to Predict Breast Cancer Risk

Division of Cancer Epidemiology and Genetics, National Cancer Institute, 6120 Executive Blvd, Rm 8032, Bethesda, MD 20892-7244, USA.
Journal of the National Cancer Institute (Impact Factor: 12.58). 08/2008; 100(14):1037-41. DOI: 10.1093/jnci/djn180
Source: PubMed


One purpose for seeking common alleles that are associated with disease is to use them to improve models for projecting individualized disease risk. Two genome-wide association studies and a study of candidate genes recently identified seven common single-nucleotide polymorphisms (SNPs) that were associated with breast cancer risk in independent samples. These seven SNPs were located in FGFR2, TNRC9 (now known as TOX3), MAP3K1, LSP1, CASP8, chromosomal region 8q, and chromosomal region 2q35. I used estimates of relative risks and allele frequencies from these studies to estimate how much these SNPs could improve discriminatory accuracy measured as the area under the receiver operating characteristic curve (AUC). A model with these seven SNPs (AUC = 0.574) and a hypothetical model with 14 such SNPs (AUC = 0.604) have less discriminatory accuracy than a model, the National Cancer Institute's Breast Cancer Risk Assessment Tool (BCRAT), that is based on ages at menarche and at first live birth, family history of breast cancer, and history of breast biopsy examinations (AUC = 0.607). Adding the seven SNPs to BCRAT improved discriminatory accuracy to an AUC of 0.632, which was, however, less than the improvement from adding mammographic density. Thus, these seven common alleles provide less discriminatory accuracy than BCRAT but have the potential to improve the discriminatory accuracy of BCRAT modestly. Experience to date and quantitative arguments indicate that a huge increase in the numbers of case patients with breast cancer and control subjects would be required in genome-wide association studies to find enough SNPs to achieve high discriminatory accuracy.
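As a rough illustration of how an AUC can be derived from per-allele relative risks and allele frequencies, the Python sketch below enumerates every genotype combination and compares the risk distribution in cases with that in controls. It is a minimal sketch under stated assumptions and not the calculation used in the article. It assumes independent SNPs, Hardy-Weinberg proportions, a multiplicative per-allele effect, and a disease rare enough that controls resemble the general population. The function names and the seven (frequency, relative risk) pairs are hypothetical placeholders, not the published estimates.

```python
import math
from itertools import product


def single_snp(freq, rr):
    """Genotype (probability, relative risk) pairs for 0, 1, 2 risk alleles.

    Assumes Hardy-Weinberg proportions and a multiplicative per-allele effect.
    """
    return [
        ((1 - freq) ** 2, 1.0),
        (2 * freq * (1 - freq), rr),
        (freq ** 2, rr ** 2),
    ]


def snp_auc(snps):
    """AUC of the combined relative-risk score for independent SNPs.

    `snps` is a list of (risk_allele_frequency, per_allele_relative_risk).
    Controls are approximated by the population distribution (rare disease);
    cases follow the same distribution reweighted by relative risk.
    """
    combos = []
    for combo in product(*(single_snp(f, r) for f, r in snps)):
        prob = math.prod(p for p, _ in combo)
        risk = math.prod(r for _, r in combo)
        combos.append((risk, prob))
    combos.sort()
    mean_risk = sum(risk * prob for risk, prob in combos)

    auc = 0.0
    controls_below = 0.0
    i = 0
    while i < len(combos):
        # Group genotype combinations that share the same relative risk (ties).
        j = i
        tie_control = 0.0
        tie_case = 0.0
        while j < len(combos) and math.isclose(combos[j][0], combos[i][0], rel_tol=1e-9):
            risk, prob = combos[j]
            tie_control += prob
            tie_case += prob * risk / mean_risk
            j += 1
        # P(case score > control score) + 0.5 * P(tie)
        auc += tie_case * (controls_below + 0.5 * tie_control)
        controls_below += tie_control
        i = j
    return auc


# Hypothetical inputs for illustration only; NOT the estimates from the studies.
HYPOTHETICAL_SNPS = [
    (0.38, 1.26), (0.25, 1.20), (0.28, 1.13), (0.30, 1.07),
    (0.87, 1.13), (0.40, 1.08), (0.50, 1.20),
]

print(f"AUC for the hypothetical 7-SNP model: {snp_auc(HYPOTHETICAL_SNPS):.3f}")
```

With modest per-allele relative risks such as these placeholders, each additional independent SNP raises the AUC only slightly, which is the qualitative pattern the abstract describes.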

  • Source
    • "Issues may also arise from the strategy applied to study single SNPs: assigning an odd ratio to each of them and selecting the most relevant one on the basis of this value (Gail, 2008) is known to be a method prone to selection of non-relevant SNP. On the other side, as the aging process is probably regulated slightly by several interacting factors, the little amount of information can be easily confounded by the huge amount of different features tested, especially given the typically small dimension of the sample (from a statistical point of view) (Greenland, 2000; Marchini et al., 2005; Tranah, 2011 "
    ABSTRACT: The last 30 years of research have greatly helped shed light on the role of mitochondrial DNA (mtDNA) variability in aging, although contrasting results have been reported, mainly due to bias regarding the population size and stratification, and to the use of analysis methods (haplogroup classification) that proved inadequate to grasp the complexity of the phenomenon. A 5-year European study (the GEHA EU project) collected and analysed data on mtDNA variability on an unprecedented number of long-living subjects (enriched for longevity genes) and a comparable number of controls (matched for gender and ethnicity) in Europe. This very large study allowed a reappraisal of the role of both the inherited and the somatic mtDNA variability in aging, as an association with longevity emerged only when mtDNA variants in OXPHOS complexes co-occurred. Moreover, the availability of data from both nuclear and mitochondrial genomes on a large number of subjects paves the way for an evaluation at a very large scale of the epistatic interactions at a higher level of complexity. This scenario is expected to become clearer in the near future with the use of next generation sequencing (NGS) techniques, which are becoming applicable to evaluate mtDNA variability; new mathematical/bioinformatic analysis methods are therefore urgently needed. Recent advances in association studies on age-related diseases and mtDNA variability will also be discussed in this review, taking into account the bias hidden by population stratification. Finally, very recent findings on mtDNA heteroplasmy (i.e. the coexistence of wild type and mutated copies of mtDNA) and aging, as well as mitochondrial epigenetic mechanisms, will also be discussed.
    Experimental gerontology 04/2014; 56. DOI:10.1016/j.exger.2014.03.022 · 3.49 Impact Factor
  • Source
    • "The inclusion of 7–18 common genetic variants for breast cancer has been shown to increase discrimination of the Gail model by 0.03–0.07 [40–42]. Apart from the substantial costs of obtaining genetic information, this modest improvement in discrimination was similar to the increase of 0.05 obtained from adding only mammographic density, a strong and highly prevalent risk factor [43]. "
    ABSTRACT: The Gail model for predicting the absolute risk of invasive breast cancer has been validated extensively in US populations, but its performance in the international setting remains uncertain. We evaluated the predictive accuracy of the Gail model in 54,649 Spanish women aged 45–68 years who were free of breast cancer at the 1996–1998 baseline mammographic examination in the population-based Navarre Breast Cancer Screening Program. Incident cases of invasive breast cancer and competing deaths were ascertained until the end of 2005 (average follow-up of 7.7 years) through linkage with population-based cancer and mortality registries. The Gail model was tested for calibration and discrimination in its original form and after recalibration to the lower breast cancer incidence and risk factor prevalence in the study cohort, and compared through cross-validation with a Navarre model fully developed from this cohort. The original Gail model overpredicted significantly the 835 cases of invasive breast cancer observed in the cohort (ratio of expected to observed cases 1.46, 95 % CI 1.36–1.56). The recalibrated Gail model was well calibrated overall (expected-to-observed ratio 1.00, 95 % CI 0.94–1.07), but it tended to underestimate risk for women in low-risk quintiles and to overestimate risk in high-risk quintiles (P = 0.01). The Navarre model showed good cross-validated calibration overall (expected-to-observed ratio 0.98, 95 % CI 0.92–1.05) and in different cohort subsets. The Navarre and Gail models had modest cross-validated discrimination indexes of 0.542 (95 % CI 0.521–0.564) and 0.544 (95 % CI 0.523–0.565), respectively. Although the original Gail model cannot be applied directly to populations with different underlying rates of invasive breast cancer, it can readily be recalibrated to provide unbiased estimates of absolute risk in such populations. Nevertheless, its limited discrimination ability at the individual level highlights the need to develop extended models with additional strong risk factors.
    Breast Cancer Research and Treatment 02/2013; 138(1). DOI:10.1007/s10549-013-2428-y · 3.94 Impact Factor
  • Source
    • "What are typical results? Although for many complex diseases, there have been impressive numbers of genetic regions identified to be associated, the typical results for classification and probability estimation are that the predictive values are only moderate (Gail 2008; Kooperberg et al. 2010). Many examples for this have been given by Janssens and van Duijn (2008), and one systematic collation of evidence on genetic tests is given by the Evaluation of Genomic Applications in Practice and Prevention (EGAPP) initiative (Teutsch et al. 2009). "
    ABSTRACT: After an association between genetic variants and a phenotype has been established, further study goals comprise the classification of patients according to disease risk or the estimation of disease probability. To accomplish this, different statistical methods are required, and specifically machine-learning approaches may offer advantages over classical techniques. In this paper, we describe methods for the construction and evaluation of classification and probability estimation rules. We review the use of machine-learning approaches in this context and explain some of the machine-learning algorithms in detail. Finally, we illustrate the methodology through application to a genome-wide association analysis on rheumatoid arthritis.
    Human Genetics 07/2012; 131(10):1639-54. DOI:10.1007/s00439-012-1194-y · 4.82 Impact Factor