# Discriminatory accuracy from single-nucleotide polymorphisms in models to predict breast cancer risk.

**ABSTRACT** One purpose for seeking common alleles that are associated with disease is to use them to improve models for projecting individualized disease risk. Two genome-wide association studies and a study of candidate genes recently identified seven common single-nucleotide polymorphisms (SNPs) that were associated with breast cancer risk in independent samples. These seven SNPs were located in FGFR2, TNRC9 (now known as TOX3), MAP3K1, LSP1, CASP8, chromosomal region 8q, and chromosomal region 2q35. I used estimates of relative risks and allele frequencies from these studies to estimate how much these SNPs could improve discriminatory accuracy measured as the area under the receiver operating characteristic curve (AUC). A model with these seven SNPs (AUC = 0.574) and a hypothetical model with 14 such SNPs (AUC = 0.604) have less discriminatory accuracy than a model, the National Cancer Institute's Breast Cancer Risk Assessment Tool (BCRAT), that is based on ages at menarche and at first live birth, family history of breast cancer, and history of breast biopsy examinations (AUC = 0.607). Adding the seven SNPs to BCRAT improved discriminatory accuracy to an AUC of 0.632, which was, however, less than the improvement from adding mammographic density. Thus, these seven common alleles provide less discriminatory accuracy than BCRAT but have the potential to improve the discriminatory accuracy of BCRAT modestly. Experience to date and quantitative arguments indicate that a huge increase in the numbers of case patients with breast cancer and control subjects would be required in genome-wide association studies to find enough SNPs to achieve high discriminatory accuracy.


jnci.oxfordjournals.org

JNCI | Brief Communication 1037

BRIEF COMMUNICATION

Mitchell H. Gail

J Natl Cancer Inst 2008;100:1037–1041

Affiliation of author: Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD.

Correspondence to: Mitchell H. Gail, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 6120 Executive Blvd, Rm 8032, Bethesda, MD 20892-7244 (e-mail: gailm@mail.nih.gov).

See “Funding” and “Notes” following “References.”

DOI: 10.1093/jnci/djn180

Published by Oxford University Press 2008.

Hopes have been raised that combinations of common genetic markers can be used to improve the discriminatory accuracy of models to project the risk of a specific disease, such as breast cancer, and thereby improve disease prevention programs (1). Recent genome-wide association studies (2,3) and an assessment of candidate single-nucleotide polymorphisms (SNPs) (4) revealed seven common SNP alleles that confer risk for breast cancer. I calculated how much discriminatory accuracy these SNPs provide and how much they can add to the discriminatory accuracy of Gail model 2 (5) in the National Cancer Institute’s Breast Cancer Risk Assessment Tool (BCRAT) (http://www.cancer.gov/bcrisktool/).

From table 2 in Easton et al. (2), the allele frequencies ($s_i$) and per allele odds ratios ($\mathrm{OR}_i$) for five disease-associated SNPs were, respectively, 0.38 and 1.26 for rs2981582 in FGFR2, 0.25 and 1.20 for rs3803662 in TNRC9 (now known as TOX3), 0.28 and 1.13 for rs889312 in MAP3K1, 0.30 and 1.07 for rs3817198 in LSP1, and 0.40 and 1.08 for rs13281615 in chromosomal region 8q. I used the SNP in TNRC9 with the highest association with disease, rs3803662, which was identified as the result of fine-scale mapping (2). I included SNP rs13387042 in chromosomal region 2q35, with allele frequency 0.497 and per allele odds ratio of 1.20, from data in table 1 of Stacey et al. (3). The minor allele in CASP8 D302H (rs1045485) in chromosomal region 2q has a frequency of 0.13 and odds ratio of 0.88 per allele [from table 1 in Cox et al. (4)]. To provide a relative odds of 1.0 or more for disease-associated alleles, I took the rare homozygote as baseline and used allele frequency 0.87 with an odds ratio of 1.136 (= 1/0.88) for the major allele in the modeling. I define $X_i$ as the number of disease-associated alleles at SNP $i$ in a given subject and define $\mathbf{X} = (X_1, \ldots, X_7)$. Under Hardy–Weinberg equilibrium, the probabilities of $X_i$ = 0, 1, and 2, namely, $p_i(X_i)$, are $(1 - s_i)^2$, $2(1 - s_i)s_i$, and $s_i^2$, respectively, where $s_i$ is the frequency of the disease-associated allele. These seven SNPs are on six different chromosomes, and rs1045485 and rs13387042, which are both on chromosome 2, are 15.8 Mb apart. I therefore assume linkage equilibrium, which implies $P(\mathbf{X}) = \prod_{i=1}^{7} p_i(X_i)$. There are $3^7$ or 2187 such probabilities, $P(\mathbf{X})$. Analyses (2–4) of data on these SNPs indicate that at a given locus the odds ratio is well described by $(\mathrm{OR}_i)^{X_i}$. If it is assumed that SNP effects are additive on the logistic scale, the relative risk for a rare disease is

$$rr(\mathbf{X}) = \prod_{i=1}^{7} \mathrm{OR}_i^{X_i}. \tag{1}$$

The distribution of relative risks in the general population is $F_{rr}(t) = \sum_{\mathbf{X}:\, rr(\mathbf{X}) \le t} P(\mathbf{X})$, where $t$ is a dummy argument representing any real number. The disease risk, $r(\mathbf{X})$, is the probability that a woman with risk factors $\mathbf{X}$ will develop breast cancer over a defined time interval. For a short interval, such as 5 years, $r(\mathbf{X})$ is proportional to $rr(\mathbf{X})$ because competing risks of death can be ignored. Thus, $r(\mathbf{X}) = k[rr(\mathbf{X})]$, where $k$ is the risk for a woman with relative risk 1.0, which corresponds to the lowest level of risk for all risk factors. Hence, the distribution of risk in the general population is

$$F_{r}(t) = \sum_{\mathbf{X}:\, k[rr(\mathbf{X})] \le t} P(\mathbf{X}) = F_{rr}(t/k).$$

As shown by Gail et al. (6), the distribution


1038 Brief Communication | JNCI Vol. 100, Issue 14 | July 16, 2008

of risk in women who develop breast cancer (case patients) is

$$FD_{r}(t) = \sum_{\mathbf{X}:\, k[rr(\mathbf{X})] \le t} k[rr(\mathbf{X})]P(\mathbf{X}) \left\{ \sum_{\text{all } \mathbf{X}} k[rr(\mathbf{X})]P(\mathbf{X}) \right\}^{-1} = \sum_{\mathbf{X}:\, k[rr(\mathbf{X})] \le t} rr(\mathbf{X})P(\mathbf{X}) \left\{ \sum_{\text{all } \mathbf{X}} rr(\mathbf{X})P(\mathbf{X}) \right\}^{-1}.$$

Likewise, the distribution of relative risks in case patients is

$$FD_{rr}(t) = \sum_{\mathbf{X}:\, rr(\mathbf{X}) \le t} rr(\mathbf{X})P(\mathbf{X}) \left\{ \sum_{\text{all } \mathbf{X}} rr(\mathbf{X})P(\mathbf{X}) \right\}^{-1},$$

and it follows that $FD_{r}(t) = FD_{rr}(t/k)$.

The distribution $F_{rr}(t)$ is shown in Figure 1 for the seven-SNP model. The corresponding mean of $\log_e[rr(\mathbf{X})]$ (MLRR) is 0.841, with a standard deviation (SDLRR) of 0.262. This SDLRR describes the dispersion of relative risk and risk in the population and is related to discriminatory accuracy (7). A steep slope in the midrange of a locus in Figure 1 corresponds to a small SDLRR.

The curves in Figure 2 are plots of $[1 - FD_{r}(t)]$ (ie, the probability that risk exceeds a given level, $t$, in case patients) against $[1 - F_{r}(t)]$ (ie, the probability that risk exceeds a given level, $t$, in the population), as the risk level, $t$, (not shown) varies from 0 to 1.0. Each point on a locus thus gives the probability that a case patient would have a risk greater than $t$ on the ordinate and the probability that a member of the general population would have a risk greater than $t$ on the abscissa. If most of the risk were concentrated in a small proportion of the population, the curve would rise quickly, indicating that most case patients had higher risks than members of the general population. In the curve corresponding to the seven-SNP model in Figure 2, only a fraction $[1 - FD_{r}(t_{0.5})] = 0.606$ of case patients have risks higher than the median risk in the general population, defined by $[1 - F_{r}(t_{0.5})] = 0.5$, indicating poor discrimination for the seven-SNP model. Another measure of discriminatory accuracy, the area under this curve (6,8,9), is 0.574. For a rare disease, such as breast cancer in a 5-year interval, this area is very nearly equal to area under the receiver operating characteristic curve (AUC), which is the probability that a randomly selected case patient has a projected risk greater than that of a randomly selected control (non-case) subject (6). For these discrete risk models, I allow for ties in projected risk by computing the probability that the case risk exceeds the control risk (more precisely the risk in the general population) plus half the probability that the case risk equals the control risk.
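The seven-SNP calculation described above can be sketched in a few lines of Python. This is an illustrative reconstruction under the stated assumptions (Hardy–Weinberg equilibrium, linkage equilibrium, and the multiplicative model of Equation 1), not the author's own code; the function names `genotype_probs`, `risk_distribution`, and `summarize` are hypothetical. The allele frequencies and odds ratios are those quoted in the text.

```python
import itertools
import math

# (allele frequency s_i, per-allele odds ratio OR_i) as quoted in the text
SNPS = [
    (0.38, 1.26),   # rs2981582, FGFR2
    (0.25, 1.20),   # rs3803662, TNRC9 (TOX3)
    (0.28, 1.13),   # rs889312, MAP3K1
    (0.30, 1.07),   # rs3817198, LSP1
    (0.40, 1.08),   # rs13281615, 8q
    (0.497, 1.20),  # rs13387042, 2q35
    (0.87, 1.136),  # rs1045485, CASP8 (major allele, OR = 1/0.88)
]


def genotype_probs(s):
    """Hardy-Weinberg probabilities of carrying 0, 1, or 2 risk alleles."""
    return ((1 - s) ** 2, 2 * (1 - s) * s, s ** 2)


def risk_distribution(snps):
    """List of (rr(X), P(X)) over all 3**len(snps) genotype vectors X."""
    dist = []
    for x in itertools.product(range(3), repeat=len(snps)):
        p, rr = 1.0, 1.0
        for xi, (s, odds_ratio) in zip(x, snps):
            p *= genotype_probs(s)[xi]   # linkage equilibrium: probabilities multiply
            rr *= odds_ratio ** xi       # Equation 1: odds ratios multiply
        dist.append((rr, p))
    return dist


def summarize(dist):
    """Return (MLRR, SDLRR, area), with ties counted as one half."""
    mlrr = sum(p * math.log(rr) for rr, p in dist)
    var = sum(p * (math.log(rr) - mlrr) ** 2 for rr, p in dist)
    mean_rr = sum(p * rr for rr, p in dist)
    area, below = 0.0, 0.0
    # Walk upward in rr. Case mass is rr * P(X) / mean_rr, so genotypes that
    # share the same rr can be visited in any order without changing the sum.
    for rr, p in sorted(dist):
        case_p = rr * p / mean_rr
        area += case_p * (below + 0.5 * p)
        below += p
    return mlrr, math.sqrt(var), area


if __name__ == "__main__":
    mlrr, sdlrr, area = summarize(risk_distribution(SNPS))
    print(f"MLRR={mlrr:.3f} SDLRR={sdlrr:.3f} area={area:.3f}")
```

The area is computed against the general population rather than against non-cases, which is the approximation the text adopts for a rare disease.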

Figure 1. Cumulative distributions of the $\log_e$ relative risk, $F_{rr}(t)$, for the seven–single-nucleotide polymorphism (SNP) model (thin dashed line), a 14-SNP model with the original seven SNPs plus seven more SNPs with identical characteristics (thin solid line), the Breast Cancer Risk Assessment Tool (BCRAT; thick dashed line), and BCRAT plus the seven SNPs (thick solid line).

CONTEXT AND CAVEATS

Prior knowledge

Two genome-wide association studies and a study of candidate genes recently identified seven common single-nucleotide polymorphisms (SNPs) that were associated with breast cancer risk in independent samples.

Study design

Estimates of relative risks and allele frequencies from these studies were used to estimate how much these SNPs could improve discriminatory accuracy measured as the area under the receiver operating characteristic curve (AUC). The discriminatory accuracy of these seven SNPs and a hypothetical model with 14 such SNPs were then compared with that of the National Cancer Institute’s Breast Cancer Risk Assessment Tool (BCRAT).

Contribution

The seven-SNP model (AUC = 0.574) and a hypothetical model with 14 such SNPs (AUC = 0.604) have less discriminatory accuracy than the National Cancer Institute’s BCRAT (AUC = 0.607). Adding the seven SNPs to BCRAT increased the AUC to 0.632.

Implications

Experience to date and quantitative arguments indicate that a huge increase in the numbers of case patients with breast cancer and control subjects would be required in genome-wide association studies to find enough SNPs to achieve high discriminatory accuracy.

Limitations

Individual-level data on case patients and control subjects are needed to investigate interactions that may improve the models. The data used to estimate SNP effects did not permit estimation of interactions among SNPs or between SNPs and risk factors in BCRAT.

From the Editors

To determine whether the modest discriminatory accuracy of the seven-SNP model could be improved, I supposed that there were seven more SNPs with identical properties to the first seven SNPs and that all were in linkage equilibrium. As shown in Figure 2, some improvement in discriminatory accuracy was observed, with an AUC of 0.604. The corresponding distribution of $\log_e[rr(\mathbf{X})]$ had an MLRR of 1.682 and an SDLRR of 0.371 (Figure 1). Note that 1.682 is twice the MLRR for the seven-SNP model and 0.371 is $2^{0.5}$ times the SDLRR for the seven-SNP model, as follows from the addition of independent log relative risks (Equation 1).
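The doubling of the MLRR and the $2^{0.5}$ scaling of the SDLRR can be checked from Equation 1 under the same Hardy–Weinberg and linkage equilibrium assumptions. The short derivation below is a sketch; the subscripts 7 and 14 are labels introduced here for the two models and do not appear in the original.

```latex
\begin{align*}
\log_e rr(\mathbf{X}) &= \sum_i X_i \log_e \mathrm{OR}_i, \\
E[X_i] &= 2 s_i, \qquad \operatorname{Var}(X_i) = 2 s_i (1 - s_i), \\
\mathrm{MLRR} &= \sum_i 2 s_i \log_e \mathrm{OR}_i, \qquad
\mathrm{SDLRR}^2 = \sum_i 2 s_i (1 - s_i) (\log_e \mathrm{OR}_i)^2, \\
\mathrm{MLRR}_{14} &= 2\,\mathrm{MLRR}_{7} = 2(0.841) = 1.682, \\
\mathrm{SDLRR}_{14} &= \sqrt{2}\,\mathrm{SDLRR}_{7} \approx 1.414(0.262) \approx 0.371 .
\end{align*}
```

Because the SNPs are independent, repeating the same seven SNPs doubles both the sum of means and the sum of variances, so the standard deviation grows only by the square root of 2.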

BCRAT (Gail model 2) is based on age at first live birth, age at menarche, number of first-degree relatives with breast cancer, and number of previous benign breast biopsy examinations. BCRAT has been criticized for lack of discriminatory accuracy (9). I obtained unbiased (weighted) estimates (10) of the joint distribution of these risk factors, $\mathbf{X}$, for white women aged 50 years or older from the 2000 National Health Interview Survey (http://www.cdc.gov/NCHS/nhis/htm; data accessed on July 22, 2002). From the BCRAT relative risks (11), I used the methods described above to calculate an MLRR of 0.520 and an SDLRR of 0.359, corresponding to the thick dashed curve in Figure 1; the AUC was 0.607 (Figure 2). Thus, BCRAT had greater discriminatory accuracy measured by AUC than the seven-SNP model and a slightly greater AUC than the hypothetical 14-SNP model.

By assuming that odds ratios from the seven-SNP model multiplied those from the BCRAT and that the distribution of these SNPs was independent of that of the risk factors in BCRAT, I estimated how much the discriminatory accuracy of BCRAT could be improved by adding the seven SNPs. The resulting distribution of $\log_e[rr(\mathbf{x})]$ (Figure 1) has an MLRR of 1.361 and an SDLRR of 0.445. The AUC increased to 0.632 (Figure 2). In a different population, Chen et al. (12) estimated that adding mammographic density to BCRAT increased the average age-specific AUC by 0.047, from 0.596 to 0.643. The corresponding increase in AUC from adding these seven SNPs to BCRAT was 0.025 (= 0.632 − 0.607). Thus, mammographic density adds more to the discriminatory accuracy of BCRAT than do the seven SNPs.
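Under the multiplicative and independence assumptions just stated, the log relative risks of the two models add, so their means add and their variances add. The minimal sketch below checks the combined MLRR and SDLRR against the component values quoted in the text; the function name `combine_independent` is hypothetical and introduced only for illustration.

```python
import math


def combine_independent(mlrr_a, sdlrr_a, mlrr_b, sdlrr_b):
    """Mean and SD of the sum of two independent log relative risks."""
    return mlrr_a + mlrr_b, math.sqrt(sdlrr_a ** 2 + sdlrr_b ** 2)


# BCRAT (MLRR 0.520, SDLRR 0.359) combined with the seven-SNP model (0.841, 0.262)
mlrr, sdlrr = combine_independent(0.520, 0.359, 0.841, 0.262)
print(f"MLRR = {mlrr:.3f}, SDLRR = {sdlrr:.3f}")

# AUC gains quoted in the text
gain_snps = 0.632 - 0.607     # seven SNPs added to BCRAT
gain_density = 0.643 - 0.596  # mammographic density added to BCRAT (Chen et al.)
print(f"AUC gain: SNPs {gain_snps:.3f}, density {gain_density:.3f}")
```

The combined SDLRR computed from the rounded inputs agrees with the reported 0.445 to within rounding of those inputs.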

All the AUC values in these analyses describe the discriminatory power of risk factors, such as SNPs, in women of comparable age over a short interval, such as 5 years. Thus, these AUC values describe the discriminatory accuracy of risk factors apart from age. Some investigators compare case patients and control subjects over large age ranges. Because age is a strong predictor of breast cancer risk and is included in all risk models and because case patients tend to be older than control subjects, doing so increases the AUC value.

Figure 2. Probability that a case patient has a risk greater than $t$, $[1 - FD_{r}(t)]$, plotted against the probability that a member of the general population has a risk greater than $t$, $[1 - F_{r}(t)]$, as $t$ (not shown) varies from 0 to 1. Separate curves are shown for the seven–single-nucleotide polymorphism (SNP) model (thin dashed line), a 14-SNP model with the original seven SNPs plus seven more such SNPs (thin solid line), the Breast Cancer Risk Assessment Tool (BCRAT) (thick dashed line), and BCRAT plus the seven SNPs (thick solid line). The areas under these curves are measures of discriminatory accuracy and approximate the areas under the receiver operating characteristic curve.

This presentation is focused on discriminatory accuracy. High discriminatory accuracy is required for some applications, such as screening for disease (6), but even risk models with modest discriminatory accuracy can be useful for some applications, such as deciding whether or not to take tamoxifen, which decreases the absolute risks of breast cancer and hip fracture but increases the absolute risks of endometrial cancer and stroke (6,13). For such decision problems, for general counseling, and for designing prevention trials, it is important that the model accurately predict the risk in women with various risk factor combinations, a feature termed “calibration” (6,9). To assess calibration, one will need to study a cohort to determine how many women develop breast cancer and then compare that number with how many cancers were predicted, overall and in groups of women with various combinations of genotypes and other risk factors. It will be of special interest to determine whether the risks for women with multiple adverse alleles are as high as predicted by the multiplicative model in Equation 1. Positive or negative interactions among such SNP effects or with other risk factors could lead to poor calibration in some subgroups. Although interactions can affect calibration, my unreported calculations indicate that they have little effect on discriminatory accuracy. The generalizability to various racial groups of a risk model that is based on SNPs might be affected by interactions between SNP effects and racial group because the magnitude and even the direction of an association of a marker allele with disease may vary by racial group (3).

The power to detect interactions between pairs of SNPs and between SNPs and other risk factors is limited. A recent study of prostate cancer risk (14) failed to detect such interactions and found that adding information from five SNPs increased the AUC for a model based on age, geographic region, and family history of prostate cancer by only 0.009, from 0.624 to 0.633. Another study of prostate cancer failed to demonstrate statistically significant interactions among disease-associated SNPs from seven different genomic regions (15). It would be of interest to search for interactions of the effects of common SNP alleles on breast cancer risk with age, as have been found for rare high-risk mutations in BRCA1 and BRCA2 (16).

To build a model of absolute risk, one can couple the relative risk estimates from case–control data in genome-wide association studies with cancer incidence rates from registry data, as described previously (5,11). To do so requires data on the joint distribution of all risk factors in representative case patients or in the general population. In my analysis, it was assumed that the SNP genotypes were mutually independent and also independent of the factors in BCRAT. The effect of positive correlations between these SNPs and family history of breast cancer, if any, would be to diminish the discriminatory accuracy that these SNPs add to BCRAT because family history is included in BCRAT.

Very large relative risks are needed for a single factor to achieve good discriminatory accuracy (17). Even adding a strong risk factor with a large attributable risk, such as mammographic density, only increased the AUC of a model like BCRAT from 0.596 to 0.643 (12). Thus, it is not surprising that adding seven SNPs with small relative risks would increase the AUC of BCRAT only modestly.

It is tempting to speculate on how much additional discriminatory accuracy can be achieved by identifying further common SNPs and what effort would be required to find them. Pharoah et al. (7) assumed that the natural logarithm of risk was normally distributed, which provides a good approximation if many independent SNPs satisfy Equation 1 and if risk is proportional to relative risk, as was assumed in my analysis. Based on segregation analyses (18) and considerations of the recurrence risk among siblings, Pharoah et al. (7) estimated an SDLRR of 1.2 in the general population and showed that the logarithm of risk in case patients would be normally distributed with the same variance but with the mean increased by $1.2^2$ (= 1.44). From these values, I calculated an AUC of 0.800. This result supports arguments (7) that knowing which SNPs give rise to this polygenic component of risk (which is independent of risk from BRCA1 and BRCA2 mutations) might have some value for screening the population. The seven-SNP model has an SDLRR of 0.262. To achieve an SDLRR of 1.2, one would need 147 [= $7(1.2/0.262)^2$] SNPs like the seven SNPs already identified. The geometric mean of the per allele odds ratios from these seven SNPs was 1.15. The study by Easton et al. (2) used approximately 400 case patients with strong family histories of breast cancer in the SNP discovery phase, which might be equivalent in statistical power to approximately 1600 population-based case patients (19). Stacey et al. (3) used 1600 population-based case patients in the discovery phase. Calculations as in Gail et al. (20) show that approximately 65% of disease-associated SNPs with an odds ratio of 1.15 would be among the 25 000 smallest P values in a scan of 500 000 SNPs if 1600 case patients and control subjects are used in the discovery phase. Thus, increasing the number of case patients and control subjects in the discovery phase to 5000 or more (20) might increase the number of such SNPs that would eventually be confirmed in subsequent phases to 11 (= 7/0.65). Improvements in SNP chip technology might yield a few more such SNPs, but even a 50% increase would yield only 17 (= 11 × 1.5) SNPs. There are probably many other disease-associated SNPs with smaller odds ratios, but their detection will require larger numbers of case patients and control subjects both in the discovery and validation phases. For example, if remaining disease-associated SNPs have a geometric mean OR of 1.10, one would need (20) approximately 2.15 {= $[\log(1.15)/\log(1.10)]^2$} times as many case patients and control subjects in the discovery phase as was required for an OR of 1.15. The contribution of an SNP to the variance of the log relative risk is $2s_i(1 - s_i)[\log(\mathrm{OR}_i)]^2$. It follows that if 10 additional SNPs can be identified with properties like those of the seven SNPs found so far but the rest of the SNPs have an OR of 1.10, one will need to find about 280 [= (147 − 17) × 2.15] additional low-risk SNPs to achieve the desired SDLRR of 1.2. Although these numbers are only illustrative, they show that a huge increase in the numbers of case patients and control subjects would be required in genome-wide association studies to find enough SNPs to achieve an SDLRR of 1.2.
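The arithmetic in this paragraph can be retraced with a short script. The text does not state the formula behind the AUC of 0.800; the sketch below assumes the standard result for two normal distributions with equal variance, in which the difference between a case's log risk and a population member's log risk has mean SDLRR² and variance 2·SDLRR², so the AUC is Φ(SDLRR/√2). The function name `lognormal_auc` and the variable names are hypothetical.

```python
import math


def lognormal_auc(sdlrr):
    """AUC when log risk is normal with SD sdlrr in the population and
    shifted upward by sdlrr**2, with the same SD, in case patients."""
    z = sdlrr / math.sqrt(2)                       # mean shift / SD of the difference
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z


target_sd, seven_sd = 1.2, 0.262

auc_target = lognormal_auc(target_sd)                  # text reports 0.800
snps_needed = 7 * (target_sd / seven_sd) ** 2          # text rounds to 147
found_with_5000 = 7 / 0.65                             # text rounds to 11
sample_ratio = (math.log(1.15) / math.log(1.10)) ** 2  # text rounds to 2.15
low_risk_needed = (147 - 17) * 2.15                    # text rounds to 280

print(f"AUC at SDLRR 1.2: {auc_target:.3f}")
print(f"SNPs like the first seven needed: {snps_needed:.1f}")
print(f"SNPs confirmed with larger discovery phase: {found_with_5000:.1f}")
print(f"Sample size ratio for OR 1.10 vs 1.15: {sample_ratio:.2f}")
print(f"Additional low-risk SNPs: {low_risk_needed:.1f}")
```

The unrounded values are kept so that the rounding steps in the text (for example 16.5 reported as 17) remain visible rather than hidden in the code.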

This study had several limitations. To investigate interactions that may improve the models, individual level data on case patients and control subjects are needed. The published data (2–4) used to estimate SNP effects did not permit estimation of interactions among SNPs or between SNPs and risk factors in BCRAT. Several assumptions were needed to speculate on prospects for finding additional common disease-associated alleles that will achieve high discriminatory accuracy. Further research may indicate the extent to which these assumptions and the resulting broad conclusions hold.

References

1. Evans JP. Health care in the age of genetic medicine. JAMA. 2007;298(22):2670–2672.

2. Easton DF, Pooley KA, Dunning AM, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007;447(7148):1087–1095.

3. Stacey SN, Manolescu A, Sulem P, et al. Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet. 2007;39(7):865–869.

4. Cox A, Dunning AM, Garcia-Closas M, et al. A common coding variant in CASP8 is associated with breast cancer risk. Nat Genet. 2007;39(3):352–358.

5. Costantino JP, Gail MH, Pee D, et al. Validation studies for models projecting the risk of invasive and total breast cancer incidence. J Natl Cancer Inst. 1999;91(18):1541–1548.

6. Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risk. Biostatistics. 2005;6(2):227–239.

7. Pharoah PDP, Antoniou A, Bobrow M, Zimmern RL, Easton DF, Ponder BAJ. Polygenic susceptibility to breast cancer and implications for prevention. Nat Genet. 2002;31(1):33–36.

8. Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press; 2003.

9. Rockhill B, Spiegelman D, Byrne C, Hunter DJ, Colditz GA. Validation of the Gail et al. model of breast cancer risk prediction and implications for chemoprevention. J Natl Cancer Inst. 2001;93(5):358–366.

10. Freedman AN, Graubard BI, Rao SR, McCaskill-Stevens W, Ballard-Barbash R, Gail MH. Estimates of the number of US women who could benefit from tamoxifen for breast cancer chemoprevention. J Natl Cancer Inst. 2003;95(7):526–532.

11. Gail MH, Brinton LA, Byar DP, et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst. 1989;81(24):1879–1886.

12. Chen JB, Pee D, Ayyagari R, et al. Projecting absolute invasive breast cancer risk in white women with a model that includes mammographic density. J Natl Cancer Inst. 2006;98(17):1215–1226.

13. Gail MH, Costantino JP, Bryant J, et al. Weighing the risks and benefits of tamoxifen treatment for preventing breast cancer. J Natl Cancer Inst. 1999;91(21):1829–1846.

14. Zheng SL, Sun JL, Wiklund F, et al. Cumulative association of five genetic variants with prostate cancer. N Engl J Med. 2008;358(9):910–919.

15. Thomas G, Jacobs KB, Yeager M, et al. Multiple loci identified in a genome-wide association study of prostate cancer. Nat Genet. 2008;40(3):310–315.

16. Antoniou A, Pharoah PDP, Narod S, et al. Average risks of breast and ovarian cancer associated with BRCA1 or BRCA2 mutations detected in case series unselected for family history: a combined analysis of 22 studies. Am J Hum Genet. 2003;72(5):1117–1130.

17. Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004;159(9):882–890.

18. Antoniou AC, Pharoah PDP, McMullan G, Day NE, Ponder BAJ, Easton D. Evidence for further breast cancer susceptibility genes in addition to BRCA1 and BRCA2 in a population-based study. Genet Epidemiol. 2001;21(1):1–18.

19. Antoniou AC, Easton DF. Polygenic inheritance of breast cancer: implications for design of association studies. Genet Epidemiol. 2003;25(3):190–202.

20. Gail MH, Pfeiffer RM, Wheeler W, Pee D. Probability of detecting disease-associated single nucleotide polymorphisms in case-control genome-wide association studies. Biostatistics. 2008;9(2):201–215.

Funding

Intramural Research Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute and National Institutes of Health.

Notes

I would like to thank Sir Bruce A. J. Ponder for stimulating discussions leading to this work, Dr Montserrat Garcia-Closas for discussions on an empirical study to evaluate risk prediction models including single-nucleotide polymorphisms with other risk factors, the reviewers and Dr Ruth M. Pfeiffer for helpful comments, and Mr David Pee for providing estimates of the distribution of risk factors for Breast Cancer Risk Assessment Tool from the 2000 National Health Interview Survey.

Manuscript received February 11, 2008; revised May 2, 2008; accepted May 6, 2008.