Comparing risk prediction models
Should be routine when deriving a new model for the same purpose
Gary S Collins senior medical statistician 1,
Karel G M Moons professor of clinical epidemiology 2
1 Centre for Statistics in Medicine, Wolfson College Annexe, University of Oxford, Oxford OX2 6UD, UK; 2 Julius Centre for Health Sciences and
Primary Care, UMC Utrecht, 3508 GA Utrecht, Netherlands
Risk prediction models have great potential to support clinical
decision making and are increasingly incorporated into clinical
guidelines.1 Many prediction models have been developed for
cardiovascular disease—the Framingham risk score, SCORE,
QRISK, and the Reynolds risk score—to mention just a few.
With so many prediction models for similar outcomes or target
populations, clinicians have to decide which model should be
as a minimum, how well the score predicts disease in people
outside the populations used to develop the model (“what is the
external validation?”) and which model performs best.2
Siontis and colleagues examined the comparative performance of several
prespecified cardiovascular risk prediction models for the
general population.3 They identified 20 published studies that
compared two or more models and they highlighted problems
in design, analysis, and reporting. What can be inferred from
the findings of this well conducted systematic review?
Firstly, direct comparisons are few. A plea for more direct
comparisons is increasingly heard in the field of therapeutic
intervention and diagnostic research and may be echoed in that
of prediction model validation studies. Many more prediction
models have been developed than have been validated in
independent datasets. Moreover, few models developed for
similar outcomes and target populations are directly validated
and compared.2 The authors of the current study retrieved
various validation studies, but only 20 studies evaluated more
than one model and most of those compared just two models.
Thus, readers still need to judge from indirect comparisons
which of the available models provide the best predictors in
different situations. It would be much more informative if
better if they first conducted and reported a systematic review
of existing models before validating them in their dataset. Fair
comparison requires that if an existing model seems to be
miscalibrated for the data at hand, attempts should be made to
adjust or recalibrate the model.4 5 For example, a prediction
model developed in one country or population does not
necessarily provide accurate predictions elsewhere. Ideally,
attempts should be made to examine pre-existing prediction
models in the new target setting and if necessary recalibrate or
further update the model and check its performance before
developing yet another model.4
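The simplest form of the recalibration described above, updating only the model intercept (often called "recalibration in the large"), can be sketched as follows. This is a minimal illustration in Python, not the method of any cited study. It assumes that the existing model is a logistic model, that its predicted risks are available for each person in the new dataset, and that the outcome is binary. The data and function names are hypothetical. More extensive updating, such as re-estimating the calibration slope as well, is discussed in references 4 and 5.

```python
import math


def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: converts log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_in_the_large(predicted_risks, outcomes, tol=1e-10, max_iter=100):
    """Estimate an intercept shift for an existing logistic model.

    The original linear predictor logit(p) is kept as a fixed offset and a
    single shift `delta` is estimated by maximum likelihood, using
    Newton-Raphson. At the solution the mean updated risk equals the
    observed event rate in the new data.
    """
    if not all(0.0 < p < 1.0 for p in predicted_risks):
        raise ValueError("predicted risks must lie strictly between 0 and 1")
    events = sum(outcomes)
    if events == 0 or events == len(outcomes):
        raise ValueError("need both events and non-events to estimate the shift")

    linear_predictors = [logit(p) for p in predicted_risks]
    delta = 0.0
    for _ in range(max_iter):
        updated = [expit(lp + delta) for lp in linear_predictors]
        gradient = events - sum(updated)                 # observed minus expected events
        information = sum(u * (1.0 - u) for u in updated)
        step = gradient / information
        delta += step
        if abs(step) < tol:
            break
    return delta, [expit(lp + delta) for lp in linear_predictors]


# Hypothetical validation data: risks from an existing model and observed outcomes.
predicted = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.20, 0.30]
observed = [0, 0, 0, 0, 1, 1, 0, 0]

shift, updated = recalibrate_in_the_large(predicted, observed)
print(f"intercept shift on the log odds scale: {shift:.3f}")
print(f"mean predicted risk before: {sum(predicted) / len(predicted):.3f}")
print(f"mean predicted risk after:  {sum(updated) / len(updated):.3f}")
print(f"observed event rate:        {sum(observed) / len(observed):.3f}")
```

In this hypothetical example the existing model predicts more events than were observed, so the estimated shift is negative and every updated risk is lower than the original. The ranking of individuals, and therefore discrimination, is unchanged by this kind of update.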
Secondly, as Siontis and colleagues concluded, studies that
suggest one model is better than another often have potential
a new risk prediction model using their data and then compare
it with an existing model often report better performance for
the new model. Prediction models tend to perform better on the
dataset from which they were developed and usually, if not
always, perform better than existing models when validated on
that dataset. This is simply because the model is tuned to the
dataset at hand, which is why a model’s performance should be
evaluated in other datasets, preferably by independent
investigators. However, some form of reporting bias must play
a role here,6 because a newly developed prediction model that
performed worse than an existing one would probably not be
submitted or published. Greater emphasis should therefore be
placed on methodologically sound and appropriately detailed
external validation studies, ideally of multiple models at once,
to show which model is most useful.7
Thirdly, the Framingham risk score may often require
against the Framingham risk score. Although the Framingham
risk score—developed in the United States during the
1970s—has stood the test of time, it has been shown to be
miscalibrated in several other settings.8 It is not surprising that
without recalibration comparisons against it will often favour
the new model, especially if the validation dataset covers
specific subpopulations that were not covered in the original
Fourthly, Siontis and colleagues’ review supports the findings
of existing systematic reviews of prediction models.9 The
conduct and reporting of prediction models has been criticised
are often omitted. In the absence of reporting guidelines for
such studies, Siontis and colleagues have provided suggestions
for conducting and reporting comparative studies, which if
For personal use only: See rights and reprints http://www.bmj.com/permissions Subscribe: http://www.bmj.com/subscribe
BMJ 2012;344:e3186 doi: 10.1136/bmj.e3186 (Published 24 May 2012) Page 1 of 2
adhered to will make the task of appraising these studies easier.
of prediction models are being developed.10
Finally, there is a lack of consistency between studies that
are used to describe the performance of the models. Statistical
properties such as discrimination and calibration are widely
examined. As important as the statistical characteristics of the
model are, they do not ensure its clinical usefulness. There
for example,11 or, preferably, on conducting a randomised trial
to evaluate the model’s ability to change clinicians’ decision
making and patient outcomes.7 12
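To make the distinction between statistical performance and clinical usefulness concrete, the sketch below computes three measures for two models validated on the same dataset: the c statistic (discrimination), the ratio of observed to expected events (a crude summary of calibration), and net benefit at a single risk threshold, the quantity plotted in the decision curve analysis of reference 11. It is an illustration under stated assumptions and not the analysis of any cited study: the outcome is binary, the data and the 20% threshold are hypothetical, and the function names are our own.

```python
def c_statistic(risks, outcomes):
    """Proportion of event/non-event pairs in which the person with the
    event has the higher predicted risk (ties count as one half)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Observed events divided by the sum of predicted risks.
    A value of 1 indicates agreement on average; below 1, over-prediction."""
    return sum(outcomes) / sum(risks)


def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating everyone whose predicted risk is at or above
    `threshold`: true positives per person minus false positives per person,
    the latter weighted by the odds of the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    false_pos = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Hypothetical validation dataset with predictions from two existing models.
outcomes = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
models = {
    "model A": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.55, 0.35, 0.70],
    "model B": [0.10, 0.30, 0.05, 0.40, 0.20, 0.15, 0.35, 0.45, 0.50, 0.60],
}
threshold = 0.20  # illustrative treatment threshold, an assumption

for name, risks in models.items():
    print(
        f"{name}: c statistic {c_statistic(risks, outcomes):.2f}, "
        f"O/E ratio {observed_expected_ratio(risks, outcomes):.2f}, "
        f"net benefit at {threshold:.0%} {net_benefit(risks, outcomes, threshold):.3f}"
    )
```

Evaluating all candidate models on the same people with the same measures is what makes the comparison direct. A real validation study would also report confidence intervals and a calibration plot, and would examine net benefit over a range of thresholds rather than a single value.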
Journal editors and peer reviewers should be more critical of
methodological shortcomings in prediction model studies, and
to describe a fair validation and to compare two or preferably
more risk prediction models simultaneously.
Competing interests: All authors have completed the ICMJE uniform
disclosure form at www.icmje.org/coi_disclosure.pdf (available on
request from the corresponding author) and declare: no support from
any organisation for the submitted work; no financial relationships with
any organisations that might have an interest in the submitted work in
the previous three years; no other relationships or activities that could
appear to have influenced the submitted work.
Provenance and peer review: Commissioned; not externally peer reviewed.
1 National Institute for Health and Clinical Excellence. Lipid modification: cardiovascular
risk assessment and the modification of blood lipids for the primary and secondary
prevention of cardiovascular disease. 2008. CG67. http://guidance.nice.org.uk/CG67.
2 Altman DG, Vergouwe Y, Royston P, Moons KGM. Prognosis and prognostic research:
validating a prognostic model. BMJ 2009;338:b605.
3 Siontis GCM, Tzoulaki I, Siontis KC, Ioannidis JPA. Comparisons of established risk
prediction models for cardiovascular disease: systematic review. BMJ 2012;344:e3318.
4 Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of clinical
prediction rules: a review. J Clin Epidemiol 2008;61:1085-94.
5 Steyerberg EW, Borsboom GJ, van Houwelingen HC, Eijkemans MJ, Habbema JD.
Validation and updating of predictive logistic regression models: a study on sample size
and shrinkage. Stat Med 2004;23:2567-86.
6 Rifai N, Altman DG, Bossuyt PM. Reporting bias in diagnostic and prognostic studies:
time for action. Clin Chem 2008;54:1101-3.
7 Moons KGM, Altman DG, Vergouwe Y, Royston P. Prognosis and prognostic research:
application and impact of prognostic models in clinical practice. BMJ 2009;338:b606.
8 Brindle P, Beswick A, Fahey T, Ebrahim S. Accuracy and impact of risk assessment in
the primary prevention of cardiovascular disease: a systematic review. Heart
9 Collins GS, Mallett S, Omar O, Yu LM. Developing risk prediction models for type 2
diabetes: a systematic review of methodology and reporting. BMC Med 2011;9:103.
10 Collins GS. Opening up multivariable prediction models: consensus-based guidelines for
transparent reporting. BMJ Blogs 2011; http://blogs.bmj.com/bmj/2011/08/03/gary-collins-
11 Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction
models. Med Decis Making 2006;26:565-74.
12 Moons KG, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, et al. Risk
prediction models: II. External validation, model updating, and impact assessment. Heart
Cite this as: BMJ 2012;344:e3186
© BMJ Publishing Group Ltd 2012