Comparing risk prediction models

Centre for Statistics in Medicine, Wolfson College Annexe, University of Oxford, Oxford OX2 6UD, UK.
BMJ (online) (Impact Factor: 17.45). 24 May 2012; 344:e3186. DOI: 10.1136/bmj.e3186
Source: PubMed

Available from: Gary S Collins, Dec 20, 2013
    • "To our knowledge, this is the first comprehensive and independent evaluation of available risk models for early mortality following hip fracture surgery. Comparison of multiple risk models should preferably be carried out in an independent sample by investigators other than those who originally proposed the models [9] [10] [34]. In 2008, Burgos et al. compared six predictors for incidence of post-operative complications, ambulation after a 3-month period and 90-day mortality after hip fracture surgery [35]."
    ABSTRACT: Introduction: While predictors for mortality after hip fracture surgery have been widely studied, research regarding risk prediction models is limited. Risk models can predict mortality for individual patients, provide insight into prognosis, and be valuable in surgical audits. Existing models have not been validated independently. The purpose of this study is to evaluate the performance of existing risk models for predicting 30-day mortality following hip fracture surgery. Patients and methods: In this retrospective study, all consecutive hip fracture patients admitted between 2004 and 2010 were included. Predicted mortality was calculated for individual patients and compared with the observed outcome. The discriminative performance of the models was assessed using the area under the receiver operating characteristic curve (AUC). Calibration was analysed with the Hosmer-Lemeshow goodness-of-fit test. Results: A literature search yielded six risk prediction models: the Charlson Comorbidity Index (CCI), the Orthopaedic Physiologic and Operative Severity Score for the enUmeration of Mortality and Morbidity (O-POSSUM), the Estimation of Physiologic Ability and Surgical Stress (E-PASS), a risk model by Jiang et al., the Nottingham Hip Fracture Score (NHFS), and a model by Holt et al. The latter three models were specifically designed for the hip fracture population. All models except O-POSSUM achieved an AUC greater than 0.70, demonstrating acceptable discriminative power. The score by Jiang et al. performed best with an AUC of 0.78; however, this was not significantly different from the NHFS (0.77) or the model by Holt et al. (0.76). When the Hosmer-Lemeshow goodness-of-fit test was applied, the model by Holt et al., the NHFS and the model by Jiang et al. showed a significant lack of fit (p<0.05). The CCI, O-POSSUM and E-PASS showed no significant lack of fit. Discussion: None of the existing models yielded excellent discrimination (AUC>0.80).
The models designed for the hip fracture population showed the best discrimination but lacked fit. The NHFS shows the most promising results, with reasonable discrimination and extensive validation in earlier studies. Additional research is needed to examine recalibration and to determine the best risk model for predicting early mortality following hip fracture surgery.
    Article · Nov 2014 · Injury
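The hip fracture study above assesses discrimination with the AUC and calibration with the Hosmer-Lemeshow test. As an illustration only, the sketch below computes both in plain Python for binary outcomes coded 0/1. It assumes predicted risks strictly between 0 and 1 and forms risk groups of roughly equal size after sorting by predicted risk (ten groups by default, a common convention; the grouping used in the study is not stated in the abstract). The function names are hypothetical, and the closed-form p-value is valid only for an even number of degrees of freedom (groups minus two).

```python
import math


def auc(risks, outcomes):
    """C statistic: share of (event, non-event) pairs in which the event
    received the higher predicted risk; tied risks count as half."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x).
    Closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!, valid for even df only."""
    assert df > 0 and df % 2 == 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(risks, outcomes, groups=10):
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2).
    Assumes every predicted risk lies strictly between 0 and 1."""
    # Stable sort on predicted risk only; tied risks keep their input order.
    pairs = sorted(zip(risks, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected number of events
        observed = sum(y for _, y in chunk)  # observed number of events
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


if __name__ == "__main__":
    risks = [0.1, 0.2, 0.3, 0.4, 0.8]
    outcomes = [0, 0, 1, 0, 1]
    print(auc(risks, outcomes))  # 5 of the 6 event/non-event pairs are concordant
```

A small p-value from `hosmer_lemeshow` signals lack of fit, which is how the abstract flags the Holt, NHFS and Jiang models; the test says nothing about discrimination, so a model can have a good AUC and still fail it.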
    ABSTRACT: To identify existing prediction models for the risk of development of type 2 diabetes and to externally validate them in a large independent cohort. Prediction models for diabetes were identified by a systematic search of English, German, and Dutch literature in PubMed up to February 2011. Performance of the models was assessed in terms of discrimination (C statistic) and calibration (calibration plots and Hosmer-Lemeshow test). The validation study was a prospective cohort study, with a case cohort study in a random subcohort. Models were applied to the Dutch cohort of the European Prospective Investigation into Cancer and Nutrition cohort study (EPIC-NL). Participants were 38 379 people aged 20-70 years with no diabetes at baseline, 2506 of whom made up the random subcohort. The outcome was incident type 2 diabetes. The review identified 16 studies containing 25 prediction models. We classified 12 models as basic because they were based on variables that can be assessed non-invasively and 13 models as extended because they additionally included conventional biomarkers such as glucose concentration. During a median follow-up of 10.2 years there were 924 cases in the full EPIC-NL cohort and 79 in the random subcohort. The C statistic for the basic models ranged from 0.74 (95% confidence interval 0.73 to 0.75) to 0.84 (0.82 to 0.85) for risk at 7.5 years. For prediction models including biomarkers the C statistic ranged from 0.81 (0.80 to 0.83) to 0.93 (0.92 to 0.94). Most prediction models overestimated the observed risk of diabetes, particularly at higher observed risks. After adjustment for differences in incidence of diabetes, calibration improved considerably. Most basic prediction models can identify people at high risk of developing diabetes in a time frame of five to 10 years. Models including biomarkers classified cases slightly better than basic ones. Most models overestimated the actual risk of diabetes.
Existing prediction models therefore perform well at identifying those at high risk but cannot adequately quantify the actual risk of future diabetes.
    Article · Sep 2012 · BMJ (online)
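The diabetes abstract reports that calibration improved considerably after adjusting for differences in incidence, but it does not describe the adjustment. The sketch below shows one common technique, assumed here for illustration: recalibration in the large, which adds one constant to every patient's log-odds so that the mean predicted risk equals the incidence observed in the validation cohort. It treats each model output as a single-horizon risk and ignores the time-to-event structure behind the 7.5-year predictions; the function names and the bisection bracket are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_in_the_large(risks, observed_rate, tol=1e-10):
    """Return (offset, new_risks), where offset is a constant added to every
    log-odds so that the mean recalibrated risk matches observed_rate.
    Assumes 0 < risks < 1 and 0 < observed_rate < 1."""
    logits = [logit(p) for p in risks]

    def mean_risk(offset):
        return sum(expit(z + offset) for z in logits) / len(logits)

    # mean_risk increases monotonically with the offset, so bisection works.
    lo, hi = -20.0, 20.0  # hypothetical bracket, assumed wide enough
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < observed_rate:
            lo = mid
        else:
            hi = mid
    offset = (lo + hi) / 2.0
    return offset, [expit(z + offset) for z in logits]


if __name__ == "__main__":
    # A model that overestimates: mean predicted risk 0.4, observed incidence 0.2.
    offset, new = recalibrate_in_the_large([0.2, 0.4, 0.6], 0.2)
    print(offset, new)
```

Because every log-odds is shifted by the same amount, the ranking of patients, and therefore the C statistic, is unchanged; only calibration moves. A negative offset corresponds to the overestimation pattern described in the abstract.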
    ABSTRACT: Aim: To validate the Paediatric Index of Mortality (PIM) and Pediatric Risk of Mortality (PRISM) models in the overall population as well as in specific subgroups in pediatric intensive care units (PICUs). Methods: Variants of the PIM and PRISM prediction models were compared with respect to calibration (agreement between predicted risks and observed mortality) and discrimination (area under the receiver operating characteristic curve, AUC). We considered performance in the overall study population and in subgroups defined by diagnosis, age and urgency at admission, and length of stay (LoS) in the PICU. We analyzed data from consecutive patients younger than 16 years admitted to the eight PICUs in the Netherlands between February 2006 and October 2009. Patients referred to another ICU or who died within 2 h of admission were excluded. Results: A total of 12,040 admissions were included, with 412 deaths. Variants of PIM2 were best calibrated. All models discriminated well, including in patients <28 days of age (neonates), with higher overall AUCs for the PRISM variants (PIM = 0.83, PIM2 = 0.85, PIM2-ANZ06 = 0.86, PIM2-ANZ08 = 0.85, PRISM = 0.88, PRISM3-24 = 0.90). The best discrimination for PRISM3-24 was confirmed in 13 of 14 subgroup categories. After recalibration, PRISM3-24 predicted accurately in most (12 of 14) categories. Discrimination was poorer for all models (AUC < 0.73) for patients with an LoS of >6 days in the PICU. Conclusion: All models discriminated well, including in most subgroups such as neonates, but had difficulty predicting mortality for patients who had stayed >6 days in the PICU. In a western European setting, either PIM2(-ANZ06) or a recalibrated version of PRISM3-24 is suited for overall individualized risk prediction.
    Article · Feb 2013 · Intensive Care Medicine
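The PICU abstract reports that a recalibrated PRISM3-24 predicted accurately in most subgroups, without naming the recalibration method. A common choice, assumed here, is logistic recalibration: refit an intercept a and slope b in logit(P(death)) = a + b · logit(p), where p is the original model's predicted risk. The sketch fits a and b by Newton-Raphson in plain Python. The function names are hypothetical, and the fit assumes the outcomes are not perfectly separated by predicted risk (otherwise no maximum likelihood estimate exists).

```python
import math


def logistic_recalibration(risks, outcomes, iterations=25):
    """Fit logit(P(y = 1)) = a + b * logit(p) by Newton-Raphson; return (a, b).
    Assumes 0 < p < 1 for every risk and outcomes coded 0/1."""
    xs = [math.log(p / (1.0 - p)) for p in risks]
    a, b = 0.0, 1.0  # start from the original, unrecalibrated model
    for _ in range(iterations):
        g_a = g_b = 0.0           # score (gradient of the log-likelihood)
        h_aa = h_ab = h_bb = 0.0  # observed information (negative Hessian)
        for x, y in zip(xs, outcomes):
            mu = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = mu * (1.0 - mu)
            g_a += y - mu
            g_b += (y - mu) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: (a, b) += I^{-1} g, inverting the 2x2 information matrix.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def apply_recalibration(risks, a, b):
    """Map original risks through the fitted intercept and slope."""
    return [1.0 / (1.0 + math.exp(-(a + b * math.log(p / (1.0 - p)))))
            for p in risks]


if __name__ == "__main__":
    risks = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    outcomes = [0, 1, 0, 0, 1, 0, 1, 1]
    a, b = logistic_recalibration(risks, outcomes)
    print(a, b, apply_recalibration(risks, a, b))
```

An intercept near 0 and a slope near 1 suggest the original model is already well calibrated, while a slope below 1 indicates predictions that are too extreme. Since b is positive here, the recalibrated risks keep the original ranking, so the AUC is unchanged. In practice a and b would be estimated in one sample and then applied to new admissions.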