Comparison between SAPS II and SAPS 3 in predicting hospital mortality in a cohort of 103 Italian ICUs. Is new always better?
ABSTRACT More recent severity scores should be more reliable than older ones because they account for the improvement in medical care over time. To provide more insight into this issue, we compared the predictive ability of the Simplified Acute Physiology Score (SAPS) II and SAPS 3 (originally developed from data collected in 1991-1992 and 2002, respectively) on a sample of critically ill patients.
This was a prospective observational study on 3,661 patients from 103 Italian intensive care units. Standardized mortality ratios (SMRs) were calculated. Calibration across risk classes was assessed using the GiViTI calibration belt. Discrimination was evaluated by means of the area under the receiver operating characteristic (ROC) curve.
Both scores were shown to discriminate fairly. SAPS 3 largely overpredicted mortality, more than SAPS II (SMR 0.63, 95 % CI 0.60-0.66 vs. 0.87, 95 % CI 0.83-0.91). This result was consistent and statistically significant across all risk classes for SAPS 3. SAPS II did not show relevant deviations from ideal calibration in the first two deciles of risk, whereas in higher-risk classes it overpredicted mortality.
Both scores provided unreliable predictions, but unexpectedly the newer SAPS 3 turned out to overpredict mortality more than the older SAPS II.
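The two summary measures named in the abstract, the SMR and the area under the ROC curve, can be illustrated with a minimal sketch. The function names `smr_with_ci` and `auc` are hypothetical, and the confidence interval uses a log-normal approximation that treats observed deaths as Poisson; the abstract does not state which interval method the authors used, so this is an assumption made for illustration only.

```python
import math


def smr_with_ci(outcomes, probs, z=1.96):
    """Standardized mortality ratio: observed deaths / expected deaths.

    outcomes: 0/1 hospital mortality per patient.
    probs: predicted probability of death per patient (e.g. from a score).
    The interval is an ASSUMED log-normal approximation, not the paper's method.
    """
    observed = sum(outcomes)
    expected = sum(probs)
    smr = observed / expected
    half_width = z / math.sqrt(observed)
    return smr, smr * math.exp(-half_width), smr * math.exp(half_width)


def auc(outcomes, probs):
    """Area under the ROC curve as the fraction of (death, survivor) pairs
    in which the patient who died received the higher predicted risk.
    Ties count as half a concordant pair."""
    died = [p for p, y in zip(probs, outcomes) if y == 1]
    survived = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for p_died in died:
        for p_survived in survived:
            if p_died > p_survived:
                concordant += 1.0
            elif p_died == p_survived:
                concordant += 0.5
    return concordant / (len(died) * len(survived))


# Toy data, invented for illustration.
toy_outcomes = [0, 0, 1, 1]
toy_probs = [0.1, 0.4, 0.35, 0.8]
toy_smr, toy_low, toy_high = smr_with_ci(toy_outcomes, toy_probs)
toy_auc = auc(toy_outcomes, toy_probs)
```

An SMR below 1 means fewer deaths were observed than the score predicted, which is the sense in which both scores overpredicted mortality in this cohort.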
ABSTRACT: To analyze the effects of patient mix diversity on performance of an intensive care unit (ICU) severity-of-illness model. Multiple patient populations were created using computer simulations. A customized version of the Mortality Probability Model (MPM) II admission model was used to ascertain probabilities of hospital mortality. Performance of the model was assessed using discrimination (area under the receiver operating characteristic curve) and calibration (goodness-of-fit testing). Intensive care units. Data were collected from 4,224 ICU patients from two Massachusetts hospitals (Baystate Medical Center, Springfield, MA; University of Massachusetts Medical Center, Worcester, MA) and two New York hospitals (Albany Medical Center, Albany, NY; Ellis Hospital, Schenectady, NY). Random samples were taken from a database. The percentage of patients with each model disease characteristic was varied by assigning weights (ranging from 0 to 10) to patients with a disease characteristic. Three simulations were run for each of 15 model variables at each of 16 weights, totaling 720 simulations. The area under the receiver operating characteristic curve and model fit were assessed in each random sample. Removing patients with a given disease characteristic did not affect discrimination or calibration. Increasing frequency of patients with each disease characteristic above the original frequency caused discrimination and calibration to deteriorate. Model fit was more robust to increases in less frequently occurring patient conditions. From the goodness-of-fit test, a critical percentage for each admission model variable was determined for each disease characteristic, defined as the percentage at which the average p value for the test over the three replications decreased to < .10. The concept of critical percentages is potentially clinically important. 
It might provide an easy first step in checking applicability of a given severity-of-illness model and in defining a general medical-surgical ICU. If the critical percentages are exceeded, as might occur in a highly specialized ICU, the model would not be accurate. Alternative modeling approaches might be to customize the model coefficients to the population for more accurate probabilities or to develop specialized models. The MPM approach remained robust for a large variation in patient mix factors.
Critical Care Medicine 12/1996; 24(12):1968-73. · 6.12 Impact Factor
ABSTRACT: To compare the performance of the New Simplified Acute Physiology Score (SAPS II) and the New Admission Mortality Probability Model (MPM II0) within relevant subgroups using formal statistical assessment (uniformity of fit). Analysis of the database of a multi-centre, multi-national and prospective cohort study, involving 89 ICUs from 12 European Countries. Database of EURICUS-I. Data of 16,060 patients consecutively admitted to the ICUs were collected during a period of 4 months. Following the original SAPS II and MPM II0 criteria, the following patients were excluded from the analysis: younger than 18 years of age; readmissions; acute myocardial infarction; burn cases; patients in the post-operative period after coronary artery bypass surgery and patients with a length of stay in the ICU shorter than 8 h, resulting in a total of 10,027 cases. Data necessary for the calculation of SAPS II and MPM II0, basic demographic statistics and vital status on hospital discharge were recorded. Formal evaluation of the performance of the models, comprising discrimination (area under ROC curve), calibration (Hosmer-Lemeshow goodness-of-fit H and C tests) and observed/expected mortality ratios within relevant subgroups. Better predictive accuracy was achieved in elective surgery patients admitted from the operative room/post-anaesthesia room with gastrointestinal, neurological or trauma diagnoses, and younger patients with non-operative neurological, septic or trauma diagnoses. All these characteristics appear to be linked to a lower severity of illness, with both models overestimating mortality in the more severely ill patients. Concerning the performance of the models, very large differences were apparent in relevant subgroups, varying from excellent to almost random predictive accuracy. 
These differences can explain some of the difficulties of the models to accurately predict mortality when applied to different populations with distinct patient baseline characteristics. This study stresses the importance of evaluating multiple diverse populations (to generate the design set) and of methods to improve the validation set before extrapolations can be made from the validation setting to new independent populations. It also underlines the necessity of a better definition of the patient baseline characteristics in the samples under analysis and the formal statistical evaluation of the application of the models to specific subgroups.
Intensive Care Medicine 01/1998; 24(1):40-7. · 5.26 Impact Factor
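The Hosmer-Lemeshow C test referred to in the abstract above groups patients by predicted risk and compares observed with expected deaths in each group. The sketch below computes only the test statistic; the function name `hosmer_lemeshow_c` and the equal-size grouping rule are assumptions for illustration, and the p value (which depends on the degrees of freedom chosen for the validation setting) is deliberately left out.

```python
def hosmer_lemeshow_c(probs, outcomes, groups=10):
    """Hosmer-Lemeshow C statistic over equal-size groups of sorted risk.

    probs: predicted probability of death per patient.
    outcomes: 0/1 observed hospital mortality per patient.
    Returns the sum over groups of (O - E)^2 / (n * p_bar * (1 - p_bar)).
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_risk = expected / size
        variance = size * mean_risk * (1.0 - mean_risk)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy data, invented for illustration: observed deaths match predictions.
calibrated_probs = [0.2] * 10 + [0.8] * 10
calibrated_outcomes = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
calibrated_stat = hosmer_lemeshow_c(calibrated_probs, calibrated_outcomes, groups=2)

# Toy data where every patient dies despite a predicted risk of 0.5.
poor_stat = hosmer_lemeshow_c([0.5] * 20, [1] * 20, groups=2)
```

A statistic near zero indicates that observed and expected deaths agree within each risk group; larger values indicate poorer calibration.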
ABSTRACT: This paper presents a comprehensive approach to the validation of logistic prediction models. It reviews measures of overall goodness-of-fit, and indices of calibration and refinement. Using a model-based approach developed by Cox, we adapt logistic regression diagnostic techniques for use in model validation. This allows identification of problematic predictor variables in the prediction model as well as influential observations in the validation data that adversely affect the fit of the model. In appropriate situations, recommendations are made for correction of models that provide poor fit.
Statistics in Medicine 09/1991; 10(8):1213-26. · 2.04 Impact Factor
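The model-based approach attributed to Cox in the abstract above is commonly described as a logistic regression of the observed outcome on the logit of the predicted probability, where an intercept of 0 and a slope of 1 correspond to ideal calibration. The sketch below fits those two parameters by Newton-Raphson; the function name `cox_calibration` and the fixed iteration count are assumptions for illustration, and this is not claimed to reproduce the paper's full diagnostic procedure.

```python
import math


def cox_calibration(probs, outcomes, iterations=25):
    """Fit outcome ~ intercept + slope * logit(predicted probability).

    Returns (intercept, slope). Predicted probabilities must lie strictly
    between 0 and 1 so that the logit is defined.
    """
    xs = [math.log(p / (1.0 - p)) for p in probs]
    intercept, slope = 0.0, 0.0
    for _ in range(iterations):
        grad_a = grad_b = 0.0
        info_aa = info_ab = info_bb = 0.0
        for x, y in zip(xs, outcomes):
            mu = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
            weight = mu * (1.0 - mu)
            residual = y - mu
            grad_a += residual
            grad_b += residual * x
            info_aa += weight
            info_ab += weight * x
            info_bb += weight * x * x
        det = info_aa * info_bb - info_ab * info_ab
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        intercept += (info_bb * grad_a - info_ab * grad_b) / det
        slope += (info_aa * grad_b - info_ab * grad_a) / det
    return intercept, slope


# Toy data, invented for illustration: observed death rates equal predictions.
toy_probs = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
toy_outcomes = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
toy_intercept, toy_slope = cox_calibration(toy_probs, toy_outcomes)
```

In this framing, a negative intercept would suggest that predictions are too high on average, and a slope below 1 that they are too extreme.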