Comparison of 19 pre-operative risk stratification models in open-heart surgery.
ABSTRACT To compare 19 risk score algorithms with regard to their validity to predict 30-day and 1-year mortality after cardiac surgery.
Risk factors for patients undergoing heart surgery between 1996 and 2001 at a single centre were prospectively collected. Receiver operating characteristics (ROC) curves were used to describe the performance and accuracy. Survival at 1 year and cause of death were obtained in all cases. The study included 6222 cardiac surgical procedures. Actual mortality was 2.9% at 30 days and 6.1% at 1 year. Discriminatory power for 30-day and 1-year mortality in cardiac surgery was highest for logistic (0.84 and 0.77) and additive (0.84 and 0.77) European System for Cardiac Operative Risk Evaluation (EuroSCORE) algorithms, followed by Cleveland Clinic (0.82 and 0.76) and Magovern (0.82 and 0.76) scoring systems. None of the other 15 risk algorithms had a significantly better discriminatory power than these four. In coronary artery bypass grafting (CABG)-only surgery, EuroSCORE followed by New York State (NYS) and Cleveland Clinic risk score showed the highest discriminatory power for 30-day and 1-year mortality.
EuroSCORE, Cleveland Clinic, and Magovern risk algorithms showed superior performance and accuracy in open-heart surgery, and EuroSCORE, NYS, and Cleveland Clinic in CABG-only surgery. Although the models were originally designed to predict early mortality, the 1-year mortality prediction was also reasonably accurate.
Comparison of 19 pre-operative risk stratification
models in open-heart surgery
Johan Nilsson1*, Lars Algotsson2, Peter Höglund3, Carsten Lührs1, and Johan Brandt1
1Department of Cardiothoracic Surgery, Heart and Lung Centre, Lund University Hospital, SE 221 85 Lund, Sweden;
2Department of Cardiothoracic Anesthesiology, Heart and Lung Centre, Lund University Hospital, Lund, Sweden; and
3Competence Centre for Clinical Research, Lund University Hospital, Lund, Sweden
Received 23 August 2005; revised 2 November 2005; accepted 16 December 2005; online publish-ahead-of-print 18 January 2006
See page 768 for the editorial comment on this article (doi:10.1093/eurheartj/ehi792)
Aims To compare 19 risk score algorithms with regard to their validity to predict 30-day and 1-year
mortality after cardiac surgery.
Methods and results Risk factors for patients undergoing heart surgery between 1996 and 2001 at a
single centre were prospectively collected. Receiver operating characteristics (ROC) curves were
used to describe the performance and accuracy. Survival at 1 year and cause of death were obtained
in all cases. The study included 6222 cardiac surgical procedures. Actual mortality was 2.9% at 30
days and 6.1% at 1 year. Discriminatory power for 30-day and 1-year mortality in cardiac surgery was
highest for logistic (0.84 and 0.77) and additive (0.84 and 0.77) European System for Cardiac Operative
Risk Evaluation (EuroSCORE) algorithms, followed by Cleveland Clinic (0.82 and 0.76) and Magovern
(0.82 and 0.76) scoring systems. None of the other 15 risk algorithms had a significantly better discrimi-
natory power than these four. In coronary artery bypass grafting (CABG)-only surgery, EuroSCORE fol-
lowed by New York State (NYS) and Cleveland Clinic risk score showed the highest discriminatory
power for 30-day and 1-year mortality.
Conclusion EuroSCORE, Cleveland Clinic, and Magovern risk algorithms showed superior performance
and accuracy in open-heart surgery, and EuroSCORE, NYS, and Cleveland Clinic in CABG-only surgery.
Although the models were originally designed to predict early mortality, the 1-year mortality prediction
was also reasonably accurate.
Despite technological advancements, open-heart operations
still carry a risk of mortality and morbidity. To aid in the
selection of patients for cardiac surgery, several risk-
scoring systems have been developed during the last
decades. These aim to estimate the risk of peri-operative
death, based on the occurrence of different risk factors.
Operative mortality is also increasingly used as an indicator
of the quality of cardiac surgery.1
To make an accurate comparison between different
institutions or surgeons, mortality data must be adjusted
to the risk profiles of the patients.2,3 Differences between
the available risk algorithms regarding score design and the
patient population on which the score development was
based could influence their accuracy and performance.
Ideally, a risk model should be useful for outcome prediction
at different surgical centres, both at the institutional level
and for individual patients.4 Operative mortality is the
outcome variable most commonly used as a quality indi-
cator, but long-term mortality may be more relevant from
a patient perspective.
A few comparative studies of different risk algorithms
exist.4–8 However, the relative performance of the risk-scoring
systems currently used remains unclear. The
purpose of this study was to compare 19 open-source risk
score algorithms with regard to their validity to predict
30-day and 1-year mortality after cardiac surgery in a
large single-institution patient population.
Study design and patients
The study was approved by the Ethics Committee of the Medical
Faculty at Lund University. Risk factors for all adult patients under-
going heart surgery at the University Hospital of Lund between
January 1996 and February 2001 were prospectively collected at the Department of
Cardiothoracic Surgery. The patient record form contained a total
of 248 variables (80 pre-, 106 intra-, and 62 post-operative) based
on the Society of Thoracic Surgeons (STS)9 patient record form.
The data were stored in a local adult cardiac surgery database.
Data collection and risk-score calculation
From the total of 248 variables, those corresponding to the risk
factors in the different risk models were selected. Thus, a subset
© The European Society of Cardiology 2006. All rights reserved. For Permissions, please e-mail: email@example.com
*Corresponding author. Fax: +46 46 15 86 35.
E-mail address: firstname.lastname@example.org
European Heart Journal (2006) 27, 867–874
by guest on May 14, 2011
of 104 of the pre- and intra-operative variables were imported into
the statistical software package, together with 30-day and 1-year
mortality for the population. Missing values were replaced using
the probability imputation technique10 before the risk score was
calculated. The probability imputation technique substitutes con-
ditional probabilities for missing covariate values when the covariate
is qualitative. The risk score for each algorithm was calculated for
every patient according to the published definitions (Table 1).
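The imputation step can be illustrated with a minimal Python sketch. It assumes a binary (0/1) risk factor and conditions on a single grouping variable. The function name `impute_probability`, the grouping by sex, and the toy data are hypothetical; the exact conditioning used by the cited technique10 is not specified in this article.

```python
from collections import defaultdict


def impute_probability(values, strata):
    """Replace missing (None) entries of a 0/1 covariate with the observed
    proportion of 1s among patients in the same stratum.

    Assumption: every stratum contains at least one observed value.
    """
    observed = defaultdict(int)   # number of non-missing values per stratum
    positives = defaultdict(int)  # number of 1s per stratum
    for value, stratum in zip(values, strata):
        if value is not None:
            observed[stratum] += 1
            positives[stratum] += value

    imputed = []
    for value, stratum in zip(values, strata):
        if value is None:
            # Substitute the conditional probability for the missing value.
            imputed.append(positives[stratum] / observed[stratum])
        else:
            imputed.append(float(value))
    return imputed


# Hypothetical toy data: diabetes status with two missing entries.
diabetes = [1, 0, None, 0, 1, None, 1, 0]
sex = ["m", "m", "m", "f", "f", "f", "f", "m"]
imputed_diabetes = impute_probability(diabetes, sex)
```

The imputed covariate is then a probability between 0 and 1 rather than a hard 0/1 value, so a risk score computed from it is weighted by how likely the risk factor is to be present.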
The vital status at 1 year after the operation was obtained for all
patients from the Population and Welfare Statistics Sweden,
Statistiska Centralbyrån, Stockholm, Sweden, as were the date and
cause of death.
Means (±SD) were used to describe the continuous variables, and
frequencies were calculated for categorical variables. Score-
predicted operative mortality (death within 30 days of operation)
was calculated using the mean score from the different risk
models, except for the Northern New England algorithm where
the published score-mortality table11 was used. Receiver operating
characteristics (ROC) curves were used to describe the performance
and predictive accuracy for the different algorithms.12 The discriminatory
power, i.e. the c-index, was evaluated by calculating the
areas under the ROC curves.13 The areas under the curves are presented
with 95% confidence limits. An area of 1.0 under the ROC curve indi-
cates perfect discrimination, whereas an area of 0.50 indicates
complete absence of discrimination. Any intermediate value is a
quantitative measure of the ability of the risk predictor model to
distinguish between survivors and non-survivors.
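The interpretation of the ROC area can be made concrete with a short Python sketch. It relies on the standard equivalence between the area under the ROC curve and the probability that a randomly chosen non-survivor has a higher risk score than a randomly chosen survivor, with ties counted as one half. The function name `roc_area` and the toy scores are hypothetical, and this pairwise form is meant for illustration rather than as the procedure used in the statistical package.

```python
def roc_area(scores, died):
    """Area under the ROC curve (c-index) from pairwise comparisons.

    scores: risk score per patient (higher means higher predicted risk)
    died:   1 if the patient died, 0 if the patient survived
    """
    non_survivors = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not non_survivors or not survivors:
        raise ValueError("both outcome groups must be present")

    concordant = 0.0
    for dead_score in non_survivors:
        for alive_score in survivors:
            if dead_score > alive_score:
                concordant += 1.0   # correctly ranked pair
            elif dead_score == alive_score:
                concordant += 0.5   # tie counts as half
    return concordant / (len(non_survivors) * len(survivors))


# Hypothetical toy data
perfect = roc_area([2, 3, 5, 8, 4, 9], [0, 0, 0, 1, 0, 1])
partial = roc_area([1, 2, 3, 4], [0, 1, 0, 1])
uninformative = roc_area([5, 5, 5, 5], [0, 1, 0, 1])
```

In the toy data, the first call gives an area of 1.0 (every non-survivor outranks every survivor), the second 0.75, and the third 0.5, matching the two limiting cases described above.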
To compare the areas under the resulting ROC curves (used as an
index for the predicted value), the non-parametric approach
described by DeLong et al.14 was used. The ROC area for each risk
algorithm was systematically compared with the ROC area of the
other 18 algorithms. The number of algorithms with a significantly
larger or smaller ROC area was then computed. The probability
significance level was adjusted for the effect of multiple compari-
sons using Sidak’s method.
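The multiple-comparison adjustment can be sketched as follows. Sidak's method sets the per-comparison significance level to 1 - (1 - alpha)^(1/m) so that the family-wise error rate stays at alpha across m independent comparisons. Taking m = 18 (each algorithm compared with the other 18) and alpha = 0.05 is an assumption made for illustration; the article does not state how the family of comparisons was defined. The function names are hypothetical.

```python
def sidak_alpha(alpha, m):
    """Per-comparison significance level that keeps the family-wise
    error rate at `alpha` over `m` independent comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust(p, m):
    """Sidak-adjusted P-value for one of `m` comparisons."""
    return 1.0 - (1.0 - p) ** m


# Assumed family: one algorithm compared with each of the other 18.
alpha_per_test = sidak_alpha(0.05, 18)
family_wise = sidak_adjust(alpha_per_test, 18)
```

Under these assumptions the per-comparison threshold is roughly 0.0028, slightly less strict than the Bonferroni value 0.05/18.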
Statistical analyses and graphs were produced using the
Intercooled Stata version 9.0 (2005) statistical package (StataCorp
LP, College Station, TX, USA) and GraphPad Prism 4b, 2004 for Mac
OS X, GraphPad Software, Inc., USA.
Between January 1996 and February 2001, 6499 consecutive
heart operations were performed on 6414 patients. During
the period January–March 1998, database service and
upgrade resulted in missing values in 30% of the data
points. All operations (n = 277) from this period were
excluded from the study. Thus, 6153 patients, undergoing
6222 operations, were included in the analysis. In 2% of
the total data points, missing values were replaced using
the probability imputation technique.10 There was accurate
documentation of data including mortality and cause of
death in all cases, and no patient was lost to follow-up.
The average age was 66.3 ± 10.6 years (range 18–95).
The majority of patients were men (72%). A coronary
artery bypass grafting (CABG)-only operation was performed
in 4351 cases (70%), 1340 (22%) cases had a valve procedure
with or without CABG surgery, and 531 (8%) were miscel-
laneous procedures, e.g. post-infarction septal rupture
(37 cases), aortic aneurysm or dissection (209 cases), and
surgery had been performed in 457 cases (7.3%). Seventy-
eight patients (1.3%) were in cardiogenic shock at the
start of the operation and 628 (10%) were operated within
24 h after acceptance for surgery (emergency surgery).
The actual 30-day mortality was 2.9% (n = 180) and the
1-year mortality was 6.1% (n = 377).
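The counts reported above can be cross-checked with a few lines of Python. The sketch also attaches a 95% confidence interval to each observed mortality rate using the Wilson score interval. The choice of the Wilson interval is an assumption, since the article does not state which interval was used for observed mortality, and the function name is hypothetical.

```python
import math

# Counts taken from the text above
operations_total = 6499
operations_excluded = 277
operations_included = 6222
deaths_30_day = 180
deaths_1_year = 377
cabg_only, valve, miscellaneous = 4351, 1340, 531

# Internal consistency of the reported counts
assert operations_total - operations_excluded == operations_included
assert cabg_only + valve + miscellaneous == operations_included


def wilson_interval(events, n, z=1.959964):
    """Wilson score interval for a proportion (95% by default)."""
    p = events / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


mortality_30_day = round(100.0 * deaths_30_day / operations_included, 1)
mortality_1_year = round(100.0 * deaths_1_year / operations_included, 1)
ci_30_day = wilson_interval(deaths_30_day, operations_included)
ci_1_year = wilson_interval(deaths_1_year, operations_included)
```

The rounded rates reproduce the reported 2.9% and 6.1%.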
Table 1 Synopsis of original data of 19 risk score algorithms
Region; Year of data; Number of patients
UK national score a,5
13 302 (128)
13 302 (128)
18 814 (33)
12 712 (43)
Add, additive; log, logistic; mod, modified; NNE, Northern New England; N/A, not available. The Cleveland Clinic risk score algorithm is also known as the Higgins score, NNE as the American College of Cardiology/American Heart Association (ACC/AHA) score, and Ontario as the Provincial Adult Cardiac Care Network (PACCN) score.
a Algorithms developed for CABG-only surgery.
Performance and predictive accuracy for the
The discriminatory power (i.e. the area under the ROC
curve) for 30-day mortality and 1-year mortality was
highest for the logistic (0.84 and 0.77) and additive (0.84
and 0.77) European System for Cardiac Operative Risk
Evaluation (EuroSCORE) algorithms, followed by the
Cleveland Clinic (0.82 and 0.76) and the Magovern (0.82
and 0.76) scoring systems (Figures 1 and 2). None of the
other risk algorithms had a significantly better discrimina-
tory power (larger ROC area) than these four (Figure 3). In
the subanalysis with CABG-only patients, the discriminatory
power of the two EuroSCORE algorithms was highest, followed
by the New York State (NYS) and Cleveland Clinic
risk algorithms (Table 2).
The mortality predictions of the different scoring systems
are shown in Figure 4.
The most common cause of death within 30 days was cardiovascular
disease (n = 163, 91%), followed by cerebrovascular
disease (n = 3, 1.7%), malignant neoplasm (n = 3, 1.7%), and
chronic lower respiratory disease (n = 2, 1.1%). Cardiovascular
disease was also the most common cause of death
within 1 year (n = 280, 74%), followed by malignant
The ROC curves. The sensitivity of prediction of 30-day mortality vs. 1-specificity for the 19 risk algorithms is plotted. The solid line represents the absence of discrimination. Open-heart surgery (n = 6222).
Comparison of the ROC area for different risk algorithms. For each risk scoring system (left y-axis), the number of risk algorithms with a significantly (P < 0.05) larger (black bar) or smaller (grey bar) ROC area is shown. (A) 30-day mortality and (B) 1-year mortality. Open-heart surgery (n = 6222). See Table 1 for abbreviations.
The ROC area (diamonds) with 95% confidence intervals (horizontal bars) for 30-day mortality and 1-year mortality. (A) 30-day mortality and (B) 1-year mortality. Open-heart surgery (n = 6222). See Table 1 for abbreviations.
neoplasm (n = 22, 5.8%), cerebrovascular disease (n = 16,
4.2%), chronic lower respiratory disease (n = 10, 2.7%), and
septicaemia (n = 10, 2.7%). For each risk algorithm, the
ROC areas for cardiovascular-related (n = 163) and total
30-day mortality (n = 180) were almost identical (difference
0.005 or less). The discriminatory power for cardiovascular-related
1-year mortality (n = 280) increased by approximately
0.03 for all 19 algorithms compared with the discriminatory
power for total 1-year mortality (n = 377) (logistic EuroSCORE
0.80, additive EuroSCORE 0.80, Cleveland Clinic
0.79, and Magovern 0.78). However, it did not change their
relative order of discriminatory power.
The purpose of this study was to compare 19 commonly used
cardiac surgical risk scores with regard to their validity in a
large single-institution patient population. The results show
that four of the algorithms had a superior performance
and accuracy to predict 30-day and 1-year mortality,
expressed as discriminatory power, compared with the
other 15 algorithms. Despite the fact that all of the algor-
ithms were designed to predict early mortality, they also
predict 1-year mortality well, especially when the cause of
death was cardiovascular disease.
Most algorithms overestimated the 30-day mortality in
this patient population. The same finding has been reported
in other studies.4,6 Rather than reflecting weaknesses in the
risk score algorithm, these findings are probably explained
by differences in patient mix and temporal periods com-
pared to the original databases used for development of
the algorithms.6 Prediction of mortality rate in the CABG-only
subgroup was almost perfect using the Northern New
England and NYS algorithms, both of which are intended for
CABG surgery and were recently developed.
The potential of ROC curves in medical diagnostic testing
was recognized as early as 1960.15 Even if comparison of ROC
curves in a statistically valid fashion to evaluate models
remains controversial, the ROC curve is currently the best
developed statistical tool for describing performance.12
The EuroSCORE model, which had the highest discriminatory
power, has been shown to work well to predict 30-day mortality
in many European countries16 and in the United
States.17 It compared favourably with the STS risk stratification
algorithm7 (which is not open source and was therefore
not included in the present analysis). Recently, it was
demonstrated that EuroSCORE could predict intensive care
unit stay and costs of open-heart surgery.18 The Cleveland
Clinic model has also shown high discrimination to predict
early mortality.8 An important finding in the present study
is that these algorithms could be used also to predict long-
term mortality (1 year), especially for cardiovascular
Earlier studies have compared the performance of different
risk algorithms to predict 30-day mortality,4,6,8 but
have not shown significant differences in performance and
accuracy. This may be explained by smaller patient
The predictive accuracy of different risk scoring systems
may be influenced by numerous factors, such as differences
in variable definitions, management of incomplete data
fields, surgical procedure selection criteria, and geographi-
cal differences in patient risk factors. The prevalence of
risk factors in patients referred for heart surgery may also
change over time. Difficulties thus arise when comparisons
of the accuracy and predictive power of large databases
are attempted. However, ROC analysis is a robust technique
for such comparisons. Importantly, the shapes of the ROC
curves were similar among the compared risk models
(Figure 2), making direct comparison possible.12 Murphy-
Filkins et al.19 showed that an increase up to five times of
Table 2 ROC area for the five risk algorithms with best performance and accuracy in CABG-only surgery (n = 4351)
ROC area (95% CI); ROC area (95% CI)
Cleveland Clinic risk score algorithm is also known as Higgins score.
Observed 30-day mortality with 95% confidence intervals (vertical lines) in comparison to score-predicted 30-day mortality (diamonds) with 95% confidence intervals (horizontal bars). (A) All open-heart surgery and (B) CABG-only surgery. Asterisk denotes the predicted mortality calculated from ACC/AHA score mortality table11 specified for CABG-only surgery. See Table 1 for abbreviations.
a low-frequency variable (for example, due to difference
in a variable definition) did not appreciably change the
All surgical procedures were included in the study, irre-
Thus, a patient could participate two or more times in
the analysis. This could be debated, as a dependence of the
data that arises from multiple procedures performed
within a patient may occur. An alternative would be to
include only the first procedure for each patient. A subanaly-
sis using this approach (n ¼ 6153) showed only very small
differences in the ROC area for the different risk algorithms
(on average 0.001). A drawback of excluding patients having
a second procedure during the study period is that some
high-risk cases will be eliminated from the analysis.
Regardless of the method used, the difference caused by
this dependence was negligible, most likely due to the
small number of patients (1%) who had more than one
The probability imputation technique, used in this study,
studies.20 Another strategy to handle incomplete data is to
exclude the patients with missing values from analysis, but
because missing values are more likely in emergent high-
risk patients, this could result in bias.
Geographical differences in the occurrence of patient risk
factors may have influenced the design of different risk-
scoring systems, but do not seem to influence the present
results. The best-performing risk scores in this study were
developed in two different geographical areas: Europe and
Eight of the included risk algorithms (Cabdeal, NYS,
Northern New England, Magovern,
(modified), UK national score, and Veterans Affairs) were
originally designed to predict early mortality in CABG-only
patients, which also could affect the predictive accuracy.
A subanalysis of CABG-only patients in this material ident-
ified the same two risk-scoring systems with the largest
ROC areas (EuroSCORE additive and logistic), followed by
the NYS and the Cleveland Clinic risk-scoring systems.
The smaller ROC area for the 1-year than for the 30-day
mortality prediction was expected. Risk models originally
designed to predict 30-day mortality will mainly predict
cardiovascular death, which was the most common cause
of early post-operative mortality (91%). At 1 year, the
causes of death will be more diverse and the proportion of
cardiovascular-related death will decrease (74%).
The strength of the present study is that the algorithms
could be compared using a relatively large patient material,
where the patient data were collected on a regular basis in
the daily clinical work. The data were pre-operatively
entered into the database, generally by residents, and not
by the surgeon performing the operation.
During the last decades, several different risk score algor-
ithms for cardiac surgery have been published, but it still
remains difficult to risk stratify individual patients.4,8 One
method to improve risk algorithm development could be to
include more patients with higher risk scores as suggested
by Wyse and Taylor.21
However, we found that the
Cleveland Clinic score, which was developed on 5051
patients, performed almost as well as the EuroSCORE, devel-
oped on 13 302 patients.
Most risk algorithms are based on logistic regression analy-
sis with a priori assumptions of linear relationships. Another
method to improve risk prediction could be to use a more
complex risk model, such as the artificial neural network,
which has the advantage of the capacity to model
complex, non-linear relationships and is relatively robust
and tolerant of missing data.22 Only a few
studies have been done in this area, which merits further investigation.
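For the logistic-regression form mentioned above, the following minimal Python sketch shows how such a model converts a patient's risk factors into a predicted probability of death. The intercept, the coefficient values, and the risk factor names are hypothetical and are not taken from any of the 19 published algorithms.

```python
import math


def logistic_risk(intercept, coefficients, factors):
    """Predicted probability from a logistic model:
    p = 1 / (1 + exp(-(intercept + sum(beta_i * x_i)))).
    """
    linear = intercept + sum(
        coefficients[name] * value for name, value in factors.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients, for illustration only.
intercept = -4.0
coefficients = {"age_over_70": 0.7, "emergency": 1.2}

low_risk = logistic_risk(intercept, coefficients, {"age_over_70": 0, "emergency": 0})
high_risk = logistic_risk(intercept, coefficients, {"age_over_70": 1, "emergency": 1})
```

Each risk factor contributes linearly on the log-odds scale, which is the a priori assumption that more flexible models such as neural networks relax.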
Even if a perfect risk prediction algorithm in cardiac
surgery is never achieved, identification of the best-
performing risk algorithms is important. Pre-operative risk
stratification may aid in the selection between cardiac
available, facilitate the planning of hospital resource utili-
zation, and enable accurate comparison between different
institutions or surgeons.
Conflict of interest: none declared.
Pre-operative general risk factors in 6222 open-heart operations
or n (%)
Cabdeal; Cleveland Clinic; NYS; Northern New England; Parsonnet; Parsonnet (modified); Pons; Toronto; Toronto (modified); Tremblay; Tuman; UK national score
(sys >140 mmHg)