Predicting Mortality and Healthcare
Utilization with a Single Question
Karen B. DeSalvo, Vincent S. Fan, Mary B. McDonell, and
Stephan D. Fihn
Objective. We compared the ability of single- and multi-item measures of general self-rated health
(GSRH) to predict mortality and clinical events in a large population of veteran patients.
Data Source/Study Setting. We analyzed prospective cohort data collected from
21,732 patients as part of the Veterans Affairs Ambulatory Care Quality Improvement
Project (ACQUIP), a randomized controlled trial investigating quality-of-care interventions.
Study Design. We created an age-adjusted logistic regression model for each
predictor and outcome combination, estimated the odds of events by response
category of the GSRH question, and compared the discriminative ability of the
predictors by developing receiver operating characteristic curves and comparing the
areas under the curves (AUC).
Data Collection/Extraction Methods. All patients were sent a baseline assessment
that included a multi-item measure of general health, the 36-item Medical Outcomes
Study Short Form (SF-36), and an inventory of comorbid conditions. We compared the
predictive and discriminative ability of the GSRH with the SF-36 physical component
score (PCS), the mental component score (MCS), and the Seattle index of comorbidity
(SIC). The GSRH question was: “In general, would you say your health is: Excellent,
Very Good, Good, Fair, Poor?”
Principal Findings. The GSRH, PCS, and SIC had comparable AUC for predicting
mortality (AUC 0.74, 0.73, and 0.73, respectively); hospitalization (AUC 0.63, 0.64, and
0.60, respectively); and high outpatient use (AUC 0.61, 0.61, and 0.60, respectively).
The MCS had significantly poorer discriminatory performance for mortality and
hospitalization than any of the other predictors (p<.001).
Conclusions. The GSRH response categories can be used to stratify patients by
their risk of adverse outcomes. Patients reporting “poor” health are at significantly
greater odds of dying or requiring healthcare resources compared with their peers. The
GSRH, which can be collected at the point of care, performs comparably with longer instruments.
Key Words. Quality of life, mortality, hospitalization, outpatient, risk assessment
© Health Research and Educational Trust
Health administrators, researchers, and policymakers use prediction models
utilization. Traditionally, administratively derived predictors have been used
for such purposes; however, their limitations have led to the development of
alternatives (Romano et al. 1993; Iezzoni et al. 1996; Iezzoni 1999;
Schneeweiss and Maclure 2000; Schneeweiss et al. 2001, 2003). Measures of
self-rated health are robust risk predictors that have gained in popularity as a
substitute for administratively derived tools. These self-rated health measures
are patient-centered and predictive of subsequent health outcomes, even in
patients without prior health problems. In several studies, patient self-rated
health status has predicted such important patient outcomes as mortality and
health system utilization (Miilunpalo et al. 1997; Curtis et al. 2002; Fan et al.
2002a,b; Spertus et al. 2002; Knight et al. 2003). These measures remain
consistent predictors of hospitalizations and mortality rates even after
adjustment for clinically relevant factors (Clarke and Oxmann 2002; Lowrie
et al. 2003).
Routine use of self-rated health measures for health care planning and
delivery is partially limited by burdens associated with collection of health
status information. Many self-rated health measures are multi-item scales that
are often onerous to collect in routine practice settings. Single-item general
self-rated health status (GSRH) measures may serve as a reasonable substitute
for multi-item measures of self-rated health (Balkrishnan and Anderson 2001).
They have the advantage of being less expensive and less burdensome to
collect, and could conceivably be collected at the point of care with relative
ease. In a health care setting that uses a relational electronic database, this
collection could occur as part of routine intake in primary care.
They are easy to score and interpret and, like the longer multi-item scales,
these single-item measures have predictive validity for mortality and health
1999; Balkrishnan et al. 2000). GSRH measures are relatively stable (Eriksson
et al. 2001) and sensitive to change (Rodin and McAvay 1992; Diehr et al.
Address correspondence to: Karen B. DeSalvo, M.D., M.P.H., M.Sc., Section of General Internal
Medicine, Tulane University School of Medicine, 1430 Tulane Avenue, SL-16, New Orleans, LA
70112. Karen B. DeSalvo is with the Department of Medicine, Tulane University School of
and Tropical Medicine. Vincent S. Fan, M.D., M.P.H., Mary B. McDonell, M.Sc., and Stephan D.
Fihn, M.D., M.P.H., are with the VA Puget Sound Health Care System, NW, HSR & D Center of
Excellence. Drs. Fan and Fihn are also with the Department of Medicine, University of Washington.
GSRH Prediction of Outcomes1235
While the research regarding the use of a single-item GSRH measure as
a risk assessment tool is promising, gaps exist (McHorney 1999; Diehr et al.
2001; Eriksson et al. 2001). For example, the performance of such tools is
poorly understood in diverse patient populations, and in comparison with
whether a single-item GSRH measure could predict important outcomes in a
large veteran outpatient population, and to compare its discriminative ability
with established multi-item risk predictors.
Quality Improvement Project (ACQUIP). This multicenter, randomized trial
was designed to study the effectiveness of primary care-based, quality-of-care
interventions in a Veterans Affairs (VA) patient population (Fihn et al. 2004).
Subjects and Setting
in the general internal medicine clinics, between March 1, 1997 and July 31,
1999, at one of seven VA medical centers: Birmingham, Alabama; Little
Rock, Arkansas; San Francisco, California; West Los Angeles, California;
regular intervals to enrolled patients. The information from these surveys was
linked to health resource utilization and clinical outcomes using the VA
information system (VistA).
At the time of enrollment, participants were sent a Health Checklist, which
asked about sociodemographic characteristics and coexisting illnesses. The
Health Checklist was returned by 35,383 (54 percent) of the enrolled patients.
Patients who returned the Health Checklist were then sent an instrument that
measured general health status, the 36-item Medical Outcomes Study Short
Form (SF-36) (Ware 1998). During the study, 61 percent (n = 21,732) of these
participants returned at least one SF-36; the first SF-36 returned was used for
the analysis. Participants who returned a completed SF-36 (n = 21,732) were
older (mean age 64 versus 58 years, p<.001) and more likely to be male (p<.001),
married (p<.001), employed (p<.001), and white (p<.001) than nonrespondents.
Respondents had a somewhat higher prevalence of some chronic
1236 HSR: Health Services Research 40:4 (August 2005)
medical conditions, including a prior myocardial infarction (18 versus 16
percent, p<.001), cancer (12 versus 10 percent, p<.001), and congestive heart
failure (8 versus 7 percent, p<.001), than those who did not return the SF-36.
The main predictor was the single-item GSRH question from the SF-36, “In
general, would you say your health is . . .” with a 5-category Likert response scale of
Excellent, Very Good, Good, Fair, Poor. We compared the GSRH to multi-item
scales that are calculated from the SF-36 and have been shown to predict
mortality and utilization (Hornbrook and Goodman 1996; Fan et al. 2002).
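For the regression analyses, the five GSRH responses enter the models as a categorical variable, with “excellent” and “very good” collapsed into a single reference category and indicators for the remaining levels. A minimal sketch of that coding follows; the names `GSRH_LEVELS`, `DUMMY_COLUMNS`, and `gsrh_dummies` are hypothetical, introduced only for illustration.

```python
# Hypothetical helper illustrating the GSRH coding: "Excellent" and
# "Very Good" share one reference category; Good, Fair and Poor each
# get a 0/1 indicator column in the design matrix.
GSRH_LEVELS = ("Excellent", "Very Good", "Good", "Fair", "Poor")
DUMMY_COLUMNS = ("Good", "Fair", "Poor")


def gsrh_dummies(response):
    """Return 0/1 indicators for Good, Fair and Poor.

    A respondent answering "Excellent" or "Very Good" gets all zeros,
    i.e. falls into the collapsed reference category.
    """
    if response not in GSRH_LEVELS:
        raise ValueError(f"unrecognized GSRH response: {response!r}")
    return {col: int(response == col) for col in DUMMY_COLUMNS}
```

Each odds ratio from a model built on these indicators then compares one category (for example, “poor”) against the combined excellent/very good group.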
The SF-36 consists of eight subscales: physical functioning, role physical,
bodily pain, general health, vitality, social functioning, role emotional, and
mental health, from which the mental component summary (MCS) and physical
component summary (PCS) scores are derived (Ware et al. 1994). The PCS
and MCS are each normalized to a 100-point scale with a mean of 50 and a
standard deviation (SD) of 10. Higher scores reflect better functioning (Ware and Keller 1995). In
the VA population, scores on the PCS and MCS tend to be below national
norms, with mean scores of 47.8 (SD 12.2) and 37.1 (SD 11.9),
respectively (Kazis et al. 1999; Au et al. 2001).
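Because both summary scores are norm-based (mean 50, SD 10), the VA means can be restated as distances from the national norm in SD units. This is only arithmetic on the values quoted above, not an additional study result:

```latex
z_{\mathrm{PCS}} = \frac{47.8 - 50}{10} = -0.22,
\qquad
z_{\mathrm{MCS}} = \frac{37.1 - 50}{10} = -1.29
```

On this reading, the mean MCS in the VA population sits more than one SD below the national norm, while the mean PCS is about one-fifth of an SD below it.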
of comorbidity (SIC), which combines patients’ self-reports of coexisting
chronic conditions (prior myocardial infarction, cancer, chronic obstructive
pulmonary disease, congestive heart failure, diabetes mellitus, pneumonia,
and stroke), age, and tobacco use. Ordinarily, the SIC is scored with higher
scores reflecting increasing levels of comorbidity and greater risk of mortality
and hospitalization among elderly, male primary care patients (Fan et al.
SIC to mirror the other measures being evaluated.
The primary outcome was all-cause mortality in the year following baseline
assessment of health status. We ascertained death from the VA Beneficiary
Identification and Records Locator Subsystem (BIRLS) database, which records
deaths of patients whose families apply for veterans’ death benefits. The
BIRLS system has a sensitivity for detecting mortality that ranges between
80.0 and 94.5 percent (Cowper et al. 2002).
Secondary outcomes included aspects of health services use during the
1-year period following the baseline health status measurement. We treated
hospital admission for any reason during the study interval. We did not
ascertain admissions to facilities outside of the VA health system or to nursing
homes. Use of outpatient services consisted of all medical visits within the VA
system, including primary and specialty care. We defined “high use” as the top
10 percent of total visits for the year, which translated into more than
We created an age-adjusted logistic regression model for each predictor and
outcome combination. For this analysis, we included only those 21,732
patients who had completed both the Health Checklist and the SF-36, and
who had at least 1 year of follow-up. The GSRH responses were
modeled as a categorical variable, with “excellent” and “very good” collapsed
into one reference category in keeping with conventional practice (Idler and Kasl
1991; Kaplan and Camacho 1983). We calculated odds ratios for death,
hospitalization, and high outpatient utilization. The c-statistic, or area under
the receiver operating characteristic curve (AUC), assessed model discrimination, and the
Hosmer–Lemeshow goodness-of-fit χ² statistic (Hosmer, Applied Logistic
Regression) assessed model calibration. AUC values range from 0 to 1: a value
of 1 represents perfect discrimination and a value of 0.5 represents chance,
so the relevant range is 0.5–1.0.
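A minimal plain-Python sketch of one predictor/outcome model and its c-statistic is shown below; it is an illustration, not the authors' code. It assumes a hypothetical covariate layout of `[age, good, fair, poor]` per patient (the collapsed excellent/very good group as reference), fits the logistic model by Newton-Raphson, and computes the c-statistic as the share of (event, non-event) patient pairs in which the patient with the event has the higher predicted risk, with ties counted as one half. The function names `solve`, `fit_logistic`, and `c_statistic` are illustrative.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(x, y, iters=25):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson.

    x: one list of covariates per patient (an intercept is prepended here),
       e.g. the hypothetical layout [age, good, fair, poor]; centering age
       keeps the Newton steps well conditioned.
    y: 0/1 outcome per patient (e.g. died within 1 year).
    Returns [intercept, b1, b2, ...] on the log-odds scale. A fixed
    iteration count keeps the sketch short; real software checks convergence.
    """
    rows = [[1.0] + list(r) for r in x]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # score vector X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += w * r[i] * r[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def c_statistic(scores, y):
    """Share of (event, non-event) pairs in which the event case has the
    higher predicted risk; ties count one half. This equals the AUC."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    concordant = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return concordant / (len(pos) * len(neg))
```

Under the assumed layout, `beta[0]` is the intercept and the age-adjusted odds ratio for “poor” versus the reference group is `math.exp(beta[4])`. The pairwise concordance count is quadratic in sample size, which is fine for illustration but slow for a cohort of 21,732.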
Hosmer and Lemeshow have suggested that a c-statistic or AUC value
between 0.70 and 0.80 is acceptable and a value greater than 0.80 is excellent.
Values higher than 0.90 are rarely observed. For reference, the c-statistic for
the Framingham Heart Study risk calculator, a commonly used risk
assessment tool, is 0.77 (Wilson et al. 1998). To compare the discriminative ability
of the risk prediction measures, we compared their AUCs (c-statistics) using the
method of DeLong, DeLong, and Clarke–Pearson for correlated data. The
standard error and 95 percent confidence intervals (CIs) were also calculated
software for all analyses.
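Because every predictor was scored on the same patients, the AUCs are correlated, and the comparison must account for that correlation. A plain-Python sketch of the DeLong, DeLong, and Clarke–Pearson approach follows: each AUC is decomposed into per-patient “placement” values, and their variances and cross-covariances give standard errors for each AUC and for the difference between two AUCs. The names `_placements`, `_cov`, and `delong_compare` are hypothetical, and this is not the implementation used in the study.

```python
import math


def _placements(pos, neg):
    """Per-patient DeLong components for one predictor.

    v10[i]: share of non-events scored below event case i (ties = 1/2).
    v01[j]: share of events scored above non-event case j (ties = 1/2).
    """
    def psi(a, b):
        return 1.0 if a > b else (0.5 if a == b else 0.0)

    v10 = [sum(psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, n) for p in pos) / len(pos) for n in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def delong_compare(scores1, scores2, y):
    """Compare two AUCs estimated on the same patients.

    Returns (auc1, auc2, se1, se2, z, p), where z tests AUC1 == AUC2 and
    p is its two-sided normal p-value.
    """
    pos_idx = [i for i, t in enumerate(y) if t == 1]
    neg_idx = [i for i, t in enumerate(y) if t == 0]
    m, n = len(pos_idx), len(neg_idx)
    (v10a, v01a), (v10b, v01b) = (
        _placements([s[i] for i in pos_idx], [s[i] for i in neg_idx])
        for s in (scores1, scores2)
    )
    auc1, auc2 = sum(v10a) / m, sum(v10b) / m
    var1 = _cov(v10a, v10a) / m + _cov(v01a, v01a) / n
    var2 = _cov(v10b, v10b) / m + _cov(v01b, v01b) / n
    # The correlation between the two AUCs (same patients) enters via cov12.
    cov12 = _cov(v10a, v10b) / m + _cov(v01a, v01b) / n
    z = (auc1 - auc2) / math.sqrt(var1 + var2 - 2 * cov12)
    p = math.erfc(abs(z) / math.sqrt(2))
    return auc1, auc2, math.sqrt(var1), math.sqrt(var2), z, p
```

A normal-approximation 95 percent CI for each AUC is then `auc ± 1.96 * se`. Ties matter here: a five-category GSRH assigns the same score to many patients, and these ties are counted as one half. If both predictors produce identical scores, the difference has zero variance and the z computation divides by zero.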
The sample characteristics were representative of VA patients nationally
(Kazis et al. 1999). Subjects were predominantly older, white, male, and had
multiple coexisting illnesses (Table 1). Approximately one-third reported