National Release of the Nursing Home
Quality Report Cards: Implications of
Statistical Methodology for Risk Adjustment
Yue Li, Xueya Cai, Laurent G. Glance, William D. Spector, and
Dana B. Mukamel
Objective. To determine how alternative statistical risk-adjustment methods may
affect the quality measures (QMs) in nursing home (NH) report cards.
Data Sources/Study Settings. Secondary data from the national Minimum Data
Set files of 2004 and 2005 that include 605,433 long-term residents in 9,336 facilities.
Study Design. We estimated risk-adjusted QMs of decline in activities of daily living
(ADL) functioning using classical, fixed-effects, and random-effects logistic models.
Risk-adjusted QMs were compared with each other, and with the published QM
(unadjusted) in identifying high- and low-quality facilities by either the rankings or
95 percent confidence intervals of QMs.
Principal Findings. Risk-adjusted QMs showed better overall agreement (or con-
vergent validity) with each other than did the unadjusted versus each adjusted QM; the
disagreement rate between unadjusted and adjusted QM can be as high as 48 percent.
The risk-adjusted QM derived from the random-effects shrinkage estimator deviated
nonrandomly from other risk-adjusted estimates in identifying the best 10 percent
facilities using rankings.
Conclusions. The extensively risk-adjusted QMs of ADL decline, even when esti-
mated by alternative statistical methods, show higher convergent validity and provide
more robust NH comparisons than the unadjusted QM. Outcome rankings based on
ADL decline tend to show lower convergent validity when estimated by the shrinkage
estimator rather than other statistical methods.
Key Words. Nursing home, quality report cards, activities of daily living, risk adjustment
The quality of long-term care received by nursing home (NH) residents remains a persistent concern for consumers, their families, and policy makers. Since the 1987 Nursing Home Reform Act, continued efforts have been made to establish a national system for assessing, monitoring, and publicly reporting
NH quality (Morris et al. 1990; Zimmerman et al. 1995; General Accounting
Office 2002; Mor 2004). In November 2002, as part of its Nursing Home
Quality Initiative, the Centers for Medicare and Medicaid Services (CMS)
launched a national report card with NH quality measures (QMs), the ‘‘Nursing Home Compare,’’ which includes outcome-based measures derived from the Minimum Data Set (MDS) (General Accounting Office 2002; Arling et al. 2007; Mukamel et al. 2008).
Making the facility performance data available to the public is expected
to empower consumers to compare and choose NH services based on quality,
and to stimulate quality improvement through market competition. Given its
potential impact (Chernew and Scanlon 1998; Mukamel et al. 2004, 2007), it
is critical that the QMs accurately differentiate homes with good quality from
those with poor quality.
Because health outcomes are determined by both care quality and res-
ident frailties and comorbid conditions, it is imperative to adjust for case mix
variations among facilities before their outcomes are compared (Iezzoni 2003).
Failure to do so may introduce a bias where facilities treating the sickest
residents may have worse outcomes even when they provide the best of care.
Many quality report cards for hospitals and physicians recognize this issue and
provide risk-adjusted outcome rates. However, several studies have noted that the CMS QMs incorporate only limited risk adjustment (General Accounting Office 2002; Arling et al. 2007; Mukamel et al. 2008), and have advocated using more extensive, statistical risk adjustment in these QMs.
Despite the essential role of risk adjustment in making fairer outcome comparisons, risk adjustment may introduce uncertainty (Iezzoni 1997) when alternative statistical methodologies do not agree on the identity
of high- and low-quality providers (DeLong et al. 1997; Hannan et al. 1997;
Iezzoni 1997; Shahian et al. 2001; Glance et al. 2006a; Li et al. 2007). A
growing literature on this issue has focused on the use of appropriate severity
Address correspondence to Yue Li, Ph.D., Department of Medicine, University of California,
Irvine, CA 92697; e-mail: firstname.lastname@example.org. Xueya Cai, M.A., is with the Division of Biostatistics,
Indiana University School of Medicine, Indiana University Purdue University Indianapolis,
Indianapolis, IN. Laurent G. Glance, M.D., is with the Department of Anesthesiology, The Uni-
is with the Center for Delivery, Organization, and Markets, Agency for Healthcare Research and
Quality, Rockville, MD. Dana B. Mukamel, Ph.D., is with the Center for Health Policy Research,
University of California, Irvine, CA.
80HSR: Health Services Research 44:1 (February 2009)
measures for risk adjustment (Hannan et al. 1997; Iezzoni 1997; Shahian et al.
2001). More recently, analysts also examined the choice among statistical
models, such as logistic or multilevel (random-effects) regression models,
in computing and comparing risk-adjusted rates. Their findings suggest that
alternative statistical methods may estimate outcomes differently (DeLong et
al. 1997; Shahian et al. 2001; Glance et al. 2006a; Li et al. 2007).
This study was designed to explore the implications of alternative statistical methods (the classical, fixed-effects, and random-effects logistic models) in constructing and interpreting the national NH QMs. Focusing on
1 of the 19 outcomes currently published (Mukamel et al. 2008), we first developed extensively risk-adjusted measures using a common set of MDS risk factors. We then compared the CMS QM (unadjusted) and these risk-adjusted measures in identifying outstanding or poor-performing facilities. The outcome examined was decline
in activities of daily living (ADLs) for long-term care residents. We chose
this outcome because physical function (as measured by ADLs) is central
to the well-being of NH residents (Institute of Medicine 1986). Furthermore,
it has been shown to be amenable to appropriate interventions (Granger et al.
1990; Spector and Takada 1991; Kane et al. 1996) and been used in various
studies of NH quality (Mukamel 1997; Mukamel and Brower 1998; Rosen
et al. 2000, 2001).
The MDS and NH QMs
In 1986, the Institute of Medicine’s Committee on Nursing Home Regulation
reported widespread quality deficiencies across the nation (Institute of
Medicine 1986), and recommended strengthened NH regulations, revisions
of oversight and enforcement mechanisms, and changes in quality assessment
toward a more resident-centered and health outcome-oriented approach.
Based on these recommendations, the Omnibus Budget Reconciliation Act
of 1987 and subsequent legislations established new standards of NH care
‘‘to attain or maintain the highest practicable physical, mental, and psycho-
social well-being’’ (Capitman and Bishop 2004). As a part of these efforts, the
Health Care Financing Administration (now CMS) mandated the implementation of the Resident Assessment Instrument (RAI) for health assessment and care planning (Fries et al. 1997). A key component
of the RAI is the MDS, a structured assessment tool for periodic collection
Nursing Home Quality Report Cards and Risk Adjustment Methods81
of multiple domains of resident information, including physical function, cog-
nition, emotion, behavior, nutrition, diagnoses, procedures, and treatments
received (Morris et al. 1990; CMS 2002).
By virtue of their longitudinal nature, the MDS records can be used to
document changes in resident conditions, such as functional decline or
development of pressure ulcers, which can then be translated into meaningful
quality-of-care indices (Zimmerman et al. 1995; Mukamel 1997; Rosen et al.
2001; Mor 2004). In a multistate demonstration sponsored by CMS, Zimmer-
man et al. (1995) developed a set of MDS-based clinical quality indicators
states, and soon expanded it to national public reporting in November 2002.
(1995) and partly from new development (Manard 2002). Currently, there are
19 QMs (14 for long-stay residents, and five for postacute care patients) that
are published, with periodic updates, on the CMS-maintained ‘‘Nursing
Home Compare’’ website (www.medicare.gov/NHCompare).
Issues of Inadequate Risk Adjustment
The CMS QMs incorporate several mechanisms to account for resident
characteristics. First, exclusions are used to create a relatively homogenous
level of physical dependence at ‘‘baseline’’ and thus would not deteriorate
further (see Appendix SA2). Second, stratification between high- and low-risk
regression is used for five QMs, each adjusting for a limited number (1–3) of
risk variables. A detailed description of the CMS approach can be found
elsewhere (Mukamel et al. 2008).
Despite these efforts to make NH comparisons fairer, it is possible
that QMs with limited risk adjustment may not accurately identify poorly
performing facilities. Because a broad array of resident characteristics can
affect outcomes and these characteristics may not be randomly distributed
over facilities, ignoring the effect of these risk factors (i.e., those not adjusted
for in the CMS QMs) may bias quality estimation (Localio et al. 1997). For
example, Mukamel et al. (2008) examined several CMS QMs, and found that
QMs with additional adjustment for MDS risk factors resulted in different
facility rankings than the rankings based on the corresponding CMS QMs.
risk adjustment in CMS QMs (General Accounting Office 2002; Arling
et al. 2007).
CMS and its contracting researchers have recognized this issue and
suggested that adjusting for the type of residents in facilities requires further
risk factors; (2) comparisons of different risk-adjustment methodologies, as
applied to each QM; and (3) validation of different risk-adjustment methods
(General Accounting Office 2002). This study extends previous research that demonstrated a more appropriate choice of risk factors (Arling et al. 2007; Mukamel et al. 2008) by comparing and validating alternative statistical methods for risk adjustment.
Regression-Based Risk Adjustment
As can be seen in many acute care report cards, multivariate statistical regression is commonly used for risk adjustment (Iezzoni 2003). Compared with CMS methods such as risk stratification or exclusion, the regression-based approach is more flexible in that it can account for a large number of patient characteristics affecting outcomes (Mukamel et al. 2008). Although the regression-based method may be technically less straightforward, its basic analytical steps are simple: first, a regression model is used to predict the expected outcome (e.g., probability of functional decline) of each patient based solely on risk factors; second, the expected outcome of each facility can be
computed as the summed risk of all patients in the facility divided by the total
number of patients in it; finally, the risk-adjusted QM can be constructed based
on the comparison of the facility’s average observed outcome with expected
outcome. Details of this observed-to-expected outcome comparison are
presented in a recent study of a hospital report card (Li et al. 2007).
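The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `facility_observed_expected` and the toy records are hypothetical, and the predicted risks are taken as given rather than estimated from a regression model.

```python
from collections import defaultdict

def facility_observed_expected(records):
    """Return {facility: (observed rate, expected rate)}.

    records: iterable of (facility_id, outcome, predicted_risk), where
    outcome is 0/1 and predicted_risk comes from a risk model that uses
    patient risk factors only (assumed to be estimated elsewhere).
    """
    sums = defaultdict(lambda: [0, 0.0, 0])  # [events, summed risk, n]
    for facility, outcome, risk in records:
        entry = sums[facility]
        entry[0] += outcome
        entry[1] += risk
        entry[2] += 1
    # Observed rate = events / n; expected rate = summed risk / n.
    return {f: (events / n, risk_sum / n)
            for f, (events, risk_sum, n) in sums.items()}

# Toy data (hypothetical): two facilities with two residents each.
records = [("A", 1, 0.30), ("A", 0, 0.10), ("B", 0, 0.20), ("B", 0, 0.40)]
rates = facility_observed_expected(records)
```

The risk-adjusted measure is then built from the comparison of each facility's observed and expected rates.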
Despite the flexibility of this approach in modeling risks, a precaution
of performing such risk-adjusted analysis is to avoid ‘‘over-adjustment’’
(Mukamel et al. 2008). For example, weight loss may be a risk factor of functional decline, but may itself be an outcome of NH care. Including weight loss in the regression of functional decline may
be necessary to avoid biased outcome comparison, but could overstate a
facility’s performance, when solely judged by the QM of ADL decline, if the
facility provides poor care for weight loss. However, CMS’s publication
of multidimensional QMs tempers this issue largely (Mukamel et al. 2008)
because both outcomes are published and can be used to rank facilities
explicitly. We will return to this issue later in ‘‘Discussion.’’
Potential Impact of Alternative Statistical Models
Regression-based risk adjustment is expected to balance the effect of patients’
preexisting risks (e.g., baseline physical function) on health outcomes (e.g.,
future decline in ADLs), leaving residual outcome differences across facilities
to reflect quality (Iezzoni 1997). The choice of statistical methodology, how-
ever, may have impacts on quality rankings. Although this issue has been
examined in acute care outcomes (DeLong et al. 1997; Shahian et al. 2001;
Glance et al. 2006a; Li et al. 2007), no study has explored the impact of
alternative statistical methods on constructing the NH QMs. Current acute
care report cards are frequently developed using classical regression models
(Li et al. 2007). Although widely used, the classical regression may not be
et al. 1997). A basic assumption of classical regression is that all patients are
independent observations in the dataset (Hosmer and Lemeshow 2000).
However, outcomes data almost always violate this assumption because
patients in the same facility tend to show similar (or correlated) health outcomes when receiving similar patterns of care in the facility. The assumption of independent outcomes in classical regression contradicts the spirit of outcomes comparisons and reports, which are grounded on the belief that a
common factor, ‘‘quality,’’ determines the health outcomes of patients in the
same facility, and varies across facilities. Ignoring the ‘‘clustering’’ of patients
due to ‘‘quality’’ or other factors may invalidate the empirical risk-adjustment
model and lead to incorrect quality estimates (DeLong et al. 1997).
Two alternative approaches, fixed-effects and random-effects models, estimate quality explicitly in the regression models and assume independent outcomes among facilities but correlated outcomes within a facility (Greene 2001). Thus, although each has its own limitations, the two approaches may be better suited for outcome comparisons.
In addition, the fixed-effects modeling is effective in dealing with the
situation where facility quality correlates with patient characteristics (Greene
2001). The correlation may be caused by selective referrals where physicians
refer sicker patients to better-performing hospitals or discharge patients with
functional disabilities to NHs with better rehabilitative services. In such cases,
the fixed-effects modeling can produce unbiased and consistent parameter
estimation (Hsiao 2003). However, one disadvantage of this approach is that
for facilities with small numbers of patients, the point estimate may be ac-
companied by a large variance and be unreliable. The fixed-effects approach
has been implemented by the Agency for Healthcare Research and Quality’s
evidence-based hospital QIs (Agency for Healthcare Research and Quality
2002), and by a national report of Consumer Survey of Health Plans (Elliott et al. 2001).
The random-effects model assumes that facility quality arises from a dis-
tribution of population quality such as normal distribution (Brown and Prescott
1999). In such a model, the estimated quality is shrunken towards the overall
performance estimate, or less widely spread, compared with its fixed-effects
counterpart. The shrinkage is inversely proportional to the precision of facility
estimate, which helps avoid extreme estimates for facilities with small numbers
of patients (Goldstein and Spiegelhalter 1996). Consequently, the shrinkage
estimates for small facilities are generally more conservative, but can cause bias
in the point estimates of quality and thus erroneous quality rankings. Another
outcomes, is that the random estimates are assumed uncorrelated with patient
characteristics (DeLong et al. 1997). This assumption may be violated when
selective referrals based on performance exist. In such instances, the random-
effects model will result in biased estimates (Brown and Prescott 1999).
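To make the shrinkage idea concrete, the sketch below uses the closed-form shrinkage factor from a simple linear random-intercept model. This is an analogy only: the logistic random-effects model discussed here has no such closed form, and the function name `shrink` and the variance values are hypothetical.

```python
def shrink(raw_effect, between_var, within_var, n):
    """Shrink a facility's raw deviation from the overall mean.

    Assumption: linear random-intercept model, where the shrinkage
    factor is between_var / (between_var + within_var / n). Facilities
    with fewer patients (smaller n, less precise estimates) are pulled
    more strongly toward zero, i.e., toward the overall mean.
    """
    factor = between_var / (between_var + within_var / n)
    return factor * raw_effect

# Same raw deviation, different facility sizes (illustrative values).
small_facility = shrink(0.5, between_var=0.1, within_var=1.0, n=10)
large_facility = shrink(0.5, between_var=0.1, within_var=1.0, n=1000)
```

With identical raw deviations, the small facility's estimate ends up much closer to zero than the large facility's, which is the conservative behavior described above.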
Studies comparing these alternative models have resulted in inconclusive findings, with some reporting substantial differences when different approaches are used (Greenfield et al. 2002; Huang et al. 2005), but
others reporting relatively minor impacts on outcomes inferences (DeLong
et al. 1997; Hannan et al. 2005; Glance et al. 2006a). Therefore, the choice
of statistical methodology is likely an empirical question that depends on
individual outcomes and populations.
Data and Measures
We used the 2004 and 2005 national MDS datasets for all long-term care
residents in facilities certified by the Medicare or Medicaid program. The
MDS assessments are performed for each resident upon admission, quarterly
suggests that MDS records meet acceptable standards of accuracy and reli-
ability for research purposes (Hawes et al. 1995; Lawton et al. 1998; Mor et al.
2003). The MDS ADL score quantifies a resident’s functioning in the last 7
days of assessment in bed mobility, transferring, eating, and toilet use. Each of the four components is scored on a five-point scale, with 0 standing for highest level of independence and 4 indicating total dependence (CMS 2002).
Defining the CMS QM
According to CMS’s definition (Abt Associates Inc. 2004), a resident suffers functional decline between two adjacent quarters if he/she has at least two ADL components increased by one point or at least one ADL component increased by two points. We defined a binary variable $y_{ij}$ for resident $i$ in facility $j$ that equaled 1 if the resident had functional decline in the first quarter of 2005 compared with the fourth quarter of 2004 (based on the quarterly assessment or nearest full assessment), and 0 otherwise. CMS exclusion criteria were then applied according to both resident and facility characteristics (Appendix SA2).
CMS did not use stratification or regression adjustment in this QM.
We calculated the CMS QM rate as the percent of residents who had functional decline for each eligible facility, that is, long-term care facility with >30 residents at risk of functional decline (denominator). We further calculated the 95 percent confidence interval (CI) of this QM using normal approximation:

$$O_j \pm 1.96 \times \sqrt{\frac{O_j(1 - O_j)}{n_j}} \qquad (1)$$

where $O_j$ is the CMS-unadjusted rate and $n_j$ is the number of at-risk residents in facility $j$.
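A minimal Python sketch of this normal-approximation interval follows. The function name `unadjusted_ci` and the example inputs (a rate of 0.1796 in a facility with 100 at-risk residents) are hypothetical.

```python
import math

def unadjusted_ci(observed_rate, n_at_risk, z=1.96):
    """95 percent CI of an unadjusted rate by normal approximation."""
    half_width = z * math.sqrt(observed_rate * (1 - observed_rate) / n_at_risk)
    return observed_rate - half_width, observed_rate + half_width

# Illustrative facility: observed rate 17.96 percent, 100 residents at risk.
lower, upper = unadjusted_ci(0.1796, 100)
```

The interval narrows as the number of at-risk residents grows, which is one reason small facilities are harder to classify as outliers.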
Identifying and Estimating the Effect of Risk Factors
quarter of 2004 or nearest full assessment) according to previous literature and
recommendations by an experienced geriatrician familiar with long-term care.
We then estimated their effect using classical logistic regression models in both
bivariate and multivariate analyses, and retained only variables that were sig-
nificant at .001 level in the final model. In addition, we used multivariate frac-
tional polynomials (Royston and Sauerbrei 2003) to determine the optimal
transformations of continuous covariates (i.e., length between prior and target
the same data as used for calculating the CMS QM:
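$$\ln\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta_1 x_{ij1} + \beta_2 x_{ij2} + \cdots + \beta_k x_{ijk} \qquad (2)$$

where $p_{ij}$ is the probability of functional decline for resident $i$ in facility $j$, $x_{ij1}, \ldots, x_{ijk}$ are the best set of risk factors, and $\beta_k$ are model parameters.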
Calculating Risk-Adjusted QMs Alternatively
described below and summarized in Table 1.
Method 1. First, each resident’s probability of experiencing functional decline ($\hat{p}_{ij1}$) was predicted by the classical logistic model (equation (2)). Expected outcome rate for each facility ($E_j^1$) was then calculated as the sum of $\hat{p}_{ij1}$ for residents in facility $j$ divided by $n_j$. To calculate the risk-adjusted QM ($QM_j^1$), we first calculated

$$\mathrm{logit}(QM_j^1) = \ln\left(\frac{O_j}{1 - O_j}\right) - \ln\left(\frac{E_j^1}{1 - E_j^1}\right) + \ln\left(\frac{17.96\ \text{percent}}{100\ \text{percent} - 17.96\ \text{percent}}\right) \qquad (3)$$

where 17.96 percent is the overall rate of functional decline for all residents in the sample, and then back-transformed $\mathrm{logit}(QM_j^1)$ to the probability scale:

$$QM_j^1 = \left[1 + e^{-\mathrm{logit}(QM_j^1)}\right]^{-1} \qquad (4)$$

We used equations (3) and (4) to calculate $QM_j^1$ because a prior study (Li et al.
and better identifies outlier facilities than other measures such as those based
Table 1: Description of the Nursing Home Quality Measure of Decline in ADL Functioning, by Risk-Adjustment Methods
Type of Risk-Adjustment Method
No. of Nursing
0. CMS unadjusted
1. Classical logistic regression based
2. Fixed-effects logistic regression based
4. Random-effects logistic regression
and shrinkage estimator based
17.96 16.44 11.11–23.40
18.36 16.86 11.50–24.13
19.04 17.49 11.83–25.13
19.15 17.59 13.25–23.81 2.94–67.52
Finally, to calculate the 95 percent CI of $QM_j^1$, we first calculated the 95 percent CI of $O_j$ as

$$O_j \pm 1.96 \times \frac{\sqrt{\sum_{i=1}^{n_j} \hat{p}_{ij1}(1 - \hat{p}_{ij1})}}{n_j} \qquad (5)$$

which was developed by Hosmer and Lemeshow (1995) based on normal approximation of the binomial distribution. We then used the upper and lower bounds of the CI obtained above to calculate the 95 percent CI of $QM_j^1$ by repeating the calculations in equations (3) and (4) (Li et al. 2007).
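The calculation in Method 1 can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, the variance term is assumed to be the sum of $\hat{p}(1 - \hat{p})$ over residents divided by $n_j^2$, and the clipping of CI bounds away from 0 and 1 is an added assumption to keep the logit defined.

```python
import math

OVERALL_RATE = 0.1796  # overall rate of functional decline reported in the text

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def risk_adjusted_qm(observed, expected, overall=OVERALL_RATE):
    """logit(QM) = logit(O) - logit(E) + logit(overall), back-transformed."""
    return inv_logit(logit(observed) - logit(expected) + logit(overall))

def _clip(p, eps=1e-6):
    # Assumption (not in the source): keep bounds strictly inside (0, 1).
    return min(max(p, eps), 1 - eps)

def risk_adjusted_ci(observed, predicted_risks, overall=OVERALL_RATE, z=1.96):
    """CI of the observed rate from predicted risks, then re-expressed as QM."""
    n = len(predicted_risks)
    expected = sum(predicted_risks) / n
    half_width = z * math.sqrt(sum(p * (1 - p) for p in predicted_risks)) / n
    lower = _clip(observed - half_width)
    upper = _clip(observed + half_width)
    return (risk_adjusted_qm(lower, expected, overall),
            risk_adjusted_qm(upper, expected, overall))

# Illustrative facility: 50 residents, each with predicted risk 0.20,
# and an observed decline rate of 0.30.
qm = risk_adjusted_qm(0.30, 0.20)
qm_lower, qm_upper = risk_adjusted_ci(0.30, [0.20] * 50)
```

A facility whose observed rate equals its expected rate receives a QM equal to the overall rate, which makes the overall rate a natural reference point for identifying outliers.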
Method 2. We first estimated a fixed-effects logistic model

$$\ln\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta_1 x_{ij1} + \beta_2 x_{ij2} + \cdots + \beta_k x_{ijk} + u_{0j} \qquad (6)$$

that incorporated the same set of x-variables as equation (2). This model was
NH fixed-effects $u_{0j}$ captures the effect of unmeasured facility characteristics after controlling for resident risks (Greene 2001). We then used this model to predict each resident’s probability of functional decline ($\hat{p}_{ij2}$) assuming null effect of $u_{0j}$ (i.e., $\hat{p}_{ij2}$ only incorporates resident risks but no facility effects), and calculated the expected rate for each facility ($E_j^2$) as the sum of $\hat{p}_{ij2}$ for residents in facility $j$ divided by $n_j$. Finally, $E_j^2$ and $\hat{p}_{ij2}$ were used to calculate the risk-adjusted QM ($QM_j^2$), and its 95 percent CI by repeating the calculations in equations (3)–(5) (replacing $E_j^1$ and $\hat{p}_{ij1}$ with $E_j^2$ and $\hat{p}_{ij2}$, respectively).
Method 3. We assumed that the facility effect $u_{0j}$ in the model of Method 2 was normally distributed with mean zero and variance $\sigma_u^2$, and estimated it as random-effects using SAS (SAS Corp., Cary, NC) Proc Glimmix (Littell et al. 2006). In a similar process, we predicted the resident’s probability of functional decline ($\hat{p}_{ij3}$) assuming null effect of $u_{0j}$, and calculated the expected rate for each facility ($E_j^3$) as the sum of $\hat{p}_{ij3}$ for residents in facility $j$ divided by $n_j$. Finally, $E_j^3$ and $\hat{p}_{ij3}$ were used to calculate the risk-adjusted $QM_j^3$, and its 95 percent CI according to equations (3)–(5).
Method 4. Because the random-effects shrinkage estimator $u_{0j}$ represents facility variations in outcome after adjusting for resident characteristics, we used this estimator to derive the risk-adjusted QM ($QM_j^4$) directly. We first calculated $\mathrm{logit}(QM_j^4) = u_{0j} + \mathrm{logit}(17.96\ \text{percent})$ (note that $u_{0j}$ is on the logit scale) and then back-transformed $\mathrm{logit}(QM_j^4)$ to the probability scale. The 95 percent CI of $QM_j^4$ was similarly calculated using the estimated 95 percent CI of $u_{0j}$.
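As a small illustration of Method 4, the sketch below converts a facility's estimated random effect (on the logit scale) into a rate. The function names and example values are hypothetical; the random effect and its CI are assumed to come from a fitted random-effects model.

```python
import math

OVERALL_RATE = 0.1796  # overall rate of functional decline reported in the text

def shrinkage_qm(u0j, overall=OVERALL_RATE):
    """QM on the probability scale from a logit-scale facility effect."""
    logit_qm = u0j + math.log(overall / (1 - overall))
    return 1 / (1 + math.exp(-logit_qm))

def shrinkage_qm_ci(u_lower, u_upper, overall=OVERALL_RATE):
    """Transform the CI bounds of u0j the same way as the point estimate."""
    return shrinkage_qm(u_lower, overall), shrinkage_qm(u_upper, overall)

# Illustrative facility effect of 0.20 with CI (-0.10, 0.50) on the logit scale.
qm4 = shrinkage_qm(0.20)
qm4_lower, qm4_upper = shrinkage_qm_ci(-0.10, 0.50)
```

A facility effect of zero maps back to the overall rate, so positive effects indicate worse-than-average outcomes and negative effects better-than-average outcomes.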
Comparing Statistical Models and QMs
comparison of alternative methods at the regression model (or resident) level
and at the QM (or facility) level. These different perspectives of validity
have been described previously (Mukamel 1997; Iezzoni 2003) and their
operational definitions are:
Validity criteria at the regression model level:
• Face validity: The model accommodates variables that on face value are important clinical risk factors.
affecting outcome, i.e., complete risks, facility effects, and chance
• Construct validity: The effects of risk factors on outcome are estimated in the expected direction.
agreement when estimated by alternative models.
• Predictive validity: The model predicts actual outcome well.
Validity criteria at the QM level:
• Criterion validity: The QM reflects true quality of care.
• Convergent validity: Facility rankings or identity of outliers based on QMs derived from alternative methods show close agreement.
The predictive validity of the classical, fixed-effects, and random-effects
models was evaluated by the c-statistic, which equals the area under the
receiver operating characteristic curve (Hanley and McNeil 1982). The
c-statistic examines how well the model discriminates residents with and without functional decline by assigning a higher predicted probability to those with functional decline. The c-statistic ranges from 0.5 (random discrimination, no better than a flip of a coin) to 1.0 (perfect discrimination).
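For readers who want to see the definition in operation, here is a minimal pairwise computation of the c-statistic. The function name `c_statistic` and the toy data are hypothetical, and the quadratic loop is for clarity only; it would be too slow for a dataset of this study's size.

```python
def c_statistic(outcomes, predicted):
    """Share of (decline, no-decline) pairs ranked correctly; ties count 0.5."""
    with_decline = [p for y, p in zip(outcomes, predicted) if y == 1]
    without_decline = [p for y, p in zip(outcomes, predicted) if y == 0]
    concordant = 0.0
    for p_pos in with_decline:
        for p_neg in without_decline:
            if p_pos > p_neg:
                concordant += 1.0
            elif p_pos == p_neg:
                concordant += 0.5
    return concordant / (len(with_decline) * len(without_decline))

# Toy example: three of the four (decline, no-decline) pairs are ranked correctly.
c = c_statistic([1, 1, 0, 0], [0.8, 0.3, 0.4, 0.1])
```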
At the QM level, we defined high- and low-quality facilities by either the ranking or the 95 percent CI of the QM derived from each method. First,
facilities in the lowest 10 percent and highest 10 percent rankings were iden-
tified as ‘‘best 10 percent’’ and ‘‘worst 10 percent’’ facilities, respectively
(because the QM represents an adverse outcome, lower QM rate indicates
better quality). We also performed sensitivity analyses where we used alter-
native cutoffs (5 and 25 percent) to define best and worst facilities. Second,
facilities were identified as high-quality outliers if the 95 percent CIs of their
QM rates were below the overall rate 17.96 percent, and low-quality outliers
if the 95 percent CIs were above 17.96 percent.
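The two classification rules can be sketched as follows. The function names are hypothetical, the labels (1 for best or high-quality outlier, 0 for medium, −1 for worst or low-quality outlier) follow the event scale used in this study, and the handling of tied rates (arbitrary sort order) is an assumption.

```python
OVERALL_RATE = 0.1796  # overall rate of functional decline reported in the text

def classify_by_rank(qm_by_facility, cutoff=0.10):
    """Label the lowest `cutoff` share as best (1) and the highest as worst (-1)."""
    ranked = sorted(qm_by_facility, key=qm_by_facility.get)  # low rate = better
    k = int(len(ranked) * cutoff)
    labels = {facility: 0 for facility in ranked}
    for facility in ranked[:k]:
        labels[facility] = 1
    for facility in ranked[len(ranked) - k:]:
        labels[facility] = -1
    return labels

def classify_by_ci(ci_lower, ci_upper, overall=OVERALL_RATE):
    """Outlier status from whether the whole CI lies below or above the overall rate."""
    if ci_upper < overall:
        return 1
    if ci_lower > overall:
        return -1
    return 0

# Toy example with 20 hypothetical facilities and distinct rates.
toy_qms = {f"F{i}": i / 100 for i in range(20)}
toy_labels = classify_by_rank(toy_qms)
```

Changing `cutoff` to 0.05 or 0.25 reproduces the alternative definitions used in the sensitivity analyses.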
To quantify the convergent validity of alternatively calculated QMs, we calculated the κ statistic (Landis and Koch 1977) (1) between the CMS-unadjusted QM and each risk-adjusted QM, and (2) between each pair of risk-adjusted QMs, in identifying best and worst facilities or outliers. The κ measures the level of agreement between two raters evaluating an event on a categorical scale (Landis and Koch 1977). In this study, we defined the event scale as 1 = best facilities (or high-quality outlier), 0 = medium facilities (or nonquality outlier), and −1 = worst facilities (or low-quality outlier). The κ ranges between 0 and 1, with 0 indicating no agreement beyond chance, and 1 indicating total agreement.
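A minimal sketch of the κ calculation on the three-level event scale is given below. It assumes the unweighted form of κ (the text does not say whether a weighted version was used), and the function name `cohen_kappa` and the toy labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Unweighted kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    # Note: undefined (division by zero) if both methods put every
    # facility in the same single category.
    return (observed - chance) / (1 - chance)

# Toy example: two methods classify four facilities, agreeing on three.
method_a = [1, 0, 0, -1]
method_b = [1, 0, -1, -1]
kappa = cohen_kappa(method_a, method_b)
```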
In comparing the CMS-unadjusted QM and each risk-adjusted QM, we
further calculated (1) the false-positive rate for medium-quality facilities
(i.e., those not identified as high- or low-quality facilities by each risk-adjusted
QM were identified so by the unadjusted QM), and (2) the false-negative rate
for high- and low-quality facilities separately (i.e., those identified as high- or
low-quality facilities by each risk-adjusted QM were not identified so by the
CMS-unadjusted QM) (Glance et al. 2006b).1
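These error rates can be computed directly from the two sets of labels, treating the risk-adjusted classification as the reference. The sketch below is illustrative; the function name `error_rates` and the toy labels are hypothetical.

```python
def error_rates(unadjusted, adjusted):
    """False-positive and false-negative rates of the unadjusted labels.

    Labels: 1 = best (or high-quality outlier), 0 = medium,
    -1 = worst (or low-quality outlier). The adjusted labels are
    treated as the reference standard.
    """
    def share_mismatched(reference_label):
        idx = [i for i, g in enumerate(adjusted) if g == reference_label]
        return sum(unadjusted[i] != reference_label for i in idx) / len(idx)

    return {
        "false_positive_medium": share_mismatched(0),
        "false_negative_best": share_mismatched(1),
        "false_negative_worst": share_mismatched(-1),
    }

# Toy example with eight facilities.
adjusted_labels = [1, 1, 0, 0, 0, 0, -1, -1]
unadjusted_labels = [1, 0, 0, 0, 0, 1, -1, 0]
toy_rates = error_rates(unadjusted_labels, adjusted_labels)
```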
This study included 605,433 long-term care residents in 9,336 facilities
(Table 1). The overall unadjusted QM rate was 17.96 percent, and it varied
percent). The residents were on average 81 years old, and had different levels
of ADL impairment at prior assessment (Table 2).
In general, the classical, fixed-effects, and random-effects models had slightly
different estimates (odds ratios) for individual risk factors (Table 2). The
c-statistic was 0.68 for the classical model, which is comparable with results
Table 2: Resident Characteristics (n = 605,433) and Estimates in Risk-Adjustment Models
Prevalence (%) or Mean ± SD
Odds Ratio Estimated by
Length (in weeks) between prior (or baseline) and target assessments
length × ln(length)
Age in years†
ADL performance: bed mobility
Total dependence or no activity
ADL performance: transfer
Total dependence or no activity
ADL performance: eating
Total dependence or no activity
ADL performance: toilet use
Total dependence or no activity
Short-term memory problem
Cognitive skills for daily decision making
Independent or modified
Rarely understand others or make
Behavior problems in wandering
12.37 ± 1.65 0.075
1.187 80.89 ± 12.94
in previous studies (Mukamel 1997; Rosen et al. 2001), 0.76 for the fixed-
effects model, and 0.75 for the random-effects model.
Agreement in NH Classifications
The average risk-adjusted rates ranged between 16.35 percent (fixed-effects
estimate) and 19.15 percent (shrinkage estimator, Table 1). The rate based on
shrinkage estimator exhibited the least variation across facilities, ranging from
2.94 to 67.52 percent.
Table 3 shows that the κ between CMS-unadjusted QM and each risk-adjusted QM ranged between 0.70 and 0.80 in ranking and classifying facilities. Compared with each adjusted QM, the unadjusted QM had a false-negative rate of over 0.20 in identifying the worst 10 percent facilities, and a false-negative rate between 0.14 and 0.26 in identifying the best 10 percent facilities (the false-positive rate of the unadjusted QM was <0.10). Table 4 shows that the pair-wise agreement between each risk-adjusted QM was very high (κ > 0.80) in rankings. However, the shrinkage estimator deviated nonrandomly from the other estimates in identifying the best 10 percent facilities, the percent of differentially identified facilities being 18 percent [172/(172 + 761)], 17 percent [154/(154 + 779)], and 17 percent [155/(155 + 778)], respectively (see highlighted cells in Table 4).
We varied the cutoff percentiles in classifying facilities and found similar
results to those in Tables 3 and 4.
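The bracketed percentages for the best 10 percent facilities can be checked with simple arithmetic, reading each expression as a/(a + b). Interpreting a as the number of differentially identified facilities and b as the number identified by both methods is an assumption based on the surrounding text; the function name is hypothetical.

```python
def percent_differential(differently_identified, commonly_identified):
    """Share of facilities identified differently, in percent."""
    total = differently_identified + commonly_identified
    return 100 * differently_identified / total

# Counts quoted in the text for the three comparisons with the shrinkage estimator.
pairs = [(172, 761), (154, 779), (155, 778)]
percents = [round(percent_differential(a, b)) for a, b in pairs]
print(percents)  # prints [18, 17, 17]
```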
In identifying ‘‘outliers’’ using the 95 percent CI of each QM, the overall κ ranged between 0.59 and 0.76 between CMS-unadjusted QM and each risk-adjusted QM (Table 5). Compared with each adjusted QM, the false-positive rate of the unadjusted QM was between 0.10 and 0.17, the
Table 2 (continued)
Urinary tract infection
$\sigma_u$ (standard error): 0.362 (0.007)
*p value >.01, all other p values <.001.
†In the risk-adjustment models, age was transformed to 0 if age < 65, and ln(age − 64) if age ≥ 65.
false-negative rate was between 0.25 and 0.48 for identifying low-quality outliers, and was minimal (<0.01) for identifying high-quality outliers.
Table 6 shows that the overall pair-wise agreement between risk-adjusted QMs was high (κ > 0.70) in identifying outliers. However, the shrinkage estimator deviated nonrandomly from other estimates in identifying high-quality outliers, the percent of differentially classified facilities being 37 percent [476/(476 + 817)], 49 percent [793/(793 + 841)], and 35 percent [448/(448 + 821)], respectively. In addition, there are cases that a pair of other
outliers (highlighted cells in Table 6).
Although current NH QMs are not perfect (Mor et al. 2003), they will likely
play an increasingly important role in market-driven quality improvement
Table 3: Agreement in Nursing Home Quality Rankings: CMS-Unadjusted Measure Compared with Risk-Adjusted Measures
Rankings Based on
Rankings Based on Risk Adjustment
0.22 0.14 0.24 0.14 0.23 0.15 0.23 0.26
Nursing homes in the worst 10% group are those whose point estimates of the quality measures of decline in ADL functioning are among the 10% highest of the rates of all nursing homes (n = 9,336). Nursing homes in the medium 80% group are those whose point estimates of the quality measures of decline in ADL functioning are between the 10% highest and 10% lowest of the rates of all nursing homes (n = 9,336). Nursing homes in the best 10% group are those whose point estimates of the quality measures of decline in ADL functioning are among the 10% lowest of the rates of all nursing homes (n = 9,336).
*Based on classical logistic regression model.
†Based on fixed-effects model.
‡Based on random-effects model.
§Based on the shrinkage estimators of the random-effects model.
**Each risk-adjusted measure was treated as ‘‘gold standard’’ when calculating the false-positive (for the medium 80% group) and false-negative (for the worst or best 10% group) rates.