
RESEARCH ARTICLE

National Release of the Nursing Home

Quality Report Cards: Implications of

Statistical Methodology for Risk

Adjustment

Yue Li, Xueya Cai, Laurent G. Glance, William D. Spector, and

Dana B. Mukamel

Objective. To determine how alternative statistical risk-adjustment methods may

affect the quality measures (QMs) in nursing home (NH) report cards.

Data Sources/Study Settings. Secondary data from the national Minimum Data

Set files of 2004 and 2005 that include 605,433 long-term residents in 9,336 facilities.

Study Design. We estimated risk-adjusted QMs of decline in activities of daily living

(ADL) functioning using classical, fixed-effects, and random-effects logistic models.

Risk-adjusted QMs were compared with each other, and with the published QM

(unadjusted) in identifying high- and low-quality facilities by either the rankings or

95 percent confidence intervals of QMs.

Principal Findings. Risk-adjusted QMs showed better overall agreement (or con-

vergent validity) with each other than did the unadjusted versus each adjusted QM; the

disagreement rate between unadjusted and adjusted QM can be as high as 48 percent.

The risk-adjusted QM derived from the random-effects shrinkage estimator deviated

nonrandomly from other risk-adjusted estimates in identifying the best 10 percent

facilities using rankings.

Conclusions. The extensively risk-adjusted QMs of ADL decline, even when esti-

mated by alternative statistical methods, show higher convergent validity and provide

more robust NH comparisons than the unadjusted QM. Outcome rankings based on

ADL decline tend to show lower convergent validity when estimated by the shrinkage

estimator rather than other statistical methods.

Key Words. Nursing home, quality report cards, activities of daily living, risk

adjustment, MDS

The quality of long-term care received by nursing home (NH) residents

remains a persistent concern for consumers, their families and policy makers

(Vladeck 1980; Institute of Medicine 1986; Capitman and Bishop 2004). Since

the 1987 Nursing Home Reform Act, continued efforts have been made to

© Health Research and Educational Trust

DOI: 10.1111/j.1475-6773.2008.00910.x



establish a national system for assessing, monitoring, and publicly reporting

NH quality (Morris et al. 1990; Zimmerman et al. 1995; General Accounting

Office 2002; Mor 2004). In November 2002, as part of its Nursing Home

Quality Initiative, the Centers for Medicare and Medicaid Services (CMS)

launched a national report card with NH quality measures (QMs), the ‘‘Nursing

Home Compare’’ website, that publishes and regularly updates a set of key

outcome-based measures derived from the Minimum Data Set (MDS) (Gen-

eral Accounting Office 2002; Arling et al. 2007; Mukamel et al. 2008).

Making the facility performance data available to the public is expected

to empower consumers to compare and choose NH services based on quality,

and to stimulate quality improvement through market competition. Given its

potential impact (Chernew and Scanlon 1998; Mukamel et al. 2004, 2007), it

is critical that the QMs accurately differentiate homes with good quality from

those with poor quality.

Because health outcomes are determined by both care quality and res-

ident frailties and comorbid conditions, it is imperative to adjust for case mix

variations among facilities before their outcomes are compared (Iezzoni 2003).

Failure to do so may introduce a bias where facilities treating the sickest

residents may have worse outcomes even when they provide the best of care.

Many quality report cards for hospitals and physicians recognize this issue and

provide risk-adjusted outcome rates. However, several studies have noted that

the online NH QMs take only minimal steps to adjust for resident characteristics

(General Accounting Office 2002; Arling et al. 2007; Mukamel et al. 2008), and

may not sufficiently ‘‘level the playing field’’ for NH comparisons. These studies

have advocated using more extensive, statistical risk adjustment in these QMs.

Despite the essential role of risk adjustment in making fairer outcome

comparisons, however, risk adjustment may introduce uncertainty (Iezzoni

1997) when alternative statistical methodologies do not agree on the identity

of high- and low-quality providers (DeLong et al. 1997; Hannan et al. 1997;

Iezzoni 1997; Shahian et al. 2001; Glance et al. 2006a; Li et al. 2007). A

growing literature on this issue has focused on the use of appropriate severity

Address correspondence to Yue Li, Ph.D., Department of Medicine, University of California,

Irvine, CA 92697; e-mail: ylill@uci.edu. Xueya Cai, M.A., is with the Division of Biostatistics,

Indiana University School of Medicine, Indiana University Purdue University Indianapolis,

Indianapolis, IN. Laurent G. Glance, M.D., is with the Department of Anesthesiology, The

University of Rochester School of Medicine and Dentistry, Rochester, NY. William D. Spector, Ph.D.,

is with the Center for Delivery, Organization, and Markets, Agency for Healthcare Research and

Quality, Rockville, MD. Dana B. Mukamel, Ph.D., is with the Center for Health Policy Research,

University of California, Irvine, CA.

HSR: Health Services Research 44:1 (February 2009)


measures for risk adjustment (Hannan et al. 1997; Iezzoni 1997; Shahian et al.

2001). More recently, analysts also examined the choice among statistical

models, such as logistic or multilevel (random-effects) regression models,

in computing and comparing risk-adjusted rates. Their findings suggest that

alternative statistical methods may estimate outcomes differently (DeLong et

al. 1997; Shahian et al. 2001; Glance et al. 2006a; Li et al. 2007).

This study was designed to explore the implications of alternative

statistical methods (the classical, fixed-effects, and random-effects logistic

models) in constructing and interpreting the national NH QMs. Focusing on

1 of the 19 outcomes currently published (Mukamel et al. 2008), we first

developed extensively risk-adjusted measures using a common set of MDS

risk factors but different modeling approaches. We then compared the current

CMS QM (unadjusted) and these risk-adjusted measures in identifying out-

standing or poor-performing facilities. The outcome examined was decline

in activities of daily living (ADLs) for long-term care residents. We chose

this outcome because physical function (as measured by ADLs) is central

to the well-being of NH residents (Institute of Medicine 1986). Furthermore,

it has been shown to be amenable to appropriate interventions (Granger et al.

1990; Spector and Takada 1991; Kane et al. 1996) and been used in various

studies of NH quality (Mukamel 1997; Mukamel and Brower 1998; Rosen

et al. 2000, 2001).

BACKGROUND

The MDS and NH QMs

In 1986, the Institute of Medicine’s Committee on Nursing Home Regulation

reported widespread quality deficiencies across the nation (Institute of

Medicine 1986), and recommended strengthened NH regulations, revisions

of oversight and enforcement mechanisms, and changes in quality assessment

toward a more resident-centered and health outcome-oriented approach.

Based on these recommendations, the Omnibus Budget Reconciliation Act

of 1987 and subsequent legislations established new standards of NH care

‘‘to attain or maintain the highest practicable physical, mental, and psycho-

social well-being’’ (Capitman and Bishop 2004). As a part of these efforts, the

Health Care Financing Administration (now CMS) mandated the implementation

of a standardized, comprehensive Resident Assessment Instrument (RAI)

for health assessment and care planning (Fries et al. 1997). A key component

of the RAI is the MDS, a structured assessment tool for periodic collection

Nursing Home Quality Report Cards and Risk Adjustment Methods


of multiple domains of resident information, including physical function, cog-

nition, emotion, behavior, nutrition, diagnoses, procedures, and treatments

received (Morris et al. 1990; CMS 2002).

By virtue of their longitudinal nature, the MDS records can be used to

document changes in resident conditions, such as functional decline or

development of pressure ulcers, which can then be translated into meaningful

quality-of-care indices (Zimmerman et al. 1995; Mukamel 1997; Rosen et al.

2001; Mor 2004). In a multistate demonstration sponsored by CMS, Zimmer-

man et al. (1995) developed a set of MDS-based clinical quality indicators

(QIs). In April 2002, CMS began its pilot publication of a set of NH QMs in six

states, and soon expanded it to national public reporting in November 2002.

These QMs were partly selected from the QIs developed by Zimmerman et al.

(1995) and partly from new development (Manard 2002). Currently, there are

19 QMs (14 for long-stay residents, and five for postacute care patients) that

are published, with periodic updates, on the CMS-maintained ‘‘Nursing

Home Compare’’ website (www.medicare.gov/NHCompare).

Issues of Inadequate Risk Adjustment

The CMS QMs incorporate several mechanisms to account for resident

characteristics. First, exclusions are used to create a relatively homogenous

resident cohort on whom to calculate each QM. For example, the sample used

for calculating the measure of ADL decline excludes those who were at the highest

level of physical dependence at ‘‘baseline’’ and thus would not deteriorate

further (see Appendix SA2). Second, stratification between high- and low-risk

residents is used for the measure of pressure sores, i.e., facility rates are reported

for predefined high- and low-risk residents separately. Finally, classical logistic

regression is used for five QMs, each adjusting for a limited number (1–3) of

risk variables. A detailed description of the CMS approach can be found

elsewhere (Mukamel et al. 2008).

Despite these efforts to make NH comparisons fairer, it is possible

that QMs with limited risk adjustment may not accurately identify poorly

performing facilities. Because a broad array of resident characteristics can

affect outcomes and these characteristics may not be randomly distributed

over facilities, ignoring the effect of these risk factors (i.e., those not adjusted

for in the CMS QMs) may bias quality estimation (Localio et al. 1997). For

example, Mukamel et al. (2008) examined several CMS QMs, and found that

QMs with additional adjustment for MDS risk factors resulted in different

facility rankings than the rankings based on the corresponding CMS QMs.


Two other studies expressed similar concerns about the potentially insufficient

risk adjustment in CMS QMs (General Accounting Office 2002; Arling

et al. 2007).

CMS and its contracting researchers have recognized this issue and

suggested that adjusting for the type of residents in facilities requires further

research that should include (1) research regarding the selection of appropriate

risk factors; (2) comparisons of different risk-adjustment methodologies, as

applied to each QM; and (3) validation of different risk-adjustment methods

(General Accounting Office 2002). This study extends previous research

(Arling et al. 2007; Mukamel et al. 2008), which demonstrated a more

appropriate choice of risk factors, by comparing and validating

alternative statistical methods for risk adjustment.

Regression-Based Risk Adjustment

As can be seen in many acute care report cards, multivariate statistical regres-

sion is commonly used for risk adjustment (Iezzoni 2003). Compared with the

CMS methods such as risk stratification or exclusion, the regression-based

approach is more flexible in that it can account for a large number of patient

characteristics affecting outcomes (Mukamel et al. 2008). Although the regres-

sion-based method may be technically less straightforward, its basic analytical

procedure is easy to follow: first, the regression model is estimated to predict the

expected outcome (e.g., probability of functional decline) of each patient based

solely on risk factors; second, the expected outcome of each facility can be

computed as the summed risk of all patients in the facility divided by the total

number of patients in it; finally, the risk-adjusted QM can be constructed based

on the comparison of the facility’s average observed outcome with expected

outcome. Details of this observed-to-expected outcome comparison are

presented in a recent study of hospital report cards (Li et al. 2007).
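The first two steps of the procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(facility, declined, predicted_risk)`, the function name `observed_and_expected`, and the toy values are hypothetical and do not come from the study; the predicted risks are assumed to have been produced already by a fitted risk-adjustment model.

```python
from collections import defaultdict

def observed_and_expected(records):
    """Return {facility: (observed rate, expected rate)}.

    records: iterable of (facility, declined, predicted_risk), where
    declined is 0/1 and predicted_risk comes from a risk model that
    uses resident risk factors only (hypothetical layout).
    """
    counts = defaultdict(int)
    declines = defaultdict(float)
    risks = defaultdict(float)
    for facility, declined, predicted_risk in records:
        counts[facility] += 1
        declines[facility] += declined
        risks[facility] += predicted_risk
    # Expected rate = summed predicted risk divided by number of residents.
    return {f: (declines[f] / counts[f], risks[f] / counts[f]) for f in counts}

# Toy data (made up for illustration only).
records = [("A", 1, 0.30), ("A", 0, 0.10), ("B", 0, 0.20), ("B", 0, 0.40)]
rates = observed_and_expected(records)
for facility, (observed, expected) in sorted(rates.items()):
    print(facility, round(observed, 3), round(expected, 3))
```

In this toy example facility A has an observed rate above its expected rate, while facility B has no observed declines despite a higher expected rate; the third step compares the two quantities to form the risk-adjusted measure.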

Despite the flexibility of this approach in modeling risks, a precaution

of performing such risk-adjusted analysis is to avoid ‘‘over-adjustment’’

(Mukamel et al. 2008). For example, weight loss may be a risk factor for

functional decline (the ‘‘outcome’’) for NH residents, whereas it itself is an outcome

of NH care. Including weight loss in the regression of functional decline may

be necessary to avoid biased outcome comparison, but could overstate a

facility’s performance, when solely judged by the QM of ADL decline, if the

facility provides poor care for weight loss. However, CMS’s publication

of multidimensional QMs tempers this issue largely (Mukamel et al. 2008)


because both outcomes are published and can be used to rank facilities

explicitly. We will return to this issue later in ‘‘Discussion.’’

Potential Impact of Alternative Statistical Models

Regression-based risk adjustment is expected to balance the effect of patients’

preexisting risks (e.g., baseline physical function) on health outcomes (e.g.,

future decline in ADLs), leaving residual outcome differences across facilities

to reflect quality (Iezzoni 1997). The choice of statistical methodology, how-

ever, may have impacts on quality rankings. Although this issue has been

examined in acute care outcomes (DeLong et al. 1997; Shahian et al. 2001;

Glance et al. 2006a; Li et al. 2007), no study has explored the impact of

alternative statistical methods on constructing the NH QMs. Current acute

care report cards are frequently developed using classical regression models

(Li et al. 2007). Although widely used, the classical regression may not be

appropriate for data in which patients are ‘‘clustered’’ within facilities (DeLong

et al. 1997). A basic assumption of classical regression is that all patients are

independent observations in the dataset (Hosmer and Lemeshow 2000).

However, outcomes data almost always violate this assumption because

patients in the same facility tend to show similar (or correlated) health out-

comes when receiving similar patterns of care in the facility. The assumption

of independent outcomes in classical regression contradicts the spirit of

outcomes comparisons and reports, which are grounded on the belief that a

common factor, ‘‘quality,’’ determines the health outcomes of patients in the

same facility, and varies across facilities. Ignoring the ‘‘clustering’’ of patients

due to ‘‘quality’’ or other factors may invalidate the empirical risk-adjustment

model and lead to incorrect quality estimates (DeLong et al. 1997).

Two alternative approaches, fixed-effects and random-effects models,

estimate quality explicitly in the regression models and assume independent

outcomes among facilities but correlated outcomes within a facility (Greene

2001). Thus, although with their own limitations, the two approaches may be

better suited for outcome comparisons.

In addition, the fixed-effects modeling is effective in dealing with the

situation where facility quality correlates with patient characteristics (Greene

2001). The correlation may be caused by selective referrals where physicians

refer sicker patients to better-performing hospitals or discharge patients with

functional disabilities to NHs with better rehabilitative services. In such cases,

the fixed-effects modeling can produce unbiased and consistent parameter

estimation (Hsiao 2003). However, one disadvantage of this approach is that


for facilities with small numbers of patients, the point estimate may be ac-

companied by a large variance and be unreliable. The fixed-effects approach

has been implemented by the Agency for Healthcare Research and Quality’s

evidence-based hospital QIs (Agency for Healthcare Research and Quality

2002), and by a national report of Consumer Survey of Health Plans

(CAHPS) (Elliott et al. 2001).

The random-effects model assumes that facility quality arises from a

distribution of population quality, such as a normal distribution (Brown and Prescott

1999). In such a model, the estimated quality is shrunken toward the overall

performance estimate, or less widely spread, compared with its fixed-effects

counterpart. The shrinkage is inversely proportional to the precision of the facility

estimate, which helps avoid extreme estimates for facilities with small numbers

of patients (Goldstein and Spiegelhalter 1996). Consequently, the shrinkage

estimates for small facilities are generally more conservative, but can cause bias

in the point estimates of quality and thus erroneous quality rankings. Another

drawback of the random-effects model, particularly in constructing risk-adjusted

outcomes, is that the random estimates are assumed uncorrelated with patient

characteristics (DeLong et al. 1997). This assumption may be violated when

selective referrals based on performance exist. In such instances, the random-effects

model will result in biased estimates (Brown and Prescott 1999).

Studies comparing these alternative models have resulted in inconclusive

findings, with some reporting substantially changed facility profiling when

different approaches are used (Greenfield et al. 2002; Huang et al. 2005), but

others reporting relatively minor impacts on outcomes inferences (DeLong

et al. 1997; Hannan et al. 2005; Glance et al. 2006a). Therefore, the choice

of statistical methodology is likely an empirical question that depends on

individual outcomes and populations.

METHODS

Data and Measures

We used the 2004 and 2005 national MDS datasets for all long-term care

residents in facilities certified by the Medicare or Medicaid program. The

MDS assessments are performed for each resident upon admission, quarterly

thereafter, and whenever a significant change of health status occurs. Evidence

suggests that MDS records meet acceptable standards of accuracy and reli-

ability for research purposes (Hawes et al. 1995; Lawton et al. 1998; Mor et al.

2003). The MDS ADL score quantifies a resident’s functioning in the last 7


days of assessment in bed mobility, transferring, eating, and toilet use. Each of

the four components is scored on a five-point scale, with 0 standing for the highest

level of independence and 4 indicating total dependence (CMS 2002).

Defining the CMS QM

According to CMS’s definition (Abt Associates Inc. 2004), a resident suffers

functional decline between two adjacent quarters if he/she has at least two ADL

components increased by one point or at least one ADL component increased

by two points. We defined a binary variable $y_{ij}$ for resident i in facility j that

equaled 1 if the resident had functional decline in the first quarter of 2005

compared with the fourth quarter of 2004 (based on the quarterly assessment or

nearest full assessment), and 0 otherwise. CMS exclusion criteria were then

applied according to both resident and facility characteristics (Appendix SA2).

CMS did not use stratification or regression adjustment in this QM.

We calculated the CMS QM rate as the percent of residents who had

functional decline for each eligible facility, that is, long-term care facility with

>30 residents at risk of functional decline (the denominator). We further calculated

the 95 percent confidence interval (CI) of this QM using normal approximation:

$$O_j \pm 1.96 \times \sqrt{\frac{O_j (1 - O_j)}{n_j}} \qquad (1)$$

where $O_j$ is the CMS-unadjusted rate and $n_j$ is the number of at-risk residents in

facility j.
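As a worked illustration of equation (1), the following Python sketch computes the normal-approximation interval for a single facility. The function name `unadjusted_ci` and the example inputs (a rate of 20 percent among 100 at-risk residents) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def unadjusted_ci(o_j, n_j, z=1.96):
    """95 percent CI of an unadjusted facility rate, as in equation (1).

    o_j: observed proportion with functional decline (0..1)
    n_j: number of at-risk residents in the facility
    """
    half_width = z * math.sqrt(o_j * (1.0 - o_j) / n_j)
    return o_j - half_width, o_j + half_width

# Hypothetical facility: 20 of 100 at-risk residents declined.
lower, upper = unadjusted_ci(0.20, 100)
print(round(lower, 4), round(upper, 4))
```

Here the half-width is 1.96 × sqrt(0.2 × 0.8 / 100) = 0.0784, so the interval runs from about 12.2 to 27.8 percent. The sketch does not truncate the bounds to the 0 to 1 range, since the text does not describe any truncation.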

Identifying and Estimating the Effect of Risk Factors

We identified risk factors of functional decline in the prior assessment (the fourth

quarter of 2004 or nearest full assessment) according to previous literature and

recommendations by an experienced geriatrician familiar with long-term care.

We then estimated their effect using classical logistic regression models in both

bivariate and multivariate analyses, and retained only variables that were sig-

nificant at .001 level in the final model. In addition, we used multivariate frac-

tional polynomials (Royston and Sauerbrei 2003) to determine the optimal

transformations of continuous covariates (i.e., length between prior and target

assessments, and age). The final classical logistic model was estimated on exactly

the same data as used for calculating the CMS QM:

$$\ln\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta_1 x_{ij1} + \beta_2 x_{ij2} + \cdots + \beta_k x_{ijk} \qquad (2)$$

where $p_{ij}$ is the probability of functional decline for each resident, the x-variables are

the best set of risk factors, and $\beta_k$ are model parameters.

Calculating Risk-Adjusted QMs Alternatively

We estimated risk-adjusted QMs using different modeling approaches that are

described below and summarized in Table 1.

Method 1. First, each resident’s probability of experiencing functional decline ($\hat{p}_{ij1}$) was predicted by the classical logistic model (equation (2)). The expected outcome rate for each facility ($E_j^1$) was then calculated as the sum of $\hat{p}_{ij1}$ for residents in facility j divided by $n_j$. To calculate the risk-adjusted QM ($QM_j^1$), we first calculated

$$\mathrm{logit}(QM_j^1) = \ln\left(\frac{O_j}{1 - O_j}\right) - \ln\left(\frac{E_j^1}{1 - E_j^1}\right) + \ln\left(\frac{17.96\ \text{percent}}{100\ \text{percent} - 17.96\ \text{percent}}\right) \qquad (3)$$

where 17.96 percent is the overall rate of functional decline for all residents in the sample, and then back-transformed $\mathrm{logit}(QM_j^1)$ to the probability scale:

$$QM_j^1 = \left[1 + \exp\left(-\mathrm{logit}(QM_j^1)\right)\right]^{-1} \qquad (4)$$

We used equations (3) and (4) to calculate $QM_j^1$ because a prior study (Li et al. 2007) has shown that it is consistent with the specification of the logistic model and better identifies outlier facilities than other measures such as those based on the difference or ratio between observed ($O_j$) and expected ($E_j^1$) outcomes.
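The logit-scale comparison of observed and expected rates, followed by the back-transformation to a probability, can be sketched as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the handling of observed rates of exactly 0 or 1 (where the logit is undefined) is an assumption made here so that the example runs on boundary inputs.

```python
import math

OVERALL_RATE = 0.1796  # overall rate of functional decline reported in the text

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def risk_adjusted_qm(observed, expected, overall=OVERALL_RATE):
    """Risk-adjusted QM: logit(O) - logit(E) + logit(overall), back-transformed."""
    # Assumption (not from the article): boundary rates map to themselves,
    # because logit(0) and logit(1) are not finite.
    if observed <= 0.0:
        return 0.0
    if observed >= 1.0:
        return 1.0
    return inv_logit(logit(observed) - logit(expected) + logit(overall))

# A facility whose observed rate equals its expected rate sits at the overall rate.
as_expected = risk_adjusted_qm(0.20, 0.20)
# Observed decline above the expected level pushes the QM above the overall rate.
worse_than_expected = risk_adjusted_qm(0.30, 0.20)
print(round(as_expected, 4), round(worse_than_expected, 4))
```

A useful property of this construction is that a facility performing exactly as its case mix predicts receives the overall rate, whatever its case mix.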

Table 1: Description of the Nursing Home Quality Measure of Decline in ADL Functioning, by Risk-Adjustment Methods

Type of Risk-Adjustment Method                                      No. of Nursing Homes  Mean (%)  Median (%)  IQR (%)      Range (%) (Min–Max)
0. CMS unadjusted                                                   9,336                 17.96     16.44       11.11–23.40  0–78.85
1. Classical logistic regression based                              9,336                 18.36     16.86       11.50–24.13  0–72.91
2. Fixed-effects logistic regression based                          9,336                 16.35     14.82       9.87–21.67   0–71.04
3. Random-effects logistic regression based                         9,336                 19.04     17.49       11.83–25.13  0–74.59
4. Random-effects logistic regression and shrinkage estimator based 9,336                 19.15     17.59       13.25–23.81  2.94–67.52

ADL, activities of daily living; IQR, interquartile range; CMS, Centers for Medicare and Medicaid Services.


Finally, to calculate the 95 percent CI of $QM_j^1$, we first calculated the 95 percent CI of $O_j$ as

$$O_j \pm 1.96 \times \frac{\sqrt{\sum_{i=1}^{n_j} \hat{p}_{ij1} (1 - \hat{p}_{ij1})}}{n_j} \qquad (5)$$

which was developed by Hosmer and Lemeshow (1995) based on normal approximation of the binomial distribution. We then used the upper and lower bounds of the CI obtained above to calculate the 95 percent CI of $QM_j^1$ by repeating the calculations in equations (3) and (4) (Li et al. 2007).
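The interval for the observed rate in equation (5) can be illustrated as follows. The sketch assumes the reading in which the square root covers the summed resident-level variances and the result is divided by the number of residents; the function name `observed_rate_ci` and the toy inputs are hypothetical.

```python
import math

def observed_rate_ci(observed, predicted_risks, z=1.96):
    """95 percent CI of the observed rate using resident-level predicted risks.

    predicted_risks: predicted probabilities for the residents of one facility.
    Assumed form: observed +/- z * sqrt(sum p(1 - p)) / n.
    """
    n = len(predicted_risks)
    variance_sum = sum(p * (1.0 - p) for p in predicted_risks)
    half_width = z * math.sqrt(variance_sum) / n
    return observed - half_width, observed + half_width

# Toy facility: 100 residents who all have a predicted risk of 0.20.
lower, upper = observed_rate_ci(0.20, [0.20] * 100)
print(round(lower, 4), round(upper, 4))
```

When every resident has the same predicted risk and that risk equals the observed rate, the interval matches the simple binomial interval of equation (1) (here about 12.2 to 27.8 percent). The two bounds would then each be passed through the logit comparison and back-transformation to obtain the interval of the risk-adjusted QM.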

Method 2. We first estimated a fixed-effects logistic model

$$\ln\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \beta_1 x_{ij1} + \beta_2 x_{ij2} + \cdots + \beta_k x_{ijk} + u_{0j} \qquad (6)$$

that incorporated the same set of x-variables as equation (2). This model was estimated using the conditional maximum likelihood method (Pan 2002), where the NH fixed effect $u_{0j}$ captures the effect of unmeasured facility characteristics after controlling for resident risks (Greene 2001). We then used this model to predict each resident’s probability of functional decline ($\hat{p}_{ij2}$) assuming null effect of $u_{0j}$ (i.e., $\hat{p}_{ij2}$ only incorporates resident risks but no facility effects), and calculated the expected rate for each facility ($E_j^2$) as the sum of $\hat{p}_{ij2}$ for residents in facility j divided by $n_j$. Finally, $E_j^2$ and $\hat{p}_{ij2}$ were used to calculate the risk-adjusted QM ($QM_j^2$) and its 95 percent CI by repeating the calculations in equations (3)–(5) (substituting $E_j^2$ and $\hat{p}_{ij2}$ for $E_j^1$ and $\hat{p}_{ij1}$, respectively).

Method 3. We similarly estimated equation (6), but this time $u_{0j}$ was assumed normally distributed with mean zero and variance $\sigma_u^2$, and estimated as random effects using SAS (SAS Corp., Cary, NC) Proc Glimmix (Littell et al. 2006). In a similar process, we predicted the resident’s probability of functional decline ($\hat{p}_{ij3}$) assuming null effect of $u_{0j}$, and calculated the expected rate for each facility ($E_j^3$) as the sum of $\hat{p}_{ij3}$ for residents in facility j divided by $n_j$. Finally, $E_j^3$ and $\hat{p}_{ij3}$ were used to calculate the risk-adjusted QM ($QM_j^3$) and its 95 percent CI according to equations (3)–(5).

Method 4. Because the random-effects shrinkage estimator $u_{0j}$ represents facility variations in outcome after adjusting for resident characteristics, we used this estimator to derive the risk-adjusted QM ($QM_j^4$) directly. We first calculated $\mathrm{logit}(QM_j^4) = u_{0j} + \mathrm{logit}(17.96\ \text{percent})$ (note that $u_{0j}$ is on the logit scale) and then back-transformed $\mathrm{logit}(QM_j^4)$ to the probability scale. The 95 percent CI of $QM_j^4$ was similarly calculated using the estimated 95 percent CI of $u_{0j}$.
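The Method 4 calculation reduces to adding the facility effect to the logit of the overall rate and back-transforming. The sketch below assumes the shrinkage estimate and its interval bounds are already available from a fitted random-effects model; the function name `shrinkage_qm` and the example values of the facility effect are hypothetical.

```python
import math

OVERALL_RATE = 0.1796  # overall rate of functional decline reported in the text

def shrinkage_qm(u0j, overall=OVERALL_RATE):
    """Back-transform u0j + logit(overall) to the probability scale."""
    x = u0j + math.log(overall / (1.0 - overall))
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical facility effect (logit scale) with its 95 percent CI bounds.
u_hat, u_low, u_high = 0.25, -0.05, 0.55
qm = shrinkage_qm(u_hat)
qm_ci = (shrinkage_qm(u_low), shrinkage_qm(u_high))
print(round(qm, 4), tuple(round(b, 4) for b in qm_ci))
```

Because the back-transformation is monotone, transforming the bounds of the interval for the facility effect gives the bounds of the interval for the QM, and a facility effect of zero maps to the overall rate.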

Comparing Statistical Models and QMs

Below we present a framework of validity criteria that can be used to guide our

comparison of alternative methods at the regression model (or resident) level

and at the QM (or facility) level. These different perspectives of validity

have been described previously (Mukamel 1997; Iezzoni 2003) and their

operational definitions are:

Validity criteria at the regression model level:

- Face validity: The model accommodates variables that on face value are important clinical risk factors.
- Content validity: The risk-adjustment model incorporates all concepts affecting outcome, i.e., complete risks, facility effects, and chance component.
- Construct validity: The effects of risk factors on outcome are estimated in the expected direction.
- Convergent validity: The effects of risk factors on outcome show close agreement when estimated by alternative models.
- Predictive validity: The model predicts actual outcome well.

Validity criteria at the QM level:

- Criterion validity: The QM reflects true quality of care.
- Convergent validity: Facility rankings or identity of outliers based on QMs derived from alternative methods show close agreement.

The predictive validity of the classical, fixed-effects, and random-effects

models was evaluated by the c-statistic, which equals the area under the

receiver operating characteristic curve (Hanley and McNeil 1982). The

c-statistic examines how well the model discriminates residents with and without

functional decline by assigning a higher predicted probability to those with

functional decline. The c-statistic ranges from 0.5 (random discrimination, no

better than a coin flip) to 1.0 (perfect discrimination).
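The interpretation of the c-statistic as a discrimination measure can be made concrete with a small pairwise calculation. The sketch below is illustrative only: it compares every resident with functional decline against every resident without it, counts ties as one half, and uses made-up outcomes and predictions. The quadratic pairwise loop is chosen for clarity and would be too slow for a dataset of the size used in the study.

```python
def c_statistic(outcomes, predictions):
    """Share of (decline, no-decline) pairs in which the resident with
    decline has the higher predicted probability; ties count one half."""
    with_decline = [p for y, p in zip(outcomes, predictions) if y == 1]
    without_decline = [p for y, p in zip(outcomes, predictions) if y == 0]
    concordant = 0.0
    for p_pos in with_decline:
        for p_neg in without_decline:
            if p_pos > p_neg:
                concordant += 1.0
            elif p_pos == p_neg:
                concordant += 0.5
    return concordant / (len(with_decline) * len(without_decline))

# Toy data: two residents with decline, two without.
outcomes = [1, 1, 0, 0]
predictions = [0.8, 0.3, 0.4, 0.1]
c = c_statistic(outcomes, predictions)
print(c)
```

In the toy data three of the four pairs are ordered correctly, giving a c-statistic of 0.75.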

At the QM level, we defined high- and low-quality facilities by either the

ranking or the 95 percent CI of the QM derived from each method. First,


facilities in the lowest 10 percent and highest 10 percent rankings were iden-

tified as ‘‘best 10 percent’’ and ‘‘worst 10 percent’’ facilities, respectively

(because the QM represents an adverse outcome, lower QM rate indicates

better quality). We also performed sensitivity analyses where we used alter-

native cutoffs (5 and 25 percent) to define best and worst facilities. Second,

facilities were identified as high-quality outliers if the 95 percent CIs of their

QM rates were below the overall rate 17.96 percent, and low-quality outliers

if the 95 percent CIs were above 17.96 percent.
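The two classification rules above (by ranking and by 95 percent CI) can be sketched as follows. The function names, the label strings, and the toy QM values are hypothetical, and the treatment of ties in the ranking is an assumption, since the text does not say how ties were handled.

```python
OVERALL_RATE = 0.1796  # overall rate of functional decline reported in the text

def classify_by_rank(qm_by_facility, cutoff=0.10):
    """Label the lowest `cutoff` share of QM rates as best and the highest
    share as worst. Ties are broken by sort order (an assumption)."""
    ordered = sorted(qm_by_facility, key=qm_by_facility.get)
    k = int(len(ordered) * cutoff)
    labels = {facility: "medium" for facility in ordered}
    for facility in ordered[:k]:
        labels[facility] = "best"
    for facility in ordered[len(ordered) - k:]:
        labels[facility] = "worst"
    return labels

def classify_by_ci(lower, upper, overall=OVERALL_RATE):
    """Outlier status from the 95 percent CI of a facility's QM rate."""
    if upper < overall:
        return "high-quality outlier"
    if lower > overall:
        return "low-quality outlier"
    return "non-outlier"

# Ten toy facilities with increasing QM rates (lower is better).
qm = {f"F{i}": 0.05 + 0.02 * i for i in range(10)}
labels = classify_by_rank(qm)
print(labels["F0"], labels["F5"], labels["F9"])
print(classify_by_ci(0.05, 0.15), classify_by_ci(0.15, 0.25), classify_by_ci(0.20, 0.30))
```

Changing `cutoff` to 0.05 or 0.25 corresponds to the alternative cutoffs used in the sensitivity analyses.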

To quantify the convergent validity of alternatively calculated QMs, we

calculated the κ statistic (Landis and Koch 1977) (1) between the CMS-unadjusted

QM and each risk-adjusted QM, and (2) between each pair of

risk-adjusted QMs, in identifying best and worst facilities or outliers. The κ

measures the level of agreement between two raters evaluating an event on a

categorical scale (Landis and Koch 1977). In this study, we defined the event

scale as 1 = best facilities (or high-quality outlier), 0 = medium facilities

(or nonquality outlier), and −1 = worst facilities (or low-quality outlier). The

κ ranges between 0 and 1, with 0 indicating no agreement beyond chance,

and 1 indicating total agreement.
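For readers who want to see how such an agreement statistic is computed, the sketch below implements the standard unweighted Cohen's kappa for two sets of facility labels on the 1/0/−1 scale. Whether the study used an unweighted or weighted variant is not stated, so the unweighted form is an assumption, and the labels are made up.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Unweighted kappa: (observed agreement - chance agreement) / (1 - chance)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    # Chance agreement from the marginal label frequencies of each method.
    chance = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    return (observed - chance) / (1.0 - chance)

# Toy classifications of six facilities by two methods (1 best, 0 medium, -1 worst).
method_a = [1, 1, 0, 0, -1, -1]
method_b = [1, 0, 0, 0, -1, -1]
kappa = cohen_kappa(method_a, method_b)
print(round(kappa, 3))
```

In the toy data the two methods agree on five of six facilities (observed agreement 0.833), chance agreement is 0.333, and the resulting kappa is 0.75.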

In comparing the CMS-unadjusted QM and each risk-adjusted QM, we

further calculated (1) the false-positive rate for medium-quality facilities

(i.e., those not identified as high- or low-quality facilities by each risk-adjusted

QM were identified so by the unadjusted QM), and (2) the false-negative rate

for high- and low-quality facilities separately (i.e., those identified as high- or

low-quality facilities by each risk-adjusted QM were not identified so by the

CMS-unadjusted QM) (Glance et al. 2006b).1
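The false-positive and false-negative rates defined above treat each risk-adjusted QM as the reference classification. A minimal sketch, using the 1/0/−1 labels and hypothetical function names and toy data, is:

```python
def false_positive_rate(adjusted, unadjusted):
    """Among facilities labeled medium (0) by the risk-adjusted QM, the share
    labeled best or worst by the unadjusted QM."""
    medium = [f for f, label in adjusted.items() if label == 0]
    return sum(unadjusted[f] != 0 for f in medium) / len(medium)

def false_negative_rate(adjusted, unadjusted, category):
    """Among facilities given `category` (1 or -1) by the risk-adjusted QM,
    the share not given that category by the unadjusted QM."""
    flagged = [f for f, label in adjusted.items() if label == category]
    return sum(unadjusted[f] != category for f in flagged) / len(flagged)

# Toy labels for eight facilities (made up for illustration).
adjusted = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0, "F": 0, "G": -1, "H": -1}
unadjusted = {"A": 1, "B": 0, "C": 0, "D": 0, "E": 0, "F": 1, "G": -1, "H": -1}

fp = false_positive_rate(adjusted, unadjusted)
fn_best = false_negative_rate(adjusted, unadjusted, 1)
fn_worst = false_negative_rate(adjusted, unadjusted, -1)
print(fp, fn_best, fn_worst)
```

In the toy data one of four medium facilities is wrongly flagged by the unadjusted measure (0.25), one of two best facilities is missed (0.5), and both worst facilities are correctly identified (0.0).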

RESULTS

Descriptive Results

This study included 605,433 long-term care residents in 9,336 facilities

(Table 1). The overall unadjusted QM rate was 17.96 percent, and it varied

widely across facilities (interquartile range 11.11–23.40 percent, range 0–78.85

percent). The residents were on average 81 years old, and had different levels

of ADL impairment at prior assessment (Table 2).

Statistical Models

In general, the classical, fixed-effects, and random-effects models had slightly

different estimates (odds ratios) for individual risk factors (Table 2). The

c-statistic was 0.68 for the classical model, which is comparable with results


Table 2: Resident Characteristics (n = 605,433) and Estimates in Risk-Adjustment Models

                                                                                                   Odds Ratio Estimated by
Baseline Characteristic                                               Prevalence (%) or Mean ± SD  Classical Logistic Model  Fixed-Effects Model  Random-Effects Model
Length (in weeks) between prior (or baseline) and target assessments  12.37 ± 1.65
  length                                                                                           0.075                     0.108                0.099
  ln(length)                                                                                       7.390                     3.851                4.506
  length × ln(length)                                                                              2.287                     2.092                2.139
Age in years†                                                         80.89 ± 12.94                1.219                     1.170                1.187
ADL performance: bed mobility
  Independent                                                         36.47                        Reference                 Reference            Reference
  Supervision                                                         6.89                         0.873                     0.877                0.879
  Limited assistance                                                  18.06                        0.777                     0.675                0.702
  Extensive assistance                                                31.46                        0.518                     0.372                0.406
  Total dependence or no activity                                     7.12                         0.442                     0.271                0.307
ADL performance: transfer
  Independent                                                         25.61                        Reference                 Reference            Reference
  Supervision                                                         7.92                         1.257                     1.323                1.311
  Limited assistance                                                  19.39                        1.362                     1.439                1.420
  Extensive assistance                                                31.72                        1.101                     1.202                1.175
  Total dependence or no activity                                     15.36                        0.819                     0.894                0.875
ADL performance: eating
  Independent                                                         48.32                        Reference                 Reference            Reference
  Supervision                                                         26.50                        0.856                     0.922                0.902
  Limited assistance                                                  9.96                         0.836                     0.883                0.870
  Extensive assistance                                                9.38                         0.636                     0.641                0.639
  Total dependence or no activity                                     5.85                         0.489                     0.525                0.518
ADL performance: toilet use
  Independent                                                         19.23                        Reference                 Reference            Reference
  Supervision                                                         6.54                         1.221                     1.293                1.276
  Limited assistance                                                  16.41                        1.296                     1.329                1.323
  Extensive assistance                                                33.76                        1.112                     1.140                1.139
  Total dependence or no activity                                     24.07                        0.903                     0.975*               0.960*
Short-term memory problem                                             71.94                        1.127                     1.147                1.144
Cognitive skills for daily decision making
  Independent or modified independent                                 43.04                        Reference                 Reference            Reference
  Moderately impaired                                                 45.50                        1.157                     1.203                1.188
  Severely impaired                                                   11.46                        1.454                     1.587                1.545
Rarely understand others or make self understood                      5.00                         1.157                     1.149                1.152
Depression                                                            18.92                        1.094                     1.106                1.099
Behavior problems in wandering                                        9.52                         1.094                     1.038                1.052
Bowel incontinence                                                    35.29                        1.332                     1.347                1.340
Urinary incontinence                                                  47.38                        1.262                     1.289                1.281

(continued)

Nursing Home Quality Report Cards and Risk Adjustment Methods 91


in previous studies (Mukamel 1997; Rosen et al. 2001), 0.76 for the fixed-

effects model, and 0.75 for the random-effects model.
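The c-statistic quoted above can be read as the probability that a randomly chosen resident who declined was assigned a higher predicted risk than a randomly chosen resident who did not, with ties counted as one half. The following minimal Python sketch illustrates that definition only; the function name `c_statistic` and the toy outcomes and predicted probabilities are illustrative assumptions, not data or code from the study.

```python
from itertools import product


def c_statistic(outcomes, predicted):
    """Concordance (c) statistic: share of case/non-case pairs ranked correctly.

    outcomes  -- 0/1 indicators (1 = decline observed); hypothetical data
    predicted -- model-predicted probabilities of decline, in the same order
    """
    cases = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_cases = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    # A concordant pair scores 1, a tied pair 0.5, a discordant pair 0.
    score = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c, n in product(cases, non_cases)
    )
    return score / (len(cases) * len(non_cases))


# Toy example (not study data): three residents who declined, three who did not.
toy_outcomes = [1, 1, 1, 0, 0, 0]
toy_predicted = [0.40, 0.25, 0.10, 0.30, 0.10, 0.05]
print(round(c_statistic(toy_outcomes, toy_predicted), 3))
```

The pairwise loop is quadratic in the number of residents, so for a sample of several hundred thousand residents a rank-based computation would be needed in practice; the sketch favors transparency over speed.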

Agreement in NH Classifications

The average risk-adjusted rates ranged between 16.35 percent (fixed-effects
estimate) and 19.15 percent (shrinkage estimator, Table 1). The rate based on
the shrinkage estimator exhibited the least variation across facilities, ranging
from 2.94 to 67.52 percent.

Table 3 shows that the κ between the CMS-unadjusted QM and each
risk-adjusted QM ranged between 0.70 and 0.80 in ranking and classifying
facilities. Compared with each adjusted QM, the unadjusted QM had a
false-negative rate of over 0.20 in identifying the worst 10 percent facilities,
and a false-negative rate between 0.14 and 0.26 in identifying the best 10
percent facilities (the false-positive rate of the unadjusted QM was <0.10).
Table 4 shows that the pair-wise agreement between each risk-adjusted QM
was very high (κ > 0.80) in rankings. However, the shrinkage estimator
(method 4) tended to deviate nonrandomly from other estimates in identifying
the best 10 percent facilities, the percent of differentially identified facilities
being 18 percent [172/(172 + 761)], 17 percent [154/(154 + 779)], and 17
percent [155/(155 + 778)], respectively (see highlighted cells in Table 4).
We varied the cutoff percentiles in classifying facilities and found similar
results to those in Tables 3 and 4.
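Each bracketed ratio in the preceding paragraph has the form d/(d + c). The short Python sketch below reproduces the three reported percentages from those counts. It assumes that d is the number of facilities placed in the best 10 percent by only one of the two estimators and c the number placed there by both; that reading is an assumption, as are the names `pct_differential` and `BEST_DECILE_PAIRS`.

```python
def pct_differential(n_differential, n_common):
    """Percent of differentially identified facilities: d / (d + c) * 100."""
    return 100.0 * n_differential / (n_differential + n_common)


# (d, c) pairs quoted in the text for the shrinkage estimator (method 4)
# compared with each of the other three risk-adjusted estimates.
BEST_DECILE_PAIRS = [(172, 761), (154, 779), (155, 778)]

for d, c in BEST_DECILE_PAIRS:
    print(f"{d}/({d} + {c}) = {pct_differential(d, c):.1f} percent")
```

In each pair d + c equals 933, which is 10 percent of the 9,336 facilities rounded down, so the denominator is the size of a best 10 percent group.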

In identifying "outliers" using the 95 percent CI of each QM, the overall
κ ranged between 0.59 and 0.76 between the CMS-unadjusted QM and
each risk-adjusted QM (Table 5). Compared with each adjusted QM,
the false-positive rate of the unadjusted QM was between 0.10 and 0.17, the

Table 2. Continued

| Baseline Characteristic | Prevalence (%) or Mean ± SD | Odds Ratio, Classical Logistic Model | Odds Ratio, Fixed-Effects Model | Odds Ratio, Random-Effects Model |
|---|---|---|---|---|
| Urinary tract infection | 8.64 | 1.176 | 1.157 | 1.161 |
| Weight loss | 7.20 | 1.212 | 1.232 | 1.227 |
| Pressure ulcer | 7.31 | 1.282 | 1.314 | 1.308 |
| c-Statistic | | 0.683 | 0.755 | 0.752 |
| σ²u (standard error) | | | | 0.362 (0.007) |

*p value > .01; all other p values < .001.
†In the risk-adjustment models, age was transformed to 0 if age < 65, and to ln(age − 64) if age ≥ 65.


false-negative rate was between 0.25 and 0.48 for identifying low-quality
outliers, and was minimal (<0.01) for identifying high-quality outliers.

Table 6 shows that the overall pair-wise agreement between risk-
adjusted QMs was high (κ > 0.70) in identifying outliers. However, the
shrinkage estimator deviated nonrandomly from other estimates in identifying
high-quality outliers, the percent of differentially classified facilities being
37 percent [476/(476 + 817)], 49 percent [793/(793 + 841)], and 35 percent
[448/(448 + 821)], respectively. In addition, there were cases in which a pair of
other risk-adjusted QMs tended to deviate nonrandomly in identifying either
type of outliers (highlighted cells in Table 6).
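The agreement statistics reported in this section (overall κ, false-positive rate, and false-negative rates) can be recomputed from a 3 × 3 cross-classification of facilities. The Python sketch below does so for the Method 1 counts in Table 3. It assumes unweighted Cohen's κ and assumes that rows are the CMS-unadjusted groups and columns the risk-adjusted ("gold standard") groups; both are readings of the published table rather than statements from the authors, and the names `METHOD1_COUNTS` and `agreement_stats` are hypothetical.

```python
# Method 1 (classical logistic model) counts as read from Table 3.
# Rows: CMS-unadjusted group; columns: risk-adjusted group (gold standard).
# Both axes are ordered worst 10%, medium 80%, best 10%.
# The row/column orientation is an assumption recovered from the table layout.
METHOD1_COUNTS = [
    [731, 193, 0],
    [202, 7148, 129],
    [0, 129, 804],
]


def agreement_stats(table):
    """Return unweighted Cohen's kappa and the error rates of the row classifier."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    observed = sum(table[i][i] for i in range(len(table))) / n
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    # Share of each gold-standard (column) group that the row classifier
    # places in a different group.
    missed = [1 - table[i][i] / col_totals[i] for i in range(len(table))]
    return {
        "kappa": kappa,
        "false_negative_worst": missed[0],
        "false_positive_medium": missed[1],
        "false_negative_best": missed[2],
    }


stats = agreement_stats(METHOD1_COUNTS)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the sketch yields κ of about 0.79, a false-positive rate of about 0.04, and false-negative rates of about 0.22 (worst 10 percent) and 0.14 (best 10 percent), which agree with the Method 1 entries of Table 3.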

DISCUSSION

Although current NH QMs are not perfect (Mor et al. 2003), they will likely

play an increasingly important role in market-driven quality improvement

Table 3: Agreement in Nursing Home Quality Rankings: CMS-Unadjusted Measure Compared with Risk-Adjusted Measures

Rows give rankings based on the CMS-unadjusted method; columns give rankings based on each risk-adjustment method.

| Rankings Based on CMS-Unadjusted Method | Method 1* Worst 10% | Method 1* Medium 80% | Method 1* Best 10% | Method 2† Worst 10% | Method 2† Medium 80% | Method 2† Best 10% | Method 3‡ Worst 10% | Method 3‡ Medium 80% | Method 3‡ Best 10% | Method 4§ Worst 10% | Method 4§ Medium 80% | Method 4§ Best 10% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Worst 10% | 731 | 193 | 0 | 712 | 212 | 0 | 714 | 210 | 0 | 716 | 208 | 0 |
| Medium 80% | 202 | 7,148 | 129 | 221 | 7,125 | 133 | 219 | 7,123 | 137 | 217 | 7,021 | 241 |
| Best 10% | 0 | 129 | 804 | 0 | 133 | 800 | 0 | 137 | 796 | 0 | 241 | 692 |
| False-positive rate** | | 0.04 | | | 0.05 | | | 0.05 | | | 0.06 | |
| False-negative rate** | 0.22 | | 0.14 | 0.24 | | 0.14 | 0.23 | | 0.15 | 0.23 | | 0.26 |
| Overall κ | | 0.79 | | | 0.78 | | | 0.78 | | | 0.71 | |

Nursing homes in the worst 10% group are those whose point estimates of the quality measures of
decline in ADL functioning are among the 10% highest of the rates of all nursing homes (n = 9,336).
Nursing homes in the medium 80% group are those whose point estimates of the quality measures
of decline in ADL functioning are between the 10% highest and 10% lowest of the rates of all
nursing homes (n = 9,336). Nursing homes in the best 10% group are those whose point estimates
of the quality measures of decline in ADL functioning are among the 10% lowest of the rates of all
nursing homes (n = 9,336).
*Based on classical logistic regression model.
†Based on fixed-effects model.
‡Based on random-effects model.
§Based on the shrinkage estimators of the random-effects model.
**Each risk-adjusted measure was treated as "gold standard" when calculating the false-positive
(for the medium 80% group) and false-negative (for the worst or best 10% group) rates.
