
Ranking USRDS provider specific SMRs from 1998-2001

Rongheng Lin

Department of Public Health, University of Massachusetts Amherst, Rm 411 Arnold House, 715 N.

Pleasant Rd., Amherst, MA 01003, USA

Thomas A. Louis

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD

21205, USA e-mail: tlouis@jhsph.edu

Susan M. Paddock

RAND Corporation, Santa Monica, CA 90407, USA e-mail: paddock@rand.org

Greg Ridgeway

RAND Corporation, Santa Monica, CA 90407, USA e-mail: gregr@rand.org

Abstract

Provider profiling (ranking/percentiling) is prevalent in health services research. Bayesian models

coupled with optimizing a loss function provide an effective framework for computing non-standard

inferences such as ranks. Inferences depend on the posterior distribution and should be guided by

inferential goals. However, even optimal methods might not lead to definitive results and ranks should

be accompanied by valid uncertainty assessments. We outline the Bayesian approach and use

estimated Standardized Mortality Ratios (SMRs) in 1998-2001 from the United States Renal Data

System (USRDS) as a platform to identify issues and demonstrate approaches. Our analyses extend

Liu et al. (2004) by computing estimates developed by Lin et al. (2006) that minimize errors in

classifying providers above or below a percentile cut-point, by combining evidence over multiple

years via a first-order, autoregressive model on log(SMR), and by use of a nonparametric prior.

Results show that ranks/percentiles based on maximum likelihood estimates of the SMRs and those

based on testing whether an SMR = 1 substantially under-perform the optimal estimates. Combining

evidence over the four years using the autoregressive model reduces uncertainty, improving

performance over percentiles based on only one year. Furthermore, percentiles based on posterior

probabilities of exceeding a properly chosen SMR threshold are essentially identical to those

produced by minimizing classification loss. Uncertainty measures effectively calibrate performance,

showing that considerable uncertainty remains even when using optimal methods. Findings highlight

the importance of using loss function guided percentiles and the necessity of accompanying estimates

with uncertainty assessments.

Keywords

Provider profiling; Ranks/percentiles; Bayesian hierarchical model; Uncertainty assessment

Correspondence to: Rongheng Lin. e-mail: rlin@schoolph.umass.edu.

Health Serv Outcomes Res Methodol. Author manuscript; available in PMC 2009 April 1. Published in final edited form as: Health Serv Outcomes Res Methodol. 2009 March 1; 9(1): 22–38. doi:10.1007/s10742-008-0040-0.

1 Introduction

Research on and application of performance evaluation steadily increases, with applications to evaluating health service providers (Christiansen and Morris 1997; Goldstein and Spiegelhalter 1996; Landrum et al. 2000; Liu et al. 2004; McClellan and Staiger 1999; Grigg et al. 2003;

Zhang et al. 2006; Normand and Shahian 2007; Ohlssen et al. 2007), prioritizing environmental

assessments in small areas (Conlon and Louis 1999; Louis and Shen 1999; Shen and Louis

2000) and ranking teachers and schools (Lockwood et al. 2002). Inferential goals of these

studies include evaluating population performance, such as the average performance of all health providers, and comparing performance among providers. Performance evaluations include comparing unit-specific, substantive measures such as death rates, identifying the groups of poorest- and best-performing units, and overall ranking of the units, e.g., profiling or league tables (Goldstein and Spiegelhalter 1996).

The Standardized Mortality Ratio (SMR), the ratio of observed to expected deaths, is an

important service quality indicator (Zaslavsky 2001). The United States Renal Data System

(USRDS) produces annual estimated SMRs for several thousand dialysis centers and uses these

as a quality screen (Lacson et al. 2001; ESRD 2000; USRDS 2005). Invalid estimation or

inappropriate interpretation can have serious consequences for these dialysis centers and for

their patients. We present an analysis of the information from the United States Renal Data

System (USRDS) for 1998-2001 as a platform for demonstrating and comparing approaches

to ranking health service providers. From the USRDS we obtained observed and expected

deaths for the K = 3173 dialysis centers that contributed information for all four years. The

approach used by USRDS to produce these values can be found in USRDS (2005).

Though estimating SMRs is a standard statistical operation (produce provider-specific

expected deaths based on a statistical model, and then compute the “observed/expected” ratio),

it is important and challenging to deal with complications such as the need to specify a reference

population (providers included, the time period covered, attribution of events), the need to

validate the model used to adjust for important patient attributes (age, gender, diabetes, type

of dialysis, severity of disease), and the need to adjust for potential biases induced when

attributing deaths to providers and accounting for informative censoring.

The multi-level data structure and complicated inferential goals require the use of a hierarchical

Bayesian model that accounts for nesting relations and specifies both population values and

random effects. Correctly specified, the model properly accounts for the sample design,

variance components and other uncertainties, producing valid and efficient estimates of

population parameters, variance components and unit-specific random effects (provider-,

clinician-, or region-specific latent attributes), all accompanied by valid uncertainty

assessments. Importantly, the Bayesian approach provides the necessary structure for

developing scientific and policy-relevant inferences based on the joint posterior distribution

of all unknowns.

As Shen and Louis (1998) show and Gelman and Price (1999) present in detail, no single set

of estimates or assessments can effectively address multiple goals and we provide a suite of

assessments. Guided by a loss function, the Bayesian approach structures non-standard

inferences such as ranking (including identification of extremely poor and good performers)

and estimating the histogram of unit-specific random effects. For example, as Liu et al.

(2004) show, when estimation uncertainty varies over dialysis centers, ranks produced by Z-

scores that test whether a provider's SMR = 1 tend to identify providers with relatively low

variance as extreme because these tests have the highest power; ranks produced from the

provider-specific maximum likelihood estimates (MLEs) are more likely to identify dialysis

centers with relatively high variance as extreme. Effective ranks depend on striking an appropriate tradeoff between signal and noise.

Lin et al. (2006) present estimates that minimize errors in classifying providers above or below

a percentile cut-point. Our analyses build on Liu et al. (2004) by extending the application of


Lin et al. (2006)'s estimates to combine evidence over multiple years via a first-order,

autoregressive model on log(SMR), and by use of a nonparametric prior. For single-year

analyses we compare the results from the log-normal prior to those based on the Non-

Parametric, Maximum Likelihood (NPML) prior (Laird 1978).

In the following, Sect. 2 presents our models; Sect. 3 outlines several ranking methods; Sect. 4 gives uncertainty measures; Sect. 5 presents results; and Sect. 6 sums up and identifies additional research. Computing code for all routines is available at http://people.umass.edu/rlin/jhuwebhost/usrds-ranking.htm.

2 Statistical models

We employ both single-year and longitudinal models for observed deaths and underlying

parameters, with the former a sub-model of the latter. To this end, let (Ykt, mkt) be the observed

deaths and the case-mix adjusted expected deaths for provider k in year t, k = 1, …, 3173, t = 0, 1, 2, 3, and let ρkt be the SMR. The USRDS computes the expected counts under the assumption that all providers give the same quality of care to patients with identical covariates; see USRDS (2005) for

details. We employ the conditional Poisson model,

  Ykt | mkt, ρkt ~ind Poisson(mkt ρkt)   (1)

If the provider has “average performance”, ρkt = 1. For both single-year and multiple-year

analyses we model θkt = log(ρkt).
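The mechanics of Eq. 1 are easy to sketch in code. The following is a minimal illustration, not the USRDS production pipeline; the provider counts are hypothetical:

```python
import math

def smr_mle(y, m):
    """Maximum likelihood estimate of the SMR: observed deaths / expected deaths."""
    return y / m

def poisson_loglik(y, m, rho):
    """Log-likelihood of rho under Y ~ Poisson(m * rho), dropping the log(y!) constant."""
    mu = m * rho
    return y * math.log(mu) - mu

# Hypothetical provider-year: 12 observed deaths, 8.5 case-mix adjusted expected deaths.
rho_hat = smr_mle(12, 8.5)      # about 1.41: more deaths than expected
theta_hat = math.log(rho_hat)   # the log(SMR) scale used for modeling
```

By construction the MLE maximizes the Poisson log-likelihood, so, e.g., poisson_loglik(12, 8.5, rho_hat) exceeds poisson_loglik(12, 8.5, 1.0).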

2.1 Single-year analyses

For single-year analyses, we assume that for year t, θkt ~iid Gt, k = 1, …, 3173. We use a year-specific normal prior (see the note after Eq. 2) and, for the single-year analyses, also the non-parametric maximum likelihood (NPML) prior. See Carlin and Louis (2008) and Paddock et al. (2006) for additional details and Appendix C for the estimation algorithm.

2.2 The longitudinal, AR(1) model

To model longitudinal correlation among (ρk0, ρk1, ρk2, ρk3), let ϕ = cor(θkt, θk(t+1)), with -1 < ϕ < 1. Then, use a normal prior on the θkt and a normal prior on Z(ϕ) = 0.5 log{(1 + ϕ)/(1 - ϕ)} in the hierarchical model,

  θk0 | ξ0, τ0² ~iid N(ξ0, τ0²)
  θkt | θk0, …, θk(t-1) ~ind N(ξt + ϕ(τt/τ(t-1))(θk(t-1) - ξ(t-1)), (1 - ϕ²)τt²), t = 1, 2, 3
  ξt ~iid N(0, V), Z(ϕ) ~ N(0, Vϕ)   (2)

The notation “iid” means independently and identically distributed and “ind” means independently distributed. The relation is first-order Markov because, though conditioning is on all prior θs, only θk(t-1) appears on the right-hand side of Eq. 2.

Marginally, for year t, θkt ~iid N(ξt, τt²), and setting ϕ = 0 produces four single-year analyses, each using the Liu et al. (2004) model with no borrowing of information over time. For ϕ > 0,


we have a standard AR(1) model on the latent log(SMR)s and the posterior distribution

combines evidence across dialysis centers within year and within dialysis center across years.
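To make the AR(1) structure concrete, here is a small simulation sketch of latent log(SMR)s with stationary marginals, assuming constant ξ and τ across years for simplicity; the values ξ = 0, τ = 0.3, ϕ = 0.6 are illustrative, not estimates from the USRDS data:

```python
import math
import random

def simulate_log_smrs(xi, tau, phi, n_years, rng):
    """Draw one provider's latent log(SMR) path: N(xi, tau^2) marginals each year,
    lag-1 correlation phi (innovation sd is tau * sqrt(1 - phi^2))."""
    theta = [rng.gauss(xi, tau)]
    innov_sd = tau * math.sqrt(1.0 - phi ** 2)
    for _ in range(n_years - 1):
        theta.append(xi + phi * (theta[-1] - xi) + rng.gauss(0.0, innov_sd))
    return theta

rng = random.Random(7)
# Many replicate providers, two years each, to check the implied lag-1 correlation.
paths = [simulate_log_smrs(0.0, 0.3, 0.6, 2, rng) for _ in range(50000)]
```

Averaging over the replicates, the year-2 values have standard deviation near τ = 0.3 and correlation near ϕ = 0.6 with the year-1 values, matching the stationary-marginal construction.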

2.3 Posterior sampling implementation and hyper-prior parameters

We implement a Gibbs sampler for model (2) with WinBUGS via the R package

R2WinBUGS, using the coda package to diagnose convergence (Spiegelhalter et al. 1999;

Gelman et al. 2006; Plummer et al. 2006). We use V = 10, μ = 0.01, α = 0.05, values that stabilize the simulation while allowing sufficient adaptation to the data. With V = 10, the a priori 95% probability interval for ξt is (-6.20, 6.20) [(0.002, 492.75) on the SMR scale]; the values for α and μ produce a distribution for τ² with center near 100, inducing large a priori variation for the θkt. For the AR(1) model, reported results are based on Vϕ = 0.2, which produces an a priori 95% probability interval for ϕ of (-0.70, 0.70). In a sensitivity analysis we also tried Vϕ = 2, which produced the a priori interval (-0.99, 0.99) and yielded results virtually identical to those based on the Vϕ = 0.2 hyper-prior. In both cases the data likelihood dominated the priors, as can also be seen in the shrinkage of τ² towards zero reported in Sect. 5.4. No strong posterior correlation is observed between ϕ and the τt².
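The reported prior intervals are easy to verify: the 95% interval of a mean-zero normal prior has half-width 1.96·√variance, and the Fisher z-transform Z(ϕ) is inverted by tanh. A quick check:

```python
import math

def normal_95_halfwidth(var):
    """Half-width of the central 95% interval of a N(0, var) prior."""
    return 1.96 * math.sqrt(var)

# xi_t prior, V = 10: interval on the log(SMR) scale, then exponentiated to the SMR scale.
xi_hi = normal_95_halfwidth(10)   # about 6.20
smr_hi = math.exp(xi_hi)          # a few hundred on the SMR scale

# phi prior via Z(phi) = 0.5*log((1 + phi)/(1 - phi)); invert with tanh.
phi_hi = normal_95_halfwidth(0.2)        # on the Z scale
phi_hi = math.tanh(phi_hi)               # about 0.70
phi_hi_wide = math.tanh(normal_95_halfwidth(2))   # about 0.99
```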

3 Loss function based ranking methods

Two general strategies for ranking are available. The preferred approach first computes the joint posterior distribution of the ranks and then uses it to produce estimates and uncertainty assessments, generally guided by a loss function appropriate for the analytic goals. This approach ensures that estimated ranks have desired properties, such as invariance under monotone transforms of the target parameters. The other approach is based on ordering estimates of target parameters (MLEs or posterior means) or on ordering statistics testing the

estimates of target parameters (MLEs or posterior means) or on ordering statistics testing the

null hypothesis that SMRk ≡ 1. If the posterior distributions of the target parameters are

stochastically ordered, then for a broad class of loss functions (estimation criteria) optimally

estimated ranks will not depend on the strategy. However, Lin et al. (2006) and others have

shown that estimates not derived from the distribution of the ranks can perform very poorly

and may not be invariant under monotone transformation of the target parameters. Producing

the joint posterior distribution of the ranks is computationally intensive, but most estimates

depend only on easily computable features.

We first define ranks and then specify candidate ranking methods. For clarity in defining ranks, we drop the index t and write

  Rk(ρ) = rank(ρk) = Σj I(ρj ≤ ρk),

with the smallest ρk having rank 1. Rank-based estimates are based on the joint posterior distribution of the Rk(ρ) and are invariant under monotone transform of the ρk.

3.1 Squared-error loss

Shen and Louis (1998) and Lockwood et al. (2002) study ranks that minimize the posterior

risk induced by squared error loss (SEL):

posterior expected ranks,

. It is minimized by the

(3)


where pr(·) stands for probability. The optimal mean squared error (MSE) in estimating the ranks is equal to the average posterior variance of the ranks. Generally, the R̄k are not integers; for optimal, distinct integer ranks, use R̂k = rank(R̄k).

In the notation that follows, generally we drop dependency on ρ (equivalently, on θ) and omit conditioning on Y. For example, Rk stands for Rk(θ) and R̄k stands for E[Rk(θ) | Y]. We present either ranks (Rk) or, equivalently, percentiles [Pk = Rk/(K + 1)], with percentiles providing an effective normalization. For example, Lockwood et al. (2002) show that the MSE for percentiles rapidly converges to a function of the ranking estimator and the posterior distributions of the parameters that does not depend on K.
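Given posterior samples, the posterior expected ranks R̄k and the integer ranks R̂k are straightforward to compute. A minimal pure-Python sketch, where draws holds S posterior samples of the K unit parameters (names are ours, not from the paper's posted code):

```python
def posterior_ranks(draws):
    """Posterior expected ranks and optimal integer ranks.
    draws: S posterior samples, each a length-K list of unit-level parameters.
    Smallest value gets rank 1; ties counted with <=."""
    S, K = len(draws), len(draws[0])
    rbar = [0.0] * K
    for sample in draws:
        for k in range(K):
            # Rank of unit k within this sample: #{j : rho_j <= rho_k}.
            rbar[k] += sum(1 for v in sample if v <= sample[k])
    rbar = [r / S for r in rbar]
    # Integer ranks: rank the posterior expected ranks themselves.
    order = sorted(range(K), key=lambda k: rbar[k])
    rhat = [0] * K
    for pos, k in enumerate(order):
        rhat[k] = pos + 1
    return rbar, rhat

# Toy example: three units, two posterior draws with a stable ordering.
rbar, rhat = posterior_ranks([[0.5, 1.0, 2.0], [0.6, 1.1, 1.9]])
```

With the stable ordering above, rbar is [1.0, 2.0, 3.0]; when draws disagree (e.g., two units swapping places across draws), the R̄k become non-integer and rank(R̄k) resolves them.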

3.2 Optimizing (above γ)/(below γ) classification errors

The USRDS uses percentiles to identify the best and the worst performers. Let γ be the fraction

of top performers among the total that we want to identify, 0 < γ < 1. A loss function designed

to address this inferential goal was proposed by Lin et al. (2006). The loss function (Eq. 4)

penalizes for misclassification and also imposes a distance penalty between estimated

percentiles and the cutoff γ.

  L_γ(P, P^est) = K⁻¹ Σk |Pk^est − γ| [I(Pk ≤ γ < Pk^est) + I(Pk^est ≤ γ < Pk)]   (4)

For ease of presentation, we have assumed that γK is an integer and so γ(K + 1) is not; it is then not necessary to distinguish between > and ≥. To minimize the posterior risk induced by (4), let p̄k(γ) = pr(Pk > γ | Y) and R̂k(γ) = rank{p̄k(γ)}. The posterior risk is minimized by the percentiles

  P̂k(γ) = R̂k(γ)/(K + 1)   (5)

Dominici et al. (1999) use this approach with γ = K/(K + 1), ordering by the probability of a unit having the largest latent attribute.
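The optimal (above γ)/(below γ) classification reduces to ordering the posterior probabilities p̄k(γ), which are easy to estimate from posterior samples. A sketch under the same toy setup as before (helper and variable names are ours):

```python
def classification_percentiles(draws, gamma):
    """Percentiles minimizing above/below-gamma classification loss:
    order units by pbar_k = pr(P_k > gamma | Y), estimated from posterior samples."""
    S, K = len(draws), len(draws[0])
    cutoff = gamma * (K + 1)   # P_k > gamma  <=>  R_k > gamma * (K + 1)
    pbar = [0.0] * K
    for sample in draws:
        for k in range(K):
            rank_k = sum(1 for v in sample if v <= sample[k])
            if rank_k > cutoff:
                pbar[k] += 1.0
    pbar = [p / S for p in pbar]
    order = sorted(range(K), key=lambda k: pbar[k])
    pct = [0.0] * K
    for pos, k in enumerate(order):
        pct[k] = (pos + 1) / (K + 1)
    return pct, pbar

# Toy example: unit 2 is always ranked on top, so only it ever exceeds gamma = 0.5.
pct, pbar = classification_percentiles([[0.5, 1.0, 2.0], [0.4, 1.2, 1.8]], 0.5)
```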

3.3 Equivalence of the P̂k(γ) and ordering posterior exceedance probabilities

Given an SMR threshold t, the ranks/percentiles induced by ordering the posterior probabilities that an SMR exceeds the threshold, pr(ρk > t | Y), allow us to make a connection between the P̂k(γ) and the substantive scale (in our application, SMR). Normand et al. (1997) rank providers based on these “exceedance probabilities” and Diggle et al. (2007) use them to identify areas with elevated disease rates. Lin et al. (2006) show that exceedance probability based percentiles are virtually identical to the P̂k(γ) when the threshold t is chosen as the γth percentile of the average posterior cumulative distribution function, i.e., t = Ḡ⁻¹(γ), where Ḡ(u) = K⁻¹ Σk pr(ρk ≤ u | Y). We denote the percentiles based on pr(ρk > t | Y) as P̂k*(t). In addition to providing a connection to the SMR scale, the P̂k*(t) are far easier to compute than are the P̂k(γ). Note that the P̂k*(t) are invariant under the monotone transform of the ρk.
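The exceedance-probability percentiles can be sketched the same way: the average posterior CDF Ḡ is just the empirical distribution of the pooled posterior draws, so Ḡ⁻¹(γ) is (approximately) their γ quantile. Names and the quantile convention below are ours:

```python
def exceedance_percentiles(draws, gamma):
    """Order providers by pr(rho_k > t | Y) with t = Gbar^{-1}(gamma), where Gbar is
    the average posterior CDF, approximated by pooling all posterior draws."""
    S, K = len(draws), len(draws[0])
    pooled = sorted(v for sample in draws for v in sample)
    # Empirical gamma quantile of the pooled draws approximates Gbar^{-1}(gamma).
    t = pooled[min(int(gamma * len(pooled)), len(pooled) - 1)]
    exceed = [sum(1 for sample in draws if sample[k] > t) / S for k in range(K)]
    order = sorted(range(K), key=lambda k: exceed[k])
    pct = [0.0] * K
    for pos, k in enumerate(order):
        pct[k] = (pos + 1) / (K + 1)
    return t, exceed, pct

# Toy example with gamma = 0.5: the threshold is near the pooled median draw.
t, exceed, pct = exceedance_percentiles([[0.5, 1.0, 2.0], [0.6, 1.1, 1.9]], 0.5)
```

Here only unit 2's draws exceed the threshold, so it alone receives a high exceedance probability, and the induced percentiles match those from the classification-loss ordering on this example.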
