Improving the reliability of physician performance assessment: Identifying the "physician effect" on quality and creating composite measures
ABSTRACT The proliferation of efforts to assess physician performance underscores the need to improve the reliability of physician-level quality measures.
Using diabetes care as a model, we address 2 key issues in creating reliable physician-level quality performance scores: estimating the physician effect on quality and creating composite measures.
Retrospective longitudinal observational study.
A national sample of physicians (n = 210) and their patients with diabetes (n = 7574) participating in the National Committee for Quality Assurance-American Diabetes Association's Diabetes Provider Recognition Program.
Using 11 diabetes process and intermediate outcome quality measures abstracted from the medical records of participants, we tested each measure for the magnitude of physician-level variation (the physician effect or "thumbprint"). We then combined measures with a substantial physician effect into a composite, physician-level diabetes quality score and tested its reliability.
We identified the lowest target values for each outcome measure for which there was a recognizable "physician thumbprint" (ie, intraclass correlation coefficient ≥ 0.30) to create a composite performance score. The internal consistency reliability (Cronbach's alpha) of the composite score, created by combining the process and outcome measures with an intraclass correlation coefficient ≥ 0.30, exceeded 0.80. The standard errors of the composite case-mix adjusted score were sufficiently small to discriminate those physicians scoring in the highest from those scoring in the lowest quartiles of the quality of care distribution with no overlap.
We conclude that the aggregation of well-tested quality measures that maximize the "physician effect" into a composite measure yields reliable physician-level quality of care scores for patients with diabetes.
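The two reliability statistics named in the abstract above, the intraclass correlation coefficient (ICC) and Cronbach's alpha, can be illustrated with a minimal Python sketch. The abstract does not say which estimators the authors used, so the one-way ANOVA ICC and the function names `icc_oneway` and `cronbach_alpha` are assumptions made for illustration only.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Internal consistency of a composite.

    rows: one list of k item scores per physician (hypothetical layout).
    """
    k = len(rows[0])
    # Sample variance of each item across physicians.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed (composite) score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_oneway(groups):
    """One-way ANOVA ICC: share of variance lying between physicians.

    groups: one list of patient-level values per physician.
    """
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Adjusted average group size, which handles unequal panel sizes.
    k0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)
```

Under this sketch, a measure would pass the abstract's "thumbprint" screen when `icc_oneway(...)` is at least 0.30, and the composite built from the retained measures would then be checked with `cronbach_alpha(...)`.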
- Source: available from Eric S Holmboe
- "An ICC close to one implies a reliable physician 'thumbprint' in that the between-physician variation is relatively larger than the within-physician variation; measures having physician-level reliability >0.85 can be considered sufficiently reliable for comparing physicians scoring over or under a threshold value (Kaplan et al. 2009). We estimated the ICC using a multilevel random effect logistic model (NLMIXED procedure, SAS version 9.1.3)."
ABSTRACT: To investigate the feasibility, reliability, and validity of comprehensively assessing physician-level performance in ambulatory practice. Ambulatory-based general internists in 13 states participated in the assessment. We assessed physician-level performance, adjusted for patient factors, on 46 individual measures, an overall composite measure, and composite measures for chronic, acute, and preventive care. Between- versus within-physician variation was quantified by intraclass correlation coefficients (ICC). External validity was assessed by correlating performance on a certification exam. Medical records for 236 physicians were audited for seven chronic and four acute care conditions, and six age- and gender-appropriate preventive services. Performance on the individual and composite measures varied substantially within (range 5-86 percent compliance on 46 measures) and between physicians (ICC range 0.12-0.88). Reliabilities for the composite measures were robust: 0.88 for chronic care and 0.87 for preventive services. Higher certification exam scores were associated with better performance on the overall (r = 0.19; p < .01), chronic care (r = 0.14, p = .04), and preventive services composites (r = 0.17, p = .01). Our results suggest that reliable and valid comprehensive assessment of the quality of chronic and preventive care can be achieved by creating composite measures and by sampling feasible numbers of patients for each condition.
Health Services Research 12/2010; 45(6 Pt 2):1912-33. DOI:10.1111/j.1475-6773.2010.01160.x · 2.49 Impact Factor
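The entry above estimates the ICC from a multilevel random-effect logistic model and treats physician-level reliability as a separate quantity. A minimal sketch of how the two are commonly linked follows. It assumes the standard latent-variable approximation (patient-level residual variance fixed at π²/3) and the Spearman-Brown formula. Neither is confirmed by the entry as the authors' exact method, and all function names are hypothetical.

```python
import math


def icc_logistic(var_physician):
    """ICC on the latent logit scale for a random-intercept logistic model.

    var_physician: estimated between-physician (random intercept) variance.
    The patient-level residual variance is taken as pi^2 / 3 (assumption).
    """
    return var_physician / (var_physician + math.pi ** 2 / 3)


def reliability_of_mean(icc, n_patients):
    """Spearman-Brown reliability of a physician's score over n patients."""
    return n_patients * icc / (1 + (n_patients - 1) * icc)


def patients_needed(icc, target=0.85):
    """Smallest panel size whose reliability reaches the target."""
    return math.ceil(target * (1 - icc) / (icc * (1 - target)))
```

For example, with an ICC of 0.2, `patients_needed(0.2)` returns 23 patients per physician for a reliability of at least 0.85, which shows why a modest ICC can still yield a reliable score when enough patients are sampled.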
- "reach targets. Risk adjustment models developed to allow 'fair' comparisons of performance have so far proven inadequate (Zhang et al. 2000; Thompson et al. 2005; Kaplan et al. 2009). Although few have looked at the unintended consequences of publicly reporting intermediate outcomes, evidence from other clinical areas raises concerns about the potential for risk selection to avoid patients likely to have poor outcomes (Werner, Asch, and Polsky 2005)."
ABSTRACT: To evaluate the attainability of tight risk factor control targets for three diabetes risk factors and to assess the degree of polypharmacy required. National Health and Nutrition Examination Survey-III. We simulated a strategy of "treating to targets," exposing subjects to a battery of treatments until low-density lipoprotein (LDL)-cholesterol (100 mg/dL), hemoglobin A1c (7 percent), and blood pressure (130/80 mm Hg) targets were achieved or until all treatments had been exhausted. Regimens included five statins of increasing potency, four A1c-lowering therapies, and eight steps of antihypertensive therapy. We selected parameter estimates from placebo-controlled trials and meta-analyses. Under ideal efficacy conditions, 77, 64, and 58 percent of subjects achieved the LDL, A1c, and blood pressure targets, respectively. Successful control depended highly on a subject's baseline number of treatments. Using the least favorable assumptions of treatment tolerance, success rates were 11-17 percentage points lower. Approximately 57 percent of subjects required five or more medication classes. A significant proportion of people with diabetes will fail to achieve targets despite using high doses of multiple, conventional treatments. These findings raise concerns about the feasibility and polypharmacy burden needed for tight risk factor control, and the use of measures of tight control to assess the quality of care for diabetes.
Health Services Research 04/2010; 45(2):437-56. DOI:10.1111/j.1475-6773.2009.01075.x · 2.49 Impact Factor
- ABSTRACT: To evaluate measurement of physician quality performance, which is increasingly used by health plans as the basis of quality improvement, network design, and financial incentives, despite concerns about data and methodological challenges. Evaluation of health plan administrative claims and enrollment data. Using administrative data from 9 health plans, we analyzed results for 27 well-accepted quality measures and evaluated how many quality events (patients eligible for a measure) were available per primary care physician and how different approaches for attributing patients to physicians affect the number of quality events per physician. Fifty-seven percent of primary care physicians had at least 1 patient who was eligible for at least 1 of the selected quality measures. Most physicians had few quality events for any single measure. As an example, for a measure evaluating appropriate treatment for children with upper respiratory tract infections, physicians on average had 14 quality events when care was attributed to physicians if they saw the patient at least once in the measurement year. The mean number of quality events dropped to 9 when attribution required that the physician provide care in at least 50% of a patient's visits. Few physicians had more than 30 quality events for any given measure. Available administrative data for a single health plan may provide insufficient information for benchmarking performance for individual physicians. Efforts are needed to develop consensus on assigning measure accountability and to expand information available for each physician, including accessing electronic clinical data, exploring composite measures of performance, and aggregating data across public and private health plans.
The American journal of managed care 02/2009; 15(1):67-72. · 2.17 Impact Factor
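The effect of the two attribution rules described in the entry above (at least one visit versus at least 50% of a patient's visits) can be sketched in a few lines of Python. The visit-record layout and the function name `attribute` are hypothetical. The study's actual attribution logic is not given here, so this is only an illustration of why a stricter rule lowers the number of quality events per physician.

```python
from collections import Counter, defaultdict


def attribute(visits, min_share=0.0):
    """Count quality events (attributed eligible patients) per physician.

    visits: iterable of (patient_id, physician_id) pairs, one per visit,
            restricted to patients eligible for the measure (assumption).
    min_share: minimum fraction of a patient's visits a physician must
               account for; 0.0 means any single visit is enough.
    """
    per_patient = defaultdict(Counter)
    for patient, physician in visits:
        per_patient[patient][physician] += 1

    events = Counter()
    for counts in per_patient.values():
        total = sum(counts.values())
        for physician, n in counts.items():
            if n / total >= min_share:
                events[physician] += 1
    return events


# Toy data invented for illustration only.
visits = [("p1", "A"), ("p1", "A"), ("p1", "B"),
          ("p2", "A"), ("p2", "B"),
          ("p3", "B")]
any_visit = attribute(visits)            # one visit is enough
half_visits = attribute(visits, 0.5)     # at least 50% of visits
```

In this toy data, physician B is credited with three patients under the any-visit rule but only two under the 50% rule, mirroring the direction of the drop reported in the entry.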