Improving the Reliability of Physician Performance Assessment: Identifying the “Physician Effect” on Quality and Creating Composite Measures

Center for Health Policy Research and Department of Medicine, School of Medicine, University of California, Irvine, California 92697, USA.
Medical care (Impact Factor: 3.23). 04/2009; 47(4):378-87. DOI: 10.1097/MLR.0b013e31818dce07
Source: PubMed


The proliferation of efforts to assess physician performance underscores the need to improve the reliability of physician-level quality measures.
To address 2 key issues in creating reliable physician-level quality performance scores, using diabetes care as a model: estimating the physician effect on quality and creating composite measures.
Retrospective longitudinal observational study.
A national sample of physicians (n = 210) and their patients with diabetes (n = 7574) participating in the National Committee for Quality Assurance-American Diabetes Association's Diabetes Provider Recognition Program.
Using 11 diabetes process and intermediate outcome quality measures abstracted from the medical records of participants, we tested each measure for the magnitude of physician-level variation (the physician effect or "thumbprint"). We then combined measures with a substantial physician effect into a composite, physician-level diabetes quality score and tested its reliability.
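The "physician effect" test described above can be illustrated with an intraclass correlation coefficient (ICC), which compares between-physician variation with within-physician variation. The abstract does not say which estimator the authors used, so the following is only a minimal sketch that assumes a one-way ANOVA estimator; the function name `icc_oneway` and the toy data are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way ANOVA ICC for a list of per-physician lists of patient scores.

    Assumption: a simple ANOVA estimator, not necessarily the method used
    in the study. Handles unequal numbers of patients per physician.
    """
    g = len(groups)                      # number of physicians
    sizes = [len(x) for x in groups]     # patients per physician
    n_total = sum(sizes)
    grand = sum(sum(x) for x in groups) / n_total

    # Between-physician and within-physician sums of squares.
    ss_between = sum(n * (mean(x) - grand) ** 2 for n, x in zip(sizes, groups))
    ss_within = sum((v - mean(x)) ** 2 for x in groups for v in x)

    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n_total - g)

    # Effective group size; equals the common size when groups are balanced.
    k0 = (n_total - sum(n * n for n in sizes) / n_total) / (g - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Hypothetical data: 1 = patient met the target, 0 = did not.
toy = [[1, 1, 1, 0], [0, 0, 1, 0], [1, 1, 1, 1]]
icc = icc_oneway(toy)
print(f"ICC = {icc:.2f}; substantial physician effect: {icc >= 0.30}")
```

A measure whose ICC falls below the chosen cutoff would, under this reading, carry little physician "thumbprint" and would be left out of the composite.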
We identified the lowest target values for each outcome measure for which there was a recognizable "physician thumbprint" (ie, intraclass correlation coefficient ≥0.30) to create a composite performance score. The internal consistency reliability (Cronbach's alpha) of the composite score, created by combining the process and outcome measures with an intraclass correlation coefficient ≥0.30, exceeded 0.80. The standard errors of the composite case-mix adjusted score were sufficiently small to discriminate those physicians scoring in the highest from those scoring in the lowest quartiles of the quality of care distribution with no overlap.
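Cronbach's alpha, the internal consistency statistic reported above, can be computed from a table with one row per physician and one column per quality measure. The sketch below uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the row totals); the function name `cronbach_alpha` and the example scores are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per physician).

    Each row holds that physician's score on each of the k measures.
    Uses sample variances throughout.
    """
    k = len(rows[0])
    # Variance of each measure across physicians.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed (composite) score across physicians.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical physician-level rates on three measures.
scores = [
    [0.90, 0.80, 0.85],
    [0.60, 0.55, 0.70],
    [0.75, 0.70, 0.65],
    [0.40, 0.50, 0.45],
]
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises when the component measures move together across physicians, which is why restricting the composite to measures with a clear physician effect would be expected to help.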
We conclude that the aggregation of well-tested quality measures that maximize the "physician effect" into a composite measure yields reliable physician-level quality of care scores for patients with diabetes.

    • "As a result, the overall scores enable more rigorous differentiation of providers, which is an important finding for future quality assessment of healthcare providers. In line with earlier research, the discriminatory power of the overall scores also decreases the number of responses required to obtain reliable scores, compared to individual indicators [36-38]. The same applies, although to a lesser extent, to the global rating. "
    ABSTRACT: Global ratings of healthcare by patients are a popular way of summarizing patients' experiences. Summary scores can be used for comparing healthcare provider performance and provider rankings. As an alternative, overall scores from actual patient experiences can be constructed as summary scores. This paper addresses the statistical and practical characteristics of overall scores as an alternative to a global rating in summarizing patient survey results. Data from a 2010 patient experience survey for approximately 12,000 nursing home residents (7.5% of all Dutch nursing home residents at the time) from 464 nursing homes in the Netherlands (25% of the Dutch nursing homes) was used. Data was collected through specifically designed standardized interview surveys. The respondents' scores for 15 established quality indicators (or composites) for nursing home care were used to calculate overall scores for each nursing home, using four different strategies. The characteristics of the overall scores were compared against each other and with the respondents' global rating. The individual indicators showed stronger associations with each of the four overall strategies than with the global ratings. Furthermore, the dispersion of the overall scores across nursing homes was greater. Differences between overall scores appeared limited. Overall scores proved more valid than global ratings as a summary of the indicator scores, and also showed more pronounced differences between nursing homes. Because of the limited statistical differences between the strategies, and for practical reasons, a straightforward averaging of quality indicator scores may be preferred as an overall score.
    Full-text · Article · Nov 2013 · BMC Health Services Research
    • ")). An ICC close to one implies a reliable physician 'thumbprint' in that the between-physician variation is relatively larger than the within-physician variation; measures having physician-level reliability >0.85 can be considered sufficiently reliable for comparing physicians scoring over or under a threshold value (Kaplan et al. 2009). We estimated the ICC using a multilevel random effect logistic model (NLMIXED procedure, SAS version 9.1.3). "
    ABSTRACT: To investigate the feasibility, reliability, and validity of comprehensively assessing physician-level performance in ambulatory practice. Ambulatory-based general internists in 13 states participated in the assessment. We assessed physician-level performance, adjusted for patient factors, on 46 individual measures, an overall composite measure, and composite measures for chronic, acute, and preventive care. Between- versus within-physician variation was quantified by intraclass correlation coefficients (ICC). External validity was assessed by correlating performance on a certification exam. Medical records for 236 physicians were audited for seven chronic and four acute care conditions, and six age- and gender-appropriate preventive services. Performance on the individual and composite measures varied substantially within (range 5-86 percent compliance on 46 measures) and between physicians (ICC range 0.12-0.88). Reliabilities for the composite measures were robust: 0.88 for chronic care and 0.87 for preventive services. Higher certification exam scores were associated with better performance on the overall (r = 0.19; p<.01), chronic care (r = 0.14, p = .04), and preventive services composites (r = 0.17, p = .01). Our results suggest that reliable and valid comprehensive assessment of the quality of chronic and preventive care can be achieved by creating composite measures and by sampling feasible numbers of patients for each condition.
    Full-text · Article · Dec 2010 · Health Services Research
    • "reach targets. Risk adjustment models developed to allow 'fair' comparisons of performance have so far proven inadequate (Zhang et al. 2000; Thompson et al. 2005; Kaplan et al. 2009). Although few have looked at the unintended consequences of publicly reporting intermediate outcomes, evidence from other clinical areas raises concerns about the potential for risk selection to avoid patients likely to have poor outcomes (Werner, Asch, and Polsky 2005). "
    ABSTRACT: To evaluate the attainability of tight risk factor control targets for three diabetes risk factors and to assess the degree of polypharmacy required. National Health and Nutrition Examination Survey-III. We simulated a strategy of "treating to targets," exposing subjects to a battery of treatments until low-density lipoprotein (LDL)-cholesterol (100 mg/dL), hemoglobin A1c (7 percent), and blood pressure (130/80 mm Hg) targets were achieved or until all treatments had been exhausted. Regimens included five statins of increasing potency, four A1c-lowering therapies, and eight steps of antihypertensive therapy. We selected parameter estimates from placebo-controlled trials and meta-analyses. Under ideal efficacy conditions, 77, 64, and 58 percent of subjects achieved the LDL, A1c, and blood pressure targets, respectively. Successful control depended highly on a subject's baseline number of treatments. Using the least favorable assumptions of treatment tolerance, success rates were 11-17 percentage points lower. Approximately 57 percent of subjects required five or more medication classes. A significant proportion of people with diabetes will fail to achieve targets despite using high doses of multiple, conventional treatments. These findings raise concerns about the feasibility and polypharmacy burden needed for tight risk factor control, and the use of measures of tight control to assess the quality of care for diabetes.
    Full-text · Article · Apr 2010 · Health Services Research