Shiloach M, Frencher SK Jr, Steeger JE, Rowell KS, Bartzokis K, Tomeh MG, Richards KE, Ko CY, Hall BL. Toward robust information: Data quality and inter-rater reliability in the American College of Surgeons National Surgical Quality Improvement Program.

Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL 60611, USA.
Journal of the American College of Surgeons (Impact Factor: 5.12). 01/2010; 210(1):6-16. DOI: 10.1016/j.jamcollsurg.2009.09.031
Source: PubMed


Data used for evaluating the quality of medical care need to be highly reliable to ensure valid quality assessment and benchmarking. The American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) has continually emphasized the collection of highly reliable clinical data through its program infrastructure.
We provide a detailed description of the mechanisms used in ACS NSQIP to assure collection of high-quality data, including training of data collectors (surgical clinical reviewers) and ongoing audits of data reliability. For the 2005 through 2008 calendar years, inter-rater reliability was calculated overall and for individual variables using percentages of agreement between the data collector and the auditor. Variables with more than 5% disagreement are flagged for educational efforts aimed at improving collection accuracy. Cohen's kappa was estimated for selected variables from the 2007 audit year.
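For reference, the two agreement statistics named above have the following standard definitions; the notation here is ours, not the article's. With n audited values of a variable, the observed percent agreement p_o and Cohen's kappa are:

```latex
% Observed agreement between the site data collector and the auditor
% over n audited values of one variable:
\[
p_o = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{\, x_i^{\mathrm{collector}} = x_i^{\mathrm{auditor}} \,\}
\]
% Chance-expected agreement from each rater's marginal category frequencies
% (row and column totals n_{k.} and n_{.k} over K categories), and Cohen's kappa:
\[
p_e = \sum_{k=1}^{K} \frac{n_{k\cdot}}{n}\cdot\frac{n_{\cdot k}}{n},
\qquad
\kappa = \frac{p_o - p_e}{1 - p_e}
\]
```

By the commonly used Landis and Koch convention, kappa values of 0.61 to 0.80 indicate substantial agreement and 0.81 to 1.00 almost perfect agreement, which is the sense in which those terms are used in the results below.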
Inter-rater reliability audits show that the overall disagreement rate across variables has fallen from 3.15% in 2005 (the first year of public enrollment in ACS NSQIP) to 1.56% in 2008. In addition, disagreement levels for individual variables have improved continually: 26 individual variables showed more than 5% disagreement in 2005, compared with only 2 such variables in 2008.
The ACS NSQIP has implemented training and audit procedures for its hospital participants that are highly effective in collecting robust data. Audit results show that data have been reliable since the program's inception and that reliability has improved every year.
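As a rough illustration of the audit arithmetic described in the methods above (a hypothetical sketch, not the program's actual software; the variable names and example data are invented), per-variable disagreement rates and the 5% flagging threshold could be computed as follows:

```python
from collections import defaultdict

# Hypothetical audit records: (variable_name, collector_value, auditor_value),
# one tuple per variable per audited case. Example data is invented.
audit_records = [
    ("diabetes", "yes", "yes"),
    ("diabetes", "no", "yes"),   # one disagreement
    ("asa_class", "3", "3"),
    ("asa_class", "2", "2"),
]

def disagreement_rates(records, threshold=0.05):
    """Return per-variable disagreement rates, flagged variables, and the overall rate."""
    totals = defaultdict(int)
    disagreements = defaultdict(int)
    for variable, collector_value, auditor_value in records:
        totals[variable] += 1
        if collector_value != auditor_value:
            disagreements[variable] += 1
    rates = {v: disagreements[v] / totals[v] for v in totals}
    flagged = sorted(v for v, rate in rates.items() if rate > threshold)
    overall = sum(disagreements.values()) / sum(totals.values())
    return rates, flagged, overall

rates, flagged, overall = disagreement_rates(audit_records)
print(f"overall disagreement rate: {overall:.2%}")   # 25.00% for the toy data
print("variables flagged for > 5% disagreement:", flagged)
```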

Cited by:
    • "The audits are also measures of data collection quality. Program audits reported low disagreement rates (between the auditor and site data collector) relatively early in program development (3.15% in 2005) and this rate has continued to drop (1.56% in 2008).15 "
    ABSTRACT: Postoperative adverse events occur all too commonly and contribute greatly to our large and increasing healthcare costs. Surgeons, as well as hospitals, need to know their own outcomes in order to recognise areas that need improvement before they can work towards reducing complications. In the USA, the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) collects clinical data that provide benchmarks for providers and hospitals. This review summarises the history of ACS NSQIP and its components, and describes the evidence that feeding outcomes back to providers, along with real-time comparisons with other hospital rates, leads to quality improvement, better patient outcomes, cost savings and overall improved patient safety. The potential harms and limitations of the program are discussed.
    BMJ quality & safety 04/2014; 23(7). DOI:10.1136/bmjqs-2013-002223 · 3.99 Impact Factor
    • "Surgical Quality Improvement Program (NSQIP) [32] "
    ABSTRACT: In response to increasing use of lumbar fusion for improving back pain, despite unclear efficacy, particularly among injured workers, some insurers have developed limited coverage policies. Washington State's workers' compensation (WC) program requires imaging confirmation of instability and limits initial fusions to a single level. In contrast, California requires coverage if a second opinion supports surgery, allows initial multilevel fusion, and provides additional reimbursement for surgical implants. There are no studies that compare population-level effects of these policy differences on utilization, costs, and safety of lumbar fusion.
    The spine journal: official journal of the North American Spine Society 11/2013; 14(7). DOI:10.1016/j.spinee.2013.08.018 · 2.43 Impact Factor
    • "Variables with the lowest kappas were: do not resuscitate (DNR) status (0.32), history of angina (0.32), rest pain (0.38) and bleeding disorder (0.38). The percent agreement for these variables ranged from 94-99%, showing that, as in our study, low kappas may arise from high levels of chance agreement in studies of the reliability of medical record review [12]. "
    ABSTRACT: Background: Medical record review (MRR) is one of the most commonly used research methods in clinical studies because it provides rich clinical detail. However, because MRR involves subjective interpretation of information found in the medical record, it is critically important to understand the reproducibility of data obtained from MRR. Furthermore, because medical record review is both technically demanding and time intensive, it is important to establish whether trained research staff with no clinical training can abstract medical records reliably. Methods: We assessed the reliability of abstraction of medical record information in a sample of patients who underwent total knee replacement (TKR) at a referral center. An orthopedic surgeon instructed two research coordinators (RCs) in the abstraction of inpatient medical records and operative notes for patients undergoing primary TKR. The two RCs and the surgeon each independently reviewed 75 patients’ records, and one RC reviewed the records twice. Agreement was assessed using the proportion of items on which reviewers agreed and the kappa statistic. Results: The kappa for agreement between the surgeon and each RC ranged from 0.59 to 1 for one RC and 0.49 to 1 for the other; the percent agreement ranged from 82% to 100% for one RC and 70% to 100% for the other. The repeated abstractions by the same RC showed high intra-rater agreement, with kappas ranging from 0.66 to 1 and percent agreement ranging from 97% to 100%. Inter-rater agreement between the two RCs was moderate, with kappa ranging from 0.49 to 1 and percent agreement ranging from 76% to 100%. Conclusion: The MRR method used in this study showed excellent reliability for abstraction of information that had low technical complexity and moderate to good reliability for information that had greater complexity. Overall, these findings support the use of non-surgeons to abstract surgical data from operative notes.
    BMC Musculoskeletal Disorders 06/2013; 14(1):181. DOI:10.1186/1471-2474-14-181 · 1.72 Impact Factor
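The quoted passage above notes that variables such as DNR status can show 94-99% raw agreement yet kappas near 0.3. That pattern reflects how Cohen's kappa discounts chance agreement: when one response dominates, the agreement expected by chance is already high, so even a handful of disagreements drives kappa down. A minimal numerical sketch with invented counts (not data from either study):

```python
def percent_agreement_and_kappa(a, b, c, d):
    """Agreement statistics for a 2x2 table of two raters' yes/no calls.

    a = both raters say "yes", d = both say "no",
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    p_o = (a + d) / n                                    # observed agreement
    p_yes1, p_yes2 = (a + b) / n, (a + c) / n            # each rater's "yes" rate
    p_e = p_yes1 * p_yes2 + (1 - p_yes1) * (1 - p_yes2)  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa

# Rare condition (e.g. a "do not resuscitate" flag): almost every case is "no",
# so the raters agree on most cases largely by chance, and kappa stays low.
p_o, kappa = percent_agreement_and_kappa(a=1, b=2, c=2, d=95)
print(f"percent agreement = {p_o:.0%}, kappa = {kappa:.2f}")  # 96%, 0.31
```

With 96 of 100 paired ratings in agreement, kappa is only about 0.31 because roughly 94% agreement would already be expected by chance alone for these marginal frequencies.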