Comparison of Checklist and Anchored Global Rating Instruments for Performance Rating of Simulated Pediatric Emergencies
ABSTRACT To compare the psychometric performance of two rating instruments used to assess trainee performance in three clinical scenarios.
This study was part of a two-phase, randomized trial with a wait-list control condition assessing the effectiveness of a pediatric emergency medicine curriculum targeting general emergency medicine residents. Residents received 6 hours of instruction either before or after the first assessment. Separate pairs of raters completed either a dichotomous checklist for each of three cases or the Global Performance Assessment Tool (GPAT), an anchored multidimensional scale. A fully crossed person×rater×case generalizability study was conducted. The effect of training year on performance was assessed using multivariate analysis of variance.
The person and person×case components accounted for most of the score variance for both instruments. Using either instrument, scores demonstrated a small but significant increase as training level increased when analyzed using a multivariate analysis of variance. The inter-rater reliability coefficient was >0.9 for both instruments.
We demonstrate that our checklist and anchored global rating instrument performed in a psychometrically similar fashion with high reliability. As long as proper attention is given to instrument design and testing and to rater training, checklists and anchored assessment scales can produce reproducible data for a given population of subjects. The validity of the data arising from either instrument type must be assessed rigorously and with a focus, when practicable, on patient care outcomes.
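For readers unfamiliar with generalizability theory, the fully crossed person×rater×case design mentioned above is conventionally summarized by the variance decomposition below. This is a sketch in standard textbook notation, not taken from the study itself: the subscripts p, r and c (person, rater, case) and the counts n_r and n_c are assumed symbols, and the abstract does not state which coefficient the authors reported.

```latex
% Observed-score variance in a fully crossed p x r x c random-effects design
\sigma^2(X_{prc}) = \sigma^2_p + \sigma^2_r + \sigma^2_c
                  + \sigma^2_{pr} + \sigma^2_{pc} + \sigma^2_{rc}
                  + \sigma^2_{prc,e}

% Generalizability coefficient for relative decisions,
% averaging over n_r raters and n_c cases
E\rho^2 = \frac{\sigma^2_p}
               {\sigma^2_p + \dfrac{\sigma^2_{pr}}{n_r}
                           + \dfrac{\sigma^2_{pc}}{n_c}
                           + \dfrac{\sigma^2_{prc,e}}{n_r\, n_c}}
```

Under this reading, a large person×case component (as reported in the results) lowers the coefficient unless the number of cases n_c is increased, whereas adding raters reduces only the person×rater and residual terms.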
Available from: José Antonio Iglesias-Vázquez
ABSTRACT: Introduction. The aims of this study are to (a) assess the quality of clinical management during a simulated scenario of acute supraventricular tachycardia (SVT) by means of a structured task-based checklist and (b) detect pitfalls and grey areas where reinforcement in training may be needed.
Material and methods. We systematically reviewed SVT simulated scenarios during simulation courses between June 2008 and April 2010. Three scenarios were programmed using the SimBaby® simulation system, and included stable SVT (S-SVT), stable progressing to unstable SVT (SU-SVT) and unstable SVT (U-SVT). Scenarios were evaluated by means of an 18-task checklist based on ILCOR international recommendations.
Results. A total of 45 scenarios were assessed with the participation of 167 paediatricians, including 15 S-SVT, 25 SU-SVT and 5 U-SVT scenarios. Out of a total of 551 possible tasks, 328 (59.5%) were completed correctly. The mean percentage of correct tasks per scenario was 63.4 (16.7) for S-SVT, 47.8 (20.3) for SU-SVT and 38.6 (31) for U-SVT (p = 0.028). There were no significant differences between primary care paediatricians and hospital paediatricians. Most of the participants correctly identified non-sinus rhythm as SVT. However, important pitfalls were observed, including failure to identify haemodynamic instability in 20 out of 43 (48%) cases, an incorrect dose of adenosine in 18 out of 39 (48%), incorrect adenosine administration in 23 out of 39 (59%), and non-recognition of the indication for emergent cardioversion in 15 out of 31 (48%).
Conclusions. Paediatricians are able to diagnose SVT correctly, but need to improve their skills in treatment. Systematic analysis of clinical performance in a simulated scenario allows the identification of strengths, as well as weak points, where reinforcement is needed.
Anales de Pediatría 09/2012; 77(3):165–170. DOI:10.1016/j.anpedi.2012.01.020 · 0.72 Impact Factor
ABSTRACT: Context. The relative advantages and disadvantages of checklists and global rating scales (GRSs) have long been debated. To compare the merits of these scale types, we conducted a systematic review of the validity evidence for checklists and GRSs in the context of simulation-based assessment of health professionals.
Methods. We conducted a systematic review of multiple databases including MEDLINE, EMBASE and Scopus to February 2013. We selected studies that used both a GRS and checklist in the simulation-based assessment of health professionals. Reviewers working in duplicate evaluated five domains of validity evidence, including correlation between scales and reliability. We collected information about raters, instrument characteristics, assessment context, and task. We pooled reliability and correlation coefficients using random-effects meta-analysis.
Results. We found 45 studies that used a checklist and GRS in simulation-based assessment. All studies included physicians or physicians in training; one study also included nurse anaesthetists. Topics of assessment included open and laparoscopic surgery (n = 22), endoscopy (n = 8), resuscitation (n = 7) and anaesthesiology (n = 4). The pooled GRS–checklist correlation was 0.76 (95% confidence interval [CI] 0.69–0.81, n = 16 studies). Inter-rater reliability was similar between scales (GRS 0.78, 95% CI 0.71–0.83, n = 23; checklist 0.81, 95% CI 0.75–0.85, n = 21), whereas GRS inter-item reliabilities (0.92, 95% CI 0.84–0.95, n = 6) and inter-station reliabilities (0.80, 95% CI 0.73–0.85, n = 10) were higher than those for checklists (0.66, 95% CI 0–0.84, n = 4 and 0.69, 95% CI 0.56–0.77, n = 10, respectively). Content evidence for GRSs usually referenced previously reported instruments (n = 33), whereas content evidence for checklists usually described expert consensus (n = 26). Checklists and GRSs usually had similar evidence for relations to other variables.
Conclusions. Checklist inter-rater reliability and trainee discrimination were more favourable than suggested in earlier work, but each task requires a separate checklist. Compared with the checklist, the GRS has higher average inter-item and inter-station reliability, can be used across multiple tasks, and may better capture nuanced elements of expertise.
Medical Education 02/2015; 49(2). DOI:10.1111/medu.12621 · 3.62 Impact Factor
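The review above reports pooling correlation coefficients with random-effects meta-analysis but does not name the estimator. As a rough illustration only, the sketch below assumes the common DerSimonian-Laird approach applied to Fisher z-transformed correlations. The function name `pool_correlations` and any inputs passed to it are hypothetical and are not drawn from the review.

```python
import math


def pool_correlations(rs, ns):
    """Pool correlations with a DerSimonian-Laird random-effects model.

    ASSUMPTION: Fisher z transform plus DerSimonian-Laird tau^2; the review
    does not state which estimator it used.
    rs: per-study correlation coefficients (each strictly between -1 and 1)
    ns: per-study sample sizes (each greater than 3)
    Returns (pooled_r, (ci_low, ci_high), tau2).
    """
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need at least two studies with matching sizes")
    if any(n <= 3 for n in ns):
        raise ValueError("each sample size must exceed 3")

    zs = [math.atanh(r) for r in rs]        # Fisher z transform
    vs = [1.0 / (n - 3) for n in ns]        # sampling variance of each z
    w = [1.0 / v for v in vs]               # fixed-effect weights
    sw = sum(w)

    # Fixed-effect mean and Cochran's Q heterogeneity statistic
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))

    # DerSimonian-Laird between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)

    # Random-effects weights, pooled z, and a 95% normal-theory interval
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se))

    # Back-transform from z to the correlation scale
    return math.tanh(z_re), ci, tau2
```

Calling `pool_correlations([0.6, 0.8, 0.9], [30, 40, 50])` with these made-up inputs returns a pooled correlation between the smallest and largest input, together with its interval and the estimated between-study variance. Working on the z scale keeps the back-transformed interval inside (-1, 1).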
ABSTRACT: The process of developing checklists to rate clinical performance is essential for ensuring their quality; thus, the authors applied an integrative approach for designing checklists that evaluate clinical performance. The approach consisted of five predefined steps (taken 2012-2013). Step 1: On the basis of the relevant literature and their clinical experience, the authors drafted a preliminary checklist. Step 2: The authors sent the draft checklist to five experts who reviewed it using an adapted Delphi technique. Step 3: The authors devised three scoring categories for items after pilot testing. Step 4: To ensure the changes made after pilot testing were valid, the checklist was submitted to an additional Delphi review round. Step 5: To weight items needed for accurate performance assessment, 10 pediatricians rated all checklist items in terms of their importance on a scale from 1 (not important) to 5 (essential). The authors have illustrated their approach using the example of a checklist for a simulation scenario of infant septic shock. The five-step approach resulted in a valid, reliable tool and proved to be an effective method to design evaluation checklists. It resulted in 33 items, most consisting of three scoring categories. This approach integrates published evidence and the knowledge of domain experts. A robust development process is a necessary prerequisite of valid performance checklists. Establishing a widely recognized standard for developing evaluation checklists will likely support the design of appropriate measurement tools and move the field of performance assessment in health care forward.
Academic medicine: journal of the Association of American Medical Colleges 05/2014; 89(7). DOI:10.1097/ACM.0000000000000289 · 3.47 Impact Factor