Interrater and intrarater reliability of the new Kuntz et al. deformity classification system.
ABSTRACT: Kuntz et al. recently introduced a new system for classifying spinal deformities, derived from age-dependent deviations from the neutral upright spinal alignment. The aim of this study was to determine the interobserver and intraobserver reliability of the new Kuntz et al. system for classifying scoliosis.
Fifty consecutive patients were evaluated. Three observers independently assigned a major structural curve, minor structural curve, curve type, apical vertebral rotation, spinal balance, and pelvic alignment to each curve, following the guidelines described by Kuntz et al. Assignment of the curves was repeated 4 weeks later, with the curves presented in a different, blinded order. The Kendall W and Holsti agreement coefficients were used to determine interobserver and intraobserver agreement.
The mean intraobserver agreement across all parameters was 0.85 (range, 0.28-1.0), and the mean Kendall W coefficient was 0.89 (range, 0.5-0.97), demonstrating almost perfect reliability. The interobserver agreement averaged 0.7 (range, 0.251-1.0), and the mean Kendall W coefficient was 0.67 (range, 0.19-1.0), demonstrating substantial reliability. The average time for classification of 1 curve was approximately 8.4 minutes.
The new Kuntz et al. deformity classification system is comparable to the Lenke et al. system in terms of reliability. However, the Kuntz et al. classification system provides no recommendations for surgical intervention. It is also more complex and time-consuming, and may therefore be of limited value in daily clinical practice.
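For readers unfamiliar with Kendall's coefficient of concordance W, the agreement statistic reported above, the following is a minimal pure-Python sketch of its standard formula, W = 12S / (m²(n³ − n)), for m raters ranking n items. This is an illustration only, not the computation used in the study:

```python
def kendalls_w(ratings):
    """Kendall's coefficient of concordance W.

    ratings: list of m lists, one per rater, each holding n scores
    (one score per rated item). Returns W in [0, 1], where 1 means
    all raters rank the items identically.
    """
    m = len(ratings)      # number of raters
    n = len(ratings[0])   # number of rated items

    def to_ranks(scores):
        # Convert raw scores to ranks 1..n, averaging ranks over ties.
        order = sorted(range(n), key=lambda i: scores[i])
        ranks = [0.0] * n
        i = 0
        while i < n:
            j = i
            while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # mean rank of the tied group
            for k in range(i, j + 1):
                ranks[order[k]] = avg_rank
            i = j + 1
        return ranks

    # Sum each item's ranks across raters.
    rank_sums = [0.0] * n
    for scores in ratings:
        for i, r in enumerate(to_ranks(scores)):
            rank_sums[i] += r

    # S = sum of squared deviations of rank sums from their mean.
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

With three raters who order four curves identically, W comes out as 1.0; with two raters giving exactly reversed orderings, W is 0.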
ABSTRACT: Object: The aim of this study was to examine observer reliability of frequently used arteriovenous malformation (AVM) grading scales, including the 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale, using current imaging modalities in a setting closely resembling routine clinical practice. Methods: Five experienced raters, including 1 vascular neurosurgeon, 2 neuroradiologists, and 2 senior neurosurgical residents, independently reviewed 15 MRI studies, 15 CT angiograms, and 15 digital subtraction angiograms obtained at the time of initial diagnosis. Assessments of 5 scans of each imaging modality were repeated for measurement of intrarater reliability. Three months after the initial assessment, raters reassessed those scans on which there was disagreement. In this second assessment, raters were asked to justify their rating with comments and illustrations. Generalized kappa (κ) analysis for multiple raters, Kendall's coefficient of concordance (W), and the intraclass correlation coefficient (ICC) were applied to determine interrater reliability. For intrarater reliability analysis, Cohen's kappa (κ), Kendall's correlation coefficient (tau-b), and the ICC were used to assess repeat-measurement agreement for each rater. Results: Interrater reliability for the overall 5-tier Spetzler-Martin scale was fair to good (ICC = 0.69) to extremely strong (Kendall's W = 0.73) on initial assessment and improved on reassessment. Assessment of CT angiograms resulted in the highest agreement, followed by MRI and digital subtraction angiography. Agreement for the overall 3-tier Spetzler-Ponce grade was fair to good (ICC = 0.68) to strong (Kendall's W = 0.70) on initial assessment, improved on reassessment, and was comparable to agreement for the 5-tier Spetzler-Martin scale. Agreement for the overall Pollock-Flickinger radiosurgery-based grade was excellent (ICC = 0.89) to extremely strong (Kendall's W = 0.81).
Intrarater reliability for the overall 5-tier Spetzler-Martin grade was excellent (ICC > 0.75) in 3 of the 5 raters and fair to good (ICC > 0.40) in the other 2 raters. Conclusion: The 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale all showed a high level of agreement. The improved reliability on reassessment was explained by a training effect from the initial assessment and the requirement to defend the rating, which highlights a potential drawback of using grades assigned during routine clinical practice for scientific purposes.
Journal of Neurosurgery, 03/2014. Impact Factor: 3.15.
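Cohen's kappa, used above for intrarater agreement, corrects observed agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal pure-Python sketch (an illustration, not the studies' actual analysis code) is:

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two sets of categorical ratings of equal length.

    a, b: lists of labels assigned to the same items (e.g. by the same
    rater on two occasions). Returns 1.0 for perfect agreement and 0.0
    when agreement is no better than chance.
    """
    n = len(a)
    # Observed proportion of items rated identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rating's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b.get(c, 0) for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both ratings use a single shared label
    return (p_o - p_e) / (1 - p_e)
```

For example, two identical rating lists yield κ = 1.0, while ratings whose agreement matches chance exactly yield κ = 0.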