Interrater and intrarater reliability of the Kuntz et al new deformity classification system

Medical University Innsbruck, Innsbruck, Austria.
Neurosurgery (Impact Factor: 3.62). 07/2012; 71(1):47-57. DOI: 10.1227/NEU.0b013e31824f4e58
Source: PubMed

ABSTRACT: Background: Kuntz et al recently introduced a new system for classifying spinal deformities. This classification of spinal deformity was developed from age-dependent deviations from the neutral upright spinal alignment.
Objective: To determine the interobserver and intraobserver reliabilities of the new Kuntz et al system for classifying scoliosis.
Methods: Fifty consecutive patients were evaluated. Three observers independently assigned a major structural curve, minor structural curve, curve type, apical vertebral rotation, spinal balance, and pelvic alignment to each curve following the guidelines described by Kuntz et al. Assignment of the curves was repeated 4 weeks later, with the curves presented in a different blinded order. The Kendall W and Holsti agreement coefficients were used to determine the interobserver and intraobserver agreement.
Results: The intraobserver value of agreement for all parameters was 0.85 (range, 0.28-1.0), and the mean Kendall W coefficient was 0.89 (range, 0.5-0.97), demonstrating perfect reliability. The interobserver agreement averaged 0.7 (range, 0.251-1.0). The mean Kendall W coefficient was 0.67 (range, 0.19-1.0), demonstrating substantial reliability. The average time for classification of 1 curve was approximately 8.4 minutes.
Conclusion: The new Kuntz et al deformity classification system is comparable to the Lenke et al system in terms of reliability. However, the Kuntz et al classification system provides no recommendations for surgical interventions. It is more complex and time-consuming and therefore may be of limited value in daily clinical practice.
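The two agreement measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis: the abstract does not say how the coefficients were combined across three observers, so averaging Holsti's coefficient over observer pairs is an assumption, as is the tie-corrected form of Kendall's W. The function names and the example grades are hypothetical.

```python
from itertools import combinations


def average_ranks(scores):
    """Rank scores from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kendall_w(ratings):
    """Kendall's W with tie correction; ratings[r][i] is rater r's score for item i."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    totals = [sum(ranked[r][i] for r in range(m)) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie correction: sum of (t^3 - t) over each group of tied scores, per rater.
    ties = sum(row.count(v) ** 3 - row.count(v) for row in ratings for v in set(row))
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater gives a single constant score")
    return 12 * s / denom


def holsti(coder_a, coder_b):
    """Holsti's coefficient 2M / (N1 + N2) for two coders rating the same items."""
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 2 * agreements / (len(coder_a) + len(coder_b))


def mean_pairwise_holsti(ratings):
    """Average Holsti's coefficient over all pairs of raters (an assumption)."""
    pairs = list(combinations(ratings, 2))
    return sum(holsti(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ordinal grades from three observers for five curves.
observers = [
    [0, 1, 2, 1, 3],
    [0, 1, 2, 2, 3],
    [0, 1, 1, 1, 3],
]
print("Kendall W:", round(kendall_w(observers), 3))
print("Mean pairwise Holsti:", round(mean_pairwise_holsti(observers), 3))
```

Kendall's W rewards raters who order the cases the same way even when their exact grades differ, whereas Holsti's coefficient counts only exact matches, which is one reason the two measures can diverge for the same parameter.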

  • ABSTRACT: Object: The aim of this study was to examine observer reliability of frequently used arteriovenous malformation (AVM) grading scales, including the 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale, using current imaging modalities in a setting closely resembling routine clinical practice. Methods: Five experienced raters, including 1 vascular neurosurgeon, 2 neuroradiologists, and 2 senior neurosurgical residents, independently reviewed 15 MRI studies, 15 CT angiograms, and 15 digital subtraction angiograms obtained at the time of initial diagnosis. Assessments of 5 scans of each imaging modality were repeated for measurement of intrarater reliability. Three months after the initial assessment, raters reassessed those scans where there was disagreement. In this second assessment, raters were asked to justify their rating with comments and illustrations. Generalized kappa (κ) analysis for multiple raters, Kendall's coefficient of concordance (W), and intraclass correlation coefficient (ICC) were applied to determine interrater reliability. For intrarater reliability analysis, Cohen's kappa (κ), Kendall's correlation coefficient (tau-b), and ICC were used to assess repeat measurement agreement for each rater. Results: Interrater reliability for the overall 5-tier Spetzler-Martin scale was fair to good (ICC = 0.69) to extremely strong (Kendall's W = 0.73) on initial assessment and improved on reassessment. Assessment of CT angiograms resulted in the highest agreement, followed by MRI and digital subtraction angiography. Agreement for the overall 3-tier Spetzler-Ponce grade was fair to good (ICC = 0.68) to strong (Kendall's W = 0.70) on initial assessment, improved on reassessment, and was comparable to agreement for the 5-tier Spetzler-Martin scale. Agreement for the overall Pollock-Flickinger radiosurgery-based grade was excellent (ICC = 0.89) to extremely strong (Kendall's W = 0.81). Intrarater reliability for the overall 5-tier Spetzler-Martin grade was excellent (ICC > 0.75) in 3 of the 5 raters and fair to good (ICC > 0.40) in the other 2 raters. Conclusion: The 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale all showed a high level of agreement. The improved reliability on reassessment was explained by a training effect from the initial assessment and the requirement to defend the rating, which outlines a potential downside for grades determined as part of routine clinical practice to be used for scientific purposes.
    Journal of Neurosurgery 03/2014; 120(5). DOI:10.3171/2014.2.JNS131262 · 3.74 Impact Factor
  • ABSTRACT: Study Design: Retrospective radiographical review by 5 independent observers. Objective: To validate the intra- and interobserver reliability of the simplified skeletal maturity scoring (SSMS) system in a large cohort for each stage and for the overall cohort. Summary of Background Data: The SSMS has been used to successfully predict curve progression in idiopathic scoliosis. Methods: A total of 275 patients with scoliosis (aged 8-16 years) with 1 hand radiograph were included from 2005 to 2011. Five participants independently scored images on 2 separate occasions using the SSMS (stage, 1-8). Observers (listed in order of increasing SSMS experience) included orthopedic surgery resident, clinical fellow (CF), research fellow, and senior faculty. Intraobserver agreement between the 2 sets of scores was estimated using the Pearson and Spearman rank correlation coefficients. Interobserver agreement was estimated with the unweighted Fleiss κ coefficient for the overall cohort and for junior (orthopedic surgery resident, CF, research fellow) versus senior faculty. Results: Intrarater reliability for orthopedic surgery resident, CF, research fellow, senior faculty was 0.956, 0.967, 0.986, 0.991, and 0.998, respectively (Spearman). Intrarater agreement improved with greater familiarity using the SSMS. The inter-rater reliability for junior faculty (κ = 0.65), senior faculty (κ = 0.652), and the overall group (κ = 0.66) indicated agreement between all observers but no improved inter-rater agreement with experience. However, 98% of disagreements occurred only within 1 stage. Stages 2, 3, and 4 accounted for most of the variability; stage 3 was the most commonly scored stage, corresponding to peak growth velocity. Conclusion: The SSMS has excellent intraobserver agreement with substantial interobserver agreement. Intraobserver agreement, but not interobserver agreement, improves with familiarity using the SSMS. Expectancy bias may contribute to a higher likelihood of assigning an SSMS 3. Discrepancies when classifying stages 2 to 4 may be resolved by improved descriptions of epiphyseal capping in stages 2 and 3. Level of Evidence: 2.
    Spine 12/2014; 39(26):E1592-8. DOI:10.1097/BRS.0000000000000653 · 2.30 Impact Factor
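The two abstracts listed above rely on kappa statistics: Cohen's κ for one rater's repeated assessments and the unweighted Fleiss κ for several raters. The following Python sketch shows the standard textbook formulas only; it does not reproduce either study's analysis, and the function names and example stages are hypothetical.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two sets of categorical ratings of the same items."""
    n = len(first)
    p_obs = sum(a == b for a, b in zip(first, second)) / n
    count_a, count_b = Counter(first), Counter(second)
    # Chance agreement from the product of the two marginal distributions.
    p_exp = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2
    if p_exp == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_obs - p_exp) / (1 - p_exp)


def fleiss_kappa(item_ratings):
    """Unweighted Fleiss kappa; item_ratings[i] lists each rater's category for item i."""
    n_items = len(item_ratings)
    n_raters = len(item_ratings[0])
    if any(len(r) != n_raters for r in item_ratings):
        raise ValueError("every item needs the same number of ratings")
    counts = [Counter(r) for r in item_ratings]
    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_obs = sum(
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_items
    # Chance agreement from the pooled category proportions.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_exp = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if p_exp == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical stages assigned by three raters to four radiographs.
stages = [[1, 1, 2], [2, 2, 2], [3, 3, 3], [1, 1, 1]]
print("Fleiss kappa:", round(fleiss_kappa(stages), 3))

# Hypothetical repeat readings by a single rater.
print("Cohen kappa:", round(cohen_kappa([1, 1, 2, 2], [1, 1, 2, 1]), 3))
```

Because the unweighted form treats a 1-stage disagreement the same as a 5-stage disagreement, a scale where nearly all disagreements fall within 1 stage can still show only moderate to substantial κ.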