Interrater and Intrarater Reliability in the Diagnosis and Staging of Endometriosis

Department of Family and Preventive Medicine, University of Utah, Salt Lake City, Utah, United States
Obstetrics and Gynecology (Impact Factor: 4.37). 07/2012; 120(1):104-12. DOI: 10.1097/AOG.0b013e31825bc6cf
Source: PubMed

ABSTRACT Objective: To estimate the interrater and intrarater reliability of endometriosis diagnosis and severity of disease among gynecologic surgeons viewing operative digital images.
Methods: The study population comprised a random sample (n=148 [36%]) of women who participated in the Endometriosis: Natural History, Diagnosis and Outcomes study. Four academic expert and four local, specialized expert surgeons reviewed the images, diagnosed the presence or absence of endometriosis for each woman, and rated severity using the revised American Society for Reproductive Medicine (ASRM) criteria. Interrater-level and intrarater-level agreement were calculated for both endometriosis diagnosis and staging.
Results: The interrater reliability for endometriosis diagnosis among the eight surgeons was substantial: Fleiss κ=0.69 (95% confidence interval [CI] 0.64-0.74). Surgeons agreed on revised ASRM endometriosis staging criteria after experienced assessment in a majority of cases (mean 61%, range 52-75%) with moderate interrater reliability: Fleiss κ=0.44 (95% CI 0.41-0.47). The intrarater reliability for experienced assessment compared with computer-assisted revised ASRM staging was almost perfect (mean weighted κ=0.95, range 0.89-0.99).
Conclusion: Substantial reliability was found for revised ASRM endometriosis diagnosis, whereas moderate reliability was observed for staging. Almost perfect reliability was observed for surgeons' rating of disease severity compared with computer-assisted, checklist-based staging. Findings suggest that reliability in endometriosis diagnosis is not greatly altered by location or composition of surgeons, supporting the conduct of multisite studies or compilation of endometriosis data across clinical centers. Although surgeons appear to be skilled at assessing endometriosis stage intuitively, how staging of disease burden correlates with clinical outcomes remains to be determined.
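
The Fleiss κ values reported in the abstract summarize agreement among several raters beyond what chance would produce. The block below is a minimal sketch of the standard Fleiss κ formula, not the study's own analysis code; the function name `fleiss_kappa` and the toy ratings are illustrative assumptions, and the confidence intervals quoted in the abstract are not reproduced here.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a complete design.

    ratings: list of per-subject lists, each holding one category label
    per rater. Every subject must have the same number of ratings.
    (Hypothetical helper written for illustration.)
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("need >= 2 raters and equal ratings per subject")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    total = n_subjects * n_raters
    shares = [sum(c[k] for c in counts) / total for k in categories]
    p_chance = sum(s * s for s in shares)

    if p_chance == 1.0:
        return float("nan")  # only one category was ever used; kappa undefined
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy data (not from the study): 3 women, 3 surgeons, 1 = endometriosis present.
toy = [[1, 1, 1], [1, 1, 0], [0, 0, 0]]
print(round(fleiss_kappa(toy), 2))  # 0.55
```

Under the commonly used Landis and Koch labels, which match the wording of the abstract, values of 0.41-0.60 are read as moderate, 0.61-0.80 as substantial, and above 0.80 as almost perfect agreement.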

  • ABSTRACT: Inter-rater reliability is usually assessed by means of the intraclass correlation coefficient. Using two-way analysis of variance to model raters and subjects as random effects, we derive group sequential testing procedures for the design and analysis of reliability studies in which multiple raters evaluate multiple subjects. Compared with the conventional fixed sample procedures, the group sequential test has a smaller average sample number. The performance of the proposed technique is examined using simulation studies and critical values are tabulated for a range of two-stage design parameters. The methods are exemplified using data from the Physician Reliability Study for diagnosis of endometriosis.
    Statistica Sinica 07/2013; 23(4). DOI:10.5705/ss.2012.036s · 1.23 Impact Factor
  • ABSTRACT: Study Objective: To determine whether accuracy of visual diagnosis of endometriosis at laparoscopy is determined by stage of disease. Design: Prospective longitudinal cohort study (Canadian Task Force classification II-2). Setting: Tertiary referral centers in three Australian states. Patients: Of 1439 biopsy specimens, endometriosis was proved in at least one specimen in 431 patients. Interventions: Laparoscopy with visual diagnosis and staging of endometriosis followed by histopathologic analysis and confirmation. Operations were performed by five experienced laparoscopic gynecologists. Measurements and Main Results: Histopathologic confirmation of visual diagnosis of endometriosis adjusted for significant covariates. Endometriosis was accurately diagnosed in 49.7% of American Society for Reproductive Medicine (ASRM) stage I cases, which was significantly less accurate than for other stages of endometriosis. Deep endometriosis was more likely to be diagnosed accurately than superficial endometriosis (adjusted odds ratio, 2.51; 95% confidence interval, 1.50-4.18; p < .01). Lesion volume was also predictive, with larger lesions diagnosed more accurately than smaller lesions. In general, lesion site did not greatly influence accuracy except for superficial ovarian lesions, which were more likely to be incorrectly diagnosed visually as endometriosis (adjusted odds ratio, 0.16; 95% confidence interval, 0.06-0.41; p < .01). There was no statistically significant difference in accuracy between the gynecologic surgeons. Conclusion: The accuracy of visual diagnosis of endometriosis was substantially influenced by ASRM stage, the depth and volume of the lesion, and to a lesser extent the location of the lesion.
    Journal of Minimally Invasive Gynecology 11/2013; 20(6):783-9. DOI:10.1016/j.jmig.2013.04.017 · 1.58 Impact Factor
  • ABSTRACT: Background: Facial erythema is a clinical hallmark of rosacea and often causes social and psychological distress. Although facial erythema assessments are a common endpoint in rosacea clinical trials, their reliability has not been evaluated. Objective: The objective of this study was to evaluate the inter- and intrarater reliability of the Clinician's Erythema Assessment (CEA), a 5-point grading scale of facial erythema severity. Methods: Twelve board-certified dermatologists, previously trained on use of the scale, rated erythema of 28 rosacea subjects twice on the same day. Interrater and intrarater agreement was assessed with the intraclass correlation and κ statistic. Results: The CEA had high interrater reliability and good intrarater reliability with an overall intraclass correlation coefficient (ICC) for session 1 and session 2 of 0.601 and 0.576, respectively; the overall weighted κ statistic for session 1 and session 2 was 0.692. Limitations: Raters were experienced dermatologists and there may be a risk of recall bias. Conclusion: When used by trained raters, CEA is a reliable scale for measuring the facial erythema of rosacea.
    Journal of the American Academy of Dermatology 07/2014; 71(4). DOI:10.1016/j.jaad.2014.05.044 · 5.00 Impact Factor
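
Two of the abstracts listed above rely on the intraclass correlation coefficient, and the first describes a two-way analysis of variance with raters and subjects as random effects. The sketch below computes the single-rating, two-way random-effects ICC, often written ICC(2,1), from the ANOVA mean squares. This is an assumption about which ICC form is meant, since the abstracts do not state their exact estimator; the function name `icc_two_way_random` and the score matrix are illustrative and are not data from any study listed here.

```python
def icc_two_way_random(scores):
    """Single-rating ICC under a two-way random-effects model, ICC(2,1).

    scores[i][j] is the rating given to subject i by rater j; the design
    must be complete (every rater scores every subject).
    (Hypothetical helper written for illustration.)
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    rater_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Illustrative scores: 6 subjects (rows) rated by 4 raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_two_way_random(example), 2))  # 0.29
```

Because this form treats raters as a random sample, systematic differences between raters (the rater mean square) lower the coefficient; in the example the raters rank subjects similarly but score at very different levels, which is why the value is low.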