Interrater and Intrarater Reliability in the Diagnosis and Staging of Endometriosis

Department of Family and Preventive Medicine, University of Utah, Salt Lake City, Utah, United States
Obstetrics and Gynecology (Impact Factor: 4.37). 07/2012; 120(1):104-12. DOI: 10.1097/AOG.0b013e31825bc6cf
Source: PubMed

ABSTRACT Objective: To estimate the interrater and intrarater reliability of endometriosis diagnosis and severity of disease among gynecologic surgeons viewing operative digital images.
Methods: The study population comprised a random sample (n=148 [36%]) of women who participated in the Endometriosis: Natural History, Diagnosis and Outcomes study. Four academic expert and four local, specialized expert surgeons reviewed the images, diagnosed the presence or absence of endometriosis for each woman, and rated severity using the revised American Society for Reproductive Medicine (ASRM) criteria. Interrater and intrarater agreement were calculated for both endometriosis diagnosis and staging.
Results: The interrater reliability for endometriosis diagnosis among the eight surgeons was substantial: Fleiss κ=0.69 (95% confidence interval [CI] 0.64-0.74). Surgeons agreed on revised ASRM endometriosis staging criteria after experienced assessment in a majority of cases (mean 61%, range 52-75%) with moderate interrater reliability: Fleiss κ=0.44 (95% CI 0.41-0.47). The intrarater reliability for experienced assessment compared with computer-assisted revised ASRM staging was almost perfect (mean weighted κ=0.95, range 0.89-0.99).
Conclusion: Substantial reliability was found for endometriosis diagnosis, whereas moderate reliability was observed for revised ASRM staging. Almost perfect reliability was observed for surgeons' rating of disease severity compared with computer-assisted, checklist-based staging. The findings suggest that reliability in endometriosis diagnosis is not greatly altered by location or composition of surgeons, supporting the conduct of multisite studies or compilation of endometriosis data across clinical centers. Although surgeons appear to be skilled at assessing endometriosis stage intuitively, how staging of disease burden correlates with clinical outcomes remains to be determined.
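For readers unfamiliar with the statistic reported above, the following is a minimal Python sketch of how Fleiss' κ is computed when every subject is rated by the same number of raters. It is not the authors' analysis code. The function names (`fleiss_kappa`, `landis_koch_label`) and the toy ratings are illustrative assumptions. The verbal labels follow the commonly used Landis and Koch bands, which is an assumption about the convention behind the abstract's wording, although "moderate", "substantial", and "almost perfect" are consistent with it.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa; ratings[i] holds the labels given to subject i.

    Every subject must be rated by the same number of raters (at least 2).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Share of rater pairs that agree on this subject.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += agreeing_pairs / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_subjects
    total = n_subjects * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((t / total) ** 2 for t in category_totals.values())
    if expected == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa: float) -> str:
    """Verbal label for a kappa value using the Landis and Koch bands."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical toy data: 4 subjects, 3 raters, binary diagnosis.
toy = [
    ["yes", "yes", "yes"],
    ["no", "no", "no"],
    ["yes", "yes", "no"],
    ["no", "no", "yes"],
]
kappa = fleiss_kappa(toy)
print(round(kappa, 3), landis_koch_label(kappa))
```

Applied to the values reported in the abstract, `landis_koch_label` maps 0.69 to "substantial", 0.44 to "moderate", and 0.95 to "almost perfect". The weighted κ used for the intrarater comparison is a different statistic and is not covered by this sketch.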

  • ABSTRACT: In diagnostic studies without a gold standard, the assumption on the dependence structure of the multiple tests or raters plays an important role in model performance. In the case of binary disease status, both conditional independence and crossed random effects structures have been proposed and their performance investigated. Less attention has been paid to the situation where the true disease status is ordinal. In this paper, we propose crossed subject-specific and rater-specific random effects to account for the dependence structure and assess the robustness of the proposed model to misspecification in the random effects distributions. We applied the models to data from the Physician Reliability Study, which focuses on assessing the diagnostic accuracy in a population of raters for the staging of endometriosis, a gynecological disorder in women. Using this new methodology, we estimate the probability of a correct classification and show that regional experts can more easily classify the intermediate stage than resident physicians. Copyright © 2013 John Wiley & Sons, Ltd.
    Statistics in Medicine 09/2013; 32(20). DOI:10.1002/sim.5784 · 2.04 Impact Factor
  • ABSTRACT: Study objective: To determine whether accuracy of visual diagnosis of endometriosis at laparoscopy is determined by stage of disease. Design: Prospective longitudinal cohort study (Canadian Task Force classification II-2). Setting: Tertiary referral centers in three Australian states. Patients: Of 1439 biopsy specimens, endometriosis was proved in at least one specimen in 431 patients. Interventions: Laparoscopy with visual diagnosis and staging of endometriosis followed by histopathologic analysis and confirmation. Operations were performed by five experienced laparoscopic gynecologists. Measurements and main results: Histopathologic confirmation of visual diagnosis of endometriosis adjusted for significant covariates. Endometriosis was accurately diagnosed in 49.7% of American Society for Reproductive Medicine (ASRM) stage I, which was significantly less accurate than for other stages of endometriosis. Deep endometriosis was more likely to be diagnosed accurately than superficial endometriosis (adjusted odds ratio, 2.51; 95% confidence interval, 1.50-4.18; p < .01). Lesion volume was also predictive, with larger lesions diagnosed more accurately than smaller lesions. In general, lesion site did not greatly influence accuracy except for superficial ovarian lesions, which were more likely to be incorrectly diagnosed visually as endometriosis (adjusted odds ratio, 0.16; 95% confidence interval, 0.06-0.41; p < .01). There was no statistically significant difference in accuracy between the gynecologic surgeons. Conclusion: The accuracy of visual diagnosis of endometriosis was substantially influenced by ASRM stage, the depth and volume of the lesion, and to a lesser extent the location of the lesion.
    Journal of Minimally Invasive Gynecology 11/2013; 20(6):783-9. DOI:10.1016/j.jmig.2013.04.017 · 1.58 Impact Factor
  • ABSTRACT: Background: Facial erythema is a clinical hallmark of rosacea and often causes social and psychological distress. Although facial erythema assessments are a common endpoint in rosacea clinical trials, their reliability has not been evaluated. Objective: The objective of this study was to evaluate the inter- and intrarater reliability of the Clinician's Erythema Assessment (CEA), a 5-point grading scale of facial erythema severity. Methods: Twelve board-certified dermatologists, previously trained on use of the scale, rated erythema of 28 rosacea subjects twice on the same day. Interrater and intrarater agreement was assessed with the intraclass correlation and the κ statistic. Results: The CEA had high interrater reliability and good intrarater reliability, with an overall intraclass correlation coefficient (ICC) for session 1 and session 2 of 0.601 and 0.576, respectively; the overall weighted κ statistic for session 1 and session 2 was 0.692. Limitations: Raters were experienced dermatologists and there may be a risk of recall bias. Conclusion: When used by trained raters, the CEA is a reliable scale for measuring the facial erythema of rosacea.
    Journal of the American Academy of Dermatology 07/2014; 71(4). DOI:10.1016/j.jaad.2014.05.044 · 5.00 Impact Factor