Interrater and Intrarater Reliability in the Diagnosis and Staging of Endometriosis

Department of Family and Preventive Medicine, University of Utah, Salt Lake City, Utah, United States
Obstetrics and Gynecology (Impact Factor: 5.18). 07/2012; 120(1):104-12. DOI: 10.1097/AOG.0b013e31825bc6cf
Source: PubMed


To estimate the interrater and intrarater reliability of endometriosis diagnosis and severity of disease among gynecologic surgeons viewing operative digital images.
The study population comprised a random sample (n=148 [36%]) of women who participated in the Endometriosis: Natural History, Diagnosis and Outcomes study. Four academic expert and four local, specialized expert surgeons reviewed the images, diagnosed the presence or absence of endometriosis for each woman, and rated severity using the revised American Society for Reproductive Medicine (ASRM) criteria. Interrater-level and intrarater-level agreement were calculated for both endometriosis diagnosis and staging.
The interrater reliability for endometriosis diagnosis among the eight surgeons was substantial: Fleiss κ=0.69 (95% confidence interval [CI] 0.64-0.74). Surgeons agreed on revised ASRM endometriosis staging criteria after experienced assessment in a majority of cases (mean 61%, range 52-75%) with moderate interrater reliability: Fleiss κ=0.44 (95% CI 0.41-0.47). The intrarater reliability for experienced assessment compared with computer-assisted revised ASRM staging was almost perfect (mean weighted κ=0.95, range 0.89-0.99).
Substantial reliability was found for endometriosis diagnosis, whereas moderate reliability was observed for revised ASRM staging. Almost perfect reliability was observed for surgeons' rating of disease severity compared with computer-assisted, checklist-based staging. Findings suggest that reliability in endometriosis diagnosis is not greatly altered by location or composition of surgeons, supporting the conduct of multisite studies or compilation of endometriosis data across clinical centers. Although surgeons appear to be skilled at assessing endometriosis stage intuitively, how staging of disease burden correlates with clinical outcomes remains to be determined.
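For readers unfamiliar with the Fleiss κ statistic reported above, the following is a minimal sketch of the standard textbook calculation, not the authors' analysis code. The function name `fleiss_kappa` and the ratings table are hypothetical; the table is invented purely for illustration and is not study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: 5 women, 8 surgeons, two categories
# (endometriosis present, endometriosis absent). Not study data.
example = [
    [8, 0],
    [7, 1],
    [1, 7],
    [0, 8],
    [5, 3],
]
print(round(fleiss_kappa(example), 3))
```

Each row sums to 8 because each woman's images are rated by all eight surgeons. The statistic compares the mean observed pairwise agreement with the agreement expected by chance from the overall category proportions.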

  • ABSTRACT: In diagnostic medicine, estimating the diagnostic accuracy of a group of raters or medical tests relative to the gold standard is often the primary goal. When a gold standard is absent, latent class models where the unknown gold standard test is treated as a latent variable are often used. However, these models have been criticized in the literature from both a conceptual and a robustness perspective. As an alternative, we propose an approach where we exploit an imperfect reference standard with unknown diagnostic accuracy and conduct sensitivity analysis by varying this accuracy over scientifically reasonable ranges. In this article, a latent class model with crossed random effects is proposed for estimating the diagnostic accuracy of regional obstetrics and gynaecological (OB/GYN) physicians in diagnosing endometriosis. To avoid the pitfalls of models without a gold standard, we exploit the diagnostic results of a group of OB/GYN physicians with an international reputation for the diagnosis of endometriosis. We construct an ordinal reference standard based on the discordance among these international experts and propose a mechanism for conducting sensitivity analysis relative to the unknown diagnostic accuracy among them. A Monte Carlo EM algorithm is proposed for parameter estimation and a BIC-type model selection procedure is presented. Through simulations and data analysis we show that this new approach provides a useful alternative to traditional latent class modeling approaches used in this setting.
    Biometrics 12/2012; 68(4):1294-1302. DOI:10.2307/41806048 · 1.57 Impact Factor
  • ABSTRACT: Inter-rater reliability is usually assessed by means of the intraclass correlation coefficient. Using two-way analysis of variance to model raters and subjects as random effects, we derive group sequential testing procedures for the design and analysis of reliability studies in which multiple raters evaluate multiple subjects. Compared with the conventional fixed sample procedures, the group sequential test has smaller average sample number. The performance of the proposed technique is examined using simulation studies and critical values are tabulated for a range of two-stage design parameters. The methods are exemplified using data from the Physician Reliability Study for diagnosis of endometriosis.
    Statistica Sinica 07/2013; 23(4). DOI:10.5705/ss.2012.036s · 1.16 Impact Factor
  • ABSTRACT: There has been limited study of trace elements and endometriosis. Using a matched cohort design, 473 women aged 18-44 years were recruited into an operative cohort, along with 131 similarly aged women recruited into a population cohort. Endometriosis was defined as surgically visualized disease in the operative cohort, and as disease diagnosed by magnetic resonance imaging in the population cohort. Twenty trace elements in urine and three in blood were quantified using inductively coupled plasma mass spectrometry. Logistic regression estimated the adjusted odds ratio (aOR) of endometriosis diagnosis for each element by cohort. No association was observed between any element and endometriosis in the population cohort. In the operative cohort, blood cadmium was associated with reduced odds of diagnosis (aOR=0.55; 95% CI: 0.31, 0.98), while urinary chromium and copper were associated with increased odds (aOR=1.97; 95% CI: 1.21, 3.19; aOR=2.66; 95% CI: 1.26, 5.64, respectively). The varied associations underscore the need for continued research.
    Reproductive Toxicology 07/2013; 42. DOI:10.1016/j.reprotox.2013.05.009 · 3.23 Impact Factor