Some Practical Issues of Experimental Design and Data Analysis in Radiological ROC Studies

Department of Radiology, University of Chicago, IL 60637.
Investigative Radiology (Impact Factor: 4.44). 04/1989; 24(3):234-45. DOI: 10.1097/00004424-198903000-00012
Source: PubMed


Receiver operating characteristic (ROC) analysis has been used in a broad variety of medical imaging studies during the past 15 years, and its advantages over more traditional measures of diagnostic performance are now clearly established. But despite the essential simplicity of the approach, workers in the field often find--sometimes only after an ROC study is under way--that a number of subtle issues related to experimental design and data analysis must be confronted in practice. Many of these issues have not been discussed in the literature in detail, and most are not well known. The purposes of this paper are to make users of ROC methodology in medical imaging aware of potential problems that should be confronted before an ROC study is begun and to indicate, at least broadly, how those problems may be dealt with, given the present state of the art. Some of the issues raised here can be addressed adequately by easily prescribed techniques, whereas others remain difficult and will be resolved fully only by new methodologic developments.

    • "To determine the diagnostic accuracy of each modality for the two reviewers, the area under the ROC curve (Az) value was evaluated. Factors with Az values greater than 0.80 were regarded as good diagnostic accuracy (25). The Az values of each imaging modality acquired from the ROC curves were statistically compared using the paired Z-test. "
    ABSTRACT: To compare the diagnostic performance of high-resolution ultrasound (HRUS) with contrast-enhanced CT and contrast-enhanced magnetic resonance imaging (MRI) with MR cholangiopancreatography (MRCP) to differentiate between adenomyomatosis (ADM) and gallbladder cancer (GBCA). Forty patients with surgically proven ADM (n = 13) or GBCA at stage T2 or lower (n = 27) who previously underwent preoperative HRUS, contrast-enhanced CT, and contrast-enhanced MRI with MRCP were retrospectively included in this study. According to the well-known diagnostic criteria, two reviewers independently analyzed the images from each modality separately with a five-point confidence scale. The interobserver agreement was calculated using weighted κ statistics. A receiver operating characteristic curve analysis was performed and the sensitivity, specificity, and accuracy were calculated for each modality when scores of 1 or 2 indicated ADM. The interobserver agreement between the two reviewers was good to excellent. The mean Az values for HRUS, multidetector CT (MDCT), and MRI were 0.959, 0.898, and 0.935, respectively, without any statistically significant differences between any of the modalities (p > 0.05). The mean sensitivity of MRI with MRCP (80.8%) was significantly higher than that of MDCT (50.0%) (p = 0.0215). However, the mean sensitivity of MRI with MRCP (80.8%) was not significantly different from that of HRUS (73.1%) (p > 0.05). The mean specificities and accuracies among the three modalities were not significantly different (p > 0.05). High-resolution ultrasound and MRI with MRCP have comparable sensitivity and accuracy and MDCT has the lowest sensitivity and accuracy for the differentiation of ADM and GBCA.
    Korean journal of radiology: official journal of the Korean Radiological Society 03/2014; 15(2):226-234. DOI:10.3348/kjr.2014.15.2.226 · 1.57 Impact Factor
    • "The key of ROC analysis in radiology is that the participants rate the confidence of judgment or the likelihood of malignancy etc. instead of giving binary answer (i.e. present or absent) (Obuchowski 2003; Metz 1978; Hanley & McNeil 1982; Berbaum et al. 1989; Metz 1989; Gur et al. 1989). The fundamental problem of rating is that the decision needs to be "not obvious", and "should be of borderline difficulty" (Metz 1978)."
    ABSTRACT: Detection performance of radiologists for "obvious" targets should be evaluated by a visual search task instead of ROC analysis, but visual search tasks have not been applied to radiology studies. The aim of this study was to set up an environment that allows visual search tasks in radiology, to evaluate its feasibility, and to preliminarily investigate the effect of career experience on performance. In a darkroom, ten radiologists were asked to identify the type of lesion by pressing buttons when images without lesions or with a bulla, ground-glass nodule, or solid nodule were randomly presented on a display. Differences in accuracy and reaction times depending on board certification were investigated. The visual search task was performed successfully and proved feasible. Radiologists in both the non-board and board groups had high sensitivity, specificity, positive predictive values, and negative predictive values. Reaction time was under 1 second for all target types in both groups. Board radiologists were significantly faster in answering for bulla, but there were no significant differences for other targets and values. We developed an experimental system that allows visual search experiments in radiology. Reaction time for detection of bulla was shortened with experience.
    SpringerPlus 11/2013; 2(1):607. DOI:10.1186/2193-1801-2-607
    • "We evaluated the effectiveness of these features in distinguishing malignant from non-cancer glands using the receiver operating characteristic (ROC) curve, which is a plot of sensitivity versus 1 – specificity (or false-positive rate).[27–29] We obtained the maximum-likelihood estimate of ROC curves[30] and used the area under the ROC curve (AUC)[31] as a summary statistic of the ROC curve. "
    ABSTRACT: Identification of individual prostatic glandular structures is an important prerequisite to quantitative histological analysis of prostate cancer with the aid of a computer. We have developed a computer method to segment individual glandular units and to extract quantitative image features, for computer identification of prostatic adenocarcinoma. Two sets of digital histology images were used: database I (n = 57) for developing and testing the computer technique, and database II (n = 116) for independent validation. The segmentation technique was based on a k-means clustering and a region-growing method. Computer segmentation results were evaluated subjectively and also compared quantitatively against manual gland outlines, using the Jaccard similarity measure. Quantitative features that were extracted from the computer segmentation results include average gland size, spatial gland density, and average gland circularity. Linear discriminant analysis (LDA) was used to combine quantitative image features. Classification performance was evaluated with receiver operating characteristic (ROC) analysis and the area under the ROC curve (AUC). Jaccard similarity coefficients between computer segmentation and manual outlines of individual glands were between 0.63 and 0.72 for non-cancer and between 0.48 and 0.54 for malignant glands, respectively, similar to an interobserver agreement of 0.79 for non-cancer and 0.75 for malignant glands, respectively. The AUC value for the features of average gland size and gland density combined via LDA was 0.91 for database I and 0.96 for database II. Using a computer, we are able to delineate individual prostatic glands automatically and identify prostatic adenocarcinoma accurately, based on the quantitative image features extracted from computer-segmented glandular structures.
    07/2011; 2:33. DOI:10.4103/2153-3539.83193
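
The excerpts above describe the ROC curve as a plot of sensitivity against 1 - specificity, built from confidence ratings and summarized by the area under the curve. A minimal Python sketch of that idea follows. It is an illustration only: it computes the empirical (trapezoidal) area, not the maximum-likelihood binormal estimate (Az) that the cited studies report, and the function names and the eight ratings are hypothetical, not data from any of these papers.

```python
def empirical_roc(ratings, truth):
    """Return ROC operating points as (false-positive rate, sensitivity) pairs.

    ratings: confidence scores, where a higher score means more suspicion of disease.
    truth:   1 for an actually positive case, 0 for an actually negative case.
    """
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")

    points = [(0.0, 0.0)]  # strictest threshold: nothing is called positive
    # Lower the decision threshold one rating category at a time.
    for threshold in sorted(set(ratings), reverse=True):
        tp = sum(1 for r, t in zip(ratings, truth) if r >= threshold and t == 1)
        fp = sum(1 for r, t in zip(ratings, truth) if r >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points  # ends at (1.0, 1.0): everything is called positive


def trapezoidal_auc(points):
    """Area under the piecewise-linear curve through the operating points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical five-point ratings: four diseased cases, then four normal cases.
ratings = [5, 5, 4, 3, 3, 2, 1, 1]
truth = [1, 1, 1, 1, 0, 0, 0, 0]

points = empirical_roc(ratings, truth)
auc = trapezoidal_auc(points)
print(points)
print(auc)  # 0.96875 for the hypothetical ratings above
```

With ordinal ratings the trapezoidal area equals the Mann-Whitney statistic, with ties counted as one half. It tends to fall below the area of a smooth fitted curve when only a few rating categories are used.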