Some practical issues of experimental design and data analysis in radiological ROC studies.

Department of Radiology, University of Chicago, IL 60637.
Investigative Radiology (Impact Factor: 5.46). 04/1989; 24(3):234-45. DOI:10.1097/00004424-198903000-00012
Source: PubMed

ABSTRACT: Receiver operating characteristic (ROC) analysis has been used in a broad variety of medical imaging studies during the past 15 years, and its advantages over more traditional measures of diagnostic performance are now clearly established. But despite the essential simplicity of the approach, workers in the field often find, sometimes only after an ROC study is under way, that a number of subtle issues related to experimental design and data analysis must be confronted in practice. Many of these issues have not been discussed in the literature in detail, and most are not well known. The purposes of this paper are to make users of ROC methodology in medical imaging aware of potential problems that should be confronted before an ROC study is begun and to indicate, at least broadly, how those problems may be dealt with, given the present state of the art. Some of the issues raised here can be addressed adequately by easily prescribed techniques, whereas others remain difficult and will be resolved fully only by new methodologic developments.
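
The abstract does not spell out the basic computation an ROC study rests on. The following is a minimal Python sketch, assuming readers report confidence on a five-point rating scale (1 = definitely normal, 5 = definitely abnormal); the function names are hypothetical. It slides the decision threshold across the rating categories to obtain empirical (false-positive fraction, true-positive fraction) operating points, then computes the area under those points by the trapezoidal rule. This is a nonparametric stand-in, not the binormal maximum-likelihood curve fitting widely used in radiological ROC work. With only a few rating categories, the trapezoidal area tends to underestimate the area under a smooth fitted curve.

```python
from typing import List, Sequence, Tuple


def roc_points(pos: Sequence[int], neg: Sequence[int], levels: int = 5) -> List[Tuple[float, float]]:
    """Empirical ROC operating points from ordinal confidence ratings.

    Assumes ratings are integers 1..levels, where a higher rating means
    "more confident the case is abnormal". The cutoff k calls a case
    positive when its rating is >= k. Points run from the strictest cutoff
    to the most lax and are bracketed by (0, 0) and (1, 1).
    """
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal case")
    points = [(0.0, 0.0)]
    for k in range(levels, 0, -1):
        tpf = sum(r >= k for r in pos) / len(pos)  # true-positive fraction
        fpf = sum(r >= k for r in neg) / len(neg)  # false-positive fraction
        points.append((fpf, tpf))
    return points


def trapezoidal_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear empirical ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def mann_whitney_auc(pos: Sequence[int], neg: Sequence[int]) -> float:
    """Fraction of (abnormal, normal) case pairs in which the abnormal case
    receives the higher rating, with ties counted as 1/2. This equals the
    trapezoidal area when every rating cutoff is used."""
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative ratings only, not data from any study.
    abnormal = [5, 4, 4, 3, 2]
    normal = [1, 2, 2, 3, 1]
    pts = roc_points(abnormal, normal)
    print(pts)
    print(trapezoidal_auc(pts), mann_whitney_auc(abnormal, normal))
```

For the illustrative ratings in the main block, both area estimates come out to 0.9, up to floating-point rounding. The pairwise form makes explicit that the empirical area is the probability that a randomly chosen abnormal case is rated above a randomly chosen normal one.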

  • ABSTRACT: To compare the performance of computer-aided diagnosis (CADx) analysis of precontrast high spectral and spatial resolution (HiSS) MRI with that of clinical dynamic contrast-enhanced MRI (DCE-MRI) in the diagnostic classification of breast lesions. Thirty-four malignant and seven benign lesions were scanned using two-dimensional (2D) HiSS and clinical 4D DCE-MRI protocols. Lesions were automatically segmented. Morphological features were calculated for HiSS, whereas both morphological and kinetic features were calculated for DCE-MRI. After stepwise feature selection, Bayesian artificial neural networks merged the selected features, and receiver operating characteristic (ROC) analysis with leave-one-lesion-out validation evaluated performance. In the task of classifying benign and malignant lesions, CADx yielded AUC (area under the ROC curve) values of 0.92 ± 0.06 on HiSS and 0.90 ± 0.05 on DCE-MRI. Although the higher HiSS performance could not be shown to be significantly better than that of DCE-MRI, noninferiority testing confirmed that HiSS was not worse than DCE-MRI. CADx of HiSS (without contrast) performed similarly to CADx on clinical DCE-MRI; thus, computerized analysis of HiSS may provide sufficient information for diagnostic classification. The results are clinically important for patients in whom a contrast agent is contraindicated. Even in the limited acquisition mode of 2D single-slice HiSS, quantitative image analysis of the HiSS images yielded performance similar to that of current clinical 4D DCE-MRI. As HiSS acquisitions become possible in 3D, CADx methods can be applied to them as well. Because HiSS and DCE-MRI are based on different contrast mechanisms, using the two protocols in combination may increase diagnostic accuracy.
    Journal of Magnetic Resonance Imaging 09/2013; · 2.57 Impact Factor
  • ABSTRACT: To compare the diagnostic performance of high-resolution ultrasound (HRUS), contrast-enhanced CT, and contrast-enhanced magnetic resonance imaging (MRI) with MR cholangiopancreatography (MRCP) in differentiating adenomyomatosis (ADM) from gallbladder cancer (GBCA). Forty patients with surgically proven ADM (n = 13) or GBCA at stage T2 or lower (n = 27) who had undergone preoperative HRUS, contrast-enhanced CT, and contrast-enhanced MRI with MRCP were retrospectively included in this study. Using established diagnostic criteria, two reviewers independently analyzed the images from each modality separately on a five-point confidence scale. Interobserver agreement was calculated using weighted κ statistics. Receiver operating characteristic curve analysis was performed, and sensitivity, specificity, and accuracy were calculated for each modality with scores of 1 or 2 taken to indicate ADM. Interobserver agreement between the two reviewers was good to excellent. The mean Az values for HRUS, multidetector CT (MDCT), and MRI were 0.959, 0.898, and 0.935, respectively, with no statistically significant differences between any of the modalities (p > 0.05). The mean sensitivity of MRI with MRCP (80.8%) was significantly higher than that of MDCT (50.0%) (p = 0.0215) but not significantly different from that of HRUS (73.1%) (p > 0.05). The mean specificities and accuracies did not differ significantly among the three modalities (p > 0.05). High-resolution ultrasound and MRI with MRCP have comparable sensitivity and accuracy, and MDCT has the lowest sensitivity and accuracy for differentiating ADM from GBCA.
    Korean journal of radiology: official journal of the Korean Radiological Society 01/2014; 15(2):226-234. · 1.32 Impact Factor
  • ABSTRACT: To investigate whether forewarning radiologists that they would be asked to identify repeated images affects their recognition of previously interpreted versus new chest radiographs. Thirteen radiologists viewed 60 posteroanterior chest radiographs, 31 with and 29 without nodules, in two sets of 40 images each. Eight radiologists were forewarned of the memory task and five were not. Twenty images in each set were unique to that set, and 20 images occurred in both sets. The readers indicated the presence or absence of any nodules during both readings; in the second reading session they also indicated whether they thought each image had appeared in the first reading. There was no significant difference in recognition memory performance between forewarned and not-forewarned readers. Overall accuracy in distinguishing previously viewed from new images was 60.7%. Being forewarned of the memory task did not improve recognition memory.
    Academic radiology 12/2013; 20(12):1598-603. · 2.09 Impact Factor
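
The adenomyomatosis versus gallbladder cancer abstract above reports interobserver agreement on a five-point confidence scale using weighted κ, but it does not state which weighting scheme was used. The following is a minimal Python sketch of Cohen's weighted κ for two readers. It assumes linear disagreement weights by default, with quadratic weights as an option. The function name and parameters are hypothetical and do not describe the cited study's software.

```python
from typing import List, Sequence


def weighted_kappa(a: Sequence[int], b: Sequence[int], levels: int = 5, quadratic: bool = False) -> float:
    """Cohen's weighted kappa for two readers rating the same cases on 1..levels.

    kappa_w = 1 - sum(w_ij * o_ij) / sum(w_ij * e_ij), where o_ij is the observed
    joint proportion, e_ij = (row_i total) * (column_j total) is the proportion
    expected by chance, and w_ij is the disagreement weight |i - j| / (levels - 1)
    (linear, the assumed default) or its square (quadratic).
    """
    if levels < 2:
        raise ValueError("need at least two rating categories")
    if len(a) != len(b) or not a:
        raise ValueError("both readers must rate the same, non-empty set of cases")
    if any(not 1 <= r <= levels for r in list(a) + list(b)):
        raise ValueError("ratings must lie in 1..levels")
    n = len(a)
    # Observed joint proportions: rows are reader A's ratings, columns reader B's.
    obs: List[List[float]] = [[0.0] * levels for _ in range(levels)]
    for x, y in zip(a, b):
        obs[x - 1][y - 1] += 1.0 / n
    row = [sum(obs[i]) for i in range(levels)]
    col = [sum(obs[i][j] for i in range(levels)) for j in range(levels)]
    observed = expected = 0.0
    for i in range(levels):
        for j in range(levels):
            d = abs(i - j) / (levels - 1)
            w = d * d if quadratic else d
            observed += w * obs[i][j]
            expected += w * row[i] * col[j]
    if expected == 0.0:
        raise ValueError("kappa is undefined when chance-expected disagreement is zero")
    return 1.0 - observed / expected
```

Suppose two readers agree on three of five cases and swap ratings 4 and 5 on the other two (ratings `[1, 2, 3, 4, 5]` versus `[1, 2, 3, 5, 4]`). Linear weights then give κ_w = 0.75 and quadratic weights give 0.9. Quadratic weights penalize large disagreements more heavily than adjacent-category ones, so the choice of scheme should be reported alongside the value.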

C E Metz