Association Between Time Spent Interpreting, Level of Confidence, and Accuracy of Screening Mammography

Department of Family Medicine, Oregon Health & Science University, Portland, 97239, USA.
American Journal of Roentgenology 04/2012; 198(4):970-8. DOI: 10.2214/AJR.11.6988
Source: PubMed


The objective of this study was to examine how the time spent viewing images and the level of confidence in interpretations of a screening mammography test set affect interpretive performance.
Radiologists from six mammography registries participated in this study and were randomized to interpret one of four test sets and complete 12 survey questions. Each test set had 109 cases of digitized four-view screening screen-film mammograms with prior comparison screening views. Viewing time for each case was defined as the cumulative time spent viewing all mammographic images before recording which visible feature, if any, was the "most significant finding." Log-linear regression, fit via generalized estimating equations, was used to test the effect of viewing time and level of confidence in the interpretation on test set sensitivity and false-positive rate.
One hundred nineteen radiologists completed a test set and contributed data on 11,484 interpretations. The radiologists spent more time viewing cases that had significant findings or cases for which they had less confidence in their interpretation. Each additional minute of viewing time increased the probability of a true-positive interpretation among cancer cases by a factor of 1.12 (95% CI, 1.06-1.19; p < 0.001) regardless of confidence in the assessment. Among the radiologists who were very confident in their assessment, each additional minute of viewing time increased the adjusted risk of a false-positive interpretation among noncancer cases by a factor of 1.42 (95% CI, 1.21-1.68), and this viewing-time effect diminished with decreasing confidence.
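In a log-linear model such as the one used here, per-minute risk ratios compound multiplicatively with viewing time. A minimal sketch in plain Python of this arithmetic, using the ratios reported above (illustration only, not study code):

```python
def compounded_ratio(per_minute_ratio: float, extra_minutes: float) -> float:
    """Under a log-linear model, RR(t) = exp(beta * t) = (e^beta)^t,
    so a per-minute risk ratio compounds multiplicatively over time."""
    return per_minute_ratio ** extra_minutes

# Reported per-minute ratios from the study above
tp_ratio = 1.12  # true-positive ratio per extra minute, cancer cases
fp_ratio = 1.42  # false-positive ratio per extra minute, very confident readers

# Effect of two additional minutes of viewing time
print(round(compounded_ratio(tp_ratio, 2), 3))  # 1.254
print(round(compounded_ratio(fp_ratio, 2), 3))  # 2.016
```

This is why a modest per-minute ratio such as 1.42 can imply a doubled false-positive risk after only two extra minutes.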
Longer interpretation times and higher levels of confidence in an interpretation are both associated with higher sensitivity and higher false-positive rates in screening mammography.


Available from: Karla Kerlikowske
  •
    ABSTRACT: Interpretive accuracy varies among radiologists, especially in mammography. This study examines the relationship between radiologists' confidence in their assessments and their accuracy in interpreting mammograms. In this study, 119 community radiologists interpreted 109 expert-defined screening mammography examinations in test sets and rated their confidence in their assessment for each case. They also provided a global assessment of their ability to interpret mammograms. Positive predictive value (PPV) and negative predictive value (NPV) were modeled as functions of self-rated confidence on each examination using log-linear regression estimated with generalized estimating equations. Reference measures were cancer status and expert-defined need for recall. Effect modification by weekly mammography volume was examined. Radiologists who self-reported higher global interpretive ability tended to interpret more mammograms per week (p = 0.08), were more likely to specialize (p = 0.02) and to have completed a fellowship in breast or women's imaging (p = 0.05), and had a higher PPV for cancer detection (p = 0.01). Examinations for which low-volume radiologists were "very confident" had a PPV 2.93 times (95% CI, 2.01-4.27) that of examinations they rated with neutral confidence. Trends of increasing NPVs with increasing confidence were significant for low-volume radiologists relative to noncancers (p = 0.01) and expert nonrecalls (p < 0.001). A trend of significantly increasing NPVs existed for high-volume radiologists relative to expert nonrecall (p = 0.02) but not relative to noncancer status (p = 0.32). Confidence in mammography assessments was associated with better accuracy, especially for low-volume readers. Asking for a second opinion when confidence in an assessment is low may increase accuracy.
    American Journal of Roentgenology 07/2012; 199(1):W134-41. DOI: 10.2214/AJR.11.7701
  •
    ABSTRACT: The aim of this study was to assess agreement of mammographic interpretations by community radiologists with consensus interpretations of an expert radiology panel to inform approaches that improve mammographic performance. From 6 mammographic registries, 119 community-based radiologists were recruited to assess 1 of 4 randomly assigned test sets of 109 screening mammograms with comparison studies for no recall or recall, giving the most significant finding type (mass, calcifications, asymmetric density, or architectural distortion) and location. The mean proportion of agreement with an expert radiology panel was calculated by cancer status, finding type, and difficulty level of identifying the finding at the patient, breast, and lesion level. Concordance in finding type between study radiologists and the expert panel was also examined. For each finding type, the proportion of unnecessary recalls, defined as study radiologist recalls that were not expert panel recalls, was determined. Recall agreement was 100% for masses and for examinations with obvious findings in both cancer and noncancer cases. Among cancer cases, recall agreement was lower for lesions that were subtle (50%) or asymmetric (60%). Subtle noncancer findings and benign calcifications showed 33% agreement for recall. Agreement for finding responsible for recall was low, especially for architectural distortions (43%) and asymmetric densities (40%). Most unnecessary recalls (51%) were asymmetric densities. Agreement in mammographic interpretation was low for asymmetric densities and architectural distortions. Training focused on these interpretations could improve the accuracy of mammography and reduce unnecessary recalls.
    Journal of the American College of Radiology: JACR 11/2012; 9(11):788-94. DOI: 10.1016/j.jacr.2012.05.020
  •
    ABSTRACT: Test sets for assessing and improving radiologic image interpretation have been used for decades and typically evaluate performance relative to gold standard interpretations by experts. To assess test sets for screening mammography, a gold standard for whether a woman should be recalled for additional workup is needed, given that interval cancers may be occult on mammography and some findings ultimately determined to be benign require additional imaging to determine if biopsy is warranted. Using experts to set a gold standard assumes little variation occurs in their interpretations, but this has not been explicitly studied in mammography. Using digitized films from 314 screening mammography exams (n = 143 cancer cases) performed in the Breast Cancer Surveillance Consortium, we evaluated interpretive agreement among three expert radiologists who independently assessed whether each examination should be recalled, and the lesion location, finding type (mass, calcification, asymmetric density, or architectural distortion), and interpretive difficulty in the recalled images. Agreement among the three expert pairs for recall/no recall was higher for cancer cases (mean 74.3 ± 6.5) than for noncancers (mean 62.6 ± 7.1). Complete agreement on recall, lesion location, finding type and difficulty ranged from 36.4% to 42.0% for cancer cases and from 43.9% to 65.6% for noncancer cases. Two of three experts agreed on recall and lesion location for 95.1% of cancer cases and 91.8% of noncancer cases, but all three experts agreed on only 55.2% of cancer cases and 42.1% of noncancer cases. Variability in expert interpretation is notable. A minimum of three independent experts combined with a consensus process should be used to establish any gold standard interpretation for test sets, especially for noncancer cases.
    Academic Radiology 06/2013; 20(6):731-9. DOI: 10.1016/j.acra.2013.01.012
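The related studies above report accuracy as PPV and NPV, which are simple functions of the confusion-matrix counts. A minimal sketch with hypothetical counts (illustration only, not study data):

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: fraction of positive calls that are true positives."""
    return tp / (tp + fp)

def npv(tn: int, fn: int) -> float:
    """Negative predictive value: fraction of negative calls that are true negatives."""
    return tn / (tn + fn)

# Hypothetical reader: 30 recalls with cancer, 70 recalls without cancer,
# 890 correct non-recalls, 10 missed cancers
print(ppv(30, 70))            # 0.3
print(round(npv(890, 10), 3))  # 0.989
```

Because screening populations have few cancers, NPV is typically high for every reader, which is why the studies above focus on trends in PPV and NPV across confidence levels rather than their absolute values.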