Determination of similarity measures for pairs of mass lesions on mammograms by use of BI-RADS lesion descriptors and image features.

Kurt Rossmann Laboratories for Radiologic Image Research, Department of Radiology, The University of Chicago, Chicago, IL, USA.
Academic radiology (Impact Factor: 2.09). 05/2009; 16(4):443-9. DOI: 10.1016/j.acra.2008.10.012
Source: PubMed

ABSTRACT: The purpose of this study was to determine similarity measures for selecting similar images with known pathology that would be useful to radiologists as a reference guide in the diagnosis of new breast lesions on mammograms.
The images were obtained from the Digital Database for Screening Mammography developed by the University of South Florida. For determination and evaluation of similarity measures, the "gold standard" of similarities for 300 pairs of masses was determined by 10 breast radiologists. For determining similarity measures that would agree with radiologists' similarity determination, an artificial neural network (ANN) was trained with the radiologists' subjective similarity ratings and the image features. The image features were determined subjectively using the Breast Imaging Reporting and Data System (BI-RADS) lesion descriptors and objectively by computerized image analysis. The similarity measures determined by the ANN were compared to the gold standard and evaluated in terms of the correlation coefficient.
The similarity measures determined using the BI-RADS descriptors only were not as useful as those determined using the image features only. When the BI-RADS margin ratings were combined with the image features, the correlation coefficient between the subjective ratings and the objective measures improved slightly (r = 0.76) compared with that based on the image features alone (r = 0.74).
The inclusion of the BI-RADS margin descriptors may be useful for determining similarity measures, especially when manual outlines of the masses are difficult to obtain, provided that the BI-RADS descriptors are assigned consistently by radiologists.
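
The evaluation step described above, comparing computed similarity measures with the radiologists' gold standard by a correlation coefficient, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the 10 readers' ratings were combined, so averaging them per pair is an assumption, and all names and numbers below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: reader_ratings[i] holds each reader's subjective
# similarity rating for lesion pair i (three readers shown for brevity).
reader_ratings = [
    [0.9, 0.8, 0.7],
    [0.2, 0.3, 0.1],
    [0.5, 0.6, 0.4],
    [0.7, 0.6, 0.8],
]

# Assumption: the per-pair gold standard is the mean rating across readers.
gold_standard = [sum(r) / len(r) for r in reader_ratings]

# Hypothetical objective similarity measures produced by a trained model.
model_scores = [0.75, 0.30, 0.45, 0.65]

r = pearson_r(gold_standard, model_scores)
print(f"r = {r:.2f}")
```

A value of r closer to 1 means the objective measure ranks and spaces the lesion pairs more like the readers do, which is how the reported values of 0.74 and 0.76 are to be read.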

  • ABSTRACT: Presentation of similar reference images can be useful for the diagnosis of new lesions. A similarity map that gives a visual overview of the relationships between lesions of different types may supplement the reference images. A new method for constructing a similarity map by multidimensional scaling (MDS) for breast masses on mammograms was investigated. Nine pathologic types were included, and three regions of interest from each of the nine groups were used in this study. Subjective similarity ratings by expert readers were obtained for all 351 possible pairs of masses. An MDS similarity map was created from the average ratings. Each axis of the MDS configuration was fitted by a linear model with 13 image features to reconstruct the similarity map. Dissimilarity based on the distance in the reconstructed space was determined and compared with the subjective ratings. Consistent with the experts' data, the MDS map represented the similarity between cysts and fibroadenomas; between invasive lobular carcinomas and scirrhous carcinomas; and among ductal carcinomas in situ, solid-tubular carcinomas, and papillotubular carcinomas. The correlation between the average subjective ratings and the dissimilarities based on the distance in the reconstructed feature space was much stronger (-0.87) than that for the dissimilarities based on the distance in the conventional feature space (-0.65). The new similarity map by MDS can be useful for visualizing the relationships between breast masses of different pathologic types. It is potentially useful for selecting similarity measures and for providing supplemental information.
    Journal of Digital Imaging 01/2013; · 1.10 Impact Factor
  • ABSTRACT: Purpose: We are developing a decision tree content-based image retrieval (DTCBIR) CADx system to assist radiologists in the characterization of breast masses on ultrasound images. Methods: Three DTCBIR configurations were compared: decision tree with boosting (DTb), decision tree with full leaf features (DTL), and decision tree with selected leaf features (DTLs). For DTb, the features of a query mass were first combined into a merged feature score, and masses with similar scores were then retrieved. For DTL and DTLs, similar masses were retrieved based on the Euclidean distance between the feature vector of the query and those of selected references. For each DTCBIR configuration, we investigated the use of the full feature set and of a subset of features selected by stepwise linear discriminant analysis (LDA) and the simplex optimization method, resulting in six retrieval methods, of which five (DTb-lda, DTL-lda, DTb-full, DTL-full, and DTLs-full) were selected for the observer study. Three MQSA radiologists rated the similarity between the query mass and the three most similar computer-retrieved masses using a nine-point similarity scale (9 = very similar). Results: For DTb-lda, DTL-lda, DTb-full, DTL-full, and DTLs-full, the average A(z) values were 0.90 ± 0.03, 0.85 ± 0.04, 0.87 ± 0.04, 0.79 ± 0.05, and 0.71 ± 0.06, respectively, and the average similarity ratings were 5.00, 5.41, 4.96, 5.33, and 5.13, respectively. Conclusions: DTL-lda is a promising DTCBIR CADx configuration that had a simple tree structure, good classification performance, and the highest similarity rating.
    Medical Physics 01/2013; 40(1):012901. · 2.91 Impact Factor
  • ABSTRACT: We conducted an observer study to investigate how the data collection method affects the efficacy of modeling individual radiologists' judgments regarding the perceptual similarity of breast masses on mammograms. Six observers of varying experience levels in breast imaging were recruited to assess the perceptual similarity of mammographic masses. The observers' subjective judgments were collected using (i) a rating method, (ii) a preference method, and (iii) a hybrid method combining rating and ranking. Personalized user models were developed with the collected data to predict observers' opinions. The relative efficacy of each data collection method was assessed based on the classification accuracy of the resulting user models. The average accuracy of the user models derived from data collected with the hybrid method was 55.5 ± 1.5%. These models were significantly more accurate (P < .0005) than those derived from the rating (45.3 ± 3.5%) and the preference (40.8 ± 5%) methods. On average, the rating data collection method was significantly faster than the other two methods (P < .0001). No time advantage was observed between the preference and the hybrid methods. A hybrid method combining rating and ranking is an intuitive and efficient way of collecting subjective similarity judgments to model human perceptual opinions with higher accuracy than other, more commonly used data collection methods.
    Academic radiology 11/2013; 20(11):1371-1380. · 2.09 Impact Factor
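
Two of the abstracts listed above measure dissimilarity as a distance in image-feature space, and one retrieves the three most similar masses by Euclidean distance between feature vectors. The sketch below illustrates only that generic retrieval step. It is not any of the authors' systems: the lesion IDs, the three-element feature vectors, and the assumption that features are already normalized to a common scale are all hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_most_similar(query, references, k=3):
    """Return the k (lesion_id, distance) pairs nearest to the query.

    A smaller distance is treated as greater similarity, so results
    are ordered nearest first.
    """
    ranked = sorted(
        ((lesion_id, euclidean_distance(query, features))
         for lesion_id, features in references.items()),
        key=lambda item: item[1],
    )
    return ranked[:k]


# Hypothetical reference library: lesion ID -> normalized feature vector.
references = {
    "ref_a": [0.10, 0.80, 0.30],
    "ref_b": [0.90, 0.20, 0.70],
    "ref_c": [0.15, 0.75, 0.35],
    "ref_d": [0.50, 0.50, 0.50],
}
query = [0.12, 0.78, 0.32]

for lesion_id, distance in retrieve_most_similar(query, references):
    print(f"{lesion_id}: {distance:.3f}")
```

Because raw Euclidean distance weights every feature equally, features on larger numeric scales would dominate the ranking unless they are normalized first; that is why the sketch assumes normalized inputs.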