Optimization of reference library used in content-based medical image retrieval scheme.

Department of Radiology, University of Pittsburgh, 3362 Fifth Avenue, Pittsburgh, Pennsylvania 15213, USA.
Medical Physics (Impact Factor: 2.91). 12/2007; 34(11):4331-9. DOI: 10.1118/1.2795826
Source: PubMed

ABSTRACT Building an optimal image reference library is a critical step in developing interactive computer-aided detection and diagnosis (I-CAD) systems for medical images that use content-based image retrieval (CBIR) schemes. In this study, the authors conducted two experiments to investigate (1) the relationship between I-CAD performance and the size of the reference library and (2) a new reference selection strategy to optimize the library and improve I-CAD performance. The authors assembled a reference library of 3153 regions of interest (ROIs) depicting either malignant masses (1592) or CAD-cued false-positive regions (1561), and an independent testing data set of 200 masses and 200 false-positive regions. A CBIR scheme using a distance-weighted K-nearest neighbor algorithm was applied to retrieve references from the library that are considered similar to the testing sample. The area under the receiver operating characteristic curve (Az) was used as an index to evaluate I-CAD performance. In the first experiment, the authors systematically increased the reference library size and tested I-CAD performance. The results indicate that scheme performance improves initially from Az = 0.715 to 0.874 and then plateaus when the library reaches approximately half of its maximum size. In the second experiment, based on the hypothesis that an ROI should be removed if it performs poorly compared to a group of similar ROIs in a large and diverse reference library, the authors applied a new strategy to identify "poorly effective" references. After the 174 identified ROIs were removed from the reference library, I-CAD performance increased significantly to Az = 0.914 (p < 0.01). The study demonstrates that increasing reference library size and removing poorly effective references can significantly improve I-CAD performance.
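As a rough illustration of the retrieval and scoring step described in the abstract, the following Python sketch scores a query ROI with a distance-weighted K-nearest neighbor rule and evaluates scores with an empirical area under the ROC curve. The abstract does not give the weighting function, the feature set, or the value of K, so the inverse-distance weights, the two-element feature vectors, the toy library, and the names `knn_score` and `auc` are all assumptions made for this example. The empirical (Mann-Whitney) AUC is used here as a simple stand-in and is not necessarily how the authors computed Az.

```python
import math

def knn_score(query, library, k=3, eps=1e-9):
    """Distance-weighted k-NN score in [0, 1]; higher means more mass-like.

    library: list of (feature_vector, label) pairs, label 1 = mass,
    0 = false positive. Inverse-distance weighting is an assumption.
    """
    # Rank references by Euclidean distance to the query.
    ranked = sorted((math.dist(query, feats), label) for feats, label in library)
    nearest = ranked[:k]
    # Closer references get larger weights; eps avoids division by zero.
    weights = [(1.0 / (d + eps), label) for d, label in nearest]
    total = sum(w for w, _ in weights)
    return sum(w for w, label in weights if label == 1) / total

def auc(scores_pos, scores_neg):
    """Empirical area under the ROC curve; tied pairs count as 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical toy library: two false positives near (0, 0), two masses near (1, 1).
toy_library = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((1.0, 1.0), 1), ((0.9, 1.0), 1)]
mass_like = knn_score((1.0, 0.9), toy_library)
fp_like = knn_score((0.0, 0.1), toy_library)
```

With this toy library, a query near the mass cluster receives a score close to 1 and a query near the false-positive cluster a score close to 0. Repeating the evaluation with growing subsets of a real library would reproduce the form of the first experiment.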

  • ABSTRACT: Although mammography is the only clinically accepted imaging modality for screening the general population to detect breast cancer, interpreting mammograms is difficult, and sensitivity and specificity are relatively low. To provide radiologists with "a visual aid" in interpreting mammograms, we developed and tested an interactive system for computer-aided detection and diagnosis (CAD) of mass-like cancers. Using this system, an observer can view CAD-cued mass regions depicted on one image and then query any suspicious regions (either cued or not cued by CAD). The CAD scheme automatically segments the suspicious region, or accepts a manually defined region, and computes a set of image features. Using a content-based image retrieval (CBIR) algorithm, CAD searches for a set of reference images depicting "abnormalities" similar to the queried region. Based on the image retrieval results and a decision algorithm, a classification score is assigned to the queried region. In this study, a reference database with 1,800 malignant mass regions and 1,800 benign and CAD-generated false-positive regions was used. A modified CBIR algorithm, with a new function that stretches the attributes in the multidimensional space and a decision scheme, was optimized using a genetic algorithm. Using a leave-one-out testing method to classify suspicious mass regions, we compared the classification performance of two CBIR algorithms with either equally weighted or optimally stretched attributes. Using the modified CBIR algorithm, the area under the receiver operating characteristic curve increased significantly from 0.865 ± 0.006 to 0.897 ± 0.005 (p < 0.001). This study demonstrated the feasibility of developing an interactive CAD system with a large reference database and achieving improved performance.
    Journal of Digital Imaging 01/2012; 25(5):570-9. · 1.10 Impact Factor
  • ABSTRACT: Purpose: We are developing a decision tree content-based image retrieval (DTCBIR) CADx system to assist radiologists in the characterization of breast masses on ultrasound images. Methods: Three DTCBIR configurations were compared: decision tree with boosting (DTb), decision tree with full leaf features (DTL), and decision tree with selected leaf features (DTLs). For DTb, the features of a query mass were first combined into a merged feature score, and masses with similar scores were then retrieved. For DTL and DTLs, similar masses were retrieved based on the Euclidean distance between the feature vector of the query and those of selected references. For each DTCBIR configuration, we investigated the use of the full feature set and of a subset of features selected by stepwise linear discriminant analysis (LDA) and the simplex optimization method. This resulted in six retrieval methods, of which five (DTb-lda, DTL-lda, DTb-full, DTL-full, and DTLs-full) were selected for the observer study. Three MQSA radiologists rated the similarity between the query mass and the three most similar computer-retrieved masses using a nine-point similarity scale (9 = very similar). Results: For DTb-lda, DTL-lda, DTb-full, DTL-full, and DTLs-full, the average Az values were 0.90 ± 0.03, 0.85 ± 0.04, 0.87 ± 0.04, 0.79 ± 0.05, and 0.71 ± 0.06, respectively, and the average similarity ratings were 5.00, 5.41, 4.96, 5.33, and 5.13, respectively. Conclusions: DTL-lda is a promising DTCBIR CADx configuration, with a simple tree structure, good classification performance, and the highest similarity rating.
    Medical Physics 01/2013; 40(1):012901. · 2.91 Impact Factor
  • Clinical and Translational Science 04/2013; 6(2):85-7. · 2.33 Impact Factor
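
The first related abstract above compares equally weighted attributes with "stretched" attributes under leave-one-out testing. The Python sketch below illustrates the general idea only: each feature axis is scaled by a stretch factor before the Euclidean distance is computed, and every sample is classified in turn using the remaining samples as the reference set. The stretch factors are fixed by hand here rather than found with a genetic algorithm, and the function names, the toy samples, and the majority-vote rule are assumptions made for this example, not the published method.

```python
import math

def stretched_distance(a, b, stretch):
    """Euclidean distance after scaling each feature axis by its stretch factor."""
    return math.sqrt(sum((s * (x - y)) ** 2 for s, x, y in zip(stretch, a, b)))

def leave_one_out_accuracy(samples, stretch, k=1):
    """Classify each sample by majority vote of its k nearest other samples."""
    correct = 0
    for i, (query, label) in enumerate(samples):
        # Hold out sample i; the remaining samples act as the reference library.
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: stretched_distance(query, s[0], stretch))
        votes = [lab for _, lab in others[:k]]
        predicted = 1 if 2 * sum(votes) > len(votes) else 0
        correct += predicted == label
    return correct / len(samples)

# Hypothetical toy data: feature 0 separates the classes,
# feature 1 is large-scale noise unrelated to the label.
toy_samples = [
    ((0.0, 0.0), 0),
    ((0.1, 5.0), 0),
    ((1.0, 0.2), 1),
    ((1.1, 5.1), 1),
]

equal_acc = leave_one_out_accuracy(toy_samples, stretch=(1.0, 1.0))
stretched_acc = leave_one_out_accuracy(toy_samples, stretch=(1.0, 0.05))
```

On this toy data, equal weighting lets the noisy second feature dominate the distance, while shrinking that axis lets the informative first feature drive retrieval. A search procedure such as a genetic algorithm would look for stretch factors that maximize a performance index of this kind.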
