Article

Optimization of reference library used in content-based medical image retrieval scheme.

Department of Radiology, University of Pittsburgh, 3362 Fifth Avenue, Pittsburgh, Pennsylvania 15213, USA.
Medical Physics (Impact Factor: 3.01). 12/2007; 34(11):4331-9. DOI: 10.1118/1.2795826
Source: PubMed

ABSTRACT Building an optimal image reference library is a critical step in developing the interactive computer-aided detection and diagnosis (I-CAD) systems of medical images using content-based image retrieval (CBIR) schemes. In this study, the authors conducted two experiments to investigate (1) the relationship between I-CAD performance and size of reference library and (2) a new reference selection strategy to optimize the library and improve I-CAD performance. The authors assembled a reference library that includes 3153 regions of interest (ROI) depicting either malignant masses (1592) or CAD-cued false-positive regions (1561) and an independent testing data set including 200 masses and 200 false-positive regions. A CBIR scheme using a distance-weighted K-nearest neighbor algorithm is applied to retrieve references that are considered similar to the testing sample from the library. The area under receiver operating characteristic curve (Az) is used as an index to evaluate the I-CAD performance. In the first experiment, the authors systematically increased reference library size and tested I-CAD performance. The result indicates that scheme performance improves initially from Az= 0.715 to 0.874 and then plateaus when the library size reaches approximately half of its maximum capacity. In the second experiment, based on the hypothesis that a ROI should be removed if it performs poorly compared to a group of similar ROIs in a large and diverse reference library, the authors applied a new strategy to identify "poorly effective" references. By removing 174 identified ROIs from the reference library, I-CAD performance significantly increases to Az = 0.914 (p < 0.01). The study demonstrates that increasing reference library size and removing poorly effective references can significantly improve I-CAD performance.

Download full-text

Full-text

Available from: Bin Zheng, Jun 30, 2015
0 Followers
 · 
90 Views
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: When constructing a pattern classifier, it is important to make best use of the instances (a.k.a. cases, examples, patterns or prototypes) available for its development. In this paper we present an extensive comparative analysis of algorithms that, given a pool of previously acquired instances, attempt to select those that will be the most effective to construct an instance-based classifier in terms of classification performance, time efficiency and storage requirements. We evaluate seven previously proposed instance selection algorithms and compare their performance to simple random selection of instances. We perform the evaluation using k-nearest neighbor classifier and three classification problems: one with simulated Gaussian data and two based on clinical databases for breast cancer detection and diagnosis, respectively. Finally, we evaluate the impact of the number of instances available for selection on the performance of the selection algorithms and conduct initial analysis of the selected instances. The experiments show that for all investigated classification problems, it was possible to reduce the size of the original development dataset to less than 3% of its initial size while maintaining or improving the classification performance. Random mutation hill climbing emerges as the superior selection algorithm. Furthermore, we show that some previously proposed algorithms perform worse than random selection. Regarding the impact of the number of instances available for the classifier development on the performance of the selection algorithms, we confirm that the selection algorithms are generally more effective as the pool of available instances increases. In conclusion, instance selection is generally beneficial for instance-based classifiers as it can improve their performance, reduce their storage requirements and improve their response time. However, choosing the right selection algorithm is crucial.
    Physics in Medicine and Biology 01/2011; 56(2):473-89. DOI:10.1088/0031-9155/56/2/012 · 2.92 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Case-based computer-aided decision (CB-CAD) systems rely on a database of previously stored, known examples when classifying new, incoming queries. Such systems can be particularly useful since they do not need retraining every time a new example is deposited in the case base. The adaptive nature of case-based systems is well suited to the current trend of continuously expanding digital databases in the medical domain. To maintain efficiency, however, such systems need sophisticated strategies to effectively manage the available evidence database. In this paper, we discuss the general problem of building an evidence database by selecting the most useful examples to store while satisfying existing storage requirements. We evaluate three intelligent techniques for this purpose: genetic algorithm-based selection, greedy selection and random mutation hill climbing. These techniques are compared to a random selection strategy used as the baseline. The study is performed with a previously presented CB-CAD system applied for false positive reduction in screening mammograms. The experimental evaluation shows that when the development goal is to maximize the system's diagnostic performance, the intelligent techniques are able to reduce the size of the evidence database to 37% of the original database by eliminating superfluous and/or detrimental examples while at the same time significantly improving the CAD system's performance. Furthermore, if the case-base size is a main concern, the total number of examples stored in the system can be reduced to only 2-4% of the original database without a decrease in the diagnostic performance. Comparison of the techniques shows that random mutation hill climbing provides the best balance between the diagnostic performance and computational efficiency when building the evidence database of the CB-CAD system.
    Physics in Medicine and Biology 11/2008; 53(21):6079-96. DOI:10.1088/0031-9155/53/21/013 · 2.92 Impact Factor
  • Source