Optimization of reference library used in content-based medical image retrieval scheme

Department of Radiology, University of Pittsburgh, 3362 Fifth Avenue, Pittsburgh, Pennsylvania 15213, USA.
Medical Physics (Impact Factor: 2.64). 12/2007; 34(11):4331-9. DOI: 10.1118/1.2795826
Source: PubMed


Building an optimal image reference library is a critical step in developing interactive computer-aided detection and diagnosis (I-CAD) systems for medical images that use content-based image retrieval (CBIR) schemes. In this study, the authors conducted two experiments to investigate (1) the relationship between I-CAD performance and the size of the reference library and (2) a new reference selection strategy to optimize the library and improve I-CAD performance. The authors assembled a reference library of 3153 regions of interest (ROIs) depicting either malignant masses (1592) or CAD-cued false-positive regions (1561), and an independent testing data set of 200 masses and 200 false-positive regions. A CBIR scheme using a distance-weighted K-nearest neighbor algorithm retrieves from the library the references considered similar to each testing sample. The area under the receiver operating characteristic curve (Az) is used as the index of I-CAD performance. In the first experiment, the authors systematically increased the reference library size and tested I-CAD performance. The results indicate that performance initially improves from Az = 0.715 to 0.874 and then plateaus once the library reaches approximately half of its maximum size. In the second experiment, based on the hypothesis that an ROI should be removed if it performs poorly compared with a group of similar ROIs in a large and diverse reference library, the authors applied a new strategy to identify "poorly effective" references. Removing the 174 identified ROIs from the reference library significantly increased I-CAD performance to Az = 0.914 (p < 0.01). The study demonstrates that increasing the reference library size and removing poorly effective references can significantly improve I-CAD performance.
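The retrieval-and-scoring step can be illustrated with a minimal sketch. The abstract does not specify the image features, distance metric, weighting function, or K, so this sketch assumes Euclidean distance on precomputed feature vectors, inverse-distance weights, and K = 15; `knn_score` and `empirical_auc` are hypothetical names. Az is estimated here as the nonparametric (Mann-Whitney) area under the empirical ROC curve, which may differ from the ROC-fitting method used in the study.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def knn_score(query: Vector, refs: Sequence[Vector], labels: Sequence[int],
              k: int = 15, eps: float = 1e-6) -> float:
    """Distance-weighted K-NN likelihood that `query` depicts a mass.

    labels: 1 = malignant mass, 0 = CAD-cued false positive.
    Assumed choices: Euclidean distance, weights 1 / (d + eps), K = 15.
    """
    # Rank every reference by distance to the query and keep the K closest.
    nearest = sorted((math.dist(query, r), y) for r, y in zip(refs, labels))[:k]
    # Closer references get larger weights; eps avoids division by zero.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    mass_weight = sum(w for w, (_, y) in zip(weights, nearest) if y == 1)
    return mass_weight / sum(weights)


def empirical_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Nonparametric AUC: P(mass score > false-positive score), ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy 2-D features (hypothetical data, not from the study).
    refs = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
    labels = [0, 0, 1, 1]
    print(knn_score((0.95, 1.0), refs, labels, k=3))
```

In the toy usage, the query sits next to the two mass references, so its score is close to 1. Scoring every testing ROI this way and passing the mass scores and false-positive scores to `empirical_auc` yields one Az value per library configuration.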



Available from: Bin Zheng
    • "Apparently, lack of scalability would hamper the utilization of these valuable medical images. On the one hand, it limits the diagnostic accuracy of CAD applications, since the larger a database is, the more likely that relevant cases are found and a correct decision is made [21], [22]. On the other hand, it is infeasible for a practical PACS to retrieve medical images using these techniques. "
    ABSTRACT: Computer-aided diagnosis of masses in mammograms is important to the prevention of breast cancer. Many approaches tackle this problem through content-based image retrieval (CBIR) techniques. However, most of them lack scalability in the retrieval stage, which restricts their diagnostic accuracy. To overcome this drawback, we propose a scalable method for retrieval and diagnosis of mammographic masses. Specifically, for a query mammographic region of interest (ROI), SIFT features are extracted and searched in a vocabulary tree, which stores all the quantized features of previously diagnosed mammographic ROIs. In addition, to fully exploit the discriminative power of SIFT features, contextual information in the vocabulary tree is used to refine the weights of tree nodes. The retrieved ROIs are then used to determine whether the query ROI contains a mass. The presented method has excellent scalability owing to the low space and time cost of the vocabulary tree. Extensive experiments on a large dataset of 11,553 ROIs extracted from the Digital Database for Screening Mammography (DDSM) demonstrate the accuracy and scalability of our approach.
    Article · Oct 2014 · IEEE Transactions on Biomedical Engineering
    • "The problem of instance selection has attracted some attention in the computer-aided medical diagnosis (CAD) community. Park et al (2007) approached the problem of selecting an optimal instance base in the context of false positive reduction of computer-aided detection marks in mammograms. The authors applied a variation of a popular instance selection algorithm to their database. "
    ABSTRACT: When constructing a pattern classifier, it is important to make the best use of the instances (a.k.a. cases, examples, patterns or prototypes) available for its development. In this paper we present an extensive comparative analysis of algorithms that, given a pool of previously acquired instances, attempt to select those that will be the most effective to construct an instance-based classifier in terms of classification performance, time efficiency and storage requirements. We evaluate seven previously proposed instance selection algorithms and compare their performance to simple random selection of instances. We perform the evaluation using a k-nearest neighbor classifier and three classification problems: one with simulated Gaussian data and two based on clinical databases for breast cancer detection and diagnosis, respectively. Finally, we evaluate the impact of the number of instances available for selection on the performance of the selection algorithms and conduct an initial analysis of the selected instances. The experiments show that for all investigated classification problems, it was possible to reduce the size of the original development dataset to less than 3% of its initial size while maintaining or improving the classification performance. Random mutation hill climbing emerges as the superior selection algorithm. Furthermore, we show that some previously proposed algorithms perform worse than random selection. Regarding the impact of the number of instances available for the classifier development on the performance of the selection algorithms, we confirm that the selection algorithms are generally more effective as the pool of available instances increases. In conclusion, instance selection is generally beneficial for instance-based classifiers as it can improve their performance, reduce their storage requirements and improve their response time. However, choosing the right selection algorithm is crucial.
    Article · Jan 2011 · Physics in Medicine and Biology
    • "Recently, case-base optimization gained some interest in the CAD community. For example, Park et al (2007) proposed a variation of the edited nearest-neighbor rule (Wilson and Martinez 2000) for case-base reduction. The authors evaluated the technique with a feature-based CAD system for false positive reduction in screening mammograms. "
    ABSTRACT: Case-based computer-aided decision (CB-CAD) systems rely on a database of previously stored, known examples when classifying new, incoming queries. Such systems can be particularly useful since they do not need retraining every time a new example is deposited in the case base. The adaptive nature of case-based systems is well suited to the current trend of continuously expanding digital databases in the medical domain. To maintain efficiency, however, such systems need sophisticated strategies to effectively manage the available evidence database. In this paper, we discuss the general problem of building an evidence database by selecting the most useful examples to store while satisfying existing storage requirements. We evaluate three intelligent techniques for this purpose: genetic algorithm-based selection, greedy selection and random mutation hill climbing. These techniques are compared to a random selection strategy used as the baseline. The study is performed with a previously presented CB-CAD system applied for false positive reduction in screening mammograms. The experimental evaluation shows that when the development goal is to maximize the system's diagnostic performance, the intelligent techniques are able to reduce the size of the evidence database to 37% of the original database by eliminating superfluous and/or detrimental examples while at the same time significantly improving the CAD system's performance. Furthermore, if the case-base size is a main concern, the total number of examples stored in the system can be reduced to only 2-4% of the original database without a decrease in the diagnostic performance. Comparison of the techniques shows that random mutation hill climbing provides the best balance between the diagnostic performance and computational efficiency when building the evidence database of the CB-CAD system.
    Article · Nov 2008 · Physics in Medicine and Biology
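Both citing studies above single out random mutation hill climbing (RMHC) for instance selection, and it can be sketched as follows. This is a generic, minimal version of the technique under stated assumptions, not either paper's implementation: the fitness is leave-one-out 1-nearest-neighbor accuracy over the whole development pool, the selected subset has a fixed size `m`, and the iteration count and random seed are arbitrary. The function names `loo_1nn_accuracy` and `rmhc_select` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def loo_1nn_accuracy(prototypes: Sequence[int], X: Sequence[Vector],
                     y: Sequence[int]) -> float:
    """Leave-one-out 1-NN accuracy on the whole pool, using only `prototypes`."""
    correct = 0
    for i, (x, label) in enumerate(zip(X, y)):
        # A case may not vote for itself when it is one of the prototypes.
        candidates = [j for j in prototypes if j != i]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda j: math.dist(x, X[j]))
        correct += int(y[nearest] == label)
    return correct / len(X)


def rmhc_select(X: Sequence[Vector], y: Sequence[int], m: int,
                iterations: int = 500, seed: int = 0) -> Tuple[List[int], float]:
    """Random mutation hill climbing over subsets of exactly m case indices."""
    rng = random.Random(seed)
    selected = rng.sample(range(len(X)), m)
    best = loo_1nn_accuracy(selected, X, y)
    for _ in range(iterations):
        chosen = set(selected)
        outside = [j for j in range(len(X)) if j not in chosen]
        if not outside:
            break
        # Mutation: swap one selected case for one currently unselected case.
        candidate = list(selected)
        candidate[rng.randrange(m)] = rng.choice(outside)
        score = loo_1nn_accuracy(candidate, X, y)
        # Keep the mutation unless it lowers accuracy (ties are accepted).
        if score >= best:
            selected, best = candidate, score
    return sorted(selected), best
```

Accepting ties is an arbitrary choice here that lets the search move across plateaus of equal accuracy; a strict improvement rule would also be a reasonable variant. Each mutation re-evaluates the full pool, so the cost per iteration grows with both the pool size and `m`.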