Optimized approach to decision fusion of heterogeneous data for breast cancer diagnosis

Department of Electrical and Computer Engineering (ECE), Duke University, Durham, North Carolina, United States
Medical Physics (Impact Factor: 2.64). 08/2006; 33(8):2945-54. DOI: 10.1118/1.2208934
Source: PubMed


As more diagnostic testing options become available to physicians, it becomes more difficult to combine the various types of medical information in order to optimize the overall diagnosis. To improve diagnostic performance, here we introduce an approach to optimize a decision-fusion technique that combines heterogeneous information, such as information from different modalities, feature categories, or institutions. For classifier comparison we used two performance metrics: the area under the receiver operating characteristic (ROC) curve (AUC) and the normalized partial area under the ROC curve (pAUC). This study used four classifiers: linear discriminant analysis (LDA), an artificial neural network (ANN), and two variants of our decision-fusion technique, AUC-optimized (DF-A) and pAUC-optimized (DF-P) decision fusion. We applied each of these classifiers with 100-fold cross-validation to two heterogeneous breast cancer data sets: one of mass lesion features and a much more challenging one of microcalcification lesion features. For the calcification data set, DF-A outperformed the other classifiers in terms of AUC (p < 0.02) and achieved AUC = 0.85 +/- 0.01. DF-P surpassed the other classifiers in terms of pAUC (p < 0.01) and reached pAUC = 0.38 +/- 0.02. For the mass data set, DF-A outperformed both the ANN and the LDA (p < 0.04) and achieved AUC = 0.94 +/- 0.01. Although for this data set there were no statistically significant differences among the classifiers' pAUC values (pAUC = 0.57 +/- 0.07 to 0.67 +/- 0.05, p > 0.10), DF-P did significantly improve specificity versus the LDA at both 98% and 100% sensitivity (p < 0.04). In conclusion, decision fusion directly optimized clinically significant performance measures, such as AUC and pAUC, and sometimes outperformed two well-known machine-learning techniques when applied to two different breast cancer data sets.
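The abstract compares classifiers by AUC, normalized pAUC, and specificity at a fixed sensitivity (the 98% and 100% operating points). The pure-Python sketch below shows one way to compute these from classifier scores on an empirical (non-parametric) ROC curve. It assumes pAUC is the area under the ROC curve above a sensitivity (TPF) floor of 0.90, divided by 0.10 so that a perfect classifier scores 1; that threshold, the function names, and the toy scores are illustrative assumptions, not taken from the paper, and this is not the paper's decision-fusion method.

```python
from typing import List, Sequence, Tuple


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Empirical AUC via the Wilcoxon-Mann-Whitney statistic (ties count 1/2)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(pos: Sequence[float], neg: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical (FPF, TPF) operating points; a case is called positive if score >= t."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpf = sum(p >= t for p in pos) / len(pos)
        fpf = sum(n >= t for n in neg) / len(neg)
        points.append((fpf, tpf))
    return points


def normalized_pauc(pos: Sequence[float], neg: Sequence[float], tpf0: float = 0.9) -> float:
    """Area under the ROC curve above TPF = tpf0, divided by (1 - tpf0).

    The 0.9 default is an assumed convention, not a value stated in the abstract.
    """
    pts = roc_points(pos, neg)
    area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        if y1 >= tpf0:  # whole segment lies above the sensitivity floor
            area += (x2 - x1) * ((y1 + y2) / 2 - tpf0)
        elif y2 > tpf0:  # segment crosses the floor; TPF never decreases along the curve
            xc = x1 + (tpf0 - y1) * (x2 - x1) / (y2 - y1)
            area += (x2 - xc) * (y2 - tpf0) / 2
    return area / (1 - tpf0)


def specificity_at_sensitivity(pos: Sequence[float], neg: Sequence[float], sens: float) -> float:
    """Best specificity (1 - FPF) among operating points whose TPF is at least sens."""
    return max(1 - fpf for fpf, tpf in roc_points(pos, neg) if tpf >= sens)


# Hypothetical toy scores: higher means "more likely malignant".
malignant = [0.9, 0.8, 0.7, 0.6]
benign = [0.65, 0.5, 0.4, 0.3]
print(auc(malignant, benign))                              # 0.9375
print(round(normalized_pauc(malignant, benign), 6))        # 0.75
print(specificity_at_sensitivity(malignant, benign, 1.0))  # 0.75
```

The pAUC rewards only the high-sensitivity corner of the ROC curve, which is why a classifier can win on pAUC (as DF-P did for calcifications) without winning on overall AUC.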

    • "However, most of them have focused on single-classifier methods. There are only a few publications on the design of ensemble classifier systems for the classification of mammographic lesions (Santo et al 2003, Constantinidis et al 2001, Jesneck et al 2006, Fung et al 2006, Yoon and Kim 2008). Constantinidis et al (2001) proposed the so-called augmented behavior knowledge space method for classifying circumscribed masses in digital mammograms."
    ABSTRACT: We propose a novel computer-aided detection (CAD) framework for breast masses in mammography. To increase detection sensitivity for various types of mammographic masses, we propose the combined use of different detection algorithms. In particular, we develop a region-of-interest combination mechanism that integrates detection information gained from unsupervised and supervised detection algorithms. Also, to significantly reduce the number of false-positive (FP) detections, a new ensemble classification algorithm is developed. Extensive experiments have been conducted on a benchmark mammogram database. Results show that our combined detection approach can considerably improve detection sensitivity at a small cost in FP rate, compared to representative detection algorithms previously developed for mammographic CAD systems. The proposed ensemble classification solution also has a dramatic impact on the reduction of FP detections: as much as 70% (from 15 to 4.5 per image) at the cost of only a 4.6% sensitivity loss (from 90.0% to 85.4%). Moreover, our proposed CAD method performs as well as or better than mammography CAD algorithms previously reported in the literature (70.7% and 80.0% sensitivity at 1.5 and 3.5 FPs per image, respectively).
    Physics in Medicine and Biology 06/2014; 59(14):3697. DOI:10.1088/0031-9155/59/14/3697 · 2.76 Impact Factor
    • "Rohlfing et al. [2] compared the two methods of combining information sources, combination of data (COD) and combination of interpretations (COI), in different biomedical image analysis applications, while Haapanen and Tuominen [3] followed a COD approach, combining satellite image and aerial photograph features to estimate forest variables more accurately. On the other hand, Jesneck et al. [4], following a COI path, optimized clinically significant performance measures in a decision-fusion technique combining heterogeneous breast cancer data. Lee et al. [5] proposed a Generalized Fusion Framework (GFF) for homogeneous data representation and subsequent fusion in the metaspace, using dimensionality reduction techniques."
    ABSTRACT: This work studies the effects of simple imputations on the integration of multimodal data originating from different patients. Two separate datasets of cutaneous melanoma are used: an image analysis (dermoscopy) dataset and a transcriptomic one, specifically DNA microarrays. Each modality is related to a different set of patients, and four imputation methods are used to form a unified, integrative dataset. The application of backward selection together with ensemble classifiers (random forests), followed by principal components analysis and linear discriminant analysis, illustrates the implications of the imputations for feature selection and dimensionality reduction methods. The results suggest that expanding the feature space through data integration, achieved in general by exploiting imputation schemes, aids the classification task and imparts stability to the derivation of putative classifiers. In particular, although the biased imputation methods significantly increase the predictive performance and class discrimination of the datasets, they still contribute to the study of prominent features and their relations. The fusion of separate datasets that provide a multimodal description of the same pathology represents an innovative, promising avenue, enhancing robust composite biomarker derivation and promoting the interpretation of the biomedical problem studied.
    BioMed Research International 01/2014; 2014:145243. DOI:10.1155/2014/145243 · 2.71 Impact Factor
    • "Because of its late diagnosis, breast cancer often requires heavy, mutilating and expensive treatment, which is accompanied by a high mortality rate [1]. Various studies have confirmed that early detection of infraclinical cancers can improve the prognosis, and that mammography is the best diagnostic technique in that case. All radiologists recognize the difficulty of mammographic examination, which is further increased by the tissue type of the examined breast, the acquisition conditions, the number of available views (images), etc. [2] (Figure 1). The standardization of screening, the declining number of experts and public-health quality requirements make it indispensable to use technologies that can assist in diagnosis."
    ABSTRACT: Breast cancer is reported as the second deadliest cancer in the world and the main cause of mortality among women, and public awareness of it has been increasing during the last few decades. This has motivated several efforts to develop tools that aid disease diagnosis. Computer-Assisted Diagnosis (CAD) is based on three main steps: segmentation, feature extraction and classification, which together generate a final decision. The classification phase is the key step in this process, and much research has focused on it, producing many proposed techniques. Kernel combination is a currently active topic in machine learning. It takes advantage of classifier algorithms and allows the kernel functions to be chosen according to the feature vectors. The combination of kernel-based classifiers has been proposed as a way to achieve reliable recognition by exploiting the complementarity that can exist between classifiers. This study investigated a computer-aided diagnosis system for breast cancer by developing a novel classifier fusion scheme based on the fusion of three support vector machine (SVM) classifiers. Each one is associated with a homogeneous family of features (Hu moments, central moments, Haralick moments), using the SVM as an efficient learning algorithm and the diversity between feature families as the fusion criterion to ensure the best performance. Our experiments on the Digital Database for Screening Mammography (DDSM) demonstrated that the developed system achieves very encouraging results compared with past works using the same information.
    International Journal of Multimedia and Ubiquitous Engineering 01/2013; 8(4-4):45-58.
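Several of the citing works above combine the outputs of classifiers (for example, three SVMs, one per feature family) rather than the raw features. A minimal sketch of such score-level (late) fusion follows, assuming each classifier already emits one score per case. The weighted-sum rule, the AUC-maximizing grid search over weights, and all names and data are illustrative assumptions; this is not the DF-A technique of the paper or the scheme of any cited work.

```python
import itertools
from typing import List, Optional, Sequence, Tuple


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly, ties = 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fuse(scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Score-level fusion: weighted sum of each case's per-classifier scores."""
    return [sum(w * s for w, s in zip(weights, case)) for case in scores]


def best_weights(
    scores: Sequence[Sequence[float]], labels: Sequence[int], step: float = 0.1
) -> Tuple[float, Optional[List[float]]]:
    """Grid-search non-negative weights summing to 1 that maximize the fused AUC."""
    k = len(scores[0])
    n_steps = round(1 / step)
    best: Tuple[float, Optional[List[float]]] = (-1.0, None)
    # Enumerate the first k-1 integer weights; the last one brings the total to n_steps.
    for head in itertools.product(range(n_steps + 1), repeat=k - 1):
        if sum(head) > n_steps:
            continue
        weights = [i / n_steps for i in (*head, n_steps - sum(head))]
        fused = fuse(scores, weights)
        pos = [f for f, y in zip(fused, labels) if y == 1]
        neg = [f for f, y in zip(fused, labels) if y == 0]
        value = auc(pos, neg)
        if value > best[0]:
            best = (value, weights)
    return best


# Hypothetical scores from two classifiers, each trained on a different feature family.
scores = [(0.9, 0.3), (0.3, 0.9), (0.6, 0.6),   # malignant cases
          (0.5, 0.1), (0.1, 0.5), (0.4, 0.4)]   # benign cases
labels = [1, 1, 1, 0, 0, 0]
value, weights = best_weights(scores, labels)
print(value)  # 1.0, while each classifier alone reaches 7/9
print(weights)
```

Because the weights here are tuned on the same cases they are scored on, the resulting AUC is optimistic; in practice the search would run inside each training fold of a cross-validation such as the 100-fold scheme described in the abstract.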