
Evaluating Explainability in Transfer Learning Models for Pulmonary Nodules Classification: A Comparative Analysis of Generalizability and Interpretability

World Scientific
International Journal of Pattern Recognition and Artificial Intelligence

Abstract

Computerized diagnostic systems have come a long way in providing credible and speedy results in the diagnosis of lung cancer, which has become one of the leading causes of death worldwide in recent years. This progress is particularly evident in models based on deep convolutional neural networks (CNNs) applied to computed tomography (CT) images. However, the decision-making processes of such models are not readily interpretable, as they are considered black boxes, which makes physicians reluctant to trust and use them. The aim of this paper is to compare several transfer learning models pre-trained on the ImageNet dataset, apply them to lung cancer diagnosis, and evaluate their generalizability and robustness. This comparative study implements a number of models, including MobileNetV2, EfficientNetV2-L, EfficientNet-B7, DenseNet201, VGG19, VGG16, ResNet50, Xception, NASNetLarge, and InceptionV3. The models were trained on four distinct datasets to evaluate data diversity and heterogeneity, and their generalization capabilities were assessed using two separate datasets: IQ-OTH/NCCD and the LDCT dataset. To enhance the models' explainability and trustworthiness, the Local Interpretable Model-Agnostic Explanations (LIME) method was utilized. Among the tested models, MobileNetV2 and ResNet50 demonstrated the highest performance and stability. MobileNetV2 achieved an accuracy of 99.28%, with false positive and false negative rates of 1.23% and 0%, respectively; ResNet50 achieved an accuracy of 99.38%, with false positive and false negative rates of 0% and 1.23%, respectively.
Amira Bouamrane*, Makhlouf Derdour*, Ahmed Alksas† and Ayman El-Baz†

*LIAOA Laboratory, University of Oum El-Bouaghi - Larbi Ben Mhidi, Oum El-Bouaghi 04000, Algeria
†Department of Bioengineering, University of Louisville, Louisville, KY 40208, USA

amira.bouamrane@univ-oeb.dz
derdour.makhlouf@univ-oeb.dz
ahmed.alksas@louisville.edu
aselba01@louisville.edu

Received 26 June 2024; Accepted 23 February 2025; Published 9 May 2025
Keywords: CADx; LIDC-IDRI; lung cancer; generalizability; LIME; explainability.
Corresponding author.
International Journal of Pattern Recognition and Artificial Intelligence (2025) 2540001 (33 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0218001425400014
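To make the LIME step in the abstract concrete, the sketch below implements the core idea behind LIME for images from scratch: perturb interpretable regions of the input, query the black-box model on the perturbed copies, and fit a locally weighted linear surrogate whose coefficients score each region's contribution to the prediction. The paper applies the standard LIME method; the grid-patch segmentation (a stand-in for superpixels), kernel width, and Ridge surrogate here are simplifying assumptions for illustration.

```python
# Minimal from-scratch sketch of the LIME idea for image classifiers.
# Grid patches, kernel width, and the Ridge surrogate are assumptions.
import numpy as np
from sklearn.linear_model import Ridge

def explain_image(image, predict_fn, grid=4, num_samples=500, seed=0):
    """Return a (grid, grid) array of per-patch importance scores."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    n_patches = grid * grid
    # Binary masks: 1 keeps a patch, 0 greys it out
    masks = rng.integers(0, 2, size=(num_samples, n_patches))
    baseline = image.mean()
    preds = np.empty(num_samples)
    for i, m in enumerate(masks):
        perturbed = image.copy()
        for p in np.flatnonzero(m == 0):
            r, c = divmod(p, grid)
            perturbed[r*ph:(r+1)*ph, c*pw:(c+1)*pw] = baseline
        preds[i] = predict_fn(perturbed)
    # Weight samples by proximity to the unperturbed image
    weights = np.exp(-(1.0 - masks.mean(axis=1)) ** 2 / 0.25)
    surrogate = Ridge(alpha=1.0).fit(masks, preds, sample_weight=weights)
    return surrogate.coef_.reshape(grid, grid)
```

In practice the `lime` package's `LimeImageExplainer` plays this role, using proper superpixel segmentation instead of a fixed grid; the principle — a sparse local linear model fit around one prediction — is the same.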