Evaluating Explainability in Transfer Learning Models for Pulmonary Nodules Classification: A Comparative Analysis of Generalizability and Interpretability
Amira Bouamrane*,‡, Makhlouf Derdour*,§, Ahmed Alksas†,¶ and Ayman El-Baz†,||

*LIAOA Laboratory, University of Oum El-Bouaghi - Larbi Benmhidi, Oum El-Bouaghi 04000, Algeria
†Department of Bioengineering, University of Louisville, Louisville, KY 40208, USA
‡amira.bouamrane@univ-oeb.dz
§derdour.makhlouf@univ-oeb.dz
¶ahmed.alksas@louisville.edu
||aselba01@louisville.edu
Received 26 June 2024
Accepted 23 February 2025
Published 9 May 2025
Computerized diagnostic systems have come a long way in providing credible and speedy results in the diagnosis of lung cancer, which has become one of the leading causes of death worldwide in recent years. This progress is particularly true of models based on deep convolutional neural networks (CNNs) applied to computed tomography (CT) images. However, the decision-making processes of such models are not readily interpretable, as they are considered black boxes, which makes physicians reluctant to trust and use them. The aim of this paper is to compare several transfer learning models that were pre-trained on the ImageNet dataset and apply them to lung cancer diagnosis, evaluating their generalizability and robustness. This comparative study implements a number of models, including MobileNetV2, EfficientNetV2-L, EfficientNet-B7, DenseNet201, VGG19, VGG16, ResNet50, Xception, NASNetLarge, and InceptionV3. The models were trained on four distinct datasets to evaluate data diversity and heterogeneity. The models' generalization capabilities were assessed using two separate datasets: IQ-OTH/NCCD and the LDCT dataset. To enhance the models' explainability and trustworthiness, the Local Interpretable Model-Agnostic Explanations (LIME) method was utilized. Among the tested models, MobileNetV2 and ResNet50 demonstrated the highest performance and stability. MobileNetV2 achieved an accuracy of 99.28%, with false positive and false negative rates of 1.23% and 0%, respectively. ResNet50 achieved an accuracy of 99.38%, with false positive and false negative rates of 0% and 1.23%, respectively.
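The LIME procedure used in this study can be illustrated in miniature (a toy sketch, not the authors' code): mask out superpixel-like segments of an input, query the black-box model on each perturbed copy, and fit a linear surrogate whose coefficients rank each segment's importance. The 8×8 image, the quadrant segmentation, and the mean-intensity "classifier" below are all hypothetical stand-ins for a CT scan, a superpixel segmentation, and a trained CNN.

```python
import numpy as np

# Hypothetical black-box "classifier": the mean intensity of the image
# serves as a stand-in for a CNN's predicted malignancy probability.
def black_box(img):
    return img.mean()

rng = np.random.default_rng(0)
image = np.zeros((8, 8))
image[1:3, 1:3] = 1.0            # bright patch = the "nodule", in the top-left

# Partition the image into four quadrant "superpixels" (segments 0..3).
segments = np.zeros((8, 8), dtype=int)
segments[:4, 4:] = 1
segments[4:, :4] = 2
segments[4:, 4:] = 3
n_seg = 4

# Sample binary masks: which segments are kept in each perturbation,
# then score every perturbed image with the black box.
masks = rng.integers(0, 2, size=(200, n_seg))
scores = np.array([black_box(image * m[segments]) for m in masks])

# Fit a linear surrogate: each coefficient estimates one segment's
# contribution to the black-box score (LIME's core idea).
coef, *_ = np.linalg.lstsq(masks.astype(float), scores, rcond=None)
print(np.argmax(coef))           # segment 0 (where the "nodule" lives)
```

In the full method, the surrogate is additionally weighted by each perturbation's proximity to the original input and fit on superpixels from a real segmentation algorithm; the principle of explaining one prediction with a local linear model is the same.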
Keywords: CADx; LIDC-IDRI; lung cancer; generalizability; LIME; explainability.
‡Corresponding author.
International Journal of Pattern Recognition and Artificial Intelligence
(2025) 2540001 (33 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0218001425400014