Most. Sharmin Sultana Samu’s research while affiliated with Ahsanullah University of Science and Technology and other places


Publications (4)


Figures (from the article below): proposed methodology for automated X-ray interpretation; sample X-ray images with their ground-truth report findings from the IU-Xray dataset; distribution of report length in number of words; report length distribution across train, test and validation splits; word cloud of reports before stopword removal; plus 6 further figures.

Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2
Article · Full-text available · May 2025 · 14 Reads · Most. Sharmin Sultana Samu

Radiology plays a pivotal role in modern medicine due to its non-invasive diagnostic capabilities. However, the manual generation of unstructured medical reports is time-consuming and prone to errors, creating a significant bottleneck in clinical workflows. Despite advancements in AI-generated radiology reports, challenges remain in achieving detailed and accurate report generation. In this study, we evaluated different combinations of multimodal models that integrate Computer Vision and Natural Language Processing to generate comprehensive radiology reports. We employed a pretrained Vision Transformer (ViT-B16) and a SWIN Transformer as the image encoders, with BART and GPT-2 serving as the textual decoders. We used chest X-ray images and reports from the IU-Xray dataset to evaluate the usability of the SWIN Transformer-BART, SWIN Transformer-GPT-2, ViT-B16-BART, and ViT-B16-GPT-2 models for report generation, aiming to find the best combination among them. The SWIN-BART model is the best performer of the four, achieving strong results on almost all evaluation metrics, including ROUGE, BLEU, and BERTScore.
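As a rough illustration of the encoder-decoder pairing described above, the sketch below wires a pretrained ViT-B16 encoder to a GPT-2 decoder with Hugging Face's VisionEncoderDecoderModel. The checkpoint names, image path and generation settings are assumptions for illustration, not the authors' configuration, and the combined model would still need fine-tuning on IU-Xray image-report pairs before its output is meaningful.

```python
# Sketch: pair a pretrained ViT-B16 image encoder with a GPT-2 text decoder.
# Checkpoints and file names below are illustrative assumptions.
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # image encoder (assumed checkpoint)
    "gpt2",                               # report decoder (assumed checkpoint)
)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# An encoder-decoder assembled this way needs these IDs set before generation.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# After fine-tuning on image-report pairs, generate a draft report for a
# single chest X-ray (the file name here is hypothetical).
image = Image.open("iu_xray_sample.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=128, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```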


Privacy-Preserving Chest X-ray Report Generation via Multimodal Federated Learning with ViT and GPT-2

May 2025 · 6 Reads

The automated generation of radiology reports from chest X-ray images holds significant promise for enhancing diagnostic workflows while preserving patient privacy. Traditional centralized approaches often require transferring sensitive data, raising privacy concerns. To address this, the study proposes a Multimodal Federated Learning framework for chest X-ray report generation using the IU-Xray dataset. The system uses a Vision Transformer (ViT) as the encoder and GPT-2 as the report generator, enabling decentralized training without sharing raw data. Three Federated Learning (FL) aggregation strategies were evaluated: FedAvg, Krum aggregation, and a novel Loss-aware Federated Averaging (L-FedAvg). Among these, Krum aggregation demonstrated superior performance across lexical and semantic evaluation metrics such as ROUGE, BLEU, BERTScore, and RaTEScore. The results show that FL can match or surpass centralized models in generating clinically relevant and semantically rich radiology reports. This lightweight, privacy-preserving framework paves the way for collaborative medical AI development without compromising data confidentiality.
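As a sketch of how the averaging-based aggregation strategies mentioned above could look, the snippet below implements standard FedAvg (weighting clients by dataset size) and a loss-aware variant that up-weights clients with lower validation loss. The inverse-loss weighting is an illustrative assumption; the paper's exact L-FedAvg rule is not reproduced here.

```python
# Sketch: FedAvg and a loss-aware weighted average over client model state_dicts.
# The inverse-loss weighting for the loss-aware variant is an assumption.
import torch


def _weighted_average(client_states, weights):
    """Combine client state_dicts into one global state_dict with the given weights."""
    keys = client_states[0].keys()
    return {
        k: sum(w * state[k].float() for w, state in zip(weights, client_states))
        for k in keys
    }


def fedavg(client_states, client_sizes):
    """Standard FedAvg: weight each client by its number of local training samples."""
    total = sum(client_sizes)
    return _weighted_average(client_states, [n / total for n in client_sizes])


def loss_aware_fedavg(client_states, client_losses, eps=1e-8):
    """Loss-aware averaging: clients with lower validation loss get larger weight."""
    inv = [1.0 / (loss + eps) for loss in client_losses]
    total = sum(inv)
    return _weighted_average(client_states, [w / total for w in inv])


# Tiny usage example with two fake "clients" holding one-parameter models.
clients = [{"w": torch.tensor([1.0, 2.0])}, {"w": torch.tensor([3.0, 4.0])}]
print(fedavg(clients, client_sizes=[100, 300]))               # size-weighted average
print(loss_aware_fedavg(clients, client_losses=[0.5, 1.0]))   # loss-weighted average
```

Krum, by contrast, does not average at all: it selects the single client update closest in aggregate to its nearest peers, which is what gives it robustness to outlier or faulty clients.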


Explainable AI-Driven Detection of Human Monkeypox Using Deep Learning and Vision Transformers: A Comprehensive Analysis

April 2025 · 15 Reads

Mpox is a zoonotic viral illness that can also spread from person to person, making it a significant public health concern. Early clinical diagnosis is difficult because its symptoms closely resemble those of measles and chickenpox. Medical imaging combined with deep learning (DL) techniques has shown promise in improving disease detection by analyzing affected skin areas. Our study explores the feasibility of training deep learning and vision transformer-based models from scratch on a publicly available skin lesion image dataset. Our experimental results show that the limited size of the dataset is a major obstacle to building strong classifiers trained from scratch. We therefore used transfer learning with pre-trained models to obtain better classifiers. MobileNet-v2 outperformed the other state-of-the-art pre-trained models with 93.15% accuracy and a 93.09% weighted-average F1 score. ViT B16 and ResNet-50 also achieved satisfactory performance compared to existing studies, with accuracies of 92.12% and 86.21%, respectively. To further validate the models, we applied explainable AI techniques.
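A minimal transfer-learning sketch in the spirit of the approach described above: load ImageNet weights for MobileNet-v2, freeze the convolutional backbone, and replace the classification head. The class count and hyperparameters are assumptions for illustration, and the explainable-AI step is not shown.

```python
# Sketch: transfer learning with a pretrained MobileNet-v2 for skin-lesion classification.
# NUM_CLASSES and the optimizer settings are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # e.g. mpox, chickenpox, measles, normal (assumed label set)

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
for param in model.features.parameters():
    param.requires_grad = False  # freeze the pretrained convolutional backbone

# MobileNet-v2's classifier is [Dropout, Linear(1280 -> 1000)]; swap in a new head.
model.classifier[1] = nn.Linear(model.last_channel, NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
# Train only the new head on the lesion dataset; the backbone can optionally be
# unfrozen later for fine-tuning at a lower learning rate.
```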


Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2

January 2025 · 79 Reads

Radiology plays a pivotal role in modern medicine due to its non-invasive diagnostic capabilities. However, the manual generation of unstructured medical reports is time-consuming and prone to errors, creating a significant bottleneck in clinical workflows. Despite advancements in AI-generated radiology reports, challenges remain in achieving detailed and accurate report generation. In this study, we evaluated different combinations of multimodal models that integrate Computer Vision and Natural Language Processing to generate comprehensive radiology reports. We employed a pretrained Vision Transformer (ViT-B16) and a SWIN Transformer as the image encoders, with BART and GPT-2 serving as the textual decoders. We used chest X-ray images and reports from the IU-Xray dataset to evaluate the usability of the SWIN Transformer-BART, SWIN Transformer-GPT-2, ViT-B16-BART, and ViT-B16-GPT-2 models for report generation, aiming to find the best combination among them. The SWIN-BART model is the best performer of the four, achieving strong results on almost all evaluation metrics, including ROUGE, BLEU, and BERTScore.
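For reference, the lexical and semantic metrics named across these abstracts (ROUGE, BLEU, BERTScore) can be computed with the Hugging Face evaluate library as sketched below; the example report strings are placeholders, not outputs from the models in the paper.

```python
# Sketch: computing ROUGE, BLEU and BERTScore with the Hugging Face `evaluate` library.
# The report strings below are placeholders for illustration only.
import evaluate

predictions = ["the heart is normal in size . no focal consolidation ."]
references = ["heart size within normal limits . no focal airspace consolidation ."]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
bertscore = evaluate.load("bertscore")

print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
```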