Josien P. W. Pluim’s research while affiliated with Utrecht University and other places


Publications (302)


Slide-level classification evaluation on the TCGA BRCA dataset using an ImageNet-pretrained ResNet50 to extract instance features. Values are reported as mean ± standard deviation; the best results are in bold. FLOPs are measured with 120 instances per bag and do not include instance feature extraction.
Slide-level classification evaluation on the TCGA BRCA dataset using a self-supervised pretrained ResNet50 to extract instance features. Values are reported as mean ± standard deviation; the best results are in bold.
A Spatially-Aware Multiple Instance Learning Framework for Digital Pathology
  • Preprint
  • File available

April 2025 · 8 Reads

Hassan Keshvarikhojasteh · Mihail Tifrea · Sibylle Hess · [...]
Multiple instance learning (MIL) is a promising approach for weakly supervised classification in pathology using whole slide images (WSIs). However, conventional MIL methods such as Attention-Based Deep Multiple Instance Learning (ABMIL) typically disregard spatial interactions among patches that are crucial to pathological diagnosis. Recent advancements, such as Transformer-based MIL (TransMIL), have incorporated spatial context and inter-patch relationships. However, it remains unclear whether explicitly modeling patch relationships yields similar performance gains in ABMIL, which relies solely on Multi-Layer Perceptrons (MLPs). In contrast, TransMIL employs Transformer-based layers, introducing a fundamental architectural shift at the cost of substantially increased computational complexity. To address this question, we enhance the ABMIL framework by integrating interaction-aware representations. Our proposed model, Global ABMIL (GABMIL), explicitly captures inter-instance dependencies while preserving computational efficiency. Experimental results on two publicly available datasets for tumor subtyping in breast and lung cancers demonstrate that GABMIL achieves up to a 7 percentage point improvement in AUPRC and a 5 percentage point increase in the Kappa score over ABMIL, with minimal or no additional computational overhead. These findings underscore the importance of incorporating patch interactions within MIL frameworks.
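As a rough illustration of the idea (not the authors' exact GABMIL block), the sketch below augments standard gated attention-based MIL pooling with a simple global-interaction step that mixes each instance embedding with a bag-level summary before attention; the feature dimension, layer sizes, and the mean-based mixing scheme are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class InteractionAwareABMIL(nn.Module):
    """Minimal sketch: ABMIL-style gated attention pooling preceded by a
    lightweight global-interaction step (illustrative, not the exact GABMIL design)."""

    def __init__(self, in_dim=1024, hid_dim=256, n_classes=2):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        # Global interaction: mix each instance with a bag-level mean summary.
        self.mix = nn.Linear(2 * hid_dim, hid_dim)
        # Gated attention pooling (Ilse et al., 2018 style).
        self.att_v = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.Tanh())
        self.att_u = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.Sigmoid())
        self.att_w = nn.Linear(hid_dim, 1)
        self.classifier = nn.Linear(hid_dim, n_classes)

    def forward(self, bag):                               # bag: (n_instances, in_dim)
        h = self.embed(bag)                                # (n, hid)
        g = h.mean(dim=0, keepdim=True).expand_as(h)       # bag-level context per instance
        h = torch.relu(self.mix(torch.cat([h, g], dim=-1)))  # interaction-aware features
        a = torch.softmax(self.att_w(self.att_v(h) * self.att_u(h)), dim=0)  # (n, 1)
        z = (a * h).sum(dim=0)                             # attention-weighted bag embedding
        return self.classifier(z), a

# Example: one bag of 120 instance features (1024-d assumed) from a frozen extractor.
logits, attn = InteractionAwareABMIL()(torch.randn(120, 1024))
```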


PathoPainter: Augmenting Histopathology Segmentation via Tumor-aware Inpainting

March 2025 · 8 Reads

Tumor segmentation plays a critical role in histopathology, but it requires costly, fine-grained image-mask pairs annotated by pathologists. Thus, synthesizing histopathology data to expand the dataset is highly desirable. Previous works suffer from inaccuracies and limited diversity in the generated image-mask pairs, both of which hamper segmentation training, particularly for small-scale datasets, given the inherently complex nature of histopathology images. To address this challenge, we propose PathoPainter, which reformulates image-mask pair generation as a tumor inpainting task. Specifically, our approach preserves the background while inpainting the tumor region, ensuring precise alignment between the generated image and its corresponding mask. To enhance dataset diversity while maintaining biological plausibility, we incorporate a sampling mechanism that conditions tumor inpainting on regional embeddings from a different image. Additionally, we introduce a filtering strategy to exclude uncertain synthetic regions, further improving the quality of the generated data. Our comprehensive evaluation spans multiple datasets featuring diverse tumor types and various training data scales. As a result, segmentation improved significantly with our synthetic data, surpassing existing segmentation data synthesis approaches, e.g., from 75.69% to 77.69% on CAMELYON16. The code is available at https://github.com/HongLiuuuuu/PathoPainter.
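A minimal sketch of the core composition step described above (background preserved, tumor region filled in, conditioned on a region embedding taken from a different slide). The toy generator and the conditioning interface are hypothetical placeholders standing in for the paper's inpainting model; only the mask-aligned blending is the point being illustrated.

```python
import torch
import torch.nn as nn

class ToyInpainter(nn.Module):
    """Placeholder conditional generator: fills the masked region given the
    masked image and a conditioning embedding (stands in for the real model)."""
    def __init__(self, cond_dim=128):
        super().__init__()
        self.film = nn.Linear(cond_dim, 3)                    # crude conditioning
        self.net = nn.Conv2d(4, 3, kernel_size=3, padding=1)  # image + mask -> RGB

    def forward(self, masked_img, mask, cond):                # (B,3,H,W), (B,1,H,W), (B,cond_dim)
        x = self.net(torch.cat([masked_img, mask], dim=1))
        return torch.sigmoid(x + self.film(cond)[:, :, None, None])

def tumor_aware_inpaint(image, tumor_mask, region_embedding, generator):
    """Keep the background, synthesize only the tumor region, so the generated
    image stays exactly aligned with the existing mask (the key idea above)."""
    masked = image * (1 - tumor_mask)                         # hide the tumor region
    generated = generator(masked, tumor_mask, region_embedding)
    return tumor_mask * generated + (1 - tumor_mask) * image  # mask-aligned composite

# Usage with random tensors (illustrative shapes only).
img  = torch.rand(1, 3, 256, 256)
mask = (torch.rand(1, 1, 256, 256) > 0.9).float()
emb  = torch.randn(1, 128)                                    # embedding from a *different* image
synthetic = tumor_aware_inpaint(img, mask, emb, ToyInpainter())
```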


Adaptive Prototype Learning for Multimodal Cancer Survival Analysis

March 2025 · 2 Reads

Leveraging multimodal data, particularly the integration of whole-slide histology images (WSIs) and transcriptomic profiles, holds great promise for improving cancer survival prediction. However, excessive redundancy in multimodal data can degrade model performance. In this paper, we propose Adaptive Prototype Learning (APL), a novel and effective approach for multimodal cancer survival analysis. APL adaptively learns representative prototypes in a data-driven manner, reducing redundancy while preserving critical information. Our method employs two sets of learnable query vectors that serve as a bridge between high-dimensional representations and survival prediction, capturing task-relevant features. Additionally, we introduce a multimodal mixed self-attention mechanism to enable cross-modal interactions, further enhancing information fusion. Extensive experiments on five benchmark cancer datasets demonstrate the superiority of our approach over existing methods. The code is available at https://github.com/HongLiuuuuu/APL.
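A hedged PyTorch sketch of the two ingredients the abstract names: learnable query vectors that cross-attend to each modality to form a compact set of prototypes, followed by a self-attention step over the pooled tokens for cross-modal fusion. Dimensions, head counts, and the mean-pooled risk head are assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class PrototypePool(nn.Module):
    """Learnable queries cross-attend to a variable-length token set,
    yielding a fixed number of prototype vectors (redundancy reduction)."""
    def __init__(self, dim=256, n_prototypes=16, n_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_prototypes, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, tokens):                         # tokens: (B, N, dim)
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        prototypes, _ = self.attn(q, tokens, tokens)   # (B, n_prototypes, dim)
        return prototypes

class MultimodalSurvivalSketch(nn.Module):
    def __init__(self, dim=256, n_prototypes=16):
        super().__init__()
        self.wsi_pool  = PrototypePool(dim, n_prototypes)
        self.gene_pool = PrototypePool(dim, n_prototypes)
        # Self-attention over the concatenated prototypes of both modalities.
        self.fusion = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.risk_head = nn.Linear(dim, 1)             # scalar risk score for survival

    def forward(self, wsi_tokens, gene_tokens):
        fused = self.fusion(torch.cat([self.wsi_pool(wsi_tokens),
                                       self.gene_pool(gene_tokens)], dim=1))
        return self.risk_head(fused.mean(dim=1))       # (B, 1) predicted risk

# Example: 5000 WSI patch tokens and 200 transcriptomic tokens per patient (made-up sizes).
risk = MultimodalSurvivalSketch()(torch.randn(2, 5000, 256), torch.randn(2, 200, 256))
```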


Body composition and checkpoint inhibitor treatment outcomes in advanced melanoma: a multicenter cohort study

February 2025 · 8 Reads · 1 Citation

JNCI Journal of the National Cancer Institute

Introduction: The association of body composition with checkpoint inhibitor outcomes in melanoma is a matter of ongoing debate. In this study, we aim to investigate body mass index (BMI) alongside CT-derived body composition metrics in the largest cohort to date.

Methods: Patients treated with first-line anti-PD1 ± anti-CTLA4 for advanced melanoma were retrospectively identified from 11 melanoma centers in The Netherlands. From baseline CT scans, five body composition metrics were extracted: subcutaneous adipose tissue index, visceral adipose tissue index, and skeletal muscle index, density and gauge. These metrics were correlated in uni- and multivariable Cox proportional hazards analysis with progression-free, overall and melanoma-specific survival (PFS, OS and MSS).

Results: A total of 1471 eligible patients were included. Median PFS and OS were 9.1 and 38.1 months, respectively. Worse PFS was observed in underweight patients (multivariable HR = 1.86, 95% CI 1.14–3.06). Furthermore, prolonged OS was observed in patients with higher skeletal muscle density (multivariable HR = 0.88, 95% CI 0.81–0.97) and gauge (multivariable HR = 0.61, 95% CI 0.82–0.998), whereas higher visceral adipose tissue index was associated with worse OS (multivariable HR = 1.12, 95% CI 1.04–1.22). No association with survival outcomes was found for overweight, obesity or subcutaneous adipose tissue.

Discussion: Our findings suggest that underweight BMI is associated with worse PFS, whereas higher skeletal muscle density and lower visceral adipose tissue index were associated with improved OS. These associations were independent of known prognostic factors, including sex, age, performance status and extent of disease. No significant association between higher BMI and survival outcomes was observed.
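For readers unfamiliar with the statistical setup, the following is a minimal multivariable Cox proportional hazards sketch using the lifelines package. The data and column names are synthetic placeholders, not the study's cohort or variables; the point is only how hazard ratios with confidence intervals are obtained from such a model.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200  # synthetic cohort size (illustrative only)

# Hypothetical covariates (names are placeholders, not the study's variables).
df = pd.DataFrame({
    "skeletal_muscle_density": rng.normal(40, 5, n),
    "visceral_adipose_index":  rng.normal(45, 15, n),
    "bmi_underweight":         rng.binomial(1, 0.05, n),
    "age":                     rng.integers(35, 85, n),
})

# Synthetic follow-up times (months) and event indicator for the sketch.
df["os_months"] = rng.exponential(30, n).round(1)
df["death"]     = rng.binomial(1, 0.6, n)

# Multivariable model: each hazard ratio is adjusted for the other covariates.
cph = CoxPHFitter()
cph.fit(df, duration_col="os_months", event_col="death")
cph.print_summary()  # the exp(coef) column gives hazard ratios with 95% CIs
```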


Scaling up self-supervised learning for improved surgical foundation models

January 2025 · 19 Reads · 1 Citation

Foundation models have revolutionized computer vision by achieving vastly superior performance across diverse tasks through large-scale pretraining on extensive datasets. However, their application in surgical computer vision has been limited. This study addresses this gap by introducing SurgeNetXL, a novel surgical foundation model that sets a new benchmark in surgical computer vision. Trained on the largest reported surgical dataset to date, comprising over 4.7 million video frames, SurgeNetXL achieves consistent top-tier performance across six datasets spanning four surgical procedures and three tasks, including semantic segmentation, phase recognition, and critical view of safety (CVS) classification. Compared with the best-performing surgical foundation models, SurgeNetXL shows mean improvements of 2.4, 9.0, and 12.6 percent for semantic segmentation, phase recognition, and CVS classification, respectively. Additionally, SurgeNetXL outperforms the best-performing ImageNet-based variants by 14.4, 4.0, and 1.6 percent in the respective tasks. In addition to advancing model performance, this study provides key insights into scaling pretraining datasets, extending training durations, and optimizing model architectures specifically for surgical computer vision. These findings pave the way for improved generalizability and robustness in data-scarce scenarios, offering a comprehensive framework for future research in this domain. All models and a subset of the SurgeNetXL dataset, including over 2 million video frames, are publicly available at: https://github.com/TimJaspers0801/SurgeNet.
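The full pretraining pipeline is documented in the repository linked above. As a generic illustration of self-supervised pretraining on unlabeled video frames (not necessarily the framework used for SurgeNetXL), here is a minimal SimCLR-style contrastive step with a ResNet50 backbone; batch size, projection head, and temperature are arbitrary choices for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class SimCLRSketch(nn.Module):
    """Backbone + projection head for contrastive pretraining (illustrative)."""
    def __init__(self, proj_dim=128):
        super().__init__()
        self.backbone = resnet50(weights=None)
        feat_dim = self.backbone.fc.in_features
        self.backbone.fc = nn.Identity()
        self.head = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                  nn.Linear(512, proj_dim))

    def forward(self, x):
        return F.normalize(self.head(self.backbone(x)), dim=-1)

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss over two augmented views of the same batch of frames."""
    b = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                    # (2B, d), already L2-normalized
    sim = z @ z.T / temperature                       # cosine similarity matrix
    sim = sim.masked_fill(torch.eye(2 * b, dtype=torch.bool), float("-inf"))  # drop self-pairs
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(b)])            # positive = other view
    return F.cross_entropy(sim, targets)

# One illustrative training step on two augmented views of unlabeled frames.
model = SimCLRSketch()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
view1, view2 = torch.rand(8, 3, 224, 224), torch.rand(8, 3, 224, 224)
loss = nt_xent(model(view1), model(view2))
loss.backward(); opt.step(); opt.zero_grad()
```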


Deep learning on CT scans to predict checkpoint inhibitor treatment outcomes in advanced melanoma

December 2024 · 24 Reads

Immune checkpoint inhibitor (ICI) treatment has proven successful for advanced melanoma, but is associated with potentially severe toxicity and high costs. Accurate biomarkers for response are lacking. The present work is the first to investigate the value of deep learning on CT imaging of metastatic lesions for predicting ICI treatment outcomes in advanced melanoma. Adult patients who were treated with ICI for advanced melanoma were retrospectively identified from ten participating centers. A deep learning model (DLM) was trained on volumes of lesions on baseline CT to predict clinical benefit. The DLM was compared to and combined with a model of known clinical predictors (presence of liver and brain metastasis, level of lactate dehydrogenase, performance status and number of affected organs). A total of 730 eligible patients with 2722 lesions were included. The DLM reached an area under the receiver operating characteristic curve (AUROC) of 0.607 [95% CI 0.565–0.648]. In comparison, the model of clinical predictors reached an AUROC of 0.635 [95% CI 0.59–0.678]. The combination model reached an AUROC of 0.635 [95% CI 0.595–0.676]. Differences in AUROC were not statistically significant. The output of the DLM was significantly correlated with four of the five input variables of the clinical model. The DLM reached a statistically significant discriminative value, but was unable to improve over known clinical predictors. The present work shows that comparison against known clinical predictors is an essential step for imaging-based prediction and brings important nuance to the almost exclusively positive findings in this field. Supplementary Information The online version contains supplementary material available at 10.1038/s41598-024-81188-2.
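A minimal sketch of the kind of comparison described (deep-learning score vs. clinical predictors vs. their combination) using scikit-learn. All data and variable names below are synthetic placeholders, and the apparent AUROC is computed without cross-validation purely to show the mechanics.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 730  # illustrative cohort size

# Synthetic stand-ins: a per-patient deep-learning score and five clinical predictors.
dl_score = rng.normal(size=n)
clinical = rng.normal(size=(n, 5))   # e.g. LDH, liver/brain metastasis, ECOG, organ count
benefit  = rng.binomial(1, 0.5, n)   # clinical benefit label (synthetic)

def auroc(features, labels):
    """Fit a logistic model and report apparent AUROC (sketch only, no CV)."""
    model = LogisticRegression(max_iter=1000).fit(features, labels)
    return roc_auc_score(labels, model.predict_proba(features)[:, 1])

print("DLM only     :", auroc(dl_score.reshape(-1, 1), benefit))
print("Clinical only:", auroc(clinical, benefit))
print("Combination  :", auroc(np.column_stack([dl_score, clinical]), benefit))
```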


Enhancing Reconstruction-Based Out-of-Distribution Detection in Brain MRI with Model and Metric Ensembles

December 2024 · 18 Reads

Out-of-distribution (OOD) detection is crucial for safely deploying automated medical image analysis systems, as abnormal patterns in images could hamper their performance. However, OOD detection in medical imaging remains an open challenge, and we address three gaps: the underexplored potential of a simple OOD detection model, the lack of optimization of deep learning strategies specifically for OOD detection, and the selection of appropriate reconstruction metrics. In this study, we investigated the effectiveness of a reconstruction-based autoencoder for unsupervised detection of synthetic artifacts in brain MRI. We evaluated the general reconstruction capability of the model, analyzed the impact of the selected training epoch and reconstruction metrics, assessed the potential of model and/or metric ensembles, and tested the model on a dataset containing a diverse range of artifacts. Among the metrics assessed, the contrast component of SSIM and LPIPS consistently outperformed others in detecting homogeneous circular anomalies. By combining two well-converged models and using LPIPS and contrast as reconstruction metrics, we achieved a pixel-level area under the Precision-Recall curve of 0.66. Furthermore, with the more realistic OOD dataset, we observed that the detection performance varied between artifact types; local artifacts were more difficult to detect, while global artifacts showed better detection results. These findings underscore the importance of carefully selecting metrics and model configurations, and highlight the need for tailored approaches, as standard deep learning approaches do not always align with the unique needs of OOD detection.
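To make the metric-ensemble idea concrete, here is a hedged sketch that scores each pixel by averaging a spatial LPIPS map and the contrast component of SSIM between an image and its reconstruction. The window size, SSIM constant, equal weighting, and use of the lpips package's spatial mode are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
import torch
import lpips                                   # pip install lpips
from scipy.ndimage import uniform_filter

def ssim_contrast_map(x, y, win=7, c2=0.03 ** 2):
    """Contrast component of SSIM, computed per pixel on images scaled to [0, 1]."""
    mx, my = uniform_filter(x, win), uniform_filter(y, win)
    vx = uniform_filter(x * x, win) - mx ** 2
    vy = uniform_filter(y * y, win) - my ** 2
    sx, sy = np.sqrt(np.clip(vx, 0, None)), np.sqrt(np.clip(vy, 0, None))
    return (2 * sx * sy + c2) / (vx + vy + c2)          # 1 = identical local contrast

lpips_fn = lpips.LPIPS(net="alex", spatial=True)         # per-pixel perceptual distance

def anomaly_map(img, recon):
    """Combine the two reconstruction metrics into one pixel-level anomaly score."""
    contrast_err = 1.0 - ssim_contrast_map(img, recon)   # high where local contrast disagrees
    to_rgb = lambda a: torch.from_numpy(a).float()[None, None].repeat(1, 3, 1, 1) * 2 - 1
    lpips_err = lpips_fn(to_rgb(img), to_rgb(recon)).squeeze().detach().numpy()
    return 0.5 * (contrast_err + lpips_err)              # simple metric ensemble

# Illustrative usage on a random "slice" and a blurred stand-in reconstruction.
img = np.random.rand(128, 128).astype(np.float32)
recon = uniform_filter(img, 5)
scores = anomaly_map(img, recon)                         # higher = more likely OOD region
```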



Dataset Distribution Impacts Model Fairness: Single Vs. Multi-task Learning

October 2024 · 8 Reads

The influence of bias in datasets on the fairness of model predictions is a topic of ongoing research in various fields. We evaluate the performance of skin lesion classification using ResNet-based CNNs, focusing on patient sex variations in training data and three different learning strategies. We present a linear programming method for generating datasets with varying patient sex and class labels, taking into account the correlations between these variables. We evaluated the model performance using three different learning strategies: a single-task model, a reinforcing multi-task model, and an adversarial learning scheme. Our observations include: 1) sex-specific training data yields better results, 2) single-task models exhibit sex bias, 3) the reinforcement approach does not remove sex bias, 4) the adversarial model eliminates sex bias in cases involving only female patients, and 5) datasets that include male patients enhance model performance for the male subgroup, even when female patients are the majority. To generalise these findings, in future research, we will examine more demographic attributes, like age, and other possibly confounding factors, such as skin colour and artefacts in the skin lesions. We make all data and models available on GitHub.
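As a hedged illustration of the linear-programming idea (not the authors' exact formulation), the sketch below uses scipy.optimize.linprog to choose how many images to draw from each (sex, class) cell so that a target female fraction and class prevalence are met while the total dataset size is maximized subject to the available counts; all numbers are made up.

```python
import numpy as np
from scipy.optimize import linprog

# Available images per (sex, class) cell: rows = female/male, cols = benign/malignant.
available = np.array([[900.0, 300.0],
                      [700.0, 500.0]])
target_female_frac = 0.5      # desired share of female patients
target_malignant_frac = 0.3   # desired class prevalence

# Decision variables x = [f_benign, f_malignant, m_benign, m_malignant] (counts to sample).
c = -np.ones(4)               # maximize total count == minimize its negative

A_eq = np.array([
    # female share: f_b + f_m = target_female_frac * total
    [1 - target_female_frac, 1 - target_female_frac, -target_female_frac, -target_female_frac],
    # malignant share: f_m + m_m = target_malignant_frac * total
    [-target_malignant_frac, 1 - target_malignant_frac, -target_malignant_frac, 1 - target_malignant_frac],
])
b_eq = np.zeros(2)
bounds = [(0, available[0, 0]), (0, available[0, 1]),
          (0, available[1, 0]), (0, available[1, 1])]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
counts = np.round(res.x).astype(int)
print("samples per (sex, class) cell:", counts, "total:", counts.sum())
```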



Citations (56)


... This capability allows DDPMs to produce images with superior quality, greater diversity, improved precision, and enhanced reliability compared to earlier models, which are all paramount in healthcare [33]. Moreover, DDPMs are better able to handle scarce and imbalanced datasets, in which data are unevenly distributed across multiple classes [34,35]. These characteristics of DDPMs highlight their potential to address, at least to some extent, the limitations discussed earlier and to lay the foundation for enhanced professional services for individuals with VF pathologies and VD. ...

Reference:

Feasibility of improving vocal fold pathology image classification with synthetic images generated by DDPM-based GenAI: a pilot study
Denoising diffusion probabilistic models for addressing data limitations in chest X-ray classification
  • Citing Article
  • August 2024

Informatics in Medicine Unlocked

... Additionally, significant differences in segmentation results have been noted among various DLAS tools (Isaksson et al 2023). Moreover, for certain abdominal OARs, deformable contour propagation (DCP) from previous fractions using deformable image registration may result in more acceptable contours than DLAS (Kolenbrander et al 2024). To address these challenges, quality assurance (QA) of these contours is essential. ...

Deep‐learning‐based joint rigid and deformable contour propagation for magnetic resonance imaging‐guided prostate radiotherapy
  • Citing Article
  • February 2024

... Ali et al. [60] introduced CB-HVTNet, a model combining CNN and ViT with a channel boosting approach. Evaluated on the LYSTO [103] and NuClick [104] datasets for lymphocyte segmentation, CB-HVTNet outperformed YOLO: its F1-score and recall were 0.88 and 0.93, surpassing YOLO's 0.80 and 0.69. ...

LYSTO: The Lymphocyte Assessment Hackathon and Benchmark Dataset

IEEE Journal of Biomedical and Health Informatics

... These models do not require prior assumptions; instead, they are trained on large paired datasets of homogeneous and inhomogeneous data. For instance, Harrevelt et al. introduced a ResNet-18-based model to correct inhomogeneities induced by the radiofrequency field for prostate T2-weighted imaging at 7 T [25], while Venkatesh et al. proposed InhomoNet for intensity inhomogeneity correction of brain and abdomen T1-weighted MRI [21]. These models typically use homogeneous images and simulated inhomogeneous bias fields for supervised learning. ...

Deep learning based correction of RF field induced inhomogeneities for T2w prostate imaging at 7 T
  • Citing Article
  • August 2023

NMR in Biomedicine

... While Bayesian methods and ensemble networks have been explored for this purpose, they often fall short of capturing the full posterior distribution and its pixel covariances [10,13]. Recent efforts have proposed generative approaches, such as Phi-Seg [2] and the Probabilistic U-Net [13,3,4], which aim to model the distribution of plausible segmentations more comprehensively. These generative models take pixel-wise covariances into account, producing segmentation maps that an expert might plausibly have drawn. ...

Effect of latent space distribution on the segmentation of images with multiple annotations
  • Citing Article
  • April 2023

The Journal of Machine Learning for Biomedical Imaging

... Uncertainty can also be used in an active learning scenario, either with (Diaz-Pinto et al., 2022) or without (Iwamoto et al., 2021) interactive refinement. Shape-based features of uncertainty maps have also been shown to identify false positive predictions (Bhat et al., 2022). Similarly, we too use uncertainty in our training regime, but with the goal of promoting uncertainty only in those regions which are inaccurate, an objective not previously explored in medical image segmentation. ...

Influence of uncertainty estimation techniques on false-positive reduction in liver lesion detection
  • Citing Article
  • December 2022

The Journal of Machine Learning for Biomedical Imaging

... As such, AI assistance with computer-aided anatomy recognition may benefit RAMIE. One study that evaluated this was by Boer et al., who developed a deep-learning-based algorithm for anatomy recognition in thoracoscopic video frames from the chest phase of RAMIE [11]. The aim was to provide intelligent intraoperative surgical guidance that supports surgeons in anatomy recognition and surgical orientation, thereby reducing the morbidity of RAMIE and the learning curve associated with the procedure. ...

Deep learning-based recognition of key anatomical structures during robot-assisted minimally invasive esophagectomy

Surgical Endoscopy

... Ayx et al (2022) compared PCCT and EICT in myocardial imaging, observing significant differences in higher-order texture features due to PCCT's enhanced resolution. Ter Maat et al (2023) suggested that PCCT's spectral data could enhance radiomics models in personalized treatment strategies. ...

CT radiomics compared to a clinical model for predicting checkpoint inhibitor treatment outcomes in patients with advanced melanoma
  • Citing Article
  • February 2023

European Journal of Cancer

... This new approach could enhance clinical and surgical outcomes. [11,34] The objective of this study was to compare two CSD-based tractography methods using 40 diffusion images from the human connectome project (HCP) of healthy individuals and 12 clinical diffusion-weighted images (DWI) from patients who underwent neurosurgical procedures, with a focus on CST segmentation. The study aimed to evaluate the similarity between the two techniques and the specific consistency of their measurements. ...

Reconstruction of the Corticospinal Tract in Patients with Motor-Eloquent High-Grade Gliomas Using Multilevel Fiber Tractography Combined with Functional Motor Cortex Mapping

American Journal of Neuroradiology

... Of the 69 articles, the majority, 78% (n = 54), focused on a single quality enhancement aspect, while the remaining 22% (n = 15) addressed a combination of aspects or quality enhancement in general. Consequently, these 15 articles [6, 14, 15, 32–43] were selected for further analysis. ...

Bandwidth Improvement in Ultrasound Image Reconstruction Using Deep Learning Techniques