Francesco Di Salvo’s research while affiliated with the University of Bamberg and other places


Publications (9)


Self-supervised Vision Transformer are Scalable Generative Models for Domain Generalization
  • Chapter

October 2024

·

1 Read

·

Francesco Di Salvo

Figure 1: (a) Illustration of domain shifts in terms of different contrasts and brightness levels among manufacturers (i-iii), among models from the same producer (iv-vi), and among the same model at different sites (vii-ix) for the same slice of the same individual across multiple scans. (b) High-level overview of our proposed approach. During training, the encoder E is trained to orthogonalize anatomical and image-characteristic features in an input image (orange path). Once trained, the learned feature orthogonalization by the frozen encoder is used for various downstream tasks, including bias removal, corruption detection and revision in input images, as well as robust, distortion-invariant disease classification (purple path).
Figure 2: Schematic representation of the training pipeline for unORANIC (adapted from [7]). The input image I is assumed to be bias-free and uncorrupted. Random augmentations A_S, A_V1, and A_V2 distort I to generate synthetic corrupted versions S, V1, and V2 with identical anatomical information but different distortions. These distorted images are processed by the shared anatomy encoder E_A, which uses the consistency loss L_C to learn anatomical, distortion-invariant features. Concurrently, S is processed by the characteristic encoder E_C to capture image-specific details such as contrast and brightness. Reconstruction losses L_RS and L_RI are applied to the reconstructed images Ŝ and Î_A, produced by decoders D and D_A, respectively, to ensure that E_A and E_C learn comprehensive, reliable features.
Figure 4: Examples from the datasets of the MedMNIST v2 benchmark [27] used for evaluating our approach (left to right: blood, breast, chest, derma, pneumonia, and retina)
Comparison of average Peak Signal-to-Noise Ratio (PSNR) for the reconstructions of the original input (Î) and the clean anatomical reconstructions (Î_A) given an input image (I) between unORANIC and unORANIC+ on the test sets of all six higher-dimensional datasets. The best performance per reconstruction task is indicated in bold.
Unsupervised Feature Orthogonalization for Learning Distortion-Invariant Representations
  • Preprint
  • File available

September 2024

·

21 Reads

This study introduces unORANIC+, a novel method that integrates unsupervised feature orthogonalization with the ability of a Vision Transformer to capture both local and global relationships for improved robustness and generalizability. The streamlined architecture of unORANIC+ effectively separates anatomical and image-specific attributes, resulting in robust and unbiased latent representations that allow the model to demonstrate excellent performance across various medical image analysis tasks and diverse datasets. Extensive experimentation demonstrates unORANIC+'s reconstruction proficiency, corruption resilience, as well as its capability to revise existing image distortions. Additionally, the model exhibits notable aptitude in downstream tasks such as disease classification and corruption detection. We confirm its adaptability to diverse datasets of varying image sources and sample sizes, which positions the method as a promising algorithm for advanced medical image analysis, particularly in resource-constrained environments lacking large, tailored datasets. The source code is available at https://github.com/sdoerrich97/unoranic-plus.
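To make the training objective concrete, the following is a minimal PyTorch-style sketch of the orthogonalization scheme summarized in Figure 2: a shared anatomy encoder is pushed toward distortion-invariant codes by a consistency loss, while a characteristic encoder and two reconstruction losses keep both feature sets informative. All module and variable names are illustrative placeholders, not the released unORANIC+ implementation (which uses a Vision Transformer backbone).

```python
# Minimal sketch of an unORANIC-style training step (see Figure 2); placeholder modules,
# not the authors' code.
import torch
import torch.nn.functional as F

def orthogonalization_step(image, anatomy_enc, char_enc, decoder, anatomy_dec, augment):
    """One training step: distorted views share anatomy but differ in characteristics."""
    s  = augment(image)   # distorted "source" view S
    v1 = augment(image)   # distorted view V1
    v2 = augment(image)   # distorted view V2

    # The shared anatomy encoder should map all views to the same anatomical code (L_C).
    a_s, a_v1, a_v2 = anatomy_enc(s), anatomy_enc(v1), anatomy_enc(v2)
    consistency = F.mse_loss(a_s, a_v1) + F.mse_loss(a_s, a_v2)

    # The characteristic encoder captures image-specific attributes of S (contrast, brightness, ...).
    c_s = char_enc(s)

    # Reconstruct the distorted input from anatomy + characteristics (L_RS),
    # and a clean, anatomy-only image from the anatomy code alone (L_RI).
    s_hat = decoder(torch.cat([a_s, c_s], dim=-1))
    i_hat = anatomy_dec(a_s)
    reconstruction = F.mse_loss(s_hat, s) + F.mse_loss(i_hat, image)

    return consistency + reconstruction
```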


Fig. 2: Examples of test images and their respective top-3 closest training samples extracted from the STL-10 dataset (Coates et al., 2011). Figure 2a shows two test images with incorrect labels, while Figure 2b shows images with ambiguous labels, i.e., multiple known objects within a single image: for instance, a bird with a ship in the background, or a dog fighting with a cat.
Fig. 3: t-SNE projection of a CIFAR-10 subset and its noisy (20% asymmetric) counterpart.
Fig. 4: Average accuracy (↑) and 95% confidence interval on stratified subsets of Animal-10N.
Accuracy (↑) on two clean and noisy datasets (Wei et al., 2022). The two best results are highlighted in bold. Note that NR stands for Noise Rate. CIFAR-10N includes three annotations per image, and Noisy split denotes the various aggregation strategies. Following the proposed naming convention, Aggr refers to majority voting, R-i (i ∈ {1, 2, 3}) denotes the i-th submitted label for each image, and Worst denotes the selection of only wrong labels, where applicable. Conversely, CIFAR-100N provides just one annotation per image, thereby offering a single noisy split.
An Embedding is Worth a Thousand Noisy Labels

August 2024

·

14 Reads

The performance of deep neural networks scales with dataset size and label quality, rendering the efficient mitigation of low-quality data annotations crucial for building robust and cost-effective systems. Existing strategies to address label noise exhibit severe limitations due to computational complexity and application dependency. In this work, we propose WANN, a Weighted Adaptive Nearest Neighbor approach that builds on self-supervised feature representations obtained from foundation models. To guide the weighted voting scheme, we introduce a reliability score, which measures the likelihood of a data label being correct. WANN outperforms reference methods, including a linear layer trained with robust loss functions, on diverse datasets of varying size and under various noise types and severities. WANN also exhibits superior generalization on imbalanced data compared to both Adaptive-NNs (ANN) and fixed k-NNs. Furthermore, the proposed weighting scheme enhances supervised dimensionality reduction under noisy labels. This yields a significant boost in classification performance with 10x and 100x smaller image embeddings, minimizing latency and storage requirements. Our approach, emphasizing efficiency and explainability, emerges as a simple, robust solution to overcome the inherent limitations of deep neural network training. The code is available at https://github.com/francescodisalvo05/wann-noisy-labels.
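As a rough illustration of the voting scheme, the sketch below scores each training sample by the label agreement among its nearest neighbors in a frozen embedding space and then uses those scores to weight a k-NN vote for a query. The actual reliability score and the adaptive choice of the neighborhood size follow the paper and repository; a fixed k and plain label agreement are simplifications.

```python
# Simplified sketch of reliability-weighted k-NN voting on frozen foundation-model embeddings.
# Embeddings are assumed L2-normalized so that dot products act as cosine similarities.
import numpy as np

def reliability_scores(train_emb, train_labels, k=10):
    """Score each training sample by how often its label agrees with its k nearest neighbors."""
    sims = train_emb @ train_emb.T
    np.fill_diagonal(sims, -np.inf)                 # exclude self-matches
    nn_idx = np.argsort(-sims, axis=1)[:, :k]
    return (train_labels[nn_idx] == train_labels[:, None]).mean(axis=1)  # in [0, 1]

def predict(query_emb, train_emb, train_labels, scores, k=10, n_classes=10):
    """Classify queries with a reliability-weighted vote over their k nearest neighbors."""
    sims = query_emb @ train_emb.T
    nn_idx = np.argsort(-sims, axis=1)[:, :k]
    preds = []
    for row in nn_idx:
        votes = np.zeros(n_classes)
        for j in row:
            votes[train_labels[j]] += scores[j]     # weight each vote by neighbor reliability
        preds.append(votes.argmax())
    return np.array(preds)
```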


Privacy-preserving datasets by capturing feature distributions with Conditional VAEs

August 2024

·

15 Reads

Large and well-annotated datasets are essential for advancing deep learning applications; however, they are often costly or impossible for a single entity to obtain. In many areas, including the medical domain, approaches relying on data sharing have become critical to address these challenges. While effective in increasing dataset size and diversity, data sharing raises significant privacy concerns. Commonly employed anonymization methods based on the k-anonymity paradigm often fail to preserve data diversity, affecting model robustness. This work introduces a novel approach using Conditional Variational Autoencoders (CVAEs) trained on feature vectors extracted from large pre-trained vision foundation models. Foundation models effectively detect and represent complex patterns across diverse domains, allowing the CVAE to faithfully capture the embedding space of a given data distribution and generate (sample) a diverse, privacy-respecting, and potentially unbounded set of synthetic feature vectors. Our method notably outperforms traditional approaches in both the medical and natural image domains, exhibiting greater dataset diversity and higher robustness against perturbations while preserving sample privacy. These results underscore the potential of generative models to significantly impact deep learning applications in data-scarce and privacy-sensitive environments. The source code is available at https://github.com/francescodisalvo05/cvae-anonymization.
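The sketch below illustrates the basic building block: a small conditional VAE trained on pre-extracted feature vectors (e.g., 768-dimensional DINOv2 embeddings, as mentioned in the citation context further down this page), from which class-conditional synthetic embeddings can be sampled. Layer sizes and names are illustrative assumptions, not the released implementation.

```python
# Illustrative conditional VAE over pre-extracted feature vectors (not the released code).
# Train with an MSE reconstruction loss plus the KL divergence to a standard normal prior.
import torch
import torch.nn as nn

class EmbeddingCVAE(nn.Module):
    def __init__(self, emb_dim=768, n_classes=10, latent_dim=64):
        super().__init__()
        self.n_classes = n_classes
        self.encoder = nn.Sequential(nn.Linear(emb_dim + n_classes, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + n_classes, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, emb, labels):
        y = nn.functional.one_hot(labels, self.n_classes).float()
        h = self.encoder(torch.cat([emb, y], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.decoder(torch.cat([z, y], dim=-1)), mu, logvar

    @torch.no_grad()
    def sample(self, labels):
        """Generate synthetic, privacy-respecting feature vectors for the given classes."""
        y = nn.functional.one_hot(labels, self.n_classes).float()
        z = torch.randn(len(labels), self.mu.out_features)
        return self.decoder(torch.cat([z, y], dim=-1))
```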


Fig. 2: Examples from the histopathology datasets used for evaluating domain generalization. Left: Camelyon17-wilds for which the domains are hospitals. Right: Combined epithelium-stroma dataset for which the domains are datasets.
Fig. 3: Qualitative evaluation of our method's reconstruction capability on the Camelyon17-wilds dataset.
Fig. 4: Qualitative evaluation of the method's generative capabilities on the Camelyon17-wilds dataset by means of synthetic images created through its anatomy-characteristics intermixing.
Accuracy in % on the epithelium-stroma dataset.
Self-supervised Vision Transformer are Scalable Generative Models for Domain Generalization

July 2024

·

43 Reads

Despite notable advancements, the integration of deep learning (DL) techniques into impactful clinical applications, particularly in the realm of digital histopathology, has been hindered by challenges associated with achieving robust generalization across diverse imaging domains and characteristics. Traditional mitigation strategies in this field, such as data augmentation and stain color normalization, have proven insufficient in addressing this limitation, necessitating the exploration of alternative methodologies. To this end, we propose a novel generative method for domain generalization in histopathology images. Our method employs a generative, self-supervised Vision Transformer to dynamically extract characteristics of image patches and seamlessly infuse them into the original images, thereby creating novel, synthetic images with diverse attributes. By enriching the dataset with such synthesized images, we aim to enhance its holistic nature, facilitating improved generalization of DL models to unseen domains. Extensive experiments conducted on two distinct histopathology datasets demonstrate the effectiveness of our proposed approach, which substantially outperforms the state of the art on the Camelyon17-wilds challenge dataset (+2%) and on a second epithelium-stroma dataset (+26%). Furthermore, we emphasize our method's ability to readily scale with increasingly available unlabeled data samples and more complex architectures with higher parameter counts. Source code is available at https://github.com/sdoerrich97/vits-are-generative-models.
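Conceptually, the augmentation can be pictured as in the sketch below: the anatomy (tissue structure) of one patch is decoded together with the image characteristics (stain and scanner appearance) of another, yielding a synthetic patch. The encoder/decoder callables are placeholders standing in for the self-supervised ViT modules in the linked repository.

```python
# Conceptual sketch of anatomy-characteristics intermixing for synthetic-image generation.
# Encoder/decoder modules are placeholders; the actual model is a self-supervised ViT (see repo).
import torch

@torch.no_grad()
def intermix(content_img, style_img, anatomy_enc, char_enc, decoder):
    """Create a synthetic image: anatomy from one image, image characteristics from another."""
    anatomy = anatomy_enc(content_img)    # tissue structure to preserve
    style = char_enc(style_img)           # stain / scanner characteristics to infuse
    return decoder(anatomy, style)        # synthetic image combining both

@torch.no_grad()
def augment_batch(images, anatomy_enc, char_enc, decoder):
    """Enrich a batch by pairing each image's anatomy with another image's characteristics."""
    perm = torch.randperm(images.size(0))
    return intermix(images, images[perm], anatomy_enc, char_enc, decoder)
```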


Fig. 1: Overview of four different corruptions applied (from top to bottom) to PathMNIST, ChestMNIST, DermaMNIST, and RetinaMNIST.
MedMNIST-C: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions

June 2024

·

70 Reads

The integration of neural-network-based systems into clinical practice is limited by challenges related to domain generalization and robustness. The computer vision community established benchmarks such as ImageNet-C as a fundamental prerequisite to measure progress towards those challenges. Similar datasets are largely absent in the medical imaging community, which lacks a comprehensive benchmark that spans imaging modalities and applications. To address this gap, we create and open-source MedMNIST-C, a benchmark dataset based on the MedMNIST+ collection covering 12 datasets and 9 imaging modalities. We simulate task- and modality-specific image corruptions of varying severity to comprehensively evaluate the robustness of established algorithms against real-world artifacts and distribution shifts. We further provide quantitative evidence that our simple-to-use artificial corruptions allow for highly performant, lightweight data augmentation to enhance model robustness. Unlike traditional, generic augmentation strategies, our approach leverages domain knowledge, exhibiting significantly higher robustness when compared to widely adopted methods. By introducing MedMNIST-C and open-sourcing the corresponding library, which allows for targeted data augmentations, we contribute to the development of increasingly robust methods tailored to the challenges of medical imaging. The code is available at https://github.com/francescodisalvo05/medmnistc-api.
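As a usage illustration, domain-informed corruptions of this kind can be plugged into a training pipeline as lightweight augmentations, roughly along the lines of the sketch below. The corruption function shown is a toy stand-in; the dataset-specific corruptions and severity levels are provided by the medmnistc-api library linked above.

```python
# Illustrative use of corruption functions as training-time augmentation.
# `gaussian_noise` is a hypothetical toy corruption, not the library's implementation;
# see github.com/francescodisalvo05/medmnistc-api for the actual, dataset-specific corruptions.
import random
import numpy as np

def gaussian_noise(img, severity):
    """Toy corruption: additive Gaussian noise scaled by severity (1-5), image in [0, 1]."""
    noisy = img + np.random.normal(0.0, 0.04 * severity, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def corruption_augment(img, corruptions, max_severity=5):
    """Apply one randomly chosen, domain-informed corruption at a random severity."""
    fn = random.choice(corruptions)
    return fn(img, severity=random.randint(1, max_severity))

# Example: augmented = corruption_augment(image, corruptions=[gaussian_noise])
```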



unORANIC: Unsupervised Orthogonalization of Anatomy and Image-Characteristic Features

October 2023

·

9 Reads

·

1 Citation

Lecture Notes in Computer Science

We introduce unORANIC, an unsupervised approach that uses an adapted loss function to drive the orthogonalization of anatomy and image-characteristic features. The method is versatile for diverse modalities and tasks, as it does not require domain knowledge, paired data samples, or labels. During test time unORANIC is applied to potentially corrupted images, orthogonalizing their anatomy and characteristic components, to subsequently reconstruct corruption-free images, showing their domain-invariant anatomy only. This feature orthogonalization further improves generalization and robustness against corruptions. We confirm this qualitatively and quantitatively on 5 distinct datasets by assessing unORANIC’s classification accuracy, corruption detection and revision capabilities. Our approach shows promise for enhancing the generalizability and robustness of practical applications in medical image analysis. The source code is available at github.com/sdoerrich97/unORANIC.
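In practice, the test-time behavior described above amounts to passing a possibly corrupted image through the frozen anatomy branch, as in the illustrative sketch below (placeholder module names, not the repository's API).

```python
# Sketch of test-time use of trained, frozen unORANIC-style modules (placeholder names).
import torch

@torch.no_grad()
def revise(corrupted_img, anatomy_enc, anatomy_dec):
    """Reconstruct a corruption-free image from the distortion-invariant anatomy features."""
    return anatomy_dec(anatomy_enc(corrupted_img))

@torch.no_grad()
def classify(corrupted_img, anatomy_enc, classifier_head):
    """Robust downstream classification on top of the frozen anatomy features."""
    return classifier_head(anatomy_enc(corrupted_img)).argmax(dim=-1)
```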


Fig. 2: Training pipeline of unORANIC for CT images.
Details of the selected set of datasets
PSNR values of the reconstructions
unORANIC: Unsupervised Orthogonalization of Anatomy and Image-Characteristic Features

August 2023

·

36 Reads

We introduce unORANIC, an unsupervised approach that uses an adapted loss function to drive the orthogonalization of anatomy and image-characteristic features. The method is versatile for diverse modalities and tasks, as it does not require domain knowledge, paired data samples, or labels. During test time unORANIC is applied to potentially corrupted images, orthogonalizing their anatomy and characteristic components, to subsequently reconstruct corruption-free images, showing their domain-invariant anatomy only. This feature orthogonalization further improves generalization and robustness against corruptions. We confirm this qualitatively and quantitatively on 5 distinct datasets by assessing unORANIC's classification accuracy, corruption detection and revision capabilities. Our approach shows promise for enhancing the generalizability and robustness of practical applications in medical image analysis. The source code is available at https://github.com/sdoerrich97/unORANIC.

Citations (2)


... For feature extraction, we used the DINOv2 ViT-B/14 foundation model [29], which has been trained on 142 million natural images and outputs image embeddings of size 768. To optimize the training process of our CVAE, these embeddings were pre-generated and stored on disk [7,25]. For datasets lacking official validation splits, we extracted a stratified sample (10%) from the training data to ensure representativeness. ...

Reference:

Privacy-preserving datasets by capturing feature distributions with Conditional VAEs
Integrating kNN with Foundation Models for Adaptable and Privacy-Aware Image Classification
  • Citing Conference Paper
  • May 2024

... To tackle these issues, the work of unORANIC [7] has shown that unsupervised orthogonalization of anatomy and image-characteristic features can substantially improve robustness and generalizability without the need for domain knowledge, paired data, or labels. Building on this, we introduce unORANIC+, a simpler, more robust, and overall higher-performing improvement. ...

unORANIC: Unsupervised Orthogonalization of Anatomy and Image-Characteristic Features
  • Citing Chapter
  • October 2023

Lecture Notes in Computer Science