Nassir Navab’s research while affiliated with Technical University of Munich and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (931)


Pre-training with PBPK-based Digital Twin Enhances Deep Learning for Predictive Dosimetry
  • Article

March 2025

·

3 Reads

Nuklearmedizin

M Kassar · Y Chen · [...]


Fig. 2: (a) US probe positioned perpendicular to the phantom, showing the initial needle setup. (b) US image showing the needle inserted into the phantom with the US probe perpendicular to the phantom.
Robotic CBCT meets robotic ultrasound
  • Article
  • Full-text available

March 2025

·

4 Reads

International Journal of Computer Assisted Radiology and Surgery

Purpose: Multi-modality imaging systems offer optimally fused images for safe and precise interventions in modern clinical practice, such as computed tomography–ultrasound (CT-US) guidance for needle insertion. However, the limited dexterity and mobility of current imaging devices hinder their integration into standardized workflows and the advancement toward fully autonomous intervention systems. In this paper, we present a novel clinical setup in which robotic cone-beam computed tomography (CBCT) and robotic US are pre-calibrated and dynamically co-registered, enabling new clinical applications. This setup provides rigid registration without an additional image-based registration step, facilitating multi-modal guided procedures in the absence of tissue deformation.

Methods: First, a one-time pre-calibration is performed between the systems. To ensure a safe insertion path by highlighting critical vasculature on the 3D CBCT, SAM2 segments vessels from B-mode images, using the Doppler signal as an autonomously generated prompt. Based on the registration, the Doppler image or segmented vessel masks are then mapped onto the CBCT, creating an optimally fused image with comprehensive detail. To validate the system, we used a specially designed phantom featuring lesions covered by ribs and multiple vessels with simulated moving flow.

Results: The mapping between US and CBCT showed an average deviation of 1.72 ± 0.62 mm. A user study demonstrated the effectiveness of CBCT-US fusion for needle insertion guidance, showing significant improvements in time efficiency, accuracy, and success rate. Needle intervention performance improved by approximately 50% compared to the conventional US-guided workflow.

Conclusion: We present the first robotic dual-modality imaging system designed to guide clinical applications. The results show significant performance improvements compared to traditional manual interventions.
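The fusion step described above amounts to taking the pre-calibrated rigid transform between the two robotic systems and using it to express points from the US image (for example, segmented vessel pixels) in the CBCT frame. The sketch below illustrates that mapping only; the transform, pixel spacing, and function names are illustrative assumptions, not the authors' implementation.

import numpy as np

# Minimal sketch (not the paper's code): map a 2D US pixel into the CBCT
# frame, assuming the one-time pre-calibration yields a 4x4 rigid transform
# T_cbct_from_us and that tissue does not deform.
def us_pixel_to_cbct(px, py, pixel_spacing_mm, T_cbct_from_us):
    p_us = np.array([px * pixel_spacing_mm[0],
                     py * pixel_spacing_mm[1],
                     0.0,   # the B-mode plane is z = 0 in the probe frame
                     1.0])  # homogeneous coordinate
    p_cbct = T_cbct_from_us @ p_us
    return p_cbct[:3]

# Example with an arbitrary, purely hypothetical calibration matrix
T = np.eye(4)
T[:3, 3] = [10.0, -5.0, 30.0]  # translation in mm
print(us_pixel_to_cbct(120, 200, (0.2, 0.2), T))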



Skelite: Compact Neural Networks for Efficient Iterative Skeletonization

March 2025

·

1 Read

Skeletonization extracts thin representations from images that compactly encode their geometry and topology. These representations have become an important topological prior for preserving connectivity in curvilinear structures, aiding medical tasks like vessel segmentation. Existing compatible skeletonization algorithms face significant trade-offs: morphology-based approaches are computationally efficient but prone to frequent breakages, while topology-preserving methods require substantial computational resources. We propose a novel framework for training iterative skeletonization algorithms with a learnable component. The framework leverages synthetic data, task-specific augmentation, and a model distillation strategy to learn compact neural networks that produce thin, connected skeletons with a fully differentiable iterative algorithm. Our method demonstrates a 100 times speedup over topology-constrained algorithms while maintaining high accuracy and generalizing effectively to new domains without fine-tuning. Benchmarking and downstream validation in 2D and 3D tasks demonstrate its computational efficiency and real-world applicability.
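The abstract describes an iterative algorithm with a learnable component that peels a mask toward a thin skeleton. The sketch below shows the general shape of such a loop with a tiny convolutional network scoring pixels for removal; the architecture, threshold, and stopping rule are illustrative assumptions, and the hard threshold here is a non-differentiable simplification of the fully differentiable procedure the abstract describes.

import torch
import torch.nn as nn

class PeelNet(nn.Module):
    """Tiny fully convolutional network scoring foreground pixels for removal."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

def iterative_skeletonize(mask, model, steps=20, thresh=0.5):
    """Repeatedly remove the pixels the model marks as peelable."""
    skel = mask.clone()
    with torch.no_grad():
        for _ in range(steps):
            removal = (model(skel) > thresh).float() * skel
            new_skel = skel - removal
            if torch.equal(new_skel, skel):  # nothing left to peel
                break
            skel = new_skel
    return skel

model = PeelNet()
mask = (torch.rand(1, 1, 64, 64) > 0.7).float()  # random binary mask for illustration
skeleton = iterative_skeletonize(mask, model)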


Fig. 3: Registration of virtual and real robot using predefined points. Predefined points on the virtual robot are shown in green; the corresponding points marked on the real robot with the HMD are shown in orange. The dashed lines represent the transformation W T to be solved, which aligns the two point sets.
Enhancing Patient Acceptance of Robotic Ultrasound through Conversational Virtual Agent and Immersive Visualizations

March 2025

·

3 Reads

IEEE Transactions on Visualization and Computer Graphics

Robotic ultrasound systems have the potential to improve medical diagnostics, but patient acceptance remains a key challenge. To address this, we propose a novel system that combines an AI-based virtual agent, powered by a large language model (LLM), with three mixed reality visualizations aimed at enhancing patient comfort and trust. The LLM enables the virtual assistant to engage in natural, conversational dialogue with patients, answering questions in any format and offering real-time reassurance, creating a more intelligent and reliable interaction. The virtual assistant is animated as controlling the ultrasound probe, giving the impression that the robot is guided by the assistant. The first visualization employs augmented reality (AR), allowing patients to see the real world and the robot with the virtual avatar superimposed. The second is an augmented virtuality (AV) environment, in which the real-world body part being scanned remains visible while a 3D Gaussian Splatting reconstruction of the room, excluding the robot, forms the virtual environment. The third is a fully immersive virtual reality (VR) experience, featuring the same 3D reconstruction but entirely virtual, where the patient sees a virtual representation of their body being scanned in a robot-free environment. In this case, the virtual ultrasound probe mirrors the movement of the probe controlled by the robot, creating a synchronized experience as it touches and moves over the patient's virtual body. We conducted a comprehensive agent-guided robotic ultrasound study with all participants, comparing these visualizations against a standard robotic ultrasound procedure. Results showed significant improvements in patient trust, acceptance, and comfort. Based on these findings, we offer insights into designing future mixed reality visualizations and virtual agents to further enhance patient comfort and acceptance in autonomous medical procedures.
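Fig. 3 of this work describes aligning predefined points on the virtual robot with corresponding points marked on the real robot to recover the transformation between the two. A standard way to solve such correspondence-based rigid alignment is the Kabsch/SVD solution sketched below; this is a generic formulation under that assumption, not necessarily the authors' exact procedure, and the example points are hypothetical.

import numpy as np

def rigid_align(P_virtual, Q_real):
    """Least-squares rotation R and translation t mapping P_virtual (Nx3) onto Q_real (Nx3)."""
    cp, cq = P_virtual.mean(axis=0), Q_real.mean(axis=0)
    H = (P_virtual - cp).T @ (Q_real - cq)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

# Hypothetical corresponding points (virtual-robot points vs. HMD-marked real points)
P = np.array([[0.0, 0, 0], [0.5, 0, 0], [0, 0.5, 0], [0, 0, 0.5]])
Q = P + np.array([0.10, 0.20, 0.30])            # real points shifted by a known offset
R, t = rigid_align(P, Q)
print(np.round(R, 3), np.round(t, 3))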


Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models

March 2025

A safe and trustworthy use of Large Language Models (LLMs) requires an accurate expression of confidence in their answers. We introduce a novel Reinforcement Learning (RL) approach for LLM calibration that fine-tunes LLMs to elicit calibrated confidence estimates for their answers to factual questions. We model the problem as a betting game in which the model predicts a confidence score together with every answer, and we design a reward function that penalizes both over- and under-confidence. We prove that under our reward design, an optimal policy results in perfectly calibrated confidence estimates. Our experiments demonstrate significantly improved confidence calibration and generalization to new tasks without re-training, indicating that our approach teaches a general confidence awareness. This approach enables the training of inherently calibrated LLMs.
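The abstract states that the reward is designed so that the optimal policy reports perfectly calibrated confidence, but it does not reproduce the exact reward. The sketch below therefore shows a classic reward with that property, the logarithmic scoring rule, purely as a worked illustration of how a reward can penalize both over- and under-confidence.

import numpy as np

def log_score_reward(confidence, answer_correct):
    """Logarithmic scoring rule: a proper score, maximized in expectation
    when the stated confidence equals the true probability of being correct."""
    confidence = np.clip(confidence, 1e-6, 1 - 1e-6)
    return np.log(confidence) if answer_correct else np.log(1 - confidence)

# Expected reward when the true correctness probability is p = 0.7:
p = 0.7
for c in (0.5, 0.7, 0.9):
    expected = p * log_score_reward(c, True) + (1 - p) * log_score_reward(c, False)
    print(f"stated confidence {c:.1f}: expected reward {expected:.3f}")
# The expectation peaks at c = p, so truthful (calibrated) confidence is optimal.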


MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

March 2025

·

2 Reads

Operating rooms (ORs) are complex, high-stakes environments that require a precise understanding of interactions among medical staff, tools, and equipment to enhance surgical assistance, situational awareness, and patient safety. Current datasets fall short in scale and realism and do not capture the multimodal nature of OR scenes, limiting progress in OR modeling. To this end, we introduce MM-OR, a realistic and large-scale multimodal spatiotemporal OR dataset, and the first dataset to enable multimodal scene graph generation. MM-OR captures comprehensive OR scenes containing RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data, and is annotated with panoptic segmentations, semantic scene graphs, and downstream task labels. Further, we propose MM2SG, the first multimodal large vision-language model for scene graph generation, and through extensive experiments demonstrate its ability to effectively leverage multimodal inputs. Together, MM-OR and MM2SG establish a new benchmark for holistic OR understanding and open the path toward multimodal scene analysis in complex, high-stakes environments. Our code and data are available at https://github.com/egeozsoy/MM-OR.
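A semantic scene graph of the kind MM-OR is annotated with can be represented as typed nodes connected by (subject, predicate, object) triples. The sketch below shows one minimal way to hold such a structure in code; the example entities and relations are hypothetical and are not taken from the dataset's actual label set.

from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    nodes: set = field(default_factory=set)
    triples: list = field(default_factory=list)  # (subject, predicate, object)

    def add(self, subj, pred, obj):
        self.nodes.update([subj, obj])
        self.triples.append((subj, pred, obj))

# Hypothetical OR scene: entities and relations are illustrative only
g = SceneGraph()
g.add("surgeon", "holding", "drill")
g.add("drill", "close_to", "patient")
print(g.nodes, g.triples)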



SpecstatOR: Speckle statistics-based iOCT Segmentation Network for Ophthalmic Surgery

February 2025

·

4 Reads


VibNet: Vibration-Boosted Needle Detection in Ultrasound Images

February 2025

·

33 Reads

IEEE Transactions on Medical Imaging

Precise percutaneous needle detection is crucial for ultrasound (US)-guided interventions. However, inherent limitations such as speckles, needle-like artifacts, and low resolution make it challenging to robustly detect needles, especially when their visibility is reduced or imperceptible. To address this challenge, we propose VibNet, a learning-based framework designed to enhance the robustness and accuracy of needle detection in US images by leveraging periodic vibration applied externally to the needle shafts. VibNet integrates neural Short-Time Fourier Transform and Hough Transform modules to achieve successive sub-goals, including motion feature extraction in the spatiotemporal space, frequency feature aggregation, and needle detection in the Hough space. Due to the periodic subtle vibration, the features are more robust in the frequency domain than in the image intensity domain, making VibNet more effective than traditional intensity-based methods. To demonstrate the effectiveness of VibNet, we conducted experiments on distinct ex vivo porcine and bovine tissue samples. The results obtained on porcine samples demonstrate that VibNet effectively detects needles even when their visibility is severely reduced, with a tip error of 1.61 ± 1.56 mm compared to 8.15 ± 9.98 mm for UNet and 6.63 ± 7.58 mm for WNet, and a needle direction error of 1.64 ± 1.86° compared to 9.29 ± 15.30° for UNet and 8.54 ± 17.92° for WNet. Code: https://github.com/marslicy/VibNet.
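The intuition behind operating in the frequency domain is that a needle vibrating at a known, externally applied frequency produces a spectral peak in the intensity time series of pixels on its shaft, which survives even when the needle is barely visible in any single frame. The sketch below illustrates that effect with a short-time Fourier transform on synthetic signals; the frame rate, vibration frequency, and signal model are illustrative assumptions, and this is not the VibNet network itself.

import numpy as np
from scipy.signal import stft

fs = 30.0                # assumed US frame rate in Hz
vib_freq = 6.0           # assumed externally applied vibration frequency in Hz
t = np.arange(90) / fs   # three seconds of frames

# Intensity over time for one pixel on the vibrating needle vs. one background pixel
needle_px = 0.1 * np.sin(2 * np.pi * vib_freq * t) + 0.05 * np.random.randn(t.size)
background_px = 0.05 * np.random.randn(t.size)

for name, sig in [("needle", needle_px), ("background", background_px)]:
    f, _, Z = stft(sig, fs=fs, nperseg=32)
    band = np.argmin(np.abs(f - vib_freq))       # frequency bin nearest the vibration
    print(name, "energy near vibration frequency:", round(float(np.abs(Z[band]).mean()), 3))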


Citations (19)


... On CATARACTS, the better accuracy of V-YT than SANGRIA [29] (∼4.5%) highlights the efficacy of video-language representation learning and pretraining on our YT-dataset. Our results show that training on a large-scale dataset of YouTube videos (V-YT) yields competitive performance to training directly on the target dataset (V-CAT). ...

Reference:

Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding
SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction
  • Citing Chapter
  • March 2025

... To further capture diverse spatial arrangements, data-driven approaches (Fisher et al. 2012; Qi et al. 2018; Xu et al. 2014; Ma et al. 2018b; Sun et al. 2022, 2024b) learn object relationships from datasets (Fu et al. 2021; Song et al. 2017). Researchers have developed a variety of networks to learn scene representations (Paschalidou et al. 2021; Wei et al. 2023; Tang et al. 2023; Zhai et al. 2024a). However, due to the inherent complexity, it is difficult to capture the essential relationships from the observed layouts and generalize to other categories. ...

EchoScene: Indoor Scene Generation via Information Echo Over Scene Graph Diffusion
  • Citing Chapter
  • October 2024

... Large language models (LLMs) can automate structured data extraction from radiology reports by processing unstructured text to efficiently yield clinically relevant insights [5][6][7]. However, privacy concerns around sensitive medical data have prompted interest in open-source LLMs, which offer greater transparency, adaptability, and control. ...

Large language models for structured reporting in radiology: past, present, and future

European Radiology

... Two diffusion models were trained to generate a patch-wise clean image and haze separately using a hazy image as a condition. Domingues et al. [49] introduced ultrasound physics to a diffusion model for ultrasound image generation. They modified the noise scheduler based on attenuation maps to include the attenuation in their synthetic data. ...

Diffusion as Sound Propagation: Physics-Inspired Model for Ultrasound Image Generation
  • Citing Chapter
  • October 2024

... Surgical Image Recognition involves using computer vision to automatically detect and classify surgical instruments, anatomical structures, actions, and phases in surgery videos or images, which can be utilized for skill assessment (Lam et al., 2022), surgical workflow understanding (Hu et al., 2025; Yuan et al., 2024), and robotic-assisted surgery (Diana & Marescaux, 2015; Rivero-Moreno et al., 2023). Datasets such as EndoVis17 and EndoVis18 provide annotations for instruments and organs, which are commonly used for training and evaluating models in tasks like tool detection and segmentation. ...

HecVL: Hierarchical Video-Language Pretraining for Zero-Shot Surgical Phase Recognition
  • Citing Chapter
  • October 2024

... This work presents the largest publicly available, expert-annotated dataset of peripheral blood single cells, with over 40,000 images. While our dataset is being published here for the first time, it has been used in previous studies [4, 5, 19-22]. ...

Neural Cellular Automata for Lightweight, Robust and Explainable Classification of White Blood Cell Images
  • Citing Chapter
  • October 2024

... KGARevion [181] is a knowledge graph-based agent, combining LLM-generated triples with grounded KG verification to ensure robust reasoning and accurate answers. This can help alleviate the shortage of clinical resources in some special clinical scenarios and tasks [182,183]. Medical agents can also achieve autonomous evolution by learning from successful clinical cases, such as the MedAgent-Zero strategy [184]. ...

MAGDA: Multi-agent Guideline-Driven Diagnostic Assistance
  • Citing Chapter
  • September 2024

... However, this level of explanation is likely to be insufficient for most of the applications discussed above. Counterfactual explanations, which allow the user to visualize how the image would need to change in order to change the prediction, may provide one more promising direction [131]. ...

Counterfactual Explanations for Medical Image Classification and Regression using Diffusion Autoencoder
  • Citing Article
  • September 2024

The Journal of Machine Learning for Biomedical Imaging

... For example, Dastan, Fiorentino, and Uva (2024) reviewed tools like the Precise Tool to Target Positioning Widgets (TOTTA), which demonstrated AR's capability to enhance spatial accuracy in surgical workflows. Another study by Dastan, Fiorentino, and Walter (2024) explored the co-design of mixed reality drill positioning systems with dentists, emphasizing AR's ability to function effectively in realistic clinical setups. These findings establish AR as a transformative technology for medical and dental procedures. ...

Co-Designing Dynamic Mixed Reality Drill Positioning Widgets: A Collaborative Approach with Dentists in a Realistic Setup

IEEE Transactions on Visualization and Computer Graphics