Gerhard Rigoll’s research while affiliated with Technical University of Munich and other places


Publications (711)


Fig. 6. Mean scores of the NASA-RTLX ratings ranging from 0 [very low] to 100 [very high]. The scales are (MD) Mental demand, (PD) Physical demand, (TD) Temporal demand, (P) Performance, (E) Effort, (F) Frustration, and (O) Overall. Error bars indicate the standard error.
Optimizing Robot Programming: Mixed Reality Gripper Control
  • Preprint
  • File available

March 2025 · 13 Reads

Maximilian Rettinger · Leander Hacker · Philipp Wolters · Gerhard Rigoll

Conventional robot programming methods are complex and time-consuming for users. In recent years, alternative approaches such as mixed reality have been explored to address these challenges and optimize robot programming. While the findings of the mixed reality robot programming methods are convincing, most existing methods rely on gesture interaction for robot programming. Since controller-based interactions have proven to be more reliable, this paper examines three controller-based programming methods within a mixed reality scenario: 1) Classical Jogging, where the user positions the robot's end effector using the controller's thumbsticks, 2) Direct Control, where the controller's position and orientation directly correspond to the end effector's, and 3) Gripper Control, where the controller is enhanced with a 3D-printed gripper attachment to grasp and release objects. A within-subjects study (n = 30) was conducted to compare these methods. The findings indicate that the Gripper Control condition outperforms the others in terms of task completion time, user experience, mental demand, and task performance, while also being the preferred method. Therefore, it demonstrates promising potential as an effective and efficient approach for future robot programming. Video available at https://youtu.be/83kWr8zUFIQ.
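The NASA-RTLX ratings summarized in Fig. 6 follow a simple scoring rule: the "raw" TLX overall score is the unweighted mean of the six subscales, each rated 0 (very low) to 100 (very high). A minimal sketch of that scoring, with the per-participant ratings below invented purely for illustration:

```python
# Sketch of NASA-RTLX scoring as shown in Fig. 6. RTLX ("raw" TLX) is the
# unweighted mean of the six subscales; the error bars are standard errors
# across participants. Scale keys follow the figure; the data is made up.

from statistics import mean, stdev
from math import sqrt

SCALES = ("MD", "PD", "TD", "P", "E", "F")  # mental/physical/temporal demand,
                                            # performance, effort, frustration

def rtlx_overall(rating: dict) -> float:
    """Overall RTLX score: unweighted mean of the six subscale ratings."""
    return mean(rating[s] for s in SCALES)

def condition_stats(ratings: list) -> tuple:
    """Mean overall score and its standard error across participants."""
    overall = [rtlx_overall(r) for r in ratings]
    return mean(overall), stdev(overall) / sqrt(len(overall))

# Three fictional participants in one condition:
participants = [
    {"MD": 40, "PD": 20, "TD": 35, "P": 25, "E": 45, "F": 15},
    {"MD": 55, "PD": 30, "TD": 40, "P": 20, "E": 50, "F": 25},
    {"MD": 35, "PD": 25, "TD": 30, "P": 30, "E": 40, "F": 20},
]
m, se = condition_stats(participants)  # mean overall score and its SE
```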


Potential of digital technologies in counteracting long-standing deficits in hemodialysis machine training

February 2025 · 25 Reads · 1 Citation

Maximilian Rettinger · Julia Steinhaus · Annika Hackenberg · [...]
Before medical professionals are permitted to use a medical device, they must first be instructed in its use. However, such instruction is often of inadequate quality, posing hazards to both staff and patients. To address this problem, we investigated the potential of digital technologies for enhancing medical device training. To this end, we designed and implemented several diverse training methods: (1) conventional training by a medical instructor, (2) video-based training, (3) mobile application training on a tablet, (4) virtual reality training, and (5) augmented reality training. Since each method provides identical training content to the user, we compared the resulting learning outcomes between the methods. The findings indicate that virtual and augmented reality training are superior to conventional training. These digital technologies offer the opportunity to reduce the burden on healthcare professionals and increase patient safety.


Knowledge-Informed Multi-Agent Trajectory Prediction at Signalized Intersections for Infrastructure-to-Everything

January 2025 · 3 Reads

Multi-agent trajectory prediction at signalized intersections is crucial for developing efficient intelligent transportation systems and safe autonomous driving systems. Due to the complexity of intersection scenarios and the limitations of single-vehicle perception, the performance of vehicle-centric prediction methods has reached a plateau. Furthermore, most works underutilize critical intersection information, including traffic signals and the behavior patterns induced by road structures. Therefore, we propose a multi-agent trajectory prediction framework at signalized intersections dedicated to Infrastructure-to-Everything (I2XTraj). Our framework leverages dynamic graph attention to integrate knowledge from traffic signals and driving behaviors. A continuous signal-informed mechanism is proposed to adaptively process real-time traffic signals from infrastructure devices. Additionally, leveraging the prior knowledge of the intersection topology, we propose a driving strategy awareness mechanism to model the joint distribution of goal intentions and maneuvers. To the best of our knowledge, I2XTraj represents the first multi-agent trajectory prediction framework explicitly designed for infrastructure deployment, supplying subscribable prediction services to all vehicles at intersections. I2XTraj demonstrates state-of-the-art performance on both the Vehicle-to-Infrastructure dataset V2X-Seq and the aerial-view dataset SinD for signalized intersections. Quantitative evaluations show that our approach outperforms existing methods by more than 30% in both multi-agent and single-agent scenarios.
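The signal-informed attention idea can be illustrated generically: agents attend to one another while the current traffic-signal state is injected into every key and value. This is a textbook-style sketch, not I2XTraj's actual architecture; all names, shapes, and the single-head simplification are assumptions.

```python
# Illustrative sketch (not the paper's implementation): single-head graph
# attention over intersection agents, with a traffic-signal feature vector
# concatenated to every key/value input, so attention weights can depend on
# the current signal phase. All dimensions are arbitrary choices.

import numpy as np

def signal_informed_attention(agent_feats, signal_feat, w_q, w_k, w_v):
    """agent_feats: (N, D) per-agent encodings; signal_feat: (S,) signal
    encoding broadcast to every agent before the key/value projections."""
    n, _ = agent_feats.shape
    ctx = np.concatenate([agent_feats, np.tile(signal_feat, (n, 1))], axis=1)
    q = agent_feats @ w_q               # queries from agents alone
    k = ctx @ w_k                       # keys/values see the signal state
    v = ctx @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)   # row-wise softmax
    return attn @ v                     # (N, D_v) signal-aware agent features

rng = np.random.default_rng(0)
N, D, S, Dv = 4, 8, 3, 8
out = signal_informed_attention(
    rng.normal(size=(N, D)), rng.normal(size=S),
    rng.normal(size=(D, Dv)),
    rng.normal(size=(D + S, Dv)), rng.normal(size=(D + S, Dv)))
```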


SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection

November 2024 · 57 Reads

In this work, we present SpaRC, a novel Sparse fusion transformer for 3D perception that integrates multi-view image semantics with Radar and Camera point features. The fusion of radar and camera modalities has emerged as an efficient perception paradigm for autonomous driving systems. While conventional approaches utilize dense Bird's Eye View (BEV)-based architectures for depth estimation, contemporary query-based transformers excel in camera-only detection through object-centric methodology. However, these query-based approaches exhibit limitations in false positive detections and localization precision due to implicit depth modeling. We address these challenges through three key contributions: (1) sparse frustum fusion (SFF) for cross-modal feature alignment, (2) range-adaptive radar aggregation (RAR) for precise object localization, and (3) local self-attention (LSA) for focused query aggregation. In contrast to existing methods requiring computationally intensive BEV-grid rendering, SpaRC operates directly on encoded point features, yielding substantial improvements in efficiency and accuracy. Empirical evaluations on the nuScenes and TruckScenes benchmarks demonstrate that SpaRC significantly outperforms existing dense BEV-based and sparse query-based detectors. Our method achieves state-of-the-art performance metrics of 67.1 NDS and 63.1 AMOTA. The code and pretrained models are available at https://github.com/phi-wol/sparc.
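The cross-modal association underlying a frustum-style fusion step can be sketched in isolation: project radar points into the image and let each 2D detection aggregate only the point features landing inside its box. This is a generic pinhole-camera sketch under assumed names and shapes, not SpaRC's API or its learned alignment.

```python
# Hedged sketch of frustum-based radar-camera association (names and shapes
# are assumptions, not SpaRC's code): radar points are projected with a
# pinhole model, and each 2D box mean-pools the features of the points
# whose projections fall inside it.

import numpy as np

def project(points_3d, K):
    """Pinhole projection: (N, 3) camera-frame points -> (N, 2) pixels."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def frustum_aggregate(box, points_3d, point_feats, K):
    """box: (u_min, v_min, u_max, v_max). Mean-pools features of points
    projecting inside the box; returns zeros if none hit."""
    uv = project(points_3d, K)
    u0, v0, u1, v1 = box
    inside = (uv[:, 0] >= u0) & (uv[:, 0] <= u1) & \
             (uv[:, 1] >= v0) & (uv[:, 1] <= v1)
    if not inside.any():
        return np.zeros(point_feats.shape[1])
    return point_feats[inside].mean(axis=0)

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pts = np.array([[0.0, 0.0, 10.0],    # projects to the image center (320, 240)
                [5.0, 0.0, 10.0]])   # projects to (570, 240), outside the box
feats = np.array([[1.0, 2.0], [10.0, 20.0]])
agg = frustum_aggregate((300, 220, 340, 260), pts, feats, K)  # -> [1., 2.]
```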




Spatial-Temporal Multi-Cuts for Online Multiple-Camera Vehicle Tracking

October 2024 · 3 Reads

Accurate online multiple-camera vehicle tracking is essential for intelligent transportation systems, autonomous driving, and smart city applications. Like single-camera multiple-object tracking, it is commonly formulated as a graph problem of tracking-by-detection. Within this framework, existing online methods usually consist of two-stage procedures that cluster temporally first, then spatially, or vice versa. This is computationally expensive and prone to error accumulation. We introduce a graph representation that allows spatial-temporal clustering in a single, combined step: New detections are spatially and temporally connected with existing clusters. By keeping sparse appearance and positional cues of all detections in a cluster, our method can compare clusters based on the strongest available evidence. The final tracks are obtained online using a simple multicut assignment procedure. Our method does not require any training on the target scene, pre-extraction of single-camera tracks, or additional annotations. Notably, we outperform the online state-of-the-art on the CityFlow dataset in terms of IDF1 by more than 14% and on the Synthehicle dataset by more than 25%. The code is publicly available.
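The online assignment step described above can be caricatured with a greedy rule: link each new detection to the existing cluster offering the strongest appearance evidence, or open a new cluster otherwise. This is a deliberate simplification of the paper's multicut procedure; the threshold, embeddings, and max-similarity rule are all illustrative assumptions.

```python
# Greedy sketch of online cluster assignment (a simplification, not the
# paper's multicut solver): a new detection joins the cluster with the
# highest appearance affinity above a threshold, else starts a new track.
# "Strongest available evidence" is modeled as the max cosine similarity
# over a cluster's stored (unit-norm) embeddings.

import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def assign_online(clusters, detection, threshold=0.7):
    """clusters: list of lists of unit-norm embeddings. Returns the index
    of the cluster the detection was assigned to."""
    best_idx, best_sim = -1, threshold
    for i, members in enumerate(clusters):
        sim = max(float(e @ detection) for e in members)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    if best_idx == -1:
        clusters.append([detection])   # open a new track
        return len(clusters) - 1
    clusters[best_idx].append(detection)
    return best_idx

clusters = [[unit([1, 0, 0])], [unit([0, 1, 0])]]
a = assign_online(clusters, unit([0.9, 0.1, 0]))   # close to cluster 0
b = assign_online(clusters, unit([0, 0, 1]))       # unlike both: new cluster
```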



Efficient Interaction-Aware Trajectory Prediction Model Based on Multi-head Attention

April 2024 · 15 Reads · 3 Citations

Automotive Innovation

Predicting vehicle trajectories using deep learning has seen substantial progress in recent years. However, making autonomous vehicles pay attention to their surrounding vehicles with consideration of social interaction remains an open problem, especially in long-term prediction scenarios. Unlike autonomous vehicles, human drivers continuously observe and analyze interactive information between their vehicle and other traffic participants for long-term route planning. To make trajectory prediction interaction-aware, this study proposes a multi-head attention mechanism that boosts prediction performance by globally exploiting interactive information. The multi-dimensional spatial interactive information, encoded with vehicle type and size, can assign different weights to surrounding vehicles to realize the interaction of diverse trajectories. Furthermore, the model is based on a simple data pre-processing method, surpassing the traditional grid data processing approach. In the experiment, the proposed multi-head trajectory prediction model outperforms state-of-the-art models, particularly in long-term prediction metrics. The code for this model is accessible at: https://github.com/pengpengjun/hybrid attention.
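The multi-head attention mechanism the abstract builds on has a standard generic form: project the inputs into several lower-dimensional heads, attend within each head, then merge. A minimal NumPy sketch of that textbook form follows; the paper's actual model, weights, and input encoding differ, and everything below is assumed for illustration.

```python
# Generic multi-head attention sketch in NumPy (textbook form, not the
# paper's model): split Q/K/V projections into heads, apply scaled
# dot-product attention per head, merge heads, and project the output.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (T, D) sequence of per-timestep (or per-agent) features."""
    T, D = x.shape
    d_h = D // num_heads
    # (T, D) -> (H, T, d_h) for each of Q, K, V
    q = (x @ w_q).reshape(T, num_heads, d_h).transpose(1, 0, 2)
    k = (x @ w_k).reshape(T, num_heads, d_h).transpose(1, 0, 2)
    v = (x @ w_v).reshape(T, num_heads, d_h).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_h))  # (H, T, T)
    out = (attn @ v).transpose(1, 0, 2).reshape(T, D)        # merge heads
    return out @ w_o

rng = np.random.default_rng(1)
T, D, H = 6, 16, 4
x = rng.normal(size=(T, D))
w_q, w_k, w_v, w_o = (rng.normal(size=(D, D)) for _ in range(4))
out = multi_head_attention(x, w_q, w_k, w_v, w_o, H)  # (6, 16)
```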



Citations (55)


... Mixed reality (MR) [18] allows users to augment the physical environment with virtual elements, offering promising opportunities in fields ranging from healthcare [12,29,30] to ordnance disposal [27,28]. ...

Reference:

Optimizing Robot Programming: Mixed Reality Gripper Control
Potential of digital technologies in counteracting long-standing deficits in hemodialysis machine training

... Mixed reality (MR) [18] allows users to augment the physical environment with virtual elements, offering promising opportunities in fields ranging from healthcare [12,29,30] to ordnance disposal [27,28]. ...

Optimizing Medical Device Training: The Role of Multi-User VR and Expert Guidance
  • Citing Conference Paper
  • November 2024

... To overcome these limitations, [1,30] used the 3D feature-pulling method [11], initially proposed as a lifting technique for autonomous vehicle perception, to project the features to a unified 3D space. This method creates a unified 3D feature volume representing the scene by sampling the features from the images for each voxel, effectively addressing the issues arising from perspective projection. ...

Lifting Multi-View Detection and Tracking to the Bird’s Eye View
  • Citing Conference Paper
  • June 2024

... An example illustrating the occupancy volume of the scene, reconstructed using the visual hull technique, which highlights the voxels corresponding to the regions with high probability of being occupied by pedestrians. ... comprehensive coverage from multiple perspectives that helps mitigate occlusions, as objects occluded in one camera's view can be captured by another [29]. Camera calibrations are typically provided to enable the aggregation of data from multiple perspectives. ...

EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View
  • Citing Conference Paper
  • January 2024

... In this work, traditional segmentation performance metrics such as Intersection over Union (IoU) [9] or Dice Coefficient [5] could not be utilised due to the absence of ground truth masks for the datasets D t2 and D t3 . Additionally, pre-trained SAM is unable to generate high-quality masks due to the existence of uneven surface textures, particularly those extending from the printed ACF code area regardless of its prompt type (point or box), which can be revealed in Fig. 5. Instead, we employed a proprietary screening tool provided by our industrial partner, which provides a justification of whether a cropped ACF code meets industrial regular standards or not. ...

Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling with IoU-Aware Calibration
  • Citing Conference Paper
  • January 2024

... During the Group training, the users can hear the defined training steps by the instructor; for reasons of comparability, we integrated recordings of a professional speaker into the other four training methods. In the MA, VR, and AR conditions, the content of the voice recordings is also displayed visually as text for effective information transfer [74]. This allows users to re-read the instructions if they were, e.g., distracted by the interaction. ...

Enhancing VR Training: Impact of Information Transfer Methods
  • Citing Conference Paper
  • October 2023

... Building on this, a new research direction has emerged, addressing the relatively novel problem of tracking 3D human inputs solely based on event streams from an event camera, thereby completely eliminating the need for additional dense input images. Eisl et al. (2023) presented a novel framework for tracking humans using a single event camera, comprising three main components. First, a Graph Neural Network was trained to identify a person within the stream of events. ...

Introducing A Framework for Single-Human Tracking Using Event-Based Cameras
  • Citing Conference Paper
  • October 2023

... B. Information Swapping 1) Wavelet Decomposition: Wavelet decomposition is a traditional image processing method that can decompose images into feature details at different scales and directions in the frequency domain. By performing adversarial perturbation operations in the wavelet domain, the attacker is able to embed covert adversarial information while preserving the overall structure of the original image [33]. Generating the adversarial examples in this way can effectively mislead the perception model while the examples are visually close to the original examples [25]. ...

Wavelet regularization benefits adversarial training
  • Citing Article
  • September 2023

Information Sciences

... concept drift). Human intuition can be used to mitigate such generalization caveats by adding more nuanced yet general reasoning into the model, enabling it to generalize to unseen, potentially novel, examples (Agarwal et al., 2024; Grønsund & Aanestad, 2020; Knoche & Rigoll, 2023; Langlotz, 2019). ...

Tackling Face Verification Edge Cases: In-Depth Analysis and Human-Machine Fusion Approach
  • Citing Conference Paper
  • July 2023

... Mixed reality (MR) [18] allows users to augment the physical environment with virtual elements, offering promising opportunities in fields ranging from healthcare [12,29,30] to ordnance disposal [27,28]. For robot programming, this potential has been explored for several years [1,5,25] since traditional programming approaches are unintuitive, highly skill-demanding, or time-consuming [21,36]. ...

Touching the future of training: investigating tangible interaction in virtual reality

Frontiers in Virtual Reality