Alois Knoll’s research while affiliated with Technical University of Munich and other places


Publications (819)


CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving
  • Preprint

March 2025


Chenwei Liang · Yan Xia · [...] · Alois Knoll

Dynamic scene rendering opens new avenues in autonomous driving by enabling closed-loop simulations with photorealistic data, which is crucial for validating end-to-end algorithms. However, the complex and highly dynamic nature of traffic environments presents significant challenges in accurately rendering these scenes. In this paper, we introduce a novel 4D Gaussian Splatting (4DGS) approach, which incorporates context and temporal deformation awareness to improve dynamic scene rendering. Specifically, we employ a 2D semantic segmentation foundation model to self-supervise the 4D semantic features of Gaussians, ensuring meaningful contextual embedding. Simultaneously, we track the temporal deformation of each Gaussian across adjacent frames. By aggregating and encoding both semantic and temporal deformation features, each Gaussian is equipped with cues for potential deformation compensation within 3D space, facilitating a more precise representation of dynamic scenes. Experimental results show that our method improves 4DGS's ability to capture fine details in dynamic scene rendering for autonomous driving and outperforms other self-supervised methods in 4D reconstruction and novel view synthesis. Furthermore, CoDa-4DGS deforms semantic features with each Gaussian, enabling broader applications.



Fig. 1. LLM-based workflow for iterative metamodeling in the automotive domain: (1) freeform text; (2) text chunks; (3) Ecore metamodel (partial) and updated model parser; (4) metamodel visualization based on PlantUML; (5) expert feedback/update request; (6) Ecore metamodel (complete); (7) freeform text; (8) model instance; (9) XMI file; (10) OCL rules based on standards; (11) complete and consistent system model; (12) executable code.
LLM-based Iterative Approach to Metamodeling in Automotive

March 2025


In this paper, we introduce an automated approach to domain-specific metamodel construction relying on a Large Language Model (LLM). The main focus is adoption in the automotive domain. As an outcome, a prototype was implemented as a web service in Python, with OpenAI's GPT-4o as the underlying LLM. Based on the initial experiments, this approach successfully constructs an Ecore metamodel from a set of automotive requirements and visualizes it using PlantUML notation, so that human experts can provide feedback to refine the result. Finally, a locally deployable solution is also considered, including its limitations and the additional steps required.


Fig. 1. MLLM-based diagram prompting for product updates identification.
Fig. 2. The workflow of MLLM-based image prompting for automotive scenarios.
A summary of commonly used MLLMs.
MLLM performance results on 5 questions related to [7].
Automotive diagram prompting service REST API and execution times.
Multi-modal Summarization in Model-Based Engineering: Automotive Software Development Case Study

March 2025


Multimodal summarization, which integrates information from diverse data modalities, presents a promising solution to aid the understanding of information within various processes. However, the application and advantages of multimodal summarization have not received much attention in model-based engineering (MBE), which has become a cornerstone in the design and development of complex systems, leveraging formal models to improve understanding, validation and automation throughout the engineering lifecycle. UML and EMF diagrams in model-based engineering contain a large amount of multimodal information and intricate relational data. Hence, our study explores the application of multimodal large language models within the domain of model-based engineering to evaluate their capacity for understanding and identifying relationships, features, and functionalities embedded in UML and EMF diagrams. We aim to demonstrate the potential benefits and limitations of multimodal summarization in improving productivity and accuracy in MBE practices. The proposed approach is evaluated within the context of automotive software development, taking many promising state-of-the-art models into account.


LensDFF: Language-enhanced Sparse Feature Distillation for Efficient Few-Shot Dexterous Manipulation

March 2025

Learning dexterous manipulation from few-shot demonstrations is a significant yet challenging problem for advanced, human-like robotic systems. Dense distilled feature fields have addressed this challenge by distilling rich semantic features from 2D visual foundation models into the 3D domain. However, their reliance on neural rendering models such as Neural Radiance Fields (NeRF) or Gaussian Splatting results in high computational costs. In contrast, previous approaches based on sparse feature fields either suffer from inefficiencies due to multi-view dependencies and extensive training or lack sufficient grasp dexterity. To overcome these limitations, we propose Language-ENhanced Sparse Distilled Feature Field (LensDFF), which efficiently distills view-consistent 2D features onto 3D points using our novel language-enhanced feature fusion strategy, thereby enabling single-view few-shot generalization. Based on LensDFF, we further introduce a few-shot dexterous manipulation framework that integrates grasp primitives into the demonstrations to generate stable and highly dexterous grasps. Moreover, we present a real2sim grasp evaluation pipeline for efficient grasp assessment and hyperparameter tuning. Through extensive simulation experiments based on the real2sim pipeline and real-world experiments, our approach achieves competitive grasping performance, outperforming state-of-the-art approaches.



Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach

February 2025


Current robotic pick-and-place policies typically require consistent gripper configurations across training and inference. This constraint imposes high retraining or fine-tuning costs, especially for imitation learning-based approaches, when adapting to new end-effectors. To mitigate this issue, we present a diffusion-based policy with a hybrid learning-optimization framework, enabling zero-shot adaptation to novel grippers without additional data collection for policy retraining. During training, the policy learns manipulation primitives from demonstrations collected using a base gripper. At inference, a diffusion-based optimization strategy dynamically enforces kinematic and safety constraints, ensuring that generated trajectories align with the physical properties of unseen grippers. This is achieved through a constrained denoising procedure that adapts trajectories to gripper-specific parameters (e.g., tool-center-point offsets, jaw widths) while preserving collision avoidance and task feasibility. We validate our method on a Franka Panda robot across six gripper configurations, including 3D-printed fingertips, a flexible silicone gripper, and a Robotiq 2F-85 gripper. Our approach achieves a 93.3% average task success rate across grippers (vs. 23.3-26.7% for diffusion policy baselines), supporting tool-center-point variations of 16-23.5 cm and jaw widths of 7.5-11.5 cm. The results demonstrate that constrained diffusion enables robust cross-gripper manipulation while maintaining the sample efficiency of imitation learning, eliminating the need for gripper-specific retraining. Video and code are available at https://github.com/yaoxt3/GADP.


Enhancing Highway Safety: Accident Detection on the A9 Test Stretch Using Roadside Sensors

February 2025


Road traffic injuries are the leading cause of death for people aged 5-29, resulting in about 1.19 million deaths each year. To reduce these fatalities, it is essential to address human errors like speeding, drunk driving, and distractions. Additionally, faster accident detection and quicker medical response can help save lives. We propose an accident detection framework that combines a rule-based approach with a learning-based one. We introduce a dataset of real-world highway accidents featuring high-speed crash sequences. It includes 294,924 labeled 2D boxes, 93,012 labeled 3D boxes, and track IDs across 48,144 frames captured at 10 Hz using four roadside cameras and LiDAR sensors. The dataset covers ten object classes and is released in the OpenLABEL format. Our experiments and analysis demonstrate the reliability of our method.


Fig. 1: Systematic comparison of different approaches. (a) GAN-based translator for UDA; (b) DM for domain generalization (DG)/randomization (DR); (c) Enhanced DM for UDA (Ours). $x_s$, $x_t$ stand for source and target domain images. $y_s$, $\tilde{y}_t$ stand for source GT and target prior. $\tilde{x}_{s2t}$, $\tilde{x}_t$ and $\tilde{x}_{random}$ are generated data. $h_s$, $h_t$ are structure information.
Fig. 3: An algorithmic overview of the W-ControlUDA framework. (a) depicts the training procedure of our UDAControlNet conditioned on prior knowledge from the target domain (Sec. III-A); (b) demonstrates how data sampling can be performed with UDAControlNet to synthesize various pseudo target data from a single source label (Sec. III-B); (c) shows how the performance of UDA segmentation in adverse weather can be boosted via refinement training with our generated data (Sec. III-C).
W-ControlUDA: Weather-Controllable Diffusion-assisted Unsupervised Domain Adaptation for Semantic Segmentation

January 2025


IEEE Robotics and Automation Letters

Image generation has emerged as a potent strategy to enrich training data for unsupervised domain adaptation (UDA) of semantic segmentation in adverse weather, due to the scarcity of labelled target domain data. Previous UDA works commonly utilize generative adversarial networks (GANs) to translate images from the source to the target domain to enhance UDA training. However, these GANs, trained from scratch in an unpaired manner, produce sub-optimal image quality and lack multi-weather controllability. Consequently, controllable data generation for diverse weather scenarios remains underexplored. The recent strides in text-to-image diffusion models (DM) enable high-fidelity, diverse image generation conditioned on semantic labels. However, such DMs must be trained in a paired manner, i.e., with image and label pairs, which poses a huge challenge to the UDA setting, where target domain labels are missing. This work addresses two key questions: What is an optimal approach to train DMs for UDA, and how can the generated data best enhance UDA performance? We introduce W-ControlUDA, a diffusion-assisted framework for UDA segmentation in adverse weather. W-ControlUDA involves two steps: DM training for data augmentation and UDA training using the generated data. Unlike previous unpaired training, our method conditions the DM on target predictions from a pre-trained segmentor, addressing the lack of target labels. We propose UDAControlNet for high-fidelity cross-domain and intra-domain data generation under adverse weather. In UDA training, a label filtering mechanism is introduced to ensure more reliable results. W-ControlUDA helps UDA achieve a new milestone (72.8 mIoU) on the popular Cityscapes-to-ACDC benchmark and notably improves the model's generalization on 5 other benchmarks.
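For readers unfamiliar with the mIoU figure quoted in the abstract above, the following is a minimal sketch of how mean intersection-over-union is commonly computed for semantic segmentation. It is not taken from the paper: the function name `mean_iou`, the flat label lists, and the choice to skip classes that never occur are assumptions made for illustration, and benchmark protocols additionally handle ignore labels and accumulate counts over a whole dataset. Benchmarks typically report the value on a 0-100 scale.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over flat lists of integer class labels.

    Hypothetical helper for illustration: classes that appear in neither
    the prediction nor the ground truth are skipped, and no ignore label
    is handled.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel counts once toward both intersection and union of its class.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1

    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Toy example with two classes: IoU(0) = 1/2, IoU(1) = 2/3.
print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2), 4))  # 0.5833
```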


Enhancing Perception for Autonomous Vehicles: A Multi-Scale Feature Modulation Network for Image Restoration

January 2025

IEEE Transactions on Intelligent Transportation Systems

Accurate environmental perception is essential for the effective operation of autonomous vehicles. However, visual images captured in dynamic environments or adverse weather conditions often suffer from various degradations. Image restoration focuses on reconstructing clear and sharp images by eliminating undesired degradations from corrupted inputs. These degradations typically vary in size and severity, making it crucial to employ robust multi-scale representation learning techniques. In this paper, we propose Multi-Scale Feature Modulation (MSFM), a novel deep convolutional architecture for image restoration. MSFM modulates multi-scale features in both frequency and spatial domains to make features sharper and closer to those of clean images. Specifically, our multi-scale frequency attention module transforms features into multiple scales and then modulates each scale in the implicit frequency domain using pooling and attention. Moreover, we develop a multi-scale spatial modulation module to refine pixels with the guidance of local features. The proposed frequency and spatial modules enable MSFM to better handle degradations of different sizes. Experimental results demonstrate that MSFM achieves state-of-the-art performance on 12 datasets for a range of image restoration tasks, i.e., image dehazing, image defocus/motion deblurring, and image desnowing. Furthermore, the restored images significantly improve the environmental perception of autonomous vehicles.


Citations (34)


... In addition to the aforementioned r-disc strategy, Zhang et al. [74] proposed an elliptical nearest neighbor method based on Coulomb's law and invalid vertices in obstacles; Kleinbort et al. [75,76] demonstrate that the r-disc approach can achieve superior performance compared to the k-nearest variant, though the connection radius r(q) must be calibrated according to the characteristics of the state space. Properly tuning the connection radius for specific applications can improve search efficiency. ...

Reference:

Motion planning for robotics: A review for sampling-based planners
Elliptical K-Nearest Neighbors - Path Optimization via Coulomb’s Law and Invalid Vertices in C-space Obstacles
  • Citing Conference Paper
  • October 2024

... In [16], the authors show how to ensure simultaneous safety and exploration when learning a general stochastic, discrete-time system, assuming that a ground truth CBF is available a-priori. Furthermore, the works [17], [8], [7], [18], [19], [20] consider the use of sensor data (particularly LiDAR data) to obtain CBFs online. However, these works either consider specialized dynamics models [18], [17], require retraining on the entire demonstration dataset whenever new measurements are gathered [8], or restrict CBFs to e.g. ...

Online Efficient Safety-Critical Control for Mobile Robots in Unknown Dynamic Multi-Obstacle Environments
  • Citing Conference Paper
  • October 2024

... Informed Set, such as Informed RRT* [33,50] (Fig. 4 (a)), which restricts the node sampling range by creating a hyper-ellipsoid subset [56] for sampling and conducts direct sampling within this subset. By constraining the algorithm's sampling region, this subset increases the probability of selecting relevant nodes, thereby boosting the algorithm's efficiency. ...

Flexible Informed Trees (FIT*): Adaptive Batch-Size Approach in Informed Sampling-Based Path Planning
  • Citing Conference Paper
  • October 2024
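The informed-set idea mentioned in the excerpt above (sampling directly inside an ellipsoidal subset) can be illustrated with a small 2D sketch. This is a generic illustration of informed sampling, not code from FIT* or Informed RRT*: the function name `sample_informed_2d` and its parameters are hypothetical. The ellipse has the start and goal as foci, and every point inside it satisfies dist(start, x) + dist(x, goal) <= c_best, where c_best is the cost of the best path found so far.

```python
import math
import random


def sample_informed_2d(start, goal, c_best, rng=random):
    """Draw one point uniformly from the 2D informed set (an ellipse).

    Hypothetical helper: start and goal are (x, y) tuples, c_best is the
    current best path cost, rng is any object with a random() method.
    """
    c_min = math.dist(start, goal)  # straight-line lower bound on path cost
    if c_best < c_min:
        raise ValueError("c_best cannot be below the straight-line distance")

    # Semi-axes of the ellipse with foci at start and goal.
    a = c_best / 2.0
    b = math.sqrt(max(0.0, c_best ** 2 - c_min ** 2)) / 2.0

    # Uniform sample in the unit disc (sqrt keeps the density uniform in area).
    r = math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    ux, uy = r * math.cos(phi), r * math.sin(phi)

    # Scale to the ellipse, rotate onto the start-goal axis, shift to the centre.
    ex, ey = a * ux, b * uy
    theta = math.atan2(goal[1] - start[1], goal[0] - start[0])
    cx, cy = (start[0] + goal[0]) / 2.0, (start[1] + goal[1]) / 2.0
    x = cx + ex * math.cos(theta) - ey * math.sin(theta)
    y = cy + ex * math.sin(theta) + ey * math.cos(theta)
    return (x, y)


# Usage: a sample can never lie on a detour longer than c_best.
point = sample_informed_2d((0.0, 0.0), (4.0, 0.0), c_best=5.0)
detour = math.dist((0.0, 0.0), point) + math.dist(point, (4.0, 0.0))
print(detour <= 5.0 + 1e-9)
```

Because the unit disc is mapped to the ellipse by a linear transformation, the samples stay uniformly distributed over the ellipse; as c_best shrinks, the sampled region tightens around the straight line between start and goal.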

... Learning-based approaches for dexterous grasping can be broadly divided into generative-model-based and regression-based approaches. Generative-model-based approaches [1]-[4], [28], [29] integrate grasp generation and optimization, but they often require substantial effort to balance grasp stability, diversity, and runtime. Meanwhile, regression-based approaches [5], [30], [31] directly predict grasp poses, neglecting the inherent multimodality of grasp distributions. ...

DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation

... Recently, there have been several works (Qiu et al., 2021; Liu et al., 2021a; Liang et al., 2020, 2021; Yang et al., 2021b, a; Ahmed et al., 2021; Lin et al., 2023; Karim et al., 2023; Gao et al., 2023; Qu et al., 2023; Litrico et al., 2023; Luo et al., 2024; Qu et al., 2024; Zou et al., 2024) attempting to realize source-free domain adaptation, where only a pre-trained source model and unlabeled target data are available. In these approaches, Qiu et al. (2021); Liu et al. (2021a); Gao et al. (2023) introduce generative networks to generate pseudo-data similar to sources or targets, which is difficult and inefficient. ...

HGL: Hierarchical Geometry Learning for Test-Time Adaptation in 3D Point Cloud Segmentation
  • Citing Chapter
  • November 2024

... Once the metamodel construction process is finished through one or many iterations of adding new requirements, the produced Ecore file can be further used. There are two later steps where the metamodel is required: 1) model instance creation - construction of XMI model instances using LLMs based on user-provided requirements with respect to the given metamodel, as described in [1]; 2) OCL rule generation - construction of design constraint rules based on reference architecture and standardisation-related documents, built upon works from [11]. After those two steps, the model instance is checked in order to identify whether it is compliant with the given set of OCL rules. ...

Generative AI for OCL Constraint Generation: Dataset Collection and LLM Fine-tuning
  • Citing Conference Paper
  • October 2024

... Subsequent works focus on scene-adaptive rendering [72], unsupervised adaptation [73], event-specific attention [74], bidirectional fusion [75], [76], and multi-branch feature extraction [77], [78], advancing the robustness and flexibility of event representation. Event cameras have also supported advancements in object detection [79], [80], [81], deblurring [82], [83], [84], flow estimation [85], [86], and tracking [87], [88], [89]. Yet, event-aided SSC remains largely underexplored. ...

Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
  • Citing Chapter
  • October 2024

... Another significant advance is the exploration of containerization and microservice architectures for improving real-time performance in ROS 2, particularly for SDVs and autonomous systems [49,50]. These studies have shown that containerized deployments can lead to better latency management and system resource utilization compared to bare-metal configurations. ...

A Containerized Microservice Architecture for a ROS 2 Autonomous Driving Software: An End-to-End Latency Evaluation
  • Citing Conference Paper
  • August 2024

... Open-set DA (Saito & Saenko, 2021) handles cases where the target domain includes unknown classes not present in the source domain. Universal DA (You et al., 2019; Qu et al., 2024) aims to adapt to target domains with any combination of known and unknown classes. ...

LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
  • Citing Conference Paper
  • June 2024

... [38] introduced a Restoration Gap Guidance (RGG) to adjust and improve unsafe trajectories produced by diffusion planners. Some other studies [39], [40] have combined traditional CBFs with learning-based methods to maintain safety in autonomous driving. However, these constraints either consider only static environments or overlook the dynamic behavior of surrounding vehicles, limiting their effectiveness in real-world traffic scenarios. ...

Safe Reinforcement Learning for Autonomous Driving by Using Disturbance-Observer-Based Control Barrier Functions
  • Citing Article
  • January 2024

IEEE Transactions on Intelligent Vehicles