Didier Stricker
RPTU - Rheinland-Pfälzische Technische Universität Kaiserslautern Landau | TUK · Augmented Vision Group

About

524
Publications
199,849
Reads
9,241
Citations
Citations since 2017
336 Research Items
6,770 Citations

Publications (524)
Article
Full-text available
In-car activity monitoring is a key enabler of various automotive safety functions. Existing approaches are largely based on vision systems. Radar, however, can provide a low-cost, privacy-preserving alternative. To date, radar-based systems have not been widely researched. In our work, we introduce a novel approach that uses the Doppler...
Article
Full-text available
Monocular 3D object detection has recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery. Yet, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box. To overcome this limitation,...
Article
The acceleration of deep neural networks (DNNs) on edge devices is gaining significant importance in various application domains. General purpose graphics processing units (GPGPUs) are typically used to explore, train and evaluate DNNs because they offer higher processing and computational capability compared to CPUs. However, this comes at the cos...
Preprint
Object permanence is the concept that objects do not suddenly disappear in the physical world. Humans understand this concept at a young age and know that another person is still there, even when that person is temporarily occluded. Neural networks currently often struggle with this challenge. Thus, we introduce explicit object permanence into two stage de...
Article
Full-text available
This paper presents a novel architecture for simultaneous estimation of highly accurate optical flows and rigid scene transformations for difficult scenarios where the brightness assumption is violated by strong shading changes. In the case of rotating objects or moving light sources, such as those encountered for driving cars in the dark, the scen...
Article
Full-text available
Object detection is a computer vision task that involves localisation and classification of objects in an image. Video data implicitly introduces several challenges, such as blur, occlusion and defocus, making video object detection more challenging in comparison to still image object detection, which is performed on individual and independent imag...
Article
Full-text available
Supervised image-to-image translation has been proven to generate realistic images with sharp details and to have good quantitative performance. Such methods are trained on a paired dataset, where an image from the source domain already has a corresponding translated image in the target domain. However, this paired dataset requirement imposes a hug...
Preprint
Full-text available
Despite monocular 3D object detection having recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box. To overcome this limit...
Preprint
Full-text available
Realistic reconstruction of two hands interacting with objects is a new and challenging problem that is essential for building personalized Virtual and Augmented Reality environments. Graph convolutional networks (GCNs) allow for the preservation of the topologies of hand poses and shapes by modeling them as a graph. In this work, we propose the T...
Article
Full-text available
In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain. They do not consider some of the critical characteristics of document imag...
Preprint
Full-text available
Compositional zero-shot learning aims to recognize unseen compositions of seen visual primitives of object classes and their states. While all primitives (states and objects) are observable during training in some combination, their complex interaction makes this task especially hard. For example, wet changes the visual appearance of a dog very dif...
Article
Full-text available
Depth completion involves recovering a dense depth map from a sparse map and an RGB image. Recent approaches focus on utilizing color images as guidance images to recover depth at invalid pixels. However, color images alone are not enough to provide the necessary semantic understanding of the scene. Consequently, the depth completion task suffers f...
Article
Full-text available
Augmented reality (AR), combining virtual elements with the real world, has demonstrated impressive results in a variety of application fields and gained significant research attention in recent years due to its limitless potential [...]
Preprint
Full-text available
In class-incremental semantic segmentation (CISS), deep learning architectures suffer from the critical problems of catastrophic forgetting and semantic background shift. Although recent works focused on these issues, existing classifier initialization methods do not address the background shift problem and assign the same initialization weights to...
Preprint
Full-text available
We present a new, simple yet effective approach to uplift video object detection. We observe that prior works operate on instance-level feature aggregation that inherently neglects the refined pixel-level representation, resulting in confusion among objects sharing similar appearance or motion characteristics. To address this limitation, we propose...
Conference Paper
Recent advances in autonomous driving systems raise new questions about how to enhance the communication and takeover control between the system and the driver. Eye tracking technologies have shown their feasibility to recognize whether the driver’s gaze is directed ‘on-road’ or ‘off-road’. However, this binary information alone is not sufficient t...
Preprint
Full-text available
This paper presents the novel idea of generating object proposals by leveraging temporal information for video object detection. The feature aggregation in modern region-based video object detectors heavily relies on learned proposals generated from a single-frame RPN. This inherently introduces additional components like NMS and produces unreliabl...
Chapter
To monitor the strength exercises, the idea is to capture a template signal while instructing users to perform the movements correctly according to their ability and state of health. This template is used in an online template-matching algorithm based on DTW that was evaluated using the joint angles and segment positions estimated by the pose estima...
Article
Full-text available
Research has been growing on object detection using semi-supervised methods in the past few years. We examine the intersection of these two areas for floor-plan objects to promote the research objective of detecting more accurate objects with less labeled data. The floor-plan objects include different furniture items with multiple types of the same cla...
Preprint
This paper presents a novel architecture for simultaneous estimation of highly accurate optical flows and rigid scene transformations for difficult scenarios where the brightness assumption is violated by strong shading changes. In the case of rotating objects or moving light sources, such as those encountered for driving cars in the dark, the scen...
Article
Full-text available
Depth maps produced by LiDAR-based approaches are sparse. Even high-end LiDAR sensors produce highly sparse depth maps, which are also noisy around the object boundaries. Depth completion is the task of generating a dense depth map from a sparse depth map. While the earlier approaches focused on directly completing this sparsity from the sparse dep...
Article
Full-text available
The growing amount of data demands methods that can gradually learn from new samples. However, it is not trivial to continually train a network. Retraining a network with new data usually results in a phenomenon called “catastrophic forgetting”. In a nutshell, the performance of the model on the previous data drops by learning from the new instance...
Chapter
Full-text available
Moving around in a virtual world is one of the essential interactions for Virtual Reality (VR) applications. The current standard for moving in VR is using a controller. Recently, VR head-mounted displays have integrated new input modalities such as hand tracking, which allows the investigation of different techniques to move in VR. This work explores dif...
Preprint
Full-text available
Research has been growing on object detection using semi-supervised methods in the past few years. We examine the intersection of these two areas for floor-plan objects to promote the research objective of detecting more accurate objects with less labelled data. The floor-plan objects include different furniture items with multiple types of the same cl...
Article
Full-text available
Performing 3D reconstruction from a single 2D input is a challenging problem that is trending in literature. Until recently, it was an ill-posed optimization problem, but with the advent of learning-based methods, the performance of 3D reconstruction has also significantly improved. Infinitely many different 3D objects can be projected onto the sam...
Preprint
Full-text available
The growing amount of data demands methods that can gradually learn from new samples. However, it is not trivial to continually train a network. Retraining a network with new data usually results in a known phenomenon, called “catastrophic forgetting.” In a nutshell, the performance of the model drops on the previous data by learning from the new i...
Preprint
Full-text available
3D reconstruction from a single 2D input is a challenging problem that is trending in literature. Until recently, it was an ill-posed optimization problem, but with the advent of learning-based methods, the performance of 3D reconstruction has also significantly improved. Infinitely many different 3D objects can be projected onto the same 2D plane,...
Article
Full-text available
Page object detection in scanned document images is a complex task due to varying document layouts and diverse page objects. In the past, traditional methods such as Optical Character Recognition (OCR)-based techniques have been employed to extract textual information. However, these methods fail to comprehend complex page objects such as tables an...
Preprint
Full-text available
This paper presents a visual SLAM system that uses both points and lines for robust camera localization, and simultaneously performs a piece-wise planar reconstruction (PPR) of the environment to provide a structural map in real-time. One of the biggest challenges in parallel tracking and mapping with a monocular camera is to keep the scale consist...
Preprint
Full-text available
When tracking a kinematic chain model with inertial sensors and magnetometers using a Bayesian filter approach, typically one magnetometer per segment is used to compensate for global heading drift. In this work, we present a study showing that, with appropriate modeling, heading information can be propagated from one segment to neighboring segments...
Preprint
Full-text available
Natural user interfaces are on the rise. Manufacturers of Augmented, Virtual, and Mixed Reality head-mounted displays are increasingly integrating new sensors into their consumer-grade products, allowing gesture recognition without additional hardware. This offers new possibilities for bare-handed interaction within virtual environments. This work...
Preprint
Full-text available
In recent years, deep neural networks have shown exceptional capabilities in addressing many computer vision tasks, including scene flow prediction. However, most of the advances are dependent on the availability of a vast amount of dense per-pixel ground truth annotations, which are very difficult to obtain for real-life scenarios. Therefore, synth...
Article
Full-text available
The generally unsupervised nature of autoencoder models implies that the main training metric is formulated as the error between input images and their corresponding reconstructions. Different reconstruction loss variations and latent space regularizations have been shown to improve model performances depending on the tasks to solve and to induce n...
Chapter
With recent advances in artificial intelligence (AI) and learning-based systems, industries have started to integrate AI components into their products and workflows. In areas where frequent testing and development are possible, these systems have proved to be quite useful, such as in the automotive industry, where vehicles are now equipped with advanced dri...
Preprint
Locomotion in Virtual Reality (VR) is an important part of VR applications. Many scientists are enriching the community with different variations that enable locomotion in VR. Some of the most promising methods are gesture-based and do not require additional handheld hardware. Recent work focused mostly on user preference and performance of the dif...
Preprint
The state-of-the-art approaches for monocular 3D reconstruction mainly focus on datasets with highly textured images. Most of these methods are trained on datasets like ShapeNet which render well-textured objects. However, in natural scenes, many objects are texture-less, making it difficult to reconstruct them. Unlike textured surfaces, reconstruc...
Article
Full-text available
Graphical page object detection classifies and localizes objects such as tables and figures in a document. As deep learning techniques for object detection become increasingly successful, many supervised deep neural network-based methods have been introduced to recognize graphical objects in documents. However, these models necessitate a substa...
Chapter
Locomotion in Virtual Reality (VR) is an important part of VR applications. Many scientists are enriching the community with different variations that enable locomotion in VR. Some of the most promising methods are gesture-based and do not require additional handheld hardware. Recent work focused mostly on user preference and performance of the dif...
Data
This dataset was used in the publication "Towards Artefact Aware Human Motion Capture using Inertial Sensors Integrated into Loose Clothing", presented at the IEEE International Conference on Robotics and Automation 2022. The data is available at https://zenodo.org/record/5948725. Data structure: The data contains trials of 12 subjects for differ...
Article
Full-text available
We present TIMo (Time-of-flight Indoor Monitoring), a dataset for video-based monitoring of indoor spaces captured using a time-of-flight (ToF) camera. The resulting depth videos feature people performing a set of different predefined actions, for which we provide detailed annotations. Person detection for people counting and anomaly detection are...
Preprint
Full-text available
Depth maps produced by LiDAR-based approaches are sparse. Even high-end LiDAR sensors produce highly sparse depth maps, which are also noisy around the object boundaries. Depth completion is the task of generating a dense depth map from a sparse depth map. While the traditional approaches focus on directly completing this sparsity from the sparse d...
Article
Full-text available
In recent years, due to the advancements in machine learning, object detection has become a mainstream task in the computer vision domain. The first phase of object detection is to find the regions where objects can exist. With the improvements in deep learning, traditional approaches, such as sliding windows and manual feature selection techniques...
Article
Full-text available
Remote collaboration systems have become increasingly important in today’s society, especially during times when physical distancing is advised. Industry, research and individuals face the challenging task of collaborating and networking over long distances. While video and teleconferencing are already widespread, collaboration systems in augmente...
Preprint
Full-text available
Depth completion involves recovering a dense depth map from a sparse map and an RGB image. Recent approaches focus on utilizing color images as guidance images to recover depth at invalid pixels. However, color images alone are not enough to provide the necessary semantic understanding of the scene. Consequently, the depth completion task suffers f...
Preprint
Full-text available
In recent years, due to the advancement of machine learning, object detection has become a mainstream task in the computer vision domain. The first phase of object detection is to find the regions where objects can exist. With the improvement of deep learning, traditional approaches such as sliding windows and manual feature selection techniques ha...
Preprint
Full-text available
In this paper, we propose a neural network architecture for scale-invariant semantic segmentation using RGB-D images. We utilize depth information as an additional modality alongside color images. This is especially relevant in outdoor scenes, which contain objects at different scales due to their distance from the camera. The near distance obje...
Preprint
Full-text available
The proposed RMS-FlowNet is a novel end-to-end learning-based architecture for accurate and efficient scene flow estimation which can operate on point clouds of high density. For hierarchical scene flow estimation, the existing methods depend on either expensive Farthest-Point-Sampling (FPS) or structure-based scaling which decrease their ability t...
Preprint
Full-text available
The reliability assessment of a machine learning model's prediction is an important quantity for deployment in safety-critical applications. Not only can it be used to detect novel scenes, either as out-of-distribution or anomaly samples, but it also helps to determine deficiencies in the training data distribution. A lot of promising researc...
Preprint
Full-text available
Learning on synthetic data and transferring the resulting properties to their real counterparts is an important challenge for reducing costs and increasing safety in machine learning. In this work, we focus on autoencoder architectures and aim at learning latent space representations that are invariant to inductive biases caused by the domain shift...
Preprint
Full-text available
Establishing correspondences from image to 3D has been a key task of 6DoF object pose estimation for a long time. To predict pose more accurately, deeply learned dense maps replaced sparse templates. Dense methods also improved pose estimation in the presence of occlusion. More recently researchers have shown improvements by learning object fragmen...
Preprint
Full-text available
Video anomaly detection (VAD) addresses the problem of automatically finding anomalous events in video data. The primary data modalities on which current VAD systems work are monochrome or RGB images. Using depth data in this context instead is still hardly explored in spite of depth images being a popular choice in many other computer vision re...
Preprint
Full-text available
Grabbing virtual objects is one of the essential tasks for Augmented, Virtual, and Mixed Reality applications. Modern applications usually use a simple pinch gesture for grabbing and moving objects. However, picking up objects by pinching has disadvantages. It can be an unnatural gesture to pick up objects and prevents the implementation of other g...
Article
Full-text available
Natural user interfaces based on hand gestures are becoming increasingly popular. The need for expensive hardware has left largely unexplored a wide range of interaction possibilities that hand tracking enables. Recently, hand tracking has been built into inexpensive and widely available hardware, allowing more and more people access to this technology...
Article
Full-text available
Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches for document classification, known as image-based and multimodal approaches. Image-based document classification approaches are solely based on the inherent visual cues of the document images. In contrast, the multimodal ap...
Article
Full-text available
The problem of accurate three-dimensional reconstruction is important for many research and industrial applications. Light field depth estimation utilizes many observations of the scene and hence can provide accurate reconstruction. We present a method that enhances an existing reconstruction algorithm with per-layer disparity filtering and consiste...
Article
Full-text available
Electroencephalogram (EEG) is widely used for the diagnosis of neurological conditions like epilepsy, neurodegenerative illnesses and sleep-related disorders. Proper interpretation of EEG recordings requires the expertise of trained neurologists, a resource which is scarce in the developing world. Neurologists spend a significant portion of their t...