Rudolf Mester
NTNU Trondheim · Computer Science

Dr.-Ing.

About

185
Publications
10,455
Reads
2,139
Citations
Additional affiliations
October 2018 - May 2020
Norwegian University of Science and Technology
Position
  • Professor
Description
  • Research on performance, confidence quantification, and assurance of AI / machine learning and visual computing methods; on classical and ML-based perception approaches for intelligent machines; and on autonomous systems.
October 2012 - February 2014
Linköping University
Position
  • Professor
January 2005 - July 2019
Goethe-Universität Frankfurt am Main
Position
  • Professor

Publications

Publications (185)
Article
Estimating vehicles’ locations is one of the key components in intelligent traffic management systems (ITMSs) for increasing traffic scene awareness. Traditionally, stationary sensors have been employed in this regard. The development of advanced sensing and communication technologies on modern vehicles (MVs) makes it feasible to use such vehicles...
Preprint
Full-text available
Estimating vehicles' locations is one of the key components in intelligent traffic management systems (ITMSs) for increasing traffic scene awareness. Traditionally, stationary sensors have been employed in this regard. The development of advanced sensing and communication technologies on modern vehicles (MVs) makes it feasible to use such vehicles...
Preprint
Full-text available
Recent work on image anonymization has shown that generative adversarial networks (GANs) can generate near-photorealistic faces to anonymize individuals. However, scaling these networks to the entire human body has remained a challenging and yet unsolved task. We propose a new anonymization method that generates close-to-photorealistic humans for i...
Preprint
A promising approach to accurate positioning of robots is ground texture based localization. It is based on the observation that visual features of ground images enable fingerprint-like place recognition. We tackle the issue of efficient parametrization of such methods, deriving a prediction model for localization performance, which requires only a...
Preprint
Urban Traffic Surveillance (UTS) is a surveillance system based on a monocular and calibrated video camera that detects vehicles in an urban traffic scenario with dense traffic on multiple lanes and vehicles performing sharp turning maneuvers. UTS then tracks the vehicles using a 3D bounding box representation and a physically reasonable 3D motion...
Chapter
A regular convolution layer applying a filter in the same way over known and unknown areas causes visual artifacts in the inpainted image. Several studies address this issue with feature re-normalization on the output of the convolution. However, these models use a significant amount of learnable parameters for feature re-normalization [41, 48], or...
Article
Full-text available
Testing that ships comply with specified safety requirements has traditionally relied on real-world data, which is not scalable and is limited to testable scenarios for financial and ethical reasons. Low-fidelity simulations have been used to counteract some of these problems, which is sufficient for emulating simpler systems such as radar de...
Chapter
There is evidence that accessing online traffic data is a key factor to facilitate intelligent traffic management, especially at intersections. With the advent of autonomous vehicles (AVs), new options for collecting such data appear. To date, much research has been performed on machine learning to provide safe motion planning and to control modern...
Preprint
Full-text available
A regular convolution layer applying a filter in the same way over known and unknown areas causes visual artifacts in the inpainted image. Several studies address this issue with feature re-normalization on the output of the convolution. However, these models use a significant amount of learnable parameters for feature re-normalization, or assume a...
Preprint
Ground texture based vehicle localization using feature-based methods is a promising approach to achieve infrastructure-free high-accuracy localization. In this paper, we provide the first extensive evaluation of available feature extraction methods for this task, using separately taken image pairs as well as synthetic transformations. We identify...
Preprint
Ground texture based localization is a promising approach to achieve high-accuracy positioning of vehicles. We present a self-contained method that can be used for global localization as well as for subsequent local localization updates, i.e. it allows a robot to localize without any knowledge of its current whereabouts, but it can also take advant...
Chapter
Autonomous vehicles and robots require a full scene understanding of the environment to interact with it. Such a perception typically incorporates pixel-wise knowledge of the depths and semantic labels for each image from a video sensor. Recent learning-based methods estimate both types of information independently using two separate CNNs. In this...
Chapter
We propose a novel architecture which is able to automatically anonymize faces in images while retaining the original data distribution. We ensure total anonymization of all faces in an image by generating images exclusively on privacy-safe information. Our model is based on a conditional generative adversarial network, generating images considerin...
Preprint
Full-text available
We propose a novel architecture which is able to automatically anonymize faces in images while retaining the original data distribution. We ensure total anonymization of all faces in an image by generating images exclusively on privacy-safe information. Our model is based on a conditional generative adversarial network, generating images considerin...
Preprint
Existing 3D scene flow estimation methods provide the 3D geometry and 3D motion of a scene and gain a lot of interest, for example in the context of autonomous driving. These methods are traditionally based on a temporal series of stereo images. In this paper, we propose a novel monocular 3D scene flow estimation method, called Mono-SF. Mono-SF joi...
Preprint
In this paper we present mono-stixels, a compact environment representation specially designed for dynamic street scenes. Mono-stixels are a novel approach to estimate stixels from a monocular camera sequence instead of the traditionally used stereo depth measurements. Our approach jointly infers the depth, motion and semantic information of the dy...
Preprint
Autonomous vehicles and robots require a full scene understanding of the environment to interact with it. Such a perception typically incorporates pixel-wise knowledge of the depths and semantic labels for each image from a video sensor. Recent learning-based methods estimate both types of information independently using two separate CNNs. In this...
Preprint
Classical monocular vSLAM/VO methods suffer from the scale ambiguity problem. Hybrid approaches solve this problem by adding deep learning methods, for example by using depth maps which are predicted by a CNN. We suggest that it is better to base scale estimation on estimating the traveled distance for a set of subsequent images. In this paper, we...
Conference Paper
In the field of autonomous driving, the system controlling the vehicle can be seen as an agent acting in a complex environment and thus naturally fits into the modern framework of reinforcement learning. However, learning to drive can be a challenging task and current results are often restricted to simplified driving environments. To advance the f...
Chapter
The stixel-world is a compact and detailed environment representation specially designed for street scenes and automotive vision applications. A recent work proposes a monocamera based stixel estimation method based on the structure from motion principle and scene model to predict the depth and translational motion of the static and dynamic parts o...
Chapter
This paper presents a method for detecting independently moving objects (IMOs) from a monocular camera mounted on a moving car. We use an existing state of the art monocular sparse visual odometry/SLAM framework, and specifically attack the notorious problem of identifying those IMOs which move parallel to the ego-car motion, that is, in an ‘epipol...
Preprint
In the field of Autonomous Driving, the system controlling the vehicle can be seen as an agent acting in a complex environment and thus naturally fits into the modern framework of Reinforcement Learning. However, learning to drive can be a challenging task and current results are often restricted to simplified driving environments. To advance the f...
Article
Using Deep Reinforcement Learning (DRL) can be a promising approach to handle tasks in the field of (simulated) autonomous driving, whereby recent publications only consider learning in unusual driving environments. This paper outlines a developed software, which instead can be used for evaluating DRL algorithms based on realistic road networks and...
Article
Visual odometry using only a monocular camera faces more algorithmic challenges than stereo odometry. We present a robust monocular visual odometry framework for automotive applications. An extended propagation-based tracking framework is proposed which yields highly accurate (unscaled) pose estimates. Scale is supplied by ground plane pose estimat...
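For illustration only (not the estimator from this publication): a minimal sketch of how a monocular visual odometry system can recover metric scale from the ground plane, assuming the camera mounting height above the road is known; the function name and the plane parametrization are chosen here for clarity.

```python
import numpy as np

def metric_scale_from_ground_plane(plane_normal, plane_d, camera_height_m):
    """Recover the metric scale of an unscaled monocular reconstruction.

    plane_normal, plane_d : ground plane n^T X + d = 0 in the (unscaled)
                            camera frame, e.g. fitted to triangulated road points
    camera_height_m       : known mounting height of the camera above the road
    """
    # distance of the camera center (origin) to the estimated plane, unscaled
    unscaled_height = abs(plane_d) / np.linalg.norm(plane_normal)
    return camera_height_m / unscaled_height
```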
Article
In computer vision most iterative optimization algorithms, both sparse and dense, rely on a coarse and reliable dense initialization to bootstrap their optimization procedure. For example, dense optical flow algorithms profit massively in speed and robustness if they are initialized well in the basin of convergence of the used loss function. The sa...
Article
Traditionally, pose estimation is considered a two-step problem. First, feature correspondences are determined by direct comparison of image patches, or by associating feature descriptors. In a second step, the relative pose and the coordinates of corresponding points are estimated, most often by minimizing the reprojection error (RPE). RPE opti...
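As a pointer to the quantity named in the abstract, here is a minimal NumPy sketch of the classical reprojection error for a set of 3D points under a pinhole camera; the symbols K, R, t are generic quantities, not taken from the paper.

```python
import numpy as np

def mean_reprojection_error(K, R, t, X, x_obs):
    """Mean reprojection error (in pixels) of 3D points X under pose (R, t).

    K     : (3, 3) camera intrinsics
    R, t  : (3, 3) rotation and (3,) translation, world -> camera
    X     : (N, 3) 3D points in world coordinates
    x_obs : (N, 2) observed image points
    """
    X_cam = (R @ X.T).T + t                  # transform into the camera frame
    x_hom = (K @ X_cam.T).T                  # apply the intrinsics
    x_proj = x_hom[:, :2] / x_hom[:, 2:3]    # perspective division
    return np.linalg.norm(x_proj - x_obs, axis=1).mean()
```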
Conference Paper
The recognition of individual object instances in single monocular images is still an incompletely solved task. In this work, we propose a new approach for detecting and separating vehicles in the context of autonomous driving. Our method uses the fully convolutional network (FCN) for semantic labeling and for estimating the boundary of each vehicl...
Article
Full-text available
Detecting small obstacles on the road ahead is a critical part of the driving task which has to be mastered by fully autonomous cars. In this paper, we present a method based on stereo vision to reliably detect such obstacles from a moving vehicle. The proposed algorithm performs statistical hypothesis tests in disparity space directly on stereo im...
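The following is an illustrative stand-in, not the test from the paper: a simple one-sample z-test that flags a small patch as an obstacle when its measured disparities are significantly larger than the disparity expected for a flat road plane (the noise level sigma and the threshold are assumed inputs).

```python
import numpy as np

def obstacle_z_test(disp_patch, road_disp, sigma, z_thresh=3.0):
    """Flag a patch as an obstacle if its mean disparity exceeds the
    road-plane disparity by a statistically significant margin.

    disp_patch : measured disparities inside a small image patch
    road_disp  : disparity expected at this patch if it lies on the road
    sigma      : assumed per-pixel disparity noise (standard deviation)
    """
    d = np.asarray(disp_patch, dtype=float).ravel()
    z = (d.mean() - road_disp) / (sigma / np.sqrt(d.size))
    return z > z_thresh
```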
Conference Paper
We present a framework that supports the development and evaluation of vision algorithms in the context of driver assistance applications and traffic surveillance. This framework allows the creation of highly realistic image sequences featuring traffic scenarios. The sequences are created with a realistic state of the art vehicle physics model; dif...
Conference Paper
One of the major steps in visual environment perception for automotive applications is to track keypoints and to subsequently estimate egomotion and environment structure from the trajectories of these keypoints. This paper presents a propagation based tracking method to obtain the 2D trajectories of keypoints from a sequence of images in a monocul...
Conference Paper
Correspondence relations between different views of the same scene can be learnt in an unsupervised manner. We address autonomous learning of arbitrary fixed spatial (point-to-point) mappings. Since any such transformation can be represented by a permutation matrix, the signal model is a linear one, whereas the proposed analysis method, mainly base...
Conference Paper
Tracking keypoints through a video sequence is a crucial first step in the processing chain of many visual SLAM approaches. This paper presents a robust initialization method to provide the initial match for a keypoint tracker, from the 1st frame where a keypoint is detected to the 2nd frame, that is: when no depth information is available. We deal...
Conference Paper
The motion of a driving car is highly constrained and we claim that powerful predictors can be built that 'learn' the typical egomotion statistics, and support the typical tasks of feature matching, tracking, and egomotion estimation. We analyze the statistics of the 'ground truth' data given in the KITTI odometry benchmark sequences and confirm th...
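A small sketch of how such egomotion statistics can be gathered from the KITTI odometry ground truth (each line of a pose file contains a 3x4 camera-to-world matrix); the statistics computed here, per-frame translation length and rotation angle, are only one plausible choice.

```python
import numpy as np

def load_kitti_poses(path):
    """Read KITTI odometry ground-truth poses (one 3x4 matrix per line) as 4x4 matrices."""
    poses = []
    with open(path) as f:
        for line in f:
            T = np.eye(4)
            T[:3, :4] = np.array(line.split(), dtype=float).reshape(3, 4)
            poses.append(T)
    return poses

def egomotion_statistics(poses):
    """Per-frame translation length and rotation angle between consecutive poses."""
    trans, rot = [], []
    for T0, T1 in zip(poses[:-1], poses[1:]):
        T_rel = np.linalg.inv(T0) @ T1            # motion in the earlier camera frame
        trans.append(np.linalg.norm(T_rel[:3, 3]))
        cos_a = (np.trace(T_rel[:3, :3]) - 1.0) / 2.0
        rot.append(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))
    return np.array(trans), np.array(rot)
```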
Conference Paper
For evaluating or training different kinds of vision algorithms, a large amount of precise and reliable data is needed. In this paper we present a system to create extended synthetic sequences of traffic environment scenarios, associated with several types of ground truth data. By integrating vehicle dynamics in a configuration tool, and by using p...
Conference Paper
Phase correlation is one of the classic methods for sparse motion or displacement estimation. It is renowned in the literature for high precision and insensitivity against illumination variations. We propose several important enhancements to the phase correlation (PhC) method which render it more robust against those situations where a motion mea...
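For reference, a minimal NumPy sketch of plain phase correlation (the baseline that the abstract proposes to enhance), recovering an integer-pixel translation from the normalized cross-power spectrum:

```python
import numpy as np

def phase_correlation_shift(img0, img1):
    """Estimate the integer-pixel shift (dy, dx) such that img1 is
    approximately img0 translated by (dy, dx)."""
    F0, F1 = np.fft.fft2(img0), np.fft.fft2(img1)
    cross = np.conj(F0) * F1
    cross /= np.abs(cross) + 1e-12          # keep only the phase
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # map peaks in the upper/right half of the array to negative shifts
    if dy > img0.shape[0] // 2:
        dy -= img0.shape[0]
    if dx > img0.shape[1] // 2:
        dx -= img0.shape[1]
    return dy, dx
```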
Conference Paper
We present an approach to learn relative photometric differences between pairs of cameras, which have partially overlapping fields of views. This is an important problem, especially in appearance based methods to correspondence estimation or object identification in multi-camera systems where grey values observed by different cameras are processed....
Conference Paper
Reliable detection of obstacles at long range is crucial for the timely response to hazards by fast-moving safety-critical platforms like autonomous cars. We present a novel method for the joint detection and localization of distant obstacles using a stereo vision system on a moving platform. The approach is applicable to both static and moving obs...
Conference Paper
The online-estimation of yaw, pitch, and roll of a moving vehicle is an important ingredient for systems which estimate egomotion, and 3D structure of the environment in a moving vehicle from video information. We present an approach to estimate these angular changes from monocular visual data, based on the fact that the motion of far distant point...
Conference Paper
Visual odometry is one of the most active topics in computer vision. The automotive industry is particularly interested in this field due to the appeal of achieving a high degree of accuracy with inexpensive sensors such as cameras. The best results on this task are currently achieved by systems based on a calibrated stereo camera rig, whereas mono...
Article
We discuss matching measures (scores and residuals) for comparing image patches under unknown affine photometric (=intensity) transformations. In contrast to existing methods, we derive a fully symmetric matching measure which reflects the fact that both copies of the signal are affected by measurement errors ('noise'), not only one. As it turns ou...
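As a reference point only: the classical zero-mean normalized cross-correlation, which is invariant to an affine intensity change a*I + b of either patch; this is one of the existing measures the abstract contrasts with, not the symmetric measure proposed in the paper.

```python
import numpy as np

def zncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation between two equally sized patches."""
    a = patch_a.astype(float).ravel(); a -= a.mean()
    b = patch_b.astype(float).ravel(); b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```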
Conference Paper
Modern applications of stereo vision, such as advanced driver assistance systems and autonomous vehicles, require highest precision when determining the location and velocity of potential obstacles. Subpixel disparity accuracy in selected image regions is therefore essential. Evaluation benchmarks for stereo correspondence algorithms, such as the p...
Conference Paper
An open issue in multiple view geometry and structure from motion, applied to real life scenarios, is the sparsity of the matched key-points and of the reconstructed point cloud. We present an approach that can significantly improve the density of measured displacement vectors in a sparse matching or tracking setting, exploiting the partial informa...
Conference Paper
We discuss matching measures (scores and residuals) for comparing image patches under unknown affine photometric (=intensity) transformations. In contrast to existing methods, we derive a fully symmetric matching measure which reflects the fact that both copies of the signal are affected by measurement errors (’noise’), not only one. As it turns ou...
Conference Paper
Stereo vision has become established in the field of driver assistance and vehicular safety systems. Next steps along the road towards accident-free driving aim to assist the driver in increasingly complex situations such as inner-city traffic. In order to achieve these goals, it is desirable to incorporate higher-order object knowledge in the stereo visi...
Conference Paper
The present paper analyzes some previously unexplored aspects of motion estimation that are fundamental both for discrete block matching as well as for differential 'optical flow' approaches à la Lucas-Kanade. It aims at providing a complete estimation-theoretic approach that makes the assumptions about noisy observations of samples from a continuous...
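For orientation, a minimal sketch of the differential (Lucas-Kanade style) estimation step mentioned in the abstract: a least-squares solve of the brightness-constancy equations accumulated over one patch.

```python
import numpy as np

def lucas_kanade_step(Ix, Iy, It):
    """One least-squares displacement estimate (u, v) for a single patch.

    Ix, Iy : spatial image gradients inside the patch
    It     : temporal difference (frame 2 minus frame 1) inside the patch
    """
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    # lstsq handles the near-singular (aperture problem) case gracefully
    return np.linalg.lstsq(A, b, rcond=None)[0]
```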
Conference Paper
This contribution describes a multi-stage method for estimating the egomotion of a motor vehicle from monocular image sequences. The presented methods are based on a planar world model, which is, with certain restrictions, quite realistic for traffic scenes; the influence of deviations from this model...
Article
Full-text available
Optical Flow (OF) techniques that face the complexity of real sequences have been developed in recent years. Even when using the most appropriate technique for a specific problem, at some points the output flow might fail to achieve the minimum error required by the system. Confidence measures computed from either input data or OF output should discar...
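One classical example of an input-based confidence measure of the kind discussed here (not necessarily the measure evaluated in the article) is the smallest eigenvalue of the local structure tensor:

```python
import numpy as np

def structure_tensor_confidence(Ix, Iy):
    """Smallest eigenvalue of the 2x2 structure tensor of a patch.
    Low values indicate textureless regions or the aperture problem,
    where an optical flow estimate is unreliable."""
    J = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return float(np.linalg.eigvalsh(J)[0])
```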
Conference Paper
Precise stereo-based depth estimation at large distances is challenging: objects become very small, often exhibit low contrast in the image, and can hardly be separated from the background based on disparity due to measurement noise. In this paper we present an approach that overcomes these problems by combining robust object segmentation and highl...
Conference Paper
We propose and evaluate a versatile scheme for image pre-segmentation that generates a partition of the image into a selectable number of patches ('superpixels'), under the constraint of obtaining maximum homogeneity of the 'texture' inside of each patch, and maximum accordance of the contours with both the image content as well as a Gibbs-Markov ra...
Conference Paper
We analyze the consequences of instabilities and fluctuations, such as camera shaking and illumination/exposure changes, on typical surveillance video material and devise a systematic way to compensate these changes as much as possible. The phase correlation method plays a decisive role in the proposed scheme, since it is inherently insensitive to...
Conference Paper
In this work we present an approach to automatically learn pixel correspondences between pairs of cameras. We build on the method of Temporal Coincidence Analysis (TCA) and extend it from the pure temporal (i.e. single-pixel) to the spatiotemporal domain. Our approach is based on learning a statistical model for local spatiotemporal image patches,...
Conference Paper
Full-text available
We present a novel method for clustering data drawn from a union of arbitrary dimensional subspaces, called Discriminative Subspace Clustering (DiSC). DiSC solves the subspace clustering problem by using a quadratic classifier trained from unlabeled data (clustering by classification). We generate labels by exploiting the locality of points from...
Conference Paper
Full-text available
The development of vehicles that perceive their environment, in particular those using computer vision, indispensably requires large databases of sensor recordings obtained from real cars driven in realistic traffic situations. These datasets should be time shaped for enabling synchronization of sensor data from different sources. Furthermore, full...
Conference Paper
This paper addresses the problem of finding corresponding image patches in multi-camera video streams by means of an unsupervised learning method. We determine patch-to-patch correspondence relations ('correspondence priors') merely using information from a temporal change detection. Correspondence priors are essential for geometric multi-camera ca...
Article
In the recent years, advanced video sensors have become common in driver assistance, coping with the highly dynamic lighting conditions by nonlinear exposure adjustments. However, many computer vision algorithms are still highly sensitive to the resulting sudden brightness changes. We present a method that is able to estimate the relative intensity...
Article
We present an approach to unveil the underlying structure of dynamic scenes from a sparse set of local flow measurements. We first estimate those measurements at carefully selected locations, and subsequently group them into a finite set of different dense flow field hypotheses. These flow fields are represented as parametric functional models, and...
Article
This paper formulates the problem of estimating motion or geometric transforms between images in a Bayesian manner, stressing the relation between continuous and discrete formulations and emphasizing the necessity to employ stochastic distributions on function spaces.
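In generic terms (the paper's concrete densities are not reproduced here), the Bayesian view of motion estimation amounts to placing a prior over the displacement field u and maximizing the posterior:

```latex
p(\mathbf{u} \mid I_1, I_2) \propto p(I_2 \mid I_1, \mathbf{u})\, p(\mathbf{u}),
\qquad
\hat{\mathbf{u}}_{\mathrm{MAP}} = \arg\max_{\mathbf{u}} \, p(I_2 \mid I_1, \mathbf{u})\, p(\mathbf{u})
```

In the continuous formulation, the prior p(u) must be understood as a stochastic distribution on a function space, as stressed in the abstract.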
Article
Full-text available
The computation of free space available in an environment is an essential task for many intelligent automotive and robotic applications. This paper proposes a new approach, which builds a stochastic occupancy grid to address the free space problem as a dynamic programming task. Stereo measurements are integrated over time reducing disparity uncerta...
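A much-simplified sketch of the two ingredients named in the abstract, integrating measurements into an occupancy grid and extracting free space per image column; the log-odds values and the greedy column scan are illustrative placeholders for the paper's stochastic grid and dynamic-programming solution.

```python
import numpy as np

def integrate_measurement(log_odds, hit_mask, l_hit=0.85, l_miss=-0.4):
    """Accumulate one stereo measurement into a log-odds occupancy grid
    (rows = distance bins, columns = image columns)."""
    return np.clip(log_odds + np.where(hit_mask, l_hit, l_miss), -10.0, 10.0)

def free_space_per_column(log_odds, occ_thresh=0.0):
    """Number of free cells in front of the first occupied cell, per column."""
    occupied = log_odds > occ_thresh
    first = occupied.argmax(axis=0)                     # index of first occupied cell
    first[~occupied.any(axis=0)] = log_odds.shape[0]    # column with no obstacle at all
    return first
```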
Conference Paper
In the present position paper, I formulate some (in part critical) remarks related to some techniques which are successfully used in contemporary live dense reconstruction approaches. Main issues are feature based correspondence vs. image-based matching, the generalization of the brightness constancy assumption, and the handling of featureless regi...
Conference Paper
This paper presents a versatile algorithmic building block that allows significant improvement of intermediate and final results of numerous variations of segmentation. The segmentation ‘context’ can be very different in terms of the used data modality (gray scale, color, texture features, depth data, motion, …), in terms of single frame vs. sequence...
Conference Paper
Vision-based motion perception builds primarily on the concept of optical flow. Modern optical flow approaches suffer from several shortcomings, especially in real, non-ideal scenarios such as traffic scenes. Non-constant illumination conditions in consecutive frames of the input image sequence are among these shortcomings. We propose and evaluate...
Conference Paper
Full-text available
Literally thousands of articles on optical flow algorithms have been published in the past thirty years. Only a small subset of the suggested algorithms have been analyzed with respect to their performance. These evaluations were based on black-box tests, mainly yielding information on the average accuracy on test-sequences with ground truth. No th...
Article
Deviating from classical Shannon-type sampling, we determine the MMSE-optimum reconstruction kernels for linearly interpolating a non-bandlimited signal from a discrete set of noisy measurements obtained from non-δ sampling kernels. For this purpose, the first and second order moment functions (ACF) of the continuous input process are requi...
Article
We propose a new learning approach to determine the geometric and photometric relationship between multiple cameras which have at least partially overlapping fields of view. The essential difference to standard matching techniques is that the search for similar spatial patterns is replaced by an analysis of temporal coincidences of single pixels. T...
Conference Paper
The contribution describes a statistical framework for image segmentation that is characterized by the following features: It allows modeling scalar as well as multi-channel images (color, texture feature sets, depth, ...) in a region-based manner, including a Gibbs-Markov random field model that describes the spatial (and temporal) cohesion tenden...
Article
Interpolation of signals (arbitrary dimension, here: 2D images) with missing data points is addressed from a statistical point of view. We present a general framework for which a Wiener-style MMSE estimator can be seamlessly adapted to deal with problems such as image interpolation (inpainting), reconstruction from sparse samples, and image extrapo...
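A compact sketch of the Wiener-style (linear MMSE) estimator that such a framework builds on: conditioning a zero-mean process with a known covariance function on the observed samples. The covariance function cov and the noise variance are assumed inputs, not values from the article.

```python
import numpy as np

def mmse_interpolate(obs_values, obs_coords, miss_coords, cov, noise_var=0.0):
    """Linear MMSE estimate of missing samples from observed ones.

    obs_values  : observed sample values, shape (M,)
    obs_coords  : coordinates of observed samples, shape (M, d)
    miss_coords : coordinates of samples to reconstruct, shape (K, d)
    cov         : covariance function cov(x, y) of the assumed zero-mean process
    """
    C_oo = np.array([[cov(a, b) for b in obs_coords] for a in obs_coords])
    C_mo = np.array([[cov(a, b) for b in obs_coords] for a in miss_coords])
    C_oo += noise_var * np.eye(len(obs_coords))
    return C_mo @ np.linalg.solve(C_oo, obs_values)
```

With an exponential covariance such as cov(x, y) = exp(-||x - y|| / L), this reduces to a standard Wiener interpolation of scattered samples.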
Book
This book constitutes the refereed proceedings of the 33rd Symposium of the German Association for Pattern Recognition, DAGM 2011, held in Frankfurt/Main, Germany, in August/September 2011. The 20 revised full papers and 22 revised poster papers were carefully reviewed and selected from 98 submissions. The papers are organized in topical sections o...
Conference Paper
Optical Rails [1] is a purely view-based method for autonomous track following with a mobile robot, based upon compact omnidirectional view descriptors using basis functions on the sphere. We address the most prominent points of criticism towards holistic methods for robot navigation: Dealing with occlusions and varying illumination. This is accomp...