About
78
Publications
34,963
Reads
16,144
Citations
Introduction
Additional affiliations
May 2013 - present
TU Darmstadt
Position
- Professor (Full)
September 2007 - May 2013
TU Darmstadt
Position
- Juniorprofessor
September 2001 - June 2007
Education
September 2001 - May 2007
October 1996 - May 2001
Publications (78)
Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for sin...
Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the...
Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. We present a benchmark for Multiple Object Tracking launched in late 2014,...
3D scene flow is a dense description of the geometry and motion field of a dynamic scene. Accordingly, estimating scene flow from binocular video sequences generalizes two classical tasks of image-based metrology: the estimation of stereo correspondence and of optical flow. In the fol...
Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. Recently, a new benchmark for Multiple Object Tracking, MOTChallenge, was launch...
In the recent past, the computer vision community has developed centralized benchmarks for the performance evaluation of a variety of tasks, including generic object and pedestrian detection, 3D reconstruction, optical flow, single-object short-term tracking, and stereo estimation. Despite potential pitfalls of such benchmarks, they have proved to...
3D scene flow estimation aims to jointly recover dense geometry and 3D motion from stereoscopic image sequences, and thus generalizes classical disparity and 2D optical flow estimation. To realize its conceptual benefits and overcome limitations of many existing methods, we propose to represent the dynamic scene as a collection of rigidly moving planes...
Methods and apparatus are described for monocular 3D human pose estimation and tracking, which are able to recover poses of people in realistic street conditions captured using a monocular, potentially moving camera. Embodiments of the present invention provide a three-stage process involving estimating (10, 60, 110) a 3D pose of each of the multip...
We propose a method to recover dense 3D scene flow from stereo video. The method estimates the depth and 3D motion field of a dynamic scene from multiple consecutive frames in a sliding temporal window, such that the estimate is consistent across both viewpoints of all frames within the window. The observed scene is modeled as a collection of plana...
Many state-of-the-art image restoration approaches do not scale well to larger images, such as megapixel images common in the consumer segment. Computationally expensive optimization is often the culprit. While efficient alternatives exist, they have not reached the same level of image quality. The goal of this paper is to develop an effective appr...
Conditional random fields (CRFs) are popular discriminative models for computer vision and have been successfully applied in the domain of image restoration, especially to image denoising. For image deblurring, however, discriminative approaches have been mostly lacking. We posit two reasons for this: First, the blur kernel is often only known at t...
Many recent advances in multiple target tracking aim at finding a (nearly) optimal set of trajectories within a temporal window. To handle the large space of possible trajectory hypotheses, it is typically reduced to a finite set by some form of data-driven or regular discretization. In this work, we propose an alternative formulation of multitarge...
The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective functi...
People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black box components. In this paper we argue that for best performance one should explicitly train people...
Estimating dense 3D scene flow from stereo sequences remains a challenging task, despite much progress in both classical disparity and 2D optical flow estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. Scene flow e...
Motion estimation in realistic outdoor settings is significantly challenged by cast shadows, reflections, glare, saturation, automatic gain control, etc. To allow robust optical flow estimation in these cases, it is important to choose appropriate data cost functions for matching. Recent years have seen a growing trend toward patch-based data costs...
When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories shoul...
Non-blind deblurring is an integral component of blind approaches for removing image blur due to camera shake. Even though learning-based deblurring methods exist, they have been limited to the generative case and are computationally expensive. To this date, manually-defined models are thus most widely used, though limiting the attained restoration...
Evaluating multi-target tracking based on ground truth data is a surprisingly challenging task. Erroneous or ambiguous ground truth annotations, numerous evaluation protocols, and the lack of standardized benchmarks make a direct quantitative comparison of different tracking approaches rather difficult. The goal of this paper is to raise awareness...
We present an efficient implementation of volumetric anisotropic image diffusion filters on modern programmable graphics processing units (GPUs), where the mathematics behind volumetric diffusion is effectively reduced to the diffusion in 2D images. We hereby avoid the computational bottleneck of a time-consuming eigenvalue decomposition in ℝ³. I...
In this paper we consider people detection and articulated pose estimation, two closely related and challenging problems in computer vision. Conceptually, both of these problems can be addressed within the pictorial structures framework (Felzenszwalb and Huttenlocher in Int. J. Comput. Vis. 61(1):55–79, 2005; Fischler and Elschlager in IEEE Trans....
Probabilistic inference beyond MAP estimation is of interest in computer vision, both for learning appropriate models and in applications. Yet, common approximate inference techniques, such as belief propagation, have largely been limited to discrete-valued Markov random fields (MRFs) and models with small cliques. Oftentimes, neither is desirabl...
We present a novel conditional random field (CRF) for semantic segmentation that extends the common Potts model of spatial coherency with latent topics, which capture higher-order spatial relations of segment labels. Specifically, we show how recent approaches for producing sets of figure-ground segmentations can be leveraged to construct a suit...
Motivated by aiding human operators in the detection of dangerous objects in passenger luggage, such as in airports, we develop an automatic object detection approach for multi-view X-ray image data. We make three main contributions: First, we systematically analyze the appearance variations of objects in X-ray images from inspection systems. We t...
Following recent advances in detection, context modeling and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presents a novel probabilistic 3D scene model that integrates state-of-the-art multi-class object detection, object tracking and scene labeling together with geometric 3D reasoning...
The problem of multi-target tracking is comprised of two distinct, but tightly coupled challenges: (i) the naturally discrete problem of data association, i.e. assigning image observations to the appropriate target; (ii) the naturally continuous problem of trajectory estimation, i.e. recovering the trajectories of all targets. To go beyond simpl...
Identifying suitable image features is a central challenge in computer vision, ranging from representations for low-level to high-level vision. Due to the difficulty of this task, techniques for learning features directly from example data have recently gained attention. Despite significant benefits, these learned features often have many fewer of...
We present an approach to 3D scene flow estimation, which exploits that in realistic scenarios image motion is frequently dominated by observer motion and independent, but rigid object motion. We cast the dense estimation of both scene structure and 3D motion from sequences of two or more views as a single energy minimization problem. We show that...
We present a principled model for occlusion reasoning in complex scenarios with frequent inter-object occlusions, and its application to multi-target tracking. To compute the putative overlap between pairs of targets, we represent each target with a Gaussian. Conveniently, this leads to an analytical form for the relative overlap - another Gaussian...
Spatially-discrete Markov random fields (MRFs) and spatially-continuous variational approaches are ubiquitous in low-level vision, including image restoration, segmentation, optical flow, and stereo. Even though both families of approaches are fairly similar on an intuitive level, they are frequently seen as being technically rather distinct since...
Conventional non-blind image deblurring algorithms involve natural image priors and maximum a-posteriori (MAP) estimation. As a consequence of MAP estimation, separate pre-processing steps such as noise estimation and training of the regularization parameter are necessary to avoid user interaction. Moreover, MAP estimates involving standard natural...
State-of-the-art research on MRFs, successful MRF applications, and advanced topics for future study.
This volume demonstrates the power of the Markov random field (MRF) in vision, treating the MRF both as a tool for modeling image data and, utilizing recently developed algorithms, as a means of making inferences about images. These inferences conc...
Scene understanding from a monocular, moving camera is a challenging problem with a number of applications including robotics and automotive safety. While recent systems have shown that this is best accomplished with a 3D scene model, handling of partial object occlusion is still unsatisfactory. In this paper we propose an approach that tightly int...
Finding injured humans is one of the primary goals of any search and rescue operation. The aim of this paper is to address the task of automatically finding people lying on the ground in images taken from the on-board camera of an unmanned aerial vehicle (UAV). In this paper we evaluate various state-of-the-art visual people detection methods in th...
Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem...
Scene understanding has (again) become a focus of computer vision research, leveraging advances in detection, context modeling, and tracking. In this paper, we present a novel probabilistic 3D scene model that encompasses multi-class object detection, object tracking, scene labeling, and 3D geometric relations. This integrated 3D model is able to re...
The efficient application of graph cuts to Markov Random Fields (MRFs) with multiple discrete or continuous labels remains an open question. In this paper, we demonstrate one possible way of achieving this by using graph cuts to combine pairs of suboptimal labelings or solutions. We call this combination process the fusion move. By employing recent...
Object recognition is challenging due to high intra-class variability caused, e.g., by articulation, viewpoint changes, and partial occlusion. Successful methods need to strike a balance between being flexible enough to model such variation and discriminative enough to detect objects in cluttered, real world scenes. Motivated by these challenges we...
Markov random fields (MRFs) are popular and generic probabilistic models of prior knowledge in low-level vision. Yet their generative properties are rarely examined, while application-specific models and non-probabilistic learning are gaining increased attention. In this paper we revisit the generative aspects of MRFs, and analyze the quality of co...
In urban search and rescue scenarios, typical applications of robots include autonomous exploration of possibly dangerous sites, and the recognition of victims and other objects of interest. In complex scenarios, relying on only one type of sensor is often misleading, while using complementary sensors frequently helps improve the performance. To...
A variety of flexible models have been proposed to detect objects in challenging real world scenes. Motivated by some of the most successful techniques, we propose a hierarchical multi-feature representation and automatically learn flexible hierarchical object models for a wide variety of object classes. To that end we not only rely on automatic se...
Non-rigid object detection and articulated pose estimation are two related and challenging problems in computer vision. Numerous models have been proposed over the years and often address different special cases, such as pedestrian detection or upper body pose estimation in TV footage. This paper shows that such specialization may not be necessary,...
The accurate estimation of optical flow is a challenging task, which is often posed as an energy minimization problem. Most top-performing methods approach this using continuous optimization algorithms. In many cases, the employed models are assumed to be convex to ensure tractability of the optimization problem. This is in contrast to the related...
We develop a framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for a variety of machine vision tasks. The approach provides a practical method for learning high-order Markov random field (MRF) models with potential functions that extend over large pixel neighborhoods. These clique...
Image stitching or mosaicing is a challenging vision problem, especially when considering aspects like high definition content, real-time operation, and proper compensation of the parallax error of objects at different distances to the camera system. Today many approaches to image stitching exist; most of them deal with medium resolution images, offline...
Over the last few years, visual people detection has made impressive progress. The paper gives an overview of some of the most successful techniques for people detection and also summarizes a recent quantitative comparison of several state-of-the-art methods. As a proof-of-concept we show that the combination of visual and laser-based people de...
Assumptions of brightness constancy and spatial smoothness underlie most optical flow estimation methods. In contrast to standard heuristic formulations, we learn a statistical model of both brightness constancy error and the spatial properties of optical flow using image sequences with associated ground truth flow fields. The result is a complete...
Accurate estimation of optical flow is a challenging task, which often requires addressing difficult energy optimization problems. To solve them, most top-performing methods rely on continuous optimization algorithms. The modeling accuracy of the energy in this case is often traded for its tractability. This is in contrast to the related problem...
Both detecting and tracking people are challenging problems, especially in complex real world scenes that commonly involve multiple people, complicated occlusions, and cluttered or even moving backgrounds. People detectors have been shown to be able to locate pedestrians even in complex street scenes, but false positives have remained frequent. T...
We introduce a convex relaxation approach for the quadratic assignment problem to the field of computer vision. Due to convexity, a favourable property of this approach is the absence of any tuning parameters and the computation of high-quality combinatorial solutions by solving a mathematically simple optimization problem. Furthermore, the relaxat...
The quantitative evaluation of optical flow algorithms by Barron et al. (1994) led to significant advances in performance. The challenges for optical flow algorithms today go beyond the datasets and evaluation methods proposed in that paper. Instead, they center on problems associated with complex natural scenes, including nonrigid motion, real sen...
In contrast to traditional Markov random field (MRF) models, we develop a steerable random field (SRF) in which the field potentials are defined in terms of filter responses that are steered to the local image structure. In particular, we use the structure tensor to obtain derivative responses that are either aligned with, or orthogonal to, the pre...
We develop a Bayesian model of digitized archival films and use this for denoising, or more specifically de-graining, individual frames. In contrast to previous approaches our model uses a learned spatial prior and a unique likelihood term that models the physics that generates the image grain. The spatial prior is represented by a high-order Marko...
Belief propagation (BP) has become widely used for low-level vision problems and various inference techniques have been proposed for loopy graphs. These methods typically rely on ad hoc spatial priors such as the Potts model. In this paper we investigate the use of learned models of image structure, and demonstrate the improvements obtained over...
In scenes containing specular objects, the image motion observed by a moving camera may be an intermixed combination of optical flow resulting from diffuse reflectance (diffuse flow) and specular reflection (specular flow). Here, with few assumptions, we formalize the notion of specular flow, show how it relates to the 3D structure of the world, an...
We develop a method for learning the spatial statistics of optical flow fields from a novel training database. Training flow fields are constructed using range images of natural scenes and 3D camera motions recovered from handheld and car-mounted video sequences. A detailed analysis of optical flow statistics in natural scenes is presented and mach...
We develop a framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for a variety of machine vision tasks. The approach extends traditional Markov random field (MRF) models by learning potential functions over extended pixel neighborhoods. Field potentials are modeled using a Products-o...
Probabilistic modeling of correlated neural population firing activity is central to understanding the neural code and building practical decoding algorithms. No parametric models currently exist for modeling multivariate correlated neural data and the high-dimensional nature of the data makes fully non-parametric methods impractical. To address...
Bayesian methods for visual tracking model the likelihood of image measurements conditioned on a tracking hypothesis. Image measurements may, for example, correspond to various filter responses at multiple scales and orientations. Most tracking approaches exploit ad hoc likelihood models while those that exploit more rigorous, learned, models often...
We pose the problem of 3D human tracking as one of inference in a graphical model. Unlike traditional kinematic tree representations, our model of the body is a collection of loosely-connected limbs. Conditional probabilities relating the 3D pose of connected limbs are learned from motion-captured training data. Similarly, we learn probabilistic mo...
Observations: Specular flow applied to random dots or random lines is not perceived as "shiny". When the illumination sources become more "natural" the object is increasingly perceived as "shiny". The further away the illumination sphere is, the le...
We present a novel approach to the weighted graph-matching problem in computer vision, based on a convex relaxation of the underlying combinatorial optimization problem. The approach always computes a lower bound of the objective function, which is a favorable property in the context of exact search algorithms. Furthermore, no tuning parameters hav...
We introduce a convex relaxation approach for the quadratic assignment problem to the field of computer vision. Due to convexity, a favourable property of this approach is the absence of any tuning parameters and the computation of high-quality combinatorial solutions by solving a mathematically simple optimization problem. Furthermore, the relaxat...
Low-level vision is a fundamental area of computer vision that is concerned with the analysis of digital images at the pixel level and the computation of other dense, pixel-based representations of scenes such as depth and motion. Many of the algorithms and models in low-level vision rely on a representation of prior knowledge about images or other...