About
82
Publications
29,232
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
13,646
Citations
Introduction
Peter Gehler currently works at Amazon Research in Tuebingen.
Additional affiliations
October 2017 - present
Amazon
Position
- Researcher
March 2017 - September 2017
January 2012 - February 2017
Publications
Publications (82)
Despite their recent success, deep neural networks continue to perform poorly when they encounter distribution shifts at test time. Many recently proposed approaches try to counter this by aligning the model to the new distribution prior to inference. With no labels available this requires unsupervised objectives to adapt the model on the observed...
Dense prediction tasks typically employ encoder-decoder architectures, but the prevalent convolutions in the decoder are not image-adaptive and can lead to boundary artifacts. Different generalized convolution operations have been introduced to counteract this. We go beyond these by leveraging guidance data to redefine their inherent notion of prox...
The ColorChecker dataset is one of the most widely used image sets for evaluating and ranking illuminant estimation algorithms. However, this single set of images has at least 3 different sets of ground-truth (i.e. correct answers) associated with it. In the literature it is often asserted that one algorithm is better than another when the algorith...
Modern deep learning systems successfully solve many perception tasks such as object pose estimation when the input image is of high quality. However, in challenging imaging conditions such as on low resolution images or when the image is corrupted by imaging artifacts, current systems degrade considerably in accuracy. While a loss in performance i...
Direct prediction of 3D body pose and shape remains a challenge even for highly parameterized deep learning models. Mapping from the 2D image space to the prediction space is difficult: perspective ambiguities make the loss function noisy and training data is scarce. In this paper, we propose a novel approach (Neural Body Fitting (NBF)). It integra...
In a previous work, it was shown that there is a curious problem with the benchmark Color Checker dataset for illuminant estimation. To wit, this dataset has at least 3 different sets of ground-truths. Typically, for a single algorithm a single ground-truth is used. But then different algorithms, whose performance is measured with respect to differ...
Modern deep learning systems successfully solve many perception tasks such as object pose estimation when the input image is of high quality. However, in challenging imaging conditions such as on low-resolution images or when the image is corrupted by imaging artifacts, current systems degrade considerably in accuracy. While a loss in performance i...
We present the first image-based generative model of people in clothing in a full-body setting. We sidestep the commonly used complex graphics rendering pipeline and the need for high-quality 3D scans of dressed people. Instead, we learn generative models from a large image database. The main challenge is to cope with the high variance in human pos...
Most object detection systems consist of three stages. First, a set of individual hypotheses for object locations is generated using a proposal generating algorithm. Second, a classifier scores every generated hypothesis independently to obtain a multi-class prediction. Finally, all scored hypotheses are filtered via a non-differentiable and decoup...
In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp and we demonstrate its use for a range of network architectures. The...
Existing marker-less motion capture methods often assume known backgrounds, static cameras, and sequence specific motion priors, which narrows its application scenarios. Here we propose a fully automatic method that given multi-view video, estimates 3D human motion and body shape. We take recent SMPLify \cite{bogo2016keep} as the base method, and e...
We propose a technique that propagates information forward through video data. The method is conceptually simple and can be applied to tasks that require the propagation of structured information, such as semantic labels, based on video content. We propose a 'Video Propagation Network' that processes video frames in an adaptive manner. The model is...
3D models provide the common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits "in-the-wild". However, depending on the level of detail, it can be hard to impossible to obtain labeled representations on large scale. We propose a hybrid approach to this problem: wit...
Separation of an input image into its reflectance and shading layers poses a challenge for learning approaches because no large corpus of precise and realistic ground truth decompositions exists. The Intrinsic Images in the Wild dataset (IIW) provides a sparse set of relative human reflectance judgments, which serves as a standard benchmark for int...
In this paper we propose a CNN architecture for semantic image segmentation. We introduce a new `bilateral inception'' module that can be inserted in existing CNN architectures and performs bilateral filtering, at multiple feature-scales, between superpixels in an image. The feature spaces for bilateral filtering and other parameters of the module...
We describe the first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image. We estimate a full 3D mesh and show that 2D joints alone carry a surprising amount of information about body shape. The problem is challenging because of the complexity of the human body, articulation, occl...
In this paper we propose a CNN architecture for semantic image segmentation. We introduce a new “bilateral inception” module that can be inserted in existing CNN architectures and performs bilateral filtering, at multiple feature-scales, between superpixels in an image. The feature spaces for bilateral filtering and other parameters of the module a...
The caffe framework is one of the leading deep learning toolboxes in the machine learning and computer vision community. While it offers efficiency and configurability, it falls short of a full interface to Python. With increasingly involved procedures for training deep networks and reaching depths of hundreds of layers, creating configuration file...
We describe the first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image. We estimate a full 3D mesh and show that 2D joints alone carry a surprising amount of information about body shape. The problem is challenging because of the complexity of the human body, articulation, occl...
This paper introduces a fast and efficient segmentation technique for 2D images and 3D point clouds of building facades. Facades of buildings are highly structured and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. Contrary to most prior work, we are describing a system that is a...
Bilateral filters have wide spread use due to their edge-preserving properties. The common use case is to manually choose a parametric filter type, usually a Gaussian filter. In this paper, we will generalize the parametrization and in particular derive a gradient descent algorithm so the filter parameters can be learned from data. This derivation...
This paper considers the task of articulated human pose estimation of
multiple people in real-world images. We propose an approach that jointly
solves the tasks of detection and pose estimation: it infers the number of
persons in a scene, identifies occluded body parts, and disambiguates body
parts between people in close proximity of each other. T...
As objects are inherently 3D, they have been modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 3D object representations have been neglected and 2D feature-based models are the predominant paradigm in object detection nowadays. While such models have achieved outstanding boundi...
Object class detection has been a synonym for 2D bounding box localization
for the longest time, fueled by the success of powerful statistical learning
techniques, combined with robust image representations. Only recently, there
has been a growing interest in revisiting the promise of computer vision from
the early days: to precisely delineate the...
This paper introduces an efficient, non-linear image adaptive filtering as a
generalization of the standard spatial convolution of convolutional neural
networks (CNNs). We build on the bilateral filtering operation, a commonly used
edge-aware image processing technique. Our implementation of bilateral filters
uses specialized data structures, and i...
In this paper we propose a system for the problem of facade segmentation. Building facades are highly structured
images and consequently most methods that have been proposed for this problem, aim to make use of this strong prior information. We are describing a system that is almost domain independent and consists of standard segmentation methods....
This book constitutes the refereed proceedings of the 37th German Conference on Pattern Recognition, GCPR 2015, held in Aachen, Germany, in October 2015. The 45 revised full papers and one Young Researchers Forum presented were carefully reviewed and selected from 108 submissions. The papers are organized in topical sections on motion and reconstru...
This paper presents a convolutional layer that is able to process sparse
input features. As an example, for image recognition problems this allows an
efficient filtering of signals that do not lie on a dense grid (like pixel
position), but of more general features (such as color values). The presented
algorithm makes use of the permutohedral lattic...
Intrinsic images such as albedo and shading are valuable for later stages of visual processing. Previous methods for extracting albedo and shading use either single images or images together with depth data. Instead, we define intrinsic video estimation as the problem of extracting temporally coherent albedo and shading from video alone. Our approa...
This paper proposes a new formulation of the human pose estimation problem. We present the Fields of Parts model, a binary Conditional Random Field model designed to detect human body parts of articulated people in single images.
The Fields of Parts model is inspired by the idea of Pictorial Structures, it models local appearance and joint spatial...
Human pose estimation has made significant progress during the last years. However current datasets are limited in their coverage of the overall pose estimation challenges. Still these serve as the common sources to evaluate, train and compare different models on. In this paper we intro-duce a novel benchmark "MPII Human Pose" 1 that makes a signif...
Dynamic Bayesian networks such as Hidden Markov Models (HMMs) are successfully used as probabilistic mod-els for human motion. The use of hidden variables makes them expressive models, but inference is only approximate and requires procedures such as particle filters or Markov chain Monte Carlo methods. In this work we propose to in-stead use simpl...
Computer vision is hard because of a large variability in lighting, shape,
and texture; in addition the image signal is non-additive due to occlusion.
Generative models promised to account for this variability by accurately
modelling the image formation process as a function of latent variables with
prior beliefs. Bayesian posterior inference could...
Ranking hypothesis sets is a powerful concept for efficient object detection. In this work, we propose a branch&rank scheme that detects objects with often less than 100 ranking operations. This efficiency enables the use of strong and also costly classifiers like non-linear SVMs with RBF- [TEX equation: \chi ^2] kernels. We thereby relieve an inhe...
While the majority of today's object class models provide only 2D bounding
boxes, far richer output hypotheses are desirable including viewpoint,
fine-grained category, and 3D geometry estimate. However, models trained to
provide richer output require larger amounts of training data, preferably well
covering the relevant aspects such as viewpoint a...
Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body part hypotheses. And sec...
Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters...
Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise - instead, we include the occluder itself into the modelling, b...
In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more...
As objects are inherently 3-dimensional, they have been modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 2D feature-based models are the predominant paradigm in object recognition today. While such models have shown competitive bounding box (BB) detection performance, they are...
We present a novel conditional random field (CRF) for semantic seg-mentation that extends the common Potts model of spatial coherency with latent topics, which capture higher-order spatial relations of segment labels. Specifi-cally, we show how recent approaches for producing sets of figure-ground seg-mentations can be leveraged to construct a suit...
Current object class recognition systems typically target 2D bounding box localization, encouraged by benchmark data sets, such as Pascal VOC. While this seems suitable for the detection of individual objects, higher-level applications such as 3D scene understanding or 3D object tracking would benefit from more fine-grained object hypotheses incorp...
Branch&rank is an object detection scheme that overcomes the inherent limitation of branch&bound: this method works with arbitrary (classifier) functions whereas tight bounds exist only for simple functions. Objects are usually detected with less than 100 classifier evaluation, which paves the way for using strong (and thus costly) classifiers: We...
We propose a method to learn simultaneously a vector-valued function and a kernel between its components. The obtained kernel can be used both to improve learning performance and to reveal structures in the output space which may be important in their own right. Our method is based on the solution of a suitable regularization problem over a reprodu...
We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance, that models reflectance values as being drawn from...
In this paper, we present a new, improved seam carving algo-rithm. Seam carving efficiently removes pixels from an image to produce a retargeted image. It has proved popular with users and has been used as a component in many retargeting algorithms. We introduce the visi-bility map, a new framework for pixel removing image editing methods. This all...
Image retargeting algorithms often create visually disturbing distortion. We introduce the property of scene consistency,
which is held by images which contain no object distortion and have the correct object depth ordering. We present two new
image retargeting algorithms that preserve scene consistency. These algorithms make use of a user-provided...
Recent progress in per-pixel object class labeling of natural images can be attributed to the use of multiple types of image features and sound statistical learning approaches. Within the latter, Conditional Random Fields (CRF) are prominently used for their ability to represent interactions between random variables. Despite their popularity in com...
In this paper we build upon the Multiple Kernel Learning (MKL) framework and in particular on [1] which generalized it to infinitely many kernels. We rewrite the problem in the standard MKL formulation which leads to a Semi-Infinite Program. We devise a new algorithm to solve it (Infinite Kernel Learning, IKL). The IKL algorithm is applicable to bo...
This thesis extends the use of kernel learning techniques to specific problems of image classification. Kernel learning is a paradigm in the eld of machine learning that generalizes the use of inner products to compute similarities between arbitrary objects. In image classification one aims to separate images based on their visual content. We addre...
IntroductionKernelsThe representer theoremLearning with kernelsConclusion
References
A key ingredient in the design of visual object classifi- cation systems is the identification of relevant class specific aspects while being robust to intra-class variations. While this is a necessity in order to generalize beyond a given set of training images, it is also a very difficult problem due to the high variability of visual appearance w...
Most modern computer vision systems for high-level tasks, such as image classification, object recognition and segmentation, are based on learning algorithms that are able to separate discriminative information from noise. In practice, however, the typical system consists of a long pipeline of pre-processing steps, such as extraction of different k...
Computational color constancy is the task of estimating the true reflectances of visible surfaces in an image. In this paper we follow a line of research that assumes uniform illumination of a scene, and that the principal step in estimating reflectances is the estimation of the scene illuminant. We review recent approaches to illuminant estimation...
In this paper we demonstrate how determinis- tic annealing can be applied to different SVM formulations of the multiple-instance learning (MIL) problem. Our results show that we find better local minima compared to the heuristic methods those problems are usually solved with. However this does not always translate into a bet- ter test error suggest...
In this paper we demonstrate how deterministic annealing can be applied to different SVM formulations of the multiple-instance learning (MIL) problem. Our results show that we find better local minima compared to the heuristic methods those problems are usually solved with. However this does not always translate into a better test error suggesting...
Probabilistic modelling of text data in the bag- of-words representation has been dominated by directed graphical models such as pLSI, LDA, NMF, and discrete PCA. Recently, state of the art performance on visual object recognition has also been reported using variants of these mod- els. We introduce an alternative undirected graphical model suitabl...
Probabilistic modelling of text data in the bag-of-words representation has been dominated by directed graphical models such as pLSI, LDA, NMF, and discrete PCA. Recently, state of the art performance on visual object recognition has also been reported using variants of these models. We introduce an alternative undirected graphical model suitable f...
Images represent an important and abundant source of data. Understand- ing their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the "products of edge-perts model" to de- scribe the structure of wavelet transformed images. We develo...
In Gaussian process regression, both the basis functions and their prior distri- bution are simultaneously specified by the choice of the covariance function. In certain problems one would like to choose the covariance independently of the basis functions (e. g., in polynomial signal processing or Wiener and Volterra analysis). We propose a solutio...
Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the ``products of edge-perts model`` to describe the structure of wavelet transformed images. We develop...
Kernel learning algorithms are currently becoming a standard tool in the area of machine learning and pattern recognition. In this chapter we review the fundamental theory of kernel learning. As the basic building block we introduce the kernel function, which provides an elegant and general way to compare possibly very complex objects. We then revi...