Peter Gehler

Amazon

Dr. rer. nat.

About

80
Publications
23,353
Reads
9,324
Citations
Additional affiliations
October 2017 - present
Amazon
Position
  • Researcher
March 2017 - September 2017
University of Wuerzburg
Position
  • Professor
January 2012 - February 2017
Max Planck Institute for Intelligent Systems
Position
  • Senior Researcher

Publications (80)
Preprint
Dense prediction tasks typically employ encoder-decoder architectures, but the prevalent convolutions in the decoder are not image-adaptive and can lead to boundary artifacts. Different generalized convolution operations have been introduced to counteract this. We go beyond these by leveraging guidance data to redefine their inherent notion of prox...
Preprint
The ColorChecker dataset is one of the most widely used image sets for evaluating and ranking illuminant estimation algorithms. However, this single set of images has at least 3 different sets of ground-truth (i.e. correct answers) associated with it. In the literature it is often asserted that one algorithm is better than another when the algorith...
Chapter
Modern deep learning systems successfully solve many perception tasks such as object pose estimation when the input image is of high quality. However, in challenging imaging conditions such as on low resolution images or when the image is corrupted by imaging artifacts, current systems degrade considerably in accuracy. While a loss in performance i...
Preprint
Full-text available
Direct prediction of 3D body pose and shape remains a challenge even for highly parameterized deep learning models. Mapping from the 2D image space to the prediction space is difficult: perspective ambiguities make the loss function noisy and training data is scarce. In this paper, we propose a novel approach (Neural Body Fitting (NBF)). It integra...
Preprint
In a previous work, it was shown that there is a curious problem with the benchmark Color Checker dataset for illuminant estimation. To wit, this dataset has at least 3 different sets of ground-truths. Typically, for a single algorithm a single ground-truth is used. But then different algorithms, whose performance is measured with respect to differ...
Preprint
Full-text available
Modern deep learning systems successfully solve many perception tasks such as object pose estimation when the input image is of high quality. However, in challenging imaging conditions such as on low-resolution images or when the image is corrupted by imaging artifacts, current systems degrade considerably in accuracy. While a loss in performance i...
Conference Paper
Full-text available
We present the first image-based generative model of people in clothing in a full-body setting. We sidestep the commonly used complex graphics rendering pipeline and the need for high-quality 3D scans of dressed people. Instead, we learn generative models from a large image database. The main challenge is to cope with the high variance in human pos...
Conference Paper
Most object detection systems consist of three stages. First, a set of individual hypotheses for object locations is generated using a proposal generating algorithm. Second, a classifier scores every generated hypothesis independently to obtain a multi-class prediction. Finally, all scored hypotheses are filtered via a non-differentiable and decoup...
Article
Full-text available
In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp and we demonstrate its use for a range of network architectures. The...
Article
Full-text available
Existing marker-less motion capture methods often assume known backgrounds, static cameras, and sequence-specific motion priors, which narrows their application scenarios. Here we propose a fully automatic method that, given multi-view video, estimates 3D human motion and body shape. We take the recent SMPLify (Bogo et al., 2016) as the base method, and e...
Conference Paper
Full-text available
We propose a technique that propagates information forward through video data. The method is conceptually simple and can be applied to tasks that require the propagation of structured information, such as semantic labels, based on video content. We propose a 'Video Propagation Network' that processes video frames in an adaptive manner. The model is...
Conference Paper
Full-text available
3D models provide the common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits "in-the-wild". However, depending on the level of detail, it can be hard to impossible to obtain labeled representations on large scale. We propose a hybrid approach to this problem: wit...
Article
Separation of an input image into its reflectance and shading layers poses a challenge for learning approaches because no large corpus of precise and realistic ground truth decompositions exists. The Intrinsic Images in the Wild dataset (IIW) provides a sparse set of relative human reflectance judgments, which serves as a standard benchmark for int...
Conference Paper
Full-text available
In this paper we propose a CNN architecture for semantic image segmentation. We introduce a new “bilateral inception” module that can be inserted in existing CNN architectures and performs bilateral filtering, at multiple feature-scales, between superpixels in an image. The feature spaces for bilateral filtering and other parameters of the module...
Conference Paper
Full-text available
We describe the first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image. We estimate a full 3D mesh and show that 2D joints alone carry a surprising amount of information about body shape. The problem is challenging because of the complexity of the human body, articulation, occl...
Conference Paper
In this paper we propose a CNN architecture for semantic image segmentation. We introduce a new “bilateral inception” module that can be inserted in existing CNN architectures and performs bilateral filtering, at multiple feature-scales, between superpixels in an image. The feature spaces for bilateral filtering and other parameters of the module a...
Conference Paper
The caffe framework is one of the leading deep learning toolboxes in the machine learning and computer vision community. While it offers efficiency and configurability, it falls short of a full interface to Python. With increasingly involved procedures for training deep networks and reaching depths of hundreds of layers, creating configuration file...
Article
Full-text available
We describe the first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image. We estimate a full 3D mesh and show that 2D joints alone carry a surprising amount of information about body shape. The problem is challenging because of the complexity of the human body, articulation, occl...
Article
Full-text available
This paper introduces a fast and efficient segmentation technique for 2D images and 3D point clouds of building facades. Facades of buildings are highly structured and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. Contrary to most prior work, we are describing a system that is a...
Conference Paper
Bilateral filters have widespread use due to their edge-preserving properties. The common use case is to manually choose a parametric filter type, usually a Gaussian filter. In this paper, we will generalize the parametrization and in particular derive a gradient descent algorithm so the filter parameters can be learned from data. This derivation...
Article
Full-text available
This paper considers the task of articulated human pose estimation of multiple people in real-world images. We propose an approach that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity of each other. T...
Article
As objects are inherently 3D, they have been modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 3D object representations have been neglected and 2D feature-based models are the predominant paradigm in object detection nowadays. While such models have achieved outstanding boundi...
Article
Full-text available
Object class detection has been a synonym for 2D bounding box localization for the longest time, fueled by the success of powerful statistical learning techniques, combined with robust image representations. Only recently, there has been a growing interest in revisiting the promise of computer vision from the early days: to precisely delineate the...
Article
This paper introduces an efficient, non-linear image adaptive filtering as a generalization of the standard spatial convolution of convolutional neural networks (CNNs). We build on the bilateral filtering operation, a commonly used edge-aware image processing technique. Our implementation of bilateral filters uses specialized data structures, and i...
Conference Paper
Full-text available
In this paper we propose a system for the problem of facade segmentation. Building facades are highly structured images and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. We describe a system that is almost domain independent and consists of standard segmentation methods....
Book
This book constitutes the refereed proceedings of the 37th German Conference on Pattern Recognition, GCPR 2015, held in Aachen, Germany, in October 2015. The 45 revised full papers and one Young Researchers Forum presented were carefully reviewed and selected from 108 submissions. The papers are organized in topical sections on motion and reconstru...
Article
Full-text available
This paper presents a convolutional layer that is able to process sparse input features. As an example, for image recognition problems this allows an efficient filtering of signals that do not lie on a dense grid (like pixel position), but of more general features (such as color values). The presented algorithm makes use of the permutohedral lattic...
Conference Paper
Full-text available
Intrinsic images such as albedo and shading are valuable for later stages of visual processing. Previous methods for extracting albedo and shading use either single images or images together with depth data. Instead, we define intrinsic video estimation as the problem of extracting temporally coherent albedo and shading from video alone. Our approa...
Conference Paper
Full-text available
This paper proposes a new formulation of the human pose estimation problem. We present the Fields of Parts model, a binary Conditional Random Field model designed to detect human body parts of articulated people in single images. The Fields of Parts model is inspired by the idea of Pictorial Structures: it models local appearance and joint spatial...
Conference Paper
Full-text available
Human pose estimation has made significant progress during the last years. However, current datasets are limited in their coverage of the overall pose estimation challenges. Still, these serve as the common sources to evaluate, train and compare different models on. In this paper we introduce a novel benchmark "MPII Human Pose" that makes a signif...
Conference Paper
Full-text available
Dynamic Bayesian networks such as Hidden Markov Models (HMMs) are successfully used as probabilistic models for human motion. The use of hidden variables makes them expressive models, but inference is only approximate and requires procedures such as particle filters or Markov chain Monte Carlo methods. In this work we propose to instead use simpl...
Article
Computer vision is hard because of a large variability in lighting, shape, and texture; in addition the image signal is non-additive due to occlusion. Generative models promised to account for this variability by accurately modelling the image formation process as a function of latent variables with prior beliefs. Bayesian posterior inference could...
Article
Full-text available
Ranking hypothesis sets is a powerful concept for efficient object detection. In this work, we propose a branch&rank scheme that detects objects with often less than 100 ranking operations. This efficiency enables the use of strong and also costly classifiers like non-linear SVMs with RBF-χ² kernels. We thereby relieve an inhe...
Article
Full-text available
While the majority of today's object class models provide only 2D bounding boxes, far richer output hypotheses are desirable including viewpoint, fine-grained category, and 3D geometry estimate. However, models trained to provide richer output require larger amounts of training data, preferably well covering the relevant aspects such as viewpoint a...
Conference Paper
Full-text available
Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body part hypotheses. And sec...
Conference Paper
Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters...
Conference Paper
Full-text available
Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise - instead, we include the occluder itself into the modelling, b...
Conference Paper
In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more...
Conference Paper
Full-text available
As objects are inherently 3-dimensional, they have been modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 2D feature-based models are the predominant paradigm in object recognition today. While such models have shown competitive bounding box (BB) detection performance, they are...
Conference Paper
Full-text available
We present a novel conditional random field (CRF) for semantic segmentation that extends the common Potts model of spatial coherency with latent topics, which capture higher-order spatial relations of segment labels. Specifically, we show how recent approaches for producing sets of figure-ground segmentations can be leveraged to construct a suit...
Conference Paper
Full-text available
Current object class recognition systems typically target 2D bounding box localization, encouraged by benchmark data sets, such as Pascal VOC. While this seems suitable for the detection of individual objects, higher-level applications such as 3D scene understanding or 3D object tracking would benefit from more fine-grained object hypotheses incorp...
Conference Paper
Full-text available
Branch&rank is an object detection scheme that overcomes the inherent limitation of branch&bound: this method works with arbitrary (classifier) functions whereas tight bounds exist only for simple functions. Objects are usually detected with fewer than 100 classifier evaluations, which paves the way for using strong (and thus costly) classifiers: We...
Conference Paper
Full-text available
We propose a method to learn simultaneously a vector-valued function and a kernel between its components. The obtained kernel can be used both to improve learning performance and to reveal structures in the output space which may be important in their own right. Our method is based on the solution of a suitable regularization problem over a reprodu...
Article
Full-text available
We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance, that models reflectance values as being drawn from...
Conference Paper
Full-text available
In this paper, we present a new, improved seam carving algorithm. Seam carving efficiently removes pixels from an image to produce a retargeted image. It has proved popular with users and has been used as a component in many retargeting algorithms. We introduce the visibility map, a new framework for pixel-removing image editing methods. This all...
Conference Paper
Full-text available
Image retargeting algorithms often create visually disturbing distortion. We introduce the property of scene consistency, which is held by images which contain no object distortion and have the correct object depth ordering. We present two new image retargeting algorithms that preserve scene consistency. These algorithms make use of a user-provided...
Conference Paper
Full-text available
Recent progress in per-pixel object class labeling of natural images can be attributed to the use of multiple types of image features and sound statistical learning approaches. Within the latter, Conditional Random Fields (CRF) are prominently used for their ability to represent interactions between random variables. Despite their popularity in com...
Article
Full-text available
In this paper we build upon the Multiple Kernel Learning (MKL) framework and in particular on [1] which generalized it to infinitely many kernels. We rewrite the problem in the standard MKL formulation which leads to a Semi-Infinite Program. We devise a new algorithm to solve it (Infinite Kernel Learning, IKL). The IKL algorithm is applicable to bo...
Thesis
This thesis extends the use of kernel learning techniques to specific problems of image classification. Kernel learning is a paradigm in the field of machine learning that generalizes the use of inner products to compute similarities between arbitrary objects. In image classification one aims to separate images based on their visual content. We addre...
Chapter
Introduction; Kernels; The representer theorem; Learning with kernels; Conclusion; References
Conference Paper
Full-text available
A key ingredient in the design of visual object classification systems is the identification of relevant class-specific aspects while being robust to intra-class variations. While this is a necessity in order to generalize beyond a given set of training images, it is also a very difficult problem due to the high variability of visual appearance w...
Conference Paper
Full-text available
Most modern computer vision systems for high-level tasks, such as image classification, object recognition and segmentation, are based on learning algorithms that are able to separate discriminative information from noise. In practice, however, the typical system consists of a long pipeline of pre-processing steps, such as extraction of different k...
Conference Paper
Full-text available
Computational color constancy is the task of estimating the true reflectances of visible surfaces in an image. In this paper we follow a line of research that assumes uniform illumination of a scene, and that the principal step in estimating reflectances is the estimation of the scene illuminant. We review recent approaches to illuminant estimation...
Article
Full-text available
In this paper we demonstrate how deterministic annealing can be applied to different SVM formulations of the multiple-instance learning (MIL) problem. Our results show that we find better local minima compared to the heuristic methods those problems are usually solved with. However this does not always translate into a better test error suggest...
Article
In this paper we demonstrate how deterministic annealing can be applied to different SVM formulations of the multiple-instance learning (MIL) problem. Our results show that we find better local minima compared to the heuristic methods those problems are usually solved with. However this does not always translate into a better test error suggesting...
Conference Paper
Full-text available
Probabilistic modelling of text data in the bag-of-words representation has been dominated by directed graphical models such as pLSI, LDA, NMF, and discrete PCA. Recently, state of the art performance on visual object recognition has also been reported using variants of these models. We introduce an alternative undirected graphical model suitabl...
Conference Paper
Probabilistic modelling of text data in the bag-of-words representation has been dominated by directed graphical models such as pLSI, LDA, NMF, and discrete PCA. Recently, state of the art performance on visual object recognition has also been reported using variants of these models. We introduce an alternative undirected graphical model suitable f...
Conference Paper
Full-text available
Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the "products of edge-perts model" to describe the structure of wavelet transformed images. We develo...