Peter M. Roth
University of Veterinary Medicine, Vienna · Institute for Computational Medicine

Professor

About

151 Publications · 65,970 Reads
6,495 Citations

Publications (151)
Preprint
Full-text available
While 3D object detection in LiDAR point clouds is well-established in academia and industry, the explainability of these models is a largely unexplored field. In this paper, we propose a method to generate attribution maps for the detected objects in order to better understand the behavior of such models. These maps indicate the importance of each...
Article
Full-text available
In this paper, a neural network is trained to perform simple arithmetic using images of concatenated handwritten digit pairs. A convolutional neural network was trained with images consisting of two side-by-side handwritten digits, where the image’s label is the summation of the two digits contained in the combined image. Crucially, the network was...
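For illustration, a minimal sketch (not the authors' code) of how such a paired-digit task could be set up in Python/PyTorch: two MNIST digits are concatenated side by side and a small CNN is trained to predict their sum. The data source, architecture, and training details here are assumptions for demonstration only.

    # Illustrative sketch: build "digit pair" images whose label is the sum of the
    # two digits, and train a small CNN on them (architecture is an assumption).
    import torch
    import torch.nn as nn
    from torch.utils.data import Dataset, DataLoader
    from torchvision import datasets, transforms

    class DigitPairSum(Dataset):
        """Concatenate two random MNIST digits side by side; label = digit sum (0..18)."""
        def __init__(self, train=True):
            self.mnist = datasets.MNIST("data", train=train, download=True,
                                        transform=transforms.ToTensor())
        def __len__(self):
            return len(self.mnist)
        def __getitem__(self, idx):
            img1, d1 = self.mnist[idx]
            img2, d2 = self.mnist[torch.randint(len(self.mnist), (1,)).item()]
            pair = torch.cat([img1, img2], dim=2)   # 1 x 28 x 56 combined image
            return pair, d1 + d2                    # sum lies in [0, 18]

    model = nn.Sequential(                          # small CNN over the 19 possible sums
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(32 * 7 * 14, 19))

    loader = DataLoader(DigitPairSum(train=True), batch_size=128, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for imgs, sums in loader:                       # one epoch shown for brevity
        opt.zero_grad()
        loss_fn(model(imgs), sums).backward()
        opt.step()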
Preprint
Full-text available
Objective: Surveillance imaging of chronic aortic diseases, such as dissections, relies on obtaining and comparing cross-sectional diameter measurements at predefined aortic landmarks over time. Due to a lack of robust tools, the orientation of the cross-sectional planes is defined manually by highly trained operators. We show how manual annotatio...
Article
Full-text available
Information plane analysis, describing the mutual information between the input and a hidden layer and between a hidden layer and the target over time, has recently been proposed to analyze the training of neural networks. Since the activations of a hidden layer are typically continuous-valued, this mutual information cannot be computed analyticall...
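As a point of reference (not the estimator proposed in this article), a common workaround in information plane studies is to discretize the continuous activations and estimate the mutual information from a joint histogram. A minimal Python sketch under that assumption:

    # Hedged sketch: binning-based estimate of I(T; Y) between quantized hidden
    # activations T and discrete labels Y. Bin count and hashing scheme are assumptions.
    import numpy as np

    def binned_mutual_information(activations, labels, n_bins=30):
        """activations: (N, D) hidden-layer outputs; labels: (N,) integer classes."""
        edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
        digitized = np.digitize(activations, edges)               # per-dimension bin indices
        _, t = np.unique(digitized, axis=0, return_inverse=True)  # one symbol per sample
        _, y = np.unique(labels, return_inverse=True)
        joint = np.zeros((t.max() + 1, y.max() + 1))
        np.add.at(joint, (t, y), 1)                               # joint histogram counts
        p_ty = joint / joint.sum()
        p_t, p_y = p_ty.sum(1, keepdims=True), p_ty.sum(0, keepdims=True)
        nz = p_ty > 0
        return float((p_ty[nz] * np.log2(p_ty[nz] / (p_t @ p_y)[nz])).sum())

    # Example on random data; note that binning yields only an estimate of the MI.
    rng = np.random.default_rng(0)
    print(binned_mutual_information(rng.normal(size=(1000, 2)), rng.integers(0, 2, 1000)))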
Article
Full-text available
Many scientific studies deal with person counting and density estimation from single images. Recently, convolutional neural networks (CNNs) have been applied to these tasks. Even though better results are often reported, it is frequently unclear where the improvements come from and whether the proposed approaches would generalize. Thus, the main...
Chapter
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild. In contrast to previous methods, we make two main contributions: First, instead of comparing real-world images and synthetic renderings in the RGB or mask space, we compare them in a feature space optimized for 3D pose r...
Chapter
In this paper, a neural network is trained to perform simple arithmetic using images of concatenated handwritten digit pairs. A convolutional neural network was trained with images consisting of two side-by-side handwritten digits, where the image’s label is the summation of the two digits contained in the combined image. Crucially, the network was...
Conference Paper
Full-text available
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild. In contrast to previous methods, we make two main contributions: First, instead of comparing real-world images and synthetic renderings in the RGB or mask space, we compare them in a feature space optimized for 3D pose r...
Preprint
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild. In contrast to previous methods, we make two main contributions: First, instead of comparing real-world images and synthetic renderings in the RGB or mask space, we compare them in a feature space optimized for 3D pose r...
Preprint
To make Robotics and Augmented Reality applications robust to illumination changes, the current trend is to train a Deep Network with training images captured under many different lighting conditions. Unfortunately, creating such a training set is a very unwieldy and complex task. We therefore propose a novel illumination normalization method that...
Article
To make Robotics and Augmented Reality applications robust to illumination changes, the current trend is to train a Deep Network with training images captured under many different lighting conditions. Unfortunately, creating such a training set is a very unwieldy and complex task. We therefore propose a novel illumination normalization method that...
Preprint
In this paper, a neural network is trained to perform simple arithmetic using images of concatenated handwritten digit pairs. A convolutional neural network was trained with images consisting of two side-by-side handwritten digits, where the image's label is the summation of the two digits contained in the combined image. Crucially, the network was...
Article
Motivation: Image augmentation is a frequently used technique in computer vision and has seen increased interest since the rise of deep learning. Its usefulness is increasingly recognized because deep neural networks require large amounts of training data, and because in certain fields, such as biomedical imaging, large am...
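For orientation only, a generic augmentation pipeline in Python (torchvision is used here purely as an example; it is not the library described in this article, and the chosen transforms and parameters are illustrative assumptions):

    # Illustrative augmentation pipeline applied on-the-fly during training.
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),                # mirror the image
        transforms.RandomRotation(degrees=15),                 # small random rotations
        transforms.ColorJitter(brightness=0.2, contrast=0.2),  # photometric jitter
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random crop and rescale
        transforms.ToTensor(),
    ])
    # augmented = augment(pil_image)  # each epoch sees a slightly different image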
Preprint
Full-text available
We propose a novel method to efficiently estimate the spatial layout of a room from a single monocular RGB image. As existing approaches based on low-level feature extraction followed by vanishing point estimation are very slow and often unreliable in realistic scenarios, we build on semantic segmentation of the input image. To obtain better seg...
Preprint
Deep neural networks have paved the way for significant improvements in visual image categorization in recent years. However, even though the tasks vary widely in complexity and difficulty, existing solutions mostly build on the same architectural decisions. This also applies to the selection of activation functions (AFs), where...
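To make the design choice concrete, a small Python sketch that treats the activation function as an explicit hyperparameter of an otherwise fixed network (the candidate set and architecture are assumptions, not this paper's setup):

    # Hypothetical sketch: swap the activation function while keeping the architecture fixed.
    import torch.nn as nn

    ACTIVATIONS = {"relu": nn.ReLU, "elu": nn.ELU, "leaky_relu": nn.LeakyReLU,
                   "tanh": nn.Tanh, "silu": nn.SiLU}

    def make_mlp(in_dim, out_dim, hidden=128, act="relu"):
        Act = ACTIVATIONS[act]
        return nn.Sequential(nn.Linear(in_dim, hidden), Act(),
                             nn.Linear(hidden, hidden), Act(),
                             nn.Linear(hidden, out_dim))

    # e.g., train one model per activation and compare validation accuracy
    models = {name: make_mlp(784, 10, act=name) for name in ACTIVATIONS}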
Article
Full-text available
In this work, we introduce an end-to-end workflow for very high-resolution satellite-based mapping, building the basis for important 3D mapping products: (1) digital surface model, (2) digital terrain model, (3) normalized digital surface model and (4) ortho-rectified image mosaic. In particular, we describe all underlying principles for satellite-...
Conference Paper
Full-text available
We present a joint 3D pose and focal length estimation approach for object categories in the wild. In contrast to previous methods that predict 3D poses independently of the focal length or assume a constant focal length, we explicitly estimate and integrate the focal length into the 3D pose estimation. For this purpose, we combine deep learning te...
Conference Paper
Full-text available
We present Location Field Descriptors, a novel approach for single image 3D model retrieval in the wild. In contrast to previous methods that directly map 3D models and RGB images to an embedding space, we establish a common low-level representation in the form of location fields from which we compute pose invariant 3D shape descriptors. Location f...
Preprint
Full-text available
We present a joint 3D pose and focal length estimation approach for object categories in the wild. In contrast to previous methods that predict 3D poses independently of the focal length or assume a constant focal length, we explicitly estimate and integrate the focal length into the 3D pose estimation. For this purpose, we combine deep learning te...
Preprint
Full-text available
We present Location Field Descriptors, a novel approach for single image 3D model retrieval in the wild. In contrast to previous methods that directly map 3D models and RGB images to an embedding space, we establish a common low-level representation in the form of location fields from which we compute pose invariant 3D shape descriptors. Location f...
Conference Paper
Full-text available
In contrast to the fields of computer vision and photogrammetry, multiple view geometry has not been extensively exploited in the remote sensing domain so far. Therefore, an empirical study is conducted based on multi-view Pléiades data that depicts a scene from multiple orbits and multiple incidence angles. First, an accuracy analysis of the 2D an...
Conference Paper
Full-text available
In airborne photogrammetry, it is common practice to collect many images with high overlap in both the along-track and across-track directions when a highly accurate digital surface model is to be generated. Such highly redundant data aids the processing chain, as inaccuracies or gross outliers resulting from one stereo pair can be minimized or even corrected...
Article
Full-text available
The fields of machine learning and cognitive computing have been revolutionised over the last decade by neural-inspired algorithms (e.g., deep ANNs and deep RL) and brain-intelligent systems that assist in many real-world learning tasks, from robot monitoring and interaction at home to complex decision-making about emotions and behaviours in humans...
Article
Full-text available
In this study, we present an approach for fully automatic urinary bladder segmentation in CT images with artificial neural networks. Automatic medical image analysis has become an invaluable tool in the different treatment stages of diseases. Medical image segmentation in particular plays a vital role, since segmentation is often the initial step in an...
Article
Full-text available
The Pléiades satellite constellation provides very high resolution multi-spectral optical data at a ground sampling distance of about 0.7 m in nadir direction. Due to the highly agile pointing capability in the range of ±47 degrees, the sensors are optimal for detailed Earth observation. They are able to collect stereo and tri-stereo datasets in...
Preprint
Deep Neural Networks have been shown to be beneficial for a variety of tasks, in particular allowing for end-to-end learning and reducing the requirement for manual design decisions. However, many parameters still have to be chosen in advance, which also raises the need to optimize them. One important but often ignored system parameter is the selection...
Conference Paper
Full-text available
We propose a scalable, efficient and accurate approach to retrieve 3D models for objects in the wild. Our contribution is twofold. We first present a 3D pose estimation approach for object categories which significantly outperforms the state-of-the-art on Pascal3D+. Second, we use the estimated pose as a prior to retrieve 3D models which accuratel...
Article
Full-text available
We propose a scalable, efficient and accurate approach to retrieve 3D models for objects in the wild. Our contribution is twofold. We first present a 3D pose estimation approach for object categories which significantly outperforms the state-of-the-art on Pascal3D+. Second, we use the estimated pose as a prior to retrieve 3D models which accurately...
Article
Full-text available
Digital pathology is not only one of the most promising fields of diagnostic medicine, but at the same time a hot topic for fundamental research. Digital pathology is not just the transfer of histopathological slides into digital representations. The combination of different data sources (images, patient records, and *omics data) together with curr...
Chapter
Full-text available
During the last decade, pathology has benefited from the rapid progress of image digitizing technologies, which led to the development of scanners capable of producing so-called Whole Slide Images (WSI). These can be explored by a pathologist on a computer screen comparably to a conventional microscope and can be used for diagnostics, research, arch...
Conference Paper
In this paper, we present a new 3D tracking approach for self-localization in urban environments. In particular, we build on existing tracking approaches (i.e., visual odometry tracking and SLAM), additionally using the information provided by 2.5D maps of the environment. Since this combination is not straightforward, we adopt ideas from semantic...
Conference Paper
Full-text available
We propose a method for accurate camera pose estimation in urban environments from single images and 2D maps made of the surrounding buildings’ outlines. Our approach bridges the gap between learning-based approaches and geometric approaches: We use recent semantic segmentation techniques for extracting the buildings’ edges and the façades’ normals...
Article
Full-text available
To make object detection in images robust to illumination changes, the current trend is to train a Deep Network with training images captured under many different lighting conditions. Unfortunately, creating such a training set is very cumbersome, or sometimes even impossible, for some applications such as 3D pose estimation of specific objects...
Conference Paper
We present an efficient method for geolocalization in urban environments starting from a coarse estimate of the location provided by a GPS and using a simple untextured 2.5D model of the surrounding buildings. Our key contribution is a novel efficient and robust method to optimize the pose: We train a Deep Network to predict the best direction to i...
Conference Paper
Full-text available
Super-resolution addresses the problem of image upscaling by reconstructing high-resolution output images from low-resolution input images. One successful approach for this problem is based on random forests. However, this approach has a large memory footprint, since complex models are required to achieve high accuracy. To overcome this drawback,...
Conference Paper
Large projection screens are increasingly present for everyday use at work. This paper introduces a new, reliable system that utilizes the light spot emitted by a laser pointer device, a camera, and a projection screen for interaction between human and computer. A camera is placed in a presentation room at a fixed but unknown distance and directed to th...
Conference Paper
Tracking multiple objects in parallel is a difficult task, especially if instances are interacting and occluding each other. To alleviate the arising problems, multiple camera views can be taken into account, which, however, increases the computational effort. Given the need for very efficient methods, often rather simple approaches such as backgr...
Conference Paper
Full-text available
Robust multi-object tracking-by-detection requires the correct assignment of noisy detection results to object trajectories. We address this problem by proposing an online approach based on the observation that object detectors primarily fail if objects are significantly occluded. In contrast to most existing work, we only rely on geometric informa...
Conference Paper
In this paper, we present a novel object detection approach that is capable of regressing the aspect ratio of objects. This results in accurately predicted bounding boxes having high overlap with the ground truth. In contrast to most recent works, we employ a Random Forest for learning a template-based model but exploit the nature of this learning...
Chapter
Full-text available
Recently, Mahalanobis metric learning has gained a considerable interest for single-shot person re-identification. The main idea is to build on an existing image representation and to learn a metric that reflects the visual camera-to-camera transitions, allowing for a more powerful classification. The goal of this chapter is twofold. We first revie...
Conference Paper
Full-text available
Semi-supervised learning has recently been demonstrated to be successful in large-scale learning for image classification tasks. Laplacian Support Vector Machines (LapSVM) are one such approach applied to this task. However, LapSVM uses a squared hinge loss function for the labeled examples, which is not twice differentiable and may penalize noisy labe...
Conference Paper
We present Alternating Regression Forests (ARFs), a novel regression algorithm that learns a Random Forest by optimizing a global loss function over all trees. This interrelates the information of single trees during the training phase and results in more accurate predictions. ARFs can minimize any differentiable regression loss without sacrificing...
Conference Paper
Full-text available
In this paper, we raise important issues concerning the evaluation complexity of existing Mahalanobis metric learning methods. The complexity scales linearly with the size of the dataset. This is especially cumbersome at large scale or for real-time applications with a limited time budget. To alleviate this problem, we propose to represent the dataset...
Conference Paper
In this paper, we present a novel formulation of Random Forests, which introduces order statistics into the splitting functions of nodes. Order statistics, in general, neglect the absolute values of single feature dimensions and just consider the ordering of different feature dimensions. Recent works showed that such statistics have more discrimina...
Conference Paper
In this paper, we address the problem of efficient k-NN classification, in particular in the context of Mahalanobis metric learning. Mahalanobis metric learning has recently demonstrated competitive results for a variety of tasks. However, such approaches have two main drawbacks. First, learning metrics often requires solving complex and thus computa...
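One standard trick in this context (a general identity, not necessarily the method of this paper): a Mahalanobis metric M = LᵀL satisfies d_M(x, y) = ||Lx − Ly||₂, so projecting the data once with L reduces metric-based k-NN to plain Euclidean k-NN. A Python sketch with a placeholder projection:

    # Hedged sketch: k-NN under a learned Mahalanobis metric via a one-time projection.
    # L is random here for illustration; in practice it would come from a metric
    # learning method (e.g., LMNN or KISSME).
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 16))
    y_train = rng.integers(0, 5, size=500)
    L = np.eye(16) + 0.1 * rng.normal(size=(16, 16))   # stand-in for a learned projection

    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train @ L.T, y_train)                    # Euclidean k-NN in the projected space

    X_test = rng.normal(size=(10, 16))
    print(knn.predict(X_test @ L.T))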
Conference Paper
Full-text available
The development of complex, powerful classifiers and their constant improvement have contributed much to the progress in many fields of computer vision. However, the trend towards large scale datasets revived the interest in simpler classifiers to reduce runtime. Simple nearest neighbor classifiers have several beneficial properties, such as low co...
Conference Paper
This paper introduces a novel classification method termed Alternating Decision Forests (ADFs), which formulates the training of Random Forests explicitly as a global loss minimization problem. During training, the losses are minimized by keeping an adaptive weight distribution over the training samples, similar to Boosting methods. In order to ke...
Conference Paper
Full-text available
Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept...
Article
In this paper, we present a novel fusion framework to combine the diverse outputs of arbitrary trackers, which are typically not directly combinable, allowing the tracking quality to be significantly increased. Our main idea is to first transform individual tracking outputs such as motion inliers, bounding boxes, or specific target image features to a...
Conference Paper
Unsupervised object discovery is the task of finding recurring objects over an unsorted set of images without any human supervision, which becomes more and more important as the amount of visual data grows exponentially. Existing approaches typically build on still images and rely on different prior knowledge to yield accurate results. In contrast,...
Chapter
Online learning has shown to be successful in tracking-by-detection of previously unknown objects. However, most approaches are limited to a bounding box representation with fixed aspect ratio and cannot handle highly non-rigid and articulated objects. Moreover, they provide only a limited foreground/background separation, which in turn, increases...
Chapter
The most successful approach for object detection is still to apply a sliding window technique, where a pre-trained classifier is evaluated at different locations and scales. In this chapter, we interrogate this strategy in the context of stationary environments. In particular, with a fixed camera position observing the same scene, a lot of prior...
Conference Paper
Full-text available
In this paper, we introduce a formulation for the task of detecting objects based on the information gathered from a standard Implicit Shape Model (ISM). We describe a probabilistic approach in a general random field setting, which enables the effective detection of object instances and additionally identifies all local patches contributing to the diffe...
Conference Paper
Full-text available
Object detection and segmentation are two challenging tasks in computer vision, which are usually considered independent steps. In this paper, we propose a framework which jointly optimizes for both tasks and implicitly provides detection hypotheses and corresponding segmentations. Our novel approach can be attached to any of the available general...
Conference Paper
In this paper, we focus on human activity detection, which solves detection, tracking, and recognition jointly. Existing methods typically use off-the-shelf approaches for detection and tracking, ignoring naturally given prior knowledge. Hence, in this work we present a novel strategy for learning activity-specific motion models by feature-to-te...
Conference Paper
Full-text available
Matching persons across non-overlapping cameras is a rather challenging task. Thus, successful methods often build on complex feature representations or sophisticated learners. A recent trend to tackle this problem is to use metric learning to find a suitable space for matching samples from different cameras. However, most of these approaches ignor...
Conference Paper
Full-text available
One central task in many visual surveillance scenarios is person re-identification, i.e., recognizing an individual person across a network of spatially disjoint cameras. Most successful recognition approaches are either based on direct modeling of the human appearance or on machine learning. In this work, we aim at taking advantage of both directi...