Bill Triggs's research while affiliated with Laboratoire d'Informatique de Grenoble and other places

Publications (111)

Article
We describe an efficient approach to visual object detection that uses short cascades of asymmetric ‘one class’ classifiers to quickly reject negatives (windows not centered on an object of the desired class) within a sliding window framework. Current detectors typically use binary discriminants such as Support Vector Machines or Boosting to implem...
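A toy illustration of the early-rejection idea, under stated assumptions. Each stage below is a hypothetical one-class model: a hypersphere around positive training windows. A window is discarded by the first stage that places it outside its sphere, so most background windows cost only one or two distance evaluations. The `SphereStage` class, its fields and the use of plain Euclidean hyperspheres are illustrative choices. They are not the classifiers or training procedure of the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SphereStage:
    """Hypothetical one-class stage: accepts a window whose feature vector
    lies within `radius` of `center` (a hypersphere fitted to positives)."""
    center: Sequence[float]
    radius: float

    def accepts(self, x: Sequence[float]) -> bool:
        # Euclidean distance to the sphere centre (math.dist needs Python 3.8+)
        return math.dist(x, self.center) <= self.radius


def cascade_accepts(stages: List[SphereStage], x: Sequence[float]) -> bool:
    """Run the stages in order. The first stage that rejects ends the
    evaluation early, which is where the speed-up on negatives comes from."""
    for stage in stages:
        if not stage.accepts(x):
            return False
    return True
```

With `stages = [SphereStage([0.0, 0.0], 2.0), SphereStage([0.5, 0.0], 1.0)]`, the window `[3.0, 0.0]` is rejected by the first stage and never reaches the second.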
Article
We introduce a large margin linear binary classification framework that approximates each class with a hyperdisk – the intersection of the affine support and the bounding hypersphere of its training samples in feature space – and then finds the linear classifier that maximizes the margin separating the two hyperdisks. We contrast this with Support...
Conference Paper
In this paper, we consider face detection along with facial landmark localization, inspired by recent studies showing that incorporating object parts improves detection accuracy. To this end, we train roots and parts detectors where the roots detector returns candidate image regions that cover the entire face, and the parts detector searches...
Conference Paper
Features such as Local Binary Patterns (LBP) and Local Ternary Patterns (LTP) have been very successful in a number of areas including texture analysis, face recognition and object detection. They are based on the idea that small patterns of qualitative local gray-level differences contain a great deal of information about higher-level image conten...
Conference Paper
An object detector must detect and localize each instance of the object class of interest in the image. Many recent detectors adopt a sliding window approach, reducing the problem to one of deciding whether the detection window currently contains a valid object instance or background. Machine learning based discriminants such as SVM and boosting ar...
Article
We introduce the hierarchical Markov aspect model (HMAM), a computationally efficient graphical model for densely labeling large remote sensing images with their underlying terrain classes. HMAM resolves local ambiguities efficiently by combining the benefits of quadtree representations and aspect models: the former incorporate multiscale visual fea...
Article
This article presents an approach for modeling landmarks based on large-scale, heavily contaminated image collections gathered from the Internet. Our system efficiently combines 2D appearance and 3D geometric constraints to extract scene summaries and ...
Article
This paper introduces a geometrically inspired large margin classifier that can be a better alternative to support vector machines (SVMs) for classification problems with a limited number of training samples. In contrast to the SVM classifier, we approximate classes with affine hulls of their class samples rather than convex hulls. For any pa...
Article
When training samples are scarce in high-dimensional classification problems, sparse scatters of samples tend to have many ‘holes’: regions that have few or no nearby training samples from the class. When such regions lie close to inter-class boundaries, the nearest neighbors of a query may lie in the wrong class, thus leading to errors in the N...
Conference Paper
In this paper, we present a fast approach to obtain semantic scene segmentation with high precision. We employ a two-stage classifier to label all image pixels. First, we use regularized logistic regression to combine different appearance-based features and the improved spatial layout of labeling information. In the second stage, we incorporate...
Conference Paper
We describe a new multiscale keypoint detector and a set of local visual descriptors, both based on the efficient Dual-Tree Complex Wavelet Transform. The detector has properties and performance similar to multiscale Förstner-Harris detectors. The descriptor provides efficient rotation-invariant matching. We evaluate the method, comparing it to a p...
Conference Paper
We describe a family of object detectors that provides state-of-the-art error rates on several important datasets including INRIA people and PASCAL VOC'06 and VOC'07. The method builds on a number of recent advances. It uses the Latent SVM learning framework and a rich visual feature set that incorporates Histogram of Oriented Gradient, Local Binar...
Conference Paper
We introduce a novel method for face recognition from image sets. In our setting each test and training example is a set of images of an individual's face, not just a single image, so recognition decisions need to be based on comparisons of image sets. Methods for this have two main aspects: the models used to represent the individual image sets; a...
Article
The explosion of the Internet provides us with a tremendous resource of images shared online. It also confronts vision researchers with the problem of finding effective methods to navigate the vast amount of visual information. Semantic image understanding ...
Article
Making recognition more reliable under uncontrolled lighting conditions is one of the most important challenges for practical face recognition systems. We tackle this by combining the strengths of robust illumination normalization, local texture-based face representations, distance transform based matching, kernel-based feature extraction and multi...
Conference Paper
We propose large margin classifiers that are sometimes better than Support Vector Machines (SVMs) for high-dimensional classification problems with limited numbers of training samples. The basic idea is to approximate each class with a convex model of some form based on its training samples. For any pair of models of this form, there is a correspon...
Article
Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labeling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labeling it is important to capture the global co...
Article
Scene segmentation and semantic labeling of Synthetic Aperture Radar (SAR) images is one of the key problems in interpreting SAR data. In this paper, a new approach for semantic labeling of SAR imagery is proposed based on the hierarchical Markov aspect model (HMAM) with weak supervision. The motivation for this work is to incorporate the multiscale...
Conference Paper
In high-dimensional classification problems it is infeasible to include enough training samples to cover the class regions densely. Irregularities in the resulting sparse sample distributions cause local classifiers such as Nearest Neighbors (NN) and kernel methods to have irregular decision boundaries. One solution is to "fill in the holes" by bui...
Conference Paper
Nearest neighbour classifiers and related kernel methods often perform poorly in high-dimensional problems because it is infeasible to include enough training samples to cover the class regions densely. In such cases, test samples often fall into gaps between training samples where the nearest neighbours are too distant to be good indicator...
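The ‘fill in the holes’ idea can be sketched with a local affine hull. Instead of the distance to the single nearest training sample, the rule measures the distance from the query to the affine hull of its k nearest samples in each class. The sketch below is a minimal pure-Python version of that generic local-affine-hull rule, using Gram-Schmidt on difference vectors. The function names are hypothetical, and the sketch does not reproduce the specific convex-hull or hyperdisk models of these papers.

```python
import math
from typing import Dict, List, Sequence


def _sub(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def affine_hull_distance(q: Sequence[float], pts: List[Sequence[float]],
                         eps: float = 1e-12) -> float:
    """Distance from q to the affine hull of pts (the smallest affine
    subspace containing them), via modified Gram-Schmidt on pts[i] - pts[0]."""
    origin = list(pts[0])
    basis: List[List[float]] = []
    for p in pts[1:]:
        v = _sub(p, origin)
        for b in basis:  # remove components along the directions found so far
            c = _dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        n = math.sqrt(_dot(v, v))
        if n > eps:  # skip (near-)linearly dependent directions
            basis.append([vi / n for vi in v])
    r = _sub(q, origin)
    for b in basis:  # subtract the projection onto the hull's direction space
        c = _dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(_dot(r, r))


def local_hull_classify(q: Sequence[float],
                        classes: Dict[str, List[Sequence[float]]], k: int) -> str:
    """Assign q to the class whose k nearest samples span the closest affine hull.
    With k = 1 this reduces to ordinary nearest-neighbour classification."""
    best_label, best_d = None, math.inf
    for label, samples in classes.items():
        nearest = sorted(samples, key=lambda s: math.dist(q, s))[:k]
        d = affine_hull_distance(q, nearest)
        if d < best_d:
            best_label, best_d = label, d
    return best_label
```

Consider two classes. Class "a" has samples `[0,0,0], [1,0,0], [0,1,0], [2,2,0]`, all in the plane z = 0. Class "b" has samples `[0,0,5], [1,0,5], [0,1,5], [3,3,5]`, all in the plane z = 5. The query `[10, 10, 1]` is nearest to the class "b" sample `[3,3,5]`, so `k = 1` returns "b". With `k = 3` the rule returns "a", because the local hull of class "a" lies at distance 1 from the query and that of class "b" at distance 4.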
Conference Paper
Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labelling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labelling it is important to capture the global...
Conference Paper
Extending recognition to uncontrolled situations is a key challenge for practical face recognition systems. Finding efficient and discriminative facial appearance descriptors is crucial for this. Most existing approaches use features of just one type. Here we argue that robust recognition requires several different kinds of appearance information t...
Conference Paper
Considerable advances have been made in learning to recognize and localize visual object classes. Simple bag-of-features approaches label each pixel or patch independently. More advanced models attempt to improve the coherence of the labellings by introducing some form of inter-patch coupling: traditional spatial models such as MRFs provide cris...
Article
Many face recognition algorithms have been developed over the past few years, but the problem remains challenging, especially for images taken under uncontrolled lighting conditions. We show that the robustness of several popular linear subspace methods and of Local Binary Patterns (LBP) can be substantially improved by including a very simple ima...
Conference Paper
Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting "visual word" codes over the image, and classifying these with a conventional classi...
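The quantize-and-accumulate step described above is short enough to show directly. This sketch assumes the codebook has already been learned, for example by k-means (not shown). It uses hard nearest-codeword assignment and L1-normalized counts, omits the final classifier stage, and uses illustrative function names.

```python
import math
from typing import List, Sequence


def quantize(desc: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Index of the nearest codeword (hard vector quantization)."""
    return min(range(len(codebook)), key=lambda j: math.dist(desc, codebook[j]))


def bof_histogram(descs: List[Sequence[float]], codebook: List[Sequence[float]],
                  normalize: bool = True) -> List[float]:
    """Accumulate a 'visual word' histogram over one image's descriptors."""
    hist = [0.0] * len(codebook)
    for d in descs:
        hist[quantize(d, codebook)] += 1.0
    if normalize and descs:
        total = sum(hist)
        hist = [h / total for h in hist]  # L1 normalization: entries sum to 1
    return hist
```

With the toy codebook `[[0, 0], [10, 0], [0, 10]]`, the descriptors `[[1, 1], [9, 1], [0.5, -1], [1, 9]]` are assigned to words 0, 1, 0 and 2. The normalized histogram is `[0.5, 0.25, 0.25]`.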
Article
Young and van Vliet have designed computationally efficient methods for approximating Gaussian-based convolutions by running a recursive infinite-impulse-response (IIR) filter forward over the input signal, then running a second IIR filter backward over the first filter's output. To transition between the two filters, they use a suboptimal heuristi...
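The two-pass structure can be sketched as follows. The coefficient formulas are the ones commonly quoted from Young and van Vliet's 1995 recursive Gaussian paper. The sketch deliberately uses a naive initialization, starting each pass from the steady state of a constant signal equal to its first input sample. It therefore shows the forward/backward scheme but not the improved transition between the two filters that the abstract refers to. The function names are illustrative.

```python
import math
from typing import List, Tuple


def yvv_coefficients(sigma: float) -> Tuple[float, float, float, float]:
    """Normalized coefficients (B, a1, a2, a3) of the third-order recursive
    Gaussian approximation of Young and van Vliet, intended for sigma >= 0.5."""
    if sigma >= 2.5:
        q = 0.98711 * sigma - 0.96330
    else:
        q = 3.97156 - 4.14554 * math.sqrt(1.0 - 0.26891 * sigma)
    b0 = 1.57825 + 2.44413 * q + 1.4281 * q**2 + 0.422205 * q**3
    b1 = 2.44413 * q + 2.85619 * q**2 + 1.26661 * q**3
    b2 = -(1.4281 * q**2 + 1.26661 * q**3)
    b3 = 0.422205 * q**3
    B = 1.0 - (b1 + b2 + b3) / b0  # makes the DC gain of each pass exactly 1
    return B, b1 / b0, b2 / b0, b3 / b0


def _iir_pass(x: List[float], B: float, a1: float, a2: float, a3: float) -> List[float]:
    """One causal pass y[i] = B*x[i] + a1*y[i-1] + a2*y[i-2] + a3*y[i-3].
    The three 'past' outputs start at x[0], the steady state for a constant
    signal: a naive heuristic, not an exact boundary treatment."""
    out = []
    p1 = p2 = p3 = x[0]
    for v in x:
        y = B * v + a1 * p1 + a2 * p2 + a3 * p3
        out.append(y)
        p1, p2, p3 = y, p1, p2
    return out


def recursive_gaussian(x: List[float], sigma: float) -> List[float]:
    """Forward pass over x, then a backward pass over the forward output."""
    if not x:
        return []
    coeffs = yvv_coefficients(sigma)
    forward = _iir_pass(x, *coeffs)
    backward = _iir_pass(forward[::-1], *coeffs)  # reverse, filter, reverse back
    return backward[::-1]
```

Applied to a unit impulse far from the boundaries, the cascade of the two passes gives a symmetric, approximately Gaussian response whose samples sum to approximately 1. The cost per sample is independent of sigma.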
Conference Paper
Bag-of-features representations have recently become popular for content based image classification owing to their simplicity and good performance. They evolved from texton methods in texture analysis. The basic idea is to treat images as loose collections of independent patches, sampling a representative set of patches from the image, evaluating a...
Conference Paper
Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminant and have good resistance to local occlusions and to geometric and photometric variations, but they are not able to exploit spatial co-occurrence statistics at scales larger than their local input patches. We present a new m...
Conference Paper
Detecting humans in films and videos is a challenging problem owing to the motion of the subjects, the camera and the background and to variations in pose, appearance, clothing, illumination and background clutter. We develop a detector for standing and moving people in videos with possibly moving cameras and backgrounds, testing several differ...
Article
Methods for reconstruction and camera estimation from minimal data are often used to bootstrap robust (RANSAC and LMS) and optimal (bundle adjustment) structure and motion estimates. Minimal methods are known for projective reconstruction from two or more uncalibrated images, and for “5 point” relative orientation and Euclidean reconstruction from...
Article
Sequential random sampling (‘Markov Chain Monte-Carlo’) is a popular strategy for many vision problems involving multi-modal distributions over high-dimensional parameter spaces. It applies both to importance sampling (where one wants to sample points according to their ‘importance’ for some calculation, but otherwise fairly) and to global-optimiza...
Article
We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labeling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silho...
Conference Paper
Recovering the pose of a person from single images is a challenging problem. This paper discusses a bottom-up approach that uses local image features to estimate human upper body pose from single images in cluttered backgrounds. The method represents the image window with a dense grid of local gradient orientation histograms, followed by non-negativ...
Conference Paper
Visual codebook based quantization of robust appearance descriptors extracted from local image patches is an effective means of capturing image statistics for texture analysis and scene classification. Codebooks are usually constructed by using a method such as k-means to cluster the descriptor vectors of patches sampled either densely ('textons')...
Conference Paper
We address 3D human motion capture from monocular images, taking a learning based approach to construct a probabilistic pose estimation model from a set of labelled human silhouettes. To compensate for ambiguities in the pose reconstruction problem, our model explicitly calculates several possible pose hypotheses. It uses locality on a manifold in...
Conference Paper
This article introduces the absolute quadratic complex formed by all lines that intersect the absolute conic. If ω denotes the 3 × 3 symmetric matrix representing the image of that conic under the action of a camera with projection matrix P, it is shown that $\omega \simeq \tilde{P}\,\Omega\,\tilde{P}^{T}$, where $\tilde{P}$ is the 3 × 6 line projection matrix as...
Conference Paper
We propose a generative model that codes the geometry and appearance of generic visual object categories as a loose hierarchy of parts, with probabilistic spatial relations linking parts to subparts, soft assignment of subparts to parts, and scale invariant keypoint based local features at the lowest level of the hierarchy. The method is designed t...
Conference Paper
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection....
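As a simplified sketch of the core ingredient, the code below computes a magnitude-weighted histogram of unsigned gradient orientations over one cell. It uses centred [-1, 0, 1] differences and 9 bins over 0-180 degrees, a common HOG setting. It deliberately omits vote interpolation, overlapping blocks and block normalization, so it illustrates the idea rather than reimplementing the paper's full descriptor. The function name is hypothetical.

```python
import math
from typing import List


def cell_orientation_histogram(img: List[List[float]], nbins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0-180 degrees) over the interior pixels of a small grayscale cell.

    Simplifications of this sketch: centred [-1, 0, 1] differences, hard
    assignment to the bin containing the angle, border pixels skipped,
    and no block grouping or normalization."""
    hist = [0.0] * nbins
    bin_width = 180.0 / nbins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal derivative
            gy = img[y + 1][x] - img[y - 1][x]  # vertical derivative (rows grow downwards)
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to [0, 180)
            # "% nbins" guards against an angle that rounds up to exactly 180.0
            hist[int(angle // bin_width) % nbins] += mag
    return hist
```

A 5 × 5 cell containing a vertical step edge puts all of its gradient mass into bin 0, the horizontal-gradient bin. The same step rotated to horizontal lands in the bin containing 90 degrees.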
Conference Paper
The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we...
Article
We propose a hierarchical generative model for coding the geometry and appearance of visual object categories. The model is a collection of loosely connected parts containing more rigid assemblies of subparts. It is optimized for domains where there are relatively large numbers of somewhat informative subparts, such as the features returned by loca...
Article
We describe a sparse Bayesian regression method for recovering 3D human body motion directly from silhouettes extracted from monocular video sequences. No detailed body shape model is needed, and realism is ensured by training on real human motion capture data. The tracker estimates 3D body pose by using Relevance Vector Machine regression to combi...
Article
Given any generative classifier based on an inexact density model, we can define a discriminative counterpart that reduces its asymptotic error rate. We introduce a family of classifiers that interpolate the two approaches, thus providing a new way to compare them and giving an estimation procedure whose classification performance is well balanced...
Article
We present a method for recovering 3D human body motion from monocular video sequences based on a robust image matching metric, incorporation of joint limits and non-self-intersection constraints, and a new sample-and-refine search strategy guided by rescaled cost-function covariances. Monocular 3D body tracking is challenging: besides the difficul...
Article
Becoming trapped in suboptimal local minima is a perennial problem when optimizing visual models, particularly in applications like monocular human body tracking where complicated parametric models are repeatedly fitted to ambiguous image measurements. We show that trapping can be significantly reduced by building ‘roadmaps’ of nearby minima linked...
Conference Paper
We describe a learning based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silho...
Article
A major difficulty for 3D human body tracking from monocular image sequences is the near non-observability of kinematic degrees of freedom that generate motion in depth. For known link (body segment) lengths, the strict non-observabilities reduce to twofold ‘forwards/backwards flipping’ ambiguities for each link. These imply formal inverse kinematics solutions for the full model, and hence linked groups of...
Conference Paper
We present a novel approach to modelling the non-linear and time-varying dynamics of human motion, using statistical methods to capture the characteristic motion patterns that exist in typical human activities. Our method is based on automatically clustering the body pose space into connected regions exhibiting similar dynamical characteristics, mo...
Conference Paper
Local feature approaches to vision geometry and object recognition are based on selecting and matching sparse sets of visually salient image points, known as 'keypoints' or 'points of interest'. Their performance depends critically on the accuracy and reliability with which corresponding keypoints can be found in subsequent images. Among the many e...
Article
We present a novel approach to modelling the non-linear and time-varying dynamics of human motion, using statistical methods to capture the characteristic motion patterns that exist in typical human activities. Our method is based on automatically clustering the body pose space into connected regions exhibiting similar dynamical characteristics,...
Article
We describe a learning based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silh...
Conference Paper
A major difficulty for 3D (three-dimensional) human body tracking from monocular image sequences is the near nonobservability of kinematic degrees of freedom that generate motion in depth. For known link (body segment) lengths, the strict nonobservabilities reduce to twofold ‘forwards/backwards flipping’ ambiguities for each link. These imply 2^...
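The twofold ambiguity can be made concrete with a small calculation. The sketch assumes scaled orthographic projection with a known scale s, a simplification in place of perspective projection. Under that model, a link of known 3D length L whose image has length l can have its far end displaced in depth by either +sqrt(L² - (l/s)²) or -sqrt(L² - (l/s)²). A chain of n links therefore has 2^n depth configurations that project identically. The function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def link_depth_offsets(length: float, image_length: float,
                       scale: float = 1.0) -> Tuple[float, float]:
    """Both depth offsets of a link's far end relative to its near end under
    scaled orthographic projection (image = scale * world x, y): the link
    points towards or away from the camera by the same amount."""
    planar = image_length / scale  # in-plane extent in world units
    if planar > length:
        raise ValueError("projected length exceeds the known link length")
    dz = math.sqrt(length * length - planar * planar)
    return (dz, -dz)


def chain_depth_hypotheses(lengths: Sequence[float], image_lengths: Sequence[float],
                           scale: float = 1.0) -> List[List[float]]:
    """Cumulative joint depths (relative to the root) for every combination
    of per-link flips: 2**n hypotheses for a chain of n links."""
    per_link = [link_depth_offsets(L, l, scale) for L, l in zip(lengths, image_lengths)]
    hypotheses = []
    for choice in itertools.product(*per_link):
        depths, z = [], 0.0
        for dz in choice:
            z += dz
            depths.append(z)
        hypotheses.append(depths)
    return hypotheses
```

For example, `chain_depth_hypotheses([5.0, 5.0], [3.0, 4.0])` returns four hypotheses, `[[4.0, 7.0], [4.0, 1.0], [-4.0, -1.0], [-4.0, -7.0]]`. Under this projection model, all four produce the same image.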
Conference Paper
Detecting people in images is a key problem for video indexing, browsing and retrieval. The main difficulties are the large appearance variations caused by action, clothing, illumination, viewpoint and scale. Our goal is to find people in static video frames using learned models of both the appearance of body parts (head, limbs, hands), and of the...
Conference Paper
Sequential random sampling (‘Markov Chain Monte-Carlo’) is a popular strategy for many vision problems involving multimodal distributions over high-dimensional parameter spaces. It applies both to importance sampling (where one wants to sample points according to their ‘importance’ for some calculation, but otherwise fairly) and to global optimiza...
Conference Paper
Getting trapped in suboptimal local minima is a perennial problem in model based vision, especially in applications like monocular human body tracking where complex nonlinear parametric models are repeatedly fitted to ambiguous image data. We show that the trapping problem can be attacked by building ‘roadmaps’ of nearby minima linked by transition...
Article
We study the low-level problem of predicting pixel intensities after subpixel image translations. This is a basic subroutine for image warping and super-resolution, and it has a critical influence on the accuracy of subpixel matching by image correlation. Rather than using traditional frequency-space filtering theory or ad hoc interpolators such as...
Conference Paper
The topic of this first panel session was algorithms and computations. Bill Triggs chaired the discussion and David Nister, Kenichi Kanatani, Jean Ponce and Zhengyou Zhang also participated. Each panelist discussed the issues that he felt were going to be important in the future. The panel session was followed by some questions and discussions whic...
Article
We study the problem of articulated 3D human motion tracking in monocular video sequences. Addressing problems related to unconstrained scene structure, uncertainty, and the high-dimensional parameter spaces required for human modeling, we present a novel, layered-robust, multiple hypothesis algorithm for estimating the distribution of the model pa...
Conference Paper
We present a method for recovering 3D human body motion from monocular video sequences using robust image matching, joint limits and non-self-intersection constraints, and a new sample-and-refine search strategy guided by rescaled cost-function covariances. Monocular 3D body tracking is challenging: for reliable tracking at least 30 joint parameter...
Conference Paper
We introduce ‘Joint Feature Distributions’, a general statistical framework for feature based multi-image matching that explicitly models the joint probability distributions of corresponding features across several images. Conditioning on feature positions in some of the images gives well-localized distributions for their correspondents in the othe...
Conference Paper
We study the low-level problem of predicting pixel intensities after subpixel image translations. This is a basic subroutine for image warping and super-resolution, and it has a critical influence on the accuracy of subpixel matching by image correlation. Rather than using traditional frequency-space filtering theory or ad hoc interpolators such as...
Article
We develop a probabilistic framework for feature based multi-image matching that explicitly models the joint distribution of corresponding feature positions across several images. Conditioning this distribution on feature positions in some of the images gives well-localized distributions for their correspondents in the others, which directly guide t...
Article
We study the low-level problem of predicting pixel intensities after subpixel image translations. This is a basic subroutine for image warping and super-resolution, and it has a critical influence on the accuracy of correlation-based subpixel image matching. Rather than using traditional frequency-space filtering theory, we take an empirical approa...
Article
Auto-calibration is the recovery of the full camera geometry and Euclidean scene structure from several images of an unknown 3D scene, using rigidity constraints and partial knowledge of the camera intrinsic parameters. It fails for certain special classes of camera motion. This paper derives necessary and sufficient conditions for unique auto-cali...
Article
This report describes a library of C routines for finding the relative pose of two calibrated perspective cameras given the images of five unknown 3D points. The relative pose is the translational and rotational displacement between the two camera frames, also called camera motion and relative orientation.
Article
This paper is a survey of the theory and methods of photogrammetric bundle adjustment, aimed at potential implementors in the computer vision community. Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal structure and viewing parameter estimates. Topics covered include: the choice of cost function and ro...
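To make the central object of the survey concrete: bundle adjustment minimizes a sum of (possibly robustified) reprojection errors jointly over camera and point parameters. The sketch below only evaluates such a cost, with an optional Huber loss. It assumes a simple pinhole model with a rotation matrix, a translation, a focal length, no distortion, and the principal point at the origin. The parametrization and names are assumptions for illustration. The sparse Gauss-Newton or Levenberg-Marquardt machinery that actually minimizes the cost is not shown.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Camera = Tuple[Matrix3, Sequence[float], float]  # (R, t, f): assumed parametrization


def project(R: Matrix3, t: Sequence[float], f: float,
            X: Sequence[float]) -> Tuple[float, float]:
    """Pinhole projection: Xc = R X + t, image point = f * (Xc[0], Xc[1]) / Xc[2]."""
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return f * Xc[0] / Xc[2], f * Xc[1] / Xc[2]


def huber(r: float, k: float) -> float:
    """Huber cost of a residual norm r: quadratic up to k, linear beyond."""
    return 0.5 * r * r if r <= k else k * (r - 0.5 * k)


def reprojection_cost(cameras: List[Camera], points: List[Sequence[float]],
                      observations: List[Tuple[int, int, float, float]],
                      k: Optional[float] = None) -> float:
    """Sum over observations (camera index, point index, u, v) of the
    image-plane reprojection cost: 0.5 * r**2 by default, Huber if k is given."""
    total = 0.0
    for c, p, u, v in observations:
        R, t, f = cameras[c]
        pu, pv = project(R, t, f, points[p])
        r = math.hypot(pu - u, pv - v)
        total += 0.5 * r * r if k is None else huber(r, k)
    return total
```

With an identity camera at the origin and f = 1, the point `[1, 2, 4]` projects to `(0.25, 0.5)`. An observation 5 pixels away costs 12.5 under least squares but only 4.5 under a Huber loss with k = 1. This reduced influence of outliers is the reason robust cost functions are used.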
Conference Paper
We study the special form that the general multi-image tensor formalism takes under the plane + parallax decomposition, including matching tensors and constraints, closure and depth recovery relations, and inter-tensor consistency constraints. Plane + parallax alignment greatly simplifies the algebra, and uncovers the underlying geometric content...
Article
Camera pose estimation is the problem of determining the position and orientation of an internally calibrated camera from known 3D reference points and their images. We briefly survey several existing methods for pose estimation, then introduce four new linear algorithms. The first three give a unique linear solution from four points by SVD null...
Conference Paper
This paper presents techniques and animations developed from 1991 to 2000 that use digital photographs of the real world to create 3D models, virtual camera moves, and realistic computer animations. In these projects, images are used to determine the ...
Article
We investigate the motions that lead to ambiguous Euclidean scene reconstructions under several common calibration constraints, giving a complete description of such critical motions for: (i) internally calibrated orthographic and perspective cameras; (ii) in two images, for cameras with unknown focal lengths, either different or equal. One aim of...
Conference Paper
We describe two direct quasilinear methods for camera pose (absolute orientation) and calibration from a single image of 4 or 5 known 3D points. They generalize the 6 point ‘Direct Linear Transform’ method by incorporating partial prior camera knowledge, while still allowing some unknown calibration parameters to be recovered. Only linear algebra i...
Conference Paper
We investigate the motions that lead to ambiguous Euclidean scene reconstructions under several common calibration constraints, giving a complete description of such critical motions for: (i) internally calibrated orthographic and perspective cameras; (ii) in two images, for cameras with unknown focal lengths, either different or equal. One aim of...
Conference Paper
We introduce a finite difference expansion for closely spaced cameras in projective vision, and use it to derive differential analogues of the finite-displacement projective matching tensors and constraints. The results are simpler, more general and easier to use than Åström and Heyden's time-derivative based ‘continuous time matching constraints’. W...
Conference Paper
This paper is a survey of the theory and methods of photogrammetric bundle adjustment, aimed at potential implementors in the computer vision community. Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal structure and viewing parameter estimates. Topics covered include: the choice of cost function and ro...
Article
We summarize the main known properties of factorization-based projective structure from motion, including the basic formulation, depth estimation and how it can sometimes be avoided, and some suggestions about statistical properties. Keywords: structure from motion, projective reconstruction, factorization methods, affine approximation.
Article
This paper describes a theory and a practical algorithm for the autocalibration of a moving projective camera, from m ≥ 5 views of a planar scene. The unknown camera calibration, and (up to scale) the unknown scene geometry and camera motion are recovered from the hypothesis that the camera's internal parameters remain constant during the motion. Thi...
Conference Paper
We describe work in progress on a numerical library for estimating multi-image matching constraints, or more precisely the multicamera geometry underlying them. The library will cover several variants of homographic, epipolar, and trifocal constraints, using various different feature types. It is designed to be modular and open-ended, so that (i)...
Conference Paper
The author describes a new method for camera autocalibration and scaled Euclidean structure and motion, from three or more views taken by a moving camera with fixed but unknown intrinsic parameters. The motion constancy of these is used to rectify an initial projective reconstruction. Euclidean scene structure is formulated in terms of the absolute...
Article
Geometric fitting (parameter estimation for data subject to implicit parametric constraints) is a very common sub-problem in computer vision, used for curve, surface and 3D model fitting, matching constraint estimation and 3D reconstruction under constraints. Although many algorithms exist for specific cases, the general problem is by no mean...
Article
Contents: 1 Foreword and Motivation; 1.1 Intuitive Considerations About Perspective Projection; 1.1.1 An Infinitely Strange Perspective; 1.1.2 Homogeneous Coordinates; 1.2 The Perspective Camera; ...