Andrew Fitzgibbon

Microsoft · Cambridge

PhD, University of Edinburgh

About

Publications: 246
Reads: 76,995
Citations: 38,762
Additional affiliations
May 2005 - present
Microsoft
Position
  • Principal Investigator
May 1996 - May 2005
University of Oxford
Position
  • Royal Society University Research Fellow
September 1990 - May 1996
University of Edinburgh
Education
September 1992 - September 1997
University of Edinburgh
Field of study
  • Artificial Intelligence
September 1989 - September 1990
Heriot-Watt University
Field of study
  • Computer Science
September 1986 - August 1989
University College Cork
Field of study
  • Mathematics and Computer Science

Publications (246)
Article
There is good evidence that simple animals, such as bees, use view-based strategies to return to a familiar location, whereas humans might use a 3-D reconstruction to achieve the same goal. Assuming some noise in the storage and retrieval process, these two types of strategy give rise to different patterns of predicted errors in homing. We describe...
Article
Full-text available
We demonstrate the use of shape-from-shading (SfS) to improve both the quality and the robustness of 3D reconstruction of dynamic objects captured by a single camera. Unlike previous approaches that made use of SfS as a post-processing step, we offer a principled integrated approach that solves dynamic object tracking and reconstruction and SfS as...
Article
Full-text available
We present new methods for simultaneously estimating camera geometry and time shift from video sequences from multiple unsynchronized cameras. Algorithms for simultaneous computation of a fundamental matrix or a homography with unknown time shift between images are developed. Our methods use minimal correspondence sets (eight for fundamental matrix...
Conference Paper
This paper presents ShadowHands - a novel technique for visualizing a remote user's hand gestures using a single depth sensor and hand tracking system. Previous work has shown that making distributed users better aware of each other's gestures facilitates remote collaboration. These systems presented virtual embodiments as a stream of raw 2D or 3D...
Conference Paper
Bundle adjustment is used in structure-from-motion pipelines as the final refinement stage, and it requires a sufficiently good initialization to reach a useful local minimum. Starting from an arbitrary initialization almost always gets trapped in a poor minimum. In this work we aim to obtain an initialization-free approach which returns global minima from a...
Article
We have shown that, in a sparse cue environment, small changes in scene layout can significantly affect the precision with which observers can return to a previously-viewed location (Pickup, L.C., Fitzgibbon, A.W. and Glennerster, A. (2013) Biological Cybernetics, 107, 449-464). The scene consisted of three very long vertical poles viewed from one...
Article
Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems has prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve...
Chapter
This chapter presents commonly used terms across image processing, computer vision and related fields, including machine vision. It contains terms starting with numbers. Information is provided for several terms, including 1D projection, 2D coordinate system, 2D Fourier transform, 2D image, 2D projection, and 2D pose estimation. The chapter also ex...
Conference Paper
Full-text available
Matrix factorization (or low-rank matrix completion) with missing data is a key computation in many computer vision and machine learning tasks, and is also related to a broader class of nonlinear optimization problems such as bundle adjustment. The problem has received much attention recently, with renewed interest in variable-projection approaches...
Article
Matrix factorization (or low-rank matrix completion) with missing data is a key computation in many computer vision and machine learning tasks, and is also related to a broader class of nonlinear optimization problems such as bundle adjustment. The problem has received much attention recently, with renewed interest in variable-projection approaches...
Article
Observers make large, systematic errors when they point to an unseen target (VSS 2014). Here, we report modelling of these errors. In the experiment (reported at VSS 2014), participants in a real or virtual environment viewed a set of 4 target boxes from one location and then walked behind a set of screens with the targets obscured from view. They wer...
Article
Full-text available
We present a new method for inferring dense data to model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective...
Article
Full-text available
We present a 3D scanning system for deformable objects that uses only a single Kinect sensor. Our work allows a considerable amount of nonrigid deformation during scanning, and achieves high quality results without heavily constraining user or camera motion. We do not rely on any prior shape knowledge, enabling general object scanning with freeform...
Conference Paper
Full-text available
We present a new real-time hand tracking system based on a single depth camera. The system can accurately reconstruct complex hand poses across a variety of subjects. It also allows for robust tracking, rapidly recovering from any temporary failures. Most uniquely, our tracker is highly flexible, dramatically improving upon previous approaches whic...
Article
We introduce a machine learning approach to demosaicing, the reconstruction of color images from incomplete color filter array samples. A demosaicing method must overcome two challenges: first, it needs to model and respect the statistics of natural images in order to reconstruct natural looking images; second, it needs to be able to per...
Conference Paper
Motion in the image plane is ultimately a function of 3D motion in space. We propose to compute optical flow using what is ostensibly an extreme overparameterization: depth, surface normal, and frame-to-frame 3D rigid body motion at every pixel, giving a total of 9 DoF. The advantages of such an overparameterization are twofold: first, geometricall...
Patent
A method and apparatus for processing video is disclosed. In an embodiment, image features of an object within a frame of video footage are identified and the movement of each of these features is tracked throughout the video footage to determine its trajectory (track). The tracks are analyzed, the maximum separation of the tracks is determined and...
Article
We present a combined hardware and software solution for marker-less reconstruction of non-rigidly deforming physical objects with arbitrary shape in real-time. Our system uses a single self-contained stereo camera unit built from off-the-shelf components and consumer graphics hardware to generate spatio-temporally coherent 3D models at 30 Hz. A ne...
Conference Paper
Full-text available
Background / Purpose: How do humans build a spatial map of objects in their environment, and keep track of them while moving around in space with the objects disappearing out of their field of view? We aim to describe mathematical models that are able to explain what strategies are being used. Main conclusion: Participants show large, systema...
Conference Paper
We take a new approach to computing dense scene flow between a pair of consecutive RGB-D frames. We exploit the availability of depth data by seeking correspondences with respect to patches specified not as the pixels inside square windows, but as the 3D points that are the inliers of spheres in world space. Our primary contribution is to show that...
Conference Paper
This paper presents a method for acquiring dense nonrigid shape and deformation from a single monocular depth sensor. We focus on modeling the human hand, and assume that a single rough template model is available. We combine and extend existing work on model-based tracking, subdivision surface fitting, and mesh deformation to acquire detailed hand...
Conference Paper
We address the problem of estimating the pose of a camera relative to a known 3D scene from a single RGB-D frame. We formulate this problem as inversion of the generative rendering procedure, i.e., we want to find the camera pose corresponding to a rendering of the 3D scene model that is most similar to the observed input. This is a non-convex...
Patent
Computing pose and/or shape of a modifiable entity is described. In various embodiments a model of an entity (such as a human hand, a golf player holding a golf club, an animal, a body organ) is fitted to an image depicting an example of the entity in a particular pose and shape. In examples, an optimization process finds values of pose and/or shap...
Patent
Techniques for human body pose estimation are disclosed herein. Images such as depth images, silhouette images, or volumetric images may be generated and pixels or voxels of the images may be identified. The techniques may process the pixels or voxels to determine a probability that each pixel or voxel is associated with a segment of a body capture...
Patent
Foreground and background image segmentation is described. In an example, a seed region is selected in a foreground portion of an image, and a geodesic distance is calculated from each image element to the seed region. A subset of the image elements having a geodesic distance less than a threshold is determined, and this subset of image elements ar...
Chapter
In this chapter we review the problem of object class recognition in large image collections. We focus specifically on scenarios where the classes to be recognized are not known in advance. The motivating application is “object-class search by example” where a user provides at query time a small set of training images defining an arbitrary novel cat...
Patent
Systems and methods are disclosed for identifying objects captured by a depth camera by condensing classified image data into centroids of probability that captured objects are correctly identified entities. Output exemplars are processed to detect spatially localized clusters of non-zero probability pixels. For each cluster, a centroid is generate...
Article
Full-text available
We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are la...
Article
Kinectrack is a novel approach to 6-DoF tracking that provides agile real-time pose estimation using only commodity hardware. The dot pattern emitter and IR camera components of the standard Kinect device are separated to allow the emitter to roam freely relative to a fixed camera. The 6-DoF pose of the emitter component is recovered by matching th...
Patent
Three-dimensional environment reconstruction is described. In an example, a 3D model of a real-world environment is generated in a 3D volume made up of voxels stored on a memory device. The model is built from data describing a camera location and orientation, and a depth image with pixels indicating a distance from the camera to a point in the env...
Patent
Predicting joint positions is described, for example, to find joint positions of humans or animals (or parts thereof) in an image to control a computer game or for other applications. In an embodiment image elements of a depth image make joint position votes so that for example, an image element depicting part of a torso may vote for a position of...
Patent
Use of a 3D environment model in gameplay is described. In an embodiment, a mobile depth camera is used to capture a series of depth images as it is moved around and a dense 3D model of the environment is generated from this series of depth images. This dense 3D model is incorporated within an interactive application, such as a game. The mobile dep...
Article
Full-text available
PatchMatch is a simple, yet very powerful and successful method for optimizing continuous labelling problems. The algorithm has two main ingredients: the update of the solution space by sampling and the use of the spatial neighbourhood to propagate samples. We show how these ingredients are related to steps in a specific form of belief propagation...
Patent
A computerized decision tree training system may include a distributed control processing unit configured to receive input of training data for training a decision tree. The system may further include a plurality of data batch processing units, each data batch processing unit being configured to evaluate each of a plurality of split functions of a...
Conference Paper
A resolution-independent image models the true intensity function underlying a standard image of discrete pixels. Previous work on resolution-independent images demonstrated their efficacy, primarily by employing regularizers that penalize discontinuity. This paper extends the approach by permitting the curvature of resolution-independent images to...
Conference Paper
Full-text available
Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page...
Article
Full-text available
It is often assumed that humans generate a 3D reconstruction of the environment, either in egocentric or world-based coordinates, but the steps involved are unknown. Here, we propose two reconstruction-based models, evaluated using data from two tasks in immersive virtual reality. We model the observer's prediction of landmark location based on sta...
Conference Paper
We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring an estimate of each pixel's correspondence to 3D points in the scene's world coordinate frame. The forest uses only simple depth and RGB pixel compari...
Patent
Moving object segmentation using depth images is described. In an example, a moving object is segmented from the background of a depth image of a scene received from a mobile depth camera. A previous depth image of the scene is retrieved, and compared to the current depth image using an iterative closest point algorithm. The iterative closest point...
Conference Paper
Full-text available
We present a new method for inferring dense data to model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective...
Patent
Full-text available
Content-based information retrieval is described. In an example, a query item such as an image, document, email or other item is presented and items with similar content are retrieved from a database of items. In an example, each time a query is presented, a classifier is formed based on that query and using a training set of items. For example, th...
Article
Full-text available
Recounts the career and contributions of Mark Everingham.
Conference Paper
We present Kinectrack, a new six degree-of-freedom (6-DoF) tracker which allows real-time and low-cost pose estimation using only commodity hardware. We decouple the dot pattern emitter and IR camera of the Kinect. Keeping the camera fixed and moving the IR emitter in the environment, we recover the 6-DoF pose of the emitter by matching the observe...
Conference Paper
KinÊtre allows novice users to scan arbitrary physical objects and bring them to life in seconds. The fully interactive system allows diverse static meshes to be animated using the entire human body. Traditionally, the process of mesh animation is laborious and requires domain expertise, with rigging specified manually by an artist when designing t...
Conference Paper
Imagine you are asked to produce a 3D animation of a demonic armchair terrorizing an innocent desk lamp. You may think about model rigging, skeleton deformation, and keyframing. Depending on your experience, you might imagine hours to days at the controls of Maya or Blender. But even if you have absolutely no computer graphics experience, it can be...
Conference Paper
We present a model for early vision tasks such as denoising, super-resolution, deblurring, and demosaicing. The model provides a resolution-independent representation of discrete images which admits a truly rotationally invariant prior. The model generalizes several existing approaches: variational methods, finite element methods, and discrete rand...
Conference Paper
Full-text available
Fitting an articulated model to image data is often approached as an optimization over both model pose and model-to-image correspondence. For complex models such as humans, previous work has required a good initialization, or an alternating minimization between correspondence and pose. In this paper we investigate one-shot pose estimation: can we d...
Article
Full-text available
3D morphable models are low-dimensional parametrizations of 3D object classes which provide a powerful means of associating 3D geometry to 2D images. However, morphable models are currently generated from 3D scans, so for general object classes such as animals they are economically and practically infeasible. We show that, given a small amount of u...