Andrew FitzgibbonMicrosoft · Cambridge
Andrew Fitzgibbon
PhD, University of Edinburgh
About
246
Publications
76,995
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
38,762
Citations
Additional affiliations
May 2005 - present
September 1990 - May 1996
May 1996 - May 2005
Education
September 1992 - September 1997
September 1989 - September 1990
September 1986 - August 1989
Publications
Publications (246)
There is good evidence that simple animals, such as bees, use view-based strategies to return to a familiar location, whereas humans might use a 3-D reconstruction to achieve the same goal. Assuming some noise in the storage and retrieval process, these two types of strategy give rise to different patterns of predicted errors in homing. We describe...
We demonstrate the use of shape-from-shading (SfS) to improve both the quality and the robustness of 3D reconstruction of dynamic objects captured by a single camera. Unlike previous approaches that made use of SfS as a post-processing step, we offer a principled integrated approach that solves dynamic object tracking and reconstruction and SfS as...
We present new methods for simultaneously estimating camera geometry and time shift from video sequences from multiple unsynchronized cameras. Algorithms for simultaneous computation of a fundamental matrix or a homography with unknown time shift between images are developed. Our methods use minimal correspondence sets (eight for fundamental matrix...
This paper presents ShadowHands - a novel technique for visualizing a remote user's hand gestures using a single depth sensor and hand tracking system. Previous work has shown that making distributed users better aware of each other's gestures facilitates remote collaboration. These systems presented virtual embodiments as a stream of raw 2D or 3D...
Bundle adjustment is used in structure-from-motion pipelines as final refinement stage requiring a sufficiently good initialization to reach a useful local mininum. Starting from an arbitrary initialization almost always gets trapped in a poor minimum. In this work we aim to obtain an initialization-free approach which returns global minima from a...
We have shown that, in a sparse cue environment, small changes in scene layout can significantly affect the precision with which observers can return to a previously-viewed location (Pickup, L.C., Fitzgibbon, A.W. and Glennerster, A. (2013) Biological Cybernetics, 107, 449-464). The scene consisted of three very long vertical poles viewed from one...
Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems has prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve...
This chapter presents commonly used terms across image processing, computer vision and related fields, including machine vision. It contains terms starting with numbers. Information is provided for several terms, including 1D projection, 2D coordinate system, 2D Fourier transform, 2D image, 2D projection, and 2D pose estimation. The chapter also ex...
Matrix factorization (or low-rank matrix completion) with missing data is a key computation in many computer vision and machine learning tasks, and is also related to a broader class of nonlinear optimization problems such as bundle adjustment. The problem has received much attention recently, with renewed interest in variable-projection approaches...
Matrix factorization (or low-rank matrix completion) with missing data is a key computation in many computer vision and machine learning tasks, and is also related to a broader class of nonlinear optimization problems such as bundle adjustment. The problem has received much attention recently, with renewed interest in variable-projection approaches...
Observers make large, systematic errors when they point to an unseen target (VSS 2014). Here, we report modelling of these errors. In the experiment (reported VSS 2014), participants in a real or virtual environment viewed a set of 4 target boxes from one location and then walked behind a set of screens with the targets obscured from view. They wer...
We present a new method for inferring dense data to model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective...
We present a 3D scanning system for deformable objects that uses only a single Kinect sensor. Our work allows considerable amount of nonrigid deformations during scanning, and achieves high quality results without heavily constraining user or camera motion. We do not rely on any prior shape knowledge, enabling general object scanning with freeform...
We present a new real-time hand tracking system based on a single depth camera. The system can accurately reconstruct complex hand poses across a variety of subjects. It also allows for robust tracking, rapidly recovering from any temporary failures. Most uniquely, our tracker is highly flexible, dramatically improving upon previous approaches whic...
We introduce a machine learning approach to demosaicing, the reconstruction of color images from incomplete color filter array samples. There are two challenges to overcome by a demosaicing method: first, it needs to model and respect the statistics of natural images in order to reconstruct natural looking images; second, it needs to be able to per...
Motion in the image plane is ultimately a function of 3D motion in space. We propose to compute optical flow using what is ostensibly an extreme overparameterization: depth, surface normal, and frame-to-frame 3D rigid body motion at every pixel, giving a total of 9 DoF. The advantages of such an overparameterization are twofold: first, geometricall...
A method and apparatus for processing video is disclosed. In an embodiment, image features of an object within a frame of video footage are identified and the movement of each of these features is tracked throughout the video footage to determine its trajectory (track). The tracks are analyzed, the maximum separation of the tracks is determined and...
We present a combined hardware and software solution for marker-less reconstruction of non-rigidly deforming physical objects with arbitrary shape in real-time. Our system uses a single self-contained stereo camera unit built from off-the-shelf components and consumer graphics hardware to generate spatio-temporally coherent 3D models at 30 Hz. A ne...
Background / Purpose:
How do humans build a spatial map of objects in their environment, and keep track of them while moving around in space with the objects disappearing out of their field of view? We aim to describe mathematical models that are able to explain what strategies are being used.
Main conclusion:
Participants show large, systema...
We take a new approach to computing dense scene flow between a pair of consecutive RGB-D frames. We exploit the availability of depth data by seeking correspondences with respect to patches specified not as the pixels inside square windows, but as the 3D points that are the inliers of spheres in world space. Our primary contribution is to show that...
This paper presents a method for acquiring dense nonrigid shape and deformation from a single monocular depth sensor. We focus on modeling the human hand, and assume that a single rough template model is available. We combine and extend existing work on model-based tracking, subdivision surface fitting, and mesh deformation to acquire detailed hand...
We address the problem of estimating the pose of a cam- era relative to a known 3D scene from a single RGB-D frame. We formulate this problem as inversion of the generative rendering procedure, i.e., we want to find the camera pose corresponding to a rendering of the 3D scene model that is most similar with the observed input. This is a non-convex...
Computing pose and/or shape of a modifiable entity is described. In various embodiments a model of an entity (such as a human hand, a golf player holding a golf club, an animal, a body organ) is fitted to an image depicting an example of the entity in a particular pose and shape. In examples, an optimization process finds values of pose and/or shap...
Techniques for human body pose estimation are disclosed herein. Images such as depth images, silhouette images, or volumetric images may be generated and pixels or voxels of the images may be identified. The techniques may process the pixels or voxels to determine a probability that each pixel or voxel is associated with a segment of a body capture...
Foreground and background image segmentation is described. In an example, a seed region is selected in a foreground portion of an image, and a geodesic distance is calculated from each image element to the seed region. A subset of the image elements having a geodesic distance less than a threshold is determined, and this subset of image elements ar...
In this chapter we review the problem of object class recognition in large image collections.We focus specifically on scenarios where the classes to be recognized are not known in advance. The motivating application is “object-class search by example” where a user provides at query time a small set of training images defining an arbitrary novel cat...
Systems and methods are disclosed for identifying objects captured by a depth camera by condensing classified image data into centroids of probability that captured objects are correctly identified entities. Output exemplars are processed to detect spatially localized clusters of non-zero probability pixels. For each cluster, a centroid is generate...
We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are la...
Kinectrack is a novel approach to 6-DoF tracking that provides agile real-time pose estimation using only commodity hardware. The dot pattern emitter and IR camera components of the standard Kinect device are separated to allow the emitter to roam freely relative to a fixed camera. The 6-DoF pose of the emitter component is recovered by matching th...
Three-dimensional environment reconstruction is described. In an example, a 3D model of a real-world environment is generated in a 3D volume made up of voxels stored on a memory device. The model is built from data describing a camera location and orientation, and a depth image with pixels indicating a distance from the camera to a point in the env...
Predicting joint positions is described, for example, to find joint positions of humans or animals (or parts thereof) in an image to control a computer game or for other applications. In an embodiment image elements of a depth image make joint position votes so that for example, an image element depicting part of a torso may vote for a position of...
Use of a 3D environment model in gameplay is described. In an embodiment, a mobile depth camera is used to capture a series of depth images as it is moved around and a dense 3D model of the environment is generated from this series of depth images. This dense 3D model is incorporated within an interactive application, such as a game. The mobile dep...
PatchMatch is a simple, yet very powerful and successful method for optimizing continuous labelling problems. The algorithm has two main ingredients: the update of the solution space by sampling and the use of the spatial neighbourhood to propagate samples. We show how these ingredients are related to steps in a specific form of belief propagation...
A computerized decision tree training system may include a distributed control processing unit configured to receive input of training data for training a decision tree. The system may further include a plurality of data batch processing units, each data batch processing unit being configured to evaluate each of a plurality of split functions of a...
A resolution-independent image models the true intensity function underlying a standard image of discrete pixels. Previous work on resolution-independent images demonstrated their efficacy, primarily by employing regularizers that penalize discontinuity. This paper extends the approach by permitting the curvature of resolution-independent images to...
Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing
in a Web page...
It is often assumed that humans generate a 3D reconstruction of the environment, either in egocentric or world-based coordinates, but the steps involved are unknown. Here, we propose two reconstruction-based models, evaluated using data from two tasks in immersive virtual reality. We model the observer's prediction of landmark location based on sta...
We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring an estimate of each pixel's correspondence to 3D points in the scene's world coordinate frame. The forest uses only simple depth and RGB pixel compari...
Moving object segmentation using depth images is described. In an example, a moving object is segmented from the background of a depth image of a scene received from a mobile depth camera. A previous depth image of the scene is retrieved, and compared to the current depth image using an iterative closest point algorithm. The iterative closest point...
We present a new method for inferring dense data to model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective...
Content-based information retrieval is described. In an example, a query item such as an image, document, email or other item is presented and items with similar content are retrieved from a database of items. In an example, each time a query is presented, a classifier is formed based on that query and using a training set of items. For example, th...
Recounts the career and contributions pf Mark Everingham.
We present Kinectrack, a new six degree-of-freedom (6-DoF) tracker which allows real-time and low-cost pose estimation using only commodity hardware. We decouple the dot pattern emitter and IR camera of the Kinect. Keeping the camera fixed and moving the IR emitter in the environment, we recover the 6-DoF pose of the emitter by matching the observe...
KinÊtre allows novice users to scan arbitrary physical objects and bring them to life in seconds. The fully interactive system allows diverse static meshes to be animated using the entire human body. Traditionally, the process of mesh animation is laborious and requires domain expertise, with rigging specified manually by an artist when designing t...
Imagine you are asked to produce a 3D animation of a demonic armchair terrorizing an innocent desk lamp. You may think about model rigging, skeleton deformation, and keyframing. Depending on your experience, you might imagine hours to days at the controls of Maya or Blender. But even if you have absolutely no computer graphics experience, it can be...
We present a model for early vision tasks such as denoising, super-resolution, deblurring, and demosaicing. The model provides a resolution-independent representation of discrete images which admits a truly rotationally invariant prior. The model generalizes several existing approaches: variational methods, finite element methods, and discrete rand...
Fitting an articulated model to image data is often approached as an optimization over both model pose and model-to-image correspondence. For complex models such as humans, previous work has required a good initialization, or an alternating minimization between correspondence and pose. In this paper we investigate one-shot pose estimation: can we d...
3D morphable models are low-dimensional parametrizations of 3D object classes which provide a powerful means of associating 3D geometry to 2D images. However, morphable models are currently generated from 3D scans, so for general object classes such as animals they are economically and practically infeasible. We show that, given a small amount of u...