About
155
Publications
65,761
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
27,825
Citations
Introduction
Skills and Expertise
Publications
Publications (155)
Body composition prediction from 3D optical imagery has previously been studied with linear algorithms. In this study, we present a novel application of deep 3D convolutional graph networks and nonlinear Gaussian process regression for human body shape parameterization and body composition estimation. We trained and tested linear and nonlinear mode...
Panoramic image stitching provides a unified, wide-angle view of a scene that extends beyond the camera's field of view. Stitching frames of a panning video into a panoramic photograph is a well-understood problem for stationary scenes, but when objects are moving, a still panorama cannot capture the scene. We present a method for synthesizing a pa...
Given an input painting, we reconstruct a time-lapse video of how it may have been painted. We formulate this as an autoregressive image generation problem, in which an initially blank "canvas" is iteratively updated. The model learns from real artists by training on many painting videos. Our approach incorporates text and region understanding to d...
We present a method for generating video sequences with coherent motion between a pair of input key frames. We adapt a pretrained large-scale image-to-video diffusion model (originally trained to generate videos moving forward in time from a single input image) for key frame interpolation, i.e., to produce a video in between two input frames. We ac...
Total and regional body composition are strongly correlated with metabolic syndrome and have been estimated non-invasively from 3D optical scans using linear parameterizations of body shape and linear regression models. Prior works produced accurate and precise predictions on many, but not all, body composition targets relative to the reference dua...
We present a method to generate full-body selfies from photographs originally taken at arms length. Because self-captured photos are typically taken close up, they have limited field of view and exaggerated perspective that distorts facial shapes. We instead seek to generate the photo some one else would take of you from a few feet away. Our approa...
Background:
Excess adiposity in children is strongly correlated with obesity-related metabolic disease in adulthood, including diabetes, cardiovascular disease, and 13 types of cancer. Despite the many long-term health risks of childhood obesity, body mass index (BMI) Z-score is typically the only adiposity marker used in pediatric studies and cli...
We introduce light diffusion, a novel method to improve lighting in portraits, softening harsh shadows and specular highlights while preserving overall scene illumination. Inspired by professional photographers' diffusers and scrims, our method softens lighting given only a single portrait photo. Previous portrait relighting approaches focus on cha...
We present PersonNeRF, a method that takes a collection of photos of a subject (e.g. Roger Federer) captured across multiple years with arbitrary body poses and appearances, and enables rendering the subject with arbitrary novel combinations of viewpoint, body pose, and appearance. PersonNeRF builds a customized neural volumetric 3D model of the su...
Background
Many predictors of morbidity caused by metabolic disease are associated with body shape. 3D optical (3DO) scanning captures body shape and has been shown to accurately and precisely predict body composition variables associated with mortality risk. 3DO is safer, less expensive, and more accessible than criterion body composition assessme...
We introduce 3D Moments, a new computational photography effect. As input we take a pair of near-duplicate photos, i.e., photos of moving subjects from similar viewpoints, common in people's photo collections. As output, we produce a video that smoothly interpolates the scene motion from the first photo to the second, while also producing camera mo...
We present a frame interpolation algorithm that synthesizes multiple intermediate frames from two input images with large in-between motion. Recent methods use multiple networks to estimate optical flow or depth and a separate network dedicated to frame synthesis. This is often complex and requires scarce optical flow or depth ground-truth. In this...
We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame a...
Single image 3D photography enables viewers to view a still image from novel viewpoints. Recent approaches combine monocular depth networks with inpainting networks to achieve compelling results. A drawback of these techniques is the use of hard depth layering, making them unable to model intricate appearance details such as thin hair-like structur...
Objective
The aim of this study was to investigate whether digitally re-posing three-dimensional optical (3DO) whole-body scans to a standardized pose would improve body composition accuracy and precision regardless of the initial pose.
Methods
Healthy adults (n = 540), stratified by sex, BMI, and age, completed whole-body 3DO and dual-energy X-ra...
Every time you sit in front of a TV or monitor, your face is actively illuminated by time-varying patterns of light. This paper proposes to use this time-varying illumination for synthetic relighting of your face with any new illumination condition. In doing so, we take inspiration from the light stage work of Debevec et al., who first demonstrated...
Given an "in-the-wild" video of a person, we reconstruct an animatable model of the person in the video. The output model can be rendered in any body pose to any camera view, via the learned controls, without explicit 3D mesh reconstruction. At the core of our method is a volumetric 3D human representation reconstructed with a deep network trained...
We introduce a real-time, high-resolution background replacement technique which operates at 30fps in 4K resolution, and 60fps for HD on a modern GPU. Our technique is based on background matting, where an additional frame of the background is captured and used in recovering the alpha matte and the foreground layer. The main challenge is to compute...
In this paper, we demonstrate a fully automatic method for converting a still image into a realistic animated looping video. We target scenes with continuous fluid motion, such as flowing water and billowing smoke. Our method relies on the observation that this type of natural motion can be convincingly reproduced from a static Eulerian motion desc...
We present a new framework for sketch-based modeling and animation of 3D organic shapes that can work entirely in an intuitive 2D domain, enabling a playful, casual experience. Unlike previous sketch-based tools, our approach does not require a tedious part-based multi-view workflow with the explicit specification of an animation rig. Instead, we c...
Great progress has been made in 3D body pose and shape estimation from a single photo. Yet, state-of-the-art results still suffer from errors due to challenging body poses, modeling clothing, and self occlusions. The domain of basketball games is particularly challenging, as it exhibits all of these challenges. In this paper, we introduce a new app...
We address the problem of geo-registering ground-based multi-view stereo models by ground-to-aerial image matching. The main contribution is a fully automated geo-registration pipeline with a novel viewpoint-dependent matching method that handles ground to aerial viewpoint variation. We conduct large-scale experiments which consist of many popular...
We address the problem of extending the field of view of a photo-an operation we call uncrop. Given a reference photograph to be uncropped, our approach selects, reprojects, and composites a subset of Internet imagery taken near the reference into a larger image around the reference using the underlying scene geometry. The proposed Markov Random Fi...
This paper leverages occluding contours (aka 'internal silhouettes') to improve the performance of multi-view stereo methods. The contributions are 1) a new technique to identify free-space regions arising from occluding contours, and 2) a new approach for incorporating the resulting free-space constraints into Poisson surface reconstruction. The p...
In this paper, we propose a simple, novel plane sweep technique for refocusing plenoptic images. Rays are projected directly from the raw plenoptic image captured on the sensor into the output image plane, without computing intermediate representations such as subaperture views or epipolar images. Interpolation is performed in the output image plan...
We present MotionMontage, a system for recording multiple motion takes of a rigid virtual object and compositing them together into a montage. Our system incorporates a Kinect-based performance capture setup that allows animators to create 3D animations by tracking the motion of a rigid physical object and mapping it in realtime onto a virtual obje...
We present the first large scale system for capturing and rendering relight able scene reconstructions from massive unstructured photo collections taken under different illumination conditions and viewpoints. We combine photos taken from many sources, Flickr-Based ground-level imagery, oblique aerial views, and street view, to recover models that a...
We demonstrate a realtime system which infers and tracks the assembly process of a snap-together block model using a Kinect® sensor. The inference enables us to build a virtual replica of the model at every step. Tracking enables us to provide context specific visual feedback on a screen by augmenting the rendered virtual model aligned with the phy...
We show that, under spatially varying illumination, the light transport of diffuse scenes can be decomposed into direct, near-range (subsurface scattering and local inter-reflections) and far-range transports (diffuse inter-reflections). We show that these three component transports are redundant either in the spatial or the frequency domain and ca...
We present a system for producing 3D animations using physical objects (i.e., puppets) as input. Puppeteers can load 3D models of familiar rigid objects, including toys, into our system and use them as puppets for an animation. During a performance, the puppeteer physically manipulates these puppets in front of a Kinect depth sensor. Our system use...
This paper describes an effort to automatically create "tours" of thousands of the world's landmarks from geo-tagged user-contributed photos on the Internet. These photo tours take you through each site's most popular viewpoints on a tour that maximizes visual quality and traversal efficiency. This planning problem is framed as a form of the Travel...
This paper introduces a schematic representation for architectural scenes together with robust algorithms for reconstruction from sparse 3D point cloud data. The schematic models architecture as a network of transport curves, approximating a floorplan, with associated profile curves, together comprising an interconnected set of swept surfaces. The...
We propose a novel framework for reconstructing homogenous, transparent, refractive height-fields from a single viewpoint. The height-field is imaged against a known planar background, or sequence of backgrounds. Unlike existing approaches that do a point-by-point reconstruction - which is known to have intractable ambiguities - our method estimate...
Imagining what a proposed home remodel might look like without actually performing it is challenging. We present an image-based remodeling methodology that allows real-time photorealistic visualization during both the modeling and remodeling process of a home interior. Large-scale edits, like removing a wall or enlarging a window, are performed eas...
In this paper, we train a computer to select still frames from video that work well as candid portraits. Because of the subjective nature of this task, we conduct a human subjects study to collect ratings of video frames across multiple videos. Then, we compute a number of features and train a model to predict the average rating of a video frame. W...
We present a system that can reconstruct 3D geometry from large, unorganized collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo-sharing sites. Our system is built on a set of new, distributed computer vision algorithms for image matching and 3D reconstruction, designed to maximize parallelism...
To investigate the effect of a vacuum-assisted socket suspension system as compared with pin suspension on lower extremity amputees.
Randomized crossover with 3-week acclimation.
Household, community, and laboratory environments.
Unilateral, transtibial amputees (N=20 enrolled, N=5 completed).
(1) Total surface-bearing socket with a vacuum-assisted...
We present the design and implementation of new inexact Newton type Bundle Adjustment algorithms that exploit hardware parallelism for efficiently solving large scale 3D scene reconstruction problems. We explore the use of multicore CPU as well as multicore GPUs for this purpose. We show that overcoming the severe memory and bandwidth limitations o...
Recognizing and manipulating objects is an im- portant task for mobile robots performing useful services in everyday environments. In this paper, we develop a system that enables a robot to grasp an object and to move it in front of its depth camera so as to build a 3D surface model of the object. We derive an information gain based variant of the...
Obscure glass is textured glass designed to separate spaces and \obscure" visibility between the spaces. Such glass is used to provide privacy while still allowing light to ow into a space, and is often found in homes and oces. We propose and explore the challenge of \seeing through" obscure glass, using both optical and digital techniques. In some...
We present a novel single image deblurring method to estimate spatially non-uniform blur that results from camera shake. We
use existing spatially invariant deconvolution methods in a local and robust way to compute initial estimates of the latent
image. The camera motion is represented as a Motion Density Function (MDF) which records the fraction...
Community photo collections like Flickr offer a rich, ever-growing record of the world around us. New computer vision techniques can use photographs from these collections to rapidly build detailed 3D models.
This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstructions. This overlapping clustering problem is fo...
This paper describes a photometric stereo method designed for surfaces with spatially-varying BRDFs, including surfaces with both varying diffuse and specular properties. Our optimization-based method builds on the observation that most objects are composed of a small number of fundamental materials by constraining each pixel to be representable by...
We present an optimization framework for exploring gradient-domain solutions for image and video processing. The proposed framework unifies many of the key ideas in the gradient-domain literature under a single optimization formulation. Our hope is that this generalized framework will allow the reader to quickly gain a general understanding of the...
We present an optimization framework for exploring gradient-domain solutions for image and video processing. The proposed framework unifies many of the key ideas in the gradient-domain literature under a single optimization formulation. Our hope is that this generalized framework will allow the reader to quickly gain a general understanding of the...
This paper proposes a fully automated 3D reconstruction and visualization system for architectural scenes (interiors and exteriors). The reconstruction of indoor environments from photographs is particularly challenging due to texture-poor planar surfaces such as uniformly-painted walls. Our system first uses structure-from-motion, multi-view stere...
Multi-view stereo (MVS) algorithms now produce reconstructions that rival laser range scanner accuracy. However, stereo algorithms require textured surfaces, and therefore work poorly for many architectural scenes (e.g., building interiors with textureless, painted walls). This paper presents a novel MVS approach to overcome these limitations for M...
We present solutions for enhancing the spatial and/or temporal resolution of videos. Our algorithm targets the emerging consumer-level hybrid cameras that can simultaneously capture video and high-resolution stills. Our technique produces a high spacetime resolution video using the high-resolution stills for rendering and the low-resolution video t...
We present an approach to convert a small portion of a light field with extracted depth information into a cinematic effect with sim- ulated, smooth camera motion that exhibits a sense of 3D parallax. We develop a taxonomy of the cinematic conventions of these ef- fects, distilled from observations of documentary film footage and organized by the n...
This paper presents an approach to render novel views from input photographs, a task which is commonly referred to as image based rendering. We first compute dense view dependent depthmaps us- ing consistent segmentation. This method jointly computes multi- view stereo and segments input photographs while accounting for mixed pixels (matting). We t...
We present solutions for enhancing the spatial and/or tem- poral resolution of videos. Our algorithm targets the emerg- ing consumer-level hybrid cameras that can simultaneously capture video and high-resolution stills. Our technique pro- duces a high spacetime resolution video using the high- resolution stills for rendering and the low-resolution...
We explore the use of tracked 2D object motion to enable novel approaches to interacting with video. These include moving annotations, video navigation by direct manipulation of objects, and creating an image composite from multiple video frames. Features in the video are automatically tracked and grouped in an off-line preprocess that enables late...
We analyze the problem of reconstructing a 2D function that approximates a set of desired gradients and a data term. The combined data and gradient terms enable operations like modifying the gradients of an image while staying close to the original image. Starting with a variational formulation, we arrive at the "screened Poisson equation" known in...
We present a system for creating and viewing interactive exploded views of complex 3D models. In our approach, a 3D input model is organized into an explosion graph that encodes how parts ex- plode with respect to each other. We present an automatic method for computing explosion graphs that takes into account part hierar- chies in the input models...