Philip Davidson's research while affiliated with Google Inc. and other places

Publications (21)

Article
The light transport (LT) of a scene describes how it appears under different lighting conditions from different viewing directions, and complete knowledge of a scene’s LT enables the synthesis of novel views under arbitrary lighting. In this article, we focus on image-based LT acquisition, primarily for human bodies within a light stage setup. We p...
Preprint
The light transport (LT) of a scene describes how it appears under different lighting and viewing directions, and complete knowledge of a scene's LT enables the synthesis of novel views under arbitrary lighting. In this paper, we focus on image-based LT acquisition, primarily for human bodies within a light stage setup. We propose a semi-parametric...
Preprint
Full-text available
We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in 3D voxel grids, and their corresponding textures. To compress the TSDF, our method relies on a block-based neural network architecture trained end-to-end, achieving state-of-the-art rate-distortion trade-off. To prevent topological errors, we losslessly c...
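A minimal sketch of the block-based idea in the abstract above, assuming a toy 3D convolutional autoencoder over fixed-size TSDF blocks; the block size, layer shapes, and latent dimension are illustrative choices, not the paper's architecture:

```python
# Hypothetical block-based TSDF compression sketch (not the paper's design).
import torch
import torch.nn as nn

BLOCK = 8  # voxels per block edge; an assumed value for illustration

class TSDFBlockAutoencoder(nn.Module):
    """Encode an 8x8x8 TSDF block into a small latent code and decode it back."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1),   # 8 -> 4
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1),  # 4 -> 2
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 2 * 2 * 2, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 2 * 2 * 2),
            nn.ReLU(),
            nn.Unflatten(1, (32, 2, 2, 2)),
            nn.ConvTranspose3d(32, 16, kernel_size=4, stride=2, padding=1),  # 2 -> 4
            nn.ReLU(),
            nn.ConvTranspose3d(16, 1, kernel_size=4, stride=2, padding=1),   # 4 -> 8
            nn.Tanh(),  # truncated SDF values lie in [-1, 1]
        )

    def forward(self, block):
        code = self.encoder(block)
        return self.decoder(code), code

# Partition a stand-in TSDF grid into blocks and round-trip them through the network.
grid = torch.rand(64, 64, 64) * 2 - 1
blocks = grid.unfold(0, BLOCK, BLOCK).unfold(1, BLOCK, BLOCK).unfold(2, BLOCK, BLOCK)
blocks = blocks.reshape(-1, 1, BLOCK, BLOCK, BLOCK)   # (num_blocks, 1, 8, 8, 8)
model = TSDFBlockAutoencoder()
recon, codes = model(blocks)
print(recon.shape, codes.shape)
```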
Conference Paper
Motivated by recent availability of augmented and virtual reality platforms, we tackle the challenging problem of immersive storytelling experiences on mobile devices. In particular, we show an end-to-end system to generate 3D assets that enable real-time rendering of an opera on high end mobile phones. We call our system AR-ia and in this paper we...
Article
We present "The Relightables", a volumetric capture system for photorealistic and high quality relightable full-body performance capture. While significant progress has been made on volumetric capture systems, focusing on 3D geometric reconstruction with high resolution textures, much less work has been done to recover photometric properties needed...
Preprint
Full-text available
Volumetric (4D) performance capture is fundamental for AR/VR content generation. Whereas previous work in 4D performance capture has shown impressive results in studio settings, the technology is still far from being accessible to a typical consumer who, at best, might own a single RGBD sensor. Thus, in this work, we propose a method to synthesize...
Conference Paper
Full-text available
We introduce a realtime compression architecture for 4D performance capture that is two orders of magnitude faster than current state-of-the-art techniques, yet achieves comparable visual quality and bitrate. We note how much of the algorithmic complexity in traditional 4D compression arises from the necessity to encode geometry using an explicit m...
Conference Paper
Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus in real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolut...
Conference Paper
The advent of consumer depth cameras has incited the development of a new cohort of algorithms tackling challenging computer vision problems. The primary reason is that depth provides direct geometric information that is largely invariant to texture and illumination. As such, substantial progress has been made in human and object pose estimation, 3...
Preprint
Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus in real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolut...
Conference Paper
Full-text available
Real time non-rigid reconstruction pipelines are extremely computationally expensive and easily saturate the highest end GPUs currently available. This requires careful strategic choices of highly inter-connected parameters to divide up the limited compute. Offline systems, however, prove the value of increasing voxel resolution, more iterations, a...
Article
The state of the art in articulated hand tracking has been greatly advanced by hybrid methods that fit a generative hand model to depth data, leveraging both temporally and discriminatively predicted starting poses. In this paradigm, the generative model is used to define an energy function and a local iterative optimization is performed from these...
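As a hedged illustration of this hybrid paradigm, the toy sketch below fits a made-up three-sphere "hand" model to observed points by finite-difference gradient descent from a given starting pose; the model, energy, and optimizer are placeholders, not the paper's.

```python
# Toy "generative model + local iterative optimization" loop (illustration only).
import numpy as np

def hand_model(pose):
    """Hypothetical generative model: map pose parameters to sphere centers."""
    base = np.array([[0.0, 0.0, 0.0], [0.0, 0.03, 0.0], [0.0, 0.06, 0.0]])
    return base + pose.reshape(1, 3)   # toy model: pose is just a 3D translation

def energy(pose, depth_points, radius=0.01):
    """Sum of squared distances from observed points to the model surface."""
    centers = hand_model(pose)
    d = np.linalg.norm(depth_points[:, None, :] - centers[None, :, :], axis=-1)
    return np.sum((d.min(axis=1) - radius) ** 2)

def local_optimize(pose0, depth_points, iters=50, step=0.5, eps=1e-4):
    """Gradient descent from a (temporally or discriminatively) predicted start pose."""
    pose = pose0.copy()
    for _ in range(iters):
        grad = np.zeros_like(pose)
        for i in range(pose.size):              # finite-difference gradient
            dp = np.zeros_like(pose); dp[i] = eps
            grad[i] = (energy(pose + dp, depth_points) -
                       energy(pose - dp, depth_points)) / (2 * eps)
        pose -= step * grad
    return pose

# Synthetic observations near a "true" pose, then refine from a zero starting pose.
depth_points = hand_model(np.array([0.02, -0.01, 0.03])) + 0.001 * np.random.randn(3, 3)
pose = local_optimize(np.zeros(3), depth_points)
print("refined pose:", pose)
```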
Article
Full-text available
We present Motion2Fusion, a state-of-the-art 360 performance capture system that enables *real-time* reconstruction of arbitrary non-rigid scenes. We provide three major contributions over prior work: 1) a new non-rigid fusion pipeline allowing for far more faithful reconstruction of high frequency geometric details, avoiding the over-smoothing and...
Conference Paper
Full-text available
We present an end-to-end system for augmented and virtual reality telepresence, called Holoportation. Our system demonstrates high-quality, real-time 3D reconstructions of an entire space, including people, furniture and objects, using a set of new depth cameras. These 3D models can also be transmitted in real-time to remote users. This allows user...
Article
Full-text available
We contribute a new pipeline for live multi-view performance capture, generating temporally coherent high-quality reconstructions in real-time. Our algorithm supports both incremental reconstruction, improving the surface estimation over time, as well as parameterizing the nonrigid scene motion. Our approach is highly robust to both large frame-to-...

Citations

... In recent years, several learning-based preprocessing methods have been proposed [12,18,33,34]. Chadha et al. [12] proposed a rate-aware perceptual preprocessing module for video coding. ...
... Controllable illumination in a multi-view light stage [7] is conceptually the most straightforward way of obtaining a light-reflectance decomposition in the presence of global illumination, via one-light-at-a-time (OLAT) captures. Even though not trivial, novel-view synthesis and relighting then boil down to clever interpolation [39,53,29]. In contrast, the input to our method is casually captured multi-view data under unknown illumination, and we embed synthetic OLAT data generation into the training process to aid disentanglement. ...
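The OLAT-based relighting mentioned in the excerpt above can be illustrated with the standard linear light-transport model; this is a generic sketch with placeholder light counts and weights, not the cited method:

```python
# Image-based relighting from OLAT captures: the relit image is a weighted sum
# of one-light-at-a-time basis images, because light transport is linear.
import numpy as np

num_lights, H, W = 331, 64, 64                   # placeholder light-stage count and size
olat = np.random.rand(num_lights, H, W, 3)       # stand-in OLAT basis images
env_weights = np.random.rand(num_lights, 3)      # target lighting sampled per light/RGB

relit = np.einsum('lhwc,lc->hwc', olat, env_weights)
print(relit.shape)   # (64, 64, 3)
```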
... In recent years, as deep neural networks have achieved remarkable success on various vision-related tasks, there has been a growing interest in developing implicit neural representations (INRs) of 3D shapes. Following seminal works [9,32,36] showing successful applications of neural networks to encoding 3D shapes, many methods have been introduced to solve various vision and graphics tasks using INRs of 3D shapes [3,4,7,10,11,15,17,24,28,29,35,38,40,42,43,49,52,53,56]. INRs usually use the multilayer perceptron (MLP) architecture to encode the geometric information of a 3D shape by learning a mapping from a given 3D spatial point to a scalar value. ...
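A minimal sketch of such an MLP-based INR, assuming PyTorch and a toy sphere SDF as the fitting target; the layer sizes and training loop are illustrative only:

```python
# An MLP that maps a 3D point to a scalar (e.g., a signed distance).
import torch
import torch.nn as nn

class ImplicitSDF(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),              # one scalar per query point
        )

    def forward(self, xyz):
        return self.net(xyz).squeeze(-1)

# Fit the MLP to the analytic SDF of a sphere as a toy target.
model = ImplicitSDF()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    pts = torch.rand(1024, 3) * 2 - 1              # random points in [-1, 1]^3
    target = pts.norm(dim=-1) - 0.5                # SDF of a sphere of radius 0.5
    loss = torch.nn.functional.mse_loss(model(pts), target)
    opt.zero_grad(); loss.backward(); opt.step()
print(model(torch.tensor([[0.0, 0.0, 0.0]])))      # should be near -0.5 after fitting
```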
... Our model takes the RGB image and depth map of the front view as the input to its perception module. Thanks to the rapid development of sensor devices, both observations can be provided by a single RGBD camera, so there is no need to mount additional sensors on the ego vehicle [22][23][24][25]. Besides that, the RGB image and depth map share the same data shape and representation, so both kinds of information can be processed easily. ...
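A trivial sketch of treating the two inputs uniformly (the resolution and arrays are placeholders):

```python
# Stack an RGB image and a depth map of the same resolution into one 4-channel input.
import numpy as np

H, W = 480, 640                                       # placeholder resolution
rgb = np.random.rand(H, W, 3).astype(np.float32)      # stand-in RGB frame
depth = np.random.rand(H, W, 1).astype(np.float32)    # stand-in depth map

rgbd = np.concatenate([rgb, depth], axis=-1)          # (H, W, 4), fed to one network
print(rgbd.shape)
```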
... Kelly S. and colleagues [4] have developed methods for displaying an opera on premium mobile phones in real time (the AR-ia system). Petrović N. [5] proposed a method for modeling such applications with augmented/virtual reality. ...
... 3D human reconstruction and animation. Traditional human reconstruction methods require complicated hardware that is expensive for daily use, such as depth sensors [7,12,56] or dense camera arrays [9,17]. To relax the requirements on the capture device, some methods train networks to reconstruct human models from RGB images with differentiable renderers [64,14]. ...
... Here we set the minimum reasonable application-specific inference throughput values (IPS_min) to be ∼40 and ∼6 for the hand detection and eye segmentation applications, respectively. The IPS_min values are based on latency metrics estimated in recent studies for both applications [19], [20]. ...
... We leverage a sparse set of commodity RGBD cameras to resolve this ambiguity. LookinGood [28] and HVSNet [33] are thus closely related to our approach. They both utilize RGBD camera inputs, and they achieve great visual quality while being able to generalize to new people and actions. ...
... to capture and stream this data [7,10,16,53,69] require cumbersome capture technology, such as a large number of calibrated cameras or depth sensors, and the expert knowledge to install and deploy these systems. Videoconferencing, on the other hand, simply requires a single video camera, such as those found on common consumer devices, e.g. ...
... However, the inherent depth ambiguity is still unsolved. Some approaches [46,3,57,65,11] using commodity depth cameras can alleviate this, but these active IR-based cameras are unsuitable for outdoor capture and their capture volume is limited. Recently, Li et al. [28] employed a consumer-level LiDAR, which provides large-scale depth information, to enable large-scale 3D human motion capture, but it still suffers from severe self-occlusion. ...