Philip A. Chou

Google Inc. | Google · Daydream

About

244 Publications
68,202 Reads
15,864 Citations

Publications

Preprint
Full-text available
We build interpretable and lightweight transformer-like neural networks by unrolling iterative optimization algorithms that minimize graph smoothness priors, namely the quadratic graph Laplacian regularizer (GLR) and the $\ell_1$-norm graph total variation (GTV), subject to an interpolation constraint. The crucial insight is that a normalized signal-d...
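Both priors named above have standard closed forms on an undirected graph with edge weights $w_{ij}$: the GLR is $\mathbf{x}^\top \mathbf{L} \mathbf{x} = \sum_{(i,j)} w_{ij}(x_i - x_j)^2$ with the combinatorial Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{W}$, and the GTV is $\sum_{(i,j)} w_{ij}|x_i - x_j|$. The Python sketch below only evaluates the two priors on a toy path graph; the edge-list format and the function names are illustrative assumptions, and it does not reproduce the unrolled networks themselves.

```python
# Toy evaluation of the GLR and GTV smoothness priors (illustrative sketch).
# Assumption: an undirected graph is given as a list of (i, j, w_ij) edges.

def glr(x, edges):
    """Graph Laplacian regularizer: x^T L x = sum over edges of w_ij * (x_i - x_j)^2."""
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)


def gtv(x, edges):
    """l1-norm graph total variation: sum over edges of w_ij * |x_i - x_j|."""
    return sum(w * abs(x[i] - x[j]) for i, j, w in edges)


def laplacian(n, edges):
    """Dense combinatorial Laplacian L = D - W as nested lists."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L


def quadratic_form(x, M):
    """x^T M x for a dense matrix given as nested lists."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


# Path graph 0 - 1 - 2 with unit weights and a small signal on its nodes.
edges = [(0, 1, 1.0), (1, 2, 1.0)]
x = [1.0, 3.0, 2.0]
print(glr(x, edges))                           # 5.0
print(quadratic_form(x, laplacian(3, edges)))  # 5.0, the same value computed as x^T L x
print(gtv(x, edges))                           # 3.0
```

The GLR penalizes large differences quadratically, whereas the GTV grows only linearly with them, which is why the latter tends to tolerate sharp edges in the signal.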
Article
Point cloud compression (PCC) has been rapidly evolving in the context of international standards. Despite the inherent scalability of octree-based geometry descriptions, current attribute compression techniques prevent full scalability of compressed point clouds. We propose an improvement on an embedded attribute encoding method for point clouds b...
Preprint
Full-text available
We study 3D point cloud attribute compression using a volumetric approach: given a target volumetric attribute function $f : \mathbb{R}^3 \rightarrow \mathbb{R}$, we quantize and encode the parameter vector $\theta$ that characterizes $f$ at the encoder, for reconstruction $f_{\hat{\theta}}(\mathbf{x})$ at known 3D points $\mathbf{x}$ at the decoder....
Preprint
We propose sandwiched video compression, a video compression system that wraps neural networks around a standard video codec. The sandwich framework consists of a neural pre- and post-processor with a standard video codec between them. The networks are trained jointly to optimize a rate-distortion loss function with the goal of significantly impr...
Article
We propose novel two-channel filter banks for signals on graphs. Our designs can be applied to arbitrary graphs, given a positive semidefinite variation operator, while using arbitrary vertex partitions for downsampling. The proposed generalized filter banks (GFBs) also satisfy several desirable properties, including perfect reconstruction and crit...
Article
Full-text available
We consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks, and representing the function over each block by shifts of a coordin...
Preprint
Full-text available
Arithmetic coding is used in most media compression methods. Context modeling is usually done through frequency counting and look-up tables (LUTs). For long-memory signals, probability modeling with large context sizes is often infeasible. Recently, neural networks have been used to model probabilities of large contexts in order to drive arithmet...
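The frequency-counting baseline described above can be sketched in a few lines of Python: each context (here, the previous `order` symbols) owns a row of adaptive counts in a look-up table, and an ideal arithmetic coder spends $-\log_2 p$ bits on each symbol. The function name, the add-one initialization and the choice of context are assumptions made for illustration; the sketch does not implement the neural probability models the abstract goes on to discuss. With an alphabet of size $A$, an order-$k$ table can need up to $A^{k}$ rows, which is why large contexts quickly become infeasible.

```python
import math
from collections import defaultdict


def adaptive_code_length(symbols, alphabet_size, order):
    """Ideal arithmetic-coding cost (in bits) under an adaptive order-k count model.

    Each context (tuple of the previous `order` symbols) maps to a row of counts,
    initialised to 1 (add-one estimator) and incremented after each coded symbol.
    Returns the total cost and the number of LUT rows actually touched.
    """
    table = defaultdict(lambda: [1] * alphabet_size)  # context -> per-symbol counts
    bits = 0.0
    for t, s in enumerate(symbols):
        ctx = tuple(symbols[max(0, t - order):t])   # shorter context near the start
        counts = table[ctx]
        p = counts[s] / sum(counts)                 # probability handed to the coder
        bits -= math.log2(p)                        # ideal code length for symbol s
        counts[s] += 1                              # adapt the frequency count
    return bits, len(table)


seq = [0, 1] * 50  # periodic binary source: trivial with context, costly without
for k in (0, 1, 2):
    cost, rows = adaptive_code_length(seq, alphabet_size=2, order=k)
    print(f"order {k}: {cost:.1f} bits, {rows} LUT rows")
```

On this periodic source the order-0 model pays roughly one bit per symbol, while a single symbol of context makes the sequence almost free to code once the counts have adapted.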
Preprint
Full-text available
We study the design of filter banks for signals defined on the nodes of graphs. We propose novel two-channel filter banks that can be applied to arbitrary graphs, given a positive semidefinite variation operator, while using downsampling operators on arbitrary vertex partitions. The proposed filter banks also satisfy several desirable properties,...
Article
Arithmetic coding is used in most media compression methods. Context modeling is usually done through frequency counting and look-up tables (LUTs). For long-memory signals, probability modeling with large context sizes is often infeasible. Recently, neural networks have been used to model probabilities of large contexts in order to drive arithmetic...
Preprint
We consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks, and representing the function over each block by shifts of a coordin...
Article
We propose an embedded attribute encoding method for point clouds based on set partitioning in hierarchical trees (SPIHT). The encoder is used with the region-adaptive hierarchical transform, which has been a popular transform for point cloud coding and is even included in the standard geometry-based point cloud coder (G-PCC). The result is an encoder tha...
Preprint
Full-text available
We propose an embedded attribute encoding method for point clouds based on set partitioning in hierarchical trees (SPIHT) [1]. The encoder is used with the region-adaptive hierarchical transform [2], which has been a popular transform for point cloud coding and is even included in the standard geometry-based point cloud coder (G-PCC) [3],[4]. The resu...
Preprint
Some forms of novel visual media enable the viewer to explore a 3D scene from arbitrary viewpoints, by interpolating between a discrete set of original views. Compared to 2D imagery, these types of applications require much larger amounts of storage space, which we seek to reduce. Existing approaches for compressing 3D scenes are based on a separat...
Article
Full-text available
We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate-distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empir...
Preprint
In the past decade, several multi-resolution representation theories for graph signals have been proposed. Bipartite filter banks stand out as the most natural extension of time-domain filter banks, in part because perfect reconstruction, orthogonality and bi-orthogonality conditions in the graph spectral domain resemble those for traditional filte...
Preprint
Full-text available
We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate-distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empi...
Preprint
Full-text available
We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in 3D voxel grids, and their corresponding textures. To compress the TSDF, our method relies on a block-based neural network architecture trained end-to-end, achieving state-of-the-art rate-distortion trade-off. To prevent topological errors, we losslessly c...
Preprint
Full-text available
We introduce the Region Adaptive Graph Fourier Transform (RA-GFT) for compression of 3D point cloud attributes. We assume the points are organized by a family of nested partitions represented by a tree. The RA-GFT is a multiresolution transform, formed by combining spatially localized block transforms. At each resolution level, attributes are proce...
Article
Compression of point clouds has so far been confined to coding the positions of a discrete set of points in space and the attributes of those discrete points. We introduce an alternative approach based on volumetric functions, which are functions defined not just on a finite set of points, but throughout space. As in regression analysis, volumetric...
Article
A recently introduced coder based on the region-adaptive hierarchical transform (RAHT) is being considered as a standard for the compression of point cloud attributes at MPEG (Moving Picture Experts Group). The RAHT coefficients can be encoded in many ways, and the transform is based on a series of orthogonal 2×2 transform matrices with geometry-depende...
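The geometry-dependent 2×2 step can be written explicitly. In the usual RAHT formulation, two occupied sibling nodes carrying $w_1$ and $w_2$ points are combined by the orthonormal matrix $\frac{1}{\sqrt{w_1+w_2}}\begin{bmatrix}\sqrt{w_1} & \sqrt{w_2}\\ -\sqrt{w_2} & \sqrt{w_1}\end{bmatrix}$; the low-pass output moves up the tree with weight $w_1+w_2$, the high-pass output is sent for coding, and a node without a sibling passes through unchanged. The Python sketch below applies this along a single axis only; the function names and the one-axis simplification are assumptions (the full transform alternates over x, y and z), and quantization and entropy coding of the coefficients are omitted.

```python
import math


def raht_butterfly(x1, w1, x2, w2):
    """Forward 2x2 RAHT step for two occupied siblings; returns (low, high, merged weight)."""
    a = math.sqrt(w1 / (w1 + w2))
    b = math.sqrt(w2 / (w1 + w2))
    return a * x1 + b * x2, -b * x1 + a * x2, w1 + w2


def raht_butterfly_inverse(low, high, w1, w2):
    """Inverse step: the matrix is orthonormal, so its inverse is its transpose."""
    a = math.sqrt(w1 / (w1 + w2))
    b = math.sqrt(w2 / (w1 + w2))
    return a * low - b * high, b * low + a * high


def raht_1d(attrs, levels):
    """Toy RAHT along one axis; attrs maps integer positions in [0, 2**levels) to values."""
    nodes = {p: (v, 1.0) for p, v in attrs.items()}  # (coefficient, weight); one point each
    highs = []
    for _ in range(levels):
        parents = {}
        for p in sorted(nodes):
            parent = p // 2
            if parent in parents:
                continue                              # sibling pair already processed
            left, right = nodes.get(2 * parent), nodes.get(2 * parent + 1)
            if left and right:
                low, high, w = raht_butterfly(left[0], left[1], right[0], right[1])
                highs.append(high)                    # high-pass coefficient to be coded
                parents[parent] = (low, w)
            else:
                parents[parent] = left or right       # lone child passes through unchanged
        nodes = parents
    (dc, weight), = nodes.values()
    return dc, weight, highs


dc, weight, highs = raht_1d({0: 10.0, 1: 12.0, 3: 20.0}, levels=2)
print(weight, dc, highs)
```

For these three points the DC coefficient equals $42/\sqrt{3}$ (the attribute sum divided by the square root of the total weight), and the squared coefficients sum to the input energy 644, as expected of an orthonormal transform.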
Article
Volumetric media, popularly known as holograms, need to be delivered to users using both on-demand and live streaming, for new augmented reality (AR) and virtual reality (VR) experiences. As in video streaming, hologram streaming must support network adaptivity and fast startup, but must also handle large bandwidths, multiple simultaneously streami...
Article
Due to the increased popularity of augmented and virtual reality experiences, the interest in capturing the real world in multiple dimensions and in presenting it to users in an immersive fashion has never been higher. Distributing such representations enables users to freely navigate in multi-sensory 3D media experiences. Unfortunately, such repr...
Article
Light field (LF) representations aim to provide photo-realistic, free-viewpoint viewing experiences. However, the most popular LF representations are images from multiple views. Multi-view image-based representations generally need to restrict the range or degrees of freedom of the viewing experience to what can be interpolated in the image domain,...
Article
Point clouds have recently been used in applications involving real-time capture and rendering of 3D objects. In a point cloud, for practical reasons, each point or voxel is usually associated with one single color along with other attributes. The region-adaptive hierarchical transform (RAHT) coder has been proposed for single-color point clouds. T...
Preprint
Full-text available
Compression of point clouds has so far been confined to coding the positions of a discrete set of points in space and the attributes of those discrete points. We introduce an alternative approach based on volumetric functions, which are functions defined not just on a finite set of points, but throughout space. As in regression analysis, volumetric...
Preprint
Full-text available
Light field (LF) representations aim to provide photo-realistic, free-viewpoint viewing experiences. However, the most popular LF representations are images from multiple views. Multi-view image-based representations generally need to restrict the range or degrees of freedom of the viewing experience to what can be interpolated in the image domain,...
Preprint
The recently introduced coder based on the region-adaptive hierarchical transform (RAHT) for the compression of point cloud attributes was shown to have performance competitive with the state of the art, while being much less complex. In the paper "Compression of 3D Point Clouds Using a Region-Adaptive Hierarchical Transform", top performance was a...
Preprint
Full-text available
Volumetric media, popularly known as holograms, need to be delivered to users using both on-demand and live streaming, for new augmented reality (AR) and virtual reality (VR) experiences. As in video streaming, hologram streaming must support network adaptivity and fast startup, but must also moderate large bandwidths, multiple simultaneously strea...
Article
Full-text available
We present a context-driven method to encode nodes of an octree, which is typically used to encode point cloud geometry. Instead of using one bit per node of the tree, the context allows for deriving probabilities for that node based on distances of the actual voxel to voxels in a reference point cloud. Accurate probabilities of the node state allo...
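For reference, the baseline that such context modeling improves on can be sketched as a breadth-first octree serialization in which every child of an occupied node contributes one occupancy bit, packed as one 8-bit code per occupied node. The Python sketch below assumes integer voxel coordinates in $[0, 2^{d})$ for depth $d$; the function names and the child-bit ordering are illustrative choices, and the context-driven probability estimation from a reference point cloud is not implemented.

```python
def octree_occupancy(voxels, depth):
    """Breadth-first octree serialization: one 8-bit child-occupancy code per occupied node."""
    pts = set(voxels)
    nodes = [(0, 0, 0)]  # occupied nodes at the current level (starts at the root)
    codes = []
    for level in range(depth):
        shift = depth - level - 1
        # Occupied cells one level further down, in that level's integer coordinates.
        occupied = {(x >> shift, y >> shift, z >> shift) for x, y, z in pts}
        next_nodes = []
        for nx, ny, nz in nodes:
            code = 0
            for child in range(8):  # child index bits: x = bit 2, y = bit 1, z = bit 0
                c = (2 * nx + (child >> 2 & 1), 2 * ny + (child >> 1 & 1), 2 * nz + (child & 1))
                if c in occupied:
                    code |= 1 << child
                    next_nodes.append(c)
            codes.append(code)
        nodes = next_nodes
    return codes


def decode_occupancy(codes, depth):
    """Inverse of octree_occupancy: rebuilds the set of occupied voxels."""
    stream = iter(codes)
    nodes = [(0, 0, 0)]
    for _ in range(depth):
        next_nodes = []
        for nx, ny, nz in nodes:
            code = next(stream)
            for child in range(8):
                if code >> child & 1:
                    next_nodes.append((2 * nx + (child >> 2 & 1),
                                       2 * ny + (child >> 1 & 1),
                                       2 * nz + (child & 1)))
        nodes = next_nodes
    return set(nodes)


voxels = [(0, 0, 0), (1, 0, 0), (3, 3, 3)]
codes = octree_occupancy(voxels, depth=2)
print(codes)  # [129, 17, 128]
```

Here three occupied nodes cost 24 raw bits; an entropy coder driven by accurate probabilities for these occupancy bits can spend fewer.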
Article
Full-text available
We introduce the polygon cloud, a compressible representation of three-dimensional geometry (including attributes, such as color), intermediate between polygonal meshes and point clouds. Dynamic polygon clouds, like dynamic polygonal meshes and dynamic point clouds, can take advantage of temporal redundancy for compression. In this paper, we propos...
Article
Dynamic point clouds are a potential new frontier in visual communication systems. A few articles have addressed the compression of point clouds, but very few references exist on exploring temporal redundancies. This paper presents a novel motion-compensated approach to encoding dynamic voxelized point clouds at low bit rates. A simple coder breaks...
Article
We propose using stationary Gaussian Processes (GPs) to model the statistics of the signal on points in a point cloud, which can be considered samples of a GP at the positions of the points. Further, we propose using Gaussian Process Transforms (GPTs), which are Karhunen-Loève transforms of the GP, as the basis of transform coding of the signal. F...
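In concrete terms, the KLT of a GP sampled at $N$ points is the eigenbasis of the $N \times N$ covariance matrix $C_{ij} = k(\mathbf{p}_i - \mathbf{p}_j)$ built from the stationary covariance function $k$. The NumPy sketch below assumes a squared-exponential covariance with a hand-picked length scale; the kernel, its parameters and the function name are illustrative assumptions rather than the choices made in the paper.

```python
import numpy as np


def gp_transform(points, length_scale=1.0, variance=1.0):
    """KLT basis of a zero-mean stationary GP sampled at the given 3D points.

    Assumption: squared-exponential covariance k(d) = variance * exp(-|d|^2 / (2 l^2)).
    Returns eigenvalues (descending), the orthonormal basis (as columns), and the covariance.
    """
    pts = np.asarray(points, dtype=float)
    d2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    cov = variance * np.exp(-d2 / (2.0 * length_scale ** 2))
    evals, evecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]      # put the highest-variance basis vectors first
    return evals[order], evecs[:, order], cov


# Eight points on the corners of a unit cube and one GP realisation on them.
pts = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
evals, basis, cov = gp_transform(pts, length_scale=1.5)
rng = np.random.default_rng(0)
signal = rng.multivariate_normal(np.zeros(len(pts)), cov)
coeffs = basis.T @ signal  # forward GPT; over realisations, coefficient variances equal evals
recon = basis @ coeffs     # inverse GPT: the basis is orthonormal
print(np.round(evals, 3))
```

On this regular grid the kernel factorizes over the three axes, so the eigenvalues are products of $1 \pm e^{-1/(2l^2)}$ taken once per axis: approximately 5.84, 0.65 (three times), 0.07 (three times) and 0.008, a strong compaction of the signal energy into the first coefficient.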
Conference Paper
We introduce a compressible representation of 3D geometry (including its attributes, such as color texture) intermediate between polygonal meshes and point clouds called a polygon cloud. Polygon clouds, compared to polygonal meshes, are more robust to live capture noise and artifacts. Furthermore, dynamic polygon clouds, compared to dynamic point c...
Conference Paper
Full-text available
We present an end-to-end system for augmented and virtual reality telepresence, called Holoportation. Our system demonstrates high-quality, real-time 3D reconstructions of an entire space, including people, furniture and objects, using a set of new depth cameras. These 3D models can also be transmitted in real-time to remote users. This allows user...
Article
We introduce the polygon cloud, also known as a polygon set or soup, as a compressible representation of 3D geometry (including its attributes, such as color texture) intermediate between polygonal meshes and point clouds. Dynamic or time-varying polygon clouds, like dynamic polygonal meshes and dynamic point clouds, can take advantage...
Conference Paper
We present a new mathematical framework for multi-view surface reconstruction from a set of calibrated color and depth images. We estimate the occupancy probability of points in space along sight rays, and combine these estimates using a normalized product derived from Bayes' rule. The advantage of this approach is that the free space constraint is...
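One common reading of a normalized product derived from Bayes' rule: if the per-view estimates $p_i = P(\text{occ} \mid \text{view } i)$ were each obtained with the same prior $\pi$ and the views are conditionally independent given occupancy, then $P(\text{occ} \mid \text{all views}) \propto \pi^{1-n}\prod_i p_i$, which for $\pi = 1/2$ reduces to $\prod_i p_i / \left(\prod_i p_i + \prod_i (1-p_i)\right)$. The Python sketch below implements that fusion in log-odds form; the independence assumptions and the function name are illustrative, and the paper's exact combination rule may differ.

```python
import math


def fuse_occupancy(probs, prior=0.5):
    """Normalized-product fusion of per-view occupancy probabilities (illustrative sketch).

    Assumes each p in `probs` is P(occupied | view i) computed with the same `prior`, and that
    views are conditionally independent given the occupancy state. Every p must lie in (0, 1).
    Working in log-odds avoids underflow of long products when many views are combined.
    """
    def logit(p):
        return math.log(p / (1.0 - p))

    prior_logit = logit(prior)
    total = prior_logit + sum(logit(p) - prior_logit for p in probs)
    # Numerically stable logistic function, so large |total| neither overflows nor errors.
    if total >= 0:
        return 1.0 / (1.0 + math.exp(-total))
    z = math.exp(total)
    return z / (1.0 + z)


print(fuse_occupancy([0.8, 0.8]))       # two agreeing views: about 0.941
print(fuse_occupancy([0.9, 0.1, 0.1]))  # free-space evidence from two views outweighs one
```

Because a single confident "empty" estimate pulls the product toward zero, free-space evidence along sight rays is enforced naturally by this kind of fusion.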
Conference Paper
Interactive real-time streaming applications, such as audio-video conferencing, online gaming, and app streaming, place stringent requirements on the network in terms of delay, jitter, and packet loss. Many of these applications inherently involve client-to-client communication, which is particularly challenging since the performance requirements nee...
Article
In free-viewpoint video, there is a recent trend to represent scene objects as solids rather than using multiple depth maps. Point clouds have been used in computer graphics for a long time and with the recent possibility of real time capturing and rendering, point clouds have been favored over meshes in order to save computation. Each point in the...
Article
Full-text available
In texture-plus-depth representation of a 3D scene, depth maps from different camera viewpoints are typically lossily compressed via the classical transform coding / coefficient quantization paradigm. In this paper we propose to reduce distortion of the decoded depth maps due to quantization. The key observation is that depth maps from different vi...
Article
Full-text available
This paper addresses the problem of compression of 3D point cloud sequences that are characterized by moving 3D positions and color attributes. As temporally successive point cloud frames are similar, motion estimation is key to effective compression of these sequences. It however remains a challenging problem as the point cloud frames have varying...
Patent
A temporal information integration dis-occlusion system and method for using historical data to reconstruct a virtual view containing an occluded area. Embodiments of the system and method use temporal information of the scene captured previously to obtain a total history. This total history is warped onto information captured by a camera at a curr...
Conference Paper
ImmerseBoard is a system for remote collaboration through a digital whiteboard that gives participants a 3D immersive experience, enabled only by an RGBD camera (Microsoft Kinect) mounted on the side of a large touch display. Using 3D processing of the depth images, life-sized rendering, and novel visualizations, ImmerseBoard emulates writing side-...
Patent
Full-text available
Techniques for virtual conferencing servers are described. An apparatus may comprise a conferencing server to manage a conference call with multiple client terminals. The conferencing server may have a virtual conference manager to select a first client terminal to operate as a first virtual conferencing server, and transfer conference call operati...
Article
The accuracy of scene flow is restricted by several challenges such as occlusion and large displacement motion. When occlusion happens, the positions inside the occluded regions lose their corresponding counterparts in preceding and succeeding frames. Large displacement motion will increase the complexity of motion modeling and computation. Moreove...
Conference Paper
Conventional scene flow containing only translational vectors is not able to model 3D motion with rotation properly. Moreover, the accuracy of 3D motion estimation is restricted by several challenges such as large displacement, noise, and missing data (caused by sensing techniques or occlusion). In terms of solution, there are two kinds of approach...
Conference Paper
Full-text available
In this paper, we further the characterization of a fundamental limit of human perception: the accuracy of human estimation of others' eye gaze directions. In particular, we introduce a non-linear model that describes how both the head direction and the gaze direction of a looker relative to an observer jointly affect the observer's perception of t...
Article
Transmitting compactly represented geometry of a dynamic 3D scene from a sender can enable a multitude of imaging functionalities at a receiver, such as synthesis of virtual images at freely chosen viewpoints via depth-image-based rendering. While depth maps (projections of 3D geometry onto 2D image planes at chosen camera viewpoints) can nowadays be...
Conference Paper
Full-text available
The next step in immersive communication beyond video from a single camera is object-based free viewpoint video, which is the capture and compression of a dynamic object such that it can be reconstructed and viewed from an arbitrary viewpoint. The moving human body is a particularly useful subclass of dynamic object for object-based free viewpoint...
Article
The last great advances in immersive communication were the invention of the telephone over 137 years ago and the invention of the video telephone (né television) over 86 years ago. However, a perfect storm is brewing for the next advance in immersive communication, thanks to the convergence of massive amounts of computation, bandwidth, resolution,...
Conference Paper
Transmitting compactly represented geometry of a dynamic scene from a sender can enable a multitude of 3D imaging functionalities at a receiver, such as synthesis of virtual images from freely chosen viewpoints via depth-image-based rendering (DIBR). While depth maps can now be readily captured using inexpensive depth sensors, they are often corrup...
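The core geometric step of DIBR is to back-project each reference pixel using its depth and re-project it into the virtual camera. A minimal Python sketch under a pinhole model with zero skew follows; the intrinsics and pose conventions ($\mathbf{X}_v = \mathbf{R}\mathbf{X}_r + \mathbf{t}$) and the function name are assumptions, and z-buffering of colliding pixels and hole filling of disocclusions are left out.

```python
def warp_pixel(u, v, depth, K_ref, R, t, K_virt):
    """Forward-warp one reference pixel into a virtual pinhole camera.

    K_ref, K_virt: 3x3 intrinsic matrices (nested lists, zero skew assumed).
    R (3x3), t (3): map reference-camera coordinates to virtual-camera ones, X_v = R X_r + t.
    Returns the virtual-image position and the depth in the virtual camera (for z-buffering).
    """
    fx, fy = K_ref[0][0], K_ref[1][1]
    cx, cy = K_ref[0][2], K_ref[1][2]
    # Back-project: scale the pixel's viewing ray so that its z-coordinate equals `depth`.
    X = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Rigid transform into the virtual camera's frame.
    Xv = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective projection with the virtual camera's intrinsics.
    p = [sum(K_virt[i][j] * Xv[j] for j in range(3)) for i in range(3)]
    return p[0] / p[2], p[1] / p[2], Xv[2]


K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Virtual camera displaced 0.1 units along +x, so points shift by -0.1 in its frame.
u2, v2, z2 = warp_pixel(400.0, 240.0, 2.0, K, I3, [-0.1, 0.0, 0.0], K)
print(u2, v2, z2)  # disparity 400 - u2 is f * baseline / depth = 25 (up to rounding)
```

Since the disparity is inversely proportional to depth, errors in the depth values translate directly into misplaced pixels in the synthesized view.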
Conference Paper
We propose a framework for simultaneous phase unwrapping and multipath interference cancellation (SPUMIC) in homodyne time-of-flight (ToF) cameras. Our multi-frequency acquisition framework is based on parametric modeling of the multipath interference phenomena. We use robust spectral estimation methods with low computational complexity to detect a...
Conference Paper
Full-text available
Transmitting compressed texture and depth maps of multiple viewpoints from a sender enables image synthesis at the receiver from any intermediate virtual viewpoint via depth-image-based rendering (DIBR). We observe that quantized depth maps from different viewpoints of the same 3D scene constitute multiple descriptions (MD) of the same signal, thus it i...
Article
Full-text available
We observe that, in a network, the location of the node on which a service is computed is inextricably linked to the locations of the paths through which the service communicates. Hence service location can have a profound effect on quality of service (QoS), especially for communication-centric applications such as real-time multimedia. In this pap...
Article
Full-text available
In this paper, we study the problem of utility maximization in peer-to-peer (P2P) systems, in which aggregate application-specific utilities are maximized by running distributed algorithms on P2P nodes, which are constrained by their uplink capacities. For certain P2P topologies, we show that routing along a linear number of trees per source can ac...
Conference Paper
Augmented reality applications have focused on visually integrating virtual objects into real environments. In this paper, we propose an auditory augmented reality, where we integrate acoustic virtual objects into the real world. We sonify objects that do not intrinsically produce sound, with the purpose of revealing additional information about th...
Article
Full-text available
Communication has seen enormous advances over the past 100 years including radio, television, mobile phones, video conferencing, and Internet-based voice and video calling. Still, remote communication remains less natural and more fatiguing than face-to-face. The vision of immersive communication is to enable natural experiences and interactions wi...
Article
Full-text available
Traditional multiparty audio or video conferencing uses a single node, sometimes called a multipoint control unit, or MCU, to mix audio data for the conference. We introduce a novel mixer, called a Virtual Mixer, which performs mixing in a distributed way over the network. The Virtual Mixer topology is optimized over Steiner trees using a metric of...