Conference Paper

Folding-Based Compression Of Point Cloud Attributes


... In other words, f is an injective function that maps the point-space domain to a pixel-space domain, which ensures that the quality measure can be mapped into a proper distance metric (e.g., Euclidean, Wasserstein, etc.). In this work, we use the transformation proposed by Quach et al. [28], which has two main stages. In the first stage, the method uses a deep neural network to find a parametric function that folds a 2D grid onto a 3D point cloud. ...
... The second stage consists of an optimal mapping of the attributes of the original PC onto the 2D grid. More details about this folding method can be found in the original paper [28]. ...
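The two stages described above can be sketched in a few lines. This is purely illustrative: the analytic cylinder fold below stands in for the trained folding network of [28], and the greedy nearest-cell assignment stands in for the optimal attribute mapping.

```python
import math

# Stage 1 (sketch): a parametric function f: [0,1]^2 -> R^3 that folds a
# 2D grid into 3D space. In the actual method this function is a trained
# deep neural network; the analytic cylinder fold here is a stand-in.
def fold(u, v):
    theta = 2 * math.pi * u
    return (math.cos(theta), math.sin(theta), v)

# Stage 2 (sketch): assign each point's attribute to the grid cell whose
# folded 3D position is closest to the point (a greedy stand-in for the
# optimal mapping used in the paper).
def map_attributes(points, colors, grid_res=8):
    grid = [[None] * grid_res for _ in range(grid_res)]
    for p, c in zip(points, colors):
        best, best_d = None, float("inf")
        for i in range(grid_res):
            for j in range(grid_res):
                q = fold(i / (grid_res - 1), j / (grid_res - 1))
                d = sum((a - b) ** 2 for a, b in zip(p, q))
                if d < best_d:
                    best, best_d = (i, j), d
        grid[best[0]][best[1]] = c
    return grid
```

The resulting `grid` is a pure-attribute 2D image, which is what makes ordinary image processing and compression tools applicable downstream.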
... It is worth noting that Quach's folding algorithm [28], used in the first step of feature map extraction (see Fig. 1), can be flexibly applied to point cloud patches to better adapt to local geometric complexity. After folding the PC, the feature map can be rapidly computed via DRLBP, since this operator does not increase the computational complexity over the traditional LBP while being rotation and viewpoint invariant. ...
Methods for point cloud (PC) quality assessment customarily perform local comparisons between corresponding points in the “degraded” and pristine PCs. These methods often compare the geometry of the degraded PC and the geometry of the reference PC. More recently, a few methods that use texture information to assess the PC quality have been proposed. In this work, we propose a full-reference Point Cloud Quality Assessment (PCQA) metric that combines both geometry and texture information to provide an estimate of the PC quality. We use a projection technique that represents PCs as 2D manifolds in the 3D space. This technique maps attributes from the PCs onto the folded 2D grid, generating a pure-texture 2D image (texture map) that contains the PC texture information. Then, we extract statistical features from these texture maps using a multi-scale rotation-invariant texture descriptor named the Dominant Rotated Local Binary Pattern (DRLBP). The texture similarity is computed by measuring the statistical differences between reference and test PCs. The geometrical similarities are computed using geometry-only distances. Finally, the texture and geometrical similarities are fused using a stacked regressor to model the PC visual quality. Experimental results show that the proposed method outperforms several state-of-the-art methods. An implementation of the metric described in this paper can be found at
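As a rough illustration of the texture-descriptor step, a basic 8-neighbour LBP code can be computed as below. This is a simplified stand-in: the metric above uses the multi-scale Dominant Rotated LBP (DRLBP), which additionally rotates the bit string to make the code rotation invariant.

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour Local Binary Pattern code for pixel (r, c).

    Each neighbour contributes one bit: 1 if it is at least as bright as
    the centre pixel, 0 otherwise. DRLBP (used in the paper) builds on
    this idea with rotation of the bit string and multi-scale sampling.
    """
    center = img[r][c]
    # Clockwise neighbour offsets starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code
```

Histograms of such codes over a texture map yield the statistical features that are compared between reference and test PCs.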
... This is similar to the concept of UV texture maps (Catmull, 1974) in Computer Graphics, except that here we seek to recover such a parameterization from a point cloud. In this context, Quach et al. (2020a) propose a folding-based approach for point cloud compression. Alternatively, attributes can be directly mapped onto a voxel grid: Alexiou et al. (2020) extend convolutional neural networks used for geometry compression to attribute compression. ...
... Deep learning based methods handle the irregularity of the geometry by using a 3D regular space (voxel grid) (Alexiou et al., 2020), by mapping attributes onto a 2D grid (Quach et al., 2020a) or with the use of point convolutions to define CNNs that operate directly on the points (Sheng et al., 2021). Note that such point convolutions can often be seen as graph convolutions with the topology of the graph built from the point cloud geometry and its neighborhood structure. ...
... Numerous approaches have explored the use of 2D images for point cloud attribute compression (Mekuria et al., 2017; Zhang et al., 2017; MPEG, 2020b). Specifically, Quach et al. (2020a) have explored a deep learning based approach for mapping attributes from each point to a 2D image using a FoldingNet (Yang et al., 2017). This presents the advantage of enabling the use of any image processing or image compression method. ...
Point clouds are becoming essential in key applications with advances in capture technologies leading to large volumes of data. Compression is thus essential for storage and transmission. In this work, the state of the art for geometry and attribute compression methods with a focus on deep learning based approaches is reviewed. The challenges faced when compressing geometry and attributes are considered, with an analysis of the current approaches to address them, their limitations and the relations between deep learning and traditional ones. Current open questions in point cloud compression, existing solutions and perspectives are identified and discussed. Finally, the link between existing point cloud compression research and research problems to relevant areas of adjacent fields, such as rendering in computer graphics, mesh compression and point cloud quality assessment, is highlighted.
... A novel system for compressing point cloud attributes using DL is proposed in (Quach et al., 2020a). A 2D parameterization of the point cloud can be acquired by mapping the attributes from a point cloud onto a grid, making it possible to employ 2D image processing algorithms and compression tools. ...
... A 2D parameterization of the point cloud can be acquired by mapping the attributes from a point cloud onto a grid, making it possible to employ 2D image processing algorithms and compression tools. Inspired by (Yang et al., 2018), where a model is trained on a dataset to learn how to fold a 2D grid onto a 3D point cloud, the method of (Quach et al., 2020a) instead employs the folding network as a parametric function that maps an input 2D grid to points in 3D space. However, the lossy 3D-to-2D mapping introduces distortion in the reconstruction and lowers the accuracy of the folding in geometrically complex parts of the point cloud. ...
With the rapid growth of multimedia content, 3D objects are becoming more and more popular. Most of the time, they are modeled as complex polygonal meshes or dense point clouds, providing immersive experiences in different industrial and consumer multimedia applications. The point cloud, which is easier to acquire than a mesh and is widely applicable, has raised great interest in both the academic and commercial worlds.

A point cloud is a set of points with different properties such as their geometrical locations and the associated attributes (e.g., color, material properties, etc.). The number of points within a point cloud can range from a thousand, to constitute simple 3D objects, up to billions, to realistically represent complex 3D scenes. Such huge amounts of data bring great technological challenges in terms of transmission, processing, and storage of point clouds.

In recent years, numerous research works focused their efforts on the compression of meshes, while less attention was paid to point clouds. We have identified two main approaches in the literature: a purely geometric one based on octree decomposition, and a hybrid one based on both geometry and video coding. The first approach can provide accurate 3D geometry information but exhibits weak temporal consistency. The second one can efficiently remove temporal redundancy, yet a decrease in geometrical precision can be observed after the projection. Thus, the tradeoff between compression efficiency and accurate prediction needs to be optimized.

We focused on exploring the temporal correlations between dynamic dense point clouds. We proposed different approaches to improve the compression performance of the MPEG (Moving Picture Experts Group) V-PCC (Video-based Point Cloud Compression) test model, which provides state-of-the-art compression on dynamic dense point clouds. First, an octree-based adaptive segmentation is proposed to cluster the points with different motion amplitudes into 3D cubes. Then, motion estimation is applied to these cubes using affine transformation. Gains in terms of rate-distortion (RD) performance have been observed in sequences with relatively low motion amplitudes. However, the cost of building an octree for the dense point cloud remains expensive, while the resulting octree structures show poor temporal consistency for sequences with higher motion amplitudes.

An anatomical structure is then proposed to model the motion of point clouds representing humanoids more inherently. With the help of 2D pose estimation tools, the motion is estimated from 14 anatomical segments using affine transformation. Moreover, we propose a novel solution for color prediction and discuss the residual coding from prediction. It is shown that instead of encoding redundant texture information, it is more valuable to code the residuals, which leads to better RD performance.

Although our contributions have improved the performance of the V-PCC test models, the temporal compression of dynamic point clouds remains a highly challenging task. Due to the limitations of current acquisition technology, the acquired point clouds can be noisy in both the geometry and attribute domains, which makes it challenging to achieve accurate motion estimation. In future studies, the technologies used for 3D meshes may be exploited and adapted to provide temporally consistent connectivity information between dynamic 3D point clouds.
... However, these volumetric methods incur huge memory and computational costs due to 3D convolutions. In addition, some researchers [31] projected a point cloud onto collections of images, but the projection transformation introduces unnecessary quantization artifacts. ...
Geometry-based point cloud compression (G-PCC) can achieve remarkable compression efficiency for point clouds. However, it still leads to serious attribute compression artifacts, especially under low bitrate scenarios. In this paper, we propose a Multi-Scale Graph Attention Network (MS-GAT) to remove the artifacts of point cloud attributes compressed by G-PCC. We first construct a graph based on point cloud geometry coordinates and then use the Chebyshev graph convolutions to extract features of point cloud attributes. Considering that one point may be correlated with points both near and far away from it, we propose a multi-scale scheme to capture the short and long range correlations between the current point and its neighboring and distant points. To address the problem that various points may have different degrees of artifacts caused by adaptive quantization, we introduce the quantization step per point as an extra input to the proposed network. We also incorporate a graph attentional layer into the network to pay special attention to the points with more attribute artifacts. To the best of our knowledge, this is the first attribute artifacts removal method for G-PCC. We validate the effectiveness of our method over various point clouds. Experimental results show that our proposed method achieves an average of 9.28% BD-rate reduction. In addition, our approach achieves some performance improvements for the downstream point cloud semantic segmentation task.
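A single-channel Chebyshev graph convolution, the building block that MS-GAT stacks into its multi-scale attention network, can be sketched as follows. The dense Laplacian and scalar weights `theta` are illustrative simplifications of the learned multi-channel layers.

```python
import numpy as np

def cheb_graph_conv(L, x, theta):
    """One Chebyshev graph convolution y = sum_k theta[k] T_k(Lt) x,
    where Lt = 2 L / lmax - I is the rescaled graph Laplacian and the
    Chebyshev polynomials follow T_0 = I, T_1 = Lt,
    T_k = 2 Lt T_{k-1} - T_{k-2}. Single-channel sketch only."""
    lmax = np.linalg.eigvalsh(L).max()
    Lt = 2.0 * L / lmax - np.eye(L.shape[0])
    Tx = [x, Lt @ x]                 # T_0 x and T_1 x
    y = theta[0] * Tx[0]
    if len(theta) > 1:
        y = y + theta[1] * Tx[1]
    for k in range(2, len(theta)):   # recurrence for higher orders
        Tx.append(2 * Lt @ Tx[-1] - Tx[-2])
        y = y + theta[k] * Tx[-1]
    return y
```

The polynomial order K controls how far information propagates on the graph, which is what the multi-scale scheme above exploits to capture both short- and long-range correlations.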
... In [16], a deeper auto-encoding architecture is proposed, based on 3D convolution layers stacked with Voxception-ResNet structures and a hyper-prior. In [17], an encoding scheme relying on folding of a 2D grid onto a point cloud is proposed, with the attributes of the latter being mapped on top of it. In [18], geometry and color information is encoded directly in the 3D domain by extracting features from regular grids, using 3D convolutions and capturing spatial redundancies. ...
... Deep learning approaches have been employed to compress the geometry [5,6,7,8,9,10] and attributes [11,12] of point clouds. Specific approaches have also been developed for sparse LIDAR point clouds [13,14]. ...
Point clouds are essential for storage and transmission of 3D content. As they can entail significant volumes of data, point cloud compression is crucial for practical usage. Recently, point cloud geometry compression approaches based on deep neural networks have been explored. In this paper, we evaluate the ability to predict perceptual quality of typical voxel-based loss functions employed to train these networks. We find that the commonly used focal loss and weighted binary cross entropy are poorly correlated with human perception. We thus propose a perceptual loss function for 3D point clouds which outperforms existing loss functions on the ICIP2020 subjective dataset. In addition, we propose a novel truncated distance field voxel grid representation and find that it leads to sparser latent spaces and loss functions that are more correlated with perceived visual quality compared to a binary representation. The source code is available at
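The truncated distance field voxel grid mentioned above can be sketched as follows: instead of a binary occupancy value, each voxel stores its distance to the nearest point, clipped to a truncation value. The brute-force nearest-point search, grid resolution, and truncation value are illustrative, not the paper's settings.

```python
import math

def truncated_distance_field(points, grid_size, trunc):
    """Voxelise a point set as a truncated distance field: each voxel
    stores the distance from its centre to the nearest point, clipped
    to `trunc`. A sketch of the representation, using a brute-force
    nearest-neighbour search for clarity."""
    tdf = [[[trunc] * grid_size for _ in range(grid_size)]
           for _ in range(grid_size)]
    for x in range(grid_size):
        for y in range(grid_size):
            for z in range(grid_size):
                center = (x + 0.5, y + 0.5, z + 0.5)
                d = min(math.dist(center, p) for p in points)
                tdf[x][y][z] = min(d, trunc)
    return tdf
```

Compared with a binary grid, the field varies smoothly near the surface, which is the property linked above to sparser latent spaces and better perceptual correlation.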
... For lossless geometry coding, deep neural networks have been used to improve entropy modeling [16]. Also, DPCC for attributes has been explored by interpreting point clouds as 2D discrete manifolds in 3D space [17]. Closely related to our study, the behavior and performance of DPCC methods have been investigated in [5]. ...
Point clouds have been recognized as a crucial data structure for 3D content and are essential in a number of applications such as virtual and mixed reality, autonomous driving, cultural heritage, etc. In this paper, we propose a set of contributions to improve deep point cloud compression, i.e.: using a scale hyperprior model for entropy coding; employing deeper transforms; a different balancing weight in the focal loss; optimal thresholding for decoding; and sequential model training. In addition, we present an extensive ablation study on the impact of each of these factors, in order to provide a better understanding about why they improve RD performance. An optimal combination of the proposed improvements achieves BD-PSNR gains over G-PCC trisoup and octree of 5.51 (6.50) dB and 6.83 (5.85) dB, respectively, when using the point-to-point (point-to-plane) metric. Code is available at .
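The balancing weight in the focal loss, one of the factors studied in the ablation above, enters the per-voxel loss as sketched below. The default `alpha` and `gamma` values are illustrative, not the paper's tuned settings.

```python
import math

def focal_loss(p, y, alpha=0.7, gamma=2.0):
    """Focal loss for one voxel.

    p: predicted occupancy probability in (0, 1].
    y: ground-truth occupancy (0 or 1).
    alpha balances occupied vs. empty voxels (the balancing weight);
    gamma down-weights easy examples. Values here are illustrative."""
    pt = p if y == 1 else 1.0 - p
    a = alpha if y == 1 else 1.0 - alpha
    return -a * (1.0 - pt) ** gamma * math.log(pt)
```

Because most voxels in a sparse point cloud are empty, shifting `alpha` changes how aggressively the network is pushed to predict occupancy, which directly affects RD performance.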
Conference Paper
Point cloud imaging has emerged as an efficient and popular solution to represent immersive visual information. However, the large volume of data generated in the acquisition process reveals the need of efficient compression solutions in order to store and transmit such contents. Several standardization committees are in the process of finalizing efficient compression schemes to cope with the large volume of information that point clouds require. At the same time, recent efforts on learning-based compression approaches have been shown to exhibit good performance in the coding of conventional image and video contents. It is currently an open question how learning-based coding performs when applied to point cloud data. In this study, we extend recent efforts on the matter by exploring neural network implementations for separate, or joint compression of geometric and textural information from point cloud contents. Two alternative architectures are presented and compared; that is, a unified model that learns to encode point clouds in a holistic way, allowing fine-tuning for quality preservation per attribute, and a second paradigm consisting of two cascading networks that are trained separately to encode geometry and color, individually. A baseline configuration from the best-performing option is compared to the MPEG anchor, showing better performance for geometry and competitive performance for color encoding at low bit-rates. Moreover, the impact of a series of parameters is examined on the network performance, such as the selection of input block resolution for training and testing, the color space, and the loss functions. Results provide guidelines for future efforts in learning-based point cloud compression.
Conference Paper
Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can range in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset with 51.5% BDBR savings on average. Moreover, while octree-based methods face exponential diminution of the number of points at low bitrates, our method still produces high resolution outputs even at low bitrates. Code and supplementary material are available at .
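The decoding-as-binary-classification step described above can be sketched as follows, assuming the decoder network outputs a grid of per-voxel occupancy probabilities. The 0.5 threshold is illustrative.

```python
def decode_occupancy(probs, threshold=0.5):
    """Decode a voxel occupancy map by binary classification: keep the
    voxels whose predicted occupancy probability exceeds a threshold
    and return their integer coordinates as the reconstructed point
    cloud. A sketch of the classification view of decoding."""
    points = []
    for x, plane in enumerate(probs):
        for y, row in enumerate(plane):
            for z, p in enumerate(row):
                if p > threshold:
                    points.append((x, y, z))
    return points
```

Framing decoding this way is what allows classification losses (e.g. weighted cross entropy or focal loss) to be used for rate-distortion training.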
Compression of point clouds has so far been confined to coding the positions of a discrete set of points in space and the attributes of those discrete points. We introduce an alternative approach based on volumetric functions, which are functions defined not just on a finite set of points, but throughout space. As in regression analysis, volumetric functions are continuous functions that are able to interpolate values on a finite set of points as linear combinations of continuous basis functions. Using a B-spline wavelet basis, we are able to code volumetric functions representing both geometry and attributes. Attribute compression is addressed in Part I of this paper, while geometry compression is addressed in Part II. Geometry is represented implicitly as the level set of a volumetric function (the signed distance function or similar). Experimental results show that geometry compression using volumetric functions improves over the methods used in the emerging MPEG Point Cloud Compression (G-PCC) standard.
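The implicit geometry representation described above can be illustrated with an analytic signed distance function; in the paper such volumetric functions are represented and coded with B-spline wavelet bases rather than in closed form, so the sphere below is purely illustrative.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the
    surface, positive outside. Geometry is then the level set
    {p : f(p) = 0} of the volumetric function f."""
    return math.dist(p, center) - radius

def on_surface(p, tol=1e-9):
    """Check whether a point lies on the zero level set (the surface)."""
    return abs(sphere_sdf(p)) <= tol
```

Coding the coefficients of such a function (in a wavelet basis) instead of explicit point positions is what distinguishes this approach from octree-style geometry coding.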
Due to the increased popularity of augmented and virtual reality experiences, the interest in capturing the real world in multiple dimensions and in presenting it to users in an immersible fashion has never been higher. Distributing such representations enables users to freely navigate in multi-sensory 3D media experiences. Unfortunately, such representations require a large amount of data, not feasible for transmission on today’s networks. Efficient compression technologies well adopted in the content chain are in high demand and are key components to democratize augmented and virtual reality applications. The Moving Picture Experts Group, MPEG, as one of the main standardization groups dealing with multimedia, identified the trend and started recently the process of building an open standard for compactly representing 3D point clouds, which are the 3D equivalent of the very well-known 2D pixels. This paper introduces the main developments and technical aspects of this ongoing standardization effort.
Technical Report
TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at
In free-viewpoint video, there is a recent trend to represent scene objects as solids rather than using multiple depth maps. Point clouds have been used in computer graphics for a long time, and with the recent possibility of real-time capturing and rendering, point clouds have been favored over meshes in order to save computation. Each point in the cloud is associated with its 3D position and its color. We devise a method to compress the colors in point clouds which is based on a hierarchical transform and arithmetic coding. The transform is a hierarchical sub-band transform that resembles an adaptive variation of a Haar wavelet. The arithmetic encoding of the coefficients assumes Laplace distributions, one per sub-band. The Laplace parameter for each distribution is transmitted to the decoder using a custom method. The geometry of the point cloud is encoded using the well-established octree scanning. Results show that the proposed solution performs comparably to the current state-of-the-art, on many occasions outperforming it, while being much more computationally efficient. We believe this work represents the state-of-the-art in intra-frame compression of point clouds for real-time 3D video.
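One butterfly of such a weight-adaptive Haar-like transform can be sketched as below: two attribute values with occupancy weights are merged into a low-pass coefficient (carried up the hierarchy with the combined weight) and a high-pass coefficient (entropy coded). This single step is a simplification of the full octree-driven transform.

```python
import math

def haar_merge(c1, w1, c2, w2):
    """One butterfly of a weight-adaptive Haar-like transform.

    c1, c2: attribute values (e.g. one color channel) of two sibling
    regions; w1, w2: their occupancy weights (number of points).
    Returns (low-pass, high-pass, combined weight). The orthonormal
    weighting keeps the transform energy-preserving."""
    a = math.sqrt(w1 / (w1 + w2))
    b = math.sqrt(w2 / (w1 + w2))
    low = a * c1 + b * c2        # weighted average, passed up one level
    high = -b * c1 + a * c2      # weighted difference, coded
    return low, high, w1 + w2
```

For equal weights this reduces to the classic Haar pair (scaled sum and difference); for uneven occupancy the weighting keeps sparse regions from dominating the low-pass signal.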
Compressing attributes on 3D point clouds such as colors or normal directions has been a challenging problem, since these attribute signals are unstructured. In this paper, we propose to compress such attributes with graph transform. We construct graphs on small neighborhoods of the point cloud by connecting nearby points, and treat the attributes as signals over the graph. The graph transform, which is equivalent to Karhunen-Loève Transform on such graphs, is then adopted to decorrelate the signal. Experimental results on a number of point clouds representing human upper bodies demonstrate that our method is much more efficient than traditional schemes such as octree-based methods.
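The graph transform described above can be sketched as follows: build the graph Laplacian from an adjacency matrix over a small neighborhood, and use its eigenvectors as the transform basis. The tiny dense adjacency matrix is illustrative; real point clouds use many small local graphs.

```python
import numpy as np

def graph_transform(W, signal):
    """Graph transform of an attribute signal.

    W: symmetric adjacency (weight) matrix of a small neighborhood
    graph; signal: attribute values, one per node. The eigenvectors of
    the graph Laplacian L = D - W form the basis, decorrelating signals
    that vary smoothly over the graph."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, U = np.linalg.eigh(L)   # columns: eigenvectors, ascending eigenvalue
    return U.T @ signal        # transform coefficients
```

A smooth (e.g. constant) attribute signal compacts all its energy into the first coefficient, which is exactly the decorrelation property exploited for compression.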
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has little memory requirements, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
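The Adam update rule for a single scalar parameter can be written out directly from the description above: exponential moving averages of the gradient and squared gradient, with bias correction for the zero initialization.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.

    m, v: running first and second moment estimates; t: 1-based step
    count used for bias correction. Default hyper-parameters are the
    commonly used ones."""
    m = b1 * m + (1 - b1) * grad            # first moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad     # second moment estimate
    m_hat = m / (1 - b1 ** t)               # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

Because the step is normalized by the square root of the second moment, it is invariant to diagonal rescaling of the gradients, as noted above.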
Conference Paper
Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these “Stepped Sigmoid Units” are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.
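The construction above can be verified numerically: the sum of binary-unit copies with biases -0.5, -1.5, ... is closely approximated by the softplus function log(1 + e^x), which in turn is approximated by the rectified linear unit max(0, x).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def stepped_sigmoid_sum(x, n_copies=100):
    """Expected total activity of n copies of a binary unit sharing
    weights, with biases -0.5, -1.5, -2.5, ... as described above."""
    return sum(sigmoid(x - i + 0.5) for i in range(1, n_copies + 1))

def relu(x):
    """Rectified linear unit: the cheap deterministic approximation."""
    return max(0.0, x)
```

For inputs well above zero, all three quantities nearly coincide, which is why the much cheaper ReLU can replace the stepped sigmoid units in practice.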
Microsoft voxelized upper bodies - a voxelized point cloud dataset
  • Charles Loop
  • Qin Cai
  • Sergio O Escolano
  • Philip A Chou
Rectifier nonlinearities improve neural network acoustic models
  • Andrew L Maas
  • Awni Y Hannun
  • Andrew Y Ng
Common test conditions for point cloud compression
  • Sebastian Schwarz
  • Gaelle Martin-Cocher
  • David Flynn
  • Madhukar Budagavi