Conference Paper

Learning-Based Lossless Compression of 3D Point Cloud Geometry

... Learning-based approaches, leveraging deep neural networks, have emerged as promising solutions for point cloud compression [5][6][7][8][9][10][11][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34]. These techniques harness the representation power of neural networks to capture intricate patterns and correlations present in point cloud data. ...
... This departure from traditional methods offers the potential to achieve higher compression ratios while maintaining desirable fidelity in reconstructed point clouds. Many works extend the design of deep image/video compression techniques to compress the point cloud geometry [5][6][7][8][9][10][11][19][20][21][22][23][24][25][26][27][28] and point cloud attributes [29][30][31][32][33][34], achieving promising performance. Motivated by deep image compression that uses 2D convolutional neural networks (2D-CNNs), many point cloud geometry compression methods use 3D-CNNs to extract latent features from the voxelized point cloud. ...

... Point-based approaches, on the other hand, treat the points themselves as the fundamental element to be compressed, rather than using a binary occupancy voxel grid. An autoencoder is adopted in [20][21][22][23] for geometry compression that directly takes points as input and processes them with PointNet to transform the data into a latent space. A fully connected layer is used in the decoder to reconstruct the points in [21], while multiscale neural graph sampling is utilized in [22] to further characterize neighboring structures as latent features. ...
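To make the point-based pipeline in the excerpt above concrete, here is a minimal PyTorch sketch of a PointNet-style autoencoder: a shared per-point MLP with global max pooling produces a latent code, and a fully connected decoder maps it back to a fixed number of points. Layer sizes, the class name and the fixed point count are illustrative assumptions, not the design of any cited method.

import torch
import torch.nn as nn

class PointAutoencoder(nn.Module):
    def __init__(self, num_points=1024, latent_dim=256):
        super().__init__()
        self.num_points = num_points
        # Shared per-point MLP (PointNet-style), implemented with 1x1 convolutions.
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, latent_dim, 1),
        )
        # Fully connected decoder mapping the latent vector back to N points.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, num_points * 3),
        )

    def forward(self, xyz):                       # xyz: (B, N, 3)
        feat = self.encoder(xyz.transpose(1, 2))  # (B, latent_dim, N)
        latent = feat.max(dim=2).values           # symmetric global pooling -> (B, latent_dim)
        recon = self.decoder(latent).view(-1, self.num_points, 3)
        return recon, latent                      # latent is what would be quantized and entropy coded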
Article
Full-text available
As immersive media gains increasing prominence, point clouds have emerged as a preferred data representation for presenting complex 3D scenes. However, the large size of point cloud data poses challenges in terms of storage and real-time transmission, prompting the need for highly efficient point cloud compression techniques. In response to these challenges, we introduce a novel approach called ANFPCGC++ (Augmented Normalizing Flow-based Point Cloud Geometry Compression) for lossy static point cloud geometry coding. ANFPCGC++ leverages the power of Augmented Normalizing Flow (ANF) in conjunction with sparse convolution to effectively capture and incorporate spatial correlations inherent in point clouds. ANF offers a higher level of expressiveness compared to conventional methods like variational autoencoders (VAE), resulting in more accurate and faithful latent representations. Furthermore, we introduce a Transformer-based entropy model that combines hyperprior and context information, enabling a more precise entropy model that supports parallel computation. Extensive experimental results confirm the superior performance of ANFPCGC++. Compared to the point cloud coding standards G-PCC and V-PCC, our proposed method achieves remarkable bitrate savings of 63.7% and 60.0% in terms of D1-PSNR, respectively. Additionally, when compared to other deep learning-based point cloud geometry compression methods like PCGCv2 and ANFPCGC, our approach demonstrates an average bitrate reduction of 25.6% and 23.6% in terms of D1-PSNR, respectively. The source code is available at https://github.com/ymnn1996/ANFPCGC2.
... 3. We demonstrate the effectiveness of our approach by applying it to two state-of-the-art models: an octree-based one (OctAttention [14]) and a voxel-based one (VoxelDNN [16]). Experimental results on object point cloud datasets MPEG 8i and MVUB, as well as LiDAR point cloud dataset SemanticKITTI show that our method can reduce the bitrate in geometry point cloud encoding without significantly increasing time complexity. ...
... While all these studies focus on the optimization of the network structure and context information, they overlook the optimization of basic training strategies and the efficiency of context utilization. Specifically, all the models mentioned above, except for CNet [28] and the methods in [16] [30], use the 255-dimensional one-hot encoding of the eight child nodes of the current node as the label, and the cross-entropy between the one-hot encoding and the probability distribution predicted by the context model as the loss function. As explained in Section. ...
... Finally, a Softmax layer generates the probability distribution of the node, which is used for entropy coding. Moreover, it is worth noting that the proposed structure is general and can enhance the performance of multiple learning-based context models for point cloud compression, e.g., VoxelDNN [16] (see Section IV.E). ...
Preprint
In point cloud geometry compression, context models usually use the one-hot encoding of node occupancy as the label, and the cross-entropy between the one-hot encoding and the probability distribution predicted by the context model as the loss function. However, this approach has two main weaknesses. First, the differences between contexts of different nodes are not significant, making it difficult for the context model to accurately predict the probability distribution of node occupancy. Second, as the one-hot encoding is not the actual probability distribution of node occupancy, the cross-entropy loss function is inaccurate. To address these problems, we propose a general structure that can enhance existing context models. We introduce context feature residuals into the context model to amplify the differences between contexts. We also add a multi-layer perceptron branch that uses the mean squared error between its output and node occupancy as a loss function to provide accurate gradients in backpropagation. We validate our method by showing that it can improve the performance of an octree-based model (OctAttention) and a voxel-based model (VoxelDNN) on the object point cloud datasets MPEG 8i and MVUB, as well as the LiDAR point cloud dataset SemanticKITTI.
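The following PyTorch sketch illustrates, under simplifying assumptions, the kind of training objective described in this abstract: a 255-way cross-entropy on the child-occupancy symbol plus an auxiliary multi-layer perceptron branch supervised with MSE against the 8-bit occupancy vector. The module name, layer sizes and the aux_weight parameter are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxOccupancyHead(nn.Module):
    def __init__(self, context_dim=128):
        super().__init__()
        self.classifier = nn.Linear(context_dim, 255)   # distribution over occupancy symbols 1..255
        self.mlp_branch = nn.Sequential(                # auxiliary branch supervised with MSE
            nn.Linear(context_dim, 64), nn.ReLU(),
            nn.Linear(64, 8), nn.Sigmoid(),             # predicted occupancy of the 8 children
        )

    def forward(self, context_feat):
        return self.classifier(context_feat), self.mlp_branch(context_feat)

def combined_loss(logits, occ_pred, symbol_label, occ_bits, aux_weight=1.0):
    # symbol_label: (B,) class index of the occupancy byte; occ_bits: (B, 8) in {0, 1}
    ce = F.cross_entropy(logits, symbol_label)          # usual context-model loss
    mse = F.mse_loss(occ_pred, occ_bits.float())        # auxiliary gradient signal
    return ce + aux_weight * mse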
... Concurrently, learned point cloud compression methods are emerging, using deep learning techniques to compress point clouds. Earlier works such as OctSqueeze, VoxelDNN (Nguyen et al. 2021a), VoxelContext-Net (Que, Lu, and Xu 2021), and OctFormer (Cui et al. 2023) employ information from ancestor voxels to predict the current one. Advancing these approaches, OctAttention (Fu et al. 2022), SparsePCGC, and EHEM (Song et al. 2023) harness the voxels at the same level as the current one to minimize redundancy. ...
... In recent years, learned point cloud compression methods have been emerging. Many of these techniques, including those cited in (Nguyen et al. 2021b;Quach, Valenzise, and Dufaux 2019;Que, Lu, and Xu 2021;Nguyen et al. 2021a;Wang et al. 2022), utilize Octree to represent and compress point clouds. ...
... OctSqueeze builds the Octree of the point cloud, predicting voxel occupancy level by level, using information from ancestor voxels and known data about the current voxel. Building upon OctSqueeze, methods such as VoxelDNN (Nguyen et al. 2021a), VoxelContext-Net (Que, Lu, and Xu 2021), SparsePCGC, and OctFormer (Cui et al. 2023) eliminate redundancy by employing the information of neighbor voxels of the parent voxel. Moreover, Surface Prior (Chen et al. 2022) incorporates neighbor voxels that share the same depth as the current coding voxel into the framework. ...
Article
In recent years, the task of learned point cloud compression has gained prominence. An important type of point cloud, LiDAR point cloud, is generated by spinning LiDAR on vehicles. This process results in numerous circular shapes and azimuthal angle invariance features within the point clouds. However, these two features have been largely overlooked by previous methodologies. In this paper, we introduce a model-agnostic method called Spherical-Coordinate-based learned Point cloud compression (SCP), designed to fully leverage the features of circular shapes and azimuthal angle invariance. Additionally, we propose a multi-level Octree for SCP to mitigate the reconstruction error for distant areas within the Spherical-coordinate-based Octree. SCP exhibits excellent universality, making it applicable to various learned point cloud compression techniques. Experimental results demonstrate that SCP surpasses previous state-of-the-art methods by up to 29.14% in point-to-point PSNR BD-Rate.
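A small NumPy sketch of the coordinate change that a spherical-coordinate representation such as the one above relies on; the axis convention (azimuth around z, elevation from the xy-plane) is an assumption made for illustration.

import numpy as np

def cartesian_to_spherical(xyz):
    # xyz: (N, 3) float array -> (N, 3) array of (radius, azimuth, elevation).
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    r = np.sqrt(x * x + y * y + z * z)
    azimuth = np.arctan2(y, x)                   # the angle swept by a spinning LiDAR
    elevation = np.arcsin(np.divide(z, r, out=np.zeros_like(z), where=r > 0))
    return np.stack([r, azimuth, elevation], axis=1)

def spherical_to_cartesian(rae):
    r, az, el = rae[:, 0], rae[:, 1], rae[:, 2]
    return np.stack([r * np.cos(el) * np.cos(az),
                     r * np.cos(el) * np.sin(az),
                     r * np.sin(el)], axis=1)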
... Lossy geometric compression algorithms have been extensively studied due to their high compression ratio properties [5][6][7][8]. Lossless geometry compression aims to strike a balance between compressed data size and data quality [9][10][11][12][13]. When more bits are used, it is possible to obtain a value closer to the input. ...
... With the emergence of deep learning-based methods, several studies have explored neural network-based point cloud compression. Previous research [5][6][7][8][9][10][11][12][13][14][15] focused on voxel-based approaches, while [16][17][18] utilized tree structures, and [19] employed a heightmap representation. Deep learning-based methods achieve superior compression performance compared to traditional algorithms by learning more memory-efficient encoding strategies from training data. ...
... They also leveraged the sparsity of point clouds to perform progressive resampling for hierarchical point cloud reconstruction [8] and further proposed voxel compression using inter-scale and intra-scale correlations [9]. Researchers like André [14] and others [10] enhanced compression performance by adding modules to the multi-scale point cloud geometry compression network. Yu et al. [33] proposed a multi-layer residual module designed on sparse convolution-based autoencoders which progressively down-samples the input point clouds and hierarchically reconstructs them. ...
Article
Full-text available
Due to the often substantial size of real-world point cloud data, efficient transmission and storage have become critical concerns. Point cloud compression plays a decisive role in addressing these challenges. Although capturing global information within point cloud data is important for effective compression, many existing point cloud compression methods overlook this crucial aspect. To tackle this oversight, we propose an innovative end-to-end point cloud compression method designed to extract both global and local information. Our method includes a novel Transformer module to extract rich features from the point cloud. A pooling operation that requires no learnable parameters is used as a token mixer for computing long-distance dependencies, ensuring global feature extraction while significantly reducing both computation and parameters. Furthermore, we employ convolutional layers for feature extraction. These layers not only preserve the spatial structure of the point cloud, but also offer the advantage of parameter independence from the input point cloud size, resulting in a substantial reduction in parameters. Our experimental results demonstrate the effectiveness of the proposed TransPCGC network. It achieves average Bjontegaard Delta Rate (BD-Rate) gains of 85.79% and 80.24% compared to Geometry-based Point Cloud Compression (G-PCC). Additionally, in comparison to the Learned-PCGC network, our approach attains average BD-Rate gains of 18.26% and 13.83%. Moreover, it is accompanied by a 16% reduction in encoding and decoding time, along with a 50% reduction in model size.
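The abstract above describes a token mixer built from a parameter-free pooling operation. The PyTorch block below is a sketch of that general idea (in the style of PoolFormer) applied to per-point feature maps; the block name, normalization choice and expansion ratio are assumptions rather than the actual TransPCGC module.

import torch
import torch.nn as nn

class PoolTokenMixerBlock(nn.Module):
    def __init__(self, channels, pool_size=3):
        super().__init__()
        self.norm1 = nn.BatchNorm1d(channels)
        self.norm2 = nn.BatchNorm1d(channels)
        # Token mixing with no learnable parameters: local average pooling.
        self.pool = nn.AvgPool1d(pool_size, stride=1, padding=pool_size // 2)
        self.mlp = nn.Sequential(
            nn.Conv1d(channels, 4 * channels, 1), nn.GELU(),
            nn.Conv1d(4 * channels, channels, 1),
        )

    def forward(self, x):                  # x: (B, C, N) per-point features
        y = self.norm1(x)
        x = x + self.pool(y) - y           # pooled features minus identity as the token mixer
        x = x + self.mlp(self.norm2(x))    # channel MLP with residual connection
        return x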
... Concurrently, learned point cloud compression methods are emerging. Techniques such as OctSqueeze, VoxelDNN (Nguyen et al. 2021a), VoxelContext-Net (Que, Lu, and Xu 2021), and OctFormer (Cui et al. 2023) employ information from ancestor voxels to predict the current one. Advancing these approaches, OctAttention (Fu et al. 2022), SparsePCGC, and EHEM (Song et al. 2023) harness the voxels at the same level as the current one to minimize redundancy. ...
... In recent years, learned point cloud compression methods have been emerging. Many of these techniques, including those cited in (Nguyen et al. 2021b;Quach, Valenzise, and Dufaux 2019;Que, Lu, and Xu 2021;Nguyen et al. 2021a;Wang et al. 2022), utilize Octree to represent and compress point clouds. ...
... OctSqueeze builds the Octree of the point cloud, predicting voxel occupancy level by level, using information from ancestor voxels and known data about the current voxel. Building upon OctSqueeze, methods such as VoxelDNN (Nguyen et al. 2021a), VoxelContext-Net (Que, Lu, and Xu 2021), SparsePCGC, and OctFormer (Cui et al. 2023) eliminate redundancy by employing the information of neighbor voxels of the parent voxel. Moreover, Surface Prior (Chen et al. 2022) incorporates neighbor voxels that share the same depth as the current coding voxel into the framework. ...
Preprint
Full-text available
In recent years, the task of learned point cloud compression has gained prominence. An important type of point cloud, the spinning LiDAR point cloud, is generated by spinning LiDAR on vehicles. This process results in numerous circular shapes and azimuthal angle invariance features within the point clouds. However, these two features have been largely overlooked by previous methodologies. In this paper, we introduce a model-agnostic method called Spherical-Coordinate-based learned Point cloud compression (SCP), designed to leverage the aforementioned features fully. Additionally, we propose a multi-level Octree for SCP to mitigate the reconstruction error for distant areas within the Spherical-coordinate-based Octree. SCP exhibits excellent universality, making it applicable to various learned point cloud compression techniques. Experimental results demonstrate that SCP surpasses previous state-of-the-art methods by up to 29.14% in point-to-point PSNR BD-Rate.
... In order to efficiently code point cloud geometry losslessly, it is necessary to accurately estimate the occupancy probabilities to be employed into a context adaptive arithmetic codec. In our previous work, we have modeled the voxel occupancy distributions using a likelihood-based deep autoregressive network called VoxelDNN [132], inspired by the popular PixelCNN model [136]. VoxelDNN achieves state-of-the-art gains (up to 34%) over the MPEG G-PCC reference codec. ...
... Recently, deep learning has been applied widely in point cloud coding in both the octree domain [87,29] and especially voxel domain [79,141,180,132]. A coding method for static LiDAR point cloud is proposed in [87] which learns the probability distributions of the octree based on contextual information and uses an arithmetic coder for lossless coding. ...
... In this work we focus instead on dense point clouds, where voxel-based approaches have shown interesting results. In particular, our recent work, VoxelDNN [132], is an auto-regressive based model which predicts the distribution of each voxel conditioned on the previously decoded voxels. VoxelDNN obtains an average rate saving of 30% over G-PCC. ...
Thesis
Point clouds are becoming essential in key applications, with advances in capture technologies leading to large volumes of data. Compression is thus essential for storage and transmission. Point cloud compression can be divided into two parts: geometry and attribute compression. In addition, point cloud quality assessment is necessary in order to evaluate point cloud compression methods. Geometry compression, attribute compression and quality assessment form the three main parts of this dissertation. The common challenge across these three problems is the sparsity and irregularity of point clouds. Indeed, while other modalities such as images lie on a regular grid, point cloud geometry can be considered as a sparse binary signal over 3D space, and attributes are defined on the geometry, which can be both sparse and irregular. First, the state of the art for geometry and attribute compression methods, with a focus on deep learning based approaches, is reviewed. The challenges faced when compressing geometry and attributes are considered, with an analysis of the current approaches to address them, their limitations and the relations between deep learning based and traditional ones. We present our work on geometry compression: a convolutional lossy geometry compression approach with a study on the key performance factors of such methods, and a generative model for lossless geometry compression with a multiscale variant addressing its complexity issues. Then, we present a folding-based approach for attribute compression that learns a mapping from the point cloud to a 2D grid in order to reduce point cloud attribute compression to an image compression problem. Furthermore, we propose a differentiable deep perceptual quality metric that can be used to train lossy point cloud geometry compression networks while being well correlated with perceived visual quality, and a convolutional neural network for point cloud quality assessment based on a patch extraction approach. Finally, we conclude the dissertation and discuss open questions in point cloud compression, existing solutions and perspectives. We highlight the link between existing point cloud compression research and research problems in relevant adjacent fields, such as rendering in computer graphics, mesh compression and point cloud quality assessment.
... The probability model can be based on adaptively maintained counts for various contexts [4], [7] or on a neural network (NN) model [11], [17], having a binary context at the input and the probability mass function of the symbol at the output. ...
... In the last few years, machine learning approaches using neural networks have been proven to be competitive for both lossy [14], [16], [20]- [22] and lossless [11], [17], [20], [24] point cloud geometry compression. A comprehensive survey of the recent methods with a focus on the learning-based approaches is provided in [25]. ...
... In [22], adaptive octree-based decomposition of the point cloud is performed prior to encoding with a multilayer perceptron-based end-to-end learned analysis-synthesis architecture. VoxelDNN [11] uses 3D masked convolutional filters to enforce the causality of the 3D context from which the occupancy of 64 × 64 × 64 blocks of voxels is estimated. VoxelContext-Net [20] employs an octree-based deep entropy model for both dynamic and static LiDAR point clouds. ...
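The excerpts above repeatedly refer to masked 3D convolutions that keep the voxel context causal. The PyTorch sketch below shows one standard way to build such a layer by extending PixelCNN-style 'A'/'B' masks to 3D raster-scan order; the small network at the end is only an illustrative stack, not the cited VoxelDNN architecture.

import torch
import torch.nn as nn

class MaskedConv3d(nn.Conv3d):
    # Mask type 'A' also hides the centre voxel (first layer); 'B' keeps it (later layers).
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        kD, kH, kW = self.kernel_size
        mask = torch.ones(kD * kH * kW)
        centre = (kD // 2) * kH * kW + (kH // 2) * kW + (kW // 2)
        # Zero every kernel position at or after the centre in raster-scan order.
        mask[centre + (1 if mask_type == "B" else 0):] = 0
        self.register_buffer("mask", mask.view(1, 1, kD, kH, kW))

    def forward(self, x):
        self.weight.data *= self.mask       # enforce causality at every call
        return super().forward(x)

# Illustrative stack predicting a per-voxel occupancy probability for a block.
occupancy_net = nn.Sequential(
    MaskedConv3d("A", 1, 32, kernel_size=7, padding=3), nn.ReLU(),
    MaskedConv3d("B", 32, 32, kernel_size=5, padding=2), nn.ReLU(),
    MaskedConv3d("B", 32, 1, kernel_size=3, padding=1), nn.Sigmoid(),
)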
Article
Full-text available
In this paper we propose a new paradigm for encoding the geometry of dense point cloud sequences, where a convolutional neural network (CNN), which estimates the encoding distributions, is optimized on several frames of the sequence to be compressed. We adopt lightweight CNN structures, we perform training as part of the encoding process and the CNN parameters are transmitted as part of the bitstream. The newly proposed encoding scheme operates on the octree representation for each point cloud, consecutively encoding each octree resolution level. At every octree resolution level, the voxel grid is traversed section-by-section (each section being perpendicular to a selected coordinate axis), and in each section, the occupancies of groups of two-by-two voxels are encoded at once in a single arithmetic coding operation. A context for the conditional encoding distribution is defined for each two-by-two group of voxels based on the information available about the occupancy of the neighboring voxels in the current and lower resolution layers of the octree. The CNN estimates the probability mass functions of the occupancy patterns of all the voxel groups from one section in four phases. In each new phase, the contexts are updated with the occupancies encoded in the previous phase, and each phase estimates the probabilities in parallel, providing a reasonable trade-off between the parallelism of the processing and the informativeness of the contexts. The CNN training time is comparable to the time spent in the remaining encoding steps, leading to competitive overall encoding times. The bitrates and encoding-decoding times compare favorably with those of recently published compression schemes.
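As a rough illustration of the occupancy-pattern coding described in this abstract, the NumPy sketch below maps each two-by-two group of voxels in a section to one of 16 symbols and estimates its ideal arithmetic-coding cost from a predicted probability mass function. The grouping order and the bit estimate are simplifying assumptions; the phased context updates of the actual scheme are not reproduced here.

import numpy as np

def group_patterns(section):
    # section: (H, W) binary (0/1) occupancy of one voxel section, H and W even.
    h, w = section.shape
    blocks = section.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
    # Pack the four occupancies of each group into a symbol in [0, 15].
    return blocks.astype(np.int64) @ np.array([8, 4, 2, 1])

def estimated_bits(symbols, pmf):
    # pmf: (num_groups, 16) predicted distributions, one row per group.
    p = pmf[np.arange(len(symbols)), symbols]
    return float(-np.log2(np.clip(p, 1e-12, 1.0)).sum())   # ideal arithmetic-coding cost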
... Recently, entropy encoders based on deep learning have been shown to outperform hand-crafted ones in rate-distortion performance. Among them, some methods partition point clouds into voxels, then adopt 3D convolution to learn and predict the occupancy of each voxel (Nguyen et al. 2021a; Quach, Valenzise, and Dufaux 2019; Que, Lu, and Xu 2021). Voxel-based models are capable of exploiting the local geometric patterns (e.g., planes, surfaces). ...
... Voxel-based methods quantize the point cloud and classify the voxel occupancy by neural networks. Voxel-based methods outperform G-PCC (3DG 2021) on lossless geometric compression (Nguyen et al. 2021a; Quach, Valenzise, and Dufaux 2019), lossy geometric compression (Quach, Valenzise, and Dufaux 2019, 2020; Wang et al. 2021) and progressive compression (Guarda, Rodrigues, and Pereira 2020). Compared to an octree, geometric patterns can be naturally preserved in the voxel representation. ...
... In object point cloud compression, we set qs = 1 in Eq. (1) to perform lossless compression. We compare our method with the hand-crafted inter-frame octree-based context model P(full) (Garcia et al. 2019), the state-of-the-art compression method VoxelDNN (Nguyen et al. 2021a) and its fast version MSVoxelDNN (Nguyen et al. 2021b). We set the training conditions following VoxelDNN and test the models on different depth data. ...
Article
Full-text available
In point cloud compression, sufficient contexts are significant for modeling the point cloud distribution. However, the contexts gathered by the previous voxel-based methods decrease when handling sparse point clouds. To address this problem, we propose a multiple-contexts deep learning framework called OctAttention employing the octree structure, a memory-efficient representation for point clouds. Our approach encodes octree symbol sequences in a lossless way by gathering the information of sibling and ancestor nodes. Expressly, we first represent point clouds with octree to reduce spatial redundancy, which is robust for point clouds with different resolutions. We then design a conditional entropy model with a large receptive field that models the sibling and ancestor contexts to exploit the strong dependency among the neighboring nodes and employ an attention mechanism to emphasize the correlated nodes in the context. Furthermore, we introduce a mask operation during training and testing to make a trade-off between encoding time and performance. Compared to the previous state-of-the-art works, our approach obtains a 10%-35% BD-Rate gain on the LiDAR benchmark (e.g. SemanticKITTI) and object point cloud dataset (e.g. MPEG 8i, MVUB), and saves 95% coding time compared to the voxel-based baseline. The code is available at https://github.com/zb12138/OctAttention.
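The sketch below illustrates the general shape of an attention-based context model over octree symbol sequences, as described above: a window of already-coded sibling/ancestor symbols is embedded and passed through a causally masked Transformer encoder that outputs a 255-way distribution for entropy coding. Dimensions, window length and the class name are assumptions, not the published OctAttention configuration.

import torch
import torch.nn as nn

class OctreeContextModel(nn.Module):
    def __init__(self, num_symbols=256, dim=128, heads=4, layers=2, window=512):
        super().__init__()
        self.embed = nn.Embedding(num_symbols, dim)
        self.pos = nn.Parameter(torch.zeros(window, dim))
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(dim, 255)                 # occupancy byte takes values 1..255

    def forward(self, context_symbols):                 # (B, L) previously coded symbols
        L = context_symbols.size(1)
        x = self.embed(context_symbols) + self.pos[:L]
        # Causal mask: each position attends only to already-decoded nodes.
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool, device=x.device), diagonal=1)
        x = self.encoder(x, mask=mask)
        return self.head(x)                             # (B, L, 255) logits for the arithmetic coder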
... The probability model can be based on adaptively maintained counts for various contexts [4], [7], or on a NN model [11], [17], having a binary context at the input and the probability mass function of the symbol at the output. ...
... In the last few years, machine learning approaches using neural networks have proved to be successful for both lossy [14], [16], [20]-[22] and lossless [11], [17], [20] point cloud geometry compression. In [14], an auto-encoder architecture involving 3D convolutional layers is employed to generate a latent representation of the point cloud which is further compressed using range coding. ...
... VoxelDNN [11] uses 3D masked convolutional filters to enforce the causality of the 3D context from which the occupancy of 64 × 64 × 64 blocks of voxels is estimated. VoxelContext-Net [20] employs an octree-based deep entropy model for both dynamic and static LiDAR point clouds. ...
Preprint
Full-text available
We propose a new paradigm for encoding the geometry of point cloud sequences, where the convolutional neural network (CNN) which estimates the encoding distributions is optimized on several frames of the sequence to be compressed. We adopt lightweight CNN structures, we perform training as part of the encoding process, and the CNN parameters are transmitted as part of the bitstream. The newly proposed encoding scheme operates on the octree representation for each point cloud, encoding consecutively each octree resolution layer. At every octree resolution layer, the voxel grid is traversed section-by-section (each section being perpendicular to a selected coordinate axis) and in each section the occupancies of groups of two-by-two voxels are encoded at once, in a single arithmetic coding operation. A context for the conditional encoding distribution is defined for each two-by-two group of voxels, based on the information available about the occupancy of neighbor voxels in the current and lower resolution layers of the octree. The CNN estimates the probability distributions of occupancy patterns of all voxel groups from one section in four phases. In each new phase the contexts are updated with the occupancies encoded in the previous phase, and each phase estimates the probabilities in parallel, providing a reasonable trade-off between the parallelism of processing and the informativeness of the contexts. The CNN training time is comparable to the time spent in the remaining encoding steps, leading to competitive overall encoding times. Bitrates and encoding-decoding times compare favorably with those of recently published compression schemes.
... Que et al. [20] proposed VoxelContext-Net for static and dynamic point cloud compression. Nguyen et al. [25] proposed a learning-based static point cloud geometry compression method that exploits a deep convolutional neural network with a mask to learn the probability distribution of voxels. Qin et al. [26] applied deep learning methods to point cloud downsampling and proposed a Gaussian model voxel network, GVnet. ...
Article
Full-text available
In response to the challenge of handling large-scale 3D point cloud data, downsampling is a common approach, yet it often leads to the problem of feature loss. We present a dynamic downsampling algorithm for 3D point cloud maps based on an improved voxel filtering approach. The algorithm consists of two modules, namely, dynamic downsampling and point cloud edge extraction. The former adapts voxel downsampling according to the features of the point cloud, while the latter preserves edge information within the 3D point cloud map. Comparative experiments with voxel downsampling, grid downsampling, clustering-based downsampling, random downsampling, uniform downsampling, and farthest-point downsampling were conducted. The proposed algorithm exhibited favorable downsampling simplification results, with a processing time of 0.01289 s and a simplification rate of 91.89%. Additionally, it demonstrated faster downsampling speed and showcased improved overall performance. This enhancement not only benefits productivity but also highlights the system’s efficiency and effectiveness.
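For reference, a minimal NumPy voxel-grid downsampling routine of the kind the improved filter above builds on: points are bucketed into cubic voxels and each occupied voxel is replaced by the centroid of its points. The voxel size is an arbitrary illustrative value.

import numpy as np

def voxel_downsample(points, voxel_size=0.05):
    # points: (N, 3) float array -> (M, 3) one centroid per occupied voxel.
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)
    centroids = np.zeros((counts.size, 3))
    np.add.at(centroids, inverse, points)               # sum the points falling in each voxel
    return centroids / counts[:, None]                  # average to get the voxel centroid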
... Based on this assumption, the decoding process of the point cloud can be transformed into a classification problem of whether the grid is occupied or not. Similarly, the method by Nguyen et al. [15] adaptively divides the point cloud into multi-resolution voxels according to its structure. Voxel-based methods are unsuitable for large-scale data since voxelization also discretizes point cloud features, losing important information such as the local structure between points. ...
... Wang et al. [24] have made further optimizations of the voxel-based PCC method and proposed a sparse convolution-based PCC framework, which greatly reduces the computational complexity. Nguyen et al. [26] have attempted to mix octree and voxel-based coding, partitioning the point cloud into multi-resolution voxel blocks. You et al. [25] have proposed a direct way to deal with points: they divide the point cloud into multiple blocks, encode each block independently, and recombine all patches into a complete point cloud in the decoding process. ...
Article
Full-text available
With the development of 3D sensor technology, 3D point clouds are widely used in industrial scenes due to their high accuracy, which promotes the development of point cloud compression technology. Learned point cloud compression has attracted much attention for its excellent rate-distortion performance. However, there is a one-to-one correspondence between the model and the compression rate in these methods. To achieve compression at different rates, a large number of models need to be trained, which increases the training time and storage space. To address this problem, a variable rate point cloud compression method is proposed, which enables the adjustment of the compression rate by a hyperparameter in a single model. To address the narrow rate range problem that occurs when the traditional rate-distortion loss is jointly optimized for variable rate models, a rate expansion method based on contrastive learning is proposed to expand the bit rate range of the model. To improve the visual quality of the reconstructed point cloud, a boundary learning method is introduced to improve the classification ability of boundary points through boundary optimization and enhance the overall model performance. The experimental results show that the proposed method achieves variable rate compression over a large bit rate range while ensuring model performance. The proposed method outperforms G-PCC, achieving more than 70% BD-Rate gains against G-PCC, and performs about as well as the learned methods at high bit rates.
... Researchers have explored several techniques in learningbased PCC, such as voxelization followed by 3D convolution, sparse convolution, and Multi-Layer Perceptron (MLP). For example, Quach et al. [8], [35] and Nguyen et al. [36], [37] converted point clouds into 3D grids using voxelization and represented each voxel with an occupied or unoccupied state. Guarda et al. explored learning-based scalable coding for geometry [38], [39] and obtained multiple rate-distortion points from a trained model using explicit quantization of the latent representation [40]. ...
Preprint
The emergence of digital avatars has prompted an exponential increase in the demand for human point clouds with realistic and intricate details. The compression of such data becomes challenging with overwhelming data amounts comprising millions of points. Herein, we leverage the human geometric prior in geometry redundancy removal of point clouds, greatly promoting the compression performance. More specifically, the prior provides topological constraints as geometry initialization, allowing adaptive adjustments with a compact parameter set that can be represented with only a few bits. Therefore, we can envisage high-resolution human point clouds as a combination of geometric priors and structural deviations. The priors can first be derived with an aligned point cloud, and subsequently the difference of features is compressed into a compact latent code. The proposed framework can operate in a plug-and-play fashion with existing learning-based point cloud compression methods. Extensive experimental results show that our approach significantly improves the compression performance without deteriorating the quality, demonstrating its promise in a variety of applications.
... The point cloud geometry is first projected onto lower-dimensional latent spaces, and the latent spaces are then encoded by entropy coders with auto-regressive and/or hyperprior-based context modeling. In terms of lossless point cloud geometry coding, efficient autoregressive voxel-based compression methods (VoxelDNN) have been proposed in [26], [27], which were further developed into a multi-scale approach by the authors [28]. However, those voxel-based point cloud geometry compression methods require high computational complexity. ...
Preprint
In recent years, we have witnessed the presence of point cloud data in many aspects of our life, from immersive media, autonomous driving to healthcare, although at the cost of a tremendous amount of data. In this paper, we present an efficient lossless point cloud compression method that uses sparse tensor-based deep neural networks to learn point cloud geometry and color probability distributions. Our method represents a point cloud with both occupancy feature and three attribute features at different bit depths in a unified sparse representation. This allows us to efficiently exploit feature-wise and point-wise dependencies within point clouds using a sparse tensor-based neural network and thus build an accurate auto-regressive context model for an arithmetic coder. To the best of our knowledge, this is the first learning-based lossless point cloud geometry and attribute compression approach. Compared with the state-of-the-art lossless point cloud compression method from the Moving Picture Experts Group (MPEG), our method achieves a 22.6% reduction in total bitrate on a diverse set of test point clouds while having 49.0% and 18.3% rate reductions on the geometry and color attribute components, respectively.
... Data compression methods are also often employed to shrink the amount of data, allowing it to be stored and sent via a low-bandwidth channel. Such compression techniques aim to reduce data size by finding and removing statistical redundancy while keeping the original information [11], [12]. However, the compressed data size for wireless data transport remains rather large, in the range of multiple megabytes, and information loss still often occurs, necessitating appropriate network technology. ...
Chapter
This chapter delves into the realm of point cloud technologies, emphasizing the significance of open-source projects and frameworks in advancing this field. The central focus is on the OpenPointCloud library, an open-source repository that encompasses a variety of deep learning methods for point cloud compression, processing, and analysis. This library utilizes popular deep learning frameworks such as TensorFlow, PyTorch, and MXNet, offering a robust platform for developers and researchers to engage in innovative point cloud applications. The evolution of point cloud technologies and its increasing relevance across various industries are also highlighted, driven by the growing availability of open-source tools and collaborative platforms that foster innovation and enhance research capabilities. The OpenPointCloud library serves as a pivotal resource, facilitating the development and testing of advanced algorithms and contributing significantly to the open-source community. This initiative not only enriches the diversity and availability of tools but also propels the forward momentum of research in point cloud technologies, underscoring the critical role of open-source projects in the technological landscape.
Article
This article presents an approach to parallel processing of geographical data in mobile applications using advanced technologies. A technique for the efficient distribution and processing of data on gas and oil fields, etc., using parallel computing on mobile devices, is considered. This method involves the use of parallel streams on a mobile device. The research is carried out in order to improve the performance and speed of processing geographical data, which is critically important for real-time applications. The impact of this approach on response time and data processing quality is also analyzed. The proposed approach demonstrates significant improvements in comparison with traditional methods of processing geographical data, ensuring faster and more efficient operation of mobile applications. This makes this approach an important tool for improving the efficiency and usability of mobile geoinformation applications.
Chapter
Previously, our attention was directed toward techniques related to point cloud compression, encompassing transformation, quantization, entropy coding, and others. Within this section, our emphasis shifts toward methods for point cloud compression rooted in deep learning. Moreover, we delve extensively into the realm of learning-based 3D point cloud compression techniques presented at the MPEG conference. This endeavor aims to foster a more profound comprehension of point cloud compression methodologies.
Article
Effective compression of point clouds is essential for implementing virtual and mixed reality applications, which require encoding millions or even tens of millions of points. This paper offers a new geometry compression method for point clouds based on sparse cascaded residuals and sparse attention. A sparse cascaded residual module is proposed to connect multiple residual modules through shortcuts, thereby augmenting the network's learning capacity and compression efficacy. The authors developed a sparse attention module to acquire global features by computing interdependencies among points, enhancing compression performance to a greater extent. Trade-off parameters are employed to optimize the rate and distortion. The authors' method outperforms the state-of-the-art open-source method regarding rate-distortion on the ShapeNet, ModelNet, and Microsoft Voxelized Upper Bodies datasets, with average Bjøntegaard-delta (BD)-rate gains of −14.44% and −15.38%.
Article
The emergence of digital avatars has prompted an exponential increase in the demand for human point clouds with realistic and intricate details. The compression of such data becomes challenging due to massive amounts of data comprising millions of points. Herein, we leverage the human geometric prior in the geometry redundancy removal of point clouds to greatly promote compression performance. More specifically, the prior provides topological constraints as geometry initialization, allowing adaptive adjustments with a compact parameter set that can be represented with only a few bits. Therefore, we propose representing high-resolution human point clouds as a combination of a geometric prior and structural deviations. The prior is first derived with an aligned point cloud. Subsequently, the difference in features is compressed into a compact latent code. The proposed framework can operate in a plug-and-play fashion with existing learning-based point cloud compression methods. Extensive experimental results show that our approach significantly improves the compression performance without deteriorating the quality, demonstrating its promise in serving a variety of applications.
Article
The Geometry-based Point Cloud Compression (G-PCC) has been developed by the Moving Picture Experts Group to compress point clouds efficiently. Nevertheless, in its lossy mode, the reconstructed point cloud by G-PCC often suffers from noticeable distortions due to naïve geometry quantization ( i.e ., grid downsampling). This paper proposes a hierarchical prior-based super resolution method for point cloud geometry compression. The content-dependent hierarchical prior is constructed at the encoder side, which enables coarse-to-fine super resolution of the point cloud geometry at the decoder side. A more accurate prior generally yields improved reconstruction performance, albeit at the cost of increased bits required to encode this piece of side information. Our experiments on the MPEG Cat1A dataset demonstrate substantial Bjontegaard-delta bitrate savings, surpassing the performance of the octree-based and trisoup-based G-PCC v14. We provide our implementations for reproducible research at https://github.com/lidq92/mpeg-pcc-tmc13.
Article
In point cloud geometry compression, context models usually use the one-hot encoding of node occupancy as the label, and the cross-entropy between the one-hot encoding and the probability distribution predicted by the context model as the loss function. However, this approach has two main weaknesses. First, the differences between contexts of different nodes are not significant, making it difficult for the context model to accurately predict the probability distribution of node occupancy. Second, as the one-hot encoding is not the actual probability distribution of node occupancy, the cross-entropy loss function is inaccurate. To address these problems, we propose a general structure that can enhance existing context models. We introduce context feature residuals into the context model to amplify the differences between contexts. We also add a multi-layer perceptron branch that uses the mean squared error between its output and node occupancy as a loss function to provide accurate gradients in backpropagation. We validate our method by showing that it can improve the performance of an octree-based model (OctAttention) and a voxel-based model (VoxelDNN) on the object point cloud datasets MPEG 8i and MVUB, as well as the LiDAR point cloud dataset SemanticKITTI.
Article
The lossy Geometry-based Point Cloud Compression (G-PCC) inevitably impairs the geometry information of point clouds, which deteriorates the quality of experience (QoE) in reconstruction and/or misleads decisions in tasks such as classification. To tackle it, this work proposes GRNet for the geometry restoration of G-PCC compressed large-scale point clouds. By analyzing the content characteristics of original and G-PCC compressed point clouds, we attribute the G-PCC distortion to two key factors: point vanishing and point displacement. Visible impairments on a point cloud are usually dominated by an individual factor or superimposed by both factors, which are determined by the density of the original point cloud. To this end, we employ two different models for coordinate reconstruction, termed Coordinate Expansion and Coordinate Refinement, to attack the point vanishing and displacement, respectively. In addition, 4-byte auxiliary density information is signaled in the bitstream to assist the selection of Coordinate Expansion, Coordinate Refinement, or their combination. Before being fed into the coordinate reconstruction module, the G-PCC compressed point cloud is first processed by a Feature Analysis Module for multiscale information fusion, in which a kNN-based Transformer is leveraged at each scale to adaptively characterize neighborhood geometric dynamics for effective restoration. Following the common test conditions recommended in the MPEG standardization committee, GRNet significantly improves the G-PCC anchor and remarkably outperforms state-of-the-art methods on a great variety of point clouds (e.g., solid, dense, and sparse samples) both quantitatively and qualitatively. Meanwhile, GRNet runs fairly fast and uses a smaller-size model when compared with existing learning-based approaches, making it attractive to industry practitioners.
Article
In recent years, we have witnessed the presence of point cloud data in many aspects of our life, from immersive media, autonomous driving to healthcare, although at the cost of a tremendous amount of data. In this paper, we present an efficient lossless point cloud compression method that uses sparse tensor-based deep neural networks to learn point cloud geometry and color probability distributions. Our method represents a point cloud with both occupancy feature and three attribute features at different bit depths in a unified sparse representation. This allows us to efficiently exploit feature-wise and point-wise dependencies within point clouds using a sparse tensor-based neural network and thus build an accurate auto-regressive context model for an arithmetic coder. To the best of our knowledge, this is the first learning-based lossless point cloud geometry and attribute compression approach. Compared with the state-of-the-art lossless point cloud compression method from the Moving Picture Experts Group (MPEG), our method achieves a 22.6% reduction in total bitrate on a diverse set of test point clouds while having 49.0% and 18.3% rate reductions on the geometry and color attribute components, respectively.
Article
Since the development of 3D applications, the point cloud, as a spatial description easily acquired by sensors, has been widely used in multiple areas such as SLAM and 3D reconstruction. Point Cloud Compression (PCC) has also attracted more attention as a primary step before point cloud transfer and storage, where geometry compression is an important component of PCC that compresses the points' geometric structure. However, existing non-learning-based geometry compression methods are often limited by manually pre-defined compression rules. Though learning-based compression methods can significantly improve algorithm performance by learning compression rules from data, they still have some defects. Voxel-based compression networks introduce precision errors due to the voxelization operations, while point-based methods may have relatively weak robustness and are mainly designed for sparse point clouds. In this work, we propose a novel learning-based point cloud compression framework named 3D Point Cloud Geometry Quantization Compression Network (3QNet), which overcomes the robustness limitation of existing point-based methods and can handle dense points. By learning a codebook of common structural features from simple and sparse shapes, 3QNet can efficiently deal with multiple kinds of point clouds. According to experiments on object models, indoor scenes, and outdoor scans, 3QNet achieves better compression performance than many representative methods.
Article
The increase in popularity of point-cloud-oriented applications has triggered the development of specialized compression algorithms. In this paper, a novel algorithm is developed for the lossless geometry compression of voxelized point clouds following an intra-frame design. The encoded voxels are arranged into runs and are encoded in a single pass directly in the voxel domain. This is done without representing the point cloud via an octree or rendering the voxel space through an occupancy matrix, therefore decreasing the memory requirements of the method. Each run is compressed using a context-adaptive arithmetic encoder yielding state-of-the-art compression results, with gains of up to 15% over TMC13, MPEG's standard for point cloud geometry compression. Several proposed contributions accelerate the calculation of each run's probability limits prior to arithmetic encoding. As a result, the encoder attains a low computational complexity described by a linear relation to the number of occupied voxels, leading to an average speedup of 1.8 over TMC13 in encoding speed. Various experiments are conducted assessing the proposed algorithm's state-of-the-art performance in terms of compression ratio and encoding speed.
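Two ingredients of the scheme above, run extraction over a fixed scan of the voxel domain and an adaptive count-based probability model, can be sketched in a few lines of NumPy. This is only an illustration of the general technique (the ideal code length stands in for the actual arithmetic coder), not the cited codec.

import numpy as np

def runs(occupancy_scan):
    # occupancy_scan: 1D binary array in scan order -> list of (value, run_length).
    change = np.flatnonzero(np.diff(occupancy_scan)) + 1
    starts = np.concatenate(([0], change))
    lengths = np.diff(np.concatenate((starts, [len(occupancy_scan)])))
    return list(zip(occupancy_scan[starts].tolist(), lengths.tolist()))

class AdaptiveBinaryModel:
    # Per-context occupancy counts, updated as symbols are coded (context-adaptive coding).
    def __init__(self, num_contexts):
        self.counts = np.ones((num_contexts, 2))        # Laplace-smoothed counts

    def bits_and_update(self, context, bit):
        p = self.counts[context, bit] / self.counts[context].sum()
        self.counts[context, bit] += 1
        return -np.log2(p)                              # ideal code length in bits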
Preprint
This paper presents a novel method to determine rate-distortion optimized transform coefficients for efficient compression of videos generated from point clouds. The method exploits a generalized frequency selective extrapolation approach that iteratively determines rate-distortion-optimized coefficients for all basis functions of two-dimensional discrete cosine and sine transforms. The method is applied to blocks containing both occupied and unoccupied pixels in video-based point cloud compression for HEVC encoding. In the proposed algorithm, only the values of the transform coefficients are changed such that the resulting bit streams are compliant with the V-PCC standard. For all-intra coded point clouds, bitrate savings of more than 4% for geometry and more than 6% for texture error metrics with respect to standard encoding can be observed. These savings are more than twice as high as savings obtained using competing methods from the literature. In the random-access case, our proposed method outperforms competing V-PCC methods by more than 0.5%.
Article
This paper presents a novel method to determine rate-distortion optimized transform coefficients for efficient compression of videos generated from point clouds. The method exploits a generalized frequency selective extrapolation approach that iteratively determines rate-distortion-optimized coefficients for all basis functions of two-dimensional discrete cosine and sine transforms. The method is applied to blocks containing both occupied and unoccupied pixels in video-based point cloud compression for HEVC encoding. In the proposed algorithm, only the values of the transform coefficients are changed such that the resulting bit streams are compliant with the V-PCC standard. For all-intra coded point clouds, bitrate savings of more than 4% for geometry and more than 6% for texture error metrics with respect to standard encoding can be observed. These savings are more than twice as high as savings obtained using competing methods from the literature. In the random-access case, our proposed method outperforms competing V-PCC methods by more than 0.5%.
Article
Full-text available
This article presents an overview of the recent standardization activities for point cloud compression (PCC). A point cloud is a 3D data representation used in diverse applications associated with immersive media including virtual/augmented reality, immersive telepresence, autonomous driving and cultural heritage archival. The international standard body for media compression, also known as the Moving Picture Experts Group (MPEG), is planning to release in 2020 two PCC standard specifications: video-based PCC (V-PCC) and geometry-based PCC (G-PCC). V-PCC and G-PCC will be part of the ISO/IEC 23090 series on the coded representation of immersive media content. In this paper, we provide a detailed description of both codec algorithms and their coding performances. Moreover, we will also discuss certain unique aspects of point cloud compression.
Conference Paper
Full-text available
Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can range in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset with 51.5% BDBR savings on average. Moreover, while octree-based methods face exponential diminution of the number of points at low bitrates, our method still produces high resolution outputs even at low bitrates. Code and supplementary material are available at https://github.com/mauriceqch/pcc_geo_cnn .
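The rate-distortion training objective described above, with decoding cast as binary classification of the occupancy map, can be written schematically as follows. The weighting convention and the clamp value are assumptions; the rate term is a placeholder for whatever entropy model supplies likelihoods of the quantized latent.

import torch
import torch.nn.functional as F

def rd_loss(occupancy_logits, occupancy_target, latent_likelihoods, lmbda=1e-4):
    # Distortion: binary cross-entropy on the voxel occupancy map.
    distortion = F.binary_cross_entropy_with_logits(occupancy_logits, occupancy_target.float())
    # Rate: estimated bits of the quantized latent, averaged over the batch.
    rate = -torch.log2(latent_likelihoods.clamp_min(1e-9)).sum() / occupancy_target.size(0)
    loss = rate + lmbda * distortion                    # lmbda trades rate against distortion
    return loss, rate, distortion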
Article
Full-text available
We present a method to compress geometry information of point clouds that explores redundancies across consecutive frames of a sequence. It uses octrees and works by progressively increasing the resolution of the octree. At each branch of the tree, we generate approximations of the child nodes by a number of methods which are used as contexts to drive an arithmetic coder. The best approximation, i.e. the context that yields the least amount of encoding bits, is selected and the chosen method is indicated as side information for replication at the decoder. The core of our method is a context-based arithmetic coder in which a reference octree is used to encode the current octree, thus providing 255 contexts for each output octet. The 255×255 frequency histogram is viewed as a discrete 3D surface and is conveyed to the decoder using another octree. We present two methods to generate the predictions (contexts) which use adjacent frames in the sequence (inter-frame) and one method that works purely intra-frame. The encoder continuously switches to the best mode among the three and conveys such information to the decoder. Since an intra-frame prediction is present, our coder can also work in purely intra-frame mode as well. Extensive results are presented to show the method's potential against many compression alternatives for the geometry information in dynamic voxelized point clouds.
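The conditional coding idea above, 255 contexts given by a reference octet for each current octet, amounts to estimating P(current | reference) from a co-occurrence histogram. The NumPy sketch below computes such a histogram offline and the corresponding ideal code length; the smoothing and the offline counting are simplifications for illustration.

import numpy as np

def conditional_bits(reference_octets, current_octets):
    # Both arguments: aligned 1D integer arrays of octets in 1..255.
    hist = np.ones((256, 256))                          # smoothed co-occurrence counts
    np.add.at(hist, (reference_octets, current_octets), 1)
    cond = hist / hist.sum(axis=1, keepdims=True)       # P(current octet | reference octet)
    return float(-np.log2(cond[reference_octets, current_octets]).sum())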
Article
The widespread adoption of new 3D sensor and authoring technologies has made it possible to capture 3D scenes and models in real time with decent visual quality. As an example, Microsoft's Kinect and Apple's PrimeSense technology are now being used in a wide variety of interactive 3D mobile applications, including gaming and augmented reality applications. The latest smartphones are equipped with multiple cameras, which can be readily used to generate depth images. Some of the latest smartphones also include depth-ranging sensors that can be used for 3D model generation. Light detection and ranging (lidar) technologies are yet another field where 3D depth acquisition is important. Real-time 3D scenery detection and ranging has become an important issue for the emerging fields of autonomous navigation and driving applications.
Article
Due to the increased popularity of augmented and virtual reality experiences, the interest in capturing the real world in multiple dimensions and in presenting it to users in an immersible fashion has never been higher. Distributing such representations enables users to freely navigate in multi-sensory 3D media experiences. Unfortunately, such representations require a large amount of data, not feasible for transmission on today’s networks. Efficient compression technologies well adopted in the content chain are in high demand and are key components to democratize augmented and virtual reality applications. The Moving Picture Experts Group, MPEG, as one of the main standardization groups dealing with multimedia, identified the trend and started recently the process of building an open standard for compactly representing 3D point clouds, which are the 3D equivalent of the very well-known 2D pixels. This paper introduces the main developments and technical aspects of this ongoing standardization effort.
Article
Deep Neural Networks trained as image auto-encoders have recently emerged as a promising direction for advancing the state of the art in image compression. The key challenge in learning such networks is twofold: to deal with quantization, and to control the trade-off between reconstruction error (distortion) and entropy (rate) of the latent image representation. In this paper, we focus on the latter challenge and propose a new technique to navigate the rate-distortion trade-off for an image compression auto-encoder. The main idea is to directly model the entropy of the latent representation by using a context model: a 3D-CNN which learns a conditional probability model of the latent distribution of the auto-encoder. During training, the auto-encoder makes use of the context model to estimate the entropy of its representation, and the context model is concurrently updated to learn the dependencies between the symbols in the latent representation. Our experiments show that this approach yields a state-of-the-art image compression system based on a simple convolutional auto-encoder.
Article
PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixel-cnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional short-cut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications.
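Point 1 of the abstract above, the discretized logistic mixture likelihood, can be sketched as below for a single channel with pixel values scaled to [-1, 1]. Edge handling and the mixture weighting follow the usual formulation, but this is a simplified illustration rather than the reference implementation.

import torch
import torch.nn.functional as F

def discretized_logistic_mixture_logprob(x, logit_pi, mu, log_s, bin_size=2.0 / 255.0):
    # x: (...,) pixel values in [-1, 1]; mixture parameters logit_pi, mu, log_s: (..., K).
    x = x.unsqueeze(-1)
    inv_s = torch.exp(-log_s)
    cdf_plus = torch.sigmoid(inv_s * (x + bin_size / 2 - mu))
    cdf_minus = torch.sigmoid(inv_s * (x - bin_size / 2 - mu))
    # Probability mass of the pixel's bin; extreme pixels keep the full distribution tail.
    prob = torch.where(x < -0.999, cdf_plus,
                       torch.where(x > 0.999, 1.0 - cdf_minus, cdf_plus - cdf_minus))
    log_prob = torch.log(prob.clamp_min(1e-12))
    return torch.logsumexp(F.log_softmax(logit_pi, dim=-1) + log_prob, dim=-1)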
Article
Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has little memory requirements and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
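The update rule summarized above can be written out in a few lines; the NumPy sketch below uses the commonly quoted default hyper-parameters and is meant only as a restatement of the algorithm, not of any particular implementation.

import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad                  # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2             # biased second raw-moment estimate
    m_hat = m / (1 - beta1 ** t)                        # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                        # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps) # parameter update
    return param, m, v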
Microsoft Voxelized Upper Bodies - A Voxelized Point Cloud Dataset
  • C. Loop
  • Q. Cai
  • S. O. Escolano
  • P. A. Chou
8i Voxelized Full Bodies - A Voxelized Point Cloud Dataset
  • E. d'Eon
  • B. Harrison
  • T. Myers
  • P. A. Chou