## No full-text available

To read the full-text of this research, you can request a copy directly from the authors.

... Lossy geometric compression algorithms have been extensively studied because of their high compression ratios [5][6][7][8]. Lossy geometry compression aims to strike a balance between compressed data size and data quality [9][10][11][12][13]: when more bits are used, the reconstructed values are closer to the input. ...
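The remark that spending more bits brings the decoded value closer to the input can be illustrated with plain uniform quantization of coordinates. This is a minimal sketch that assumes coordinates normalized to [0, 1]; the function names are hypothetical and are not taken from the cited works.

```python
def quantize(coords, bits, lo=0.0, hi=1.0):
    """Uniformly map coordinates in [lo, hi] to integer levels 0 .. 2**bits - 1."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [round((c - lo) / step) for c in coords], step


def dequantize(levels, step, lo=0.0):
    """Reconstruct coordinates from integer levels."""
    return [lo + q * step for q in levels]


def max_error(coords, bits):
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    levels, step = quantize(coords, bits)
    recon = dequantize(levels, step)
    return max(abs(a - b) for a, b in zip(coords, recon))


coords = [0.1234, 0.5, 0.9876, 0.3333]
for bits in (4, 8, 12):
    print(bits, max_error(coords, bits))
```

The error is bounded by half a quantization step, so each extra bit roughly halves it at the cost of a larger range of symbols to code.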

... With the emergence of deep learning-based methods, several studies have explored neural network-based point cloud compression. Previous research [5][6][7][8][9][10][11][12][13][14][15] focused on voxel-based approaches, while [16][17][18] utilized tree structures, and [19] employed a heightmap representation. Deep learning-based methods achieve superior compression performance compared to traditional algorithms by learning more memory-efficient encoding strategies from training data. ...

... They also leveraged the sparsity of point clouds to perform progressive resampling for hierarchical point cloud reconstruction [8] and further proposed voxel compression using inter-scale and intra-scale correlations [9]. Researchers like André [14] and others [10] enhanced compression performance by adding modules to the multi-scale point cloud geometry compression network. Yu et al. [33] proposed a multi-layer residual module designed on sparse convolution-based autoencoders which progressively down-samples the input point clouds and hierarchically reconstructs them. ...
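The progressive down-sampling described above can be pictured, for integer voxel coordinates, as repeatedly halving the resolution and merging voxels that share a parent. The sketch assumes plain dyadic (factor-2) down-sampling of occupied coordinates only; the cited networks typically do this with strided sparse convolutions on learned features, which is not shown. Hierarchical reconstruction runs the other way, predicting which of the eight children of each coarse voxel are occupied.

```python
def downsample(voxels):
    """Map each occupied voxel (x, y, z) to its parent at half resolution."""
    return sorted({(x >> 1, y >> 1, z >> 1) for x, y, z in voxels})


def pyramid(voxels, levels):
    """Return [full-resolution voxels, 1/2 resolution, 1/4 resolution, ...]."""
    scales = [sorted(set(voxels))]
    for _ in range(levels):
        scales.append(downsample(scales[-1]))
    return scales


voxels = [(0, 0, 0), (1, 1, 1), (2, 3, 1), (7, 7, 7), (6, 7, 7)]
for scale in pyramid(voxels, 3):
    print(len(scale), scale)
```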

Because real-world point cloud data are often very large, efficient transmission and storage have become critical concerns, and point cloud compression plays a decisive role in addressing them. Although capturing global information within point cloud data is important for effective compression, many existing point cloud compression methods overlook it. To address this, we propose an end-to-end point cloud compression method that extracts both global and local information. Our method includes a novel Transformer module to extract rich features from the point cloud. Using a pooling operation with no learnable parameters as the token mixer for computing long-range dependencies enables global feature extraction while significantly reducing both computation and parameter count. Furthermore, we employ convolutional layers for feature extraction. These layers not only preserve the spatial structure of the point cloud but also have a parameter count that is independent of the input point cloud size, resulting in a substantial reduction in parameters. Our experimental results demonstrate the effectiveness of the proposed TransPCGC network. It achieves average Bjontegaard Delta Rate (BD-Rate) gains of 85.79% and 80.24% compared to Geometry-based Point Cloud Compression (G-PCC). Additionally, compared to the Learned-PCGC network, our approach attains average BD-Rate gains of 18.26% and 13.83%. Moreover, it reduces encoding and decoding time by 16% and model size by 50%.
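The abstract does not spell out the pooling token mixer. A common parameter-free design, introduced by PoolFormer, replaces attention with local average pooling and subtracts the input (the block's residual connection adds it back). The sketch below applies that idea to sparse occupied voxels; treating TransPCGC's mixer this way is an assumption, and `pool_token_mixer` is a hypothetical name.

```python
from itertools import product


def pool_token_mixer(features, radius=1):
    """Parameter-free token mixer on sparse voxels: mean of the features of occupied
    voxels in a (2r+1)^3 window around each token, minus the token itself."""
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    mixed = {}
    for (x, y, z), feat in features.items():
        window = [features[(x + dx, y + dy, z + dz)] for dx, dy, dz in offsets
                  if (x + dx, y + dy, z + dz) in features]  # always includes the token itself
        mean = [sum(channel) / len(window) for channel in zip(*window)]
        mixed[(x, y, z)] = [m - f for m, f in zip(mean, feat)]
    return mixed


feats = {(0, 0, 0): [1.0, 2.0], (1, 0, 0): [3.0, 4.0], (5, 5, 5): [10.0, 0.0]}
print(pool_token_mixer(feats))
# {(0, 0, 0): [1.0, 1.0], (1, 0, 0): [-1.0, -1.0], (5, 5, 5): [0.0, 0.0]}
```

Enlarging `radius` or stacking such blocks widens the receptive field without adding any weights, which is the property the abstract attributes to the pooling mixer.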

... Concurrently, learned point cloud compression methods are emerging. Techniques such as OctSqueeze, VoxelDNN (Nguyen et al. 2021a), VoxelContext-Net (Que, Lu, and Xu 2021), and OctFormer (Cui et al. 2023) use information from ancestor voxels to predict the current one. Advancing these approaches, OctAttention (Fu et al. 2022), SparsePCGC, and EHEM (Song et al. 2023) exploit voxels at the same level as the current one to reduce redundancy. ...

... In recent years, learned point cloud compression methods have been emerging. Many of these techniques, including those cited in (Nguyen et al. 2021b; Quach, Valenzise, and Dufaux 2019; Que, Lu, and Xu 2021; Nguyen et al. 2021a; Wang et al. 2022), use an octree to represent and compress point clouds. ...

... OctSqueeze builds an octree of the point cloud and predicts voxel occupancy level by level, using information from ancestor voxels and known data about the current voxel. Building upon OctSqueeze, methods such as VoxelDNN (Nguyen et al. 2021a), VoxelContext-Net (Que, Lu, and Xu 2021), SparsePCGC, and OctFormer (Cui et al. 2023) reduce redundancy by using information from the neighbors of the parent voxel. Moreover, Surface Prior (Chen et al. 2022) incorporates neighboring voxels at the same depth as the voxel currently being coded into the framework. ...
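Octree-based entropy models of this kind code one 8-bit occupancy symbol per occupied node, level by level, and gather the context (ancestors, neighbors of the parent, same-depth neighbors) around each symbol. The sketch below only builds the symbol stream; the child bit layout and the Morton (z-order) traversal are assumptions, since each cited codec fixes its own conventions.

```python
def morton(x, y, z, bits):
    """Interleave coordinate bits (x most significant) so sorting matches breadth-first child order."""
    code = 0
    for b in range(bits - 1, -1, -1):
        code = (code << 3) | (((x >> b) & 1) << 2) | (((y >> b) & 1) << 1) | ((z >> b) & 1)
    return code


def octree_codes(points, depth):
    """Breadth-first 8-bit occupancy codes for integer points in [0, 2**depth)^3."""
    pts = set(points)
    codes = []
    for level in range(depth):
        shift = depth - level - 1
        occupancy = {}
        for x, y, z in pts:
            parent = (x >> (shift + 1), y >> (shift + 1), z >> (shift + 1))
            child = (((x >> shift) & 1) << 2) | (((y >> shift) & 1) << 1) | ((z >> shift) & 1)
            occupancy[parent] = occupancy.get(parent, 0) | (1 << child)  # set the child's bit
        for parent in sorted(occupancy, key=lambda p: morton(*p, level)):
            codes.append(occupancy[parent])
    return codes


print(octree_codes([(0, 0, 0), (3, 3, 3), (1, 0, 0)], depth=2))  # [129, 17, 128]
```

A learned model replaces a fixed frequency table by predicting a 256-way distribution for each of these bytes from its context.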

In recent years, the task of learned point cloud compression has gained prominence. An important type of point cloud, the spinning LiDAR point cloud, is generated by spinning LiDAR on vehicles. This process results in numerous circular shapes and azimuthal angle invariance features within the point clouds. However, these two features have been largely overlooked by previous methodologies. In this paper, we introduce a model-agnostic method called Spherical-Coordinate-based learned Point cloud compression (SCP), designed to leverage the aforementioned features fully. Additionally, we propose a multi-level Octree for SCP to mitigate the reconstruction error for distant areas within the Spherical-coordinate-based Octree. SCP exhibits excellent universality, making it applicable to various learned point cloud compression techniques. Experimental results demonstrate that SCP surpasses previous state-of-the-art methods by up to 29.14% in point-to-point PSNR BD-Rate.
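The circular rings and azimuthal invariance mentioned above become explicit once points are expressed in spherical coordinates. A minimal sketch, assuming z is the sensor's vertical axis and elevation is measured from the horizontal plane; SCP's exact parameterization is not given in the abstract.

```python
import math


def to_spherical(x, y, z):
    """Cartesian -> (range r, azimuth around the vertical z axis, elevation above the x-y plane)."""
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)
    elevation = math.atan2(z, math.hypot(x, y))
    return r, azimuth, elevation


def to_cartesian(r, azimuth, elevation):
    """Inverse of to_spherical."""
    horizontal = r * math.cos(elevation)
    return (horizontal * math.cos(azimuth), horizontal * math.sin(azimuth), r * math.sin(elevation))


# Points on one circular LiDAR ring: same range and elevation, only the azimuth differs.
ring = [(10 * math.cos(t), 10 * math.sin(t), -1.5) for t in (0.0, 1.0, 2.0, 3.0)]
for p in ring:
    print(to_spherical(*p))
```

Because a fixed angular step spans an arc of about r times that step, uniform angular quantization gives larger Cartesian errors far from the sensor, which is the effect the multi-level octree is described as mitigating.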

... To code point cloud geometry losslessly and efficiently, the occupancy probabilities fed to a context-adaptive arithmetic codec must be estimated accurately. In our previous work, we modeled the voxel occupancy distributions using a likelihood-based deep autoregressive network called VoxelDNN [132], inspired by the popular PixelCNN model [136]. VoxelDNN achieves state-of-the-art gains (up to 34%) over the MPEG G-PCC reference codec. ...
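Why accurate occupancy probabilities matter can be seen from the cost an ideal arithmetic coder pays: about -log2 p bits for a symbol coded with probability p (real coders add a small overhead). The sketch assumes binary per-voxel occupancy, as in VoxelDNN-style models; the helper name is hypothetical.

```python
import math


def ideal_code_length(bits, probs):
    """Bits an ideal arithmetic coder spends: sum of -log2 P(actual symbol).

    probs[i] is the predicted probability that voxel i is occupied (bit 1)."""
    return sum(-math.log2(p if b else 1.0 - p) for b, p in zip(bits, probs))


occupancy = [1, 0, 0, 1]
print(ideal_code_length(occupancy, [0.5] * 4))             # 4.0 (uninformed model)
print(ideal_code_length(occupancy, [0.9, 0.1, 0.2, 0.8]))  # under 1 bit with sharper predictions
```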

... Recently, deep learning has been widely applied to point cloud coding in both the octree domain [87,29] and, especially, the voxel domain [79,141,180,132]. A coding method for static LiDAR point clouds is proposed in [87]; it learns the probability distributions of the octree based on contextual information and uses an arithmetic coder for lossless coding. ...

... In this work we focus instead on dense point clouds, where voxel-based approaches have shown interesting results. In particular, our recent work, VoxelDNN [132], is an autoregressive model that predicts the distribution of each voxel conditioned on the previously decoded voxels. VoxelDNN obtains an average rate saving of 30% over G-PCC. ...

Point clouds are becoming essential in key applications, with advances in capture technologies leading to large volumes of data. Compression is thus essential for storage and transmission. Point cloud compression can be divided into two parts: geometry compression and attribute compression. In addition, point cloud quality assessment is necessary to evaluate point cloud compression methods. Geometry compression, attribute compression, and quality assessment form the three main parts of this dissertation. The common challenge across these three problems is the sparsity and irregularity of point clouds. Indeed, while other modalities such as images lie on a regular grid, point cloud geometry can be considered a sparse binary signal over 3D space, and attributes are defined on the geometry, which can be both sparse and irregular. First, the state of the art for geometry and attribute compression methods is reviewed, with a focus on deep learning based approaches. The challenges faced when compressing geometry and attributes are considered, with an analysis of the current approaches to address them, their limitations, and the relations between deep learning and traditional approaches. We present our work on geometry compression: a convolutional lossy geometry compression approach, with a study of the key performance factors of such methods, and a generative model for lossless geometry compression, with a multiscale variant addressing its complexity issues. Then, we present a folding-based approach for attribute compression that learns a mapping from the point cloud to a 2D grid in order to reduce point cloud attribute compression to an image compression problem. Furthermore, we propose a differentiable deep perceptual quality metric that can be used to train lossy point cloud geometry compression networks while being well correlated with perceived visual quality, as well as a convolutional neural network for point cloud quality assessment based on a patch extraction approach. Finally, we conclude the dissertation and discuss open questions in point cloud compression, existing solutions, and perspectives. We highlight the links between existing point cloud compression research and related research problems in adjacent fields, such as rendering in computer graphics, mesh compression, and point cloud quality assessment.

... The probability model can be based on adaptively maintained counts for various contexts [4], [7] or on a neural network (NN) model [11], [17], having a binary context at the input and the probability mass function of the symbol at the output. ...
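A count-based model of the kind mentioned first can be sketched in a few lines: the encoder and decoder keep identical counts per context and update them after every coded symbol. This minimal sketch assumes a binary symbol and the Krichevsky-Trofimov estimator; the cited codecs may code non-binary symbols and use different estimators.

```python
from collections import defaultdict


class AdaptiveBinaryModel:
    """Adaptive count-based model: one pair of counts (n0, n1) per binary context."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])

    def prob_one(self, context):
        """Krichevsky-Trofimov estimate of P(symbol = 1 | context)."""
        n0, n1 = self.counts[context]
        return (n1 + 0.5) / (n0 + n1 + 1.0)

    def update(self, context, symbol):
        """Called identically by encoder and decoder after each coded symbol."""
        self.counts[context][symbol] += 1


model = AdaptiveBinaryModel()
ctx = (1, 1, 0)  # e.g. occupancy of three already-coded neighbors
print(model.prob_one(ctx))  # 0.5 before any data
for _ in range(3):
    model.update(ctx, 1)
print(model.prob_one(ctx))  # 0.875
```

A neural model replaces the count table with a learned function of the same binary context, which lets it generalize to contexts that have rarely or never been seen.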

... In the last few years, machine learning approaches using neural networks have been proven to be competitive for both lossy [14], [16], [20]-[22] and lossless [11], [17], [20], [24] point cloud geometry compression. A comprehensive survey of the recent methods with a focus on the learning-based approaches is provided in [25]. ...

... In [22], adaptive octree-based decomposition of the point cloud is performed prior to encoding with a multilayer perceptron-based end-to-end learned analysis-synthesis architecture. VoxelDNN [11] uses 3D masked convolutional filters to enforce the causality of the 3D context from which the occupancy of 64 × 64 × 64 blocks of voxels is estimated. VoxelContext-Net [20] employs an octree-based deep entropy model for both dynamic and static LiDAR point clouds. ...
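The causality enforced by masked convolutions can be made concrete by building the kernel mask: with voxels decoded in raster order, a kernel tap may only see voxels that precede the current one. This sketch follows the PixelCNN type A/B convention; whether VoxelDNN uses exactly this ordering is an assumption.

```python
def causal_mask_3d(k, include_center=False):
    """k x k x k mask for a masked 3D convolution in raster (depth, height, width) order.

    1 marks kernel taps that see already-decoded voxels; the first layer excludes the
    center (type A), later layers may include it (type B), following PixelCNN."""
    c = k // 2
    mask = [[[0] * k for _ in range(k)] for _ in range(k)]
    for d in range(k):
        for h in range(k):
            for w in range(k):
                pos = (d, h, w)
                # Tuple comparison is exactly raster-order comparison.
                mask[d][h][w] = int(pos < (c, c, c) or (include_center and pos == (c, c, c)))
    return mask


mask = causal_mask_3d(3)
print(sum(v for plane in mask for row in plane for v in row))  # 13 of 27 taps are visible
```

Multiplying the kernel weights by this mask before each convolution keeps the prediction for a voxel independent of voxels the decoder has not reconstructed yet.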

In this paper we propose a new paradigm for encoding the geometry of dense point cloud sequences, where a convolutional neural network (CNN), which estimates the encoding distributions, is optimized on several frames of the sequence to be compressed. We adopt lightweight CNN structures, perform training as part of the encoding process, and transmit the CNN parameters as part of the bitstream. The newly proposed encoding scheme operates on the octree representation of each point cloud, consecutively encoding each octree resolution level. At every octree resolution level, the voxel grid is traversed section by section (each section being perpendicular to a selected coordinate axis), and in each section, the occupancies of groups of two-by-two voxels are encoded at once in a single arithmetic coding operation. A context for the conditional encoding distribution is defined for each two-by-two group of voxels based on the information available about the occupancy of the neighboring voxels in the current and lower resolution layers of the octree. The CNN estimates the probability mass functions of the occupancy patterns of all the voxel groups from one section in four phases. In each new phase, the contexts are updated with the occupancies encoded in the previous phase, and the probabilities within each phase are estimated in parallel, providing a reasonable trade-off between the parallelism of the processing and the informativeness of the contexts. The CNN training time is comparable to the time spent in the remaining encoding steps, leading to competitive overall encoding times. The bitrates and encoding-decoding times compare favorably with those of recently published compression schemes.
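To make the section-wise coding concrete, the sketch below turns one binary section into 16-ary symbols, one per two-by-two group, each of which would be coded in a single arithmetic coding operation. The bit order within a group and the parity-based split into four phases (`phase_of`) are hypothetical; the abstract does not say how groups are assigned to phases.

```python
def group_symbols(section):
    """Split a 2D binary occupancy section (even height and width) into 2x2 groups.

    Each group becomes one symbol in 0..15 (bits: top-left, top-right, bottom-left,
    bottom-right)."""
    symbols = []
    for i in range(0, len(section), 2):
        row = []
        for j in range(0, len(section[0]), 2):
            tl, tr = section[i][j], section[i][j + 1]
            bl, br = section[i + 1][j], section[i + 1][j + 1]
            row.append(tl << 3 | tr << 2 | bl << 1 | br)
        symbols.append(row)
    return symbols


def phase_of(gi, gj):
    """Hypothetical assignment of group (gi, gj) to one of four phases by index parity."""
    return 2 * (gi % 2) + (gj % 2)


section = [[1, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
print(group_symbols(section))  # [[9, 0], [0, 15]]
```

Under this assignment, all groups sharing a phase would be estimated in one parallel CNN pass, and later phases would see the occupancies decoded in earlier ones as extra context.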

... Recently, entropy encoders based on deep learning have been shown to outperform hand-crafted ones in rate-distortion performance. Among them, some methods partition point clouds into voxels, then adopt 3D convolution to learn and predict the occupancy of each voxel (Nguyen et al. 2021a; Quach, Valenzise, and Dufaux 2019; Que, Lu, and Xu 2021). Voxel-based models are capable of exploiting local geometric patterns (e.g., planes, surfaces). ...

... Voxel-based methods quantize the point cloud and classify voxel occupancy with neural networks. Voxel-based methods outperform G-PCC (3DG 2021) in lossless geometric compression (Nguyen et al. 2021a; Quach, Valenzise, and Dufaux 2019), lossy geometric compression (Quach, Valenzise, and Dufaux 2019, 2020; Wang et al. 2021), and progressive compression (Guarda, Rodrigues, and Pereira 2020). Compared with an octree, the voxel representation naturally preserves geometric patterns. ...

... In object point cloud compression, we set qs = 1 in Eq. (1) to perform lossless compression. We compare our method with the hand-crafted inter-frame octree-based context model P(full) (Garcia et al. 2019), the state-of-the-art compression method VoxelDNN (Nguyen et al. 2021a), and its fast version MSVoxelDNN (Nguyen et al. 2021b). We set the training conditions following VoxelDNN and test the models on data of different depths. ...

In point cloud compression, sufficient context is essential for modeling the point cloud distribution. However, previous voxel-based methods gather less context when handling sparse point clouds. To address this problem, we propose a multi-context deep learning framework called OctAttention, which employs the octree structure, a memory-efficient representation for point clouds. Our approach encodes octree symbol sequences losslessly by gathering information from sibling and ancestor nodes. Specifically, we first represent point clouds with an octree to reduce spatial redundancy, which is robust for point clouds with different resolutions. We then design a conditional entropy model with a large receptive field that models the sibling and ancestor contexts to exploit the strong dependency among neighboring nodes, and employ an attention mechanism to emphasize the correlated nodes in the context. Furthermore, we introduce a mask operation during training and testing to trade off encoding time against performance. Compared to previous state-of-the-art works, our approach obtains a 10%-35% BD-Rate gain on a LiDAR benchmark (e.g., SemanticKITTI) and object point cloud datasets (e.g., MPEG 8i, MVUB), and saves 95% of coding time compared to the voxel-based baseline. The code is available at https://github.com/zb12138/OctAttention.
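The attention step can be sketched as scaled dot-product attention over embeddings of context nodes. In this hypothetical sketch the mask is causal (a node attends only to itself and to nodes coded before it), which is one way to keep encoder and decoder contexts identical; OctAttention's learned projections, multiple heads, and exact masking scheme are not reproduced.

```python
import math


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention; token i attends only to tokens 0..i."""
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in range(i + 1)]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([sum(weights[j] / total * values[j][c] for j in range(i + 1))
                        for c in range(len(values[0]))])
    return outputs


# Three context tokens (e.g. embeddings of previously coded sibling nodes).
q = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
k = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
v = [[2.0], [4.0], [6.0]]
print(causal_attention(q, k, v))  # equal scores -> running means of the values (2, 3, 4)
```

With learned keys the weights are no longer uniform, which is how the model can emphasize the most correlated nodes in the context window.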

... The probability model can be based on adaptively maintained counts for various contexts [4], [7], or on a NN model [11], [17], having a binary context at the input and the probability mass function of the symbol at the output. ...

... In the last few years, machine learning approaches using neural networks have proved successful for both lossy [14], [16], [20]-[22] and lossless [11], [17], [20] point cloud geometry compression. In [14], an auto-encoder architecture involving 3D convolutional layers is employed to generate a latent representation of the point cloud, which is further compressed using range coding. ...

... VoxelDNN [11] uses 3D masked convolutional filters to enforce the causality of the 3D context from which the occupancy of 64 × 64 × 64 blocks of voxels is estimated. VoxelContext-Net [20] employs an octree-based deep entropy model for both dynamic and static LiDAR point clouds. ...

We propose a new paradigm for encoding the geometry of point cloud sequences, where the convolutional neural network (CNN) that estimates the encoding distributions is optimized on several frames of the sequence to be compressed. We adopt lightweight CNN structures, perform training as part of the encoding process, and transmit the CNN parameters as part of the bitstream. The newly proposed encoding scheme operates on the octree representation of each point cloud, consecutively encoding each octree resolution layer. At every octree resolution layer, the voxel grid is traversed section by section (each section being perpendicular to a selected coordinate axis), and in each section the occupancies of groups of two-by-two voxels are encoded at once, in a single arithmetic coding operation. A context for the conditional encoding distribution is defined for each two-by-two group of voxels, based on the information available about the occupancy of neighboring voxels in the current and lower resolution layers of the octree. The CNN estimates the probability distributions of the occupancy patterns of all voxel groups from one section in four phases. In each new phase the contexts are updated with the occupancies encoded in the previous phase, and the probabilities within each phase are estimated in parallel, providing a reasonable trade-off between the parallelism of processing and the informativeness of the contexts. The CNN training time is comparable to the time spent in the remaining encoding steps, leading to competitive overall encoding times. Bitrates and encoding-decoding times compare favorably with those of recently published compression schemes.

... We consider the baseline PCC architecture from Quach et al. [2020], which incorporates several improvements upon Quach et al. [2019]. These improvements include sequential training, focal loss, and more efficient architectural choices including residual blocks and progressively increasing channels as the resolution decreases. ...
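Focal loss, one of the improvements listed above, addresses the imbalance between the few occupied voxels and the many empty ones by down-weighting voxels that are already classified well. Below is the standard binary form as a minimal sketch; the parameter values (`gamma=2`, `alpha=0.75`) are illustrative assumptions, not the values used by Quach et al.

```python
import math


def focal_loss(probs, labels, gamma=2.0, alpha=0.75, eps=1e-7):
    """Mean binary focal loss for predicted occupancy probabilities and 0/1 labels.

    gamma down-weights well-classified voxels; alpha re-weights the occupied class."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


print(focal_loss([0.9, 0.2, 0.6], [1, 0, 1]))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy, which makes the effect of the focusing term easy to check.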

... These improvements include sequential training, focal loss, and more efficient architectural choices, including residual blocks and progressively increasing channels as the resolution decreases. Their approach is distinct from other prior work focusing on graph-based representations of point clouds [de Oliveira Rente et al., 2018], multiscale approaches to compression, lossless compression [Nguyen et al., 2020], and vector-quantized approaches [Caccia et al., 2020]. While these learned PCC approaches considerably improve rate-distortion performance compared to traditional algorithms, these improvements come at the cost of slower run times and decreased computational efficiency. ...

... This architecture combines a better choice of activation and a more efficient convolutional block. In direct comparison with the current state of the art for operation-efficient learned point cloud compression (e.g., Quach et al. [2020]) using dense convolutions, DeepCompress has a dramatic reduction in parameter and operation count with a minimal loss in quality. In later sections, we show that DeepCompress reduces total model parameter count by 20% and reduces total convolution operation count by 8%. ...

Point clouds are a basic data type that is of increasing interest as 3D content becomes more ubiquitous. Applications using point clouds include virtual, augmented, and mixed reality and autonomous driving. We propose a more efficient deep learning-based encoder architecture for point cloud compression that incorporates principles from established 3D object detection and image compression architectures. Through an ablation study, we show that incorporating the learned activation function from Computational Efficient Neural Image Compression (CENIC) and designing more parameter-efficient convolutional blocks yields dramatic gains in efficiency and performance. Our proposed architecture incorporates Generalized Divisive Normalization activations and a spatially separable InceptionV4-inspired block. We then evaluate rate-distortion curves on the standard JPEG Pleno 8i Voxelized Full Bodies dataset to assess our model's performance. Our proposed modifications outperform the baseline approaches by a small margin in terms of Bjontegaard delta rate and PSNR values, yet reduce the necessary encoder convolution operations by 8 percent and the total encoder parameters by 20 percent. Our proposed architecture, when considered on its own, has a small penalty of 0.02 percent in Chamfer distance and a 0.32 percent increase in bit rate in point-to-plane distance for the same peak signal-to-noise ratio.
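GDN, in the form commonly used in learned image compression, normalizes each channel by a learned combination of the squared responses of all channels at the same position. In a network, `beta` and `gamma` are trained (with `beta` kept positive); the fixed values below are only for illustration, and the variant used in DeepCompress may differ.

```python
import math


def gdn(x, beta, gamma):
    """Generalized Divisive Normalization across channels at one spatial position:
    y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j**2)."""
    return [xi / math.sqrt(beta[i] + sum(gamma[i][j] * xj * xj for j, xj in enumerate(x)))
            for i, xi in enumerate(x)]


x = [3.0, 4.0]
print(gdn(x, beta=[1.0, 1.0], gamma=[[0.0, 0.0], [0.0, 0.0]]))  # [3.0, 4.0] (identity)
print(gdn(x, beta=[1.0, 1.0], gamma=[[1.0, 1.0], [1.0, 1.0]]))  # both divided by sqrt(26)
```

Unlike a pointwise activation such as ReLU, the divisive term couples channels, which is part of why GDN can replace several stacked nonlinear layers.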

... to be used by a context-adaptive arithmetic codec. In our previous work, we modeled the voxel occupancy distributions using a likelihood-based deep autoregressive network called VoxelDNN [1], inspired by the popular PixelCNN model [3]. VoxelDNN achieves state-of-the-art gains (up to 34%) over the MPEG G-PCC reference codec. ...

... Recently, deep learning has been widely applied to point cloud coding in both the octree domain [11], [17] and, especially, the voxel domain [1], [18]-[20]. A coding method for static LiDAR point clouds is proposed in [11]; it learns the probability distributions of the octree based on contextual information and uses an arithmetic coder for lossless coding. ...

... In this work we focus instead on dense point clouds, where voxel-based approaches have shown interesting results. In particular, our recent work, VoxelDNN [1], is an autoregressive model that predicts the distribution of each voxel conditioned on the previously decoded voxels. VoxelDNN obtains an average rate saving of 30% over G-PCC. ...

We propose a practical deep generative approach for lossless point cloud geometry compression, called MSVoxelDNN, and show that it significantly reduces the rate compared to the MPEG G-PCC codec. Our previous work based on autoregressive models (VoxelDNN) has a fast training phase; however, inference is slow because the occupancy probabilities are predicted sequentially, voxel by voxel. In this work, we employ a multiscale architecture that models voxel occupancy in coarse-to-fine order. At each scale, MSVoxelDNN divides voxels into eight conditionally independent groups, thus requiring a single network evaluation per group instead of one per voxel. We evaluate the performance of MSVoxelDNN on a set of point clouds from Microsoft Voxelized Upper Bodies (MVUB) and MPEG, showing that the current method speeds up encoding/decoding significantly compared to the previous VoxelDNN, while achieving an average rate saving of 17.5% over G-PCC. The implementation is available at https://github.com/Weafre/MSVoxelDNN.
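One way to form eight conditionally independent groups at a scale is to split the candidate children of all occupied coarse voxels by their position inside the 2 × 2 × 2 block. The sketch assumes this offset-based split and a fixed decoding order of the groups; the grouping actually used is defined by the MSVoxelDNN implementation, and `child_groups` is a hypothetical name.

```python
from itertools import product


def child_groups(parents):
    """Group the 8 candidate children of every occupied coarse voxel by child offset.

    All voxels in one group are predicted in a single network evaluation, conditioned on
    the coarser scale and on groups already decoded at this scale."""
    return {offset: [(2 * x + offset[0], 2 * y + offset[1], 2 * z + offset[2])
                     for x, y, z in sorted(parents)]
            for offset in product((0, 1), repeat=3)}


groups = child_groups([(0, 0, 0), (1, 2, 3)])
print(len(groups))        # 8 network evaluations per scale
print(groups[(1, 0, 1)])  # [(1, 0, 1), (3, 4, 7)]
```

The number of network evaluations per scale is thus fixed at eight regardless of how many voxels are occupied, which is the source of the speed-up over voxel-by-voxel prediction.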

... Meanwhile, contributions have appeared in the technical literature showing improvements over the standardized solutions for some specific classes of point clouds. Encoding the geometry of voxelized point clouds is a first task, solved in both V-PCC and G-PCC, for which several recent publications have provided alternative solutions; see, e.g., the lossless codecs DD [3] and BVL [4], based on 2D image coding tools, and, more recently, VoxelDNN [5] and MSVoxelDNN [6], based on deep neural networks that provide the arithmetic coder with coding probabilities. ...

... VoxelDNN [5], a recent DNN-based method, generated probabilities for the occupancy of voxels using a wider conditional template than in the octree model, by splitting the space into very large cubes, e.g., 64 × 64 × 64, and generating conditional probabilities p(b_i | b_1, . . . ...

This paper describes a novel lossless point cloud compression algorithm that uses a neural network to estimate the coding probabilities for the occupancy status of voxels, depending on wide three-dimensional contexts around the voxel to be encoded. The point cloud is represented as an octree, with each resolution layer being sequentially encoded and decoded using arithmetic coding, starting from the lowest resolution until the final resolution is reached. The occupancy probability of each voxel of the splitting pattern at each node of the octree is modeled by a neural network whose input is the already encoded occupancy status of several octree nodes (belonging to the past and current resolutions), corresponding to a 3D context surrounding the node to be encoded. The algorithm has a fast and a slow version; the fast version selects several voxels of the context differently, which allows increased parallelization by sending larger batches of templates to the neural network for estimation, at both the encoder and the decoder. The proposed algorithms yield state-of-the-art results on benchmark datasets. The implementation will be made available at https://github.com/marmus12/nnctx

... Wang et al. [24] further optimized the voxel-based PCC method and proposed a sparse convolution-based PCC framework, which greatly reduces the computational cost. Nguyen et al. [26] attempted to mix octree and voxel-based coding by partitioning the point cloud into multi-resolution voxel blocks. You et al. [25] proposed a direct way to deal with points: they divide the point cloud into multiple blocks, encode each block independently, and recombine all patches into a complete point cloud during decoding. ...
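The block-wise strategy attributed to You et al. [25] relies on a lossless split and merge of coordinates. This minimal sketch assumes integer voxel coordinates and cubic blocks; the per-block encoder itself is omitted, and both function names are hypothetical.

```python
from collections import defaultdict


def partition_blocks(points, block_size):
    """Split integer points into cubic blocks; store block-local coordinates per block index."""
    blocks = defaultdict(list)
    for x, y, z in points:
        key = (x // block_size, y // block_size, z // block_size)
        blocks[key].append((x % block_size, y % block_size, z % block_size))
    return dict(blocks)


def merge_blocks(blocks, block_size):
    """Recombine decoded blocks into one sorted point list in global coordinates."""
    return sorted((bx * block_size + x, by * block_size + y, bz * block_size + z)
                  for (bx, by, bz), pts in blocks.items() for x, y, z in pts)


pts = [(0, 0, 0), (5, 1, 2), (9, 9, 9), (-1, 3, 4)]
blocks = partition_blocks(pts, 4)
print(len(blocks), merge_blocks(blocks, 4) == sorted(pts))  # 4 True
```

Python's floor division and modulo keep the split consistent for negative coordinates, so only the block index and local coordinates need to be transmitted for each block.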

With the development of 3D sensor technology, 3D point clouds are widely used in industrial scenes due to their high accuracy, which promotes the development of point cloud compression technology. Learned point cloud compression has attracted much attention for its excellent rate-distortion performance. However, in these methods each model corresponds to a single compression rate. To achieve compression at different rates, a large number of models must be trained, which increases training time and storage space. To address this problem, a variable-rate point cloud compression method is proposed, which allows the compression rate to be adjusted through a hyperparameter within a single model. To address the narrow rate range that occurs when the traditional rate-distortion loss is jointly optimized for variable-rate models, a rate expansion method based on contrastive learning is proposed to expand the bit rate range of the model. To improve the visual quality of the reconstructed point cloud, a boundary learning method is introduced that improves the classification of boundary points through boundary optimization and enhances overall model performance. The experimental results show that the proposed method achieves variable-rate compression over a large bit rate range while maintaining model performance. The proposed method outperforms G-PCC, achieving a BD-Rate gain of more than 70% over G-PCC, and performs about as well as the learned methods at high bit rates.
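The BD-Rate figure quoted above summarizes the average bitrate difference between two rate-distortion curves at equal quality. The following is a minimal sketch of the classic Bjontegaard calculation (cubic fit of log-rate against PSNR, integrated over the overlapping PSNR range), assuming at least four rate-PSNR points per codec; published evaluations may use other interpolation variants, and a reported "gain" is usually the magnitude of a negative BD-Rate.

```python
import math


def _cubic_fit(xs, ys):
    """Least-squares cubic fit via normal equations; returns [c0, c1, c2, c3]."""
    n = 4
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv], b[col], b[piv] = a[piv], a[col], b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, n))) / a[r][r]
    return coef


def _integral(coef, lo, hi):
    """Definite integral of the polynomial sum(c_k x^k) over [lo, hi]."""
    return sum(c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1) for k, c in enumerate(coef))


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjontegaard delta rate (%) of a test codec against a reference.

    Negative values mean the test codec needs fewer bits for the same PSNR."""
    all_psnr = psnr_ref + psnr_test
    shift = sum(all_psnr) / len(all_psnr)  # center PSNR values for better conditioning
    x_ref = [p - shift for p in psnr_ref]
    x_test = [p - shift for p in psnr_test]
    c_ref = _cubic_fit(x_ref, [math.log(r) for r in rate_ref])
    c_test = _cubic_fit(x_test, [math.log(r) for r in rate_test])
    lo = max(min(x_ref), min(x_test))
    hi = min(max(x_ref), max(x_test))
    avg_gap = (_integral(c_test, lo, hi) - _integral(c_ref, lo, hi)) / (hi - lo)
    return (math.exp(avg_gap) - 1.0) * 100.0


psnr = [30.0, 33.0, 36.0, 39.0]
rate_ref = [0.10, 0.20, 0.40, 0.80]    # e.g. bits per input point, reference codec
rate_test = [r / 2 for r in rate_ref]  # same quality at half the rate
print(round(bd_rate(rate_ref, psnr, rate_test, psnr), 6))  # -50.0
```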

... Researchers have explored several techniques in learning-based PCC, such as voxelization followed by 3D convolution, sparse convolution, and Multi-Layer Perceptrons (MLPs). For example, Quach et al. [8], [35] and Nguyen et al. [36], [37] converted point clouds into 3D grids using voxelization and represented each voxel with an occupied or unoccupied state. Guarda et al. explored learning-based scalable coding for geometry [38], [39] and obtained multiple rate-distortion points from a trained model using explicit quantization of the latent representation [40]. ...
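Voxelization, as described above, maps each point to a grid cell and marks that cell occupied. This minimal sketch assumes points inside a known cube [lo, hi]^3 and a dense nested-list grid; practical codecs work on much larger grids and usually store only the occupied coordinates.

```python
def voxelize(points, resolution, lo=0.0, hi=1.0):
    """Map float points inside [lo, hi]^3 to a dense resolution^3 grid of 0/1 occupancy."""
    scale = resolution / (hi - lo)
    grid = [[[0] * resolution for _ in range(resolution)] for _ in range(resolution)]
    for point in points:
        # Clamp so that points lying exactly on `hi` fall into the last voxel.
        i, j, k = (min(int((c - lo) * scale), resolution - 1) for c in point)
        grid[i][j][k] = 1
    return grid


points = [(0.0, 0.0, 0.0), (0.01, 0.02, 0.03), (0.99, 0.99, 0.99), (1.0, 0.5, 0.2)]
grid = voxelize(points, resolution=4)
print(sum(v for plane in grid for row in plane for v in row))  # 3 occupied voxels
```

Two nearby points collapse into the same voxel, which is exactly the geometric loss that a coarser voxel grid introduces.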

The emergence of digital avatars has led to an exponential increase in the demand for human point clouds with realistic and intricate details. Compressing such data is challenging because of the overwhelming data volume, comprising millions of points. Herein, we leverage the human geometric prior to remove geometric redundancy from point clouds, greatly improving compression performance. More specifically, the prior provides topological constraints as a geometry initialization, allowing adaptive adjustments with a compact parameter set that can be represented with only a few bits. Therefore, we can envisage high-resolution human point clouds as a combination of geometric priors and structural deviations. The priors can first be derived from an aligned point cloud, and the difference in features is subsequently compressed into a compact latent code. The proposed framework can operate in a plug-and-play fashion with existing learning-based point cloud compression methods. Extensive experimental results show that our approach significantly improves compression performance without deteriorating quality, demonstrating its promise in a variety of applications.

... The point cloud geometry is first projected onto lower-dimensional latent spaces, which are then encoded by entropy coders with autoregressive and/or hyperprior-based context modeling. For lossless point cloud geometry coding, efficient autoregressive voxel-based compression methods (VoxelDNN) have been proposed in [26], [27], which the same authors further developed into a multiscale approach [28]. However, these voxel-based point cloud geometry compression methods have high computational complexity. ...

In recent years, we have witnessed the presence of point cloud data in many aspects of our lives, from immersive media and autonomous driving to healthcare, albeit at the cost of a tremendous amount of data. In this paper, we present an efficient lossless point cloud compression method that uses sparse tensor-based deep neural networks to learn point cloud geometry and color probability distributions. Our method represents a point cloud with both an occupancy feature and three attribute features at different bit depths in a unified sparse representation. This allows us to efficiently exploit feature-wise and point-wise dependencies within point clouds using a sparse tensor-based neural network and thus build an accurate autoregressive context model for an arithmetic coder. To the best of our knowledge, this is the first learning-based lossless point cloud geometry and attribute compression approach. Compared with the state-of-the-art lossless point cloud compression method from the Moving Picture Experts Group (MPEG), our method achieves a 22.6% reduction in total bitrate on a diverse set of test point clouds, with 49.0% and 18.3% rate reductions on the geometry and color attribute components, respectively.

... By introducing a sequential dependency in the voxel grid, one can use voxels at the current LoD as context. Nguyen et al. (2021) proposed a deep CNN with masked convolutions, called VoxelDNN, for lossless compression of point cloud geometry. The neural network predicts the occupancy probability of each voxel, and these probabilities are then fed to an arithmetic coder. ...
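Masked convolutions enforce the sequential dependency by zeroing every kernel tap that would look at voxels not yet decoded. The sketch below builds such a 3D causal mask in the PixelCNN style, assuming a raster-scan order over (depth, height, width); the exact layer layout of VoxelDNN is not given here, and the function name `causal_mask_3d` is hypothetical. "Type A" (centre excluded) and "type B" (centre included) follow the PixelCNN naming.

```python
import numpy as np

def causal_mask_3d(k, include_center=False):
    """Binary mask for a k x k x k masked convolution kernel (hypothetical helper).

    Voxels are ordered in raster-scan order over (depth, height, width).
    Taps strictly before the centre are kept (1); the centre is kept only when
    include_center=True ("type B"), and all later taps are zeroed so that a
    prediction never sees voxels that have not been decoded yet.
    """
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    mask = np.zeros(k ** 3, dtype=np.float32)
    center = (k ** 3) // 2                 # flat index of the centre tap
    mask[:center] = 1.0                    # strictly earlier voxels
    if include_center:
        mask[center] = 1.0
    return mask.reshape(k, k, k)           # C order matches the raster scan

mask_a = causal_mask_3d(3)                       # first layer: excludes the centre
mask_b = causal_mask_3d(3, include_center=True)  # later layers
print(int(mask_a.sum()), int(mask_b.sum()))      # 13 14
```

In a network, the mask would be multiplied element-wise with the learned kernel weights before every convolution, so the causal structure holds throughout training.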

Point clouds are becoming essential in key applications, as advances in capture technologies lead to large volumes of data. Compression is therefore essential for storage and transmission. In this work, the state of the art in geometry and attribute compression is reviewed, with a focus on deep learning-based approaches. The challenges faced when compressing geometry and attributes are considered, together with an analysis of current approaches to address them, their limitations, and the relations between deep learning-based and traditional approaches. Current open questions in point cloud compression, existing solutions and perspectives are identified and discussed. Finally, the links between existing point cloud compression research and related research problems in adjacent fields, such as rendering in computer graphics, mesh compression and point cloud quality assessment, are highlighted.

...

| Dataset | Compression method |
|---|---|
| ShapeNet [1] | Wang et al. [2,3] |
| ModelNet [4] | Quach et al. [5,6] and Nguyen et al. [7] |
| MPEG | Alexiou et al. [8] and Guarda et al. [9,10,11,12,13,14,15] |
| JPEG Pleno [16] | Alexiou et al. [8] |
| nuScenes [17] | Wiesman et al. [18] |

Depending on the application, the number of points in a typical point cloud model can range from thousands up to billions. Since transmitting and storing such huge amounts of data is impractical, efficient compression methods are paramount. ...

The popularisation of acquisition devices capable of capturing volumetric information, such as LiDAR scanners and depth cameras, has led to increased interest in point clouds as an imaging modality. Because of the large amount of data needed to represent them, efficient compression solutions are required to enable practical applications. Among the many techniques proposed in recent years, learning-based methods are receiving considerable attention due to their high performance and potential for improvement. Such algorithms depend on large and diverse training sets to achieve good compression performance. ShapeNet is a large-scale dataset composed of textured CAD models and constitutes an effective option for training such compression methods. The dataset consists entirely of meshes, which must go through a sampling process to obtain point clouds with geometry and texture information. Although many existing software libraries can sample geometry from meshes through simple functions, obtaining an output point cloud with the geometry and color of the external faces of the mesh models is not straightforward for ShapeNet. The main difficulty is that its models often contain duplicated faces that share the same vertices but have different color values. This document describes a script for sampling ShapeNet meshes that circumvents this issue by excluding the internal faces of the mesh models prior to sampling. The script can be accessed from the following link: https://github.com/mmspg/mesh-sampling.
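The geometric part of mesh sampling that generic libraries perform is usually area-weighted: triangles are chosen with probability proportional to their area, and a uniform point is drawn inside each chosen triangle. The sketch below shows only that step under those assumptions. It is not the mmspg script, it carries no color, and it does not address the duplicated-face problem described above; the function name `sample_mesh_surface` is hypothetical.

```python
import numpy as np

def sample_mesh_surface(vertices, faces, n_points, rng=None):
    """Draw points uniformly over a triangle mesh surface (hypothetical helper).

    vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
    Faces are picked with probability proportional to their area, then a
    uniformly distributed point is drawn inside each picked triangle.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(vertices, dtype=np.float64)
    f = np.asarray(faces, dtype=np.int64)
    a, b, c = v[f[:, 0]], v[f[:, 1]], v[f[:, 2]]
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    idx = rng.choice(len(f), size=n_points, p=areas / areas.sum())
    # Uniform barycentric sampling: reflect samples that land outside the triangle.
    u = rng.random(n_points)
    w = rng.random(n_points)
    outside = u + w > 1.0
    u[outside], w[outside] = 1.0 - u[outside], 1.0 - w[outside]
    return a[idx] + u[:, None] * (b[idx] - a[idx]) + w[:, None] * (c[idx] - a[idx])

# Unit square in the z = 0 plane, split into two triangles.
verts = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
tris = [[0, 1, 2], [0, 2, 3]]
pts = sample_mesh_surface(verts, tris, 1000, rng=np.random.default_rng(0))
print(pts.shape)  # (1000, 3)
```

Because every face is sampled, internal or duplicated faces would also receive points, which is exactly why the script described above removes internal faces first.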

... These architectures are insufficient for processing large point clouds. VoxelDNN, proposed in [38], combines the octree and voxel domains. Inference in this lossless method is slow because the occupancy probabilities are predicted sequentially, voxel by voxel; the improved multi-scale MSVoxelDNN models voxel occupancy and achieves average rate savings of up to 17% over G-PCC [39]. ...

In this paper, we present a new dynamic point cloud compression method based on different projection types and bit depths, combined with a surface reconstruction algorithm and video compression of the resulting geometry and texture maps. Texture maps are compressed after creating Voronoi diagrams. The video codecs are specific to each map type: FFV1 for geometry and H.265/HEVC for texture. Decompressed point clouds are reconstructed using a Poisson surface reconstruction algorithm. The reconstructions are compared with the original point clouds using point-to-point and point-to-plane measures. Comprehensive experiments show better performance for some projection maps: the cylindrical, Miller and Mercator projections.
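The abstract does not give the exact parameterization of the projections it compares. The sketch below assumes that points are first converted to longitude/latitude about their centroid, and then applies the standard cylindrical, Mercator and Miller map formulas. Scaling to a pixel grid, bit-depth quantization and occupancy handling are omitted, and the function name `project_points` is hypothetical.

```python
import numpy as np

def project_points(points, kind="mercator"):
    """Map 3D points to 2D map coordinates around their centroid (hypothetical helper).

    Each point is converted to longitude/latitude (lam, phi) about the centroid.
    'cylindrical' keeps (lam, z); 'mercator' and 'miller' apply the standard
    map formulas to the latitude. Returns an (N, 2) array of unscaled coordinates.
    """
    p = np.asarray(points, dtype=np.float64)
    p = p - p.mean(axis=0)
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    lam = np.arctan2(y, x)                        # longitude in (-pi, pi]
    phi = np.arctan2(z, np.hypot(x, y))           # latitude in [-pi/2, pi/2]
    if kind == "cylindrical":
        return np.stack([lam, z], axis=1)
    if kind == "mercator":
        v = np.log(np.tan(np.pi / 4 + phi / 2))   # diverges at the poles
    elif kind == "miller":
        v = 1.25 * np.log(np.tan(np.pi / 4 + 0.4 * phi))  # finite at the poles
    else:
        raise ValueError(f"unknown projection: {kind}")
    return np.stack([lam, v], axis=1)
```

Miller's formula compresses high latitudes less aggressively than Mercator and stays finite at the poles, which is one plausible reason the two behave differently as projection maps.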

Since the development of 3D applications, the point cloud, a spatial description easily acquired by sensors, has been widely used in areas such as SLAM and 3D reconstruction. Point Cloud Compression (PCC) has also attracted increasing attention as a primary step before point clouds are transferred and stored, and geometry compression, which compresses the geometric structure of the points, is an important component of PCC. However, existing non-learning-based geometry compression methods are often limited by manually predefined compression rules. Although learning-based compression methods can significantly improve performance by learning compression rules from data, they still have shortcomings. Voxel-based compression networks introduce precision errors due to voxelization, while point-based methods may be relatively less robust and are mainly designed for sparse point clouds. In this work, we propose a novel learning-based point cloud compression framework named 3D Point Cloud Geometry Quantization Compression Network (3QNet), which overcomes the robustness limitation of existing point-based methods and can handle dense point clouds. By learning a codebook of common structural features from simple and sparse shapes, 3QNet can efficiently handle multiple kinds of point clouds. Experiments on object models, indoor scenes and outdoor scans show that 3QNet achieves better compression performance than many representative methods.
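The abstract does not describe how 3QNet builds or applies its codebook. As a generic illustration of codebook-based quantization, the sketch below replaces each feature vector with its nearest codeword, so that only integer indices need to be transmitted to a decoder holding the same codebook. This is plain nearest-neighbour vector quantization, and the function name `quantize_to_codebook` is hypothetical.

```python
import numpy as np

def quantize_to_codebook(features, codebook):
    """Nearest-codeword vector quantization under L2 distance (hypothetical helper).

    features: (N, D) array; codebook: (K, D) array.
    Returns (indices, quantized). Only the indices need to be transmitted,
    since the decoder is assumed to hold the same codebook.
    """
    f = np.asarray(features, dtype=np.float64)
    c = np.asarray(codebook, dtype=np.float64)
    # Squared distances between every feature and every codeword, shape (N, K).
    d2 = ((f[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    indices = d2.argmin(axis=1)
    return indices, c[indices]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
feats = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])
idx, q = quantize_to_codebook(feats, codebook)
print(idx)  # [0 1 2]
```

Before entropy coding, each index costs at most ceil(log2 K) bits for a codebook of K entries, independent of the feature dimension D.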

The increase in popularity of point-cloud-oriented applications has triggered the development of specialized compression algorithms. In this paper, a novel algorithm is developed for the lossless geometry compression of voxelized point clouds following an intra-frame design. The encoded voxels are arranged into runs and encoded in a single pass directly in the voxel domain. This is done without representing the point cloud via an octree or rendering the voxel space as an occupancy matrix, thereby reducing the memory requirements of the method. Each run is compressed using a context-adaptive arithmetic encoder, yielding state-of-the-art compression results with gains of up to 15% over TMC13, MPEG's standard for point cloud geometry compression. Several proposed contributions accelerate the calculation of each run's probability limits prior to arithmetic encoding. As a result, the encoder has a low computational complexity, linear in the number of occupied voxels, leading to an average encoding speedup of 1.8× over TMC13. Various experiments assess the proposed algorithm's state-of-the-art performance in terms of compression ratio and encoding speed.
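The abstract does not define a run precisely or say how runs are context-coded. The sketch below assumes that a run is a maximal sequence of occupied voxels that are consecutive along the z axis and share the same (x, y). It works directly on the list of occupied voxels, with no octree or occupancy matrix, and omits the arithmetic coding step. The function name `voxels_to_runs` is hypothetical.

```python
import numpy as np

def voxels_to_runs(voxels):
    """Group occupied voxels into runs along the z axis (hypothetical helper).

    voxels: (N, 3) integer array of occupied (x, y, z) positions.
    Voxels are sorted in raster order (x, then y, then z), and consecutive z
    positions sharing the same (x, y) are merged into one run.
    Returns a list of (x, y, z_start, length) tuples.
    """
    # np.unique(axis=0) sorts rows lexicographically and drops duplicates.
    v = np.unique(np.asarray(voxels, dtype=np.int64), axis=0)
    runs = []
    for x, y, z in v:
        if runs and runs[-1][0] == x and runs[-1][1] == y and runs[-1][2] + runs[-1][3] == z:
            x0, y0, z0, n = runs[-1]
            runs[-1] = (x0, y0, z0, n + 1)   # extend the current run
        else:
            runs.append((x, y, z, 1))        # start a new run
    return [tuple(int(t) for t in r) for r in runs]

vox = [[0, 0, 1], [0, 0, 2], [0, 0, 3], [0, 1, 0], [2, 5, 7], [2, 5, 8]]
print(voxels_to_runs(vox))  # [(0, 0, 1, 3), (0, 1, 0, 1), (2, 5, 7, 2)]
```

After the sort, the grouping pass touches each occupied voxel once, which is consistent with the linear dependence on the number of occupied voxels reported above.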

This paper presents a novel method to determine rate-distortion-optimized transform coefficients for the efficient compression of videos generated from point clouds. The method exploits a generalized frequency-selective extrapolation approach that iteratively determines rate-distortion-optimized coefficients for all basis functions of the two-dimensional discrete cosine and sine transforms. The method is applied to blocks containing both occupied and unoccupied pixels in video-based point cloud compression with HEVC encoding. In the proposed algorithm, only the values of the transform coefficients are changed, so the resulting bitstreams are compliant with the V-PCC standard. For all-intra coded point clouds, bitrate savings of more than 4% for geometry and more than 6% for texture error metrics are observed relative to standard encoding. These savings are more than twice as high as those obtained with competing methods from the literature. In the random-access case, the proposed method outperforms competing V-PCC methods by more than 0.5%.
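The core extrapolation idea can be illustrated with a simplified greedy variant. Basis functions are fitted only to the occupied pixels of a block, and the fitted model then fills the unoccupied pixels. This sketch is a deliberate simplification under stated assumptions, not the paper's method: it uses DCT-II bases only (no DST), no rate term, no spatial weighting decay and no orthogonality-deficiency compensation, and the names `dct_basis` and `masked_extrapolation` are hypothetical.

```python
import numpy as np

def dct_basis(n):
    """All n*n separable DCT-II basis images for an n x n block, shape (n*n, n, n)."""
    idx = np.arange(n)
    c = np.cos(np.pi * np.outer(idx, 2 * idx + 1) / (2 * n))   # c[k, m]
    return np.einsum("km,ln->klmn", c, c).reshape(n * n, n, n)

def masked_extrapolation(block, occupied, iters=20):
    """Greedy fit of DCT basis images using only occupied pixels (hypothetical helper).

    Each iteration picks the basis image whose weighted projection removes the
    most residual energy on the occupied pixels, and adds it to the model.
    Unoccupied pixels are then filled by the fitted model.
    """
    f = np.asarray(block, dtype=np.float64)
    w = np.asarray(occupied, dtype=np.float64)       # 1 = occupied pixel, 0 = unoccupied
    model = np.zeros_like(f)
    if w.sum() == 0:
        return model                                  # nothing to fit
    basis = dct_basis(f.shape[0])
    norms = (w * basis ** 2).sum(axis=(1, 2))         # energy of each basis on occupied pixels
    safe_norms = np.where(norms > 0, norms, np.inf)   # bases invisible on the mask are never chosen
    for _ in range(iters):
        proj = ((w * (f - model)) * basis).sum(axis=(1, 2))
        k = int(np.argmax(proj ** 2 / safe_norms))    # largest reduction of the masked error
        model += (proj[k] / safe_norms[k]) * basis[k]
    return model

# Left half occupied with a constant value: the DC basis alone extends it to the whole block.
blk = np.full((4, 4), 5.0)
msk = np.zeros((4, 4))
msk[:, :2] = 1.0
print(np.allclose(masked_extrapolation(blk, msk, iters=1), 5.0))  # True
```

Because only occupied pixels enter the error, the unoccupied pixels take whatever values the few selected basis functions produce, which is what allows compact coefficient sets in the real method.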

This article presents an overview of recent standardization activities for point cloud compression (PCC). A point cloud is a 3D data representation used in diverse applications associated with immersive media, including virtual/augmented reality, immersive telepresence, autonomous driving and cultural heritage archival. The international standards body for media compression, the Moving Picture Experts Group (MPEG), is planning to release two PCC standard specifications in 2020: video-based PCC (V-PCC) and geometry-based PCC (G-PCC). V-PCC and G-PCC will be part of the ISO/IEC 23090 series on the coded representation of immersive media content. In this paper, we provide a detailed description of both codec algorithms and their coding performance. We also discuss certain unique aspects of point cloud compression.

Efficient point cloud compression is fundamental to enabling the deployment of virtual and mixed reality applications, since the number of points to code can be in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We jointly optimize rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset, with 51.5% BDBR savings on average. Moreover, while octree-based methods suffer an exponential decrease in the number of points at low bitrates, our method still produces high-resolution outputs even at low bitrates. Code and supplementary material are available at https://github.com/mauriceqch/pcc_geo_cnn .
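The joint rate-distortion objective can be written as a single loss L = R + λD, where λ is the trade-off parameter. The sketch below makes several assumptions: the latent likelihoods come from some entropy model not shown here, the distortion is a plain binary cross-entropy on occupancy (the paper's exact classification loss may differ), and uniform quantization is replaced during training by additive uniform noise, a common proxy in learned compression. Both function names are hypothetical.

```python
import numpy as np

def quantize(latent, training, rng=None):
    """Uniform quantization: additive U(-0.5, 0.5) noise when training, rounding otherwise."""
    if training:
        rng = np.random.default_rng() if rng is None else rng
        return latent + rng.uniform(-0.5, 0.5, size=np.shape(latent))
    return np.round(latent)

def rd_loss(latent_likelihoods, occupancy_true, occupancy_prob, lam, eps=1e-9):
    """Rate-distortion loss L = R + lam * D (hypothetical helper).

    R: estimated bits, -sum log2 p(latent), from an assumed entropy model.
    D: mean binary cross-entropy between the true occupancy (0/1) and the
       decoder's predicted occupancy probabilities.
    """
    rate = -np.sum(np.log2(np.clip(latent_likelihoods, eps, 1.0)))
    y = np.asarray(occupancy_true, dtype=np.float64)
    p = np.clip(occupancy_prob, eps, 1.0 - eps)
    distortion = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return rate + lam * distortion
```

A larger λ favours lower distortion at a higher rate; training one model per λ value traces out the rate-distortion curve used to compute BDBR figures.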

We present a method to compress the geometry information of point clouds that exploits redundancies across consecutive frames of a sequence. It uses octrees and works by progressively increasing the resolution of the octree. At each branch of the tree, we generate an approximation of the child nodes with a number of methods, which are used as contexts to drive an arithmetic coder. The best approximation, i.e. the context that yields the fewest encoding bits, is selected, and the chosen method is signalled as side information for replication at the decoder. The core of our method is a context-based arithmetic coder in which a reference octree is used to encode the current octree, thus providing 255 contexts for each output octet. The 255×255 frequency histogram is viewed as a discrete 3D surface and is conveyed to the decoder using another octree. We present two methods that generate the predictions (contexts) from adjacent frames in the sequence (inter-frame) and one method that works purely intra-frame. The encoder continuously switches to the best of the three modes and conveys this choice to the decoder. Since an intra-frame prediction mode is present, our coder can also work in a purely intra-frame mode. Extensive results are presented to show the method's potential against many compression alternatives for the geometry information in dynamic voxelized point clouds.
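The octets mentioned above are octree occupancy codes. Each occupied node emits one byte whose bits flag its eight children, and because an occupied node always has at least one occupied child, the byte takes one of 255 values. The sketch below produces these codes level by level for integer points. The child bit order (x, y, z mapped to bits 2, 1, 0) and the lexicographic node order within a level are assumptions made for illustration; real codecs typically use Morton order. It does not reproduce the paper's inter-frame contexts, and the function name `octree_occupancy_codes` is hypothetical.

```python
import numpy as np

def octree_occupancy_codes(points, depth):
    """Level-by-level octree occupancy bytes for integer points in [0, 2**depth)^3.

    Hypothetical helper. Bit `child` of a node's byte is set when that child is
    occupied, with child = 4*bx + 2*by + bz for the next coordinate bits.
    Nodes within a level are visited in lexicographic coordinate order.
    """
    pts = np.unique(np.asarray(points, dtype=np.int64), axis=0)
    codes = []
    for level in range(depth):
        shift = depth - level - 1                      # coordinate bit that selects the child
        parents = pts >> (shift + 1)                   # coordinates of the node being split
        bits = (pts >> shift) & 1                      # (bx, by, bz) of each point's child
        child = bits[:, 0] * 4 + bits[:, 1] * 2 + bits[:, 2]
        nodes, inverse = np.unique(parents, axis=0, return_inverse=True)
        occupancy = np.zeros(len(nodes), dtype=np.int64)
        np.bitwise_or.at(occupancy, inverse.ravel(), 1 << child)  # set one bit per occupied child
        codes.extend(int(b) for b in occupancy)
    return codes

print(octree_occupancy_codes([[0, 0, 0], [3, 3, 3]], depth=2))  # [129, 1, 128]
```

A context-based coder such as the one described above would code each of these bytes with probabilities conditioned on a prediction, for example the co-located byte of a reference octree.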

The widespread adoption of new 3D sensor and authoring technologies has made it possible to capture 3D scenes and models in real time with decent visual quality. For example, Microsoft's Kinect and Apple's PrimeSense technology are now used in a wide variety of interactive 3D mobile applications, including gaming and augmented reality applications. The latest smartphones are equipped with multiple cameras, which can readily be used to generate depth images. Some of the latest smartphones also include depth-ranging sensors that can be used for 3D model generation. Light detection and ranging (lidar) is yet another field where 3D depth acquisition is important. Real-time 3D scene detection and ranging has become an important issue for the emerging fields of autonomous navigation and driving.

Due to the increased popularity of augmented and virtual reality experiences, interest in capturing the real world in multiple dimensions and presenting it to users in an immersive fashion has never been higher. Distributing such representations enables users to navigate freely in multi-sensory 3D media experiences. Unfortunately, such representations require a large amount of data, which is not feasible to transmit over today's networks. Efficient compression technologies that are well adopted in the content chain are in high demand and are key components for democratizing augmented and virtual reality applications. The Moving Picture Experts Group (MPEG), one of the main standardization groups dealing with multimedia, identified this trend and recently started building an open standard for compactly representing 3D point clouds, which are the 3D equivalent of the very well-known 2D pixels. This paper introduces the main developments and technical aspects of this ongoing standardization effort.

Deep Neural Networks trained as image auto-encoders have recently emerged as a promising direction for advancing the state of the art in image compression. The key challenge in learning such networks is twofold: to deal with quantization, and to control the trade-off between reconstruction error (distortion) and entropy (rate) of the latent image representation. In this paper, we focus on the latter challenge and propose a new technique to navigate the rate-distortion trade-off for an image compression auto-encoder. The main idea is to directly model the entropy of the latent representation by using a context model: a 3D-CNN which learns a conditional probability model of the latent distribution of the auto-encoder. During training, the auto-encoder makes use of the context model to estimate the entropy of its representation, and the context model is concurrently updated to learn the dependencies between the symbols in the latent representation. Our experiments show that this approach yields a state-of-the-art image compression system based on a simple convolutional auto-encoder.
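The premise of a learned context model is that conditioning each latent symbol on already-coded neighbours lowers its entropy, and therefore the rate. The toy illustration below is not a 3D-CNN. It uses symbol counts, takes the previous symbol as the only context, and compares the unconditional entropy H(X) with the conditional entropy H(X_t | X_{t-1}). The function names are hypothetical.

```python
import numpy as np
from collections import Counter

def empirical_entropy(symbols):
    """H(X) in bits per symbol, estimated from symbol frequencies (hypothetical helper)."""
    counts = np.array(list(Counter(symbols).values()), dtype=np.float64)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def conditional_entropy_prev(symbols):
    """H(X_t | X_{t-1}) in bits per symbol, using the previous symbol as context."""
    pairs = Counter(zip(symbols[:-1], symbols[1:]))
    ctx = Counter(symbols[:-1])
    n = len(symbols) - 1
    h = 0.0
    for (c, s), k in pairs.items():
        h -= (k / n) * np.log2(k / ctx[c])   # p(c, s) * log2 p(s | c)
    return float(h)

seq = [0, 0, 0, 0, 1, 1, 1, 1] * 50   # strongly correlated neighbours
print(empirical_entropy(seq))                                  # 1.0
print(conditional_entropy_prev(seq) < empirical_entropy(seq))  # True
```

A neural context model plays the same role with a far richer context, and during training its entropy estimate serves as the rate term of the loss.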

PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixel-cnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional short-cut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications.
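The discretized logistic mixture replaces the 256-way softmax with a few continuous components whose mass is integrated over each intensity bin. The sketch below follows that formulation for a single 8-bit channel. Values are rescaled to [-1, 1], each bin has width 2/255, and the edge bins absorb the tails so that the 256 probabilities sum to 1. It omits the channel conditioning and the numerical safeguards of the reference implementation, and the function name is hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def discretized_logistic_mixture_probs(weights, means, scales):
    """Probability of each 8-bit value 0..255 under a discretized logistic mixture.

    Hypothetical helper. weights, means, scales: per-component arrays
    (weights sum to 1, scales > 0), with means on the rescaled [-1, 1] axis.
    """
    w = np.asarray(weights, dtype=np.float64)[:, None]
    mu = np.asarray(means, dtype=np.float64)[:, None]
    s = np.asarray(scales, dtype=np.float64)[:, None]
    x = np.linspace(-1.0, 1.0, 256)[None, :]         # rescaled pixel values
    cdf_plus = sigmoid((x + 1.0 / 255 - mu) / s)     # CDF at the upper bin edge
    cdf_minus = sigmoid((x - 1.0 / 255 - mu) / s)    # CDF at the lower bin edge
    cdf_plus[:, -1] = 1.0                            # value 255 takes the upper tail
    cdf_minus[:, 0] = 0.0                            # value 0 takes the lower tail
    return (w * (cdf_plus - cdf_minus)).sum(axis=0)

p = discretized_logistic_mixture_probs([0.3, 0.7], [-0.5, 0.2], [0.05, 0.1])
print(p.shape, round(float(p.sum()), 6))  # (256,) 1.0
```

Adjacent bins share edges, so the per-bin masses telescope to exactly one. This is also why the model can place sharp probability on neighbouring intensity values with only a handful of parameters.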

Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
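Predicting pixels sequentially along the two spatial dimensions corresponds to the standard raster-scan factorization of the image likelihood. Stated here for an n × n RGB image, with each color channel conditioned on all previous pixels and on the already generated channels of the same pixel:

$$
p(\mathbf{x}) = \prod_{i=1}^{n^2} p\left(x_i \mid x_1, \dots, x_{i-1}\right)
$$

$$
p\left(x_i \mid \mathbf{x}_{<i}\right) = p\left(x_{i,R} \mid \mathbf{x}_{<i}\right)\, p\left(x_{i,G} \mid \mathbf{x}_{<i}, x_{i,R}\right)\, p\left(x_{i,B} \mid \mathbf{x}_{<i}, x_{i,R}, x_{i,G}\right)
$$

Because every factor is an explicit discrete distribution, the model's log-likelihood is tractable and can be reported directly, as in the results above.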

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has small memory requirements, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
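The adaptive moment estimates can be sketched as a short loop. The loop keeps exponential moving averages of the gradient and of its element-wise square, corrects their initialization bias, and scales each coordinate's step by their ratio. The defaults β1 = 0.9, β2 = 0.999 and ε = 1e-8 are the values commonly recommended for Adam. The learning rate 0.1, the step count and the test function are illustrative choices, and the function name `adam_minimize` is hypothetical.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Run Adam given a gradient function (hypothetical helper).

    m: moving average of gradients; v: moving average of squared gradients.
    Both start at zero, so they are bias-corrected before each update.
    """
    x = np.asarray(x0, dtype=np.float64).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second raw-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias corrections
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps) # per-coordinate scaled step
    return x

# Badly scaled quadratic f(x) = 100*x0^2 + x1^2 with its minimum at the origin.
grad = lambda x: np.array([200.0 * x[0], 2.0 * x[1]])
x_min = adam_minimize(grad, [1.0, 1.0])
print(x_min)
```

Both coordinates take steps of similar size at first even though their gradients differ by a factor of 100. This illustrates the invariance to diagonal rescaling of the gradients described in the abstract.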

Microsoft Voxelized Upper Bodies - A Voxelized Point Cloud Dataset

- C Loop
- Q Cai
- S Orts-Escolano
- P A Chou

8i Voxelized Full Bodies - A Voxelized Point Cloud Dataset

- E Eon
- B Harrison
- T Myers
- P A Chou
