Conference Paper

Learning-Based Lossless Compression of 3D Point Cloud Geometry

... The probability model can be based on adaptively maintained counts for various contexts [4], [7], or on a NN model [11], [17], having a binary context at the input and the probability mass function of the symbol at the output. ...
... In the last few years, machine learning approaches using neural networks have proved successful for both lossy [14], [16], [20]-[22] and lossless [11], [17], [20] point cloud geometry compression. In [14], an auto-encoder architecture involving 3D convolutional layers is employed to generate a latent representation of the point cloud which is further compressed using range coding. ...
... VoxelDNN [11] uses 3D masked convolutional filters to enforce the causality of the 3D context from which the occupancy of 64 × 64 × 64 blocks of voxels is estimated. VoxelContext-Net [20] employs an octree-based deep entropy model for both dynamic and static LIDAR point clouds. ...
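The count-based probability model mentioned in the snippets above can be illustrated with a minimal sketch. It assumes binary symbols, add-one initial counts, and an ideal arithmetic coder that spends exactly -log2(p) bits per symbol; the class and function names are hypothetical and are not taken from any of the cited works.

```python
import math
from collections import defaultdict


class AdaptiveBinaryModel:
    """Per-context occurrence counts, initialised to one for each bit value."""

    def __init__(self):
        # counts[context] = [times bit 0 was seen, times bit 1 was seen]
        self.counts = defaultdict(lambda: [1, 1])

    def prob_one(self, context):
        c0, c1 = self.counts[context]
        return c1 / (c0 + c1)

    def update(self, context, bit):
        self.counts[context][bit] += 1


def ideal_code_length(bits, contexts):
    """Bits an ideal arithmetic coder would spend on the sequence."""
    model = AdaptiveBinaryModel()
    total = 0.0
    for bit, ctx in zip(bits, contexts):
        p1 = model.prob_one(ctx)
        total += -math.log2(p1 if bit else 1.0 - p1)
        # The decoder can repeat this update, so no side information is needed.
        model.update(ctx, bit)
    return total
```

A neural model would replace `prob_one` with a network evaluation on the context, while the coding loop stays the same.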
Preprint
We propose a new paradigm for encoding the geometry of point cloud sequences, where the convolutional neural network (CNN) which estimates the encoding distributions is optimized on several frames of the sequence to be compressed. We adopt lightweight CNN structures, we perform training as part of the encoding process, and the CNN parameters are transmitted as part of the bitstream. The newly proposed encoding scheme operates on the octree representation for each point cloud, encoding consecutively each octree resolution layer. At every octree resolution layer, the voxel grid is traversed section-by-section (each section being perpendicular to a selected coordinate axis) and in each section the occupancies of groups of two-by-two voxels are encoded at once, in a single arithmetic coding operation. A context for the conditional encoding distribution is defined for each two-by-two group of voxels, based on the information available about the occupancy of neighbor voxels in the current and lower resolution layers of the octree. The CNN estimates the probability distributions of occupancy patterns of all voxel groups from one section in four phases. In each new phase the contexts are updated with the occupancies encoded in the previous phase, and each phase estimates the probabilities in parallel, providing a reasonable trade-off between the parallelism of processing and the informativeness of the contexts. The CNN training time is comparable to the time spent in the remaining encoding steps, leading to competitive overall encoding times. Bitrates and encoding-decoding times compare favorably with those of recently published compression schemes.
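Encoding a two-by-two group of voxels in one arithmetic coding operation means treating its four occupancy bits as a single symbol. The sketch below shows one way to form such symbols from a binary section; the bit order inside a group and the even section dimensions are assumptions made for illustration, not details given in the abstract.

```python
def group_symbols(section):
    """Map each 2x2 block of a binary section to a symbol in 0..15.

    `section` is a list of rows of 0/1 values with even height and width
    (an assumption of this sketch).
    """
    rows, cols = len(section), len(section[0])
    symbols = []
    for r in range(0, rows, 2):
        row_syms = []
        for c in range(0, cols, 2):
            # Pack the four occupancy bits, top-left bit most significant.
            s = (
                (section[r][c] << 3)
                | (section[r][c + 1] << 2)
                | (section[r + 1][c] << 1)
                | section[r + 1][c + 1]
            )
            row_syms.append(s)
        symbols.append(row_syms)
    return symbols
```

Each symbol would then be coded with a 16-way distribution supplied by the probability model.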
... Recently, entropy encoders based on deep learning have been shown to outperform hand-crafted ones on rate-distortion performance. Among them, some methods partition point clouds into voxels, then adopt 3D convolution to learn and predict the occupancy of each voxel (Nguyen et al. 2021a; Quach, Valenzise, and Dufaux 2019; Que, Lu, and Xu 2021). Voxel-based models are capable of exploiting the local geometric patterns (e.g., planes, surfaces). ...
... Voxel-based methods quantize the point cloud and classify the voxel occupancy by neural networks. Voxel-based methods outperform G-PCC (3DG 2021) on lossless geometric compression (Nguyen et al. 2021a; Quach, Valenzise, and Dufaux 2019), lossy geometric compression (Quach, Valenzise, and Dufaux 2019, 2020; Wang et al. 2021) and progressive compression (Guarda, Rodrigues, and Pereira 2020). Compared to an octree, geometric patterns can be naturally preserved in the voxel representation. ...
... In object point cloud compression, we set qs = 1 in Eq. (1) to perform lossless compression. We compare our method with the hand-crafted inter-frame octree-based contexts model P(full) (Garcia et al. 2019), state-of-the-art compression method VoxelDNN (Nguyen et al. 2021a) and its fast version MSVoxelDNN (Nguyen et al. 2021b). We set the training condition following VoxelDNN and test the models on different depth data. ...
Preprint
In point cloud compression, sufficient contexts are significant for modeling the point cloud distribution. However, the contexts gathered by the previous voxel-based methods decrease when handling sparse point clouds. To address this problem, we propose a multiple-contexts deep learning framework called OctAttention employing the octree structure, a memory-efficient representation for point clouds. Our approach encodes octree symbol sequences in a lossless way by gathering the information of sibling and ancestor nodes. Specifically, we first represent point clouds with an octree to reduce spatial redundancy, which is robust for point clouds with different resolutions. We then design a conditional entropy model with a large receptive field that models the sibling and ancestor contexts to exploit the strong dependency among the neighboring nodes and employ an attention mechanism to emphasize the correlated nodes in the context. Furthermore, we introduce a mask operation during training and testing to make a trade-off between encoding time and performance. Compared to the previous state-of-the-art works, our approach obtains a 10%-35% BD-Rate gain on the LiDAR benchmark (e.g. SemanticKITTI) and object point cloud dataset (e.g. MPEG 8i, MVUB), and saves 95% coding time compared to the voxel-based baseline. The code is available at https://github.com/zb12138/OctAttention.
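An octree symbol sequence of the kind this abstract refers to can be produced by a breadth-first traversal that emits one 8-bit occupancy symbol per occupied node. The sketch below is a generic illustration under assumed conventions (integer coordinates in [0, 2**depth), x as the most significant child bit); it is not the OctAttention implementation.

```python
def octree_symbols(points, depth):
    """Serialise voxel coordinates into breadth-first occupancy symbols.

    Each symbol is an integer in 1..255 whose bit i is set when child i
    of the node contains at least one point.
    """
    symbols = []
    # Each node is (origin, points inside it); start from the root cube.
    level = [((0, 0, 0), list(set(points)))]
    for d in range(depth):
        half = 1 << (depth - d - 1)  # edge length of a child cube
        next_level = []
        for (ox, oy, oz), pts in level:
            children = {}
            for (x, y, z) in pts:
                # Child index: one bit per axis (assumed ordering).
                idx = (
                    (int(x - ox >= half) << 2)
                    | (int(y - oy >= half) << 1)
                    | int(z - oz >= half)
                )
                children.setdefault(idx, []).append((x, y, z))
            symbol = 0
            for idx in sorted(children):
                symbol |= 1 << idx
                origin = (
                    ox + half * ((idx >> 2) & 1),
                    oy + half * ((idx >> 1) & 1),
                    oz + half * (idx & 1),
                )
                next_level.append((origin, children[idx]))
            symbols.append(symbol)
        level = next_level
    return symbols
```

For the two points (0, 0, 0) and (3, 3, 3) at depth 2, the root symbol has bits 0 and 7 set, followed by one symbol for each of the two occupied children. An entropy model then assigns a 255-way distribution to each symbol from its sibling and ancestor context.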
... We consider the baseline PCC architecture from Quach et al. [2020], which incorporates several improvements upon Quach et al. [2019]. These improvements include sequential training, focal loss, and more efficient architectural choices including residual blocks and progressively increasing channels as the resolution decreases. ...
... These improvements include sequential training, focal loss, and more efficient architectural choices including residual blocks and progressively increasing channels as the resolution decreases. Their approach is distinct from other prior work focusing on graph-based representations of point clouds [de Oliveira Rente et al., 2018], multiscale approaches to compression ; lossless compression [Nguyen et al., 2020], and vector-quantized approaches [Caccia et al., 2020]. While these learned PCC approaches considerably improve rate distortion compared to traditional algorithms, these improvements come at the cost of slower run times and decreased computational efficiency. ...
... This architecture combines a better choice of activation and a more efficient convolutional block. In direct comparison with the current state of the art for operation-efficient learned point cloud compression (e.g., Quach et al. [2020]) using dense convolutions, DeepCompress has a dramatic reduction in parameter and operation count with a minimal loss in quality. In later sections, we show that DeepCompress reduces total model parameter count by 20% and reduces total convolution operation count by 8%. ...
Preprint
Point clouds are a basic data type that is increasingly of interest as 3D content becomes more ubiquitous. Applications using point clouds include virtual, augmented, and mixed reality and autonomous driving. We propose a more efficient deep learning-based encoder architecture for point cloud compression that incorporates principles from established 3D object detection and image compression architectures. Through an ablation study, we show that incorporating the learned activation function from Computational Efficient Neural Image Compression (CENIC) and designing more parameter-efficient convolutional blocks yields dramatic gains in efficiency and performance. Our proposed architecture incorporates Generalized Divisive Normalization activations and a spatially separable InceptionV4-inspired block. We then evaluate rate-distortion curves on the standard JPEG Pleno 8i Voxelized Full Bodies dataset to assess our model's performance. Our proposed modifications outperform the baseline approaches by a small margin in terms of Bjontegaard delta rate and PSNR values, yet reduce necessary encoder convolution operations by 8 percent and total encoder parameters by 20 percent. Our proposed architecture, when considered on its own, has a small penalty of 0.02 percent in Chamfer distance and 0.32 percent increased bit rate in point-to-plane distance for the same peak signal-to-noise ratio.
... to be employed into a context adaptive arithmetic codec. In our previous work, we have modeled the voxel occupancy distributions using a likelihood-based deep autoregressive network called VoxelDNN [1], inspired by the popular PixelCNN model [3]. VoxelDNN achieves state-of-the-art gains (up to 34%) over the MPEG G-PCC reference codec. ...
... Recently, deep learning has been applied widely in point cloud coding in both the octree domain [11], [17] and especially the voxel domain [1], [18]-[20]. A coding method for static LiDAR point clouds is proposed in [11] which learns the probability distributions of the octree based on contextual information and uses an arithmetic coder for lossless coding. ...
... In this work we focus instead on dense point clouds, where voxel-based approaches have shown interesting results. In particular, our recent work, VoxelDNN [1], is an auto-regressive based model which predicts the distribution of each voxel conditioned on the previously decoded voxels. VoxelDNN obtains an average rate saving of 30% over G-PCC. ...
Preprint
We propose a practical deep generative approach for lossless point cloud geometry compression, called MSVoxelDNN, and show that it significantly reduces the rate compared to the MPEG G-PCC codec. Our previous work based on autoregressive models (VoxelDNN) has a fast training phase; however, inference is slow because the occupancy probabilities are predicted sequentially, voxel by voxel. In this work, we employ a multiscale architecture which models voxel occupancy in coarse-to-fine order. At each scale, MSVoxelDNN divides voxels into eight conditionally independent groups, thus requiring a single network evaluation per group instead of one per voxel. We evaluate the performance of MSVoxelDNN on a set of point clouds from Microsoft Voxelized Upper Bodies (MVUB) and MPEG, showing that the current method speeds up encoding/decoding times significantly compared to the previous VoxelDNN, while achieving an average rate saving over G-PCC of 17.5%. The implementation is available at https://github.com/Weafre/MSVoxelDNN.
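One simple way to obtain eight groups of voxel positions is to split the grid by the parity of each coordinate, so that every 2 × 2 × 2 cell contributes exactly one position to each group. The sketch below assumes this parity rule purely for illustration; the abstract does not state how MSVoxelDNN forms its groups, and the function name is hypothetical.

```python
from itertools import product


def parity_groups(size):
    """Partition the positions of a size^3 grid into eight groups.

    Group index = (x % 2, y % 2, z % 2) read as a 3-bit number
    (an assumed grouping rule, chosen for illustration).
    """
    groups = [[] for _ in range(8)]
    for x, y, z in product(range(size), repeat=3):
        index = ((x & 1) << 2) | ((y & 1) << 1) | (z & 1)
        groups[index].append((x, y, z))
    return groups
```

With such a split, the occupancies in group g can be predicted in one batch, conditioned on the coarser scale and on groups 0 to g-1, which gives eight network evaluations per scale instead of one per voxel.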
... Meanwhile, in the technical literature appeared contributions showing improvements over the standardized solutions, for some specific classes of point clouds. Encoding the geometry of voxelized point clouds is a first task, solved both in V-PCC and G-PCC, and for which several recent publications provided alternative solutions, see e.g. the lossless codecs DD [3] and BVL [4] based on using 2D image coding tools, and more recently VoxelDNN [5] and MSVoxelDNN [6] based on deep neural networks for providing the arithmetic coder with coding probabilities. ...
... VoxelDNN [5], a recent DNN-based method, generated probabilities for the occupancy of voxels, using a wider conditional template than in the octree model, by splitting the space in very large cubes, e.g. 64 × 64 × 64, and generating conditional probabilities p(b i |b 1 , . . . ...
Preprint
This paper describes a novel lossless point cloud compression algorithm that uses a neural network for estimating the coding probabilities for the occupancy status of voxels, depending on wide three dimensional contexts around the voxel to be encoded. The point cloud is represented as an octree, with each resolution layer being sequentially encoded and decoded using arithmetic coding, starting from the lowest resolution, until the final resolution is reached. The occupancy probability of each voxel of the splitting pattern at each node of the octree is modeled by a neural network, having at its input the already encoded occupancy status of several octree nodes (belonging to the past and current resolutions), corresponding to a 3D context surrounding the node to be encoded. The algorithm has a fast and a slow version, the fast version selecting differently several voxels of the context, which allows an increased parallelization by sending larger batches of templates to be estimated by the neural network, at both encoder and decoder. The proposed algorithms yield state-of-the-art results on benchmark datasets. The implementation will be made available at https://github.com/marmus12/nnctx
... By introducing a sequential dependency in the voxel grid, one can use voxels at the current LoD as context. Nguyen et al. (2021) proposes a deep CNN with masked convolutions called VoxelDNN for lossless compression of point cloud geometry. The neural network predicts the occupancy probability of each voxel, and the probabilities are then fed to an arithmetic coder. ...
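The causality that masked convolutions enforce can be made concrete with a small mask generator. This is a sketch assuming a raster-scan order (z, then y, then x) and an odd kernel size; the function name and the `include_center` flag are hypothetical, with the flag mirroring the distinction that PixelCNN-style models draw between a first-layer mask that excludes the centre and later-layer masks that include it.

```python
def causal_mask_3d(k, include_center=False):
    """Return a k x k x k nested list of 0/1 kernel weights.

    A tap is 1 when it precedes the centre voxel in raster-scan order,
    so the convolution only sees already-decoded voxels.
    """
    c = k // 2
    center_index = (c * k + c) * k + c
    mask = []
    for z in range(k):
        plane = []
        for y in range(k):
            row = []
            for x in range(k):
                idx = (z * k + y) * k + x
                keep = idx < center_index or (include_center and idx == center_index)
                row.append(1 if keep else 0)
            plane.append(row)
        mask.append(plane)
    return mask
```

Multiplying a convolution kernel elementwise by this mask before each forward pass keeps the predicted occupancy of a voxel independent of voxels that the decoder has not yet reconstructed.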
Article
Point clouds are becoming essential in key applications with advances in capture technologies leading to large volumes of data. Compression is thus essential for storage and transmission. In this work, the state of the art for geometry and attribute compression methods with a focus on deep learning based approaches is reviewed. The challenges faced when compressing geometry and attributes are considered, with an analysis of the current approaches to address them, their limitations and the relations between deep learning and traditional ones. Current open questions in point cloud compression, existing solutions and perspectives are identified and discussed. Finally, the link between existing point cloud compression research and research problems to relevant areas of adjacent fields, such as rendering in computer graphics, mesh compression and point cloud quality assessment, is highlighted.
... Dataset: compression method
ShapeNet [1]: Wang et al. [2,3]
ModelNet [4]: Quach et al. [5,6] and Nguyen et al. [7]
MPEG: Alexiou et al. [8] and Guarda et al. [9,10,11,12,13,14,15]
JPEG Pleno [16]: Alexiou et al. [8]
nuScenes [17]: Wiesman et al. [18]
Depending on application, the number of points in a typical point cloud model can range from thousands up to the order of billions. Since the transmission and storage of such a huge amount of data is impractical, efficient compression methods are paramount. ...
Preprint
The popularisation of acquisition devices capable of capturing volumetric information such as LiDAR scans and depth cameras has led to an increased interest in point clouds as an imaging modality. Due to the high amount of data needed for their representation, efficient compression solutions are needed to enable practical applications. Among the many techniques that have been proposed in recent years, learning-based methods are receiving considerable attention due to their high performance and potential for improvement. Such algorithms depend on large and diverse training sets to achieve good compression performance. ShapeNet is a large-scale dataset composed of CAD models with texture and constitutes an effective option for training such compression methods. This dataset is entirely composed of meshes, which must go through a sampling process in order to obtain point clouds with geometry and texture information. Although many existing software libraries are able to sample geometry from meshes through simple functions, obtaining an output point cloud with geometry and color of the external faces of the mesh models is not a straightforward process for the ShapeNet dataset. The main difficulty associated with this dataset is that its models are often defined with duplicated faces sharing the same vertices, but with different color values. This document describes a script for sampling the meshes from ShapeNet that circumvents this issue by excluding the internal faces of the mesh models prior to the sampling. The script can be accessed from the following link: https://github.com/mmspg/mesh-sampling.
... These architectures are insufficient for the processing of large point cloud data. VoxelDNN was proposed in [38] which combines the octree and voxel domains. Inference in this lossless compression is slow, and the occupancy probabilities are predicted sequentially, voxel by voxel, while the improved MSVoxelDNN models voxel occupancy and achieves rate savings over G-PCC up to 17% on average [39]. ...
Article
In this paper we present a new dynamic point cloud compression method based on different projection types and bit depths, combined with a surface reconstruction algorithm and video compression of the resulting geometry and texture maps. Texture maps are compressed after creating Voronoi diagrams. The video compression used is specific to geometry (FFV1) and texture (H.265/HEVC). Decompressed point clouds are reconstructed using a Poisson surface reconstruction algorithm. Comparison with the original point clouds was performed using point-to-point and point-to-plane measures. Comprehensive experiments show better performance for some projection maps: cylindrical, Miller and Mercator projections.
Preprint
This paper presents a novel method to determine rate-distortion optimized transform coefficients for efficient compression of videos generated from point clouds. The method exploits a generalized frequency selective extrapolation approach that iteratively determines rate-distortion-optimized coefficients for all basis functions of two-dimensional discrete cosine and sine transforms. The method is applied to blocks containing both occupied and unoccupied pixels in video-based point cloud compression for HEVC encoding. In the proposed algorithm, only the values of the transform coefficients are changed such that resulting bit streams are compliant with the V-PCC standard. For all-intra coded point clouds, bitrate savings of more than 4% for geometry and more than 6% for texture error metrics with respect to standard encoding can be observed. These savings are more than twice as high as savings obtained using competing methods from the literature. In the random-access case, our proposed method outperforms competing V-PCC methods by more than 0.5%.
Article
This article presents an overview of the recent standardization activities for point cloud compression (PCC). A point cloud is a 3D data representation used in diverse applications associated with immersive media including virtual/augmented reality, immersive telepresence, autonomous driving and cultural heritage archival. The international standard body for media compression, also known as the Moving Picture Experts Group (MPEG), is planning to release in 2020 two PCC standard specifications: video-based PCC (V-PCC) and geometry-based PCC (G-PCC). V-PCC and G-PCC will be part of the ISO/IEC 23090 series on the coded representation of immersive media content. In this paper, we provide a detailed description of both codec algorithms and their coding performances. Moreover, we will also discuss certain unique aspects of point cloud compression.
Conference Paper
Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can range in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset with 51.5% BDBR savings on average. Moreover, while octree-based methods face exponential diminution of the number of points at low bitrates, our method still produces high resolution outputs even at low bitrates. Code and supplementary material are available at https://github.com/mauriceqch/pcc_geo_cnn .
Article
We present a method to compress geometry information of point clouds that explores redundancies across consecutive frames of a sequence. It uses octrees and works by progressively increasing the resolution of the octree. At each branch of the tree, we generate an approximation of the child nodes by a number of methods which are used as contexts to drive an arithmetic coder. The best approximation, i.e. the context that yields the least amount of encoding bits, is selected and the chosen method is indicated as side information for replication at the decoder. The core of our method is a context-based arithmetic coder in which a reference octree is used to encode the current octree, thus providing 255 contexts for each output octet. The 255×255 frequency histogram is viewed as a discrete 3D surface and is conveyed to the decoder using another octree. We present two methods to generate the predictions (contexts) which use adjacent frames in the sequence (inter-frame) and one method that works purely intra-frame. The encoder continuously switches the best mode among the three and conveys such information to the decoder. Since an intra-frame prediction is present, our coder can also work in purely intra-frame mode. Extensive results are presented to show the method’s potential against many compression alternatives for the geometry information in dynamic voxelized point clouds.
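The benefit of using a reference octet as context can be estimated from a joint frequency histogram of (reference octet, current octet) pairs. The sketch below computes the empirical conditional entropy, which approximates the bits per octet an arithmetic coder would need if it knew the histogram exactly; it ignores the cost of transmitting the histogram and is not the coder described in the abstract. The function name is hypothetical.

```python
import math
from collections import Counter


def conditional_entropy_bits(reference, current):
    """Average bits per symbol of `current` given the aligned `reference` symbol."""
    joint = Counter(zip(reference, current))  # the (context, symbol) histogram
    ctx = Counter(reference)                  # how often each context occurs
    n = len(current)
    h = 0.0
    for (r, c), count in joint.items():
        # -p(r, c) * log2 p(c | r)
        h -= count / n * math.log2(count / ctx[r])
    return h
```

When the reference octet fully determines the current one the estimate is zero bits; when it carries no information the estimate equals the plain entropy of the current octets.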
Article
The widespread adoption of new 3D sensor and authoring technologies has made it possible to capture 3D scenes and models in real time with decent visual quality. As an example, Microsoft's Kinect and Apple's PrimeSense technology are now being used in a wide variety of interactive 3D mobile applications, including gaming and augmented reality applications. The latest smartphones are equipped with multiple cameras, which can be readily used to generate depth images. Some of the latest smartphones also include depth-ranging sensors that can be used for 3D model generation. Light-based detection and ranging (lidar) technologies are yet another field where 3D depth acquisition is important. Realtime 3D scenery detection and ranging has become an important issue for the emerging field of autonomous navigation and driving applications.
Article
Due to the increased popularity of augmented and virtual reality experiences, the interest in capturing the real world in multiple dimensions and in presenting it to users in an immersive fashion has never been higher. Distributing such representations enables users to freely navigate in multi-sensory 3D media experiences. Unfortunately, such representations require a large amount of data, not feasible for transmission on today’s networks. Efficient compression technologies well adopted in the content chain are in high demand and are key components to democratize augmented and virtual reality applications. The Moving Picture Experts Group, MPEG, as one of the main standardization groups dealing with multimedia, identified the trend and recently started the process of building an open standard for compactly representing 3D point clouds, which are the 3D equivalent of the very well-known 2D pixels. This paper introduces the main developments and technical aspects of this ongoing standardization effort.
Article
Deep Neural Networks trained as image auto-encoders have recently emerged as a promising direction for advancing the state of the art in image compression. The key challenge in learning such networks is twofold: to deal with quantization, and to control the trade-off between reconstruction error (distortion) and entropy (rate) of the latent image representation. In this paper, we focus on the latter challenge and propose a new technique to navigate the rate-distortion trade-off for an image compression auto-encoder. The main idea is to directly model the entropy of the latent representation by using a context model: a 3D-CNN which learns a conditional probability model of the latent distribution of the auto-encoder. During training, the auto-encoder makes use of the context model to estimate the entropy of its representation, and the context model is concurrently updated to learn the dependencies between the symbols in the latent representation. Our experiments show that this approach yields a state-of-the-art image compression system based on a simple convolutional auto-encoder.
Article
PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixel-cnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional short-cut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications.
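A discretized logistic likelihood assigns to each integer level the probability mass that a continuous logistic distribution places on the unit-width bin around it. The sketch below shows a single component on integer levels 0..255, with the edge bins absorbing the tails; a mixture would be a weighted sum of several such components. Working on raw integer levels rather than rescaled inputs is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def _sigmoid(t):
    # Numerically stable logistic CDF.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discretized_logistic_pmf(mu, s, levels=256):
    """Probability of each integer 0..levels-1 under logistic(mu, s)."""
    pmf = []
    for x in range(levels):
        # CDF at the upper and lower bin edges; the outer bins extend to infinity.
        upper = 1.0 if x == levels - 1 else _sigmoid((x + 0.5 - mu) / s)
        lower = 0.0 if x == 0 else _sigmoid((x - 0.5 - mu) / s)
        pmf.append(upper - lower)
    return pmf
```

Compared with a 256-way softmax, the network only has to output a location and a scale per component, and neighbouring levels automatically receive similar probabilities.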
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has little memory requirements and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
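The adaptive moment estimates that Adam relies on can be written as a single update step. The sketch below operates on plain Python lists and uses the commonly quoted default hyper-parameters; the function name and the list-based interface are choices made for illustration.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `t` is the 1-based step count.

    Returns the new parameters and the updated first and second moment estimates.
    """
    # Exponential moving averages of the gradient and its square.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction compensates for the zero initialisation of m and v.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [
        p - lr * mh / (math.sqrt(vh) + eps)
        for p, mh, vh in zip(theta, m_hat, v_hat)
    ]
    return theta, m, v
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr` regardless of the gradient's scale, which illustrates the invariance to gradient rescaling noted in the abstract.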
Microsoft voxelized upper bodies - a voxelized point cloud dataset
  • C Loop
  • Q Cai
  • S O Escolano
  • P A Chou
8i Voxelized Full Bodies - A Voxelized Point Cloud Dataset
  • E Eon
  • B Harrison
  • T Myers
  • P A Chou