Article

A Volumetric Approach to Point Cloud Compression, Part II: Geometry Compression


Abstract

Compression of point clouds has so far been confined to coding the positions of a discrete set of points in space and the attributes of those discrete points. We introduce an alternative approach based on volumetric functions, which are functions defined not just on a finite set of points, but throughout space. As in regression analysis, volumetric functions are continuous functions that are able to interpolate values on a finite set of points as linear combinations of continuous basis functions. Using a B-spline wavelet basis, we are able to code volumetric functions representing both geometry and attributes. Attribute compression is addressed in Part I of this paper, while geometry compression is addressed in Part II. Geometry is represented implicitly as the level set of a volumetric function (the signed distance function or similar). Experimental results show that geometry compression using volumetric functions improves over the methods used in the emerging MPEG Point Cloud Compression (G-PCC) standard.
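To make the implicit, level-set view of geometry concrete, here is a minimal sketch (not the authors' codec, which uses a B-spline wavelet basis) that builds a truncated distance field on a voxel grid from a point set and recovers an occupancy approximation by thresholding near the zero level set. The grid size, truncation value, and the use of an unsigned rather than signed distance are illustrative assumptions.

```python
# Minimal sketch (not the paper's codec): represent point cloud geometry
# implicitly as a distance field on a voxel grid and recover it by
# thresholding near the zero level set. All parameters are illustrative.
import numpy as np
from scipy.ndimage import distance_transform_edt

def points_to_distance_field(points, grid_size=64):
    """Voxelize points into a grid_size^3 grid and return a truncated
    unsigned distance field (a crude stand-in for a signed distance field)."""
    pts = np.asarray(points, dtype=np.float64)
    lo, hi = pts.min(0), pts.max(0)
    scale = (grid_size - 1) / np.maximum(hi - lo, 1e-9)
    idx = np.round((pts - lo) * scale).astype(int)
    occ = np.zeros((grid_size,) * 3, dtype=bool)
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    # Distance (in voxels) from every voxel to the nearest occupied voxel.
    dist = distance_transform_edt(~occ)
    return np.minimum(dist, 4.0)            # truncate far values

def distance_field_to_points(dist, level=0.5):
    """Decode: voxels whose field value is at or below `level` approximate
    the level set, i.e., the reconstructed geometry."""
    return np.argwhere(dist <= level)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sphere = rng.normal(size=(5000, 3))
    sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)  # points on a unit sphere
    field = points_to_distance_field(sphere)
    recon = distance_field_to_points(field)
    print("occupied voxels reconstructed:", len(recon))
```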


... As a result, octree-based lossy compression tends to produce "blocky" results at the rendering stage at medium to low bitrates. In order to partially attenuate this issue, [104] proposes to use wavelet transforms and volumetric functions to compact the energy of the point cloud signal. However, since they still employ an octree representation, their method exhibits rapid geometry degradation at lower bitrates. ...
... However, lossy compression using octrees alone has poor performance as pruning octree levels decreases the number of points exponentially resulting in significant distortion. To alleviate this issue, many solutions have been proposed such as triangle [58] surface models, planar [59] surface models, graph-based enhancement layers [52] and volumetric functions [104]. The core idea is that by encoding approximations along a coarse octree, we can alleviate the shortcomings of the octree structure. ...
... This chapter is at the crossroads of static point cloud attribute compression and deep representation learning of 3D data. Compressing static point cloud attributes has been explored using graph transforms [195], the Region-Adaptive Hierarchical Transform (RAHT) [54] and volumetric functions [104]. Graph transforms take advantage of the Graph Fourier Transform (GFT) and the neighborhood structure present in the 3D space to compress point cloud attributes. ...
Thesis
Point clouds are becoming essential in key applications with advances in capture technologies leading to large volumes of data. Compression is thus essential for storage and transmission. Point cloud compression can be divided into two parts: geometry and attribute compression. In addition, point cloud quality assessment is necessary in order to evaluate point cloud compression methods. Geometry compression, attribute compression and quality assessment form the three main parts of this dissertation. The common challenge across these three problems is the sparsity and irregularity of point clouds. Indeed, while other modalities such as images lie on a regular grid, point cloud geometry can be considered as a sparse binary signal over 3D space and attributes are defined on the geometry, which can be both sparse and irregular. First, the state of the art for geometry and attribute compression methods, with a focus on deep learning-based approaches, is reviewed. The challenges faced when compressing geometry and attributes are considered, with an analysis of the current approaches to address them, their limitations and the relations between deep learning-based and traditional methods. We present our work on geometry compression: a convolutional lossy geometry compression approach with a study on the key performance factors of such methods, and a generative model for lossless geometry compression with a multiscale variant addressing its complexity issues. Then, we present a folding-based approach for attribute compression that learns a mapping from the point cloud to a 2D grid in order to reduce point cloud attribute compression to an image compression problem. Furthermore, we propose a differentiable deep perceptual quality metric that can be used to train lossy point cloud geometry compression networks while being well correlated with perceived visual quality, and a convolutional neural network for point cloud quality assessment based on a patch extraction approach. Finally, we conclude the dissertation and discuss open questions in point cloud compression, existing solutions and perspectives. We highlight the links between existing point cloud compression research and relevant problems in adjacent fields, such as rendering in computer graphics, mesh compression and point cloud quality assessment.
... Attributes typically include color components, e.g., RGB, but may alternatively include reflectance, normals, transparency, density, spherical harmonics, and so forth. Commonly (Zhang et al., 2014; Cohen et al., 2016; de Queiroz and Chou, 2016; Thanou et al., 2016; de Queiroz and Chou, 2017; Pavez et al., 2018; Schwarz et al., 2019; Chou et al., 2020; Krivokuća et al., 2020), point cloud compression is broken into two steps: compression of the point cloud positions, called the geometry, and compression of the point cloud attributes. As illustrated in Figure 2, once the decoder decodes the geometry (possibly with loss), the encoder encodes the attributes conditioned on the decoded geometry. ...
... As illustrated in Figure 2, once the decoder decodes the geometry (possibly with loss), the encoder encodes the attributes conditioned on the decoded geometry. In this work, we focus on this second step, namely attribute compression conditioned on the decoded geometry, assuming geometry compression (such as Krivokuća et al., 2020;Tang et al., 2020) in the first step. It is important to note that this conditioning is crucial in achieving good attribute compression. ...
... V-PCC is based on existing video codecs, while G-PCC is based on new, but in many ways classical, geometric approaches. Like previous works (Zhang et al., 2014; Cohen et al., 2016; de Queiroz and Chou, 2016; Thanou et al., 2016; de Queiroz and Chou, 2017; Pavez et al., 2018; Chou et al., 2020; Krivokuća et al., 2020), both V-PCC and G-PCC compress geometry first, then compress attributes conditioned on geometry. Neural networks have been applied with some success to geometry compression (Yan et al., 2019; Quach et al., 2019; Guarda et al., 2019a,b; Guarda et al., 2020; Tang et al., 2020; Quach et al., 2020a; Milani, 2020, 2021; Lazzarotto et al., 2021), but not to lossy attribute compression. ...
Article
Full-text available
We consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks, and representing the function over each block by shifts of a coordinate-based, or implicit, neural network. Inputs to the network include both spatial coordinates and a latent vector per block. We represent the latent vectors using coefficients of the region-adaptive hierarchical transform (RAHT) used in the MPEG geometry-based point cloud codec G-PCC. The coefficients, which are highly compressible, are rate-distortion optimized by back-propagation through a rate-distortion Lagrangian loss in an auto-decoder configuration. The result outperforms the transform in the current standard, RAHT, by 2–4 dB and a recent non-volumetric method, Deep-PCAC, by 2–5 dB at the same bit rate. This is the first work to compress volumetric functions represented by local coordinate-based neural networks. As such, we expect it to be applicable beyond point clouds, for example to compression of high-resolution neural radiance fields.
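A minimal PyTorch sketch of the kind of coordinate-based ("implicit") network the abstract describes: a small MLP that maps a 3D coordinate concatenated with a per-block latent vector to an attribute value, fitted in an auto-decoder fashion. The layer sizes, latent dimension, and training loop are illustrative assumptions, not the paper's architecture; the RAHT coding of the latents and the rate term of the Lagrangian loss are omitted.

```python
# Sketch of a coordinate-based (implicit) attribute network with a latent
# vector per block. Sizes and the training loop are illustrative only.
import torch
import torch.nn as nn

class BlockAttributeNet(nn.Module):
    def __init__(self, latent_dim=16, hidden=64, out_dim=3):  # out_dim=3 for RGB
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, xyz, latent):
        # xyz: (N, 3) local coordinates inside a block; latent: (latent_dim,)
        z = latent.expand(xyz.shape[0], -1)
        return self.mlp(torch.cat([xyz, z], dim=-1))

# Auto-decoder style fit of one block: the latent (and network) are optimized
# by back-propagation; a real codec would add a rate term on the latents.
net = BlockAttributeNet()
latent = torch.zeros(16, requires_grad=True)
xyz = torch.rand(1024, 3)          # decoded point positions within the block
colors = torch.rand(1024, 3)       # target attributes at those points
opt = torch.optim.Adam(list(net.parameters()) + [latent], lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ((net(xyz, latent) - colors) ** 2).mean()   # distortion only
    loss.backward()
    opt.step()
```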
... The most direct method of point cloud compression is to compress the point cloud itself to reduce the amount of data to transmit, including attribute compression, geometry compression, and motion-compensated compression [22], [23]. Many of these approaches focus on static kd-tree-based and octree-based solutions. ...
... Draco realizes lossy compression by reducing quantization bits. [22] uses a B-spline wavelet to represent the geometry and attributes of the compressed point cloud. [25] points out that converting 3D point clouds into 2D maps and using conventional algorithms for compression will lead to the loss of key feature information. ...
Article
Digital twin technology has recently gathered pace in engineering communities as it allows for the convergence of the real structure and its digital counterpart. 3D point cloud data is a more effective way to describe the real world and to reconstruct the digital counterpart than conventional 2D images or 360-degree images. Large-scale, e.g., city-scale, digital twins typically collect point cloud data via internet-of-things (IoT) devices and transmit it over wireless networks. However, existing wireless transmission technology cannot carry real-time point cloud transmission for digital twin reconstruction due to massive data volumes, high processing overheads, and low delay tolerance. We propose a novel artificial intelligence (AI) powered end-to-end framework, termed AIRec, for efficient digital twin communication spanning point cloud compression, wireless channel coding, and digital twin reconstruction. AIRec adopts an encoder-decoder architecture. In the encoder, a novel importance-aware pooling scheme is designed to adaptively select important points with learnable thresholds to reduce the transmission volume. We also design a novel noise-aware joint source and channel coding scheme that adaptively adjusts the transmission strategy based on SNR and maps the features to error-resilient channel symbols for wireless transmission, achieving a good tradeoff between transmission rate and reconstruction quality. The decoder can accurately reconstruct the digital twins from the received symbols. Extensive experiments on typical datasets and comparisons with baselines show that we achieve good reconstruction quality under a 24× compression ratio.
... Apart from its positional data, each point can also be associated with extra information including colors, normals, etc. Compression of 3D PCs has received significant attention during the past few years. Many approaches have been proposed in the literature to support efficient compression of PCs, including geometry coding [11] (such as octree [12], [13], kd-tree [14], spinning tree [15], quadtree and binary-tree approaches [16] etc.), attribute coding [17] (such as Graph Fourier Transform (GFT) [18], Karhunen-Loève transform (KLT) [19], Region-Adaptive Haar Transform (RAHT) [20], [21], structured dictionary learning [22], etc.), or a combination of both [23]. The current SoA belongs to the Moving Picture Experts Group (MPEG). ...
... In order to further highlight geometrical significance in neighborhoods around the most salient points previously detected, we extract features of high curvature values from such neighbourhoods. To be more specific, we exploit the most salient vertices of Eq. (11), which are denoted by $v_i \in P_s : s_{11i} > s_0$, where $P_s \subseteq P_v$. The threshold value of $s_0$ is set such that only the largest saliency values are preserved. ...
Preprint
Full-text available
The increasing demand for accurate representations of 3D scenes, combined with immersive technologies, has led to the widespread popularity of point clouds. However, quality point clouds require a large amount of data, and therefore compression methods are imperative. In this paper, we present a novel, geometry-based, end-to-end compression scheme that combines information on the geometrical features of the point cloud and the user's position, achieving remarkable results for aggressive compression schemes demanding very small bit rates. After separating visible and non-visible points, four saliency maps are calculated, utilizing the point cloud's geometry and distance from the user, the visibility information, and the user's focus point. A combination of these maps results in a final saliency map, indicating the overall significance of each point and therefore allowing different regions to be quantized with a different number of bits during the encoding process. The decoder reconstructs the point cloud making use of delta coordinates and solving a sparse linear system. Evaluation studies and comparisons with the geometry-based point cloud compression (G-PCC) algorithm by the Moving Picture Experts Group (MPEG), carried out for a variety of point clouds, demonstrate that the proposed method achieves significantly better results for small bit rates.
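To make the decoder step concrete, here is a small, hedged sketch of reconstruction from delta (graph-Laplacian) coordinates by solving a sparse least-squares system, in the spirit of the description above. The k-nearest-neighbor graph, the anchor handling, and all parameters are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch: reconstruct positions from delta (graph-Laplacian) coordinates by
# solving a sparse least-squares system, with a few anchor points pinned.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr
from scipy.spatial import cKDTree

def build_laplacian(points, k=8):
    tree = cKDTree(points)
    _, nbrs = tree.query(points, k=k + 1)   # first neighbor is the point itself
    n = len(points)
    rows, cols, vals = [], [], []
    for i in range(n):
        rows.append(i); cols.append(i); vals.append(1.0)
        for j in nbrs[i, 1:]:
            rows.append(i); cols.append(int(j)); vals.append(-1.0 / k)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

rng = np.random.default_rng(0)
pts = rng.random((2000, 3))
L = build_laplacian(pts)
delta = L @ pts                              # delta coordinates (what would be coded)

# Pin a few anchor points so the system has a unique least-squares solution.
anchors = np.arange(0, len(pts), 200)
S = sp.csr_matrix((np.ones(len(anchors)), (np.arange(len(anchors)), anchors)),
                  shape=(len(anchors), len(pts)))
A = sp.vstack([L, S])
b = np.vstack([delta, pts[anchors]])
recon = np.column_stack([lsqr(A, b[:, d])[0] for d in range(3)])
print("max reconstruction error:", np.abs(recon - pts).max())
```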
... Wavelet-based downsampling is common for compressing mesh vertices [44], [45], [46]. When a mesh is not readily available, connectivity can be introduced by building a graph [47], local graphs [48], [49], or a resampled signed distance field [50] from the particles. Instead, we use a regular grid, which is simple and fast to compute. ...
... [Figure caption residue] Rate-distortion plots for the dancer dataset (3,130,215 particles) show that odd-even context coding significantly outperforms truncated binary coding; the reconstruction panels report 49.85 dB and 50.39 dB. ...
Article
Full-text available
Scientific simulations and observations using particles have been creating large datasets that require effective and efficient data reduction to store, transfer, and analyze. However, current approaches either compress only small data well while being inefficient for large data, or handle large data but with insufficient compression. Toward effective and scalable compression/decompression of particle positions, we introduce new kinds of particle hierarchies and corresponding traversal orders that quickly reduce reconstruction error while being fast and low in memory footprint. Our solution to compression of large-scale particle data is a flexible block-based hierarchy that supports progressive, random-access, and error-driven decoding, where error estimation heuristics can be supplied by the user. For low-level node encoding, we introduce new schemes that effectively compress both uniform and densely structured particle distributions.
... The errors measured between the original and the decompressed point cloud using the point-to-point (P2P) and point-to-plane (P2Pl) metrics do not depend on the size and characteristics of the image [9]. The bi-Akima algorithm compresses the PCD image by scanning layer-by-layer to eliminate data redundancy in the original dense image [26]. ...
... The final calculated CDCT of subspaces x, y, and z is expressed by the following Eqs. (8), (9), and (10). ...
Article
Full-text available
A novel and efficient block-wise decomposition-based codec (BDC) for three-dimensional (3D) light detection and ranging (LiDAR) point cloud (PCD) images (BDCPCD) is introduced in this paper. The raw LiDAR data is cleansed and normalized by applying the axis outlier detection and circular differential cosine transformation methods, respectively. Then, an iterative dimensionality reduction approach is used to decompose and quantize the tensor-structured signal data through block-wise singular value decomposition and signal block vectorization, respectively. The final single-order tensor is treated as a compressed bitstream for efficient transmission. The proposed BDCPCD is applied to three different dense 3D LiDAR PCD data sets. The results demonstrate that it outperforms four existing well-known compression techniques: WinRAR, 7-Zip, Tensor Tucker decomposition, and the Random Sample Consensus (RANSAC) point cloud compression algorithm. This iterative compression algorithm reduces the number of tensor blocks by 66.66% in each iteration. This research shows that BDCPCD compresses 3D LiDAR PCD spatial data of different sizes down to six bytes and increases the quality of the decompressed image by an average of 1.6 decibels over the existing Tucker-based algorithm.
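As a generic illustration of the block-wise SVD step mentioned above (not the full BDCPCD pipeline, which also includes outlier cleansing, a circular differential cosine transform, and block vectorization), the sketch below truncates each block of a synthetic LiDAR range image to a low-rank SVD approximation. The block width, rank, and synthetic data are assumptions.

```python
# Generic sketch of block-wise SVD truncation (not the full BDCPCD pipeline):
# split a 2D array into column blocks, keep the top-r singular values per
# block, and measure the reconstruction error. Sizes are illustrative.
import numpy as np

def blockwise_svd_compress(image, block_cols=64, rank=4):
    """Split a 2D array into column blocks and keep rank-`rank` SVD factors."""
    recon = np.empty_like(image)
    factors = []
    for c in range(0, image.shape[1], block_cols):
        block = image[:, c:c + block_cols]
        U, s, Vt = np.linalg.svd(block, full_matrices=False)
        r = min(rank, len(s))
        factors.append((U[:, :r], s[:r], Vt[:r]))          # compressed representation
        recon[:, c:c + block_cols] = (U[:, :r] * s[:r]) @ Vt[:r]
    return factors, recon

# Synthetic stand-in for a LiDAR range image (rows: laser channels, cols: azimuth).
rows, cols = np.mgrid[0:64, 0:1024]
ranges = 10 + np.sin(cols / 40.0) + 0.05 * rows
_, approx = blockwise_svd_compress(ranges)
print("mean absolute range error:", np.abs(ranges - approx).mean())
```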
... Attributes typically include color components, e.g., RGB, but may alternatively include reflectance, normals, transparency, density, spherical harmonics, and so forth. Commonly (e.g., [16,18,20,21,39,58,69,82,91]), point cloud compression is broken into two steps: compression of the point cloud positions, called the geometry, and compression of the point cloud attributes. Compression of the attributes is conditioned on the decoded geometry, as illustrated in Fig. 2. It is important to note that this conditioning is crucial in achieving good compression. ...
... V-PCC is based on existing video codecs, while G-PCC is based on new, but in many ways classical, geometric approaches. Like previous works [16,18,20,21,39,58,82,91], both V-PCC and G-PCC compress geometry first, then compress attributes conditioned on geometry. Neural networks have been applied with some success to geometry compression [25-27, 41, 50, 51, 61, 63, 81, 88], but not to attribute compression. ...
Preprint
We consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks, and representing the function over each block by shifts of a coordinate-based, or implicit, neural network. Inputs to the network include both spatial coordinates and a latent vector per block. We represent the latent vectors using coefficients of the region-adaptive hierarchical transform (RAHT) used in the MPEG geometry-based point cloud codec G-PCC. The coefficients, which are highly compressible, are rate-distortion optimized by back-propagation through a rate-distortion Lagrangian loss in an auto-decoder configuration. The result outperforms RAHT by 2--4 dB. This is the first work to compress volumetric functions represented by local coordinate-based neural networks. As such, we expect it to be applicable beyond point clouds, for example to compression of high-resolution neural radiance fields.
... Only representative patches and their positions are encoded and transmitted. A two-part paper, [17] and [18], describes a volumetric approach to PCC, where a continuous volumetric B-spline function is fitted to the PC data to define a surface. The function coefficients are then quantized and transmitted. ...
Article
Full-text available
The rapid growth in the amount of generated 3D data, particularly in the form of Light Detection And Ranging (LiDAR) point clouds (PCs), poses very significant challenges in terms of data storage, transmission, and processing. Point cloud (PC) representation of 3D visual information has been shown to be a very flexible format with many applications ranging from multimedia immersive communication to machine vision tasks in the robotics and autonomous driving domains. In this paper, we investigate the performance of four reference 3D object detection techniques when the input PCs are compressed with varying levels of degradation. Compression is performed using two MPEG standard coders based on 2D projections and octree decomposition, as well as two coding methods based on Deep Learning (DL). For the DL coding methods, we used a Joint Photographic Experts Group (JPEG) reference PC coder, which we adapted to accept LiDAR PCs in both Cartesian and cylindrical coordinate systems. The detection performance of the four reference 3D object detection methods was evaluated using both pre-trained models and models specifically trained using degraded PCs reconstructed from compressed representations. It is shown that LiDAR PCs can be compressed down to 6 bits per point with no significant degradation in object detection precision. Furthermore, employing specifically trained detection models improves the detection capabilities even at compression rates as low as 2 bits per point. These results show that LiDAR PCs can be coded to enable efficient storage and transmission, without significant object detection performance loss.
... The most intuitive method to reduce the volume of transmitted data is by compressing the point cloud video. This can be achieved through geometry compression, attribute compression, and motion-compensated compression [45], [46]. The former two approaches utilize octree or kd-tree structures to minimize spatial redundancy, and existing techniques that use these methods include PCL [42], Draco [5], G-PCC [7], GROOT [4], and deep learning-based codec [47], [48]. ...
Article
Full-text available
In the metaverse era, point cloud video (PCV) streaming on mobile XR devices is pivotal. While most current methods focus on PCV compression extended from traditional 3-DoF video services, emerging AI techniques extract vital semantic information, producing content resembling the original. However, these are early-stage and computationally intensive. To enhance the inference efficacy of AI-based approaches, accommodate dynamic environments, and facilitate applicability to metaverse XR devices, we present ISCom, an interest-aware semantic communication scheme for lightweight PCV streaming. ISCom features a region-of-interest (ROI) selection module, a lightweight encoder-decoder training module, and a learning-based scheduler to achieve real-time PCV decoding and rendering on resource-constrained devices. ISCom's dual-stage ROI selection significantly reduces data volume according to real-time interest. The lightweight PCV encoder-decoder training is tailored to resource-constrained devices and adapts to the heterogeneous computing capabilities of devices. Furthermore, we provide a deep reinforcement learning (DRL)-based scheduler to adaptively select the optimal encoder-decoder model for various devices, considering the dynamic network environments and device computing capabilities. Our extensive experiments demonstrate that ISCom outperforms baselines on mobile devices, achieving a minimum rendering frame rate improvement of 10 FPS and up to 22 FPS. Furthermore, our method significantly reduces memory usage by 41.7% compared to the state-of-the-art AITransfer method. These results highlight the effectiveness of ISCom in enabling lightweight PCV streaming and its potential to improve immersive experiences for emerging metaverse applications.
... Due to extremely high data volumes of uncompressed PCs, point cloud compression (PCC) is necessary to reduce both storage requirements and the amount of data delivered through networks. Initial studies started with compression of static 3D objects [17], [21], [27]. However, recent work focused on dynamic scenarios [35]. ...
Article
Full-text available
Point cloud streaming has recently attracted research attention as it has the potential to provide six degrees of freedom movement, which is essential for truly immersive media. The transmission of point clouds requires high-bandwidth connections, and adaptive streaming is a promising solution to cope with fluctuating bandwidth conditions. Thus, understanding the impact of different factors in adaptive streaming on the Quality of Experience (QoE) becomes fundamental. Point clouds have been evaluated in Virtual Reality (VR), where viewers are completely immersed in a virtual environment. Augmented Reality (AR) is a novel technology and has recently become popular, yet quality evaluations of point clouds in AR environments are still limited to static images. In this paper, we perform a subjective study of four impact factors on the QoE of point cloud video sequences in AR conditions, including encoding parameters (quantization parameters, QPs), quality switches, viewing distance, and content characteristics. The experimental results show that these factors significantly impact the QoE. The QoE decreases if the sequence is encoded at high QPs and/or switches to lower quality and/or is viewed at a shorter distance, and vice versa. Additionally, the results indicate that the end user is not able to distinguish the quality differences between two quality levels at a specific (high) viewing distance. An intermediate-quality point cloud encoded at geometry QP (G-QP) 24 and texture QP (T-QP) 32 and viewed at 2.5 m can have a QoE (i.e., score 6.5 out of 10) comparable to a high-quality point cloud encoded at 16 and 22 for G-QP and T-QP, respectively, and viewed at a distance of 5 m. Regarding content characteristics, objects with lower contrast can yield better quality scores. Participants' responses reveal that the visual quality of point clouds has not yet reached the desired level of immersion. The average QoE of the highest visual quality is less than 8 out of 10. There is also a good correlation between objective metrics (e.g., color Peak Signal-to-Noise Ratio (PSNR) and geometry PSNR) and the QoE score. In particular, the Pearson correlation coefficient of color PSNR is 0.84. Finally, we found that machine learning models are able to accurately predict the QoE of point clouds in AR environments. The subjective test results and questionnaire responses are available on GitHub: https://github.com/minhkstn/QoE-and-Immersion-of-Dynamic-Point-Cloud.
... For instance, Oliveira et al. [21] employed a graph-based transform for the enhancement layer and an octree-based approach for the base layer. Furthermore, Zhu et al. exploited region similarity [22] and view-dependent projection [23], while Krivokuća et al. [24] introduced volumetric functions for geometry compression. In inter-frame compression, various methods for 3D motion compensation [25], [26] and context-based arithmetic coding [27] have also been investigated. ...
Preprint
The emergence of digital avatars has prompted an exponential increase in the demand for human point clouds with realistic and intricate details. The compression of such data becomes challenging with overwhelming data amounts comprising millions of points. Herein, we leverage the human geometric prior in geometry redundancy removal of point clouds, greatly promoting the compression performance. More specifically, the prior provides topological constraints as geometry initialization, allowing adaptive adjustments with a compact parameter set that could be represented with only a few bits. Therefore, we can envisage high-resolution human point clouds as a combination of geometric priors and structural deviations. The priors could first be derived with an aligned point cloud, and subsequently the difference of features is compressed into a compact latent code. The proposed framework can operate in a plug-and-play fashion with existing learning-based point cloud compression methods. Extensive experimental results show that our approach significantly improves the compression performance without deteriorating the quality, demonstrating its promise in a variety of applications.
... Point cloud geometry and attribute compression respectively concern compression of the point positions and attributes, with or without loss. Although the geometry and attribute compression may be done jointly (e.g., [1]), the dominant approach in both recent research [2]-[8] and in the MPEG geometry-based point cloud compression (G-PCC) standard [9]-[12] is to compress the geometry first and then to compress the attributes conditioned on the geometry. Thus, even if the geometry is lossy, the problem of attribute compression reduces to the case where the decoded geometry is known at both the attribute encoder and at the attribute decoder. ...
Preprint
Full-text available
We study 3D point cloud attribute compression using a volumetric approach: given a target volumetric attribute function $f : \mathbb{R}^3 \rightarrow \mathbb{R}$, we quantize and encode parameter vector $\theta$ that characterizes $f$ at the encoder, for reconstruction $f_{\hat{\theta}}(\mathbf{x})$ at known 3D points $\mathbf{x}$ at the decoder. Extending a previous work, Region Adaptive Hierarchical Transform (RAHT), that employs piecewise constant functions to span a nested sequence of function spaces, we propose a feedforward linear network that implements higher-order B-spline bases spanning function spaces without eigen-decomposition. Feedforward network architecture means that the system is amenable to end-to-end neural learning. The key to our network is space-varying convolution, similar to a graph operator, whose weights are computed from the known 3D geometry for normalization. We show that the number of layers in the normalization at the encoder is equivalent to the number of terms in a matrix inverse Taylor series. Experimental results on real-world 3D point clouds show up to 2-3 dB gain over RAHT in energy compaction and 20-30% bitrate reduction.
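As background for the "matrix inverse Taylor series" remark, here is a small numerical sketch of the underlying identity, the Neumann series $A^{-1} = \sum_{k \ge 0} (I - A)^k$ (valid when the spectral radius of $I - A$ is below one), where each additional term plays the role of one additional normalization layer. The matrix used is an arbitrary illustration, not the paper's operator.

```python
# Sketch: approximate a matrix inverse with a truncated Neumann (Taylor) series,
#   A^{-1} ~ sum_{k=0}^{K} (I - A)^k,   valid when spectral radius of (I - A) < 1.
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # well-conditioned, near identity
I = np.eye(n)

approx, power = np.zeros((n, n)), np.eye(n)
for k in range(12):                                  # 12 "layers" / series terms
    approx += power                                  # add (I - A)^k
    power = power @ (I - A)
    err = np.linalg.norm(approx @ A - I)
    print(f"terms={k + 1:2d}  ||approx @ A - I|| = {err:.2e}")
```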
... Currently, there are two types of point cloud compression frameworks, i.e., geometry-based point cloud compression (G-PCC) and video-based point cloud compression (V-PCC) [8]. In G-PCC, geometry coding is performed by octree coding, and lossy geometry coding is primarily enabled by grouping the leaf nodes (or voxels) into blocks at a higher level or directly using position scaling [9]. Thus, bit rate allocation between geometry and color in G-PCC can be recast as a search over the octree level for geometry and the QP for color. ...
Preprint
As one of the main representation formats of the 3D real world, well suited for virtual reality and augmented reality applications, point clouds have gained a lot of popularity. In order to reduce the huge amount of data, a considerable amount of research on point cloud compression has been done. However, given a target bit rate, how to properly choose the color and geometry quantization parameters for compressing point clouds is still an open issue. In this paper, we propose a rate-distortion model based quantization parameter selection scheme for bit rate constrained point cloud compression. Firstly, to overcome the measurement uncertainty in evaluating the distortion of the point clouds, we propose a unified model to combine the geometry distortion and color distortion. In this model, we take into account the correlation between geometry and color variables of point clouds and derive a dimensionless quantity to represent the overall quality degradation. Then, we derive the relationships of overall distortion and bit rate with the quantization parameters. Finally, we formulate the bit rate constrained point cloud compression as a constrained minimization problem using the derived polynomial models and deduce the solution via an iterative numerical method. Experimental results show that the proposed algorithm can achieve optimal decoded point cloud quality at various target bit rates, and substantially outperform the video-rate-distortion model based point cloud compression scheme.
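To illustrate the general shape of such a constrained search, the following sketch picks the geometry/color quantization-parameter pair that minimizes modeled distortion subject to a bit-rate budget. The rate and distortion models here are made-up toy functions, not the polynomial models derived in the paper, and the exhaustive sweep stands in for the paper's iterative numerical method.

```python
# Generic illustration of rate-constrained QP selection (not the paper's models):
# evaluate simple, made-up rate/distortion models over candidate geometry/color
# quantization parameters and keep the best pair under the rate budget.
import itertools

def rate_model(qp_geom, qp_color):        # toy model: rate decreases with QP
    return 4.0 * 2.0 ** (-qp_geom / 6.0) + 6.0 * 2.0 ** (-qp_color / 6.0)

def distortion_model(qp_geom, qp_color):  # toy model: distortion grows with QP
    return 0.02 * qp_geom ** 1.5 + 0.01 * qp_color ** 1.5

def select_qps(rate_budget, geom_qps=range(10, 52, 2), color_qps=range(10, 52, 2)):
    best = None
    for qg, qc in itertools.product(geom_qps, color_qps):
        if rate_model(qg, qc) > rate_budget:
            continue                       # violates the bit-rate constraint
        d = distortion_model(qg, qc)
        if best is None or d < best[0]:
            best = (d, qg, qc)
    return best

print(select_qps(rate_budget=1.0))         # -> (distortion, geometry QP, color QP)
```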
... Existing works stream point cloud video mainly using a combination of compression techniques [4], [6], [28] and adaptive transmission mechanisms [29], [30], [31], [32]. Compressing the point cloud video is the most intuitive way to reduce the transmitted data volume, including geometry compression, attribute compression, and motion-compensated compression [33], [34]. The geometry and attribute-based compression use octree or kd-tree structure to reduce spatial redundancy, such as PCL [28], Draco [4], G-PCC [6], GROOT [3], and deep learning-based codec [7], [8]. ...
Preprint
The provisioning of immersive point cloud video (PCV) streaming on pervasive mobile devices is a cornerstone for enabling immersive communication and interactions in the future 6G metaverse era. However, most streaming techniques are dedicated to efficient PCV compression and codecs extended from traditional 3-DoF video services. Some emerging AI-enabled approaches are still in their infancy and are constrained by intensive computation and adaptive flow techniques. In this paper, we present ISCom, an Interest-aware Semantic Communication Scheme for PCV, consisting of a region-of-interest (ROI) selection module, a lightweight PCV streaming module, and an intelligent scheduler. First, we propose a two-stage efficient ROI selection method for providing interest-aware PCV streaming, which significantly reduces the data volume. Second, we design a lightweight PCV encoder-decoder network for resource-constrained devices, adapting to the heterogeneous computing capabilities of terminals. Third, we train a deep reinforcement learning (DRL)-based scheduler to select an optimal encoder-decoder network for various devices, considering the dynamic network environments and computing capabilities of different devices. Extensive experiments show that ISCom outperforms baselines on mobile devices by at least 10 FPS and up to 22 FPS.
... With the increasing capability of 3D acquisition devices, how to efficiently compress 3D point clouds becomes critical. Currently, PCG compression methods mainly focus on octree-based solutions [8,14,16,24,35], 3D convolutional autoencoders [22,23,27,28] and PointNet-based autoencoders [29,31,32]. ...
Preprint
Full-text available
The point cloud is a crucial representation of 3D content, which has been widely used in many areas such as virtual reality, mixed reality, autonomous driving, etc. As the number of points in the data grows, how to efficiently compress point clouds becomes a challenging problem. In this paper, we propose a set of significant improvements to patch-based point cloud compression, i.e., a learnable context model for entropy coding, octree coding for sampling centroid points, and an integrated compression and training process. In addition, we propose an adversarial network to improve the uniformity of points during reconstruction. Our experiments show that the improved patch-based autoencoder outperforms the state of the art in terms of rate-distortion performance on both sparse and large-scale point clouds. More importantly, our method can maintain a short compression time while ensuring the reconstruction quality.
... Prior non-deep learning-based point cloud geometry compression mostly includes octree-based, triangle mesh-based, and 3D-to-2D projection-based methodologies. Octree-based methods are the most widely used point cloud encoding methods [19]- [21]. Octree provides an efficient way to partition the 3D space to represent point clouds and is especially suitable for lossless coding. ...
Preprint
Full-text available
Efficient point cloud compression is essential for applications like virtual and mixed reality, autonomous driving, and cultural heritage. In this paper, we propose a deep learning-based inter-frame encoding scheme for dynamic point cloud geometry compression. We propose a lossy geometry compression scheme that predicts the latent representation of the current frame using the previous frame by employing a novel prediction network. Our proposed network utilizes sparse convolutions with hierarchical multiscale 3D feature learning to encode the current frame using the previous frame. We employ convolution on target coordinates to map the latent representation of the previous frame to the downsampled coordinates of the current frame to predict the current frame's feature embedding. Our framework transmits the residual of the predicted features and the actual features by compressing them using a learned probabilistic factorized entropy model. At the receiver, the decoder hierarchically reconstructs the current frame by progressively rescaling the feature embedding. We compared our model to the state-of-the-art Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC) schemes standardized by the Moving Picture Experts Group (MPEG). Our method achieves more than 91% BD-Rate (Bjøntegaard Delta Rate) reduction against G-PCC, more than 62% BD-Rate reduction against V-PCC intra-frame encoding mode, and more than 52% BD-Rate savings against V-PCC P-frame-based inter-frame encoding mode using HEVC.
... Traditional point-cloud compression algorithms are limited to encoding the positions and attributes of discrete point clouds. In [19], Krivokuća et al. introduced an alternative technique based on volumetric functions. As in regression analysis, a volumetric function is a continuous function that can interpolate the values on a finite set of points as a linear combination of continuous basis functions. ...
Article
Full-text available
In this article, we present an efficient coding scheme for LiDAR point cloud maps. As a point cloud map consists of numerous single scans spliced together, by recording the time stamp and quaternion matrix of each scan during map building, we cast the point cloud map compression into the point cloud sequence compression problem. The coding architecture includes two techniques: intra-coding and inter-coding. For intra-frames, a segmentation-based intra-prediction technique is developed. For inter-frames, an interpolation-based inter-frame coding network is explored to remove temporal redundancy by generating virtual point clouds based on the decoded frames. We only need to code the difference between the original LiDAR data and the intra/inter-predicted point cloud data. The point cloud map can be reconstructed according to the decoded point cloud sequence and quaternion matrices. Experiments on the KITTI dataset show that the proposed coding scheme can largely eliminate the temporal and spatial redundancies. The point cloud map can be encoded to 1/24 of its original size with 2 mm-level precision. Our algorithm also obtains better coding performance compared with the octree and Google Draco algorithms.
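To make the map-reconstruction step described above concrete, here is a minimal sketch that splices decoded scans back into a map by applying each scan's stored quaternion and translation. The (x, y, z, w) quaternion convention, the data layout, and the function name are illustrative assumptions, not the paper's implementation.

```python
# Sketch of reassembling a point cloud map from decoded scans plus per-scan
# poses (quaternion + translation). The (x, y, z, w) quaternion convention
# and the data layout are illustrative assumptions.
import numpy as np
from scipy.spatial.transform import Rotation

def assemble_map(scans, quaternions, translations):
    """scans: list of (N_i, 3) arrays; quaternions: (M, 4) in (x, y, z, w);
    translations: (M, 3). Returns the spliced (sum N_i, 3) map."""
    pieces = []
    for scan, q, t in zip(scans, quaternions, translations):
        R = Rotation.from_quat(q).as_matrix()   # 3x3 rotation matrix
        pieces.append(scan @ R.T + t)           # rotate, then translate
    return np.concatenate(pieces, axis=0)

rng = np.random.default_rng(0)
scans = [rng.random((1000, 3)) for _ in range(3)]
quats = np.tile([0.0, 0.0, 0.0, 1.0], (3, 1))   # identity rotations for the demo
trans = np.array([[0, 0, 0], [5, 0, 0], [10, 0, 0]], dtype=float)
world_map = assemble_map(scans, quats, trans)
print(world_map.shape)
```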
... We show that for a given variation operator M and vertex partition for downsampling, there exists a unique inner product matrix Q such that the (M, Q)-GFT obeys the spectral folding property. Based on this result, we propose perfect reconstruction, orthogonal and biorthogonal filter banks using spectral graph filters of the fundamental matrix $Z = Q^{-1}M$. Our conditions in the graph frequency domain are exactly those developed in [10], [11] for the normalized Laplacian of bipartite graphs, and therefore we can reuse any of the previously proposed filter designs, including those in [10], [11] as well as more recent improved designs, such as those in [12], [13]. (The use of non-traditional Hilbert spaces has proven effective in various theoretical studies involving irregularly structured data, for example graph sparsification [33], semi-supervised and unsupervised learning [34], compression of 3D point clouds [35], [36], and graph signal sampling [37].) When the variation operator M is the normalized Laplacian and the graph is bipartite, we recover the BFB framework. ...
Preprint
Full-text available
We study the design of filter banks for signals defined on the nodes of graphs. We propose novel two channel filter banks, that can be applied to arbitrary graphs, given a positive semi definite variation operator, while using downsampling operators on arbitrary vertex partitions. The proposed filter banks also satisfy several desirable properties, including perfect reconstruction, and critical sampling, while having efficient implementations. Our results generalize previous approaches only valid for the normalized Laplacian of bipartite graphs. We consider graph Fourier transforms (GFTs) given by the generalized eigenvectors of the variation operator. This GFT basis is orthogonal in an alternative inner product space, which depends on the choices of downsampling sets and variation operators. We show that the spectral folding property of the normalized Laplacian of bipartite graphs, at the core of bipartite filter bank theory, can be generalized for the proposed GFT if the inner product matrix is chosen properly. We give a probabilistic interpretation to the proposed filter banks using Gaussian graphical models. We also study orthogonality properties of tree structured filter banks, and propose a vertex partition algorithm for downsampling. We show that the proposed filter banks can be implemented efficiently on 3D point clouds, with hundreds of thousands of points (nodes), while also improving the color signal representation quality over competing state of the art approaches.
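As a small numerical companion to the GFT construction described above, the sketch below computes the generalized eigenvectors of a graph Laplacian M with respect to a positive definite matrix Q and checks that they are orthonormal in the Q inner product. The tiny random graph and the choice of Q as a diagonal matrix are illustrative assumptions, not the paper's construction of Q from a downsampling set.

```python
# Sketch: a GFT from generalized eigenvectors of a variation operator M with
# respect to an inner-product matrix Q (M u = lambda Q u). The eigenvectors
# returned by scipy are Q-orthonormal, i.e., U.T @ Q @ U = I.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 12
W = rng.random((n, n))
W = np.triu(W, 1)
W = W + W.T                                                    # random weighted graph
M = np.diag(W.sum(axis=1)) - W                                 # combinatorial Laplacian
Q = np.diag(1.0 + rng.random(n))                               # positive definite inner product

eigvals, U = eigh(M, Q)                                        # generalized eigenproblem
print("Q-orthonormal:", np.allclose(U.T @ Q @ U, np.eye(n)))   # True

signal = rng.random(n)
coeffs = U.T @ Q @ signal                                      # analysis in the (M, Q)-GFT
print("perfect reconstruction:", np.allclose(U @ coeffs, signal))
```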
... Wavelet-based downsampling is common for compressing mesh vertices [32,44,73]. When a mesh is not readily available, connectivity can be introduced by building a graph [11], local graphs [65,77], or a resampled signed distance field [35] from the particles. Instead, we use a regular grid, which is simple and fast to compute. ...
Conference Paper
Particle representations are used often in large-scale simulations and observations, frequently creating datasets containing several millions of particles or more. Due to their sheer size, such datasets are difficult to store, transfer, and analyze efficiently. Data compression is a promising solution; however, effective approaches to compress particle data are lacking and no community-standard and accepted techniques exist. Current techniques are designed either to compress small data very well but require high computational resources when applied to large data, or to work with large data but without a focus on compression, resulting in low reconstruction quality per bit stored. In this paper, we present innovations targeting tree-based particle compression approaches that improve the tradeoff between high quality and low memory-footprint for compression and decompression of large particle datasets. Inspired by the lazy wavelet transform, we introduce a new way of partitioning space, which allows a low-cost depth-first traversal of a particle hierarchy to cover the space broadly. We also devise novel data-adaptive traversal orders that significantly reduce reconstruction error compared to traditional data-agnostic orders such as breadth-first and depth-first traversals. The new partitioning and traversal schemes are used to build novel particle hierarchies that can be traversed with asymptotically constant memory footprint while incurring low reconstruction error. Our solution to encoding and (lossy) decoding of large particle data is a flexible block-based hierarchy that supports progressive, random-access, and error-driven decoding, where error heuristics can be supplied by the user. Finally, through extensive experimentation, we demonstrate the efficacy and the flexibility of the proposed techniques when combined as well as when used independently with existing approaches on a wide range of scientific particle datasets.
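The following sketch only conveys the flavor of the lazy-wavelet-inspired idea mentioned above: an odd/even split of sorted particles that yields a coarse, broadly covering subset at every level. It is not the paper's actual space partition, hierarchy, or traversal orders; the axis, depth, and sorting criterion are assumptions.

```python
# Lazy-wavelet-style odd/even split of particles: sort along one axis, send
# the "odd" samples to a detail level, and recurse on the "even" samples.
# This illustrates the flavor of the partitioning idea only.
import numpy as np

def lazy_split_levels(points, axis=0, depth=4):
    """Return particle subsets from coarse to fine (coarsest set first)."""
    order = np.argsort(points[:, axis])
    coarse = points[order]
    details = []
    for _ in range(depth):
        details.append(coarse[1::2])        # "odd" samples become detail at this level
        coarse = coarse[::2]                # recurse on the "even" (coarse) samples
    return [coarse] + details[::-1]         # coarsest approximation first, then refinements

rng = np.random.default_rng(0)
particles = rng.random((100_000, 3))
for i, lvl in enumerate(lazy_split_levels(particles)):
    print(f"level {i}: {len(lvl)} particles")
```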
... For that, after computing the optimum solution with our algorithm, we modify it to the next possible level by employing the next lower level of octree coding, which is what an alternative way of specifying this crucial parameter might yield. As the second reference scheme (Ref 2), we use the default scaling-based lossy geometry coding to realize rate-constrained point cloud compression [15,25]. That is, the positions are scaled by a factor, and then rounded to the nearest integers. ...
Article
Full-text available
In geometry-based point cloud compression, the geometry information is typically compressed using octree coding. In octree coding, the size of the blocks in the voxelized point clouds, i.e., the number of voxels contained in a block, determines whether the geometry coding is lossless or lossy, and the degree of geometry compression in lossy coding. Therefore, selecting an appropriate block size for octree coding is crucial for the compression quality of voxelized point clouds. In this paper, we propose an optimal block size selection scheme for geometry-based point cloud compression with a given bit rate constraint. Firstly, we analyze the gradients of the overall quality of the point clouds with respect to the color coding bit rate and the geometry coding bit rate in lossy geometry coding. Then, we propose an octree level selection approach that can output the optimal octree level for point cloud compression under a target bit rate. In this approach, we consider the difference between the impacts of lossy geometry coding and lossless geometry coding on the overall quality of the point clouds. Experimental results demonstrate that using the level selected by the proposed algorithm for geometry coding can yield the best coding results in terms of the average quality of the images rendered from decoded point clouds.
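For reference, the scaling-based lossy geometry coding mentioned as the second reference scheme in the citing context above amounts to little more than the following sketch: scale the voxel positions, round them to integers, drop duplicates, and invert the scale at the decoder. The scale factor here is an arbitrary example, standing in for the choice of pruned octree depth.

```python
# Sketch of scaling-based lossy geometry coding: scale positions, round to
# integers, remove duplicate voxels; decoding simply undoes the scaling.
# The scale factor (effectively the pruned octree depth) is an arbitrary example.
import numpy as np

def encode_geometry(points, scale=1 / 32):
    quantized = np.round(np.asarray(points) * scale).astype(np.int64)
    return np.unique(quantized, axis=0), scale        # duplicates merge into one voxel

def decode_geometry(quantized, scale):
    return quantized.astype(np.float64) / scale

rng = np.random.default_rng(0)
pts = rng.random((50_000, 3)) * 1023                  # stand-in for a 10-bit voxelized cloud
voxels, s = encode_geometry(pts)                      # coarser grid -> fewer points
recon = decode_geometry(voxels, s)
print(f"{len(pts)} points -> {len(voxels)} voxels after scaling by {s}")
```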
... Many other works in the literature in the past decade are based on Octree Representation plus some sort of entropy coding, both focusing on lossless compression [5]- [7] and lossy compression [8]- [10]. A few proposals are not based on Octrees: Milani et al. uses a reversible cellular automata block transform [11], Zhu et al. uses a binary tree partition [12], Filali et al. proposes a vector quantization approach [13], and Krivokuća et al. proposes a method relying on volumetric functions [14]. More recently, following the research in image compression, Deep Learning based algorithms have also been proposed, mostly focusing on lossy compression [15], [16]. ...
Article
Full-text available
Recently we have proposed a coding algorithm of point cloud geometry based on a rather different approach from the popular octree representation. In our algorithm, the point cloud is decomposed into silhouettes, hence the name Silhouette Coder, and context adaptive arithmetic coding is used to exploit redundancies within the point cloud (intra frame coding), and also using a reference point cloud (inter frame coding). In this letter we build on our previous work and propose a context selection algorithm as a pre-processing stage. With this algorithm, the point cloud is first parsed testing a large number of candidate context locations. The algorithm selects a small number of these contexts that better reflect the current point cloud, and then encodes it with this choice. The proposed method further improves the results of our previous coder, Silhouette 4D, by 10%, on average, on a dynamic point cloud dataset from JPEG Pleno, and achieves bitrates competitive with some high-quality lossy coders such as the MPEG G-PCC.
Preprint
Learning-based methods have proven successful in compressing geometric information for point clouds. For attribute compression, however, they still lag behind non-learning-based methods such as the MPEG G-PCC standard. To bridge this gap, we propose a novel deep learning-based point cloud attribute compression method that uses a generative adversarial network (GAN) with sparse convolution layers. Our method also includes a module that adaptively selects the resolution of the voxels used to voxelize the input point cloud. Sparse vectors are used to represent the voxelized point cloud, and sparse convolutions process the sparse tensors, ensuring computational efficiency. To the best of our knowledge, this is the first application of GANs to compress point cloud attributes. Our experimental results show that our method outperforms existing learning-based techniques and rivals the latest G-PCC test model (TMC13v23) in terms of visual quality.
Article
The state-of-the-art G-PCC (geometry-based point cloud compression) (Octree) is the fine-grained approach, which uses the octree to partition point clouds into voxels and predicts them based on neighbor occupancy in narrower spaces. However, G-PCC (Octree) is less effective at compressing dense point clouds than multi-grained approaches (such as G-PCC (Trisoup)), which exploit the continuous point distribution in nodes partitioned by the pruned octree over larger spaces. Therefore, we propose a lossy multi-grained compression with extended octree and dual-model prediction. The extended octree, where each partitioned node contains intra-block and extra-block points, is applied to address poor prediction (such as overfitting) at the node edges of the octree partition. For the points of each multi-grained node, dual-model prediction fits surfaces and projects residuals onto the surfaces, reducing projection residuals for efficient 2D compression and fitting complexity. In addition, a hybrid DWT-DCT transform for 2D projection residuals mitigates the resolution degradation of DWT and the blocking effect of DCT during high compression. Experimental results demonstrate the superior performance of our method over advanced G-PCC (Octree), achieving BD-rate gains of 55.9% and 45.3% for point-to-point (D1) and point-to-plane (D2) distortions, respectively. Our approach also outperforms G-PCC (Octree) and G-PCC (Trisoup) in subjective evaluation.
Article
The emergence of digital avatars has prompted an exponential increase in the demand for human point clouds with realistic and intricate details. The compression of such data becomes challenging due to massive amounts of data comprising millions of points. Herein, we leverage the human geometric prior in the geometry redundancy removal of point clouds to greatly promote compression performance. More specifically, the prior provides topological constraints as geometry initialization, allowing adaptive adjustments with a compact parameter set that can be represented with only a few bits. Therefore, we propose representing high-resolution human point clouds as a combination of a geometric prior and structural deviations. The prior is first derived with an aligned point cloud. Subsequently, the difference in features is compressed into a compact latent code. The proposed framework can operate in a plug-and-play fashion with existing learning-based point cloud compression methods. Extensive experimental results show that our approach significantly improves the compression performance without deteriorating the quality, demonstrating its promise in serving a variety of applications.
Article
Efficient point cloud compression is essential for applications like virtual and mixed reality, autonomous driving, and cultural heritage. This paper proposes a deep learning-based inter-frame encoding scheme for dynamic point cloud geometry compression. We propose a lossy geometry compression scheme that predicts the latent representation of the current frame using the previous frame by employing a novel feature space inter-prediction network. The proposed network utilizes sparse convolutions with hierarchical multiscale 3D feature learning to encode the current frame using the previous frame. The proposed method introduces a novel predictor network for motion compensation in the feature domain to map the latent representation of the previous frame to the coordinates of the current frame to predict the current frame’s feature embedding. The framework transmits the residual of the predicted features and the actual features by compressing them using a learned probabilistic factorized entropy model. At the receiver, the decoder hierarchically reconstructs the current frame by progressively rescaling the feature embedding. The proposed framework is compared to the state-of-the-art Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC) schemes standardized by the Moving Picture Experts Group (MPEG). The proposed method achieves more than 88% BD-Rate (Bjøntegaard Delta Rate) reduction against G-PCCv20 Octree, more than 56% BD-Rate savings against G-PCCv20 Trisoup, more than 62% BD-Rate reduction against V-PCC intra-frame encoding mode, and more than 52% BD-Rate savings against V-PCC P-frame-based inter-frame encoding mode using HEVC. These significant performance gains are cross-checked and verified in the MPEG working group.
Article
There is a critical requirement for efficiently compressing point cloud geometries representing three-dimensional (3D) moving objects in various applications. The Moving Picture Experts Group 3D Graphics coding group (MPEG 3DG) set up an inter-exploration model for geometry-based point cloud compression (G-PCC interEM). However, the block-matching motion compensation scheme with a translational motion model has limited ability to handle dense point clouds with complex local motions. To overcome this problem, we propose a progressive non-rigid motion compensation framework for point cloud geometry compression, where the point cloud registration technique is introduced and tailored with our designed rate-distortion cost. In the coarse-grained stage, a point cloud is represented as deformable point patches, and the patch-wise non-rigid motion estimation task is formulated as a registration-based optimization problem that can be efficiently solved by the majorization-minimization method. In the fine-grained stage, we propose a block-based motion refinement to enhance the estimated motion field in the local region, followed by a multi-hypothesis motion compensation scheme enabling smooth reference reconstruction with patch-wise deformation and block-wise refined motions. Experiments demonstrate our proposed scheme outperforms several competitive platforms in terms of both coding performance and compensation quality. Compared with G-PCC interEM, our proposed framework achieves significant bitrate savings, i.e., 4.71% (32 frames) and 4.22% (200 frames), for point cloud lossless geometry compression.
Article
Near-lossless compression of point clouds is suitable for application scenarios with low distortion tolerance and certain rate requirements. Near-lossless attribute compression usually adopts a level-of-detail structure, where the dependencies between the layers make it possible to improve the rate-distortion (R-D) performance by using different quantization parameters for different layers. In this work, a theoretical analysis of the dependencies between adjacent layers is carried out, based on which the dependent Distortion-Quantization and Rate-Quantization models are established for point cloud attribute compression. Then an algorithm for quantization parameter cascading based on R-D optimization is proposed and implemented for near-lossless compression of point cloud attributes. The experimental results show that the proposed method has a superior performance gain compared to the state of the art in terms of Hausdorff R-D performance. At the same time, the proposed method improves subjective quality and is well adapted to various categories of point clouds.
Article
We propose novel two-channel filter banks for signals on graphs. Our designs can be applied to arbitrary graphs, given a positive semi definite variation operator, while using arbitrary vertex partitions for downsampling. The proposed generalized filter banks (GFBs) also satisfy several desirable properties including perfect reconstruction and critical sampling, while having efficient implementations. Our results generalize previous approaches that were only valid for the normalized Laplacian of bipartite graphs. Our approach is based on novel graph Fourier transforms (GFTs) given by the generalized eigenvectors of the variation operator. These GFTs are orthogonal in an alternative inner product space which depends on the downsampling and variation operators. Our key theoretical contribution is showing that the spectral folding property of the normalized Laplacian of bipartite graphs, at the core of bipartite filter bank theory, can be generalized for the proposed GFT if the inner product matrix is chosen properly. In addition, we study vertex domain and spectral domain properties of GFBs and illustrate their probabilistic interpretation using Gaussian graphical models. While GFBs can be defined given any choice of a vertex partition for downsampling, we propose an algorithm to optimize these partitions with a criterion that favors balanced partitions with large graph cuts, which are shown to lead to efficient and stable GFB implementations. Our numerical experiments show that partition-optimized GFBs can be implemented efficiently on 3D point clouds with hundreds of thousands of points (nodes), while also improving the color signal representation quality over competing state-of-the-art approaches.
Article
With the growth of Extended Reality (XR) and capturing devices, point cloud representation has become attractive to academics and industry. Point Cloud Compression (PCC) algorithms further promote numerous XR applications that may change our daily life. However, in the literature, PCC algorithms are often evaluated with heterogeneous datasets, metrics, and parameters, making the results hard to interpret. In this article, we propose an open-source benchmark platform called PCC Arena. Our platform is modularized in three aspects: PCC algorithms, point cloud datasets, and performance metrics. Users can easily extend PCC Arena in each aspect to fulfill the requirements of their experiments. To show the effectiveness of PCC Arena, we integrate seven PCC algorithms into PCC Arena along with six point cloud datasets. We then compare the algorithms on ten carefully selected metrics to evaluate the quality of the output point clouds. We further conduct a user study to quantify the user-perceived quality of rendered images that are produced by different PCC algorithms. Several novel insights are revealed in our comparison: (i) Signal Processing (SP)-based PCC algorithms are stable for different usage scenarios, but the trade-offs between coding efficiency and quality should be carefully addressed, (ii) Neural Network (NN)-based PCC algorithms have the potential to consume lower bitrates yet provide similar results to SP-based algorithms, (iii) NN-based PCC algorithms may generate artifacts and suffer from long running time, and (iv) NN-based PCC algorithms are worth more in-depth studies as the recently proposed NN-based PCC algorithms improve the quality and running time. We believe that PCC Arena can play an essential role in allowing engineers and researchers to better interpret and compare the performance of future PCC algorithms.
Article
As one of the main representation formats of the 3D real world, well suited to virtual reality and augmented reality applications, point clouds have gained considerable popularity. To reduce the huge amount of data involved, a considerable amount of research on point cloud compression has been done. However, given a target bit rate, how to properly choose the color and geometry quantization parameters for compressing point clouds is still an open issue. In this paper, we propose a rate-distortion model based quantization parameter selection scheme for bit rate constrained point cloud compression. Firstly, to overcome the measurement uncertainty in evaluating the distortion of point clouds, we propose a unified model that combines the geometry distortion and color distortion. In this model, we take into account the correlation between the geometry and color variables of point clouds and derive a dimensionless quantity to represent the overall quality degradation. Then, we derive the relationships of the overall distortion and bit rate with the quantization parameters. Finally, we formulate bit rate constrained point cloud compression as a constrained minimization problem using the derived polynomial models and deduce the solution via an iterative numerical method. Experimental results show that the proposed algorithm can achieve optimal decoded point cloud quality at various target bit rates, and substantially outperforms the video-rate-distortion model based point cloud compression scheme.
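As a rough illustration of bit-rate-constrained parameter selection, the sketch below searches a grid of geometry/color quantization parameters against placeholder rate and distortion models; the model forms, coefficients, and the exhaustive search are all illustrative assumptions, not the models or the iterative solver derived in the paper.

import itertools

def rate_model(qg, qc, a=(2000.0, -30.0, -25.0)):
    # Placeholder rate model (toy linear form): modeled bits as a function of the QPs.
    return a[0] + a[1] * qg + a[2] * qc

def dist_model(qg, qc, b=(0.5, 0.04, 0.03)):
    # Placeholder overall distortion model (toy quadratic form).
    return b[0] + b[1] * qg ** 2 + b[2] * qc ** 2

def select_qp(rate_budget, qg_range=range(10, 52), qc_range=range(10, 52)):
    """Pick the (geometry QP, color QP) pair with minimal modeled distortion within the rate budget."""
    feasible = [(qg, qc) for qg, qc in itertools.product(qg_range, qc_range)
                if rate_model(qg, qc) <= rate_budget]
    return min(feasible, key=lambda p: dist_model(*p)) if feasible else None

print(select_qp(rate_budget=1200.0))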
Article
LiDAR sensors are almost indispensable for autonomous robots to perceive the surrounding environment. However, the transmission of large-scale LiDAR point clouds is highly bandwidth-intensive, which can easily lead to transmission problems, especially for unstable communication networks. Meanwhile, existing LiDAR data compression is mainly based on rate-distortion optimization, which ignores the semantic information of ordered point clouds and the task requirements of autonomous robots. To address these challenges, this article presents a task-driven Scene-Aware LiDAR Point Clouds Coding (SA-LPCC) framework for autonomous vehicles. Specifically, a semantic segmentation model is developed based on multi-dimension information, in which both 2D texture and 3D topology information are fully utilized to segment movable objects. Further, a prediction-based deep network is explored to remove the spatial-temporal redundancy. The experimental results on the benchmark semantic KITTI dataset validate that our SA-LPCC achieves state-of-the-art performance in terms of the reconstruction quality and storage space for downstream tasks. We believe that SA-LPCC jointly considers the scene-aware characteristics of movable objects and removes the spatial-temporal redundancy from an end-to-end learning mechanism, which will boost the related applications from algorithm optimization to industrial products.
Article
Volumetric video provides a more immersive holographic virtual experience than conventional video services such as 360-degree and virtual reality (VR) videos. However, due to ultra-high bandwidth requirements, existing compression and transmission technology cannot handle the delivery of real-time volumetric video. Unlike traditional compression methods and approaches that extend 360-degree video streaming, we propose AITransfer, an AI-powered compression and semantic-aware transmission method for point cloud video data (a popular volumetric data format). AITransfer targets semantic-level communication, going beyond transmitting raw or conventionally compressed point cloud video, with two main contributions: (1) an integrated end-to-end architecture built around feature extraction and reconstruction, which reduces bandwidth consumption and alleviates computational pressure; and (2) the incorporation of dynamic network conditions into the end-to-end design, using a deep reinforcement learning-based adaptive control scheme to provide robust transmission. We conduct extensive experiments on typical datasets and develop a case study to demonstrate the efficiency and effectiveness of the approach. The results show that AITransfer provides highly efficient point cloud transmission while maintaining a good user experience, achieving a compression ratio of more than 30.72x under existing network environments.
Article
Point cloud compression (PCC) is crucial for efficient and flexible storage as well as feasible transmission of point clouds in practice. For geometry compression, one popular approach is the octree-based solution. The intra prediction mechanism utilizes the spatial correlation in the static point cloud to predict the occupancy bit of the octree node for entropy coding, reducing the spatial redundancy. In this study, two local geometry-based prediction methods are proposed following statistical and theoretical analyses: binary prediction, which outputs a binary state (i.e., occupied or unoccupied), and ternary prediction, which provides a third option other than occupied or unoccupied (i.e., not predicted). In comparison with the state of the art, the proposed binary prediction offers a Bjontegaard delta rate (BD-rate) of -0.8% for lossy compression and bits per input point (bpip) of 100.09% for lossless compression on average. The binary prediction reduces computational complexity, with a more than 20% decrease in decoding time. In particular, it also provides a noticeable reduction in memory usage during entropy coding. The proposed ternary prediction provides a -1.2% BD-rate for lossy compression and 97.19% bpip for lossless compression on average, in comparison with the state of the art. While achieving this performance gain, it is considerably more computationally efficient, saving about 18% of the decoding time. Due to these advantages, part of the proposed ternary prediction has been adopted by the ongoing MPEG standard for geometry-based point cloud compression (G-PCC).
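The ternary idea can be illustrated with a toy rule that predicts a node's occupancy from how many of its already-decoded neighbors are occupied, and declines to predict in ambiguous cases; the neighbor-ratio rule and thresholds below are invented for illustration and are not the prediction rules proposed in the paper.

def predict_occupancy(occupied_neighbors, total_neighbors, lo=0.2, hi=0.8):
    """Ternary prediction: 'occupied', 'unoccupied', or 'not_predicted'.
    Thresholds lo/hi are illustrative, not values from the paper."""
    if total_neighbors == 0:
        return 'not_predicted'
    ratio = occupied_neighbors / total_neighbors
    if ratio >= hi:
        return 'occupied'
    if ratio <= lo:
        return 'unoccupied'
    return 'not_predicted'   # entropy coder falls back to its default context

# Binary prediction corresponds to collapsing the middle case, e.g. lo = hi = 0.5.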
Conference Paper
Point clouds are a crucial representation of 3D content and have been widely used in many areas such as virtual reality, mixed reality, and autonomous driving. As the number of points in the data grows, compressing point clouds efficiently becomes a challenging problem. In this paper, we propose a set of significant improvements to patch-based point cloud compression, namely a learnable context model for entropy coding, octree coding for sampling centroid points, and an integrated compression and training process. In addition, we propose an adversarial network to improve the uniformity of points during reconstruction. Our experiments show that the improved patch-based autoencoder outperforms the state of the art in terms of rate-distortion performance, on both sparse and large-scale point clouds. More importantly, our method maintains a short compression time while ensuring reconstruction quality.
Article
A point cloud is a set of three-dimensional points in arbitrary order and has become a popular representation of 3D scenes in autonomous navigation and immersive applications in recent years. Compression is unavoidable due to the huge data volume of point clouds. To compress point attributes effectively, proper reordering is important. The existing voxel-based point cloud attribute compression scheme uses a naive scan for point reordering. In this paper, we theoretically analyze the 3C properties of point clouds, i.e., Compactness, Clustering, and Correlation, for different scan orders defined by different space-filling curves, and show that the Hilbert curve provides the best spatial correlation preservation compared with the Z-order and Gray-coded curves. It is also statistically verified that the Hilbert curve best preserves attribute correlation for point clouds of different sparsity. We also propose a fast, iterative Hilbert address code generation method to implement point reordering. The Hilbert scan order can be combined with various point cloud attribute coding methods. Experiments show that the correlation preservation of the proposed scan order yields 6.1% and 6.5% coding gains for prediction and transform coding, respectively.
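For context, reordering by a space-filling curve can be sketched with the Z-order (Morton) curve, one of the scan orders compared above; the Hilbert order studied in the paper preserves neighborhood correlation better, but its 3D index computation is more involved and is not reproduced here.

def morton3d(x, y, z, bits=10):
    """Interleave the bits of integer voxel coordinates (Z-order / Morton code)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# Reorder voxelized points (tuples of integer coordinates) before attribute coding.
points = [(3, 1, 0), (0, 0, 1), (2, 2, 2)]
points.sort(key=lambda p: morton3d(*p))

Sorting by such a code groups spatially adjacent voxels so that neighboring attributes are coded close together, which is exactly the correlation-preservation property the paper quantifies.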
Article
Due to the large amount of data in three-dimensional (3D) LiDAR point clouds, point cloud compression (PCC) is indispensable to many real-time applications. In autonomous driving of connected vehicles, for example, point clouds are constantly acquired over time and need to be compressed. Among existing PCC methods, very few effectively remove the temporal redundancy inherent in the point clouds. To address this issue, a novel lossy LiDAR PCC system is proposed in this paper, consisting of inter-frame coding and intra-frame coding. For the former, a deep-learning approach is proposed to conduct bi-directional frame prediction using an asymmetric residual module and 3D space-time convolutions; the proposed network is called the bi-directional prediction network (BPNet). For the latter, a novel range-adaptive floating-point coding (RAFC) algorithm is proposed for encoding the reference frames and the B-frame prediction residuals in 32-bit floating-point precision. Since the pixel-value distributions of these two types of data are quite different, various encoding modes are designed to provide adaptive selection. Extensive simulation experiments on multiple point cloud datasets clearly show that the proposed PCC system consistently outperforms the state-of-the-art MPEG G-PCC in terms of data fidelity and localization, while delivering real-time performance.
Article
Recent years have witnessed the remarkable success of the Graph Fourier Transform (GFT) in point cloud attribute compression. Existing research mainly utilizes geometric distance to define the graph structure for coding attributes (e.g., color), which may assign high weights to edges connecting points across texture boundaries. In this case, such geometry-based graphs cannot adequately model attribute differences between points, limiting the compression efficiency of the GFT. Hence, we first utilize the attribute itself to refine the distance-based weights via a penalty function, which smooths signal variations on the graph and concentrates more energy in the low frequencies. The adjacency matrices acting as penalty-function variables are then transmitted to the decoder at the cost of extra bits. To balance attribute smoothness on the graph against the cost of coding the adjacency matrices, we finally construct the graph through Rate-Distortion (RD) optimization and find the optimal adjacency matrix. Experimental results show that our algorithm improves RD performance compared with competitive platforms. Additional experiments also analyze the source of the gains by evaluating the effectiveness of the RD-optimized graphs.
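The idea of letting attributes refine distance-based edge weights can be pictured with a bilateral-style weight that decays with both geometric distance and attribute difference; the exponential form and parameters below are an illustrative stand-in, not the penalty function or the RD-optimized graph construction proposed in the paper.

import numpy as np

def edge_weight(p_i, p_j, a_i, a_j, sigma_d=1.0, sigma_a=10.0):
    """Distance-based weight, attenuated when attributes differ across the edge."""
    d2 = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)   # squared geometric distance
    da2 = np.sum((np.asarray(a_i, float) - np.asarray(a_j, float)) ** 2)  # squared attribute difference
    return np.exp(-d2 / (2 * sigma_d ** 2)) * np.exp(-da2 / (2 * sigma_a ** 2))

# Two nearby points with very different colors get a small weight, so the graph
# no longer forces smoothness across the texture boundary.
w = edge_weight((0, 0, 0), (1, 0, 0), (200, 10, 10), (30, 200, 40))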
Article
The point cloud is a major representation format for 3D objects and scenes and has been increasingly applied in various applications due to rapid advances in 3D sensing and rendering technologies. In autonomous driving, point clouds captured by spinning Light Detection And Ranging (LiDAR) devices have become an informative data source for road environment perception and intelligent vehicle control. On the other hand, the massive data volume of point clouds also brings huge challenges for transmission and storage. Therefore, establishing compression frameworks and algorithms that conform to the characteristics of point cloud data has become an important research topic for both academia and industry. In this paper, a geometry compression method dedicated to spinning LiDAR point clouds is proposed, taking advantage of prior information about the LiDAR acquisition procedure. Rate-distortion optimization is further integrated into the coding pipeline according to the characteristics of the prediction residuals. Experimental results on different datasets show that the proposed method consistently outperforms the state-of-the-art G-PCC predictive geometry coding method, with reduced runtime at both the encoder and decoder sides.
Article
Full-text available
Point clouds are becoming essential in key applications, with advances in capture technologies leading to large volumes of data. Compression is thus essential for storage and transmission. In this work, the state of the art in geometry and attribute compression methods, with a focus on deep-learning-based approaches, is reviewed. The challenges faced when compressing geometry and attributes are considered, with an analysis of the current approaches to address them, their limitations, and the relations between deep-learning-based and traditional methods. Current open questions in point cloud compression, existing solutions, and perspectives are identified and discussed. Finally, the links between existing point cloud compression research problems and relevant areas of adjacent fields, such as rendering in computer graphics, mesh compression, and point cloud quality assessment, are highlighted.
Article
Existing LiDAR point cloud compression (PCC) methods tend to treat compression as a fidelity issue, without sufficiently addressing its machine perception aspect. The latter issue is often encountered by decoder agents that aim to conduct only scene-understanding tasks, such as computing localization information. To tackle this challenge, a novel LiDAR PCC system is proposed to compress the point cloud geometry, which contains a back channel allowing the decoder to send such a request to the encoder. The key to the success of our PCC method lies in our proposed semantic prior representation (SPR) and its lossy, variable-precision encoding algorithm used to generate the final bitstream; the entire process is fast and achieves real-time performance. Our SPR is a compact and effective representation of the three-dimensional (3D) input point clouds, consisting of labels, predictions, and residuals. This information is generated by first applying scene-aware object segmentation to a set of 2D range images (frames), each obtained from the 3D point clouds via a projection process. Based on the generated labels, the pixels associated with moving objects are treated as noise and removed, not only saving transmission bits but also, most importantly, improving the accuracy of the localization computed at the decoder. Experimental results on the commonly used test dataset show that our proposed system outperforms MPEG G-PCC (TMC13-v14.0) over a large bitrate range. In fact, the performance gap becomes even larger when more and/or larger moving objects are present in the input point clouds.
Article
Plenoptic point clouds are more complete representations of three-dimensional (3-D) objects than single-color point clouds, as they can have multiple colors per spatial point, representing the color of each point as seen from different view angles. They are more realistic but also involve a larger volume of data in need of compression. Therefore, in this paper, a multiview-video-based framework is proposed to better exploit correlations in color across different viewpoints; to the best of the authors' knowledge, this is the first work to do so. In addition, it is observed that some unoccupied pixels, which have no corresponding points in the plenoptic point cloud and contribute nothing to the quality of the reconstructed colors, may cost many bits. To address this problem, block-based group smoothing, together with a combined occupancy-map-based rate-distortion optimization and four-neighbor average residual padding, is further proposed to reduce the bit cost of unoccupied color pixels. The proposed algorithms are implemented in the Moving Picture Experts Group (MPEG) video-based point cloud compression (V-PCC) and multiview extension of High Efficiency Video Coding (MV-HEVC) reference software. The experimental results show that the proposed algorithms lead to a Bjontegaard Delta bitrate (BD-rate) reduction of 40% compared with RAHT-KLT. Compared with V-PCC applied independently to each view direction, the proposed algorithms provide a BD-rate reduction of over 70%.
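The four-neighbor average residual padding mentioned above can be sketched as follows (a simplified single-pass version; the exact procedure in the paper may differ): unoccupied pixels, which carry no point cloud color, are filled with the mean of their occupied 4-neighbors so that they cost fewer bits after prediction.

import numpy as np

def pad_unoccupied(image, occupancy):
    """Fill unoccupied pixels with the mean of their occupied 4-neighbors (single pass)."""
    out = image.astype(float).copy()
    h, w = occupancy.shape
    for y in range(h):
        for x in range(w):
            if occupancy[y, x]:
                continue   # occupied pixels keep their real color
            vals = [image[ny, nx] for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w and occupancy[ny, nx]]
            if vals:
                out[y, x] = float(np.mean(vals))
    return out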
Article
Point cloud compression is critical for deploying 3D applications such as autonomous driving. However, LiDAR point clouds contain many disconnected regions, where redundant bits for unoccupied 3D space and weak correlations between points make efficient compression a troublesome problem. This paper aims to aggregate LiDAR point clouds into compact representations, taking full account of the point distribution characteristics. Specifically, we propose a novel Layer-wise Geometry Aggregation (LGA) framework for LiDAR point cloud lossless geometry compression, which adaptively partitions point clouds into three layers based on content properties: a ground layer, an object layer, and a noise layer. The aggregation algorithms are carefully designed for each layer. Firstly, the ground layer is fitted to a Gaussian Mixture Model, which can uniformly represent ground points using far fewer model parameters than the original 3D coordinates. Then, the object layer is tightly packed to reduce the space between objects effectively, and the resulting dense point layout benefits compression efficiency. Finally, in the noise layer, the differences between neighboring points are reduced by reordering with Morton codes, and the reduced residuals help lower bit consumption. Experimental results demonstrate that the proposed LGA significantly outperforms competitive methods without prior knowledge, with 12.05%-23.37% compression ratio gains. Furthermore, the enhanced LGA with prior knowledge shows consistent performance gains over the latest reference software. Additional results also validate the robustness and stability of the proposed scheme with acceptable time complexity.
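As a rough illustration of the ground-layer idea, a small set of Gaussian components can summarize near-planar ground points far more compactly than raw coordinates; the sketch below uses scikit-learn's GaussianMixture on synthetic data as a stand-in for the paper's actual model-fitting step.

import numpy as np
from sklearn.mixture import GaussianMixture

# Toy "ground" points: a noisy, nearly planar slab (stand-in for segmented ground returns).
rng = np.random.default_rng(0)
ground = np.column_stack([rng.uniform(-50, 50, 5000),
                          rng.uniform(-50, 50, 5000),
                          rng.normal(-1.7, 0.05, 5000)])

gmm = GaussianMixture(n_components=8, covariance_type='full', random_state=0).fit(ground)
# The weights, means, and covariances are far fewer parameters than the raw coordinates.
params = (gmm.weights_, gmm.means_, gmm.covariances_)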
Article
Full-text available
Modern data compression is mainly based on two approaches to entropy coding: Huffman coding (HC) and arithmetic/range coding (AC). The former is much faster, but approximates probabilities with powers of 2, usually leading to relatively low compression rates. The latter uses nearly exact probabilities, easily approaching the theoretical compression rate limit (Shannon entropy), but at a much higher computational cost. Asymmetric numeral systems (ANS) is a new approach to accurate entropy coding that ends this trade-off between speed and rate: a recent implementation [1] provides about 50% faster decoding than HC for a 256-symbol alphabet, with a compression rate similar to that of AC. This advantage comes from being simpler than AC: a single natural number is used as the state, instead of two numbers representing a range. Besides simplifying renormalization, this allows the entire behavior for a given probability distribution to be stored in a relatively small table, defining an entropy coding automaton. The memory cost of such a table for a 256-symbol alphabet is a few kilobytes. There is large freedom in choosing a specific table; making this choice with a pseudorandom number generator initialized with a cryptographic key allows the data to be simultaneously encrypted. This article also introduces and discusses many other variants of this new entropy coding approach, which can provide direct alternatives to standard AC, to large-alphabet range coding, or to approximated quasi-arithmetic coding.
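A minimal sketch of the core ANS state transition may help; this is the range variant without stream renormalization or table construction (Python big integers keep the state exact), so it shows only the single-natural-number state update rather than the tabled automaton discussed above.

# Toy symbol model: freqs must sum to M; cum[s] is the cumulative frequency below s.
freqs = {'a': 3, 'b': 1}
M = sum(freqs.values())
cum, acc = {}, 0
for s in freqs:
    cum[s] = acc
    acc += freqs[s]

def encode(symbols, state=1):
    """Push symbols onto the state: C(x, s) = floor(x / f_s) * M + cum_s + (x mod f_s)."""
    for s in symbols:
        state = (state // freqs[s]) * M + cum[s] + (state % freqs[s])
    return state

def decode(state, n):
    """Pop n symbols; ANS is LIFO, so the decoded list is reversed at the end."""
    out = []
    for _ in range(n):
        r = state % M
        s = next(t for t in freqs if cum[t] <= r < cum[t] + freqs[t])
        state = freqs[s] * (state // M) + r - cum[s]
        out.append(s)
    return out[::-1], state

msg = list("abaab")
assert decode(encode(msg), len(msg))[0] == msg

In practice, renormalization keeps the state within a fixed-width register by streaming out low bits, and the table-based variants replace this arithmetic with lookups.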
Article
Full-text available
An algorithm introduced by L. Breiman et al. (1984) in the context of classification and regression trees is reinterpreted and extended to cover a variety of applications in source coding and modeling in which trees are involved. These include variable-rate and minimum-entropy tree-structured vector quantization, minimum expected cost decision trees, variable-order Markov modeling, optimum bit allocation, and computer graphics and image processing using quadtrees. The first of these is examined in depth, with a detailed analysis of variable-rate tree-structured vector quantization. It is found that variable-rate tree-structured vector quantization outperforms not only the fixed-rate variety but also full-search vector quantization. The successive-approximation character of variable-rate tree-structured vector quantization permits it to degrade gracefully if the rate is reduced at the encoder, which has applications to the problem of buffer overflow.
Article
Full-text available
Adaptively Sampled Distance Fields (ADFs) are a unifying representation of shape that integrates numerous concepts in computer graphics, including the representation of geometry and volume data and a broad range of processing operations such as rendering, sculpting, level-of-detail management, surface offsetting, collision detection, and color gamut correction. Their structure is uncomplicated and direct, yet especially effective for high-quality reconstruction of complex shapes, e.g., artistic and organic forms, precision parts, volumes, high-order functions, and fractals. We characterize one implementation of ADFs, illustrating its utility on two diverse applications: 1) artistic carving of fine detail, and 2) representing and rendering volume data and volumetric effects. Other applications are briefly presented.
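A minimal 2D analogue may help picture the adaptive sampling: a quadtree cell (standing in for an octree cell) is subdivided only where interpolation of its corner distances fails to reproduce the underlying distance field. The circle distance field, the center-only error test, and the tolerance below are illustrative simplifications, not the ADF construction itself.

import math

def sdf(x, y, cx=0.5, cy=0.5, r=0.3):
    """Signed distance to a circle: negative inside, positive outside."""
    return math.hypot(x - cx, y - cy) - r

def build(x0, y0, size, tol=1e-3, depth=0, max_depth=6):
    corners = [sdf(x0, y0), sdf(x0 + size, y0), sdf(x0, y0 + size), sdf(x0 + size, y0 + size)]
    center_interp = sum(corners) / 4.0                    # bilinear interpolation at the cell center
    center_true = sdf(x0 + size / 2, y0 + size / 2)
    if depth >= max_depth or abs(center_interp - center_true) < tol:
        return {'cell': (x0, y0, size), 'corners': corners}   # leaf keeps only corner samples
    half = size / 2
    return {'children': [build(x0, y0, half, tol, depth + 1, max_depth),
                         build(x0 + half, y0, half, tol, depth + 1, max_depth),
                         build(x0, y0 + half, half, tol, depth + 1, max_depth),
                         build(x0 + half, y0 + half, half, tol, depth + 1, max_depth)]}

tree = build(0.0, 0.0, 1.0)   # cells near the circle boundary end up finely subdivided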
Article
Compression of point clouds has so far been confined to coding the positions of a discrete set of points in space and the attributes of those discrete points. We introduce an alternative approach based on volumetric functions, which are functions defined not just on a finite set of points, but throughout space. As in regression analysis, volumetric functions are continuous functions that are able to interpolate values on a finite set of points as linear combinations of continuous basis functions. Using a B-spline wavelet basis, we are able to code volumetric functions representing both geometry and attributes. Geometry compression is addressed in Part II of this paper, while attribute compression is addressed in Part I. Attributes are represented by a volumetric function whose coefficients can be regarded as a critically sampled orthonormal transform that generalizes the recent successful region-adaptive hierarchical (or Haar) transform to higher orders. Experimental results show that attribute compression using higher order volumetric functions is an improvement over the first order functions used in the emerging MPEG Point Cloud Compression standard.
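For reference, the first-order (region-adaptive Haar) transform that this volumetric approach generalizes combines two occupied voxels with occupancy weights w1 and w2 through a weight-dependent orthonormal butterfly; the sketch below shows that standard RAHT step only, as a baseline for the higher-order construction described above, not the higher-order transform itself.

import numpy as np

def raht_butterfly(a1, a2, w1, w2):
    """Merge two attribute values with occupancy weights into a low-pass (DC)
    and a high-pass (AC) coefficient; the merged node carries weight w1 + w2."""
    s = np.sqrt(w1 + w2)
    dc = (np.sqrt(w1) * a1 + np.sqrt(w2) * a2) / s
    ac = (-np.sqrt(w2) * a1 + np.sqrt(w1) * a2) / s
    return dc, ac, w1 + w2

# Example: a subtree of 3 points merged with a single point; the DC coefficient
# propagates up the octree while the AC coefficient is quantized and coded.
dc, ac, w = raht_butterfly(a1=100.0, a2=120.0, w1=3, w2=1)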
Article
Due to the increased popularity of augmented and virtual reality experiences, the interest in capturing the real world in multiple dimensions and presenting it to users in an immersive fashion has never been higher. Distributing such representations enables users to navigate freely in multi-sensory 3D media experiences. Unfortunately, these representations require a large amount of data, which is not feasible to transmit on today's networks. Efficient compression technologies, well adopted in the content chain, are in high demand and are key components for democratizing augmented and virtual reality applications. The Moving Picture Experts Group (MPEG), as one of the main standardization groups dealing with multimedia, identified this trend and recently started the process of building an open standard for compactly representing 3D point clouds, which are the 3D equivalent of the well-known 2D pixels. This paper introduces the main developments and technical aspects of this ongoing standardization effort.
Conference Paper
We present a new mathematical framework for multi-view surface reconstruction from a set of calibrated color and depth images. We estimate the occupancy probability of points in space along sight rays, and combine these estimates using a normalized product derived from Bayes' rule. The advantage of this approach is that the free space constraint is a natural consequence of the formulation, and not a separate logical operation. We present a single closed form implicit expression for the reconstructed surface in terms of the image data and camera projections, making analytic properties such as surface normals not only easy to compute, but exact. This expression can be efficiently evaluated on the GPU, making it ideal for high performance real-time applications, such as live human body capture for immersive telepresence.
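The normalized product derived from Bayes' rule can be illustrated with a worked equation. Assuming a uniform occupancy prior and per-view occupancy estimates p_i that are conditionally independent given the true occupancy, the fused occupancy probability at a point along the sight rays is

p(\text{occupied} \mid \text{views } 1,\dots,N) \;=\; \frac{\prod_{i=1}^{N} p_i}{\prod_{i=1}^{N} p_i \;+\; \prod_{i=1}^{N} (1 - p_i)},

so a single confident free-space observation (p_i near 0) drives the fused probability toward zero, which is how the free-space constraint emerges naturally from the formulation rather than as a separate logical operation.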
Chapter
The Embedded Zerotree Wavelet Algorithm (EZW) is a simple, yet remarkably effective, image compression algorithm, having the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code. The embedded code represents a sequence of binary decisions that distinguish an image from the “null” image. Using an embedded coding algorithm, an encoder can terminate the encoding at any point thereby allowing a target rate or target distortion metric to be met exactly. Also, given a bit stream, the decoder can cease decoding at any point in the bit stream and still produce exactly the same image that would have been encoded at the bit rate corresponding to the truncated bit stream. In addition to producing a fully embedded bit stream, EZW consistently produces compression results that are competitive with virtually all known compression algorithms on standard test images. Yet this performance is achieved with a technique that requires absolutely no training, no pre-stored tables or codebooks, and requires no prior knowledge of the image source. The EZW algorithm is based on four key concepts: 1) a discrete wavelet transform or hierarchical subband decomposition, 2) prediction of the absence of significant information across scales by exploiting the self-similarity inherent in images, 3) entropy-coded successive-approximation quantization, and 4) universal lossless data compression which is achieved via adaptive arithmetic coding.
Chapter
As we have seen, a number of simplifications can be made when φ is a signed distance function. For this reason, we dedicate this chapter to numerical techniques for constructing approximate signed distance functions. These techniques can be applied to the initial data in order to initialize φ to a signed distance function.
Chapter
In the last chapter we defined implicit functions with φ(x) ≤ 0 in the interior region Ω⁻, φ(x) > 0 in the exterior region Ω⁺, and φ(x) = 0 on the boundary ∂Ω. Little was said about φ otherwise, except that smoothness is a desirable property especially in sampling the function or using numerical approximations. In this chapter we discuss signed distance functions, which are a subset of the implicit functions defined in the last chapter. We define signed distance functions to be positive on the exterior, negative on the interior, and zero on the boundary. An extra condition, |∇φ(x)| = 1, is imposed on a signed distance function.
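A concrete example under these definitions: for a sphere of radius r centered at c, φ(x) = ‖x − c‖ − r is a signed distance function. It is negative inside the sphere, positive outside, zero on the boundary, and satisfies |∇φ(x)| = 1 everywhere except at the center x = c.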
Article
Compressibility of individual sequences by the class of generalized finite-state information-lossless encoders is investigated. These encoders can operate in a variable-rate mode as well as a fixed-rate one, and they allow for any finite-state scheme of variable-length-to-variable-length coding. For every individual infinite sequence x a quantity ρ(x) is defined, called the compressibility of x, which is shown to be the asymptotically attainable lower bound on the compression ratio that can be achieved for x by any finite-state encoder. This is demonstrated by means of a constructive coding theorem and its converse that, apart from their asymptotic significance, also provide useful performance criteria for finite and practical data-compression tasks. The proposed concept of compressibility is also shown to play a role analogous to that of entropy in classical information theory where one deals with probabilistic ensembles of sequences rather than with individual sequences. While the definition of ρ(x) allows a different machine for each different sequence to be compressed, the constructive coding theorem leads to a universal algorithm that is asymptotically optimal for all sequences.
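A minimal sketch of incremental (LZ78-style) dictionary parsing may help convey the finite-state flavor of this family of universal codes; it is shown for intuition only and is a simplification, not the exact construction analyzed in the paper. Each new phrase is the longest previously seen phrase extended by one symbol.

def lz78_parse(s):
    """Incremental parse: emit (index of longest known prefix, new symbol) pairs."""
    dictionary = {'': 0}
    phrases, current = [], ''
    for ch in s:
        if current + ch in dictionary:
            current += ch                     # keep extending the known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ''
    if current:                               # flush an incomplete final phrase
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases

# The number of parsed phrases per symbol shrinks as the sequence becomes more
# compressible, which is the quantity the compressibility bound is built around.
print(lz78_parse("abababababab"))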
Article
A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a nonprobabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.
Article
A number of techniques have been developed for reconstructing surfaces by integrating groups of aligned range images. A desirable set of properties for such algorithms includes: incremental updating, representation of directional uncertainty, the ability to fill gaps in the reconstruction, and robustness in the presence of outliers. Prior algorithms possess subsets of these properties. In this paper, we present a volumetric method for integrating range images that possesses all of these properties.
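The incremental-updating property can be sketched as a cumulative weighted average of signed distances per voxel, in the spirit of volumetric range-image integration; the grid size, weights, and truncation handling below are illustrative assumptions.

import numpy as np

def integrate(D, W, d_new, w_new):
    """Fold one range image's signed distances (d_new) and weights (w_new, zero where
    the view observes nothing) into the running per-voxel estimate (D, W) by a
    cumulative weighted average."""
    W_out = W + w_new
    D_out = np.where(W_out > 0, (W * D + w_new * d_new) / np.maximum(W_out, 1e-12), D)
    return D_out, W_out

# Running estimates over a 64^3 voxel grid, updated one range image at a time;
# the reconstructed surface is the zero level set of D where W > 0.
D = np.zeros((64, 64, 64))
W = np.zeros((64, 64, 64))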
Common Test Conditions for Point Cloud Compression
S. Schwarz, G. Martin-Cocher, D. Flynn, M. Budagavi