Figure 6. Comparison between geometry compression codecs in (Quach et al., 2020b) and MPEG G-PCC v10.0 (MPEG, 2021c). Learning-based approaches can achieve significantly lower distortions at equivalent bitrates.
Source publication
Point clouds are becoming essential in key applications, with advances in capture technologies leading to large volumes of data. Compression is thus essential for storage and transmission. In this work, the state of the art in geometry and attribute compression methods, with a focus on deep-learning-based approaches, is reviewed. The challenges faced...
Context in source publication
Context 1
... probabilities predicted by the entropy model can then be used by an arithmetic coder to encode and decode the latent space. Quach et al. (2019) introduce the use of CNNs for point cloud geometry compression, and in (Quach et al., 2020b) the proposed approach significantly outperforms MPEG G-PCC (MPEG, 2021c), as shown in Figure 6. The decoding of a point cloud can be cast as a binary classification problem in a voxel grid. ...
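As a loose illustration of casting decoding as a voxel-grid binary classification, the sketch below thresholds a grid of predicted occupancy probabilities to recover point coordinates. The array shape, the threshold value, and the function name are assumptions, not the published implementation.

```python
# Minimal sketch (not the authors' code): decoding point cloud geometry from
# per-voxel occupancy probabilities by binary classification.
# Assumes `probs` is a dense H x W x D array of occupancy probabilities
# produced by a learned decoder; names and the threshold are illustrative.
import numpy as np

def decode_voxel_grid(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return an (N, 3) array of voxel coordinates classified as occupied."""
    occupied = probs >= threshold          # per-voxel binary classification
    coords = np.argwhere(occupied)         # recover point coordinates
    return coords

# Toy usage: random "predicted" probabilities over a 64^3 grid.
rng = np.random.default_rng(0)
probs = rng.random((64, 64, 64))
points = decode_voxel_grid(probs, threshold=0.99)
print(points.shape)  # (num_occupied_voxels, 3)
```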
Similar publications
3D face recognition systems have been widely employed in intelligent terminals, among which structured light imaging is a common method to measure the 3D shape. However, this method could be easily attacked, leading to inaccurate 3D face recognition. In this paper, we propose a novel, physically-achievable attack on the fringe structured light syst...
Citations
... Nowadays, point clouds are used extensively in networked applications, including Augmented and Virtual Reality, Autonomous Machinery, etc., making efficient Point Cloud Compression (PCC) increasingly indispensable. In addition to rule-based PCC solutions, such as Geometry-based PCC (G-PCC) or Video-based PCC (V-PCC) standardized under the ISO/IEC MPEG committee [1], learning-based PCC approaches have attracted worldwide attention and demonstrated noticeable compression gains [2] in Point Cloud Geometry Compression (PCGC). Among them, our earlier multiscale sparse representation-based PCGC has reported state-of-the-art performance [3,4] on a variety of point clouds (e.g., dense objects and sparse LiDAR data). ...
This work extends the multiscale structure originally developed for point cloud geometry compression to point cloud attribute compression. To losslessly encode the attributes while maintaining a low bitrate, accurate probability prediction is critical. With this aim, we extensively exploit cross-scale, cross-group, and cross-color correlations of point cloud attributes to ensure accurate probability estimation and thus high coding efficiency. Specifically, we first generate multiscale attribute tensors through average pooling, by which, for any two consecutive scales, the decoded lower-scale attribute can be used to estimate the attribute probability in the current scale in one shot. Additionally, in each scale, we perform the probability estimation group-wise following a predefined grouping pattern. In this way, both cross-scale and (same-scale) cross-group correlations are exploited jointly. Furthermore, cross-color redundancy is removed by allowing inter-color processing for YCoCg/RGB-like multi-channel attributes. The proposed method not only demonstrates state-of-the-art compression efficiency with significant performance gains over the latest G-PCC on various contents but also sustains low complexity with affordable encoding and decoding runtime.
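The cross-scale prediction idea can be illustrated with a small, dense-grid sketch: average pooling builds the coarser attribute tensor, and nearest-neighbor upsampling of that coarser scale serves as a one-shot prediction of the finer scale. The dense layout, 2x2x2 pooling factor, and function names are assumptions; the actual method operates on sparse tensors with learned probability models.

```python
# Minimal sketch (assumptions: dense voxelized attributes, 2x2x2 pooling).
import numpy as np

def avg_pool2x2x2(attr: np.ndarray) -> np.ndarray:
    """Average-pool an (H, W, D, C) attribute grid by a factor of 2 per axis."""
    H, W, D, C = attr.shape
    return attr.reshape(H // 2, 2, W // 2, 2, D // 2, 2, C).mean(axis=(1, 3, 5))

def upsample_nearest(attr: np.ndarray) -> np.ndarray:
    """Nearest-neighbor upsampling of the coarser scale back to the finer grid."""
    return attr.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)

# Build a two-scale pyramid and use the decoded coarse scale as a one-shot
# predictor of the finer scale (residuals would then be entropy coded).
rng = np.random.default_rng(0)
fine = rng.random((8, 8, 8, 3))            # e.g. RGB attributes on an 8^3 grid
coarse = avg_pool2x2x2(fine)               # lower-scale attribute tensor
prediction = upsample_nearest(coarse)      # cross-scale prediction of `fine`
residual = fine - prediction
print(coarse.shape, float(np.abs(residual).mean()))
```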
... For example, space-partitioning tree approaches that exploit the 3D correlation among pointcloud points are widely used to compress pointcloud data [4]-[9]. Recently, deep-learning-based approaches have also been proposed to leverage data and learn or encode pointcloud compression [10]-[12]. Different from these frameworks, probabilistic approaches exploit the compactness of distributions to compress 3D sensor observations. ...
This paper presents a framework to represent high-fidelity pointcloud sensor observations for efficient communication and storage. The proposed approach exploits a Sparse Gaussian Process to encode the pointcloud into a compact form. Our approach represents both the free space and the occupied space using only one model (one 2D Sparse Gaussian Process) instead of the existing two-model framework (two 3D Gaussian Mixture Models). We achieve this by proposing a variance-based sampling technique that effectively discriminates between the free and occupied space. The new representation requires a smaller memory footprint and can be transmitted across limited-bandwidth communication channels. The framework is extensively evaluated in simulation, and it is also demonstrated using a real mobile robot equipped with a 3D LiDAR. Our method results in a 70 to 100 times reduction in the communication rate compared to sending the raw pointcloud.
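The variance-based idea can be loosely illustrated with an exact Gaussian Process regressor from scikit-learn: fit range measurements against bearing, then use the predictive standard deviation at query bearings to separate well-observed directions from poorly observed ones. The 1D toy setup, kernel, and threshold are assumptions and not the authors' Sparse GP pipeline.

```python
# Loose illustration (not the authors' Sparse GP pipeline): fit a GP that maps
# bearing angle to measured range, then use the predictive variance at query
# bearings to separate well-observed (low variance) from poorly observed
# (high variance) directions. Names and the variance threshold are illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
train_bearings = rng.uniform(-np.pi, np.pi, size=(50, 1))    # observed directions
train_ranges = 5.0 + 0.5 * np.sin(3 * train_bearings[:, 0])  # toy LiDAR ranges

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-2)
gp.fit(train_bearings, train_ranges)

query = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
mean_range, std = gp.predict(query, return_std=True)
well_observed = std < 0.3     # low predictive variance: trust the surface estimate
print(f"{well_observed.sum()} of {len(query)} query bearings kept")
```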
... A continuous-time approach should naturally handle such applications. Nevertheless, scaling up will require much stronger codec models, and perhaps combining with compression methods specific to such domains, such as point cloud compression [Quach et al., 2022] or video compression [Rippel et al., 2021]. ...
Neural compression offers a domain-agnostic approach to creating codecs for lossy or lossless compression via deep generative models. For sequence compression, however, most deep sequence models have costs that scale with the sequence length rather than the sequence complexity. In this work, we instead treat data sequences as observations from an underlying continuous-time process and learn how to efficiently discretize while retaining information about the full sequence. As a consequence of decoupling sequential information from its temporal discretization, our approach allows for greater compression rates and lower computational complexity. Moreover, the continuous-time approach naturally allows us to decode at different time intervals. We empirically verify our approach on multiple domains involving compression of video and motion capture sequences, showing that our approach can automatically achieve reductions in bit rates by learning how to discretize.
... Instead of relying on handcrafted rules, data-driven learning is applied to derive (non-linear) transforms and context models directly for point cloud attribute (PCA) compression. Among these methods, end-to-end supervised learning is the most straightforward solution [26]. Sheng et al. [27] designed a point-based lossy attribute autoencoder, where stacked multi-layer perceptrons (MLPs) were used to extract spatial correlations across points and transform the input attributes into high-dimensional features for entropy coding. ...
A learning-based adaptive loop filter is developed for the Geometry-based Point Cloud Compression (G-PCC) standard to reduce attribute compression artifacts. The proposed method first generates multiple Most-Probable Sample Offsets (MPSOs) as potential compression distortion approximations, and then linearly weights them for artifact mitigation. As such, we drive the filtered reconstruction as close to the uncompressed PCA as possible. To this end, we devise a Compression Artifact Reduction Network (CARNet) which consists of two consecutive processing phases: MPSOs derivation and MPSOs combination. The MPSOs derivation uses a two-stream network to model local neighborhood variations from direct spatial embedding and frequency-dependent embedding, where sparse convolutions are utilized to best aggregate information from sparsely and irregularly distributed points. The MPSOs combination is guided by the least square error metric to derive weighting coefficients on the fly to further capture content dynamics of input PCAs. The CARNet is implemented as an in-loop filtering tool of G-PCC, where those linear weighting coefficients are encapsulated into the bitstream with negligible bit rate overhead. Experimental results demonstrate significant improvement over the latest G-PCC both subjectively and objectively.
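The least-squares weighting step can be sketched independently of the network: given a few candidate offsets and a reference signal, ordinary least squares yields the linear coefficients that minimize the reconstruction error. The shapes, names, and toy data below are assumptions, not the CARNet implementation.

```python
# Minimal sketch (assumed shapes and names): combine several candidate sample
# offsets (MPSOs) with weights fitted by least squares against a reference,
# mirroring the idea of deriving linear weighting coefficients on the fly.
import numpy as np

def fit_offset_weights(mpsos: np.ndarray, reconstructed: np.ndarray,
                       reference: np.ndarray) -> np.ndarray:
    """mpsos: (K, N) candidate offsets; reconstructed/reference: (N,) attributes."""
    target = reference - reconstructed          # residual the offsets should explain
    weights, *_ = np.linalg.lstsq(mpsos.T, target, rcond=None)
    return weights

rng = np.random.default_rng(0)
reference = rng.random(1000)                                   # uncompressed attribute (toy)
reconstructed = reference + 0.05 * rng.standard_normal(1000)   # compressed output
mpsos = 0.05 * rng.standard_normal((3, 1000))                  # candidate offsets from a network
w = fit_offset_weights(mpsos, reconstructed, reference)
filtered = reconstructed + w @ mpsos            # weighted offsets applied in-loop
print(w, float(np.mean((filtered - reference) ** 2)))
```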
... State-of-the-art non-learning-based PCC methods include MPEG G-PCC, MPEG V-PCC, and Draco, see [9] for an overview. Recent advances in deep learning provide opportunities to outperform them [23]. Deep-learning-based approaches purely based on voxels or octrees attempt to organize 3D point sets in the first place, putting the representability of geometric details at risk. ...
Point cloud compression (PCC) is a key enabler for various 3-D applications, owing to the universality of the point cloud format. Ideally, 3D point clouds endeavor to depict object/scene surfaces that are continuous. Practically, as a set of discrete samples, point clouds are locally disconnected and sparsely distributed. This sparse nature hinders the discovery of local correlations among points for compression. Motivated by an analysis with fractal dimension, we propose a heterogeneous approach with deep learning for lossy point cloud geometry compression. On top of a base layer compressing a coarse representation of the input, an enhancement layer is designed to cope with the challenging geometric residual/details. Specifically, a point-based network is applied to convert the erratic local details to latent features residing on the coarse point cloud. Then a sparse convolutional neural network operating on the coarse point cloud is launched. It utilizes the continuity/smoothness of the coarse geometry to compress the latent features as an enhancement bit-stream that greatly benefits the reconstruction quality. When this bit-stream is unavailable, e.g., due to packet loss, we support a skip mode with the same architecture which generates geometric details from the coarse point cloud directly. Experimentation on both dense and sparse point clouds demonstrates the state-of-the-art compression performance achieved by our proposal. Our code is available at https://github.com/InterDigitalInc/GRASP-Net.
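A rough sketch of the base/enhancement split, under stated assumptions: grid quantization provides a coarse base layer, and per-point residuals to the nearest coarse point stand in for the fine geometric details an enhancement layer would encode. The quantization step and helper names are illustrative, not GRASP-Net's actual layering.

```python
# Minimal sketch (quantization step and names are assumptions): split a point
# cloud into a coarse base layer and per-point residual details, the kind of
# decomposition a base/enhancement coding scheme builds on.
import numpy as np
from scipy.spatial import cKDTree

def split_base_enhancement(points: np.ndarray, step: float):
    """points: (N, 3). Returns coarse base points and residuals to the base."""
    coarse = np.unique(np.floor(points / step), axis=0) * step + step / 2
    tree = cKDTree(coarse)
    _, idx = tree.query(points)          # nearest coarse point for each input point
    residuals = points - coarse[idx]     # fine geometric details (enhancement layer)
    return coarse, residuals

rng = np.random.default_rng(0)
pts = rng.random((2000, 3))
base, res = split_base_enhancement(pts, step=0.1)
print(base.shape, float(np.abs(res).max()))   # residuals bounded by the cell size
```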
... However, efficient compression of massive LiDAR points, which is vital for storage and networked exchange in vast applications [9], is still challenging because it is hard to exploit inter-correlations among unstructured sparse points. To tackle this, numerous 3D representation models, such as uniform voxels [30,14,23], octrees [21], and multiscale sparse tensors [29,28], have been developed to specify explicit neighborhood connections upon which rule- or learning-based approaches [7,22] exploit inter-dependency among points in close proximity. ...
Although the convolutional representation of multiscale sparse tensors demonstrated its superior efficiency in accurately modeling the occupancy probability for the compression of the geometry component of dense object point clouds, its capacity for representing sparse LiDAR point cloud geometry (PCG) was largely limited. This is because 1) the fixed receptive field of the convolution cannot characterize extremely and unevenly distributed sparse LiDAR points very well; and 2) pretrained convolutions with fixed weights are insufficient to dynamically capture information conditioned on the input. This work therefore suggests neighborhood point attention (NPA) to tackle these issues, where we first use k nearest neighbors (kNN) to construct an adaptive local neighborhood, and then leverage the self-attention mechanism to dynamically aggregate information within this neighborhood. The NPA is devised as an NPAFormer to best exploit cross-scale and same-scale correlations for geometric occupancy probability estimation. Compared with the anchor using standardized G-PCC, our method provides >17% BD-rate gains for lossy compression, and >14% bitrate reduction for the lossless scenario using popular LiDAR point clouds in the SemanticKITTI and Ford datasets. Compared with the state-of-the-art (SOTA) solution using an attention-optimized octree coding method, our approach requires much less decoding runtime with about 640 times speedup on average, while still presenting better compression efficiency.
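The core neighborhood point attention operation can be sketched as kNN gathering followed by dot-product attention over each local neighborhood; the feature dimensions, single-head formulation, and function name below are assumptions rather than the NPAFormer architecture.

```python
# Minimal sketch (dimensions and names assumed): gather k nearest neighbors for
# each point and aggregate their features with a single-head self-attention
# weighting, the basic operation behind neighborhood point attention.
import numpy as np
from scipy.spatial import cKDTree

def knn_attention(points: np.ndarray, feats: np.ndarray, k: int = 8) -> np.ndarray:
    """points: (N, 3), feats: (N, C). Returns attention-aggregated (N, C) features."""
    _, idx = cKDTree(points).query(points, k=k)        # (N, k) neighbor indices
    neighbors = feats[idx]                             # (N, k, C) neighbor features
    # Dot-product attention of each point (query) over its neighborhood (keys).
    scores = np.einsum('nc,nkc->nk', feats, neighbors) / np.sqrt(feats.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over the k neighbors
    return np.einsum('nk,nkc->nc', weights, neighbors) # weighted aggregation

rng = np.random.default_rng(0)
pts, feats = rng.random((500, 3)), rng.random((500, 16))
out = knn_attention(pts, feats, k=8)
print(out.shape)  # (500, 16)
```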
... In the last few years, machine learning approaches using neural networks have been proven to be competitive for both lossy [14], [16], [20]- [22] and lossless [11], [17], [20], [24] point cloud geometry compression. A comprehensive survey of the recent methods with a focus on the learning-based approaches is provided in [25]. ...
In this paper we propose a new paradigm for encoding the geometry of dense point cloud sequences, where a convolutional neural network (CNN), which estimates the encoding distributions, is optimized on several frames of the sequence to be compressed. We adopt lightweight CNN structures, perform training as part of the encoding process, and transmit the CNN parameters as part of the bitstream. The newly proposed encoding scheme operates on the octree representation for each point cloud, consecutively encoding each octree resolution level. At every octree resolution level, the voxel grid is traversed section-by-section (each section being perpendicular to a selected coordinate axis), and in each section, the occupancies of groups of two-by-two voxels are encoded at once in a single arithmetic coding operation. A context for the conditional encoding distribution is defined for each two-by-two group of voxels based on the information available about the occupancy of the neighboring voxels in the current and lower resolution layers of the octree. The CNN estimates the probability mass functions of the occupancy patterns of all the voxel groups from one section in four phases. In each new phase, the contexts are updated with the occupancies encoded in the previous phase, and each phase estimates the probabilities in parallel, providing a reasonable trade-off between the parallelism of the processing and the informativeness of the contexts. The CNN training time is comparable to the time spent in the remaining encoding steps, leading to competitive overall encoding times. The bitrates and encoding-decoding times compare favorably with those of recently published compression schemes.
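The grouping of two-by-two voxels into single coding symbols can be sketched as follows: each group's four occupancy bits are packed into one of 16 patterns, the symbol an arithmetic coder would then encode under the CNN-estimated distribution. The array layout and function name are assumptions, not the paper's implementation.

```python
# Minimal sketch (array layout assumed): map each two-by-two group of voxels in
# a 2-D section of the voxel grid to one of 16 occupancy patterns, i.e. the
# symbol that would be fed to the arithmetic coder under a learned distribution.
import numpy as np

def group_patterns(section: np.ndarray) -> np.ndarray:
    """section: (H, W) binary occupancy. Returns (H//2, W//2) pattern indices 0..15."""
    h, w = section.shape
    blocks = section[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    # Pack the 4 occupancy bits of each group into a single symbol.
    return (blocks * np.array([8, 4, 2, 1])).sum(axis=-1)

rng = np.random.default_rng(0)
section = (rng.random((8, 8)) < 0.3).astype(np.uint8)   # toy occupancy section
symbols = group_patterns(section)
print(symbols)   # each entry is one of 16 patterns to be entropy coded
```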