Article

Compression of 3D Point Clouds Using a Region-Adaptive Hierarchical Transform

Abstract

In free-viewpoint video, there is a recent trend to represent scene objects as solids rather than using multiple depth maps. Point clouds have been used in computer graphics for a long time, and with the recent possibility of real-time capture and rendering, point clouds have been favored over meshes in order to save computation. Each point in the cloud is associated with its 3D position and its color. We devise a method to compress the colors in point clouds which is based on a hierarchical transform and arithmetic coding. The transform is a hierarchical sub-band transform that resembles an adaptive variation of a Haar wavelet. The arithmetic encoding of the coefficients assumes Laplace distributions, one per sub-band. The Laplace parameter for each distribution is transmitted to the decoder using a custom method. The geometry of the point cloud is encoded using the well-established octree scanning. Results show that the proposed solution performs comparably to the current state-of-the-art, on many occasions outperforming it, while being much more computationally efficient. We believe this work represents the state-of-the-art in intra-frame compression of point clouds for real-time 3D video.
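
The abstract describes the transform only as an adaptive variation of a Haar wavelet. As a rough illustration, the following is a minimal sketch of the weighted two-point butterfly commonly given for RAHT in the literature, where each value carries a weight equal to the number of points it already represents. The function names are hypothetical, and the sketch covers a single merge step, not the full hierarchy or the entropy coder.

```python
import math

def raht_butterfly(a1, w1, a2, w2):
    """Merge two neighbouring attribute values a1, a2 with weights w1, w2.

    Returns (low, high, merged_weight). With w1 == w2 this reduces to the
    orthonormal Haar step.
    """
    s = math.sqrt(w1 + w2)
    c1, c2 = math.sqrt(w1) / s, math.sqrt(w2) / s
    low = c1 * a1 + c2 * a2      # passed up to the next level
    high = -c2 * a1 + c1 * a2    # detail coefficient to be quantized and coded
    return low, high, w1 + w2

def raht_inverse(low, high, w1, w2):
    """Invert raht_butterfly; the 2x2 matrix is orthonormal, so use its transpose."""
    s = math.sqrt(w1 + w2)
    c1, c2 = math.sqrt(w1) / s, math.sqrt(w2) / s
    return c1 * low - c2 * high, c2 * low + c1 * high

# Example: a value that already stands for 3 points merged with a single point.
low, high, w = raht_butterfly(10.0, 3, 14.0, 1)
a1, a2 = raht_inverse(low, high, 3, 1)
```

Because the step is orthonormal, it preserves energy, which is what allows the quantization error of the coefficients to be related directly to the error in the reconstructed colors.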

... On the contrary, G-PCC encodes the point cloud frame directly in 3D space. It first adopts an octree structure to represent the point cloud geometry, and uses Predicting/Lifting Transforms or the Region-Adaptive Hierarchical Transform (RAHT [9]) for attribute coding. ...
... However, the frequent eigenvalue decomposition of the Laplacian matrix results in high computational cost. To alleviate this problem, the region adaptive hierarchical transform (RAHT) was proposed in [9], which utilizes a hierarchical subband transform that resembles an adaptive variation of the Haar wavelet. RAHT can significantly reduce computational complexity and achieve faster processing speed through the adaptive hierarchical transform strategy, and it has been adopted in G-PCC on account of its superior efficiency. ...
... The bits per point (Bpp) serves as the metric to measure the bit cost of compressed attribute values. Figure 7 shows the rate-distortion curves of our method and three other representative methods including G-PCC (TMC13) [59], Deep-PCAC [21] and RAHT-RLGR [9]. Note that SparsePCAC [19] is not included because its source code has not been released. ...
Preprint
Full-text available
With the great progress of 3D sensing and acquisition technology, the volume of point cloud data has grown dramatically, which urges the development of efficient point cloud compression methods. In this paper, we focus on the task of learned lossy point cloud attribute compression (PCAC). We propose an efficient attention-based method for lossy compression of point cloud attributes leveraging an autoencoder architecture. Specifically, at the encoding side, we conduct multiple downsampling steps to best exploit the local attribute patterns, in which an effective External Cross Attention (ECA) is devised to hierarchically aggregate features by integrating attribute and geometry contexts. At the decoding side, the attributes of the point cloud are progressively reconstructed based on the multi-scale representation and the zero-padding upsampling tactic. To the best of our knowledge, this is the first approach to introduce an attention mechanism to the point-based lossy PCAC task. We verify the compression efficiency of our model on various sequences, including human body frames, sparse objects, and large-scale point cloud scenes. Experiments show that our method achieves an average improvement of 1.15 dB and 2.13 dB in BD-PSNR of the Y channel and YUV channels, respectively, compared with the state-of-the-art point-based method Deep-PCAC. Codes of this paper are available at https://github.com/I2-Multimedia-Lab/Att2CPC.
... Due to the diverse attributes and intricate rendering procedure of the 3D-GS, developing an efficient compression method for 3D Gaussians presents significant challenges. Previous studies on point cloud compression [8,12,44,55] involve voxelizing the point cloud and applying transformations, quantization, and entropy encoding. However, these approaches cannot support fine-tuning to restore the quality of compressed 3D Gaussians and are limited to conventional point clouds with basic attributes like color and normals. ...
... Regarding attribute transformation, we first transform the rotation quaternions (4 numbers) into Euler angles (3 numbers), a lossless process that reduces the storage requirement for each Gaussian by one number. Then, we adopt the region adaptive hierarchical transform (RAHT) [8] to reduce the entropy of key attributes: opacity, scales, Euler angles, and 0-degree SH coefficients. RAHT transforms a channel of the attribute into a DC coefficient and several AC coefficients with a concentrated distribution. ...
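
The snippet above does not say which Euler convention is used. As a minimal sketch, assuming a unit quaternion in (w, x, y, z) order and the common roll-pitch-yaw (ZYX) convention, the conversion and its inverse could look as follows. The function names are hypothetical, and the round trip is exact only up to floating-point rounding, the sign of the quaternion (q and -q encode the same rotation), and the degenerate case where the pitch is ±90°.

```python
import math

def quat_to_euler(w, x, y, z):
    """Unit quaternion -> (roll, pitch, yaw) in radians, ZYX convention (assumed)."""
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against rounding pushing the argument outside [-1, 1].
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw

def euler_to_quat(roll, pitch, yaw):
    """(roll, pitch, yaw) in radians -> unit quaternion (w, x, y, z)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (cr * cp * cy + sr * sp * sy,
            sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy)

# Round trip on an arbitrary normalized quaternion.
raw = (0.9, 0.1, 0.3, 0.2)
norm = math.sqrt(sum(c * c for c in raw))
q = tuple(c / norm for c in raw)
q_back = euler_to_quat(*quat_to_euler(*q))
```

Storing the three angles instead of the four quaternion components is what saves one number per Gaussian.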
... For instance, [55] constructs a graph from the point cloud and applies the graph Fourier transform to attributes. [8] introduces Haar wavelet transforms to attribute compression. Quantization is used to convert coefficients from transform coding into transmitted symbols and reduce the high-frequency components. ...
Preprint
Full-text available
3D Gaussian Splatting demonstrates excellent quality and speed in novel view synthesis. Nevertheless, the huge file size of the 3D Gaussians presents challenges for transmission and storage. Current works design compact models to replace the substantial volume and attributes of 3D Gaussians, along with intensive training to distill information. These endeavors demand considerable training time, presenting formidable hurdles for practical deployment. To this end, we propose MesonGS, a codec for post-training compression of 3D Gaussians. Initially, we introduce a measurement criterion that considers both view-dependent and view-independent factors to assess the impact of each Gaussian point on the rendering output, enabling the removal of insignificant points. Subsequently, we decrease the entropy of attributes through two transformations that complement subsequent entropy coding techniques to enhance the file compression rate. More specifically, we first replace rotation quaternions with Euler angles; then, we apply region adaptive hierarchical transform to key attributes to reduce entropy. Lastly, we adopt finer-grained quantization to avoid excessive information loss. Moreover, a well-crafted finetune scheme is devised to restore quality. Extensive experiments demonstrate that MesonGS significantly reduces the size of 3D Gaussians while preserving competitive quality.
... The idea of using point clouds as a data structure for streaming dynamic volumetric multimedia experiences over the Internet leads to significant bandwidth requirements, which establishes the need for efficient compression approaches. Standardization proposes solutions for point cloud compression through G-PCC [3], which uses octree data structures for the geometry and tailored approaches for the attributes like the Region Adaptive Hierarchical Transform (RAHT) [4]. When considering point cloud sequences, V-PCC [5] utilizes video codecs to compress projections of the point cloud, thus making it possible to exploit inter-frame correlations in 2D. ...
... MPEG standards [20] distinguish between static point cloud compression using Geometry-based PCC (G-PCC) [3] and dynamic point cloud compression with Video-based PCC (V-PCC) [5]. While G-PCC relies on octree partitions for geometry compression and methods like Region Adaptive Hierarchical Transform (RAHT) [4] for attributes, V-PCC projects geometry and attribute information into video frames and leverages video codecs for compression. Driven by promising results in learned image compression [6][7][8], learning based solutions for point cloud geometry or attribute compression showed competitive results in both domains. ...
... We note that early work explored learned compression of attributes and geometry in a single model [15], however, learned compression methods using separate models for both modalities [17] were shown to outperform the single model approach. The latter further boosts performance through wrapping RAHT [4] implemented in G-PCC [3] in the attribute branch and applying adaptive filters [22], which need to be optimized during each encoding, adding to the already large computational cost. ...
Preprint
Full-text available
Point cloud compression is essential to experience volumetric multimedia as it drastically reduces the required streaming data rates. Point attributes, specifically colors, extend the challenge of lossy compression beyond geometric representation to achieving joint reconstruction of texture and geometry. State-of-the-art methods separate geometry and attributes to compress them individually. This comes at a computational cost, requiring an encoder and a decoder for each modality. Additionally, as attribute compression methods require the same geometry for encoding and decoding, the encoder emulates the decoder-side geometry reconstruction as an input step to project and compress the attributes. In this work, we propose to learn joint compression of geometry and attributes using a single, adaptive autoencoder model, embedding both modalities into a unified latent space which is then entropy encoded. Key to the technique is to replace the search for trade-offs between rate, attribute quality and geometry quality, through conditioning the model on the desired qualities of both modalities, bypassing the need for training model ensembles. To differentiate important point cloud regions during encoding or to allow view-dependent compression for user-centered streaming, conditioning is pointwise, which allows for local quality and rate variation. Our evaluation shows comparable performance to state-of-the-art compression methods for geometry and attributes, while reducing complexity compared to related compression methods.
... removal of redundant points, the point cloud is voxelized, and then encoded by using methods such as trisoup [13], octree [14], or predictive tree [15], to generate the corresponding geometry bitstream. Next, the reconstructed geometry information is used for attribute encoding in which three methods, i.e., region-adaptive hierarchical transform (RAHT) [16], levels of detail (LoD)-based predictive transform (PT) [17], and lifting transform (LT) [18], can be selected. Arithmetic entropy coding is then applied to the transformed coefficients to generate the corresponding attribute bitstream. ...
... II. RELATED WORK Research on point cloud compression has made notable progress in recent years, accompanied by ongoing enhancements and optimizations in the G-PCC standard. Queiroz et al. proposed the region-adaptive hierarchical transform in [16], which was adopted by G-PCC as a transform coding method for dense point clouds. RAHT relies on a pre-partitioned octree structure, where each non-empty voxel block (hereafter, a transform block) contains 2×2×2 smaller blocks (hereafter, sub-blocks). ...
Preprint
Full-text available
Three-dimensional (3D) point clouds are becoming more and more popular for representing 3D objects and scenes. Due to limited network bandwidth, efficient compression of 3D point clouds is crucial. To tackle this challenge, the Moving Picture Experts Group (MPEG) is actively developing the Geometry-based Point Cloud Compression (G-PCC) standard, incorporating innovative methods to optimize compression, such as the Region-Adaptive Hierarchical Transform (RAHT) nestled within a layer-by-layer octree structure. Nevertheless, a notable problem still exists in RAHT, i.e., the proportion of zero residuals in the last few RAHT layers leads to unnecessary bitrate consumption. To address this problem, we propose an adaptive skip coding method for RAHT, which adaptively determines whether to encode the residuals of the last several layers or not, thereby improving the coding efficiency. In addition, we propose a rate-distortion cost calculation method associated with an adaptive Lagrange multiplier. Experimental results demonstrate that the proposed method achieves average Bjøntegaard rate improvements of -3.50%, -5.56%, and -4.18% for the Luma, Cb, and Cr components, respectively, on dynamic point clouds, when compared with the state-of-the-art G-PCC reference software under the common test conditions recommended by MPEG.
... Five different point clouds, shown in Fig. 1, were selected. The MPEG Geometry Point Cloud Compression (G-PCC) [1] standard, using the octree mode for geometry and the Region Adaptive Hierarchical Transform (RAHT) [7] and the predicting/lifting (predlift) [1] transform for color attribute encoding, was used in this work. Moreover, the learning-based solution JPEG Pleno Point Cloud Verification Model [8] was also included in this study. ...
... The octree mode was selected, as it is usually the most widely used and outperforms trisoup [2]. In terms of color encoding, G-PCC allows the use of the Region Adaptive Hierarchical Transform (RAHT) [7], the predicting transform [1], and the lifting transform [1]. ...
Preprint
Full-text available
Typically, point cloud encoders allocate a similar bitrate for geometry and attributes (usually RGB color components) information coding. This paper reports a quality study considering different coding bitrate tradeoffs between geometry and attributes. A set of five point clouds, representing different characteristics and types of content, was encoded with the MPEG standard Geometry Point Cloud Compression (G-PCC), using octree to encode geometry information, and both the Region Adaptive Hierarchical Transform and the Prediction Lifting transform for attributes. Furthermore, the JPEG Pleno Point Cloud Verification Model was also tested. Five different attributes/geometry bitrate tradeoffs were considered, notably 70%/30%, 60%/40%, 50%/50%, 40%/60%, 30%/70%. Three point cloud objective metrics were selected to assess the quality of the reconstructed point clouds, notably the PSNR YUV, the Point Cloud Quality Metric, and GraphSIM. Furthermore, for each encoder, the Bjøntegaard Deltas were computed for each tradeoff, using the 50%/50% tradeoff as a reference. The reported results indicate that using a higher bitrate allocation for attribute encoding usually yields slightly better results.
... The compression of geometric information is usually based on octree [12][13][14] or mesh/surface methods [15,16]. For the compression of attribute information, hierarchical neighborhood prediction algorithms based on Prediction With Lifting Transform (PredLift) [17,18] and Region-Adaptive Hierarchical Transform (RAHT) [19] have been designed. These compression standards are designed to efficiently compress different types of PCs, so as to meet the needs of various applications. ...
... However, this method is limited by the high computational complexity of the graph Laplacian eigenvalue decomposition, with high computational costs incurred. To address this problem, Queiroz and Chou [19] introduced the Region-Adaptive Hierarchical Transform (RAHT) method to compress attributes. By employing a kind of hierarchical sub-band transform similar to the adaptive variation of the Haar wavelet [50], this method can significantly reduce the computational complexity, while maintaining efficient compression performance. ...
Article
Full-text available
As a compression standard, Geometry-based Point Cloud Compression (G-PCC) can effectively reduce data by compressing both geometric and attribute information. Even so, due to coding errors and data loss, point clouds (PCs) still face distortion challenges. For example, the encoding of attribute information may lead to spatial detail loss and visible artifacts, which negatively impact visual quality. To address these challenges, this paper proposes an iterative removal method for attribute compression artifacts based on a graph neural network. First, the geometric coordinates of the PCs are used to construct a graph that accurately reflects the spatial structure, with the PC attributes treated as signals on the graph’s vertices. Adaptive graph convolution is then employed to dynamically focus on the areas most affected by compression, while a bi-branch attention block is used to restore high-frequency details. To maintain overall visual quality, a spatial consistency mechanism is applied to the recovered PCs. Additionally, an iterative strategy is introduced to correct systematic distortions, such as additive bias, introduced during compression. The experimental results demonstrate that the proposed method produces finer and more realistic visual details, compared to state-of-the-art techniques for PC attribute compression artifact removal. Furthermore, the proposed method significantly reduces the network runtime, enhancing processing efficiency.
... However, transposing efficient signal processing tools from 2D spaces to sparse and irregularly sampled 3D point data presents challenges. Existing attribute compression approaches mostly rely on geometry-dependent transformations [2][3][4], where the geometry representation significantly influences attribute compression. Point cloud geometry can adopt point, octree, or voxel representations. ...
... Many methods for compressing point cloud attributes rely on transforms that depend on geometry [2][3][4]. For example, the approach proposed in [2] constructs a graph from geometry, treating attributes as signals on this graph. ...
Preprint
Recent advancements in point cloud compression have primarily emphasized geometry compression, while comparatively fewer efforts have been dedicated to attribute compression. This study introduces an end-to-end learned dynamic lossy attribute coding approach, utilizing an efficient high-dimensional convolution to capture extensive inter-point dependencies. This enables the efficient projection of attribute features into latent variables. Subsequently, we employ a context model that leverages the previous latent space in conjunction with an auto-regressive context model for encoding the latent tensor into a bitstream. Evaluation of our method on widely utilized point cloud datasets from MPEG and Microsoft demonstrates its superior performance compared to the core attribute compression module, the Region-Adaptive Hierarchical Transform method from MPEG Geometry Point Cloud Compression, with a 38.1% Bjøntegaard Delta-rate saving on average while ensuring low-complexity encoding/decoding.
... Then, the geometry values can be encoded using either an octree-like voxelization of the point cloud or a method named trisoup, which approximates the object's surface by a series of triangles, followed by an arithmetic coder. Finally, G-PCC encodes the attributes using one of three available transforming tools: Region Adaptive Hierarchical Transform (RAHT) [115], the Predicting Transform, and the Lifting Transform, also followed by an arithmetic coder. ...
Thesis
Full-text available
The advancement of autonomous driving marks a pivotal shift in the automotive industry, steering us toward a future characterized by smarter, safer, and more sustainable transportation systems. This relies heavily on robust perception systems capable of accurately interpreting their surroundings in real time. Light Detection And Ranging (LiDAR) sensors have emerged as a critical component of these systems, providing high-resolution 3D point clouds that allow vehicles to map their environment. However, the vast amount of data generated by LiDAR sensors, often reaching several gigabits per second, presents significant challenges in terms of data processing, storage, and transmission, particularly for embedded automotive systems with constrained bandwidth and memory. This dissertation addresses this challenge by introducing Hardware Assisted Range Image Compression (HARIC), a hardware-accelerated point cloud compression algorithm designed specifically for automotive LiDAR systems. The proposed solution is implemented within the Advanced LiDAR Framework for Automotive (ALFA) and evaluated in two configurations: a software-only setup and one accelerated with Field-Programmable Gate Array (FPGA) hardware. This integration with the ALFA framework allows HARIC to take advantage of the Robot Operating System 2 (ROS2) capabilities and ensures a cohesive ecosystem and efficient communication between the various components. The algorithm’s performance was rigorously tested using real-world automotive datasets covering diverse driving scenarios. This evaluation demonstrated HARIC’s efficiency in real-time operation, achieving comparable accuracy to the software-only implementation while reducing processing time by 200%.
... Additionally, G-PCC offers a geometry coding technique called triangle soup (trisoup), which approximates object surfaces using triangle meshes and performs particularly well at low bit rates. For attribute compression, G-PCC employs linear transforms based on geometry, including the region-adaptive hierarchical transform (RAHT) [25], which predicts attribute values at higher levels of the octree based on lower-level values. G-PCC further enhances its performance with improved entropy coding and prediction of RAHT coefficients. ...
Preprint
Point clouds have gained prominence in numerous applications due to their ability to accurately depict 3D objects and scenes. However, compressing unstructured, high-precision point cloud data effectively remains a significant challenge. In this paper, we propose NeRC³, a novel point cloud compression framework leveraging implicit neural representations to handle both geometry and attributes. Our approach employs two coordinate-based neural networks to implicitly represent a voxelized point cloud: the first determines the occupancy status of a voxel, while the second predicts the attributes of occupied voxels. By feeding voxel coordinates into these networks, the receiver can efficiently reconstruct the original point cloud's geometry and attributes. The neural network parameters are quantized and compressed alongside auxiliary information required for reconstruction. Additionally, we extend our method to dynamic point cloud compression with techniques to reduce temporal redundancy, including a 4D spatial-temporal representation termed 4D-NeRC³. Experimental results validate the effectiveness of our approach: for static point clouds, NeRC³ outperforms octree-based methods in the latest G-PCC standard. For dynamic point clouds, 4D-NeRC³ demonstrates superior geometry compression compared to state-of-the-art G-PCC and V-PCC standards and achieves competitive results for joint geometry and attribute compression.
... Its practicality is hindered by the computationally intensive task of repeatedly solving eigen-decompositions, which makes it unsuitable for real-time applications. Queiroz and Chou proposed a Region-Adaptive Hierarchical Transform (RAHT) that employs a hierarchical sub-band transform [42]. It is adaptive and akin to a modified Haar wavelet, coupled with arithmetic coding that assumes Laplace distributions for the coefficients of each sub-band. ...
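
To make the Laplace assumption concrete, here is a minimal sketch of how a zero-mean Laplace model can assign a probability to each quantized coefficient of one sub-band, which is what an arithmetic coder needs. Everything beyond the Laplace assumption itself is hypothetical: the uniform quantizer with step `step`, the maximum-likelihood estimate of the scale, and all function names. The custom method for transmitting the Laplace parameter mentioned in the paper's abstract is not reproduced here.

```python
import math

def laplace_scale(coeffs):
    """Maximum-likelihood scale b of a zero-mean Laplace: mean absolute value."""
    return sum(abs(c) for c in coeffs) / len(coeffs)

def bin_probability(k, step, b):
    """Probability that a Laplace(0, b) sample quantizes to integer index k.

    Index k covers the interval [(k - 0.5) * step, (k + 0.5) * step).
    """
    if k == 0:
        return 1.0 - math.exp(-0.5 * step / b)
    lo = (abs(k) - 0.5) * step
    hi = (abs(k) + 0.5) * step
    return 0.5 * (math.exp(-lo / b) - math.exp(-hi / b))

def estimated_bits(coeffs, step):
    """Ideal code length (in bits) of one sub-band under its own Laplace model."""
    b = laplace_scale(coeffs)
    total = 0.0
    for c in coeffs:
        k = int(math.floor(c / step + 0.5))  # uniform quantizer index
        total += -math.log2(bin_probability(k, step, b))
    return total

subband = [0.4, -1.2, 3.1, 0.0, -0.3, 0.9, -2.2, 0.1]
bits = estimated_bits(subband, step=1.0)
```

Using one scale per sub-band lets the coder adapt to the fact that high-frequency sub-bands are usually much more concentrated around zero than low-frequency ones.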
Preprint
The evolution of 3D visualization techniques has fundamentally transformed how we interact with digital content. At the forefront of this change is point cloud technology, offering an immersive experience that surpasses traditional 2D representations. However, the massive data size of point clouds presents significant challenges in data compression. Current methods for lossy point cloud attribute compression (PCAC) generally focus on reconstructing the original point clouds with minimal error. However, for point cloud visualization scenarios, the reconstructed point clouds with distortion still need to undergo a complex rendering process, which affects the final user-perceived quality. In this paper, we propose an end-to-end deep learning framework that seamlessly integrates PCAC with differentiable rendering, denoted as rendering-oriented PCAC (RO-PCAC), directly targeting the quality of rendered multiview images for viewing. In a differentiable manner, the impact of the rendering process on the reconstructed point clouds is taken into account. Moreover, we characterize point clouds as sparse tensors and propose a sparse tensor-based transformer, called SP-Trans. By aligning with the local density of the point cloud and utilizing an enhanced local attention mechanism, SP-Trans captures the intricate relationships within the point cloud, further improving feature analysis and synthesis within the framework. Extensive experiments demonstrate that the proposed RO-PCAC achieves state-of-the-art compression performance, compared to existing reconstruction-oriented methods, including traditional, learning-based, and hybrid methods.
... These anchor primitives serve as references for predicting non-anchor primitives across different LoDs to reduce spatial redundancy. For anchor primitives, we employ the region adaptive hierarchical transform (RAHT) [10] ... accuracy. For non-anchor primitives, each is predicted by the k-nearest anchor primitives. ...
Preprint
3D Gaussian Splatting (GS) demonstrates excellent rendering quality and generation speed in novel view synthesis. However, substantial data size poses challenges for storage and transmission, making 3D GS compression an essential technology. Current 3D GS compression research primarily focuses on developing more compact scene representations, such as converting explicit 3D GS data into implicit forms. In contrast, compression of the GS data itself has hardly been explored. To address this gap, we propose a Hierarchical GS Compression (HGSC) technique. Initially, we prune unimportant Gaussians based on importance scores derived from both global and local significance, effectively reducing redundancy while maintaining visual quality. An octree structure is used to compress 3D positions. Based on the 3D GS octree, we implement a hierarchical attribute compression strategy by employing a KD-tree to partition the 3D GS into multiple blocks. We apply farthest point sampling to select anchor primitives within each block and treat the others as non-anchor primitives with varying Levels of Detail (LoDs). Anchor primitives serve as reference points for predicting non-anchor primitives across different LoDs to reduce spatial redundancy. For anchor primitives, we use the region adaptive hierarchical transform to achieve near-lossless compression of various attributes. For non-anchor primitives, each is predicted based on the k-nearest anchor primitives. To further minimize prediction errors, the reconstructed LoD and anchor primitives are combined to form new anchor primitives to predict the next LoD. Our method notably achieves superior compression quality and a significant data size reduction of over 4.5 times compared to the state-of-the-art compression method on small-scene datasets.
... There is a simple mapping from the ROI to the distortion measure, which can be quantified (for example, using perceptual experiments) independently of any particular progressive codec. As our codec, we chose transform coding with the Region-Adaptive Hierarchical Transform (RAHT) [17] because it is automatically optimized for the distortion measure by virtue of its measure-theoretic interpretation [18] and can be used in a progressive coder. The proposed method is also independent of any region-of-interest or saliency detection algorithm. ...
... We evaluate chroma subsampling on the 8i PC dataset. We use a transform coding system comprising the Region Adaptive Hierarchical Transform (RAHT) [37] and run-length Golomb-Rice entropy coding [38]. We compare the rate-distortion (RD) curves, and report bitrate savings and average PSNR gain using the Bjøntegaard metric [39]. ...
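
For readers unfamiliar with Golomb-Rice codes, the following is a minimal sketch of a plain, fixed-parameter Golomb-Rice coder for signed integers such as quantized transform coefficients. It is deliberately simpler than the adaptive run-length Golomb-Rice scheme cited in the snippet above: there is no run-length mode and no parameter adaptation. The bit-string representation and all function names are illustrative assumptions.

```python
def zigzag(n):
    """Map signed ints to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)

def rice_encode(values, k):
    """Encode signed ints with Rice parameter k; returns a string of '0'/'1'."""
    pieces = []
    for v in values:
        u = zigzag(v)
        quotient, remainder = u >> k, u & ((1 << k) - 1)
        pieces.append("1" * quotient + "0")           # unary quotient
        if k:
            pieces.append(format(remainder, "0{}b".format(k)))  # k-bit remainder
    return "".join(pieces)

def rice_decode(bits, k, count):
    """Decode `count` signed ints from a bit string produced by rice_encode."""
    out, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating '0' of the unary part
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((quotient << k) | remainder))
    return out

coeffs = [0, -1, 3, 0, 0, 7, -4]
stream = rice_encode(coeffs, k=2)
decoded = rice_decode(stream, k=2, count=len(coeffs))
```

Small values cost about k + 1 bits each, so the choice of k trades off the cost of near-zero coefficients against the cost of large ones; adaptive variants adjust k as they go.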
Preprint
Full-text available
3D Point clouds (PCs) are commonly used to represent 3D scenes. They can have millions of points, making subsequent downstream tasks such as compression and streaming computationally expensive. PC sampling (selecting a subset of points) can be used to reduce complexity. Existing PC sampling algorithms focus on preserving geometry features and often do not scale to handle large PCs. In this work, we develop scalable graph-based sampling algorithms for PC color attributes, assuming the full geometry is available. Our sampling algorithms are optimized for a signal reconstruction method that minimizes the graph Laplacian quadratic form. We first develop a global sampling algorithm that can be applied to PCs with millions of points by exploiting sparsity and sampling rate adaptive parameter selection. Further, we propose a block-based sampling strategy where each block is sampled independently. We show that sampling the corresponding sub-graphs with optimally chosen self-loop weights (node weights) will produce a sampling set that approximates the results of global sampling while reducing complexity by an order of magnitude. Our empirical results on two large PC datasets show that our algorithms outperform the existing fast PC subsampling techniques (uniform and geometry feature preserving random sampling) by 2dB. Our algorithm is up to 50 times faster than existing graph signal sampling algorithms while providing better reconstruction accuracy. Finally, we illustrate the efficacy of PC attribute sampling within a compression scenario, showing that pre-compression sampling of PC attributes can lower the bitrate by 11% while having minimal effect on reconstruction.
... Block-based coding schemes, commonly used in volumetric compression [1,12], can be further optimized through data filtering [22,23]. Recent works also leverage various tools, such as the Karhunen-Loève transform [50], autoencoders [5,51,52], and wavelet transform [14], to compress nodes in tree structures. While these compressed explicit representations suit traditional graphics, they fall short in rendering photorealistic images from free viewpoints. ...
Preprint
Full-text available
The goal of this paper is to encode a 3D scene into an extremely compact representation from 2D images and to enable its transmission, decoding and rendering in real time across various platforms. Despite the progress in NeRFs and Gaussian Splats, their large model size and specialized renderers make it challenging to distribute free-viewpoint 3D content as easily as images. To address this, we have designed a novel 3D representation that encodes the plenoptic function into sinusoidal function indexed dense volumes. This approach facilitates feature sharing across different locations, improving compactness over traditional spatial voxels. The memory footprint of the dense 3D feature grid can be further reduced using spatial decomposition techniques. This design combines the strengths of spatial hashing functions and voxel decomposition, resulting in a model size as small as 150 KB for each 3D scene. Moreover, PPNG features a lightweight rendering pipeline with only 300 lines of code that decodes its representation into standard GL textures and fragment shaders. This enables real-time rendering using the traditional GL pipeline, ensuring universal compatibility and efficiency across various platforms without additional dependencies.
... Researchers such as Zhang et al. [22] directly used GFT, while others like Robert et al. [24] and Song et al. [25] proposed methods based on block prediction and GFT. Cohen et al. [26] and Ricardo et al. [27] introduced approaches using 3D block prediction and hierarchical transforms, respectively. Chen et al. [28] developed a self-loop weighted graph using normalized graph Laplacian to define GFT. ...
Preprint
We propose an end-to-end attribute compression method for dense point clouds. The proposed method combines a frequency sampling module, an adaptive scale feature extraction module with geometry assistance, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform to extract high-frequency components of the point cloud. The difference between the original point cloud and the sampled point cloud is divided into multiple sub-point clouds. These sub-point clouds are then partitioned using an octree, providing a structured input for feature extraction. The feature extraction module integrates adaptive convolutional layers and uses offset-attention to capture both local and global features. Then, a geometry-assisted attribute feature refinement module is used to refine the extracted attribute features. Finally, a global hyperprior model is introduced for entropy encoding. This model propagates hyperprior parameters from the deepest (base) layer to the other layers, further enhancing the encoding efficiency. At the decoder, a mirrored network is used to progressively restore features and reconstruct the color attribute through transposed convolutional layers. The proposed method encodes base layer information at a low bitrate and progressively adds enhancement layer information to improve reconstruction accuracy. Compared to the latest G-PCC test model (TMC13v23) under the MPEG common test conditions (CTCs), the proposed method achieved an average Bjontegaard delta bitrate reduction of 24.58% for the Y component (21.23% for YUV combined) on the MPEG Category Solid dataset and 22.48% for the Y component (17.19% for YUV combined) on the MPEG Category Dense dataset. This is the first instance of a learning-based codec outperforming the G-PCC standard on these datasets under the MPEG CTCs.
... Transform coding methods usually treat the multimedia data as a collection of vectors, where each vector represents a local fixed-size patch of pixels. These block-based methods also exist for 3D volumetric data [4], where 3D patches of voxels are encoded using linear transformations [10,42,43]. These methods are not spatially adaptive, in that each equal-sized patch of an image or volume goes through the same encoding regardless of the local resolution content. ...
Preprint
We present Lagrangian Hashing, a representation for neural fields combining the characteristics of fast training NeRF methods that rely on Eulerian grids (i.e., InstantNGP), with those that employ points equipped with features as a way to represent information (e.g., 3D Gaussian Splatting or PointNeRF). We achieve this by incorporating a point-based representation into the high-resolution layers of the hierarchical hash tables of an InstantNGP representation. As our points are equipped with a field of influence, our representation can be interpreted as a mixture of Gaussians stored within the hash table. We propose a loss that encourages the movement of our Gaussians towards regions that require more representation budget to be sufficiently well represented. Our main finding is that our representation allows the reconstruction of signals using a more compact representation without compromising quality.
... A related topic is the compression of point cloud attributes such as colors, reflectivity or semantic labels. Although this topic is outside the scope of this paper, we mention two approaches based on graph transforms [ZFL14] and Haar wavelets [DQC16]. ...
Article
Full-text available
3D point clouds stand as one of the prevalent representations for 3D data, offering the advantage of closely aligning with sensing technologies and providing an unbiased representation of a measured physical scene. Progressive compression is required for real‐world applications operating on networked infrastructures with restricted or variable bandwidth. We contribute a novel approach that leverages a recursive binary space partition, where the partitioning planes are not necessarily axis‐aligned and optimized via an entropy criterion. The planes are encoded via a novel adaptive quantization method combined with prediction. The input 3D point cloud is encoded as an interlaced stream of partitioning planes and number of points in the cells of the partition. Compared to previous work, the added value is an improved rate‐distortion performance, especially for very low bitrates. The latter are critical for interactive navigation of large 3D point clouds on heterogeneous networked infrastructures.
... G-PCC [1] employs an octree-based compression method. Regarding the attribute compression of G-PCC, it integrates three attribute compression methods: Region-Adaptive Hierarchical Transform (RAHT) [15], predictive transform, and lifting transform [16]. RAHT is a variant of the Haar wavelet transform, using lower octree levels to predict values at the next level. ...
Preprint
The point cloud has become the mainstream representation for advanced 3D applications, such as virtual reality and augmented reality. However, the massive data volume of point clouds is one of the most challenging issues for transmission and storage. In this paper, we propose an end-to-end voxel Transformer and Sparse Convolution based Point Cloud Attribute Compression (TSC-PCAC) for 3D broadcasting. Firstly, we present a framework of the TSC-PCAC, which includes a Transformer and Sparse Convolutional Module (TSCM) based variational autoencoder and a channel context module. Secondly, we propose a two-stage TSCM, where the first stage focuses on modeling local dependencies and feature representations of the point clouds, and the second stage captures global features through spatial and channel pooling encompassing larger receptive fields. This module effectively extracts global and local interpoint relevance to reduce informational redundancy. Thirdly, we design a TSCM based channel context module to exploit interchannel correlations, which improves the predicted probability distribution of quantized latent representations and thus reduces the bitrate. Experimental results indicate that the proposed TSC-PCAC method achieves an average of 38.53%, 21.30%, and 11.19% Bjontegaard Delta bitrate reductions compared to the Sparse-PCAC, NF-PCAC, and G-PCC v23 methods, respectively. The encoding/decoding time costs are reduced up to 97.68%/98.78% on average compared to the Sparse-PCAC. The source code and the trained models of the TSC-PCAC are available at https://github.com/igizuxo/TSC-PCAC.
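The citation excerpt above characterizes RAHT as a variant of the Haar wavelet transform. As a rough illustration only, the sketch below shows the weighted two-point butterfly commonly given for RAHT, in which each input is a low-pass coefficient carrying a weight (the number of points it summarizes). The function names are hypothetical, and the sketch omits the octree traversal, quantization, and entropy coding of a real codec.

```python
import math

def raht_butterfly(a1, w1, a2, w2):
    """Merge two neighboring low-pass coefficients a1, a2 with weights w1, w2.

    Returns (low, high, merged_weight). With w1 == w2 this reduces to the
    orthonormal Haar pair. Leaf inputs are raw attribute values with weight 1.
    """
    s = math.sqrt(w1 + w2)
    c1, c2 = math.sqrt(w1) / s, math.sqrt(w2) / s
    low = c1 * a1 + c2 * a2      # passed on to the next (coarser) level
    high = -c2 * a1 + c1 * a2    # detail coefficient to be quantized and coded
    return low, high, w1 + w2

def inverse_raht_butterfly(low, high, w1, w2):
    """Invert raht_butterfly; the 2x2 matrix is orthonormal, so use its transpose."""
    s = math.sqrt(w1 + w2)
    c1, c2 = math.sqrt(w1) / s, math.sqrt(w2) / s
    a1 = c1 * low - c2 * high
    a2 = c2 * low + c1 * high
    return a1, a2

# Usage: merge a single point with a cell that already summarizes three points.
low, high, weight = raht_butterfly(10.0, 1, 20.0, 3)
restored = inverse_raht_butterfly(low, high, 1, 3)
```

Because the butterfly is orthonormal, it preserves signal energy, and a region of constant color produces zero detail coefficients at every level when the low-pass outputs are fed forward with their merged weights.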
... Algorithms based on these two techniques were included in the G-PCC standard [8] as two alternatives for geometry encoding, which are here referred to as the octree module and the trisoup module, respectively. Moreover, two alternative color coding modules are included in the same standard, namely the Region Adaptive Hierarchical Transform (RAHT) [9] and nearest-neighbor prediction algorithm with an update step, which is denominated the predlift module. Draco [10], a library developed by Google for 3D data compression, also employs octree for the representation of point clouds. ...
Article
Full-text available
The recent rise in interest in point clouds as an imaging modality has motivated standardization groups such as JPEG and MPEG to launch activities aiming at developing compression standards for point clouds. Lossy compression usually introduces visual artifacts that negatively impact the perceived quality of media, which can only be reliably measured through subjective visual quality assessment experiments. While MPEG standards have been subjectively evaluated in previous studies on multiple occasions, no work has yet assessed the performance of the recent JPEG Pleno standard in comparison to them. In this study, a comprehensive performance evaluation of JPEG and MPEG standards for point cloud compression is conducted. The impact of different configuration parameters on the performance of the codecs is first analyzed with the help of objective quality metrics. The results from this analysis are used to define three rate allocation strategies for each codec, which are employed to compress a set of point clouds at four target rates. The set of distorted point clouds is then subjectively evaluated following two subjective quality assessment protocols. Finally, the obtained results are used to compare the performance of these compression standards and draw insights about best coding practices.
... Xu et al. in [28] introduce a spatio-temporal GFT for dynamic point cloud attribute compression. In response to the time-consuming nature of GFT-based codecs, Queiroz et al. in [29] devise a more efficient transform-based attribute coding scheme named RAHT, and this work is adopted as a part of the solution in MPEG G-PCC. Moreover, Song et al. in [18] design a block-adaptive graph transform method to improve attribute coding performance. ...
Article
Full-text available
There is a pressing need across various applications for efficiently compressing point clouds. While the Moving Picture Experts Group introduced the geometry-based point cloud compression (G-PCC) standard, its attribute compression scheme falls short of eliminating signal frequency-domain redundancy. This paper proposes a texture-guided graph transform optimization scheme for point cloud attribute compression. We formulate the attribute transform coding task as a graph optimization problem, considering both the decorrelation capability of the graph transform and the sparsity of the optimized graph within a tailored joint optimization framework. First, the point cloud is reorganized and segmented into local clusters using a Hilbert-based scheme, enhancing spatial correlation preservation. Second, the inter-cluster attribute prediction and intra-cluster prediction are conducted on local clusters to remove spatial redundancy and extract texture priors. Third, the underlying graph structure in each cluster is constructed in a joint rate–distortion–sparsity optimization process, guided by geometry structure and texture priors to achieve optimal coding performance. Finally, point cloud attributes are efficiently compressed with the optimized graph transform. Experimental results show the proposed scheme outperforms the state of the art with significant BD-BR gains, surpassing G-PCC by 31.02%, 30.71%, and 32.14% in BD-BR gains for Y, U, and V components, respectively. Subjective evaluation of the attribute reconstruction quality further validates the superiority of our scheme.
Preprint
Full-text available
Efficient point cloud (PC) compression is crucial for streaming applications, such as augmented reality and cooperative perception. Classic PC compression techniques encode all the points in a frame. Tailoring compression towards perception tasks at the receiver side, we ask the question, "Can we remove the ground points during transmission without sacrificing the detection performance?" Our study reveals a strong dependency on the ground from state-of-the-art (SOTA) 3D object detection models, especially on those points below and around the object. In this work, we propose a lightweight obstacle-aware Pillar-based Ground Removal (PGR) algorithm. PGR filters out ground points that do not provide context to object recognition, significantly improving compression ratio without sacrificing the receiver side perception performance. Not using heavy object detection or semantic segmentation models, PGR is lightweight, highly parallelizable, and effective. Our evaluations on KITTI and Waymo Open Dataset show that SOTA detection models work equally well with PGR removing 20-30% of the points, at a speed of 86 FPS.
Article
A universal multiscale conditional coding framework, Unicorn, is proposed to code the geometry and attribute of any given point cloud. Attribute compression is discussed in Part II of this paper, while geometry compression is given in Part I of this paper. We first construct the multiscale sparse tensors of each voxelized point cloud attribute frame. Since attribute components exhibit very different intrinsic characteristics from the geometry element, e.g., 8-bit RGB color versus 1-bit occupancy, we process the attribute residual between lower-scale reconstruction and current-scale data. Similarly, we leverage spatially lower-scale priors in the current frame and (previously processed) temporal reference frame to improve the probability estimation of attribute intensity through conditional residual prediction in lossless mode or enhance the attribute reconstruction through progressive residual refinement in lossy mode for better performance. The proposed Unicorn is a versatile, learning-based solution capable of compressing a great variety of static and dynamic point clouds in both lossy and lossless modes. Following the same evaluation criteria, Unicorn significantly outperforms standard-compliant approaches like MPEG G-PCC, V-PCC, and other learning-based solutions, yielding state-of-the-art compression efficiency with affordable encoding/decoding runtime.
Article
Point cloud (PC) compression is crucial to immersive visual applications such as autonomous vehicles to classify objects on the roads. The MPEG standardization group has achieved a notable compression efficiency, called video-based point-cloud compression (V-PCC), which consists of an encoder-decoder. The V-PCC encoder takes original 3D PC data and projects them onto multiple 2D planes to generate several 2D feature images. These images are then compressed using the well-established High-Efficiency Video Coding (HEVC) method. The V-PCC decoder uses compressed information and decoding techniques to reconstruct the 3D point cloud. However, the point clouds produced by V-PCC are often sparse, non-uniform, and contain artifacts. In many practical applications, it is necessary to recover complete point clouds from partial ones in real time. This paper presents a method for enhancing decoded point clouds as a post-processing step in the V-PCC with reduced computational time. Our approach involves a 2D upsampling for the V-PCC occupancy image, which increases the density of the point cloud, and a 2D high-resolution auxiliary information modification algorithm for the 2D-3D conversion of high-resolution 3D point clouds, which improves the uniformity and reduces the noise in the point cloud. The 3D high-resolution point cloud has been further enhanced using the developed 3D outlier removal and point regeneration algorithm. Our proposed work can significantly simplify the state-of-the-art superresolution methods for point clouds and reduce the time complexity by 61% to 75%, while maintaining a high level of quality in point clouds.
Article
Full-text available
Accurate digital elevation models (DEMs) derived from airborne light detection and ranging (LiDAR) data are crucial for terrain analysis applications. As established in the literature, higher point density improves terrain representation but requires greater data storage and processing capacities. Therefore, point cloud sampling is necessary to reduce densities while preserving DEM accuracy as much as possible. However, there has been a limited examination directly comparing the effects of various sampling algorithms on DEM accuracy. This study aimed to help fill this gap by evaluating and comparing the performance of three common point cloud sampling methods (octree, spatial, and random sampling) in high terrain. DEMs were then generated from the sampled point clouds using three different interpolation algorithms: inverse distance weighting (IDW), natural neighbor (NN), and ordinary kriging (OK). The results showed that octree sampling consistently produced the most accurate DEMs across all metrics and terrain slopes compared to other methods. Spatial sampling also produced more accurate DEMs than random sampling but was less accurate than octree sampling. The results can be attributed to differences in how the sampling methods represent terrain geometry and retain microtopographic detail. Octree sampling recursively subdivides the point cloud based on density distributions, closely conforming to complex microtopography. In contrast, random sampling disregards underlying densities, reducing accuracy in rough terrain. The findings guide optimal sampling and interpolation methods of airborne LiDAR point clouds for generating DEMs for similar complex mountainous terrains.
Article
Despite acceleration in the use of 3D meshes, it is difficult to find effective mesh quality assessment algorithms that can produce predictions highly correlated with human subjective opinions. Defining mesh quality features is challenging due to the irregular topology of meshes, which are defined on vertices and triangles. To address this, we propose a novel 3D projective structural similarity index (3D-PSSIM) for meshes that is robust to differences in mesh topology. We address topological differences between meshes by introducing multi-view and multi-layer projections that can densely represent the mesh textures and geometrical shapes irrespective of mesh topology. It also addresses occlusion problems that occur during projection. We propose visual sensitivity weights that capture the perceptual sensitivity to the degree of mesh surface curvature. 3D-PSSIM computes perceptual quality predictions by aggregating quality-aware features that are computed in multiple projective spaces onto the mesh domain, rather than on 2D spaces. This allows 3D-PSSIM to determine which parts of a mesh surface are distorted by geometric or color impairments. Experimental results show that 3D-PSSIM can predict mesh quality with high correlation against human subjective judgments, across the presence of noise, even when there are large topological differences, outperforming existing mesh quality assessment models.
Article
In recent years, many standardized algorithms for point cloud compression (PCC) have been developed and have achieved remarkable compression ratios. To provide guidance for rate-distortion optimization and codec evaluation, point cloud quality assessment (PCQA) has become a critical problem for PCC. Therefore, in order to achieve a more consistent correlation with human visual perception of a compressed point cloud, we propose a full-reference PCQA algorithm tailored for static point clouds in this paper, which can jointly measure geometry and attribute deformations. Specifically, we assume that the quality decision of compressed point clouds is determined by both global appearance (e.g., density, contrast, complexity) and local details (e.g., gradient, hole). Motivated by the nature of compression distortions and the properties of the human visual system, we derive perceptually effective features for the above two categories, such as content complexity, luminance/geometry gradient, and hole probability. Through systematically incorporating measurements of variations in the local and global characteristics, we derive an effective quality index for the input compressed point clouds. Extensive experiments and analyses conducted on popular PCQA databases show the superiority of the proposed method in evaluating compression distortions. Subsequent investigations validate the efficacy of different components within the model design.
Article
Geometry-based point cloud compression (G-PCC) is a state-of-the-art point cloud compression standard. While G-PCC achieves excellent performance, its reliance on the predicting transform leads to a significant dependence problem, which can easily result in distortion accumulation. This not only increases bitrate consumption but also degrades reconstruction quality. To address these challenges, we propose a dependence-based coarse-to-fine approach for distortion accumulation in G-PCC attribute compression. Our method consists of three modules: level-based adaptive quantization, point-based adaptive quantization, and Wiener filter-based refinement level quality enhancement. The level-based adaptive quantization module addresses the interlevel-of-detail (LOD) dependence problem, while the point-based adaptive quantization module tackles the interpoint dependence problem. On the other hand, the Wiener filter-based refinement level quality enhancement module enhances the reconstruction quality of each point based on the dependence order among LODs. Extensive experimental results demonstrate the effectiveness of the proposed method. Notably, when the proposed method was implemented in the latest G-PCC test model (TMC13v23.0), a Bjøntegaard delta rate of -4.9%, -12.7%, and -14.0% was achieved for the Luma, Chroma Cb, and Chroma Cr components, respectively.
Chapter
Previously, our attention was directed toward techniques related to point cloud compression, encompassing transformation, quantization, entropy coding, and others. Within this section, our emphasis shifts toward methods for point cloud compression rooted in deep learning. Moreover, we delve extensively into the realm of learning-based 3D point cloud compression techniques presented at the MPEG conference. This endeavor aims to foster a more profound comprehension of point cloud compression methodologies.
Chapter
Advances in 3D representation technology have promoted the development of digital museums, automated driving, and other virtual/augmented reality applications. The 3D point cloud is widely used in these emerging applications, thanks to its efficient and concise simulation of 3D objects and scenes, and it provides free views to users via its geometry (the coordinate in 3D space) and attribute (e.g., color, reflectance). Nevertheless, the vast amount of data carried by point clouds limits the deployment of the related applications in terms of efficient communication and storage. To tackle this problem, the Moving Picture Experts Group (MPEG) has launched standardization for directly coding point clouds in 3D space, i.e., geometry-based point cloud compression (G-PCC) codec.
Chapter
The essence of transform coding is to transform signals from one domain (e.g., time domain) to another domain (e.g., frequency domain) with a set of orthogonal basis functions. The transform helps to eliminate signal correlation and reduce data redundancy. In this chapter, we will introduce some important transforms, including the commonly used discrete cosine transform (DCT), the wavelet transform, and the graph Fourier transform (GFT). We also show several applications of transform-based methods in point cloud attribute compression.
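To make the idea of an orthogonal transform that removes correlation more concrete, here is a minimal pure-Python sketch of the orthonormal DCT-II and its inverse. It is an illustration under simple assumptions (a 1D signal, a direct O(n^2) evaluation, hypothetical function names), not the chapter's own code.

```python
import math

def dct_orthonormal(x):
    """Orthonormal DCT-II of a 1D sequence, computed directly in O(n^2)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        coeffs.append(scale * acc)
    return coeffs

def idct_orthonormal(c):
    """Inverse of dct_orthonormal (the transpose of the same orthonormal basis)."""
    n = len(c)
    signal = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        signal.append(acc)
    return signal

# Usage: a smooth ramp; the coefficients can be inspected to see where the
# energy concentrates before any quantization is applied.
ramp = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
coeffs = dct_orthonormal(ramp)
recovered = idct_orthonormal(coeffs)
```

Since the basis is orthonormal, the total energy of the coefficients equals that of the input, and a constant signal maps entirely to the first (DC) coefficient.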
Article
There is an urgent need from various multimedia applications to efficiently compress point clouds. The Moving Picture Experts Group has released a standard platform called geometry-based point cloud compression (G-PCC). However, its k-nearest neighbor (k-NN) based attribute prediction has limited efficiency for point clouds with rich texture and directional information. To overcome this problem, we propose a texture-aware attribute predictive coding framework in a point cloud diffusion model. In our work, attribute intra prediction is solved as a diffusion-based interpolation problem, and a general attribute predictor is developed. It is theoretically proven that G-PCC k-NN based predictor is a degraded case of the proposed diffusion-based solution. First, a point cloud is represented as two levels of details with seeds as the inpainting mask and non-seed points to be predicted. Second, we design point cloud partial difference operators to perform energy-minimizing attribute inpainting from seeds to unknowns. Smooth attribute interpolation can be achieved via an iterative diffusion process, and an adaptive early termination is proposed to reduce complexity. Third, we propose a structure-adaptive attribute predictive coding scheme, where edge-enhancing anisotropic diffusion is employed to perform texture-aware attribute prediction. Finally, attributes of seeds are beforehand encoded and prediction residuals of left points are progressively encoded into bitstream. Experiments show the proposed scheme surpasses the state-of-the-art by an average of 14.14%, 17.52%, and 17.87% BD-BR gains on the coding of Y, U, and V components, respectively. Subjective results on attribute reconstruction quality also verify the advantage of our scheme.
Article
Full-text available
We present the first end-to-end solution to create high-quality free-viewpoint video encoded as a compact data stream. Our system records performances using a dense set of RGB and IR video cameras, generates dynamic textured surfaces, and compresses these to a streamable 3D video format. Four technical advances contribute to high fidelity and robustness: multimodal multi-view stereo fusing RGB, IR, and silhouette information; adaptive meshing guided by automatic detection of perceptually salient areas; mesh tracking to create temporally coherent subsequences; and encoding of tracked textured meshes as an MPEG video stream. Quantitative experiments demonstrate geometric accuracy, texture fidelity, and encoding efficiency. We release several datasets with calibrated inputs and processed results to foster future research.
Conference Paper
Full-text available
The next step in immersive communication beyond video from a single camera is object-based free viewpoint video, which is the capture and compression of a dynamic object such that it can be reconstructed and viewed from an arbitrary viewpoint. The moving human body is a particularly useful subclass of dynamic object for object-based free viewpoint video relevant to both telepresence and entertainment. In this paper, we compress moving human body sequences by applying recently developed Graph Wavelet Filter Banks to time-varying geometry and color signals living on a mesh representation of the human body. This model-based approach significantly outperforms state-of-the-art coding of the human body represented as ordinary depth plus color video sequences.
Article
Full-text available
In this paper, we present Viewport, a fully distributed system for immersive teleconferencing. The participant at each site is captured by a camera rig, which contains multiple color and infrared cameras and multiple infrared projectors. The projected infrared dot pattern is used to help reconstruct a 3D representation in real-time, which will be rendered in a shared virtual environment for immersive telecommunication. We conduct a multimodal fusion scheme for multiview stereo and 3D reconstruction, which combines the knowledge of the infrared dot patterns from the projectors and the observations of IR and RGB cameras. One novel aspect of our approach is the use of a sparse point cloud instead of dense multiview stereo for geometry reconstruction, which leads to a significant speed-up in 3D reconstruction and rendering. In addition, we introduce the scheme of virtual seating, where the point clouds are positioned to maintain the same seating geometry as face-to-face meetings after careful calibration, such that the mutual gaze between participants is faithfully maintained.
Article
Full-text available
3D video records dynamic 3D visual events as is. The application areas of 3D video include wide varieties of human activities. To promote these applications in our everyday life, a standardized compression scheme for 3D video is required. In this paper, we propose a practical and effective scheme for representing and compressing 3D video named skin-off, in which both the geometric and visual information are efficiently represented by cutting a 3D mesh and mapping it onto a 2D array. Our skin-off scheme shares much with geometry videos, proposed by Hoppe et al. However, while geometry videos employ the 3D surface shape information alone to generate 2D images, the skin-off scheme we are proposing employs both 3D shape and texture information to generate them. This enables us to achieve higher image quality with limited bandwidth. Experimental results demonstrate the effectiveness of the skin-off scheme.
Article
Full-text available
The task of dynamic mesh compression seeks to find a compact representation of a surface animation, while the artifacts introduced by the representation are as small as possible. In this paper, we present two geometric predictors, which are suitable for PCA-based compression schemes. The predictors exploit the knowledge about the geometrical meaning of the data, which allows a more accurate prediction, and thus a more compact representation. We also provide rate/distortion curves showing that our approach outperforms the current PCA-based compression methods by more than 20%.
Article
Full-text available
Three-dimensional (3D) meshes have been widely used in graphic applications for the representation of 3D objects. They often require a huge amount of data for storage and/or transmission in the raw data format. Since most applications demand compact storage, fast transmission, and efficient processing of 3D meshes, many algorithms have been proposed to compress 3D meshes efficiently since the early 1990s. In this survey paper, we examine 3D mesh compression technologies developed over the last decade, with the main focus on triangular mesh compression technologies. In this effort, we classify various algorithms into classes, describe main ideas behind each class, and compare the advantages and shortcomings of the algorithms in each class. Finally, we address some trends in the 3D mesh compression technology development.
Conference Paper
Full-text available
With the advent of new, low-cost 3D sensing hardware such as the Kinect, and continued efforts in advanced point cloud processing, 3D perception gains more and more importance in robotics, as well as other fields. In this paper we present one of our most recent initiatives in the areas of point cloud perception: PCL (Point Cloud Library - http://pointclouds.org). PCL presents an advanced and extensive approach to the subject of 3D perception, and it's meant to provide support for all the common 3D building blocks that applications need. The library contains state-of-the-art algorithms for: filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation. PCL is supported by an international community of robotics and perception researchers. We provide a brief walkthrough of PCL including its algorithmic capabilities and implementation strategies.
Article
Full-text available
Surface geometry is often modeled with irregular triangle meshes. The process of remeshing refers to approximating such geometry using a mesh with (semi)-regular connectivity, which has advantages for many graphics applications. However, current techniques for remeshing arbitrary surfaces create only semi-regular meshes. The original mesh is typically decomposed into a set of disk-like charts, onto which the geometry is parametrized and sampled. In this paper, we propose to remesh an arbitrary surface onto a completely regular structure we call a geometry image. It captures geometry as a simple 2D array of quantized points. Surface signals like normals and colors are stored in similar 2D arrays using the same implicit surface parametrization; texture coordinates are absent. To create a geometry image, we cut an arbitrary mesh along a network of edge paths, and parametrize the resulting single chart onto a square. Geometry images can be encoded using traditional image compression algorithms, such as wavelet-based coders.
Article
Full-text available
The compression of geometric structures is a relatively new field of data compression. Since about 1995, several articles have dealt with the coding of meshes, most of them using the following approach: the vertices of the mesh are coded in an order that partially contains the topology of the mesh. At the same time, some simple rules attempt to predict the position of each vertex from the positions of its neighbors that have been previously coded. In this article, we describe a compression algorithm whose principle is completely different: the coding order of the vertices is used to compress their coordinates, and then the topology of the mesh is reconstructed from the vertices. This algorithm achieves compression ratios that are slightly better than those of the currently available algorithms, and moreover, it allows progressive and interactive transmission of the meshes.
Conference Paper
We present a simple and efficient entropy coder that combines run-length and Golomb-Rice encoders. The encoder automatically switches between the two modes according to simple rules that adjust the encoding parameters based on the previous output codeword, and the decoder tracks such changes. This adaptive run-length/Golomb-Rice (RLGR) coder has a fast learning rate, making it suitable for many practical applications, which usually involve encoding small source blocks. We study the encoding of generalized Gaussian (GG) sources after quantization with uniform scalar quantizers with deadzone, which are good source models in multimedia data compression, for example. We show that, for a wide range of source parameters, the RLGR encoder has a performance close to that of the optimal Golomb-Rice and exp-Golomb coders designed with knowledge of the source statistics, and in some cases the RLGR coder improves coding efficiency by 20% or more.
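The Golomb-Rice building block mentioned in this abstract can be sketched as follows. This is a minimal illustration with a fixed parameter `k`, not the adaptive RLGR coder itself: the run-length mode and the parameter adaptation rules are omitted. The function names and the zigzag mapping of signed values are assumptions made for the example.

```python
def zigzag(n: int) -> int:
    """Map a signed integer to a non-negative one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(u: int) -> int:
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(values, k: int) -> str:
    """Encode signed integers as a bit string with a fixed Rice parameter k."""
    bits = []
    for v in values:
        u = zigzag(v)
        quotient, remainder = u >> k, u & ((1 << k) - 1)
        bits.append("1" * quotient + "0")          # unary-coded quotient
        if k:
            bits.append(format(remainder, f"0{k}b"))  # k-bit binary remainder
    return "".join(bits)


def rice_decode(bits: str, k: int, count: int):
    """Decode `count` signed integers from a bit string produced by rice_encode."""
    out, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating "0" of the unary part
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((quotient << k) | remainder))
    return out
```

Small values produce short codewords, which is why this family of codes suits quantized coefficients that cluster around zero.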
Article
In this paper, we propose a generic point cloud encoder that provides a unified framework for compressing different attributes of point samples corresponding to 3D objects with an arbitrary topology. In the proposed scheme, the coding process is led by an iterative octree cell subdivision of the object space. At each level of subdivision, the positions of point samples are approximated by the geometry centers of all tree-front cells, whereas normals and colors are approximated by their statistical average within each of the tree-front cells. With this framework, we employ attribute-dependent encoding techniques to exploit the different characteristics of various attributes. All of these have led to a significant improvement in the rate-distortion (R-D) performance and a computational advantage over the state of the art. Furthermore, given sufficient levels of octree expansion, normal space partitioning, and resolution of color quantization, the proposed point cloud encoder can be potentially used for lossless coding of 3D point clouds.
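The per-level approximation described in this abstract (positions replaced by cell centers, colors by the average within each cell) can be sketched as below. This is only an illustration under stated assumptions: points are taken to lie in the unit cube, and the function name `approximate_at_level` and the returned dictionary layout are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def approximate_at_level(points, colors, level: int):
    """Approximate a colored point cloud at one octree subdivision level.

    points: iterable of (x, y, z) with coordinates in [0, 1] (assumption).
    colors: iterable of (r, g, b), one per point.
    Returns {cell_index: (cell_center, mean_color)} for occupied cells only.
    """
    cells_per_axis = 1 << level
    cell_size = 1.0 / cells_per_axis
    buckets = defaultdict(list)
    for p, c in zip(points, colors):
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        key = tuple(min(int(x * cells_per_axis), cells_per_axis - 1) for x in p)
        buckets[key].append(c)

    result = {}
    for key, cell_colors in buckets.items():
        center = tuple((i + 0.5) * cell_size for i in key)
        mean = tuple(sum(channel) / len(cell_colors) for channel in zip(*cell_colors))
        result[key] = (center, mean)
    return result
```

Increasing `level` refines both the geometry and the color approximation, which matches the progressive character of the subdivision.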
Article
Compressing attributes on 3D point clouds such as colors or normal directions has been a challenging problem, since these attribute signals are unstructured. In this paper, we propose to compress such attributes with graph transform. We construct graphs on small neighborhoods of the point cloud by connecting nearby points, and treat the attributes as signals over the graph. The graph transform, which is equivalent to Karhunen-Loève Transform on such graphs, is then adopted to decorrelate the signal. Experimental results on a number of point clouds representing human upper bodies demonstrate that our method is much more efficient than traditional schemes such as octree-based methods.
Article
The use of 3D data in mobile robotics applications provides valuable information about the robot’s environment. However, the huge amount of 3D information is usually difficult to manage because the robot's storage and computing capabilities are insufficient. Therefore, a data compression method is necessary to store and process this information while preserving as much information as possible. A few methods have been proposed to compress 3D information. Nevertheless, there does not exist a consistent public benchmark for comparing the results (compression level, distance reconstructed error, etc.) obtained with different methods. In this paper, we propose a dataset composed of a set of 3D point clouds with different structure and texture variability to evaluate the results obtained from 3D data compression methods. We also provide useful tools for comparing compression methods, using as a baseline the results obtained by existing relevant compression methods.
Article
This paper presents keyframe-based geometry video (KGV), a novel framework for compressing 3-D human motion data by using geometry videos. Given motion data encoded in a geometry video (GV) format, our method extracts the keyframes and produces a reconstruction matrix. Then it applies a video compression technique (e.g., H.264/Advanced Video Coding) to the reordered keyframes, which can significantly reduce the spatial and temporal redundancy in the KGV. We develop a rate distortion-based optimization algorithm to determine the parameters (i.e., the number of keyframes and quantization parameter) leading to optimal performance. Experimental results show that the proposed KGV framework significantly outperforms the existing GV techniques in terms of both the rate distortion performance and visual quality. Besides, the computational cost of the KGV is rather low at the decoder, making it highly desirable for power-constrained devices. Last but not least, our method can be easily extended to progressive compression over heterogeneous communication networks.
Conference Paper
We present a system for real-time, high-resolution, sparse voxelization of an image-based surface model. Our approach consists of a coarse-to-fine voxel representation and a collection of parallel processing steps. Voxels are stored as a list of unsigned integer triples. An oracle kernel decides, for each voxel in parallel, whether to keep or cull its voxel from the list based on an image consistency criterion of its projection across cameras. After a prefix sum scan, kept voxels are subdivided and the process repeats until projected voxels are pixel size. These voxels are drawn to a render target and shaded as a weighted combination of their projections into a set of calibrated RGB images. We apply this technique to the problem of smooth visual hull reconstruction of human subjects based on a set of live image streams. We demonstrate that human upper body shapes can be reconstructed to giga voxel resolution at greater than 30 fps on modern graphics hardware.
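One coarse-to-fine step of the kind this abstract outlines (flag each voxel, prefix-sum the flags, compact the kept voxels, subdivide them) can be sketched sequentially as follows. This is a CPU illustration only; the paper runs these steps in parallel on graphics hardware. The function name and the `keep` predicate are hypothetical stand-ins, and the image consistency oracle itself is not modeled.

```python
from itertools import accumulate


def compact_and_subdivide(voxels, keep):
    """Cull voxels with `keep`, compact survivors via a prefix sum, then subdivide.

    voxels: list of (x, y, z) unsigned integer triples at the current level.
    keep:   predicate standing in for the per-voxel oracle (assumption).
    Returns the child voxels at the next finer level.
    """
    flags = [1 if keep(v) else 0 for v in voxels]
    # Exclusive prefix sum: offsets[i] is the output slot of voxel i if it is kept.
    offsets = [0] + list(accumulate(flags))[:-1]
    kept = [None] * sum(flags)
    for voxel, flag, offset in zip(voxels, flags, offsets):
        if flag:
            kept[offset] = voxel

    # Each kept voxel splits into its 8 children at doubled resolution.
    return [
        (2 * x + dx, 2 * y + dy, 2 * z + dz)
        for (x, y, z) in kept
        for dx in (0, 1)
        for dy in (0, 1)
        for dz in (0, 1)
    ]
```

The prefix sum is what lets each kept voxel compute its output slot independently, which is the property a parallel implementation relies on.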
Article
In order to efficiently archive and transmit large 3D models, lossy and lossless compression methods are needed. We propose a compression scheme for coordinate data of point-based 3D models of surfaces. A point-based model is processed for compression in a pipeline of three subsequent operations, partitioning, parameterization, and coding. First the point set is partitioned yielding a suitable number of point clusters. Each cluster corresponds to a surface patch, that can be parameterized as a height field and resampled on a regular grid. The domains of the height fields have irregular shapes that are encoded losslessly. The height fields themselves are encoded using a shape-adaptive wavelet coder, producing a progressive bitstream for each patch. A rate-distortion optimization provides for an optimal bit allocation for the individual patch codes. With this algorithm design compact codes are produced that are scalable with respect to rate, quality, and resolution. In our encodings of complex 3D models competitive rate-distortion performances were achieved with excellent reconstruction quality at under 3 bits per point (bpp).
Conference Paper
We present a novel lossy compression approach for point cloud streams which exploits spatial and temporal redundancy within the point data. Our proposed compression framework can handle general point cloud streams of arbitrary and varying size, point order and point density. Furthermore, it allows for controlling coding complexity and coding precision. To compress the point clouds, we perform a spatial decomposition based on octree data structures. Additionally, we present a technique for comparing the octree data structures of consecutive point clouds. By encoding their structural differences, we can successively extend the point clouds at the decoder. In this way, we are able to detect and remove temporal redundancy from the point cloud data stream. Our experimental results show a strong compression performance of a ratio of 14 at 1 mm coordinate precision and up to 40 at a coordinate precision of 9 mm.
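The idea of transmitting only structural differences between consecutive frames can be illustrated with a simplified sketch. It assumes each frame is reduced to the set of occupied voxels at a fixed coordinate precision and that the difference is the symmetric difference (XOR) of two such sets. The abstract does not specify this representation, so the function names and the flat voxel set, used here in place of a real octree, are assumptions.

```python
import math


def voxelize(points, resolution):
    """Quantize points to the set of occupied voxel indices at a given precision."""
    return {tuple(math.floor(c / resolution) for c in p) for p in points}


def encode_difference(prev_voxels, curr_voxels):
    """Voxels whose occupancy changed between two consecutive frames (set XOR)."""
    return prev_voxels ^ curr_voxels


def decode_difference(prev_voxels, diff):
    """Rebuild the current frame from the previous frame and the transmitted diff."""
    return prev_voxels ^ diff
```

When most of the scene is static, the difference set is much smaller than the full frame, which is where the temporal redundancy is removed.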
Article
We present the "Geometry Video," a new data structure to encode animated meshes. Being able to encode animated meshes in a generic source-independent format allows people to share experiences. Changing the viewpoint allows more interaction than the fixed view supported by 2D video. Geometry videos are based on the "Geometry Image" mesh representation introduced by Gu et al. [4]. Our novel data structure provides a way to treat an animated mesh as a video sequence (i.e., 3D image) and is well suited for network streaming. This representation also offers the possibility of applying and adapting existing mature video processing and compression techniques (such as MPEG encoding) to animated meshes. This paper describes an algorithm to generate geometry videos from animated meshes. The main insight of this paper is that Geometry Videos re-sample and re-organize the geometry information in such a way that it becomes very compressible. They provide a unified and intuitive method for level-of-detail control, both in terms of mesh resolution (by scaling the two spatial dimensions) and of frame rate (by scaling the temporal dimension). Geometry Videos have a very uniform and regular structure. Their resource and computational requirements can be calculated exactly, hence making them also suitable for applications requiring level of service guarantees.
Conference Paper
In this paper we present a progressive compression method for point sampled models that is specifically apt at dealing with densely sampled surface geometry. The compression is lossless and therefore is also suitable for storing the unfiltered, raw scan data. Our method is based on an octree decomposition of space. The point-cloud is encoded in terms of occupied octree-cells. To compress the octree we employ novel prediction techniques that were specifically designed for point sampled geometry and are based on local surface approximations to achieve high compression rates that outperform previous progressive coders for point-sampled geometry. Moreover we demonstrate that additional point attributes, such as color, which are of great importance for point-sampled geometry, can be well integrated and efficiently encoded in this framework.
Article
Generation and transmission of complex animation sequences can benefit substantially from the availability of tools for handling large amounts of data associated with dynamic three-dimensional (3-D) models. Previous works in 3-D dynamic compression consider only the simplest situation where the connectivity changes do not occur with time. In this paper, we present an approach for compressing 3-D dynamic models in which both the vertex data and the connectivity data can change with time. Using our framework, 3-D animation sequences generated using commercial graphics tools or dynamic range data captured using range scanners can be compressed significantly. We use 3-D registration to identify the changes in the vertex data and the connectivity of the 3-D geometry between successive frames. Next, the interframe motion is encoded using affine motion parameters and the differential pulse coded modulation (DPCM) predictor. Our work is the first to exploit the temporal coherence in the connectivity data between frames and presents a detailed encoding scheme for 3-D dynamic data. We also discuss the issue of inserting I-frames in the compressed data for better performance. We show that our algorithm has a far superior performance when compared with existing techniques for both vertex compression and connectivity compression of 3-D dynamic datasets.
Conference Paper
A previously adopted standard, the mixed raster content (MRC) imaging model, represents compound images as a superposition of layers. Since layers are superimposed, large regions may not be imaged onto the final raster. Thus, those regions are redundant. We focus on techniques to replace the redundant data, i.e. on data filling redundant regions based on non-redundant ones. We present techniques to minimize the rate and distortion achieved by MRC compression through data filling. We start with a general method, then narrow the presentation to DCT-based compression of planes. Iterative block filling algorithms are presented in both spatial and DCT domain.
Article
Time-varying mesh, which is attracting a lot of attention as a new multimedia representation method, is a sequence of 3-D models that are composed of vertices, edges, and some attribute components such as color. Among these components, vertices require large storage space. In conventional 2-D video compression algorithms, motion compensation (MC) using a block matching algorithm is frequently employed to reduce temporal redundancy between consecutive frames. However, there has been no such technology for 3-D time-varying mesh so far. Therefore, in this paper, we have developed an extended block matching algorithm (EBMA) to reduce the temporal redundancy of the geometry information in the time-varying mesh by extending the idea of the 2-D block matching algorithm to 3-D space. In our EBMA, a cubic block is used as a matching unit. MC in the 3-D space is achieved efficiently by matching the mean normal vectors calculated from partial surfaces in cubic blocks, which our experiments showed to be a suboptimal matching criterion. After MC, residuals are transformed by the discrete cosine transform, uniformly quantized, and then encoded. The extracted motion vectors are also entropy coded after differential pulse code modulation. As a result of our experiments, 10%-18% compression has been achieved.
Article
We examine 3-D wavelet coding of video with arbitrary regions of support (AROS). A critically sampled wavelet transform is applied to the AROS and a modified 3-D set partitioning in hierarchical trees (SPIHT) algorithm is used to quantize and code the wavelet coefficients in the AROS only. Experiments show that, for typical MPEG-4 pre-segmented sequences, our proposed method can achieve a gain of up to 5.6 dB in average PSNR at the same rate over 3-D SPIHT coding of regular volumes that embed the AROS of the given video sequences.