FOLDING-BASED COMPRESSION OF POINT CLOUD ATTRIBUTES

Maurice Quach, Giuseppe Valenzise, Frederic Dufaux

Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes, 91190 Gif-sur-Yvette, France

Funded by the ANR ReVeRy national fund (REVERY ANR-17-CE23-0020).

ABSTRACT

Existing techniques to compress point cloud attributes leverage either geometric or video-based compression tools. We explore a radically different approach inspired by recent advances in point cloud representation learning. Point clouds can be interpreted as 2D manifolds in 3D space. Specifically, we fold a 2D grid onto a point cloud and we map attributes from the point cloud onto the folded 2D grid using a novel optimized mapping method. This mapping results in an image, which opens a way to apply existing image processing techniques on point cloud attributes. However, as this mapping process is lossy in nature, we propose several strategies to refine it so that attributes can be mapped to the 2D grid with minimal distortion. Moreover, this approach can be flexibly applied to point cloud patches in order to better adapt to local geometric complexity. In this work, we consider point cloud attribute compression; thus, we compress this image with a conventional 2D image codec. Our preliminary results show that the proposed folding-based coding scheme can already reach performance similar to the latest MPEG Geometry-based PCC (G-PCC) codec.

Index Terms—point cloud, compression, neural network

1. INTRODUCTION

A point cloud is a set of points in 3D space which can have associated attributes such as color or normals. Point clouds are essential for numerous applications ranging from archeology and architecture to virtual and mixed reality. Since they can contain millions of points with complex attributes, efficient point cloud compression (PCC) is essential to make these applications feasible in practice.

When compressing a point cloud, we usually consider two aspects: the geometry, that is, the 3D coordinates of each individual point, and the attributes, for example RGB colors. Moreover, we can differentiate dynamic point clouds, which change in the temporal dimension, from static point clouds. The Moving Picture Experts Group (MPEG) is leading PCC standardization efforts [1]. Specifically, two main solutions have emerged. The first one, Geometry-based PCC (G-PCC), uses native 3D data structures, while the second one, Video-based PCC (V-PCC), targets mainly dynamic point clouds and projects the data onto a 2D plane to make use of available video codecs such as HEVC.

Point clouds can be interpreted as 2D discrete manifolds in 3D space. Therefore, instead of compressing point cloud attributes using 3D structures such as octrees, we can fold this 2D manifold onto an image. This opens many avenues of research, as it provides, e.g., a way to apply existing image processing techniques straightforwardly on point cloud attributes. In this work, we propose a novel system for folding a point cloud and mapping its attributes to a 2D grid. Furthermore, we demonstrate that the proposed approach can be used to compress static point cloud attributes efficiently.

2. RELATED WORK

Our work is at the crossroads of static point cloud attribute compression and deep representation learning of 3D data. Compressing static point cloud attributes has been explored using graph transforms [2], the Region-Adaptive Hierarchical Transform (RAHT) [3] and volumetric functions [4]. Graph transforms take advantage of the Graph Fourier Transform (GFT) and the neighborhood structure present in the 3D space to compress point cloud attributes. The RAHT is a hierarchical transform which extends the Haar wavelet transform to an octree representation. In this paper, we propose a different perspective, and leverage the manifold interpretation of the point cloud by mapping its attributes onto a 2D grid, which can then be compressed as an image.

Deep learning methods have been used for representation learning and compression of point clouds [5]. In particular, the initial folding in our work is inspired by [6], where an autoencoder network is trained on a dataset to learn how to fold a 2D grid onto a 3D point cloud. In our work, we build on this folding idea; however, we employ it in a very different way. Specifically, we do not aim at learning a good representation that can generalize over a dataset; instead, we employ the folding network as a parametric function that maps an input 2D grid to points in 3D space. The parameters of this function (i.e., the weights of the network) are obtained by overfitting the network to a specific point cloud. In addition, the original folding proposed in [6] is highly inefficient for PCC as it poorly adapts to complex geometries. In our work, we propose a number of solutions to improve folding.

Fig. 1: Proposed system for attribute compression. At the encoder, the original attributes go through segmentation into patches (optional), grid folding and folding refinement, optimized mapping, and image compression to produce the compressed attributes; the decoder performs image decompression and inverse mapping, using the coded geometry, to recover the decompressed attributes. Segmentation is optional and can help to adapt to local geometry complexity.

3. PROPOSED METHOD

We propose a novel system for compressing point cloud attributes based on the idea that a point cloud can be seen as a discrete 2D manifold in 3D space. In this way, we can obtain a 2D parameterization of the point cloud and we can map attributes from a point cloud onto a grid, making it possible to employ 2D image processing algorithms and compression tools. The overall system is depicted in Figure 1. In a nutshell, our approach is based on the following two steps: a) we find a parametric function (specifically, a deep neural network) to fold a 2D grid onto a 3D point cloud; b) we map attributes (e.g., colors) of the original point cloud to this grid. The grid and the parametric function contain all the necessary information to recover the point cloud attributes. Assuming the point cloud geometry is coded separately and transmitted to the decoder, the folding function can be constructed at the decoder side, and the 2D grid is fully decodable without any need to transmit network parameters. In practice, the 3D-to-2D mapping is lossy, which entails a mapping distortion in step b) above. In the following, we propose several strategies to reduce this mapping distortion.

Notation. We use lowercase bold letters such as $\mathbf{x}$ to indicate 3D vectors (point cloud spatial coordinates), and uppercase letters such as $X$ to indicate sets of 3D points (vectors). We denote with a tilde (like $\tilde{\mathbf{x}}$ or $\tilde{X}$) compressed (distorted) vectors or sets of vectors. We use the notation $\langle S \rangle = \sum_{\mathbf{x} \in S} \mathbf{x} / |S|$ for the average over a set $S$.

3.1. Grid folding

We propose a grid folding composed of two steps, namely an initial folding step to get a rough reconstruction of $X$, and a folding refinement step to improve the reconstruction quality, which is essential to map point cloud attributes with minimal mapping distortion.

We fold a grid onto a point cloud to obtain its 2D parameterization by solving the following optimization problem:

$$ \min_f L(X, \tilde{X}) \quad (1) $$

where $X$ is the set of $n$ points in the original point cloud, $\tilde{X} = f(X, G)$ is the set of $n'$ points in the reconstructed point cloud obtained by folding $G$ onto $X$, and $G$ is the set of $n' = w \times h$ points of a 2D grid with 3D coordinates. In general, $n' \neq n$; however, we choose $n'$ to be close to $n$. $L$ is a loss function and $f$ is a folding function.

We parameterize $f$ using a neural network composed of an encoder $f_e$ and a decoder $f_d$ such that $y = f_e(X)$ and $\tilde{X} = f_d(G, y)$. The encoder $f_e$ is composed of four pointwise convolutions with filter sizes of 128 followed by a max-pooling layer. The decoder $f_d$ is composed of two folding layers with $f_d(G, y) = FL(FL(G, y), y)$. Each folding layer has two pointwise convolutions with filter sizes of 64 and concatenates $y$ to its input. The last pointwise convolution has a filter size of 3. We use the ReLU activation [8] for the encoder and the LeakyReLU activation [9] for the decoder. A one-to-one mapping exists between each point $\tilde{\mathbf{x}}_i$ in the folded grid $\tilde{X}$ and its original position $\mathbf{g}_i$ in the grid $G$.
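For concreteness, a minimal TensorFlow/Keras sketch of such a folding network is given below. It only reflects the layer sizes stated above (four 128-filter pointwise convolutions and max-pooling in the encoder, folding layers with two 64-filter pointwise convolutions, a final 3-filter convolution); the function names, batch handling, the codeword dimension implied by the last encoder convolution, and the placement of the 3-filter convolution at the end of each folding layer are our assumptions, not details taken from the paper.

```python
import tensorflow as tf

def encoder(points):
    # points: (batch, n, 3). Four pointwise (1x1) convolutions with 128 filters
    # and ReLU activations, followed by a global max-pooling over the points.
    x = points
    for _ in range(4):
        x = tf.keras.layers.Conv1D(128, 1, activation="relu")(x)
    return tf.reduce_max(x, axis=1)  # codeword y, shape (batch, 128)

def folding_layer(grid, y):
    # One folding layer FL(., y): concatenate the codeword y to every grid point,
    # apply two pointwise convolutions with 64 filters (LeakyReLU), then project
    # to 3D with a pointwise convolution of filter size 3.
    n = tf.shape(grid)[1]
    y_tiled = tf.tile(y[:, tf.newaxis, :], tf.stack([1, n, 1]))
    x = tf.concat([grid, y_tiled], axis=-1)
    x = tf.keras.layers.Conv1D(64, 1, activation=tf.nn.leaky_relu)(x)
    x = tf.keras.layers.Conv1D(64, 1, activation=tf.nn.leaky_relu)(x)
    return tf.keras.layers.Conv1D(3, 1)(x)

def decoder(grid, y):
    # f_d(G, y) = FL(FL(G, y), y): two cascaded folding layers sharing the codeword.
    return folding_layer(folding_layer(grid, y), y)
```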

We propose the following loss function:

$$ L(X, \tilde{X}) = d_{ch}(X, \tilde{X}) + d_{rep}(\tilde{X}) \quad (2) $$

where $d_{ch}$ is the Chamfer distance:

$$ d_{ch}(X, \tilde{X}) = \sum_{\mathbf{x} \in X} \min_{\tilde{\mathbf{x}} \in \tilde{X}} \|\mathbf{x} - \tilde{\mathbf{x}}\|_2^2 + \sum_{\tilde{\mathbf{x}} \in \tilde{X}} \min_{\mathbf{x} \in X} \|\tilde{\mathbf{x}} - \mathbf{x}\|_2^2, $$

and $d_{rep}$ is a novel repulsion loss computed as the variance of the distance of each point in $\tilde{X}$ to its nearest neighbor:

$$ d_{rep}(\tilde{X}) = \operatorname{Var}\big(\{\min_{\tilde{\mathbf{x}}' \in \tilde{X} \setminus \{\tilde{\mathbf{x}}\}} \|\tilde{\mathbf{x}} - \tilde{\mathbf{x}}'\|_2^2 \mid \tilde{\mathbf{x}} \in \tilde{X}\}\big). $$

The Chamfer distance ensures that the reconstruction $\tilde{X}$ is similar to $X$, and the repulsion loss penalizes variations in the reconstruction's density.

We obtain the parameterized folding function $f$ by training a neural network using the Adam optimizer [10]. We use the point cloud $X$ as the single input, which is equivalent to overfitting the network on a single sample.
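As an illustration, a minimal TensorFlow sketch of the loss in Eq. (2) for a single pair $(X, \tilde{X})$ could look as follows; the brute-force pairwise distance computation and the large diagonal constant used to exclude self-distances are implementation choices of ours, not part of the paper.

```python
import tensorflow as tf

def pairwise_sq_dist(a, b):
    # Squared Euclidean distances between all points of a (n, 3) and b (m, 3).
    return tf.reduce_sum((a[:, tf.newaxis, :] - b[tf.newaxis, :, :]) ** 2, axis=-1)

def folding_loss(x, x_tilde):
    # Eq. (2): Chamfer distance between X and X_tilde plus the repulsion loss,
    # i.e., the variance of nearest-neighbor squared distances within X_tilde.
    d = pairwise_sq_dist(x, x_tilde)                                   # (n, n')
    d_ch = tf.reduce_sum(tf.reduce_min(d, axis=1)) + tf.reduce_sum(tf.reduce_min(d, axis=0))
    d_rec = pairwise_sq_dist(x_tilde, x_tilde)
    # Exclude the zero self-distances on the diagonal before taking the minimum.
    d_rec += tf.linalg.diag(tf.fill([tf.shape(x_tilde)[0]], 1e10))
    d_rep = tf.math.reduce_variance(tf.reduce_min(d_rec, axis=1))
    return d_ch + d_rep
```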


Fig. 2: Different steps of our proposed attribute mapping method for the first frame of phil9 [7]: (a) original, (b) folded (27.63 dB), (c) refined folded (30.62 dB), (d) optimized refined folded (33.39 dB). Top row: phases of point cloud reconstruction; bottom row: the attributes mapped on a 2D grid, which is later compressed and transmitted. The initial folding (b) provides a rough reconstruction $\tilde{X}$ which is improved with folding refinement (c) and occupancy optimization (d) to reduce the density mismatch between $X$ and $\tilde{X}$. We then map attributes from the point cloud onto a 2D grid. The holes in the grid are filled to facilitate compression with HEVC. We indicate the Y PSNR between the original colors and the colors distorted by mapping.

3.2. Folding refinement

The initial folding has difficulties reconstructing complex shapes accurately, as seen in Figure 2b. Specifically, the two main issues are mismatches in local density between $X$ and $\tilde{X}$ and inaccurate reconstructions for complex shapes. As a result, this introduces significant mapping distortion when mapping attributes from the original point cloud to the folded one; additionally, this mapping distortion affects the reconstructed point cloud attributes. For compression applications, this is a serious issue as there are now two sources of distortion, from both mapping and compression. This is why we propose a folding refinement method that alleviates mismatches in local density and inaccurate reconstructions.

First, we reduce local density variations by considering density-aware grid structure preservation forces inside $\tilde{X}$. Specifically, each point $\tilde{\mathbf{x}}$ is attracted towards the inverse density weighted average of its neighbors $\mathbf{p}_{grid}$. Since a one-to-one mapping exists between $\tilde{X}$ and $G$, each point $\tilde{\mathbf{x}}_i$ in the folded grid $\tilde{X}$ has a corresponding point $\mathbf{g}_i$ in the grid $G$. We then define the inverse density weight $\omega_i$ for $\tilde{\mathbf{x}}_i$ as $\omega_i = \langle \{ \|\tilde{\mathbf{x}}_i - \tilde{\mathbf{x}}_j\|_2 \mid \mathbf{g}_j \in \mathcal{N}_G(\mathbf{g}_i) \} \rangle$, with $\mathcal{N}_G(\mathbf{g}_i)$ the set of horizontal and vertical neighbors of $\mathbf{g}_i$ in the grid $G$. This encourages the reconstruction to have a more uniform distribution by penalizing high density areas. Given the set $\Omega$ comprising all weights $\omega_i$, we define the normalized weights $\hat{\omega}_i = (\omega_i - \min(\Omega)) / (\max(\Omega) - \min(\Omega))$. Finally, this allows us to define the weighted average $\mathbf{p}_{grid,i} = \langle \{ \hat{\omega}_j \tilde{\mathbf{x}}_j \mid \mathbf{g}_j \in \mathcal{N}_G(\mathbf{g}_i) \} \rangle$.

Second, we set up bidirectional attraction forces between $X$ and $\tilde{X}$ to solve two issues: incomplete coverage, when $\tilde{X}$ does not cover parts of $X$, and inaccurate reconstructions, when $\tilde{X}$ fails to reproduce $X$ accurately. As a solution, we attract each point $\tilde{\mathbf{x}}$ towards two points $\mathbf{p}_{push}$ and $\mathbf{p}_{pull}$. Specifically, $\mathbf{p}_{push}$ is the nearest neighbor of $\tilde{\mathbf{x}}$ in $X$ and pushes $\tilde{X}$ towards $X$, which allows for more accurate reconstructions. On the other hand, $\mathbf{p}_{pull}$ is the average of the points in $X$ which have $\tilde{\mathbf{x}}$ as their nearest neighbor and allows $X$ to pull $\tilde{X}$ closer, which alleviates incomplete coverage issues.

Finally, we combine these components into an iterative refinement system to update the point cloud reconstruction:

$$ \tilde{\mathbf{x}}_{t+1,i} = \alpha\, \mathbf{p}_{grid,t,i} + (1 - \alpha)\, (\mathbf{p}_{push,t,i} + \mathbf{p}_{pull,t,i}) / 2 \quad (3) $$

where $\tilde{\mathbf{x}}_{t,i}$ is the value of $\tilde{\mathbf{x}}_i$ after $t$ iterations and $\tilde{\mathbf{x}}_{0,i} = \tilde{\mathbf{x}}_i$. The inertia factor $\alpha \in [0, 1]$ balances the grid structure preservation forces in $\tilde{X}$ with the bidirectional attraction forces set up between $X$ and $\tilde{X}$. Preserving the grid structure preserves the spatial correlation of the attributes mapped on the grid, and the density-aware aspect of these forces results in more uniformly distributed points. In addition, the bidirectional forces improve the accuracy of the reconstruction significantly.
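As an illustration, one iteration of Eq. (3) could be sketched in NumPy/SciPy as below; the `grid_neighbors` structure (the horizontal/vertical grid neighbor indices of each reconstructed point) is assumed precomputed, and the fallback used when no original point maps to a reconstructed point is our assumption, as this corner case is not specified above.

```python
import numpy as np
from scipy.spatial import cKDTree

def refinement_step(x, x_tilde, grid_neighbors, alpha=1/3):
    # One iteration of Eq. (3), blending grid-structure preservation (p_grid)
    # with the bidirectional attraction forces (p_push, p_pull).
    # Inverse-density weights: mean distance of each point to its grid neighbors.
    w = np.array([np.linalg.norm(x_tilde[i] - x_tilde[nbrs], axis=1).mean()
                  for i, nbrs in enumerate(grid_neighbors)])
    w_hat = (w - w.min()) / (w.max() - w.min() + 1e-12)
    p_grid = np.array([(w_hat[nbrs, None] * x_tilde[nbrs]).mean(axis=0)
                       for nbrs in grid_neighbors])
    # p_push: nearest neighbor of each reconstructed point in the original cloud X.
    p_push = x[cKDTree(x).query(x_tilde)[1]]
    # p_pull: average of the original points whose nearest neighbor is this point.
    nn_idx = cKDTree(x_tilde).query(x)[1]
    p_pull = p_push.copy()  # fallback when no original point maps here (assumption)
    for i in np.unique(nn_idx):
        p_pull[i] = x[nn_idx == i].mean(axis=0)
    return alpha * p_grid + (1 - alpha) * (p_push + p_pull) / 2
```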

3.3. Optimized Attribute Mapping

Once a sufficiently accurate 3D point cloud geometry is reconstructed (Figure 2c), we can map attributes from $X$ to $\tilde{X}$. To this end, we first build a mapping $m_{X \to \tilde{X}}$ from each point in $X$ to a corresponding point in $\tilde{X}$ (for example, the nearest neighbor). Hence, the inverse mapping $m_{\tilde{X} \to X}$ maps $\tilde{\mathbf{x}}$ back to $X$.

Fig. 3: Rate-distortion curves (Y PSNR in dB versus bits per input point) showing the performance of the different steps of our method (Folding, Refined folding, Opt. Refined folding) compared to G-PCC v3 and G-PCC v7. From top to bottom: longdress_vox10_1300, redandblack_vox10_1550 and soldier_vox10_0690 [11].

As $m_{X \to \tilde{X}}$ is not one-to-one (due to local density mismatches and inaccuracy of the reconstruction), several points in $X$ can map to the same $\tilde{\mathbf{x}}$. Thus, a given $\tilde{\mathbf{x}}$ can correspond to zero, one or many points in $X$; we define the number of these points as its occupancy $o(\tilde{\mathbf{x}})$. Attribute mapping from $X$ to $\tilde{X}$ is obtained using $m_{\tilde{X} \to X}$: the attribute value for a point $\tilde{\mathbf{x}}$ is the average of the attribute values of $m_{\tilde{X} \to X}(\tilde{\mathbf{x}})$. In case $m_{\tilde{X} \to X}(\tilde{\mathbf{x}}) = \emptyset$, we simply assign to $\tilde{\mathbf{x}}$ the attribute of its nearest neighbor in $X$. As a consequence of this approach, points with higher occupancy tend to have higher mapping distortion, as more attributes are averaged.

To overcome this problem, we integrate the occupancy as a regularizing factor when building the mapping. For each point $\mathbf{x}$ in $X$, we consider the set $N_k(\mathbf{x}) \subset \tilde{X}$ of its $k$ nearest neighbors and select $m_{X \to \tilde{X}}(\mathbf{x}) = \arg\min_{\tilde{\mathbf{x}} \in N_k(\mathbf{x})} o(\tilde{\mathbf{x}}) \|\tilde{\mathbf{x}} - \mathbf{x}\|_2$. Specifically, the mapping is built iteratively and the occupancies are updated progressively.
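A minimal NumPy/SciPy sketch of this occupancy-regularized assignment could look as follows; the order in which the points of $X$ are visited and the tie-breaking among zero-occupancy candidates are our assumptions, as they are not specified above.

```python
import numpy as np
from scipy.spatial import cKDTree

def occupancy_regularized_mapping(x, x_tilde, k=9):
    # Build m_{X -> X_tilde}: assign each original point to the candidate among its
    # k nearest reconstructed points minimizing o(x_tilde) * ||x_tilde - x||_2,
    # updating occupancies progressively as points are assigned.
    dists, nbrs = cKDTree(x_tilde).query(x, k=k)       # both of shape (n, k)
    occupancy = np.zeros(len(x_tilde), dtype=np.int64)
    mapping = np.empty(len(x), dtype=np.int64)
    for i in range(len(x)):
        cost = occupancy[nbrs[i]] * dists[i]
        # Ties (e.g., several zero-occupancy candidates) resolve to the nearest one,
        # since the neighbor lists are sorted by distance.
        j = nbrs[i][np.argmin(cost)]
        mapping[i] = j
        occupancy[j] += 1
    return mapping, occupancy
```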

As noted above, when $o(\tilde{\mathbf{x}}) > 1$, the attributes are averaged, which introduces distortion. We mitigate this problem by adding rows and columns in the 2D grid (see Fig. 2d) using the following procedure. Since $o(\tilde{\mathbf{x}})$ is defined on $\tilde{X}$ and there is a one-to-one mapping between $\tilde{X}$ and $G$, we can compute mean occupancies row-wise and column-wise. In particular, we compute mean occupancies with zeros excluded and we select the row/column with the maximum mean occupancy. Then, we reduce its occupancy by inserting additional rows/columns around it. We repeat this procedure until we obtain a lossless mapping or until the relative change $\Delta_r$ in the average of the mean occupancies falls below a threshold $\Delta_{r,\min}$.
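For illustration, the row/column selection step of this procedure could be sketched as below; the insertion of new rows/columns around the selected one and the subsequent re-mapping are not shown, and the handling of fully empty rows or columns is an implementation assumption of ours.

```python
import numpy as np

def select_split(occupancy_grid):
    # Select the grid row or column to split: mean occupancies are computed
    # row-wise and column-wise with zeros excluded, and the row/column with the
    # maximum mean occupancy is chosen.
    occ = np.asarray(occupancy_grid, dtype=float)
    masked = np.where(occ > 0, occ, np.nan)      # exclude zero-occupancy cells
    row_means = np.nanmean(masked, axis=1)
    col_means = np.nanmean(masked, axis=0)
    r, c = np.nanargmax(row_means), np.nanargmax(col_means)
    if row_means[r] >= col_means[c]:
        return ("row", int(r))
    return ("col", int(c))
```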

4. EXPERIMENTAL RESULTS

We evaluate our system for static point cloud attribute compression and compare it against G-PCC v3 [12] and v7 [13]. We also study the impact of folding refinement and occupancy optimization on our method by presenting an ablation study. Since folding is less accurate on complex point clouds, we manually segment the point clouds into patches and apply our scheme on each patch. The patches are then reassembled in order to compute rate-distortion measures.

We use TensorFlow 1.15.0 [14]. For the folding refinement, we set $\alpha$ to $1/3$ and perform 100 iterations. When mapping attributes, we consider $k = 9$ neighbors for assignment. When optimizing occupancy, we set $\Delta_{r,\min}$ to $10^{-6}$. We then perform image compression using BPG [15], an image format based on HEVC intra [16], with QPs ranging from 20 to 50 with a step of 5.
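To make the distortion measure concrete, a luma PSNR between original and reconstructed per-point colors could be computed as in the sketch below; the assumption that the two clouds share the same point order (the geometry is not modified by attribute coding) and the BT.709 RGB-to-Y weights are ours, not stated in the paper.

```python
import numpy as np

def y_psnr(rgb_ref, rgb_dec):
    # Luma PSNR between original and decompressed per-point colors (arrays of
    # shape (n, 3), values in [0, 255]), assuming point-to-point correspondence.
    w = np.array([0.2126, 0.7152, 0.0722])       # BT.709 luma weights (assumption)
    y_ref = rgb_ref.astype(float) @ w
    y_dec = rgb_dec.astype(float) @ w
    mse = np.mean((y_ref - y_dec) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)
```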

In Figure 3, we observe that our method performs comparably to G-PCC for "longdress" and "redandblack". The performance is slightly worse for "soldier", as its geometry is much more complex, making a good reconstruction difficult and introducing mapping distortion. We obtain significant gains in terms of rate-distortion by improving the reconstruction quality using folding refinement and occupancy optimization. This shows the potential of our method and confirms the importance of reducing the mapping distortion.

5. CONCLUSION

Based on the interpretation of a point cloud as a 2D manifold living in a 3D space, we propose to fold a 2D grid onto it and map point cloud attributes onto this grid. As the mapping introduces distortion, we proposed a folding refinement procedure, an adaptive attribute mapping method and an occupancy optimization scheme to minimize this mapping distortion. With the resulting image, we compress point cloud attributes leveraging conventional image codecs and obtain encouraging results. Our proposed method enables the use of 2D image processing techniques and tools on point cloud attributes.

6. REFERENCES

[1] Sebastian Schwarz, Marius Preda, Vittorio Baroncini, Madhukar Budagavi, Pablo Cesar, Philip A. Chou, Robert A. Cohen, Maja Krivokuća, Sebastien Lasserre, Zhu Li, Joan Llach, Khaled Mammou, Rufael Mekuria, Ohji Nakagami, Ernestasia Siahaan, Ali Tabatabai, Alexis M. Tourapis, and Vladyslav Zakharchenko, "Emerging MPEG standards for point cloud compression," IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[2] Cha Zhang, Dinei Florêncio, and Charles Loop, "Point cloud attribute compression with graph transform," in 2014 IEEE International Conference on Image Processing (ICIP), pp. 2066–2070.

[3] Ricardo L. de Queiroz and Philip A. Chou, "Compression of 3D point clouds using a region-adaptive hierarchical transform," IEEE Transactions on Image Processing, vol. 25, no. 8, pp. 3947–3956.

[4] Maja Krivokuća, Maxim Koroteev, and Philip A. Chou, "A volumetric approach to point cloud compression."

[5] Maurice Quach, Giuseppe Valenzise, and Frederic Dufaux, "Learning convolutional transforms for lossy point cloud geometry compression," in 2019 IEEE International Conference on Image Processing (ICIP), pp. 4320–4324.

[6] Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian, "FoldingNet: Point cloud auto-encoder via deep grid deformation," in 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Charles Loop, Qin Cai, Sergio O. Escolano, and Philip A. Chou, "Microsoft voxelized upper bodies - a voxelized point cloud dataset," ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012.

[8] Vinod Nair and Geoffrey E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, June 2010, Johannes Fürnkranz and Thorsten Joachims, Eds., pp. 807–814, Omnipress.

[9] Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in ICML Workshop on Deep Learning for Audio, Speech and Language Processing.

[10] Diederik P. Kingma and Jimmy Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations (ICLR), 2015.

[11] Sebastian Schwarz, Gaëlle Martin-Cocher, David Flynn, and Madhukar Budagavi, "Common test conditions for point cloud compression," ISO/IEC JTC1/SC29/WG11 MPEG output document N17766.

[12] Khaled Mammou, Philip A. Chou, David Flynn, and Maja Krivokuća, "PCC test model category 13 v3," ISO/IEC JTC1/SC29/WG11 MPEG output document N17762.

[13] "G-PCC test model v7 user manual," ISO/IEC JTC1/SC29/WG11 MPEG output document N18664.

[14] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, "TensorFlow: Large-scale machine learning on heterogeneous distributed systems."

[15] Fabrice Bellard, "BPG image format."

[16] "High Efficiency Video Coding (HEVC) version 2," ITU-T Recommendation H.265.
