
A deep perceptual metric for 3D point clouds

Maurice Quach†, Aladine Chetouani†+, Giuseppe Valenzise†, Frederic Dufaux†;
†Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes; 91190 Gif-sur-Yvette, France
+Laboratoire PRISME, Université d'Orléans; Orléans, France

Abstract

Point clouds are essential for storage and transmission of 3D content. As they can entail significant volumes of data, point cloud compression is crucial for practical usage. Recently, point cloud geometry compression approaches based on deep neural networks have been explored. In this paper, we evaluate the ability of typical voxel-based loss functions employed to train these networks to predict perceptual quality. We find that the commonly used focal loss and weighted binary cross entropy are poorly correlated with human perception. We thus propose a perceptual loss function for 3D point clouds which outperforms existing loss functions on the ICIP2020 subjective dataset. In addition, we propose a novel truncated distance field voxel grid representation and find that it leads to sparser latent spaces and loss functions that are more correlated with perceived visual quality compared to a binary representation. The source code is available at https://github.com/mauriceqch/2021_pc_perceptual_loss.

Introduction

As 3D capture devices become more accurate and accessible, point clouds are a crucial data structure for the storage and transmission of 3D data. Naturally, this comes with significant volumes of data. Thus, Point Cloud Compression (PCC) is an essential research topic to enable practical usage. The Moving Picture Experts Group (MPEG) is working on two PCC standards [1]: Geometry-based PCC (G-PCC) and Video-based PCC (V-PCC). G-PCC uses native 3D data structures to compress point clouds, while V-PCC employs a projection-based approach using video coding technology [2]. These two approaches are complementary, as V-PCC specializes in dense point clouds, while G-PCC is a more general approach suited for sparse point clouds. Recently, JPEG Pleno [3] has issued a Call for Evidence on PCC [4].

Deep learning approaches have been employed to compress geometry [5, 6, 7, 8, 9, 10] and attributes [11, 12] of point clouds. Specific approaches have also been developed for sparse LIDAR point clouds [13, 14]. In this work, we focus on lossy point cloud geometry compression for dense point clouds. In existing approaches, different point cloud geometry representations are considered for compression: G-PCC adopts a point representation, V-PCC uses a projection or image-based representation and deep learning approaches commonly employ a voxel grid representation. Point clouds can be represented in different ways in the voxel grid. Indeed, voxel grid representations include binary and Truncated Signed Distance Field (TSDF) representations [15]. TSDFs rely on the computation of normals; however, in the case of point clouds this computation can be noisy. We therefore ignore the normal signs and reformulate TSDFs to propose a new Truncated Distance Field (TDF) representation for point clouds.

Figure 1. Perceptual loss based on an autoencoder. (a) Autoencoder model: x → fa → y → fs → x̃. (b) Perceptual loss: the latent spaces y(A) = fa(x(A)) and y(B) = fa(x(B)) are compared via y(A) − y(B). The grayed out parts do not need to be computed for the perceptual loss.

Deep learning approaches for lossy geometry compression typically jointly optimize rate and distortion. As a result, an objective quality metric, employed as a loss function, is necessary to define the distortion objective during training. Such metrics should be differentiable, defined on the voxel grid and well correlated with perceived visual quality. In this context, the Weighted Binary Cross Entropy (WBCE) and the focal loss [16] are commonly used loss functions based on a binary voxel grid representation. They aim to alleviate the class imbalance between empty and occupied voxels caused by point cloud sparsity. However, they are poorly correlated with human perception as they only compute a voxel-wise error.

A number of metrics have been proposed for Point Cloud Quality Assessment (PCQA): the point-to-plane (D2) metric [18], PC-MSDM [19], PCQM [20], angular similarity [21], a point-to-distribution metric [22], a point cloud similarity metric [23], improved PSNR metrics [24] and a color-based metric [25]. These metrics operate directly on the point cloud. However, they are not defined on the voxel grid and hence cannot be used easily as loss functions. Recently, to improve upon existing loss functions such as the WBCE and the focal loss, a neighborhood adaptive loss function [17] was proposed. Still, these loss functions are based on the explicit binary voxel grid representation. We show in this paper that loss functions based on the TDF representation are more correlated with human perception than those based on the binary representation.

The perceptual loss has previously been proposed as an objective quality metric for images [26]. Indeed, neural networks learn representations of images that are well correlated with perceived visual quality. This enables the definition of the perceptual loss as a distance between latent space representations. For the case of images, the perceptual loss provides competitive performance or even outperforms traditional quality metrics. We hypothesize that a similar phenomenon can be observed for point clouds.

Figure 2. Voxel grid representations of a point cloud: (a) binary, (b) Truncated Signed Distance Field (TSDF), (c) Truncated Distance Field (TDF). The upper bound distance value is 2 for the TDF and the TSDF. Normals are facing up in the TSDF.

Table 1: Objective quality metrics considered in this study.

Domain     | Name          | Signal type | Block aggregation | Learning based | Description
Points     | D1 MSE        | Coordinates | n/a               | no             | Point-to-point MSE
Points     | D2 MSE        | Coordinates | n/a               | no             | Point-to-plane MSE
Points     | D1 PSNR       | Coordinates | n/a               | no             | Point-to-point PSNR
Points     | D2 PSNR       | Coordinates | n/a               | no             | Point-to-plane PSNR
Voxel grid | Bin BCE       | Binary      | L1                | no             | Binary cross entropy
Voxel grid | Bin naBCE     | Binary      | L1                | no             | Neighborhood adaptive binary cross entropy [17]
Voxel grid | Bin WBCE 0.75 | Binary      | L2                | no             | Weighted binary cross entropy with w = 0.75
Voxel grid | Bin PL        | Binary      | L1                | yes            | Perceptual loss (explicit) on all feature maps
Voxel grid | Bin PL F1     | Binary      | L1                | yes            | Perceptual loss (explicit) on feature map 1
Voxel grid | TDF MSE       | Distances   | L1                | no             | Truncated distance field (TDF) MSE
Voxel grid | TDF PL        | Distances   | L1                | yes            | Perceptual loss (implicit) on all feature maps
Voxel grid | TDF PL F9     | Distances   | L1                | yes            | Perceptual loss (implicit) on feature map 9

Therefore, we propose a differentiable perceptual loss for training deep neural networks aimed at compressing point cloud geometry. We investigate how to build and train such a perceptual loss to improve point cloud compression results. Specifically, we build a differentiable distortion metric suitable for training neural networks to improve PCC approaches based on deep learning. We then validate our approach experimentally on the ICIP2020 [27] subjective dataset. The main contributions of the paper are as follows:

• A novel perceptual loss for 3D point clouds that outperforms existing metrics on the ICIP2020 subjective dataset
• A novel implicit TDF voxel grid representation
• An evaluation of binary (explicit) and TDF (implicit) representations in the context of deep learning approaches for point cloud geometry compression

Voxel grid representations

In this study, we consider different voxel grid representations for point clouds. A commonly used representation is the explicit binary occupancy representation, in which the occupancy of a voxel (occupied or empty) is represented with a binary value (Figure 2a). In this binary (Bin) representation, each voxel has a binary occupancy value indicating whether it is occupied or empty: when the $i$th voxel is occupied, $x_i = 1$, and otherwise $x_i = 0$.
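For illustration, a minimal sketch (our own, not part of the paper) of building such a binary occupancy grid from quantized point coordinates, assuming a block of side 64:

```python
import numpy as np

def to_binary_grid(points, block_size=64):
    """points: (N, 3) integer voxel coordinates within [0, block_size)."""
    grid = np.zeros((block_size, block_size, block_size), dtype=np.float32)
    idx = points.astype(int)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0  # occupied voxels get x_i = 1
    return grid
```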

Another representation is the implicit TSDF representation, which has been employed for volume compression [9]. Instead of an occupancy value, the value of a voxel is the distance from this voxel to the nearest point, and the sign of this value is determined from the orientation of the normal (Figure 2b). However, this requires reliable normals, which may not be available in sparse and/or heavily compressed point clouds.

Hence, we propose an implicit TDF representation, which is a variant of the TSDF without signs and therefore does not require normals. In the implicit TDF representation (Figure 2c), the $i$th voxel value is $x_i = d$, where $d$ is the distance to its nearest occupied voxel. Consequently, $x_i = 0$ when a voxel is occupied and $x_i = d$ with $d > 0$ otherwise. Additionally, we truncate and normalize the distance values into the $[0, 1]$ interval with

$$x_i = \min(d, u)/u, \quad (1)$$

where $u$ is an upper bound value.

In this study, we focus on the explicit binary and implicit TDF representations.
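As a sketch of Eq. (1), the TDF can be derived from a binary occupancy grid with a distance transform; the use of the Euclidean distance and the helper below are our assumptions (u = 5 matches the experimental setup later in the paper):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def to_tdf(binary_grid, u=5.0):
    """Truncated distance field of Eq. (1) from a binary occupancy grid."""
    # Distance from each empty voxel to the nearest occupied voxel (0 at occupied voxels).
    d = distance_transform_edt(binary_grid == 0)
    return np.minimum(d, u) / u  # truncate at u and normalize to [0, 1]
```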

Objective quality metrics

In Table 1, we present the objective quality metrics considered in this study. Specifically, we evaluate metrics that are differentiable on the voxel grid to assess their suitability as loss functions for point cloud geometry compression. We include metrics defined on the binary and TDF representations and we compare their performance against traditional point set metrics.

Voxel grid metrics

We partition the point cloud into blocks and compute voxel grid metrics for each block. For each metric, we aggregate the metric values over all blocks with either an L1 or L2 norm. Specifically, we select the best aggregation experimentally for each metric.
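A minimal sketch of this aggregation step follows; the paper does not state whether the norm is averaged or summed over blocks, so the pooling below is an assumption:

```python
import numpy as np

def aggregate_blocks(block_values, norm="L1"):
    """Pool per-block metric values into a single score with an L1 or L2 norm."""
    v = np.asarray(block_values, dtype=np.float64)
    if norm == "L1":
        return np.mean(np.abs(v))
    return np.sqrt(np.mean(v ** 2))  # L2
```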

Given two point clouds A and B, we denote the $i$th voxel value of each point cloud as $x_i^{(A)}$ and $x_i^{(B)}$. Then, we define the WBCE as follows:

$$-\frac{1}{N}\sum_i \alpha\, x_i^{(A)} \log x_i^{(B)} + (1-\alpha)\,\big(1 - x_i^{(A)}\big)\log\big(1 - x_i^{(B)}\big), \quad (2)$$

where $\alpha$ is a balancing weight between 0 and 1. The binary cross entropy (BCE) refers to the case $\alpha = 0.5$.
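For illustration, a minimal numpy sketch of Eq. (2); clipping the log arguments, as described below for the focal loss, is an assumption added here to keep the expression finite on binary inputs:

```python
import numpy as np

def wbce(x_a, x_b, alpha=0.5, eps=1e-3):
    """Weighted binary cross entropy of Eq. (2); alpha = 0.5 gives the BCE."""
    x_b = np.clip(x_b, eps, 1.0 - eps)  # assumed clipping, as for the focal loss
    return -np.mean(alpha * x_a * np.log(x_b)
                    + (1.0 - alpha) * (1.0 - x_a) * np.log(1.0 - x_b))
```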

Different from the WBCE, the Focal Loss (FL) amplifies ($\gamma > 1$) or reduces ($\gamma < 1$) errors and is defined as follows:

$$-\sum_i \alpha\, x_i^{(A)} \big(1 - x_i^{(B)}\big)^{\gamma} \log x_i^{(B)} + (1-\alpha)\,\big(1 - x_i^{(A)}\big)\big(x_i^{(B)}\big)^{\gamma} \log\big(1 - x_i^{(B)}\big), \quad (3)$$

where $\alpha$ is a balancing weight and the log arguments are clipped between 0.001 and 0.999.

Compared to the WBCE, the FL adds two factors, $(1 - x_i^{(B)})^{\gamma}$ and $(x_i^{(B)})^{\gamma}$. However, while in the context of neural network training $x_i^{(B)}$ is an occupancy probability, in the context of quality assessment $x_i^{(B)}$ is a binary value. As a result, the FL is equivalent to the WBCE, since $\gamma$ has no impact in the latter case. For this reason, we include the WBCE with $\alpha = 0.75$ in our experiments as an evaluation proxy for the FL used in [6].
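For completeness, a sketch of the FL of Eq. (3) as it would be used during training; the default $\gamma = 2$ from [16] is an assumption here:

```python
import numpy as np

def focal_loss(x_a, x_b, alpha=0.75, gamma=2.0):
    """Focal loss of Eq. (3); log arguments clipped to [0.001, 0.999]."""
    x_b = np.clip(x_b, 1e-3, 1.0 - 1e-3)
    return -np.sum(alpha * x_a * (1.0 - x_b) ** gamma * np.log(x_b)
                   + (1.0 - alpha) * (1.0 - x_a) * x_b ** gamma * np.log(1.0 - x_b))
```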

The neighborhood adaptive BCE (naBCE) [17] was proposed as an alternative to the BCE and FL [16]. It is a variant of the WBCE in which the weight $\alpha$ adapts to the neighborhood of each voxel $u$, resulting in a weight $\alpha_u$. Given a voxel $u$, its neighborhood is a window $W$ of size $m \times m \times m$ centered on $u$. Then, the neighborhood resemblance $r_u$ is the sum of the inverse Euclidean distances of neighboring voxels with the same binary occupancy value as $u$. Finally, the weight is defined as $\alpha_u = \max(1 - r_u / \max(r), 0.001)$, where $\max(r)$ is the maximum over all neighborhood resemblances.

The Mean Squared Error (MSE) on the TDF is expressed as follows:

$$\frac{1}{N}\sum_i \big(x_i^{(A)} - x_i^{(B)}\big)^2. \quad (4)$$

Perceptual Loss

We propose a perceptual loss based on differences between latent space representations learned by a neural network. More precisely, we use an autoencoder as the underlying neural network. The architecture of this autoencoder and its training procedure are presented in the following.

Model architecture

We adopt an autoencoder architecture based on 3D convolutions and transposed convolutions. Given an input voxel grid $x$, we perform an analysis transform $f_a(x) = y$ to obtain the latent space $y$, and a synthesis transform $f_s(y) = \tilde{x}$, as seen in Figure 1a. The analysis transform is composed of three convolutions with kernel size 5 and stride 2, while the synthesis transform is composed of three transposed convolutions with the same kernel size and stride. We use ReLU [28] activations for all layers except the last layer, which uses a sigmoid activation.
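A minimal sketch of this architecture with tf.keras (the paper reports TensorFlow 1.15); the number of filters per layer is an assumption, as it is not specified in this section:

```python
import tensorflow as tf

def analysis_transform(filters=32):
    """f_a: three 3D convolutions, kernel size 5, stride 2, ReLU activations."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv3D(filters, 5, strides=2, padding="same", activation="relu")
        for _ in range(3)
    ])

def synthesis_transform(filters=32):
    """f_s: three transposed 3D convolutions; the last layer uses a sigmoid."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv3DTranspose(filters, 5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv3DTranspose(filters, 5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv3DTranspose(1, 5, strides=2, padding="same", activation="sigmoid"),
    ])
```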

Training

Using the previously defined architecture, we train two neural networks: one with the explicit representation (binary) and another with the implicit representation (TDF).

In the explicit case, we train the perceptual loss network with the focal loss function defined in Eq. (3). In the implicit case, we first define the Kronecker delta $\delta_i$ such that $\delta_i = 1$ when $i = 0$ and $\delta_i = 0$ otherwise. Then, we define an adaptive MSE loss function for the training of the perceptual loss as follows:

$$\frac{1}{N}\sum_i \delta_{1 - x_i^{(A)}}\, w\, \big(x_i^{(A)} - x_i^{(B)}\big)^2 + \big(1 - \delta_{1 - x_i^{(A)}}\big)(1 - w)\big(x_i^{(A)} - x_i^{(B)}\big)^2, \quad (5)$$

where $w$ is a balancing weight. Specifically, we choose $w$ as the proportion of distances strictly inferior to 1:

$$w = \min\!\left(\max\!\left(\frac{\sum_i \big(1 - \delta_{1 - x_i^{(A)}}\big)}{N}, \beta\right), 1 - \beta\right), \quad (6)$$

where $\beta$ is a bounding factor such that $w$ is bounded by $[\beta, 1 - \beta]$. This formulation compensates for class imbalance while avoiding extreme weight values.

In this way, the loss function adapts the contributions from the voxels that are far from occupied voxels ($x_i^{(A)} = 1$) and the voxels that are near occupied voxels ($x_i^{(A)} < 1$). We train the network with the Adam [29] optimizer.
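A numpy sketch of the adaptive loss of Eqs. (5) and (6); the default value of $\beta$ is an assumption, as the paper only states that it bounds $w$:

```python
import numpy as np

def adaptive_mse(x_a, x_b, beta=0.01):
    """Adaptive MSE of Eqs. (5)-(6) on TDF grids x_a (reference) and x_b."""
    far = (x_a == 1.0).astype(np.float64)               # delta_{1 - x_a}: 1 where x_a == 1
    w = np.clip(np.mean(1.0 - far), beta, 1.0 - beta)   # share of voxels with x_a < 1, Eq. (6)
    sq_err = (x_a - x_b) ** 2
    return np.mean(far * w * sq_err + (1.0 - far) * (1.0 - w) * sq_err)
```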

Metric

As seen in Figure 1b, in order to compare two point clouds $x^{(A)}$ and $x^{(B)}$, we compute their respective latent spaces $f_a(x^{(A)}) = y^{(A)}$ and $f_a(x^{(B)}) = y^{(B)}$ using the previously trained analysis transform. These latent spaces each have $F$ feature maps of size $W \times D \times H$ (width, depth, height). Then, we define the MSE between latent spaces as follows:

$$\frac{1}{N}\sum_i \big(y_i^{(A)} - y_i^{(B)}\big)^2. \quad (7)$$

We compute this MSE either over all $F$ feature maps or on single feature maps.
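A sketch of Eq. (7), assuming latent tensors with the F feature maps on the last axis (the axis convention is ours):

```python
import numpy as np

def perceptual_loss(y_a, y_b, feature_map=None):
    """MSE between latent spaces, Eq. (7); y_a, y_b have shape (W, D, H, F)."""
    if feature_map is not None:               # e.g. 9 for TDF PL F9, 1 for Bin PL F1
        y_a, y_b = y_a[..., feature_map], y_b[..., feature_map]
    return np.mean((y_a - y_b) ** 2)
```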

Table 2: Statistical analysis of objective quality metrics.

Method        | PCC   | SROCC | RMSE  | OR
TDF PL F9     | 0.951 | 0.947 | 0.094 | 0.375
D2 MSE        | 0.946 | 0.943 | 0.100 | 0.469
TDF MSE       | 0.940 | 0.940 | 0.103 | 0.385
D1 MSE        | 0.938 | 0.933 | 0.109 | 0.479
TDF PL        | 0.935 | 0.933 | 0.110 | 0.490
Bin PL F1     | 0.922 | 0.916 | 0.115 | 0.406
D2 PSNR       | 0.900 | 0.898 | 0.129 | 0.500
Bin WBCE 0.75 | 0.875 | 0.859 | 0.144 | 0.531
Bin PL        | 0.863 | 0.867 | 0.151 | 0.552
D1 PSNR       | 0.850 | 0.867 | 0.158 | 0.448
Bin naBCE     | 0.740 | 0.719 | 0.201 | 0.573
Bin BCE       | 0.713 | 0.721 | 0.207 | 0.635

Point set metrics

Point-to-point (D1) and point-to-plane (D2)

The point-to-point distance (D1) [30] measures the average error between each point in A and its nearest neighbor in B:

$$e^{D1}_{A,B} = \frac{1}{N_A}\sum_{\forall a_i \in A} \|a_i - b_j\|_2^2, \quad (8)$$

where $b_j$ is the nearest neighbor of $a_i$ in B.

In contrast to D1, the point-to-plane distance (D2) [18] projects the error vector along the normal and is expressed as follows:

$$e^{D2}_{A,B} = \frac{1}{N_A}\sum_{\forall a_i \in A} \big((a_i - b_j)\cdot n_i\big)^2, \quad (9)$$

where $b_j$ is the nearest neighbor of $a_i$ in B and $n_i$ is the normal vector at $a_i$.

The normals for the original and distorted point clouds are computed with local quadric fittings using 9 nearest neighbors. The D1 and D2 MSEs are the maximum of $e_{A,B}$ and $e_{B,A}$, and their Peak Signal-to-Noise Ratio (PSNR) is then computed with a peak error corresponding to three times the point cloud resolution, as defined in [30].
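A sketch of the directional errors of Eqs. (8) and (9), using a KD-tree for the nearest-neighbor search; the normals are assumed to be precomputed, and the symmetric MSE takes the maximum over both directions as stated above:

```python
import numpy as np
from scipy.spatial import cKDTree

def d1_d2(a, b, normals_a):
    """Directional D1/D2 MSE from point set a to b; normals_a are unit normals at a."""
    _, idx = cKDTree(b).query(a)                        # nearest neighbor b_j of each a_i
    err = a - b[idx]
    d1 = np.mean(np.sum(err ** 2, axis=1))              # point-to-point MSE, Eq. (8)
    d2 = np.mean(np.sum(err * normals_a, axis=1) ** 2)  # point-to-plane MSE, Eq. (9)
    return d1, d2
```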

Experiments

Experimental setup

We evaluate the metrics defined above on the ICIP2020 [27] subjective dataset. It contains 6 point clouds [31, 32] compressed using G-PCC Octree, G-PCC Trisoup and V-PCC at 5 different rates, yielding a total of 96 stimuli (including the 6 references) and their associated subjective scores.

For each metric, we compute the Pearson Correlation Coefficient (PCC), the Spearman Rank Order Correlation Coefficient (SROCC), the Root Mean Square Error (RMSE) and the Outlier Ratio (OR). We evaluate the statistical significance of the differences between PCCs using the method in [33]. These metrics are computed after logistic fittings with cross-validation splits. Each split contains the stimuli for one point cloud (i.e., the reference point cloud and its distorted versions) as a test set and the stimuli of all other point clouds as a training set. The metrics are then computed after concatenating the results for the test set of each split. They are summarized in Table 2, and the values before and after logistic fitting are shown in Figure 3.

We use an upper bound value $u = 5$ when computing the TDF in Eq. (1) and a block size of 64 when partitioning point clouds into blocks. The naBCE window size is $m = 5$, as in the original paper. The perceptual loss is trained with a learning rate of 0.001, $\beta_1 = 0.9$ and $\beta_2 = 0.999$ on the ModelNet dataset [34] after block partitioning, using Python 3.6.9 and TensorFlow [35] 1.15.0.
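For reference, a hedged sketch of this evaluation protocol; the 4-parameter logistic form and its initialization are common choices in quality assessment studies rather than details given here, and the OR computation (which requires MOS confidence intervals) is omitted:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic(x, b1, b2, b3, b4):
    return b2 + (b1 - b2) / (1.0 + np.exp(-(x - b3) / np.abs(b4)))

def evaluate_split(metric_train, mos_train, metric_test, mos_test):
    """Fit the logistic on the training split, then compute PCC/SROCC/RMSE on the test split."""
    p0 = [mos_train.max(), mos_train.min(), np.median(metric_train), 1.0]
    params, _ = curve_fit(logistic, metric_train, mos_train, p0=p0, maxfev=10000)
    pred = logistic(metric_test, *params)
    pcc, _ = pearsonr(pred, mos_test)
    srocc, _ = spearmanr(metric_test, mos_test)
    rmse = np.sqrt(np.mean((pred - mos_test) ** 2))
    return pcc, srocc, rmse
```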

Comparison of perceptual loss feature maps

In our experiments, we first considered the perceptual loss computed over all feature maps. However, we observed that some feature maps are more perceptually relevant than others. Consequently, we include the best feature map for each voxel grid representation in our results: feature map 9 (TDF PL F9) for TDF PL and feature map 1 (Bin PL F1) for Bin PL.

Moreover, we observe that some feature maps are unused by the neural network (they are constant). As a result, their perceptual loss MSEs are always equal to 0 and they exhibit high RMSE values (all equal to 0.812). Specifically, we observe that TDF PL has 6 unused feature maps, while Bin PL has a single unused feature map. This suggests that the perceptual loss learns a sparser latent space representation when using TDF compared to binary. Thus, implicit representations may improve compression performance compared to explicit representations, as fewer feature maps may be needed.

Comparison of objective quality metrics

In Table 2, we observe that TDF PL F9 is the best method overall. In particular, identifying the most perceptually relevant feature map and computing the MSE on this feature map provides a significant improvement. Specifically, the difference between the PCCs of TDF PL F9 and TDF PL is statistically significant with a confidence of 95%.

For voxel grid metrics, we observe that TDF metrics perform better than binary metrics. In particular, the RMSEs of the former are noticeably lower than those of the latter for point clouds compressed with G-PCC Octree, as can be seen in Table 3. This suggests that implicit representations may be better at dealing with density differences between point clouds in the context of point cloud quality assessment.

Conclusion

We proposed a novel perceptual loss that outperforms existing objective quality metrics and is differentiable in the voxel grid. As a result, it can be used as a loss function in deep neural networks for point cloud compression, and it is more correlated with perceived visual quality than traditional loss functions such as the BCE and the focal loss. Overall, metrics on the proposed implicit TDF representation performed better than explicit binary representation metrics. Additionally, we observed that the TDF representation yields sparser latent space representations compared to the binary representation. This suggests that switching from the binary to the TDF representation may improve compression performance, in addition to enabling the use of better loss functions.

Figure 3. Scatter plots between the objective quality metrics and the MOS values. The plots before and after logistic fitting are shown.

Table 3: Statistical analysis of objective quality metrics by compression method.

              | G-PCC Octree        | G-PCC Trisoup       | V-PCC
Method        | PCC   SROCC  RMSE   | PCC   SROCC  RMSE   | PCC   SROCC  RMSE
TDF PL F9     | 0.975  0.859  0.078 | 0.936  0.910  0.101 | 0.897  0.850  0.106
D2 MSE        | 0.962  0.829  0.094 | 0.954  0.924  0.103 | 0.903  0.860  0.103
TDF MSE       | 0.952  0.839  0.106 | 0.933  0.917  0.106 | 0.912  0.867  0.098
D1 MSE        | 0.976  0.851  0.082 | 0.937  0.918  0.126 | 0.876  0.844  0.119
TDF PL        | 0.970  0.840  0.087 | 0.918  0.900  0.127 | 0.876  0.837  0.115
Bin PL F1     | 0.941  0.786  0.138 | 0.927  0.907  0.107 | 0.898  0.865  0.109
D2 PSNR       | 0.943  0.890  0.110 | 0.926  0.895  0.108 | 0.738  0.723  0.166
Bin WBCE 0.75 | 0.923  0.747  0.163 | 0.918  0.886  0.112 | 0.850  0.786  0.164
Bin PL        | 0.931  0.852  0.186 | 0.892  0.886  0.130 | 0.880  0.852  0.142
D1 PSNR       | 0.903  0.859  0.156 | 0.910  0.895  0.117 | 0.599  0.689  0.202
Bin naBCE     | 0.552  0.357  0.277 | 0.846  0.786  0.154 | 0.748  0.692  0.170
Bin BCE       | 0.946  0.841  0.188 | 0.776  0.800  0.177 | 0.574  0.500  0.250

Acknowledgments

We would like to thank the authors of [17] for providing their implementation of naBCE. This work was funded by the ANR ReVeRy national fund (REVERY ANR-17-CE23-0020).

References

[1] S. Schwarz et al., "Emerging MPEG Standards for Point Cloud Compression," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, pp. 1–1, 2018.
[2] "ITU-T Recommendation H.265: High efficiency video coding (HEVC)," Nov. 2019.
[3] T. Ebrahimi et al., "JPEG Pleno: Toward an Efficient Representation of Visual Reality," IEEE MultiMedia, vol. 23, no. 4, pp. 14–20, Oct. 2016.
[4] "Final Call for Evidence on JPEG Pleno Point Cloud Coding," in ISO/IEC JTC1/SC29/WG1 JPEG output document N88014, Jul. 2020.
[5] M. Quach, G. Valenzise, and F. Dufaux, "Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression," in 2019 IEEE Int. Conf. on Image Process. (ICIP), Sep. 2019, pp. 4320–4324.
[6] M. Quach, G. Valenzise, and F. Dufaux, "Improved Deep Point Cloud Geometry Compression," in 2020 IEEE Int. Workshop on Multimedia Signal Process. (MMSP), Oct. 2020.
[7] J. Wang et al., "Learned Point Cloud Geometry Compression," arXiv:1909.12037 [cs, eess], Sep. 2019.
[8] J. Wang et al., "Multiscale Point Cloud Geometry Compression," arXiv:2011.03799 [cs, eess], Nov. 2020.
[9] D. Tang et al., "Deep Implicit Volume Compression," in 2020 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 1290–1300.
[10] S. Milani, "A Syndrome-Based Autoencoder For Point Cloud Geometry Compression," in 2020 IEEE Int. Conf. on Image Process. (ICIP), Oct. 2020, pp. 2686–2690.
[11] M. Quach, G. Valenzise, and F. Dufaux, "Folding-Based Compression Of Point Cloud Attributes," in 2020 IEEE Int. Conf. on Image Process. (ICIP), Oct. 2020, pp. 3309–3313.
[12] E. Alexiou, K. Tung, and T. Ebrahimi, "Towards neural network approaches for point cloud compression," in Applications of Digit. Image Process. XLIII, vol. 11510. Int. Society for Optics and Photonics, Aug. 2020, p. 1151008.
[13] L. Huang et al., "OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression," in 2020 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 1310–1320.
[14] S. Biswas et al., "MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models," Adv. Neural Inf. Process. Syst., vol. 33, 2020.
[15] B. Curless and M. Levoy, "A volumetric method for building complex models from range images," in Proc. of the 23rd annual conf. on Computer graphics and interactive techniques - SIGGRAPH '96. ACM Press, 1996, pp. 303–312.
[16] T.-Y. Lin et al., "Focal Loss for Dense Object Detection," in 2017 IEEE Int. Conf. on Computer Vision (ICCV), Oct. 2017, pp. 2999–3007.
[17] A. Guarda, N. Rodrigues, and F. Pereira, "Neighborhood Adaptive Loss Function for Deep Learning-based Point Cloud Coding with Implicit and Explicit Quantization," IEEE MultiMedia, pp. 1–1, 2020.
[18] D. Tian et al., "Geometric distortion metrics for point cloud compression," in 2017 IEEE Int. Conf. on Image Process. (ICIP), Beijing: IEEE, Sep. 2017, pp. 3460–3464.
[19] G. Meynet, J. Digne, and G. Lavoué, "PC-MSDM: A quality metric for 3D point clouds," in 2019 Eleventh Int. Conf. on Quality of Multimedia Experience (QoMEX), Jun. 2019, pp. 1–3.
[20] G. Meynet et al., "PCQM: A Full-Reference Quality Metric for Colored 3D Point Clouds," in 12th Int. Conf. on Quality of Multimedia Experience (QoMEX 2020), Athlone, Ireland, May 2020.
[21] E. Alexiou and T. Ebrahimi, "Point Cloud Quality Assessment Metric Based on Angular Similarity," in 2018 IEEE Int. Conf. on Multimedia and Expo (ICME), Jul. 2018, pp. 1–6.
[22] A. Javaheri et al., "Mahalanobis Based Point to Distribution Metric for Point Cloud Geometry Quality Evaluation," IEEE Signal Process. Lett., vol. 27, pp. 1350–1354, 2020.
[23] E. Alexiou and T. Ebrahimi, "Towards a Point Cloud Structural Similarity Metric," in 2020 IEEE Int. Conf. on Multimedia Expo Workshops (ICMEW), Jul. 2020, pp. 1–6.
[24] A. Javaheri et al., "Improving PSNR-based Quality Metrics Performance For Point Cloud Geometry," in 2020 IEEE Int. Conf. on Image Process. (ICIP), Oct. 2020, pp. 3438–3442.
[25] I. Viola, S. Subramanyam, and P. Cesar, "A Color-Based Objective Quality Metric for Point Cloud Contents," in 2020 Twelfth Int. Conf. on Quality of Multimedia Experience (QoMEX), May 2020, pp. 1–6.
[26] R. Zhang et al., "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric," in 2018 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), IEEE, Jun. 2018, pp. 586–595.
[27] S. Perry et al., "Quality Evaluation Of Static Point Clouds Encoded Using MPEG Codecs," in 2020 IEEE Int. Conf. on Image Process. (ICIP), Oct. 2020, pp. 3428–3432.
[28] V. Nair and G. E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," in Proc. of the 27th Int. Conf. on Mach. Learn. (ICML), 2010, pp. 807–814.
[29] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in 2015 3rd Int. Conf. on Learn. Representations, Dec. 2014.
[30] "Common test conditions for point cloud compression," in ISO/IEC JTC1/SC29/WG11 MPEG output document N19084, Feb. 2020.
[31] C. Loop et al., "Microsoft voxelized upper bodies - a voxelized point cloud dataset," in ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012, May 2016.
[32] E. d'Eon et al., "8i Voxelized Full Bodies - A Voxelized Point Cloud Dataset," in ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006, Geneva, Jan. 2017.
[33] G. Y. Zou, "Toward using confidence intervals to compare correlations," Psychol. Methods, vol. 12, no. 4, pp. 399–413, Dec. 2007.
[34] N. Sedaghat et al., "Orientation-boosted Voxel Nets for 3D Object Recognition," arXiv:1604.03351 [cs], Apr. 2016.
[35] M. Abadi et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems," arXiv:1603.04467 [cs], Mar. 2016.

Author Biography

Maurice Quach received the Computer Science Engineer Diploma from the University of Technology of Compiègne in 2018. He is currently studying for a PhD on Point Cloud Compression and Quality Assessment under the supervision of Frederic Dufaux and Giuseppe Valenzise at Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes, France.