Code is available at https://github.com/mauriceqch/pcc_attr_folding.
FOLDING-BASED COMPRESSION OF POINT CLOUD ATTRIBUTES
Maurice Quach Giuseppe Valenzise Frederic Dufaux
Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes
91190 Gif-sur-Yvette, France
ABSTRACT
Existing techniques to compress point cloud attributes leverage either geometric or video-based compression tools. We explore a radically different approach inspired by recent advances in point cloud representation learning. Point clouds can be interpreted as 2D manifolds in 3D space. Specifically, we fold a 2D grid onto a point cloud and we map attributes from the point cloud onto the folded 2D grid using a novel optimized mapping method. This mapping results in an image, which opens a way to apply existing image processing techniques on point cloud attributes. However, as this mapping process is lossy in nature, we propose several strategies to refine it so that attributes can be mapped to the 2D grid with minimal distortion. Moreover, this approach can be flexibly applied to point cloud patches in order to better adapt to local geometric complexity. In this work, we consider point cloud attribute compression; thus, we compress this image with a conventional 2D image codec. Our preliminary results show that the proposed folding-based coding scheme can already reach performance similar to the latest MPEG Geometry-based PCC (G-PCC) codec.
Index Terms: point cloud, compression, neural network
1. INTRODUCTION
A point cloud is a set of points in 3D space which can have associated attributes such as color or normals. Point clouds are essential for numerous applications ranging from archeology and architecture to virtual and mixed reality. Since they can contain millions of points with complex attributes, efficient point cloud compression (PCC) is essential to make these applications feasible in practice.
When compressing a point cloud, we usually consider two aspects: the geometry, that is, the 3D coordinates of each individual point, and the attributes, for example RGB colors. Moreover, we can differentiate dynamic point clouds, which change in the temporal dimension, from static point clouds. The Moving Picture Experts Group (MPEG) is leading PCC standardization efforts [1]. Specifically, two main solutions have emerged. The first one, Geometry-based PCC (G-PCC), uses native 3D data structures, while the second one, Video-based PCC (V-PCC), targets mainly dynamic point clouds, and projects the data on a 2D plane to make use of available video codecs such as HEVC.
Funded by the ANR ReVeRy national fund (REVERY ANR-17-CE23-0020).
Point clouds can be interpreted as 2D discrete manifolds in 3D space. Therefore, instead of compressing point cloud attributes using 3D structures such as octrees, we can fold this 2D manifold onto an image. This opens many avenues of research, as it provides, e.g., a way to apply existing image processing techniques straightforwardly on point cloud attributes. In this work, we propose a novel system for folding a point cloud and mapping its attributes to a 2D grid. Furthermore, we demonstrate that the proposed approach can be used to compress static point cloud attributes efficiently.
2. RELATED WORK
Our work is at the crossroads of static point cloud attribute compression and deep representation learning of 3D data. Compressing static point cloud attributes has been explored using graph transforms [2], the Region-Adaptive Hierarchical Transform (RAHT) [3] and volumetric functions [4]. Graph transforms take advantage of the Graph Fourier Transform (GFT) and the neighborhood structure present in the 3D space to compress point cloud attributes. The RAHT is a hierarchical transform which extends the Haar wavelet transform to an octree representation. In this paper, we propose a different perspective, and leverage the manifold interpretation of the point cloud by mapping its attributes onto a 2D grid, which can then be compressed as an image.
Deep learning methods have been used for representation learning and compression of point clouds [5]. In particular, the initial folding in our work is inspired by [6] where an autoencoder network is trained on a dataset to learn how to fold a 2D grid onto a 3D point cloud. In our work, we build on this folding idea; however, we employ it in a very different way. Specifically, we do not aim at learning a good representation that can generalize over a dataset; instead, we employ the folding network as a parametric function that maps an input 2D grid to points in 3D space. The parameters of this function (i.e., the weights of the network) are obtained by overfitting the network to a specific point cloud. In addition, the original folding proposed in [6] is highly inefficient for PCC as it poorly adapts to complex geometries. In our work, we propose a number of solutions to improve folding.
ICIP 2020 Copyright 2020 IEEE
Fig. 1: Proposed system for attribute compression. Segmentation is optional and can help to adapt to local geometry complexity. [Block diagram. Encoder: original attributes, optimized mapping, image compression, compressed attributes. Decoder: image decompression, inverse mapping, decompressed attributes. Geometry branch: coded geometry, segmentation into patches, grid folding, folding refinement.]
3. PROPOSED METHOD
We propose a novel system for compressing point cloud attributes based on the idea that a point cloud can be seen as a discrete 2D manifold in 3D space. In this way, we can obtain a 2D parameterization of the point cloud and we can map attributes from a point cloud onto a grid, making it possible to employ 2D image processing algorithms and compression tools. The overall system is depicted in Figure 1. In a nutshell, our approach is based on the following two steps: a) we find a parametric function (specifically, a deep neural network) to fold a 2D grid onto a 3D point cloud; b) we map attributes (e.g., colors) of the original point cloud to this grid. The grid and the parametric function contain all the necessary information to recover the point cloud attributes. Assuming the point cloud geometry is coded separately and transmitted to the decoder, the folding function can be constructed at the decoder side, and the 2D grid is fully decodable without any need to transmit network parameters. In practice, the 3D-to-2D mapping is lossy, which entails a mapping distortion in step b) above. In the following, we propose several strategies to reduce this mapping distortion.
Notation. We use lowercase bold letters such as $\mathbf{x}$ to indicate 3D vectors (point cloud spatial coordinates), and uppercase letters such as $X$ to indicate sets of 3D points (vectors). We denote with a tilde (like $\tilde{\mathbf{x}}$ or $\tilde{X}$) compressed (distorted) vectors or sets of vectors. We use the notation $\langle S \rangle = \sum_{\mathbf{x} \in S} \mathbf{x} / |S|$ for the average over a set $S$.
3.1. Grid folding
We propose a grid folding composed of two steps, namely, an initial folding step to get a rough reconstruction of $X$ and a folding refinement step to improve the reconstruction quality, which is essential to map point cloud attributes with minimal mapping distortion.
We fold a grid onto a point cloud to obtain its 2D parameterization by solving the following optimization problem:
$$\min_{f} L(X, \tilde{X}) \tag{1}$$
where $X$ is the set of $n$ points in the original point cloud, $\tilde{X} = f(X, G)$ is the set of $n'$ points in the reconstructed point cloud obtained by folding $G$ onto $X$, and $G$ is the set of $n' = w \times h$ points of a 2D grid with 3D coordinates. In general, $n' \neq n$; however, we choose $n'$ to be close to $n$. $L$ is a loss function and $f$ is a folding function.
We parameterize $f$ using a neural network composed of an encoder $f_e$ and a decoder $f_d$ such that $y = f_e(X)$ and $\tilde{X} = f_d(G, y)$. The encoder $f_e$ is composed of four pointwise convolutions with filter sizes of 128 followed by a max-pooling layer. The decoder $f_d$ is composed of two folding layers with $f_d(G, y) = FL(FL(G, y), y)$. Each folding layer has two pointwise convolutions with filter sizes of 64 and concatenates $y$ to its input. The last pointwise convolution has a filter size of 3. We use the ReLU activation [8] for the encoder and LeakyReLU activation [9] for the decoder. A one-to-one mapping exists between each point $\tilde{\mathbf{x}}_i$ in the folded grid $\tilde{X}$ and its original position $g_i$ in the grid $G$.
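The shapes involved in this architecture can be illustrated with a minimal NumPy forward pass. This is only a sketch under assumptions that the text does not settle: each folding layer is taken to be two 64-filter pointwise convolutions followed by a 3-filter one, the two folding layers have separate weights, the grid lies in the plane z = 0, the LeakyReLU slope is 0.2, and the weights are random rather than trained. The names `make_grid`, `init_layers`, `encode`, `folding_layer` and `fold` are hypothetical.

```python
import numpy as np

def make_grid(w, h):
    """Regular w x h grid in [-1, 1]^2 embedded in 3D with z = 0 (assumed layout)."""
    u, v = np.meshgrid(np.linspace(-1.0, 1.0, w), np.linspace(-1.0, 1.0, h), indexing="ij")
    return np.stack([u.ravel(), v.ravel(), np.zeros(w * h)], axis=1)

def init_layers(rng, sizes):
    """Random (weight, bias) pairs; a pointwise convolution is a per-point linear map."""
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]

def encode(layers, X):
    """Encoder f_e: pointwise convolutions with ReLU, then max-pooling over points."""
    a = X
    for W, b in layers:
        a = np.maximum(a @ W + b, 0.0)
    return a.max(axis=0)  # code y, shape (128,)

def folding_layer(layers, P, y, slope=0.2):
    """One folding layer FL(P, y): concatenate y to every point, then pointwise convolutions."""
    a = np.concatenate([P, np.tile(y, (len(P), 1))], axis=1)
    for W, b in layers[:-1]:
        a = a @ W + b
        a = np.where(a > 0.0, a, slope * a)  # LeakyReLU (slope is an assumption)
    W, b = layers[-1]
    return a @ W + b  # last pointwise convolution outputs 3D coordinates

def fold(params, X, G):
    """f(X, G) = FL(FL(G, y), y) with y = f_e(X)."""
    y = encode(params["enc"], X)
    return folding_layer(params["fl2"], folding_layer(params["fl1"], G, y), y)

rng = np.random.default_rng(0)
params = {
    "enc": init_layers(rng, [3, 128, 128, 128, 128]),
    "fl1": init_layers(rng, [3 + 128, 64, 64, 3]),
    "fl2": init_layers(rng, [3 + 128, 64, 64, 3]),
}
X = rng.uniform(-1.0, 1.0, (500, 3))  # stand-in point cloud with n = 500
G = make_grid(25, 20)                 # n' = 25 * 20 = 500 grid points
X_rec = fold(params, X, G)            # untrained output, one 3D point per grid point
```

Because row i of `X_rec` is produced from row i of `G`, the one-to-one correspondence between the folded grid and the grid comes for free from the array layout.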
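We propose the following loss function
$$L(X, \tilde{X}) = d_{ch}(X, \tilde{X}) + d_{rep}(\tilde{X}) \tag{2}$$
where $d_{ch}$ is the Chamfer distance:
$$d_{ch}(X, \tilde{X}) = \sum_{\mathbf{x} \in X} \min_{\tilde{\mathbf{x}} \in \tilde{X}} \|\mathbf{x} - \tilde{\mathbf{x}}\|_2^2 + \sum_{\tilde{\mathbf{x}} \in \tilde{X}} \min_{\mathbf{x} \in X} \|\tilde{\mathbf{x}} - \mathbf{x}\|_2^2,$$
and $d_{rep}$ is a novel repulsion loss computed as the variance of the distance of each point in $\tilde{X}$ to its nearest neighbor:
$$d_{rep}(\tilde{X}) = \mathrm{Var}(\{ \min_{\tilde{\mathbf{x}}' \in \tilde{X} \setminus \tilde{\mathbf{x}}} \|\tilde{\mathbf{x}} - \tilde{\mathbf{x}}'\|_2^2 \mid \tilde{\mathbf{x}} \in \tilde{X} \}).$$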
The Chamfer distance ensures that the reconstruction $\tilde{X}$ is similar to $X$ and the repulsion loss penalizes variations in the reconstruction's density.
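The two terms of the loss can be sketched in a few lines of NumPy. This is an illustrative brute-force version that builds the full distance matrix, so it only suits small point sets; the function names are hypothetical and do not come from the released code.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distance between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff * diff, axis=-1)

def chamfer_distance(X, X_rec):
    """Sum of squared nearest-neighbor distances in both directions."""
    d = pairwise_sq_dists(X, X_rec)
    return d.min(axis=1).sum() + d.min(axis=0).sum()

def repulsion_loss(X_rec):
    """Variance of each reconstructed point's squared distance to its nearest neighbor."""
    d = pairwise_sq_dists(X_rec, X_rec)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.var(d.min(axis=1))

def folding_loss(X, X_rec):
    """Loss of Equation (2): Chamfer distance plus repulsion loss."""
    return chamfer_distance(X, X_rec) + repulsion_loss(X_rec)

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
X_rec = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
loss = folding_loss(X, X_rec)
```

In this toy case the stray point at x = 3 contributes 4 to the Chamfer term, and the uneven spacing (nearest-neighbor squared distances 1, 1 and 4) gives a repulsion term of 2. Evenly spaced points would give a repulsion term of 0.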
We obtain the parameterized folding function $f$ by training a neural network using the Adam optimizer [10]. We use the point cloud $X$ as the single input, which is equivalent to overfitting the network on a single sample.
Fig. 2: Different steps of our proposed attribute mapping method for the first frame of phil9 [7]. Panels: (a) Original, (b) Folded (27.63 dB), (c) Refined folded (30.62 dB), (d) Opt. refined folded (33.39 dB). Top row: phases of point cloud reconstruction; bottom row: the attributes mapped on a 2D grid, which is later compressed and transmitted. The initial folding (b) provides a rough reconstruction $\tilde{X}$ which is improved with folding refinement (c) and occupancy optimization (d) to reduce the density mismatch between $X$ and $\tilde{X}$. We then map attributes from the point cloud onto a 2D grid. The holes in the grid are filled to facilitate compression with HEVC. We indicate the Y PSNRs between the original colors and the colors distorted by mapping.
3.2. Folding refinement
The initial folding has difficulties reconstructing complex shapes accurately as seen in Figure 2b. Specifically, the two main issues are mismatches in local density between $X$ and $\tilde{X}$ and inaccurate reconstructions for complex shapes. As a result, this introduces significant mapping distortion when mapping attributes from the original PC to the folded one; additionally, this mapping distortion affects the reconstructed point cloud attributes. For compression applications, this is a serious issue as there are now two sources of distortion from both mapping and compression. This is why we propose a folding refinement method that alleviates mismatches in local density and inaccurate reconstructions.
First, we reduce local density variations by considering density-aware grid structure preservation forces inside $\tilde{X}$. Specifically, each point $\tilde{\mathbf{x}}$ is attracted towards the inverse density weighted average of its neighbors $\mathbf{p}_{grid}$. Since a one-to-one mapping exists between $\tilde{X}$ and $G$, each point $\tilde{\mathbf{x}}_i$ in the folded grid $\tilde{X}$ has a corresponding point $g_i$ in the grid $G$. We then define the inverse density weight $\omega_i$ for $\tilde{\mathbf{x}}_i$ as $\omega_i = \langle \{ \|\tilde{\mathbf{x}}_i - \tilde{\mathbf{x}}_j\|_2 \mid g_j \in N_G(g_i) \} \rangle$ with $N_G(g_i)$ the set of horizontal and vertical neighbors of $g_i$ in the grid $G$. This encourages the reconstruction to have a more uniform distribution by penalizing high density areas. Given the set $\Omega$ comprising all weights $\omega_i$, we define the normalized weights $\hat{\omega}_i = (\omega_i - \min(\Omega)) / (\max(\Omega) - \min(\Omega))$. Finally, this allows us to define the weighted average $\mathbf{p}_{grid,i} = \langle \{ \hat{\omega}_j \tilde{\mathbf{x}}_j \mid g_j \in N_G(g_i) \} \rangle$.
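A possible NumPy sketch of the weights and of the grid target follows, with the folded grid stored as an array of shape (w, h, 3) so that grid neighbors are array neighbors. Two choices here are assumptions rather than part of the text: when `normalize=True` the weighted sum is divided by the sum of the neighbor weights (a conventional weighted average), while `normalize=False` averages the products exactly as the formula is printed; and when all weights are equal, the normalization is undefined and the weights are set to 1. All function names are hypothetical.

```python
import numpy as np

def neighbor_stack(A):
    """Up/down/left/right grid neighbors of A (shape (w, h, ...)).

    Returns values of shape (4, w, h, ...) and a boolean mask of shape (4, w, h)
    that is False where the neighbor would fall outside the grid.
    """
    w, h = A.shape[:2]
    vals = np.zeros((4,) + A.shape, dtype=float)
    mask = np.zeros((4, w, h), dtype=bool)
    vals[0, 1:] = A[:-1]
    mask[0, 1:] = True
    vals[1, :-1] = A[1:]
    mask[1, :-1] = True
    vals[2, :, 1:] = A[:, :-1]
    mask[2, :, 1:] = True
    vals[3, :, :-1] = A[:, 1:]
    mask[3, :, :-1] = True
    return vals, mask

def density_weights(Xg):
    """Normalized inverse density weights: mean distance to grid neighbors, rescaled to [0, 1]."""
    nb, mask = neighbor_stack(Xg)
    dist = np.linalg.norm(nb - Xg[None], axis=-1)
    omega = (dist * mask).sum(axis=0) / mask.sum(axis=0)
    span = omega.max() - omega.min()
    if span == 0.0:
        return np.ones_like(omega)  # assumption: uniform weights when all are equal
    return (omega - omega.min()) / span

def p_grid(Xg, normalize=True):
    """Weighted average of each point's grid neighbors."""
    w_hat = density_weights(Xg)
    nb_x, mask = neighbor_stack(Xg)
    nb_w, _ = neighbor_stack(w_hat)
    nb_w = nb_w * mask
    weighted_sum = (nb_w[..., None] * nb_x).sum(axis=0)
    count = mask.sum(axis=0)[..., None]
    if not normalize:
        return weighted_sum / count  # literal reading of the printed formula
    denom = nb_w.sum(axis=0)[..., None]
    plain_mean = (nb_x * mask[..., None]).sum(axis=0) / count
    safe = np.where(denom > 0.0, denom, 1.0)
    return np.where(denom > 0.0, weighted_sum / safe, plain_mean)

# A 3 x 1 strip with points at x = 0, 1 and 3: the gap on the right is sparser.
strip = np.array([[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]], [[3.0, 0.0, 0.0]]])
weights = density_weights(strip)
target = p_grid(strip)
```

In the strip example the weights are 0, 0.5 and 1, so the middle point is attracted towards its neighbor in the sparse region, which is the behavior the text describes.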
Second, we set up bidirectional attraction forces between $X$ and $\tilde{X}$ to solve two issues: incomplete coverage, when $\tilde{X}$ does not cover parts of $X$, and inaccurate reconstructions, when $\tilde{X}$ fails to reproduce $X$ accurately. As a solution, we attract each point $\tilde{\mathbf{x}}$ towards two points $\mathbf{p}_{push}$ and $\mathbf{p}_{pull}$. Specifically, $\mathbf{p}_{push}$ is the nearest neighbor of $\tilde{\mathbf{x}}$ in $X$ and pushes $\tilde{X}$ towards $X$, which allows for more accurate reconstructions. On the other hand, $\mathbf{p}_{pull}$ is the average of the points in $X$ which have $\tilde{\mathbf{x}}$ as their nearest neighbor and allows $X$ to pull $\tilde{X}$ closer, which alleviates incomplete coverage issues.
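The two attraction targets can be computed from a single distance matrix, as in the brute-force sketch below. The text does not say what the pull target should be for a folded point that no original point selects as its nearest neighbor; falling back to the push target in that case is an assumption made here, and `push_pull_targets` is a hypothetical name.

```python
import numpy as np

def push_pull_targets(X, X_rec):
    """Return (p_push, p_pull), one row per point of the folded grid X_rec."""
    d = ((X[:, None, :] - X_rec[None, :, :]) ** 2).sum(axis=-1)  # shape (n, n')
    # Push: nearest original point for every folded point.
    p_push = X[d.argmin(axis=0)].astype(float)
    # Pull: mean of the original points whose nearest folded point is this one.
    owner = d.argmin(axis=1)
    counts = np.bincount(owner, minlength=len(X_rec))
    sums = np.zeros((len(X_rec), X.shape[1]))
    np.add.at(sums, owner, X)
    p_pull = p_push.copy()  # assumption: fall back to the push target when unselected
    selected = counts > 0
    p_pull[selected] = sums[selected] / counts[selected, None]
    return p_push, p_pull

X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.2, 0.0, 0.0]])
X_rec = np.array([[0.1, 0.0, 0.0], [1.9, 0.0, 0.0], [5.0, 0.0, 0.0]])
p_push, p_pull = push_pull_targets(X, X_rec)
```

Here the second folded point is the nearest one for two original points, so its pull target is their mean at x = 2.1, while the third folded point is selected by nobody and keeps its push target.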
Finally, we combine these components into an iterative refinement system to update the point cloud reconstruction:
$$\tilde{\mathbf{x}}_{t+1,i} = \alpha \mathbf{p}_{grid,t,i} + (1 - \alpha)(\mathbf{p}_{push,t,i} + \mathbf{p}_{pull,t,i})/2 \tag{3}$$
where $\tilde{\mathbf{x}}_{t,i}$ is the value of $\tilde{\mathbf{x}}_i$ after $t$ iterations and $\tilde{\mathbf{x}}_0 = \tilde{\mathbf{x}}$. The inertia factor $\alpha \in [0, 1]$ balances the grid structure preservation forces in $\tilde{X}$ with the bidirectional attraction forces set up between $X$ and $\tilde{X}$. Preserving the grid structure preserves the spatial correlation of the attributes mapped on the grid, and the density-aware aspect of these forces results in more uniformly distributed points. In addition, the bidirectional forces improve the accuracy of the reconstruction significantly.
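As a quick check of Equation (3), substituting the value $\alpha = 1/3$ used in the experiments of Section 4 shows that the three targets then receive equal weight:

```latex
\tilde{\mathbf{x}}_{t+1,i}
  = \tfrac{1}{3}\,\mathbf{p}_{grid,t,i}
    + \tfrac{2}{3}\cdot\tfrac{1}{2}\left(\mathbf{p}_{push,t,i} + \mathbf{p}_{pull,t,i}\right)
  = \tfrac{1}{3}\left(\mathbf{p}_{grid,t,i} + \mathbf{p}_{push,t,i} + \mathbf{p}_{pull,t,i}\right)
```

A larger $\alpha$ would favor the grid structure, and a smaller one would favor fidelity to $X$.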
3.3. Optimized Attribute Mapping
Fig. 3: RD curves showing the performance of the different steps of our method. From top to bottom: longdress_vox10_1300, redandblack_vox10_1550 and soldier_vox10_0690 [11]. [Each plot shows Y PSNR (dB) against bits per input point for GPCC v7, GPCC v3, Folding, Refined folding and Opt. Refined folding.]
Once a sufficiently accurate 3D point cloud geometry is reconstructed (Figure 2c), we can map attributes from $X$ to $\tilde{X}$. To this end, we first build a mapping $m_{X \to \tilde{X}}$ from each point in $X$ to a corresponding point in $\tilde{X}$ (for example, the nearest neighbor). Hence, the inverse mapping $m_{\tilde{X} \to X}$ maps $\tilde{\mathbf{x}}$ back to $X$. As $m_{X \to \tilde{X}}$ is not one-to-one (due to local density mismatches and inaccuracy of the reconstruction), several points in $X$ can map to the same $\tilde{\mathbf{x}}$. Thus, a given $\tilde{\mathbf{x}}$ can correspond to zero, one or many points in $X$; we define the number of these points as its occupancy $o(\tilde{\mathbf{x}})$. Attribute mapping from $X$ to $\tilde{X}$ is obtained using $m_{\tilde{X} \to X}$: the attribute value for a point $\tilde{\mathbf{x}}$ is the average of the attribute values of $m_{\tilde{X} \to X}(\tilde{\mathbf{x}})$. In case $m_{\tilde{X} \to X}(\tilde{\mathbf{x}}) = \emptyset$, we simply assign to $\tilde{\mathbf{x}}$ the attribute of its nearest neighbor in $X$. As a consequence of this approach, points with higher occupancy tend to have higher mapping distortion, as more attributes are averaged.
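The nearest-neighbor variant of this mapping, together with the occupancy and the averaging rule, can be sketched as follows. It is a brute-force illustration with hypothetical names (`map_attributes_nearest`, `unmap_attributes`), not the released implementation.

```python
import numpy as np

def map_attributes_nearest(X, attrs, X_rec):
    """Map attributes from X onto the folded grid X_rec using nearest neighbors.

    Returns the mapped attributes (one row per folded point), the occupancy of
    every folded point, and the forward mapping m (index in X_rec for each x).
    """
    d = ((X[:, None, :] - X_rec[None, :, :]) ** 2).sum(axis=-1)
    m = d.argmin(axis=1)
    occupancy = np.bincount(m, minlength=len(X_rec))
    sums = np.zeros((len(X_rec), attrs.shape[1]))
    np.add.at(sums, m, attrs)
    # Empty folded points take the attribute of their nearest neighbor in X.
    mapped = attrs[d.argmin(axis=0)].astype(float)
    occupied = occupancy > 0
    mapped[occupied] = sums[occupied] / occupancy[occupied, None]
    return mapped, occupancy, m

def unmap_attributes(mapped, m):
    """Recover one attribute per original point from the folded grid."""
    return mapped[m]

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.1, 0.0, 0.0]])
attrs = np.array([[10.0], [20.0], [40.0]])
X_rec = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
mapped, occupancy, m = map_attributes_nearest(X, attrs, X_rec)
recovered = unmap_attributes(mapped, m)
```

The second folded point has occupancy 2, so it stores the average 30 and both of its source points are recovered as 30 instead of 20 and 40. This is the mapping distortion discussed above.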
To overcome this problem, we integrate the occupancy as a regularizing factor when building the mapping. For each point $\mathbf{x}$ in $X$, we consider its $k$ nearest neighbors set $N_k(\mathbf{x})$ in $\tilde{X}$ and select $m_{X \to \tilde{X}}(\mathbf{x}) = \arg\min_{\tilde{\mathbf{x}} \in N_k(\mathbf{x})} o(\tilde{\mathbf{x}}) \|\tilde{\mathbf{x}} - \mathbf{x}\|_2$. Specifically, the mapping is built iteratively and the occupancies are updated progressively.
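One way to read this procedure is a single greedy pass over the points of $X$, sketched below. The visiting order (input order) and the tie-breaking rule (the nearer candidate wins) are assumptions, the cost is the product of occupancy and distance as written above, and `map_with_occupancy` is a hypothetical name.

```python
import numpy as np

def map_with_occupancy(X, X_rec, k=9):
    """Greedy occupancy-aware mapping from X to X_rec.

    Returns the mapping m (index in X_rec for each point of X) and the final
    occupancy of every point of X_rec.
    """
    d = np.sqrt(((X[:, None, :] - X_rec[None, :, :]) ** 2).sum(axis=-1))
    k = min(k, len(X_rec))
    knn = np.argsort(d, axis=1, kind="stable")[:, :k]  # candidates, nearest first
    occupancy = np.zeros(len(X_rec), dtype=int)
    m = np.empty(len(X), dtype=int)
    for i in range(len(X)):  # assumption: points are visited in input order
        cand = knn[i]
        cost = occupancy[cand] * d[i, cand]
        j = cand[np.argmin(cost)]  # first minimum, so ties go to the nearer candidate
        m[i] = j
        occupancy[j] += 1  # occupancies are updated as the mapping is built
    return m, occupancy

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.1, 0.0, 0.0]])
X_rec = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
m_k3, occ_k3 = map_with_occupancy(X, X_rec, k=3)
m_k2, occ_k2 = map_with_occupancy(X, X_rec, k=2)
```

With k = 3 the third point is sent to the free folded point, giving a one-to-one mapping. With k = 2 that free point is not among the candidates, and the second folded point ends up with occupancy 2. The neighborhood size therefore bounds how far a point may be displaced to avoid averaging.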
As noted above, when $o(\tilde{\mathbf{x}}) > 1$, the attributes are averaged, which introduces distortion. We mitigate this problem by adding rows and columns in the 2D grid (see Fig. 2d) using the following procedure. Since $o(\tilde{\mathbf{x}})$ is defined on $\tilde{X}$ and there is a one-to-one mapping between $\tilde{X}$ and $G$, we can compute mean occupancies row-wise and column-wise. In particular, we compute mean occupancies with zeros excluded and we select the row/column with the maximum mean occupancy. Then, we reduce its occupancy by inserting additional rows/columns around it. We repeat this procedure until we obtain a lossless mapping or the relative change on the average of mean occupancies $\Delta_r$ exceeds a threshold $\Delta_{r,min}$.
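The selection step of this procedure can be sketched as below, with the occupancies arranged as a 2D array that follows the grid layout. Only the choice of the row or column is shown; how the inserted rows and columns obtain their 3D positions and the exact form of the stopping test are not detailed in the text, so they are left out. The names `mean_nonzero`, `select_line` and `is_lossless` are hypothetical, and preferring the row on a tie is an assumption.

```python
import numpy as np

def mean_nonzero(occ, axis):
    """Mean of the non-zero entries along `axis` (0.0 for lines without any)."""
    nonzero = occ > 0
    counts = nonzero.sum(axis=axis)
    sums = (occ * nonzero).sum(axis=axis)
    out = np.zeros(counts.shape, dtype=float)
    return np.divide(sums, counts, out=out, where=counts > 0)

def select_line(occ):
    """Pick the row or column with the largest zero-excluded mean occupancy."""
    row_means = mean_nonzero(occ, axis=1)
    col_means = mean_nonzero(occ, axis=0)
    r, c = int(row_means.argmax()), int(col_means.argmax())
    if row_means[r] >= col_means[c]:  # assumption: a tie favors the row
        return "row", r, float(row_means[r])
    return "col", c, float(col_means[c])

def is_lossless(occ):
    """The mapping is lossless when no grid point receives more than one attribute."""
    return bool(occ.max() <= 1)

occ = np.array([[1, 0, 1],
                [3, 0, 2],
                [0, 1, 1]])
kind, index, value = select_line(occ)
```

For this grid the row means are 1, 2.5 and 1 and the column means are 2, 1 and 4/3, so the middle row is selected for new rows to be inserted around it.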
4. EXPERIMENTAL RESULTS
We evaluate our system for static point cloud attribute compression and compare it against G-PCC v3 [12] and v7 [13]. We also study the impact of folding refinement and occupancy optimization on our method by presenting an ablation study. Since folding is less accurate on complex point clouds, we manually segment the point clouds into patches and apply our scheme on each patch. The patches are then reassembled in order to compute rate-distortion measures.
We use TensorFlow 1.15.0 [14]. For the folding refinement, we set $\alpha$ to $1/3$ and perform 100 iterations. When mapping attributes, we consider $k = 9$ neighbors for assignment. When optimizing occupancy, we set $\Delta_{r,min}$ to $10^{-6}$. We then perform image compression using BPG [15], an image format based on HEVC intra [16], with QPs ranging from 20 to 50 with a step of 5.
In Figure 3, we observe that our method performs comparably to G-PCC for "longdress" and "redandblack". The performance is slightly worse for "soldier" as its geometry is much more complex, which makes a good reconstruction difficult and introduces mapping distortion. We obtain significant gains in terms of rate-distortion by improving the reconstruction quality using folding refinement and occupancy optimization. This shows the potential of our method and confirms the importance of reducing the mapping distortion.
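The two axes of the rate-distortion plots can be computed as in the short sketch below. It assumes 8-bit luma values (peak 255) that have already been extracted from the colors, since the color conversion is not specified in the text, and it assumes the rate is the size of the compressed image divided by the number of input points. The function names are hypothetical.

```python
import math
import numpy as np

def y_psnr(y_ref, y_dec, peak=255.0):
    """PSNR in dB between reference and decoded luma values."""
    err = np.asarray(y_ref, dtype=float) - np.asarray(y_dec, dtype=float)
    mse = float(np.mean(err ** 2))
    return math.inf if mse == 0.0 else 10.0 * math.log10(peak ** 2 / mse)

def bits_per_input_point(n_bytes, n_points):
    """Rate normalized by the number of points in the original point cloud."""
    return 8.0 * n_bytes / n_points

psnr = y_psnr([0, 255], [0, 250])
rate = bits_per_input_point(n_bytes=1000, n_points=4000)
```

Normalizing by the number of input points, rather than by the number of grid pixels, keeps the rate comparable across methods that use different grid sizes.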
5. CONCLUSION
Based on the interpretation of a point cloud as a 2D manifold living in a 3D space, we propose to fold a 2D grid onto it and map point cloud attributes into this grid. As the mapping introduces distortion, we propose a folding refinement procedure, an adaptive attribute mapping method and an occupancy optimization scheme to minimize it. With the resulting image, we compress point cloud attributes leveraging conventional image codecs and obtain encouraging results. Our proposed method enables the use of 2D image processing techniques and tools on point cloud attributes.
6. REFERENCES
[1] Sebastian Schwarz, Marius Preda, Vittorio Baroncini, Madhukar Budagavi, Pablo Cesar, Philip A. Chou, Robert A. Cohen, Maja Krivokuća, Sebastien Lasserre, Zhu Li, Joan Llach, Khaled Mammou, Rufael Mekuria, Ohji Nakagami, Ernestasia Siahaan, Ali Tabatabai, Alexis M. Tourapis, and Vladyslav Zakharchenko, “Emerging MPEG standards for point cloud compression,” pp. 1–1.
[2] Cha Zhang, Dinei Florêncio, and Charles Loop, “Point cloud attribute compression with graph transform,” in 2014 IEEE International Conference on Image Processing (ICIP), pp. 2066–2070, ISSN: 2381-8549.
[3] Ricardo L. de Queiroz and Philip A. Chou, “Compression of 3D point clouds using a region-adaptive hierarchical transform,” IEEE Transactions on Image Processing, vol. 25, no. 8, pp. 3947–3956.
[4] Maja Krivokuća, Maxim Koroteev, and Philip A. Chou, “A volumetric approach to point cloud compression.”
[5] Maurice Quach, Giuseppe Valenzise, and Frederic Dufaux, “Learning convolutional transforms for lossy point cloud geometry compression,” in 2019 IEEE International Conference on Image Processing (ICIP), pp. 4320–4324, ISSN: 1522-4880.
[6] Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian, “FoldingNet: Point cloud auto-encoder via deep grid deformation,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Charles Loop, Qin Cai, Sergio O. Escolano, and Philip A. Chou, “Microsoft voxelized upper bodies - a voxelized point cloud dataset,” in ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012.
[8] Vinod Nair and Geoffrey E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, Johannes Fürnkranz and Thorsten Joachims, Eds., pp. 807–814, Omnipress.
[9] Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in ICML Workshop on Deep Learning for Audio, Speech and Language Processing.
[10] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” in 2015 3rd International Conference on Learning Representations.
[11] Sebastian Schwarz, Gaëlle Martin-Cocher, David Flynn, and Madhukar Budagavi, “Common test conditions for point cloud compression,” in ISO/IEC JTC1/SC29/WG11 MPEG output document N17766.
[12] Khaled Mammou, Philip A. Chou, David Flynn, and Maja Krivokuća, “PCC test model category 13 v3,” in ISO/IEC JTC1/SC29/WG11 MPEG output document N17762.
[13] “G-PCC test model v7 user manual,” in ISO/IEC JTC1/SC29/WG11 MPEG output document N18664.
[14] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, “TensorFlow: Large-scale machine learning on heterogeneous distributed systems.”
[15] Fabrice Bellard, “BPG image format.”
[16] “High efficiency video coding (HEVC) version 2 (ITU-T recommendation H.265).”