SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
Peng Xiang1*, Xin Wen1,4*, Yu-Shen Liu1, Yan-Pei Cao2, Pengfei Wan2, Wen Zheng2, Zhizhong Han3
1School of Software, BNRist, Tsinghua University, Beijing, China
2Y-tech, Kuaishou Technology, Beijing, China  3Wayne State University  4JD.com, Beijing, China
xp20@mails.tsinghua.edu.cn  wenxin16@jd.com  liuyushen@tsinghua.edu.cn
caoyanpei@gmail.com  {wanpengfei,zhengwen}@kuaishou.com  h312h@wayne.edu
(a) Input (b) TopNet (c) CDN (d) NSFA (e) SnowflakeNet (f) GT
Figure 1. Visual comparison of point cloud completion results. The input and ground truth have 2048 and 16384 points, respectively. Compared with current completion methods such as TopNet [43], CDN [46] and NSFA [59], our SnowflakeNet generates the complete shape (16384 points) with fine-grained geometric details, such as smooth regions (blue boxes) and sharp edges and corners (green boxes).
Abstract
Point cloud completion aims to predict a complete shape with high accuracy from its partial observation. However, previous methods usually suffer from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it hard to reveal fine local geometric details on the complete shape. To resolve this issue, we propose SnowflakeNet with Snowflake Point Deconvolution (SPD) to generate complete point clouds. SnowflakeNet models the generation of complete point clouds as the snowflake-like growth of points in 3D space, where child points are progressively generated by splitting their parent points after each SPD. Our insight into revealing detailed geometry is to introduce a skip-transformer in SPD to learn point splitting patterns that best fit local regions. The skip-transformer leverages an attention mechanism to summarize the splitting patterns used in the previous SPD layer to produce the splitting in the current SPD layer. The locally compact and structured point clouds generated by SPD precisely capture the structural characteristics of the 3D shape in local patches, which enables the network to predict highly detailed geometries such as smooth regions, sharp edges and corners. Our experimental results outperform the state-of-the-art point cloud completion methods on widely used benchmarks. Code is available at https://github.com/AllenXiangX/SnowflakeNet.
*Equal contribution. This work was supported by the National Key R&D Program of China (2020YFF0304100), the National Natural Science Foundation of China (62072268), and in part by the Tsinghua-Kuaishou Institute of Future Media Data. The corresponding author is Yu-Shen Liu.
1. Introduction
In 3D computer vision [11,18,20,14,13] applications, raw point clouds captured by 3D scanners and depth cameras are usually sparse and incomplete [53,54,48] due to occlusion and limited sensor resolution. Therefore, point cloud completion [53,43], which aims to predict a complete shape from its partial observation, is vital for various downstream tasks. Benefiting from large-scale point cloud datasets, deep learning based point cloud completion methods have been attracting increasing research interest. Current methods either constrain the generation of point clouds by following a hierarchical rooted tree structure [46,55,43] or assume a specific topology [56,53] for the target shape. However, most of these methods suffer from the discrete nature of point clouds and the unstructured prediction of points in local regions, which makes it hard to preserve a well-arranged structure for points in local patches. It is still challenging to capture the local geometric details and structural characteristics of the complete shape, such as smooth regions, sharp edges and corners, as illustrated in Figure 1.
Figure 2. Illustration of Snowflake Point Deconvolution (SPD) for growing part of a car. To show the local changes more clearly, we only illustrate some sampled parent points in the same patch and show their splitting paths to child points, marked as gray and red lines. (a) The SPD splitting from a coarse point cloud $P_1$ (512 points) to $P_2$ (2048 points). (b) The SPD splitting from $P_2$ to the dense complete point cloud $P_3$ (16384 points), where the child points expand like the growth process of snowflakes.
In order to address this problem, we propose a novel
network called SnowflakeNet, focusing especially on the decoding process for completing partial point clouds. SnowflakeNet mainly consists of stacked layers of Snowflake Point Deconvolution (SPD), which model the generation of complete point clouds as the snowflake-like growth of points in 3D space. We progressively generate points by stacking one SPD layer upon another, where each SPD layer produces child points by splitting their parent points while inheriting the shape characteristics captured by the parent points. Figure 2 illustrates the process of SPD and point-wise splitting.
Our insight into revealing detailed geometry is to introduce a skip-transformer in SPD to learn point splitting patterns that best fit local regions. Previous methods often ignore the spatial relationships among points [56,43,31] or simply learn through self-attention within a single level of multi-step point cloud decoding [53,29,46]. In contrast, our skip-transformer integrates the spatial relationships across different levels of decoding. It can therefore establish cross-level spatial relationships between points in different decoding steps and refine their locations to produce more detailed structures. To achieve this, the skip-transformer leverages an attention mechanism to summarize the splitting patterns used in the previous SPD layer, which guides the splitting in the current SPD layer. The skip-transformer learns the shape context and the spatial relationships between points in local patches. This enables the network to precisely capture the structural characteristics in each local patch and to predict a better point cloud shape for both smooth planes and sharp edges in 3D space. We achieve state-of-the-art completion accuracy on the widely used benchmarks. Our main contributions can be summarized as follows.
• We propose a novel SnowflakeNet for point cloud completion. Compared with previous methods that generate locally unorganized complete shapes, SnowflakeNet interprets the generation of a complete point cloud as an explicit and locally structured pattern, which greatly improves the performance of 3D shape completion.
• We propose the novel Snowflake Point Deconvolution (SPD) for progressively increasing the number of points. It reformulates the generation of child points from parent points as a snowflake-like growing process, where the shape characteristics embedded in the parent point features are extracted and inherited by the child points through a point-wise splitting operation.
• We introduce a novel skip-transformer to learn splitting patterns in SPD. It learns the shape context and the spatial relationships between child points and parent points, which encourages SPD to produce locally structured and compact point arrangements and to capture the structural characteristics of the 3D surface in local patches.
2. Related Work
Point cloud completion methods can be roughly divided into two categories. (1) Traditional point cloud completion methods [42,1,44,26] usually assume a smooth surface of the 3D shape, or utilize large-scale datasets of complete shapes to infer the missing regions of an incomplete shape. (2) Deep learning [28,12,15,23,19,17,16,22,50,51] based methods [27,46,4,8,37,25,24], in contrast, learn to predict a complete shape based on priors learned from the training data. Our method falls into the second category and focuses on the decoding process of point cloud completion. We briefly review deep learning based methods below.
Point cloud completion by folding-based decoding. The development of deep learning based 3D point cloud processing techniques [10,52,49,35,33,34,32,21,3,36] has boosted research on point cloud completion. Owing to the discrete nature of point cloud data, generating a high-quality complete shape is one of the major concerns in point cloud completion research. One of the pioneering works is FoldingNet [56], although it was not originally designed for point cloud completion. It proposed a two-stage generation process combined with the assumption that a 3D object lies on a 2D manifold [43]. Following a similar practice, methods like SA-Net [53] further extended this generation process to multiple stages by proposing hierarchical folding in the decoder. However, the problem of these folding-based methods [53,56,30] is that the 3-dimensional code generated by the intermediate layers of the network is an implicit representation of the target shape, which can hardly be interpreted or constrained to help refine the shape in local regions. On the other hand, TopNet [43] modeled the point cloud generation process as the growth of a rooted tree, where one parent point feature is projected into several child point features in a feature expansion layer. As with FoldingNet [56], the intermediate generation processes of TopNet and SA-Net are also implicit, where the shape information is represented only by the point features and cannot be constrained or explained explicitly.
Figure 3. (a) The overall architecture of SnowflakeNet, which consists of three modules: feature extraction, seed generation and point generation. (b) The details of the seed generation module. (c) Snowflake Point Deconvolution (SPD) at the $i$-th step. Note that $N$, $N_c$ and $N_i$ are the numbers of points, and $C$ and $C'$ are the numbers of point feature channels, which are 512 and 128, respectively. (Legend: SPD = snowflake point deconvolution, ST = skip-transformer, PS = point-wise splitting operation, PN = PointNet, FPS = farthest point sampling.)
Point cloud completion by coarse-to-fine decoding. Recently, the explicit coarse-to-fine completion framework [55,5] has received increasing attention, due to its explainable nature and controllable generation process. Typical methods like PCN [57] and NSFA [59] adopted a two-stage generation framework, where a coarse, low-resolution point cloud is first generated by the decoder, and a lifting module is then used to increase the density of the point cloud. Such methods can achieve better performance since they can impose more constraints on the generation process of the point cloud, i.e., on both the coarse result and the dense one. Follow-up works like CDN [46] and PF-Net [27] further extended the number of generation stages and achieved the current state-of-the-art performance. Although intriguing performance has been achieved by the studies along this line, most of these methods still cannot predict a locally structured point splitting pattern, as illustrated in Figure 1. The biggest problem is that these methods focus only on expanding the number of points and reconstructing the global shape, while ignoring the preservation of a well-structured generation process for points in local regions. This makes it difficult for these methods to capture the local detailed geometries and structures of 3D shapes.
Compared with the above-mentioned methods, our SnowflakeNet takes one step further to explore an explicit, explainable and locally structured solution for the generation of complete point clouds. SnowflakeNet models the progressive generation of a point cloud as a hierarchical rooted tree structure like TopNet, while keeping the process explainable and explicit like CDN [46] and PF-Net [27]. Moreover, it surpasses its predecessors by arranging the point splitting in local regions in a locally structured pattern, which enables it to precisely capture the detailed geometries and structures of 3D shapes.
Relation to transformer. Transformer [45] was initially proposed for encoding sentences in natural language processing, and soon became popular in 2D computer vision (CV) research [6,39]. The success of transformer-based 2D CV studies then drew the attention of 3D point cloud research, where pioneering studies like Point Transformer [60], PCT [9] and Pointformer [38] introduced such frameworks into the encoding process of point clouds to learn representations. In our work, instead of only utilizing its representation learning ability, we further extend the application of transformer-based structures to the decoding process of point cloud completion, and reveal their ability to generate high-quality 3D shapes through the proposed skip-transformer.
3. SnowflakeNet
The overall architecture of SnowflakeNet is shown in
Figure 3(a), which consists of three modules: feature ex-
traction, seed generation and point generation. We will de-
tail each module in the following.
3.1. Overview
Feature extraction module. Let $P = \{p_j\}$ of size $N \times 3$ be an input point cloud, where $N$ is the number of points and each point $p_j$ is a 3D coordinate. The feature extractor aims to extract a shape code $f$ of size $1 \times C$, which captures the global structure and detailed local patterns of the target shape. To achieve this, we adopt three layers of set abstraction from [41] to aggregate point features from local to global, along which point transformer [60] is applied to incorporate local shape context.
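For intuition, the following is a minimal, self-contained sketch of this local-to-global aggregation under simplifying assumptions of ours: farthest point sampling is replaced by taking the first points, the shared MLPs are single layers, the interleaved point transformer is omitted, and all channel widths are illustrative rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

def set_abstraction(xyz, feats, mlp, n_centers, k):
    """One set-abstraction level [41], sketched: pick centers, group the k
    nearest neighbors, apply a shared MLP, and max-pool over each group."""
    centers = xyz[:n_centers]                                       # stand-in for FPS
    idx = torch.cdist(centers, xyz).topk(k, largest=False).indices  # (M, K) neighbor ids
    grouped = torch.cat([xyz[idx] - centers.unsqueeze(1),           # relative coordinates
                         feats[idx]], dim=-1)                       # (M, K, 3 + C_in)
    return centers, mlp(grouped).max(dim=1).values                  # (M, 3), (M, C_out)

# Three levels aggregating 2048 -> 512 -> 128 -> 1 points, ending in a
# 1 x C shape code with C = 512 (cf. Figure 3); widths are illustrative.
xyz, feats = torch.randn(2048, 3), torch.randn(2048, 8)
for n_c, c_in, c_out in [(512, 11, 64), (128, 67, 256), (1, 259, 512)]:
    mlp = nn.Sequential(nn.Linear(c_in, c_out), nn.ReLU())
    xyz, feats = set_abstraction(xyz, feats, mlp, n_c, k=16)
f = feats  # shape code of size 1 x 512
```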
Seed generation module. The objective of the seed generator is to produce a coarse but complete point cloud $P_0$ of size $N_0 \times 3$ that captures the geometry and structure of the target shape. As shown in Figure 3(b), given the extracted shape code $f$, the seed generator first produces point features that capture both the existing and the missing shape through the point-wise splitting operation. Next, the per-point features are integrated with the shape code through a multilayer perceptron (MLP) to generate a coarse point cloud $P_c$ of size $N_c \times 3$. Then, following the previous method [46], $P_c$ is merged with the input point cloud $P$ by concatenation, and the merged point cloud is down-sampled to $P_0$ through farthest point sampling (FPS) [41]. In this paper, we typically set $N_c = 256$ and $N_0 = 512$, where such a sparse point cloud $P_0$ suffices to represent the underlying shape. $P_0$ serves as the seed point cloud for the point generation module.
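A minimal sketch of this merge-and-downsample step, assuming a greedy farthest point sampling implementation (the function and variable names below are ours, not the paper's):

```python
import torch

def farthest_point_sample(points: torch.Tensor, n_samples: int) -> torch.Tensor:
    """Greedy FPS [41]: repeatedly pick the point farthest from the set of
    points already selected. points: (N, 3); returns sample indices."""
    n = points.shape[0]
    selected = torch.zeros(n_samples, dtype=torch.long)  # index 0 seeds the set
    dist = torch.full((n,), float("inf"))
    for i in range(1, n_samples):
        d = ((points - points[selected[i - 1]]) ** 2).sum(dim=1)
        dist = torch.minimum(dist, d)     # distance to the nearest selected point
        selected[i] = int(dist.argmax())  # the farthest point joins the set
    return selected

# Seed generation (Fig. 3b): merge the coarse prediction Pc (Nc = 256)
# with the partial input P, then FPS-downsample the union to P0 (N0 = 512).
def make_seeds(p_coarse, p_input, n0=512):
    merged = torch.cat([p_coarse, p_input], dim=0)
    return merged[farthest_point_sample(merged, n0)]
```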
Point generation module. The point generation module consists of three Snowflake Point Deconvolution (SPD) steps, each of which takes the point cloud from the previous step and splits it by an up-sampling factor (denoted by $r_1$, $r_2$ and $r_3$) to obtain $P_1$, $P_2$ and $P_3$, which have sizes of $N_1 \times 3$, $N_2 \times 3$ and $N_3 \times 3$, respectively. The SPDs collaborate with each other to generate a rooted tree structure that complies with the local pattern around every seed point. The structure of SPD is detailed below.
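As a quick illustration of how the point count grows multiplicatively across the three steps, assuming factors of (1, 4, 8), which match the dense setting of Figure 2:

```python
# N0 = 512 seeds grown through factors (r1, r2, r3) = (1, 4, 8); the
# factors are an assumption matching Figure 2 (512 / 2048 / 16384 points).
n, sizes = 512, []
for r in (1, 4, 8):
    n *= r
    sizes.append(n)
print(sizes)  # [512, 2048, 16384]
```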
3.2. Snowflake Point Deconvolution (SPD)
The SPD aims to increase the number of points by splitting each parent point into multiple child points, which can be achieved by first duplicating the parent points and then adding variations. Existing methods [46,57,59] usually adopt the folding-based strategy [56] to obtain the variations, which are used for learning different displacements for the duplicated points. However, the folding operation samples the same 2D grids for each parent point, which ignores the local shape characteristics contained in the parent point. Different from the folding-based methods [46,57,59], SPD obtains variations through a point-wise splitting operation, which fully leverages the geometric information in the parent points and adds variations that comply with local patterns. In order to progressively generate the split points, three SPDs are used in the point generation module. In addition, to facilitate consecutive SPDs splitting points in a coherent manner, we propose a novel skip-transformer to capture the shape context and the spatial relationship between the parent points and their split points.
Figure 4. The point-wise splitting operation. The cubes are logits of the parent point feature that represent the activation status of the corresponding shape characteristics (kernels); the child point features are obtained by adding the activated shape characteristics.
Figure 3(c) illustrates the structure of the $i$-th SPD with up-sampling factor $r_i$. We denote the set of parent points obtained from the previous step as $P_{i-1} = \{p_j^{i-1}\}_{j=1}^{N_{i-1}}$. We split the parent points in $P_{i-1}$ by duplicating them $r_i$ times to generate a set of child points $\hat{P}_i$, and then spread $\hat{P}_i$ into the neighborhood of the parent points. To achieve this, we take inspiration from [57] and predict the point displacement $\Delta P_i$ of $\hat{P}_i$. Then, $\hat{P}_i$ is updated as $P_i = \hat{P}_i + \Delta P_i$, where $P_i$ is the output of the $i$-th SPD.
In detail, taking the shape code $f$ from feature extraction, the SPD first extracts the per-point features $Q_{i-1} = \{q_j^{i-1}\}_{j=1}^{N_{i-1}}$ for $P_{i-1}$ by adopting the basic PointNet [40] framework. Then, $Q_{i-1}$ is sent to the skip-transformer to learn the shape context features, denoted as $H_{i-1} = \{h_j^{i-1}\}_{j=1}^{N_{i-1}}$. Next, $H_{i-1}$ is up-sampled by the point-wise splitting operation and by duplication, respectively, where the former serves to add variations and the latter preserves shape context information. Finally, the up-sampled feature of size $N_i \times 2C'$ is fed to an MLP to produce the displacement feature $K_i = \{k_j^i\}_{j=1}^{N_i}$ of the current step. Here, $K_i$ is used for generating the point displacement $\Delta P_i$, and it is also fed into the next SPD. $\Delta P_i$ is formulated as
$$\Delta P_i = \tanh(\mathrm{MLP}(K_i)), \qquad (1)$$
where $\tanh$ is the hyperbolic tangent activation.
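To make the data flow concrete, below is a minimal PyTorch sketch of one SPD step. The per-point feature extractor and the skip-transformer are replaced by small stand-in layers (the real model uses PointNet and the skip-transformer of Sec. 3.3), and all widths and names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SPDSketch(nn.Module):
    """A minimal sketch of one SPD step (Fig. 3c, Eq. 1)."""
    def __init__(self, c_shape: int = 512, c: int = 128, r: int = 4):
        super().__init__()
        self.r = r
        self.pn = nn.Linear(3 + c_shape, c)        # stand-in for the PointNet step
        self.st = nn.Linear(2 * c, c)              # stand-in for the skip-transformer
        self.ps = nn.ConvTranspose1d(c, c, r, r)   # point-wise splitting (Sec. 3.2)
        self.mlp_k = nn.Linear(2 * c, c)           # displacement feature K_i
        self.mlp_dp = nn.Linear(c, 3)              # maps K_i to a 3D displacement

    def forward(self, p_prev, k_prev, f):
        n = p_prev.shape[0]
        q = self.pn(torch.cat([p_prev, f.expand(n, -1)], dim=-1))   # per-point features
        k_in = k_prev if k_prev is not None else q                  # first SPD has no K
        h = torch.relu(self.st(torch.cat([q, k_in], dim=-1)))       # shape context H
        child = self.ps(h.t().unsqueeze(0)).squeeze(0).t()          # PS branch: (n*r, C)
        dup = h.repeat_interleave(self.r, dim=0)                    # duplication branch
        k_cur = self.mlp_k(torch.cat([child, dup], dim=-1))         # (n*r, C)
        dp = torch.tanh(self.mlp_dp(k_cur))                         # Eq. (1)
        return p_prev.repeat_interleave(self.r, dim=0) + dp, k_cur  # P_i, K_i
```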
Point-wise splitting operation. The point-wise splitting operation aims to generate multiple child point features for each $h_j^{i-1} \in H_{i-1}$. Figure 4 shows the structure of this operation as used in the $i$-th SPD (see Figure 3(c)). It is a special one-dimensional deconvolution strategy, where the kernel size and stride are both equal to $r_i$. In practice, each $h_j^{i-1} \in H_{i-1}$ shares the same set of kernels and produces multiple child point features in a point-wise manner. To be clear, we denote the $m$-th logit of $h_j^{i-1}$ as $h_{j,m}^{i-1}$, and its corresponding kernel as $K_m$. Technically, $K_m$ is a matrix of size $r_i \times C'$; the $k$-th row of $K_m$ is denoted as $k_{m,k}$, and the $k$-th child point feature $g_{j,k}$ is given by
$$g_{j,k} = \sum_m h_{j,m}^{i-1} \, k_{m,k}. \qquad (2)$$
Figure 5. The detailed structure of the skip-transformer. (Legend: the relation function is element-wise subtraction; ⊕ element-wise addition, ⊙ Hadamard product, C concatenation.)
In addition, as shown in Figure 4, we assume that each learnable kernel $K_m$ indicates a certain shape characteristic, which describes the geometry and structure of a 3D shape in a local region. Correspondingly, each logit $h_{j,m}^{i-1}$ indicates the activation status of the $m$-th shape characteristic. The child point features can thus be generated by adding the activated shape characteristics. Moreover, the point-wise splitting operation is flexible for up-sampling points. For example, when $r_i = 1$, it enables the SPD to move the point from the previous step to a better position; when $r_i > 1$, it expands the number of points by a factor of $r_i$.
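Because the operation is exactly a one-dimensional deconvolution whose kernel size and stride both equal $r_i$, it can be written directly with `torch.nn.ConvTranspose1d`; here $r_i = 4$, $C' = 128$ and the batch size are illustrative values:

```python
import torch
import torch.nn as nn

# Point-wise splitting (Eq. 2) as a 1D deconvolution with kernel size and
# stride both equal to the up-sampling factor r_i. C' = 128 follows Fig. 3.
r_i, c = 4, 128
ps = nn.ConvTranspose1d(in_channels=c, out_channels=c, kernel_size=r_i, stride=r_i)

h = torch.randn(1, c, 512)  # parent features H_{i-1} as (batch, C', N_{i-1})
g = ps(h)                   # each parent yields r_i children: (1, C', 512 * r_i)
print(g.shape)              # torch.Size([1, 128, 2048])
```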
Collaboration between SPDs. As shown in Figure 3(a), we adopt three SPDs to generate the complete point cloud. We first set the up-sampling factor $r_1 = 1$ to explicitly rearrange the seed point positions. Then, we set $r_2 > 1$ and $r_3 > 1$ to generate a structured tree for every point in $P_1$. Collaboration between the SPDs is crucial for growing the trees in a coherent manner, because information from the previous splitting can be used to guide the current one. Besides, the growth of the rooted trees should also capture the pattern of local patches so that they avoid overlapping with each other. To achieve this, we propose a novel skip-transformer to serve as the cooperation unit between SPDs. As shown in Figure 5, the skip-transformer takes the per-point feature $q_j^{i-1}$ as input and combines it with the displacement feature $k_j^{i-1}$ from the previous step to produce the shape context feature $h_j^{i-1}$, which is given by
$$h_j^{i-1} = \mathrm{ST}(k_j^{i-1}, q_j^{i-1}), \qquad (3)$$
where $\mathrm{ST}$ denotes the skip-transformer. Its detailed structure is described as follows.
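As a usage example, chaining three instances of the `SPDSketch` class sketched in Sec. 3.2 mirrors this collaboration, with each step's displacement feature handed to the next; the seeds and shape code are random stand-ins here:

```python
import torch

p0, f = torch.randn(512, 3), torch.randn(1, 512)  # seeds P0 and shape code f
p, k = p0, None  # the first SPD has no previous displacement feature
for spd in [SPDSketch(r=r) for r in (1, 4, 8)]:
    p, k = spd(p, k, f)  # K_i from this step guides the next splitting
print(p.shape)  # torch.Size([16384, 3])
```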
3.3. Skip-Transformer
Figure 5 shows the structure of the skip-transformer. The skip-transformer is introduced to learn and refine the spatial context between parent points and their child points, where the term "skip" refers to the connection between the displacement feature from the previous layer and the point feature of the current layer.
Given the per-point feature $q_j^{i-1}$ and the displacement feature $k_j^{i-1}$, the skip-transformer first concatenates them. The concatenated feature is then fed to an MLP, which generates the vector $v_j^{i-1}$. Here, $v_j^{i-1}$ serves as the value vector, which incorporates the previous point splitting information. In order to further aggregate local shape context into $v_j^{i-1}$, the skip-transformer uses $q_j^{i-1}$ as the query and $k_j^{i-1}$ as the key to estimate the attention vector $a_j^{i-1}$, where $a_j^{i-1}$ denotes how much attention the current splitting should pay to the previous one. To enable the skip-transformer to concentrate on local patterns, we calculate attention vectors between each point and its $k$-nearest neighbors ($k$-NN). The $k$-NN strategy also helps to reduce the computation cost. Specifically, given the $j$-th point feature $q_j^{i-1}$, the attention vector $a_{j,l}^{i-1}$ between $q_j^{i-1}$ and the displacement features of the $k$-nearest neighbors $\{k_{j,l}^{i-1} \mid l = 1, 2, \dots, k\}$ is calculated as
$$a_{j,l}^{i-1} = \frac{\exp\!\big(\mathrm{MLP}(q_j^{i-1} \ominus k_{j,l}^{i-1})\big)}{\sum_{l=1}^{k} \exp\!\big(\mathrm{MLP}(q_j^{i-1} \ominus k_{j,l}^{i-1})\big)}, \qquad (4)$$
where $\ominus$ serves as the relation operation, i.e., element-wise subtraction. Finally, the shape context feature $h_j^{i-1}$ is obtained by
$$h_j^{i-1} = v_j^{i-1} \oplus \sum_{l=1}^{k} a_{j,l}^{i-1} \odot v_{j,l}^{i-1}, \qquad (5)$$
where $\oplus$ denotes element-wise addition and $\odot$ is the Hadamard product. Note that there is no previous displacement feature for the first SPD, whose skip-transformer takes $q_j^0$ as both query and key.
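A minimal PyTorch sketch of Eqs. (3)-(5) follows. The single-layer MLPs, the neighborhood size, and computing the $k$-NN from point positions are simplifying assumptions of ours:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipTransformerSketch(nn.Module):
    """A sketch of Eqs. (3)-(5): value from concat(q, k), vector attention
    from the relation q - k over k-NN, output v + sum(a * v) per Eq. (5)."""
    def __init__(self, c: int = 128, n_knn: int = 8):
        super().__init__()
        self.n_knn = n_knn
        self.mlp_v = nn.Linear(2 * c, c)  # value vector v from [q; k]
        self.mlp_a = nn.Linear(c, c)      # attention logits from (q - k)

    def forward(self, pos, q, k):
        # pos: (N, 3) point positions; q, k: (N, C) per-point / displacement features
        idx = torch.cdist(pos, pos).topk(self.n_knn, largest=False).indices  # (N, K)
        v = self.mlp_v(torch.cat([q, k], dim=-1))                            # (N, C)
        logits = self.mlp_a(q.unsqueeze(1) - k[idx])      # relation: subtraction, Eq. (4)
        a = F.softmax(logits, dim=1)                      # softmax over the K neighbors
        return v + (a * v[idx]).sum(dim=1)                # Eq. (5): addition + Hadamard
```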
3.4. Training Loss
In our implementation, we use the Chamfer distance (CD) as the primary loss function. To explicitly constrain the point clouds generated by seed generation and the subsequent splitting steps, we down-sample the ground truth point clouds to the same sampling densities as $\{P_c, P_1, P_2, P_3\}$ (see Figure 3), and we define the sum of the four CD losses as the completion loss, denoted by $\mathcal{L}_{\mathrm{completion}}$. Besides, we also exploit the partial matching loss from [48] to preserve the shape structure of the input point cloud. It is a unidirectional constraint that aims to match one shape to the other without constraining the opposite direction. Because the partial matching loss only requires the output point cloud to partially match the input, we take it as the preservation loss $\mathcal{L}_{\mathrm{preservation}}$, and the total training loss is formulated as
$$\mathcal{L} = \mathcal{L}_{\mathrm{completion}} + \lambda \mathcal{L}_{\mathrm{preservation}}. \qquad (6)$$
The arrangement is detailed in the Supplementary Material.
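A minimal sketch of the training loss, assuming the squared (L2) Chamfer form and treating $\lambda$ as a free hyperparameter:

```python
import torch

def chamfer(p, q):
    """Bidirectional Chamfer distance between point sets p: (N, 3), q: (M, 3)."""
    d = torch.cdist(p, q) ** 2
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def partial_matching(partial, pred):
    """Unidirectional loss [48]: every input point needs a close match in the
    prediction, with no constraint in the opposite direction."""
    return (torch.cdist(partial, pred) ** 2).min(dim=1).values.mean()

def total_loss(preds, gts, partial, lam=1.0):
    # preds / gts pair {Pc, P1, P2, P3} with ground truths down-sampled to
    # matching densities; lam is the weight lambda in Eq. (6).
    completion = sum(chamfer(p, g) for p, g in zip(preds, gts))
    return completion + lam * partial_matching(partial, preds[-1])
```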
4. Experiments
To fully demonstrate the effectiveness of SnowflakeNet, we conduct comprehensive experiments on two widely used benchmarks, PCN [57] and Completion3D [43], both of which are subsets of the ShapeNet dataset. The experiments demonstrate the superiority of our method over the state-of-the-art point cloud completion methods.
Table 1. Point cloud completion on the PCN dataset in terms of per-point L1 Chamfer distance ×10³ (lower is better).
Methods Average Plane Cabinet Car Chair Lamp Couch Table Boat
FoldingNet [56] 14.31 9.49 15.80 12.61 15.55 16.41 15.97 13.65 14.99
TopNet [43] 12.15 7.61 13.31 10.90 13.82 14.44 14.78 11.22 11.12
AtlasNet [7] 10.85 6.37 11.94 10.10 12.06 12.37 12.99 10.33 10.61
PCN [57] 9.64 5.50 22.70 10.63 8.70 11.00 11.34 11.68 8.59
GRNet [55] 8.83 6.45 10.37 9.45 9.41 7.96 10.51 8.44 8.04
CDN [46] 8.51 4.79 9.97 8.31 9.49 8.94 10.69 7.81 8.05
PMP-Net [54] 8.73 5.65 11.24 9.64 9.51 6.95 10.83 8.72 7.25
NSFA [59] 8.06 4.76 10.18 8.63 8.53 7.03 10.53 7.35 7.48
Ours 7.21 4.29 9.16 8.08 7.89 6.07 9.23 6.55 6.40
Figure 6. Visual comparison of point cloud completion on the PCN dataset (columns: input, CDN, GRNet, NSFA, PMP-Net, SnowflakeNet, GT; objects: car, boat, chair, couch). Our SnowflakeNet produces smoother surfaces (e.g., car) and more detailed structures (e.g., chair back) compared with the other state-of-the-art point cloud completion methods.
4.1. Evaluation on PCN Dataset
Dataset briefs and evaluation metric. The PCN dataset [57] is a subset of the ShapeNet dataset [2] with 8 categories. The incomplete shapes are generated by back-projecting complete shapes into 8 different partial views. For each complete shape, 16384 points are evenly sampled from the shape surface. We follow the same split settings as PCN [57] to fairly compare SnowflakeNet with other methods. For evaluation, we adopt the L1 version of the Chamfer distance, following the same practice as previous methods [57].
Quantitative comparison. Table 1 shows the results of SnowflakeNet and other completion methods on the PCN dataset, from which we can see that SnowflakeNet achieves the best performance among all counterparts. In particular, compared with the second-ranked NSFA [59], SnowflakeNet reduces the average CD by 0.85, which is 10.5% lower than NSFA's result (8.06 in terms of average CD). Moreover, SnowflakeNet also achieves the best results in all categories in terms of CD, which demonstrates its robust generalization ability for completing shapes across different categories. In Table 1, both CDN [46] and NSFA [59] are typical point cloud completion methods that adopt a coarse-to-fine shape decoding strategy and model the generation of points as a hierarchical rooted tree. Our SnowflakeNet adopts the same decoding strategy but achieves much better results on the PCN dataset. Therefore, the improvements should be credited to the proposed SPD layers and the skip-transformer, which help to generate points in local regions in a locally structured pattern.
Visual comparison. We choose the top four point cloud completion methods from Table 1 and visually compare SnowflakeNet with them in Figure 6. The visual results show that SnowflakeNet predicts complete point clouds with much better shape quality. For example, in the car category, the point distribution on the car's boundary generated by SnowflakeNet is smoother and more uniform than that of the other methods. As for the chair category, SnowflakeNet predicts a more detailed and clearer structure for the chair back, where CDN [46] almost fails to preserve the basic structure of the chair back, while the other methods generate lots of noise between the columns of the chair back.
Table 2. Point cloud completion on Completion3D in terms of per-point L2 Chamfer distance ×10⁴ (lower is better).
Methods Average Plane Cabinet Car Chair Lamp Couch Table Boat
FoldingNet [56] 19.07 12.83 23.01 14.88 25.69 21.79 21.31 20.71 11.51
PCN [57] 18.22 9.79 22.70 12.43 25.14 22.72 20.26 20.27 11.73
PointSetVoting [58] 18.18 6.88 21.18 15.78 22.54 18.78 28.39 19.96 11.16
AtlasNet [7] 17.77 10.36 23.40 13.40 24.16 20.24 20.82 17.52 11.62
SoftPoolNet [47] 16.15 5.81 24.53 11.35 23.63 18.54 20.34 16.89 7.14
TopNet [43] 14.25 7.32 18.77 12.88 19.82 14.60 16.29 14.89 8.82
SA-Net [53] 11.22 5.27 14.45 7.78 13.67 13.53 14.22 11.75 8.84
GRNet [55] 10.64 6.13 16.90 8.27 12.23 10.22 14.93 10.08 5.86
PMP-Net [54] 9.23 3.99 14.70 8.55 10.21 9.27 12.43 8.51 5.77
Ours 7.60 3.48 11.09 6.90 8.75 8.42 10.15 6.46 5.32
4.2. Evaluation on Completion3D Dataset
Dataset briefs and evaluation metric. The Completion3D dataset contains 30958 models from 8 categories, in which both the partial and ground truth point clouds have 2048 points. We follow the same train/validation/test split of Completion3D for a fair comparison with the other methods, where the training set contains 28974 models, and the validation and testing sets contain 800 and 1184 models, respectively. For evaluation, we adopt the L2 version of the Chamfer distance on the testing set, in line with previous studies.
Quantitative comparison. In Table 2, we show the quantitative results of SnowflakeNet and the other methods on the Completion3D dataset. All results are cited from the public online leaderboard of Completion3D*. From Table 2, we can see that SnowflakeNet achieves the best results among all methods listed on the leaderboard. In particular, compared with the state-of-the-art method PMP-Net [54], SnowflakeNet significantly reduces the average CD by 1.63, which is 17.3% lower than PMP-Net's result (9.23 in terms of average CD). On the Completion3D dataset, SnowflakeNet outperforms the other methods in all categories in terms of per-category CD. Especially in the cabinet category, SnowflakeNet reduces the per-category CD by 3.61 compared with the second-ranked result of PMP-Net. Compared with the PCN dataset, the point clouds in the Completion3D dataset are much sparser and easier to generate. Therefore, a coarse-to-fine decoding strategy may have fewer advantages over the other methods. Despite this, our SnowflakeNet still achieves superior performance over folding-based methods including SA-Net [53] and FoldingNet [56], and it is also the best among the coarse-to-fine methods including TopNet [43] and GRNet [55]. In all, the results on the Completion3D dataset demonstrate the capability of SnowflakeNet for predicting high-quality complete shapes from sparse point clouds.
*https://completion3d.stanford.edu/results
Figure 7. Visual comparison of point cloud completion on the Completion3D dataset (rows: table, plane, boat, car; columns: input, TopNet, SA-Net, GRNet, PMP-Net, SnowflakeNet, GT). Our SnowflakeNet produces smoother surfaces (e.g., car and table) and more detailed structures compared with the other state-of-the-art point cloud completion methods.
Visual comparison. Following the same practice as on the PCN dataset, we visually compare SnowflakeNet with the top four methods in Table 2. The visual comparison in Figure 7 demonstrates that SnowflakeNet also achieves much better visual results than the other counterparts on the sparse point cloud completion task. In particular, in the plane category, SnowflakeNet predicts a complete plane that is almost the same as the ground truth, while the other methods fail to reveal the complete plane in detail. The same conclusion can be drawn from the car category. In the table and boat categories, SnowflakeNet produces more detailed structures than the other methods, e.g., the sails of the boat and the legs of the table.
4.3. Ablation studies
We analyze the effectiveness of each part of SnowflakeNet. For convenience, we conduct all experiments on the validation set of the Completion3D dataset. By default, all experiment settings and the network structure remain the same as in Section 4.2, except for the analyzed part.
Effect of skip-transformer. To evaluate the effectiveness of the skip-transformer used in SnowflakeNet, we develop three network variations as follows. (1) The Self-att variation replaces the transformer mechanism in the skip-transformer with a self-attention mechanism, where the input is the point features of the current layer. (2) The No-att variation removes the transformer mechanism from the skip-transformer, where the features from the previous SPD layer are directly added to the features of the current SPD layer. (3) The No-connect variation removes the whole skip-transformer from the SPD layers, and thus no feature connection is established between the SPD layers. The experimental results are shown in Table 3, where we denote the original version of SnowflakeNet as Full for clear comparison with each network variation. From Table 3, we can see that the transformer-based Full model achieves the best performance among all compared network variations. The comparison between the No-connect model and the Full model justifies the advantage of using the skip-transformer between SPD layers, and the comparison between the No-att model and the Full model further proves the effectiveness of using the transformer mechanism to learn shape context in local regions. Moreover, the comparison between the Self-att model and the No-att model shows that the attention-based mechanism also contributes to the completion performance.
Table 3. Effect of skip-transformer.
Methods avg. Couch Chair Car Lamp
Self-att 8.89 6.04 10.9 9.42 9.12
No-att 9.30 6.15 11.2 10.4 9.38
No-connect 9.39 6.17 11.3 10.5 9.51
Full 8.48 5.89 10.6 9.32 8.12
Effect of each part in SnowflakeNet. To evaluate the effectiveness of each part of SnowflakeNet, we design four network variations as follows. (1) The Folding-expansion variation replaces the point-wise splitting operation with the folding-based feature expansion method [56], where the features are duplicated several times and concatenated with a 2-dimensional codeword in order to increase the number of point features. (2) The EPCN+SPD variation employs the PCN encoder together with our SnowflakeNet decoder. (3) The w/o partial matching variation removes the partial matching loss. (4) The PCN-baseline is the original PCN method [57], trained and evaluated under the same settings as our ablation study. In Table 4, we report the results of the four network variations along with the default network, denoted as Full. Comparing EPCN+SPD with PCN-baseline shows that our SPD with skip-transformer based decoder can be applied to other simple encoders and achieves significant improvement. Comparing Folding-expansion with the Full model, the better performance of the Full model proves the advantage of the point-wise splitting operation over folding-based feature expansion. Comparing w/o partial matching with the Full model shows that the partial matching loss slightly improves the average performance of SnowflakeNet.
Table 4. Effect of each part in SnowflakeNet.
Methods avg. Couch Chair Car Lamp
Folding-expansion 8.80 8.40 10.80 5.83 10.10
EPCN+SPD 8.93 9.06 11.30 6.14 9.23
w/o partial matching 8.50 8.72 10.60 5.78 8.90
PCN-baseline 13.30 11.50 17.00 6.55 18.20
Full 8.48 8.12 10.60 5.89 9.32
Figure 8. Visualization of snowflake point deconvolution on different objects (chair, plane, table, lamp). For each object, we sample two patches of points and visualize two layers of point splitting together for each sampled point. The gray lines indicate the paths of the point splitting from P1 to P2, and the red lines are the splitting paths from P2 to P3.
Visualization of the point generation process of SPD. In Figure 8, we visualize the point cloud generation process of SPD. We can see that the SPD layers generate points in a snowflake-like pattern. When generating a smooth plane (e.g., the chair and lamp in Figure 8), we can clearly see
the child points are generated around the parent points, and
smoothly placed along the plane surface. On the other hand,
when generating thin tubes and sharp edges, the child points
can precisely capture the geometries.
5. Conclusions
In this paper, we propose a novel neural network for point cloud completion, named SnowflakeNet. SnowflakeNet models the generation of complete point clouds as the snowflake-like growth of points in 3D space using multiple layers of Snowflake Point Deconvolution. By further introducing the skip-transformer in Snowflake Point Deconvolution, SnowflakeNet learns to generate locally compact and structured point clouds with highly detailed geometries. We conduct comprehensive experiments on sparse (Completion3D) and dense (PCN) point cloud completion datasets, which show the superiority of SnowflakeNet over the current state-of-the-art point cloud completion methods.
References
[1] Matthew Berger, Andrea Tagliasacchi, Lee Seversky, Pierre
Alliez, Joshua Levine, Andrei Sharf, and Claudio Silva. State
of the art in surface reconstruction from point clouds. In Pro-
ceedings of Eurographics, volume 1, pages 161–185, 2014.
2
[2] Angel X Chang, Thomas Funkhouser, Leonidas J Guibas,
Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese,
Manolis Savva, Shuran Song, Hao Su, et al. ShapeNet: An
information-rich 3D model repository. arXiv:1512.03012,
2015. 6
[3] Chao Chen, Zhizhong Han, Yu-Shen Liu, and Matthias
Zwicker. Unsupervised learning of fine structure generation
for 3D point clouds by 2D projection matching. In Proceed-
ings of the IEEE International Conference on Computer Vi-
sion (ICCV), 2021. 2
[4] Xuelin Chen, Baoquan Chen, and Niloy J Mitra. Unpaired
point cloud completion on real scans using adversarial train-
ing. In International Conference on Learning Representa-
tions, 2019. 2
[5] Angela Dai, Charles Ruizhongtai Qi, and Matthias Nießner.
Shape completion using 3D-encoder-predictor CNNs and
shape synthesis. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), pages
5868–5877, 2017. 3
[6] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov,
Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner,
Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl-
vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is
worth 16x16 words: Transformers for image recognition at
scale. In International Conference on Learning Representa-
tions, 2021. 3
[7] Thibault Groueix, Matthew Fisher, Vladimir Kim, Bryan
Russell, and Mathieu Aubry. A papier-mâché approach to
learning 3D surface generation. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition
(CVPR), pages 216–224, 2018. 6,7
[8] Jiayuan Gu, Wei-Chiu Ma, Sivabalan Manivasagam,
Wenyuan Zeng, Zihao Wang, Yuwen Xiong, Hao Su, and
Raquel Urtasun. Weakly-supervised 3D shape completion in
the wild. In Proc. of the European Conf. on Computer Vision
(ECCV). Springer, 2020. 2
[9] Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang
Mu, Ralph R. Martin, and Shi-Min Hu. PCT: Point cloud
transformer. Computational Visual Media, 7(2):187–199,
Apr 2021. 3
[10] Zhizhong Han, Chao Chen, Yu-Shen Liu, and Matthias
Zwicker. DRWR: A differentiable renderer without render-
ing for unsupervised 3D structure learning from silhouette
images. In International Conference on Machine Learning
(ICML), 2020. 2
[11] Zhizhong Han, Chao Chen, Yu-Shen Liu, and Matthias
Zwicker. ShapeCaptioner: Generative caption network for
3D shapes by learning a mapping from parts detected in mul-
tiple views to sentences. In Proceedings of the 28th ACM
International Conference on Multimedia, pages 1018–1027,
2020. 1
[12] Zhizhong Han, Xinhai Liu, Yu-Shen Liu, and Matthias
Zwicker. Parts4Feature: Learning 3D global features from
generally semantic parts in multiple views. In International
Joint Conference on Artificial Intelligence, 2019. 2
[13] Zhizhong Han, Zhenbao Liu, Chi-Man Vong, Yu-Shen Liu,
Shuhui Bu, Junwei Han, and CL Philip Chen. BoSCC:
Bag of spatial context correlations for spatially enhanced 3D
shape representation. IEEE Transactions on Image Process-
ing, 26(8):3707–3720, 2017. 1
[14] Zhizhong Han, Zhenbao Liu, Chi-Man Vong, Yu-Shen Liu,
Shuhui Bu, Junwei Han, and CL Philip Chen. Deep Spatial-
ity: Unsupervised learning of spatially-enhanced global and
local 3D features by deep neural network with coupled soft-
max. IEEE Transactions on Image Processing, 27(6):3049–
3063, 2018. 1
[15] Zhizhong Han, Honglei Lu, Zhenbao Liu, Chi-Man Vong,
Yu-Shen Liu, Matthias Zwicker, Junwei Han, and CL Philip
Chen. 3D2SeqViews: Aggregating sequential views for
3D global feature learning by CNN with hierarchical atten-
tion aggregation. IEEE Transactions on Image Processing,
28(8):3986–3999, 2019. 2
[16] Zhizhong Han, Baorui Ma, Yu-Shen Liu, and Matthias
Zwicker. Reconstructing 3D shapes from multiple sketches
using direct shape optimization. IEEE Transactions on Im-
age Processing, 29:8721–8734, 2020. 2
[17] Zhizhong Han, Guanhui Qiao, Yu-Shen Liu, and Matthias
Zwicker. SeqXY2SeqZ: Structure learning for 3D shapes
by sequentially predicting 1D occupancy segments from 2D
coordinates. In European Conference on Computer Vision,
pages 607–625. Springer, 2020. 2
[18] Zhizhong Han, Mingyang Shang, Yu-Shen Liu, and Matthias
Zwicker. View inter-prediction GAN: Unsupervised repre-
sentation learning for 3D shapes by learning global shape
memories to support local view predictions. In The 33rd
AAAI Conference on Artificial Intelligence (AAAI), 2019. 1
[19] Zhizhong Han, Mingyang Shang, Zhenbao Liu, Chi-Man
Vong, Yu-Shen Liu, Matthias Zwicker, Junwei Han, and
CL Philip Chen. SeqViews2SeqLabels: Learning 3D global
features via aggregating sequential views by RNN with at-
tention. IEEE Transactions on Image Processing, 28(2):658–
672, 2018. 2
[20] Zhizhong Han, Mingyang Shang, Xiyang Wang, Yu-Shen
Liu, and Matthias Zwicker. Y2Seq2Seq: Cross-modal repre-
sentation learning for 3D shape and text by joint reconstruc-
tion and prediction of view and word sequences. The 33th
AAAI Conference on Artificial Intelligence (AAAI), 2019. 1
[21] Zhizhong Han, Xiyang Wang, Yu-Shen Liu, and Matthias
Zwicker. Multi-angle point cloud-vae: Unsupervised fea-
ture learning for 3D point clouds from multiple angles by
joint self-reconstruction and half-to-half prediction. In 2019
IEEE/CVF International Conference on Computer Vision
(ICCV), pages 10441–10450. IEEE, 2019. 2
[22] Zhizhong Han, Xiyang Wang, Yu-Shen Liu, and Matthias
Zwicker. Hierarchical view predictor: Unsupervised
3D global feature learning through hierarchical prediction
among unordered views. In ACM International Conference
on Multimedia, 2021. 2
[23] Zhizhong Han, Xiyang Wang, Chi-Man Vong, Yu-Shen Liu,
Matthias Zwicker, and CL Chen. 3DViewGraph: Learning
global features for 3D shapes from a graph of unordered
views with attention. In International Joint Conference on
Artificial Intelligence, 2019. 2
[24] Tao Hu, Zhizhong Han, Abhinav Shrivastava, and Matthias
Zwicker. Render4Completion: Synthesizing multi-view
depth maps for 3D shape completion. In Proceedings of In-
ternational Conference on Computer Vision (ICCV), 2019.
2
[25] Tao Hu, Zhizhong Han, and Matthias Zwicker. 3D shape
completion with multi-view consistent inference. In AAAI,
2020. 2
[26] Wei Hu, Zeqing Fu, and Zongming Guo. Local frequency
interpretation and non-local self-similarity on graph for point
cloud inpainting. IEEE Transactions on Image Processing,
28(8):4087–4100, 2019. 2
[27] Zitian Huang, Yikuan Yu, Jiawen Xu, Feng Ni, and Xinyi Le.
PF-Net: Point fractal network for 3D point cloud completion.
In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pages 7662–7670, 2020.
2,3
[28] Yue Jiang, Dantong Ji, Zhizhong Han, and Matthias Zwicker.
SDFDiff: Differentiable rendering of signed distance fields
for 3D shape optimization. In IEEE Conference on Computer
Vision and Pattern Recognition, 2020. 2
[29] Yu Lequan, Li Xianzhi, Fu Chi-Wing, Cohen-Or Daniel, and
Heng Pheng-Ann. PU-Net: Point cloud upsampling net-
work. In Proceedings of the IEEE International Conference
on Computer Vision (ICCV), 2018. 2
[30] Ruihui Li, Xianzhi Li, Chi-Wing Fu, Daniel Cohen-Or, and
Pheng-Ann Heng. PU-GAN: A point cloud upsampling ad-
versarial network. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), pages 7203–7212,
2019. 2
[31] Minghua Liu, Lu Sheng, Sheng Yang, Jing Shao, and Shi-
Min Hu. Morphing and sampling network for dense point
cloud completion. In Proceedings of the AAAI conference on
artificial intelligence, volume 34, pages 11596–11603, 2020.
2
[32] Xinhai Liu, Zhizhong Han, Fangzhou Hong, Yu-Shen Liu,
and Matthias Zwicker. LRC-Net: Learning discriminative
features on point clouds by encoding local region contexts.
Computer Aided Geometric Design, 79:101859, 2020. 2
[33] Xinhai Liu, Zhizhong Han, Yu-Shen Liu, and Matthias
Zwicker. Point2Sequence: Learning the shape representation
of 3D point clouds with an attention-based sequence to se-
quence network. In Proceedings of the AAAI Conference on
Artificial Intelligence, volume 33, pages 8778–8785, 2019. 2
[34] Xinhai Liu, Zhizhong Han, Yu-Shen Liu, and Matthias
Zwicker. Fine-grained 3D shape classification with hierar-
chical part-view attention. IEEE Transactions on Image Pro-
cessing, 30:1744–1758, 2021. 2
[35] Xinhai Liu, Zhizhong Han, Xin Wen, Yu-Shen Liu, and
Matthias Zwicker. L2G Auto-encoder: Understanding point
clouds by local-to-global reconstruction with hierarchical
self-attention. In Proceedings of the 27th ACM International
Conference on Multimedia, pages 989–997, 2019. 2
[36] Baorui Ma, Zhizhong Han, Yu-Shen Liu, and Matthias
Zwicker. Neural-pull: Learning signed distance functions
from point clouds by learning to pull space onto surfaces.
In International Conference on Machine Learning (ICML),
2021. 2
[37] Yinyu Nie, Yiqun Lin, Xiaoguang Han, Shihui Guo, Jian
Chang, Shuguang Cui, and Jian.J Zhang. Skeleton-bridged
point completion: From global inference to local adjustment.
In Advances in Neural Information Processing Systems, vol-
ume 33, pages 16119–16130. Curran Associates, Inc., 2020.
2
[38] Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, and Gao
Huang. 3D object detection with Pointformer. In Proceed-
ings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), pages 7463–7472, June 2021.
3
[39] Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz
Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. Im-
age transformer. In International Conference on Machine
Learning, pages 4055–4064. PMLR, 2018. 3
[40] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas.
PointNet: Deep learning on point sets for 3D classification
and segmentation. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), pages
1063–6919, 2017. 4
[41] Charles R Qi, Li Yi, Hao Su, and Leonidas J Guibas. Point-
Net++: Deep hierarchical feature learning on point sets in a
metric space. In Advances in Neural Information Processing
Systems (NeurIPS), pages 5099–5108, 2017. 4
[42] Minhyuk Sung, Vladimir G Kim, Roland Angst, and
Leonidas Guibas. Data-driven structural priors for shape
completion. ACM Transactions on Graphics, 34(6):175,
2015. 2
[43] Lyne P Tchapmi, Vineet Kosaraju, Hamid Rezatofighi, Ian
Reid, and Silvio Savarese. TopNet: Structural point cloud
decoder. In Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR), pages 383–
392, 2019. 1,2,5,6,7
[44] Duc Thanh Nguyen, Binh-Son Hua, Khoi Tran, Quang-Hieu
Pham, and Sai-Kit Yeung. A field model for repairing 3D
shapes. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), pages 5676–5684,
2016. 2
[45] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko-
reit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia
Polosukhin. Attention is all you need. In Proceedings of the
31st International Conference on Neural Information Pro-
cessing Systems, pages 6000–6010, 2017. 3
[46] Xiaogang Wang, Marcelo H Ang Jr, and Gim Hee Lee. Cas-
caded refinement network for point cloud completion. In
Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pages 790–799, 2020. 1,
2,3,4,6,7
[47] Yida Wang, David Joseph Tan, Nassir Navab, and Federico
Tombari. SoftPoolNet: Shape descriptor for point cloud
completion and classification. In European Conference on
Computer Vision (ECCV), 2020. 7
[48] Xin Wen, Zhizhong Han, Yan-Pei Cao, Pengfei Wan, Wen
Zheng, and Yu-Shen Liu. Cycle4Completion: Unpaired
point cloud completion using cycle transformation with
missing region coding. In Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition (CVPR),
2021. 1,5
[49] Xin Wen, Zhizhong Han, Xinhai Liu, and Yu-Shen Liu.
Point2SpatialCapsule: Aggregating features and spatial rela-
tionships of local regions on point clouds using spatial-aware
capsules. IEEE Transactions on Image Processing, 29:8855–
8869, 2020. 2
[50] Xin Wen, Zhizhong Han, and Yu-Shen Liu. CMPD: Using
cross memory network with pair discrimination for image-
text retrieval. IEEE Transactions on Circuits and Systems
for Video Technology, 31(6):2427–2437, 2020. 2
[51] Xin Wen, Zhizhong Han, Xinyu Yin, and Yu-Shen Liu. Ad-
versarial cross-modal retrieval via learning and transferring
single-modal similarities. In 2019 IEEE International Con-
ference on Multimedia and Expo (ICME), pages 478–483.
IEEE, 2019. 2
[52] Xin Wen, Zhizhong Han, Geunhyuk Youk, and Yu-Shen Liu.
CF-SIS: Semantic-instance segmentation of 3D point clouds
by context fusion with self-attention. In Proceedings of the
28th ACM International Conference on Multimedia, pages
1661–1669, 2020. 2
[53] Xin Wen, Tianyang Li, Zhizhong Han, and Yu-Shen Liu.
Point cloud completion by skip-attention network with hi-
erarchical folding. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), pages
1939–1948, 2020. 1,2,7
[54] Xin Wen, Peng Xiang, Zhizhong Han, Yan-Pei Cao, Pengfei
Wan, Wen Zheng, and Yu-Shen Liu. PMP-Net: Point cloud
completion by learning multi-step point moving paths. In
Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 2021. 1,6,7
[55] Haozhe Xie, Hongxun Yao, Shangchen Zhou, Jiageng Mao,
Shengping Zhang, and Wenxiu Sun. GRNet: Gridding resid-
ual network for dense point cloud completion. In European
Conference on Computer Vision (ECCV), 2020. 1,3,6,7
[56] Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian. Fold-
ingNet: Point cloud auto-encoder via deep grid deformation.
In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pages 206–215, 2018. 1,
2,4,6,7,8
[57] Wentao Yuan, Tejas Khot, David Held, Christoph Mertz, and
Martial Hebert. PCN: Point completion network. In 2018
International Conference on 3D Vision (3DV), pages 728–
737. IEEE, 2018. 3,4,5,6,7,8
[58] Junming Zhang, Weijia Chen, Yuping Wang, Ram Vasude-
van, and Matthew Johnson-Roberson. Point set voting for
partial point cloud analysis. IEEE Robotics and Automation
Letters, 2021. 7
[59] Wenxiao Zhang, Qingan Yan, and Chunxia Xiao. Detail pre-
served point cloud completion via separated feature aggrega-
tion. In European Conference on Computer Vision (ECCV),
pages 512–528, 2020. 1,3,4,6
[60] Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, and
Vladlen Koltun. Point transformer. In ICCV, 2021. 3,4