
PolyNet: Polynomial Neural Network for 3D Shape Recognition with PolyShape Representation

Mohsen Yavartanoo1, Shih-Hsuan Hung2, Reyhaneh Neshatavar1, Yue Zhang2, Kyoung Mu Lee1
1SNU ECE & ASRI   2Oregon State University
{myavartanoo,reyhanehneshat,kyoungmu}@snu.ac.kr   {hungsh,zhangyue}@oregonstate.edu

Abstract

3D shape representation and its processing have substantial effects on 3D shape recognition. The polygon mesh as a 3D shape representation has many advantages in computer graphics and geometry processing. However, there are still some challenges for the existing deep neural network (DNN)-based methods on the polygon mesh representation, such as handling the variations in the degree and permutations of the vertices and their pairwise distances. To overcome these challenges, we propose a DNN-based method (PolyNet) and a specific polygon mesh representation (PolyShape) with a multi-resolution structure. PolyNet contains two operations: (1) a polynomial convolution (PolyConv) operation with learnable coefficients, which learns continuous distributions as the convolutional filters to share the weights across different vertices, and (2) a polygonal pooling (PolyPool) procedure that utilizes the multi-resolution structure of PolyShape to aggregate the features in a much lower dimension. Our experiments demonstrate the strength and the advantages of PolyNet on both 3D shape classification and retrieval tasks compared to existing polygon mesh-based methods, as well as its superiority in classifying graph representations of images. The code is publicly available from this link.

1. Introduction

In recent years, the growing number of applications of 3D shape representations has made their processing a fundamental problem in computer vision, computer graphics, and augmented reality. The structure and the high-quality appearance of the representation significantly impact many tasks, such as 3D shape classification and retrieval. With the advent of deep neural network (DNN) architectures, several methods have been proposed to learn 3D shapes. Generally, these methods can be categorized into four groups based on the input shape representation: point clouds [47, 48], voxel grids [64, 41], 2D projections [56, 52, 50, 9, 63], and polygon meshes [20, 13, 43, 40]. Point clouds suffer from harsh noise and discard substantial structural information of the 3D shapes. Voxel grids require large memory, and rendering them generates unnecessarily voluminous data and quantization artifacts. Furthermore, 2D projection representations encounter severe self-occlusions. By contrast, a polygon mesh is a collection of vertices and faces that defines a 3D shape smoothly and entirely. Therefore, this representation contains structural information without harsh noise, severe artifacts, or self-occlusions. Additionally, it is a memory-efficient representation that can store full geometric details while avoiding unnecessarily voluminous data. However, in polygon mesh-based methods, weight sharing is still a challenging problem due to variations in the degree of vertices, the permutation of adjacent vertices, and their pairwise distances.

To overcome the limitations of the polygon mesh-based methods, in this work we propose PolyNet, a novel network that can effectively learn and extract features of a polygon mesh representation of 3D shapes by a continuous polynomial convolution (PolyConv). PolyConv is a polynomial function with learnable coefficients which learns continuous distributions as the convolutional filters to share the corresponding weights among the features of the vertices in the local patches made from each vertex and its adjacent vertices on the surface. This operation is invariant to the number of adjacent vertices, their permutations, and their pairwise distances from the central vertex in the local patch. Moreover, we design the PolyShape representation, a specific polygon mesh representation with a multi-resolution structure. We utilize this multi-resolution attribute to design our PolyPool operation and apply it after each PolyConv layer. This PolyPool operation reduces the mesh resolution by a fixed factor at each layer. We achieve the best classification accuracy and mean Average Precision (mAP) compared to the previous methods based on voxel grids and polygon meshes, and comparable performance to point cloud-based methods. We also show the superiority of our designed PolyConv on the challenging 75 Superpixel MNIST dataset. We summarize the main contributions of our method as follows:

• We propose PolyNet, a novel neural network method with a continuous convolution operation invariant to the number of adjacent vertices, their permutations, and their pairwise distances in 3D shapes.

• We employ PolyNet on PolyShape, a polygon mesh representation with a multi-resolution structure that enables us to use a pooling operation named PolyPool.

• We achieve an improvement in classification and retrieval tasks compared to the previous mesh-based methods on the ModelNet dataset and the best classification performance on the 75 Superpixel MNIST.

2. Related work

In this section, we review the related works based on the representation of the input 3D shapes: point cloud, voxel grid, 2D projection, and polygon mesh.

Point cloud. PointNet [47], a simple and effective DNN-based method on point clouds, learns the features directly from each point and aggregates them into one global representation. However, extracting local structures is important for the success of convolutional architectures. To overcome the lack of local structure in this method, PointNet++ [48] proposes a hierarchical neural network that employs PointNet [47] on groups of points divided into overlapping local patches. Te et al. [57] and Wang et al. [60] utilized graph CNNs to learn the features from a local graph formed by connecting adjacent points. These graph-based methods capture little shape information since they do not explicitly represent the local neighboring points in an ordered alignment.

Voxel grid. 3D ShapeNets [64] and VoxNet [41] transfer a 3D shape to a structured binary 3D grid called a voxel grid. They then learn global features from the voxels by extending CNN architectures from 2D to 3D convolutions. To reduce the computational complexity on sparse voxels, Riegler et al. [49] and Wang et al. [61] applied the octree data structure. However, these methods require heavy computation and unnecessarily voluminous data.

2D projection. MVCNN [56] and RotationNet [24] learn the features of a 3D shape over multiple rendered 2D views using conventional 2D CNNs. Moreover, pooling operations aggregate these feature values to reduce the rotation effects of the 3D shape [52, 16, 56]. However, these pooling operations lose many geometric details among the views, such as two surfaces that occlude each other. SeqViews2SeqLabels [19] and SPNet VE [63] aggregate the information among the sequential views by considering view-specific importance to prevent information loss. On the other hand, DeepPano [52] and PANORAMA-NN [50] consider a panoramic view of the 3D shape. They project the shape onto a cylinder surrounding it to accumulate the contents of multiple views altogether. To extend the number of viewpoints, utilizing a sphere instead of the cylinder lets CNNs cover all views and learn more robust features consistent across rotations [9, 63]. However, these image-based methods suffer from self-occlusions.

Polygon mesh. A polygon mesh is a discrete representation of the surface of a 3D shape with faces and vertices. This representation can be expressed as a graph; accordingly, any graph-based method can be applied to it. The existing graph-based methods are classified into two main categories: spectral methods [8, 22, 10, 26, 32] and spatial methods [42, 2, 45, 17, 14, 40, 43, 23]. The convolution operation in the spectral domain is defined by the eigendecomposition of the graph Laplacian, where the eigenvectors play the role of the Fourier basis [8]. This process is basis-dependent, which means that applying the learned parameters on a new domain produces different features [37]. Moreover, this operation is a non-localized filtering in the spectral domain [10]. An efficient way to solve the non-localization problem is to approximate the local spectral filters via a Chebyshev polynomial expansion [10]. On the other hand, there is no easy way to induce weight sharing across different locations of the graph, due to the difficulty of matching local neighborhoods in the spatial domain [8]. Nevertheless, Atwood and Towsley [2] proposed a spatial filtering method that assumes information is transferred from a vertex to its adjacent vertex with a specific transition probability. The power of the transition probability matrix implies that farther adjacent vertices provide little information for the central vertex. Furthermore, Geodesic CNN [40], MoNet [43], and SplineCNN [14] deal with the weight-sharing problem by designing local coordinate systems around the central vertex of a local patch. They apply a set of weighting functions to aggregate the features on adjacent vertices and then compute a learnable weighted average of these aggregated features as the spatial convolution. However, these methods are computationally expensive and require predefined local coordinate systems. Moreover, Neural3DMM [5] introduces the spiral convolution operation by enforcing a local ordering of vertices through the spiral operator. The initial point of each spiral is the vertex with the shortest geodesic path to a fixed reference point on a template shape, and the remaining vertices of the spiral are ordered in the clockwise or counterclockwise direction inductively. However, finding a reference point for an arbitrary shape is challenging. Moreover, the initial point is not unique once two or more adjacent vertices have the same shortest path to the reference point.

[Figure 1 diagram: PolyShape input → PolyConv+InstanceNorm+Tanh blocks (channel widths 64, 128, 256, 512, 1024) interleaved with PolyPool (max) → Global Avg-Pool → FC+BatchNorm+ReLU+Dropout (1024) → FC+BatchNorm+LogSoftmax (#classes).]

Figure 1: The overview of the PolyNet architecture. PolyNet takes a PolyShape as input and applies four PolyConv layers followed by instance normalization and pooling layers, and three fully connected (FC) layers followed by batch normalization layers, to learn the local and the global features of the shape. We employ the hyperbolic tangent and ReLU activation functions after the PolyConv and FC layers, respectively. Moreover, the PolyPool layers reduce the spatial dimensions and mitigate overfitting by utilizing the multi-resolution structure of the PolyShape, and the global average pooling layer avoids the permutation ambiguities of the vertices. Note that V and F refer to the number of vertices and faces of each shape, respectively.

3. PolyNet

In this section, we explain the details of our PolyNet architecture and its invariance to the number of adjacent vertices, their permutations, and their pairwise distances in 3D shapes. PolyNet learns features locally through the PolyConv operation and performs the PolyPool procedure by utilizing the multi-resolution structure of our designed PolyShape representation. Figure 1 shows the overview of our PolyNet architecture. A 3D shape in the PolyShape representation passes through a straightforward network with three PolyConv layers, each followed by a PolyPool layer, and another PolyConv layer with global average pooling to learn and extract the features. Then three fully connected layers classify the shape from these extracted features.

3.1. Polynomial convolution operation

To overcome the challenges of weight sharing across different vertices in conventional CNNs and graph CNNs, we propose the PolyConv operation, which learns a probability density function (PDF) as a convolutional filter. Let us assume that the surface of a 3D shape is a differentiable manifold $\mathcal{M}$. For a point $v$ and its neighbor $u$ in a local patch $N(v)$ on the manifold $\mathcal{M}$, we define signals $x: \mathcal{M} \to [-1, 1]$ and $y: \mathcal{M} \to [-1, 1]$ as the features at those points, respectively. Without loss of generality, we consider the convolutional weights of a standard CNN as probability distributions. We then argue that a patch operation $D(v)$ in the standard CNN can be expressed as an expected value over the features on sample points $N_s(v) \subset N(v)$ surrounding $v$ as Eq. 1:

$$D(v) = \mathbb{E}[y \mid x] = \sum_{u \in N_s(v)} w(u)\, y(u), \tag{1}$$

where $w(u)$ is the probability (weight) corresponding to the point $u$. However, in a general graph or polygon mesh, the locations of the adjacent points lie in a continuous domain and can vary; hence, it is not possible to assign a discrete distribution as the weights. Therefore, we assume that there is an unknown conditional PDF that can express the convolution filter weights, and we can then formulate the expected value over each patch as Eq. 2:

$$D(v) = \mathbb{E}[y \mid x] = \int_{N(v)} y\, f(y \mid x)\, dy, \tag{2}$$

where $f(y \mid x)$ is the conditional probability of the feature $y$ at a point $u$ in the neighborhood of the central point $v$ given the feature $x$. The conditional probability $f(y \mid x)$ can be written as Eq. 3:

$$f(y \mid x) = \frac{f(x, y)}{f_x(x)} = \frac{f(x, y)}{\int_{-1}^{1} f(x, y)\, dy}, \tag{3}$$

where $f_x(x)$ is the marginal distribution, obtained by integrating the joint probability distribution $f(x, y)$ over $y$. Note that, since the value of the feature $y$ is defined in $[-1, 1] \subset \mathbb{R}$, $\int_{-1}^{1} f(x, y)\, dy$ is a definite integral on the interval $[-1, 1]$. We reformulate $f(y \mid x)$ by approximating $f(x, y)$ with a polynomial function of $x$ and $y$ of a certain degree $d$ as Eq. 4:

$$f(y \mid x) = \frac{\sum_{0 \le i, j,\, i+j \le d} a_{i,j}\, x^i y^j}{\sum_{0 \le i \le d} b_i\, x^i}, \tag{4}$$

where the coefficients $b_i$ can be directly obtained by computing the marginal distribution $f_x(x)$ from $f(x, y)$. To ensure that the polynomial function is a valid PDF, i.e., always nonnegative, the coefficient matrix $A$ in the compact form of the polynomial function in Eq. 5 must be positive definite:

$$f(x, y) = \sum_{0 \le i, j,\, i+j \le d} a_{i,j}\, x^i y^j = X^{T} A X > 0, \tag{5}$$

where $X$ is the vector of monomials in $x$ and $y$ of degree at most $d/2$. Therefore, instead of learning the coefficient matrix $A$ directly, we parameterize it as $A = BB^{T} \succeq 0$. In effect, we approximate the conditional PDF $f(y \mid x)$, used as the continuous convolutional filter, with polynomial functions parameterized by the learnable symmetric matrix $B$. For more details, refer to the supplementary material. The large degree of freedom of polynomial functions allows approximating any complex distribution.

[Figure 2 diagram: input local patch → input features → convolution weights → expected values → output features, per channel, for (a) unsqueezed and (b) squeezed operations.]

Figure 2: PolyConv operations over a local patch. A set of conditional PDFs approximated by polynomial functions with learnable coefficients is applied as the convolutional filters to learn the input features on a local patch of vertices $N_s(v) \subset N(v)$. (a) The unsqueezed operation uses $C \times C'$ conditional PDFs as the convolutional filters, which map the input features to the higher-dimensional output features. (b) The squeezed operation contains only $C$ conditional PDFs and is combined with a fully connected layer to map the input features to the higher-dimensional output features. Note that $C$ and $C'$ refer to the number of input and output channels, respectively.
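To make the parameterization $A = BB^{T}$ above concrete, the following minimal sketch (our illustration, not the released code) evaluates the degree-$d=2$ joint polynomial $f(x, y) = X^{T} A X$ with $X = (1, x, y)^{T}$; the raw parameter is symmetrized to obtain $B$, and $A = BB^{T}$ is positive semi-definite by construction, so $f(x, y)$ is nonnegative.

```python
import torch

class JointPoly2(torch.nn.Module):
    """Learnable degree-2 joint polynomial f(x, y) = X^T (B B^T) X."""

    def __init__(self):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(3, 3) * 0.1)

    def B(self):
        # symmetrize the raw parameter to obtain the symmetric matrix B
        return 0.5 * (self.W + self.W.t())

    def forward(self, x, y):
        # x, y: tensors of the same shape with values in [-1, 1]
        X = torch.stack([torch.ones_like(x), x, y], dim=-1)  # (..., 3)
        A = self.B() @ self.B().t()                          # (3, 3), PSD
        return torch.einsum('...i,ij,...j->...', X, A, X)    # f(x, y)

f = JointPoly2()
x = torch.rand(5) * 2 - 1
y = torch.rand(5) * 2 - 1
assert (f(x, y) >= -1e-6).all()  # nonnegative up to float rounding
```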

It is important to note that, since we only have a few samples $N_s$ (e.g., points, vertices, etc.) in each local patch $N$ on the manifold $\mathcal{M}$, computing the exact expected value over each local patch with Eq. 2 is not possible. Therefore, we approximate the integral by taking the weighted average over these sample points as Eq. 6:

$$\int_{N(v)} y\, f(y \mid x)\, dy \simeq \frac{1}{|N_s(v)|} \sum_{u \in N_s(v)} y\, f(y \mid x). \tag{6}$$

Finally, we design unsqueezed and squeezed operations based on the proposed patch operator as the convolutions that learn from the input features, as shown in Figure 2. In the first approach, similar to conventional CNNs, we consider multiple conditional PDFs, one per pair of input and output channels, as the convolution filters. However, this operation requires heavy computation and large memory. Therefore, we squeeze it by allocating different conditional PDFs to only the input channels and aggregating the results with a fully connected layer, as sketched below. The second approach is beneficial when the number of input vertices is large. With this continuous convolution operation, we can locally learn the features from the surface of 3D shapes in a way that is invariant to the number of vertices in a local patch, their permutation, and their pairwise distances.
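The following is a hedged sketch of how a squeezed PolyConv layer could be realized for $d = 2$: it is our reading of Eqs. 2-6, not the authors' released implementation. The names `feat` (per-vertex features in $[-1, 1]$) and `edge_index` (center/neighbor index pairs) are assumed interfaces; each input channel gets its own joint PDF $f_c(x, y) = X^{T} B_c B_c^{T} X$, the marginal over $y \in [-1, 1]$ is computed in closed form, and a fully connected layer mixes channels afterwards.

```python
import torch

class SqueezedPolyConv(torch.nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(in_ch, 3, 3) * 0.1)
        self.fc = torch.nn.Linear(in_ch, out_ch)  # "squeeze": mixes channels

    def forward(self, feat, edge_index):
        # feat: (V, C) vertex features in [-1, 1]
        # edge_index: (2, E); row 0 = center v, row 1 = neighbor u
        v, u = edge_index
        x, y = feat[v], feat[u]                        # (E, C) each
        B = 0.5 * (self.W + self.W.transpose(1, 2))    # symmetric B_c
        A = B @ B.transpose(1, 2)                      # (C, 3, 3), PSD
        X = torch.stack([torch.ones_like(x), x, y], dim=-1)   # (E, C, 3)
        joint = torch.einsum('eci,cij,ecj->ec', X, A, X)      # f_c(x, y)
        # closed-form marginal of f_c over y in [-1, 1] (odd powers vanish)
        marg = (2 * (A[:, 0, 0] + (A[:, 0, 1] + A[:, 1, 0]) * x
                     + A[:, 1, 1] * x ** 2) + (2.0 / 3.0) * A[:, 2, 2])
        w = joint / (marg + 1e-8)                      # conditional f_c(y|x)
        msg = y * w                                    # weighted neighbor feature
        out = torch.zeros_like(feat).index_add_(0, v, msg)    # sum over patch
        deg = torch.zeros(feat.size(0), 1).index_add_(
            0, v, torch.ones(v.size(0), 1))
        out = out / deg.clamp(min=1)                   # mean, as in Eq. 6
        return torch.tanh(self.fc(out))                # tanh keeps outputs in [-1, 1]
```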

3.2. Polygonal shape representation and pooling

[Figure 3 diagram: 3D CAD model → preprocessing → coarse mesh → alternating subdivision (levels 1-3) and shape fitting (levels 1-2), with the vertices of each level highlighted.]

Figure 3: The overview of PolyShape processing. A given 3D CAD model passes through a preprocessing pipeline to produce a coarse polygon mesh with a simpler topology and a single connected component. Next, subdivision and shape-fitting procedures sequentially create the PolyShape with a multi-resolution structure for the shape.

To apply pooling after each convolution operation, we present the PolyShape representation, a multi-resolution structure made of a sequence of subdivisions and shape fittings, as shown in Figure 3. This multi-resolution structure enables pooling operations without any learnable parameters, similar to multi-level pooling on images. Moreover, PolyShape maintains the structural details and the topology of the shape after each pooling and provides a semi-regular structure that benefits the analysis of the local structure of the shape (i.e., each vertex and its corresponding neighborhood) [46].

For PolyShape processing, we first apply mesh fusion [55] to a given 3D CAD model to abstract the shape with a simpler topology. Next, we fix the geometric errors of the mesh, such as non-manifold edges and duplicate vertices, and then reduce the number of vertices to obtain a coarse mesh with roughly 400 vertices. Lastly,

we subdivide the coarse mesh and fit the resulting mesh to the given model to restore the details of the original shape. We apply this subdivision routine iteratively, as many times as there are pooling layers in PolyNet (i.e., three times). For the subdivision, there are two common methods: the primal triangle quadrisection (PTQ) [39] and the √3-subdivision [28]. PTQ is a straightforward approach that splits a triangle into four sub-triangles. It creates a new vertex on each edge of the original mesh and connects it to the other new vertices from the same face. The other strategy, √3-subdivision, adds a new vertex inside each triangle of the original mesh and connects it to its three old surrounding vertices and the adjacent new vertices. Every two iterations of the √3-subdivision separate each original triangle into nine sub-triangles. Thus, PolyShapes built with the √3-subdivision have fewer triangles than those built with PTQ. We evaluate the effectiveness of the PolyShapes made by both subdivisions in Section 4.
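As a concrete check on these growth rates (for a triangle mesh, where each face splits into four sub-triangles under PTQ and three under √3-subdivision), the face counts after $k$ subdivision levels are

$$F_k^{\mathrm{PTQ}} = 4^{k} F_0, \qquad F_k^{\sqrt{3}} = 3^{k} F_0 \quad\Longrightarrow\quad \frac{F_3^{\sqrt{3}}}{F_3^{\mathrm{PTQ}}} = \frac{27}{64} \approx 0.42,$$

which matches the mesh statistics reported in the supplementary material (e.g., 21,492 versus 50,944 faces at the finest level).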

[Figure 4 panels: (a) original mesh, (b) PTQ, (c) √3-subdivision.]

Figure 4: PolyPool operations. The black and red dots show the vertices before and after PolyPool, respectively. Dashed lines indicate the edges removed when the subdivisions are undone.

With the multi-resolution structure of PolyShape, we can downsample the output of the PolyConv layers by collapsing the neighboring vertices onto each interior vertex (i.e., the vertices of the coarser mesh), as shown in Figure 4. PTQ and the √3-subdivision upsample the mesh vertices by adding a new vertex at the center of each edge and of each triangle, respectively. The downsampling procedures are performed as the inverse of the upsampling methods, which allows us to generate relatively larger polygons, as shown in Figure 4. Therefore, we can reduce the number of polygons by factors of four and three by employing the PTQ and √3-subdivision methods, respectively. We use these downsampling procedures as the pooling, which facilitates aggregating the features: each vertex $v$ on the downsampled mesh takes the maximum (max-pool) over the feature values of the vertices $v \cup \{u_i\}_{i=1}^{m}$ in a local patch on the output of the PolyConv layers. Therefore, PolyPool enables describing a 3D shape with a large number of vertices by aggregating the features. These aggregated features have a much lower dimension than the full set of extracted features and can also improve the performance, like the conventional pooling operations of 2D CNNs [44].
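A minimal sketch of this max-pooling step follows, under the assumption that a precomputed array `cluster` maps every fine-level vertex (including each interior vertex itself) to the index of the coarse vertex it collapses to; such a map comes from inverting one subdivision step of PolyShape. The sketch uses `Tensor.scatter_reduce_` and therefore assumes PyTorch 1.12 or later.

```python
import torch

def poly_pool(feat, cluster, num_coarse):
    # feat: (V_fine, C) PolyConv output; cluster: (V_fine,) parent indices
    out = feat.new_full((num_coarse, feat.size(1)), float('-inf'))
    idx = cluster.unsqueeze(1).expand_as(feat)
    # max over each coarse vertex's patch v ∪ {u_i}
    out.scatter_reduce_(0, idx, feat, reduce='amax', include_self=True)
    return out  # (V_coarse, C)

feat = torch.randn(10, 4)
cluster = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 1, 0])
print(poly_pool(feat, cluster, 3).shape)  # torch.Size([3, 4])
```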

4. Experiments

[Figure 5 panels: joint distributions and marginal distributions for layers 1-4 with C = 6, 70, 134, and 262 input channels.]

Figure 5: Joint and marginal distributions. Visualization of (a) the learned joint distributions $f(x, y)$ and (b) the marginal distributions $f_x(x)$ approximated by polynomial functions of degree $d = 2$ on ModelNet-10 with the √3-subdivision.

In this section, we present the details of the datasets and several experiments with PolyNet on 3D shapes with and without the PolyShape representation, and on graph representations of images. We compare our proposed method with the state-of-the-art methods on both classification and retrieval tasks. We use the Adam optimizer in all of our experiments, with initial learning rates of 1e-3 and 1e-2 for the unsqueezed and squeezed cases, respectively. We set the mini-batch size to 100 for the experiments on 3D shapes and to 10 for the graph representations of images. We choose the hyperbolic tangent as the activation function of the PolyConv layers to guarantee that the input features to the next layers lie in the interval $[-1, 1]$, and we use the cross-entropy loss between the predicted scores and the ground-truth labels. We implement our model in Python 3.6 using PyTorch with CUDA. PolyShape processing, including both subdivision methods, takes 92 ms per CAD model on average, and all conversions succeed on the ModelNet dataset. The average testing times of the PolyNet baseline per shape are 13 ms for the √3-subdivision and 18 ms for PTQ. We will publish the code for both PolyShape processing and PolyNet; a minimal sketch of the training recipe follows below.
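As a rough, runnable illustration of this recipe (the tiny stand-in model below only mimics the FC head of Figure 1; the data loader and the PolyConv/PolyPool stack are omitted, so this is not the full pipeline):

```python
import torch

# stand-in for the FC head: FC+BatchNorm+ReLU+Dropout, then FC+LogSoftmax
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.BatchNorm1d(1024),
    torch.nn.ReLU(), torch.nn.Dropout(0.5),
    torch.nn.Linear(1024, 10), torch.nn.LogSoftmax(dim=1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # squeezed case
criterion = torch.nn.NLLLoss()  # log-softmax + NLL == cross-entropy

feats = torch.randn(100, 1024)             # one mini-batch of pooled features
labels = torch.randint(0, 10, (100,))
optimizer.zero_grad()
loss = criterion(model(feats), labels)
loss.backward()
optimizer.step()
```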

4.1. Datasets

In our experiments, we use both the ModelNet-10 and the ModelNet-40 datasets [64], containing 4,899 CAD models (3,991 for training and 908 for testing) in 10 categories and 12,311 CAD models (9,843 for training and 2,468 for testing) in 40 categories, respectively. We apply our PolyShape processing to the CAD models with Houdini [53], a popular 3D modeling software package. For more details about the PolyShape pipeline, refer to the supplementary material. Additionally, we translate and scale the resulting PolyShapes into the bounding box $[-1, 1]^3 \subset \mathbb{R}^3$. We extract the coordinates $(x, y, z) \in \mathbb{R}^3$ and the normal vectors $(n_x, n_y, n_z) \in \mathbb{R}^3$ of all vertices as the initial input to PolyNet. We also use the MNIST dataset [31], which consists of 28×28 images. Each image is represented as a graph in which each vertex corresponds to a superpixel and each edge to the spatial relation between two superpixels [43]. We consider superpixel-based graphs with 75 vertices and use the standard split of the MNIST dataset: 60k images for training and 10k for testing.
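A minimal sketch of this input preparation, assuming `verts` and `normals` are (V, 3) arrays loaded from a processed mesh (the loading step itself is omitted):

```python
import numpy as np

def polyshape_features(verts, normals):
    """Translate/scale vertices into [-1, 1]^3 and stack coords with normals."""
    lo, hi = verts.min(axis=0), verts.max(axis=0)
    center = (lo + hi) / 2.0
    scale = (hi - lo).max() / 2.0         # uniform scale preserves aspect ratio
    verts = (verts - center) / scale      # now inside [-1, 1]^3
    return np.concatenate([verts, normals], axis=1)   # (V, 6) input features

verts = np.random.rand(400, 3) * 10
normals = np.random.randn(400, 3)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
feat = polyshape_features(verts, normals)
assert feat[:, :3].min() >= -1 and feat[:, :3].max() <= 1
```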

4.2. Convolution operation

We evaluate our convolution operation, PolyConv, in different configurations and compare it with several well-known convolution operations, as shown in Table 1. Since squeezed PolyConv requires less computation and memory than the unsqueezed version due to its smaller number of learnable parameters, we use it for the 3D shape classification experiments, where the inputs are extremely large (roughly 10k vertices). We consider two degrees, d = 2 and d = 4, for each polynomial function defined in Eq. 5, which require six and 21 learnable coefficients per patch operation, respectively. The results on ModelNet-10 with both subdivision strategies show that PolyConv with degree d = 2 achieves somewhat higher performance than degree d = 4, which we attribute to its simpler, easier-to-learn structure for approximating the distributions. Furthermore, we evaluate PolyNet on ModelNet-10 by replacing PolyConv with well-known convolutions such as ChebConv [11], GCNConv [27], GMMConv [43], SplineConv [14], XConv [36], and FiLMConv [7]. PolyConv with degree d = 2 outperforms all of the above convolutions for both subdivision strategies. Towards a better understanding of the PDFs, we visualize in Figure 5 the learned joint PDFs f(x, y) and the marginal PDFs f_x(x) of squeezed PolyConv for polynomial functions of degree d = 2, trained on the ModelNet-10 dataset with the √3-subdivision. The results illustrate the diversity of the learned PDFs across input channels and across layers of PolyNet.

Conv.             | √3-subdivision       | PTQ                  | Params.
                  | max    avg    Time   | max    avg    Time   |
XConv [36]        | 84.58  83.31  173ms  | 85.54  83.85  835ms  | 473k
SplineConv [14]   | 93.46  92.72  12ms   | 93.16  92.66  18ms   | 111m
ChebConv [11]     | 93.95  93.42  12ms   | 93.70  93.11  14ms   | 712k
GCNConv [27]      | 93.85  93.38  9ms    | 93.85  93.30  13ms   | 179k
GMMConv [43]      | 93.32  92.68  17ms   | 93.20  92.36  28ms   | 4.6m
FiLMConv [7]      | 94.43  93.89  9ms    | 94.30  93.76  13ms   | 1.1m
PolyConv (d = 4)  | 93.96  93.13  25ms   | 94.00  92.38  38ms   | 189k
PolyConv (d = 2)  | 94.52  93.95  13ms   | 94.40  94.08  18ms   | 182k

Table 1: Classification accuracy (Acc%), testing time, and number of parameters of the convolution layers alone on ModelNet-10 with both subdivision strategies, for various convolution operations plugged into PolyNet.

[Figure 6: sample graphs for the digits 0-5.]

Figure 6: Graphs from the MNIST dataset. Visualization of handwritten digits from the MNIST dataset, each represented as a superpixel graph with 75 vertices.

On the other hand, we evaluate our unsqueezed PolyConv and compare it with the squeezed PolyConv, using polynomial functions of degree d = 2, on the classical task of handwritten digit classification over the graph representation of the MNIST dataset [43] with 75 vertices. Despite the simplicity of the underlying images, this is a challenging task due to the lack of a regular grid structure among the nodes, as shown in Figure 6. We use three convolutional layers (256, 256, 256) and three fully connected layers (1024, 1024, 10) in the network architecture. We employ both the unsqueezed and the squeezed PolyConv as the convolution operations and graclus clustering for the pooling procedure, and we use the positions of the vertices as extra features; a sketch of this setup follows below. Table 2 shows that PolyConv outperforms the existing methods, which demonstrates the strength of PolyConv in learning features from irregular as well as semi-regular data. Moreover, while the squeezed version of PolyConv achieves only slightly lower performance, it requires 135k learnable parameters in the PolyConv layers, which is much more efficient than the unsqueezed PolyConv with 792k parameters. Note that the average testing time of squeezed PolyConv on the 75 Superpixel MNIST samples is 1.4 ms, versus 12.4 ms for the unsqueezed PolyConv.
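The following sketch shows the data and pooling side of this setup using the torch_geometric package (an assumption on our part; the paper does not state which graph library was used). It loads the 75-node superpixel graphs, appends vertex positions as extra features, and applies one graclus clustering plus max-pooling step of the kind used between convolution layers.

```python
import torch
from torch_geometric.datasets import MNISTSuperpixels
from torch_geometric.data import Batch
from torch_geometric.nn import graclus, max_pool

dataset = MNISTSuperpixels(root='data/MNIST', train=True)
data = Batch.from_data_list([dataset[0]])          # one 75-node digit graph
data.x = torch.cat([data.x, data.pos], dim=1)      # intensity + (x, y) position

# one graclus pooling step: cluster adjacent nodes, then max over each cluster
cluster = graclus(data.edge_index, num_nodes=data.num_nodes)
pooled = max_pool(cluster, data)                   # coarsened graph
print(data.num_nodes, '->', pooled.num_nodes)      # e.g. 75 -> roughly half
```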

4.3. Pooling layers

To show the benefits of our PolyPool operation, we run PolyNet with various configurations of the pooling operation and input data type and compare the results in Table 3. We consider two data representations, with and without PolyShape processing, for the ModelNet-10 dataset. Our experiments demonstrate that PolyPool with the PolyShape representation effectively improves the performance, especially with the pooling based on the √3-subdivision. Moreover, we employ three-level graclus [12], an efficient clustering algorithm, on the data without the PolyShape representation. However, pooling based on graclus clustering yields lower accuracy than both the √3-subdivision and PTQ. We interpret this accuracy gap as the effect of losing structural information of the 3D shapes when applying graclus clustering. Note that we take the maximum value (max-pool) of each local patch for all pooling strategies.

4.4. 3D shape classification

To improve the 3D shape classification performance, we combine the outputs of the last PolyConv layer for both subdivisions by averaging their features. We compare the classification results of our PolyNet with the recent state-of-the-art methods on the ModelNet-10 and ModelNet-40 datasets in Table 4. PolyNet outperforms all mesh-based and voxel-based approaches on the classification task and achieves performance comparable to the point cloud-based methods. The performance gap to the 2D projection-based methods is due to their use of networks pre-trained on large image collections. Moreover, images include texture information produced by lights and shadows, whereas the other representations lack such information.

Method                 | Acc. (max)
MoNet [43]             | 91.11
SplineCNN [14]         | 95.22
GCGP [59]              | 95.80
GAT [4]                | 96.19
PNCNN [15]             | 98.76
PolyConv (squeezed)    | 98.39
PolyConv (unsqueezed)  | 98.95

Table 2: Classification accuracy (Acc%) of various methods and of our PolyConv operations on the 75-vertex superpixel representation of the MNIST dataset.

Pooling            | PolyShape | Acc. (max) | Acc. (avg) | Time  | Num. vertices
No pooling         | ✗         | 94.11      | 92.69      | 22ms  | 2.8k
Graclus [12]       | ✗         | 94.14      | 92.73      | 16ms  | 2.8k
PolyPool (PTQ)     | ✓         | 94.40      | 94.08      | 18ms  | 25.7k
PolyPool (√3-sub)  | ✓         | 94.52      | 93.95      | 13ms  | 10.8k

Table 3: Classification accuracy (Acc%), average testing time, and average number of vertices for various pooling types, on data with and without PolyShape processing.

Rep.           | Method                    | ModelNet-10    | ModelNet-40
               |                           | Acc    mAP     | Acc    mAP
2D projection  | DeepPano [52]             | 85.45  84.18   | 77.63  76.81
               | MVCNN [56]                | -      -       | 90.10  79.50
               | PANORAMA-ENN [50]         | 96.85  93.28   | 95.56  86.34
               | SPNet VE [63]             | 97.25  94.20   | 92.63  85.21
               | RotationNet [24]          | 98.46  -       | 97.37  -
Voxel grid     | 3D ShapeNets [64]         | 83.54  68.26   | 77.32  49.23
               | VoxNet [41]               | 92.00  -       | 83.00  -
               | VRN [6]                   | 93.61  -       | 91.33  -
               | FusionNet [21]            | 93.11  -       | 90.80  -
               | LP-3DCNN [29]             | 94.40  -       | 92.10  -
Point cloud    | PointNet [47]             | -      -       | 89.20  -
               | PointNet++ [48]           | -      -       | 91.90  -
               | SO-Net [33]               | 95.50  -       | 90.80  -
               | KCNet [51]                | 94.40  -       | 91.00  -
               | PCNN [3]                  | 94.90  -       | 92.30  -
               | SpiderCNN [62]            | -      -       | 92.40  -
               | PointCNN [35]             | -      -       | 92.50  -
               | DGCNN [1]                 | -      -       | 92.90  -
               | KPConv [58]               | -      -       | 92.90  -
               | RS-CNN [38]               | -      -       | 93.60  -
Polygon mesh   | SPH [25]                  | 79.79  44.05   | 68.23  33.26
               | Geometry Image [54]       | 88.40  74.90   | 83.90  51.30
               | MeshNet [13]              | -      -       | 91.90  81.90
               | Cross-atlas [34]          | 91.20  -       | 87.50  -
               | SNGC [18]                 | -      -       | 91.60  -
               | MeshWalker [30]           | -      -       | 92.30  -
               | PolyNet (√3), (d=2)       | 94.52  83.91   | 92.14  82.36
               | PolyNet (PTQ), (d=2)      | 94.40  83.84   | 92.06  81.91
               | PolyNet (PTQ, √3), (d=2)  | 94.93  84.62   | 92.42  82.86

Table 4: Classification accuracy (Acc%) and mean Average Precision (mAP%) of PolyNet compared to the state-of-the-art methods based on different representations on the ModelNet-10 and ModelNet-40 datasets.

4.5. 3D shape retrieval

Figure 7: Retrieval results. This figure shows the retrieved shapes for the given queries using PolyNet. The blue models in the first column are the queries. Retrieved results in green are from the same category as the query, while results in red are from different categories. From left to right, the results are ordered by descending rank.

We also evaluate and compare PolyNet against previous methods on the retrieval task. We take the output after the softmax as the shape descriptor, measure the similarity between the query and candidate shapes with the L1 norm, and rank the relevant shapes. We use mAP to quantitatively compare our retrieval approach to the related methods on both the ModelNet-10 and the ModelNet-40 datasets, as shown in Table 4. We outperform all previously evaluated methods based on polygon mesh and voxel grid representations on the retrieval task. Lastly, Figure 7 shows retrieved shapes in ranked order for queries on ModelNet-10, using the model trained with squeezed PolyConv and polynomial functions of degree d = 2. Our method retrieves visually similar shapes even when the query and the retrieved shapes belong to different categories (e.g., a table retrieved for the query desk, and a nightstand for the dresser).
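A minimal sketch of this retrieval protocol (descriptors are the post-softmax class scores; the gallery below is random stand-in data, not model output):

```python
import torch

def retrieve(query_scores, gallery_scores):
    # query_scores: (num_classes,), gallery_scores: (N, num_classes)
    dists = (gallery_scores - query_scores).abs().sum(dim=1)  # L1 distance
    return torch.argsort(dists)        # gallery indices, best match first

gallery = torch.softmax(torch.randn(8, 10), dim=1)   # stand-in descriptors
order = retrieve(gallery[0], gallery)
assert order[0] == 0                   # a query retrieves itself first
```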

5. Conclusion

In this paper, we propose PolyNet, a DNN-based method consisting of the PolyConv and PolyPool operations, to locally learn and aggregate information on the surface of 3D shapes. In PolyConv, we utilize polynomial functions to learn continuous distributions as the convolutional filters, which makes the operation invariant to variations in the degree of the vertices, their permutations, and their pairwise distances. Moreover, we design PolyShape with a multi-resolution structure that enables applying the PolyPool operation after each layer without losing geometric structure. Our comprehensive evaluations of PolyNet on classification and retrieval tasks, together with the theoretical analysis of the invariance properties of PolyConv, demonstrate its strength and superiority over most previous methods. In future work, we will explore applications of PolyNet to 3D shape segmentation, and of PolyConv to image-based computer vision tasks where no regular neighborhood connectivity exists.

Acknowledgement

This work was supported by an IITP grant funded by the Korea government (MSIT) [No. 2021-0-01343, Artificial Intelligence Graduate School Program (Seoul National University)].

References

[1] DGCNN: A convolutional neural network over large-scale labeled graphs. Neural Networks, 2018.

[2] James Atwood and Donald F. Towsley. Diffusion-convolutional neural networks. In NIPS, 2016.

[3] Matan Atzmon, Haggai Maron, and Yaron Lipman. Point convolutional neural networks by extension operators. ACM Trans. Graph., 2018.

[4] P. H. C. Avelar, A. R. Tavares, T. L. T. da Silveira, C. R. Jung, and L. C. Lamb. Superpixel image classification with graph attention networks. In SIBGRAPI, 2020.

[5] Giorgos Bouritsas, Sergiy V. Bokhnyak, Stylianos Ploumpis, Michael M. Bronstein, and Stefanos Zafeiriou. Neural 3D morphable models: Spiral convolutional networks for 3D shape representation learning and generation. In ICCV, 2019.

[6] André Brock, Theodore Lim, James M. Ritchie, and Nick Weston. Generative and discriminative voxel modeling with convolutional neural networks. CoRR, 2016.

[7] Marc Brockschmidt. GNN-FiLM: Graph neural networks with feature-wise linear modulation. In ICML, 2020.

[8] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. CoRR, 2013.

[9] Taco S. Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. CoRR, 2018.

[10] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, 2016.

[11] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, 2016.

[12] I. S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE TPAMI, 2007.

[13] Yutong Feng, Yifan Feng, Haoxuan You, Xibin Zhao, and Yue Gao. MeshNet: Mesh neural network for 3D shape representation. In AAAI, 2018.

[14] M. Fey, J. E. Lenssen, F. Weichert, and H. Müller. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In CVPR, 2018.

[15] Marc Finzi, R. Bondesan, and M. Welling. Probabilistic numeric convolutional neural networks. ArXiv, abs/2010.10876, 2020.

[16] Takahiko Furuya and Ryutarou Ohbuchi. Deep semantic hashing of 3D geometric features for efficient 3D model retrieval. In CGIC, 2017.

[17] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. ArXiv, 2017.

[18] Niv Haim, Nimrod Segol, Heli Ben-Hamu, Haggai Maron, and Y. Lipman. Surface networks via general covers. In ICCV, 2019.

[19] Z. Han, M. Shang, Z. Liu, C. Vong, Y. Liu, M. Zwicker, J. Han, and C. L. P. Chen. SeqViews2SeqLabels: Learning 3D global features via aggregating sequential views by RNN with attention. IEEE Transactions on Image Processing, 2019.

[20] Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. MeshCNN: A network with an edge. ACM Trans. Graph., 2019.

[21] Vishakh Hegde and Reza Bosagh Zadeh. FusionNet: 3D object classification using multiple data representations. ArXiv, 2016.

[22] Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. CoRR, 2015.

[23] Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Nießner, and Leonidas J. Guibas. TextureNet: Consistent local parametrizations for learning from high-resolution signals on meshes. In CVPR, 2019.

[24] A. Kanezaki, Y. Matsushita, and Y. Nishida. RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints. In CVPR, 2018.

[25] Michael M. Kazhdan, Thomas A. Funkhouser, and Szymon Rusinkiewicz. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Symposium on Geometry Processing, 2003.

[26] Thomas Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. ArXiv, 2016.

[27] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.

[28] Leif Kobbelt. √3-subdivision. In SIGGRAPH, 2000.

[29] Sudhakar Kumawat and Shanmuganathan Raman. LP-3DCNN: Unveiling local phase in 3D convolutional neural networks. In CVPR, 2019.

[30] Alon Lahav and A. Tal. MeshWalker: Deep mesh understanding by random walks. ACM Trans. Graph., 2020.

[31] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.

[32] Ron Levie, Federico Monti, Xavier Bresson, and Michael M. Bronstein. CayleyNets: Graph convolutional neural networks with complex rational spectral filters. IEEE Transactions on Signal Processing, 2019.

[33] Jiaxin Li, Ben M. Chen, and Gim Hee Lee. SO-Net: Self-organizing network for point cloud analysis. CoRR, 2018.

[34] S. Li, Z. Luo, M. Zhen, Y. Yao, T. Shen, T. Fang, and L. Quan. Cross-atlas convolution for parameterization invariant learning on textured mesh surface. In CVPR, 2019.

[35] Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. PointCNN: Convolution on X-transformed points. In Advances in Neural Information Processing Systems, 2018.

[36] Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. PointCNN: Convolution on X-transformed points. In Advances in Neural Information Processing Systems, 2018.

[37] R. Litman and A. M. Bronstein. Learning spectral descriptors for deformable shape correspondence. IEEE TPAMI, 2014.

[38] Yongcheng Liu, Bin Fan, Shiming Xiang, and Chunhong Pan. Relation-shape convolutional neural network for point cloud analysis. In CVPR, 2019.

[39] Charles Loop. Smooth subdivision surfaces based on triangles. Master's thesis, University of Utah, Department of Mathematics, 1987.

[40] Jonathan Masci, Davide Boscaini, Michael M. Bronstein, and Pierre Vandergheynst. Geodesic convolutional neural networks on Riemannian manifolds. In ICCVW, 2015.

[41] D. Maturana and S. Scherer. VoxNet: A 3D convolutional neural network for real-time object recognition. In IROS, 2015.

[42] A. Micheli. Neural network for graphs: A contextual constructive approach. IEEE Transactions on Neural Networks, 2009.

[43] F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model CNNs. In CVPR, 2017.

[44] J. Nagi, F. Ducatelle, G. A. Di Caro, D. Cireşan, U. Meier, A. Giusti, F. Nagi, J. Schmidhuber, and L. M. Gambardella. Max-pooling convolutional neural networks for vision-based hand gesture recognition. In ICSIPA, 2011.

[45] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. ArXiv, 2016.

[46] Frédéric Payan, Céline Roudet, and Basile Sauvage. Semi-regular triangle remeshing: A comprehensive study. In Computer Graphics Forum, 2015.

[47] Charles Ruizhongtai Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In CVPR, 2016.

[48] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In NIPS, 2017.

[49] Gernot Riegler, Ali O. Ulusoy, and Andreas Geiger. OctNet: Learning deep 3D representations at high resolutions. In CVPR, 2016.

[50] Konstantinos Sfikas, Theoharis Theoharis, and Ioannis Pratikakis. Exploiting the PANORAMA representation for convolutional neural network classification and retrieval. In Eurographics Workshop on 3D Object Retrieval, 2017.

[51] Yiru Shen, Chen Feng, Yaoqing Yang, and Dong Tian. Mining point cloud local structures by kernel correlation and graph pooling. In CVPR, 2018.

[52] B. Shi, S. Bai, Z. Zhou, and X. Bai. DeepPano: Deep panoramic representation for 3-D shape recognition. IEEE Signal Processing Letters, 2015.

[53] SideFX. Houdini.

[54] Ayan Sinha, Jing Bai, and Karthik Ramani. Deep learning 3D shape surfaces using geometry images. In ECCV, 2016.

[55] David Stutz and Andreas Geiger. Learning 3D shape completion under weak supervision. 2018.

[56] Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik G. Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. In ICCV, 2015.

[57] Gusi Te, Wei Hu, Amin Zheng, and Zongming Guo. RGCNN: Regularized graph CNN for point cloud segmentation. In ACM MM, 2018.

[58] Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, and Leonidas J. Guibas. KPConv: Flexible and deformable convolution for point clouds. CoRR, 2019.

[59] Ian Walker and Ben Glocker. Graph convolutional Gaussian processes. In ICML, 2019.

[60] Chu Wang, Babak Samari, and Kaleem Siddiqi. Local spectral graph convolution for point set feature learning. In ECCV, 2018.

[61] Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, and Xin Tong. O-CNN: Octree-based convolutional neural networks for 3D shape analysis. ACM Trans. Graph., 2017.

[62] Yifan Xu, T. Fan, Mingye Xu, L. Zeng, and Y. Qiao. SpiderCNN: Deep learning on point sets with parameterized convolutional filters. ArXiv, 2018.

[63] Mohsen Yavartanoo, Euyoung Kim, and Kyoung Mu Lee. SPNet: Deep 3D object classification and retrieval using stereographic projection. In ACCV, 2018.

[64] Zhirong Wu, S. Song, A. Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In CVPR, 2015.

Supplementary Material for
PolyNet: Polynomial Neural Network for 3D Shape Recognition with PolyShape Representation

Mohsen Yavartanoo1, Shih-Hsuan Hung2, Reyhaneh Neshatavar1, Yue Zhang2, Kyoung Mu Lee1
1SNU ECE & ASRI   2Oregon State University
{myavartanoo,reyhanehneshat,kyoungmu}@snu.ac.kr   {hungsh,zhangyue}@oregonstate.edu

S. PolyNet analysis

In this section, we provide further mathematical and qualitative analysis of our proposed PolyNet, with additional explanations of the PolyConv operation and of the PolyShape preprocessing procedure and its results. First, we expand some equations of PolyConv from the main paper for better understanding. Then we discuss PolyShape preprocessing in detail and show its step-by-step results.

S.1. Expansion of PolyConv

We derive the compact form of the polynomial function $f(x, y)$ defined in Eq. 5 of the main paper for $d = 2$ and $d = 4$ in the following Eq. S1 and Eq. S2, respectively.

$$
\begin{aligned}
X^{T} A X
&= \begin{pmatrix} 1 & x & y \end{pmatrix}
\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
\begin{pmatrix} 1 \\ x \\ y \end{pmatrix} \\
&= A_{11} + (A_{12} + A_{21})x + (A_{13} + A_{31})y + A_{22}x^2 + (A_{23} + A_{32})xy + A_{33}y^2 \\
&= a_{0,0} + a_{1,0}x + a_{0,1}y + a_{2,0}x^2 + a_{1,1}xy + a_{0,2}y^2 \\
&= \sum_{0 \le i, j,\, i+j \le 2} a_{i,j}\, x^i y^j = f(x, y).
\end{aligned}
\tag{S1}
$$

For $d = 4$, with the monomial vector $X = (1,\ x,\ y,\ xy,\ x^2,\ y^2)^{T}$ and a $6 \times 6$ coefficient matrix $A = (A_{kl})_{k,l=1}^{6}$:

$$
\begin{aligned}
X^{T} A X
&= A_{11} + (A_{12} + A_{21})x + (A_{13} + A_{31})y \\
&\quad + (A_{22} + A_{15} + A_{51})x^2 + (A_{23} + A_{32} + A_{14} + A_{41})xy + (A_{33} + A_{16} + A_{61})y^2 \\
&\quad + (A_{25} + A_{52})x^3 + (A_{35} + A_{53} + A_{24} + A_{42})x^2 y + (A_{26} + A_{62} + A_{34} + A_{43})xy^2 + (A_{36} + A_{63})y^3 \\
&\quad + A_{55}x^4 + (A_{45} + A_{54})x^3 y + (A_{44} + A_{56} + A_{65})x^2 y^2 + (A_{46} + A_{64})xy^3 + A_{66}y^4 \\
&= a_{0,0} + a_{1,0}x + a_{0,1}y + a_{2,0}x^2 + a_{1,1}xy + a_{0,2}y^2 \\
&\quad + a_{3,0}x^3 + a_{2,1}x^2 y + a_{1,2}xy^2 + a_{0,3}y^3 \\
&\quad + a_{4,0}x^4 + a_{3,1}x^3 y + a_{2,2}x^2 y^2 + a_{1,3}xy^3 + a_{0,4}y^4 \\
&= \sum_{0 \le i, j,\, i+j \le 4} a_{i,j}\, x^i y^j = f(x, y).
\end{aligned}
\tag{S2}
$$

[Figure S1 flowchart: the 3D CAD model and its mesh-fused cleaned model pass through matchsize, clean, polyreduce, and ray (shape fitting) nodes to produce a 400-vertex coarse mesh, followed by three iterations of subdivide (PTQ) or tridivide (√3-subdivision), each followed by ray shape fitting, producing PTQ1-PTQ3 and Sqrt1-Sqrt3.]

Figure S1: PolyShape processing in Houdini. Each box refers to a node with the same name in the software.

Note that we parameterize the matrix $A$ by the matrix $B$ and learn the matrix $B$ instead of the matrix $A$. The parameterized matrix $A$ for the polynomial functions of degree $d = 2$ is given in Eq. S3:

$$
A = BB^{T} =
\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix} =
\begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{12} & B_{22} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{pmatrix}
\begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{12} & B_{22} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{pmatrix},
\tag{S3}
$$

so that

$$
\begin{aligned}
A_{11} &= B_{11}^2 + B_{12}^2 + B_{13}^2, &
A_{12} = A_{21} &= B_{11}B_{12} + B_{12}B_{22} + B_{13}B_{23}, \\
A_{13} = A_{31} &= B_{11}B_{13} + B_{12}B_{23} + B_{13}B_{33}, &
A_{22} &= B_{12}^2 + B_{22}^2 + B_{23}^2, \\
A_{23} = A_{32} &= B_{12}B_{13} + B_{22}B_{23} + B_{23}B_{33}, &
A_{33} &= B_{13}^2 + B_{23}^2 + B_{33}^2,
\end{aligned}
$$

where $B$ is a learnable symmetric matrix.
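A quick numeric check of this parameterization (our illustration): with $A = BB^{T}$ for a symmetric $B$, the quadratic form $f(x, y) = X^{T} A X$ is nonnegative for every monomial vector $X = (1, x, y)^{T}$.

```python
import torch

W = torch.randn(3, 3)
B = 0.5 * (W + W.t())                   # symmetric B, as in Eq. S3
A = B @ B.t()                           # A = B B^T is PSD by construction
xy = torch.rand(1000, 2) * 2 - 1        # samples in [-1, 1]^2
X = torch.cat([torch.ones(1000, 1), xy], dim=1)   # rows (1, x, y)
f = torch.einsum('ni,ij,nj->n', X, A, X)
assert (f >= -1e-6).all()               # nonnegative up to float rounding
```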

S.2. Details of PolyShape Processing

We provide the flowchart of the PolyShape processing in the Houdini software in Figure S1. Given a 3D CAD model and its cleaned mesh-fusion counterpart, we first resize the models and remove unused points with the matchsize and clean nodes. Next, we generate the coarse mesh by reducing the number of vertices to 400 with polyreduce and fitting the shape to the 3D CAD model with ray. We create the multi-resolution PolyShape by subdividing the coarse mesh three times with primal triangle quadrisection (PTQ, subdivide) or √3-subdivision (tridivide). At each iteration of these subdivisions, we fit the generated mesh to the given 3D CAD model to maintain the details of the original shape. Figure S2 and Figure S3 show the resulting PolyShapes of the √3-subdivision and PTQ for the same input models, respectively. The PolyShapes generated by the √3-subdivision have fewer faces than the shapes created by PTQ at the same level of detail: at the highest resolution, the √3-subdivision creates roughly 0.43 times as many faces as PTQ on average. Therefore, the √3-subdivision provides a more efficient representation for storage and for the computation of the classification.

[Figure S2 grid: for each sample, the 3D CAD model, the cleaned model, the 400-vertex coarse mesh, and the Sqrt1-Sqrt3 levels, each annotated with vertex/face counts; e.g., one model goes from V = 2385, F = 4770 (cleaned) to V = 400, F = 788 (coarse) to V = 10644, F = 21276 (Sqrt3).]

Figure S2: PolyShape representation. PolyShape processing results on some samples of the ModelNet-10 dataset based on the √3-subdivision. Sqrt1 to Sqrt3 refer to the output of the PolyShape procedure after each level of subdivision. Note that V and F refer to the number of vertices and faces of each shape.

[Figure S3 grid: the same samples processed with PTQ, each annotated with vertex/face counts; e.g., the same model goes from V = 400, F = 788 (coarse) to V = 25600, F = 51200 (PTQ3).]

Figure S3: PolyShape representation. PolyShape processing results on some samples of the ModelNet-10 dataset based on PTQ. PTQ1 to PTQ3 refer to the output of the PolyShape procedure after each level of subdivision. Note that V and F refer to the number of vertices and faces of each shape.