3D Intracranial Aneurysm Classification and Segmentation via Unsupervised Dual-branch Learning

Di Shao
Deakin University
75 Pigdons Rd, Waurn Ponds, 3216, Australia
Xuequan Lu* (corresponding author)
Deakin University
75 Pigdons Rd, Waurn Ponds, 3216, Australia
Xiao Liu
Deakin University
75 Pigdons Rd, Waurn Ponds, 3216, Australia
Abstract
Intracranial aneurysms are common, and detecting them intelligently is of great significance in digital health. While most existing deep learning research has focused on medical images in a supervised way, we introduce an unsupervised method for the detection of intracranial aneurysms based on 3D point cloud data. In particular, our method consists of two stages: unsupervised pre-training and downstream tasks. In the former, the main idea is to pair each point cloud with its jittered counterpart and maximise their correspondence; we design a dual-branch contrastive network with an encoder for each branch and a subsequent common projection head. In the latter, we design simple networks for supervised classification and segmentation training. Experiments on the public IntrA dataset show that our unsupervised method achieves performance comparable to, or even better than, some state-of-the-art supervised techniques, and it is most prominent in detecting aneurysmal vessels. Experiments on ModelNet40 also show that our method achieves an accuracy of 90.79%, which outperforms existing state-of-the-art unsupervised models.
Key words: Intracranial Aneurysm Classification, Intracranial Aneurysm Segmentation, 3D Point Cloud, Unsupervised Learning
1. Introduction
Intracranial aneurysms can result in a high mortality rate, so their classification and segmentation are of great significance. Existing research has mainly focused on image data, which consist of regular pixels [9,18,25,26,28,29].
Although 3D geometric data such as point clouds can depict more useful information, research on analysing intracranial aneurysms with point cloud data has been very sparse to date. The authors of [38] published a point cloud dataset containing aneurysmal segments and healthy vessel segments, and conducted a benchmark using state-of-the-art point-based networks that directly consume 3D points instead of 2D pixels.
Many networks are available for consuming point cloud data, for example, PointNet [20], PointNet++ [21], SpiderCNN [37], PointCNN [14], SO-Net [13] and DGCNN [32]. PointNet is a seminal method that takes 3D points directly as input for 3D point cloud classification and segmentation, and other point-based methods were later proposed to improve the performance. Since they are all supervised learning methods, they require annotated data for training. However, annotation often requires experts and significant amounts of time, especially for large datasets and medical data.
With the above analysis in mind, we design an unsupervised representation learning method that consumes point clouds of vessel segments, inspired by the concept of contrastive learning. In particular, we first generate a pair of augmented samples from each original point cloud, such that the two samples are distinctly different. Next, we design a dual-branch contrastive network with an encoder for each branch and a follow-up common projection head to facilitate unsupervised training with a contrastive loss. For the downstream tasks, we first use the unsupervised trained model to output representations, and then design simple networks that take these representations as input and train them to classify or segment intracranial aneurysms. Note that we design two unsupervised networks and two corresponding downstream networks to fulfil two different tasks (i.e. classification and segmentation). Supervised methods often need large amounts of labelled data to achieve satisfactory performance. In contrast, our method does not require labels for unsupervised training and can make use of a small amount of labelled data for downstream training. In summary, our contributions in this paper include:
We propose a simple yet effective method for unsupervised representation learning on 3D point clouds of vessel segments.
We introduce a useful augmentation method for generating pairs of each vessel segment.
We propose a dual-branch contrastive network with an encoder for each branch.
We conduct comprehensive experiments and compare with state-of-the-art point-based techniques to demonstrate the superior performance of our method.
2. Related work
2.1. Deep Learning on Intracranial Aneurysms
Intracranial aneurysms are associated with a high mortality rate, so their detection is crucial for human health. Traditional methods rely heavily on prior knowledge and are often inferior to deep learning in capability and accuracy. Owing to the excellent performance of deep learning in processing medical images, many deep learning methods have been developed to detect intracranial aneurysms [24]. [18] proposed a convolutional neural network-based detection system that applied a 6-layer CNN and the maximum intensity projection (MIP) algorithm to MRA images. This method can achieve almost 100% accuracy for detecting aneurysms larger than 7 mm in diameter, but it was less sensitive to small aneurysms. To improve this, [28] used a fully convolutional U-Net architecture to predict aneurysm size based on the detection. [29] applied the ResNet-18 network to MRA images and performed a secondary evaluation on the already detected image data to enhance detection sensitivity. To better segment the shape of intracranial aneurysms, [26] utilized DeepMedic [10], which has a two-pathway, 11-layer convolutional architecture, to segment intracranial aneurysms from MRA images on the basis of detection. The above methods for detecting intracranial aneurysms use stacks of 2D images. To sum up, nearly all works focus on medical images rather than on 3D geometry such as point clouds.
2.2. Point-Based Networks
Neural network models for the classification and segmentation of 3D point cloud data have achieved noticeable success. [20] proposed PointNet to directly process point sets. To obtain invariance to the permutation and transformation of point clouds, PointNet used a symmetric function and a T-Net. It performed well at extracting global features of point clouds, but it ignored the geometric relationships among points, which limited the extraction of local features. To address this issue, [21] proposed PointNet++, a hierarchical neural network that uses a point sampling and grouping strategy to extract local features of point clouds. However, PointNet++ did not reveal the spatial distribution of the input point cloud. SO-Net [13] constructed a Self-Organizing Map (SOM) [12] to model the spatial distribution of the input point cloud, which allowed SO-Net to adjust the receptive field overlap and to perform hierarchical feature extraction. Unlike SO-Net, which adjusts the receptive field of a hierarchical network, PointCNN [14] proposed the χ-transformation, which weights and permutes the input point cloud data and thus improves the extraction of local features. In addition, SpiderCNN [37] proposed SpiderConv, i.e. parameterized convolutional filters, to implement convolutional operations on unordered point clouds. DGCNN [32] proposed a convolution-like operation that constructs local neighbourhood graphs and applies convolutional operations on their edges, connecting adjacent point pairs to exploit the local geometric structure. In addition to common tasks such as classification and segmentation, point-based networks have also been developed for other tasks [31,40]. In summary, there has been great progress in analysing point cloud data in a supervised manner.
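PointNet's use of a symmetric function can be illustrated with a toy example: applying one shared function to every point and max-pooling the results gives a global feature that does not depend on the order of the points. The sketch below is not PointNet itself; `WEIGHTS`, `shared_mlp` and `global_feature` are hypothetical stand-ins, with a fixed linear map plus ReLU in place of learned layers.

```python
import random

# Hypothetical fixed weights standing in for a learned shared MLP (3 inputs -> 4 features).
WEIGHTS = [
    [0.5, -1.0, 0.2],
    [1.0, 0.3, -0.7],
    [-0.4, 0.8, 1.1],
    [0.9, 0.1, 0.6],
]


def shared_mlp(point):
    """Apply the same linear map followed by ReLU to one (x, y, z) point."""
    return [max(0.0, sum(w * p for w, p in zip(row, point))) for row in WEIGHTS]


def global_feature(points):
    """Max-pool the per-point features channel by channel (a symmetric function)."""
    per_point = [shared_mlp(p) for p in points]
    return [max(channel) for channel in zip(*per_point)]


rng = random.Random(0)
cloud = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(64)]
shuffled = cloud[:]
rng.shuffle(shuffled)

# The same set of points in a different order yields an identical global feature.
print(global_feature(cloud) == global_feature(shuffled))
```

Because the maximum over a set does not depend on the order of its elements, the comparison holds for any permutation; PointNet's learned layers and T-Net do not change this property of the pooling step.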
2.3. Unsupervised 3D Point Cloud Learning
All the above deep neural networks can classify and segment point cloud data well. However, given the complexity of labelling 3D data, it is difficult in many scenarios to obtain enough expert-labelled data for supervised training. It is therefore worthwhile to exploit unsupervised or self-supervised learning for 3D point cloud data. PointContrast [36] proposed an unsupervised framework with a U-Net backbone and demonstrated both the transferability of representation learning on 3D point cloud data and the performance gains that pre-training brings to downstream tasks. Lu et al. [15,16] attempted to address skeleton learning on point cloud sequence data. Jiang et al. [8] introduced a simple yet effective unsupervised learning method on point clouds that only considers rotation as the transformation. Info3D [22] proposed to extend the InfoMax [30] and contrastive learning principles to 3D shapes. It maximized the mutual information between 3D objects and their "chunks" to improve the representation in the aligned dataset. FoldingNet [39] proposed an autoencoder with graph pooling and MLP layers that uses a folding operation to deform 2D grids onto object surfaces. However, unsupervised methods for 3D medical point cloud data are still in great demand. We propose an unsupervised representation learning method that shows excellent performance for the classification and segmentation of point cloud based intracranial aneurysms.
3. Method
Our method consists of two stages: unsupervised learning and downstream tasks. In stage 1, we first augment each point cloud to obtain a pair of augmented samples that differ from each other (Section 3.1). We then obtain two high-dimensional representations of the pair with the dual-branch encoders, which enables each encoder branch to extract distinct features. Next, a projection head maps the representations to low-dimensional vectors [6] to improve training speed (Section 3.2). Finally, a contrastive loss applied to the projected vectors encourages the representations of each pair of point clouds to be similar (Section 3.3). In stage 2, the trained model outputs unsupervised representations for the downstream tasks (Section 3.4), which evaluate the effectiveness of the unsupervised learning. Figure 1 presents the architecture of our method.
3.1. Data Augmentation
We use data augmentation to generate different samples for each point cloud. To generate a pair, we consider jittering, perturbation and rotation transformations. Experimentally, applying jittering in both branches gave the best results in the downstream tasks, indicating that the upstream network learned a more discriminative representation. Ablation experiments are presented in Section 4.4.
We feed a mini-batch of $N$ point clouds into the data augmentation module. As shown at the top of Figure 1, for each sample in the mini-batch we apply the jittering function twice to obtain a pair of samples, $x_i$ and $x_j$, so that the mini-batch contains $2N$ augmented samples. We treat $\{x_i, x_j\}$ as a positive pair, and the $2(N-1)$ augmented samples from the other $N-1$ pairs are regarded as negative samples for it in this mini-batch.
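The pairing scheme above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the noise scale `sigma` and clipping bound `clip` are assumed values (the text does not report them), the two views of cloud k are assumed to sit at 0-based positions 2k and 2k+1, and all function names are hypothetical.

```python
import random


def jitter(cloud, sigma=0.01, clip=0.05, rng=random):
    """Add clipped Gaussian noise to every channel (xyz and normal) of every point.

    sigma and clip are assumed values; the paper does not report them.
    """
    return [[v + max(-clip, min(clip, rng.gauss(0.0, sigma))) for v in point] for point in cloud]


def make_contrastive_batch(clouds, rng=random):
    """Turn N clouds into 2N jittered views; views 2k and 2k+1 (0-based) come from cloud k."""
    views = []
    for cloud in clouds:
        views.append(jitter(cloud, rng=rng))  # x_i
        views.append(jitter(cloud, rng=rng))  # x_j
    return views


def positive_index(i):
    """Index of the positive partner of view i (flip the lowest bit: 0<->1, 2<->3, ...)."""
    return i ^ 1


def negative_indices(i, batch_size):
    """The 2(N-1) views in the batch that come from other clouds."""
    return [k for k in range(batch_size) if k not in (i, positive_index(i))]


rng = random.Random(0)
# N = 4 clouds of 1,024 six-dimensional points (xyz + normal).
clouds = [[[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(1024)] for _ in range(4)]
views = make_contrastive_batch(clouds, rng=rng)
print(len(views), positive_index(5), len(negative_indices(5, len(views))))  # 8 4 6
```

With N = 4, each view has exactly one positive partner and 2(N-1) = 6 negatives, matching the count in the text.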
3.2. Dual-branch Encoders and Projection Head
As shown at the bottom left of Figure 2, each pair of samples is passed through the dual-branch encoders $f(\cdot)$ to obtain two representations $h_i$ and $h_j$; that is, the features of the two samples in a pair are extracted by two different encoders. We also compare this design with a single shared encoder; the ablation experiments are presented in Section 4.4.
The two encoders are PointNet [20] and PointNet++ [21], respectively. We choose them because PointNet extracts global features while PointNet++ extracts local features. This design highlights the distinctions between the features of the two branches and allows for more distinctive representations.
Classification. The first encoder uses three consecutive 1D convolutional layers and a max-pooling layer to obtain a 1024-dimensional representation vector $h$. The second encoder consists of three set abstraction levels, each of which processes the input point set to create a new point set with fewer elements. The input to an abstraction level is an $n \times (d+c)$ matrix formed by $n$ points with $d$-dimensional coordinates and $c$-dimensional point features, where $n$ is the number of points in a point cloud sample. By sampling $n_1$ centroids and grouping points around them, the level outputs point set groups of size $n_1 \times k \times (d+c)$, where each group corresponds to a local region and $k$ is the number of points sampled from the centroid's neighbourhood. The subsequent PointNet layer outputs local region features of size $n_1 \times (d+c_1)$. In the last abstraction level, we take all sampled points as a single group and output a 1024-dimensional representation vector $h$. Our projection head $g(\cdot)$ consists of three linear layers that map each 1024-dimensional representation vector to a 128-dimensional vector $z$.
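The sampling and grouping step of a set abstraction level can be sketched in plain Python. This is a simplified illustration under assumptions: farthest point sampling picks the $n_1$ centroids, and each group takes the $k$ nearest neighbours of its centroid, whereas PointNet++ itself typically uses a radius-based ball query. The function names are hypothetical, and the per-group PointNet that follows is omitted.

```python
import math
import random


def farthest_point_sampling(xyz, n1):
    """Greedily pick n1 centroid indices that are spread over the cloud."""
    chosen = [0]  # start from the first point (an arbitrary choice)
    dist = [math.dist(p, xyz[0]) for p in xyz]
    while len(chosen) < n1:
        nxt = max(range(len(xyz)), key=dist.__getitem__)
        chosen.append(nxt)
        # keep, for every point, its distance to the nearest chosen centroid
        dist = [min(d, math.dist(p, xyz[nxt])) for d, p in zip(dist, xyz)]
    return chosen


def group(points, centroid_ids, k):
    """Gather the k nearest points around each centroid.

    points: n rows of d=3 coordinates followed by c feature channels.
    Returns n1 x k x (d + c), with coordinates relative to the centroid.
    """
    groups = []
    for cid in centroid_ids:
        centre = points[cid][:3]
        nearest = sorted(range(len(points)), key=lambda i: math.dist(points[i][:3], centre))[:k]
        groups.append([[a - b for a, b in zip(points[i][:3], centre)] + points[i][3:] for i in nearest])
    return groups


rng = random.Random(0)
pts = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(256)]  # d = 3, c = 3
ids = farthest_point_sampling([p[:3] for p in pts], n1=32)
groups = group(pts, ids, k=16)
print(len(groups), len(groups[0]), len(groups[0][0]))  # 32 16 6
```

The printed shape corresponds to $n_1 \times k \times (d+c)$ with $n_1 = 32$, $k = 16$, $d = 3$ and $c = 3$; the values of $k$ used in the experiments (32, 64 and all points) are larger.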
Segmentation. The segmentation encoders are based on the encoders for classification. The 1024-dimensional representation vector output by encoder 1 is copied $n$ times to form an $n \times 1024$ tensor, which is concatenated with the $n \times 1024$ tensor obtained from the last convolutional layer to give an $n \times 2048$ tensor. This tensor therefore contains both global features and per-point features. Encoder 2 extends the classification encoder 2 by adding three feature propagation levels. We adopt a propagation strategy based on distance-based interpolation and skip links across levels. In a propagation level, we propagate point features from the $n_1 \times (d+c_1)$ points to the $n$ points by concatenating the interpolated $c_1$-dimensional features of the $n_1$ points with the skip-linked $c$-dimensional features of the $n$ points from the corresponding set abstraction level. This yields an $n \times (d+c_1+c)$ tensor, which is then passed through a unit PointNet to obtain an $n \times 1024$ tensor. The 1024-dimensional vector output by the last abstraction level is copied $n$ times and concatenated with this $n \times 1024$ tensor to obtain the final $n \times 2048$ representation tensor.
The tensors obtained by encoder 1 and encoder 2 are max-pooled separately, each yielding a 2048-dimensional vector. These two vectors are used as the feature representation $h$ for the downstream segmentation network. Our projection head $g(\cdot)$ consists of two linear layers that map each 2048-dimensional representation vector to a 512-dimensional vector $z$.
Figure 1. The architecture of our method, including data augmentation, encoders, projection head, loss function and downstream tasks. We first jitter a point cloud $x$ to construct a pair $(x_i, x_j)$. The representation vectors $z_i$ and $z_j$ of the pair are then extracted via the dual-branch encoders and the projection head, and the network is optimised with a contrastive loss. The representation $h$ obtained by the dual-branch encoders is used for the downstream tasks.
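One propagation level and the global-feature tiling described for segmentation can be sketched as follows. The text only says "distance-based interpolation"; this sketch assumes the inverse-squared-distance weighting over the k = 3 nearest sparse points used in PointNet++, and all function names are hypothetical. Learned layers (the unit PointNet) are omitted, so only the interpolation and the tensor shapes are illustrated.

```python
import math
import random


def interpolate_features(dense_xyz, sparse_xyz, sparse_feats, k=3, eps=1e-8):
    """Inverse-squared-distance interpolation from n1 sparse points to n dense points."""
    out = []
    for p in dense_xyz:
        nearest = sorted(range(len(sparse_xyz)), key=lambda i: math.dist(p, sparse_xyz[i]))[:k]
        weights = [1.0 / (math.dist(p, sparse_xyz[i]) ** 2 + eps) for i in nearest]
        total = sum(weights)
        out.append([sum(w * sparse_feats[i][c] for w, i in zip(weights, nearest)) / total
                    for c in range(len(sparse_feats[0]))])
    return out


def propagate(dense_xyz, skip_feats, sparse_xyz, sparse_feats):
    """Build the n x (d + c1 + c) input of one propagation level."""
    interp = interpolate_features(dense_xyz, sparse_xyz, sparse_feats)  # n x c1
    return [list(p) + f1 + f0 for p, f1, f0 in zip(dense_xyz, interp, skip_feats)]


def tile_global(per_point, global_vec):
    """Copy a global vector n times and append it to every per-point feature."""
    return [row + list(global_vec) for row in per_point]


rng = random.Random(0)
dense = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]  # n = 10, d = 3
skip = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]   # c = 4
sparse = dense[:4]                                                   # n1 = 4
sparse_feats = [[2.0, 2.0] for _ in sparse]                          # c1 = 2
rows = propagate(dense, skip, sparse, sparse_feats)
print(len(rows), len(rows[0]))                # 10 9
print(len(tile_global(rows, [0.0] * 5)[0]))   # 14
```

Because the interpolation weights are normalised, constant sparse features are reproduced exactly at every dense point, which is a quick sanity check for any implementation of this step.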
3.3. Contrastive Loss
We use a contrastive loss function similar to that of [4], which allows unsupervised learning to effectively learn separable features for point clouds. For each sample in the mini-batch, the projection head $g(\cdot)$ yields a projection representation $z$. For a pair $x_i$ and $x_j$, we use their projection representations $z_i$ and $z_j$ to measure the cosine similarity between the two samples, as follows:
$$s_{i,j} = \frac{z_i^{\top} z_j}{\lVert z_i \rVert \, \lVert z_j \rVert} \qquad (1)$$
Intuitively, the similarity of a positive pair should be high, whereas the similarity between a sample and any negative sample should be low. We then apply a softmax over the mini-batch to obtain the similarity probability of each positive pair, where $\mathbb{1}_{[k \neq i]} \in \{0, 1\}$ is an indicator function evaluating to 1 iff $k \neq i$:
$$S(i,j) = \frac{\exp(s_{i,j})}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(s_{i,k})} \qquad (2)$$
We use the negative logarithm to calculate the loss of a sample pair, as in previous works [19,27,35]. Here $\tau$ denotes a temperature parameter that scales the cosine similarities and thereby expands their range. This loss is known as the normalized temperature-scaled cross-entropy (NT-Xent) loss [2,19]:
$$l(i,j) = -\log \frac{\exp(s_{i,j}/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(s_{i,k}/\tau)} \qquad (3)$$
We compute the loss for both $(i,j)$ and $(j,i)$ and average it over all positive pairs in the mini-batch, where the $2N$ augmented samples are indexed so that $2k-1$ and $2k$ form the $k$-th positive pair. Minimising this loss trains the encoders and projection head to place similar samples closer together in the representation space. The overall loss is:
$$\mathcal{L} = \frac{1}{2N} \sum_{k=1}^{N} \left[ l(2k-1, 2k) + l(2k, 2k-1) \right] \qquad (4)$$
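Equations (1), (3) and (4) can be combined into a short plain-Python sketch of the loss. It is a minimal illustration rather than the authors' implementation: the batch layout (the two views of sample k stored at 0-based positions 2k and 2k+1) and the function names are assumptions, and a practical version would use batched tensor operations. The default τ = 0.5 is the value reported in Section 4.2.

```python
import math


def cosine(u, v):
    """Eq. (1): cosine similarity between two projection vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pair_loss(z, i, j, tau):
    """Eq. (3): negative log of the temperature-scaled softmax of s_ij over all k != i."""
    logits = [cosine(z[i], z[k]) / tau for k in range(len(z)) if k != i]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - cosine(z[i], z[j]) / tau


def nt_xent(z, tau=0.5):
    """Eq. (4) with 0-based indices: z[2k] and z[2k+1] are the k-th positive pair."""
    n_pairs = len(z) // 2
    total = sum(pair_loss(z, 2 * k, 2 * k + 1, tau) + pair_loss(z, 2 * k + 1, 2 * k, tau)
                for k in range(n_pairs))
    return total / (2 * n_pairs)


aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]     # views of a pair agree
mismatched = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # views of a pair disagree
print(nt_xent(aligned), nt_xent(mismatched))
```

On this toy batch the loss is lower when the two views of each pair have identical projections than when each view is closest to a sample from another pair, which is the behaviour the contrastive objective rewards.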
3.4. Downstream Tasks
We design two simple downstream networks to evaluate
the unsupervised learned representations for classification
and segmentation, respectively. Each point cloud of a vessel
segment is fed into the unsupervised dual-branch encoders
to obtain two representations. We then concatenate the two
representations into one and use this representation as input
to train the downstream network. As for the binary classifi-
cation task, we use four linear layers (512, 256, 128, 2) as
the downstream network. Regarding the segmentation task,
we employ four 1D convolutional layers (1024, 512, 256, m), where m is the number of segmentation labels.
4. Evaluation
4.1. Datasets
IntrA [38] consists of complete models of aneurysms, generated vessel segments and annotated aneurysm segments. IntrA collected 103 3D models of the entire cerebral vasculature, reconstructed from 2D MRA images of patients, and generated 1,909 vessel segments from these complete models, including 1,694 healthy vessel segments and 215 aneurysmal segments. Additionally, 116 aneurysm segments were manually annotated point by point. In IntrA, each sample is represented as a 3D point cloud, and each point $p$ is a 6D vector composed of its coordinates and normal vector. Following IntrA, we combine the generated vessel segments and the manually annotated aneurysm segments to obtain a total of 2,025 samples. These 2,025 vessel segments are used as the dataset for our unsupervised training, and all of them are also used for the downstream classification task. The 116 annotated aneurysm segments are used for the downstream segmentation task.
Figure 2. Dual-branch encoders and projection head. “conv1d”: 1D convolution, “linear”: fully connected layers, “mlp”: multi-layer perceptron. The numbers in brackets represent the layer sizes. All convolutions and fully connected layers include batchnorm and …
ModelNet-40 [34] is a collection of 12,311 models in 40 categories, obtained from the mesh surfaces of CAD models. Following previous practice, 9,843 models are used for training and 2,468 for testing. Each point cloud has 2,048 points, and all point coordinates are normalised to the unit sphere. Each point is a 6D vector made up of its coordinates and normal vector. We take 1,024 points from each object and augment the data by jittering. This dataset is used to compare our method with other unsupervised methods.
4.2. Experimental Setting
For unsupervised training, we use the Adam optimizer with a weight decay of $10^{-6}$. The mini-batch size is set to 32, the number of epochs to 200 and the initial learning rate to $10^{-3}$. The learning rate is multiplied by 0.5 every 10 epochs. We use jittering as the data augmentation method, which directly adds Gaussian noise to the coordinates and normals of the input point clouds. In encoder 2, the number of points $k$ sampled from each centroid's neighbourhood is set to [32, 64, "None"], where "None" means that all points are sampled. The projection head outputs a feature $z$ whose dimension is set to 128 for the classification task and 512 for the segmentation task. In the loss function, the temperature parameter $\tau$ is set to 0.5.
Table 1. Classification results of each method (columns: Network, #.Points, V.(%), A.(%), F1-Score; rows: SpiderCNN [37], SO-Net [13], PointCNN [14], DGCNN [32], PointNet++ [21], PointNet [20], FoldingNet [39], Our(single PN), Our(single PN++)). The additional input K is required for DGCNN. PN: PointNet, PN++: PointNet++.
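The step decay described above can be written as a one-line rule. The sketch assumes 0-based epoch counting, with the halving taking effect at epochs 10, 20 and so on; the function name is hypothetical. In PyTorch the same behaviour is typically obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)`.

```python
def learning_rate(epoch, base_lr=1e-3, factor=0.5, step=10):
    """Step decay: multiply base_lr by `factor` once every `step` epochs (0-based epochs)."""
    return base_lr * factor ** (epoch // step)


for epoch in (0, 9, 10, 25, 199):
    print(epoch, learning_rate(epoch))
```

Under these assumptions, the last of the 200 epochs runs at $10^{-3} \times 0.5^{19} \approx 1.9 \times 10^{-9}$, so the final epochs make very small updates.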
For the downstream networks, the optimizer, number of epochs, representation dimensions, initial learning rate, learning rate decay schedule and mini-batch size are the same as in unsupervised training. For both tasks, we sample 512, 1,024 and 2,048 points from each point cloud in separate experiments. For the classification task, the weight decay is set to $10^{-6}$ and the linear layer sizes to [512, 256, 128, a], where a is the number of categories. For the segmentation task, the weight decay is set to 1.0 and the MLP sizes to [1024, 512, 256, b], where b is the number of point categories in the point cloud.
Experiments were implemented in PyTorch on a GeForce GTX 1080 GPU. On the IntrA dataset, unsupervised training takes approximately 1 hour for both classification and segmentation, while downstream classification and segmentation training take approximately 40 minutes and 50 minutes, respectively.
4.3. Experimental Results
We evaluate the classification and segmentation tasks separately on IntrA [38]. To demonstrate the generalisation of our method, we also perform the classification task on ModelNet-40 [34] and compare our method with state-of-the-art unsupervised methods.
Classification task. On IntrA, we evaluate performance using three metrics: (1) V. Accuracy, the percentage of healthy vessel samples that are correctly predicted; (2) A. Accuracy, the percentage of aneurysm vessel samples that are correctly predicted; and (3) F1 score, the harmonic mean of precision and recall, which evaluates the overall quality of the model. On ModelNet-40, we evaluate performance using overall accuracy.
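The three IntrA classification metrics can be computed from binary predictions as sketched below. The sketch assumes that label 1 marks an aneurysm segment and that F1 is computed with the aneurysm class as positive; the text does not state which class is positive, and the function name is hypothetical. The values are fractions, whereas Table 1 reports V. and A. as percentages.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (V. Accuracy, A. Accuracy, F1) for binary vessel classification.

    Assumption: label 1 = aneurysm (treated as the positive class for F1), 0 = healthy vessel.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    v_acc = tn / (tn + fp)  # correctly predicted healthy vessels / all healthy vessels
    a_acc = tp / (tp + fn)  # correctly predicted aneurysms / all aneurysms (= recall)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * a_acc / (precision + a_acc) if precision + a_acc else 0.0
    return v_acc, a_acc, f1


print(classification_metrics([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0]))  # (0.75, 0.5, 0.5)
```

Since A. Accuracy equals the recall of the aneurysm class under this assumption, F1 combines it with the precision of the aneurysm predictions.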
As shown in Table 1, for our dual-branch encoders method with PointNet and PointNet++ backbones (i.e. PN and PN++), 1,024 sample points give the best F1 score and V. Accuracy among the tested numbers of sample points. The results with 512 sample points are still impressive, even though each point cloud contains far fewer points. Although the 2,048-point setting is not the best in terms of F1 score and V. Accuracy, it achieves the best A. Accuracy among all compared methods, which matters because the ability to identify aneurysms is essential in this application. Furthermore, A. Accuracy increases with more sample points. Compared with supervised methods, our results are better than those of the supervised PointNet in all metrics and generally better than those of more advanced supervised networks such as DGCNN. The result with 1,024 sample points is very close to SO-Net and SpiderCNN in terms of F1 score. We also compare our method with FoldingNet, one of the most representative unsupervised methods, and our method performs better on all metrics. However, our method (dual) performs less well than the supervised PN++. We attribute this to the inherent limitations of unsupervised learning and to the fact that the tubular structures in intracranial aneurysm data are much less distinctive than the shapes in other data.
Method ModelNet40(%)
SPH [11] 68.2%
LFD [3] 75.5%
T-L Network [5] 74.4%
VConv-DAE [23] 75.5%
3D-GAN [33] 83.3%
Latent-GAN [1] 85.7%
FoldingNet [39] 88.4%
PointCapsNet [41] 88.9%
MultiTask [7] 89.1%
Our(dual) 90.79%
Table 2. Classification accuracy of unsupervised learning on ModelNet40.
As shown in Table 2, we compare the performance of our model with other unsupervised methods on ModelNet40 [34]. Our method outperforms all other unsupervised methods, which again confirms its effectiveness in unsupervised representation learning.
Table 3. Segmentation results of each network (columns: Network, #.Points, IoU V.(%), IoU A.(%)).
Segmentation task. Following [38], we evaluate segmentation performance using two metrics: (1) V. IoU, the IoU of the healthy vessel part, and (2) A. IoU, the IoU of the aneurysm part.
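Per-class IoU on point labels can be computed as sketched below. Assumptions: labels 0 and 1 encode healthy vessel and aneurysm points, and the per-class IoU is computed per point cloud and then averaged over the test samples; the text does not specify the averaging scheme, and the helper names are hypothetical.

```python
def iou(y_true, y_pred, cls):
    """IoU of one class over the points of a single point cloud."""
    inter = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    union = sum(t == cls or p == cls for t, p in zip(y_true, y_pred))
    return inter / union if union else float("nan")


def mean_iou(samples, cls):
    """Average the per-sample IoU of one class over (labels, predictions) pairs."""
    scores = [iou(t, p, cls) for t, p in samples]
    return sum(scores) / len(scores)


HEALTHY, ANEURYSM = 0, 1  # assumed label encoding
sample = ([0, 0, 1, 1], [0, 1, 1, 1])
print(iou(*sample, HEALTHY), iou(*sample, ANEURYSM))  # 0.5 0.6666666666666666
```

In the toy sample, one healthy point is mislabelled as aneurysm, which lowers both IoUs: it leaves the healthy intersection and enlarges the aneurysm union.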
As shown in Table 3, for our method (dual), 1,024 sample points give the best V. IoU, whereas 2,048 sample points give the best A. IoU. Our method outperforms the supervised PointNet on both V. IoU and A. IoU, and it is also better than more advanced supervised networks such as PointGrid. Whereas the supervised PointNet is trained with only 116 labelled samples, our method learns unsupervised features from a much wider range of data, which facilitates downstream training. Our method generally produces better results as the number of points increases. However, our method (PN++) is still inferior to the supervised PointNet++, which we attribute to the limitations of unsupervised learning.
4.4. Ablation Studies
We explore the factors that make our method effective
through ablation experiments. We conduct two ablation ex-
periments to further understand which data augmentation is
more effective, and the effect of dual-branch encoders. We
also analyse the effectiveness of our method in the case of
sparse labels. The ablation experiments sample 1,024 points
in each point cloud.
Augmentation. We seek the best data augmentation for our unsupervised method by considering three augmentation methods: rotation, jittering and perturbation.
Augmentation V.(%) A.(%) F1-Score
rotation 95.23 75.52 0.7637
perturbation 95.35 81.87 0.8121
jittered & perturbation 95.63 81.76 0.8240
jittered 97.45 84.28 0.8613
Table 4. Ablation study on augmentation.
Rotation randomly rotates the point cloud about the Y-axis. Perturbation randomly rotates the point cloud by a small angle about the X, Y and Z axes. Jittering adds Gaussian noise to the XYZ coordinates and normals of the point cloud. As shown in Table 4, jittering in both branches is the best data augmentation; combining jittering and perturbation is second best and generally better than perturbation alone, and rotation in both branches is the least effective. These results suggest that jittering allows the encoder to learn the distinctive features of the point cloud more effectively, thereby giving better results.
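The two rotation-based augmentations compared in Table 4 can be sketched as follows; jittering is additive noise and is omitted here. Assumptions: the same rotation is applied to coordinates and normals so that the normals stay consistent with the surface, the rotation angle is drawn uniformly from $[0, 2\pi)$, and the perturbation angles use `sigma = 0.06` and `clip = 0.18`, which are illustrative values rather than values reported in this paper. All names are hypothetical.

```python
import math
import random


def matmul3(a, b):
    """Product of two 3 x 3 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def rotation_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rotation_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rotation_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_rotation(cloud, r):
    """Rotate the xyz coordinates and the normal of each 6-D point with the same matrix."""
    def rot(v):
        return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]
    return [rot(p[:3]) + rot(p[3:6]) for p in cloud]


def random_y_rotation(cloud, rng=random):
    """'Rotation': a random angle about the Y-axis."""
    return apply_rotation(cloud, rotation_y(rng.uniform(0.0, 2.0 * math.pi)))


def small_perturbation(cloud, sigma=0.06, clip=0.18, rng=random):
    """'Perturbation': small clipped Gaussian angles about X, Y and Z (sigma, clip assumed)."""
    ax, ay, az = (max(-clip, min(clip, rng.gauss(0.0, sigma))) for _ in range(3))
    return apply_rotation(cloud, matmul3(rotation_z(az), matmul3(rotation_y(ay), rotation_x(ax))))


rng = random.Random(0)
cloud = [[rng.uniform(-1, 1) for _ in range(3)] + [0.0, 0.0, 1.0] for _ in range(8)]
rotated = random_y_rotation(cloud, rng=rng)
perturbed = small_perturbation(cloud, rng=rng)
```

Both transforms are rigid, so point norms and unit normals are preserved; for example, rotating (1, 0, 0) by 90° about Y maps it to approximately (0, 0, -1). Jittering, by contrast, changes local geometry slightly, which is one plausible reading of why it forces the encoders to learn more robust features.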
Figure 3. Ablation study on dual-branch encoders.
Dual-branch encoders. To investigate the effectiveness of the dual-branch encoders, we compare them with a conventional single-encoder method. As shown in Figure 3, the dual-branch encoders give the best performance for both classification and segmentation, particularly for classification. In addition, the single-encoder method based on the more advanced PN++ is inferior to the PN-based one, probably because contrastive learning depends largely on global information. Based on these results, we make the following findings:
Dual-branch encoders are able to extract more discriminative features. In particular, the two encoders in our design are PN and PN++, where PN focuses on global features and PN++ on local features.
Contrastive learning can better capture the distinctions between the features extracted by the two encoders, which makes it more effective than a single encoder. In our design, PN and PN++ as encoders better highlight the distinctions between the local and global features of a point cloud sample, allowing the contrastive loss to optimise the network more effectively.
Contrastive learning is excellent at describing objects as a whole but weak at describing them at the point scale. Our method achieves outstanding results in classification tasks but is only moderately effective in segmentation tasks, because our contrastive learning compares whole point clouds rather than individual points.
Table 5. Ablation study for limited labeled data (columns: Network, Label(%), V.(%), A.(%), F1-Score).
Limited labeled data. In real-world situations, we frequently lack sufficient labeled data. To simulate this, we divide the original dataset into two parts, A and B, and treat A as unlabeled data and B as labeled data, with the percentage of labeled data set to 10%, 5% and 1%, respectively. For our method, we use A+B to pre-train the model without labels and then use B to train the downstream task; the supervised baselines can only be trained on B. The experiments use the classification task on the IntrA dataset. As shown in Table 5, classification accuracy gradually decreases as the amount of annotated data decreases. However, our model consistently outperforms the supervised models, which suggests that our method is more robust because it makes use of unlabeled data in unsupervised learning. An interesting direction is to combine evolutionary optimization with the proposed method to enhance performance on limited labeled data [17].
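The limited-label protocol can be sketched as a data split: our method pre-trains on A+B without labels and trains the downstream network on B, while the supervised baselines see only B. The random split, the fixed seed and rounding the labeled count down are assumptions, since the text does not describe how B was drawn; the function name is hypothetical.

```python
import random


def split_labeled(sample_ids, labeled_fraction, seed=0):
    """Split sample ids into an unlabeled part A and a labeled part B.

    Assumption: a plain random split; a stratified split would keep the
    healthy/aneurysm ratio fixed in B, which matters at 1% labels.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_labeled = max(1, int(labeled_fraction * len(ids)))
    return ids[n_labeled:], ids[:n_labeled]  # (A, B)


for fraction in (0.10, 0.05, 0.01):
    part_a, part_b = split_labeled(range(2025), fraction)
    print(fraction, len(part_a), len(part_b))
```

With the 2,025 IntrA samples, the 1% setting leaves only about 20 labeled segments, so with roughly 11% aneurysmal samples overall a purely random B may contain very few aneurysms; this is why a stratified split is worth considering.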
5. Conclusion
In this work, we have presented an unsupervised representation learning method for the classification and segmentation of 3D intracranial aneurysms. It first augments a point cloud into two samples and passes the pair through the dual-branch encoders and a subsequent common projection head. Distinctive features are learned by maximising the correspondence within each pair, and the representations learned by the unsupervised trained encoders are used as input for the downstream tasks. Experiments demonstrated that our method is effective in learning unsupervised representations and can achieve performance comparable to or better than state-of-the-art supervised and unsupervised learning methods.
References
[1] Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas Guibas. Representation learning and adversarial generation of 3D point clouds. arXiv preprint arXiv:1707.02392, 2017.
[2] Philip Bachman, R Devon Hjelm, and William Buchwalter. Learning representations by maximizing mutual information across views. arXiv preprint arXiv:1906.00910, 2019.
[3] Ding-Yun Chen, Xiao-Pei Tian, Yu-Te Shen, and Ming Ouhyoung. On visual similarity based 3D model retrieval. In Computer Graphics Forum, volume 22, pages 223–232. Wiley Online Library, 2003.
[4] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597–1607. PMLR, 2020.
[5] Rohit Girdhar, David F Fouhey, Mikel Rodriguez, and Abhinav Gupta. Learning a predictable and generative vector representation for objects. In European Conference on Computer Vision, pages 484–499. Springer, 2016.
[6] Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 2, pages 1735–1742. IEEE, 2006.
[7] Kaveh Hassani and Mike Haley. Unsupervised multi-task feature learning on point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8160–8171, 2019.
[8] Jincen Jiang, Xuequan Lu, Wanli Ouyang, and Meili Wang. Unsupervised representation learning for 3D point cloud data. arXiv preprint arXiv:2110.06632, 2021.
[9] Bio Joo, Sung Soo Ahn, Pyeong Ho Yoon, Sohi Bae, Beomseok Sohn, Yong Eun Lee, Jun Ho Bae, Moo Sung Park, Hyun Seok Choi, and Seung-Koo Lee. A deep learning algorithm may automate intracranial aneurysm detection on MR angiography with high diagnostic performance. European Radiology, 30:5785–5793, 2020.
[10] Konstantinos Kamnitsas, Enzo Ferrante, Sarah Parisot, Christian Ledig, Aditya V Nori, Antonio Criminisi, Daniel Rueckert, and Ben Glocker. DeepMedic for brain tumor segmentation. In International Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pages 138–149. Springer, 2016.
[11] Michael Kazhdan, Thomas Funkhouser, and Szymon Rusinkiewicz. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Symposium on Geometry Processing, volume 6, pages 156–164, 2003.
[12] Teuvo Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464–1480, 1990.
[13] Jiaxin Li, Ben M Chen, and Gim Hee Lee. SO-Net: Self-organizing network for point cloud analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9397–9406, 2018.
[14] Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. PointCNN: Convolution on X-transformed points. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 828–838, 2018.
[15] Xuequan Lu, Honghua Chen, Sai-Kit Yeung, Zhigang Deng, and Wenzhi Chen. Unsupervised articulated skeleton extraction from point set sequences captured by a single depth camera. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[16] Xuequan Lu, Zhigang Deng, Jun Luo, Wenzhi Chen, Sai-Kit Yeung, and Ying He. 3D articulated skeleton extraction using a single consumer-grade depth camera. Computer Vision and Image Understanding, 188:102792, 2019.
[17] T. Nakane, N. Bold, H. Sun, X. Lu, T. Akashi, and C. Zhang. Application of evolutionary and swarm optimization in computer vision: a literature survey. IPSJ Transactions on Computer Vision and Applications, 12(1):1–34, 2020.
[18] Takahiro Nakao, Shouhei Hanaoka, Yukihiro Nomura, Issei Sato, Mitsutaka Nemoto, Soichiro Miki, Eriko Maeda, Takeharu Yoshikawa, Naoto Hayashi, and Osamu Abe. Deep neural network-based computer-assisted detection of cerebral aneurysms in MR angiography. Journal of Magnetic Resonance Imaging, 47(4):948–953, 2018.
[19] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
[20] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.
[21] Charles R Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413, 2017.
[22] Aditya Sanghi. Info3D: Representation learning on 3D objects using mutual information maximization and contrastive learning. In European Conference on Computer Vision, pages 626–642. Springer, 2020.
[23] Abhishek Sharma, Oliver Grau, and Mario Fritz. VConv-DAE: Deep volumetric shape learning without object labels. In European Conference on Computer Vision, pages 236–250. Springer, 2016.
[24] Z Shi, B Hu, UJ Schoepf, RH Savage, DM Dargis, CW Pan, XL Li, QQ Ni, GM Lu, and LJ Zhang. Artificial intelligence in the management of intracranial aneurysms: current status and future perspectives. American Journal of Neuroradiology, 41(3):373–379, 2020.
[25] Zhao Shi, Chongchang Miao, U Joseph Schoepf, Rock H Savage, Danielle M Dargis, Chengwei Pan, Xue Chai, Xiu Li Li, Shuang Xia, Xin Zhang, et al. A clinically applicable deep-learning model for detecting intracranial aneurysm in computed tomography angiography images. Nature Communications, 11(1):1–11, 2020.
[26] T Sichtermann, A Faron, R Sijben, N Teichert, J Freiherr, and M Wiesmann. Deep learning–based detection of intracranial aneurysms in 3D TOF-MRA. American Journal of Neuroradiology, 40(1):25–32, 2019.
[27] Kihyuk Sohn. Improved deep metric learning with multi-class n-pair loss objective. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 1857–1865, 2016.
[28] Joseph N Stember, Peter Chang, Danielle M Stember, Michael Liu, Jack Grinband, Christopher G Filippi, Philip Meyers, and Sachin Jambawalikar. Convolutional neural networks for the detection and measurement of cerebral aneurysms on magnetic resonance angiography. Journal of Digital Imaging, 32(5):808–815, 2019.
[29] Daiju Ueda, Akira Yamamoto, Masataka Nishimori, Taro Shimono, Satoshi Doishita, Akitoshi Shimazaki, Yutaka Katayama, Shinya Fukumoto, Antoine Choppin, Yuki Shimahara, et al. Deep learning for MR angiography: automated detection of cerebral aneurysms. Radiology, 290(1):187–194, 2019.
[30] Petar Veličković, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. Deep Graph Infomax. arXiv preprint arXiv:1809.10341, 2018.
[31] Weijia Wang, Xuequan Lu, Dasith de Silva Edirimuni, Xiao Liu, and Antonio Robles-Kelly. Deep point cloud normal estimation via triplet learning. arXiv preprint arXiv:2110.10494, 2021.
[32] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):1–12, 2019.
[33] Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T Freeman, and Joshua B Tenenbaum. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. arXiv preprint arXiv:1610.07584, 2016.
[34] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920, 2015.
[35] Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3733–3742, 2018.
[36] Saining Xie, Jiatao Gu, Demi Guo, Charles R Qi, Leonidas Guibas, and Or Litany. PointContrast: Unsupervised pre-training for 3D point cloud understanding. In European Conference on Computer Vision, pages 574–591. Springer, 2020.
[37] Yifan Xu, Tianqi Fan, Mingye Xu, Long Zeng, and Yu Qiao. SpiderCNN: Deep learning on point sets with parameterized convolutional filters. In Proceedings of the European Conference on Computer Vision (ECCV), pages 87–102, 2018.
[38] Xi Yang, Ding Xia, Taichi Kin, and Takeo Igarashi. IntrA: 3D intracranial aneurysm dataset for deep learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2656–2666, 2020.
[39] Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian. FoldingNet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 206–215, 2018.
[40] Dongbo Zhang, Xuequan Lu, Hong Qin, and Ying He. Pointfilter: Point cloud filtering via encoder-decoder modeling. IEEE Transactions on Visualization and Computer Graphics, 27(3):2015–2027, 2020.
[41] Yongheng Zhao, Tolga Birdal, Haowen Deng, and Federico Tombari. 3D point capsule networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1009–1018, 2019.