Citation: Xiong, F.; Kong, Y.; Kuang, X.; Hu, M.; Zhang, Z.; Shen, C.; Han, X. A Multi-Scale Covariance Matrix Descriptor and an Accurate Transformation Estimation for Robust Point Cloud Registration. Appl. Sci. 2024, 14, 9375. https://doi.org/10.3390/app14209375
Academic Editor: Antonio Fernández-Caballero
Received: 9 September 2024; Revised: 3 October 2024; Accepted: 13 October 2024; Published: 14 October 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
A Multi-Scale Covariance Matrix Descriptor and an Accurate
Transformation Estimation for Robust Point Cloud Registration
Fengguang Xiong 1,2,3,* , Yu Kong 1, Xinhe Kuang 4, Mingyue Hu 1, Zhiqiang Zhang 1, Chaofan Shen 1
and Xie Han 1
1 School of Computer Science and Technology, North University of China, Taiyuan 030051, China
2 Shanxi Provincial Key Laboratory of Machine Vision and Virtual Reality, North University of China, Taiyuan 030051, China
3 Shanxi Visual Information Processing and Intelligent Robot Engineering Research Center, North University of China, Taiyuan 030051, China
4 Sydney Smart Technology College, Northeastern University, Qinhuangdao 066004, China
* Correspondence: hopenxfg@nuc.edu.cn
Abstract: This paper presents a robust point cloud registration method based on a multi-scale covariance matrix descriptor and an accurate transformation estimation. Compared with state-of-the-art feature descriptors, such as PFH, 3DSC, and the spin image, our proposed multi-scale covariance matrix descriptor is superior for dealing with registration problems in noisier environments since the mean operation in generating the covariance matrix filters out most of the noise-damaged samples or outliers and thereby makes the descriptor robust to noise. Compared with transformation estimation methods such as feature matching, clustering, ICP, and RANSAC, our transformation estimation is able to find a better optimal transformation between a pair of point clouds since it is a multi-level point cloud transformation estimator that includes feature matching, a coarse transformation estimation based on clustering, and a fine transformation estimation based on ICP. Experimental findings reveal that our proposed feature descriptor and transformation estimation outperform state-of-the-art feature descriptors and transformation estimation methods, and registration based on our point cloud registration framework is highly effective on the Stanford 3D Scanning Repository, the SpaceTime dataset, and the Kinect dataset, where the Stanford 3D Scanning Repository is known for its comprehensive collection of high-quality 3D scans, and the SpaceTime dataset and the Kinect dataset were captured by a SpaceTime Stereo scanner and a low-cost Microsoft Kinect scanner, respectively.
Keywords: point cloud registration; feature descriptor; multi-scale covariance matrix; boundary
point detection; transformation estimation
1. Introduction
More and more fields, including 3D reconstruction [1,2], simultaneous localization and mapping (SLAM) [3], autonomous driving [4], virtual reality [5], and others, have used point clouds obtained by laser scanning or stereo matching of images. However, only a part of a 3D object's point cloud may be obtained in a single scan due to the scanning device's limited field of view. If a single acquired point cloud is insufficient, several point clouds of the 3D object must be combined to measure the entire 3D object, which requires all partial point clouds to be aligned into a global coordinate frame [6]. The registration between point clouds is a necessary operation before further use of point clouds, such as segmentation, recognition, classification, retrieval, etc.
Minimizing the distance between a source point cloud and a target point cloud is the
aim of point cloud registration, which then converts the pair of point clouds into a global
coordinate frame. Nevertheless, without an accurate initial orientation or position, the fine
transformation matrix estimation may only converge to a local minimum rather than a
global optimum during the distance-minimization process. Therefore, a coarse transforma-
tion estimation should be carried out initially to obtain the correct initial transformation
parameters, which include a rotation matrix and a translation vector with six degrees
of freedom. Finding geometric features and the correspondence between these features
are two crucial jobs that must be completed for point clouds in any initial orientation or
position during coarse registration. However, the aforementioned tasks may be hindered by
some significant obstacles, such as noise and outliers, various types of occlusions, varying
densities, low overlap, and so on.
In this paper, we propose a local descriptor based on a multi-scale covariance matrix
for point cloud registration which is robust to noise and invariant to rigid transformation,
and we also put forward an accurate transformation estimation method that estimates point cloud transformations at multiple levels. The following is a summary of the contributions.
(1) A multi-scale covariance matrix feature descriptor is proposed, which can describe
the geometrical relationship, eigenvalue variation gradient, and surface variation gradient
between a keypoint and its neighboring points. The feature descriptor filters out most of the noise-damaged samples through the mean operation used in producing the covariance matrix, which makes it very robust to noise.
(2) A method for detecting boundary points is put forward, which can efficiently extract points on the boundary. By utilizing this approach, keypoints near the boundary can be eliminated, improving the descriptiveness of the remaining keypoints under our proposed feature descriptor.
(3) An accurate transformation estimation between a pair of point clouds is proposed. In this method, we obtain corresponding keypoint pairs by feature matching,
estimate an initial transformation using a coarse transformation estimation method based
on clustering via these correspondences, and finally obtain an accurate transformation by a
fine transformation estimation, such as ICP’s variants.
2. Related Work
2.1. Traditional Methods
To expand the viability and effectiveness of traditional point cloud registration, many
studies have proposed some methods to tackle these challenges, including coarse registra-
tion methods and fine registration methods.
Coarse registration is a rapid estimation of an initial matrix without knowing any initial positions between the source and target point clouds. The coarse registration approach is typically composed of four stages: keypoint detection, feature descriptor generation, feature matching, and transformation estimation. Among the preceding processes,
generating feature descriptors is unquestionably the most significant, and numerous studies on feature descriptors, particularly local feature descriptors, have been presented for point cloud registration. Johnson and Hebert [7] presented a spin image descriptor. Each keypoint described by a spin image is insensitive to rigid transformation, and keypoint correspondences between a source point cloud and a target point cloud can be constructed using the spin image's correspondences. Frome et al. [8] suggested a 3D shape context (3DSC) descriptor. As the local reference axis, they use the normal at a keypoint. After that, the spherical neighborhood is partitioned evenly along the azimuth and elevation dimensions but logarithmically along the radial dimension. The weighted number of points that fall within each bin is used to construct the 3DSC descriptor. Rusu et al. [9] offered the Point Feature Histogram (PFH) descriptor. A Darboux frame is defined initially for each pair of points in the neighborhood of a keypoint. They then compute four measures based on the angles between the normals of the points and the distance vector between them and add these measures for all pairs of points to form a 16-bin histogram. Point clouds are finally registered and aligned into a consistent global point cloud using corresponding feature point pairs. Salti et al. [10] put forward the SHOT descriptor, which encodes the histograms of surface normals at various spatial positions in the reference point neighborhood. Yulan Guo et al. [11] presented the RoPS descriptor, which employs the rotation and projection mechanism to encode the local surfaces. Radu Bogdan Rusu et al. suggested Fast Point
Feature Histograms (FPFHs) [12], which modify the PFH descriptor's mathematical expressions and perform a rigorous analysis of its robustness and complexity for the problem of 3D registration of overlapping point cloud views. Yu Zou et al. [13] put forward BRoPH, which is generated directly from the point cloud by turning the description of the 3D point cloud into a series of binarized 2D image patches. Yuhe Zhang et al. [14] proposed KDD, which encodes the information of the whole 3D space around the feature point via kernel density estimation and provides a strategy for selecting different matching metrics for datasets with diverse resolution qualities.
Fine registration is a more precise and refined registration based on coarse registration, and the essential idea is to directly acquire the correspondence between a source point cloud and a target point cloud. The ICP (Iterative Closest Point) technique [15,16] is the most extensively used fine registration method. It is based on a quaternion-based approach and reduces point-to-point distances in overlapping areas between point clouds. However, ICP often needs adequate starting transformation settings, and its iteration process is lengthy. As a result, various ICP variants [17-25] have been developed to solve these issues.
2.2. Learning-Based Methods
A number of deep learning approaches, including PCRNet [26], REGTR [27], PREDATOR [28], and others, have recently been suggested for registration. Two categories can
be used to categorize learning-based registration: feature learning-based methods and
end-to-end learning-based approaches.
In contrast to the conventional point cloud registration techniques, feature learning-
based approaches learn a robust feature correspondence search using a deep neural network.
The transformation matrix is then finally computed via a one-step estimate (e.g., RANSAC)
without the need for iterations. In order to identify soft correspondences between the over-
lap of two point clouds and extract contextual information for the purpose of learning more
unique feature descriptors, PREDATOR uses an attention mechanism. Self-attention and
cross-attention are used by REGTR to directly forecast a set of final point correspondences
that are consistent. The goal of all these techniques is to estimate robust correspondences
using the learned distinctive feature, and they all use deep learning as a tool for feature
extraction. Gojcic et al. [29] proposed 3DSmoothNet for point cloud registration, a comprehensive approach that matches 3D point clouds with a Siamese deep learning architecture with fully convolutional layers via a voxelized smoothed density value (SDV) representation. Xuyang Bai et al. [30] put forward PointDSC, a deep neural network that explicitly uses spatial consistency to prune outlier correspondences during point cloud registration.
End-to-end neural networks are used in end-to-end learning-based approaches to
overcome the registration challenge. Two point clouds are the network’s input, and the
transformation matrix that aligns the two point clouds is its output. The network not only
can extract features of point clouds but can also estimate transformation. The network of
the feature learning-based approach is distinct from the transformation estimation network
of the end-to-end learning-based approach and concentrates on feature learning. After
extracting global features with PointNet, PCRNet joins these features and feeds them into
the MLP network for regression modification. To learn pose-invariant point-to-distribution
parameter correspondences, DeepGMR [31] makes use of a neural network. The GMM optimization module then uses these correspondences to estimate the transformation matrix. For internal likelihood prediction, DGR [32] proposes a six-dimensional convolutional network design and uses a weighted Procrustes module to estimate the transformation.
However, 3D point clouds with labeled tags are commonly used in deep learning methods, but the creation of the labeled tags is time consuming, making them difficult to use for some practical applications. In addition, point cloud registration technology based on deep learning still has problems, such as inaccurate error solving, a certain randomness and instability of registration results, high algorithm complexity and
long computation time, and limited generalization ability. Specifically, the issues with 3DSmoothNet include low efficiency in processing large-scale point clouds, strong hardware dependency, and low generalization ability. The shortcomings of PointDSC include a severe performance dependence on the initial matching, high computational complexity, etc.
3. Methodology
Our registration method includes three steps: keypoint detection with boundary
removal, feature descriptor generation, and accurate transformation estimation. In the
process of keypoint detection with boundary removal, we can detect keypoints from the
point cloud that are not located around the boundary, allowing us to describe complete
neighborhood information of keypoints and generate a good feature descriptor for each
keypoint. In the process of feature descriptor generation, we use our multi-scale covariance
matrix descriptor to extract local features of keypoints, making it more robust to noise
and invariant to rigid transformation. During the accurate transformation estimation,
we use feature matching, coarse transformation estimation based on clustering, and fine
transformation estimation based on an ICP’s variant to estimate an accurate transformation
between a pair of point clouds. The framework of our point cloud registration is shown in
Figure 1.
Figure 1. The framework of our point cloud registration.
3.1. Keypoint Detection with Boundary Removal
We put forward a method for keypoint detection with boundary removal that first detects keypoints and boundary points separately and then removes keypoints close to
boundary points. The keypoint detection with boundary removal makes sure detected
keypoints are of high quality, which can be better matched with the keypoints on another
point cloud.
3.1.1. Keypoint Detector
The goal of the keypoint detector is to choose a subset of conspicuous points with
good repeatability from the surface of a point cloud. The discovered keypoints should be
very robust to noise and highly invariant to rigid transformation.
Given a point cloud $P = \{p_1, p_2, \ldots, p_N\} \subset \mathbb{R}^3$ with $p_i = [x_i, y_i, z_i]^T \in P$, the mean $\bar{p}$ and covariance matrix $C$ of $P$ can be computed as follows:

$\bar{p} = \frac{1}{N}\sum_{i=1}^{N} p_i$  (1)

$C = \frac{1}{N}\sum_{i=1}^{N} (p_i - \bar{p})(p_i - \bar{p})^T$  (2)

Then, applying eigenvalue decomposition to the covariance matrix $C$ yields the following:

$CV = VE$  (3)

where $V$ is the matrix of eigenvectors, $E$ is the diagonal eigenvalue matrix of $C$, and each column in $V$ corresponds to a diagonal eigenvalue in the diagonal matrix $E$.

Following that, using Formula (4), point cloud $P$ is aligned with the principal axes established by $V$, which is known as Principal Component Analysis (PCA):

$P' = V^T(P - \bar{p})$  (4)

where $P'$ is the normalized point cloud of point cloud $P$.
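For concreteness, the normalization of Formulas (1)-(4) can be sketched as follows. This is a minimal illustration in Python with NumPy; the function and variable names are ours, not the paper's.

```python
import numpy as np

def pca_normalize(P):
    """PCA normalization of a point cloud (Formulas (1)-(4)).

    P is an (N, 3) array of points. Returns the normalized cloud P',
    the mean point, the eigenvalues, and the eigenvector matrix V."""
    p_bar = P.mean(axis=0)                    # Formula (1): mean point
    D = P - p_bar
    C = (D.T @ D) / P.shape[0]                # Formula (2): covariance matrix
    eigvals, V = np.linalg.eigh(C)            # Formula (3): eigen-decomposition
    order = np.argsort(eigvals)[::-1]         # sort principal axes by decreasing variance
    eigvals, V = eigvals[order], V[:, order]
    P_prime = D @ V                           # Formula (4): coordinates in the principal-axis frame
    return P_prime, p_bar, eigvals, V
```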
For each point $p'_i$ in point cloud $P'$, a local neighborhood Nbhd($p'_i$) is extracted from $P'$ using a sphere with a radius $r$ (the so-called support radius) centered at $p'_i$. Let $X$ and $Y$ stand for the x and y components of Nbhd($p'_i$) as follows:

$X = \{x_1, x_2, \ldots, x_l\}$  (5)

$Y = \{y_1, y_2, \ldots, y_l\}$  (6)

where $l$ is the length of Nbhd($p'_i$). Then, a surface variation index $\vartheta$ is defined as follows:

$\vartheta = \frac{\max(X) - \min(X)}{\max(Y) - \min(Y)}$  (7)

where the local neighborhood's geometric variation around the point $p'_i$ is reflected by the surface variation index $\vartheta$, which is the ratio between the local neighborhood's first two principal axes centered at $p'_i$. $\vartheta$ is 1 for a symmetric local point set, such as a plane or a sphere, and greater than 1 for an asymmetric local point set. A point in $P'$ whose surface variation index is higher than $\varepsilon_\vartheta$ is regarded as a keypoint, i.e.,

$\vartheta > \varepsilon_\vartheta$  (8)
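A literal reading of Formulas (5)-(8) leads to the following keypoint-selection sketch (Python with NumPy and SciPy; the helper names are assumptions for illustration, not taken from the paper):

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_variation_index(P_prime, i, tree, r):
    """Surface variation index of Formula (7) for point i of the normalized cloud P_prime."""
    nbhd = P_prime[tree.query_ball_point(P_prime[i], r)]
    X, Y = nbhd[:, 0], nbhd[:, 1]             # Formulas (5) and (6): x and y components
    span_y = Y.max() - Y.min()
    return (X.max() - X.min()) / span_y if span_y > 0 else 1.0

def detect_keypoints(P_prime, r, eps_theta):
    """Keep points whose surface variation index exceeds eps_theta (Formula (8))."""
    tree = cKDTree(P_prime)
    return [i for i in range(len(P_prime))
            if surface_variation_index(P_prime, i, tree, r) > eps_theta]
```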
3.1.2. Boundary Point Detector
We project a point's neighboring points onto the point's tangent plane, as seen in Figure 2, and it can be observed that the majority of a boundary point's surrounding points projected on the tangent plane are located on one side of the boundary point. There will, therefore, be a notch in the line connecting the projection points and the boundary point when these surrounding points are projected onto the boundary point's tangent plane. One can ascertain whether a point is a boundary point or not by measuring the angle of the notch.
The following illustrates how to decide whether a point $p_i$ in a point cloud $P$ is a boundary point.

• Find $p_i$'s surrounding points centered at $p_i$ on a sphere with radius $r$ and designate them as Nbhd($p_i$):

Nbhd($p_i$) $= \{p_j \mid p_j \in P, \|p_j - p_i\| \leq r\}$  (9)
• Project each point $p_j$ in Nbhd($p_i$) onto the tangent plane centered at $p_i$ with the following formula:

$q_j = p_j - (n \cdot \overrightarrow{p_i p_j})\, n$  (10)

where $n$ is the normal of $p_i$, and designate these projection points as Nbhd($q_j$).

• Find the projection point nearest to $p_i$, designate it as $q_u$, and then construct a local reference frame ($p_i$, $u$, $v$, $w$) in which the $w$ axis is $p_i$'s normal $n$, the $u$ axis is $\overrightarrow{p_i q_u}/\|\overrightarrow{p_i q_u}\|$, and the $v$ axis is $u \times w$.

• Compute the included angle between each vector $\overrightarrow{p_i q_j}$ and the $u$ axis along a clockwise direction. Then, designate the set of included angles as $S = (\alpha_1, \alpha_2, \ldots, \alpha_k)$, where $k$ represents the number of included angles.

• Assign $S' = (\alpha'_1, \alpha'_2, \ldots, \alpha'_k)$ after sorting $S$ in ascending order and then compute the difference between adjacent included angles as follows:

$L_i = \alpha'_{i+1} - \alpha'_i$  (11)

• Find the maximum difference $L_{max}$ and determine that $p_i$ is a boundary point if $L_{max} > L_{th}$. $L_{th}$ is a threshold whose value can be adjusted to decide whether a point is a boundary point or not. A minimal code sketch of this procedure is given after this list.
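Under the assumptions stated below (precomputed unit normals, and a wrap-around gap added to Formula (11), which the text does not state explicitly), the boundary test can be sketched as follows:

```python
import numpy as np
from scipy.spatial import cKDTree

def is_boundary_point(P, normals, i, tree, r, L_th):
    """Angle-gap boundary test of Formulas (9)-(11) for point i.

    P: (N, 3) points; normals: (N, 3) unit normals; r: neighborhood radius;
    L_th: threshold on the largest gap between consecutive projection angles."""
    p, n = P[i], normals[i]
    nbr = [j for j in tree.query_ball_point(p, r) if j != i]       # Formula (9)
    if len(nbr) < 3:
        return True                                                # too sparse: treat as boundary
    d = P[nbr] - p
    q = P[nbr] - np.outer(d @ n, n)                                # Formula (10): project onto tangent plane
    v = q - p
    u = v[np.argmin(np.linalg.norm(v, axis=1))]                    # u axis towards the nearest projection
    u = u / np.linalg.norm(u)
    v_axis = np.cross(u, n)                                        # v axis completes the local frame
    angles = np.sort(np.arctan2(v @ v_axis, v @ u) % (2 * np.pi))  # included angles, sorted ascending
    gaps = np.diff(angles)                                         # Formula (11): adjacent differences
    gaps = np.append(gaps, 2 * np.pi - angles[-1] + angles[0])     # wrap-around gap (our addition)
    return gaps.max() > L_th
```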
Figure 2. Distribution of a boundary and a non-boundary point with their neighboring points: (a) p is a non-boundary point; (b) q is a boundary point.
3.1.3. Boundary Keypoint Removal
The process of boundary keypoint removal is depicted as follows:

• Traverse all points in the point cloud $P$ and collect its boundary points into a set named Boundaries, following the above boundary point detector.

• Traverse all points in the point cloud $P$ and collect its keypoints into a set named Kpts, following the above keypoint detector.

• For every keypoint $kp$ in Kpts, find its nearest distance $d_{min}$ to a boundary point in Boundaries and remove $kp$ from Kpts if $d_{min} < 5\,\mathrm{mr}$. A minimal sketch of this step is given below.
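A minimal sketch of this removal step (Python with SciPy); here mr is taken to be the mesh resolution, i.e., the average point spacing, which is our assumption about the unit used in the paper:

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_boundary_keypoints(P, keypoint_idx, boundary_idx, mr):
    """Drop keypoints whose nearest boundary point lies within 5*mr."""
    if len(boundary_idx) == 0:
        return list(keypoint_idx)
    boundary_tree = cKDTree(P[boundary_idx])
    d_min = np.atleast_1d(boundary_tree.query(P[keypoint_idx])[0])   # nearest-boundary distance per keypoint
    return [k for k, d in zip(keypoint_idx, d_min) if d >= 5.0 * mr]
```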
3.2. Feature Descriptor Generation
We put forward a multi-scale covariance matrix descriptor that gathers geometric
relation, surface variation gradient, and eigenvalue variation gradient from a keypoint
and its neighborhood within several support radii and encodes such information into a
covariance matrix.
3.2.1. Covariance Matrix Descriptor
In probability theory and statistics, covariance is a measure of the joint variability of
two random variables, and it is used as a descriptor in image processing. A set of random
variables relates to a set of observable properties that may be detected from a keypoint and
its neighborhood in the context given by the covariance matrix descriptor, such as normal
values, curvature values, 3D coordinates, and so on. As a result, the first step of generating
a covariance matrix descriptor is to construct a variable selection function $\varnothing(p, r)$ for a given keypoint $p$ as shown below:

$\varnothing(p, r) = \{\varphi_{p_i}, \forall p_i \; \mathrm{s.t.} \; |p - p_i| \leq r\}$  (12)

where $p_i$ is a neighboring point of $p$, $\varphi_{p_i}$ is a random variable feature vector obtained between $p$ and each of its neighboring points, and $r$ is the support radius of the neighborhood of $p$.

To make the selected random variables in the feature vector robust to noise and invariant to rigid transformation, these random variables should be computed relative to the keypoint $p$. As a result, we define $\varphi_{p_i}$ as follows:

$\varphi_{p_i} = (\cos(\alpha(p_i p)), \cos(\beta(p_i p)), \cos(\gamma(p_i p)), \vartheta(p_i p), \lambda_1(p_i p), \lambda_2(p_i p), \lambda_3(p_i p))^T$  (13)
where $\cos(\alpha(p_i p))$ represents the cosine between the segment from $p$ to $p_i$ and the normal vector $n$ of $p$; $\cos(\beta(p_i p))$ represents the cosine between the segment from $p$ to $p_i$ and the normal vector $n_i$ of $p_i$; $\cos(\gamma(p_i p))$ represents the cosine between the normal vector $n_i$ of $p_i$ and the normal vector $n$ of $p$; $\vartheta(p_i p)$ represents the surface variation gradient between $p$ and $p_i$; and $\lambda_1(p_i p)$, $\lambda_2(p_i p)$, and $\lambda_3(p_i p)$ represent the gradients of eigenvalue variation of the two covariance matrices constructed from the two neighborhoods centered at $p$ and $p_i$, respectively. It should be noted that we utilize the cosine rather than the angle of $\alpha(p_i p)$, $\beta(p_i p)$, and $\gamma(p_i p)$ to save computation time. The calculation formulae are provided in (14) to (20), and the three geometric relations $\alpha(p_i p)$, $\beta(p_i p)$, and $\gamma(p_i p)$ are shown in Figure 3.

$\cos(\alpha(p_i p)) = \frac{(p_i - p) \cdot n_i}{\|p_i - p\|}$  (14)

$\cos(\beta(p_i p)) = \frac{(p_i - p) \cdot n}{\|p_i - p\|}$  (15)

$\cos(\gamma(p_i p)) = n_i \cdot n$  (16)

$\vartheta(p_i p) = \vartheta_{p_i} - \vartheta_p$  (17)

$\lambda_1(p_i p) = \lambda_1(p_i) - \lambda_1(p)$  (18)

$\lambda_2(p_i p) = \lambda_2(p_i) - \lambda_2(p)$  (19)

$\lambda_3(p_i p) = \lambda_3(p_i) - \lambda_3(p)$  (20)

where $(\cdot)$ represents the dot product and $\|\cdot\|$ represents the Euclidean distance between two points.
Figure 3. Geometric relations α, β, and γ between a keypoint p and one of its neighboring points.
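As an illustration of Formulas (13)-(20), the per-neighbor feature vector can be assembled as below (a sketch; the surface variation indices and local eigenvalues are assumed to be precomputed for each point, e.g., during keypoint detection):

```python
import numpy as np

def phi_vector(p, n_p, theta_p, lam_p, p_i, n_i, theta_i, lam_i):
    """Seven-dimensional feature vector of Formula (13) for a neighbor p_i of keypoint p.

    n_p, n_i: unit normals; theta_p, theta_i: surface variation indices;
    lam_p, lam_i: length-3 arrays of local covariance eigenvalues."""
    d = p_i - p
    dist = np.linalg.norm(d)
    cos_alpha = (d @ n_i) / dist                        # Formula (14)
    cos_beta = (d @ n_p) / dist                         # Formula (15)
    cos_gamma = n_i @ n_p                               # Formula (16)
    theta_grad = theta_i - theta_p                      # Formula (17)
    lam_grad = np.asarray(lam_i) - np.asarray(lam_p)    # Formulas (18)-(20)
    return np.array([cos_alpha, cos_beta, cos_gamma, theta_grad, *lam_grad])
```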
Following the definition of a variable selection function $\varnothing(p, r)$ for the keypoint $p$, the covariance matrix descriptor of a keypoint $p$ is defined as follows:

$C_r(\varnothing(p, r)) = \frac{1}{N}\sum_{i=1}^{N}(\varphi_{p_i} - u_{\varphi})(\varphi_{p_i} - u_{\varphi})^T$  (21)

where $N$ represents the number of neighboring points of $p$ within its support radius $r$ and $u_{\varphi}$ is the mean of these feature vectors. Our proposed feature descriptor offers a more flexible representation since it considers 3D points as samples of a distribution and, by construction, subtracts the mean of this sample distribution; therefore, in the case of noise, it will be naturally attenuated.
3.2.2. Multi-Scale Covariance Matrix Descriptor
We adjust the value of the support radius r to generate covariance descriptors at
various scales for a keypoint. A keypoint’s multi-scale covariance descriptor is defined
as follows:
$C_M(p) = \{C_r(\varnothing(p, r)), \forall r \in \{r_1, \ldots, r_s\}\}$  (22)
Compared to a covariance matrix descriptor with a fixed radius, a multi-scale covari-
ance descriptor for a keypoint can improve its description.
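Putting Formulas (21) and (22) together, a keypoint's multi-scale descriptor can be computed roughly as follows (a sketch; phi_fn is an assumed helper that returns the vector of Formula (13) for a keypoint/neighbor pair):

```python
import numpy as np
from scipy.spatial import cKDTree

def covariance_descriptor(phi_vectors):
    """Covariance matrix descriptor of Formula (21) from an (N, 7) array of phi vectors."""
    u = phi_vectors.mean(axis=0)                 # mean vector u_phi
    D = phi_vectors - u
    return (D.T @ D) / phi_vectors.shape[0]      # 7 x 7 covariance matrix

def multiscale_descriptor(k, P, tree, radii, phi_fn):
    """Multi-scale descriptor of Formula (22): one covariance matrix per support radius."""
    descriptors = []
    for r in radii:
        nbr = [j for j in tree.query_ball_point(P[k], r) if j != k]
        phis = np.stack([phi_fn(k, j) for j in nbr])
        descriptors.append(covariance_descriptor(phis))
    return descriptors
```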
3.3. Accurate Transformation Estimation
An accurate transformation estimation includes feature matching, coarse transforma-
tion estimation, and fine transformation estimation. By feature matching, we can find coarse
correspondences between a pair of point clouds, and we can obtain a coarse transformation
matrix between the pair of point clouds by coarse transformation estimation. After applying the coarse transformation matrix to the source point cloud, which is the transformable point cloud in the pair, we use a fine transformation estimation to estimate a fine matrix between the transformed source point cloud and the target point cloud, which is the fixed point cloud in the original pair. Through these three processes, the
transformation between a pair of point clouds will become increasingly accurate.
3.3.1. Feature Matching
The goal of feature matching is to find feature correspondences between two sets of
multi-scale covariance matrix descriptors and then find keypoint correspondences between
the source point cloud and the target point cloud. To establish feature correspondences, we
must determine the similarity of two multi-scale covariance matrix descriptors followed by
the feature correspondences between two sets of multi-scale covariance matrix descriptors.
•Similarity between two multi-scale covariance matrix descriptors.
The distance between two covariance matrix descriptors can be used to represent their
similarity: the closer the distance, the greater the similarity. Because the covariance
matrix is a symmetric positive definite matrix, the geodesic distance between two
covariance matrices can be used to measure their similarity.
In recent years, various geodesic distance formulas have been proposed. In this paper,
Jensen–Bregman LogDet Divergence [33] is employed, and it is defined as follows:

$d_{JBLD}(X, Y) = \log\left|\frac{X + Y}{2}\right| - \frac{1}{2}\log|XY|$  (23)

where $|\cdot|$ is the matrix determinant and $\log(\cdot)$ is the natural logarithm. This metric has been proven to be more efficient in terms of speed without sacrificing application performance [28], which is why we use it to compare the similarity of two covariance matrix descriptors in this paper.
Additionally, the similarity between two multi-scale covariance matrix descriptors can be defined using Formula (24) as follows:

$d(C^1_M, C^2_M) = \frac{1}{s}\sum_{i=r_1 \ldots r_s} d_{JBLD}(C^1_i, C^2_i)$  (24)

where $C^1_M$ and $C^2_M$ correspond to two covariance matrix descriptors relating to different radius scales ranging from $r_1$ to $r_s$, and $d_{JBLD}(C^1_i, C^2_i)$ represents the similarity of two covariance matrix descriptors with a fixed scale $r_i$. Formula (24) considers similarity under multiple scales and utilizes the mean of the similarities under these scales as the similarity of multi-scale covariance matrix descriptors. A covariance matrix descriptor stated in the rest of this paper refers to a multi-scale covariance matrix descriptor unless otherwise specified.
•Correspondence between two sets of covariance matrix descriptors.
We present a bidirectional nearest neighbor distance ratio (BNNDR) to detect corre-
spondences from two sets of covariance matrix descriptors, which differs from existing
techniques, such as the nearest neighbor (NN) and the nearest neighbor distance ratio
(NNDR). The detailed matching procedure via the BNNDR is as follows.
First, assume that $D_S = \{d^i_S\}$ and $D_T = \{d^j_T\}$ are the sets of covariance matrix descriptors from a source point cloud $P_S$ and a target point cloud $P_T$, respectively.

Next, each covariance matrix descriptor $d^i_T$ in $D_T$ is matched against all the covariance matrix descriptors in $D_S$ to yield the first and second nearest covariance matrix descriptors to $d^i_T$: $d^{j'}_S$ and $d^{j''}_S$, respectively.

$d^{j'}_S = \min_{d^j_S \in D_S} d(d^i_T, d^j_S)$  (25)

$d^{j''}_S = \min_{d^j_S \in D_S \setminus d^{j'}_S} d(d^i_T, d^j_S)$  (26)

where $D_S \setminus d^{j'}_S$ is the set $D_S$ excluding the covariance matrix descriptor $d^{j'}_S$, and then the NNDR $r_{dist}$ is defined as follows:

$r_{dist} = \frac{d(d^i_T, d^{j'}_S)}{d(d^i_T, d^{j''}_S)}$  (27)

If the ratio $r_{dist}$ is less than a threshold $\varepsilon_r$, ($d^i_T$, $d^{j'}_S$) can be thought of as a candidate-matched feature pair, and ($kp^i_T$, $kp^{j'}_S$) can also be thought of as a candidate-matched keypoint pair, with keypoint $kp^i_T$ corresponding to $d^i_T$ and keypoint $kp^{j'}_S$ corresponding to $d^{j'}_S$.

Then, $d^{j'}_S$ is also matched against all the covariance matrix descriptors in $D_T$ to generate its corresponding feature descriptor $d^{i'}_T$ following the preceding procedures, and the matched keypoint pair ($kp^{j'}_S$, $kp^{i'}_T$) can be acquired from the corresponding covariance matrix descriptor pair.

Finally, ($d^i_T$, $d^{j'}_S$) is a matched feature pair if $kp^{i'}_T$ is the same keypoint as $kp^i_T$; otherwise, ($d^i_T$, $d^{j'}_S$) is an unmatched feature pair. A minimal sketch of this matching procedure is given below.
3.3.2. Coarse Transformation Estimation
So far, a correspondence set $C$ is constructed between $P_S$ and $P_T$. The goal of coarse transformation estimation is to find an optimal rigid transformation $T_o$ from $C$. The
RANSAC algorithm and its variants [29,30] are popular approaches to solving the problem. However, they face challenges due to a lack of robustness and a high time consumption. To estimate $T_o$, we put forward a coarse transformation estimation method based on clustering. The steps are outlined below in detail.
• Generate a $T_i$ from a correspondence $C_i$.

In theory, a single correspondence $C_i$ is inadequate to estimate a rigid transformation between a pair of point clouds. Yet, if a correspondence is oriented with a local reference frame (LRF), it is adequate for estimating a rigid transformation [34]. We have constructed the LRF for each point during the keypoint detection with boundary removal, and we can utilize the LRFs to estimate transformations. The following are the formulas for estimating a rigid transformation:

$R_i = {LRF^{j'}_S}^T LRF^i_T$  (28)

$t_i = kp^i_T - kp^{j'}_S R_i$  (29)

where ($kp^i_T$, $kp^{j'}_S$) is a matched keypoint pair, $LRF^{j'}_S$ and $LRF^i_T$ are the local reference frames of $kp^{j'}_S$ and $kp^i_T$, respectively, and $R_i$ and $t_i$ signify a rotation matrix and a translation vector. The advantage of this correspondence-based estimation over traditional RANSAC is that the computational complexity is lowered from $O(n^3)$ to $O(n)$, where $n$ is the number of correspondences.
• Estimate an optimal rigid transformation based on clustering.

Each rotation matrix $R_i$ is converted to Euler angles and concatenated with the corresponding translation vector $t_i$ to generate a six-dimensional vector before clustering. Each 6D vector denotes six degrees of freedom, consisting of three rotations and three translations along the x, y, and z axes. The k-means clustering algorithm is then used to cluster these 6D vectors and construct $\kappa$ clusters, with each cluster center being a transformation $T$ that includes a rotation and a translation. The next step is to extract a translation vector and a rotation matrix from each cluster center. Then, we transform the source point cloud $P_S$ using the transformation described above, naming the result $P'_S$. For every point in $P'_S$, we find its nearest point in $P_T$, and we consider a point $p^{i'}_S$ in $P'_S$ to be an inlier if the distance between $p^{i'}_S$ and its nearest point $p^j_T$ in $P_T$ is less than a threshold, i.e.,

$\|p^{i'}_S - p^j_T\| < \epsilon_d$  (30)

After that, the ratio of inliers is defined as follows:

$rt_{inliers} = n_{inliers}/n_s$  (31)

in which $n_{inliers}$ represents the number of inliers and $n_s$ represents the number of points in $P_S$. Finally, we choose the optimal $T$ as $T_o$, whose $rt_{inliers}$ is the highest. A minimal sketch of this coarse estimation is given after this list.
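A rough sketch of this clustering step, written in Python with SciPy and scikit-learn. The values of kappa and eps_d are placeholders, and the row-vector convention of Formula (29) is assumed; neither is taken from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation
from sklearn.cluster import KMeans

def coarse_transformation(R_list, t_list, P_S, P_T, kappa=5, eps_d=0.01):
    """Cluster the per-correspondence transformations of Formulas (28)-(29) and keep
    the cluster center with the highest inlier ratio (Formulas (30)-(31))."""
    # 6-D vectors: three Euler angles concatenated with the translation.
    eulers = Rotation.from_matrix(np.asarray(R_list)).as_euler("xyz")
    vecs = np.hstack([eulers, np.asarray(t_list)])
    centers = KMeans(n_clusters=kappa, n_init=10).fit(vecs).cluster_centers_

    tree_T = cKDTree(P_T)
    best, best_ratio = None, -1.0
    for c in centers:
        R = Rotation.from_euler("xyz", c[:3]).as_matrix()
        t = c[3:]
        P_S_prime = P_S @ R + t               # row-vector convention of Formula (29)
        d, _ = tree_T.query(P_S_prime)        # nearest-point distance in P_T per source point
        ratio = np.mean(d < eps_d)            # Formula (31): inlier ratio
        if ratio > best_ratio:
            best_ratio, best = ratio, (R, t)
    return best, best_ratio
```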
3.3.3. Fine Transformation Estimation
Following the completion of the preceding two phases, we acquire an initial transformation $T_o$ between the source and target point clouds and then transform the source point cloud using $T_o$ before fine registration. We apply a recent ICP variant [21] to complete the fine transformation estimation, which accurately registers the source point cloud to the target point cloud.
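For completeness, a generic point-to-point ICP iteration is sketched below. This is not the specific variant of [21]; it only indicates where the fine estimation fits in the pipeline.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_point_to_point(P_S, P_T, iters=50, tol=1e-6):
    """Refine the alignment of a coarsely transformed source cloud P_S to P_T."""
    tree = cKDTree(P_T)
    R_total, t_total = np.eye(3), np.zeros(3)
    src, prev_err = P_S.copy(), np.inf
    for _ in range(iters):
        d, idx = tree.query(src)                     # closest-point correspondences
        Q = P_T[idx]
        mu_s, mu_q = src.mean(axis=0), Q.mean(axis=0)
        H = (src - mu_s).T @ (Q - mu_q)              # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T                               # optimal rotation (SVD / Kabsch)
        if np.linalg.det(R) < 0:                     # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_q - R @ mu_s
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = d.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total, src
```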
4. Experiments
The experiments involve the comparison of performance and registration effectiveness
between our proposed feature descriptor based on the multi-scale covariance matrix and state-of-the-art feature descriptors, with the goal of testing the performance of our proposed
feature descriptor and registration approach.
4.1. Experiments of Keypoint Detection with Boundary Removal
The task of this section is to validate whether our keypoint detection with boundary removal can detect boundary points in a point cloud and can obtain keypoints far away from the boundary.
4.1.1. Dataset
To conduct these experiments, we initially employed partial point clouds of the Armadillo, Bunny, Happy Buddha, and other models from the Stanford 3D Scanning Repository [35] for our research. For Armadillo, we included 11 partial point clouds in our collection, numbered from
For Armadillo, we included 11 partial point clouds in our collection, numbered from
ArmadilloBack_0 to ArmadilloBack_330 with 60-degree intervals between them. For Bunny,
we provided six partial point clouds in our dataset, labeled Bun000, Bun045, Bun090,
Bun180, Bun270, and Bun315. For Happy Buddha, we chose 15 partial point clouds from
our dataset and called them HappyStandRight_0 to HappyStandRight_336 with 24-degree
intervals between them. Samples of point clouds from our dataset are shown in Figure 4.
Figure 4. Samples of point clouds from our dataset: (a) ArmadilloBack_0; (b) ArmadilloBack_60; (c) Bun000; (d) Bun045.
4.1.2. Results of Keypoint Detection with Boundary Removal
To determine whether a point is a boundary point, we choose neighboring points within a radius of 4 mr (the support radius) and assign different $L_{th}$ values. Figure 5 depicts the boundary points of the point clouds indicated in Figure 4 under various $L_{th}$, with the red points representing boundary points and the green points representing the original point clouds in Figure 4. Figure 5 clearly shows that the number of boundary points is at its minimum, and the boundary points are discrete, when $L_{th}$ is set to $\pi$. There are more boundary points when $L_{th}$ is between $\pi/2$ and $\pi/4$. When $L_{th}$ is set to $\pi/3$ or $\pi/4$, the boundary points overlap and create many clusters. In subsequent studies, we set $L_{th}$ to $\pi/3$, recognizing that too many boundary points reduce the time efficiency of finding keypoints and that too few boundary points reduce the effectiveness of keypoint detection.
After acquiring boundary points for a point cloud, we may use our proposed keypoint detector to extract original keypoints and delete boundary keypoints. In Figure 6, columns (a) and (c) illustrate keypoints detected by our keypoint detector for the various point clouds indicated in Figure 4, with red points representing keypoints and green points representing original points. After retrieving the original keypoints, we eliminate any keypoints that are less than 5 mr from the nearest boundary point. Columns (b) and (d) in Figure 6 show the remaining keypoints after deleting the boundary keypoints from the original keypoints. Comparing the keypoints in columns (a), (b), (c), and (d) of Figure 6, it is obvious that our approach for keypoint detection with boundary removal can yield keypoints on non-boundaries, which is critical for generating feature descriptors and matching point pairs between a pair of point clouds.
Figure 5. Boundary points under various differences between adjacent included angles: (a) $L_{th} = \pi$; (b) $L_{th} = \pi/2$; (c) $L_{th} = \pi/3$; (d) $L_{th} = \pi/4$.
Figure 6. Keypoints on different point clouds. (a) Keypoints illustrator 1 with boundary point
retained. (b) Keypoints illustrator 1 with boundary point removed. (c) Keypoints illustrator 2 with
boundary point retained. (d) Keypoints illustrator 2 with boundary point removed.
4.2. Experiments on the Performance of Descriptors
4.2.1. Dataset
We chose four source point clouds from the Stanford 3D Scanning Repository to evaluate the performance of our proposed feature descriptor in these experiments. Our target point clouds are created by superimposing Gaussian noise on our
source point clouds (at three distinct levels of Gaussian noise, with standard deviations of
0.1 mr, 0.3 mr, and 0.5 mr).
4.2.2. Evaluation Metrics of Performance of Descriptors
Our proposed multi-scale covariance matrix descriptor includes only one parameter:
random variables in the vector
φ
. As a result, the descriptor’s performance against various
variables in the vector
φ
must be evaluated. In the meantime, a performance comparison of
our proposed feature descriptor with state-of-the-art feature descriptors is required.
Our experimental findings are shown using the recall versus 1-precision curve (RP
curve). The recall versus 1-precision curve is a frequent statistic in the literature for
evaluating a descriptor [
36
]. It is created in the following manner. Given a target point
cloud, a source point cloud, and a ground truth matrix, keypoints from the target and source
point clouds are detected and described as target feature descriptors and source feature
descriptors, respectively, using our proposed multi-scale covariance matrix descriptor
algorithm. The total number of positives is used to calculate the number of target feature
descriptors. To determine the closest feature descriptor, a target feature descriptor is
compared to all source feature descriptors. If the distance between the target feature
descriptor and the nearest source feature descriptor is less than a certain threshold, the pair
is considered matched. Furthermore, a matched feature pair is regarded as a valid positive
if the distance between their physical places is short enough; otherwise, it is considered a
false positive. The RP can be calculated using these numbers.
$\mathrm{recall} = \frac{\text{number of correct positives}}{\text{total number of positives}}$  (32)

$1 - \mathrm{precision} = \frac{\text{number of false positives}}{\text{total number of matched feature pairs}}$  (33)
4.2.3. Comparison between Different Feature Vectors for Covariance Descriptor
Random variables in the feature vector play a significant part in the generation of our
proposed covariance matrix descriptor, which impacts not only the encapsulation ability of
information in the covariance matrix but also the length of the feature vector. The following
feature vectors are concatenated by random variables in this paper:
$F_3 = (\cos(\alpha(p_i p)), \cos(\beta(p_i p)), \cos(\gamma(p_i p)))$
$F_4 = (\sigma(p_i p), \lambda_1(p_i p), \lambda_2(p_i p), \lambda_3(p_i p))$
$F_7 = (\cos(\alpha(p_i p)), \cos(\beta(p_i p)), \cos(\gamma(p_i p))$