Evaluation of Tightly- and Loosely-coupled Approaches
in CNN-based Pose Estimation Systems for
Uncooperative Spacecraft
Lorenzo Pasqualetto Cassinis^{a,∗}, Robert Fonod^a, Eberhard Gill^a, Ingo Ahrns^b, Jesus Gil Fernandez^c
^a Delft University of Technology, Kluyverweg 1, 2629 HS Delft, The Netherlands
^b Airbus DS GmbH, Airbusallee 1, 28199 Bremen, Germany
^c ESTEC, Keplerlaan 1, 2201 AZ Noordwijk, The Netherlands
Abstract
The relative pose estimation of an inactive spacecraft by an active servicer
spacecraft is a critical task in the design of current and planned space missions,
due to its relevance for close-proximity operations, such as In-Orbit Servicing
and Active Debris Removal. This paper introduces a novel framework to enable
robust monocular pose estimation for close-proximity operations around an
uncooperative spacecraft, which combines a Convolutional Neural Network
(CNN) for feature detection with a Covariant Efficient Procrustes Perspective-n-
Points (CEPPnP) solver and a Multiplicative Extended Kalman Filter (MEKF).
The performance of the proposed method is evaluated at different levels of the
pose estimation system. A Single-stack Hourglass CNN is proposed for the feature
detection step in order to decrease the computational load of the Image Processing
(IP), and its accuracy is compared to the standard, more complex High-Resolution
Net (HRNet). Subsequently, heatmaps-derived covariance matrices are included
in the CEPPnP solver to assess the pose estimation accuracy prior to the
navigation filter. This is done in order to support the performance evaluation
of the proposed tightly-coupled approach against a loosely-coupled approach,
∗Corresponding author
Email addresses: L.PasqualettoCassinis@tudelft.nl (Lorenzo Pasqualetto Cassinis), Robert.Fonod@ieee.org (Robert Fonod), E.K.A.Gill@tudelft.nl (Eberhard Gill), ingo.ahrns@airbus.com (Ingo Ahrns), Jesus.Gil.Fernandez@esa.int (Jesus Gil Fernandez)
Preprint submitted to Elsevier, February 25, 2021
in which the detected features are converted into pseudomeasurements of the
relative pose prior to the filter. The performance results of the proposed system
indicate that a tightly-coupled approach can guarantee an advantageous coupling
between the rotational and translational states within the filter, whilst reflecting a
representative measurement covariance. This suggests a promising scheme to cope
with the challenging demand for robust navigation in close-proximity scenarios.
Synthetic 2D images of the European Space Agency’s Envisat spacecraft are used
to generate datasets for training, validation and testing of the CNN. Likewise,
the images are used to recreate a representative close-proximity scenario for the
validation of the proposed filter.
Keywords: Relative Pose Estimation, Active Debris Removal, In-Orbit Servicing, Monocular-based Navigation Filters, Convolutional Neural Networks
2010 MSC: 00-01, 99-00
1. Introduction
Nowadays, key Earth-based applications such as remote sensing, navigation, and telecommunication rely on satellite technology on a daily basis. To ensure a high reliability of these services, the safety and operations of satellites in orbit have to be guaranteed. In this context, advancements in the field of Guidance, Navigation, and Control (GNC) were made in the past years to cope with the challenges involved in In-Orbit Servicing (IOS) and Active Debris Removal (ADR) missions [1, 2]. For such scenarios, the estimation of the relative pose (position and attitude) of an uncooperative spacecraft by an active servicer spacecraft represents a critical task. Compared to cooperative close-proximity missions, the pose estimation problem is indeed complicated by the fact that the target satellite is not functional and/or not able to aid the relative navigation. Hence, optical sensors shall be preferred over Radio Frequency (RF) sensors to cope with the lack of navigation devices such as Global Positioning System (GPS) sensors and/or antennas onboard the target.
From a high-level perspective, optical sensors can be divided into active and passive devices, depending on whether they require power to function, i.e. LIght Detection And Ranging (LIDAR) sensors and Time-Of-Flight (TOF) cameras, or whether they passively acquire radiation, i.e. monocular and stereo cameras. Spacecraft relative navigation usually exploits Electro-Optical (EO) sensors such as stereo cameras [3, 4] and/or a LIDAR sensor [5] in combination with one or more monocular cameras, in order to overcome the partial observability that results from the lack of range information in the latter [6]. In this framework, pose estimation systems based solely on a monocular camera are becoming an attractive alternative to systems based on active sensors or stereo cameras, due to their reduced mass, power consumption, and system complexity [7]. However, given the low Signal-To-Noise Ratio (SNR) and the high contrast which characterize space images, a significant effort is still required to comply with most of the demanding requirements for a robust and accurate monocular-based navigation system. Interested readers are referred to Pasqualetto Cassinis et al. [8] for a recent overview of the current trends in monocular-based pose estimation systems.
Notably, the aforementioned navigation system cannot rely on known visual markers, as they are typically not installed on an uncooperative target. Since the extraction of visual features is an essential step in the pose estimation process, advanced Image Processing (IP) techniques are required to extract keypoints (or interest points), corners, and/or edges on the target body. In model-based methods, the detected features are then matched with pre-defined features on an offline wireframe 3D model of the target to solve for the relative pose. In other words, a reliable detection of key features under adverse orbital conditions is highly desirable to guarantee safe operations around an uncooperative spacecraft. Moreover, it would be beneficial to obtain a model of the feature detection uncertainties. This would provide the navigation system with additional statistical information about the measurements, which could in turn improve the robustness of the entire estimation process.
Unfortunately, standard pose estimation solvers such as the Efficient Perspective-n-Point (EPnP) [9], the Efficient Procrustes Perspective-n-Point (EPPnP) [10], or the multi-dimensional Newton-Raphson Method (NRM) [11] do not have the capability to include feature uncertainties. Only recently, the Maximum-Likelihood PnP (MLPnP) [12] and the Covariant EPPnP (CEPPnP) [13] solvers were introduced to exploit statistical information by including feature covariances in the pose estimation. Ferraz et al. [13] proposed a method for computing the covariance which takes different camera poses to create a fictitious distribution around each detected keypoint. Other authors proposed an improved pose estimation method based on projection vectors, in which the covariance is associated to the image gradient magnitude and direction at each feature location [14], or a method in which covariance information is derived for each feature based on the feature's visibility and robustness against illumination changes [15]. However, in all these methods the derivation of feature covariance matrices is a lengthy process which generally cannot be directly related to the actual detection uncertainty. Moreover, this procedure cannot be easily applied if Convolutional Neural Networks (CNNs) are used in the feature detection step, due to the difficulty of associating statistical meaning to the IP tasks performed within the network. In this context, another procedure should be followed, in which the output of the CNNs is directly exploited to return relevant statistical information about the detection step. This could, in turn, provide a reliable representation of the detection uncertainty.
The implementation of CNNs for monocular pose estimation in space has already become an attractive solution in recent years, also thanks to the creation of the Spacecraft PosE Estimation Dataset (SPEED) [16], a database of highly representative synthetic images of PRISMA's TANGO spacecraft made publicly available by Stanford's Space Rendezvous Laboratory (SLAB) and applicable to train and test different network architectures. One of the main advantages of CNNs over standard feature-based algorithms for relative pose estimation [7, 17, 18] is an increase in robustness under adverse illumination conditions, as well as a reduction in computational complexity. Initially, end-to-end CNNs were exploited to map a 2D input image directly into a relative pose by means of learning complex non-linear functions [19, 20, 21, 22]. However, since the pose accuracies of these end-to-end CNNs proved to be lower than the accuracies returned by common pose estimation solvers, especially in the estimation of the relative attitude [19], recent efforts investigated the capability of CNNs to perform keypoint localization prior to the actual pose estimation [23, 24, 25, 26]. The output of these networks is a set of so-called heatmaps around pre-trained features. The coordinates of the heatmap's peak intensity characterize the predicted feature location, with the intensity and the shape indicating the confidence of locating the corresponding keypoint at this position [23]. Additionally, due to the fact that the trainable features can be selected offline prior to the training, the matching of the extracted feature points with the features of the wireframe model can be performed without the need for a large search space for the image-model correspondences, which usually characterizes most edges/corners-based methods [27]. In this context, the High-Resolution Net (HRNet) [28] already proved to be a reliable and accurate keypoint detector prior to pose estimation, due to its capability of maintaining a high-resolution representation of the heatmaps through the whole detection process.
To the best of the authors' knowledge, the reviewed implementations of CNNs feed solely the heatmap's peak location into the pose estimation solver, even though additional information could be extracted from the detected heatmaps. Only Pavlakos et al. [23] solve the pose estimation by assigning weights to each feature based on its heatmap's peak intensity, in order to penalize inaccurate detections. Yet, there is another aspect related to the heatmaps which has not been considered. It is in fact hardly acknowledged that the overall shape of the detected heatmaps returned by a CNN can be translated into a statistical distribution around the peak, allowing reliable feature covariances and, in turn, a robust navigation performance. As already investigated by the authors in earlier works [29, 30], deriving an accurate representation of the measurement uncertainty from feature heatmaps can in fact not only improve the pose estimation, but it can also benefit the estimation of the full relative state vector, which would include the relative pose as well as the relative translational and rotational velocities.
From a high-level perspective, two different navigation architectures are normally exploited in the framework of relative pose estimation: a tightly-coupled architecture, where the extracted features are directly processed by the navigation filter as measurements, and a loosely-coupled architecture, in which the relative pose is computed by a pose solver prior to the navigation filter, in order to derive pseudomeasurements from the target features [31]. Usually, a loosely-coupled approach is preferred for an uncooperative tumbling target, due to the fact that the fast relative dynamics could jeopardize feature tracking and return highly-variable measurements to the filter. However, one shortcoming of this approach is that it is generally hard to obtain a representative covariance matrix for the pseudomeasurements. This can be quite challenging when filter robustness is demanded. Remarkably, the adoption of a CNN in the feature detection step can overcome the challenges in feature tracking by guaranteeing the detection of a constant, pre-defined set of features. At the same time, the CNN heatmaps can be used to derive a measurement covariance matrix and improve filter robustness. Following this line of reasoning, a tightly-coupled filter is expected to interface well with a CNN-based IP and to outperform its loosely-coupled counterpart.
In this framework, the objective of this paper is to combine a CNN-based feature detector with a CEPPnP solver whilst evaluating the performance of a proposed tightly-coupled navigation filter against the performance of a loosely-coupled filter. Specifically, the novelty of this work lies in extending the authors' previous findings [29, 30] by further linking the current research on CNN-based feature detection, covariant-based PnP solvers, and navigation filters. The main contributions of this work are:
1. To assess the feasibility of a simplified CNN for feature detection within the IP;

2. To improve the pose estimation by incorporating heatmap-derived covariance matrices in the CEPPnP solver;

3. To compare the performance of tightly- and loosely-coupled navigation filters.
The paper is organized as follows. The overall pose estimation framework is illustrated in Section II. Section III introduces the proposed CNN architecture together with the adopted training, validation, and testing datasets. In Section IV, special focus is given to the derivation of covariance matrices from the CNN heatmaps, whereas Section V describes the CEPPnP solver. Besides, Section VI provides a description of the adopted tightly- and loosely-coupled filters. The simulation environment is presented in Section VII together with the simulation results. Finally, Section VIII provides the main conclusions and recommendations.
2. Pose Estimation Framework

This work considers a servicer spacecraft flying in relative motion around a target spacecraft located in a Low Earth Orbit (LEO), with the relative motion being described in a Local Vertical Local Horizontal (LVLH) reference frame co-moving with the servicer (Figure 1a). Furthermore, it is assumed that the servicer is equipped with a single monocular camera. The relative attitude of the target with respect to the servicer can then be defined as the rotation of the target body-fixed frame B with respect to the servicer camera frame C, where these frames are tied to each spacecraft's body. The distance between the origins of these two frames defines their relative position. Together, these two quantities characterize the relative pose. This information can then be transferred from the camera frame to the servicer's center of mass by accounting for the relative pose of the camera with respect to the LVLH frame.
From a high-level perspective, a model-based monocular pose estimation system receives as input a 2D image and matches it with an existing wireframe 3D model of the target spacecraft to estimate the pose of such target with respect to the servicer camera. Referring to Figure 1b, the pose estimation problem consists in determining the position of the target's centre of mass $\mathbf{t}^C$ and its orientation with respect to the camera frame C, represented by the rotation matrix $\mathbf{R}^C_B$. The Perspective-n-Points (PnP) equations,

$$\mathbf{r}^C = \begin{bmatrix} x^C & y^C & z^C \end{bmatrix}^T = \mathbf{R}^C_B \mathbf{r}^B + \mathbf{t}^C, \qquad (1)$$

$$\mathbf{p} = (u_i, v_i) = \left( \frac{x^C}{z^C} f_x + C_x, \; \frac{y^C}{z^C} f_y + C_y \right), \qquad (2)$$

relate the unknown pose with a feature point $\mathbf{p}$ in the image plane via the relative position $\mathbf{r}^C$ of the feature with respect to the camera frame. Here, $\mathbf{r}^B$ is the point location in the 3D model, expressed in the body-frame coordinate system B, whereas $f_x$ and $f_y$ denote the focal lengths of the camera and ($C_x$, $C_y$) is the principal point of the image.
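As a minimal numerical sketch of Eqs. (1)-(2), the snippet below projects a body-frame feature point into the image plane. The attitude, translation, and camera parameters are illustrative values only, not the paper's.

```python
import numpy as np

def project_feature(r_B, R_CB, t_C, fx, fy, Cx, Cy):
    """Project a 3D model point r_B (body frame) into the image plane:
    Eq. (1) maps the point into the camera frame, Eq. (2) applies the
    pinhole camera model."""
    r_C = R_CB @ r_B + t_C           # Eq. (1): body -> camera frame
    x_C, y_C, z_C = r_C
    u = (x_C / z_C) * fx + Cx        # Eq. (2): perspective projection
    v = (y_C / z_C) * fy + Cy
    return np.array([u, v])

# Illustrative scenario (assumed values): identity relative attitude,
# a feature 2 m off-axis, target at 100 m along the camera boresight.
R_CB = np.eye(3)
t_C = np.array([0.0, 0.0, 100.0])
r_B = np.array([2.0, 0.0, 0.0])
fx = fy = 3545.0                     # focal length in pixels (assumed)
Cx = Cy = 256.0                      # principal point (assumed)
p = project_feature(r_B, R_CB, t_C, fx, fy, Cx, Cy)
```

A feature 2 m off-axis at 100 m range then lands 2/100 of the focal length (in pixels) away from the principal point along u.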
From these equations, it can already be seen that an important aspect of estimating the pose resides in the capability of the IP system to extract features $\mathbf{p}$ from a 2D image of the target spacecraft, which in turn need to be matched with pre-selected features $\mathbf{r}^B$ in the wireframe 3D model. Notably, such a wireframe model of the target needs to be made available prior to the estimation. Notice also that the problem is not well defined for n < 3 feature points, and can have up to four positive solutions for n = 3 [33]. Generally, more features are required in the presence of large noise and/or symmetric objects. Besides, it can also be expected that the time variation of the relative pose plays a crucial role while navigating around the target spacecraft, e.g. if rotational synchronization with the target spacecraft is required in the final approach phase. As such, it is clear that the estimation of both the relative translational and angular velocities represents an essential step within the navigation system.

(a) Co-moving LVLH frame [32]
(b) PnP problem (Figure adapted from [7])

Figure 1: Representation of the relative motion framework (left) and schematic of the pose estimation problem using a monocular image (right).

Figure 2: Functional flow of the proposed tightly-coupled pose estimation architecture.
The proposed tightly-coupled architecture combines the above key ingredients in three main stages, which are shown in Figure 2 and described in more detail in the following sections. In the CNN-based IP block, a CNN is used to extract features from a 2D image of the target spacecraft. Statistical information is derived by computing a covariance matrix for each feature using the information included in the output heatmaps. In the Navigation block, both the peak locations and the covariances are fed into the navigation filter, which estimates the relative pose as well as the relative translational and rotational velocities. The filter is initialized by the CEPPnP block, which takes the peak location and covariance matrix of each feature as input and outputs the initial relative pose by solving the PnP problem in Eqs. (1)-(2). Thanks to the availability of a covariance matrix for the detected features, this architecture can guarantee a more accurate representation of feature uncertainties, especially in case of inaccurate CNN detections due to adverse illumination conditions and/or unfavourable relative geometries between servicer and target. Together with the CEPPnP initialization, this aspect can return a robust and accurate estimation of the relative pose and velocities and assure a safe approach towards the target spacecraft.
In this work, a rectilinear VBAR approach of the servicer spacecraft towards the target spacecraft is considered, as this typically occurs during the final stages of close-proximity operations in rendezvous and docking missions [1, 2]. This assumption is justified by the fact that the proposed method needs to be first validated on simplified relative trajectories before assessing its feasibility under more complex relative geometries. Following the same line of reasoning, the relative attitude is also simplified by considering perturbation-free rotational dynamics between the servicer and the target. This is described in more detail in Section VI.
3. Convolutional Neural Network

CNNs are currently emerging as a promising feature extraction method, mostly due to the capability of their convolutional layers to extract high-level features of objects with improved robustness against image noise and illumination conditions. In order to optimize CNNs for the feature extraction process, a stacked hourglass architecture has been proposed [23, 24], and other architectures such as the U-net [34] and the HRNet [28] were tested in recent years.

Compared to the network proposed by Pavlakos et al. [23], the architecture proposed in this work is composed of only one encoder/decoder block, constituting a single hourglass module. This was chosen in order to reduce the network size and comply with the limitations in computing power which characterize space-grade processors. The encoder includes six blocks, each comprising a convolutional layer formed by a fixed number of filter kernels of size 3 × 3, a batch normalization module, and a max pooling layer, whereas the six decoder blocks accommodate an up-sampling block instead of max pooling. In the encoder stage, the image resolution is decreased by a factor of two at each block, with this downsampling process continuing until reaching the lowest resolution of 4 × 4 pixels. An upsampling process follows in the decoder, with each layer increasing the resolution by a factor of two and returning output heatmaps at the same resolution as the input image. Figure 3 shows the high-level architecture of the network layers, together with the corresponding input and output.
Overall, the size of the 2D input image and the number of kernels per convolutional layer drive the total number of parameters. In the current analysis, an input size of 256 × 256 pixels is chosen, and 128 kernels are considered per convolutional layer, leading to a total of 1,800,000 trainable parameters. Compared to the CNNs analyzed by Sun et al. [28], this represents a reduction of more than an order of magnitude in network size.
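The stated dimensions can be sanity-checked with a back-of-the-envelope sketch. Only the 256 × 256 input, the six encoder/decoder blocks, and the 128 kernels of size 3 × 3 come from the text; the assumption of one convolutional layer per block with 128 input and output channels (the first and last layers differ in practice) is ours for illustration.

```python
def encoder_resolutions(input_size=256, n_blocks=6):
    """Image resolution after each factor-2 max-pooling step."""
    sizes = [input_size]
    for _ in range(n_blocks):
        sizes.append(sizes[-1] // 2)
    return sizes

def conv_params(k=3, c_in=128, c_out=128):
    """Weights + biases of one k x k convolutional layer."""
    return k * k * c_in * c_out + c_out

sizes = encoder_resolutions()     # six halvings: 256 -> 4 pixels
total = 12 * conv_params()        # 6 encoder + 6 decoder conv layers
```

Six halvings of a 256-pixel input indeed end at the 4 × 4 bottleneck mentioned above, and twelve 3 × 3 convolutions with 128 channels give roughly 1.77 million weights, consistent with the quoted 1,800,000 trainable parameters.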
Figure 3: Overview of the single hourglass architecture. Downsampling is performed in the encoder stage, in which the image size is decreased after each block, whereas upsampling occurs in the decoder stage. The output of the network consists of heatmap responses and is used for keypoint localization.
As already mentioned, the output of the network is a set of heatmaps around the selected features. Ideally, the heatmap's peak intensity associated to a wrong detection should be relatively small compared to that of the correctly detected features, highlighting that the network is not confident about that particular wrongly-detected feature. At the same time, the heatmap's amplitude should provide an additional insight into the confidence level of each detection, a large amplitude being related to a large uncertainty about the detection. The network is trained with the x- and y-image coordinates of the feature points, computed offline based on the intrinsic camera parameters as well as on the feature coordinates in the target body frame, which were extracted from the wireframe 3D model prior to the training. During training, the network is optimized to locate 16 features of the Envisat spacecraft, consisting of the corners of the main body, the Synthetic-Aperture Radar (SAR) antenna, and the solar panel, respectively. Figure 4 illustrates the selected features for a specific target pose.

Figure 4: Illustration of the selected features for a given Envisat pose.

Table 1: Parameters of the camera used to generate the synthetic images in Cinema 4D©.

Parameter | Value | Unit
Image resolution | 512 × 512 | pixels
Focal length | 3.9·10⁻³ | m
Pixel size | 1.1·10⁻⁵ | m
3.1. Training, Validation and Test

For the training, validation, and test datasets, synthetic images of the Envisat spacecraft were rendered in the Cinema 4D© software. Table 1 lists the main camera parameters adopted. Constant Sun elevation and azimuth angles of 30 degrees were chosen in order to recreate favourable as well as adverse illumination conditions. Relative distances between camera and target were discretized every 30 m in the interval 90 m - 180 m, with the Envisat always located along the camera boresight direction in order to prevent some of the Envisat features from falling outside the camera field of view. Although this is a conservative assumption, it allows testing the CNN detection under ideal servicer-target geometries during a rectilinear approach. Subsequently, relative attitudes were generated by discretizing the yaw, pitch, and roll angles of the target with respect to the camera by 10 degrees each. Together, these two choices were made in order to recreate several relative poses between the servicer and the target. The resulting database was then shuffled to randomize the images, and was ultimately split into training (18,000 images), validation (6,000 images), and test (6,000 images) datasets. Figure 5 shows a subset of the camera pose distribution for 100 representative training images, whereas Figure 6 illustrates some of the images included in the training dataset.
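The shuffle-and-split step described above can be sketched as follows; integer image IDs stand in for the rendered frames, and the fixed seed is an arbitrary choice for reproducibility, not from the paper.

```python
import random

# 30,000 rendered images, shuffled and split into training (18,000),
# validation (6,000), and test (6,000) sets, as described in the text.
image_ids = list(range(30000))
random.Random(42).shuffle(image_ids)   # arbitrary fixed seed (assumed)

train_set = image_ids[:18000]
val_set   = image_ids[18000:24000]
test_set  = image_ids[24000:]
```

Shuffling before splitting ensures that each subset spans the full range of rendered distances and attitudes, rather than inheriting the ordering of the pose grid.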
During training, the validation dataset is used alongside the training one to compute the validation losses and avoid overfitting. The Adam optimizer [35] is used with a learning rate of 10⁻³ for a total number of 50 epochs. Finally, the network performance after training is assessed with the test dataset.
Preliminary results on the single-stack network performance were already reported by Pasqualetto Cassinis et al. [29]. Above all, one key advantage of relying on CNNs for feature detection was found in the capability of learning the relative position between features under a variety of relative poses present in the training. As a result, both features which are not visible due to adverse illumination and features occulted by other parts of the target can be detected. Besides, a challenge was identified in the specific selection of the trainable features. Since the features selected in this work represent highly symmetrical points of the Envisat spacecraft, such as the corners of the solar panel, SAR antenna, or main body, the network can be unable to distinguish between similar features and return multiple heatmaps for a single feature output. Figure 7 illustrates these findings. Notably, the detection of wrong features results in weak heatmaps, which can be filtered out by selecting a proper threshold on their total brightness.

Figure 5: Illustration of the pose space discretization in the training dataset. The concentric spheres represent the discretization of the relative distance in the range 90 m - 180 m. Only 100 random relative camera poses are shown for clarity.

Figure 6: A montage of eight synthetic images selected from the training set.
(a) Occulted spacecraft body
(b) Heatmap of a single solar panel’s
corner
Figure 7: Robustness and challenges of feature detection with the proposed CNN. On the
left-hand side, the network has been trained to recognize the pattern of the features, and
can correctly locate the body features which are not visible, i.e. parts occulted by the solar
panel and corners of the SAR antenna. Conversely, the right-hand side shows the detection of
multiple heatmaps for a single corner of the solar panel. As can be seen, the network can have
difficulties in distinguishing similar features, such as the corners of the solar panel.
In order to compare the feature detection accuracy of the proposed Single-stack Hourglass with a more complex CNN architecture, the HRNet proposed by Sun et al. [28] has been selected and trained on the same Envisat datasets. This architecture had already been tested on the SPEED dataset [25] and proved to return highly accurate features of the TANGO spacecraft. The performance is assessed in terms of the Root Mean Squared Error (RMSE) between the ground truth (GT) and the x, y coordinates of the extracted features, which is computed as

$$E_{\mathrm{RMSE}} = \sqrt{\frac{1}{n_{\mathrm{tot}}} \sum_{i=1}^{n_{\mathrm{tot}}} \left[ (x_{\mathrm{GT},i} - x_i)^2 + (y_{\mathrm{GT},i} - y_i)^2 \right]}. \qquad (3)$$
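Equation (3) amounts to the root of the mean squared Euclidean distance between ground-truth and detected feature locations; a short sketch with made-up detections:

```python
import numpy as np

def feature_rmse(gt, pred):
    """RMSE of Eq. (3): root of the mean squared pixel distance between
    ground-truth (GT) and detected feature coordinates."""
    gt, pred = np.asarray(gt, float), np.asarray(pred, float)
    sq_err = (gt[:, 0] - pred[:, 0]) ** 2 + (gt[:, 1] - pred[:, 1]) ** 2
    return np.sqrt(sq_err.mean())

# Toy detections (illustrative only): each feature is offset by (3, 4) px,
# i.e. 5 px of Euclidean error, so the RMSE is 5 px.
gt   = [[10.0, 20.0], [30.0, 40.0]]
pred = [[13.0, 24.0], [33.0, 44.0]]
err = feature_rmse(gt, pred)
```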
Figure 8: RMSE over the test dataset for the HRNet and the Single-stack Hourglass.
Figure 8 shows the RMSE over the test dataset for the two CNNs, whereas Table 2 reports the mean µ and standard deviation σ of the associated histograms. As expected, the added complexity of the HRNet translates into a more accurate detection of the selected features, thanks to its higher number of parameters: only 4% of the test images are characterized by an RMSE above 5 pixels, as opposed to 15% in the Single-stack Hourglass case.

Table 2: Mean µ and standard deviation σ of the adopted networks over the Envisat test dataset.

Network | No. Params | µ [pxl] | σ [pxl]
Single-stack Hourglass | 1.8M | 3.4 | 4.3
HRNet | 25M | 2.4 | 1.4
Although the HRNet proves to return more accurate features, it is also believed that the larger-RMSE scenarios returned by the Single-stack Hourglass can be properly handled if a larger uncertainty can be associated to their corresponding heatmaps. As an example, a large RMSE could be caused by the inaccurate detection of only a few features which, if properly weighted, might not have a severe impact on the pose estimation step. This task can be performed by deriving a covariance matrix for each detected feature, in order to represent its detection uncertainty. Above all, this may prevent the pose solver and the navigation filter from trusting wrong detections, by relying more on other, accurate features. In this way, the navigation filter can cope with poorly accurate heatmaps while at the same time relying on a computationally light CNN.
4. Covariance Computation
Compared to the methods discussed in Section I [
13
,
14
,
15
], the proposed
method derives a covariance matrix associated to each feature directly from the
heatmaps detected by the CNN, rather than from the computation of the image
gradient around each feature. In order to do so, the first step is to obtain a
330
statistical population around the heatmap’s peak. This is done by thresholding
each heatmap image so that only the
x
- and
y
- location of heatmap’s pixels
are extracted. Secondly, each pixel within the population is given a normalized
weight wibased on the gray intensity Iiat its location,
wi=wRRi+wGGi+wBBi,(4)
where $R_i, G_i, B_i$ are the components of the coloured image at pixel $i$ and $w_R, w_G, w_B$ are the weights assigned to each channel to obtain the grayscale intensity. This is done in order to give more weight to pixels which are particularly bright and close to the peak, and less weight to pixels which are faint and far from the peak. Finally, the obtained statistical population of each feature is used to compute the weighted covariance between $x$ and $y$, and consequently the covariance matrix $C_i$,

$$C_i = \begin{bmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) \end{bmatrix},$$   (5)
where

$$\mathrm{cov}(x,y) = \sum_{i=1}^{n} w_i (x_i - p_x)(y_i - p_y)$$   (6)

and $n$ is the number of pixels in each feature's heatmap. In this work, the mean is replaced by the peak location $p = (p_x, p_y)$ in order to represent a distribution around the peak of the detected feature, rather than around the heatmap's mean. This is particularly relevant when the heatmaps are asymmetric and their mean does not coincide with their peak.
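The thresholding and weighting steps above (Eqs. 4-6) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the heatmap is assumed to be already available as a grayscale 2D array, so the RGB weighting of Eq. 4 is omitted, and the function name and threshold value are hypothetical.

```python
import numpy as np

def heatmap_covariance(heatmap, threshold=0.2):
    """Weighted 2x2 covariance of a heatmap around its peak (Eqs. 5-6)."""
    # Peak location p = (p_x, p_y) of the detected feature.
    py, px = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Statistical population: pixel locations above the threshold.
    ys, xs = np.nonzero(heatmap > threshold)
    # Normalized weights from the gray intensities at those locations.
    w = heatmap[ys, xs]
    w = w / w.sum()
    # Weighted covariance around the peak, not around the mean (Eq. 6).
    dx, dy = xs - px, ys - py
    C = np.array([[np.sum(w * dx * dx), np.sum(w * dx * dy)],
                  [np.sum(w * dy * dx), np.sum(w * dy * dy)]])
    return (px, py), C
```

For a symmetric, well-localized heatmap this returns a small, nearly isotropic $C$; elongated heatmaps yield correspondingly eccentric covariances.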
Figure 9 shows the overall flow to obtain the covariance matrix for three different heatmap shapes. The ellipse associated with each feature's covariance is obtained by computing the eigenvalues $\lambda_x$ and $\lambda_y$ of the covariance matrix,

$$\left(\frac{x}{\lambda_x}\right)^2 + \left(\frac{y}{\lambda_y}\right)^2 = s,$$   (7)

where $s$ defines the scale of the ellipse and is derived from the confidence interval of interest, e.g. $s = 2.2173$ for a 68% confidence interval. As can be seen, different heatmaps can result in very different covariance matrices. Above all, the computed covariance can capture the different CNN uncertainty over $x$ and $y$. Notice that, due to its symmetric nature, the covariance matrix can only represent bivariate normal distributions. As a result, asymmetrical heatmaps such as the one in the third scenario are approximated by Gaussian distributions characterized by an ellipse which might overestimate the heatmap's dispersion over some directions.
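The ellipse parameters of Eq. 7 can be recovered from a covariance matrix through its eigen-decomposition, as sketched below (illustrative Python, not the authors' implementation; the semi-axis scaling with the square root of $s\lambda$ is our reading of Eq. 7):

```python
import numpy as np

def covariance_ellipse(C, s=2.2173):
    """Ellipse semi-axes and orientation from a 2x2 covariance (Eq. 7).

    s is the confidence-interval scale, e.g. 2.2173 for 68%."""
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    semi_axes = np.sqrt(s * eigvals)       # semi-minor, semi-major axes
    # Orientation of the major axis from the corresponding eigenvector.
    angle = np.arctan2(eigvecs[1, -1], eigvecs[0, -1])
    return semi_axes, angle
```

The eigenvalue ratio and ellipse area used in Section 7.1 follow directly from the returned quantities.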
Figure 9: Schematic of the procedure followed to derive covariance matrices from CNN heatmaps. The displayed ellipses are derived from the computed covariances by assuming the confidence intervals 1σ = 0.68 and 3σ = 0.99.

5. Pose Estimation

The CEPPnP method proposed by Ferraz et al. [13] was selected to estimate the relative pose from the detected features as well as from their covariance matrices. The first step of this method is to rewrite the PnP problem in Eqs. 1-2 as a function of a 12-dimensional vector $y$ containing the control point coordinates in the camera reference system,
$$M y = 0,$$   (8)

where $M$ is a known $2n \times 12$ matrix. This is the fundamental equation of the EPnP problem [9]. The likelihood of each observed feature location $u_i$ is then represented as

$$P(u_i) = k \cdot e^{-\frac{1}{2} \Delta u_i^T C_{u_i}^{-1} \Delta u_i},$$   (9)
where $\Delta u_i$ is a small, independent and unbiased noise with expectation $E[\Delta u_i] = 0$ and covariance $E[\Delta u_i \Delta u_i^T] = \sigma^2 C_{u_i}$, and $k$ is a normalization constant. Here, $\sigma^2$ represents the global uncertainty in the image, whereas $C_{u_i}$ is the $2 \times 2$ unnormalized covariance matrix representing the Gaussian distribution of each detected feature, computed from the CNN heatmaps. After some calculations [13], the EPnP formulation can be rewritten as

$$(N - L)\, y = \lambda y.$$   (10)
This is an eigenvalue problem in which both the $N$ and $L$ matrices are a function of $y$ and $C_{u_i}$. The problem is solved iteratively, initialized by means of the closed-form EPPnP solution for the four control points under the assumption of no feature uncertainty. Once $y$ is estimated, the relative pose is computed by solving the generalized Orthogonal Procrustes problem used in the EPPnP [10].
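The structure of Eq. 8 can be made concrete with a short sketch that assembles $M$ from the barycentric coordinates of the features with respect to the four EPnP control points. This is illustrative Python under standard pinhole assumptions, not the authors' implementation; all names are hypothetical.

```python
import numpy as np

def build_epnp_M(pts_w, ctrl_w, uv, fx, fy, cx, cy):
    """Assemble the 2n x 12 EPnP matrix M of Eq. 8 (illustrative sketch)."""
    # Barycentric coordinates alpha such that pts_w = alpha @ ctrl_w,
    # with each row of alpha summing to one.
    A = np.vstack([ctrl_w.T, np.ones(4)])            # 4 x 4
    B = np.vstack([pts_w.T, np.ones(len(pts_w))])    # 4 x n
    alpha = np.linalg.solve(A, B).T                  # n x 4
    M = np.zeros((2 * len(pts_w), 12))
    for i, (a, (u, v)) in enumerate(zip(alpha, uv)):
        for j in range(4):
            # Each feature contributes two rows, one per image coordinate.
            M[2 * i,     3 * j:3 * j + 3] = [a[j] * fx, 0.0, a[j] * (cx - u)]
            M[2 * i + 1, 3 * j:3 * j + 3] = [0.0, a[j] * fy, a[j] * (cy - v)]
    return M
```

For noise-free projections, the stacked control-point coordinates in the camera frame satisfy $My = 0$ exactly; feature noise turns this into the weighted eigenvalue problem of Eq. 10.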
6. Navigation Filter

Several navigation filters for close-proximity operations were investigated in recent years in the context of relative pose estimation. The reader is referred to Pasqualetto Cassinis et al. [8] for a comprehensive overview that goes beyond the scope of this work. In the proposed navigation system, the so-called Multiplicative Extended Kalman Filter (MEKF) is used. Remarkably, other works [15, 30] adopted a standard formulation of the EKF that propagates the relative pose, expressed in terms of relative position and quaternions, as well as the relative translational and rotational velocities (prediction step), correcting the prediction with the measurements obtained from the monocular camera (correction step). However, the quaternion set consists of four parameters to describe the 3DOF attitude, hence one of its parameters is deterministic. As reported by Tweddle and Saenz-Otero [36] and Sharma and D'Amico [31], this makes the covariance matrix of a quaternion have one eigenvalue that is exactly zero. As a result, the entire state covariance propagated by the filter may become non-positive-definite and lead to the divergence of the filter. The MEKF, introduced for the first time by Lefferts et al. [37], aims at solving the above issue by using two different parametrizations of the relative attitude. A three-element attitude error parametrization is propagated and corrected inside the filter to return an estimate of the attitude error, while the full quaternion is kept as a reference. At each estimation step, the error estimate is used to update the reference quaternion and is then reset to zero for the next iteration. Notably, the reset step prevents the attitude error parametrization from reaching singularities, which generally occur for large angles.
6.1. Propagation Step

A standard EKF state vector for relative pose estimation is composed of the relative pose between the servicer and the target, as well as the relative translational and rotational velocities $v$ and $\omega$. Under the assumption that the camera frame onboard the servicer is co-moving with the LVLH frame, with the camera boresight aligned with the along-track direction, this translates into

$$x = \begin{bmatrix} t_C & v & q & \omega \end{bmatrix}^T,$$   (11)

where $q = \begin{bmatrix} q_0 & q_v \end{bmatrix}$ is the quaternion set that represents the relative attitude. Notice that the assumption of the camera co-moving with the LVLH frame is made only to focus on the navigation aspects rather than on the attitude control of the servicer. Therefore, the application of the filter can be extended to other scenarios, if attitude control is included in the system.

In the MEKF, the modified state vector propagated inside the filter becomes

$$\tilde{x} = \begin{bmatrix} t_C & v & a & \omega \end{bmatrix}^T,$$   (12)

where $a$ is four times the Modified Rodrigues Parameters (MRP) $\sigma$,

$$a = 4\sigma = 4\,\frac{q_v}{1 + q_0}.$$   (13)
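The mapping of Eq. 13 and its inverse (which reappears in Eq. 36) can be sketched as follows, assuming a scalar-first quaternion convention (illustrative names, not the authors' implementation):

```python
import numpy as np

def quat_to_a(q):
    """Attitude-error parameter a = 4*q_v / (1 + q0) of Eq. 13."""
    q0, qv = q[0], np.asarray(q[1:])
    return 4.0 * qv / (1.0 + q0)

def a_to_quat(a):
    """Inverse mapping of Eq. 36: a back to a unit quaternion."""
    a = np.asarray(a, dtype=float)
    n2 = a @ a
    return np.concatenate([[(16.0 - n2) / (16.0 + n2)],
                           8.0 * a / (16.0 + n2)])
```

The two mappings are exact inverses of each other for unit quaternions with $q_0 > -1$, which is what allows the filter to move freely between the reference quaternion and the three-element error state.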
The discrete attitude propagation step is derived by linearizing $\dot{a}$ around $a = 0_{3\times1}$ and assuming small angle rotations [36],

$$\dot{a} = -\frac{1}{2}[\omega\times]\,a + \omega.$$   (14)

As a result, the discrete linearized propagation of the full state becomes

$$\tilde{x}_k = \Phi_k \tilde{x}_{k-1} + \Gamma_k Q_k,$$   (15)
where $Q_k$ represents the process noise and

$$\Phi_k = \begin{bmatrix} \Phi_{CW} & 0_{6\times6} \\ 0_{6\times6} & \Phi_{a,\omega} \end{bmatrix}$$   (16)

$$\Phi_{CW} = \begin{bmatrix} \Phi_{rr} & \Phi_{rv} \\ \Phi_{vr} & \Phi_{vv} \end{bmatrix}$$   (17)

$$\Phi_{rr} = \begin{bmatrix} 1 & 0 & 6(\Delta\theta - \sin\Delta\theta) \\ 0 & \cos\Delta\theta & 0 \\ 0 & 0 & 4 - 3\cos\Delta\theta \end{bmatrix}$$   (18)

$$\Phi_{rv} = \begin{bmatrix} \frac{1}{\omega_s}(4\sin\Delta\theta - 3\Delta\theta) & 0 & \frac{2}{\omega_s}(1 - \cos\Delta\theta) \\ 0 & \frac{1}{\omega_s}\sin\Delta\theta & 0 \\ \frac{2}{\omega_s}(\cos\Delta\theta - 1) & 0 & \frac{\sin\Delta\theta}{\omega_s} \end{bmatrix}$$   (19)

$$\Phi_{vr} = \begin{bmatrix} 0 & 0 & 6\omega_s(1 - \cos\Delta\theta) \\ 0 & -\omega_s\sin\Delta\theta & 0 \\ 0 & 0 & 3\omega_s\sin\Delta\theta \end{bmatrix}$$   (20)

$$\Phi_{vv} = \begin{bmatrix} 4\cos\Delta\theta - 3 & 0 & 2\sin\Delta\theta \\ 0 & \cos\Delta\theta & 0 \\ -2\sin\Delta\theta & 0 & \cos\Delta\theta \end{bmatrix}$$   (21)

$$\Phi_{a,\omega} = \begin{bmatrix} e^{-\frac{1}{2}[\omega\times]\Delta t} & \int_0^{\Delta t} e^{-\frac{1}{2}[\omega\times]\tau}\, d\tau \\ 0_{3\times3} & I_{3\times3} \end{bmatrix}$$   (22)

$$\Gamma_{k+1} = \begin{bmatrix} \frac{1}{2m} I_{3\times3}\,\Delta t^2 & 0_{3\times3} \\ \frac{1}{m} I_{3\times3}\,\Delta t & 0_{3\times3} \\ 0_{3\times3} & \int_0^{\Delta t} e^{-\frac{1}{2}[\omega\times]\tau}\, d\tau \; J^{-1} \\ 0_{3\times3} & J^{-1}\Delta t \end{bmatrix}.$$   (23)
The terms $\omega_s$ and $\Delta\theta$ in Eq. 17 represent the servicer orbital angular rate and the true anomaly variation from time $t_0$ to $t$, respectively, whereas the term $J$ in Eq. 23 is the inertia matrix of the target spacecraft. In Tweddle and Saenz-Otero [36], the integral terms in Eqs. 22-23 are solved by creating a temporary linear system from Eq. 14, augmented with the angular velocity and the process noise. The State Transition Matrix of this system is then solved numerically with the matrix exponential.
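A sketch of the Clohessy-Wiltshire block $\Phi_{CW}$ of Eqs. 17-21 is given below. This is illustrative Python, not the authors' implementation; the axis ordering and signs follow the matrices as printed, with `w` denoting $\omega_s$ and `dt` the propagation interval (so that $\Delta\theta = \omega_s\,\Delta t$ for a circular servicer orbit).

```python
import numpy as np

def cw_stm(w, dt):
    """Clohessy-Wiltshire state transition matrix of Eqs. 17-21 (sketch)."""
    th = w * dt                      # true anomaly variation
    c, s = np.cos(th), np.sin(th)
    Prr = np.array([[1, 0, 6 * (th - s)],
                    [0, c, 0],
                    [0, 0, 4 - 3 * c]])
    Prv = np.array([[(4 * s - 3 * th) / w, 0, 2 * (1 - c) / w],
                    [0, s / w, 0],
                    [2 * (c - 1) / w, 0, s / w]])
    Pvr = np.array([[0, 0, 6 * w * (1 - c)],
                    [0, -w * s, 0],
                    [0, 0, 3 * w * s]])
    Pvv = np.array([[4 * c - 3, 0, 2 * s],
                    [0, c, 0],
                    [-2 * s, 0, c]])
    return np.block([[Prr, Prv], [Pvr, Pvv]])
```

As a sanity check, the matrix reduces to the identity for $\Delta t = 0$ and satisfies the semigroup property $\Phi(t_1 + t_2) = \Phi(t_1)\,\Phi(t_2)$, as any state transition matrix of a linear time-invariant system must.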
6.2. Correction Step

At this stage, the propagated state $\tilde{x}_k$ is corrected with the measurements $z$ to return an update of the state $\hat{x}_k$. In a loosely-coupled filter, these measurements are represented by the relative pose between the servicer and the target spacecraft, obtained by solving the PnP problem with the CEPPnP solver described in Section V. In this case, a pseudomeasurements vector is derived by transforming the relative quaternion set into the desired attitude error $a$,

$$\delta q_z = q_z \otimes q_{ref_k}^{*}, \qquad a = 4\,\frac{\delta q_v}{1 + \delta q_0}$$   (24)

$$z_k = \begin{bmatrix} t_C \\ a \end{bmatrix} = H_k x_k + V_k = \begin{bmatrix} I_{3\times3} & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} \\ 0_{3\times3} & 0_{3\times3} & I_{3\times3} & 0_{3\times3} \end{bmatrix} \begin{bmatrix} t_C \\ v \\ a \\ \omega \end{bmatrix}_k + \begin{bmatrix} V_r \\ V_a \end{bmatrix}_k.$$   (25)
In Eq. 24, $\otimes$ denotes the quaternion product. Conversely, in a tightly-coupled filter the measurements are represented by the pixel coordinates of the detected features,

$$z = (x_1, y_1, \ldots, x_n, y_n)^T.$$   (26)

Referring to Eqs. 1-2, this translates into the following equations for each detected point $p_i$:

$$h_i = \begin{bmatrix} \dfrac{x_i^C}{z_i^C} f_x + C_x & \dfrac{y_i^C}{z_i^C} f_y + C_y \end{bmatrix}^T$$   (27)

$$r_i^C = q^* \otimes r_i^B \otimes q + t_C,$$   (28)

where $q^*$ is the quaternion conjugate. As a result, the measurements update equation can be written as
$$z_k = H_k x_k + V = \begin{bmatrix} H_{t_C,1} & 0_{2\times3} & H_{a,1} & 0_{2\times3} \\ \vdots & \vdots & \vdots & \vdots \\ H_{t_C,n} & 0_{2\times3} & H_{a,n} & 0_{2\times3} \end{bmatrix} \begin{bmatrix} t_C \\ v \\ a \\ \omega \end{bmatrix}_k + \begin{bmatrix} V_r \\ V_a \end{bmatrix}$$   (29)
and the Jacobian $H_k$ of the observation model with respect to the state vector is a $2n \times 12$ matrix whose elements are
$$H_{r,i} = H_i^{int} \cdot H_{t_C,i}^{ext}$$   (30)

$$H_{a,i} = H_i^{int} \cdot H_{a,i}^{ext} = H_i^{int} \cdot H_{q,i}^{ext} \cdot H_a^q$$   (31)

$$H_i^{int} = \frac{\partial h_i}{\partial r_i^C} = \begin{bmatrix} \dfrac{f_x}{z_i^C} & 0 & -\dfrac{f_x}{(z_i^C)^2}\, x_i^C \\ 0 & \dfrac{f_y}{z_i^C} & -\dfrac{f_y}{(z_i^C)^2}\, y_i^C \end{bmatrix}$$   (32)

$$H_{q,i}^{ext} = \frac{\partial r_i^C}{\partial q} = \frac{\partial \left(q^* \otimes r_i^B \otimes q\right)}{\partial q}; \qquad H_{t_C,i}^{ext} = \frac{\partial r_i^C}{\partial t_C} = I_3$$   (33)

$$H_a^q = \frac{\partial \left(\delta q \otimes q_{ref}\right)}{\partial a} = Q_{ref}\,\frac{\partial\,\delta q}{\partial a}$$   (34)

$$Q_{ref} = \begin{bmatrix} q_0 & -q_1 & -q_2 & -q_3 \\ q_1 & q_0 & q_3 & -q_2 \\ q_2 & -q_3 & q_0 & q_1 \\ q_3 & q_2 & -q_1 & q_0 \end{bmatrix}_{ref}.$$   (35)
The partial derivatives of the differential quaternion set with respect to the attitude error are computed from the relation between the attitude error $a$ and the differential quaternion set $\delta q$,

$$\delta q_0 = \frac{16 - \|a\|^2}{16 + \|a\|^2}, \qquad \delta q_v = \frac{8\,a}{16 + \|a\|^2}$$   (36)

$$\frac{\partial\,\delta q}{\partial a} = \frac{8}{\left(16 + \|a\|^2\right)^2} \begin{bmatrix} -8a_1 & -8a_2 & -8a_3 \\ 16 + \|a\|^2 - 2a_1^2 & -2a_1 a_2 & -2a_1 a_3 \\ -2a_2 a_1 & 16 + \|a\|^2 - 2a_2^2 & -2a_2 a_3 \\ -2a_3 a_1 & -2a_3 a_2 & 16 + \|a\|^2 - 2a_3^2 \end{bmatrix}.$$   (37)
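The intrinsic Jacobian of Eq. 32, i.e. the sensitivity of the pinhole projection to the camera-frame feature position, can be sketched as follows (illustrative names; a sketch, not the authors' implementation):

```python
import numpy as np

def pinhole_jacobian(r_c, fx, fy):
    """Jacobian of the pinhole projection w.r.t. the camera-frame point (Eq. 32)."""
    x, y, z = r_c
    # Derivatives of (fx*x/z, fy*y/z); the offsets C_x, C_y drop out.
    return np.array([[fx / z, 0.0, -fx * x / z ** 2],
                     [0.0, fy / z, -fy * y / z ** 2]])
```

The $1/z$ scaling of the entries is what makes the measurement update progressively less informative at large relative ranges, consistent with the range-dependent degradation discussed in Section 7.1.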
In the tightly-coupled filter, the measurement covariance matrix $R$ is a time-varying block diagonal matrix constructed with the heatmaps-derived covariances $C_i$ in Eq. 5,

$$R = \begin{bmatrix} C_1 & & \\ & \ddots & \\ & & C_n \end{bmatrix}.$$   (38)

Notice that $C_i$ can differ for each feature in a given frame as well as vary over time. Preliminary navigation results [30] already showed that such a heatmaps-derived covariance matrix can capture the statistical distribution of the measured features and improve the measurements update step of the navigation filter. Conversely, in the loosely-coupled filter $R$ represents the uncertainty in the pose estimation step and hence it is not directly related to the CNN heatmaps. A constant value is therefore chosen based on the pose estimation accuracy observed for the test dataset.
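Assembling the block-diagonal $R$ of Eq. 38 from the per-feature covariances is straightforward; a sketch with a hypothetical function name:

```python
import numpy as np

def stack_measurement_covariance(C_list):
    """Time-varying block-diagonal R from per-feature 2x2 covariances (Eq. 38)."""
    n = len(C_list)
    R = np.zeros((2 * n, 2 * n))
    for i, C in enumerate(C_list):
        # Each feature contributes one 2x2 block on the diagonal.
        R[2 * i:2 * i + 2, 2 * i:2 * i + 2] = C
    return R
```

Because the blocks are rebuilt at every frame from the current heatmaps, features detected with low confidence are automatically down-weighted in that frame's update.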
Finally, the updated state estimate $\hat{x}_k$ is obtained from the propagated state $\tilde{x}_k$, the residuals $\tilde{y}$, and the Kalman Gain $K$,

$$\tilde{y} = z - h(\tilde{x}_k)$$   (39)

$$K = P_k H_k^T \left( H_k P_k H_k^T + R_k \right)^{-1}$$   (40)

$$\hat{x}_k = \tilde{x}_k + K \tilde{y}.$$   (41)
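Eqs. 39-41 follow the standard EKF correction; a generic sketch is given below (illustrative names; the covariance update in the last line is the standard EKF form, which Eqs. 39-41 do not show explicitly):

```python
import numpy as np

def mekf_update(x_pred, P_pred, z, h, H, R):
    """Measurement update of Eqs. 39-41 (generic EKF sketch)."""
    y = z - h(x_pred)                                # residuals (Eq. 39)
    S = H @ P_pred @ H.T + R                         # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)              # Kalman gain (Eq. 40)
    x_upd = x_pred + K @ y                           # corrected state (Eq. 41)
    P_upd = (np.eye(len(x_pred)) - K @ H) @ P_pred   # standard covariance update
    return x_upd, P_upd
```

In the tightly-coupled case, `z`, `h`, `H` and `R` are the feature-level quantities of Eqs. 26-38; in the loosely-coupled case they are the pose pseudomeasurements of Eqs. 24-25.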
6.3. Reset Step

In the reset step, the reference quaternion $q_{ref}$ is updated with the attitude error estimate $\hat{a}$ and the new attitude error is set to zero,

$$\hat{q}_k = \delta q(\hat{a}) \otimes q_{ref_k}$$   (42)

$$\hat{a} = 0_{3\times1}$$   (43)

$$q_{ref_{k+1}} = \hat{q}_k.$$   (44)

The obtained estimated quaternion set $\hat{q}_k$ is then compared to the true quaternion set to assess the attitude accuracy of the filter.
7. Simulations

In this section, the simulation environment and the results are presented. Firstly, the impact of including a heatmaps-derived covariance in the pose estimation step is addressed by comparing the CEPPnP method with a standard solver which does not account for feature uncertainty. The weights in Eq. 4 are selected based on the standard RGB-to-grayscale conversion ($w_R = 0.299$, $w_G = 0.587$, $w_B = 0.114$). Secondly, the performance of the MEKF is evaluated by comparing the convergence profiles obtained with a heatmaps-derived covariance matrix against covariance matrices with arbitrarily selected covariances. Initialization is provided by the CEPPnP for all the scenarios.

Two separate error metrics are adopted in the evaluation, in accordance with Sharma and D'Amico [20]. Firstly, the translational error between the estimated relative position $\hat{t}_C$ and the ground truth $t_C$ is computed as

$$E_T = |t_C - \hat{t}_C|.$$   (45)

This metric is also applied to the translational and rotational velocities estimated in the navigation filter. Secondly, the attitude accuracy is measured in terms of the Euler axis-angle error between the estimated quaternion $\hat{q}$ and the ground truth $q$,

$$\beta = \begin{bmatrix} \beta_s & \beta_v \end{bmatrix} = q \otimes \hat{q}^{*}$$   (46)

$$E_R = 2 \arccos\left(|\beta_s|\right).$$   (47)
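The error metrics of Eqs. 45-47 can be sketched as follows (scalar-first Hamilton quaternion convention assumed; illustrative function names, not the authors' implementation):

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of scalar-first quaternions (an assumed convention)."""
    p0, pv = p[0], np.asarray(p[1:])
    q0, qv = q[0], np.asarray(q[1:])
    return np.concatenate([[p0 * q0 - pv @ qv],
                           p0 * qv + q0 * pv + np.cross(pv, qv)])

def attitude_error_deg(q_true, q_est):
    """Euler axis-angle error E_R of Eqs. 46-47, in degrees."""
    # Conjugate of the estimate inverts the unit quaternion.
    q_est_conj = np.concatenate([[q_est[0]], -np.asarray(q_est[1:])])
    beta = quat_mul(q_true, q_est_conj)
    # Clamp guards against round-off slightly above 1 before arccos.
    return np.degrees(2.0 * np.arccos(min(1.0, abs(beta[0]))))
```

Taking the absolute value of the scalar part makes the metric insensitive to the $q \leftrightarrow -q$ ambiguity of unit quaternions.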
7.1. Pose Estimation

Three representative scenarios are selected from the CNN test dataset for a preliminary evaluation of the Single-stack Hourglass performance. These scenarios were chosen in order to analyze different heatmaps' distributions around the detected features. A comparison is made between the proposed CEPPnP and the EPPnP. Figure 10 shows the characteristics of the covariance matrices derived from the predicted heatmaps. Here, the ratio between the minimum and maximum eigenvalues of the associated covariances is represented against the ellipse's area and the RMSE between the Ground Truth (GT) and the $x, y$ coordinates of the extracted features,

$$E_{RMSE,i} = \sqrt{(x_{GT,i} - x_i)^2 + (y_{GT,i} - y_i)^2}.$$   (48)
Notably, interesting relations can be established between the three quantities reported in the figure. In the first scenario, the correlation between the sub-pixel RMSE and the large eigenvalues ratio suggests that a very accurate CNN detection can be associated with circular-shaped heatmaps. Moreover, the relatively low ellipses' areas indicate that, in general, small heatmaps are expected for an accurate detection. Conversely, in the second scenario the larger ellipses' area correlates with a larger RMSE. Furthermore, it can be seen that the largest difference between the $x$- and $y$-components of the RMSE occurs either for the most eccentric heatmap (ID 13) or for the one with the largest area (ID 6). The same behaviour can be observed in the last scenario, where the largest RMSE coincides with a large, highly eccentric heatmap.

Table 3 lists the pose estimation results for the three scenarios. As anticipated in Figure 10, the statistical information derived from the heatmaps in the first scenario is uniform across all the features, due to the very accurate CNN detection. As a result, the inclusion of the features covariance in the CEPPnP solver does not help refining the estimated pose: both solvers are characterized by the same pose accuracy.
Figure 10: Characteristics of the ellipses derived from the covariance matrices for the three selected scenarios: (a) Test scenario 1, (b) Test scenario 2, (c) Test scenario 3.

Table 3: Single-stack Hourglass Pose Estimation performance results for the three representative scenarios.

Metric      Scenario   CEPPnP              EPPnP
E_T [m]     1          [0.18 0.22 0.24]    [0.17 0.22 0.24]
            2          [0.35 0.41 0.59]    [0.14 0.4 22.8]
            3          [0.49 0.12 1.41]    [0.56 0.16 5.01]
E_R [deg]   1          0.36                0.35
            2          0.75                6.08
            3          1.99                2.72

Not surprisingly, the situation changes as soon as the heatmaps are not uniform across the feature IDs. Due to its capability of accommodating feature uncertainties in the estimation, the CEPPnP method outperforms the EPPnP in the remaining scenarios. In other words, the CEPPnP solver proves to be more robust against inaccurate CNN detections by accounting for a reliable representation of the features covariance.
Next, the previous comparison is extended to the entire test dataset as well as to HRNet, by computing the mean and standard deviation of the estimated relative position and attitude as a function of the relative range. This is represented in Figures 11-12. First of all, it can be seen that the pose accuracy of the CEPPnP solver in the Single-stack Hourglass scenario does not improve compared to the EPPnP, as opposed to the ideal behaviour reported in Table 3. There are two potential causes of this behaviour. On the one hand, most of the test images characterized by a large RMSE (Figure 8) might not return statistically-meaningful heatmaps that would help the CEPPnP solver. This could be due to multiple heatmaps or to highly inaccurate detections in which two different corners are confused with each other. On the other hand, this could be a direct consequence of the large relative ranges considered in this work. As already reported by Park et al. [26] and Sharma and D'Amico [31], a decreasing performance of EPPnP is indeed expected for increasing relative distances, due to the nonlinear relation between the pixel location of the detected features and $z^C$ in Eq. 2. In other words, relatively large pixel errors could lead to inaccurate pose estimates at large relative distances, independently of the use of either CEPPnP or EPPnP.
Furthermore, it can be seen from a different comparison level that both the mean and standard deviation of the estimated relative pose are improved when HRNet is used prior to the PnP solver (Figures 11b-12b). Again, this is a direct consequence of the smaller RMSE reported in Figure 8. As a result, the above-mentioned degradation of the pose estimation accuracy for increasing relative ranges is less critical for HRNet.

Figure 11: Pose Estimation Results - The standard deviation of the position error $E_T$ is depicted as the length of each error bar above and below the mean error $E_T$: (a) Single-stack Hourglass, (b) HRNet.

Notice also that, despite an actual improvement of CEPPnP over EPPnP in the HRNet scenario, the improvements in both the mean and standard deviation of the estimation error are relatively small at large relative distances. This is considered to be related to the fact that HRNet returns circular heatmaps for most of the detected features, due to its higher detection accuracy compared to the Single-stack Hourglass.
Notably, it is important to assess how well the pose estimation system can scale when tested on datasets different from the Envisat one. To this aim, the proposed heatmaps-based scheme was benchmarked on the SPEED dataset, in order to compare its pose accuracy against standard as well as CNN-based systems [19, 26, 25]. The reader is referred to Barad [38, p. 115] for a comprehensive quantitative analysis of this comparison. The results demonstrated that the performance of the proposed pipeline, based on extracting feature heatmaps and using the CEPPnP solver, compares well with state-of-the-art pose estimation systems.
Figure 12: Pose Estimation Results - The standard deviation of the attitude error $E_R$ is depicted as the length of each error bar above and below the mean error $E_R$: (a) Single-stack Hourglass, (b) HRNet.

7.2. Navigation Filter

To assess the performance of the proposed MEKF, a rendezvous scenario with Envisat is rendered in Cinema 4D©. This is a perturbation-free VBAR trajectory characterized by a relative velocity $\|v\| = 0$ m/s. The Envisat performs a roll rotation of $\|\omega\| = 5$ deg/s, with the servicer camera frame aligned with the LVLH frame. Table 4 lists the initial conditions of the trajectory, whereas
Figure 13 shows some of the associated rendered 2D images. It is assumed that the images are made available to the filter every 2 seconds for the measurement update step, with the propagation step running at 1 Hz. In both scenarios, the MEKF is initialized with the CEPPnP pose solution at time $t_0$. The other elements of the initial state vector are randomly chosen assuming a standard deviation of 1 mm/s and 1 deg/s on all the axes of the terms $(\hat{v}_0 - v)$ and $(\hat{\omega}_0 - \omega)$, respectively. Table 5 reports the initial conditions of the filter.

Table 4: VBAR approach scenario. The attitude is represented in terms of ZYX Euler angles for clarity. Note that the camera boresight is the Y-axis of the LVLH frame.

θ0 [deg]          ω0 [deg/s]           t_C0 [m]      v0 [mm/s]
[-180 30 -80]T    [-2.5 -4.3 0.75]T    [0 150 0]T    [0 0 0]T
Figures 14-15 show the convergence profiles for the translational and rotational states in the tightly- and loosely-coupled MEKF, respectively. Besides, a Monte Carlo simulation with 1,000 runs was performed to assess the robustness of the filter estimate against variations of the initial state $\hat{x}_0$. Table 6 lists the standard deviation chosen for the deviation from the true initial state of the filter. The
Figure 13: Montage of the selected VBAR approach scenario. Images are shown every 6 s for
clarity.
Table 5: Initial state vector in the MEKF. Here, HG refers to the Single-stack Hourglass architecture.

CNN     θ̂0 [deg]                ω̂0 [deg/s]          t̂_C0 [m]             v̂0 [mm/s]
HG      [-180.2 28.7 -80.6]T    [-2.1 -4.1 0.1]T    [0.1 149.7 0.1]T     [2.8 -1.3 3]T
HRNet   [-179.5 33.5 -79.7]T    [-2.1 -4.1 0.1]T    [-0.1 149.8 -0.1]T   [2.8 -1.3 3]T

Table 6: Standard deviation of Monte Carlo variables.

σφ0 [deg]   σω0 [deg/s]   σt_C0 [m]    σv0 [mm/s]
10          1             [1 10 1]T    10
distribution follows a Gaussian profile with true-state mean. For the initial attitude error, the initial reference quaternion $q_{ref_0}$ is perturbed by introducing a random angular error around the correct Euler axis [39, p. 44],

$$\phi_0 = \Delta\phi_0\, q_v$$   (49)

$$q_{ref_0} = q_0 \otimes \begin{bmatrix} 1 \\ \frac{1}{2}\phi_0 \end{bmatrix}.$$   (50)
Table 7 reports the mean of the steady-state pose estimates together with their standard deviation. From these results, important insights can be gained at two different levels of the comparison.

Table 7: Monte Carlo Simulation Results. The mean and standard deviation of the relative pose errors are taken from the absolute errors at filter steady state for each Monte Carlo run.

Single-stack Hourglass
MEKF              E_T1 [m]           E_T2 [m]        E_T3 [m]         E_R [deg]
Tightly-coupled   0.1182 ± 6.3E-4    0.03 ± 0.0024   0.096 ± 5.4E-4   1.33 ± 0.03
Loosely-coupled   0.1 ± 2E-7         0.33 ± 3E-7     0.01 ± 1E-4      4.7 ± 12.6

HRNet
MEKF              E_T1 [m]           E_T2 [m]        E_T3 [m]         E_R [deg]
Tightly-coupled   0.0683 ± 5.2E-4    0.03 ± 0.014    0.01 ± 3.4E-5    4.3 ± 0.03
Loosely-coupled   0.0075 ± 1E-4      0.36 ± 6.3E-5   0.002 ± 4E-4     0.93 ± 3.2E-7
On a CNN performance level, the results in Figure 14 show that the slightly worse cross-track estimate of the Single-stack Hourglass is compensated by a more accurate estimate of the relative attitude. Given the limited impact of these estimation errors at the relatively large inter-satellite range of 150 m, these results suggest that the Single-stack Hourglass has a performance comparable with that of HRNet for the selected scenario. Next, on a filter architecture level, a comparison between Figures 14-15 illustrates the different convergence patterns of the tightly- and loosely-coupled MEKF. Most importantly, it can be seen that the loosely-coupled estimate of the relative along-track position is characterized by a bias which is not present in the tightly-coupled estimate. This occurs due to the decoupling of the translational and rotational states, reflected in the Jacobian $H_k$ in Eq. 25. As a result, the relative position is estimated without accounting for the attitude measurements, and vice versa. In other words, the creation of pseudomeasurements of the relative pose prior to the loosely-coupled filter leads to two separate translational and rotational estimates. Conversely, in the tightly-coupled filter the full statistical information is enclosed in the detected features, and can be used to simultaneously refine both the translational and the rotational states. Moreover, a close inspection of the Single-stack Hourglass attitude estimates in Table 7 suggests that the tightly-coupled MEKF is characterized by a lower standard deviation, highlighting a better robustness with respect to the initial conditions of the filter when compared to the loosely-coupled MEKF. Note that, due to the higher accuracy of HRNet in the feature detection step, and hence also in the pose estimation step, this is not observed for the latter CNN. In conclusion, a tightly-coupled architecture is expected to return higher pose accuracies if simplified CNNs, such as the proposed Single-stack Hourglass, are implemented at the feature detection level.
8. Conclusions and Recommendations

This paper introduces a novel framework to estimate the relative pose of an uncooperative target spacecraft with a single monocular camera onboard a servicer spacecraft. A method is proposed in which a CNN-based IP algorithm is combined with a CEPPnP solver and a tightly-coupled MEKF to return a robust estimate of the relative pose as well as of the relative translational and rotational velocities. The performance of the proposed method is evaluated at different levels of the pose estimation system, by comparing the detection accuracy of two different CNNs (feature detection step and pose estimation step) whilst assessing the accuracy and robustness of the selected tightly-coupled filter against a loosely-coupled filter (navigation filter step).
Figure 14: Navigation Filter Results - Tightly-coupled MEKF (Euler axis-angle error vs. number of orbits, Single-stack Hourglass and HRNet). The dashed lines represent the 1σ of the estimated quantities.
Figure 15: Navigation Filter Results - Loosely-coupled MEKF (Euler axis-angle error vs. number of orbits, Single-stack Hourglass and HRNet). The dashed lines represent the 1σ of the estimated quantities.
The main novelty of the proposed CNN-based pose estimation system is the introduction of a heatmaps-derived covariance representation of the detected features and the exploitation of this information in a tightly-coupled, Single-stack Hourglass-based MEKF. On a feature detection level, the performance of the proposed Single-stack Hourglass is compared to the more complex HRNet to assess the feasibility of a reduced-parameters CNN within the IP. Results on the selected test dataset suggest a comparable mean detection accuracy, despite a larger standard deviation of the former network. Notably, this latter aspect is found to decrease the pose estimation accuracy of the proposed CNN compared to HRNet, despite the adoption of CEPPnP to capture feature uncertainty. However, important insights are gained at a navigation filter level, delineating two major benefits of the proposed tightly-coupled MEKF. First of all, the capability of deriving a measurements covariance matrix directly from the CNN heatmaps allows a more representative statistical distribution of the measurements to be captured in the filter. Notably, this is expected to be a more complex task if a loosely-coupled filter is used, due to the need to convert the heatmaps distribution into a pose estimation uncertainty through a linear transformation. Secondly, the coupling between the rotational and translational states within the filter guarantees a mutual interaction which is expected to improve the global accuracy of the filter, especially in the along-track estimate. Besides, the navigation results for the selected VBAR scenario demonstrated that the proposed Single-stack Hourglass could represent a valid alternative to the more complex HRNet, provided that its larger detection uncertainty is reflected in the measurements covariance matrix. Together, these improvements suggest a promising scheme to cope with the challenging demand for robust navigation in close-proximity scenarios.
However, further work is required in several directions. First of all, more recent CNN architectures shall be investigated to assess the achievable robustness and accuracy in the feature detection step. Secondly, the impact of a reduction in the number of CNN parameters on the computational complexity shall be assessed by testing the CNNs on space-representative processors. Moreover, broader relative ranges between the servicer camera and the target spacecraft shall be considered, most importantly to allow a thorough investigation of the 3D depth perception challenges when approaching the target spacecraft with a single monocular camera. Besides, more close-proximity scenarios shall be recreated to assess the impact of perturbations on the accuracy and robustness of the navigation filter. In this context, other navigation filters, such as the Unscented Kalman Filter, shall be investigated.
Acknowledgments

This study is funded and supported by the European Space Agency and Airbus Defence and Space under the Network/Partnering Initiative (NPI) program with grant number NPI 577 - 2017. The first author would like to thank Christoph Haskampf for the rendering of the images in Cinema 4D© and Kuldeep Barad for the adaptation of HRNet to the Envisat scenario.
References660
[1]
A. Tatsch, N. Fitz-Coy, S. Gladun, On-orbit Servicing: A brief survey,
in: Proceedings of the 2006 Performance Metrics for Intelligent Systems
Workshop, 2006, pp. 21–23.
[2]
M. Wieser, H. Richard, G. Hausmann, J.-C. Meyer, S. Jaekel, M. Lavagna,
R. Biesbroek, e.deorbit mission: OHB debris removal concepts, in: ASTRA
665
2015-13th Symposium on Advanced Space Technologies in Robotics and
Automation, Noordwijk, The Netherlands, 2015.
[3]
J. Davis, H. Pernicka, Proximity operations about and identification of non-
cooperative resident space objects using stereo imaging, Acta Astronautica
155 (2019) 418–425.670
40
[4]
V. Pesce, M. Lavagna, R. Bevilacqua, Stereovision-based pose and inertia
estimation of unknown and uncooperative space objects, Advances in Space
Research 59 (2017) 236–251.
[5]
R. Opromolla, G. Fasano, G. Rufino, M. Grassi, Uncooperative pose
estimation with a LIDAR-based system, Acta Astronautica 110 (2015)
675
287–297.
[6]
S. Segal, P. Gurfil, K. Shahid, In-Orbit Tracking of Resident Space Objects:
A Comparison of Monocular and Stereoscopic Vision, IEEE Transactions
on Aerospace and Electronic Systems 50 (2014) 676–688.
[7]
S. Sharma, J. Ventura, S. D’Amico, Robust Model-Based Monocular
680
Pose Initialization for Noncooperative Spacecraft Rendezvous, Journal of
Spacecraft and Rockets 55 (2018) 1–16.
[8]
L. Pasqualetto Cassinis, R. Fonod, E. Gill, Review of the Robustness and
Applicability of Monocular Pose Estimation Systems for Relative Navigation
with an Uncooperative Spacecraft, Progress in Aerospace Sciences 110
685
(2019).
[9]
Lepetit, F. Moreno-Noguer, P. Fua, EPnP: an accurate O(n) solution to the
PnP problem, International Journal of Computer Vision 81 (2009) 155–166.
[10]
L. Ferraz, X. Binefa, F. Moreno-Noguer, Very Fast Solution to the
PnP Problem with Algebraic Outlier Rejection, in: IEEE Conference
690
on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014.
doi:10.1109/CVPR.2014.71.
[11]
A. Ostrowsky, Solution of Equations and Systems of Equations, 2 ed.,
Academic Press, New York, 1966.
[12]
S. Urban, J. Leitloff, S. Hinz, Mlpnp: A rel-time maximum likelihood
695
solution to the perspective-n-point problem, Proceedings of the British
Machine Vision Conference 3 (2016).
41
[13]
L. Ferraz, X. Binefa, F. Moreno-Noguer, Leveraging Feature Uncertainty in
the PnP Problem, in: Proceedings of the British Machine Vision Conference,
Nottingham, UK, 2014. doi:10.5244/C.28.83.700
[14]
J. Cui, C. Min, X. Bai, J. Cui, An Improved Pose Estimation Method
Based on Projection Vector with Noise Error Uncertainty, IEEE Photonics
Journal 11 (2019).
[15]
A. Harvard, V. Capuano, E. Shao, S.-J. Chung, Spacecraft Pose Estima-
tion from Monocular Images Using Neural Network Based Keypoints and
705
Visibility Maps, in: AIAA Scitech 2020 Forum, Orlando, FL, USA, 2020.
doi:10.1109/AERO.2018.8396425.
[16] M. Kisantal, S. Sharma, T. Park, D. Izzo, M. Martens, S. D’Amico, Satellite Pose Estimation Challenge: Dataset, Competition Design and Results, IEEE Transactions on Aerospace and Electronic Systems (2020).
[17]
D. Rondao, N. Aouf, Multi-View Monocular Pose Estimation for Spacecraft
Relative Navigation, in: 2018 AIAA Guidance, Navigation, and Control
Conference, Kissimmee, FL, USA, 2018. doi:10.2514/6.2018-2100.
[18] V. Capuano, S. Alimo, A. Ho, S. Chung, Robust Features Extraction for On-board Monocular-based Spacecraft Pose Acquisition, in: AIAA Scitech 2019 Forum, San Diego, CA, USA, 2019. doi:10.2514/6.2019-2005.
[19] S. Sharma, C. Beierle, S. D’Amico, Pose Estimation for Non-Cooperative Spacecraft Rendezvous using Convolutional Neural Networks, in: IEEE Aerospace Conference, Big Sky, MT, USA, 2018. doi:10.1109/AERO.2018.8396425.
[20] S. Sharma, S. D’Amico, Pose Estimation for Non-Cooperative Spacecraft Rendezvous using Neural Networks, in: 29th AAS/AIAA Space Flight Mechanics Meeting, Ka’anapali, HI, USA, 2019.
[21] J. Shi, S. Ulrich, S. Ruel, CubeSat Simulation and Detection using Monocular Camera Images and Convolutional Neural Networks, in: 2018 AIAA Guidance, Navigation, and Control Conference, Kissimmee, FL, USA, 2018. doi:10.2514/6.2018-1604.
[22] S. Sonawani, R. Alimo, R. Detry, D. Jeong, A. Hess, H. Ben Amor, Assistive relative pose estimation for on-orbit assembly using convolutional neural networks, in: AIAA Scitech 2020 Forum, Orlando, FL, USA, 2020.
[23] G. Pavlakos, X. Zhou, A. Chan, K. Derpanis, K. Daniilidis, 6-DoF Object Pose from Semantic Keypoints, in: IEEE International Conference on Robotics and Automation, 2017.
[24]
A. Newell, K. Yang, J. Deng, Stacked hourglass networks for human pose
estimation, in: B. Leibe, J. Matas, N. Sebe, M. Welling (Eds.), Computer
Vision - ECCV 2016, volume 9912, Springer, Cham, 2016, pp. 483–499.
[25] B. Chen, J. Cao, A. Parra, T. Chin, Satellite Pose Estimation with Deep Landmark Regression and Nonlinear Pose Refinement, in: International Conference on Computer Vision, Seoul, South Korea, 2019.
[26]
T. Park, S. Sharma, S. D’Amico, Towards Robust Learning-Based Pose
Estimation of Noncooperative Spacecraft, in: AAS/AIAA Astrodynamics
Specialist Conference, Portland, ME, USA, 2019.
[27] S. D’Amico, M. Benn, J. Jorgensen, Pose Estimation of an Uncooperative Spacecraft from Actual Space Imagery, International Journal of Space Science and Engineering 2 (2014) 171–189.
[28] K. Sun, B. Xiao, D. Liu, J. Wang, Deep high-resolution representation learning for human pose estimation, in: 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019.
[29] L. Pasqualetto Cassinis, R. Fonod, E. Gill, I. Ahrns, J. Gil Fernandez, Comparative Assessment of Image Processing Algorithms for the Pose Estimation of an Uncooperative Spacecraft, in: International Workshop on Satellite Constellations & Formation Flying, Glasgow, UK, 2019.
[30] L. Pasqualetto Cassinis, R. Fonod, E. Gill, I. Ahrns, J. Gil Fernandez, CNN-Based Pose Estimation System for Close-Proximity Operations Around Uncooperative Spacecraft, in: AIAA Scitech 2020 Forum, Orlando, FL, USA, 2020. doi:10.2514/6.2020-1457.
[31] S. Sharma, S. D’Amico, Reduced-Dynamics Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Monocular Vision, in: 38th AAS Guidance and Control Conference, Breckenridge, CO, USA, 2017.
[32] H. Curtis, Orbital Mechanics for Engineering Students, Elsevier, 2005.
[33] M. A. Fischler, R. Bolles, Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography, Communications of the ACM 24 (1981) 381–395.
[34]
O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks
for biomedical image segmentation, in: Medical Image Computing and
Computer-Assisted Intervention, Springer, 2015, pp. 234–241.
[35] D. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, in: 3rd International Conference on Learning Representations, San Diego, CA, USA, 2015.
[36]
B. Tweddle, A. Saenz-Otero, Relative Computer Vision Based Navigation for
Small Inspection Spacecraft, Journal of Guidance, Control, and Dynamics
38 (2015) 969–978.
[37] E. Lefferts, F. Markley, M. Shuster, Kalman Filtering for Spacecraft Attitude Estimation, Journal of Guidance, Control, and Dynamics 5 (1982) 417–429.
[38] K. Barad, Robust Navigation Framework for Proximity Operations around Uncooperative Spacecraft, MSc Thesis, Delft University of Technology, Delft, The Netherlands, 2020.
[39] J. Solà, Quaternion kinematics for the error-state Kalman filter, arXiv e-prints (2017) arXiv:1711.02508.
... As already mentioned in Section 1, CNNs are currently emerging as a promising feature extraction method. This is mostly due to the capability of their convolutional layers to extract high-level features of objects with improved robustness against image noise and illumination conditions [15]. Referring to Figure 9, the first essential step of keypoint-based CNN systems is an Object Detection Network (e.g. ...
... Following the promising results presented by the authors in [15], the CEPPnP method [4] is selected to estimate the relative pose from the detected features. In this method, the CNN heatmaps around the detected features are exploited to derive feature covariance matrices and capture the statistical distribution of the detected features. ...
... In other words, the CEPPnP position estimate is characterized by a more accurate mean µ and a smaller MAD, indicating that the estimation performance is improved in those scenarios for which the CNN detections are less accurate. These considerations confirm the results reported by the authors in earlier works [15]. ...
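The heatmap-to-covariance step described in these excerpts can be illustrated with a short sketch. This is not the authors' implementation; it assumes a non-negative CNN heatmap and estimates the keypoint location and its 2×2 covariance from the heatmap's first and second weighted moments (the function name is illustrative):

```python
import numpy as np

def heatmap_covariance(heatmap):
    """Estimate a keypoint location and its 2x2 covariance from a CNN heatmap.

    Treats the (non-negative) heatmap as an unnormalized probability
    distribution over pixel coordinates and computes its first and
    second weighted moments.
    """
    h = np.clip(heatmap, 0.0, None)
    w = h / h.sum()
    ys, xs = np.mgrid[0:h.shape[0], 0:h.shape[1]]   # pixel row/column grids
    mu = np.array([(w * xs).sum(), (w * ys).sum()])  # mean keypoint (x, y)
    dx, dy = xs - mu[0], ys - mu[1]
    cov = np.array([
        [(w * dx * dx).sum(), (w * dx * dy).sum()],
        [(w * dx * dy).sum(), (w * dy * dy).sum()],
    ])
    return mu, cov
```

Feeding such per-keypoint covariances to a covariance-aware solver like CEPPnP lets poorly localized keypoints (flat, spread-out heatmaps) be down-weighted relative to sharp ones.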
Conference Paper
Full-text available
The estimation of the relative pose of an inactive spacecraft by an active servicer spacecraft is a critical task for close-proximity operations, such as In-Orbit Servicing and Active Debris Removal. Among all the challenges, the lack of available space images of the inactive satellite makes the on-ground validation of current monocular camera-based navigation systems a challenging task, mostly due to the fact that standard Image Processing (IP) algorithms, which are usually tested on synthetic images, tend to fail when implemented in orbit. This paper reports on the testing of a novel Convolutional Neural Network (CNN)-based pose estimation pipeline with realistic lab-generated 2D monocular images of the European Space Agency's Envisat spacecraft. Following the current need to bridge the reality gap between synthetic images and images acquired in space, the main contribution of this work is to test the performance of CNNs trained on synthetic datasets with more realistic images of the target spacecraft. The validation of the proposed pose estimation system is assured by the introduction of a calibration framework, which ensures an accurate reference relative pose between the target spacecraft and the camera for each lab-generated image, allowing a comparative assessment at both the keypoint detection and pose estimation level. By creating a laboratory database of the Envisat spacecraft under space-like conditions, this work further aims at facilitating the establishment of a standardized on-ground validation procedure that can be used in different lab setups and with different target satellites. The lab-representative images of the Envisat are generated at the Orbital Robotics and GNC lab of ESA's European Space Research and Technology Centre (ESTEC). The VICON Tracker System is used together with a KUKA robotic arm to respectively track and control the trajectory of the monocular camera around a scaled 1:25 mockup of the Envisat spacecraft.
... Notably, the aforementioned navigation system cannot rely on known visual markers, as they are typically not available on an uncooperative target. Since the extraction of visual features is an essential step in the pose estimation process, advanced Image Processing (IP) techniques are required to extract keypoints (or interest points), corners, and/or edges on the target body. ...
... Building on the authors' previous findings [17,30] and inspired by the promising texture randomization results presented in earlier works [28], the main objective of this paper is to investigate the impact of training data augmentation on the CNN performance on representative space imagery generated on-ground. ...
... as compared to standard IP algorithms [30]. As shown in Figure 12, this representation facilitates more accurate and precise heatmaps. ...
Article
Full-text available
The estimation of the relative pose of an inactive spacecraft by an active servicer spacecraft is a critical task for close-proximity operations, such as In-Orbit Servicing and Active Debris Removal. Among all the challenges, the lack of available space images of the inactive satellite makes the on-ground validation of current monocular camera-based navigation systems a challenging task, mostly due to the fact that standard Image Processing (IP) algorithms, which are usually tested on synthetic images, tend to fail when implemented in orbit. In response to this need to guarantee a reliable validation of pose estimation systems, this paper presents the most recent advances of ESA's GNC Rendezvous, Approach and Landing Simulator (GRALS) testbed for close-proximity operations around uncooperative spacecraft. The proposed testbed is used to validate a Convolutional Neural Network (CNN)-based monocular pose estimation system on representative rendezvous scenarios with special focus on solving the domain shift problem which characterizes CNNs trained on synthetic datasets when tested on more realistic imagery. The validation of the proposed system is en
... A high-level block diagram of the proposed architecture is provided in Fig. 2. The configuration is loosely coupled, meaning that the raw data produced by the EO sensors are processed within a separate block (i.e., outside of the filter) to get relative position and attitude measurements to be used in the correction step of the filtering schemes [30], [31]. Since the target is a known object, the inputs required by the main processing functions (red blocks) include target, chaser and robotic arm geometric information (orange blocks), as well as the chaser absolute state estimates and the robotic arm's joint sensors measurements (blue blocks). ...
Conference Paper
The potentialities of In-Orbit Servicing (IOS) to extend the operational life of satellites and the need to implement Active Debris Removal (ADR) to effectively tackle the space debris problem are well known among the space community. Research on technical solutions to enable this class of missions is thriving, also pushed by the development of new generation sensors and control systems. Among private companies, space agencies and universities, the European Space Agency (ESA) has been developing technologies in this field for decades. Several solutions have been proposed over the years to safely capture orbital objects, the majority relying on robotic systems. A promising option is the employment of an autonomous spacecraft (chaser) equipped with a highly dexterous robotic arm able to perform the berthing with a resident space object. This operation poses complex technical challenges both during the approach phase and after contact. In this respect, the design of an effective, reliable, and robust Guidance, Navigation and Control (GNC) system, for which several algorithmic architectures and hardware configurations are possible, plays a key role to ensure safe mission execution. This work presents the outcomes of a research activity performed by a consortium of universities under contract with ESA with the goal of developing the navigation and control subsystems of a GNC system for controlling a chaser equipped with a redundant manipulator. Both the final approach until capture and the target stabilization phase after capture are considered in the study. The proposed solution aims at the implementation of a combined control strategy. Robust control methods are adopted to design control laws for the uncertain, nonlinear dynamics of the chaser and of the complete chaser-target stack after capture. Visual-based solutions, i.e., relying on active/passive electro-optical sensors, are selected for relative navigation. A complete sensor suite for relative and absolute navigation is part of the GNC system, including transducers for robot joint measurements. To properly validate the proposed solutions, a complete numerical simulator has been developed. This software tool allows a thorough assessment of the system performance, accounting for all the relevant external disturbances and error sources. A realistic synthetic image generator is also used for relative navigation performance assessment. This paper presents the design solutions and the results of preliminary numerical testing, considering three mission scenarios to prove the flexibility of the solution and its applicability to a wide range of operational cases.
... The non-cooperative target pose measurement is generally performed using noncontact methods such as image sensors [4][5][6] and LiDAR [7][8][9][10][11][12]. The method proposed in this article uses line scan hybrid-solid-state lidar. ...
Article
Full-text available
On-orbit space technology is used for tasks such as the relative navigation of non-cooperative targets, rendezvous and docking, on-orbit assembly, and space debris removal. In particular, the pose estimation of space non-cooperative targets is a prerequisite for studying these applications. The capabilities of a single sensor are limited, making it difficult to achieve high accuracy in the measurement range. Against this backdrop, a non-cooperative target pose measurement system fused with multi-source sensors was designed in this study. First, a cross-source point cloud fusion algorithm was developed. This algorithm uses the unified and simplified expression of geometric elements in conformal geometry algebra, breaks the traditional point-to-point correspondence, and constructs matching relationships between points and spheres. Next, for the fused point cloud, we proposed a plane clustering-method-based CGA to eliminate point cloud diffusion and then reconstruct the 3D contour model. Finally, we used a twistor along with the Clohessy–Wiltshire equation to obtain the posture and other motion parameters of the non-cooperative target through the unscented Kalman filter. In both the numerical simulations and the semi-physical experiments, the proposed measurement system met the requirements for non-cooperative target measurement accuracy, and the estimation error of the angle of the rotating spindle was 30% lower than that of other, previously studied methods. The proposed cross-source point cloud fusion algorithm can achieve high registration accuracy for point clouds with different densities and small overlap rates.
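The Clohessy–Wiltshire equations mentioned in the abstract above admit a closed-form state-transition matrix, which is what navigation filters typically use to propagate the relative translational state between measurements. A minimal sketch, assuming a circular target orbit and the usual radial/along-track/cross-track state ordering [x, y, z, vx, vy, vz] (function name is illustrative):

```python
import numpy as np

def cw_stm(n, t):
    """Closed-form Clohessy-Wiltshire state-transition matrix.

    Propagates the relative state [x, y, z, vx, vy, vz] (radial,
    along-track, cross-track) over time t, for a target on a circular
    orbit with mean motion n [rad/s].
    """
    s, c = np.sin(n * t), np.cos(n * t)
    return np.array([
        [4 - 3*c,      0, 0,    s/n,          2*(1 - c)/n,      0],
        [6*(s - n*t),  1, 0,   -2*(1 - c)/n,  (4*s - 3*n*t)/n,  0],
        [0,            0, c,    0,            0,                s/n],
        [3*n*s,        0, 0,    c,            2*s,              0],
        [-6*n*(1 - c), 0, 0,   -2*s,          4*c - 3,          0],
        [0,            0, -n*s, 0,            0,                c],
    ])
```

For a target in low Earth orbit (n on the order of 1.1e-3 rad/s), `x_next = cw_stm(n, dt) @ x` propagates the state over one filter step; this linear model is the standard a priori dynamics in the prediction step of both EKF and UKF relative navigation schemes.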
... After SPEC, published DNN-based work has seldom considered actual rendezvous trajectories [34], continuing to focus instead on individual greyscale images of SPEED [35][36][37][38]. In either case, the proposed strategies consist in using a CNN for keypoint detection for use with PnP. ...
Preprint
This paper presents an innovative deep learning pipeline which estimates the relative pose of a spacecraft by incorporating the temporal information from a rendezvous sequence. It leverages the performance of long short-term memory (LSTM) units in modelling sequences of data for the processing of features extracted by a convolutional neural network (CNN) backbone. Three distinct training strategies, which follow a coarse-to-fine funnelled approach, are combined to facilitate feature learning and improve end-to-end pose estimation by regression. The capability of CNNs to autonomously ascertain feature representations from images is exploited to fuse thermal infrared data with red-green-blue (RGB) inputs, thus mitigating the effects of artefacts from imaging space objects in the visible wavelength. Each contribution of the proposed framework, dubbed ChiNet, is demonstrated on a synthetic dataset, and the complete pipeline is validated on experimental data.
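The CNN-keypoints-plus-PnP strategy referred to above rests on the pinhole measurement model, i.e. the function that a PnP solver inverts. A minimal sketch of that model (frame conventions, names, and the row-per-point layout are illustrative assumptions):

```python
import numpy as np

def project_keypoints(points_body, R_cb, t_c, K):
    """Pinhole camera measurement model: map 3D keypoints defined in the
    target body frame to pixel coordinates, given the body-to-camera
    rotation R_cb, the target position t_c expressed in the camera
    frame, and the camera intrinsic matrix K."""
    p_cam = points_body @ R_cb.T + t_c   # rotate into the camera frame, then translate
    uvw = p_cam @ K.T                    # apply the intrinsics
    return uvw[:, :2] / uvw[:, 2:3]      # perspective division -> (u, v) pixels
```

A PnP solver searches for the (R_cb, t_c) that minimizes the reprojection residual between these predicted pixels and the CNN-detected keypoints; in a tightly-coupled filter the same function serves as the measurement model h(x).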
Article
This article reviews the state of the art of using Artificial Intelligence (AI) for spacecraft Guidance, Navigation, and Control (GNC). Developing AI algorithms with enhanced and robust performance for GNC purposes depends heavily on the models used for GNC algorithms. This is because AI algorithms may require the designer to train them on data sets via supervised learning (SL) algorithms, and GNC algorithms can provide AI algorithms with the required training data sets; reviewing state-of-the-art GNC algorithms is therefore also mandatory. A good, up-to-date understanding of the current state of the art of GNC algorithms enables a better survey of the use of AI algorithms in spacecraft GNC. Unsupervised learning or reward-based learning algorithms could also be used for GNC purposes. The current article focuses on GNC algorithms, AI algorithms, and the interaction between them. In addition, a future vision of using AI techniques with GNC is also provided.
Article
Accurate acquisition of spacecraft pose information is a key step in tasks such as space debris removal, rendezvous and docking, and on-orbit maintenance. The application of neural networks to aerospace measurement and control has deepened. Studying the influence of imaging and neural-network parameters on the accuracy of spacecraft pose estimation is therefore necessary and urgent, but there is still little research in related fields. Approximate Bayesian inference in the deep Gaussian process is utilized in this paper to explore the above problems, and systematic verification experiments are carried out based on a general neural network and a spacecraft pose estimation data set. The verification results show that the estimation errors of position and attitude can both be effectively reduced by selecting a camera whose radial distortion error is less than 240%. Selecting a camera with a focal ratio error of less than 2.0% can effectively reduce the position estimation error. Selecting a camera whose principal point error is less than 1.0% can effectively improve the accuracy of attitude estimation. The suppression of tangential distortion error does not significantly improve the estimation accuracy of either position or attitude. The weighted attenuation factor and the numbers of network layers and characteristic channels of the neural network are all sensitive parameters for the spacecraft pose estimation accuracy within a specific range. An excellent balance between estimation accuracy and computational overhead can be achieved when the weight attenuation factor is set between 0.01 and 0.1 and the numbers of network layers and characteristic channels are set between 20–40 and 70–90, respectively. Using a ResNet34 architecture with a weight attenuation factor of 0.01 and 90 characteristic channels, the method reaches the accuracy of traditional pose estimation algorithms and has better processing-time stability in a real ground test system.
Article
Full-text available
Reliable pose estimation of uncooperative satellites is a key technology for enabling future on-orbit servicing and debris removal missions. The Kelvins Satellite Pose Estimation Challenge aims at evaluating and comparing monocular vision-based approaches and pushing the state of the art on this problem. This work is based on the Satellite Pose Estimation Dataset, the first publicly available machine learning data set of synthetic and real spacecraft imagery. The choice of dataset reflects one of the unique challenges associated with spaceborne computer vision tasks, namely the lack of spaceborne images to train and validate the developed algorithms. This work briefly reviews the basic properties and the collection process of the dataset, which was made publicly available. The competition design, including the definition of performance metrics and the adopted testbed, is also discussed. The main contribution of this paper is the analysis of the submissions of the 48 competitors, which compares the performance of different approaches and uncovers what factors make the satellite pose estimation problem especially challenging.
Conference Paper
Full-text available
A novel method for monocular-based pose estimation of uncooperative spacecraft using keypoints specialized for a given target is presented. A set of robust keypoints are created by examining the effectiveness of existing localization algorithms by simulating and testing different perspectives. The feature extraction and matching is used to build a model of the spacecraft before the flight mission using the same feature extraction algorithms that can be used during the mission. Further, a visibility map is determined for each keypoint to aid in outlier filtering, matching, and measurement covariance estimation. For initialization and matching, a Convolutional Neural Network (CNN) is trained to generate descriptors robust to illumination, scale, and affine changes for the pre-computed keypoints. In the second part of the paper, we focus on pose determination and filtering after keypoint-to-model matching. While several approaches for pose acquisition have been formulated, we propose a novel method for tracking that makes use of a nonlinear filter, based on the spacecraft translational and rotational relative dynamics which estimates the covariance of the vision-based observations using the keypoint preprocessing information. Further, the estimated propagated covariance for each extracted feature is used for aiding the feature matching.
Conference Paper
Full-text available
This paper introduces a novel framework which combines a Convolutional Neural Network (CNN) for feature detection with a Covariant Efficient Procrustes Perspective-n-Points (CEPPnP) solver and an Extended Kalman Filter (EKF) to enable robust monocular pose estimation for close-proximity operations around an uncooperative spacecraft. The relative pose estimation of an inactive spacecraft by an active servicer spacecraft is a critical task in the design of current and planned space missions, due to its relevance for close-proximity operations, such as In-Orbit Servicing and Active Debris Removal. The main contribution of this work lies in deriving statistical information from the Image Processing step, by associating a covariance matrix to the heatmaps returned by the CNN for each detected feature. This information is included in the CEPPnP to improve the accuracy of the pose estimation step during filter initialization. The derived measurement covariance matrix is used in a tightly-coupled EKF to allow an enhanced representation of the measurement errors from the feature detection step. This increases the filter robustness in case of inaccurate CNN detections. The proposed method is capable of returning reliable estimates of the relative pose as well as of the relative translational and rotational velocities, under adverse illumination conditions and partial occultation of the target. Synthetic 2D images of the European Space Agency's Envisat spacecraft are used to generate datasets for training, validation and testing of the CNN. Likewise, the images are used to recreate representative close-proximity scenarios for the validation of the proposed method.
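The filters described in this line of work maintain the attitude multiplicatively, in line with the MEKF literature in the reference list above ([37], [39]). A minimal sketch of the attitude reset step, assuming a scalar-first quaternion convention [w, x, y, z] (function names are illustrative):

```python
import numpy as np

def quat_mul(q, p):
    """Hamilton product of two quaternions in [w, x, y, z] order."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = p
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def mekf_attitude_reset(q_ref, delta_theta):
    """Multiplicative attitude update: fold the estimated small-angle
    attitude error delta_theta (3-vector, rad) into the reference
    quaternion and renormalize, as in the classic MEKF formulation."""
    dq = np.concatenate(([1.0], 0.5 * np.asarray(delta_theta)))  # small-angle error quaternion
    q = quat_mul(q_ref, dq)
    return q / np.linalg.norm(q)
```

After each measurement update, the estimated three-component attitude error is folded into the reference quaternion this way and the error state is reset to zero, which keeps the filter's attitude representation unit-norm and free of the singularities of minimal parameterizations.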
Conference Paper
Full-text available
This paper reports on a comparative assessment of Image Processing (IP) techniques for the relative pose estimation of uncooperative spacecraft with a monocular camera. Currently, keypoints-based algorithms suffer from partial occlusion of the target, as well as from the different illumination conditions between the required offline database and the query space image. Besides, algorithms based on corners/edges detection are highly sensitive to adverse illumination conditions in orbit. An evaluation of the critical aspects of these two methods is provided with the aim of comparing their performance under changing illumination conditions and varying views between the camera and the target. Five different keypoints-based methods are compared to assess the robustness of feature matching. Furthermore, a method based on corners extraction from the lines detected by the Hough Transform is proposed and evaluated. Finally, a novel method, based on an hourglass Convolutional Neural Network (CNN) architecture, is proposed to improve the robustness of the IP during partial occlusion of the target as well as during feature tracking. It is expected that the results of this work will help assess the robustness of keypoints-based, corners/edges-based, and CNN-based algorithms within the IP prior to the relative pose estimation.
Article
Full-text available
The relative pose estimation of an inactive target by an active servicer spacecraft is a critical task in the design of current and planned space missions, due to its relevance for close-proximity operations, e.g., rendezvous with space debris and/or in-orbit servicing. Pose estimation systems based solely on a monocular camera are recently becoming an attractive alternative to systems based on active sensors or stereo cameras, due to their reduced mass, power consumption and system complexity. In this framework, a review of the robustness and applicability of monocular systems for the pose estimation of an uncooperative spacecraft is provided. Special focus is put on the advantages of multispectral monocular systems as well as on the improved robustness of novel image processing schemes and pose estimation solvers. The limitations and drawbacks of the validation of current pose estimation schemes with synthetic images are further discussed, together with the critical trade-offs for the selection of visual-based navigation filters. The state-of-the-art techniques are analyzed in order to provide an insight into the limitations involved under adverse illumination and orbit scenarios, high image contrast, background noise, and low signal-to-noise ratio, which characterize actual space imagery, and which could jeopardize the image processing algorithms and affect the pose estimation accuracy as well as the navigation filter's robustness. Specifically, a comparative assessment of current solutions is given at different levels of the pose estimation process, in order to bring a novel and broad perspective as compared to previous works.
Article
Full-text available
Aiming at the problem of the anomalous and non-independent distribution of image errors in feature-based visual pose estimation, a monocular visual pose estimation method based on the noise error uncertainty established by a projection vector is proposed. First, by using a covariance matrix to describe the uncertainty of the feature point direction and integrating this uncertainty into the pose estimation, feature point measurement errors with different degrees of directional uncertainty can be accommodated, which makes the algorithm robust. Then, by introducing the projection vector and combining the depth information of each feature point to represent the collinearity error, the model nonlinearity caused by the camera's perspective projection can be eliminated, which gives the algorithm global convergence. Finally, the global convergence theorem is used to prove the global convergence of the proposed algorithm. The results show that the proposed method has good robustness and convergence while adapting to different degrees of error uncertainty, and can meet the needs of practical engineering applications.
Conference Paper
Full-text available
This paper presents the design, implementation, and validation of a robust feature extraction architecture for real-time on-board monocular vision-based pose initialization of a target spacecraft in application to on-orbit servicing and formation flying. The proposed computer vision algorithm is designed to detect the most significant features of an uncooperative target spacecraft in a sequence of two-dimensional input images that are collected on board the chaser spacecraft. A novel approach based on the fusion of multiple and parallel processing streams is proposed to filter a minimum number of extracted true point features, even in case of unfavourable illumination conditions and in presence of Earth in the background. These are then combined into relevant polyline structures that characterize the true geometrical shape of the target spacecraft.