CNN-Based Pose Estimation System for Close-Proximity
Operations Around Uncooperative Spacecraft
L. Pasqualetto Cassinis, R. Fonod, and E. Gill
Delft University of Technology, Kluyverweg 1, 2629 HS, Delft, The Netherlands
I. Ahrns
Airbus DS GmbH, Airbusallee 1, 28199, Bremen, Germany
J. Gil Fernandez
ESTEC, Keplerlaan 1, 2201 AZ, Noordwijk, The Netherlands
This paper introduces a novel framework which combines a Convolutional Neural Network (CNN) for feature detection with a Covariant Efficient Procrustes Perspective-n-Points (CEPPnP) solver and an Extended Kalman Filter (EKF) to enable robust monocular pose estimation for close-proximity operations around an uncooperative spacecraft. The relative pose estimation of an inactive spacecraft by an active servicer spacecraft is a critical task in the design of current and planned space missions, due to its relevance for close-proximity operations such as In-Orbit Servicing and Active Debris Removal. The main contribution of this work lies in deriving statistical information from the Image Processing step, by associating a covariance matrix with the heatmaps returned by the CNN for each detected feature. This information is included in the CEPPnP to improve the accuracy of the pose estimation step during filter initialization. The derived measurement covariance matrix is used in a tightly-coupled EKF to provide an enhanced representation of the measurement error from the feature detection step. This increases the filter robustness in the case of inaccurate CNN detections. The proposed method is capable of returning reliable estimates of the relative pose as well as of the relative translational and rotational velocities, under adverse illumination conditions and partial occultation of the target. Synthetic 2D images of the European Space Agency's Envisat spacecraft are used to generate datasets for training, validation, and testing of the CNN. Likewise, the images are used to recreate representative close-proximity scenarios for the validation of the proposed method.
I. Introduction
Nowadays, key Earth-based applications such as remote sensing, navigation, and telecommunication rely on satellite technology on a daily basis. To ensure a high reliability of these services, the safety and operations of satellites in orbit have to be guaranteed. In this context, advancements in the field of Guidance, Navigation, and Control (GNC) were made in the past years to cope with the challenges involved in In-Orbit Servicing (IOS) and Active Debris Removal (ADR) missions [1, 2]. For such scenarios, the estimation of the relative position and attitude (pose) of an uncooperative
spacecraft by an active servicer spacecraft represents a critical task. Pose estimation systems based solely on a monocular camera have recently become an attractive alternative to systems based on active sensors or stereo cameras, due to their reduced mass, power consumption, and system complexity [3]. However, given the low Signal-to-Noise Ratio (SNR) and the high contrast which characterize space images, a significant effort is still required to comply with most of the demanding requirements for a robust and accurate monocular-based navigation system. Interested readers are referred to Ref. [4] for a recent overview of the current trends in monocular-based pose estimation systems. Notably, the pose estimation problem is in this case complicated by the fact that the target satellite is uncooperative, namely assumed to be not functional and/or not able to aid the relative navigation. Above all, the navigation system cannot rely
PhD Candidate, Department of Space Engineering, L.PasqualettoCassinis@tudelft.nl
Assistant Professor, Department of Space Engineering, R.Fonod@tudelft.nl
Professor, Department of Space Engineering, E.K.A.Gill@tudelft.nl
Engineer, Space Robotics Projects, ingo.ahrns@airbus.com
Engineer, TEC-ECN, Jesus.Gil.Fernandez@esa.int
on known visual markers, as they are typically not installed on an uncooperative target. Since the extraction of visual
features is an essential step in the pose estimation process, advanced Image Processing (IP) techniques are required to
extract keypoints (or interest points), corners, and/or edges on the target body. In model-based methods, the detected
features are then matched with pre-defined features on an offline wireframe 3D model of the target to solve for the
relative pose. In other words, a reliable detection of key features under adverse orbital conditions is highly desirable
to guarantee safe operations around an uncooperative spacecraft. Moreover, it would be beneficial from a different
standpoint to obtain a model of feature detection uncertainties. This would provide the navigation system with additional
statistical information about the measurements, which could in turn improve the robustness of the entire estimation process.
Unfortunately, standard pose estimation solvers such as the Efficient Perspective-n-Point (EPnP) [5], the Efficient Procrustes Perspective-n-Point (EPPnP) [6], or the multi-dimensional Newton-Raphson Method (NRM) [7] do not have the capability to include feature uncertainties. Only recently, a Covariant EPPnP (CEPPnP) solver was introduced to exploit statistical information by including feature covariances in the pose estimation [8]. The authors proposed a method for computing the covariance which takes different camera poses to create a fictitious distribution around each detected keypoint. Furthermore, other authors [9] proposed an improved pose estimation method based on projection vectors, in which the covariance is associated with the image gradient magnitude and direction at each feature location. However, in both methods the derivation of the feature covariance matrices cannot be directly related to the actual detection uncertainty. In this context, Convolutional Neural Networks (CNNs) could be exploited to return relevant statistical information about the detection step. This could in turn provide a reliable representation of the detection uncertainty.
The implementation of CNNs for monocular pose estimation in space has already become an attractive solution in recent years [10-12], also thanks to the creation of the Spacecraft PosE Estimation Dataset (SPEED) [11], a database of highly representative synthetic images of PRISMA's TANGO spacecraft made publicly available by Stanford's Space Rendezvous Laboratory (SLAB) and applicable to train and test different network architectures. One of the main advantages of CNNs over standard feature-based algorithms for relative pose estimation [3, 13, 14] is an increased robustness under adverse illumination conditions, as well as a reduced computational complexity. Since the pose accuracies of the first adopted CNNs proved to be lower than the accuracies returned by common pose estimation solvers, especially in the estimation of the relative attitude [10], recent efforts investigated the capability of CNNs to perform keypoint localization prior to the actual pose estimation [15-18]. The output of these networks is a set of so-called heatmaps around pre-trained features. The coordinates of the heatmap's peak intensity characterize the predicted feature location, with the intensity and the shape indicating the confidence of locating the corresponding keypoint at this position [15]. Additionally, because the trainable features can be selected offline prior to the training, the matching of the extracted feature points with the features of the wireframe model can be performed without the large search space for image-model correspondences which usually characterizes most edge/corner-based methods [19].
To the best of the authors' knowledge, the reviewed implementations of CNNs feed solely the heatmap's peak location into the pose estimation solver, although additional information could be extracted from the detected heatmaps. Only in Ref. [15] is the pose estimation solved by assigning weights to each feature based on its heatmap's peak intensity, in order to penalize inaccurate detections. Yet, there is another aspect related to the heatmaps which has not been considered. It is in fact hardly acknowledged that the overall shape of the heatmaps returned by the CNN can be translated into a statistical distribution around the peak, yielding reliable feature covariances and, in turn, a robust navigation performance. Despite the general claim that visual-based navigation filters can work with an a-priori, overestimated, constant measurement covariance matrix, an accurate representation of the measurement uncertainty can in fact also be beneficial for the estimation of the relative state vector, which includes the relative pose as well as the relative translational and rotational velocities.
In this framework, the main contribution of this paper is to combine a CNN-based feature detector with a CEPPnP solver and an Extended Kalman Filter (EKF). The filter accounts for representative feature covariances, computed directly from the statistical distribution derived from the CNN heatmaps, and is initialized by the CEPPnP at time t_0. Specifically, the novelty of this work lies in linking the current research on CNN-based feature detection, covariance-based PnP solvers, and navigation filters for relative pose estimation. The work is driven by two main objectives:
1) To improve filter initialization by incorporating heatmap-derived covariance matrices in the CEPPnP;
2) To guarantee a robust estimation by including a representative measurement covariance matrix in the EKF.
(a) Co-moving LVLH frame [20] (b) PnP problem (Figure adapted from [3])
Fig. 1 Representation of the relative motion framework (left), and schematic of the pose estimation problem
using a monocular image (right).
The paper is organized as follows. The overall pose estimation framework is illustrated in Section II. Section III
introduces the proposed CNN architecture together with the adopted training, validation, and testing datasets. In Section
IV, special focus is given to the derivation of covariance matrices from the CNN heatmaps, whereas Section V describes
the CEPPnP solver. Section VI then provides a description of the adopted EKF. The simulation environment is
presented in Section VII together with the simulation results. Finally, Section VIII provides the main conclusions and
recommendations.
II. Pose Estimation Framework
This work considers a servicer spacecraft flying in relative motion around a target spacecraft located in a Low
Earth Orbit (LEO), with the relative motion being described in a Local Vertical Local Horizontal (LVLH) reference
frame co-moving with the servicer (Figure 1a). Furthermore, it is assumed that the servicer is equipped with a single
monocular camera. The relative attitude of the target with respect to the servicer can then be defined as the rotation
of the target body-fixed frame B with respect to the servicer camera frame C, where these frames are tied to each
spacecraft’s body. The distance between these two frames defines their relative position. Together, these two quantities
characterize the relative pose. This information can then be transferred from the camera frame to the servicer’s center of
mass by accounting for the relative pose of the camera with respect to the LVLH frame.
From a high-level perspective, a model-based monocular pose estimation system receives as input a 2D image and matches it with an existing wireframe 3D model of the target spacecraft to estimate the pose of this target with respect to the servicer camera. Referring to Figure 1b, the pose estimation problem consists in determining the position t^C of the target's centre of mass and its orientation with respect to the camera frame C, represented by the rotation matrix R_B^C. The Perspective-n-Points (PnP) equations,

r^C = [x^C \; y^C \; z^C]^T = R_B^C r^B + t^C \quad (1)

p = (u_i, v_i) = \left( \frac{x^C}{z^C} f_x + C_x, \; \frac{y^C}{z^C} f_y + C_y \right), \quad (2)

relate the unknown pose to the corresponding point p in the image plane. Here, r^B is a point in the 3D model, expressed in the body-frame coordinate system B, whereas f_x and f_y denote the focal lengths of the camera and (C_x, C_y) is the principal point of the image.
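For concreteness, Eqns. (1)-(2) can be transcribed directly into code. The following is a minimal sketch in Python/NumPy; the function and variable names are illustrative and not taken from any reference implementation:

```python
import numpy as np

def project_point(r_B, R_CB, t_C, fx, fy, Cx, Cy):
    """Project a body-frame model point r_B into the image plane
    via the PnP equations (1)-(2)."""
    # Eq. (1): rigid transformation of the model point into the camera frame C
    r_C = R_CB @ r_B + t_C
    x_C, y_C, z_C = r_C
    # Eq. (2): pinhole projection with focal lengths and principal point
    return np.array([(x_C / z_C) * fx + Cx, (y_C / z_C) * fy + Cy])
```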
Fig. 2 Functional flow of the proposed pose estimation architecture (CNN-based IP with feature detection and covariance computation, CEPPnP-based initialization, and navigation filter).

From these equations, it can already be seen that an important aspect of estimating the pose resides in the capability of the IP system to extract features p from a 2D image of the target spacecraft, which in turn need to be matched with pre-selected features r^B in the wireframe 3D model. Notably, such a wireframe model of the target needs to be made available prior to the estimation. Furthermore, it can be expected that the time variation of the relative pose plays a crucial role while navigating around the target spacecraft, e.g., if rotational synchronization with the target spacecraft is required in the final approach phase. As such, it is clear that the estimation of both the relative translational and angular velocities is an essential step beside the pose estimation itself.
The proposed architecture combines the above key ingredients in three main stages, which are shown in Figure 2 and described in more detail in the following sections. In the CNN-based IP block, a CNN is used to extract features from a 2D image of the target spacecraft. Statistical information is derived by computing a covariance matrix for each feature using the information included in the output heatmaps. In the Navigation block, both the peak locations and the covariances are fed into an EKF, which estimates the relative pose as well as the relative translational and rotational velocities. The filter is initialized by the CEPPnP block, which takes the peak location and covariance matrix of each feature as input and outputs the initial relative pose by solving the PnP problem in Eqns. (1)-(2). Thanks to the availability of a covariance matrix of the detected features, this architecture can guarantee a more accurate representation of feature uncertainties, especially in the case of inaccurate CNN detections due to adverse illumination conditions and/or unfavourable relative geometries between servicer and target. Together with the CEPPnP initialization, this aspect can yield a robust and accurate estimate of the relative pose and velocities and assure a safe approach of the target spacecraft.
III. Convolutional Neural Network
CNNs are currently emerging as a promising feature extraction method, mostly due to the capability of their convolutional layers to extract high-level features of objects with improved robustness against image noise and illumination conditions. In order to optimize CNNs for the feature extraction process, a stacked hourglass architecture has been proposed in Refs. [15, 16], and other architectures, such as the U-Net [21] or variants of the hourglass, were tested in recent years. Compared to the network proposed in Ref. [15], the adopted architecture is composed of only one encoder/decoder block, constituting a single hourglass module. The encoder includes six blocks, each comprising a convolutional layer, a batch normalization module, and a max pooling layer, whereas the six decoder blocks accommodate an up-sampling block instead of max pooling. Each convolutional layer is formed by a fixed number of filter kernels of size 3×3. In the current analysis, 128 kernels are considered per convolutional layer. Figure 3 shows the high-level architecture of the network layers, together with the corresponding input and output.
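The paper does not report the activation functions, channel progression, or output head of this network; the sketch below illustrates one plausible PyTorch realization of the described single hourglass, assuming ReLU activations, a constant width of 128 channels, and a 1×1 convolutional head that outputs one heatmap per trained feature:

```python
import torch
import torch.nn as nn

def enc_block(c_in, c_out):
    # Encoder block: 3x3 convolution + batch normalization + 2x2 max pooling
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
                         nn.MaxPool2d(2))

def dec_block(c_in, c_out):
    # Decoder block: 3x3 convolution + batch normalization + 2x up-sampling
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
                         nn.Upsample(scale_factor=2))

class SingleHourglass(nn.Module):
    def __init__(self, n_features=16, width=128):
        super().__init__()
        self.encoder = nn.Sequential(
            enc_block(1, width), *[enc_block(width, width) for _ in range(5)])
        self.decoder = nn.Sequential(
            *[dec_block(width, width) for _ in range(6)])
        self.head = nn.Conv2d(width, n_features, 1)  # one heatmap per feature

    def forward(self, x):              # x: (B, 1, 512, 512) input image
        return self.head(self.decoder(self.encoder(x)))
```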
Fig. 3 Overview of the single hourglass architecture. Downsampling is performed in the encoder stage, in which the image size is decreased after each block, whereas upsampling occurs in the decoder stage. The output of the network consists of heatmap responses and is used for keypoint localization.

Fig. 4 Illustration of the selected features for a given Envisat pose.

As already mentioned, the output of the network is a set of heatmaps around the selected features. Ideally, the
heatmap's peak intensity associated with a wrong detection should be relatively small compared to the correctly detected features, highlighting that the network is not confident about that particular wrongly-detected feature. At the same time, the heatmap's amplitude should provide an additional insight into the confidence level of each detection, a large amplitude being related to a large uncertainty in the detection. The network is trained with the x- and y-image coordinates of the feature points, computed offline based on the intrinsic camera parameters as well as on the feature coordinates in the target body frame, which were extracted from the wireframe 3D model prior to the training. During training, the network is optimized to locate 16 features of the Envisat spacecraft, consisting of the corners of the main body, the Synthetic-Aperture Radar (SAR) antenna, and the solar panel, respectively. Figure 4 illustrates the selected features for a specific target pose.
A. Training, Validation and Test
For the training, validation, and test datasets, synthetic images of the Envisat spacecraft were rendered in the Cinema 4D© software. Table 1 lists the main camera parameters adopted. Constant Sun elevation and azimuth angles of 30 degrees were chosen in order to recreate favourable as well as adverse illumination conditions. Relative distances between camera and target were chosen in the interval 90 m - 180 m. Relative attitudes were generated by discretizing the yaw, pitch, and roll angles of the target with respect to the camera by 10 degrees each. Together, these two choices were made in order to recreate several relative geometries between the servicer and the target. The resulting database was then shuffled to randomize the images, and was ultimately split into training (40,000 images), validation (4,000 images), and test (10,000 images) datasets. Figure 5 shows some of the images included in the training dataset.

Table 1 Parameters of the camera used to generate the synthetic images in Cinema 4D©.

Parameter          Value      Unit
Image resolution   512×512    pixels
Focal length       3.9·10⁻³   m
Pixel size         1.1·10⁻⁵   m

Fig. 5 A montage of eight synthetic images selected from the training set.
During training, the validation dataset is used alongside the training set to compute the validation losses and avoid overfitting. In this work, a learning rate of 0.1 is selected together with a batch size of 20 images and a total number of 20 epochs. Finally, the network performance after training is assessed with the test dataset.
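The optimizer and the loss function are not stated here; a common setup for heatmap regression, sketched below with the hyperparameters quoted above (learning rate 0.1, batch size 20, 20 epochs) and a stand-in model and dataset, would pair stochastic gradient descent with a pixel-wise mean-squared-error loss on the target heatmaps:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data and model for illustration only; in the paper the inputs are
# 512x512 renders and the model is the single-hourglass network sketched above.
images  = torch.zeros(40, 1, 64, 64)
targets = torch.zeros(40, 16, 64, 64)   # one ground-truth heatmap per feature
loader  = DataLoader(TensorDataset(images, targets), batch_size=20)
model   = torch.nn.Conv2d(1, 16, 3, padding=1)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # lr from the paper
loss_fn   = torch.nn.MSELoss()                           # assumed loss

for epoch in range(20):                                  # epochs from the paper
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    # the validation loss would be monitored here each epoch against overfitting
```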
Preliminary results on the network performance were already reported in Ref. [22]. Above all, one key advantage of relying on CNNs for feature detection was found in the capability of learning the relative position between features under the variety of relative poses present in the training. As a result, both features which are not visible due to adverse illumination and features occulted by other parts of the target can be detected. However, a challenge was identified in the specific selection of the trainable features. Since the features selected in this work represent highly symmetrical points of the Envisat spacecraft, such as corners of the solar panel, SAR antenna, or main body, the network is in some scenarios unable to distinguish between similar features, and returns multiple heatmaps for a single feature output. Figure 6 illustrates these findings.

Although improvements to the CNN architecture are expected to cope with the above challenges and return more accurate heatmaps, it is also believed that these scenarios can be properly handled if a large uncertainty can be associated with a wrong and/or inaccurate detection. This task can be performed by deriving a covariance matrix for each detected feature, in order to represent its detection uncertainty. Above all, this can prevent the pose solver and the EKF from trusting wrong detections, by relying more on the accurately detected features.
(a) Occulted spacecraft body (b) Heatmap of a single solar panel corner
Fig. 6 Robustness and challenges of feature detection with the proposed CNN. On the left-hand side, the network has been trained to recognize the pattern of the features, and can correctly locate the body features which are not visible, i.e., parts occulted by the solar panel and corners of the SAR antenna. Conversely, the right-hand side shows the detection of multiple heatmaps for a single corner of the solar panel. As can be seen, the network can have difficulties in distinguishing similar features, such as the corners of the solar panel.

IV. Covariance Computation
In order to derive from the CNN heatmaps a covariance matrix associated with each feature, the first step is to obtain a statistical population around the heatmap's peak. This is done by thresholding each heatmap image so that only the x- and y-locations of the heatmap's pixels above the threshold are extracted. Secondly, each pixel within the population is given a normalized weight w_i based on the gray intensity at its location. This is done in order to give more weight to pixels which are particularly bright and close to the peak, and less weight to pixels which are very faint and far from the peak. Finally, the obtained statistical population of each feature is used to compute the weighted covariances between x and y, and consequently the covariance matrix C_i:

C_i = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) \end{pmatrix} \quad (3)

where

\mathrm{cov}(x,y) = \sum_{i=1}^{n} w_i (x_i - p_x)(y_i - p_y). \quad (4)

In this work, the mean is replaced by the peak location p = (p_x, p_y) in order to represent a distribution around the peak of the detected feature, rather than around the heatmap's mean. This is particularly relevant when the heatmaps are asymmetric and their mean does not coincide with their peak.
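The procedure above maps directly to a few lines of code. The sketch below assumes a relative intensity threshold; the paper does not specify the threshold value:

```python
import numpy as np

def heatmap_covariance(heatmap, rel_threshold=0.1):
    """Weighted 2x2 covariance of a heatmap around its peak, Eqns. (3)-(4)."""
    # Statistical population: pixels above an (assumed) fraction of the peak
    ys, xs = np.nonzero(heatmap > rel_threshold * heatmap.max())
    w = heatmap[ys, xs].astype(float)
    w /= w.sum()                       # normalized gray-intensity weights w_i
    # The peak location replaces the mean, as for asymmetric heatmaps
    py, px = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    dx, dy = xs - px, ys - py
    cxy = np.sum(w * dx * dy)
    return np.array([[np.sum(w * dx * dx), cxy],
                     [cxy, np.sum(w * dy * dy)]])
```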
Figure 7 shows the overall flow to obtain the covariance matrix for three different heatmap shapes. The ellipse associated with each feature covariance is obtained by computing the eigenvalues λ_x, λ_y of the covariance matrix,

\left( \frac{x}{\lambda_x} \right)^2 + \left( \frac{y}{\lambda_y} \right)^2 = s \quad (5)

where s defines the scale of the ellipse and is derived from the confidence interval of interest, e.g., s = 2.2173 for a 68% confidence interval. As can be seen, different heatmaps can result in very different covariance matrices. Above all, the computed covariance can capture the different CNN uncertainty over x and y. Notice that, due to its symmetric nature, the covariance matrix can only represent normal distributions. As a result, asymmetrical heatmaps such as the one in the third scenario are approximated by Gaussian distributions, characterized by an ellipse which might overestimate the heatmap's dispersion along some directions.
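As a companion sketch, the confidence ellipse of Eqn. (5) can be recovered from the eigen-decomposition of C_i; it is assumed here that the eigenvalues play the role of variances along the principal axes, so that the semi-axes scale with the square root of s·λ:

```python
import numpy as np

def confidence_ellipse(C, s=2.2173):
    """Semi-axes and orientation of the confidence ellipse of a 2x2
    covariance C; s = 2.2173 corresponds to the 68% interval."""
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    semi_axes = np.sqrt(s * eigvals)       # scaled to the confidence level
    angle = np.arctan2(eigvecs[1, -1], eigvecs[0, -1])  # major-axis direction
    return semi_axes, angle
```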
Fig. 7 Schematic of the procedure followed to derive covariance matrices from CNN heatmaps (thresholding, weight allocation, and covariance computation). The displayed ellipses are derived from the computed covariances by assuming the confidence intervals 1σ = 0.68 and 3σ = 0.99.

V. Pose Estimation
The CEPPnP method proposed in Ref. [8] was selected to estimate the relative pose from the detected features as well as from their covariance matrices. The first step of this method is to rewrite the PnP problem in Eqns. (1)-(2) as a function of a 12-dimensional vector y containing the control point coordinates in the camera reference frame:
M y = 0 \quad (6)

where M is a known 2n×12 matrix. This is the fundamental equation of the EPnP problem [5]. The likelihood of each observed feature location u_i is then represented as

P(u_i) = k \, e^{-\frac{1}{2} \Delta u_i^T C_{u_i}^{-1} \Delta u_i} \quad (7)

where Δu_i is a small, independent, and unbiased noise with expectation E[Δu_i] = 0 and covariance E[Δu_i Δu_i^T] = σ² C_{u_i}, and k is a normalization constant. Here, σ² represents the global uncertainty in the image, whereas C_{u_i} is the 2×2 uncertainty covariance matrix of each detected feature, computed from the CNN heatmaps. After some calculations, the EPnP formulation can be rewritten as

(N − L) y = λ y. \quad (8)

This is an eigenvalue problem in which both the N and L matrices are functions of y and C_{u_i}. The problem is solved iteratively, starting from the closed-form EPPnP solution for the four control points obtained by assuming no feature uncertainty. Once y is estimated, the relative pose is computed by solving the generalized Orthogonal Procrustes problem used in the EPPnP [6].
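The uncertainty weighting of the CEPPnP is involved, but the matrix M of Eqn. (6) is the standard EPnP construction [5]: each feature contributes two rows built from its barycentric coordinates with respect to four control points. A minimal sketch of this uncertainty-free part follows; names and the control-point choice are illustrative:

```python
import numpy as np

def build_M(pts_B, uv, ctrl_B, fx, fy, Cx, Cy):
    """Assemble the 2n x 12 matrix M of Eqn. (6) (standard EPnP form).
    pts_B: (n,3) body-frame features; uv: (n,2) detected image points;
    ctrl_B: (4,3) control points chosen in the body frame."""
    # Barycentric coordinates: pts_B[i] = sum_j alpha[i, j] * ctrl_B[j]
    A = np.vstack([ctrl_B.T, np.ones(4)])
    b = np.vstack([pts_B.T, np.ones(len(pts_B))])
    alpha = np.linalg.solve(A, b).T                  # (n, 4)
    M = np.zeros((2 * len(pts_B), 12))
    for i, (u, v) in enumerate(uv):
        for j in range(4):
            a = alpha[i, j]
            M[2 * i,     3 * j:3 * j + 3] = [a * fx, 0.0, a * (Cx - u)]
            M[2 * i + 1, 3 * j:3 * j + 3] = [0.0, a * fy, a * (Cy - v)]
    return M
```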
VI. Navigation Filter
From a high-level perspective, two different navigation architectures are normally exploited in the framework of relative pose estimation: a tightly-coupled architecture, in which the extracted features are directly processed by the navigation filter as measurements, and a loosely-coupled architecture, in which the relative pose is computed prior to the navigation filter to derive pseudomeasurements from the target features [23]. Usually, a loosely-coupled approach is preferred for an uncooperative tumbling target, due to the fact that the fast relative dynamics could jeopardize feature tracking and return highly-variable measurements to the filter. However, one shortcoming of this approach is that it is generally hard to obtain a representative covariance matrix for the pseudomeasurements, which can be quite challenging when filter robustness is demanded. Remarkably, the adoption of a CNN in the feature detection step can overcome the challenges in feature tracking by guaranteeing the detection of a constant, pre-defined set of features. At the same time, the CNN heatmaps can be used to derive a measurement covariance matrix and improve filter robustness.
Following this line of reasoning, a tightly-coupled EKF was chosen in this work. The state vector is composed of the relative pose between the servicer and the target, as well as the relative translational and rotational velocities. Under the assumption that the camera frame onboard the servicer is co-moving with the LVLH frame, with the camera boresight aligned with the along-track direction, this translates into

x = \begin{pmatrix} t^C \\ v \\ q \\ \omega \end{pmatrix} \quad (9)

where the unit quaternion q replaces the direction cosine matrix R_B^C to represent the coordinate transformation. The state
propagation step decouples the translational dynamics from the rotational dynamics. The perturbation-free, closed-form solution of the well-known Clohessy-Wiltshire linear equations was chosen to describe the relative motion between the servicer and the target:

\begin{pmatrix} t^C \\ v \end{pmatrix}_{k+1} = \Phi_{CW} \begin{pmatrix} t^C \\ v \end{pmatrix}_k \quad (10)

where

\Phi_{CW} =
\begin{pmatrix}
1 & 0 & 6(\theta_s - \sin\theta_s) & (4\sin\theta_s - 3\theta_s)/\omega_s & 0 & 2(1 - \cos\theta_s)/\omega_s \\
0 & \cos\theta_s & 0 & 0 & \sin\theta_s/\omega_s & 0 \\
0 & 0 & 4 - 3\cos\theta_s & 2(\cos\theta_s - 1)/\omega_s & 0 & \sin\theta_s/\omega_s \\
0 & 0 & 6\omega_s(1 - \cos\theta_s) & 4\cos\theta_s - 3 & 0 & 2\sin\theta_s \\
0 & -\omega_s\sin\theta_s & 0 & 0 & \cos\theta_s & 0 \\
0 & 0 & 3\omega_s\sin\theta_s & -2\sin\theta_s & 0 & \cos\theta_s
\end{pmatrix}. \quad (11)

Here, ω_s and θ_s represent the servicer's mean motion and the true-anomaly variation over a time step ∆t, respectively.
The relative quaternion and the rotational velocity are also propagated assuming perturbation-free dynamics with constant angular velocity:

q_{k+1} = q_k \otimes q(\omega_k \Delta t) \quad (12)

\omega_{k+1} = \omega_k \quad (13)

where the term q(ω_k ∆t) represents the unit quaternion equivalent to a rotation of ω_k ∆t. The propagation of the state covariance matrix P is done by accounting for a linearized version of Eqn. (12). The reader is referred to Ref. [24] for a detailed description of this derivation.
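Equations (12)-(13) amount to composing the current quaternion with the incremental rotation over one step. A sketch, assuming a scalar-last quaternion convention (the paper does not state its convention):

```python
import numpy as np

def quat_mult(q1, q2):
    # Hamilton product of quaternions stored as (qx, qy, qz, qw)
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return np.array([w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 + y1*w2 + z1*x2 - x1*z2,
                     w1*z2 + z1*w2 + x1*y2 - y1*x2,
                     w1*w2 - x1*x2 - y1*y2 - z1*z2])

def propagate_attitude(q_k, omega_k, dt):
    """Eqns. (12)-(13): propagation at constant angular velocity."""
    angle = np.linalg.norm(omega_k) * dt
    axis = omega_k / max(np.linalg.norm(omega_k), 1e-12)
    dq = np.append(axis * np.sin(angle / 2.0), np.cos(angle / 2.0))
    return quat_mult(q_k, dq), omega_k        # omega is held constant
```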
In the measurement update step, the measurement vector is made of the x- and y-coordinates of the n = 16 detected features in the image plane,

z = (x_1, y_1, \ldots, x_n, y_n)^T. \quad (14)

As anticipated, the measurement covariance matrix R is a time-varying block-diagonal matrix constructed with the heatmap-derived covariances C_i of Eqn. (3),

R = \begin{pmatrix} C_1 & & \\ & \ddots & \\ & & C_n \end{pmatrix}. \quad (15)

Notice that C_i can differ for each feature in a given frame as well as vary over time.
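Assembling Eqn. (15) is a one-liner once the per-feature covariances are available; a sketch reusing the heatmap_covariance helper introduced above:

```python
from scipy.linalg import block_diag

def measurement_covariance(feature_covariances):
    """Build the 2n x 2n block-diagonal R of Eqn. (15) from 2x2 blocks C_i."""
    return block_diag(*feature_covariances)

# e.g.: R = measurement_covariance([heatmap_covariance(h) for h in heatmaps])
```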
The observation model h is a 2n×1 column vector representing the x- and y-coordinates of the 16 features detected by the CNN. Referring to Eqns. (1)-(2), this translates into the following equations for each detected point p_i:

h_i = \left( \frac{x_i^C}{z_i^C} f_x + C_x, \; \frac{y_i^C}{z_i^C} f_y + C_y \right)^T \quad (16)

r_i^C = q \otimes r_i^B \otimes q^* + t^C \quad (17)

where ⊗ denotes the quaternion product. As a result, the Jacobian H of the observation model with respect to the state vector is a 2n×13 matrix computed as

H = \begin{pmatrix} H_{t^C,1} & 0_{2\times3} & H_{q,1} & 0_{2\times3} \\ \vdots & \vdots & \vdots & \vdots \\ H_{t^C,n} & 0_{2\times3} & H_{q,n} & 0_{2\times3} \end{pmatrix} \quad (18)

where

H_{t^C,i} = H_i^{int} \cdot H_{t^C,i}^{ext} \quad (19)

H_{q,i} = H_i^{int} \cdot H_{q,i}^{ext} \quad (20)

H_i^{int} = \frac{\partial h_i}{\partial r_i^C} = \begin{pmatrix} \frac{f_x}{z_i^C} & 0 & -\frac{f_x}{(z_i^C)^2} x_i^C \\ 0 & \frac{f_y}{z_i^C} & -\frac{f_y}{(z_i^C)^2} y_i^C \end{pmatrix} \quad (21)

H_{q,i}^{ext} = \frac{\partial r_i^C}{\partial q} = \frac{\partial (q \otimes r_i^B \otimes q^*)}{\partial q}, \qquad H_{t^C,i}^{ext} = \frac{\partial r_i^C}{\partial t^C} = I_3. \quad (22)

The partial derivative in Eqn. (22) is computed according to Ref. [24]. After the measurement update step, brute-force normalization is performed to ensure unit norm of the estimated quaternion [24].
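For one camera-frame point, the predicted measurement of Eqn. (16) and the intrinsic Jacobian of Eqn. (21) can be sketched as:

```python
import numpy as np

def observation_and_jacobian(r_C, fx, fy, Cx, Cy):
    """h_i of Eqn. (16) and H_int = dh_i/dr_C of Eqn. (21) for a point r_C."""
    x, y, z = r_C
    h_i = np.array([(x / z) * fx + Cx, (y / z) * fy + Cy])
    H_int = np.array([[fx / z, 0.0, -fx * x / z**2],
                      [0.0, fy / z, -fy * y / z**2]])
    return h_i, H_int
```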
VII. Simulations
In this section, the simulation environment and the results are presented. Firstly, the impact of including a heatmap-derived covariance in the pose estimation step is addressed by comparing the CEPPnP method with a standard solver which does not account for feature uncertainty. Secondly, the performance of the EKF is evaluated by comparing the convergence profiles obtained with a heatmap-derived covariance matrix against covariance matrices with arbitrarily selected covariances. Initialization is provided by the CEPPnP for all the scenarios.

Two separate error metrics are adopted in the evaluation, in accordance with Ref. [11]. Firstly, the translational error between the estimated relative position \hat{t}^C and the ground truth t^C is computed as

E_T = | t^C - \hat{t}^C |. \quad (23)

This metric is also applied to the translational and rotational velocities estimated in the navigation filter. Secondly, the attitude accuracy is measured in terms of the Euler axis-angle error between the estimated quaternion \hat{q} and the ground truth q,

\beta = [\beta_s \; \beta_v] = q \otimes \hat{q}^* \quad (24)

E_R = 2 \arccos(|\beta_s|). \quad (25)
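Both metrics are straightforward to evaluate; the sketch below reuses the quat_mult helper defined earlier and again assumes a scalar-last quaternion convention:

```python
import numpy as np

def translation_error(t_true, t_est):
    # Eqn. (23): absolute error, reported per axis in Table 2
    return np.abs(t_true - t_est)

def rotation_error(q_true, q_est):
    """Eqns. (24)-(25): Euler axis-angle error between unit quaternions."""
    q_est_conj = np.array([-q_est[0], -q_est[1], -q_est[2], q_est[3]])
    beta = quat_mult(q_true, q_est_conj)
    return 2.0 * np.arccos(np.clip(abs(beta[3]), 0.0, 1.0))
```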
A. Pose Estimation
Three representative scenarios are selected from the CNN test dataset for the evaluation. These scenarios were chosen in order to analyze different heatmap distributions around the detected features. A comparison is made between the proposed CEPPnP and the EPPnP. Figure 8 shows the characteristics of the covariance matrices derived from the predicted heatmaps. Here, the ratio between the minimum and maximum eigenvalues of the associated covariances is plotted against the ellipse's area and the RMSE between the Ground Truth (GT) and the x, y coordinates of the extracted features,

E_{RMSE,i} = \sqrt{ (x_{GT,i} - x_i)^2 + (y_{GT,i} - y_i)^2 }. \quad (26)
(a) Test scenario 1 (b) Test scenario 2 (c) Test scenario 3
Fig. 8 Characteristics of the ellipses derived from the covariance matrices for the three selected scenarios: eigenvalue ratio, ellipse area, and x/y RMSE of the extracted features, plotted per feature ID (1-16).
Notably, interesting relations can be established between the three quantities reported in the figure. In the first scenario, the correlation between the sub-pixel RMSE and the large eigenvalue ratios suggests that a very accurate CNN detection can be associated with circular-shaped heatmaps. Moreover, the relatively small ellipse areas indicate that, in general, small heatmaps are expected for an accurate detection. Conversely, in the second scenario the larger ellipse areas correlate with a larger RMSE. Furthermore, it can be seen that the largest difference between the x- and y-components of the RMSE occurs either for the most eccentric heatmap (ID 13) or for the one with the largest area (ID 6). The same behaviour can be observed in the last scenario, where the largest RMSE coincides with a large, highly eccentric heatmap.
Table 2 Pose estimation performance results.

Metric      Scenario   CEPPnP             EPPnP
E_T [m]     1          [0.18 0.22 0.24]   [0.17 0.22 0.24]
            2          [0.35 0.41 0.59]   [0.14 0.4 22.8]
            3          [0.49 0.12 1.41]   [0.56 0.16 5.01]
E_R [deg]   1          0.36               0.35
            2          0.75               6.08
            3          1.99               2.72
Fig. 9 Montage of the VBAR approach scenarios 1 (Top) and 2 (Bottom). Images are shown every 12 s for clarity.
Table 2 lists the pose estimation results for the three scenarios. As anticipated in Figure 8, the statistical information derived from the heatmaps in the first scenario is uniform for all the features, due to the very accurate CNN detection. As a result, the inclusion of the feature covariances in the CEPPnP solver does not help refine the estimated pose, and both solvers are characterized by the same pose accuracy.

Not surprisingly, the situation changes as soon as the heatmaps are not uniform across the feature IDs. Due to its capability of accommodating feature uncertainties in the estimation, the CEPPnP method outperforms the EPPnP in the remaining scenarios. In other words, the CEPPnP solver proves to be more robust against inaccurate CNN detections by accounting for a reliable representation of the feature covariances.
B. Navigation Filter
To assess the performance of the proposed EKF, two rendezvous scenarios with Envisat are rendered in Cinema 4D©. These are perturbation-free VBAR trajectories characterized by a relative velocity ‖v‖ = 0 m/s. In both scenarios, Envisat performs a roll rotation with ‖ω‖ = 5 deg/s, with the servicer camera frame aligned with the LVLH frame. Table 3 lists the initial conditions of the trajectories, whereas Figure 9 shows some of the associated rendered 2D images. It is assumed that the images are made available to the filter every 2 seconds for the measurement update step, with the propagation step running at 1 Hz. In both scenarios, the EKF is initialized with the CEPPnP pose solution at time t_0. The other elements of the initial state vector are randomly chosen assuming a standard deviation of 1 mm/s and 1 deg/s for all the axes of the terms (v̂_0 − v) and (ω̂_0 − ω), respectively. Table 4 reports the initial conditions of the filter.
Table 3 VBAR approach scenarios. The attitude is represented in terms of ZYX Euler angles for clarity.

Scenario   θ_0 [deg]           ω [deg/s]             t^C [m]       v [mm/s]
1          [-180 30 -80]^T     [-2.5 -4.3 0.75]^T    [0 150 0]^T   [0 0 0]^T
2          [-180 -50 160]^T    [3.8 1.1 -3.02]^T     [0 180 0]^T   [0 0 0]^T
In order to assess the impact of the measurement covariance on the performance of the EKF, two diagonal matrices are considered besides the proposed heatmap-derived covariance of Eqn. (15): a matrix R = 10·I_{2n}, in which 10 is the selected variance, and a matrix R_avg, which uses an averaged variance of the diagonal elements of the heatmap-derived covariance.

Table 4 Initial state vector in the EKF.

Scenario   θ̂_0 [deg]                   ω̂_0 [deg/s]          t̂^C_0 [m]               v̂_0 [mm/s]
1          [-180.17 28.74 -80.63]^T    [-2.07 -4.1 0.1]^T   [0.11 0.07 149.69]^T    [2.8 -1.3 3]^T
2          [-179 -51.06 156.25]^T      [-2.07 -4.1 0.1]^T   [0.10 0.26 181.9]^T     [2.8 -1.3 3]^T

Fig. 10 Comparison of the selected navigation scenarios throughout the whole rotational sequence of Envisat, based on the mean eccentricity (Top) and mean area (Bottom) derived from the feature covariances.
Figure 10 illustrates two selected mean characteristics of the heatmaps, calculated over the n features in each frame, for the two rotational sequences of Envisat. First of all, the mean eccentricity of the feature covariances, represented by the ratio λ_min/λ_max, indicates that the first scenario is characterized by more circular heatmaps than the second one. Moreover, the intra-frame dispersion of the ellipses' shape, represented by the ±1σ envelope, is much larger for the second scenario than for the first one. This suggests that the covariances C_i in the second scenario can be very different from each other within a given frame. Lastly, it can be seen that the ellipse areas in the second scenario are larger than those in the first scenario throughout the entire rotational sequence. Overall, these considerations emphasize a worse detection accuracy for certain features in the second scenario.
Figures 11 and 12 show the convergence profiles of the translational and rotational states for the first scenario, respectively. Overall, the selection of the measurement covariance seems not to impact the filter performance. As already anticipated, this is a consequence of the nearly circular, small heatmaps which characterize almost every feature throughout this sequence. As such, a diagonal covariance of 10 pixels, a heatmap-derived covariance, or an averaged diagonal covariance return almost identical convergence profiles. The situation differs considerably when analyzing the convergence profiles for the second scenario (Figures 13-14). Although similar convergence occurs for the rotational state as well as for the relative velocity, it can be seen that the selection of the measurement covariance plays a key role in the along-track error. Specifically, the error trend of the heatmap-derived covariance oscillates around zero, as opposed to the other two biased estimates. This seems to indicate that the statistical information derived from the CNN heatmaps can represent the measurement uncertainty and aid the filter convergence also in the case of inaccurate feature detections.
Fig. 11 Estimation error for the relative position and velocity - Scenario 1 (along-track, cross-track, and radial position and velocity errors vs. time; legend: R = 10·I, R_CNN, R_avg).
Fig. 12 Estimation error for the relative attitude and angular velocity - Scenario 1 (angle error E_R and x, y, z angular-velocity errors vs. time; legend: R = 10·I, R_CNN, R_avg).
Fig. 13 Estimation error for the relative position and velocity - Scenario 2.
Fig. 14 Estimation error for the relative attitude and angular velocity - Scenario 2 (angle error E_R vs. time; legend: R = 10·I, R_CNN, R_avg).
VIII. Conclusions and Recommendations
This paper introduces a novel framework to estimate the relative pose of an uncooperative target spacecraft with a single monocular camera onboard a servicer spacecraft. A method is proposed in which a CNN-based IP algorithm is combined with a CEPPnP solver and an EKF to return a robust estimate of the relative pose as well as of the relative translational and rotational velocities.

The main novelty of the proposed CNN-based method lies in introducing a heatmap-derived covariance representation of the detected features. This is done in order to incorporate feature uncertainties in both the pose estimation step and the navigation filter. Preliminary results indicate that the correlation between the accuracy of the CNN detection and the shape of the detected heatmaps can be exploited to compute a covariance matrix for each detected feature. This covariance can be incorporated in the CEPPnP solver to improve the pose estimation accuracy under inaccurate CNN detections. Furthermore, the same statistical information can be used to build a representative measurement covariance matrix and improve the measurement update step of the navigation filter. Together, these improvements proved to return a more robust and accurate estimate for the considered VBAR approach scenarios. Overall, the results suggest a promising scheme to cope with the challenging demand for robust navigation in close-proximity scenarios.

However, further work is required in several directions. First of all, more recent CNN architectures shall be investigated to assess the achievable robustness and accuracy in the feature detection step. Secondly, the comparison between the CEPPnP and the covariance-free EPPnP solvers shall be extended to the entire test dataset, in order to validate the improved accuracy so far observed for only three representative scenarios. Moreover, the NRM shall be included in the pose estimation step to refine the accuracy of the CEPPnP solver and improve filter initialization. In addition, more close-proximity scenarios shall be recreated to investigate more complex relative trajectories as well as to assess the impact of perturbations on the accuracy and robustness of the navigation filter. In this context, other navigation filters, such as the Unscented Kalman Filter and the Multiplicative EKF, shall be investigated. Finally, alternative datasets such as SPEED shall be considered as a benchmark for the proposed method.
Acknowledgments
This study is funded and supported by the European Space Agency and Airbus Defence and Space under the Network/Partnering Initiative (NPI) program with grant number NPI 577 - 2017.
References
[1] Tatsch, A., Fitz-Coy, N., and Gladun, S., "On-orbit Servicing: A Brief Survey," Proceedings of the 2006 Performance Metrics for Intelligent Systems Workshop, 2006, pp. 21-23.
[2] Wieser, M., Richard, H., Hausmann, G., Meyer, J.-C., Jaekel, S., Lavagna, M., and Biesbroek, R., "e.Deorbit Mission: OHB Debris Removal Concepts," ASTRA 2015 - 13th Symposium on Advanced Space Technologies in Robotics and Automation, Noordwijk, The Netherlands, 2015.
[3] Sharma, S., Ventura, J., and D'Amico, S., "Robust Model-Based Monocular Pose Initialization for Noncooperative Spacecraft Rendezvous," Journal of Spacecraft and Rockets, Vol. 55, No. 6, 2018, pp. 1-16.
[4] Pasqualetto Cassinis, L., Fonod, R., and Gill, E., "Review of the Robustness and Applicability of Monocular Pose Estimation Systems for Relative Navigation with an Uncooperative Spacecraft," Progress in Aerospace Sciences, Vol. 110, 2019.
[5] Lepetit, V., Moreno-Noguer, F., and Fua, P., "EPnP: An Accurate O(n) Solution to the PnP Problem," International Journal of Computer Vision, Vol. 81, 2009, pp. 155-166.
[6] Ferraz, L., Binefa, X., and Moreno-Noguer, F., "Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection," IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014.
[7] Ostrowski, A., Solution of Equations and Systems of Equations, 2nd ed., Academic Press, New York, 1966.
[8] Ferraz, L., Binefa, X., and Moreno-Noguer, F., "Leveraging Feature Uncertainty in the PnP Problem," Proceedings of the British Machine Vision Conference, Nottingham, UK, 2014.
[9] Cui, J., Min, C., Bai, X., and Cui, J., "An Improved Pose Estimation Method Based on Projection Vector with Noise Error Uncertainty," IEEE Photonics Journal, Vol. 11, No. 2, 2019.
[10] Sharma, S., Beierle, C., and D'Amico, S., "Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks," IEEE Aerospace Conference, Big Sky, MT, USA, 2018.
[11] Sharma, S., and D'Amico, S., "Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Neural Networks," 29th AAS/AIAA Space Flight Mechanics Meeting, Ka'anapali, HI, USA, 2019.
[12] Shi, J., Ulrich, S., and Ruel, S., "CubeSat Simulation and Detection Using Monocular Camera Images and Convolutional Neural Networks," 2018 AIAA Guidance, Navigation, and Control Conference, Kissimmee, FL, USA, 2018.
[13] Rondao, D., and Aouf, N., "Multi-View Monocular Pose Estimation for Spacecraft Relative Navigation," 2018 AIAA Guidance, Navigation, and Control Conference, Kissimmee, FL, USA, 2018.
[14] Capuano, V., Alimo, S., Ho, A., and Chung, S., "Robust Features Extraction for On-board Monocular-based Spacecraft Pose Acquisition," AIAA Scitech 2019 Forum, San Diego, CA, USA, 2019.
[15] Pavlakos, G., Zhou, X., Chan, A., Derpanis, K., and Daniilidis, K., "6-DoF Object Pose from Semantic Keypoints," IEEE International Conference on Robotics and Automation, 2017.
[16] Newell, A., Yang, K., and Deng, J., "Stacked Hourglass Networks for Human Pose Estimation," European Conference on Computer Vision, Springer, 2016, pp. 483-499.
[17] Chen, B., Cao, J., Parra, A., and Chin, T., "Satellite Pose Estimation with Deep Landmark Regression and Nonlinear Pose Refinement," International Conference on Computer Vision, Seoul, South Korea, 2019.
[18] Park, T., Sharma, S., and D'Amico, S., "Towards Robust Learning-Based Pose Estimation of Noncooperative Spacecraft," AAS/AIAA Astrodynamics Specialist Conference, Portland, ME, USA, 2019.
[19] D'Amico, S., Benn, M., and Jorgensen, J., "Pose Estimation of an Uncooperative Spacecraft from Actual Space Imagery," International Journal of Space Science and Engineering, Vol. 2, No. 2, 2014, pp. 171-189.
[20] Curtis, H., Orbital Mechanics for Engineering Students, Elsevier, 2005.
[21] Ronneberger, O., Fischer, P., and Brox, T., "U-Net: Convolutional Networks for Biomedical Image Segmentation," Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234-241.
[22] Pasqualetto Cassinis, L., Fonod, R., Gill, E., Ahrns, I., and Gil Fernandez, J., "Comparative Assessment of Image Processing Algorithms for the Pose Estimation of an Uncooperative Spacecraft," International Workshop on Satellite Constellations & Formation Flying, Glasgow, UK, 2019.
[23] Sharma, S., and D'Amico, S., "Reduced-Dynamics Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Monocular Vision," Advances in the Astronautical Sciences Guidance, Navigation and Control, Vol. 159, 2017.
[24] Civera, J., Davison, A., and Martínez Montiel, J., Structure from Motion using the Extended Kalman Filter, Springer, 2012.
16
... Some examples include Proença and Gao 8 who qualitatively test their CNN on a video of the Soyuz spacecraft captured in LEO and Pasqualetto Cassinis et al. 12 who test their CNN on 100 images of the Envisat mockup spacecraft captured at the Orbital Robotics and GNC lab of ESTEC. On the other hand, Pasqualetto et al. 5 integrate a CNN into an Extended Kalman Filter (EKF), but it is only tested on a trajectory of synthetic images. To the authors' best knowledge, there is currently no literature that simultaneously achieves (1) integration of CNN or any ML models into a navigation filter for space missions and (2) evaluation of its performance and robustness on spacecraft trajectory images that originate from a source different from the synthetic training images. ...
... Then, these covariance matrices for each keypoint is used to construct the corresponding portion of the measurement noise matrix, R ∈ R (2K+6)×(2K+6) , at each iteration by populating the 2 × 2 entries along the diagonal of the upper-left 2K × 2K portion of R with the corresponding covariance matrices. For more details, the readers are referred to Pasqualetto Cassinis et al. 5 The measurement noise covariance for the h E measurements cannot be estimated online on an unknown image domain as it is done for the keypoints. Therefore, the lower-right 6 × 6 portion of R derives from the SPNv2's performance on the synthetic validation set of SPEED+ scaled by some arbitrary ad hoc constants. ...
Preprint
This paper presents a neural network-based Unscented Kalman Filter (UKF) to track the pose (i.e., position and orientation) of a known, noncooperative, tumbling target spacecraft in a close-proximity rendezvous scenario. The UKF estimates the relative orbital and attitude states of the target with respect to the servicer based on the pose information extracted from incoming monocular images of the target spacecraft with a Convolutional Neural Network (CNN). In order to enable reliable tracking, the process noise covariance matrix of the UKF is tuned online using adaptive state noise compensation. Specifically, the closed-form process noise model for the relative attitude dynamics is newly derived and implemented. In order to enable a comprehensive analysis of the performance and robustness of the proposed CNN-powered UKF, this paper also introduces the Satellite Hardware-In-the-loop Rendezvous Trajectories (SHIRT) dataset which comprises the labeled imagery of two representative rendezvous trajectories in low Earth orbit. For each trajectory, two sets of images are respectively created from a graphics renderer and a robotic testbed to allow testing the filter's robustness across domain gap. The proposed UKF is evaluated on both domains of the trajectories in SHIRT and is shown to have sub-decimeter-level position and degree-level orientation errors at steady-state.
... Moreover, driven by the limited on-board power and computational capabilities, a monocular camera is a favored choice of sensor as opposed to more complex systems such as stereovision, Range Detection and Ranging (RADAR) or Light Detection and Ranging (LIDAR). Recently, Machine Learning (ML) techniques based on Convolutional Neural Networks (CNN) have been explored in applications of vision-based spacecraft pose estimation and navigation [41,29,24,10,9,35,31,8]; however, it is impractical to acquire large-scale imagery of the interested target spacecraft with accurate pose labels in space to train CNN models. This is the main reason CNN models have not been deployed in space so far. ...
... Dung et al. [15] introduced a dataset comprising about 3,000 synthetic and spaceborne images of random satellites with bounding box and segmentation labels. Other authors have also developed their own datasets, such as synthetic images of the Envisat spacecraft rendered with Blender [9,31] and Cygnus spacecraft rendered with Blender and its Cycles rendering engine [8]. ...
Preprint
Full-text available
Autonomous vision-based spaceborne navigation is an enabling technology for future on-orbit servicing and space logistics missions. While computer vision in general has benefited from Machine Learning (ML), training and validating spaceborne ML models are extremely challenging due to the impracticality of acquiring a large-scale labeled dataset of images of the intended target in the space environment. Existing datasets, such as Spacecraft PosE Estimation Dataset (SPEED), have so far mostly relied on synthetic images for both training and validation, which are easy to mass-produce but fail to resemble the visual features and illumination variability inherent to the target spaceborne images. In order to bridge the gap between the current practices and the intended applications in future space missions, this paper introduces SPEED+: the next generation spacecraft pose estimation dataset with specific emphasis on domain gap. In addition to 60,000 synthetic images for training, SPEED+ includes 9,531 simulated images of a spacecraft mockup model captured from the Testbed for Rendezvous and Optical Navigation (TRON) facility. TRON is a first-of-a-kind robotic testbed capable of capturing an arbitrary number of target images with accurate and maximally diverse pose labels and high-fidelity spaceborne illumination conditions. SPEED+ will be used in the upcoming international Satellite Pose Estimation Challenge co-hosted with the Advanced Concepts Team of the European Space Agency to evaluate and compare the robustness of spaceborne ML models trained on synthetic images.
... The characteristics of space images also challenge conventional vision-based navigation algorithms for spacecraft, while the DL technique provides promising solutions and performance that can alleviate these issues. In terms of dynamic lighting, DNN-based schemes show increased robustness in attitude initialisation [21,26]. With the deployment of high-performance devices, CNN-based methods can not only provide a lower computational complexity in pose acquisition process, but also reduce the need for complicated dynamics models [27]. ...
... The overall structure of the network designed by Huo et al.[52].Another indirect DLs-based scheme combines a CNN-based feature detector with a PnP solver and an Extended Kalman Filter (EKF) to guarantee a robust pose estimation[26]. The authors build an hourglass-shaped CNN composed of a six-block encoder and a six-block decoder to estimate the heatmaps of 16 predefined corners on the Envisat spacecraft. A target detection module is not incorporated since the presence of Earth in the background is not considered. ...
Preprint
Autonomous spacecraft relative navigation technology has been planned for and applied to many famous space missions. The development of on-board electronics systems has enabled the use of vision-based and LiDAR-based methods to achieve better performances. Meanwhile, deep learning has reached great success in different areas, especially in computer vision, which has also attracted the attention of space researchers. However, spacecraft navigation differs from ground tasks due to high reliability requirements but lack of large datasets. This survey aims to systematically investigate the current deep learning-based autonomous spacecraft relative navigation methods, focusing on concrete orbital applications such as spacecraft rendezvous and landing on small bodies or the Moon. The fundamental characteristics, primary motivations, and contributions of deep learning-based relative navigation algorithms are first summarised from three perspectives of spacecraft rendezvous, asteroid exploration, and terrain navigation. Furthermore, popular visual tracking benchmarks and their respective properties are compared and summarised. Finally, potential applications are discussed, along with expected impediments.
... [7][8][9][10][11][12] Recently, Machine Learning (ML) techniques based on Convolutional Neural Networks (CNN) have been developed to replace the feature extraction step with superior performance. [13][14][15][16][17] However, unlike image processing-based methods, a CNN must be trained on a large set of target images with accurate pose labels. While the data-hungry nature of the training is a ubiquitous challenge for any ML applications, it is especially difficult and outright impractical in spaceborne applications to acquire a large quantity of images of interested targets in various space environments with accurate pose labels. ...
... 14,15,17,18 Therefore, in order for a CNN-based system to be deployed to space missions: 1) it must be trained to be robust against various adversarial conditions in space such as high contrast, extreme shadowing and low signal-to-noise ratio, and 2) its robustness must be validated on ground with no access to target spaceborne images. Previous works have attempted to address the robust training with extensive data augmentation during training, 14,16 but relatively fewer works or efforts have been dedicated to the issue of on-ground validation of CNNs. 13 The most promising method involves a robotic testbed that is capable of re-creating various space environmental conditions and configuring the camera and the target model to achieve the desired relative pose with high accuracy. ...
Preprint
Full-text available
This work presents the most recent advances of the Robotic Testbed for Rendezvous and Optical Navigation (TRON) at Stanford University, the first robotic testbed capable of validating machine learning algorithms for spaceborne optical navigation. The TRON facility consists of two 6-degrees-of-freedom KUKA robot arms and a set of Vicon motion-tracking cameras to reconfigure an arbitrary relative pose between a camera and a target mockup model. The facility includes multiple Earth-albedo light boxes and a sun lamp to recreate high-fidelity spaceborne illumination conditions. After an overview of the facility, this work details the multi-source calibration procedure which enables estimation of the relative pose between the object and the camera with millimeter-level position and millidegree-level orientation accuracy. Finally, a comparative analysis of the synthetic and TRON-simulated imagery is performed using a Convolutional Neural Network (CNN) pre-trained on the synthetic images. The result shows a considerable gap in the CNN's performance, suggesting that the TRON-simulated images can be used to validate the robustness of machine learning algorithms trained on more easily accessible synthetic imagery from computer graphics.
... Single-image satellite pose estimation algorithms based on Convolutional Neural Network (CNN) architectures are giving very promising results: the authors of [8] present a solution in which the CNN is trained so that the extracted 2D keypoints correspond to the points of a 3D model in a predefined order. To improve the accuracy of the pose estimation, the authors of [9] associate a covariance matrix with the keypoint locations returned by the CNN for each detected feature. However, training the neural networks requires a large number of images, which are often generated by computer-graphics rendering due to the lack of real images; this leads to a gap between real and simulated performance. ...
Preprint
Full-text available
In this work, we have analyzed the problem of relative pose initialization between two satellites: a chaser and a non-cooperating target. The analysis targeted two close-range methods based on a monocular camera system: the Sharma-Ventura-D'Amico (SVD) method and the silhouette-matching method. Both methods are based on a priori knowledge of the target geometry, but neither fiducial markers nor a priori range measurements or state information are needed. The tests were carried out using a 2U CubeSat mock-up as the target, attached to a motorized rotary stage to simulate its relative motion with respect to the chaser camera. A motion capture system was used as a reference instrument that provides the fiducial relative motion between the two mock-ups and allows the performance of the analyzed initialization algorithms to be evaluated.
... To compensate for uncertainties in keypoint predictions, Cassinis et al. (2020) 21 show that the problem can be modeled as predicting a heatmap (whose peaks represent high-confidence locations of a given keypoint in the input image) for each keypoint, as opposed to regressing coordinates directly. This provides some interpretability, as each prediction can be visualized as a feature heatmap when juxtaposed with the original image, and the 2D output allows richer keypoint features to be fed to the accompanying pose solver. ...
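One plausible way to derive both a keypoint estimate and an associated covariance from such a heatmap is to treat the (normalized) heatmap as a discrete probability distribution over pixel coordinates, as in the numpy sketch below. This is an illustrative reading, not necessarily the exact formulation used by the cited authors.

```python
import numpy as np

def keypoint_with_covariance(heatmap: np.ndarray):
    """Treat a non-negative heatmap as an (unnormalized) probability
    distribution over pixel coordinates; return its mean (the keypoint
    estimate) and 2x2 covariance (the measurement uncertainty)."""
    h = np.clip(heatmap, 0.0, None)
    p = h / h.sum()
    ys, xs = np.mgrid[0:h.shape[0], 0:h.shape[1]]
    mean = np.array([(p * xs).sum(), (p * ys).sum()])   # (u, v)
    du, dv = xs - mean[0], ys - mean[1]
    cov = np.array([[(p * du * du).sum(), (p * du * dv).sum()],
                    [(p * du * dv).sum(), (p * dv * dv).sum()]])
    return mean, cov

# A sharp, symmetric peak yields a small, near-isotropic covariance.
hm = np.zeros((64, 64))
hm[30:33, 40:43] = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
mu, Sigma = keypoint_with_covariance(hm)
print(mu)     # approximately [41., 31.]
print(Sigma)  # small 2x2 covariance matrix
```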
Preprint
A key requirement for autonomous on-orbit proximity operations is the estimation of a target spacecraft's relative pose (position and orientation). It is desirable to employ monocular cameras for this problem due to their low cost, weight, and power requirements. This work presents a novel convolutional neural network (CNN)-based monocular pose estimation system that achieves state-of-the-art accuracy with low computational demand. In combination with a Blender-based synthetic data generation scheme, the system demonstrates the ability to generalize from purely synthetic training data to real in-space imagery of the Northrop Grumman Enhanced Cygnus spacecraft. Additionally, the system achieves real-time performance on low-power flight-like hardware.
Article
The relative pose estimation of a noncooperative space target is an attractive yet challenging task due to the complexity of the target background and illumination, and the lack of a priori knowledge. Unfortunately, these negative factors have a severe impact on the estimation accuracy and the robustness of filter algorithms. In response, this paper proposes a novel filter algorithm, based on a stereovision system, to estimate the relative pose with improved robustness. First, to obtain a coarse relative pose, the weighted total least squares (WTLS) algorithm is adopted to estimate the relative pose from several feature points. The resulting relative pose is fed into the subsequent filter scheme as an observation. Second, the classic Bayes filter is exploited to estimate the relative state, except for the moment-of-inertia ratios. Additionally, the one-step prediction results are used as feedback for the WTLS initialization. The proposed algorithm successfully eliminates the dependency on continuous tracking of several fixed points. Finally, comparison experiments demonstrate that the proposed algorithm performs better in terms of robustness and convergence time.
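For intuition about the coarse pose step, the sketch below shows a weighted least-squares rigid alignment (the classical weighted Kabsch/Procrustes solution) from stereo-derived 3D feature points. Note that the cited work uses weighted total least squares, which additionally weights errors in the model points, so this is a simplified stand-in, not the paper's algorithm.

```python
import numpy as np

def weighted_rigid_align(P, Q, w):
    """Weighted least-squares rigid alignment (Kabsch/Procrustes):
    find R, t minimizing sum_i w_i * ||R @ P[i] + t - Q[i]||^2,
    where P are model points and Q are stereo-measured points."""
    w = np.asarray(w, float)
    w = w / w.sum()
    p_bar = (w[:, None] * P).sum(axis=0)             # weighted centroids
    q_bar = (w[:, None] * Q).sum(axis=0)
    H = (w[:, None] * (P - p_bar)).T @ (Q - q_bar)   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct a possible reflection so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = q_bar - R @ p_bar
    return R, t
```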
Article
Full-text available
The relative pose estimation of an inactive spacecraft by an active servicer spacecraft is a critical task in the design of current and planned space missions, due to its relevance for close-proximity operations, such as In-Orbit Servicing and Active Debris Removal. This paper introduces a novel framework to enable robust monocular pose estimation for close-proximity operations around an uncooperative spacecraft, which combines a Convolutional Neural Network (CNN) for feature detection with a Covariant Efficient Procrustes Perspective-n-Points (CEPPnP) solver and a Multiplicative Extended Kalman Filter (MEKF). The performance of the proposed method is evaluated at different levels of the pose estimation system. A Single-stack Hourglass CNN is proposed for the feature detection step in order to decrease the computational load of the Image Processing (IP), and its accuracy is compared to that of the standard, more complex High-Resolution Net (HRNet). Subsequently, heatmap-derived covariance matrices are included in the CEPPnP solver to assess the pose estimation accuracy prior to the navigation filter. This is done in order to support the performance evaluation of the proposed tightly-coupled approach against a loosely-coupled approach, in which the detected features are converted into pseudomeasurements of the relative pose prior to the filter. The performance results of the proposed system indicate that a tightly-coupled approach can guarantee an advantageous coupling between the rotational and translational states within the filter, whilst reflecting a representative measurement covariance. This suggests a promising scheme to cope with the challenging demand for robust navigation in close-proximity scenarios. Synthetic 2D images of the European Space Agency's Envisat spacecraft are used to generate datasets for training, validation and testing of the CNN. Likewise, the images are used to recreate a representative close-proximity scenario for the validation of the proposed filter.
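To make the tightly-coupled idea concrete, the schematic numpy sketch below shows how per-feature, heatmap-derived 2x2 covariances could be stacked into a block-diagonal measurement noise matrix inside a standard EKF measurement update. All interfaces and shapes are illustrative assumptions, not the authors' implementation (which uses a multiplicative EKF for the attitude states).

```python
import numpy as np
from scipy.linalg import block_diag

def ekf_update(x, P, z, z_pred, H, feature_covs):
    """Schematic tightly-coupled EKF measurement update.

    x, P         : prior state estimate and covariance
    z            : stacked 2D feature measurements, shape (2n,)
    z_pred       : predicted measurements h(x), shape (2n,)
    H            : measurement Jacobian, shape (2n, len(x))
    feature_covs : n heatmap-derived 2x2 covariance matrices
    """
    R = block_diag(*feature_covs)          # block-diagonal measurement noise
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x + K @ (z - z_pred)           # state update
    P_new = (np.eye(len(x)) - K @ H) @ P   # covariance update (Joseph form omitted)
    return x_new, P_new
```

Inaccurate CNN detections then inflate the corresponding blocks of R, so the gain K automatically down-weights those features rather than corrupting the state estimate.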
Conference Paper
Full-text available
This paper reports on a comparative assessment of Image Processing (IP) techniques for the relative pose estimation of uncooperative spacecraft with a monocular camera. Currently, keypoints-based algorithms suffer from partial occlusion of the target, as well as from the differing illumination conditions between the required offline database and the query space image. Moreover, algorithms based on corner/edge detection are highly sensitive to adverse illumination conditions in orbit. An evaluation of the critical aspects of these two methods is provided, with the aim of comparing their performance under changing illumination conditions and varying views between the camera and the target. Five different keypoints-based methods are compared to assess the robustness of feature matching. Furthermore, a method based on extracting corners from the lines detected by the Hough Transform is proposed and evaluated. Finally, a novel method, based on an hourglass Convolutional Neural Network (CNN) architecture, is proposed to improve the robustness of the IP during partial occlusion of the target as well as during feature tracking. It is expected that the results of this work will help assess the robustness of keypoints-based, corners/edges-based, and CNN-based algorithms within the IP prior to the relative pose estimation.
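The corners-from-Hough-lines idea lends itself to a compact sketch: detect straight edges, then take pairwise intersections of the detected lines as corner candidates. The OpenCV example below is illustrative; the thresholds and the intersection test are assumptions, not the paper's tuned pipeline.

```python
import cv2
import numpy as np

def corners_from_hough(gray, canny_lo=50, canny_hi=150, hough_thresh=80):
    """Detect lines with the Hough transform, then return pairwise
    line intersections (inside the image) as corner candidates."""
    edges = cv2.Canny(gray, canny_lo, canny_hi)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, hough_thresh)
    corners = []
    if lines is None:
        return corners
    lines = lines[:, 0, :]                        # (rho, theta) pairs
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            (r1, t1), (r2, t2) = lines[i], lines[j]
            # Each line satisfies x*cos(theta) + y*sin(theta) = rho.
            A = np.array([[np.cos(t1), np.sin(t1)],
                          [np.cos(t2), np.sin(t2)]])
            if abs(np.linalg.det(A)) < 1e-6:      # near-parallel lines
                continue
            x, y = np.linalg.solve(A, [r1, r2])   # intersection point
            if 0 <= x < gray.shape[1] and 0 <= y < gray.shape[0]:
                corners.append((x, y))
    return corners
```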
Article
Full-text available
The relative pose estimation of an inactive target by an active servicer spacecraft is a critical task in the design of current and planned space missions, due to its relevance for close-proximity operations, i.e., rendezvous with space debris and/or in-orbit servicing. Pose estimation systems based solely on a monocular camera are becoming an attractive alternative to systems based on active sensors or stereo cameras, due to their reduced mass, power consumption and system complexity. In this framework, a review of the robustness and applicability of monocular systems for the pose estimation of an uncooperative spacecraft is provided. Special focus is put on the advantages of multispectral monocular systems as well as on the improved robustness of novel image processing schemes and pose estimation solvers. The limitations and drawbacks of validating current pose estimation schemes with synthetic images are further discussed, together with the critical trade-offs for the selection of visual-based navigation filters. The state-of-the-art techniques are analyzed to provide insight into the limitations involved under adverse illumination and orbit scenarios, high image contrast, background noise, and low signal-to-noise ratios, which characterize actual space imagery and which could jeopardize the image processing algorithms and affect the pose estimation accuracy as well as the navigation filter's robustness. Specifically, a comparative assessment of current solutions is given at different levels of the pose estimation process, in order to bring a novel and broad perspective as compared to previous works.
Article
Full-text available
To address the anomalous and non-independent distribution of image errors in feature-based visual pose estimation, a monocular pose estimation method is proposed in which the uncertainty of the noise error is modeled via projection vectors. First, a covariance matrix is used to describe the uncertainty in each feature point's direction, and this directional uncertainty is integrated into the pose estimation, so that feature-point measurement errors with different degrees of directional uncertainty can be accommodated, making the algorithm robust. Then, by introducing the projection vector and combining it with the depth information of each feature point to represent the collinearity error, the nonlinearity caused by the camera's perspective projection is eliminated, giving the algorithm global convergence. Finally, the global convergence theorem is used to prove the global convergence of the proposed algorithm. The results show that the proposed method has good robustness and convergence while adapting to different degrees of error uncertainty, and can meet the needs of practical engineering applications.
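In generic terms (the cited paper's own notation is not reproduced here), folding per-feature directional uncertainty into the pose estimate amounts to a covariance-weighted objective of the form:

```latex
\min_{\mathbf{R},\,\mathbf{t}} \; \sum_{i=1}^{n}
  \mathbf{e}_i^{\top}\, \boldsymbol{\Sigma}_i^{-1}\, \mathbf{e}_i,
\qquad
\mathbf{e}_i = \mathbf{u}_i - \pi\!\left(\mathbf{R}\,\mathbf{X}_i + \mathbf{t}\right),
```

where u_i is the detected 2D feature, X_i the corresponding 3D model point, pi(.) the perspective projection, and Sigma_i the 2x2 covariance encoding the directional uncertainty of feature i. Directions in which a feature is uncertain are thereby automatically down-weighted.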
Conference Paper
Full-text available
This paper presents the design, implementation, and validation of a robust feature extraction architecture for real-time, on-board, monocular vision-based pose initialization of a target spacecraft, in application to on-orbit servicing and formation flying. The proposed computer vision algorithm is designed to detect the most significant features of an uncooperative target spacecraft in a sequence of two-dimensional input images collected on board the chaser spacecraft. A novel approach based on the fusion of multiple parallel processing streams is proposed to filter a minimum number of true point features out of the extracted set, even in the case of unfavourable illumination conditions and in the presence of the Earth in the background. These are then combined into relevant polyline structures that characterize the true geometrical shape of the target spacecraft.
Conference Paper
Full-text available
This paper provides a method for dynamic simulation and target-vehicle detection of a CubeSat spacecraft in Low Earth Orbit. The technique for orbit prediction is verified against the Clohessy-Wiltshire solution while allowing for delta-V adjustments. Attitude simulation is performed using the Modified Rodrigues Parameters variation of Euler's equations for rigid-body motion. An attitude kinematics driver was developed to point the chaser vehicle's camera towards the client CubeSat. Camera images are generated using rendering software with the predicted motion of the target and chaser spacecraft. Spacecraft detection is performed using two state-of-the-art deep-learning Convolutional Neural Networks, namely the 101-layer ResNet and the Inception-ResNet-V2 classifiers, with Faster R-CNN as the object detection engine. Results show that, after applying transfer learning, both classification networks work well on simulation and laboratory-experiment images.
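As a rough illustration of the detection setup (not the exact networks used above), torchvision ships a Faster R-CNN with a ResNet-50-FPN backbone whose box-predictor head can be replaced for transfer learning to a two-class (background/spacecraft) problem; all names and sizes below are illustrative.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Stand-in: torchvision's ResNet-50-FPN Faster R-CNN (not the ResNet-101 or
# Inception-ResNet-V2 variants used in the cited work). Transfer learning =
# swap the box-predictor head for 2 classes and fine-tune on rendered images.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes=2)

model.eval()
with torch.no_grad():
    detections = model([torch.rand(3, 480, 640)])  # one RGB image in [0, 1]
print(detections[0]["boxes"].shape, detections[0]["scores"].shape)
```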
Article
This work addresses the design and validation of a robust monocular vision-based pose initialization architecture for close-range on-orbit servicing and formation-flying applications. The aim is to rapidly determine the pose of a passive space-resident object using its known three-dimensional wireframe model and a single low-resolution two-dimensional image collected on board the servicer spacecraft. In contrast to previous works, the proposed architecture is onboard-executable and capable of estimating the pose of the client without the use of fiducial markers and without any a priori range measurements or state information. A novel feature detection method based on filtering the weak image gradients is used to identify the true edges of the client in the image, even in the presence of the Earth in the background. The detected features are synthesized using simple geometric constraints to dramatically reduce the search space of the feature correspondence problem, which is solved using the EPnP method. This approach is proven to be an order of magnitude faster than state-of-the-art random sample consensus methods. A fast Newton-Raphson method that minimizes the fit error between corresponding image and model features is employed to refine the pose estimate and to resolve pose ambiguity. The proposed methodology is tested using actual space imagery collected during the PRISMA mission at about a 700 km altitude and a 10 m inter-spacecraft separation.
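OpenCV exposes EPnP directly, which makes the initialization-plus-refinement flow easy to sketch. In the example below, the paper's Newton-Raphson refinement is stood in for by OpenCV's Levenberg-Marquardt refiner, and all point values and camera intrinsics are made up for illustration.

```python
import cv2
import numpy as np

# Illustrative EPnP call (cv2.SOLVEPNP_EPNP) followed by iterative refinement;
# the cited work uses its own Newton-Raphson refinement, for which OpenCV's
# Levenberg-Marquardt refiner is only a stand-in. All values are fabricated.
object_pts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                       [0, 0, 1], [1, 0, 1]], dtype=np.float64)   # model corners (m)
image_pts = np.array([[320, 240], [400, 238], [405, 320], [322, 325],
                      [318, 160], [398, 158]], dtype=np.float64)  # detected pixels
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])       # camera intrinsics
dist = np.zeros(5)                                                # no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist,
                              flags=cv2.SOLVEPNP_EPNP)
rvec, tvec = cv2.solvePnPRefineLM(object_pts, image_pts, K, dist, rvec, tvec)
R, _ = cv2.Rodrigues(rvec)  # rotation matrix of the client w.r.t. the camera
```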