Autonomous Exploration Under Uncertainty via Graph Convolutional Networks
Fanfei Chen, Jinkun Wang, Tixiao Shan and Brendan Englot
Abstract We consider a mapping and exploration problem in which a range-sensing
mobile robot is tasked with mapping the landmarks in an unknown environment
efficiently in real-time. There are numerous state-of-the-art methods which consider
the uncertainty of a robot’s pose and/or the entropy and accuracy of its map when
exploring an unknown environment. However, such methods typically use forward
simulation to predict and select the best action based on the respective utility function.
Therefore, the computation time of such methods is often costly, and may grow
exponentially with the increasing dimension of the state space and action space,
prohibiting real-time implementation. We propose a novel approach that uses a
Graph Convolutional Network (GCN) to predict a robot’s optimal action in belief
space over a graph representation of candidate waypoints and landmarks. The learned
exploration policy can provide an optimal or near-optimal exploratory action and
maintain competitive coverage speed with improved computational efficiency.
1 Introduction
Mapping and exploration [1] of a priori unknown environments are crucial capa-
bilities for mobile robot autonomy. Information-theoretic exploration methods were
developed to guide robots to the unexplored areas of their environment that will
contribute the largest quantities of new information to an occupancy map [2], [3],
[4], [5], [6]. Meanwhile, simultaneous localization and mapping (SLAM) provides
a way to evaluate map accuracy and manage localization error under noisy relative
measurements. Consequently, SLAM-based exploration or active SLAM [7], [8], [9]
has been applied successfully for mapping an unknown environment autonomously
under robot landmark and pose uncertainty.
F. Chen, J. Wang, T. Shan and B. Englot are with the Department of Mechanical Engineering,
Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ, USA 07030.
e-mail: {fchen7, jwang92, tshan3, benglot}@stevens.edu
GCNhiddenlayers
Input
Output
x1x2x3x4xt
l1l2l3l4
x5
f1f2f3fn
fn-1
lj
x1x2x3x4xt
l1l2l3l4
x5
f1f2f3fn
fn-1
lj
Fig. 1: An illustration of our use of a Graph Convolutional Network (GCN), in which
the cyan-colored rectangular block represents its hidden layers. The original input
data is the exploration graph associated with the current robot state, which reflects
its pose history, observed landmarks, and candidate frontiers. The output graph has
the same structure as the input graph, but one of the frontier nodes is selected to be
the next goal location, which is represented by the darkest red color. The darkness
of the red color indicates the predicted probability of optimality of the frontier.
However, the decision-making component of active SLAM exploration methods is time-consuming due to the need for
forward simulation of future robot measurements, and prediction of the resulting
map and pose uncertainty. This approach will ultimately fail for real-time decision-
making with increasing dimensionality of the state space and the action space, due
to the costly time complexity of such methods.
We propose a novel approach that uses an exploration graph to select the next
sensing action via Graph Convolutional Networks (GCN) in real-time. The proposed
graph captures the current robot state, its pose history, observed landmarks, and can-
didate frontiers. We use the Expectation-Maximization (EM) exploration algorithm
[9] to generate training data with randomly generated maps and random initial robot
locations. Once trained, a mobile robot does not need to use forward simulation to
compute the expected information gain of each action exhaustively. After a training
phase, the GCN predicts optimal sensing actions based on the current exploration
graph. Fig. 1 summarizes our approach. Each node of the input graph carries a
distinct attribute vector of fixed size. The GCN produces an output graph with the same
structure as the input graph but updated attributes; in our method, each node of the
output graph has a single attribute, which takes on a value of one for the selected
frontier node and zero for all other nodes. Our method can provide an accurate and
efficient strategy to explore an a priori unknown environment populated with sparse
landmarks. For example, when an underwater robot explores a subsea environment,
it needs an efficient strategy to build a map because it has a limited power supply, and
a scarcity of subsea landmarks to support navigation. Meanwhile, communication
constraints require onboard real-time computation for navigation decision-making.
The contributions of this paper are as follows. We use the proposed exploration
graph to represent a SLAM-dependent robot’s state space and action space using
a generalized representation. Compared with typical metric maps, our graph only
records the topological relationships between poses, observed landmarks, and can-
didate waypoints. Accordingly, the learned exploration policy will not be influenced
as heavily by details such as the coloration of the environment, the type of feature
descriptor used, or the orientation of obstacles in the workspace, and thus a smaller
volume of training data is needed than would be required to learn from metric maps
directly. Using this representation, we propose what is to our knowledge the first
graph-learning-based active SLAM exploration approach, offering real-time viable
decision-making during robot exploration under uncertainty.
1.1 Related Work
Information-theoretic exploration methods guide a robot to explore the unknown
environment by repeatedly selecting the sensing action expected to be most infor-
mative [2], [3], [4]. It has also proven effective to use Gaussian process frontier
maps as a predictive tool to support exploration [5], [6]. However, these methods do
not consider a robot’s localization uncertainty during the exploration process, which
eventually leads to an inaccurate map.
Active SLAM exploration approaches have been developed to reduce both the
uncertainty of a robot’s pose and entropy of the map by considering the correlation
between localization and information gain [7]. The approach proposed in [8] uses the
particle filter to reduce the overall uncertainty for both maps and poses by capturing
a trajectory’s uncertainty using the particle weight. The Expectation-Maximization
(EM) Exploration algorithm [9] introduces virtual landmarks to represent the uncer-
tainty of unexplored regions, and a novel utility function for solving the exploration
problem that seeks the most accurate map possible with every sensing action.
Learning-based exploration methods can offer reduced computation time and
near-optimal exploration strategies. Bai et al. proposed a Gaussian process based
mutual information prediction method [10], and a subsequent Bayesian optimization
active sampling approach [11], to speed up the prediction of mutual information
throughout a robot’s action space. Furthermore, [12] and [13] show that deep su-
pervised learning and deep reinforcement learning [14] can produce high-quality
solutions for the exploration problem by using local occupancy maps as input data.
Moreover, global occupancy maps are used as input data to find the frontier asso-
ciated with the next best view using deep reinforcement learning in [15]. However,
an occupancy grid map has limitations as an input to represent the environment.
For example, in some instances neural networks may be unable to perform reliably
over a map with a rotation relative to the maps presented in training. Hence, such a
framework may take a longer time to converge during the training phase, or even fail
to converge, due to the size of the state space induced by an occupancy map.
Learning from graphs is an emerging research area. Graph Convolutional Net-
works (GCNs) [16] have been successfully applied in many fields such as social
networks [17] and chemistry [18]. Spectral graph convolutions underpin the
layer-wise propagation of these graph-based neural network models. Sanchez-Gonzalez et al. [19] proposed to
solve the control problem by using graphs to represent physical models with Graph
Nets [20]. For graph learning, the input data can be variable-size graphs, and the
size of the output data scales with the number of nodes of the input graph.
By combining active-SLAM exploration with the graph-learning approach, we
propose and develop the first graph-learning-based robot exploration framework. The
proposed exploration graph structure offers a smaller state space for learning than
an occupancy map and is a more generalized representation of a SLAM-dependent
mobile robot’s environment. By permitting real-time computation due to low query
time complexity, a graph-learning approach is also scalable to high-dimensional
state and action spaces.
1.2 Paper Organization
A formal definition of the problem is given in Section 2, including a brief discussion
of the EM exploration algorithm, which generates the training data for our GCN
framework, and an overview of our GCN model. Experimental results are presented
in Section 3, with conclusions in Section 4.
2 Problem Formulation and Approach
2.1 Simultaneous Localization and Mapping Framework
We formulate graph-based SLAM as a least-squares problem. The definitions of the
Gaussian robot motion model and the Gaussian measurement model are as follows:
x_i = h_i(x_{i-1}, u_i) + w_i, \quad w_i \sim \mathcal{N}(0, Q_i), \qquad (1)
z_k = g_k(x_{i_k}, l_{j_k}) + v_k, \quad v_k \sim \mathcal{N}(0, R_k), \qquad (2)
where X = \{x_i\} are robot poses, L = \{l_j\} are landmarks, and U = \{u_i\} is a given
motion input. i_k and j_k denote the pose and the landmark associated with
the k-th measurement. The estimate of the poses and the landmarks is obtained by
solving the nonlinear least-squares problem:
X^*, L^* = \arg\min_{X, L} \sum_i \lVert x_i - h_i(x_{i-1}, u_i) \rVert^2_{Q_i} + \sum_k \lVert z_k - g_k(x_{i_k}, l_{j_k}) \rVert^2_{R_k}. \qquad (3)
In the EM exploration algorithm used to produce our training data, this is solved
using the GTSAM [21] implementation of iSAM2 [22] to construct a factor graph
and perform incremental nonlinear least-squares smoothing. This graphical model-
based inference approach provides Gaussian marginal distributions and Gaussian
joint marginal distributions.
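To make the factor-graph formulation concrete, the following is a minimal sketch of incremental smoothing with the GTSAM Python bindings. The specific poses, measurements, and noise sigmas are illustrative assumptions, not values from our experiments.

```python
import gtsam
import numpy as np

# Noise models standing in for Q_i (odometry) and R_k (range-bearing);
# the sigma values here are illustrative only.
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
meas_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.02, 0.01]))
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([1e-3, 1e-3, 1e-3]))

graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

x0, x1 = gtsam.symbol('x', 0), gtsam.symbol('x', 1)
l0 = gtsam.symbol('l', 0)

# Prior on the first pose anchors the trajectory.
graph.add(gtsam.PriorFactorPose2(x0, gtsam.Pose2(0, 0, 0), prior_noise))
# Odometry factor: x_1 = h(x_0, u_1) + w_1, as in Eq. (1).
graph.add(gtsam.BetweenFactorPose2(x0, x1, gtsam.Pose2(1.0, 0.0, 0.0), odom_noise))
# Landmark measurement factor: z_k = g(x_1, l_0) + v_k, as in Eq. (2).
graph.add(gtsam.BearingRangeFactor2D(x1, l0, gtsam.Rot2.fromAngle(0.3), 2.0, meas_noise))

initial.insert(x0, gtsam.Pose2(0, 0, 0))
initial.insert(x1, gtsam.Pose2(0.9, 0.1, 0.0))
initial.insert(l0, gtsam.Point2(2.8, 0.7))

# iSAM2 performs the incremental nonlinear least-squares smoothing of Eq. (3),
# and exposes the Gaussian marginals used downstream.
isam = gtsam.ISAM2()
isam.update(graph, initial)
estimate = isam.calculateEstimate()
marginal_cov = isam.marginalCovariance(l0)  # marginal covariance for landmark l_0
```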
2.2 Expectation-Maximization Exploration Algorithm
The EM exploration algorithm, which repeatedly selects the exploratory sensing
action that is expected to yield the most accurate map, offers an appealing balance
between managing uncertainty and exploring efficiently [9]. Virtual landmarks are
introduced as latent variables, and the exploration process is managed iteratively
through expectation-maximization. The virtual map, a uniformly discretized grid
map comprised of virtual landmarks in its cells (each characterized by an occupancy
probability and error covariance), is estimated using the current SLAM estimate
and measurements in the E-step. Then in the M-step, the optimal path is selected to
minimize the uncertainty of the virtual landmarks; each cell of the initially
unexplored map holds a virtual landmark prior with a large initial covariance.
We use A-optimality [23] as our uncertainty criterion:
\varphi_A(\Sigma) = \mathrm{tr}(\Sigma), \qquad (4)
where \Sigma is a covariance matrix of the virtual landmarks. By calculating the trace
of the covariance of all virtual landmarks, we can quantify the uncertainty of each
virtual map. The cost function for the EM algorithm is defined as follows:
U = \sum_{v_k \in V} \varphi_A(\Sigma_{v_k}), \qquad (5)
where V={vk}are virtual landmarks in a virtual map. All the virtual landmarks,
which represent the possible real landmarks, are used to evaluate the performance of
each potential action. The goal is to minimize map uncertainty with each decision-
making step. Hence we define the action reward r_a to guide a robot accordingly:
r_a = U_0 - U_a - \alpha C_a. \qquad (6)
In Eq. (6), C_a denotes a cost-to-go that disfavors actions with long travel
distance, and \alpha is a weight on the cost-to-go. U_0 is the initial uncertainty and U_a is
the uncertainty after taking action a.
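As an illustration of Eqs. (4)-(6), the sketch below scores candidate frontiers with the A-optimality utility. The predicted covariances would in practice come from forward-simulating iSAM2 updates, which is stubbed out here; the weight and numbers are hypothetical.

```python
import numpy as np

def a_opt(cov: np.ndarray) -> float:
    """A-optimality criterion of Eq. (4): trace of a covariance matrix."""
    return float(np.trace(cov))

def map_uncertainty(virtual_covs) -> float:
    """Eq. (5): summed A-optimality over all virtual landmarks."""
    return sum(a_opt(c) for c in virtual_covs)

def action_reward(covs_before, covs_after, travel_cost, alpha=0.2):
    """Eq. (6): uncertainty reduction minus weighted cost-to-go.
    alpha here is an illustrative value, not the paper's setting."""
    return map_uncertainty(covs_before) - map_uncertainty(covs_after) - alpha * travel_cost

# Hypothetical usage: pick the frontier whose forward-simulated virtual map
# yields the largest reward.
covs0 = [np.eye(2)] * 4                  # initial virtual-landmark priors
candidates = {
    "f1": ([0.5 * np.eye(2)] * 4, 3.0),  # (predicted covariances, path length)
    "f2": ([0.8 * np.eye(2)] * 4, 1.5),
}
best = max(candidates, key=lambda f: action_reward(covs0, *candidates[f]))
```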
2.3 Graph Convolutional Networks for Exploration
Fig. 2: In the left figure, the grayscale color represents the probability of occupancy
of a given map cell. The ellipses are estimation error covariances of each respective
location on the virtual map. Robot poses, landmarks and frontiers are indicated by
green dots, blue dots and red dots respectively. All landmark nodes and the current
pose node are connected with their nearest frontiers. In the right figure, an input
exploration graph is extracted from the current exploration state. Each edge in this
graph is weighted with the Euclidean distance between the two vertices connected.
We define the exploration graph G = (V, E), where V = \{\{x_i\}, \{l_j\}, \{f_n\}\}. The
edges E connecting pose to pose (x_i, x_{i+1}) and pose to landmark (x_{i_k}, l_{j_k}) represent
odometry measurements and landmark measurements respectively. The frontier
nodes f_n are sampled from the boundary cells between the free and unexplored space,
and they are connected to the nearest node, either pose or landmark, in the graph. It
is possible for multiple nodes to connect to the same frontier node. A simple example of
extracting the exploration graph from a robot's current state is shown in Fig. 2.
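A minimal sketch of this construction follows, using networkx as a stand-in graph library (the paper does not prescribe one); the node-key scheme is our assumption.

```python
import networkx as nx
import numpy as np

def build_exploration_graph(poses, landmarks, frontiers, observations):
    """Assemble the exploration graph G = (V, E) described above.

    poses:      {("x", i): (x, y)} pose positions, i increasing along the trajectory
    landmarks:  {("l", j): (x, y)} observed landmark positions
    frontiers:  {("f", n): (x, y)} sampled frontier positions
    observations: iterable of (pose_key, landmark_key) measurement pairs
    Edges are weighted with Euclidean distance, as in Fig. 2.
    """
    G = nx.Graph()
    for key, p in {**poses, **landmarks, **frontiers}.items():
        G.add_node(key, xy=np.asarray(p, dtype=float))

    def dist(a, b):
        return float(np.linalg.norm(G.nodes[a]["xy"] - G.nodes[b]["xy"]))

    # Pose-to-pose edges encode the odometry chain x_i -- x_{i+1}.
    chain = sorted(poses)  # sorts on the integer index i
    for a, b in zip(chain, chain[1:]):
        G.add_edge(a, b, weight=dist(a, b))
    # Pose-to-landmark edges encode landmark measurements x_{i_k} -- l_{j_k}.
    for pk, lk in observations:
        G.add_edge(pk, lk, weight=dist(pk, lk))
    # Each frontier connects to its nearest pose or landmark node.
    anchors = list(poses) + list(landmarks)
    for fk in frontiers:
        nearest = min(anchors, key=lambda a: dist(fk, a))
        G.add_edge(fk, nearest, weight=dist(fk, nearest))
    return G
```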
Instead of explicitly computing the sensing action that maximizes the reward of
Eq. (6) through an expensive evaluation of all the candidate frontiers, we employ
a GCN to predict the optimal frontier with respect to (6). However, we use the
EM algorithm to generate training data. For each decision-making step, actions
A = \{a_n\} are generated for each frontier node with straight-line path planning, and
the EM algorithm provides an action reward R = \{r_n\} for all frontiers via forward
simulation. The rewards R are normalized so that r_n \in [0, 1]. We require the
reward of an optimal frontier to exceed 0.95. If multiple frontiers are
optimal at the same time, we select one at random. The rewards associated with
pose and landmark nodes are always 0 because we only select the robot's next action
from among frontier nodes. Therefore, the training label y_i for node n_i is defined as follows:
y_i = \begin{cases} 1 & r_i > 0.95 \\ 0 & \text{otherwise} \end{cases} \qquad (7)
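A possible labeling routine per Eq. (7) and the surrounding text; the normalization and random tie-breaking follow the description above, while the function and variable names are ours.

```python
import numpy as np

def make_labels(frontier_rewards, rng=np.random.default_rng()):
    """Normalize EM rewards to [0, 1] and emit labels per Eq. (7).

    frontier_rewards: {frontier_id: raw reward from Eq. (6)}.
    Returns {frontier_id: 0 or 1}; pose/landmark nodes are labeled 0 elsewhere.
    """
    ids = list(frontier_rewards)
    r = np.array([frontier_rewards[i] for i in ids], dtype=float)
    r = (r - r.min()) / max(r.max() - r.min(), 1e-12)  # normalize to [0, 1]
    optimal = [i for i, ri in zip(ids, r) if ri > 0.95]
    if not optimal:            # degenerate case: all rewards equal
        optimal = ids
    chosen = rng.choice(optimal)  # random tie-break among optimal frontiers
    return {i: int(i == chosen) for i in ids}
```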
A multi-layer GCN is adopted to explore the environment. The layer-wise propa-
gation rule of the GCN [16] is as follows:
H^{(l+1)} = \sigma\left( \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right), \qquad (8)
with \hat{A} = A + I, where A is the adjacency matrix of the exploration graph G and \hat{D} is
the degree matrix of \hat{A}. H^{(l)} represents the l-th hidden layer of the GCN model with
the activation function \sigma(\cdot), and W^{(l)} is the weight matrix of the l-th layer.
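The symmetric normalization in Eq. (8) can be computed directly from the exploration graph's adjacency matrix; a minimal numpy sketch:

```python
import numpy as np

def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """Compute D_hat^{-1/2} (A + I) D_hat^{-1/2} from Eq. (8)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops: A_hat = A + I
    d = A_hat.sum(axis=1)                   # degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D_hat^{-1/2}
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, H, W, activation=lambda x: np.maximum(x, 0)):
    """One propagation step: H^{(l+1)} = sigma(A_norm H^{(l)} W^{(l)})."""
    return activation(A_norm @ H @ W)
```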
We use a three-layer GCN model. The input feature vector
s_i = [s_{i1}, s_{i2}, s_{i3}, s_{i4}, s_{i5}] for node n_i \in V is defined as follows:
s_{i1} = \varphi_A(\Sigma_i), \qquad (9)
s_{i2} = \sqrt{(x_i - x_t)^2 + (y_i - y_t)^2}, \qquad (10)
s_{i3} = \mathrm{arctan2}(y_i - y_t, \, x_i - x_t), \qquad (11)
s_{i4} = p(m_i = 1), \qquad (12)
s_{i5} = \begin{cases} 0 & n_i = x_t \\ 1 & n_i \in \{f_n\} \\ -1 & \text{otherwise} \end{cases} \qquad (13)
In Eq. (9), we use the same A-Optimality criterion of Eq. (4) for providing a
suitable feature to represent a node’s uncertainty, derived from our virtual map.
Relative position information to the current robot pose is provided by the second and
the third features. The Euclidean distance to the current pose is the second feature
in Eq. (10), where the x and y coordinates represent the locations of the node and the
current pose. The relative orientation to the current robot pose is the third feature as
shown in Eq. (11). Map information is provided by the fourth feature, where m_i is
the occupancy of the map cell associated with node n_i in Eq. (12). Since the robot
needs to travel to the optimal frontier from its current pose, it is important to provide
indicator variables to denote the current pose and frontiers. In Eq. (13), we define
the indicator of the current pose to be 0, all frontiers to be 1, and all other nodes -1.
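Putting Eqs. (9)-(13) together, a node's feature row might be assembled as follows; the helper signature and flags are our assumptions.

```python
import numpy as np

def node_features(node_xy, robot_xy, node_cov, occ_prob, is_frontier, is_current):
    """Assemble s_i = [s_i1, ..., s_i5] per Eqs. (9)-(13).

    node_xy, robot_xy: (x, y) positions of the node and the current pose x_t;
    node_cov: 2x2 covariance from the virtual map; occ_prob: p(m_i = 1);
    is_frontier / is_current: node type flags for the indicator of Eq. (13).
    """
    dx, dy = node_xy[0] - robot_xy[0], node_xy[1] - robot_xy[1]
    s1 = float(np.trace(node_cov))  # Eq. (9): A-optimality of the node
    s2 = float(np.hypot(dx, dy))    # Eq. (10): distance to current pose
    s3 = float(np.arctan2(dy, dx))  # Eq. (11): relative bearing
    s4 = float(occ_prob)            # Eq. (12): occupancy probability
    if is_current:                  # Eq. (13): indicator variable
        s5 = 0.0
    elif is_frontier:
        s5 = 1.0
    else:
        s5 = -1.0
    return np.array([s1, s2, s3, s4, s5])
```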
For the GCN model, the definition of each hidden layer is as follows:
H^{(1)} = \sigma_1\left( \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} S\, W^{(1)} \right), \qquad (14)
H^{(2)} = \mathrm{Dropout}\left( H^{(1)} \right), \qquad (15)
H^{(3)} = \sigma_3\left( \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(2)} W^{(3)} \right). \qquad (16)
In Eq. (14), the size of the weight matrix W^{(1)} is 5 × 1000 because we
upsample the number of features from 5 to 1000. The activation function \sigma_1 is
a Rectified Linear Unit (ReLU). We add a dropout layer with a 0.5 dropout rate as
shown in Eq. (15). In the output layer, the size of the weight matrix W^{(3)} in Eq. (16)
is 1000 × 1 and the activation function \sigma_3 is a sigmoid function.
The cross-entropy loss function is defined as follows:
\mathcal{L} = -\left( w_1\, y \log \hat{y} + w_0\, (1 - y) \log(1 - \hat{y}) \right), \qquad (17)
where w_1 and w_0 are the weights for class 1 and class 0, used to balance the cost
on imbalanced data. We empirically set w_1 = 21 and w_0 = 1.
We use the Adam first-order gradient-based algorithm [24] when optimizing the
parameters of the GCN model.
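For concreteness, here is a compact TensorFlow 2 sketch of the model of Eqs. (14)-(17); the paper does not specify initializers or the training loop, so everything beyond the layer sizes, dropout rate, class weights, and the Adam optimizer is an assumption.

```python
import tensorflow as tf

class ExplorationGCN(tf.keras.Model):
    """Three-layer GCN of Eqs. (14)-(16): 5 -> 1000 (ReLU) -> dropout -> 1 (sigmoid)."""

    def __init__(self):
        super().__init__()
        self.W1 = self.add_weight(shape=(5, 1000), initializer="glorot_uniform", name="W1")
        self.W3 = self.add_weight(shape=(1000, 1), initializer="glorot_uniform", name="W3")
        self.drop = tf.keras.layers.Dropout(0.5)

    def call(self, A_norm, S, training=False):
        # A_norm is the precomputed D_hat^{-1/2} A_hat D_hat^{-1/2} of Eq. (8).
        H1 = tf.nn.relu(A_norm @ S @ self.W1)    # Eq. (14)
        H2 = self.drop(H1, training=training)    # Eq. (15)
        return tf.sigmoid(A_norm @ H2 @ self.W3) # Eq. (16)

def weighted_bce(y, y_hat, w1=21.0, w0=1.0, eps=1e-7):
    """Class-weighted cross-entropy of Eq. (17)."""
    y_hat = tf.clip_by_value(y_hat, eps, 1.0 - eps)
    return -tf.reduce_mean(w1 * y * tf.math.log(y_hat)
                           + w0 * (1.0 - y) * tf.math.log(1.0 - y_hat))

# Hypothetical training step with the Adam optimizer [24].
model = ExplorationGCN()
optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(A_norm, S, y):
    with tf.GradientTape() as tape:
        y_hat = model(A_norm, S, training=True)
        loss = weighted_bce(y, y_hat)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```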
3 Experiments and Results
3.1 Experimental Setup
We trained the proposed GCN system over a two-dimensional landmark-based
simulation environment, using 500 unique 40m × 40m maps with randomly seeded
landmarks. From the exploration sequences run over these maps, we selected 32
random exploration graphs as a training batch, which was repeated for 20 batches,
with each batch undergoing 500 training epochs. All of this data was used to train
our GCN policy; we generated new random environments, excluded from this
training dataset, for the testing phase. In our experiments, we impose a feature density
of 0.005 landmarks per m² on all training and testing maps. The simulated robot
in our experiments is equipped with a 5m range sensor with a 360-degree field of
view, and Gaussian white noise of standard deviation 0.02m in range and 0.5 deg
in bearing. Straight-line path planning is introduced to repeatedly lead the robot
from its current location to the goal frontier location, by which it will first turn to
face the goal, and then travel along a straight-line path to the goal. Although an
occupancy grid is used to keep track of landmark locations, the point landmarks are assumed not to pose collision hazards.
Fig. 3: Representative exploration trials of the four methods, (a) GCN, (b) EM,
(c) Nearest Frontier, and (d) Random, on the same 40m × 40m map, over an
equivalent number of decision-making instances.
During one time step of the simulation, the
robot can rotate up to 180 deg, or translate forward up to 2m; the standard deviation
of the Gaussian white process noise on the translation and rotation actions is 0.1m and
0.2 deg, respectively. The resolution of the virtual map is 2m (the size of
all square map cells), and the initial error covariance for virtual landmarks is 1 m² in
each dimension. The virtual landmarks of the EM algorithm are only used here to
calculate the reward of Eq. (6), which serves as an indicator of map accuracy weighed
against travel expense. The virtual map is otherwise excluded from iSAM2 factor
graphs and from the exploration graph. For all the exploration experiments, the task
is terminated after 85% of a map has been explored.
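For reference, the setup parameters above might be collected into a single configuration object; this layout is purely illustrative, mirroring the values stated in the text rather than reproducing the paper's code.

```python
# Illustrative consolidation of the experimental parameters stated above.
SIM_CONFIG = {
    "map_size_m": (40.0, 40.0),
    "landmark_density_per_m2": 0.005,
    "sensor_range_m": 5.0,
    "sensor_fov_deg": 360.0,
    "range_noise_std_m": 0.02,
    "bearing_noise_std_deg": 0.5,
    "max_rotation_per_step_deg": 180.0,
    "max_translation_per_step_m": 2.0,
    "translation_noise_std_m": 0.1,
    "rotation_noise_std_deg": 0.2,
    "virtual_map_resolution_m": 2.0,
    "virtual_landmark_prior_cov_m2": 1.0,
    "termination_coverage": 0.85,
}
```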
The simulation environment is written in Python and C++, and our GCN model
is trained using TensorFlow [25]. All of the test algorithms were run on a computer
equipped with an Intel i9 3.6 GHz CPU and an Nvidia GeForce Titan RTX GPU.
Fig. 4: The results of 50 exploration trials of each method, with the same randomly
initialized landmarks and robot start locations, on 40m × 40m maps: (a) average
uncertainty of landmarks, (b) max uncertainty of the trajectory, (c) map entropy
reduction, and (d) planning time (the first three metrics are plotted per time-step
of the simulation).
3.2 Exploration Comparison
We compared the proposed GCN framework with (1) a nearest frontier approach, (2)
the selection of a random frontier, and (3) the EM approach over 50 exploration trials.
The test environments were randomly generated and excluded from the training data.
For each trial, every approach uses the same random map and the same random initial
robot location. We provide examples of the four competing exploration methods over
the same random map with the same random initial location, in Fig. 3. The error
ellipses (0.5 standard deviations) shown in all grid cells represent the uncertainty of
the virtual landmarks. Derived from the robot poses that observed these cells, their
covariance is a measure of map accuracy. These virtual landmarks are only used for
reward calculation in Eq. (6) and they are not included in the SLAM factor graph or
exploration graph. The blue circle is the sensing area, and the center of the blue circle
is the current location of the robot. The orange error ellipses show the uncertainty of
the real landmarks and the green error ellipses show the uncertainty of the previous
robot poses. The magenta points are frontier nodes that are potential goal positions
for the robot, and the red point is the selected goal frontier. The cross symbols are the
locations of the real landmarks. We extend the boundary of the map by 6m in each
direction to ensure the robot can successfully obtain the reward around the boundary
area, while frontier nodes are not allowed to be generated in these extended areas,
and their observation does not contribute to the evaluation of reward.
We select three metrics to evaluate the relative performance of four exploration
methods, as shown in Fig. 4. The average landmark uncertainty is a key element to
evaluate the accuracy of the final map, and we only consider real landmarks in this
comparison. The maximum uncertainty of the robot’s trajectory is used for showing
the accuracy of pose estimation along the trajectory. Map entropy reduction is used
to evaluate the efficiency of robot exploration. The random method and the EM
algorithm provide the lowest landmark uncertainty (Fig. 4a), but the random method
is the most inefficient method for exploring the environment (Fig. 4c). In Fig. 3d, it is
clear that the random method produces a disorganized trajectory: it revisits
previously observed landmarks more often, which reduces uncertainty, but its travel
distance is almost twice that of the other methods (Fig. 4c). With respect to map
accuracy, the GCN method is slightly worse than the EM algorithm (which was used
to train the GCN) and the random method, but the uncertainty of those three methods
is at a very similar level in Fig. 4a. In Fig. 4b, the GCN method and EM algorithm
have very similar performance, which is slightly better than the random method.
The nearest frontier method achieves the same map entropy reduction performance
as the GCN and EM approaches (Fig. 4c), but it achieves the worst performance
for the uncertainty of the landmarks and the poses along the trajectories as shown
in Fig. 4a and Fig. 4b, because this method only chooses the nearest frontier to
explore the environment and never actively seeks a frontier which is near a previous
landmark to reduce uncertainty, as shown in Fig. 3c. Overall, the GCN policy and
the EM algorithm demonstrate the best performance in producing an accurate map
of landmarks, and accurate pose estimates, using time-efficient trajectories.
3.3 Computation Time
In Section 3.2, both the GCN and EM exploration algorithms achieve similarly
satisfying exploration performance. However, the time complexity of the two methods
is very different. The time complexity of the EM algorithm's decision-making is
O(N_{action}(C_1 + C_2)), where N_{action} is the number of frontiers, C_1 is the cost of
each iSAM2 predictive estimate over a candidate trajectory (a function of the number
of mapped true landmarks and prior poses), and C_2 is the cost of the covariance
update for virtual landmarks (a function of the number of prior poses, the virtual
map resolution, and the size of the region being explored). Therefore, the computational
cost of the EM algorithm increases dramatically with the increasing size of the
state space (including the number of landmarks) and the action space, which will
ultimately lead to failure in complex and large-scale real-time exploration scenarios.
The time complexity of decision-making for the GCN approach is O(n), where n
represents the number of nodes in a given exploration graph. This reflects the cost
of matrix multiplication with a GPU, performed for each of the two GCN layers that
require it. For the GCN approach, none of the N_{action} candidate actions need to be
evaluated explicitly, hence the GCN approach remains robust to high-dimensional
state and action spaces.
We test the computation time for exploration decision-making on four different
map sizes, namely 40m × 40m, 60m × 60m, 80m × 80m and 100m × 100m. Each size
has the same feature density of 0.005 landmarks per m². The test result, to which
a curve is fit using a sixth-order polynomial regression, is shown in Fig. 4d. The plot
represents the mean of the computation time used for exploration decision-making
(i.e., forward simulation). The GCN computation time is near zero, which is real-
time viable. In contrast, the computation time of the EM algorithm grows sharply
with increasing map size. On 100m × 100m maps, the average computation time for
the EM algorithm is 16.5 s, which is not acceptable for a real-time exploration task.
As mentioned above, the computation time shown in Fig. 4d captures only the
decision-making component of exploration, the most costly step of the process,
which entails selecting the optimal goal frontier from among the available frontier
nodes. The computation time of other tasks, such as solving SLAM over the robot’s
true trajectory, updating the virtual landmarks as new measurements are collected,
and generating frontier nodes from the grid map, is not constant, but is equivalent
for both the EM and GCN algorithms (and these tasks are independent of the size of
the robot’s action space). The computation time of generating the exploration graph
is not constant either, but consumes a small fraction of the overall computation time
required for the GCN approach.
3.4 Scalability
For large-size maps with many landmarks, it is time-consuming to generate training
data using the EM algorithm because of the poor scaling of its computational expense,
as shown in Fig. 4d. Our proposed method offers the additional prospect of
scalability in training, by allowing a GCN trained on small maps to be queried over
a large map. Larger maps accumulate a longer history of poses, more landmarks,
and more frontiers, yielding larger exploration graphs, by which the state space
and the action space expand as the map size increases. A robot is also likely to
accumulate more map and pose estimation error when exploring larger maps. In this
final experiment, we use a GCN trained on 40m × 40m maps with a 0.005 landmarks
per m² feature density to explore larger maps of 60m × 60m, 80m × 80m
and 100m × 100m with the same feature density. Representative examples of GCN
exploration over the three different map sizes are shown in Fig. 5, along with average
entropy reduction over 50 trials across all competing algorithms.
We show the map and pose uncertainty incurred by all four algorithms over differ-
ent large-size maps in Fig. 6. The nearest frontier method has the worst performance
over every map size.
Fig. 5: (a), (c), (e): GCN exploration on 60m × 60m, 80m × 80m, and 100m × 100m
maps, respectively. (b), (d), (f): Map entropy reduction over 50 exploration trials on
each of the three map sizes. The GCN policy is trained on 40m × 40m maps.
As the map size increases, the nearest frontier method accumulates more error during exploration. In contrast, the random method and the EM
algorithm achieve similar performance even as the map size grows. However, al-
though the random method and the EM algorithm have similar optimal performance
with respect to curbing uncertainty, the random method produces the least efficient
exploration trajectories as shown in Figs. 5b, 5d, and 5f.
Fig. 6: Landmark and pose uncertainty management over 50 exploration trials on
three different map sizes, with the same randomly initialized landmarks for every
algorithm: (a), (c), (e) show the average uncertainty of landmarks and (b), (d), (f)
the max uncertainty of the trajectory, on 60m × 60m, 80m × 80m, and 100m × 100m
maps respectively. The GCN policy is trained on 40m × 40m maps.
Here, the EM algorithm is still an optimal method since it produces short trajectories with low-uncertainty
landmark and pose estimates. The GCN approach is a near-optimal method com-
pared with the EM algorithm. It still achieves low uncertainty of landmarks and
poses over the 60m × 60m maps, but as the map size grows, the GCN framework
gradually performs worse because the training data was generated from 40m × 40m
maps, and the larger maps’ exploration graphs differ from those encountered in the
training phase. However, the uncertainty of the landmark and pose estimates from
the GCN approach is still lower than the nearest frontier method, although they
have similarly competitive travel distance, as shown in Figs. 5b, 5d, and 5f. As we
discussed in Section 3.3, the computation time of the EM algorithm is not real-time
on these large maps, so the GCN framework offers a suitable alternative.
4 Conclusions and Future Work
In this paper, we proposed a novel graph learning-based exploration framework that
uses a GCN to achieve real-time viable exploration decision-making, and provides
an accurate and efficient exploration result. For other active-SLAM exploration
algorithms, with increasing dimensionality of the robot’s state space and action
space, the procedure’s computational complexity ceases to be real-time viable. The
GCN framework estimates a robot’s next optimal sensing action using an exploration
graph, which is introduced to encapsulate the current state space and action space.
Moreover, the proposed method is scalable to testing in higher-dimensional state-
action spaces, even when trained in lower-dimensional state-action spaces. Looking
ahead to future work, we wish to utilize a reduced graph structure to represent the
current state-action space, ideally in which the constraints tied to a robot’s pose
history, which grows without bound, can be collapsed onto the landmarks only,
which are finite in number. Moreover, in future work we will explore the viability
of different graph learning models, and investigate the framework’s extensibility to
more complex real-world active SLAM scenarios.
Acknowledgements This research has been supported by the National Science Foundation, grant
number IIS-1723996.
References
1. S. Thrun, W. Burgard and D. Fox. Exploration. Probabilistic Robotics, 569–605, MIT Press
(2005)
2. B. Charrow, G. Kahn, S. Patil, S. Liu, K. Goldberg, P. Abbeel, N. Michael and V. Kumar.
Information-theoretic Planning with Trajectory Optimization for Dense 3D Mapping. Pro-
ceedings of Robotics: Science and Systems (2015)
3. B. Charrow, S. Liu, V. Kumar and N. Michael. Information-theoretic Mapping using Cauchy-
Schwarz Quadratic Mutual Information. Proceedings of the IEEE International Conference
on Robotics and Automation, 4791–4798 (2015)
4. B.J. Julian, S. Karaman and D. Rus. On Mutual Information-Based Control of Range Sensing
Robots for Mapping Applications. The International Journal of Robotics Research, 33(10):
1375–1392 (2014)
5. M. G. Jadidi, J. V. Miró, R. Valencia and J. Andrade-Cetto. Exploration on Continuous
Gaussian Process Frontier Maps. Proceedings of the IEEE International Conference on
Robotics and Automation, 6077–6082 (2014)
6. M. G. Jadidi, J. V. Miró and G. Dissanayake. Mutual Information-based Exploration on
Continuous Occupancy Maps. Proceedings of the IEEE/RSJ International Conference on
Intelligent Robots and Systems, 6086–6092 (2015)
7. R. Valencia, J. V. Miró, G. Dissanayake and J. Andrade-Cetto. Active Pose SLAM. Proceed-
ings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1885–1891
(2012)
8. C. Stachniss, G. Grisetti and W. Burgard. Information Gain-based Exploration using Rao-
Blackwellized Particle Filters. Proceedings of Robotics: Science and Systems, 65–72 (2005)
9. J. Wang and B. Englot. Autonomous Exploration with Expectation-Maximization. Proceed-
ings of the International Symposium on Robotics Research (2017)
10. S. Bai, J. Wang, K. Doherty and B. Englot. Inference-Enabled Information-Theoretic Explo-
ration of Continuous Action Spaces. Proceedings of the International Symposium on Robotics
Research, 419–433 (2015)
11. S. Bai, J. Wang, F. Chen and B. Englot. Information-theoretic Exploration with Bayesian
Optimization. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots
and Systems, 1816–1822 (2016)
12. S. Bai, F. Chen and B. Englot. Toward Autonomous Mapping and Exploration for Mobile
Robots through Deep Supervised Learning. Proceedings of the IEEE/RSJ International
Conference on Intelligent Robots and Systems, 2379–2384 (2017)
13. F. Chen, S. Bai, T. Shan and B. Englot. Self-Learning Exploration and Mapping for Mobile
Robots via Deep Reinforcement Learning. Proceedings of the AIAA SciTech Forum, (2019)
14. V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves,
M. Riedmiller, A.K. Fidjeland, G. Ostrovski and S. Petersen. Human-level Control through
Deep Reinforcement Learning. Nature, 518(7540):529–533 (2015)
15. F. Niroui, K. Zhang, Z. Kashino and G. Nejat. Deep Reinforcement Learning Robot for Search
and Rescue Applications: Exploration in Unknown Cluttered Environments. IEEE Robotics
and Automation Letters, 4(2):610–617 (2019)
16. T. N. Kipf and M. Welling. Semi-supervised Classification with Graph Convolutional Net-
works. Proceedings of the International Conference on Learning Representations (2017)
17. W. Hamilton, Z. Ying and J. Leskovec. Inductive Representation Learning on Large Graphs.
Advances in Neural Information Processing Systems, 1024–1034 (2017)
18. D.K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik
and R.P. Adams. Convolutional Networks on Graphs for Learning Molecular Fingerprints.
Advances in Neural Information Processing Systems, 2224–2232 (2015)
19. A. Sanchez-Gonzalez, N. Heess, J.T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell and
P. Battaglia. Graph Networks as Learnable Physics Engines for Inference and Control. arXiv
preprint, arXiv:1806.01242 [cs.LG] (2018)
20. P.W. Battaglia, J.B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski,
A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner and C. Gulcehre. Relational Inductive Biases,
Deep Learning, and Graph Networks. arXiv preprint, arXiv:1806.01261 [cs.LG] (2018).
21. F. Dellaert. Factor Graphs and GTSAM: A Hands-on Introduction. Technical Report, Georgia
Institute of Technology, GT-RIM-CP&R-2012-002 (2012).
22. M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. J. Leonard and F. Dellaert. iSAM2: Incre-
mental Smoothing and Mapping using the Bayes Tree. The International Journal of Robotics
Research, 31(2):216–235 (2012)
23. M. Kaess and F. Dellaert. Covariance Recovery from a Square Root Information Matrix for
Data Association. Robotics and Autonomous Systems, 57(12):1198–1210 (2009)
24. D. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. Proceedings of the
International Conference on Learning Representations (2015)
25. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis,
J. Dean, M. Devin and S. Ghemawat. Tensorflow: Large-scale Machine Learning on Hetero-
geneous Distributed Systems. arXiv preprint, arXiv:1603.04467 [cs.DC] (2016)