Fractional Binding in Vector Symbolic Representations for Efficient Mutual Information Exploration
P. Michael Furlong
Centre for Theoretical Neuroscience
University of Waterloo
Waterloo, Canada
michael.furlong@uwaterloo.ca
Terrence C. Stewart
University of Waterloo Collaboration Centre
National Research Council of Canada
Waterloo, Canada
terrence.stewart@nrc-cnrc.gc.ca
Chris Eliasmith
Centre for Theoretical Neuroscience
University of Waterloo
Waterloo, Canada
celiasmith@uwaterloo.ca
Abstract—Mutual information (MI) is a standard objective function for driving exploration. The use of Gaussian processes to compute information gain is limited by time and memory complexity that grows with the number of observations collected. We present an efficient implementation of MI-driven exploration by combining vector symbolic architectures with Bayesian Linear Regression. We demonstrate regret performance equivalent to a GP-based approach, with memory and time complexity that is constant in the number of samples collected, t, as opposed to O(t²) and O(t³), respectively, enabling long-term exploration.
Index Terms—mutual information sampling, Bayesian optimization, vector symbolic architecture, fractional binding
I. INTRODUCTION
Mutual information (MI) is a standard objective function for quantifying curiosity when exploring [1], [2]. In this paper we use Bayesian Optimization as a framework for curiosity, and present an algorithm for MI-driven exploration whose time and memory requirements are constant in the number of observations, improving on the O(t³) time and O(t²) memory requirements of Gaussian Process approaches to computing MI.
A common approach to informative sampling is to compute information gain using Gaussian Processes (GPs), e.g., [3]–[6]. However, computing the variance needed to compute MI requires inverting a matrix that grows with the square of the number of sampled data points, t. Unbounded growth of memory, and the concomitant increase in the time to evaluate sampling locations, are not compatible with long-term operation of systems with limited memory capacity.
To overcome the limitations of GPs, researchers have improved occupancy grid methods [7] with efficient algorithms for computing information gain [8], [9]. The complexity of these approaches tends to grow linearly in the number of grid cells. However, occupancy grids have constraints that GPs do not: they have a fixed resolution, and only points in the grid are modelled. These shortcomings can be ameliorated with irregular and adaptive representations (e.g., triangulated meshes or k-d trees), but such representations require additional machinery to represent the function domain, and more memory to represent larger areas.
We present an algorithm that provides the benefits of both approaches. It has memory and time requirements that are constant with respect to the number of observations and linear in the number of candidate sampling locations, yet it is still defined for all points in the function domain. We achieve this by combining the concept of fractional binding in Vector Symbolic Architectures (VSAs) with Bayesian Linear Regression (BLR) to model uncertainty over the function domain.
VSAs are used in modelling cognitive processes [10]–[14]. Symbols are represented as vectors, and cognition is conducted through operations on those vectors. Binding is a key operation, where a new symbol, C, is created by binding two existing symbols, A and B, denoted C = A ⊛ B, typically representing a slot-filler relationship between A and B.

Semantic Pointers are a neurally implemented VSA for which the binding operator is circular convolution.¹ Integer quantities, atomic symbols (e.g., words), and structured representations (e.g., sentences) can be represented in Semantic Pointers through binding. To represent integer quantities, binding is iterated an integer number of times, denoted S^k = S ⊛ ⋯ ⊛ S, where k ∈ ℕ and S ∈ ℝ^d is a fixed Semantic Pointer of dimension d, which we call an “axis pointer” or “axis vector”.

¹Plate [13] originally suggested circular convolution for a purely algebraic VSA.
Spatial Semantic Pointers (SSPs) extend Semantic Pointers to represent real numbers through the process of fractional binding [13], [15]. Fractional binding is implemented with the Fourier transform, S^x = F⁻¹{F{S}^x}, where x ∈ ℝ is the real value, encoded through element-wise exponentiation of the Fourier transform of the axis pointer, S. Using SSPs allows us to make a connection between biological representation and information-theoretic models of curiosity. SSPs link to biology through modelling grid and place cells [16], [17], representations linked to an organism’s location [18]. Further, the organization of spatial relationships may be used in other brain areas [19].
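As a concrete illustration, the following is a minimal NumPy sketch of fractional binding for a scalar value. The unitary axis-pointer construction (random Fourier phases with conjugate symmetry) is a standard choice in the SSP literature; the function names are ours, not the paper’s, and the sketch assumes an odd dimension d for simplicity.

```python
import numpy as np

def make_axis_pointer(d, rng):
    # Sample unit-modulus Fourier coefficients with conjugate symmetry so the
    # inverse FFT yields a real, unitary vector (assumes odd d for simplicity).
    phases = rng.uniform(-np.pi, np.pi, size=(d - 1) // 2)
    coeffs = np.concatenate(([1.0], np.exp(1j * phases), np.exp(-1j * phases[::-1])))
    return np.fft.ifft(coeffs).real

def fractional_bind(S, x):
    # S^x = F^{-1}{ F{S}^x }: element-wise exponentiation in the Fourier domain.
    return np.fft.ifft(np.fft.fft(S) ** x).real

S = make_axis_pointer(513, np.random.default_rng(seed=0))
assert np.allclose(fractional_bind(S, 1.0), S)  # S^1 recovers S
```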
SSPs connect to information-theoretic exploration through the kernel induced by the dot product over SSP vectors. SSPs induce a sinc kernel function [20], and sinc kernels have been found to be efficient kernels for kernel density estimators [21], [22]. Vector representations that induce kernels can be used to make memory- and time-efficient kernel density estimators, as in the EXPoSE algorithm [23]. But where Schneider et al. [23] built a KDE, we combine SSPs with Bayesian Linear Regression to approximate Gaussian Process regression.
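To see the kernel connection concretely, the sketch below (reusing make_axis_pointer and fractional_bind from the earlier sketch) compares the SSP dot product against NumPy’s normalized sinc; the dimension and the grid of offsets are arbitrary choices of ours.

```python
# The dot product between fractional powers of a unitary axis pointer
# approximates sinc(a - b); check it against numpy's normalized sinc.
rng = np.random.default_rng(seed=1)
S = make_axis_pointer(2049, rng)
deltas = np.linspace(-5.0, 5.0, 101)
k = np.array([fractional_bind(S, 0.0) @ fractional_bind(S, delta) for delta in deltas])
print(np.max(np.abs(k - np.sinc(deltas))))  # error shrinks as the dimension grows
```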
Other approaches combine vector representations with BLR to improve computational efficiency. ALPaCA uses uncertainty over the vector space for meta-learning [24], [25]. Perrone et al. [26] project data into a vector space for more computationally efficient Bayesian Optimization. However, these techniques require learning a projection from the input data into a vector space. The advantage of SSPs is that the representation does not need to be learned; it can be designed, further improving efficiency.
In this paper we compare the performance of our algorithm, Spatial Semantic Pointer Mutual Information Bayesian optimization (SSP-MI), to the Gaussian Process Mutual Information Bayesian optimization (GP-MI) algorithm developed in [6]. We show empirically that the regret achieved by our algorithm is at least as good as that of the GP algorithm, with time and memory complexity that is constant in the number of samples collected, t, as opposed to O(t³) and O(t²) for the GP-based method. The constant time and memory requirements of SSP-MI mean that it is feasible to deploy this algorithm on limited hardware for long-duration operations.
II. APPROACH
Our exploration algorithm uses the Bayesian Optimization framework of Contal et al. [6], given in Algorithm 1. The objective is to find the sampling location, x, that maximizes a function f(·). The function domain is sampled to provide a set of candidate function sampling points, X. The algorithm computes an acquisition function, given by µ_t(x) + √(γ_t + σ_t²(x)) − √γ_t, where µ_t(x) is the current estimate of f(x), σ_t²(x) is the predicted variance of µ_t(x), and γ_t accumulates the predicted variance of previously observed locations. The highest-scoring candidate sample location is selected for follow-up observations, which are used to update the algorithm that predicts µ_t(x) and σ_t²(x).
In the baseline algorithm, GP-MI, µ_t(x) and σ_t²(x) are provided by Gaussian Process regression using a radial basis kernel function operating on the raw input vectors, x ∈ X. In our algorithm, µ_t(x) and σ_t²(x) are provided by a BLR over the SSP representation of the points in X. GP-MI was implemented using the GPy library [27]. For GP-MI, the mean and variance are computed per [28, §6.4]:
µ_t(x) = k_{t−1}^T Σ_{t−1} y_{t−1}    (1)

σ_t²(x) = k(x, x) − k_{t−1}^T Σ_{t−1} k_{t−1}    (2)

where k_t = (k(x, x₁), …, k(x, x_t)) is the vector of kernel function evaluations between the test point x and all points in the data set collected up to observation t ∈ {0, …, T}, and y_t = (y₁, …, y_t) is the vector of previously collected function value observations, y_t = f(x_t). After the observation (x_t, y_t) is collected, Σ_{t−1}, k_{t−1}, and y_{t−1} are updated.
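For reference, below is a self-contained sketch of the GP predictions in Eqs. (1)–(2) with an RBF kernel. The paper used GPy, so this class and its hyperparameter defaults are our illustrative stand-in, not the actual implementation; the np.linalg.solve call on the t × t Gram matrix is the O(t³) step that motivates the paper.

```python
import numpy as np

class GPAgent:
    # GP regression with an RBF kernel (illustrative stand-in for GPy).
    def __init__(self, lengthscale=1.0, noise=1e-6):
        self.l, self.noise = lengthscale, noise
        self.xs, self.ys = [], []

    def _k(self, A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * sq / self.l ** 2)

    def update(self, x, y):
        # GP "training" just stores the data; the cost is paid at query time.
        self.xs.append(x); self.ys.append(y)

    def query(self, X):
        X = np.asarray(X)
        if not self.xs:
            return np.zeros(len(X)), np.ones(len(X))
        A = np.asarray(self.xs)
        K = self._k(A, A) + self.noise * np.eye(len(A))   # t x t Gram matrix
        kx = self._k(X, A)                                # k_{t-1} per candidate
        alpha = np.linalg.solve(K, np.asarray(self.ys))   # O(t^3) bottleneck
        mu = kx @ alpha                                                   # Eq. (1)
        var = 1.0 - np.einsum('ij,ji->i', kx, np.linalg.solve(K, kx.T))   # Eq. (2)
        return mu, var
```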
Algorithm 1 Mutual information Bayesian optimization
1:  procedure MIBO(budget, f(·), X)
2:    γ_0 ← 0
3:    while t < budget do
4:      µ_t(x), σ_t²(x) ← agent.query(x)  ∀x ∈ X
5:      φ_t(x) ← √(γ_t + σ_t²(x)) − √γ_t
6:      x_t ← arg max_{x∈X} µ_t(x) + φ_t(x)
7:      y_t ← f(x_t)                      ▷ Observe f(·) at x_t
8:      agent.update(x_t, y_t)
9:      γ_t ← γ_t + σ_t²(x_t)
10:     t ← t + 1
11:   end while
12: end procedure
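Algorithm 1 can be read as the following Python sketch, assuming an agent object that exposes the query/update interface used in the text (either the GP baseline above or the SSP-BLR agent sketched later):

```python
import numpy as np

def mibo(budget, f, X, agent):
    # Mutual-information Bayesian optimization (Algorithm 1).
    gamma = 0.0
    for t in range(budget):
        mu, var = agent.query(X)                  # mu_t(x), sigma_t^2(x), all x in X
        phi = np.sqrt(gamma + var) - np.sqrt(gamma)
        i = int(np.argmax(mu + phi))              # highest-scoring candidate x_t
        y = f(X[i])                               # observe f at x_t
        agent.update(X[i], y)
        gamma += var[i]                           # accumulate predicted variance
    return agent
```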
For SSP-MI, candidate sample locations are transformed into SSPs, which are row vectors denoted ψ(x). To fractionally bind vector-valued variables, we select a random axis pointer, S_i, for each dimension of the vector, as in [15]. We then fractionally bind each axis pointer using the corresponding vector element of x, and bind together all of the vectors representing the individual axes:

ψ(x) = ⊛_{i=1}^{n} S_i^{x_i / l}    (3)
l is a length scale parameter. In this work we use the same length scale parameter for all vector elements, although using one length scale per dimension would be reasonable. The length scale parameter is optimized by minimizing the L2 error in the predicted observations:

arg min_{l ∈ ℝ⁺} Σ_i^t ‖ y_i − m_t^T ψ(x_i / l) ‖²    (4)

The choice to minimize the prediction error instead of maximizing the log likelihood of the observations was arbitrary.
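A sketch of Eqs. (3)–(4) follows. The axis pointers are stacked row-wise, circular convolution is computed as a product in the Fourier domain, and the search bounds on the length scale are arbitrary choices of ours.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def encode(S_axes, x, l=1.0):
    # Eq. (3): circular convolution of per-axis fractional powers is a
    # product of element-wise powers in the Fourier domain.
    coeffs = np.ones(S_axes.shape[1], dtype=complex)
    for S_i, x_i in zip(S_axes, x):
        coeffs *= np.fft.fft(S_i) ** (x_i / l)
    return np.fft.ifft(coeffs).real

def fit_length_scale(S_axes, xs, ys, m, bounds=(1e-2, 1e2)):
    # Eq. (4): pick l minimizing the squared prediction error against the
    # current BLR mean weights m (a sketch; the bounds are arbitrary).
    def loss(l):
        return sum((y - m @ encode(S_axes, x, l)) ** 2 for x, y in zip(xs, ys))
    return minimize_scalar(loss, bounds=bounds, method='bounded').x
```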
The BLR parameters were updated online, per [28, §3.3]:

Σ_t^{−1} = Σ_{t−1}^{−1} + β ψ(x_t)^T ψ(x_t)    (5)

m_t = Σ_t Σ_{t−1}^{−1} m_{t−1} + β Σ_t ψ(x_t)^T y_t    (6)
The predicted mean and variance were computed as:

µ_t(x) = m_{t−1}^T ψ(x)    (7)

σ_t²(x) = 1/β + ψ(x) Σ_{t−1} ψ(x)^T    (8)
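The online updates of Eqs. (5)–(8) can be sketched as below; the prior precision α and the class interface are our assumptions, chosen to match the mibo sketch above. Memory is O(d²) and the per-step cost is constant in the number of observations t. To use it with mibo, one would cache the encoded candidates {ψ(x) : x ∈ X} once, as the paper does, and let query and update operate directly on those feature vectors.

```python
import numpy as np

class SSPBLRAgent:
    # Bayesian linear regression over SSP features (a sketch of Eqs. 5-8).
    def __init__(self, d, beta=1.0, alpha=1.0):
        self.beta = beta                  # observation noise precision (assumed)
        self.Sinv = alpha * np.eye(d)     # prior precision Sigma_0^{-1} (assumed)
        self.m = np.zeros(d)              # prior mean m_0

    def update(self, phi, y):
        # Eq. (6) needs Sigma_{t-1}^{-1} m_{t-1}, so form it before updating.
        rhs = self.Sinv @ self.m + self.beta * phi * y
        # Eq. (5): Sigma_t^{-1} = Sigma_{t-1}^{-1} + beta * psi^T psi
        self.Sinv = self.Sinv + self.beta * np.outer(phi, phi)
        # Eq. (6): m_t = Sigma_t (Sigma_{t-1}^{-1} m_{t-1} + beta * psi^T y_t)
        self.m = np.linalg.solve(self.Sinv, rhs)

    def query(self, Phi):
        # Eqs. (7)-(8), vectorized over rows of Phi (one SSP per candidate).
        Sigma = np.linalg.inv(self.Sinv)
        mu = Phi @ self.m
        var = 1.0 / self.beta + np.einsum('ij,jk,ik->i', Phi, Sigma, Phi)
        return mu, var
```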
Both algorithms are initialized with ten observations that are used to optimize hyperparameters. Initialization points were selected by randomly shuffling the candidate locations and using the first ten points. For both algorithms, the hyperparameters were optimized only on the initial ten samples and not modified afterwards.
A. Experiment
We tested the algorithms on the Himmelblau, Branin-Hoo, and Goldstein-Price functions, which were used as benchmarks in [6]. We scaled the functions to make the problem a maximization, to ensure that GP hyperparameter fitting converged, and to obtain regret numbers similar to those reported in [6]. The functions were evaluated, without noise, over a restricted domain, with points spaced evenly along each axis to form a 100 × 100 grid. Agents were given a budget of 250 samples. The domains and scale factors are given in Table I.
Function          Domain                Scale
Himmelblau        [−5, 5] × [−5, 5]     1/100
Branin-Hoo        [−5, 10] × [0, 15]    1
Goldstein-Price   [−2, 2] × [−2, 2]     1/10⁵

TABLE I: The tested functions, the evaluation domains, and the scale factors applied.
Algorithm performance was measured with regret averaged over the samples taken. The regret at each time point is the difference between the function value at the sampling location, x, and the maximum value of the function over the candidate sampling locations, x* = arg max_{x∈X} f(x). The average regret at time t is R_t = (1/t) Σ_{i=1}^{t} (f(x*) − f(x_i)). We also recorded the running total of the time taken to predict µ_t(x) and σ_t²(x) over the set of candidate sampling points, including, for SSP-MI, the one-time encoding of the points in X as SSPs.
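For concreteness, the reported metric can be computed as below; f_star denotes the maximum of f over the candidate grid, and the names are ours:

```python
import numpy as np

def average_regret(f_star, ys):
    # R_t = (1/t) * sum_{i<=t} (f(x*) - f(x_i)), for every t up to the budget.
    ys = np.asarray(ys, dtype=float)
    return np.cumsum(f_star - ys) / np.arange(1, len(ys) + 1)
```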
III. RESULTS
Fig. 1 shows the evolution of the algorithms’ regret on the left, and the cumulative time spent evaluating sampling locations on the right. Shaded regions represent a 95% confidence interval for the N = 30 trials.
Except for some initial samples, the algorithms’ average regret is largely indistinguishable. Table II reports the differences in the means and in the standard deviations of R_t between the algorithms; positive values mean the SSP-MI algorithm has lower regret or standard deviation. A Bayesian hypothesis test at samples 125 and 250 finds that the performance of the SSP-MI algorithm is either better than or statistically indistinguishable from the GP-MI algorithm with 95% probability. Where there is a statistical difference, the effect size (Cohen’s d) is moderate to large. For regret performance, under the tested scenarios, there is no reason to choose GP-MI over SSP-MI.
The benefits of SSP-MI become apparent in the time to select sampling locations. At each iteration the algorithm recomputes the acquisition function for the candidate sampling locations. For GP-MI, this time grows as a function of the number of samples collected. For SSP-MI, the time to compute the acquisition function is constant in the number of samples collected, hence the observed linear trend in Fig. 1.

Of note is the large constant offset in the initial processing time. This is due to the time it takes to encode the candidate sampling locations as SSPs, {ψ(x) : x ∈ X}. These values were cached, so the encoding was performed only once.
[Fig. 1 panels: (a) Himmelblau average regret; (b) Himmelblau processing time; (c) Branin-Hoo average regret; (d) Branin-Hoo processing time; (e) Goldstein-Price average regret; (f) Goldstein-Price processing time. Each panel plots GP-MI and SSP-MI against Sample Number (0–250); left-column y-axes show Average Regret (a.u.), right-column y-axes show Cumulative Time (sec).]
Fig. 1: Graphs on the left show the average regret, R_t, and graphs on the right the total accumulated processing time. Shaded regions are 95% confidence intervals for N = 30 trials. In all cases the R_t for SSP-MI is either the same as or statistically significantly better than that of the GP-MI algorithm. SSP-MI shows a substantial improvement in performance with respect to accumulated processing time.
IV. DISCUSSION
We have demonstrated, using biologically plausible representations, MI-driven exploration that has fixed limits on memory and computation time while still being defined over continuous spaces. Combining Spatial Semantic Pointers and Bayesian Linear Regression enables operation with limited memory and long-term operation in arbitrary spaces.

Our empirical regret performance is either statistically indistinguishable from, or better than, the baseline GP approach on three standard optimization targets. The time to evaluate candidate sampling locations is constant in the number of samples collected, unlike the GP-based approaches, and the memory requirement is O(d²) in the dimensionality of the SSP, compared to O(t²) in the number of observations for GP-MI. Note also that our estimate of compute time in Fig. 1 conservatively includes the one-time encoding cost. While it takes over 100 samples to amortize this cost, the encoding could be done prior to the onset of operations, which would further favour SSP-MI.
Function         t     E_GP[R_t] − E_SSP[R_t]    SD_GP[R_t] − SD_SSP[R_t]    Effect Size
                       µ      95% HDI            µ      95% HDI              µ      95% HDI
Himmelblau       125   1.01   [ 0.50, 1.61]      0.09   [−0.11, 0.34]        0.93   [ 0.44, 1.48]
                 250   1.06   [ 0.51, 1.60]      0.09   [−0.10, 0.35]        0.98   [ 0.45, 1.48]
Branin-Hoo       125   2.42   [−0.07, 4.82]      3.54   [ 1.72, 5.74]        0.67   [ 0.02, 1.35]
                 250   2.83   [ 0.94, 4.70]      3.03   [ 1.68, 4.47]        0.79   [ 0.25, 1.34]
Goldstein-Price  125   0.02   [−0.39, 0.41]      0.00   [−0.11, 0.11]        0.01   [−0.35, 0.42]
                 250   0.02   [−0.37, 0.43]      0.00   [−0.12, 0.12]        0.02   [−0.37, 0.41]

TABLE II: The difference in regret measured at samples t = 125 and t = 250. SSP-MI regret is either better than or statistically indistinguishable from GP-MI at the selected sample points. We report the differences in the average regret and the standard deviations, as well as the effect size (Cohen’s d), for 30 trials. We also report 95% highest density intervals (HDI). Positive values mean the SSP algorithm has a lower regret or standard deviation. Results use an unpaired Bayesian hypothesis test [29].
Like occupancy grid methods, evaluation time grows linearly with the number of candidate locations, but SSP-MI retains its definition over the continuum.
Our algorithm represents an initial proof of concept for curiosity-guided exploration using vector representations. If SSPs are a unifying tool for modelling cognition, as in Eliasmith [14], then our approach could also model curiosity in conceptual spaces. Moreover, while we encoded data with SSPs, other vector encodings that induce kernels could be used.

There remain algorithmic refinements to explore. Hexagonal SSPs [17] could improve the efficiency of encoding candidate sample locations, and further integrate the work with neural models of spatial representation. Performance degradation in response to noise also remains to be examined.
Because computing the acquisition function is efficient, it
should be practical to find sample collection points, x, that
maximize the acquisition function, instead of selecting from
a finite set, avoiding regret due to arbitrary sampling of the
function domain.
Further, we may be able to evaluate entire trajectories, not just individual sample locations. Single SSPs can represent trajectories and regions of a space, facts that may be exploitable to efficiently evaluate trajectories for informativeness and, through computing dot products, feasibility in a configuration space. As our curiosity model is goal-driven, via the µ_t(x) term in the acquisition function, a variable weighting of expected reward and information gain could allow switching between task-driven exploration and something akin to play.
We have presented an efficient implementation of curiosity
that may be of use in memory- and time-limited contexts.
While preliminary, this work is a jumping-off point for effi-
cient autonomous exploration.
ACKNOWLEDGEMENT
The authors would like to thank Nicole Sandra-Yaffa Dumont for discussions that helped improve this paper. This work was supported by CFI and OIT infrastructure funding as well as the Canada Research Chairs program, NSERC Discovery grant 261453, and NUCC NRC File A-0028850.
REFERENCES
[1] D. V. Lindley, “On a measure of the information provided by an experiment,” The Annals of Mathematical Statistics, pp. 986–1005, 1956.
[2] T. Loredo, “Bayesian adaptive exploration in a nutshell,” Statistical Problems in Particle Physics, Astrophysics, and Cosmology, vol. 1, p. 162, 2003.
[3] A. Singh, A. Krause, C. Guestrin, W. J. Kaiser, and M. A. Batalin, “Efficient planning of informative paths for multiple robots,” in IJCAI, vol. 7, 2007, pp. 2204–2211.
[4] D. R. Thompson and D. Wettergreen, “Intelligent maps for autonomous kilometer-scale science survey,” 2008.
[5] K. Yang, S. Keat Gan, and S. Sukkarieh, “A Gaussian process-based RRT planner for the exploration of an unknown and cluttered environment with a UAV,” Advanced Robotics, vol. 27, no. 6, pp. 431–443, 2013.
[6] E. Contal, V. Perchet, and N. Vayatis, “Gaussian process optimization with mutual information,” in International Conference on Machine Learning. PMLR, 2014, pp. 253–261.
[7] F. Bourgault, A. A. Makarenko, S. B. Williams, B. Grocholsky, and H. F. Durrant-Whyte, “Information based adaptive robotic exploration,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 1. IEEE, 2002, pp. 540–545.
[8] B. Charrow, S. Liu, V. Kumar, and N. Michael, “Information-theoretic mapping using Cauchy-Schwarz quadratic mutual information,” in 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015, pp. 4791–4798.
[9] Z. Zhang, T. Henderson, S. Karaman, and V. Sze, “FSMI: Fast computation of Shannon mutual information for information-theoretic mapping,” The International Journal of Robotics Research, vol. 39, no. 9, pp. 1155–1177, 2020.
[10] P. Kanerva, Sparse Distributed Memory. MIT Press, 1988.
[11] ——, “Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors,” Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[12] T. Plate, “Holographic reduced representations: Convolution algebra for compositional distributed representations,” in IJCAI, 1991, pp. 30–35.
[13] T. A. Plate, “Holographic reduced representations,” IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.
[14] C. Eliasmith, “How to build a brain: From function to implementation,” Synthese, vol. 159, no. 3, pp. 373–388, 2007.
[15] B. Komer, “Biologically inspired spatial representation,” 2020.
[16] E. P. Frady, P. Kanerva, and F. T. Sommer, “A framework for linking computations and rhythm-based timing patterns in neural firing, such as phase precession in hippocampal place cells,” 2018.
[17] N. S.-Y. Dumont and C. Eliasmith, “Accurate representation for spatial cognition using grid cells,” in 42nd Annual Meeting of the Cognitive Science Society. Toronto, ON: Cognitive Science Society, 2020, pp. 2367–2373.
[18] E. I. Moser, E. Kropff, and M.-B. Moser, “Place cells, grid cells, and the brain’s spatial representation system,” Annu. Rev. Neurosci., vol. 31, pp. 69–89, 2008.
[19] T. E. Behrens, T. H. Muller, J. C. Whittington, S. Mark, A. B. Baram, K. L. Stachenfeld, and Z. Kurth-Nelson, “What is a cognitive map? Organizing knowledge for flexible behavior,” Neuron, vol. 100, no. 2, pp. 490–509, 2018.
[20] A. R. Voelker, “A short letter on the dot product between rotated Fourier transforms,” arXiv preprint arXiv:2007.13462, 2020.
[21] I. K. Glad, N. L. Hjort, and N. G. Ushakov, “Correction of density estimators that are not densities,” Scandinavian Journal of Statistics, vol. 30, no. 2, pp. 415–427, 2003.
[22] I. K. Glad, N. L. Hjort, and N. Ushakov, “Density estimation using the sinc kernel,” Preprint Statistics, vol. 2, p. 2007, 2007.
[23] M. Schneider, W. Ertel, and G. Palm, “Expected similarity estimation for large scale anomaly detection,” in 2015 International Joint Conference on Neural Networks (IJCNN). IEEE, 2015, pp. 1–8.
[24] J. Harrison, A. Sharma, and M. Pavone, “Meta-learning priors for efficient online Bayesian regression,” in International Workshop on the Algorithmic Foundations of Robotics. Springer, 2018, pp. 318–337.
[25] S. Banerjee, J. Harrison, P. M. Furlong, and M. Pavone, “Adaptive meta-learning for identification of rover-terrain dynamics,” arXiv preprint arXiv:2009.10191, 2020.
[26] V. Perrone, R. Jenatton, M. Seeger, and C. Archambeau, “Multiple adaptive Bayesian linear regression for scalable Bayesian optimization with warm start,” arXiv preprint arXiv:1712.02902, 2017.
[27] GPy, “GPy: A Gaussian process framework in Python,” http://github.com/SheffieldML/GPy, since 2012.
[28] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[29] J. K. Kruschke, “Bayesian estimation supersedes the t test,” Journal of Experimental Psychology: General, vol. 142, no. 2, p. 573, 2013.