Redesigning the ensemble Kalman filter
with a dedicated model of epistemic uncertainty
Chatchuea Kimchaiwong1, Jeremie Houssineau2, and Adam M. Johansen3
1Warwick Mathematics Institute, University of Warwick, UK.
2Division of Mathematical Sciences, Nanyang Technological University, Singapore.
3Department of Statistics, University of Warwick, UK.
Abstract
The problem of incorporating information from observations received serially in time is widespread
in the field of uncertainty quantification. Within a probabilistic framework, such problems can be ad-
dressed using standard filtering techniques. However, in many real-world problems, some (or all) of the
uncertainty is epistemic, arising from a lack of knowledge, and is difficult to model probabilistically.
This paper introduces a possibilistic ensemble Kalman filter designed for this setting and characterizes
some of its properties. Using possibility theory to describe epistemic uncertainty is appealing from a
philosophical perspective, and it is easy to justify certain heuristics often employed in standard ensemble
Kalman filters as principled approaches to capturing uncertainty within it. The possibilistic approach
motivates a robust mechanism for characterizing uncertainty which shows good performance with small
sample sizes, and can outperform standard ensemble Kalman filters at given sample size, even when
dealing with genuinely aleatoric uncertainty.
Keywords: Bayesian inference; State-Space Model; Possibility Theory
1 Introduction
Dynamical real-world systems are typically represented as a hidden process in some state space, indirectly
observed via a measurement process. The usual approach to characterizing the unknown state and uncer-
tainty about it is to combine the observed information with given prior information in an approach known
as filtering, or sometimes data assimilation [29]. However, this is non-trivial due not only to imprecise
measurements, but also to model misspecification [31].
Perhaps the best-known data-assimilation algorithm is the Kalman filter (KF) [22], which is the optimal
filter for the linear Gaussian state-space model, as detailed, e.g., in [31]. However, its application is limited
by its high computational costs when dealing with large state spaces, and by the fact that it cannot be
applied directly to nonlinear models [28]. Several modifications of the KF have been developed to extend its
scope. The extended Kalman Filter can deal with a mild level of nonlinearity by approximating the model
with a first-order Taylor series expansion, as described, e.g., in [34]. The unscented Kalman Filter (UKF) approximates the distribution with a Gaussian distribution using a set of sigma points and their weights, often giving better accuracy for highly nonlinear models [37]. Still, the accuracy of the UKF depends upon
the tuning of certain algorithmic parameters which determine the locations of the sigma points. Moreover,
the number of sigma points (and hence the computational cost and accuracy) is fixed. Another widely
used algorithm is the ensemble Kalman filter (EnKF) [5, 35], which uses an ensemble to approximate the
distributions of interest. The strengths of the EnKF are that it does not require the calculation of the
variance (avoiding costly matrix inversions in high-dimensional problems) and can be used with a nonlinear
model without approximating it with its linear tangent model [28] as one simply needs to be able to sample
from the transition kernel of the dynamic model—although this comes at the cost of further approximating
a non-Gaussian distribution with a Gaussian one. The EnKF must be used carefully with high-dimensional
problems as constraints on computing power typically lead to an ensemble size that is relatively small compared to the dimension of the state space, leading to problems such as spurious correlations and underestimated variance, which affect the algorithm's performance. Thus, recent developments in this field mainly focus on reducing computational cost and increasing forecast accuracy.
Although there has been much recent development in the field of data assimilation, e.g., [1, 2, 23], the
opportunities offered by alternative representations of the uncertainty have not been fully explored. Yet,
there is a clear motivation for doing so: In many situations in which the EnKF is used, the main sources of
uncertainty are i) the initial state of the system, ii) unknown deviations in the model, e.g., unknown forcing
terms, and iii) the lack of data. These sources of uncertainty can be argued to be of an epistemic nature and,
given their predominance, it is important to model them as faithfully as possible. This makes the use of a dedicated framework for epistemic uncertainty appealing. Many different approaches have been suggested in the
literature to deal with epistemic uncertainty, such as fiducial inference [13, 16] and Dempster–Shafer theory
[9]. While fiducial inference gives a probabilistic statement about a parameter without relying on a prior
probability distribution, it still represents the information a posteriori via probability distributions. On the
other hand, Dempster–Shafer theory leverages (fuzzy) set-valued probabilities to represent both aleatoric and
epistemic uncertainty [27]. Yet, its formulation, based on sums over all subsets of the state space, restricts
its application to discrete cases and, even then, does not scale easily to larger problems. Another approach,
termed possibility theory [8, 11, 10], proposes to model exclusively epistemic uncertainty by changing the
algebra underpinning probability theory, i.e., summation/integration is replaced by maximisation. Many
probabilistic concepts and analyses can be adapted to possibility theory, for instance, [20] proves a Bernstein-
von Mises theorem for a possibilistic version of Bayesian inference. In real systems, some uncertainty will
be well modelled as aleatoric and some will be of an epistemic nature; [17] shows that probability theory and
possibility theory can be combined in this case, offering an extension of the probabilistic approach rather
than an alternative. Possibility theory can be applied to complex problems, for instance [18] introduces
an analogue to spatial point processes, and [19] shows that the Kalman filter can also be derived in this
framework. These results highlight the potential of possibility theory for data assimilation, which we explore
in this work via a possibilistic analogue of the EnKF. Related work includes [4] in which a robust version of
the Kalman filter is proposed based on sets of probability distributions, with a motivation close to the one
for this work.
Our contributions are as follows:
1. We introduce a new notion of Gaussian fitting for possibility theory, and contrast it against the stan-
dard moment-matching procedure. We highlight the potential of this approach for providing principled
grounds for standard heuristics used in the EnKF.
2. We derive a possibilistic analogue of the EnKF, adapting the initialisation and prediction steps to the assumed epistemic uncertainty in the model, and showing how the update step of existing versions of the EnKF remains valid in this context.
3. We assess the performance of our method against two versions of the EnKF and against the UKF,
considering both linear and nonlinear dynamics as well as fully and partially observed processes.
The remainder of this paper is organised as follows. Section 2 starts with the state estimation problem
for the state-space model before giving the filtering techniques used to estimate such states under the
probabilistic framework. Then, the details of the possibilistic framework developed to tackle epistemic
uncertainty are given, with some concepts defined analogously to the probabilistic framework. Next, the
possibilistic EnKF is introduced in Section 3 to estimate the system’s state in the presence of epistemic
uncertainty. After that, the performance of our algorithm is assessed on simulated data in a range of
situations and compared against standard baselines in Section 4.
2 Background
We first briefly review the standard EnKF and the underlying filtering problem in Section 2.1 before intro-
ducing, in Section 2.2, the framework based on which the proposed method will be derived.
2.1 State estimation problem and filtering technique
The standard, probabilistic Gaussian state-space model describes the evolution of a hidden state process, $(X_k)_{k>0}$, and its associated observation process, $(Y_k)_{k>0}$, via
$$X_k = F_k(X_{k-1}) + \epsilon_k \quad \text{where } \epsilon_k \sim \mathcal{N}(0, U_k), \qquad (1a)$$
$$Y_k = H_k(X_k) + \varepsilon_k \quad \text{where } \varepsilon_k \sim \mathcal{N}(0, V_k), \qquad (1b)$$
with $U_k$ and $V_k$ positive definite matrices and with $\mathcal{N}(\mu, \Sigma)$ denoting a Gaussian distribution of mean $\mu$ and covariance $\Sigma$.
We consider the setting in which a stream of observations $y_1, y_2, \ldots$ arrives sequentially in time, and we want to obtain estimates of the hidden states as they arrive. Thus, in the Bayesian context, we characterise our knowledge of the state $X_k$ at time step $k > 0$ given the realisations $y_{1:k} \doteq (y_1, y_2, \ldots, y_k)$ of the observations $Y_{1:k}$ via the filtering density $p(x_k \mid y_{1:k})$, which can be computed recursively, see, e.g., [31], as
$$p(x_k \mid y_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})\, \mathrm{d}x_{k-1} \qquad \text{(prediction)}$$
$$p(x_k \mid y_{1:k}) = \frac{p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})}{\int p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})\, \mathrm{d}x_k}. \qquad \text{(update)}$$
The Kalman Filter (KF) provides a closed-form recursion for the prediction and update steps for linear Gaussian state-space models (see, e.g., [31]). However, many real-world models are nonlinear, meaning that the KF is not directly applicable and, unfortunately, there are few other settings in which analytic recursions are available. This has motivated the development of numerous alternatives, including the UKF and the EnKF.
The UKF can provide good performance for moderately nonlinear models, but this depends upon a careful choice of several algorithmic tuning parameters and, as the number of sigma points is fixed, there is little scope for adjusting computational cost or accuracy. The EnKF algorithm takes a different approach; it can be viewed as an approximation to the KF which employs an ensemble of samples $\{X_k^i\}_{i=1}^N$ to approximate the parameters of the Gaussian distribution at time $k$. In particular, the ensemble's mean and variance are used to approximate those of the filtering distribution. The EnKF algorithm then proceeds recursively via the prediction and update steps. The prediction step mirrors that of the KF but uses an ensemble approximation, which means that $F_k$ need not be linear as long as it can be evaluated pointwise, as in Algorithm 1 [5, 35].
Algorithm 1 Prediction step of EnKF at time step $k$
Input: The posterior ensemble $\{\hat{X}_{k-1}^i\}_{i=1}^N$ at time step $k-1$.
Output: The predictive ensemble $\{X_k^i\}_{i=1}^N$ at time step $k$ with mean $\mu_k$ and variance $\Sigma_k$.
1: for $i \in \{1, \ldots, N\}$ do
2: $\epsilon_k^i \sim \mathcal{N}(0, U_k)$ \\ Sample noise
3: $X_k^i = F_k(\hat{X}_{k-1}^i) + \epsilon_k^i$ \\ Predict $i$-th particle
4: end for
5: $\mu_k \leftarrow \frac{1}{N}\sum_{i=1}^N X_k^i$ \\ Predictive mean
6: $\Sigma_k \leftarrow \frac{1}{N-1}\sum_{i=1}^N (X_k^i - \mu_k)(X_k^i - \mu_k)^\top$ \\ Predictive variance
The update step of the EnKF also approximates that of the KF: the mean of the updated distribution is a convex combination of that of the predictive distribution and the observation, using the so-called Kalman gain as a weight. However, using the traditional Kalman gain can lead to an underestimation of the posterior variance (as it essentially neglects the observation noise) and to filter divergence [38]. A number of variants of the EnKF update step have been developed to address this problem, including both the stochastic EnKF (StEnKF) and the square root EnKF (SqrtEnKF) [24]. The latter updates the deviation of the ensemble by applying a matrix obtained from the square root of a certain matrix. Different variations of the SqrtEnKF exist, depending on how such a matrix is obtained, as summarised in [35]. In this work, we consider the version of the SqrtEnKF presented in [38], which is particularly suitable as a basis for introducing a possibilistic version of the EnKF. The update of the SqrtEnKF for a linear observation model is detailed in Algorithm 2, where the notation $A^{1/2}$ refers to the Cholesky factor of a matrix $A$, and where we use the same notation for a linear function and for its matrix representation.
Algorithm 2 Update step of SqrtEnKF at time step $k$
Input: The predictive ensemble $\{X_k^i\}_{i=1}^N$ at time step $k$ with mean $\mu_k$ and variance $\Sigma_k$.
Output: The posterior ensemble $\{\hat{X}_k^i\}_{i=1}^N$ at time step $k$.
1: $S_k \leftarrow H_k \Sigma_k H_k^\top + V_k$ \\ Covariance of the innovation from the ensemble
2: $K_k \leftarrow \Sigma_k H_k^\top S_k^{-1}$ \\ Standard Kalman gain
3: $\tilde{K}_k \leftarrow \Sigma_k H_k^\top \big(S_k^{1/2}\big)^{-\top} \big(S_k^{1/2} + V_k^{1/2}\big)^{-1}$ \\ Adjusted Kalman gain
4: $\hat{\mu}_k \leftarrow \mu_k + K_k(Y_k - H_k\mu_k)$ \\ Posterior mean
5: for $i \in \{1, \ldots, N\}$ do
6: $\hat{e}_k^i \leftarrow (I_n - \tilde{K}_k H_k)(X_k^i - \mu_k)$ \\ Posterior deviation
7: $\hat{X}_k^i \leftarrow \hat{e}_k^i + \hat{\mu}_k$ \\ Updated particle
8: end for
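A sketch of this update in NumPy is given below, with Cholesky factors standing in for the matrix square roots; `H`, `V`, and the observation `y` are assumed inputs.

```python
import numpy as np

def sqrt_enkf_update(ensemble, mu, Sigma, H, V, y):
    """Square-root EnKF update for a linear observation model, following Algorithm 2."""
    S = H @ Sigma @ H.T + V                                        # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)                             # standard Kalman gain
    S_half, V_half = np.linalg.cholesky(S), np.linalg.cholesky(V)  # matrix square roots
    K_tilde = Sigma @ H.T @ np.linalg.inv(S_half.T) @ np.linalg.inv(S_half + V_half)
    mu_post = mu + K @ (y - H @ mu)                                # posterior mean
    dev_post = (ensemble - mu) @ (np.eye(len(mu)) - K_tilde @ H).T # posterior deviations
    return dev_post + mu_post                                      # updated particles
```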
For ensemble-based methods, the accuracy of the state estimate depends highly on the ensemble size. The situation in which the ensemble is too small to adequately represent the system is termed undersampling and leads to four main problems, which have been termed inbreeding, spurious correlation, matrix singularity, and filter divergence [28]. Two widely used techniques to mitigate the undersampling problems are inflation and localisation [28]. The inflation technique increases the deviation between the sample and the predictive mean so that the predictive covariance from the ensemble is not underestimated, avoiding the so-called inbreeding problem. On the other hand, localisation techniques are implemented to eliminate spurious correlations and matrix singularity, examples being tapering, which uses the Schur product to cut off long-range correlations, and domain localisation, where assimilation is performed in local domains that are disjoint in physical space [21]. Each technique needs to be fine-tuned in a problem-specific manner to achieve good accuracy.
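As an illustration of tapering, a minimal sketch of Schur-product localisation with a crude distance-based taper (a stand-in for smoother choices such as the Gaspari-Cohn function) is given below.

```python
import numpy as np

def taper_covariance(Sigma, coords, cutoff):
    """Schur (element-wise) product of a sample covariance with a distance-based taper."""
    dist = np.abs(coords[:, None] - coords[None, :])   # pairwise distances between components
    taper = np.clip(1.0 - dist / cutoff, 0.0, 1.0)     # linear taper, zero beyond the cutoff
    return Sigma * taper                               # long-range correlations are removed
```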
We have focused here on approaches directly relevant to the methodology developed below in Section 3;
many other methods have been devised to address nonlinearity and non-Gaussianity—see, e.g., [2, 12, 36].
2.2 Review of possibility theory
Possibility theory aims to directly capture epistemic uncertainty about a fixed but unknown element $\theta^*$ in a set $\Theta$, by focusing on the possibility of an event related to $\theta^*$ rather than on its probability. A possibility of 1 corresponds to the absence of evidence against the event taking place, not evidence that the event took place almost surely. If there is no evidence against an event, say $E$, then there cannot be any evidence against ($E$ or $E'$), with $E'$ any other event, so that the possibility of ($E$ or $E'$) is also equal to 1. This behaviour shows that the notion of possibility does not give an additive measure of uncertainty, and the simplest operation that corresponds to the possibility of a union of events is the maximum of their individual possibilities.
The analogue of a probability density is a so-called possibility function, $f: \Theta \to [0,1]$, which must have supremum 1. Just as probabilities are best described by measures, possibilities are naturally cast as outer measures $\bar{P}$, and just as one may obtain a probability from a density, one can obtain a possibility from a possibility function by taking its supremum, that is, for any $A \subseteq \Theta$, $\bar{P}(A) = \sup_{\theta \in A} f(\theta)$. If it holds that $f(\theta) = 1$ for some $\theta \in A$, then we will indeed have $\bar{P}(A) = \bar{P}(A \cup B) = 1$ for any $B \subseteq \Theta$, as required.
Formally, the set function $\bar{P}$ is an outer measure verifying $\bar{P}(\Theta) = 1$; we therefore refer to such set functions as outer probability measures (o.p.m.s). If $\bar{P}(A) = 1$ models the absence of opposing evidence, $\bar{P}(A) = 0$ has essentially the same interpretation as with probabilities, i.e., that $\theta^* \in A$ is impossible, which requires very strong evidence. In general, $\bar{P}(A)$ can be interpreted as an upper bound for subjective probabilities of the event $\theta^* \in A$, i.e., the maximum probability that we would be ready to assign to this event is $\bar{P}(A)$. Under this interpretation, the statement $\bar{P}(A) = 1$ is uninformative: we could assign any probability in the interval $[0,1]$ to this event.
To formally define the quantities described above, we consider an analogue of a random variable, referred to as an uncertain variable, defined as a mapping $\boldsymbol{\theta}: \Omega \to \Theta$, and interpreted as follows: if the true outcome in $\Omega$ is $\omega$ then $\boldsymbol{\theta}(\omega)$ is the true value of the parameter. The set $\Omega$ plays a similar role to the elementary event space in probability theory, except that it is not equipped with any probabilistic structure; instead, it contains a true element $\omega^*$ and it holds by construction that $\boldsymbol{\theta}(\omega^*) = \theta^*$. An event can now be defined as a subset of $\Omega$, for instance $\{\omega \in \Omega : \boldsymbol{\theta}(\omega) \in A\}$ for some $A \subseteq \Theta$. We will use the standard shortcut and write this event as $\boldsymbol{\theta} \in A$. An important difference between possibility theory and probability theory is that uncertain variables do not characterise possibility functions; instead, possibility functions describe uncertain variables in a way that is not unique. For instance, if the available information is modelled by a possibility function $f$ describing an uncertain variable $\boldsymbol{\theta}$, then any function $g$ such that $g \geq f$, i.e. $g(\theta) \geq f(\theta)$ for any $\theta \in \Theta$, also describes $\boldsymbol{\theta}$. This is because $g$ discards some evidence about $\boldsymbol{\theta}$, that is, $g$ is less informative than $f$. In particular, we can consider the possibility function equal to 1 everywhere, denoted $\mathbf{1}$, which upper bounds all other possibility functions. The function $\mathbf{1}$ models the total absence of information.
To gain information from data, Bayesian inference can be performed within possibility theory in a way that is very similar to the standard approach: If $Y$ is a random variable with probability distribution $p(\cdot \mid \theta)$ belonging to a parameterised family of distributions¹ $\{p(\cdot \mid \theta) : \theta \in \Theta\}$, if we observe a realisation $y$ of $Y$, and if the information available about $\theta^*$ a priori is modelled by the possibility function $f$, then the information available a posteriori can be modelled by the possibility function $f(\cdot \mid y)$ characterised by [17]
$$f(\theta \mid y) = \frac{p(y \mid \theta) f(\theta)}{\sup_{\theta \in \Theta} p(y \mid \theta) f(\theta)}, \qquad \theta \in \Theta. \qquad (2)$$
We borrow from the Bayesian nomenclature and simply refer to $f$ and $f(\cdot \mid y)$ as the prior and the posterior. Since we can always start from the uninformative prior $f = \mathbf{1}$, it is easy to find posterior possibility functions by inserting different likelihoods; for instance, if $\Theta = \mathbb{R}^n$ and if the likelihood is a multivariate Gaussian distribution with mean $\theta$ and known variance $\Sigma$, i.e., $p(y \mid \theta) = \mathcal{N}(y; \theta, \Sigma)$, then
$$f(\theta \mid y) = \exp\Big(-\frac{1}{2}(\theta - y)^\top \Sigma^{-1} (\theta - y)\Big) \doteq \mathcal{N}(\theta; y, \Sigma).$$
¹Although $p(\cdot \mid \theta)$ is not formally a conditional probability distribution, it is useful to slightly abuse notation and write it as such.
Such a Gaussian possibility function is a conjugate prior for the Gaussian likelihood, as in the probabilistic case, and shares many of the properties of its probabilistic analogue. It can be advantageous to parameterise a Gaussian possibility function, say $\mathcal{N}(\theta; \mu, \Sigma)$, by the precision matrix $\Lambda = \Sigma^{-1}$; indeed, the precision matrix does not need to be positive definite for the Gaussian possibility function to be well defined, with positive semi-definiteness being sufficient. In particular, this means that setting $\Lambda$ to the zero matrix is possible, with the Gaussian possibility function being equal to $\mathbf{1}$ in this case. A simple way to quantify the amount of epistemic uncertainty in a possibility function is to consider the integral $\int f(\theta)\,\mathrm{d}\theta$ as in [6], when defined. This notion of uncertainty is consistent with the partial order on possibility functions: if $g$ is less informative than $f$ then the integral of $g$ will obviously be larger than that of $f$. In particular, the uncertainty of a Gaussian possibility function is $\sqrt{|2\pi\Sigma|}$, with $|\cdot|$ denoting the determinant, a quantity often used to quantify how informative a given Gaussian distribution is.
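A small sketch of the update (2) on a one-dimensional grid is given below; with the uninformative prior and a Gaussian likelihood, the resulting posterior possibility function coincides with the Gaussian possibility function described above (grid and parameter values are illustrative).

```python
import numpy as np

def possibilistic_posterior(prior, likelihood):
    """Possibilistic Bayes rule (2): renormalise by the supremum instead of the integral."""
    unnorm = likelihood * prior
    return unnorm / unnorm.max()

theta = np.linspace(-5.0, 5.0, 1001)       # grid over the parameter space
prior = np.ones_like(theta)                # uninformative prior f = 1
y, sigma2 = 1.2, 0.5                       # observed value and known variance
likelihood = np.exp(-0.5 * (theta - y) ** 2 / sigma2)
posterior = possibilistic_posterior(prior, likelihood)
# posterior equals exp(-(theta - y)^2 / (2 * sigma2)), i.e. a Gaussian possibility function
```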
If $\boldsymbol{\theta}$ and $\boldsymbol{\psi}$ are two uncertain variables, respectively on sets $\Theta$ and $\Psi$, jointly described by a possibility function $f$ on $\Theta \times \Psi$, then Bayes' rule can also be expressed via a more standard notion of conditioning as [8]
$$f(\theta \mid \psi) = \frac{f(\theta, \psi)}{\sup_{\theta \in \Theta} f(\theta, \psi)},$$
where $\sup_{\theta \in \Theta} f(\theta, \psi)$ is the marginal possibility function describing $\boldsymbol{\psi}$. The conditional possibility function $f(\theta \mid \psi)$ allows independence to be defined simply: if $f(\theta \mid \psi)$ is equal to the marginal $f(\theta) = \sup_{\psi \in \Psi} f(\theta, \psi)$ for any $\theta \in \Theta$, then $\boldsymbol{\theta}$ is said to be independent from $\boldsymbol{\psi}$ under $f$. As with most concepts in possibility theory, the notion of independence depends on the choice of possibility function; if $\boldsymbol{\theta}$ and $\boldsymbol{\psi}$ are not independent under $f$, we could find another possibility function $g$ such that $g \geq f$ and such that $\boldsymbol{\theta}$ is independent of $\boldsymbol{\psi}$ under $g$. This will be key later on for simplifying high-dimensional Gaussian possibility functions in a principled way. When there is no ambiguity about the underlying possibility function, we will simply say that $\boldsymbol{\theta}$ is independent of $\boldsymbol{\psi}$.
Although the parameters $\mu$ and $\Sigma$ of the Gaussian possibility functions are reminiscent of the probabilistic notions of expected value and variance, these notions have to be redefined in the context of possibility theory. Asymptotic considerations [20] lead to a notion of expected value $\mathbb{E}(\cdot)$ based on the mode, that is, $\mathbb{E}(\boldsymbol{\theta}) = \arg\max_{\theta \in \Theta} f(\theta)$ for any uncertain variable $\boldsymbol{\theta}$ described by $f$, and to a local notion of variance based on the curvature of $f$ at $\mathbb{E}(\boldsymbol{\theta})$, assuming that the latter is a singleton, an assumption that we will make throughout this work. These quantities correspond to the approximation often referred to as the Laplace/Gaussian approximation, and make sense in the context of epistemic uncertainty: without a notion of variability, we simply look at our best guess $\mathbb{E}(\boldsymbol{\theta})$ for the unknown $\theta^*$ and at how confident we are in that guess, i.e., how fast the possibility decreases around it. Although the expected value and variance associated with the Gaussian possibility function $\mathcal{N}(\mu, \Sigma)$ are indeed $\mu$ and $\Sigma$, they will differ in general from their probabilistic counterparts.
In order to make inference for dynamical systems simpler, we introduce a Markovian structure as follows: We consider a sequence of uncertain variables $\boldsymbol{x}_0, \boldsymbol{x}_1, \ldots$ in a set $\mathsf{X}$ and assume that $\boldsymbol{x}_k$ is described by a possibility function $f_k$ for some given $k \geq 0$. Following the standard approach, we assume that $\boldsymbol{x}_{k+1}$ is conditionally independent of $\boldsymbol{x}_{k-\delta}$ given $\boldsymbol{x}_k$, for any $\delta > 0$. The available information about $\boldsymbol{x}_{k+1}$ given a value $x_k$ of $\boldsymbol{x}_k$ can then be modelled by a conditional possibility function $f_{k+1|k}(\cdot \mid x_k)$, and prediction can be performed via
$$f_{k+1}(x_{k+1}) = \sup_{x_k \in \mathsf{X}} f_{k+1|k}(x_{k+1} \mid x_k)\, f_k(x_k), \qquad x_{k+1} \in \mathsf{X}.$$
We will focus in particular on the case where an analogue of (1a) holds, that is
$$\boldsymbol{x}_k = F_k(\boldsymbol{x}_{k-1}) + \boldsymbol{u}_k,$$
with $\boldsymbol{u}_k$ an uncertain variable described by $\mathcal{N}(0, U_k)$. In this case, the term $\boldsymbol{u}_k$ cannot be interpreted as noise and instead models deviations between the model and the true dynamics: the true states $x_k^*$ and $x_{k-1}^*$ are related via $x_k^* = \tilde{F}_k(x_{k-1}^*)$ with $\tilde{F}_k$ potentially different from $F_k$; the true value $u_k^*$ of interest is therefore equal to the difference $\tilde{F}_k(x_{k-1}^*) - F_k(x_{k-1}^*)$ between the model and the true dynamics.
In the case where the prediction is deterministic, that is $\boldsymbol{x}_{k+1} = F_k(\boldsymbol{x}_k)$ for some possibly non-linear mapping $F_k$, we can use the change of variable formula [3] to characterise the possibility function $f_{k+1}$ as
$$f_{k+1}(x_{k+1}) = \sup\big\{f_k(x_k) : x_k \in F_k^{-1}(x_{k+1})\big\}, \qquad (3)$$
where $F_k^{-1}(x_{k+1})$ is the (possibly set-valued) pre-image of $x_{k+1}$ via $F_k$, and $\sup \emptyset = 0$ by convention.
A result that is specific to possibility theory is that the expected values of $\boldsymbol{x}_{k+1}$ and $\boldsymbol{x}_k$ are related via $\mathbb{E}(\boldsymbol{x}_{k+1}) = F_k(\mathbb{E}(\boldsymbol{x}_k))$ without assumptions on $F_k$. This result will be key in our approach since it allows the expected value at time step $k+1$ to be computed with a single application of $F_k$, rather than by averaging ensembles as in the EnKF. This result does not hold for the mode of probability densities because the corresponding change of variable formula includes the Jacobian of the transformation, which shifts the mode in non-trivial ways.
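On a discretised state space, this prediction amounts to a max-product recursion; a minimal grid-based sketch with an illustrative transition possibility function is given below.

```python
import numpy as np

def possibilistic_predict(f_k, transition):
    """Prediction by maximisation: f_{k+1}(x_j) = max_i f_{k+1|k}(x_j | x_i) f_k(x_i).

    f_k has shape (n_grid,) and transition has shape (n_grid, n_grid), with
    transition[j, i] = f_{k+1|k}(x_j | x_i); both take values in [0, 1].
    """
    return np.max(transition * f_k[None, :], axis=1)
```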
3 Possibilistic EnKF
It has been shown in [19] that an analogue of the Kalman filter can be derived in the context of possibility
theory, and that the corresponding expected value and variance are the same as in the probabilistic case.
However, as with the probabilistic KF, its applicability is limited to linear models—so it is natural to develop
extensions that accommodate nonlinearity. Since the probabilistic EnKF is flexible in computational cost
and can be effectively applied to nonlinear models, we explore the use of EnKF-like ideas in the possibilistic
setting. Here, we present a novel possibilistic EnKF (p-EnKF) and show that analogues of inflation and
localisation arise naturally in the context of possibility theory rather than being imposed as heuristics to
improve performance as in the standard setting.
For the p-EnKF, we use an ensemble of weighted particles to characterise the possibility function and
follow a similar path to the standard EnKF by assuming that the underlying possibility function is Gaussian
in order to proceed with a KF-like update. For this purpose, we need to define two operations: 1) how to approximate a given possibility function by weighted particles, which will be necessary for initialisation at time step $k = 0$; and 2) how to define a Gaussian possibility function based on weighted particles, which will be necessary to carry out the Kalman-like update. We start by answering the latter question in Section 3.2,
before moving to the former in Section 3.3. Based on this, we detail the prediction and update mechanism
of the p-EnKF in Sections 3.4 and 3.5 respectively, and consider some extensions in Section 3.6.
3.1 Ensemble approximation
We consider the problem of defining a set $\{(w_i, x_i)\}_{i=1}^N$ of $N$ weighted particles in $\mathbb{R}^n$, which will allow the approximate solution of optimisation problems of the form $\max_{x \in \mathbb{R}^n} \varphi(x) f(x)$, for some bounded function $\varphi$ on $\mathbb{R}^n$ and for a given possibility function $f$ on $\mathbb{R}^n$. Although our approach could easily be formulated on more general spaces than $\mathbb{R}^n$, Euclidean spaces are sufficient for our purpose in this work and help to simplify the presentation. As is common in Monte Carlo approaches, we do not want the particles or weights to depend on $\varphi$, so as to allow us to solve this problem for many different such functions using a single sample; we therefore focus on directly approximating $f$: placing particles in locations which allow a good characterization of $f$ allows us to approximate the optimization for a broad class of regular $\varphi$. Following the principles of Monte Carlo optimization [30, Chapter 5], and assuming that $\int f(x)\,\mathrm{d}x < +\infty$, we can sample from the probability distribution proportional to $f$ to obtain the particles $\{x_i\}_{i=1}^N$ and then weight these particles with $w_i = f(x_i)$, for any $i \in \{1, \ldots, N\}$. We then obtain the approximation
$$\max_{x \in \mathbb{R}^n} \varphi(x) f(x) \approx \max_{i \in \{1, \ldots, N\}} w_i\, \varphi(x_i).$$
This approximation can be proved to converge when $N \to \infty$ under mild regularity conditions on $\varphi$ as long as the supports of $\varphi$ and $f$ are not disjoint. This is a simple default choice for the methods developed below, although more sophisticated approaches are possible.
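A short sketch of this weighted-particle approximation for a one-dimensional Gaussian possibility function follows; the test function `phi` is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 0.0, 1.0, 200

# Sample particles from the probability distribution proportional to f and weight them by f.
x = rng.normal(mu, sigma, size=N)
w = np.exp(-0.5 * (x - mu) ** 2 / sigma ** 2)      # w_i = f(x_i)

phi = lambda t: 1.0 / (1.0 + t ** 2)               # illustrative bounded test function
approx = np.max(w * phi(x))                        # max_i w_i phi(x_i)
# approximates max_x phi(x) f(x), whose true value here is phi(0) f(0) = 1
```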
Remark 1. One initially appealing idea is to consider a form of maximum entropy principle by identifying the constraints we want to impose on the probability distributions from which we may wish to sample, and to pick the distribution of maximum entropy subject to satisfying these constraints. This is a standard argument for specifying the "least informative" distribution with particular properties; e.g., given the mean and variance, the maximum entropy distribution is the Gaussian with these moments. In the considered context, the constraints come from the interpretation of o.p.m.s as upper bounds for probability distributions. Specifically, for an o.p.m. $\bar{P}$, we could draw an ensemble from the probability density $p$ satisfying the following criterion: $p$ has the maximum entropy amongst those distributions satisfying $\int_A p(x)\,\mathrm{d}x \leq \bar{P}(A)$ for any $A \subseteq \mathbb{R}^n$. The distribution $p$ would typically be more diffuse than the one proportional to $f$, which can be beneficial for optimizing functions $\varphi$ taking large values in the "tails" of $f$. However, obtaining such a distribution is in general highly non-trivial beyond discrete or univariate problems.
As there are no constraints on the way in which the particle locations $\{x_i\}_{i=1}^N$ are selected, deterministic schemes such as quasi Monte Carlo [26, 15] could also be considered. In Section 4, we will consider using the same approach as in the UKF to define particle locations, which will prove to be beneficial in some situations.
In the situations that we will consider, it will always be the case that $f$ is maximised at a single known element $x^* \in \mathbb{R}^n$. Therefore, it makes sense to add one particle $x_0 = x^*$ with weight $w_0 = 1$. This will prove beneficial when fitting a Gaussian possibility function to the ensemble $\{(w_i, x_i)\}_{i=0}^N$, as detailed in the next section.
3.2 Best-fitting Gaussian possibility function
We consider the situation in which the epistemic uncertainty is captured by a set $\{(w_i, x_i)\}_{i=0}^N$ of $N+1$ weighted particles in $\mathbb{R}^n$, with $w_i = 1$ if and only if $i = 0$. We denote by $\tilde{f}$ the "empirical" possibility function defined based on the ensemble as $\tilde{f}(x) = \max_{i \in \{0, \ldots, N\}} w_i \mathbf{1}_{x_i}(x)$, with $\mathbf{1}_{x_i}$ the indicator of the point $x_i$. The standard approach would be to simply compute the (weighted) mean and variance of the ensemble, but these notions do not apply directly to possibility functions, and the curvature-based possibilistic notion of variance is not defined for an empirical possibility function like $\tilde{f}$. Instead, we aim to fit a Gaussian possibility function $\mathcal{N}(\mu, \Sigma)$ to this (weighted) ensemble and the considered variance will simply be the second parameter $\Sigma$ of this fitted Gaussian.
To fit the Gaussian $\mathcal{N}(\mu, \Sigma)$ to the ensemble, we need a notion of best fit. Based on the partial order between possibility functions described in Section 2.2, we can easily ensure that $\mathcal{N}(\mu, \Sigma)$ does not introduce artificial information, at least at the locations $\{x_i\}_{i=0}^N$, by requiring that $\mathcal{N}(\mu, \Sigma) \geq \tilde{f}$. Since $\tilde{f}(x) = 0$ when $x \notin \{x_i\}_{i=0}^N$, we can simplify this condition to: $\mathcal{N}(x_i; \mu, \Sigma) \geq w_i$ for all $i \in \{0, \ldots, N\}$. The inequality $\mathcal{N}(\mu, \Sigma) \geq \tilde{f}$ forces the expected values of $\mathcal{N}(\mu, \Sigma)$ and $\tilde{f}$ to coincide, from which we can deduce that $\mu = x_0$. For the variance, we aim to minimise the uncertainty in the possibility function $\mathcal{N}(\mu, \Sigma)$, i.e., to minimise $\int \mathcal{N}(x; \mu, \Sigma)\,\mathrm{d}x \propto \sqrt{|\Sigma|}$, which is equivalent to maximising $\log|\Lambda|$ with $\Lambda = \Sigma^{-1}$ the precision matrix, thanks to the properties of the determinant. The precision matrix is then most naturally defined as the one solving the constrained optimisation problem
$$\max_{\Lambda \in \mathcal{S}_+^d} \log|\Lambda| \quad \text{subject to} \quad \mathcal{N}(x_i; \mu, \Sigma) \geq w_i, \quad 1 \leq i \leq N, \qquad (4)$$
where $\mathcal{S}_+^d$ is the cone of positive semi-definite $d \times d$ matrices. Since $\mu = x_0$, the corresponding constraint $\mathcal{N}(x_0; \mu, \Sigma) \geq w_0$ is automatically satisfied and we only need to ensure that our Gaussian possibility function upper bounds the ensemble at the other $x_i$'s. Using the invariance of the trace to cyclic permutations allows these constraints to be rewritten as
$$(x_i - \mu)^\top \Lambda (x_i - \mu) = \mathrm{Tr}(C_i \Lambda) \leq -2\log w_i, \qquad 1 \leq i \leq N, \qquad (5)$$
where $C_i = (x_i - \mu)(x_i - \mu)^\top$; this is more convenient for numerical optimization because $\mathrm{Tr}(C_i\Lambda)$ is linear in $\Lambda$.
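The problem (4)-(5) is a convex maximum log-determinant problem with linear constraints, so off-the-shelf convex solvers can be used; a sketch using CVXPY (assuming it and a log-det-capable solver such as SCS are installed) is given below.

```python
import numpy as np
import cvxpy as cp

def fit_precision(x, w):
    """Solve (4): maximise log|Lambda| subject to Tr(C_i Lambda) <= -2 log w_i, with mu = x[0]."""
    mu, n = x[0], x.shape[1]
    Lam = cp.Variable((n, n), PSD=True)
    constraints = []
    for xi, wi in zip(x[1:], w[1:]):
        C = np.outer(xi - mu, xi - mu)                           # C_i = (x_i - mu)(x_i - mu)^T
        constraints.append(cp.trace(C @ Lam) <= -2.0 * np.log(wi))
    cp.Problem(cp.Maximize(cp.log_det(Lam)), constraints).solve()
    return Lam.value                                             # fitted precision matrix
```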
Remark 2. In the one-dimensional ($n = 1$) case, this optimisation problem can be easily solved for any $N \geq 1$: the $i$th constraint in (5) can be expressed directly for the (scalar) variance as $\sigma^{-2} \doteq \Sigma^{-1} \leq -2\log w_i/(x_i - \mu)^2$, and (5) reduces to a single constraint:
$$\sigma^{-2} \leq \min_{i \in \{1,\ldots,N\}} \frac{-2\log w_i}{(x_i - \mu)^2}. \qquad (6)$$
In particular, setting $\sigma^{-2}$ to the right hand side of this inequality will maximise the precision, hence solving the optimisation problem (4). In this case, if it happens that $w_i = f(x_i)$ with $f$ a Gaussian possibility function, then the associated variance will be recovered exactly, even with $N = 1$. Although this result does not generalise easily to higher dimensions, it highlights the potential of this method for recovering the variance from an ensemble, as will be studied later in this section.
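The scalar case can be checked directly; in the short sketch below, the variance is recovered exactly from a single weighted particle whose weight comes from a Gaussian possibility function (values are illustrative).

```python
import numpy as np

def fit_variance_1d(x, w, mu):
    """Closed-form solution of (4) in one dimension, via (6)."""
    lam = np.min(-2.0 * np.log(w) / (x - mu) ** 2)   # largest feasible precision
    return 1.0 / lam                                 # fitted variance

mu, sigma2 = 0.0, 2.5
x = np.array([1.3])                                  # a single particle
w = np.exp(-0.5 * (x - mu) ** 2 / sigma2)            # weight from the Gaussian possibility function
print(fit_variance_1d(x, w, mu))                     # recovers sigma2 = 2.5 exactly
```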
We will denote by $\Lambda\{(w_i, x_i)\}_i$ the solution to (4), omitting the limits on $i$ for concision. In addition to providing an estimate for the variance of the best-fitting Gaussian, this approach provides a measure of how far from Gaussian the ensemble is. Indeed, a notion of distance can be defined based on the gaps $\mathrm{Tr}(C_i\Lambda) + 2\log w_i$, $i \in \{1, \ldots, N\}$.
We will be interested in the relationship between the best Gaussian fit for a given ensemble and the one for a linear and invertible transformation of that ensemble, which motivates the following proposition.
Proposition 1. Let $\{(w_i, x_i)\}_i$ be an ensemble such that $w_i = 1$ if and only if $i = 0$, and let $M$ be an invertible linear map on $\mathbb{R}^n$; it holds that
$$\Lambda\{(w_i, x_i)\}_i = M^\top \Lambda\{(w_i, M x_i)\}_i\, M.$$
Proof. The constraints (5) for the ensemble $\{(w_i, M x_i)\}_i$ can be rewritten as
$$\forall i \in \{1, \ldots, N\}: \quad (M x_i - M x_0)^\top \Lambda (M x_i - M x_0) = (x_i - x_0)^\top M^\top \Lambda M (x_i - x_0) \leq -2\log w_i.$$
Changing the variable of the optimisation problem from $\Lambda$ to $\tilde{\Lambda} = M^\top \Lambda M$ changes the objective function to $\log|M^{-\top}\tilde{\Lambda}M^{-1}| = \log|\tilde{\Lambda}| + \text{constant}$. Therefore, the two optimisation problems are equivalent, and their solutions are related as stated.
From a practical viewpoint, the computational cost of calculating the ensemble's variance via (4) is greater than that of the probabilistic framework due to the optimisation problem. Yet, Remark 2 hints at a potential for an efficient recovery of the true variance in the Gaussian case. This aspect is investigated in Figure 1a, which displays the average root mean squared error (RMSE) between the true variance and the ensemble's variance for different sample sizes and for $n \in \{8, 16, 32\}$, averaged over 1000 repeats. The true $n \times n$ covariance matrix $\Sigma$ is drawn from the inverse Wishart distribution with $n^2$ degrees of freedom and scale matrix $nI_n$, with $I_n$ the identity matrix of dimension $n \times n$, making its reconstruction appropriately challenging. Here, we use a sample $x_i \sim \mathcal{N}(0_n, \Sigma)$ for $i = 1, \ldots, N$, with $0_n$ the zero vector of size $n$, both to estimate the probabilistic variance and to serve as the ensemble in (4) with weights $w_i = \mathcal{N}(x_i; 0_n, \Sigma)$. The performance of each case is investigated from the minimum required sample size $N = n$ up to $N = 500$. Figure 1a shows that the covariance matrix obtained from (4) indeed converges to the true underlying variance faster than the probabilistic one for small dimensions. In particular, the associated RMSE appears to drop significantly around $N = n^2$.
(a) Error between the sample and true variance (log of RMSE against sample size, probabilistic and possibilistic approaches, $n \in \{8, 16, 32\}$). (b) Computational time of the sample variance (seconds against sample size, possibilistic approach, $n \in \{8, 16, 32\}$).
Figure 1: Analysis of the proposed procedure for Gaussian fitting when the underlying possibility function / probability distribution is Gaussian.
The computational cost of the probabilistic approach is very small for the considered range of sample
sizes and remains around 0.1 ms even for n= 32 in our experiments, which is likely due to other operations
dominating the computational cost in the considered settings. This is in contrast to the computational time
of the proposed method, shown in Figure 1b, which has a clear linear trend and is orders of magnitude larger
than in the probabilistic case.
One advantage of defining an ensemble's variance via (4) is that knowledge of the dependency—or lack thereof—between components of the state $x \in \mathbb{R}^n$ can be more easily integrated. For instance, if we want to impose conditional independence between some components, i.e., to impose that $\Lambda_{i,j} = 0$ for any pair of indices $i$ and $j \neq i$ in a given set $I$, then we can add a set of constraints to the optimisation problem (4). Adding constraints will reduce the precision, i.e., it will yield a Gaussian possibility function that is less informative than the one obtained from the less-constrained problem. This behaviour can be interpreted as follows: in the possibilistic framework, conditional independence can be obtained by sacrificing some information. This is a well-understood trade-off in data assimilation, where forcing components to be uncorrelated, a method known as localisation, is usually accompanied by an increase in the variance of the remaining components, a process known as inflation. The main difference between standard inflation and the proposed approach is that inflation usually comes with additional parameters that need to be fine-tuned for different situations, whereas the required amount of precision loss is automatically determined via the optimisation problem (4) when adding the constraints on conditional independence, with no additional tuning parameters.
Apart from defining the ensemble’s expected value and variance, three steps of the algorithm need to
be developed: initialisation, prediction, and update. These will be analogous to those of the probabilistic
EnKF, but significant changes are required.
3.3 Initialisation
We consider a sequence of uncertain variables $\boldsymbol{x}_0, \boldsymbol{x}_1, \ldots$, with $\boldsymbol{x}_k$ representing the state of the system at time step $k$. We assume that there is prior knowledge about $\boldsymbol{x}_0$, encoded into a possibility function $\mathcal{N}(\mu_0, \Sigma_0)$. This possibility function is approximated by an ensemble $\{(w_i, \hat{x}_0^i)\}_{i=0}^N$ of $N+1$ weighted particles, defined as in Section 3.1. The time index, 0 in this case, is omitted for the weights as these will in fact remain constant throughout the algorithm, which relies exclusively on transports of the associated particles. The initialisation step is detailed in Algorithm 3.
Algorithm 3 Initial step of the p-EnKF
Input: The prior Gaussian possibility function $\mathcal{N}(\mu_0, \Sigma_0)$
Output: The initial ensemble $\{(w_i, \hat{x}_0^i)\}_{i=0}^N$
1: for $i \in \{1, \ldots, N\}$ do
2: $\hat{x}_0^i \sim \mathcal{N}(\mu_0, \Sigma_0)$ \\ Draw a particle
3: $w_i \leftarrow \mathcal{N}(\hat{x}_0^i; \mu_0, \Sigma_0)$ \\ Calculate the corresponding weight
4: end for
5: $\hat{x}_0^0 \leftarrow \mu_0$ \\ Add a particle deterministically at the mode
6: $w_0 \leftarrow 1$
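A NumPy sketch of this initialisation is given below; the particle with index 0 is placed at the mode and receives weight 1.

```python
import numpy as np

def p_enkf_init(mu0, Sigma0, N, rng=np.random.default_rng()):
    """Initial ensemble of the p-EnKF: N sampled particles plus the mode, which has weight 1."""
    samples = rng.multivariate_normal(mu0, Sigma0, size=N)
    Lam0 = np.linalg.inv(Sigma0)
    dev = samples - mu0
    weights = np.exp(-0.5 * np.einsum('ij,jk,ik->i', dev, Lam0, dev))  # w_i = N(x_i; mu0, Sigma0)
    particles = np.vstack([mu0[None, :], samples])    # index 0 carries the mode
    weights = np.concatenate([[1.0], weights])        # w_0 = 1
    return particles, weights
```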
3.4 Prediction step
In the probabilistic framework, the standard way of obtaining the predictive ensemble at time step $k$ is to i) apply the transition model $F_k$ to each particle, and ii) add a realisation of the transition noise. The first part of this process can be used as is, with some advantageous properties: a key result is the fact that $\mathbb{E}(F_k(\boldsymbol{x}_{k-1})) = F_k(\mathbb{E}(\boldsymbol{x}_{k-1}))$ with no assumption on either $F_k$ or $\boldsymbol{x}_{k-1}$. This allows the prediction to be performed without recomputing the expected value, therefore stabilising the estimation through non-linear dynamics. In practice, this means that the particle with index 0 will always correspond to the expected value, hence its special treatment. Given the ensemble $\{(w_i, \hat{x}_{k-1}^i)\}_{i=0}^N$ at time step $k-1$, we can simply compute the image $\tilde{x}_k^i = F_k(\hat{x}_{k-1}^i)$ of each particle to capture the information at time $k$ in the absence of perturbations.
However, the second part of the standard approach is not appropriate in the possibilistic setting since we aim to model epistemic uncertainty which, in this context, corresponds to deviations between the model and the actual dynamics rather than real perturbations. Deterministic methods can however be adapted and we consider using transport maps to move the points of the ensemble as in [32]. To construct such a map for our setting, we first characterize linear transformations of uncertain variables as follows.
Proposition 2. If $\boldsymbol{x}$ is an uncertain variable on $\mathbb{R}^n$ described by $\mathcal{N}(\mu, \Sigma)$ and $\boldsymbol{z} = A\boldsymbol{x} + b$, where $A$ is an $n \times n$ invertible matrix and $b$ is a constant vector of length $n$, then $\boldsymbol{z}$ is described by $\mathcal{N}(A\mu + b, A\Sigma A^\top)$.
Proof. Let $f_{\boldsymbol{x}}$ denote the Gaussian possibility function $\mathcal{N}(\mu, \Sigma)$. Applying (3), the possibility function of the uncertain variable $\boldsymbol{z}$ is
$$f_{\boldsymbol{z}}(z) = \sup\{f_{\boldsymbol{x}}(x) : x \in \mathbb{R}^n,\ z = Ax + b\} = \sup\{f_{\boldsymbol{x}}(x) : x = A^{-1}(z - b)\} = f_{\boldsymbol{x}}\big(A^{-1}(z - b)\big) = \mathcal{N}\big(A^{-1}(z - b); \mu, \Sigma\big) = \mathcal{N}\big(z; A\mu + b, A\Sigma A^\top\big).$$
According to Proposition 2, Gaussianity is preserved under linear and invertible transformations in the possibilistic framework. Furthermore, it is straightforward to obtain a map between two uncertain variables in $\mathbb{R}^n$ described by Gaussian possibility functions:
Proposition 3 (Mapping between Gaussian possibility functions). If $\boldsymbol{x}$ and $\boldsymbol{z}$ are two uncertain variables in $\mathbb{R}^n$ described respectively by $\mathcal{N}(\mu, \Sigma)$ and $\mathcal{N}(\tilde{\mu}, \tilde{\Sigma})$, then there exists a map $M$ such that $\boldsymbol{z} = M(\boldsymbol{x})$, which is characterised by
$$M(x) = \tilde{\mu} + T(x - \mu), \qquad x \in \mathbb{R}^n,$$
where $T = \tilde{\Sigma}^{1/2}\big(\Sigma^{1/2}\big)^{-1}$, with the notation $A^{1/2}$ referring to the Cholesky factor of a matrix $A$.
Proof. First, we rewrite the mapping as follows:
$$M(x) = \tilde{\mu} + T(x - \mu) = Tx + (\tilde{\mu} - T\mu).$$
As this is a linear and invertible transformation of $\boldsymbol{x}$, Proposition 2 guarantees that $\boldsymbol{z} = M(\boldsymbol{x})$ is also described by a Gaussian possibility function, defined as
$$f_{\boldsymbol{z}}(z) = \mathcal{N}\big(z; T\mu + (\tilde{\mu} - T\mu),\ T\Sigma T^\top\big) = \mathcal{N}\Big(z; \tilde{\mu},\ \tilde{\Sigma}^{1/2}\big(\Sigma^{1/2}\big)^{-1}\big[\Sigma^{1/2}\big(\Sigma^{1/2}\big)^\top\big]\big(\Sigma^{1/2}\big)^{-\top}\big(\tilde{\Sigma}^{1/2}\big)^\top\Big) = \mathcal{N}\big(z; \tilde{\mu}, \tilde{\Sigma}\big).$$
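A sketch of this map using Cholesky factors is given below; it is used in the next paragraph to move the propagated ensemble from $\mathcal{N}(\mu_k, \tilde{\Sigma}_k)$ to $\mathcal{N}(\mu_k, \tilde{\Sigma}_k + U_k)$.

```python
import numpy as np

def gaussian_transport(mu, Sigma, mu_t, Sigma_t):
    """Return the map M of Proposition 3 sending N(mu, Sigma) onto N(mu_t, Sigma_t)."""
    L = np.linalg.cholesky(Sigma)                  # Sigma^{1/2}
    L_t = np.linalg.cholesky(Sigma_t)              # Sigma_t^{1/2}
    T = L_t @ np.linalg.inv(L)                     # T = Sigma_t^{1/2} (Sigma^{1/2})^{-1}
    return lambda x: mu_t + T @ (x - mu)
```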
Such a transport map can be used in the prediction step of the p-EnKF to add uncertainty from the transition model to the ensemble using a deterministic mapping as follows: We fit a Gaussian possibility function $\mathcal{N}(\mu_k, \tilde{\Sigma}_k)$ to the ensemble $\{(w_i, \tilde{x}_k^i)\}_{i=0}^N$; then, from the prediction of the possibilistic Kalman filter [19], we know that an additive Gaussian uncertainty of the form $\mathcal{N}(0_n, U_k)$ will yield a Gaussian predictive possibility function with expected value $\mu_k$ and variance $\tilde{\Sigma}_k + U_k$. Using Proposition 3, we can compute a map $M_k$ from $\mathcal{N}(\mu_k, \tilde{\Sigma}_k)$ to $\mathcal{N}(\mu_k, \tilde{\Sigma}_k + U_k)$ and apply it to each particle $\tilde{x}_k^i$ to obtain the predicted ensemble at time $k$. The prediction step of the p-EnKF is summarised in Algorithm 4. Although we assume the Gaussianity of the ensemble at a slightly earlier stage than the standard EnKF, $U_k$ is typically small compared to $\tilde{\Sigma}_k$ so that the impact of this assumption is expected to be small. In addition, fitting a Gaussian possibility function to the predicted ensemble is unnecessary as the result is known to be $\mathcal{N}(\mu_k, \Sigma_k)$, with $\Sigma_k = \tilde{\Sigma}_k + U_k$, based on Proposition 1.
Algorithm 4 Prediction step of the p-EnKF at time $k$
Input: The time $k-1$ posterior ensemble, $\{(w_i, \hat{x}_{k-1}^i)\}_{i=0}^N$.
Output: The time $k$ predictive ensemble, $\{(w_i, x_k^i)\}_{i=0}^N$, expected value $\mu_k$, and variance $\Sigma_k$
1: for $i \in \{0, \ldots, N\}$ do
2: $\tilde{x}_k^i \leftarrow F_k(\hat{x}_{k-1}^i)$ \\ Apply dynamics to particles
3: end for
4: $\mu_k \leftarrow \tilde{x}_k^0$
5: $\tilde{\Lambda}_k \leftarrow \Lambda\{(w_i, \tilde{x}_k^i)\}_i$ \\ Compute the precision matrix
6: $\tilde{\Sigma}_k \leftarrow \tilde{\Lambda}_k^{-1}$
7: for $i \in \{1, \ldots, N\}$ do
8: $T \leftarrow (\tilde{\Sigma}_k + U_k)^{1/2}\big(\tilde{\Lambda}_k^{1/2}\big)^\top$ \\ Compute matrix for adding uncertainty
9: $x_k^i \leftarrow \mu_k + T(\tilde{x}_k^i - \mu_k)$ \\ Transport each particle
10: end for
11: $\Sigma_k \leftarrow \tilde{\Sigma}_k + U_k$
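Combining the Gaussian fit and the transport map, the prediction step can be sketched as follows; `fit_precision` and `gaussian_transport` refer to the earlier sketches, and `F_k` and `U_k` are assumed inputs.

```python
import numpy as np

def p_enkf_predict(particles, weights, F_k, U_k):
    """Prediction step of the p-EnKF: propagate, fit a Gaussian, then inflate by transport."""
    propagated = np.array([F_k(x) for x in particles])        # tilde x_k^i = F_k(hat x_{k-1}^i)
    mu = propagated[0]                                        # expected value, carried by index 0
    Lam = fit_precision(propagated, weights)                  # precision of the best Gaussian fit
    Sigma_tilde = np.linalg.inv(Lam)
    transport = gaussian_transport(mu, Sigma_tilde, mu, Sigma_tilde + U_k)
    predicted = np.array([transport(x) for x in propagated])  # add the model uncertainty U_k
    return predicted, weights, mu, Sigma_tilde + U_k
```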
3.5 Update step
We first need to specify how the observation equation (1b) will be adapted to the considered context. Since perturbations in sensors are often stochastic in nature, we continue to model the error in the observation with a random variable, that is
$$Y_k = H_k(\boldsymbol{x}_k) + \varepsilon_k,$$
with $\varepsilon_k \sim \mathcal{N}(0, V_k)$ as before. The mechanism to update the information on $\boldsymbol{x}_k$ accordingly is provided by (2). In some situations, the main source of uncertainty in the observation will be of an epistemic nature; yet, if the corresponding model errors are described by $\mathcal{N}(0, V_k)$, then the posterior possibility function will be the same; this follows from the likelihood principle since the two associated likelihoods will only differ by a multiplicative constant.
There are several variants of the probabilistic EnKF update step. Here, we follow the principles of the
SqrtEnKF as it is well suited to the non-random setting of interest. In fact, we will show that our ensemble
can be updated exactly in the same way as in the SqrtEnKF.
From the prediction step, we know that the best fitting Gaussian possibility function for the predicted sample is $\mathcal{N}(\mu_k, \Sigma_k)$. As is standard, we consider the deviations $e_k^i = x_k^i - \mu_k$ and we verify that the correct updating formulas for the expected value and deviations are
$$\hat{\mu}_k = \mu_k + K_k(y_k - H_k\mu_k), \qquad \text{and} \qquad \hat{e}_k^i = e_k^i - \tilde{K}_k H_k e_k^i,$$
where $K_k$ and $\tilde{K}_k$ are as defined in the Kalman filter and the SqrtEnKF, respectively. The updated particles obtained by adding the posterior expected value and deviations together are
$$\hat{x}_k^i = \hat{M}_k(x_k^i) \doteq (I_n - \tilde{K}_k H_k) x_k^i + (\tilde{K}_k H_k \mu_k + K_k y_k - K_k H_k \mu_k),$$
which is a linear transformation of $x_k^i$. Defining $\boldsymbol{x}_k$ as an uncertain variable described by $\mathcal{N}(\mu_k, \Sigma_k)$ and using Proposition 3, it follows that $\hat{M}_k(\boldsymbol{x}_k)$ is described by
$$\hat{f}_k = \mathcal{N}\big((I_n - \tilde{K}_k H_k)\mu_k + (\tilde{K}_k H_k \mu_k + K_k y_k - K_k H_k \mu_k),\ (I_n - \tilde{K}_k H_k)\Sigma_k (I_n - \tilde{K}_k H_k)^\top\big) = \mathcal{N}\big(\mu_k + K_k(y_k - H_k\mu_k),\ (I_n - K_k H_k)\Sigma_k\big),$$
which matches the update mechanism of the KF, as required. The map $\hat{M}_k$ is therefore moving the particles in such a way that the best fitting Gaussian for $\{(w_i, \hat{M}_k(x_k^i))\}_i$ is the posterior of the KF with $\mathcal{N}(\mu_k, \Sigma_k)$ as a predicted possibility function. Thus, the update step of the p-EnKF is formally the same as that of the standard SqrtEnKF, as detailed in Algorithm 2, albeit with a difference in interpretation.
3.6 Extensions
We finish this section by collecting together some extensions to the p-EnKF, demonstrating its applicability
in the context of nonlinear measurement models and providing a number of ways to improve computational
efficiency.
3.6.1 Nonlinear observation models
In the previous section, we have detailed the p-EnKF with a linear observation model. We now establish
that, similarly to the EnKF [14, 33], the p-EnKF could be adapted to nonlinear observation models. There
are, in fact, two main approaches to dealing with nonlinearity in the observation model, which we consider
in turn.
Model linearisation The simplest way to handle a nonlinear observation model is to linearise it, i.e., to Taylor expand the observation model around the predictive expected value at time $k$ as $H_k(x_k) \approx H_k(\mu_k) + J_{H_k}(\mu_k)(x_k - \mu_k)$, with $J_{H_k}(\mu_k)$ the Jacobian matrix of $H_k$ at $\mu_k$. Then, a new observation model can be defined based on the observation matrix $J_{H_k}(\mu_k)$ and on a non-zero mean $H_k(\mu_k) - J_{H_k}(\mu_k)\mu_k$ for the observation noise $\varepsilon_k$. After that, we can follow Algorithm 2 for the update, except that the term $y_k - H_k\mu_k$ in step 4 becomes $y_k - H_k(\mu_k)$, which amounts to replacing $y_k$ by $y_k - H_k(\mu_k) + J_{H_k}(\mu_k)\mu_k$ and $H_k$ by $J_{H_k}(\mu_k)$. Since linearisation does not depend on the chosen representation of uncertainty, it is equally applicable to the p-EnKF as it is to standard versions of the EnKF. The linearisation method can usually be improved by replacing the term $H_k\Sigma_k H_k^\top$ by the predictive variance of the observation based on the ensemble [33], which is also applicable to the p-EnKF. The term $\Sigma_k H_k^\top$ can usually also be replaced by an ensemble-based approximation; yet, this is not straightforward in the p-EnKF since the precision matrix does not have the same properties as the covariance matrix: to compute the covariance matrix $\Sigma_{x, H_k(x)}$ between the uncertain variables $\boldsymbol{x}$ and $H_k(\boldsymbol{x})$, one must first compute the precision matrix for $(\boldsymbol{x}, H_k(\boldsymbol{x}))$, invert it, and then extract the block corresponding to $\Sigma_{x, H_k(x)}$. Such an extension of the state is however usual, as described next.
Extending the state space Another way to deal with a nonlinear observation model is by extending/augmenting the original state with the corresponding predicted observation, which is then observed linearly. In this case, the state becomes $z_k = (x_k, H_k(x_k))$ and the extended observation matrix is $\tilde{H}_k = \begin{pmatrix} 0_{m \times n} & I_m \end{pmatrix}$, with $0_{m \times n}$ the zero matrix of size $m \times n$. Algorithm 2 can be used as usual once a Gaussian is fitted to this extended state. The posterior ensemble can then be extracted by choosing the first $n$ elements of the extended state for every particle. Care must be taken in practice as the precision matrix of the extended state can be close to singular due to strong correlations between the elements. For instance, if one of the components of the state that is observed independently becomes sufficiently well estimated at a given time step, then it might be that the nonlinear observation function is approximately linear from the viewpoint of the ensemble. Yet, this corresponds to cases where linearisation would be appropriate. It therefore appears that a hybrid technique would be the most suitable, with components of the observations being either linearised or included in the state depending on their observed degree of nonlinearity.
3.6.2 Techniques to improve computational efficiency
As with any ensemble-based technique, the accuracy of the p-EnKF depends on the ensemble size. As it is typically of interest to use small ensembles for computational reasons, it can be challenging to represent the state of interest adequately. An important aspect is that the p-EnKF requires a minimum sample size equal to the state's dimension plus one due to the computation of the variance; a sample size matching the dimension plus one is sufficient to ensure that the resulting covariance matrix is of full rank, provided only that the collection of displacements from the expected value to each of the sample points is linearly independent, whereas a smaller sample will lead to a rank-deficient covariance matrix. That a sample size equal to the dimension plus one suffices follows from the fact that a quadratic form bounded away from zero at a number of points separated from the centroid by linearly-independent vectors cannot vanish anywhere. However, the required sample size can be reduced by using one of the techniques presented in what follows. Despite this constraint on the sample size, some methods can be considered for improving the computational efficiency; we present two of them in what follows.
Conditional Independence For the p-EnKF, the variance is computed via an optimisation problem. Thus, the number of variables in the precision matrix of the state $x \in \mathbb{R}^n$ will be $n(n+1)/2$. However, for many problems, there is a natural structure in the state variable, such as a conditional-independence structure, which can be exploited to reduce this number. Indeed, for sparsely dependent models, it is straightforward to reduce the number of nonzero variables in the precision matrix $\Lambda$ using conditional independence as follows: the off-diagonal elements $\Lambda_{ij}$ with $i \neq j$ are set to 0 if the variables $x_i$ and $x_j$ are to be modelled as conditionally independent. By setting some of the off-diagonal elements of the precision matrix to 0 during the computation of this matrix by optimisation, the other terms in the precision matrix will automatically adjust to these constraints, offering a systematic way to perform inflation that is tailored to the strength of the dependence being assumed away. However, exploiting this structure to gain in computational efficiency would require optimisation algorithms that are specifically tailored to sparse/band-diagonal positive-definite matrices. The design of such algorithms is left for future work.
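In the CVXPY sketch of Section 3.2, such structural zeros can be imposed by adding equality constraints on the precision variable; a variant with a hypothetical list `zero_pairs` of index pairs is shown below.

```python
import numpy as np
import cvxpy as cp

def fit_precision_sparse(x, w, zero_pairs):
    """Variant of the fit in (4) with Lambda[i, j] = 0 imposed for every (i, j) in zero_pairs."""
    mu, n = x[0], x.shape[1]
    Lam = cp.Variable((n, n), PSD=True)
    constraints = [Lam[i, j] == 0 for (i, j) in zero_pairs]   # conditional-independence constraints
    for xi, wi in zip(x[1:], w[1:]):
        C = np.outer(xi - mu, xi - mu)
        constraints.append(cp.trace(C @ Lam) <= -2.0 * np.log(wi))
    cp.Problem(cp.Maximize(cp.log_det(Lam)), constraints).solve()
    return Lam.value
```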
Nodal Numbering Scheme Apart from reducing the number of variables, the calculation and storage can be made more efficient by reordering the variables to minimise the bandwidth of the nonzero entries in the covariance matrix. One method proposed in the literature to achieve this is the nodal numbering scheme [7]. This method uses a graph representation to reorder the state's elements so that the nonzero elements are close to the diagonal. Moreover, the nodal numbering scheme ensures that the permuted matrix will have a bandwidth no greater than that of the original matrix, making computations involving the precision matrix more efficient; see [7] and [25] for more details.
4 Numerical Experiments
Here we show the performance of the p-EnKF using simulated data from two different models: a simple
linear model and a modified Lorenz 96 model.
Data Generation One convenient feature of standard probabilistic modelling is that simulation can be
performed exactly according to the model: the variability of scenarios, necessary for a thorough performance
assessment, can be obtained directly by sampling from the assumed probability distributions. This is no
longer the case with possibility theory since embracing epistemic uncertainty means that sampling is no
longer a natural operation. The ideal solution would be to obtain a sufficiently large collection of real
datasets for which the ground truth is known; this is, however, not generally achievable for problems like data
assimilation. Instead, we generate our simulated scenarios by sampling from the probability distributions
assumed by the probabilistic baselines and align our possibilistic model with these.
4.1 Linear model
We first consider a linear model since the performance can be clearly compared with the optimal filter, which
can be computed in this instance via the KF. A simple linear model is considered so as to generalise easily
to arbitrary dimensions. In particular, we consider that the $i$-th component at time $k$, $i \in \{2, \ldots, n\}$, only depends on the $i$-th and $(i-1)$-th components at time $k-1$. This means that conditional independence can be imposed to reduce the computational cost of obtaining the precision matrix with a limited information loss. The model can be written as a state-space model (1), for $k \in \{1, \ldots, 100\}$, with the following components:
1. The initial state $X_0$ is sampled from $\mathcal{N}(0_n, 10 I_n)$.
2. The dynamic model is linear: $F_k(X_{k-1}) = F_k X_{k-1}$ with
$$F_k = \begin{pmatrix} 1 & \lambda & 0 & \cdots & 0 \\ 0 & 1 & \lambda & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix},$$
where $\lambda = 0.1$, and the covariance matrix of the dynamical noise $\epsilon_k$ is $U_k = 0.01 I_n$.
3. The observation model is also linear, $H_k(X_k) = H_k X_k$, with $H_k = \begin{pmatrix} I_m & 0_{m \times (n-m)} \end{pmatrix}$, and the covariance matrix of the observation noise $\varepsilon_k$ is $V_k = 0.1 I_m$.
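A sketch of data generation for this model (with illustrative values of $n$, $m$, and the horizon) is given below.

```python
import numpy as np

def simulate_linear_model(n=8, m=4, K=100, lam=0.1, seed=0):
    """Simulate the linear state-space model of Section 4.1 and return states and observations."""
    rng = np.random.default_rng(seed)
    F = np.eye(n) + lam * np.eye(n, k=1)                 # upper bidiagonal transition matrix
    H = np.hstack([np.eye(m), np.zeros((m, n - m))])     # observe the first m components
    U, V = 0.01 * np.eye(n), 0.1 * np.eye(m)
    x = rng.multivariate_normal(np.zeros(n), 10.0 * np.eye(n))   # X_0 ~ N(0, 10 I_n)
    states, obs = [], []
    for _ in range(K):
        x = F @ x + rng.multivariate_normal(np.zeros(n), U)
        obs.append(H @ x + rng.multivariate_normal(np.zeros(m), V))
        states.append(x)
    return np.array(states), np.array(obs)
```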
Based on all the shared properties between the Gaussian possibility function and the Gaussian distribution, the best way to align our possibilistic model with the assumed probabilistic one is to simply keep the expected value and variance parameters in our possibility functions. In particular, we assume that the initial state $\boldsymbol{x}_0$ is described by the Gaussian possibility function $\mathcal{N}(0_n, 10 I_n)$ and that the errors in the dynamical model are described by $\mathcal{N}(0_n, U_k)$.
The parameters of the considered methods are as follows: Unless otherwise stated, the parameter $N$ is set to twice the state's dimension, i.e., $N = 2n$. For a given value of $N$, the actual number of samples for all methods is $N + 1$. The parameters of the UKF chosen in this paper are given by $\alpha = 0.25$, $\kappa = 130$, $\lambda = \alpha^2(n + \kappa) - n$, and $\beta = 2$.
4.1.1 Performance assessment for various sample sizes and dimensions
Since the p-EnKF is an ensemble-based method, it is natural for us to investigate the performance based
on different sample sizes and dimensions first. Figure 2 shows the performance of the p-EnKF with no
localisation when the state is fully observed (n=m), comparing it to the SqrtEnKF and the UKF. The
performance is measured in terms of average RMSE over 1000 realisations, except for n= 64 where we only
consider 50 realisations due to a large computational time (more than 30 minutes per run). We aim to assess
the performance after initialisation and thus focus on the estimation of the state X100 at the last time step.
The RMSE is computed i) for the posterior expected value with respect to (w.r.t.) the posterior mean of
the KF, ii) for the posterior expected value w.r.t. the true state, and iii) for the posterior variance w.r.t. the
posterior variance of the KF. The key aspects in Figure 2 are as follows:
1. As can be seen in Figure 2a, despite the similarities between the p-EnKF and the SqrtEnKF, the former improves on the latter by at least 4 orders of magnitude in terms of RMSE w.r.t. the posterior mean of the KF. Although the SqrtEnKF could be used when $N < n$, which is key in large dimensions, the performance would necessarily be lower in this regime. The difference in performance is still visible when considering the RMSE w.r.t. the true state, as in Figure 2b; however, it is less pronounced due to the unavoidable error caused by the distance between the true state and the optimal estimator given by the mean of the KF.
2. Despite the fact that the capabilities in terms of variance recovery are almost indistinguishable in
Figure 1a between the possibilistic and probabilistic approach, Figure 2c shows that the p-EnKF once
again largely outperforms the SqrtEnKF in terms of RMSE w.r.t. the optimal variance given by the
KF, with improvements by at least 4 orders of magnitude throughout once again. This is due to the
fact that here, as opposed to Figure 1a, the mean of the ensemble also needs to be estimated by the
SqrtEnKF whereas the expected value for the p-EnkF is given by the particle with index 0 and thus
does not need to be re-estimated.
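The three RMSE criteria described above can be computed as in the following sketch; the variable names mu_p, Sigma_p, mu_kf, Sigma_kf, and x_true are placeholders for the final-time outputs of the compared filters, and the exact averaging over realisations is our assumption.

import numpy as np

def rmse(a, b):
    # Root-mean-square error between two arrays (flattened before comparison)
    a, b = np.asarray(a, float).ravel(), np.asarray(b, float).ravel()
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Criterion (i):   rmse(mu_p, mu_kf)        -- posterior expected value vs. KF posterior mean
# Criterion (ii):  rmse(mu_p, x_true)       -- posterior expected value vs. true state
# Criterion (iii): rmse(Sigma_p, Sigma_kf)  -- posterior variance vs. KF posterior variance
# Each quantity is then averaged over the independent realisations of the scenario.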
The estimates of the UKF are closer to optimality than those of the p-EnKF in Figures 2a and 2c; this could be due to the difference in initialisation between the two algorithms, with the UKF placing points deterministically and the p-EnKF relying on a random sample at the first time step. Yet, this source of randomness in the p-EnKF is not necessary, and other initialisation schemes are considered in the next section.
Figure 2: Performance assessment for the fully-observed linear model. (a) Average RMSE w.r.t. the posterior mean of the KF; (b) average RMSE w.r.t. the true state; (c) average RMSE w.r.t. the posterior variance of the KF.
4.1.2 Alternative schemes for particle initialisation
We now investigate the performance of the p-EnKF with different initialisation schemes inspired by the UKF. When N = 2n, we can simply use the σ-points of the UKF as initial points for the p-EnKF; we refer to this scheme as “UKF initialisation”. To allow for setting N = n, we also arbitrarily consider only the σ-points which correspond to increasing (resp. decreasing) one of the components of the mean vector, and we refer to this scheme as “UKF initialisation +” (resp. “UKF initialisation −”).
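A possible implementation of these three schemes is sketched below; the function and scheme names are ours, and the σ-points are built as in the earlier sketch, with a Cholesky factor as the matrix square root.

import numpy as np

def initial_ensemble(mu0, Sigma0, scheme="ukf", alpha=0.25, kappa=130.0):
    # 'ukf' : all 2n + 1 sigma points (N = 2n)
    # 'ukf+': mu0 and the n points obtained by adding a column of the square root (N = n)
    # 'ukf-': mu0 and the n points obtained by subtracting a column (N = n)
    n = mu0.shape[0]
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * Sigma0)
    plus = mu0 + S.T   # row i is mu0 plus the i-th column of S
    minus = mu0 - S.T  # row i is mu0 minus the i-th column of S
    if scheme == "ukf":
        return np.vstack([mu0, plus, minus])
    if scheme == "ukf+":
        return np.vstack([mu0, plus])
    if scheme == "ukf-":
        return np.vstack([mu0, minus])
    raise ValueError("unknown scheme: " + scheme)

With the diagonal initial variance 10 I_n used here, the Cholesky factor is itself diagonal, so each offset indeed increases or decreases a single component of the mean vector.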
To better highlight the dependency on the initialisation scheme, we consider in this section a partially observed model with m = 1, i.e., only the first component of the state is observed. This is challenging in general, so only problems of small state dimension are considered. In Figure 3, the performance assessment is carried out for n = 3 and n = 5, with all RMSEs being averaged over 1000 repeats. In Figures 3a and 3b, all the considered initialisation schemes are compared against the SqrtEnKF and the StEnKF with 2n + 1 samples, with the poor performance of the latter highlighting the difficulty of these inference problems despite the small dimension. There is a slight but consistent improvement in performance when using the UKF initialisation schemes in the p-EnKF, although this improvement vanishes after 30 to 50 time steps, depending on the state dimension. Figures 3c and 3d are restricted to the SqrtEnKF, the StEnKF, and the p-EnKF with UKF initialisation for the sake of legibility, and show that, although the interquartile ranges of the different methods overlap, the p-EnKF performs consistently well across different realisations.
Figure 3: Average RMSE and the error range of different algorithms in the linear model, averaged over 1000 repeats. Left: state dimension n = 3; right: state dimension n = 5. (a) Average RMSE w.r.t. the true state for n = 3; (b) average RMSE w.r.t. the true state for n = 5; (c) average RMSE with the interquartile range for n = 3; (d) average RMSE with the interquartile range for n = 5. Compared methods: KF, StEnKF, SqrtEnKF, UKF, and the p-EnKF with random and UKF-based initialisations.
4.1.3 Banded vs. full precision matrix
To illustrate the ability of the p-EnKF to deal with localisation via a systematic form of inflation, we contrast its performance with a full precision matrix against that with a banded precision matrix with a bandwidth of two, i.e., where all elements except the diagonal and the elements adjacent to it are set to zero. We compare these results against the ones obtained with the standard versions of the EnKF for the fully-observed (n = m = 5) and partially-observed (n = 5, m = 1) cases. The performance is measured with three quantities: i) the RMSE of the posterior expected value w.r.t. the true state, ii) the determinant of the posterior variance, and iii) the Mahalanobis distance between the posterior expected value and the true state. The Mahalanobis distance is used to capture how good the estimate is relative to its variance and, hence, assesses the calibration of the algorithms in terms of uncertainty quantification. It is defined as $\sqrt{(x-\mu)^\top \Sigma^{-1} (x-\mu)}$, where x is the true state, μ is a given posterior expected value, and Σ is the posterior variance.
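The following sketch illustrates both the Mahalanobis distance above and the banding pattern imposed on the precision matrix; the banding helper only shows which entries are constrained to zero and is not the constrained optimisation actually used by the p-EnKF.

import numpy as np

def mahalanobis(x, mu, Sigma):
    # sqrt((x - mu)^T Sigma^{-1} (x - mu))
    d = np.asarray(x, float) - np.asarray(mu, float)
    return float(np.sqrt(d @ np.linalg.solve(Sigma, d)))

def band(P, bandwidth=2):
    # Keep only the entries with |i - j| < bandwidth; bandwidth = 2 retains the
    # diagonal and the entries immediately adjacent to it, as described above.
    i, j = np.indices(P.shape)
    return np.where(np.abs(i - j) < bandwidth, P, 0.0)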
Figure 4 shows the performance of the p-EnKF with full and banded precision matrices against the same baselines as before. All the algorithms except the KF use 11 samples, i.e., the ensemble size is N + 1 with N = 2n, and the considered performance metrics are averaged over 1000 repeats. The important aspects in Figure 4 are as follows:
1. As can be seen in Figures 4a and 4b, forcing the precision matrix to be banded has little impact on the accuracy of the p-EnKF, despite correlations being crucial for a strong performance in the partially-observed case (Figure 4b). This is confirmed in Figures 4c and 4d, where the log-determinant of the posterior variance is mostly unaffected by localisation. A small but noticeable difference can be seen in Figure 4d, but the change in variance is in the correct direction: the determinant of the variance was increased by localisation, i.e., some inflation has been automatically applied in order to compensate for the imposed conditional independence.
2. The Mahalanobis distance for the two versions of the p-EnKF is nearly constant and close to that of the KF for both considered scenarios, as seen in Figures 4e and 4f. Conversely, the SqrtEnKF and the StEnKF both display large Mahalanobis distances, with that of the StEnKF even diverging in the partially-observed case; this is due to these algorithms having a larger RMSE than the KF (Figures 4a and 4b) but a smaller variance (Figures 4c and 4d).
In conclusion, the p-EnKF displays a nearly optimal behaviour in this linear scenario despite partial observability and localisation. In contrast, the standard versions of the EnKF depart significantly from the behaviour of the KF and tend to be overly optimistic even in the considered small-dimensional inference problems, with well-known adverse consequences for downstream tasks for which a reliable quantification of the uncertainty can be crucial.
4.2 Modified Lorenz 96 type model
In order to show that the strong performance of the p-EnKF observed in the previous section generalises beyond the linear case, we now consider a modified Lorenz 96 (LR96) model, which can be written as a state-space model (1), for k ∈ {1,...,100}, with the following components:
1. The initial state X_0 is sampled from N(0_n, 10 I_n).
2. The deterministic part of the dynamic model is characterised by x = F_k(x') with
$$
\begin{aligned}
x_1 &= x'_1 + \big((x'_2 - c)\,c - x'_1 + F\big)\,\Delta t, \\
x_2 &= x'_2 + \big((x'_3 - c)\,x'_1 - x'_2 + F\big)\,\Delta t, \\
x_i &= x'_i + \big((x'_{i+1} - x'_{i-2})\,x'_{i-1} - x'_i + F\big)\,\Delta t, \qquad 3 \leq i \leq n-1, \\
x_n &= x'_n + \big((c - x'_{n-2})\,x'_{n-1} - x'_n + F\big)\,\Delta t,
\end{aligned}
$$
where x'_i refers to the i-th component of the previous state x' and where F = 8, c = 1, and Δt = 0.01; an implementation sketch of this transition is given after this list.
3. The covariance matrix of the dynamical noise ϵ_k is U_k = 0.01 I_n.
4. The observation model is linear, H_k(X_k) = H_k X_k, with H_k = [I_m  0_{m×(n-m)}], and the covariance matrix of the observation noise ε_k is V_k = 0.1 I_m.
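A sketch of one step of this transition is given below, mirroring the displayed equations as we have reconstructed them, in particular the boundary convention in which out-of-range terms are replaced by the constant c.

import numpy as np

def lr96_step(x, F=8.0, c=1.0, dt=0.01):
    # One Euler step of the modified Lorenz 96 dynamics (n >= 4 assumed)
    n = x.size
    out = np.empty(n)
    out[0] = x[0] + ((x[1] - c) * c - x[0] + F) * dt
    out[1] = x[1] + ((x[2] - c) * x[0] - x[1] + F) * dt
    for i in range(2, n - 1):  # components 3, ..., n-1 in the 1-based indexing of the text
        out[i] = x[i] + ((x[i + 1] - x[i - 2]) * x[i - 1] - x[i] + F) * dt
    out[n - 1] = x[n - 1] + ((c - x[n - 3]) * x[n - 2] - x[n - 1] + F) * dt
    return out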
Figure 4: Performance assessment for the linear model with n = 5 when (left) fully observed, m = 5, and (right) partially observed, m = 1, averaged over 1000 repeats. (a, b) Average RMSE w.r.t. the true state; (c, d) estimated log-determinant of the posterior variance; (e, f) Mahalanobis distance between the estimate and the true state. Compared methods: KF, p-EnKF, p-EnKF (UKF initialisation), banded p-EnKF, StEnKF, and SqrtEnKF.
Overall, the only difference with the linear model is the transition function F_k. Throughout this section, the ensemble size will be set to 2n + 1, that is, N = 2n, unless specified otherwise. The LR96 model is particularly convenient for performance assessment since its dimension can be easily adjusted.
As before, we start by investigating the performance of the p-EnKF with the full precision matrix for different sample sizes and dimensions when all elements of the state are observed (n = m), comparing it to the SqrtEnKF and the UKF. However, for the nonlinear model, we only examine the performance in terms of RMSE w.r.t. the true state at the last time step, since we can no longer obtain the optimal state estimate from the KF. The RMSE displayed in Figure 5 is averaged over 100 repeats, except for n = 64 where we only consider 50 realisations due to the computational cost. The results shown in Figure 5 are very close to the ones obtained in the linear case, with all three algorithms maintaining a similar level of performance despite the non-linearities.
Figure 5: Performance of the p-EnKF in terms of RMSE w.r.t. the true state, compared with the SqrtEnKF and the UKF, for the LR96 model.
Because of the similarities between the results for the linear model and the LR96 model, we only highlight where noticeable differences arise. Figure 6a shows that, in the partially-observed case with m = 1 and n = 5, the SqrtEnKF has an RMSE that is now closer to that of the StEnKF than to that of the p-EnKF. Localisation in the p-EnKF still has a very mild effect, although the error increases around time step 70. This increase in error is, however, captured by the associated covariance matrix, so that the Mahalanobis distance remains constant throughout the scenario, as required. This behaviour suggests that the correlation increased around time step 70, forcing the p-EnKF with a banded matrix to increase the amount of inflation to compensate for the loss of information caused by the imposed conditional independence.
Figure 6: Performance for a partially-observed LR96 model (m = 1) when n = 5, averaged over 1000 repeats. (a) RMSE w.r.t. the true state; (b) estimated log-determinant of the posterior variance; (c) Mahalanobis distance between the estimate and the true state. Compared methods: p-EnKF, p-EnKF (UKF initialisation), banded p-EnKF, StEnKF, and SqrtEnKF.
5 Conclusion
We have introduced the possibilistic ensemble Kalman filter, or p-EnKF, a data assimilation technique treat-
ing the state of a state-space model as a fixed quantity about which limited information is available. By using
possibility theory to model this form of epistemic uncertainty, we found that much of the intuition behind
the standard versions of the EnKF remains valid, with the differences between the theories of possibility and
probability leading to key features in the p-EnKF. Specifically, the properties of the expected value and vari-
ance in possibility theory appeared to be beneficial for inference problems of small to moderate dimensions,
with the p-EnKF closely approximating the Kalman filter in the linear-Gaussian case. These properties also
allowed for localisation to be seamlessly applied with no parameter tuning required to compensate for the
loss of information incurred by the imposed conditional independence.
In the current version of the p-EnKF, the computation of covariance matrices relies on solving a con-
strained optimisation problem, which is time-consuming and requires an ensemble size greater than the
dimension of the state. Although beyond the scope of this work, lifting these constraints appears to be
feasible through the use of specialised optimisation techniques and suitable regularisation.
There are also several possible avenues for further investigation. We highlight two directions that we think are immediately interesting. First, although we have found the algorithm's performance to be robust to the specification of the initial ensemble, there is potential for developing systematic approaches to specifying it, which might further improve performance with small ensembles. Second, an open question is how the update step can be reformulated to operate directly in terms of the precision matrix so as to avoid matrix inversion, which would facilitate the use of the p-EnKF in high dimensions.
Acknowledgements
AMJ acknowledges the financial support of the United Kingdom Engineering and Physical Sciences Research Council (EPSRC; grants EP/R034710/1 and EP/T004134/1) and of United Kingdom Research and Innovation (UKRI) via grant number EP/Y014650/1, as part of the ERC Synergy project OCEAN.