
Approximate Estimators for Linear Systems With Additive Cauchy Noises

Robert Fonod* and Moshe Idan†

Technion - Israel Institute of Technology, Haifa 3200003, Israel

Jason L. Speyer‡

University of California, Los Angeles, California 90095, USA

The recently published optimal Cauchy estimator poses practical implementation challenges due to its time-growing complexity. Alternatively, addressing impulsive measurement and process noises with common estimation approaches requires heuristic schemes. Approximate methods, such as particle and Gaussian sum filters, have been suggested to tackle the estimation problem in heavy-tailed noise environments while constraining the computational load. In this paper, the performance of a particle filter and a Gaussian sum filter, designed for a linear system with specified Cauchy noise parameters, is compared numerically to a Cauchy filter-based approximation, showing the advantages of the latter.

I. Introduction

Impulsive processes appear naturally in a variety of practical problems that range from engineering and science to economics and finance. In many applications the underlying random processes or noises are better described by heavy-tailed non-Gaussian densities [1], for example by the Cauchy probability density function (pdf) [2]. Traditional filtering techniques often rely on the Gaussian assumption, mainly because modern methods and algorithms are able to handle such systems very efficiently [3], yielding, e.g., the Kalman filter^a [4]. However, in the presence of significantly non-Gaussian, heavy-tailed noises, and particularly in the presence of outliers, the performance of the Kalman filter degrades severely [5].

Impulsive measurement and process noises in stochastic state estimators have typically been handled by heuristic schemes that augment the estimation process. Recently, an analytical recursive nonlinear estimation scheme, the Idan/Speyer Cauchy Estimator (ISCE), for linear scalar systems driven by Cauchy distributed process and measurement noises has been developed using pdf [6] and characteristic function (cf) [7, 8] approaches. Specifically, the latter recursively generates, in closed form, the characteristic function of the unnormalized conditional pdf (ucpdf) of the state given the measurement history. The number of terms in the sum that expresses this cf grows with each measurement update.

Although the ISCE provides the exact minimum conditional variance estimates of the system's states given a measurement sequence, its computational complexity and memory burden become very high, requiring an approximation in its implementation. One such approximation was suggested in [6, 7] for the scalar ISCE and one in [9, 10] for the two-state ISCE. In both cases, a fixed sliding window of the most recent measurements was considered to attain a near-optimal estimate, leading to an estimator with a finite computational burden.

Alternatively, in this study we wish to evaluate other general estimation algorithms for addressing impulsive noises [11–14]. Although those approaches are suboptimal, they may offer reasonable approximations when tuned for the heavy-tailed Cauchy noise environment. Two of the most popular approximations are

*Postdoctoral Fellow, Faculty of Aerospace Engineering, robert.fonod@technion.ac.il.
†Associate Professor, Faculty of Aerospace Engineering, moshe.idan@technion.ac.il. Associate Fellow AIAA.
‡Ronald and Valerie Sugar Distinguished Professor in Engineering, Department of Mechanical and Aerospace Engineering, speyer@ucla.edu. Fellow AIAA.

^a The Kalman filter is the minimum variance filter if the additive noise is Gaussian, and the best linear minimum variance filter if the noise is non-Gaussian but the second-order statistics of the additive noise exist.


the particle filter (PF) and the Gaussian sum filter (GSF), as they were shown to converge to the correct conditional density of the state as the number of terms in their implementation tends to infinity. Therefore, for real-time applications, they are implemented with some degree of approximation, producing a tradeoff between numerical efficiency and estimation performance in constructing the conditional pdf of the state given the measurement history and the resulting conditional mean and variance.

Using statistical Monte Carlo (MC) simulations and a judiciously chosen measure to evaluate the estimation performance of a heavy-tailed process, we extend the sample run results presented in [15]. Our main objective is to compare the efficiency of the PF and GSF to that of the scalar and two-state ISCE-based approximations, hence providing guidance for a practical implementation of an estimator in a heavy-tailed noise environment.

II. Problem Formulation

Consider a single-input-single-output, multivariate, discrete-time, and time-invariant linear dynamic system described by

x_{k+1} = Φ x_k + Γ w_k,  (1)
z_k = H x_k + v_k,  (2)

with state vector x_k ∈ ℝⁿ, scalar measurement z_k ∈ ℝ, and known matrices Φ ∈ ℝⁿˣⁿ, Γ ∈ ℝⁿˣ¹, and H ∈ ℝ¹ˣⁿ. The sequence x_k, for k = 1, 2, ..., is a discrete-time Markov process. The scalar noise inputs w_k and v_k are independent Cauchy distributed random variables with zero median and scaling parameters β > 0 and γ > 0, respectively. Their pdf-s and cf-s are denoted p and φ, respectively. They are assumed to be time independent and given by

p_W(w_k) = (β/π) / (w_k² + β²)  ⇒  φ_W(ν̄) = e^{−β|ν̄|},  (3)
p_V(v_k) = (γ/π) / (v_k² + γ²)  ⇒  φ_V(ν̄) = e^{−γ|ν̄|},  (4)

where ν̄ is a scalar spectral variable.
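For simulation purposes, zero-median Cauchy samples with a given scale can be drawn by inverse-CDF sampling, since the Cauchy CDF inverts to a tangent. A minimal sketch in Python (the paper's implementation is in Matlab; the function name here is illustrative):

```python
import math
import random

def sample_cauchy(scale, n, seed=None):
    """Draw n zero-median Cauchy samples with the given scale parameter.

    Uses inverse-CDF sampling: if u ~ U(0,1), then
    scale * tan(pi * (u - 1/2)) is Cauchy with zero median and that scale.
    """
    rng = random.Random(seed)
    return [scale * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

# Example: process noise samples with beta = 0.1
w = sample_cauchy(0.1, 1000, seed=1)
```

The heavy tails show up immediately in such samples: occasional draws are orders of magnitude larger than the scale parameter.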

The initial conditions at k = 1 are also assumed to be independent Cauchy distributed random variables, i.e., each element x_{1;i} of the initial state x_1 has a Cauchy pdf with a zero median and a scaling parameter α_i > 0, i = 1, ..., n. The joint pdf of the initial conditions and its cf are given by

p_{X_1}(x_1) = ∏_{i=1}^{n} (α_i/π) / (x_{1;i}² + α_i²)  ⇒  φ_{X_1}(ν) = ∏_{i=1}^{n} e^{−α_i|ν_i|},  (5)

where ν_i is an element of the n-dimensional spectral variable ν ∈ ℝⁿ.

The measurement history used in the estimation problem formulation is defined as z_{1:k} ≜ {z_1, ..., z_k}. The objective is to examine the practically computable minimum conditional variance estimates of x_k given the measurement history z_{1:k}.

III. Bayesian Approach

From a Bayesian perspective, the filtering problem is solved by constructing the posterior density p(x_k|z_{1:k})^b of the state x_k at time k given all the available information z_{1:k}. The pdf p(x_k|z_{1:k}) contains all available statistical information, and thus is the complete solution to the estimation problem [3]. In principle, an optimal (with respect to any criterion) estimate of the state may be obtained from this pdf.

Given the posterior pdf p(x_{k−1}|z_{1:k−1}) at time k−1, the posterior pdf p(x_k|z_{1:k}) at time k can be computed using a two-stage approach of prediction and update. For square systems (i.e., rank(Γ) = n), the prediction stage can be obtained via the Chapman-Kolmogorov equation

p(x_k|z_{1:k−1}) = ∫ p(x_{k−1}|z_{1:k−1}) p(x_k|x_{k−1}) dx_{k−1},  (6)

^b For simplicity, p(x_k|z_{1:k}) denotes p_{X_k|Z_{1:k}}(x_k|z_{1:k}). This notational simplification is used hereafter, whenever the context is clear.


where p(x_k|x_{k−1}) is the state transition probability density determined by the state equation (1) and the known statistics of the process noise w_{k−1}, i.e.,

p(x_k|x_{k−1}) = p_W(Γ⁻¹(x_k − Φ x_{k−1})).  (7)

For non-square (i.e., rank(Γ) ≠ n) and non-singular systems (i.e., rank(Φ) = n), the prediction stage (6) can be computed via [8]

p(x_k|z_{1:k−1}) = |Φ⁻¹| ∫ p_{X_{k−1}|Z_{1:k−1}}(Φ⁻¹x_k − Φ⁻¹Γ w_{k−1} | z_{1:k−1}) p_W(w_{k−1}) dw_{k−1},  (8)

where |·| stands for the determinant of a matrix.

If at time k a measurement z_k becomes available, then z_k can be used to update the prior pdf p(x_k|z_{1:k−1}) via Bayes' rule^c

p(x_k|z_{1:k}) = [p(z_k|x_k) p(x_k|z_{1:k−1})] / [∫ p(z_k|x_k) p(x_k|z_{1:k−1}) dx_k],  (9)

where p(z_k|x_k) is the measurement likelihood defined by the measurement model (2) and the known statistics of the measurement noise v_k, i.e.,

p(z_k|x_k) = p_V(z_k − H x_k).  (10)

Given p(x_k|z_{1:k}), the minimum conditional variance estimate x̂_k and the second conditional moment of the state are given by

x̂_k = E[x_k|z_{1:k}] = ∫ x_k p(x_k|z_{1:k}) dx_k,  (11)
E[x_k x_kᵀ|z_{1:k}] = ∫ x_k x_kᵀ p(x_k|z_{1:k}) dx_k.  (12)

The minimal conditional variance of the estimation error, defined as x̃_k ≜ x_k − x̂_k, is then determined by

P_k = E[x̃_k x̃_kᵀ|z_{1:k}] = E[x_k x_kᵀ|z_{1:k}] − x̂_k x̂_kᵀ.  (13)

IV. Optimal Solution

Here we present a brief overview of the analytical estimation solution to the Cauchy problem, the ISCE.

IV.A. Scalar ISCE: pdf Approach

The pioneering work of Idan and Speyer [6] derived the ISCE for scalar-state systems by solving, in closed form, the integrals of the Bayesian update rule involved in constructing the posterior pdf p(x_k|z_{1:k}). It was shown that under mild conditions on the system and noise parameters (see Assumption 4.1 in [6]), p(x_k|z_{1:k}) can be expressed as

p(x_k|z_{1:k}) = Σ_{i=1}^{k+2} (a^i_{k|k} x_k + b^i_{k|k}) / [(x_k − σ^i_{k|k})² + (ω^i_{k|k})²].  (14)

Initialization and update rules for the series coefficients a^i_{k|k}, b^i_{k|k}, σ^i_{k|k}, and ω^i_{k|k} can be found in [6]. All but ω^i_{k|k} are functions of the measurements. The series coefficients of the above pdf are used to determine x̂_k and P_k as

x̂_k = π Σ_{i=1}^{k+2} ( a^i_{k|k}[(σ^i_{k|k})² − (ω^i_{k|k})²] + b^i_{k|k} σ^i_{k|k} ) / ω^i_{k|k},  (15)

P_k = π Σ_{i=1}^{k+2} ( h^i_{k|k} − 2 a^i_{k|k} σ^i_{k|k} (ω^i_{k|k})² ) / ω^i_{k|k} − x̂_k²,  (16)

where h^i_{k|k} = (a^i_{k|k} σ^i_{k|k} + b^i_{k|k}) ((σ^i_{k|k})² − (ω^i_{k|k})²).

The approach above provides a closed-form expression for p(x_k|z_{1:k}), which can be examined for its shape and additional properties, as will be carried out in this study.

^c At time k = 1, the prior pdf is defined as p(x_1|z_{1:0}) ≜ p_{X_1}(x_1).


IV.B. Multivariate ISCE: cf Approach

Since the pdf approach did not extend to multivariate systems, an alternative derivation that utilizes the characteristic function of the ucpdf was proposed. The ISCE for scalar-state systems was first re-derived using this approach in [7]. Subsequently, it was extended to multivariate systems in [8].

Here, we propagate the cf of p(x_k|z_{1:k}), given by

φ_{x_k|z_{1:k}}(ν) = ∫ p(x_k|z_{1:k}) e^{jνᵀx_k} dx_k.  (17)

Moreover, for computational simplicity, the normalization by p(z_k|z_{1:k−1}) when computing p(x_k|z_{1:k}) can be postponed, thus propagating the cf of the ucpdf

φ̄_{x_k|z_{1:k}}(ν) = ∫ p(x_k, z_{1:k}) e^{jνᵀx_k} dx_k.  (18)

From the Bayesian update rule, p(x_k, z_{1:k}) is the ucpdf of the state, while the normalization factor to obtain p(x_k|z_{1:k}) is given by φ̄_{x_k|z_{1:k}}(0).

In [8], it was shown that φ̄_{x_k|z_{1:k}}(ν) at the update time k can be expressed as a growing sum of n^{k|k}_t terms

φ̄_{x_k|z_{1:k}}(ν) = Σ_{i=1}^{n^{k|k}_t} g^{k|k}_i(y^{k|k}_{gi}(ν)) exp(y^{k|k}_{ei}(ν)),  (19)

i.e., a sum of exponential terms, each multiplied by a coefficient function g^{k|k}_i(·) that is a complex, nonlinear function of the measurements. The argument of this coefficient function, y^{k|k}_{gi}(ν), is real and is expressed as a sum of sign functions of ν with known parameters. The real part of the argument of the exponents, y^{k|k}_{ei}(ν), is the absolute value of a function of the spectral vector ν, and its imaginary part is a linear function of the measurements. The details of the various parameters and functions of the above expression can be found in [8]. Since φ̄_{x_k|z_{1:k}}(ν) is twice continuously differentiable, the minimum conditional variance estimate of the state can be obtained by [8]

x̂_k = (1 / (j p_{Z_{1:k}}(z_{1:k}))) Σ_{i=1}^{n^{k|k}_t} g^{k|k}_i(y^{k|k}_{gi}(ν̂)) ȳ^{k|k}_{ei}(ν̂),  (20)

where j is the imaginary unit, ν̂ is a fixed direction in the ν domain, ȳ^{k|k}_{ei}(ν̂) is an n-dimensional vector which relates to y^{k|k}_{ei}(ν̂) through ⟨ȳ^{k|k}_{ei}(ν̂), ν̂⟩ = y^{k|k}_{ei}(ν̂), and p_{Z_{1:k}}(z_{1:k}) is

p_{Z_{1:k}}(z_{1:k}) = φ̄_{x_k|z_{1:k}}(ν)|_{ν=0} = Σ_{i=1}^{n^{k|k}_t} g^{k|k}_i(y^{k|k}_{gi}(ν̂)).  (21)

The estimation error covariance matrix is obtained by

P_k = (1 / (j² p_{Z_{1:k}}(z_{1:k}))) Σ_{i=1}^{n^{k|k}_t} g^{k|k}_i(y^{k|k}_{gi}(ν̂)) ȳ^{k|k}_{ei}(ν̂) ȳ^{k|k}_{ei}(ν̂)ᵀ − x̂_k x̂_kᵀ.  (22)

V. Suboptimal Solutions

In this section, the PF and GSF are designed to approximate the posterior density p(x_k|z_{1:k}) for the system described by (1) and (2), in which the noise sequences w_k and v_k and the initial state x_1 are Cauchy distributed. As observed above, the exact posterior pdf (14), or its cf (19), is expressed in the optimal ISCE as a series with a growing number of terms. To avoid the associated computational burden, [6, 7, 9, 10] suggest a truncation procedure that limits the number of terms in these series to a prescribed fixed sliding window of the n_s most recent measurements. The validity of these approximations was demonstrated even when using only around twenty terms for the scalar case, and a window of eight, i.e., around 3000 terms, for the two-state ISCE. Consequently, only approximate ISCE implementations were considered in this study when comparing their performance with the proposed PF and GSF.


V.A. Particle Filter

The particle filter, also known as the sequential Monte Carlo method, is a set of algorithms implementing recursive Bayesian estimation based on point-mass representations of probability densities. For good surveys, see [16, 17].

The main idea of the PF is to represent the posterior density p(x_k|z_{1:k}) using a set of random samples with associated weights. Let {x^i_k, μ^i_k}_{i=1}^{n_p} be such a representation that characterizes the posterior p(x_k|z_{1:k}) at time k, where {x^i_k}_{i=1}^{n_p} is a set of n_p support points (particles) with associated nonnegative weights {μ^i_k}_{i=1}^{n_p}. The weights are normalized such that Σ_{i=1}^{n_p} μ^i_k = 1.

V.A.1. Sequential Importance Sampling

Most PFs are based on the algorithm known as sequential importance sampling (SIS), also known as the bootstrap filter [11], which is an MC technique for solving the Bayesian problem. Given {x^i_{k−1}, μ^i_{k−1}}_{i=1}^{n_p} and z_k, the posterior density p(x_k|z_{1:k}) at time k can be approximated using the principle of importance sampling, the Markov property, and Bayes' rule as follows [17]

p(x_k|z_{1:k}) ≈ Σ_{i=1}^{n_p} μ^i_k δ(x_k − x^i_k),  (23)

where δ(·) stands for the Dirac delta density. In (23), the ith particle x^i_k is sampled from the chosen importance density q(x_k|x^i_{k−1}, z_k), also known as the proposal density. The ith importance weight μ^i_k associated with x^i_k is updated as

μ^i_k ∝ μ^i_{k−1} [p(z_k|x^i_k) p(x^i_k|x^i_{k−1})] / q(x^i_k|x^i_{k−1}, z_k),  i = 1, 2, ..., n_p,  (24)

where p(z_k|x^i_k) is the measurement likelihood given by (10), p(x^i_k|x^i_{k−1}) is the state transition density given by (7), and the symbol ∝ signifies "proportional to".

The SIS algorithm thus consists of recursive propagation of the weights and particles as each measurement is received sequentially. Based on the strong law of large numbers, as n_p → ∞, the approximated posterior of (23) approaches the true posterior p(x_k|z_{1:k}) [17]. The numerical approximations to (11) and (13), respectively, are computed as

x̂_k ≈ Σ_{i=1}^{n_p} μ^i_k x^i_k,  (25)
P_k ≈ Σ_{i=1}^{n_p} μ^i_k (x^i_k − x̂_k)(x^i_k − x̂_k)ᵀ.  (26)
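For a scalar state, the weighted-sample approximations (25) and (26) amount to a weighted mean and weighted variance of the particle cloud. A minimal Python sketch (the function name is illustrative, not from the paper):

```python
def pf_estimate(particles, weights):
    """Approximate the conditional mean (25) and variance (26)
    from a weighted set of scalar particles; weights must sum to 1."""
    xhat = sum(w * x for x, w in zip(particles, weights))
    var = sum(w * (x - xhat) ** 2 for x, w in zip(particles, weights))
    return xhat, var
```

For a vector state, the squared difference is replaced by the outer product, as in (26).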

V.A.2. Choice of the Importance Density

It has been shown in [12] that the optimal choice for the importance density, i.e., the one that minimizes the variance of the importance weight μ^i_k, is

q(x_k|x^i_{k−1}, z_k) = p(x_k|x^i_{k−1}, z_k) = [p(z_k|x_k, x^i_{k−1}) p(x_k|x^i_{k−1})] / p(z_k|x^i_{k−1}).  (27)

However, this importance density is not always available and can be used only in special cases, e.g., for a class of models for which p(x_k|x^i_{k−1}, z_k) is Gaussian [12]. Hence, the most widely used importance density is the prior pdf [17], i.e.,

q(x_k|x^i_{k−1}, z_k) = p(x_k|x^i_{k−1}).  (28)

This choice of importance density means that for k ≥ 2, we need to sample particles from p(x_k|x^i_{k−1}). A sample x^i_k ∼ p(x_k|x^i_{k−1}) can be obtained by first generating a process noise sample w^i_{k−1} ∼ p_W(w_{k−1}) and setting x^i_k = Φ x^i_{k−1} + Γ w^i_{k−1}, where p_W(w_{k−1}) is the Cauchy process noise pdf defined in (3). For k = 1, the particles are generated from the initial density, i.e., x^i_1 ∼ p_{X_1}(x_1), where p_{X_1}(x_1) is the initial state pdf defined in (5). Additionally, the weight update formula (24) reduces to

μ^i_k ∝ μ^i_{k−1} p(z_k|x^i_k) = μ^i_{k−1} p_V(z_k − H x^i_k),  i = 1, 2, ..., n_p,  (29)

where p_V(·) is the Cauchy measurement noise pdf defined in (4).

Notice that the importance density of (28) is independent of the measurement z_k, and hence the state space is explored without knowledge of the actual observation. This choice of q(x_k|x^i_{k−1}, z_k) may fail if new measurements appear in the tail of the prior, or if the measurement likelihood is too peaked in comparison to the prior. This strategy aggravates a well-known problem of the SIS algorithm, known as the degeneracy (or sample impoverishment) problem [17]. On the other hand, the advantage of the SIS algorithm is that its computational burden is constant at each time step.

V.A.3. Degeneracy Problem

The degeneracy problem arises when, after a few iterations of the SIS algorithm, only a few of the particles have significant weights while the other particles have negligible weights. This yields a very poor approximation of p(x_k|z_{1:k}) and may lead to a breakdown of the algorithm. Note that this phenomenon occurs even if the optimal importance density (27) is used, but it is more severe when using p(x_k|x^i_{k−1}).

A suitable measure to assess the degeneracy of the SIS algorithm is the effective sample size estimate [17]

n̂^{eff}_p = 1 / Σ_{i=1}^{n_p} (μ^i_k)².  (30)

Here, 1 ≤ n̂^{eff}_p ≤ n_p, where the upper bound is attained when all particles have the same weight, and the lower bound when the entire probability mass is concentrated at one particle. A small n̂^{eff}_p indicates severe degeneracy.
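The effective sample size (30) is straightforward to compute from the normalized weights; a one-line Python sketch:

```python
def effective_sample_size(weights):
    """Estimate n_eff = 1 / sum(w_i^2) from normalized weights, as in (30)."""
    return 1.0 / sum(w * w for w in weights)
```

With uniform weights this returns n_p; with all mass on one particle it returns 1, matching the bounds stated above.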

V.A.4. Resampling

The most common solution to the degeneracy problem is resampling. It discards particles that have low importance weights, as they do not contribute to the approximation, and replaces them with particles in the vicinity of those with high importance weights [11]. To prevent degeneracy, an appropriate resampling procedure is utilized whenever n̂^{eff}_p falls below a fixed threshold n^t_p.

Several resampling schemes exist. The choice of the particular resampling scheme affects the computational load as well as the approximation error; see the discussion and classification in [18]. In our study, only the systematic resampling strategy [19] was used, as it was shown empirically to outperform other methods for the Cauchy case.
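Systematic resampling draws a single uniform offset and then steps through the cumulative weights with a fixed stride of 1/n_p, which makes it cheap and low-variance. A Python sketch of the scheme (illustrative implementation, not the paper's Matlab code):

```python
import random

def systematic_resample(particles, weights, rng=None):
    """Return n_p particles resampled in proportion to their weights
    using the systematic scheme; weights must be normalized."""
    rng = rng or random.Random()
    n = len(particles)
    u0 = rng.random() / n               # single offset in [0, 1/n)
    # cumulative sum of the weights
    cdf, c = [], 0.0
    for w in weights:
        c += w
        cdf.append(c)
    cdf[-1] = 1.0                       # guard against round-off
    out, j = [], 0
    for i in range(n):
        u = u0 + i / n                  # evenly spaced probe points
        while cdf[j] < u:
            j += 1
        out.append(particles[j])
    return out
```

After resampling, all weights are typically reset to 1/n_p.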

V.B. Gaussian Sum Filter

In this subsection, the problem of determining the posterior density p(x_k|z_{1:k}) is treated using the Gaussian sum approximation. The presented filtering scheme is an adaptation of the well-known GSF algorithm of Sorenson and Alspach [13] to the system described by (1) and (2), in which the noise sequences w_k and v_k and the initial state x_1 are Cauchy distributed.

V.B.1. Gaussian Mixture Model

Based on the Wiener approximation theory [20], any pdf can be expressed, or approximated to a given level of accuracy, using a weighted sum of Gaussian densities, also known as a Gaussian mixture model (GMM), given by

p^G_X(x) = Σ_{i=1}^{n_x} μ^i_x N(x; x̄^i_x, P^i_x).  (31)

Here n_x is a positive integer indicating the number of Gaussian components (terms) in the GMM, μ^i_x ≥ 0, i = 1, 2, ..., n_x, are scalar weighting factors satisfying Σ_{i=1}^{n_x} μ^i_x = 1, and N(x; x̄^i_x, P^i_x) denotes a multivariate Gaussian density function with argument x ∈ ℝᵐ, mean x̄^i_x ∈ ℝᵐ, and covariance matrix P^i_x ∈ ℝᵐˣᵐ. It can be shown that p^G_X(x) is a valid density function and converges uniformly to any density of practical concern as n_x increases and each elemental covariance approaches the zero matrix.


V.B.2. Fitting a GMM to a Cauchy Density

To obtain the recursive Bayesian filter in the GMM framework, the stationary Cauchy densities given in (3), (4), and (5) first need to be fitted (approximated) by a GMM. The fitting can be done in various ways. In this paper, it is formulated as the following constrained optimization problem: given the desired number of Gaussian components (n_x ≥ 1), find {μ^i_x, x̄^i_x, P^i_x}_{i=1}^{n_x} which minimizes the integral square difference (ISD) between a particular Cauchy density of interest and a GMM, i.e.,

argmin_{ {μ^i_x, x̄^i_x, P^i_x}_{i=1}^{n_x} }  J_f = ∫ [ p^C_X(x) − p^G_X(x) ]² dx,  (32a)

such that  Σ_{i=1}^{n_x} μ^i_x = 1,  μ^i_x ≥ 0,  P^i_x = (P^i_x)ᵀ > 0,  i = 1, 2, ..., n_x.  (32b)

Here p^G_X(x) is the GMM defined in (31) and p^C_X(x) = ∏_{i=1}^{m} (δ_i/π)/(x_i² + δ_i²) is a zero-median multivariate Cauchy density function with argument x = [x_1, x_2, ..., x_m]ᵀ and scaling parameters δ_i > 0, i = 1, 2, ..., m. The complex minimization problem (32) can be solved, e.g., numerically by standard constrained optimization tools.

Figure 1 illustrates an actual fitting of a standard scalar Cauchy pdf (m = 1 and δ_1 = 1) with a GMM having different numbers of Gaussian components n_x. For n_x = 3, n_x = 5, and n_x = 7, the resulting ISD, computed numerically, is approximately J_f = 2.4×10⁻³, J_f = 9.79×10⁻⁵, and J_f = 6.01×10⁻⁵, respectively.

Figure 1. Fitting a Cauchy pdf with a GMM having different numbers of terms (standard Cauchy pdf vs. GMMs with 3, 5, and 7 terms; the zoomed panel shows the tails).

To proceed with the GSF algorithm, assume that the Cauchy densities given in (3), (4), and (5) are all fitted by a GMM in the same way as p^G_X was fitted to p^C_X in (32), i.e.,

p_W(w_k) ≈ Σ_{i=1}^{n_w} μ^i_w N(w_k; x̄^i_w, P^i_w),  (33a)
p_V(v_k) ≈ Σ_{i=1}^{n_v} μ^i_v N(v_k; x̄^i_v, P^i_v),  (33b)
p_{X_1}(x_1) ≈ Σ_{i=1}^{n_{1|0}} μ^i_{1|0} N(x_1; x̄^i_{1|0}, P^i_{1|0}).  (33c)


V.B.3. Time Propagation

Suppose that at time k−1 the posterior density p(x_{k−1}|z_{1:k−1}) is approximated by a weighted sum of n_{k−1|k−1} Gaussian densities

p(x_{k−1}|z_{1:k−1}) ≈ Σ_{i=1}^{n_{k−1|k−1}} μ^i_{k−1|k−1} N(x_{k−1}; x̄^i_{k−1|k−1}, P^i_{k−1|k−1}).  (34)

Then, the approximation of the a priori density p(x_k|z_{1:k−1}) at time k is obtained in the GSF sense as

p(x_k|z_{1:k−1}) ≈ Σ_{i=1}^{n_{k−1|k−1}} Σ_{j=1}^{n_w} μ̃^{ij}_{k|k−1} N(x_k; m̄^{ij}_{k|k−1}, M^{ij}_{k|k−1}),  (35)

where m̄^{ij}_{k|k−1} and M^{ij}_{k|k−1} are computed using Kalman-like equations, i.e., for all i = 1, ..., n_{k−1|k−1} and j = 1, ..., n_w we have

m̄^{ij}_{k|k−1} = Φ x̄^i_{k−1|k−1} + Γ x̄^j_w,  (36a)
M^{ij}_{k|k−1} = Φ P^i_{k−1|k−1} Φᵀ + Γ P^j_w Γᵀ.  (36b)

The weighting factors μ̃^{ij}_{k|k−1} are updated for all i = 1, ..., n_{k−1|k−1} and j = 1, ..., n_w as

μ̃^{ij}_{k|k−1} = μ^i_{k−1|k−1} μ^j_w.  (37)

For notational convenience, the double summation in (35) can be restated as

p(x_k|z_{1:k−1}) ≈ Σ_{i=1}^{n_{k|k−1}} μ^i_{k|k−1} N(x_k; x̄^i_{k|k−1}, P^i_{k|k−1}),  (38)

where n_{k|k−1} = (n_{k−1|k−1})(n_w), and μ^i_{k|k−1}, x̄^i_{k|k−1}, and P^i_{k|k−1} are formed in an obvious fashion from μ̃^{ij}_{k|k−1}, m̄^{ij}_{k|k−1}, and M^{ij}_{k|k−1}, respectively.
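For a scalar state (n = 1), the time propagation (35)–(37) pairs each posterior component with each process-noise component through the prediction equations (36). A Python sketch, with mixtures stored as lists of (weight, mean, variance) triples (this representation and the function name are illustrative, not the paper's Matlab implementation):

```python
def gsf_propagate(posterior, noise_gmm, Phi, Gam):
    """Scalar GSF time propagation (35)-(37): pair each posterior
    component with each process-noise component."""
    prior = []
    for mu_i, m_i, P_i in posterior:
        for mu_j, mw_j, Pw_j in noise_gmm:
            mean = Phi * m_i + Gam * mw_j              # (36a)
            var = Phi * P_i * Phi + Gam * Pw_j * Gam   # (36b)
            prior.append((mu_i * mu_j, mean, var))     # weights per (37)
    return prior
```

The returned mixture has n_{k−1|k−1} × n_w components, illustrating the multiplicative growth discussed below equation (44).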

V.B.4. Measurement Update

Suppose that at time k the a priori density p(x_k|z_{1:k−1}) is expressed as in (38)^d. Then, using the measurement z_k, the a posteriori density p(x_k|z_{1:k}) at time k is approximated in the GSF sense as

p(x_k|z_{1:k}) ≈ Σ_{i=1}^{n_{k|k−1}} Σ_{j=1}^{n_v} μ̃^{ij}_{k|k} N(x_k; m̄^{ij}_{k|k}, M^{ij}_{k|k}),  (39)

where m̄^{ij}_{k|k} and M^{ij}_{k|k} are computed for all i = 1, ..., n_{k|k−1} and j = 1, ..., n_v as

ẑ^{ij}_k = H x̄^i_{k|k−1} + x̄^j_v,  (40a)
S^{ij}_k = H P^i_{k|k−1} Hᵀ + P^j_v,  (40b)
K^{ij}_k = P^i_{k|k−1} Hᵀ (S^{ij}_k)⁻¹,  (40c)
m̄^{ij}_{k|k} = x̄^i_{k|k−1} + K^{ij}_k (z_k − ẑ^{ij}_k),  (40d)
M^{ij}_{k|k} = P^i_{k|k−1} − K^{ij}_k S^{ij}_k (K^{ij}_k)ᵀ.  (40e)

The weighting factors μ̃^{ij}_{k|k} are updated for all i = 1, ..., n_{k|k−1} and j = 1, ..., n_v using the following rule

μ̃^{ij}_{k|k} = [μ^i_{k|k−1} μ^j_v N(z_k; ẑ^{ij}_k, S^{ij}_k)] / [Σ_{l=1}^{n_{k|k−1}} Σ_{m=1}^{n_v} μ^l_{k|k−1} μ^m_v N(z_k; ẑ^{lm}_k, S^{lm}_k)].  (41)

^d Note that at time k = 1, the a priori density corresponds to the GMM representation of the initial state density given in (33c), which has the same form as (38).


For convenience, one can rewrite (39) as

p(x_k|z_{1:k}) ≈ Σ_{i=1}^{n_{k|k}} μ^i_{k|k} N(x_k; x̄^i_{k|k}, P^i_{k|k}),  (42)

where n_{k|k} = (n_{k|k−1})(n_v), and μ^i_{k|k}, x̄^i_{k|k}, and P^i_{k|k} are again formed from μ̃^{ij}_{k|k}, m̄^{ij}_{k|k}, and M^{ij}_{k|k}, respectively. The weighting factors μ^i_{k|k} satisfy μ^i_{k|k} ≥ 0 and Σ_{i=1}^{n_{k|k}} μ^i_{k|k} = 1, thus making (42) a proper pdf. Note that for n_w = n_v = n_{1|0} = 1, the above GSF equations reduce to the standard Kalman filter equations.
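For a scalar state and scalar measurement, the update equations (40)–(41) reduce to one scalar Kalman update per (prior component, noise component) pair, with the weights rescaled by the Gaussian measurement likelihoods and renormalized. A Python sketch, with mixtures stored as (weight, mean, variance) triples (illustrative representation, not the paper's Matlab code):

```python
import math

def gauss_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gsf_update(prior, noise_gmm, H, z):
    """Scalar GSF measurement update (40)-(41): Kalman-update each
    (prior, measurement-noise) component pair, then renormalize weights."""
    post = []
    for mu_i, m_i, P_i in prior:
        for mu_j, mv_j, Pv_j in noise_gmm:
            zhat = H * m_i + mv_j                  # (40a)
            S = H * P_i * H + Pv_j                 # (40b)
            K = P_i * H / S                        # (40c)
            mean = m_i + K * (z - zhat)            # (40d)
            var = P_i - K * S * K                  # (40e)
            post.append((mu_i * mu_j * gauss_pdf(z, zhat, S), mean, var))
    total = sum(w for w, _, _ in post)             # denominator of (41)
    return [(w / total, m, P) for w, m, P in post]
```

With single-component mixtures this reproduces the standard scalar Kalman update, consistent with the reduction noted above.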

V.B.5. Conditional Mean and Estimation Error Variance

Using the posterior density p(x_k|z_{1:k}) at time k, as given in (42), the conditional mean (11) and the estimation error covariance (13) can be approximated in the GSF sense as

x̂_k ≈ Σ_{i=1}^{n_{k|k}} μ^i_{k|k} x̄^i_{k|k},  (43)
P_k ≈ Σ_{i=1}^{n_{k|k}} μ^i_{k|k} [ P^i_{k|k} + (x̄^i_{k|k} − x̂_k)(x̄^i_{k|k} − x̂_k)ᵀ ].  (44)
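The mixture moments (43)–(44) follow directly from the component parameters; a scalar Python sketch (illustrative function name):

```python
def gsf_moments(mixture):
    """Conditional mean (43) and error variance (44) of a scalar Gaussian
    mixture given as (weight, mean, variance) triples."""
    xhat = sum(w * m for w, m, _ in mixture)
    P = sum(w * (v + (m - xhat) ** 2) for w, m, v in mixture)
    return xhat, P
```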

The major disadvantage of the GSF algorithm is that the number of terms in the Gaussian sum increases exponentially in time. The number of terms in p(x_k|z_{1:k}) at step k can be expressed explicitly as n_{k|k} = n_{1|0} × (n_w × n_v)^k / n_w. This is obviously a heavy computational burden for real-time implementation. It is normally addressed by the Gaussian sum re-approximation discussed next.

V.B.6. Gaussian Sum Re-approximation

Several heuristic approaches have been proposed in the literature to avoid the exponential growth of the number of terms n_{k|k} in the Gaussian sum (42); see for instance [13, 21] and references therein. A seemingly tempting method of retaining only the Gaussian components with the largest weights was found to be inefficient [22]. This is mainly because even if the weight of a Gaussian term is very small at a certain point, it might become large at the next time step. This is often the case in the Cauchy distributed environment, where an outlier is very likely to occur. Ignoring such a component might have severe consequences.

In this paper, we suggest re-approximating the posterior density estimate (42) of the GSF by a new reduced-order GMM with a pre-fixed number of Gaussian components. This reduction in terms is motivated by the observation that a relatively small number of weighted Gaussian components can approximate a large class of densities [23], as well as by our aim to confine the computational time of the GSF in order to compare its performance to the approximate ISCE and PF, both having a bounded computational burden.

For ease of notation, assume that at a given time step the measurement-updated Gaussian sum density (42) is denoted by p_a(x) and has n_a terms, i.e.,

p_a(x) = Σ_{i=1}^{n_a} μ^i_a N(x; x̄^i_a, P^i_a).  (45)

After evaluating (43) and (44) using p_a(x), the objective is to approximate p_a(x) by another Gaussian sum density,

p_b(x) = Σ_{i=1}^{n_b} μ^i_b N(x; x̄^i_b, P^i_b),  (46)

which has a constant pre-fixed number of terms n_b. Obviously, if n_a ≤ n_b, then there is no need for re-approximation and p_b(x) = p_a(x) is considered. If n_a > n_b, then the task of the suggested re-approximation scheme is to determine {μ^i_b, x̄^i_b, P^i_b}_{i=1}^{n_b} such that the mean and covariance of the new Gaussian mixture p_b(x) match exactly those of p_a(x), while also minimizing the ISD between p_a(x) and p_b(x). Given n_b ≥ 1, this task can be formulated as a constrained optimization problem

argmin_{ {μ^i_b, x̄^i_b, P^i_b}_{i=1}^{n_b} }  J = ∫ [ p_a(x) − p_b(x) ]² dx,  (47a)


such that  Σ_{i=1}^{n_b} μ^i_b = 1,  μ^i_b ≥ 0,  P^i_b = (P^i_b)ᵀ > 0,  i = 1, 2, ..., n_b,  (47b)

Σ_{i=1}^{n_a} μ^i_a x̄^i_a = Σ_{i=1}^{n_b} μ^i_b x̄^i_b,   Σ_{i=1}^{n_a} μ^i_a [P^i_a + x̄^i_a (x̄^i_a)ᵀ] = Σ_{i=1}^{n_b} μ^i_b [P^i_b + x̄^i_b (x̄^i_b)ᵀ].  (47c)

The cost function (47a) can be expanded and rewritten as

J = ∫ p_a²(x) dx − 2 ∫ p_a(x) p_b(x) dx + ∫ p_b²(x) dx ≜ J_aa − 2 J_ab + J_bb,  (48a)

where the particular integrals J_aa, J_ab, and J_bb were solved, in closed form, by Williams and Maybeck [24], to yield

J_aa = Σ_{i=1}^{n_a} Σ_{j=1}^{n_a} μ^i_a μ^j_a N(x̄^i_a; x̄^j_a, P^i_a + P^j_a),  (48b)
J_ab = Σ_{i=1}^{n_a} Σ_{j=1}^{n_b} μ^i_a μ^j_b N(x̄^i_a; x̄^j_b, P^i_a + P^j_b),  (48c)
J_bb = Σ_{i=1}^{n_b} Σ_{j=1}^{n_b} μ^i_b μ^j_b N(x̄^i_b; x̄^j_b, P^i_b + P^j_b).  (48d)
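Using the closed forms (48b)–(48d), the ISD cost J can be evaluated without numerical integration. A scalar Python sketch (illustrative function names; mixtures as (weight, mean, variance) triples):

```python
import math

def gauss_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gmm_isd(ga, gb):
    """Integral square difference (48a) between two scalar Gaussian mixtures,
    evaluated via the closed-form cross terms (48b)-(48d)."""
    def cross(g1, g2):
        return sum(w1 * w2 * gauss_pdf(m1, m2, v1 + v2)
                   for w1, m1, v1 in g1 for w2, m2, v2 in g2)
    return cross(ga, ga) - 2.0 * cross(ga, gb) + cross(gb, gb)
```

Identical mixtures give J = 0, and J grows as the two mixtures diverge, which is what the re-approximation optimizer exploits.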

A small value of J indicates, in the ISD sense, that p_b(x) is a good approximation of p_a(x). However, there is no guarantee that the re-approximated density also preserves the higher-order moments of the original one.

Note that solving the above constrained minimization problem generally involves computationally costly nonlinear optimization with respect to n_b × (n²/2 + 3n/2 + 1) independent variables, where n is the dimension of the system state vector x. Obviously, the computational burden of the GSF will also be affected by the numerical procedure used to solve the above constrained optimization problem. In this paper, we assume that the computational time of the GSF algorithm is dictated solely by n and n_b. This can be achieved by using numerical solvers with a fixed number of iterations.

VI. Numerical Study

In this section, the performance of the PF and GSF is analyzed and numerically compared to that of the scalar and two-state ISCE. All simulations were performed in Matlab (R2016a) on a desktop computer with an 8-core Intel Xeon processor at 2.90 GHz and 128 GB of RAM.

The PF was implemented using the systematic resampling technique [19], with a threshold parameter n^t_p = (2/3) n_p. The GSF-related constrained optimization problems, defined in (32) and (47), were solved with the interior-point algorithm using the fmincon function from the Optimization Toolbox [25] of Matlab. Note that the fitting problem defined in (32) is performed off-line (filter design phase), and thus it does not affect the on-line computational load (testing phase) of the GSF. On the other hand, the GMM re-approximation procedure defined in (47) has to be solved after almost every measurement update. For this task, the fmincon function is constrained to a maximum of 100 iterations. This leads to a GSF implementation with a limited computational burden.

VI.A. Scalar Case Example

For the scalar case example, the following system parameters were considered: Φ = 0.75, Γ = 1, H = 2, β = 0.1, γ = 0.2, and α = 0.5.

VI.A.1. Sample Run

Before turning to a statistical MC evaluation, we compare the accuracy of the PF and GSF on a sample run scenario driven by the noise sequences depicted in Fig. 2. For clarity of the presentation, we omit the approximate implementation of the scalar ISCE from the sample run comparison, as the results for a sliding window of n_s ≥ 10 are indistinguishable from those of the optimal ISCE.

Figure 2. Cauchy distributed process and measurement noise sample sequences (w_k and v_k over 100 steps).

Based on the discussion presented in Section V.B.2, the process noise, measurement noise, and initial state pdfs of the GSF were fitted to the corresponding Cauchy pdfs with weighted Gaussian sums of n_w = 7, n_v = 7, and n_{1|0} = 9 components, respectively. Two different numbers of particles (n_p) for the PF and of Gaussian components kept at each step (n_b) for the GSF are considered, to demonstrate their effect on the accuracy of those approximations.

We first assess the performance of the PF and GSF through their approximation of the true conditional pdf p(x_k|z_{1:k}) obtained from the optimal ISCE, as shown in Fig. 3. The conditional pdf at time step 8 is considered, as it exhibits a refined bimodal shape. Fig. 3 clearly demonstrates that only the PF with 100,000 particles approximates the true conditional pdf reasonably well. However, its computational burden compared to the ISCE is quite high. While processing 100 data steps on the same computer, the average computation time of the PF with 50 particles is 4.5 times, and that of the PF with 100,000 particles is 7,000 times, higher than the average computation time of the optimal ISCE.

Figure 3. Comparison of PF and GSF approximations of the true density p(x_8|z_{1:8}) at k = 8 (PF with n_p = 50 and n_p = 100,000; GSF with n_b = 10 and n_b = 200; optimal ISCE).

On the other hand, the GSF-based approximation of the conditional pdf is very poor even when 200 Gaussian components^e are kept at each step, as shown in Fig. 3. The bimodal shape of the true conditional pdf is barely preserved. The computational burden of the GSF with 10 components is approximately 1,000 times, and with 200 components approximately 25,000 times, higher than that of the ISCE.

^e Note that without engaging the re-approximation procedure of (47), the number of Gaussian components in (42) at time k = 8 would exceed 4×10¹³.

Next, estimation results for a 100 step sample run are presented. The system is again driven by the noise sequences depicted in Fig. 2. The upper subplot in Fig. 4 shows the difference between the exact minimum variance state estimate ($\hat{x}_k^*$) and its approximation ($\hat{x}_k$) computed by the PF or the GSF. The bottom subplot presents the difference between the exact standard deviation of the estimation error ($\sigma_k^*$) and its approximation ($\sigma_k$) computed by the PF or GSF. Note that the exact values of $\hat{x}_k^*$ and $\sigma_k^*$ are computed by the optimal ISCE, see (15) and (16), respectively. The approximate implementation of the scalar ISCE is omitted here, as the results are indistinguishable from those of the optimal ISCE.

Figure 4. PF and GSF approximation error statistics compared to the optimal ISCE values.

Fig. 4 shows that the PF with both 50 and 100,000 particles disregards the measurement outliers at steps 5 and 62. On the other hand, the process noise outliers at steps 26 and 31 cause a slight divergence of the PF with 50 particles, which vanishes after a few steps. Such behavior is observed when the number of particles is insufficient to properly capture the heavy-tailed characteristics of the Cauchy noise environment. At the cost of an increased computational burden, and except for the two measurement outliers discussed earlier, the performance of the PF with 100,000 particles is comparable to that of the ISCE.
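The qualitative PF behavior described above can be reproduced with a minimal bootstrap (sampling-importance-resampling) filter in the spirit of [11]. This is a hedged sketch, not the tuned implementation evaluated here: the scalar dynamics coefficient phi, the measurement coefficient h, the initial particle draw, and the particle count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal bootstrap PF sketch for a scalar system x_{k+1} = phi*x_k + w_k,
# z_k = h*x_k + v_k with additive Cauchy noises. phi, h, and the initial
# particle draw are assumptions for this illustration.
phi, h = 0.9, 1.0
beta, gamma = 0.1, 0.2      # Cauchy process/measurement noise scales
n_p, n_steps = 1000, 100

def cauchy_pdf(e, scale):
    return scale / (np.pi * (e**2 + scale**2))

# Simulate one truth run driven by Cauchy noises.
x, zs = 0.0, []
for _ in range(n_steps):
    x = phi * x + beta * rng.standard_cauchy()
    zs.append(h * x + gamma * rng.standard_cauchy())

# Bootstrap PF: propagate through the Cauchy process-noise model, weight by
# the Cauchy measurement likelihood, then resample (multinomial).
particles = beta * rng.standard_cauchy(n_p)
estimates = []
for z in zs:
    particles = phi * particles + beta * rng.standard_cauchy(n_p)
    w = cauchy_pdf(z - h * particles, gamma)
    w /= w.sum()
    estimates.append(float(np.dot(w, particles)))
    particles = rng.choice(particles, size=n_p, p=w)
```

Because the likelihood weights are strictly positive under the Cauchy pdf, the normalization never degenerates to division by zero, but a heavy-tailed process noise impulse can still leave most particles with negligible weight, which is the divergence mechanism observed for small n_p.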

Similar conclusions can be drawn when examining the performance of the GSF. In this case, the measurement noise outliers at steps 5 and 62 lead to a slight divergence, especially at step 62.

VI.A.2. Monte Carlo Analysis

The presented sample run results suggest that, at the cost of a significantly higher computational burden, both the PF and the GSF perform comparably to the ISCE when considering the scalar-state problem. Now we consider a Monte Carlo simulation-based evaluation of the PF and GSF performance. Additionally, to allow a fair comparison of the PF and GSF with the ISCE, the number of particles np of the PF and the number of components (nb, nw, nv, n1|0) of the GSF are selected such that the average computational burden of the PF and GSF is similar to that of the approximate ISCE implementation with a window of 20 steps, i.e., ns = 20. Consequently, np was set to 12, while the parameters of the GSF were set to be as minimal as possible, i.e., nb = 3, nw = 3, nv = 3, and n1|0 = 3. Even so, the computational burden of the GSF is approximately 30 times larger than that of the ISCE and PF.

A conventional evaluation of an MC-based ensemble mean and variance of the estimation error cannot be performed for the studied problem, as both the system state and the measurements are Cauchy distributed. As a consequence, the estimation errors (no matter whether computed by the ISCE, PF, or GSF) are also heavy-tailed, leading to infinite variance when computed via the conventional MC averaging method. Therefore, in this paper, we suggest evaluating the estimation performance using the log of the geometric mean square, i.e.,

$$\tilde{\sigma}_k^2 \triangleq \frac{1}{n_{mc}} \sum_{i=1}^{n_{mc}} \log\left(x_k^{(i)} - \hat{x}_k^{(i)}\right)^2, \qquad (49)$$

where $i$ indicates the $i$th MC realization and $n_{mc}$ is the total number of MC runs. Since the log is monotonic, but also suppresses the large deviations caused by the Cauchy impulsive uncertainty, $\tilde{\sigma}_k^2$ appears to provide an ensemble measure of the heavy-tailed mean-square estimation error deviations.
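Computed over an ensemble of per-run error sequences, the measure in (49) amounts to averaging log(error²) across realizations at each step. The sketch below uses synthetic Cauchy-distributed errors (the scale 0.1 is an assumption) purely to illustrate the computation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative evaluation of (49): err[i, k] = x_k^(i) - xhat_k^(i).
# Synthetic Cauchy errors (assumed scale 0.1) stand in for real MC errors.
n_mc, n_steps = 10_000, 100
err = 0.1 * rng.standard_cauchy((n_mc, n_steps))

# Log of the geometric mean square: average log(err^2) over the MC ensemble.
# The log tames the Cauchy outliers that would make a sample variance diverge.
sigma2_tilde = np.mean(np.log(err**2), axis=0)
```

Unlike a direct sample variance of `err`, which grows without bound as n_mc increases for Cauchy-distributed errors, this ensemble average converges, which is what makes (49) usable as an MC performance measure.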

Figure 5 presents the results for the suggested measure $\tilde{\sigma}_k^2$ based on a set of $n_{mc}$ = 10,000 MC runs. For consistency, we also depict the results obtained for the optimal ISCE. As the PF implementation is nondeterministic, for each MC realization a set of 100 inner MC runs was considered to obtain an averaged state estimate $\hat{x}_k^{(i)}$ for the PF.

Figure 5. Log of the geometric mean square of the scalar estimation error.

It can be observed from Fig. 5 that the performance of the PF degrades severely when the computational load is constrained to be comparable to that of the ISCE with ns = 20. Despite its much higher computational burden, the performance of the GSF is likewise inferior to that of the ISCE.

VI.B. Two-state Case Example - Monte Carlo Analysis

For the two-state case example, the following system parameters were chosen: 

$$H = \begin{bmatrix} 1 & 2 \end{bmatrix}, \quad \Phi = \begin{bmatrix} 0.9 & 0.1 \\ -0.2 & 1.0 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} 1.0 \\ 0.3 \end{bmatrix}, \quad \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.3 \end{bmatrix}.$$

The process noise, β = 0.1, and measurement noise, γ = 0.2, parameters are the same as in the scalar case example. Sample run results comparing the performance of the PF and GSF with the two-state ISCE, implemented with both a six-step and an eight-step sliding window approximation, are presented in [15]. In the current study, the performance of the six-step window implementation, more suitable for real-time applications, will be shown to be statistically comparable to the eight-step approximation. Hence, the six-step window ISCE serves as the baseline for selecting the number of particles np of the PF and the number of components (nb, nw, nv, n1|0) of the GSF. Consequently, np was set to 4,500, while nb = 3, nw = 3, nv = 3, and n1|0 = 9.
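The two-state setup can be simulated directly, assuming the standard linear form x_{k+1} = Φx_k + Γw_k, z_k = Hx_k + v_k with Cauchy-distributed w_k and v_k, and an initial state drawn from independent Cauchy pdfs with scales α_1, α_2 (the seed and horizon below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-state example parameters from the text; the dynamics/measurement form
# x_{k+1} = Phi x_k + Gamma w_k, z_k = H x_k + v_k is assumed.
Phi = np.array([[0.9, 0.1], [-0.2, 1.0]])
Gam = np.array([1.0, 0.3])
H = np.array([1.0, 2.0])
alpha = np.array([0.5, 0.3])     # initial-state Cauchy scale parameters
beta, gamma = 0.1, 0.2           # process/measurement noise scales

x = alpha * rng.standard_cauchy(2)           # initial state draw
states, measurements = [], []
for _ in range(100):
    x = Phi @ x + Gam * (beta * rng.standard_cauchy())
    states.append(x.copy())
    measurements.append(float(H @ x + gamma * rng.standard_cauchy()))
```

Note that the scalar process noise enters both states through Γ, so a single Cauchy impulse perturbs the full state vector, which is what makes the two-state estimation problem sensitive to the heavy tails.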

Figure 6 shows the results obtained from a set of 1,000 MC simulations. Again, 100 inner MC runs were considered for the PF. In addition to the six-step ISCE approximation, we also depict the eight-step case to demonstrate their consistency. Figure 6 clearly demonstrates that 4,500 particles, which result in a computing time comparable to that of the six-step ISCE approximation, are not enough to properly estimate the system states. A similar conclusion can be drawn when examining the performance of the GSF depicted in Fig. 6. In this case, to achieve a comparable computing time, the GSF is constrained to keep only 3 Gaussian components. The figure clearly shows that the GSF performs very poorly, worse than the respective PF approximation. It demonstrates that the heavy-tailed characteristics of the Cauchy noise environment cannot be captured well enough by a limited number of Gaussian pdfs.


Figure 6. Log of the geometric mean square of the two-state estimation error.

VII. Conclusion

The estimation performance of two popular approximate filtering algorithms has been numerically compared with the approximate scalar and two-state ISCE for a linear discrete-time dynamic system with additive Cauchy measurement and process noises. Despite the fact that both the PF and GSF were designed using the same a priori information as the ISCE, sample run results show that, in the scalar case, the PF tends to converge to the optimal solution only when using a very large number of particles, while the GSF demonstrated modest convergence and a large discrepancy where the pdf approximation is concerned. Monte Carlo simulation results for the scalar and two-state cases revealed that both the PF and GSF perform poorly, and even diverge, for a computation time consistent with that of the approximate ISCE. Hence, the PF and GSF do not provide a practical alternative to the approximation that is based on the optimal solution. Consequently, for real-time implementation of filtering problems in the impulsive noise environment, represented here by heavy-tailed Cauchy noises, the approximate scalar and two-state ISCE with a bounded computational burden is clearly the superior solution.

Acknowledgments

This work was supported by the National Science Foundation (NSF) under Grant No. 1607502, the United States–Israel Binational Science Foundation (BSF) under Grant No. 2012122, and the joint NSF-BSF ECCS program under Grant No. 2015702.

References

[1] Taleb, N. N., The Black Swan: The Impact of the Highly Improbable, Random House, New York, 2007.

[2] Carpenter, J. R. and Mashiku, A. K., “Cauchy Drag Estimation For Low Earth Orbiters,” in

“AAS/AIAA Space Flight Mechanics Meeting,” Williamsburg, VA; United States, 2015, pp. 2731–2746.

[3] Speyer, J. L. and Chung, W. H., Stochastic Processes, Estimation, and Control, SIAM, Philadelphia, 2008.

[4] Kalman, R. E., “A New Approach to Linear Filtering and Prediction Problems,” Journal of Basic Engineering, Vol. 82, No. 1, 1960, pp. 35–45, doi:10.1115/1.3662552.

[5] Schick, I. C. and Mitter, S. K., “Robust Recursive Estimation in the Presence of Heavy-Tailed Observation Noise,” The Annals of Statistics, Vol. 22, No. 2, 1994, pp. 1045–1080, doi:10.1214/aos/1176325511.

[6] Idan, M. and Speyer, J. L., “Cauchy Estimation for Linear Scalar Systems,” IEEE Transactions on

Automatic Control, Vol. 55, No. 6, 2010, pp. 1329–1342, doi:10.1109/TAC.2010.2042009.

[7] Idan, M. and Speyer, J. L., “State Estimation for Linear Scalar Dynamic Systems with Additive Cauchy

Noises: Characteristic Function Approach,” SIAM Journal on Control and Optimization, Vol. 50, No. 4,

2012, pp. 1971–1994, doi:10.1137/110831362.

[8] Idan, M. and Speyer, J. L., “Multivariate Cauchy Estimator with Scalar Measurement and Pro-

cess Noises,” SIAM Journal on Control and Optimization, Vol. 52, No. 2, 2014, pp. 1108–1141,

doi:10.1137/120891897.

[9] Fernandez, J. H., Methods for Estimation and Control of Linear Systems Driven by Cauchy Noises,

Ph.D. thesis, UCLA: Mechanical Engineering 0330, 2013.

[10] Fernandez, J. H., Speyer, J. L., and Idan, M., “Stochastic Estimation for Two-State Linear Dynamic

Systems with Additive Cauchy Noises,” IEEE Transactions on Automatic Control, Vol. 60, No. 12, 2015,

pp. 3367–3372, doi:10.1109/TAC.2015.2422478.

[11] Gordon, N. J., Salmond, D. J., and Smith, A. F. M., “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation,” in “IEE Proceedings F - Radar and Signal Processing,” IET, Vol. 140, 1993, pp. 107–113, doi:10.1049/ip-f-2.1993.0015.

[12] Doucet, A., Godsill, S., and Andrieu, C., “On Sequential Monte Carlo Sampling Methods for Bayesian Filtering,” Statistics and Computing, Vol. 10, No. 3, 2000, pp. 197–208, doi:10.1023/A:1008935410038.

[13] Sorenson, H. W. and Alspach, D. L., “Recursive Bayesian Estimation Using Gaussian Sums,” Automat-

ica, Vol. 7, No. 4, 1971, pp. 465–479, doi:10.1016/0005-1098(71)90097-5.

[14] Alspach, D. L. and Sorenson, H. W., “Nonlinear Bayesian estimation using Gaussian sum ap-

proximations,” IEEE Transactions on Automatic Control, Vol. 17, No. 4, 1972, pp. 439–448,

doi:10.1109/TAC.1972.1100034.

[15] Fonod, R., Idan, M., and Speyer, J. L., “State Estimation for Linear Systems with Additive Cauchy

Noises: Optimal and Suboptimal Approaches,” in “Proceedings of European Control Conference,” IEEE,

Piscataway, NJ, 2016, pp. 1434–1439, doi:10.1109/ECC.2016.7810491.

[16] Doucet, A., de Freitas, N., and Gordon, N., Sequential Monte-Carlo Methods in Practice, Springer-

Verlag, New York, 2001.

[17] Arulampalam, M. S., Maskell, S., Gordon, N., and Clapp, T., “A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking,” IEEE Transactions on Signal Processing, Vol. 50, No. 2, 2002, pp. 174–188, doi:10.1109/78.978374.

[18] Li, T., Bolic, M., and Djuric, P. M., “Resampling Methods for Particle Filtering: Classification, Implementation, and Strategies,” IEEE Signal Processing Magazine, Vol. 32, No. 3, 2015, pp. 70–86, doi:10.1109/MSP.2014.2330626.

[19] Kitagawa, G., “Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models,” Journal of Computational and Graphical Statistics, Vol. 5, No. 1, 1996, pp. 1–25, doi:10.2307/1390750.

[20] Achieser, N. I., Theory of Approximation, Dover Publications, New York, 1992. Chap. 6.

[21] Psiaki, M. L., Schoenberg, J. R., and Miller, I. T., “Gaussian Sum Reapproximation for Use in a

Nonlinear Filter,” Journal of Guidance, Control, and Dynamics, Vol. 38, No. 2, 2015, pp. 292–303,

doi:10.2514/1.G000541.


[22] Kitagawa, G., “The Two-Filter Formula for Smoothing and an Implementation of the Gaussian-Sum Smoother,” Annals of the Institute of Statistical Mathematics, Vol. 46, No. 4, 1994, pp. 605–623, doi:10.1007/BF00773470.

[23] Kitagawa, G., “Non-Gaussian Seasonal Adjustment,” Computers & Mathematics with Applications, Vol. 18, No. 6, 1989, pp. 503–514, doi:10.1016/0898-1221(89)90103-X.

[24] Williams, J. L. and Maybeck, P. S., “Cost-Function-Based Gaussian Mixture Reduction for Target Tracking,” in “Proceedings of the Sixth International Conference on Information Fusion,” Vol. 2, 2003, pp. 1047–1054, doi:10.1109/ICIF.2003.177354.

[25] Venkataraman, P., Applied Optimization with MATLAB Programming, John Wiley & Sons, New York, 2002.
