Understanding the Kalman Filter
RICHARD J. MEINHOLD and NOZER D. SINGPURWALLA*
This is an expository article. Here we show how the Kalman filter, used successfully by control engineers and other scientists, can be easily understood by statisticians if we use a Bayesian formulation and some well-known results in multivariate statistics. We also give a simple example illustrating the use of the Kalman filter for quality control work.

KEY WORDS: Bayesian inference; Box-Jenkins models; Forecasting; Exponential smoothing; Multivariate normal distribution; Time series.
1. INTRODUCTION
The Kalman filter (KF), commonly employed by control engineers and other physical scientists, has been successfully used in such diverse areas as the processing of signals in aerospace tracking and underwater sonar, and the statistical control of quality. More recently, it has also been used in some nonengineering applications such as short-term forecasting and the analysis of life lengths from dose-response experiments. Unfortunately, much of the published literature on the KF is in the engineering journals (including the original development, in Kalman 1960 and Kalman and Bucy 1961), and uses a language, notation, and style that is alien to statisticians. Consequently, many practitioners of statistics are not aware of the simplicity of this useful methodology. However, the model, the notions, and the techniques of Kalman filtering are potentially of great interest to statisticians, owing to their similarity to the linear models of regression and time series analysis, and because of their great utility in applications.
In actuality, the KF may be easily understood by the statistician if it is cast as a problem in Bayesian inference and we employ some well-known elementary results in multivariate statistics. This feature was evidently first published by Harrison and Stevens (1971, 1976), who were primarily interested in Bayesian forecasting; however, they present the result in a nontutorial manner, with emphasis placed on the implementation of the KF. Our aim, on the other hand, is to provide an exposition of the key notions of the approach in a single source, laying out its derivation in a few easy steps, filling in some clarifying technical details, giving an example, and interpreting the results. A more mathematical discussion of the KF, emphasizing the stochastic differential equation approach, is given by Wegman (1982). We feel that once it is demystified, the KF will be used more often by applied statisticians.
2. THE KALMAN FILTER MODEL: MOTIVATION AND APPLICATIONS
Let $Y_t, Y_{t-1}, \ldots, Y_1$, the data (which may be either scalars or vectors), denote the observed values of a variable of interest at times $t, t-1, \ldots, 1$. We assume that $Y_t$ depends on an unobservable quantity $\theta_t$, known as the state of nature. Our aim is to make inferences about $\theta_t$, which may be either a scalar or a vector and whose dimension is independent of the dimension of $Y_t$. The relationship between $Y_t$ and $\theta_t$ is linear and is specified by the observation equation
$$Y_t = F_t \theta_t + v_t, \tag{2.1}$$
where $F_t$ is a known quantity. The observation error $v_t$ is assumed to be normally distributed with mean zero and a known variance $V_t$, denoted as $v_t \sim N(0, V_t)$.
The essential difference between the KF and the conventional linear model representation is that in the former, the state of nature, analogous to the regression coefficients of the latter, is not assumed to be a constant but may change with time. This dynamic feature is incorporated via the system equation
$$\theta_t = G_t \theta_{t-1} + w_t, \tag{2.2}$$
wherein $G_t$ is a known quantity and the system equation error $w_t \sim N(0, W_t)$, with $W_t$ known. Since there are many physical systems for which the state of nature $\theta_t$ changes over time according to a relationship prescribed by engineering or scientific principles, the ability to include a knowledge of the system behavior in the statistical model is an apparent source of attractiveness of the KF. Note that the relationships (2.1) and (2.2) specified through $F_t$ and $G_t$ may or may not change with time, as is also true of the variances $V_t$ and $W_t$; we have subscripted these here for the sake of generality.
In addition to the usual linear model assumptions regarding the error terms, we also postulate that $v_t$ is independent of $w_t$; while extension to the case of dependency is straightforward, there is no need in this article to do so.
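As an entirely illustrative rendering of (2.1) and (2.2), the following sketch (Python/NumPy) generates one observation from the model in the scalar case; the numerical values of $F_t$, $G_t$, $V_t$, and $W_t$ are arbitrary choices of ours, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Known quantities for time t (arbitrary illustrative values).
F_t, G_t = 1.0, 0.9        # observation and system coefficients
V_t, W_t = 2.0, 1.0        # observation and system error variances

theta_prev = 0.0           # state of nature at time t-1

# System equation (2.2): theta_t = G_t * theta_{t-1} + w_t, with w_t ~ N(0, W_t)
theta_t = G_t * theta_prev + rng.normal(0.0, np.sqrt(W_t))

# Observation equation (2.1): Y_t = F_t * theta_t + v_t, with v_t ~ N(0, V_t)
Y_t = F_t * theta_t + rng.normal(0.0, np.sqrt(V_t))
```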
*Richard J. Meinhold is a graduate student, and Nozer D. Singpurwalla is Professor of Operations Research and Statistics at George Washington University, Washington, D.C. 20052. The work of the second author was supported in part by the Office of Naval Research Contract N00014-77-C-0263 and by the U.S. Army Research Office under Grant DAAG-29-80-C-0067 with George Washington University.

To look at how the KF model might be employed in practice, we consider a version of the frequently referenced example of tracking a satellite's orbit around the earth.
The unknown state of nature $\theta_t$ could be the position and speed of the satellite at time $t$, with respect to a spherical coordinate system with origin at the center of the earth. These quantities cannot be measured directly. Instead, from tracking stations around the earth, we may obtain measurements of distance to the satellite and the accompanying angles of measurement; these are the $Y_t$'s. The principles of geometry, relating $Y_t$ to $\theta_t$, would be incorporated in $F_t$, while $v_t$ would reflect the measurement error; $G_t$ would prescribe how the position and speed change in time according to the physical laws governing orbiting bodies, while $w_t$ would allow for deviations from these laws owing to such factors as nonuniformity of the earth's gravitational field, and so on.
A less complicated situation is considered by Phadke (1981) in the context of statistical quality control. Here the observation $Y_t$ is a simple (approximately normal) transform of the number of defectives observed in a sample obtained at time $t$, while $\theta_{1,t}$ and $\theta_{2,t}$ represent, respectively, the true defective index of the process and the drift of this index. We then have as the observation equation
$$Y_t = \theta_{1,t} + v_t,$$
and as the system equations
$$\theta_{1,t} = \theta_{1,t-1} + \theta_{2,t-1} + w_{1,t}, \qquad \theta_{2,t} = \theta_{2,t-1} + w_{2,t}.$$
In vector notation, this system of equations becomes
$$\theta_t = G\,\theta_{t-1} + w_t, \qquad \theta_t = \begin{pmatrix}\theta_{1,t}\\ \theta_{2,t}\end{pmatrix}, \quad w_t = \begin{pmatrix}w_{1,t}\\ w_{2,t}\end{pmatrix},$$
where
$$G = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}$$
does not change with time.
If we examine $Y_t - Y_{t-1}$ for this model, we observe that, under the assumption of constant variances, namely $V_t = V$ and $W_t = W$, the autocorrelation structure of this difference is identical to that of an ARIMA(0,1,1) process in the sense of Box and Jenkins (1970). Although such a correspondence is sometimes easily discernible, we should in general not consider the two approaches to be equivalent, because of the discrepancies in the philosophies and methodologies involved.
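To make the vector form concrete, the short sketch below builds the $F$ and $G$ matrices for this quality-control model (as reconstructed above) and simulates a few steps of the defective index and its drift; the variance values are arbitrary illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)

F = np.array([[1.0, 0.0]])        # observation picks out the defective index theta_{1,t}
G = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # index is incremented by the drift; drift follows a random walk
V = 0.5                           # observation error variance (illustrative)
W = np.diag([0.05, 0.01])         # system error covariance (illustrative)

theta = np.array([5.0, 0.1])      # initial defective index and drift
for t in range(1, 6):
    theta = G @ theta + rng.multivariate_normal(np.zeros(2), W)   # system equation
    Y_t = (F @ theta).item() + rng.normal(0.0, np.sqrt(V))        # observation equation
    print(t, round(Y_t, 3))
```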
3. THE RECURSIVE ESTIMATION PROCEDURE
The term "Kalman filter" or "Kalman filtering" re-
fers to a recursive procedure for inference about the
state of nature 0,. The key notion here is that given the
data Y,
=
(Y,,
. . .
,
Y1), inference about 0, can be carried
out through a direct application of Bayes's theorem:
Prob{State of Nature
I
Data}
Prob{Data
I
State of Nature)
x
Prob{State of Nature), (3.1)
which can also be written as
~(0,
I
y,)
a
P(Y,
I
el, y,-~)
I
x
~(0,Y,-~), (3.2)
where the notation $P(A \mid B)$ denotes the probability of occurrence of event $A$ given that (or conditional on) event $B$ has occurred. Note that the expression on the left side of (3.2) denotes the posterior distribution for $\theta$ at time $t$, whereas the first and second expressions on the right side denote, respectively, the likelihood and the prior distribution for $\theta$.
The recursive procedure can best be explained if we focus attention on the time point $t-1$, $t = 1, 2, \ldots$, and the data observed until then, $Y_{t-1} = (Y_{t-1}, Y_{t-2}, \ldots, Y_1)$. In what follows, we use matrix manipulations in allowing for $Y$ and/or $\theta$ to be vectors, without explicitly noting them as such.

At $t-1$, our state of knowledge about $\theta_{t-1}$ is embodied in the following probability statement for $\theta_{t-1}$:
$$(\theta_{t-1} \mid Y_{t-1}) \sim N(\hat\theta_{t-1}, \Sigma_{t-1}), \tag{3.3}$$
where $\hat\theta_{t-1}$ and $\Sigma_{t-1}$ are the expectation and the variance of $(\theta_{t-1} \mid Y_{t-1})$. In effect, (3.3) represents the posterior distribution of $\theta_{t-1}$; its evolution will become clear in the subsequent text.

It is helpful to remark here that the recursive procedure is started off at time 0 by choosing $\hat\theta_0$ and $\Sigma_0$ to be our best guesses about the mean and the variance of $\theta_0$, respectively.
We now look forward to time $t$, but in two stages:

1. prior to observing $Y_t$, and
2. after observing $Y_t$.

Stage 1. Prior to observing $Y_t$, our best choice for $\theta_t$ is governed by the system equation (2.2) and is given as $G_t \theta_{t-1} + w_t$. Since $\theta_{t-1}$ is described by (3.3), our state of knowledge about $\theta_t$ is embodied in the probability statement
$$(\theta_t \mid Y_{t-1}) \sim N(G_t \hat\theta_{t-1}, R_t), \qquad R_t = G_t \Sigma_{t-1} G_t' + W_t; \tag{3.4}$$
this is our prior distribution. In obtaining (3.4), which represents our prior for $\theta_t$ in the next cycle of (3.2), we used the well-known result that for any constant $C$,
$$X \sim N(\mu, \Sigma) \implies CX \sim N(C\mu, C\Sigma C'),$$
where $C'$ denotes the transpose of $C$.
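A minimal sketch of this Stage 1 (prediction) step is given below, again in NumPy and again with the caveat that the function name and calling convention are ours; the inputs are assumed to be arrays of conforming dimensions.

```python
import numpy as np

def predict(theta_hat_prev, Sigma_prev, G_t, W_t):
    """Stage 1: form the prior N(G_t theta_hat_{t-1}, R_t) of Eq. (3.4)."""
    prior_mean = G_t @ theta_hat_prev              # G_t theta_hat_{t-1}
    R_t = G_t @ Sigma_prev @ G_t.T + W_t           # R_t = G_t Sigma_{t-1} G_t' + W_t
    return prior_mean, R_t
```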
Stage 2. On observing $Y_t$, our goal is to compute the posterior of $\theta_t$ using (3.2). However, to do this, we need to know the likelihood, or equivalently $P(Y_t \mid \theta_t, Y_{t-1})$, the determination of which is undertaken via the following arguments.

Let $e_t$ denote the error in predicting $Y_t$ from the point $t-1$; thus
$$e_t = Y_t - F_t G_t \hat\theta_{t-1}. \tag{3.5}$$
Since $F_t$, $G_t$, and $\hat\theta_{t-1}$ are all known, observing $Y_t$ is equivalent to observing $e_t$. Thus (3.2) can be rewritten as
$$P(\theta_t \mid e_t, Y_{t-1}) \propto P(e_t \mid \theta_t, Y_{t-1}) \times P(\theta_t \mid Y_{t-1}), \tag{3.6}$$
with $P(e_t \mid \theta_t, Y_{t-1})$ being the likelihood.
Using the fact that $Y_t = F_t \theta_t + v_t$, (3.5) can be written as $e_t = F_t(\theta_t - G_t\hat\theta_{t-1}) + v_t$, so that $E(e_t \mid \theta_t, Y_{t-1}) = F_t(\theta_t - G_t\hat\theta_{t-1})$. Since $v_t \sim N(0, V_t)$, it follows that the likelihood is described by
$$(e_t \mid \theta_t, Y_{t-1}) \sim N\bigl(F_t(\theta_t - G_t\hat\theta_{t-1}),\; V_t\bigr). \tag{3.7}$$
We can now use Bayes's theorem (Eq. (3.6)) to obtain
$$P(\theta_t \mid e_t, Y_{t-1}) \propto \exp\Bigl\{-\tfrac12\bigl(e_t - F_t(\theta_t - G_t\hat\theta_{t-1})\bigr)' V_t^{-1}\bigl(e_t - F_t(\theta_t - G_t\hat\theta_{t-1})\bigr)\Bigr\} \exp\Bigl\{-\tfrac12(\theta_t - G_t\hat\theta_{t-1})' R_t^{-1}(\theta_t - G_t\hat\theta_{t-1})\Bigr\}, \tag{3.8}$$
and this best describes our state of knowledge about $\theta_t$ at time $t$. Once $P(\theta_t \mid Y_t, Y_{t-1})$ is computed, we can go back to (3.3) for the next cycle of the recursive procedure. In the next section, we show that the posterior distribution (3.8) is of the form presented in (3.3).
4. DETERMINATION OF THE POSTERIOR DISTRIBUTION
The tedious effort required to obtain $P(\theta_t \mid Y_t)$ using (3.8) can be avoided if we make use of the following well-known result in multivariate statistics (Anderson 1958, pp. 28-29), and some standard properties of the normal distribution.

Let $X_1$ and $X_2$ have a bivariate normal distribution with means $\mu_1$ and $\mu_2$, respectively, and covariance matrix
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix};$$
we denote this by
$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right). \tag{4.1}$$
When (4.1) holds, the conditional distribution of $X_1$ given $X_2 = x_2$ is described by
$$(X_1 \mid X_2 = x_2) \sim N\bigl(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\bigr). \tag{4.2}$$
The quantity $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ is called the regression function, and $\Sigma_{12}\Sigma_{22}^{-1}$ is referred to as the coefficient of the least squares regression of $X_1$ on $X_2$.

As a converse to the fact that (4.1) implies (4.2), we have the result that whenever (4.2) holds and $X_2 \sim N(\mu_2, \Sigma_{22})$, then (4.1) will hold; we will use this converse relationship.
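As a quick numerical illustration of (4.2), the following sketch computes the conditional mean and variance for an arbitrary bivariate normal; the parameter values are made up purely for illustration.

```python
import numpy as np

# Arbitrary bivariate normal parameters (illustrative only).
mu1, mu2 = 1.0, -0.5
S11, S12, S22 = 2.0, 0.8, 1.5      # Sigma_11, Sigma_12, Sigma_22 (scalars here)

x2 = 0.3                           # observed value of X_2

# Eq. (4.2): conditional distribution of X_1 given X_2 = x2.
cond_mean = mu1 + S12 / S22 * (x2 - mu2)   # regression function
cond_var = S11 - S12 / S22 * S12           # residual variance

print(cond_mean, cond_var)
```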
For our situation, we suppress the conditioning variable $Y_{t-1}$ and let $X_1$ correspond to $e_t$ and $X_2$ correspond to $\theta_t$; we denote this correspondence by $X_1 \Leftrightarrow e_t$ and $X_2 \Leftrightarrow \theta_t$. Since $(\theta_t \mid Y_{t-1}) \sim N(G_t\hat\theta_{t-1}, R_t)$ (see (3.4)), we note that
$$\mu_2 \Leftrightarrow G_t\hat\theta_{t-1} \quad \text{and} \quad \Sigma_{22} \Leftrightarrow R_t.$$
If in (4.2) we replace $X_1$, $X_2$, $\mu_2$, and $\Sigma_{22}$ by $e_t$, $\theta_t$, $G_t\hat\theta_{t-1}$, and $R_t$, respectively, and recall the result that $(e_t \mid \theta_t, Y_{t-1}) \sim N(F_t(\theta_t - G_t\hat\theta_{t-1}), V_t)$ (Eq. (3.7)), then
$$\mu_1 + \Sigma_{12}R_t^{-1}(\theta_t - G_t\hat\theta_{t-1}) \Leftrightarrow F_t(\theta_t - G_t\hat\theta_{t-1}),$$
so that $\mu_1 \Leftrightarrow 0$ and $\Sigma_{12} \Leftrightarrow F_tR_t$; similarly,
$$\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \Leftrightarrow V_t,$$
so that $\Sigma_{11} \Leftrightarrow V_t + F_tR_tF_t'$.
We now invoke the converse relation mentioned previously to conclude that the joint distribution of $\theta_t$ and $e_t$, given $Y_{t-1}$, can be described as
$$\left.\begin{pmatrix} \theta_t \\ e_t \end{pmatrix}\right| Y_{t-1} \sim N\left( \begin{pmatrix} G_t\hat\theta_{t-1} \\ 0 \end{pmatrix}, \begin{pmatrix} R_t & R_tF_t' \\ F_tR_t & V_t + F_tR_tF_t' \end{pmatrix} \right). \tag{4.3}$$
Making $e_t$ the conditioning variable and identifying (4.3) with (4.1), we obtain via (4.2) the result that
$$(\theta_t \mid e_t, Y_{t-1}) \sim N\bigl(G_t\hat\theta_{t-1} + R_tF_t'(V_t + F_tR_tF_t')^{-1}e_t,\; R_t - R_tF_t'(V_t + F_tR_tF_t')^{-1}F_tR_t\bigr). \tag{4.4}$$
This is the desired posterior distribution. We now summarize to highlight the elements of the recursive procedure.
After time $t-1$, we had a posterior distribution for $\theta_{t-1}$ with mean $\hat\theta_{t-1}$ and variance $\Sigma_{t-1}$ (Eq. (3.3)). Forming a prior for $\theta_t$ with mean $G_t\hat\theta_{t-1}$ and variance $R_t = G_t\Sigma_{t-1}G_t' + W_t$ (Eq. (3.4)), and evaluating a likelihood given $e_t = Y_t - F_tG_t\hat\theta_{t-1}$ (Eq. (3.5)), we arrive at the posterior density for $\theta_t$; this has mean
$$\hat\theta_t = G_t\hat\theta_{t-1} + R_tF_t'(V_t + F_tR_tF_t')^{-1}e_t \tag{4.5}$$
and variance
$$\Sigma_t = R_t - R_tF_t'(V_t + F_tR_tF_t')^{-1}F_tR_t. \tag{4.6}$$
We now continue through the next cycle of the process.
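The entire recursion of (3.4), (3.5), (4.5), and (4.6) fits in a few lines of code. The sketch below is one possible NumPy rendering of the cycle just summarized; the function and variable names are ours, and the inputs are assumed to be arrays of conforming dimensions.

```python
import numpy as np

def kalman_step(theta_hat_prev, Sigma_prev, Y_t, F_t, G_t, V_t, W_t):
    """One cycle of the recursion: prior (3.4), forecast error (3.5),
    posterior mean (4.5) and posterior variance (4.6)."""
    # Prior for theta_t given Y_{t-1}: mean G_t theta_hat_{t-1}, variance R_t  (3.4)
    prior_mean = G_t @ theta_hat_prev
    R_t = G_t @ Sigma_prev @ G_t.T + W_t

    # One-step-ahead forecast error  (3.5)
    e_t = Y_t - F_t @ prior_mean

    # Multiplier of e_t in (4.5): the regression coefficient of theta_t on e_t
    K_t = R_t @ F_t.T @ np.linalg.inv(V_t + F_t @ R_t @ F_t.T)

    # Posterior mean (4.5) and posterior variance (4.6)
    theta_hat = prior_mean + K_t @ e_t
    Sigma = R_t - K_t @ F_t @ R_t
    return theta_hat, Sigma
```

Starting values $\hat\theta_0$ and $\Sigma_0$ are our best guesses about the mean and variance of $\theta_0$, as noted in Section 3; repeated calls to such a function carry the inference forward one observation at a time.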
5. INTERPRETATION OF RESULTS AND CONCLUDING REMARKS
If we look at (4.4) for some additional insight into the workings of the Kalman filter, we note that the mean of the posterior distribution of $(\theta_t \mid e_t, Y_{t-1})$ is indeed the regression function of $\theta_t$ on $e_t$. The mean (regression function) is the sum of two quantities: $G_t\hat\theta_{t-1}$, and a multiple of the one-step-ahead forecast error $e_t$. We first remark that $G_t\hat\theta_{t-1}$ is the mean of the prior distribution of $\theta_t$ (see (3.4)), and by comparing (4.3) and (4.4) to (4.1) and (4.2) we verify that the multiplier of $e_t$, $R_tF_t'(V_t + F_tR_tF_t')^{-1}$, is the coefficient of the least squares regression of $\theta_t$ on $e_t$ (conditional on $Y_{t-1}$). Thus one way to view Kalman filtering is to think of it as an updating procedure that consists of forming a preliminary (prior) guess about the state of nature and then adding a correction to this guess, the correction being
determined by how well the guess has performed in
predicting the next observation.
Second, we should clarify the meaning of regressing $\theta_t$ on $e_t$, since this pair constitutes but a single observation and the regression relationship is not estimated in the familiar way. Rather, we recall the usual framework of sequential Bayesian estimation, wherein a new posterior distribution arises with each successive piece of data. At time zero, the regression of $\theta_1$ on $e_1$ is determined entirely by our prior specifications. On receiving the first observation, the value of $e_1$ is mapped into $\hat\theta_1$ through this function, which is then replaced by a new regression relation based on $e_1$, $F_1$, $G_1$, $V_1$, and $W_1$. This in turn is used to map $e_2$ into $\hat\theta_2$, and so on as the process continues in the usual Bayesian prior/posterior iterative manner; see Figure 1. Thus Kalman filtering can also be viewed as the evolution of a series of regression functions of $\theta_t$ on $e_t$, at times $0, 1, \ldots, t-1, t$, each having a potentially different intercept and regression coefficient; the evolution stems from a learning process involving all the data.
The original development of the Kalman filter approach was motivated by the updating feature just described, and its derivation followed via least squares estimation theory. The Bayesian formulation described here yields the same result in an elegant manner and additionally provides the attractive feature of enabling inference about $\theta_t$ through a probability distribution rather than just a point estimate.
[Figure 1. Regression of $\theta_t$ on $e_t$: the regression function posterior to $t-1$ and prior to $t$ evolves into the regression function posterior to $t$ and prior to $t+1$.]

6. ILLUSTRATIVE EXAMPLES

6.1 The Steady Model

We consider two examples to illustrate the preceding mechanism and its performance.
We first return to the quality control model of Section 2, simplified by the removal of the drift parameter. This yields
$$Y_t = \theta_t + v_t \quad \text{(Obs. Eqn.)}$$
and
$$\theta_t = \theta_{t-1} + w_t \quad \text{(Sys. Eqn.)}. \tag{6.1}$$
This is the simplest possible nontrivial KF model (sometimes referred to in the forecasting literature as the steady model); it also corresponds, in the sense of possessing the same autocorrelation structure (assuming constant variances), to a class of ARIMA(0,1,1) models of Box and Jenkins (1970). In this situation, $F_t = G_t = 1$; if we further specify that $\Sigma_0 = 1$, $V_t = 2$, $W_t = 1$, we can easily demonstrate inductively that $R_t = G_t\Sigma_{t-1}G_t' + W_t = 2$ and, from (4.6), $\Sigma_t = 1$. In (4.5), then, our recursive relationship becomes
$$\hat\theta_t = \hat\theta_{t-1} + \tfrac12\,(Y_t - \hat\theta_{t-1}).$$
We see, then, that in this simple situation the KF estimator of $\theta_t$, and thus of $Y_{t+1}$, is actually equivalent to that derived from a form of exponential smoothing.
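The steady-model calculation above is easy to verify numerically. The following sketch (our own illustration, with simulated data) iterates (4.5) and (4.6) with $F_t = G_t = 1$, $\Sigma_0 = 1$, $V_t = 2$, $W_t = 1$, and confirms that $\Sigma_t$ stays at 1 and that the update coincides with exponential smoothing with weight one-half.

```python
import numpy as np

rng = np.random.default_rng(2)
V, W = 2.0, 1.0
Sigma, theta_hat = 1.0, 0.0            # starting guesses Sigma_0 and theta_hat_0

theta, smooth = 0.0, 0.0               # true state and exponential-smoothing estimate
for t in range(1, 11):
    theta = theta + rng.normal(0, np.sqrt(W))          # system equation (6.1)
    Y = theta + rng.normal(0, np.sqrt(V))              # observation equation (6.1)

    R = Sigma + W                                      # prior variance (3.4); here R = 2
    gain = R / (V + R)                                 # here 2 / 4 = 1/2
    theta_hat = theta_hat + gain * (Y - theta_hat)     # posterior mean (4.5)
    Sigma = R - gain * R                               # posterior variance (4.6); stays 1

    smooth = smooth + 0.5 * (Y - smooth)               # exponential smoothing, weight 1/2
    assert abs(theta_hat - smooth) < 1e-12
```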
6.2 A Numerical Example

We present in Table 1 a numerical example involving a simulation of the (scalar-dimensional) general model of (2.1) and (2.2). We continue to specify $\Sigma_0 = 1$, $V_t = 2$, $W_t = 1$, but incorporate cyclical behavior in $\theta_t$ through a periodic choice of $G_t$, while $F_t$ is in the nature of the familiar independent variable of ordinary regression. This situation clearly cannot be contained in any class of the ARIMA family; instead it is analogous, if not equivalent, to the transfer function model approach of Box and Jenkins (1970).

Starting with a value for $\theta_0$, the disturbances $v_t$ and $w_t$ were generated from a table of random normal variates and used in turn to produce, via the system and observation equations, the processes $\{\theta_t\}$ and $\{Y_t\}$, of which only the latter would ordinarily be visible. A "bad guess" value of $\hat\theta_0$ was chosen; as can be seen in Figure 2, where the actual values of $\theta_t$ and their estimates $\hat\theta_t$ are plotted, the effect of this error is short-lived. The reader may find it conducive to a better understanding of the model to work through several iterations of the recursive procedure.

[Table 1. A Simulation of the Process Described in Section 6.2.]

[Figure 2. A plot of the simulated values of $\theta_t$, the state of nature at time $t$, and their estimated values $\hat\theta_t$ via the Kalman filter.]
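Since the specific $G_t$ and $F_t$ used in Table 1 are not reproduced here, the sketch below simply illustrates the kind of exercise described: it simulates the scalar model with an arbitrary periodic $G_t$ and an arbitrary regressor-like $F_t$, starts the filter from a deliberately bad guess $\hat\theta_0$, and tracks how the estimates recover. All numerical choices are ours, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
V, W = 2.0, 1.0
Sigma, theta_hat = 1.0, 10.0           # Sigma_0 = 1 and a deliberately bad guess theta_hat_0

theta = 0.0                            # true initial state
for t in range(1, 21):
    G_t = 1.0 + 0.5 * np.cos(2 * np.pi * t / 6)   # arbitrary cyclical system coefficient
    F_t = 0.5 * t                                  # arbitrary regressor-like observation coefficient

    theta = G_t * theta + rng.normal(0, np.sqrt(W))        # system equation (2.2)
    Y = F_t * theta + rng.normal(0, np.sqrt(V))            # observation equation (2.1)

    R = G_t * Sigma * G_t + W                              # prior variance (3.4)
    e = Y - F_t * G_t * theta_hat                          # forecast error (3.5)
    gain = R * F_t / (V + F_t * R * F_t)
    theta_hat = G_t * theta_hat + gain * e                 # posterior mean (4.5)
    Sigma = R - gain * F_t * R                             # posterior variance (4.6)

    print(t, round(theta, 2), round(theta_hat, 2))
```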
[Received October 1981. Revised July 1982.]
REFERENCES

ANDERSON, T. W. (1958), An Introduction to Multivariate Statistical Analysis, New York: John Wiley.
BOX, G. E. P., and JENKINS, G. M. (1970), Time Series Analysis, Forecasting and Control, San Francisco: Holden-Day.
HARRISON, P. J., and STEVENS, C. F. (1971), "A Bayesian Approach to Short-Term Forecasting," Operational Research Quarterly, 22, 341-362.
HARRISON, P. J., and STEVENS, C. F. (1976), "Bayesian Forecasting (with discussion)," Journal of the Royal Statistical Society, Ser. B, 38, 205-247.
KALMAN, R. E. (1960), "A New Approach to Linear Filtering and Prediction Problems," Journal of Basic Engineering, 82, 34-45.
KALMAN, R. E., and BUCY, R. S. (1961), "New Results in Linear Filtering and Prediction Theory," Journal of Basic Engineering, 83, 95-108.
PHADKE, M. S. (1981), "Quality Audit Using Adaptive Kalman Filtering," ASQC Quality Congress Transactions, San Francisco, 1045-1052.
WEGMAN, E. J. (1982), "Kalman Filtering," in Encyclopedia of Statistics, eds. Norman Johnson and Samuel Kotz, New York: John Wiley.