Latent Force Models.
Mauricio Alvarez
School of Computer Science
University of Manchester
Manchester, UK, M13 9PL
alvarezm@cs.man.ac.uk
David Luengo
Dep. Teoría de Señal y Comunicaciones
Universidad Carlos III de Madrid
28911 Leganés, Spain
luengod@ieee.org
Neil D. Lawrence
School of Computer Science
University of Manchester
Manchester, UK, M13 9PL
neill@cs.man.ac.uk
Abstract
Purely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from computational biology, motion capture and geostatistics.
1 Introduction
Traditionally, the main focus in machine learning has been model generation through a data driven paradigm. The usual approach is to combine a data set with a (typically fairly flexible) class of models and, through judicious use of regularization, make useful predictions on previously unseen data. There are two key problems with purely data driven approaches. Firstly, if data is scarce relative to the complexity of the system we may be unable to make accurate predictions on test data. Secondly, if the model is forced to extrapolate, i.e. make predictions in a regime in which data has not been seen yet, performance can be poor.
Appearing in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 2009, Clearwater Beach, Florida, USA. Volume 5 of JMLR: W&CP 5. Copyright 2009 by the authors.
Purely mechanistic models, i.e. models which are inspired by the underlying physical knowledge of the system, are common in many areas such as chemistry, systems biology, climate modelling and geophysical sciences, etc. They normally make use of a fairly well characterized physical process that underpins the system, typically represented with a set of differential equations. The purely mechanistic approach leaves us with a different set of problems to those from the data driven approach. In particular, accurate description of a complex system through a mechanistic modelling paradigm may not be possible: even if all the physical processes can be adequately described, the resulting model could become extremely complex. Identifying and specifying all the interactions might not be feasible, and we would still be faced with the problem of identifying the parameters of the system.
Despite these problems, physically well characterized models retain a major advantage over purely data driven models. A mechanistic model can enable accurate prediction even in regions where there may be no available training data. For example, Pioneer space probes have been able to enter different extraterrestrial orbits despite the absence of data for these orbits.
In this paper we advocate an alternative approach. Rather than relying on an exclusively mechanistic or data driven approach we suggest a hybrid system which involves a (typically overly simplistic) mechanistic model of the system which can easily be augmented through machine learning techniques. We will start by considering two dynamical systems, both simple latent variable models, which incorporate first and second order differential equations. Our inspiration is the work of (Lawrence et al., 2007; Gao et al., 2008) who encoded a first order differential equation in a Gaussian process (GP). However, their aim was to construct an accurate model of transcriptional regulation, whereas ours is to make use of the mechanistic model to incorporate salient characteristics of the data (e.g. in a mechanical system inertia) without necessarily associating the components of our mechanistic model with actual physical components of the system. For example, for a human motion capture dataset we develop a mechanistic model of motion capture that does not exactly replicate the physics of human movement, but nevertheless captures salient features of the movement.
Having shown how first and second order dynamical systems can be incorporated in a GP, we finally show how partial differential equations can also be incorporated for modelling systems with multiple inputs.
2 Latent Variables and Physical Systems
From the perspective of machine learning our approach can be seen as a type of latent variable model. In a latent variable model we may summarize a high dimensional data set with a reduced dimensional representation. For example, if our data consists of $N$ points in a $Q$ dimensional space we might seek a linear relationship between the data, $\mathbf{Y} \in \mathbb{R}^{N \times Q}$, and a reduced dimensional representation, $\mathbf{F} \in \mathbb{R}^{N \times R}$, where $R < Q$. From a probabilistic perspective this involves an assumption that we can represent the data as
$$\mathbf{Y} = \mathbf{F}\mathbf{W} + \mathbf{E}, \tag{1}$$
where $\mathbf{E}$ is a matrix-variate Gaussian noise: each column, $\boldsymbol{\epsilon}_{:,q}$ ($1 \le q \le Q$), is a multivariate Gaussian with zero mean and covariance $\boldsymbol{\Sigma}$, i.e. $\boldsymbol{\epsilon}_{:,q} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$. The usual approach, as undertaken in factor analysis and principal component analysis (PCA), to dealing with the unknowns in this model is to integrate out $\mathbf{F}$ under a Gaussian prior and optimize with respect to $\mathbf{W} \in \mathbb{R}^{R \times Q}$ (although it turns out that for a non-linear variant of the model it can be convenient to do this the other way around, see e.g. (Lawrence, 2005)). If the data has a temporal nature, then the Gaussian prior in the latent space could express a relationship between the rows of $\mathbf{F}$, $\mathbf{f}_{t_n} = \mathbf{f}_{t_{n-1}} + \boldsymbol{\eta}$, where $\boldsymbol{\eta} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$ and $\mathbf{f}_{t_n}$ is the $n$th row of $\mathbf{F}$, which we associate with time $t_n$. This is known as the Kalman filter/smoother. Normally the times, $t_n$, are taken to be equally spaced, but more generally we can consider a joint distribution for $p(\mathbf{F}|\mathbf{t})$, $\mathbf{t} = [t_1 \dots t_N]^\top$, which has the form of a Gaussian process (GP),
$$p(\mathbf{F}|\mathbf{t}) = \prod_{r=1}^{R} \mathcal{N}\left(\mathbf{f}_{:,r} \,|\, \mathbf{0}, \mathbf{K}_{f_{:,r},f_{:,r}}\right),$$
where we have assumed zero mean and independence across the $R$ dimensions of the latent space. The GP makes explicit the fact that the latent variables are functions, $\{f_r(t)\}_{r=1}^{R}$, and we have now described them with a process prior. The notation used, $\mathbf{f}_{:,r}$, indicates the $r$th column of $\mathbf{F}$, and represents the values of that function for the $r$th dimension at the times given by $\mathbf{t}$. The matrix $\mathbf{K}_{f_{:,r},f_{:,r}}$ is the covariance function associated to $f_r(t)$ computed at the times given in $\mathbf{t}$.
Such a GP can be readily implemented. Given the covariance functions for $\{f_r(t)\}$ the implied covariance functions for $\{y_q(t)\}$ are straightforward to derive. In (Teh et al., 2005) this is known as a semi-parametric latent factor model (SLFM), although their main focus is not the temporal case. Historically the Kalman filter approach has been preferred, perhaps because of its linear computational complexity in $N$. However, recent advances in sparse approximations have made the general GP framework practical (see (Quiñonero-Candela and Rasmussen, 2005) for a review).
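As a minimal sketch of how the implied output covariances follow from (1) when the latent functions are independent GPs, the snippet below computes Cov(y_p(t), y_q(t')) = sum_r W_rp W_rq k_r(t, t'). It assumes an RBF covariance for every latent function and an isotropic noise variance; the names `rbf`, `output_cov`, `W`, `lengthscales` and `noise_var` are illustrative and do not come from the paper.

```python
import math


def rbf(t, t2, lengthscale):
    """Assumed latent covariance: exp(-(t - t')^2 / lengthscale^2)."""
    return math.exp(-((t - t2) ** 2) / lengthscale ** 2)


def output_cov(p, q, t, t2, W, lengthscales, noise_var=0.0):
    """Covariance between output p at time t and output q at time t2.

    W[r][q] is the weight of latent function r on output q, as in Y = F W + E.
    noise_var is a simplifying assumption: independent noise that only
    contributes when the same output is evaluated at the same time.
    """
    k = sum(W[r][p] * W[r][q] * rbf(t, t2, lengthscales[r])
            for r in range(len(W)))
    if p == q and t == t2:
        k += noise_var
    return k


# Hypothetical example: R = 2 latent functions driving Q = 3 outputs.
W = [[1.0, 0.5, -2.0],
     [0.0, 1.5, 1.0]]
lengthscales = [1.0, 3.0]
```

Because the outputs share the latent functions, two different outputs are correlated even though each latent function has an independent prior.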
So far the model described relies on the latent variables to provide the dynamic information. Our main contribution is to include a further dynamical system with a mechanistic inspiration. We now use a mechanical analogy to introduce it. Consider the following physical interpretation of (1): the latent functions, $f_r(t)$, are $R$ forces and we observe the displacement of $Q$ springs, $y_q(t)$, in response to the forces. Then we can reinterpret (1) as the force balance equation, $\mathbf{Y}\mathbf{D} = \mathbf{F}\mathbf{S} + \tilde{\mathbf{E}}$. Here we have assumed that the forces are acting, for example, through levers, so that we have a matrix of sensitivities, $\mathbf{S} \in \mathbb{R}^{R \times Q}$, and a diagonal matrix of spring constants, $\mathbf{D} \in \mathbb{R}^{Q \times Q}$. The original model is recovered by setting $\mathbf{W} = \mathbf{S}\mathbf{D}^{-1}$ and $\tilde{\boldsymbol{\epsilon}}_{:,q} \sim \mathcal{N}(\mathbf{0}, \mathbf{D}^\top \boldsymbol{\Sigma} \mathbf{D})$. The model can be extended by assuming that the spring is acting in parallel with a damper and that the system has mass, allowing us to write,
$$\mathbf{F}\mathbf{S} = \ddot{\mathbf{Y}}\mathbf{M} + \dot{\mathbf{Y}}\mathbf{C} + \mathbf{Y}\mathbf{D} + \boldsymbol{\epsilon}, \tag{2}$$
where $\mathbf{M}$ and $\mathbf{C}$ are diagonal matrices of masses and damping coefficients respectively, $\dot{\mathbf{Y}} \in \mathbb{R}^{N \times Q}$ is the first derivative of $\mathbf{Y}$ w.r.t. time and $\ddot{\mathbf{Y}}$ is the second derivative. The second order mechanical system that this model describes will exhibit several characteristics which are impossible to represent in the simpler latent variable model given by (1), such as inertia and resonance. This model is not only appropriate for data from mechanical systems. There are many analogous systems which can also be represented by second order differential equations, e.g. Resistor-Inductor-Capacitor circuits. A unifying characteristic for all these models is that the system is being forced by latent functions, $\{f_r(t)\}_{r=1}^{R}$. Hence, we refer to them as latent force models (LFMs).
One way of thinking of our model is to consider puppetry. A marionette is a representation of a human (or animal) controlled by a limited number of inputs through strings (or rods) attached to the character. This limited number of inputs can lead to a wide range of character movements. In our model, the data is the movements of the marionette, and the latent forces are the inputs to the system from the puppeteer.
Finally, note that it is of little use to include dynamical models of the type specified in (2) if their effects cannot be efficiently incorporated into the inference process. Fortunately, as we will see in the case studies, for an important class of covariance functions it is analytically tractable to compute the implied covariance functions for $\{y_q(t)\}_{q=1}^{Q}$. Then, given the data, a conjugate gradient descent algorithm can be used to obtain the hyperparameters of the model which minimize the negative log-likelihood, and inference is performed based on standard GP regression techniques.
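The objective that such an optimizer would minimize can be sketched as follows. This is only the negative log marginal likelihood of a zero-mean GP for a given covariance matrix, written with a hand-rolled Cholesky factorization so that it is self-contained; the optimizer itself is omitted, and the function names are illustrative rather than taken from the paper.

```python
import math


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def neg_log_marginal_likelihood(K, y):
    """-log N(y | 0, K), where K already includes any noise variance."""
    n = len(y)
    L = cholesky(K)
    # Forward substitution solves L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return 0.5 * quad + 0.5 * log_det + 0.5 * n * math.log(2.0 * math.pi)
```

In a latent force model, K would be assembled from the output covariance functions derived in the following sections, and its hyperparameters would be adjusted to reduce this value.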
3 First Order Dynamical System
A single input module is a biological network motif where the transcription of a number of genes is driven by a single transcription factor. In (Barenco et al., 2006) a simple first order differential equation was proposed to model this situation. Then (Lawrence et al., 2007; Gao et al., 2008) suggested that inference of the latent transcription factor concentration should be handled using GPs. In effect their model can be seen as a latent force model based on a first order differential equation with a single latent force. Here we consider the extension of this model to multiple latent forces. As a mechanistic model, this is a severe over simplification of the physical system: transcription factors are known to interact in a non-linear manner. Despite this we will be able to uncover useful information. Our model is based on the following differential equation,
$$\frac{dy_q(t)}{dt} + D_q y_q(t) = B_q + \sum_{r=1}^{R} S_{rq} f_r(t). \tag{3}$$
Here the latent forces, $f_r(t)$, represent protein concentration (which is difficult to observe directly), the outputs, $y_q(t)$, are the mRNA abundance levels for different genes, $B_q$ and $D_q$ are respectively the basal transcription and the decay rates of the $q$th gene, and $S_{rq}$ are coupling constants that quantify the influence of the $r$th input on the $q$th output (i.e. the sensitivity of gene $q$ to the concentration of protein $r$). Solving (3) for $y_q(t)$, we obtain
$$y_q(t) = \frac{B_q}{D_q} + \sum_{r=1}^{R} L_{rq}[f_r](t),$$
where we have ignored transient terms, which are easily included, and the linear operator is the convolution
$$L_{rq}[f_r](t) = S_{rq} \exp(-D_q t) \int_0^t f_r(\tau) \exp(D_q \tau)\, d\tau.$$
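The convolution form of the solution can be checked numerically. The sketch below, for a single latent force, evaluates the convolution with a midpoint rule and compares it with a direct Runge-Kutta integration of (3) started from y(0) = B/D, which is the initial condition under which the transient term vanishes. The function names, the step counts and the choice of force are assumptions made for illustration.

```python
import math


def convolution_solution(t, f, B, D, S, n=2000):
    """B/D + S * int_0^t f(tau) exp(-D (t - tau)) dtau, by the midpoint rule."""
    h = t / n
    total = 0.0
    for i in range(n):
        tau = (i + 0.5) * h
        total += f(tau) * math.exp(-D * (t - tau))
    return B / D + S * total * h


def rk4_solution(t, f, B, D, S, n=2000):
    """Integrate dy/dt = B + S f(t) - D y from y(0) = B/D with classical RK4."""
    h = t / n
    y, s = B / D, 0.0

    def rhs(time, value):
        return B + S * f(time) - D * value

    for _ in range(n):
        k1 = rhs(s, y)
        k2 = rhs(s + h / 2, y + h * k1 / 2)
        k3 = rhs(s + h / 2, y + h * k2 / 2)
        k4 = rhs(s + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return y
```

For example, `convolution_solution(3.0, math.sin, 1.0, 0.8, 0.5)` and `rk4_solution(3.0, math.sin, 1.0, 0.8, 0.5)` should agree closely, since both approximate the same solution of (3).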
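If each latent force is taken to be independent with a covariance function given by
$$k_{f_r,f_r}(t,t') = \exp\left(-\frac{(t - t')^2}{\ell_r^2}\right),$$
then we can compute the covariance of the outputs analytically, obtaining (Lawrence et al., 2007)
$$k_{y_p y_q}(t,t') = \sum_{r=1}^{R} \frac{S_{rp} S_{rq} \sqrt{\pi}\, \ell_r}{2} \left[h_{qp}(t',t) + h_{pq}(t,t')\right],$$
where
$$h_{qp}(t',t) = \frac{\exp(\nu_{rq}^2)}{D_p + D_q} \exp(-D_q t') \left\{ \exp(D_q t) \left[ \mathrm{erf}\left(\frac{t' - t}{\ell_r} - \nu_{rq}\right) + \mathrm{erf}\left(\frac{t}{\ell_r} + \nu_{rq}\right) \right] - \exp(-D_p t) \left[ \mathrm{erf}\left(\frac{t'}{\ell_r} - \nu_{rq}\right) + \mathrm{erf}(\nu_{rq}) \right] \right\},$$
here $\mathrm{erf}(x)$ is the real valued error function, $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-y^2)\, dy$, and $\nu_{rq} = \ell_r D_q / 2$. Additionally, we can compute the cross-covariance between the inputs and outputs,
$$k_{y_q f_r}(t,t') = \frac{S_{rq} \sqrt{\pi}\, \ell_r}{2} \exp(\nu_{rq}^2) \exp(-D_q (t - t')) \left[ \mathrm{erf}\left(\frac{t' - t}{\ell_r} - \nu_{rq}\right) + \mathrm{erf}\left(\frac{t'}{\ell_r} + \nu_{rq}\right) \right].$$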
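The closed-form output covariance can be sanity checked against brute-force quadrature. The sketch below covers a single latent force (R = 1): `k_outputs` evaluates the expression built from h_qp and h_pq, and `k_outputs_numeric` approximates the defining double integral of the RBF covariance against the two exponential impulse responses with a midpoint rule. The function names and the grid size are illustrative choices, not part of the paper.

```python
import math


def h_term(D_a, D_b, t_a, t_b, ell):
    """h_ab(t_a, t_b); h_term(D_q, D_p, t', t, ell) is h_qp(t', t) in the text."""
    nu = ell * D_a / 2.0
    grow = math.exp(D_a * t_b) * (
        math.erf((t_a - t_b) / ell - nu) + math.erf(t_b / ell + nu))
    decay = math.exp(-D_b * t_b) * (
        math.erf(t_a / ell - nu) + math.erf(nu))
    return math.exp(nu ** 2) / (D_a + D_b) * math.exp(-D_a * t_a) * (grow - decay)


def k_outputs(t, t_prime, D_p, D_q, S_p, S_q, ell):
    """Closed-form covariance between y_p(t) and y_q(t') for one latent force."""
    scale = S_p * S_q * math.sqrt(math.pi) * ell / 2.0
    return scale * (h_term(D_q, D_p, t_prime, t, ell)
                    + h_term(D_p, D_q, t, t_prime, ell))


def k_outputs_numeric(t, t_prime, D_p, D_q, S_p, S_q, ell, n=200):
    """Midpoint-rule approximation of the same covariance as a double integral."""
    dt, dtp = t / n, t_prime / n
    total = 0.0
    for i in range(n):
        tau = (i + 0.5) * dt
        w_p = math.exp(-D_p * (t - tau))
        for j in range(n):
            tau_p = (j + 0.5) * dtp
            total += (w_p * math.exp(-D_q * (t_prime - tau_p))
                      * math.exp(-((tau - tau_p) ** 2) / ell ** 2))
    return S_p * S_q * total * dt * dtp
```

The two functions should agree to within the quadrature error, and the covariance should vanish when either time is zero because the transient terms are ignored.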
3.1 p53 Data
Our data is from (Barenco et al., 2006), where leukemia cell lines were bombarded with radiation to induce activity of the transcription factor p53. This transcription factor repairs DNA damage and triggers a mechanism which pauses the cell-cycle and potentially terminates the cell. In (Barenco et al., 2006) microarray gene expression levels of known targets of p53 were used to fit a first order differential equation model to the data. The model was then used to provide a ranked list of 50 genes identified as regulated by p53.
Our aim is to determine if there are additional “latent forces” which could better explain the activity of some of these genes. The experimental data consists of measurements of expression levels of 50 genes for three different replicas. Within each replica, there are measurements at seven different time instants. We constructed a latent force model with six latent forces, assuming that each replica was independently produced but fixing the hyperparameters of the kernel across the replicas (see footnote 1). We employed a sparse approximation, as proposed in (Alvarez and Lawrence, 2009), with ten inducing points for speeding up computation.
Of the six latent functions, two were automatically switched off by the model. Two further latent functions, shown in Figure 1 as latent forces 1 & 2, were consistent across all replicas: their shapes were time translated versions of the p53 profile as identified by (Barenco et al., 2006; Lawrence et al., 2007; Gao et al., 2008). This time translation allows genes to experience different transcriptional delays, a mechanism not included explicitly in our model, but mimicked by linear mixing of an early and a late signal. The remaining two latent functions were inconsistent across the replicas (see e.g. latent force 3 in Figure 1). They appear to represent processes not directly related to p53. This was backed up by the sensitivity parameters found in the model. The known p53 targets DDB2, p21, SESN1/hPA26, BIK and TNFRSF10b were found to respond to latent forces 1 & 2. Conversely, the genes that were most responsive to latent force 3 were MAP4K4, a gene involved in environmental stress signalling, and FDXR, an electron transfer protein.
4 Second Order Dynamical System
In Section 2 we introduced the analogy of a marionette's motion being controlled by a reduced number of forces. Human motion capture data consists of a skeleton and multivariate time courses of angles which summarize the motion. This motion can be modelled with a set of second order differential equations which, due to variations in the centers of mass induced by the movement, are non-linear. The simplification we consider for the latent force model is to linearize these differential equations, resulting in the following second order dynamical system,
$$\frac{d^2 y_q(t)}{dt^2} + C_q \frac{dy_q(t)}{dt} + D_q y_q(t) = B_q + \sum_{r=1}^{R} S_{rq} f_r(t), \tag{4}$$
where the mass of the system, without loss of generality, is normalized to 1. Whilst (4) is not the correct physical model for our system, it will still be helpful when extrapolating predictions across different motions, as we shall see in the next section. Note also that, although similar to (3), the dynamic behavior of this system is much richer than that of the first order system, since it can exhibit inertia and resonance.
Footnote 1: The decay rates were assumed equal within replicas. Although this might be an important restriction for this experiment, our purpose in this paper is to expose a general methodology without delving into the details of each experimental setup.
For the motion capture data $y_q(t)$ corresponds to a given observed angle over time, and its derivatives represent angular velocity and acceleration. The system is summarized by the undamped natural frequency, $\omega_{0q} = \sqrt{D_q}$, and the damping ratio, $\zeta_q = \frac{1}{2} C_q / \sqrt{D_q}$. Systems with a damping ratio greater than one are said to be overdamped, whereas underdamped systems exhibit resonance and have a damping ratio less than one. For critically damped systems $\zeta_q = 1$, and finally, for undamped systems (i.e. no friction) $\zeta_q = 0$.
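These definitions translate directly into a small helper. The sketch below computes the undamped natural frequency and damping ratio from the coefficients C_q and D_q of (4) and names the regime; the function name and the tolerance used to detect critical damping are illustrative assumptions.

```python
import math


def damping_summary(C, D):
    """Return (natural frequency, damping ratio, regime) for y'' + C y' + D y.

    Assumes unit mass, as in (4), with D > 0 and C >= 0.
    """
    omega0 = math.sqrt(D)
    zeta = 0.5 * C / math.sqrt(D)
    if zeta == 0.0:
        regime = "undamped"
    elif math.isclose(zeta, 1.0, rel_tol=1e-9):
        regime = "critically damped"
    elif zeta < 1.0:
        regime = "underdamped"
    else:
        regime = "overdamped"
    return omega0, zeta, regime
```

For instance, `damping_summary(0.4, 4.0)` describes a lightly damped output with natural frequency 2 that would be expected to oscillate in response to a force.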
Ignoring the initial conditions once more, the solution of (4) is again given by a convolution, with the linear operator now being
$$L_{rq}[f_r](t) = \frac{S_{rq}}{\omega_q} \exp(-\alpha_q t) \int_0^t f_r(\tau) \exp(\alpha_q \tau) \sin(\omega_q (t - \tau))\, d\tau, \tag{5}$$
where $\omega_q = \sqrt{4 D_q - C_q^2}/2$ and $\alpha_q = C_q/2$.
Once again, if we consider a latent force governed by a GP with the RBF covariance function we can solve (5) analytically, obtaining a closed-form expression for the covariance matrix of the outputs,
$$k_{y_p y_q}(t,t') = \sum_{r=1}^{R} \frac{S_{rp} S_{rq} \sqrt{\pi \ell_r^2}}{8 \omega_p \omega_q}\, k^{(r)}_{y_p y_q}(t,t').$$
Here $k^{(r)}_{y_p y_q}(t,t')$ can be considered the cross-covariance between the $p$th and $q$th outputs under the effect of the $r$th latent force, and is given by
$$\begin{aligned} k^{(r)}_{y_p y_q}(t,t') = {} & h_r(\tilde{\gamma}_q, \gamma_p, t, t') + h_r(\gamma_p, \tilde{\gamma}_q, t', t) \\ & - h_r(\tilde{\gamma}_q, \tilde{\gamma}_p, t, t') - h_r(\tilde{\gamma}_p, \tilde{\gamma}_q, t', t) \\ & + h_r(\gamma_q, \tilde{\gamma}_p, t, t') + h_r(\tilde{\gamma}_p, \gamma_q, t', t) \\ & - h_r(\gamma_q, \gamma_p, t, t') - h_r(\gamma_p, \gamma_q, t', t), \end{aligned}$$
where $\gamma_p = \alpha_p + j\omega_p$, $\tilde{\gamma}_p = \alpha_p - j\omega_p$, and
$$h_r(\gamma_q, \gamma_p, t, t') = \frac{\Upsilon_r(\gamma_q, t', t) - \exp(-\gamma_p t)\, \Upsilon_r(\gamma_q, t', 0)}{\gamma_p + \gamma_q},$$
with
$$\Upsilon_r(\gamma_q, t, t') = 2 \exp\left(\frac{\ell_r^2 \gamma_q^2}{4}\right) \exp(-\gamma_q (t - t')) - \exp\left(-\frac{(t - t')^2}{\ell_r^2}\right) w(j z_{rq}(t)) - \exp\left(-\frac{(t')^2}{\ell_r^2}\right) \exp(-\gamma_q t)\, w(-j z_{rq}(0)), \tag{6}$$
and $z_{rq}(t) = (t - t')/\ell_r - (\ell_r \gamma_q)/2$. Note that $z_{rq}(t) \in \mathbb{C}$, and $w(jz)$ in (6), for $z \in \mathbb{C}$, denotes Faddeeva's function $w(jz) = \exp(z^2)\,\mathrm{erfc}(z)$, where $\mathrm{erfc}(z)$ is the complex version of the complementary error function, $\mathrm{erfc}(z) = 1 - \mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_z^{\infty} \exp(-v^2)\, dv$. Faddeeva's function is usually considered the complex equivalent of the error function, since $w(jz)$ is bounded whenever the imaginary part of $jz$ is greater than or equal to zero, and is the key to achieving a good numerical stability when computing (6) and its gradients.
[Figure 1 panels: (a) Replica 1, latent force 1; (b) Replica 2, latent force 1; (c) Replica 3, latent force 1; (d) Replica 1, latent force 2; (e) Replica 2, latent force 2; (f) Replica 3, latent force 2; (g) Replica 1, latent force 3; (h) Replica 2, latent force 3; (i) Replica 3, latent force 3. Horizontal axes run from 0 to 12.]
Figure 1: (a)-(c) and (d)-(f) the two latent forces associated with p53 activity. p53 targets are sensitive to a combination of these functions allowing them to account for transcriptional delays. (g)-(i) a latent force that was inconsistent across the replicas. It may be associated with cellular processes not directly related to p53.
Similarly, the cross-covariance between latent functions and outputs is given by
$$k_{y_q f_r}(t,t') = \frac{\ell_r S_{rq} \sqrt{\pi}}{j 4 \omega_q} \left[\Upsilon_r(\tilde{\gamma}_q, t, t') - \Upsilon_r(\gamma_q, t, t')\right].$$
A visualization of a covariance matrix with a latent force and three different outputs (overdamped, underdamped and critically damped) is given in Figure 2.
4.1 Motion Capture data
Our motion capture data set is from the CMU motion capture data base (see footnote 2). We considered 3 balancing motions (18, 19, 20) from subject 49. The subject starts in a standing position with arms raised, then, over about 10 seconds, he raises one leg in the air and lowers his arms to an outstretched position. Of interest to us was the fact that, whilst motions 18 and 19 are relatively similar, motion 20 contains more dramatic movements. We were interested in training on motions 18 and 19 and testing on the more dramatic movement to assess the model's ability to extrapolate. The data was downsampled by 32 (from 120 frames per second to just 3.75) and we focused on the subject's left arm. Our objective was to reconstruct the motion of this arm for motion 20 given the angles of the shoulder and the parameters learned from motions 18 and 19 using two latent functions. First, we train the second order differential equation latent force model on motions 18 and 19, treating the sequences as independent but sharing parameters (i.e. the damping coefficients and natural frequencies of the two differential equations associated with each angle were constrained to be the same). Then, for the test data, we condition on the observations of the shoulder's orientation to make predictions for the rest of the arm's angles.
[Figure 2: image of a covariance matrix whose rows and columns are ordered f(t), y1(t), y2(t), y3(t); the colour scale runs from −0.4 to 0.8.]
Figure 2: Visualization of the covariance matrix associated with the second order kernel. Three outputs and their correlation with the latent function are shown. Output 1 is underdamped and the natural frequency is observable through the bars of alternating correlation and anti-correlation in the associated portions of the covariance matrix. Output 2 is overdamped; note the more diffuse covariance in comparison to Output 3, which is critically damped.
Footnote 2: The CMU Graphics Lab Motion Capture Database was created with funding from NSF EIA-0196217 and is available at http://mocap.cs.cmu.edu.
For comparison, we considered a regression model that directly predicts the angles of the arm given the orientation of the shoulder using standard independent GPs with RBF covariance functions. Results are summarized in Table 1, with some example plots of the tracks of the angles given in Figure 3.
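The text does not spell out how the RMS angle error is computed, so the following is only a sketch under the usual definition: the square root of the mean squared difference between predicted and observed angles over the test frames. The function name and argument layout are illustrative.

```python
import math


def rms_error(predicted, observed):
    """Root mean squared difference between two equal-length angle sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Applied separately to each joint angle of the held-out motion, a function of this kind would yield one error figure per angle and per model.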
Table 1: Root mean squared (RMS) angle error for prediction of the left arm's configuration in the motion capture data. Prediction with the latent force model outperforms the prediction with regression for all apart from the radius's angle.

Angle               Latent Force Error   Regression Error
Radius              4.11                 4.02
Wrist               6.55                 6.65
Hand X rotation     1.82                 3.21
Hand Z rotation     2.76                 6.14
Thumb X rotation    1.77                 3.10
Thumb Z rotation    2.73                 6.09

5 Partial Differential Equations and Latent Forces
So far we have considered dynamical latent force models based on ordinary differential equations, leading to multi-output Gaussian processes which are functions of a single variable: time. However, the methodology can also be applied in the context of partial differential equations in order to recover multi-output Gaussian processes which are functions of several inputs.
5.1 Diffusion in the Swiss Jura
The Jura data is a set of measurements of concentrations of several heavy metal pollutants collected from topsoil in a 14.5 km$^2$ region of the Swiss Jura. We consider a latent function that represents how the pollutants were originally laid down. As time passes, we assume that the pollutants diffuse at different rates resulting in the concentrations observed in the data set. We therefore consider a simplified version of the diffusion equation, known also as the heat equation,
$$\frac{\partial y_q(\mathbf{x},t)}{\partial t} = \sum_{j=1}^{d} \kappa_q \frac{\partial^2 y_q(\mathbf{x},t)}{\partial x_j^2},$$
where $d = 2$ is the dimension of $\mathbf{x}$, the measured concentration of each pollutant over space and time is given by $y_q(\mathbf{x},t)$, and the latent function $f_r(\mathbf{x})$ now represents the concentration of pollutants at time zero (i.e. the system's initial condition). The solution to the system (Polyanin, 2002) is then given by
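$$y_q(\mathbf{x},t) = \sum_{r=1}^{R} S_{rq} \int_{\mathbb{R}^d} f_r(\mathbf{x}')\, G_q(\mathbf{x},\mathbf{x}',t)\, d\mathbf{x}',$$
where $G_q(\mathbf{x},\mathbf{x}',t)$ is the Green's function given as
$$G_q(\mathbf{x},\mathbf{x}',t) = \frac{1}{2^d \pi^{d/2} T_q^{d/2}} \exp\left[-\sum_{j=1}^{d} \frac{(x_j - x'_j)^2}{4 T_q}\right],$$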
with $T_q = \kappa_q t$. Again, if we take the latent function to be given by a GP with the RBF covariance function we can compute the multiple output covariance functions analytically. The covariance function between the output functions is obtained as
$$k_{y_p y_q}(\mathbf{x},\mathbf{x}',t) = \sum_{r=1}^{R} \frac{S_{rp} S_{rq} |\mathbf{L}_r|^{1/2}}{|\mathbf{L}_{rp} + \mathbf{L}_{rq} + \mathbf{L}_r|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \mathbf{x}')^\top (\mathbf{L}_{rp} + \mathbf{L}_{rq} + \mathbf{L}_r)^{-1} (\mathbf{x} - \mathbf{x}')\right],$$
where $\mathbf{L}_{rp}$, $\mathbf{L}_{rq}$ and $\mathbf{L}_r$ are diagonal isotropic matrices with entries $2\kappa_p t$, $2\kappa_q t$ and $1/\ell_r^2$ respectively. The covariance function between the output and latent functions is given by
$$k_{y_q f_r}(\mathbf{x},\mathbf{x}',t) = \frac{S_{rq} |\mathbf{L}_r|^{1/2}}{|\mathbf{L}_{rq} + \mathbf{L}_r|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \mathbf{x}')^\top (\mathbf{L}_{rq} + \mathbf{L}_r)^{-1} (\mathbf{x} - \mathbf{x}')\right].$$
[Figure 3 panels: (a) Inferred Latent Force; (b) Wrist; (c) Hand X Rotation; (d) Hand Z Rotation; (e) Thumb X Rotation; (f) Thumb Z Rotation. Horizontal axes run from 0 to 9.]
Figure 3: (a) Inferred latent force for the motion capture data. The force shown is the weighted sum of the two forces that drive the system. (b)-(f) Predictions from the latent force model (solid line, grey error bars) and from direct regression from the shoulder angles (crosses with stick error bars). For these examples noise is high due to the relatively small length of the bones. Despite this the latent force model does a credible job of capturing the angle, whereas direct regression with independent GPs fails to capture the trends.
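The Green's function given earlier in this section can be checked numerically. The sketch below evaluates it for a single output with diffusion coefficient `kappa` and uses central finite differences to measure how well it satisfies the heat equation away from t = 0. The function names, the step size `eps` and the test points are assumptions made for illustration.

```python
import math


def green(x, x_prime, t, kappa):
    """Green's function of the heat equation in d = len(x) dimensions, T = kappa * t."""
    d = len(x)
    T = kappa * t
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    norm = (2.0 ** d) * (math.pi ** (d / 2.0)) * (T ** (d / 2.0))
    return math.exp(-sq_dist / (4.0 * T)) / norm


def heat_residual(x, x_prime, t, kappa, eps=1e-3):
    """dG/dt - kappa * Laplacian(G), estimated with central differences."""
    dG_dt = (green(x, x_prime, t + eps, kappa)
             - green(x, x_prime, t - eps, kappa)) / (2.0 * eps)
    centre = green(x, x_prime, t, kappa)
    laplacian = 0.0
    for j in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[j] += eps
        backward[j] -= eps
        laplacian += (green(forward, x_prime, t, kappa) - 2.0 * centre
                      + green(backward, x_prime, t, kappa)) / eps ** 2
    return dG_dt - kappa * laplacian
```

A residual close to zero at a few test points, together with the function integrating to one over space, is consistent with it being the fundamental solution used to propagate the initial condition.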
5.2 Prediction of Metal Concentrations
We used our model to replicate the experiments described in (Goovaerts, 1997, pp. 248,249) in which a primary variable (cadmium, copper, lead and cobalt) is predicted in conjunction with some secondary variables (nickel and zinc for cadmium; lead, nickel and zinc for copper; copper, nickel and zinc for lead; nickel and zinc for cobalt) (see footnote 3). By conditioning on the values of the secondary variables we can improve the prediction of the primary variables. We compare results for the diffusion kernel with results from prediction using independent GPs for the metals and “ordinary co-kriging” (as reported by (Goovaerts, 1997, pp. 248,249)). For our experiments we made use of 10 repeats to report standard deviations. Mean absolute errors and standard deviations are shown in Table 2 ((Goovaerts, 1997) does not report standard deviations for the co-kriging method). Our diffusion model outperforms co-kriging for all but one example.
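The reporting scheme described here can be sketched as follows: compute a mean absolute error for each repeat, then summarize the repeats by their mean and standard deviation. The text does not say whether the sample or population standard deviation was used; the sketch assumes the sample version, and all names are illustrative.

```python
import statistics


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and held-out values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def summarize_repeats(errors):
    """Mean and sample standard deviation of per-repeat errors (needs >= 2 repeats)."""
    return statistics.mean(errors), statistics.stdev(errors)
```

Each row of a results table would then pair the summary for one metal and one method, for example the output of `summarize_repeats` over ten per-repeat errors.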
Footnote 3: Data available at http://www.aigeostats.org/.

Table 2: Mean absolute error and standard deviation for ten repetitions of the experiment for the Jura dataset. IGPs stands for independent GPs, GPDK stands for GP diffusion kernel, OCK for ordinary co-kriging, Cd for Cadmium, Cu for Copper, Pb for lead and Co for Cobalt. For the Gaussian process with diffusion kernel, we learn the diffusion coefficients and the length-scale of the covariance of the latent function.

Metals   IGPs                GPDK                OCK
Cd       0.5823 ± 0.0133     0.4505 ± 0.0126     0.5
Cu       15.9357 ± 0.0907    7.1677 ± 0.2266     7.8
Pb       22.9141 ± 0.6076    10.1097 ± 0.2842    10.7
Co       2.0735 ± 0.1070     1.7546 ± 0.0895     1.5

6 Discussion
We have proposed a hybrid approach for the use of simple mechanistic models with Gaussian processes which allows for the creation of new kernels with physically meaningful parameters. We have shown how these kernels can be applied to a range of data sets for the analysis of microarray data, motion capture data and geostatistical data. To do this we proposed a range of linear differential equation models: first order, second order and a partial differential equation. The solutions to all these differential equations are in the form of convolutions. When applied to a Gaussian process latent function they result in a joint GP over the latent functions and the observed outputs which provides a general framework for multi-output GP regression.
We are not the first to suggest the use of convolution processes for multi-output regression; they were proposed by (Higdon, 2002) and built on by (Boyle and Frean, 2005). The ideas in these papers have also recently been made more computationally practical through sparse approximations suggested by (Alvarez and Lawrence, 2009). However, whilst (Boyle and Frean, 2005) was motivated by the general idea of constructing multi-output GPs, our aims are different. Our focus has been endowing GPs with the characteristics of mechanistic models so that our data driven models can exhibit well understood characteristics of these physical systems. To maintain tractability these mechanistic models are necessarily over simplistic, but our results have shown that they can lead to significant improvements on a wide range of data sets.
Acknowledgements
DL has been partly financed by Comunidad de Madrid (project PRO-MULTIDIS-CM, S-0505/TIC/0233), and by the Spanish government (CICYT project TEC2006-13514-C02-01 and research grant JC2008-00219). MA and NL have been financed by a Google Research Award and EPSRC Grant No EP/F005687/1 “Gaussian Processes for Systems Identification with Applications in Systems Biology”.
References
Mauricio Alvarez and Neil D. Lawrence. Sparse convolved Gaussian processes for multi-output regression. In Advances in Neural Information Processing Systems 21, pages 57–64. MIT Press, 2009.
Martino Barenco, Daniela Tomescu, Daniel Brewer, Robin Callard, Jaroslav Stark, and Michael Hubank. Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biology, 7(3):R25, 2006.
Phillip Boyle and Marcus Frean. Dependent Gaussian processes. In Lawrence Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems, volume 17, pages 217–224, Cambridge, MA, 2005. MIT Press.
Pei Gao, Antti Honkela, Magnus Rattray, and Neil D. Lawrence. Gaussian process modelling of latent chemical species: Applications to inferring transcription factor activities. Bioinformatics, 24:i70–i75, 2008.
Pierre Goovaerts. Geostatistics For Natural Resources Evaluation. Oxford University Press, 1997.
David M. Higdon. Space and space-time modelling using process convolutions. In C. Anderson, V. Barnett, P. Chatwin, and A. El-Shaarawi, editors, Quantitative methods for current environmental issues, pages 37–56. Springer-Verlag, 2002.
Neil D. Lawrence, Guido Sanguinetti, and Magnus Rattray. Modelling transcriptional regulation using Gaussian processes. In Bernhard Schölkopf, John C. Platt, and Thomas Hofmann, editors, Advances in Neural Information Processing Systems, volume 19, pages 785–792, Cambridge, MA, 2007. MIT Press.
Neil D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783–1816, Nov. 2005.
Andrei D. Polyanin. Handbook of Linear Partial Differential Equations for Engineers and Scientists. Chapman & Hall/CRC Press, 2002.
Joaquin Quiñonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939–1959, 2005.
Yee Whye Teh, Matthias Seeger, and Michael I. Jordan. Semiparametric latent factor models. In Robert G. Cowell and Zoubin Ghahramani, editors, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pages 333–340, Barbados, 6-8 January 2005. Society for Artificial Intelligence and Statistics.