# Latent Force Models.

**ABSTRACT** Purely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from computational biology, motion capture and geostatistics.


Mauricio Alvarez

School of Computer Science

University of Manchester

Manchester, UK, M13 9PL

alvarezm@cs.man.ac.uk

David Luengo

Dep. Teoría de Señal y Comunicaciones

Universidad Carlos III de Madrid

28911 Leganés, Spain

luengod@ieee.org

Neil D. Lawrence

School of Computer Science

University of Manchester

Manchester, UK, M13 9PL

neill@cs.man.ac.uk


1 Introduction

Traditionally, the main focus in machine learning has been model generation through a data driven paradigm. The usual approach is to combine a data set with a (typically fairly flexible) class of models and, through judicious use of regularization, make useful predictions on previously unseen data. There are two key problems with purely data driven approaches. Firstly, if data is scarce relative to the complexity of the system we may be unable to make accurate predictions on test data. Secondly, if the model is forced to extrapolate, i.e. make predictions in a regime in which data has not been seen yet, performance can be poor.

Appearing in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 2009, Clearwater Beach, Florida, USA. Volume 5 of JMLR: W&CP 5. Copyright 2009 by the authors.

Purely mechanistic models, i.e. models which are inspired by the underlying physical knowledge of the system, are common in many areas such as chemistry, systems biology, climate modelling and geophysical sciences. They normally make use of a fairly well characterized physical process that underpins the system, typically represented with a set of differential equations. The purely mechanistic approach leaves us with a different set of problems to those from the data driven approach. In particular, accurate description of a complex system through a mechanistic modelling paradigm may not be possible: even if all the physical processes can be adequately described, the resulting model could become extremely complex. Identifying and specifying all the interactions might not be feasible, and we would still be faced with the problem of identifying the parameters of the system.

Despite these problems, physically well characterized models retain a major advantage over purely data driven models. A mechanistic model can enable accurate prediction even in regions where there may be no available training data. For example, Pioneer space probes have been able to enter different extraterrestrial orbits despite the absence of data for these orbits.

In this paper we advocate an alternative approach. Rather than relying on an exclusively mechanistic or data driven approach we suggest a hybrid system which involves a (typically overly simplistic) mechanistic model of the system which can easily be augmented through machine learning techniques. We will start by considering two dynamical systems, both simple latent variable models, which incorporate first and second order differential equations. Our inspiration is the work of (Lawrence et al., 2007; Gao et al., 2008) who encoded a first order differential equation in a Gaussian process (GP). However, their aim was to construct an accurate model of transcriptional regulation, whereas ours is to make use of the mechanistic model to incorporate salient characteristics of the data (e.g. inertia in a mechanical system) without necessarily associating the components of our mechanistic model with actual physical components of the system. For example, for a human motion capture dataset we develop a mechanistic model of motion capture that does not exactly replicate the physics of human movement, but nevertheless captures salient features of the movement.

Having shown how first and second order dynamical systems can be incorporated in a GP, we finally show how partial differential equations can also be incorporated for modelling systems with multiple inputs.

2 Latent Variables and Physical Systems

From the perspective of machine learning our approach can be seen as a type of latent variable model. In a latent variable model we may summarize a high dimensional data set with a reduced dimensional representation. For example, if our data consists of $N$ points in a $Q$ dimensional space we might seek a linear relationship between the data, $\mathbf{Y} \in \mathbb{R}^{N \times Q}$, and a reduced dimensional representation, $\mathbf{F} \in \mathbb{R}^{N \times R}$, where $R < Q$. From a probabilistic perspective this involves an assumption that we can represent the data as

$$\mathbf{Y} = \mathbf{F}\mathbf{W} + \mathbf{E}, \tag{1}$$

where $\mathbf{E}$ is a matrix-variate Gaussian noise: each column, $\boldsymbol{\epsilon}_{:,q}$ ($1 \le q \le Q$), is a multi-variate Gaussian with zero mean and covariance $\boldsymbol{\Sigma}$, i.e. $\boldsymbol{\epsilon}_{:,q} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$. The usual approach, as undertaken in factor analysis and principal component analysis (PCA), to dealing with the unknowns in this model is to integrate out $\mathbf{F}$ under a Gaussian prior and optimize with respect to $\mathbf{W} \in \mathbb{R}^{R \times Q}$ (although it turns out that for a non-linear variant of the model it can be convenient to do this the other way around, see e.g. (Lawrence, 2005)). If the data has a temporal nature, then the Gaussian prior in the latent space could express a relationship between the rows of $\mathbf{F}$, $\mathbf{f}_{t_n} = \mathbf{f}_{t_{n-1}} + \boldsymbol{\eta}$, where $\boldsymbol{\eta} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$ and $\mathbf{f}_{t_n}$ is the $n$-th row of $\mathbf{F}$, which we associate with time $t_n$. This is known as the Kalman filter/smoother. Normally the times, $t_n$, are taken to be equally spaced, but more generally we can consider a joint distribution for $p(\mathbf{F}|\mathbf{t})$, $\mathbf{t} = [t_1 \dots t_N]^\top$, which has the form of a Gaussian process (GP),

$$p(\mathbf{F}|\mathbf{t}) = \prod_{r=1}^{R} \mathcal{N}\left(\mathbf{f}_{:,r} \,|\, \mathbf{0}, \mathbf{K}_{\mathbf{f}_{:,r},\mathbf{f}_{:,r}}\right),$$

where we have assumed zero mean and independence across the $R$ dimensions of the latent space. The GP makes explicit the fact that the latent variables are functions, $\{f_r(t)\}_{r=1}^{R}$, and we have now described them with a process prior. The notation used, $\mathbf{f}_{:,r}$, indicates the $r$-th column of $\mathbf{F}$, and represents the values of that function for the $r$-th dimension at the times given by $\mathbf{t}$. The matrix $\mathbf{K}_{\mathbf{f}_{:,r},\mathbf{f}_{:,r}}$ is the covariance function associated to $f_r(t)$ computed at the times given in $\mathbf{t}$.

Such a GP can be readily implemented. Given the covariance functions for $\{f_r(t)\}$ the implied covariance functions for $\{y_q(t)\}$ are straightforward to derive. In (Teh et al., 2005) this is known as a semi-parametric latent factor model (SLFM), although their main focus is not the temporal case. Historically the Kalman filter approach has been preferred, perhaps because of its linear computational complexity in $N$. However, recent advances in sparse approximations have made the general GP framework practical (see (Quiñonero-Candela and Rasmussen, 2005) for a review).
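To make the derivation of the implied output covariance concrete: under the linear model (1) with independent latent functions, and leaving the noise term aside, the covariance between two outputs is $\mathrm{cov}(y_p(t), y_q(t')) = \sum_{r} W_{rp} W_{rq} k_r(t,t')$. The following is a minimal sketch of that sum, assuming an RBF covariance for every latent function. The function names `rbf` and `output_cov` and all numeric values are illustrative assumptions, not taken from the paper.

```python
import math

def rbf(t, t_prime, lengthscale):
    """RBF covariance exp(-(t - t')^2 / l^2) for one latent function (assumed form)."""
    return math.exp(-((t - t_prime) ** 2) / lengthscale ** 2)

def output_cov(p, q, t, t_prime, W, lengthscales):
    """cov(y_p(t), y_q(t')) for Y = F W with independent latent GPs (noise omitted).

    W[r][q] is the weight of latent function r in output q.
    """
    return sum(
        W[r][p] * W[r][q] * rbf(t, t_prime, lengthscales[r])
        for r in range(len(W))
    )

# Illustrative values: R = 2 latent functions, Q = 3 outputs.
W = [[1.0, 0.5, -0.3],
     [0.0, 2.0, 1.0]]
lengthscales = [1.0, 0.5]
print(output_cov(0, 1, 0.0, 0.3, W, lengthscales))
```

Because the outputs share the same latent functions, two outputs are correlated exactly when some latent function has a non-zero weight in both of them.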

So far the model described relies on the latent variables to provide the dynamic information. Our main contribution is to include a further dynamical system with a mechanistic inspiration. We now use a mechanical analogy to introduce it. Consider the following physical interpretation of (1): the latent functions, $f_r(t)$, are $R$ forces and we observe the displacement of $Q$ springs, $y_q(t)$, to the forces. Then we can reinterpret (1) as the force balance equation, $\mathbf{Y}\mathbf{D} = \mathbf{F}\mathbf{S} + \tilde{\mathbf{E}}$. Here we have assumed that the forces are acting, for example, through levers, so that we have a matrix of sensitivities, $\mathbf{S} \in \mathbb{R}^{R \times Q}$, and a diagonal matrix of spring constants, $\mathbf{D} \in \mathbb{R}^{Q \times Q}$. The original model is recovered by setting $\mathbf{W} = \mathbf{S}\mathbf{D}^{-1}$ and $\tilde{\boldsymbol{\epsilon}}_{:,q} \sim \mathcal{N}(\mathbf{0}, \mathbf{D}^\top \boldsymbol{\Sigma} \mathbf{D})$. The model can be extended by assuming that the spring is acting in parallel with a damper and that the system has mass, allowing us to write,

$$\mathbf{F}\mathbf{S} = \ddot{\mathbf{Y}}\mathbf{M} + \dot{\mathbf{Y}}\mathbf{C} + \mathbf{Y}\mathbf{D} + \boldsymbol{\epsilon}, \tag{2}$$

where $\mathbf{M}$ and $\mathbf{C}$ are diagonal matrices of masses and damping coefficients respectively, $\dot{\mathbf{Y}} \in \mathbb{R}^{N \times Q}$ is the first derivative of $\mathbf{Y}$ w.r.t. time and $\ddot{\mathbf{Y}}$ is the second derivative. The second order mechanical system that this model describes will exhibit several characteristics which are impossible to represent in the simpler latent variable model given by (1), such as inertia and resonance. This model is not only appropriate for data from mechanical systems. There are many analogous systems which can also be represented by second order differential equations, e.g. Resistor-Inductor-Capacitor circuits. A unifying characteristic for all these models is that the system is being forced by latent functions, $\{f_r(t)\}_{r=1}^{R}$. Hence, we refer to them as latent force models (LFMs).

One way of thinking of our model is to consider puppetry. A marionette is a representation of a human (or animal) controlled by a limited number of inputs through strings (or rods) attached to the character. This limited number of inputs can lead to a wide range of character movements. In our model, the data is the movements of the marionette, and the latent forces are the inputs to the system from the puppeteer.

Finally, note that it is of little use to include dynamical models of the type specified in (2) if their effects cannot be efficiently incorporated into the inference process. Fortunately, as we will see in the case studies, for an important class of covariance functions it is analytically tractable to compute the implied covariance functions for $\{y_q(t)\}_{q=1}^{Q}$. Then, given the data, a conjugate gradient descent algorithm can be used to obtain the hyperparameters of the model which minimize the negative log-likelihood, and inference is performed based on standard GP regression techniques.
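The objective minimized during hyperparameter fitting is the standard zero-mean Gaussian negative log-likelihood, $\frac{1}{2}\mathbf{y}^\top\mathbf{K}^{-1}\mathbf{y} + \frac{1}{2}\log|\mathbf{K}| + \frac{N}{2}\log 2\pi$. The sketch below evaluates it with a hand-written Cholesky factorization so that it has no dependencies. It is not the authors' implementation: the RBF Gram matrix, the toy data and the grid over length-scales (standing in for the conjugate gradient optimizer) are all assumptions made for illustration.

```python
import math

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def neg_log_likelihood(y, K):
    """0.5 * y^T K^{-1} y + 0.5 * log|K| + 0.5 * N * log(2 pi) for y ~ N(0, K)."""
    n = len(y)
    L = cholesky(K)
    # Forward substitution: solve L a = y, so that y^T K^{-1} y = a^T a.
    a = [0.0] * n
    for i in range(n):
        a[i] = (y[i] - sum(L[i][k] * a[k] for k in range(i))) / L[i][i]
    quad = sum(v * v for v in a)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return 0.5 * quad + 0.5 * log_det + 0.5 * n * math.log(2.0 * math.pi)

def rbf_gram(times, lengthscale, noise_var):
    """Gram matrix of exp(-(t - t')^2 / l^2) plus independent noise on the diagonal."""
    return [
        [math.exp(-((a - b) ** 2) / lengthscale ** 2) + (noise_var if i == j else 0.0)
         for j, b in enumerate(times)]
        for i, a in enumerate(times)
    ]

# Toy data (made up for illustration) and a crude grid over the length-scale.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.1, 0.6, 0.9, 0.7, 0.2]
for ell in (0.2, 0.5, 1.0, 2.0):
    print(ell, neg_log_likelihood(y, rbf_gram(times, ell, 0.01)))
```

In a latent force model the Gram matrix would be built from the output covariances $k_{y_p y_q}$ instead of the plain RBF used here; the likelihood computation itself is unchanged.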

3 First Order Dynamical System

A single input module is a biological network motif where the transcription of a number of genes is driven by a single transcription factor. In (Barenco et al., 2006) a simple first order differential equation was proposed to model this situation. Then (Lawrence et al., 2007; Gao et al., 2008) suggested that inference of the latent transcription factor concentration should be handled using GPs. In effect their model can be seen as a latent force model based on a first order differential equation with a single latent force. Here we consider the extension of this model to multiple latent forces.

As a mechanistic model, this is a severe oversimplification of the physical system: transcription factors are known to interact in a non-linear manner. Despite this we will be able to uncover useful information. Our model is based on the following differential equation,

$$\frac{dy_q(t)}{dt} + D_q y_q(t) = B_q + \sum_{r=1}^{R} S_{rq} f_r(t). \tag{3}$$

Here the latent forces, $f_r(t)$, represent protein concentration (which is difficult to observe directly), the outputs, $y_q(t)$, are the mRNA abundance levels for different genes, $B_q$ and $D_q$ are respectively the basal transcription and the decay rates of the $q$-th gene, and $S_{rq}$ are coupling constants that quantify the influence of the $r$-th input on the $q$-th output (i.e. the sensitivity of gene $q$ to the concentration of protein $r$). Solving (3) for $y_q(t)$, we obtain

$$y_q(t) = \frac{B_q}{D_q} + \sum_{r=1}^{R} L_{rq}[f_r](t),$$

where we have ignored transient terms, which are easily included, and the linear operator is given by the following linear convolution operator,

$$L_{rq}[f_r](t) = S_{rq}\exp(-D_q t)\int_0^t f_r(\tau)\exp(D_q\tau)\,d\tau.$$
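The convolution operator can be checked numerically for a force whose response is known. For a constant force $f_r(t) = 1$ the integral gives $L_{rq}[f_r](t) = S_{rq}(1 - \exp(-D_q t))/D_q$, which tends to $S_{rq}/D_q$ for large $t$. The sketch below evaluates the operator with the trapezoidal rule and compares it with that expression; the function name `first_order_response` and the parameter values are assumptions chosen for illustration.

```python
import math

def first_order_response(f, t, S, D, n=2000):
    """L[f](t) = S exp(-D t) * int_0^t f(tau) exp(D tau) dtau, by the trapezoidal rule."""
    if t == 0.0:
        return 0.0
    h = t / n
    total = 0.5 * (f(0.0) + f(t) * math.exp(D * t))
    for i in range(1, n):
        tau = i * h
        total += f(tau) * math.exp(D * tau)
    return S * math.exp(-D * t) * h * total

# Illustrative sensitivity, decay rate and basal rate.
S, D, B = 2.0, 0.8, 0.4
t = 3.0
numeric = first_order_response(lambda tau: 1.0, t, S, D)
exact = S * (1.0 - math.exp(-D * t)) / D  # closed form for a constant unit force
print(numeric, exact)
print("y(t) with transients ignored:", B / D + numeric)
```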

If each latent force is taken to be independent with a covariance function given by

$$k_{f_r f_r}(t,t') = \exp\left(-\frac{(t-t')^2}{\ell_r^2}\right),$$

then we can compute the covariance of the outputs analytically, obtaining (Lawrence et al., 2007)

$$k_{y_p y_q}(t,t') = \sum_{r=1}^{R}\frac{S_{rp}S_{rq}\sqrt{\pi}\,\ell_r}{2}\left[h_{qp}(t',t) + h_{pq}(t,t')\right],$$

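where

$$h_{qp}(t',t) = \frac{\exp(\nu_{rq}^2)}{D_p + D_q}\exp(-D_q t')\left\{\exp(D_q t)\left[\operatorname{erf}\left(\frac{t'-t}{\ell_r} - \nu_{rq}\right) + \operatorname{erf}\left(\frac{t}{\ell_r} + \nu_{rq}\right)\right] - \exp(-D_p t)\left[\operatorname{erf}\left(\frac{t'}{\ell_r} - \nu_{rq}\right) + \operatorname{erf}(\nu_{rq})\right]\right\},$$

here $\operatorname{erf}(x)$ is the real valued error function, $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x \exp(-y^2)\,dy$, and $\nu_{rq} = \ell_r D_q/2$. Additionally, we can compute the cross-covariance between the inputs and outputs,

$$k_{y_q f_r}(t,t') = \frac{S_{rq}\sqrt{\pi}\,\ell_r}{2}\exp(\nu_{rq}^2)\exp(-D_q(t-t'))\left[\operatorname{erf}\left(\frac{t'-t}{\ell_r} - \nu_{rq}\right) + \operatorname{erf}\left(\frac{t'}{\ell_r} + \nu_{rq}\right)\right].$$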
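These covariances follow from applying the convolution operator to the latent covariance: the cross-covariance is a single integral of the RBF covariance against the exponential kernel, and the output covariance is the corresponding double integral. The sketch below evaluates both by brute-force Simpson quadrature for a single latent force. It deliberately does not implement the closed-form expressions; it is a slow reference under assumed parameter values that one might use to sanity-check an implementation of them. All function names are illustrative.

```python
import math

def simpson(g, a, b, n=100):
    """Composite Simpson rule with n (even) subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0

def rbf(t, tp, ell):
    """Latent covariance exp(-(t - t')^2 / l^2)."""
    return math.exp(-((t - tp) ** 2) / ell ** 2)

def cross_cov_yf(t, tp, S, D, ell):
    """cov(y_q(t), f_r(t')) = S * int_0^t exp(-D (t - tau)) k(tau, t') dtau."""
    return S * simpson(lambda tau: math.exp(-D * (t - tau)) * rbf(tau, tp, ell), 0.0, t)

def cov_yy(t, tp, Sp, Dp, Sq, Dq, ell):
    """cov(y_p(t), y_q(t')) for one latent force, as a double integral."""
    def inner(tau):
        return simpson(
            lambda taup: math.exp(-Dq * (tp - taup)) * rbf(tau, taup, ell), 0.0, tp
        )
    return Sp * Sq * simpson(lambda tau: math.exp(-Dp * (t - tau)) * inner(tau), 0.0, t)

# Illustrative parameters for two outputs driven by the same latent force.
print(cov_yy(1.0, 2.0, 1.0, 0.5, 2.0, 1.5, 1.0))
print(cross_cov_yf(1.0, 0.5, 1.0, 0.5, 1.0))
```

Swapping the two outputs together with their time arguments must leave the output covariance unchanged, which gives a cheap consistency check.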

3.1 p53 Data

Our data is from (Barenco et al., 2006), where leukemia cell lines were bombarded with radiation to induce activity of the transcription factor p53. This transcription factor repairs DNA damage and triggers a mechanism which pauses the cell-cycle and potentially terminates the cell. In (Barenco et al., 2006) microarray gene expression levels of known targets of p53 were used to fit a first order differential equation model to the data. The model was then used to provide a ranked list of 50 genes identified as regulated by p53.

Our aim is to determine if there are additional “latent forces” which could better explain the activity of some of these genes. The experimental data consists of measurements of expression levels of 50 genes for three different replicas. Within each replica, there are measurements at seven different time instants. We constructed a latent force model with six latent forces, assuming that each replica was independently produced but fixing the hyperparameters of the kernel across the replicas.¹ We employed a sparse approximation, as proposed in (Alvarez and Lawrence, 2009), with ten inducing points for speeding up computation.

Of the six latent functions, two were automatically switched off by the model. Two further latent functions, shown in Figure 1 as latent forces 1 & 2, were consistent across all replicas: their shapes were time translated versions of the p53 profile as identified by (Barenco et al., 2006; Lawrence et al., 2007; Gao et al., 2008). This time translation allows genes to experience different transcriptional delays, a mechanism not included explicitly in our model, but mimicked by linear mixing of an early and a late signal. The remaining two latent functions were inconsistent across the replicas (see e.g. latent force 3 in Figure 1). They appear to represent processes not directly related to p53. This was backed up by the sensitivity parameters found in the model. The known p53 targets DDB2, p21, SESN1/hPA26, BIK and TNFRSF10b were found to respond to latent forces 1 & 2. Conversely, the genes that were most responsive to latent force 3 were MAP4K4, a gene involved in environmental stress signalling, and FDXR, an electron transfer protein.

4 Second Order Dynamical System

In Section 2 we introduced the analogy of a marionette’s motion being controlled by a reduced number of forces. Human motion capture data consists of a skeleton and multivariate time courses of angles which summarize the motion. This motion can be modelled with a set of second order differential equations which, due to variations in the centers of mass induced by the movement, are non-linear. The simplification we consider for the latent force model is to linearize these differential equations, resulting in the following second order dynamical system,

$$\frac{d^2 y_q(t)}{dt^2} + C_q\frac{dy_q(t)}{dt} + D_q y_q(t) = B_q + \sum_{r=1}^{R} S_{rq} f_r(t), \tag{4}$$

where the mass of the system, without loss of generality, is normalized to 1. Whilst (4) is not the correct physical model for our system, it will still be helpful when extrapolating predictions across different motions, as we shall see in the next section. Note also that, although similar to (3), the dynamic behavior of this system is much richer than that of the first order system, since it can exhibit inertia and resonance.

¹ The decay rates were assumed equal within replicas. Although this might be an important restriction for this experiment, our purpose in this paper is to expose a general methodology without delving into the details of each experimental setup.

For the motion capture data $y_q(t)$ corresponds to a given observed angle over time, and its derivatives represent angular velocity and acceleration. The system is summarized by the undamped natural frequency, $\omega_{0q} = \sqrt{D_q}$, and the damping ratio, $\zeta_q = \frac{1}{2}C_q/\sqrt{D_q}$. Systems with a damping ratio greater than one are said to be overdamped, whereas underdamped systems exhibit resonance and have a damping ratio less than one. For critically damped systems $\zeta_q = 1$, and finally, for undamped systems (i.e. no friction) $\zeta_q = 0$.
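The classification by damping ratio can be written down directly from $\omega_{0q} = \sqrt{D_q}$ and $\zeta_q = \frac{1}{2}C_q/\sqrt{D_q}$. The helper below is a small sketch of that rule for a unit-mass system; the function name `second_order_summary` and the example coefficients are assumptions for illustration.

```python
import math

def second_order_summary(C, D):
    """Return (natural frequency, damping ratio, regime) for y'' + C y' + D y = u(t)."""
    if D <= 0:
        raise ValueError("D must be positive for a restoring spring")
    omega0 = math.sqrt(D)            # undamped natural frequency
    zeta = 0.5 * C / math.sqrt(D)    # damping ratio
    # Exact floating-point comparisons are used for the boundary cases;
    # a tolerance would be more appropriate for learned parameters.
    if zeta == 0:
        regime = "undamped"
    elif zeta < 1:
        regime = "underdamped"
    elif zeta == 1:
        regime = "critically damped"
    else:
        regime = "overdamped"
    return omega0, zeta, regime

for C, D in [(0.0, 1.0), (0.4, 4.0), (2.0, 1.0), (5.0, 1.0)]:
    print(C, D, second_order_summary(C, D))
```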

Ignoring the initial conditions once more, the solution of (4) is again given by a convolution, with the linear operator now being

$$L_{rq}[f_r](t) = \frac{S_{rq}}{\omega_q}\exp(-\alpha_q t)\int_0^t f_r(\tau)\exp(\alpha_q\tau)\sin(\omega_q(t-\tau))\,d\tau, \tag{5}$$

where $\omega_q = \sqrt{4D_q - C_q^2}/2$ and $\alpha_q = C_q/2$.
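As a check on the operator (5), consider again a constant unit force. Using $\alpha_q^2 + \omega_q^2 = D_q$, the integral evaluates to $\frac{S_{rq}}{D_q}\left[1 - \exp(-\alpha_q t)\left(\cos\omega_q t + \frac{\alpha_q}{\omega_q}\sin\omega_q t\right)\right]$, so the response settles at $S_{rq}/D_q$ and, in the underdamped case, oscillates on the way there. The sketch below compares a trapezoidal evaluation of (5) with this expression. It assumes the underdamped case $4D_q > C_q^2$ so that $\omega_q$ is real, and the function name and parameter values are illustrative.

```python
import math

def second_order_response(f, t, S, C, D, n=4000):
    """Operator (5) by the trapezoidal rule; assumes the underdamped case 4 D > C^2."""
    alpha = C / 2.0
    omega = math.sqrt(4.0 * D - C ** 2) / 2.0
    h = t / n

    def integrand(tau):
        # exp(-alpha t) exp(alpha tau) is merged into exp(-alpha (t - tau)),
        # which is algebraically identical and avoids large intermediate values.
        return f(tau) * math.exp(-alpha * (t - tau)) * math.sin(omega * (t - tau))

    total = 0.5 * (integrand(0.0) + integrand(t))
    total += sum(integrand(i * h) for i in range(1, n))
    return S / omega * h * total

def step_response_exact(t, S, C, D):
    """Closed-form response of (5) to a constant unit force (underdamped case)."""
    alpha = C / 2.0
    omega = math.sqrt(4.0 * D - C ** 2) / 2.0
    decay = math.exp(-alpha * t)
    return S / D * (1.0 - decay * (math.cos(omega * t) + alpha / omega * math.sin(omega * t)))

S, C, D = 1.5, 0.4, 4.0  # illustrative values with damping ratio 0.1
for t in (0.5, 1.0, 2.5):
    print(t, second_order_response(lambda tau: 1.0, t, S, C, D), step_response_exact(t, S, C, D))
```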

Once again, if we consider a latent force governed by a GP with the RBF covariance function we can solve (5) analytically, obtaining a closed-form expression for the covariance matrix of the outputs,

$$k_{y_p y_q}(t,t') = \sum_{r=1}^{R}\frac{S_{rp}S_{rq}\sqrt{\pi\ell_r^2}}{8\omega_p\omega_q}\,k^{(r)}_{y_p y_q}(t,t').$$

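Here $k^{(r)}_{y_p y_q}(t,t')$ can be considered the cross-covariance between the $p$-th and $q$-th outputs under the effect of the $r$-th latent force, and is given by

$$\begin{aligned}
k^{(r)}_{y_p y_q}(t,t') ={}& h_r(\tilde{\gamma}_q,\gamma_p,t,t') + h_r(\gamma_p,\tilde{\gamma}_q,t',t) \\
&- h_r(\tilde{\gamma}_q,\tilde{\gamma}_p,t,t') - h_r(\tilde{\gamma}_p,\tilde{\gamma}_q,t',t) \\
&+ h_r(\gamma_q,\tilde{\gamma}_p,t,t') + h_r(\tilde{\gamma}_p,\gamma_q,t',t) \\
&- h_r(\gamma_q,\gamma_p,t,t') - h_r(\gamma_p,\gamma_q,t',t),
\end{aligned}$$

where $\gamma_p = \alpha_p + j\omega_p$, $\tilde{\gamma}_p = \alpha_p - j\omega_p$, and

$$h_r(\gamma_q,\gamma_p,t,t') = \frac{\Upsilon_r(\gamma_q,t',t) - \exp(-\gamma_p t)\,\Upsilon_r(\gamma_q,t',0)}{\gamma_p + \gamma_q},$$

with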

$$\Upsilon_r(\gamma_q,t,t') = 2\exp\left(\frac{\ell_r^2\gamma_q^2}{4}\right)\exp(-\gamma_q(t-t')) - \exp\left(-\frac{(t-t')^2}{\ell_r^2}\right)w(jz_{rq}(t)) - \exp\left(-\frac{(t')^2}{\ell_r^2}\right)\exp(-\gamma_q t)\,w(-jz_{rq}(0)), \tag{6}$$

and $z_{rq}(t) = (t-t')/\ell_r - (\ell_r\gamma_q)/2$. Note that $z_{rq}(t) \in \mathbb{C}$, and $w(jz)$ in (6), for $z \in \mathbb{C}$, denotes Faddeeva’s function $w(jz) = \exp(z^2)\operatorname{erfc}(z)$, where $\operatorname{erfc}(z)$ is the complex version of the complementary error function, $\operatorname{erfc}(z) = 1 - \operatorname{erf}(z) = \frac{2}{\sqrt{\pi}}\int_z^\infty \exp(-v^2)\,dv$. Faddeeva’s function is usually considered the complex equivalent of the error function, since $|w(jz)|$ is bounded whenever the imaginary part of $jz$ is greater than or equal to zero, and is the key to achieving good numerical stability when computing (6) and its gradients.

[Figure 1: nine panels, each plotting one inferred latent force for one replica, with horizontal axes running from 0 to 12. (a) Replica 1, latent force 1. (b) Replica 2, latent force 1. (c) Replica 3, latent force 1. (d) Replica 1, latent force 2. (e) Replica 2, latent force 2. (f) Replica 3, latent force 2. (g) Replica 1, latent force 3. (h) Replica 2, latent force 3. (i) Replica 3, latent force 3.]

Figure 1: (a)-(c) and (d)-(f) the two latent forces associated with p53 activity. p53 targets are sensitive to a combination of these functions allowing them to account for transcriptional delays. (g)-(i) a latent force that was inconsistent across the replicas. It may be associated with cellular processes not directly related to p53.
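The stability argument for Faddeeva’s function $w(jz) = \exp(z^2)\operatorname{erfc}(z)$ in (6) can be seen numerically. The sketch below assumes SciPy is available and uses `scipy.special.wofz`, which computes $w(z) = \exp(-z^2)\operatorname{erfc}(-jz)$, so that `wofz(1j * z)` equals $\exp(z^2)\operatorname{erfc}(z)$. The wrapper name `w_of_jz` and the test points are illustrative; this is not the authors' code.

```python
import math
from scipy.special import wofz  # Faddeeva function w(z) = exp(-z^2) erfc(-j z)

def w_of_jz(z):
    """w(jz) = exp(z^2) erfc(z), evaluated without forming exp(z^2) explicitly."""
    return wofz(1j * complex(z))

# For a moderate real argument the direct product is fine and both routes agree.
z = 0.5
print(w_of_jz(z).real, math.exp(z ** 2) * math.erfc(z))

# For a large real argument the direct product would need exp(900), which is
# beyond double precision range, while w(jz) itself stays small and finite.
z = 30.0
print(w_of_jz(z).real, 1.0 / (z * math.sqrt(math.pi)))  # second value: leading asymptotic term

# A complex argument with non-negative real part, as arises with complex gamma_q.
print(w_of_jz(2.0 + 1.5j))
```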

Similarly, the cross-covariance between latent functions and outputs is given by

$$k_{y_q f_r}(t,t') = \frac{\ell_r S_{rq}\sqrt{\pi}}{j4\omega_q}\left[\Upsilon_r(\tilde{\gamma}_q,t,t') - \Upsilon_r(\gamma_q,t,t')\right].$$

A visualization of a covariance matrix with a latent force and three different outputs (overdamped, underdamped and critically damped) is given in Figure 2.

4.1 Motion Capture data

Our motion capture data set is from the CMU motion capture database.² We considered 3 balancing motions (18, 19, 20) from subject 49. The subject starts in a standing position with arms raised, then, over about 10 seconds, he raises one leg in the air and lowers his arms to an outstretched position. Of interest to us was the fact that, whilst motions 18 and 19 are relatively similar, motion 20 contains more dramatic movements. We were interested in training on motions 18 and 19 and testing on the more dramatic movement to assess the model’s ability to extrapolate. The data was down-sampled by 32 (from 120 frames per second to just 3.75) and we focused on the subject’s left arm. Our objective was to reconstruct the motion of this arm for motion 20 given the angles of the shoulder and the parameters learned from motions 18 and 19 using two latent functions. First, we train the second order differential equation latent force model on motions 18 and 19, treating the sequences as independent but sharing parameters (i.e. the damping coefficients and natural frequencies of the two differential equations associated with each angle were constrained to be the same). Then, for the test data, we condition on the observations of the shoulder’s orientation to make predictions for the rest of the arm’s angles.

² The CMU Graphics Lab Motion Capture Database was created with funding from NSF EIA-0196217 and is available at http://mocap.cs.cmu.edu.

[Figure 2: image of a covariance matrix whose rows and columns are labelled f(t), y1(t), y2(t), y3(t); the colour scale runs from −0.4 to 0.8.]

Figure 2: Visualization of the covariance matrix associated with the second order kernel. Three outputs and their correlation with the latent function are shown. Output 1 is underdamped and the natural frequency is observable through the bars of alternating correlation and anti-correlation in the associated portions of the covariance matrix. Output 2 is overdamped; note the more diffuse covariance in comparison to Output 3, which is critically damped.

For comparison, we considered a regression model that directly predicts the angles of the arm given the orientation of the shoulder using standard independent GPs with RBF covariance functions. Results are summarized in Table 1, with some example plots of the tracks of the angles given in Figure 3.

5 Partial Differential Equations and Latent Forces

So far we have considered dynamical latent force models based on ordinary differential equations, leading to multioutput Gaussian processes which are functions of a single variable: time. However, the methodology can also be applied in the context of partial differential equations in order to recover multioutput Gaussian processes which are functions of several inputs.

Table 1: Root mean squared (RMS) angle error for prediction of the left arm’s configuration in the motion capture data. Prediction with the latent force model outperforms the prediction with regression for all apart from the radius’s angle.

| Angle | Latent Force Error | Regression Error |
|---|---|---|
| Radius | 4.11 | 4.02 |
| Wrist | 6.55 | 6.65 |
| Hand X rotation | 1.82 | 3.21 |
| Hand Z rotation | 2.76 | 6.14 |
| Thumb X rotation | 1.77 | 3.10 |
| Thumb Z rotation | 2.73 | 6.09 |

5.1 Diffusion in the Swiss Jura

The Jura data is a set of measurements of concentrations of several heavy metal pollutants collected from topsoil in a 14.5 km² region of the Swiss Jura. We consider a latent function that represents how the pollutants were originally laid down. As time passes, we assume that the pollutants diffuse at different rates resulting in the concentrations observed in the data set. We therefore consider a simplified version of the diffusion equation, known also as the heat equation,

$$\frac{\partial y_q(\mathbf{x},t)}{\partial t} = \sum_{j=1}^{d}\kappa_q\frac{\partial^2 y_q(\mathbf{x},t)}{\partial x_j^2},$$

where $d = 2$ is the dimension of $\mathbf{x}$, the measured concentration of each pollutant over space and time is given by $y_q(\mathbf{x},t)$, and the latent function $f_r(\mathbf{x})$ now represents the concentration of pollutants at time zero (i.e. the system’s initial condition). The solution to the system (Polyanin, 2002) is then given by

$$y_q(\mathbf{x},t) = \sum_{r=1}^{R} S_{rq}\int_{\mathbb{R}^d} f_r(\mathbf{x}')\,G_q(\mathbf{x},\mathbf{x}',t)\,d\mathbf{x}'$$

where $G_q(\mathbf{x},\mathbf{x}',t)$ is the Green’s function given as

$$G_q(\mathbf{x},\mathbf{x}',t) = \frac{1}{2^d\pi^{d/2}T_q^{d/2}}\exp\left[-\sum_{j=1}^{d}\frac{(x_j - x'_j)^2}{4T_q}\right],$$

with $T_q = \kappa_q t$. Again, if we take the latent function to be given by a GP with the RBF covariance function we can compute the multiple output covariance functions analytically. The covariance function between the output functions is obtained as

$$k_{y_p y_q}(\mathbf{x},\mathbf{x}',t) = \sum_{r=1}^{R}\frac{S_{rp}S_{rq}|\mathbf{L}_r|^{1/2}}{|\mathbf{L}_{rp} + \mathbf{L}_{rq} + \mathbf{L}_r|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{x}')^\top(\mathbf{L}_{rp} + \mathbf{L}_{rq} + \mathbf{L}_r)^{-1}(\mathbf{x}-\mathbf{x}')\right],$$

where $\mathbf{L}_{rp}$, $\mathbf{L}_{rq}$ and $\mathbf{L}_r$ are diagonal isotropic matrices with entries $2\kappa_p t$, $2\kappa_q t$ and $1/\ell_r^2$ respectively. The covariance function between the output and latent functions is given by

$$k_{y_q f_r}(\mathbf{x},\mathbf{x}',t) = \frac{S_{rq}|\mathbf{L}_r|^{1/2}}{|\mathbf{L}_{rq} + \mathbf{L}_r|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{x}')^\top(\mathbf{L}_{rq} + \mathbf{L}_r)^{-1}(\mathbf{x}-\mathbf{x}')\right].$$

[Figure 3: six panels with horizontal axes running from 0 to 9. (a) Inferred Latent Force. (b) Wrist. (c) Hand X Rotation. (d) Hand Z Rotation. (e) Thumb X Rotation. (f) Thumb Z Rotation.]

Figure 3: (a) Inferred latent force for the motion capture data. The force shown is the weighted sum of the two forces that drive the system. (b)-(f) Predictions from the latent force model (solid line, grey error bars) and from direct regression from the shoulder angles (crosses with stick error bars). For these examples noise is high due to the relatively small length of the bones. Despite this the latent force model does a credible job of capturing the angle, whereas direct regression with independent GPs fails to capture the trends.
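The Green’s function of the heat equation is a Gaussian density in $\mathbf{x} - \mathbf{x}'$ with variance $2T_q$ per dimension, so solving the equation amounts to smoothing the initial condition. The sketch below checks this in one dimension (the paper uses $d = 2$; $d = 1$ is chosen here only to keep the quadrature short). It assumes an initial condition $f(x') = \exp(-x'^2/\ell^2)$, for which convolving two Gaussians gives $y(x,t) = \frac{\ell}{\sqrt{\ell^2 + 4\kappa t}}\exp\left(-\frac{x^2}{\ell^2 + 4\kappa t}\right)$. Function names and parameter values are illustrative assumptions.

```python
import math

def heat_kernel(x, xp, t, kappa):
    """Green's function of the heat equation for points x, x' given as tuples in R^d."""
    d = len(x)
    T = kappa * t
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-sq_dist / (4.0 * T)) / (2 ** d * math.pi ** (d / 2) * T ** (d / 2))

def diffuse_1d(f, x, t, kappa, half_width=12.0, n=4000):
    """y(x, t) = int f(x') G(x, x', t) dx' in 1D, trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        xp = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(xp) * heat_kernel((x,), (xp,), t, kappa)
    return h * total

ell, kappa, t, x = 1.0, 0.3, 2.0, 0.7  # illustrative values
numeric = diffuse_1d(lambda xp: math.exp(-xp ** 2 / ell ** 2), x, t, kappa)
spread = ell ** 2 + 4.0 * kappa * t
exact = ell / math.sqrt(spread) * math.exp(-x ** 2 / spread)
print(numeric, exact)

# The kernel integrates to one, so a uniform initial concentration stays uniform.
print(diffuse_1d(lambda xp: 1.0, 0.0, 1.0, 0.5))
```

A larger diffusion rate $\kappa_q$ widens the kernel, so a pollutant that diffuses faster has a smoother concentration surface for the same initial deposit.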

5.2 Prediction of Metal Concentrations

We used our model to replicate the experiments described in Goovaerts (1997, pp. 248, 249), in which a primary variable (cadmium, copper, lead or cobalt) is predicted in conjunction with some secondary variables (nickel and zinc for cadmium; lead, nickel and zinc for copper; copper, nickel and zinc for lead; nickel and zinc for cobalt).³ By conditioning on the values of the secondary variables we can improve the prediction of the primary variables. We compare results for the diffusion kernel with results from prediction using independent GPs for the metals and “ordinary co-kriging” (as reported by Goovaerts, 1997, pp. 248, 249). For our experiments we made use of 10 repeats to report standard deviations. Mean absolute errors and standard deviations are shown in Table 2 (Goovaerts (1997) does not report standard deviations for the co-kriging method). Our diffusion model outperforms co-kriging for all but one example.
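The reporting protocol (one mean absolute error per repeat, then a mean and standard deviation across repeats) can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between observed and predicted concentrations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def summarize_repeats(maes):
    """Mean and standard deviation of per-repeat errors.

    Assumption: sample standard deviation (ddof=1); the paper does not
    state the estimator.
    """
    maes = np.asarray(maes, dtype=float)
    return float(maes.mean()), float(maes.std(ddof=1))


def format_result(mean, std, digits=4):
    """Format a summary as 'mean±std' with a fixed number of decimals."""
    return f"{mean:.{digits}f}±{std:.{digits}f}"
```

In use, `mean_absolute_error` would be called once per repeat on the held-out primary variable, and the resulting list passed to `summarize_repeats`.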

³ Data available at http://www.ai-geostats.org/.

Table 2: Mean absolute error and standard deviation for ten repetitions of the experiment for the Jura dataset. IGPs stands for independent GPs, GPDK stands for GP diffusion kernel, OCK for ordinary co-kriging, Cd for cadmium, Cu for copper, Pb for lead and Co for cobalt. For the Gaussian process with diffusion kernel, we learn the diffusion coefficients and the length-scale of the covariance of the latent function.

| Metals | IGPs | GPDK | OCK |
|---|---|---|---|
| Cd | 0.5823±0.0133 | 0.4505±0.0126 | 0.5 |
| Cu | 15.9357±0.0907 | 7.1677±0.2266 | 7.8 |
| Pb | 22.9141±0.6076 | 10.1097±0.2842 | 10.7 |
| Co | 2.0735±0.1070 | 1.7546±0.0895 | 1.5 |

6 Discussion

We have proposed a hybrid approach for the use of simple mechanistic models with Gaussian processes which allows for the creation of new kernels with physically meaningful parameters. We have shown how these kernels can be applied to a range of data sets for the analysis of microarray data, motion capture data and geostatistical data. To do this we proposed a range of linear differential equation models: first order, second order and a partial differential equation. The solutions to all these differential equations are in the form of convolutions. When applied to a Gaussian process latent function they result in a joint GP over the latent functions and the observed outputs which provides a general framework for multi-output GP regression.
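The step from a convolution solution to a joint GP can be illustrated numerically. The sketch below assumes a single first-order model of the form dy/dt + D y = S f(t) with zero initial condition and an RBF covariance exp(-(t - t')²/ℓ²) on the latent function. It discretises the convolution with a simple rectangle rule, so it approximates the closed-form kernels rather than reproducing them. All function names are hypothetical.

```python
import numpy as np


def rbf(t, ell):
    """RBF covariance of the latent function on a time grid (assumed form)."""
    diff = t[:, None] - t[None, :]
    return np.exp(-(diff ** 2) / ell ** 2)


def first_order_operator(t, D, S):
    """Matrix G such that y ~= G @ f approximates
    y(t) = S * integral_0^t exp(-D (t - tau)) f(tau) dtau
    on a uniform grid (rectangle rule, zero initial condition)."""
    dt = t[1] - t[0]
    lag = t[:, None] - t[None, :]
    causal = lag >= 0.0
    # Clip negative lags before exponentiating so masked entries cannot overflow.
    weights = S * np.exp(-D * np.maximum(lag, 0.0)) * dt
    return np.where(causal, weights, 0.0)


def joint_covariance(t, D, S, ell):
    """Covariance of the stacked vector [f; y] under the linear map y = G f."""
    K_ff = rbf(t, ell)
    G = first_order_operator(t, D, S)
    K_yf = G @ K_ff           # cross-covariance between output and latent function
    K_yy = G @ K_ff @ G.T     # output covariance
    return np.block([[K_ff, K_yf.T], [K_yf, K_yy]])
```

Since the output is a linear map of the latent function, the stacked covariance is positive semi-definite by construction, which is why the latent function and the outputs can be treated as one GP.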

We are not the first to suggest the use of convolution processes for multi-output regression: they were proposed by Higdon (2002) and built on by Boyle and Frean (2005). The ideas in these papers have also recently been made more computationally practical through sparse approximations suggested by Alvarez and Lawrence (2009). However, whilst Boyle and Frean (2005) was motivated by the general idea of constructing multi-output GPs, our aims are different. Our focus has been on endowing GPs with the characteristics of mechanistic models so that our data driven models can exhibit well understood characteristics of these physical systems. To maintain tractability these mechanistic models are necessarily oversimplified, but our results have shown that they can lead to significant improvements on a wide range of data sets.

Acknowledgements

DL has been partly financed by Comunidad de Madrid (project PRO-MULTIDIS-CM, S-0505/TIC/0233), and by the Spanish government (CICYT project TEC2006-13514-C02-01 and research grant JC2008-00219). MA and NL have been financed by a Google Research Award and EPSRC Grant No EP/F005687/1 “Gaussian Processes for Systems Identification with Applications in Systems Biology”.

References

Mauricio Alvarez and Neil D. Lawrence. Sparse convolved Gaussian processes for multi-output regression. In Advances in Neural Information Processing Systems 21, pages 57–64. MIT Press, 2009.

Martino Barenco, Daniela Tomescu, Daniel Brewer, Robin Callard, Jaroslav Stark, and Michael Hubank. Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biology, 7(3):R25, 2006.

Phillip Boyle and Marcus Frean. Dependent Gaussian processes. In Lawrence Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems, volume 17, pages 217–224, Cambridge, MA, 2005. MIT Press.

Pei Gao, Antti Honkela, Magnus Rattray, and Neil D. Lawrence. Gaussian process modelling of latent chemical species: Applications to inferring transcription factor activities. Bioinformatics, 24:i70–i75, 2008.

Pierre Goovaerts. Geostatistics For Natural Resources Evaluation. Oxford University Press, 1997.

David M. Higdon. Space and space-time modelling using process convolutions. In C. Anderson, V. Barnett, P. Chatwin, and A. El-Shaarawi, editors, Quantitative methods for current environmental issues, pages 37–56. Springer-Verlag, 2002.

Neil D. Lawrence, Guido Sanguinetti, and Magnus Rattray. Modelling transcriptional regulation using Gaussian processes. In Bernhard Schölkopf, John C. Platt, and Thomas Hofmann, editors, Advances in Neural Information Processing Systems, volume 19, pages 785–792, Cambridge, MA, 2007. MIT Press.

Neil D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783–1816, Nov. 2005.

Andrei D. Polyanin. Handbook of Linear Partial Differential Equations for Engineers and Scientists. Chapman & Hall/CRC Press, 2002.

Joaquin Quiñonero Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939–1959, 2005.

Yee Whye Teh, Matthias Seeger, and Michael I. Jordan. Semiparametric latent factor models. In Robert G. Cowell and Zoubin Ghahramani, editors, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pages 333–340, Barbados, 6-8 January 2005. Society for Artificial Intelligence and Statistics.
