Content uploaded by Neil Dhir

Author content

All content in this area was uploaded by Neil Dhir on Oct 26, 2021

Content may be subject to copyright.

Dynamic Causal Bayesian Optimization

Virginia Aglietti∗

University of Warwick

The Alan Turing Institute

V.Aglietti@warwick.ac.uk

Neil Dhir∗

The Alan Turing Institute

ndhir@turing.ac.uk

Javier González

Microsoft Research Cambridge

Gonzalez.Javier@microsoft.com

Theodoros Damoulas

University of Warwick

The Alan Turing Institute

T.Damoulas@warwick.ac.uk

Abstract

This paper studies the problem of performing a sequence of optimal interventions in

a causal dynamical system where both the target variable of interest and the inputs

evolve over time. This problem arises in a variety of domains e.g. system biology

and operational research. Dynamic Causal Bayesian Optimization (DCBO) brings

together ideas from sequential decision making, causal inference and Gaussian

process (GP) emulation. DCBO is useful in scenarios where all causal effects

in a graph are changing over time. At every time step DCBO identiﬁes a local

optimal intervention by integrating both observational and past interventional data

collected from the system. We give theoretical results detailing how one can

transfer interventional information across time steps and deﬁne a dynamic causal

GP model which can be used to quantify uncertainty and ﬁnd optimal interventions

in practice. We demonstrate how DCBO identiﬁes optimal interventions faster than

competing approaches in multiple settings and applications.

1 Introduction

Causal Dimension

f(X=x, t)

ABO

Temporal Dimension

✓

✓

✗

✗

DCBO

f(do(Xs,t =xs,t), I0:t−1)

f(do(Xs=xs))

CBO

f(X=x)

BO

(a) (c)

(b) (d)

Y2

X0X2

Y0

X1

Y1

Z2

Z0Z1

Y2

X0X2

Y0Y1

Z2

Z0Z1

X1

Y2

X0X2

Y0

X1

Y1

Z2

Z0Z1

Y1Y2

X0X2

Y0

X1

Z2

Z0Z1

Figure 1: DAG representation of a

dynamic causal global optimisation

(DCGO) problem (a) and the DAG

considered when using CB O,A BO

or BO to address the same prob-

lem. Shaded nodes gives observed

variables while the arrows represent

causal effects.

Solving decision making problems in a variety of domains

requires understanding of cause-effect relationships in a system.

This can be obtained by experimentation. However, deciding

how to intervene at every point in time is particularly complex

in dynamical systems, due to the evolving nature of causal

effects. For instance, companies need to decide how to allocate

scarce resources across different quarters. In system biology,

scientists need to identify genes to knockout at speciﬁc points

in time. This paper describes a probabilistic framework that

ﬁnds such optimal interventions over time.

Focusing on a speciﬁc example, consider a setting in which

Yt

denotes the unemployment-rate of an economy at time

t

,

Zt

is

the economic growth and Xtthe inﬂation rate. Fig. 1a depicts

the causal graph [

26

] representing an agent’s understanding of

the causal links between these variables. The agent aims to

determine, at each time step

t∈ {0,1,2}

, the optimal action to

perform in order to minimize the current unemployment rate

∗Denotes equal contribution

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Yt

while accounting for the intervention cost. The investigator could frame this setting as a sequence

of global optimization problems and ﬁnd the solutions by resorting to Causal Bayesian Optimization

[CBO,

2

]. CBO extends Bayesian Optimization [BO,

30

] to cases in which the variable to optimize is

part of a causal model where a sequence of interventions can be performed. However, CBO does not

account for the system’s temporal evolution thus breaking the time dependency structure existing

among variables (Fig. 1b). This will lead to sub-optimal solutions, especially in non-stationary

scenarios. The same would happen when using Adaptive Bayesian Optimization [ABO,

25

] (Fig. 1c)

or BO (Fig. 1d). ABO captures the time dependency of the objective function but neither considers

the causal structure among inputs nor their temporal evolution. BO disregards both the temporal and

the causal structure. Our setting differs from both reinforcement learning (RL) and the multi-armed

bandits setting (MAB). Differently from MAB we consider interventions on continuous variables where

the dynamic target variable has a non-stationary interventional distribution. In addition, compared

to RL, we do not model the state dynamics explicitly and allow the agent to perform a number of

explorative interventions which do not change the underlying state of the system, before selecting the

optimal action. We discuss these points further in §1.1.

Dynamic Causal Bayesian Optimization

2

, henceforth DC BO, accounts for both the causal relationships

among input variables and the causality between inputs and outputs which might evolve over time.

DCBO integrates CBO with dynamic Bayesian networks (DBN), offering a novel approach for decision

making under uncertainty within dynamical systems. DBN [

19

] are commonly used in time-series

modelling and carry dependence assumptions that do not imply causation. Instead, in probabilistic

causal models [

27

], which form the basis for the CBO framework, graphs are buildt around causal

information and allow us to reason about the effects of different interventions. By combining CBO

with DBNs, the proposed methodology ﬁnds an optimal sequence of interventions which accounts

for the causal temporal dynamics of the system. In addition, DCBO takes into account past optimal

interventions and transfers this information across time, thus identifying the optimal intervention

faster than competing approaches and at a lower cost. We make the following contributions:

•

We formulate a new class of optimization problems called Dynamic Causal Global Optimization

(DCGO) where the objective functions account for the temporal causal dynamics among variables.

•

We give theoretical results demonstrating how interventional information can be transferred across

time-steps depending on the topology of the causal graph.

•

Exploiting our theoretical results, we solve the optimization problem with DCBO. At every time

step, DCBO constructs surrogate models for different intervention sets by integrating various sources

of data while accounting for past interventions.

• We analyze D CBO performance in a variety of settings comparing against CBO,ABO and BO.

1.1 Related Work

Dynamic Optimization

Optimization in dynamic environments has been studied in the context of

evolutionary algorithms [

14

,

16

]. More recently, other optimization techniques [

28

,

32

,

10

] have been

adapted to dynamic settings, see e.g. [

9

] for a review. Focusing on B O, the literature on dynamic

settings [

3

,

7

,

25

] is limited. The dynamic BO framework closest to this work is given by Nyikosa et al.

[25]

and focuses on functions deﬁned on continuous spaces that follow a more complex behaviour

than a simple Markov model. ABO treats the inputs as ﬁxed and not as random variables, thereby

disregarding their temporal evolution and, more importantly, breaking their causal dependencies.

Causal Optimization

Causal BO [CBO,

2

] focuses instead on the causal aspect of optimization and

solves the problem of ﬁnding an optimal intervention in a DAG by modelling the intervention functions

with single GPs or a multi-task GP model [

1

]. CBO disregards the existence of a temporal evolution

in both the inputs and the output variable, treating them as i.i.d. overtime. While disregarding time

signiﬁcantly simpliﬁes the problem, it prevents the identiﬁcation of an optimal intervention at every

t

.

Bandits and RL

In the broader decision-making literature, causal relationships have been considered

in the context of bandits [

4

,

20

–

22

] and reinforcement learning [

23

,

8

,

13

,

36

,

24

]. In these cases,

actions or arms, correspond to interventions on a causal graph where there exists complex relationships

between the agent’s decisions and the received rewards. While dynamic settings have been considered

in acausal bandit algorithms [

5

,

33

,

35

], causal MAB have focused on static settings. Dynamic settings

are instead considered by RL algorithms and formalized through Markov decision processes (MDP).

2A Python implementation is available at: https://github.com/neildhir/DCBO.

2

In the current formulation, DCBO does not consider an MDP as we do not have a notion of state

and therefore do not require an explicit model of its dynamics. The system is fully speciﬁed by the

causal model. As in BO, we focus on identifying a set of time-indexed optimal actions rather than

an optimal policy. We allow the agent to perform explorative interventions that do not lead to state

transitions. More importantly, differently from both MAB and RL,we allow for the integration of both

observational and interventional data. An expanded discussion on the reason why D CBO should be

used and the links between DC BO,CBO,A BO and RL is included in the supplement (§8).

2 Background and Problem Statement

Let random variables and values be denoted by upper-case and lower-case letters respectively. Vectors

are represented shown in bold.

do(X=x)

represents an intervention on

X

whose value is set to

x

.

p(Y|X=x)

represents an observational distribution and

p(Y|do(X=x))

represents an

interventional distribution. This is the distribution of

Y

obtained by intervening on

X

and ﬁxing

its value to

x

in the data generating mechanism (see Fig. 2), irrespective of the values of its parents.

Evaluating

p(Y|do(X=x))

requires “real” interventions while

p(Y|X=x)

only requires

“observing” the system.

DO

and

DI

denote observational and interventional datasets respectively.

Consider a structural causal model (SC M) deﬁned in Deﬁnition 1.

Deﬁnition 1. (Structural Causal Model)

[

27

, p. 203]. A structural causal model

M

is a triple

hU,V, F )i

where

U

is a set of background variables (also called exogenous), that are determined

by factors outside of the model.

V

is a set

{V1, V2, . . . , V|V|}

of observable variables (also called

endogenous), that are determined by variables in the model (i.e., determined by variables in

U∪V

).

F

is a set of functions

{f1, f2, . . . , fn}

such that each

fi

is a mapping from the respective domains

of

Ui∪Pa(Vi)

to

Vi

, where

Ui⊆U

and

Pa(Vi)⊆V\Vi

and the entire set

F

forms a mapping from

U

to

V

. In other words, each

{fi∈vi←fi(Pa(vi), ui)|i= 1, . . . , n}

, assigns a value to

Vi

that

depends on the values of the select set of variables (Ui∪Pa(Vi)).

M

is associated to a directed acyclic graph (DAG)

G

, in which each node corresponds to a variable and

the directed edges point from members of Pa(Vi)and Uito Vi.We assume Gto be known and leave

the integration with causal discovery [

15

] methods for future work. Within

V

, we distinguish between

three different types of variables: non-manipulative variables

C

, which cannot be modiﬁed, treatment

variables

X

that can be set to speciﬁc values and output variable

Y

that represents the agent’s outcome

of interest. Exploiting the rules of do-calculus [

27

] one can compute

p(Y|do(X=x))

using

observational data. This often involves evaluating intractable integrals which can be approximated

by using observational data to get a Monte Carlo estimate

bp(Y|do(X=x)) ≈p(Y|do(X=x))

.

These approximations will be consistent when the number of samples drawn from p(V)is large.

Causality in time

One can encode the existence of causal mechanisms across time steps by

explicitly representing these relationships with edges in an extended graph denoted by

G0:T

. For

instance, the DAG in Fig. 1(a) can be seen as one of the DAGs in Fig. 1(b) propagated in time. The DAG

in Fig. 1(a) captures both the causal structure existing across time steps and the causal mechanism

within every “time-slice”

t

[

19

]. In order to reason about interventions that are implemented in a

sequential manner, that is at time

t

we decide which intervention to perform in the system and so

deﬁne:

Deﬁnition 2. Mt

is the SC M at time step

t

deﬁned as

Mt=hU0:t,V0:t,F0:ti

where

0 : t

denotes the

union of the corresponding variables or functions up to time

t

(see Fig. 2).

V0:t

includes

X0:t=Xt

,

Y0:t=Yt

and

C0:t=Ct∪C0:t−1

. The functions in

F0:t

corresponding to intervened variables are

replaced by constant values while the exogenous variables related to them are excluded from U0:t.

Deﬁnition 3. Gt

is the causal graph associated to

Mt

. In

Gt

, the incoming edges in variables

intervened at

0 : t−1

are mutilated while intervened variables are represented by deterministic nodes

(squares) – see Fig. 2.

Dynamic Causal Global Optimization (DC GO)

The goal of this work is to ﬁnd a sequence of

interventions, optimizing a target variable, at each time step, in a causal DAG. Given

Gt

and

Mt

, at

every time step

t

, we wish to optimize

Yt

by intervening on a subset of the manipulative variables

Xt

.

The optimal intervention variables X?

s,t and intervention levels x?

s,t are given by:

X?

s,t,x?

s,t = arg min

Xs,t∈P (Xt),xs∈D(Xs,t )E[Yt|do(Xs,t =xs,t),1t>0·I0:t−1](1)

3

X0=fX0(ǫX0)

Z0=fZ0(X0,ǫZ0)

Y0=fY0(Z0,ǫY0)

F0={fX0, fZ0, fY0}

U0={ǫX0,ǫZ0,ǫY0}

X0={X0, Z0}

C0=∅

Y0=Y0

X0=fX0(ǫX0)

Z0=fI

Z0(·) = z0

Y0=fY0(z0,ǫY0)

X1=fX1(X0,ǫX0)

Z1=fZ1(z0, X1,ǫZ1)

Y1=fY1(Y0, Z1,ǫY1)

F0:1 ={fX0, f I

Z0, fY0,X1, fZ1, fY1}

U0:1 ={ǫX0,ǫY0,ǫX1,ǫZ1,ǫY1}

X0:1 ={X1, Z1}

C0:1 ={X0, Y0}

Y0:1 =Y1

X0=fX0(ǫX0)

Z0=fI

Z0(·) = z0

Y0=fY0(z0,ǫY0)

X1=fI

X1(·) = x1

Z1=fZ1(z0,x1,ǫZ1)

Y1=fY1(Y0, Z1,ǫY1)

X2=fX2(x1,ǫX2)

Z2=fZ2(Z1, X2,ǫZ2)

Y2=fY2(Y1, Z2,ǫY2)

F0:2 ={fX0, f I

Z0, fY0, f I

X1, fZ1, fY1, fX2, fZ2, fY2}

U0:2 ={ǫX0,ǫY0,ǫZ1,ǫY1,ǫX2,ǫZ2,ǫY2}

X0:2 ={X2, Z2}

C0:2 ={X0, Y0, Z1, Y1}

Y0:2 =Y2

DAG

Z0

X0

Y0

ǫX0

ǫZ0

ǫY0

Gt=0

Gt=2

z0

x1

Z1

Y1

X0

Y0

ǫX0

ǫY0

ǫZ1

ǫY1

X2

Z2

Y2

ǫX2

ǫZ2

ǫY2

z0

X1

Z1

Y1

X0

Y0

ǫX0

ǫY0

ǫX1

ǫZ1

ǫY1

I0= (Z0,z0) = argmin

Xs,0∈P(X0),

xs∈D(Xs,0)

E[Y0|do(Xs,0=xs,0)]

I1= (X1,x1) = argmin

Xs,1∈P(X1),

xs∈D(Xs,1)

E[Y1|do(Xs,1=xs,1), I0]

I2= (Z2,z2) = argmin

Xs,2∈P(X2),

xs∈D(Xs,2)

E[Y2|do(Xs,2=xs,2), I1, I0]

Gt=1

Mt

Figure 2: Structural equation models considered by DCBO at every time step

t∈ {0,1,2}

. Exogenous

noise variables

i

are depicted here but are omitted in the remainder of the paper, to avoid clutter.

For every

t

,

Gt

is a mutilated version of

Gt−1

reﬂecting the optimal intervention implemented in the

system at

0 : t−1

which are represented by squares. The SCM functions in

F0:t

corresponding to

the intervened variables are set to constant values. The exogenous variables that only related to the

intervened variables are excluded from

Ut

.

C0:t

is given by the set

{Ct∪C0:t−1∪Y0:t−1∪X0:t−1}

.

where

I0:t−1=St−1

i=0 doX?

s,i =x?

s,i

denotes previous interventions,

1t>0

is the indicator function

and

P(Xt)

is the power set of

Xt

.

D(Xs,t)

represents the interventional domain of

Xs,t

. In the

sequel we denote the previously intervened variables by

IV

0:t−1=St−1

i=0 X?

s,i

and implemented

intervention levels by

IL

0:t−1=St−1

i=0 x?

s,i

. The cost of each intervention is given by

cost(Xs,t,xs,t )

.

In order to solve the problem in Eq. (1) we make the following assumptions :

Assumptions 1.

Denote by

G(t)

the causal graph including variables at time

t

in

G0:T

and let

YPT

t=Pa(Yt)∩Y0:t−1

be the set of variables in

G0:T

that are both parents of

Yt

and targets at

previous time step. Let the set

YPNT

t=Pa(Yt)\YPT

t

be the complement and denote by

fYt(·)

the

functional mapping for Ytin Mt. We make the following assumptions:

1. Invariance of causal structure: G(t) = G(0),∀t > 0.

2.

Additivity of

fYt(·)

that is

Yt=fYt(Pa(Yt)) +

with

fYt(Pa(Yt)) = fY

Y(YPT

t) + fNY

Y(YPNT

t)

where fY

Yand fNY

Yare two generic unknown functions and ∼ N(0, σ2).

3. Absence of unobserved confounders in G0:T.

Assumption (3) implies the absence of unobserved confounders at every time step. For instance,

this is the case in Fig. 1a. Still in this DAG, Assumption (2) implies

fYt(Pa(Yt)) = fY

Y(Yt−1) +

fNY

Y(Zt) + Yt,∀t > 0

. Finally, Assumption (1) implies the existence of the same variables at every

time step and a constant orientation of the edges among them for t > 0.

Notice that Assumptions 1 imply invariance of the causal structure within each time-slice, i.e. the

structure, edges and vertices, concerning the nodes with the same time index. This means that, across

time steps, both the graph and the functional relationships can change. Therefore, not only can the

causal effects change signiﬁcantly across time steps, but also the input dimensionality of the causal

functions we model, might change. For instance, in the DAG of Fig. 3(c), the target function for

Y2

has dimensionality 3 and a function

fYt(·)

that is completely different from the one assumed for

Y1

that has only two parents. We can thus model a wide variety of settings and causal effects despite this

4

assumption. Furthermore, even though we assume an additive structure for the functional relationship

on

Y

, the use of GPs allow us to have ﬂexible models with highly non-linear causal effects across

different graph structures. In the causality literature, G P models are well established and have shown

good performances compared to parametric linear and non-linear models (see e.g. [

31

,

34

,

37

]). The

sum of GPs gives a ﬂexible and computationally tractable model that can be used to capture highly

non-linear causal effects while helping with interpretability [12, 11].

3 Methodology

In this section we introduce Dynamic Causal Bayesian Optimization (DC BO), a novel methodology

addressing the problem in Eq. (1). We ﬁrst study the correlation among objective functions for two

consecutive time steps and use it to derive a recursion formula that, based on the topology of the

graph, expresses the causal effects at time

t

as a function of previously implemented interventions (see

square nodes in Fig. 2). Exploiting these results, we develop a new surrogate model for the objective

functions that can be used within a CBO framework to ﬁnd the optimal sequence of interventions.

This model enables the integration of observational data and interventional data, collected at previous

time-steps and interventional data collected at time

t

, thereby accelerating the identiﬁcation of the

current optimal intervention.

3.1 Characterization of the time structure in a DAG with time dependent variables

The following result provides a theoretical foundation for the dynamic causal GP model introduced

later. In particular, it derives a recursion formula allowing us to express the objective function at time

t

as a function of the objective functions corresponding to the optimal interventions at previous time

steps. The proof is given in the appendix (§2).

Deﬁnition 4.

Consider a DAG

G0:T

and the objective function

E[Yt|do(Xs,t =xs,t), I0:t−1]

for

a generic time step

t∈ {0, . . . , T }

. Denote by

YPT

t= (Pa(Yt)∩Y0:t−1)

the parents of

Yt

that

are targets at previous time steps and by

YPNT

t=Pa(Yt)\YPT

t

the remaining parents. For any

Xs,t ∈ P(Xt)and IV

0:t−1⊆X0:t−1we deﬁne the following sets:

•XPY

s,t =Xs,t ∩Pa(Yt)includes the variables in Xs,t that are parents of Yt.

•IPY

0:t−1=IV

0:t−1∩Pa(Yt)includes the variables in IV

0:t−1that are parents of Yt.

•W⊂Pa(Yt)

such that

Pa(Yt)=(Pa(Yt)∩Y0:t−1)∪XPY

s,t ∪IPY

0:t−1∪W

.

W

includes variables

that are parents of Ytbut are not targets nor intervened variables.

The values of I0:t−1,XPY

s,t ,IPY

0:t−1and Wwill be denoted by i,xPY ,iPY and wrespectively.

Theorem 1. Time operator

. Consider a DAG

G0:T

and the related SC M satisfying Assump-

tions 1. It is possible to prove that,

∀Xs,t ∈ P(Xt)

, the intervention function

fs,t(x) =

E[Yt|do(Xs,t =x),1t>0·I0:t−1]with fs,t(x) : D(Xs,t )→Rcan be written as:

fs,t(x) = fY

Y(f?) + Ep(w|do(Xs,t=x),i)fNY

Y(xPY ,iPY ,w)(2)

where

f?={EYi|doX?

s,i =x?

s,i, I0:i−1}Yi∈YPT

t

that is the set of previously observed optimal

targets that are parents of

Yt

.

fY

Y

denotes the function mapping

YPT

t

to

Yt

and

fNY

Y

represents the

function mapping YPNT

tto Yt.

Eq. (2) reduces to

Ep(w|do(Xs,t=x),i)fNY

Y(xPY ,iPY ,w)

when

Yt

does not depend on previous

targets. This is the setting considered in CBO that can be thus seen as a particular instance of DCBO.

Exploiting Assumptions (1), it is possible to further expand the second term in Eq. (2) to get the

following expression. A proof is given in the supplement (§2).

Corollary 1. Given Assumptions 1, we can write:

Ep(w|do(Xs,t=x),i)fNY

Y(xPY ,iPY ,w)=Ep(U0:t)fNY

Y(xPY ,iPY ,{C(W)}W∈W)(3)

where p(U0:t)is the distribution for the exogenous variables up to time tand C(W)is given by:

C(W) =

fW(uW,xPW ,iPW )if R=∅

fW(uW,xPW ,iPW , r)if R⊆Xs,t ∪IV

0:t−1

fW(uW,xPW ,iPW , C (R)) if R6⊆ Xs,t ∪IV

0:t−1

5

where

fW

represents the functional mapping for

W

in the SC M and

uW

is the set of exogenous

variables with edges into

W

.

xPW

and

iPW

are the values corresponding to

XPW

s,t

and

IPW

0:t−1

which in

turn represent the subset of variables in

Xs,t

and

IV

0:t−1

that are parents of

W

. Finally

r

is the value

of R=Pa(W)\(XPY

s,t ∪IPW

0:t−1).

Examples for Eq. (2):

For the DAG in Fig. 1(a), at time

t= 1

and with

IV

0:t−1={Z0}

, we

have

E[Y|do(Z1=z), I0] = fY

Y(y?

0) + fNY

Y(z)

. Indeed in this case

W=∅

,

xPY =z

and

f?={y?

0=E[Y0|do(Z0=z0)]}

. Still at

t= 1

and with

IV

0:t−1={Z0}

, the objective function

for

Xs,t ={X1}

can be written as

fY

Y(y?

0) + Ep(z1|do(X1=x),I0)fNY

Y(z1)

as

W={Z1}

. All

derivations for these expressions and alternative graphs are given in the supplement (§2).

3.2 Restricting the search space

The search space for the problem in Eq. (1) grows exponentially with

|Xt|

thus slowing down the

identiﬁcation of the optimal intervention when

G

includes more than a few nodes. Indeed, a naive

approach of ﬁnding X?

s,t at t= 0, . . . , T would be to explore the 2|Xt|sets in P(Xt)at every tand

keep

2|Xt|

models for the objective functions. In the static setting, CBO reduces the search space

by exploiting the results in [

21

]. In particular, it identiﬁes a subset of variables

M⊆ P(X)

worth

intervening on thus reducing the size of the exploration set to 2|M|.

In our dynamic setting, the objective functions change at every time step depending on the previously

implemented interventions and one would need to recompute

M

at every

t

. However, it is possible

to show that, given Assumptions 1, the search space remains constant over time. Denote by

Mt

the

set

M

at time

t

and let

M0

represent the set at

t= 0

which corresponds to

M

computed in CB O. For

t > 0it is possible to prove that:

Proposition 3.1. MI S in time. If Assumptions 1 are satisﬁed, Mt=M0for t > 0.

3.3 Dynamic Causal GP model

Here we introduce the Dynamic Causal GP model that is used as a surrogate model for the objective

functions in Eq. (1). The prior parameters are constructed by exploiting the recursion in Eq. (2). At

each time step

t

, the agent explores the sets in

Mt⊆ P(Xt)

by selecting the next intervention to be

the one maximizing a given acquisition function. The DCBO algorithm is shown in Algorithm 1.

Prior Surrogate Model

At each time step

t

and for each

Xs,t ∈Mt

, we place a GP prior on the

objective function fs,t(x) = E[Yt|do(Xs,t =x),1t>0·I0:t−1]. We construct the prior parameters

exploiting the recursive expression in Eq. (2):

fs,t(x)∼ GP(ms,t(x), ks,t (x,x0)) where

ms,t(x) = EhfY

Y(f?) + b

E[fNY

Y(xPY ,iPY ,w)]i

ks,t(x,x0) = kR BF (x,x0) + σs,t(x)σs,t (x0)with

σs,t(x) = qV[fY

Y(f?) + ˆ

EfNY

Y(xPY ,iPY ,w)

and kRBF (x,x0):= exp(−||x−x0||2

2l2)represents the radial basis function kernel [29]. We have it that

ˆ

EfNY

Y(xPY ,iPY ,w)=ˆ

Ep(w|do(Xs,t=x),i)fNY

Y(xPY ,iPY ,w)

represents the expected value of

fNY

Y(xPY ,iPY ,w)

with respect to

p(w|do(Xs,t =x),i)

which

is estimated via the do-calculus using observational data. The outer expectation in

ms,t(x)

and

the variance in

σs,t(x)

are computed with respect to

p(fY

Y, f NY

Y)

which is also estimated using

observational data. In this work we model

fY

Y

,

fNY

Y

and all functions in the SCM by independent G Ps.

Both

ms,t(x)

and

σs,t(x)

can be equivalently written by exploiting the equivalence in Eq. (3). In

both cases, this prior construction allows the integration of three different types of data: observational

data, interventional data collected at time

t

and the optimal interventional data points collected in

the past. The former is used to estimate the SCM model and p(w|do(Xs,t =x),i)via the rules of

6

do-calculus. The optimal interventional data points at

0 : t−1

determine the shift

fY

Y(f?)

while the

interventional data collected at time

t

are used to update the prior distribution on

fs,t(x)

. Similar

prior constructions were previously considered in static settings [

2

,

1

] where only observational and

interventional data at the current time step were used. The additional shift term appears here as

there exists causal dynamics in the target variables and the objective function is affected by previous

decisions. Fig. 2 in the appendix shows a synthetic example in which accounting for the dynamic

aspect in the prior formulation leads to a more accurate GP posterior compared to the baselines,

especially when the the optimum location changes across time steps.

Algorithm 1: DC BO

Data: DO,{DI

s,t=0}s∈{0,...,|M0|} ,G0:T,H.

Result: Optimal intervention path

{X?

s,t,x?

s,t, y ?

t}T

t=1

Initialise:M,DI

0and initial optimal DI

?=∅.

for t= 0, . . . , T do

1. Initialise dynamic causal GP models for all

Xs,t ∈Mtusing DI

?,t−1if t > 0.

2. Initialise interventional dataset

{DI

s,t}s∈{0,...,|Mt|}

for h= 1, . . . , H do

1. Compute EIs,t(x)for each Xs,t ∈Mt.

2. Obtain (s?, α?)

3. Intervene and augment DI

s=s?,t

4. Update posterior for fs=s?,t

end

3. Return the optimal intervention (X?

s,t,x?

s,t)

4. Append optimal interventional data

DI

?,t =DI

?,t−1∪((X?

s,t,x?

s,t), y ?

t)

end

Likelihood

Let

DI

s,t = (XI,YI

s,t)

be the

set of interventional data points collected for

Xs,t

with

XI

being a vector of intervention

values and

YI

s,t

representing the corresponding

vector of observed target values. As in stan-

dard BO we assume each

ys,t

in

YI

s,t

to be a

noisy observation of the function

fs,t(x)

that is

ys,t(x) = fs,t (x) + s,t

with

s,t ∼ N(0, σ 2)

for

s∈ {1,...,|Mt|}

and

t∈ {0, . . . , T }

. In

compact form, the joint likelihood function for

DI

s,t is p(YI

s,t |fs,t, σ2) = N(fs,t(XI), σ2I).

Acquisition Function

Given our surrogate

models at time

t

, the agent selects the interven-

tions to implement by solving a Causal Bayesian

Optimization problem [

2

]. The agent explores

the sets in

Mt

and decides where to intervene by

maximizing the Causal Expected Improvement

(EI). Denote by

y?

t

the optimal observed target

value in

{YI

s,t}|Mt|

s=1

that is the optimal observed target across all intervention sets at time

t

. The

Causal EI is given by

EIs,t (x) = Ep(ys,t )[max(ys,t −y?

t,0)]/cost(Xs,t,xs,t ).

Let

α1, . . . , α|Mt|

be solutions of the optimization of

EIs,t (x)

for each set in

Mt

and

α?:=

max{α1, . . . , α|Mt|}

. The next best intervention to explore at time

t

is given by

s?=

argmaxs∈{1,··· ,|Mt|} αs.

Therefore, the set-value pair to intervene on is

(s?, α?)

. At every

t

, the agent

implement

H

explorative interventions in the system which are selected by maximizing the Causal EI.

Once the budget

H

is exhausted, the agent implements what we call the decision intervention

It

, that

is the optimal intervention found at the current time step, and move forward to a new optimization

at

t+ 1

carrying the information in

y?

0:t−1

. The parameter

H

determines the level of exploration of

the system and acts as a budget for the CBO algorithm. Its value is determined by the agent and is

generally problem speciﬁc.

Posterior Surrogate Model

For any set

Xs,t ∈Mt

, the posterior distribution

p(fs,t | DI

s,t)

can be

derived analytically via standard GP updates. p(fs,t | DI

s,t)will also be a G P with parameters

ms,t(x| DI

s,t) = ms,t (x) + ks,t(x,XI)[ks,t (XI,XI) + σ2I](YI

s,t −ms,t(XI)and

ks,t(x,x0| DI

s,t) = ks,t (x,x0)−ks,t(x,XI)[ks,t (XI,XI) + σ2I]ks,t (XI,x0).

4 Experiments

We evaluate the performance of DC BO in a variety of synthetic and real world settings with DAGs

given in Fig. 3. We ﬁrst run the algorithm for a stationary setting where both the graph structure and

the SCM do not change over time (STAT.). We then consider a scenario characterised by increased

observation noise (NOIS Y) for the manipulative variables and a settings where observational data

are missing at some time steps (MI SS.). Still assuming stationarity, we then test the algorithm in

aDAG where there are multivariate interventions in

Mt

(MU LTI V.). Finally, we run DCB O for a

non-stationary graph where both the SCM and the DAG change over time (NO NSTAT.). To conclude,

7

we use DC BO to optimize the unemployment rate of a closed economy (DAG in Fig. 3d, ECO N.)

and to ﬁnd the optimal intervention in a system of ordinary differential equation modelling a real

predator-prey system (DAG in Fig. 3e, OD E). We provide a discussion on the applicability of DCBO to

real-world problems in §7 of the supplement together with all implementation details.

Baselines

We compare against the algorithms in Fig. 1. Note that, by constructions, ABO and BO

intervene on all manipulative variables while DCBO and CBO explore only

Mt

at every

t

. In addition,

both DCBO and ABO reduce to CBO and BO at the ﬁrst time step. We assume the availability of an

observational dataset DOand set a unit intervention cost for all variables.

Performance metric

We run all experiments for

10

replicates and show the average convergence

path at every time step. We then compute the values of a modiﬁed “gap” metric

3

across time steps

and with standard errors across replicates. The metric is deﬁned as

Gt=y(x?

s,t)−y(xinit )

y?−y(xinit)+H−H(x?

s,t)

H.1 + H−1

H(4)

where

y(·)

represents the evaluation of the objective function,

y?

is the global minimum, and

xinit

and

x?

s,t

are the ﬁrst and best evaluated point, respectively. The term

H−H(x?

s,t)

H

with

H(x?

s,t)

denoting

the number of explorative trials needed to reach

x?

s,t

captures the speed of the optimization. This term

is equal to zero when the algorithm is not converged and equal to

(H−1)/H

when the algorithm

converges at the ﬁrst trial. We have

0≤Gt≤1

with higher values denoting better performances. For

each method we also show the average percentage of replicates where the optimal intervention set

X?

s,t is identiﬁed.

X0X1X2

Z0Z1Z2

Y0Y1Y2

W0W1W2

(a) MULTIV.

X0X1X2

Z0Z1Z2

Y0Y1Y2

(b) IND.

X0X1X2

Z0Z1Z2

Y0Y1Y2

(c) NONSTAT.

R0R1R2

U2

U0U1

G2

G0G1

T0T1T2

(d) ECON.

Nin,t−1

Nt−1

Pt−1

Jt−1

Dt−1

At−1

Nin,t

Nt

Pt

Jt

Dt

At

·· ·

·· ·

·· ·

·· ·

·· ·

·· ·

·· ·

·· ·

·· ·

·· ·

Et−1

Et

·· ··· ·

(e) ODE.

Figure 3: DAGs used in the experimental sections for the real (§4.2) and synthetic data (§4.1).

4.1 Synthetic Experiments

Stationary DAG and SCM (S TAT.)

We run the algorithms for the DAG in Fig. 1(a) with

T= 3

and

N= 10

.For

t > 0

,DCBO converges to the optimal value faster than competing approaches (see

Fig. 2 in the supplement, right panel, 3

rd

row). DCB O identiﬁes the optimal intervention set in

93%

of the replicates (Table 2) and reaches the highest average gap metric (Table 1). In this experiment

the location of the optimum changes signiﬁcantly both in terms of optimal set and intervention

value when going from

t= 0

to

t= 1

. This information is incorporated by DCBO through the

prior dependency on

y?

0:t−1

. In addition, ABO performance improves over time as it accumulates

interventional data and uses them to ﬁt the temporal dimension of the surrogate model. This beneﬁts

ABO in a stationary setting but might penalise it in non-stationary setting where the objective functions

change signiﬁcantly.

Noisy manipulative variables (NOISY):

The beneﬁt of using DC BO becomes more apparent when

the manipulative variables observations are noisy while the evolution of the target variable is more

accurately detected. In this case both the convergence of DC BO and CBO are slowed down by noisy

observations which are diluting the information provided by the do-calculus making the priors less

informative. However, the D CBO prior dependency on

y?

0:t−1

allows it to correctly identify the shift

in the target variable thus improving the prior accuracy and the speed-up of the algorithm (Fig. 4).

Missing observational data (MISS.)

Incorporating dynamic information in the surrogate model

allows us to efﬁciently optimise a target variable even in setting where observational data are missing.

3This metric is a modiﬁed version of the one used in [18].

8

0 5 10 15

cost(Xs,t,xs,t)

−2

−1

y?

t

t= 0

0 5 10 15

cost(Xs,t,xs,t)

−4

−3t= 1

0 5 10 15

cost(Xs,t,xs,t)

−7

−6

−5t= 2 DCBO

CBO

ABO

BO

EYt|do(X?

s,t =x?

s,t)

Figure 4: Experiment NOISY. Convergence of DCB O and competing methods across replicates. The

dashed black line (- - -) gives the optimal outcome

y∗

t,∀t

. Shaded areas are

±

one standard deviation.

We consider the DAG in Fig. 1(a) with

T= 6

,

N= 10

for the ﬁrst three time steps and

N= 0

afterwards. DCBO uses the observational distributions learned with data from the ﬁrst three time steps

to construct the prior for

t > 3

. On the contrary, CBO uses the standard prior for

t > 3

. In this setting

DCBO consistently outperforms CBO at every time step. However, ABO performance improves over

time and outperforms DCBO starting from

t= 4

due to its ability to exploit all interventional data

collected over time (see Fig. 3 in the supplement).

Multivariate intervention sets (MULTIV.)

When the optimal intervention set is multivariate, both

DCBO and C BO convergence speed worsen. For instance, for the DAG in Fig. 3a,

|M|= 5

thus both

CBO and DC BO will have to perform more explorative interventions before ﬁnding the optimum. At

the same time, ABO and BO consider interventions only on

{Wt, Xt, Zt},∀t

and need to explore an

even higher intervention space. The performance of all methods decrease in this case (Table 1) but

DCBO still identiﬁes the optimal intervention set in 93% of the replicates (Table 2).

Independent manipulative variables (IN D.):

Having to explore multiple intervention sets signiﬁ-

cantly penalises DCBO and CBO when there is no causal relationship among manipulative variables

which are also the only parents of the target. This is the case for the DAG in Fig. 3b where the optimal

intervention is

{Xt, Zt}

at every time step. In this case, exploring

M

and propagating uncertainty in

the causal prior slows down DCB O convergence and decreases both its performance (Table 1) and

capability to identify the optimal intervention set (Table 2).

Non-stationary DAG and SCM (N ONSTAT.):

DCBO outperforms all approaches in non-stationary

settings where both the DAG and the SC M change overtime – see Fig. 3c. Indeed, DCBO can timely

incorporate changes in the system via the dynamic causal prior construction while CBO,B O and ABO

need to perform several interventions before accurately learning the new objective functions.

Table 1: Average

Gt

across 10 replicates and time steps. See Fig. 1 for a summary of the baselines.

Higher values are better. The best result for each experiment in bold. Standard errors in brackets.

Synthetic data Real data

STAT. MISS. NOISY MULTIV. IND. NO NSTAT. ECON.ODE

DCBO 0.88 0.84 0.75 0.49 0.48 0.69 0.64 0.67

(0.00) (0.01) (0.00) (0.01) (0.04) (0.00) (0.01) (0.00)

CBO 0.70 0.70 0.51 0.48 0.47 0.61 0.61 0.65

(0.01) (0.02) (0.02) (0.09) (0.07) (0.00) (0.01) (0.00)

ABO 0.56 0.49 0.49 0.39 0.54 0.38 0.57 0.48

(0.01) (0.02) (0.04) (0.21) (0.01) (0.02) (0.02) (0.01)

BO 0.54 0.48 0.38 0.35 0.50 0.38 0.50 0.44

(0.02) (0.03) (0.05) (0.08) (0.01) (0.03) (0.01) (0.03)

4.2 Real experiments

Real-World Economic data (ECON.)

We use D CBO to minimize the unemployment rate

Ut

of a

closed economy. We consider its causal relationships with economic growth (

Gt

), inﬂation rate

(

Rt

) and ﬁscal policy (

Tt

)

4

. Inspired by the economic example in [

17

] we consider the DAG in

4

The causality between economic variables is oversimpliﬁed in this example thus the results cannot be used

to guide public policy and are only meant to showcase how DCBO can be used within a real application.

9

Table 2: Average % of replicates across time steps for which

X?

s,t

is identiﬁed. See Fig. 1 for a

summary of the baselines. Higher values are better. The best result for each experiment in bold.

Synthetic data Real data

STAT. MISS. NOIS Y MULTIV. IND. NO NSTAT. ECO N.ODE

DCBO 93.00 58.00 100.00 93.00 93.00 100.00 86.67 33.3

CBO 90.00 85.00 90.00 90.0 90.00 100.00 93.33 33.3

ABO 0.00 0.00 0.00 0.00 100.00 0.00 66.67 0.00

BO 0.00 0.00 0.00 0.00 100.00 0.00 66.67 0.00

Fig. 3d where

Rt

and

Tt

are considered manipulative variables we need to intervene on to minimize

log(Ut)

at every time step. Time series data for 10 countries

5

are used to construct a non-parametric

simulator and to compute the causal prior for both DC BO and CBO.DCBO convergences to the optimal

intervention faster than competing approaches (see Table 1 and Fig. 6 in the appendix). The optimal

sequence of interventions found in this experiment is equal to

{(T0, R0) = (9.38,−2.00),(T1, R1) =

(0.53,6.00),(T2) = (0.012)}which is consistent with domain knowledge.

Planktonic predator–prey community in a chemostat (ODE)

We investigate a biological system

in which two species interact, one as a predator and the other as prey, with the goal of identifying

the intervention reducing the concentration of dead animals in the chemostat – see

Dt

in Fig. 3e.

We use the system of ordinary differential equations (ODE) given by [

6

] as our SC M and construct

the DAG by rolling out the temporal variable dependencies in the ODE while removing graph cycles.

Observational data are provided in [

6

] and are use to compute the dynamic causal prior. DCBO

outperforms competing methods in term of average gap metric and identiﬁes the optimum faster

(Table 1). Additional details can be found in the supplement (§6).

5 Conclusions

We consider the problem of ﬁnding a sequence of optimal interventions in a causal graph where

causal temporal dependencies exist between variables. We propose the Dynamic Causal Bayesian

Optimization (DC BO) algorithm which ﬁnds the optimal intervention at every time step by intervening

in the system according to a causal acquisition function. Importantly, for each possible intervention

we propose to use a surrogate model that incorporates information from previous interventions

implemented in the system. This is constructed by exploiting theoretical results establishing the

correlation structure among objective functions for two consecutive time steps as a function of the

topology of the causal graph. We discuss the DCBO performance in a variety of setting characterized

by different DAG properties and stationarity assumptions. Future work will focus on extending our

theoretical results to more general DAG structures thus allowing for unobserved confounders and a

changing DAG topology within each time step. In addition, we will work on combining the proposed

framework with a causal discovery algorithm so as to account for uncertainty in the graph structure.

Acknowledgements

This work was supported by the EPSRC grant EP/L016710/1, The Alan Turing Institute under EPSRC

grant EP/N510129/1, the Defence and Security Programme at The Alan Turing Institute, funded by

the UK Government and the Lloyds Register Foundation programme on Data Centric Engineering

through the London Air Quality project. TD acknowledges support from UKRI Turing AI Fellowship

(EP/V02678X/1).

5

Data were downloaded from

https://www.data.oecd.org/ [Accessed: 01/04/2021]

. All details

in the supplement.

10

References

[1]

Aglietti, V., Damoulas, T., Álvarez, M., and González, J. Multi-task causal learning with

gaussian processes. In Advances in Neural Information Processing Systems, volume 33, pp.

6293–6304, 2020.

[2]

Aglietti, V., Lu, X., Paleyes, A., and González, J. Causal Bayesian Optimization. In Proceedings

of the Twenty Third International Conference on Artiﬁcial Intelligence and Statistics, volume

108 of Proceedings of Machine Learning Research, pp. 3155–3164. PMLR, 26–28 Aug 2020.

[3]

Azimi, J., Jalali, A., and Fern, X. Dynamic batch bayesian optimization. arXiv preprint

arXiv:1110.3347, 2011.

[4]

Bareinboim, E., Forney, A., and Pearl, J. Bandits with unobserved confounders: A causal

approach. Advances in Neural Information Processing Systems, 28:1342–1350, 2015.

[5]

Besbes, O., Gur, Y., and Zeevi, A. Stochastic multi-armed-bandit problem with non-stationary

rewards. Advances in neural information processing systems, 27:199–207, 2014.

[6]

Blasius, B., Rudolf, L., Weithoff, G., Gaedke, U., and Fussmann, G. F. Long-term cyclic

persistence in an experimental predator–prey system. Nature, 577(7789):226–230, 2020.

[7]

Bogunovic, I., Scarlett, J., and Cevher, V. Time-varying gaussian process bandit optimization.

In Artiﬁcial Intelligence and Statistics, pp. 314–323. PMLR, 2016.

[8]

Buesing, L., Weber, T., Zwols, Y., Racaniere, S., Guez, A., Lespiau, J.-B., and Heess, N. Woulda,

coulda, shoulda: Counterfactually-guided policy search. arXiv preprint arXiv:1811.06272,

2018.

[9]

Cruz, C., González, J. R., and Pelta, D. A. Optimization in dynamic environments: a survey on

problems, methods and measures. Soft Computing, 15(7):1427–1448, 2011.

[10]

De, M. K., Slawomir, N. J., and Mark, B. Stochastic diffusion search: Partial function evaluation

in swarm intelligence dynamic optimisation. In Stigmergic optimization, pp. 185–207. Springer,

2006.

[11]

Duvenaud, D. Automatic model construction with Gaussian processes. PhD thesis, University

of Cambridge, 2014.

[12]

Duvenaud, D., Nickisch, H., and Rasmussen, C. E. Additive Gaussian Processes. arXiv preprint

arXiv:1112.4394, 2011.

[13]

Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. Counterfactual multi-agent

policy gradients. In Proceedings of the AAAI Conference on Artiﬁcial Intelligence, volume 32,

2018.

[14]

Fogel, L. J., Owens, A. J., and Walsh, M. J. Artiﬁcial intelligence through simulated evolution.

1966.

[15]

Glymour, C., Zhang, K., and Spirtes, P. Review of causal discovery methods based on graphical

models. Frontiers in Genetics, 10, 2019.

[16]

Goldberg, D. E. and Smith, R. E. Nonstationary function optimization using genetic algorithms

with dominance and diploidy. In Genetic algorithms and their applications: proceedings of the

second International Conference on Genetic Algorithms: July 28-31, 1987 at the Massachusetts

Institute of Technology, Cambridge, MA. Hillsdale, NJ: L. Erlhaum Associates, 1987., 1987.

[17]

Huang, B., Zhang, K., Gong, M., and Glymour, C. Causal discovery and forecasting in

nonstationary environments with state-space models. In International Conference on Machine

Learning, pp. 2901–2910. PMLR, 2019.

[18]

Huang, D., Allen, T. T., Notz, W. I., and Zeng, N. Global optimization of stochastic black-box

systems via sequential kriging meta-models. Journal of global optimization, 34(3):441–466,

2006.

11

[19]

Koller, D. and Friedman, N. Probabilistic Graphical Models: Principles and Techniques -

Adaptive Computation and Machine Learning. The MIT Press, 2009. ISBN 0262013193.

[20]

Lattimore, F., Lattimore, T., and Reid, M. D. Causal bandits: Learning good interventions via

causal inference. In Advances in Neural Information Processing Systems, pp. 1181–1189, 2016.

[21]

Lee, S. and Bareinboim, E. Structural causal bandits: where to intervene? Advances in Neural

Information Processing Systems 31, 31, 2018.

[22]

Lee, S. and Bareinboim, E. Structural causal bandits with non-manipulable variables. In

Proceedings of the AAAI Conference on Artiﬁcial Intelligence, volume 33, pp. 4164–4172,

2019.

[23]

Lu, C., Schölkopf, B., and Hernández-Lobato, J. M. Deconfounding reinforcement learning in

observational settings. arXiv preprint arXiv:1812.10576, 2018.

[24]

Madumal, P., Miller, T., Sonenberg, L., and Vetere, F. Explainable reinforcement learning

through a causal lens. In Proceedings of the AAAI Conference on Artiﬁcial Intelligence,

volume 34, pp. 2493–2500, 2020.

[25]

Nyikosa, F. M., Osborne, M. A., and Roberts, S. J. Bayesian optimization for dynamic problems,

2018.

[26] Pearl, J. Causal diagrams for empirical research. Biometrika, 82(4):669–688, 1995.

[27] Pearl, J. Causality: models, reasoning and inference, volume 29. Springer, 2000.

[28]

Pelta, D., Cruz, C., and Verdegay, J. L. Simple control rules in a cooperative system for dynamic

optimisation problems. International Journal of General Systems, 38(7):701–717, 2009.

[29]

Rasmussen, C. E. Gaussian processes in machine learning. In Summer School on Machine

Learning, pp. 63–71. Springer, 2003.

[30]

Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and De Freitas, N. Taking the human out of

the loop: A review of bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2015.

[31]

Silva, R. and Gramacy, R. B. Gaussian Process Structural Equation Models with Latent

Variables. arXiv preprint arXiv:1002.4802, 2010.

[32]

Trojanowski, K. and Wierzcho´

n, S. T. Immune-based algorithms for dynamic optimization.

Information Sciences, 179(10):1495–1515, 2009.

[33]

Villar, S. S., Bowden, J., and Wason, J. Multi-armed bandit models for the optimal design of

clinical trials: beneﬁts and challenges. Statistical science: a review journal of the Institute of

Mathematical Statistics, 30(2):199, 2015.

[34]

Witty, S., Takatsu, K., Jensen, D., and Mansinghka, V. Causal Inference using Gaussian

Processes with Structured Latent Confounders. In International Conference on Machine

Learning, pp. 10313–10323. PMLR, 2020.

[35]

Wu, Q., Iyer, N., and Wang, H. Learning contextual bandits in a non-stationary environment.

In The 41st International ACM SIGIR Conference on Research & Development in Information

Retrieval, pp. 495–504, 2018.

[36]

Zhang, J. and Bareinboim, E. Near-optimal reinforcement learning in dynamic treatment

regimes. In Advances in Neural Information Processing Systems 32 pre-proceedings (NeurIPS),

2019.

[37]

Zhang, K., Schölkopf, B., and Janzing, D. Invariant Gaussian Process Latent Variable Models

and Application in Causal Discovery. arXiv preprint arXiv:1203.3534, 2012.

12