Dynamic Causal Bayesian Optimization
Virginia Aglietti
University of Warwick
The Alan Turing Institute
V.Aglietti@warwick.ac.uk
Neil Dhir
The Alan Turing Institute
ndhir@turing.ac.uk
Javier González
Microsoft Research Cambridge
Gonzalez.Javier@microsoft.com
Theodoros Damoulas
University of Warwick
The Alan Turing Institute
T.Damoulas@warwick.ac.uk
Abstract
This paper studies the problem of performing a sequence of optimal interventions in a causal dynamical system where both the target variable of interest and the inputs evolve over time. This problem arises in a variety of domains, e.g. systems biology and operational research. Dynamic Causal Bayesian Optimization (DCBO) brings together ideas from sequential decision making, causal inference and Gaussian process (GP) emulation. DCBO is useful in scenarios where all causal effects in a graph are changing over time. At every time step DCBO identifies a locally optimal intervention by integrating both observational and past interventional data collected from the system. We give theoretical results detailing how one can transfer interventional information across time steps and define a dynamic causal GP model which can be used to quantify uncertainty and find optimal interventions in practice. We demonstrate how DCBO identifies optimal interventions faster than competing approaches in multiple settings and applications.
1 Introduction
[Figure 1: DAG representation of a dynamic causal global optimisation (DCGO) problem (a) and the DAGs considered when using CBO (b), ABO (c) or BO (d) to address the same problem. Shaded nodes give observed variables while the arrows represent causal effects. Along the causal and temporal dimensions the four methods optimise, respectively: BO f(X = x); ABO f(X = x, t); CBO f(do(X_s = x_s)); DCBO f(do(X_{s,t} = x_{s,t}), I_{0:t−1}).]
Solving decision making problems in a variety of domains requires an understanding of the cause-effect relationships in a system. This understanding can be obtained through experimentation. However, deciding how to intervene at every point in time is particularly complex in dynamical systems, due to the evolving nature of causal effects. For instance, companies need to decide how to allocate scarce resources across different quarters. In systems biology, scientists need to identify genes to knock out at specific points in time. This paper describes a probabilistic framework that finds such optimal interventions over time.
Focusing on a specific example, consider a setting in which Y_t denotes the unemployment rate of an economy at time t, Z_t is the economic growth and X_t the inflation rate. Fig. 1a depicts the causal graph [26] representing an agent's understanding of the causal links between these variables. The agent aims to determine, at each time step t ∈ {0, 1, 2}, the optimal action to perform in order to minimize the current unemployment rate
Denotes equal contribution
35th Conference on Neural Information Processing Systems (NeurIPS 2021).
Y_t while accounting for the intervention cost. The investigator could frame this setting as a sequence of global optimization problems and find the solutions by resorting to Causal Bayesian Optimization [CBO, 2]. CBO extends Bayesian Optimization [BO, 30] to cases in which the variable to optimize is part of a causal model where a sequence of interventions can be performed. However, CBO does not account for the system's temporal evolution, thus breaking the time dependency structure existing among variables (Fig. 1b). This leads to sub-optimal solutions, especially in non-stationary scenarios. The same happens when using Adaptive Bayesian Optimization [ABO, 25] (Fig. 1c) or BO (Fig. 1d). ABO captures the time dependency of the objective function but considers neither the causal structure among inputs nor their temporal evolution. BO disregards both the temporal and the causal structure. Our setting differs from both reinforcement learning (RL) and the multi-armed bandit (MAB) setting. Differently from MAB, we consider interventions on continuous variables where the dynamic target variable has a non-stationary interventional distribution. In addition, compared to RL, we do not model the state dynamics explicitly and allow the agent to perform a number of explorative interventions, which do not change the underlying state of the system, before selecting the optimal action. We discuss these points further in §1.1.
Dynamic Causal Bayesian Optimization², henceforth DCBO, accounts for both the causal relationships among input variables and the causality between inputs and outputs, which might evolve over time. DCBO integrates CBO with dynamic Bayesian networks (DBNs), offering a novel approach for decision making under uncertainty within dynamical systems. DBNs [19] are commonly used in time-series modelling and carry dependence assumptions that do not imply causation. Instead, in probabilistic causal models [27], which form the basis for the CBO framework, graphs are built around causal information and allow us to reason about the effects of different interventions. By combining CBO with DBNs, the proposed methodology finds an optimal sequence of interventions which accounts for the causal temporal dynamics of the system. In addition, DCBO takes into account past optimal interventions and transfers this information across time, thus identifying the optimal intervention faster than competing approaches and at a lower cost. We make the following contributions:
• We formulate a new class of optimization problems called Dynamic Causal Global Optimization (DCGO) where the objective functions account for the temporal causal dynamics among variables.
• We give theoretical results demonstrating how interventional information can be transferred across time steps depending on the topology of the causal graph.
• Exploiting our theoretical results, we solve the optimization problem with DCBO. At every time step, DCBO constructs surrogate models for different intervention sets by integrating various sources of data while accounting for past interventions.
• We analyze DCBO performance in a variety of settings, comparing against CBO, ABO and BO.
1.1 Related Work
Dynamic Optimization. Optimization in dynamic environments has been studied in the context of evolutionary algorithms [14, 16]. More recently, other optimization techniques [28, 32, 10] have been adapted to dynamic settings; see e.g. [9] for a review. Focusing on BO, the literature on dynamic settings [3, 7, 25] is limited. The dynamic BO framework closest to this work is given by Nyikosa et al. [25] and focuses on functions defined on continuous spaces that follow a more complex behaviour than a simple Markov model. ABO treats the inputs as fixed and not as random variables, thereby disregarding their temporal evolution and, more importantly, breaking their causal dependencies.
Causal Optimization. Causal BO [CBO, 2] focuses instead on the causal aspect of optimization and solves the problem of finding an optimal intervention in a DAG by modelling the intervention functions with single GPs or a multi-task GP model [1]. CBO disregards the existence of a temporal evolution in both the inputs and the output variable, treating them as i.i.d. over time. While disregarding time significantly simplifies the problem, it prevents the identification of an optimal intervention at every t.
Bandits and RL. In the broader decision-making literature, causal relationships have been considered in the context of bandits [4, 20–22] and reinforcement learning [23, 8, 13, 36, 24]. In these cases, actions, or arms, correspond to interventions on a causal graph where there exist complex relationships between the agent's decisions and the received rewards. While dynamic settings have been considered in non-causal bandit algorithms [5, 33, 35], causal MAB approaches have focused on static settings. Dynamic settings are instead considered by RL algorithms and formalized through Markov decision processes (MDPs).
²A Python implementation is available at: https://github.com/neildhir/DCBO.
In the current formulation, DCBO does not consider an MDP as we do not have a notion of state and therefore do not require an explicit model of its dynamics. The system is fully specified by the causal model. As in BO, we focus on identifying a set of time-indexed optimal actions rather than an optimal policy. We allow the agent to perform explorative interventions that do not lead to state transitions. More importantly, differently from both MAB and RL, we allow for the integration of both observational and interventional data. An expanded discussion of the reasons why DCBO should be used and the links between DCBO, CBO, ABO and RL is included in the supplement (§8).
2 Background and Problem Statement
Let random variables and values be denoted by upper-case and lower-case letters respectively. Vectors are shown in bold. do(X = x) represents an intervention on X whose value is set to x. p(Y | X = x) represents an observational distribution and p(Y | do(X = x)) represents an interventional distribution. The latter is the distribution of Y obtained by intervening on X and fixing its value to x in the data generating mechanism (see Fig. 2), irrespective of the values of its parents. Evaluating p(Y | do(X = x)) requires "real" interventions while p(Y | X = x) only requires "observing" the system. D^O and D^I denote observational and interventional datasets respectively.
Consider a structural causal model (SCM) as defined in Definition 1.

Definition 1. (Structural Causal Model) [27, p. 203]. A structural causal model M is a triple ⟨U, V, F⟩ where U is a set of background variables (also called exogenous), that are determined by factors outside of the model. V is a set {V_1, V_2, ..., V_|V|} of observable variables (also called endogenous), that are determined by variables in the model (i.e., determined by variables in U ∪ V). F is a set of functions {f_1, f_2, ..., f_n} such that each f_i is a mapping from the respective domains of U_i ∪ Pa(V_i) to V_i, where U_i ⊆ U and Pa(V_i) ⊆ V \ V_i, and the entire set F forms a mapping from U to V. In other words, each f_i in {f_i : v_i ← f_i(pa(v_i), u_i) | i = 1, ..., n} assigns a value to V_i that depends on the values of the select set of variables U_i ∪ Pa(V_i).
M is associated with a directed acyclic graph (DAG) G, in which each node corresponds to a variable and the directed edges point from members of Pa(V_i) and U_i to V_i. We assume G to be known and leave the integration with causal discovery [15] methods for future work. Within V, we distinguish between three different types of variables: non-manipulative variables C, which cannot be modified; treatment variables X, which can be set to specific values; and the output variable Y, which represents the agent's outcome of interest. Exploiting the rules of do-calculus [27], one can compute p(Y | do(X = x)) using observational data. This often involves evaluating intractable integrals which can be approximated by using observational data to get a Monte Carlo estimate p̂(Y | do(X = x)) ≈ p(Y | do(X = x)). These approximations will be consistent when the number of samples drawn from p(V) is large.
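To make the do-notation concrete, the following sketch (ours, with illustrative structural equations, not the paper's) estimates E[Y | do(X = x)] by Monte Carlo: intervening amounts to replacing a structural equation with a constant before sampling the SCM.

```python
import random

# Toy SCM over V = {X, Z, Y} with graph X -> Z -> Y and Gaussian exogenous noise.
# Structural equations (illustrative choices only):
#   X = eps_X,  Z = 2*X + eps_Z,  Y = -Z + eps_Y
def sample_scm(do=None, rng=random):
    """Draw one sample from the SCM; `do` maps variable name -> fixed value."""
    do = do or {}
    x = do.get("X", rng.gauss(0, 1))
    z = do.get("Z", 2 * x + rng.gauss(0, 1))
    y = do.get("Y", -z + rng.gauss(0, 1))
    return {"X": x, "Z": z, "Y": y}

def mc_interventional_mean(var, value, target="Y", n=50_000, seed=0):
    """Monte Carlo estimate of E[target | do(var = value)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += sample_scm(do={var: value}, rng=rng)[target]
    return total / n

# Under the equations above, E[Y | do(X = 1)] = -(2*1) = -2.
print(mc_interventional_mean("X", 1.0))  # close to -2.0
```

The estimate is consistent in the sense stated above: its error shrinks as the number of samples n grows.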
Causality in time. One can encode the existence of causal mechanisms across time steps by explicitly representing these relationships with edges in an extended graph denoted by G_{0:T}. For instance, the DAG in Fig. 1(a) can be seen as one of the DAGs in Fig. 1(b) propagated in time. The DAG in Fig. 1(a) captures both the causal structure existing across time steps and the causal mechanism within every "time-slice" t [19]. In order to reason about interventions that are implemented sequentially, that is, where at time t we decide which intervention to perform in the system, we define:
Definition 2. M_t is the SCM at time step t, defined as M_t = ⟨U_{0:t}, V_{0:t}, F_{0:t}⟩ where 0:t denotes the union of the corresponding variables or functions up to time t (see Fig. 2). V_{0:t} includes X_{0:t} = X_t, Y_{0:t} = Y_t and C_{0:t} = C_t ∪ C_{0:t−1}. The functions in F_{0:t} corresponding to intervened variables are replaced by constant values while the exogenous variables related to them are excluded from U_{0:t}.
Definition 3. G_t is the causal graph associated with M_t. In G_t, the incoming edges into variables intervened on at 0:t−1 are mutilated, while intervened variables are represented by deterministic nodes (squares); see Fig. 2.
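The graph surgery of Definition 3 amounts to deleting the incoming edges of intervened nodes. A minimal sketch over an edge-list DAG (our illustration, using the two-slice version of the DAG in Fig. 1(a)):

```python
def mutilate(edges, intervened):
    """Return a copy of a DAG (list of (parent, child) edges) with all
    incoming edges of the intervened nodes removed, as in do-surgery."""
    return [(p, c) for (p, c) in edges if c not in intervened]

# Two time slices of the DAG in Fig. 1(a): X_t -> Z_t -> Y_t plus temporal edges.
edges = [("X0", "Z0"), ("Z0", "Y0"),
         ("X0", "X1"), ("Z0", "Z1"), ("Y0", "Y1"),
         ("X1", "Z1"), ("Z1", "Y1")]

# After implementing do(Z0 = z0), Z0 becomes a deterministic node:
print(mutilate(edges, {"Z0"}))  # the edge (X0, Z0) is gone; (Z0, Z1) remains
```

Note that only the edges *into* Z0 are removed; Z0 still influences its children, which is exactly why past interventions propagate forward in time.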
Dynamic Causal Global Optimization (DCGO). The goal of this work is to find a sequence of interventions, optimizing a target variable at each time step, in a causal DAG. Given G_t and M_t, at every time step t we wish to optimize Y_t by intervening on a subset of the manipulative variables X_t. The optimal intervention variables X⋆_{s,t} and intervention levels x⋆_{s,t} are given by:

X⋆_{s,t}, x⋆_{s,t} = argmin_{X_{s,t} ∈ P(X_t), x_{s,t} ∈ D(X_{s,t})} E[Y_t | do(X_{s,t} = x_{s,t}), 1_{t>0} · I_{0:t−1}]    (1)
[Figure 2: Structural equation models M_t considered by DCBO at every time step t ∈ {0, 1, 2} for the DAG of Fig. 1(a). At t = 0 the SCM is X_0 = f_{X_0}(ε_{X_0}), Z_0 = f_{Z_0}(X_0, ε_{Z_0}), Y_0 = f_{Y_0}(Z_0, ε_{Y_0}); after each decision intervention the structural function of the intervened variable is replaced by a constant, e.g. Z_0 = f^I_{Z_0}(·) = z_0 after I_0 = (Z_0, z_0). Exogenous noise variables ε_i are depicted in the figure but are omitted in the remainder of the paper to avoid clutter. For every t, G_t is a mutilated version of G_{t−1} reflecting the optimal interventions implemented in the system at 0:t−1, which are represented by squares. The SCM functions in F_{0:t} corresponding to the intervened variables are set to constant values. The exogenous variables related only to the intervened variables are excluded from U_t. C_{0:t} is given by the set {C_t ∪ C_{0:t−1} ∪ Y_{0:t−1} ∪ X_{0:t−1}}.]
where I_{0:t−1} = ∪_{i=0}^{t−1} do(X⋆_{s,i} = x⋆_{s,i}) denotes previous interventions, 1_{t>0} is the indicator function and P(X_t) is the power set of X_t. D(X_{s,t}) represents the interventional domain of X_{s,t}. In the sequel we denote the previously intervened variables by I^V_{0:t−1} = ∪_{i=0}^{t−1} X⋆_{s,i} and the implemented intervention levels by I^L_{0:t−1} = ∪_{i=0}^{t−1} x⋆_{s,i}. The cost of each intervention is given by cost(X_{s,t}, x_{s,t}).
In order to solve the problem in Eq. (1) we make the following assumptions.

Assumptions 1. Denote by G(t) the causal graph including the variables at time t in G_{0:T}, and let Y^{PT}_t = Pa(Y_t) ∩ Y_{0:t−1} be the set of variables in G_{0:T} that are both parents of Y_t and targets at previous time steps. Let the set Y^{PNT}_t = Pa(Y_t) \ Y^{PT}_t be the complement and denote by f_{Y_t}(·) the functional mapping for Y_t in M_t. We make the following assumptions:
1. Invariance of causal structure: G(t) = G(0), ∀t > 0.
2. Additivity of f_{Y_t}(·), that is, Y_t = f_{Y_t}(Pa(Y_t)) + ε with f_{Y_t}(Pa(Y_t)) = f^Y_Y(Y^{PT}_t) + f^{NY}_Y(Y^{PNT}_t), where f^Y_Y and f^{NY}_Y are two generic unknown functions and ε ∼ N(0, σ²).
3. Absence of unobserved confounders in G_{0:T}.
Assumption (3) implies the absence of unobserved confounders at every time step. For instance, this is the case in Fig. 1a. Still in this DAG, Assumption (2) implies Y_t = f^Y_Y(Y_{t−1}) + f^{NY}_Y(Z_t) + ε_{Y_t}, ∀t > 0. Finally, Assumption (1) implies the existence of the same variables at every time step and a constant orientation of the edges among them for t > 0.
Notice that Assumptions 1 imply invariance of the causal structure within each time-slice, i.e. the structure, edges and vertices, concerning the nodes with the same time index. This means that, across time steps, both the graph and the functional relationships can change. Therefore, not only can the causal effects change significantly across time steps, but the input dimensionality of the causal functions we model might also change. For instance, in the DAG of Fig. 3(c), the target function for Y_2 has dimensionality 3 and a function f_{Y_t}(·) that is completely different from the one assumed for Y_1, which has only two parents. We can thus model a wide variety of settings and causal effects despite this assumption. Furthermore, even though we assume an additive structure for the functional relationship on Y, the use of GPs allows us to have flexible models with highly non-linear causal effects across different graph structures. In the causality literature, GP models are well established and have shown good performance compared to parametric linear and non-linear models (see e.g. [31, 34, 37]). The sum of GPs gives a flexible and computationally tractable model that can be used to capture highly non-linear causal effects while helping with interpretability [12, 11].
3 Methodology
In this section we introduce Dynamic Causal Bayesian Optimization (DCBO), a novel methodology addressing the problem in Eq. (1). We first study the correlation among objective functions at two consecutive time steps and use it to derive a recursion formula that, based on the topology of the graph, expresses the causal effects at time t as a function of previously implemented interventions (see square nodes in Fig. 2). Exploiting these results, we develop a new surrogate model for the objective functions that can be used within a CBO framework to find the optimal sequence of interventions. This model enables the integration of observational data, interventional data collected at previous time steps, and interventional data collected at time t, thereby accelerating the identification of the current optimal intervention.
3.1 Characterization of the time structure in a DAG with time dependent variables
The following result provides a theoretical foundation for the dynamic causal GP model introduced later. In particular, it derives a recursion formula allowing us to express the objective function at time t as a function of the objective functions corresponding to the optimal interventions at previous time steps. The proof is given in the appendix (§2).
Definition 4. Consider a DAG G_{0:T} and the objective function E[Y_t | do(X_{s,t} = x_{s,t}), I_{0:t−1}] for a generic time step t ∈ {0, ..., T}. Denote by Y^{PT}_t = Pa(Y_t) ∩ Y_{0:t−1} the parents of Y_t that are targets at previous time steps and by Y^{PNT}_t = Pa(Y_t) \ Y^{PT}_t the remaining parents. For any X_{s,t} ∈ P(X_t) and I^V_{0:t−1} ⊆ X_{0:t−1} we define the following sets:
• X^{PY}_{s,t} = X_{s,t} ∩ Pa(Y_t) includes the variables in X_{s,t} that are parents of Y_t.
• I^{PY}_{0:t−1} = I^V_{0:t−1} ∩ Pa(Y_t) includes the variables in I^V_{0:t−1} that are parents of Y_t.
• W ⊂ Pa(Y_t) such that Pa(Y_t) = (Pa(Y_t) ∩ Y_{0:t−1}) ∪ X^{PY}_{s,t} ∪ I^{PY}_{0:t−1} ∪ W. W includes the variables that are parents of Y_t but are neither targets nor intervened variables.
The values of I_{0:t−1}, X^{PY}_{s,t}, I^{PY}_{0:t−1} and W will be denoted by i, x^{PY}, i^{PY} and w respectively.
Theorem 1 (Time operator). Consider a DAG G_{0:T} and the related SCM satisfying Assumptions 1. It is possible to prove that, ∀ X_{s,t} ∈ P(X_t), the intervention function f_{s,t}(x) = E[Y_t | do(X_{s,t} = x), 1_{t>0} · I_{0:t−1}] with f_{s,t}(x) : D(X_{s,t}) → R can be written as:

f_{s,t}(x) = f^Y_Y(f⋆) + E_{p(w | do(X_{s,t} = x), i)}[f^{NY}_Y(x^{PY}, i^{PY}, w)]    (2)

where f⋆ = {E[Y_i | do(X⋆_{s,i} = x⋆_{s,i}), I_{0:i−1}]}_{Y_i ∈ Y^{PT}_t}, that is, the set of previously observed optimal targets that are parents of Y_t. f^Y_Y denotes the function mapping Y^{PT}_t to Y_t and f^{NY}_Y represents the function mapping Y^{PNT}_t to Y_t.
Eq. (2) reduces to E_{p(w | do(X_{s,t} = x), i)}[f^{NY}_Y(x^{PY}, i^{PY}, w)] when Y_t does not depend on previous targets. This is the setting considered in CBO, which can thus be seen as a particular instance of DCBO. Exploiting Assumptions 1, it is possible to further expand the second term in Eq. (2) to get the following expression. A proof is given in the supplement (§2).

Corollary 1. Given Assumptions 1, we can write:

E_{p(w | do(X_{s,t} = x), i)}[f^{NY}_Y(x^{PY}, i^{PY}, w)] = E_{p(U_{0:t})}[f^{NY}_Y(x^{PY}, i^{PY}, {C(W)}_{W ∈ W})]    (3)

where p(U_{0:t}) is the distribution of the exogenous variables up to time t and C(W) is given by:

C(W) = f_W(u_W, x^{PW}, i^{PW})           if R = ∅
C(W) = f_W(u_W, x^{PW}, i^{PW}, r)        if R ⊆ X_{s,t} ∪ I^V_{0:t−1}
C(W) = f_W(u_W, x^{PW}, i^{PW}, C(R))     if R ⊈ X_{s,t} ∪ I^V_{0:t−1}

where f_W represents the functional mapping for W in the SCM and u_W is the set of exogenous variables with edges into W. x^{PW} and i^{PW} are the values corresponding to X^{PW}_{s,t} and I^{PW}_{0:t−1}, which in turn represent the subsets of variables in X_{s,t} and I^V_{0:t−1} that are parents of W. Finally, r is the value of R = Pa(W) \ (X^{PW}_{s,t} ∪ I^{PW}_{0:t−1}).
Examples for Eq. (2): For the DAG in Fig. 1(a), at time t = 1 and with I^V_{0:t−1} = {Z_0}, we have E[Y_1 | do(Z_1 = z), I_0] = f^Y_Y(y⋆_0) + f^{NY}_Y(z). Indeed, in this case W = ∅, x^{PY} = z and f⋆ = {y⋆_0 = E[Y_0 | do(Z_0 = z_0)]}. Still at t = 1 and with I^V_{0:t−1} = {Z_0}, the objective function for X_{s,t} = {X_1} can be written as f^Y_Y(y⋆_0) + E_{p(z_1 | do(X_1 = x), I_0)}[f^{NY}_Y(z_1)] as W = {Z_1}. All derivations for these expressions and alternative graphs are given in the supplement (§2).
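The first example can be checked numerically on a toy additive SCM. In the sketch below (our illustrative functions, not the paper's), f^Y_Y is the identity and f^{NY}_Y(z) = −z, so the simulated objective E[Y_1 | do(Z_0 = z_0), do(Z_1 = z)] should match the Eq. (2) prediction f^Y_Y(y⋆_0) + f^{NY}_Y(z) = −z_0 − z:

```python
import random

def simulate_y1(z0, z1, n=50_000, seed=0):
    """Monte Carlo estimate of E[Y1 | do(Z0 = z0), do(Z1 = z1)] for the toy
    additive SCM  Y0 = -Z0 + eps,  Y1 = Y0 - Z1 + eps  (illustrative only)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y0 = -z0 + rng.gauss(0, 1)       # Y0 after do(Z0 = z0)
        y1 = y0 - z1 + rng.gauss(0, 1)   # Y1 = f^Y_Y(Y0) + f^{NY}_Y(Z1) + noise
        total += y1
    return total / n

# Eq. (2): f_{s,1}(z) = f^Y_Y(f*) + f^{NY}_Y(z) with f* = y*_0 = -z0,
# f^Y_Y the identity and f^{NY}_Y(z) = -z, hence the objective is -z0 - z.
z0, z = 1.0, 2.0
print(simulate_y1(z0, z))  # close to -3.0
print(-z0 - z)             # Eq. (2) prediction: -3.0
```

With linear additive mechanisms the decomposition is exact, so simulation and recursion agree up to Monte Carlo error.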
3.2 Restricting the search space
The search space for the problem in Eq. (1) grows exponentially with |X_t|, thus slowing down the identification of the optimal intervention when G includes more than a few nodes. Indeed, a naive approach to finding X⋆_{s,t} at t = 0, ..., T would be to explore the 2^{|X_t|} sets in P(X_t) at every t and keep 2^{|X_t|} models for the objective functions. In the static setting, CBO reduces the search space by exploiting the results in [21]. In particular, it identifies a subset of variables M ⊆ P(X) worth intervening on, thus reducing the size of the exploration set to 2^{|M|}.
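The 2^{|X_t|} growth is easy to see by enumerating the power set of the manipulative variables; a short sketch (ours, using the standard itertools recipe):

```python
from itertools import chain, combinations

def power_set(variables):
    """All candidate intervention sets in P(X_t), including the empty set."""
    return list(chain.from_iterable(
        combinations(variables, r) for r in range(len(variables) + 1)))

X_t = ["X", "Z", "W"]
sets = power_set(X_t)
print(len(sets))  # 2**3 = 8 candidate intervention sets
```

CBO and DCBO instead restrict exploration to the subset M of these sets, which is what makes larger graphs tractable.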
In our dynamic setting, the objective functions change at every time step depending on the previously implemented interventions, and one would need to recompute M at every t. However, it is possible to show that, given Assumptions 1, the search space remains constant over time. Denote by M_t the set M at time t and let M_0 represent the set at t = 0, which corresponds to the M computed in CBO. For t > 0 it is possible to prove that:

Proposition 3.1 (MIS in time). If Assumptions 1 are satisfied, M_t = M_0 for t > 0.
3.3 Dynamic Causal GP model
Here we introduce the dynamic causal GP model that is used as a surrogate model for the objective functions in Eq. (1). The prior parameters are constructed by exploiting the recursion in Eq. (2). At each time step t, the agent explores the sets in M_t ⊆ P(X_t) by selecting the next intervention to be the one maximizing a given acquisition function. The DCBO algorithm is shown in Algorithm 1.
Prior Surrogate Model. At each time step t and for each X_{s,t} ∈ M_t, we place a GP prior on the objective function f_{s,t}(x) = E[Y_t | do(X_{s,t} = x), 1_{t>0} · I_{0:t−1}]. We construct the prior parameters exploiting the recursive expression in Eq. (2): f_{s,t}(x) ∼ GP(m_{s,t}(x), k_{s,t}(x, x′)) where

m_{s,t}(x) = E[f^Y_Y(f⋆) + Ê[f^{NY}_Y(x^{PY}, i^{PY}, w)]]
k_{s,t}(x, x′) = k_{RBF}(x, x′) + σ_{s,t}(x) σ_{s,t}(x′)    with
σ_{s,t}(x) = sqrt(V[f^Y_Y(f⋆) + Ê[f^{NY}_Y(x^{PY}, i^{PY}, w)]])

and k_{RBF}(x, x′) := exp(−||x − x′||² / (2l²)) represents the radial basis function kernel [29]. Here Ê[f^{NY}_Y(x^{PY}, i^{PY}, w)] = Ê_{p(w | do(X_{s,t} = x), i)}[f^{NY}_Y(x^{PY}, i^{PY}, w)] represents the expected value of f^{NY}_Y(x^{PY}, i^{PY}, w) with respect to p(w | do(X_{s,t} = x), i), which is estimated via the do-calculus using observational data. The outer expectation in m_{s,t}(x) and the variance in σ_{s,t}(x) are computed with respect to p(f^Y_Y, f^{NY}_Y), which is also estimated using observational data. In this work we model f^Y_Y, f^{NY}_Y and all functions in the SCM with independent GPs.
Both m_{s,t}(x) and σ_{s,t}(x) can be equivalently written by exploiting the equivalence in Eq. (3). In both cases, this prior construction allows the integration of three different types of data: observational data, interventional data collected at time t, and the optimal interventional data points collected in the past. The first is used to estimate the SCM model and p(w | do(X_{s,t} = x), i) via the rules of do-calculus. The optimal interventional data points at 0:t−1 determine the shift f^Y_Y(f⋆), while the interventional data collected at time t are used to update the prior distribution on f_{s,t}(x). Similar prior constructions were previously considered in static settings [2, 1] where only observational and interventional data at the current time step were used. The additional shift term appears here as there exist causal dynamics in the target variables and the objective function is affected by previous decisions. Fig. 2 in the appendix shows a synthetic example in which accounting for the dynamic aspect in the prior formulation leads to a more accurate GP posterior compared to the baselines, especially when the location of the optimum changes across time steps.
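The prior covariance combines an RBF kernel with the causal variance term σ_{s,t}(x)σ_{s,t}(x′). A minimal sketch (ours), where a placeholder function stands in for the do-calculus-based estimate of σ_{s,t}:

```python
import math

def k_rbf(x, x2, l=1.0):
    """Radial basis function kernel for scalar inputs."""
    return math.exp(-(x - x2) ** 2 / (2 * l ** 2))

def sigma_st(x):
    """Placeholder for sigma_{s,t}(x) = sqrt(V[...]); in DCBO this is
    estimated from observational data via the do-calculus (assumed form)."""
    return 0.5 + 0.1 * abs(x)

def k_dcbo(x, x2):
    """Dynamic causal GP prior covariance: RBF plus the causal variance term."""
    return k_rbf(x, x2) + sigma_st(x) * sigma_st(x2)

print(k_dcbo(0.0, 0.0))  # 1.0 + 0.5*0.5 = 1.25
```

The additive structure means the prior never collapses to zero uncertainty where the do-calculus estimate is uncertain, even far from interventional data.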
Algorithm 1: DCBO
Data: D^O, {D^I_{s,t=0}}_{s ∈ {0,...,|M_0|}}, G_{0:T}, H.
Result: Optimal intervention path {X⋆_{s,t}, x⋆_{s,t}, y⋆_t}_{t=1}^T.
Initialise: M, D^I_0 and initial optimal D^I_⋆ = ∅.
for t = 0, ..., T do
    1. Initialise dynamic causal GP models for all X_{s,t} ∈ M_t, using D^I_{⋆,t−1} if t > 0.
    2. Initialise the interventional dataset {D^I_{s,t}}_{s ∈ {0,...,|M_t|}}.
    for h = 1, ..., H do
        1. Compute EI_{s,t}(x) for each X_{s,t} ∈ M_t.
        2. Obtain (s⋆, α⋆).
        3. Intervene and augment D^I_{s=s⋆,t}.
        4. Update the posterior for f_{s=s⋆,t}.
    end
    3. Return the optimal intervention (X⋆_{s,t}, x⋆_{s,t}).
    4. Append the optimal interventional data: D^I_{⋆,t} = D^I_{⋆,t−1} ∪ ((X⋆_{s,t}, x⋆_{s,t}), y⋆_t).
end
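The control flow of Algorithm 1 can be sketched as a runnable skeleton. The surrogate models and the Causal EI maximisation are replaced below by toy stand-ins; all names, the toy objective and the random candidate grid are our illustrative assumptions, not the reference implementation:

```python
import random

rng = random.Random(0)

def toy_objective(s, x, t, past):
    """Stand-in for E[Y_t | do(X_{s,t} = x), I_{0:t-1}] (illustrative only)."""
    return (x - t) ** 2 + 0.1 * s - sum(past)

def causal_ei_argmax(s, t, past, n_candidates=20):
    """Stand-in for maximising EI_{s,t}(x): best of random candidate levels."""
    xs = [rng.uniform(-2, 2) for _ in range(n_candidates)]
    x = min(xs, key=lambda v: toy_objective(s, v, t, past))
    return x, -toy_objective(s, x, t, past)   # (level, acquisition value)

def dcbo(M, T, H):
    """Skeleton of Algorithm 1: outer loop over time, inner loop over trials."""
    past, path = [], []                        # y*_{0:t-1} and intervention path
    for t in range(T + 1):
        D_t = {s: [] for s in M}               # interventional data D^I_{s,t}
        for _ in range(H):                     # H explorative interventions
            cands = {s: causal_ei_argmax(s, t, past) for s in M}
            s_star = max(cands, key=lambda s: cands[s][1])
            x_star = cands[s_star][0]
            y = toy_objective(s_star, x_star, t, past)   # intervene in the system
            D_t[s_star].append((x_star, y))              # augment D^I_{s*,t}
        s_opt, (x_opt, y_opt) = min(           # decision intervention I_t
            ((s, min(d, key=lambda p: p[1])) for s, d in D_t.items() if d),
            key=lambda c: c[1][1])
        past.append(y_opt)                     # carry y*_{0:t} forward
        path.append((s_opt, x_opt, y_opt))
    return path

print(dcbo(M=[0, 1], T=2, H=5))  # three (set, level, outcome) triples
```

The key structural point is the dependence of each time step on `past`: the optimal outcomes y⋆_{0:t−1} feed into the next step's surrogate, which is what distinguishes DCBO from running CBO independently at every t.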
Likelihood. Let D^I_{s,t} = (X^I, Y^I_{s,t}) be the set of interventional data points collected for X_{s,t}, with X^I being a vector of intervention values and Y^I_{s,t} representing the corresponding vector of observed target values. As in standard BO, we assume each y_{s,t} in Y^I_{s,t} to be a noisy observation of the function f_{s,t}(x), that is, y_{s,t}(x) = f_{s,t}(x) + ε_{s,t} with ε_{s,t} ∼ N(0, σ²) for s ∈ {1, ..., |M_t|} and t ∈ {0, ..., T}. In compact form, the joint likelihood function for D^I_{s,t} is p(Y^I_{s,t} | f_{s,t}, σ²) = N(f_{s,t}(X^I), σ²I).
Acquisition Function. Given our surrogate models at time t, the agent selects the interventions to implement by solving a Causal Bayesian Optimization problem [2]. The agent explores the sets in M_t and decides where to intervene by maximizing the Causal Expected Improvement (EI). Denote by y⋆_t the optimal observed target value in {Y^I_{s,t}}_{s=1}^{|M_t|}, that is, the optimal observed target across all intervention sets at time t. The Causal EI is given by EI_{s,t}(x) = E_{p(y_{s,t})}[max(y_{s,t} − y⋆_t, 0)]/cost(X_{s,t}, x_{s,t}). Let α_1, ..., α_{|M_t|} be the solutions of the optimization of EI_{s,t}(x) for each set in M_t and let α⋆ := max{α_1, ..., α_{|M_t|}}. The next best intervention to explore at time t is given by s⋆ = argmax_{s ∈ {1,...,|M_t|}} α_s. Therefore, the set-value pair to intervene on is (s⋆, α⋆). At every t, the agent implements H explorative interventions in the system, which are selected by maximizing the Causal EI. Once the budget H is exhausted, the agent implements what we call the decision intervention I_t, that is, the optimal intervention found at the current time step, and moves forward to a new optimization at t + 1, carrying the information in y⋆_{0:t−1}. The parameter H determines the level of exploration of the system and acts as a budget for the CBO algorithm. Its value is determined by the agent and is generally problem specific.
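The expectation inside the Causal EI can be estimated by Monte Carlo from the surrogate posterior p(y_{s,t}). The sketch below (ours) implements E[max(y − y⋆, 0)]/cost for a Gaussian y and checks the estimate against the closed-form expectation; the Gaussian parameters and the unit cost are illustrative:

```python
import math, random

def ei_analytic(mu, sigma, y_star):
    """Closed form of E[max(Y - y_star, 0)] for Y ~ N(mu, sigma^2)."""
    u = (mu - y_star) / sigma
    phi = math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)   # standard normal pdf
    Phi = 0.5 * (1 + math.erf(u / math.sqrt(2)))            # standard normal cdf
    return (mu - y_star) * Phi + sigma * phi

def ei_monte_carlo(mu, sigma, y_star, cost=1.0, n=200_000, seed=0):
    """EI_{s,t}(x) = E_{p(y)}[max(y - y_star, 0)] / cost, via sampling."""
    rng = random.Random(seed)
    total = sum(max(rng.gauss(mu, sigma) - y_star, 0.0) for _ in range(n))
    return total / n / cost

print(ei_monte_carlo(0.0, 1.0, 0.5))  # close to ei_analytic(0.0, 1.0, 0.5)
```

Dividing by cost(X_{s,t}, x_{s,t}) trades off expected improvement against intervention cost, as in the experiments where unit costs are assumed.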
Posterior Surrogate Model. For any set X_{s,t} ∈ M_t, the posterior distribution p(f_{s,t} | D^I_{s,t}) can be derived analytically via standard GP updates. p(f_{s,t} | D^I_{s,t}) will also be a GP with parameters

m_{s,t}(x | D^I_{s,t}) = m_{s,t}(x) + k_{s,t}(x, X^I)[k_{s,t}(X^I, X^I) + σ²I]^{−1}(Y^I_{s,t} − m_{s,t}(X^I)) and
k_{s,t}(x, x′ | D^I_{s,t}) = k_{s,t}(x, x′) − k_{s,t}(x, X^I)[k_{s,t}(X^I, X^I) + σ²I]^{−1} k_{s,t}(X^I, x′).
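These are standard GP conditioning formulas. A minimal numpy sketch, assuming a zero prior mean and a plain RBF kernel for illustration:

```python
import numpy as np

def rbf(A, B, l=1.0):
    """RBF kernel matrix between 1-D input arrays A and B."""
    d = A[:, None] - B[None, :]
    return np.exp(-d ** 2 / (2 * l ** 2))

def gp_posterior(x, X, Y, sigma2=1e-4):
    """Posterior mean/variance of a zero-mean GP at points x given data (X, Y):
    m(x|D)   = k(x,X)[k(X,X) + sigma2 I]^{-1} Y
    k(x,x|D) = k(x,x) - k(x,X)[k(X,X) + sigma2 I]^{-1} k(X,x)."""
    K_inv = np.linalg.inv(rbf(X, X) + sigma2 * np.eye(len(X)))
    kx = rbf(x, X)
    mean = kx @ K_inv @ Y
    cov = rbf(x, x) - kx @ K_inv @ kx.T
    return mean, np.diag(cov)

X = np.array([0.0, 1.0, 2.0])   # intervention levels X^I
Y = np.array([0.0, 1.0, 0.0])   # observed targets Y^I_{s,t}
m, v = gp_posterior(np.array([1.0]), X, Y)
print(m)  # close to the observed value 1.0 at the training input
```

At a training input the posterior mean collapses onto the observed value (up to the noise level σ²), which is the behaviour the DCBO surrogate inherits.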
4 Experiments
We evaluate the performance of DCBO in a variety of synthetic and real-world settings with the DAGs given in Fig. 3. We first run the algorithm in a stationary setting where both the graph structure and the SCM do not change over time (STAT.). We then consider a scenario characterised by increased observation noise (NOISY) for the manipulative variables, and a setting where observational data are missing at some time steps (MISS.). Still assuming stationarity, we then test the algorithm in a DAG where there are multivariate interventions in M_t (MULTIV.). Finally, we run DCBO for a non-stationary graph where both the SCM and the DAG change over time (NONSTAT.). To conclude, we use DCBO to optimize the unemployment rate of a closed economy (DAG in Fig. 3d, ECON.) and to find the optimal intervention in a system of ordinary differential equations modelling a real predator-prey system (DAG in Fig. 3e, ODE). We provide a discussion on the applicability of DCBO to real-world problems in §7 of the supplement, together with all implementation details.
Baselines. We compare against the algorithms in Fig. 1. Note that, by construction, ABO and BO intervene on all manipulative variables while DCBO and CBO explore only M_t at every t. In addition, both DCBO and ABO reduce to CBO and BO respectively at the first time step. We assume the availability of an observational dataset D^O and set a unit intervention cost for all variables.
Performance metric. We run all experiments for 10 replicates and show the average convergence path at every time step. We then compute the values of a modified "gap" metric³ across time steps, with standard errors across replicates. The metric is defined as

G_t = [ (y(x⋆_{s,t}) − y(x_init)) / (y⋆ − y(x_init)) + (H − H(x⋆_{s,t})) / H ] / (1 + (H − 1)/H)    (4)

where y(·) represents the evaluation of the objective function, y⋆ is the global minimum, and x_init and x⋆_{s,t} are the first and best evaluated points, respectively. The term (H − H(x⋆_{s,t}))/H, with H(x⋆_{s,t}) denoting the number of explorative trials needed to reach x⋆_{s,t}, captures the speed of the optimization. This term is equal to zero when the algorithm has not converged and equal to (H − 1)/H when the algorithm converges at the first trial. We have 0 ≤ G_t ≤ 1, with higher values denoting better performance. For each method we also show the average percentage of replicates in which the optimal intervention set X⋆_{s,t} is identified.
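Eq. (4) is straightforward to implement; the sketch below (ours) assumes a minimisation problem and sets the speed term to zero when the global optimum was not reached:

```python
def gap_metric(y_best, y_init, y_opt, H, H_best):
    """Modified gap metric G_t of Eq. (4): solution quality plus a speed term
    (H - H_best)/H, normalised so that 0 <= G_t <= 1. H_best is the number of
    explorative trials needed to reach x*_{s,t}; the speed term is zero when
    the algorithm has not converged to the global minimum y_opt."""
    quality = (y_best - y_init) / (y_opt - y_init)
    speed = (H - H_best) / H if y_best == y_opt else 0.0
    return (quality + speed) / (1 + (H - 1) / H)

# Converged at the first of H = 10 trials: G_t = (1 + 9/10) / (1 + 9/10) = 1.
print(gap_metric(y_best=-2.0, y_init=0.0, y_opt=-2.0, H=10, H_best=1))  # 1.0
```

The normalising denominator 1 + (H − 1)/H is the largest value the bracketed sum can take, which is what keeps G_t in [0, 1].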
[Figure 3: DAGs used in the experimental sections for the real (§4.2) and synthetic (§4.1) data: (a) MULTIV., (b) IND., (c) NONSTAT., (d) ECON. (over variables R_t, U_t, G_t, T_t), and (e) ODE (over variables N_in,t, N_t, P_t, J_t, A_t, D_t, E_t).]
4.1 Synthetic Experiments
Stationary DAG and SCM (STAT.). We run the algorithms for the DAG in Fig. 1(a) with T = 3 and N = 10. For t > 0, DCBO converges to the optimal value faster than competing approaches (see Fig. 2 in the supplement, right panel, 3rd row). DCBO identifies the optimal intervention set in 93% of the replicates (Table 2) and reaches the highest average gap metric (Table 1). In this experiment the location of the optimum changes significantly, both in terms of the optimal set and the intervention value, when going from t = 0 to t = 1. This information is incorporated by DCBO through the prior dependency on y⋆_{0:t−1}. In addition, ABO performance improves over time as it accumulates interventional data and uses them to fit the temporal dimension of the surrogate model. This benefits ABO in a stationary setting but might penalise it in non-stationary settings where the objective functions change significantly.
Noisy manipulative variables (NOISY): The benefit of using DCBO becomes more apparent when the observations of the manipulative variables are noisy while the evolution of the target variable is detected more accurately. In this case the convergence of both DCBO and CBO is slowed down by the noisy observations, which dilute the information provided by the do-calculus and make the priors less informative. However, the DCBO prior dependency on y*_{0:t−1} allows it to correctly identify the shift in the target variable, thus improving the prior accuracy and the speed-up of the algorithm (Fig. 4).
Missing observational data (MISS.): Incorporating dynamic information in the surrogate model allows us to efficiently optimise a target variable even in settings where observational data are missing.
³This metric is a modified version of the one used in [18].
Figure 4: Experiment NOISY. Convergence of DCBO and competing methods across replicates, plotted against the cumulative intervention cost cost(X_{s,t}, x_{s,t}) for t = 0, 1, 2. The dashed black line (- - -) gives the optimal outcome y*_t, ∀t. Shaded areas are ± one standard deviation.
We consider the DAG in Fig. 1(a) with T = 6 and N = 10 for the first three time steps, and N = 0 afterwards. DCBO uses the observational distributions learned with data from the first three time steps to construct the prior for t > 3. On the contrary, CBO uses the standard prior for t > 3. In this setting DCBO consistently outperforms CBO at every time step. However, ABO's performance improves over time and it outperforms DCBO starting from t = 4, due to its ability to exploit all interventional data collected over time (see Fig. 3 in the supplement).
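The fallback described here — reusing observational distributions learned at earlier steps when later steps have no data — can be sketched in a few lines. This is a deliberately simplified placeholder for the dynamic causal prior (a per-step empirical mean rather than a do-calculus estimate); the function and data layout are our own illustration, not the paper's implementation:

```python
def prior_mean(observations, t):
    """Return a scalar prior mean for time step t.

    observations: dict mapping time step -> list of observed target values.
    When step t has no samples (N = 0), fall back to the most recent step
    that does; with no data at all, revert to a standard zero-mean prior.
    """
    while t >= 0 and not observations.get(t):
        t -= 1  # fall back to the latest step with observational data
    if t < 0:
        return 0.0  # standard (zero-mean) prior, as CBO would use
    ys = observations[t]
    return sum(ys) / len(ys)
```

In the MISS. setting above, steps t > 3 would hit the fallback branch and reuse the distributions fitted on the first three steps, while CBO's standard prior corresponds to the zero-mean case.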
Multivariate intervention sets (MULTIV.): When the optimal intervention set is multivariate, the convergence speed of both DCBO and CBO worsens. For instance, for the DAG in Fig. 3a, |M| = 5, thus both CBO and DCBO have to perform more explorative interventions before finding the optimum. At the same time, ABO and BO consider interventions only on {W_t, X_t, Z_t}, ∀t, and need to explore an even larger intervention space. The performance of all methods decreases in this case (Table 1), but DCBO still identifies the optimal intervention set in 93% of the replicates (Table 2).
Independent manipulative variables (IND.): Having to explore multiple intervention sets significantly penalises DCBO and CBO when there is no causal relationship among the manipulative variables, which are also the only parents of the target. This is the case for the DAG in Fig. 3b, where the optimal intervention is {X_t, Z_t} at every time step. In this case, exploring M and propagating uncertainty in the causal prior slow down DCBO's convergence and decrease both its performance (Table 1) and its capability to identify the optimal intervention set (Table 2).
Non-stationary DAG and SCM (NONSTAT.): DCBO outperforms all approaches in non-stationary settings where both the DAG and the SCM change over time – see Fig. 3c. Indeed, DCBO can promptly incorporate changes in the system via the dynamic causal prior construction, while CBO, BO and ABO need to perform several interventions before accurately learning the new objective functions.
Table 1: Average G_t across 10 replicates and time steps. See Fig. 1 for a summary of the baselines. Higher values are better. The best result for each experiment is in bold. Standard errors in brackets.

                              Synthetic data                                        Real data
       STAT.        MISS.        NOISY        MULTIV.      IND.         NONSTAT.     ECON.        ODE
DCBO   0.88 (0.00)  0.84 (0.01)  0.75 (0.00)  0.49 (0.01)  0.48 (0.04)  0.69 (0.00)  0.64 (0.01)  0.67 (0.00)
CBO    0.70 (0.01)  0.70 (0.02)  0.51 (0.02)  0.48 (0.09)  0.47 (0.07)  0.61 (0.00)  0.61 (0.01)  0.65 (0.00)
ABO    0.56 (0.01)  0.49 (0.02)  0.49 (0.04)  0.39 (0.21)  0.54 (0.01)  0.38 (0.02)  0.57 (0.02)  0.48 (0.01)
BO     0.54 (0.02)  0.48 (0.03)  0.38 (0.05)  0.35 (0.08)  0.50 (0.01)  0.38 (0.03)  0.50 (0.01)  0.44 (0.03)
4.2 Real experiments

Real-World Economic data (ECON.): We use DCBO to minimize the unemployment rate U_t of a closed economy. We consider its causal relationships with economic growth (G_t), the inflation rate (R_t) and fiscal policy (T_t).⁴ Inspired by the economic example in [17], we consider the DAG in

⁴The causal relationships between economic variables are oversimplified in this example; the results cannot be used to guide public policy and are only meant to showcase how DCBO can be used within a real application.
Table 2: Average % of replicates across time steps for which X*_{s,t} is identified. See Fig. 1 for a summary of the baselines. Higher values are better. The best result for each experiment is in bold.

                      Synthetic data                             Real data
       STAT.    MISS.    NOISY     MULTIV.   IND.      NONSTAT.   ECON.    ODE
DCBO   93.00    58.00    100.00    93.00     93.00     100.00     86.67    33.30
CBO    90.00    85.00    90.00     90.00     90.00     100.00     93.33    33.30
ABO    0.00     0.00     0.00      0.00      100.00    0.00       66.67    0.00
BO     0.00     0.00     0.00      0.00      100.00    0.00       66.67    0.00
Fig. 3d, where R_t and T_t are the manipulative variables we need to intervene on in order to minimize log(U_t) at every time step. Time series data for 10 countries⁵ are used to construct a non-parametric simulator and to compute the causal prior for both DCBO and CBO. DCBO converges to the optimal intervention faster than competing approaches (see Table 1 and Fig. 6 in the appendix). The optimal sequence of interventions found in this experiment is {(T_0, R_0) = (9.38, 2.00), (T_1, R_1) = (0.53, 6.00), (T_2) = (0.012)}, which is consistent with domain knowledge.
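The core mechanism in this experiment — performing do-interventions on T_t and R_t and observing the resulting target — can be illustrated with a toy SCM. The structural equations and coefficients below are entirely invented for illustration and are not the paper's economic simulator; the point is only how do(·) replaces a structural equation before the target is sampled:

```python
import random

def sample_log_unemployment(t_fiscal, r_inflation, do=None, seed=0):
    """Sample log-unemployment from a toy, hand-made SCM.

    `do` overrides structural equations, e.g. do={"T": 9.38, "R": 2.0},
    mimicking do(T_t = t, R_t = r). All coefficients are made up.
    """
    rng = random.Random(seed)
    do = do or {}
    T = do.get("T", t_fiscal + rng.gauss(0, 0.1))     # fiscal policy
    R = do.get("R", r_inflation + rng.gauss(0, 0.1))  # inflation rate
    G = 2.0 - 0.1 * T + rng.gauss(0, 0.1)             # growth depends on T
    return 1.0 - 0.2 * G + 0.05 * R + rng.gauss(0, 0.01)  # log(U_t)
```

A DCBO-style search would evaluate this sampler under different do-assignments at each time step and pick the one minimising the expected outcome.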
Planktonic predator–prey community in a chemostat (ODE): We investigate a biological system in which two species interact, one as predator and the other as prey, with the goal of identifying the intervention that reduces the concentration of dead animals in the chemostat – see D_t in Fig. 3e. We use the system of ordinary differential equations (ODE) given by [6] as our SCM and construct the DAG by rolling out the temporal variable dependencies in the ODE while removing graph cycles. Observational data are provided in [6] and are used to compute the dynamic causal prior. DCBO outperforms competing methods in terms of average gap metric and identifies the optimum faster (Table 1). Additional details can be found in the supplement (§6).
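The "rolling out" step above — discretising a continuous-time ODE into per-step variables that become DAG nodes — can be sketched with a toy system. The dynamics below are a simplified Lotka-Volterra-style caricature with made-up parameters, not the chemostat model of [6]; only the rollout pattern is the point:

```python
def simulate(steps=3, dt=0.01, substeps=1000):
    """Euler-integrate a toy prey (N) / predator (P) / dead-biomass (D)
    system and record one 'time slice' per step, mimicking how the ODE is
    rolled out into DAG nodes N_t, P_t, D_t."""
    n, p, d = 1.0, 0.5, 0.0
    slices = []
    for _ in range(steps):
        for _ in range(substeps):
            dn = 0.5 * n - 0.4 * n * p   # prey growth minus predation
            dp = 0.3 * n * p - 0.2 * p   # predator growth minus mortality
            dd = 0.2 * p                 # dead animals accumulate
            n, p, d = n + dt * dn, p + dt * dp, d + dt * dd
        slices.append({"N": n, "P": p, "D": d})
    return slices
```

Each dictionary in the returned list corresponds to one time slice of the unrolled DAG; D grows monotonically, which is why it is the natural target to minimise via interventions.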
5 Conclusions

We consider the problem of finding a sequence of optimal interventions in a causal graph where causal temporal dependencies exist between variables. We propose the Dynamic Causal Bayesian Optimization (DCBO) algorithm, which finds the optimal intervention at every time step by intervening in the system according to a causal acquisition function. Importantly, for each possible intervention we propose to use a surrogate model that incorporates information from previous interventions implemented in the system. This model is constructed by exploiting theoretical results establishing the correlation structure among objective functions at two consecutive time steps as a function of the topology of the causal graph. We discuss the performance of DCBO in a variety of settings characterized by different DAG properties and stationarity assumptions. Future work will focus on extending our theoretical results to more general DAG structures, thus allowing for unobserved confounders and a changing DAG topology within each time step. In addition, we will work on combining the proposed framework with a causal discovery algorithm so as to account for uncertainty in the graph structure.
Acknowledgements

This work was supported by the EPSRC grant EP/L016710/1, The Alan Turing Institute under EPSRC grant EP/N510129/1, the Defence and Security Programme at The Alan Turing Institute, funded by the UK Government, and the Lloyd's Register Foundation programme on Data Centric Engineering through the London Air Quality project. TD acknowledges support from a UKRI Turing AI Fellowship (EP/V02678X/1).

⁵Data were downloaded from https://www.data.oecd.org/ [Accessed: 01/04/2021]. All details in the supplement.
References
[1] Aglietti, V., Damoulas, T., Álvarez, M., and González, J. Multi-task causal learning with Gaussian processes. In Advances in Neural Information Processing Systems, volume 33, pp. 6293–6304, 2020.
[2] Aglietti, V., Lu, X., Paleyes, A., and González, J. Causal Bayesian Optimization. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pp. 3155–3164. PMLR, 26–28 Aug 2020.
[3] Azimi, J., Jalali, A., and Fern, X. Dynamic batch Bayesian optimization. arXiv preprint arXiv:1110.3347, 2011.
[4] Bareinboim, E., Forney, A., and Pearl, J. Bandits with unobserved confounders: A causal approach. Advances in Neural Information Processing Systems, 28:1342–1350, 2015.
[5] Besbes, O., Gur, Y., and Zeevi, A. Stochastic multi-armed-bandit problem with non-stationary rewards. Advances in Neural Information Processing Systems, 27:199–207, 2014.
[6] Blasius, B., Rudolf, L., Weithoff, G., Gaedke, U., and Fussmann, G. F. Long-term cyclic persistence in an experimental predator–prey system. Nature, 577(7789):226–230, 2020.
[7] Bogunovic, I., Scarlett, J., and Cevher, V. Time-varying Gaussian process bandit optimization. In Artificial Intelligence and Statistics, pp. 314–323. PMLR, 2016.
[8] Buesing, L., Weber, T., Zwols, Y., Racaniere, S., Guez, A., Lespiau, J.-B., and Heess, N. Woulda, coulda, shoulda: Counterfactually-guided policy search. arXiv preprint arXiv:1811.06272, 2018.
[9] Cruz, C., González, J. R., and Pelta, D. A. Optimization in dynamic environments: a survey on problems, methods and measures. Soft Computing, 15(7):1427–1448, 2011.
[10] De, M. K., Slawomir, N. J., and Mark, B. Stochastic diffusion search: Partial function evaluation in swarm intelligence dynamic optimisation. In Stigmergic Optimization, pp. 185–207. Springer, 2006.
[11] Duvenaud, D. Automatic model construction with Gaussian processes. PhD thesis, University of Cambridge, 2014.
[12] Duvenaud, D., Nickisch, H., and Rasmussen, C. E. Additive Gaussian processes. arXiv preprint arXiv:1112.4394, 2011.
[13] Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[14] Fogel, L. J., Owens, A. J., and Walsh, M. J. Artificial Intelligence through Simulated Evolution. 1966.
[15] Glymour, C., Zhang, K., and Spirtes, P. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10, 2019.
[16] Goldberg, D. E. and Smith, R. E. Nonstationary function optimization using genetic algorithms with dominance and diploidy. In Proceedings of the Second International Conference on Genetic Algorithms and Their Applications, July 28–31, 1987, Massachusetts Institute of Technology, Cambridge, MA. Hillsdale, NJ: L. Erlbaum Associates, 1987.
[17] Huang, B., Zhang, K., Gong, M., and Glymour, C. Causal discovery and forecasting in nonstationary environments with state-space models. In International Conference on Machine Learning, pp. 2901–2910. PMLR, 2019.
[18] Huang, D., Allen, T. T., Notz, W. I., and Zeng, N. Global optimization of stochastic black-box systems via sequential kriging meta-models. Journal of Global Optimization, 34(3):441–466, 2006.
[19] Koller, D. and Friedman, N. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, 2009. ISBN 0262013193.
[20] Lattimore, F., Lattimore, T., and Reid, M. D. Causal bandits: Learning good interventions via causal inference. In Advances in Neural Information Processing Systems, pp. 1181–1189, 2016.
[21] Lee, S. and Bareinboim, E. Structural causal bandits: where to intervene? Advances in Neural Information Processing Systems, 31, 2018.
[22] Lee, S. and Bareinboim, E. Structural causal bandits with non-manipulable variables. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 4164–4172, 2019.
[23] Lu, C., Schölkopf, B., and Hernández-Lobato, J. M. Deconfounding reinforcement learning in observational settings. arXiv preprint arXiv:1812.10576, 2018.
[24] Madumal, P., Miller, T., Sonenberg, L., and Vetere, F. Explainable reinforcement learning through a causal lens. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 2493–2500, 2020.
[25] Nyikosa, F. M., Osborne, M. A., and Roberts, S. J. Bayesian optimization for dynamic problems, 2018.
[26] Pearl, J. Causal diagrams for empirical research. Biometrika, 82(4):669–688, 1995.
[27] Pearl, J. Causality: Models, Reasoning and Inference, volume 29. Springer, 2000.
[28] Pelta, D., Cruz, C., and Verdegay, J. L. Simple control rules in a cooperative system for dynamic optimisation problems. International Journal of General Systems, 38(7):701–717, 2009.
[29] Rasmussen, C. E. Gaussian processes in machine learning. In Summer School on Machine Learning, pp. 63–71. Springer, 2003.
[30] Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and De Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2015.
[31] Silva, R. and Gramacy, R. B. Gaussian process structural equation models with latent variables. arXiv preprint arXiv:1002.4802, 2010.
[32] Trojanowski, K. and Wierzchoń, S. T. Immune-based algorithms for dynamic optimization. Information Sciences, 179(10):1495–1515, 2009.
[33] Villar, S. S., Bowden, J., and Wason, J. Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges. Statistical Science, 30(2):199, 2015.
[34] Witty, S., Takatsu, K., Jensen, D., and Mansinghka, V. Causal inference using Gaussian processes with structured latent confounders. In International Conference on Machine Learning, pp. 10313–10323. PMLR, 2020.
[35] Wu, Q., Iyer, N., and Wang, H. Learning contextual bandits in a non-stationary environment. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 495–504, 2018.
[36] Zhang, J. and Bareinboim, E. Near-optimal reinforcement learning in dynamic treatment regimes. In Advances in Neural Information Processing Systems, 32, 2019.
[37] Zhang, K., Schölkopf, B., and Janzing, D. Invariant Gaussian process latent variable models and application in causal discovery. arXiv preprint arXiv:1203.3534, 2012.
12
... P (G|D) without approximations in both Eqs. (3) and (4) and later in the acquisition function. Regarding convergence to δ G=G , the key requirements needed are: causal sufficiency; that R G includes G; all variables can be manipulated; infinite samples can be obtained from each node; causal minimality. ...
... Optimal Causal Decision Making The literature on causal decision making has mainly focused on finding the optimal treatment regime using observational data [72,7,24]. The idea of identifying the optimal action or policy by performing interventions in a causal system has been explored in causal bandits [37], causal reinforcement learning [73] and, more recently, in BO [2,1,3]. Importantly, all these approaches assume exact knowledge of the causal relationships beforehand, an assumption that is often not met in practice. ...
... Finally, notice also that causal discovery methods which do not (1) collect interventions with active learning (2) Performance measures We evaluate performance by assessing the convergence speed to the optimum value of Y as measured by the total cumulative cost of interventions taken where the cost of a single intervention is given by the number of variables in the intervened set. Further, we also evaluate our approach using the GAP metric introduced in Aglietti et al. [3,Eq. (4)]. ...
Full-text available
Preprint
We study the problem of globally optimizing the causal effect on a target variable of an unknown causal graph in which interventions can be performed. This problem arises in many areas of science including biology, operations research and healthcare. We propose Causal Entropy Optimization (CEO), a framework that generalizes Causal Bayesian Optimization (CBO) to account for all sources of uncertainty, including the one arising from the causal graph structure. CEO incorporates the causal structure uncertainty both in the surrogate models for the causal effects and in the mechanism used to select interventions via an information-theoretic acquisition function. The resulting algorithm automatically trades-off structure learning and causal effect optimization, while naturally accounting for observation noise. For various synthetic and real-world structural causal models, CEO achieves faster convergence to the global optimum compared with CBO while also learning the graph. Furthermore, our joint approach to structure learning and causal optimization improves upon sequential, structure-learning-first approaches.
... To find the optimal interventions which the BA should take to counter the actions of the RA, we propose the use of DCBO which was introduced to identify optimal interventions in precisely this kind of setting. Furthermore, it was shown to converge faster than competing methods in a variety of similar dynamic scenarios (Aglietti et al., 2021). DCBO is able to provide the optimal sequence of interventions, accounting for causal temporal dynamics, in a dynamical system such as that of a network under attack. ...
... Using causal inference notation we seek the interventional plan: {do(X * 0 = x * 0 ) , do(X * 1 = x * 1 ) , . . . , do(X * T = x * T )} which is found by solving a dynamic causal global optimisation (Aglietti et al., 2021) problem of the form: ...
... This setting makes a number of assumptions which are required to a sequence of optimal actions within a causal framework (the details of which can be found in (Aglietti et al., 2021, Assumptions 1)). Notwithstanding, we seek, at every time step, to construct models for different intervention sets (the expectation in eq. ...
Full-text available
Preprint
In this paper we explore cyber security defence, through the unification of a novel cyber security simulator with models for (causal) decision-making through optimisation. Particular attention is paid to a recently published approach: dynamic causal Bayesian optimisation (DCBO). We propose that DCBO can act as a blue agent when provided with a view of a simulated network and a causal model of how a red agent spreads within that network. To investigate how DCBO can perform optimal interventions on host nodes, in order to reduce the cost of intrusions caused by the red agent. Through this we demonstrate a complete cyber-simulation system, which we use to generate observational data for DCBO and provide numerical quantitative results which lay the foundations for future work in this space.
... To find the optimal interventions which the BA should take to counter the actions of the RA, we propose the use of DCBO which was introduced to identify optimal interventions in precisely this kind of setting. Furthermore, it was shown to converge faster than competing methods in a variety of similar dynamic scenarios (Aglietti et al., 2021). DCBO is able to provide the optimal sequence of interventions, accounting for causal temporal dynamics, in a dynamical system such as that of a network under attack. ...
... Using causal inference notation we seek the interventional plan: {do(X * 0 = x * 0 ) , do(X * 1 = x * 1 ) , . . . , do(X * T = x * T )} which is found by solving a dynamic causal global optimisation (Aglietti et al., 2021) problem of the form: ...
... This setting makes a number of assumptions which are required to a sequence of optimal actions within a causal framework (the details of which can be found in (Aglietti et al., 2021, Assumptions 1)). Notwithstanding, we seek, at every time step, to construct models for different intervention sets (the expectation in eq. ...
Full-text available
Conference Paper
In this paper, we explore cyber security defence, through the unification of a novel cyber security simulator with models for (causal) decision-making through optimisation. Particular attention is paid to a recently published approach: dynamic causal Bayesian optimisation (Aglietti et al., 2021, DCBO). We propose that DCBO can act as a blue agent when provided with a view of a simulated network and a causal model of how a red agent spreads within that network. To investigate how DCBO can perform optimal interventions on host nodes, in order to reduce the cost of intrusions caused by the red agent. Through this, we demonstrate a complete cyber-simulation system, which we use to generate observational data for DCBO and provide numerical quantitative results which lay the foundations for future work in this space.
... Problem statement. Similar to [2] the goal of this work is to find a sequence of optimal interventions over time, indexed by t i , by playing a series of sequential conditional SCM-MABs or chronological causal bandits (CCB). The agent is provided with M i , Y i for each t i and is then tasked with optimising the arm-selection sequence (within each 'trial' i) and in so doing, minimise the total regret, for the given horizon N i . ...
... 2. Aglietti et al. [2] showed that A does not change across time given assumption (1). ...
... Assumption (2) posits that the DAG is known. If this were not true then we would have to undertake causal discovery (CD) [8] or spend the first interactions with the environment [14] learning the causal DAG, from D O [21], from D I [10] or both [1]. ...
Full-text available
Preprint
This paper studies an instance of the multi-armed bandit (MAB) problem, specifically where several causal MABs operate chronologically in the same dynamical system. Practically the reward distribution of each bandit is governed by the same non-trivial dependence structure, which is a dynamic causal model. Dynamic because we allow for each causal MAB to depend on the preceding MAB and in doing so are able to transfer information between agents. Our contribution, the Chronological Causal Bandit (CCB), is useful in discrete decision-making settings where the causal effects are changing across time and can be informed by earlier interventions in the same system. In this paper, we present some early findings of the CCB as demonstrated on a toy problem.
... Some related studies [8,11,19] on CPSs (e.g., X-ray lasers, microgrids, electrical plants) perform subsystem-oriented fault analysis that can complement our work. Past approaches include isolation forests, graphs, clustering, and Bayesian models [5]. Regression models that work for IT service systems may not be effective for physical systems [22] based on the quality of timeseries data. ...
Article
Online robust parameter design (RPD) for the complex production process has recently attracted increasing attention among researchers and practitioners. However, the existing online RPD methods usually ignore the model uncertainty of initial steps, which may lead to the overestimated optimal solutions in the early stage of online RPD. This paper proposes a multi-stage robust optimization approach based on the Bayesian Gaussian process (BGP) model to improve the robustness of the optimal solutions of the online RPD process. First, the Gibbs sampling method is used to estimate the hyperparameters of the BGP model. Second, the global optimization and clustering analysis techniques are combined to determine the optimal design region of input variables. Consequently, the Bayesian posterior probability analysis technique is used to obtain the optimal robust design region for performing the online parameter optimization. Finally, an online RPD model is constructed by integrating the global optimization algorithm, parameter update strategy, and quality loss function. The proposed approach is validated through a simulation example and a laser drilling case study. The comparison results show that the proposed approach obtains more robust optimal solutions than the existing ones.
Article
Metal halide perovskites (MHPs) have catapulted to the forefront of energy research due to the unique combination of high device performance, low materials cost, and facile solution processability. A remarkable merit of these materials is their compositional flexibility allowing for multiple substitutions at all crystallographic sites, and hence thousands of possible pure compounds and virtually a near-infinite number of multicomponent solid solutions. Harnessing the full potential of MHPs necessitates rapid exploration of multidimensional chemical space toward desired functionalities. Recent advances in laboratory automation, ranging from bespoke fully automated robotic labs to microfluidic systems and to pipetting robots, have enabled high-throughput experimental workflows for synthesizing MHPs. Here, we provide an overview of the state of the art in the automated MHP synthesis and existing methods for navigating multicomponent compositional space. We highlight the limitations and pitfalls of the existing strategies and formulate the requirements for necessary machine learning tools including causal and Bayesian methods, as well as strategies based on co-navigation of theoritical and experimental spaces. We argue that ultimately the goal of automated experiments is to simultaneously optimize the materials synthesis and refine the theoretical models that underpin target functionalities. Furthermore, the near-term development of automated experimentation will not lead to the full exclusion of human operator but rather automatization of repetitive operations, deferring human role to high-level slow decisions. We also discuss the emerging opportunities leveraging machine learning-guided automated synthesis to the development of high-performance perovskite optoelectronics.
Full-text available
Article
Predator–prey cycles rank among the most fundamental concepts in ecology, are predicted by the simplest ecological models and enable, theoretically, the indefinite persistence of predator and prey1–4. However, it remains an open question for how long cyclic dynamics can be self-sustained in real communities. Field observations have been restricted to a few cycle periods5–8 and experimental studies indicate that oscillations may be short-lived without external stabilizing factors9–19. Here we performed microcosm experiments with a planktonic predator–prey system and repeatedly observed oscillatory time series of unprecedented length that persisted for up to around 50 cycles or approximately 300 predator generations. The dominant type of dynamics was characterized by regular, coherent oscillations with a nearly constant predator–prey phase difference. Despite constant experimental conditions, we also observed shorter episodes of irregular, non-coherent oscillations without any significant phase relationship. However, the predator–prey system showed a strong tendency to return to the dominant dynamical regime with a defined phase relationship. A mathematical model suggests that stochasticity is probably responsible for the reversible shift from coherent to non-coherent oscillations, a notion that was supported by experiments with external forcing by pulsed nutrient supply. Our findings empirically demonstrate the potential for infinite persistence of predator and prey populations in a cyclic dynamic regime that shows resilience in the presence of stochastic events. The potential for infinite persistence of planktonic predator and prey cycles is experimentally demonstrated and these cycles show resilience in the presence of stochastic events.
Full-text available
Article
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give a introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
Full-text available
Article
We propose practical extensions to Bayesian optimization for solving dynamic problems. We model dynamic objective functions using spatiotemporal Gaussian process priors which capture all the instances of the functions over time. Our extensions to Bayesian optimization use the information learnt from this model to guide the tracking of a temporally evolving minimum. By exploiting temporal correlations, the proposed method also determines when to make evaluations, how fast to make those evaluations, and it induces an appropriate budget of steps based on the available information. Lastly, we evaluate our technique on synthetic and real-world problems.
Full-text available
Article
We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. Our main contribution comprises of novel regret bounds for these algorithms, providing an explicit characterization of the trade-off between the time horizon and the rate at which the function varies. We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover, both algorithms significantly outperform classical GP-UCB, since it treats stale and fresh data equally.
Full-text available
Article
Multi-armed bandit problems (MABPs) are a special type of optimal control problem well suited to model resource allocation under uncertainty in a wide variety of contexts. Since the first publication of the optimal solution of the classic MABP by a dynamic index rule, the bandit literature quickly diversified and emerged as an active research topic. Across this literature, the use of bandit models to optimally design clinical trials became a typical motivating application, yet little of the resulting theory has ever been used in the actual design and analysis of clinical trials. To this end, we review two MABP decision-theoretic approaches to the optimal allocation of treatments in a clinical trial: the infinite-horizon Bayesian Bernoulli MABP and the finite-horizon variant. These models possess distinct theoretical properties and lead to separate allocation rules in a clinical trial design context. We evaluate their performance compared to other allocation rules, including fixed randomization. Our results indicate that bandit approaches offer significant advantages, in terms of assigning more patients to better treatments, and severe limitations, in terms of their resulting statistical power. We propose a novel bandit-based patient allocation rule that overcomes the issue of low power, thus removing a potential barrier for their use in practice.
Article
This paper studies the problem of learning the correlation structure of a set of intervention functions defined on the directed acyclic graph (DAG) of a causal model. This is useful when we are interested in jointly learning the causal effects of interventions on different subsets of variables in a DAG, which is common in fields such as healthcare or operations research. We propose the first multi-task causal Gaussian process (GP) model, which we call DAG-GP, that allows for information sharing across continuous interventions and across experiments on different variables. DAG-GP accommodates different assumptions in terms of data availability and captures the correlation between functions lying in input spaces of different dimensionality via a well-defined integral operator. We give theoretical results detailing when and how the DAG-GP model can be formulated depending on the DAG. We test both the quality of its predictions and the calibration of its uncertainties. Compared to single-task models, DAG-GP achieves the best fitting performance in a variety of real and synthetic settings. In addition, it helps to select optimal interventions faster than competing approaches when used within sequential decision making frameworks, like active learning or Bayesian optimization.
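The intervention functions such a model correlates, e.g. f_X(x) = E[Y | do(X = x)], are themselves identified from observational data via do-calculus. As background for that ingredient (this is not the DAG-GP model itself), here is a small sketch of estimating a causal effect by backdoor adjustment in a toy binary DAG U → X, U → Y, X → Y, with an assumed structural model stated in the comments.

```python
import numpy as np

def backdoor_ate(u, x, y):
    """Estimate E[Y|do(X=1)] - E[Y|do(X=0)] by adjusting for the observed
    confounder U:  sum_u P(u) * (E[Y|X=1,u] - E[Y|X=0,u])."""
    ate = 0.0
    for uv in (0, 1):
        pu = np.mean(u == uv)
        m1 = y[(x == 1) & (u == uv)].mean()
        m0 = y[(x == 0) & (u == uv)].mean()
        ate += pu * (m1 - m0)
    return ate

# Toy SCM (an assumption for illustration):
#   U ~ Bern(0.5),  X | U ~ Bern(0.2 + 0.6 U),  Y = X + U + noise,
# so the true causal effect of X on Y is exactly 1.
rng = np.random.default_rng(0)
n = 100_000
u = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.2 + 0.6 * u)
y = x + u + rng.normal(0.0, 0.1, n)

naive = y[x == 1].mean() - y[x == 0].mean()   # confounded estimate
adjusted = backdoor_ate(u, x, y)              # close to the true value 1
```

The naive conditional contrast is inflated by the confounder, while the adjusted estimate recovers the interventional quantity that models like DAG-GP take as their objects of interest.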
Article
We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-armed bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is strictly better (in all quantities) than algorithms that do not use the additional causal information.
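The extra feedback being exploited is that, in a causal graph, a single sample reveals the values of all variables at once, so one round can tighten the reward estimate attached to every variable's observed value, not only the arm that was pulled. A rough numpy sketch of that bookkeeping follows; the function name and the simple data model (independent binary variables, with only X_0 affecting the reward) are assumptions for illustration.

```python
import numpy as np

def update_from_observation(counts, sums, x_vec, y):
    """One observed sample (x_1..x_d, y) updates the running estimate of
    E[Y | X_i = x_i] for EVERY variable simultaneously."""
    for i, xi in enumerate(x_vec):
        counts[i, xi] += 1
        sums[i, xi] += y

rng = np.random.default_rng(0)
d, n = 4, 2000
counts = np.zeros((d, 2))
sums = np.zeros((d, 2))
for _ in range(n):
    x_vec = rng.binomial(1, 0.5, d)
    y = float(x_vec[0]) + 0.1 * rng.normal()  # only X_0 drives the reward
    update_from_observation(counts, sums, x_vec, y)

means = sums / counts                       # E[Y | X_i = x] estimates
best_arm = np.unravel_index(np.argmax(means), means.shape)
```

After n rounds every (variable, value) pair has accumulated roughly n/2 effective samples, versus the n/(2d) a standard bandit would get by splitting pulls across arms; this sample-sharing is what drives the improved simple-regret bound.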
Thesis
This thesis develops a method for automatically constructing, visualizing and describing a large class of models, useful for forecasting and finding structure in domains such as time series, geological formations, and physical dynamics. These models, based on Gaussian processes, can capture many types of statistical structure, such as periodicity, changepoints, additivity, and symmetries. Such structure can be encoded through kernels, which have historically been hand-chosen by experts. We show how to automate this task, creating a system that explores an open-ended space of models and reports the structures discovered. To automatically construct Gaussian process models, we search over sums and products of kernels, maximizing the approximate marginal likelihood. We show how any model in this class can be automatically decomposed into qualitatively different parts, and how each component can be visualized and described through text. We combine these results into a procedure that, given a dataset, automatically constructs a model along with a detailed report containing plots and generated text that illustrate the structure discovered in the data. The introductory chapters contain a tutorial showing how to express many types of structure through kernels, and how adding and multiplying different kernels combines their properties. Examples also show how symmetric kernels can produce priors over topological manifolds such as cylinders, toruses, and Möbius strips, as well as their higher-dimensional generalizations. This thesis also explores several extensions to Gaussian process models. First, building on existing work that relates Gaussian processes and neural nets, we analyze natural extensions of these models to deep kernels and deep Gaussian processes. Second, we examine additive Gaussian processes, showing their relation to the regularization method of dropout. Third, we combine Gaussian processes with the Dirichlet process to produce the warped mixture model: a Bayesian clustering model having nonparametric cluster shapes, and a corresponding latent space in which each cluster has an interpretable parametric form.
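The search over sums and products of kernels can be caricatured in a few lines: score each candidate kernel matrix by the GP log marginal likelihood and greedily keep the best one-step expansion. The sketch below is heavily simplified relative to the thesis (fixed kernel hyperparameters instead of optimized ones, 1-D inputs, and a single greedy step); all names are illustrative.

```python
import numpy as np

def log_marginal_likelihood(K, y, noise=0.1):
    """Standard GP evidence: -1/2 y^T Kn^-1 y - 1/2 log|Kn| - n/2 log(2 pi)."""
    n = len(y)
    L = np.linalg.cholesky(K + noise**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2 * np.pi))

# Base kernels with fixed (unit) hyperparameters, on 1-D inputs.
def rbf(x):      return np.exp(-0.5 * (x[:, None] - x[None, :])**2)
def periodic(x): return np.exp(-2 * np.sin(np.pi * np.abs(x[:, None] - x[None, :]))**2)
def linear(x):   return np.outer(x, x)

BASE = {"RBF": rbf, "PER": periodic, "LIN": linear}

def greedy_kernel_search(x, y):
    """Pick the best base kernel, then try extending it with + and * by
    every base kernel, keeping whatever raises the evidence."""
    score = lambda K: log_marginal_likelihood(K, y)
    name, f = max(BASE.items(), key=lambda kv: score(kv[1](x)))
    best_K = f(x)
    best_name, best = name, score(best_K)
    for n2, f2 in BASE.items():
        for op, sym in ((np.add, "+"), (np.multiply, "*")):
            s = score(op(best_K, f2(x)))
            if s > best:
                best, best_name = s, f"({name} {sym} {n2})"
    return best_name, best
```

Iterating this step, with hyperparameters re-optimized at each round, gives the open-ended exploration of kernel expressions that the thesis automates.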