Statistics and Computing (2023) 33:58
https://doi.org/10.1007/s11222-023-10222-6
ORIGINAL PAPER
Learning-based importance sampling via stochastic optimal control
for stochastic reaction networks
Chiheb Ben Hammouda¹ · Nadhir Ben Rached² · Raúl Tempone³,⁴ · Sophia Wiechert¹
Received: 21 December 2021 / Accepted: 4 February 2023 / Published online: 28 March 2023
© The Author(s) 2023
Abstract
We explore efficient estimation of statistical quantities, particularly rare event probabilities, for stochastic reaction networks.
Consequently, we propose an importance sampling (IS) approach to improve the Monte Carlo (MC) estimator efficiency based
on an approximate tau-leap scheme. The crucial step in the IS framework is choosing an appropriate change of probability
measure to achieve substantial variance reduction. This task is typically challenging and often requires insights into the
underlying problem. Therefore, we propose an automated approach to obtain a highly efficient path-dependent measure
change based on an original connection in the stochastic reaction network context between finding optimal IS parameters
within a class of probability measures and a stochastic optimal control formulation. Optimal IS parameters are obtained
by solving a variance minimization problem. First, we derive an associated dynamic programming equation. Analytically
solving this backward equation is challenging, hence we propose an approximate dynamic programming formulation to find
near-optimal control parameters. To mitigate the curse of dimensionality, we propose a learning-based method to approximate
the value function using a neural network, where the parameters are determined via a stochastic optimization algorithm. Our
analysis and numerical experiments verify that the proposed learning-based IS approach substantially reduces MC estimator
variance, resulting in a lower computational complexity in the rare event regime, compared with standard tau-leap MC
estimators.
Keywords Stochastic reaction networks · Tau-leap · Importance sampling · Stochastic optimal control · Dynamic programming · Rare event

Mathematics Subject Classification 60H35 · 60J75 · 65C05 · 93E20
Correspondence: Sophia Wiechert, wiechert@uq.rwth-aachen.de

1 Chair of Mathematics for Uncertainty Quantification, RWTH Aachen University, Aachen, Germany
2 School of Mathematics, University of Leeds, Leeds, UK
3 Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
4 Alexander von Humboldt Professor in Mathematics for Uncertainty Quantification, RWTH Aachen University, Aachen, Germany

1 Introduction

We propose an approach to efficiently estimate statistical quantities, particularly rare event probabilities, for a particular class of continuous-time Markov chains known as stochastic reaction networks (SRNs). Consequently, we develop
a learning-based importance sampling (IS) algorithm to
improve the Monte Carlo (MC) estimator efficiency based
on an approximate tau-leap (TL) scheme. The automated
approach is based on an original connection between opti-
mal IS parameter determination within a class of probability
measures and stochastic optimal control (SOC) formulation.
SRNs (see Sect. 1.1 for a short introduction and [9] for more details) describe the time evolution of biochemical reactions, epidemic processes [5,13], and transcription and translation in genomics and virus kinetics [32,48], among other important applications. For the current study, let $X$ be an SRN taking values in $\mathbb{N}^d$ and defined on the time interval $[0,T]$, where $T>0$ is a user-selected final time. We aim to provide accurate and computationally efficient MC estimates of the expected value $\mathbb{E}[g(X(T))]$, where $g:\mathbb{N}^d\to\mathbb{R}$ is a scalar observable for $X$. In particular, we
study estimating rare event probabilities with $g(x)=\mathbf{1}_{\{x\in B\}}$ (i.e., the indicator function of a set $B\subset\mathbb{R}^d$).
The quantity of interest, $\mathbb{E}[g(X(T))]$, can be computed by solving the corresponding Kolmogorov backward equations [8]. For most SRNs, deriving a closed-form solution for these ordinary differential equations is infeasible, and numerical approximations based on discretized schemes are commonly used. However, their computational cost scales exponentially with the number of species $d$. Therefore, we are particularly interested in estimating $\mathbb{E}[g(X(T))]$ using MC methods, an attractive alternative that avoids the curse of dimensionality.
Many schemes have been developed to simulate exact
sample paths for SRNs, such as the stochastic simulation
algorithm [25] and modified next reaction method [4]. Path-
wise exact SRN realizations can incur high computational
costs if any reaction channels have high reaction rates. Gille-
spie [26] and Aparicio and Solari [6] independently proposed
the explicit TL method (see Sect. 1.2) to overcome this issue
by simulating approximate paths of X, evolving the pro-
cess with fixed time steps and keeping reaction rates fixed
within each time step. Various simulation schemes have been
subsequently proposed to deal with situations incorporating
well-separated fast and slow time scales [1,2,11,14,40,45].
Various variance reduction techniques have been proposed
in the SRN context to reduce the computational work to esti-
mate E[g(X(T))]. Several multilevel Monte Carlo (MLMC)
[21,22] based methods have been proposed to address spe-
cific challenges in this context [3,10,11,38,40]. Furthermore,
as naive MC and MLMC estimators fail to efficiently and
accurately estimate rare event probabilities, different IS
approaches [15,16,23,24,36,46,47] have been proposed.
The current paper proposes a path-dependent IS approach
based on an approximate TL scheme to improve the MC
estimator efficiency, and hence efficiently estimate various
statistical quantities for SRNs (particularly rare event proba-
bilities). Our class of probability measure change is based on
modifying the Poisson random variable rates used to con-
struct the TL paths. In particular, optimal IS parameters
are obtained by minimizing the second moment of the IS
estimator (equivalently the variance) which represents the
cost function for the associated SOC problem. We show
that the corresponding value function solves a dynamic pro-
gramming relation that is challenging to solve analytically
(see Sect. 2.1). We approximate the dynamic programming
equation to derive a closed form solution and near-optimal
control parameters. The cost to solve the associated backward
equation numerically in multi-dimensional settings increases
exponentially with respect to the dimension (i.e., the curse of
dimensionality). Thus, we propose approximating the result-
ing value function using a neural network to overcome this
issue. Utilizing the optimality criterion for the SOC prob-
lem, we obtain a relationship between optimal IS parameters
and the value function. Finally, we employ a stochastic opti-
mization algorithm to learn the corresponding neural network
parameters. Our analysis and numerical results for different
dimensions confirm that the proposed estimator considerably
reduces the variance compared with the standard TL-MC
method with a negligible additional cost. This allows rare
event probabilities to be efficiently computed in a regime
where standard TL-MC estimators commonly fail.
The proposed approach is more computationally efficient than previously proposed IS schemes in this context [15,16,23,24,36,46,47] because it is based on an approximate TL scheme rather than the exact scheme. In contrast
to previous approaches, the change of measure is systemati-
cally derived to ensure convergence to the optimal measure
within the chosen class of probability measures, minimizing
MC estimator variance. The novelty of this work is estab-
lishing a connection between IS and SOC in the context of
pure jump processes, particularly for SRNs, with an empha-
sis on related practical and numerical aspects. Note that some
previous studies [7,17,20,28–31,33,41,49] have established a
similar connection, mainly in the diffusion dynamics context,
with less focus on pure jump dynamics. In this work, the proposed methodology is based on an approximate explicit TL scheme; it could subsequently be extended in future work to continuous-time formulations (exact schemes) and to implicit TL schemes, which are relevant for systems with fast and slow time scales.
The remainder of this paper is organized as follows. Sections 1.1, 1.2, 1.3, and 1.4 define relevant SRN, TL, MC, and IS concepts, respectively. Section 2 establishes the connection between IS and SOC, formulating the SOC problem and defining its main ingredients (controls, cost function, and value function), then presents the dynamic programming relation solved by the optimal controls. Section 2.3 develops the proposed learning-based IS approach appropriate for multi-dimensional SRNs. Section 3 provides selected numerical experiments in different dimensions to illustrate the proposed approach's efficiency compared with standard MC approaches. Finally, Sect. 4 summarizes and concludes the work, and discusses possible future research directions.
1.1 Stochastic reaction networks (SRNs)

We are interested in the time evolution of a homogeneously mixed chemically reacting system described by the Markovian pure jump process $X:[0,T]\times\Omega\to\mathbb{N}^d$, where $(\Omega,\mathcal{F},\mathbb{P})$ is a probability space. In this framework, we assume that $d$ different species interact through $J$ reaction channels. The $i$-th component, $X_i(t)$, describes the abundance of the $i$-th species present in the chemical system at time $t$. This work studies the time evolution of the state vector,
$$X(t)=(X_1(t),\dots,X_d(t))\in\mathbb{N}^d. \tag{1.1}$$
Each reaction channel $R_j$ is a pair $(a_j,\nu_j)$ defined by its propensity function $a_j:\mathbb{R}^d\to\mathbb{R}_+$ and stoichiometric vector $\nu_j=(\nu_{j,1},\nu_{j,2},\dots,\nu_{j,d})$ satisfying
$$\mathbb{P}\left(X(t+\Delta t)=x+\nu_j \,\middle|\, X(t)=x\right)=a_j(x)\,\Delta t+o(\Delta t),\quad j=1,2,\dots,J. \tag{1.2}$$
Thus, the probability of observing a jump in the process $X$ from state $x$ to state $x+\nu_j$, a consequence of reaction $R_j$ firing during the small time interval $(t,t+\Delta t]$, is proportional to the interval length $\Delta t$, with $a_j(x)$ the proportionality constant. We set $a_j(x)=0$ for $x$ such that $x+\nu_j\notin\mathbb{N}^d$ (i.e., the non-negativity assumption: the system can never produce negative population values).
Hence, from (1.2), the process $X$ is a continuous-time, discrete-space Markov chain that can be characterized by Kurtz's random time change representation [19],
$$X(t)=x_0+\sum_{j=1}^{J} Y_j\!\left(\int_0^{t} a_j(X(s))\,\mathrm{d}s\right)\nu_j, \tag{1.3}$$
where $Y_j:\mathbb{R}_+\times\Omega\to\mathbb{N}$ are independent unit-rate Poisson processes. Conditions on the reaction channels can be imposed to ensure uniqueness [5] and to avoid explosions in finite time [18,27,44].
Applying the stochastic mass-action kinetics principle, we can assume that the propensity function $a_j(\cdot)$ for reaction channel $R_j$, represented as¹
$$\alpha_{j,1}S_1+\dots+\alpha_{j,d}S_d \xrightarrow{\;\theta_j\;} \beta_{j,1}S_1+\dots+\beta_{j,d}S_d, \tag{1.4}$$
obeys
$$a_j(x):=\theta_j\prod_{i=1}^{d}\frac{x_i!}{(x_i-\alpha_{j,i})!}\,\mathbf{1}_{\{x_i\ge\alpha_{j,i}\}}, \tag{1.5}$$
where $\{\theta_j\}_{j=1}^{J}$ are positive constant reaction rates and $x_i$ is the counting number of species $S_i$.
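For concreteness, the mass-action formula (1.5) can be evaluated with a short helper. This is our illustrative sketch, not code from the paper; `math.perm(n, k)` computes the falling factorial $n!/(n-k)!$:

```python
import math
import numpy as np

def mass_action_propensity(x, alpha, theta):
    """Mass-action propensities (1.5): a_j(x) = theta_j * prod_i x_i!/(x_i - alpha_{j,i})!,
    set to zero whenever some x_i < alpha_{j,i} (the indicator in (1.5)).

    x     : current state, shape (d,)
    alpha : reactant coefficients alpha_{j,i}, shape (J, d)
    theta : reaction rate constants theta_j, shape (J,)
    """
    a = np.empty(len(theta))
    for j, (al, th) in enumerate(zip(alpha, theta)):
        if np.any(x < al):
            a[j] = 0.0  # a reactant is missing: the channel cannot fire
        else:
            a[j] = th * np.prod([math.perm(int(xi), int(ai)) for xi, ai in zip(x, al)])
    return a
```

For instance, a dimerization channel $2S_1\to\emptyset$ with $\theta=0.5$ and $x_1=4$ yields $0.5\cdot 4\cdot 3=6$, and the propensity vanishes once fewer than two molecules remain.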
1.2 Explicit tau-leap approximation

The explicit-TL scheme is a pathwise approximate method [6,26] to overcome computational drawbacks of exact methods (i.e., when many reactions fire during a short time interval). This scheme can be derived from the random time change representation (1.3) by approximating the integral $\int_{t_i}^{t_{i+1}} a_j(X(s))\,\mathrm{d}s$ as $a_j(X(t_i))(t_{i+1}-t_i)$, i.e., using the forward-Euler method with time mesh $\{t_0=0,t_1,\dots,t_N=T\}$ and step size $\Delta t=\frac{T}{N}$. Thus, the explicit-TL approximation for $X$ satisfies, for $k\in\{1,2,\dots,N\}$,
$$\bar{X}^{\Delta t}_k=x_0+\sum_{j=1}^{J} Y_j\!\left(\sum_{i=0}^{k-1} a_j(\bar{X}^{\Delta t}_i)\,\Delta t\right)\nu_j, \tag{1.6}$$
and, given $\bar{X}^{\Delta t}_0:=x_0$, we iteratively simulate a path of $\bar{X}^{\Delta t}$ as
$$\bar{X}^{\Delta t}_k:=\bar{X}^{\Delta t}_{k-1}+\sum_{j=1}^{J}\mathcal{P}_{k-1,j}\!\left(a_j(\bar{X}^{\Delta t}_{k-1})\,\Delta t\right)\nu_j,\quad 1\le k\le N, \tag{1.7}$$
where, conditioned on the current state $\bar{X}^{\Delta t}_k$, the $\{\mathcal{P}_{k,j}(r_{k,j})\}_{1\le j\le J}$ are independent Poisson random variables with respective rates $r_{k,j}:=a_j(\bar{X}^{\Delta t}_k)\,\Delta t$.

¹ $\alpha_{j,i}$ molecules of species $S_i$ are consumed and $\beta_{j,i}$ are produced. Thus, $(\alpha_{j,i},\beta_{j,i})\in\mathbb{N}^2$, but $\beta_{j,i}-\alpha_{j,i}$ can be a negative integer, constituting the vector $\nu_j=\left(\beta_{j,1}-\alpha_{j,1},\dots,\beta_{j,d}-\alpha_{j,d}\right)\in\mathbb{Z}^d$.
The explicit-TL path $\bar{X}^{\Delta t}$ is defined only at the time mesh points but can be naturally extended to $[0,T]$ as a piecewise constant path. We apply a projection to zero to prevent the process from exiting the lattice (i.e., producing negative values); hence (1.7) becomes
$$\bar{X}^{\Delta t}_k:=\max\left(0,\;\bar{X}^{\Delta t}_{k-1}+\sum_{j=1}^{J}\mathcal{P}_{k-1,j}\!\left(a_j(\bar{X}^{\Delta t}_{k-1})\,\Delta t\right)\nu_j\right),\quad 1\le k\le N, \tag{1.8}$$
where the maximum is applied entry-wise. In this work, we use uniform time steps of length $\Delta t$, but the explicit-TL scheme and the proposed IS scheme (see Sect. 2) can also be applied to non-uniform time meshes.
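As an illustration, one step of (1.8) is a Poisson draw per channel followed by the entry-wise projection. A minimal Python sketch (ours, not the paper's code; the linear-decay test case used below is hypothetical):

```python
import numpy as np

def tau_leap_path(x0, nu, propensities, T, N, rng):
    """Simulate one explicit tau-leap path on a uniform mesh, Eq. (1.8).

    x0           : initial state, shape (d,)
    nu           : stoichiometric vectors, shape (J, d)
    propensities : function x -> rates a_j(x), shape (J,)
    Returns the state at the final time T.
    """
    dt = T / N
    x = np.array(x0, dtype=float)
    for _ in range(N):
        a = propensities(x)              # rates frozen over the step
        p = rng.poisson(a * dt)          # P_{k,j} ~ Poisson(a_j(x) dt)
        x = np.maximum(0.0, x + p @ nu)  # entry-wise projection to zero
    return x
```

For the pure-death channel $S\to\emptyset$ with rate $\theta x$, averaging many such paths reproduces the exponential decay $x_0 e^{-\theta T}$ up to the $\mathcal{O}(\Delta t)$ weak error of (1.9).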
1.3 Biased Monte Carlo estimator

Let $X$ be a stochastic process and $g:\mathbb{R}^d\to\mathbb{R}$ a scalar observable. We want to approximate $\mathbb{E}[g(X(T))]$, but rather than sampling directly from $X(T)$, we sample from $\bar{X}^{\Delta t}(T)$, a random variable generated by a numerical scheme with step size $\Delta t$. We assume that the variates $\bar{X}^{\Delta t}(T)$ are generated by an algorithm with weak order $\mathcal{O}(\Delta t)$, i.e., for sufficiently small $\Delta t$,
$$\left|\mathbb{E}\left[g(X(T))-g(\bar{X}^{\Delta t}(T))\right]\right|\le C\,\Delta t, \tag{1.9}$$
where $C>0$.²

Let $\mu_M$ be the standard MC estimator of $\mathbb{E}\left[g(\bar{X}^{\Delta t}(T))\right]$,
$$\mu_M:=\frac{1}{M}\sum_{m=1}^{M} g\!\left(\bar{X}^{\Delta t}_{[m]}(T)\right), \tag{1.10}$$

² Refer to [39] for the underlying assumptions and proofs of this statement in the TL scheme context.
where $\{\bar{X}^{\Delta t}_{[m]}(T)\}_{m=1}^{M}$ are independent and distributed as $\bar{X}^{\Delta t}(T)$.
The global error for the proposed MC estimator admits the decomposition
$$\left|\mathbb{E}\left[g(X(T))\right]-\mu_M\right|\le\underbrace{\left|\mathbb{E}\left[g(X(T))\right]-\mathbb{E}\left[g(\bar{X}^{\Delta t}(T))\right]\right|}_{\text{Bias}}+\underbrace{\left|\mathbb{E}\left[g(\bar{X}^{\Delta t}(T))\right]-\mu_M\right|}_{\text{Statistical Error}}. \tag{1.11}$$
To achieve the desired accuracy $\mathrm{TOL}$, it is sufficient to bound the bias and the statistical error by $\frac{\mathrm{TOL}}{2}$ each. From (1.9), choosing the step size
$$\Delta t(\mathrm{TOL})=\frac{\mathrm{TOL}}{2C} \tag{1.12}$$
ensures a bias of $\frac{\mathrm{TOL}}{2}$.
Thus, considering the central limit theorem, the statistical error can be approximated as
$$\left|\mathbb{E}\left[g(\bar{X}^{\Delta t}(T))\right]-\mu_M\right|\approx C_\alpha\cdot\sqrt{\frac{\mathrm{Var}\left[g(\bar{X}^{\Delta t}(T))\right]}{M}}, \tag{1.13}$$
where the constant $C_\alpha$ is the $(1-\frac{\alpha}{2})$-quantile of the standard normal distribution. We choose $C_\alpha=1.96$ for a 95% confidence level, corresponding to $\alpha=0.05$. Choosing
$$M^{*}(\mathrm{TOL})=\frac{4\,C_\alpha^2\,\mathrm{Var}\left[g(\bar{X}^{\Delta t}(T))\right]}{\mathrm{TOL}^2} \tag{1.14}$$
sample paths ensures that the statistical error is approximately bounded by $\frac{\mathrm{TOL}}{2}$.

Given that the computational cost to simulate a single path is $\mathcal{O}(\Delta t^{-1})$, the expected total computational complexity is $\mathcal{O}(\mathrm{TOL}^{-3})$, and the complexity scales with $\mathrm{Var}\left[g(\bar{X}^{\Delta t}(T))\right]$ (see (1.14)).
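The two choices (1.12) and (1.14) can be packaged in a small helper. This is an illustrative sketch of ours, not code from the paper; `C_bias` stands for the weak-error constant $C$ in (1.9) and `var_g` for $\mathrm{Var}[g(\bar{X}^{\Delta t}(T))]$, both of which must be estimated in practice:

```python
import math

def mc_budget(tol, C_bias, var_g, C_alpha=1.96):
    """Split the accuracy TOL equally between bias and statistical error.

    Returns the step size dt of (1.12) and the sample count M of (1.14);
    the total cost then scales as O(TOL^-3), since one path costs O(1/dt).
    """
    dt = tol / (2.0 * C_bias)                              # bias <= C * dt = TOL/2
    M = math.ceil(4.0 * C_alpha**2 * var_g / tol**2)       # C_a*sqrt(var/M) = TOL/2
    return dt, M
```

For example, `mc_budget(0.01, 1.0, 2.0)` returns `dt = 0.005` and roughly `3.1e5` paths, making explicit how halving `tol` doubles the cost per path and quadruples the number of paths.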
1.4 Importance sampling
Importance sampling (IS) techniques, when used appropriately, reduce the computational cost of the crude MC estimator through variance reduction. To motivate these techniques, consider estimating rare event probabilities, for which the crude MC method is substantially expensive. In particular, consider estimating $q=\mathbb{P}(Y>\gamma)=\mathbb{E}[\mathbf{1}_{\{Y>\gamma\}}]$, where $Y$ is a random variable taking values in $\mathbb{R}$ with probability density function $\rho_Y$. Let $\gamma$ be sufficiently large that $q$ becomes sufficiently small. We can approximate $q$ using the MC estimator
$$\hat{q}=\frac{1}{M}\sum_{i=1}^{M}\mathbf{1}_{\{Y^{(i)}>\gamma\}}, \tag{1.15}$$
where $\{Y^{(i)}\}_{i=1}^{M}$ are independent and identically distributed (i.i.d.) realizations sampled according to $\rho_Y$. The MC estimator variance is
$$\mathrm{Var}\left[\mathbf{1}_{\{Y^{(i)}>\gamma\}}\right]=q-q^2. \tag{1.16}$$
For sufficiently small $q$, we can use (1.16) and the central limit theorem to approximate the relative error as
$$\frac{|q-\hat{q}|}{q}\approx C_\alpha\sqrt{\frac{1}{qM}}, \tag{1.17}$$
where $C_\alpha$ is chosen as in (1.13).
The number of samples required to attain a relative error tolerance $\mathrm{TOL}_{\mathrm{rel}}$ is $M\approx\frac{C_\alpha^2}{q\cdot\mathrm{TOL}_{\mathrm{rel}}^2}$. Thus, for $q$ of the order of $10^{-8}$, the number of samples required for $\mathrm{TOL}_{\mathrm{rel}}=5\%$ is approximately $1.5\cdot10^{11}$.
To demonstrate the IS concept, consider the general problem of estimating $\mathbb{E}[g(Y)]$, where $g$ is a given observable. In the previous example, $g$ was chosen as $g(y)=\mathbf{1}_{\{y>\gamma\}}$. Let $\rho_Z$ be the probability density function of a new real random variable $Z$ such that $g\cdot\rho_Y$ is dominated by $\rho_Z$, i.e.,
$$\rho_Z(x)=0\;\Rightarrow\;g(x)\cdot\rho_Y(x)=0 \tag{1.18}$$
for all $x\in\mathbb{R}$. This permits the quantity of interest to be expressed as
$$\mathbb{E}[g(Y)]=\int_{\mathbb{R}} g(x)\,\rho_Y(x)\,\mathrm{d}x=\int_{\mathbb{R}} g(x)\,\underbrace{\frac{\rho_Y(x)}{\rho_Z(x)}}_{L(x)}\,\rho_Z(x)\,\mathrm{d}x=\mathbb{E}\left[L(Z)\cdot g(Z)\right], \tag{1.19}$$
where $L(\cdot)$ is the likelihood ratio. Hence, the expected value under the new measure remains unchanged, but the variance may be reduced due to a different second moment, $\mathbb{E}\left[(g(Z)\cdot L(Z))^2\right]$.
The MC estimator under the IS measure is
$$\mu^{\mathrm{IS}}_M=\frac{1}{M}\sum_{j=1}^{M} L(Z_{[j]})\cdot g(Z_{[j]})=\frac{1}{M}\sum_{j=1}^{M}\frac{\rho_Y(Z_{[j]})}{\rho_Z(Z_{[j]})}\cdot g(Z_{[j]}), \tag{1.20}$$
where $Z_{[j]}$, $j=1,\dots,M$, are i.i.d. samples from $\rho_Z$.
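As a textbook instance of (1.19)-(1.20) (our illustration, not an example from the paper), take $Y\sim\mathcal{N}(0,1)$, $g(y)=\mathbf{1}_{\{y>\gamma\}}$, and shift the sampling density to $Z\sim\mathcal{N}(\gamma,1)$, so that $L(z)=\rho_Y(z)/\rho_Z(z)=e^{\gamma^2/2-\gamma z}$:

```python
import numpy as np

def is_tail_prob(gamma, M, rng):
    """IS estimator (1.20) for q = P(Y > gamma), Y ~ N(0,1).

    Samples Z ~ N(gamma, 1) and reweights by the likelihood ratio
    L(z) = rho_Y(z)/rho_Z(z) = exp(gamma^2/2 - gamma*z).
    """
    z = rng.normal(gamma, 1.0, size=M)
    L = np.exp(0.5 * gamma**2 - gamma * z)
    return np.mean(L * (z > gamma))
```

With `gamma = 4` (so $q\approx 3.2\cdot10^{-5}$), a few thousand IS samples already give a small relative error, whereas the crude estimator (1.15) would need on the order of $10^{7}$ samples just to see a handful of hits.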
The main challenge when using IS is choosing a new probability measure that substantially reduces the variance compared with the original measure. This step strongly depends on the structure of the problem under consideration. Further, the new measure should be obtained with negligible computational cost to ensure a computationally efficient IS scheme. This is particularly challenging for the present problem, since we are considering path-dependent probability measures. In particular, the aim is to introduce a path-dependent change of probability measure that corresponds to changing the Poisson random variable rates used to construct the TL paths. Section 2.1 shows how the optimal IS parameters can be obtained using a novel connection with SOC.
2 Importance sampling (IS) via stochastic optimal control (SOC)

2.1 Dynamic programming for the importance sampling parameters
This section establishes the connection between optimal IS measure determination, within a class of probability measures, and SOC. Let $X$ be an SRN as defined in Sect. 1.1, and let $\bar{X}^{\Delta t}$ denote its TL approximation as given by (1.8). We aim to find a near-optimal IS measure to improve the MC estimator's computational performance when estimating $\mathbb{E}[g(X(T))]$. Since finding the optimal path-dependent change of measure within all measure classes presents a challenging problem, we limit ourselves to a parameterized class obtained by modifying the Poisson random variable rates of the TL paths. This class of measure change was previously used in [10] to improve the MLMC estimator's robustness and performance in this context; we focus on a single-level MC setting and seek to automate the task of finding a near-optimal IS measure within this class.
We introduce the change of measure resulting from changing the Poisson random variable rates in the TL scheme,
$$\bar{\mathcal{P}}_{n,j}=\mathcal{P}_{n,j}\!\left(\delta^{\Delta t}_{n,j}(\bar{X}^{\Delta t}_n)\,\Delta t\right),\quad n=0,\dots,N-1,\;j=1,\dots,J; \tag{2.1}$$
where $\delta^{\Delta t}_{n,j}(x)\in\mathcal{A}_{x,j}$ is the control parameter at time step $n$, under reaction $j$, and in state $x\in\mathbb{N}^d$; and, conditioned on $\bar{X}^{\Delta t}_n$, the $\bar{\mathcal{P}}_{n,j}(r_{n,j})$ are independent Poisson random variables with respective rates $r_{n,j}:=\delta^{\Delta t}_{n,j}(\bar{X}^{\Delta t}_n)\,\Delta t$. The admissible set,
$$\mathcal{A}_{x,j}=\begin{cases}\{0\}, & \text{if } a_j(x)=0,\\ \{y\in\mathbb{R}:y>0\}, & \text{otherwise},\end{cases} \tag{2.2}$$
is chosen such that (1.18) is fulfilled and to avoid infinite variance of the IS estimator. The control $\delta^{\Delta t}_{n,j}(x)\in\mathcal{A}_{x,j}$ depends deterministically on the current time step $n$, the reaction channel $j$, and the current state $x=\bar{X}^{\Delta t}_n$ of the TL-IS approximation in (2.3).
Therefore, the resulting scheme under the new measure is
$$\bar{X}^{\Delta t}_{n+1}=\max\left(0,\;\bar{X}^{\Delta t}_n+\sum_{j=1}^{J}\bar{\mathcal{P}}_{n,j}\,\nu_j\right),\quad n=0,\dots,N-1,\qquad \bar{X}^{\Delta t}_0=x_0; \tag{2.3}$$
and the likelihood ratio³ at step $n$ associated with the new IS measure is
$$L_n\!\left(\bar{\mathcal{P}}_n,\delta^{\Delta t}_n(\bar{X}^{\Delta t}_n)\right)=\prod_{j=1}^{J}\exp\!\left(-\left(a_j(\bar{X}^{\Delta t}_n)-\delta^{\Delta t}_{n,j}(\bar{X}^{\Delta t}_n)\right)\Delta t\right)\left(\frac{a_j(\bar{X}^{\Delta t}_n)}{\delta^{\Delta t}_{n,j}(\bar{X}^{\Delta t}_n)}\right)^{\bar{\mathcal{P}}_{n,j}}
=\exp\!\left(-\left(\sum_{j=1}^{J} a_j(\bar{X}^{\Delta t}_n)-\delta^{\Delta t}_{n,j}(\bar{X}^{\Delta t}_n)\right)\Delta t\right)\cdot\prod_{j=1}^{J}\left(\frac{a_j(\bar{X}^{\Delta t}_n)}{\delta^{\Delta t}_{n,j}(\bar{X}^{\Delta t}_n)}\right)^{\bar{\mathcal{P}}_{n,j}}; \tag{2.4}$$
where $\delta^{\Delta t}_n(x)\in\times_{j=1}^{J}\mathcal{A}_{x,j}$ are the IS parameters with $\left(\delta^{\Delta t}_n(x)\right)_j=\delta^{\Delta t}_{n,j}(x)$, and the Poisson realizations are denoted by $\bar{\mathcal{P}}_n$ with $\left(\bar{\mathcal{P}}_n\right)_j:=\bar{\mathcal{P}}_{n,j}$ for $j=1,\dots,J$. Equation (2.4) uses the convention that $\frac{a_j(\bar{X}^{\Delta t}_n)}{\delta^{\Delta t}_{n,j}(\bar{X}^{\Delta t}_n)}=1$ whenever $a_j(\bar{X}^{\Delta t}_n)=0$ and $\delta^{\Delta t}_{n,j}(\bar{X}^{\Delta t}_n)=0$. From (2.2), this results in a factor of one in the likelihood ratio for reactions with $a_j(\bar{X}^{\Delta t}_n)=0$.

Therefore, the likelihood ratio for $\{\bar{X}^{\Delta t}_n:n=0,\dots,N\}$ across one path is
$$L\!\left(\bar{\mathcal{P}}_0,\dots,\bar{\mathcal{P}}_{N-1},\delta^{\Delta t}_0(\bar{X}^{\Delta t}_0),\dots,\delta^{\Delta t}_{N-1}(\bar{X}^{\Delta t}_{N-1})\right)=\prod_{n=0}^{N-1} L_n\!\left(\bar{\mathcal{P}}_n,\delta^{\Delta t}_n(\bar{X}^{\Delta t}_n)\right). \tag{2.5}$$
This likelihood ratio completes the characterization of the proposed IS approach and allows the quantity of interest to be expressed with respect to the new measure as
$$\mathbb{E}\left[g(\bar{X}^{\Delta t}_N)\right]=\mathbb{E}\left[L\!\left(\bar{\mathcal{P}}_0,\dots,\bar{\mathcal{P}}_{N-1},\delta^{\Delta t}_0(\bar{X}^{\Delta t}_0),\dots,\delta^{\Delta t}_{N-1}(\bar{X}^{\Delta t}_{N-1})\right)\cdot g(\bar{X}^{\Delta t}_N)\right], \tag{2.6}$$
with the expectation on the right-hand side of (2.6) taken with respect to the dynamics in (2.3).
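Numerically, the per-step factor $L_n$ in (2.4) is just the ratio of Poisson likelihoods under the original and the controlled rates. A sketch (hypothetical helper of ours; all vectors have length $J$):

```python
import numpy as np

def step_likelihood(a, delta, p, dt):
    """Per-step likelihood ratio L_n of Eq. (2.4) for tilted Poisson rates.

    a, delta : original and controlled rates a_j(x), delta_j(x), shape (J,)
    p        : Poisson increments sampled under the new measure, shape (J,)
    Uses the convention a_j/delta_j = 1 when both vanish (cf. (2.2)).
    """
    ratio = np.ones_like(a)
    mask = delta > 0
    ratio[mask] = a[mask] / delta[mask]
    return np.exp(-(a.sum() - delta.sum()) * dt) * np.prod(ratio**p)
```

A quick sanity check: for each channel the factor equals the ratio of Poisson probability mass functions with means $a_j\Delta t$ and $\delta_j\Delta t$, and choosing $\delta=a$ gives $L_n\equiv 1$, recovering the plain TL scheme.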
Hereinafter, we aim to determine the optimal parameters $\{\delta^{\Delta t}_n(x)\}_{n=0,\dots,N-1;\,x\in\mathbb{N}^d}$ that minimize the second moment (and hence the variance) of the IS estimator, given that $\bar{X}^{\Delta t}_0=x_0$. To that end, we derive an associated SOC formulation. First, we introduce the cost function for the proposed SOC problem in Definition 2.1; then we derive a dynamic programming equation in Theorem 2.4 that is satisfied by the value function $u^{\Delta t}(\cdot,\cdot)$ in Definition 2.3. The proof of Theorem 2.4 is given in "Appendix A".

³ We refer to [10] (Sect. 4.1) for the likelihood factor derivation of a similar IS scheme.
Definition 2.1 (Second moment of the proposed importance sampling estimator) Let $0\le n\le N$. Given that $\bar{X}^{\Delta t}_n=x$, the second moment of the proposed IS estimator can be expressed as
$$C_{n,x}\!\left(\delta^{\Delta t}_n,\dots,\delta^{\Delta t}_{N-1}\right)=\mathbb{E}\left[g^2(\bar{X}^{\Delta t}_N)\prod_{k=n}^{N-1} L^2_k\!\left(\bar{\mathcal{P}}_k,\delta^{\Delta t}_k(\bar{X}^{\Delta t}_k)\right)\,\middle|\,\bar{X}^{\Delta t}_n=x\right],\quad 0\le n\le N-1, \tag{2.7}$$
with terminal cost $C_{N,x}=\mathbb{E}\left[g^2(\bar{X}^{\Delta t}_N)\,\middle|\,\bar{X}^{\Delta t}_N=x\right]=g^2(x)$, for any $x\in\mathbb{N}^d$.
Compared with the classical SOC formulation, (2.7) can be interpreted as the expected total cost, where the main difference is that (2.7) uses a multiplicative cost structure rather than the standard additive one. Therefore, we derive a dynamic programming relation in Theorem 2.4 associated with this cost structure that is fulfilled by the corresponding value function (see Definition 2.3) in the SRN context.
Remark 2.2 (Structure of the cost function) One can derive an optimal control formulation with an additive structure (similar to [30] in the stochastic differential equation setting) by applying a logarithmic transformation together with Jensen's inequality to (2.7). This reduces the control problem to a Kullback–Leibler minimization. In [41,42], this Kullback–Leibler minimization leads to the same optimal change of measure as a direct variance minimization approach. However, this conclusion needs more investigation in the setting of SRNs, which we leave for potential future work.
Definition 2.3 (Value function) The value function $u^{\Delta t}(\cdot,\cdot)$ is defined as the optimal (infimum) second moment of the proposed IS estimator. For time step $0\le n\le N$ and state $x\in\mathbb{N}^d$,
$$u^{\Delta t}(n,x):=\inf_{\{\delta^{\Delta t}_k\}_{k=n,\dots,N-1}\in\mathcal{A}^{N-n}} C_{n,x}\!\left(\delta^{\Delta t}_n,\dots,\delta^{\Delta t}_{N-1}\right)=\inf_{\{\delta^{\Delta t}_k\}_{k=n,\dots,N-1}\in\mathcal{A}^{N-n}}\mathbb{E}\left[g^2(\bar{X}^{\Delta t}_N)\prod_{k=n}^{N-1} L^2_k\!\left(\bar{\mathcal{P}}_k,\delta^{\Delta t}_k(\bar{X}^{\Delta t}_k)\right)\,\middle|\,\bar{X}^{\Delta t}_n=x\right], \tag{2.8}$$
where $\mathcal{A}=\times_{x\in\mathbb{N}^d}\times_{j=1}^{J}\mathcal{A}_{x,j}$ is the admissible set for the IS parameters, and $u^{\Delta t}(N,x)=g^2(x)$ for any $x\in\mathbb{N}^d$.
Theorem 2.4 (Dynamic programming for importance sampling parameters) For $x\in\mathbb{N}^d$, the value function $u^{\Delta t}(n,x)$ fulfills the dynamic programming relation
$$u^{\Delta t}(N,x)=g^2(x),$$
and, for $n=N-1,\dots,0$, with $\mathcal{A}_x:=\times_{j=1}^{J}\mathcal{A}_{x,j}$,
$$u^{\Delta t}(n,x)=\inf_{\delta^{\Delta t}_n(x)\in\mathcal{A}_x}\exp\!\left(\left(-2\sum_{j=1}^{J}a_j(x)+\sum_{j=1}^{J}\delta^{\Delta t}_{n,j}(x)\right)\Delta t\right)\times\sum_{p\in\mathbb{N}^J}\left(\prod_{j=1}^{J}\frac{\left(\Delta t\cdot\delta^{\Delta t}_{n,j}(x)\right)^{p_j}}{p_j!}\left(\frac{a_j(x)}{\delta^{\Delta t}_{n,j}(x)}\right)^{2p_j}\right)\cdot u^{\Delta t}(n+1,\max(0,x+\nu p)), \tag{2.9}$$
where $\nu=(\nu_1,\dots,\nu_J)\in\mathbb{Z}^{d\times J}$.
Theorem 2.4 breaks down the minimization problem into simpler optimizations that can be solved stepwise backward in time, starting from the final time $T$. Solving the minimization problem (2.9) analytically is difficult due to the infinite sum. Section 2.2 shows how to overcome this issue by approximating (2.9) to derive near-optimal parameters $\{\bar\delta^{\Delta t}_n(x)\}_{n=0,\dots,N-1;\,x\in\mathbb{N}^d}$ for the proposed IS approach.
2.2 Approximate dynamic programming

Theorem 2.4 gives an exact solution for the optimal IS parameters resulting from modifying the Poisson random variable rates in the TL paths. However, the infinite sum has to be evaluated in closed form to solve (2.9) analytically, which is generally difficult. Therefore, we propose approximating the value function $u^{\Delta t}(n,x)$ in (2.9) by $\bar{u}^{\Delta t}(n,x)$ for all time steps $n=0,\dots,N$, reaction channels $j=1,\dots,J$, and states $x\in\mathbb{N}^d$. First, both $u^{\Delta t}(n,x)$ and $\bar{u}^{\Delta t}(n,x)$ satisfy the same final condition,
$$u^{\Delta t}(N,x)=\bar{u}^{\Delta t}(N,x)=g^2(x). \tag{2.10}$$

Next, to derive the approximate dynamic programming relation for $\bar{u}^{\Delta t}(\cdot,\cdot)$, we presume Assumption 2.5 to hold. This assumption is motivated by the behavior of the original propensities, which are of $\mathcal{O}(1)$ due to the mass-action kinetics principle (refer to (1.5)).

Assumption 2.5 The controls $\{\delta^{\Delta t}_n\}_{n=0,\dots,N-1}$ are asymptotically constant (i.e., $\delta^{\Delta t}_{n,j}(x)\to c_{n,j,x}$ as $\Delta t\to 0$, where the $c_{n,j,x}$ are constants for $1\le j\le J$, $0\le n\le N-1$, and $x\in\mathbb{N}^d$).

Given Assumption 2.5 and that the $\{a_j(\cdot)\}_{j=1}^{J}$ are of $\mathcal{O}(1)$, we apply a Taylor expansion around $\Delta t=0$ to the exponential term in (2.9), then truncate the expression within the infimum such that the remaining terms are $\mathcal{O}(\Delta t)$. This truncates the
infinite sum and linearizes the exponential term. Thus, for $x\in\mathbb{N}^d$ and $n=N-1,\dots,0$,
$$\begin{aligned}\bar{u}^{\Delta t}(n,x)&=\Delta t\inf_{(\delta_1,\dots,\delta_J)\in\mathcal{A}_x}\left(\sum_{j=1}^{J}\frac{a^2_j(x)}{\delta_j}\,\bar{u}^{\Delta t}(n+1,\max(0,x+\nu_j))+\bar{u}^{\Delta t}(n+1,x)\sum_{j=1}^{J}\delta_j\right)+\bar{u}^{\Delta t}(n+1,x)-2\Delta t\cdot\bar{u}^{\Delta t}(n+1,x)\cdot\sum_{j=1}^{J}a_j(x)\\
&=\Delta t\cdot\sum_{j=1}^{J}\underbrace{\inf_{\delta_j\in\mathcal{A}_{x,j}}\left(\frac{a^2_j(x)}{\delta_j}\cdot\bar{u}^{\Delta t}(n+1,\max(0,x+\nu_j))+\delta_j\cdot\bar{u}^{\Delta t}(n+1,x)\right)}_{=:Q^{\Delta t}(n,j,x)}+\left(1-2\Delta t\sum_{j=1}^{J}a_j(x)\right)\bar{u}^{\Delta t}(n+1,x),\end{aligned} \tag{2.11}$$
where $\delta_j\in\mathcal{A}_{x,j}$, $j=1,\dots,J$, are the SOC parameters at state $x$ for reaction $j$. The admissible set $\mathcal{A}_{x,j}$ is defined in (2.2). Assumption 2.5 ensures that (i) we can apply the Taylor expansion to the exponential term as $\Delta t$ decreases, and (ii) we have the exact approximation structure for (2.11), with no further terms scaling with $\Delta t$ that have order less than $\Delta t^2$.

The infimum in (2.11) is attained when
$$\text{(i)}\;\bar{u}^{\Delta t}(n+1,x)\neq 0,\quad\text{and}\quad\text{(ii)}\;\bar{u}^{\Delta t}(n+1,\max(0,x+\nu_j))\neq 0,\;\forall\,1\le j\le J. \tag{2.12}$$
In this case, the approximate optimal SOC parameter $\bar\delta^{\Delta t}_{n,j}(x)$ can be analytically determined as
$$\bar\delta^{\Delta t}_{n,j}(x)=a_j(x)\,\frac{\sqrt{\bar{u}^{\Delta t}(n+1,\max(0,x+\nu_j))}}{\sqrt{\bar{u}^{\Delta t}(n+1,x)}},\quad 1\le j\le J. \tag{2.13}$$
Note that (2.13) includes the particular case when $a_j(x)=0$ for some $j\in\{1,\dots,J\}$. In such a case, $\bar\delta^{\Delta t}_{n,j}(x)=0$, which agrees with (2.2).

An important advantage of this numerical approximation, $\bar{u}^{\Delta t}(\cdot,\cdot)$, is that we reduce the complexity of the original optimization problem at each step in (2.9) from a simultaneous optimization over $J$ variables to $J$ independent one-dimensional optimization problems that can be solved in parallel using (2.13).
Remark 2.6 (Assumption (2.12)) Whether the assumption in (2.12) is generally fulfilled depends on the method employed to solve the dynamic programming principle in (2.11). For example, if we use a direct numerical implementation, either some special numerical treatment is required for the cases where (2.12) is violated, or some regularization is required to ensure well-posedness. The proposed approach from Sect. 2.3 avoids this issue since we model $\bar{u}^{\Delta t}(\cdot,\cdot)$ with a strictly positive ansatz function, which guarantees condition (2.12) for any state $x$ and all time steps $n$.
Remark 2.7 (Computational cost of dynamic programming) To derive a practical numerical algorithm for a finite number of states, we truncate the infinite state space $\mathbb{N}^d$ to $\times_{i=1}^{d}[0,S_i]$, where $S_1,\dots,S_d$ are sufficiently large upper bounds. The computational cost to numerically solve the dynamic programming equation (2.11) for step size $\Delta t$ and state space $\times_{i=1}^{d}[0,S_i]$ can be expressed as
$$W_{\mathrm{dp}}(S,\Delta t)\approx (S^{*})^{d}\cdot\frac{T}{\Delta t}\cdot J, \tag{2.14}$$
where $S^{*}=\max_{i=1,\dots,d} S_i$.

The cost in (2.14) scales exponentially with the dimension $d$. Section 2.3 proposes an alternative approach to address this curse of dimensionality. However, in future work, we aim to combine dimension reduction techniques for SRNs with a direct numerical implementation of dynamic programming.
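For small state spaces, the backward sweep of (2.11) with the closed-form minimizer (2.13) can be implemented directly. The sketch below (ours, for a one-dimensional truncated lattice $\{0,\dots,S\}$, not the paper's implementation) uses the fact that, at the infimum, $\frac{a_j^2}{\delta_j}\bar u^+ + \delta_j\bar u = 2a_j\sqrt{\bar u^+\bar u}$, and it simply guards the degenerate cases excluded by (2.12) instead of regularizing:

```python
import numpy as np

def solve_dp(g, nu, propensities, S, N, dt):
    """Backward sweep for the approximate DP (2.11) on states {0,...,S} (d = 1).

    g            : observable, g(x) scalar
    nu           : list of J integer stoichiometric increments
    propensities : function x -> rates a_j(x), shape (J,)
    Returns the value table u[n, x] and controls delta[n, j, x] from (2.13).
    """
    J = len(nu)
    u = np.zeros((N + 1, S + 1))
    delta = np.zeros((N, J, S + 1))
    u[N] = [g(x)**2 for x in range(S + 1)]        # final condition (2.10)
    for n in range(N - 1, -1, -1):
        for x in range(S + 1):
            a = propensities(x)
            val = (1.0 - 2.0 * dt * a.sum()) * u[n + 1, x]
            for j in range(J):
                xp = min(max(x + nu[j], 0), S)     # project onto the truncated lattice
                val += 2.0 * dt * a[j] * np.sqrt(u[n + 1, xp] * u[n + 1, x])
                if a[j] > 0 and u[n + 1, x] > 0:   # guard for the cases in (2.12)
                    delta[n, j, x] = a[j] * np.sqrt(u[n + 1, xp] / u[n + 1, x])
            u[n, x] = val
    return u, delta
```

A useful consistency check: for the trivial observable $g\equiv 1$ the value function stays identically one, and (2.13) returns $\bar\delta_{n,j}=a_j$, i.e., no tilting.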
2.3 Learning-based approach

Using the SOC formulation derived in Sect. 2.2, we propose approximating the value function $\bar{u}^{\Delta t}(\cdot,\cdot)$ with a parameterized ansatz function, $\hat{u}(t,x;\beta)$.

Remark 2.8 (Choosing the ansatz function) The parameterized ansatz function $\hat{u}(t,x;\beta)$ should respect the final condition of the value function (2.9), and its choice depends on the given SRN and observable $g(x)$. For linear observables, such as $g(x)=x_i$, we can consider polynomial basis functions as an ansatz. For more complex problems, the ansatz function is a small neural network.

For rare event applications with observable $g(x)=\mathbf{1}_{\{x_i>\gamma\}}$, we consider a sigmoid with learning parameters $\beta=(\beta_{\mathrm{space}},\beta_{\mathrm{time}})\in\mathbb{R}^{d+1}$ as the ansatz function
$$\hat{u}(t,x;\beta)=\frac{1}{1+e^{-(1-t)\cdot(\langle\beta_{\mathrm{space}},x\rangle+\beta_{\mathrm{time}})-b_0-\beta_0 x_i}}, \tag{2.15}$$
where $\langle\cdot,\cdot\rangle$ denotes the inner product, and the time is scaled to one using $t\in[0,1]$.
The parameters $b_0$ and $\beta_0$ are not learned through optimization but determined by fitting the final condition of Theorem 2.4, which imposes $\hat{u}(1,x;\beta)\approx g^2(x)=\mathbf{1}_{\{x_i>\gamma\}}$. Therefore, the discontinuous indicator function is approximated by a sigmoid, and the fit is characterized by the position of the sigmoid's inflection point and the sharpness of its slope. The position and value of local and global minima with respect to the learned parameters $\beta_{\mathrm{space}}$ and $\beta_{\mathrm{time}}$ depend on the choices of $b_0$ and $\beta_0$.
To derive IS parameters from the ansatz function, we use the previous SOC result (2.13), i.e.,
$$\hat\delta^{\Delta t}_j(n,x;\beta)=a_j(x)\,\frac{\sqrt{\hat{u}\!\left(\frac{(n+1)\Delta t}{T},\max(0,x+\nu_j);\beta\right)}}{\sqrt{\hat{u}\!\left(\frac{(n+1)\Delta t}{T},x;\beta\right)}},\quad 1\le j\le J,\;0\le n\le N-1,\;x\in\mathbb{N}^d. \tag{2.16}$$
We define $\hat{u}(t,\cdot;\cdot)$ in (2.15) as a time-continuous function for $t\in[0,1]$, whereas the IS controls $\hat\delta^{\Delta t}_j(n,\cdot;\cdot)$ are discrete in time for $n=0,\dots,N-1$ and depend on the time step size $\Delta t$. Therefore, $\hat{u}(\cdot,\cdot;\beta)$ can be used to derive control parameters for arbitrary $\Delta t$ in (2.16).
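A sketch of the sigmoid ansatz (2.15) and the induced controls (2.16) (illustrative helper functions of ours, not the paper's code; since the sigmoid is strictly positive, the ratio in (2.16) is always well defined, as noted in Remark 2.6):

```python
import numpy as np

def u_ansatz(t, x, beta_space, beta_time, b0, beta0, i):
    """Sigmoid ansatz (2.15); t is the time rescaled to [0,1], x the state (d,)."""
    z = (1.0 - t) * (np.dot(beta_space, x) + beta_time) + b0 + beta0 * x[i]
    return 1.0 / (1.0 + np.exp(-z))

def controls(n, x, N, a, nu, u):
    """Near-optimal IS rates (2.16) from a strictly positive ansatz u(t, x)."""
    t_next = (n + 1) / N                  # (n+1)*dt/T with time rescaled to [0,1]
    ux = u(t_next, x)
    return np.array([a[j] * np.sqrt(u(t_next, np.maximum(x + nu[j], 0)) / ux)
                     for j in range(len(nu))])
```

Fitting $b_0$ and $\beta_0$ to the final condition means, e.g., for $\gamma=10$, choosing $\beta_0=10$ and $b_0=-10\cdot 10.5$ so that $\hat u(1,x;\beta)$ is close to one for $x_i=11$ and close to zero for $x_i=10$ (these particular values are only a hypothetical fit for illustration).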
The parameters $\beta$ of the ansatz function are then chosen to minimize the second moment,
$$\inf_{\beta\in\mathbb{R}^{d+1}}\underbrace{\mathbb{E}\left[g^2\!\left(\bar{X}^{\Delta t,\beta}_N\right)\prod_{k=0}^{N-1}L^2_k\!\left(\bar{\mathcal{P}}_k,\hat\delta^{\Delta t}(k,\bar{X}^{\Delta t,\beta}_k;\beta)\right)\right]}_{=:C_{0,x}\left(\hat\delta^{\Delta t}_0,\dots,\hat\delta^{\Delta t}_{N-1};\beta\right)}, \tag{2.17}$$
where $\{\bar{X}^{\Delta t,\beta}_n\}_{n=1,\dots,N}$ is the IS path generated using the IS parameters from (2.16) and $\left(\hat\delta^{\Delta t}(n,x;\beta)\right)_j=\hat\delta^{\Delta t}_j(n,x;\beta)$ for $1\le j\le J$.

We use a gradient-based stochastic optimizer to solve (2.17), and we derive Lemma 2.9 (proof in "Appendix B") for the gradient of the second moment with respect to the parameters $\beta$.
Lemma 2.9 The partial derivatives of the second moment $C_{0,x}\!\left(\hat\delta^{\Delta t}_0,\dots,\hat\delta^{\Delta t}_{N-1};\beta\right)$ in (2.17) with respect to $\beta_l$, $l=1,\dots,d+1$, are given by
$$\frac{\partial}{\partial\beta_l}\,\mathbb{E}\Bigg[\underbrace{g^2\!\left(\bar{X}^{\Delta t,\beta}_N\right)\prod_{k=0}^{N-1}L^2_k\!\left(\bar{\mathcal{P}}_k,\hat\delta^{\Delta t}(k,\bar{X}^{\Delta t,\beta}_k;\beta)\right)}_{=:R(x_0;\beta)}\Bigg]=\mathbb{E}\left[R(x_0;\beta)\left(\sum_{k=1}^{N-1}\sum_{j=1}^{J}\left(\Delta t-\frac{\bar{\mathcal{P}}_{k,j}}{\hat\delta^{\Delta t}_j(k,\bar{X}^{\Delta t,\beta}_k;\beta)}\right)\cdot\frac{\partial}{\partial\beta_l}\hat\delta^{\Delta t}_j(k,\bar{X}^{\Delta t,\beta}_k;\beta)\right)\right], \tag{2.18}$$
where $\{\bar{X}^{\Delta t,\beta}_n\}_{n=1,\dots,N}$ is the IS path generated using the IS parameters from (2.16) and
$$\frac{\partial}{\partial\beta_l}\hat\delta^{\Delta t}_j(k,x;\beta)=\frac{a^2_j(x)}{2\,\hat\delta^{\Delta t}_j(k,x;\beta)}\cdot\frac{\frac{\partial}{\partial\beta_l}\hat{u}\!\left(\frac{(k+1)\Delta t}{T},\max(x+\nu_j,0);\beta\right)\hat{u}\!\left(\frac{(k+1)\Delta t}{T},x;\beta\right)-\hat{u}\!\left(\frac{(k+1)\Delta t}{T},\max(x+\nu_j,0);\beta\right)\frac{\partial}{\partial\beta_l}\hat{u}\!\left(\frac{(k+1)\Delta t}{T},x;\beta\right)}{\hat{u}^2\!\left(\frac{(k+1)\Delta t}{T},x;\beta\right)}. \tag{2.19}$$
Thus, the partial derivatives of $\hat{u}(t,x;\beta)$ for the ansatz (2.15) are
$$\frac{\partial}{\partial\beta_l}\hat{u}(t,x;\beta)=\begin{cases}(1-t)\,x_i\,\hat{u}(t,x;\beta)\left(1-\hat{u}(t,x;\beta)\right), & \text{if }\beta_l=\beta_{\mathrm{space},i},\\[2pt] (1-t)\,\hat{u}(t,x;\beta)\left(1-\hat{u}(t,x;\beta)\right), & \text{if }\beta_l=\beta_{\mathrm{time}},\end{cases} \tag{2.20}$$
where $\beta_{\mathrm{space},i}$ denotes the $i$-th entry of $\beta_{\mathrm{space}}$.

For an ansatz function different from (2.15), the gradient is still given by Lemma 2.9; only the derivation of $\frac{\partial}{\partial\beta_l}\hat{u}(t,x;\beta)$ in (2.20) changes accordingly.
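The analytic derivatives (2.20) are easy to sanity-check against finite differences. This is an illustrative verification sketch of ours, not part of the paper:

```python
import numpy as np

def sigmoid_u(t, x, beta_space, beta_time, b0, beta0, i):
    """Sigmoid ansatz (2.15), time rescaled to [0,1]."""
    z = (1.0 - t) * (np.dot(beta_space, x) + beta_time) + b0 + beta0 * x[i]
    return 1.0 / (1.0 + np.exp(-z))

def grad_u(t, x, u_val):
    """Gradient (2.20): d u/d beta_space_i = (1-t) x_i u (1-u),
    d u/d beta_time = (1-t) u (1-u). Uses sigma' = u(1-u)."""
    s = (1.0 - t) * u_val * (1.0 - u_val)
    return s * x, s
```

Comparing `grad_u` with a central or forward difference of `sigmoid_u` in each component of `beta_space` (and in `beta_time`) should agree to roughly the square root of machine precision.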
By estimating the gradient in (2.18) using an MC estimator, we iteratively optimize the parameters $\beta$ to reduce the variance. For this optimization, we use the Adam optimizer with the parameter values suggested in [34], the only difference being that the step size is tuned to fit our problem setting.

In Sect. 3, we numerically illustrate the potential of the new learning-based IS method in terms of variance reduction. Further theoretical and numerical analysis of this approach is left for future work, particularly the initialization of the learned parameters $\beta_{\mathrm{time}}$ and $\beta_{\mathrm{space}}$ in (2.15) and the investigation of a stopping rule.
To derive an estimator for $\mathbb{E}[g(X(T))]$ using the proposed IS change of measure, we first solve the related SOC problem using the approach from this section; then we simulate $M$ paths under the new IS sampling measure. Thus, the MC estimator using the proposed IS change of measure over $M$ paths becomes
$$\mu^{\mathrm{IS}}_{M,\Delta t}=\frac{1}{M}\sum_{i=1}^{M} L_i\cdot g\!\left(\bar{X}^{\Delta t,\beta}_{[i],N}\right), \tag{2.21}$$
where $\bar{X}^{\Delta t,\beta}_{[i],N}$ is the $i$-th IS sample path and the corresponding likelihood factor from (2.5) is
$$L_i=L\!\left(\bar{\mathcal{P}}_0,\dots,\bar{\mathcal{P}}_{N-1},\hat\delta^{\Delta t}(0,\bar{X}^{\Delta t,\beta}_{[i],0};\beta),\dots,\hat\delta^{\Delta t}(N-1,\bar{X}^{\Delta t,\beta}_{[i],N-1};\beta)\right). \tag{2.22}$$
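Putting the pieces together, the estimator (2.21) samples TL paths under the controlled rates while accumulating the likelihood factor (2.22) step by step. A compact sketch (ours, not the paper's implementation; `controls` would wrap (2.16), but any map $(n,x)\mapsto\delta$ with $\delta=0$ exactly where $a=0$ is admissible per (2.2)):

```python
import numpy as np

def is_tau_leap_estimator(x0, nu, propensities, controls, g, T, N, M, rng):
    """IS tau-leap estimator (2.21): simulate under delta_{n,j}, reweight by (2.22)."""
    dt = T / N
    total = 0.0
    for _ in range(M):
        x = np.array(x0, dtype=float)
        L = 1.0
        for n in range(N):
            a = propensities(x)
            d = controls(n, x)                       # delta_{n,j}(x), shape (J,)
            p = rng.poisson(d * dt)                  # increments under the IS measure
            ratio = np.where(d > 0, a / np.maximum(d, 1e-300), 1.0)
            L *= np.exp(-(a.sum() - d.sum()) * dt) * np.prod(ratio**p)  # Eq. (2.4)
            x = np.maximum(0.0, x + p @ nu)          # Eq. (2.3)
        total += L * g(x)
    return total / M
```

Two sanity checks follow directly from the construction: choosing `controls = propensities` gives $L\equiv 1$ and recovers the plain TL-MC estimator, and any other admissible tilt leaves the expectation unchanged by (2.6), only the variance differs.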
Remark 2.10 The explicit pathwise derivatives in Lemma 2.9
have the following ad