SIAM J. CONTROL OPTIM. © 2019 Society for Industrial and Applied Mathematics
Vol. 57, No. 6, pp. 3666–3693
EXTENDED MEAN FIELD CONTROL PROBLEMS: STOCHASTIC
MAXIMUM PRINCIPLE AND TRANSPORT PERSPECTIVE∗
BEATRICE ACCIAIO†, JULIO BACKHOFF-VERAGUAS‡, AND RENÉ CARMONA§
Abstract. We study mean field stochastic control problems where the cost function and the state
dynamics depend upon the joint distribution of the controlled state and the control process. We prove
suitable versions of the Pontryagin stochastic maximum principle, both in necessary and in sufficient
forms, which extend the known conditions to this general framework. We suggest a variational
approach for a weak formulation of these control problems. We show a natural connection between
this weak formulation and optimal transport on path space, which inspires a novel discretization
scheme.
Key words. controlled McKean–Vlasov SDEs, Pontryagin principle, mean-field interaction,
causal transport plans
AMS subject classifications. 93E20, 90C08, 60H30, 60K35
DOI. 10.1137/18M1196479
1. Introduction. The control of stochastic differential equations of mean field
type, also known as McKean–Vlasov control, did not get much attention before the
theory of mean field games became a popular subject of investigation. Indeed the two
topics are intimately related through the asymptotic theory of mean field stochastic
systems known as propagation of chaos. See, for example, [15] for an early discussion
of the similarities and the differences of the two problems. Among the earliest works
on this new form of control problem, relevant to the spirit of the analysis conducted
in this paper, are [10, 9, 3, 28, 8, 13]. Here, we follow the approach introduced
and developed in [13]. The reader is referred to [14, Chapters 3, 4, 6] for a general
overview of these problems and an extensive historical perspective. Still, most of
these contributions are limited to mean field interactions entering the models through
the statistical distribution of the state of the system alone. The goal of the present
article is to investigate the control of stochastic dynamics depending upon the joint
distribution of the controlled state and the control process. We refer to such problems
as extended mean field control problems; see [14, section 4.6].
Our first contribution is to prove an appropriate form of the Pontryagin stochastic
maximum principle, in necessary and in sufficient forms, for extended mean field
control problems. The main driver behind this search for an extension of existing
tools is the importance of many practical applications, which naturally fit within
the class of models for which the interactions are not only through the distribution
of the state of the system, but also through the distribution of the controls.

∗Received by the editors June 25, 2018; accepted for publication (in revised form) August 16, 2019; published electronically November 12, 2019. https://doi.org/10.1137/18M1196479
Funding: The third author was partially supported by National Science Foundation grant DMS-1716673 and by Army Research Office grant W911NF-17-1-0578.
†Department of Statistics, London School of Economics, London, WC2A 2AE, England (b.acciaio@lse.ac.uk).
‡Institute of Statistics and Mathematical Methods in Economics, Vienna University of Technology, Vienna, 1040, Austria (julio.backhoff@tuwien.ac.at).
§Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544 (rcarmona@princeton.edu).

The analysis of extended mean field control problems has so far been restricted to the
linear quadratic (LQ) case; see, e.g., [35, 24, 6, 33]. To the best of our knowledge, the
recent work [33] is the only one where more general models are considered. In that
article, however, the authors restrict the analysis to closed-loop feedback controls,
leading to a deterministic reformulation of the problem, which is used in order to
derive the Bellman equation associated with the problem; theirs is therefore a PDE
approach. In the present paper, we study the extended mean field control problem
without any restrictions, deriving a version of the Pontryagin maximum principle via
a probabilistic approach.
We apply our optimality conditions for particular classes of models, where our
analysis can be pushed further. In the case of scalar interactions, in which the dy-
namics depend solely upon moments of the marginal distributions, we derive a more
explicit form of the optimality condition. The advantage here is that the analysis
can be conducted with a form of classical differential calculus, without the use of the
notion of L-differentiability. The announced work [23] studies an application of such
a class of models in electricity markets. As a special case of scalar interaction, we
study an optimal liquidation model, which we are able to solve explicitly. Finally, we
consider the case of LQ models for which we easily derive explicit solutions which can
be computed numerically. The results in the LQ setting are compatible with existing
results in the literature.
Another contribution of the present article is the variational study of a weak
formulation of the extended mean field control problem. Weak formulations have
already been studied in the literature, without nonlinear dependence in the law of
the control, as in [14, Chapter 6] and [25]. In this framework, we derive an analogue
of the Pontryagin principle in the form of a martingale optimality condition. Similar
statements have been derived in [18, 27] under the name of stochastic Euler–Lagrange
condition for a different kind of problems. Next, we derive a natural connection
between the extended mean field control problem and an optimal transport problem
on path space. The theory of optimal transport is known to provide a set of tools and
results crucial to the understanding of mean field control and mean field games. We
illustrate the use of this connection by building a discretization scheme for extended
mean field control based on transport-theoretic tools (as in [36, Chapter 3.6] for the
case without mean field terms), and show that this scheme converges monotonically
to the value of the original extended mean field control problem. The explosion in
activity regarding numerical optimal transport gives us reason to believe that such
discretization schemes might be efficiently implemented in the near future; see, e.g.,
[19, 7, 29] for the static setting and [30, 31, 32] for the dynamic one.
The paper is organized as follows. In section 2, we introduce the notations and
basic underpinnings for extended mean field control. Section 3 provides a new form
of the Pontryagin stochastic maximum principle. In section 4, we study classes of
models for which our optimality conditions lead to explicit solutions. In section 5, we
analyze the weak formulation of the problem in connection with optimal transport.
Finally, in the appendix, we collect some technical proofs.
2. Extended mean field control problems. The goal of this short subsection
is to set the stage for the statements and proofs of the stochastic maximum principle
proven in section 3 below.
Let $f$, $b$, and $\sigma$ be measurable functions on $\mathbb{R}^d \times \mathbb{R}^k \times \mathcal{P}_2(\mathbb{R}^d \times \mathbb{R}^k)$ with values in $\mathbb{R}$, $\mathbb{R}^d$, and $\mathbb{R}^{d\times m}$, respectively, and let $g$ be a real-valued measurable function on $\mathbb{R}^d \times \mathcal{P}_2(\mathbb{R}^d)$. Here and elsewhere we denote by $\mathcal{P}(\cdot)$ (resp., $\mathcal{P}_2(\cdot)$) the set of probability measures (resp., with finite second moments) over an underlying metric space. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $\mathcal{F}_0 \subset \mathcal{F}$ be a sub-sigma-algebra, and $\mathbb{F} = (\mathcal{F}_t)_{0\le t\le T}$ be the filtration generated by $\mathcal{F}_0$ and an $m$-dimensional Wiener process $W = (W_t)_{0\le t\le T}$. We denote by $\mathbb{A}$ the set of progressively measurable processes $\alpha = (\alpha_t)_{0\le t\le T}$ taking values in a given closed convex set $A \subset \mathbb{R}^k$ and satisfying the integrability condition $\mathbb{E}\int_0^T |\alpha_t|^2\,dt < \infty$.
We consider the problem of minimizing

(2.1) $\displaystyle J(\alpha) = \mathbb{E}\Big[\int_0^T f(X_t, \alpha_t, \mathcal{L}(X_t,\alpha_t))\,dt + g(X_T, \mathcal{L}(X_T))\Big]$

over the set $\mathbb{A}$ of admissible control processes, under the dynamic constraint

(2.2) $dX_t = b(X_t, \alpha_t, \mathcal{L}(X_t,\alpha_t))\,dt + \sigma(X_t, \alpha_t, \mathcal{L}(X_t,\alpha_t))\,dW_t$

with $X_0$ a fixed $\mathcal{F}_0$-measurable random variable.
The symbol $\mathcal{L}$ stands for the law of the given random element. We shall add mild regularity conditions on the coefficients $b$ and $\sigma$ so that a solution to (2.2) always exists when $\alpha \in \mathbb{A}$. For the sake of simplicity, we chose to use time-independent coefficients, but all the results would be the same should $f$, $b$, and $\sigma$ depend upon $t$, since time can be included as an extra state in the vector $X$.
The novelty of the above control problem lies in the fact that the cost functional
and the controlled SDE depend on the joint distribution of state and control. For
this reason, we call it the extended mean field control problem. In this generality,
this problem has not been studied before. We mention the works [35, 24, 6, 33] for
particular cases and different approaches.
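Although the analysis below works with exact laws, the structure of (2.1)–(2.2) is easy to illustrate numerically by replacing the joint law $\mathcal{L}(X_t,\alpha_t)$ with the empirical measure of $N$ interacting particles. The following sketch does exactly that for a toy scalar model; the coefficients, the (suboptimal) feedback control, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative particle sketch of (2.1)-(2.2) (assumptions, not the paper's model):
# the joint law L(X_t, alpha_t) is replaced by the empirical measure of N particles,
# and the toy coefficients interact with the law only through the means
# E[X_t] and E[alpha_t].
rng = np.random.default_rng(0)
N, T, n_steps = 5_000, 1.0, 200
dt = T / n_steps

def control(x, mean_x):
    # a fixed (suboptimal) linear feedback control, also reacting to E[X_t]
    return -x + 0.5 * mean_x

X = rng.normal(0.0, 1.0, size=N)   # samples of the F_0-measurable X_0
cost = np.zeros(N)
for _ in range(n_steps):
    a = control(X, X.mean())
    mean_a = a.mean()                                    # moment of the empirical law
    drift = a + 0.2 * mean_a                             # b(x, a, xi) = a + 0.2 E[alpha_t]
    cost += 0.5 * (X**2 + a**2 + (a - mean_a)**2) * dt   # running cost f
    X = X + drift * dt + 0.3 * np.sqrt(dt) * rng.normal(size=N)   # sigma = 0.3
cost += 0.5 * X**2                                       # terminal cost g(x, mu) = x^2 / 2
J = cost.mean()
print(f"Monte Carlo estimate of J(alpha) for this control: {J:.3f}")
```

Since the toy interaction above enters only through means, the empirical averages are the only statistics of the particle cloud the dynamics need; general extended mean field interactions would require the full empirical joint distribution.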
2.1. Partial L-differentiability of functions of measures. We introduce
here the concept of L-differentiability for functions of joint probability laws (i.e.,
probability measures on product spaces). We refer the reader to [14, Chapter 5] for
more details.
Let $u : \mathbb{R}^q \times \mathcal{P}_2(\mathbb{R}^d \times \mathbb{R}^k) \to \mathbb{R}$. We use the notation $\xi$ for a generic element of $\mathcal{P}_2(\mathbb{R}^d \times \mathbb{R}^k)$, and $\mu \in \mathcal{P}_2(\mathbb{R}^d)$ and $\nu \in \mathcal{P}_2(\mathbb{R}^k)$ for its marginals. We denote a generic element of $\mathbb{R}^q$ by $v$.

Let $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde{\mathbb{P}})$ be a probability space and let $\tilde u$ be a lifting of the function $u$. In other words,

$\tilde u : \mathbb{R}^q \times L^2(\tilde\Omega, \tilde{\mathcal{F}}, \tilde{\mathbb{P}}; \mathbb{R}^d \times \mathbb{R}^k) \ni (v, \tilde X, \tilde\alpha) \mapsto \tilde u(v, \tilde X, \tilde\alpha) = u(v, \mathcal{L}(\tilde X, \tilde\alpha)).$
We say that $u$ is $L$-differentiable at $(v,\xi)$ if there exists a pair
$(\tilde X,\tilde\alpha) \in L^2(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}};\mathbb{R}^d\times\mathbb{R}^k)$ with $\mathcal{L}(\tilde X,\tilde\alpha) = \xi$
such that the lifted function $\tilde u$ is Fréchet differentiable at $(v,\tilde X,\tilde\alpha)$; cf. [20, Chapter II.5, p. 92]. When this is the case, it turns out that the Fréchet derivative depends only on the law $\xi$ and not on the specific pair $(\tilde X,\tilde\alpha)$ having distribution $\xi$; see [11] or [14, Chapter 6] for details. Thanks to the self-duality of $L^2$ spaces, the Fréchet derivative $[D\tilde u](v,\tilde X,\tilde\alpha)$ of the lifting function $\tilde u$ at $(v,\tilde X,\tilde\alpha)$ can be viewed as an element $D\tilde u(v,\tilde X,\tilde\alpha)$ of $\mathbb{R}^q \times L^2(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}};\mathbb{R}^d\times\mathbb{R}^k)$, in the sense that
$[D\tilde u](v,\tilde X,\tilde\alpha)(\tilde Y) = \tilde{\mathbb{E}}\big[D\tilde u(v,\tilde X,\tilde\alpha)\cdot\tilde Y\big]$ for all $\tilde Y \in \mathbb{R}^q \times L^2(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}};\mathbb{R}^d\times\mathbb{R}^k)$.
Since $\mathbb{R}^q \times L^2(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}};\mathbb{R}^d\times\mathbb{R}^k) \cong \mathbb{R}^q \times L^2(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}};\mathbb{R}^d) \times L^2(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}};\mathbb{R}^k)$, as in [11], the random variable $D\tilde u(v,\tilde X,\tilde\alpha)$ can be represented a.s. via the random vector

$D\tilde u(v,\tilde X,\tilde\alpha) = \big(\partial_v u(v,\mathcal{L}(\tilde X,\tilde\alpha))(\tilde X,\tilde\alpha),\ \partial_\mu u(v,\mathcal{L}(\tilde X,\tilde\alpha))(\tilde X,\tilde\alpha),\ \partial_\nu u(v,\mathcal{L}(\tilde X,\tilde\alpha))(\tilde X,\tilde\alpha)\big)$

for measurable functions $\partial_v u(\cdot,\mathcal{L}(\tilde X,\tilde\alpha))(\cdot,\cdot)$, $\partial_\mu u(\cdot,\mathcal{L}(\tilde X,\tilde\alpha))(\cdot,\cdot)$, $\partial_\nu u(\cdot,\mathcal{L}(\tilde X,\tilde\alpha))(\cdot,\cdot)$, all of them defined on $\mathbb{R}^q\times\mathbb{R}^d\times\mathbb{R}^k$ and valued, respectively, in $\mathbb{R}^q$, $\mathbb{R}^d$, and $\mathbb{R}^k$. We call these functions the partial $L$-derivatives of $u$ at $(v,\mathcal{L}(\tilde X,\tilde\alpha))$.
3. Stochastic maximum principle. Our goal is to prove a necessary and a
sufficient condition for optimality in the extended class of problems considered in the
paper. These are suitable extensions of the Pontryagin stochastic maximum principle
conditions. We define the Hamiltonian $H$ by

(3.1) $H(x,\alpha,\xi,y,z) = b(x,\alpha,\xi)\cdot y + \sigma(x,\alpha,\xi)\cdot z + f(x,\alpha,\xi)$

for $(x,\alpha,\xi,y,z) \in \mathbb{R}^d \times \mathbb{R}^k \times \mathcal{P}_2(\mathbb{R}^d\times\mathbb{R}^k) \times \mathbb{R}^d \times \mathbb{R}^{d\times m}$. Naturally, the dot notation for matrices refers to the trace inner product. We let $\mathbb{H}^{0,n}$ stand for the collection of all $\mathbb{R}^n$-valued progressively measurable processes on $[0,T]$, and denote by $\mathbb{H}^{2,n}$ the collection of processes $Z$ in $\mathbb{H}^{0,n}$ such that $\mathbb{E}\int_0^T |Z_s|^2\,ds < \infty$. We shall also denote by $\mathbb{S}^{2,n}$ the space of all continuous processes $S = (S_t)_{0\le t\le T}$ in $\mathbb{H}^{0,n}$ such that $\mathbb{E}[\sup_{0\le t\le T}|S_t|^2] < +\infty$. Here and in what follows, regularity properties, such as continuity or Lipschitz character, of functions of measures are always understood in the sense of the 2-Wasserstein distance on the respective spaces of probability measures with finite second moments; cf. [34].
Throughout this section, we assume the following:

(I) The functions $b$, $\sigma$, and $f$ are differentiable with respect to $(x,\alpha)$, for $\xi \in \mathcal{P}_2(\mathbb{R}^d\times\mathbb{R}^k)$ fixed, and the functions
$(x,\alpha,\xi) \mapsto \big(\partial_x(b,\sigma,f)(x,\alpha,\xi),\ \partial_\alpha(b,\sigma,f)(x,\alpha,\xi)\big)$
are continuous. Moreover, the functions $b$, $\sigma$, and $f$ are $L$-differentiable with respect to the variable $\xi$, the mapping
$\mathbb{R}^d \times A \times L^2(\Omega,\mathcal{F},\mathbb{P};\mathbb{R}^d\times\mathbb{R}^k) \ni (x,\alpha,(X,\beta)) \mapsto \partial_\mu(b,\sigma,f)(x,\alpha,\mathcal{L}(X,\beta))(X,\beta)$
being continuous. Similarly, the function $g$ is differentiable with respect to $x$, the mapping $(x,\mu) \mapsto \partial_x g(x,\mu)$ being continuous. The function $g$ is also $L$-differentiable with respect to the variable $\mu$, and the following map is continuous:
$\mathbb{R}^d \times L^2(\Omega,\mathcal{F},\mathbb{P};\mathbb{R}^d) \ni (x,X) \mapsto \partial_\mu g(x,\mathcal{L}(X))(X) \in L^2(\Omega,\mathcal{F},\mathbb{P};\mathbb{R}^d).$

(II) The derivatives $\partial_x(b,\sigma)$ and $\partial_\alpha(b,\sigma)$ are uniformly bounded, and the mapping
$(x,\alpha) \mapsto \partial_\mu(b,\sigma)(x,\alpha,\xi)(x,\alpha)$ (resp., $(x,\alpha) \mapsto \partial_\nu(b,\sigma)(x,\alpha,\xi)(x,\alpha)$)
has an $L^2(\mathbb{R}^d,\mu;\mathbb{R}^d\times\mathbb{R}^k)$-norm (resp., $L^2(\mathbb{R}^k,\nu;\mathbb{R}^d\times\mathbb{R}^k)$-norm) which is uniformly bounded in $(x,\alpha,\xi)$. There exists a constant $L$ such that, for any $R \ge 0$ and any $(x,\alpha,\xi)$ such that $|x|, |\alpha|, \|\xi\|_{L^2} \le R$, it holds that
$|\partial_x f(x,\alpha,\xi)| \vee |\partial_x g(x,\mu)| \vee |\partial_\alpha f(x,\alpha,\xi)| \le L(1+R),$
and the norms in $L^2(\mathbb{R}^d\times\mathbb{R}^k,\xi;\mathbb{R}^d\times\mathbb{R}^k)$ and $L^2(\mathbb{R}^d,\xi;\mathbb{R}^d\times\mathbb{R}^k)$ of $(x,\alpha) \mapsto \partial_\mu f(x,\alpha,\xi)(x,\alpha)$, $(x,\alpha) \mapsto \partial_\nu f(x,\alpha,\xi)(x,\alpha)$, and $x \mapsto \partial_\mu g(x,\mu)(x)$ are bounded by $L(1+R)$.
Under these assumptions, for any admissible control $\alpha \in \mathbb{A}$, we denote by $X = X^\alpha$ the corresponding controlled state process satisfying (2.2). We call adjoint processes of $X$ (or of $\alpha$) the couple $(Y,Z)$ of stochastic processes $Y = (Y_t)_{0\le t\le T}$ and $Z = (Z_t)_{0\le t\le T}$ in $\mathbb{S}^{2,d} \times \mathbb{H}^{2,d\times m}$ that satisfy

(3.2)
$dY_t = -\big(\partial_x H(\theta_t,Y_t,Z_t) + \tilde{\mathbb{E}}[\partial_\mu H(\tilde\theta_t,\tilde Y_t,\tilde Z_t)(X_t,\alpha_t)]\big)dt + Z_t\,dW_t,\quad t \in [0,T],$
$Y_T = \partial_x g(X_T,\mathcal{L}(X_T)) + \tilde{\mathbb{E}}[\partial_\mu g(\tilde X_T,\mathcal{L}(X_T))(X_T)],$

where $\theta_t := (X_t,\alpha_t,\mathcal{L}(X_t,\alpha_t))$, and the tilde notation refers to an independent copy.
Equation (3.2) is referred to as the adjoint equation. Formally, the adjoint variable $Y_t$ reads as the derivative of the value function of the control problem with respect to the state variable. In contrast with the deterministic case, in order for the solution to be adapted to the information flow, the extra term $Z_t\,dW_t$ is needed. This is a standard feature of the extension of the maximum principle from deterministic control to stochastic control. As expected, the equation is driven by the derivative of the Hamiltonian function with respect to the state variable. In addition, since the controlled dynamics are of the McKean–Vlasov type, the state variable, with respect to which we differentiate the Hamiltonian function, needs to include the probability measure appearing in the state equation. This is now well understood thanks to the early contributions [13] and [14, Chapter 6]. In the present case of extended mean field control problems, the above adjoint equation needs to account for the fact that the probability measure appearing in the state equation is in fact the joint distribution of the state $X_t$ and the control $\alpha_t$. This forces us to involve the derivative of the Hamiltonian with respect to the first marginal of this joint distribution.

Given $\alpha$, and as a result $X$, the process $\theta_t$ appears as a (random) input in the coefficients of this equation, which, except for the presence of the process copies, is a backward stochastic differential equation of the McKean–Vlasov type; it is well posed under the current assumptions. See, for example, the discussion in [14, Chapter 6, p. 532].
3.1. A necessary condition. The main result of this subsection is based on the following expression of the Gâteaux derivative of the cost function $J(\alpha)$.

Lemma 3.1. Let $\alpha \in \mathbb{A}$, $X$ be the corresponding controlled state process, and $(Y,Z)$ its adjoint processes satisfying (3.2). For $\beta \in \mathbb{A}$, the Gâteaux derivative of $J$ at $\alpha$ in the direction $\beta - \alpha$ is

$\displaystyle \frac{d}{d\epsilon} J(\alpha + \epsilon(\beta-\alpha))\Big|_{\epsilon=0} = \mathbb{E}\int_0^T \big(\partial_\alpha H(\theta_t,Y_t,Z_t) + \tilde{\mathbb{E}}[\partial_\nu H(\tilde\theta_t,\tilde Y_t,\tilde Z_t)(X_t,\alpha_t)]\big)\cdot(\beta_t - \alpha_t)\,dt,$

where $(\tilde X,\tilde Y,\tilde Z,\tilde\alpha,\tilde\beta)$ is an independent copy of $(X,Y,Z,\alpha,\beta)$ on the space $(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}})$.
Proof. We follow the lines of the proof of the stochastic maximum principle for the control of McKean–Vlasov equations given in [14, section 6.3]. Given admissible controls $\alpha$ and $\beta$, for each $\epsilon > 0$ we define the admissible control $\alpha^\epsilon = (\alpha^\epsilon_t)_{0\le t\le T}$ by $\alpha^\epsilon_t = \alpha_t + \epsilon(\beta_t - \alpha_t)$, and we denote by $X^\epsilon = (X^\epsilon_t)_{0\le t\le T}$ the solution of the state equation (2.2) for $\alpha^\epsilon$ in lieu of $\alpha$. We then consider the variation process $V = (V_t)_{0\le t\le T}$, defined as the solution of the linear stochastic differential equation

(3.3) $dV_t = \big(\gamma_t V_t + \rho_t + \eta_t\big)dt + \big(\hat\gamma_t V_t + \hat\rho_t + \hat\eta_t\big)dW_t$

with $V_0 = 0$. The coefficients $\gamma_t$, $\hat\gamma_t$, $\eta_t$, and $\hat\eta_t$ are defined by
$\gamma_t = \partial_x b(\theta_t),\quad \hat\gamma_t = \partial_x\sigma(\theta_t),\quad \eta_t = \partial_\alpha b(\theta_t)(\beta_t-\alpha_t),\quad \hat\eta_t = \partial_\alpha\sigma(\theta_t)(\beta_t-\alpha_t),$
which are progressively measurable bounded processes with values in the spaces $\mathbb{R}^{d\times d}$, $\mathbb{R}^{(d\times d)\times d}$, $\mathbb{R}^d$, and $\mathbb{R}^{d\times d}$, respectively (the parentheses around $d\times d$ indicate that $\hat\gamma_t\cdot u$ is seen as an element of $\mathbb{R}^{d\times d}$ whenever $u \in \mathbb{R}^d$). The coefficients $\rho_t$ and $\hat\rho_t$ are given by
$\rho_t = \tilde{\mathbb{E}}\big[\partial_\mu b(\theta_t)(\tilde X_t,\tilde\alpha_t)\tilde V_t\big] + \tilde{\mathbb{E}}\big[\partial_\nu b(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big],$
$\hat\rho_t = \tilde{\mathbb{E}}\big[\partial_\mu\sigma(\theta_t)(\tilde X_t,\tilde\alpha_t)\tilde V_t\big] + \tilde{\mathbb{E}}\big[\partial_\nu\sigma(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big],$
which are progressively measurable bounded processes with values in $\mathbb{R}^d$ and $\mathbb{R}^{d\times d}$, respectively, and where $(\tilde X_t,\tilde\alpha_t,\tilde V_t,\tilde\beta_t)$ is an independent copy of $(X_t,\alpha_t,V_t,\beta_t)$ defined on the separate probability structure $(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}})$.
We call $V = (V_t)_{0\le t\le T}$ the variation process because it is the Gâteaux derivative of the state in the direction $\beta-\alpha$, since, as detailed in [14, Lemma 6.10], it satisfies
$\displaystyle \lim_{\epsilon\to 0} \mathbb{E}\Big[\sup_{0\le t\le T}\Big|\frac{X^\epsilon_t - X_t}{\epsilon} - V_t\Big|^2\Big] = 0.$
For this reason, we have

(3.4)
$\displaystyle \lim_{\epsilon\to 0}\frac{1}{\epsilon}\big[J(\alpha^\epsilon) - J(\alpha)\big] = \mathbb{E}\int_0^T \Big(\partial_x f(\theta_t)V_t + \partial_\alpha f(\theta_t)(\beta_t-\alpha_t) + \tilde{\mathbb{E}}\big[\partial_\mu f(\theta_t)(\tilde X_t,\tilde\alpha_t)\tilde V_t\big] + \tilde{\mathbb{E}}\big[\partial_\nu f(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big]\Big)dt + \mathbb{E}\big[\partial_x g(X_T,\mathcal{L}(X_T))V_T + \tilde{\mathbb{E}}[\partial_\mu g(X_T,\mathcal{L}(X_T))(\tilde X_T)\tilde V_T]\big]$
$\displaystyle = \mathbb{E}\int_0^T \Big(\partial_x f(\theta_t)V_t + \partial_\alpha f(\theta_t)(\beta_t-\alpha_t) + \tilde{\mathbb{E}}\big[\partial_\mu f(\theta_t)(\tilde X_t,\tilde\alpha_t)\tilde V_t\big] + \tilde{\mathbb{E}}\big[\partial_\nu f(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big]\Big)dt + \mathbb{E}\Big[\big(\partial_x g(X_T,\mathcal{L}(X_T)) + \tilde{\mathbb{E}}[\partial_\mu g(\tilde X_T,\mathcal{L}(X_T))(X_T)]\big)V_T\Big],$

where we used Fubini's theorem to obtain the last equality. Notice that, if we introduce the adjoint processes $(Y,Z)$ of $\alpha \in \mathbb{A}$ and the corresponding state process $X$, by (3.2) we see that the last expectation above is exactly $\mathbb{E}[Y_T V_T]$. This can be computed by integration by parts, using the Itô differentials of $Y$ and $V$, which are given, respectively, by (3.2) and (3.3). In this way we obtain
$\displaystyle Y_T V_T = Y_0 V_0 + \int_0^T Y_t\,dV_t + \int_0^T V_t\,dY_t + \int_0^T d[Y,V]_t$
$\displaystyle = M_T + \int_0^T \Big(Y_t\partial_x b(\theta_t)V_t + Y_t\partial_\alpha b(\theta_t)(\beta_t-\alpha_t) + Y_t\tilde{\mathbb{E}}\big[\partial_\mu b(\theta_t)(\tilde X_t,\tilde\alpha_t)\tilde V_t\big] + Y_t\tilde{\mathbb{E}}\big[\partial_\nu b(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big]$
$\displaystyle \quad - V_t\partial_x b(\theta_t)Y_t - V_t\partial_x\sigma(\theta_t)Z_t - V_t\partial_x f(\theta_t) - V_t\tilde{\mathbb{E}}\big[\partial_\mu b(\tilde\theta_t)(X_t,\alpha_t)\tilde Y_t\big] - V_t\tilde{\mathbb{E}}\big[\partial_\mu\sigma(\tilde\theta_t)(X_t,\alpha_t)\tilde Z_t\big] - V_t\tilde{\mathbb{E}}\big[\partial_\mu f(\tilde\theta_t)(X_t,\alpha_t)\big]$
$\displaystyle \quad + Z_t\partial_x\sigma(\theta_t)V_t + Z_t\partial_\alpha\sigma(\theta_t)(\beta_t-\alpha_t) + Z_t\tilde{\mathbb{E}}\big[\partial_\mu\sigma(\theta_t)(\tilde X_t,\tilde\alpha_t)\tilde V_t\big] + Z_t\tilde{\mathbb{E}}\big[\partial_\nu\sigma(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big]\Big)dt,$
where $(M_t)_{0\le t\le T}$ is a mean-zero integrable martingale which disappears when we take expectations of both sides. Applying Fubini's theorem once more, we have
$\displaystyle \mathbb{E}[Y_T V_T] = \mathbb{E}\int_0^T \Big(Y_t\partial_x b(\theta_t)V_t + Y_t\partial_\alpha b(\theta_t)(\beta_t-\alpha_t) + Y_t\tilde{\mathbb{E}}\big[\partial_\nu b(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big] - V_t\partial_x b(\theta_t)Y_t - V_t\partial_x\sigma(\theta_t)Z_t - V_t\partial_x f(\theta_t) - V_t\tilde{\mathbb{E}}\big[\partial_\mu f(\tilde\theta_t)(X_t,\alpha_t)\big] + Z_t\partial_x\sigma(\theta_t)V_t + Z_t\partial_\alpha\sigma(\theta_t)(\beta_t-\alpha_t) + Z_t\tilde{\mathbb{E}}\big[\partial_\nu\sigma(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big]\Big)dt.$
Plugging this expression into the second equality of (3.4) we get, again by Fubini's theorem,
$\displaystyle \lim_{\epsilon\to 0}\frac{1}{\epsilon}\big[J(\alpha^\epsilon)-J(\alpha)\big] = \mathbb{E}\int_0^T \Big(\partial_\alpha f(\theta_t)(\beta_t-\alpha_t) + \tilde{\mathbb{E}}\big[\partial_\nu f(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big] + Y_t\partial_\alpha b(\theta_t)(\beta_t-\alpha_t) + Y_t\tilde{\mathbb{E}}\big[\partial_\nu b(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big] + Z_t\partial_\alpha\sigma(\theta_t)(\beta_t-\alpha_t) + Z_t\tilde{\mathbb{E}}\big[\partial_\nu\sigma(\theta_t)(\tilde X_t,\tilde\alpha_t)(\tilde\beta_t-\tilde\alpha_t)\big]\Big)dt,$
which is the desired result, by (3.1).
We are now ready to prove the necessary part of the Pontryagin stochastic maximum principle. In the present framework of extended mean field control, we obtain (3.5) below. It is not possible to improve this condition into a pointwise minimization condition, as in more classical versions of the problem where there is no nonlinear dependence on the law of the control; see (6.58) in [14]. We give an example of this phenomenon in Remark 4.2.

Theorem 3.2. Under assumptions (I)–(II), if the admissible control $\alpha = (\alpha_t)_{0\le t\le T} \in \mathbb{A}$ is optimal, $X = (X_t)_{0\le t\le T}$ is the associated controlled state given by (2.2), and $(Y,Z) = (Y_t,Z_t)_{0\le t\le T}$ are the associated adjoint processes satisfying (3.2), then we have

(3.5) $\big(\partial_\alpha H(\theta_t,Y_t,Z_t) + \tilde{\mathbb{E}}[\partial_\nu H(\tilde\theta_t,\tilde Y_t,\tilde Z_t)(X_t,\alpha_t)]\big)\cdot(\alpha_t - a) \le 0 \quad \forall a \in A,\ dt\otimes d\mathbb{P}\text{-a.s.},$

where $(\tilde X,\tilde Y,\tilde Z,\tilde\alpha)$ is an independent copy of $(X,Y,Z,\alpha)$ on $L^2(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}})$.
Proof. Given any admissible control $\beta$, we use as before the perturbation $\alpha^\epsilon_t = \alpha_t + \epsilon(\beta_t - \alpha_t)$. Since $\alpha$ is optimal, we have the inequality
$\displaystyle \frac{d}{d\epsilon} J(\alpha + \epsilon(\beta-\alpha))\Big|_{\epsilon=0} \ge 0.$
Using the result of the previous lemma, we get
$\displaystyle \mathbb{E}\int_0^T \big(\partial_\alpha H(\theta_t,Y_t,Z_t) + \tilde{\mathbb{E}}[\partial_\nu H(\tilde\theta_t,\tilde Y_t,\tilde Z_t)(X_t,\alpha_t)]\big)\cdot(\beta_t - \alpha_t)\,dt \ge 0.$
We now use the same argument as in the classical case (see, e.g., [14, Theorem 6.14]). For every $t$ and $\beta \in L^2(\Omega,\mathcal{F}_t,\mathbb{P};A)$, we can take $\beta_s$ equal to $\alpha_s$ except on the interval $[t, t+\varepsilon]$, where it equals $\beta$, obtaining

(3.6) $\mathbb{E}\big(\partial_\alpha H(\theta_t,Y_t,Z_t) + \tilde{\mathbb{E}}[\partial_\nu H(\tilde\theta_t,\tilde Y_t,\tilde Z_t)(X_t,\alpha_t)]\big)\cdot(\beta - \alpha_t) \ge 0.$

Further, for any $a \in A$ we can take $\beta$ to be equal to $a$ on an arbitrary set in $\mathcal{F}_t$, and to coincide with $\alpha_t$ otherwise, establishing (3.5).
Remark 3.3. If the admissible optimal control $\alpha$ takes values in the interior of $A$, then we may replace (3.5) with the following condition (see, e.g., [14, Proposition 6.15]):

(3.7) $\partial_\alpha H(\theta_t,Y_t,Z_t) + \tilde{\mathbb{E}}[\partial_\nu H(\tilde\theta_t,\tilde Y_t,\tilde Z_t)(X_t,\alpha_t)] = 0 \quad dt\otimes d\mathbb{P}\text{-a.s.}$
Remark 3.4. A sharpening of (3.5) can be obtained under the convexity condition

(3.8) $H(x,a',\xi',y,z) \ge H(x,a,\xi,y,z) + \partial_\alpha H(x,a,\xi,y,z)\cdot(a'-a) + \tilde{\mathbb{E}}\big[\partial_\nu H(x,a,\xi,y,z)(\tilde X_t,\tilde\alpha_t)\cdot(\tilde\alpha'_t - \tilde\alpha_t)\big]$

for all $x \in \mathbb{R}^d$, $a, a' \in A$, and $\tilde\alpha'$ a copy on $(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}})$ of an admissible control $\alpha'$, and where $\xi, \xi' \in \mathcal{P}_2(\mathbb{R}^d\times A)$ with $\xi = \mathcal{L}(\tilde X_t,\tilde\alpha_t)$ and $\xi' = \mathcal{L}(\tilde X_t,\tilde\alpha'_t)$. Indeed, in the framework of Theorem 3.2, if (3.8) holds, we apply it for $x = X_t(\omega)$, $a' = \beta(\omega)$, $y = Y_t(\omega)$, $z = Z_t(\omega)$, $a = \alpha_t(\omega)$, and $\alpha' = \beta$ such that $(\tilde X,\tilde Y,\tilde Z,\tilde\alpha,\tilde\beta)$ is a copy of $(X,Y,Z,\alpha,\beta)$. Passing to expectations and using (3.6), we get
$\mathbb{E}[H(X_t,\beta,\mathcal{L}(X_t,\beta),Y_t,Z_t)] \ge \mathbb{E}[H(X_t,\alpha_t,\mathcal{L}(X_t,\alpha_t),Y_t,Z_t)],$
so
$\alpha_t = \operatorname{argmin}\big\{\mathbb{E}[H(X_t,\beta,\mathcal{L}(X_t,\beta),Y_t,Z_t)] : \beta \in L^2(\Omega,\mathcal{F}_t,\mathbb{P};A)\big\}.$
3.2. A sufficient condition. Guided by the necessary condition proven above, we derive a sufficient condition for optimality in the same spirit, though under stronger convexity assumptions. For a given pair $(\tilde X,\tilde\alpha)$, these conditions read as

(3.9) $g(x',\mu') \ge g(x,\mu) + \partial_x g(x,\mu)\cdot(x'-x) + \tilde{\mathbb{E}}\big[\partial_\mu g(x,\mu)(\tilde X)\cdot(\tilde X' - \tilde X)\big]$

and

(3.10) $H(x',a',\xi',y,z) \ge H(x,a,\xi,y,z) + \partial_x H(x,a,\xi,y,z)\cdot(x'-x) + \partial_\alpha H(x,a,\xi,y,z)\cdot(a'-a) + \tilde{\mathbb{E}}\big[\partial_\mu H(x,a,\xi,y,z)(\tilde X,\tilde\alpha)\cdot(\tilde X' - \tilde X) + \partial_\nu H(x,a,\xi,y,z)(\tilde X,\tilde\alpha)\cdot(\tilde\alpha' - \tilde\alpha)\big],$

for all $x, x' \in \mathbb{R}^d$, $a, a' \in A$, $y \in \mathbb{R}^d$, $z \in \mathbb{R}^{d\times m}$, and any $\tilde X'$ (resp., $\tilde\alpha'$) copy of a process in $\mathbb{H}^{2,d}$ (resp., of an admissible control) on $(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}})$, and where $\mu = \mathcal{L}(\tilde X)$, $\mu' = \mathcal{L}(\tilde X')$, $\xi = \mathcal{L}(\tilde X,\tilde\alpha)$, and $\xi' = \mathcal{L}(\tilde X',\tilde\alpha')$; see [14, Chapter 6].
Theorem 3.5. Under assumptions (I)–(II), let $\alpha = (\alpha_t)_{0\le t\le T} \in \mathbb{A}$ be an admissible control, $X = (X_t)_{0\le t\le T}$ the corresponding controlled state process, and $(Y,Z) = (Y_t,Z_t)_{0\le t\le T}$ the corresponding adjoint processes satisfying (3.2). Let us assume that
(i) $g$ is convex in the sense of (3.9);
(ii) $H$ is convex in the sense of (3.10).
Then, if (3.5) holds, $\alpha$ is an optimal control, i.e., $J(\alpha) = \inf_{\alpha'\in\mathbb{A}} J(\alpha')$.

As before, we use the notation $\theta_t = (X_t,\alpha_t,\mathcal{L}(X_t,\alpha_t))$ throughout the proof.

Proof. We follow the steps of the classical proofs; see, for example, [14, Theorem 6.16] for the case of the control of standard McKean–Vlasov SDEs. Let $(\tilde X,\tilde\alpha)$ be a copy of $(X,\alpha)$ on $(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}})$, and let $\alpha' \in \mathbb{A}$ be any admissible control with $X' = X^{\alpha'}$ the corresponding controlled state. By definition of the objective function in (2.1) and of the Hamiltonian of the control problem in (3.1), we have

(3.11)
$\displaystyle J(\alpha) - J(\alpha') = \mathbb{E}\big[g(X_T,\mathcal{L}(X_T)) - g(X'_T,\mathcal{L}(X'_T))\big] + \mathbb{E}\int_0^T \big(f(\theta_t) - f(\theta'_t)\big)dt$
$\displaystyle = \mathbb{E}\big[g(X_T,\mathcal{L}(X_T)) - g(X'_T,\mathcal{L}(X'_T))\big] + \mathbb{E}\int_0^T \big(H(\theta_t,Y_t,Z_t) - H(\theta'_t,Y_t,Z_t)\big)dt - \mathbb{E}\int_0^T \big([b(\theta_t)-b(\theta'_t)]\cdot Y_t + [\sigma(\theta_t)-\sigma(\theta'_t)]\cdot Z_t\big)dt$

with $\theta'_t = (X'_t,\alpha'_t,\mathcal{L}(X'_t,\alpha'_t))$. Being $g$ convex, we have
(3.12)
$\mathbb{E}\big[g(X_T,\mathcal{L}(X_T)) - g(X'_T,\mathcal{L}(X'_T))\big]$
$\le \mathbb{E}\big[\partial_x g(X_T,\mathcal{L}(X_T))\cdot(X_T - X'_T) + \tilde{\mathbb{E}}[\partial_\mu g(X_T,\mathcal{L}(X_T))(\tilde X_T)\cdot(\tilde X_T - \tilde X'_T)]\big]$
$= \mathbb{E}\big[\big(\partial_x g(X_T,\mathcal{L}(X_T)) + \tilde{\mathbb{E}}[\partial_\mu g(\tilde X_T,\mathcal{L}(X_T))(X_T)]\big)\cdot(X_T - X'_T)\big]$
$= \mathbb{E}\big[(X_T - X'_T)\cdot Y_T\big],$
where we used Fubini's theorem and the fact that the "tilde random variables" are independent copies of the "nontilde" ones. Using integration by parts and the fact that $Y = (Y_t)_{0\le t\le T}$ solves the adjoint equation (3.2), we get

(3.13)
$\displaystyle \mathbb{E}[(X_T - X'_T)\cdot Y_T] = \mathbb{E}\Big[\int_0^T (X_t - X'_t)\cdot dY_t + \int_0^T Y_t\cdot d(X_t - X'_t) + \int_0^T [\sigma(\theta_t)-\sigma(\theta'_t)]\cdot Z_t\,dt\Big]$
$\displaystyle = -\mathbb{E}\int_0^T \Big(\partial_x H(\theta_t,Y_t,Z_t)\cdot(X_t - X'_t) + \tilde{\mathbb{E}}\big[\partial_\mu H(\tilde\theta_t,\tilde Y_t,\tilde Z_t)(X_t,\alpha_t)\big]\cdot(X_t - X'_t)\Big)dt + \mathbb{E}\int_0^T \big([b(\theta_t)-b(\theta'_t)]\cdot Y_t + [\sigma(\theta_t)-\sigma(\theta'_t)]\cdot Z_t\big)dt.$
Again by Fubini's theorem, we get
$\displaystyle \mathbb{E}\int_0^T \tilde{\mathbb{E}}\big[\partial_\mu H(\tilde\theta_t,\tilde Y_t,\tilde Z_t)(X_t,\alpha_t)\big]\cdot(X_t - X'_t)\,dt = \mathbb{E}\int_0^T \tilde{\mathbb{E}}\big[\partial_\mu H(\theta_t,Y_t,Z_t)(\tilde X_t,\tilde\alpha_t)\cdot(\tilde X_t - \tilde X'_t)\big]dt.$
Together with (3.11), (3.12), and (3.13), this gives
$J(\alpha) - J(\alpha')$
$\displaystyle \le \mathbb{E}\int_0^T \big[H(\theta_t,Y_t,Z_t) - H(\theta'_t,Y_t,Z_t)\big]dt - \mathbb{E}\int_0^T \Big(\partial_x H(\theta_t,Y_t,Z_t)\cdot(X_t - X'_t) + \tilde{\mathbb{E}}\big[\partial_\mu H(\theta_t,Y_t,Z_t)(\tilde X_t,\tilde\alpha_t)\cdot(\tilde X_t - \tilde X'_t)\big]\Big)dt$
$\displaystyle \le \mathbb{E}\int_0^T \Big(\partial_\alpha H(\theta_t,Y_t,Z_t)\cdot(\alpha_t - \alpha'_t) + \tilde{\mathbb{E}}\big[\partial_\nu H(\theta_t,Y_t,Z_t)(\tilde X_t,\tilde\alpha_t)\cdot(\tilde\alpha_t - \tilde\alpha'_t)\big]\Big)dt$
$\displaystyle = \mathbb{E}\int_0^T \big(\partial_\alpha H(\theta_t,Y_t,Z_t) + \tilde{\mathbb{E}}[\partial_\nu H(\tilde\theta_t,\tilde Y_t,\tilde Z_t)(X_t,\alpha_t)]\big)\cdot(\alpha_t - \alpha'_t)\,dt$
$\le 0$
because of the convexity of $H$, Fubini's theorem, and (3.5), showing that $\alpha$ is optimal.
4. Examples. In this section, we consider models for which the solution strategy
suggested by the stochastic maximum principle proved in the previous section can be
pushed further. In fact, in sections 4.2 and 4.3, we are able to obtain explicit solutions.
4.1. The case of scalar interactions. In this subsection, we state explicitly what the above forms of the Pontryagin stochastic maximum principle become in the case of scalar interactions. This case is of particular interest because it does not need the full generality of the differential calculus on Wasserstein spaces, and can be dealt with using standard calculus. An example of scalar interactions will be studied and explicitly solved in the next subsection; see also [23] for another application of scalar interactions.
Assume the drift and cost functions to be of the form
$b(x,\alpha,\xi) = b_0\big(x,\alpha,{\textstyle\int}\varphi\,d\xi\big),\quad f(x,\alpha,\xi) = f_0\big(x,\alpha,{\textstyle\int}\psi\,d\xi\big),\quad g(x,\mu) = g_0\big(x,{\textstyle\int}\phi\,d\mu\big)$
for some functions $b_0, f_0$ on $\mathbb{R}^d\times A\times\mathbb{R}$, $g_0$ on $\mathbb{R}^d\times\mathbb{R}$, $\varphi,\psi$ on $\mathbb{R}^d\times A$, and $\phi$ on $\mathbb{R}^d$. In order to simplify the notation, we shall assume that the volatility is independent of the control and, actually, we take $\sigma \equiv I_d$. Under these circumstances, the adjoint equation becomes
$dY_t = -\Big(\partial_x b_0(X_t,\alpha_t,\mathbb{E}[\varphi(X_t,\alpha_t)])\,Y_t + \partial_x f_0(X_t,\alpha_t,\mathbb{E}[\psi(X_t,\alpha_t)]) + \tilde{\mathbb{E}}\big[\tilde Y_t\cdot\partial_\zeta b_0(\tilde X_t,\tilde\alpha_t,\mathbb{E}[\varphi(X_t,\alpha_t)])\big]\,\partial_x\varphi(X_t,\alpha_t) + \tilde{\mathbb{E}}\big[\partial_\zeta f_0(\tilde X_t,\tilde\alpha_t,\mathbb{E}[\psi(X_t,\alpha_t)])\big]\,\partial_x\psi(X_t,\alpha_t)\Big)dt + Z_t\,dW_t$
with terminal condition $Y_T = \partial_x g_0(X_T,\mathbb{E}[\phi(X_T)]) + \tilde{\mathbb{E}}[\partial_\zeta g_0(\tilde X_T,\mathbb{E}[\phi(X_T)])]\,\partial_x\phi(X_T)$. Accordingly, the necessary condition (3.7) for optimality will be satisfied when

(4.1) $0 = \partial_\alpha b_0(X_t,\alpha_t,\mathbb{E}[\varphi(X_t,\alpha_t)])\cdot Y_t + \partial_\alpha f_0(X_t,\alpha_t,\mathbb{E}[\psi(X_t,\alpha_t)]) + \tilde{\mathbb{E}}\big[\tilde Y_t\cdot\partial_\zeta b_0(\tilde X_t,\tilde\alpha_t,\mathbb{E}[\varphi(X_t,\alpha_t)])\big]\,\partial_\alpha\varphi(X_t,\alpha_t) + \tilde{\mathbb{E}}\big[\partial_\zeta f_0(\tilde X_t,\tilde\alpha_t,\mathbb{E}[\psi(X_t,\alpha_t)])\big]\,\partial_\alpha\psi(X_t,\alpha_t).$
4.2. Optimal liquidation with market impact. In this section we explicitly
solve an example that lies outside the classical LQ framework, in the sense that
convexity fails. This is inspired by an optimal liquidation problem with price impact,
but here it is more of mathematical interest than a financial one.
Consider a market where a group of investors, indexed by $i$, has large positions $q^i_0$ on the same asset $S$. Each investor wants to liquidate her position by a fixed time $T > 0$, and controls her trading speed $\alpha^i_t$ through time. Her state is then described by two variables: her inventory $Q^i_t$, which starts at $q^i_0$ and changes according to $\alpha^i_t$, and her wealth $X^i_t$, which is assumed to start at zero for all traders. Investors' speed of trading affects prices in two ways. On the one hand, it generates a permanent market impact, as the dynamics of $S$ are assumed to depend linearly on the average trading speed of all investors. On the other hand, it produces a temporary impact, which only affects the traders' own wealth processes (as fees or liquidation costs), and which is assumed to be linear in their respective rates of trading. The optimality criterion is the minimization of the cost, which is composed of three factors: the wealth at time $T$, the final value of the inventory penalized by a terminal market impact, and a running penalty which is assumed quadratic in the inventory. The optimal trades will result from the trade-off between trading slowly, to reduce the market impact (or execution/liquidity cost), and trading quickly, to reduce the risk of future uncertainty in prices; see, e.g., [2, 16, 17, 12, 6].
Here we think of a continuum of investors. The initial inventories are distributed according to a measure $m_0$ on $\mathbb{R}$. We formulate the problem for a representative agent, in the case of cooperative equilibria. The inventory process then evolves as

(4.2) $dQ_t = \alpha_t\,dt,\quad Q_0 \sim m_0,$

while the wealth process is given by
$dX_t = -\alpha_t(S_t + k\alpha_t)\,dt,\quad X_0 = 0,$
where $k\alpha_t$ measures the temporary market impact. The price process is modeled by
$dS_t = \lambda\mathbb{E}[\alpha_t]\,dt + \sigma\,dW_t,\quad S_0 = s_0,$
where $\mathbb{E}[\alpha_t]$ represents the average trading speed; hence $\lambda\mathbb{E}[\alpha_t]$ stands for the permanent market impact to which all agents contribute (naturally $\lambda \ge 0$). The cost to be minimized is given by
$\displaystyle \mathbb{E}\Big[-X_T - Q_T(S_T - AQ_T) + \phi\int_0^T Q_t^2\,dt\Big],$
where $X_T$ is the terminal profit due to trading in $[0,T]$, $Q_T(S_T - AQ_T)$ is the liquidation value of the remaining quantity at terminal time (with a liquidation/execution penalization), and $\phi$ is an "urgency" parameter on the running cost (the higher $\phi$ is, the higher is the liquidation speed at the beginning of the trading period). Using the dynamics of $X$, this can be rewritten as
$\displaystyle \mathbb{E}\Big[\int_0^T (\alpha_t S_t + k\alpha_t^2 + \phi Q_t^2)\,dt - Q_T(S_T - AQ_T)\Big].$
This example falls into the framework described in section 2. We have a 2-dimensional state process $(S,Q)$, a 1-dimensional Wiener process $W$, and the control process is the trading speed $\alpha$. The Hamiltonian of the system is
$H(x_1,x_2,a,\xi,y_1,y_2) = \lambda\bar\xi_2\,y_1 + a\,y_2 + \phi x_2^2 + a x_1 + k a^2,$
where $\bar\xi_2 = \int v\,\xi(du,dv)$, and the first order condition (4.1) reads as

(4.3) $Y^2_t + S_t + 2k\alpha_t + \lambda\mathbb{E}[Y^1_t] = 0$
with adjoint equations

(4.4) $dY^1_t = -\alpha_t\,dt + Z^1_t\,dW_t,\quad Y^1_T = -Q_T,$
(4.5) $dY^2_t = -2\phi Q_t\,dt + Z^2_t\,dW_t,\quad Y^2_T = -S_T + 2AQ_T.$
Remark 4.1. Here the terminal cost function $g$ reads as
$g(x_1,x_2) = -x_1 x_2 + A x_2^2,$
which does not satisfy the convexity condition (3.9). However, an inspection of the proof of Theorem 3.5 reveals that this assumption was only used in order to obtain the inequality in (3.12). We are now going to show that such an inequality holds in the present setting when $A \ge \lambda$ (which is satisfied for typical values of the parameters; see [17, 12]), thus guaranteeing that the first order condition (4.3) is not only necessary but also sufficient for the optimality of $\alpha$. For this purpose, let $\alpha' \in \mathbb{A}$ be any admissible control, and $(S',Q')$ the corresponding controlled state. Then
$\mathbb{E}\big[g(S_T,Q_T) - g(S'_T,Q'_T)\big] - \mathbb{E}\big[(S_T - S'_T)Y^1_T + (Q_T - Q'_T)Y^2_T\big]$
$\displaystyle = \lambda\Big(\mathbb{E}\Big[\int_0^T \alpha'_t\,dt - \int_0^T \alpha_t\,dt\Big]\Big)^2 - A\,\mathbb{E}\Big(\int_0^T \alpha_t\,dt - \int_0^T \alpha'_t\,dt\Big)^2$
$\displaystyle \le (\lambda - A)\,\mathbb{E}\Big(\int_0^T \alpha_t\,dt - \int_0^T \alpha'_t\,dt\Big)^2,$
which is nonpositive for $A \ge \lambda$.
An inspection of (4.4) suggests that we have $Z^1_t = 0$ and $Y^1_t = -Q_0 - \int_0^t \alpha_s\,ds = -Q_t$; $Y^2_t$ will be determined later. Substituting into (4.3), we have
$\displaystyle Y^2_0 - 2\phi\int_0^t Q_s\,ds + \int_0^t Z^2_s\,dW_s + s_0 + \lambda\int_0^t \mathbb{E}[\alpha_s]\,ds + \sigma W_t + 2k\alpha_t - \lambda\Big(\mathbb{E}[Q_0] + \int_0^t \mathbb{E}[\alpha_s]\,ds\Big) = 0,$
that is,

(4.6) $\displaystyle \alpha_t = \frac{\lambda\mathbb{E}[Q_0] - Y^2_0 - s_0}{2k} + \frac{\phi}{k}\int_0^t Q_s\,ds - \frac{1}{2k}\int_0^t (Z^2_s + \sigma)\,dW_s.$
We now show that $Q \equiv Q^0$ and $\alpha \equiv \alpha^0$, where
$Q^0_t := \mathbb{E}[Q_t\,|\,Q_0],\quad \alpha^0_t := \mathbb{E}[\alpha_t\,|\,Q_0].$
By taking conditional expectations in (4.2) and (4.6), we get

(4.7) $\displaystyle Q^0_t = Q_0 + \int_0^t \alpha^0_s\,ds,\quad \alpha^0_t = \alpha_0 + \frac{\phi}{k}\int_0^t Q^0_s\,ds.$

Setting $F(t) := Q^0_t$, we note that $F'(t) = \alpha^0_t$ and $F''(t) = \frac{\phi}{k}F(t)$. Together with the initial conditions $F(0) = Q_0$ and $F'(0) = \alpha_0$, this gives

(4.8) $\displaystyle F(t) = \Big(\frac{Q_0}{2} - \frac{\alpha_0}{2r}\Big)e^{-rt} + \Big(\frac{Q_0}{2} + \frac{\alpha_0}{2r}\Big)e^{rt},$
where $r = \sqrt{\phi/k}$. Now, by taking conditional expectations in (4.5), and substituting into (4.7), we obtain

(4.9)
$\displaystyle \alpha^0_T = \frac{\lambda\mathbb{E}[Q_0] - 2AQ_0}{2k} + \frac{\lambda}{2k}\int_0^T \mathbb{E}[\alpha_t]\,dt - \frac{A}{k}\int_0^T \alpha^0_t\,dt$
$\displaystyle = \frac{\lambda\mathbb{E}[Q_0] - 2AQ_0}{2k} + \frac{\lambda}{2k}\big(\mathbb{E}[Q_T] - \mathbb{E}[Q_0]\big) - \frac{A}{k}\big(Q^0_T - Q_0\big)$
$\displaystyle = \frac{\lambda}{2k}\mathbb{E}[Q_T] - \frac{A}{k}Q^0_T,$

that is, $F'(T) = \frac{\lambda}{2k}\mathbb{E}[F(T)] - \frac{A}{k}F(T)$. Imposing this condition, and using (4.8), we obtain

(4.10) $\displaystyle \alpha_0 = Q_0\,r\,\frac{d_1 e^{-rT} - d_2 e^{rT}}{d_1 e^{-rT} + d_2 e^{rT}} + \mathbb{E}[Q_0]\,\frac{4\lambda\phi}{(d_1 e^{-rT} + d_2 e^{rT})(c_1 e^{-rT} + c_2 e^{rT})},$

where $d_1 = \sqrt{\phi k} - A$, $d_2 = \sqrt{\phi k} + A$, $c_1 = 2d_1 + \lambda$, $c_2 = 2d_2 - \lambda$. From (4.6), we also have an explicit expression for $Y^2_0 = \lambda\mathbb{E}[Q_0] - s_0 - 2k\alpha_0$.
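As a quick sanity check, the characterization above can be verified numerically. The snippet below (an illustration with arbitrarily chosen parameter values, not from the paper) confirms by finite differences that the function $F$ in (4.8), with $r = \sqrt{\phi/k}$, satisfies $F'' = (\phi/k)F$ together with $F(0) = Q_0$ and $F'(0) = \alpha_0$.

```python
import math

# Finite-difference check of (4.8): with r = sqrt(phi/k), the function F
# should satisfy F'' = (phi/k) F, F(0) = Q0, F'(0) = alpha0.
# Parameter values are arbitrary illustrations, not from the paper.
phi, k, Q0, a0 = 0.5, 2.0, 3.0, -1.0
r = math.sqrt(phi / k)

def F(t):
    return ((Q0 / 2 - a0 / (2 * r)) * math.exp(-r * t)
            + (Q0 / 2 + a0 / (2 * r)) * math.exp(r * t))

h = 1e-4
assert abs(F(0.0) - Q0) < 1e-12                        # F(0) = Q0
assert abs((F(h) - F(-h)) / (2 * h) - a0) < 1e-6       # F'(0) = alpha0
for t in (0.0, 0.3, 0.7, 1.0):
    second_diff = (F(t + h) - 2 * F(t) + F(t - h)) / h**2
    assert abs(second_diff - (phi / k) * F(t)) < 1e-6  # F'' = (phi/k) F
print("F in (4.8) satisfies the ODE characterization of (4.7)")
```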
Now we use the ansatz $Z^2 \equiv -\sigma$, and show that the process
$$(4.11)\qquad Y^2_t = Y^2_0 - 2\phi\int_0^t Q_s\,ds - \sigma W_t$$
does satisfy the equation and terminal condition in (4.5). Only the latter needs to be shown. First note that, with this ansatz, from (4.6) and (4.2) we have
$$\alpha_t = \alpha_0 + \frac{\phi}{k}\int_0^t Q_s\,ds, \qquad Q_t = Q_0 + \alpha_0 t + \frac{\phi}{k}\int_0^t\!\int_0^s Q_u\,du\,ds;$$
thus both processes $\alpha$ and $Q$ are $\sigma(Q_0)$-measurable, that is,
$$(4.12)\qquad Q_t = \mathbb{E}[Q_t \mid Q_0] = Q^0_t = F(t) \quad\text{and}\quad \alpha_t = \mathbb{E}[\alpha_t \mid Q_0] = \alpha^0_t = F'(t).$$
We now check that $Y^2$ satisfies the terminal condition in (4.5). By (4.12), (4.11) implies
$$Y^2_T = \lambda\mathbb{E}[Q_0] - s_0 - 2k\alpha_0 - 2\phi\int_0^T Q^0_t\,dt - \sigma W_T.$$
On the other hand, by (4.12), (4.9), and (4.7),
$$-S_T + 2AQ_T = -s_0 - \lambda\big(\mathbb{E}[Q_T] - \mathbb{E}[Q_0]\big) - \sigma W_T + 2AQ^0_T = -s_0 + \lambda\mathbb{E}[Q_0] - 2k\alpha^0_T - \sigma W_T = -s_0 + \lambda\mathbb{E}[Q_0] - 2k\alpha_0 - 2\phi\int_0^T Q^0_t\,dt - \sigma W_T,$$
which yields $Y^2_T = -S_T + 2AQ_T$, as wanted. This shows that the process $Z^2$ in the ansatz, together with $Y^2$ defined above, does satisfy (4.5). We have seen that this gives $Q_t = F(t)$ and $\alpha_t = F'(t)$, by (4.12); thus from (4.8) we have
$$Q_t = \Big(\frac{Q_0}{2} - \frac{\alpha_0}{2r}\Big)e^{-rt} + \Big(\frac{Q_0}{2} + \frac{\alpha_0}{2r}\Big)e^{rt}, \qquad \alpha_t = \Big(-\frac{Q_0 r}{2} + \frac{\alpha_0}{2}\Big)e^{-rt} + \Big(\frac{Q_0 r}{2} + \frac{\alpha_0}{2}\Big)e^{rt}.$$
By (4.10), this gives
$$Q_t = Q_0\,\frac{d_1 e^{-r(T-t)} + d_2 e^{r(T-t)}}{d_1 e^{-rT} + d_2 e^{rT}} + \mathbb{E}[Q_0]\,\frac{2\lambda\sqrt{\phi k}\,(e^{rt} - e^{-rt})}{(d_1 e^{-rT} + d_2 e^{rT})(c_1 e^{-rT} + c_2 e^{rT})},$$
$$\alpha_t = Q_0\, r\,\frac{d_1 e^{-r(T-t)} - d_2 e^{r(T-t)}}{d_1 e^{-rT} + d_2 e^{rT}} + \mathbb{E}[Q_0]\,\frac{2\lambda\phi\,(e^{-rt} + e^{rt})}{(d_1 e^{-rT} + d_2 e^{rT})(c_1 e^{-rT} + c_2 e^{rT})}.$$
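These closed-form trajectories lend themselves to a direct numerical sanity check. The sketch below, with arbitrary hypothetical parameter values satisfying $A \ge \lambda$ (not values taken from the paper), verifies by finite differences that $\alpha_t = \frac{d}{dt}Q_t$ and $\frac{d}{dt}\alpha_t = \frac{\phi}{k}Q_t$, and checks the terminal condition $F'(T) = \frac{\lambda}{2k}\mathbb{E}[F(T)] - \frac{A}{k}F(T)$:

```python
import numpy as np

# Numerical check of the closed-form (Q_t, alpha_t) above; parameter values
# are arbitrary illustrative choices with A >= lambda, not from the paper.
lam, A, k, phi, T = 0.5, 2.0, 1.5, 0.8, 1.0
Q0, EQ0 = 1.3, 0.7                    # one realization of Q_0 and its mean
r = np.sqrt(phi / k)
d1, d2 = np.sqrt(phi * k) - A, np.sqrt(phi * k) + A
c1, c2 = 2 * d1 + lam, 2 * d2 - lam
D = d1 * np.exp(-r * T) + d2 * np.exp(r * T)
C = c1 * np.exp(-r * T) + c2 * np.exp(r * T)

def Q(t, q0):
    """State trajectory; linear in the initial state q0 and in E[Q_0]."""
    return (q0 * (d1 * np.exp(-r * (T - t)) + d2 * np.exp(r * (T - t))) / D
            + EQ0 * 2 * lam * np.sqrt(phi * k) * (np.exp(r * t) - np.exp(-r * t)) / (D * C))

def alpha(t, q0):
    """Control trajectory; should coincide with dQ/dt."""
    return (q0 * r * (d1 * np.exp(-r * (T - t)) - d2 * np.exp(r * (T - t))) / D
            + EQ0 * 2 * lam * phi * (np.exp(-r * t) + np.exp(r * t)) / (D * C))

t, h = 0.4, 1e-6
# alpha = Q' and alpha' = (phi/k) Q, via central finite differences
assert abs((Q(t + h, Q0) - Q(t - h, Q0)) / (2 * h) - alpha(t, Q0)) < 1e-6
assert abs((alpha(t + h, Q0) - alpha(t - h, Q0)) / (2 * h) - phi / k * Q(t, Q0)) < 1e-6
# terminal condition; E[.] amounts to replacing Q0 by E[Q0]
assert abs(alpha(T, Q0) - (lam / (2 * k)) * Q(T, EQ0) + (A / k) * Q(T, Q0)) < 1e-10
```

The check exploits that both $Q_t$ and $\alpha_t$ are linear in $(Q_0, \mathbb{E}[Q_0])$, so expectations are obtained by substituting $\mathbb{E}[Q_0]$ in the $Q_0$ slot.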
4.3. The LQ case. In this subsection, we use the sufficient condition derived above to solve a simple LQ model. Via different methods, such models have already been studied in the literature; see, e.g., [35, 24, 6, 33]. For the sake of simplicity, we give the details of the computations in the scalar case $m = d = k = 1$ and with $A = \mathbb{R}$. Also, as before, we assume that the volatility is not controlled and, in fact, that it is identically equal to 1. In such an LQ model, the drift is of the form
$$b(x, \alpha, \xi) = b_1 x + b_2\alpha + \bar b_1\bar x + \bar b_2\bar\alpha$$
for some constants $b_1, b_2, \bar b_1, \bar b_2$, where we denote by $\bar x$ and $\bar\alpha$ the means of the state and the control, in the sense that $\bar x = \int x\,\xi(dx, d\alpha)$ and $\bar\alpha = \int \alpha\,\xi(dx, d\alpha)$. As for the cost functions, we assume that
$$f(x, \alpha, \xi) = \tfrac{1}{2}\big[q x^2 + \bar q(x - s\bar x)^2 + r\alpha^2 + \bar r(\alpha - \bar s\bar\alpha)^2\big], \qquad g(x, \mu) = \tfrac{\gamma}{2}x^2 + \tfrac{\bar\gamma}{2}(x - \rho\bar x)^2$$
for some constants $q, \bar q, r, \bar r, s, \bar s, \gamma, \bar\gamma, \rho$ satisfying $\bar q, \bar r, \bar\gamma \ge 0$ and $q, r, \gamma > 0$. Under these conditions, the Hamiltonian reads
$$(4.13)\qquad H(x, \alpha, \xi, y) = (b_1 x + b_2\alpha + \bar b_1\bar x + \bar b_2\bar\alpha)\,y + \tfrac{1}{2}\big[q x^2 + \bar q(x - s\bar x)^2 + r\alpha^2 + \bar r(\alpha - \bar s\bar\alpha)^2\big].$$
Accordingly, the adjoint equation reads as
$$(4.14)\qquad dY_t = -\big[b_1 Y_t + (q + \bar q)X_t + \bar b_1\mathbb{E}[Y_t] + s\bar q(s - 2)\mathbb{E}[X_t]\big]\,dt + Z_t\,dW_t.$$
In the present situation, conditions (i) and (ii) of Theorem 3.5 hold, and condition (3.7) of the Pontryagin stochastic maximum principle holds if
$$(4.15)\qquad b_2 Y_t + \bar b_2\mathbb{E}[Y_t] + (r + \bar r)\alpha_t + \bar r\bar s(\bar s - 2)\mathbb{E}[\alpha_t] = 0.$$
Taking expectations, we obtain
$$(4.16)\qquad \mathbb{E}[\alpha_t] = -\frac{b_2 + \bar b_2}{r + \bar r(\bar s - 1)^2}\,\mathbb{E}[Y_t].$$
Plugging this expression into (4.15), we get
$$(4.17)\qquad \alpha_t = -\frac{1}{r + \bar r}\bigg[b_2 Y_t + \bigg(\bar b_2 - \frac{\bar r\bar s(\bar s - 2)(b_2 + \bar b_2)}{r + \bar r(\bar s - 1)^2}\bigg)\mathbb{E}[Y_t]\bigg].$$
We can rewrite (4.17) and (4.16) as
$$(4.18)\qquad \alpha_t = aY_t + b\,\mathbb{E}[Y_t] \quad\text{and}\quad \mathbb{E}[\alpha_t] = c\,\mathbb{E}[Y_t]$$
with
$$(4.19)\qquad a = -\frac{b_2}{r + \bar r}, \qquad b = -\frac{1}{r + \bar r}\bigg(\bar b_2 - \frac{\bar r\bar s(\bar s - 2)(b_2 + \bar b_2)}{r + \bar r(\bar s - 1)^2}\bigg), \quad\text{and}\quad c = -\frac{b_2 + \bar b_2}{r + \bar r(\bar s - 1)^2}.$$
With this notation, the solution of the mean field optimal control of the McKean–Vlasov SDE (2.2) reduces to the solution of the following forward-backward SDE (FBSDE) of McKean–Vlasov type:
$$(4.20)\qquad \begin{aligned} dX_t &= \big[b_1 X_t + \bar b_1\mathbb{E}[X_t] + ab_2 Y_t + (bb_2 + c\bar b_2)\mathbb{E}[Y_t]\big]\,dt + dW_t,\\ dY_t &= -\big[b_1 Y_t + (q + \bar q)X_t + \bar b_1\mathbb{E}[Y_t] + s\bar q(s - 2)\mathbb{E}[X_t]\big]\,dt + Z_t\,dW_t, \end{aligned}$$
with terminal condition $Y_T = (\gamma + \bar\gamma)X_T + \bar\gamma\rho(\rho - 2)\mathbb{E}[X_T]$. We solve this system in the usual way. First, we compute the means $\bar x_t = \mathbb{E}[X_t]$ and $\bar y_t = \mathbb{E}[Y_t]$. Taking expectations in (4.20), we obtain
$$(4.21)\qquad \begin{aligned} d\bar x_t &= \big[(b_1 + \bar b_1)\bar x_t + (ab_2 + bb_2 + c\bar b_2)\bar y_t\big]\,dt,\\ d\bar y_t &= -\big[(b_1 + \bar b_1)\bar y_t + (q + \bar q + s\bar q(s - 2))\bar x_t\big]\,dt, \end{aligned}$$
with terminal condition $\bar y_T = (\gamma + \bar\gamma + \bar\gamma\rho(\rho - 2))\bar x_T$. The linear system (4.21) can be
solved explicitly. For instance, if we denote
$$\Delta := \sqrt{(b_1 + \bar b_1)^2 - (ab_2 + bb_2 + c\bar b_2)(q + \bar q + s\bar q(s - 2))},$$
and assume that the argument of the square root is strictly positive, one can solve (4.21) via the theory of linear ODE systems in the case of real eigenvalues. We then obtain that
$$\bar x_t = -\frac{(b_1 + \bar b_1)^2 - \Delta^2}{2(q + \bar q + s\bar q(s - 2))\Delta}\bigg\{e^{-\Delta t}\bigg(y_0 + \frac{(q + \bar q + s\bar q(s - 2))x_0}{b_1 + \bar b_1 + \Delta}\bigg) - e^{\Delta t}\bigg(y_0 + \frac{(q + \bar q + s\bar q(s - 2))x_0}{b_1 + \bar b_1 - \Delta}\bigg)\bigg\}$$
together with
$$\bar y_t = -\frac{(b_1 + \bar b_1)^2 - \Delta^2}{2(q + \bar q + s\bar q(s - 2))\Delta}\bigg\{-\frac{(q + \bar q + s\bar q(s - 2))e^{-\Delta t}}{b_1 + \bar b_1 - \Delta}\bigg(y_0 + \frac{(q + \bar q + s\bar q(s - 2))x_0}{b_1 + \bar b_1 + \Delta}\bigg) + \frac{(q + \bar q + s\bar q(s - 2))e^{\Delta t}}{b_1 + \bar b_1 + \Delta}\bigg(y_0 + \frac{(q + \bar q + s\bar q(s - 2))x_0}{b_1 + \bar b_1 - \Delta}\bigg)\bigg\}$$
solve (4.21) for any $y_0$, and choosing $y_0$ appropriately one can guarantee that $\bar y_T = (\gamma + \bar\gamma + \bar\gamma\rho(\rho - 2))\bar x_T$. This expression for $(\bar x_t, \bar y_t)$ can be plugged into (4.20) in lieu of $(\mathbb{E}[X_t], \mathbb{E}[Y_t])$, reducing the latter to a standard affine FBSDE. We then make the
ansatz $Y_t = \eta_t X_t + \chi_t$ for two deterministic functions $t \mapsto \eta_t$ and $t \mapsto \chi_t$, which is compatible with the terminal condition. Computing the Itô differentials of $Y_t$ from the ansatz and from the system (4.20), and identifying the terms in the drift multiplying the unknown $X_t$, we find that $\eta_t$ should be a solution of the scalar Riccati equation
$$\dot\eta_t = -\big(2b_1\eta_t + ab_2\eta_t^2 + q + \bar q\big).$$
The latter is easily solved, and since necessarily $\bar y_t = \eta_t\bar x_t + \chi_t$, $\chi_t$ can also be explicitly obtained. By Theorem 3.5, the control $\alpha$ obtained in this way is optimal. Notice that it takes the form
$$\alpha_t = a\eta_t X_t + a\chi_t + b\,\bar y_t$$
with $a$ and $b$ given in (4.19).
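The solution of the mean system (4.21) can be illustrated numerically. The sketch below uses hypothetical coefficient values (chosen so that the argument of the square root defining $\Delta$ is positive); it checks that $\pm\Delta$ are indeed the eigenvalues of the system matrix, and determines $y_0$ by a shooting argument so that the terminal condition $\bar y_T = (\gamma + \bar\gamma + \bar\gamma\rho(\rho - 2))\bar x_T$ holds:

```python
import numpy as np

# Shooting solution of the mean system (4.21); coefficient values below are
# hypothetical illustrative choices (b1b, b2b, qb, rb, sb, gamb denote the
# barred constants b1-bar, b2-bar, q-bar, r-bar, s-bar, gamma-bar).
b1, b1b, b2, b2b = 0.3, 0.1, 1.0, 0.2
q, qb, r, rb, s, sb = 1.0, 0.5, 1.0, 0.4, 0.6, 0.7
gam, gamb, rho = 1.0, 0.3, 0.5
x0, T = 1.0, 1.0

a = -b2 / (r + rb)                                # coefficients (4.19)
b = -(b2b - rb * sb * (sb - 2) * (b2 + b2b) / (r + rb * (sb - 1) ** 2)) / (r + rb)
c = -(b2 + b2b) / (r + rb * (sb - 1) ** 2)

beta, B = b1 + b1b, a * b2 + b * b2 + c * b2b
Qh = q + qb + s * qb * (s - 2)
Gam = gam + gamb + gamb * rho * (rho - 2)
M = np.array([[beta, B], [-Qh, -beta]])           # (xbar, ybar)' = M (xbar, ybar)
Delta = np.sqrt(beta ** 2 - B * Qh)
assert abs(max(np.linalg.eigvals(M).real) - Delta) < 1e-9   # eigenvalues are +-Delta

w, V = np.linalg.eig(M)                           # e^{MT} via eigendecomposition
E = (V @ np.diag(np.exp(w * T)) @ np.linalg.inv(V)).real
y0 = x0 * (Gam * E[0, 0] - E[1, 0]) / (E[1, 1] - Gam * E[0, 1])  # shoot for y0
xT, yT = E @ np.array([x0, y0])
assert abs(yT - Gam * xT) < 1e-9                  # terminal condition holds
```

The shooting step uses the fact that $(\bar x_T, \bar y_T) = e^{MT}(x_0, y_0)$ is affine in $y_0$, so the terminal constraint reduces to one linear equation.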
Remark 4.2. In classical control of mean field type, the pointwise minimization of the Hamiltonian with respect to the control is a necessary optimality condition. Let us illustrate with the LQ example how this need not be the case in our extended framework. If we impose pointwise minimization of (4.13) with respect to $\alpha$, we get $b_2 Y_t + r\alpha_t + \bar r(\alpha_t - \bar s\bar\alpha_t) = 0$. Taking expectations, we obtain $b_2\mathbb{E}[Y_t] + (r + \bar r - \bar r\bar s)\bar\alpha_t = 0$. On the other hand, the necessary condition (3.5) implies (4.15), so we have $\bar b_2\mathbb{E}[Y_t] + \bar r\bar s(\bar s - 1)\bar\alpha_t = 0$. The right choice of parameters leads to a contradiction between this and the previous equation.
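To make the contradiction concrete, one can check with hypothetical parameter values that the two linear relations on $(\mathbb{E}[Y_t], \bar\alpha_t)$ are linearly independent, and hence jointly force $\mathbb{E}[Y_t] = \bar\alpha_t = 0$, which is generically incompatible with the optimal control computed above:

```python
import numpy as np

# With hypothetical parameter values, the two relations on (E[Y_t], E[alpha_t])
# have a nonzero determinant, so they force E[Y_t] = E[alpha_t] = 0:
# generically a contradiction, i.e., pointwise minimization is not necessary here.
b2, b2b, r, rb, sb = 1.0, 0.2, 1.0, 0.4, 0.7
Mrows = np.array([
    [b2, r + rb - rb * sb],       # from pointwise minimization of (4.13)
    [b2b, rb * sb * (sb - 1)],    # from the necessary condition (4.15)
])
assert abs(np.linalg.det(Mrows)) > 1e-3   # rows are linearly independent
```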
5. Variational perspective in the weak formulation. The goal of this sec-
tion is to analyze the extended mean field control problem from a purely variational
perspective, that is, by considering its formulation on path space. Given the intrinsic
nature of mean field problems, it is natural to express them in terms of laws rather
than controls. The main reason for exploring this point of view is to create a bridge with optimal transport theory. This paves the way to the use of different sets of tools, such as the numerical methods that are fast developing in transport theory. We start by introducing, in section 5.1, a weak formulation of the extended mean field control problem, especially well suited for variational analysis.
In such a formulation, the probability space is not specified a priori. We remark that
a weak formulation of the mean field control problem has been considered in [14,
section 6.6] and in [25], the latter rigorously proving convergence of large systems of
interacting control problems to the corresponding mean field control problem. How-
ever, in these works there is no nonlinear dependence on the law of the control; cf.
our problem (5.1) below.
We proceed in section 5.2 to obtain what we call a martingale optimality condition.
Such a condition can serve as a verification tool, in order to evaluate whether a given
control can be optimal. It is therefore the weak-formulation analogue of the necessary
Pontryagin maximum principle. This forms a bridge between the previous sections of
this work, and the ensuing ones. Whenever the Pontryagin maximum principle can be
used (or the martingale optimality condition in the weak formulation), it is a powerful
tool to identify optimal controls and the trajectories of the state at the optimum.
However, it does not say much about the optimal value of the problem. In fact, at
the optimum, the adjoint process gives formally the value of the gradient of the value
function when computed along the optimal trajectories. In order to study the value
function of the control problem (in a situation in which PDE techniques are highly
nontrivial) we recast in section 5.3 our weak formulation in transport-theoretic terms.
Numerical optimal transport has grown spectacularly over the last few years; see, e.g., [19, 7, 29] and the references therein. Our connection between transport and mean field control is meant to lay the ground for efficient numerical methods in the future. In section 5.4 we provide, at a theoretical level, a first discretization scheme of this kind. To be specific, the optimal transport problem we obtain in the discretization has an additional causality constraint (see, e.g., [26, 1, 4, 5]); the numerical analysis of such problems is also undergoing a burst of activity (e.g., [30, 31, 32]).
5.1. The weak formulation. We present a weak formulation of the extended mean field control problem formulated in section 2, in the sense that the probability space is not specified here. We restrict our attention to the case where the state dynamics have uncontrolled volatility, actually assuming $\sigma \equiv \mathrm{Id}$, $m = d$, that the drift does not depend on the law of the control, and that the initial condition $X_0$ is a constant $x_0$. We thus consider the minimization problem
$$(5.1)\qquad \begin{aligned} &\inf_{\mathbb{P},\alpha}\ \mathbb{E}^{\mathbb{P}}\bigg[\int_0^T f\big(X_t, \alpha_t, \mathcal{L}^{\mathbb{P}}(X_t, \alpha_t)\big)\,dt + g\big(X_T, \mathcal{L}^{\mathbb{P}}(X_T)\big)\bigg]\\ &\text{subject to}\quad dX_t = b\big(X_t, \alpha_t, \mathcal{L}^{\mathbb{P}}(X_t)\big)\,dt + dW_t, \qquad X_0 = x_0, \end{aligned}$$
where the infimum is taken over filtered probability spaces $(\Omega, \mathcal{F}, \mathbb{P})$ supporting some $d$-dimensional Wiener process $W$, and over control processes $\alpha$ which are progressively measurable on $(\Omega, \mathcal{F}, \mathbb{P})$ and $\mathbb{R}^k$-valued. We use $\mathcal{L}^{\mathbb{P}}$ to denote the law of the given random element under $\mathbb{P}$. Again, we choose time-independent coefficients for simplicity, but all the results would be the same should $f$ and $b$ depend upon $t$. We say that $(\Omega, \mathcal{F}, \mathbb{P}, W, X, \alpha)$ is a feasible tuple if it participates in the above optimization problem yielding a finite cost.
5.2. Martingale optimality condition. In this section, we obtain a necessary Pontryagin principle for the weak formulation (5.1). We call this the martingale optimality condition. Since our aim is to illustrate the method, we assume only in this part that we are dealing with a drift-control problem
$$b(x, \alpha, \mu) = \alpha, \qquad m = d.$$
We start by expressing the objective function of (5.1) in canonical space, as a function of semimartingale laws. We denote by $\mathcal{C}_{x_0}$ the space of $\mathbb{R}^d$-valued continuous paths started at $x_0$, and by $S$ the canonical process on it. We consider the set of semimartingale laws
$$(5.2)\qquad \tilde{\mathcal{P}} := \big\{\mu \in \mathcal{P}(\mathcal{C}_{x_0}) : dS_t = \alpha^\mu_t(S)\,dt + dW^\mu_t\ \ \mu\text{-a.s.}\big\},$$
where $W^\mu$ is a $\mu$-Brownian motion and $\alpha^\mu$ is a progressively measurable process w.r.t. the canonical filtration, denoted by $\mathbb{F}$. It is then easy to see that (5.1) is equivalent to
$$(5.3)\qquad \inf_{\mu\in\tilde{\mathcal{P}}}\ \mathbb{E}^\mu\bigg[\int_0^T f\big(S_t, \alpha^\mu_t, \mathcal{L}^\mu(S_t, \alpha^\mu_t)\big)\,dt + g(S_T, \mu_T)\bigg].$$
In what follows we consider perturbations of measures in $\tilde{\mathcal{P}}$ via push-forwards along absolutely continuous shifts which preserve the filtration; see the work of Cruzeiro and Lassalle [18] and the references therein. Using push-forwards instead of perturbations directly on the SDE is the main difference between the weak and the strong perspective. The main idea is to find the first order conditions for problem (5.3) by considering perturbations of the form $\mu^{\epsilon,K} := (\mathrm{Id} + \epsilon K)_*\mu$ around a putative optimizer $\mu$. For this matter it is important to identify the Doob–Meyer decomposition of the canonical process under $\mu^{\epsilon,K}$, which forces an assumption on $K$, as we now explain.

Remark 5.1. Let $\mu \in \tilde{\mathcal{P}}$. We say that an adapted process $U: \mathcal{C}_{x_0} \to \mathcal{C}_{x_0}$ is $\mu$-invertible if there exists $V: \mathcal{C}_{x_0} \to \mathcal{C}_{x_0}$ adapted such that $U \circ V = \mathrm{Id}_{\mathcal{C}_{x_0}}$ holds $U(\mu)$-a.s., and $V \circ U = \mathrm{Id}_{\mathcal{C}_{x_0}}$ holds $\mu$-a.s. Now let $K_\cdot = \int_0^\cdot k_t\,dt$ be adapted. We say that $K$ preserves the filtration under $\mu$ if for every $U$ which is $\mu$-invertible we also have that $U + K$ is $\mu$-invertible. It follows that the set of those $K = \int_0^\cdot k_t\,dt$ that preserve the filtration under $\mu$ is a linear space. It also follows that for such $K$ we have $\mu^{\epsilon,K} := (\mathrm{Id} + \epsilon K)_*\mu \in \tilde{\mathcal{P}}$ with $\alpha^{\mu^{\epsilon,K}}_t(S + \epsilon K(S)) = \alpha^\mu_t(S) + \epsilon k_t(S)$; see [18, Proposition 2.1, Lemma 3.1]. A typical case when the filtration is preserved is when $K$ is a piecewise linear and adapted process, while an example when $K$ does not preserve the filtration is given by Tsirelson's drift; see, respectively, [18, Proposition 2.4, Remark 2.1.1].
In analogy to [18, Theorem 5.1], we then obtain the following necessary condition for an optimizer in (5.3). We use here the notation $\theta^\mu_t = (S_t, \alpha^\mu_t, \mathcal{L}^\mu(S_t, \alpha^\mu_t))$.

Proposition 5.2. Let $\mu$ be an optimizer for (5.3). Then the process $N^\mu$ given by
$$(5.4)\qquad N^\mu_t := \partial_a f(\theta^\mu_t) + \tilde{\mathbb{E}}\big[\partial_\nu f(\tilde\theta^\mu_t)(S_t, \alpha^\mu_t)\big] - \int_0^t \Big(\partial_x f(\theta^\mu_s) + \tilde{\mathbb{E}}\big[\partial_\mu f(\tilde\theta^\mu_s)(S_s, \alpha^\mu_s)\big]\Big)\,ds$$
is a $\mu$-martingale, with terminal value equal to
$$(5.5)\qquad N^\mu_T = -\partial_x g(S_T, \mu_T) - \tilde{\mathbb{E}}\big[\partial_\mu g(\tilde S_T, \mu_T)(S_T)\big] - \int_0^T \Big(\partial_x f(\theta^\mu_s) + \tilde{\mathbb{E}}\big[\partial_\mu f(\tilde\theta^\mu_s)(S_s, \alpha^\mu_s)\big]\Big)\,ds.$$
Proof. We use the notation $\mu^{\epsilon,K}$ introduced in Remark 5.1, and call $C(\mu)$ the cost function appearing in problem (5.3). We have $\lim_{\epsilon\to 0}\frac{C(\mu^{\epsilon,K}) - C(\mu)}{\epsilon} \ge 0$ for all $K$. Now if $K$ preserves the filtration under $\mu$, then the same is true for $-K$. Therefore $\lim_{\epsilon\to 0}\frac{C(\mu^{\epsilon,K}) - C(\mu)}{\epsilon} = 0$. To conclude the proof, we use $\alpha^{\mu^{\epsilon,K}}_t(S + \epsilon K(S)) = \alpha^\mu_t(S) + \epsilon k_t(S)$ and arguments similar to those in [18, Theorem 5.1].

When (5.4)–(5.5) hold, we say that $\mu$ satisfies the martingale optimality condition. The interest of this condition is that it is a clear stochastic counterpart of the classical Euler–Lagrange condition in the calculus of variations, except for the fact that "being equal to zero" is here replaced by "being a martingale"; see [18, 27].
Example 5.3. The martingale optimality condition is the analogue of the Pontryagin principle in the weak formulation. To wit, we verify this in a simple example. Suppose $f(X_t, \alpha_t, \mathcal{L}(X_t, \alpha_t)) = \frac{1}{2}(\alpha_t - \mathbb{E}[\alpha_t])^2$ and $g(X_T, \mathcal{L}(X_T)) = \frac{1}{2}X_T^2$. The martingale optimality condition then asserts that for an optimizer $\mu$ the process $N^\mu_t := \alpha^\mu_t - \mathbb{E}[\alpha^\mu_t]$ is a martingale with $N^\mu_T = -S_T$. On the other hand, the Pontryagin FBSDE states that
$$dY_t = Z_t\,dW_t, \qquad Y_T = X_T,$$
as well as $\alpha_t - \mathbb{E}[\alpha_t] + Y_t = 0$, by Remark 3.3. We see the compatibility of the two statements, as well as the equality in law $N^\mu_t = -Y_t$, in this particular case.
Remark 5.4. The above arguments can be adapted to the case when $b(x, \alpha, \mu) = b(x, \alpha)$. This is the case, for example, when $b$ is a $C^1$-diffeomorphism and $b(x, \mathbb{R}^k)$ is convex for each $x$. Indeed, in this case one may redefine the drift in the dynamics of $S$ via $\beta^\mu_t(S) := b(S_t, \alpha^\mu_t(S))$, which is associated with the cost
$$f\Big(S_t,\ b^{-1}\big(S_t, \beta^\mu_t(S)\big),\ \mathcal{L}^\mu\big(S_t, b^{-1}(S_t, \beta^\mu_t(S))\big)\Big),$$
where with some abuse of notation $b^{-1}(x, \cdot)$ denotes the inverse of $b(x, \cdot)$. Using this time the notation $\theta^\mu_t = (S_t, \beta^\mu_t, \mathcal{L}^\mu(S_t, \beta^\mu_t))$, one then replaces the right-hand side (r.h.s.) of (5.4) with
$$(5.6)\qquad \partial_a f(\theta^\mu_t)\,\partial_a(b^{-1})(S_t, \beta^\mu_t) + \tilde{\mathbb{E}}\big[\partial_\nu f(\theta^\mu_t)\,\partial_a(b^{-1})(\tilde S_t, \tilde\beta_t)\big] - \int_0^t \Big(\partial_x f(\theta^\mu_s) + \tilde{\mathbb{E}}\big[\partial_\mu f(\theta^\mu_s)(\tilde S_s, \tilde\beta_s) + \partial_\nu f(\theta^\mu_s)\,\partial_x(b^{-1})(\tilde S_s, \tilde\beta_s)\big]\Big)\,ds,$$
and the r.h.s. of (5.5) with
$$(5.7)\qquad -\partial_x g(S_T, \mu_T) - \tilde{\mathbb{E}}\big[\partial_\mu g(S_T, \mu_T)(\tilde S_T)\big] - \int_0^T \Big(\partial_x f(\theta^\mu_s) + \tilde{\mathbb{E}}\big[\partial_\mu f(\theta^\mu_s)(\tilde S_s, \tilde\beta_s) + \partial_\nu f(\theta^\mu_s)\,\partial_x(b^{-1})(\tilde S_s, \tilde\beta_s)\big]\Big)\,ds.$$
5.3. Optimal transport reformulation. In this section we formulate a variational transport problem on $\mathcal{C} = C([0, T]; \mathbb{R}^d)$, the space of $\mathbb{R}^d$-valued continuous paths, which is equivalent to finding the weak solutions of the extended mean field problem (5.1). This variational formulation is a particular type of transport problem under the so-called causality constraint; see [26, 1, 4, 5]. Here we recall this concept with respect to the filtrations $\mathcal{F}^1$ and $\mathcal{F}^2$, generated by the first and by the second coordinate process on $\mathcal{C}\times\mathcal{C}$.

Definition 5.5. Given $\zeta_1, \zeta_2 \in \mathcal{P}(\mathcal{C})$, a probability measure $\pi \in \mathcal{P}(\mathcal{C}\times\mathcal{C})$ is called a causal transport plan between $\zeta_1$ and $\zeta_2$ if its marginals are $\zeta_1$ and $\zeta_2$ and, for any $t \in [0, T]$ and any set $A \in \mathcal{F}^2_t$, the map $\mathcal{C} \ni x \mapsto \pi_x(A)$ is $\tilde{\mathcal{F}}^1_t$-measurable, where $\pi_x(dy) := \pi(\{x\} \times dy)$ is a regular conditional kernel of $\pi$ w.r.t. the first coordinate, and $\tilde{\mathcal{F}}^1$ is the completion of $\mathcal{F}^1$ w.r.t. $\zeta_1$. The set of causal transport plans between $\zeta_1$ and $\zeta_2$ is denoted by $\Pi_c(\zeta_1, \zeta_2)$.
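In discrete time the causality constraint is easy to visualize. The toy check below (a hypothetical two-period construction, not taken from the paper) builds couplings of a two-step source path $(x_1, x_2)$ with an early target coordinate $y_1$, and tests the discrete analogue of Definition 5.5: the conditional law of $y_1$ given $(x_1, x_2)$ may depend on $x_1$ only.

```python
import numpy as np

# Two-period toy example: causality of a discrete coupling p[x1, x2, y1]
# means P(y1 | x1, x2) must not depend on the later coordinate x2.
def is_causal_step1(p, tol=1e-12):
    """p[x1, x2, y1]: joint pmf; check P(y1 | x1, x2) == P(y1 | x1)."""
    cond = p / p.sum(axis=2, keepdims=True)          # P(y1 | x1, x2)
    return np.allclose(cond, cond.mean(axis=1, keepdims=True), atol=tol)

rng = np.random.default_rng(0)
px = rng.random((2, 2)); px /= px.sum()              # law of the source (x1, x2)
k1 = np.array([[0.9, 0.1], [0.2, 0.8]])              # kernel for y1 given one coordinate
causal = px[:, :, None] * k1[:, None, :]             # y1 drawn from x1 only
noncausal = px[:, :, None] * k1[None, :, :]          # y1 drawn from x2: anticipative
assert is_causal_step1(causal)
assert not is_causal_step1(noncausal)
```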
The only transport plans that contribute to the variational formulation of the problem are those under which the difference of the coordinate processes on the product space $\mathcal{C}\times\mathcal{C}$ is a.s. absolutely continuous with respect to Lebesgue measure. We denote by $(\omega, \bar\omega)$ the generic element of $\mathcal{C}\times\mathcal{C}$, and we use $\dot{(\bar\omega - \omega)}$ to indicate the density of the process $\bar\omega - \omega$ with respect to Lebesgue measure, when it exists, i.e.,
$$\bar\omega_t - \omega_t = \bar\omega_0 - \omega_0 + \int_0^t \dot{(\bar\omega - \omega)}_s\,ds, \qquad t \in [0, T].$$
In such a case, we write $\bar\omega - \omega \ll \mathcal{L}$. Moreover, we set
$$\gamma := \text{Wiener measure on } \mathcal{C} \text{ started at } 0$$
and
$$\Pi_c(\gamma, \cdot) := \big\{\pi \in \mathcal{P}(\mathcal{C}\times\mathcal{C}) : \pi(d\omega\times\mathcal{C}) = \gamma(d\omega), \text{ and } \bar\omega - \omega \ll \mathcal{L},\ \pi\text{-a.s.}\big\}.$$
We now present the connection between extended mean field control and causal transport.

Lemma 5.6. Assume that $b(x, \cdot, \mu)$ is injective, and set
$$u_t(\omega, \bar\omega, \mu) := b^{-1}(\bar\omega_t, \cdot, \mu)\big(\dot{(\bar\omega - \omega)}_t\big).$$
Then problem (5.1) is equivalent to
$$(5.8)\qquad \inf\ \mathbb{E}^\pi\bigg[\int_0^T f\big(\bar\omega_t, u_t(\omega, \bar\omega, \mu^\pi_t), \mathcal{L}^\pi(\bar\omega_t, u_t(\omega, \bar\omega, \mu^\pi_t))\big)\,dt + g(\bar\omega_T, \mu^\pi_T)\bigg],$$
where the infimum is taken over transport plans $\pi \in \Pi_c(\gamma, \cdot)$ such that $dt\otimes d\pi$-a.s. $\dot{(\bar\omega - \omega)}_t \in b(\bar\omega_t, \mathbb{R}^k, \mu^\pi_t)$, and $\mu^\pi$ denotes the second marginal of $\pi$.

Proof. Fix $(\Omega, \mathcal{F}, \mathbb{P}, W, X, \alpha)$ a feasible tuple for (5.1), if it exists, and note that $\alpha_t = u_t(W, X, \mathcal{L}^{\mathbb{P}}(X_t))$ is $\mathbb{F}^{X,W}$-adapted. Then $\pi := \mathcal{L}^{\mathbb{P}}(W, X)$ belongs to $\Pi_c(\gamma, \mathcal{L}^{\mathbb{P}}(X))$ and generates the same cost in (5.8). Conversely, given a transport plan $\pi$ participating in (5.8), the following tuple $(\Omega, \mathcal{F}, \mathbb{P}, W, X, \alpha)$ is feasible for (5.1): $\Omega = \mathcal{C}\times\mathcal{C}$, $\mathcal{F}$ the canonical filtration on $\mathcal{C}\times\mathcal{C}$, $\mathbb{P} = \pi$, $W = \omega$, $X = \bar\omega$, and $\alpha_t = u_t(\omega, \bar\omega, \mu^\pi_t)$.
The connection presented in the above lemma will be used in the next proposition, in order to reduce the optimization problem in (5.1) to a minimization over weak closed loop tuples, in the following sense.

Definition 5.7. We say that a feasible tuple for (5.1) is a weak closed loop if the control is adapted to the state (i.e., $\alpha$ is $\mathbb{F}^X$-measurable).

We will further need the following concepts of monotonicity: a function $f: \mathcal{P}(\mathbb{R}^N) \to \mathbb{R}$ is called $\prec_{cm}$-monotone (resp., $\prec_c$-monotone) if $f(m_1) \le f(m_2)$ whenever $m_1 \prec_{cm} m_2$ (resp., $m_1 \prec_c m_2$). By the latter orders of measures, we mean $\int h\,dm_1 \le \int h\,dm_2$ for all functions $h$ which are convex and increasing w.r.t. the usual componentwise order in $\mathbb{R}^N$ (resp., all convex functions $h$) such that the integrals exist.
Proposition 5.8. Assume
(A1) $b(x, \cdot, \mu)$ is injective, $b(x, \mathbb{R}^k, \mu)$ is a convex set, and $b^{-1}(x, \cdot, \mu)$ is convex;
(A2) $f(x, b^{-1}(x, \cdot, \mu), \xi)$ is convex and grows at least like $\kappa_0 + \kappa_1|\cdot|^p$ with $\kappa_1 > 0$, $p \ge 1$;
(A3) $f(x, \alpha, \cdot)$ is $\prec_{cm}$-monotone.
Then the minimization in the extended mean field problem (5.1) can be taken over weak closed loop tuples. Moreover, if the infimum is attained, then the optimal control $\alpha$ is of weak closed loop form.

The proof follows the projection arguments used in [1], which require the above convexity assumptions. On the other hand, no regularity conditions are required here, unlike in the classical PDE or probabilistic approaches (see assumptions (I)–(II) in section 3). We refer to [25] for a similar statement, in a general framework, but with no nonlinear dependence on the control law. The proof is postponed to Appendix A.
Remark 5.9. If $b$ is linear with positive coefficient for $\alpha$, then assumption (A3) in Proposition 5.8 can be weakened to
(A3′) $f(x, \alpha, \cdot)$ is $\prec_c$-monotone,
as can be seen from the proof. For example, conditions (A1), (A2), (A3′) are satisfied if
$$b(x, \alpha, \mu) = c_1 x + c_2\alpha + c_3\bar\mu \quad\text{and}\quad f(x, \alpha, \xi) = d_1 x + d_2\alpha + d_3 x^2 + d_4\alpha^2 + J(\bar\xi_1, \bar\xi_2),$$
where $J$ is a measurable function,
$$\bar\mu = \int x\,\mu(dx), \qquad \bar\xi_1 = \int x\,\xi(dx, d\alpha), \qquad \bar\xi_2 = \int \alpha\,\xi(dx, d\alpha),$$
and $c_i, d_i$ are constants such that $c_2 \ne 0$ and $d_4/c_2 > 0$.
5.4. A transport-theoretic discretization scheme. In this part we specialize the analysis to the following particular case of (5.1):
$$(5.9)\qquad \inf_{\mathbb{P},\alpha}\bigg\{\int_0^1 f\big(\mathcal{L}^{\mathbb{P}}(\alpha_t)\big)\,dt + g\big(\mathcal{L}^{\mathbb{P}}(X_T)\big) : dX_t = \alpha_t\,dt + dW_t,\ X_0 = x_0\bigg\},$$
where for simplicity we took $T = 1$. Throughout this section we assume
(i) $g$ is bounded from below and lower semicontinuous w.r.t. weak convergence;
(ii) $f$ is increasing with respect to the convex order, lower semicontinuous w.r.t. weak convergence, and such that for all $\lambda \in [0, 1]$ and $\mathbb{R}^k$-valued random variables $Z, \bar Z$,
$$(5.10)\qquad f\big(\mathcal{L}(\lambda Z + (1 - \lambda)\bar Z)\big) \le \lambda f(\mathcal{L}(Z)) + (1 - \lambda) f(\mathcal{L}(\bar Z));$$
(iii) $f$ satisfies the growth condition $f(\rho) \ge a + b\int |z|^p\,\rho(dz)$ for some $a \in \mathbb{R}$, $b > 0$, $p > 1$.
Lemma 5.6 shows the equivalence of (5.9) with the variational problem
$$\inf_{\pi\in\Pi_c(\gamma,\cdot)}\bigg\{\int_0^1 f\big(\mathcal{L}^\pi(\dot{(\bar\omega - \omega)}_t)\big)\,dt + g\big(\mathcal{L}^\pi(\bar\omega_1)\big)\bigg\}.$$
Under the convention that $\int_0^1 f(\mathcal{L}^\pi(\dot{(\bar\omega - \omega)}_t))\,dt = +\infty$ if $\bar\omega - \omega \ll \mathcal{L}$ fails under $\pi$, the latter can be expressed in the equivalent form
$$(\mathrm{P})\qquad \inf_{\mu\in\tilde{\mathcal{P}}}\ \inf_{\pi\in\Pi_c(\gamma,\mu)}\bigg\{\int_0^1 f\big(\mathcal{L}^\pi(\dot{(\bar\omega - \omega)}_t)\big)\,dt + g\big(\mathcal{L}^\pi(\bar\omega_1)\big)\bigg\},$$
where $\tilde{\mathcal{P}}$ was defined in (5.2). In the same spirit as [36, Chapter 3.6], we introduce a family of causal transport problems in finite dimension increasing to (P). For $n \in \mathbb{N}$, let $T_n := \{i2^{-n} : 0 \le i \le 2^n,\ i \in \mathbb{N}\}$ be the $n$th generation dyadic grid. For measures $m \in \mathcal{P}(\mathcal{C})$ and $\pi \in \mathcal{P}(\mathcal{C}\times\mathcal{C})$, we write
$$m^n := \mathcal{L}^m\big(\{\omega_t\}_{t\in T_n}\big) \in \mathcal{P}(\mathbb{R}^{(2^n+1)d}) \quad\text{and}\quad \pi^n := \mathcal{L}^\pi\big(\{(\omega_t, \bar\omega_t)\}_{t\in T_n}\big) \in \mathcal{P}(\mathbb{R}^{(2^n+1)d}\times\mathbb{R}^{(2^n+1)d})$$
for the projections of $m$ and $\pi$ on the grid $T_n$. We denote by
$$(x^n_0, x^n_1, \ldots, x^n_{2^n}, y^n_0, y^n_1, \ldots, y^n_{2^n})$$
a typical element of $\mathbb{R}^{(2^n+1)d}\times\mathbb{R}^{(2^n+1)d}$, and let $\Delta_n x_i := x^n_{i+1} - x^n_i$, and similarly for $\Delta_n y_i$.
We consider the auxiliary transport problems
$$(\mathrm{P}(n))\qquad \inf_{\mu\in\mathcal{P}(\mathbb{R}^{(2^n+1)d})}\ \inf_{\pi\in\Pi^n_c(\gamma^n,\mu)}\ 2^{-n}\sum_{i=0}^{2^n-1} f\bigg(\mathcal{L}^\pi\Big(\frac{\Delta_n y_i - \Delta_n x_i}{2^{-n}}\Big)\bigg) + g\big(\mathcal{L}^\pi(y^n_{2^n})\big),$$
where, in analogy to Definition 5.5, we call
$$\Pi^n_c(\gamma^n, \mu) \subset \mathcal{P}(\mathbb{R}^{(2^n+1)d}\times\mathbb{R}^{(2^n+1)d})$$
the set of causal couplings in $\mathcal{P}(\mathbb{R}^{(2^n+1)d}\times\mathbb{R}^{(2^n+1)d})$ with marginals $\gamma^n$ and $\mu$; see [5].
Theorem 5.10. Suppose problem (P) is finite, and that (i), (ii), (iii) hold. Then the value of the auxiliary problems (P(n)) increases to the value of the original problem (P), and the latter admits an optimizer.

Remark 5.11. An example of a function satisfying conditions (ii)–(iii) of Theorem 5.10 is $f(\rho) = R\big(\int h\,d\rho\big)$ for $R$ convex and increasing, and $h$ convex with $p$-power growth ($p > 1$). It also covers the case of functions of the form $f(\rho) = \int\!\int \varphi(w, z)\,d\rho(w)\,d\rho(z) + \int |x|^p\,d\rho(x)$, with $\varphi$ jointly convex and bounded from below, and $f(\rho) = \mathrm{Var}(\rho) + \int |x|^p\,d\rho(x)$, where in both cases $p > 1$. For $p = 2$ the latter falls into the LQ case of section 4.3.
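For the last example, the mixing inequality (5.10) can be checked on empirical measures: it holds exactly there, by Cauchy-Schwarz for the variance term and by convexity of $z \mapsto |z|^p$ for the moment term. The sketch below does this for arbitrary sample laws and a hypothetical exponent $p$:

```python
import numpy as np

# Check of inequality (5.10) for f(rho) = Var(rho) + int |x|^p rho(dx) on
# empirical measures; the sample laws and the exponent p are arbitrary choices.
rng = np.random.default_rng(1)
p = 3.0
Z = rng.normal(1.0, 2.0, 100_000)
Zbar = rng.exponential(1.5, 100_000)

def f_of_law(samples):
    return samples.var() + np.mean(np.abs(samples) ** p)

for lam in np.linspace(0.0, 1.0, 11):
    mix = lam * Z + (1 - lam) * Zbar        # samples of the convex combination
    assert f_of_law(mix) <= lam * f_of_law(Z) + (1 - lam) * f_of_law(Zbar) + 1e-9
```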
Proof. Step 1 (lower bound): Let $\mu \in \tilde{\mathcal{P}}$ and $\pi \in \Pi_c(\gamma, \mu)$ with finite cost for problem (P). Fix $n \in \mathbb{N}$, and denote by $\pi^n$ the projection of $\pi$ onto the grid $T_n$. We first observe that
$$(5.11)\qquad \int_0^1 f\big(\mathcal{L}^\pi(\dot{(\bar\omega - \omega)}_t)\big)\,dt + g\big(\mathcal{L}^\pi(\bar\omega_1)\big) \ge 2^{-n}\sum_{i=0}^{2^n-1} f\bigg(\mathcal{L}^{\pi^n}\Big(\frac{\Delta_n y_i - \Delta_n x_i}{2^{-n}}\Big)\bigg) + g\big(\mathcal{L}^{\pi^n}(y^n_{2^n})\big).$$
Indeed, for $i \in \{0, \ldots, 2^n - 1\}$ we have
$$\int_{i2^{-n}}^{(i+1)2^{-n}} f\big(\mathcal{L}^\pi(\dot{(\bar\omega - \omega)}_t)\big)\,dt \ge 2^{-n} f\bigg(\mathcal{L}^\pi\bigg(\int_{i2^{-n}}^{(i+1)2^{-n}} \dot{(\bar\omega - \omega)}_t\,\frac{dt}{2^{-n}}\bigg)\bigg) = 2^{-n} f\bigg(\mathcal{L}^\pi\bigg(\frac{\bar\omega_{(i+1)2^{-n}} - \bar\omega_{i2^{-n}} - (\omega_{(i+1)2^{-n}} - \omega_{i2^{-n}})}{2^{-n}}\bigg)\bigg) = 2^{-n} f\bigg(\mathcal{L}^{\pi^n}\Big(\frac{\Delta_n y_i - \Delta_n x_i}{2^{-n}}\Big)\bigg),$$
where for the inequality we used the convexity condition (5.10). Noticing that the first marginal of $\pi^n$ is equal to $\gamma^n$, the r.h.s. of (5.11) is bounded from below by the value of (P(n)). Because $\mu, \pi$ have been chosen having finite cost for problem (P), but are otherwise arbitrary, we conclude that
$$(\mathrm{P}) \ge (\mathrm{P}(n)) \qquad \forall\, n \in \mathbb{N}.$$
Step 2 (monotonicity): For $n \in \mathbb{N}$ and $i \in \{0, \ldots, 2^n - 1\}$, take $k$ such that
$$i2^{-n} = (k-1)2^{-(n+1)} < k2^{-(n+1)} < (k+1)2^{-(n+1)} = (i+1)2^{-n}.$$
Let $\mu^{n+1} \in \mathcal{P}(\mathbb{R}^{(2^{n+1}+1)d})$ and $\pi^{n+1} \in \Pi^{n+1}_c(\gamma^{n+1}, \mu^{n+1})$. By (5.10) we get
$$2^{-(n+1)}\bigg\{f\bigg(\mathcal{L}^{\pi^{n+1}}\Big(\frac{\Delta_{n+1} y_{k-1} - \Delta_{n+1} x_{k-1}}{2^{-(n+1)}}\Big)\bigg) + f\bigg(\mathcal{L}^{\pi^{n+1}}\Big(\frac{\Delta_{n+1} y_k - \Delta_{n+1} x_k}{2^{-(n+1)}}\Big)\bigg)\bigg\} \ge 2^{-n} f\bigg(\mathcal{L}^{\pi^{n+1}}\bigg(\frac{y^{n+1}_{k+1} - y^{n+1}_{k-1} - (x^{n+1}_{k+1} - x^{n+1}_{k-1})}{2^{-n}}\bigg)\bigg) = 2^{-n} f\bigg(\mathcal{L}^{\pi^n}\Big(\frac{\Delta_n y_i - \Delta_n x_i}{2^{-n}}\Big)\bigg),$$
where $\pi^n$ is the projection of $\pi^{n+1}$ on the grid $T_n$. Analogously to the previous step, this gives
$$(\mathrm{P}(n+1)) \ge (\mathrm{P}(n)) \qquad \forall\, n \in \mathbb{N}.$$
Step 3 (discrete to continuous): We introduce auxiliary problems in path space:
$$(\mathrm{P}_{\mathrm{aux}}(n))\qquad \inf_{\mu\in\tilde{\mathcal{P}}}\ \inf_{\pi\in\Pi_c(\gamma,\mu)}\ 2^{-n}\sum_{i=0}^{2^n-1} f\bigg(\mathcal{L}^\pi\Big(\frac{\Delta^n_i\bar\omega - \Delta^n_i\omega}{2^{-n}}\Big)\bigg) + g\big(\mathcal{L}^\pi(\bar\omega_1)\big),$$
where $\Delta^n_i\omega := \omega_{(i+1)2^{-n}} - \omega_{i2^{-n}}$ and likewise for $\Delta^n_i\bar\omega$. We now prove that
$$(5.12)\qquad (\mathrm{P}_{\mathrm{aux}}(n)) = (\mathrm{P}(n)) \qquad \forall\, n \in \mathbb{N}.$$
First we observe that the left-hand side of (5.12) is larger than the r.h.s. Indeed, projecting a coupling from $\Pi_c(\gamma, \cdot)$ onto a discretization grid gives again a causal coupling; see [36, Lemma 3.5.1]. For the converse inequality, note that Remark 5.12 implies that, for any $\nu \in \mathcal{P}(\mathbb{R}^{(2^n+1)d})$ and $\pi \in \Pi^n_c(\gamma^n, \nu)$ with finite cost in (P(n)), there exist $\mu \in \tilde{\mathcal{P}}$ and $\mathbb{P} \in \Pi_c(\gamma, \mu)$ that give the same cost in $(\mathrm{P}_{\mathrm{aux}}(n))$.
Step 4 (convergence): Let us denote by
$$c(\pi) := \int_0^1 f\big(\mathcal{L}^\pi(\dot{(\bar\omega - \omega)}_t)\big)\,dt \quad\text{and}\quad c_n(\pi) := 2^{-n}\sum_{i=0}^{2^n-1} f\bigg(\mathcal{L}^\pi\Big(\frac{\Delta^n_i\bar\omega - \Delta^n_i\omega}{2^{-n}}\Big)\bigg)$$
the cost functionals defining the optimization problems (P) and $(\mathrm{P}_{\mathrm{aux}}(n))$. Notice that Step 1 implies $c \ge c_n$, and Step 2 shows that $c_n$ is increasing. We now show that $c_n$ converges to $c$ whenever the latter is finite. For this it suffices to show that
$$(5.13)\qquad \liminf_n c_n(\pi) \ge c(\pi).$$
We start by representing $c_n$ in an alternative manner, namely,
$$c_n(\pi) = \int_0^1 f\bigg(\mathcal{L}^\pi\bigg(\int_{\lfloor t2^n\rfloor 2^{-n}}^{(\lfloor t2^n\rfloor + 1)2^{-n}} \dot{(\bar\omega - \omega)}_s\,\frac{ds}{2^{-n}}\bigg)\bigg)\,dt.$$
By the Lebesgue differentiation theorem [21, Theorem 6, Appendix E.4], for each pair $(\omega, \bar\omega)$ such that $\bar\omega - \omega$ is absolutely continuous, there exists a $dt$-full set of times such that
$$(5.14)\qquad A(t, n) := \int_{\lfloor t2^n\rfloor 2^{-n}}^{(\lfloor t2^n\rfloor + 1)2^{-n}} \dot{(\bar\omega - \omega)}_s\,\frac{ds}{2^{-n}} \to \dot{(\bar\omega - \omega)}_t.$$
If $c(\pi) < \infty$, the set of such pairs $(\omega, \bar\omega)$ is $\pi$-full. This shows that (5.14) holds $\pi(d\omega, d\bar\omega)\otimes dt$-a.s. By Fubini's theorem, there is a $dt$-full set of times $I \subset [0, 1]$ such that, for $t \in I$, the limit (5.14) holds in the $\pi$-almost sure sense (the $\pi$-null set depends on $t$ a priori). By dominated convergence, this proves that
$$\forall\, t \in I: \quad \mathcal{L}^\pi(A(t, n)) \Rightarrow \mathcal{L}^\pi\big(\dot{(\bar\omega - \omega)}_t\big),$$
namely, in the sense of weak convergence of measures. By lower boundedness and lower semicontinuity of $f$, together with Fatou's lemma, we obtain
$$\liminf_n c_n(\pi) \ge \int_0^1 \liminf_n f\big(\mathcal{L}^\pi(A(t, n))\big)\,dt \ge \int_0^1 f\big(\mathcal{L}^\pi(\dot{(\bar\omega - \omega)}_t)\big)\,dt,$$
establishing (5.13), and so $c_n \nearrow c$.
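The dyadic-average convergence (5.14) is easy to observe numerically. The snippet below uses an arbitrary sample density, chosen discontinuous on purpose, and computes the block averages $A(t, n)$ at a fixed time $t$:

```python
import numpy as np

# Block averages A(t, n) over shrinking dyadic intervals converge, at a.e.
# fixed t, to the underlying density; 'deriv' is an arbitrary sample choice.
def deriv(s):
    return np.sign(np.sin(17 * s)) + s   # discontinuous but integrable density

t, vals = 0.3, []
for n in range(2, 22):
    lo = np.floor(t * 2 ** n) * 2.0 ** -n           # left end of the dyadic block
    grid = np.linspace(lo, lo + 2.0 ** -n, 1001)
    vals.append(deriv(grid).mean())                 # approximates A(t, n)
assert abs(vals[-1] - deriv(t)) < 1e-3              # convergence at this fixed t
```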
By Steps 2 and 3, we know that the values of $(\mathrm{P}_{\mathrm{aux}}(n))$ are increasing and bounded from above by the value of (P). We take $\pi^n$ which is $1/n$-optimal for $(\mathrm{P}_{\mathrm{aux}}(n))$. It then follows from assumptions (i)–(iii) that $\int\!\int_0^1 \big|\dot{(\bar\omega - \omega)}_t\big|^p\,dt\,d\pi^n \le \bar a + \bar b\,(\mathrm{P})$, for some $\bar a, \bar b \in \mathbb{R}$. By [36, Lemma 3.6.2], we obtain the tightness of $\{\pi^n\}_n$. We may thus assume that $\pi^n \Rightarrow \pi$ weakly. By [1, Lemma 5.5], the measure $\pi$ is causal (and it obviously has first marginal $\gamma$