Markov Chain Approximation Methods on Generalized HJB Equation

Xueping Li and Q. S. Song

Xueping Li is with the Department of Industrial and Information Engineering, University of Tennessee-Knoxville, TN 37996 (Xueping.Li@utk.edu). Q. S. Song is with the Department of Mathematics, University of Southern California, Los Angeles, CA 90089 (qingshus@usc.edu); the research of this author was supported in part by the U.S. Army Research Office MURI grant W911NF-06-1-0094 at the University of Southern California.
Abstract: This work is concerned with numerical methods for a class of stochastic control optimizations and stochastic differential games. Numerical procedures based on Markov chain approximation techniques are developed in a framework of generalized Hamilton-Jacobi-Bellman equations. Convergence of the algorithms is derived by means of viscosity solution methods.
I. INTRODUCTION
Stochastic control has wide applications in manufacturing, communication theory, signal processing, and wireless networks; see for example [10], [7] and the references therein. On the other hand, zero-sum stochastic differential games, as a theory of two controllers, extend control theory to more realistic problems. Many problems arising in, for example, pursuit-evasion games, queueing systems in heavy traffic, risk-sensitive control, and constrained optimization can be formulated as two-player stochastic differential games [2], [6].

It is well known that the value functions of stochastic optimal control problems for such systems satisfy Hamilton-Jacobi-Bellman (HJB) equations, and the value functions of stochastic differential games satisfy Hamilton-Jacobi-Isaacs (HJI) equations. Such HJB or HJI equations are usually nonlinear and difficult to solve in closed form. Thus numerical methods become a viable alternative. One of the most effective methods is the Markov chain approximation approach. For proofs of convergence using probability methods, we refer to [9], [10], [14] for stochastic control and [11], [15] for stochastic differential games. Viscosity solution methods provide another way to prove convergence; see [1] for stochastic control and [16] for stochastic differential games.
The idea of the generalized operator for the associated HJB equations in this work is motivated by [12] on the Q-learning problem, in which many applications of such a generalization are introduced in the framework of Markov decision processes (MDPs). In this paper, we aim to introduce generalized HJB equations, and it turns out that the HJI equation is a special case of a generalized HJB equation. For applications, we concentrate on a class of stochastic control problems and stochastic differential games with finite time horizon associated with generalized HJB equations. An upwind finite difference scheme and its interpretation as a Markov chain approximation are developed for generalized HJB equations. The convergence of the numerical solutions is proved using viscosity solution techniques. This simultaneously implies the convergence of numerical solutions for both stochastic control and stochastic differential games.
The rest of the paper is arranged as follows. Section II begins with a description of the generalized HJB equation. Associated stochastic control problems and stochastic differential games are formulated as its applications. Section III presents an effective upwind finite difference scheme with its probabilistic interpretation. Section IV proves the convergence of the numerical scheme. Section V concludes the paper with further remarks.
II. GENERALIZED HJB EQUATIONS
Throughout the paper, we use the following notation. $K$ is a generic constant. $Q = [0,T] \times \mathbb{R}^d$ is the domain of the real-valued value function $V(\cdot,\cdot) : Q \to \mathbb{R}$. $U$ is a compact subset of a Euclidean space. $f(\cdot,\cdot) : \mathbb{R}^d \times U \to \mathbb{R}^d$ is the drift, and $\sigma(\cdot,\cdot) : \mathbb{R}^d \times U \to \mathbb{R}^{d \times d}$ is a matrix-valued function, with $a(x,\nu) = \sigma(x,\nu)\sigma(x,\nu)^T$, where $\sigma(x,\nu)^T$ is the transpose of $\sigma(x,\nu)$. Let $L : \mathbb{R}^d \times U \to \mathbb{R}$ be the running cost and $\psi(\cdot) : \mathbb{R}^d \to \mathbb{R}$ be the terminal cost. For $x, y \in \mathbb{R}^d$, $xy$ abbreviates the inner product $xy^T = \sum_{i=1}^d x_i y_i$.
Consider the generalized nonlinear Hamilton-Jacobi-Bellman (HJB) equation: for $(t,x) \in Q$,
\[
V_t + \bigotimes_\nu^x \Big[ f(x,\nu) D_x V + \tfrac{1}{2}\,\mathrm{tr}\big(a(x,\nu) D_x^2 V\big) + L(x,\nu) \Big] = 0 \tag{1}
\]
with boundary condition $V(T,x) = \psi(x)$ for $x \in \mathbb{R}^d$.
Here $\bigotimes_\nu^x$ is an operator that summarizes values over actions as a function of the state, such that, for any real-valued functions $\phi_1, \phi_2$, any constant $c$, and some constant $K$:

(C1) $\bigotimes_\nu^x \big[ c\,\phi_1(x,\nu) + \phi_2(x) \big] = c \bigotimes_\nu^x \big[ \phi_1(x,\nu) \big] + \phi_2(x)$.

(C2) $\bigotimes_\nu^x \phi_1(x,\nu) \le \bigotimes_\nu^x \phi_2(x,\nu)$ whenever $\phi_1 \le \phi_2$.
(C3) $\big| \bigotimes_\nu^x \phi_1(x,\nu) - \bigotimes_\nu^x \phi_2(x,\nu) \big| \le K \max_\nu \big| \phi_1(x,\nu) - \phi_2(x,\nu) \big|$.
Many natural operators satisfy the above conditions, such as $\max_\nu \phi(x,\nu)$, $\min_\nu \phi(x,\nu)$, and $\int_U \phi(x,\nu)\, m(d\nu)$, where $m(\cdot)$ is a measure on $\mathcal{B}(U)$, the Borel $\sigma$-algebra of $U$. Moreover, if we consider a two-component control $\nu = (\nu_1, \nu_2)$, then $\min_{\nu_1} \max_{\nu_2} \phi(x,\nu_1,\nu_2)$ and $\max_{\nu_2} \min_{\nu_1} \phi(x,\nu_1,\nu_2)$ also satisfy all the conditions above.
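To make the summary operator concrete, here is a small sketch (our own illustration; the function names and the finite sampling of $U$ are not from the paper). Each instance below satisfies (C2) and (C3) with $K = 1$, and (C1) for $c \ge 0$:

    import numpy as np

    # phi_vals[k] ~ phi(x, nu_k): phi(x, .) sampled on a finite action grid.
    def op_max(phi_vals):
        return np.max(phi_vals)                  # sup over actions

    def op_min(phi_vals):
        return np.min(phi_vals)                  # inf over actions

    def op_integral(phi_vals, weights):
        # Integral against a measure m on B(U); weights must be nonnegative
        # and sum to one for (C3) to hold with K = 1.
        return float(np.dot(weights, phi_vals))

    # Two-component control nu = (nu1, nu2): phi_grid[i, j] ~ phi(x, nu1_i, nu2_j).
    def op_minmax(phi_grid):
        return np.min(np.max(phi_grid, axis=1))  # min over nu1 of max over nu2

    def op_maxmin(phi_grid):
        return np.max(np.min(phi_grid, axis=0))  # max over nu2 of min over nu1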
To proceed, we need the following regularity assumption.

(A1) $a, f, L, \psi$ are continuous and bounded. For $\phi = a, f, L, \psi$, the function $\phi$ and its partial derivatives $\phi_{x_i}, \phi_{x_i x_j}$ are continuous and bounded on $\mathbb{R}^d \times U$ for $i, j = 1, 2, \ldots, d$.
$(\Omega, \mathcal{F}, \mathcal{F}_t, P, W)$ is a given filtered probability space carrying a Wiener process $W_t$ with filtration $\mathcal{F}_t$. In the following, we present two applications of the generalized HJB equation (1): a stochastic control problem and stochastic differential games.
A. Classical stochastic control problem
Suppose $X_s$ satisfies the controlled stochastic differential equation (SDE)
\[
dX_s = f(X_s, u_s)\,ds + \sigma(X_s, u_s)\,dW_s, \quad s \in [t,T], \tag{2}
\]
with initial condition $X_t = x$.
Definition 2.1: An admissible control process $u$ on $[t,T]$ is an $\mathcal{F}_t$-progressively measurable process taking values in $U$. The set of all admissible controls is denoted by $\mathcal{U}(t)$.
The cost function for a given admissible control $u(\cdot) \in \mathcal{U}(t)$ is defined as
\[
J(t,x,u) = E\Big[ \int_t^T L(X_s, u_s)\,ds + \psi(X_T) \Big], \tag{3}
\]
and the value function is defined as
\[
V(t,x) = \inf_{u \in \mathcal{U}(t)} J(t,x,u). \tag{4}
\]
It is well known that $V(t,x)$ is the unique viscosity solution of the HJB equation (1) with $\bigotimes_\nu^x = \min_\nu$. Similarly, if the supremum over all admissible controls is taken in (4), then $V(t,x)$ is the unique viscosity solution of (1) with $\bigotimes_\nu^x = \max_\nu$; see [4], [7].
B. Stochastic differential games
Let $U = U_1 \times U_2$, and let $u = (u_1, u_2) \in \mathcal{U}(t)$ be an admissible control. $X_s$ satisfies the SDE (2). $u_1$ and $u_2$ are the controls exercised by player 1 and player 2, respectively. The collections of admissible controls on $[t,T]$ of player 1 and player 2 are denoted by $\mathcal{U}_1(t)$ and $\mathcal{U}_2(t)$. Player 1 (resp. player 2) wants to minimize (resp. maximize) the cost (3). In the following, we define Elliott-Kalton type upper and lower values of the differential game.
Definition 2.2: An admissible strategy $\alpha$ for player 2 (resp. $\beta$ for player 1) on $[t,T]$ is a mapping $\alpha : \mathcal{U}_1(t) \to \mathcal{U}_2(t)$ (resp. $\beta : \mathcal{U}_2(t) \to \mathcal{U}_1(t)$) such that, for $t < r < T$, $u_1(s) = \tilde{u}_1(s)$ for almost all $s \in [t,r]$ implies $\alpha(u_1)(s) = \alpha(\tilde{u}_1)(s)$ for almost all $s \in [t,r]$, and analogously for $\beta$. Let $\mathcal{S}_1(t)$ and $\mathcal{S}_2(t)$ denote the classes of all admissible strategies $\alpha$ and $\beta$ on $[t,T]$, respectively.
The upper value $V^+(t,x)$ and lower value $V^-(t,x)$ are defined as
\[
V^+(t,x) = \sup_{\alpha \in \mathcal{S}_1(t)}\; \inf_{u_1 \in \mathcal{U}_1(t)} J\big(t, x, u_1, \alpha(u_1)\big) \tag{5}
\]
and
\[
V^-(t,x) = \inf_{\beta \in \mathcal{S}_2(t)}\; \sup_{u_2 \in \mathcal{U}_2(t)} J\big(t, x, \beta(u_2), u_2\big). \tag{6}
\]
It is well known that $V^+(t,x)$ (resp. $V^-(t,x)$) is the unique viscosity solution of the HJB equation (1) with $\bigotimes_\nu^x = \min_{\nu_1}\max_{\nu_2}$ (resp. $\max_{\nu_2}\min_{\nu_1}$); see [8]. If $V^+(t,x) = V^-(t,x)$ holds for all $(t,x) \in Q$, then the differential game is said to have a saddle point, and its value is denoted by $V(t,x)$.
III. NUMERICAL SOLUTIONS
Let $e_i$ be the $i$th unit basis vector of $\mathbb{R}^d$ for $i = 1, 2, \ldots, d$. For given positive discretization parameters $\delta, h$, define discrete state and time spaces by
\[
\Sigma_\delta = \Big\{ x \in \mathbb{R}^d : x = \sum_{i=1}^d k_i \delta e_i,\ k_i \in \mathbb{Z} \Big\}, \qquad
[t,T]_h = [t,T] \cap \{ T + kh : k \in \mathbb{Z} \}. \tag{7}
\]
To proceed, the following assumptions are made.

(A2) The matrix $a(x,\nu)$ satisfies
\[
a_{ii}(x,\nu) - \sum_{j \ne i} |a_{ij}(x,\nu)| \ge 0.
\]

(A3) The discretization parameter $\delta = \delta(h)$ is a function of $h$ such that
\[
h \sum_{i=1}^d \Big[ a_{ii}(x,\nu) - \tfrac{1}{2} \sum_{j \ne i} |a_{ij}(x,\nu)| + \delta |f_i(x,\nu)| \Big] \le \delta^2. \tag{8}
\]
(8)
Assumption (A2) requires that the diffusion matrix
be diagonally dominated. If the given dynamic system
does not satisfy (A2), then we can adjust the coordinate
system to satisfy assumption (A2); see [10, page 110]
and [7, Page329]. Assumption (A3) gives the relation
between two parameters δ and h, which are used in
discretization.
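As an illustration of how (8) can be enforced in practice, the following sketch (our own; the sampling-based bound and the function names are assumptions, not part of the paper) returns the largest $h$ compatible with a given $\delta$ over sampled states and actions:

    import numpy as np

    def max_time_step(a, f, xs, nus, delta):
        # Largest h such that (8) holds at the sampled states xs and actions nus;
        # a(x, nu) returns the d x d diffusion matrix, f(x, nu) the drift vector.
        # Assumes the model is nondegenerate, so the bound below is positive.
        worst = 0.0
        for x in xs:
            for nu in nus:
                A, F = np.asarray(a(x, nu)), np.asarray(f(x, nu))
                d = len(F)
                total = 0.0
                for i in range(d):
                    off = sum(abs(A[i, j]) for j in range(d) if j != i)
                    total += A[i, i] - 0.5 * off + delta * abs(F[i])
                worst = max(worst, total)
        return delta ** 2 / worst

Any $h$ no larger than this value keeps the diagonal transition probability $p^h(x,x,\nu)$ defined below nonnegative at the sampled points.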
By $V^h(\cdot,\cdot)$ on $\Sigma_\delta \times [0,T]_h$, we denote the numerical solution of (1) with parameters $\delta, h$ satisfying (8). Note that, for simplicity, we write $V^h$ instead of $V^{\delta,h}$.

The numerical solution $V^h$ can be obtained by the following upwind finite difference scheme: for any function $\phi(t,x)$,
\[
\begin{aligned}
\Delta_{x_i}^{\delta,\pm} \phi &= \pm\,\delta^{-1}\big[\phi(t, x \pm \delta e_i) - \phi(t,x)\big],\\
\Delta_{x_i}^{2} \phi &= \delta^{-2}\big[\phi(t, x + \delta e_i) + \phi(t, x - \delta e_i) - 2\phi(t,x)\big] = \Delta_{x_i x_i}^{\delta,+}\phi = \Delta_{x_i x_i}^{\delta,-}\phi,\\
\Delta_{x_i x_j}^{\delta,+} \phi &= \tfrac{1}{2}\delta^{-2}\big[2\phi(t,x) + \phi(t, x + \delta e_i + \delta e_j) + \phi(t, x - \delta e_i - \delta e_j)\big]\\
&\quad - \tfrac{1}{2}\delta^{-2}\big[\phi(t, x + \delta e_i) + \phi(t, x - \delta e_i) + \phi(t, x + \delta e_j) + \phi(t, x - \delta e_j)\big],\\
\Delta_{x_i x_j}^{\delta,-} \phi &= -\tfrac{1}{2}\delta^{-2}\big[2\phi(t,x) + \phi(t, x + \delta e_i - \delta e_j) + \phi(t, x - \delta e_i + \delta e_j)\big]\\
&\quad + \tfrac{1}{2}\delta^{-2}\big[\phi(t, x + \delta e_i) + \phi(t, x - \delta e_i) + \phi(t, x + \delta e_j) + \phi(t, x - \delta e_j)\big],\\
\Delta_t^{h,-} \phi &= \frac{\phi(t,x) - \phi(t-h,x)}{h}, \qquad
\Delta_t^{h,+} \phi = \frac{\phi(t+h,x) - \phi(t,x)}{h}.
\end{aligned} \tag{9}
\]
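In one space dimension the operators in (9) reduce to the usual forward, backward, and central differences. A minimal sketch (our notation; the periodic wrap at the cut-off is a simplification for brevity, not the paper's boundary treatment):

    import numpy as np

    def d_plus(phi, delta):       # Delta^{delta,+}: forward difference
        return (np.roll(phi, -1) - phi) / delta

    def d_minus(phi, delta):      # Delta^{delta,-}: backward difference
        return (phi - np.roll(phi, 1)) / delta

    def d_second(phi, delta):     # Delta^2: central second difference
        return (np.roll(phi, -1) + np.roll(phi, 1) - 2.0 * phi) / delta ** 2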
Applying the above upwind finite difference scheme (9) to (1), one can write the explicit numerical scheme as
\[
\begin{aligned}
\Delta_t^{h,-} V^h + \bigotimes_\nu^x \Big[ & f^+(x,\nu)\,\Delta_x^{\delta,+} V^h + \tfrac{1}{2}\mathrm{tr}\big(a^+(x,\nu)\,\Delta_x^{2,\delta,+} V^h\big) \\
& - f^-(x,\nu)\,\Delta_x^{\delta,-} V^h - \tfrac{1}{2}\mathrm{tr}\big(a^-(x,\nu)\,\Delta_x^{2,\delta,-} V^h\big) + L(x,\nu) \Big] = 0,
\end{aligned} \tag{10}
\]
where
\[
a^\pm = \max\{\pm a, 0\}, \qquad
\Delta_x^{\delta,\pm}\phi = \big(\Delta_{x_1}^{\delta,\pm}\phi, \Delta_{x_2}^{\delta,\pm}\phi, \ldots, \Delta_{x_d}^{\delta,\pm}\phi\big)^T, \qquad
\Delta_x^{2,\delta,\pm}\phi = \big(\Delta_{x_i x_j}^{\delta,\pm}\phi\big)_{i,j=1,\ldots,d}, \tag{11}
\]
with $a^\pm$ and $f^\pm = \max\{\pm f, 0\}$ taken entrywise.
Note that $\Delta_x^{2,\delta,\pm}$ is a symmetric matrix. In the following, we give an equivalent Markov chain approximation interpretation of the above upwind finite difference scheme. One can rewrite (10) with its boundary condition as
\[
\begin{aligned}
V^h(t-h, x) &= \bigotimes_\nu^x \Big[ \sum_{y \in \Sigma_\delta} p^h(x,y,\nu)\, V^h(t,y) + h L(x,\nu) \Big], \quad t \in [h,T]_h,\ x \in \Sigma_\delta, \\
V^h(T,x) &= \psi(x), \quad x \in \Sigma_\delta,
\end{aligned} \tag{12}
\]
where
\[
\begin{aligned}
p^h(x, x \pm \delta e_i, \nu) &= \frac{h}{2\delta^2} \Big[ a_{ii}(x,\nu) - \sum_{j \ne i} |a_{ij}(x,\nu)| + 2\delta f_i^\pm(x,\nu) \Big], \\
p^h(x, x + \delta e_i \pm \delta e_j, \nu) &= \frac{h}{2\delta^2}\, a_{ij}^\pm(x,\nu), \quad i \ne j, \\
p^h(x, x - \delta e_i \mp \delta e_j, \nu) &= \frac{h}{2\delta^2}\, a_{ij}^\pm(x,\nu), \quad i \ne j, \\
p^h(x, x, \nu) &= 1 - \frac{h}{\delta^2} \sum_{i=1}^d \Big[ a_{ii}(x,\nu) - \tfrac{1}{2}\sum_{j \ne i} |a_{ij}(x,\nu)| + \delta |f_i(x,\nu)| \Big], \\
p^h(x, y, \nu) &= 0 \quad \text{otherwise.}
\end{aligned} \tag{13}
\]
Note that, under assumptions (A2) and (A3), we have
\[
\sum_{y \in \Sigma_\delta} p^h(x,y,\nu) = 1, \qquad p^h(x,y,\nu) \ge 0. \tag{14}
\]
Indeed, (A2) makes $p^h(x, x \pm \delta e_i, \nu)$ nonnegative, (A3) makes $p^h(x,x,\nu)$ nonnegative, and the coefficients sum to one by construction.
In view of (14), we can consider $p^h(\cdot)$ as the one-step transition probability of a controlled Markov chain $\{x_n^h : n = 0, 1, 2, \ldots\}$ on the state space $\Sigma_\delta$, with the cost function
\[
\tilde{V}^h(k, x) = E \Big[ \sum_{n=k}^{T/h - 1} h L(x_n^h, u_n^h) + \psi\big(x_{T/h}^h\big) \Big]. \tag{15}
\]
Then the dynamic programming equation for $\tilde{V}^h$ is exactly the same as (12). Hence, by uniqueness, we have $\tilde{V}^h = V^h$.
Remark 3.1: An implicit numerical scheme can be obtained by replacing $\Delta_t^{h,-} V^h$ with $\Delta_t^{h,+} V^h$ in (10), that is,
\[
\begin{aligned}
\Delta_t^{h,+} V^h + \bigotimes_\nu^x \Big[ & f^+(x,\nu)\,\Delta_x^{\delta,+} V^h + \tfrac{1}{2}\mathrm{tr}\big(a^+(x,\nu)\,\Delta_x^{2,\delta,+} V^h\big) \\
& - f^-(x,\nu)\,\Delta_x^{\delta,-} V^h - \tfrac{1}{2}\mathrm{tr}\big(a^-(x,\nu)\,\Delta_x^{2,\delta,-} V^h\big) + L(x,\nu) \Big] = 0.
\end{aligned} \tag{16}
\]
The implicit scheme also has a probabilistic interpretation when discrete time is treated as another state variable; see [10, Chapter 12].
The next section finds sufficient conditions under which $V^h$ of the explicit scheme (10) converges to the unique viscosity solution $V$ of (1). The convergence of the implicit scheme (16) follows by an analogous argument.
IV. CONVERGENCE
To show the convergence of $V^h$ of the explicit scheme (10) with its boundary condition, one can rewrite (10) as
\[
\begin{aligned}
V^h(t-h, x) &= F^h\big[V^h(t,\cdot)\big](x), \quad t \in [h,T]_h,\ x \in \Sigma_\delta, \\
V^h(T, x) &= \psi(x), \quad x \in \Sigma_\delta,
\end{aligned} \tag{17}
\]
where $\Sigma_\delta = \{x \in \mathbb{R}^d : x = \sum_{i=1}^d k_i \delta e_i,\ k_i \in \mathbb{Z}\}$, $[h,T]_h = \{t : h \le t \le T,\ t = kh,\ k \in \mathbb{Z}\}$, and $F^h[\phi](x)$ is the operator defined for any function $\phi : \mathbb{R}^d \to \mathbb{R}$ by
\[
\begin{aligned}
F^h[\phi](x) = \phi(x) + h \bigotimes_\nu^x \Big[ & f^+(x,\nu)\,\Delta_x^{\delta,+}\phi(x) - f^-(x,\nu)\,\Delta_x^{\delta,-}\phi(x) \\
& + \tfrac{1}{2}\mathrm{tr}\big(a^+(x,\nu)\,\Delta_x^{2,\delta,+}\phi(x)\big) - \tfrac{1}{2}\mathrm{tr}\big(a^-(x,\nu)\,\Delta_x^{2,\delta,-}\phi(x)\big) + L(x,\nu) \Big].
\end{aligned} \tag{18}
\]
Note that, by condition (C1), one can rewrite (18) as
\[
F^h[\phi](x) = \bigotimes_\nu^x \Big[ \sum_{y \in \Sigma_\delta} p^h(x,y,\nu)\,\phi(y) + h L(x,\nu) \Big]. \tag{19}
\]
Lemma 4.1: Assume (A1), (A2), and (A3). Then the following properties hold:
\[
F^h[\phi_1] \le F^h[\phi_2] \quad \text{for } \phi_1 \le \phi_2, \tag{20}
\]
\[
F^h[\phi + c] = F^h[\phi] + c, \quad c \in \mathbb{R}, \tag{21}
\]
\[
\|V^h\|_\infty \le K, \quad 0 < h < 1, \tag{22}
\]
and for all $\phi \in C^{1,2}(\bar{Q})$,
\[
\lim_{\substack{(s,y) \to (t,x) \\ h \to 0}} \frac{F^h[\phi(s,\cdot)](y) - \phi(s-h,y)}{h}
= \phi_t + \bigotimes_\nu^x \Big[ f(x,\nu) D_x \phi(t,x) + \tfrac{1}{2}\mathrm{tr}\big(a(x,\nu) D_x^2 \phi(t,x)\big) + L(x,\nu) \Big]. \tag{23}
\]
Proof: Under (A2), (A3), and (14), properties (20) and (21) follow directly. Rewrite (17) as
\[
V^h(t-h, x) = F^h\big[V^h(t,\cdot)\big](x) = \bigotimes_\nu^x \Big[ \sum_{y \in \Sigma_\delta} p^h(x,y,\nu)\, V^h(t,y) + h L(x,\nu) \Big]. \tag{24}
\]
Then, for any $t \in [h,T]_h$,
\[
\begin{aligned}
V^h(t-h, x) &\le \bigotimes_\nu^x \Big[ \max_y V^h(t,y) + h L(x,\nu) \Big] \\
&\le \max_y V^h(t,y) + h \bigotimes_\nu^x L(x,\nu) \\
&\le \max_y V^h(t,y) + K h \|L\|_\infty.
\end{aligned} \tag{25}
\]
This yields the stability of $F^h$: for any $0 \le m \le T/h$,
\[
\max_x V^h(T - mh, x) \le \max_x V^h(T, x) + K m h \|L\|_\infty \le \|\psi\|_\infty + K T \|L\|_\infty < \infty. \tag{26}
\]
A symmetric argument bounds $V^h$ from below. Hence (22) holds.
For any test function $\phi \in C^{1,2}(\bar{Q})$, one can verify the consistency (23) as follows:
\[
\begin{aligned}
\lim_{\substack{(s,y) \to (t,x) \\ h \to 0}} & \frac{F^h[\phi(s,\cdot)](y) - \phi(s-h,y)}{h} \\
&= \lim_{\substack{(s,y) \to (t,x) \\ h \to 0}} \frac{\phi(s,y) - \phi(s-h,y)}{h} \\
&\quad + \lim_{\substack{(s,y) \to (t,x) \\ h \to 0}} \bigotimes_\nu^y \Big[ L(y,\nu) + f^+(y,\nu)\,\Delta_x^{\delta,+}\phi(s,y) - f^-(y,\nu)\,\Delta_x^{\delta,-}\phi(s,y) \\
&\qquad\qquad + \tfrac{1}{2}\mathrm{tr}\big(a^+(y,\nu)\,\Delta_x^{2,\delta,+}\phi(s,y)\big) - \tfrac{1}{2}\mathrm{tr}\big(a^-(y,\nu)\,\Delta_x^{2,\delta,-}\phi(s,y)\big) \Big] \\
&= \phi_t + \bigotimes_\nu^x \Big[ f(x,\nu) D_x \phi(t,x) + \tfrac{1}{2}\mathrm{tr}\big(a(x,\nu) D_x^2 \phi(t,x)\big) + L(x,\nu) \Big].
\end{aligned} \tag{27}
\]
This completes the proof. $\Box$
Definition 4.2: We say that $V$ is a viscosity solution of equation (1) if:

(a) $V(t,x)$ is an upper semicontinuous function on $Q$ and, for each $\phi \in C^\infty(Q)$,
\[
\phi_t(\bar{t},\bar{x}) + \bigotimes_\nu^x \Big[ f(\bar{x},\nu) D_x \phi(\bar{t},\bar{x}) + \tfrac{1}{2}\mathrm{tr}\big(a(\bar{x},\nu) D_x^2 \phi(\bar{t},\bar{x})\big) + L(\bar{x},\nu) \Big] \ge 0 \tag{28}
\]
at every $(\bar{t},\bar{x}) \in Q$ which is a strict maximizer of $V - \phi$ on $\bar{Q}$.

(b) $V(t,x)$ is a lower semicontinuous function on $Q$ and, for each $\phi \in C^\infty(Q)$,
\[
\phi_t(\bar{t},\bar{x}) + \bigotimes_\nu^x \Big[ f(\bar{x},\nu) D_x \phi(\bar{t},\bar{x}) + \tfrac{1}{2}\mathrm{tr}\big(a(\bar{x},\nu) D_x^2 \phi(\bar{t},\bar{x})\big) + L(\bar{x},\nu) \Big] \le 0 \tag{29}
\]
at every $(\bar{t},\bar{x}) \in Q$ which is a strict minimizer of $V - \phi$ on $\bar{Q}$.

If only (a) (respectively (b)) holds, then $V$ is said to be a subsolution (respectively supersolution) of (1).
For $(t,x) \in Q$, define the upper and lower semicontinuous envelopes of the solutions $V^h$ as
\[
V^*(t,x) = \limsup_{\substack{(s,y) \to (t,x) \\ h \to 0}} V^h(s,y), \qquad
V_*(t,x) = \liminf_{\substack{(s,y) \to (t,x) \\ h \to 0}} V^h(s,y). \tag{30}
\]
Lemma 4.3: Under assumptions (A1), (A2), and (A3), $V^*$ (resp. $V_*$) defined in (30) is a viscosity subsolution (resp. supersolution) of equation (1).

Proof: Suppose $\phi \in C^\infty(Q)$ is a test function such that $V^* - \phi$ has a strict maximum at $(\bar{t},\bar{x}) \in Q$. Then there is a sequence converging to zero, denoted by $h$, such that $V^h - \phi$ has a maximum on $[0,T]_h \times \Sigma_\delta$ at $(s_h, y_h)$, with $(s_h, y_h) \to (\bar{t},\bar{x})$ as $h \to 0$. Therefore, for all $y \in \Sigma_\delta$,
\[
\phi(s_h + h, y) - \phi(s_h, y_h) \ge V^h(s_h + h, y) - V^h(s_h, y_h). \tag{31}
\]
By virtue of (20) and (21),
\[
F^h\big[\phi(s_h + h, \cdot)\big](y_h) - \phi(s_h, y_h) \ge F^h\big[V^h(s_h + h, \cdot)\big](y_h) - V^h(s_h, y_h). \tag{32}
\]
By (17), the right-hand side of (32) is zero. Dividing by $h$ and letting $h \to 0$, the left-hand side of (32) tends to (23). Thus (28) holds. One can prove that $V_*$ is a supersolution in a similar fashion. $\Box$
In the following lemma, by $A \ge B$ for symmetric matrices we mean that $A - B$ is symmetric positive semidefinite.
Lemma 4.4: Suppose (A1), (A2), and (A3) hold. Let $\phi$ and $\bar{\phi}$ be bounded viscosity subsolution and supersolution of (1), respectively. Then
\[
\sup_{\bar{Q}} (\phi - \bar{\phi}) = \sup_{y \in \mathbb{R}^d} \big( \phi(T,y) - \bar{\phi}(T,y) \big). \tag{33}
\]

Proof: By virtue of [7, Theorem V.9.1], it is enough to show that there exists a constant $K$ such that
\[
\begin{aligned}
\bigotimes_\nu^y \Big[ \alpha f(y,\nu)(x-y) + \tfrac{1}{2}\mathrm{tr}\big(a(y,\nu)B\big) + L(y,\nu) \Big]
- \bigotimes_\nu^x \Big[ \alpha f(x,\nu)(x-y) + \tfrac{1}{2}\mathrm{tr}\big(a(x,\nu)A\big) + L(x,\nu) \Big] \\
\le K\big( \alpha |x-y|^2 + |x-y| \big),
\end{aligned} \tag{34}
\]
for every $(t,x), (t,y) \in Q$, $\alpha > 0$, and symmetric matrices $A, B$ satisfying
\[
-3\alpha \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}
\le \begin{pmatrix} B & 0 \\ 0 & -A \end{pmatrix}
\le 3\alpha \begin{pmatrix} I & -I \\ -I & I \end{pmatrix}. \tag{35}
\]
By condition (C3) on the operator $\bigotimes_\nu^x$, one can write
\[
\begin{aligned}
\bigotimes_\nu^y \Big[ \alpha f(y,\nu)(x-y) + \tfrac{1}{2}\mathrm{tr}\big(a(y,\nu)B\big) + L(y,\nu) \Big]
- \bigotimes_\nu^x \Big[ \alpha f(x,\nu)(x-y) + \tfrac{1}{2}\mathrm{tr}\big(a(x,\nu)A\big) + L(x,\nu) \Big] \\
\le K \max_\nu \big| \alpha \big(f(y,\nu) - f(x,\nu)\big)(x-y) \big|
+ K \max_\nu \big| L(y,\nu) - L(x,\nu) \big|
+ K \max_\nu \big| \mathrm{tr}\big(a(y,\nu)B - a(x,\nu)A\big) \big|.
\end{aligned} \tag{36}
\]
Note that assumption (A1) implies Lipschitz continuity of the functions $f$ and $L$. Hence
\[
\max_\nu \big| \alpha \big(f(y,\nu) - f(x,\nu)\big)(x-y) \big| + \max_\nu \big| L(y,\nu) - L(x,\nu) \big| \le K\big( \alpha |x-y|^2 + |x-y| \big). \tag{37}
\]
For the last term,
\[
\begin{aligned}
\mathrm{tr}\big( a(y,\nu)B - a(x,\nu)A \big)
&= \mathrm{tr}\left( \begin{pmatrix} \sigma(y,\nu)\sigma(y,\nu)^T & \sigma(y,\nu)\sigma(x,\nu)^T \\ \sigma(x,\nu)\sigma(y,\nu)^T & \sigma(x,\nu)\sigma(x,\nu)^T \end{pmatrix} \begin{pmatrix} B & 0 \\ 0 & -A \end{pmatrix} \right) \\
&\le 3\alpha\, \mathrm{tr}\left( \begin{pmatrix} \sigma(y,\nu)\sigma(y,\nu)^T & \sigma(y,\nu)\sigma(x,\nu)^T \\ \sigma(x,\nu)\sigma(y,\nu)^T & \sigma(x,\nu)\sigma(x,\nu)^T \end{pmatrix} \begin{pmatrix} I & -I \\ -I & I \end{pmatrix} \right) \\
&= 3\alpha\, \mathrm{tr}\big( \sigma(y,\nu)\sigma(y,\nu)^T - \sigma(y,\nu)\sigma(x,\nu)^T - \sigma(x,\nu)\sigma(y,\nu)^T + \sigma(x,\nu)\sigma(x,\nu)^T \big) \\
&= 3\alpha\, \mathrm{tr}\big( (\sigma(y,\nu) - \sigma(x,\nu))(\sigma(y,\nu) - \sigma(x,\nu))^T \big) \\
&= 3\alpha\, \|\sigma(y,\nu) - \sigma(x,\nu)\|^2 \le K \alpha |x-y|^2.
\end{aligned} \tag{38}
\]
Combining the above inequalities yields the result. $\Box$
Theorem 4.5: Suppose (A1), (A2), and (A3) hold. Then $V^* = V_*$, and $V := V^* = V_*$ is the unique viscosity solution of (1) on $Q$.

Proof: By the definitions in (30), $V_* \le V^*$. Moreover, (17) and (30) imply that $V^*$ and $V_*$ satisfy the same boundary condition on $\{T\} \times \mathbb{R}^d$. Applying Lemma 4.3 and Lemma 4.4, one obtains $V^* \le V_*$. Thus $V = V^* = V_*$ is a viscosity solution of (1). Uniqueness also follows from Lemma 4.4. $\Box$
The following corollaries are straightforward consequences of Theorem 4.5.

Corollary 4.6: Suppose (A1), (A2), and (A3) hold. Let $V(t,x)$ be the value function defined by (4), and let $V^h(t,x)$ be the approximate value function of (10) with $\bigotimes_\nu^x$ replaced by $\min_\nu$. Then $V^h(t,x)$ converges to $V(t,x)$ as $h \to 0$.
Corollary 4.7: Suppose (A1), (A2), and (A3) hold. Let $V^+(t,x)$ (resp. $V^-(t,x)$) be the value function defined by (5) (resp. (6)), and let $V^{h,+}(t,x)$ (resp. $V^{h,-}(t,x)$) be the approximate value function of (10) with $\bigotimes_\nu^x$ replaced by $\min_{\nu_1}\max_{\nu_2}$ (resp. $\max_{\nu_2}\min_{\nu_1}$). Then $V^{h,+}(t,x)$ (resp. $V^{h,-}(t,x)$) converges to $V^+(t,x)$ (resp. $V^-(t,x)$) as $h \to 0$.
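Since the scheme interacts with the problem class only through $\bigotimes_\nu^x$, the corollaries can be exercised with the same backward recursion by changing the final reduction. A sketch of one backward step for a two-component control (the indexing convention is ours, continuing the style of the sketch in Section III):

    import numpy as np

    def game_values(Q):
        # Q[i, j, m]: one-step value at grid point m when player 1 (minimizer)
        # plays u1_i and player 2 (maximizer) plays u2_j.
        upper = Q.max(axis=1).min(axis=0)   # otimes = min_{nu1} max_{nu2} -> V^{h,+}
        lower = Q.min(axis=0).max(axis=0)   # otimes = max_{nu2} min_{nu1} -> V^{h,-}
        return upper, lower

If the two outputs coincide in the limit $h \to 0$, the game has a saddle point in the sense of Section II-B.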
V. FURTHER REMARKS
In this work, a generalized HJB equation is proposed which covers both stochastic control and stochastic differential games. The proof of convergence is given by the viscosity solution method. Probability methods analogous to [9], [15] can also be used to prove the convergence.

Another promising direction is controlled stochastic hybrid systems. Such formulations have extensive recent applications in risk theory, financial engineering, and insurance modeling; see [5], [13], [17], [18], [19]. It would also be interesting to find further applications of generalized HJB equations, such as risk-sensitive models, exploration-sensitive models, and non-zero-sum differential games.
REFERENCES
[1] G. Barles, P. E. Souganidis, Convergence of approximation schemes for fully nonlinear second order equations, Asymptotic Analysis, 4 (1991), 271-283.
[2] T. Basar, P. Bernhard, H-infinity Optimal Control and Related Minimax Problems, Birkhäuser, Boston, 1991.
[3] D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall, Upper Saddle River, NJ, 1987.
[4] M. G. Crandall, H. Ishii, P. L. Lions, User's guide to viscosity solutions of second order partial differential equations, Bulletin of the American Mathematical Society, 27 (1992), 1-67.
[5] G. B. Di Masi, Y. M. Kabanov, W. J. Runggaldier, Mean variance hedging of options on stocks with Markov volatility, Theory of Probability and Applications, 39 (1994), 173-181.
[6] R. J. Elliott, N. J. Kalton, Existence of Value in Differential Games, Mem. Amer. Math. Soc., 126, Providence, RI, 1972.
[7] W. H. Fleming, H. M. Soner, Controlled Markov Processes and Viscosity Solutions, 2nd edition, Springer-Verlag, New York, 2006.
[8] W. H. Fleming, P. E. Souganidis, On the existence of value functions of two-player, zero-sum stochastic differential games, Indiana Univ. Math. J., 38 (1989), 293-314.
[9] H. J. Kushner, Numerical methods for stochastic control problems in continuous time, SIAM J. Control Optim., 28 (1990), 999-1048.
[10] H. J. Kushner, P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time, 2nd edition, Springer-Verlag, New York, 2001.
[11] H. J. Kushner, Numerical approximations for stochastic differential games, SIAM J. Control Optim., 41 (2002), 457-486.
[12] M. L. Littman, C. Szepesvari, A generalized reinforcement-learning model: convergence and applications, Proceedings of the 13th International Conference on Machine Learning (ICML-96), Bari, Italy (1996), 310-318.
[13] T. Rolski, H. Schmidli, V. Schmidt, J. Teugels, Stochastic Processes for Insurance and Finance, Wiley, New York, 1999.
[14] Q. S. Song, G. Yin, Z. Zhang, Numerical method for controlled regime-switching diffusions and regime-switching jump diffusions, Automatica, 42 (2006), 1147-1158.
[15] Q. S. Song, G. Yin, Existence of saddle points in discrete Markov games and its application in numerical methods for stochastic differential games, Proceedings of the 45th IEEE Conference on Decision and Control, (2006), 6325-6330.
[16] P. E. Souganidis, Two-player, zero-sum differential games and viscosity solutions, in Stochastic and Differential Games, Ann. Internat. Soc. Dynam. Games, 4, Birkhäuser, Boston, MA, (1999), 69-104.
[17] G. Yin, Q. Zhang, Continuous-Time Markov Chains and Applications: A Singular Perturbation Approach, Springer-Verlag, New York, 1998.
[18] G. Yin, Q. Zhang, Discrete-time Markov Chains: Two-time-scale Methods and Applications, Springer, New York, 2005.
[19] Q. Zhang, Stock trading: an optimal selling rule, SIAM J. Control Optim., 40 (2001), 64-87.