Markov Chain Approximation Methods on Generalized HJB
Equation
Xueping Li and Q. S. Song
Abstract— This work is concerned with numerical methods for a class of stochastic control optimizations and stochastic differential games. Numerical procedures based on Markov chain approximation techniques are developed in a framework of generalized Hamilton-Jacobi-Bellman equations. Convergence of the algorithms is derived by means of viscosity solution methods.
I. INTRODUCTION
Stochastic control has wide applications in manufacturing, communication theory, signal processing, and wireless networks; see, for example, [10], [7] and references therein. On the other hand, zero-sum stochastic differential games, as the theory of two competing controllers, extend control theory to more realistic problems. Many problems arising in, for example, pursuit-evasion games, queueing systems in heavy traffic, risk-sensitive control, and constrained optimization can be formulated as two-player stochastic differential games [2], [6].
It is well known that the value functions of stochastic optimal control problems for such systems satisfy Hamilton-Jacobi-Bellman (HJB) equations, while the value functions of stochastic differential games satisfy Hamilton-Jacobi-Isaacs (HJI) equations. Such HJB or HJI equations are usually nonlinear and difficult to solve in closed form, so numerical methods become a viable alternative. One of the most effective methods is the Markov chain approximation approach. For proofs of convergence by probabilistic methods, see [9], [10], [14] for stochastic control and [11], [15] for stochastic differential games. Viscosity solution methods provide another way to prove convergence; see [1] for stochastic control and [16] for stochastic differential games.
The idea of the generalized operator for the associated HJB equations in this work is motivated by [12] on the Q-learning problem, in which many applications of such a generalization are introduced in the framework of Markov decision processes (MDPs). In this paper, we aim to introduce generalized HJB equations, and it turns out that the HJI equation is a special case of the generalized HJB equation. For applications, we concentrate on a class of finite-horizon stochastic control problems and stochastic differential games associated with generalized HJB equations. An upwind finite difference scheme, together with its Markov chain approximation interpretation, is developed for generalized HJB equations. Convergence of the numerical solutions is established using viscosity solution techniques, which simultaneously yields convergence for both stochastic control and stochastic differential games.

Xueping Li is with the Department of Industrial and Information Engineering, University of Tennessee-Knoxville, TN 37996. Xueping.Li@utk.edu.
Q. S. Song is with the Department of Mathematics, University of Southern California, Los Angeles, CA 90089, qingshus@usc.edu. Research of this author was supported in part by the U.S. Army Research Office MURI grant W911NF-06-1-0094 at the University of Southern California.
The rest of the paper is arranged as follows. Section II begins with a description of the generalized HJB equation; the associated stochastic control problem and stochastic differential games are formulated as its applications. Section III presents an effective upwind finite difference scheme with its probabilistic interpretation. Section IV proves the convergence of the numerical scheme. Section V concludes the paper with further remarks.
II. GENERALIZED HJB EQUATIONS
Throughout the paper, we use the following notation. K is a generic constant. Q = [0, T] × R^d is the domain of the real-valued value function V(·, ·) : Q → R. U is a compact subset of a Euclidean space. σ(·, ·) : R^d × U → R^{d×d} is a matrix-valued function, and a(x, ν) = σ(x, ν)σ(x, ν)^T, where σ(x, ν)^T is the transpose of σ(x, ν). Let L : R^d × U → R be the running cost, and let ψ(·) : R^d → R be the terminal cost. For x, y ∈ R^d, xy abbreviates the inner product xy^T = Σ_{i=1}^d x_i y_i.
Consider the generalized nonlinear Hamilton-Jacobi-Bellman (HJB) equation: for all (t, x) ∈ Q,

V_t + ⊗^x_ν [f(x, ν) D_x V + (1/2) tr(a(x, ν) D²_x V) + L(x, ν)] = 0,   (1)

with boundary condition V(T, x) = ψ(x) for x ∈ R^d. Here ⊗^x_ν is an operator that summarizes values over actions as a function of the state, such that, for any real-valued functions φ_1, φ_2 and constant c, there exists some constant K such that:

(C1) ⊗^x_ν [cφ_1(x, ν) + φ_2(x)] = c ⊗^x_ν [φ_1(x, ν)] + φ_2(x).
(C2) ⊗^x_ν φ_1(x, ν) ≤ ⊗^x_ν φ_2(x, ν), whenever φ_1 ≤ φ_2.
Proceedings of the 46th IEEE Conference on Decision and Control, New Orleans, LA, USA, Dec. 12-14, 2007. ThC05.4. 1-4244-1498-9/07/$25.00 ©2007 IEEE. 4069
(C3) |⊗^x_ν φ_1(x, ν) − ⊗^x_ν φ_2(x, ν)| ≤ K max_ν |φ_1(x, ν) − φ_2(x, ν)|.
Many natural operators satisfy the above conditions, such as max_ν φ(x, ν), min_ν φ(x, ν), and ∫_U φ(x, ν) m(dν), where m(·) is a measure on B(U), the Borel σ-algebra on U. Moreover, if we consider a two-component control ν = (ν_1, ν_2), then min_{ν_1} max_{ν_2} φ(x, ν_1, ν_2) and max_{ν_2} min_{ν_1} φ(x, ν_1, ν_2) also satisfy all of the conditions above.
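To make conditions (C1)-(C3) concrete, the sketch below checks them numerically for the operators just listed on a discretized action set. All payoff values and grids are illustrative assumptions, not from the paper; for max and min, the scaling in (C1) is checked with a nonnegative constant c.

```python
import numpy as np

# Toy discrete action set and payoffs phi(x, nu) at one fixed state x.
rng = np.random.default_rng(0)
actions = np.linspace(-1.0, 1.0, 5)          # discretized compact U
phi1 = rng.standard_normal(len(actions))     # phi1(x, .) as a vector over actions
phi2_const = 0.7                             # phi2 depends on x only (constant in nu)

# Candidate summary operators over actions, each mapping a payoff vector to R.
ops = {
    "max":  lambda p: p.max(),
    "min":  lambda p: p.min(),
    "mean": lambda p: p.mean(),              # integral w.r.t. the uniform measure m
}

c = 2.5                                      # nonnegative scaling for (C1)
for name, op in ops.items():
    # (C1): op[c*phi1 + phi2] = c*op[phi1] + phi2 when phi2 is constant in nu
    assert abs(op(c * phi1 + phi2_const) - (c * op(phi1) + phi2_const)) < 1e-12
    # (C2): monotonicity
    assert op(phi1) <= op(phi1 + 0.1)
    # (C3): non-expansiveness with K = 1
    phi1b = phi1 + 0.1 * rng.standard_normal(len(actions))
    assert abs(op(phi1) - op(phi1b)) <= np.max(np.abs(phi1 - phi1b)) + 1e-12

# Two-component control: minimax summaries over a payoff matrix phi(x, nu1, nu2).
phi_mat = rng.standard_normal((4, 3))
upper = phi_mat.max(axis=1).min()            # min_{nu1} max_{nu2}
lower = phi_mat.min(axis=0).max()            # max_{nu2} min_{nu1}
assert lower <= upper                        # lower value never exceeds upper value
```

The minimax and maximin summaries at the end are exactly the operators used later for the upper and lower game values.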
To proceed, we need the following regularity assumption.
(A1) a, f, L, ψ are continuous and bounded. For φ = a, f, L, ψ, the function φ and its partial derivatives φ_{x_i}, φ_{x_i x_j} are continuous and bounded on R^d × U for i, j = 1, 2, . . . , d.
(Ω, F, F_t, P, W) is a given probability space driven by a Wiener process W_t with filtration F_t. In the following, we present two applications of the generalized HJB equation (1): a stochastic control problem and stochastic differential games.
A. Classical stochastic control problem
Suppose X_s satisfies the controlled stochastic differential equation (SDE)

dX_s = f(X_s, u_s) ds + σ(X_s, u_s) dW_s,  ∀s ∈ [t, T],   (2)

with initial condition X_t = x.
Definition 2.1: An admissible control process u on [t, T] is an F_t-progressively measurable process taking values in U. The set of all admissible controls is denoted by U(t).
The cost function for a given admissible control u(·) ∈ U(t) is defined as

J(t, x, u) = E[ ∫_t^T L(X_s, u_s) ds + ψ(X_T) ],   (3)

and the value function is defined as

V(t, x) = inf_{u∈U(t)} J(t, x, u).   (4)

It is well known that V(t, x) is the unique viscosity solution of the HJB equation (1) with ⊗^x_ν = min_ν. Similarly, if one takes the sup over all admissible controls in (4), then V(t, x) is the unique viscosity solution of (1) with ⊗^x_ν = max_ν. (See [4], [7].)
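For a fixed (non-optimized) control, the cost (3) can be estimated by Monte Carlo simulation of the SDE (2) using an Euler-Maruyama discretization. The sketch below is a minimal illustration; the coefficients f, σ, L, ψ are hypothetical choices for the demo, not from the paper.

```python
import numpy as np

def estimate_cost(x0=0.0, t=0.0, T=1.0, u=0.5, n_steps=200, n_paths=20_000, seed=1):
    """Monte Carlo estimate of J(t, x, u) in (3) for a constant control u."""
    f = lambda x, nu: -x + nu                 # illustrative drift
    sigma = lambda x, nu: 0.3                 # illustrative (constant) diffusion
    L = lambda x, nu: x**2 + 0.1 * nu**2      # illustrative running cost
    psi = lambda x: x**2                      # illustrative terminal cost
    rng = np.random.default_rng(seed)
    dt = (T - t) / n_steps
    X = np.full(n_paths, x0)
    cost = np.zeros(n_paths)
    for _ in range(n_steps):
        cost += L(X, u) * dt                  # accumulate running cost
        dW = rng.normal(scale=np.sqrt(dt), size=n_paths)
        X = X + f(X, u) * dt + sigma(X, u) * dW   # Euler-Maruyama step of (2)
    return (cost + psi(X)).mean()             # average of pathwise costs

J_hat = estimate_cost()
```

Taking the infimum of such estimates over controls is exactly what the Markov chain approximation of Section III accomplishes systematically.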
B. Stochastic differential games
Let U = U_1 × U_2, and let u = (u_1, u_2) ∈ U(t) be an admissible control, where X_s satisfies SDE (2). Here u_1 and u_2 are the controls offered by player 1 and player 2, respectively. The collections of admissible controls of player 1 and player 2 on [t, T] are denoted by U_1(t) and U_2(t). Player 1 (resp. player 2) wants to minimize (resp. maximize) the cost (3). In the following, we define the Elliott-Kalton type upper and lower values of the differential game.
Definition 2.2: An admissible strategy α (resp. β) for player 2 (resp. player 1) on [t, T] is a mapping α : U_1(t) → U_2(t) (resp. β : U_2(t) → U_1(t)) such that, for t < r < T, u_1(s) = ũ_1(s) for almost all s ∈ [t, r] implies α(u_1)(s) = α(ũ_1)(s) for almost all s ∈ [t, r] (and analogously for β). Let S_1(t) denote the class of all admissible strategies α, and S_2(t) the class of all admissible strategies β, on [t, T].
The upper value V^+(t, x) and lower value V^−(t, x) are defined as

V^+(t, x) = sup_{α∈S_1(t)} inf_{u_1∈U_1(t)} J(t, x, u_1, α(u_1)),   (5)

and

V^−(t, x) = inf_{β∈S_2(t)} sup_{u_2∈U_2(t)} J(t, x, β(u_2), u_2).   (6)
It is well known that V^+(t, x) (resp. V^−(t, x)) is the unique viscosity solution of the HJB equation (1) with ⊗^x_ν = min_{ν_1} max_{ν_2} (resp. max_{ν_2} min_{ν_1}); see [8]. If V^+(t, x) = V^−(t, x) holds for all (t, x) ∈ Q, then the differential game is said to have a saddle point, and its value is denoted by V(t, x).
III. NUMERICAL SOLUTIONS
Let e_i be the ith unit basis vector of R^d for i = 1, 2, . . . , d. For given positive discretization parameters δ, h, define discrete spaces in state and time by

Σ_δ = {x ∈ R^d : x = Σ_{i=1}^d k_i δ e_i, k_i ∈ Z},
[t, T]_h = [t, T] ∩ {t = kh + T : k ∈ Z}.   (7)
To proceed, the following assumptions will be needed.
(A2) The matrix a(x, ν) satisfies

|a_{ii}(x, ν)| − Σ_{j≠i} |a_{ij}(x, ν)| ≥ 0.

(A3) The discretization parameter δ = δ(h) is a function of h such that

h Σ_{i=1}^d [a_{ii}(x, ν) − (1/2) Σ_{j≠i} |a_{ij}(x, ν)| + δ|f_i(x, ν)|] ≤ δ².   (8)

Assumption (A2) requires that the diffusion matrix be diagonally dominant. If the given dynamical system does not satisfy (A2), one can adjust the coordinate system so that it does; see [10, page 110] and [7, page 329]. Assumption (A3) gives the relation between the two parameters δ and h used in the discretization.
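As a quick sketch, diagonal dominance (A2) can be checked directly, and (A3) can be read as an upper bound on the time step h for a given spatial step δ. The diffusion matrix and drift below are hypothetical examples, not from the paper.

```python
import numpy as np

def diagonally_dominant(a):
    """Check (A2): |a_ii| - sum_{j!=i} |a_ij| >= 0 for every row i."""
    a = np.asarray(a, dtype=float)
    off = np.sum(np.abs(a), axis=1) - np.abs(np.diag(a))
    return bool(np.all(np.abs(np.diag(a)) - off >= 0))

def max_time_step(a, f, delta):
    """Largest h allowed by (A3):
    h * sum_i [a_ii - 0.5*sum_{j!=i}|a_ij| + delta*|f_i|] <= delta**2."""
    a = np.asarray(a, dtype=float)
    f = np.asarray(f, dtype=float)
    off = np.sum(np.abs(a), axis=1) - np.abs(np.diag(a))
    denom = np.sum(np.diag(a) - 0.5 * off + delta * np.abs(f))
    return delta**2 / denom

a = np.array([[1.0, 0.2], [0.2, 0.8]])   # hypothetical a = sigma sigma^T
f = np.array([0.5, -1.0])                # hypothetical drift at one (x, nu)
assert diagonally_dominant(a)
h_max = max_time_step(a, f, delta=0.1)
```

In practice (A3) must hold uniformly over the (truncated) grid and action set, so h is chosen against the worst-case coefficients.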
By V_h(·, ·) on Σ_δ × [0, T]_h, we denote the numerical solution of (1) with parameters δ, h satisfying (8). Note that, for simplicity, we write V_h instead of V_{δ,h}.
The numerical solution V_h can be obtained by an upwind finite difference scheme; that is, for any
function φ(t, x),

Δ^{δ,±}_{x_i} φ = δ^{−1} [φ(t, x ± δe_i) − φ(t, x)],
Δ^{2,δ}_{x_i} φ = δ^{−2} [φ(t, x + δe_i) + φ(t, x − δe_i) − 2φ(t, x)] =: Δ^{δ,+}_{x_i x_i} φ = Δ^{δ,−}_{x_i x_i} φ,
Δ^{δ,+}_{x_i x_j} φ = (1/2) δ^{−2} [2φ(t, x) + φ(t, x + δe_i + δe_j) + φ(t, x − δe_i − δe_j)]
  − (1/2) δ^{−2} [φ(t, x + δe_i) + φ(t, x − δe_i) + φ(t, x + δe_j) + φ(t, x − δe_j)],
Δ^{δ,−}_{x_i x_j} φ = −(1/2) δ^{−2} [2φ(t, x) + φ(t, x + δe_i − δe_j) + φ(t, x − δe_i + δe_j)]
  + (1/2) δ^{−2} [φ(t, x + δe_i) + φ(t, x − δe_i) + φ(t, x + δe_j) + φ(t, x − δe_j)],
Δ^{h,−}_t φ = [φ(t, x) − φ(t − h, x)] / h,
Δ^{h,+}_t φ = [φ(t + h, x) − φ(t, x)] / h.   (9)
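In one dimension, the differences in (9) reduce to the familiar one-sided and central quotients; the sketch below confirms on a smooth test function that they approximate φ_x and φ_{xx} as δ → 0 (the mixed differences Δ^{δ,±}_{x_i x_j} are handled analogously in higher dimensions).

```python
import numpy as np

delta = 1e-3
x = 0.7
phi = np.sin                       # smooth test function; phi' = cos, phi'' = -sin

# One-sided first differences Delta^{delta,+} and Delta^{delta,-} from (9)
d_plus  = (phi(x + delta) - phi(x)) / delta
d_minus = (phi(x) - phi(x - delta)) / delta
# Central second difference Delta^{2,delta} from (9)
d_second = (phi(x + delta) + phi(x - delta) - 2 * phi(x)) / delta**2

assert abs(d_plus - np.cos(x)) < 1e-3      # first-order accurate
assert abs(d_minus - np.cos(x)) < 1e-3
assert abs(d_second + np.sin(x)) < 1e-3    # approximates phi'' = -sin(x)
```

The "upwind" aspect enters below: the forward difference is paired with the positive part of the drift and the backward difference with the negative part, which is what makes the scheme's coefficients usable as transition probabilities.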
Applying the above upwind finite difference scheme (9) to (1), one can write the explicit numerical scheme as

Δ^{h,−}_t V_h + ⊗^x_ν [f^+(x, ν) Δ^{δ,+}_x V_h + (1/2) tr(a^+(x, ν) Δ^{2,δ,+}_x V_h) − f^−(x, ν) Δ^{δ,−}_x V_h − (1/2) tr(a^−(x, ν) Δ^{2,δ,−}_x V_h) + L(x, ν)] = 0,   (10)
where

a^± = max{±a, 0},
Δ^{δ,±}_x φ = (Δ^{δ,±}_{x_1} φ, Δ^{δ,±}_{x_2} φ, . . . , Δ^{δ,±}_{x_d} φ)^T,
Δ^{2,δ,±}_x φ = (Δ^{δ,±}_{x_i x_j} φ)_{i,j=1,...,d}.   (11)
Note that Δ^{2,δ,±}_x is a symmetric matrix. In the following, we give an equivalent Markov chain approximation interpretation of the above upwind finite difference scheme. One can rewrite (10) with boundary conditions as

V_h(t − h, x) = ⊗^x_ν [ Σ_{y∈Σ_δ} p_h(x, y, ν) V_h(t, y) + hL(x, ν) ],  t ∈ [h, T]_h, x ∈ Σ_δ,
V_h(T, x) = ψ(x),  x ∈ Σ_δ,   (12)
where

p_h(x, x ± δe_i, ν) = h/(2δ²) [a_{ii}(x, ν) − Σ_{j≠i} |a_{ij}(x, ν)| + 2δ f^±_i(x, ν)],
p_h(x, x + δe_i + δe_j, ν) = p_h(x, x − δe_i − δe_j, ν) = h/(2δ²) a^+_{ij}(x, ν),  i ≠ j,
p_h(x, x + δe_i − δe_j, ν) = p_h(x, x − δe_i + δe_j, ν) = h/(2δ²) a^−_{ij}(x, ν),  i ≠ j,
p_h(x, x, ν) = 1 − (h/δ²) Σ_{i=1}^d [a_{ii}(x, ν) − (1/2) Σ_{j≠i} |a_{ij}(x, ν)| + δ|f_i(x, ν)|],
p_h(x, y, ν) = 0, otherwise.   (13)
Note that, under assumptions (A2) and (A3), we have

Σ_{y∈Σ_δ} p_h(x, y, ν) = 1;  p_h(x, y, ν) ≥ 0.   (14)
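In one dimension (no cross terms), (13) reduces to a birth-death chain that moves to x ± δ or stays put. The sketch below builds these probabilities for hypothetical coefficients and checks property (14); the (A3)-maximal time step makes the staying probability exactly zero.

```python
import numpy as np

def transitions_1d(a, f, delta, h):
    """One-dimensional instance of (13): a = sigma^2 (scalar), f = drift."""
    f_plus, f_minus = max(f, 0.0), max(-f, 0.0)
    p_up   = h / (2 * delta**2) * (a + 2 * delta * f_plus)    # to x + delta
    p_down = h / (2 * delta**2) * (a + 2 * delta * f_minus)   # to x - delta
    p_stay = 1.0 - h / delta**2 * (a + delta * abs(f))        # stay at x
    return p_up, p_down, p_stay

a, f, delta = 0.5, -1.0, 0.1          # hypothetical coefficients at one (x, nu)
h = delta**2 / (a + delta * abs(f))   # largest h allowed by (A3) here
p_up, p_down, p_stay = transitions_1d(a, f, delta, h)

assert abs(p_up + p_down + p_stay - 1.0) < 1e-12   # probabilities sum to one, (14)
assert min(p_up, p_down, p_stay) >= 0.0            # and are nonnegative
```

Note how the upwinding shows up: the negative drift f = −1 enlarges p_down relative to p_up, so the chain drifts in the same direction as the SDE.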
In view of (14), we can consider p_h(·) as the one-step transition probability of a Markov chain {x^h_n : n = 0, 1, 2, . . .} in the state space Σ_δ, with the cost function defined by

Ṽ_h(k, x) = E[ Σ_{n=k}^{T/h−1} h L(x^h_n, u^h_n) + ψ(x^h_{T/h}) ].   (15)

Then the dynamic programming equation of Ṽ_h is exactly the same as (12). Hence, by uniqueness, we have Ṽ_h = V_h.
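The backward recursion (12) is directly implementable on a truncated grid. The sketch below takes ⊗^x_ν = min over a discretized action set (the stochastic control case), with illustrative one-dimensional dynamics and costs and a crude reflecting condition at the edges of the truncated grid; every coefficient choice here is an assumption for the demo, not from the paper.

```python
import numpy as np

# Illustrative data: f, a = sigma^2, running cost L, terminal cost psi.
f = lambda x, nu: nu - x
sig2 = lambda x, nu: 0.25
L = lambda x, nu: x**2 + 0.1 * nu**2
psi = lambda x: x**2

delta, T = 0.1, 1.0
xs = np.arange(-2.0, 2.0 + delta / 2, delta)   # truncated state grid
actions = np.linspace(-1.0, 1.0, 5)            # discretized compact U
h = delta**2 / (0.25 + delta * 3.0)            # (A3) with worst-case |f| <= 3 here
n_steps = int(np.ceil(T / h))

V = psi(xs)                                    # terminal condition V(T, .) = psi
for _ in range(n_steps):                       # backward recursion (12)
    Vp = np.roll(V, -1); Vm = np.roll(V, 1)    # V(t, x + delta), V(t, x - delta)
    Vp[-1], Vm[0] = V[-1], V[0]                # crude reflecting boundary
    best = np.full_like(V, np.inf)
    for nu in actions:                         # take min over actions (the control case)
        fp, fm = np.maximum(f(xs, nu), 0), np.maximum(-f(xs, nu), 0)
        p_up   = h / (2 * delta**2) * (sig2(xs, nu) + 2 * delta * fp)
        p_down = h / (2 * delta**2) * (sig2(xs, nu) + 2 * delta * fm)
        p_stay = 1.0 - p_up - p_down
        cand = p_up * Vp + p_down * Vm + p_stay * V + h * L(xs, nu)
        best = np.minimum(best, cand)
    V = best

V0_at_origin = V[np.argmin(np.abs(xs))]        # approximation of V(0, 0)
```

Replacing the inner minimum with a max, or with a min-max over a two-component action grid, yields the game schemes of Corollary 4.7 with the same transition probabilities.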
Remark 3.1: An implicit numerical scheme can be obtained by replacing Δ^{h,−}_t φ with Δ^{h,+}_t φ in (10), that is,

Δ^{h,+}_t V_h + ⊗^x_ν [f^+(x, ν) Δ^{δ,+}_x V_h + (1/2) tr(a^+(x, ν) Δ^{2,δ,+}_x V_h) − f^−(x, ν) Δ^{δ,−}_x V_h − (1/2) tr(a^−(x, ν) Δ^{2,δ,−}_x V_h) + L(x, ν)] = 0.   (16)
The above implicit numerical scheme also has a probabilistic interpretation when discrete time is treated as another state variable; see [10, Chapter 12].
The next section finds sufficient conditions such that V_h of the explicit scheme (10) converges to the unique viscosity solution V of (1). For the convergence of the implicit scheme (16), an analogous argument applies.
IV. CONVERGENCE
To show the convergence of V_h of the explicit scheme (10) with boundary conditions, one can rewrite (10) as

V_h(t − h, x) = F_h[V_h(t, ·)](x),  t ∈ [h, T]_h, x ∈ Σ_δ,
V_h(T, x) = ψ(x),  x ∈ Σ_δ,   (17)
where Σ_δ = {x ∈ R^d : x = Σ_{i=1}^d k_i δ e_i, k_i ∈ Z}, [h, T]_h = {t : h ≤ t ≤ T, t = kh, k ∈ Z}, and F_h[φ](x) is an operator acting on functions φ : R^d → R, such that

F_h[φ](x) = φ(x) + h ⊗^x_ν [f^+(x, ν) Δ^{δ,+}_x φ(x) − f^−(x, ν) Δ^{δ,−}_x φ(x) + (1/2) tr(a^+(x, ν) Δ^{2,δ,+}_x φ(x)) − (1/2) tr(a^−(x, ν) Δ^{2,δ,−}_x φ(x)) + L(x, ν)].   (18)
Note that, by condition (C1), one can rewrite (18) as

F_h[φ](x) = ⊗^x_ν [ Σ_{y∈Σ_δ} p_h(x, y, ν) φ(y) + hL(x, ν) ].   (19)
Lemma 4.1: Assume (A1), (A2), and (A3). Then the following properties hold:

F_h[φ_1] ≤ F_h[φ_2],  for all φ_1 ≤ φ_2,   (20)
F_h[φ + c] = F_h[φ] + c,  ∀c ∈ R,   (21)
‖V_h‖_∞ ≤ K,  ∀ 0 < h < 1,   (22)

and, for all φ ∈ C^{1,2}(Q̄),

lim_{(s,y)→(t,x), h→0} ( F_h[φ(s, ·)](y) − φ(s − h, y) ) / h
 = φ_t + ⊗^x_ν [f(x, ν) D_x φ(t, x) + (1/2) tr(a(x, ν) D²_x φ(t, x)) + L(x, ν)].   (23)
Proof. Note that (20) and (21) follow from (A2), (A3), and (14). Rewrite (17) as

V_h(t − h, x) = F_h[V_h(t, ·)](x) = ⊗^x_ν [ Σ_{y∈Σ_δ} p_h(x, y, ν) V_h(t, y) + hL(x, ν) ].   (24)
Then, for any t ∈ [h, T],

V_h(t − h, x) ≤ ⊗^x_ν [max_y V_h(t, y) + hL(x, ν)]
 ≤ max_y V_h(t, y) + h ⊗^x_ν L(x, ν)
 ≤ max_y V_h(t, y) + Kh‖L‖_∞.   (25)
This leads to the stability of F_h, that is, for any 0 ≤ m ≤ T/h,

max_x V_h(T − mh, x) ≤ max_x V_h(T, x) + Kmh‖L‖_∞ ≤ ‖ψ‖_∞ + KT‖L‖_∞ < ∞.   (26)

Hence, (22) holds.
For any test function φ ∈ C^{1,2}(Q̄), one can verify the consistency property (23) as follows:

lim_{(s,y)→(t,x), h→0} ( F_h[φ(s, ·)](y) − φ(s − h, y) ) / h
 = lim_{(s,y)→(t,x), h→0} ( φ(s, y) − φ(s − h, y) ) / h
  + lim_{(s,y)→(t,x), h→0} ⊗^y_ν [ L(y, ν) + f^+(y, ν) Δ^{δ,+}_x φ(s, y) − f^−(y, ν) Δ^{δ,−}_x φ(s, y) + (1/2) tr(a^+(y, ν) Δ^{2,δ,+}_x φ(s, y)) − (1/2) tr(a^−(y, ν) Δ^{2,δ,−}_x φ(s, y)) ]
 = φ_t + ⊗^x_ν [ f(x, ν) D_x φ(t, x) + (1/2) tr(a(x, ν) D²_x φ(t, x)) + L(x, ν) ].   (27)

This completes the proof. □
Definition 4.2: We say that V is a viscosity solution of equation (1) if:
(a) V(t, x) is an upper semicontinuous function on Q and, for each φ ∈ C^∞(Q),

φ_t(t̄, x̄) + ⊗^x_ν [f(x̄, ν) D_x φ(t̄, x̄) + (1/2) tr(a(x̄, ν) D²_x φ(t̄, x̄)) + L(x̄, ν)] ≥ 0   (28)

at every (t̄, x̄) ∈ Q which is a strict maximizer of V − φ on Q̄;
(b) V(t, x) is a lower semicontinuous function on Q and, for each φ ∈ C^∞(Q),

φ_t(t̄, x̄) + ⊗^x_ν [f(x̄, ν) D_x φ(t̄, x̄) + (1/2) tr(a(x̄, ν) D²_x φ(t̄, x̄)) + L(x̄, ν)] ≤ 0   (29)

at every (t̄, x̄) ∈ Q which is a strict minimizer of V − φ on Q̄.
If (a) (respectively (b)) holds, then V is said to be a subsolution (respectively supersolution) of (1).
For (t, x) ∈ Q, define the upper and lower semicontinuous envelopes of the solutions V_h as

V^*(t, x) = lim sup_{(s,y)→(t,x), h→0} V_h(s, y),
V_*(t, x) = lim inf_{(s,y)→(t,x), h→0} V_h(s, y).   (30)
Lemma 4.3: Under assumptions (A1), (A2), and (A3), V^* (resp. V_*) defined in (30) is a viscosity subsolution (resp. supersolution) of equation (1).
Proof. Suppose that φ ∈ C^∞(Q) is a test function such that V^* − φ has a strict maximum at (t̄, x̄) ∈ Q. Then there is a sequence of h converging to zero such that V_h − φ has a maximum on [0, T]_h × Σ_δ at (s_h, y_h), with (s_h, y_h) → (t̄, x̄) as h → 0. Therefore, for all y ∈ Σ_δ,

φ(s_h + h, y) − φ(s_h, y_h) ≥ V_h(s_h + h, y) − V_h(s_h, y_h).   (31)

By virtue of (20) and (21),

F_h[φ(s_h + h, ·)](y_h) − φ(s_h, y_h) ≥ F_h[V_h(s_h + h, ·)](y_h) − V_h(s_h, y_h).   (32)

By (17), the right-hand side of (32) is zero. Dividing by h and letting h → 0, the left-hand side of (32) converges to (23). Thus, (28) holds. One can prove that V_* is a supersolution in a similar fashion. □
In the following lemma, by A ≥ B for symmetric matrices we mean that A − B is symmetric positive semidefinite.
Lemma 4.4: Suppose (A1), (A2), and (A3) hold. Let φ and φ̄ be bounded viscosity subsolution and supersolution of (1), respectively. Then

sup_{Q̄} (φ − φ̄) = sup_{y∈R^d} (φ(T, y) − φ̄(T, y)).   (33)
Proof. By virtue of [7, Theorem V.9.1], it is enough to show that there exists a constant K such that

⊗^y_ν [αf(y, ν)(x − y) + (1/2) tr(a(y, ν)B) + L(y, ν)] − ⊗^x_ν [αf(x, ν)(x − y) + (1/2) tr(a(x, ν)A) + L(x, ν)] ≤ K(α|x − y|² + |x − y|),   (34)

for every (t, x), (t, y) ∈ Q, α > 0, and symmetric matrices A, B satisfying

−3α [ I 0; 0 I ] ≤ [ B 0; 0 −A ] ≤ 3α [ I −I; −I I ],   (35)

where [ ·; · ] denotes a 2×2 block matrix.
By condition (C3) of the operator ⊗^x_ν, one can write

⊗^y_ν [αf(y, ν)(x − y) + (1/2) tr(a(y, ν)B) + L(y, ν)] − ⊗^x_ν [αf(x, ν)(x − y) + (1/2) tr(a(x, ν)A) + L(x, ν)]
 ≤ K max_ν |α(f(y, ν) − f(x, ν))(x − y)| + K max_ν |L(y, ν) − L(x, ν)| + K max_ν |tr(a(y, ν)B − a(x, ν)A)|.   (36)

Note that assumption (A1) implies Lipschitz continuity of the functions f and L. Hence

max_ν |α(f(y, ν) − f(x, ν))(x − y)| + max_ν |L(y, ν) − L(x, ν)| ≤ K(α|x − y|² + |x − y|).   (37)
For the last term,

tr(a(y, ν)B − a(x, ν)A)
 = tr( [ σ(y, ν)σ(y, ν)^T  σ(y, ν)σ(x, ν)^T; σ(x, ν)σ(y, ν)^T  σ(x, ν)σ(x, ν)^T ] · [ B 0; 0 −A ] )
 ≤ 3α tr( [ σ(y, ν)σ(y, ν)^T  σ(y, ν)σ(x, ν)^T; σ(x, ν)σ(y, ν)^T  σ(x, ν)σ(x, ν)^T ] · [ I −I; −I I ] )
 = 3α tr( σ(y, ν)σ(y, ν)^T − σ(y, ν)σ(x, ν)^T − σ(x, ν)σ(y, ν)^T + σ(x, ν)σ(x, ν)^T )
 = 3α tr( (σ(y, ν) − σ(x, ν))(σ(y, ν) − σ(x, ν))^T )
 = 3α ‖σ(y, ν) − σ(x, ν)‖² ≤ Kα|x − y|².   (38)

The above inequalities combined imply the result. □
Theorem 4.5: Suppose (A1), (A2), and (A3) hold. Then V^* = V_*, and V := V^* = V_* is the unique viscosity solution of (1) on Q.
Proof. By the definition (30), V^* ≥ V_*. Moreover, (17) and (30) imply that V^* and V_* have the same boundary condition on {T} × R^d. Applying Lemma 4.3 and Lemma 4.4, one obtains V^* ≤ V_*. Thus, V = V^* = V_* is a viscosity solution of (1). Uniqueness also follows from Lemma 4.4. □
The following corollaries are straightforward consequences of Theorem 4.5.
Corollary 4.6: Suppose (A1), (A2), and (A3) hold. Let V(t, x) be the value function defined by (4), and let V_h(t, x) be the approximate value function of (10) with ⊗^x_ν replaced by min_ν. Then V_h(t, x) converges to V(t, x) as h → 0.
Corollary 4.7: Suppose (A1), (A2), and (A3) hold. Let V^+(t, x) (resp. V^−(t, x)) be the value function defined by (5) (resp. (6)), and let V_h^+(t, x) (resp. V_h^−(t, x)) be the approximate value function of (10) with ⊗^x_ν replaced by min_{ν_1} max_{ν_2} (resp. max_{ν_2} min_{ν_1}). Then V_h^+(t, x) (resp. V_h^−(t, x)) converges to V^+(t, x) (resp. V^−(t, x)) as h → 0.
V. FURTHER REMARKS
In this work, the generalized HJB equation is proposed, which is associated with both stochastic control and stochastic differential games. The proof of convergence is given by the viscosity solution method; probabilistic methods analogous to [9], [15] can also be used to prove convergence. Another approachable direction is controlled stochastic hybrid systems, a formulation with extensive recent applications in risk theory, financial engineering, and insurance modeling; see [5], [13], [17], [18], [19]. It would also be interesting to find further applications in the manner of generalized HJB equations, such as risk-sensitive models, exploration-sensitive models, and non-zero-sum differential games.
REFERENCES
[1] G. Barles and P. E. Souganidis, Convergence of approximation schemes for fully nonlinear second order equations, Asymptotic Analysis, 4 (1991), 271-283.
[2] T. Basar and P. Bernhard, H∞-Optimal Control and Related Minimax Problems, Birkhäuser, Boston, 1991.
[3] D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models, Upper Saddle River, NJ, 1987.
[4] M. G. Crandall, H. Ishii, and P. L. Lions, User's guide to viscosity solutions of second order partial differential equations, Bulletin of the American Mathematical Society, 27.1 (1992), 1-67.
[5] G. B. Di Masi, Y. M. Kabanov, and W. J. Runggaldier, Mean variance hedging of options on stocks with Markov volatility, Theory of Probability and Applications, 39 (1994), 173-181.
[6] R. J. Elliott and N. J. Kalton, Existence of Value in Differential Games, Mem. Amer. Math. Soc., 126, Providence, RI, 1972.
[7] W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions, 2nd edition, Springer-Verlag, Berlin, New York, 2006.
[8] W. H. Fleming and P. E. Souganidis, On the existence of value functions of two-player, zero-sum stochastic differential games, Indiana Univ. Math. J., 38.2 (1989), 293-314.
[9] H. J. Kushner, Numerical methods for stochastic control problems in continuous time, SIAM J. Control Optim., 28 (1990), 999-1048.
[10] H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time, 2nd edition, Springer-Verlag, New York, Berlin, 2001.
[11] H. J. Kushner, Numerical approximations for stochastic differential games, SIAM J. Control Optim., 41.2 (2002), 457-486.
[12] M. L. Littman and C. Szepesvari, A generalized reinforcement-learning model: convergence and applications, Proceedings of the 13th International Conference on Machine Learning (ICML-96), Bari, Italy (1996), 310-318.
[13] T. Rolski, H. Schmidli, V. Schmidt, and J. Teugels, Stochastic Processes for Insurance and Finance, Wiley and Sons, New York, 1999.
[14] Q. S. Song, G. Yin, and Z. Zhang, Numerical method for controlled regime-switching diffusions and regime-switching jump diffusions, Automatica, 42 (2006), 1147-1158.
[15] Q. S. Song and G. Yin, Existence of saddle points in discrete Markov games and its application in numerical methods for stochastic differential games, Proceedings of the 45th IEEE Conference on Decision & Control (2006), 6325-6330.
[16] P. E. Souganidis, Two-player, zero-sum differential games and viscosity solutions, in Stochastic and Differential Games, Ann. Internat. Soc. Dynam. Games, 4, Birkhäuser, Boston, MA (1999), 69-104.
[17] G. Yin and Q. Zhang, Continuous-Time Markov Chains and Applications: A Singular Perturbation Approach, Springer-Verlag, New York, 1998.
[18] G. Yin and Q. Zhang, Discrete-Time Markov Chains: Two-Time-Scale Methods and Applications, Springer, New York, 2005.
[19] Q. Zhang, Stock trading: an optimal selling rule, SIAM J. Control Optim., 40 (2001), 64-87.