Toward Designing Intelligent PDEs for Computer Vision: An Optimal Control Approach
ABSTRACT Many computer vision and image processing problems can be posed as solving
partial differential equations (PDEs). However, designing a PDE system usually
requires strong mathematical skills and good insight into the problem. In this
paper, we consider designing PDEs for various problems arising in computer
vision and image processing in a lazy manner: learning PDEs from real
data via data-based optimal control. We first propose a general intelligent
PDE system that respects the basic translational and rotational invariance
required by most vision problems. By introducing a PDE-constrained optimal control
framework, it becomes possible to use training data from multiple sources
(ground truth, results from other methods, and manual results from humans) to
learn PDEs for different computer vision tasks. The proposed optimal-control-based
training framework aims at learning a PDE-based regressor that approximates
the unknown (and usually nonlinear) mapping of each vision task. The
experimental results show that the learnt PDEs can solve different vision
problems reasonably well. In particular, we can obtain PDEs not only for
problems on which traditional PDEs work well but also for problems on which
PDE-based methods have never been tried, owing to the difficulty of describing
those problems mathematically.
Risheng Liu (a), Zhouchen Lin (b), Wei Zhang (c), Kewei Tang (a), Zhixun Su (a)
(a) School of Mathematical Sciences, Dalian University of Technology, Dalian, China.
(b) Microsoft Research Asia, Beijing, China, e-mail: zhoulin@microsoft.com.
(c) Department of Information Engineering, The Chinese University of Hong Kong, China.
Preprint submitted to IVC, September 7, 2011
arXiv:1109.1057v1 [cs.CV] 6 Sep 2011
Keywords: Optimal Control, PDEs, Computer Vision, Image Processing.
1. Introduction
The wide application of partial differential equations (PDEs) in computer
vision and image processing can be attributed to two main factors [1]. First,
PDEs in classical mathematical physics are powerful tools to describe, model,
and simulate many dynamics such as heat flow, diffusion, and wave propagation.
Second, many variational problems or their regularized counterparts can often
be solved effectively from their Euler-Lagrange equations. Accordingly, there
are in general two types of methods for designing PDEs for vision tasks. In the
first kind, PDEs are written down directly, based on some understanding of the
properties of mathematical operators and the physical nature of the problem
(e.g., anisotropic diffusion [2], shock filters [3], and curve-evolution-based
equations [4]). The second kind defines an energy functional and then derives
the evolution equations by computing the Euler-Lagrange equation of that
functional (e.g., total-variation-based variational methods [5][6][7]).
Either way, people have to rely heavily on their intuition about the vision
task. Traditional PDE-based methods therefore require good mathematical skills
for choosing appropriate PDE forms and for predicting the final effect of
composing related operators, so that the obtained PDEs roughly meet the goals.
Without enough intuition about a vision task, it may be difficult to arrive at
effective PDEs. For example, although there has been much work on PDE-based
image segmentation [8][9][10][11], the basic philosophy is always to follow the
strong edges in the image while requiring the edge contour to be smooth. Can we
have a PDE system for object detection (Fig. 1) that locates the object region
if the object is present and does not respond if the object is absent? We believe
that this is a big challenge to human intuition and is much more difficult than
traditional segmentation tasks if a PDE-based method is required, because it is
hard to describe an object class, which may have significant variation in shape,
texture and pose. Without using additional information to judge the content, the
existing PDEs for segmentation, e.g., [10], always output an "object region" for
any non-constant image. In short, current PDE design methods greatly limit the
application of PDEs to a wider and more complex scope. This motivates us to
explore whether we can acquire PDEs that are less artificial yet more powerful.
Figure 1: Can we design a PDE system which can detect the object of interest (e.g., the helicopter
in the left image) and does not respond if the object is absent (e.g., the right image)?
In this paper, we give an affirmative answer to this question. We demonstrate that
learning particular coefficients of a general intelligent PDE system from a given
training data set is a possible way of designing PDEs for computer vision
in a lazy manner. Furthermore, borrowing this learning strategy from machine
learning can generalize PDE techniques to more complex vision problems.
Inspired by electromagnetic field theory and Maxwell's equations [12],
we assume that visual processing has two coupled evolutions in different scale
spaces: one is in the image scale space, which controls the evolution of the output,
and the other is in the indicator scale space, which helps collect global
information to guide the evolution in the image scale space. In this way, our general
intelligent PDE system consists of two coupled evolutionary PDEs. Both PDEs
are coupled equations between the image and the indicator, up to their second-order
partial derivatives. Another key idea of our general intelligent PDE system is to
assume that the PDEs that are sought can be written as combinations of "atoms"
which satisfy the general properties of vision tasks. As a preliminary
investigation, we utilize all the translational and rotational invariants as such "atoms" and
propose the general intelligent PDE system as a linear combination of all these
invariants [13]. The problem then boils down to determining the combination
coefficients among such "atoms".
The theory of optimal control [14] has been well developed for over fifty years.
With the enormous advances in computing power, optimal control is now widely
used in multi-disciplinary applications such as biological systems, communication
networks and socio-economic systems [15]. Optimal design and parameter
estimation of systems governed by PDEs give rise to a class of problems known
as PDE-constrained optimal control [16]. In this paper, a PDE-constrained
optimal control technique is introduced as the training tool for our PDE system. We
further propose a general framework for learning PDEs to accomplish a specific
vision task via PDE-constrained optimal control, where the objective functional
measures the difference between the expected outputs and the actual outputs
of the PDEs, given the input images. Such input-output image pairs are provided
in multiple ways (e.g., ground truth, results from other methods or results
generated manually by humans) for different tasks. Therefore, we can train the general
intelligent PDE system to solve various vision problems which traditional PDEs
may find difficult or even impossible.
In summary, our contributions are as follows:
1. Our intelligent PDE system provides a new way to design PDEs for computer
vision. Based on this framework, we can design particular PDEs for
different vision tasks using different sources of training images¹. This may
be very difficult for traditional PDE design methods. However, we would
like to remind the readers that we have no intention of beating all the existing
approaches for each task, because these approaches have been carefully and
specially tuned for the task.
2. We propose a general data-based optimal control framework for training the
PDE system. Fed with pairs of input and output images, the proposed
PDE-constrained optimal control training model can automatically learn the
combination coefficients in the PDE system. Unlike previous design methods,
our approach requires much less human ingenuity and can solve more difficult
problems in computer vision.
The rest of the paper is structured as follows. We first introduce in Section 2
the general intelligent PDE system. In Section 3 we use the PDE-constrained
optimal control technique as the training framework for our intelligent PDE
system. Then in Section 4 we evaluate our intelligent PDE system, trained with the
optimal control framework, on a series of computer vision and image processing
problems. Finally, we give concluding remarks and a discussion of future
work in Section 5.
¹ A similar idea also appeared in [17]. But in that work, the authors only train special PDEs
involving the curvature operator for basic image restoration tasks. In contrast, our work here
proposes a more unified framework that covers more problems in computer vision.
2. General PDE System for Computer Vision
2.1. Electromagnetic Field vs. Image Evolution
Electromagnetism is the force that causes the interaction among electrically
charged particles. The regions in which electromagnetic interaction happens are
called electromagnetic fields. Historically, electrically charged objects were
first thought to produce two types of fields associated with their charge property:
an electric field and a magnetic field. Over time, it was realized that the electric
and magnetic fields are better thought of as two parts of a greater whole, the
electromagnetic field [12]. It affects the behavior of charged objects in the
vicinity of the field. The electromagnetic field extends infinitely throughout space and
describes the electromagnetic interaction. The theoretical implications of
electromagnetism also led to the development of special relativity by Albert Einstein in
1905.
In this paper, inspired by this fundamental force of nature, we consider the
image evolution in a similar way. For a target image signal u(t), different from
most traditional ways, which only consider the evolution in the image scale space,
we define a companion signal v(t) named the indicator signal. It changes with
time and guides the evolution of u(t) by collecting large scale information in the
image. In this way, these two signals evolve in two coupled scale spaces.
2.2. The Intelligent PDE System
Similar to Maxwell’s equations [12], which are a set of PDEs describing how
the electric and magnetic fields relate to their sources and how they develop with
time, we propose a general PDE system for the evolution of our coupled signals.
The space of all PDEs is infinite-dimensional. To find the right form, we start
with the properties that our PDE system should have, in order to narrow down the
search space. We notice that translational and rotational invariance
are very important for computer vision, i.e., in most vision tasks, when the input
image is translated or rotated, the output image is also translated or rotated by
the same amount. So we require that our PDE system be translationally and
rotationally invariant. According to the differential invariant theory [13], the form
of our PDEs must be functions of the fundamental differential invariants under
the group of translation and rotation. The fundamental differential invariants are
invariant under translation and rotation, and all other invariants can be written as
functions of them. We list those up to second order in Table 1, where some notations can
be found in Table 2. In the sequel, we shall use $\{\mathrm{inv}_j(u,v)\}_{j=0}^{16}$ to refer to them
in order. Note that those invariants are ordered with u going before v. We may
reorder them with v going before u. In this case, the j-th invariant will be referred
to as $\mathrm{inv}_j(v,u)$. So the simplest choice of our general PDE system is the linear
combination of the differential invariants, leading to the following form:
\[
\begin{cases}
\dfrac{\partial u}{\partial t} - F(u,v,\{a_j\}_{j=0}^{16}) = 0, & (x,y,t) \in Q,\\
u(x,y,t) = 0, & (x,y,t) \in \Gamma,\\
u|_{t=0} = f_u, & (x,y) \in \Omega,\\
\dfrac{\partial v}{\partial t} - F(v,u,\{b_j\}_{j=0}^{16}) = 0, & (x,y,t) \in Q,\\
v(x,y,t) = 0, & (x,y,t) \in \Gamma,\\
v|_{t=0} = f_v, & (x,y) \in \Omega,
\end{cases}
\tag{1}
\]
where
\[
\begin{aligned}
F(u,v,\{a_j\}_{j=0}^{16}) &= \sum_{j=0}^{16} a_j(t)\,\mathrm{inv}_j(u,v),\\
F(v,u,\{b_j\}_{j=0}^{16}) &= \sum_{j=0}^{16} b_j(t)\,\mathrm{inv}_j(v,u),
\end{aligned}
\tag{2}
\]
$\Omega$ is the rectangular region occupied by the input image I, T is the time at which the
PDE system finishes the visual information processing and outputs the results,
and $f_u$ and $f_v$ are the initial functions of u and v, respectively. The meaning of
other notations in (1) can be found in Table 2. For computational reasons and
ease of mathematical deduction, I is padded with a border of zeros several pixels
wide. As we can change the unit of time, it is harmless to fix T = 1.
$\{a_j(t)\}_{j=0}^{16}$ and $\{b_j(t)\}_{j=0}^{16}$ are sets of functions defined on Q that are used to
control the evolution of u and v, respectively. As $\nabla u$ and $H_u$ change to $R\nabla u$
and $R H_u R^T$, respectively, when the image is rotated by a matrix R, it is easy
to check the rotational invariance of the quantities in Table 1. So the PDE system (1)
is rotationally invariant. Furthermore, the following proposition implies that the
control functions $a_j(t)$ and $b_j(t)$ can be functions of t only.
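As a brief worked check of the rotational invariance stated above, and assuming only that a rotation matrix R is orthogonal ($R^T R = I$), three representative entries of Table 1 transform as follows:
\[
\begin{aligned}
\|R\nabla u\|^2 &= (\nabla u)^T R^T R\, \nabla u = \|\nabla u\|^2,\\
\mathrm{tr}(R H_u R^T) &= \mathrm{tr}(H_u R^T R) = \mathrm{tr}(H_u),\\
(R\nabla v)^T (R H_u R^T)(R\nabla u) &= (\nabla v)^T R^T R\, H_u\, R^T R\, \nabla u = (\nabla v)^T H_u \nabla u.
\end{aligned}
\]
The remaining entries follow from the same two facts, $R^T R = I$ and the cyclic property of the trace.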
Proposition 2.1. Suppose the PDE system (1) is translationally invariant, then
the control functions $\{a_j\}_{j=0}^{16}$ and $\{b_j\}_{j=0}^{16}$ must be independent of (x,y).
The proof of Proposition 2.1 is presented in Appendix A.
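To make the 17 "atoms" concrete, the following is a minimal NumPy sketch of how the invariants of Table 1 might be evaluated on pixel grids. It is an illustration under stated assumptions, not the authors' implementation: the function names `derivatives` and `invariants` are hypothetical, x is taken along columns and y along rows, derivatives use unit-spacing central differences with one pixel of zero padding, and the mixed-derivative stencil is our own choice.

```python
import numpy as np

def derivatives(f):
    """First and second derivatives of a 2D array by central differences.

    Assumptions: unit pixel spacing, x along columns, y along rows, and a
    one-pixel zero border (mirroring the zero boundary condition in (1)).
    """
    p = np.pad(f, 1)  # default mode pads with zeros
    fx = (p[1:-1, 2:] - p[1:-1, :-2]) / 2.0
    fy = (p[2:, 1:-1] - p[:-2, 1:-1]) / 2.0
    fxx = p[1:-1, 2:] - 2.0 * f + p[1:-1, :-2]
    fyy = p[2:, 1:-1] - 2.0 * f + p[:-2, 1:-1]
    fxy = (p[2:, 2:] - p[2:, :-2] - p[:-2, 2:] + p[:-2, :-2]) / 4.0
    return fx, fy, fxx, fxy, fyy

def invariants(u, v):
    """Stack inv_0 .. inv_16 of Table 1 into an array of shape (17, H, W)."""
    ux, uy, uxx, uxy, uyy = derivatives(u)
    vx, vy, vxx, vxy, vyy = derivatives(v)

    def quad(ax, ay, hxx, hxy, hyy, bx, by):
        # Bilinear form a^T H b for a symmetric 2x2 Hessian H.
        return ax * bx * hxx + (ax * by + ay * bx) * hxy + ay * by * hyy

    return np.stack([
        np.ones_like(u), v, u,                      # j = 0, 1, 2
        vx**2 + vy**2, ux**2 + uy**2,               # j = 3, 4
        vx * ux + vy * uy,                          # j = 5
        vxx + vyy, uxx + uyy,                       # j = 6, 7
        quad(vx, vy, vxx, vxy, vyy, vx, vy),        # j = 8
        quad(vx, vy, uxx, uxy, uyy, vx, vy),        # j = 9
        quad(vx, vy, vxx, vxy, vyy, ux, uy),        # j = 10
        quad(vx, vy, uxx, uxy, uyy, ux, uy),        # j = 11
        quad(ux, uy, vxx, vxy, vyy, ux, uy),        # j = 12
        quad(ux, uy, uxx, uxy, uyy, ux, uy),        # j = 13
        vxx**2 + 2 * vxy**2 + vyy**2,               # j = 14
        vxx * uxx + 2 * vxy * uxy + vyy * uyy,      # j = 15
        uxx**2 + 2 * uxy**2 + uyy**2,               # j = 16
    ])
```

With these stencils the discrete invariants commute with 90-degree rotations of the input arrays, which gives a quick sanity check of the rotational invariance discussed above; invariance under arbitrary angles holds only approximately on a pixel grid.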
Table 1: Translationally and rotationally invariant fundamental differential invariants up to the
second order.
j        $\mathrm{inv}_j(u,v)$
0,1,2    $1$, $v$, $u$
3,4      $\|\nabla v\|^2 = v_x^2 + v_y^2$, $\|\nabla u\|^2 = u_x^2 + u_y^2$
5        $(\nabla v)^T \nabla u = v_x u_x + v_y u_y$
6,7      $\mathrm{tr}(H_v) = v_{xx} + v_{yy}$, $\mathrm{tr}(H_u) = u_{xx} + u_{yy}$
8        $(\nabla v)^T H_v \nabla v = v_x^2 v_{xx} + 2 v_x v_y v_{xy} + v_y^2 v_{yy}$
9        $(\nabla v)^T H_u \nabla v = v_x^2 u_{xx} + 2 v_x v_y u_{xy} + v_y^2 u_{yy}$
10       $(\nabla v)^T H_v \nabla u = v_x u_x v_{xx} + (v_x u_y + v_y u_x) v_{xy} + v_y u_y v_{yy}$
11       $(\nabla v)^T H_u \nabla u = v_x u_x u_{xx} + (v_x u_y + v_y u_x) u_{xy} + v_y u_y u_{yy}$
12       $(\nabla u)^T H_v \nabla u = u_x^2 v_{xx} + 2 u_x u_y v_{xy} + u_y^2 v_{yy}$
13       $(\nabla u)^T H_u \nabla u = u_x^2 u_{xx} + 2 u_x u_y u_{xy} + u_y^2 u_{yy}$
14       $\mathrm{tr}(H_v^2) = v_{xx}^2 + 2 v_{xy}^2 + v_{yy}^2$
15       $\mathrm{tr}(H_v H_u) = v_{xx} u_{xx} + 2 v_{xy} u_{xy} + v_{yy} u_{yy}$
16       $\mathrm{tr}(H_u^2) = u_{xx}^2 + 2 u_{xy}^2 + u_{yy}^2$
Table 2: Notations.
$\Omega$            An open bounded region in $\mathbb{R}^2$
$\partial\Omega$    Boundary of $\Omega$
$(x,y)$             $(x,y) \in \Omega$, spatial variable
$t$                 $t \in (0,T)$, temporal variable
$Q$                 $\Omega \times (0,T)$
$\Gamma$            $\partial\Omega \times (0,T)$
$|\cdot|$           The area of a region
$X^T$               Transpose of matrix (or vector)
$\|\cdot\|$         $L^2$ norm
$\mathrm{tr}(\cdot)$  Trace of matrix
$\nabla u$          Gradient of u
$H_u$               Hessian of u
$\wp$               $\wp = \{(0,0),(0,1),(1,0),(0,2),(1,1),(2,0)\}$, index set for partial differentiation
3. Training the PDE System via Data-based Optimal Control
In this section, we propose a data-based optimal control framework to train
the intelligent PDE system for particular vision tasks.
3.1. The Objective Functional
Given the forms of PDEs shown in (1), we have to determine the coefficient
functions $a_j(t)$ and $b_j(t)$. We may prepare training samples $\{(I_m,O_m)\}_{m=1}^{M}$,
where $I_m$ is the input image and $O_m$ is the expected output image, and compute
the coefficient functions that minimize the following functional:
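\[
\begin{aligned}
J(\{u_m\}_{m=1}^{M},\{a_j\}_{j=0}^{16},\{b_j\}_{j=0}^{16})
={}& \frac{1}{2}\sum_{m=1}^{M}\int_{\Omega}\,[u_m(x,y,1) - O_m]^2\, d\Omega\\
&+ \frac{1}{2}\sum_{j=0}^{16}\lambda_j\int_0^1 a_j^2(t)\,dt
 + \frac{1}{2}\sum_{j=0}^{16}\mu_j\int_0^1 b_j^2(t)\,dt,
\end{aligned}
\tag{3}
\]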
where $u_m(x,y,1)$ is the output image at time t = 1 computed from (1) when
the input image is $I_m$, and $\lambda_j$ and $\mu_j$ are positive weighting parameters. The first
term requires that the final output of our PDE system be close to the ground truth.
The second and the third terms are for regularization so that the optimal control
problem is well posed, as there may be multiple minimizers for the first term.
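For illustration, the functional (3) can be evaluated on sampled data roughly as follows. This is a sketch under assumptions of ours, not the paper's code: the name `objective` and the array layouts are hypothetical, the spatial integral is replaced by a pixel average, and the time integrals by a Riemann sum over whatever samples of $a_j(t)$ and $b_j(t)$ are supplied.

```python
import numpy as np

def objective(outputs, targets, a, b, lam, mu, dt):
    """Discretised value of J in (3).

    outputs, targets : arrays of shape (M, H, W), u_m(x, y, 1) and O_m
    a, b             : arrays of shape (17, n_steps), samples of a_j(t), b_j(t)
    lam, mu          : arrays of shape (17,), positive weights
    dt               : time step used for the Riemann sums
    """
    n_pixels = outputs[0].size
    # Data term: pixel-averaged squared error, summed over training pairs.
    fidelity = 0.5 * np.sum((outputs - targets) ** 2) / n_pixels
    # Regularisation terms: weighted squared norms of the control functions.
    reg_a = 0.5 * dt * np.sum(lam[:, None] * a ** 2)
    reg_b = 0.5 * dt * np.sum(mu[:, None] * b ** 2)
    return fidelity + reg_a + reg_b
```

Larger weights `lam` and `mu` pull the learnt coefficient curves towards zero, trading data fit for smaller controls.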
3.2. Solving the Optimal Control Problem
Then we have the following optimal control problem with PDE constraints:
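\[
\begin{aligned}
&\operatorname*{arg\,min}_{\{a_j\}_{j=0}^{16},\,\{b_j\}_{j=0}^{16}}\;
J(\{u_m\}_{m=1}^{M},\{a_j\}_{j=0}^{16},\{b_j\}_{j=0}^{16})\\
&\text{s.t.}\quad
\begin{cases}
\dfrac{\partial u}{\partial t} - F(u,v,\{a_j\}_{j=0}^{16}) = 0, & (x,y,t) \in Q,\\
u(x,y,t) = 0, & (x,y,t) \in \Gamma,\\
u|_{t=0} = f_u, & (x,y) \in \Omega,\\
\dfrac{\partial v}{\partial t} - F(v,u,\{b_j\}_{j=0}^{16}) = 0, & (x,y,t) \in Q,\\
v(x,y,t) = 0, & (x,y,t) \in \Gamma,\\
v|_{t=0} = f_v, & (x,y) \in \Omega.
\end{cases}
\end{aligned}
\tag{4}
\]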
By introducing the adjoint equation of (4), the Gâteaux derivative of J can be
computed and consequently, the (local) optimal $\{a_j\}_{j=0}^{16}$ and $\{b_j\}_{j=0}^{16}$ can be
computed via gradient-based algorithms (e.g., conjugate gradient). Here, we give the
adjoint equation and Gâteaux derivative directly:
3.2.1. Adjoint Equation
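\[
\begin{cases}
\dfrac{\partial \phi_m}{\partial t} + E(u_m,v_m,\phi_m,\varphi_m) = 0, & (x,y,t) \in Q,\\
\phi_m = 0, & (x,y,t) \in \Gamma,\\
\phi_m|_{t=1} = O_m - u_m(1), & (x,y) \in \Omega,\\
\dfrac{\partial \varphi_m}{\partial t} + E(v_m,u_m,\varphi_m,\phi_m) = 0, & (x,y,t) \in Q,\\
\varphi_m = 0, & (x,y,t) \in \Gamma,\\
\varphi_m|_{t=1} = 0, & (x,y) \in \Omega,
\end{cases}
\tag{5}
\]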
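where
\[
\begin{aligned}
E(u_m,v_m,\phi_m,\varphi_m) &= \sum_{(p,q)\in\wp} (-1)^{p+q}\,
\frac{\partial^{p+q}\big(\sigma_{pq}(u_m)\phi_m + \sigma_{pq}(v_m)\varphi_m\big)}{\partial x^p\,\partial y^q},\\
E(v_m,u_m,\varphi_m,\phi_m) &= \sum_{(p,q)\in\wp} (-1)^{p+q}\,
\frac{\partial^{p+q}\big(\sigma_{pq}(u_m)\phi_m + \sigma_{pq}(v_m)\varphi_m\big)}{\partial x^p\,\partial y^q},\\
\sigma_{pq}(u) &= \frac{\partial F(u)}{\partial u_{pq}} = \sum_{j=0}^{16} a_j\,\frac{\partial\,\mathrm{inv}_j(u,v)}{\partial u_{pq}},
\qquad u_{pq} = \frac{\partial^{p+q} u}{\partial x^p\,\partial y^q},\\
\sigma_{pq}(v) &= \frac{\partial F(v)}{\partial v_{pq}} = \sum_{j=0}^{16} b_j\,\frac{\partial\,\mathrm{inv}_j(v,u)}{\partial v_{pq}},
\qquad v_{pq} = \frac{\partial^{p+q} v}{\partial x^p\,\partial y^q}.
\end{aligned}
\tag{6}
\]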
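3.2.2. Gâteaux Derivative of the Functional
With the help of the adjoint equation, at each iteration the derivatives of J with
respect to $a_j(t)$ and $b_j(t)$ are as follows:
\[
\begin{aligned}
\frac{\partial J}{\partial a_j} &= \lambda_j a_j - \sum_{m=1}^{M}\int_{\Omega} \phi_m\,\mathrm{inv}_j(u_m,v_m)\,d\Omega, \quad j = 0,\dots,16,\\
\frac{\partial J}{\partial b_j} &= \mu_j b_j - \sum_{m=1}^{M}\int_{\Omega} \varphi_m\,\mathrm{inv}_j(v_m,u_m)\,d\Omega, \quad j = 0,\dots,16,
\end{aligned}
\tag{7}
\]
where the adjoint functions $\phi_m$ and $\varphi_m$ are the solutions to (5).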
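Once the adjoint functions and the invariants have been sampled on the space-time grid, the first line of (7) reduces to a weighted inner product per coefficient and time step. The sketch below shows one way this might look; the function name `gradient_a`, the array layouts and the pixel-average quadrature are assumptions for illustration, and solving the adjoint system (5) itself is not shown.

```python
import numpy as np

def gradient_a(a, lam, adjoints, invs):
    """Discretised dJ/da_j(t) following the first line of (7).

    a        : array (17, n_steps), current samples of a_j(t)
    lam      : array (17,), regularisation weights lambda_j
    adjoints : array (M, n_steps, H, W), samples of the first adjoint in (5)
    invs     : array (M, n_steps, 17, H, W), samples of inv_j(u_m, v_m)
    """
    n_pixels = adjoints.shape[-2] * adjoints.shape[-1]
    # Sum over training pairs m and pixels (h, w); keep coefficient j and time t.
    inner = np.einsum('mthw,mtjhw->jt', adjoints, invs) / n_pixels
    return lam[:, None] * a - inner
```

The gradient with respect to $b_j(t)$ has the same shape, with $\mu_j$, the second adjoint and $\mathrm{inv}_j(v_m,u_m)$ in place of their counterparts.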
3.2.3. Initialization
Good initialization increases the approximation accuracy of the learnt PDEs.
In our current implementation, we simply set the initial functions of u and v as
the input image:
um(x,y,0) = vm(x,y,0) = Im(x,y), m = 1,2,...,M.
Then we employ a heuristic method to initialize the control functions. At each
time step, $\frac{\partial u_m}{\partial t}$ is expected to be $d_m(t) = \frac{O_m - u_m(t)}{1 - t}$ so that $u_m(t)$ moves towards
the expected output $O_m$, and by the form of (1) we may solve for $\{a_j(t)\}_{j=0}^{16}$ such that
\[
\sum_{m=1}^{M}\int_{\Omega}\big[F(u_m,v_m,\{a_j(t)\}_{j=0}^{16}) - d_m(t)\big]^2\,d\Omega
\tag{8}
\]
is minimized². In this way, we initialize $a_j(t)$ successively in time while fixing
$b_j(t) = 0$, $j = 0,1,\dots,16$.
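Because F is linear in the coefficients, minimizing (8) at a fixed time is an ordinary linear least-squares problem. The following sketch shows one possible way to solve it with NumPy; the function name `init_coefficients` and the array layouts are hypothetical, and `numpy.linalg.lstsq` is our choice of solver, not something the paper specifies.

```python
import numpy as np

def init_coefficients(invs, outputs_now, targets, t):
    """Least-squares estimate of a_j(t), j = 0..16, minimising (8) at time t < 1.

    invs        : array (M, 17, H, W), inv_j(u_m, v_m) evaluated at time t
    outputs_now : array (M, H, W), current states u_m(t)
    targets     : array (M, H, W), expected outputs O_m
    """
    # Desired time derivative d_m(t) that would reach O_m at t = 1.
    d = (targets - outputs_now) / (1.0 - t)
    # One row per pixel of every training image, one column per invariant.
    design = invs.transpose(0, 2, 3, 1).reshape(-1, invs.shape[1])
    coeffs, *_ = np.linalg.lstsq(design, d.reshape(-1), rcond=None)
    return coeffs
```

If some invariants are linearly dependent on the training data (for example on constant images), `lstsq` returns the minimum-norm solution, which is one reasonable tie-break among several.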
3.2.4. Finite Difference Method for Numerical Solution
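To solve the intelligent PDE system numerically, we design a finite difference
scheme [18] for the PDEs. We discretize the PDEs, i.e. replace the derivatives $\frac{\partial f}{\partial t}$,
$\frac{\partial f}{\partial x}$ and $\frac{\partial^2 f}{\partial x^2}$ with finite differences as follows:
\[
\begin{aligned}
\frac{\partial f}{\partial t} &= \frac{f(t+\Delta t) - f(t)}{\Delta t},\\
\frac{\partial f}{\partial x} &= \frac{f(x+1) - f(x-1)}{2},\\
\frac{\partial^2 f}{\partial x^2} &= f(x-1) - 2f(x) + f(x+1).
\end{aligned}
\tag{9}
\]
The discrete forms of $\frac{\partial f}{\partial y}$, $\frac{\partial^2 f}{\partial y^2}$ and $\frac{\partial^2 f}{\partial x \partial y}$ can be defined similarly. In addition, we
discretize the integrations as
\[
\begin{aligned}
\int_{\Omega} f(x,y)\,d\Omega &= \frac{1}{N}\sum_{\Omega} f(x,y),\\
\int_0^1 f(t)\,dt &= \Delta t \sum_{i=0}^{T_m} f(i\cdot\Delta t),
\end{aligned}
\tag{10}
\]
where N is the number of pixels in the spatial area, $\Delta t$ is a properly chosen time
step size and $T_m = \lfloor \frac{1}{\Delta t} + 0.5 \rfloor$ is the index of the expected output time. Then we
use an explicit scheme to compute the numerical solutions.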
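As a small illustration of the explicit scheme, consider the special case of (1) in which only the coefficient of $\mathrm{tr}(H_u)$ is nonzero and equal to one, so that the first equation reduces to the heat equation $\partial u/\partial t = u_{xx} + u_{yy}$. The sketch below integrates it with the forward difference in time and the second differences of (9); the function names, the zero-padded boundary handling and the default step size are assumptions for illustration. For this particular case the explicit scheme on a unit grid is stable only for $\Delta t \le 1/4$.

```python
import numpy as np

def laplacian(f):
    """u_xx + u_yy from the second differences in (9), with a zero border."""
    p = np.pad(f, 1)  # one pixel of zero padding
    return p[1:-1, 2:] + p[1:-1, :-2] + p[2:, 1:-1] + p[:-2, 1:-1] - 4.0 * f

def evolve_heat(f0, dt=0.2, t_end=1.0):
    """Explicit time stepping u <- u + dt * (u_xx + u_yy) from t = 0 to t_end."""
    n_steps = int(np.floor(t_end / dt + 0.5))  # plays the role of T_m
    u = np.asarray(f0, dtype=float)
    for _ in range(n_steps):
        u = u + dt * laplacian(u)
    return u
```

Replacing `laplacian(u)` by a coefficient-weighted sum of all 17 invariants gives the general update for u; the stable step size then depends on the learnt coefficients and has to be chosen accordingly.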
3.3. The Optimal-Control-Based Training Framework
We now summarize in Algorithm 1 the data-based optimal control training
framework for the intelligent PDE system. After the PDE system is learnt, it can
be applied to new test images by solving (1), whose inputs $f_u$ and $f_v$ are both set to the
test image; the solution $u(t)|_{t=1}$ is the desired output image.
² That is, we minimize the difference between the left and the right hand sides of (1).
Algorithm 1 (Data-based optimal control framework for training the PDE
system)
Input: Training image pairs $\{(I_m,O_m)\}_{m=1}^{M}$.
1: Initialize $a_j(t)$, $t = 0,\Delta t,\dots,1-\Delta t$, by minimizing (8) and fix $b_j(t) = 0$,
   $j = 0,1,\dots,16$.
2: while not converged do
3:    Compute $\frac{\partial J}{\partial a_j}$ and $\frac{\partial J}{\partial b_j}$, $j = 0,\dots,16$, using (7).
4:    Decide the search direction using the conjugate gradient method [19].
5:    Perform golden search along the search direction and update $a_j(t)$ and
      $b_j(t)$, $j = 0,\dots,16$.
6: end while
Output: The coefficient functions $\{a_j(t)\}_{j=0}^{16}$ and $\{b_j(t)\}_{j=0}^{16}$.
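Step 5 of Algorithm 1 calls for a one-dimensional search along the chosen direction. Assuming that "golden search" refers to the standard golden-section search, a minimal sketch is given below. The function name, the bracketing interval and the tolerance are illustrative choices; in the training loop `phi(s)` would be the value of J after moving the coefficients a distance s along the search direction, which requires re-solving (1).

```python
import math

def golden_section_search(phi, lo, hi, tol=1e-6):
    """Approximate minimiser of a unimodal function phi on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    fc, fd = phi(c), phi(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; reuse c as the new right probe.
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = phi(c)
        else:
            # Minimum lies in [c, hi]; reuse d as the new left probe.
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = phi(d)
    return 0.5 * (lo + hi)
```

Each iteration costs one new evaluation of `phi`, which matters here because every evaluation involves a forward solve of the PDE system on all training images.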
4. Experimental Results
In this section, we apply our data-based optimal control framework to learn
PDEs for four groups of basic computer vision problems: natural image denoising,
edge detection, blurring and deblurring, and image segmentation and object
detection. As our goal is to show that the data-based optimal control framework
could be a new approach for designing PDEs and an effective regressor for many
computer vision tasks, NOT to propose better algorithms for these tasks, we do
not fine-tune our PDEs and then compare them with the state-of-the-art
algorithms in every task.
4.1. Learning from Ground Truth: Natural Image Denoising
Image denoising is one of the most fundamental low-level vision problems.
For this task, we compare our learnt PDEs with the existing PDE-based denoising
methods, ROF [5] and TV-L1 [6], on images with unknown natural noise. This
task is designed to demonstrate that our method can solve problems by learning
from the ground truth. This is the first advantage of our data-based optimal control
model. We take 240 images, each with a size of 150 × 150 pixels, of 11 objects
using a Canon 30D digital camera, setting its ISO to 1600. For each object, 30
images are taken without changing the camera settings (by fixing the focus,
aperture and exposure time) and without moving the camera position. The average
image of them can be regarded as the noiseless ground truth image. We randomly
choose 8 objects. For each object we randomly choose 5 noisy images. These
noisy images and their ground truth images are used to train the PDE system.
Then we compare our learnt PDEs with the traditional PDEs, ROF [5] and TV-L1 [6],
on images of the remaining 3 objects.
Fig. 2 shows the comparison results. One can see that the PSNRs of our
intelligent PDEs are dramatically higher than those of traditional PDEs. This is because
our data-based PDE learning framework can easily adapt to unknown types of
noise and obtain PDE forms that fit the natural noise well, while most traditional
PDE-based denoising methods were designed under specific assumptions on the
types of noise (e.g., ROF is designed for Gaussian noise [5] while TV-L1 is
designed for impulsive noise [20]). Therefore, they may not fit unknown types
of noise as well as our intelligent PDEs. The curves of the learnt coefficients for
image denoising are shown in Fig. 3.
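The two quantities used in this experiment, the averaged ground truth and the PSNR, can be computed as in the sketch below. The function names are hypothetical, and the peak value of 255 assumes 8-bit intensities, which the text does not state.

```python
import numpy as np

def ground_truth_from_shots(shots):
    """Average K aligned noisy shots of one static scene; shots has shape (K, H, W)."""
    return np.mean(np.asarray(shots, dtype=float), axis=0)

def psnr(estimate, reference, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(peak ** 2 / mse)
```

Averaging K shots reduces the variance of zero-mean, independent noise by a factor of K, which is why the mean of the 30 shots per object can serve as an approximately noiseless reference.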
4.2. Learning from Other Methods: Edge Detection
The image edge detection task is used to demonstrate that our PDEs can be
learnt from the results of different methods and achieve a better performance than
all of them. This is another advantage of our data-based optimal control model.