All content in this area was uploaded by Martin Lazar on Oct 10, 2023

OPTIMAL CONTROL OF PARABOLIC EQUATIONS – A SPECTRAL CALCULUS BASED APPROACH

LUKA GRUBIŠIĆ∗, MARTIN LAZAR†, IVICA NAKIƇ, AND MARTIN TAUTENHAHN§

Abstract. In this paper we consider a constrained parabolic optimal control problem. The cost functional is quadratic and combines the distance of the trajectory of the system from the desired evolution profile with the cost of the control. The constraint is given by a term measuring the distance between the final state and the desired state towards which the solution should be steered. The control enters the system through the initial condition. We present a geometric analysis of this problem and provide a closed-form expression for the solution. This approach allows us to present a sensitivity analysis of the problem based on resolvent estimates for the generator of the system. The numerical implementation exploits efficient rational Krylov approximation techniques that allow us to approximate a complex function of an operator by a series of linear problems. Our method does not depend on the actual choice of discretization. The main approximation task is to construct an efficient rational approximation of a generalized exponential function. It is well known that this class of functions allows exponentially convergent rational approximations, which, combined with the sensitivity analysis of the closed-form solution, allows us to present a robust numerical method. Several case studies are presented to illustrate our results.

1. Introduction. In this paper we consider an optimal control problem for a general linear parabolic equation governed by a self-adjoint operator on an abstract Hilbert space. The task consists in identifying a control (entering the system through the initial condition) that minimizes a given cost functional, while steering the final state at time $T > 0$ close to the given target. The functional comprises the control norm and an additional term penalizing the distance of the state from the desired trajectory.

The choice of the penalising functional not only ensures that the target is reached with minimal control cost, but also forces the system to follow a specific path. Problems of this kind occur when one is able to adjust the initial conditions of a system/experiment and has no other means of influencing it once the process has begun. In such a scenario, one would like to choose the optimal starting point from which the prescribed goals will be achieved. A typical example is the injection of a reaction source: the goal is to inject the reagent under conditions that will provide the maximal effect. Or one might want to release an accumulated pollutant such that it follows a desired path (e.g., far away from human settlements) and ends at a harmless location.

This task can also be considered as an inverse problem (of initial source identification) for parabolic equations from the optimal control viewpoint. It is an important, but also numerically challenging issue due to the dissipative nature of such equations. It has been addressed by different methods, some including optimization and optimal control techniques [9, 18, 17, 5].

Optimal control problems with control in initial conditions are less investigated than distributed or boundary control problems. The latter contain controls acting along the whole time interval $[0, T]$. Such a setting is not the subject of this paper, but we refer the interested reader to [25], which contains a clear and detailed exposition of the topic.

∗ University of Zagreb, Department of Mathematics, Croatia (luka@math.hr, https://www.pmf.unizg.hr/math/luka.grubisic).
† University of Dubrovnik, Department of Electrical Engineering and Computing, Croatia (mlazar@unidu.hr, http://www.martin-lazar.from.hr).
‡ University of Zagreb, Department of Mathematics, Croatia (nakic@math.hr, https://web.math.pmf.unizg.hr/~nakic).
§ Universität Leipzig, Fakultät für Mathematik und Informatik, Germany (martin.tautenhahn@uni-leipzig.de, https://home.uni-leipzig.de/mtau/).

This manuscript is for review purposes only.

Our problem can be treated by exploiting the Fenchel-Rockafellar duality for convex optimisation (cf. [21, Section 3.6]). If the cost functional consists of the control cost only, the problem reduces to the classical minimal norm control problem, which can be treated by the Hilbert uniqueness method. In the seminal work [4] this approach is used to transform boundary control problems into identification problems for initial data of the adjoint heat equation, which are better suited to numerical methods than the original problems. A more recent paper [11] generalizes this method by considering a cost functional that includes the state in addition to the control. In order to exploit efficient optimization methods, in both papers the authors approximate the original problem with a constraint on the final state by an unconstrained one containing a penalisation term. The solution is then obtained by letting the penalisation constant blow up. Similar techniques are applied in [3, 12]. However, this approach does not provide an a priori estimate on the deviation of the final state from the given target.

In order to numerically recover the control minimising the functional of interest, most authors use finite difference and/or finite element discretisations and employ some iterative scheme (e.g. conjugate gradient), usually including the dual problem. Classical convex optimization techniques in Hilbert spaces (e.g. [1, 21]) also provide iterative methods that can be applied to our problem. Of course, these iterative techniques come with a significant computational cost, which increases with the system dimension.

In this paper we propose a different approach based on the spectral calculus for self-adjoint operators and a geometrical representation of the problem. First, we obtain a closed-form expression for the control solution as a function of the self-adjoint operator governing the dynamics of the system. This expression is almost explicit, up to a scalar factor ensuring that the deviation of the final state from the given target is within the prescribed tolerance. Once the equation for this scalar unknown (which is numerically represented by a rational function) is solved, the method provides a direct, one-shot formula for the optimal initial state. Furthermore, we can organize the computation into an off-line phase (constructing the rational function representation of the equation for the scalar factor) and an on-line phase (computation of the optimal initial state) so that, unlike in the classical approach, we can reuse most of the computation if we only change the tolerance parameter. The numerical efficiency of the method is achieved by utilizing rational Krylov approximation techniques for resolvents from [2], by which one constructs a rational approximant of the aforementioned function of the operator. The proposed method and the obtained formula are given in the abstract Hilbert space framework and can be applied to optimal control problems for a large class of linear parabolic PDEs for which there exist efficient resolvent approximation algorithms. To illustrate our methods we treat optimization problems for 1D and 2D heat equations.

Our approach is an extension of the result from [16], where the authors exploit the spectral representation of the solution by eigenfunctions of the operator governing the system dynamics. This eventually leads to an explicit expression (up to a scalar factor) of the optimal final state and the optimal control. The obtained formulae are spectrally decoupled, meaning that the $n$-th Fourier coefficient is fully expressed by the corresponding coefficients of the given data: the final target and the desired trajectory. However, the practical implementation of the algorithm is constrained by the availability of the spectral decomposition of the operator. For general PDE operators with variable coefficients and/or acting on irregular domains, the decomposition is in general not available or hard to construct. Also, this construction requires costly computations that can exceed the gain provided by the efficiency of the obtained formula.

On the other hand, the method proposed in this paper is applicable to more complex settings. It allows us to efficiently treat PDEs with variable coefficients defined on complicated domains. In addition, it is robust with respect to small perturbations of both the system and the cost functional, and we provide estimates on the deviation of the original solution from the perturbed one. This is quite important in applications, as in practice the models of interest are often not completely determined, being subject to unknown or uncertain parameters of either deterministic or stochastic nature. Furthermore, an expansion of a state in eigenfunctions typically converges rather slowly except in very specific cases. In comparison, representations of solutions using Krylov subspaces are much more efficient in the number of required terms.

The paper is organised as follows. In the next section we formulate the problem and state the main result (Theorem 2.2). In Section 3 we present the sensitivity analysis which justifies a finite-dimensional approximation of the problem. In Section 4 we present the rational function approximation theory and discuss the stability of the finite element approximation of the problem. Further, we discuss the relationship between the numerical rational function calculus as realized by the rkfit algorithm [2] and the approximation problem for the generalized exponential functions which are central to the study of the concrete numerical examples. In Section 5 we present 1D and 2D numerical examples which are outside the scope of the original eigendecomposition method from [16]. In the concluding remarks we discuss the efficiency of the introduced method, open perspectives and a comparison to other approaches.

2. Setting of the problem and characterisation of the solution. Let $\mathcal{H}$ be a Hilbert space and $A$ be an upper-bounded self-adjoint operator in $\mathcal{H}$ with an upper bound $\kappa$, i.e. $\max\sigma(A) \le \kappa$. We denote by $(S_t)_{t\ge 0}$ the semigroup generated by $A$. We consider for $f \in L^2((0,\infty);\mathcal{H})$ and $u \in \mathcal{H}$ the Cauchy problem
\[
  \begin{cases} y'(t) = Ay(t) + f(t), & t > 0,\\ y(0) = u. \end{cases} \tag{2.1}
\]
Note that the mild solution of (2.1) is given by
\[
  y(t) = S_t u + \int_0^t S_\tau f(t-\tau)\,d\tau, \qquad t \ge 0,
\]
and is an element of $L^2((0,\infty);\mathcal{H})$, see e.g. [8]. We say that the system (2.1) is controllable to a target state $y^* \in \mathcal{H}$ in time $T > 0$ if there is $u \in \mathcal{H}$ such that
\[
  S_T u + \int_0^T S_\tau f(T-\tau)\,d\tau = y^*.
\]
We say that the system (2.1) is approximately controllable in time $T > 0$ if for all $y^* \in \mathcal{H}$ and all $\varepsilon > 0$ there exists $u \in \mathcal{H}$ such that
\[
  \Bigl\| S_T u + \int_0^T S_\tau f(T-\tau)\,d\tau - y^* \Bigr\| \le \varepsilon. \tag{2.2}
\]

Let us note that for any $T > 0$ the operator $S_T$ is injective with a dense range. Since the range of $S_T$ is dense in $\mathcal{H}$, the system (2.1) is indeed approximately controllable in any time $T > 0$. In the class of initial values satisfying (2.2) for a given target $y^* \in \mathcal{H}$ and time $T > 0$, we are looking for those with minimal cost. More precisely, for $\varepsilon, T > 0$ and $y^* \in \mathcal{H}$ we introduce the problem
\[
  \min_{u\in\mathcal{H}} \bigl\{ J(u) : \|S_T u - y^{*,\mathrm{hom}}\| \le \varepsilon \bigr\}, \tag{2.3}
\]
where
\[
  J(u) = \frac{\alpha}{2}\|u\|^2 + \frac12\int_0^T \beta(t)\,\|S_t u - w^{\mathrm{hom}}(t)\|^2\,dt,
\]
$\alpha > 0$ and $\beta \in L^\infty((0,T);[0,\infty))$ are weights of the cost, and $w \in L^2((0,T);\mathcal{H})$ is the target trajectory. Here, we used the notation
\[
  y^{*,\mathrm{hom}} = y^* - \int_0^T S_\tau f(T-\tau)\,d\tau
  \quad\text{and}\quad
  w^{\mathrm{hom}} = w - \int_0^{\cdot} S_\tau f(\cdot-\tau)\,d\tau.
\]
Note that $S_t u$ is the solution of the corresponding homogeneous Cauchy problem with $f = 0$.

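For $\varepsilon > 0$ and $x \in \mathcal{H}$ we denote by $B_\varepsilon(x) = \{y \in \mathcal{H} : \|y - x\| \le \varepsilon\}$ the closed ball of radius $\varepsilon$ and center $x$. Our problem (2.3) can be restated as
\[
  \min_{u\in\mathcal{H}} \Bigl\{ J(u) + I_{B_\varepsilon(y^*)}\Bigl(S_T u + \int_0^T S_\tau f(T-\tau)\,d\tau\Bigr) \Bigr\},
\]
where $I_{B_\varepsilon(y^*)}$ is the corresponding indicator function defined as
\[
  I_{B_\varepsilon(y^*)}(y) = \begin{cases} 0 & \text{if } y \in B_\varepsilon(y^*),\\ +\infty & \text{else.} \end{cases}
\]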

Since the function $u \mapsto J(u) + I_{B_\varepsilon(y^*)}\bigl(S_T u + \int_0^T S_\tau f(T-\tau)\,d\tau\bigr)$ is proper, strongly convex and lower-semicontinuous, problem (2.3) has a unique solution, which we denote by $u^{\mathrm{opt}}$ (see, for instance, [21, Corollary 2.20]). Moreover, we define
\[
  u^{\min} = \arg\min_{u\in\mathcal{H}} J(u)
\]
as the solution to the corresponding unconstrained problem, while by $y^{\min} = S_T u^{\min} + \int_0^T S_\tau f(T-\tau)\,d\tau$ and $y^{\mathrm{opt}} = S_T u^{\mathrm{opt}} + \int_0^T S_\tau f(T-\tau)\,d\tau$ we denote the corresponding final states obtained from $u^{\min}$ and $u^{\mathrm{opt}}$, respectively.

Remark 2.1. The results we use from [21] are stated for real Hilbert spaces only. However, they carry over to the complex case by realifying the Hilbert space $\mathcal{H}$ and taking the real part of the inner product instead of the (complex) inner product.

If $\varepsilon \ge \|y^{\min} - y^*\|$, it follows that the solution of (2.3) satisfies $u^{\mathrm{opt}} = u^{\min}$. The following theorem covers the non-trivial case $0 < \varepsilon < \|y^{\min} - y^*\|$ as well.

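Theorem 2.2. Let $T, \varepsilon > 0$ and $y^* \in \mathcal{H}$. Then the optimal initial state $u^{\mathrm{opt}}$ is given by
\[
  u^{\mathrm{opt}} = (\mu_\varepsilon S_{2T} + \Psi)^{-1}(\mu_\varepsilon S_T y^{*,\mathrm{hom}} + \psi), \tag{2.4}
\]
where
\[
  \Psi = \alpha\,\mathrm{Id} + \int_0^T \beta(t) S_{2t}\,dt, \qquad \psi = \int_0^T \beta(t) S_t w^{\mathrm{hom}}(t)\,dt,
\]
and $\mu_\varepsilon \ge 0$ is the unique solution of $\Phi(\mu) = \varepsilon$ if $\varepsilon < \|y^{\min} - y^*\| = \|\Psi^{-1} S_T \psi - y^{*,\mathrm{hom}}\|$, and zero otherwise. Here $\Phi\colon [0,\infty) \to [0,\infty)$ is the function defined by
\[
  \Phi(\mu) = \bigl\| y^{*,\mathrm{hom}} - (\mu S_{2T} + \Psi)^{-1}(\mu S_{2T} y^{*,\mathrm{hom}} + S_T \psi) \bigr\|. \tag{2.5}
\]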

Remark 2.3. In the case in which the operator $A$ is a matrix, the elements of the matrix $(\mu S_{2T} + \Psi)^{-1}$ are rational functions of $\mu$. This can be established using Cramer's rule. Consequently, in the finite-dimensional case the function $\mu \mapsto \Phi^2(\mu)$ is also a rational function of $\mu$. Further note that it does not depend on the tolerance $\varepsilon$, so we can organise the numerical procedure into an off-line phase, where we build a rational function surrogate for $\Phi$ using least squares rational approximation of the sampled values of $\Phi$, and an on-line phase, where one determines $\mu_\varepsilon$ and computes $u^{\mathrm{opt}}$ using the one-shot formula (2.4). This sets our approach apart from other approaches based on direct constrained optimization.
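To make the rationality claim concrete, consider the simplest finite-dimensional situation, assumed here purely for illustration: $\mathcal{H} = \mathbb{R}^d$ and $A$ is diagonal with eigenvalues $\lambda_1, \dots, \lambda_d$. The symbols $s_k = e^{T\lambda_k}$, $\Psi_k$ (the eigenvalues of $\Psi$), and $y_k$, $\psi_k$ (the components of $y^{*,\mathrm{hom}}$ and $\psi$) are notation introduced only for this sketch. Definition (2.5) then decouples component-wise:

```latex
\[
  \Phi^2(\mu)
  = \sum_{k=1}^{d} \left( y_k - \frac{\mu s_k^2\, y_k + s_k \psi_k}{\mu s_k^2 + \Psi_k} \right)^{2}
  = \sum_{k=1}^{d} \frac{(\Psi_k y_k - s_k \psi_k)^2}{(\mu s_k^2 + \Psi_k)^2}.
\]
```

Each summand is a rational function of $\mu$ whose only pole, $\mu = -\Psi_k / s_k^2$, is negative, so in this setting $\Phi^2$ is a rational function of type $(2d-2, 2d)$ with no poles on $[0,\infty)$ and with coefficients independent of $\varepsilon$.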

Remark 2.4. Since $\Psi$ is positive definite, both $\Psi$ and $\mu S_{2T} + \Psi$, $\mu \ge 0$, are indeed invertible. Moreover, for the functional $J$ we have $\nabla J(u) = \Psi u - \psi$. As $u^{\min}$ is its global minimizer, it immediately follows that $u^{\min} = \Psi^{-1}\psi$, and thus
\[
  \|\Psi^{-1} S_T \psi - y^{*,\mathrm{hom}}\| = \|y^{\min} - y^*\|.
\]
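The gradient formula can be checked by expanding $J$. The following is a sketch for a real Hilbert space, using only that each $S_t$ is self-adjoint and that $S_t S_t = S_{2t}$:

```latex
\begin{align*}
J(u) &= \frac{\alpha}{2}\langle u, u\rangle
      + \frac12 \int_0^T \beta(t)\,
        \bigl\langle S_t u - w^{\mathrm{hom}}(t),\, S_t u - w^{\mathrm{hom}}(t) \bigr\rangle \,dt \\
     &= \frac12 \Bigl\langle \Bigl(\alpha\,\mathrm{Id} + \int_0^T \beta(t) S_{2t}\,dt\Bigr) u,\, u \Bigr\rangle
      - \Bigl\langle \int_0^T \beta(t) S_t w^{\mathrm{hom}}(t)\,dt,\, u \Bigr\rangle
      + \frac12 \int_0^T \beta(t)\,\|w^{\mathrm{hom}}(t)\|^2\,dt \\
     &= \frac12 \langle \Psi u, u\rangle - \langle \psi, u\rangle + \mathrm{const}.
\end{align*}
```

Differentiating the last line gives $\nabla J(u) = \Psi u - \psi$, and $\Psi \ge \alpha\,\mathrm{Id}$ with $\alpha > 0$ gives the invertibility used above.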

Remark 2.5. For $\varepsilon < \|y^{\min} - y^*\|$ we obtain from (2.4) and (2.5)
\[
  \Phi(\mu_\varepsilon) = \|y^* - y^{\mathrm{opt}}\| = \varepsilon.
\]
In other words, if the global minimizer $u^{\min}$ of the unconstrained problem does not drive the system to the target ball $B_\varepsilon(y^*)$, then the optimal final state lies on the boundary of this ball, cf. Lemma 2.8. This is in accordance with previous results on similar problems (e.g. [16, Proposition 2.1] and [5, Theorem 2.4]) that provide the same characterisation of the optimal solution.

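Remark 2.6. Let $\varphi(\mu) = y^{*,\mathrm{hom}} - (\mu S_{2T} + \Psi)^{-1}(\mu S_{2T} y^{*,\mathrm{hom}} + S_T \psi)$, hence $\Phi(\mu) = \|\varphi(\mu)\|$. Then $\varphi(\mu) = y^{*,\mathrm{hom}} - x$, where $x$ is the solution of the equation
\[
  (\mu S_{2T} + \Psi)\,x = \mu S_{2T} y^{*,\mathrm{hom}} + S_T \psi, \tag{2.6}
\]
hence the calculation of $\Phi(\mu)$ reduces to solving a linear equation. Note also that the optimal initial state $u^{\mathrm{opt}}$ is the solution of the equation
\[
  (\mu_\varepsilon S_{2T} + \Psi)\,x = \mu_\varepsilon S_T y^{*,\mathrm{hom}} + \psi. \tag{2.7}
\]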
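The two linear equations (2.6) and (2.7) can be illustrated in a toy setting. The sketch below is not the rational Krylov method of the paper: it assumes that $A$ has already been diagonalised (so every operator acts coefficient-wise), that $\beta$ is constant, and that the coefficients of $\psi$ are given. The function name `solve_constrained` and all of its parameters are hypothetical. The scalar equation $\Phi(\mu) = \varepsilon$ is solved by plain bisection, which relies on the monotonicity of $\Phi$.

```python
import math

def solve_constrained(lam, y, psi, T, eps, alpha=1.0, beta=1.0):
    """Return (u_opt, mu_eps) in eigen-coordinates of A.

    lam : eigenvalues of A (hypothetical input; assumes A is diagonalised)
    y   : coefficients of y^{*,hom};  psi : coefficients of psi
    """
    sT = [math.exp(T * l) for l in lam]             # symbol of S_T = exp(T A)
    s2T = [s * s for s in sT]                       # symbol of S_{2T}
    # symbol of Psi = alpha*Id + beta * int_0^T S_{2t} dt (constant beta assumed)
    Psi = [alpha + beta * (T if l == 0 else math.expm1(2 * T * l) / (2 * l))
           for l in lam]

    def Phi(mu):
        # x solves the diagonal version of (2.6); Phi(mu) = ||y - x||
        x = [(mu * s2 * yk + s * pk) / (mu * s2 + P)
             for s, s2, P, yk, pk in zip(sT, s2T, Psi, y, psi)]
        return math.sqrt(sum((yk - xk) ** 2 for yk, xk in zip(y, x)))

    mu = 0.0
    if Phi(0.0) > eps:                              # constraint is active
        lo, hi = 0.0, 1.0
        while Phi(hi) > eps:                        # Phi decreases to 0 as mu grows
            lo, hi = hi, 2.0 * hi
        for _ in range(200):                        # plain bisection on [lo, hi]
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if Phi(mid) > eps else (lo, mid)
        mu = 0.5 * (lo + hi)
    # u solves the diagonal version of (2.7)
    u = [(mu * s * yk + pk) / (mu * s2 + P)
         for s, s2, P, yk, pk in zip(sT, s2T, Psi, y, psi)]
    return u, mu
```

Changing only `eps` re-runs the scalar root search and one coefficient-wise solve; the symbols of $S_T$, $S_{2T}$ and $\Psi$ do not depend on the tolerance.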

Remark 2.7. To calculate $\Psi$ we use the fact that it can be written as a function of $A$. Since $S_{2t} = \exp(2tA)$, we have
\[
  \int_0^T \beta(t) S_{2t}\,dt = \int_{-\infty}^{\infty} \int_0^T \beta(t) \exp(2t\lambda)\,dt\,dE(\lambda) = \tilde\beta_0(A),
\]
where $\tilde\beta_0$ is the function given by $\tilde\beta_0(\lambda) = \int_0^T \beta(t)\exp(2t\lambda)\,dt$ and $E$ is the spectral measure of $A$.

However, this approach does not work directly for the other term entering the formula (2.4) for the solution. Namely, it is not possible to find a closed formula for $\psi$ except in special situations. But we can always find a good approximant for $\psi$. Let $\tilde w(t) = \sum_{i=1}^{N} w_i \chi_{[t_{i-1},t_i]}(t)$ be an approximation of $w^{\mathrm{hom}}$, where $0 = t_0 < t_1 < \dots < t_N = T$, $w_i \in \mathcal{H}$, $i = 1, \dots, N$, and $\chi_S$ is the characteristic function of the set $S$. Then
\[
  \tilde\psi = \sum_{i=1}^{N} \tilde\beta_i(A) w_i, \qquad\text{where } \tilde\beta_i(\lambda) = \int_{t_{i-1}}^{t_i} \beta(t)\exp(t\lambda)\,dt,
\]
is an approximation of $\psi$.

If the function $\beta$ is such that we cannot explicitly calculate $\tilde\beta_i$, $i = 0, \dots, N$, we can still find appropriate approximations of $\Psi$ and $\psi$ by finding appropriate approximations of $\tilde\beta_i$, $i = 0, \dots, N$.
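For a constant weight the scalar functions in this remark are available in closed form. The sketch below assumes $\beta \equiv \beta_0$ and the convention $S_t = \exp(tA)$, so that the symbol of $S_t$ is $e^{t\lambda}$; under these assumptions $\tilde\beta_0(\lambda) = \beta_0 (e^{2T\lambda} - 1)/(2\lambda)$ and $\tilde\beta_i(\lambda) = \beta_0 (e^{t_i\lambda} - e^{t_{i-1}\lambda})/\lambda$. The function names are hypothetical, and `math.expm1` is used to avoid cancellation near $\lambda = 0$.

```python
import math

def beta_tilde_0(lam, T, beta0=1.0):
    """int_0^T beta0 * exp(2 t lam) dt, evaluated stably near lam = 0."""
    if lam == 0.0:
        return beta0 * T
    return beta0 * math.expm1(2.0 * T * lam) / (2.0 * lam)

def beta_tilde_i(lam, t_prev, t_next, beta0=1.0):
    """int_{t_prev}^{t_next} beta0 * exp(t lam) dt, evaluated stably near lam = 0."""
    h = t_next - t_prev
    if lam == 0.0:
        return beta0 * h
    # exp(t_prev*lam) * (exp(h*lam) - 1) / lam avoids cancellation for small h*lam
    return beta0 * math.exp(t_prev * lam) * math.expm1(h * lam) / lam
```

Evaluating these scalar functions at the operator $A$ (for instance through a rational approximant) then yields $\Psi - \alpha\,\mathrm{Id}$ and the summands of $\tilde\psi$.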

Before proving Theorem 2.2 we provide two auxiliary results. The following lemma is already proved in [16], but here we give a shorter proof.

Lemma 2.8. Let $\varepsilon > 0$ and $\|y^{\min} - y^*\| > \varepsilon$. Then the optimal final state satisfies $\|y^* - y^{\mathrm{opt}}\| = \varepsilon$, i.e. $y^{\mathrm{opt}}$ lies on the boundary of the target ball.

Proof. Suppose the contrary, i.e. that $y^{\mathrm{opt}}$ lies in the interior of $B_\varepsilon(y^*)$. Then there exists $\eta > 0$ such that $B_\eta(y^{\mathrm{opt}}) \subset B_\varepsilon(y^*)$ and then, by continuity of $S_T$, a $\delta > 0$ such that $S_T B_\delta(u^{\mathrm{opt}}) + \int_0^T S_\tau f(T-\tau)\,d\tau \subset B_\eta(y^{\mathrm{opt}}) \subset B_\varepsilon(y^*)$. In particular, every $u \in B_\delta(u^{\mathrm{opt}})$ is a feasible control for the problem (2.3). As $u^{\mathrm{opt}}$ is the solution of the same problem, it holds that $J(u^{\mathrm{opt}}) \le J(u)$ for every $u \in B_\delta(u^{\mathrm{opt}})$, i.e. $u^{\mathrm{opt}}$ is a local minimizer of $J$. But, by the convexity of $J$, a local minimizer is also global. Then $u^{\mathrm{opt}}$ is the solution of the unconstrained problem, so that $y^{\min} = y^{\mathrm{opt}} \in B_\varepsilon(y^*)$, which contradicts the assumption $\varepsilon < \|y^{\min} - y^*\|$.

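Lemma 2.9. The function $\Phi$ has the following properties:
(a) If $0 < \|y^* - y^{\min}\|$, then $\Phi$ is a strictly decreasing function,
(b) $\lim_{\mu\to\infty} \Phi(\mu) = 0$,
(c) $\Phi(0) = \|y^* - y^{\min}\|$.

Proof. Note that the function $\Phi$ can be rewritten as
\[
  \Phi(\mu) = \bigl\|(\mu S_{2T} + \Psi)^{-1}(\Psi y^{*,\mathrm{hom}} - S_T\psi)\bigr\|.
\]
For the derivative of $\mu \mapsto \Phi^2(\mu)$ we have
\begin{align*}
  (\Phi^2)'(\mu) &= -2\bigl\langle S_{2T}(\mu S_{2T} + \Psi)^{-2}(\Psi y^{*,\mathrm{hom}} - S_T\psi),\ (\mu S_{2T} + \Psi)^{-1}(\Psi y^{*,\mathrm{hom}} - S_T\psi) \bigr\rangle \\
  &= -2\bigl\| S_T(\mu S_{2T} + \Psi)^{-3/2}(\Psi y^{*,\mathrm{hom}} - S_T\psi) \bigr\|^2 \le 0.
\end{align*}
As $(\Phi^2)' = 2\Phi\Phi'$ and $\Phi$ is a nonnegative function, it follows that $\Phi'(\mu) \le 0$ for all $\mu > 0$, and we have strict negativity if $\Psi y^{*,\mathrm{hom}} \ne S_T\psi$. Recall that $\Psi u^{\min} = \psi$ (cf. Remark 2.4), hence $\Psi S_T u^{\min} = S_T\psi$. Now $\Psi y^{*,\mathrm{hom}} = S_T\psi$ would imply $\Psi S_T u^{\min} = \Psi y^{*,\mathrm{hom}}$ and thus, by the invertibility of $\Psi$, $y^{\min} = y^*$, a contradiction with the assumption.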

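We now prove (b). First note that
\[
  y - (\mu S_{2T} + \Psi)^{-1}\mu S_{2T}\,y = y - \mu(\mu + S_{2T}^{-1}\Psi)^{-1}y = (\mu + S_{2T}^{-1}\Psi)^{-1} S_{2T}^{-1}\Psi\,y
\]
for all $y \in \mathrm{Dom}(S_{2T}^{-1}\Psi)$, which implies that $\lim_{\mu\to\infty}\bigl(y - \mu(\mu + S_{2T}^{-1}\Psi)^{-1}y\bigr) = 0$ for all $y \in \mathrm{Dom}(S_{2T}^{-1}\Psi)$. Let $(y_n)$ be a sequence from $\mathrm{Dom}(S_{2T}^{-1}\Psi)$ which converges to $y^{*,\mathrm{hom}}$. Let $n \in \mathbb{N}$ be arbitrary. Using $\|(\mu + S_{2T}^{-1}\Psi)^{-1}\| = \mathrm{dist}(\mu, -\sigma(S_{2T}^{-1}\Psi))^{-1} \le \mu^{-1}$ it follows that
\begin{align*}
  &\lim_{\mu\to\infty} \bigl\|y^{*,\mathrm{hom}} - (\mu S_{2T} + \Psi)^{-1}\mu S_{2T}\,y^{*,\mathrm{hom}}\bigr\| \\
  &\quad= \lim_{\mu\to\infty} \bigl\|y^{*,\mathrm{hom}} - y_n - \mu(\mu + S_{2T}^{-1}\Psi)^{-1}(y^{*,\mathrm{hom}} - y_n) + y_n - \mu(\mu + S_{2T}^{-1}\Psi)^{-1}y_n\bigr\| \\
  &\quad\le 2\|y^{*,\mathrm{hom}} - y_n\| + \lim_{\mu\to\infty} \bigl\|y_n - \mu(\mu + S_{2T}^{-1}\Psi)^{-1}y_n\bigr\| = 2\|y^{*,\mathrm{hom}} - y_n\|,
\end{align*}
and taking the limit $n \to \infty$ we obtain
\[
  \lim_{\mu\to\infty} \bigl\|y^{*,\mathrm{hom}} - (\mu S_{2T} + \Psi)^{-1}\mu S_{2T}\,y^{*,\mathrm{hom}}\bigr\| = 0.
\]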

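Hence to prove (b) we only have to show
\[
  \lim_{\mu\to\infty} \bigl\|(\mu S_{2T} + \Psi)^{-1}S_T\psi\bigr\| = \lim_{\mu\to\infty} \bigl\|S_T^{-1}(\mu + S_{2T}^{-1}\Psi)^{-1}\psi\bigr\| = 0. \tag{2.8}
\]
Since $\mathrm{Ran}(S_T)$ is dense in $\mathcal{H}$, there exists a sequence $(\psi_m)_{m\in\mathbb{N}}$ in $\mathrm{Ran}(S_T)$ such that $\lim_{m\to\infty}\psi_m = \psi$; let $v_m \in \mathcal{H}$ be such that $\psi_m = S_T v_m$. Then $\lim_{\mu\to\infty} S_T^{-1}(\mu + S_{2T}^{-1}\Psi)^{-1}\psi_m = \lim_{\mu\to\infty}(\mu + S_{2T}^{-1}\Psi)^{-1}v_m = 0$ for all $m \in \mathbb{N}$, and for $\psi$ we have the following estimate:
\[
  \bigl\|S_T^{-1}(\mu + S_{2T}^{-1}\Psi)^{-1}\psi\bigr\| \le \bigl\|S_T(\mu S_{2T} + \Psi)^{-1}\bigr\|\,\|\psi - \psi_m\| + \bigl\|S_T^{-1}(\mu + S_{2T}^{-1}\Psi)^{-1}\psi_m\bigr\|. \tag{2.9}
\]
By differentiating one can show that the mapping $[0,\infty) \ni \mu \mapsto \|(\mu S_{2T} + \Psi)^{-1}x\|^2$ is a decreasing function for all $x \in \mathcal{H}$, so in particular $[0,\infty) \ni \mu \mapsto \|S_T(\mu S_{2T} + \Psi)^{-1}\|$ is a bounded function. Thus we can pass to the limit in (2.9), from which we obtain (2.8).

Finally, (c) follows from $\Phi(0) = \|y^{*,\mathrm{hom}} - \Psi^{-1}S_T\psi\| = \|y^* - y^{\min}\|$.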
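The rewriting of $\Phi$ used at the start of this proof follows from a one-line computation. Writing $M_\mu = \mu S_{2T} + \Psi$ (a shorthand introduced only for this sketch):

```latex
\begin{align*}
y^{*,\mathrm{hom}} - M_\mu^{-1}\bigl(\mu S_{2T}\,y^{*,\mathrm{hom}} + S_T\psi\bigr)
  &= M_\mu^{-1}\bigl(M_\mu\,y^{*,\mathrm{hom}} - \mu S_{2T}\,y^{*,\mathrm{hom}} - S_T\psi\bigr) \\
  &= M_\mu^{-1}\bigl(\Psi\,y^{*,\mathrm{hom}} - S_T\psi\bigr).
\end{align*}
```

Taking norms gives the rewritten form of (2.5), and setting $\mu = 0$ gives $\Phi(0) = \|y^{*,\mathrm{hom}} - \Psi^{-1}S_T\psi\|$, which is the expression used in (c).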

Now we are ready to prove Theorem 2.2.

Proof of Theorem 2.2. Note that the case $\varepsilon \ge \|y^{\min} - y^*\|$ is covered trivially. Indeed, by choosing $\mu_\varepsilon = 0$ in (2.4) we obtain $u^{\mathrm{opt}} = \Psi^{-1}\psi = u^{\min}$. By the assumption on $\varepsilon$, the unconstrained minimizer $u^{\min}$ is admissible, and hence optimal.

For the rest of the proof we consider the case $0 < \varepsilon < \|y^{\min} - y^*\|$. We fix $T, \varepsilon > 0$ and $y^* \in \mathcal{H}$.

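Based on Lemma 2.8, our problem (2.3) (or its equivalent form with the indicator function) can be restated as
\[
  \min_{u\in\mathcal{H}} \bigl\{ J(u) : \|S_T u - y^{*,\mathrm{hom}}\| = \varepsilon \bigr\},
\]
whose associated Lagrange functional reads
\[
  L(u,\mu) = J(u) + \frac{\mu}{2}\bigl(\|S_T u - y^{*,\mathrm{hom}}\|^2 - \varepsilon^2\bigr).
\]
Its (global) minimizer corresponds to the unique solution of our problem. As $J$ is a differentiable function, so is the Lagrangian $L$. Thus it achieves its minimum value at the point $(u^{\mathrm{opt}}, \mu_\varepsilon)$ satisfying
\[
  \nabla_{u,\mu} L(u^{\mathrm{opt}}, \mu_\varepsilon) = 0.
\]
By using the relation $\nabla J(u) = \Psi u - \psi$ we get
\[
  \nabla_u L(u^{\mathrm{opt}}, \mu_\varepsilon) = \Psi u^{\mathrm{opt}} - \psi + \mu_\varepsilon\bigl(S_{2T}u^{\mathrm{opt}} - S_T y^{*,\mathrm{hom}}\bigr) = 0,
\]
which directly leads to the formula (2.4). In order to determine the optimal value of the Lagrange multiplier, we use Lemma 2.8, which provides $\|S_T u^{\mathrm{opt}} - y^{*,\mathrm{hom}}\| = \varepsilon$. Plugging in the expression for the optimal control $u^{\mathrm{opt}}$ we obtain
\[
  \Phi(\mu_\varepsilon) = \bigl\|y^{*,\mathrm{hom}} - (\mu_\varepsilon S_{2T} + \Psi)^{-1}(\mu_\varepsilon S_{2T}\,y^{*,\mathrm{hom}} + S_T\psi)\bigr\| = \varepsilon.
\]
Lemma 2.9 and the assumption $\varepsilon < \|y^{\min} - y^*\|$ ensure the existence of a unique value $\mu_\varepsilon$ satisfying the last relation, which completes the proof.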

3. Sensitivity analysis. In this section we show that the solution of (2.3) is stable in the sense that if the parameters $\alpha$, $f$, $\beta$, $w$, $y^*$ and $A$ are perturbed by a small perturbation, then the solution of the perturbed problem is also a small perturbation of the solution of the unperturbed problem. Let $0 < \nu < 1$ and
(i) $\delta\alpha \in \mathbb{R}$ with $|\delta\alpha| < \nu$ such that $\alpha + \delta\alpha > 0$,
(ii) $\delta f \in L^2((0,\infty);\mathcal{H})$ such that $\|\delta f\|_{L^2((0,\infty);\mathcal{H})} < \nu$,
(iii) $\delta\beta \in L^\infty((0,T);\mathbb{R})$ such that $\|\delta\beta\|_{L^\infty((0,T);\mathbb{R})} < \nu$ and $\beta + \delta\beta \in L^\infty((0,T);[0,\infty))$,
(iv) $\delta w \in L^2((0,T);\mathcal{H})$ such that $\|\delta w\|_{L^2((0,T);\mathcal{H})} < \nu$,
(v) $\delta y^* \in \mathcal{H}$ such that $\|\delta y^*\| < \nu$.
(vi) For the perturbation $\delta A$ of the operator $A$ we assume that $\delta A$ is a symmetric linear operator in $\mathcal{H}$, $A + \delta A$ is an upper-bounded self-adjoint operator in $\mathcal{H}$, and there exist $\zeta > \max\{\max\sigma(A), \max\sigma(A + \delta A)\}$ and $R > 0$ such that for all $s \in \mathbb{R}$ we have
\[
  \bigl\|(\zeta + is - A - \delta A)^{-1} - (\zeta + is - A)^{-1}\bigr\| < \nu \begin{cases} 1, & |s| \le R,\\ s^{-2}, & |s| > R. \end{cases} \tag{3.1}
\]

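We denote by $(S^\delta_t)_{t\ge 0}$ the semigroup generated by $A + \delta A$, and introduce the shorthand notation $A_\delta = A + \delta A$, $\alpha_\delta = \alpha + \delta\alpha$, $f_\delta = f + \delta f$, $\beta_\delta = \beta + \delta\beta$, $w_\delta = w + \delta w$ and $y^*_\delta = y^* + \delta y^*$. We will use the estimate (3.1) to obtain an upper bound for the perturbation of the semigroup, i.e. for $\|S^\delta_t - S_t\|$. Now we introduce the perturbed problem
\[
  \min_{u\in\mathcal{H}} \Bigl\{ J_\delta(u) : \Bigl\| S^\delta_T u + \int_0^T S^\delta_\tau f_\delta(T-\tau)\,d\tau - y^*_\delta \Bigr\| \le \varepsilon \Bigr\}, \tag{3.2}
\]
where
\[
  J_\delta(u) = \frac{\alpha_\delta}{2}\|u\|^2 + \frac12 \int_0^T \beta_\delta(t) \Bigl\| S^\delta_t u + \int_0^t S^\delta_\tau f_\delta(t-\tau)\,d\tau - w_\delta(t) \Bigr\|^2 dt.
\]
Let $y^{*,\mathrm{hom}}_\delta = y^*_\delta - \int_0^T S^\delta_\tau f_\delta(T-\tau)\,d\tau$, $w^{\mathrm{hom}}_\delta = w_\delta - \int_0^{\cdot} S^\delta_\tau f_\delta(\cdot-\tau)\,d\tau$, $\delta y^{*,\mathrm{hom}} = y^{*,\mathrm{hom}}_\delta - y^{*,\mathrm{hom}}$ and $\delta w^{\mathrm{hom}} = w^{\mathrm{hom}}_\delta - w^{\mathrm{hom}}$. We denote the unique solution of the perturbed problem (3.2) by $u^{\mathrm{opt}}_\delta$, and recall that $u^{\mathrm{opt}}$ is the unique solution of the unperturbed problem (2.3).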

Theorem 3.1. Under the above assumptions we have
\[
  \|u^{\mathrm{opt}}_\delta - u^{\mathrm{opt}}\| < C\nu
\]
for $\nu$ small enough, where $C$ is a constant that does not depend on $\nu$.

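Remark 3.2. Let us discuss certain situations where assumption (3.1) is satisfied.
(i) Let $\|\delta A\| < \nu$, $\zeta = \max\{\max\sigma(A), \max\sigma(A + \delta A)\} + 1$ and $R = 1$. Then, by the second resolvent identity and $\|(z - T)^{-1}\| = 1/\mathrm{dist}(z, \sigma(T))$ for self-adjoint $T$ and $z \in \rho(T)$, we conclude for all $s \in \mathbb{R}$, with $A_\delta = A + \delta A$,
\begin{align*}
  \bigl\|(\zeta + is - A_\delta)^{-1} - (\zeta + is - A)^{-1}\bigr\| &= \bigl\|(\zeta + is - A_\delta)^{-1}\,\delta A\,(\zeta + is - A)^{-1}\bigr\| \\
  &< \frac{\nu}{\mathrm{dist}(\zeta + is, \sigma(A_\delta)) \cdot \mathrm{dist}(\zeta + is, \sigma(A))} \\
  &\le \frac{\nu}{1 + s^2}.
\end{align*}
Hence, (3.1) is satisfied if $\delta A$ is a bounded operator with norm smaller than $\nu$.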
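The two ingredients of the estimate in (i) can be spelled out as follows. This is a sketch for bounded $\delta A$, with $z = \zeta + is$ and $A_\delta = A + \delta A$; the distance bound uses that the spectrum of a self-adjoint operator is real and that $\zeta - \lambda \ge 1$ for every $\lambda \in \sigma(A) \cup \sigma(A_\delta)$ by the choice of $\zeta$:

```latex
\begin{align*}
(z - A_\delta)^{-1} - (z - A)^{-1}
  &= (z - A_\delta)^{-1}\bigl[(z - A) - (z - A_\delta)\bigr](z - A)^{-1}
   = (z - A_\delta)^{-1}\,\delta A\,(z - A)^{-1}, \\
\operatorname{dist}\bigl(\zeta + is, \sigma(A)\bigr)
  &= \inf_{\lambda\in\sigma(A)} \sqrt{(\zeta - \lambda)^2 + s^2} \;\ge\; \sqrt{1 + s^2}.
\end{align*}
```

The same distance bound holds for $A_\delta$, and the product of the two bounds gives the factor $1 + s^2$ in the denominator. For $|s| \le 1$ this is at least $1$ and for $|s| > 1$ it exceeds $s^2$, which is the form required in (3.1) with $R = 1$.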

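(ii) Let $\delta A$ be relatively bounded with respect to $A$ with a relative bound smaller than $\nu$, i.e. $\|\delta A x\| \le a\|x\| + b\|Ax\|$ for all $x \in \mathrm{Dom}(A)$ with $0 \le b < \nu$ and a nonnegative constant $a$. Let $R = 1$. Then $A + \delta A$ is an upper-bounded self-adjoint operator, see e.g. [14, Theorem V.4.3, Theorem V.4.11]. Let $\zeta > \max\sigma(A) =: \kappa$ be arbitrary. Then we have $\|A(\zeta - A)^{-1}\| = \int_{-\infty}^{\kappa} \frac{|\lambda|}{\zeta - \lambda}\,d\|E(\lambda)\|$, hence for all $\delta > 0$ and all $\zeta > \frac{2+\delta}{1+\delta}\kappa$ we have $\|A(\zeta - A)^{-1}\| < 1 + \delta$. Let $\delta = (\nu - b)/(a + b)$ and let $\zeta > \frac{2+\delta}{1+\delta}\kappa$ be such that $\|(\zeta - A)^{-1}\| < \delta$. Let $x \in \mathrm{Dom}(A)$ be arbitrary and let $y = (\zeta - A)x$. Then
\[
  \|\delta A x\| \le a\|(\zeta - A)^{-1}y\| + b\|A(\zeta - A)^{-1}y\| < \nu\|(\zeta - A)x\|.
\]
Let $\chi \ge \zeta$ be arbitrary. Then from $\langle x + (\chi - \zeta)(\zeta - A)^{-1}x, x\rangle \ge 1$ for all $x \in \mathcal{H}$ such that $\|x\| = 1$ it follows that $\|(I + (\chi - \zeta)(\zeta - A)^{-1})^{-1}\| \le 1$ and hence
\[
  \|\delta A(\chi - A)^{-1}\| = \bigl\|\delta A(\zeta - A)^{-1}\bigl(I + (\chi - \zeta)(\zeta - A)^{-1}\bigr)^{-1}\bigr\| < \nu.
\]
This implies $\chi \in \rho(A + \delta A)$, i.e. $\max\sigma(A + \delta A) < \zeta$. By the second resolvent identity we have for all $s \in \mathbb{R}$, with $A_\delta = A + \delta A$,
\begin{align*}
  \bigl\|(\zeta + is - A_\delta)^{-1} - (\zeta + is - A)^{-1}\bigr\| &= \bigl\|(\zeta + is - A_\delta)^{-1}\,\delta A\,(\zeta + is - A)^{-1}\bigr\| \\
  &\le \bigl\|(\zeta + is - A_\delta)^{-1}\bigr\|\,\bigl\|\delta A(\zeta + is - A)^{-1}\bigr\| \\
  &= \mathrm{dist}(\zeta + is, \sigma(A_\delta))^{-1}\,\bigl\|\delta A(\zeta + is - A)^{-1}\bigr\|.
\end{align*}
For all $s \ne 0$ we have
\begin{align*}
  \bigl\|\delta A(\zeta + is - A)^{-1}\bigr\| &= \bigl\|\delta A\bigl((I + is(\zeta - A)^{-1})(\zeta - A)\bigr)^{-1}\bigr\| \\
  &= \bigl\|\delta A(\zeta - A)^{-1}\bigl(I + is(\zeta - A)^{-1}\bigr)^{-1}\bigr\| \\
  &< \frac{\nu}{|s|}\,\Bigl\|\Bigl(\frac{1}{is}I + (\zeta - A)^{-1}\Bigr)^{-1}\Bigr\| \\
  &= \frac{\nu}{|s|}\,\mathrm{dist}\Bigl(-\frac{1}{is}, (\zeta - \sigma(A))^{-1}\Bigr)^{-1} \le \frac{\nu}{\sqrt{s^2 + 1}},
\end{align*}
but note that the final estimate holds also for $s = 0$. Hence we finally obtain for all $s \in \mathbb{R}$ and $\xi = \zeta + 1$
\[
  \bigl\|(\xi + is - A_\delta)^{-1} - (\xi + is - A)^{-1}\bigr\| < \frac{\nu}{1 + s^2},
\]
and hence the perturbation $\delta A$ satisfies the assumption (vi).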

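Proof of Theorem 3.1. We first estimate the perturbation bound for $\Psi$. We define $\Xi := \{\zeta + is : s \in \mathbb{R}\}$, where $\zeta$ is the value from the assumption (vi). Then $\Xi$ is in the resolvent sets of both $A$ and $A + \delta A$. Using the spectral calculus for generators of $C_0$-semigroups (see, for example, [8]), we obtain
\begin{align*}
  \Psi &= \alpha I + \frac{1}{2\pi i}\int_0^T \beta(t) \int_\Xi e^{2t\lambda}(\lambda - A)^{-1}\,d\lambda\,dt, \\
  \Psi + \delta\Psi &= (\alpha + \delta\alpha) I + \frac{1}{2\pi i}\int_0^T (\beta(t) + \delta\beta(t)) \int_\Xi e^{2t\lambda}(\lambda - A - \delta A)^{-1}\,d\lambda\,dt.
\end{align*}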

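Hence, using Fubini's theorem and the resolvent formula, we obtain
\begin{align*}
  \delta\Psi ={}& \delta\alpha\,I + \frac{1}{2\pi i}\int_\Xi (\lambda - A - \delta A)^{-1}\,\delta A\,(\lambda - A)^{-1} \int_0^T \beta(t) e^{2t\lambda}\,dt\,d\lambda \\
  &+ \frac{1}{2\pi i}\int_0^T \delta\beta(t) \int_\Xi e^{2t\lambda}(\lambda - A - \delta A)^{-1}\,d\lambda\,dt. \tag{3.3}
\end{align*}
From $\bigl|\int_0^T \beta(t) e^{2t\lambda}\,dt\bigr| \le \|\beta\|\,\frac{1}{2\zeta}(e^{2T\zeta} - 1)$ for $\lambda \in \Xi$, the norm of the second term of $\delta\Psi$ can be estimated from above by
\begin{align*}
  &\frac{e^{2T\zeta} - 1}{4\zeta\pi}\,\|\beta\| \int_\Xi \bigl\|(\lambda - A - \delta A)^{-1} - (\lambda - A)^{-1}\bigr\|\,d\lambda \\
  &\quad= \frac{e^{2T\zeta} - 1}{4\zeta\pi}\,\|\beta\| \Bigl( \int_{\zeta - iR}^{\zeta + iR} + \int_{\Xi\setminus[\zeta - iR,\,\zeta + iR]} \Bigr) \bigl\|(\lambda - A - \delta A)^{-1} - (\lambda - A)^{-1}\bigr\|\,d\lambda \\
  &\quad< \frac{\nu(e^{2T\zeta} - 1)}{2\zeta\pi}\,\|\beta\|\,(R + R^{-1}),
\end{align*}
where in the last inequality we have used (3.1). From
\begin{align*}
  \Bigl\| \int_0^T \delta\beta(t) \int_\Xi e^{2t\lambda}(\lambda - A - \delta A)^{-1}\,d\lambda\,dt \Bigr\|
  &\le \int_0^T |\delta\beta(t)|\,\Bigl\| \int_\Xi e^{2t\lambda}(\lambda - A - \delta A)^{-1}\,d\lambda \Bigr\|\,dt \\
  &< \nu \int_0^T \|S^\delta_{2t}\|\,dt \le \nu \int_0^T \int_{-\infty}^{\zeta} |e^{2t\lambda}|\,d\|E_{A+\delta A}(\lambda)\|\,dt \le \nu\,\frac{e^{2T\zeta} - 1}{2\zeta},
\end{align*}
we obtain that the third term in (3.3) has an upper bound $\nu(e^{2T\zeta} - 1)/(4\zeta\pi)$. Hence we obtain
\[
  \|\delta\Psi\| < \nu\Bigl(1 + \frac{e^{2T\zeta} - 1}{4\zeta\pi}\bigl(2\|\beta\|(R + R^{-1}) + 1\bigr)\Bigr).
\]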

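To obtain an upper bound for $\|\delta\psi\|$, we use the same steps as for $\delta\Psi$, but pulling out the $L^2$-functions using Hölder's inequality. For $t \ge 0$ we define the operator function
\[
  D(t) = \frac{1}{2\pi i}\int_\Xi e^{t\lambda}\bigl((\lambda - A - \delta A)^{-1} - (\lambda - A)^{-1}\bigr)\,d\lambda.
\]
Note that $D(t) = S^\delta_t - S_t$, hence $D(t)$ is actually the perturbation of $S_t$. We calculate
\[
  \|D(t)\| < \nu\pi^{-1} e^{\zeta t}(R + R^{-1}) \quad\text{for all } t \ge 0.
\]
We first estimate
\[
  \delta w^{\mathrm{hom}} = \delta w - \int_0^{\cdot} D(\tau) f(\tau)\,d\tau - \frac{1}{2\pi i}\int_0^{\cdot}\int_\Xi e^{\tau\lambda}(\lambda - A - \delta A)^{-1}\,\delta f(\tau)\,d\lambda\,d\tau.
\]
By using Hölder's inequality we estimate
\begin{align*}
  \|\delta w^{\mathrm{hom}}(t)\| &\le \|\delta w(t)\| + \Bigl\| \int_0^t D(\tau) f(\tau)\,d\tau \Bigr\| + \frac{1}{2\pi}\Bigl\| \int_0^t \int_\Xi e^{\tau\lambda}(\lambda - A - \delta A)^{-1}\,\delta f(\tau)\,d\lambda\,d\tau \Bigr\| \\
  &< \|\delta w(t)\| + \frac{\nu}{\pi}(R + R^{-1})\,\|f\|\,\sqrt{\frac{e^{2t\zeta} - 1}{2\zeta}} + \frac{\nu}{2\pi}\sqrt{\frac{e^{2t\zeta} - 1}{2\zeta}}.
\end{align*}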

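This implies
\[
  \|\delta w^{\mathrm{hom}}\| < \sqrt{2}\,\nu\,\Bigl(1 + \frac{1}{8\zeta\pi^2}\,2(R + R^{-1} + 1)^2\,\|f\|\Bigl(\frac{e^{2T\zeta} - 1}{2\zeta} - T\Bigr)\Bigr)^{1/2}.
\]
Now we are in a position to estimate $\delta\psi$. Since
\begin{align*}
  \delta\psi ={}& \int_0^T \beta(t) D(T+t) w^{\mathrm{hom}}(t)\,dt + \int_0^T \beta(t) D(T+t)\,\delta w^{\mathrm{hom}}(t)\,dt \\
  &+ \int_0^T \delta\beta(t) D(T+t) w^{\mathrm{hom}}(t)\,dt + \int_0^T \delta\beta(t) D(T+t)\,\delta w^{\mathrm{hom}}(t)\,dt \\
  &+ \frac{1}{2\pi i}\int_0^T \beta(t) \int_\Xi e^{\tau\lambda}(\lambda - A - \delta A)^{-1}\,\delta w^{\mathrm{hom}}(t)\,d\lambda\,dt \\
  &+ \frac{1}{2\pi i}\int_0^T \delta\beta(t) \int_\Xi e^{\tau\lambda}(\lambda - A - \delta A)^{-1} w^{\mathrm{hom}}(t)\,d\lambda\,dt \\
  &+ \frac{1}{2\pi i}\int_0^T \delta\beta(t) \int_\Xi e^{\tau\lambda}(\lambda - A - \delta A)^{-1}\,\delta w^{\mathrm{hom}}(t)\,d\lambda\,dt,
\end{align*}
we can again estimate $\|\delta\psi\|$ using the techniques from above and obtain
\[
  \|\delta\psi\| \le C\nu,
\]
where $C$ is a constant which does not depend on $\nu$ and which may change from line to line. Similarly we obtain
\[
  \|\delta y^{*,\mathrm{hom}}\| \le C\nu.
\]
Hence we obtained that for $\nu < 1$, each of $\|\delta\Psi\|$, $\|\delta\psi\|$ and $\|\delta y^{*,\mathrm{hom}}\|$ has an upper bound of the form $C\nu$. We have also proved $\|D(t)\| < C(t)\nu$.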

has an upper

398

bound of the form Cν. We have also proved kD(t)k< C(t)ν.399

As the solution is given in terms of linear systems

(2.6)

and

(2.7)

, to prove the

400

claim of the theorem it is suﬃcient to show that the solutions of these systems are

401

stable under perturbations. First note that for a chosen

µ

the operator on the left

402

hand side of

(2.6)

and

(2.7)

is bounded and strictly positive and that the same holds

403

for the perturbed right hand side. Moreover, from the estimates obtained above, we

404

see that the perturbation of the left hand side of (2.6) is given by405

µD(2T) + δΨ406

and the perturbation of the right hand side of (2.6) is given by407

µS2Tδy∗,hom +µD(2T)(y∗,hom +δy∗,hom) + STδψ +D(T)(ψ+δψ).408

Hence the norms of the perturbations of both left and right hand side of

(2.6)

are smaller

409

than

Cν

if

ν <

1. This allows us to apply the standard perturbation theoretic results

410

for the solutions of linear systems ([

20

], see also [

6

, Proposition 4.2]) and conclude that

411

the perturbed system

(2.6)

has the solution

x

+

δx

with

δx

satisfying

kδxk< Cν

if

ν412

is suﬃciently small. Let Φ

δ

be the perturbed function Φ,

δ

Φ

δ

= Φ

δ−

Φ, and let

µδ

ε

be

413

the solution of the equation Φ

δ

(

µ

) =

ε

. Then Φ

δ

(

µδ

ε

) =

ky∗,hom

+

δy∗,hom −xε−δxεk

,

414

where

xε

+

δxε

is the solution of the perturbed system

(2.6)

with

µ

=

µδ

ε

. Using the

415

obtained bounds on the perturbations, it follows |δΦδ(µδ

ε)|< Cν . Hence416

|Φ(µδ

ε)−Φ(µε)|=|Φδ(µδ

ε)−δΦ(µδ

ε)−ε|=|δΦ(µδ

ε)|< Cν.417

Since Φ is a continuous and monotone function, it follows that Φ

−1

is continuous,

418

hence we obtain |δµε|< Cν.419

4. Approximations of semigroups and related operator functions. In this section we review rational approximation methods for a semigroup $S_t$ whose generator $A$ is a self-adjoint operator with upper bound $\kappa \le 0$. Our approach to constructing numerical approximations can be applied to non-stable systems (those for which $\kappa > 0$) as well, but such systems are not included among our examples. We just note that in the case of a non-stable system the estimates include a multiplicative constant which grows exponentially with $\kappa$.

We say that the function $r$ is a type $(n, m)$ rational function, where $n$ and $m$ are nonnegative integers, if there are polynomials $p$ and $q$ of degrees at most $n$ and $m$, respectively, such that $r = p/q$. Here the degrees $n$ and $m$ need not be optimal. Given a rational function $r$ we have
$$(v, S_t v) - (v, r(A)v) = \int_{-\infty}^{\kappa} \bigl(e^{t\lambda} - r(\lambda)\bigr)\, d(E(\lambda)v, v), \tag{4.1}$$

where $E(\cdot)$ denotes the spectral measure of the self-adjoint operator $A$ [22, 14]. The support of the spectral measure of any of the operators $tA$, for $t > 0$, is contained in $(-\infty, 0]$, and computing similarly as in (4.1) we obtain the estimate
$$\|g(A)v - r(A)v\| \le \|g - r\|_{L^\infty(-\infty,0]}\,\|v\|, \tag{4.2}$$
where the function $g$ is measurable with respect to the spectral measure of $A$. When considering numerical efficiency, a key question is whether there exists a rational approximation of $g$ with $n$ small.
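A small numerical illustration of (4.2) may help here. The Python sketch below uses stand-ins that are not taken from the paper: the operator is the finite difference Dirichlet Laplacian on (0, 1) (negative definite, with eigenpairs known in closed form), $g(z) = e^z$ so that $g(A) = S_1$, and $r(z) = 1/(1 - z)$ is the type (0, 1) backward Euler function. The supremum norm is only estimated on a grid over $[-200, 0]$, which is an assumption that suffices here because $|g - r|$ is below 1/200 to the left of that interval.

```python
import math

def laplacian_eigenpairs(n):
    """Eigenpairs of A = tridiag(1, -2, 1) / h**2 with h = 1 / (n + 1)."""
    h = 1.0 / (n + 1)
    lams, vecs = [], []
    for k in range(1, n + 1):
        lams.append(-(2.0 - 2.0 * math.cos(k * math.pi * h)) / h**2)
        # orthonormal discrete sine vectors
        vecs.append([math.sqrt(2.0 * h) * math.sin(j * k * math.pi * h)
                     for j in range(1, n + 1)])
    return lams, vecs

def apply_function(fun, lams, vecs, v):
    """Spectral calculus: fun(A) v = sum_k fun(lam_k) (q_k, v) q_k."""
    out = [0.0] * len(v)
    for lam, q in zip(lams, vecs):
        coeff = fun(lam) * sum(qj * vj for qj, vj in zip(q, v))
        out = [o + coeff * qj for o, qj in zip(out, q)]
    return out

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def g(z):
    return math.exp(z)          # g(A) = S_1

def r(z):
    return 1.0 / (1.0 - z)      # backward Euler, a type (0, 1) rational function

n = 30
lams, vecs = laplacian_eigenpairs(n)
v = [1.0] * n                   # arbitrary test vector (assumption)

gv = apply_function(g, lams, vecs, v)
rv = apply_function(r, lams, vecs, v)
lhs = norm([a - b for a, b in zip(gv, rv)])

# grid estimate of sup |g - r| over (-inf, 0]
sup_err = max(abs(g(-0.001 * i) - r(-0.001 * i)) for i in range(200001))
rhs = sup_err * norm(v)
print(lhs, rhs)
```

For this choice the left-hand side stays well below the bound, since the supremum of $|g - r|$ is attained near $z \approx -2.5$, away from the spectrum of this particular matrix.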

For the case in which $g(z) = e^{-z}$, it is known, see [7, 24], that for each $n$ there exists a unique type $(n, n)$ rational function $r^*_n$ which minimizes $\|g - \cdot\|_{L^\infty(-\infty,0]}$. Furthermore, there exists a constant $C > 0$, independent of $n$, such that $r^*_n$ verifies
$$\|g - r^*_n\|_{L^\infty(-\infty,0]} = \min\bigl\{\|g - r\|_{L^\infty(-\infty,0]} : r \text{ is a type } (n, n) \text{ rational function}\bigr\} \le \frac{C}{H^n} \le \frac{C}{9.28903^n}. \tag{4.3}$$

The number $H$ is known under the name of Halphen constant, see [23]. The rational function $r^*_n$ is the unique minimizer of $\|g - r\|_{L^\infty(-\infty,0]}$ among $(n, n)$ rational functions. Further, $r^*_n$ does not have zero-pole pairs appearing on the negative real axis.

For the error analysis of approximations of semigroups it is particularly convenient if the rational function is representable in the partial fractions form. For constants $r_0$ and $r_i$, $\zeta_i$, $i = 1, \dots, n$, the expression
$$\hat r(z) = r_0 + \frac{r_1}{z - \zeta_1} + \cdots + \frac{r_n}{z - \zeta_n}$$

is a partial fractions expansion of the rational function $\hat r$. It has been shown that for $g(z) = e^{-z}$ one can construct, see [24], a partial fractions expansion of the type $(n - 1, n)$ rational function $\hat r_n$ such that $\|g - \hat r_n\|_{L^\infty(-\infty,0]} \le C\,3.2^{-n}$. The constant $C > 0$ is independent of $n$. The poles $\zeta_i$ are contained on a hyperbola in the complex plane, the so-called Weideman hyperbola [26, 24], and the weights are defined by the application of the $d$ point quadrature rule to the Cauchy integral representation of the exponential function with this hyperbola as a contour. We summarize the results in the following lemma, which follows directly from [26, 27].

Lemma 4.1. Let $f$ be analytic in $\mathbb{C} \setminus \mathbb{R}_-$ with $f(z) \to 0$ uniformly as $\operatorname{Re} z \to -\infty$ and let $\Gamma$ be the Weideman hyperbola from [26, 24]. Then there exist $\xi_i \in \Gamma$ and $r_i \in \mathbb{C}$, $i = 1, \dots, n$, such that
$$\Bigl\| g(A)v - \sum_{i=1}^{n} r_i (A - \xi_i)^{-1} v \Bigr\| \le 3^{-n}\,\|v\|$$
for any positive definite $A$ and $g(z) = e^{-z} f(-z)$. Furthermore, the restriction of the function $\hat r(z) = \sum_{i=1}^{n} r_i (z - \xi_i)^{-1}$ to $\mathbb{R}_+$ is real.

4.1. Rational function fitting. The rational function which was constructed in Lemma 4.1 is not the optimal rational approximation. In order to improve the approximation, we follow the approach of [23] and consider the class of perturbed exponential functions $g$ which can be represented as
$$g(x) = u_0(x) + u_1(x)\,e^{\kappa x},$$

where $u_0$ and $u_1 \neq 0$ are arbitrary rational functions and $\kappa < 0$ is some constant. According to [23, Theorem 1], for any $n \in \mathbb{N}$ and a chosen but fixed integer $k$ such that $n - k \ge 0$ there exists a unique rational function $r^*_{n,n+k}$ such that
$$r^*_{n,n+k} = \arg\min\bigl\{\|g - r_{n,n+k}\|_{L^\infty(-\infty,0]} : r \text{ is a rational function of type } (n, n + k)\bigr\}$$
and $\|g - r^*_{n,n+k}\|_{L^\infty(-\infty,0]} \le C\,9.28903^{-n}$. However, [7, 23] only provide results on the existence of solutions, not their construction. A way to construct a rational approximation satisfying (4.3) is to transform the interval $(-1, 1]$ to $(-\infty, 0]$ and then apply the contour integration technique to the transformed problem, see [24]. This can be achieved by the Moebius transformation $m(z) = 9(z - 1)/(z + 1)$, see [24]. The inverse transformation to $m$ is given by the formula $m^{-1}(z) = -(z + 9)/(z - 9)$ and it maps $(-\infty, 0]$ to $(-1, 1]$. Then the function to approximate is $g(z) = e^{m(z)} : (-1, 1] \to \mathbb{R}$, and the rational function which approximates $e^z$ is obtained by composing the rational approximant of $g$ with the inverse Moebius transformation. We first loop a finite contour around the interval $(-1, 1]$ and then these points get mapped by the Moebius transform into points on a curve looping around the infinite interval $(-\infty, 0]$.
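The two transformation formulas quoted above can be checked numerically. The following Python sketch is only a sanity check under our own choices (the sample grid and the complex test point are arbitrary assumptions); it confirms that $m$ maps sample points of $(-1, 1]$ into $(-\infty, 0]$ and that $m^{-1}$ undoes $m$, also for a complex point such as one would meet on a contour.

```python
def m(z):
    """Moebius transformation m(z) = 9 (z - 1) / (z + 1)."""
    return 9.0 * (z - 1.0) / (z + 1.0)

def m_inv(w):
    """Inverse transformation m^{-1}(w) = -(w + 9) / (w - 9)."""
    return -(w + 9.0) / (w - 9.0)

# sample points in (-1, 1]; the last one is exactly 1
samples = [-1.0 + 2.0 * k / 1000 for k in range(1, 1001)]
images = [m(z) for z in samples]

# the images should be nonpositive and increase towards m(1) = 0
all_nonpositive = all(w <= 0.0 for w in images)
increasing = all(a < b for a, b in zip(images, images[1:]))

# round trip on the real samples and on one complex point
max_roundtrip = max(abs(m_inv(m(z)) - z) for z in samples)
z_c = 0.3 + 0.4j
roundtrip_c = abs(m_inv(m(z_c)) - z_c)

print(all_nonpositive, increasing, max_roundtrip, roundtrip_c)
```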

Let now $g_1$ and $g_2$ be perturbed exponential functions. We are interested in finding type $(n, n)$ rational approximations of functions of the form $g_1 + g_2$, $g_1 g_2$, $g_1/g_2$ and $g_i \circ m$. Obviously, combining rational approximations $r_i$ of $g_i$ is a natural first idea. However, the rational functions $r = r_1 + r_2$, $r = r_1 r_2$ or $r = r_1/r_2$ will in general be of a different (component-wise larger) type.

We can, however, use an approximation approach to truncate the type of the product, sum or quotient of two rational functions of the type $(n, n)$ to a rational function $\tilde r$ of the type $(n, n)$ which, for given $\mathrm{tol} > 0$ and an interval $[a, b]$, satisfies the estimate $\|\tilde r - r\|_{L^2[a,b]} \le \mathrm{tol}\,\|r\|_{L^2[a,b]}$.

To this end we use the award-winning rkfit algorithm from [2]. This is the rational Krylov function fitting algorithm, which implements the rational function calculus by working with a representation of a rational function as a transfer function of a pencil of Hessenberg matrices. It performs all of the aforementioned operations (addition, division, multiplication and composition with a Moebius transformation) stably using only floating point arithmetic. According to [2], given a tolerance $\mathrm{tol}$ and the perturbed exponential function $g(x) = u_0(x) + u_1(x)e^{\kappa x}$, $\kappa < 0$, such that $g \in L^2(-\infty, 0]$, the rkfit algorithm produces a rational function
$$r_{RK}(x) = r_0 + \frac{r_1}{x - \tilde\zeta_1} + \cdots + \frac{r_n}{x - \tilde\zeta_n}, \tag{4.4}$$
in the pole residue form, such that
$$\|r_{RK} - g\|_{L^\infty(-\infty,0]} \le \mathrm{tol}\,\|g\|_{L^2(-\infty,0]}.$$
We can now construct the operator $r_{RK}(A) := r_0 I + \sum_{i=1}^{n} r_i (A - \tilde\zeta_i)^{-1}$ such that
$$\|g(A) - r_{RK}(A)\|_{\mathcal{L}(H)} \le \mathrm{tol}\,\|g\|_{L^2(-\infty,0]}.$$
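The pole residue form means that applying $r_{RK}(A)$ to a vector reduces to one shifted linear solve per pole. The Python sketch below illustrates this for a symmetric tridiagonal matrix. It is not the rkfit code: the solver (an unpivoted Thomas algorithm), the test matrix (a 1D finite difference Dirichlet Laplacian), and the poles and residues are all illustrative assumptions. The result is compared with the spectral calculus using the known eigenpairs of this particular matrix.

```python
import math

def solve_shifted_tridiagonal(diag, off, shift, rhs):
    """Solve (A - shift*I) x = rhs for symmetric tridiagonal A (Thomas algorithm).

    diag is the main diagonal of A, off its off-diagonal (length len(diag) - 1).
    Works in complex arithmetic; no pivoting is performed.
    """
    n = len(diag)
    c = [0j] * n                       # modified super-diagonal
    d = [0j] * n                       # modified right-hand side
    pivot = diag[0] - shift
    if n > 1:
        c[0] = off[0] / pivot
    d[0] = rhs[0] / pivot
    for i in range(1, n):
        pivot = (diag[i] - shift) - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / pivot
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / pivot
    x = [0j] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):     # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def apply_pole_residue(diag, off, r0, residues, poles, v):
    """Return r(A) v for r(x) = r0 + sum_i residues[i] / (x - poles[i])."""
    out = [r0 * vi for vi in v]
    for r_i, zeta_i in zip(residues, poles):
        x = solve_shifted_tridiagonal(diag, off, zeta_i, v)   # (A - zeta_i I)^{-1} v
        out = [o + r_i * xi for o, xi in zip(out, x)]
    return out

# Illustrative data (assumptions): a conjugate pair of poles keeps r real on the real axis.
n = 20
h = 1.0 / (n + 1)
diag = [-2.0 / h**2] * n
off = [1.0 / h**2] * (n - 1)
v = [float(j + 1) for j in range(n)]
r0, poles, residues = 0.1, [1 + 2j, 1 - 2j], [0.5 - 0.3j, 0.5 + 0.3j]
y = apply_pole_residue(diag, off, r0, residues, poles, v)

# Reference: sum_k r(lam_k) (q_k, v) q_k with the closed-form eigenpairs.
ref = [0.0] * n
for k in range(1, n + 1):
    lam = -(2.0 - 2.0 * math.cos(k * math.pi * h)) / h**2
    q = [math.sqrt(2.0 * h) * math.sin(j * k * math.pi * h) for j in range(1, n + 1)]
    r_lam = r0 + sum(r_i / (lam - z_i) for r_i, z_i in zip(residues, poles))
    coeff = r_lam.real * sum(qj * vj for qj, vj in zip(q, v))
    ref = [a + coeff * qj for a, qj in zip(ref, q)]
error = max(abs(a - b) for a, b in zip(y, ref))
print(error)
```

The cost is dominated by the shifted solves, which are independent of each other and could be carried out in parallel.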

4.2. Galerkin resolvent estimates. Computing the action of a function of an operator on a vector, for example $g(A)v = S_t v$, involves two steps. First, we approximate the function $g$ by a rational function $r$ on an interval containing the spectrum of the self-adjoint operator $A$. We then need to sample the resolvent $(z - A)^{-1}v$ at the poles of the rational function $r$.

In what follows we will restrict our considerations to the operator of the divergence type posed in a compact polygonal domain $\Omega \subset \mathbb{R}^2$. Many statements are algebraic in nature and hold in a more general setting. However, the interpolation results for piecewise polynomial functions and the regularity results for the domain of the operator are specific to the aforementioned class of operators.

We approximate the action of the resolvent by selecting a finite dimensional subspace $V_h \subset \mathrm{Dom}(A^{1/2})$ and then forming the Galerkin projection of $A$ onto $V_h$. According to [15, Section 5], the Galerkin projection $A_h : V_h \to V_h$ is given by the formula
$$A_h = (A^{1/2} P_h)^* (A^{1/2} P_h),$$
where $P_h$ is the orthogonal projection onto $V_h$. Let $V_h$ be the space of continuous functions on $\Omega$ that are piecewise linear with respect to a given triangular tessellation of $\Omega$. The resolvent estimate for $A$ using the Galerkin projection $A_h$ reads (see e.g. [13, Eq. (52)] for technical details)
$$\|(z - A)^{-1}v - (z - A_h)^{-1}v\|_{L^2(\Omega)} \le 2C h^{2\nu}\, a(z)\, b(z)\, \|v\|_{L^2(\Omega)}, \tag{4.5}$$
for $h < h_0$ and $v \in V_h$. Here $C$ and $0 < \nu \le 1$ are constants such that
$$\inf_{v_h \in V_h} \|A^{-1}f - v_h\|_{H^1(\Omega)} \le C h^{\nu}\, \|f\|_{L^2(\Omega)}$$
and
$$a(z) = \sup\Bigl\{\frac{|\lambda - \mu|}{|\lambda|} : \lambda \in \sigma(A)\Bigr\}, \qquad b(z) = \sup\Bigl\{\frac{|\lambda|}{|\lambda - z|} : \lambda \in \sigma(A)\Bigr\}.$$

Here $h$ is the maximal diameter of a triangle in the chosen tessellation of $\Omega$ and $h_0$ denotes the minimal level of refinement from which the estimate holds. We will, however, need this estimate solely for at most $n$ poles $\tilde\zeta_i$, $i = 1, \dots, n$, of the rational function $r_{RK}$ from (4.4), and so
$$\|\hat r(A)v - \hat r(A_h)v\|_{L^2(\Omega)} \le n\,C \max_i a(\tilde\zeta_i)\, b(\tilde\zeta_i)\, h^{2\nu}\, \|v\|_{L^2(\Omega)}.$$

Finally, let $g(x) = u_0(x) + u_1(x)e^{\kappa x}$ be the perturbed exponential function. Based on (4.2) and Lemma 4.1 we have the estimate
$$\begin{aligned} \|g(A)v - \hat r(A_h)v\|_{L^2(\Omega)} &\le \|g(A)v - \hat r(A)v\|_{L^2(\Omega)} + \|\hat r(A)v - \hat r(A_h)v\|_{L^2(\Omega)} \\ &\le 3^{-n}\|v\| + n\,C \max_i a(\tilde\zeta_i)\, b(\tilde\zeta_i)\, h^{2\nu}\, \|v\|_{L^2(\Omega)}. \end{aligned} \tag{4.6}$$

On the other hand, if we choose $r_{RK}$ as the optimal rational approximation of $g$, then we have the qualitative estimate
$$\begin{aligned} \|g(A)v - r_{RK}(A_h)v\|_{L^2(\Omega)} &\le \|g(A)v - g(A_h)v\|_{L^2(\Omega)} + \|g(A_h)v - r_{RK}(A_h)v\|_{L^2(\Omega)} \\ &\le \|g(A)v - g(A_h)v\|_{L^2(\Omega)} + 9.28903^{-n}, \end{aligned}$$
which indicates that the error which dominates the process is the error in the Galerkin approximation of $g(A)$ by $g(A_h)$. This error can be estimated as $O(h^{2\nu})$ using the resolvent technique of [15] and under additional assumptions on the discretization $A_h$. Formally establishing this estimate is outside the scope of this paper and would add little compared to the estimate (4.6), which holds for $\hat r$ from Lemma 4.1. By choosing suitable $r_{RK}$ and $h$, the last estimate ensures a good approximation of $g(A)v$ based on a finite dimensional approximation of the operator $A$.

5. Numerical examples. In this section we consider several constrained optimization problems in 1D and 2D. The problems are academic and are primarily chosen to test the efficiency of the developed approach. We compare our results with those obtained by other, already existing methods where such a comparison is possible. We also report the timings as a means to get an intuition of the efficiency of the implementation. The timings are reported for a workstation running an Intel Core i5 8600K at 3.60 GHz with 24 GB of DDR4 RAM.

In all examples we take the weight function of the form $\beta = \chi_{[T/3, 2T/3]}$, while the desired trajectory $w$ is assumed to be time independent. This implies that we want the optimal state to be close to $w$ for times $t$ between $T/3$ and $2T/3$, while no desired trajectory is prescribed outside this interval. With this setting and under the additional assumption that the operator $A$ is strictly negative, the operator $\Psi$ and the vector $\psi$ from the main theorem can be computed explicitly as
$$\Psi = \alpha I + \tfrac{1}{2} A^{-1} S_{2T/3}(I - S_{2T/3}), \qquad \psi = A^{-1} S_{T/3}(I - S_{T/3})\,w.$$
We can now use spectral calculus to represent, for example, the operator $\Psi$ as
$$\Psi = \alpha I + \tfrac{1}{2} \int_{\mathbb{R}} e^{\lambda 2T/3}\bigl(1/\lambda - e^{\lambda 2T/3}/\lambda\bigr)\, dE(\lambda).$$

The function $\lambda \mapsto g(\lambda) := 1/\lambda - e^{\lambda 2T/3}/\lambda$ is obviously a perturbed exponential function for which the rational approximation theory holds (there exists a small degree rational approximation). We can equivalently use rational approximation theory to compute the vector $\psi$. In the numerical procedure the first step is to determine $\mu_\varepsilon$, the solution to the equation $\Phi(\mu) = \varepsilon$ (cf. (2.5)). Taking into account the properties of $\Phi$ (given in Lemma 2.9), the equation has a unique solution for every $\varepsilon \in (0, \Phi(0))$. Recall that in the finite dimensional case $\Phi^2$ is a rational function (cf. Remark 2.3) and that $\mu$ is a zero of $\Phi$ if and only if it is a zero of $\Phi^2$. Given the order of the surrogate model $d_s \in \mathbb{N}$, we construct a type $(d_s, d_s)$ rational function approximation $\hat R$ of $\Phi^2$ by sampling $\Phi^2$ in at least $2d_s + 2$ points. We approximate $\mu_\varepsilon$ by solving the equation $R(\mu) = 0$ and empirically assess the error in the approximation of the rational function $\Phi^2$ by the rational function $\hat R$ using the AAA algorithm from [19]. If the accuracy indicated by the AAA error estimate is too low, we increase the order $d_s$ and recompute the surrogate model. We aim to keep this error indicator (by increasing $d_s$) below the level of the error of the finite element approximation. Note that, given the fact that $\Phi^2$ is a rational function, this error will decrease by choosing $d_s$ large enough. According to the resolvent analysis, the error in the approximation by a rational function is a lower order perturbation of the system, as compared to the discretization error.

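The value of the function $\Phi(\mu)$ is computed by using the rational approximation and the spectral calculus
$$\begin{aligned} (\mu S_{2T} + \Psi)^{-1}\mu S_{2T}\, y^{*,\mathrm{hom}} &= \int_{-\infty}^{0} \frac{\mu e^{2T\lambda}}{\mu e^{2T\lambda} + \alpha + \frac{1}{2}\bigl(1/\lambda - e^{\lambda 2T/3}/\lambda\bigr)}\, dE(\lambda)\, y^{*,\mathrm{hom}} \\ &\approx r_0\, y^{*,\mathrm{hom}} + \sum_{i=1}^{d} r_i (\tilde\zeta_i - A)^{-1} y^{*,\mathrm{hom}}. \end{aligned}$$
The function
$$g_1^{(\mu,\alpha,T)}(\lambda) = \frac{\mu e^{2T\lambda}}{\mu e^{2T\lambda} + \alpha + \frac{1}{2}\bigl(1/\lambda - e^{\lambda 2T/3}/\lambda\bigr)},$$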

for fixed $\mu$, $\alpha$, $T$, is approximated on $(-\infty, 0]$ using the rational function $r$ with 18 pole residue pairs. The function is a quotient of perturbed exponential functions, for which we know that there is a high quality, low degree rational approximation. We could compute a rational approximation of $g$ as a quotient of rational approximations; however, this rational function could have, in the worst case, double the degree of the best rational approximation of the numerator and denominator. Instead, as discussed in Section 4.1, we choose to approximate the function $g$ directly, as a means of keeping the degree of the approximating rational function lower. Note that these approximations are independent of $A$ and hold on $(-\infty, 0]$.

Once the equation for $\mu_\varepsilon$ is solved, the optimal initial control $u_{opt}$ follows by (2.4). For its computation we use the same procedure as the one for the calculation of $\Phi(\mu)$. The other family of perturbed exponential functions which we need is
$$g_2^{(\mu,\alpha,T)}(\lambda) = \frac{e^{T\lambda/3}\bigl(1 - e^{T\lambda/3}\bigr)/\lambda}{\mu e^{2T\lambda} + \alpha + \frac{1}{2}\bigl(1/\lambda - e^{\lambda 2T/3}/\lambda\bigr)}.$$

It stems from the spectral representation of the second summand in (2.4). We organize our computation into an off-line and an on-line phase. The function $\Phi^2(\mu)$ is rational, as was claimed in Remark 2.3. We start by computing its rational surrogate $\hat R$, and then we proceed by computing $u_{opt}$, for any given $\varepsilon$, in the on-line phase using $\hat R$.

Remark 5.1. Note that a gradient method for computing the minimizer of $J$ requires a solution of the forward problem for the parabolic equation in each step of the method. A basic step of any implicit method for the solution of a parabolic equation is the evaluation of a resolvent-like function. The convergence of a gradient method with Nesterov's acceleration is at best $1/n^2$, where $n$ is the number of solves of the forward problem. Furthermore, for each new tolerance $\varepsilon$ the whole procedure needs to be run again. On the other hand, our method is organized in the on-line/off-line phase and it reuses previous computations in the form of the rational surrogate $\hat R$. In the case in which the best rational function approximation of the operator function is used we obtain the convergence rate of $9^{-n}$, where $n$ is the number of the resolvent evaluations. In the case in which a pole residue form of the rational function is used we achieve the convergence rate of $3^{-n}$, for $n$ resolvent evaluations (see Lemma 4.1). This is a very crude cost measure. However, under a very modest assumption that we need at least one evaluation of the resolvent function to compute the solution of the forward problem, we clearly see a potential advantage of the rational function approach. This is the reason why such methods are becoming methods of choice for the solution of parabolic problems and also for numerically inverting the Laplace transform in the case of the solution of inverse problems, see [23].

Algorithm 5.1 Surrogate optimal control
Require: $\varepsilon > 0$, $T > 0$, $\alpha > 0$, $N_s \in \mathbb{N}$, $d_s \in \mathbb{N}$ and $A_h$
  Generate $N_s$ equally spaced points on the log scale $\mu_i = e^{-10 + \frac{15}{N_s} i}$, $i = 1, \dots, N_s - 1$.
  while $i \le N_s$ do    (This can be done in parallel)
    Compute the rational function surrogate $\hat g_1$ of $g_1^{(\mu_i,\alpha,T)}$ on $[0, \|A_h\|]$
    Compute the rational function surrogate $\hat g_2$ of $g_2^{(\mu_i,\alpha,T)}$ on $[0, \|A_h\|]$
    Compute $(\mu_i, \widetilde\Phi(\mu_i))$ from (2.5) using spectral calculus and rkfit
  end while
  Compute the rational function $\hat R$ of type $(d_s, d_s)$ such that it minimizes $\sum_{i=1}^{N_s} |\widetilde\Phi^2(\mu_i) - \hat R(\mu_i)|^2$ using the AAA algorithm from [19].
  Solve $\hat R(\mu_\varepsilon) = \varepsilon$    (Here start the on-line steps)
  Output $u_{opt}$ from (2.4) using spectral calculus and rkfit
  if given new $\varepsilon$ then
    Repeat on-line steps
    Output $u_{opt}$
  end if
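To make the crude cost comparison of Remark 5.1 concrete, the following Python sketch counts how many forward solves or resolvent evaluations each rate needs to reach a given tolerance. This is back-of-the-envelope arithmetic only: the tolerance $10^{-8}$ is an arbitrary assumption, all multiplicative constants in front of the rates are ignored, and the best-approximation rate uses the value 9.28903 from (4.3).

```python
def evaluations_needed(rate, tol, n_max=10**6):
    """Smallest n with rate(n) <= tol; constants in front of the rates are ignored."""
    n = 1
    while rate(n) > tol and n < n_max:
        n += 1
    return n

tol = 1e-8   # illustrative target accuracy (assumption)

n_best = evaluations_needed(lambda n: 9.28903 ** (-n), tol)      # best rational approximation
n_pole_residue = evaluations_needed(lambda n: 3.0 ** (-n), tol)  # pole residue form, Lemma 4.1
n_gradient = evaluations_needed(lambda n: 1.0 / n**2, tol)       # accelerated gradient method

print(n_best, n_pole_residue, n_gradient)
```

The gap of several orders of magnitude between the geometric rates and the algebraic rate is what the remark refers to as the potential advantage of the rational function approach.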

In all of the experiments in which we applied Algorithm 5.1, we used $d_s = 6$ and sampled the function $\Phi^2$ in $N_s = 20$ points.
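The on-line step of Algorithm 5.1 is a scalar root-finding problem for $\mu_\varepsilon$. A minimal Python sketch of one way to do this is given below, assuming that the surrogate of $\Phi$ is continuous and strictly decreasing in $\mu$ on the sampled range $[e^{-10}, e^{5}]$. The function `phi_stub` is a purely hypothetical stand-in chosen so that the exact answer is known; it is not the function $\Phi$ of the paper, and bisection in $\log\mu$ is our choice rather than the authors' solver. Only the value $\Phi(0) = 1.0374$ is borrowed from Section 5.1.

```python
import math

def solve_for_mu(phi, eps, log_mu_lo=-10.0, log_mu_hi=5.0, iters=200):
    """Bisection in log(mu) for phi(mu) = eps, assuming phi is continuous and decreasing."""
    if not (phi(math.exp(log_mu_hi)) <= eps <= phi(math.exp(log_mu_lo))):
        raise ValueError("eps is not bracketed on the sampled range of mu")
    lo, hi = log_mu_lo, log_mu_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(math.exp(mid)) > eps:
            lo = mid          # phi is still too large: move towards larger mu
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

# Hypothetical stand-in for the surrogate of Phi (not the paper's function).
phi0 = 1.0374

def phi_stub(mu):
    return phi0 / (1.0 + mu)

# Tolerances given as fractions of Phi(0), as in the examples of this section.
solutions = {frac: solve_for_mu(phi_stub, frac * phi0) for frac in (0.2, 0.5, 0.9)}
for frac, mu_eps in solutions.items():
    print(frac, mu_eps)
```

For this stand-in the exact solution is $\mu_\varepsilon = \Phi(0)/\varepsilon - 1$, which gives a simple check of the routine. Each new $\varepsilon$ only requires re-running the bisection, which mirrors the reuse of the surrogate in the on-line phase.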

5.1. 1D heat equation. As the first test of the proposed method we consider the heat equation (with variable coefficient) on $\Omega = [0, \pi]$ accompanied by homogeneous Dirichlet boundary conditions. The operator $A$ is taken of the form
$$A = -\partial_x\bigl((1 + c\chi_{[\gamma,\pi]})\,\partial_x\bigr)$$
with $\gamma = 2.2$. The parameter $\gamma$ determines the contact of two materials with a different diffusivity coefficient. We consider two cases:
1. $c = 0$, with $A$ being the isotropic Laplace operator;
2. $c = -0.8$, resulting in a discontinuity of the diffusion coefficient at the point $\gamma$.
Operator $A$ is discretized by conforming linear finite elements with $h = 1/20$ and we needed 103 degrees of freedom to represent $A_h$.
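For readers who want to reproduce a discretization of this type, the following Python sketch assembles the P1 stiffness and mass matrices for the bilinear form $\int_0^\pi (1 + c\chi_{[\gamma,\pi]})\,u'v'\,dx$ with homogeneous Dirichlet conditions. It is an illustration under our own assumptions, not the authors' code: the mesh is uniform with 64 elements (not the mesh used in the paper), the coefficient is sampled at element midpoints so the jump at $\gamma$ is not resolved exactly, and the matrices are stored as diagonal and off-diagonal lists.

```python
import math

def assemble_p1(n_elements, c=0.0, gamma=2.2, length=math.pi):
    """Tridiagonal P1 stiffness and mass matrices on a uniform mesh of [0, length].

    Returns ((k_diag, k_off), (m_diag, m_off), h) for the interior nodes only
    (homogeneous Dirichlet conditions). The coefficient 1 + c*chi_[gamma, length]
    is evaluated at element midpoints.
    """
    h = length / n_elements
    n = n_elements - 1                        # number of interior nodes
    k_diag, k_off = [0.0] * n, [0.0] * (n - 1)
    m_diag, m_off = [0.0] * n, [0.0] * (n - 1)
    for e in range(n_elements):               # element e = [x_e, x_{e+1}]
        midpoint = (e + 0.5) * h
        coeff = 1.0 + (c if midpoint >= gamma else 0.0)
        left, right = e - 1, e                # interior indices of its two nodes
        for idx in (left, right):
            if 0 <= idx < n:
                k_diag[idx] += coeff / h
                m_diag[idx] += h / 3.0
        if 0 <= left and right < n:
            k_off[left] += -coeff / h
            m_off[left] += h / 6.0
    return (k_diag, k_off), (m_diag, m_off), h

def quadratic_form(mat, v):
    """v^T M v for a symmetric tridiagonal matrix given as (diag, off)."""
    diag, off = mat
    total = sum(d * vi * vi for d, vi in zip(diag, v))
    total += 2.0 * sum(o * v[i] * v[i + 1] for i, o in enumerate(off))
    return total

# Isotropic case c = 0: the Rayleigh quotient of the nodal values of sin(x)
# should be close to the smallest eigenvalue 1 of -d^2/dx^2 on [0, pi].
K, M, h = assemble_p1(64)
v = [math.sin((j + 1) * h) for j in range(63)]
rayleigh = quadratic_form(K, v) / quadratic_form(M, v)

# Discontinuous case c = -0.8: same assembly with the reduced coefficient on [gamma, pi].
K_jump, M_jump, _ = assemble_p1(64, c=-0.8)
print(rayleigh, K[0][-1], K_jump[0][-1])
```

Note that the stiffness matrix assembled here is positive definite; a sign change is needed to match a convention in which the generator of the semigroup is negative.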

Besides the function $\beta$ determined in the beginning of this section, for this example we propose
• $\alpha = 10^{-4}$,
• final time $T = 0.01$,
• desired trajectory $\omega = \chi_{[\pi/5, 2\pi/5]}$,
• final target $y^* = \chi_{[3\pi/5, 4\pi/5]}$,
• $f = 0$ (homogeneous equation).

The choice of $w$ stimulates the state trajectory to be concentrated on the left part of the domain during the central time period, while at the final time the target $y^*$ requires it to be supported at the right hand side, at least for small values of the tolerance $\varepsilon$.

For the isotropic case $c = 0$, the above setting coincides with Example 4.1 from [16]. In such a way we shall be able to compare our results with those obtained by a different method based on spectral decomposition of the Laplace operator.

The example is performed for three values of the final tolerance
$$\varepsilon = [0.2, 0.5, 0.9]\,\Phi(0),$$
depicted in Figure 1, together with the corresponding values $\mu_\varepsilon$ (solutions to equation (2.5)) and the graph of the function $\Phi$ (in log-log scale). The figure confirms the properties of the function $\Phi$ provided by Lemma 2.9. In particular, its initial value $\Phi(0)$ coincides with
$$\|y_{min} - y^*\| = 1.0374,$$
where $y_{min}$ is the optimal final state of the unconstrained problem. The corresponding initial value $u_{min}$, which is just the minimizer of the functional $J$, can also be obtained by standard methods of convex analysis.

The elapsed time to produce the plot, which included sampling $\Phi$ in 350 points, was 12.97 seconds, and it took 0.36 seconds to compute $\Psi(0)$ alone. The results in the isotropic case for the prescribed values of $\varepsilon$ are presented in Figure 2. When $\varepsilon$ is small the initial mass is concentrated on the support of the target $y^*$, in order to steer the system close to it at the final time. Conversely, for a large value of $\varepsilon$, the initial control is concentrated on the left. In such a way the solution stays close to the desired trajectory $\omega$ in the middle part of the time interval during which the distributed cost $\beta$ is active. Finally, the intermediate value of $\varepsilon$ is a trade-off between the optimisation of the cost functional $J$ and the requirement to hit the final target with the given tolerance.

Figure 1: Function $\Phi$, the chosen values of $\varepsilon$ and corresponding $\mu_\varepsilon$. Panels: (a) Unisotropic diffusion, (b) Isotropic diffusion.

Besides agreeing with the intuition, the results completely coincide with those obtained in [16, Example 4.1]. This provides the first confirmation of the method proposed in this article. Furthermore, we note that for increased tolerance $\varepsilon$ the optimal control resembles the solution of the unconstrained problem $u_{min}$ (Figure 3), where the latter is just a minimizer of the functional $J$. This is expected, as for large values of $\varepsilon$ the solution is less affected by the prescribed target $y^*$, while the unconstrained problem is completely independent of it. The results for the discontinuous diffusion and for the same range of the final tolerance are presented in Figure 4. In their main features, the results coincide with those obtained in the case of the constant diffusion coefficient. The novelty is the broken symmetry of the solution in the right part of the domain, where the discontinuity occurs. As a consequence, the center of initial mass is slightly shifted rightward, where diffusion processes are slower. This is logical, having in mind that in this region the initial mass can better approximate the characteristic function (of the support of $y^*$) during a larger period of time, as a small diffusion rate will not modify its form significantly.

Figure 2: Example 5.1, isotropic case. The initial control $u = y(0)$ (left), the computed solution at time $t = T/2$ compared with the desired trajectory $\omega$ (middle), and the optimal final state at $t = T$ compared with the target $y^*$ (right) for three different values of the tolerance $\varepsilon$.

Remark 5.2. It is interesting to explore the effect of the penalty $\alpha$ on the run of Algorithm 5.1. Increasing $\alpha > 0$ penalizes the oscillations in the solution $u_{opt}$. Ultimately, the size of the penalty $\alpha$ should be chosen adaptively depending on the discretization parameter $h$ and the estimated smoothness of the functions $w$ and $y^*$.

5.2. 2D heat equation on irregular domain. In the next example we repeat the same calculation in the 2D setting. To this end we use the Dirichlet Laplace operator defined on the L-shape domain $\Omega = [-1, 1]^2 \setminus ([-1, 0] \times [0, 1])$. We use Lagrange P1 elements (i.e. we approximate using piecewise linear and continuous functions). We have used a shape regular mesh with $h = 1/30$ and we choose $T = 1/20$. This results in the discretization matrix $A_h$ having 2953 degrees of freedom. Due to the reentrant corner of the L-shaped domain we have a loss of regularity of the functions in $\mathrm{Dom}(A)$, and so the resolvent estimate (4.5) holds with $\nu$, $0 < \nu < 1$. In the case of $H^2$-regular solutions we would have $\nu = 1$.

Figure 3: Example 5.1, unisotropic case. The plot of $u_{min}$ and $y^*$.

For the target data we choose
• $\omega(x) = \chi_{\|x - x_0\|_1 \le 0.2}$,
• $y^*(x) = e^{-20\|x - x_1\|_2} + e^{-20\|x - x_2\|_2} + e^{-30\|x - x_3\|_2}$,
with $x_0 = (-0.5, -0.5)$, $x_1 = (0.5, 0.5)$, $x_2 = (0.6, 0.1)$ and $x_3 = (0.8, 0.4)$ (Figure 5). The other parameters are the same as in the previous example. The results for the three values of the final tolerance $\varepsilon = [0.1, 0.5, 0.9]\,\Phi(0)$ are displayed in Figure 6. We show the solutions' snapshots at $t = 0, T/2, T$. The first row depicts the evolution of the state for small tolerance $\varepsilon$. The initial control steers the system close to the prescribed target $y^*$ (cf. Figure 5) at the final time, while there is no coincidence with $\omega$ in the intermediate period. For the tolerance 0.5 of the range of $\Phi$, equal importance is assigned both to $\omega$ and the final state. The largest tolerance allows the solution to optimize the given cost functional almost independently of the prescribed target $y^*$.

Essentially, the results exhibit the same behavior as those obtained in the previous examples. The elapsed time for computing $\Psi(0)$, the unconstrained problem, is 0.9828 seconds. This demonstrates the efficiency and flexibility of the method in 2D and, in particular, in the case of irregular domains. We chose to report the timing for $\Psi(0)$ as an example, since in this case it is possible to compute the value of $\Psi$ with other methods, such as those which are based on gradient optimization.

6. Conclusion. In this paper we have constructed and implemented a numerical algorithm for a constrained optimal control problem. The problem consists of identifying an initial datum that minimizes a given cost functional and steers the system at the final time within a prescribed distance from the target. The algorithm results in an (almost explicit) formula for the solution, expressed in terms of the operator governing the system. The formula itself was derived previously (cf. [16]), but its implementation was based on spectral decomposition, which requires knowledge or construction of eigenfunctions of the operator.

The main novelty of this article is twofold. Firstly, we provide a complete quantified sensitivity analysis of the solution with respect to all the data entering the problem. In particular, it implies a good approximation of the solution in cases where the operator or the external source are not completely determined. Secondly, for the numerical implementation we explore efficient Krylov subspace techniques that allow us to approximate a complex function of an operator by a series of linear problems. We provide a-priori estimates for the approximation that are sensitive neither to any particular spatial discretization nor to a matrix representation of the operator $A$. The theoretical results are confirmed by numerical examples. The first and the simplest example coincides with the one analysed in [16], and the results obtained with the two approaches are in complete agreement. The following, more complex examples confirm the good performance of the algorithm in the case of operators with variable coefficients and acting on irregular domains.

Figure 4: Example 5.1, discontinuous diffusion. The initial control $u = y(0)$ (left), the computed solution at time $t = T/2$ compared with the desired trajectory $\omega$ (middle), and the optimal final state compared with the target $y^*$ (right) for three different values of the tolerance $\varepsilon$.

The proposed approach can be generalised to other optimal control problems. The first step in this direction would be to consider a distributed control problem, i.e. one in which a control enters the equation through a non-homogeneous term and is active along the entire time frame. This would also allow for boundary control problems which, by using the classical Fattorini's approach [10], can be expressed as distributed ones. The second generalisation would consider different norms that enter into the cost functional. In particular, it would be tempting to include $L^1$-terms in the cost, since these introduce sparsity into the control. Of course, such a generalisation requires a more subtle theoretical analysis, as the cost functional is not differentiable in this case. This approach would also make it possible to consider different non-smooth, convex functionals.

Figure 5: Example 5.2. The prescribed final target $y^*$.

Acknowledgment. The work of L.G. has been supported by Hrvatska Zaklada za Znanost (Croatian Science Foundation) under the grant IP-2019-04-6268 - Randomized low rank algorithms and applications to parameter dependent problems. The work of M.L. and I.N. has been supported by Hrvatska Zaklada za Znanost (Croatian Science Foundation) under the grant IP-2016-06-2468 - Control of Dynamical Systems.

Figure 6: Example 5.2. The initial control $u = y(0)$ (left), the computed solution at time $t = T/2$ compared with the desired trajectory $\omega$ (middle), and the optimal final state (right) at $t = T$ for three different values of the tolerance $\varepsilon$. The red dashed circle marks the constraint $\omega$ on the trajectory.

REFERENCES
[1] H. Bauschke and P. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, CMS Books in Mathematics book series, Springer, Cham, 2nd ed., 2017, https://doi.org/10.1007/978-3-319-48311-5.
[2] M. Berljafa and S. Güttel, The RKFIT algorithm for nonlinear rational approximation, SIAM J. Sci. Comput., 39 (2017), pp. A2049–A2071, https://doi.org/10.1137/15M1025426.
[3] F. Boyer, On the penalised HUM approach and its applications to the numerical approximation of null-controls for parabolic problems, in CANUM 2012, 41e Congrès National d'Analyse Numérique, L. Chupin and A. Münch, eds., vol. 41 of ESAIM Proc., EDP Sciences, Les Ulis, 2013, pp. 15–58, https://doi.org/10.1051/proc/201341002.
[4] C. Carthel, R. Glowinski, and J. Lions, On exact and approximate boundary controllabilities for the heat equation: a numerical approach, J. Optim. Theory Appl., 82 (1994), pp. 429–484, https://doi.org/10.1007/BF02192213.
[5] E. Casas, B. Vexler, and E. Zuazua, Sparse initial data identification for parabolic PDE and its finite element approximations, Math. Control Relat. Fields, 5 (2015), pp. 377–399, https://doi.org/10.3934/mcrf.2015.5.377.
[6] G. Chen and Y. Xue, Perturbation analysis for the operator equation $Tx = b$ in Banach spaces, J. Math. Anal. Appl., 212 (1997), pp. 107–125.
[7] W. Cody, G. Meinardus, and R. Varga, Chebyshev rational approximations to $e^{x}$ in $[0, \infty)$ and applications to heat-conduction problems, J. Approx. Theory, 2 (1969), pp. 50–65, https://doi.org/10.1016/0021-9045(69)90030-6.
[8] K. Engel and R. Nagel, One-Parameter Semigroups for Linear Evolution Equations, vol. 194 of Graduate Texts in Mathematics, Springer, New York, 2000, https://doi.org/10.1007/b97696.
[9] C. Fabre, J. Puel, and E. Zuazua, On the density of the range of the semigroup for semilinear heat equations, in Control and Optimal Design of Distributed Parameter Systems, J. Lagnese, D. Russell, and L. White, eds., vol. 70 of The IMA Volumes in Mathematics and its Applications, Springer, New York, 1995, pp. 73–91, https://doi.org/10.1007/978-1-4613-8460-1_4.
[10] H. O. Fattorini, Boundary control systems, SIAM Journal on Control, 6 (1968), pp. 349–385, https://doi.org/10.1137/0306025.
[11] E. Fernández-Cara and A. Münch, Numerical exact controllability of the 1D heat equation: duality and Carleman weights, J. Optim. Theory Appl., 163 (2014), pp. 253–285, https://doi.org/10.1007/s10957-013-0517-z.
[12] E. Fernández-Cara and E. Zuazua, The cost of approximate controllability for heat equations: the linear case, Adv. Differential Equations, 5 (2000), pp. 465–514, https://doi.org/ade/1356651338.
[13] J. Gopalakrishnan, L. Grubišić, and J. Ovall, Spectral discretization errors in filtered subspace iteration, Math. Comp., 89 (2020), pp. 203–228, https://doi.org/10.1090/mcom/3483.
[14] T. Kato, Perturbation theory for linear operators, vol. 132 of Classics in Mathematics, Springer, Berlin, 1995, https://doi.org/10.1007/978-3-642-66282-9.
[15] I. Lasiecka and R. Triggiani, Control theory for partial differential equations: continuous and approximation theories, I Abstract parabolic systems, vol. 74 of Encyclopedia of Mathematics and its