

Fast Parallel Solver for the Space-Time IgA-DG Discretization

of the Diffusion Equation

Pietro Benedusi · Paola Ferrari · Carlo Garoni · Rolf Krause · Stefano Serra-Capizzano


Abstract We consider the space-time discretization of the diffusion equation, using an isogeometric analysis (IgA) approximation in space and a discontinuous Galerkin (DG) approximation in time. Drawing inspiration from a former spectral analysis, we propose for the resulting space-time linear system a multigrid preconditioned GMRES method, which combines a preconditioned GMRES with a standard multigrid acting only in space. The performance of the proposed solver is illustrated through numerical experiments, which show its competitiveness in terms of iteration count, run-time and parallel scaling.

Keywords isogeometric analysis · discontinuous Galerkin · preconditioned GMRES · multigrid · parallel solver · spectral distribution · diffusion equation

Mathematics Subject Classification (2010) 65M60 · 65F08 · 65M55 · 65Y05 · 47B06 · 35Q79

1 Introduction

In recent years, with ever increasing computational capacities, space-time methods have received fast growing attention from the scientific community. Space-time approximations of dynamic problems, in contrast to standard time-stepping techniques, enable full space-time parallelism on modern massively parallel architectures [27]. Moreover, they can naturally

P. Benedusi, R. Krause

University of Italian Switzerland (USI), Euler Institute, Lugano, Switzerland

E-mail: pietro.benedusi@usi.ch, rolf.krause@usi.ch

P. Ferrari

University of Insubria, Department of Science and High Technology, Como, Italy

E-mail: pferrari@uninsubria.it

C. Garoni

University of Rome Tor Vergata, Department of Mathematics, Rome, Italy

E-mail: garoni@mat.uniroma2.it

S. Serra-Capizzano

University of Insubria, Department of Humanities and Innovation, Como, Italy, and Uppsala University,

Department of Information Technology, Division of Scientific Computing, Uppsala, Sweden

E-mail: stefano.serrac@uninsubria.it, stefano.serra@it.uu.se


deal with moving domains [38,57,58,59,63] and allow for space-time adaptivity [1,24,28,39,47,49,61]. The main idea of space-time formulations is to consider the temporal dimension as an additional spatial one and assemble a large space-time system to be solved in parallel as in [25]. Space-time methods have been used in combination with various numerical techniques, including finite differences [2,11,35], finite elements [4,26,37,40], isogeometric analysis [34,41], and discontinuous Galerkin methods [1,16,32,37,38,48,57,63]. Moreover, they have been considered for a variety of applications, such as mechanics [15], fluid dynamics [11,38,54], fluid-structure interaction [60], and many others. When dealing with space-time finite elements, the time direction needs special care. To ensure that the information flows in the positive time direction, a particular choice of the basis in time is often used. The discontinuous Galerkin formulation with an "upwind" flow is a common choice in this context; see, for example, [38,51,57,62].

Specialized parallel solvers have recently been developed for the large linear systems arising from space-time discretizations. We mention in particular the space-time parallel multigrid proposed by Gander and Neumüller [29], the parallel preconditioners for space-time isogeometric analysis proposed by Hofer et al. [34], the fast diagonalization techniques proposed by Langer and Zank [42] and Loli et al. [44], and the parallel proposal by McDonald and Wathen [46]. We also refer the reader to [56] for a recent review on space-time methods for parabolic evolution equations, and to [55] for algebraic multigrid methods.

In the present paper, we focus on the diffusion equation
$$
\begin{aligned}
\partial_t u(t,x)-\nabla\cdot K(x)\nabla u(t,x)&=f(t,x), & (t,x)&\in(0,T)\times(0,1)^d,\\
u(t,x)&=0, & (t,x)&\in(0,T)\times\partial((0,1)^d),\\
u(t,x)&=0, & (t,x)&\in\{0\}\times(0,1)^d,
\end{aligned} \tag{1.1}
$$
where $K(x)\in\mathbb R^{d\times d}$ is the matrix of diffusion coefficients and $f(t,x)$ is a source term. It is assumed that $K(x)$ is symmetric positive definite at every point $x\in(0,1)^d$ and each component of $K(x)$ is a continuous bounded function on $(0,1)^d$. We impose homogeneous Dirichlet initial/boundary conditions both for simplicity and because the inhomogeneous case reduces to the homogeneous case by considering a lifting of the boundary data [50]. We consider for (1.1) the same space-time approximation as in [10], involving a $\mathbf p$-degree $C^{\mathbf k}$ isogeometric analysis (IgA) discretization in space and a $q$-degree discontinuous Galerkin (DG) discretization in time. Here, $\mathbf p=(p_1,\ldots,p_d)$ and $\mathbf k=(k_1,\ldots,k_d)$, where $0\le\mathbf k\le\mathbf p-1$ (i.e., $0\le k_i\le p_i-1$ for all $i=1,\ldots,d$) and the parameters $p_i$ and $k_i$ represent, respectively, the polynomial degree and the smoothness of the IgA basis functions in direction $x_i$.

The overall discretization process leads to solving a large space-time linear system. We propose a fast solver for this system in the case of maximal smoothness $\mathbf k=\mathbf p-1$, i.e., the case corresponding to the classical IgA paradigm [3,9,17,36]. The solver is a preconditioned GMRES (PGMRES) method whose preconditioner $\tilde P$ is obtained as an approximation of another preconditioner $P$ inspired by the spectral analysis carried out in [10]. Informally speaking, the preconditioner $\tilde P$ is a standard multigrid, which is applied only in space and not in time, and which involves, at all levels, a single symmetric Gauss–Seidel post-smoothing step and standard bisection for the interpolation and restriction operators (following the Galerkin assembly). The proposed solver is then a multigrid preconditioned GMRES (MG-GMRES). Its performance is illustrated through numerical experiments and turns out to be satisfactory in terms of iteration count and run-time. In addition, the solver is suited for parallel computation as it shows remarkable scaling properties with respect to the number of cores. Comparisons with other benchmark solvers are also presented and reveal the actual competitiveness of our proposal.


The paper is organized as follows. In Section 2, we briefly recall the space-time IgA-DG discretization of (1.1) and we report the main result of [10] concerning the spectral distribution of the associated discretization matrix $C$. In Section 3, we present a PGMRES method for the matrix $C$, which is the root from which the proposed solver originated. In Section 4, we describe the proposed solver. In Section 5, we describe its parallel version. In Section 6, we illustrate its performance in terms of iteration count, run-time and scaling. In Section 7, we test it on a generalization of problem (1.1) where $(0,1)^d$ is replaced by a non-rectangular domain and the considered IgA discretization involves a non-trivial geometry. In Section 8, we draw conclusions. In order to keep this paper as concise as possible, we borrow notation and terminology from [10]. It is therefore recommended that the reader takes a look at Sections 1 and 2 of [10].

2 Space-Time IgA-DG Discretization of the Diffusion Equation

Let $N\in\mathbb N$ and $\mathbf n=(n_1,\ldots,n_d)\in\mathbb N^d$, and define the following uniform partitions in time and space:
$$
t_i=i\Delta t,\qquad i=0,\ldots,N,\qquad \Delta t=T/N,
$$
$$
\mathbf x_{\mathbf i}=\mathbf i\,\Delta\mathbf x=(i_1\Delta x_1,\ldots,i_d\Delta x_d),\qquad \mathbf i=\mathbf 0,\ldots,\mathbf n,\qquad \Delta\mathbf x=(\Delta x_1,\ldots,\Delta x_d)=(1/n_1,\ldots,1/n_d).
$$
We consider for the differential problem (1.1) the same space-time discretization as in [10], i.e., we use a $\mathbf p$-degree $C^{\mathbf k}$ IgA approximation in space based on the uniform mesh $\{\mathbf x_{\mathbf i},\ \mathbf i=\mathbf 0,\ldots,\mathbf n\}$ and a $q$-degree DG approximation in time based on the uniform mesh $\{t_i,\ i=0,\ldots,N\}$. Here, $\mathbf p=(p_1,\ldots,p_d)$ and $\mathbf k=(k_1,\ldots,k_d)$ are multi-indices, with $p_i$ and $0\le k_i\le p_i-1$ representing, respectively, the polynomial degree and the smoothness of the IgA basis functions in direction $x_i$. As explained in [10, Section 3], the overall discretization process leads to a linear system
$$
C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)\,\mathbf u=\mathbf f, \tag{2.1}
$$

where:

– $C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$ is the $N\times N$ block matrix given by
$$
C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)=
\begin{bmatrix}
A^{[q,\mathbf p,\mathbf k]}_{\mathbf n}(K) & & & \\
B^{[q,\mathbf p,\mathbf k]}_{\mathbf n} & A^{[q,\mathbf p,\mathbf k]}_{\mathbf n}(K) & & \\
 & \ddots & \ddots & \\
 & & B^{[q,\mathbf p,\mathbf k]}_{\mathbf n} & A^{[q,\mathbf p,\mathbf k]}_{\mathbf n}(K)
\end{bmatrix}; \tag{2.2}
$$

– the blocks $A^{[q,\mathbf p,\mathbf k]}_{\mathbf n}(K)$ and $B^{[q,\mathbf p,\mathbf k]}_{\mathbf n}$ are $(q+1)\bar n\times(q+1)\bar n$ matrices given by
$$
A^{[q,\mathbf p,\mathbf k]}_{\mathbf n}(K)=K_{[q]}\otimes M_{\mathbf n,[\mathbf p,\mathbf k]}+\frac{\Delta t}{2}\,M_{[q]}\otimes K_{\mathbf n,[\mathbf p,\mathbf k]}(K), \tag{2.3}
$$
$$
B^{[q,\mathbf p,\mathbf k]}_{\mathbf n}=-J_{[q]}\otimes M_{\mathbf n,[\mathbf p,\mathbf k]}, \tag{2.4}
$$
where $\bar n=\prod_{i=1}^d(n_i(p_i-k_i)+k_i-1)$ is the number of degrees of freedom (DoFs) in space (the total number of DoFs is equal to the size $N(q+1)\bar n$ of the matrix $C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$); each block row in the block partition of $C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$ given by (2.2) is referred to as a time slab;


– $M_{\mathbf n,[\mathbf p,\mathbf k]}$ and $K_{\mathbf n,[\mathbf p,\mathbf k]}(K)$ are the $\bar n\times\bar n$ mass and stiffness matrices in space, which are given by
$$
M_{\mathbf n,[\mathbf p,\mathbf k]}=\left[\int_{[0,1]^d}B_{\mathbf j+1,[\mathbf p,\mathbf k]}(x)\,B_{\mathbf i+1,[\mathbf p,\mathbf k]}(x)\,\mathrm dx\right]_{\mathbf i,\mathbf j=1}^{\mathbf n(\mathbf p-\mathbf k)+\mathbf k-1}, \tag{2.5}
$$
$$
K_{\mathbf n,[\mathbf p,\mathbf k]}(K)=\left[\int_{[0,1]^d}K(x)\nabla B_{\mathbf j+1,[\mathbf p,\mathbf k]}(x)\cdot\nabla B_{\mathbf i+1,[\mathbf p,\mathbf k]}(x)\,\mathrm dx\right]_{\mathbf i,\mathbf j=1}^{\mathbf n(\mathbf p-\mathbf k)+\mathbf k-1}, \tag{2.6}
$$
where $B_{1,[\mathbf p,\mathbf k]},\ldots,B_{\mathbf n(\mathbf p-\mathbf k)+\mathbf k+1,[\mathbf p,\mathbf k]}$ are the tensor-product B-splines defined by
$$
B_{\mathbf i,[\mathbf p,\mathbf k]}(x)=\prod_{r=1}^d B_{i_r,[p_r,k_r]}(x_r),\qquad \mathbf i=1,\ldots,\mathbf n(\mathbf p-\mathbf k)+\mathbf k+1,
$$
and $B_{1,[p_r,k_r]},\ldots,B_{n_r(p_r-k_r)+k_r+1,[p_r,k_r]}$ are the B-splines of degree $p_r$ and smoothness $C^{k_r}$ defined on the knot sequence
$$
\underbrace{0,\ldots,0}_{p_r+1},\ \underbrace{\tfrac1{n_r},\ldots,\tfrac1{n_r}}_{p_r-k_r},\ \underbrace{\tfrac2{n_r},\ldots,\tfrac2{n_r}}_{p_r-k_r},\ \ldots,\ \underbrace{\tfrac{n_r-1}{n_r},\ldots,\tfrac{n_r-1}{n_r}}_{p_r-k_r},\ \underbrace{1,\ldots,1}_{p_r+1};
$$

– $M_{[q]}$, $K_{[q]}$, $J_{[q]}$ are the $(q+1)\times(q+1)$ blocks given by
$$
M_{[q]}=\left[\int_{-1}^{1}\ell_{j,[q]}(\tau)\,\ell_{i,[q]}(\tau)\,\mathrm d\tau\right]_{i,j=1}^{q+1}, \tag{2.7}
$$
$$
K_{[q]}=\left[\ell_{j,[q]}(1)\,\ell_{i,[q]}(1)-\int_{-1}^{1}\ell_{j,[q]}(\tau)\,\ell'_{i,[q]}(\tau)\,\mathrm d\tau\right]_{i,j=1}^{q+1}, \tag{2.8}
$$
$$
J_{[q]}=\left[\ell_{j,[q]}(1)\,\ell_{i,[q]}(-1)\right]_{i,j=1}^{q+1}, \tag{2.9}
$$
where $\{\ell_{1,[q]},\ldots,\ell_{q+1,[q]}\}$ is a fixed basis for the space of polynomials of degree $\le q$. In the context of (nodal) DG methods [33], $\ell_{1,[q]},\ldots,\ell_{q+1,[q]}$ are often chosen as the Lagrange polynomials associated with $q+1$ fixed points $\{\tau_1,\ldots,\tau_{q+1}\}\subseteq[-1,1]$, such as, for example, the Gauss–Lobatto or the right Gauss–Radau nodes in $[-1,1]$.
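To make the structure above concrete, the following minimal Python sketch (assuming NumPy/SciPy; the actual implementation used in this paper is based on PETSc and PetIGA, see Section 6.1) assembles the temporal blocks (2.7)–(2.9) for a Lagrange basis at $q+1$ given nodes via Gauss–Legendre quadrature, and then builds the space-time matrix (2.2)–(2.4) by Kronecker products. The spatial matrices `M_space` and `K_space` are user-supplied placeholders for (2.5)–(2.6).

    import numpy as np
    import scipy.sparse as sp
    from scipy.interpolate import lagrange  # Lagrange polynomial through given points

    def temporal_blocks(q, nodes):
        """Assemble M[q], K[q], J[q] of (2.7)-(2.9) for the Lagrange basis at `nodes`."""
        ell = [lagrange(nodes, np.eye(q + 1)[i]) for i in range(q + 1)]  # l_i(tau_j) = delta_ij
        dell = [l.deriv() for l in ell]
        tau, w = np.polynomial.legendre.leggauss(q + 2)  # exact up to degree 2q+3
        M = np.array([[np.sum(w * lj(tau) * li(tau)) for lj in ell] for li in ell])
        K = np.array([[lj(1.0) * li(1.0) - np.sum(w * lj(tau) * dli(tau)) for lj in ell]
                      for li, dli in zip(ell, dell)])
        J = np.array([[lj(1.0) * li(-1.0) for lj in ell] for li in ell])
        return M, K, J

    def spacetime_matrix(N, dt, Mq, Kq, Jq, M_space, K_space):
        """Assemble (2.2) as I_N (x) A + S_N (x) B, with A, B as in (2.3)-(2.4)."""
        A = sp.kron(Kq, M_space) + 0.5 * dt * sp.kron(Mq, K_space)  # (2.3)
        B = -sp.kron(Jq, M_space)                                   # (2.4)
        S_N = sp.diags([np.ones(N - 1)], [-1])  # lower shift: puts B on the first subdiagonal
        return (sp.kron(sp.identity(N), A) + sp.kron(S_N, B)).tocsr()

For instance, for $q=0$ with the single right Gauss–Radau node $\{1\}$, `temporal_blocks` returns $M_{[0]}=[2]$, $K_{[0]}=[1]$, $J_{[0]}=[1]$, so that (2.2) reduces to a backward-Euler-type time stepping.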

The solution of system (2.1) yields the approximate solution of problem (1.1); see [10] for details. The main result of [10] is reported in Theorem 2.1 below; see also [8, Section 6.2] for a more recent and lucid proof. Before stating Theorem 2.1, let us recall the notion of spectral distribution for a given sequence of matrices. In what follows, we say that a matrix-valued function $f:D\to\mathbb C^{s\times s}$, defined on a measurable set $D\subseteq\mathbb R^\ell$, is measurable if its components $f_{ij}:D\to\mathbb C$, $i,j=1,\ldots,s$, are (Lebesgue) measurable.

Definition 2.1 Let $\{X_m\}_m$ be a sequence of matrices, with $X_m$ of size $d_m$ tending to infinity, and let $f:D\to\mathbb C^{s\times s}$ be a measurable matrix-valued function defined on a set $D\subset\mathbb R^\ell$ with $0<\mathrm{measure}(D)<\infty$. We say that $\{X_m\}_m$ has an (asymptotic) spectral distribution described by $f$, and we write $\{X_m\}_m\sim_\lambda f$, if
$$
\lim_{m\to\infty}\frac1{d_m}\sum_{j=1}^{d_m}F(\lambda_j(X_m))=\frac1{\mathrm{measure}(D)}\int_D\frac{\sum_{i=1}^sF(\lambda_i(f(y)))}{s}\,\mathrm dy
$$
for all continuous functions $F:\mathbb C\to\mathbb C$ with compact support. In this case, $f$ is called the spectral symbol of $\{X_m\}_m$.


Remark 2.1 The informal meaning behind Definition 2.1 is the following: assuming that $f$ possesses $s$ Riemann-integrable eigenvalue functions $\lambda_i(f(y))$, $i=1,\ldots,s$, the eigenvalues of $X_m$, except possibly for $o(d_m)$ outliers, can be subdivided into $s$ different subsets of approximately the same cardinality; and the eigenvalues belonging to the $i$th subset are approximately equal to the samples of the $i$th eigenvalue function $\lambda_i(f(y))$ over a uniform grid in the domain $D$. For instance, if $\ell=1$, $d_m=ms$, and $D=[a,b]$, then, assuming we have no outliers, the eigenvalues of $X_m$ are approximately equal to
$$
\lambda_i\Bigl(f\Bigl(a+j\,\frac{b-a}{m}\Bigr)\Bigr),\qquad j=1,\ldots,m,\quad i=1,\ldots,s,
$$
for $m$ large enough; similarly, if $\ell=2$, $d_m=m^2s$, and $D=[a_1,b_1]\times[a_2,b_2]$, then, assuming we have no outliers, the eigenvalues of $X_m$ are approximately equal to
$$
\lambda_i\Bigl(f\Bigl(a_1+j_1\,\frac{b_1-a_1}{m},\ a_2+j_2\,\frac{b_2-a_2}{m}\Bigr)\Bigr),\qquad j_1,j_2=1,\ldots,m,\quad i=1,\ldots,s,
$$
for $m$ large enough; and so on for $\ell\ge3$.
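As a toy illustration of Definition 2.1 (unrelated to the IgA-DG symbol of Theorem 2.1 below), the following sketch, assuming NumPy, checks that the eigenvalues of the $m\times m$ tridiagonal Toeplitz matrix with stencil $[-1,2,-1]$ match uniform samples of its scalar symbol $f(\theta)=2-2\cos\theta$ on $D=[0,\pi]$:

    import numpy as np

    m = 200
    # m x m tridiagonal Toeplitz matrix with stencil [-1, 2, -1] (1D Laplacian)
    X = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    eigs = np.sort(np.linalg.eigvalsh(X))

    # Uniform samples of the symbol f(theta) = 2 - 2 cos(theta) on D = [0, pi]
    theta = np.arange(1, m + 1) * np.pi / (m + 1)
    samples = np.sort(2 - 2 * np.cos(theta))

    # For this matrix the eigenvalues coincide exactly with the symbol samples
    print(np.max(np.abs(eigs - samples)))   # ~ 1e-13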

Theorem 2.1 Let $q\ge0$ be an integer, let $\mathbf p\in\mathbb N^d$ and $0\le\mathbf k\le\mathbf p-1$. Assume that $K(x)$ is symmetric positive definite at every point $x\in(0,1)^d$ and each component of $K(x)$ is a continuous bounded function on $(0,1)^d$. Suppose the following two conditions are met:

– $\mathbf n=\boldsymbol\alpha n$, where $\boldsymbol\alpha=(\alpha_1,\ldots,\alpha_d)$ is a vector with positive components in $\mathbb Q^d$ and $n$ varies in some infinite subset of $\mathbb N$ such that $\mathbf n=\boldsymbol\alpha n\in\mathbb N^d$;
– $N=N(n)$ is such that $N\to\infty$ and $N/n^2\to0$ as $n\to\infty$.

Then, for the sequence of normalized space-time matrices $\{2Nn^{d-2}C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)\}_n$ we have the spectral distribution relation
$$
\{2Nn^{d-2}C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)\}_n\sim_\lambda f^{[\boldsymbol\alpha,K]}_{[q,\mathbf p,\mathbf k]},
$$
where:

– the spectral symbol $f^{[\boldsymbol\alpha,K]}_{[q,\mathbf p,\mathbf k]}:[0,1]^d\times[-\pi,\pi]^d\to\mathbb C^{(q+1)\prod_{i=1}^d(p_i-k_i)\times(q+1)\prod_{i=1}^d(p_i-k_i)}$ is defined as
$$
f^{[\boldsymbol\alpha,K]}_{[q,\mathbf p,\mathbf k]}(x,\boldsymbol\theta)=f^{[\boldsymbol\alpha,K]}_{[\mathbf p,\mathbf k]}(x,\boldsymbol\theta)\otimes TM_{[q]}; \tag{2.10}
$$
– $f^{[\boldsymbol\alpha,K]}_{[\mathbf p,\mathbf k]}:[0,1]^d\times[-\pi,\pi]^d\to\mathbb C^{\prod_{i=1}^d(p_i-k_i)\times\prod_{i=1}^d(p_i-k_i)}$ is defined as
$$
f^{[\boldsymbol\alpha,K]}_{[\mathbf p,\mathbf k]}(x,\boldsymbol\theta)=\frac1{\prod_{i=1}^d\alpha_i}\sum_{i,j=1}^d\alpha_i\alpha_jK_{ij}(x)\,(H_{[\mathbf p,\mathbf k]})_{ij}(\boldsymbol\theta); \tag{2.11}
$$
– $H_{[\mathbf p,\mathbf k]}$ is a $d\times d$ block matrix whose $(i,j)$ entry is a $\prod_{i=1}^d(p_i-k_i)\times\prod_{i=1}^d(p_i-k_i)$ block defined as in [10, eq. (5.12)];
– $T$ is the final time in (1.1) and $M_{[q]}$ is given in (2.7).

With the same argument used for proving Theorem 2.1, it is not difficult to prove the following result.

Theorem 2.2 Suppose the hypotheses of Theorem 2.1 are satisfied, and let
$$
Q^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)=\frac{\Delta t}{2}\,I_N\otimes M_{[q]}\otimes K_{\mathbf n,[\mathbf p,\mathbf k]}(K).
$$
Then,
$$
\{2Nn^{d-2}(I_N\otimes A^{[q,\mathbf p,\mathbf k]}_{\mathbf n}(K))\}_n\sim_\lambda f^{[\boldsymbol\alpha,K]}_{[q,\mathbf p,\mathbf k]},\qquad
\{2Nn^{d-2}Q^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)\}_n\sim_\lambda f^{[\boldsymbol\alpha,K]}_{[q,\mathbf p,\mathbf k]}.
$$


Table 3.1: Number of iterations GM[p] and PGM[p] needed by, respectively, the GMRES and the PGMRES with preconditioner $P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$, for solving the linear system (2.1), up to a precision $\varepsilon=10^{-8}$, in the case where $d=2$, $K(x)=I_2$, $f(t,x)=1$, $T=1$, $q=0$, $\mathbf n=(n,n)$, $\mathbf p=(p,p)$, $\mathbf k=(p-1,p-1)$, $N=n$. The total size of the space-time system (number of DoFs) is given by $n\bar n=n(n+p-2)^2$.

 n=N   GM[3]  PGM[3]   GM[4]  PGM[4]   GM[5]  PGM[5]
  20     66     21       85     21      170     21
  40    168     40      178     40      235     40
  60    295     59      314     59      360     59
  80    443     77      473     77      506     77
 100    609     94      652     94      699     94
 120    790    111      847    111      909    111

 n=N   GM[6]  PGM[6]   GM[7]  PGM[7]   GM[8]  PGM[8]
  20    269     21      532     21      674     21
  40    380     40      572     40      656     40
  60    477     59      611     59      690     59
  80    621     77      720     77      791     77
 100    780     94      879     94      963     94
 120    971    111     1025    111     1114    111

3 PGMRES for the Space-Time IgA-DG System

Suppose the hypotheses of Theorem 2.1 are satisfied. Then, on the basis of Theorem 2.2 and the theory of (block) generalized locally Toeplitz (GLT) sequences [7,8,30,31,52,53], we expect that the sequence of preconditioned matrices
$$
(I_N\otimes A^{[q,\mathbf p,\mathbf k]}_{\mathbf n}(K))^{-1}\,C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K), \tag{3.1}
$$
as well as the sequence of preconditioned matrices
$$
(Q^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K))^{-1}\,C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)=\frac2{\Delta t}\,(I_N\otimes M_{[q]}\otimes K_{\mathbf n,[\mathbf p,\mathbf k]}(K))^{-1}\,C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K), \tag{3.2}
$$
has an asymptotic spectral distribution described by the preconditioned symbol
$$
\bigl(f^{[\boldsymbol\alpha,K]}_{[q,\mathbf p,\mathbf k]}\bigr)^{-1}f^{[\boldsymbol\alpha,K]}_{[q,\mathbf p,\mathbf k]}=I_{(q+1)\prod_{i=1}^d(p_i-k_i)}.
$$

This means that the eigenvalues of the two sequences of matrices (3.1) and (3.2) are (weakly) clustered at 1; see [7, Section 2.4.2]. Therefore, in view of the convergence properties of the GMRES method [13]—see in particular [13, Theorem 2.13] and the original research paper by Bertaccini and Ng [14]—we may expect that the PGMRES with preconditioner $I_N\otimes A^{[q,\mathbf p,\mathbf k]}_{\mathbf n}(K)$ or $Q^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$ for solving a linear system with coefficient matrix $C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$ has an optimal convergence rate, i.e., the number of iterations for reaching a preassigned accuracy $\varepsilon$ is independent of (or only weakly dependent on) the matrix size. We may also expect that the same is true for the PGMRES with preconditioner
$$
P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)=I_N\otimes I_{q+1}\otimes K_{\mathbf n,[\mathbf p,\mathbf k]}(K)=I_{N(q+1)}\otimes K_{\mathbf n,[\mathbf p,\mathbf k]}(K), \tag{3.3}
$$

because (up to a negligible normalization factor $\Delta t/2$) $P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$ is spectrally equivalent to $Q^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$. Indeed, the spectrum of $(P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K))^{-1}(I_N\otimes M_{[q]}\otimes K_{\mathbf n,[\mathbf p,\mathbf k]}(K))$ is contained in $[c_q,C_q]$ for some positive constants $c_q,C_q>0$ depending only on $q$. For instance, one can take $c_q=\lambda_{\min}(M_{[q]})$ and $C_q=\lambda_{\max}(M_{[q]})$, which are both positive as $M_{[q]}$ is symmetric positive definite (see (2.7)).

Table 3.2: Number of iterations GM[p,k] and PGM[p,k] needed by, respectively, the GMRES and the PGMRES with preconditioner $P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$, for solving the linear system (2.1), up to a precision $\varepsilon=10^{-8}$, in the case where $d=2$,
$$
K(x_1,x_2)=\begin{bmatrix}\cos(x_1)+x_2 & 0\\ 0 & x_1+\sin(x_2)\end{bmatrix},
$$
$f(t,x)=1$, $T=1$, $q=1$, $\mathbf n=(n,n)$, $\mathbf p=(p,p)$, $\mathbf k=(k,k)$, $N=20$. The number of DoFs is given by $40\bar n=40(n(p-k)+k-1)^2$. Note that $K(x_1,x_2)$ is singular at $(x_1,x_2)=(0,0)$.

   n   GM[1,0] PGM[1,0]  GM[2,0] PGM[2,0]  GM[2,1] PGM[2,1]  GM[3,1] PGM[3,1]
  20      244      42       383      42       156      42       276      42
  40      502      42       778      42       314      42       560      42
  60      763      42      1174      42       474      42       842      42
  80     1026      42      1570      42       635      42      1146      42
 100     1275      42      1966      42       796      42      1894      42
 120     1608      42      2374      42       954      42      1898      42

   n   GM[4,1] PGM[4,1]  GM[4,2] PGM[4,2]  GM[5,2] PGM[5,2]  GM[5,3] PGM[5,3]
  20      444      42       390      42       522      42       514      42
  40      759      42       565      42       721      42       643      42
  60     1148      42       771      42       953      42       831      42
  80     1536      42      1035      42      1337      42      1026      42
 100     1909      42      1299      42      2232      42      1226      42
 120     2329      42      1564      42      2390      42      1831      42
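In fact, this containment is sharp: by the mixed-product property of the Kronecker product, $(P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K))^{-1}(I_N\otimes M_{[q]}\otimes K_{\mathbf n,[\mathbf p,\mathbf k]}(K))=I_N\otimes M_{[q]}\otimes I_{\bar n}$, whose spectrum is exactly that of $M_{[q]}$. A minimal numerical check of this identity (assuming NumPy, and reusing the illustrative `temporal_blocks` helper sketched in Section 2) is:

    import numpy as np

    q, N, nbar = 2, 3, 4
    nodes = np.linspace(-1.0, 1.0, q + 1)     # any q+1 distinct nodes work here
    Mq, _, _ = temporal_blocks(q, nodes)      # illustrative helper from Section 2

    K = np.diag(np.arange(1.0, nbar + 1))     # stand-in SPD spatial stiffness matrix
    P = np.kron(np.eye(N * (q + 1)), K)       # preconditioner (3.3)
    Q0 = np.kron(np.eye(N), np.kron(Mq, K))   # I_N (x) M[q] (x) K (without dt/2)

    eigs = np.linalg.eigvals(np.linalg.solve(P, Q0)).real
    print(eigs.min(), eigs.max())             # = lambda_min / lambda_max of M[q]
    print(np.linalg.eigvalsh(Mq))             # compare with the spectrum of M[q]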

To show that our expectation is realized, we solve system (2.1) in two space dimensions ($d=2$), up to a precision $\varepsilon=10^{-8}$, by means of the GMRES and the PGMRES with preconditioner $P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$, using $f(t,x)=1$, $T=1$, $\boldsymbol\alpha=(1,1)$, $\mathbf n=\boldsymbol\alpha n=(n,n)$, $\mathbf p=(p,p)$, $\mathbf k=(k,k)$, and varying $K(x)$, $N$, $n$, $q$, $p$, $k$. The resulting numbers of iterations are collected in Tables 3.1–3.3. We see from the tables that the GMRES solver rapidly deteriorates with increasing $n$, and it is not robust with respect to $p,k$. On the other hand, the convergence rate of the proposed PGMRES is robust with respect to all spatial parameters $n,p,k$, though its performance is clearly better in the case where $N$ is fixed (Tables 3.2–3.3) than in the case where $N$ increases (Table 3.1). An explanation of this phenomenon based on Theorem 2.1 is the following. In the case where $N$ is fixed, the ratio $N/n^2$ converges to 0 much more quickly than in the case where $N=n$. Consequently, when $N$ is fixed, the spectrum of both $2Nn^{d-2}C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$ and $2Nn^{d-2}Q^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$ is better described by the symbol $f^{[\boldsymbol\alpha,K]}_{[q,\mathbf p,\mathbf k]}$ than when $N=n$. Similarly, the spectrum of the preconditioned matrix $(Q^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K))^{-1}C^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$ is better described by the preconditioned symbol $I_{(q+1)\prod_{i=1}^d(p_i-k_i)}$. In conclusion, the eigenvalues of the preconditioned matrix are expected to be more clustered when $N$ is fixed than when $N=n$.

Table 3.3: Number of iterations GM[p,k] and PGM[p,k] needed by, respectively, the GMRES and the PGMRES with preconditioner $P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$, for solving the linear system (2.1), up to a precision $\varepsilon=10^{-8}$, in the case where $d=2$,
$$
K(x_1,x_2)=\begin{bmatrix}(2+\cos x_1)(1+x_2) & \cos(x_1+x_2)\sin(x_1+x_2)\\ \cos(x_1+x_2)\sin(x_1+x_2) & (2+\sin x_2)(1+x_1)\end{bmatrix},
$$
$f(t,x)=1$, $T=1$, $q=2$, $\mathbf n=(n,n)$, $\mathbf p=(p,p)$, $\mathbf k=(k,k)$, $N=20$. The number of DoFs is given by $60\bar n=60(n(p-k)+k-1)^2$.

   n   GM[2,0] PGM[2,0]  GM[2,1] PGM[2,1]  GM[3,0] PGM[3,0]  GM[3,2] PGM[3,2]
  20      286      40       112      40       400      40       123      40
  40      579      40       228      40       809      40       224      40
  60      874      40       345      40      1218      40       339      40
  80     1170      40       463      40      1716      40       456      40
 100     1466      40       580      40      2204      40       573      40
 120     1757      40       697      40      2487      40       690      40

   n   GM[4,0] PGM[4,0]  GM[4,3] PGM[4,3]  GM[5,0] PGM[5,0]  GM[5,4] PGM[5,4]
  20      779      40       208      40      1460      40       396      40
  40     1070      40       270      40      1982      40       419      40
  60     1580      40       361      40      2376      40       466      40
  80     2176      40       487      40      2733      40       531      40
 100     2668      40       613      40      3559      40       657      40
 120     3284      40       738      40      4565      40       791      40

In order to investigate the influence of $q$ on the number of PGMRES iterations, we performed a further numerical experiment in Table 3.4. We observe that the considered PGMRES is not robust with respect to $q$, but the number of PGMRES iterations grows linearly with $q$. By comparing Tables 3.1 and 3.4, we note that the PGMRES convergence is linear with respect to both $N$ and $q$. In practice, increasing $q$ is the most convenient way to improve the temporal accuracy of the discrete solution $\mathbf u$; see, e.g., [12]. This is due to the superconvergence property, according to which the order of convergence in time of a $q$-degree DG method is $2q+1$ [19,43]. Tables 3.1 and 3.4 show that the strategy of keeping $N$ fixed and increasing $q$ is more convenient even in terms of performance of the proposed PGMRES.

Table 3.4: Same setting as in Table 3.1 with $n=N=20$ and $q=0,1,2,3,4$.

   q   GM[3]  PGM[3]   GM[4]  PGM[4]   GM[5]  PGM[5]
   0     66     21       85     21      170     21
   1    122     42      154     42      280     42
   2    175     64      225     64      391     64
   3    222     95      289     95      464     95
   4    247    115      351    115      602    116

   q   GM[6]  PGM[6]   GM[7]  PGM[7]   GM[8]  PGM[8]
   0    269     21      532     21      674     21
   1    446     42      688     42      834     42
   2    491     64      580     64      672     64
   3    616     95      916     95     1103     95
   4   1031    116     1927    116     5468    116

As is well known, each PGMRES iteration requires solving a linear system with coefficient matrix given by the preconditioner $P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$, and this is not required in a GMRES iteration. Thus, if we want to prove that the proposed PGMRES is fast, we have to show that we are able to solve efficiently a linear system with matrix $P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$. However, for the reasons explained in Section 4, this is not exactly the path we will follow.

Before moving on to Section 4, we remark that, thanks to the tensor structure (3.3), the solution of a linear system with coefficient matrix $P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)$ reduces to the solution of $N(q+1)$ linear systems with coefficient matrix $K_{\mathbf n,[\mathbf p,\mathbf k]}(K)$. Indeed, the solution of the system $P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K)\,\mathbf x=\mathbf y$ is given by
$$
\mathbf x=(P^{[q,\mathbf p,\mathbf k]}_{N,\mathbf n}(K))^{-1}\mathbf y=(I_{N(q+1)}\otimes K_{\mathbf n,[\mathbf p,\mathbf k]}(K)^{-1})\,\mathbf y=
\begin{bmatrix}K_{\mathbf n,[\mathbf p,\mathbf k]}(K)^{-1}\mathbf y_1\\ \vdots\\ K_{\mathbf n,[\mathbf p,\mathbf k]}(K)^{-1}\mathbf y_{N(q+1)}\end{bmatrix}, \tag{3.4}
$$
where $\mathbf y^T=[\mathbf y_1^T,\ldots,\mathbf y_{N(q+1)}^T]$ and each $\mathbf y_i$ has length $\bar n$. It is then clear that the computation of the solution $\mathbf x$ is equivalent to solving the $N(q+1)$ linear systems $K_{\mathbf n,[\mathbf p,\mathbf k]}(K)\,\mathbf x_i=\mathbf y_i$, $i=1,\ldots,N(q+1)$. Note that the various $\mathbf x_i$ can be computed in parallel as the computation of $\mathbf x_i$ is independent of the computation of $\mathbf x_j$ whenever $i\ne j$.
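For illustration, a minimal serial sketch of this block-wise application of (3.4) (assuming SciPy; the sparse LU factorization of the spatial matrix is computed once and reused for all $N(q+1)$ right-hand sides, and the loop over blocks is embarrassingly parallel) is:

    import numpy as np
    import scipy.sparse.linalg as spla

    def apply_P_inv(K_space, y, N, q):
        """Solve P x = y with P = I_{N(q+1)} (x) K_space, block by block as in (3.4)."""
        nbar = K_space.shape[0]
        lu = spla.splu(K_space.tocsc())         # factor the spatial matrix once
        blocks = y.reshape(N * (q + 1), nbar)   # y = [y_1; ...; y_{N(q+1)}]
        return np.concatenate([lu.solve(b) for b in blocks])  # independent solves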

4 Fast Solver for the Space-Time IgA-DG System

From here on, we focus on the maximal smoothness case $\mathbf k=\mathbf p-1$, that is, the case corresponding to the classical IgA approach. For notational simplicity, we drop the subscript/superscript $\mathbf k=\mathbf p-1$, so that, for instance, the matrices $C^{[q,\mathbf p,\mathbf p-1]}_{N,\mathbf n}(K)$, $P^{[q,\mathbf p,\mathbf p-1]}_{N,\mathbf n}(K)$, $K_{\mathbf n,[\mathbf p,\mathbf p-1]}(K)$ will be denoted by $C^{[q,\mathbf p]}_{N,\mathbf n}(K)$, $P^{[q,\mathbf p]}_{N,\mathbf n}(K)$, $K_{\mathbf n,[\mathbf p]}(K)$, respectively.

The solver suggested in Section 3 for a linear system with matrix $C^{[q,\mathbf p]}_{N,\mathbf n}(K)$ is a PGMRES with preconditioner $P^{[q,\mathbf p]}_{N,\mathbf n}(K)$. According to (3.4), the solution of a linear system with matrix $P^{[q,\mathbf p]}_{N,\mathbf n}(K)$, which is required at each PGMRES iteration, is equivalent to solving $N(q+1)$ linear systems with matrix $K_{\mathbf n,[\mathbf p]}(K)$. Fast solvers for $K_{\mathbf n,[\mathbf p]}(K)$ that have been proposed in recent papers (see [20,21,22] and references therein) might be employed here. However, using an exact solver for $K_{\mathbf n,[\mathbf p]}(K)$ is not what we have in mind. Indeed, it was discovered experimentally that the PGMRES method converges faster if the linear system with matrix $P^{[q,\mathbf p]}_{N,\mathbf n}(K)$ occurring at each PGMRES iteration is solved inexactly. More precisely, when solving the $N(q+1)$ linear systems with matrix $K_{\mathbf n,[\mathbf p]}(K)$ occurring at each PGMRES iteration, it is enough to approximate their solutions by performing only a few standard multigrid iterations in order to achieve an excellent PGMRES run-time; in fact, a single standard multigrid iteration is sufficient. In view of these experimental discoveries, we propose to solve a linear system with matrix $C^{[q,\mathbf p]}_{N,\mathbf n}(K)$ in the following way.

Algorithm 4.1

1. Apply to the given system the PGMRES algorithm with preconditioner $P^{[q,\mathbf p]}_{N,\mathbf n}(K)$.
2. The exact solution of the linear system with matrix $P^{[q,\mathbf p]}_{N,\mathbf n}(K)$ occurring at each PGMRES iteration would require solving $N(q+1)$ linear systems with matrix $K_{\mathbf n,[\mathbf p]}(K)$ as per eq. (3.4).
3. Instead of solving exactly these $N(q+1)$ systems, apply to each of them, starting from the zero vector as initial guess, $\mu$ multigrid (V-cycle) iterations involving, at all levels, a single symmetric Gauss–Seidel post-smoothing step and standard bisection for the interpolation and restriction operators (following the Galerkin assembly in which the interpolation operator is the transpose of the restriction operator); a sketch of this inexact application is given after this list.
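The following minimal sketch (in Python, assuming NumPy; `vcycle(K, b, x)` is a user-supplied placeholder returning one V-cycle update for $K\mathbf x=\mathbf b$ with the smoothing and coarsening just described, not code from the paper) realizes step 3, i.e., the action of the preconditioner $\tilde P$ derived in (4.1) below:

    import numpy as np

    def apply_P_tilde_inv(K_space, y, N, q, mu, vcycle):
        """Inexact block preconditioner: mu V-cycles per spatial system, zero initial guess."""
        nbar = K_space.shape[0]
        blocks = y.reshape(N * (q + 1), nbar)   # y = [y_1; ...; y_{N(q+1)}]
        out = np.empty_like(blocks)
        for i, b in enumerate(blocks):          # independent blocks: trivially parallel
            x = np.zeros(nbar)                  # zero initial guess, as in Algorithm 4.1
            for _ in range(mu):                 # mu = 1 is the proposed choice
                x = vcycle(K_space, b, x)       # one multigrid V-cycle for K x = b
            out[i] = x
        return out.ravel()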


As we shall see in the numerics of Section 6, the choice $\mu=1$ yields the best performance of Algorithm 4.1. The proposed solver is not the PGMRES with preconditioner $P^{[q,\mathbf p]}_{N,\mathbf n}(K)$ because, at each iteration, the linear system associated with $P^{[q,\mathbf p]}_{N,\mathbf n}(K)$ is not solved exactly. However, the solver is still a PGMRES with a different preconditioner $\tilde P^{[q,\mathbf p]}_{N,\mathbf n}(K)$. To see this, let $\mathrm{MG}$ be the iteration matrix of the multigrid method used in step 3 of Algorithm 4.1 for solving a linear system with matrix $K_{\mathbf n,[\mathbf p]}(K)$. Recall that $\mathrm{MG}$ depends only on $K_{\mathbf n,[\mathbf p]}(K)$ and not on the specific right-hand side of the system to solve. If the system to solve is $K_{\mathbf n,[\mathbf p]}(K)\,\mathbf x_i=\mathbf y_i$, the approximate solution $\tilde{\mathbf x}_i$ obtained after $\mu$ multigrid iterations starting from the zero initial guess is given by
$$
\tilde{\mathbf x}_i=(I_{\bar n}-\mathrm{MG}^\mu)\,K_{\mathbf n,[\mathbf p]}(K)^{-1}\mathbf y_i.
$$
Hence, the approximation $\tilde{\mathbf x}$ computed by our solver for the exact solution (3.4) of the system $P^{[q,\mathbf p]}_{N,\mathbf n}(K)\,\mathbf x=\mathbf y$ is given by
$$
\tilde{\mathbf x}=\begin{bmatrix}(I_{\bar n}-\mathrm{MG}^\mu)\,K_{\mathbf n,[\mathbf p]}(K)^{-1}\mathbf y_1\\ \vdots\\ (I_{\bar n}-\mathrm{MG}^\mu)\,K_{\mathbf n,[\mathbf p]}(K)^{-1}\mathbf y_{N(q+1)}\end{bmatrix}
=(I_{N(q+1)}\otimes(I_{\bar n}-\mathrm{MG}^\mu)\,K_{\mathbf n,[\mathbf p]}(K)^{-1})\,\mathbf y
=\tilde P^{[q,\mathbf p]}_{N,\mathbf n}(K)^{-1}\mathbf y,
$$
where
$$
\tilde P^{[q,\mathbf p]}_{N,\mathbf n}(K)=I_{N(q+1)}\otimes K_{\mathbf n,[\mathbf p]}(K)\,(I_{\bar n}-\mathrm{MG}^\mu)^{-1}. \tag{4.1}
$$
In conclusion, the proposed solver is the PGMRES with preconditioner $\tilde P^{[q,\mathbf p]}_{N,\mathbf n}(K)$. From the expression of $\tilde P^{[q,\mathbf p]}_{N,\mathbf n}(K)$, we can also say that the proposed solver is an MG-GMRES, that is, a PGMRES with preconditioner given by a standard multigrid applied only in space. A more precise notation for this solver could be MG$_{\mathrm{space}}$-GMRES, but for simplicity we just write MG-GMRES.

5 Fast Parallel Solver for the Space-Time IgA-DG System

In Section 4, we have described the sequential version of the proposed solver. The same version is used also in the case where $\rho\le N(q+1)$ processors are available, with the only difference that step 3 of Algorithm 4.1 is performed in parallel. In practice, $s_i$ linear systems are assigned to the $i$th processor for $i=1,\ldots,\rho$, with $s_1+\cdots+s_\rho=N(q+1)$ and $s_1,\ldots,s_\rho$ approximately equal to each other according to a load balancing principle. This is illustrated in Figure 5.1 (left), which shows the row-wise partition of $P^{[q,\mathbf p]}_{N,\mathbf n}(K)=I_{N(q+1)}\otimes K_{\mathbf n,[\mathbf p]}(K)$ corresponding to the distribution of the $N(q+1)$ systems among $\rho=N(q+1)-1$ processors.

Fig. 5.1: Row-wise partitions of the preconditioner $P^{[q,\mathbf p]}_{N,\mathbf n}(K)=I_{N(q+1)}\otimes\tilde K$ using $\rho=N(q+1)-1$ processors (left) and $\rho=N(q+1)+1$ processors (right) with $N(q+1)=4$. For simplicity, we write "$\tilde K$" instead of "$K_{\mathbf n,[\mathbf p]}(K)$".

If $\rho>N(q+1)$ processors are available, we use a slight modification of the solver, which is suited for parallel computation. As before, the modification only concerns step 3 of Algorithm 4.1. Since we now have more processors than systems to be solved, after assigning one processor to each system, we still have $\rho-N(q+1)$ unused processors. Following again a load balancing principle, we distribute the unused processors among the $N(q+1)$ systems, so that now one system can be shared between two or more different processors; see Figure 5.1 (right). Suppose that the system $K_{\mathbf n,[\mathbf p]}(K)\,\mathbf x=\mathbf y$ is shared between $\sigma$ processors. The symmetric Gauss–Seidel post-smoothing iteration in step 3 of Algorithm 4.1 cannot be performed in parallel. Therefore, we replace it with its block-wise version. To be precise, we recall that the symmetric Gauss–Seidel iteration for a system with matrix $E=L+U-D$, where $L$, $U$, $D$ are, respectively, the lower triangular part of $E$ (including the diagonal), the upper triangular part of $E$ (including the diagonal), and the diagonal part of $E$, is just the preconditioned Richardson iteration with preconditioner $M=LD^{-1}U$. Its block-wise version in the case where we consider $\sigma$ diagonal blocks $E_1,\ldots,E_\sigma$ of $E$ is simply the preconditioned Richardson iteration with preconditioner $M_1\oplus\cdots\oplus M_\sigma$, where $M_i$ is the symmetric Gauss–Seidel preconditioner for $E_i$ and $M_1\oplus\cdots\oplus M_\sigma$ is the block diagonal matrix whose diagonal blocks are $M_1,\ldots,M_\sigma$. This block-wise version is suited for parallel computation in the case where $\sigma$ processors are available.
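A minimal dense sketch of these two preconditioners (assuming NumPy/SciPy; the function names are ours, for illustration only) is given below; one step of the corresponding preconditioned Richardson iteration is then $\mathbf x\leftarrow\mathbf x+M^{-1}(\mathbf b-E\mathbf x)$.

    import numpy as np
    from scipy.linalg import block_diag

    def sgs_preconditioner(E):
        """Symmetric Gauss-Seidel preconditioner M = L D^{-1} U for E = L + U - D."""
        L = np.tril(E)                      # lower triangular part, diagonal included
        U = np.triu(E)                      # upper triangular part, diagonal included
        D_inv = np.diag(1.0 / np.diag(E))
        return L @ D_inv @ U

    def block_sgs_preconditioner(E, sigma):
        """Block-wise variant: SGS preconditioner of each of sigma diagonal blocks of E."""
        splits = np.array_split(np.arange(E.shape[0]), sigma)
        return block_diag(*[sgs_preconditioner(E[np.ix_(idx, idx)]) for idx in splits])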

6 Numerical Experiments: Iteration Count, Timing and Scaling

In this section, we illustrate through numerical experiments the performance of the proposed

solver and we compare it to the performance of other benchmark parallel solvers, such as

the PGMRES with block-wise ILU(0) preconditioner.

6.1 Implementation Details

For the numerics of this section, as well as throughout this paper, we used the C++ framework PETSc [5,6] and the domain specific language Utopia [64] for the parallel linear algebra and solvers, and the Cray-MPICH compiler. For the assembly of high-order finite elements, we used the PetIGA package [18]. A parallel tensor-product routine was implemented to assemble space-time matrices. Numerical experiments have been performed on the Cray XC40 nodes of the Piz Daint supercomputer of the Swiss national supercomputing centre (CSCS, https://www.cscs.ch/computers/piz-daint/). The partition used features 1813 compute nodes, each of which holds two 18-core Intel Xeon E5-2695v4 (2.10 GHz) processors. We stress that the PETSc default row-wise partition follows a load balancing principle and, except in the trivial case $\rho=N$, does not correspond to the row-wise partition described in Section 5; see Figure 6.1. Therefore, the partition must be adjusted by the user. Alternatively, one can use a PETSc built-in class for sparse block matrices and specify the block size $(q+1)\bar n$.


Fig. 6.1: The PETSc default row-wise partition does not account for the structure of the space-time problem; compare with Figure 5.1.

6.2 Experimental Setting

In the numerics of this section, we solve the linear system (2.1) arising from the choices $d=2$, $f(t,x)=1$, $T=1$, $\mathbf n=(n,n)$, $\mathbf p=(p,p)$, $\mathbf k=(p-1,p-1)$. The basis functions $\ell_{1,[q]},\ldots,\ell_{q+1,[q]}$ are chosen as the Lagrange polynomials associated with the right Gauss–Radau nodes in $[-1,1]$. The values of $K(x)$, $N$, $n$, $q$, $p$ are specified in each example. For each solver considered herein, we use the tolerance $\varepsilon=10^{-8}$ and the PETSc default stopping criterion based on the preconditioned relative residual. Moreover, the PGMRES method is always applied with restart after 30 iterations as per PETSc default. Whenever we report the run-time of a solver, the time spent in I/O operations and matrix assembly is ignored. Run-times are always expressed in seconds. In all the tables below, the number of iterations needed by a given solver to converge within the tolerance $\varepsilon=10^{-8}$ is reported in square brackets next to the corresponding run-time. Throughout this section, we use the following abbreviations for the solvers.

– ILU(0)-GMRES
PGMRES with preconditioner given by an ILU(0) factorization (ILU factorization with no fill-in) of the system matrix.

– MG$^L_{\mu,\nu}$-GMRES
The proposed solver, as described in Section 4, with $\mu$ multigrid (V-cycle) iterations applied to $K_{\mathbf n,[\mathbf p]}(K)$. Each multigrid iteration involves $\nu$ symmetric Gauss–Seidel post-smoothing steps at the finest level and one symmetric Gauss–Seidel post-smoothing step at the coarse levels. The choice $\nu=1$ corresponds to our solver proposal; different values of $\nu$ are considered for comparison purposes. The superscript $L$ denotes the number of multigrid levels.

– TMG$^L_{\mu,\nu}$-GMRES
The same as MG$^L_{\mu,\nu}$-GMRES, with the only difference that the multigrid iterations are performed with the telescopic option, thus giving rise to the telescopic multigrid (TMG) [23,45]. This technique consists in reducing the number of processors used on the coarse levels and can be beneficial for the parallel multigrid performance. In the numerics of this section, we only reduced the number of processors used on the coarsest level, to one fourth of the number of processors used at all other levels.


Table 6.1: PGMRES iterations and run-time (using 64 cores) to solve the linear system (2.1) up to a precision of $10^{-8}$, according to the experimental setting described in Section 6.2. We used $K(x)=I_2$, $q=0$, $N=32$, $n=259-p$. The total size of the space-time system (number of DoFs) is given by $32\cdot257^2$.

                          p=1        p=2        p=3        p=4
 ILU(0)-GMRES          3.7 [579]  4.3 [367]  5.2 [269]  6.7 [226]
 MG^5_{3,2}-GMRES      1.4 [33]   2.9 [33]   4.7 [33]   7.2 [33]
 MG^5_{1,2}-GMRES      0.8 [33]   1.6 [33]   2.5 [33]   4.0 [35]
 MG^5_{3,1}-GMRES      1.1 [33]   2.2 [33]   3.3 [33]   5.0 [34]
 MG^5_{1,1}-GMRES      0.6 [33]   1.2 [33]   1.8 [34]   3.1 [39]

                          p=5        p=6         p=7         p=8         p=9
 ILU(0)-GMRES          8.2 [193]  10.1 [174]  11.9 [156]  22.5 [234]  44.9 [383]
 MG^5_{3,2}-GMRES     10.5 [35]   14.7 [36]   21.1 [41]   34.6 [53]   57.6 [73]
 MG^5_{1,2}-GMRES      6.6 [42]   11.0 [52]   16.0 [60]   26.2 [77]   47.0 [90]
 MG^5_{3,1}-GMRES      7.1 [36]   11.4 [43]   17.0 [51]   28.5 [67]   42.1 [83]
 MG^5_{1,1}-GMRES      5.3 [50]    9.1 [63]   13.5 [75]   19.8 [87]   30.7 [112]

Table 6.2: PGMRES iterations and run-time (using 64 cores) to solve the linear system (2.1) up to a precision of $10^{-8}$, according to the experimental setting described in Section 6.2. We used
$$
K(x_1,x_2)=\begin{bmatrix}\cos(x_1)+x_2 & 0\\ 0 & x_1+\sin(x_2)\end{bmatrix},
$$
$q=1$, $N=20$, $n=131-p$. The total size of the space-time system (number of DoFs) is given by $40\cdot129^2$. Note that $K(x_1,x_2)$ is singular at $(x_1,x_2)=(0,0)$.

                          p=1        p=2        p=3        p=4
 ILU(0)-GMRES          1.3 [449]  1.7 [283]  2.2 [219]  2.9 [183]
 MG^5_{2,3}-GMRES      0.6 [55]   1.3 [55]   2.4 [55]   4.1 [58]
 MG^5_{1,3}-GMRES      0.5 [57]   1.0 [56]   1.8 [56]   3.5 [68]
 MG^5_{2,1}-GMRES      0.5 [57]   1.0 [57]   1.6 [58]   3.1 [77]
 MG^5_{1,1}-GMRES      0.5 [67]   0.8 [65]   1.3 [68]   2.8 [90]

                          p=5        p=6         p=7         p=8         p=9
 ILU(0)-GMRES          3.6 [158]  4.4 [141]   6.0 [148]   9.5 [186]  24.8 [397]
 MG^5_{2,3}-GMRES      7.6 [64]  12.7 [90]   18.5 [101]  32.2 [139]  48.9 [173]
 MG^5_{1,3}-GMRES      6.2 [85]  10.4 [103]  15.0 [116]  26.5 [161]  38.0 [189]
 MG^5_{2,1}-GMRES      5.2 [91]   8.6 [112]  12.6 [128]  22.0 [179]  30.7 [205]
 MG^5_{1,1}-GMRES      4.6 [110]  7.2 [125]  11.0 [150]  19.4 [210]  30.2 [269]

6.3 Iteration Count and Timing

Tables 6.1–6.3 illustrate the performance of the proposed solver in terms of number of iterations and run-time. It is clear from the tables that the best performance of the solver is obtained when applying to $K_{\mathbf n,[\mathbf p]}(K)$ a single multigrid iteration ($\mu=1$) with only one smoothing step at the finest level ($\nu=1$). Moreover, the solver is competitive with respect to the ILU(0)-GMRES. The worst performance of the solver with respect to the ILU(0)-GMRES is attained in Table 6.2, where the diffusion matrix $K(x_1,x_2)$ is singular at $(x_1,x_2)=(0,0)$.


Table 6.3: PGMRES iterations and run-time (using 64 cores) to solve the linear system (2.1) up to a precision of $10^{-8}$, according to the experimental setting described in Section 6.2. We used
$$
K(x_1,x_2)=\begin{bmatrix}(2+\cos x_1)(1+x_2) & \cos(x_1+x_2)\sin(x_1+x_2)\\ \cos(x_1+x_2)\sin(x_1+x_2) & (2+\sin x_2)(1+x_1)\end{bmatrix},
$$
$q=0$, $N=20$, $n=259-p$. The total size of the space-time system (number of DoFs) is given by $20\cdot257^2$.

                          p=1        p=2        p=3        p=4
 ILU(0)-GMRES          1.9 [450]  2.2 [284]  2.6 [205]  3.4 [170]
 MG^5_{2,2}-GMRES      0.2 [11]   0.5 [11]   0.8 [11]   1.5 [13]
 MG^5_{1,2}-GMRES      0.2 [12]   0.4 [11]   0.6 [12]   1.2 [15]
 MG^5_{2,1}-GMRES      0.2 [11]   0.4 [11]   0.6 [12]   1.1 [15]
 MG^5_{1,1}-GMRES      0.2 [12]   0.3 [11]   0.5 [14]   1.0 [19]

                          p=5        p=6        p=7        p=8         p=9
 ILU(0)-GMRES          4.4 [154]  5.2 [135]  6.4 [125]  12.6 [195]  22.8 [289]
 MG^5_{2,2}-GMRES      2.6 [17]   4.1 [20]   5.9 [23]    8.8 [27]   11.9 [30]
 MG^5_{1,2}-GMRES      2.1 [20]   3.3 [23]   4.6 [26]    7.2 [31]   10.1 [36]
 MG^5_{2,1}-GMRES      2.0 [20]   3.1 [23]   4.6 [27]    6.2 [31]    8.4 [35]
 MG^5_{1,1}-GMRES      1.7 [23]   2.5 [26]   3.6 [30]    5.5 [36]    7.4 [40]

Table 6.4: Strong scaling: PGMRES iterations and run-time to solve the linear system (2.1) up to a precision of $10^{-8}$, according to the experimental setting described in Section 6.2. We used $K(x)=I_2$, $q=0$, $p=3$, $N=64$, $n=384$. The total size of the space-time system (number of DoFs) is given by $64\cdot385^2$.

 Cores                    1             2             4             8
 ILU(0)-GMRES        1385.0 [414]  682.1 [415]   336.7 [415]  181.9 [415]
 MG^7_{1,1}-GMRES     335.1 [64]   179.7 [64]     92.5 [64]    51.9 [64]
 TMG^7_{1,1}-GMRES    335.1 [64]   179.7 [64]     92.5 [64]    51.9 [64]

 Cores                   16            32            64           128
 ILU(0)-GMRES         103.3 [415]   49.7 [416]    21.2 [417]   12.8 [500]
 MG^7_{1,1}-GMRES      31.8 [64]    16.6 [64]      8.3 [64]     4.4 [64]
 TMG^7_{1,1}-GMRES     31.3 [64]    16.5 [64]      8.0 [64]     4.2 [64]

 Cores                  256           512          1024          2048          4096
 ILU(0)-GMRES           6.8 [519]    4.0 [550]     2.5 [619]    1.7 [753]    1.7 [1013]
 MG^7_{1,1}-GMRES       2.5 [65]     1.8 [65]      1.9 [65]     5.1 [65]    14.7 [66]
 TMG^7_{1,1}-GMRES      2.2 [64]     1.3 [63]      0.8 [64]     0.5 [64]     0.4 [64]

Table 6.5: Space-time weak scaling: PGMRES iterations and run-time to solve the linear system (2.1) up to a precision of $10^{-8}$, according to the experimental setting described in Section 6.2. We used $K(x)=I_2$, $q=0$, $p=2$ and $(N,n)=(8,65),(16,129),(32,257),(64,513)$. The ratio DoFs/Cores is constant in the table.

 [Cores, n, N, L]     [1,65,8,4]  [8,129,16,5]  [64,257,32,6]  [512,513,64,7]
 ILU(0)-GMRES         0.25 [50]   0.86 [121]    2.80 [367]     7.6 [989]
 TMG^L_{1,1}-GMRES    0.11 [10]   0.27 [17]     0.67 [33]      1.4 [64]

Fig. 6.2: Graphical representation of the run-times reported in Table 6.4 (run-time in seconds versus number of cores, log-log scale, for ILU, MG and TMG).

Fig. 6.3: Graphical representation of the run-times reported in Table 6.5 (run-time in seconds versus number of DoFs, log-log scale, for ILU and TMG).

6.4 Scaling

In the scaling experiments, besides the multigrid already considered above, we also employ a TMG for performance reasons. To avoid memory bounds, we use at most 16 cores per node. From Table 6.4 and Figure 6.2 we see that the proposed solver, especially when using the TMG option, shows a nearly optimal strong scaling with respect to the number of cores. (We observe a slight reduction in the ideal scaling as the number of cores grows from 2 to 16; this is because these runs are performed on a single node with its own limited memory. For more than 16 cores, the memory bound is no longer present, since computations are performed on multiple nodes with increasing memory. When the number of cores exceeds two thousand, communication takes over and scaling is no longer observable.) Table 6.5 and Figure 6.3 illustrate the weak scaling properties of the proposed solver, which possesses a superior parallel efficiency with respect to the standard ILU(0) approach in terms of iteration count and run-time. For both solvers, however, the weak scaling is not ideal (constant run-time): this is because $N$ grows from 8 to 64 and neither solver is robust with respect to $N$.

7 Non-Rectangular Domain and Non-Trivial Geometry

So far, the performance of the proposed solver has been illustrated for the diffusion problem (1.1) over the hypersquare $(0,1)^d$. However, no special difficulty arises if $(0,1)^d$ is replaced by a non-rectangular domain $\Omega$ described (exactly) by a geometry map $G:[0,1]^d\to\Omega$ as per the IgA paradigm. Indeed, as long as a tensor-product structure between space and time is maintained, the geometry map $G$ acts as a reparameterization of $\Omega$ through $(0,1)^d$, and the resulting discretization matrix is still given by (2.2)–(2.9), with the only differences that:

– a factor $|\det(J_G(x))|$ should be included in the integrand of (2.5), where $J_G(x)$ is the Jacobian matrix of $G(x)$;
– the matrix $K(x)$ in (2.6) should be replaced by $J_G(x)^{-1}K(G(x))J_G(x)^{-T}|\det(J_G(x))|$.

In short, a change of domain from $(0,1)^d$ to $\Omega$ essentially amounts to a mere change of diffusion matrix from $K$ to $J_G^{-1}K(G)J_G^{-T}|\det(J_G)|$, which does not affect the performance of the proposed solver.

In Table 7.1, we validate the previous claim by testing the solver on the linear system arising from the space-time IgA-DG discretization of (1.1) in the case where $(0,1)^d$ is replaced by a non-rectangular domain $\Omega$ described by a non-trivial geometry map $G:[0,1]^d\to\Omega$.


Table 7.1: PGMRES iterations and run-time (using 64 cores) to solve, up to a precision of $10^{-8}$, the linear system arising from the space-time IgA-DG discretization of (1.1) in the case where $(0,1)^d$ is replaced by the domain (7.1) described by the geometry map (7.2). The experimental setting is the same as in Section 6.2. We used $K(x)=I_2$, $q=1$, $N=20$, $n=131-p$. The total size of the space-time system (number of DoFs) is given by $40\cdot129^2$.

                         p=1        p=2        p=3        p=4
 ILU(0)-GMRES         1.6 [412]  2.2 [354]  4.0 [296]  4.3 [268]
 MG^5_{1,1}-GMRES     0.7 [99]   1.3 [91]   2.2 [103]  4.0 [140]

                         p=5        p=6         p=7         p=8         p=9
 ILU(0)-GMRES         6.0 [266]  8.0 [257]  16.6 [415]  31.3 [622]  48.2 [775]
 MG^5_{1,1}-GMRES     7.2 [178] 11.8 [219]  16.5 [241]  29.6 [348]  39.9 [386]

The experimental setting is the same as in Section 6.2, with the only difference that $(0,1)^2$ is now replaced by a quarter of an annulus
$$
\Omega=\{x\in\mathbb R^2:\ r^2<x_1^2+x_2^2<R^2,\ x_1>0,\ x_2>0\},\qquad r=1,\ R=2, \tag{7.1}
$$
described by the geometry map $G:[0,1]^2\to\Omega$,
$$
G(\hat x)=\begin{cases}x_1=[r+\hat x_1(R-r)]\cos\bigl(\tfrac\pi2\hat x_2\bigr),\\ x_2=[r+\hat x_1(R-r)]\sin\bigl(\tfrac\pi2\hat x_2\bigr),\end{cases}\qquad \hat x\in[0,1]^2. \tag{7.2}
$$

We remark that the geometry map $G$ is a common benchmark example in IgA; see, e.g., [20,21].
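To make the geometry handling concrete, here is a short sketch (assuming NumPy; the function names are ours, for illustration only) of the map (7.2) and of the transformed diffusion coefficient $J_G^{-1}K(G)J_G^{-T}|\det J_G|$ that replaces $K$ in (2.6):

    import numpy as np

    r, R = 1.0, 2.0

    def G(xh):
        """Quarter-annulus geometry map (7.2); xh = (xh1, xh2) in [0,1]^2."""
        rad = r + xh[0] * (R - r)
        ang = 0.5 * np.pi * xh[1]
        return np.array([rad * np.cos(ang), rad * np.sin(ang)])

    def J_G(xh):
        """Jacobian of G at xh, computed analytically from (7.2)."""
        rad = r + xh[0] * (R - r)
        ang = 0.5 * np.pi * xh[1]
        return np.array([[(R - r) * np.cos(ang), -rad * 0.5 * np.pi * np.sin(ang)],
                         [(R - r) * np.sin(ang),  rad * 0.5 * np.pi * np.cos(ang)]])

    def K_transformed(K, xh):
        """Effective diffusion matrix J_G^{-1} K(G) J_G^{-T} |det J_G| on (0,1)^2."""
        J = J_G(xh)
        Jinv = np.linalg.inv(J)
        return Jinv @ K(G(xh)) @ Jinv.T * abs(np.linalg.det(J))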

8 Conclusions

We have proposed an MG-GMRES solver for the space-time IgA-DG discretization of the diffusion problem (1.1). Through numerical experiments, we have illustrated the competitiveness of our proposal in terms of iteration count, run-time and parallel scaling. We have also shown its applicability to problems more general than (1.1), involving a non-rectangular domain $\Omega$ and a non-trivial geometry map $G$. To conclude, we remark that the proposed solver is highly flexible as it does not depend on the domain or the space-time discretization. It could therefore be applied to other space-time discretizations, as long as a tensor-product structure is maintained between space and time.

Data Availability Statement If requested by the handling editor or the reviewers, the codes used for producing the numerical results of this paper will be made publicly available.

Acknowledgements Paola Ferrari is partially financed by the GNCS 2019 Project "Metodi Numerici per Problemi Mal Posti". Paola Ferrari, Carlo Garoni and Stefano Serra-Capizzano are grateful to the Italian INdAM-GNCS for the scientific support. Carlo Garoni acknowledges the MIUR Excellence Department Project awarded to the Department of Mathematics of the University of Rome Tor Vergata (CUP E83C18000100006) and the support obtained by the Beyond Borders Programme of the University of Rome Tor Vergata through the Project ASTRID (CUP E84I19002250005). Rolf Krause acknowledges the funding obtained from the European High-Performance Computing Joint Undertaking (JU) under Grant Agreement N. 955701 (Project TIME-X); the JU receives support from the European Union's Horizon 2020 Research and Innovation Programme and from Belgium, France, Germany and Switzerland. Finally, the authors acknowledge the Deutsche Forschungsgemeinschaft (DFG) as part of the "ExaSolvers" Project in the Priority Programme 1648 "Software for Exascale Computing" (SPPEXA) and the Swiss National Science Foundation (SNSF) under the lead agency grant agreement SNSF-162199.

References

1. Abedi R., Petracovici B., Haber R.B. A space-time discontinuous Galerkin method for linearized elastodynamics with element-wise momentum balance. Comput. Methods Appl. Mech. Engrg. 195 (2006) 3247–3273.
2. Arbenz P., Hupp D., Obrist D. A parallel solver for the time-periodic Navier–Stokes equations. In "PPAM 2013: Parallel Processing and Applied Mathematics", Springer (2014), pp. 291–300.
3. Auricchio F., Beirão da Veiga L., Hughes T.J.R., Reali A., Sangalli G. Isogeometric collocation methods. Math. Models Methods Appl. Sci. 20 (2010) 2075–2107.
4. Aziz A.K., Monk P. Continuous finite elements in space and time for the heat equation. Math. Comput. 52 (1989) 255–274.
5. Balay S., Abhyankar S., Adams M.F., Brown J., Brune P., Buschelman K., Dalcin L., Dener A., Eijkhout V., Gropp W.D., Karpeyev D., Kaushik D., Knepley M.G., May D.A., Curfman McInnes L., Mills R.T., Munson T., Rupp K., Sanan P., Smith B.F., Zampini S., Zhang H., Zhang H. PETSc web page. https://www.mcs.anl.gov/petsc (2019).
6. Balay S., Abhyankar S., Adams M.F., Brown J., Brune P., Buschelman K., Dalcin L., Dener A., Eijkhout V., Gropp W.D., Karpeyev D., Kaushik D., Knepley M.G., May D.A., Curfman McInnes L., Mills R.T., Munson T., Rupp K., Sanan P., Smith B.F., Zampini S., Zhang H., Zhang H. PETSc users manual. Technical Report ANL-95/11 – Revision 3.11, Argonne National Laboratory (2019).
7. Barbarino G., Garoni C., Serra-Capizzano S. Block generalized locally Toeplitz sequences: theory and applications in the unidimensional case. Electron. Trans. Numer. Anal. 53 (2020) 28–112.
8. Barbarino G., Garoni C., Serra-Capizzano S. Block generalized locally Toeplitz sequences: theory and applications in the multidimensional case. Electron. Trans. Numer. Anal. 53 (2020) 113–216.
9. Beirão da Veiga L., Buffa A., Sangalli G., Vázquez R. Mathematical analysis of variational isogeometric methods. Acta Numerica 23 (2014) 157–287.
10. Benedusi P., Garoni C., Krause R., Li X., Serra-Capizzano S. Space-time FE-DG discretization of the anisotropic diffusion equation in any dimension: the spectral symbol. SIAM J. Matrix Anal. Appl. 39 (2018) 1383–1420.
11. Benedusi P., Hupp D., Arbenz P., Krause R. A parallel multigrid solver for time-periodic incompressible Navier–Stokes equations in 3D. In "Numerical Mathematics and Advanced Applications ENUMATH 2015", Springer (2016), pp. 265–273.
12. Benedusi P., Minion M., Krause R. An experimental comparison of a space-time multigrid method with PFASST for a reaction-diffusion problem. In revision.
13. Bertaccini D., Durastante F. Iterative Methods and Preconditioning for Large and Sparse Linear Systems with Applications. Taylor & Francis, Boca Raton (2018).
14. Bertaccini D., Ng M.K. Band-Toeplitz preconditioned GMRES iterations for time-dependent PDEs. BIT Numer. Math. 43 (2003) 901–914.
15. Betsch P., Steinmann P. Conservation properties of a time FE method—part II: time-stepping schemes for non-linear elastodynamics. Inter. J. Numer. Methods Engrg. 50 (2001) 1931–1955.
16. Česenek J., Feistauer M. Theory of the space-time discontinuous Galerkin method for nonstationary parabolic problems with nonlinear convection and diffusion. SIAM J. Numer. Anal. 50 (2012) 1181–1206.
17. Cottrell J.A., Hughes T.J.R., Bazilevs Y. Isogeometric Analysis: Toward Integration of CAD and FEA. Wiley, Chicester (2009).
18. Dalcin L., Collier N., Vignal P., Côrtes A.M.A., Calo V.M. PetIGA: a framework for high performance isogeometric analysis. Comput. Methods Appl. Mech. Engrg. 308 (2016) 151–181.
19. Delfour M., Hager W., Trochu F. Discontinuous Galerkin methods for ordinary differential equations. Math. Comput. 36 (1981) 455–473.
20. Donatelli M., Garoni C., Manni C., Serra-Capizzano S., Speleers H. Robust and optimal multi-iterative techniques for IgA Galerkin linear systems. Comput. Methods Appl. Mech. Engrg. 284 (2015) 230–264.
21. Donatelli M., Garoni C., Manni C., Serra-Capizzano S., Speleers H. Robust and optimal multi-iterative techniques for IgA collocation linear systems. Comput. Methods Appl. Mech. Engrg. 284 (2015) 1120–1146.
22. Donatelli M., Garoni C., Manni C., Serra-Capizzano S., Speleers H. Symbol-based multigrid methods for Galerkin B-spline isogeometric analysis. SIAM J. Numer. Anal. 55 (2017) 31–62.
23. Douglas C.C. A review of numerous parallel multigrid methods. In "Applications on Advanced Architecture Computers", SIAM (1996), pp. 187–202.
24. Eriksson K., Johnson C., Logg A. Adaptive computational methods for parabolic problems: Part 1. Fundamentals. Encyclop. Comput. Mech. (2004).
25. Falgout R.D., Friedhoff S., Kolev Tz.V., MacLachlan S.P., Schroder J.B., Vandewalle S. Multigrid methods with space-time concurrency. Comput. Visual. Sci. 18 (2017) 123–143.
26. French D.A. A space-time finite element method for the wave equation. Comput. Methods Appl. Mech. Engrg. 107 (1993) 145–157.
27. Gander M.J. 50 years of time parallel time integration. Article 3 in "Multiple Shooting and Time Domain Decomposition Methods", Springer (2015).
28. Gander M.J., Halpern L. Techniques for locally adaptive time stepping developed over the last two decades. Domain Decomp. Methods Sci. Engrg. XX (2013) 377–385.
29. Gander M.J., Neumüller M. Analysis of a new space-time parallel multigrid algorithm for parabolic problems. SIAM J. Sci. Comput. 38 (2016) A2173–A2208.
30. Garoni C., Serra-Capizzano S. Generalized Locally Toeplitz Sequences: Theory and Applications. Volume I, Springer, Cham (2017).
31. Garoni C., Serra-Capizzano S. Generalized Locally Toeplitz Sequences: Theory and Applications. Volume II, Springer, Cham (2018).
32. Griebel M., Oeltz D. A sparse grid space-time discretization scheme for parabolic problems. Computing 81 (2007) 1–34.
33. Hesthaven J.S., Warburton T. Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications. Springer, New York (2008).
34. Hofer C., Langer U., Neumüller M. Parallel and robust preconditioning for space-time isogeometric analysis of parabolic evolution problems. SIAM J. Sci. Comput. 41 (2019) A1793–A1821.
35. Horton G., Vandewalle S. A space-time multigrid method for parabolic partial differential equations. SIAM J. Sci. Comput. 16 (1995) 848–864.
36. Hughes T.J.R., Cottrell J.A., Bazilevs Y. Isogeometric analysis: CAD, finite elements, NURBS, exact geometry and mesh refinement. Comput. Methods Appl. Mech. Engrg. 194 (2005) 4135–4195.
37. Hughes T.J.R., Hulbert G.M. Space-time finite element methods for elastodynamics: formulations and error estimates. Comput. Methods Appl. Mech. Engrg. 66 (1988) 339–363.
38. Klaij C.M., van der Vegt J.J.W., van der Ven H. Space-time discontinuous Galerkin method for the compressible Navier–Stokes equations. J. Comput. Phys. 217 (2006) 589–611.
39. Krause D., Krause R. Enabling local time stepping in the parallel implicit solution of reaction-diffusion equations via space-time finite elements on shallow tree meshes. Appl. Math. Comput. 277 (2016) 164–179.
40. Ladyženskaja O.A., Solonnikov V.A., Ural'ceva N.N. Linear and Quasi-Linear Equations of Parabolic Type. Amer. Math. Soc. (1968).
41. Langer U., Moore S.E., Neumüller M. Space-time isogeometric analysis of parabolic evolution problems. Comput. Methods Appl. Mech. Engrg. 306 (2016) 342–363.
42. Langer U., Zank M. Efficient direct space-time finite element solvers for parabolic initial-boundary value problems in anisotropic Sobolev spaces. arXiv:2008.01996 (2020).
43. Lasaint P., Raviart P.A. On a finite element method for solving the neutron transport equation. In "Mathematical Aspects of Finite Elements in Partial Differential Equations", Academic Press (1974), pp. 89–123.
44. Loli G., Montardini M., Sangalli G., Tani M. An efficient solver for space-time isogeometric Galerkin methods for parabolic problems. Comput. Math. Appl. 80 (2020) 2586–2603.
45. May D.A., Sanan P., Rupp K., Knepley M.G., Smith B.F. Extreme-scale multigrid components within PETSc. Article 5 in "Proceedings of the Platform for Advanced Scientific Computing Conference", ACM (2016).
46. McDonald E., Wathen A. A simple proposal for parallel computation over time of an evolutionary process with implicit time stepping. In "Numerical Mathematics and Advanced Applications ENUMATH 2015", Springer (2016), pp. 285–293.
47. Meidner D., Vexler B. Adaptive space-time finite element methods for parabolic optimization problems. SIAM J. Control Optim. 46 (2007) 116–142.
48. Miller S.T., Haber R.B. A spacetime discontinuous Galerkin method for hyperbolic heat conduction. Comput. Methods Appl. Mech. Engrg. 198 (2008) 194–209.
49. Neumüller M., Steinbach O. Refinement of flexible space-time finite element meshes and discontinuous Galerkin methods. Comput. Visual. Sci. 14 (2011) 189–205.
50. Quarteroni A. Numerical Models for Differential Problems. Springer, Milan (2009).
51. Schötzau D., Schwab C. An hp a priori error analysis of the DG time-stepping method for initial value problems. Calcolo 37 (2000) 207–232.
52. Serra-Capizzano S. Generalized locally Toeplitz sequences: spectral analysis and applications to discretized partial differential equations. Linear Algebra Appl. 366 (2003) 371–402.
53. Serra-Capizzano S. The GLT class as a generalized Fourier analysis and applications. Linear Algebra Appl. 419 (2006) 180–233.
54. Shakib F., Hughes T.J.R., Zdeněk J. A new finite element formulation for computational fluid dynamics: X. The compressible Euler and Navier–Stokes equations. Comput. Methods Appl. Mech. Engrg. 89 (1991) 141–219.
55. Steinbach O., Yang H. Comparison of algebraic multigrid methods for an adaptive space-time finite element discretization of the heat equation in 3D and 4D. Numer. Linear Algebra Appl. 25 (2018) e2143.
56. Steinbach O., Yang H. Space-time finite element methods for parabolic evolution equations: discretization, a posteriori error estimation, adaptivity and solution. In "Space-Time Methods: Applications to Partial Differential Equations", Radon Series on Computational and Applied Mathematics 25 (2019), pp. 207–248.
57. Sudirham J.J., van der Vegt J.J.W., van Damme R.M.J. Space-time discontinuous Galerkin method for advection-diffusion problems on time-dependent domains. Appl. Numer. Math. 56 (2006) 1491–1518.
58. Tezduyar T.E., Behr M., Liou J. A new strategy for finite element computations involving moving boundaries and interfaces—the deforming-spatial-domain/space-time procedure: I. The concept and the preliminary numerical tests. Comput. Methods Appl. Mech. Engrg. 94 (1992) 339–351.
59. Tezduyar T.E., Behr M., Mittal S., Liou J. A new strategy for finite element computations involving moving boundaries and interfaces—the deforming-spatial-domain/space-time procedure: II. Computation of free-surface flows, two-liquid flows, and flows with drifting cylinders. Comput. Methods Appl. Mech. Engrg. 94 (1992) 353–371.
60. Tezduyar T.E., Sathe S., Keedy R., Stein K. Space-time finite element techniques for computation of fluid-structure interactions. Comput. Methods Appl. Mech. Engrg. 195 (2006) 2002–2027.
61. Thite S. Adaptive spacetime meshing for discontinuous Galerkin methods. Comput. Geom. 42 (2009) 20–44.
62. Thomée V. Galerkin Finite Element Methods for Parabolic Problems. Springer, New York (2006).
63. van der Vegt J.J.W., van der Ven H. Space-time discontinuous Galerkin finite element method with dynamic grid motion for inviscid compressible flows: I. General formulation. J. Comput. Phys. 182 (2002) 546–585.
64. Zulian P., Kopaničáková A., Nestola M.C.G., Fink A., Fadel N., Magri V., Schneider T., Botter E., Mankau J. Utopia: a C++ embedded domain specific language for scientific computing. Git repository. https://bitbucket.org/zulianp/utopia (2016).