A ROBUST ALGEBRAIC MULTILEVEL DOMAIN DECOMPOSITION PRECONDITIONER FOR SPARSE SYMMETRIC POSITIVE DEFINITE MATRICES∗

HUSSAM AL DAAS† AND PIERRE JOLIVET‡

∗Submitted to the editors September 13, 2021.
†STFC Rutherford Appleton Laboratory, Harwell Campus, Didcot, Oxfordshire, OX11 0QX, UK (hussam.al-daas@stfc.ac.uk).
‡CNRS, ENSEEIHT, 2 rue Charles Camichel, 31071 Toulouse Cedex 7, France (pierre.jolivet@enseeiht.fr).

Abstract. Domain decomposition (DD) methods are widely used preconditioning techniques. Their effectiveness relies on the choice of a locally constructed coarse space. Thus far, this construction was mostly achieved using non-assembled matrices from discretized partial differential equations (PDEs). Therefore, DD methods were mainly successful when solving systems stemming from PDEs. In this paper, we present a fully algebraic multilevel DD method where the coarse space can be constructed locally and efficiently without any information besides the coefficient matrix. The condition number of the preconditioned matrix can be bounded by a user-prescribed number. Numerical experiments illustrate the effectiveness of the preconditioner on a range of problems arising from different applications.

Key words. Algebraic domain decomposition, multilevel preconditioner, overlapping Schwarz method, sparse linear system.

1. Introduction. We are interested in solving the linear system of equations

$$Ax = b,$$

where $A \in \mathbb{R}^{n \times n}$ is a sparse symmetric positive definite (SPD) matrix and $b \in \mathbb{R}^n$ is the right-hand side. On the one hand, despite their accuracy, direct methods [13], which are based on matrix factorizations, become memory- and compute-intensive for large-scale problems. Furthermore, establishing a high level of concurrency in their algorithms is challenging, which limits the effectiveness of their parallelization over many processing units, e.g., thousands of MPI processes. On the other hand, iterative methods, such as Krylov subspace methods, are attractive as they require less memory and are easier to parallelize. However, their convergence depends on the coefficient matrix $A$, the initial guess $x_0$, and the right-hand side $b$. More precisely, the error at iteration $k$ of the conjugate gradient method [21] satisfies

$$\|x_k - x_\star\|_A \le 2\,\|x_0 - x_\star\|_A \left(\frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1}\right)^k,$$

where $x_\star$ is the exact solution and $\kappa_2(A)$ is the spectral condition number of $A$. Therefore, iterative methods are usually combined with preconditioners that modify the properties of the linear system so that the convergence rate is improved.

A variety of preconditioning techniques have been proposed in the literature; see the recent survey [36] and references therein. In this work, we focus on preconditioners for SPD matrices. In terms of construction type, these preconditioners can be split into two categories. (1) Algebraic preconditioners: these do not require information from the problem besides the linear system, and their construction relies only on $A$ and $b$ [5, 22, 31, 35, 38]. (2) Analytic preconditioners: in order to construct them, more information from the origin of the linear system, e.g., the matrix assembly procedure, is required [25, 26, 41]. Inferring how preconditioners modify the spectrum of iteration matrices provides another way to classify them. Again, two categories exist. (1) One-level preconditioners: these mostly rely on incomplete matrix factorizations, matrix splitting methods, approximate sparse inverse methods, and Schwarz methods [38]. One-level preconditioners usually bound from above the largest eigenvalue of the preconditioned matrix. (2) Two-level and multilevel preconditioners: these are usually a combination of a one-level method and a coarse space correction. While the one-level part can bound from above the largest eigenvalue, the coarse space is used to bound from below the smallest eigenvalue so that the condition number of the preconditioned matrix is bounded [2, 3, 4, 12, 15, 16, 18, 19, 28, 29, 31, 42, 45, 49].

When it comes to overlapping DD, most one-level preconditioners and a few two-level/multilevel preconditioners are algebraic, while most two-level preconditioners are analytic. On the one hand, analytic two-level/multilevel preconditioners construct the coarse space efficiently without requiring computations involving the global matrix. On the other hand, existing algebraic two-level/multilevel preconditioners still require global computations involving the matrix $A$ that limit the setup scalability [2, 16]. Furthermore, certain algebraic two-level preconditioners require complicated operations that may not be easy to parallelize. Therefore, we focus in this paper on two-level/multilevel preconditioners where the coarse space can be constructed locally. Certain algebraic multigrid (AMG) methods are examples of such preconditioners [35]. Note that several AMG methods require unassembled matrices or the near-nullspace of the global matrix, which is known in some applications [10, 44]. One could argue that these methods are thus not purely algebraic. Furthermore, their effectiveness has been proved only for certain classes of matrices. An algebraic two-level preconditioner for the normal equations matrix was recently proposed in [4].

In [2], the authors presented an algebraic framework to construct robust coarse spaces and characterized a class of local symmetric positive semi-definite (SPSD) matrices that allows such coarse spaces to be constructed efficiently. Since then, there have been several attempts to construct algebraic two-level preconditioners with a locally computed coarse space that are theoretically effective on any sparse SPD matrix; see, e.g., [16] and references therein. Starting off with the subdomain matrices of $A$, the authors in [16] define an auxiliary matrix $A_+$ such that $A - A_+$ is low-rank and a local SPSD splitting for $A_+$ is easily obtained. A robust algebraic two-level preconditioner for $A$ is then derived by a low-rank update of the robust algebraic two-level preconditioner of $A_+$. Even though the preconditioner proposed in [16] is fully algebraic, using it in practice may not be very attractive since the low-rank update requires the solution of linear systems with $A_+$ involving a large number of right-hand sides, nearly equal to the size of the coarse space of $A_+$, which is prohibitive for a large number of subdomains. Therefore, we believe that the question of finding efficient locally constructed coarse spaces is still open.

When information such as the near-nullspace or the subdomain non-assembled matrices is available, analytic AMG or DD preconditioners are optimal. The preconditioner presented in this paper should be used when a robust black-box solver is needed.

The manuscript is organized as follows. We introduce the notation and review the algebraic DD framework in Section 2. Section 3 presents our main contribution: finding local SPSD splitting matrices associated with each subdomain fully algebraically, inexpensively, and starting from local data. These matrices will be used to construct a robust two-level Schwarz preconditioner. Then, we briefly discuss the straightforward extension of our approach to a multilevel preconditioner. Afterwards, we present in Section 4 numerical experiments on problems arising from different engineering applications. Concluding remarks and future lines of research are given in Section 5.

Notation. We end our introduction by defining notation that will be used in this paper. Let $1 \le n \le m$ and let $B \in \mathbb{R}^{m \times n}$. Let $S_1 \subset [\![1, m]\!]$ and $S_2 \subset [\![1, n]\!]$ be two sets of integers. $B(S_1, :)$ is the submatrix of $B$ formed by the rows whose indices belong to $S_1$, and $B(:, S_2)$ is the submatrix of $B$ formed by the columns whose indices belong to $S_2$. The matrix $B(S_1, S_2)$ is formed by taking the rows whose indices belong to $S_1$ and only retaining the columns whose indices belong to $S_2$. The concatenation of any two sets of integers $S_1$ and $S_2$ is represented by $[S_1, S_2]$. Note that the order of the concatenation is important. The set of the first $p$ positive integers is denoted by $[\![1, p]\!]$. The identity matrix of size $n$ is denoted by $I_n$.
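For readers who prefer code, this submatrix notation maps directly onto array indexing; the following minimal Python sketch (illustrative values only) mirrors the definitions above, up to 0-based indexing.

    import numpy as np

    B = np.arange(12, dtype=float).reshape(4, 3)  # B in R^{4x3}
    S1, S2 = [0, 2], [1]           # index sets (0-based here, 1-based in the text)

    B_rows = B[S1, :]              # B(S1, :): rows indexed by S1
    B_cols = B[:, S2]              # B(:, S2): columns indexed by S2
    B_sub  = B[np.ix_(S1, S2)]     # B(S1, S2): rows S1, then columns S2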

2. Domain decomposition. Throughout this section, we assume that $C$ is a general $n \times n$ sparse SPD matrix. Let the nodes $V$ in the corresponding adjacency graph $G(C)$ be numbered from 1 to $n$. A graph partitioning algorithm can be used to split $V$ into $N \ll n$ disjoint subsets $\Omega_{I_i}$ ($1 \le i \le N$) of size $n_{I_i}$. These sets are called nonoverlapping subdomains.

2.1. Abstract setting for two-level overlapping Schwarz methods. Defining a one-level Schwarz preconditioner first requires overlapping subdomains. Let $\Omega_{\Gamma_i}$ be the subset, of size $n_{\Gamma_i}$, of nodes that are at distance one in $G(C)$ from the nodes in $\Omega_{I_i}$ ($1 \le i \le N$). The overlapping subdomain $\Omega_i$ is defined to be $\Omega_i = [\Omega_{I_i}, \Omega_{\Gamma_i}]$, with size $n_i = n_{I_i} + n_{\Gamma_i}$. The complement of $\Omega_i$ in $[\![1, n]\!]$ is denoted by $\Omega_{c\Gamma_i}$. Associated with $\Omega_i$ is a restriction (or projection) matrix $R_i \in \mathbb{R}^{n_i \times n}$ given by $R_i = I_n(\Omega_i, :)$. $R_i$ maps from the global domain to subdomain $\Omega_i$. Its transpose $R_i^\top$ is a prolongation matrix that maps from subdomain $\Omega_i$ to the global domain.

The theory in this paper requires a decomposition of the graph of $C^2$. Hence, in addition to the previous subsets, we define the following ones. We denote by $\Omega_{\Delta_i}$ the subset, of size $n_{\Delta_i}$, containing the nodes that are not in $\Omega_{I_i}$ and are at distance one in $G(C)$ from the nodes in $\Omega_{\Gamma_i}$ ($1 \le i \le N$). The extended overlapping subdomain $\widetilde{\Omega}_i$ is defined to be $\widetilde{\Omega}_i = [\Omega_{I_i}, \Omega_{\Gamma_i}, \Omega_{\Delta_i}]$ and is of size $\widetilde{n}_i$. We denote the complement of $\widetilde{\Omega}_i$ in $[\![1, n]\!]$ by $\Omega_{c\Delta_i}$. Associated with $\widetilde{\Omega}_i$ is a restriction matrix $\widetilde{R}_i \in \mathbb{R}^{\widetilde{n}_i \times n}$ given by $\widetilde{R}_i = I_n(\widetilde{\Omega}_i, :)$. $\widetilde{R}_i$ maps from the global domain to the extended overlapping subdomain $\widetilde{\Omega}_i$. Its transpose $\widetilde{R}_i^\top$ is a prolongation matrix that maps from the extended overlapping subdomain $\widetilde{\Omega}_i$ to the global domain.

The one-level additive Schwarz preconditioner [12] is defined to be

$$M_{\mathrm{ASM}}^{-1} = \sum_{i=1}^{N} R_i^\top C_{ii}^{-1} R_i, \qquad C_{ii} = R_i C R_i^\top.$$

Applying this preconditioner to a vector involves solving concurrent local problems in the overlapping subdomains. Increasing $N$ reduces the sizes $n_i$ of the overlapping subdomains, leading to smaller local problems and faster computations. However, in practice, the system preconditioned by $M_{\mathrm{ASM}}^{-1}$ may not be well-conditioned, inhibiting convergence of the iterative solver. In fact, the local nature of this preconditioner can lead to a deterioration in its effectiveness as the number of subdomains increases because of the lack of global information from the matrix $C$ [12, 15]. To maintain robustness with respect to $N$, a coarse space (also known as a second-level correction) that includes global information is added to the preconditioner.
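As a concrete illustration, the following Python sketch (NumPy/SciPy, with a hypothetical list omega of 0-based overlapping index sets) applies $M_{\mathrm{ASM}}^{-1}$ to a vector; a production code would factorize each $C_{ii}$ once and solve the local problems concurrently.

    import numpy as np
    import scipy.sparse.linalg as spla

    def apply_asm(C, omega, r):
        """Apply the one-level additive Schwarz preconditioner to r.

        C     : global sparse SPD matrix (n x n, CSR)
        omega : list of overlapping subdomain index arrays (0-based)
        r     : vector of length n
        """
        z = np.zeros_like(r)
        for idx in omega:
            C_ii = C[idx, :][:, idx]                      # C_ii = R_i C R_i^T
            z[idx] += spla.spsolve(C_ii.tocsc(), r[idx])  # R_i^T C_ii^{-1} R_i r
        return z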

Let $0 < n_C \ll n$. If $R_0 \in \mathbb{R}^{n_C \times n}$ has full row rank, the two-level additive Schwarz preconditioner [12] is defined to be

$$(2.1)\qquad M_{\mathrm{additive}}^{-1} = \sum_{i=0}^{N} R_i^\top C_{ii}^{-1} R_i = R_0^\top C_{00}^{-1} R_0 + M_{\mathrm{ASM}}^{-1}, \qquad C_{00} = R_0 C R_0^\top.$$

Observe that, since $C$ and $R_0$ are of full rank, $C_{00}$ is also of full rank. For any full-rank $R_0$, it is possible to cheaply obtain upper bounds on the largest eigenvalue of the preconditioned matrix, independently of $n$ and $N$ [2]. However, bounding the smallest eigenvalue is highly dependent on $R_0$. Therefore, the choice of $R_0$ is key to obtaining a well-conditioned system and building efficient two-level Schwarz preconditioners. Two-level Schwarz preconditioners have been used to solve a large class of systems arising from a range of engineering applications (see, for example, [18, 24, 30, 32, 40, 46] and references therein).

Following [2], we denote by $D_i \in \mathbb{R}^{n_i \times n_i}$ ($1 \le i \le N$) any non-negative diagonal matrices such that

$$\sum_{i=1}^{N} R_i^\top D_i R_i = I_n.$$

We refer to $(D_i)_{1 \le i \le N}$ as an algebraic partition of unity. In [2], Al Daas and Grigori show how to select local subspaces $Z_i \in \mathbb{R}^{n_i \times p_i}$ with $p_i \ll n_i$ ($1 \le i \le N$) such that, if $R_0^\top$ is defined to be $R_0^\top = [R_1^\top D_1 Z_1, \ldots, R_N^\top D_N Z_N]$, the spectral condition number of the preconditioned matrix $M_{\mathrm{additive}}^{-1} C$ is bounded from above independently of $N$ and $n$.
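One simple algebraic choice of the partition-of-unity weights, used here purely as an illustrative sketch (hypothetical helper, 0-based index sets), takes each diagonal entry of $D_i$ to be the inverse of the number of subdomains containing that node; the snippet below builds such weights and checks the identity above.

    import numpy as np

    def partition_of_unity(omega, n):
        """Diagonals of D_i (one vector per subdomain) such that
        sum_i R_i^T D_i R_i = I_n, using inverse node multiplicity."""
        mult = np.zeros(n)
        for idx in omega:
            mult[idx] += 1.0            # number of subdomains owning each node
        return [1.0 / mult[idx] for idx in omega]

    # sanity check on two overlapping subdomains of [0, 10)
    n = 10
    omega = [np.arange(0, 6), np.arange(4, 10)]
    acc = np.zeros(n)
    for idx, d in zip(omega, partition_of_unity(omega, n)):
        acc[idx] += d
    assert np.allclose(acc, 1.0)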

2.2. Algebraic local SPSD splitting of an SPD matrix. We now recall the definition of an algebraic local SPSD splitting of an SPD matrix given in [2] and generalized in [3].

An algebraic local SPSD splitting of the SPD matrix $C$ with respect to the $i$-th subdomain is defined to be any SPSD matrix $\widetilde{C}_i \in \mathbb{R}^{n \times n}$ that satisfies the following:

$$0 \le u^\top \widetilde{C}_i u \le u^\top C u, \quad \text{for all } u \in \mathbb{R}^n,$$
$$R_{c\Gamma_i} \widetilde{C}_i = 0.$$

We denote the nonzero submatrix of $\widetilde{C}_i$ by $\widetilde{C}_{ii}$, so that

$$\widetilde{C}_i = R_i^\top \widetilde{C}_{ii} R_i.$$

Associated with the local SPSD splitting matrices, we define a multiplicity constant $k_m$ that satisfies the inequality

$$(2.2)\qquad 0 \le \sum_{i=1}^{N} u^\top \widetilde{C}_i u \le k_m\, u^\top C u, \quad \text{for all } u \in \mathbb{R}^n.$$

Note that, for any set of SPSD splitting matrices, $k_m \le N$.

The main motivation for defining splitting matrices is to find local seminorms that are bounded from above by the $C$-norm. These seminorms will be used to determine a subspace that contains the eigenvectors of $C$ associated with its smallest eigenvalues.

3. Local SPSD splitting matrices. In this section we show how to construct local SPSD splitting matrices of a sparse SPD matrix efficiently using only local subdomain information.

3.1. From normal equations matrices to general SPD matrices. In [4], the authors presented how to compute local SPSD splitting matrices for the normal equations matrix $C = B^\top B$, where $B \in \mathbb{R}^{m \times n}$. Considering the case $B = A$, we have $C = A^2$. Thus, given the theory developed in [4], we can compute local SPSD splitting matrices of $A^2$ efficiently. Using the permutation matrix $P_i = I_n([\Omega_{I_i}, \Omega_{\Gamma_i}, \Omega_{\Delta_i}, \Omega_{c\Delta_i}], :)$, we can write

$$P_i A P_i^\top = \begin{pmatrix} A_{I_i} & A_{I\Gamma_i} & & \\ A_{\Gamma I_i} & A_{\Gamma_i} & A_{\Gamma\Delta_i} & \\ & A_{\Delta\Gamma_i} & A_{\Delta_i} & A_{\Delta c\Delta_i} \\ & & A_{c\Delta\Delta_i} & A_{c\Delta_i} \end{pmatrix},$$

and

$$\widetilde{C}_i = \widetilde{R}_i^\top X_i^\top X_i \widetilde{R}_i$$

is an SPSD splitting of $A^2$, where $X_i$ is given by

$$(3.1)\qquad X_i = R_i A \widetilde{R}_i^\top = \begin{pmatrix} A_{I_i} & A_{I\Gamma_i} & \\ A_{\Gamma I_i} & A_{\Gamma_i} & A_{\Gamma\Delta_i} \end{pmatrix}.$$

Remark 3.1. All terms from (3.1) stem from the original coeﬃcient matrix A, in

the sense that there is no connection with the underlying discretization scheme or

matrix assembly procedure. In a parallel computing context, e.g., if Ais distributed

following a contiguous one-dimensional row partitioning among MPI processes, all

terms may be retrieved using peer-to-peer communication between neighboring

processes.
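In a serial Python setting, extracting $X_i$ amounts to selecting the rows of $A$ indexed by $\Omega_i$ and the columns indexed by $\widetilde{\Omega}_i$; the sketch below (assuming 0-based index arrays) mirrors what the PETSc routine MatCreateSubMatrices, mentioned in Section 4, performs in parallel.

    import scipy.sparse as sp

    def local_block_row(A, omega_i, omega_tilde_i):
        """X_i = R_i A R~_i^T: rows of A in the overlapping subdomain,
        restricted to the columns of the extended overlapping subdomain.
        Since Omega~_i adds one more layer of neighbors, all nonzeros of
        the selected rows fall inside these columns."""
        return A.tocsr()[omega_i, :][:, omega_tilde_i]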

Lemma 3.2 demonstrates how to obtain a local SPSD splitting of $A$ with respect to the extended overlapping subdomains given an SPSD splitting of $A^2$.

Lemma 3.2. Let $\widetilde{C}_i$ be a local SPSD splitting of $C = A^2$, and let $\widetilde{A}_i$ be the SPSD square root of $\widetilde{C}_i$, i.e., $\widetilde{A}_i^2 = \widetilde{C}_i$. Then, $\widetilde{A}_i$ is a local SPSD splitting of $A$ with respect to the extended overlapping subdomain $\widetilde{\Omega}_i$.

Proof. First, observe that for any vector $u \in \mathbb{R}^n$,

$$u^\top (A^2 - \widetilde{A}_i^2) u = u^\top (A + \widetilde{A}_i)(A - \widetilde{A}_i) u.$$

Since $A + \widetilde{A}_i$ is SPD, we can write $A + \widetilde{A}_i = W_i^\top W_i$, and we have

$$\begin{aligned}
u^\top W_i (A - \widetilde{A}_i) W_i^{-1} u &= u^\top W_i^{-\top} W_i^\top W_i (A - \widetilde{A}_i) W_i^{-1} u \\
&= v^\top W_i^\top W_i (A - \widetilde{A}_i) v \\
&= v^\top (A + \widetilde{A}_i)(A - \widetilde{A}_i) v \\
&= v^\top (A^2 - \widetilde{A}_i^2) v \\
&\ge 0,
\end{aligned}$$

where $v = W_i^{-1} u$. Since $W_i (A - \widetilde{A}_i) W_i^{-1}$ and $A - \widetilde{A}_i$ have the same eigenvalues, we conclude that $A - \widetilde{A}_i$ is SPSD. The locality of $\widetilde{A}_i$ stems from the locality of $\widetilde{C}_i$.
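As a small numerical check of Lemma 3.2, the following Python sketch (a toy 1D Laplacian, illustrative only) computes $\widetilde{A}_{ii} = (X_i^\top X_i)^{1/2}$ from the economy-size SVD of $X_i$ and verifies that $A - \widetilde{R}_i^\top \widetilde{A}_{ii} \widetilde{R}_i$ is positive semidefinite.

    import numpy as np

    def spsd_square_root(X):
        """(X^T X)^{1/2} via the economy-size SVD X = U S V^T, i.e., V S V^T."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return Vt.T @ (s[:, None] * Vt)

    n = 8
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # SPD 1D Laplacian
    omega, tilde = np.arange(3), np.arange(5)   # Omega_i and Omega~_i
    X = A[np.ix_(omega, tilde)]                 # X_i = R_i A R~_i^T
    A_ii = spsd_square_root(X)                  # local splitting, size |Omega~_i|
    E = A.copy()
    E[np.ix_(tilde, tilde)] -= A_ii             # A - R~_i^T A~_ii R~_i
    assert np.linalg.eigvalsh(E).min() > -1e-10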


We note that the SPSD splitting $\widetilde{A}_i$ obtained from the SPSD splitting of $A^2$ is local with respect to the extended overlapping subdomain $\widetilde{\Omega}_i$. A Schur complement technique can be applied to obtain locality with respect to the subdomain $\Omega_i$. Lemma 3.3 presents how to obtain a local SPSD splitting matrix of $A$ with respect to the subdomain $\Omega_i$ from the local SPSD splitting of $A$ with respect to the extended overlapping subdomain $\widetilde{\Omega}_i$.

Lemma 3.3. Let $\widetilde{A}_i = \widetilde{R}_i^\top \widetilde{A}_{ii} \widetilde{R}_i$ be a local SPSD splitting of $A$ with respect to the extended overlapping subdomain $\widetilde{\Omega}_i$. Let $\widetilde{A}_{ii}$ be written as a $2 \times 2$ block matrix such that the (1,1) block corresponds to the overlapping subdomain $\Omega_i$ and the (2,2) block corresponds to $\Omega_{\Delta_i}$, i.e.,

$$\widetilde{A}_{ii} = \begin{pmatrix} X_{i,11} & X_{i,12} \\ X_{i,21} & X_{i,22} \end{pmatrix},$$

and let

$$(3.2)\qquad \bar{A}_{ii} = X_{i,11} - X_{i,12} X_{i,22}^{-1} X_{i,21},$$

where we assume that $X_{i,22}$ is SPD. Then, $\bar{A}_i = R_i^\top \bar{A}_{ii} R_i$ is an SPSD splitting of $A$ with respect to the subdomain $\Omega_i$.

Proof. We have

$$\widetilde{A}_{ii} = \begin{pmatrix} X_{i,11} & X_{i,12} \\ X_{i,21} & X_{i,22} \end{pmatrix} = \begin{pmatrix} X_{i,11} - X_{i,12} X_{i,22}^{-1} X_{i,21} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} X_{i,12} X_{i,22}^{-1} X_{i,21} & X_{i,12} \\ X_{i,21} & X_{i,22} \end{pmatrix}.$$

Since $X_{i,22}$ is SPD and $\widetilde{A}_{ii}$ is SPSD, $X_{i,11} - X_{i,12} X_{i,22}^{-1} X_{i,21}$ is SPSD. Therefore,

$$0 \le u^\top \bar{A}_i u = u^\top R_i^\top \bar{A}_{ii} R_i u \le u^\top \widetilde{R}_i^\top \widetilde{A}_{ii} \widetilde{R}_i u \le u^\top A u.$$

Remark 3.4. Since the SPSD splitting will be used to construct a preconditioner, the assumption in Lemma 3.3 that $X_{i,22}$ is SPD can be ensured by shifting its diagonal elements by a small value such as $\|X_{i,22}\|_2\, \varepsilon$, where $\varepsilon$ is the floating-point machine precision. One can also shift the diagonal values of the matrix $\widetilde{A}_{ii}$ by a small value $\|\widetilde{A}_{ii}\|_2\, \varepsilon$ so that the Schur complement is well defined.

In the following section, we explain how to compute the local SPSD splitting matrices efficiently.

3.2. Practical construction of local SPSD matrices. The construction of robust two-level overlapping Schwarz preconditioners is based on computing the coarse space projection operator $R_0$. Using the local SPSD splitting matrices of $A$, $R_0^\top$ can be chosen as a matrix whose columns span the space

$$Z = \bigoplus_{i=1}^{N} R_i^\top D_i Z_i,$$

where $Z_i$ is defined to be

$$(3.3)\qquad Z_i = \mathrm{span}\{u \mid D_i A_{ii} D_i u = \lambda \bar{A}_{ii} u \text{ and } \lambda > 1/\tau\},$$

where $\tau > 0$ is a user-specified number. The condition number of the preconditioned matrix $M_{\mathrm{additive}}^{-1} A$ is bounded from above by $(k_c + 1)\left(2 + (2k_c + 1)\frac{k_m}{\tau}\right)$, where $k_c$ is the number of colors required to color the graph of $A$ such that any two neighboring subdomains have different colors and $k_m$ is the multiplicity constant that satisfies (2.2).
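In the implementation described below, (3.3) is solved with SLEPc; as a minimal dense sketch, under the assumption that $D_i$, $A_{ii}$, and $\bar{A}_{ii}$ are available as NumPy arrays (with $\bar{A}_{ii}$ made definite as in Remark 3.4), the selection of $Z_i$ can be written as follows.

    import numpy as np
    from scipy.linalg import eigh

    def local_coarse_vectors(D, A_ii, Abar_ii, tau):
        """Solve D_i A_ii D_i u = lambda Abar_ii u and keep the eigenvectors
        with lambda > 1/tau: the contribution Z_i of subdomain i to the
        coarse space. D is the diagonal of the partition of unity."""
        lhs = (D[:, None] * A_ii) * D[None, :]   # D_i A_ii D_i
        w, V = eigh(lhs, Abar_ii)                # generalized symmetric EVP
        return V[:, w > 1.0 / tau]               # Z_i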

Solving the generalized eigenvalue problem in (3.3) using iterative solvers such as the Krylov–Schur method [43] requires solving linear systems of the form $\bar{A}_{ii} u = v$. The matrix $\bar{A}_{ii}$ is the Schur complement (3.2) of the matrix $\widetilde{A}_{ii} = (X_i^\top X_i)^{1/2}$, where $X_i = R_i A \widetilde{R}_i^\top$. Let $X_i = U_i \Sigma_i V_i^\top$ be the economy-size singular value decomposition of $X_i$ and let $V_i^\perp$ be an orthonormal matrix whose columns form a complementary basis of the columns of $V_i$, i.e., $[V_i, V_i^\perp]$ is an orthogonal matrix. Note that $V_i^\perp (V_i^\perp)^\top = I_{\widetilde{n}_i} - V_i V_i^\top$. Using Remark 3.4, $\widetilde{A}_{ii}$ can be chosen as

$$\begin{aligned}
\widetilde{A}_{ii} &= V_i \Sigma_i V_i^\top + \sigma_{1i}\varepsilon I_{\widetilde{n}_i} \\
&= V_i \Sigma_i V_i^\top + \sigma_{1i}\varepsilon [V_i, V_i^\perp][V_i, V_i^\perp]^\top \\
&= V_i(\Sigma_i + \sigma_{1i}\varepsilon I_{n_i})V_i^\top + \sigma_{1i}\varepsilon\, V_i^\perp (V_i^\perp)^\top \\
&= V_i(\Sigma_i + \sigma_{1i}\varepsilon I_{n_i})V_i^\top + \sigma_{1i}\varepsilon (I_{\widetilde{n}_i} - V_i V_i^\top),
\end{aligned}$$

where $\sigma_{1i}$ is the largest singular value of $X_i$. One way to solve the linear system $\bar{A}_{ii} u = v$ is thus to solve the augmented linear system

$$\widetilde{A}_{ii} \begin{pmatrix} u \\ y \end{pmatrix} = \begin{pmatrix} v \\ 0 \end{pmatrix}.$$

Given the singular value decomposition of $X_i$, the solution $u$ can be obtained efficiently. Indeed, the inverse of $\widetilde{A}_{ii}$ is

$$(3.4)\qquad \widetilde{A}_{ii}^{-1} = V_i(\Sigma_i + \sigma_{1i}\varepsilon I_{n_i})^{-1} V_i^\top + \sigma_{1i}^{-1}\varepsilon^{-1}(I_{\widetilde{n}_i} - V_i V_i^\top).$$

In our current implementation, the singular value decomposition is computed concurrently using LAPACK [6]. This implies that the sparse matrix $X_i$, see (3.1), is converted to a dense representation. Then, $\widetilde{A}_{ii}$ is never assembled; instead, the action of its inverse is applied in a matrix-free fashion using (3.4). Since these operations are local to each subdomain, they remain tractable. However, it could be beneficial to leverage the lower memory footprint of iterative sparse singular value solvers, e.g., PRIMME_SVDS [48]. To the best of our knowledge, no such solver can be used to retrieve the complete economy-size singular value decomposition of a sparse matrix.
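A minimal Python sketch of this matrix-free application, assuming a dense local block row X as input, is given below; the Schur complement system $\bar{A}_{ii} u = v$ is then solved by applying (3.4) to the augmented right-hand side and keeping the leading block.

    import numpy as np

    def make_inverse_action(X, eps=np.finfo(float).eps):
        """Return a function applying (3.4), the inverse of the shifted
        A~_ii = V (Sigma + s1*eps*I) V^T + s1*eps*(I - V V^T)."""
        _, s, Vt = np.linalg.svd(X, full_matrices=False)
        s1 = s[0]
        def apply_inv(v):
            w = Vt @ v                                   # V^T v
            return Vt.T @ (w / (s + s1 * eps)) + (v - Vt.T @ w) / (s1 * eps)
        return apply_inv

    def schur_solve(apply_inv, v, n_i, n_tilde_i):
        """u = Abar_ii^{-1} v: leading block of A~_ii^{-1} applied to (v; 0),
        since the (1,1) block of the inverse is the inverse of the Schur
        complement of the (2,2) block (up to the eps regularization)."""
        rhs = np.zeros(n_tilde_i)
        rhs[:n_i] = v
        return apply_inv(rhs)[:n_i]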

Since the construction of the two-level method is fully algebraic, one can successively apply the same approach to the coarse space matrix to obtain a multilevel preconditioner in which the condition number of each preconditioned matrix is bounded from above by a prescribed number. Note that if the matrices $\bar{A}_{ii}$ for $i = 1, \ldots, N$ are formed explicitly as in (3.2), we can use the strategy that we proposed in [3] to construct a multilevel preconditioner with the same properties.


4. Numerical experiments. In this section, we present a variety of numerical experiments that show the effectiveness and efficiency of the proposed preconditioner. First, we compare it against state-of-the-art algebraic multigrid preconditioners, including AGMG [34, 35], BoomerAMG [14], and GAMG [1]. Then, we include numerical experiments where the proposed preconditioner is used to solve coarse problems from other multilevel solvers, thus emphasizing the algebraic and robust traits of our method. Except for AGMG, which is used through its MATLAB interface, all these experiments are performed using PETSc [7]. In particular, the proposed preconditioner is a natural extension of the PCHPDDM infrastructure [24], which we use to solve the concurrent generalized eigenvalue problems from (3.3) via SLEPc [20], and then to define our multilevel preconditioner by selecting the appropriate local eigenmodes depending on the user-specified value of τ. With respect to Remark 3.1, we use the PETSc routine MatCreateSubMatrices, see https://petsc.org/release/docs/manualpages/Mat/MatCreateSubMatrices.html.

Instead of using $M_{\mathrm{additive}}^{-1}$ as defined in (2.1), we will use $M_{\mathrm{deflated}}^{-1}$, defined as

$$M_{\mathrm{deflated}}^{-1} = R_0^\top C_{00}^{-1} R_0 + M_{\mathrm{RAS}}^{-1}(I_n - C R_0^\top C_{00}^{-1} R_0),$$

where $M_{\mathrm{RAS}}^{-1}$ is the well-known one-level restricted additive Schwarz method [9]. The choice of $M_{\mathrm{deflated}}^{-1}$ over $M_{\mathrm{additive}}^{-1}$ is motivated by previous results from the literature [45], which exhibit better numerical properties of the former over the latter.
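As an illustration of how the deflated variant combines the two levels, the following hedged Python sketch applies $M_{\mathrm{deflated}}^{-1}$ to a residual, given a coarse restriction R0, a solver for $C_{00}$, and a one-level RAS application (all hypothetical callables).

    def apply_deflated(C, R0, C00_solve, apply_ras, r):
        """M_deflated^{-1} r = R0^T C00^{-1} R0 r
                               + M_RAS^{-1} (r - C R0^T C00^{-1} R0 r)."""
        coarse = R0.T @ C00_solve(R0 @ r)   # coarse correction
        return coarse + apply_ras(r - C @ coarse)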

Table 1 presents the set of test matrices from the SuiteSparse Matrix Collection [11] that are used first. They represent a subset of the matrices in the collection that satisfy both criteria "Special Structure equal to Symmetric" and "Positive Definite equal to Yes". We highlight the fact that our proposed preconditioner can handle unstructured systems, not necessarily stemming from standard PDE discretization schemes, by displaying some nonzero patterns in Figure 1.

4.1. The algebraic two-level case. In this section, we present a numerical comparison between our proposed preconditioner and three algebraic multigrid solvers: AGMG, BoomerAMG, and GAMG. Even though the matrices from Table 1 are SPD, all three AMG solvers encounter difficulties in solving many of the associated linear systems with random right-hand sides. On the contrary, our algebraic two-level preconditioner is more robust and always reaches the prescribed tolerance of $10^{-8}$. Note that a simple one-level preconditioner such as $M_{\mathrm{RAS}}^{-1}$ with a minimal overlap of one does not converge for these problems. The outer Krylov method is right-preconditioned GMRES(30) [39]. For preconditioners used within PETSc (all except AGMG), the systems are solved using 256 MPI processes and are first renumbered by ParMETIS [27]. For our DD method, a single subdomain is mapped to each process, i.e., $N = 256$ in (2.1). Furthermore, exact subdomain and second-level operator Cholesky factorizations are computed. In the last column of Table 2, the size of the second-level operator is reported. One may notice that the grid complexities fluctuate among matrices. Indeed, for the small-sized problem s3rmt3m3, the grid complexity is $\frac{5{,}357 + 5{,}321}{5{,}357} = 1.99$, while for problem parabolic_fem, it is $\frac{5.26 \cdot 10^5 + 21{,}736}{5.26 \cdot 10^5} = 1.04$.

4.2. The nested-level case. Since our proposed preconditioner is fully algebraic, we now use it recursively to solve the second-level operator from the previous section using yet another two-level method instead of an exact Cholesky factorization. This thus yields an algebraic three-level preconditioner. HPDDM has the capability of automatically redistributing coarse operators on a subset of the MPI processes on which the initial coefficient matrix $A$ is distributed [23].


Table 1
Test matrices taken from the SuiteSparse Matrix Collection.

Identifier      n          nnz(A)       condest(A)
s3rmt3m3        5,357      207,123      4.4·10^10
vanbody         47,072     2,329,056    9.4·10^18
gridgena        48,962     512,084      7.1·10^5
ct20stif        52,329     2,600,295    2.2·10^14
nasasrb         54,870     2,677,324    1.5·10^9
Dubcova2        65,025     1,030,225    10,411
finan512        74,752     596,992      98.4
consph          83,334     6,010,480    3.2·10^7
s3dkt3m2        90,449     3,686,223    6.3·10^11
shipsec8        114,919    3,303,553    1.5·10^14
ship_003        121,728    3,777,036    2.6·10^16
boneS01         127,224    5,516,602    4.2·10^7
bmwcra_1        148,770    10,641,602   9.7·10^8
G2_circuit      150,102    726,674      2·10^7
pwtk            217,918    11,524,432   5·10^12
offshore        259,789    4,242,673    2.3·10^13
af_4_k101       503,625    17,550,675   6.5·10^8
parabolic_fem   525,825    3,674,625    2.1·10^5
apache2         715,176    4,817,870    5.3·10^6
tmt_sym         726,713    5,080,961    1.1·10^9
ecology2        999,999    4,995,991    6.7·10^7

(a) s3rmt3m3, n = 5,357; (b) ct20stif, n = 52,329; (c) finan512, n = 74,752; (d) consph, n = 83,334; (e) G2_circuit, n = 1.5·10^5; (f) offshore, n = 2.6·10^5.

Fig. 1. Nonzero sparsity pattern of some of the test matrices from Table 1.


Table 2
Preconditioner comparison: iteration counts are reported in columns 2–5 if convergence to the prescribed tolerance of 10^-8 is achieved in 100 iterations or less ("–" otherwise). In column 6, the size of the second-level operator generated by our proposed preconditioner is reported.

Identifier      AGMG   BoomerAMG   GAMG   HPDDM   nC
s3rmt3m3        –      –           –      4       5,321
vanbody         –      –           –      18      25,600
gridgena        –      –           –      2       16,706
ct20stif        –      –           –      4       49,421
nasasrb         –      –           –      10      25,600
Dubcova2        76     56          –      5       12,729
finan512        9      7           8      4       15,271
consph          –      –           –      26      25,600
s3dkt3m2        –      –           –      49      25,592
shipsec8        –      –           –      7       76,800
ship_003        –      –           –      9       76,759
boneS01         –      –           –      16      25,600
bmwcra_1        –      –           –      20      76,800
G2_circuit      29     11          26     19      21,602
pwtk            –      –           –      47      25,600
offshore        –      –           –      7       76,800
af_4_k101       –      –           –      18      76,800
parabolic_fem   12     8           16     17      21,736
apache2         14     11          35     7       76,800
tmt_sym         14     10          17     14      32,000
ecology2        18     12          18     45      33,261

We still use 256 MPI processes for the fine-level decomposition, then use four processes for the second-level decomposition, and the third-level operator is centralized on a single process. The outer solver is now flexible GMRES(30) [37]. Second-level systems are this time solved with right-preconditioned GMRES(30), with a looser tolerance of $10^{-4}$, compared to the outer-solver tolerance of $10^{-8}$. We investigate problems s3rmt3m3 and parabolic_fem, which are the two extremes from the previous section in terms of grid complexity. Iteration counts are reported in Table 3. One may notice that the number of outer iterations is exactly the same as in the fifth column of Table 2, meaning that the switch to an inexact second-level solver does not hinder the overall convergence. Also, the number of inner iterations is small, so our proposed preconditioner applied to the second-level operator is indeed robust. Eventually, as we decrease the number of subdomains for the second-level decomposition, the grid coarsening improves as well, especially for the small-sized problem s3rmt3m3.

In another context, we use our proposed preconditioner to solve coarse systems yielded by two other multilevel preconditioners. The following three-dimensional problems are discretized by FreeFEM [17] using 4,096 MPI processes. First, we use GenEO [41] to assemble a two-level analytic preconditioner for a scalar diffusion equation using order-two Lagrange finite elements. The number of unknowns is $4.17 \cdot 10^6$, and the second-level operator generated by GenEO is of dimension $n_{C,2} = 60{,}144$. It is redistributed among 512 processes, and our preconditioner constructs a third-level operator of dimension $n_{C,3} = 12{,}040$.


(a) Scalar diffusion in the unit cube with the coefficient κ extruded in one dimension (κ ranging from 1 to 1.7·10^6). (b) Elongated (10× ratio) three-dimensional beam with Young's modulus E (from 10^-2 to 200 GPa) and Poisson's ratio ν (from 0.25 to 0.45) extruded in one dimension.

Fig. 2. Variations of the material coefficients for problems from Table 4.

Table 3
Algebraic multilevel preconditioner: "Outer iterations" is the FGMRES iteration count, "Inner iterations" is the average GMRES iteration count to solve the coarse systems, n is the size of the linear system, and nC,2 (resp. nC,3) is the size of the second-level (resp. third-level) operator.

Identifier      Outer iterations   Inner iterations   n         nC,2     nC,3
s3rmt3m3        4                  10                 5,357     5,321    2,240
parabolic_fem   17                 3                  525,825   21,736   3,838

Then, we use GAMG to assemble a four-level quasi-algebraic (the near-nullspace is provided by the discretization kernel) preconditioner for the system of linear elasticity using order-two Lagrange finite elements. The number of unknowns is $3.06 \cdot 10^7$. The coarse operator from the GAMG grid hierarchy is of dimension $n_{C,2} = 14{,}880$. It is redistributed among 256 processes using the telescope infrastructure [33], and our preconditioner constructs a final-level operator of dimension $n_{C,3} = 5{,}120$. Unlike what is traditionally done with smoothed-aggregation AMG [47], we do not explicitly transfer the near-nullspace from the GAMG coarse level when setting up our preconditioner. These results are gathered in Table 4. Again, one may notice that the fast and accurate convergence of the inner solves (third column) does not hinder the overall convergence (second column). For both the scalar diffusion equation $\nabla \cdot \kappa \nabla$ and the system of linear elasticity, highly heterogeneous material coefficients are used; see Figure 2a and Figure 2b, respectively.

Furthermore, as in subsection 4.1, note that using a simple one-level preconditioner such as $M_{\mathrm{RAS}}^{-1}$ with a minimal overlap of one for solving the coarse systems from Tables 3 and 4 does not yield accurate enough inner solutions, thus preventing the outer solvers from converging. Coupling GAMG with our preconditioner is a good assessment of the composability of PETSc solvers [8]; for the interested reader, we provide in Figure 3 the exact options used to set up such a multilevel solver.



Table 4
Hybrid multilevel preconditioner: "Outer iterations" is the FGMRES iteration count, "Inner iterations" is the average GMRES iteration count to solve the coarse systems, n is the size of the linear system, nC,2 is the size of the coarse-level operator assembled by either GenEO (for problem diffusion) or GAMG (for problem elasticity), and nC,3 is the size of the second-level operator assembled by our algebraic preconditioner to solve the aforementioned coarse systems.

Identifier   Outer iterations   Inner iterations   n            nC,2     nC,3
diffusion    11                 5                  4,173,281    60,144   12,040
elasticity   8                  11                 30,633,603   14,880   5,120

    -ksp_type fgmres
    -ksp_rtol 1.0e-8
    -pc_type gamg
    -pc_gamg_threshold 0.01
    -pc_gamg_repartition
    -pc_mg_levels 4
    -prefix_push mg_coarse_
      -pc_type telescope
      -prefix_push pc_telescope_
        -reduction_factor 16
      -prefix_pop
    -prefix_pop
    -prefix_push mg_coarse_telescope_
      -ksp_converged_reason
      -ksp_type gmres
      -ksp_pc_side right
      -ksp_norm_type unpreconditioned
      -ksp_rtol 1.0e-4
      -pc_type hpddm
      -prefix_push pc_hpddm_
        -define_subdomains
        -levels_1_pc_type asm           # M_RAS^{-1}
        -levels_1_sub_pc_type cholesky  # subdomain solvers
        -levels_1_eps_nev 20            # smallest lambda in (3.3)
        -levels_1_st_type mat           # inverse of A~_ii from (3.4)
        -coarse_pc_type cholesky        # coarse solver
      -prefix_pop
    -prefix_pop

Fig. 3. PETSc command-line options for coupling GAMG and the proposed preconditioner.

5. Conclusion. We presented in this paper a fully algebraic and locally constructed multilevel overlapping Schwarz preconditioner for which the condition number of the preconditioned matrix can be bounded from above by a user-defined number. The construction of the preconditioner relies on finding local SPSD splitting matrices of the matrix $A$. Computing these splitting matrices involves the computation of the right singular vectors of the local block row matrix, which might be considered costly on the fine level. However, the locality of the computations and the robustness of the method yield a very powerful and scalable preconditioner that can be used as a black-box solver, especially when other black-box preconditioners fail to achieve a desired convergence rate. Our implementation is readily available in the PETSc library. Again, the proposed preconditioner is not meant to replace analytic multilevel preconditioners such as smoothed-aggregation algebraic multigrid and GenEO: when these work, they are algorithmically more efficient. However, employing the proposed preconditioner to solve the corresponding coarse problems proved to be effective and efficient. As future work, we would like to investigate less expensive constructions of SPSD matrices for specific classes of SPD matrices that arise from the discretization of PDEs.

Acknowledgments. This work was granted access to the GENCI-sponsored

HPC resources of TGCC@CEA under allocation A0090607519. The authors would

like to thank J. E. Roman for interesting discussions concerning the solution of (3.3).

REFERENCES

[1] M. F. Adams, H. H. Bayraktar, T. M. Keaveny, and P. Papadopoulos, Ultrascalable implicit finite element analyses in solid mechanics with over a half a billion degrees of freedom, in Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, SC04, IEEE Computer Society, 2004, pp. 34:1–34:15.
[2] H. Al Daas and L. Grigori, A class of efficient locally constructed preconditioners based on coarse spaces, SIAM Journal on Matrix Analysis and Applications, 40 (2019), pp. 66–91.
[3] H. Al Daas, L. Grigori, P. Jolivet, and P.-H. Tournier, A multilevel Schwarz preconditioner based on a hierarchy of robust coarse spaces, SIAM Journal on Scientific Computing, 43 (2021), pp. A1907–A1928.
[4] H. Al Daas, P. Jolivet, and J. A. Scott, A robust algebraic domain decomposition preconditioner for sparse normal equations, 2021, https://arxiv.org/abs/2107.09006.
[5] H. Al Daas, T. Rees, and J. A. Scott, Two-level Nyström–Schur preconditioner for sparse symmetric positive definite matrices, 2021, https://arxiv.org/abs/2101.12164.
[6] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK users' guide, Society for Industrial and Applied Mathematics, 1999.
[7] S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, A. Dener, V. Eijkhout, W. D. Gropp, D. Karpeyev, D. Kaushik, M. G. Knepley, D. A. May, L. C. McInnes, R. T. Mills, T. Munson, K. Rupp, P. Sanan, B. F. Smith, S. Zampini, H. Zhang, and H. Zhang, PETSc web page, 2021, https://petsc.org.
[8] J. Brown, M. G. Knepley, D. A. May, L. C. McInnes, and B. F. Smith, Composable linear solvers for multiphysics, in 2012 11th International Symposium on Parallel and Distributed Computing, 2012, pp. 55–62.
[9] X.-C. Cai and M. Sarkis, A restricted additive Schwarz preconditioner for general sparse linear systems, SIAM Journal on Scientific Computing, 21 (1999), pp. 792–797.
[10] T. Chartier, R. D. Falgout, V. E. Henson, J. Jones, T. Manteuffel, S. McCormick, J. Ruge, and P. S. Vassilevski, Spectral AMGe (ρAMGe), SIAM Journal on Scientific Computing, 25 (2003), pp. 1–26.
[11] T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Transactions on Mathematical Software, 38 (2011), pp. 1–28.
[12] V. Dolean, P. Jolivet, and F. Nataf, An introduction to domain decomposition methods. Algorithms, theory, and parallel implementation, Society for Industrial and Applied Mathematics, 2015.
[13] I. S. Duff, A. M. Erisman, and J. K. Reid, Direct methods for sparse matrices, Oxford University Press, 2017.
[14] R. D. Falgout and U. M. Yang, hypre: a library of high performance preconditioners, Computational Science—ICCS 2002, (2002), pp. 632–641.
[15] M. J. Gander and A. Loneland, SHEM: an optimal coarse space for RAS and its multiscale approximation, in Domain Decomposition Methods in Science and Engineering XXIII, C.-O. Lee, X.-C. Cai, D. E. Keyes, H. H. Kim, A. Klawonn, E.-J. Park, and O. B. Widlund, eds., Cham, 2017, Springer International Publishing, pp. 313–321.
[16] L. Gouarin and N. Spillane, Fully algebraic domain decomposition preconditioners with adaptive spectral bounds, preprint, June 2021, https://hal.archives-ouvertes.fr/hal-03258644.
[17] F. Hecht, New development in FreeFem++, Journal of Numerical Mathematics, 20 (2012), pp. 251–265.
[18] A. Heinlein, C. Hochmuth, and A. Klawonn, Reduced dimension GDSW coarse spaces for monolithic Schwarz domain decomposition methods for incompressible fluid flow problems, International Journal for Numerical Methods in Engineering, 121 (2020), pp. 1101–1119.
[19] A. Heinlein, A. Klawonn, J. Knepper, O. Rheinbach, and O. B. Widlund, Adaptive GDSW coarse spaces of reduced dimension for overlapping Schwarz methods, technical report, Universität zu Köln, September 2020, https://kups.ub.uni-koeln.de/12113/.
[20] V. Hernandez, J. E. Roman, and V. Vidal, SLEPc: a scalable and flexible toolkit for the solution of eigenvalue problems, ACM Transactions on Mathematical Software, 31 (2005), pp. 351–362, https://slepc.upv.es.
[21] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards, 49 (1952), pp. 409–436.
[22] N. J. Higham and T. Mary, A new preconditioner that exploits low-rank approximations to factorization error, SIAM Journal on Scientific Computing, 41 (2019), pp. A59–A82.
[23] P. Jolivet, F. Hecht, F. Nataf, and C. Prud'homme, Scalable domain decomposition preconditioners for heterogeneous elliptic problems, in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '13, New York, NY, USA, 2013, ACM, pp. 80:1–80:11.
[24] P. Jolivet, J. E. Roman, and S. Zampini, KSPHPDDM and PCHPDDM: extending PETSc with advanced Krylov methods and robust multilevel overlapping Schwarz preconditioners, Computers & Mathematics with Applications, 84 (2021), pp. 277–295.
[25] T. B. Jönsthövel, M. B. van Gijzen, C. Vuik, C. Kasbergen, and A. Scarpas, Preconditioned conjugate gradient method enhanced by deflation of rigid body modes applied to composite materials, Computer Modeling in Engineering & Sciences, 47 (2009), pp. 97–118.
[26] T. B. Jönsthövel, M. B. van Gijzen, C. Vuik, and A. Scarpas, On the use of rigid body modes in the deflated preconditioned conjugate gradient method, SIAM Journal on Scientific Computing, 35 (2013), pp. B207–B225.
[27] G. Karypis and V. Kumar, Multilevel k-way partitioning scheme for irregular graphs, Journal of Parallel and Distributed Computing, 48 (1998), pp. 96–129.
[28] A. Klawonn, M. Kühn, and O. Rheinbach, Adaptive coarse spaces for FETI-DP in three dimensions, SIAM Journal on Scientific Computing, 38 (2016), pp. A2880–A2911.
[29] A. Klawonn, P. Radtke, and O. Rheinbach, FETI-DP methods with an adaptive coarse space, SIAM Journal on Numerical Analysis, 53 (2015), pp. 297–320.
[30] F. Kong and X.-C. Cai, A scalable nonlinear fluid–structure interaction solver based on a Schwarz preconditioner with isogeometric unstructured coarse spaces in 3D, Journal of Computational Physics, 340 (2017), pp. 498–518.
[31] R. Li, Y. Xi, and Y. Saad, Schur complement-based domain decomposition preconditioners with low-rank corrections, Numerical Linear Algebra with Applications, 23 (2016), pp. 706–729.
[32] P. Marchand, X. Claeys, P. Jolivet, F. Nataf, and P.-H. Tournier, Two-level preconditioning for h-version boundary element approximation of hypersingular operator with GenEO, Numerische Mathematik, 146 (2020), pp. 597–628.
[33] D. A. May, P. Sanan, K. Rupp, M. G. Knepley, and B. F. Smith, Extreme-scale multigrid components within PETSc, in Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '16, New York, NY, USA, 2016, Association for Computing Machinery.
[34] A. Napov and Y. Notay, An algebraic multigrid method with guaranteed convergence rate, SIAM Journal on Scientific Computing, 34 (2012), pp. A1079–A1109.
[35] Y. Notay, An aggregation-based algebraic multigrid method, Electronic Transactions on Numerical Analysis, 37 (2010), pp. 123–146, http://agmg.eu.
[36] J. W. Pearson and J. Pestana, Preconditioners for Krylov subspace methods: an overview, GAMM-Mitteilungen, 43 (2020), p. e202000015.
[37] Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM Journal on Scientific Computing, 14 (1993), pp. 461–469.
[38] Y. Saad, Iterative methods for sparse linear systems, Society for Industrial and Applied Mathematics, 2003.
[39] Y. Saad and M. H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing, 7 (1986), pp. 856–869.
[40] B. F. Smith, P. E. Bjørstad, and W. D. Gropp, Domain decomposition: parallel multilevel methods for elliptic partial differential equations, Cambridge University Press, 1996.
[41] N. Spillane, V. Dolean, P. Hauret, F. Nataf, C. Pechstein, and R. Scheichl, Abstract robust coarse spaces for systems of PDEs via generalized eigenproblems in the overlaps, Numerische Mathematik, 126 (2014), pp. 741–770.
[42] N. Spillane and D. Rixen, Automatic spectral coarse spaces for robust finite element tearing and interconnecting and balanced domain decomposition algorithms, International Journal for Numerical Methods in Engineering, 95 (2013), pp. 953–990.
[43] G. W. Stewart, A Krylov–Schur algorithm for large eigenproblems, SIAM Journal on Matrix Analysis and Applications, 23 (2002), pp. 601–614.
[44] R. Tamstorf, T. Jones, and S. F. McCormick, Smoothed aggregation multigrid for cloth simulation, ACM Transactions on Graphics, 34 (2015).
[45] J. M. Tang, R. Nabben, C. Vuik, and Y. A. Erlangga, Comparison of two-level preconditioners derived from deflation, domain decomposition and multigrid methods, Journal of Scientific Computing, 39 (2009), pp. 340–370.
[46] J. Van lent, R. Scheichl, and I. G. Graham, Energy-minimizing coarse spaces for two-level Schwarz methods for multiscale PDEs, Numerical Linear Algebra with Applications, 16 (2009), pp. 775–799.
[47] P. Vaněk, Acceleration of convergence of a two-level algorithm by smoothing transfer operators, Applications of Mathematics, 37 (1992), pp. 265–274.
[48] L. Wu, E. Romero, and A. Stathopoulos, PRIMME_SVDS: a high-performance preconditioned SVD solver for accurate large-scale computations, SIAM Journal on Scientific Computing, 39 (2017), pp. S248–S271.
[49] S. Zampini, PCBDDC: a class of robust dual-primal methods in PETSc, SIAM Journal on Scientific Computing, 38 (2016), pp. S282–S306.