A ROBUST ALGEBRAIC MULTILEVEL DOMAIN
DECOMPOSITION PRECONDITIONER FOR SPARSE SYMMETRIC
POSITIVE DEFINITE MATRICES
HUSSAM AL DAAS* AND PIERRE JOLIVET†

*STFC Rutherford Appleton Laboratory, Harwell Campus, Didcot, Oxfordshire, OX11 0QX, UK (hussam.al-daas@stfc.ac.uk).
†CNRS, ENSEEIHT, 2 rue Charles Camichel, 31071 Toulouse Cedex 7, France (pierre.jolivet@enseeiht.fr).

Submitted to the editors September 13, 2021.
Abstract. Domain decomposition (DD) methods are widely used as preconditioner techniques.
Their effectiveness relies on the choice of a locally constructed coarse space. Thus far, this
construction was mostly achieved using non-assembled matrices from discretized partial differential
equations (PDEs). Therefore, DD methods were mainly successful when solving systems stemming
from PDEs. In this paper, we present a fully algebraic multilevel DD method where the coarse space
can be constructed locally and efficiently without any information besides the coefficient matrix.
The condition number of the preconditioned matrix can be bounded by a user-prescribed number.
Numerical experiments illustrate the effectiveness of the preconditioner on a range of problems arising
from different applications.
Key words. Algebraic domain decomposition, multilevel preconditioner, overlapping Schwarz
method, sparse linear system.
1. Introduction. We are interested in solving the linear system of equations
$Ax = b,$
where $A \in \mathbb{R}^{n \times n}$ is a sparse symmetric positive definite (SPD) matrix and $b \in \mathbb{R}^n$ is
the right-hand side. On the one hand, despite their accuracy, direct methods [13] that are based on matrix factorizations become demanding in terms of memory and computation for large-scale problems. Furthermore, establishing a high level of concurrency in their algorithms is challenging, which limits the effectiveness of their parallelization across many processing units, e.g., thousands of MPI processes. On the other hand, iterative methods, such as Krylov subspace methods, are attractive as they require fewer memory resources and are easier to parallelize. However, their convergence depends on the coefficient matrix $A$, the initial guess $x_0$, and the right-hand side $b$.
More precisely, the error at iteration $k$ of the conjugate gradient method [21] satisfies

$\|x_k - x_\star\|_A \le 2\, \|x_0 - x_\star\|_A \left( \frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1} \right)^{k},$

where $x_\star$ is the exact solution and $\kappa_2(A)$ is the spectral condition number of $A$.
Therefore, iterative methods are usually combined with preconditioners that modify the properties of the linear system so that the convergence rate of the method is improved. A variety of preconditioning techniques have been proposed in the
literature, see the recent survey [36] and references therein. We focus in this work on
preconditioners for SPD matrices. In terms of construction type, these preconditioners
can be split into two categories. (1) Algebraic preconditioners: those do not require
information from the problem besides the linear system, and their construction relies
only on $A$ and $b$ [5, 22, 31, 35, 38]. (2) Analytic preconditioners: in order to construct them, more information from the origin of the linear system, e.g., the matrix assembly procedure, is required [25, 26, 41]. Inferring how preconditioners modify
the spectrum of iteration matrices provides another way to classify them. Again, two
categories exist. (1) One-level preconditioners: those mostly rely on incomplete matrix
factorizations, matrix splitting methods, approximate sparse inverse methods, and
Schwarz methods [38]. One-level preconditioners usually bound from above the largest
eigenvalue of the preconditioned matrix. (2) Two-level and multilevel preconditioners:
those are usually a combination of a one-level method and a coarse space correction.
While the one-level part can bound from above the largest eigenvalue, the coarse space
is used to bound from below the smallest eigenvalue such that the condition number of
the preconditioned matrix is bounded [2,3,4,12,15,16,18,19,28,29,31,49,42,45].
When it comes to overlapping DD, most one-level preconditioners and a few two-
level/multilevel preconditioners are algebraic, while most two-level preconditioners are
analytic. On the one hand, analytic two-level/multilevel preconditioners construct
the coarse space efficiently without requiring computations involving the global
matrix. On the other hand, existing algebraic two-level/multilevel preconditioners
still require global computations involving the matrix $A$ that limit the setup
scalability [2,16]. Furthermore, certain algebraic two-level preconditioners require
complicated operations that may not be easy to parallelize. Therefore, we focus
in this paper on two-level/multilevel preconditioners where the coarse space can be
constructed locally. Certain algebraic multigrid (AMG) methods are examples of these
preconditioners [35]. Note that several AMG methods require unassembled matrices or
the near-nullspace of the global matrix, which is known in some applications [10,44].
One could argue that these methods are thus not purely algebraic. Furthermore,
their effectiveness has been proved only for certain classes of matrices. An algebraic
two-level preconditioner for the normal equations was recently proposed in [4].
In [2], the authors presented an algebraic framework to construct robust coarse
spaces and characterized a class of local symmetric positive semi-definite (SPSD)
matrices that allows such coarse spaces to be constructed efficiently. Since then, there have
been several attempts to construct algebraic two-level preconditioners with a locally
computed coarse space that are theoretically effective on any sparse SPD matrix,
see, e.g., [16] and references therein. Starting from the subdomain matrices of $A$, the authors in [16] define an auxiliary matrix $A^+$ such that $A - A^+$ is low-rank and a local SPSD splitting of $A^+$ is easily obtained. A robust algebraic two-level preconditioner for $A$ is then derived by a low-rank update of the robust algebraic two-level preconditioner of $A^+$. Although the preconditioner proposed in [16] is fully algebraic, using it in practice may not be very attractive since the low-rank update requires the solution of linear systems with $A^+$ involving a large number of right-hand sides, nearly equal to the size of the coarse space of $A^+$, which is prohibitive for a large number of subdomains. Therefore, we believe that the question
of finding efficient locally constructed coarse spaces is still open.
When information such as the near-nullspace or the subdomain non-assembled
matrices are available, analytic AMG or DD preconditioners are optimal. The
preconditioner presented in this paper should be used when a robust black-box solver
is needed.
The manuscript is organized as follows. We introduce the notation and review
the algebraic DD framework in Section 2. Section 3 presents our main contribution: a fully algebraic and inexpensive way of finding local SPSD splitting matrices associated with each subdomain, starting from local data. These matrices
will be used to construct a robust two-level Schwarz preconditioner. Then, we briefly
discuss the straightforward extension of our approach to a multilevel preconditioner.
Afterwards, we present in Section 4 numerical experiments on problems arising from
different engineering applications. Concluding remarks and future lines of research
are given in Section 5.
Notation. We end our introduction by defining notations that will be used in this paper. Let $1 \le n \le m$ and let $B \in \mathbb{R}^{m \times n}$. Let $S_1 \subset \llbracket 1, m \rrbracket$ and $S_2 \subset \llbracket 1, n \rrbracket$ be two sets of integers. $B(S_1, :)$ is the submatrix of $B$ formed by the rows whose indices belong to $S_1$ and $B(:, S_2)$ is the submatrix of $B$ formed by the columns whose indices belong to $S_2$. The matrix $B(S_1, S_2)$ is formed by taking the rows whose indices belong to $S_1$ and only retaining the columns whose indices belong to $S_2$. The concatenation of any two sets of integers $S_1$ and $S_2$ is represented by $[S_1, S_2]$. Note that the order of the concatenation is important. The set of the first $p$ positive integers is denoted by $\llbracket 1, p \rrbracket$. The identity matrix of size $n$ is denoted by $I_n$.
2. Domain decomposition. Throughout this section, we assume that $C$ is a general $n \times n$ sparse SPD matrix. Let the nodes $V$ in the corresponding adjacency graph $G(C)$ be numbered from 1 to $n$. A graph partitioning algorithm can be used to split $V$ into $N \le n$ disjoint subsets $\Omega_{Ii}$ ($1 \le i \le N$) of size $n_{Ii}$. These sets are called nonoverlapping subdomains.
2.1. Abstract setting for two-level overlapping Schwarz methods. Defining first a one-level Schwarz preconditioner requires overlapping subdomains. Let $\Omega_{\Gamma i}$ be the subset of size $n_{\Gamma i}$ of nodes that are at distance one in $G(C)$ from the nodes in $\Omega_{Ii}$ ($1 \le i \le N$). The overlapping subdomain $\Omega_i$ is defined to be $\Omega_i = [\Omega_{Ii}, \Omega_{\Gamma i}]$, with size $n_i = n_{Ii} + n_{\Gamma i}$. The complement of $\Omega_i$ in $\llbracket 1, n \rrbracket$ is denoted by $\Omega_{c\Gamma i}$. Associated with $\Omega_i$ is a restriction (or projection) matrix $R_i \in \mathbb{R}^{n_i \times n}$ given by $R_i = I_n(\Omega_i, :)$. $R_i$ maps from the global domain to subdomain $\Omega_i$. Its transpose $R_i^\top$ is a prolongation matrix that maps from subdomain $\Omega_i$ to the global domain.
The theory in this paper requires a decomposition of the graph of $C^2$. Hence, in addition to the previous subsets, we define the following ones. We denote by $\Omega_{\Delta i}$ the subset of size $n_{\Delta i}$ containing the nodes that are not in $\Omega_{Ii}$ and are at distance one in $G(C)$ from the nodes in $\Omega_{\Gamma i}$ ($1 \le i \le N$). The extended overlapping subdomain $\widetilde{\Omega}_i$ is defined to be $\widetilde{\Omega}_i = [\Omega_{Ii}, \Omega_{\Gamma i}, \Omega_{\Delta i}]$ and it is of size $\widetilde{n}_i$. We denote the complement of $\widetilde{\Omega}_i$ in $\llbracket 1, n \rrbracket$ by $\widetilde{\Omega}_{ci}$. Associated with $\widetilde{\Omega}_i$ is a restriction matrix $\widetilde{R}_i \in \mathbb{R}^{\widetilde{n}_i \times n}$ given by $\widetilde{R}_i = I_n(\widetilde{\Omega}_i, :)$. $\widetilde{R}_i$ maps from the global domain to the extended overlapping subdomain $\widetilde{\Omega}_i$. Its transpose $\widetilde{R}_i^\top$ is a prolongation matrix that maps from the extended overlapping subdomain $\widetilde{\Omega}_i$ to the global domain.
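To make these index sets concrete, the following Python/SciPy sketch builds $\Omega_{\Gamma i}$, $\Omega_i$, $\Omega_{\Delta i}$, and $\widetilde{\Omega}_i$ for one subdomain from the adjacency graph of a sparse matrix. It is only an illustration under the assumption that the nonoverlapping set $\Omega_{Ii}$ is provided, e.g., by a graph partitioner; function and variable names are ours, not the authors' implementation.

    import numpy as np
    import scipy.sparse as sp

    def distance_one(C, nodes, exclude):
        """Nodes of G(C) at distance one from `nodes`, excluding the indices in `exclude`."""
        C = sp.csr_matrix(C)
        mask = np.zeros(C.shape[0], dtype=bool)
        mask[C[np.asarray(nodes), :].indices] = True   # column indices of the selected rows = neighbors
        mask[np.asarray(exclude)] = False
        return np.flatnonzero(mask)

    def overlapping_subdomains(C, omega_I):
        """Return (Omega_i, Omega_tilde_i) for one nonoverlapping subdomain Omega_Ii."""
        omega_I = np.asarray(omega_I)
        omega_Gamma = distance_one(C, omega_I, omega_I)              # Omega_Gamma_i
        omega_i = np.concatenate([omega_I, omega_Gamma])             # Omega_i = [Omega_Ii, Omega_Gamma_i]
        omega_Delta = distance_one(C, omega_Gamma, omega_i)          # Omega_Delta_i
        return omega_i, np.concatenate([omega_i, omega_Delta])       # Omega_tilde_i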
The one-level additive Schwarz preconditioner [12] is defined to be

$M^{-1}_{\text{ASM}} = \sum_{i=1}^{N} R_i^\top C_{ii}^{-1} R_i, \qquad C_{ii} = R_i C R_i^\top.$
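As an illustration of this definition only (and not of the PETSc/HPDDM implementation used later in the paper), a minimal serial sketch of $M^{-1}_{\text{ASM}}$ in Python/SciPy could be written as follows, with each restriction $R_i$ represented implicitly by its index set $\Omega_i$:

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    class OneLevelASM:
        """One-level additive Schwarz: M^{-1} v = sum_i R_i^T C_ii^{-1} R_i v."""
        def __init__(self, C, subdomains):
            self.subdomains = [np.asarray(o) for o in subdomains]   # overlapping index sets Omega_i
            C = sp.csr_matrix(C)
            # Factorize each local matrix C_ii = R_i C R_i^T once, at setup.
            self.local_lu = [spla.splu(sp.csc_matrix(C[o, :][:, o])) for o in self.subdomains]

        def apply(self, v):
            z = np.zeros_like(v)
            for o, lu in zip(self.subdomains, self.local_lu):
                z[o] += lu.solve(v[o])                              # R_i^T C_ii^{-1} R_i v
            return z

Such an object can be handed to a Krylov solver, e.g., as M = scipy.sparse.linalg.LinearOperator(C.shape, matvec=pc.apply); in a distributed setting each term of the sum is computed concurrently by the process owning $\Omega_i$.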
Applying this preconditioner to a vector involves solving concurrent local problems in the overlapping subdomains. Increasing $N$ reduces the sizes $n_i$ of the overlapping subdomains, leading to smaller local problems and faster computations. However, in practice, the preconditioned system using $M^{-1}_{\text{ASM}}$ may not be well-conditioned, inhibiting convergence of the iterative solver. In fact, the local nature of this preconditioner can lead to a deterioration in its effectiveness as the number of subdomains increases because of the lack of global information from the matrix $C$ [12, 15]. To maintain robustness with respect to $N$, a coarse space is added to the preconditioner (also known as a second-level correction) that includes global information.
Let $0 < n_C \le n$. If $R_0 \in \mathbb{R}^{n_C \times n}$ is of full row rank, the two-level additive Schwarz preconditioner [12] is defined to be

(2.1)   $M^{-1}_{\text{additive}} = \sum_{i=0}^{N} R_i^\top C_{ii}^{-1} R_i = R_0^\top C_{00}^{-1} R_0 + M^{-1}_{\text{ASM}}, \qquad C_{00} = R_0 C R_0^\top.$
Observe that, since $C$ and $R_0$ are of full rank, $C_{00}$ is also of full rank. For any full-rank $R_0$, it is possible to cheaply obtain upper bounds on the largest eigenvalue of the preconditioned matrix, independently of $n$ and $N$ [2]. However, bounding the smallest eigenvalue is highly dependent on $R_0$. Therefore, the choice of $R_0$ is key to obtaining a well-conditioned system and building efficient two-level Schwarz
preconditioners. Two-level Schwarz preconditioners have been used to solve a large
class of systems arising from a range of engineering applications (see, for example,
[18,24,30,32,40,46] and references therein).
Following [2], we denote by $D_i \in \mathbb{R}^{n_i \times n_i}$ ($1 \le i \le N$) any non-negative diagonal matrices such that

$\sum_{i=1}^{N} R_i^\top D_i R_i = I_n.$

We refer to $(D_i)_{1 \le i \le N}$ as an algebraic partition of unity. In [2], Al Daas and Grigori show how to select local subspaces $Z_i \in \mathbb{R}^{n_i \times p_i}$ with $p_i \le n_i$ ($1 \le i \le N$) such that, if $R_0^\top$ is defined to be $R_0^\top = [R_1^\top D_1 Z_1, \ldots, R_N^\top D_N Z_N]$, the spectral condition number of the preconditioned matrix $M^{-1}_{\text{additive}} C$ is bounded from above independently of $N$ and $n$.
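The sketch below illustrates one possible algebraic partition of unity (based on the inverse multiplicity of each node, an assumption made for the example rather than the specific choice analyzed in [2]) and the assembly of $R_0$ from given local blocks $Z_i$:

    import numpy as np
    import scipy.sparse as sp

    def partition_of_unity(n, subdomains):
        """Diagonals of D_i such that sum_i R_i^T D_i R_i = I_n (inverse-multiplicity weights)."""
        multiplicity = np.zeros(n)
        for o in subdomains:
            multiplicity[np.asarray(o)] += 1.0
        return [1.0 / multiplicity[np.asarray(o)] for o in subdomains]

    def coarse_restriction(n, subdomains, D, Z):
        """Assemble R_0 from the local contributions, i.e., R_0^T = [R_1^T D_1 Z_1, ..., R_N^T D_N Z_N]."""
        blocks = []
        for o, d, Zi in zip(subdomains, D, Z):
            o = np.asarray(o)
            local = d[:, None] * Zi                              # D_i Z_i, dense n_i x p_i
            rows = np.repeat(o, Zi.shape[1])                     # global row index of each entry
            cols = np.tile(np.arange(Zi.shape[1]), o.size)       # local column index of each entry
            blocks.append(sp.csr_matrix((local.ravel(), (rows, cols)), shape=(n, Zi.shape[1])))
        return sp.hstack(blocks).T.tocsr()                       # R_0, of size n_C x n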
2.2. Algebraic local SPSD splitting of an SPD matrix. We now recall
the definition of an algebraic local SPSD splitting of an SPD matrix given in [2] and
generalized in [3].
An algebraic local SPSD splitting of the SPD matrix $C$ with respect to the $i$-th subdomain is defined to be any SPSD matrix $\widetilde{C}_i \in \mathbb{R}^{n \times n}$ that satisfies

$0 \le u^\top \widetilde{C}_i u \le u^\top C u$, for all $u \in \mathbb{R}^n$,
$R_{c\Gamma i}\, \widetilde{C}_i = 0$,

where $R_{c\Gamma i} = I_n(\Omega_{c\Gamma i}, :)$ is the restriction to the complement of $\Omega_i$. We denote the nonzero submatrix of $\widetilde{C}_i$ by $\widetilde{C}_{ii}$ so that

$\widetilde{C}_i = R_i^\top \widetilde{C}_{ii} R_i.$

Associated with the local SPSD splitting matrices, we define a multiplicity constant $k_m$ that satisfies the inequality

(2.2)   $0 \le \sum_{i=1}^{N} u^\top \widetilde{C}_i u \le k_m\, u^\top C u$, for all $u \in \mathbb{R}^n$.

Note that, for any set of SPSD splitting matrices, $k_m \le N$.
The main motivation for defining splitting matrices is to find local seminorms that
are bounded from above by the $C$-norm. These seminorms will be used to determine a subspace that contains the eigenvectors of $C$ associated with its smallest eigenvalues.
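For small problems stored as dense NumPy arrays, the two defining properties above and the constant in (2.2) can be checked directly; the following diagnostic sketch (not part of the preconditioner setup, names are ours) makes the definition concrete:

    import numpy as np
    from scipy.linalg import eigh

    def is_local_spsd_splitting(C, C_tilde, omega, tol=1e-10):
        """Check 0 <= u^T C_tilde u <= u^T C u for all u, and that C_tilde vanishes outside Omega."""
        n = C.shape[0]
        outside = np.setdiff1d(np.arange(n), omega)
        local = outside.size == 0 or np.abs(C_tilde[outside, :]).max() <= tol
        spsd = np.linalg.eigvalsh(C_tilde).min() >= -tol            # C_tilde is SPSD
        below = np.linalg.eigvalsh(C - C_tilde).min() >= -tol       # C - C_tilde is SPSD
        return local and spsd and below

    def multiplicity_constant(C, splittings):
        """Smallest k_m such that sum_i C_tilde_i <= k_m C, via the largest generalized eigenvalue."""
        return eigh(sum(splittings), C, eigvals_only=True).max()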
3. Local SPSD splitting matrices. In this section we show how to construct
local SPSD splitting matrices of a sparse SPD matrix efficiently using only local
subdomain information.
3.1. From normal equations matrices to general SPD matrices. In [4],
the authors presented how to compute local SPSD splitting matrices for the normal
equations matrix $C = B^\top B$ where $B \in \mathbb{R}^{m \times n}$. Considering the case $B = A$, we have $C = A^2$. Thus, given the theory developed in [4], we can compute local SPSD splitting matrices of $A^2$ efficiently. Using the permutation matrix $P_i = I_n([\Omega_{Ii}, \Omega_{\Gamma i}, \Omega_{\Delta i}, \widetilde{\Omega}_{ci}], :)$, we can write

$P_i A P_i^\top = \begin{pmatrix} A_{Ii} & A_{I\Gamma i} & & \\ A_{\Gamma Ii} & A_{\Gamma i} & A_{\Gamma\Delta i} & \\ & A_{\Delta\Gamma i} & A_{\Delta i} & A_{\Delta ci} \\ & & A_{c\Delta i} & A_{ci} \end{pmatrix},$

and

$\widetilde{C}_i = \widetilde{R}_i^\top X_i^\top X_i \widetilde{R}_i$

is a SPSD splitting of $A^2$, where $X_i$ is given as

(3.1)   $X_i = R_i A \widetilde{R}_i^\top = \begin{pmatrix} A_{Ii} & A_{I\Gamma i} & \\ A_{\Gamma Ii} & A_{\Gamma i} & A_{\Gamma\Delta i} \end{pmatrix}.$
Remark 3.1. All terms from (3.1) stem from the original coefficient matrix A, in
the sense that there is no connection with the underlying discretization scheme or
matrix assembly procedure. In a parallel computing context, e.g., if $A$ is distributed
following a contiguous one-dimensional row partitioning among MPI processes, all
terms may be retrieved using peer-to-peer communication between neighboring
processes.
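In a serial SciPy setting, the term $X_i = R_i A \widetilde{R}_i^\top$ of (3.1) is simply the submatrix of $A$ with rows in $\Omega_i$ and columns in $\widetilde{\Omega}_i$; a short sketch (names are ours) is:

    import scipy.sparse as sp

    def local_block_row(A, omega_i, omega_tilde_i):
        """X_i = R_i A R_tilde_i^T = A(Omega_i, Omega_tilde_i), kept sparse until an SVD is needed."""
        return sp.csr_matrix(A)[omega_i, :][:, omega_tilde_i]

    # C_tilde_i = R_tilde_i^T X_i^T X_i R_tilde_i is then a SPSD splitting of A^2; only X_i is stored.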
Lemma 3.2 demonstrates how to obtain a local SPSD splitting of $A$ with respect to the extended overlapping subdomains given a SPSD splitting of $A^2$.
Lemma 3.2. Let $\widetilde{C}_i$ be a local SPSD splitting of $C = A^2$, and let $\widetilde{A}_i$ be the SPSD square root of $\widetilde{C}_i$ such that $\widetilde{A}_i^2 = \widetilde{C}_i$. Then, $\widetilde{A}_i$ is a local SPSD splitting of $A$ with respect to the extended overlapping subdomain $\widetilde{\Omega}_i$.

Proof. First, observe that for any vector $u \in \mathbb{R}^n$,

$u^\top (A^2 - \widetilde{A}_i^2) u = u^\top (A + \widetilde{A}_i)(A - \widetilde{A}_i) u.$

Since $A + \widetilde{A}_i$ is SPD, we can write $A + \widetilde{A}_i = W_i^\top W_i$, and we have

$u^\top W_i (A - \widetilde{A}_i) W_i^{-1} u = u^\top W_i^{-\top} W_i^\top W_i (A - \widetilde{A}_i) W_i^{-1} u$
$= v^\top W_i^\top W_i (A - \widetilde{A}_i) v$
$= v^\top (A + \widetilde{A}_i)(A - \widetilde{A}_i) v$
$= v^\top (A^2 - \widetilde{A}_i^2) v$
$\ge 0,$

where $v = W_i^{-1} u$ and the last inequality holds because $\widetilde{A}_i^2 = \widetilde{C}_i$ is a splitting of $C = A^2$. Since $W_i (A - \widetilde{A}_i) W_i^{-1}$ and $A - \widetilde{A}_i$ have the same eigenvalues, we conclude that $A - \widetilde{A}_i$ is SPSD. The locality of $\widetilde{A}_i$ stems from the locality of $\widetilde{C}_i$.
We note that the SPSD splitting $\widetilde{A}_i$ obtained from the SPSD splitting of $A^2$ is local with respect to the extended overlapping subdomain $\widetilde{\Omega}_i$. A Schur complement technique can be applied to obtain locality with respect to the subdomain $\Omega_i$. Lemma 3.3 presents how to obtain a local SPSD splitting matrix of $A$ with respect to the subdomain $\Omega_i$ from the local SPSD splitting of $A$ with respect to the extended overlapping subdomain $\widetilde{\Omega}_i$.
Lemma 3.3. Let $\widetilde{A}_i = \widetilde{R}_i^\top \widetilde{A}_{ii} \widetilde{R}_i$ be a local SPSD splitting of $A$ with respect to the extended overlapping subdomain $\widetilde{\Omega}_i$. Let $\widetilde{A}_{ii}$ be written as a $(2,2)$ block matrix such that the $(1,1)$ block corresponds to the overlapping subdomain $\Omega_i$ and the $(2,2)$ block corresponds to $\Omega_{\Delta i}$, i.e.,

$\widetilde{A}_{ii} = \begin{pmatrix} X_{i,11} & X_{i,12} \\ X_{i,21} & X_{i,22} \end{pmatrix},$

and let

(3.2)   $\overline{A}_{ii} = X_{i,11} - X_{i,12} X_{i,22}^{-1} X_{i,21},$

where we assume that $X_{i,22}$ is SPD. Then, $\overline{A}_i = R_i^\top \overline{A}_{ii} R_i$ is a SPSD splitting of $A$ with respect to the subdomain $\Omega_i$.
Proof. We have

$\widetilde{A}_{ii} = \begin{pmatrix} X_{i,11} & X_{i,12} \\ X_{i,21} & X_{i,22} \end{pmatrix} = \begin{pmatrix} X_{i,11} - X_{i,12} X_{i,22}^{-1} X_{i,21} + X_{i,12} X_{i,22}^{-1} X_{i,21} & X_{i,12} \\ X_{i,21} & X_{i,22} \end{pmatrix}.$

Since $X_{i,22}$ is SPD and $\widetilde{A}_{ii}$ is SPSD, $X_{i,11} - X_{i,12} X_{i,22}^{-1} X_{i,21}$ is SPSD; moreover, the remaining part $\begin{pmatrix} X_{i,12} X_{i,22}^{-1} X_{i,21} & X_{i,12} \\ X_{i,21} & X_{i,22} \end{pmatrix}$ is SPSD. Therefore,

$0 \le u^\top \overline{A}_i u = u^\top R_i^\top \overline{A}_{ii} R_i u \le u^\top \widetilde{R}_i^\top \widetilde{A}_{ii} \widetilde{R}_i u \le u^\top A u.$
Remark 3.4. Since the SPSD splitting will be used to construct a preconditioner, the assumption in Lemma 3.3 that $X_{i,22}$ is SPD can be enforced by shifting its diagonal elements by a small value such as $\|X_{i,22}\|_2\, \varepsilon$, where $\varepsilon$ is the floating-point machine precision. One can also shift the diagonal values of the matrix $\widetilde{A}_{ii}$ by a small value $\|\widetilde{A}_{ii}\|_2\, \varepsilon$ so that the Schur complement is well defined.
In the following section, we explain how to compute the local SPSD splitting matrices
efficiently.
3.2. Practical construction of local SPSD matrices. The construction of
robust two-level overlapping Schwarz preconditioners is based on computing the coarse
space projection operator $R_0$. Using the local SPSD splitting matrices of $A$, $R_0$ can be chosen as the matrix that spans the space

$Z = \bigoplus_{i=1}^{N} R_i^\top D_i Z_i,$

where $Z_i$ is defined to be

(3.3)   $Z_i = \operatorname{span}\{\, u \mid D_i A_{ii} D_i u = \lambda\, \overline{A}_{ii} u, \text{ and } \lambda > 1/\tau \,\},$

where $\tau > 0$ is a user-specified number. The condition number of the preconditioned matrix $M^{-1}_{\text{additive}} A$ is bounded from above by $(k_c + 1)\left(2 + (2k_c + 1)\frac{k_m}{\tau}\right)$, where $k_c$ is the number of colors required to color the graph of $A$ such that any two neighboring subdomains have different colors and $k_m$ is the multiplicity constant that satisfies (2.2).
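For a single subdomain and small dense matrices, the selection in (3.3) can be sketched with SciPy as below; the actual implementation solves (3.3) with SLEPc and applies $\overline{A}_{ii}$ only implicitly, as explained next. The slight regularization of the right-hand side matrix is an assumption made so that the dense solver accepts it:

    import numpy as np
    from scipy.linalg import eigh

    def local_coarse_vectors(A_ii, D_i, A_bar_ii, tau):
        """Solve D_i A_ii D_i u = lambda A_bar_ii u and keep eigenvectors with lambda > 1/tau."""
        lhs = (D_i[:, None] * A_ii) * D_i[None, :]     # D_i A_ii D_i, with D_i stored as a vector
        B = A_bar_ii + np.linalg.norm(A_bar_ii, 2) * np.finfo(float).eps * np.eye(len(D_i))
        lam, U = eigh(lhs, B)                          # generalized symmetric eigenproblem
        return U[:, lam > 1.0 / tau]                   # columns span Z_i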
Solving the generalized eigenvalue problem in (3.3) using iterative solvers such as the Krylov–Schur method [43] requires solving linear systems of the form $\overline{A}_{ii} u = v$. The matrix $\overline{A}_{ii}$ is the Schur complement of the matrix $\widetilde{A}_{ii} = \left(X_i^\top X_i\right)^{\frac{1}{2}}$, where $X_i = R_i A \widetilde{R}_i^\top$. Let $X_i = U_i \Sigma_i V_i^\top$ be the economic singular-value decomposition of $X_i$ and let $V_i^\perp$ be an orthonormal matrix whose columns form a complementary basis of the columns of $V_i$, i.e., $[V_i, V_i^\perp]$ is an orthogonal matrix. Note that $V_i^\perp (V_i^\perp)^\top = I_{\widetilde{n}_i} - V_i V_i^\top$. Using Remark 3.4, $\widetilde{A}_{ii}$ can be chosen as

$\widetilde{A}_{ii} = V_i \Sigma_i V_i^\top + \sigma_{1i} \varepsilon I_{\widetilde{n}_i}$
$= V_i \Sigma_i V_i^\top + \sigma_{1i} \varepsilon\, [V_i, V_i^\perp][V_i, V_i^\perp]^\top$
$= V_i (\Sigma_i + \sigma_{1i} \varepsilon I_{n_i}) V_i^\top + \sigma_{1i} \varepsilon\, V_i^\perp (V_i^\perp)^\top$
$= V_i (\Sigma_i + \sigma_{1i} \varepsilon I_{n_i}) V_i^\top + \sigma_{1i} \varepsilon\, (I_{\widetilde{n}_i} - V_i V_i^\top),$

where $\sigma_{1i}$ is the largest singular value of $X_i$. One way to solve the linear system $\overline{A}_{ii} u = v$ is thus to solve the augmented linear system

$\widetilde{A}_{ii} \begin{pmatrix} u \\ y \end{pmatrix} = \begin{pmatrix} v \\ 0 \end{pmatrix}.$

Given the singular-value decomposition of $\widetilde{A}_{ii}$, the solution $u$ can be obtained efficiently. Indeed, the inverse of $\widetilde{A}_{ii}$ is

(3.4)   $\widetilde{A}_{ii}^{-1} = V_i (\Sigma_i + \sigma_{1i} \varepsilon I_{n_i})^{-1} V_i^\top + \sigma_{1i}^{-1} \varepsilon^{-1} (I_{\widetilde{n}_i} - V_i V_i^\top).$
In our current implementation, the singular-value decomposition is computed
concurrently using LAPACK [6]. This implies that the sparse matrix $X_i$, see (3.1), is converted to a dense representation. Then, $\widetilde{A}_{ii}$ is never assembled, and instead,
the action of its inverse is applied in a matrix-free fashion using (3.4). Since these
operations are local to each subdomain, they remain tractable. However, it could
be beneficial to leverage the lower memory-footprint of iterative sparse singular-value
solvers, e.g., PRIMME SVDS [48]. To the best of our knowledge, no such solver may
be used to retrieve the complete economic singular-value decomposition of a sparse
matrix.
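The matrix-free application of $\widetilde{A}_{ii}^{-1}$ in (3.4) from the economic SVD of $X_i$ can be sketched as follows (dense SVD, as in the current implementation; function names are ours):

    import numpy as np

    def make_A_tilde_inverse(X_i, eps=np.finfo(float).eps):
        """Return a function applying the inverse in (3.4), built from the economic SVD of X_i."""
        X_i = np.asarray(X_i)                          # the sparse X_i is densified before calling LAPACK
        _, sigma, Vt = np.linalg.svd(X_i, full_matrices=False)
        V = Vt.T                                       # V_i, of size n_tilde_i x n_i
        shift = sigma[0] * eps                         # sigma_{1i} * eps, the shift of Remark 3.4
        d = 1.0 / (sigma + shift)                      # (Sigma_i + sigma_{1i} eps I)^{-1}
        def apply(v):
            w = V.T @ v
            return V @ (d * w) + (v - V @ w) / shift   # V(Sigma+shift)^{-1}V^T v + shift^{-1}(I - V V^T) v
        return apply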
Since the construction of the two-level method is fully algebraic, one can successively apply the same approach to the coarse space matrix to obtain a multilevel preconditioner in which the condition number of each preconditioned matrix is bounded from above by a prescribed number. Note that if the matrices $\overline{A}_{ii}$ for $i = 1, \ldots, N$ are formed explicitly as in (3.2), we can use the strategy that we proposed in [3] to construct a multilevel preconditioner with the same properties.
4. Numerical experiments. In this section, we present a variety of numerical
experiments that show the effectiveness and efficiency of the proposed preconditioner.
First, we compare it against state-of-the-art algebraic multigrid preconditioners
including AGMG [34,35], BoomerAMG [14], and GAMG [1]. Then, we include
numerical experiments where the proposed preconditioner is used to solve coarse
problems from other multilevel solvers, thus emphasizing the algebraic and robust
traits of our method. Except for AGMG which is used through its MATLAB
interface, all these experiments are performed using PETSc [7]. In particular, the
proposed preconditioner is a natural extension of the PCHPDDM infrastructure [24]
which we use to solve the concurrent generalized eigenvalue problems from (3.3)
via SLEPc [20], and then to define our multilevel preconditioner by selecting the
appropriate local eigenmodes depending on the user-specified value of τ. With
respect to Remark 3.1, we use the PETSc routine MatCreateSubMatrices, see https://petsc.org/release/docs/manualpages/Mat/MatCreateSubMatrices.html. Instead of
using $M^{-1}_{\text{additive}}$ as defined in (2.1), we will use $M^{-1}_{\text{deflated}}$, defined as

$M^{-1}_{\text{deflated}} = R_0^\top C_{00}^{-1} R_0 + M^{-1}_{\text{RAS}}\left(I_n - C R_0^\top C_{00}^{-1} R_0\right),$

where $M^{-1}_{\text{RAS}}$ is the well-known one-level restricted additive Schwarz method [9]. The choice of $M^{-1}_{\text{deflated}}$ over $M^{-1}_{\text{additive}}$ is motivated by previous results from the literature [45], which exhibit better numerical properties of the former over the latter. Table 1 presents
the set of test matrices from the SuiteSparse Matrix Collection [11] that are used first.
They represent a subset of the matrices from the collection which satisfy both criteria “Special Structure equal to Symmetric” and “Positive Definite equal to Yes”. We
highlight the fact that our proposed preconditioner can handle unstructured systems,
not necessarily stemming from standard PDE discretization schemes, by displaying
some nonzero patterns in Figure 1.
4.1. The algebraic two-level case. In this section, we present a numerical
comparison between our proposed preconditioner and three algebraic multigrid
solvers: AGMG, BoomerAMG, and GAMG. Even though matrices from Table 1 are
SPD, all three AMG solvers encounter difficulties in solving many of the associated linear systems with random right-hand sides. On the contrary, our algebraic two-level preconditioner is more robust and always reaches the prescribed tolerance of $10^{-8}$. Note that a simple one-level preconditioner such as $M^{-1}_{\text{RAS}}$ with a minimal overlap of one does not converge for these problems. The outer Krylov method is the right-preconditioned GMRES(30) [39]. For preconditioners used within PETSc (all except AGMG), the systems are solved using 256 MPI processes and are first renumbered by ParMETIS [27]. For our DD method, a single subdomain is mapped to each process, i.e., $N = 256$ in (2.1). Furthermore, exact subdomain and second-level operator Cholesky factorizations are computed. In the last column of Table 2, the size of the second-level operator is reported. One may notice that the grid complexities fluctuate among matrices. Indeed, for the small-sized problem s3rmt3m3, the grid complexity is $(5{,}357+5{,}321)/5{,}357 = 1.99$, while for problem parabolic_fem, it is $(5.26\cdot 10^5 + 21{,}736)/(5.26\cdot 10^5) = 1.04$.
4.2. The nested-level case. Since our proposed preconditioner is fully
algebraic, we now use it recursively to solve the second-level operator from the
previous section using yet another two-level method instead of using an exact Cholesky
factorization. This thus yields an algebraic three-level preconditioner. HPDDM has
the capability of automatically redistributing coarse operators on a subset of MPI
processes on which the initial coefficient matrix $A$ is distributed [23]. We still use 256
Table 1
Test matrices taken from the SuiteSparse Matrix Collection.
Identifier        n          nnz(A)        condest(A)
s3rmt3m3          5,357      207,123       4.4·10^10
vanbody           47,072     2,329,056     9.4·10^18
gridgena          48,962     512,084       7.1·10^5
ct20stif          52,329     2,600,295     2.2·10^14
nasasrb           54,870     2,677,324     1.5·10^9
Dubcova2          65,025     1,030,225     10,411
finan512          74,752     596,992       98.4
consph            83,334     6,010,480     3.2·10^7
s3dkt3m2          90,449     3,686,223     6.3·10^11
shipsec8          114,919    3,303,553     1.5·10^14
ship_003          121,728    3,777,036     2.6·10^16
boneS01           127,224    5,516,602     4.2·10^7
bmwcra_1          148,770    10,641,602    9.7·10^8
G2_circuit        150,102    726,674       2·10^7
pwtk              217,918    11,524,432    5·10^12
offshore          259,789    4,242,673     2.3·10^13
af_4_k101         503,625    17,550,675    6.5·10^8
parabolic_fem     525,825    3,674,625     2.1·10^5
apache2           715,176    4,817,870     5.3·10^6
tmt_sym           726,713    5,080,961     1.1·10^9
ecology2          999,999    4,995,991     6.7·10^7
Fig. 1. Nonzero sparsity pattern of some of the test matrices from Table 1: (a) s3rmt3m3, n = 5,357; (b) ct20stif, n = 52,329; (c) finan512, n = 74,752; (d) consph, n = 83,334; (e) G2_circuit, n = 1.5·10^5; (f) offshore, n = 2.6·10^5.
Table 2
Preconditioner comparison: iteration counts are reported in the columns 2–5 if convergence
to the prescribed tolerance of $10^{-8}$ is achieved in 100 iterations or less. In column 6, sizes of the
second-level operator generated by our proposed preconditioner are reported.
Identifier AGMG BoomerAMG GAMG HPDDM nC
s3rmt3m3 4 5,321
vanbody 18 25,600
gridgena 2 16,706
ct20stif 4 49,421
nasasrb 10 25,600
Dubcova2 76 56 5 12,729
finan512 9 7 8 4 15,271
consph 26 25,600
s3dkt3m2 49 25,592
shipsec8 7 76,800
ship_003 9 76,759
boneS01 16 25,600
bmwcra_1 20 76,800
G2_circuit 29 11 26 19 21,602
pwtk 47 25,600
offshore 7 76,800
af_4_k101 18 76,800
parabolic_fem 12 8 16 17 21,736
apache2 14 11 35 7 76,800
tmt_sym 14 10 17 14 32,000
ecology2 18 12 18 45 33,261
MPI processes for the fine-level decomposition, then use four processes for the second-
level decomposition, and the third-level operator is centralized on a single process.
The outer solver is now the flexible GMRES(30) [37]. Second-level systems are this
time solved with the right-preconditioned GMRES(30), with a higher tolerance set to $10^{-4}$, compared to the outer-solver tolerance of $10^{-8}$. We investigate problems s3rmt3m3 and parabolic_fem, which are the two extremes from the previous section
in terms of grid complexity. Iteration counts are reported in Table 3. One may
notice that the number of outer iterations is exactly the same as in the fifth column
of Table 2, meaning that the switch to an inexact second-level solver does not hinder
the overall convergence. Also, the number of inner iterations is small, so our proposed
preconditioner applied to the second-level operator is indeed robust. Finally, as we decrease the number of subdomains for the second-level decomposition, the grid coarsening improves as well, especially for the small-sized problem s3rmt3m3.
In another context, we use our proposed preconditioner to solve coarse systems yielded by two other multilevel preconditioners. The following three-dimensional problems are discretized by FreeFEM [17] using 4,096 MPI processes. First, we use GenEO [41] to assemble a two-level analytic preconditioner for a scalar diffusion equation using order-two Lagrange finite elements. The number of unknowns is $4.17\cdot 10^6$, and the second-level operator generated by GenEO is of dimension $n_{C,2} = 60{,}144$. It is redistributed among 512 processes, and our preconditioner constructs a third-level operator of dimension $n_{C,3} = 12{,}040$. Then, we use GAMG to assemble a four-level quasi-algebraic (the near-nullspace is provided by the discretization kernel)
Fig. 2. Variations of the material coefficients for problems from Table 4: (a) scalar diffusion in the unit cube with the coefficient κ extruded in one dimension; (b) elongated (10× ratio) three-dimensional beam with Young's modulus (E) and Poisson's ratio (ν) extruded in one dimension.
Table 3
Algebraic multilevel preconditioner: Outer iterations is the FGMRES iteration count, Inner iterations is the average GMRES iteration count to solve coarse systems, $n$ is the size of the linear system, $n_{C,2}$ (resp. $n_{C,3}$) is the size of the second-level (resp. third-level) operator.

Identifier       Outer iterations   Inner iterations   n         n_C,2    n_C,3
s3rmt3m3         4                  10                 5,357     5,321    2,240
parabolic_fem    17                 3                  525,825   21,736   3,838
preconditioner for the system of linear elasticity using order-two Lagrange finite elements. The number of unknowns is $3.06\cdot 10^7$. The coarse operator from the GAMG grid hierarchy is of dimension $n_{C,2} = 14{,}880$. It is redistributed among 256 processes using the telescope infrastructure [33] and our preconditioner constructs a final-level operator of dimension $n_{C,3} = 5{,}120$. Unlike what is traditionally done with smoothed-aggregation AMG [47], we do not explicitly transfer the near-nullspace from the GAMG coarse level when setting up our preconditioner. These results are gathered in Table 4. Again, one may notice that the fast and accurate convergence of the inner solves (third column) does not hinder the overall convergence (second column). For both the scalar diffusion equation (with diffusion coefficient $\kappa$) and the system of linear elasticity, highly heterogeneous material coefficients are used, see Figure 2a and Figure 2b, respectively.
Furthermore, as in subsection 4.1, note that using a simple one-level preconditioner such as $M^{-1}_{\text{RAS}}$ with a minimal overlap of one for solving the coarse systems from Tables 3 and 4 does not yield accurate enough inner solutions, thus preventing the outer solvers from converging. Coupling GAMG with our preconditioner is a good assessment of the composability of PETSc solvers [8]; for the interested reader, we provide in Figure 3 the exact options used to set up such a multilevel solver.
5. Conclusion. We presented in this paper a fully algebraic and locally constructed multilevel overlapping Schwarz preconditioner for which the condition number of the preconditioned matrix can be bounded from above by a user-defined number. The construction of the preconditioner relies on finding local SPSD splitting matrices of the matrix $A$. Computing these splitting matrices involves the computation of the right singular vectors of the local block row matrix, which might be considered costly on the fine level. However, the locality of computations and the robustness of the
Table 4
Hybrid multilevel preconditioner: Outer iterations is the FGMRES iteration count, Inner iterations is the average GMRES iteration count to solve coarse systems, $n$ is the size of the linear system, $n_{C,2}$ is the size of the coarse-level operator assembled by either GenEO (for problem diffusion) or GAMG (for problem elasticity), $n_{C,3}$ is the size of the second-level operator assembled by our algebraic preconditioner to solve the aforementioned coarse systems.

Identifier    Outer iterations   Inner iterations   n            n_C,2    n_C,3
diffusion     11                 5                  4,173,281    60,144   12,040
elasticity    8                  11                 30,633,603   14,880   5,120
-ksp_type fgmres
-ksp_rtol 1.0e-8
-pc_type gamg
-pc_gamg_threshold 0.01
-pc_gamg_repartition
-pc_mg_levels 4
-prefix_push mg_coarse_
-pc_type telescope
-prefix_push pc_telescope_
-reduction_factor 16
-prefix_pop
-prefix_pop
-prefix_push mg_coarse_telescope_
-ksp_converged_reason
-ksp_type gmres
-ksp_pc_side right
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-4
-pc_type hpddm
-prefix_push pc_hpddm_
-define_subdomains
-levels_1_pc_type asm            # M^{-1}_RAS
-levels_1_sub_pc_type cholesky   # subdomain solvers
-levels_1_eps_nev 20             # smallest λ in (3.3)
-levels_1_st_type mat            # Ã_ii^{-1} from (3.4)
-coarse_pc_type cholesky         # coarse solver
-prefix_pop
-prefix_pop

Fig. 3. PETSc command-line options for coupling GAMG and the proposed preconditioner.
preconditioner make it a very powerful and scalable solver that can be used as a black box, especially when other black-box preconditioners fail to achieve a desired convergence rate. Our implementation is readily available in the PETSc
library. Again, the proposed preconditioner is not meant to replace analytic multilevel
preconditioners such as smoothed-aggregation algebraic multigrid and GenEO. When
these work, they will be more efficient algorithmically. However, employing the
proposed preconditioner to solve the corresponding coarse problems proved to be
effective and efficient. As future work, we would like to investigate less expensive
constructions of SPSD matrices for specific classes of SPD matrices that arise from
the discretization of PDEs.
Acknowledgments. This work was granted access to the GENCI-sponsored
HPC resources of TGCC@CEA under allocation A0090607519. The authors would
like to thank J. E. Roman for interesting discussions concerning the solution of (3.3).
REFERENCES
[1] M. F. Adams, H. H. Bayraktar, T. M. Keaveny, and P. Papadopoulos,Ultrascalable
implicit finite element analyses in solid mechanics with over a half a billion degrees of
freedom, in Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, SC04,
IEEE Computer Society, 2004, pp. 34:1–34:15.
[2] H. Al Daas and L. Grigori,A class of efficient locally constructed preconditioners based on
coarse spaces, SIAM Journal on Matrix Analysis and Applications, 40 (2019), pp. 66–91.
[3] H. Al Daas, L. Grigori, P. Jolivet, and P.-H. Tournier,A multilevel Schwarz
preconditioner based on a hierarchy of robust coarse spaces, SIAM Journal on Scientific
Computing, 43 (2021), pp. A1907–A1928.
[4] H. Al Daas, P. Jolivet, and J. A. Scott,A robust algebraic domain decomposition
preconditioner for sparse normal equations, 2021, https://arxiv.org/abs/2107.09006.
[5] H. Al Daas, T. Rees, and J. A. Scott,Two-level Nystr¨om–Schur preconditioner for sparse
symmetric positive definite matrices, 2021, https://arxiv.org/abs/2101.12164.
[6] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz,
A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen,LAPACK users’
guide, Society for Industrial and Applied Mathematics, 1999.
[7] S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin,
A. Dener, V. Eijkhout, W. D. Gropp, D. Karpeyev, D. Kaushik, M. G. Knepley,
D. A. May, L. C. McInnes, R. T. Mills, T. Munson, K. Rupp, P. Sanan, B. F. Smith,
S. Zampini, H. Zhang, and H. Zhang,PETSc web page, 2021, https://petsc.org.
[8] J. Brown, M. G. Knepley, D. A. May, L. C. McInnes, and B. F. Smith,Composable linear
solvers for multiphysics, in 2012 11th International Symposium on Parallel and Distributed
Computing, 2012, pp. 55–62.
[9] X.-C. Cai and M. Sarkis,A restricted additive Schwarz preconditioner for general sparse
linear systems, SIAM Journal on Scientific Computing, 21 (1999), pp. 792–797.
[10] T. Chartier, R. D. Falgout, V. E. Henson, J. Jones, T. Manteuffel, S. McCormick,
J. Ruge, and P. S. Vassilevski,Spectral AMGe (ρAMGe), SIAM Journal on Scientific
Computing, 25 (2003), pp. 1–26.
[11] T. A. Davis and Y. Hu,The University of Florida sparse matrix collection, ACM Transactions
on Mathematical Software, 38 (2011), pp. 1–28.
[12] V. Dolean, P. Jolivet, and F. Nataf,An introduction to domain decomposition methods.
Algorithms, theory, and parallel implementation, Society for Industrial and Applied
Mathematics, 2015.
[13] I. S. Duff, A. M. Erisman, and J. K. Reid,Direct methods for sparse matrices, Oxford
University Press, 2017.
[14] R. D. Falgout and U. M. Yang, hypre: a library of high performance preconditioners,
Computational Science—ICCS 2002, (2002), pp. 632–641.
[15] M. J. Gander and A. Loneland,SHEM: an optimal coarse space for RAS and its multiscale
approximation, in Domain Decomposition Methods in Science and Engineering XXIII, C.-
O. Lee, X.-C. Cai, D. E. Keyes, H. H. Kim, A. Klawonn, E.-J. Park, and O. B. Widlund,
eds., Cham, 2017, Springer International Publishing, pp. 313–321.
[16] L. Gouarin and N. Spillane,Fully algebraic domain decomposition preconditioners
with adaptive spectral bounds. Preprint, June 2021, https://hal.archives-ouvertes.fr/
hal-03258644.
[17] F. Hecht,New development in FreeFem++, Journal of Numerical Mathematics, 20 (2012),
pp. 251–265.
[18] A. Heinlein, C. Hochmuth, and A. Klawonn,Reduced dimension GDSW coarse spaces for
monolithic Schwarz domain decomposition methods for incompressible fluid flow problems,
International Journal for Numerical Methods in Engineering, 121 (2020), pp. 1101–1119.
[19] A. Heinlein, A. Klawonn, J. Knepper, O. Rheinbach, and O. B. Widlund,Adaptive GDSW
coarse spaces of reduced dimension for overlapping Schwarz methods, technical report,
Universität zu Köln, September 2020, https://kups.ub.uni-koeln.de/12113/.
[20] V. Hernandez, J. E. Roman, and V. Vidal,SLEPc: a scalable and flexible toolkit for the
solution of eigenvalue problems, ACM Transactions on Mathematical Software, 31 (2005),
pp. 351–362, https://slepc.upv.es.
[21] M. R. Hestenes and E. Stiefel,Methods of conjugate gradients for solving linear systems.,
Journal of research of the National Bureau of Standards., 49 (1952), pp. 409–436.
[22] N. J. Higham and T. Mary,A new preconditioner that exploits low-rank approximations to
factorization error, SIAM Journal on Scientific Computing, 41 (2019), pp. A59–A82.
[23] P. Jolivet, F. Hecht, F. Nataf, and C. Prud’homme,Scalable domain decomposition
preconditioners for heterogeneous elliptic problems, in Proceedings of the International
Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13,
New York, NY, USA, 2013, ACM, pp. 80:1–80:11.
[24] P. Jolivet, J. E. Roman, and S. Zampini,KSPHPDDM and PCHPDDM: extending PETSc
with advanced Krylov methods and robust multilevel overlapping Schwarz preconditioners,
Computers & Mathematics with Applications, 84 (2021), pp. 277–295.
[25] T. B. Jönsthövel, M. B. van Gijzen, C. Vuik, C. Kasbergen, and A. Scarpas,
Preconditioned conjugate gradient method enhanced by deflation of rigid body modes
applied to composite materials, Computer Modeling in Engineering & Sciences, 47 (2009),
pp. 97–118.
[26] T. B. Jönsthövel, M. B. van Gijzen, C. Vuik, and A. Scarpas, On the use of rigid body
modes in the deflated preconditioned conjugate gradient method, SIAM Journal on Scientific
Computing, 35 (2013), pp. B207–B225.
[27] G. Karypis and V. Kumar,Multilevel k-way partitioning scheme for irregular graphs, Journal
of Parallel and Distributed computing, 48 (1998), pp. 96–129.
[28] A. Klawonn, M. Kühn, and O. Rheinbach, Adaptive coarse spaces for FETI-DP in three
dimensions, SIAM Journal on Scientific Computing, 38 (2016), pp. A2880–A2911.
[29] A. Klawonn, P. Radtke, and O. Rheinbach,FETI-DP methods with an adaptive coarse
space, SIAM Journal on Numerical Analysis, 53 (2015), pp. 297–320.
[30] F. Kong and X.-C. Cai,A scalable nonlinear fluid–structure interaction solver based on a
Schwarz preconditioner with isogeometric unstructured coarse spaces in 3D, Journal of
Computational Physics, 340 (2017), pp. 498–518.
[31] R. Li, Y. Xi, and Y. Saad,Schur complement-based domain decomposition preconditioners
with low-rank corrections, Numerical Linear Algebra with Applications, 23 (2016), pp. 706–
729.
[32] P. Marchand, X. Claeys, P. Jolivet, F. Nataf, and P.-H. Tournier,Two-level
preconditioning for h-version boundary element approximation of hypersingular operator
with GenEO, Numerische Mathematik, 146 (2020), pp. 597–628.
[33] D. A. May, P. Sanan, K. Rupp, M. G. Knepley, and B. F. Smith,Extreme-scale
multigrid components within PETSc, in Proceedings of the Platform for Advanced Scientific
Computing Conference, PASC ’16, New York, NY, USA, 2016, Association for Computing
Machinery.
[34] A. Napov and Y. Notay,An algebraic multigrid method with guaranteed convergence rate,
SIAM Journal on Scientific Computing, 34 (2012), pp. A1079–A1109.
[35] Y. Notay,An aggregation-based algebraic multigrid method, Electronic Transactions on
Numerical Analysis, 37 (2010), pp. 123–146, http://agmg.eu.
[36] J. W. Pearson and J. Pestana,Preconditioners for Krylov subspace methods: an overview,
GAMM-Mitteilungen, 43 (2020), p. e202000015.
[37] Y. Saad,A flexible inner-outer preconditioned GMRES algorithm, SIAM Journal on Scientific
Computing, 14 (1993), pp. 461–469.
[38] Y. Saad,Iterative methods for sparse linear systems, Society for Industrial and Applied
Mathematics, 2003.
[39] Y. Saad and M. H. Schultz,GMRES: a generalized minimal residual algorithm for solving
nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing, 7
(1986), pp. 856–869.
[40] B. F. Smith, P. E. Bjørstad, and W. D. Gropp,Domain decomposition: parallel multilevel
methods for elliptic partial differential equations, Cambridge University Press, 1996.
[41] N. Spillane, V. Dolean, P. Hauret, F. Nataf, C. Pechstein, and R. Scheichl,Abstract
robust coarse spaces for systems of PDEs via generalized eigenproblems in the overlaps,
Numerische Mathematik, 126 (2014), pp. 741–770.
[42] N. Spillane and D. Rixen,Automatic spectral coarse spaces for robust finite element tearing
and interconnecting and balanced domain decomposition algorithms, International Journal
for Numerical Methods in Engineering, 95 (2013), pp. 953–990.
[43] G. W. Stewart,A Krylov–Schur algorithm for large eigenproblems, SIAM Journal on Matrix
Analysis and Applications, 23 (2002), pp. 601–614.
[44] R. Tamstorf, T. Jones, and S. F. McCormick,Smoothed aggregation multigrid for cloth
simulation, ACM Transactions on Graphics, 34 (2015).
[45] J. M. Tang, R. Nabben, C. Vuik, and Y. A. Erlangga,Comparison of two-level
preconditioners derived from deflation, domain decomposition and multigrid methods,
Journal of Scientific Computing, 39 (2009), pp. 340–370.
[46] J. Van lent, R. Scheichl, and I. G. Graham,Energy-minimizing coarse spaces for two-level
Schwarz methods for multiscale PDEs, Numerical Linear Algebra with Applications, 16
(2009), pp. 775–799.
[47] P. Vaněk, Acceleration of convergence of a two-level algorithm by smoothing transfer operators,
Applications of Mathematics, 37 (1992), pp. 265–274.
[48] L. Wu, E. Romero, and A. Stathopoulos,PRIMME SVDS: a high-performance
preconditioned SVD solver for accurate large-scale computations, SIAM Journal on
Scientific Computing, 39 (2017), pp. S248–S271.
[49] S. Zampini,PCBDDC: a class of robust dual-primal methods in PETSc, SIAM Journal on
Scientific Computing, 38 (2016), pp. S282–S306.