
A Multilevel Schwarz Preconditioner Based on a Hierarchy of Robust Coarse Spaces



Abstract. In this paper we present a multilevel preconditioner based on overlapping Schwarz
methods for symmetric positive definite (SPD) matrices. Robust two-level Schwarz preconditioners
exist in the literature to guarantee fast convergence of Krylov methods. As long as the dimension of
the coarse space is reasonable, that is, exact solvers can be used efficiently, two-level methods scale
well on parallel architectures. However, the factorization of the coarse space matrix may become
costly at scale. An alternative is then to use an iterative method on the second level, combined with
an algebraic preconditioner, such as a one-level additive Schwarz preconditioner. Nevertheless, the
condition number of the resulting preconditioned coarse space matrix may still be large. One of the
difficulties of using more advanced methods, like algebraic multigrid or even two-level overlapping
Schwarz methods, to solve the coarse problem is that the matrix does not arise from a partial
differential equation (PDE) anymore. We introduce in this paper a robust multilevel additive Schwarz
preconditioner where at each level the condition number is bounded, ensuring a fast convergence for
each nested solver. Furthermore, our construction does not require any information beyond
what is needed to build a two-level method, and may thus be seen as an algebraic extension.
Key words. domain decomposition, multilevel, elliptic problems, subspace correction
AMS subject classifications. 65F08, 65F10, 65N55
1. Introduction. We consider the solution of a linear system of equations
\[ Ax = b, \tag{1.1} \]
where $A \in \mathbb{R}^{n \times n}$ is a symmetric positive definite (SPD) matrix, $b \in \mathbb{R}^{n}$ is the
right-hand side, and $x \in \mathbb{R}^{n}$ is the vector of unknowns. To enhance convergence, it is
common to solve the preconditioned system
\[ M^{-1} A x = M^{-1} b. \]
Standard domain decomposition preconditioners such as block Jacobi, additive
Schwarz, and restricted additive Schwarz methods are widely used [32,9,8]. In a
parallel framework, such preconditioners have the advantage of relatively low com-
munication costs. However, their role in lowering the condition number of the sys-
tem typically deteriorates when the number of subdomains increases. Multilevel ap-
proaches have shown a large impact on enhancing the convergence of Krylov methods
[33,12,7,25,20,10,21,1,15,23,34,30]. In multigrid and domain decomposition
communities, multilevel methods have proven their capacity of scaling up to large
numbers of processors and tackling ill-conditioned systems [37,4,19]. While some
preconditioners are purely algebraic [7,20,10,26,29,16,1], several multilevel meth-
ods are based on hierarchical meshing in both multigrid and domain decomposition
communities [35,9,25,15,23]. Mesh coarsening depends on the geometry of the
problem. One has to be careful when choosing a hierarchical structure since it can
have a significant impact on the iteration count [23,25].
(Author affiliations: ALPINES, INRIA, Paris, France; IRIT, CNRS, Toulouse, France; LJLL, CNRS, Paris, France.)
In [23], the authors propose a multilevel Schwarz domain decomposition solver for the elasticity problem. Based
on a heuristic approach and following the maximum independent set method [2], they
coarsen the fine mesh while preserving the boundary in order to obtain a two-level
method. This strategy is repeated recursively to build several levels. However, they
do not provide a bound on the condition number of the preconditioned matrix of the
multilevel method. Multilevel domain decomposition methods are mostly based on
non-overlapping approaches [35,9,25,23,37,4,30,34]. Two-level overlapping domain
decomposition methods are well studied and provide robust convergence estimates
[33,12,5]. However, extending such a construction to more than two levels while
preserving robustness is not straightforward. In [6], the authors propose an algebraic
multilevel additive Schwarz method. Their approach is inspired by algebraic multigrid
strategies. One drawback of it is that it is sensitive to the number of subdomains. In
[15], the authors suggest applying the two-level Generalized Dryja–Smith–Widlund
preconditioner recursively to build a multilevel method. In this case, the condition
number bound of the two-level approach depends on the width of the overlap, the
diameter of discretization elements, and the diameter of the subdomains. They focus
on the preconditioner for the three-level case. One drawback of their approach is that
the three-level preconditioner requires more iterations than the two-level variant. In
this paper, the only information from the PDE needed for the construction of the
preconditioner consists of the local Neumann matrices at the fine level. These ma-
trices correspond to the integration of the bilinear form in the weak formulation of
the studied PDE on the subdomain-decomposed input mesh. No further information
is necessary: except on the fine level, our method is algebraic and does not depend
on any coarsened mesh or auxiliary discretized operator. For problems not arising
from PDE discretization, one needs to supply the local SPSD matrices on the finest
level. In [3], a subset of the authors propose a fully algebraic approximation for such
matrices. However, their approximation strategy is heuristic and may not be effective
in some cases.
Our preconditioner is based on a hierarchy of coarse spaces and is defined as
follows. At the first level, the set of unknowns is partitioned into $N_1$ subdomains and
each subdomain has an associated matrix $A_{1,j} = R_{1,j} A R_{1,j}^\top$ obtained by using
appropriate restriction and prolongation operators $R_{1,j}$ and $R_{1,j}^\top$, respectively, defined in the
following section. The preconditioner is formed as an additive Schwarz preconditioner
coupled with an additive coarse space correction, defined as
\[ M_1^{-1} = V_1 A_2^{-1} V_1^\top + \sum_{j=1}^{N_1} R_{1,j}^\top A_{1,j}^{-1} R_{1,j}, \]
where $V_1$ is a tall-and-skinny matrix spanning a coarse space obtained by solving, for
each subdomain $j = 1$ to $N_1$, a generalized eigenvalue problem involving the matrix
$A_{1,j}$ and the Neumann matrix associated with subdomain $j$. The coarse space matrix
is $A_2 = V_1^\top A V_1$. This is equivalent to the GenEO preconditioner, and is described
in detail in [33] and recalled briefly in section 2. The dimension of the coarse space
is proportional to the number of subdomains $N_1$. When it increases, factorizing $A_2$
by using a direct method becomes prohibitive, and hence the application of $A_2^{-1}$ to a
vector should also be performed through an iterative method.
Our multilevel approach defines a hierarchy of coarse spaces $V_i$ and coarse space
matrices $A_i$ for $i = 2$ to any depth $L + 1$, and defines a preconditioner $M_i^{-1}$ such that
the condition number of $M_i^{-1} A_i$ is bounded. The depth $L + 1$ is chosen such that the
coarse space matrix $A_{L+1}$ can be factorized efficiently by using a direct method. At
each level $i$, the graph of the coarse space matrix $A_i$ is partitioned into $N_i$ subdomains,
and each subdomain $j$ is associated with a local matrix $A_{i,j} = R_{i,j} A_i R_{i,j}^\top$ obtained by
using appropriate restriction and prolongation operators $R_{i,j}$ and $R_{i,j}^\top$, respectively.
The preconditioner at level $i$ is defined as
\[ M_i^{-1} = V_i A_{i+1}^{-1} V_i^\top + \sum_{j=1}^{N_i} R_{i,j}^\top A_{i,j}^{-1} R_{i,j}, \]
where the coarse space matrix is $A_{i+1} = V_i^\top A_i V_i$.
One of the main contributions of the paper concerns the construction of the
hierarchy of coarse spaces $V_i$ for levels $i$ going from 2 to $L$, which are built algebraically
from the coarse space of the previous level $V_{i-1}$. This construction is based on the
definition of local symmetric positive semi-definite (SPSD) matrices associated with
each subdomain $j$ at each level $i$ that we introduce in this paper. These matrices are
obtained by using the local SPSD matrices of the previous level $i-1$ and the previous
coarse space $V_{i-1}$. They are then involved, with the local matrices $A_{i,j}$, in concurrent
generalized eigenvalue problems solved for each subdomain $j$, which yield
the local eigenvectors contributing to the coarse space $V_i$.
We show in Theorem 5.3, section 5, that the condition number of $M_i^{-1} A_i$ is
bounded and depends on the maximum number of subdomains at the first level that
share an unknown, the number of distinct colors required to color the graph of $A_i$ so
that $\operatorname{span}\{R_{i,j}^\top\}_{1 \le j \le N_i}$ of the same color are mutually $A_i$-orthogonal, and a
user-defined tolerance $\tau$. It is thus independent of the number of subdomains $N_i$.
The main contribution of this paper is based on the combination of two previous
works on two-level additive Schwarz methods [3,33]. The coarse space proposed in
[33] guarantees an upper bound on the condition number that can be prescribed by
the user. The SPSD splitting in the context of domain decomposition presented in
[3] provides an algebraic view for the construction of coarse spaces. The combination
of these two works leads to a robust multilevel additive Schwarz method. Here,
robustness refers to the fact that at each level, an upper bound on the condition
number of the associated matrix can be prescribed by the user a priori. The rest
of the paper is organized as follows. In the next section, we present the notations
used throughout the paper. In section 2, we present a brief review of the theory of
one- and two-level additive Schwarz methods. We extend in section 3 the class of
SPSD splitting matrices presented in [3] in order to make it suitable for multilevel
methods. Afterwards, we define the coarse space at level $i$ based on the extended
class of local SPSD splitting matrices associated with this level. Section 4 describes
the partitioning of the domain at level $i + 1$ from the partitioning at level $i$. In
Section 5, we explain the computation of the local SPSD matrices associated with each
subdomain at level $i + 1$. We compute them using those associated with subdomains
at level $i$. Section 6 presents numerical experiments on highly challenging diffusion
and linear elasticity problems in two and three dimensions. We illustrate
the theoretical robustness and practical usage of our proposed method by performing
strong scalability tests up to 8,192 processes.
Context and notation. By convention, the finest level, on which (1.1) is
defined, is the first level. A subscript index is used in order to specify which level
an entity is defined on. In the case where additional subscripts are used, the first
subscript always denotes the level. For the sake of clarity, we omit the subscript
corresponding to level 1 when it is clear from context, e.g., matrix $A$. Furthermore, the
subscripts $i$ and $j$ always refer to a specific level $i$ and its subdomain $j$, respectively.
The number of levels is $L + 1$. Let $A_i \in \mathbb{R}^{n_i \times n_i}$ denote symmetric positive definite
matrices, each corresponding to level $i = 1, \ldots, L + 1$. We suppose that a direct solver
can be used at level $L + 1$ to compute an exact factorization of $A_{L+1}$.
Let $B \in \mathbb{R}^{p \times q}$ be a matrix. Let $P \subset [\![1; p]\!]$ and $Q \subset [\![1; q]\!]$ be two sets of
indices. The concatenation of $P$ and $Q$ is represented by $[P, Q]$. We note that the
order of the concatenation is important. $B(P, :)$ is the submatrix of $B$ formed by
the rows whose indices belong to $P$. $B(:, Q)$ is the submatrix of $B$ formed by the
columns whose indices belong to $Q$. $B(P, Q) = (B(P, :))(:, Q)$. The identity matrix
of size $p$ is denoted $I_p$. We suppose that the graph of $A_i$ is partitioned into $N_i$
non-overlapping subdomains, where $N_i \ll n_i$ and $N_{i+1} \le N_i$ for $i = 1, \ldots, L$. We note that
partitioning at level 1 can be performed by using a graph partitioning library such as
ParMETIS [22] or PT-SCOTCH [11]. Partitioning at greater levels will be described
later in section 4. In the following, we define for each level $i = 1, \ldots, L$ notations
for subsets and restriction operators that are associated with the partitioning. Let
$\Omega_i = [\![1; n_i]\!]$ be the set of unknowns at level $i$ and let $\Omega_{i,j,I}$ for $j = 1, \ldots, N_i$ be the
subset of $\Omega_i$ that represents the unknowns in subdomain $j$. We refer to $\Omega_{i,j,I}$ as the
interior unknowns of subdomain $j$. Let $\Gamma_{i,j}$ for $j = 1, \ldots, N_i$ be the subset of $\Omega_i$ that
represents the neighbor unknowns of subdomain $j$, i.e., the unknowns at distance 1
from subdomain $j$ through the graph of $A_i$. We refer to $\Gamma_{i,j}$ as the overlapping
unknowns of subdomain $j$. We denote $\Omega_{i,j} = [\Omega_{i,j,I}, \Gamma_{i,j}]$, for $j = 1, \ldots, N_i$, the
concatenation of interior and overlapping unknowns of subdomain $j$. We denote
$\Delta_{i,j}$, for $j = 1, \ldots, N_i$, the complement of $\Omega_{i,j}$ in $\Omega_i$, i.e., $\Delta_{i,j} = \Omega_i \setminus \Omega_{i,j}$. In
Figure 1.1, a triangular mesh is used to discretize a square domain. The set of
nodes of the mesh is partitioned into 16 disjoint subsets $\Omega_{1,j,I}$, for $j = 1, \ldots, 16$, which
represent a non-overlapping decomposition (left). On the right, a matrix $A_1$
whose connectivity graph corresponds to the mesh is illustrated. The submatrix
$A_1(\Omega_{1,j,I}, \Omega_{1,j,I})$ is associated with the non-overlapping subdomain $j$. Each submatrix
$A_1(\Omega_{1,j,I}, \Omega_{1,j,I})$ is colored with a distinct color. The same color is used to color the
region that contains the nodes in the non-overlapping subdomain $\Omega_{1,j,I}$. Note that
if two subdomains $j_1, j_2$ are neighbors, the submatrix $A_1(\Omega_{1,j_1,I}, \Omega_{1,j_2,I})$ has nonzero
elements. For $j = 1, \ldots, N_i$, we denote by $n_{i,j,I}$, $\gamma_{i,j}$, and $n_{i,j}$ the cardinality of $\Omega_{i,j,I}$,
$\Gamma_{i,j}$, and $\Omega_{i,j}$, respectively.
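Let $R_{i,j,I} \in \mathbb{R}^{n_{i,j,I} \times n_i}$ be defined as $R_{i,j,I} = I_{n_i}(\Omega_{i,j,I}, :)$.
Let $R_{i,j,\Gamma} \in \mathbb{R}^{\gamma_{i,j} \times n_i}$ be defined as $R_{i,j,\Gamma} = I_{n_i}(\Gamma_{i,j}, :)$.
Let $R_{i,j} \in \mathbb{R}^{n_{i,j} \times n_i}$ be defined as $R_{i,j} = I_{n_i}(\Omega_{i,j}, :)$.
Let $R_{i,j,\Delta} \in \mathbb{R}^{(n_i - n_{i,j}) \times n_i}$ be defined as $R_{i,j,\Delta} = I_{n_i}(\Delta_{i,j}, :)$.
Let $P_{i,j} = I_{n_i}([\Omega_{i,j,I}, \Gamma_{i,j}, \Delta_{i,j}], :) \in \mathbb{R}^{n_i \times n_i}$ be a permutation matrix associated
with the subdomain $j$, for $j = 1, \ldots, N_i$. The matrix of the overlapping subdomain $j$,
$R_{i,j} A_i R_{i,j}^\top$, is denoted $A_{i,j}$. We denote $D_{i,j} \in \mathbb{R}^{n_{i,j} \times n_{i,j}}$, $j = 1, \ldots, N_i$, any set of
non-negative diagonal matrices such that
\[ I_{n_i} = \sum_{j=1}^{N_i} R_{i,j}^\top D_{i,j} R_{i,j}. \]
We refer to $\{D_{i,j}\}_{1 \le j \le N_i}$ as the algebraic partition of unity. Let $V_i \in \mathbb{R}^{n_i \times n_{i+1}}$ be
a tall-and-skinny matrix of full rank. We denote $S_i$ the subspace spanned by the
columns of $V_i$. This subspace will stand for the coarse space associated with level $i$.
By convention, we refer to $S_i$ as subdomain 0 at level $i$. Thus, we have $n_{i,0} = n_{i+1}$.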
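The restriction operators and the algebraic partition of unity can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the 1D Laplacian stencil stands in for $A_i$, the three contiguous interior sets are an arbitrary choice, and the helper names (`laplacian_1d`, `overlapping_sets`) are hypothetical. The partition of unity used here (1 on interior unknowns, 0 on overlapping unknowns) is the Boolean choice described in section 4.

```python
import numpy as np

def laplacian_1d(n):
    """Tridiagonal SPD matrix (1D Laplacian stencil), a stand-in for A_i."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def overlapping_sets(A, interiors):
    """For each interior index set, return (interior, Gamma), where Gamma holds
    the unknowns at distance 1 from the interior in the graph of A."""
    sets = []
    for interior in interiors:
        inside = set(interior)
        neighbors = set()
        for row in interior:
            neighbors.update(np.flatnonzero(A[row]).tolist())
        sets.append((list(interior), sorted(neighbors - inside)))
    return sets

n = 12
A = laplacian_1d(n)
interiors = [range(0, 4), range(4, 8), range(8, 12)]  # assumed partition
I_n = np.eye(n)

R, D = [], []
for interior, gamma in overlapping_sets(A, interiors):
    omega = interior + gamma              # Omega_j = [interior, overlap]
    R.append(I_n[omega, :])               # R_j = I(Omega_j, :)
    d = np.zeros(len(omega))
    d[:len(interior)] = 1.0               # 1 on interior, 0 on overlap
    D.append(np.diag(d))

# Partition of unity: sum_j R_j^T D_j R_j should be the identity.
pou = sum(Rj.T @ Dj @ Rj for Rj, Dj in zip(R, D))
print(np.allclose(pou, I_n))
```

Any other non-negative diagonal weights that sum to one on shared unknowns would satisfy the same identity; the Boolean choice keeps the columns contributed by different subdomains non-overlapping.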
Fig. 1.1. Left: a triangular mesh is used to discretize the unit square. The set of nodes of the
mesh is partitioned into 16 disjoint subsets, non-overlapping subdomains, $\Omega_{1,j,I}$ for $j = 1, \ldots, 16$.
Right: Illustration of the matrix $A_1$ whose connectivity graph corresponds to the mesh on the left.
The diagonal block $j$ of $A_1$ corresponds to the non-overlapping subdomain $\Omega_{1,j,I}$. Each submatrix
$A_1(\Omega_{1,j,I}, \Omega_{1,j,I})$ is colored with a distinct color. The same color is used to color the region of the
square that contains nodes in $\Omega_{1,j,I}$.
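The interpolation operator at level $i$ is defined as:
\[ R_{i,2} : \prod_{j=0}^{N_i} \mathbb{R}^{n_{i,j}} \to \mathbb{R}^{n_i}, \qquad (u_j)_{0 \le j \le N_i} \mapsto \sum_{j=0}^{N_i} R_{i,j}^\top u_j, \tag{1.2} \]
where, for the coarse space (subdomain 0), $R_{i,0}^\top = V_i$.
Finally, we denote $\mathcal{V}_{i,j}$ the set of neighboring subdomains of each subdomain $j$ at
level $i$ for $(i, j) \in [\![1; L]\!] \times [\![1; N_i]\!]$:
\[ \mathcal{V}_{i,j} = \{ k \in [\![1; N_i]\!] : \Omega_{i,j} \cap \Omega_{i,k} \neq \emptyset \}. \]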
As previously mentioned, partitioning at level 1 can be performed by graph
partitioning libraries such as ParMETIS [22] or PT-SCOTCH [11]. Partitioning at further
levels will be defined later: the sets $\Omega_{i,j,I}$, $\Omega_{i,j,\Gamma}$, $\Omega_{i,j}$, and $\Delta_{i,j}$ for $i > 1$ are defined
in subsection 4.2. The coarse spaces $S_i$ as well as the projection and prolongation
operators $V_i^\top$ and $V_i$ are defined in subsection 3.2. We suppose that the connectivity
graph between the subdomains on each level is sparse. This assumption does not hold in
general. However, it is valid for structures based on locally constructed coarse spaces
in domain decomposition, as we show in this paper; see [18, Section 4.1 p.81] for the
case of two levels.
2. Background. In this section, we review briefly several theoretical results
related to additive Schwarz preconditioners. We introduce them for the sake of
completeness.
Lemma 2.1 (fictitious subspace lemma). Let $A \in \mathbb{R}^{n_A \times n_A}$, $B \in \mathbb{R}^{n_B \times n_B}$ be two
symmetric positive definite matrices. Let $R$ be an operator defined as
\[ R : \mathbb{R}^{n_B} \to \mathbb{R}^{n_A}, \qquad v \mapsto Rv, \]
and let $R^\top$ be its transpose. Suppose that the following conditions hold:
1. The operator $R$ is surjective.
2. There exists $c_u > 0$ such that
\[ (Rv)^\top A (Rv) \le c_u \, v^\top B v, \quad \forall v \in \mathbb{R}^{n_B}. \]
3. There exists $c_l > 0$ such that for all $v_{n_A} \in \mathbb{R}^{n_A}$, $\exists v_{n_B} \in \mathbb{R}^{n_B}$ with $v_{n_A} = R v_{n_B}$ and
\[ c_l \, v_{n_B}^\top B v_{n_B} \le (R v_{n_B})^\top A (R v_{n_B}) = v_{n_A}^\top A v_{n_A}. \]
Then, the spectrum of the operator $R B^{-1} R^\top A$ is contained in the segment $[c_l, c_u]$.
Proof. We refer the reader to [12, Lemma 7.4 p.164] or [28,27,13] for a detailed
proof.
Lemma 2.2. The operator $R_{i,2}$ as defined in (1.2) is surjective.
Proof. The proof follows from the definition of $R_{i,2}$ (1.2).
Lemma 2.3. Let $k_{i,c}$ for $i = 1, \ldots, L$ be the minimum number of distinct colors
so that $\operatorname{span}\{R_{i,j}^\top\}_{1 \le j \le N_i}$ of the same color are mutually $A_i$-orthogonal. Then, we
have
\[ (R_{i,2} u_{B_i})^\top A_i (R_{i,2} u_{B_i}) \le (k_{i,c} + 1) \sum_{j=0}^{N_i} u_j^\top R_{i,j} A_i R_{i,j}^\top u_j, \quad \forall u_{B_i} = (u_j)_{0 \le j \le N_i} \in \prod_{j=0}^{N_i} \mathbb{R}^{n_{i,j}}. \]
Proof. We refer the reader to [9, Theorem 12 p.93] for a detailed proof.
We note that at level $i$, the number $k_{i,c}$ is smaller than the maximum number of
neighbors over the set of subdomains $[\![1; N_i]\!]$:
\[ k_{i,c} \le \max_{1 \le j \le N_i} \# \mathcal{V}_{i,j}. \]
Due to the sparse structure of the connectivity graph between the subdomains at
level $i$, the maximum number of neighbors over the set of subdomains $[\![1; N_i]\!]$ is
independent of the number of subdomains $N_i$. Then, so is $k_{i,c}$.
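The coloring argument behind Lemma 2.3 can be checked on a small example. The sketch below is illustrative only: it uses a 1D Laplacian stencil as a stand-in for $A_i$, a greedy coloring (which gives a valid number of colors, not necessarily the minimum $k_{i,c}$), and a random orthonormal matrix `V` as a hypothetical coarse basis. The inequality holds for any valid coloring, so the greedy count is sufficient for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Overlapping index sets: interiors of 5 unknowns plus one layer of overlap.
omegas = [list(range(max(0, s - 1), min(n, s + 6))) for s in range(0, n, 5)]
R = [np.eye(n)[om, :] for om in omegas]
N = len(R)

# Subdomains j and k conflict when R_j A R_k^T != 0, i.e. when their local
# spaces span{R_j^T} and span{R_k^T} are not A-orthogonal.
conflict = [[j != k and bool(np.any(R[j] @ A @ R[k].T)) for k in range(N)]
            for j in range(N)]

# Greedy coloring: same-colored subdomains never conflict.
color = []
for j in range(N):
    used = {color[k] for k in range(j) if conflict[j][k]}
    color.append(next(c for c in range(N) if c not in used))
num_colors = max(color) + 1

# Hypothetical coarse basis (subdomain 0) and arbitrary local vectors u_j.
V = np.linalg.qr(rng.standard_normal((n, 3)))[0]
u0 = rng.standard_normal(3)
u = [rng.standard_normal(len(om)) for om in omegas]

assembled = V @ u0 + sum(Rj.T @ uj for Rj, uj in zip(R, u))
lhs = assembled @ A @ assembled
local_energy = u0 @ (V.T @ A @ V) @ u0 + sum(
    uj @ (Rj @ A @ Rj.T) @ uj for Rj, uj in zip(R, u))
rhs = (num_colors + 1) * local_energy
print(num_colors, lhs <= rhs)
```

The extra "+ 1" accounts for the coarse space, which is treated as one additional color because it is in general not $A_i$-orthogonal to any local space.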
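Lemma 2.4. Let $u_{A_i} \in \mathbb{R}^{n_{A_i}}$ and $u_{B_i} = \{u_j\}_{0 \le j \le N_i} \in \prod_{j=0}^{N_i} \mathbb{R}^{n_{i,j}}$ such that $u_{A_i} = R_{i,2} u_{B_i}$.
The additive Schwarz operator without any other restriction on the coarse
space $S_i$ verifies the following inequality
\[ \sum_{j=0}^{N_i} u_j^\top R_{i,j} A_i R_{i,j}^\top u_j \le 2 u_{A_i}^\top A_i u_{A_i} + (2 k_{i,c} + 1) \sum_{j=1}^{N_i} u_j^\top R_{i,j} A_i R_{i,j}^\top u_j, \]
where $k_{i,c}$ is defined in Lemma 2.3.
Proof. We refer the reader to [12, Lemma 7.12, p. 175] to view the proof in
detail.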
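Lemma 2.5. Let $A, B \in \mathbb{R}^{m \times m}$ be two symmetric positive semi-definite matrices.
Let $\ker(A)$, $\operatorname{range}(A)$ denote the null space and the range of $A$, respectively. Let $P_0$
be an orthogonal projection on $\operatorname{range}(A)$. Let $\tau$ be a positive real number. Consider
the generalized eigenvalue problem,
\[ P_0 B P_0 u_k = \lambda_k A u_k, \qquad (u_k, \lambda_k) \in \operatorname{range}(A) \times \mathbb{R}. \]
Let $P_\tau$ be an orthogonal projection on the subspace
\[ Z = \ker(A) \oplus \operatorname{span}\{ u_k \mid \lambda_k > \tau \}, \]
then, the following inequality holds:
\[ (u - P_\tau u)^\top B (u - P_\tau u) \le \tau \, u^\top A u, \quad \forall u \in \mathbb{R}^{m}. \tag{2.1} \]
Proof. We refer the reader to [3, Lemma 2.4] and [12, Lemma 7.7] for a detailed
proof.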
2.1. GenEO coarse space. In [33,12] the authors present the GenEO coarse
space which relies on defining appropriate symmetric positive semi-definite (SPSD)
matrices $\tilde{A}_j \in \mathbb{R}^{n \times n}$ for $j = 1, \ldots, N$. These are the unassembled Neumann matrices,
corresponding to the integration on each subdomain of the operator defined in the
variational form of the PDE. These matrices are local, i.e., $R_{j,\Delta} \tilde{A}_j = 0$. Furthermore,
they verify the relations
\[ u^\top \tilde{A}_j u \le u^\top A u, \quad \forall u \in \mathbb{R}^{n}, \]
\[ u^\top \sum_{j=1}^{N} \tilde{A}_j u \le k_{\mathrm{GenEO}} \, u^\top A u, \quad \forall u \in \mathbb{R}^{n}, \]
where $k_{\mathrm{GenEO}} \le N$ is the maximum number of subdomains that share an unknown.
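The locality and the two energy bounds satisfied by unassembled Neumann matrices can be verified on a toy problem. The sketch below assumes a 1D P1 finite element stiffness matrix with Dirichlet ends (mesh-size scaling omitted) and four overlapping subdomains; the helper names `element_matrix` and `owns` are hypothetical, and a subdomain's Neumann matrix is assembled from the elements whose unknowns all lie in that subdomain.

```python
import numpy as np

n = 20  # number of unknowns; elements are numbered 0..n

def element_matrix(e):
    """Element e couples unknowns e-1 and e; indices -1 and n are
    eliminated Dirichlet nodes and contribute only to the diagonal."""
    K = np.zeros((n, n))
    nodes = [v for v in (e - 1, e) if 0 <= v < n]
    for a in nodes:
        K[a, a] = 1.0
    if len(nodes) == 2:
        K[e - 1, e] = K[e, e - 1] = -1.0
    return K

def owns(omega, e):
    """True if every unknown of element e lies in the overlapping set omega."""
    return all(v in omega for v in (e - 1, e) if 0 <= v < n)

A = sum(element_matrix(e) for e in range(n + 1))  # assembled matrix
omegas = [set(range(max(0, s - 1), min(n, s + 6))) for s in range(0, n, 5)]
A_tilde = [sum(element_matrix(e) for e in range(n + 1) if owns(om, e))
           for om in omegas]

# Locality: rows of A~_j outside Omega_j vanish (R_{j,Delta} A~_j = 0).
outside = [sorted(set(range(n)) - om) for om in omegas]
is_local = all(not np.any(At[rows, :]) for At, rows in zip(A_tilde, outside))

# k: maximum number of subdomains sharing an unknown.
k = max(sum(v in om for om in omegas) for v in range(n))

# u^T A~_j u <= u^T A u  and  u^T (sum_j A~_j) u <= k u^T A u.
min_eig_each = min(np.linalg.eigvalsh(A - At).min() for At in A_tilde)
min_eig_sum = np.linalg.eigvalsh(k * A - sum(A_tilde)).min()
print(is_local, k, min_eig_each >= -1e-10, min_eig_sum >= -1e-10)
```

Both inequalities follow here because every element matrix is SPSD and each element is counted in at most `k` subdomains.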
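2.2. Local SPSD splitting of an SPD matrix. In [3], the authors present
the local SPSD splitting of an SPD matrix. Given the permutation matrix $P_j$, a local
SPSD splitting matrix $\tilde{A}_j$ of $A$ associated with subdomain $j$ is defined as
\[ P_j \tilde{A}_j P_j^\top = \begin{pmatrix} R_{j,I} A R_{j,I}^\top & R_{j,I} A R_{j,\Gamma}^\top & \\ R_{j,\Gamma} A R_{j,I}^\top & \tilde{A}_\Gamma^j & \\ & & 0 \end{pmatrix}, \tag{2.2} \]
where $\tilde{A}_\Gamma^j \in \mathbb{R}^{\gamma_j \times \gamma_j}$ satisfies the two following conditions: For all $u \in \mathbb{R}^{\gamma_j}$,
\[ u^\top \left( R_{j,\Gamma} A R_{j,I}^\top \right) \left( R_{j,I} A R_{j,I}^\top \right)^{-1} \left( R_{j,I} A R_{j,\Gamma}^\top \right) u \le u^\top \tilde{A}_\Gamma^j u, \]
\[ u^\top \tilde{A}_\Gamma^j u \le u^\top \left( R_{j,\Gamma} A R_{j,\Gamma}^\top - \left( R_{j,\Gamma} A R_{j,\Delta}^\top \right) \left( R_{j,\Delta} A R_{j,\Delta}^\top \right)^{-1} \left( R_{j,\Delta} A R_{j,\Gamma}^\top \right) \right) u. \]
The authors prove that the matrices $\tilde{A}_j$ defined in such a way verify the following
relations:
\[ R_{j,\Delta} \tilde{A}_j = 0, \tag{2.3} \]
\[ u^\top \tilde{A}_j u \le u^\top A u, \quad \forall u \in \mathbb{R}^{n}, \tag{2.4} \]
\[ u^\top \sum_{j=1}^{N} \tilde{A}_j u \le k \, u^\top A u, \quad \forall u \in \mathbb{R}^{n}, \tag{2.5} \]
where $k$ is a number that depends on the local SPSD splitting matrices and can be
at most equal to the number of subdomains, $k \le N$. The authors also show that the
local matrices defined in GenEO [33,12] can be seen as a local SPSD splitting.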
In [3], the authors highlight that the key idea to construct a coarse space relies
on the ability to identify the so-called local SPSD splitting matrices. They present
a class of algebraically constructed coarse spaces based on the local SPSD splitting
matrices. Moreover, this class can be extended to a larger variety of local SPSD
matrices. This extension makes it practical to construct efficient coarse
spaces for a multilevel structure. This is discussed in the following
section.
3. Extension of the class of coarse spaces. In this section we extend the
class of coarse spaces presented in [3]. To do so, we present a class of matrices, that is
larger than the class of local SPSD splitting matrices. This will be our main building
block in the construction of efficient coarse spaces. Furthermore, this extension can
lead to a straightforward construction of hierarchical coarse spaces in a multilevel
Schwarz preconditioner setting.
3.1. Extension of the class of local SPSD splitting matrices. Regarding
the two-level additive Schwarz method, the authors of [3] introduced the local SPSD
splitting related to a subdomain as defined in (2.2). As can be seen from the theory
presented in that paper, it is not necessary to have the exact matrices $R_{j,I} A R_{j,I}^\top$,
$R_{j,I} A R_{j,\Gamma}^\top$, and $R_{j,\Gamma} A R_{j,I}^\top$ in the definition of the local SPSD splitting in order to
build an efficient coarse space. Indeed, the only requirement is to
define for each subdomain $j = 1, \ldots, N$ an SPSD matrix $\tilde{A}_j$ such that:
\[ R_{j,\Delta} \tilde{A}_j = 0, \qquad u^\top \sum_{j=1}^{N} \tilde{A}_j u \le k \, u^\top A u, \quad \forall u \in \mathbb{R}^{n}, \tag{3.1} \]
where $k$ is a number that depends on the local SPSD matrices $\tilde{A}_j$ for $j = 1, \ldots, N$.
The first condition means that $\tilde{A}_j$ has the local SPSD structure associated with
subdomain $j$, i.e., it has the following form:
\[ P_j \tilde{A}_j P_j^\top = \begin{pmatrix} \tilde{A}_{I,\Gamma}^j & 0 \\ 0 & 0 \end{pmatrix}, \]
where $\tilde{A}_{I,\Gamma}^j \in \mathbb{R}^{n_j \times n_j}$. The second condition is associated with the stable
decomposition property [36,12]. Note that with regard to the local SPSD matrices, the
authors in [33] only use these two conditions. That is to say, with matrices that verify
conditions (3.1) the construction of the coarse space is straightforward through the
theory presented in either [33] or [3]. To this end, we define in the following the local
SPSD (LSPSD) matrix associated with subdomain $j$ as well as the associated local
filtering subspace that contributes to the coarse space.
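Definition 3.1 (local SPSD matrices). An SPSD matrix $\tilde{A}_{i,j} \in \mathbb{R}^{n_i \times n_i}$ is called
local SPSD (LSPSD) with respect to subdomain $j$ if
\[ R_{i,j,\Delta} \tilde{A}_{i,j} = 0, \]
\[ u^\top \sum_{j=1}^{N_i} \tilde{A}_{i,j} u \le k_i \, u^\top A_i u, \quad \forall u \in \mathbb{R}^{n_i}, \]
where $k_i > 0$.
We note that the local SPSD splitting matrices form a subset of the local SPSD
matrices.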
3.2. Multilevel coarse spaces. This section summarizes the steps to be per-
formed in order to construct the coarse space at level ionce we have the LSPSD
matrices associated with each subdomain at that level.
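Definition 3.2 (coarse space based on LSPSD matrices). Let $\tilde{A}_{i,j} \in \mathbb{R}^{n_i \times n_i}$ for
$j = 1, \ldots, N_i$ be LSPSD matrices. Let $D_{i,j} \in \mathbb{R}^{n_{i,j} \times n_{i,j}}$ for $j = 1, \ldots, N_i$ be the partition
of unity. Let $\tau_i > 0$ be a given number. For a subdomain $j \in [\![1; N_i]\!]$, let
\[ G_{i,j} = D_{i,j} R_{i,j} A_i R_{i,j}^\top D_{i,j}. \]
Let $\tilde{P}_{i,j}$ be the projection on $\operatorname{range}(R_{i,j} \tilde{A}_{i,j} R_{i,j}^\top)$ parallel to $\ker(R_{i,j} \tilde{A}_{i,j} R_{i,j}^\top)$. Let
$K_{i,j} = \ker(R_{i,j} \tilde{A}_{i,j} R_{i,j}^\top)$. Consider the generalized eigenvalue problem:
\[ \tilde{P}_{i,j} G_{i,j} \tilde{P}_{i,j} u_{i,j,k} = \lambda_{i,j,k} R_{i,j} \tilde{A}_{i,j} R_{i,j}^\top u_{i,j,k}, \qquad (u_{i,j,k}, \lambda_{i,j,k}) \in \operatorname{range}(R_{i,j} \tilde{A}_{i,j} R_{i,j}^\top) \times \mathbb{R}. \tag{3.2} \]
Set
\[ Z_{i,j} = K_{i,j} \oplus \operatorname{span}\{ u_{i,j,k} \mid \lambda_{i,j,k} > \tau_i \}. \tag{3.3} \]
Then, the coarse space associated with LSPSD matrices $\tilde{A}_{i,j}$ for $j = 1, \ldots, N_i$ at level $i$
is defined as:
\[ S_i = \bigoplus_{j=1}^{N_i} R_{i,j}^\top D_{i,j} Z_{i,j}. \tag{3.4} \]
Following notations from section 1, the columns of $V_i$ span the coarse space $S_i$. The
matrix $A_{i+1}$ is defined as:
\[ A_{i+1} = V_i^\top A_i V_i. \tag{3.5} \]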
The local SPSD splitting matrices at level 1 will play an important role in the
construction of the LSPSD matrices at subsequent levels. In the following, we present
an efficient approach for computing LSPSD matrices for levels greater than 1.
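The construction in Definition 3.2 can be traced end to end on a small two-level example. The sketch below is an illustration under stated assumptions rather than the authors' code: the matrix is a 1D Laplacian stencil, the LSPSD matrices are Neumann matrices assembled from element contributions, the partition of unity is Boolean (1 on interior, 0 on overlap), and the constants `k_c = 2`, `k = 2` are the values that apply to this particular 1D decomposition. The generalized eigenproblem (3.2) is solved on $\operatorname{range}(R_{i,j}\tilde{A}_{i,j}R_{i,j}^\top)$ by a symmetric reduction using NumPy only.

```python
import numpy as np

n, size, tau = 40, 8, 1.0
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def element_matrix(e):
    """Element e couples unknowns e-1 and e; -1 and n are Dirichlet nodes."""
    K = np.zeros((n, n))
    nodes = [v for v in (e - 1, e) if 0 <= v < n]
    for a in nodes:
        K[a, a] = 1.0
    if len(nodes) == 2:
        K[e - 1, e] = K[e, e - 1] = -1.0
    return K

V_blocks, local_solves = [], []
for start in range(0, n, size):
    interior = list(range(start, start + size))
    gamma = [v for v in (start - 1, start + size) if 0 <= v < n]
    omega = interior + gamma                          # Omega_j
    R = np.eye(n)[omega, :]
    D = np.diag([1.0] * len(interior) + [0.0] * len(gamma))
    elems = [e for e in range(n + 1)
             if all(v in omega for v in (e - 1, e) if 0 <= v < n)]
    A_neu = R @ sum(element_matrix(e) for e in elems) @ R.T  # R A~_j R^T
    G = D @ R @ A @ R.T @ D
    s, Q = np.linalg.eigh(A_neu)
    in_range = s > 1e-10 * s.max()
    Q0, Q1, s1 = Q[:, ~in_range], Q[:, in_range], s[in_range]  # kernel, range
    # On range(A_neu): (Q1^T G Q1) y = lambda diag(s1) y, symmetrized below.
    scale = 1.0 / np.sqrt(s1)
    C = scale[:, None] * (Q1.T @ G @ Q1) * scale[None, :]
    lam, Zr = np.linalg.eigh(C)
    U = Q1 @ (scale[:, None] * Zr[:, lam > tau])      # eigenvectors, lambda > tau
    Z = np.hstack([Q0, U])                            # Z_j = K_j (+) span{...}
    V_blocks.append(R.T @ D @ Z)
    local_solves.append(R.T @ np.linalg.solve(R @ A @ R.T, R))

# Orthonormal basis of the coarse space S_1, dropping dependent columns.
V = np.hstack(V_blocks)
Uv, sv, _ = np.linalg.svd(V, full_matrices=False)
V = Uv[:, sv > 1e-10 * sv.max()]
A2 = V.T @ A @ V                                      # coarse matrix (3.5)

M_inv = V @ np.linalg.solve(A2, V.T) + sum(local_solves)
L = np.linalg.cholesky(A)
ev = np.linalg.eigvalsh(L.T @ M_inv @ L)              # spectrum of M^{-1} A
kappa = ev.max() / ev.min()

k_c, k = 2, 2   # colors and element multiplicity for this 1D decomposition
bound = (k_c + 1) * (2 + (2 * k_c + 1) * k * tau)
print(V.shape[1], kappa, bound)
```

Lowering `tau` selects more eigenvectors and tightens the theoretical bound at the cost of a larger coarse space; the measured condition number is typically well below the bound.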
4. Partitioning for levels strictly greater than 1. In this section, we
explain how to obtain the partitioning sets $\Omega_{i,j,I}$ for $(i, j) \in [\![2; L]\!] \times [\![1; N_i]\!]$. Once the
sets $\Omega_{i,j,I}$ for $j = 1, \ldots, N_i$ are defined at level $i$, the following elements are readily
available: sets $\Gamma_{i,j}$, $\Delta_{i,j}$, and $\Omega_{i,j}$; restriction operators $R_{i,j,I}$, $R_{i,j,\Gamma}$, $R_{i,j,\Delta}$, and $R_{i,j}$;
permutation matrices $P_{i,j}$ for $j = 1, \ldots, N_i$. The partition of unity is constructed in
an algebraic way. The $m$th diagonal element of $D_{i,j}$ is 1 if $m \le n_{i,j,I}$ and 0 otherwise.
4.1. Superdomains as unions of several subdomains. In this section, we
introduce the notion of a superdomain. It refers to the union of several neighboring
subdomains. Let $\mathcal{G}_{i,1}, \ldots, \mathcal{G}_{i,N_{i+1}}$ be disjoint subsets of $[\![1; N_i]\!]$, where
$\bigcup_{j=1}^{N_{i+1}} \mathcal{G}_{i,j} = [\![1; N_i]\!]$. We call the union of the subdomains
$\{ k \in [\![1; N_i]\!] : k \in \mathcal{G}_{i,j} \}$ superdomain $j$,
for $j = 1, \ldots, N_{i+1}$. Figure 4.1 gives an example of how to set superdomains. Though
this definition of superdomains may look somewhat related to the fine mesh, it is in
practice done at the algebraic level, as explained later on. Note that the indices of
columns and rows of $A_{i+1}$ are associated with the vectors contributed by the
subdomains at level $i$ in order to build the coarse space $S_i$, see Figure 4.2. Hence, defining
subdomains on the structure of $A_{i+1}$ is natural once we have the subsets $\mathcal{G}_{i,j}$, for
$j = 1, \ldots, N_{i+1}$.
Fig. 4.1. Left: 16 subdomains at level 1. Right: 4 superdomains at level 1.
$\mathcal{G}_{1,j} = [\![4(j-1) + 1; 4(j-1) + 4]\!]$.
Fig. 4.2. Illustration of the correspondence of indices between the columns of $V_i$ (left) and the
rows and columns of $A_{i+1}$ (right). Having no overlap in $V_i$ is possible through a non-overlapping
partition of unity.
4.2. Heritage from superdomains. Let $e_{i,j}$ be the set of indices of the vectors
that span $R_{i,j}^\top D_{i,j} Z_{i,j}$ in the matrix $V_i$ for some $(i, j) \in [\![1; L-1]\!] \times [\![1; N_i]\!]$, see
Figure 4.2. We define $\Omega_{i+1,j,I} = \bigcup_{k \in \mathcal{G}_{i,j}} e_{i,k}$, for $j = 1, \ldots, N_{i+1}$. We denote $\Omega_{i+1,j,\Gamma}$
the subset of $[\![1; n_{i+1}]\!] \setminus \Omega_{i+1,j,I}$ whose elements are at distance 1 from $\Omega_{i+1,j,I}$ through
the graph of $A_{i+1}$. We note that
\[ \Omega_{i+1,j,\Gamma} \subset \bigcup_{p \in \mathcal{G}_{i,j}} \bigcup_{q \in \mathcal{V}_{i,p}} e_{i,q}, \]
where $\mathcal{V}_{i,j}$ represents the set of subdomains that are neighbors of subdomain $j$ at
level $i$ for $j = 1, \ldots, N_i$. The overlapping subdomain $j$ is defined by the set
$\Omega_{i+1,j} = [\Omega_{i+1,j,I}, \Omega_{i+1,j,\Gamma}]$. The rest of the sets, restriction, and prolongation operators can
be defined as given in section 1.
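The index bookkeeping of this subsection is purely algebraic and can be sketched in a few lines. In the example below, the function name `next_level_sets`, the toy column index sets `e`, the grouping `groups`, and the tridiagonal sparsity pattern of $A_{i+1}$ are all hypothetical inputs chosen for illustration (indices are 0-based).

```python
import numpy as np

def next_level_sets(A_next, e, groups):
    """e[k]: column indices of V_i contributed by subdomain k at level i.
    groups[j]: subdomains of level i forming superdomain j.
    Returns, for each j, (Omega_{i+1,j,I}, Omega_{i+1,j,Gamma})."""
    out = []
    for group in groups:
        interior = sorted(idx for k in group for idx in e[k])
        inside = set(interior)
        reach = set()
        for row in interior:
            # Unknowns coupled to this row through the graph of A_{i+1}.
            reach.update(np.flatnonzero(A_next[row]).tolist())
        out.append((interior, sorted(reach - inside)))
    return out

# Toy data: 4 subdomains at level i contributing 2, 1, 2, 1 coarse vectors,
# grouped pairwise into 2 superdomains.
e = [[0, 1], [2], [3, 4], [5]]
groups = [[0, 1], [2, 3]]
A_next = np.eye(6) + np.eye(6, k=1) + np.eye(6, k=-1)  # assumed pattern of A_{i+1}

sets = next_level_sets(A_next, e, groups)
for interior, gamma in sets:
    print(interior, gamma)
```

Only the sparsity pattern of $A_{i+1}$ and the column ownership of $V_i$ are needed, which is why no coarsened mesh is required beyond the first level.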
5. LSPSD matrices for levels strictly greater than 1. In [33,12,3],
different methods are suggested to obtain local SPSD splitting matrices at level 1. These
matrices are used to construct efficient two-level additive Schwarz preconditioners.
In this section, we do not discuss the construction of these matrices at level 1. We
suppose that we have the local SPSD matrices $\tilde{A}_{1,j} \in \mathbb{R}^{n_1 \times n_1}$ for $j = 1, \ldots, N_1$. We
focus on computing LSPSD matrices $\tilde{A}_{i,j} \in \mathbb{R}^{n_i \times n_i}$ for $(i, j) \in [\![2; L]\!] \times [\![1; N_i]\!]$. We also
suppose that the coarse space $S_1$ is available, i.e., the matrices $V_1$ and $A_2 = V_1^\top A_1 V_1$
are known explicitly.
Proposition 5.1. Let $i$ be a fixed level index, and let $\tilde{A}_{i,j}$ be an LSPSD of $A_i$
(see Definition 3.1), associated with subdomain $j$, for $j = 1, \ldots, N_i$. Let $\mathcal{G}_{i,1}, \ldots, \mathcal{G}_{i,N_{i+1}}$
be a set of superdomains at level $i$ associated with the partitioning at level $i + 1$, see
subsection 4.1. Let $V_i^\top$ be the restriction matrix to the coarse space at level $i$. Then,
the matrix $\tilde{A}_{i+1,j}$ which is defined as:
\[ \tilde{A}_{i+1,j} = \sum_{k \in \mathcal{G}_{i,j}} V_i^\top \tilde{A}_{i,k} V_i \]
satisfies the conditions in Definition 3.1. That is, $\tilde{A}_{i+1,j}$ is LSPSD of $A_{i+1}$ with
respect to subdomain $j$ for $j = 1, \ldots, N_{i+1}$.
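Proof. To prove that $\tilde{A}_{i+1,j}$ is LSPSD of $A_{i+1}$ with respect to subdomain $j$, we
have to prove the following:
\[ R_{i+1,j,\Delta} \tilde{A}_{i+1,j} = 0, \]
\[ u^\top \sum_{j=1}^{N_{i+1}} \tilde{A}_{i+1,j} u \le k_{i+1} \, u^\top A_{i+1} u \quad \text{for all } u \in \mathbb{R}^{n_{i+1}}. \]
First, note that $R_{i,k} \tilde{A}_{i,j} = 0$ for all non-neighboring subdomains $k$ of subdomain $j$.
This yields $Z_{i,k}^\top D_{i,k} R_{i,k} \tilde{A}_{i,j} = 0$ for these subdomains $k$.
Now, let $m \in [\![1; n_{i+1}]\!] \setminus \Omega_{i+1,j}$. We will show that the $m$th row of $\tilde{A}_{i+1,j}$ is zero.
Following the partitioning of subdomains at level $i + 1$, there exists a subdomain $\Omega_{i,p_0}$
such that the $m$th column of $V_i$ is part of $R_{i,p_0}^\top D_{i,p_0} Z_{i,p_0}$. We denote this column
vector by $v_m$. Furthermore, the subdomain $p_0$ is not a neighbor of any subdomain
that is a part of the superdomain $\mathcal{G}_{i,j}$. Hence, $v_m^\top \tilde{A}_{i,k} = 0$ for $k \in \mathcal{G}_{i,j}$. The $m$th row
of $\tilde{A}_{i+1,j}$ is given as $v_m^\top \sum_{k \in \mathcal{G}_{i,j}} \tilde{A}_{i,k} V_i$. Then, $v_m^\top \sum_{k \in \mathcal{G}_{i,j}} \tilde{A}_{i,k} = 0$, and the $m$th row
of $\tilde{A}_{i+1,j}$ is zero.
To prove the second condition, we have
\[ u^\top \sum_{j=1}^{N_{i+1}} \tilde{A}_{i+1,j} u = u^\top \sum_{j=1}^{N_{i+1}} \sum_{k \in \mathcal{G}_{i,j}} V_i^\top \tilde{A}_{i,k} V_i u. \]
Since $\{\mathcal{G}_{i,j}\}_{1 \le j \le N_{i+1}}$ form a disjoint partitioning of $[\![1; N_i]\!]$, we can write
\[ u^\top \sum_{j=1}^{N_{i+1}} \tilde{A}_{i+1,j} u = u^\top V_i^\top \sum_{k=1}^{N_i} \tilde{A}_{i,k} V_i u. \]
$\tilde{A}_{i,k}$ is an LSPSD matrix of $A_i$ for $k = 1, \ldots, N_i$. Hence, we have
\[ u^\top \sum_{j=1}^{N_{i+1}} \tilde{A}_{i+1,j} u \le k_i \, u^\top V_i^\top A_i V_i u = k_i \, u^\top A_{i+1} u. \]
We finish the proof by setting $k_{i+1} = k_i$.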
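The second condition of Proposition 5.1 can be checked numerically. The sketch below makes simplifying assumptions: level-1 LSPSD matrices are 1D Neumann matrices, and `V1` is a random orthonormal matrix standing in for a coarse basis (a hypothetical choice, so the locality condition, which relies on the GenEO structure of $V_i$, is not tested here). The energy bound holds for any full-rank `V1` because it only uses $\sum_k \tilde{A}_{1,k} \le k_1 A_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 24

def element_matrix(e):
    """Element e couples unknowns e-1 and e; -1 and n are Dirichlet nodes."""
    K = np.zeros((n, n))
    nodes = [v for v in (e - 1, e) if 0 <= v < n]
    for a in nodes:
        K[a, a] = 1.0
    if len(nodes) == 2:
        K[e - 1, e] = K[e, e - 1] = -1.0
    return K

A1 = sum(element_matrix(e) for e in range(n + 1))

# Four overlapping subdomains at level 1 and their Neumann (LSPSD) matrices.
omegas = [set(range(max(0, s - 1), min(n, s + 7))) for s in range(0, n, 6)]
A1_tilde = [
    sum(element_matrix(e) for e in range(n + 1)
        if all(v in om for v in (e - 1, e) if 0 <= v < n))
    for om in omegas
]
k1 = max(sum(v in om for om in omegas) for v in range(n))

# Stand-in coarse basis (assumption: any full-rank V1 suffices for this check).
V1 = np.linalg.qr(rng.standard_normal((n, 6)))[0]
A2 = V1.T @ A1 @ V1

# Two superdomains, each the union of two level-1 subdomains.
groups = [[0, 1], [2, 3]]
A2_tilde = [sum(V1.T @ A1_tilde[k] @ V1 for k in G) for G in groups]

# u^T (sum_j A~_{2,j}) u <= k_2 u^T A_2 u with k_2 = k_1.
gap = k1 * A2 - sum(A2_tilde)
min_eig = np.linalg.eigvalsh(gap).min()
print(k1, min_eig >= -1e-10)
```

The level-2 matrices are formed with two sparse triple products per subdomain and no access to the mesh, which is the sense in which the construction is algebraic.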
Figure 5.1 gives an illustration of the LSPSD construction provided by
Proposition 5.1. Figure 5.1 (top left) represents the matrix $A_1$. The graph of $A_1$ is partitioned
into 16 subdomains. Each subdomain is represented by a different color. Figure 5.1
(top right) represents the matrix $V_1$ whose column vectors form a basis of the coarse
space $S_1$. Colors of columns of $V_1$ correspond to those of subdomains in $A_1$. Figure 5.1
(bottom left) represents the matrix $A_2 = V_1^\top A_1 V_1$. Note that column and row indices
of $A_2$ are associated with column indices of $V_1$. Four subdomains are used at level 2.
The partitioning at level 2 is related to the superdomain $\mathcal{G}_{1,j} = [\![4(j-1) + 1; 4(j-1) + 4]\!]$
for $j = 1, \ldots, 4$. Figure 5.1 (bottom right) represents an LSPSD matrix of $A_2$ with
respect to subdomain 1 at level 2.
Theorem 5.2 shows that the third condition of the fictitious subspace lemma
(Lemma 2.1) holds at level $i$ for $i = 1, \ldots, L$.
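Theorem 5.2. Let $\tilde{A}_{i,j}$ be an LSPSD of $A_i$ associated with subdomain $j$, for
$(i, j) \in [\![1; L]\!] \times [\![1; N_i]\!]$. Let $\tau_i > 0$, $Z_{i,j}$ be the subspace associated with $\tilde{A}_{i,j}$, and
$P_{i,j}$ be the projection on $Z_{i,j}$ as defined in Lemma 2.5. Let $u_i \in \mathbb{R}^{n_i}$ and let
$u_{i,j} = D_{i,j} (I_{n_{i,j}} - P_{i,j}) R_{i,j} u_i$ for $(i, j) \in [\![1; L]\!] \times [\![1; N_i]\!]$. Let $u_{i,0}$ be defined as
\[ u_{i,0} = V_i^\top \left( \sum_{j=1}^{N_i} R_{i,j}^\top D_{i,j} P_{i,j} R_{i,j} u_i \right). \]
Let $m_i = (2 + (2 k_{i,c} + 1) k_i \tau_i)^{-1}$. Then,
\[ u_i = \sum_{j=0}^{N_i} R_{i,j}^\top u_{i,j}, \]
\[ m_i \sum_{j=0}^{N_i} u_{i,j}^\top R_{i,j} A_i R_{i,j}^\top u_{i,j} \le u_i^\top A_i u_i. \tag{5.1} \]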
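Fig. 5.1. Illustration of the LSPSD construction provided by Proposition 5.1. Top left:
the matrix $A_1$, top right: $V_1$, bottom left: the matrix $A_2 = V_1^\top A_1 V_1$, bottom right:
$\tilde{A}_{2,1} = \sum_{k \in \mathcal{G}_{1,1}} V_1^\top \tilde{A}_{1,k} V_1$, where $\mathcal{G}_{1,1} = \{1, \ldots, 4\}$.
Proof. We have
\[ \sum_{j=0}^{N_i} R_{i,j}^\top u_{i,j} = V_i V_i^\top \left( \sum_{j=1}^{N_i} R_{i,j}^\top D_{i,j} P_{i,j} R_{i,j} u_i \right) + \sum_{j=1}^{N_i} R_{i,j}^\top u_{i,j}. \]
Since for all $y \in S_i$, $V_i V_i^\top y = y$, we have
\[ \sum_{j=0}^{N_i} R_{i,j}^\top u_{i,j} = \sum_{j=1}^{N_i} R_{i,j}^\top D_{i,j} P_{i,j} R_{i,j} u_i + \sum_{j=1}^{N_i} R_{i,j}^\top D_{i,j} (I_{n_{i,j}} - P_{i,j}) R_{i,j} u_i = \sum_{j=1}^{N_i} R_{i,j}^\top D_{i,j} R_{i,j} u_i = u_i. \]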
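To prove the inequality (5.1), we start with the inequality from Lemma 2.4. We
have
\[ \sum_{j=0}^{N_i} u_{i,j}^\top R_{i,j} A_i R_{i,j}^\top u_{i,j} \le 2 u_i^\top A_i u_i + (2 k_{i,c} + 1) \sum_{j=1}^{N_i} u_{i,j}^\top R_{i,j} A_i R_{i,j}^\top u_{i,j}, \tag{5.2} \]
where we chose $u_{B_i}$ in Lemma 2.4 to be $(u_{i,j})_{j=0,\ldots,N_i}$ and $u_{A_i} = u_i$. In Definition 3.2,
we defined $Z_{i,j}$ such that for all $w \in \mathbb{R}^{n_{i,j}}$ we have
\[ \left( (I_{n_{i,j}} - P_{i,j}) w \right)^\top D_{i,j} R_{i,j} A_i R_{i,j}^\top D_{i,j} (I_{n_{i,j}} - P_{i,j}) w \le \tau_i \, w^\top R_{i,j} \tilde{A}_{i,j} R_{i,j}^\top w. \]
Hence, in the special case $w = R_{i,j} u_i$, we can write
\[ \left( (I_{n_{i,j}} - P_{i,j}) R_{i,j} u_i \right)^\top D_{i,j} R_{i,j} A_i R_{i,j}^\top D_{i,j} (I_{n_{i,j}} - P_{i,j}) R_{i,j} u_i \le \tau_i (R_{i,j} u_i)^\top R_{i,j} \tilde{A}_{i,j} R_{i,j}^\top (R_{i,j} u_i). \]
Equivalently,
\[ u_{i,j}^\top R_{i,j} A_i R_{i,j}^\top u_{i,j} \le \tau_i (R_{i,j} u_i)^\top R_{i,j} \tilde{A}_{i,j} R_{i,j}^\top (R_{i,j} u_i). \]
Plugging this inequality in (5.2) gives
\[ \sum_{j=0}^{N_i} u_{i,j}^\top R_{i,j} A_i R_{i,j}^\top u_{i,j} \le 2 u_i^\top A_i u_i + (2 k_{i,c} + 1) \tau_i \sum_{j=1}^{N_i} (R_{i,j} u_i)^\top R_{i,j} \tilde{A}_{i,j} R_{i,j}^\top (R_{i,j} u_i). \]
Since $\tilde{A}_{i,j}$ is local, we have
\[ (R_{i,j} u_i)^\top R_{i,j} \tilde{A}_{i,j} R_{i,j}^\top (R_{i,j} u_i) = u_i^\top \tilde{A}_{i,j} u_i, \quad \text{for } j = 1, \ldots, N_i. \]
By using the fact that $\tilde{A}_{i,j}$ is LSPSD of $A_i$ for $j = 1, \ldots, N_i$, we obtain the following:
\[ \sum_{j=0}^{N_i} u_{i,j}^\top R_{i,j} A_i R_{i,j}^\top u_{i,j} \le 2 u_i^\top A_i u_i + (2 k_{i,c} + 1) k_i \tau_i \, u_i^\top A_i u_i. \]
Multiplying both sides by $m_i$ ends the proof, i.e.,
\[ m_i \sum_{j=0}^{N_i} u_{i,j}^\top R_{i,j} A_i R_{i,j}^\top u_{i,j} \le u_i^\top A_i u_i. \]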
In [3], the authors presented the minimal subspace that replaces $Z_{i,j}$ (defined in (3.3)
and used in Theorem 5.2) that is required to prove Theorem 5.2. The main difference
with respect to the subspace that we define in (3.3) is that it is not necessary to include
the entire kernel of the LSPSD matrix, $K_{i,j}$, in $Z_{i,j}$, see Definition 3.2. Nevertheless,
in this work, we include the entire kernel of the LSPSD matrix in the definition of
$Z_{i,j}$. This allows us to ensure that the kernels of Neumann matrices are transferred
across the levels, see Theorem 5.4. In addition, this corresponds to the definition
used in GenEO [12, Lemma 7.7] and to its implementation in the HPDDM library.
Theorem 5.3 provides an upper bound on the condition number of the
preconditioned matrix $M_i^{-1} A_i$ for $i = 1, \ldots, L$.
Theorem 5.3. Let $M_i$ be the additive Schwarz preconditioner at level $i$ combined
with the coarse space correction induced by $S_i$ defined in (3.4). The following inequality
holds:
\[ \kappa(M_i^{-1} A_i) \le (k_{i,c} + 1) \left( 2 + (2 k_{i,c} + 1) k_i \tau_i \right). \]
Proof. Lemma 2.2, Lemma 2.3, and Theorem 5.2 prove that the multilevel
preconditioner verifies the conditions in Lemma 2.1 at each level $i$. Hence, the spectrum of the
preconditioned matrix $M_i^{-1} A_i$ is contained in the interval
$[(2 + (2 k_{i,c} + 1) k_i \tau_i)^{-1}, \, k_{i,c} + 1]$. Equivalently, the condition number of the
preconditioned matrix at level $i$ verifies the following inequality
\[ \kappa(M_i^{-1} A_i) \le (k_{i,c} + 1) \left( 2 + (2 k_{i,c} + 1) k_i \tau_i \right). \]
Proposition 5.1 shows that the constant $k_i$ associated with the LSPSD matrices at level $i$ is independent of the number of levels and bounded by the number of subdomains at level 1. Indeed,
\[ k_1 \ge k_i \quad \text{for } i = 2, \ldots, L. \]
Furthermore, in the case where the LSPSD matrices at the first level are the Neumann matrices, $k_i$ is bounded by the maximum number of subdomains at level 1 that share an unknown.
The constant $k_{i,c}$ for $i = 1, \ldots, L$ is the minimum number of distinct colors so that the subspaces $\{\operatorname{span}\{R_{i,j}^\top\}\}_{1 \le j \le N_i}$ of the same color are mutually $A_i$-orthogonal. Both constants $k_i$ and $k_{i,c}$ are independent of the number of subdomains for each level $i$.
The constant $\tau_i$ can be chosen such that the condition number of the preconditioned system at level $i$ is bounded from above by a prescribed value. This ensures robust convergence of the preconditioned Krylov solver at each level.
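As a small illustration of the last remark, the bound of Theorem 5.3 can be inverted to find the largest $\tau_i$ compatible with a prescribed condition number. The sketch below is only a worked instance of that formula; the function names and the sample constants ($k_{i,c} = 3$, $k_i = 4$, target 100) are hypothetical and not taken from the paper.

```python
def condition_number_bound(k_c: int, k: int, tau: float) -> float:
    """Upper bound (k_c + 1) * (2 + (2*k_c + 1) * k * tau) of Theorem 5.3."""
    return (k_c + 1) * (2.0 + (2 * k_c + 1) * k * tau)


def tau_for_target(k_c: int, k: int, kappa_target: float) -> float:
    """Largest tau for which the bound does not exceed kappa_target."""
    floor = 2.0 * (k_c + 1)  # value of the bound in the limit tau -> 0
    if not kappa_target > floor:
        raise ValueError(f"target must exceed {floor}")
    return (kappa_target / (k_c + 1) - 2.0) / ((2 * k_c + 1) * k)


# Hypothetical constants, chosen only for illustration.
tau = tau_for_target(k_c=3, k=4, kappa_target=100.0)
bound = condition_number_bound(k_c=3, k=4, tau=tau)
print(f"tau = {tau:.4f}, bound = {bound:.1f}")
```

Note that the bound cannot be pushed below $2(k_{i,c} + 1)$, whatever the value of $\tau_i$, which is why the helper rejects smaller targets.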
Algorithm 5.1 presents the construction of the multilevel additive Schwarz method by using GenEO. The algorithm iterates over the levels. At each level, three main operations are performed. First, the LSPSD matrices are constructed: at level 1, they are the Neumann matrices; at coarser levels, Proposition 5.1 is used to compute them. Second, once the LSPSD matrices are available, the generalized eigenvalue problems (3.2) are solved concurrently and, given the prescribed upper bound on the condition number, $Z_{i,j}$ is set. Finally, the coarse space is available and the coarse matrix is assembled.
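The last step, assembling the Galerkin coarse matrix and combining it with the local solves, can be sketched on a toy problem. The example below is a minimal dense NumPy illustration, assuming the usual two-level additive Schwarz form $M^{-1} = V (V^\top A V)^{-1} V^\top + \sum_j R_j^\top (R_j A R_j^\top)^{-1} R_j$. The coarse basis is a piecewise constant stand-in, not the GenEO space of (3.3), and all names (`laplacian_1d`, `restriction`, `two_level_asm_inverse`) are hypothetical.

```python
import numpy as np


def laplacian_1d(n):
    """SPD tridiagonal matrix (1D Laplacian with Dirichlet ends)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def restriction(n, idx):
    """Boolean restriction matrix selecting the unknowns listed in idx."""
    R = np.zeros((len(idx), n))
    R[np.arange(len(idx)), idx] = 1.0
    return R


def two_level_asm_inverse(A, Rs, V):
    """Dense M^{-1}: coarse correction plus the sum of local solves."""
    A_coarse = V.T @ A @ V                        # Galerkin coarse matrix
    M_inv = V @ np.linalg.solve(A_coarse, V.T)    # coarse space correction
    for R in Rs:
        A_loc = R @ A @ R.T                       # local subdomain matrix
        M_inv += R.T @ np.linalg.solve(A_loc, R)  # local additive Schwarz term
    return M_inv, A_coarse


n, n_sub, size, overlap = 40, 4, 10, 2
A1 = laplacian_1d(n)
Rs, cols = [], []
for j in range(n_sub):
    lo, hi = max(0, j * size - overlap), min(n, (j + 1) * size + overlap)
    Rs.append(restriction(n, np.arange(lo, hi)))
    v = np.zeros(n)
    v[j * size:(j + 1) * size] = 1.0   # stand-in coarse vector, NOT a GenEO vector
    cols.append(v)
V1 = np.linalg.qr(np.column_stack(cols))[0]       # orthonormal basis of the coarse space

M1_inv, A2 = two_level_asm_inverse(A1, Rs, V1)

# Spectrum of M^{-1} A through the similar symmetric matrix L^T M^{-1} L, A = L L^T.
L = np.linalg.cholesky(A1)
eigs = np.linalg.eigvalsh(L.T @ M1_inv @ L)
print(f"coarse size: {A2.shape[0]}, spectrum in [{eigs.min():.3f}, {eigs.max():.3f}]")
```

In this layout, subdomains of the same parity do not touch, so two colors suffice and the largest eigenvalue stays below $k_{1,c} + 1 = 3$ regardless of the coarse space. The lower end of the spectrum is what a GenEO coarse space is designed to control, and the stand-in basis used here gives no such guarantee.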
The following theorem describes how the kernels of Neumann matrices are transferred across the levels.
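Theorem 5.4. Suppose that $\tilde{A}_{1,j}$ is the Neumann matrix associated with the $j$th subdomain at level 1 for $j \in \llbracket 1; N_1 \rrbracket$. For $(i, j) \in \llbracket 2; L \rrbracket \times \llbracket 1; N_i \rrbracket$, let
• $\tilde{A}_{i,j}$ be the LSPSD matrices associated with $A_{i,j}$ defined in Proposition 5.1,
• $\mathcal{G}_{i-1,j}$ be the corresponding superdomains,
• $\mathcal{G}^1_{i-1,j}$ be the union of subdomains at level 1 which contribute hierarchically to obtain $\mathcal{G}_{i-1,j}$,
• $\tilde{A}_{\mathcal{G}_{i-1,j}}$ be the Neumann matrix associated with $\mathcal{G}^1_{i-1,j}$ (seeing $\mathcal{G}^1_{i-1,j}$ as a subdomain),
• $A_{\mathcal{G}_{i-1,j}}$ be the restriction of $A$ to the subdomain $\mathcal{G}^1_{i-1,j}$.
Then, the kernel of $\tilde{A}_{\mathcal{G}_{i-1,j}}$ is included in the kernel of $\bigl(\prod_{l=1}^{i-1} V_l\bigr)\, \tilde{A}_{i,j}\, \bigl(\prod_{l=1}^{i-1} V_l\bigr)^\top$.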
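Proof. First, note that for any LSPSD matrix computed as in Proposition 5.1, we have
\[ \tilde{A}_{i,j} = \Bigl(\prod_{l=1}^{i-1} V_l\Bigr)^{\top} \Bigl(\sum_{k \in \mathcal{G}^1_{i-1,j}} \tilde{A}_{1,k}\Bigr) \Bigl(\prod_{l=1}^{i-1} V_l\Bigr). \]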
Algorithm 5.1 Multilevel GenEO
Require: $A_1 = A \in \mathbb{R}^{n \times n}$ SPD, $L + 1$ number of levels, $N_i$ number of subdomains at each level, $\mathcal{G}_{i,j}$ sets of superdomains
Ensure: preconditioner at each level $i$, $M_i^{-1}$, with bounded condition number of $M_i^{-1} A_i$
for $i = 1, \ldots, L$ do
    for each subdomain $j = 1, \ldots, N_i$ do
        $A_{i,j} = R_{i,j} A_i R_{i,j}^\top$ (local matrix associated with subdomain $j$)
        if $i = 1$ then
            local SPSD $\tilde{A}_{i,j}$ is the Neumann matrix of subdomain $j$
        else
            compute local SPSD matrix as $\tilde{A}_{i,j} = \sum_{k \in \mathcal{G}_{i-1,j}} V_{i-1}^\top \tilde{A}_{i-1,k} V_{i-1}$
        end if
        solve the generalized eigenvalue problem (3.2), set $Z_{i,j}$ as in (3.3)
    end for
    $S_i = \bigoplus_{j=1}^{N_i} D_{i,j} R_{i,j}^\top Z_{i,j}$, $V_i$ basis of $S_i$
    coarse matrix $A_{i+1} = V_i^\top A_i V_i$, $A_{i+1} \in \mathbb{R}^{n_{i+1} \times n_{i+1}}$
end for
$M_i^{-1} = V_i A_{i+1}^{-1} V_i^\top + \sum_{j=1}^{N_i} R_{i,j}^\top A_{i,j}^{-1} R_{i,j}$
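Moreover, due to the fact that $\tilde{A}_{\mathcal{G}_{i-1,j}}$ and $\tilde{A}_{1,k}$ are Neumann matrices, we have
\[ u^\top \tilde{A}_{\mathcal{G}_{i-1,j}} u \le u^\top \sum_{k \in \mathcal{G}^1_{i-1,j}} \tilde{A}_{1,k}\, u \le k_1\, u^\top \tilde{A}_{\mathcal{G}_{i-1,j}} u. \]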
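On the one hand, the kernels of $\tilde{A}_{1,k}$ for $k \in \mathcal{G}^1_{i-1,j}$ are included, by construction, in the image of $V_1$, see Definition 3.2. So is their intersection, which is the kernel of $\sum_{k \in \mathcal{G}^1_{i-1,j}} \tilde{A}_{1,k}$. On the other hand, the previous two-sided inequality implies that the kernels of $\tilde{A}_{\mathcal{G}_{i-1,j}}$ and $\sum_{k \in \mathcal{G}^1_{i-1,j}} \tilde{A}_{1,k}$ are identical. Hence, the kernel of $\tilde{A}_{\mathcal{G}_{i-1,j}}$ is included in the image of $QQ^\top$, where $Q = \prod_{l=1}^{i-1} V_l$.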
Theorem 5.4 proves that the kernel of the Neumann matrix of a union of subdomains at level 1 that hierarchically contribute to form a subdomain at level $i$ is preserved by the construction of the hierarchical coarse spaces. For example, in the case of linear elasticity, it is essential to include the rigid body motions in the coarse space in order to obtain fast convergence. Since these are included in the kernel of the Neumann matrix of the subdomain, the hierarchical coarse space includes them as well.
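The kernel argument can be checked numerically on a toy case. The following sketch assumes a 1D Laplacian discretized with P1 elements and two overlapping subdomains that form one superdomain; the helper names and the mesh are hypothetical. It verifies that the sum of the two Neumann matrices is sandwiched between the Neumann matrix of the union and twice that matrix (two being the largest number of subdomains sharing an element here), and that both have the same kernel, the constants.

```python
import numpy as np


def neumann_matrix(n_nodes, elements):
    """Assemble the 1D P1 stiffness matrix over the given elements only,
    without any boundary condition (element e joins nodes e and e + 1)."""
    A = np.zeros((n_nodes, n_nodes))
    for e in elements:
        A[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]])
    return A


def kernel_dim(M, tol=1e-10):
    """Number of eigenvalues of the symmetric matrix M below tol."""
    return int(np.sum(tol > np.linalg.eigvalsh(M)))


n_nodes = 9                                   # nodes 0..8, elements 0..7
A1 = neumann_matrix(n_nodes, range(0, 5))     # subdomain 1: elements 0..4
A2 = neumann_matrix(n_nodes, range(3, 8))     # subdomain 2: elements 3..7
AG = neumann_matrix(n_nodes, range(0, 8))     # Neumann matrix of the union
S = A1 + A2                                   # overlap elements 3, 4 counted twice
multiplicity = 2                              # max number of subdomains per element

# Two-sided inequality AG <= S <= multiplicity * AG in the SPSD ordering.
min_eig_lower = np.linalg.eigvalsh(S - AG).min()
min_eig_upper = np.linalg.eigvalsh(multiplicity * AG - S).min()
print(min_eig_lower >= -1e-12, min_eig_upper >= -1e-12)
print(kernel_dim(AG), kernel_dim(S))
```

For elasticity, the same reasoning applies with the rigid body motions in place of the constants, but this scalar example does not cover that case.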
6. Numerical experiments. In this section, the developed theory is validated numerically with FreeFEM [14] for finite element discretizations and HPDDM [19] for domain decomposition methods. We present numerical experiments on two highly challenging problems illustrating the efficiency and practical usage of the proposed method. For both problems, we use $N_1 = 2{,}048$ MPI processes (equal to the number of subdomains at level 1), and the domain partitioning is performed using ParMETIS [22], with no control on the alignment of subdomain interfaces. We compare the two-level GenEO preconditioner and its multilevel extension by varying $N_2$ between 4 and 256. For the two-level method, $N_2$ corresponds to the number of MPI processes that solve the coarse problem in a distributed fashion using MKL CPARDISO [17]. For the multilevel method, $N_3$ is set to 1, i.e., a three-level method is used. The goal of these numerical experiments is to show that when one switches from a two-level method with an exact coarse solver to our proposed multilevel method, the number of outer iterations is not impacted. Thus, three levels are sufficient. As an outer solver, since all levels but the coarsest are solved approximately, the flexible GMRES [31] is used. It is stopped when relative unpreconditioned residuals are lower than $10^{-6}$. Subdomain matrices $\{A_{i,j}\}_{1 \le i \le 2,\, 1 \le j \le N_i}$ are factorized concurrently using MKL PARDISO, and eigenvalue problems are solved using ARPACK [24]. In both two- and three-level GenEO, we factorize the local matrices $A_{1,j}$ for $j \in \llbracket 1; N_1 \rrbracket$ and solve the generalized eigenvalue problems concurrently at the first level. For this reason, we do not take into account the time needed for these two steps, which are performed without any communication between MPI processes. We compare the time needed to assemble and factorize $A_2$ in the two-level approach against the time needed, in the three-level approach, to assemble $A_2$ and the local SPSD matrices $\tilde{A}_{2,j}$ for $j \in \llbracket 1; N_2 \rrbracket$, solve the generalized eigenvalue problems concurrently on the second level, and assemble and factorize the matrix $A_3$. We also compare the time spent in the outer Krylov solver during the solution phase. Readers interested in a comparison of the efficiency of GenEO and multigrid methods such as GAMG [1] are referred to [18].
FreeFEM scripts used to produce the following results are available at the following
6.1. Diffusion test cases. The scalar diffusion equation with highly heterogeneous coefficient $\kappa$ is solved in $[0,1]^d$ ($d = 2$ or $3$). The strong formulation of the equation is:
\[
\begin{aligned}
-\nabla \cdot (\kappa \nabla u) &= 1 && \text{in } \Omega, \\
u &= 0 && \text{on } \Gamma_D, \\
\frac{\partial u}{\partial n} &= 0 && \text{on } \Gamma_N.
\end{aligned}
\]
The exterior normal vector to the boundary of $\Omega$ is denoted $n$. $\Gamma_D$ is the subset of the boundary of $\Omega$ corresponding to $x = 0$ in 2D and 3D. $\Gamma_N$ is defined as the complement of $\Gamma_D$ with respect to the boundary of $\Omega$. We discretize the equation using $P_2$ and $P_4$ finite elements in the 3D and 2D test cases, respectively. The number of unknowns is $441 \times 10^6$ and $784 \times 10^6$, with approximately 28 and 24 nonzero elements per row in the 3D and 2D cases, respectively. The heterogeneity is due to the jumps in the diffusion coefficient $\kappa$, see Figure 6.1, which is modeled using a combination of jumps and channels, cf. the file coefficients.idp from https:
¹ Note to reviewers: the repository is now public.
Fig. 6.1. Variation of the coefficient $\kappa$ used for the diffusion test case.
The results in two dimensions are reported in Table 6.1. The number of outer iterations for both two- and three-level GenEO is 32. The size of the level 2 operator is $n_2 = 25 \times 2{,}048 = 51{,}200$. In all numerical results, the number of eigenvectors per subdomain, here 25, is fixed. This is because ARPACK cannot a priori compute all eigenpairs below a certain threshold, and an upper bound has to be provided instead. HPDDM is capable of filtering out the eigenpairs whose eigenvalues are above the user-specified GenEO threshold from Lemma 2.5. However, this means that the coarse operator may be unevenly distributed. With a fixed number of eigenvectors per subdomain, it is possible to use highly optimized uniform MPI routines and block matrix formats. Hence, for performance reasons, all eigenvectors computed by ARPACK are kept when building coarse operators. It is striking that the multilevel method does not deteriorate the numerical performance of the outer solver. For the two-level method, the first column corresponds to the time needed to assemble the Galerkin operator $A_2$ from (3.5) (assuming $V_1$ has already been computed by ARPACK), and to factorize it using $N_2$ MPI processes. For the three-level method, the first column corresponds to the time needed to assemble the level 2 local subdomain matrices $\{A_{2,j}\}_{1 \le j \le N_2}$ and the level 2 local SPSD matrices, solve the generalized eigenvalue problem (3.2) concurrently, assemble the Galerkin operator $A_3$, and factorize it on a single process. The size of the level 3 operator is $n_3 = 20 \times N_2$. For both two- and three-level methods, the second column is the time spent in the outer Krylov solver once the preconditioner has been set up. In the "inner it." column of the three-level method, the number of inner iterations for solving systems involving $A_2$, which is not inverted exactly anymore, is reported. For all tables, this column is an average over all successive outer iterations. Another important numerical property of our method is that, thanks to fully controlled bounds at each level, the number of inner iterations is low, independently of the number of superdomains $N_2$. Because this problem is not large enough to be out of reach of a two-level method, for which HPDDM is highly optimized, there is no performance gain to be expected at this scale. However, one can notice that the construction of the coarse operator(s) scales nicely with $N_2$ for the three-level method, whereas the performance of the direct solver MKL CPARDISO quickly stagnates because of the finer and finer parallel workload granularity.
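The operator sizes quoted above follow directly from the fixed number of eigenvectors kept per subdomain. The short sketch below simply redoes this arithmetic for the 2D diffusion case and the values of $N_2$ used in the tables; the variable names are hypothetical.

```python
N1 = 2048          # level 1 subdomains (one per MPI process)
NEV_LEVEL1 = 25    # eigenvectors kept per level 1 subdomain (2D diffusion case)
NEV_LEVEL2 = 20    # eigenvectors kept per level 2 subdomain

n2 = NEV_LEVEL1 * N1                                   # size of A_2
n3_by_N2 = {N2: NEV_LEVEL2 * N2 for N2 in (4, 16, 64, 256)}   # size of A_3

for N2, n3 in n3_by_N2.items():
    print(f"N2 = {N2:3d}: n2 = {n2}, n3 = {n3}")
```

Even for $N_2 = 256$, the level 3 operator has only a few thousand rows, which is consistent with factorizing it on a single process.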
The results in three dimensions are reported in Table 6.2. The number of outer
iterations for both the two- and three-level GenEO is 19. The observations made
in two dimensions still hold, and the dimensions of A2and A3are the same. Once
again, it is important to note that the number of outer iterations is the same for both
Table 6.1. Diffusion 2D test case, comparison between two- and three-level GenEO. The percentage of nonzero entries in $A_1$ is 0.3%.

        |        two-level GenEO         |              three-level GenEO
  N2    | CS (s)   solve (s)   % nnz A2  | CS (s)   solve (s)   inner it.   % nnz A3
  4     |   2.4      11.9                |   6.5      27.4        14         56.0
  16    |   1.8      11.3                |   3.6      15.4        15         19.0
  64    |   1.9      12.1                |   3.0      16.7        14          5.5
  256   |   2.4      18.4                |   2.8      13.9        13          1.4

Table 6.2. Diffusion 3D test case, comparison between two- and three-level GenEO. The percentage of nonzero entries in $A_1$ is 0.5%.

        |        two-level GenEO         |              three-level GenEO
  N2    | CS (s)   solve (s)   % nnz A2  | CS (s)   solve (s)   inner it.   % nnz A3
  4     |   7.0      20.9                |  16.9      43.6        17         62.0
  16    |   5.0      19.8                |   7.7      26.7        17         28.0
  64    |   5.1      20.1                |   5.8      32.7        15          8.9
  256   |   5.2      24.1                |   5.3      22.6        14          2.6
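6.2. Linear elasticity test cases. The system of linear elasticity with highly heterogeneous elastic moduli is solved in 2D and 3D. The strong formulation of the equation is given as:
\[
\begin{aligned}
\operatorname{div} \sigma(u) + f &= 0 && \text{in } \Omega, \\
u &= 0 && \text{on } \Gamma_D, \\
\sigma(u) \cdot n &= 0 && \text{on } \Gamma_N.
\end{aligned}
\tag{6.1}
\]
The physical domain $\Omega$ is a beam of dimensions $[0,10] \times [0,1]$, extruded for $z \in [0,1]$ in 3D. The Cauchy stress tensor $\sigma(\cdot)$ is given by Hooke's law: it can be expressed in terms of Young's modulus $E$ and Poisson's ratio $\nu$,
\[
\sigma_{ij}(u) =
\begin{cases}
2\mu\,\varepsilon_{ij}(u), & i \ne j, \\
2\mu\,\varepsilon_{ii}(u) + \lambda \operatorname{div}(u), & i = j,
\end{cases}
\]
where
\[
\varepsilon_{ij}(u) = \frac{1}{2}\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right), \quad \mu = \frac{E}{2(1 + \nu)}, \quad \text{and} \quad \lambda = \frac{E\nu}{(1 + \nu)(1 - 2\nu)}.
\]
The exterior normal vector to the boundary of $\Omega$ is denoted $n$. $\Gamma_D$ is the subset of the boundary of $\Omega$ corresponding to $x = 0$ in 2D and 3D. $\Gamma_N$ is defined as the complement of $\Gamma_D$ with respect to the boundary of $\Omega$. We discretize (6.1) using the following vectorial finite elements: $(P_2, P_2, P_2)$ in 3D and $(P_3, P_3)$ in 2D. The number of unknowns is $146 \times 10^6$ and $847 \times 10^6$, with approximately 82 and 34 nonzero elements per row in the 3D and 2D cases, respectively. The heterogeneity is due to the jumps in $E$ and $\nu$. We consider discontinuous piecewise constant values for $E$ and $\nu$: $(E_1, \nu_1) = (2 \times 10^{11}, 0.25)$, $(E_2, \nu_2) = (10^7, 0.45)$, see Figure 6.2.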
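To get a feeling for the contrast between the two materials, the Lamé parameters can be evaluated for both pairs $(E, \nu)$. The sketch below uses $\mu = E / (2(1+\nu))$ as given above and assumes the standard isotropic relation $\lambda = E\nu / ((1+\nu)(1-2\nu))$; the function name is hypothetical.

```python
def lame_parameters(E: float, nu: float) -> tuple[float, float]:
    """Return (mu, lambda) for Young's modulus E and Poisson's ratio nu.

    Assumes the standard isotropic (3D or plane strain) relations.
    """
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return mu, lam


mu1, lam1 = lame_parameters(2e11, 0.25)   # stiff material (E1, nu1)
mu2, lam2 = lame_parameters(1e7, 0.45)    # soft material (E2, nu2)
contrast = mu1 / mu2                      # jump in the shear modulus

print(f"mu1 = {mu1:.3e}, lambda1 = {lam1:.3e}")
print(f"mu2 = {mu2:.3e}, lambda2 = {lam2:.3e}")
print(f"shear modulus contrast = {contrast:.0f}")
```

The shear moduli differ by more than four orders of magnitude, and the soft material is also closer to the incompressible limit ($\nu$ near 0.5), which is the kind of heterogeneity the coarse spaces have to absorb.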
Results in two (resp. three) dimensions are reported in Table 6.3 (resp. Table 6.4). The numbers of outer iterations are 73 and 45, respectively. For these test cases, we slightly relaxed the criterion for selecting eigenvectors in coarse spaces, which explains why the iteration counts increase. However, the same observations as for the diffusion test cases still hold. The dimension of the level 2 matrix is $n_2 = 50 \times 2{,}048 = 1.02 \cdot 10^5$, while for the level 3 matrix it is $n_3 = 20 \times N_2$. This means that 50 (resp. 20) eigenvectors are kept per level 1 (resp. level 2) subdomain. We observe that the number of iterations of the inner solver increases slowly when increasing the number of subdomains from 4 to 256 in the 2D case and remains almost constant in the 3D case. In terms of runtime, the two-level GenEO is faster than three-level GenEO for these matrices of medium dimensions.
Fig. 6.2. Variation of the structure coefficients used for the elasticity test case.
Table 6.3. Elasticity 2D test case, comparison between two- and three-level GenEO. The percentage of nonzero entries in $A_1$ is 0.4%.

        |        two-level GenEO         |              three-level GenEO
  N2    | CS (s)   solve (s)   % nnz A2  | CS (s)   solve (s)   inner it.   % nnz A3
  4     |   4.8      52.7                |  22.5     179.3        31         43.0
  16    |   3.9      50.3                |   9.3     124.9        57         17.0
  64    |   4.0      53.1                |   7.2      71.5        34          4.9
  256   |   4.8      63.2                |   6.8      71.2        44          1.4
To show the potential of our method at larger scales, a three-dimensional linear elasticity problem of size $593 \times 10^6$ is now solved on $N_1 = 16{,}384$ processes and $N_2 = 256$ superdomains. With the two-level method, $A_2$ is assembled and factorized in 40.8 seconds. With the three-level method, this step takes 35.1 seconds, see Table 6.5. The iteration counts of the two methods differ by two. Not taking into account the preconditioner setup, the problem is solved in 222.5 seconds in the two-level case and 90.1 seconds in the multilevel case. In this test case, the cost of applying the two-level preconditioner to a given vector is approximately twice the cost of applying the multilevel variant. In this regime, it is clear that there are important gains for the solution phase. At even greater scales, gains for the setup phase are also expected. Moreover, regarding computation time, the generalized eigenvalue problems solved concurrently at the first level to obtain $V_1$ represent a significant part, 78.2 seconds, of the total time of 377.6 seconds (resp. 244.8 seconds) with the two- (resp. three-)level method. This cost can be reduced by taking a larger number of (smaller) subdomains, with the drawback of increasing the size of $V_1$ and thus $A_2$. This drawback represents a clear bottleneck for the two-level method but is alleviated by using the three-level method, making it a good candidate for problems at greater scales.
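The timings quoted for this large test case can be turned into a few ratios. The sketch below only redoes the arithmetic on the reported numbers (in seconds); the dictionary layout and variable names are hypothetical.

```python
# Reported timings in seconds for the 593e6-unknown elasticity problem.
two_level = {"coarse_setup": 40.8, "solve": 222.5, "total": 377.6}
three_level = {"coarse_setup": 35.1, "solve": 90.1, "total": 244.8}
eigensolve_level1 = 78.2   # level 1 generalized eigenproblems, common to both

solve_speedup = two_level["solve"] / three_level["solve"]
total_speedup = two_level["total"] / three_level["total"]
eig_share_two = eigensolve_level1 / two_level["total"]
eig_share_three = eigensolve_level1 / three_level["total"]

print(f"solve phase speedup: {solve_speedup:.2f}")
print(f"total time speedup:  {total_speedup:.2f}")
print(f"level 1 eigensolve share: {eig_share_two:.1%} (two-level), "
      f"{eig_share_three:.1%} (three-level)")
```

Because the level 1 eigensolves cost the same in both variants, their relative weight grows once the solution phase is accelerated, which is what motivates using more, smaller subdomains.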
Table 6.4. Elasticity 3D test case, comparison between two- and three-level GenEO. The percentage of nonzero entries in $A_1$ is 3.3%.

        |        two-level GenEO         |              three-level GenEO
  N2    | CS (s)   solve (s)   % nnz A2  | CS (s)   solve (s)   inner it.   % nnz A3
  4     |  28.5      46.9                |  78.9     296.7        23         43.0
  16    |  17.3      35.4                |  24.5     124.5        23         19.0
  64    |  15.0      33.2                |  15.4      62.2        21          7.9
  256   |  13.6      40.7                |  10.6      50.7        23          2.5

Table 6.5. Elasticity 3D test case, comparison between two- and three-level GenEO.

        |  two-level GenEO     |       three-level GenEO
  N2    | CS (s)   solve (s)   | CS (s)   solve (s)   inner it.
  256   |  40.8     222.5      |  35.1      90.1        11

7. Conclusion. In this paper, we reviewed general properties of overlapping Schwarz preconditioners and presented a framework for their multilevel extension. We generalized the local SPSD splitting presented in [3] to cover a larger set of matrices, leading to more flexibility for building robust coarse spaces. Based on local SPSD matrices on the first level, we presented how to compute local SPSD matrices for coarser levels. The multilevel solver based on hierarchical local SPSD matrices is robust and guarantees a bound on the condition number of the preconditioned matrix at each level depending on predefined values. Numerical experiments illustrate the theory and demonstrate the efficiency of the method on challenging large-scale problems arising from heterogeneous linear elasticity and diffusion problems with jumps of multiple orders of magnitude in the coefficients.
8. Acknowledgments. We would like to thank the anonymous referees for their
comments and remarks that helped us improve the clarity of this manuscript. This
work was granted access to the HPC resources of TGCC@CEA under the allocation
A0050607519 made by GENCI. The work of the second author was supported by the
NLAFET project as part of European Union’s Horizon 2020 research and innovation
program under grant 671633.
REFERENCES
[1] M. F. Adams, H. H. Bayraktar, T. M. Keaveny, and P. Papadopoulos, Ultrascalable Implicit Finite Element Analyses in Solid Mechanics with over a Half a Billion Degrees of Freedom, in Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, SC '04, IEEE Computer Society, 2004.
[2] M. F. Adams and J. W. Demmel, Parallel Multigrid Solver for 3D Unstructured Finite Element Problems, in Proceedings of the 1999 ACM/IEEE Conference on Supercomputing, SC '99, ACM, 1999.
[3] H. Al Daas and L. Grigori, A class of efficient locally constructed preconditioners based on coarse spaces, SIAM Journal on Matrix Analysis and Applications, 40 (2019), pp. 66–91.
[4] S. Badia, A. Martín, and J. Principe, Multilevel balancing domain decomposition at extreme scales, SIAM Journal on Scientific Computing, 38 (2016), pp. C22–C52.
[5] P. E. Bjørstad, M. J. Gander, A. Loneland, and T. Rahman, Does SHEM for Additive Schwarz Work Better than Predicted by Its Condition Number Estimate?, in International Conference on Domain Decomposition Methods, Springer, 2017, pp. 129–137.
[6] A. Borzì, V. De Simone, and D. di Serafino, Parallel algebraic multilevel Schwarz preconditioners for a class of elliptic PDE systems, Computing and Visualization in Science, 16 (2013), pp. 1–14.
[7] M. Brezina, A. Cleary, R. Falgout, V. Henson, J. Jones, T. Manteuffel, S. McCormick, and J. Ruge, Algebraic Multigrid Based on Element Interpolation (AMGe), SIAM Journal on Scientific Computing, 22 (2001), pp. 1570–1592.
[8] X.-C. Cai and M. Sarkis, A restricted additive Schwarz preconditioner for general sparse linear systems, SIAM Journal on Scientific Computing, 21 (1999), pp. 792–797.
[9] T. F. Chan and T. P. Mathew, Domain decomposition algorithms, Acta Numerica, 3 (1994), pp. 61–143.
[10] T. Chartier, R. D. Falgout, V. E. Henson, J. Jones, T. Manteuffel, S. McCormick, J. Ruge, and P. S. Vassilevski, Spectral AMGe (ρAMGe), SIAM Journal on Scientific Computing, 25 (2003), pp. 1–26.
[11] C. Chevalier and F. Pellegrini, PT-SCOTCH: A tool for efficient parallel graph ordering, Parallel Computing, 34 (2008), pp. 318–331. Parallel Matrix Algorithms and Applications.
[12] V. Dolean, P. Jolivet, and F. Nataf, An introduction to domain decomposition methods, Society for Industrial and Applied Mathematics, 2015. Algorithms, theory, and parallel implementation.
[13] M. Griebel and P. Oswald, On the abstract theory of additive and multiplicative Schwarz algorithms, Numerische Mathematik, 70 (1995), pp. 163–180.
[14] F. Hecht, New development in FreeFem++, Journal of Numerical Mathematics, 20 (2012), pp. 251–266.
[15] A. Heinlein, A. Klawonn, O. Rheinbach, and F. Röver, A Three-Level Extension of the GDSW Overlapping Schwarz Preconditioner in Three Dimensions, technical report, Universität zu Köln, November 2018.
[16] V. E. Henson and U. M. Yang, BoomerAMG: A parallel algebraic multigrid solver and preconditioner, Applied Numerical Mathematics, 41 (2002), pp. 155–177. Developments and Trends in Iterative Methods for Large Systems of Equations.
[17] Intel, MKL web page, 2019.
[18] P. Jolivet, Domain decomposition methods. Application to high-performance computing, theses, Université de Grenoble, Oct. 2014.
[19] P. Jolivet, F. Hecht, F. Nataf, and C. Prud'homme, Scalable domain decomposition preconditioners for heterogeneous elliptic problems, in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC13, ACM, 2013.
[20] J. Jones and P. Vassilevski, AMGe Based on Element Agglomeration, SIAM Journal on Scientific Computing, 23 (2001), pp. 109–133.
[21] D. Kalchev, C. Lee, U. Villa, Y. Efendiev, and P. Vassilevski, Upscaling of mixed finite element discretization problems by the spectral AMGe method, SIAM Journal on Scientific Computing, 38 (2016), pp. A2912–A2933.
[22] G. Karypis and V. Kumar, Multilevel k-way partitioning scheme for irregular graphs, Journal of Parallel and Distributed Computing, 48 (1998), pp. 96–129.
[23] F. Kong and X.-C. Cai, A highly scalable multilevel Schwarz method with boundary geometry preserving coarse spaces for 3D elasticity problems on domains with complex geometry, SIAM Journal on Scientific Computing, 38 (2016), pp. C73–C95.
[24] R. Lehoucq, D. Sorensen, and C. Yang, ARPACK users' guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods, vol. 6, Society for Industrial and Applied Mathematics, 1998.
[25] J. Mandel, B. Sousedík, and C. R. Dohrmann, Multispace and multilevel BDDC, Computing, 83 (2008), pp. 55–85.
[26] O. Marques, A. Druinsky, X. S. Li, A. T. Barker, P. Vassilevski, and D. Kalchev, Tuning the coarse space construction in a spectral AMG solver, Procedia Computer Science, 80 (2016), pp. 212–221. International Conference on Computational Science 2016, ICCS 2016, 6-8 June 2016, San Diego, California, USA.
[27] S. V. Nepomnyaschikh, Mesh theorems of traces, normalizations of function traces and their inversions, Russian Journal of Numerical Analysis and Mathematical Modelling, 6 (1991), pp. 1–25.
[28] S. V. Nepomnyaschikh, Decomposition and fictitious domains methods for elliptic boundary value problems,
[29] Y. Notay, An aggregation-based algebraic multigrid method, Electronic Transactions on Numerical Analysis, 37 (2010), pp. 123–146.
[30] C. Pechstein, Finite and boundary element tearing and interconnecting solvers for multiscale problems, vol. 90, Springer Science & Business Media, 2012.
[31] Y. Saad, A Flexible Inner–Outer Preconditioned GMRES Algorithm, SIAM Journal on Scientific Computing, 14 (1993), pp. 461–469.
[32] Y. Saad, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, 2nd ed., 2003.
[33] N. Spillane, V. Dolean, P. Hauret, F. Nataf, C. Pechstein, and R. Scheichl, Abstract robust coarse spaces for systems of PDEs via generalized eigenproblems in the overlaps, Numerische Mathematik, 126 (2014), pp. 741–770.
[34] J. Toivanen, P. Avery, and C. Farhat, A multilevel FETI-DP method and its performance for problems with billions of degrees of freedom, International Journal for Numerical Methods in Engineering, 116 (2018), pp. 661–682.
[35] A. Toselli and O. Widlund, Domain Decomposition Methods - Algorithms and Theory, Springer Series in Computational Mathematics, Springer Berlin Heidelberg, 2005.
[36] J. Xu, Theory of Multilevel Methods, PhD thesis, Cornell University, 1989.
[37] S. Zampini, PCBDDC: A class of robust dual-primal methods in PETSc, SIAM Journal on Scientific Computing, 38 (2016), pp. S282–S306.
... (2) Two-level and multilevel preconditioners: those are usually a combination of a one-level method and a coarse space correction. While the one-level part can bound from above the largest eigenvalue, the coarse space is used to bound from below the smallest eigenvalue such that the condition number of the preconditioned matrix is bounded [2,3,4,12,15,16,18,19,28,29,31,49,42,45]. ...
... Algebraic local SPSD splitting of an SPD matrix. We now recall the definition of an algebraic local SPSD splitting of an SPD matrix given in [2] and generalized in [3]. ...
... Note that if the matrices A ii for i = 1, . . . , N are formed explicitly as in (3.2), we can use the strategy that we proposed in [3] to construct a multilevel preconditioner with the same properties. ...
Full-text available
Domain decomposition (DD) methods are widely used as preconditioner techniques. Their effectiveness relies on the choice of a locally constructed coarse space. Thus far, this construction was mostly achieved using non-assembled matrices from discretized partial differential equations (PDEs). Therefore, DD methods were mainly successful when solving systems stemming from PDEs. In this paper, we present a fully algebraic multilevel DD method where the coarse space can be constructed locally and efficiently without any information besides the coefficient matrix. The condition number of the preconditioned matrix can be bounded by a user-prescribed number. Numerical experiments illustrate the effectiveness of the preconditioner on a range of problems arising from different applications.
... In the last two decades, there has been a great advance in the development of spectral coarse spaces that yield efficient preconditioners. Spectral coarse spaces were initially proposed in the multigrid community for elliptic PDEs with self-adjoint operators [15,19,24,34], and similar ideas were later picked up by the DD community for the same kind of problems [2,3,4,7,33,32,40,48,49]. The past three years have seen several approaches to tackle symmetric indefinite systems and non-self-adjoint problems. ...
... Such preconditioners have been used to solve a large class of systems arising from a range of engineering applications (see, for example, [3,5,29,35,38,47,51] and references therein). We denote by D i ∈ R ni×ni , i ∈ 1, N , any non-negative diagonal matrices such that ...
... First presented and analyzed in [2], local HPSD splitting matrices provide a framework to construct robust two-level Schwarz preconditioners for sparse HPD matrices. Recently, this has led to the introduction of robust multilevel Schwarz preconditioners for finite element SPD matrices [3], sparse normal equations matrices [5], and sparse general SPD matrices [4]. ...
Full-text available
Domain decomposition methods are among the most efficient for solving sparse linear systems of equations. Their effectiveness relies on a judiciously chosen coarse space. Originally introduced and theoretically proved to be efficient for self-adjoint operators, spectral coarse spaces have been proposed in the past few years for indefinite and non-self-adjoint operators. This paper presents a new spectral coarse space that can be constructed in a fully-algebraic way unlike most existing spectral coarse spaces. We present theoretical convergence result for Hermitian positive definite diagonally dominant matrices. Numerical experiments and comparisons against state-of-the-art preconditioners in the multigrid community show that the resulting two-level Schwarz preconditioner is efficient especially for non-self-adjoint operators. Furthermore, in this case, our proposed preconditioner outperforms state-of-the-art preconditioners.
... Thus the GenEO method would not scale to the full Hawk machine. This estimate could be improved by employing parallel direct solvers, but the scalability of these solvers is limited [21,26,3]. ...
... While robustness with respect to coefficient variations is proven and demonstrated numerically, the problem sizes are rather small. A multilevel version of GenEO was proposed in [3] based on the SPSD splitting introduced in [2]. ...
... In this paper, we formulate a natural extension of GenEO [29] to an arbitrary number of levels. In contrast to [3], our method is formulated in a variational framework based on subspace correction [32]. The convergence theory of [29] is generalized to nonconforming discretizations as well as to multiple levels and several different GEVPs. ...
Highly heterogeneous, anisotropic coefficients, e.g. in the simulation of carbon-fibre composite components, can lead to extremely challenging finite element systems. Direct solvers for the resulting large and sparse linear systems suffer from severe memory requirements and limited parallel scalability, while iterative solvers in general lack robustness. Two-level spectral domain decomposition methods can provide such robustness for symmetric positive definite linear systems, by using coarse spaces based on independent generalized eigenproblems in the subdomains. Rigorous condition number bounds are independent of mesh size, number of subdomains, as well as coefficient contrast. However, their parallel scalability is still limited by the fact that (in order to guarantee robustness) the coarse problem is solved via a direct method. In this paper, we introduce a multilevel variant in the context of subspace correction methods and provide a general convergence theory for its robust convergence for abstract, elliptic variational problems. Assumptions of the theory are verified for conforming, as well as for discontinuous Galerkin methods applied to a scalar diffusion problem. Numerical results illustrate the performance of the method for two- and three-dimensional problems and for various discretization schemes, in the context of scalar diffusion and linear elasticity.
... In computational science, preconditioning is widely used to enhance the convergence, which means instead of directly working with a linear system Ax = b, one can consider the preconditioned system (Daas et al., 2019) ...
... Standard domain decomposition preconditioners, such as restricted additive Schwarz, are widely used for parallel implementation of computational models. In a parallel implementation, such preconditioners bring the benefit of relatively low communication costs (Daas et al., 2019). Beside this, the formed linear system of equations in each partition of the mesh was solved using Krylov methods by taking advantage of the highly-efficient preconditioners and iterative solvers of the PETSc library. ...
Full-text available
A combination of reaction-diffusion models with moving-boundary problems yields a system in which the diffusion (spreading and penetration) and reaction (transformation) evolve the system's state and geometry over time. These systems can be used in a wide range of engineering applications. In this study, as an example of such a system, the degradation of metallic materials is investigated. A mathematical model is constructed of the diffusion-reaction processes and the movement of corrosion front of a magnesium block floating in a chemical solution. The corresponding parallelized computational model is implemented using the finite element method, and the weak and strong-scaling behaviors of the model are evaluated to analyze the performance and efficiency of the employed high-performance computing techniques.
... The preconditioned conjugate gradient method (PCG) [1,2,3,4] and the preconditioned generalised minimal residual method (PGMRES) [5,6,7] are amongst the most powerful and versatile approaches to treat such problems. The choice of a suitable preconditioner plays a major role on the convergence and scalability of these solvers and notable examples include the incomplete Choleski factorization [8] and domain decomposition methods [9,10], such as the popular FETI methods [11,12,13] and the additive Schwarz methods [14,15]. In a similar fashion, Algebraic and Geometric Multigrid (AMG, GMG, resp.) ...
Recent advances in the field of machine learning open a new era in high performance computing. Applications of machine learning algorithms for the development of accurate and cost-efficient surrogates of complex problems have already attracted major attention from scientists. Despite their powerful approximation capabilities, however, surrogates cannot produce the `exact' solution to the problem. To address this issue, this paper exploits up-to-date ML tools and delivers customized iterative solvers of linear equation systems, capable of solving large-scale parametrized problems at any desired level of accuracy. Specifically, the proposed approach consists of the following two steps. At first, a reduced set of model evaluations is performed and the corresponding solutions are used to establish an approximate mapping from the problem's parametric space to its solution space using deep feedforward neural networks and convolutional autoencoders. This mapping serves a means to obtain very accurate initial predictions of the system's response to new query points at negligible computational cost. Subsequently, an iterative solver inspired by the Algebraic Multigrid method in combination with Proper Orthogonal Decomposition, termed POD-2G, is developed that successively refines the initial predictions towards the exact system solutions. The application of POD-2G as a standalone solver or as preconditioner in the context of preconditioned conjugate gradient methods is demonstrated on several numerical examples of large scale systems, with the results indicating its superiority over conventional iterative solution schemes.
... In the ideal case, the deflation subspace is the invariant subspace spanned by the eigenvectors associated with the smallest eigenvalues of A, and the convergence rate is then governed by the "effective" spectral condition number associated with the remaining eigenvalues (that is, the ratio of the largest eigenvalue to the smallest remaining eigenvalue). The idea was first introduced in the late 1980s [8,32] and has been discussed and used by a number of researchers [2,3,10,14,22,23,27,31,39,40,44,45]. However, in most of these references, the deflation subspaces rely on the underlying partial differential equation and its discretization and cannot be applied to more general systems or used as "black box" preconditioners. ...
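The "effective" condition number described in the snippet above can be illustrated with a minimal sketch. It assumes the spectrum of A is known explicitly and that the k smallest eigenvalues are deflated exactly; the function names `effective_condition_number` and `cg_iteration_bound` and the sample spectrum are hypothetical, and the iteration count uses the classical conjugate gradient error bound rather than anything specific to the cited work.

```python
import math


def effective_condition_number(eigenvalues, num_deflated):
    """Ratio of the largest eigenvalue to the smallest non-deflated one."""
    lam = sorted(eigenvalues)
    if lam[0] <= 0:
        raise ValueError("eigenvalues of an SPD matrix must be positive")
    if not 0 <= num_deflated < len(lam):
        raise ValueError("num_deflated must leave at least one eigenvalue")
    # After ideal deflation, the num_deflated smallest eigenvalues no longer
    # influence convergence, so the smallest remaining one sets the ratio.
    return lam[-1] / lam[num_deflated]


def cg_iteration_bound(kappa, tol=1e-8):
    """Smallest k with 2 * rho**k <= tol, rho = (sqrt(kappa)-1)/(sqrt(kappa)+1).

    This is the classical worst-case bound on the relative A-norm error of CG.
    """
    if kappa <= 1.0:
        return 1
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return math.ceil(math.log(tol / 2.0) / math.log(rho))


# Hypothetical spectrum with two small outlying eigenvalues.
spectrum = [1e-4, 1e-3, 1.0, 2.0, 10.0]
for k in (0, 2):
    kappa = effective_condition_number(spectrum, k)
    print(f"deflated={k}  kappa_eff={kappa:.3g}  CG bound={cg_iteration_bound(kappa)}")
```

With no deflation the ratio is governed by the smallest eigenvalue; deflating the two outliers leaves only the well-conditioned part of the spectrum, and the worst-case iteration bound drops accordingly.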
... A multilevel extension thus sounds very appealing in theory. While there are tools available for symmetric definite problems [45], they do not trivially translate to the Helmholtz equation. For eigenanalysis, we rely on the Krylov-Schur method [46]. ...
Solving time-harmonic wave propagation problems in the frequency domain and within heterogeneous media brings many mathematical and computational challenges, especially in the high frequency regime. We will focus here on computational challenges and try to identify the best algorithm and numerical strategy for a few well-known benchmark cases arising in applications. The aim is to cover, through numerical experimentation and consideration of the best implementation strategies, the main two-level domain decomposition methods developed in recent years for the Helmholtz equation. The theory for these methods is either out of reach with standard mathematical tools or does not cover all cases of practical interest. More precisely, we will focus on the comparison of three coarse spaces that yield two-level methods: the grid coarse space, DtN coarse space, and GenEO coarse space. We will show that they display different pros and cons, and properties depending on the problem and particular numerical setting.
... When N is large (N ≥ 1,024), the setup and solve times are impacted by the high cost of factorizing and solving the second-level problems, which, as highlighted by the values of n_0, become large. Multilevel variants [4] could be used to overcome this, but that goes beyond the scope of the current study. ...
Solving the normal equations corresponding to large sparse linear least-squares problems is an important and challenging problem. For very large problems, an iterative solver is needed and, in general, a preconditioner is required to achieve good convergence. In recent years, a number of preconditioners have been proposed. These are largely serial and reported results demonstrate that none of the commonly used preconditioners for the normal equations matrix is capable of solving all sparse least-squares problems. Our interest is thus in designing new preconditioners for the normal equations that are efficient, robust, and can be implemented in parallel. Our proposed preconditioners can be constructed efficiently and algebraically without any knowledge of the problem and without any assumption on the least-squares matrix except that it is sparse. We exploit the structure of the symmetric positive definite normal equations matrix and use the concept of algebraic local symmetric positive semi-definite splittings to introduce two-level Schwarz preconditioners for least-squares problems. The condition number of the preconditioned normal equations is shown to be theoretically bounded independently of the number of subdomains in the splitting. This upper bound can be adjusted using a single parameter $\tau$ that the user can specify. We discuss how the new preconditioners can be implemented on top of the PETSc library using only 150 lines of Fortran, C, or Python code. Problems arising from practical applications are used to compare the performance of the proposed new preconditioner with that of other preconditioners.
In this paper we present a class of robust and fully algebraic two-level preconditioners for SPD matrices. We introduce the notion of algebraic local SPSD splitting of an SPD matrix and we give a characterization of this splitting. This splitting makes it possible to construct, algebraically and locally, a class of efficient coarse spaces which bound the spectral condition number of the preconditioned system by a number defined a priori. We also introduce the τ-filtering subspace. This concept helps compare the dimension minimality of coarse spaces. Some PDE-dependent preconditioners correspond to a special case. The examples of the algebraic coarse spaces in this paper are not practical due to expensive construction. We propose a heuristic approximation that is not costly. Numerical experiments illustrate the efficiency of the proposed method.
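For orientation, the generic textbook form of a two-level additive Schwarz preconditioner can be written as below. This is a sketch under assumed notation, not necessarily the exact operator of the abstract above: the restriction matrices $R_i$ to the $N$ overlapping subdomains and the coarse restriction $R_0$ (whose rows span the coarse space) are symbols introduced here for illustration.

```latex
M^{-1} \;=\; R_0^{\top}\left(R_0 A R_0^{\top}\right)^{-1} R_0
\;+\; \sum_{i=1}^{N} R_i^{\top}\left(R_i A R_i^{\top}\right)^{-1} R_i
```

The first term is the coarse correction and the sum collects the local subdomain solves. The abstract's claim concerns how to choose the coarse space so that the spectral condition number of $M^{-1}A$ stays below a bound fixed a priori.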
We propose two multilevel spectral techniques for constructing coarse discretization spaces for saddle-point problems corresponding to PDEs involving divergence constraint, with focus on the mixed finite element discretization of scalar self-adjoint second order elliptic equations on general unstructured grids. We use element agglomeration algebraic multigrid (AMGe) which employs coarse elements that can have nonstandard shape since they are agglomerates of fine-grid elements. The coarse basis associated with each agglomerated coarse element is constructed by solving local eigenvalue problems and local mixed finite element problems. This construction leads to stable upscaled coarse spaces and guarantees the inf-sup compatibility of the upscaled discretization. Also, approximation properties of these upscaling spaces improve by adding more local eigenfunctions to the coarse spaces. The higher accuracy comes at the cost of additional computational effort, as the sparsity of the resulting upscaled coarse discretization (referred to as operator complexity) deteriorates as we introduce additional functions in the coarse space. We also provide an efficient solver for the coarse (upscaled) saddle-point system by employing hybridization, which leads to a s.p.d. reduced system for the Lagrange multipliers, and to solve the latter s.p.d. system, we use our previously developed spectral AMGe solver. Numerical experiments, in both 2D and 3D, are provided to illustrate the efficiency of the proposed upscaling technique.
A multi‐level generalization of the FETI‐DP domain decomposition method is proposed for very‐large‐scale discrete problems in order to address the bottleneck associated with the solution of the coarse problems at such scales. This bottleneck destroys the parallel scalability of the original, two‐level FETI‐DP method when using more than a few thousand processor cores. In the multi‐level formulation proposed here, the FETI‐DP method is applied recursively to solve all coarse problems but the smallest one. Crucially, this recursive application of the method is enabled by utilizing a new primal formulation of the augmentation/enrichment process of the coarse problems. The efficiency and scalability of the proposed approach are demonstrated for up to 32,768 processor cores, and large‐scale real‐world and benchmark problems with more than 21 billion degrees of freedom. The obtained performance results show that the three‐ and four‐level FETI‐DP methods exhibit a better scalability than the original two‐level FETI‐DP method.
A class of preconditioners based on balancing domain decomposition by constraints methods is introduced in the Portable, Extensible Toolkit for Scientific Computation (PETSc). The algorithm and the underlying nonoverlapping domain decomposition framework are described with a specific focus on their current implementation in the library. Available user customizations are also presented, together with an experimental interface to the finite element tearing and interconnecting dual-primal methods within PETSc. Large-scale parallel numerical results are provided for the latest version of the code, which is able to tackle symmetric positive definite problems with highly heterogeneous distributions of the coefficients. Current limitations and future extensions of the preconditioner class are also discussed.
The node-depth encoding is a representation for evolutionary algorithms applied to tree problems. It represents trees by storing the nodes and their depth in a proper ordered list. The original formulation of the node-depth encoding has only mutation operators as the search mechanism. Although the representation has this restriction, it has obtained good results with low convergence. This work therefore proposes a specific recombination operator to improve the convergence of the node-depth encoding representation. These operators are based on recombination for permutation representations. An investigation into the bias and heritability of the proposed recombination operator shows that it has a bias towards stars and low heritability. The performance of node-depth encoding with the proposed operator is investigated for the optimal communication spanning tree problem. The results are presented for benchmark instances in the literature. The use of the recombination operator results in a faster convergence than with only mutation operators.
We consider overlapping Schwarz algorithms for solving linear and nonlinear systems of equations arising from the finite element discretization of elasticity problems on unstructured meshes in three dimensions. The parallel scalability of Schwarz methods is determined almost completely by how the coarse space is constructed. In this paper, we introduce a low cost, boundary geometry preserving coarse mesh that shares the same boundary geometry with the fine mesh but has a better scalability than the coarse mesh obtained by uniformly coarsening the fine mesh. A new coarsening algorithm and a partitioning strategy are developed. We numerically show that a multilevel Schwarz method with the new coarse spaces is highly scalable, in terms of the total compute time, for solving some three-dimensional linear and nonlinear elasticity equations discretized on unstructured meshes with several hundreds of millions of unknowns and on a supercomputer with over 10,000 processor cores.
In this paper we present a fully distributed, communicator-aware, recursive, and interlevel-overlapped message-passing implementation of the multilevel balancing domain decomposition by constraints (MLBDDC) preconditioner. The implementation highly relies on subcommunicators in order to achieve the desired effect of coarse-grain overlapping of computation and communication, and communication and communication among levels in the hierarchy (namely, interlevel overlapping). Essentially, the main communicator is split into as many nonoverlapping subsets of message-passing interface (MPI) tasks (i.e., MPI subcommunicators) as levels in the hierarchy. Provided that specialized resources (cores and memory) are devoted to each level, a careful rescheduling and mapping of all the computations and communications in the algorithm lets a high degree of overlapping be exploited among levels. All subroutines and associated data structures are expressed recursively, and therefore MLBDDC preconditioners with an arbitrary number of levels can be built while re-using significant and recurrent parts of the codes. This approach leads to excellent weak scalability results as soon as level-1 tasks can fully overlap coarser-levels duties. We provide a model to indicate how to choose the number of levels and coarsening ratios between consecutive levels and determine qualitatively the scalability limits for a given choice. We have carried out a comprehensive weak scalability analysis of the proposed implementation for the three-dimensional Laplacian and linear elasticity problems on structured and unstructured meshes. Excellent weak scalability results have been obtained up to 458,752 IBM BG/Q cores and 1.8 million MPI tasks, being the first time that exact domain decomposition preconditioners (only based on sparse direct solvers) reach these scales.