SIAM J. MATRIX ANAL. APPL., Vol. 39, No. 1, pp. 123-147
© 2018 Society for Industrial and Applied Mathematics
A ROBUST MULTILEVEL APPROXIMATE INVERSE PRECONDITIONER FOR SYMMETRIC POSITIVE DEFINITE MATRICES

ANDREA FRANCESCHINI†, VICTOR ANTONIO PALUDETTO MAGRI†, MASSIMILIANO FERRONATO‡, AND CARLO JANNA§
Abstract. The use of factorized sparse approximate inverse (FSAI) preconditioners in a standard multilevel framework for symmetric positive definite (SPD) matrices may pose a number of issues as to the definiteness of the Schur complement at each level. The present work introduces a robust multilevel approach for SPD problems based on FSAI preconditioning, which eliminates the chance of algorithmic breakdowns independently of the preconditioner sparsity. The multilevel FSAI algorithm is further enhanced by introducing descending and ascending low-rank corrections, thus giving rise to the multilevel FSAI with low-rank corrections (MFLR) preconditioner. The proposed algorithm is investigated in a number of test problems. The numerical results show that the MFLR preconditioner is a robust approach that can significantly accelerate the solver convergence rate while preserving a good degree of parallelism. The possibly large set-up cost, mainly due to the computation of the eigenpairs needed by the low-rank corrections, makes its use attractive in applications where the preconditioner can be recycled over a number of linear solves.
Key words. preconditioning, approximate inverses, parallel computing, iterative methods
AMS subject classifications. 65F08, 65F10, 65F50, 65Y05
DOI. 10.1137/16M1109503
1. Introduction. The solution of a large sparse linear system of equations in the form

(1.1)  $Ax = b$,

where $A \in \mathbb{R}^{n \times n}$, $b, x \in \mathbb{R}^n$, and $A$ is symmetric positive definite (SPD), typically requires the use of preconditioned iterative methods based on Krylov subspaces. As is well known, the key for accelerating the solver convergence is the definition of the preconditioner, i.e., the operator that approximates the action of $A^{-1}$.
There are several different approaches for building an efficient preconditioner, ei-
ther physics-based or purely algebraic. Among the algebraic algorithms, the most
popular categories are incomplete factorizations, multigrid methods, and sparse ap-
proximate inverses [2, 11], though different preconditioners can be composed to form
new ones, e.g., blending domain decomposition approaches with incomplete factor-
izations or approximate inverses, or using nested Krylov methods [7, 18, 19, 24].
Combinations of physics-based and purely algebraic approaches are also denoted as
“gray-box” solvers [2, 5]. For example, some physics-based pattern selection strategies
∗Received by the editors December 27, 2016; accepted for publication (in revised form) by D. Orban August 21, 2017; published electronically January 24, 2018.
http://www.siam.org/journals/simax/39-1/M110950.html
Funding: The work of the authors was supported by the ISCRA project "SCAIP: Scalable approximate inverse preconditioners."
†Department ICEA, University of Padova, 35121 Padova, Italy (franc90@dmsa.unipd.it, victor.magri@dicea.unipd.it).
‡M3E s.r.l., 35129 Padova, Italy, and Department ICEA, University of Padova, 35121 Padova, Italy (massimiliano.ferronato@unipd.it).
§Corresponding author. M3E s.r.l., 35129 Padova, Italy, and Department ICEA, University of Padova, 35121 Padova, Italy (carlo.janna@unipd.it).
can be used for algebraic sparse approximate inverse preconditioners [5] and block
variants of multilevel incomplete factorizations [17, 6].
Compared to incomplete factorizations, sparse approximate inverses are generally more robust and more appropriate for parallel computational architectures. In particular, the factorized sparse approximate inverse (FSAI) [21] is an algebraic preconditioner for SPD problems that proves effective in a wide range of applications, especially in its dynamically adaptive variants [16, 20]. This algorithm provides a factorized approximation of $A^{-1}$:

(1.2)  $G^T G \simeq A^{-1}$,

where $G$ is a lower triangular matrix explicitly computed in the set-up phase so as to approach the inverse of the lower Cholesky factor of $A$ in the sense of the Frobenius norm. One of the most attractive features of FSAI for modern computers is its intrinsically high degree of parallelism. In fact, each row of $G$ can be formed fully independently of the others, with the parallelization trivially accomplished by evenly subdividing the rows among threads and/or processes. The degree of parallelism is even redundant, as the number of rows is much larger than the available number of computing cores.
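To make the row-parallel structure concrete, the following minimal NumPy/SciPy sketch computes a static-pattern FSAI; it is an illustration only, not the adaptive algorithm of [16, 20], and the choice of the lower triangle of $A$ as the nonzero pattern is an assumption made here for brevity.

```python
import numpy as np
import scipy.sparse as sp

def fsai_static(A):
    """Static-pattern FSAI sketch: returns a lower triangular G with
    G A G^T ~ I, i.e., G^T G ~ A^{-1}. Every row is computed independently,
    which is the source of FSAI's parallelism."""
    Ad = np.asarray(A.todense() if sp.issparse(A) else A, dtype=float)
    n = Ad.shape[0]
    rows, cols, vals = [], [], []
    for i in range(n):                  # in a parallel code: one thread per row
        # pattern of row i: positions of the lower triangular nonzeros of A
        P = [j for j in range(i + 1) if Ad[i, j] != 0.0 or j == i]
        e = np.zeros(len(P)); e[-1] = 1.0
        g = np.linalg.solve(Ad[np.ix_(P, P)], e)   # small dense SPD system
        g /= np.sqrt(g[-1])                        # so that (G A G^T)_ii = 1
        rows += [i] * len(P); cols += P; vals += list(g)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
```

With the full lower triangular pattern the sketch reproduces the exact inverse Cholesky factor; sparsifying the pattern trades accuracy for a cheaper set-up and application, and, since the rows are mutually independent, the loop can be distributed among threads with no communication.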
The central idea of this work is to introduce some sequentiality in the FSAI computation, in order to use the information extracted from the earlier set-up stages for the remaining rows. This concept, which is the basis for incomplete factorizations, has already been introduced in the context of approximate inverses in [8, 25], and more recently in [12], where both a block tridiagonal and a domain decomposition approach have been used to improve the FSAI performance. In the present paper, we develop a more general multilevel framework that recovers the former approaches simply by selecting a proper unknown ordering.
One of the difficulties arising in the multilevel generalization of the FSAI preconditioner is related to the accuracy in the computation of the Schur complement at each level. If the earlier levels are not well approximated, the resulting Schur complement can be inaccurate, with a consequent degradation of the solver performance. This issue has been recently addressed in the context of multilevel incomplete factorizations with the aid of low-rank corrections [29]. Low-rank compression algorithms are gaining increasing attention, especially in direct linear solution methods, e.g., [1, 14, 28, 30, 31, 32], with the basic idea of taking advantage of data sparsity instead of structural sparsity. During the factorization process, the off-diagonal blocks, which in discretized-PDE problems are usually characterized by low-rank properties, are decomposed by SVD and compressed by neglecting the components corresponding to the smallest singular values. There are several schemes to carry out the compression on the entire matrix, ranging from hierarchical (H-) matrix representations to block low-rank tree-structured compression. Typically, a drop tolerance is set below which the singular values are dropped. If this tolerance is sufficiently small, the resulting factorization is cheaper than an exact one but can still be directly applied to a right-hand side to get the system solution. With large drop tolerances, the low-rank approach gives an approximation of the exact factorization and can be used as a preconditioner, e.g., [1]. To our knowledge, the first attempt to directly use low-rank representations in preconditioning is discussed in [23] in the context of a divide and conquer strategy.
Within the FSAI multilevel framework presented in this work, low-rank correc-
tions are introduced for both enhancing the preconditioner quality at the earlier
levels (descending low-rank, DLR) and improving the accuracy in the Schur com-
plement computation (ascending low-rank, ALR). The paper is organized as follows.
The multilevel FSAI (MF) preconditioner is first derived, proving its robustness. It is then demonstrated that the MF quality can be improved if the approximated Schur complement computed at any level is corrected so as to approximate the Schur complement of the preconditioner rather than that of the original matrix. Low-rank corrections are finally introduced to further improve the MF preconditioner. The properties of this approach are numerically investigated, and a few considerations close the work.

Algorithm 2.1. Multilevel Factorization Set-up.
1.  Function ML_SetUp(nl, A)
2.    Set $A_0 = A$;
3.    for all $l = 0, \dots, n_l - 2$ do
4.      Partition $A_l$ as $\begin{bmatrix} K & B \\ B^T & C \end{bmatrix}$;
5.      Compute $\widetilde{L}$ such that $\widetilde{L}\widetilde{L}^T \simeq K$;
6.      Compute $\widetilde{H}$ such that $\widetilde{H} \simeq L_K^{-1} B$;
7.      Compute $\widetilde{S}$ such that $\widetilde{S} \simeq C - B^T K^{-1} B$;
8.      Form $M_l = \begin{bmatrix} \widetilde{L} & 0 \\ \widetilde{H}^T & I \end{bmatrix}$;
9.      Set $A_{l+1} = \widetilde{S}$;
10.   end for
11.   Compute $M_{n_l-1}$ such that $M_{n_l-1} M_{n_l-1}^T \simeq A_{n_l-1}$;
12.   Set $M = \{M_0, M_1, \dots, M_{n_l-1}\}$;
2. Multilevel FSAI preconditioning. Consider a standard multilevel approach applied to the SPD matrix $A$ for a total number $n_l$ of levels. The matrix $A_l$ obtained at any level $l \in [0, n_l-1]$ is the Schur complement of the previous level. Following for instance [17], we can partition $A_l$ into four blocks and perform the factorization

(2.1)  $A_l = \begin{bmatrix} K & B \\ B^T & C \end{bmatrix} = \begin{bmatrix} L_K & 0 \\ B^T L_K^{-T} & I \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & S \end{bmatrix} \begin{bmatrix} L_K^T & L_K^{-1} B \\ 0 & I \end{bmatrix},$

where $L_K L_K^T$ is the exact Cholesky decomposition of $K$, $K \in \mathbb{R}^{n_1 \times n_1}$, and $S = C - B^T K^{-1} B$, $S \in \mathbb{R}^{n_2 \times n_2}$, is the Schur complement of $A_l$ with respect to the partition $(n_1, n_2)$, i.e., $A_{l+1} \equiv S$. The partitioning of each level can follow some physics-based criterion, if any. As our goal is to get a preconditioner, we can replace each block on the right-hand side of (2.1) with an approximation:

(2.2)  $\widetilde{L} \simeq L_K, \qquad \widetilde{H} \simeq L_K^{-1} B, \qquad \widetilde{S} \simeq S.$

The recursive computation of (2.1) with the approximations (2.2) provides the general framework for a multilevel preconditioner $M$ of $A$, made by the list of factors $M_l$, $l \in [0, n_l-1]$ (Algorithm 2.1). The preconditioner application stage, i.e., the computation of $w = M^{-1} v$ for some known vector $v$, is provided in Algorithm 2.2.

Because of the recursion in the computation of $M$, we can restrict ourselves to the two-level case with no loss of generality. Hence, $A_l$ in (2.1) coincides with $A$ and $n_1 + n_2 = n$. A popular choice is to use an incomplete factorization as the main kernel for the approximations in (2.2), i.e., setting $\widetilde{L}$ equal to the incomplete Cholesky (IC) factor of $K$ as in [26, 22, 17, 4]. In the same framework, the use of FSAI as the main kernel is straightforward.
Algorithm 2.2. Multilevel Factorization Application.
1.  Function w = ML_Apply(nl, M, v)
2.    Set levstart = 1;
3.    for all $l = 0, \dots, n_l - 2$ do
4.      Solve $M_l z = v$;
5.      Set levend = levstart + $n_1$ - 1;
6.      Form $w(\mathrm{levstart} : \mathrm{levend}) = z(1 : n_1)$;
7.      Set $v = z(n_1 + 1 : n)$;
8.      Update levstart = levend + 1;
9.    end for
10.   Solve $(M_{n_l-1} M_{n_l-1}^T)\, z = v$;
11.   Form $w(\mathrm{levstart} : n) = z$;
12.   Set levend = levstart - 1;
13.   for all $l = n_l - 2, \dots, 0$ do
14.     Set levstart = levend - $n_1$ + 1;
15.     Retrieve $w_l = w(\mathrm{levstart} : n)$;
16.     Solve $M_l^T z = w_l$;
17.     Form $w(\mathrm{levstart} : n) = z$;
18.     Update levend = levstart - 1;
19.   end for
After computing $G$ such that $G^T G \simeq K^{-1}$, the approximations in (2.2) become

(2.3)  $\widetilde{L} \simeq G^{-1}, \qquad \widetilde{H} \simeq GB, \qquad \widetilde{S} \simeq C - \widetilde{H}^T \widetilde{H}.$

In contrast to what typically happens using an incomplete factorization as the main kernel, no dropping is necessary to compute $\widetilde{H}$ and $\widetilde{S}$ efficiently, because $G$ is usually very sparse. In fact, recall that the sparsity of $G$ is controlled by the user, while that of $\widetilde{L}^{-1}$ is not.
Unfortunately, this straightforward implementation of the MF preconditioner is very prone to breakdowns. In fact, the Schur complement approximation $\widetilde{S} = C - \widetilde{H}^T \widetilde{H}$ is computed as the difference between two SPD matrices and can be indefinite. Such an experience is quite common even in relatively well-conditioned problems. The reason for this resides in the poor approximation of the leftmost eigenvalues usually obtained by FSAI. Figure 1 compares the eigenspectrum of the bcsstk16 matrix (structural problem, $n = 4{,}884$ with 290,378 nonzero entries) from the University of Florida sparse matrix collection [9] with those of its approximations by IC and FSAI, $\widetilde{L}\widetilde{L}^T$ and $(G^T G)^{-1}$, respectively, where $\widetilde{L}$ and $G$ are computed with the same number of nonzeros. Neither IC nor FSAI is able to capture the smallest eigenvalues of a matrix, though IC is generally better. The smallest eigenvalues of $K$ are the largest of $K^{-1}$, and therefore control the most significant entries of $B^T K^{-1} B$. As a consequence, $\widetilde{H}^T \widetilde{H}$ computed as in (2.3) often fails to approximate accurately the largest entries of $B^T K^{-1} B$, leading to the appearance of negative eigenvalues in $\widetilde{S}$ and causing the breakdown of the procedure. This occurrence can happen using IC as a kernel in a multilevel preconditioning framework as well. To address this issue, some stabilization techniques, e.g., based on diagonal shifts [17, 27], have been successfully introduced. Unfortunately, these strategies do not provide satisfactory results when used for the computation of $\widetilde{S}$ with FSAI as a kernel.
Fig. 1. Comparison between the eigenspectra of $A$, $\widetilde{L}\widetilde{L}^T$, and $(G^T G)^{-1}$ for the bcsstk16 matrix from the University of Florida sparse matrix collection.
The robustness of the MF preconditioner can be ensured by computing $\widetilde{S}$ so as to be SPD independently of $G$. To this aim, we can use the following result.

Theorem 2.1. Let $A \in \mathbb{R}^{n \times n}$ be an SPD $(2 \times 2)$-block matrix,

(2.4)  $A = \begin{bmatrix} K & B \\ B^T & C \end{bmatrix},$

with $K \in \mathbb{R}^{n_1 \times n_1}$, $B \in \mathbb{R}^{n_1 \times n_2}$, and $C \in \mathbb{R}^{n_2 \times n_2}$, and let $V \in \mathbb{R}^{n \times n_2}$ and $D \in \mathbb{R}^{n_2 \times n}$ be the 2-block rectangular matrices

(2.5)  $V = \begin{bmatrix} F^T \\ I \end{bmatrix}, \qquad D = \begin{bmatrix} 0 & Z \end{bmatrix},$

with $F \in \mathbb{R}^{n_2 \times n_1}$ and $Z \in \mathbb{R}^{n_2 \times n_2}$, such that the Frobenius norm $\|D - V^T L\|_F$ is minimum for any $Z$, $L$ being the lower Cholesky factor of $A$. Then, $S = V^T A V$ is the Schur complement of $A$ with respect to the partition $(n_1, n_2)$.

Proof. Recalling (2.1), the lower Cholesky factor of $A$ reads

(2.6)  $L = \begin{bmatrix} L_K & 0 \\ B^T L_K^{-T} & L_S \end{bmatrix},$

where $L_S$ is the lower Cholesky factor of the Schur complement of $A$, i.e., $L_S L_S^T = C - B^T K^{-1} B$. The matrix $(D - V^T L)$ is therefore

(2.7)  $D - V^T L = \begin{bmatrix} -(F L_K + B^T L_K^{-T}) & Z - L_S \end{bmatrix},$

whose Frobenius norm is minimum for any $Z$ if $F L_K + B^T L_K^{-T} = 0$, i.e.,

(2.8)  $F = -B^T K^{-1}.$

The matrix $S = V^T A V$ reads

(2.9)  $S = F K F^T + B^T F^T + F B + C.$

Introducing (2.8) into (2.9) provides $V^T A V = C - B^T K^{-1} B$.
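Theorem 2.1 is easy to check numerically; the following sketch (a random SPD test matrix and dense algebra, all names illustrative) verifies that $V^T A V$ with $F = -B^T K^{-1}$ reproduces the exact Schur complement.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 60, 40
M = rng.standard_normal((n1 + n2, n1 + n2))
A = M @ M.T + (n1 + n2) * np.eye(n1 + n2)     # SPD test matrix
K, B, C = A[:n1, :n1], A[:n1, n1:], A[n1:, n1:]

F = -B.T @ np.linalg.inv(K)                   # the minimizer (2.8)
V = np.vstack([F.T, np.eye(n2)])              # V = [F^T; I] as in (2.5)
S_exact = C - B.T @ np.linalg.solve(K, B)     # Schur complement of A
print(np.allclose(V.T @ A @ V, S_exact))      # True
```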
Remark 2.2. The matrix $F$ of Theorem 2.1 is generally dense. If (2.8) is enforced only for the entries located in a prescribed set of positions $\widetilde{\mathcal{S}} \subseteq \mathcal{S} = \{(i, j) : 1 \le i \le n_2,\ 1 \le j \le n_1\}$, the sparsity of $F$ can be retained at a workable level. This definition of $F$ coincides with the block FSAI preconditioner introduced in [18] and [16], where the nonzero pattern $\widetilde{\mathcal{S}}$ is defined either statically or dynamically during the computation of $F$. Using a sparse $F$ in (2.9) produces an approximation $\widetilde{S}$ of the exact Schur complement $S$ of $A$.

Corollary 2.3. The Schur complement approximation $\widetilde{S}$ computed with (2.9) and a sparse block FSAI $F$ is SPD.

Proof. The expression of $\widetilde{S}$ can be easily rearranged by adding and subtracting $B^T K^{-1} B$:

(2.10)  $\widetilde{S} = C - B^T K^{-1} B + (F + B^T K^{-1}) K (K^{-1} B + F^T) = S + W^T K W.$

The result immediately follows by noting that $W^T K W$ is symmetric positive semidefinite and $S$ is SPD.
Based on these results, the MF preconditioner is built as follows. The zero-level preconditioner $M_0^{-1}$ is made by two factors:

(2.11)  $M_0^{-1} = P_b P_a.$

An explicit approximation of $K^{-1}$ is computed as $G^T G$ using an adaptive FSAI procedure [20] and introduced in $P_a$:

(2.12)  $P_a = \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix}.$

Then, the preconditioned matrix $P_a A P_a^T$ is computed,

(2.13)  $P_a A P_a^T = \begin{bmatrix} G K G^T & G B \\ B^T G^T & C \end{bmatrix},$

and the adaptive block FSAI [16] of $P_a A P_a^T$ is computed for the second factor:

(2.14)  $P_b = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix}.$

The zero-level preconditioned matrix $M_0^{-1} A M_0^{-T}$ reads

(2.15)  $M_0^{-1} A M_0^{-T} = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix} \begin{bmatrix} G K G^T & G B \\ B^T G^T & C \end{bmatrix} \begin{bmatrix} I & F^T \\ 0 & I \end{bmatrix} = \begin{bmatrix} G K G^T & R_F^T \\ R_F & \widetilde{S} \end{bmatrix},$

where $R_F = F(G K G^T) + B^T G^T$ is the residual on $F$, i.e., $R_F$ approaches the null matrix as the accuracy in the computation of $F$ increases. Finally, the (2,2) block of $M_0^{-1} A M_0^{-T}$ is the approximation of the first-level Schur complement

(2.16)  $\widetilde{S} = C + F G B + B^T G^T F^T + F G K G^T F^T$

that becomes the new matrix for the next level. As $\widetilde{S}$ in (2.16) is SPD for any $F$ and $G$ (see Corollary 2.3), no breakdown is possible. The operations required for building the robust MF preconditioner are provided in Algorithm 2.3.
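The contrast between the naive update (2.3) and the robust one (2.16) can be reproduced with a deliberately contrived sketch; the scaled-identity $G$ below is not an actual FSAI, it only mimics an inaccurate kernel, and all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n1 = n2 = 30
B = rng.standard_normal((n1, n2))
K = np.eye(n1)                          # trivial K on purpose: L_K = I
C = B.T @ B + 1e-2 * np.eye(n2)         # so that S = C - B^T K^{-1} B is SPD
G = 2.0 * np.eye(n1)                    # deliberately poor approximation of L_K^{-1}

naive = C - (G @ B).T @ (G @ B)         # (2.3): difference of two SPD matrices

GKGt = G @ K @ G.T
Fe = -B.T @ G.T @ np.linalg.inv(GKGt)   # exact block FSAI minimizer
F = np.where(np.abs(Fe) > 0.3, Fe, 0.0) # sparsified F, as in Remark 2.2
robust = C + F @ G @ B + (F @ G @ B).T + F @ GKGt @ F.T   # (2.16)

print(np.linalg.eigvalsh(naive).min())   # negative: the naive update broke down
print(np.linalg.eigvalsh(robust).min())  # positive: SPD, as Corollary 2.3 states
```

The naive difference of SPD matrices loses definiteness as soon as $G^T G$ overestimates $K^{-1}$ along some direction, while the robust formula equals $S + R_F (G K G^T)^{-1} R_F^T$ and stays SPD no matter how poor $G$ and $F$ are.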
Algorithm 2.3. FSAI-based Multilevel Preconditioner Set-up.
1.  Function MF_SetUp(nl, A)
2.    Set $A_0 = A$;
3.    for all $l = 0, \dots, n_l - 2$ do
4.      Partition $A_l$ as $\begin{bmatrix} K & B \\ B^T & C \end{bmatrix}$;
5.      Compute the adaptive FSAI approximation $G$ of $K$ such that $G^T G \simeq K^{-1}$;
6.      Set $P_a = \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix}$;
7.      Compute $P_b = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix}$, the adaptive block FSAI approximation of $P_a A_l P_a^T$;
8.      Compute $\widetilde{S} = C + F G B + B^T G^T F^T + F G K G^T F^T$;
9.      Form $M_l^{-1} = P_b P_a$;
10.     Set $A_{l+1} = \widetilde{S}$;
11.   end for
12.   Compute $M_{n_l-1}^{-1}$ as the adaptive FSAI approximation of $A_{n_l-1}$;
13.   Set $M^{-1} = \{M_0^{-1}, M_1^{-1}, \dots, M_{n_l-1}^{-1}\}$;
2.1. Theoretical properties. In this section, we obtain some theoretical bounds on the eigenspectrum of the preconditioned matrix according to the different approximations introduced in the MF computation. At every level of the MF preconditioner set-up, the partial factorization of the approximated Schur complement (2.16) is computed. This operation introduces additional approximations level after level, potentially yielding a Schur complement quite different from the exact one.

Consider the zero-level preconditioner $M_0^{-1}$ of (2.11):

(2.17)  $M_0^{-1} = \begin{bmatrix} G & 0 \\ F G & I \end{bmatrix}.$

The natural choice for the next-level preconditioner is

(2.18)  $\widetilde{M}_1^{-1} = \begin{bmatrix} I & 0 \\ 0 & L_{\widetilde{S}}^{-1} \end{bmatrix},$

where $L_{\widetilde{S}} L_{\widetilde{S}}^T = \widetilde{S}$. However, $\widetilde{S}$ is an approximated Schur complement. Should $S$ be available, one could use

(2.19)  $M_1^{-1} = \begin{bmatrix} I & 0 \\ 0 & L_S^{-1} \end{bmatrix}$

with $L_S L_S^T = S$. Using either $\widetilde{M}_1^{-1}$ or $M_1^{-1}$ as the next-level preconditioner of $A$ leads to a different performance. The two propositions that follow provide theoretical upper bounds for the eigenvalues of the preconditioned matrices $\widetilde{M}^{-1} A \widetilde{M}^{-T}$ and $M^{-1} A M^{-T}$, in order to allow for an a priori assessment of the preconditioner quality.
Proposition 2.4. The eigenvalues $\lambda$ of the preconditioned matrix $\widetilde{M}^{-1} A \widetilde{M}^{-T}$, with $\widetilde{M}^{-1} = \widetilde{M}_1^{-1} M_0^{-1}$ as defined in (2.17) and (2.18), satisfy

(2.20)  $|\lambda - 1| \le \frac{\|E_K\| + \sqrt{\|E_K\|^2 + 4\,\|\widetilde{Q}^T\|\,\|\widetilde{Q}\|}}{2},$

where $\widetilde{Q} = G(K G^T F^T + B) L_{\widetilde{S}}^{-T} = R_F^T L_{\widetilde{S}}^{-T}$ and $E_K = G K G^T - I$, for any consistent matrix norm.

Proof. The preconditioned matrix $\widetilde{M}^{-1} A \widetilde{M}^{-T}$ reads

(2.21)  $\widetilde{M}^{-1} A \widetilde{M}^{-T} = \begin{bmatrix} G K G^T & G(K G^T F^T + B) L_{\widetilde{S}}^{-T} \\ L_{\widetilde{S}}^{-1}(F G K + B^T) G^T & I \end{bmatrix} = \begin{bmatrix} I + E_K & \widetilde{Q} \\ \widetilde{Q}^T & I \end{bmatrix}.$

Its eigenpairs $(\lambda, w)$, $w = [u, v]^T$, satisfy by definition the relationship

(2.22)  $\begin{bmatrix} I + E_K & \widetilde{Q} \\ \widetilde{Q}^T & I \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \lambda \begin{bmatrix} u \\ v \end{bmatrix},$

which is equivalent to

(2.23)  $\begin{bmatrix} E_K & \widetilde{Q} \\ \widetilde{Q}^T & 0 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = (\lambda - 1) \begin{bmatrix} u \\ v \end{bmatrix}.$

Taking consistent norms on both sides of the first and second set of equations we have

(2.24)  $\|E_K\|\,\|u\| + \|\widetilde{Q}\|\,\|v\| \ge |\lambda - 1|\,\|u\|, \qquad \|\widetilde{Q}^T\|\,\|u\| \ge |\lambda - 1|\,\|v\|,$

which, by setting $t = \|v\|/\|u\|$, can be rearranged as

(2.25)  $|\lambda - 1| \le \|E_K\| + \|\widetilde{Q}\|\,t, \qquad |\lambda - 1| \le \|\widetilde{Q}^T\|/t.$

If $\|u\| = 0$, then trivially $v \in \operatorname{Ker}(\widetilde{Q})$ and $\lambda = 1$, thus satisfying the inequality (2.20). The right-hand sides of the first and second inequality in (2.25) increase and decrease monotonically with $t$, respectively (Figure 2). The intersection point is

(2.26)  $\bar{t} = \frac{-\|E_K\| + \sqrt{\|E_K\|^2 + 4\,\|\widetilde{Q}^T\|\,\|\widetilde{Q}\|}}{2\,\|\widetilde{Q}\|},$

and thus for any $t$ we have

(2.27)  $|\lambda - 1| \le \frac{\|E_K\| + \sqrt{\|E_K\|^2 + 4\,\|\widetilde{Q}^T\|\,\|\widetilde{Q}\|}}{2}.$
Proposition 2.5. The eigenvalues $\lambda$ of the preconditioned matrix $M^{-1} A M^{-T}$, with $M^{-1} = M_1^{-1} M_0^{-1}$ as defined in (2.11) and (2.19), satisfy

(2.28)  $|\lambda - 1| \le \frac{\|E_K\| + \|Q^T\|\,\|(I + E_K)^{-1}\|\,\|Q\| + \sqrt{\left(\|E_K\| - \|Q^T\|\,\|(I + E_K)^{-1}\|\,\|Q\|\right)^2 + 4\,\|Q^T\|\,\|Q\|}}{2},$

where $Q = G(K G^T F^T + B) L_S^{-T} = R_F^T L_S^{-T}$ and $E_K = G K G^T - I$, for any consistent matrix norm.
Fig. 2. Schematic representation of the system of inequalities (2.25): the curves $\|E_K\| + \|\widetilde{Q}\|\,t$ and $\|\widetilde{Q}^T\|/t$ bounding $|\lambda - 1|$ as functions of $t$.
Proof. Operating as in the proof of Proposition 2.4, it can be shown that the eigenpairs of $M^{-1} A M^{-T}$ satisfy

(2.29)  $\begin{bmatrix} I + E_K & Q \\ Q^T & I + Q^T (I + E_K)^{-1} Q \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \lambda \begin{bmatrix} u \\ v \end{bmatrix},$

which is equivalent to

(2.30)  $\begin{bmatrix} E_K & Q \\ Q^T & Q^T (I + E_K)^{-1} Q \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = (\lambda - 1) \begin{bmatrix} u \\ v \end{bmatrix}.$

Again, taking consistent norms on both sides of the first and second set of equations, and setting $t = \|v\|/\|u\|$, we get

(2.31)  $|\lambda - 1| \le \|E_K\| + \|Q\|\,t, \qquad |\lambda - 1| \le \|Q^T\|/t + \|Q^T\|\,\|(I + E_K)^{-1}\|\,\|Q\|.$

If $\|u\| = 0$, then trivially $v \in \operatorname{Ker}(Q)$ and $\lambda = 1$, thus satisfying the inequality (2.28). Otherwise, denoting by $\bar{t}$ the intersection point between the right-hand sides of (2.31),

(2.32)  $\bar{t} = \frac{-\|E_K\| + \|Q^T\|\,\|(I + E_K)^{-1}\|\,\|Q\| + \sqrt{\left(\|E_K\| - \|Q^T\|\,\|(I + E_K)^{-1}\|\,\|Q\|\right)^2 + 4\,\|Q^T\|\,\|Q\|}}{2\,\|Q\|},$

for any $t$ we have

(2.33)  $|\lambda - 1| \le \frac{\|E_K\| + \|Q^T\|\,\|(I + E_K)^{-1}\|\,\|Q\| + \sqrt{\left(\|E_K\| - \|Q^T\|\,\|(I + E_K)^{-1}\|\,\|Q\|\right)^2 + 4\,\|Q^T\|\,\|Q\|}}{2}.$
Now, to understand which preconditioner, either $\widetilde{M}^{-1}$ or $M^{-1}$, is expected to ensure faster convergence, we compare the upper bounds provided in Propositions 2.4 and 2.5. These bounds depend on the norms of $E_K$ and $Q$ or $\widetilde{Q}$, which in turn are controlled by the accuracy in the computation of $G$ and $F$, respectively. As $G^T G$ approaches $K^{-1}$, $\|E_K\| \to 0$, and in the limit the bounds (2.20) and (2.28) read

(2.34)  $|\lambda - 1| \le \sqrt{\|\widetilde{Q}\|\,\|\widetilde{Q}^T\|}$

and

(2.35)  $|\lambda - 1| \le \frac{1}{2}\left(\|Q\|\,\|Q^T\| + \sqrt{\|Q\|\,\|Q^T\|\,\left(4 + \|Q\|\,\|Q^T\|\right)}\right),$

respectively. Similarly, as $F$ approaches $-B^T K^{-1} G^{-1}$, $\|Q\|, \|\widetilde{Q}\| \to 0$ and the bounds (2.20) and (2.28) trivially provide

(2.36)  $|\lambda - 1| \le \|E_K\|.$

The previous relationships hold true for any consistent matrix norm. However, to compare the bounds more easily, we restrict our attention to the matrix norm induced by the 2-norm of vectors. In this case we have

(2.37)  $\|E_K\|_2 = \lambda_1(E_K) = \epsilon_1, \quad \|Q\|_2 = \sqrt{\lambda_1(Q^T Q)} = \eta_1, \quad \|\widetilde{Q}\|_2 = \sqrt{\lambda_1(\widetilde{Q}^T \widetilde{Q})} = \widetilde{\eta}_1, \quad \|(I + E_K)^{-1}\|_2 = \lambda_n^{-1}(G K G^T) = \kappa_n^{-1},$

and the bounds (2.20) and (2.28), respectively, become

(2.38)  $|\lambda - 1| \le \frac{\epsilon_1 + \sqrt{\epsilon_1^2 + 4\,\widetilde{\eta}_1^2}}{2},$

(2.39)  $|\lambda - 1| \le \frac{\epsilon_1 + \eta_1^2 \kappa_n^{-1} + \sqrt{\left(\epsilon_1 - \eta_1^2 \kappa_n^{-1}\right)^2 + 4\,\eta_1^2}}{2}.$

Remark 2.6. When $\epsilon_1 = 0$, it is easy to prove that the maximum and minimum eigenvalues of $\widetilde{M}^{-1} A \widetilde{M}^{-T}$ are $1 + \widetilde{\eta}_1$ and $1 - \widetilde{\eta}_1$, respectively, and the maximum and minimum eigenvalues of $M^{-1} A M^{-T}$ are $1 + (\eta_1^2 + \sqrt{\eta_1^4 + 4\eta_1^2})/2$ and $1 + (\eta_1^2 - \sqrt{\eta_1^4 + 4\eta_1^2})/2$, respectively.
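Remark 2.6 can be verified numerically by taking $G$ equal to the exact inverse Cholesky factor of $K$, so that $E_K = 0$ holds to machine precision; the sketch below uses dense NumPy and a deliberately poor $F$ (here simply the null matrix), with all names illustrative.

```python
import numpy as np
from scipy.linalg import cholesky

rng = np.random.default_rng(2)
n1, n2 = 50, 30
M = rng.standard_normal((n1 + n2, n1 + n2))
A = M @ M.T + (n1 + n2) * np.eye(n1 + n2)     # SPD test matrix
K, B, C = A[:n1, :n1], A[:n1, n1:], A[n1:, n1:]

G = np.linalg.inv(cholesky(K, lower=True))    # exact factor: G K G^T = I, E_K = 0
F = np.zeros((n2, n1))                        # a deliberately poor block FSAI
S_t = C + F @ G @ B + (F @ G @ B).T + F @ (G @ K @ G.T) @ F.T   # (2.16)
L = cholesky(S_t, lower=True)

# Mt = blkdiag(I, L^{-1}) @ [G 0; FG I], i.e., (2.18) times (2.17)
M0 = np.block([[G, np.zeros((n1, n2))], [F @ G, np.eye(n2)]])
Mt = np.block([[np.eye(n1), np.zeros((n1, n2))],
               [np.zeros((n2, n1)), np.linalg.inv(L)]]) @ M0
lam = np.linalg.eigvalsh(Mt @ A @ Mt.T)

Qt = (G @ (K @ G.T @ F.T + B)) @ np.linalg.inv(L).T   # Q~ of Proposition 2.4
eta = np.linalg.svd(Qt, compute_uv=False)[0]          # eta~_1 = ||Q~||_2
print(lam.min(), 1.0 - eta)    # the two pairs coincide, as Remark 2.6 states
print(lam.max(), 1.0 + eta)
```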
The following results suggest the use of $\widetilde{M}^{-1}$ instead of $M^{-1}$ as the MF preconditioner of $A$.

Theorem 2.7. For any choice of $G$ and $F$ in (2.17), the bound (2.38) is narrower than or equal to the bound (2.39).

Proof. Using the arguments of Corollary 2.3, it follows that $\widetilde{S} = S + \widetilde{H}$, with $\widetilde{H}$ a symmetric positive semidefinite matrix; hence $\|S \widetilde{S}^{-1}\|_2 \le 1$. In particular,

(2.40)  $\widetilde{S} = C + F G K G^T F^T + F G B + B^T G^T F^T = C - B^T K^{-1} B + (F + B^T K^{-1} G^{-1})(G K G^T)(F + B^T K^{-1} G^{-1})^T = S + R_F (G K G^T)^{-1} R_F^T.$

Moreover, the matrix $\widetilde{Q}^T \widetilde{Q}$ is similar to $L_S^T \widetilde{S}^{-1} L_S\, Q^T Q$. In fact, recalling that $Q = R_F^T L_S^{-T}$ and $\widetilde{Q} = R_F^T L_{\widetilde{S}}^{-T}$, we obtain

(2.41)  $R_F R_F^T = L_S\, Q^T Q\, L_S^T = L_{\widetilde{S}}\, \widetilde{Q}^T \widetilde{Q}\, L_{\widetilde{S}}^T,$

from which the similarity follows. Hence

(2.42)  $\|\widetilde{Q}^T \widetilde{Q}\|_2 = \|L_S^T \widetilde{S}^{-1} L_S\, Q^T Q\|_2 \le \|L_S^T \widetilde{S}^{-1} L_S\|_2\, \|Q^T Q\|_2 \le \|Q^T Q\|_2$

because $L_S^T \widetilde{S}^{-1} L_S$ is similar to $S \widetilde{S}^{-1}$. As a consequence, $\widetilde{\eta}_1 = \alpha \eta_1$ for some $\alpha \le 1$. The thesis of the theorem reads

(2.43)  $\frac{\epsilon_1 + \sqrt{\epsilon_1^2 + 4\,\widetilde{\eta}_1^2}}{2} \le \frac{\epsilon_1 + \eta_1^2 \kappa_n^{-1} + \sqrt{\left(\epsilon_1 - \eta_1^2 \kappa_n^{-1}\right)^2 + 4\,\eta_1^2}}{2}.$

Introducing $\widetilde{\eta}_1 = \alpha \eta_1$ in (2.43), after some algebra we obtain

(2.44)  $\alpha^2 \le 1 + \kappa_n^{-1}\,\frac{\sqrt{\left(\epsilon_1 - \eta_1^2 \kappa_n^{-1}\right)^2 + 4\,\eta_1^2} - \left(\epsilon_1 - \eta_1^2 \kappa_n^{-1}\right)}{2},$

which holds true for any $F$ and $G$, since the right-hand side is never smaller than 1, while $\alpha \le 1$.
Theorem 2.7 suggests that the use of $\widetilde{S}$ in the MF preconditioner is likely to be more appropriate than the exact Schur complement $S$. In particular, in the theoretical case of $G^T G = K^{-1}$ it is possible to compute explicitly the ratio between the conditioning numbers of the preconditioned matrices $M^{-1} A M^{-T}$ and $\widetilde{M}^{-1} A \widetilde{M}^{-T}$ as follows.

Theorem 2.8. If $E_K = 0$, the ratio between the conditioning numbers of the preconditioned matrices (2.29) and (2.21) is

(2.45)  $r(\eta_1) = \frac{\left(1 + \dfrac{\eta_1^2 + \sqrt{\eta_1^4 + 4\eta_1^2}}{2}\right)^2}{1 + 2\eta_1^2 + 2\sqrt{\eta_1^4 + \eta_1^2}}.$

Proof. If $E_K = 0$, it can be easily verified from (2.40) that $\widetilde{S} = S + R_F R_F^T$ and

(2.46)  $L_S^{-1} \widetilde{S} L_S^{-T} = I + L_S^{-1} R_F R_F^T L_S^{-T} = I + Q^T Q.$

Recalling from the proof of Theorem 2.7 that $\widetilde{Q}^T \widetilde{Q}$ is similar to $L_S^T \widetilde{S}^{-1} L_S\, Q^T Q$, we have

(2.47)  $\|\widetilde{Q}^T \widetilde{Q}\|_2 = \|(I + Q^T Q)^{-1} Q^T Q\|_2.$

Trivially, $(I + Q^T Q)^{-1} Q^T Q$ is symmetric positive definite and has the same eigenvectors as $Q^T Q$. Denoting by $\lambda$ an eigenvalue of $Q^T Q$, the norm $\|(I + Q^T Q)^{-1} Q^T Q\|_2$ is the maximum of the function

(2.48)  $f(\lambda) = \frac{\lambda}{1 + \lambda}.$

As $f(\lambda)$ monotonically increases with $\lambda$, its maximum value is attained for the largest eigenvalue of $Q^T Q$, i.e., $\|Q^T Q\|_2 = \eta_1^2$. Hence

(2.49)  $\|\widetilde{Q}^T \widetilde{Q}\|_2 = \frac{\|Q^T Q\|_2}{1 + \|Q^T Q\|_2} \;\Rightarrow\; \widetilde{\eta}_1 = \sqrt{\frac{\eta_1^2}{1 + \eta_1^2}}.$

The proof is completed by introducing (2.49) in the results of Remark 2.6.

Remark 2.9. The ratio $r(\eta_1)$ monotonically increases with $\eta_1$ and takes value 1 for $\eta_1 = 0$, i.e., $F = -B^T K^{-1} G^{-1}$ and $\widetilde{S} = S$. For $\eta_1 \to \infty$, $r$ monotonically diverges to infinity as the second power of $\eta_1$. Hence, the sparser or more inaccurate $F$, the more important it is to use $\widetilde{S}$ instead of $S$ in the MF preconditioner.
3. Improving the MF performance with low-rank corrections. The framework developed in section 2 provides the formulation for a robust multilevel FSAI preconditioner that can be computed in a stable way for any choice of the fill-in degree used for the basic kernels. Using low fill-in degrees obviously keeps the computational cost for the preconditioner set-up and application under control, but may yield a poor convergence. The quality of the MF preconditioner can be improved by using low-rank corrections. The idea of using low-rank corrections in a multilevel framework has already been introduced in [29], giving rise to the robust multilevel Schur complement-based low-rank (MSLR) preconditioner. The basic concept can be briefly recalled as follows. Define the matrix

(3.1)  $Y = L_C^{-1} B^T K^{-1} B L_C^{-T} = L_C^{-1} (C - S) L_C^{-T},$

where $L_C$ is the exact lower factor of $C$, i.e., $C = L_C L_C^T$. It is easily recognized that the eigenvalues $\sigma_i$ of $Y$ are such that

(3.2)  $0 \le \sigma_{n_2} \le \dots \le \sigma_1 < 1.$

The separation of the eigenvalues $\theta_i$ of $X = L_C^T (S^{-1} - C^{-1}) L_C$ is larger than that of $Y$, because

(3.3)  $\theta_i = \frac{\sigma_i}{1 - \sigma_i}, \quad i = 1, \dots, n_2, \qquad \theta_i - \theta_{i+1} = \frac{\sigma_i - \sigma_{i+1}}{(1 - \sigma_i)(1 - \sigma_{i+1})}, \quad i = 1, \dots, n_2 - 1.$

The separation of the eigenvalues of $L_C^{-T} X L_C^{-1} = S^{-1} - C^{-1}$ has a stronger impact on the performance of the MSLR preconditioner [29]; however, studying $X$ is easier, and the main results for $X$ are to some extent still valid for $S^{-1} - C^{-1}$. Equation (3.3) suggests that approximating $(S^{-1} - C^{-1})$ with a low-rank matrix is easier than $(S - C)$ because of the faster eigenvalue decay. A better approximation of $S^{-1}$ can be computed as

(3.4)  $S^{-1} \simeq C^{-1} + W_k \Theta_k W_k^T$

with $W_k \Theta_k W_k^T$ a rank-$k$ approximation of $L_C^{-T} X L_C^{-1}$, which can be obtained from the eigendecomposition of $Y$. In fact, by retaining the $k$ largest eigenvalues and corresponding eigenvectors of $Y$ we can write

(3.5)  $Y \simeq U_k \Sigma_k U_k^T.$

Noting that

(3.6)  $S^{-1} - C^{-1} = L_C^{-T} \left[(I - Y)^{-1} - I\right] L_C^{-1} = L_C^{-T} \left[Y (I - Y)^{-1}\right] L_C^{-1},$

the rank-$k$ correction to $S^{-1} - C^{-1}$ is found by setting

(3.7)  $\Theta_k = \Sigma_k (I - \Sigma_k)^{-1}$

and $W_k = L_C^{-T} U_k$.
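In dense form, the construction (3.4)-(3.7) takes only a few lines; in the sketch below numpy.linalg.eigh stands in for the Lanczos eigensolver that would be used in practice, and the sizes are illustrative.

```python
import numpy as np
from scipy.linalg import cholesky

rng = np.random.default_rng(3)
n1, n2, k = 80, 40, 10
M = rng.standard_normal((n1 + n2, n1 + n2))
A = M @ M.T + (n1 + n2) * np.eye(n1 + n2)
K, B, C = A[:n1, :n1], A[:n1, n1:], A[n1:, n1:]
S = C - B.T @ np.linalg.solve(K, B)              # exact Schur complement

Lc = cholesky(C, lower=True)
Y = np.linalg.solve(Lc, np.linalg.solve(Lc, C - S).T).T   # (3.1)
sig, U = np.linalg.eigh(Y)                       # eigenvalues in [0, 1), ascending
Uk, sk = U[:, -k:], sig[-k:]                     # k largest eigenpairs, (3.5)
Theta = np.diag(sk / (1.0 - sk))                 # (3.7)
Wk = np.linalg.solve(Lc.T, Uk)                   # W_k = L_C^{-T} U_k

S_inv = np.linalg.inv(C) + Wk @ Theta @ Wk.T     # corrected inverse, (3.4)
print(np.linalg.norm(S_inv - np.linalg.inv(S)))  # error shrinks as k grows
```

With $k = n_2$ the correction is exact; the point of (3.3) is that the $\theta_i$ decay fast enough that a small $k$ already captures most of $S^{-1} - C^{-1}$.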
In the original formulation outlined above, the low-rank corrections are used to make the action of $C^{-1}$ closer to that of $S^{-1}$. By distinction, we use low-rank corrections to improve the action of $\widetilde{S}^{-1}$. A consequence of Corollary 2.3 is that the eigenvalues $\sigma_i$ of the matrix

(3.8)  $Y = L_{\widetilde{S}}^{-1} (\widetilde{S} - S) L_{\widetilde{S}}^{-T}$

satisfy the condition (3.2). Thus, following the procedure outlined above for $C$, we can compute the $k$ largest eigenpairs of $Y$,

(3.9)  $Y \simeq U_k \Sigma_k U_k^T,$

and get the expression of the corrected Schur complement inverse:

(3.10)  $S^{-1} \simeq \widetilde{S}^{-1} + W_k \Theta_k W_k^T,$

where $\Theta_k = \Sigma_k (I - \Sigma_k)^{-1}$ and $W_k = L_{\widetilde{S}}^{-T} U_k$. However, the idea of correcting the application of $\widetilde{S}^{-1}$ so as to better resemble that of $S^{-1}$ is not that good. First, the computation of (3.9) may be quite expensive, since every multiplication by $S$ requires the solution of a linear system with $K$. Second, according to Theorems 2.7 and 2.8, the use of $S^{-1}$ in our multilevel framework is not optimal.
The low-rank corrections can be more effectively implemented in another way. Since we are working in a multilevel framework, $\widetilde{S}^{-1}$ will not be used exactly. Rather, a new approximation, say $\widehat{S}^{-1} \simeq \widetilde{S}^{-1}$, will be computed. Thus, $\widehat{S}$ will be the new target of the low-rank correction. Moreover, as shown by (2.20), reducing $\|E_K\|$ is also useful for improving the convergence, and this task too can be performed by a low-rank correction. Therefore, we use two low-rank correction techniques for the preconditioner set-up:
• Descending low-rank (DLR) corrections: computed at each level, from the first to the last, to reduce $\|E_K\|$;
• Ascending low-rank (ALR) corrections: computed at each level, from the last to the first, to reduce the gap between $\widehat{S}$ and $\widetilde{S}$.
3.1. Descending low-rank corrections. The aim of this correction is to enhance the approximation of the inverse of $K$. We can define the matrix

(3.11)  $Y = G \left[(G^T G)^{-1} - K\right] G^T = I - G K G^T,$

obtained from (3.8), where $(G^T G)^{-1}$ and $K$ replace $\widetilde{S}$ and $S$, respectively, and compute its rank-$k$ approximation:

(3.12)  $Y \simeq U_k \Sigma_k U_k^T.$

Note that the computation of $U_k$ and $\Sigma_k$ is less expensive than in (3.9), because both $G$ and $K$ are explicitly known. The enhanced preconditioner for $K$ reads

(3.13)  $K^{-1} \simeq G^T G + W_k \Theta_k W_k^T.$

The eigenvalues of $Y$ are bounded from above by 1, as $G K G^T$ is positive definite, but there is no lower bound in this case. Actually, this is not a problem, as we are mainly interested in the computation of the eigenvalues $\sigma_i$ of $Y$ closest to 1. From the implementation point of view, it is better to dispose of a symmetrically split operator. Hence, we define

(3.14)  $\widetilde{G} = (I + U_k \Psi_k U_k^T)\, G$

in order to have

(3.15)  $\widetilde{G}^T \widetilde{G} = G^T (I + U_k \Psi_k U_k^T)(I + U_k \Psi_k U_k^T)\, G = G^T G + W_k \Theta_k W_k^T.$

Recalling that $W_k = G^T U_k$, the diagonal $k \times k$ matrix $\Psi_k$ is simply found by solving

(3.16)  $I + 2 U_k \Psi_k U_k^T + U_k \Psi_k^2 U_k^T = I + U_k \Theta_k U_k^T.$

Using (3.3), the entries of $\Psi_k$ read

(3.17)  $\psi_i = -1 + \sqrt{\frac{1}{1 - \sigma_i}}, \quad i = 1, \dots, k.$

Since $G K G^T$ is positive definite, $\sigma_i < 1$ and $\psi_i$ is real for any $i$.
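A compact sketch of the split update (3.11)-(3.17), with numpy.linalg.eigh again standing in for a Lanczos solver restricted to the $k$ largest eigenpairs:

```python
import numpy as np

def dlr_correction(G, K, k):
    """Sketch of the descending low-rank correction (3.11)-(3.17):
    returns G~ = (I + U_k Psi_k U_k^T) G, so that G~^T G~ equals
    G^T G + W_k Theta_k W_k^T with W_k = G^T U_k."""
    Y = np.eye(K.shape[0]) - G @ K @ G.T       # (3.11); sigma_i < 1 for SPD GKG^T
    sig, U = np.linalg.eigh(Y)                 # ascending eigenvalues
    Uk, sk = U[:, -k:], sig[-k:]               # the k eigenvalues closest to 1
    psi = -1.0 + 1.0 / np.sqrt(1.0 - sk)       # (3.17)
    return G + Uk @ np.diag(psi) @ (Uk.T @ G)  # (I + U_k Psi_k U_k^T) G
```

A direct check of (3.15): with $W_k = G^T U_k$ and $\Theta_k = \Sigma_k (I - \Sigma_k)^{-1}$, the returned $\widetilde{G}$ satisfies $\widetilde{G}^T \widetilde{G} = G^T G + W_k \Theta_k W_k^T$ up to round-off.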
This correction can significantly reduce $\|E_K\|$. However, the update in $\widetilde{G}$ propagates to the other blocks of the preconditioned matrix, potentially shattering the overall procedure efficiency:

(3.18)  $P_a A P_a^T = \begin{bmatrix} \widetilde{G} K \widetilde{G}^T & \widetilde{G} B \\ B^T \widetilde{G}^T & C \end{bmatrix} = \begin{bmatrix} G K G^T & G B \\ B^T G^T & C \end{bmatrix} + \begin{bmatrix} U_k \Psi_k U_k^T G K G^T + G K G^T U_k \Psi_k U_k^T + U_k \Psi_k U_k^T G K G^T U_k \Psi_k U_k^T & U_k \Psi_k U_k^T G B \\ B^T G^T U_k \Psi_k U_k^T & 0 \end{bmatrix}.$

Actually, this is not the case. In fact, the block FSAI $\widetilde{F}$ computed as the (2,1) block of $P_b$ when using $\widetilde{G}$ is the approximate solution of the multiple right-hand-side system

(3.19)  $\widetilde{F}^T \simeq -(\widetilde{G} K \widetilde{G}^T)^{-1} \widetilde{G} B = -\widetilde{G}^{-T} K^{-1} B.$

By expanding (3.19) with the definition of $\widetilde{G}$, we note that $\widetilde{F}$ can be easily found as

(3.20)  $\widetilde{F} = F (I + U_k \Psi_k U_k^T)^{-1},$

where $F$ is the standard block FSAI computed using $G$. Moreover, from (3.20) and (3.14) we notice that $F G = \widetilde{F} \widetilde{G}$. This implies that the approximate Schur complement (2.16) is not affected by the use of $\widetilde{F}$ and $\widetilde{G}$:

(3.21)  $\widetilde{S} = C + \widetilde{F} \widetilde{G} B + B^T \widetilde{G}^T \widetilde{F}^T + \widetilde{F} \widetilde{G} K \widetilde{G}^T \widetilde{F}^T = C + F G B + B^T G^T F^T + F G K G^T F^T.$

As a consequence, descending low-rank corrections on $G$ have only a local impact, with no changes for the following levels.
3.2. Ascending low-rank corrections. We define the matrix $\widehat{Y}$ and compute its rank-$k$ approximation:

(3.22)  $\widehat{Y} = I - \widehat{G} \widetilde{S} \widehat{G}^T \simeq \widehat{U}_k \widehat{\Sigma}_k \widehat{U}_k^T,$

where $\widehat{G}$ is the lower inverse factor of $\widehat{S}$, i.e., $(\widehat{G}^T \widehat{G})^{-1} = \widehat{S}$, which is explicitly available from the approximation of the lower levels. The computation of (3.22) is relatively cheap, because the explicit expression of every matrix is known. The new approximation to $\widetilde{S}^{-1}$ is given by

(3.23)  $\widetilde{S}^{-1} \simeq \widehat{G}^T \widehat{G} + \widehat{W}_k \widehat{\Theta}_k \widehat{W}_k^T,$

where $\widehat{W}_k = \widehat{G}^T \widehat{U}_k$ and $\widehat{\Theta}_k = \widehat{\Sigma}_k (I - \widehat{\Sigma}_k)^{-1}$. Notice that during the set-up we use a split update, as done for the descending low-rank corrections, because it is operatively necessary to compute $\widehat{G} \widetilde{S} \widehat{G}^T$. However, during the preconditioner application, the use of (3.23) is more efficient, as it only requires one update.
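In the application phase, (3.23) amounts to one factorized product plus a thin rank-$k$ update; a minimal sketch follows, where the argument names are hypothetical and $\widehat{\Theta}_k$ is stored as a vector.

```python
import numpy as np

def alr_apply(G_hat, W_hat, theta_hat, r):
    """Sketch of the corrected application (3.23): returns
    (G^T G + W Theta W^T) r with Theta stored as the vector theta_hat."""
    z = G_hat.T @ (G_hat @ r)                   # factorized part
    z += W_hat @ (theta_hat * (W_hat.T @ r))    # thin rank-k update
    return z
```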
4. Numerical results. The multilevel FSAI preconditioner with low-rank corrections (MFLR) can be implemented and applied through recursive functions (Algorithms 4.1 and 4.2, respectively). The key parameters for the MFLR computation and application are the matrix $A$, the number of levels $n_l$, the current level index $l$, and the sizes of the ascending and descending low-rank corrections, alr and dlr. Other threshold parameters are actually needed by the inner kernels, such as those required by the computation of $G$ and $F$ and by the level-of-fill control of the matrix-matrix products; for the sake of readability, these parameters are dropped from the argument lists of the MFLR_SetUp and MFLR_Apply functions.
Algorithm 4.1. MFLR Set-up.
1.  Recursive Function MFLR_SetUp(l, nl, A, alr, dlr)
2.    if $l < (n_l - 1)$ then
3.      Partition $A_l$ as $\begin{bmatrix} K & B \\ B^T & C \end{bmatrix}$;
4.      Compute the adaptive FSAI approximation $G$ of $K$ such that $G^T G \simeq K^{-1}$;
5.      Set $P_a = \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix}$;
6.      Compute $P_b = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix}$ as the adaptive block FSAI approximation of $P_a A_l P_a^T$;
7.      Compute $\widetilde{S} = C + F G B + B^T G^T F^T + F G K G^T F^T$;
8.      Compute a rank-$k$ approximation $U_k \Sigma_k U_k^T$ of $Y = I - G K G^T$;
9.      Set $\widetilde{G} = (I + U_k \Psi_k U_k^T) G$ with $\Psi_k = (I - \Sigma_k)^{-1/2} - I$;
10.     Set $P = \begin{bmatrix} \widetilde{G} & 0 \\ F G & I \end{bmatrix}$;
11.     $[Q_{alr}, Q_M]$ = MFLR_SetUp(l+1, nl, $\widetilde{S}$, alr, dlr);
12.     Use $Q_{alr}$ and $Q_M$ to compute the rank-$k$ correction $\widehat{W}_k$ and $\widehat{\Theta}_k$ as in (3.23);
13.     Set $\widetilde{P} = \begin{bmatrix} I & 0 \\ 0 & I + \widehat{W}_k \widehat{\Theta}_k \widehat{W}_k^T \end{bmatrix}$;
14.     Push $\widetilde{P}$ onto the head of $Q_{alr}$;
15.     Push $P$ onto the head of $Q_M$;
16.     return $Q_{alr}$, $Q_M$;
17.   else
18.     Compute the adaptive FSAI approximation $G$ of $A$ such that $G^T G \simeq A^{-1}$;
19.     Set $Q_{alr} = \emptyset$;
20.     Set $Q_M = \{G\}$;
21.     return $Q_{alr}$, $Q_M$;
22.   end if
Algorithm 4.2. MFLR Application.
1.  Recursive Function MFLR_Apply(l, nl, $Q_{alr}$, $Q_M$, x)
2.    if $l < (n_l - 1)$ then
3.      Pop $P$ from the head of $Q_M$;
4.      Pop $\widetilde{P}$ from the head of $Q_{alr}$;
5.      Compute $y = P x$;
6.      Partition $y$ into $y_1 = y(1 : n_1)$ and $y_2 = y(n_1 + 1 : n_1 + n_2)$;
7.      Compute $z_2$ = MFLR_Apply(l+1, nl, $Q_{alr}$, $Q_M$, $y_2$);
8.      Form $z$ from $y_1$ and $z_2$;
9.      Compute $y = \widetilde{P} z$;
10.     Update $y \leftarrow P^T y$;
11.     Push $P$ onto the head of $Q_M$;
12.     Push $\widetilde{P}$ onto the head of $Q_{alr}$;
13.     return $y$;
14.   else
15.     Pop $G$ from the head of $Q_M$;
16.     Compute $y = G^T G x$;
17.     Push $G$ onto the head of $Q_M$;
18.     return $y$;
19.   end if
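For concreteness, a dense-matrix rendition of Algorithm 4.2 as a recursive Python function is sketched below; the per-level dictionaries with keys 'P', 'P_alr', 'n1', and 'G' are hypothetical data structures introduced here for illustration, not the paper's Fortran90 implementation.

```python
import numpy as np

def mflr_apply(levels, x):
    """Recursive rendition of Algorithm 4.2; `levels` is a list with one
    dictionary per level, the last one storing only the FSAI factor G."""
    if len(levels) == 1:
        G = levels[0]["G"]
        return G.T @ (G @ x)                 # last level: G^T G x
    P, P_alr, n1 = (levels[0][key] for key in ("P", "P_alr", "n1"))
    y = P @ x                                # forward sweep, y = P x
    z2 = mflr_apply(levels[1:], y[n1:])      # recursion on the Schur block
    z = np.concatenate([y[:n1], z2])
    return P.T @ (P_alr @ z)                 # ALR update, then y = P^T y
```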
The preconditioner is stored using the lists $Q_{alr}$ and $Q_M$, and the parallelization is performed by equally partitioning each level among the available cores using OpenMP directives. The reason for this choice, which limits the algorithm scalability and the problem size, is that the aim of this work is mainly the analysis of the MFLR preconditioner behavior. The MPI implementation is an ongoing task.

The MFLR preconditioner behavior is investigated on a set of test cases. First, its theoretical properties are verified on a small problem. Second, the MFLR sensitivity to the user-specified parameters that control the preconditioner quality and density is analyzed with an extensive numerical experimentation on a medium-size matrix. Finally, the performance on a few large-size problems is considered.
4.1. Theoretical properties. We analyze the bcsstk38 matrix from the University of Florida Sparse Matrix Collection [9]. This SPD matrix has 8,032 rows and 355,460 nonzeroes, and has been scaled so as to have a unitary diagonal. Its eigenspectrum is provided in the leftmost frame of Figure 3. The matrix is uniformly partitioned into two levels ($n_1 = n_2 = 4{,}016$). The approximate Schur complement $\widetilde{S}$ is computed using (2.9), where $F$ is the block FSAI of $A$ obtained using the dynamic strategy introduced in [16] and a variable number $k_{F,max}$ of entries retained per row. The nonzero eigenspectrum of $(\widetilde{S} - S)$ is shown in the rightmost frame of Figure 3 and is strictly positive for any $F$, as expected from Corollary 2.3.

The analysis is performed by introducing step-by-step new ingredients into the MFLR preconditioner. In particular, (i) the exact inverse of $K$ is used to evaluate the effect of using either $\widetilde{S}$ or $S$, (ii) the approximation $G^T G \simeq K^{-1}$ is introduced, and (iii) low-rank corrections are added to improve the preconditioner.
Fig. 3. bcsstk38 test case: eigenspectrum of $A$ (left) and of $(\widetilde{S} - S)$ (right) for different block FSAI $F$ ($k_{F,max} = 30, 60, 90$).
Table 1
bcsstk38 test case: comparison between the conditioning numbers of (2.29) and (2.21) varying $F$ with $\|E_K\| = 0$. $\lambda$ and $\widetilde{\lambda}$ denote the eigenvalues of $M^{-1}AM^{-T}$ and $\widetilde{M}^{-1}A\widetilde{M}^{-T}$, respectively.

k_F,max | η₁    | η̃₁   | [λ_n, λ₁]         | [λ̃_n, λ̃₁]       | r(η₁)
30      | 4.395 | 0.975 | [0.0470, 21.273]  | [0.0249, 1.975]  | 5.709
60      | 2.867 | 0.944 | [0.0988, 10.121]  | [0.0558, 1.944]  | 2.939
90      | 2.248 | 0.914 | [0.1448, 6.9069]  | [0.0863, 1.914]  | 2.153
First, we want to verify that the main claim of section 2, i.e., that the use of $\widetilde{S}$ is more appropriate than that of $S$ in the presented multilevel framework, is actually correct when using the exact inverse of $K$. The ratio between the conditioning numbers of $M^{-1}AM^{-T}$ and $\widetilde{M}^{-1}A\widetilde{M}^{-T}$, as obtained in Theorem 2.8, is reported in Table 1 for different block FSAI $F$. Should $F$ be computed exactly as a full matrix, there would be no difference between $\widetilde{M}^{-1}$ and $M^{-1}$. By distinction, decreasing the quality of $F$ means increasing $\eta_1 = \|Q\|_2$ and $\widetilde{\eta}_1 = \|\widetilde{Q}\|_2$. As a consequence, the effectiveness of $\widetilde{M}^{-1}$ improves with respect to $M^{-1}$; i.e., the sparser $F$, the more important it is to use $\widetilde{S}$ instead of $S$. For this test problem, the ratio $r(\eta_1)$ between the conditioning numbers of (2.29) and (2.21) increases up to 5.709.
Theorem 2.8 no longer holds if we introduce the approximation $G^T G$ for $K^{-1}$. However, it is still true that the theoretical eigenvalue bounds for $\widetilde{M}^{-1}$ are tighter than those for $M^{-1}$. The matrix $G$ is computed as the FSAI of $K$ using the adaptive strategy implemented in the FSAIPACK software package [20]. The quality of $G$ is controlled by the maximum number of entries $k_{G,max}$ retained per row. First of all, notice that $\widetilde{S}$ computed as in (2.3) might be indefinite if $G$ is not accurate enough. For instance, even with $k_{G,max} = 250$ the approximate Schur complement $(C - B^T G^T G B)$ still has one negative eigenvalue. Changing the fill-in degree of $G$ modifies the values obtained from the bounds (2.38) and (2.39), which become tighter and tighter as $k_{G,max}$ increases. Such bounds, along with the actual eigenvalue intervals, are reported in Table 2 for different choices of $G$ and $F$. It can be observed that the role played by $\epsilon_1$ and $\widetilde{\eta}_1$, i.e., the measures of the quality of $G$ and $F$, respectively, has a similar impact on the actual eigenvalue distribution of $\widetilde{M}^{-1}A\widetilde{M}^{-T}$, as expected from the right-hand side of inequality (2.38). Hence, one may argue that improving both $G$ and $F$ is essential for a better MF performance. On the other hand, the bound (2.39) is less significant because of the presence of $\kappa_n^{-1}$, which can be quite large. Nonetheless, the actual eigenvalue distribution of $M^{-1}AM^{-T}$ is in any case worse than that of $\widetilde{M}^{-1}A\widetilde{M}^{-T}$.

Table 2
bcsstk38 test case: eigenvalue distribution of (2.29) and (2.21) varying $F$ and $G$. The same notation as in Table 1 is used.

k_G,max | k_F,max | ε₁    | η₁    | η̃₁   | [λ_n, λ₁]        | [λ̃_n, λ̃₁]      | bound (2.39) | bound (2.38)
30      | 30      | 1.076 | 4.505 | 0.812 | [1.8e-04, 47.3]  | [1.7e-04, 2.1]  | 19473.3      | 2.512
30      | 60      | 1.076 | 3.181 | 0.790 | [1.8e-04, 33.2]  | [1.8e-04, 2.1]  | 9707.6       | 2.494
30      | 90      | 1.076 | 2.617 | 0.648 | [1.8e-04, 29.1]  | [1.8e-04, 2.1]  | 6570.8       | 2.380
60      | 30      | 1.240 | 4.393 | 0.718 | [3.9e-04, 40.0]  | [3.9e-04, 2.2]  | 8291.3       | 2.569
60      | 60      | 1.240 | 3.037 | 0.641 | [4.0e-04, 29.5]  | [4.0e-04, 2.2]  | 3963.8       | 2.512
60      | 90      | 1.240 | 2.466 | 0.589 | [4.0e-04, 25.8]  | [4.0e-04, 2.2]  | 2613.6       | 2.475
90      | 30      | 1.092 | 4.300 | 0.738 | [6.6e-04, 36.0]  | [6.5e-04, 2.1]  | 5353.9       | 2.464
90      | 60      | 1.092 | 2.935 | 0.651 | [6.8e-04, 26.8]  | [6.7e-04, 2.1]  | 2495.6       | 2.396
90      | 90      | 1.092 | 2.434 | 0.620 | [6.9e-04, 23.3]  | [6.8e-04, 2.1]  | 1716.8       | 2.372

As an example, Figure 4 shows the eigenvalue distribution of $E_K$ for $k_{G,max} = 30$, and of the preconditioned matrices (2.29) and (2.21) for the same $k_{G,max}$ and $k_{F,max} = 60$. The matrix $E_K$ is indefinite, with eigenvalues almost equally distributed between negative and positive values, approximately in the interval $[-1, 1]$. The matrix $\widetilde{M}^{-1}A\widetilde{M}^{-T}$ has a more favorable eigenvalue distribution than $M^{-1}AM^{-T}$.

Fig. 4. bcsstk38 test case: eigenspectrum of $E_K$ (left) and rightmost eigenvalues of the preconditioned matrices (2.29) and (2.21) (right) for $k_{G,max} = 30$ and $k_{F,max} = 60$. The upper bound (2.38) is also shown.
Finally, we add to the frame descending and ascending low-rank corrections. We fix $k_{G,max} = 30$ and $k_{F,max} = 60$, and change the number of eigenpairs computed to correct either $G$ (dlr) or $\widehat{S}^{-1}$ (alr). Their effect on the conditioning number of $\widetilde{M}^{-1}A\widetilde{M}^{-T}$ is shown in Table 3. Using both low-rank strategies greatly helps reduce the conditioning number. For example, in this case retaining 20 eigenpairs for both corrections yields a reduction of the conditioning number of about two orders of magnitude, i.e., from about 24,000 to 500. By distinction, notice that correcting $\widetilde{S}^{-1}$ towards $S^{-1}$ is not as effective, as expected from the MFLR theoretical properties. It is also interesting to observe that the largest eigenvalue is the most sensitive to the selection of the target matrix for the ALR corrections, while it is insensitive to the DLR corrections, which mainly affect the smallest eigenvalue.

Table 3
bcsstk38 test case: effect of descending and ascending low-rank corrections, dlr and alr, respectively, with $k_{G,max} = 30$ and $k_{F,max} = 60$. The target of the ALR correction is $\widetilde{S}^{-1}$ in the upper part of the table and $S^{-1}$ in the lower part.

ALR target | alr | dlr | [λ_n, λ₁]                 | λ₁/λ_n
S̃⁻¹        | 0   | 0   | [1.682e-04, 2.337e+00]    | 2.389e+04
S̃⁻¹        | 0   | 10  | [4.391e-04, 2.337e+00]    | 5.321e+03
S̃⁻¹        | 0   | 20  | [4.438e-04, 2.337e+00]    | 5.265e+03
S̃⁻¹        | 10  | 0   | [1.682e-04, 2.076e+00]    | 1.234e+04
S̃⁻¹        | 10  | 10  | [4.391e-04, 2.077e+00]    | 4.731e+03
S̃⁻¹        | 10  | 20  | [4.438e-04, 2.137e+00]    | 4.816e+03
S̃⁻¹        | 20  | 0   | [1.744e-04, 2.076e+00]    | 1.190e+04
S̃⁻¹        | 20  | 10  | [3.380e-03, 2.077e+00]    | 6.145e+02
S̃⁻¹        | 20  | 20  | [4.413e-03, 2.137e+00]    | 4.843e+02
S⁻¹        | 10  | 0   | [1.753e-04, 1.467e+01]    | 8.371e+04
S⁻¹        | 10  | 10  | [4.342e-03, 1.482e+01]    | 3.414e+03
S⁻¹        | 10  | 20  | [6.308e-03, 1.493e+01]    | 2.367e+03
S⁻¹        | 20  | 0   | [1.772e-04, 3.220e+01]    | 1.817e+05
S⁻¹        | 20  | 10  | [5.475e-03, 3.270e+01]    | 5.973e+03
S⁻¹        | 20  | 20  | [9.065e-03, 3.290e+01]    | 3.630e+03
4.2. Sensitivity to user-specified parameters. To test the influence of the user-specified parameters that control the MFLR behavior, we use the medium-size matrix Cube, arising from a homogeneous structural problem uniformly discretized by P1 finite elements, with 190,581 unknowns and 7,531,389 nonzero entries. The linear system (1.1) is solved by using an MFLR-preconditioned conjugate gradient (PCG) method, with an exit tolerance on the relative residual equal to $10^{-8}$. The right-hand side $b$ is such that the solution $x$ is the vector with components $x_j = j + 1$, $j = 1, \dots, n$. All tests reported here are obtained using a machine equipped with Intel(R) Xeon(R) E5-2680 v2 processors at 2.80 GHz and 256 Gbyte of RAM. Each CPU has 10 cores. For these preliminary tests, just one thread is used. The MFLR preconditioner is implemented in Fortran90, with the code compiled by the Intel Fortran compiler using the -O3 optimization level. We also used BLAS and LAPACK routines from the Intel Math Kernel Library. For the computation of the eigenpairs needed by the low-rank corrections, we used the Laneig software, which is part of the Filtlan package [10].
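The experimental set-up just described can be reproduced with a few lines of SciPy for any preconditioner exposing an apply routine; the harness below is a sketch (apply_prec could be, e.g., the mflr_apply sketch given after Algorithm 4.2), not the Fortran90 driver used for the reported tests.

```python
import numpy as np
import scipy.sparse.linalg as spla

def solve_with_prec(A, apply_prec, rtol=1e-8):
    """PCG harness mirroring section 4.2: the right-hand side is built so
    that the exact solution has components x_j = j + 1, and the iteration
    exits on the relative residual."""
    n = A.shape[0]
    b = A @ np.arange(1, n + 1, dtype=float)
    M = spla.LinearOperator((n, n), matvec=apply_prec)
    # `rtol` requires SciPy >= 1.12; older releases call this argument `tol`
    return spla.cg(A, b, rtol=rtol, M=M)
```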
The user-specified parameters that control the MFLR quality and performance are as follows:
1. $n_l$: number of levels; $(n_l - 1)$ is therefore the number of computed Schur complements;
2. $\epsilon_G$: tolerance for the adaptive computation of $G$, see [20];
3. $\epsilon_F$: tolerance for the adaptive computation of $F$, see [16];
4. dlr: DLR correction size, i.e., the number of eigenpairs used to enrich $G$. This is a local improvement;
5. alr: ALR correction size, i.e., the number of eigenpairs used to enrich $\widehat{S}^{-1}$. This is a global improvement, in the sense that the Schur complement size grows up to the size of the zero-level partition.
Table 4
Cube test case: MFLR performance varying $n_l$ and $\epsilon_G$ ($\epsilon_F = 10^{-2}$, dlr = alr = 0).

ε_G   | n_l | n_it | ρ    | T_p [s] | T_s [s]
10⁻¹  | 10  | 930  | 0.43 | 14.5    | 26.8
10⁻¹  | 20  | 843  | 0.73 | 132.0   | 34.3
10⁻¹  | 50  | 664  | 1.47 | 349.9   | 49.1
10⁻¹  | 100 | 612  | 2.78 | 515.1   | 79.2
10⁻²  | 10  | 640  | 0.77 | 34.5    | 21.9
10⁻²  | 20  | 591  | 1.17 | 265.9   | 29.3
10⁻²  | 50  | 503  | 2.04 | 441.7   | 41.6
10⁻²  | 100 | 464  | 3.67 | 612.9   | 67.0
10⁻³  | 10  | 446  | 1.97 | 268.4   | 24.0
10⁻³  | 20  | 416  | 2.53 | 881.2   | 29.5
10⁻³  | 50  | 386  | 3.45 | 829.9   | 41.8
10⁻³  | 100 | 367  | 5.71 | 956.9   | 66.3
10⁻⁴  | 10  | 362  | 3.83 | 2232.9  | 37.3
10⁻⁴  | 20  | 338  | 4.67 | 3741.6  | 35.7
10⁻⁴  | 50  | 329  | 5.74 | 2404.2  | 47.4
10⁻⁴  | 100 | 318  | 8.88 | 2053.8  | 73.1
The Cube test matrix is reordered with the reverse Cuthill-McKee algorithm and uniformly partitioned into equal-size levels. The results are evaluated in terms of the number of iterations, $n_{it}$, the time needed to compute the preconditioner, $T_p$, the time spent in the PCG iterations, $T_s$, and the preconditioner density, $\rho$, defined as

(4.1)  $\rho = \frac{1}{nnz(A)} \sum_{i=0}^{n_l - 1} \left[ nnz(Q_{alr}^i) + nnz(Q_M^i) \right],$

where $nnz(\cdot)$ gives the number of nonzero entries stored for the operator $(\cdot)$ and all the remaining symbols are defined as in Algorithm 4.1. For these preliminary tests, the matrix-matrix products are fully computed, i.e., any use of thresholds and/or levels of fill is avoided.
Similarly to the previous section, the analysis is carried out step by step, adding one ingredient at a time. First, the performance of the robust MF preconditioner with no low-rank corrections is investigated by changing the fill-in degree of $G$ and $F$. Then, low-rank corrections are added to the frame to identify their effect. It is worth mentioning that only the robust MF preconditioner is addressed here, because the standard multilevel FSAI algorithm breaks down, giving rise to indefinite Schur complements.

Table 4 shows the results obtained by varying the number of levels $n_l$ and the tolerance $\epsilon_G$. The other tolerance $\epsilon_F$ is set to $10^{-2}$ and no low-rank corrections are applied, i.e., dlr = alr = 0. As expected, the iteration count decreases progressively as $\epsilon_G$ decreases, i.e., $G$ is more accurate, and as $n_l$ grows. The preconditioner density also increases, so that the set-up burden and the cost per iteration grow. Notice that, in particular, the set-up time can become very large. Although such a cost could be reduced by introducing thresholds and levels of fill in the matrix-matrix computations with no substantial loss in the PCG acceleration, the proposed multilevel approach is of interest whenever the preconditioner can be reused several times, e.g., in eigensolvers or in some transient simulations, so that its set-up cost can be properly amortized; the set-up cost is therefore set apart in the present analysis. Hence, we focus on the solution time $T_s$ only. With the most efficient $\epsilon_G$ value, i.e., $10^{-2}$, we vary $\epsilon_F$ and $n_l$. The results, provided in Table 5, show that the MFLR preconditioner appears to be less sensitive to the quality of $F$ than to that of $G$. In fact, the iteration count is more stable than in Table 4.
Table 5
Cube test case: MFLR performance varying $n_l$ and $\epsilon_F$ ($\epsilon_G = 10^{-2}$, dlr = alr = 0).

ε_F   | n_l | n_it | ρ      | T_p [s] | T_s [s]
10⁻¹  | 10  | 660  | 0.65   | 29.1    | 22.8
10⁻¹  | 20  | 632  | 0.94   | 216.8   | 29.5
10⁻¹  | 50  | 601  | 1.61   | 369.8   | 47.8
10⁻¹  | 100 | 579  | 2.81   | 498.0   | 78.7
10⁻²  | 10  | 640  | 0.77   | 34.5    | 21.9
10⁻²  | 20  | 591  | 1.17   | 265.9   | 29.3
10⁻²  | 50  | 503  | 2.04   | 441.7   | 41.6
10⁻²  | 100 | 464  | 3.67   | 612.9   | 67.0
10⁻³  | 10  | 621  | 1.13   | 85.7    | 27.0
10⁻³  | 20  | 544  | 1.79   | 639.2   | 32.9
10⁻³  | 50  | 403  | 3.21   | 1129.2  | 43.0
10⁻³  | 100 | 348  | 6.98   | 2278.7  | 71.2
10⁻⁴  | 10  | 585  | 2.304  | 1163.3  | 35.1
10⁻⁴  | 20  | 460  | 3.445  | 4513.5  | 39.3
10⁻⁴  | 50  | 309  | 5.550  | 4020.9  | 43.6
10⁻⁴  | 100 | 255  | 11.192 | 6029.8  | 68.6

With $\epsilon_G = 10^{-2}$, $\epsilon_F = 10^{-2}$, and $n_l = 10$, we test the effect of the corrections. Table 6 shows the impact of using either DLR or ALR corrections. Increasing the rank size, the number of iterations decreases for both approaches, but ALR corrections have a stronger impact on the solution time $T_s$. This is somewhat expected, because ALR corrections have a global effect on the overall preconditioner, while DLR corrections improve locally the current-level approximation of $K^{-1}$.

Table 6
Cube test case: MFLR performance varying either dlr or alr ($\epsilon_G = \epsilon_F = 10^{-2}$, $n_l = 10$).

dlr | n_it | ρ    | T_p [s] | T_s [s]      alr | n_it | ρ    | T_p [s] | T_s [s]
5   | 534  | 0.90 | 71.3    | 29.3         5   | 289  | 1.35 | 95.1    | 17.0
10  | 485  | 1.01 | 66.1    | 29.4         10  | 213  | 1.92 | 78.0    | 15.1
20  | 450  | 1.24 | 65.9    | 25.5         20  | 196  | 3.06 | 88.1    | 14.1
50  | 405  | 1.92 | 75.0    | 30.1         50  | 186  | 6.48 | 214.5   | 36.7

Table 7
Cube test case: MFLR performance varying both dlr and alr ($\epsilon_G = \epsilon_F = 10^{-2}$, $n_l = 10$).

dlr | alr | n_it | ρ    | T_p [s] | T_s [s]
1   | 1   | 525  | 0.92 | 76.3    | 34.4
1   | 10  | 204  | 1.95 | 87.7    | 14.8
1   | 20  | 188  | 3.09 | 112.5   | 18.5
10  | 1   | 381  | 1.13 | 71.7    | 25.0
10  | 10  | 148  | 2.15 | 83.6    | 10.9
10  | 20  | 134  | 3.29 | 100.0   | 11.3
20  | 1   | 341  | 1.35 | 86.6    | 23.9
20  | 10  | 132  | 2.38 | 100.5   | 10.4
20  | 20  | 123  | 3.52 | 100.1   | 10.8
Finally, the combined effect of both corrections is investigated in Table 7. Combining them yields the best results in terms of both iteration count and solution time $T_s$. In particular, the latter is more than halved with respect to the case with no corrections whenever dlr, alr ≥ 10. The solver acceleration is paid for with a larger set-up cost, which makes the MFLR approach interesting when the preconditioner can be recycled.
Although the present analysis cannot be thoroughly exhaustive, it is possible to observe that dlr and alr appear to be the most sensitive user-specified parameters. The number of levels $n_l$ strongly depends on the size of the matrix and should be selected such that each level is not too small. In the Cube test case it can be seen that with more than 20 levels, i.e., fewer than 10,000 unknowns per level, the preconditioner application cost grows quickly and is no longer compensated by the iteration count reduction. By distinction, the selection of the tolerances $\epsilon_G$ and $\epsilon_F$ does not appear to be overly difficult. In fact, the MFLR performance does not change much with respect to a variation of these parameters in the interval $[10^{-3}, 10^{-1}]$.
4.3. Preconditioner performance. The computational performance of the MFLR preconditioner is finally evaluated on a set of large-size SPD matrices taken from the University of Florida Sparse Matrix Collection [9]. The main properties of the selected test cases are provided in Table 8, while the best results obtained by the MFLR preconditioner in terms of iteration CPU time $T_s$ are given in Table 9. As a benchmark, the performance provided by the native adaptive FSAI algorithm is also shown in the same table. All the CPU times are obtained using 1 thread on the computer described in section 4.2. It is worth mentioning that the native adaptive FSAI is used as a benchmark because the standard multilevel FSAI generally gives rise to indefinite Schur complements. Hence, only the robust MF algorithm presented here can be effectively used, with or without low-rank corrections. The latter may improve the convergence rate but have no effect on the preconditioner robustness.

Table 8
Test matrices.

Matrix     | Size      | Number of nonzeroes
af_shell3  | 504,855   | 17,588,875
af_shell8  | 504,855   | 17,579,155
Emilia_923 | 923,136   | 40,373,538
Geo_1438   | 1,437,960 | 60,236,322
StocF_1465 | 1,465,137 | 21,005,389

Table 9
Computational performance of the adaptive FSAI and MFLR preconditioners for the test matrices of Table 8.

           |        Adaptive FSAI           |             MFLR
Matrix     | n_it | ρ    | T_p [s] | T_s [s] | n_it | ρ    | T_p [s] | T_s [s]
af_shell3  | 963  | 0.89 | 88.8    | 63.9    | 126  | 4.50 | 237.8   | 31.5
af_shell8  | 1033 | 0.86 | 81.2    | 68.0    | 131  | 3.22 | 174.8   | 28.6
Emilia_923 | 1513 | 0.11 | 5.0     | 128.5   | 575  | 0.25 | 82.8    | 80.4
Geo_1438   | 491  | 0.31 | 59.4    | 89.6    | 456  | 0.49 | 159.8   | 110.4
StocF_1465 | 937  | 0.93 | 51.3    | 133.4   | 936  | 1.66 | 260.4   | 222.2

As already observed, the MFLR preconditioner set-up can be quite expensive, especially because of the computation of the eigenpairs needed by the low-rank correction procedures. However, its effectiveness on the iteration count and CPU time can be quite significant. For instance, in the af_shell3 and af_shell8 test cases, $n_{it}$ is approximately reduced by a factor 10 and $T_s$ is more than halved. Hence, the use of the MFLR preconditioner can be of great interest whenever the set-up time can be amortized over several linear solves. On the other hand, if the reduction of the number of iterations is marginal, as in the Geo_1438 and StocF_1465 test cases, the adaptive FSAI proves more efficient than the MFLR preconditioner.
For the sake of comparison, the computational performance obtained with other
multilevel packages, namely ILUPACK [3] and Trilinos ML [13], is provided in
Table 10. The ILUPACK parameters are set so as to obtain approximately the same
memory footprint as MFLR, i.e., a density close to the one provided in Table 9. By
contrast, the default parameters are retained for Trilinos ML. The results show
that the ILUPACK performance in terms of solution time T_s is close to that of
MFLR for af_shell3 and af_shell8, much better for StocF_1465, and much worse for
Emilia_923 and Geo_1438. In particular, the ILU density for the Emilia_923 test
case must be increased up to 3.8 to attain convergence in just 62 iterations;
otherwise, convergence is not reached within 3,000 iterations. Recalling that ILUPACK
cannot fully exploit the growing parallelism of current computational architectures,
the comparison with MFLR appears to be quite satisfactory. As far as the results
with Trilinos ML are concerned, it can be observed that with the default parameters
Table 10
Computational performance for the test matrices of Table 8 with ILUPACK and Trilinos ML.

                         ILUPACK GMRes                  Trilinos ML CG
Matrix        n_it     ρ     T_p [s]  T_s [s]    n_it    ρ     T_p [s]  T_s [s]
af_shell3       79    4.12     64.0     24.6      356   1.01     1.2     80.1
af_shell8       78    4.12     62.2     24.4      309   1.02     1.2     57.4
Emilia_923   >3000    0.40     34.0       –       586   1.15     6.5    367.0
Geo_1438       878    0.50     57.3    385.8      118   1.15     9.6     93.7
StocF_1465     108    1.50     49.2     33.3      759   1.16     3.8    278.0
[Figure 5: two panels plotting the normalized set-up time T_p(t)/T_p(1) and iteration time T_s(t)/T_s(1) on a logarithmic scale between 0.01 and 1, versus the number of threads (1, 2, 4, 8, 16, 32), for the adaptive FSAI and MFLR preconditioners.]

Fig. 5. Strong scalability test for a regular 300^3 Laplacian.
the MFLR preconditioner is always superior, with the exception of the Geo_1438 test
case, where the two performances are roughly equivalent.
Finally, to show the potential parallelism of the MFLR preconditioner, a scalability
test has been carried out on a discrete Laplacian computed over a regular
300 × 300 × 300 grid. The numerical experiment is performed on the Marconi cluster
at the CINECA Center for High Performance Computing, Bologna, Italy. The cluster
is still under construction and, at the time of writing, it consists of 1,512 nodes
with 128 Gbytes of RAM each. Every node is equipped with two 18-core Intel Xeon
E5-2697 v4 (Broadwell) processors at 2.3 GHz, i.e., 36 cores per node. This
preliminary implementation of MFLR exploits shared-memory parallelism through
OpenMP directives; hence, in the strong scalability test provided in Figure 5, only a
single node is used with up to 32 threads.
preconditioner is compared to that of the Adaptive FSAI algorithm as far as both
the set-up and the iteration time, T_p and T_s, are concerned. In particular, Figure 5
provides the speed-ups

(4.2)    S_p = T_p(t) / T_p(1),    S_s = T_s(t) / T_s(1),

with T_p(t) and T_s(t) the set-up and iteration wall-clock times measured with t threads.
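As a small worked example of (4.2), the snippet below evaluates the normalized times together with the classical parallel efficiency T(1)/(t T(t)); the timings used here are made-up placeholders, not the Marconi measurements behind Figure 5.

```python
# Evaluating the speed-ups of (4.2) from measured wall-clock times.
# The timings below are hypothetical placeholders for illustration only.
threads = [1, 2, 4, 8, 16, 32]
Tp = {1: 40.0, 2: 21.0, 4: 11.0, 8: 6.0, 16: 3.5, 32: 2.2}   # set-up times [s]
Ts = {1: 90.0, 2: 47.0, 4: 25.0, 8: 14.0, 16: 8.0, 32: 5.0}  # iteration times [s]

for t in threads:
    Sp = Tp[t] / Tp[1]          # normalized set-up time of (4.2)
    Ss = Ts[t] / Ts[1]          # normalized iteration time of (4.2)
    eff = Ts[1] / (t * Ts[t])   # classical parallel efficiency of the iterations
    print(f"t={t:2d}  Sp={Sp:.3f}  Ss={Ss:.3f}  efficiency={eff:.2f}")
```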
It has already been verified that the native adaptive FSAI exhibits a practically ideal
speed-up [20], according to the hardware properties of the specific computational
architecture used for the numerical experiments. In the case of the multi-core
processors of the Marconi cluster, it is well known that ideal speed-ups are virtually
impossible to obtain with iterative solvers, as these algorithms are bandwidth limited,
being characterized by a low flop-per-byte ratio (a rough estimate is sketched below).
The MFLR preconditioner is theoretically less parallel than FSAI because of the
intrinsic sequentiality introduced by the multilevel framework. Nevertheless, Figure 5
shows that the strong scalability is only marginally affected by this sequentiality,
and the MFLR preconditioner still preserves a good degree of parallelism.
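The bandwidth-limited character of these kernels can be checked with a back-of-envelope estimate. For a CSR sparse matrix-vector product, assuming 8-byte values, 4-byte column indices, and vectors streamed once from memory, the flop-per-byte ratio sits one to two orders of magnitude below the balance point of current multi-core CPUs (on the order of 10 flops per byte):

```python
# Rough arithmetic-intensity estimate for a CSR sparse matrix-vector product,
# the dominant kernel of preconditioned CG. Assumptions: 8-byte values,
# 4-byte column indices, input/output vectors streamed once from memory.
nnz, n = 60_236_322, 1_437_960          # Geo_1438, from Table 8

flops = 2 * nnz                          # one multiply and one add per nonzero
traffic = nnz * (8 + 4) + 2 * n * 8      # matrix values + indices, x and y vectors
print(f"~{flops / traffic:.2f} flop/byte")
# prints ~0.16 flop/byte: the kernel is memory-bandwidth bound, so its
# scalability saturates well before the core count of a node is exhausted
```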
5. Conclusions. The development of a multilevel framework is often useful in
several applications. However, using the FSAI preconditioner as the basic kernel
in a standard multilevel approach may give rise to some difficulties related to the
approximations introduced in the computation of the Schur complement at each level.
With SPD problems, such a Schur complement might be indefinite, thus causing a
breakdown of the multilevel algorithm.
The present work develops a robust multilevel framework for SPD matrices based
on the use of adaptive FSAI as the main kernel. An alternative way of computing
the Schur complement is introduced so as to guarantee its positive definiteness
independently of the preconditioner sparsity. A theoretical analysis is formulated with
the aim of providing appropriate bounds for the eigenspectrum of the preconditioned
matrix. The multilevel FSAI preconditioner is further enhanced by introducing low-
rank corrections at both a local and a global level, namely the DLR and ALR corrections,
respectively, thus producing the MFLR preconditioning framework.
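To illustrate the mechanism of such corrections, the sketch below applies a generic global low-rank correction of the kind used in low-rank correction methods (see, e.g., [23, 29]): the k smallest eigenpairs of the preconditioned matrix G A G^T are computed and shifted to one. This is our own minimal construction on a toy problem, with a Jacobi factor standing in for FSAI; it is not the DLR/ALR algorithm of this paper.

```python
# Minimal global low-rank correction sketch (our construction, not the paper's
# DLR/ALR algorithm). Given an SPD preconditioner B = G^T G, the k smallest
# eigenpairs (theta_j, w_j) of P = G A G^T are shifted to one:
#   B_LR = G^T (I + W (diag(1/theta) - I) W^T) G.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg, eigsh

n = 400
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
G = sp.diags(1.0 / np.sqrt(A.diagonal()))  # Jacobi factor, crude stand-in for FSAI

P = (G @ A @ G.T).tocsc()
theta, W = eigsh(P, k=8, sigma=0.0, which="LM")  # 8 smallest eigenpairs of P

def apply_blr(r):
    """B_LR r: eigenvectors G^T w_j of the preconditioned matrix map to eigenvalue 1."""
    z = G @ r
    z = z + W @ ((1.0 / theta - 1.0) * (W.T @ z))
    return G.T @ z

b = np.ones(n)
for name, mv in (("plain", lambda r: G.T @ (G @ r)), ("corrected", apply_blr)):
    it = [0]
    cg(A, b, M=LinearOperator((n, n), matvec=mv),
       callback=lambda xk: it.__setitem__(0, it[0] + 1))
    print(f"{name:9s} PCG iterations: {it[0]}")
# the corrected preconditioner removes the smallest eigenvalues from the
# spectrum, so the iteration count drops; computing (theta, W) is precisely
# the expensive set-up ingredient discussed above
```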
The MFLR preconditioner has been investigated on a set of test problems to
analyze (i) the relative influence of the user-specified parameters controlling the
algorithm set-up, and (ii) the computational performance and potential scalability in
a parallel environment. The numerical results show that the proposed approach is
generally able to significantly accelerate the solver convergence rate while preserving
a good degree of parallelism. At present, the solver acceleration is paid for by a large
set-up cost, mainly due to the computation of the eigenpairs needed by the low-rank
corrections. The increased cost of building the preconditioner with respect to the
native adaptive FSAI makes this approach attractive especially for applications where
the preconditioner can be effectively recycled over a number of linear solves.
The main goal of this work was to prove the robustness and effectiveness of
a multilevel framework in explicit preconditioning. Further investigations will be
devoted to the development of a faster set-up stage by using, for instance, randomized
approaches for computing the low-rank corrections [15], and by enforcing sparsity in
the resulting factors.
REFERENCES

[1] P. Amestoy, C. Ashcraft, O. Boiteau, A. Buttari, J.-Y. L'Excellent, and C. Weisbecker, Improving multifrontal methods by means of block low-rank representations, SIAM J. Sci. Comput., 37 (2015), pp. A1451–A1474.
[2] M. Benzi, Preconditioning techniques for large linear systems: A survey, J. Comput. Phys., 182 (2002), pp. 418–477.
[3] M. Bollhöfer, J. I. Aliaga, A. F. Martín, and E. S. Quintana-Ortí, ILUPACK, in Encyclopedia of Parallel Computing, D. Padua, ed., Springer, New York, 2011, pp. 917–926.
[4] Y. Bu, B. Carpentieri, Z. Shen, and T. Z. Huang, A hybrid recursive multilevel incomplete factorization preconditioner for solving general linear systems, Appl. Numer. Math., 104 (2016), pp. 141–157.
[5] B. Carpentieri, I. S. Duff, L. Giraud, and G. Sylvand, Combining fast multipole techniques and an approximate inverse preconditioner for large electromagnetism calculations, SIAM J. Sci. Comput., 27 (2005), pp. 774–792.
[6] B. Carpentieri, J. Liao, and M. Sosonkina, VBARMS: A variable block algebraic recursive multilevel solver for sparse linear systems, J. Comput. Appl. Math., 259 (2014), pp. 164–173.
[7] T. F. Chan and T. P. Mathew, Domain decomposition algorithms, Acta Numer., 3 (1994), pp. 61–143.
[8] E. Chow and Y. Saad, Approximate inverse preconditioners via sparse-sparse iterations, SIAM J. Sci. Comput., 19 (1998), pp. 995–1023.
[9] T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), pp. 1–25.
[10] H. Fang and Y. Saad, A filtered Lanczos procedure for extreme and interior eigenvalue problems, SIAM J. Sci. Comput., 34 (2012), pp. 2220–2246.
[11] M. Ferronato, Preconditioning for sparse linear systems at the dawn of the 21st century: History, current developments, and future perspectives, ISRN Appl. Math., (2012), doi:10.5402/2012/127647.
[12] A. Franceschini, V. A. Paludetto Magri, C. Janna, and M. Ferronato, Multilevel approaches for FSAI preconditioning, Numer. Linear Algebra Appl., submitted.
[13] M. W. Gee, C. M. Siefert, J. J. Hu, R. S. Tuminaro, and M. G. Sala, ML 5.0 Smoothed Aggregation User's Guide, Report SAND2006-2649, Sandia National Laboratories, 2006.
[14] W. Hackbusch, A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices, Computing, 62 (1999), pp. 89–108.
[15] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[16] C. Janna and M. Ferronato, Adaptive pattern research for block FSAI preconditioning, SIAM J. Sci. Comput., 33 (2011), pp. 3357–3380.
[17] C. Janna, M. Ferronato, and G. Gambolati, Multilevel incomplete factorizations for the iterative solution of non-linear finite element problems, Internat. J. Numer. Methods Engrg., 80 (2009), pp. 651–670.
[18] C. Janna, M. Ferronato, and G. Gambolati, A block FSAI-ILU parallel preconditioner for symmetric positive definite linear systems, SIAM J. Sci. Comput., 32 (2010), pp. 2468–2484.
[19] C. Janna, M. Ferronato, and G. Gambolati, Enhanced block FSAI preconditioning using domain decomposition techniques, SIAM J. Sci. Comput., 35 (2013), pp. S229–S249.
[20] C. Janna, M. Ferronato, F. Sartoretto, and G. Gambolati, FSAIPACK: A software package for high performance FSAI preconditioning, ACM Trans. Math. Software, 41 (2015), Art. 10.
[21] L. Yu. Kolotilina and A. Yu. Yeremin, Factorized sparse approximate inverse preconditionings I. Theory, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 45–58.
[22] Z. Li, Y. Saad, and M. Sosonkina, pARMS: A parallel version of the algebraic recursive multilevel solver, Numer. Linear Algebra Appl., 10 (2003), pp. 485–509.
[23] R. Li and Y. Saad, Divide and conquer low-rank preconditioners for symmetric matrices, SIAM J. Sci. Comput., 35 (2013), pp. A2069–A2095.
[24] L. C. McInnes, B. Smith, H. Zhang, and R. T. Mills, Hierarchical Krylov and nested Krylov methods for extreme-scale computing, Parallel Comput., 40 (2014), pp. 17–31.
[25] P. Raghavan and K. Teranishi, Parallel hybrid preconditioning: Incomplete factorization with selective sparse approximate inversion, SIAM J. Sci. Comput., 32 (2010), pp. 1323–1345.
[26] Y. Saad and B. Suchomel, ARMS: An algebraic recursive multilevel solver for general sparse linear systems, Numer. Linear Algebra Appl., 9 (2002), pp. 359–378.
[27] J. A. Scott and M. Tůma, On positive semidefinite modification schemes for incomplete Cholesky factorization, SIAM J. Sci. Comput., 36 (2014), pp. A609–A633.
[28] S. Wang, X. S. Li, J. Xia, Y. Situ, and M. V. de Hoop, Efficient scalable algorithms for solving dense linear systems with hierarchically semiseparable structures, SIAM J. Sci. Comput., 35 (2013), pp. C519–C544.
[29] Y. Xi, R. Li, and Y. Saad, An algebraic multilevel preconditioner with low-rank corrections for sparse symmetric matrices, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 235–259.
[30] J. Xia, On the complexity of some hierarchical structured matrix algorithms, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 388–410.
[31] J. Xia, Efficient structured multifrontal factorization for general large sparse matrices, SIAM J. Sci. Comput., 35 (2013), pp. A832–A860.
[32] J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li, Fast algorithms for hierarchically semiseparable matrices, Numer. Linear Algebra Appl., 17 (2010), pp. 953–976.