SIAM J. MATRIX ANAL. APPL., Vol. 39, No. 1, pp. 123–147
© 2018 Society for Industrial and Applied Mathematics
A ROBUST MULTILEVEL APPROXIMATE INVERSE
PRECONDITIONER FOR SYMMETRIC POSITIVE
DEFINITE MATRICES∗
ANDREA FRANCESCHINI†, VICTOR ANTONIO PALUDETTO MAGRI†, MASSIMILIANO FERRONATO‡, AND CARLO JANNA§
Abstract. The use of factorized sparse approximate inverse (FSAI) preconditioners in a standard
multilevel framework for symmetric positive definite (SPD) matrices may pose a number of issues
as to the definiteness of the Schur complement at each level. The present work introduces a robust
multilevel approach for SPD problems based on FSAI preconditioning, which eliminates the chance of
algorithmic breakdowns independently of the preconditioner sparsity. The multilevel FSAI algorithm
is further enhanced by introducing descending and ascending low-rank corrections, thus giving rise
to the multilevel FSAI with low-rank corrections (MFLR) preconditioner. The proposed algorithm is
investigated in a number of test problems. The numerical results show that the MFLR preconditioner
is a robust approach that can significantly accelerate the solver convergence rate while preserving a good
degree of parallelism. The possibly large set-up cost, mainly due to the computation of the eigenpairs
needed by low-rank corrections, makes its use attractive in applications where the preconditioner can
be recycled along a number of linear solves.
Key words. preconditioning, approximate inverses, parallel computing, iterative methods
AMS subject classifications. 65F08, 65F10, 65F50, 65Y05
DOI. 10.1137/16M1109503
1. Introduction. The solution of a large sparse linear system of equations in the form

(1.1)   $Ax = b$,

where $A \in \mathbb{R}^{n \times n}$, $b, x \in \mathbb{R}^n$, and $A$ is symmetric positive definite (SPD), typically requires the use of preconditioned iterative methods based on Krylov subspaces. As is well known, the key for accelerating the solver convergence is the definition of the preconditioner, i.e., the operator that approximates the action of $A^{-1}$.
There are several different approaches for building an efficient preconditioner, ei-
ther physics-based or purely algebraic. Among the algebraic algorithms, the most
popular categories are incomplete factorizations, multigrid methods, and sparse ap-
proximate inverses [2, 11], though different preconditioners can be composed to form
new ones, e.g., blending domain decomposition approaches with incomplete factor-
izations or approximate inverses, or using nested Krylov methods [7, 18, 19, 24].
Combinations of physics-based and purely algebraic approaches are also denoted as
“gray-box” solvers [2, 5]. For example, some physics-based pattern selection strategies
∗Received by the editors December 27, 2016; accepted for publication (in revised form) by D. Orban August 21, 2017; published electronically January 24, 2018. http://www.siam.org/journals/simax/39-1/M110950.html
Funding: The work of the authors was supported by the ISCRA project "SCAIP: Scalable approximate inverse preconditioners."
†Department ICEA, University of Padova, 35121 Padova, Italy (franc90@dmsa.unipd.it, victor.magri@dicea.unipd.it).
‡M3E s.r.l., 35129 Padova, Italy, and Department ICEA, University of Padova, 35121 Padova, Italy (massimiliano.ferronato@unipd.it).
§Corresponding author. M3E s.r.l., 35129 Padova, Italy, and Department ICEA, University of Padova, 35121 Padova, Italy (carlo.janna@unipd.it).
can be used for algebraic sparse approximate inverse preconditioners [5] and block
variants of multilevel incomplete factorizations [17, 6].
Compared to incomplete factorizations, sparse approximate inverses are generally more robust and more appropriate for parallel computational architectures. In particular, the factorized sparse approximate inverse (FSAI) [21] is an algebraic preconditioner for SPD problems that proves effective in a wide range of applications, especially in its dynamically adaptive variants [16, 20]. This algorithm provides a factorized approximation of $A^{-1}$:

(1.2)   $G^T G \simeq A^{-1}$,

where $G$ is a lower triangular matrix explicitly computed in the set-up phase so as to approach the inverse of the lower Cholesky factor of $A$ in the sense of the Frobenius norm. One of the most attractive features of FSAI for modern computers is its intrinsic high degree of parallelism. In fact, each row of $G$ can be fully formed independently of the others, with the parallelization trivially accomplished by evenly subdividing the rows among threads and/or processes. The degree of parallelism is even redundant, as the number of rows is much larger than the available number of computing cores.
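As a purely illustrative aside, the row-wise construction can be sketched in a few lines of Python: each row of $G$ solves a small dense SPD system restricted to a prescribed pattern, assumed here to be supplied through a hypothetical patterns argument rather than produced by the adaptive strategies of [16, 20].

import numpy as np

def fsai(A, patterns):
    # Minimal static-pattern FSAI sketch (dense, for illustration only).
    # A        : SPD matrix as a dense NumPy array
    # patterns : patterns[i] = sorted list of column indices j <= i allowed
    #            in row i of G (must contain i itself)
    # Returns a lower triangular G with G^T G ~ A^{-1}.
    n = A.shape[0]
    G = np.zeros((n, n))
    for i in range(n):               # rows are independent: trivially parallel
        P = patterns[i]
        m = P.index(i)               # local position of the diagonal entry
        y = np.linalg.solve(A[np.ix_(P, P)], np.eye(len(P))[m])
        G[i, P] = y / np.sqrt(y[m])  # scaling gives unit diagonal of G A G^T
    return G

With the full lower triangular pattern this reproduces the exact inverse Cholesky factor; restricting the pattern trades accuracy for sparsity.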
The central idea of this work is to introduce some sequentiality in the FSAI
computation, in order to use the information extracted from the earlier set-up stages
for the remaining rows. This concept, which is the basis for incomplete factorizations,
has been already introduced in the context of approximate inverses in [8, 25], and
more recently in [12], where both a block tridiagonal and a domain decomposition
approach have been used to improve the FSAI performance. In the present paper, we develop a more general multilevel framework that recovers the former approaches as special cases simply by selecting a proper unknown ordering.
One of the difficulties arising in the multilevel generalization of the FSAI pre-
conditioner is related to the accuracy in the computation of the Schur complement
at each level. If the earlier levels are not well approximated, the resulting Schur
complement can be inaccurate, with a consequent degradation of the solver perfor-
mance. This issue has been recently addressed in the context of multilevel incomplete
factorizations with the aid of low-rank corrections [29]. Low-rank compression algo-
rithms are gaining increasing attention especially in direct linear solution methods,
e.g., [1, 14, 28, 30, 31, 32], with the basic idea of taking advantage of data sparsity in-
stead of structural sparsity. During the factorization process, the off-diagonal blocks,
which in discretized-PDE problems are usually characterized by low-rank properties,
are decomposed by SVD and compressed by neglecting those components correspond-
ing to the smallest singular values. There are several schemes to carry out the com-
pression on the entire matrix, ranging from hierarchical (H-) matrices to block low-rank tree-structured compressions. Typically a drop
tolerance is set below which the singular values are dropped. If this tolerance is suffi-
ciently small, the matrix factorization is cheaper but can still be directly applied to a
right-hand side to get the system solution. With large drop tolerances, the low-rank
approach gives an approximation of the exact factorization and can be used as a pre-
conditioner, e.g., [1]. To our knowledge, the first attempt to directly use low-rank
representations in preconditioning is discussed in [23] in the context of a divide and
conquer strategy.
Within the FSAI multilevel framework presented in this work, low-rank correc-
tions are introduced for both enhancing the preconditioner quality at the earlier
levels (descending low-rank, DLR) and improving the accuracy in the Schur com-
plement computation (ascending low-rank, ALR). The paper is organized as follows.
Algorithm 2.1. Multilevel Factorization Set-up.
1. Function ML_SetUp(nl, A)
2. Set $A_0 = A$;
3. for all $l = 0, \dots, n_l-2$ do
4.   Partition $A_l$ as $\begin{bmatrix} K & B \\ B^T & C \end{bmatrix}$;
5.   Compute $\widetilde{L}$ such that $\widetilde{L}\widetilde{L}^T \simeq K$;
6.   Compute $\widetilde{H}$ such that $\widetilde{H} \simeq L_K^{-1} B$;
7.   Compute $\widetilde{S}$ such that $\widetilde{S} \simeq C - B^T K^{-1} B$;
8.   Form $M_l = \begin{bmatrix} \widetilde{L} & 0 \\ \widetilde{H}^T & I \end{bmatrix}$;
9.   Set $A_{l+1} = \widetilde{S}$;
10. end for
11. Compute $M_{n_l-1}$ such that $M_{n_l-1} M_{n_l-1}^T \simeq A_{n_l-1}$;
12. Set $M = \{M_0, M_1, \dots, M_{n_l-1}\}$;
The Multilevel FSAI (MF) preconditioner is first derived proving its robustness. It is
then demonstrated that the MF quality can be improved if the approximated Schur
complement computed at any level is corrected in order to approximate the precon-
ditioner instead of the original matrix Schur complement. Low-rank corrections are
finally introduced to further improve the MF preconditioner. The properties of this
approach are numerically investigated, and finally a few considerations close the work.
2. Multilevel FSAI preconditioning. Consider a standard multilevel approach applied to the SPD matrix $A$ for a total number $n_l$ of levels. The matrix $A_l$ obtained at any level $l \in [0, n_l-1]$ is the Schur complement of the previous level. Following for instance [17], we can partition $A_l$ in four blocks and perform the factorization:

(2.1)   $A_l = \begin{bmatrix} K & B \\ B^T & C \end{bmatrix} = \begin{bmatrix} L_K & 0 \\ B^T L_K^{-T} & I \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & S \end{bmatrix} \begin{bmatrix} L_K^T & L_K^{-1} B \\ 0 & I \end{bmatrix},$

where $L_K L_K^T$ is the exact Cholesky decomposition of $K$, $K \in \mathbb{R}^{n_1 \times n_1}$, and $S = C - B^T K^{-1} B$, $S \in \mathbb{R}^{n_2 \times n_2}$, is the Schur complement of $A_l$ with respect to the partition $(n_1, n_2)$, i.e., $A_{l+1} \equiv S$. The partitioning of each level can follow some physics-based criterion, if any. As our goal is to get a preconditioner, we can replace each block on the right-hand side of (2.1) with an approximation:

(2.2)   $\widetilde{L} \simeq L_K, \quad \widetilde{H} \simeq L_K^{-1} B, \quad \widetilde{S} \simeq S.$

The recursive computation of (2.1) with the approximations (2.2) provides the general framework for a multilevel preconditioner $M$ of $A$, made by the list of factors $M_l$, $l \in [0, n_l-1]$ (Algorithm 2.1). The preconditioner application stage, i.e., the computation of $w = M^{-1} v$ for some known vector $v$, is provided in Algorithm 2.2.

Because of the recursion in the computation of $M$, we can restrict to the two-level case with no loss of generality. Hence, $A_l$ in (2.1) coincides with $A$ and $n_1 + n_2 = n$. A popular choice is to use an incomplete factorization as the main kernel for the approximations in (2.2), i.e., setting $\widetilde{L}$ equal to the incomplete Cholesky (IC) factor of $K$ as in [26, 22, 17, 4].
Downloaded 01/24/18 to 147.162.110.99. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
126 FRANCESCHINI, MAGRI, FERRONATO, AND JANNA
Algorithm 2.2. Multilevel Factorization Application.
1. Function w = ML_Apply(nl, M, v)
2. Set lev_start = 1;
3. for all $l = 0, \dots, n_l-2$ do
4.   Solve $M_l z = v$;
5.   Set lev_end = lev_start + $n_1$ - 1;
6.   Form w(lev_start : lev_end) = z(1 : $n_1$);
7.   Set v = z($n_1$ + 1 : n);
8.   Update lev_start = lev_end + 1;
9. end for
10. Solve $(M_{n_l-1} M_{n_l-1}^T) z = v$;
11. Form w(lev_start : n) = z;
12. Set lev_end = lev_start - 1;
13. for all $l = n_l-2, \dots, 0$ do
14.   Set lev_start = lev_end - $n_1$ + 1;
15.   Retrieve $w_l$ = w(lev_start : n);
16.   Solve $M_l^T z = w_l$;
17.   Form w(lev_start : n) = z;
18.   Update lev_end = lev_start - 1;
19. end for
In the same framework, the use of FSAI as the main kernel is straightforward. After computing $G$ such that $G^T G \simeq K^{-1}$, the approximations in (2.2) become

(2.3)   $\widetilde{L} \simeq G^{-1}, \quad \widetilde{H} \simeq GB, \quad \widetilde{S} \simeq C - \widetilde{H}^T \widetilde{H}.$

In contrast to what typically happens using an incomplete factorization as the main kernel, no dropping is necessary to compute $\widetilde{H}$ and $\widetilde{S}$ efficiently, because $G$ is usually very sparse. In fact, recall that the sparsity of $G$ is controlled by the user, while that of $\widetilde{L}^{-1}$ is not.
Unfortunately, this straightforward implementation of the MF preconditioner is very prone to breakdowns. In fact, the Schur complement approximation $\widetilde{S} = C - \widetilde{H}^T \widetilde{H}$ is computed as the difference between two SPD matrices and can be indefinite. Such an experience is quite common even in relatively well-conditioned problems. The reason for this resides in the poor approximation of the leftmost eigenvalues usually obtained by FSAI. Figure 1 compares the eigenspectrum of the bcsstk16 matrix (structural problem, $n = 4{,}884$ with 290,378 nonzero entries) from the University of Florida sparse matrix collection [9] with its approximations by IC and FSAI, $\widetilde{L}\widetilde{L}^T$ and $(G^T G)^{-1}$, respectively, where $\widetilde{L}$ and $G$ are computed with the same number of nonzeros. Neither IC nor FSAI is able to capture the smallest eigenvalues of the matrix, though IC is generally better. The smallest eigenvalues of $K$ are the largest of $K^{-1}$, and therefore control the most significant entries of $B^T K^{-1} B$. As a consequence, $\widetilde{H}^T \widetilde{H}$ computed as in (2.3) often fails to approximate accurately the largest entries of $B^T K^{-1} B$, leading to the appearance of negative eigenvalues in $\widetilde{S}$ and causing the breakdown of the procedure. This occurrence can happen using IC as a kernel in a multilevel preconditioning framework as well. To address this issue some stabilization techniques, e.g., based on diagonal shifts [17, 27], have been successfully introduced. Unfortunately, these strategies do not provide satisfactory results when used for the computation of $\widetilde{S}$ with the aid of FSAI as a kernel.
Fig. 1. Comparison between the eigenspectra of $A$, $\widetilde{L}\widetilde{L}^T$, and $(G^T G)^{-1}$ for the bcsstk16 matrix from the University of Florida sparse matrix collection.
The robustness of the MF preconditioner can be ensured by computing $\widetilde{S}$ so as to be SPD independently of $G$. To this aim, we can use the following result.

Theorem 2.1. Let $A \in \mathbb{R}^{n \times n}$ be an SPD $(2 \times 2)$-block matrix,

(2.4)   $A = \begin{bmatrix} K & B \\ B^T & C \end{bmatrix},$

with $K \in \mathbb{R}^{n_1 \times n_1}$, $B \in \mathbb{R}^{n_1 \times n_2}$, and $C \in \mathbb{R}^{n_2 \times n_2}$, and let $V \in \mathbb{R}^{n \times n_2}$ and $D \in \mathbb{R}^{n_2 \times n}$ be the 2-block rectangular matrices

(2.5)   $V = \begin{bmatrix} F^T \\ I \end{bmatrix}, \quad D = \begin{bmatrix} 0 & Z \end{bmatrix},$

with $F \in \mathbb{R}^{n_2 \times n_1}$ and $Z \in \mathbb{R}^{n_2 \times n_2}$, such that the Frobenius norm $\|D - V^T L\|_F$ is minimum for any $Z$, $L$ being the lower Cholesky factor of $A$. Then, $S = V^T A V$ is the Schur complement of $A$ with respect to the partition $(n_1, n_2)$.

Proof. Recalling (2.1), the lower Cholesky factor of $A$ reads

(2.6)   $L = \begin{bmatrix} L_K & 0 \\ B^T L_K^{-T} & L_S \end{bmatrix},$

where $L_S$ is the lower Cholesky factor of the Schur complement of $A$, i.e., $L_S L_S^T = C - B^T K^{-1} B$. The matrix $(D - V^T L)$ is therefore

(2.7)   $D - V^T L = \begin{bmatrix} F L_K + B^T L_K^{-T} & Z - L_S \end{bmatrix},$

whose Frobenius norm is minimum for any $Z$ if $F L_K + B^T L_K^{-T} = 0$; i.e.,

(2.8)   $F = -B^T K^{-1}.$

The matrix $S = V^T A V$ reads

(2.9)   $S = F K F^T + B^T F^T + F B + C.$

Introducing (2.8) into (2.9) provides $V^T A V = C - B^T K^{-1} B$.
Remark 2.2. The matrix $F$ of Theorem 2.1 is generally dense. If (2.8) is enforced only for the entries located in a prescribed set of positions $\widetilde{\mathcal{S}} \subset \mathcal{S} = \{(i, j) : 1 \le i \le n_1, 1 \le j \le n_2\}$, the sparsity of $F$ can be retained at a workable level. This definition of $F$ coincides with the block FSAI preconditioner introduced in [18] and [16], where the nonzero pattern $\widetilde{\mathcal{S}}$ is defined either statically or dynamically during the computation of $F$. Using a sparse $F$ in (2.9) produces an approximation $\widetilde{S}$ of the exact Schur complement $S$ of $A$.

Corollary 2.3. The Schur complement approximation $\widetilde{S}$ computed with (2.9) and a sparse block FSAI $F$ is SPD.

Proof. The expression of $\widetilde{S}$ can be easily rearranged by adding and subtracting $B^T K^{-1} B$:

(2.10)   $\widetilde{S} = C - B^T K^{-1} B + (F + B^T K^{-1}) K (K^{-T} B + F^T) = S + W^T K W.$

The result immediately follows by noting that $W^T K W$ is SPD.
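Theorem 2.1 and Corollary 2.3 are easy to verify numerically; the following sketch (a random dense SPD matrix, with the sparse $F$ obtained by crudely dropping entries of the exact one instead of computing a block FSAI) is only meant as a sanity check of (2.8)-(2.10).

import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 60, 40
M = rng.standard_normal((n1 + n2, n1 + n2))
A = M @ M.T + (n1 + n2) * np.eye(n1 + n2)          # SPD test matrix
K, B, C = A[:n1, :n1], A[:n1, n1:], A[n1:, n1:]
S = C - B.T @ np.linalg.solve(K, B)                # exact Schur complement

def S_tilde(F):
    return F @ K @ F.T + B.T @ F.T + F @ B + C     # V^T A V, eq. (2.9)

F_exact = -np.linalg.solve(K, B).T                 # F = -B^T K^{-1}, eq. (2.8)
print(np.allclose(S_tilde(F_exact), S))            # True: Theorem 2.1
F_sparse = np.where(np.abs(F_exact) > 1e-3, F_exact, 0.0)
d = np.linalg.eigvalsh(S_tilde(F_sparse) - S)
print(d.min() >= -1e-10)                           # True: S~ - S = W^T K W >= 0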
Based on these results, the MF preconditioner is built as follows. The zero-level preconditioner $M_0^{-1}$ is made by two factors:

(2.11)   $M_0^{-1} = P_b P_a.$

An explicit approximation of $K^{-1}$ is computed as $G^T G$ using an adaptive FSAI procedure [20] and introduced in $P_a$:

(2.12)   $P_a = \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix}.$

Then, the preconditioned matrix $P_a A P_a^T$ is computed,

(2.13)   $P_a A P_a^T = \begin{bmatrix} G K G^T & G B \\ B^T G^T & C \end{bmatrix},$

and the adaptive block FSAI [16] of $P_a A P_a^T$ is computed for the second factor:

(2.14)   $P_b = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix}.$

The zero-level preconditioned matrix $M_0^{-1} A M_0^{-T}$ reads

(2.15)   $M_0^{-1} A M_0^{-T} = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix} \begin{bmatrix} G K G^T & G B \\ B^T G^T & C \end{bmatrix} \begin{bmatrix} I & F^T \\ 0 & I \end{bmatrix} = \begin{bmatrix} G K G^T & -R_F^T \\ -R_F & \widetilde{S} \end{bmatrix},$

where $R_F = -F(G K G^T) - B^T G^T$ is the residual on $F$; i.e., $R_F$ approaches the null matrix as the accuracy in the computation of $F$ increases. Finally, the (2,2) block of $M_0^{-1} A M_0^{-T}$ is the approximation of the first-level Schur complement

(2.16)   $\widetilde{S} = C + F G B + B^T G^T F^T + F G K G^T F^T$

that becomes the new matrix for the next level. As $\widetilde{S}$ in (2.16) is SPD for any $F$ and $G$ (see Corollary 2.3), no breakdown is possible. The operations required for building the robust MF preconditioner are provided in Algorithm 2.3.
Algorithm 2.3. FSAI-based Multilevel Preconditioner Set-up.
1. Function MF_SetUp(nl, A)
2. Set $A_0 = A$;
3. for all $l = 0, \dots, n_l-2$ do
4.   Partition $A_l$ as $\begin{bmatrix} K & B \\ B^T & C \end{bmatrix}$;
5.   Compute the adaptive FSAI approximation $G$ of $K$ such that $G^T G \simeq K^{-1}$;
6.   Set $P_a = \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix}$;
7.   Compute $P_b = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix}$, the adaptive block FSAI approximation of $P_a A_l P_a^T$;
8.   Compute $\widetilde{S} = C + F G B + B^T G^T F^T + F G K G^T F^T$;
9.   Form $M_l^{-1} = P_b P_a$;
10.  Set $A_{l+1} = \widetilde{S}$;
11. end for
12. Compute $M_{n_l-1}^{-1}$ as the adaptive FSAI approximation of $A_{n_l-1}$;
13. Set $M^{-1} = \{M_0^{-1}, M_1^{-1}, \dots, M_{n_l-1}^{-1}\}$;
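A dense sketch of one level of Algorithm 2.3 follows; the adaptive FSAI and adaptive block FSAI kernels are replaced, for illustration only, by a generic user-supplied fsai routine (such as the sketch in the introduction) and by a sparsified exact solve, so the code mirrors the structure of the algorithm rather than its actual kernels.

import numpy as np

def mf_level(Al, n1, fsai, drop=0.05):
    # One level of Algorithm 2.3 (dense sketch).
    # Al : current-level SPD matrix; n1 : size of the (1,1) block K
    # fsai : routine returning G with G^T G ~ K^{-1}
    K, B, C = Al[:n1, :n1], Al[:n1, n1:], Al[n1:, n1:]
    G = fsai(K)
    # stand-in for the adaptive block FSAI of P_a A_l P_a^T:
    # F ~ -B^T G^T (G K G^T)^{-1}, then crudely sparsified
    F = -np.linalg.solve(G @ K @ G.T, G @ B).T
    F = np.where(np.abs(F) > drop, F, 0.0)
    # equation (2.16): SPD by construction (Corollary 2.3)
    S_tilde = C + F @ G @ B + B.T @ G.T @ F.T + F @ G @ K @ G.T @ F.T
    return G, F, S_tilde

Recursing on S_tilde until the last level, where a plain FSAI of the remaining matrix is computed, yields the whole set-up.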
2.1. Theoretical properties. In this section, we will obtain some theoretical bounds on the eigenspectrum of the preconditioned matrix according to the different approximations introduced in the MF computation. At every level of the MF preconditioner set-up, the partial factorization of the approximated Schur complement (2.16) is computed. This operation introduces additional approximations level after level, potentially yielding a Schur complement quite different from the exact one.

Consider the zero-level preconditioner $M_0^{-1}$ of (2.11):

(2.17)   $M_0^{-1} = \begin{bmatrix} G & 0 \\ F G & I \end{bmatrix}.$

The natural choice for the next level preconditioner is

(2.18)   $\widetilde{M}_1^{-1} = \begin{bmatrix} I & 0 \\ 0 & L_{\widetilde{S}}^{-1} \end{bmatrix},$

where $L_{\widetilde{S}} L_{\widetilde{S}}^T = \widetilde{S}$. However, $\widetilde{S}$ is an approximated Schur complement. Should $S$ be available, one could use

(2.19)   $M_1^{-1} = \begin{bmatrix} I & 0 \\ 0 & L_S^{-1} \end{bmatrix}$

with $L_S L_S^T = S$. Using either $\widetilde{M}_1^{-1}$ or $M_1^{-1}$ as the next level preconditioner of $A$ leads to a different performance. The two propositions that follow provide a theoretical upper bound for the eigenvalues of the preconditioned matrices $\widetilde{M}^{-1} A \widetilde{M}^{-T}$ and $M^{-1} A M^{-T}$, in order to allow for an a priori assessment of the preconditioner quality.

Proposition 2.4. The eigenvalues $\lambda$ of the preconditioned matrix $\widetilde{M}^{-1} A \widetilde{M}^{-T}$, with $\widetilde{M}^{-1} = \widetilde{M}_1^{-1} M_0^{-1}$ as defined in (2.17) and (2.18), satisfy

(2.20)   $|\lambda - 1| \le \frac{\|E_K\| + \sqrt{\|E_K\|^2 + 4 \|\widetilde{Q}^T\| \, \|\widetilde{Q}\|}}{2},$
where $\widetilde{Q} = G(K G^T F^T + B) L_{\widetilde{S}}^{-T} = -R_F^T L_{\widetilde{S}}^{-T}$ and $E_K = G K G^T - I$, for any consistent matrix norm.

Proof. The preconditioned matrix $\widetilde{M}^{-1} A \widetilde{M}^{-T}$ reads

(2.21)   $\widetilde{M}^{-1} A \widetilde{M}^{-T} = \begin{bmatrix} G K G^T & G(K G^T F^T + B) L_{\widetilde{S}}^{-T} \\ L_{\widetilde{S}}^{-1} (F G K + B^T) G^T & I \end{bmatrix} = \begin{bmatrix} I + E_K & \widetilde{Q} \\ \widetilde{Q}^T & I \end{bmatrix}.$

Its eigenpairs $(\lambda, w)$, $w = [u, v]^T$, satisfy by definition the relationship

(2.22)   $\begin{bmatrix} I + E_K & \widetilde{Q} \\ \widetilde{Q}^T & I \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \lambda \begin{bmatrix} u \\ v \end{bmatrix},$

which is equivalent to

(2.23)   $\begin{bmatrix} E_K & \widetilde{Q} \\ \widetilde{Q}^T & 0 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = (\lambda - 1) \begin{bmatrix} u \\ v \end{bmatrix}.$

Taking consistent norms at both sides of the first and second set of equations we have

(2.24)   $\begin{cases} \|E_K\| \, \|u\| + \|\widetilde{Q}\| \, \|v\| \ge |\lambda - 1| \, \|u\|, \\ \|\widetilde{Q}^T\| \, \|u\| \ge |\lambda - 1| \, \|v\|, \end{cases}$

which, by setting $t = \|v\| / \|u\|$, can be rearranged as

(2.25)   $\begin{cases} |\lambda - 1| \le \|E_K\| + \|\widetilde{Q}\| \, t, \\ |\lambda - 1| \le \|\widetilde{Q}^T\| / t. \end{cases}$

If $\|u\| = 0$, then trivially $v \in \operatorname{Ker}(\widetilde{Q})$ and $\lambda = 1$, thus satisfying the inequality (2.20). The right-hand side of the first and second inequality in (2.25) increases and decreases monotonically with $t$, respectively (Figure 2). The intersection point is

(2.26)   $\bar{t} = \frac{-\|E_K\| + \sqrt{\|E_K\|^2 + 4 \|\widetilde{Q}^T\| \, \|\widetilde{Q}\|}}{2 \|\widetilde{Q}\|},$

and thus for any $t$ we have

(2.27)   $|\lambda - 1| \le \frac{\|E_K\| + \sqrt{\|E_K\|^2 + 4 \|\widetilde{Q}^T\| \, \|\widetilde{Q}\|}}{2}.$
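Since $E_K$ and $\widetilde{Q}$ are explicitly available in a two-level setting, the bound (2.20) can be checked directly; a sketch in the 2-norm, assuming the blocks of (2.21) have been assembled as dense arrays ($E_K$ is symmetric, being $G K G^T - I$):

import numpy as np

def bound_2_20(EK, Qt):
    # right-hand side of (2.20) in the 2-norm (||Q~^T||_2 = ||Q~||_2)
    e = np.linalg.norm(EK, 2)
    q = np.linalg.norm(Qt, 2)
    return (e + np.sqrt(e**2 + 4 * q * q)) / 2

def check_prop_2_4(EK, Qt):
    # assemble (2.21) and compare its spectrum against the bound
    n1, n2 = Qt.shape
    P = np.block([[np.eye(n1) + EK, Qt], [Qt.T, np.eye(n2)]])
    lam = np.linalg.eigvalsh(P)
    return np.abs(lam - 1).max() <= bound_2_20(EK, Qt) + 1e-12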
Proposition 2.5. The eigenvalues $\lambda$ of the preconditioned matrix $M^{-1} A M^{-T}$, with $M^{-1} = M_1^{-1} M_0^{-1}$ as defined in (2.11) and (2.19), satisfy

(2.28)   $|\lambda - 1| \le \frac{\|E_K\| + \|Q^T\| \, \|(I + E_K)^{-1}\| \, \|Q\| + \sqrt{\left( \|E_K\| - \|Q^T\| \, \|(I + E_K)^{-1}\| \, \|Q\| \right)^2 + 4 \|Q^T\| \, \|Q\|}}{2},$

where $Q = G(K G^T F^T + B) L_S^{-T} = -R_F^T L_S^{-T}$ and $E_K = G K G^T - I$, for any consistent matrix norm.
Fig. 2. Schematic representation of the system of inequalities (2.25).
Proof. Operating as in the proof of Proposition 2.4, it can be shown that the eigenpairs of $M^{-1} A M^{-T}$ satisfy

(2.29)   $\begin{bmatrix} I + E_K & Q \\ Q^T & I + Q^T (I + E_K)^{-1} Q \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \lambda \begin{bmatrix} u \\ v \end{bmatrix},$

which is equivalent to

(2.30)   $\begin{bmatrix} E_K & Q \\ Q^T & Q^T (I + E_K)^{-1} Q \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = (\lambda - 1) \begin{bmatrix} u \\ v \end{bmatrix}.$

Again, taking consistent norms on both sides of the first and second set of equations, and setting $t = \|v\| / \|u\|$, we get

(2.31)   $\begin{cases} |\lambda - 1| \le \|E_K\| + \|Q\| \, t, \\ |\lambda - 1| \le \|Q^T\| / t + \|Q^T\| \, \|(I + E_K)^{-1}\| \, \|Q\|. \end{cases}$

If $\|u\| = 0$, then trivially $v \in \operatorname{Ker}(Q)$ and $\lambda = 1$, thus satisfying the inequality (2.28). Otherwise, denoting by $\bar{t}$ the intersection point between the right-hand sides of (2.31),

(2.32)   $\bar{t} = \frac{-\|E_K\| + \|Q^T\| \, \|(I + E_K)^{-1}\| \, \|Q\| + \sqrt{\left( \|E_K\| - \|Q^T\| \, \|(I + E_K)^{-1}\| \, \|Q\| \right)^2 + 4 \|Q^T\| \, \|Q\|}}{2 \|Q\|},$

for any $t$ we have

(2.33)   $|\lambda - 1| \le \frac{\|E_K\| + \|Q^T\| \, \|(I + E_K)^{-1}\| \, \|Q\| + \sqrt{\left( \|E_K\| - \|Q^T\| \, \|(I + E_K)^{-1}\| \, \|Q\| \right)^2 + 4 \|Q^T\| \, \|Q\|}}{2}.$
Now, to understand which preconditioner, either $\widetilde{M}^{-1}$ or $M^{-1}$, is expected to ensure a faster convergence, we compare the upper bounds provided in Propositions 2.4 and 2.5. These bounds depend on the norms of $E_K$ and $Q$ or $\widetilde{Q}$, which in turn
are controlled by the accuracy in the computation of $G$ and $F$, respectively. As $G^T G$ approaches $K^{-1}$, $\|E_K\| \to 0$, and in the limit the bounds (2.20) and (2.28) read

(2.34)   $|\lambda - 1| \le \sqrt{\|\widetilde{Q}\| \, \|\widetilde{Q}^T\|}$

and

(2.35)   $|\lambda - 1| \le \frac{1}{2} \left( \|Q\| \, \|Q^T\| + \sqrt{\|Q\| \, \|Q^T\| \left( 4 + \|Q\| \, \|Q^T\| \right)} \right),$

respectively. Similarly, as $F$ approaches $-B^T K^{-1} G^{-1}$, $\|Q\|, \|\widetilde{Q}\| \to 0$ and the bounds (2.20) and (2.28) trivially provide

(2.36)   $|\lambda - 1| \le \|E_K\|.$

The previous relationships hold true for any consistent matrix norm. However, to compare the bounds in an easier way, we restrict our attention to the matrix norm induced by the 2-norm of vectors. In this case we have

(2.37)   $\|E_K\|_2 = \lambda_1(E_K) = \epsilon_1, \quad \|Q\|_2 = \sqrt{\lambda_1(Q^T Q)} = \eta_1, \quad \|\widetilde{Q}\|_2 = \sqrt{\lambda_1(\widetilde{Q}^T \widetilde{Q})} = \tilde{\eta}_1, \quad \|(I + E_K)^{-1}\|_2 = \lambda_n^{-1}(G K G^T) = \kappa_n^{-1},$

and the bounds (2.20) and (2.28), respectively, become

(2.38)   $|\lambda - 1| \le \frac{\epsilon_1 + \sqrt{\epsilon_1^2 + 4 \tilde{\eta}_1^2}}{2},$

(2.39)   $|\lambda - 1| \le \frac{\epsilon_1 + \eta_1^2 \kappa_n^{-1} + \sqrt{\left( \epsilon_1 - \eta_1^2 \kappa_n^{-1} \right)^2 + 4 \eta_1^2}}{2}.$

Remark 2.6. When $\epsilon_1 = 0$, it is easy to prove that the maximum and minimum eigenvalues of $\widetilde{M}^{-1} A \widetilde{M}^{-T}$ are $1 + \tilde{\eta}_1$ and $1 - \tilde{\eta}_1$, respectively, and the maximum and minimum eigenvalues of $M^{-1} A M^{-T}$ are $1 + (\eta_1^2 + \sqrt{\eta_1^4 + 4 \eta_1^2})/2$ and $1 + (\eta_1^2 - \sqrt{\eta_1^4 + 4 \eta_1^2})/2$, respectively.
The following results suggest the use of $\widetilde{M}^{-1}$ instead of $M^{-1}$ as the MF preconditioner of $A$.

Theorem 2.7. For any choice of $G$ and $F$ in (2.17), the bound (2.38) is narrower than or equal to the bound (2.39).

Proof. Using the arguments of Corollary 2.3, it follows that $\widetilde{S} = S + \widetilde{H}$, with $\widetilde{H}$ a symmetric positive semidefinite matrix, hence $\|S \widetilde{S}^{-1}\|_2 \le 1$. In particular,

(2.40)   $\widetilde{S} = C + F G K G^T F^T + F G B + B^T G^T F^T = C - B^T K^{-1} B + (F + B^T K^{-1} G^{-1})(G K G^T)(F + B^T K^{-1} G^{-1})^T = S + R_F (G K G^T)^{-1} R_F^T.$

Moreover, the matrix $\widetilde{Q}^T \widetilde{Q}$ is similar to $L_S^T \widetilde{S}^{-1} L_S Q^T Q$. In fact, recalling that $Q = -R_F^T L_S^{-T}$ and $\widetilde{Q} = -R_F^T L_{\widetilde{S}}^{-T}$, we obtain

(2.41)   $R_F R_F^T = L_S Q^T Q L_S^T = L_{\widetilde{S}} \widetilde{Q}^T \widetilde{Q} L_{\widetilde{S}}^T,$
from which the similarity follows. Hence

(2.42)   $\|\widetilde{Q}^T \widetilde{Q}\|_2 = \|L_S^T \widetilde{S}^{-1} L_S Q^T Q\|_2 \le \|L_S^T \widetilde{S}^{-1} L_S\|_2 \, \|Q^T Q\|_2 \le \|Q^T Q\|_2$

because $L_S^T \widetilde{S}^{-1} L_S$ is similar to $S \widetilde{S}^{-1}$. As a consequence, $\tilde{\eta}_1 = \alpha \eta_1$ for some $\alpha \le 1$. The thesis of the theorem reads

(2.43)   $\frac{\epsilon_1 + \sqrt{\epsilon_1^2 + 4 \tilde{\eta}_1^2}}{2} \le \frac{\epsilon_1 + \eta_1^2 \kappa_n^{-1} + \sqrt{\left( \epsilon_1 - \eta_1^2 \kappa_n^{-1} \right)^2 + 4 \eta_1^2}}{2}.$

Introducing $\tilde{\eta}_1 = \alpha \eta_1$ in (2.43), after some algebra we obtain

(2.44)   $\alpha^2 \le 1 + \frac{\kappa_n^{-1} \left( \sqrt{\left( \epsilon_1 - \eta_1^2 \kappa_n^{-1} \right)^2 + 4 \eta_1^2} - \epsilon_1 + \eta_1^2 \kappa_n^{-1} \right)}{2},$

which holds true for any $F$ and $G$.

Theorem 2.7 suggests that the use of $\widetilde{S}$ in the MF preconditioner is likely to be more appropriate than the exact Schur complement $S$. In particular, in the theoretical case of $G^T G = K^{-1}$ it is possible to compute explicitly the ratio between the conditioning numbers of the preconditioned matrices $M^{-1} A M^{-T}$ and $\widetilde{M}^{-1} A \widetilde{M}^{-T}$ as follows.

Theorem 2.8. If $E_K = 0$, the ratio between the conditioning number of the preconditioned matrices (2.29) and (2.21) is

(2.45)   $r(\eta_1) = \frac{\left( 1 + \dfrac{\eta_1^2 + \sqrt{\eta_1^4 + 4 \eta_1^2}}{2} \right)^2}{1 + 2 \eta_1^2 + 2 \sqrt{\eta_1^4 + \eta_1^2}}.$
Proof. If $E_K = 0$, it can be easily verified from (2.40) that $\widetilde{S} = S + R_F R_F^T$ and

(2.46)   $L_S^{-1} \widetilde{S} L_S^{-T} = I + L_S^{-1} R_F R_F^T L_S^{-T} = I + Q^T Q.$

Recalling from the proof of Theorem 2.7 that $\widetilde{Q}^T \widetilde{Q}$ is similar to $L_S^T \widetilde{S}^{-1} L_S Q^T Q$, we have

(2.47)   $\|\widetilde{Q}^T \widetilde{Q}\|_2 = \|(I + Q^T Q)^{-1} Q^T Q\|_2.$

Trivially, $(I + Q^T Q)^{-1} Q^T Q$ is symmetric positive definite and has the same eigenvectors as $Q^T Q$. Denoting by $\lambda$ an eigenvalue of $Q^T Q$, the norm $\|(I + Q^T Q)^{-1} Q^T Q\|_2$ is the maximum of the function

(2.48)   $f(\lambda) = \frac{\lambda}{1 + \lambda}.$

As $f(\lambda)$ monotonically increases with $\lambda$, its maximum value is attained for the largest eigenvalue of $Q^T Q$, i.e., $\|Q^T Q\|_2 = \eta_1^2$. Hence

(2.49)   $\|\widetilde{Q}^T \widetilde{Q}\|_2 = \frac{\|Q^T Q\|_2}{1 + \|Q^T Q\|_2} \;\Rightarrow\; \tilde{\eta}_1 = \sqrt{\frac{\eta_1^2}{1 + \eta_1^2}}.$

The proof is completed by introducing (2.49) in the results of Remark 2.6.

Remark 2.9. The ratio $r(\eta_1)$ monotonically increases with $\eta_1$ and takes value 1 for $\eta_1 = 0$, i.e., $F = -B^T K^{-1} G^{-1}$ and $\widetilde{S} = S$. For $\eta_1 \to \infty$, $r$ monotonically diverges to infinity as the second power of $\eta_1$. Hence, the sparser or more inaccurate $F$, the more important is using $\widetilde{S}$ instead of $S$ in the MF preconditioner.
3. Improving the MF performance with low-rank corrections. The framework developed in section 2 provides the formulation for a robust multilevel FSAI preconditioner that can be computed in a stable way for any choice of the fill-in degree used for the basic kernels. Using low fill-in degrees obviously keeps the computational cost for the preconditioner set-up and application under control, but may yield a poor convergence. The quality of the MF preconditioner can be improved by using low-rank corrections. The idea of using low-rank corrections in a multilevel framework has been already introduced in [29], giving rise to the robust multilevel Schur complement-based low-rank (MSLR) preconditioner. The basic concept can be briefly recalled as follows. Define the matrix

(3.1)   $Y = L_C^{-1} B^T K^{-1} B L_C^{-T} = L_C^{-1} (C - S) L_C^{-T},$

where $L_C$ is the exact lower factor of $C$, i.e., $C = L_C L_C^T$. It is easily recognized that the eigenvalues $\sigma_i$ of $Y$ are such that

(3.2)   $0 \le \sigma_{n_2} \le \cdots \le \sigma_1 < 1.$

The separation of the eigenvalues $\theta_i$ of $X = L_C^T (S^{-1} - C^{-1}) L_C$ is larger than that of $Y$, because

(3.3)   $\theta_i = \frac{\sigma_i}{1 - \sigma_i}, \quad i = 1, \dots, n_2, \qquad \theta_i - \theta_{i+1} = \frac{\sigma_i - \sigma_{i+1}}{(1 - \sigma_i)(1 - \sigma_{i+1})}, \quad i = 1, \dots, n_2 - 1.$

The separation of the eigenvalues of $L_C^{-T} X L_C^{-1} = S^{-1} - C^{-1}$ has a stronger impact on the performance of the MSLR preconditioner [29]; however, studying $X$ is easier and the main results for $X$ are to some extent still valid for $S^{-1} - C^{-1}$. Equation (3.3) suggests that approximating $(S^{-1} - C^{-1})$ with a low-rank matrix is easier than $(S - C)$ because of the faster eigenvalue decay. A better approximation of $S^{-1}$ can be computed as

(3.4)   $S^{-1} \simeq C^{-1} + W_k \Theta_k W_k^T$

with $W_k \Theta_k W_k^T$ a rank-$k$ approximation of $L_C^{-T} X L_C^{-1}$, which can be obtained from the eigendecomposition of $Y$. In fact, by retaining the $k$ largest eigenvalues and corresponding eigenvectors of $Y$ we can write

(3.5)   $Y \simeq U_k \Sigma_k U_k^T.$

Noting that

(3.6)   $S^{-1} - C^{-1} = L_C^{-T} \left[ (I - Y)^{-1} - I \right] L_C^{-1} = L_C^{-T} \left[ Y (I - Y)^{-1} \right] L_C^{-1},$

the rank-$k$ correction to $S^{-1} - C^{-1}$ is found by setting

(3.7)   $\Theta_k = \Sigma_k (I - \Sigma_k)^{-1} \quad \text{and} \quad W_k = L_C^{-T} U_k.$
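A dense sketch of the construction (3.4)-(3.7) follows; full eigendecompositions are used for clarity, whereas in practice only the $k$ largest eigenpairs of $Y$ would be computed, e.g., by a Lanczos method.

import numpy as np
from scipy.linalg import cholesky, eigh

def mslr_correction(C, S, k):
    # rank-k correction so that C^{-1} + Wk Theta_k Wk^T ~ S^{-1}
    LC = cholesky(C, lower=True)                             # C = L_C L_C^T
    Y = np.linalg.solve(LC, np.linalg.solve(LC, C - S).T).T  # L_C^{-1}(C-S)L_C^{-T}
    sig, U = eigh(Y)                                         # eigenvalues in [0, 1)
    sig, U = sig[::-1][:k], U[:, ::-1][:, :k]                # keep the k largest
    Theta = np.diag(sig / (1.0 - sig))                       # eq. (3.7)
    Wk = np.linalg.solve(LC.T, U)                            # W_k = L_C^{-T} U_k
    return Wk, Theta

The corrected action of $S^{-1}$ on a vector v is then the solve with $C$ plus the low-rank update Wk @ Theta @ Wk.T @ v.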
In the original formulation outlined above, the low-rank corrections are used to make the action of $C^{-1}$ closer to that of $S^{-1}$. By distinction, we use low-rank corrections to improve the action of $\widetilde{S}^{-1}$. A consequence of Corollary 2.3 is that the eigenvalues $\sigma_i$ of the matrix

(3.8)   $Y = L_{\widetilde{S}}^{-1} (\widetilde{S} - S) L_{\widetilde{S}}^{-T}$

satisfy the condition (3.2). Thus, following the procedure outlined above for $C$, we can compute the $k$ largest eigenpairs of $Y$,

(3.9)   $Y \simeq U_k \Sigma_k U_k^T,$

and get the expression of the corrected Schur complement inverse:

(3.10)   $S^{-1} \simeq \widetilde{S}^{-1} + W_k \Theta_k W_k^T,$

where $\Theta_k = \Sigma_k (I - \Sigma_k)^{-1}$ and $W_k = L_{\widetilde{S}}^{-T} U_k$. However, the idea of correcting the application of $\widetilde{S}^{-1}$ so as to better resemble that of $S^{-1}$ is not that good. First, the computation of (3.9) may be quite expensive, since every multiplication by $S$ requires a solution of a linear system with $K$. Second, according to Theorems 2.7 and 2.8, the use of $S^{-1}$ in our multilevel framework is not optimal.

The low-rank corrections can be more effectively implemented in another way. Since we are working in a multilevel framework, $\widetilde{S}^{-1}$ will not be used exactly. Rather, a new approximation, say $\widehat{S}^{-1} \simeq \widetilde{S}^{-1}$, will be computed. Thus, $\widehat{S}$ will be the new target of the low-rank correction. Moreover, as shown by (2.20), reducing $\|E_K\|$ is also useful for improving the convergence. This task too can be performed by a low-rank correction. Therefore, we use two low-rank correction techniques for the preconditioner set-up:
• Descending low-rank corrections: computed at each level, from the first to the last, to reduce $\|E_K\|$;
• Ascending low-rank corrections: computed at each level, from the last to the first, to reduce the gap between $\widehat{S}$ and $\widetilde{S}$.
3.1. Descending low-rank corrections. The aim of this correction is to enhance the approximation of the inverse of $K$. We can define the matrix

(3.11)   $Y = G \left[ (G^T G)^{-1} - K \right] G^T = I - G K G^T,$

obtained from (3.8), where $(G^T G)^{-1}$ and $K$ replace $\widetilde{S}$ and $S$, respectively, and compute its rank-$k$ approximation:

(3.12)   $Y \simeq U_k \Sigma_k U_k^T.$

Note that the computation of $U_k$ and $\Sigma_k$ is less expensive than in (3.9), because both $G$ and $K$ are explicitly known. The enhanced preconditioner for $K$ reads

(3.13)   $K^{-1} \simeq G^T G + W_k \Theta_k W_k^T.$

The eigenvalues of $Y$ are bounded from above by 1, as $G K G^T$ is positive definite, but there is no lower bound in this case. Actually this is not a problem, as we are mainly interested in the computation of the eigenvalues $\sigma_i$ of $Y$ closest to 1.
From the implementation point of view, it is better to dispose of a symmetrically split operator. Hence, we define

(3.14)   $\widetilde{G} = (I + U_k \Psi_k U_k^T) G$

in order to have

(3.15)   $\widetilde{G}^T \widetilde{G} = G^T (I + U_k \Psi_k U_k^T)(I + U_k \Psi_k U_k^T) G = G^T G + W_k \Theta_k W_k^T.$

Recalling that $W_k = G^T U_k$, the diagonal $k \times k$ matrix $\Psi_k$ is simply found by solving

(3.16)   $I + 2 U_k \Psi_k U_k^T + U_k \Psi_k^2 U_k^T = I + U_k \Theta_k U_k^T.$

Using (3.3), the entries of $\Psi_k$ read

(3.17)   $\psi_i = -1 + \sqrt{\frac{1}{1 - \sigma_i}}, \quad i = 1, \dots, k.$

Since $G K G^T$ is positive definite, $\sigma_i < 1$ and $\psi_i$ is real for any $i$.

This correction can significantly reduce $\|E_K\|$. However, the update in $\widetilde{G}$ propagates to the other blocks of the preconditioned matrix, potentially shattering the overall procedure efficiency:

(3.18)   $P_a A P_a^T = \begin{bmatrix} \widetilde{G} K \widetilde{G}^T & \widetilde{G} B \\ B^T \widetilde{G}^T & C \end{bmatrix} = \begin{bmatrix} G K G^T & G B \\ B^T G^T & C \end{bmatrix} + \begin{bmatrix} U_k \Psi_k U_k^T G K G^T + G K G^T U_k \Psi_k U_k^T + U_k \Psi_k U_k^T G K G^T U_k \Psi_k U_k^T & U_k \Psi_k U_k^T G B \\ B^T G^T U_k \Psi_k U_k^T & 0 \end{bmatrix}.$

Actually, this is not the case. In fact, the block FSAI $\widetilde{F}$ computed as the (2,1) block of $P_b$ when using $\widetilde{G}$ is the approximate solution of the multiple right-hand-side system

(3.19)   $\widetilde{F}^T \simeq -(\widetilde{G} K \widetilde{G}^T)^{-1} \widetilde{G} B = -\widetilde{G}^{-T} K^{-1} B.$

By expanding (3.19) with the definition of $\widetilde{G}$, we note that $\widetilde{F}$ can be easily found as

(3.20)   $\widetilde{F} = F (I + U_k \Psi_k U_k^T)^{-1},$

where $F$ is the standard block FSAI computed using $G$. Moreover, from (3.20) and (3.14) we notice that $F G = \widetilde{F} \widetilde{G}$. This implies that the approximate Schur complement (2.16) is not affected by the use of $\widetilde{F}$ and $\widetilde{G}$:

(3.21)   $\widetilde{S} = C + \widetilde{F} \widetilde{G} B + B^T \widetilde{G}^T \widetilde{F}^T + \widetilde{F} \widetilde{G} K \widetilde{G}^T \widetilde{F}^T = C + F G B + B^T G^T F^T + F G K G^T F^T.$

As a consequence, descending low-rank corrections on $G$ have only a local impact, with no changes for the following levels.
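A sketch of the descending correction (3.11)-(3.17) follows, again with a full dense eigendecomposition standing in for the partial one used in practice.

import numpy as np
from scipy.linalg import eigh

def dlr_correct(G, K, k):
    # return G~ = (I + Uk Psi_k Uk^T) G, improving G^T G ~ K^{-1}
    n = K.shape[0]
    Y = np.eye(n) - G @ K @ G.T                 # eq. (3.11)
    sig, U = eigh(Y)
    sig, U = sig[::-1][:k], U[:, ::-1][:, :k]   # k eigenvalues closest to 1
    psi = -1.0 + 1.0 / np.sqrt(1.0 - sig)       # eq. (3.17), real since sig < 1
    return (np.eye(n) + (U * psi) @ U.T) @ G    # eq. (3.14)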
3.2. Ascending low-rank corrections. We define the matrix $\widehat{Y}$ and compute its rank-$k$ approximation:

(3.22)   $\widehat{Y} = I - \widehat{G} \widetilde{S} \widehat{G}^T \simeq \widehat{U}_k \widehat{\Sigma}_k \widehat{U}_k^T,$
where $\widehat{G}$ is the lower inverse factor of $\widehat{S}$, i.e., $(\widehat{G}^T \widehat{G})^{-1} = \widehat{S}$, which is explicitly available from the approximation of the lower levels. The computation of (3.22) is relatively cheap, because the explicit expression of every matrix is known. The new approximation to $\widetilde{S}$ is given by

(3.23)   $\widetilde{S}^{-1} \simeq \widehat{G}^T \widehat{G} + \widehat{W}_k \widehat{\Theta}_k \widehat{W}_k^T,$

where $\widehat{W}_k = \widehat{G}^T \widehat{U}_k$ and $\widehat{\Theta}_k = \widehat{\Sigma}_k (I - \widehat{\Sigma}_k)^{-1}$. Notice that during the set-up we use a split update, as done for the descending low-rank corrections, because it is operatively necessary to compute $\widehat{G} \widetilde{S} \widehat{G}^T$. However, during the preconditioner application, the use of (3.23) is more efficient as it only requires one update.
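In the application phase, (3.23) amounts to one factored product plus one low-rank update; a minimal sketch:

import numpy as np

def alr_apply(G_hat, W_hat, Theta_hat, v):
    # apply eq. (3.23): S~^{-1} v ~ G^T G v + W Theta W^T v
    return G_hat.T @ (G_hat @ v) + W_hat @ (Theta_hat @ (W_hat.T @ v))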
4. Numerical results. The multilevel FSAI preconditioner with low-rank corrections (MFLR) can be implemented and applied through recursive functions (Algorithms 4.1 and 4.2, respectively). The key parameters for the MFLR computation and application are the matrix $A$, the number of levels $n_l$, the current level index $l$, and the sizes of the ascending and descending low-rank corrections, alr and dlr. Other threshold parameters are also needed by the inner kernels, such as those required by the computation of $G$ and $F$ and the level-of-fill control of the matrix-matrix products.
Algorithm 4.1. MFLR Set-up.
1. Recursive Function MFLR_SetUp(l, nl, A, alr, dlr)
2. if $l < (n_l - 1)$ then
3.   Partition $A_l$ as $\begin{bmatrix} K & B \\ B^T & C \end{bmatrix}$;
4.   Compute the adaptive FSAI approximation $G$ of $K$ such that $G^T G \simeq K^{-1}$;
5.   Set $P_a = \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix}$;
6.   Compute $P_b = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix}$ as the adaptive block FSAI approximation of $P_a A_l P_a^T$;
7.   Compute $\widetilde{S} = C + F G B + B^T G^T F^T + F G K G^T F^T$;
8.   Compute a rank-$k$ approximation $U_k \Sigma_k U_k^T$ of $Y = I - G K G^T$;
9.   Set $\widetilde{G} = (I + U_k \Psi_k U_k^T) G$ with $\Psi_k = (I - \Sigma_k)^{-1/2} - I$;
10.  Set $P = \begin{bmatrix} \widetilde{G} & 0 \\ F G & I \end{bmatrix}$;
11.  [Qalr, QM] = MFLR_SetUp(l+1, nl, $\widetilde{S}$, alr, dlr);
12.  Use Qalr and QM to compute the rank-$k$ correction $\widehat{W}_k$ and $\widehat{\Theta}_k$ as in (3.23);
13.  Set $\widetilde{P} = \begin{bmatrix} I & 0 \\ 0 & I + \widehat{W}_k \widehat{\Theta}_k \widehat{W}_k^T \end{bmatrix}$;
14.  Push $\widetilde{P}$ onto the head of Qalr;
15.  Push $P$ onto the head of QM;
16.  return Qalr, QM;
17. else
18.  Compute the adaptive FSAI approximation $G$ of $A$ such that $G^T G \simeq A^{-1}$;
19.  Set Qalr = ∅;
20.  Set QM = {G};
21.  return Qalr, QM;
22. end if
Algorithm 4.2. MFLR Application.
1. Recursive Function MFLR_Apply(l, nl, Qalr, QM, x)
2. if $l < (n_l - 1)$ then
3.   Pop $P$ from the head of QM;
4.   Pop $\widetilde{P}$ from the head of Qalr;
5.   Compute $y = P x$;
6.   Partition $y$ into $y_1 = y(1 : n_1)$ and $y_2 = y(n_1 + 1 : n_1 + n_2)$;
7.   Compute $z_2$ = MFLR_Apply(l+1, nl, Qalr, QM, $y_2$);
8.   Form $z$ with $y_1$ and $z_2$;
9.   Compute $y = \widetilde{P} z$;
10.  Update $y \leftarrow P^T y$;
11.  Push $P$ onto the head of QM;
12.  Push $\widetilde{P}$ onto the head of Qalr;
13.  return $y$;
14. else
15.  Pop $G$ from the head of QM;
16.  Compute $y = G^T G x$;
17.  Push $G$ onto the head of QM;
18.  return $y$;
19. end if
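The recursion of Algorithm 4.2 can be mirrored almost line by line; a dense sketch, with the per-level data held in a plain Python list rather than the Qalr/QM queues of the pseudocode:

import numpy as np

def mflr_apply(levels, G_last, x):
    # levels : list of (P, P_tilde, n1) tuples for levels 0..nl-2
    # G_last : FSAI factor of the last-level Schur complement
    if not levels:
        return G_last.T @ (G_last @ x)         # last level: y = G^T G x
    (P, P_tilde, n1), rest = levels[0], levels[1:]
    y = P @ x
    z2 = mflr_apply(rest, G_last, y[n1:])      # recurse on the Schur block
    z = np.concatenate([y[:n1], z2])
    return P.T @ (P_tilde @ z)                 # ALR update, then P^T sweep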
For the sake of readability, these parameters are dropped from the list of arguments of the MFLR_SetUp and MFLR_Apply functions. The preconditioner is stored using the lists Qalr and QM, and the parallelization is performed by evenly partitioning each level among the available cores using OpenMP directives. The reason for this choice, which limits the algorithm scalability and the problem size, is that the aim of this work is mainly the analysis of the MFLR preconditioner behavior. The MPI implementation is an ongoing task.
The MFLR preconditioner behavior is investigated on a set of test cases. First, its theoretical properties are verified on a small problem. Second, the MFLR sensitivity to the user-specified parameters that control the preconditioner quality and density is analyzed with an extensive numerical experimentation on a medium-size matrix. Finally, the performance in a few large-size problems is considered.
4.1. Theoretical properties. We analyze the bcsstk38 matrix from the University of Florida Sparse Matrix Collection [9]. This SPD matrix has 8,032 rows and 355,460 nonzeroes, and has been scaled so as to have a unitary diagonal. Its eigenspectrum is provided in the leftmost frame of Figure 3. The matrix is uniformly partitioned into two levels ($n_1 = n_2 = 4{,}016$). The approximate Schur complement $\widetilde{S}$ is computed using (2.9), where $F$ is the block FSAI of $A$ obtained using the dynamic strategy introduced in [16] and a variable number $k_{F,max}$ of entries retained per row. The nonzero eigenspectrum of $(\widetilde{S} - S)$ is shown in the rightmost frame of Figure 3 and is strictly positive for any $F$, as expected from Corollary 2.3.

The analysis is performed by introducing step-by-step new ingredients to the MFLR preconditioner. In particular, (i) the exact inverse of $K$ is used to evaluate the effect of using either $\widetilde{S}$ or $S$, (ii) the approximation $G^T G \simeq K^{-1}$ is introduced, and (iii) low-rank corrections are added to improve the preconditioner.
Fig. 3. bcsstk38 test case: eigenspectrum of $A$ (left) and $(\widetilde{S} - S)$ (right) for different block FSAI $F$ ($k_{F,max}$ = 30, 60, 90).
Table 1
bcsstk38 test case: comparison between the conditioning numbers of (2.29) and (2.21) varying $F$ with $\|E_K\| = 0$. $\lambda$ and $\tilde{\lambda}$ denote the eigenvalues of $M^{-1} A M^{-T}$ and $\widetilde{M}^{-1} A \widetilde{M}^{-T}$, respectively.

$k_{F,max}$   $\eta_1$   $\tilde{\eta}_1$   $[\lambda_n, \lambda_1]$   $[\tilde{\lambda}_n, \tilde{\lambda}_1]$   $r(\eta_1)$
30            4.395      0.975              [0.0470, 21.273]           [0.0249, 1.975]                            5.709
60            2.867      0.944              [0.0988, 10.121]           [0.0558, 1.944]                            2.939
90            2.248      0.914              [0.1448, 6.9069]           [0.0863, 1.914]                            2.153
First, we want to verify that the main claim of section 2, i.e., that the use of $\widetilde{S}$ is more appropriate than that of $S$ in the presented multilevel framework, is actually correct using the exact inverse of $K$. The ratio between the conditioning numbers of $M^{-1} A M^{-T}$ and $\widetilde{M}^{-1} A \widetilde{M}^{-T}$, as obtained in Theorem 2.8, is reported in Table 1 for different block FSAI $F$. Should $F$ be computed exactly as a full matrix, there would be no difference between $\widetilde{M}^{-1}$ and $M^{-1}$. By distinction, decreasing the quality of $F$ means increasing $\eta_1 = \|Q\|_2$ and $\tilde{\eta}_1 = \|\widetilde{Q}\|_2$. As a consequence, the effectiveness of $\widetilde{M}^{-1}$ improves with respect to $M^{-1}$; i.e., the sparser $F$, the more important it is to use $\widetilde{S}$ instead of $S$. For this test problem, the ratio $r(\eta_1)$ between the conditioning numbers of (2.29) and (2.21) increases up to 5.709.

Theorem 2.8 no longer holds if we introduce the approximation $G^T G$ for $K^{-1}$. However, it is still true that the theoretical eigenvalue bounds for $\widetilde{M}^{-1}$ are tighter than those for $M^{-1}$. The matrix $G$ is computed as the FSAI of $K$ using the adaptive strategy implemented in the FSAIPACK software package [20]. The quality of $G$ is controlled by the maximum number of entries $k_{G,max}$ retained per row. First of all, notice that $\widetilde{S}$ computed as in (2.3) might be indefinite if $G$ is not accurate enough. For instance, even with $k_{G,max} = 250$ the approximate Schur complement $(C - B^T G^T G B)$ still has one negative eigenvalue. Changing the fill-in degree of $G$ modifies the values obtained from the bounds (2.38) and (2.39), which become tighter and tighter as $k_{G,max}$ increases. Such bounds, along with the actual eigenvalue intervals, are reported in Table 2 for different choices of $G$ and $F$. It can be observed that the roles played by $\epsilon_1$ and $\tilde{\eta}_1$, i.e., measures of the quality of $G$ and $F$, respectively, have a similar impact on the actual eigenvalue distribution of $\widetilde{M}^{-1} A \widetilde{M}^{-T}$, as expected from the right-hand side of inequality (2.38). Hence, one may argue that improving both $G$ and $F$ is essential for a better MF performance.
Table 2
bcsstk38 test case: eigenvalue distribution of (2.29) and (2.21) varying $F$ and $G$. The same notation as in Table 1 is used.

$k_{G,max}$  $k_{F,max}$  $\epsilon_1$  $\eta_1$  $\tilde{\eta}_1$  $[\lambda_n, \lambda_1]$  $[\tilde{\lambda}_n, \tilde{\lambda}_1]$  bound (2.39)  bound (2.38)
30   30   1.076   4.505   0.812   [1.8e-04, 47.3]   [1.7e-04, 2.1]   19473.3   2.512
30   60   1.076   3.181   0.790   [1.8e-04, 33.2]   [1.8e-04, 2.1]    9707.6   2.494
30   90   1.076   2.617   0.648   [1.8e-04, 29.1]   [1.8e-04, 2.1]    6570.8   2.380
60   30   1.240   4.393   0.718   [3.9e-04, 40.0]   [3.9e-04, 2.2]    8291.3   2.569
60   60   1.240   3.037   0.641   [4.0e-04, 29.5]   [4.0e-04, 2.2]    3963.8   2.512
60   90   1.240   2.466   0.589   [4.0e-04, 25.8]   [4.0e-04, 2.2]    2613.6   2.475
90   30   1.092   4.300   0.738   [6.6e-04, 36.0]   [6.5e-04, 2.1]    5353.9   2.464
90   60   1.092   2.935   0.651   [6.8e-04, 26.8]   [6.7e-04, 2.1]    2495.6   2.396
90   90   1.092   2.434   0.620   [6.9e-04, 23.3]   [6.8e-04, 2.1]    1716.8   2.372
Fig. 4. bcsstk38 test case: eigenspectrum of $E_K$ (left) and rightmost eigenvalues of the preconditioned matrices (2.29) and (2.21) (right) for $k_{G,max} = 30$ and $k_{F,max} = 60$. The upper bound (2.38) is also shown.
On the other hand, the bound (2.39) is less significant because of the presence of $\kappa_n^{-1}$, which can be quite large. Nonetheless, the actual eigenvalue distribution of $M^{-1} A M^{-T}$ is in any case worse than that of $\widetilde{M}^{-1} A \widetilde{M}^{-T}$.

As an example, Figure 4 shows the eigenvalue distribution of $E_K$ for $k_{G,max} = 30$, and the preconditioned matrices (2.29) and (2.21) for the same $k_{G,max}$ and $k_{F,max} = 60$. The matrix $E_K$ is indefinite, with the eigenvalues almost equally distributed between negative and positive values approximately in the interval $[-1, 1]$. The matrix $\widetilde{M}^{-1} A \widetilde{M}^{-T}$ has a more favorable eigenvalue distribution than $M^{-1} A M^{-T}$.

Finally, we add descending and ascending low-rank corrections to the frame. We fix $k_{G,max} = 30$ and $k_{F,max} = 60$, and change the number of eigenpairs computed to correct either $G$ (dlr) or $\widehat{S}^{-1}$ (alr). Their effect on the variation of the conditioning number of $\widetilde{M}^{-1} A \widetilde{M}^{-T}$ is shown in Table 3. Using both low-rank strategies greatly helps reduce the conditioning number. For example, in this case retaining 20 eigenpairs for both corrections yields a reduction of the conditioning number of about two orders of magnitude, i.e., from about 24,000 to 500. By distinction, notice that correcting $\widetilde{S}^{-1}$ towards $S^{-1}$ is not as effective, as expected from the MFLR theoretical properties.
Table 3
bcsstk38 test case: effect of descending and ascending low-rank corrections, dlr and alr, respectively, with $k_{G,max} = 30$ and $k_{F,max} = 60$. The target of the ALR correction is $\widetilde{S}^{-1}$ in the upper table and $S^{-1}$ in the lower table.

ALR target              alr   dlr   $[\lambda_n, \lambda_1]$   $\lambda_1/\lambda_n$
                         0     0    [1.682e-04, 2.337e+00]     2.389e+04
                         0    10    [4.391e-04, 2.337e+00]     5.321e+03
                         0    20    [4.438e-04, 2.337e+00]     5.265e+03
                        10     0    [1.682e-04, 2.076e+00]     1.234e+04
$\widetilde{S}^{-1}$    10    10    [4.391e-04, 2.077e+00]     4.731e+03
                        10    20    [4.438e-04, 2.137e+00]     4.816e+03
                        20     0    [1.744e-04, 2.076e+00]     1.190e+04
                        20    10    [3.380e-03, 2.077e+00]     6.145e+02
                        20    20    [4.413e-03, 2.137e+00]     4.843e+02

                        10     0    [1.753e-04, 1.467e+01]     8.371e+04
                        10    10    [4.342e-03, 1.482e+01]     3.414e+03
$S^{-1}$                10    20    [6.308e-03, 1.493e+01]     2.367e+03
                        20     0    [1.772e-04, 3.220e+01]     1.817e+05
                        20    10    [5.475e-03, 3.270e+01]     5.973e+03
                        20    20    [9.065e-03, 3.290e+01]     3.630e+03
It is also interesting to observe that the largest eigenvalue is the most sensitive to the selection of the target matrix for the ALR corrections, while it is insensitive to the DLR corrections, which mainly affect the smallest eigenvalue.
4.2. Sensitivity to user-specified parameters. To test the influence of the user-specified parameters that control the MFLR behavior, we use the medium-size matrix Cube, arising from a homogeneous structural problem uniformly discretized by P1 finite elements, with 190,581 unknowns and 7,531,389 nonzero entries. The linear system (1.1) is solved by using an MFLR-preconditioned conjugate gradient (PCG) method, with an exit tolerance on the relative residual equal to $10^{-8}$. The right-hand side $b$ is such that the solution $x$ is the vector with components $x_j = j + 1$, $j = 1, \dots, n$. All tests reported here are obtained using a machine equipped with Intel(R) Xeon(R) E5-2680 v2 processors at 2.80 GHz and 256 Gbyte of RAM. Each CPU has 10 cores. For these preliminary tests, just one thread is used. The MFLR preconditioner is implemented in Fortran90, with the code compiled by the Intel Fortran compiler using the -O3 optimization level. We also used BLAS and LAPACK routines from the Intel Math Kernel Library. For the computation of the eigenpairs needed by the low-rank corrections, we used the Laneig software, which is part of the Filtlan package [10].
The user-specified parameters that control the MFLR quality and performance are as follows:
1. $n_l$: number of levels. Therefore, $(n_l - 1)$ is the number of computed Schur complements;
2. $\epsilon_G$: tolerance for the adaptive computation of $G$, see [20];
3. $\epsilon_F$: tolerance for the adaptive computation of $F$, see [16];
4. dlr: DLR correction size, i.e., number of eigenpairs used to enrich $G$. This is a local improvement;
5. alr: ALR correction size, i.e., number of eigenpairs used to enrich $\widehat{S}^{-1}$. This is a global improvement, in the sense that the Schur complement size grows up to the size of the zero-level partition.
Table 4
Cube test case: MFLR performance varying $n_l$ and $\epsilon_G$ ($\epsilon_F = 10^{-2}$, dlr = alr = 0).

$\epsilon_G$   $n_l$   $n_{it}$   $\rho$   $T_p$    $T_s$
$10^{-1}$       10     930        0.43       14.5   26.8
$10^{-1}$       20     843        0.73      132.0   34.3
$10^{-1}$       50     664        1.47      349.9   49.1
$10^{-1}$      100     612        2.78      515.1   79.2
$10^{-2}$       10     640        0.77       34.5   21.9
$10^{-2}$       20     591        1.17      265.9   29.3
$10^{-2}$       50     503        2.04      441.7   41.6
$10^{-2}$      100     464        3.67      612.9   67.0
$10^{-3}$       10     446        1.97      268.4   24.0
$10^{-3}$       20     416        2.53      881.2   29.5
$10^{-3}$       50     386        3.45      829.9   41.8
$10^{-3}$      100     367        5.71      956.9   66.3
$10^{-4}$       10     362        3.83     2232.9   37.3
$10^{-4}$       20     338        4.67     3741.6   35.7
$10^{-4}$       50     329        5.74     2404.2   47.4
$10^{-4}$      100     318        8.88     2053.8   73.1
The Cube test matrix is reordered with the Reverse Cuthill–McKee algorithm and uniformly partitioned into equal-size levels. The results are evaluated in terms of the number of iterations, $n_{it}$, the time needed to compute the preconditioner, $T_p$, the time spent in the PCG iterations, $T_s$, and the preconditioner density, $\rho$, defined as

(4.1)   $\rho = \frac{1}{nnz(A)} \sum_{i=0}^{n_l-1} \left[ nnz(Q_{alr}^i) + nnz(Q_M^i) \right],$

where $nnz(\cdot)$ gives the number of nonzero entries stored for the operator $(\cdot)$ and all the remaining symbols are defined as in Algorithm 4.1. For these preliminary tests, the matrix-matrix products are fully computed, i.e., any use of thresholds and/or levels of fill is avoided.
Similarly to the previous section, the analysis is carried out step by step, adding one ingredient at a time. First, the performance of the robust MF preconditioner with no low-rank corrections is investigated by changing the fill-in degree of $G$ and $F$. Then, low-rank corrections are added to the frame to identify their effect. It is worth mentioning that only the robust MF preconditioner is addressed here, because the standard multilevel FSAI algorithm breaks down, giving rise to indefinite Schur complements.

Table 4 shows the results obtained by varying the number of levels $n_l$ and the tolerance $\epsilon_G$. The other tolerance $\epsilon_F$ is set to $10^{-2}$ and no low-rank corrections are applied, i.e., dlr = alr = 0. As expected, the iteration count decreases progressively as $\epsilon_G$ decreases, i.e., $G$ is more accurate, and $n_l$ grows. The preconditioner density also increases, so that the set-up burden and the cost per iteration grow. Notice that, in particular, the set-up time can become very large. Although such a cost could be reduced by introducing thresholds and levels of fill in the matrix-matrix computations with no substantial loss in the PCG acceleration, it appears that the proposed multilevel approach is of interest whenever the preconditioner can be reused several times, e.g., in eigensolvers or in some transient simulations, so that its set-up cost can be properly amortized; it is therefore set apart in the present analysis. Hence, we focus on the solution time $T_s$ only. With the most efficient $\epsilon_G$ value, i.e., $10^{-2}$, we vary $\epsilon_F$ and $n_l$. The results, provided in Table 5, show that the MFLR preconditioner appears to be less sensitive to the quality of $F$ than to that of $G$. In fact, the iteration count is more stable than in Table 4.
With $\epsilon_G = \epsilon_F = 10^{-2}$ and $n_l = 10$, we test the effects of the corrections. Table 6 shows the impact of using either DLR or ALR corrections. Increasing the rank size, the number of iterations decreases for both approaches, but ALR corrections have a stronger impact on the solution time $T_s$. This is somewhat expected because ALR corrections have a global effect on the overall preconditioner, while DLR corrections improve locally the current-level approximation of $K^{-1}$.
Table 5
Cube test case: MFLR performance varying $n_l$ and $\epsilon_F$ ($\epsilon_G = 10^{-2}$, dlr = alr = 0).

$\epsilon_F$   $n_l$   $n_{it}$   $\rho$    $T_p$    $T_s$
$10^{-1}$       10     660        0.65        29.1   22.8
$10^{-1}$       20     632        0.94       216.8   29.5
$10^{-1}$       50     601        1.61       369.8   47.8
$10^{-1}$      100     579        2.81       498.0   78.7
$10^{-2}$       10     640        0.77        34.5   21.9
$10^{-2}$       20     591        1.17       265.9   29.3
$10^{-2}$       50     503        2.04       441.7   41.6
$10^{-2}$      100     464        3.67       612.9   67.0
$10^{-3}$       10     621        1.13        85.7   27.0
$10^{-3}$       20     544        1.79       639.2   32.9
$10^{-3}$       50     403        3.21      1129.2   43.0
$10^{-3}$      100     348        6.98      2278.7   71.2
$10^{-4}$       10     585        2.304     1163.3   35.1
$10^{-4}$       20     460        3.445     4513.5   39.3
$10^{-4}$       50     309        5.550     4020.9   43.6
$10^{-4}$      100     255       11.192     6029.8   68.6
Table 6
Cube test case: MFLR performance varying either dlr or alr ($\epsilon_G = \epsilon_F = 10^{-2}$, $n_l = 10$).

dlr   $n_{it}$   $\rho$   $T_p$   $T_s$
 5    534        0.90      71.3   29.3
10    485        1.01      66.1   29.4
20    450        1.24      65.9   25.5
50    405        1.92      75.0   30.1

alr   $n_{it}$   $\rho$   $T_p$   $T_s$
 5    289        1.35      95.1   17.0
10    213        1.92      78.0   15.1
20    196        3.06      88.1   14.1
50    186        6.48     214.5   36.7
Table 7
Cube test case: MFLR performance varying both dlr and alr ($\epsilon_G = \epsilon_F = 10^{-2}$, $n_l = 10$).

dlr   alr   $n_{it}$   $\rho$   $T_p$   $T_s$
 1     1    525        0.92      76.3   34.4
 1    10    204        1.95      87.7   14.8
 1    20    188        3.09     112.5   18.5
10     1    381        1.13      71.7   25.0
10    10    148        2.15      83.6   10.9
10    20    134        3.29     100.0   11.3
20     1    341        1.35      86.6   23.9
20    10    132        2.38     100.5   10.4
20    20    123        3.52     100.1   10.8
Finally, the combined effect of both corrections is investigated in Table 7. This allows for obtaining the best result in terms of both iteration count and solution time $T_s$. In particular, the latter is more than halved with respect to the case with no corrections whenever both dlr and alr are at least 10. The solver acceleration is paid for with a larger set-up cost, which makes the MFLR approach interesting when the preconditioner can be recycled.
Although the present analysis cannot be thoroughly exhaustive, it is possible to observe that dlr and alr appear to be the most sensitive user-specified parameters. The number of levels $n_l$ strongly depends on the size of the matrix and should be selected such that each level is not too small. In the Cube test case it can be seen that with more than 20 levels, i.e., fewer than 10,000 unknowns per level, the preconditioner application cost grows quickly and is no longer compensated by the iteration count reduction. By distinction, the selection of the tolerances $\epsilon_G$ and $\epsilon_F$ does not appear to be overly difficult. In fact, the MFLR performance does not change much with respect to a variation of these parameters in the interval $[10^{-3}, 10^{-1}]$.
4.3. Preconditioner performance. The computational performance of the MFLR preconditioner is finally evaluated on a set of large-size SPD matrices taken from the University of Florida Sparse Matrix Collection [9]. The main properties of the selected test cases are provided in Table 8, while the best results obtained by the MFLR preconditioner in terms of iteration CPU time $T_s$ are given in Table 9.
Table 8
Test matrices.

Matrix         Size        Number of nonzeroes
af_shell3        504,855   17,588,875
af_shell8        504,855   17,579,155
Emilia_923       923,136   40,373,538
Geo_1438       1,437,960   60,236,322
StocF_1465     1,465,137   21,005,389
Table 9
Computational performance of the adaptive FSAI and MFLR preconditioners for the test matrices of Table 8.

               Adaptive FSAI                               MFLR
Matrix         $n_{it}$  $\rho$  $T_p$[s]  $T_s$[s]       $n_{it}$  $\rho$  $T_p$[s]  $T_s$[s]
af_shell3       963      0.89     88.8      63.9           126      4.50    237.8      31.5
af_shell8      1033      0.86     81.2      68.0           131      3.22    174.8      28.6
Emilia_923     1513      0.11      5.0     128.5           575      0.25     82.8      80.4
Geo_1438        491      0.31     59.4      89.6           456      0.49    159.8     110.4
StocF_1465      937      0.93     51.3     133.4           936      1.66    260.4     222.2
As a benchmark, the performance provided by the native adaptive FSAI algorithm is also shown in the same table. All the CPU times are obtained using 1 thread on the computer defined in section 4.2. It is worth mentioning that the native adaptive FSAI is used as a benchmark because the standard multilevel FSAI generally gives rise to indefinite Schur complements. Hence, only the robust MF algorithm presented here can be effectively used, with or without low-rank corrections. The latter may improve the convergence rate but have no effect as to the preconditioner robustness.

As already observed, the MFLR preconditioner set-up can be quite expensive, especially because of the computation of the eigenpairs needed by the low-rank correction procedures. However, its effectiveness in reducing the iteration count and CPU time can be quite significant. For instance, in the af_shell3 and af_shell8 test cases, $n_{it}$ is approximately reduced by a factor of 10 and $T_s$ is more than halved. Hence, the use of the MFLR preconditioner can be of great interest whenever the set-up time can be amortized along several linear solves. On the other hand, if the reduction of the number of iterations is marginal, such as in the Geo_1438 and StocF_1465 test cases, the adaptive FSAI proves more efficient than the MFLR preconditioner.

For the sake of comparison, the computational performance obtained by other multilevel packages, namely ILUPACK [3] and Trilinos ML [13], is provided in Table 10. The ILUPACK parameters are set so as to obtain approximately the same memory footprint as MFLR, i.e., a density close to the one provided in Table 9. By distinction, the default parameters are retained for Trilinos ML. The results show that the ILUPACK performance in terms of solution time $T_s$ is close to that of MFLR for af_shell3 and af_shell8, much better for StocF_1465, and much worse for Emilia_923 and Geo_1438. In particular, the ILU density for the Emilia_923 test case must be increased up to 3.8 to attain convergence in just 62 iterations, otherwise convergence is not reached after 3,000 iterations. Recalling that ILUPACK cannot thoroughly exploit a growing parallelism of the computational architecture, the comparison with MFLR appears to be quite satisfactory. As far as the results with Trilinos ML are concerned, it can be observed that with the default parameters
Table 10
Computational performance for the test matrices of Table 8 with ILUPACK and Trilinos ML.

               ILUPACK – GMRes                             Trilinos ML – CG
Matrix         $n_{it}$  $\rho$  $T_p$[s]  $T_s$[s]       $n_{it}$  $\rho$  $T_p$[s]  $T_s$[s]
af_shell3         79      4.12     64.0      24.6          356      1.01     1.2       80.1
af_shell8         78      4.12     62.2      24.4          309      1.02     1.2       57.4
Emilia_923     >3000      0.40     34.0        –           586      1.15     6.5      367.0
Geo_1438         878      0.50     57.3     385.8          118      1.15     9.6       93.7
StocF_1465       108      1.50     49.2      33.3          759      1.16     3.8      278.0
[Fig. 5. Strong scalability test for a regular 300^3 Laplacian: normalized set-up time Tp(t)/Tp(1) (top panel) and iteration time Ts(t)/Ts(1) (bottom panel), on a logarithmic scale from 1 down to 0.01, versus the number of threads (1 to 32), for the adaptive FSAI and MFLR preconditioners.]
As far as the results with Trilinos ML are concerned, with the default parameters
the MFLR preconditioner is always superior, with the exception of the Geo 1438 test
case, where the two performances are roughly equivalent.
Finally, to show the potential parallelism of the MFLR preconditioner, a scalability
test has been carried out on a discrete Laplacian computed over a regular
300 × 300 × 300 grid. The numerical experiment is performed on the Marconi cluster
at the CINECA Center of High Performance Computing, Bologna, Italy. The cluster
is still under construction and, at the time of writing, consists of 1,512 nodes
with 128 Gbytes of RAM each. Every node is equipped with 2 Intel Xeon E5-2697v4
Broadwell processors at 2.3 GHz, for a total of 36 cores. This preliminary
implementation of MFLR exploits shared-memory parallelism through OpenMP
directives; hence, in the strong scalability test of Figure 5, only a single node is
used with up to 32 threads. The strong scalability of the MFLR preconditioner is
compared to that of the adaptive FSAI algorithm for both the set-up and the
iteration time, Tp and Ts. In particular, Figure 5
provides the speed-ups

(4.2)    S_p = T_p(t) / T_p(1),    S_s = T_s(t) / T_s(1),

with T_p(t) and T_s(t) the set-up and iteration wall-clock times measured with t threads.
It has already been verified that the native adaptive FSAI exhibits a practically ideal
speed-up [20], according to the hardware properties of the specific computational
architecture used for the numerical experiments. On the multi-core processors of the
Marconi cluster, it is virtually impossible to obtain ideal speed-ups with iterative
solvers, as these algorithms are bandwidth-limited, being characterized by a low
flop-per-byte ratio. The MFLR preconditioner is theoretically less parallel than FSAI
because of the intrinsic sequentiality introduced by the multilevel framework.
Nevertheless, Figure 5 shows that the strong scalability is only marginally affected
by this sequentiality, and the MFLR preconditioner preserves a good degree of
parallelism.
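As a small illustration, not part of the paper's experiments, the normalized times of (4.2) can be computed directly from the measured wall-clock times. The thread counts and timings below are placeholder values, not measurements from Figure 5.

```python
# Placeholder timings in seconds; replace with measured Tp(t), Ts(t).
Tp = {1: 240.0, 2: 126.0, 4: 66.0, 8: 36.0, 16: 21.0, 32: 14.0}  # set-up
Ts = {1: 80.0, 2: 42.0, 4: 23.0, 8: 13.0, 16: 8.0, 32: 6.0}      # iteration

for t in sorted(Tp):
    Sp = Tp[t] / Tp[1]  # normalized set-up time, eq. (4.2)
    Ss = Ts[t] / Ts[1]  # normalized iteration time, eq. (4.2)
    # Ideal strong scaling corresponds to 1/t; the gap to 1/t reflects
    # bandwidth limits and the sequential part of the multilevel framework.
    print(f"{t:2d} threads: Sp = {Sp:.3f}  Ss = {Ss:.3f}  (ideal {1.0 / t:.3f})")
```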
5. Conclusions. The development of a multilevel framework is often useful in
several applications. However, using the FSAI preconditioner as the basic kernel
in a standard multilevel approach may give rise to difficulties related to the
approximations introduced in the computation of the Schur complement at each level.
With SPD problems, such a Schur complement might be indefinite, thus causing a
breakdown of the multilevel algorithm.
The present work develops a robust multilevel framework for SPD matrices based
on the use of adaptive FSAI as the main kernel. An alternative way of computing
the Schur complement is introduced so as to guarantee its positive definiteness
independently of the preconditioner sparsity. A theoretical analysis is formulated with
the aim of providing appropriate bounds for the eigenspectrum of the preconditioned
matrix. The multilevel FSAI preconditioner is further enhanced by introducing low-
rank corrections at both a local and a global level, namely DLR and ALR corrections,
respectively, thus producing the MFLR preconditioning framework.
The MFLR preconditioner has been investigated in a set of test problems to
analyze (i) the relative influence of the user-specified parameters controlling the
algorithm set-up, and (ii) the computational performance and potential scalability in
a parallel environment. The numerical results show that the proposed approach is
generally able to significantly accelerate the solver convergence rate while preserving
a good degree of parallelism. At present, the solver acceleration is paid for by
a large set-up cost, mainly due to the computation of the eigenpairs needed
by the low-rank corrections. The increase in the cost of building the preconditioner
with respect to the native adaptive FSAI makes this approach attractive especially
for those applications where the preconditioner can be effectively recycled over a
number of linear solves.
The main goal of this work was to prove the robustness and effectiveness of
a multilevel framework in explicit preconditioning. Further investigations will be
devoted to the development of a faster set-up stage by using, for instance, randomized
approaches for computing the low-rank corrections [15], and by enforcing sparsity in
the resulting factors.
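To fix ideas on the randomized alternative mentioned above, the following is a minimal sketch, in the spirit of the cited framework [15] but not the authors' implementation, of a randomized procedure approximating the dominant eigenpairs of an SPD operator accessed only through matrix-vector products. The function name and all parameters are illustrative.

```python
import numpy as np

def randomized_spd_eigs(matvec, n, k, oversample=10, power_iters=2, seed=0):
    """Approximate the k dominant eigenpairs of an SPD operator given only
    its action v -> A @ v, following Halko, Martinsson, and Tropp [15]."""
    rng = np.random.default_rng(seed)

    def apply_block(X):
        # Apply the operator column by column (A is available only as matvec).
        return np.column_stack([matvec(X[:, j]) for j in range(X.shape[1])])

    # Sample the dominant subspace with a Gaussian test matrix, then sharpen
    # the spectral separation with a few power iterations.
    Y = apply_block(rng.standard_normal((n, k + oversample)))
    for _ in range(power_iters):
        Q, _ = np.linalg.qr(Y)
        Y = apply_block(Q)
    Q, _ = np.linalg.qr(Y)

    # Rayleigh-Ritz: eigendecompose the small projected matrix Q^T A Q.
    T = Q.T @ apply_block(Q)
    T = 0.5 * (T + T.T)  # enforce symmetry against rounding
    w, V = np.linalg.eigh(T)
    idx = np.argsort(w)[::-1][:k]  # keep the k largest Ritz values
    return w[idx], Q @ V[:, idx]
```

Since only the action of the operator is required, such a routine could in principle be applied level by level at a cost of roughly (k + oversample) matrix-vector products per power iteration, instead of a full Lanczos eigensolution.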
REFERENCES
[1] P. Amestoy, C. Ashcraft, O. Boiteau, A. Buttari, J.-Y. L'Excellent, and C. Weisbecker, Improving multifrontal methods by means of block low-rank representations, SIAM J. Sci. Comput., 37 (2015), pp. A1451–A1474.
[2] M. Benzi, Preconditioning techniques for large linear systems: A survey, J. Comput. Phys., 182 (2002), pp. 418–477.
[3] M. Bollhöfer, J. I. Aliaga, A. F. Martín, and E. S. Quintana-Ortí, ILUPACK, in Encyclopedia of Parallel Computing, D. Padua, ed., Springer, New York, 2011, pp. 917–926.
[4] Y. Bu, B. Carpentieri, Z. Shen, and T. Z. Huang, A hybrid recursive multilevel incomplete factorization preconditioner for solving general linear systems, Appl. Numer. Math., 104 (2016), pp. 141–157.
[5] B. Carpentieri, I. S. Duff, L. Giraud, and G. Sylvand, Combining fast multipole techniques and an approximate inverse preconditioner for large electromagnetism calculations, SIAM J. Sci. Comput., 27 (2005), pp. 774–792.
[6] B. Carpentieri, J. Liao, and M. Sosonkina, VBARMS: A variable block algebraic recursive multilevel solver for sparse linear systems, J. Comput. Appl. Math., 259 (2014), pp. 164–173.
[7] T. F. Chan and T. P. Mathew, Domain decomposition algorithms, Acta Numer., 3 (1994), pp. 61–143.
[8] E. Chow and Y. Saad, Approximate inverse preconditioners via sparse-sparse iterations, SIAM J. Sci. Comput., 19 (1998), pp. 995–1023.
[9] T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), pp. 1–25.
[10] H. Fang and Y. Saad, A filtered Lanczos procedure for extreme and interior eigenvalue problems, SIAM J. Sci. Comput., 34 (2012), pp. 2220–2246.
[11] M. Ferronato, Preconditioning for sparse linear systems at the dawn of the 21st century: History, current developments, and future perspectives, ISRN Appl. Math., (2012), doi:10.5402/2012/127647.
[12] A. Franceschini, V. A. Paludetto Magri, C. Janna, and M. Ferronato, Multilevel approaches for FSAI preconditioning, Numer. Linear Algebra Appl., submitted.
[13] M. W. Gee, C. M. Siefert, J. J. Hu, R. S. Tuminaro, and M. G. Sala, ML 5.0 Smoothed Aggregation User's Guide, Tech. Report SAND2006-2649, Sandia National Laboratories, 2006.
[14] W. Hackbusch, A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices, Computing, 62 (1999), pp. 89–108.
[15] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[16] C. Janna and M. Ferronato, Adaptive pattern research for block FSAI preconditioning, SIAM J. Sci. Comput., 33 (2011), pp. 3357–3380.
[17] C. Janna, M. Ferronato, and G. Gambolati, Multilevel incomplete factorizations for the iterative solution of non-linear finite element problems, Internat. J. Numer. Methods Engrg., 80 (2009), pp. 651–670.
[18] C. Janna, M. Ferronato, and G. Gambolati, A block FSAI-ILU parallel preconditioner for symmetric positive definite linear systems, SIAM J. Sci. Comput., 32 (2010), pp. 2468–2484.
[19] C. Janna, M. Ferronato, and G. Gambolati, Enhanced block FSAI preconditioning using domain decomposition techniques, SIAM J. Sci. Comput., 35 (2013), pp. S229–S249.
[20] C. Janna, M. Ferronato, F. Sartoretto, and G. Gambolati, FSAIPACK: A software package for high performance FSAI preconditioning, ACM Trans. Math. Software, 41 (2015), Art. 10.
[21] L. Yu. Kolotilina and A. Yu. Yeremin, Factorized sparse approximate inverse preconditioning I. Theory, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 45–58.
[22] Z. Li, Y. Saad, and M. Sosonkina, pARMS: A parallel version of the algebraic recursive multilevel solver, Numer. Linear Algebra Appl., 10 (2003), pp. 485–509.
[23] R. Li and Y. Saad, Divide and conquer low-rank preconditioners for symmetric matrices, SIAM J. Sci. Comput., 35 (2013), pp. A2069–A2095.
[24] L. C. McInnes, B. Smith, H. Zhang, and R. T. Mills, Hierarchical Krylov and nested Krylov methods for extreme-scale computing, Parallel Comput., 40 (2014), pp. 17–31.
[25] P. Raghavan and K. Teranishi, Parallel hybrid preconditioning: Incomplete factorization with selective sparse approximate inversion, SIAM J. Sci. Comput., 32 (2010), pp. 1323–1345.
[26] Y. Saad and B. Suchomel, ARMS: An algebraic recursive multilevel solver for general sparse linear systems, Numer. Linear Algebra Appl., 9 (2002), pp. 359–378.
[27] J. A. Scott and M. Tůma, On positive semidefinite modification schemes for incomplete Cholesky factorization, SIAM J. Sci. Comput., 36 (2014), pp. A609–A633.
[28] S. Wang, X. S. Li, J. Xia, Y. Situ, and M. V. de Hoop, Efficient scalable algorithms for solving dense linear systems with hierarchically semiseparable structures, SIAM J. Sci. Comput., 35 (2013), pp. C519–C544.
[29] Y. Xi, R. Li, and Y. Saad, An algebraic multilevel preconditioner with low-rank corrections for sparse symmetric matrices, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 235–259.
[30] J. Xia, On the complexity of some hierarchical structured matrix algorithms, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 388–410.
[31] J. Xia, Efficient structured multifrontal factorization for general large sparse matrices, SIAM J. Sci. Comput., 35 (2013), pp. A832–A860.
[32] J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li, Fast algorithms for hierarchically semiseparable matrices, Numer. Linear Algebra Appl., 17 (2010), pp. 953–976.