TWO-LEVEL NYSTRÖM–SCHUR PRECONDITIONER FOR SPARSE
SYMMETRIC POSITIVE DEFINITE MATRICES
HUSSAM AL DAAS, TYRONE REES, AND JENNIFER SCOTT†‡
Abstract. Randomized methods are becoming increasingly popular in numerical linear algebra.
However, few attempts have been made to use them in developing preconditioners. Our interest lies
in solving large-scale sparse symmetric positive definite linear systems of equations where the system
matrix is preordered to doubly bordered block diagonal form (for example, using a nested dissection
ordering). We investigate the use of randomized methods to construct high quality preconditioners.
In particular, we propose a new and efficient approach that employs Nyström's method for computing
low rank approximations to develop robust algebraic two-level preconditioners. Construction of the
new preconditioners involves iteratively solving a smaller but denser symmetric positive definite Schur
complement system with multiple right-hand sides. Numerical experiments on problems coming from
a range of application areas demonstrate that this inner system can be solved cheaply using block
conjugate gradients and that using a large convergence tolerance to limit the cost does not adversely
affect the quality of the resulting Nyström–Schur two-level preconditioner.
Key words. Randomized methods, Nyström's method, Low rank, Schur complement, Deflation,
Sparse symmetric positive definite systems, Doubly bordered block diagonal form, Block Conjugate
Gradients, Preconditioning.
1. Introduction. Large scale linear systems of equations arise in a wide range
of real-life applications. Since the 1970s, sparse direct methods, such as LU, Cholesky,
and LDL^⊤ factorizations, have been studied in depth and library quality software is
available (see, for example, [9] and the references therein). However, their memory
requirements and the difficulties in developing effective parallel implementations
can limit their scope for solving extremely large problems, unless they are used in
combination with an iterative approach. Iterative methods are attractive because
they have low memory requirements and are simpler to parallelize. In this work,
our interest is in using the conjugate gradient (CG) method to solve large sparse
symmetric positive definite (SPD) systems of the form
(1.1) Ax = b,
where A ∈ R^{n×n} is SPD, b ∈ R^n is the given right-hand side, and x is the required
solution. The solution of SPD systems is ubiquitous in scientific computing, being
required in applications as diverse as least-squares problems, non-linear optimization
subproblems, Monte-Carlo simulations, finite element analysis, and Kalman filtering.
In the following, we assume no additional structure beyond a sparse SPD system.
It is well known that the approximate solution x_k at iteration k of the CG method
satisfies
(1.2) ‖x_⋆ − x_k‖_A ≤ 2 ‖x_⋆ − x_0‖_A ( (√κ − 1)/(√κ + 1) )^k,
where x_⋆ is the exact solution, x_0 is the initial guess, ‖·‖_A is the A-norm, and κ = κ(A) =
λ_max/λ_min is the spectral condition number (λ_max and λ_min denote the largest and
smallest eigenvalues of A).
Submitted to the editors January 28, 2021.
†STFC Rutherford Appleton Laboratory, Harwell Campus, Didcot, Oxfordshire, OX11 0QX, UK
(hussam.al-daas@stfc.ac.uk, tyrone.rees@stfc.ac.uk, jennifer.scott@stfc.ac.uk).
‡School of Mathematical, Physical and Computational Sciences, University of Reading, Reading
RG6 6AQ, UK.
The rate of convergence also depends on the distribution
of the eigenvalues (as well as on b and x_0): eigenvalues clustered away from the
origin lead to rapid convergence. If κ(A) is large and the eigenvalues of A are evenly
distributed, the system needs to be preconditioned to enhance convergence. This
can be done by applying a linear operator P to (1.1), where P ∈ R^{n×n} is chosen so
that the spectral condition number of PA is small and applying P is inexpensive. In
some applications, knowledge of the provenance of A can help in building an efficient
preconditioner. Algebraic preconditioners do not assume such knowledge, and include
incomplete Cholesky factorizations, block Jacobi, Gauss–Seidel, and additive Schwarz;
see, for example, [36]. These are referred to as one-level or traditional preconditioners
[7, 43]. In general, algebraic preconditioners bound the largest eigenvalues of PA but
encounter difficulties in controlling the smallest eigenvalues, which can lie close to the
origin, hindering convergence.
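The following back-of-the-envelope calculation (with purely illustrative numbers, not taken from our test set) makes the dependence on κ(A) in (1.2) concrete and shows why controlling the smallest eigenvalues matters.

```latex
% Illustrative numbers only: the effect of kappa(A) on the bound (1.2).
\[
  \kappa(A)=10^{8}\ \Rightarrow\
  \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}=\frac{10^{4}-1}{10^{4}+1}\approx 1-2\times 10^{-4},
\]
\[
  \kappa_{\mathrm{eff}}=10^{4}\ \Rightarrow\
  \frac{\sqrt{\kappa_{\mathrm{eff}}}-1}{\sqrt{\kappa_{\mathrm{eff}}}+1}=\frac{99}{101}\approx 0.98 .
\]
% A contraction factor of 1-2e-4 means (1.2) falls below 10^{-6} only after several
% tens of thousands of iterations; if the eigenvalues below 10^{-4}\lambda_{\max} are
% removed or shifted, the factor 0.98 gives the same reduction in roughly 700 iterations.
```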
Deflation strategies have been proposed to overcome the issues related to small
eigenvalues. As explained in [25], the basic idea behind deflation is to “hide” certain
parts of the spectrum of the matrix from the CG method, such that the CG iteration
“sees” a system that has a much smaller condition number than the original matrix.
The part of the spectrum that is hidden from CG is determined by the deflation
subspace and the improvement in the convergence rate of the deflated CG method is
dependent on the choice of this subspace. In the ideal case, the deflation subspace
is the invariant subspace spanned by the eigenvectors associated with the smallest
eigenvalues of Aand the convergence rate is then governed by the “effective” spectral
condition number associated with the remaining eigenvalues (that is, the ratio of the
largest eigenvalue to the smallest remaining eigenvalue). The idea was first introduced
in the late 1980s [8,33], and has been discussed and used by a number of researchers
[2,3,10,14,22,23,27,32,40,41,45,46]. However, in most of these references,
the deflation subspaces rely on the underlying partial differential equation and its
discretization, and cannot be applied to more general systems or used as “black box”
preconditioners. Algebraic two-level preconditioners have been proposed in [4,11,
15,30,43,44]. Recently, a two-level Schur complement preconditioner based on the
power series approximation was proposed in [50].
In recent years, the study of randomized methods has become an active and
promising research area in the field of numerical linear algebra (see, for example,
[16,31] and the references therein). The use of randomized methods to build
preconditioners has been proposed in a number of papers, including [14,18]. The
approach in [14] starts by reordering the system matrix A to a 2 × 2 doubly
bordered block diagonal form, which can be achieved using a nested dissection
ordering. The Schur complement system must then be solved. Starting from
a first-level preconditioner P, a deflation subspace is constructed via a low rank
approximation. Although deflation can be seen as a low rank correction, using
randomized methods to estimate the low rank term is not straightforward because
the deflation subspace is more likely to be associated with the invariant subspace
corresponding to the smallest eigenvalues of the preconditioned matrix, and not to
its dominant subspace. In section 2, we review the ingredients involved in building
our two-level preconditioner. This includes Nyström's method for computing a low
rank approximation of a matrix [12,16,34,47,48], basic ideas behind deflation
preconditioners, and the two-level Schur complement preconditioners presented in
[14,27]. In section 3, we illustrate the difficulties in constructing these two-level
preconditioners by analysing the eigenvalue problems that must be solved. We show
that these difficulties are mainly associated with the clustering of eigenvalues near
Identifier   n       nnz(A)     κ(A)     n_Γ    2D/3D  Application         Source
bcsstk38     8,032   355,460    5.5e+16  2,589  2D     Structural problem  SSMC
ela2d        45,602  543,600    1.5e+8   4,288  2D     Elasticity problem  FF++
ela3d        9,438   312,372    4.5e+5   4,658  3D     Elasticity problem  FF++
msc10848     10,848  1,229,776  1.0e+10  4,440  3D     Structural problem  SSMC
nd3k         9,000   3,279,690  1.6e+7   1,785  3D     Not available       SSMC
s3rmt3m3     5,357   207,123    2.4e+10  2,058  2D     Structural problem  SSMC
Table 1
Set of test matrices. n and nnz(A) denote the order of A and the number of nonzero entries
in A, κ(A) is the spectral condition number, and n_Γ is the order of the Schur complement
(2.11). SSMC refers to the SuiteSparse Matrix Collection [5]. FF++ refers to FreeFem++ [17].
the origin. Motivated by this analysis, in section 4 we propose reformulating the
approximation problem.
The new formulation leads to well-separated eigenvalues that lie away from
the origin, and this allows randomized methods to be used to compute a deflation
subspace. Our approach guarantees a user-defined upper bound on the expected value
of the spectral condition number of the preconditioned matrix. Numerical results for
our new preconditioner and comparisons with other approaches are given in section 5.
Concluding remarks are made in section 6.
Our main contributions are:
- an analysis of the eigenvalue problems and solvers presented in [14, 27];
- a reformulation of the eigenvalue problem so that it can be solved efficiently using
randomized methods;
- a new two-level preconditioner for symmetric positive definite systems that
we refer to as a two-level Nyström–Schur preconditioner;
- theoretical bounds on the expected value of the spectral condition number of
the preconditioned system.
Test environment. In this study, to demonstrate our theoretical and practical
findings, we report on numerical experiments using the test matrices given in Table 1.
This set was chosen to include 2D and 3D problems having a range of densities and
with relatively large spectral condition numbers. In the Appendix, results are given
for a much larger set of matrices. For each test, the entries of the right-hand side
vector f are taken to be random numbers in the interval [0, 1]. All experiments are
performed using Matlab 2020b.
Notation. Throughout this article, matrices are denoted using uppercase letters;
scalars and vectors are lowercase. The pseudo inverse of a matrix C is denoted by C^†
and its transpose is given by C^⊤. Λ(M) denotes the spectrum of the matrix M and
κ(M) denotes its condition number. Λ_k = diag(λ_1, . . . , λ_k) denotes a k×k diagonal
matrix with entries on the diagonal equal to λ_1, . . . , λ_k. S̃ (with or without a subscript
or superscript) is used as an approximation to a Schur complement matrix. P (with
or without a subscript) denotes a (deflation) preconditioner. M (with or without
a subscript) denotes a two-level (deflation) preconditioner. Matrices with an upper
symbol such as Z̃, Ẑ, and Z̆ denote approximations of the matrix Z. Euler's constant
is denoted by e.
2. Background. We start by presenting a brief review of Nyström's method for
computing a low rank approximation to a matrix and then recalling key ideas behind
two-level preconditioners; both are required in later sections.
2.1. Nyström's method. Given a matrix G, the Nyström approximation of a
SPSD matrix B is defined to be
(2.1) B ≈ (BG)(G^⊤BG)^†(BG)^⊤.
We observe that there are a large number of variants based on different choices of
G (for example, [16, 28, 31]). For q ≥ 0, the q-power iteration Nyström method is
obtained by choosing
(2.2) G = B^q Ω,
for a given (random) starting matrix Ω. Note that, in practice, for stability it is
normally necessary to orthonormalize the columns between applications of B.
The variant of Nyström's method we employ is outlined in Algorithm 2.1. It gives
a near-optimal low rank approximation to B and is particularly effective when the
eigenvalues of B decay rapidly after the k-th eigenvalue [16, 31]. It requires only one
matrix-matrix product with B (or q + 1 products if (2.2) is used). The rank of the
resulting approximation is min(r, k), where r is the rank of D_1; see Step 5.
Algorithm 2.1 Nyström's method for computing a low rank approximation to a
SPSD matrix.
Input: A SPSD matrix B ∈ R^{n×n}, the required rank k > 0, an oversampling
parameter p ≥ 0 such that k, p ≪ n, and a threshold ε.
Output: B̃_k = Ũ_k Σ̃_k Ũ_k^⊤ ≈ B, where Ũ_k is orthonormal and Σ̃_k is diagonal with
non-negative entries.
1: Draw a random matrix G ∈ R^{n×(k+p)}.
2: Compute F = BG.
3: Compute the QR factorization F = QR.
4: Set C = G^⊤F.
5: Compute the EVD C = V_1 D_1 V_1^⊤ + V_2 D_2 V_2^⊤, where D_1 contains all the eigenvalues
that are at least ε.
6: Set T = R V_1 D_1^{-1} (R V_1)^⊤.
7: Compute the EVD T = W E W^⊤.
8: Set Ũ = QW, Ũ_k = Ũ(:, 1:k), Σ̃_k = E(1:k, 1:k), and B̃_k = Ũ_k Σ̃_k Ũ_k^⊤.
Note that, if the eigenvalues are ordered in descending order, the success of
Nyström's method is closely related to the ratio of the (k + 1)th and the kth
eigenvalues. If the ratio is approximately equal to one, q must be large to obtain
a good approximation [37].
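For concreteness, a minimal Python/NumPy sketch of Algorithm 2.1 is given below. It is an illustration only (the experiments in this paper use Matlab); it works with a dense SPSD matrix B and mirrors the steps and variable names of the algorithm, with eps playing the role of ε in Step 5.

```python
import numpy as np

def nystrom_lowrank(B, k, p=0, eps=1e-12, rng=None):
    """Rank-k Nystrom approximation of an SPSD matrix B (Algorithm 2.1).

    Returns (Uk, Sk) with B ~= Uk @ np.diag(Sk) @ Uk.T. Illustration only:
    a production code would keep B sparse and touch it once via a
    matrix-matrix product.
    """
    rng = np.random.default_rng(rng)
    n = B.shape[0]
    G = rng.standard_normal((n, k + p))              # Step 1: random test matrix
    F = B @ G                                        # Step 2: one product with B
    Q, R = np.linalg.qr(F)                           # Step 3: thin QR of the sketch
    C = G.T @ F                                      # Step 4: (k+p) x (k+p) core matrix
    d, V = np.linalg.eigh((C + C.T) / 2)             # Step 5: EVD of the core
    keep = d >= eps                                  # keep eigenvalues >= eps
    V1, D1 = V[:, keep], d[keep]
    T = (R @ V1) @ np.diag(1.0 / D1) @ (R @ V1).T    # Step 6
    E, W = np.linalg.eigh(T)                         # Step 7: EVD of T
    order = np.argsort(E)[::-1][:k]                  # indices of the k largest
    U = Q @ W                                        # Step 8
    return U[:, order], E[order]

if __name__ == "__main__":
    # usage: SPSD matrix with rapidly decaying eigenvalues
    rng = np.random.default_rng(0)
    n, k = 200, 10
    Qx, _ = np.linalg.qr(rng.standard_normal((n, n)))
    B = Qx @ np.diag(2.0 ** -np.arange(n)) @ Qx.T
    Uk, Sk = nystrom_lowrank(B, k, p=5, rng=1)
    err = np.linalg.norm(B - Uk @ np.diag(Sk) @ Uk.T, 2)
    print(f"rank-{k} Nystrom error (2-norm): {err:.2e}")
```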
2.2. Introduction to two-level preconditioners. Consider the linear system
(1.1). As already noted, deflation techniques are typically used to shift isolated
clusters of small eigenvalues to obtain a tighter spectrum and a smaller condition
number. Such changes have a positive effect on the convergence of Krylov subspace
methods. Consider the general (left) preconditioned system
(2.3) PAx = Pb,  P ∈ R^{n×n}.
Given a projection subspace matrix Z ∈ R^{n×k} of full rank and k ≪ n, define the
nonsingular matrix E = Z^⊤AZ ∈ R^{k×k} and the matrix Q = ZE^{-1}Z^⊤ ∈ R^{n×n}. The
deflation preconditioner P_DEF ∈ R^{n×n} is defined to be [10]
(2.4) P_DEF = I − AQ.
It is straightforward to show that P_DEF is a projection matrix and P_DEF A has k zero
eigenvalues (see [44] for basic properties of P_DEF). To solve (1.1), we write
x = (I − P_DEF^⊤)x + P_DEF^⊤ x.
Since Q is symmetric, P_DEF^⊤ = I − QA, and so
x = QAx + P_DEF^⊤ x = Qb + P_DEF^⊤ x,
and we only need to compute P_DEF^⊤ x. We first find y that satisfies the deflated system
(2.5) P_DEF A y = P_DEF b,
then (due to the identity A P_DEF^⊤ = P_DEF A) we have that P_DEF^⊤ y = P_DEF^⊤ x. We therefore
obtain the unique solution x = Qb + P_DEF^⊤ y. The deflated system (2.5) is singular and
can only be solved using CG if it is consistent [24], which is the case here since the
same projection is applied to both sides of a consistent nonsingular system (1.1).
The deflated system can also be solved using a preconditioner, giving a two-level
preconditioner for the original system.
Tang et al. [44] illustrate that rounding errors can result in erratic and slow
convergence of CG using P_DEF. They thus also consider an adapted deflation
preconditioner
(2.6) P_A-DEF = I − QA + Q,
that combines P_DEF^⊤ with Q. In exact arithmetic, both P_DEF and P_A-DEF used with
CG generate the same iterates. However, numerical experiments [44] show that the
latter is more robust and leads to better numerical behavior of CG¹.
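The following small dense Python sketch (an illustration, not part of the software used in this paper) assembles P_DEF and P_A-DEF for an exact invariant subspace, verifies the recovery formula x = Qb + P_DEF^⊤ y, and shows the k (numerically) zero eigenvalues of P_DEF A.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 5

# a small SPD test matrix with a few very small eigenvalues
Q0, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate([np.geomspace(1e-6, 1e-4, k), np.linspace(0.1, 1.0, n - k)])
A = Q0 @ np.diag(eigs) @ Q0.T
b = rng.standard_normal(n)

# deflation subspace: eigenvectors of the k smallest eigenvalues (the ideal choice)
Z = Q0[:, :k]
E = Z.T @ A @ Z                      # k x k, nonsingular
Qm = Z @ np.linalg.solve(E, Z.T)     # Q = Z E^{-1} Z^T
P_def = np.eye(n) - A @ Qm           # (2.4)
P_adef = np.eye(n) - Qm @ A + Qm     # (2.6)

# solve the (singular but consistent) deflated system; a direct solver is used
# here purely for illustration, CG would be used in practice
y = np.linalg.lstsq(P_def @ A, P_def @ b, rcond=None)[0]
x = Qm @ b + P_def.T @ y             # x = Q b + P_DEF^T y
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))

# P_DEF A is symmetric with k (near) zero eigenvalues; the rest match those of A
PA = P_def @ A
w = np.sort(np.linalg.eigvalsh((PA + PA.T) / 2))
print("smallest 6 eigenvalues of P_DEF A:", w[:6])
```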
Let λ_n ≥ ··· ≥ λ_1 > 0 be the eigenvalues of A with associated normalized
eigenvectors v_n, . . . , v_1. For the ideal deflation preconditioner, P_ideal, the deflation
subspace is the invariant subspace spanned by the eigenvectors associated with the
smallest eigenvalues. To demonstrate how P_ideal modifies the spectrum of the deflated
matrix, set Z_k = [v_1, . . . , v_k] to be the n×k matrix whose columns are the eigenvectors
corresponding to the smallest eigenvalues. It follows that E = Z_k^⊤AZ_k is equal to
Λ_k = diag(λ_1, . . . , λ_k) and the preconditioned matrix is given by
P_ideal A = A − Z_k Λ_k Z_k^⊤.
Since Z_k is orthonormal and its columns span an invariant subspace, the spectrum
of P_ideal A is {λ_n, . . . , λ_{k+1}, 0}. Starting with x_0 such that Z_k^⊤ r_0 = 0 (r_0 is the
initial residual), for l ≥ 0, Z_k^⊤ (P_ideal A)^l r_0 = 0 and Z_k^⊤ A^l r_0 = 0. Hence the search
subspace generated by the preconditioned CG (PCG) method lies in the invariant
subspace spanned by v_n, . . . , v_{k+1}, which is orthogonal to the subspace spanned by
the columns of Z_k. Consequently, the effective spectrum of the operator that PCG
sees is {λ_n, . . . , λ_{k+1}} and the associated effective spectral condition number is
κ_eff(P_ideal A) = λ_n/λ_{k+1}.
Using similar computations, the ideal adapted deflated system is given by
(2.7) P_A-ideal A = A − Z_k Λ_k Z_k^⊤ + Z_k Z_k^⊤.
¹In [44], P_DEF and P_A-DEF are termed P_DEF1 and P_A-DEF2, respectively.
Furthermore, the spectrum of the operator that PCG sees is {λ_n, . . . , λ_{k+1}, 1, . . . , 1}
and the associated effective spectral condition number is
κ_eff(P_A-ideal A) = max{1, λ_n}/min{1, λ_{k+1}}.
In practice, only an approximation of the ideal deflation subspace spanned by the
columns of Z_k is available. Kahl and Rittich [25] analyze the deflation preconditioner
using Z̃_k ≈ Z_k and present an upper bound on the corresponding effective spectral
condition number of the deflated matrix κ(PA). Their bound [25, Proposition 4.3],
which depends on κ(A), κ_eff(P_ideal A), and the largest principal angle θ between Z̃_k
and Z_k, is given by
(2.8) κ(PA) ≤ ( √κ(A) sin θ + √κ_eff(P_ideal A) )²,
where sin θ = ‖Z_k Z_k^⊤ − Z̃_k Z̃_k^⊤‖_2.
2.3. Schur Complement Preconditioners. This section reviews the Schur
complement preconditioner with a focus on two-level variants that were introduced in
[14, 27].
One-level preconditioners may not provide the required robustness when used with
a Krylov subspace method because they typically fail to capture information about
the eigenvectors corresponding to the smallest eigenvalues. To try and remedy this, in
their (unpublished) report, Grigori et al. [14] and, independently, Li et al. [27] propose
a two-level preconditioner based on using a block factorization and approximating the
resulting Schur complement.
Applying graph partitioning techniques (for example, using the METIS package
[26, 29]), A can be symmetrically permuted to the 2 × 2 doubly bordered block diagonal
form
(2.9) P^⊤AP = [ A_I  A_{IΓ} ; A_{ΓI}  A_Γ ],
where A_I ∈ R^{n_I×n_I} is a block diagonal matrix, A_Γ ∈ R^{n_Γ×n_Γ}, A_{ΓI} ∈ R^{n_Γ×n_I}, and
A_{IΓ} = A_{ΓI}^⊤. For simplicity of notation, we assume that A is of the form (2.9) (and
omit the permutation P from the subsequent discussion).
The block form (2.9) induces a block LDL^⊤ factorization
(2.10) A = [ I  0 ; A_{ΓI}A_I^{-1}  I ] [ A_I  0 ; 0  S_Γ ] [ I  A_I^{-1}A_{IΓ} ; 0  I ],
where
(2.11) S_Γ = A_Γ − A_{ΓI} A_I^{-1} A_{IΓ}
is the Schur complement of A with respect to A_Γ. Provided the blocks within A_I
are small, they can be factorized cheaply in parallel using a direct algorithm (see,
for example, [38]) and thus we assume that solving linear systems with A_I is not
computationally expensive. However, the SPD Schur complement S_Γ is typically
large and significantly denser than A_Γ (its size increases with the number of blocks
in A_I) and, in large-scale practical applications, it may not be possible to explicitly
assemble or factorize it.
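As an illustration of (2.9)–(2.11), the Python sketch below forms S_Γ for a given interface set Γ of a sparse SPD matrix and shows that S_Γ is much denser than A_Γ. It assumes a partition is already available (it does not call a graph partitioner such as METIS) and, for simplicity, factorizes A_I with a single sparse factorization rather than block-by-block.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def schur_complement(A, interface):
    """Schur complement S_Gamma = A_Gamma - A_GammaI A_I^{-1} A_IGamma, see (2.11).

    A is sparse SPD and `interface` indexes Gamma. Illustration only: in practice
    A_I is block diagonal and each block is factorized independently in parallel.
    """
    n = A.shape[0]
    interior = np.setdiff1d(np.arange(n), interface)
    A_I  = A[interior][:, interior].tocsc()
    A_IG = A[interior][:, interface].tocsc()    # A_{I Gamma}
    A_GI = A[interface][:, interior].tocsc()    # A_{Gamma I}
    A_G  = A[interface][:, interface].tocsc()   # A_Gamma
    lu = spla.splu(A_I)                         # direct solve with the interior block
    S = A_G.toarray() - A_GI @ lu.solve(A_IG.toarray())
    return A_G, S

if __name__ == "__main__":
    # 2D Laplacian on a 40x40 grid; one grid column acts as the interface/separator
    m = 40
    T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
    A = (sp.kron(sp.eye(m), T) + sp.kron(T, sp.eye(m))).tocsr()
    interface = np.arange(m // 2, m * m, m)
    A_G, S = schur_complement(A, interface)
    print("nnz(A_Gamma) =", A_G.nnz, " nnz(S_Gamma) =", np.count_nonzero(S))
```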
Preconditioners may be derived by approximating S_Γ^{-1}. An approximate block
factorization of A^{-1} is
M^{-1} = [ I  −A_I^{-1}A_{IΓ} ; 0  I ] [ A_I^{-1}  0 ; 0  S̃^{-1} ] [ I  0 ; −A_{ΓI}A_I^{-1}  I ],
where S̃^{-1} ≈ S_Γ^{-1}. If M^{-1} is employed as a preconditioner for A then the
preconditioned system is given by
(2.12) M^{-1}A = [ I  A_I^{-1}A_{IΓ}(I − S̃^{-1}S_Γ) ; 0  S̃^{-1}S_Γ ],
with Λ(M^{-1}A) = {λ ∈ Λ(S̃^{-1}S_Γ)} ∪ {1}. Thus, to bound the condition number
κ(M^{-1}A), we need to construct S̃^{-1} so that κ(S̃^{-1}S_Γ) is bounded. Moreover, (2.12)
shows that applying the preconditioner requires the efficient solution of linear systems
with S̃^{-1}S_Γ and A_I, the latter being relatively inexpensive. We therefore focus on
constructing preconditioners S̃^{-1} for linear systems of the form
(2.13) S_Γ w = f.
Consider the first-level preconditioner obtained by setting
(2.14) S̃_1^{-1} := A_Γ^{-1}.
Assume for now that we can factorize A_Γ, although in practice it may be very large
and a recursive construction of the preconditioner may then be needed (see [49]). Let
the eigenvalues of the generalized eigenvalue problem
(2.15) S_Γ z = λ S̃_1 z
be λ_{n_Γ} ≥ ··· ≥ λ_1 > 0. From (2.11), λ_{n_Γ} ≤ 1 and so
κ(S̃_1^{-1}S_Γ) = λ_{n_Γ}/λ_1 ≤ 1/λ_1.
As this is unbounded as λ_1 approaches zero, we seek to add a low rank term to
"correct" the approximation and shift the smallest k eigenvalues of S̃_1^{-1}S_Γ. Let
Λ_k = diag{λ_1, . . . , λ_k} and let Z_k ∈ R^{n_Γ×k} be the matrix whose columns are
the corresponding eigenvectors. Without loss of generality, we assume Z_k is A_Γ-
orthonormal. Let the Cholesky factorization of A_Γ be
(2.16) A_Γ = R_Γ^⊤ R_Γ
and define
(2.17) S̃_2^{-1} := A_Γ^{-1} + Z_k(Λ_k^{-1} − I)Z_k^⊤.
S̃_2^{-1} is an additive combination of the first-level preconditioner S̃_1^{-1} and an adapted
deflation preconditioner associated with the subspace spanned by the columns of
U_k = R_Γ Z_k, which is an invariant subspace of R_Γ^{-⊤} S_Γ R_Γ^{-1}. Substituting U_k into
(2.17) and using (2.16),
(2.18) S̃_2^{-1} = R_Γ^{-1}( I + U_k(Λ_k^{-1} − I)U_k^⊤ ) R_Γ^{-⊤}.
Setting Q = U_k Λ_k^{-1} U_k^⊤ in (2.6) gives
P_A-DEF = R_Γ S̃_2^{-1} R_Γ^⊤.
Now S̃_2^{-1}S_Γ = R_Γ^{-1} P_A-DEF R_Γ^{-⊤} S_Γ and P_A-DEF R_Γ^{-⊤} S_Γ R_Γ^{-1} are spectrally equivalent
and Λ(S̃_2^{-1}S_Γ) = {λ_{n_Γ}, λ_{n_Γ−1}, . . . , λ_{k+1}} ∪ {1}. It follows that
κ(S̃_2^{-1}S_Γ) = λ_{n_Γ}/λ_{k+1} ≤ 1/λ_{k+1}.
Grigori et al. [14] note that (2.15) is equivalent to the generalized eigenvalue
problem
(2.19) (A_Γ − S_Γ)z = A_{ΓI} A_I^{-1} A_{IΓ} z = σ A_Γ z,  σ = 1 − λ.
Setting u = R_Γ z and defining
(2.20) H = R_Γ^{-⊤} A_{ΓI} A_I^{-1} A_{IΓ} R_Γ^{-1},
(2.19) becomes
(2.21) Hu = σu.
Thus, the smallest eigenvalues λ of (2.15) are transformed to the largest eigenvalues
σ of problems (2.19) and (2.21). Grigori et al. employ a randomized algorithm to
compute a low rank eigenvalue decomposition (EVD) of H that approximates its
largest eigenvalues and vectors, which are multiplied by R_Γ^{-1} to obtain approximate
eigenvectors of A_Γ^{-1}S_Γ.
In [27], Li et al. write the inverse of the Schur complement S_Γ as
(2.22) S_Γ^{-1} = (A_Γ − A_{ΓI} A_I^{-1} A_{IΓ})^{-1} = (R_Γ^⊤ R_Γ − A_{ΓI} A_I^{-1} A_{IΓ})^{-1} = R_Γ^{-1}(I − H)^{-1}R_Γ^{-⊤},
where the symmetric positive semidefinite (SPSD) matrix H is given by (2.20). Since
I − H = R_Γ^{-⊤} S_Γ R_Γ^{-1} is SPD, the eigenvalues σ_1 ≥ ··· ≥ σ_{n_Γ} of H belong to [0, 1).
Let the EVD of H be
H = UΣU^⊤,
where U is orthonormal and Σ = diag{σ_1, . . . , σ_{n_Γ}}. It follows that
(2.23) S_Γ^{-1} = R_Γ^{-1}(I − UΣU^⊤)^{-1}R_Γ^{-⊤}
 = R_Γ^{-1} U(I − Σ)^{-1}U^⊤ R_Γ^{-⊤}
 = R_Γ^{-1}( I + U((I − Σ)^{-1} − I)U^⊤ ) R_Γ^{-⊤}
 = A_Γ^{-1} + R_Γ^{-1} U((I − Σ)^{-1} − I)U^⊤ R_Γ^{-⊤}.
If H has an approximate EVD of the form
H ≈ UΣ̃U^⊤,  Σ̃ = diag{σ̃_1, . . . , σ̃_{n_Γ}},
then an approximation of S_Γ^{-1} is
(2.24) S̃^{-1} = A_Γ^{-1} + R_Γ^{-1} U((I − Σ̃)^{-1} − I)U^⊤ R_Γ^{-⊤}.
The simplest selection of Σ̃ is the one that ensures the k largest eigenvalues of (I − Σ̃)^{-1}
match the largest eigenvalues of (I − Σ)^{-1}. Li et al. set Σ̃ = diag(σ_1, . . . , σ_k, θ, . . . , θ),
where θ ∈ [0, 1]. The resulting preconditioner can be written as
(2.25) S̃_θ^{-1} = (1/(1 − θ)) A_Γ^{-1} + Z_k( (I − Σ_k)^{-1} − (1/(1 − θ)) I )Z_k^⊤,
where Σ_k = diag(σ_1, . . . , σ_k) and the columns of Z_k = R_Γ^{-1} U_k are the eigenvectors
corresponding to the k largest eigenvalues of H. In [27], it is shown that κ(S̃_θ^{-1}S_Γ) =
(1 − σ_{n_Γ})/(1 − θ), which takes its minimum value for θ = σ_{k+1}.
In the next section, we analyse the eigenvalue problems that need to be solved
to construct the preconditioners (2.17) and (2.25). In particular, we show that the
approaches presented in [14,27] for tackling these problems are inefficient because of
the eigenvalue distribution.
3. Analysis of Hu = σu.
3.1. Use of the Lanczos method. Consider the eigenproblem:
Given ε > 0, find all the eigenpairs (λ, z) ∈ R × R^{n_Γ} such that
S_Γ z = λ A_Γ z,  λ < ε.
This can be rewritten as:
(3.1) Given ε > 0, find all the eigenpairs (λ, z) ∈ R × R^{n_Γ} such that
(I − H)u = λu,  z = R_Γ^{-1}u,  λ < ε,
where R_Γ and H are given by (2.16) and (2.20). Consider also the eigenproblem:
(3.2) Given ε > 0, find all the eigenpairs (σ, u) ∈ R × R^{n_Γ} such that
Hu = σu,  σ > 1 − ε.
As already observed, each eigenpair (λ, z) of (3.1) corresponds to the eigenpair
(1 − λ, R_Γ z) of (3.2). Consider using the Lanczos method to solve these eigenproblems.
The Krylov subspace at iteration j generated for (3.1) is
K_j(I − H, v_1) = span(v_1, (I − H)v_1, . . . , (I − H)^{j−1}v_1),
while the subspace generated for (3.2) is
K_j(H, v_1) = span(v_1, Hv_1, . . . , H^{j−1}v_1).
It is clear that, provided the same starting vector v_1 is used, K_j(I − H, v_1) and
K_j(H, v_1) are identical. Suppose that [V_j, v_{j+1}] is the Lanczos basis of
the Krylov subspace; then the subspace relations that hold at iteration j are
(I − H)V_j = V_j T_j + v_{j+1} h_j^⊤,
H V_j = V_j(I − T_j) − v_{j+1} h_j^⊤,
where T_j ∈ R^{j×j} is a symmetric tridiagonal matrix and h_j ∈ R^j. The eigenpair
(λ, z) (respectively, (σ, u)) corresponding to the smallest (respectively, largest)
eigenvalue in (3.1) (respectively, (3.2)) is approximated by the eigenpair (λ̃, R_Γ^{-1}V_j ũ)
(respectively, (σ̃, V_j ũ)) corresponding to the smallest (respectively, largest) eigenvalue
of T_j (respectively, I − T_j). To overcome memory constraints, the Lanczos procedure
is typically restarted after a chosen number of iterations, at each restart discarding
the non-convergent part of the Krylov subspace [42]. Hence, starting with the same
v_1 and performing the same number of iterations per cycle, in exact arithmetic the
accuracy obtained when solving (3.1) and (3.2) is identical.
Having shown that the convergence of the Lanczos method for solving (3.1) and (3.2)
is the same, we focus on (3.2). In Figure 1, for each of our test matrices in Table 1,
we plot the 100 largest eigenvalues of the matrix H given by (2.20).
Fig. 1. Largest 100 eigenvalues of H = R_Γ^{-⊤} A_{ΓI} A_I^{-1} A_{IΓ} R_Γ^{-1} associated with our test matrices,
computed to an accuracy of 10^{-8} using the Krylov–Schur method [42].
We see that the
largest eigenvalues (which are the ones that we require) are clustered near one and
they do not decay rapidly. As there are a significant number of eigenvalues in the
cluster, computing the largest k (for k = O(10)) and the corresponding eigenvectors
with sufficient accuracy using the Lanczos method is challenging. Similar distributions
were observed for the larger test set that we report on in the Appendix, particularly
for problems for which the one-level preconditioner S̃_1 was found to perform poorly,
which is generally the case when κ(A) is large. Table 2 reports the Lanczos iteration
counts (it_Lan) for computing the k = 20 and 40 largest eigenpairs (that is, the number
of linear systems that are solved in the Lanczos method). In addition, we present the
PCG iteration count (it_PCG) for solving the linear system (2.13) using the first-level
preconditioner S̃_1^{-1} = A_Γ^{-1} and the two-level preconditioner S̃_2 given by (2.17). We
see that, in terms of the total iteration count, the first-level preconditioner is the
more efficient option. It is of interest to consider whether relaxing the convergence
tolerance ε_Lan in the Lanczos method can reduce the total iteration count for S̃_2.
Table 3 illustrates the effect of varying ε_Lan for problem el3d (results for the other test
problems are consistent). Although it_Lan decreases as ε_Lan increases, it_PCG increases
and the total count still exceeds the 175 PCG iterations required by the first-level
preconditioner S̃_1.
As already observed, in [49] a recursive (multilevel) scheme is proposed to
help mitigate the computational costs of building and applying the preconditioner.
                     S̃_1        S̃_2 (k = 20)               S̃_2 (k = 40)
Identifier    it_PCG      it_Lan  it_PCG  total     it_Lan  it_PCG  total
bcsstk38      584         797     122     919       730     67      797
el2d          914         1210    231     1441      982     120     1102
el3d          174         311     37      348       389     27      416
msc10848      612         813     116     929       760     63      823
nd3k          603         1796    143     1939      1349    105     1454
s3rmt3m3      441         529     70      599       480     37      517
Table 2
The Lanczos iteration count (it_Lan) and the iteration count for PCG (it_PCG). The convergence
tolerance for the Lanczos method and PCG is 10^{-6}. The size of the Krylov subspace per cycle is 2k.
              k = 20                      k = 40
ε_Lan     it_Lan  it_PCG  total       it_Lan  it_PCG  total
0.1       50      131     181         80      101     181
0.08      50      131     181         100     85      185
0.06      60      121     181         100     85      185
0.04      82      100     182         120     71      191
0.02      127     64      201         207     37      244
0.01      169     41      210         259     32      291
0.005     213     38      251         316     29      345
0.001     247     37      284         372     28      400
Table 3
Problem el3d and two-level preconditioner S̃_2: sensitivity of the Lanczos iteration count (it_Lan)
and the iteration count for PCG (it_PCG) to the convergence tolerance ε_Lan. The PCG
convergence tolerance is 10^{-6}. The size of the Krylov subspace per cycle is 2k.
Nevertheless, the Lanczos method is still used, albeit with reduced costs for applying
the operator matrices.
3.2. Use of Nyström's method. As suggested in [14], an alternative approach
to approximating the dominant subspace of H is to use a randomized method,
specifically a randomized eigenvalue decomposition. Because H is SPSD, Nyström's
method can be used. Results are presented in Table 4 for problem el3d (results for our
other test examples are consistent with these). Here p is the oversampling parameter
and q is the power iteration parameter. These show that, as with the Lanczos method,
Nyström's method struggles to approximate the dominant eigenpairs of H. Using
k = 20 (respectively, 40) exact eigenpairs, PCG using S̃_2 requires 37 (respectively,
28) iterations. To obtain the same iteration counts using vectors computed using
Nyström's method requires the oversampling parameter to be greater than 2000,
which is clearly prohibitive. Using the power iteration improves the quality of the
approximate subspace. However, the large value of q needed to decrease the PCG
iteration count means a large number of linear systems must be solved with A_Γ, in
addition to the work involved in the orthogonalization that is needed between the
power iterations to maintain stability. Indeed, it is sufficient to look at Figure 1 to
predict this behaviour for any randomized method applied to H. The lack of success
of existing strategies motivates us, in the next section, to reformulate the eigenvalue
problem to one with a spectrum that is easy to approximate.
p      k = 20  k = 40           q      k = 20  k = 40
100    171     169              0      172     171
200    170     165              20     121     87
400    165     161              40     86      48
800    155     146              60     68      34
1600   125     111              80     55      30
3200   55      45               100    46      29
Table 4
PCG iteration counts for problem el3d using the two-level preconditioner S̃_2 constructed using
a rank k approximation of H = R_Γ^{-⊤} A_{ΓI} A_I^{-1} A_{IΓ} R_Γ^{-1}. The PCG convergence tolerance is 10^{-6}.
Nyström's method applied to H with the oversampling parameter p ≥ 100 and the power iteration
parameter q = 0 (left) and with p = 0 and q ≥ 0 (right).
4. Nyström–Schur two-level preconditioner. In this section, we propose
reformulating the eigenvalue problem to obtain a new one such that the desired
eigenvectors correspond to the largest eigenvalues and these eigenvalues are well
separated from the remaining eigenvalues: this is what is needed for randomized
methods to be successful.
4.1. Two-level preconditioner for S_Γ. Applying the Sherman–Morrison–
Woodbury identity [13, 2.1.3], the inverse of the Schur complement S_Γ (2.11) can
be written as
(4.1) S_Γ^{-1} = A_Γ^{-1} + A_Γ^{-1} A_{ΓI}(A_I − A_{IΓ} A_Γ^{-1} A_{ΓI})^{-1} A_{IΓ} A_Γ^{-1}
      = A_Γ^{-1} + A_Γ^{-1} A_{ΓI} S_I^{-1} A_{IΓ} A_Γ^{-1},
where
(4.2) S_I = A_I − A_{IΓ} A_Γ^{-1} A_{ΓI}
is the Schur complement of A with respect to A_I. Using the Cholesky factorization
(2.16), we have
(4.3) R_Γ S_Γ^{-1} R_Γ^⊤ = I + R_Γ^{-⊤} A_{ΓI} S_I^{-1} A_{IΓ} R_Γ^{-1}.
Note that if (λ, u) is an eigenpair of R_Γ^{-⊤} S_Γ R_Γ^{-1}, then (1/λ − 1, u) is an eigenpair of
R_Γ^{-⊤} A_{ΓI} S_I^{-1} A_{IΓ} R_Γ^{-1}. Therefore, the cluster of eigenvalues of R_Γ^{-⊤} S_Γ R_Γ^{-1} near the
origin (which correspond to the cluster of eigenvalues of H near 1) correspond to
very large and highly separated eigenvalues of R_Γ^{-⊤} A_{ΓI} S_I^{-1} A_{IΓ} R_Γ^{-1}. Hence, using
randomized methods to approximate the dominant subspace of R_Γ^{-⊤} A_{ΓI} S_I^{-1} A_{IΓ} R_Γ^{-1}
can be an efficient way of computing a deflation subspace for R_Γ^{-⊤} S_Γ R_Γ^{-1}. Now
assume that we have a low rank approximation
(4.4) R_Γ^{-⊤} A_{ΓI} S_I^{-1} A_{IΓ} R_Γ^{-1} ≈ Ŭ_k Σ̆_k Ŭ_k^⊤,
where Ŭ_k ∈ R^{n_Γ×k} is orthonormal and Σ̆_k ∈ R^{k×k} is diagonal. Combining (4.3) and
(4.4), we can define a preconditioner for R_Γ^{-⊤} S_Γ R_Γ^{-1} to be
(4.5) P_1 = I + Ŭ_k Σ̆_k Ŭ_k^⊤.
The preconditioned matrix P_1 R_Γ^{-⊤} S_Γ R_Γ^{-1} is spectrally equivalent to R_Γ^{-1} P_1 R_Γ^{-⊤} S_Γ.
Therefore, the preconditioned system can be written as
(4.6) M_1 S_Γ = R_Γ^{-1} P_1 R_Γ^{-⊤} S_Γ = (A_Γ^{-1} + Z̆_k Σ̆_k Z̆_k^⊤) S_Γ,
where Z̆_k = R_Γ^{-1} Ŭ_k. If (4.4) is obtained using a truncated EVD denoted by U_k Σ_k U_k^⊤,
then Ŭ_k = U_k and the subspace spanned by the columns of U_k is an invariant subspace
of R_Γ S_Γ^{-1} R_Γ^⊤ and of its inverse R_Γ^{-⊤} S_Γ R_Γ^{-1}. Furthermore, using the truncated EVD,
(4.5) is an adapted deflation preconditioner for R_Γ^{-⊤} S_Γ R_Γ^{-1}. Indeed, as the columns of
U_k are orthonormal eigenvectors, we have from (4.3) that R_Γ S_Γ^{-1} R_Γ^⊤ U_k = U_k(I + Σ_k).
Hence R_Γ^{-⊤} S_Γ R_Γ^{-1} U_k = U_k(I + Σ_k)^{-1} and the preconditioned matrix is
P_A-DEF R_Γ^{-⊤} S_Γ R_Γ^{-1} = R_Γ^{-⊤} S_Γ R_Γ^{-1} + U_k Σ_k (I + Σ_k)^{-1} U_k^⊤
 = R_Γ^{-⊤} S_Γ R_Γ^{-1} + U_k((I + Σ_k) − I)(I + Σ_k)^{-1} U_k^⊤
 = R_Γ^{-⊤} S_Γ R_Γ^{-1} − U_k(I + Σ_k)^{-1} U_k^⊤ + U_k U_k^⊤,
which has the same form as the ideal adapted preconditioned matrix (2.7).
Note that, given the matrix Ŭ_k in the approximation (4.4), and following
subsection 2.2, we can define a deflation preconditioner for R_Γ^{-⊤} S_Γ R_Γ^{-1}. Setting
E_k = Ŭ_k^⊤ R_Γ^{-⊤} S_Γ R_Γ^{-1} Ŭ_k and Q = Ŭ_k E_k^{-1} Ŭ_k^⊤, the deflation preconditioner is
(4.7) P_1-A-DEF = I − Q R_Γ^{-⊤} S_Γ R_Γ^{-1} + Q.
The preconditioned Schur complement P_1-A-DEF R_Γ^{-⊤} S_Γ R_Γ^{-1} is spectrally similar to
R_Γ^{-1} P_1-A-DEF R_Γ^{-⊤} S_Γ and thus
(4.8) M_1-A-DEF = R_Γ^{-1} P_1-A-DEF R_Γ^{-⊤}
is a two-level preconditioner for S_Γ.
4.2. Lanczos versus Nyström. The two-level preconditioner (4.8) relies on
computing a low-rank approximation (4.4). We now consider the difference between
using the Lanczos and Nyström methods for this.
Both methods require the application of R_Γ^{-⊤} A_{ΓI} S_I^{-1} A_{IΓ} R_Γ^{-1} to a set of k + p
vectors, where k > 0 is the required rank and p ≥ 0. Because explicitly computing
the SPD matrix S_I = A_I − A_{IΓ} A_Γ^{-1} A_{ΓI} and factorizing it is prohibitively expensive,
applying S_I^{-1} must be done using an iterative solver.
The Lanczos method builds a Krylov subspace of dimension k + p in order to
compute a low-rank approximation. Therefore, k + p linear systems must be solved,
each with one right-hand side, first for R_Γ, then for S_I, and then for R_Γ^⊤. However,
the Nyström method requires the solution of only one linear system with k + p right-
hand sides for R_Γ, then for S_I, and then for R_Γ^⊤. This allows the use of matrix-matrix
operations rather than less efficient matrix-vector operations. Moreover, as we will
illustrate in section 5, block Krylov subspace methods, such as block CG [35], for
solving the system with S_I yield faster convergence than their classical counterparts.
When the Nyström method is used, we call the resulting preconditioner (4.8) the
Nyström–Schur preconditioner.
4.3. Avoiding computations with R_Γ. For large scale problems, computing
the Cholesky factorization A_Γ = R_Γ^⊤ R_Γ is prohibitive and so we would like to avoid
computations with R_Γ. We can achieve this by using an iterative solver to solve linear
systems with A_Γ. Note that this is possible when solving the generalized eigenvalue
problem (2.15). Because A_Γ is typically well conditioned, so too is R_Γ. Thus, we can
reduce the cost of computing the Nyström–Schur preconditioner by approximating
the SPSD matrix A_{ΓI} S_I^{-1} A_{IΓ} (or even by approximating S_I^{-1}). Of course, this needs
to be done without seriously adversely affecting the preconditioner quality. Using an
approximate factorization
(4.9) A_{ΓI} S_I^{-1} A_{IΓ} ≈ W̃_k Σ̃_k W̃_k^⊤,
an alternative deflation preconditioner is
P_2 = I + R_Γ^{-⊤} W̃_k Σ̃_k W̃_k^⊤ R_Γ^{-1}
    = R_Γ^{-⊤}(A_Γ + W̃_k Σ̃_k W̃_k^⊤)R_Γ^{-1}.
The preconditioned Schur complement P_2 R_Γ^{-⊤} S_Γ R_Γ^{-1} is spectrally similar to
R_Γ^{-1} P_2 R_Γ^{-⊤} S_Γ and, setting Z̃_k = A_Γ^{-1} W̃_k, we have
(4.10) M_2 S_Γ = R_Γ^{-1} P_2 R_Γ^{-⊤} S_Γ = (A_Γ^{-1} + Z̃_k Σ̃_k Z̃_k^⊤) S_Γ.
Thus M_2 = A_Γ^{-1} + Z̃_k Σ̃_k Z̃_k^⊤ is a variant of the Nyström–Schur preconditioner for S_Γ
that avoids computing R_Γ.
Alternatively, assuming we have an approximate factorization
(4.11) S_I^{-1} ≈ V̂_k Σ̂_k V̂_k^⊤,
yields
P_3 = I + R_Γ^{-⊤} A_{ΓI} V̂_k Σ̂_k V̂_k^⊤ A_{IΓ} R_Γ^{-1}.
Again, P_3 R_Γ^{-⊤} S_Γ R_Γ^{-1} is spectrally similar to R_Γ^{-1} P_3 R_Γ^{-⊤} S_Γ and, setting Ẑ_k =
A_Γ^{-1} A_{ΓI} V̂_k, we have
(4.12) M_3 S_Γ = R_Γ^{-1} P_3 R_Γ^{-⊤} S_Γ = (A_Γ^{-1} + Ẑ_k Σ̂_k Ẑ_k^⊤) S_Γ,
which gives another variant of the Nyström–Schur preconditioner. In a similar way
to defining M_1-A-DEF (4.8), we can define M_2-A-DEF and M_3-A-DEF. Note that M_2-A-DEF
and M_3-A-DEF also avoid computations with R_Γ.
4.4. Nyström–Schur preconditioner. Algorithm 4.1 presents the
construction of the Nyström–Schur preconditioner M_2; an analogous derivation
yields the variant M_3. Step 3 is the most expensive step, that is, solving the n_I × n_I
SPD linear system
(4.13) S_I X = F,
where F ∈ R^{n_I×(k+p)} and S_I = A_I − A_{IΓ} A_Γ^{-1} A_{ΓI}. Using an iterative solver requires a
linear system solve with A_Γ on each iteration. Importantly for efficiency, the number
of iterations can be limited by employing a large relative tolerance when solving
(4.13) without adversely affecting the performance of the resulting preconditioner.
Numerical experiments in section 5 illustrate this robustness.
Observe that applying M_2 to a vector requires the solution of a linear system
with A_Γ and a low rank correction; see Step 12.
Algorithm 4.1 Construction of the Nyström–Schur preconditioner (4.10)
Input: A in block form (2.9), k > 0 and p ≥ 0 (k, p ≪ n_Γ), and ε > 0.
Output: Two-level preconditioner for the n_Γ × n_Γ Schur complement S_Γ.
1: Draw a random matrix G ∈ R^{n_Γ×(k+p)}.
2: Compute F = A_{IΓ} G.
3: Solve S_I X = F.
4: Compute Y = A_{ΓI} X.
5: Compute the QR factorization Y = QR.
6: Set C = G^⊤ Y.
7: Compute the EVD C = V_1 D_1 V_1^⊤ + V_2 D_2 V_2^⊤, where D_1 contains all the eigenvalues
that are at least ε.
8: Set T = R V_1 D_1^{-1} V_1^⊤ R^⊤.
9: Compute the EVD T = W E W^⊤.
10: Set Ũ = Y W(:, 1:k), Σ = E(1:k, 1:k).
11: Solve A_Γ Z = Ũ.
12: Define the preconditioner M_2 = A_Γ^{-1} + Z Σ Z^⊤.
4.5. Estimation of the Spectral Condition Number. In this section, we
provide an expectation of the spectral condition number of S_Γ preconditioned by
the Nyström–Schur preconditioner. Saibaba [37] derives bounds on the angles
between the approximate singular vectors computed using a randomized singular
value decomposition and the exact singular vectors of a matrix. It is straightforward
to derive the corresponding bounds for the Nyström method. Let Π_M denote
the orthogonal projector onto the space spanned by the columns of the matrix M.
Let (λ_j, u_j), j = 1, . . . , k, be the eigenpairs of R_Γ^{-⊤} S_Γ R_Γ^{-1} associated with its k
smallest eigenvalues (equivalently, by (4.3), the dominant eigenpairs of R_Γ^{-⊤} A_{ΓI} S_I^{-1} A_{IΓ} R_Γ^{-1}).
Following the notation in Algorithm 2.1, the angle θ_j = ∠(u_j, Ũ) between the
approximate eigenvectors Ũ ∈ R^{n_Γ×(k+p)} of R_Γ^{-⊤} S_Γ R_Γ^{-1} and the exact eigenvector u_j ∈ R^{n_Γ}
satisfies
(4.14) sin ∠(u_j, Ũ) = ‖u_j − Π_Ũ u_j‖_2 ≤ γ_{j,k}^{q+1} c,
where q is the power iteration count (recall (2.2)), γ_{j,k} is the gap between λ_j^{-1} − 1
and λ_{k+1}^{-1} − 1 given by
(4.15) γ_{j,k} = (λ_{k+1}^{-1} − 1)/(λ_j^{-1} − 1),
and c has the expected value
(4.16) E(c) = √(k/(p − 1)) + e √((k + p)(n_Γ − k))/p,
where k is the required rank and p ≥ 2 is the oversampling parameter. Hence,
(4.17) E( sin ∠(u_j, Ũ) ) = E( ‖u_j − Π_Ũ u_j‖_2 ) ≤ γ_{j,k}^{q+1} E(c).
Note that if λ_j ≤ 1/2 then γ_{j,k} ≤ 2λ_j/λ_{k+1} (j = 1, . . . , k).
Proposition 4.1. Let the EVD of the SPD matrix I − H = R_Γ^{-⊤} S_Γ R_Γ^{-1} be
I − H = [U  U_k] [ Λ  0 ; 0  Λ_k ] [U  U_k]^⊤,
where Λ ∈ R^{(n_Γ−k)×(n_Γ−k)} and Λ_k ∈ R^{k×k} are diagonal matrices with the eigenvalues
(λ_i)_{n_Γ≥i≥k+1} and (λ_i)_{k≥i≥1}, respectively, in decreasing order. Furthermore, assume
that λ_k ≤ 1/2. Let the columns of Ũ ∈ R^{n_Γ×(k+p)} be the approximate eigenvectors of
I − H computed using the Nyström method and let
P = I − (I − H)Ũ E^{-1} Ũ^⊤  with  E = Ũ^⊤(I − H)Ũ,
be the associated deflation preconditioner. Then, the effective condition number of the
two-level preconditioned matrix P(I − H) = P R_Γ^{-⊤} S_Γ R_Γ^{-1} satisfies
(4.18) E( √(κ_eff(P(I − H))) ) ≤ c_1 √(λ_{n_Γ}/λ_{k+1}),
where c_1^2 is independent of the spectrum of I − H and can be bounded by a polynomial
of degree 3 in k.
Proof. Let x ∈ R^{n_Γ}. Since u_1, . . . , u_{n_Γ} form an orthogonal basis of R^{n_Γ}, there
exist α_1, . . . , α_{n_Γ} ∈ R such that x = Σ_{i=1}^{n_Γ} α_i u_i. In [25, Theorem 3.4], Kahl and
Rittich show that, if for some positive constant c_K, Ũ satisfies
(4.19) ‖x − Π_Ũ x‖_2^2 ≤ c_K ‖x‖_{I−H}^2 / ‖I − H‖_2,
then the effective condition number of P(I − H) satisfies
κ_eff(P(I − H)) ≤ c_K.
Let t ≤ k and consider
‖x − Π_Ũ x‖_2 = ‖ Σ_{i=1}^{n_Γ} α_i u_i − Π_Ũ Σ_{i=1}^{n_Γ} α_i u_i ‖_2
 ≤ ‖ Σ_{i=t+1}^{n_Γ} (I − Π_Ũ) α_i u_i ‖_2 + Σ_{i=1}^{t} |α_i| ‖u_i − Π_Ũ u_i‖_2
 ≤ ‖ Σ_{i=t+1}^{n_Γ} α_i u_i ‖_2 + Σ_{i=1}^{t} |α_i| ‖u_i − Π_Ũ u_i‖_2.
The last inequality is obtained using the fact that I − Π_Ũ is an orthogonal projector.
Now bound each term on the right separately. We have
‖ Σ_{i=t+1}^{n_Γ} α_i u_i ‖_2 ≤ (1/√λ_{t+1}) ‖ Σ_{i=t+1}^{n_Γ} √λ_{t+1} α_i u_i ‖_2 ≤ (1/√λ_{t+1}) ‖ Σ_{i=t+1}^{n_Γ} √λ_i α_i u_i ‖_2
 ≤ (1/√λ_{t+1}) ( Σ_{i=t+1}^{n_Γ} λ_i α_i^2 )^{1/2} = (1/√λ_{t+1}) ‖x − Π_{U_t} x‖_{I−H}
 = √(λ_{n_Γ}/λ_{t+1}) ‖x − Π_{U_t} x‖_{I−H} / √‖I − H‖_2.
From (4.15), γ_{i,k} ≤ 1 for i = 1, . . . , t, and thus
Σ_{i=1}^{t} |α_i| ‖u_i − Π_Ũ u_i‖_2 ≤ Σ_{i=1}^{t} |α_i| γ_{i,k}^{q+1} c ≤ c γ_{t,k}^{(q+1)/2} Σ_{i=1}^{t} |α_i| γ_{i,k}^{1/2}
 = c γ_{t,k}^{(q+1)/2} √(λ_{k+1}^{-1} − 1) Σ_{i=1}^{t} |α_i| / √(λ_i^{-1} − 1)
 ≤ c γ_{t,k}^{(q+1)/2} (1/√λ_{k+1}) Σ_{i=1}^{t} |α_i| / √(λ_i^{-1} − 1).
Assuming that λ_i ≤ 1/2 for i = 1, . . . , t, we have 1/√(λ_i^{-1} − 1) ≤ √2 √λ_i and hence
Σ_{i=1}^{t} |α_i| ‖u_i − Π_Ũ u_i‖_2 ≤ √2 c γ_{t,k}^{(q+1)/2} (1/√λ_{k+1}) Σ_{i=1}^{t} |α_i| √λ_i.
Using the fact that the ℓ_1 and ℓ_2 norms on R^t are equivalent (‖v‖_1 ≤ √t ‖v‖_2), we have
Σ_{i=1}^{t} |α_i| ‖u_i − Π_Ũ u_i‖_2 ≤ √(2t) c γ_{t,k}^{(q+1)/2} (1/√λ_{k+1}) ( Σ_{i=1}^{t} α_i^2 λ_i )^{1/2}
 = √(2t) c γ_{t,k}^{(q+1)/2} (1/√λ_{k+1}) ‖Π_{U_t} x‖_{I−H}
 = √(2t) c γ_{t,k}^{(q+1)/2} √(λ_{n_Γ}/λ_{k+1}) ‖Π_{U_t} x‖_{I−H} / √‖I − H‖_2.
Since λ_{k+1} ≥ λ_{t+1} (as t ≤ k), we have
Σ_{i=1}^{t} |α_i| ‖u_i − Π_Ũ u_i‖_2 ≤ √(2t) c γ_{t,k}^{(q+1)/2} √(λ_{n_Γ}/λ_{t+1}) ‖Π_{U_t} x‖_{I−H} / √‖I − H‖_2.
It follows that
‖x − Π_Ũ x‖_2 ≤ √(λ_{n_Γ}/λ_{t+1}) ‖x − Π_{U_t} x‖_{I−H} / √‖I − H‖_2
 + √(2t) c γ_{t,k}^{(q+1)/2} √(λ_{n_Γ}/λ_{t+1}) ‖Π_{U_t} x‖_{I−H} / √‖I − H‖_2
 ≤ 2 max( √(2t) c γ_{t,k}^{(q+1)/2}, 1 ) √(λ_{n_Γ}/λ_{t+1}) ‖x‖_{I−H} / √‖I − H‖_2.
Hence (4.19) is satisfied and we have
κ_eff(P(I − H)) ≤ 4 max( 2t c^2 γ_{t,k}^{q+1}, 1 ) λ_{n_Γ}/λ_{t+1}.
Thus,
E( √(κ_eff(P(I − H))) ) ≤ 2 max( √(2t) E(c) γ_{t,k}^{(q+1)/2}, 1 ) √(λ_{n_Γ}/λ_{t+1}).
Since t is chosen arbitrarily between 1 and k we have
(4.20) E( √(κ_eff(P(I − H))) ) ≤ 2 min_{1≤t≤k} ( max( √(2t) E(c) γ_{t,k}^{(q+1)/2}, 1 ) √(λ_{n_Γ}/λ_{t+1}) ).
Because E(c) can be bounded by a polynomial of degree 1 in k and γ_{t,k} ≤ 1,
4 max( 2t (E(c))^2 γ_{t,k}^{q+1}, 1 ) can be bounded by a polynomial of degree 3 in k that is independent
of the spectrum of I − H.
Note that, in practice, when the problem is challenging, a few eigenvalues of
R_Γ^{-⊤} S_Γ R_Γ^{-1} are close to the origin. This is reflected in a rapid and exponential
decay of the values of the entries of Λ^{-1} − I. Figure 2 depicts the bound obtained
in Proposition 4.1 for different values of k and q for problem s3rmt3m3.
Fig. 2. Problem s3rmt3m3: values of the bound (4.20) on E(√(κ_eff(P(I − H))))² for a range
of values of k and q.
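The bound is inexpensive to evaluate once the eigenvalues λ_i of I − H (or estimates of them) are available. The short Python snippet below evaluates (4.16) and the right-hand side of (4.20); the spectrum used here is synthetic and purely illustrative, and is not the data behind Figure 2.

```python
import numpy as np

def expected_c(k, p, n_gamma):
    """E(c) from (4.16); requires oversampling p >= 2."""
    return np.sqrt(k / (p - 1)) + np.e * np.sqrt((k + p) * (n_gamma - k)) / p

def bound_4_20(lam, k, p, q):
    """Right-hand side of (4.20) for eigenvalues lam of I - H = R^{-T} S_Gamma R^{-1}.

    lam must be sorted in increasing order: lam[0] = lambda_1, ..., lam[-1] = lambda_{n_Gamma}.
    """
    Ec = expected_c(k, p, lam.size)
    vals = []
    for t in range(1, k + 1):                                        # minimise over 1 <= t <= k
        gamma_tk = (1.0 / lam[k] - 1.0) / (1.0 / lam[t - 1] - 1.0)   # (4.15)
        vals.append(max(np.sqrt(2 * t) * Ec * gamma_tk ** ((q + 1) / 2), 1.0)
                    * np.sqrt(lam[-1] / lam[t]))
    return 2.0 * min(vals)

# synthetic spectrum with a handful of eigenvalues close to the origin
lam = np.sort(np.concatenate([np.geomspace(1e-8, 1e-3, 20), np.linspace(0.05, 1.0, 480)]))
for k in (5, 10, 20, 40):
    print(k, [round(bound_4_20(lam, k, p=2, q=q), 1) for q in (0, 1, 2)])
```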
5. Numerical Experiments. We use 64 subdomains (i.e., A_I is a 64-block
diagonal matrix) for each of our test matrices with the exception of one problem. The
matrix nd3k is much denser than the others, and we use only two blocks (to reduce
the runtime). For comparison purposes, we include results for the Schur complement
preconditioners S̃_1 and S̃_2 given by (2.14) and (2.17), respectively. As demonstrated
in subsection 3.1, the latter is too costly to be practical; however, its performance
represents the ideal since it guarantees the smallest spectral condition number for a fixed
deflation subspace. Therefore, the quality of the Nyström–Schur preconditioner will
be measured in terms of how close its performance is to that of S̃_2 and the reduction in
iteration count it gives compared to S̃_1. For a given problem, the right-hand side vector is the
same for all the tests: it is generated randomly with entries from the standard normal
distribution. The relative convergence tolerance for PCG is 10^{-6}. Unless otherwise
specified, the parameters within Nyström's method (Algorithm 2.1) are rank k = 20,
oversampling p = 0, and power iteration q = 0. To ensure fair comparisons, the
random matrices generated in different runs of the Nyström algorithm use the same
seed. We employ the Nyström–Schur variant M_2 (4.10) (recall that its construction
does not require the Cholesky factors of A_Γ). The relative convergence tolerance used
when solving the SPD system (4.13) is ε_SI = 0.1. This system (4.13) is preconditioned
by the block diagonal matrix A_I. We denote by it_SI the number of block PCG
iterations required to solve (4.13) during the construction of the Nyström–Schur
preconditioners (it is zero for S̃_1 and S̃_2), and by it_PCG the PCG iteration count
for solving (2.13). The total number of iterations is it_total = it_SI + it_PCG. We use the
code [1] to generate the numerical experiments.
Fig. 3. Histogram of the PCG iteration counts for (4.13) for problem bcsstk38. The number of
right-hand sides for which the iteration count is between [k, k + 10), k = 100, . . . , 240, is given.
                  Classic              Block
Identifier    iters   it_PCG       iters   it_PCG
bcsstk38      238     186          46      173
el2d          549     261          72      228
el3d          95      56           24      52
msc10848      203     194          47      166
nd3k          294     191          32      178
s3rmt3m3      403     157          37      98
Table 5
A comparison of the performance of classic and block PCG. iters denotes the iteration count
for solving (4.13) (details in the text) and it_PCG is the iteration count for solving (2.13).
5.1. Linear system with S_I. We start by considering how to efficiently
compute an approximate solution of (4.13).
5.1.1. Block and classic CG. The system (4.13) has k + p right-hand sides.
The number of iterations required by PCG to solve each right-hand side is different
and the variation can be large; this is illustrated in Figure 3 for problem bcsstk38.
Here we report the number of right-hand sides for which the iteration count lies in
the interval [k, k + 10), k = 100, . . . , 240. For example, there are 4 right-hand sides
for which the count is between 110 and 119. Similar behaviour was observed for our
other test problems.
Table 5 reports the iteration counts for the classical PCG method and the
breakdown-free block PCG method [21, 35]. For PCG, iters is the largest PCG
iteration count over the k + p right-hand sides. For the block method, iters = it_SI is
the number of block PCG iterations. As expected from the theory, the block method
significantly reduces the (maximum) iteration count. For our examples, it also leads
to a modest reduction in the iteration count it_PCG for solving (2.13).
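For reference, a minimal block CG iteration is sketched below in Python; it omits the preconditioner and the breakdown-free safeguards of [21, 35] that are used in our experiments, and is intended only to show the block recurrences that replace the scalar ones of classical CG.

```python
import numpy as np

def block_cg(A, B, X0=None, tol=1e-6, maxiter=500):
    """Minimal block CG for A X = B with SPD A and several right-hand sides.

    Simplified sketch: no preconditioner and no breakdown-free safeguards
    (see [21, 35] for robust variants).
    """
    n, s = B.shape
    X = np.zeros((n, s)) if X0 is None else X0.copy()
    R = B - A @ X
    P = R.copy()
    nrm0 = np.linalg.norm(B, axis=0)
    for it in range(maxiter):
        Q = A @ P
        alpha = np.linalg.solve(P.T @ Q, P.T @ R)        # (P^T A P)^{-1} P^T R
        X += P @ alpha
        R_new = R - Q @ alpha
        if np.all(np.linalg.norm(R_new, axis=0) <= tol * nrm0):
            return X, it + 1
        beta = -np.linalg.solve(P.T @ Q, Q.T @ R_new)    # enforce P^T A P_new = 0
        P = R_new + P @ beta
        R = R_new
    return X, maxiter
```

In the setting of (4.13), A is the operator S_I (applied via a solve with A_Γ), B = A_{IΓ}G has k + p columns, and the block diagonal matrix A_I supplies the preconditioner.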
5.1.2. Impact of tolerance ε_SI. We now study the impact of the convergence
tolerance ε_SI used when solving (4.13) on the quality of the Nyström–Schur
preconditioner. In Table 6, we present results for three test problems that illustrate
the (slightly) different behaviors we observed. The results demonstrate numerically
that a large tolerance can be used without affecting the quality of the preconditioner.
Indeed, using ε_SI = 0.3 leads to a preconditioner whose efficiency is close to that of the
ideal (but impractical) two-level preconditioner S̃_2. The use of a large ε_SI to limit it_SI
is crucial in ensuring low construction costs for the Nyström–Schur preconditioners.
                        M_2                 S̃_1      S̃_2
Identifier   ε_SI    it_SI   it_PCG      it_PCG   it_PCG
el2d         0.8     1       500+        914      231
             0.5     68      228
             0.3     70      228
             0.1     72      228
             0.01    78      228
el3d         0.8     1       173         174      37
             0.5     2       171
             0.3     22      52
             0.1     24      52
             0.01    27      52
nd3k         0.8     32      178         603      143
             0.5     32      178
             0.3     32      178
             0.1     32      178
             0.01    33      178
Table 6
The effects of the convergence tolerance ε_SI on the quality of the Nyström–Schur preconditioner.
Identifier   M_1   M_1-A-DEF   M_2   M_2-A-DEF   M_3   M_3-A-DEF   S̃_1   S̃_2
bcsstk38     218   218         219   219         360   313         584   122
el2d         266   267         300   300         282   282         914   231
el3d         73    72          76    75          78    76          174   37
msc10848     206   205         213   211         216   222         612   116
nd3k         205   205         210   210         211   211         603   143
s3rmt3m3     127   127         135   134         161   153         441   70
Table 7
Comparison of it_total for the variants of the Nyström–Schur preconditioner and S̃_1 and S̃_2.
ε_SI = 0.1.
5.2. Type of preconditioner. We next compare the performances of the
variants M_i and M_i-A-DEF (i = 1, 2, 3) of the Nyström–Schur preconditioner presented
in section 4. In Table 7, we report the total iteration count it_total. All the variants
have similar behaviors and have a significantly smaller count than the one-level
preconditioner S̃_1.
5.3. Varying the rank and the oversampling parameter. We now look
at varying the rank k within the Nyström algorithm and demonstrate numerically
that the efficiency of the preconditioner is robust with respect to the oversampling
parameter p. For problem s3rmt3m3, Table 8 compares the iteration counts for M_2
with that of the ideal two-level preconditioner S̃_2 for k ranging from 5 to 320. For S̃_1,
the iteration count is 441. This demonstrates the effectiveness of the Nyström–Schur
preconditioner in reducing the iteration count. Increasing the size of the deflation
subspace (the rank k) steadily reduces the iteration count required to solve the S_I
system (4.13).
k              5     10    20    40    80    160   320
M_2  it_SI     97    57    37    23    16    11    8
M_2  it_PCG    244   203   98    53    30    20    14
S̃_2  it_PCG    212   153   70    37    22    13    9
Table 8
Problem s3rmt3m3: impact of the rank k on the iteration counts (p = 0).
p          0     5     10    20    40
it_SI      37    31    28    23    20
it_PCG     98    86    79    77    74
Table 9
Problem s3rmt3m3: impact of the oversampling parameter p on the iteration counts (k = 20).
For the same test example, Table 9 presents the iteration counts for
a range of values of the oversampling parameter p (here k = 20). We observe that
the counts are relatively insensitive to p but, as p increases, it_PCG reduces towards
the lower bound of 70 PCG iterations required by S̃_2. Similar behavior was noticed
for our other test examples. Although increasing k and p improves the efficiency
of the Nyström–Schur preconditioner, this comes with extra costs during both the
construction of the preconditioner and its application. Nevertheless, the savings from
the reduction in the iteration count and the efficiency in solving block linear systems
of equations for moderate block sizes (for example, k = 40) typically outweigh the
increase in construction costs.
5.4. Comparisons with incomplete Cholesky factorization
preconditioners. Finally, we compare the Nyström–Schur preconditioner with
two incomplete Cholesky factorization preconditioners applied to the original system.
The first is the Matlab variant ichol with the global diagonal shift set to 0.1 and
default values for other parameters, and the second is the Matlab interface to the
incomplete Cholesky (IC) factorization preconditioner HSL_MI28 [39] from the HSL
library [20] using the default parameter settings. IC preconditioners are widely used
but their construction is often serial, potentially limiting their suitability for very
large problems (see [19] for an IC preconditioner that can be parallelised). In terms
of the iteration counts reported in Table 10, the Nyström–Schur and the HSL_MI28 preconditioners
are clearly superior to the simple ichol preconditioner, with neither consistently offering the
best performance; this is confirmed by the results in the Appendix for our large test set.
Figure 4 presents the residual norm history for PCG. The residual norm for
M_2 decreases monotonically while for the IC preconditioners we observe oscillatory
behaviour.
Because our implementation of the Nyström–Schur preconditioner is in Matlab,
we are not able to provide performance comparisons in terms of computation times.
Having demonstrated the potential of our two-level Nyström–Schur preconditioner,
one of our objectives for the future is to develop an efficient (parallel) implementation
in Fortran that will be included within the HSL library. This will allow users to
test out the preconditioner and to assess the performance of both constructing and
applying the preconditioner. Our preliminary work on this is encouraging.
                    M_2                HSL_MI28   ichol
Identifier    it_SI   it_PCG
bcsstk38      46      173          593        2786
ela2d         72      228          108        2319
ela3d         24      52           36         170
msc10848      47      166          145        784
nd3k          32      178          102        1231
s3rmt3m3      37      98           610        2281
Table 10
PCG iteration counts for the Nyström–Schur preconditioner M_2 (with k = 20) and the IC
preconditioners HSL_MI28 and ichol.
Fig. 4. PCG residual norm history for test examples bcsstk38 (top) and ela2d (bottom).
6. Concluding comments. In this paper, we have investigated using
randomized methods to construct efficient and robust preconditioners for use with
CG to solve large-scale SPD linear systems. The approach requires an initial
ordering to doubly bordered block diagonal form and then uses a Schur complement
approximation. We have demonstrated that by carefully posing the approximation
problem we can apply randomized methods to construct high quality preconditioners,
which gives an improvement over previously proposed methods that use low rank
approximation strategies. We have presented a number of variants of our new
Nyström–Schur preconditioner. During the preconditioner construction, we must
solve a smaller linear system with multiple right-hand sides. Our numerical
experiments have shown that a small number of iterations of block CG are needed
to obtain an approximate solution that is sufficient to construct an effective
preconditioner.
Currently, the construction and application of our Nyström–Schur preconditioners
requires the solution of linear systems with the block matrix A_Γ (2.9). Given
the promising results presented in this paper, in the future we plan to investigate
employing a recursive approach, following ideas given in [49]. This will only require
the solution of systems involving a much smaller matrix and will lead to a practical
approach for very large-scale SPD systems. A parallel implementation of the
preconditioner will also be developed.
Appendix A. Extended numerical experiments. Here we present results
for a larger test set. The problems are given in Table 11. We selected all the SPD
matrices in the SuiteSparse Collection with n lying between 5K and 100K, giving us a
set of 71 problems. For each problem, we ran PCG with the S̃_1, M_2, S̃_2 and HSL_MI28
preconditioners. In all the tests, we use 64 subdomains. For M_2, we used k = 20
and set p = q = 0. Iteration counts are given in the table, whilst performance profiles
[6] are presented in Figure 5. In recent years, performance profiles have become a
popular and widely used tool for providing objective information when benchmarking
algorithms. The performance profile takes into account the number of problems solved
by an algorithm as well as the cost to solve it. It scales the cost of solving the problem
according to the best solver for that problem. In our case, the performance cost is
the iteration count (for M_2, we sum the counts it_SI and it_PCG). Note that we do
not include S̃_2 in the performance profiles because it is an ideal but impractical two-
level preconditioner and, as such, it always outperforms M_2. The performance profile
shows that on the problems where S̃_1 struggles, there is little to choose between the
overall quality of M_2 and HSL_MI28.
             S̃_1     M_2              S̃_2    HSL_MI28   κ(A)
Identifier           it_SI   it_PCG
aft01 118 19 45 31 17 9e+18
apache1 667 122 291 192 72 3e+06
bcsstk17 349 46 55 48 59 1e+10
bcsstk18 136 40 77 45 26 6e+11
bcsstk25 92 660 453 254 1e+13
bcsstk36 451 64 214 169 1e+12
bcsstk38 584 46 171 122 593 6e+16
bodyy6 182 53 163 129 5 9e+04
cant 57 228 396 933 5e+10
cfd1 209 30 72 50 274 1e+06
consph 185 47 177 136 50 3e+07
gridgena 426 90 377 298 66 6e+05
gyro 55 346 518 319 4e+09
gyro k 55 346 518 319 3e+09
gyro m 165 16 34 22 17 1e+07
m t1 867 85 247 187 3e+11
minsurfo 15 3 15 13 3 8e+01
msc10848 612 47 168 116 145 3e+10
msc23052 479 69 220 175 1e+12
nasasrb 1279 135 496 421 1e+09
nd3k 1091 56 301 230 102 5e+07
nd6k 1184 108 325 248 116 6e+07
oilpan 647 67 122 72 507 4e+09
olafu 1428 69 489 757 557 2e+12
pdb1HYS 869 89 83 274 483 2e+12
vanbody 287 1106 769 4e+03
ct20stif 1296 90 232 281 2e+14
nd12k 1039 155 337 265 111 2e+08
nd24k 1093 165 386 268 120 2e+08
s1rmq4m1 154 19 50 32 33 5e+06
s1rmt3m1 192 24 59 39 18 3e+08
s2rmq4m1 231 28 54 41 39 4e+08
s2rmt3m1 260 31 64 45 33 3e+11
s3dkq4m2 148 339 236 610 6e+11
             S̃_1     M_2              S̃_2    HSL_MI28   κ(A)
Identifier           it_SI   it_PCG
s3dkt3m2 164 338 270 1107 3e+10
s3rmq4m1 356 31 80 58 472 4e+10
s3rmt3m1 434 36 101 64 413 4e+10
s3rmt3m3 441 37 101 70 610 3e+00
ship 001 1453 367 600 368 1177 6e+09
smt 399 59 112 72 95 1e+09
thermal1 169 30 62 47 30 4e+01
Pres Poisson 92 13 29 19 32 3e+06
crankseg 1 92 16 49 33 34 9e+18
crankseg 2 89 17 47 32 38 8e+06
Kuu 81 16 44 31 10 3e+04
bodyy5 72 19 67 57 4 9e+03
Dubcova2 62 11 32 23 14 1e+04
cbuckle 55 9 51 39 47 7e+07
fv3 50 12 31 21 8 4e+03
Dubcova1 39 8 24 15 7 2e+03
bodyy4 34 8 29 24 4 1e+03
jnlbrng1 22 4 21 19 4 1e+02
bundle1 13 3 8 5 5 1e+04
t2dah e 12 3 12 11 3 3e+07
obstclae 12 3 12 12 3 4e+01
torsion1 12 3 12 12 3 8e+03
wathen100 12 3 12 11 3 2e+07
wathen120 12 3 12 11 3 2e+07
fv1 7 2 7 7 3 1e+01
fv2 7 2 7 7 3 1e+01
shallow water2 7 40 7 7 3 3e+12
shallow water1 5 20 5 5 2 1e+01
Muu 6 1 6 6 2 1e+02
qa8fm 6 1 6 6 2 1e+02
crystm02 6 1 6 5 2 4e+02
crystm03 6 1 6 5 2 4e+02
finan512 5 1 5 5 3 9e+01
ted_B_unscaled 3 1 3 4 2 4e+05
ted_B 2 1 3 3 2 2e+11
Trefethen_20000b 3 1 2 2 3 1e+05
Trefethen_20000 4 1 2 2 3 2e+05
Table 11
PCG iteration counts for SPD matrices from the SuiteSparse Collection with $n$ ranging between 5K and 100K.
Fig. 5. Iteration count performance profile for the large test set. The 40 problems used in the right-hand plot are the subset for which the $\widetilde{S}_1$ (one-level) iteration count exceeded 100.
REFERENCES
[1] H. Al Daas,haldaas/Nystrom–Schur-Preconditioner: version reproducing paper numerical
experiments, June 2021, https://doi.org/10.5281/zenodo.4957301.
[2] H. Al Daas and L. Grigori,A class of efficient locally constructed preconditioners based on
coarse spaces, SIAM Journal on Matrix Analysis and Applications, 40 (2019), pp. 66–91.
[3] H. Al Daas, L. Grigori, P. Jolivet, and P.-H. Tournier,A multilevel Schwarz
preconditioner based on a hierarchy of robust coarse spaces, SIAM Journal on Scientific
Computing, (2021), pp. A1907–A1928.
[4] H. Al Daas, P. Jolivet, and J. A. Scott,A robust algebraic domain decomposition
preconditioner for sparse normal equations, 2021, https://arxiv.org/abs/2107.09006.
[5] T. A. Davis and Y. Hu,The University of Florida sparse matrix collection, ACM Transactions
on Mathematical Software, 38 (2011), pp. 1–28.
[6] E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, 91 (2002), pp. 201–213.
[7] V. Dolean, P. Jolivet, and F. Nataf,An introduction to domain decomposition methods,
Society for Industrial and Applied Mathematics, Philadelphia, PA, 2015. Algorithms,
theory, and parallel implementation.
[8] Z. Dostál, Conjugate gradient method with preconditioning by projector, International Journal of Computer Mathematics, 23 (1988), pp. 315–323.
[9] I. S. Duff, A. M. Erisman, and J. K. Reid,Direct Methods for Sparse Matrices, Second
Edition, Oxford University Press, London, 2017.
[10] J. Frank and C. Vuik,On the construction of deflation-based preconditioners, SIAM Journal
on Scientific Computing, 23 (2001), pp. 442–462.
[11] A. Gaul, M. H. Gutknecht, J. Liesen, and R. Nabben,A framework for deflated and
augmented Krylov subspace methods, SIAM Journal on Matrix Analysis and Applications,
34 (2013), pp. 495–518.
[12] A. Gittens and M. W. Mahoney,Revisiting the Nystr¨om method for improved large-scale
machine learning, J. Mach. Learn. Res., 17 (2016), pp. 3977–4041.
[13] G. H. Golub and C. F. Van Loan,Matrix Computations, The Johns Hopkins University
Press, third ed., 1996.
[14] L. Grigori, F. Nataf, and S. Yousef,Robust algebraic Schur complement preconditioners
based on low rank corrections, Research Report RR-8557, INRIA, July 2014, https://hal.
inria.fr/hal-01017448.
[15] M. H. Gutknecht,Deflated and augmented Krylov subspace methods: A framework for
deflated BiCG and related solvers, SIAM Journal on Matrix Analysis and Applications,
35 (2014), pp. 1444–1466.
[16] N. Halko, P.-G. Martinsson, and J. A. Tropp,Finding structure with randomness:
Probabilistic algorithms for constructing approximate matrix decompositions, SIAM
Review, 53 (2011), pp. 217–288.
[17] F. Hecht,New development in freefem++, Journal of Numerical Mathematics, 20 (2012),
pp. 251–265.
[18] N. J. Higham and T. Mary,A new preconditioner that exploits low-rank approximations to
factorization error, SIAM Journal on Scientific Computing, 41 (2019), pp. A59–A82.
[19] J. Hook, J. Scott, F. Tisseur, and J. Hogg, A max-plus approach to incomplete Cholesky factorization preconditioners, SIAM Journal on Scientific Computing, 40 (2018), pp. A1987–A2004.
[20] HSL. A collection of Fortran codes for large-scale scientific computation, 2018. http://www.
hsl.rl.ac.uk.
[21] H. Ji and Y. Li,A breakdown-free block conjugate gradient method, BIT Numerical
Mathematics, 57 (2017), pp. 379–403.
[22] T. B. Jönsthövel, M. B. van Gijzen, C. Vuik, C. Kasbergen, and A. Scarpas, Preconditioned conjugate gradient method enhanced by deflation of rigid body modes applied to composite materials, Computer Modeling in Engineering & Sciences, 47 (2009), pp. 97–118.
[23] T. B. Jönsthövel, M. B. van Gijzen, C. Vuik, and A. Scarpas, On the use of rigid body modes in the deflated preconditioned conjugate gradient method, SIAM Journal on Scientific Computing, 35 (2013), pp. B207–B225.
[24] E. F. Kaasschieter,Preconditioned conjugate gradients for solving singular systems, Journal
of Computational and Applied Mathematics, 24 (1988), pp. 265–275.
[25] K. Kahl and H. Rittich,The deflated conjugate gradient method: Convergence, perturbation
and accuracy, Linear Algebra and its Applications, 515 (2017), pp. 111–129.
[26] G. Karypis and V. Kumar,METIS: A software package for partitioning unstructured graphs,
partitioning meshes, and computing fill-reducing orderings of sparse matrices, Technical
Report 97-061, University of Minnesota, Department of Computer Science and Army HPC
Research Center, 1997.
[27] R. Li, Y. Xi, and Y. Saad,Schur complement-based domain decomposition preconditioners
with low-rank corrections, Numerical Linear Algebra with Applications, 23 (2016), pp. 706–
729.
[28] P.-G. Martinsson and J. A. Tropp,Randomized numerical linear algebra: Foundations and
algorithms, Acta Numerica, 29 (2020), pp. 403–572.
[29] METIS - serial graph partitioning and fill-reducing matrix ordering, 2020. http://glaros.dtc.
umn.edu/gkhome/metis/metis/overview.
[30] R. Nabben and C. Vuik,A comparison of abstract versions of deflation, balancing and additive
coarse grid correction preconditioners, Numerical Linear Algebra with Applications, 15
(2008), pp. 355–372.
[31] Y. Nakatsukasa,Fast and stable randomized low-rank matrix approximation, 2020, https:
//arxiv.org/abs/2009.11392.
[32] F. Nataf, H. Xiang, V. Dolean, and N. Spillane,A coarse space construction based on local
Dirichlet-to-Neumann maps, SIAM Journal on Scientific Computing, 33 (2011), pp. 1623–
1642.
[33] R. A. Nicolaides,Deflation of conjugate gradients with applications to boundary value
problems, SIAM J. on Numerical Analysis, 24 (1987), pp. 355–365.
[34] E. J. Nyström, Über die praktische Auflösung von Integralgleichungen mit Anwendungen auf Randwertaufgaben, Acta Mathematica, 54 (1930), pp. 185–204.
[35] D. P. O’Leary,The block conjugate gradient algorithm and related methods, Linear Algebra
and its Applications, 29 (1980), pp. 293–322.
[36] Y. Saad,Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied
Mathematics, Philadelphia, PA, USA, 2nd ed., 2003.
[37] A. K. Saibaba,Randomized subspace iteration: Analysis of canonical angles and unitarily
invariant norms, SIAM Journal on Matrix Analysis and Applications, 40 (2019), pp. 23–
48.
[38] J. A. Scott,A parallel frontal solver for finite element applications, International J. of
Numerical Methods in Engineering, 50 (2001), pp. 1131–1144.
[39] J. A. Scott and M. Tůma, HSL_MI28: An efficient and robust limited-memory incomplete Cholesky factorization code, ACM Transactions on Mathematical Software, 40 (2014), pp. 24:1–19.
[40] N. Spillane, V. Dolean, P. Hauret, F. Nataf, C. Pechstein, and R. Scheichl,Abstract
robust coarse spaces for systems of PDEs via generalized eigenproblems in the overlaps,
Numerische Mathematik, 126 (2014), pp. 741–770.
[41] N. Spillane and D. Rixen,Automatic spectral coarse spaces for robust finite element tearing
and interconnecting and balanced domain decomposition algorithms, International Journal
for Numerical Methods in Engineering, 95 (2013), pp. 953–990.
[42] G. W. Stewart,A Krylov–Schur algorithm for large eigenproblems, SIAM Journal on Matrix
Analysis and Applications, 23 (2002), pp. 601–614.
26 H. AL DAAS, T. REES, AND J. SCOTT
[43] J. M. Tang, S. P. MacLachlan, R. Nabben, and C. Vuik,A comparison of two-level
preconditioners based on multigrid and deflation, SIAM Journal on Matrix Analysis and
Applications, 31 (2010), pp. 1715–1739.
[44] J. M. Tang, R. Nabben, C. Vuik, and Y. A. Erlangga,Comparison of two-level
preconditioners derived from deflation, domain decomposition and multigrid methods,
Journal of Scientific Computing, 39 (2009), pp. 340–370.
[45] C. Vuik, A. Segal, and J. A. Meijerink,An efficient preconditioned CG method for the
solution of a class of layered problems with extreme contrasts in the coefficients, Journal
of Computational Physics, 152 (1999), pp. 385–403.
[46] C. Vuik, A. Segal, J. A. Meijerink, and G. T. Wijma,The construction of projection vectors
for a deflated ICCG method applied to problems with extreme contrasts in the coefficients,
Journal of Computational Physics, 172 (2001), pp. 426–450.
[47] C. K. I. Williams and M. Seeger,Using the Nystr¨om method to speed up kernel machines,
in Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich,
and V. Tresp, eds., MIT Press, 2001, pp. 682–688.
[48] D. Woodruff, Sketching as a Tool for Numerical Linear Algebra, Foundations and Trends® in Theoretical Computer Science Series, Now Publishers, 2014.
[49] Y. Xi, R. Li, and Y. Saad,An algebraic multilevel preconditioner with low-rank corrections
for sparse symmetric matrices, SIAM Journal on Matrix Analysis and Applications, 37
(2016), pp. 235–259.
[50] Q. Zheng, Y. Xi, and Y. Saad,A power Schur complement low-rank correction preconditioner
for general sparse linear systems, SIAM Journal on Matrix Analysis and Applications, 42
(2021), pp. 659–682.