
TWO-LEVEL NYSTRÖM–SCHUR PRECONDITIONER FOR SPARSE SYMMETRIC POSITIVE DEFINITE MATRICES∗

HUSSAM AL DAAS†, TYRONE REES†, AND JENNIFER SCOTT†‡

Abstract. Randomized methods are becoming increasingly popular in numerical linear algebra. However, few attempts have been made to use them in developing preconditioners. Our interest lies in solving large-scale sparse symmetric positive definite linear systems of equations where the system matrix is preordered to doubly bordered block diagonal form (for example, using a nested dissection ordering). We investigate the use of randomized methods to construct high quality preconditioners. In particular, we propose a new and efficient approach that employs Nyström's method for computing low rank approximations to develop robust algebraic two-level preconditioners. Construction of the new preconditioners involves iteratively solving a smaller but denser symmetric positive definite Schur complement system with multiple right-hand sides. Numerical experiments on problems coming from a range of application areas demonstrate that this inner system can be solved cheaply using block conjugate gradients and that using a large convergence tolerance to limit the cost does not adversely affect the quality of the resulting Nyström–Schur two-level preconditioner.

Key words. Randomized methods, Nyström's method, low rank, Schur complement, deflation, sparse symmetric positive definite systems, doubly bordered block diagonal form, block conjugate gradients, preconditioning.

1. Introduction. Large scale linear systems of equations arise in a wide range of real-life applications. Since the 1970s, sparse direct methods, such as LU, Cholesky, and $LDL^T$ factorizations, have been studied in depth and library quality software is available (see, for example, [9] and the references therein). However, their memory requirements and the difficulties in developing effective parallel implementations can limit their scope for solving extremely large problems, unless they are used in combination with an iterative approach. Iterative methods are attractive because they have low memory requirements and are simpler to parallelize. In this work, our interest is in using the conjugate gradient (CG) method to solve large sparse symmetric positive definite (SPD) systems of the form

(1.1)   $Ax = b$,

where $A \in \mathbb{R}^{n \times n}$ is SPD, $b \in \mathbb{R}^n$ is the given right-hand side, and $x$ is the required solution. The solution of SPD systems is ubiquitous in scientific computing, being required in applications as diverse as least-squares problems, non-linear optimization subproblems, Monte-Carlo simulations, finite element analysis, and Kalman filtering. In the following, we assume no additional structure beyond a sparse SPD system.

It is well known that the approximate solution $x_k$ at iteration $k$ of the CG method satisfies

(1.2)   $\|x_\star - x_k\|_A \le 2\,\|x_\star - x_0\|_A \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k$,

where $x_\star$ is the exact solution, $x_0$ is the initial guess, $\|\cdot\|_A$ is the $A$-norm, and $\kappa(A) = \lambda_{\max}/\lambda_{\min}$ is the spectral condition number ($\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and smallest eigenvalues of $A$). For example, if $\kappa = 10^8$ then the reduction factor is $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1) \approx 1 - 2\times 10^{-4}$, so the bound allows on the order of $10^4$ iterations per digit of accuracy gained.

∗Submitted to the editors January 28, 2021.
†STFC Rutherford Appleton Laboratory, Harwell Campus, Didcot, Oxfordshire, OX11 0QX, UK (hussam.al-daas@stfc.ac.uk, tyrone.rees@stfc.ac.uk, jennifer.scott@stfc.ac.uk).
‡School of Mathematical, Physical and Computational Sciences, University of Reading, Reading RG6 6AQ, UK.


The rate of convergence also depends on the distribution of the eigenvalues (as well as on $b$ and $x_0$): eigenvalues clustered away from the origin lead to rapid convergence. If $\kappa(A)$ is large and the eigenvalues of $A$ are evenly distributed, the system needs to be preconditioned to enhance convergence. This can be done by applying a linear operator $\mathcal{P}$ to (1.1), where $\mathcal{P} \in \mathbb{R}^{n \times n}$ is chosen so that the spectral condition number of $\mathcal{P}A$ is small and applying $\mathcal{P}$ is inexpensive. In some applications, knowledge of the provenance of $A$ can help in building an efficient preconditioner. Algebraic preconditioners do not assume such knowledge, and include incomplete Cholesky factorizations, block Jacobi, Gauss–Seidel, and additive Schwarz; see, for example, [36]. These are referred to as one-level or traditional preconditioners [7, 43]. In general, algebraic preconditioners bound the largest eigenvalues of $\mathcal{P}A$ but encounter difficulties in controlling the smallest eigenvalues, which can lie close to the origin, hindering convergence.

Deflation strategies have been proposed to overcome the issues related to small eigenvalues. As explained in [25], the basic idea behind deflation is to "hide" certain parts of the spectrum of the matrix from the CG method, such that the CG iteration "sees" a system that has a much smaller condition number than the original matrix. The part of the spectrum that is hidden from CG is determined by the deflation subspace, and the improvement in the convergence rate of the deflated CG method is dependent on the choice of this subspace. In the ideal case, the deflation subspace is the invariant subspace spanned by the eigenvectors associated with the smallest eigenvalues of $A$, and the convergence rate is then governed by the "effective" spectral condition number associated with the remaining eigenvalues (that is, the ratio of the largest eigenvalue to the smallest remaining eigenvalue). The idea was first introduced in the late 1980s [8, 33], and has been discussed and used by a number of researchers [2, 3, 10, 14, 22, 23, 27, 32, 40, 41, 45, 46]. However, in most of these references, the deflation subspaces rely on the underlying partial differential equation and its discretization, and cannot be applied to more general systems or used as "black box" preconditioners. Algebraic two-level preconditioners have been proposed in [4, 11, 15, 30, 43, 44]. Recently, a two-level Schur complement preconditioner based on the power series approximation was proposed in [50].

In recent years, the study of randomized methods has become an active and promising research area in the field of numerical linear algebra (see, for example, [16, 31] and the references therein). The use of randomized methods to build preconditioners has been proposed in a number of papers, including [14, 18]. The approach in [14] starts by reordering the system matrix $A$ to a $2 \times 2$ doubly bordered block diagonal form, which can be achieved using a nested dissection ordering. The Schur complement system must then be solved. Starting from a first-level preconditioner $\mathcal{P}$, a deflation subspace is constructed via a low rank approximation. Although deflation can be seen as a low rank correction, using randomized methods to estimate the low rank term is not straightforward because the deflation subspace is more likely to be associated with the invariant subspace corresponding to the smallest eigenvalues of the preconditioned matrix, and not to its dominant subspace. In section 2, we review the ingredients involved in building our two-level preconditioner. This includes Nyström's method for computing a low rank approximation of a matrix [12, 16, 34, 47, 48], basic ideas behind deflation preconditioners, and the two-level Schur complement preconditioners presented in [14, 27]. In section 3, we illustrate the difficulties in constructing these two-level preconditioners by analysing the eigenvalue problems that must be solved. We show that these difficulties are mainly associated with the clustering of eigenvalues near the origin. Motivated by this analysis, in section 4 we propose reformulating the approximation problem.

The new formulation leads to well-separated eigenvalues that lie away from the origin, and this allows randomized methods to be used to compute a deflation subspace. Our approach guarantees a user-defined upper bound on the expected value of the spectral condition number of the preconditioned matrix. Numerical results for our new preconditioner and comparisons with other approaches are given in section 5. Concluding remarks are made in section 6.

Our main contributions are:
• an analysis of the eigenvalue problems and solvers presented in [14, 27];
• a reformulation of the eigenvalue problem so that it can be solved efficiently using randomized methods;
• a new two-level preconditioner for symmetric positive definite systems that we refer to as a two-level Nyström–Schur preconditioner;
• theoretical bounds on the expected value of the spectral condition number of the preconditioned system.

Test environment. In this study, to demonstrate our theoretical and practical findings, we report on numerical experiments using the test matrices given in Table 1. This set was chosen to include 2D and 3D problems having a range of densities and with relatively large spectral condition numbers. In the Appendix, results are given for a much larger set of matrices. For each test, the entries of the right-hand side vector $f$ are taken to be random numbers in the interval $[0, 1]$. All experiments are performed using Matlab 2020b.

Table 1: Set of test matrices. $n$ and $nnz(A)$ denote the order of $A$ and the number of nonzero entries in $A$, $\kappa(A)$ is the spectral condition number, and $n_\Gamma$ is the order of the Schur complement (2.11). SSMC refers to the SuiteSparse Matrix Collection [5]; FF++ refers to FreeFem++ [17].

Identifier | n      | nnz(A)    | κ(A)    | n_Γ   | 2D/3D | Application        | Source
bcsstk38   | 8,032  | 355,460   | 5.5e+16 | 2,589 | 2D    | Structural problem | SSMC
ela2d      | 45,602 | 543,600   | 1.5e+8  | 4,288 | 2D    | Elasticity problem | FF++
ela3d      | 9,438  | 312,372   | 4.5e+5  | 4,658 | 3D    | Elasticity problem | FF++
msc10848   | 10,848 | 1,229,776 | 1.0e+10 | 4,440 | 3D    | Structural problem | SSMC
nd3k       | 9,000  | 3,279,690 | 1.6e+7  | 1,785 | 3D    | Not available      | SSMC
s3rmt3m3   | 5,357  | 207,123   | 2.4e+10 | 2,058 | 2D    | Structural problem | SSMC

Notation. Throughout this article, matrices are denoted using uppercase letters; scalars and vectors are lowercase. The pseudoinverse of a matrix $C$ is denoted by $C^\dagger$ and its transpose by $C^T$. $\Lambda(M)$ denotes the spectrum of the matrix $M$ and $\kappa(M)$ denotes its condition number. $\Lambda_k = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ denotes a $k \times k$ diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_k$. $\tilde{S}$ (with or without a subscript or superscript) is used as an approximation to a Schur complement matrix. $\mathcal{P}$ (with or without a subscript) denotes a (deflation) preconditioner. $\mathcal{M}$ (with or without a subscript) denotes a two-level (deflation) preconditioner. Matrices with an upper symbol such as $\tilde{Z}$, $\hat{Z}$, and $\breve{Z}$ denote approximations of the matrix $Z$. Euler's constant is denoted by $e$.

2. Background. We start by presenting a brief review of Nyström's method for computing a low rank approximation to a matrix and then recalling key ideas behind two-level preconditioners; both are required in later sections.

2.1. Nyström's method. Given a matrix $G$, the Nyström approximation of an SPSD matrix $B$ is defined to be

(2.1)   $BG\,(G^T B G)^\dagger (BG)^T$.

We observe that there are a large number of variants based on different choices of $G$ (for example, [16, 28, 31]). For $q \ge 0$, the $q$-power iteration Nyström method is obtained by choosing

(2.2)   $G = B^q \Omega$,

for a given (random) starting matrix $\Omega$. Note that, in practice, for stability it is normally necessary to orthonormalize the columns between applications of $B$.
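To make this concrete, here is a minimal NumPy sketch (ours, not from the paper) that forms an orthonormal basis for the range of $B^q\Omega$, re-orthonormalizing between applications of $B$; the routine `B_apply`, which returns $B$ times a block of vectors, is an assumed input.

```python
import numpy as np

def power_sample(B_apply, Omega, q):
    # Form (an orthonormal basis for the range of) G = B^q * Omega,
    # re-orthonormalizing between applications of B for stability, cf. (2.2).
    G = Omega
    for _ in range(q):
        G, _ = np.linalg.qr(B_apply(G))  # thin QR keeps the columns well conditioned
    return G
```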

The variant of Nyström's method we employ is outlined in Algorithm 2.1. It gives a near-optimal low rank approximation to $B$ and is particularly effective when the eigenvalues of $B$ decay rapidly after the $k$-th eigenvalue [16, 31]. It requires only one matrix–matrix product with $B$ (or $q+1$ products if (2.2) is used). The rank of the resulting approximation is $\min(r, k)$, where $r$ is the rank of $D_1$; see Step 5.

Algorithm 2.1 Nyström's method for computing a low rank approximation to an SPSD matrix.
Input: An SPSD matrix $B \in \mathbb{R}^{n \times n}$, the required rank $k > 0$, an oversampling parameter $p \ge 0$ such that $k, p \ll n$, and a threshold $\varepsilon$.
Output: $\tilde{B}_k = \tilde{U}_k \tilde{\Sigma}_k \tilde{U}_k^T \approx B$, where $\tilde{U}_k$ is orthonormal and $\tilde{\Sigma}_k$ is diagonal with nonnegative entries.
1: Draw a random matrix $G \in \mathbb{R}^{n \times (k+p)}$.
2: Compute $F = BG$.
3: Compute the QR factorization $F = QR$.
4: Set $C = G^T F$.
5: Compute the EVD $C = V_1 D_1 V_1^T + V_2 D_2 V_2^T$, where $D_1$ contains all the eigenvalues that are at least $\varepsilon$.
6: Set $T = R V_1 D_1^{-1} (R V_1)^T$.
7: Compute the EVD $T = W E W^T$.
8: Set $\tilde{U} = QW$, $\tilde{U}_k = \tilde{U}(:, 1\!:\!k)$, $\tilde{\Sigma}_k = E(1\!:\!k, 1\!:\!k)$, and $\tilde{B}_k = \tilde{U}_k \tilde{\Sigma}_k \tilde{U}_k^T$.

Note that, if the eigenvalues are ordered in descending order, the success of Nyström's method is closely related to the ratio of the $(k+1)$th and the $k$th eigenvalues. If this ratio is approximately equal to one, $q$ must be large to obtain a good approximation [37].
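For illustration, a minimal NumPy sketch of Algorithm 2.1 (ours; `B_apply` is again an assumed routine returning $B$ times a block of vectors, and the power iteration is omitted):

```python
import numpy as np

def nystrom_evd(B_apply, n, k, p=0, eps=1e-12, rng=None):
    # Rank-k Nystrom approximation B ~ Uk diag(sk) Uk^T of an SPSD operator,
    # following the steps of Algorithm 2.1.
    rng = np.random.default_rng() if rng is None else rng
    G = rng.standard_normal((n, k + p))        # step 1: random test matrix
    F = B_apply(G)                             # step 2: F = B G
    Q, R = np.linalg.qr(F)                     # step 3: thin QR of F
    C = G.T @ F                                # step 4: C = G^T B G
    d, V = np.linalg.eigh((C + C.T) / 2)       # step 5: EVD (symmetrized)
    V1, d1 = V[:, d >= eps], d[d >= eps]       # keep eigenvalues >= eps
    RV1 = R @ V1
    T = RV1 @ np.diag(1.0 / d1) @ RV1.T        # step 6: T = R V1 D1^{-1} (R V1)^T
    E, W = np.linalg.eigh(T)                   # step 7: EVD of T
    idx = np.argsort(E)[::-1][:k]              # k largest eigenpairs
    return Q @ W[:, idx], E[idx]               # step 8: (Uk, diag of Sigma_k)
```

In line with the cost estimate above, only one block product with $B$ is required.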

2.2. Introduction to two-level preconditioners. Consider the linear system (1.1). As already noted, deflation techniques are typically used to shift isolated clusters of small eigenvalues to obtain a tighter spectrum and a smaller condition number. Such changes have a positive effect on the convergence of Krylov subspace methods. Consider the general (left) preconditioned system

(2.3)   $\mathcal{P}Ax = \mathcal{P}b, \quad \mathcal{P} \in \mathbb{R}^{n \times n}$.

Given a projection subspace matrix $Z \in \mathbb{R}^{n \times k}$ of full rank with $k \ll n$, define the nonsingular matrix $E = Z^T A Z \in \mathbb{R}^{k \times k}$ and the matrix $Q = Z E^{-1} Z^T \in \mathbb{R}^{n \times n}$. The deflation preconditioner $\mathcal{P}_{\mathrm{DEF}} \in \mathbb{R}^{n \times n}$ is defined to be [10]

(2.4)   $\mathcal{P}_{\mathrm{DEF}} = I - AQ$.

It is straightforward to show that $\mathcal{P}_{\mathrm{DEF}}$ is a projection matrix and $\mathcal{P}_{\mathrm{DEF}}A$ has $k$ zero eigenvalues (see [44] for basic properties of $\mathcal{P}_{\mathrm{DEF}}$). To solve (1.1), we write

$x = (I - \mathcal{P}_{\mathrm{DEF}}^T)x + \mathcal{P}_{\mathrm{DEF}}^T x$.

Since $Q$ is symmetric, $\mathcal{P}_{\mathrm{DEF}}^T = I - QA$, and so

$x = QAx + \mathcal{P}_{\mathrm{DEF}}^T x = Qb + \mathcal{P}_{\mathrm{DEF}}^T x$,

and we only need to compute $\mathcal{P}_{\mathrm{DEF}}^T x$. We first find $y$ that satisfies the deflated system

(2.5)   $\mathcal{P}_{\mathrm{DEF}}Ay = \mathcal{P}_{\mathrm{DEF}}b$,

then (due to the identity $A\mathcal{P}_{\mathrm{DEF}}^T = \mathcal{P}_{\mathrm{DEF}}A$) we have that $\mathcal{P}_{\mathrm{DEF}}^T y = \mathcal{P}_{\mathrm{DEF}}^T x$. We therefore obtain the unique solution $x = Qb + \mathcal{P}_{\mathrm{DEF}}^T y$. The deflated system (2.5) is singular and can only be solved using CG if it is consistent [24], which is the case here since the same projection is applied to both sides of a consistent nonsingular system (1.1). The deflated system can also be solved using a preconditioner, giving a two-level preconditioner for the original system.
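As a small self-contained illustration (ours, dense linear algebra only, not the implementation used in the paper), the following sketch builds $Q$ and $\mathcal{P}_{\mathrm{DEF}}$ from a given deflation matrix $Z$, solves the deflated system (2.5) with plain CG, and recovers $x = Qb + \mathcal{P}_{\mathrm{DEF}}^T y$:

```python
import numpy as np

def deflated_solve(A, b, Z, cg_tol=1e-10, maxit=500):
    # Solve the SPD system Ax = b with deflation, cf. (2.4)-(2.5).
    # A is a dense SPD array, Z holds the deflation vectors.
    E = Z.T @ A @ Z
    Q = Z @ np.linalg.solve(E, Z.T)        # Q = Z E^{-1} Z^T
    P = np.eye(len(b)) - A @ Q             # P_DEF = I - A Q
    # Plain CG on the singular but consistent deflated system P A y = P b.
    y = np.zeros_like(b)
    r = P @ b
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        Ap = P @ (A @ p)
        alpha = rs / (p @ Ap)
        y += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < cg_tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return Q @ b + (y - Q @ (A @ y))       # x = Q b + P_DEF^T y
```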

Tang et al. [44] illustrate that rounding errors can result in erratic and slow convergence of CG using $\mathcal{P}_{\mathrm{DEF}}$. They thus also consider an adapted deflation preconditioner

(2.6)   $\mathcal{P}_{\text{A-DEF}} = I - QA + Q$,

that combines $\mathcal{P}_{\mathrm{DEF}}^T$ with $Q$. In exact arithmetic, both $\mathcal{P}_{\mathrm{DEF}}$ and $\mathcal{P}_{\text{A-DEF}}$ used with CG generate the same iterates. However, numerical experiments [44] show that the latter is more robust and leads to better numerical behavior of CG (in [44], $\mathcal{P}_{\mathrm{DEF}}$ and $\mathcal{P}_{\text{A-DEF}}$ are termed P-DEF1 and P-A-DEF2, respectively).

Let $\lambda_n \ge \cdots \ge \lambda_1 > 0$ be the eigenvalues of $A$ with associated normalized eigenvectors $v_n, \ldots, v_1$. For the ideal deflation preconditioner, $\mathcal{P}_{\mathrm{ideal}}$, the deflation subspace is the invariant subspace spanned by the eigenvectors associated with the smallest eigenvalues. To demonstrate how $\mathcal{P}_{\mathrm{ideal}}$ modifies the spectrum of the deflated matrix, set $Z_k = [v_1, \ldots, v_k]$ to be the $n \times k$ matrix whose columns are the eigenvectors corresponding to the smallest eigenvalues. It follows that $E = Z_k^T A Z_k$ is equal to $\Lambda_k = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ and the preconditioned matrix is given by

$\mathcal{P}_{\mathrm{ideal}}A = A - Z_k \Lambda_k Z_k^T$.

Since $Z_k$ is orthonormal and its columns span an invariant subspace, the spectrum of $\mathcal{P}_{\mathrm{ideal}}A$ is $\{\lambda_n, \ldots, \lambda_{k+1}, 0\}$. Starting with $x_0$ such that $Z_k^T r_0 = 0$ ($r_0$ is the initial residual), for $l \ge 0$, $Z_k^T (\mathcal{P}_{\mathrm{ideal}}A)^l r_0 = 0$ and $Z_k^T A^l r_0 = 0$. Hence the search subspace generated by the preconditioned CG (PCG) method lies in the invariant subspace spanned by $v_n, \ldots, v_{k+1}$, which is orthogonal to the subspace spanned by the columns of $Z_k$. Consequently, the effective spectrum of the operator that PCG sees is $\{\lambda_n, \ldots, \lambda_{k+1}\}$ and the associated effective spectral condition number is

$\kappa_{\mathrm{eff}}(\mathcal{P}_{\mathrm{ideal}}A) = \lambda_n / \lambda_{k+1}$.

Using similar computations, the ideal adapted deflated matrix is given by

(2.7)   $\mathcal{P}_{\text{A-ideal}}A = A - Z_k \Lambda_k Z_k^T + Z_k Z_k^T$.

Furthermore, the spectrum of the operator that PCG sees is $\{\lambda_n, \ldots, \lambda_{k+1}, 1, \ldots, 1\}$ and the associated effective spectral condition number is

$\kappa_{\mathrm{eff}}(\mathcal{P}_{\text{A-ideal}}A) = \max\{1, \lambda_n\}/\min\{1, \lambda_{k+1}\}$.

In practice, only an approximation of the ideal deflation subspace spanned by the columns of $Z_k$ is available. Kahl and Rittich [25] analyze the deflation preconditioner using $\tilde{Z}_k \approx Z_k$ and present an upper bound on the corresponding effective spectral condition number $\kappa(\mathcal{P}A)$ of the deflated matrix. Their bound [25, Proposition 4.3], which depends on $\kappa(A)$, $\kappa_{\mathrm{eff}}(\mathcal{P}_{\mathrm{ideal}}A)$, and the largest principal angle $\theta$ between $\tilde{Z}_k$ and $Z_k$, is given by

(2.8)   $\kappa(\mathcal{P}A) \le \left(\sqrt{\kappa(A)}\,\sin\theta + \sqrt{\kappa_{\mathrm{eff}}(\mathcal{P}_{\mathrm{ideal}}A)}\right)^2$,

where $\sin\theta = \|Z_k Z_k^T - \tilde{Z}_k \tilde{Z}_k^T\|_2$.

2.3. Schur Complement Preconditioners. This section reviews the Schur complement preconditioner, with a focus on the two-level variants that were introduced in [14, 27].

One-level preconditioners may not provide the required robustness when used with a Krylov subspace method because they typically fail to capture information about the eigenvectors corresponding to the smallest eigenvalues. To try and remedy this, in their (unpublished) report, Grigori et al. [14] and, independently, Li et al. [27] propose a two-level preconditioner based on using a block factorization and approximating the resulting Schur complement.

Applying graph partitioning techniques (for example, using the METIS package [26, 29]), $A$ can be symmetrically permuted to the $2 \times 2$ doubly bordered block diagonal form

(2.9)   $P^T A P = \begin{pmatrix} A_I & A_{I\Gamma} \\ A_{\Gamma I} & A_\Gamma \end{pmatrix}$,

where $A_I \in \mathbb{R}^{n_I \times n_I}$ is a block diagonal matrix, $A_\Gamma \in \mathbb{R}^{n_\Gamma \times n_\Gamma}$, $A_{\Gamma I} \in \mathbb{R}^{n_\Gamma \times n_I}$, and $A_{I\Gamma} = A_{\Gamma I}^T$. For simplicity of notation, we assume that $A$ is of the form (2.9) (and omit the permutation $P$ from the subsequent discussion).

The block form (2.9) induces a block $LDL^T$ factorization

(2.10)   $A = \begin{pmatrix} I & \\ A_{\Gamma I}A_I^{-1} & I \end{pmatrix} \begin{pmatrix} A_I & \\ & S_\Gamma \end{pmatrix} \begin{pmatrix} I & A_I^{-1}A_{I\Gamma} \\ & I \end{pmatrix}$,

where

(2.11)   $S_\Gamma = A_\Gamma - A_{\Gamma I} A_I^{-1} A_{I\Gamma}$

is the Schur complement of $A$ with respect to $A_\Gamma$. Provided the blocks within $A_I$ are small, they can be factorized cheaply in parallel using a direct algorithm (see, for example, [38]) and thus we assume that solving linear systems with $A_I$ is not computationally expensive. However, the SPD Schur complement $S_\Gamma$ is typically large and significantly denser than $A_\Gamma$ (its size increases with the number of blocks in $A_I$) and, in large-scale practical applications, it may not be possible to explicitly assemble or factorize it.

Preconditioners may be derived by approximating $S_\Gamma^{-1}$. An approximate block factorization of $A^{-1}$ is

$M^{-1} = \begin{pmatrix} I & -A_I^{-1}A_{I\Gamma} \\ & I \end{pmatrix} \begin{pmatrix} A_I^{-1} & \\ & \tilde{S}^{-1} \end{pmatrix} \begin{pmatrix} I & \\ -A_{\Gamma I}A_I^{-1} & I \end{pmatrix}$,

where $\tilde{S}^{-1} \approx S_\Gamma^{-1}$. If $M^{-1}$ is employed as a preconditioner for $A$ then the preconditioned system is given by

(2.12)   $M^{-1}A = \begin{pmatrix} I & A_I^{-1}A_{I\Gamma}(I - \tilde{S}^{-1}S_\Gamma) \\ & \tilde{S}^{-1}S_\Gamma \end{pmatrix}$,

with $\Lambda(M^{-1}A) = \Lambda(\tilde{S}^{-1}S_\Gamma) \cup \{1\}$. Thus, to bound the condition number $\kappa(M^{-1}A)$, we need to construct $\tilde{S}^{-1}$ so that $\kappa(\tilde{S}^{-1}S_\Gamma)$ is bounded. Moreover, (2.12) shows that applying the preconditioner requires the efficient solution of linear systems with $\tilde{S}^{-1}S_\Gamma$ and $A_I$, the latter being relatively inexpensive. We therefore focus on constructing preconditioners $\tilde{S}^{-1}$ for linear systems of the form

(2.13)   $S_\Gamma w = f$.
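For illustration, $S_\Gamma$ can be applied without ever being assembled; a SciPy sketch (ours), in which `A_I`, `A_IG` ($= A_{I\Gamma}$), and `A_G` ($= A_\Gamma$) are assumed to be SciPy sparse matrices:

```python
import scipy.sparse.linalg as spla

def schur_operator(A_I, A_IG, A_G):
    # Apply S_G = A_G - A_GI A_I^{-1} A_IG, cf. (2.11), without forming it.
    lu_I = spla.splu(A_I.tocsc())   # cheap: A_I is block diagonal
    def matvec(w):
        return A_G @ w - A_IG.T @ lu_I.solve(A_IG @ w)
    n = A_G.shape[0]
    return spla.LinearOperator((n, n), matvec=matvec)
```

The resulting operator can be passed directly to an iterative solver for systems of the form (2.13).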

Consider the first-level preconditioner obtained by setting

(2.14)   $\tilde{S}_1^{-1} := A_\Gamma^{-1}$.

Assume for now that we can factorize $A_\Gamma$, although in practice it may be very large and a recursive construction of the preconditioner may then be needed (see [49]). Let the eigenvalues of the generalized eigenvalue problem

(2.15)   $S_\Gamma z = \lambda \tilde{S}_1 z$

be $\lambda_{n_\Gamma} \ge \cdots \ge \lambda_1 > 0$. From (2.11), $\lambda_{n_\Gamma} \le 1$ and so

$\kappa(\tilde{S}_1^{-1}S_\Gamma) = \lambda_{n_\Gamma}/\lambda_1 \le 1/\lambda_1$.

As this is unbounded as $\lambda_1$ approaches zero, we seek to add a low rank term to "correct" the approximation and shift the smallest $k$ eigenvalues of $\tilde{S}_1^{-1}S_\Gamma$. Let $\Lambda_k = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ and let $Z_k \in \mathbb{R}^{n_\Gamma \times k}$ be the matrix whose columns are the corresponding eigenvectors. Without loss of generality, we assume $Z_k$ is $A_\Gamma$-orthonormal. Let the Cholesky factorization of $A_\Gamma$ be

(2.16)   $A_\Gamma = R_\Gamma^T R_\Gamma$

and define

(2.17)   $\tilde{S}_2^{-1} := A_\Gamma^{-1} + Z_k(\Lambda_k^{-1} - I)Z_k^T$.

$\tilde{S}_2^{-1}$ is an additive combination of the first-level preconditioner $\tilde{S}_1^{-1}$ and an adapted deflation preconditioner associated with the subspace spanned by the columns of $U_k = R_\Gamma Z_k$, which is an invariant subspace of $R_\Gamma^{-T} S_\Gamma R_\Gamma^{-1}$. Substituting $U_k$ into (2.17) and using (2.16),

(2.18)   $\tilde{S}_2^{-1} = R_\Gamma^{-1}\left(I + U_k(\Lambda_k^{-1} - I)U_k^T\right)R_\Gamma^{-T}$.

Setting $Q = U_k \Lambda_k^{-1} U_k^T$ in (2.6) gives

$\mathcal{P}_{\text{A-DEF}} = R_\Gamma \tilde{S}_2^{-1} R_\Gamma^T$.

Now $\tilde{S}_2^{-1}S_\Gamma = R_\Gamma^{-1}\mathcal{P}_{\text{A-DEF}}R_\Gamma^{-T}S_\Gamma$, so $\tilde{S}_2^{-1}S_\Gamma$ and $\mathcal{P}_{\text{A-DEF}}R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ are spectrally equivalent, and $\Lambda(\tilde{S}_2^{-1}S_\Gamma) = \{\lambda_{n_\Gamma}, \lambda_{n_\Gamma-1}, \ldots, \lambda_{k+1}\} \cup \{1\}$. It follows that

$\kappa(\tilde{S}_2^{-1}S_\Gamma) = \lambda_{n_\Gamma}/\lambda_{k+1} \le 1/\lambda_{k+1}$.
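Applying $\tilde{S}_2^{-1}$ to a vector thus costs one solve with $A_\Gamma$ plus a rank-$k$ correction. A one-function NumPy sketch (ours), with `solve_AG` an assumed routine applying $A_\Gamma^{-1}$, `Z` holding the $A_\Gamma$-orthonormal eigenvectors, and `lam` holding $\lambda_1, \ldots, \lambda_k$:

```python
import numpy as np

def apply_S2_inv(solve_AG, Z, lam, r):
    # (2.17): S2^{-1} r = A_G^{-1} r + Z (Lam^{-1} - I) Z^T r
    return solve_AG(r) + Z @ ((1.0 / lam - 1.0) * (Z.T @ r))
```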

Grigori et al. [14] note that (2.15) is equivalent to the generalized eigenvalue problem

(2.19)   $(A_\Gamma - S_\Gamma)z = A_{\Gamma I}A_I^{-1}A_{I\Gamma}z = \sigma A_\Gamma z, \quad \sigma = 1 - \lambda$.

Setting $u = R_\Gamma z$ and defining

(2.20)   $H = R_\Gamma^{-T} A_{\Gamma I} A_I^{-1} A_{I\Gamma} R_\Gamma^{-1}$,

(2.19) becomes

(2.21)   $Hu = \sigma u$.

Thus, the smallest eigenvalues $\lambda$ of (2.15) are transformed to the largest eigenvalues $\sigma$ of problems (2.19) and (2.21). Grigori et al. employ a randomized algorithm to compute a low rank eigenvalue decomposition (EVD) of $H$ that approximates its largest eigenvalues and associated eigenvectors, which are multiplied by $R_\Gamma^{-1}$ to obtain approximate eigenvectors of $A_\Gamma^{-1}S_\Gamma$.
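For small examples, $H$ can be applied using two triangular solves and one solve with $A_I$; a dense SciPy sketch (ours), where `R_G` is the upper Cholesky factor in (2.16), `A_IG` is $A_{I\Gamma}$, and `solve_AI` is an assumed routine applying $A_I^{-1}$:

```python
from scipy.linalg import solve_triangular

def apply_H(R_G, A_IG, solve_AI, U):
    # H U = R_G^{-T} A_GI A_I^{-1} A_IG R_G^{-1} U, cf. (2.20)
    X = solve_triangular(R_G, U, lower=False)       # X = R_G^{-1} U
    Y = A_IG.T @ solve_AI(A_IG @ X)                 # Y = A_GI A_I^{-1} A_IG X
    return solve_triangular(R_G.T, Y, lower=True)   # R_G^{-T} Y
```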

In [27], Li et al. write the inverse of the Schur complement $S_\Gamma$ as

(2.22)   $S_\Gamma^{-1} = \left(A_\Gamma - A_{\Gamma I}A_I^{-1}A_{I\Gamma}\right)^{-1} = \left(R_\Gamma^T R_\Gamma - A_{\Gamma I}A_I^{-1}A_{I\Gamma}\right)^{-1} = R_\Gamma^{-1}(I - H)^{-1}R_\Gamma^{-T}$,

where the symmetric positive semidefinite (SPSD) matrix $H$ is given by (2.20). Since $I - H = R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ is SPD, the eigenvalues $\sigma_1 \ge \cdots \ge \sigma_{n_\Gamma}$ of $H$ belong to $[0, 1]$. Let the EVD of $H$ be

$H = U\Sigma U^T$,

where $U$ is orthonormal and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_{n_\Gamma})$. It follows that

(2.23)   $S_\Gamma^{-1} = R_\Gamma^{-1}\left(I - U\Sigma U^T\right)^{-1}R_\Gamma^{-T} = R_\Gamma^{-1}U(I - \Sigma)^{-1}U^T R_\Gamma^{-T} = R_\Gamma^{-1}\left(I + U\left((I - \Sigma)^{-1} - I\right)U^T\right)R_\Gamma^{-T} = A_\Gamma^{-1} + R_\Gamma^{-1}U\left((I - \Sigma)^{-1} - I\right)U^T R_\Gamma^{-T}$.

If $H$ has an approximate EVD of the form

$H \approx U\tilde{\Sigma}U^T, \quad \tilde{\Sigma} = \mathrm{diag}(\tilde{\sigma}_1, \ldots, \tilde{\sigma}_{n_\Gamma})$,

then an approximation of $S_\Gamma^{-1}$ is

(2.24)   $\tilde{S}^{-1} = A_\Gamma^{-1} + R_\Gamma^{-1}U\left((I - \tilde{\Sigma})^{-1} - I\right)U^T R_\Gamma^{-T}$.

The simplest selection of $\tilde{\Sigma}$ is the one that ensures the $k$ largest eigenvalues of $(I - \tilde{\Sigma})^{-1}$ match the largest eigenvalues of $(I - \Sigma)^{-1}$. Li et al. set $\tilde{\Sigma} = \mathrm{diag}(\sigma_1, \ldots, \sigma_k, \theta, \ldots, \theta)$, where $\theta \in [0, 1]$. The resulting preconditioner can be written as

(2.25)   $\tilde{S}_\theta^{-1} = \frac{1}{1-\theta}A_\Gamma^{-1} + Z_k\left((I - \Sigma_k)^{-1} - \frac{1}{1-\theta}I\right)Z_k^T$,

where $\Sigma_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ and the columns of $Z_k = R_\Gamma^{-1}U_k$ are the eigenvectors corresponding to the $k$ largest eigenvalues of $H$. In [27], it is shown that $\kappa(\tilde{S}_\theta^{-1}S_\Gamma) = (1 - \sigma_{n_\Gamma})/(1 - \theta)$, which takes its minimum value for $\theta = \sigma_{k+1}$.

In the next section, we analyse the eigenvalue problems that need to be solved to construct the preconditioners (2.17) and (2.25). In particular, we show that the approaches presented in [14, 27] for tackling these problems are inefficient because of the eigenvalue distribution.

3. Analysis of $Hu = \sigma u$.

3.1. Use of the Lanczos method. Consider the eigenproblem: given $\varepsilon > 0$, find all the eigenpairs $(\lambda, z) \in \mathbb{R} \times \mathbb{R}^{n_\Gamma}$ such that

$S_\Gamma z = \lambda A_\Gamma z, \quad \lambda < \varepsilon$.

This can be rewritten as: given $\varepsilon > 0$, find all the eigenpairs $(\lambda, z) \in \mathbb{R} \times \mathbb{R}^{n_\Gamma}$ such that

(3.1)   $(I - H)u = \lambda u, \quad z = R_\Gamma^{-1}u, \quad \lambda < \varepsilon$,

where $R_\Gamma$ and $H$ are given by (2.16) and (2.20). Consider also the eigenproblem: given $\varepsilon > 0$, find all the eigenpairs $(\sigma, u) \in \mathbb{R} \times \mathbb{R}^{n_\Gamma}$ such that

(3.2)   $Hu = \sigma u, \quad \sigma > 1 - \varepsilon$.

As already observed, each eigenpair $(\lambda, z)$ of (3.1) corresponds to the eigenpair $(1 - \lambda, R_\Gamma z)$ of (3.2). Consider using the Lanczos method to solve these eigenproblems. The Krylov subspace at iteration $j$ generated for (3.1) is

$\mathcal{K}_j(I - H, v_1) = \mathrm{span}(v_1, (I - H)v_1, \ldots, (I - H)^{j-1}v_1)$,

while the subspace generated for (3.2) is

$\mathcal{K}_j(H, v_1) = \mathrm{span}(v_1, Hv_1, \ldots, H^{j-1}v_1)$.

It is clear that, provided the same starting vector $v_1$ is used, $\mathcal{K}_j(I - H, v_1)$ and $\mathcal{K}_j(H, v_1)$ are identical. Suppose that $[V_j, v_{j+1}]$ is the Lanczos basis of the Krylov subspace; then the subspace relations that hold at iteration $j$ are

$(I - H)V_j = V_j T_j + v_{j+1}h_j^T$,
$HV_j = V_j(I - T_j) - v_{j+1}h_j^T$,

where $T_j \in \mathbb{R}^{j \times j}$ is a symmetric tridiagonal matrix and $h_j \in \mathbb{R}^j$. The eigenpair $(\lambda, z)$ (respectively, $(\sigma, u)$) corresponding to the smallest (respectively, largest) eigenvalue in (3.1) (respectively, (3.2)) is approximated by the eigenpair $(\tilde{\lambda}, R_\Gamma^{-1}V_j\tilde{u})$ (respectively, $(\tilde{\sigma}, V_j\tilde{u})$) corresponding to the smallest (respectively, largest) eigenvalue of $T_j$ (respectively, $I - T_j$). To overcome memory constraints, the Lanczos procedure is typically restarted after a chosen number of iterations, at each restart discarding the non-convergent part of the Krylov subspace [42]. Hence, starting with the same $v_1$ and performing the same number of iterations per cycle, in exact arithmetic the accuracy obtained when solving (3.1) and (3.2) is identical.

Having shown that the convergence of the Lanczos method for solving (3.1) and (3.2) is the same, we focus on (3.2).

[Fig. 1. Largest 100 eigenvalues of $H = R_\Gamma^{-T}A_{\Gamma I}A_I^{-1}A_{I\Gamma}R_\Gamma^{-1}$ associated with our test matrices, computed to an accuracy of $10^{-8}$ using the Krylov–Schur method [42].]

In Figure 1, for each of our test matrices in Table 1 we plot the 100 largest eigenvalues of the matrix $H$ given by (2.20). We see that the largest eigenvalues (which are the ones that we require) are clustered near one and they do not decay rapidly. As there are a significant number of eigenvalues in the cluster, computing the largest $k$ (for $k = O(10)$) and the corresponding eigenvectors with sufficient accuracy using the Lanczos method is challenging. Similar distributions were observed for the larger test set that we report on in the Appendix, particularly for problems for which the one-level preconditioner $\tilde{S}_1$ was found to perform poorly, which is generally the case when $\kappa(A)$ is large. Table 2 reports the Lanczos iteration counts ($it_{Lan}$) for computing the $k = 20$ and 40 largest eigenpairs (that is, the number of linear systems that are solved in the Lanczos method). In addition, we present the PCG iteration count ($it_{PCG}$) for solving the linear system (2.13) using the first-level preconditioner $\tilde{S}_1^{-1} = A_\Gamma^{-1}$ and the two-level preconditioner $\tilde{S}_2$ given by (2.17). We see that, in terms of the total iteration count, the first-level preconditioner is the more efficient option. It is of interest to consider whether relaxing the convergence tolerance $\varepsilon_{Lan}$ in the Lanczos method can reduce the total iteration count for $\tilde{S}_2$. Table 3 illustrates the effect of varying $\varepsilon_{Lan}$ for problem ela3d (results for the other test problems are consistent). Although $it_{Lan}$ decreases as $\varepsilon_{Lan}$ increases, $it_{PCG}$ increases and the total count still exceeds the 175 PCG iterations required by the first-level preconditioner $\tilde{S}_1$.

Table 2: The Lanczos iteration count ($it_{Lan}$) and the PCG iteration count ($it_{PCG}$). The convergence tolerance for the Lanczos method and PCG is $10^{-6}$. The size of the Krylov subspace per cycle is $2k$.

           | S̃1     | S̃2, k = 20            | S̃2, k = 40
Identifier | it_PCG | it_Lan  it_PCG  total | it_Lan  it_PCG  total
bcsstk38   | 584    | 797     122     919   | 730     67      797
ela2d      | 914    | 1210    231     1441  | 982     120     1102
ela3d      | 174    | 311     37      348   | 389     27      416
msc10848   | 612    | 813     116     929   | 760     63      823
nd3k       | 603    | 1796    143     1939  | 1349    105     1454
s3rmt3m3   | 441    | 529     70      599   | 480     37      517

Table 3: Problem ela3d and two-level preconditioner $\tilde{S}_2$: sensitivity of the Lanczos iteration count ($it_{Lan}$) and the PCG iteration count ($it_{PCG}$) to the convergence tolerance $\varepsilon_{Lan}$. The PCG convergence tolerance is $10^{-6}$. The size of the Krylov subspace per cycle is $2k$.

ε_Lan | k = 20: it_Lan  it_PCG  total | k = 40: it_Lan  it_PCG  total
0.1   | 50   131   181 | 80    101   181
0.08  | 50   131   181 | 100   85    185
0.06  | 60   121   181 | 100   85    185
0.04  | 82   100   182 | 120   71    191
0.02  | 127  64    201 | 207   37    244
0.01  | 169  41    210 | 259   32    291
0.005 | 213  38    251 | 316   29    345
0.001 | 247  37    284 | 372   28    400

As already observed, in [49] a recursive (multilevel) scheme is proposed to help mitigate the computational costs of building and applying the preconditioner. Nevertheless, the Lanczos method is still used, albeit with reduced costs for applying the operator matrices.

3.2. Use of Nyström's method. As suggested in [14], an alternative approach to approximating the dominant subspace of $H$ is to use a randomized method, specifically a randomized eigenvalue decomposition. Because $H$ is SPSD, Nyström's method can be used. Results are presented in Table 4 for problem ela3d (results for our other test examples are consistent with these). Here $p$ is the oversampling parameter and $q$ is the power iteration parameter. These show that, as with the Lanczos method, Nyström's method struggles to approximate the dominant eigenpairs of $H$. Using $k = 20$ (respectively, 40) exact eigenpairs, PCG using $\tilde{S}_2$ requires 37 (respectively, 28) iterations. To obtain the same iteration counts using vectors computed using Nyström's method requires the oversampling parameter to be greater than 2000, which is clearly prohibitive. Using the power iteration improves the quality of the approximate subspace. However, the large value of $q$ needed to decrease the PCG iteration count means a large number of linear systems must be solved with $A_\Gamma$, in addition to the work involved in the orthogonalization that is needed between the power iterations to maintain stability. Indeed, it is sufficient to look at Figure 1 to predict this behaviour for any randomized method applied to $H$. The lack of success of existing strategies motivates us, in the next section, to reformulate the eigenvalue problem to one with a spectrum that is easy to approximate.

Table 4: PCG iteration counts for problem ela3d using the two-level preconditioner $\tilde{S}_2$ constructed using a rank-$k$ approximation of $H = R_\Gamma^{-T}A_{\Gamma I}A_I^{-1}A_{I\Gamma}R_\Gamma^{-1}$. The PCG convergence tolerance is $10^{-6}$. Nyström's method applied to $H$ with oversampling parameter $p \ge 100$ and power iteration parameter $q = 0$ (left) and with $p = 0$ and $q \ge 0$ (right).

p    | k = 20 | k = 40        q   | k = 20 | k = 40
100  | 171    | 169           0   | 172    | 171
200  | 170    | 165           20  | 121    | 87
400  | 165    | 161           40  | 86     | 48
800  | 155    | 146           60  | 68     | 34
1600 | 125    | 111           80  | 55     | 30
3200 | 55     | 45            100 | 46     | 29

4. Nyström–Schur two-level preconditioner. In this section, we propose reformulating the eigenvalue problem to obtain a new one such that the desired eigenvectors correspond to the largest eigenvalues and these eigenvalues are well separated from the remaining eigenvalues: this is what is needed for randomized methods to be successful.

4.1. Two-level preconditioner for $S_\Gamma$. Applying the Sherman–Morrison–Woodbury identity [13, 2.1.3], the inverse of the Schur complement $S_\Gamma$ (2.11) can be written as

(4.1)   $S_\Gamma^{-1} = A_\Gamma^{-1} + A_\Gamma^{-1}A_{\Gamma I}\left(A_I - A_{I\Gamma}A_\Gamma^{-1}A_{\Gamma I}\right)^{-1}A_{I\Gamma}A_\Gamma^{-1} = A_\Gamma^{-1} + A_\Gamma^{-1}A_{\Gamma I}S_I^{-1}A_{I\Gamma}A_\Gamma^{-1}$,

where

(4.2)   $S_I = A_I - A_{I\Gamma}A_\Gamma^{-1}A_{\Gamma I}$

is the Schur complement of $A$ with respect to $A_I$. Using the Cholesky factorization (2.16), we have

(4.3)   $R_\Gamma S_\Gamma^{-1} R_\Gamma^T = I + R_\Gamma^{-T}A_{\Gamma I}S_I^{-1}A_{I\Gamma}R_\Gamma^{-1}$.

Note that if $(\lambda, u)$ is an eigenpair of $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$, then (4.3) shows that $(\frac{1}{\lambda} - 1, u)$ is an eigenpair of $R_\Gamma^{-T}A_{\Gamma I}S_I^{-1}A_{I\Gamma}R_\Gamma^{-1}$. Therefore, the cluster of eigenvalues of $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ near the origin (which corresponds to the cluster of eigenvalues of $H$ near one) corresponds to very large and highly separated eigenvalues of $R_\Gamma^{-T}A_{\Gamma I}S_I^{-1}A_{I\Gamma}R_\Gamma^{-1}$. Hence, using randomized methods to approximate the dominant subspace of $R_\Gamma^{-T}A_{\Gamma I}S_I^{-1}A_{I\Gamma}R_\Gamma^{-1}$ can be an efficient way of computing a deflation subspace for $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$.
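This eigenvalue mapping is easy to verify numerically. The following self-contained NumPy sketch (ours) builds a small random SPD matrix in the block form (2.9) and checks that the eigenvalues $\lambda$ of $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ and $\mu$ of $R_\Gamma^{-T}A_{\Gamma I}S_I^{-1}A_{I\Gamma}R_\Gamma^{-1}$ satisfy $\mu = 1/\lambda - 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
nI, nG = 60, 20
M = rng.standard_normal((nI + nG, nI + nG))
A = M @ M.T + (nI + nG) * np.eye(nI + nG)           # random SPD test matrix
AI, AIG, AG = A[:nI, :nI], A[:nI, nI:], A[nI:, nI:]
SG = AG - AIG.T @ np.linalg.solve(AI, AIG)          # Schur complement (2.11)
SI = AI - AIG @ np.linalg.solve(AG, AIG.T)          # Schur complement (4.2)
Rinv = np.linalg.inv(np.linalg.cholesky(AG).T)      # AG = R^T R, R upper triangular
lam = np.linalg.eigvalsh(Rinv.T @ SG @ Rinv)
mu = np.linalg.eigvalsh(Rinv.T @ AIG.T @ np.linalg.solve(SI, AIG) @ Rinv)
assert np.allclose(np.sort(1.0 / lam - 1.0), np.sort(mu))  # the map holds
```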

Now assume that we have a low rank approximation

(4.4)   $R_\Gamma^{-T}A_{\Gamma I}S_I^{-1}A_{I\Gamma}R_\Gamma^{-1} \approx \breve{U}_k\breve{\Sigma}_k\breve{U}_k^T$,

where $\breve{U}_k \in \mathbb{R}^{n_\Gamma \times k}$ is orthonormal and $\breve{\Sigma}_k \in \mathbb{R}^{k \times k}$ is diagonal. Combining (4.3) and (4.4), we can define a preconditioner for $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ to be

(4.5)   $\mathcal{P}_1 = I + \breve{U}_k\breve{\Sigma}_k\breve{U}_k^T$.

The preconditioned matrix $\mathcal{P}_1 R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ is spectrally equivalent to $R_\Gamma^{-1}\mathcal{P}_1 R_\Gamma^{-T}S_\Gamma$. Therefore, the preconditioned system can be written as

(4.6)   $\mathcal{M}_1 S_\Gamma = R_\Gamma^{-1}\mathcal{P}_1 R_\Gamma^{-T}S_\Gamma = \left(A_\Gamma^{-1} + \breve{Z}_k\breve{\Sigma}_k\breve{Z}_k^T\right)S_\Gamma$,

where $\breve{Z}_k = R_\Gamma^{-1}\breve{U}_k$. If (4.4) is obtained using a truncated EVD, denoted by $U_k\Sigma_k U_k^T$, then $\breve{U}_k = U_k$ and the subspace spanned by the columns of $U_k$ is an invariant subspace of $R_\Gamma S_\Gamma^{-1}R_\Gamma^T$ and of its inverse $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$. Furthermore, using the truncated EVD, (4.5) is an adapted deflation preconditioner for $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$. Indeed, as the columns of $U_k$ are orthonormal eigenvectors, we have from (4.3) that $R_\Gamma S_\Gamma^{-1}R_\Gamma^T U_k = U_k(I + \Sigma_k)$. Hence $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}U_k = U_k(I + \Sigma_k)^{-1}$ and the preconditioned matrix is

$\mathcal{P}_{\text{A-DEF}}R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1} = R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1} + U_k\Sigma_k(I + \Sigma_k)^{-1}U_k^T$
$= R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1} + U_k\left((I + \Sigma_k) - I\right)(I + \Sigma_k)^{-1}U_k^T$
$= R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1} - U_k(I + \Sigma_k)^{-1}U_k^T + U_kU_k^T$,

which has the same form as the ideal adapted preconditioned matrix (2.7).

Note that, given the matrix $\breve{U}_k$ in the approximation (4.4), and following subsection 2.2, we can define a deflation preconditioner for $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$. Setting $E_k = \breve{U}_k^T R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}\breve{U}_k$ and $Q = \breve{U}_k E_k^{-1}\breve{U}_k^T$, the deflation preconditioner is

(4.7)   $\mathcal{P}_{\text{1-A-DEF}} = I - QR_\Gamma^{-T}S_\Gamma R_\Gamma^{-1} + Q$.

The preconditioned Schur complement $\mathcal{P}_{\text{1-A-DEF}}R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ is spectrally similar to $R_\Gamma^{-1}\mathcal{P}_{\text{1-A-DEF}}R_\Gamma^{-T}S_\Gamma$ and thus

(4.8)   $\mathcal{M}_{\text{1-A-DEF}} = R_\Gamma^{-1}\mathcal{P}_{\text{1-A-DEF}}R_\Gamma^{-T}$

is a two-level preconditioner for $S_\Gamma$.

4.2. Lanczos versus Nyström. The two-level preconditioner (4.8) relies on computing a low-rank approximation (4.4). We now consider the difference between using the Lanczos and Nyström methods for this.

Both methods require the application of $R_\Gamma^{-T}A_{\Gamma I}S_I^{-1}A_{I\Gamma}R_\Gamma^{-1}$ to a set of $k+p$ vectors, where $k > 0$ is the required rank and $p \ge 0$. Because explicitly computing the SPD matrix $S_I = A_I - A_{I\Gamma}A_\Gamma^{-1}A_{\Gamma I}$ and factorizing it is prohibitively expensive, applying $S_I^{-1}$ must be done using an iterative solver.

The Lanczos method builds a Krylov subspace of dimension $k+p$ in order to compute a low-rank approximation. Therefore, $k+p$ linear systems must be solved, each with one right-hand side, first for $R_\Gamma$, then for $S_I$, and then for $R_\Gamma^T$. In contrast, the Nyström method requires the solution of only one linear system with $k+p$ right-hand sides for $R_\Gamma$, then for $S_I$, and then for $R_\Gamma^T$. This allows the use of matrix–matrix operations rather than less efficient matrix–vector operations. Moreover, as we will illustrate in section 5, block Krylov subspace methods, such as block CG [35], for solving the system with $S_I$ yield faster convergence than their classical counterparts. When the Nyström method is used, we call the resulting preconditioner (4.8) the Nyström–Schur preconditioner.
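To indicate the structure of such a solver, here is a simple block CG sketch (ours; the breakdown-free variant [21, 35] used in the paper adds safeguards that we omit). `S_apply` is an assumed routine applying $S_I$ to a block of vectors, each application involving a solve with $A_\Gamma$:

```python
import numpy as np

def block_cg(S_apply, F, rtol=0.1, maxit=200):
    # Solve S X = F (SPD S, multiple right-hand sides) by block CG.
    X = np.zeros_like(F)
    R = F.copy()                                   # residual block
    P = R.copy()                                   # search direction block
    nrm0 = np.linalg.norm(F)
    for _ in range(maxit):
        Q = S_apply(P)
        alpha = np.linalg.solve(P.T @ Q, P.T @ R)  # block step length
        X += P @ alpha
        R -= Q @ alpha
        if np.linalg.norm(R) <= rtol * nrm0:
            break
        beta = -np.linalg.solve(P.T @ Q, Q.T @ R)  # enforce S-conjugacy of the P blocks
        P = R + P @ beta
    return X
```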

4.3. Avoiding computations with $R_\Gamma$. For large scale problems, computing the Cholesky factorization $A_\Gamma = R_\Gamma^T R_\Gamma$ is prohibitive and so we would like to avoid computations with $R_\Gamma$. We can achieve this by using an iterative solver to solve linear systems with $A_\Gamma$. Note that this is possible when solving the generalized eigenvalue problem (2.15). Because $A_\Gamma$ is typically well conditioned, so too is $R_\Gamma$. Thus, we can reduce the cost of computing the Nyström–Schur preconditioner by approximating the SPSD matrix $A_{\Gamma I}S_I^{-1}A_{I\Gamma}$ (or even by approximating $S_I^{-1}$). Of course, this needs to be done without seriously adversely affecting the preconditioner quality. Using an approximate factorization

(4.9)   $A_{\Gamma I}S_I^{-1}A_{I\Gamma} \approx \tilde{W}_k\tilde{\Sigma}_k\tilde{W}_k^T$,

an alternative deflation preconditioner is

$\mathcal{P}_2 = I + R_\Gamma^{-T}\tilde{W}_k\tilde{\Sigma}_k\tilde{W}_k^T R_\Gamma^{-1} = R_\Gamma^{-T}\left(A_\Gamma + \tilde{W}_k\tilde{\Sigma}_k\tilde{W}_k^T\right)R_\Gamma^{-1}$.

The preconditioned Schur complement $\mathcal{P}_2 R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ is spectrally similar to $R_\Gamma^{-1}\mathcal{P}_2 R_\Gamma^{-T}S_\Gamma$ and, setting $\tilde{Z}_k = A_\Gamma^{-1}\tilde{W}_k$, we have

(4.10)   $\mathcal{M}_2 S_\Gamma = R_\Gamma^{-1}\mathcal{P}_2 R_\Gamma^{-T}S_\Gamma = \left(A_\Gamma^{-1} + \tilde{Z}_k\tilde{\Sigma}_k\tilde{Z}_k^T\right)S_\Gamma$.

Thus $\mathcal{M}_2 = A_\Gamma^{-1} + \tilde{Z}_k\tilde{\Sigma}_k\tilde{Z}_k^T$ is a variant of the Nyström–Schur preconditioner for $S_\Gamma$ that avoids computing $R_\Gamma$.

Alternatively, assuming we have an approximate factorization

(4.11)   $S_I^{-1} \approx \hat{V}_k\hat{\Sigma}_k\hat{V}_k^T$

yields

$\mathcal{P}_3 = I + R_\Gamma^{-T}A_{\Gamma I}\hat{V}_k\hat{\Sigma}_k\hat{V}_k^T A_{I\Gamma}R_\Gamma^{-1}$.

Again, $\mathcal{P}_3 R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ is spectrally similar to $R_\Gamma^{-1}\mathcal{P}_3 R_\Gamma^{-T}S_\Gamma$ and, setting $\hat{Z}_k = A_\Gamma^{-1}A_{\Gamma I}\hat{V}_k$, we have

(4.12)   $\mathcal{M}_3 S_\Gamma = R_\Gamma^{-1}\mathcal{P}_3 R_\Gamma^{-T}S_\Gamma = \left(A_\Gamma^{-1} + \hat{Z}_k\hat{\Sigma}_k\hat{Z}_k^T\right)S_\Gamma$,

which gives another variant of the Nyström–Schur preconditioner. In a similar way to defining $\mathcal{M}_{\text{1-A-DEF}}$ (4.8), we can define $\mathcal{M}_{\text{2-A-DEF}}$ and $\mathcal{M}_{\text{3-A-DEF}}$. Note that $\mathcal{M}_{\text{2-A-DEF}}$ and $\mathcal{M}_{\text{3-A-DEF}}$ also avoid computations with $R_\Gamma$.

4.4. Nyström–Schur preconditioner. Algorithm 4.1 presents the construction of the Nyström–Schur preconditioner $\mathcal{M}_2$; an analogous derivation yields the variant $\mathcal{M}_3$. Step 3 is the most expensive step, that is, solving the $n_I \times n_I$ SPD linear system

(4.13)   $S_I X = F$,

where $F \in \mathbb{R}^{n_I \times (k+p)}$ and $S_I = A_I - A_{I\Gamma}A_\Gamma^{-1}A_{\Gamma I}$. Using an iterative solver requires a linear system solve with $A_\Gamma$ on each iteration. Importantly for efficiency, the number of iterations can be limited by employing a large relative tolerance when solving (4.13) without adversely affecting the performance of the resulting preconditioner. Numerical experiments in section 5 illustrate this robustness.

Observe that applying $\mathcal{M}_2$ to a vector requires the solution of a linear system with $A_\Gamma$ and a low rank correction; see Step 12.

Algorithm 4.1 Construction of the Nyström–Schur preconditioner (4.10).
Input: $A$ in block form (2.9), $k > 0$ and $p \ge 0$ ($k, p \ll n_\Gamma$), and $\varepsilon > 0$.
Output: Two-level preconditioner for the $n_\Gamma \times n_\Gamma$ Schur complement $S_\Gamma$.
1: Draw a random matrix $G \in \mathbb{R}^{n_\Gamma \times (k+p)}$.
2: Compute $F = A_{I\Gamma}G$.
3: Solve $S_I X = F$.
4: Compute $Y = A_{\Gamma I}X$.
5: Compute the QR factorization $Y = QR$.
6: Set $C = G^T Y$.
7: Compute the EVD $C = V_1 D_1 V_1^T + V_2 D_2 V_2^T$, where $D_1$ contains all the eigenvalues that are at least $\varepsilon$.
8: Set $T = R V_1 D_1^{-1} V_1^T R^T$.
9: Compute the EVD $T = W E W^T$.
10: Set $\tilde{U} = QW(:, 1\!:\!k)$ and $\Sigma = E(1\!:\!k, 1\!:\!k)$.
11: Solve $A_\Gamma Z = \tilde{U}$.
12: Define the preconditioner $\mathcal{M}_2 = A_\Gamma^{-1} + Z\Sigma Z^T$.
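A NumPy/SciPy sketch of Algorithm 4.1 (ours) is given below. For brevity the solves with $A_\Gamma$ use a sparse LU factorization, which the paper deliberately avoids for large problems, and Step 3 reuses the `block_cg` sketch from subsection 4.2 with a loose tolerance:

```python
import numpy as np
import scipy.sparse.linalg as spla

def build_M2(A_I, A_IG, A_G, k, p=0, eps=1e-12, rtol_SI=0.1, rng=None):
    # Construct M2 = A_G^{-1} + Z Sigma Z^T, cf. (4.10) and Algorithm 4.1.
    # A_I, A_IG (= A_{I,Gamma}), A_G (= A_Gamma) are SciPy sparse matrices.
    rng = np.random.default_rng() if rng is None else rng
    lu_G = spla.splu(A_G.tocsc())                    # stands in for iterative A_G solves
    SI_apply = lambda X: A_I @ X - A_IG @ lu_G.solve(A_IG.T @ X)  # S_I of (4.2)
    G = rng.standard_normal((A_G.shape[0], k + p))   # step 1
    F = A_IG @ G                                     # step 2
    X = block_cg(SI_apply, F, rtol=rtol_SI)          # step 3 (block CG, loose tolerance)
    Y = A_IG.T @ X                                   # step 4
    Q, R = np.linalg.qr(Y)                           # step 5
    C = G.T @ Y                                      # step 6
    d, V = np.linalg.eigh((C + C.T) / 2)             # step 7
    V1, d1 = V[:, d >= eps], d[d >= eps]
    RV1 = R @ V1
    T = RV1 @ np.diag(1.0 / d1) @ RV1.T              # step 8
    E, W = np.linalg.eigh(T)                         # step 9
    idx = np.argsort(E)[::-1][:k]
    U, Sigma = Q @ W[:, idx], E[idx]                 # step 10
    Z = lu_G.solve(U)                                # step 11: A_G Z = U
    return lambda r: lu_G.solve(r) + Z @ (Sigma * (Z.T @ r))  # step 12: apply M2
```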

4.5. Estimation of the Spectral Condition Number. In this section, we provide a bound on the expected value of the spectral condition number of $S_\Gamma$ preconditioned by the Nyström–Schur preconditioner. Saibaba [37] derives bounds on the angles between the approximate singular vectors computed using a randomized singular value decomposition and the exact singular vectors of a matrix. It is straightforward to derive the corresponding bounds for the Nyström method. Let $\Pi_M$ denote the orthogonal projector onto the space spanned by the columns of the matrix $M$, and let $(\lambda_j, u_j)$, $j = 1, \ldots, k$, be the dominant eigenpairs of $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$. Following the notation in Algorithm 2.1, the angle $\theta_j = \angle(u_j, \tilde{U})$ between the approximate eigenvectors $\tilde{U} \in \mathbb{R}^{n_\Gamma \times (k+p)}$ of $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ and the exact eigenvector $u_j \in \mathbb{R}^{n_\Gamma}$ satisfies

(4.14)   $\sin\angle(u_j, \tilde{U}) = \|u_j - \Pi_{\tilde{U}}u_j\|_2 \le \gamma_{j,k}^{q+1}\,c$,

where $q$ is the power iteration count (recall (2.2)), $\gamma_{j,k}$ is the gap between $\lambda_j^{-1} - 1$ and $\lambda_{k+1}^{-1} - 1$ given by

(4.15)   $\gamma_{j,k} = (\lambda_{k+1}^{-1} - 1)/(\lambda_j^{-1} - 1)$,

and $c$ has the expected value

(4.16)   $\mathbb{E}(c) = \sqrt{\frac{k}{p-1}} + \frac{e\sqrt{(k+p)(n_\Gamma - k)}}{p}$,

where $k$ is the required rank and $p \ge 2$ is the oversampling parameter. Hence,

(4.17)   $\mathbb{E}\left(\sin\angle(u_j, \tilde{U})\right) = \mathbb{E}\left(\|u_j - \Pi_{\tilde{U}}u_j\|_2\right) \le \gamma_{j,k}^{q+1}\,\mathbb{E}(c)$.

Note that if $\lambda_j \le 1/2$ then $\gamma_{j,k} \le 2\lambda_j/\lambda_{k+1}$ ($j = 1, \ldots, k$).
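For reference, the expected value (4.16) is straightforward to evaluate; a short sketch (ours):

```python
import numpy as np

def expected_c(k, p, n_gamma):
    # E(c) from (4.16); the bound requires oversampling p >= 2.
    return np.sqrt(k / (p - 1)) + np.e * np.sqrt((k + p) * (n_gamma - k)) / p
```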

Proposition 4.1. Let the EVD of the SPD matrix $I - H = R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ be

$I - H = \begin{pmatrix} U_\perp & U_k \end{pmatrix} \begin{pmatrix} \Lambda_\perp & \\ & \Lambda_k \end{pmatrix} \begin{pmatrix} U_\perp^T \\ U_k^T \end{pmatrix}$,

where $\Lambda_\perp \in \mathbb{R}^{(n_\Gamma - k)\times(n_\Gamma - k)}$ and $\Lambda_k \in \mathbb{R}^{k \times k}$ are diagonal matrices containing the eigenvalues $(\lambda_i)_{n_\Gamma \ge i \ge k+1}$ and $(\lambda_i)_{k \ge i \ge 1}$, respectively, in decreasing order. Furthermore, assume that $\lambda_k \le 1/2$. Let the columns of $\tilde{U} \in \mathbb{R}^{n_\Gamma \times (k+p)}$ be the approximate eigenvectors of $I - H$ computed using the Nyström method and let

$\mathcal{P} = I - (I - H)\tilde{U}E^{-1}\tilde{U}^T \quad \text{with} \quad E = \tilde{U}^T(I - H)\tilde{U}$

be the associated deflation preconditioner. Then the effective condition number of the two-level preconditioned matrix $\mathcal{P}(I - H) = \mathcal{P}R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ satisfies

(4.18)   $\mathbb{E}\left(\sqrt{\kappa_{\mathrm{eff}}(\mathcal{P}(I - H))}\right) \le c_1\sqrt{\frac{\lambda_{n_\Gamma}}{\lambda_{k+1}}}$,

where $c_1^2$ is independent of the spectrum of $I - H$ and can be bounded by a polynomial of degree 3 in $k$.

Proof. Let $x \in \mathbb{R}^{n_\Gamma}$. Since $u_1, \ldots, u_{n_\Gamma}$ form an orthogonal basis of $\mathbb{R}^{n_\Gamma}$, there exist $\alpha_1, \ldots, \alpha_{n_\Gamma} \in \mathbb{R}$ such that $x = \sum_{i=1}^{n_\Gamma}\alpha_i u_i$. In [25, Theorem 3.4], Kahl and Rittich show that, if for some positive constant $c_K$, $\tilde{U}$ satisfies

(4.19)   $\|x - \Pi_{\tilde{U}}x\|_2^2 \le c_K \frac{\|x\|_{I-H}^2}{\|I-H\|_2}$,

then the effective condition number of $\mathcal{P}(I - H)$ satisfies

$\kappa_{\mathrm{eff}}(\mathcal{P}(I - H)) \le c_K$.

Let $t \le k$ and consider

$\|x - \Pi_{\tilde{U}}x\|_2 = \Big\|\sum_{i=1}^{n_\Gamma}\alpha_i u_i - \Pi_{\tilde{U}}\sum_{i=1}^{n_\Gamma}\alpha_i u_i\Big\|_2 \le \Big\|\sum_{i=t+1}^{n_\Gamma}(I - \Pi_{\tilde{U}})\alpha_i u_i\Big\|_2 + \sum_{i=1}^{t}|\alpha_i|\,\|u_i - \Pi_{\tilde{U}}u_i\|_2 \le \Big\|\sum_{i=t+1}^{n_\Gamma}\alpha_i u_i\Big\|_2 + \sum_{i=1}^{t}|\alpha_i|\,\|u_i - \Pi_{\tilde{U}}u_i\|_2$.

The last inequality uses the fact that $I - \Pi_{\tilde{U}}$ is an orthogonal projector. We now bound each term on the right separately. For the first term,

$\Big\|\sum_{i=t+1}^{n_\Gamma}\alpha_i u_i\Big\|_2 \le \frac{1}{\sqrt{\lambda_{t+1}}}\Big\|\sum_{i=t+1}^{n_\Gamma}\sqrt{\lambda_{t+1}}\,\alpha_i u_i\Big\|_2 \le \frac{1}{\sqrt{\lambda_{t+1}}}\Big\|\sum_{i=t+1}^{n_\Gamma}\sqrt{\lambda_i}\,\alpha_i u_i\Big\|_2 = \frac{1}{\sqrt{\lambda_{t+1}}}\Big(\sum_{i=t+1}^{n_\Gamma}\lambda_i\alpha_i^2\Big)^{1/2} = \frac{1}{\sqrt{\lambda_{t+1}}}\|x - \Pi_{U_t}x\|_{I-H} = \sqrt{\frac{\lambda_{n_\Gamma}}{\lambda_{t+1}}}\,\frac{\|x - \Pi_{U_t}x\|_{I-H}}{\sqrt{\|I-H\|_2}}$.

For the second term, from (4.15), $\gamma_{i,k} \le 1$ for $i = 1, \ldots, t$; thus, using (4.14) and $\gamma_{i,k} \le \gamma_{t,k}$,

$\sum_{i=1}^{t}|\alpha_i|\|u_i - \Pi_{\tilde{U}}u_i\|_2 \le \sum_{i=1}^{t}|\alpha_i|\,\gamma_{i,k}^{q+1}c \le c\,\gamma_{t,k}^{q+\frac12}\sum_{i=1}^{t}|\alpha_i|\sqrt{\gamma_{i,k}} = c\,\gamma_{t,k}^{q+\frac12}\sqrt{\lambda_{k+1}^{-1} - 1}\sum_{i=1}^{t}\frac{|\alpha_i|}{\sqrt{\lambda_i^{-1} - 1}} \le c\,\gamma_{t,k}^{q+\frac12}\frac{1}{\sqrt{\lambda_{k+1}}}\sum_{i=1}^{t}\frac{|\alpha_i|}{\sqrt{\lambda_i^{-1} - 1}}$.

Assuming that $\lambda_i \le 1/2$ for $i = 1, \ldots, t$, we have $1/\sqrt{\lambda_i^{-1} - 1} \le \sqrt{2}\sqrt{\lambda_i}$, and hence

$\sum_{i=1}^{t}|\alpha_i|\|u_i - \Pi_{\tilde{U}}u_i\|_2 \le \sqrt{2}\,c\,\gamma_{t,k}^{q+\frac12}\frac{1}{\sqrt{\lambda_{k+1}}}\sum_{i=1}^{t}|\alpha_i|\sqrt{\lambda_i}$.

Using the fact that the $\ell_1$ and $\ell_2$ norms are equivalent,

$\sum_{i=1}^{t}|\alpha_i|\|u_i - \Pi_{\tilde{U}}u_i\|_2 \le c\sqrt{2t}\,\gamma_{t,k}^{q+\frac12}\frac{1}{\sqrt{\lambda_{k+1}}}\Big(\sum_{i=1}^{t}\alpha_i^2\lambda_i\Big)^{1/2} = c\sqrt{2t}\,\gamma_{t,k}^{q+\frac12}\frac{1}{\sqrt{\lambda_{k+1}}}\|\Pi_{U_t}x\|_{I-H} = c\sqrt{2t}\,\gamma_{t,k}^{q+\frac12}\sqrt{\frac{\lambda_{n_\Gamma}}{\lambda_{k+1}}}\,\frac{\|\Pi_{U_t}x\|_{I-H}}{\sqrt{\|I-H\|_2}}$.

Since $\lambda_{k+1} \ge \lambda_{t+1}$, we have

$\sum_{i=1}^{t}|\alpha_i|\|u_i - \Pi_{\tilde{U}}u_i\|_2 \le c\sqrt{2t}\,\gamma_{t,k}^{q+\frac12}\sqrt{\frac{\lambda_{n_\Gamma}}{\lambda_{t+1}}}\,\frac{\|\Pi_{U_t}x\|_{I-H}}{\sqrt{\|I-H\|_2}}$.

It follows that

$\|x - \Pi_{\tilde{U}}x\|_2 \le \sqrt{\frac{\lambda_{n_\Gamma}}{\lambda_{t+1}}}\frac{\|x - \Pi_{U_t}x\|_{I-H}}{\sqrt{\|I-H\|_2}} + c\sqrt{2t}\,\gamma_{t,k}^{q+\frac12}\sqrt{\frac{\lambda_{n_\Gamma}}{\lambda_{t+1}}}\frac{\|\Pi_{U_t}x\|_{I-H}}{\sqrt{\|I-H\|_2}} \le \sqrt{2}\max\left(c\sqrt{2t}\,\gamma_{t,k}^{q+\frac12},\,1\right)\sqrt{\frac{\lambda_{n_\Gamma}}{\lambda_{t+1}}}\frac{\|x\|_{I-H}}{\sqrt{\|I-H\|_2}}$.

Hence (4.19) is satisfied and we have

$\kappa_{\mathrm{eff}}(\mathcal{P}(I - H)) \le 2\max\left(2c^2 t\,\gamma_{t,k}^{2q+1},\,1\right)\frac{\lambda_{n_\Gamma}}{\lambda_{t+1}}$.

Thus,

$\mathbb{E}\left(\sqrt{\kappa_{\mathrm{eff}}(\mathcal{P}(I - H))}\right) \le \sqrt{2}\max\left(\mathbb{E}(c)\sqrt{2t}\,\gamma_{t,k}^{q+\frac12},\,1\right)\sqrt{\frac{\lambda_{n_\Gamma}}{\lambda_{t+1}}}$.

Since $t$ is chosen arbitrarily between 1 and $k$, we have

(4.20)   $\mathbb{E}\left(\sqrt{\kappa_{\mathrm{eff}}(\mathcal{P}(I - H))}\right) \le \sqrt{2}\min_{1 \le t \le k}\left(\max\left(\mathbb{E}(c)\sqrt{2t}\,\gamma_{t,k}^{q+\frac12},\,1\right)\sqrt{\frac{\lambda_{n_\Gamma}}{\lambda_{t+1}}}\right)$.

Because $\mathbb{E}(c)$ can be bounded by a polynomial of degree 1 in $k$ and $\gamma_{t,k} \le 1$, the factor $\max\left(4t\,\gamma_{t,k}^{2q+1}(\mathbb{E}(c))^2,\,2\right)$ can be bounded by a polynomial of degree 3 in $k$ that is independent of the spectrum of $I - H$. ∎

Note that, in practice, when the problem is challenging, a few eigenvalues of $R_\Gamma^{-T}S_\Gamma R_\Gamma^{-1}$ are close to the origin. This is reflected in a rapid and exponential decay of the values of the entries of $\Lambda^{-1} - I$. Figure 2 depicts the bound obtained in Proposition 4.1 for different values of $k$ and $q$ for problem s3rmt3m3.

[Fig. 2. Problem s3rmt3m3: values of the bound (4.20) on $\left(\mathbb{E}\sqrt{\kappa_{\mathrm{eff}}(\mathcal{P}(I - H))}\right)^2$ for a range of values of $k$ (5 to 40) and $q$ (0, 1, 2).]

5. Numerical Experiments. We use 64 subdomains (i.e., $A_I$ is a 64-block diagonal matrix) for each of our test matrices, with the exception of one problem: the matrix nd3k is much denser than the others, and we use only two blocks (to reduce the runtime). For comparison purposes, we include results for the Schur complement preconditioners $\tilde{S}_1$ and $\tilde{S}_2$ given by (2.14) and (2.17), respectively. As demonstrated in subsection 3.1, the latter is too costly to be practical; however, its performance is the ideal, since it guarantees the smallest spectral condition number for a fixed deflation subspace. Therefore, the quality of the Nyström–Schur preconditioner will be measured in terms of how close its performance is to that of $\tilde{S}_2$ and the reduction in iteration count it gives compared to $\tilde{S}_1$. For a given problem, the right-hand side vector is the same for all the tests: it is generated randomly with entries from the standard normal distribution. The relative convergence tolerance for PCG is $10^{-6}$. Unless otherwise specified, the parameters within Nyström's method (Algorithm 2.1) are rank $k = 20$, oversampling $p = 0$, and power iteration $q = 0$. To ensure fair comparisons, the random matrices generated in different runs of the Nyström algorithm use the same seed. We employ the Nyström–Schur variant $\mathcal{M}_2$ (4.10) (recall that its construction does not require the Cholesky factors of $A_\Gamma$). The relative convergence tolerance used when solving the SPD system (4.13) is $\varepsilon_{S_I} = 0.1$, and (4.13) is preconditioned by the block diagonal matrix $A_I$. We denote by $it_{S_I}$ the number of block PCG iterations required to solve (4.13) during the construction of the Nyström–Schur preconditioners (it is zero for $\tilde{S}_1$ and $\tilde{S}_2$), and by $it_{PCG}$ the PCG iteration count for solving (2.13). The total number of iterations is $it_{total} = it_{S_I} + it_{PCG}$. We use the code [1] to generate the numerical experiments.

5.1. Linear system with $S_I$. We start by considering how to efficiently compute an approximate solution of (4.13).

5.1.1. Block and classic CG. The system (4.13) has $k + p$ right-hand sides. The number of iterations required by PCG to solve for each right-hand side differs and the variation can be large; this is illustrated in Figure 3 for problem bcsstk38. Here we report the number of right-hand sides for which the iteration count lies in the interval $[k, k+10)$, $k = 100, \ldots, 240$. For example, there are 4 right-hand sides for which the count is between 110 and 119. Similar behaviour was observed for our other test problems.

[Fig. 3. Histogram of the PCG iteration counts for (4.13) for problem bcsstk38. The number of right-hand sides for which the iteration count is between $[k, k+10)$, $k = 100, \ldots, 240$, is given.]

Table 5 reports the iteration counts for the classical PCG method and the breakdown-free block PCG method [21, 35]. For PCG, iters is the largest PCG iteration count over the $k + p$ right-hand sides. For the block method, iters $= it_{S_I}$ is the number of block PCG iterations. As expected from the theory, the block method significantly reduces the (maximum) iteration count. For our examples, it also leads to a modest reduction in the iteration count $it_{PCG}$ for solving (2.13).

Table 5: A comparison of the performance of classic and block PCG. iters denotes the iteration count for solving (4.13) (details in the text) and $it_{PCG}$ is the iteration count for solving (2.13).

Identifier | Classic: iters  it_PCG | Block: iters  it_PCG
bcsstk38   | 238   186 | 46   173
ela2d      | 549   261 | 72   228
ela3d      |  95    56 | 24    52
msc10848   | 203   194 | 47   166
nd3k       | 294   191 | 32   178
s3rmt3m3   | 403   157 | 37    98

5.1.2. Impact of tolerance $\varepsilon_{S_I}$. We now study the impact of the convergence tolerance $\varepsilon_{S_I}$ used when solving (4.13) on the quality of the Nyström–Schur preconditioner. In Table 6, we present results for three test problems that illustrate the (slightly) different behaviors we observed. The results demonstrate numerically that a large tolerance can be used without affecting the quality of the preconditioner. Indeed, using $\varepsilon_{S_I} = 0.3$ leads to a preconditioner whose efficiency is close to that of the ideal (but impractical) two-level preconditioner $\tilde{S}_2$. The use of a large $\varepsilon_{S_I}$ to limit $it_{S_I}$ is crucial in ensuring low construction costs for the Nyström–Schur preconditioners.

Table 6: The effects of the convergence tolerance $\varepsilon_{S_I}$ on the quality of the Nyström–Schur preconditioner.

Identifier | ε_SI | M2: it_SI  it_PCG | S̃1: it_PCG | S̃2: it_PCG
ela2d      | 0.8  | 1    500+ | 914 | 231
ela2d      | 0.5  | 68   228  |     |
ela2d      | 0.3  | 70   228  |     |
ela2d      | 0.1  | 72   228  |     |
ela2d      | 0.01 | 78   228  |     |
ela3d      | 0.8  | 1    173  | 174 | 37
ela3d      | 0.5  | 2    171  |     |
ela3d      | 0.3  | 22   52   |     |
ela3d      | 0.1  | 24   52   |     |
ela3d      | 0.01 | 27   52   |     |
nd3k       | 0.8  | 32   178  | 603 | 143
nd3k       | 0.5  | 32   178  |     |
nd3k       | 0.3  | 32   178  |     |
nd3k       | 0.1  | 32   178  |     |
nd3k       | 0.01 | 33   178  |     |

5.2. Type of preconditioner. We next compare the performances of the variants $\mathcal{M}_i$ and $\mathcal{M}_{i\text{-A-DEF}}$ ($i = 1, 2, 3$) of the Nyström–Schur preconditioner presented in section 4. In Table 7, we report the total iteration count $it_{total}$. All the variants behave similarly and have a significantly smaller count than the one-level preconditioner $\tilde{S}_1$.

Table 7: Comparison of $it_{total}$ for the variants of the Nyström–Schur preconditioner and $\tilde{S}_1$ and $\tilde{S}_2$. $\varepsilon_{S_I} = 0.1$.

Identifier | M1  | M1-A-DEF | M2  | M2-A-DEF | M3  | M3-A-DEF | S̃1  | S̃2
bcsstk38   | 218 | 218 | 219 | 219 | 360 | 313 | 584 | 122
ela2d      | 266 | 267 | 300 | 300 | 282 | 282 | 914 | 231
ela3d      | 73  | 72  | 76  | 75  | 78  | 76  | 174 | 37
msc10848   | 206 | 205 | 213 | 211 | 216 | 222 | 612 | 116
nd3k       | 205 | 205 | 210 | 210 | 211 | 211 | 603 | 143
s3rmt3m3   | 127 | 127 | 135 | 134 | 161 | 153 | 441 | 70

5.3. Varying the rank and the oversampling parameter. We now look at varying the rank $k$ within the Nyström algorithm and demonstrate numerically that the efficiency of the preconditioner is robust with respect to the oversampling parameter $p$. For problem s3rmt3m3, Table 8 compares the iteration counts for $\mathcal{M}_2$ with those of the ideal two-level preconditioner $\tilde{S}_2$ for $k$ ranging from 5 to 320. For $\tilde{S}_1$, the iteration count is 441. This demonstrates the effectiveness of the Nyström–Schur preconditioner in reducing the iteration count. Increasing the size of the deflation subspace (the rank $k$) steadily reduces the iteration count required to solve the $S_I$ system (4.13). For the same test example, Table 9 presents the iteration counts for a range of values of the oversampling parameter $p$ (here $k = 20$). We observe that the counts are relatively insensitive to $p$ but, as $p$ increases, $it_{PCG}$ reduces towards the lower bound of 70 PCG iterations required by $\tilde{S}_2$. Similar behavior was noticed for our other test examples. Although increasing $k$ and $p$ improves the efficiency of the Nyström–Schur preconditioner, this comes with extra costs during both the construction of the preconditioner and its application. Nevertheless, the savings from the reduction in the iteration count and the efficiency of solving block linear systems of equations for moderate block sizes (for example, $k = 40$) typically outweigh the increase in construction costs.

Table 8: Problem s3rmt3m3: impact of the rank $k$ on the iteration counts ($p = 0$).

k            | 5   | 10  | 20 | 40 | 80 | 160 | 320
M2: it_SI    | 97  | 57  | 37 | 23 | 16 | 11  | 8
M2: it_PCG   | 244 | 203 | 98 | 53 | 30 | 20  | 14
S̃2: it_PCG  | 212 | 153 | 70 | 37 | 22 | 13  | 9

Table 9: Problem s3rmt3m3: impact of the oversampling parameter $p$ on the iteration counts ($k = 20$).

p       | 0  | 5  | 10 | 20 | 40
it_SI   | 37 | 31 | 28 | 23 | 20
it_PCG  | 98 | 86 | 79 | 77 | 74

5.4. Comparisons with incomplete Cholesky factorization preconditioners. Finally, we compare the Nyström–Schur preconditioner with two incomplete Cholesky (IC) factorization preconditioners applied to the original system. The first is the Matlab variant ichol with the global diagonal shift set to 0.1 and default values for the other parameters; the second is the Matlab interface to the IC factorization preconditioner HSL_MI28 [39] from the HSL library [20], using the default parameter settings. IC preconditioners are widely used but their construction is often serial, potentially limiting their suitability for very large problems (see [19] for an IC preconditioner that can be parallelised). In terms of iteration counts (Table 10), the Nyström–Schur and HSL_MI28 preconditioners are clearly superior to the simple ichol preconditioner, with neither consistently offering the best performance; this is confirmed by the results in the Appendix for our large test set. Figure 4 presents the residual norm history for PCG. The residual norm for $\mathcal{M}_2$ decreases monotonically, while for the IC preconditioners we observe oscillatory behaviour.

Because our implementation of the Nyström–Schur preconditioner is in Matlab, we are not able to provide performance comparisons in terms of computation times. Having demonstrated the potential of our two-level Nyström–Schur preconditioner, one of our objectives for the future is to develop an efficient (parallel) implementation in Fortran that will be included within the HSL library. This will allow users to test out the preconditioner and to assess the performance of both constructing and applying the preconditioner. Our preliminary work on this is encouraging.

Table 10: PCG iteration counts for the Nyström–Schur preconditioner $\mathcal{M}_2$ (with $k = 20$) and the IC preconditioners HSL_MI28 and ichol.

Identifier | M2: it_SI  it_PCG | HSL_MI28 | ichol
bcsstk38   | 46   173 | 593 | 2786
ela2d      | 72   228 | 108 | 2319
ela3d      | 24    52 | 36  | 170
msc10848   | 47   166 | 145 | 784
nd3k       | 32   178 | 102 | 1231
s3rmt3m3   | 37    98 | 610 | 2281

[Fig. 4. PCG residual norm history for test examples bcsstk38 (top) and ela2d (bottom).]

6. Concluding comments. In this paper, we have investigated using randomized methods to construct efficient and robust preconditioners for use with CG to solve large-scale SPD linear systems. The approach requires an initial ordering to doubly bordered block diagonal form and then uses a Schur complement approximation. We have demonstrated that by carefully posing the approximation problem we can apply randomized methods to construct high quality preconditioners, which gives an improvement over previously proposed methods that use low rank approximation strategies. We have presented a number of variants of our new Nyström–Schur preconditioner. During the preconditioner construction, we must solve a smaller linear system with multiple right-hand sides. Our numerical experiments have shown that a small number of iterations of block CG are needed to obtain an approximate solution that is sufficient to construct an effective preconditioner.

Currently, the construction and application of our Nyström–Schur preconditioners requires the solution of linear systems with the block matrix $A_\Gamma$ (2.9). Given the promising results presented in this paper, in the future we plan to investigate employing a recursive approach, following ideas given in [49]. This will only require the solution of systems involving a much smaller matrix and will lead to a practical approach for very large-scale SPD systems. A parallel implementation of the preconditioner will also be developed.

Appendix A. Extended numerical experiments. Here we present results for a larger test set. The problems are given in Table 11. We selected all the SPD matrices in the SuiteSparse Collection with $n$ lying between 5K and 100K, giving us a set of 71 problems. For each problem, we ran PCG with the $\tilde{S}_1$, $\mathcal{M}_2$, $\tilde{S}_2$ and HSL_MI28 preconditioners. In all the tests, we use 64 subdomains. For $\mathcal{M}_2$, we used $k = 20$ and set $p = q = 0$. Iteration counts are given in the table, whilst performance profiles [6] are presented in Figure 5. In recent years, performance profiles have become a popular and widely used tool for providing objective information when benchmarking algorithms. The performance profile takes into account the number of problems solved by an algorithm as well as the cost of solving them: it scales the cost of solving each problem according to the best solver for that problem. In our case, the performance cost is the iteration count (for $\mathcal{M}_2$, we sum the counts $it_{S_I}$ and $it_{PCG}$). Note that we do not include $\tilde{S}_2$ in the performance profiles because it is an ideal but impractical two-level preconditioner and, as such, it always outperforms $\mathcal{M}_2$. The performance profile shows that on the problems where $\tilde{S}_1$ struggles, there is little to choose between the overall quality of $\mathcal{M}_2$ and HSL_MI28.

Table 11: PCG iteration counts for SPD matrices from the SuiteSparse Collection with $n$ ranging between 5K and 100K.

Identifier       | S̃1   | M2: it_SI  it_PCG | S̃2   | HSL_MI28 | κ(A)
aft01            | 118  | 19   45   | 31   | 17   | 9e+18
apache1          | 667  | 122  291  | 192  | 72   | 3e+06
bcsstk17         | 349  | 46   55   | 48   | 59   | 1e+10
bcsstk18         | 136  | 40   77   | 45   | 26   | 6e+11
bcsstk25         | †    | 92   660  | 453  | 254  | 1e+13
bcsstk36         | 451  | 64   214  | 169  | †    | 1e+12
bcsstk38         | 584  | 46   171  | 122  | 593  | 6e+16
bodyy6           | 182  | 53   163  | 129  | 5    | 9e+04
cant             | †    | 57   228  | 396  | 933  | 5e+10
cfd1             | 209  | 30   72   | 50   | 274  | 1e+06
consph           | 185  | 47   177  | 136  | 50   | 3e+07
gridgena         | 426  | 90   377  | 298  | 66   | 6e+05
gyro             | †    | 55   346  | 518  | 319  | 4e+09
gyro_k           | †    | 55   346  | 518  | 319  | 3e+09
gyro_m           | 165  | 16   34   | 22   | 17   | 1e+07
m_t1             | 867  | 85   247  | 187  | ‡    | 3e+11
minsurfo         | 15   | 3    15   | 13   | 3    | 8e+01
msc10848         | 612  | 47   168  | 116  | 145  | 3e+10
msc23052         | 479  | 69   220  | 175  | ‡    | 1e+12
nasasrb          | 1279 | 135  496  | 421  | †    | 1e+09
nd3k             | 1091 | 56   301  | 230  | 102  | 5e+07
nd6k             | 1184 | 108  325  | 248  | 116  | 6e+07
oilpan           | 647  | 67   122  | 72   | 507  | 4e+09
olafu            | 1428 | 69   489  | 757  | 557  | 2e+12
pdb1HYS          | 869  | 89   83   | 274  | 483  | 2e+12
vanbody          | †    | 287  1106 | 769  | ‡    | 4e+03
ct20stif         | 1296 | 90   232  | 281  | †    | 2e+14
nd12k            | 1039 | 155  337  | 265  | 111  | 2e+08
nd24k            | 1093 | 165  386  | 268  | 120  | 2e+08
s1rmq4m1         | 154  | 19   50   | 32   | 33   | 5e+06
s1rmt3m1         | 192  | 24   59   | 39   | 18   | 3e+08
s2rmq4m1         | 231  | 28   54   | 41   | 39   | 4e+08
s2rmt3m1         | 260  | 31   64   | 45   | 33   | 3e+11
s3dkq4m2         | †    | 148  339  | 236  | 610  | 6e+11
s3dkt3m2         | †    | 164  338  | 270  | 1107 | 3e+10
s3rmq4m1         | 356  | 31   80   | 58   | 472  | 4e+10
s3rmt3m1         | 434  | 36   101  | 64   | 413  | 4e+10
s3rmt3m3         | 441  | 37   101  | 70   | 610  | 3e+00
ship_001         | 1453 | 367  600  | 368  | 1177 | 6e+09
smt              | 399  | 59   112  | 72   | 95   | 1e+09
thermal1         | 169  | 30   62   | 47   | 30   | 4e+01
Pres_Poisson     | 92   | 13   29   | 19   | 32   | 3e+06
crankseg_1       | 92   | 16   49   | 33   | 34   | 9e+18
crankseg_2       | 89   | 17   47   | 32   | 38   | 8e+06
Kuu              | 81   | 16   44   | 31   | 10   | 3e+04
bodyy5           | 72   | 19   67   | 57   | 4    | 9e+03
Dubcova2         | 62   | 11   32   | 23   | 14   | 1e+04
cbuckle          | 55   | 9    51   | 39   | 47   | 7e+07
fv3              | 50   | 12   31   | 21   | 8    | 4e+03
Dubcova1         | 39   | 8    24   | 15   | 7    | 2e+03
bodyy4           | 34   | 8    29   | 24   | 4    | 1e+03
jnlbrng1         | 22   | 4    21   | 19   | 4    | 1e+02
bundle1          | 13   | 3    8    | 5    | 5    | 1e+04
t2dah_e          | 12   | 3    12   | 11   | 3    | 3e+07
obstclae         | 12   | 3    12   | 12   | 3    | 4e+01
torsion1         | 12   | 3    12   | 12   | 3    | 8e+03
wathen100        | 12   | 3    12   | 11   | 3    | 2e+07
wathen120        | 12   | 3    12   | 11   | 3    | 2e+07
fv1              | 7    | 2    7    | 7    | 3    | 1e+01
fv2              | 7    | 2    7    | 7    | 3    | 1e+01
shallow_water2   | 7    | 40   7    | 7    | 3    | 3e+12
shallow_water1   | 5    | 20   5    | 5    | 2    | 1e+01
Muu              | 6    | 1    6    | 6    | 2    | 1e+02
qa8fm            | 6    | 1    6    | 6    | 2    | 1e+02
crystm02         | 6    | 1    6    | 5    | 2    | 4e+02
crystm03         | 6    | 1    6    | 5    | 2    | 4e+02
finan512         | 5    | 1    5    | 5    | 3    | 9e+01
ted_B_unscaled   | 3    | 1    3    | 4    | 2    | 4e+05
ted_B            | 2    | 1    3    | 3    | 2    | 2e+11
Trefethen_20000b | 3    | 1    2    | 2    | 3    | 1e+05
Trefethen_20000  | 4    | 1    2    | 2    | 3    | 2e+05


Fig. 5. Iteration count performance profile for the large test set. The 40 problems used in the right-hand plot are the subset for which the S̃1 (one-level) iteration count exceeded 100.

REFERENCES

[1] H. Al Daas, haldaas/Nystrom–Schur-Preconditioner: version reproducing paper numerical experiments, June 2021, https://doi.org/10.5281/zenodo.4957301.
[2] H. Al Daas and L. Grigori, A class of efficient locally constructed preconditioners based on coarse spaces, SIAM Journal on Matrix Analysis and Applications, 40 (2019), pp. 66–91.
[3] H. Al Daas, L. Grigori, P. Jolivet, and P.-H. Tournier, A multilevel Schwarz preconditioner based on a hierarchy of robust coarse spaces, SIAM Journal on Scientific Computing, (2021), pp. A1907–A1928.
[4] H. Al Daas, P. Jolivet, and J. A. Scott, A robust algebraic domain decomposition preconditioner for sparse normal equations, 2021, https://arxiv.org/abs/2107.09006.
[5] T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Transactions on Mathematical Software, 38 (2011), pp. 1–28.
[6] E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, 91 (2002), pp. 201–213.
[7] V. Dolean, P. Jolivet, and F. Nataf, An Introduction to Domain Decomposition Methods: Algorithms, Theory, and Parallel Implementation, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2015.
[8] Z. Dostál, Conjugate gradient method with preconditioning by projector, International Journal of Computer Mathematics, 23 (1988), pp. 315–323.
[9] I. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices, Second Edition, Oxford University Press, London, 2017.
[10] J. Frank and C. Vuik, On the construction of deflation-based preconditioners, SIAM Journal on Scientific Computing, 23 (2001), pp. 442–462.
[11] A. Gaul, M. H. Gutknecht, J. Liesen, and R. Nabben, A framework for deflated and augmented Krylov subspace methods, SIAM Journal on Matrix Analysis and Applications, 34 (2013), pp. 495–518.
[12] A. Gittens and M. W. Mahoney, Revisiting the Nyström method for improved large-scale machine learning, Journal of Machine Learning Research, 17 (2016), pp. 3977–4041.
[13] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, third ed., 1996.
[14] L. Grigori, F. Nataf, and S. Yousef, Robust algebraic Schur complement preconditioners based on low rank corrections, Research Report RR-8557, INRIA, July 2014, https://hal.inria.fr/hal-01017448.
[15] M. H. Gutknecht, Deflated and augmented Krylov subspace methods: A framework for deflated BiCG and related solvers, SIAM Journal on Matrix Analysis and Applications, 35 (2014), pp. 1444–1466.
[16] N. Halko, P.-G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review, 53 (2011), pp. 217–288.
[17] F. Hecht, New development in freefem++, Journal of Numerical Mathematics, 20 (2012), pp. 251–265.
[18] N. J. Higham and T. Mary, A new preconditioner that exploits low-rank approximations to factorization error, SIAM Journal on Scientific Computing, 41 (2019), pp. A59–A82.
[19] J. Hook, J. Scott, F. Tisseur, and J. Hogg, A max-plus approach to incomplete Cholesky factorization preconditioners, SIAM Journal on Scientific Computing, 40 (2018), pp. A1987–A2004.
[20] HSL. A collection of Fortran codes for large-scale scientific computation, 2018. http://www.hsl.rl.ac.uk.
[21] H. Ji and Y. Li, A breakdown-free block conjugate gradient method, BIT Numerical Mathematics, 57 (2017), pp. 379–403.
[22] T. B. Jönsthövel, M. B. van Gijzen, C. Vuik, C. Kasbergen, and A. Scarpas, Preconditioned conjugate gradient method enhanced by deflation of rigid body modes applied to composite materials, Computer Modeling in Engineering & Sciences, 47 (2009), pp. 97–118.
[23] T. B. Jönsthövel, M. B. van Gijzen, C. Vuik, and A. Scarpas, On the use of rigid body modes in the deflated preconditioned conjugate gradient method, SIAM Journal on Scientific Computing, 35 (2013), pp. B207–B225.
[24] E. F. Kaasschieter, Preconditioned conjugate gradients for solving singular systems, Journal of Computational and Applied Mathematics, 24 (1988), pp. 265–275.
[25] K. Kahl and H. Rittich, The deflated conjugate gradient method: Convergence, perturbation and accuracy, Linear Algebra and its Applications, 515 (2017), pp. 111–129.
[26] G. Karypis and V. Kumar, METIS: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices, Technical Report 97-061, University of Minnesota, Department of Computer Science and Army HPC Research Center, 1997.
[27] R. Li, Y. Xi, and Y. Saad, Schur complement-based domain decomposition preconditioners with low-rank corrections, Numerical Linear Algebra with Applications, 23 (2016), pp. 706–729.
[28] P.-G. Martinsson and J. A. Tropp, Randomized numerical linear algebra: Foundations and algorithms, Acta Numerica, 29 (2020), pp. 403–572.
[29] METIS - serial graph partitioning and fill-reducing matrix ordering, 2020. http://glaros.dtc.umn.edu/gkhome/metis/metis/overview.
[30] R. Nabben and C. Vuik, A comparison of abstract versions of deflation, balancing and additive coarse grid correction preconditioners, Numerical Linear Algebra with Applications, 15 (2008), pp. 355–372.
[31] Y. Nakatsukasa, Fast and stable randomized low-rank matrix approximation, 2020, https://arxiv.org/abs/2009.11392.
[32] F. Nataf, H. Xiang, V. Dolean, and N. Spillane, A coarse space construction based on local Dirichlet-to-Neumann maps, SIAM Journal on Scientific Computing, 33 (2011), pp. 1623–1642.
[33] R. A. Nicolaides, Deflation of conjugate gradients with applications to boundary value problems, SIAM Journal on Numerical Analysis, 24 (1987), pp. 355–365.
[34] E. J. Nyström, Über die praktische Auflösung von Integralgleichungen mit Anwendungen auf Randwertaufgaben, Acta Mathematica, 54 (1930), pp. 185–204.
[35] D. P. O'Leary, The block conjugate gradient algorithm and related methods, Linear Algebra and its Applications, 29 (1980), pp. 293–322.
[36] Y. Saad, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2nd ed., 2003.
[37] A. K. Saibaba, Randomized subspace iteration: Analysis of canonical angles and unitarily invariant norms, SIAM Journal on Matrix Analysis and Applications, 40 (2019), pp. 23–48.
[38] J. A. Scott, A parallel frontal solver for finite element applications, International Journal for Numerical Methods in Engineering, 50 (2001), pp. 1131–1144.
[39] J. A. Scott and M. Tůma, HSL_MI28: An efficient and robust limited-memory incomplete Cholesky factorization code, ACM Transactions on Mathematical Software, 40 (2014), pp. 24:1–19.
[40] N. Spillane, V. Dolean, P. Hauret, F. Nataf, C. Pechstein, and R. Scheichl, Abstract robust coarse spaces for systems of PDEs via generalized eigenproblems in the overlaps, Numerische Mathematik, 126 (2014), pp. 741–770.
[41] N. Spillane and D. Rixen, Automatic spectral coarse spaces for robust finite element tearing and interconnecting and balanced domain decomposition algorithms, International Journal for Numerical Methods in Engineering, 95 (2013), pp. 953–990.
[42] G. W. Stewart, A Krylov–Schur algorithm for large eigenproblems, SIAM Journal on Matrix Analysis and Applications, 23 (2002), pp. 601–614.
[43] J. M. Tang, S. P. MacLachlan, R. Nabben, and C. Vuik, A comparison of two-level preconditioners based on multigrid and deflation, SIAM Journal on Matrix Analysis and Applications, 31 (2010), pp. 1715–1739.
[44] J. M. Tang, R. Nabben, C. Vuik, and Y. A. Erlangga, Comparison of two-level preconditioners derived from deflation, domain decomposition and multigrid methods, Journal of Scientific Computing, 39 (2009), pp. 340–370.
[45] C. Vuik, A. Segal, and J. A. Meijerink, An efficient preconditioned CG method for the solution of a class of layered problems with extreme contrasts in the coefficients, Journal of Computational Physics, 152 (1999), pp. 385–403.
[46] C. Vuik, A. Segal, J. A. Meijerink, and G. T. Wijma, The construction of projection vectors for a deflated ICCG method applied to problems with extreme contrasts in the coefficients, Journal of Computational Physics, 172 (2001), pp. 426–450.
[47] C. K. I. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, in Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, eds., MIT Press, 2001, pp. 682–688.
[48] D. Woodruff, Sketching as a Tool for Numerical Linear Algebra, Foundations and Trends in Theoretical Computer Science, Now Publishers, 2014.
[49] Y. Xi, R. Li, and Y. Saad, An algebraic multilevel preconditioner with low-rank corrections for sparse symmetric matrices, SIAM Journal on Matrix Analysis and Applications, 37 (2016), pp. 235–259.
[50] Q. Zheng, Y. Xi, and Y. Saad, A power Schur complement low-rank correction preconditioner for general sparse linear systems, SIAM Journal on Matrix Analysis and Applications, 42 (2021), pp. 659–682.