
arXiv:cs/0607105v4 [cs.NA] 17 Sep 2009

Nearly-Linear Time Algorithms for Preconditioning and Solving

Symmetric, Diagonally Dominant Linear Systems∗

Daniel A. Spielman

Department of Computer Science

Program in Applied Mathematics

Yale University

Shang-Hua Teng

Department of Computer Science

Boston University

September 17, 2009

Abstract

We present a randomized algorithm that, on input a symmetric, weakly diagonally dominant n-by-n matrix A with m non-zero entries and an n-vector b, produces an $\tilde{x}$ such that

$$\|\tilde{x} - A^\dagger b\|_A \le \epsilon \|A^\dagger b\|_A$$

in expected time $m \log^{O(1)} n \log(1/\epsilon)$.

The algorithm applies subgraph preconditioners in a recursive fashion. These preconditioners improve upon the subgraph preconditioners first introduced by Vaidya (1990). For any symmetric, weakly diagonally-dominant matrix A with non-positive off-diagonal entries and $k \ge 1$, we construct in time $m \log^{O(1)} n$ a preconditioner of A with at most

$$2(n-1) + (m/k) \log^{O(1)} n$$

non-zero off-diagonal entries such that the finite generalized condition number $\kappa_f(A,B)$ is at most k. If the non-zero structure of the matrix is planar, then the condition number is at most

$$O\big((n/k) \log n \, (\log\log n)^2\big),$$

and the corresponding linear system solver runs in expected time

$$O\big(n \log^2 n + n \log n \, (\log\log n)^2 \log(1/\epsilon)\big).$$

Similar bounds are obtained on the running time of algorithms computing approximate

Fiedler vectors.

∗This paper is the last in a sequence of three papers expanding on material that appeared first under the title

“Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems” [ST04].

The second paper, “Spectral Sparsification of Graphs” [ST08c] contains algorithms for constructing sparsifiers

of graphs, which we use in this paper to build preconditioners. The first paper, “A Local Clustering Algorithm

for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning” [ST08b] contains graph

partitioning algorithms that are used to construct sparsifiers in the second paper.

This material is based upon work supported by the National Science Foundation under Grant Nos. 0325630,

0324914, 0634957, 0635102 and 0707522. Any opinions, findings, and conclusions or recommendations expressed in

this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.


1 Introduction

We design an algorithm with nearly optimal asymptotic complexity for solving linear systems

in symmetric, weakly diagonally dominant (SDD0) matrices. The algorithm applies a classical

iterative solver, such as the Preconditioned Conjugate Gradient or the Preconditioned Chebyshev

Method, with a novel preconditioner that we construct and analyze using techniques from graph

theory. Linear systems in these preconditioners may be reduced to systems of smaller size in

linear time by use of a direct method. The smaller linear systems are solved recursively. The

resulting algorithm solves linear systems in SDD0-matrices in time almost linear in their number

of non-zero entries. Our analysis does not make any assumptions about the non-zero structure

of the matrix, and thus may be applied to the solution of the systems in SDD0-matrices that

arise in any application, such as the solution of elliptic partial differential equations by the

finite element method [Str86, BHV04], the solution of maximum flow problems by interior point

algorithms [FG04, DS08], or the solution of learning problems on graphs [BMN04, ZBL+03,

ZGL03].

Graph theory drives the construction of our preconditioners. Our algorithm is best un-

derstood by first examining its behavior on Laplacian matrices—the symmetric matrices with

non-positive off-diagonals and zero row sums. Each n-by-n Laplacian matrix A may be associ-

ated with a weighted graph, in which the weight of the edge between distinct vertices i and j is

$-A_{i,j}$ (see Figure 1). We precondition the Laplacian matrix A of a graph G by the Laplacian

matrix B of a subgraph H of G that resembles a spanning tree of G plus a few edges. The sub-

graph H is called an ultra-sparsifier of G, and its corresponding Laplacian matrix is a very good

preconditioner for A: The finite generalized condition number $\kappa_f(A,B)$ is $\log^{O(1)} n$. Moreover,

it is easy to solve linear equations in B. As the graph H resembles a tree plus a few edges, we

may use partial Cholesky factorization to eliminate most of the rows and columns of B while

incurring only a linear amount of fill. We then solve the reduced system recursively.

$$A = \begin{pmatrix} 1.5 & -1.5 & 0 & 0 \\ -1.5 & 4 & -2 & -0.5 \\ 0 & -2 & 3 & -1 \\ 0 & -0.5 & -1 & 1.5 \end{pmatrix}$$

[Figure: the corresponding weighted graph on vertices 1 through 4, with edge weights 1.5 on (1,2), 2 on (2,3), 0.5 on (2,4), and 1 on (3,4).]

Figure 1: A Laplacian matrix and its corresponding weighted graph.
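This correspondence is easy to state in code. The following numpy sketch (ours, for illustration; not from the paper) assembles the Laplacian of a weighted graph from its edge list; applied to the graph of Figure 1, it reproduces the matrix shown there.

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian of a weighted graph: A[i, j] = -w for each edge (i, j, w),
    and each diagonal entry is set so that every row sums to zero."""
    A = np.zeros((n, n))
    for i, j, w in edges:
        A[i, j] -= w
        A[j, i] -= w
        A[i, i] += w
        A[j, j] += w
    return A

# The graph of Figure 1, with vertices renumbered 0..3.
A = laplacian(4, [(0, 1, 1.5), (1, 2, 2.0), (1, 3, 0.5), (2, 3, 1.0)])
```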

The technical meat of this paper lies in the construction of ultra-sparsifiers for Laplacian

matrices, which appears in Sections 7 through 10. In the remainder of the introduction, we

formally define ultra-sparsifiers, and the sparsifiers from which they are built. In Section 2,

we survey the contributions upon which we build, and mention other related work. We devote

Section 3 to recalling the basics of support theory, defining the finite condition number, and

explaining why we may restrict our attention to Laplacian matrices.

In Section 4, we state the properties we require of partial Cholesky factorizations, and we

present our first algorithms for solving equations in SDD0-matrices. These algorithms directly

solve equations in the preconditioners, rather than using a recursive approach, and take time

roughly $O(m^{5/4} \log^{O(1)} n)$ for general SDD0-matrices and $O(n^{9/8} \log^{1/2} n)$ for SDDM0-matrices


with planar non-zero structure. To accelerate these algorithms, we apply our preconditioners

in a recursive fashion. We analyze the complexity of these recursive algorithms in Section 5,

obtaining our main algorithmic results. In Section 6, we observe that these linear system solvers

yield efficient algorithms for computing approximate Fiedler vectors, when applied inside the

inverse power method.

We do not attempt to optimize the exponent of $\log n$ in the complexity of our algorithm. Rather, we present the simplest analysis we can find of an algorithm of complexity $m \log^{O(1)} n \log(1/\epsilon)$. We expect that the exponent of $\log n$ can be substantially reduced through advances in the constructions of low-stretch spanning trees, sparsifiers, and ultrasparsifiers. Experimental work is

required to determine whether a variation of our algorithm will be useful in practice.

1.1 Ultra-sparsifiers

To describe the quality of our preconditioners, we employ the notation $A \preceq B$ to indicate that $B - A$ is positive semi-definite. We define a SDDM0-matrix to be a SDD0-matrix with no

positive off-diagonal entries. When positive definite, the SDDM0-matrices are M-matrices and

in particular are Stieltjes matrices.

Definition 1.1 (Ultra-Sparsifiers). A (k,h)-ultra-sparsifier of an n-by-n SDDM0-matrix A with 2m non-zero off-diagonal entries is a SDDM0-matrix $A_s$ such that

(a) $A_s \preceq A \preceq k \cdot A_s$.

(b) $A_s$ has at most $2(n-1) + 2hm/k$ non-zero off-diagonal entries.

(c) The set of non-zero entries of $A_s$ is a subset of the set of non-zero entries of A.

In Section 10, we present an expected $m \log^{O(1)} n$-time algorithm that on input a Laplacian matrix A and a $k \ge 1$ produces a (k,h)-ultra-sparsifier of A with probability at least $1 - 1/2n$, for

$$h = c_3 \log^{c_4} n, \qquad (1)$$

where $c_3$ and $c_4$ are some absolute constants. As we will use these ultra-sparsifiers throughout the paper, we will define a k-ultra-sparsifier to be a (k,h)-ultra-sparsifier where h satisfies (1).

For matrices whose graphs are planar, we present a simpler construction of (k,h)-ultra-sparsifiers, with $h = O(\log n \, (\log\log n)^2)$. This simpler construction exploits low-stretch spanning trees [AKPW95, EEST08, ABN08], and is presented in Section 9. Our construction of ultra-sparsifiers in Section 10 builds upon the simpler construction, but requires the use of sparsifiers. The following definition of sparsifiers will suffice for the purposes of this paper.

Definition 1.2 (Sparsifiers). A d-sparsifier of an n-by-n SDDM0-matrix A is a SDDM0-matrix $A_s$ such that

(a) $A_s \preceq A \preceq (5/4) A_s$.

(b) $A_s$ has at most dn non-zero off-diagonal entries.

(c) The set of non-zero entries of $A_s$ is a subset of the set of non-zero entries of A.

(d) For all i,

$$\sum_{j \neq i} \frac{A_s(i,j)}{A(i,j)} \le 2\,|\{j : A(i,j) \neq 0\}|.$$

In a companion paper [ST08c], we present a randomized algorithm Sparsify2 that produces

sparsifiers of Laplacian matrices in expected nearly-linear time. As explained in Section 3, this

construction can trivially be extended to all SDDM0-matrices.

Theorem 1.3 (Sparsification). On input an $n \times n$ Laplacian matrix A with 2m non-zero off-diagonal entries and a $p > 0$, Sparsify2 runs in expected time $m \log(1/p) \log^{17} n$ and with probability at least $1 - p$ produces a $c_1 \log^{c_2}(n/p)$-sparsifier of A, for $c_2 = 30$ and some absolute constant $c_1 > 1$.

We parameterize this theorem by the constants $c_1$ and $c_2$ as we believe that they can be substantially improved. In particular, Spielman and Srivastava [SS08] construct sparsifiers with $c_2 = 1$, but these constructions require the solution of linear equations in Laplacian matrices, and so cannot be used to help speed up the algorithms in this paper. Batson, Spielman and Srivastava [BSS09] have proved that there exist sparsifiers that satisfy conditions (a) through (c) of Definition 1.2 with $c_2 = 0$.

2 Related Work

In this section, we explain how our results relate to other rigorous asymptotic analyses of algo-

rithms for solving systems of linear equations. For the most part, we restrict our attention to

algorithms that make structural assumptions about their input matrices, rather than assump-

tions about the origins of those matrices.

Throughout our discussion, we consider an n-by-n matrix with m non-zero entries. When

m is large relative to n and the matrix is arbitrary, the fastest algorithms for solving linear

equations are those based on fast matrix multiplication [CW82], which take time approximately

$O(n^{2.376})$. The fastest algorithm for solving general sparse positive semi-definite linear systems

is the Conjugate Gradient. Used as a direct solver, it runs in time O(mn) (see [TB97, Theo-

rem 28.3]). Of course, this algorithm can be used to solve a system in an arbitrary matrix A in

a similar amount of time by first multiplying both sides by $A^T$. To the best of our knowledge,

every faster algorithm requires additional properties of the input matrix.

2.1 Special non-zero structure

In the design and analysis of direct solvers, it is standard to represent the non-zero structure

of a matrix A by an unweighted graph $G_A$ that has an edge between vertices $i \neq j$ if and only if $A_{i,j}$ is non-zero (see [DER86]). If this graph has special structure, there may be elimination orderings that accelerate direct solvers. If A is tri-diagonal, in which case $G_A$ is a path, then a linear system in A can be solved in time O(n). Similarly, when $G_A$ is a tree, a linear system in A may be solved in time O(n) (see [DER86]).
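A concrete instance of the path case is the Thomas algorithm, which factors and solves a tridiagonal system in a single O(n) sweep. The sketch below is standard material, not from the paper; it assumes the system needs no pivoting, as holds for strictly diagonally dominant tridiagonal matrices.

```python
import numpy as np

def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm: solve a tridiagonal system in O(n).
    a: sub-diagonal (length n-1), b: diagonal (length n),
    c: super-diagonal (length n-1), d: right-hand side (length n)."""
    n = len(b)
    cp = np.zeros(n - 1)
    dp = np.zeros(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination sweep
        denom = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back-substitution sweep
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Example: a strictly diagonally dominant tridiagonal system.
n = 6
a = -np.ones(n - 1)
c = -np.ones(n - 1)
b = 3.0 * np.ones(n)
d = np.arange(1.0, n + 1)
x = solve_tridiagonal(a, b, c, d)
```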

If the graph of non-zero entries $G_A$ is planar, one can use Generalized Nested Dissection [Geo73, LRT79, GT87] to find an elimination ordering under which Cholesky factorization can be performed in time $O(n^{1.5})$ and produces factors with at most $O(n \log n)$ non-zero entries.


We will exploit these results in our algorithms for solving planar linear systems in Section 4.

We recall that a planar graph on n vertices has at most 3n−6 edges (see [Har72, Corollary 11.1

(c)]), so m ≤ 6n.

2.2 Subgraph Preconditioners

Our work builds on a remarkable approach to solving linear systems in Laplacian matrices

introduced by Vaidya [Vai90]. Vaidya demonstrated that a good preconditioner for a Laplacian

matrix A can be found in the Laplacian matrix B of a subgraph of the graph corresponding to

A. He then showed that one could bound the condition number of the preconditioned system by

bounding the dilation and congestion of an embedding of the graph of A into the graph of B. By

using preconditioners obtained by adding edges to maximum spanning trees, Vaidya developed

an algorithm that finds $\epsilon$-approximate solutions to linear systems in SDDM0-matrices with at most d non-zero entries per row in time $O((dn)^{1.75} \log(1/\epsilon))$. When the graph corresponding to A had special structure, such as having a bounded genus or avoiding certain minors, he obtained even faster algorithms. For example, his algorithm for solving planar systems runs in time $O((dn)^{1.2} \log(1/\epsilon))$.

As Vaidya’s paper was never published and his manuscript lacked many proofs, the task

of formally working out his results fell to others. Much of its content appears in the thesis

of his student, Anil Joshi [Jos97], and a complete exposition along with many extensions was

presented by Bern et al. [BGH+06]. Gremban, Miller and Zagha [Gre96, GMZ95] explain parts

of Vaidya’s paper as well as extend Vaidya’s techniques. Among other results, they find ways of

constructing preconditioners by adding vertices to the graphs. Maggs et al. [MMP+05] prove

that this technique may be used to construct excellent preconditioners, but it is still not clear if

they can be constructed efficiently.

The machinery needed to apply Vaidya’s techniques directly to matrices with positive off-

diagonal elements is developed in [BCHT04]. An algebraic extension of Vaidya’s techniques for

bounding the condition number was presented by Boman and Hendrickson [BH03b], and later

used by them [BH01] to prove that the low-stretch spanning trees constructed by Alon, Karp,

Peleg, and West [AKPW95] yield preconditioners for which the preconditioned system has condition number at most $m \, 2^{O(\sqrt{\log n \log\log n})}$. They thereby obtained a solver for symmetric diagonally dominant linear systems that produces $\epsilon$-approximate solutions in time $m^{1.5+o(1)} \log(1/\epsilon)$.

Through improvements in the construction of low-stretch spanning trees [EEST08, ABN08] and

a careful analysis of the eigenvalue distribution of the preconditioned system, Spielman and

Woo [SW09] show that when the Preconditioned Conjugate Gradient is applied with the best

low-stretch spanning tree preconditioners, the resulting linear system solver takes time at most $O(m n^{1/3} \log^{1/2} n \log(1/\epsilon))$. The preconditioners in the present paper are formed by adding edges

to these low-stretch spanning trees.

The recursive application of subgraph preconditioners was pioneered in the work of Joshi [Jos97]

and Reif [Rei98]. Reif [Rei98] showed how to recursively apply Vaidya’s preconditioners to solve

linear systems in SDDM0-matrices with planar non-zero structure and at most a constant num-

ber of non-zeros per row in time $O(n^{1+\beta} \log^{O(1)}(\kappa(A)/\epsilon))$, for every $\beta > 0$. While Joshi's analysis is numerically much cleaner, he only analyzes preconditioners for simple model problems.

Our recursive scheme uses ideas from both these works, with some simplification. Koutis and

Miller [KM07] have developed recursive algorithms that solve linear systems in SDDM0-matrices

with planar non-zero structure in time $O(n \log(1/\epsilon))$.

2.3 Other families of matrices

Subgraph preconditioners have been used to solve systems of linear equations from a few other

families.

Daitch and Spielman [DS08] have shown how to reduce the problem of solving linear equa-

tions in symmetric M0-matrices to the problem of solving linear equations in SDDM0-matrices,

given a factorization of the M0-matrix of width 2 [EGB05]. These matrices, with the required

factorizations, arise in the solution of the generalized maximum flow problem by interior point

algorithms.

Shklarski and Toledo [ST08a] introduce an extension of support graph preconditioners, called

fretsaw preconditioners, which are well suited to preconditioning finite element matrices. Daitch

and Spielman [DS07] use these preconditioners to solve linear equations in the stiffness matrices

of two-dimensional truss structures in time $O(n^{5/4} \log n \log(1/\epsilon))$.

For linear equations that arise when solving elliptic partial differential equations, other tech-

niques supply fast algorithms. For example, Multigrid methods may be proved correct when

applied to the solution of some of these linear systems [BHM01], and Hierarchical Matrices run

in nearly-linear time when the discretization is nice [BH03a]. Boman, Hendrickson, and Vavasis

[BHV04] have shown that the problem of solving a large class of these linear systems may be

reduced to that of solving diagonally-dominant systems. Thus, our algorithms may be applied

to the solution of these systems.

3 Background and Notation

By $\log x$, we mean the logarithm of x base 2, and by $\ln x$ the natural logarithm.

We define SDD0 to be the class of symmetric, weakly diagonally dominant matrices, and

SDDM0 to be the class of SDD0-matrices with non-positive off-diagonal entries. We define a

Laplacian matrix to be a SDDM0-matrix with zero row-sums.

Throughout this paper, we define the A-norm by

$$\|x\|_A = \sqrt{x^T A x}.$$

3.1 Preconditioners

For symmetric matrices A and B, we write

$$A \preceq B$$

if $B - A$ is positive semi-definite. We recall that if A is positive semi-definite and B is symmetric, then all eigenvalues of AB are real. For a matrix B, we let $B^\dagger$ denote the Moore-Penrose pseudo-inverse of B, that is, the matrix with the same nullspace as B that acts as the inverse of B on its image. We will use the following propositions, whose proofs are elementary.

Proposition 3.1. If A and B are positive semi-definite matrices such that for some $\alpha, \beta > 0$,

$$\alpha A \preceq B \preceq \beta A,$$

then A and B have the same nullspace.


Proposition 3.2. If A and B are positive semi-definite matrices having the same nullspace and $\alpha > 0$, then

$$\alpha A \preceq B$$

if and only if

$$\alpha B^\dagger \preceq A^\dagger.$$

The following proposition notes the equivalence of two notions of preconditioning. This

proposition is called the “Support Lemma” in [BGH+06] and [Gre96], and is implied by Theo-

rem 10.1 of [Axe85]. We include a proof for completeness.

Proposition 3.3. If A and B are symmetric matrices with the same nullspace and A is positive semi-definite, then all eigenvalues of $AB^\dagger$ lie between $\lambda_{\min}$ and $\lambda_{\max}$ if and only if

$$\lambda_{\min} B \preceq A \preceq \lambda_{\max} B.$$

Proof. We first note that $AB^\dagger$ has the same eigenvalues as $A^{1/2} B^\dagger A^{1/2}$. If for all $x \in \mathrm{Image}(A)$ we have

$$\lambda_{\min} x^T x \le x^T A^{1/2} B^\dagger A^{1/2} x,$$

then by setting $z = A^{1/2} x$, we find that for all $z \in \mathrm{Image}(A)$,

$$\lambda_{\min} z^T A^\dagger z \le z^T B^\dagger z,$$

which is equivalent to $\lambda_{\min} A^\dagger \preceq B^\dagger$ and

$$\lambda_{\min} B \preceq A.$$

The other side is proved similarly.

Following Bern et al. [BGH+06], we define the finite generalized condition number $\kappa_f(A,B)$ of matrices A and B having the same nullspace to be the ratio of the largest to smallest non-zero eigenvalues of $AB^\dagger$. Proposition 3.3 tells us that $\lambda_{\min} B \preceq A \preceq \lambda_{\max} B$ implies $\kappa_f(A,B) \le \lambda_{\max}/\lambda_{\min}$. One can use $\kappa_f(A,B)$ to bound the number of iterations taken by the Preconditioned Conjugate Gradient algorithm to solve linear systems in A when using B as a preconditioner. Given bounds on $\lambda_{\max}$ and $\lambda_{\min}$, one can similarly bound the complexity of the Preconditioned Chebyshev method.
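As a small numerical illustration (our sketch, not the paper's code), $\kappa_f$ can be computed directly from the non-zero eigenvalues of $AB^\dagger$. For the unit-weight triangle preconditioned by its spanning path, those eigenvalues are 1 and 3, the 3 coming from the effective resistance of the path between the endpoints of the edge not in the path.

```python
import numpy as np

def kappa_f(A, B, tol=1e-9):
    """Finite generalized condition number: ratio of the largest to the
    smallest non-zero eigenvalue of A B^+, for symmetric PSD A and B
    with the same nullspace."""
    ev = np.real(np.linalg.eigvals(A @ np.linalg.pinv(B)))  # eigenvalues are real here
    nz = ev[ev > tol]                                       # drop the nullspace zeros
    return nz.max() / nz.min()

# Unit-weight triangle A, preconditioned by its spanning path B = 0-1-2.
A = np.array([[2., -1., -1.], [-1., 2., -1.], [-1., -1., 2.]])
B = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
```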

3.2 Laplacians Suffice

When constructing preconditioners, we will focus our attention on the problem of preconditioning

Laplacian matrices.

Bern et al. [BGH+06, Lemma 2.5] observe that the problem of preconditioning SDDM0-

matrices is easily reduced to that of preconditioning Laplacian matrices. We recall the reduction

for completeness.

Proposition 3.4. Let A be a SDDM0-matrix. Then A can be expressed as $A = A_L + A_D$, where $A_L$ is a Laplacian matrix and $A_D$ is a diagonal matrix with non-negative entries. Moreover, if $B_L$ is a Laplacian matrix such that $A_L \preceq B_L$, then $A \preceq B_L + A_D$. Similarly, if $B_L$ is a Laplacian matrix such that $B_L \preceq A_L$, then $B_L + A_D \preceq A$.
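The reduction in Proposition 3.4 is purely mechanical, as the following dense numpy sketch (ours, for illustration) shows: copy A's off-diagonal entries into $A_L$, set $A_L$'s diagonal to make its row sums zero, and put the remaining non-negative diagonal excess in $A_D$.

```python
import numpy as np

def split_sddm(A):
    """Split a SDDM0-matrix as A = AL + AD, with AL a Laplacian and AD
    a non-negative diagonal matrix (Proposition 3.4)."""
    AL = A.astype(float).copy()
    np.fill_diagonal(AL, 0.0)                # keep only the off-diagonal entries
    np.fill_diagonal(AL, -AL.sum(axis=1))    # diagonal chosen so row sums are zero
    AD = A - AL                              # diagonal; non-negative by diagonal dominance
    return AL, AD

# Example: a non-singular SDDM0-matrix.
A = np.array([[2., -1., 0.], [-1., 3., -1.], [0., -1., 2.]])
AL, AD = split_sddm(A)
```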


So, any algorithm for constructing sparsifiers or ultra-sparsifiers for Laplacian matrices can

immediately be converted into an algorithm for constructing sparsifiers or ultra-sparsifiers of

SDDM0-matrices. Accordingly in Sections 9 and 10 we will restrict our attention to the problem

of preconditioning Laplacian matrices.

Recall that a symmetric matrix A is reducible if there is a permutation matrix P for which

PTAP is a block-diagonal matrix with at least two blocks. If such a permutation exists, one

can find it in linear time. A matrix that is not reducible is said to be irreducible. The problem

of solving a linear system in a reducible matrix can be reduced to the problems of solving linear

systems in each of the blocks. Throughout the rest of this paper, we will restrict our attention

to solving linear systems in irreducible matrices. It is well-known that a symmetric matrix is

irreducible if and only if its corresponding graph of non-zero entries is connected. We use this

fact in the special case of Laplacian matrices, observing that the weighted graph associated with

a Laplacian matrix A has the same set of edges as $G_A$.

Proposition 3.5. A Laplacian matrix is irreducible if and only if its corresponding weighted

graph is connected.

It is also well-known that the null-space of the Laplacian matrix of a connected graph is the

span of the all-1’s vector. Combining this fact with Proposition 3.4, one can show that the only

singular irreducible SDDM0-matrices are the Laplacian matrices.

Proposition 3.6. A singular irreducible SDDM0-matrix is a Laplacian matrix, and its nullspace

is spanned by the all-1’s vector.

To the extent possible, we will describe our algorithms for solving irreducible singular and

non-singular systems similarly. The one tool that we use for which this requires some thought is

the Cholesky factorization. As the Cholesky factorization of a Laplacian matrix is degenerate, it

is not immediately clear that one can use backwards and forwards substitutions on the Cholesky

factors to solve a system in a Laplacian. To handle this technicality, we note that an irreducible

Laplacian matrix A has a factorization of the form

$$A = L D L^T,$$

where L is lower-triangular and non-zero on its entire diagonal and D is a diagonal matrix with

ones on each diagonal entry, excluding the bottom right-most which is a zero. This factorization

may be computed by a slight modification of standard Cholesky factorization algorithms. The

pseudo-inverse of A can be written

$$A^\dagger = \Pi L^{-T} D L^{-1} \Pi,$$

where Π is the projection orthogonal to the all-1’s vector (see Appendix D).

When A is a Laplacian and we refer to forwards or backwards substitution on its Cholesky

factors, we will mean multiplying by $D L^{-1} \Pi$ or $\Pi L^{-T} D$, respectively, and remark that these

operations can be performed in time proportional to the number of non-zero entries in L.
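To make the factorization concrete, here is a dense sketch (our illustration; the paper's formal statement is in its Appendix D) that computes $A = LDL^T$ for an irreducible Laplacian and checks the pseudo-inverse formula above.

```python
import numpy as np

def laplacian_ldl(A):
    """Factor an irreducible Laplacian as A = L D L^T, with L lower
    triangular and non-zero on its diagonal, and D = diag(1, ..., 1, 0).
    Dense, no pivoting; the last pivot of a Laplacian vanishes."""
    n = A.shape[0]
    M = A.astype(float).copy()
    Lu = np.eye(n)                               # unit lower-triangular factor
    d = np.zeros(n)
    for k in range(n - 1):
        d[k] = M[k, k]
        Lu[k + 1:, k] = M[k + 1:, k] / d[k]
        M[k + 1:, k + 1:] -= np.outer(Lu[k + 1:, k], M[k, k + 1:])
    # Rescale so D carries only ones and a final zero.
    S = np.diag(np.append(np.sqrt(d[:n - 1]), 1.0))
    return Lu @ S, np.diag(np.append(np.ones(n - 1), 0.0))

# Path Laplacian on 4 vertices.
A = np.array([[1., -1., 0., 0.], [-1., 2., -1., 0.],
              [0., -1., 2., -1.], [0., 0., -1., 1.]])
L, D = laplacian_ldl(A)
n = 4
Pi = np.eye(n) - np.ones((n, n)) / n             # projection orthogonal to all-1's
Adag = Pi @ np.linalg.inv(L).T @ D @ np.linalg.inv(L) @ Pi
```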

4 Solvers

We first note that by Gremban’s reduction, the problem of solving an equation of the form

Ax = b for a SDD0-matrix A can be reduced to the problem of solving a system that is twice


as large in a SDDM0-matrix (see Appendix A). So, for the purposes of asymptotic complexity,

we need only consider the problem of solving systems in SDDM0-matrices.
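Gremban's reduction (the paper's exact construction is in its Appendix A) can be sketched in one standard form: split A into $A_1$ (diagonal plus negative off-diagonals) and $A_2$ (positive off-diagonals); the doubled matrix $[[A_1, -A_2], [-A_2, A_1]]$ is SDDM0, and subtracting its two block equations recovers $Ax = b$. The code below is our illustration for the non-singular case, with a generic dense solve standing in for the SDDM0 solver.

```python
import numpy as np

def gremban_solve(A, b):
    """Solve A x = b for a non-singular SDD0-matrix A by passing to a
    doubled SDDM0 system (one standard form of Gremban's reduction)."""
    off = A - np.diag(np.diag(A))
    A2 = np.clip(off, 0.0, None)                 # positive off-diagonal part
    A1 = A - A2                                  # diagonal + negative off-diagonals
    big = np.block([[A1, -A2], [-A2, A1]])       # all off-diagonals now <= 0
    n = A.shape[0]
    y = np.linalg.solve(big, np.concatenate([b, -b]))
    # Subtracting the block equations gives (A1 + A2)(y1 - y2) = 2 b,
    # and A = A1 + A2, so x = (y1 - y2) / 2.
    return (y[:n] - y[n:]) / 2

# An SDD0-matrix with a positive off-diagonal entry.
A = np.array([[3., 1., 0.], [1., 3., -1.], [0., -1., 2.]])
b = np.array([1., 2., 3.])
x = gremban_solve(A, b)
```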

To solve systems in an irreducible SDDM0-matrix A, we will compute an ultra-sparsifier

B of A, and then solve the system in A using a preconditioned iterative method. At each

iteration of this method, we will need to solve a system in B. We will solve a system in B

by a two-step algorithm. We will first apply Cholesky factorization repeatedly to eliminate

all rows and columns with at most one or two non-zero off-diagonal entries. As we stop the

Cholesky factorization before it has factored the entire matrix, we call this process a partial

Cholesky factorization. We then apply another solver on the remaining system. In this section,

we analyze the use of a direct solver. In Section 5, we obtain our fastest algorithms by solving

the remaining system recursively.

The application of partial Cholesky factorization to eliminate rows and columns with at most 2 non-zero off-diagonal entries results in a factorization of B of the form

$$B = P L C L^T P^T,$$

where C has the form

$$C = \begin{pmatrix} I_{n-n_1} & 0 \\ 0 & A_1 \end{pmatrix},$$

P is a permutation matrix, L is non-singular and lower triangular of the form

$$L = \begin{pmatrix} L_{1,1} & 0 \\ L_{2,1} & I_{n_1} \end{pmatrix},$$

and every row and column of $A_1$ has at least 3 non-zero off-diagonal entries.

We will exploit the properties of this factorization stated in the following proposition.

Proposition 4.1 (Partial Cholesky Factorization). If B is an irreducible SDDM0-matrix then,

(a) $A_1$ is an irreducible SDDM0-matrix and is singular if and only if B is singular.

(b) If the graph of non-zero entries of B is planar, then the graph of non-zero entries of $A_1$ is as well.

(c) L has at most 3n non-zero entries.

(d) If B has $2(n-1+j)$ non-zero off-diagonal entries, then $A_1$ has dimension at most $2j-2$ and has at most $2(3j-3)$ non-zero off-diagonal entries.

(e) $B^\dagger = \Pi P^{-T} L^{-T} \begin{pmatrix} I_{n-n_1} & 0 \\ 0 & A_1^\dagger \end{pmatrix} L^{-1} P^{-1} \Pi$, where $\Pi$ is the projection onto the span of B.

Proof. It is routine to verify that $A_1$ is diagonally dominant with non-positive off-diagonal entries, and that planarity is preserved by elimination of rows and columns with 2 or 3 non-zero entries, as these correspond to vertices of degree 1 or 2 in the graph of non-zero entries. It is similarly routine to observe that these eliminations preserve irreducibility and singularity.

To bound the number of entries in L, we note that for each row and column with 1 non-zero off-diagonal entry that is eliminated, the corresponding column in L has 2 non-zero entries,


and that for each row and column with 2 non-zero off-diagonal entries that is eliminated, the

corresponding column in L has 3 non-zero entries.

To bound $n_1$, the dimension of $A_1$, first observe that the elimination of a row and column with 1 or 2 non-zero off-diagonal entries decreases both the dimension by 1 and the number of non-zero entries by 2. So, $A_1$ will have $2(n_1 - 1 + j)$ non-zero off-diagonal entries. As each row in $A_1$ has at least 3 non-zero off-diagonal entries, we have

$$2(n_1 - 1 + j) \ge 3 n_1,$$

which implies $n_1 \le 2j - 2$. The bound on the number of non-zero off-diagonal entries in $A_1$ follows immediately.

Finally, (e) may be proved by verifying that the formula given for $B^\dagger$ satisfies all the axioms of the pseudo-inverse (which we do in Appendix D).

We name the algorithm that performs this factorization PartialChol, and invoke it with

the syntax

(P,L,A1) = PartialChol(B).

We remark that PartialChol can be implemented to run in linear time.
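The following dense sketch (ours; it runs in cubic time rather than the linear time claimed for PartialChol) shows the elimination itself: repeatedly pick a row with at most 2 non-zero off-diagonal entries and take one Schur-complement step. Because Schur complements compose, the remaining block equals the Schur complement of B onto the surviving indices.

```python
import numpy as np

def partial_chol_dense(B, tol=1e-12):
    """Eliminate rows/columns with <= 2 non-zero off-diagonal entries from
    an irreducible SDDM0-matrix by repeated Schur-complement steps.
    Returns the elimination order, surviving indices, and remainder A1."""
    M = B.astype(float).copy()
    alive = list(range(M.shape[0]))
    order = []
    while len(alive) > 1:
        degs = [(sum(1 for j in alive if j != i and abs(M[i, j]) > tol), i)
                for i in alive]
        d, v = min(degs)
        if d > 2:               # every remaining row has >= 3 off-diagonal non-zeros
            break
        order.append(v)
        alive.remove(v)
        rest = np.array(alive)
        M[np.ix_(rest, rest)] -= np.outer(M[rest, v], M[v, rest]) / M[v, v]
    return order, alive, M[np.ix_(alive, alive)]

# A path Laplacian is eliminated all the way down to a single vertex.
n = 5
B = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
B[0, 0] = B[n - 1, n - 1] = 1.0
order, alive, A1 = partial_chol_dense(B)
```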

4.1 One-Level Algorithms

Before analyzing the algorithm in which we solve systems in $A_1$ recursively, we pause to examine

the complexity of an algorithm that applies a direct solver to systems in A1. While the results

in this subsection are not necessary for the main claims of our paper, we hope they will provide

intuition.

If we are willing to ignore numerical issues, we may apply the conjugate gradient algorithm

to directly solve systems in $A_1$ in $O(n_1 m_1)$ operations [TB97, Theorem 28.3], where $m_1$ is the number of non-zero entries in $A_1$. In the following theorem, we examine the performance of the

resulting algorithm.

Theorem 4.2 (General One-Level Algorithm). Let A be an irreducible n-by-n SDDM0-matrix with 2m non-zero off-diagonal entries. Let B be a $\sqrt{m}$-ultra-sparsifier of A. Let $(P, L, A_1)$ = PartialChol(B). Consider the algorithm that solves systems in A by applying PCG with B as a preconditioner, and solves each system in B by performing backward substitution on its partial Cholesky factor, solving the inner system in $A_1$ by conjugate gradient used as an exact solver, and performing forward substitution on its partial Cholesky factor. Then for every right-hand side b, after

$$O(m^{1/4} \log(1/\epsilon))$$

iterations, comprising

$$O(m^{5/4} \log^{2c_4} n \log(1/\epsilon))$$

arithmetic operations, the algorithm will output an approximate solution $\tilde{x}$ satisfying

$$\|\tilde{x} - A^\dagger b\|_A \le \epsilon \|A^\dagger b\|_A. \qquad (2)$$


Proof. As $\kappa_f(A,B) \le \sqrt{m}$, we may apply the standard analysis of PCG [Axe85] to show that (2) will be satisfied after $O(m^{1/4} \log(1/\epsilon))$ iterations. To bound the number of operations in each iteration, note that B has at most $2(n-1) + O(\sqrt{m} \log^{c_4} n)$ non-zero off-diagonal entries. So, Proposition 4.1 implies $m_1$ and $n_1$ are both $O(\sqrt{m} \log^{c_4} n)$. Thus, the time required to solve each inner system in $A_1$ is at most $O(m_1 n_1) = O(m \log^{2c_4} n)$. As A is irreducible, $m \ge n - 1$, so this bounds the number of operations that must be performed in each iteration.
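The outer iteration of Theorem 4.2 is ordinary PCG with the preconditioner applied as a black-box solve. The sketch below is a textbook PCG, not the paper's code; the `solve_B` callback stands in for the backward-substitution / inner-solve / forward-substitution step, replaced here by a simple Jacobi (diagonal) solve for illustration.

```python
import numpy as np

def pcg(A, b, solve_B, tol=1e-10, maxit=500):
    """Preconditioned Conjugate Gradient for symmetric positive definite A.
    solve_B(r) applies the preconditioner, i.e. approximately solves B z = r."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = solve_B(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = solve_B(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Demo: an SDDM0 system with a diagonal preconditioner standing in for
# the ultra-sparsifier solve.
n = 50
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d = np.diag(A)
x = pcg(A, b, lambda r: r / d)
```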

If m is much greater than n, we could speed up this algorithm by first applying Sparsify2 to compute a very good sparse preconditioner $A_s$ for A, using the one-level algorithm to solve systems in $A_s$, and then applying this solver to A by iterative refinement.

When the graph of non-zero entries of A is planar, we may precondition using the algorithm UltraSimple, presented in Section 9, instead of UltraSparsify. As the matrix $A_1$ produced by applying partial Cholesky factorization to the output of UltraSimple is also planar, we can solve the linear systems in $A_1$ by the generalized nested dissection algorithm of Lipton, Rose and Tarjan [LRT79]. This algorithm uses graph separators to choose a good order for Cholesky factorization. The Cholesky factorization is then computed in time $O(n_1^{3/2})$. The resulting Cholesky factors only have $O(n_1 \log n_1)$ non-zero entries, and so each linear system in $A_1$ may be solved in time $O(n_1 \log n_1)$, after the Cholesky factors have been computed.

Theorem 4.3 (Planar One-Level Algorithm). Let A be an n-by-n planar SDDM0-matrix with m non-zero entries. Consider the algorithm that solves linear systems in A by using PCG with the preconditioner

$$B = \mathrm{UltraSimple}(A,\, n^{3/4} \log^{1/3} n),$$

and solves systems in B by applying PartialChol to factor B into $P L [I, 0; 0, A_1] L^T P^T$, and uses generalized nested dissection to solve systems in $A_1$. For every right-hand side b, this algorithm computes an $\tilde{x}$ satisfying

$$\|\tilde{x} - A^\dagger b\|_A \le \epsilon \|A^\dagger b\|_A \qquad (3)$$

in time

$$O\big(n^{9/8} \log^{1/2} n \log(1/\epsilon)\big).$$

Proof. First, recall that the planarity of A implies $m \le 3n$. Thus, the time taken by UltraSimple is dominated by the time taken by LowStretch, which is $O(n \log^2 n)$.

By Theorem 9.1 and Theorem 9.5, the matrix B has at most $2(n-1) + 6 n^{3/4} \log^{1/3} n$ non-zero off-diagonal entries and

$$\kappa_f(A,B) = O\big(n^{1/4} \log^{2/3} n \log^2 \log n\big) \le O\big(n^{1/4} \log n\big).$$

Again, standard analysis of PCG [Axe85] tells us that the algorithm will require at most

$$O\big(n^{1/8} \log^{1/2} n \log(1/\epsilon)\big)$$

iterations to guarantee that (3) is satisfied.

By Proposition 4.1, the dimension of $A_1$, $n_1$, is at most $6 n^{3/4} \log^{1/3} n$. Before beginning to solve the linear system, the algorithm will spend

$$O(n_1^{3/2}) = O\big((n^{3/4} \log^{1/3} n)^{3/2}\big) = O\big(n^{9/8} \log^{1/2} n\big)$$

time using generalized nested dissection [LRT79] to permute and Cholesky factor the matrix $A_1$. As the factors obtained will have at most $O(n_1 \log n_1) \le O(n)$ non-zeros, each iteration of the PCG will require at most O(n) steps. So, the total complexity of the application of the PCG will be

$$O\big(n \cdot n^{1/8} \log^{1/2} n \log(1/\epsilon)\big) = O\big(n^{9/8} \log^{1/2} n \log(1/\epsilon)\big),$$

which dominates the time required to compute the Cholesky factors and the time of the call to UltraSimple.

5 The Recursive Solver

In our recursive algorithm for solving linear equations, we solve linear equations in a matrix A

by computing an ultra-sparsifier B, using partial Cholesky factorization to reduce it to a matrix

$A_1$, and then solving the system in $A_1$ recursively. Of course, we compute all of the necessary

ultra-sparsifiers and Cholesky factorizations just once at the beginning of the algorithm.

To specify the recursive algorithm for an n-by-n matrix, we first set the parameters

$$\chi = c_3 \log^{c_4} n, \qquad (4)$$

and

$$k = (14\chi + 1)^2, \qquad (5)$$

where we recall that $c_3$ and $c_4$ are determined by the quality of the ultra-sparsifiers we can compute (see equation (1)), and were used to define a k-ultra-sparsifier.

We use the following algorithm BuildPreconditioners to build the sequence of preconditioners and Cholesky factors. In Section 10, we define the routine UltraSparsify for weighted graphs, and thus implicitly for Laplacian matrices. For general irreducible SDDM0-matrices A, we express A as a sum of matrices $A_L$ and $A_D$ as explained in Proposition 3.4, and return $A_D$ plus the ultra-sparsifier of the Laplacian matrix $A_L$.

BuildPreconditioners(A_0)

1. i = 0.

2. Repeat
   (a) i = i + 1.
   (b) B_i = UltraSparsify(A_{i−1}, k).
   (c) (P_i, L_i, A_i) = partialChol(B_i).
   (d) Set Π_i to be the projection onto the span of B_i.
   Until A_i has dimension less than 66χ + 6.

3. Set ℓ = i.

4. Compute Z_ℓ = A_ℓ†.

We now make a few observations about the sequence of matrices this algorithm generates. In the following, we let noff(A) denote the number of non-zero off-diagonal entries in the upper-triangular portion of A, and let dim(A) denote the dimension of A.

Proposition 5.1 (Recursive Preconditioning). If A_0 is a symmetric, irreducible, SDDM0-matrix, and for each i the matrix B_i is a k-ultra-sparsifier of A_{i−1}, then

(a) For i ≥ 1, noff(A_i) ≤ (3χ/k) noff(A_{i−1}).

(b) For i ≥ 1, dim(A_i) ≤ (2χ/k) noff(A_{i−1}).

(c) For i ≥ 1, dim(B_i) = dim(A_{i−1}).

(d) Each of B_i and A_i is an irreducible SDDM0-matrix.

(e) Each A_i and B_i is a Laplacian matrix if and only if A_0 is as well.

(f) If A_0 is a Laplacian matrix, then each Π_i is a projection orthogonal to the all-1's vector. Otherwise, each Π_i is the identity.

Proof. Let n_i be the dimension of A_i. Definition 1.1 tells us that

    noff(B_i) ≤ n_{i−1} − 1 + h·noff(A_{i−1})/k = n_{i−1} − 1 + noff(A_{i−1})χ/k.

Parts (a), (b), (d) and (e) now follow from Proposition 4.1. Part (c) is obvious, and part (f) follows from Proposition 3.6.

Our recursive solver will use each matrix B_i as a preconditioner for A_{i−1}. But rather than solve systems in B_i directly, it will reduce these to systems in A_i, which will in turn be solved recursively. Our solver will use the preconditioned Chebyshev method, instead of the preconditioned conjugate gradient. This choice is dictated by the requirements of our analysis rather than by common sense. Our preconditioned Chebyshev method will not take the preconditioner B_i as input. Rather, it will take a subroutine solveB_i that produces approximate solutions to systems in B_i. So that we can guarantee that our solvers will be linear operators, we will fix the number of iterations that each will perform, as opposed to allowing them to terminate upon finding a sufficiently good solution. While this trick is necessary for our analysis, it may also be unnecessary in practice.¹

For concreteness, we present pseudocode for the variant of the preconditioned Chebyshev algorithm that we will use. It is a modification of the pseudocode presented in [BBC+94, page 36], the difference being that it takes as input a parameter t determining the number of iterations it executes (and some variable names have been changed).

x = precondCheby(A, b, t, f(·), λmin, λmax)

(0) Set x = 0.
(1) r = b
(2) d = (λmax + λmin)/2, c = (λmax − λmin)/2
(3) for i = 1,...,t,
    (a) z = f(r)
    (b) if i = 1,
            p = z
            α = 2/d
        else,
            β = (cα/2)²
            α = 1/(d − β)
            p = z + βp
    (c) x = x + αp
    (d) r = b − Ax
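The fixed-iteration scheme above can be transcribed into a short NumPy sketch. This is a hypothetical transcription, not the paper's code: the pseudocode's coefficients are used verbatim, and `precond` plays the role of the subroutine f(·). Because the number of iterations t is fixed, the map b → x is a fixed linear operator, which is the point of this variant.

```python
import numpy as np

def precond_cheby(A, b, t, precond, lam_min, lam_max):
    """Fixed-iteration preconditioned Chebyshev: runs exactly t steps,
    so that b -> x is a fixed linear operator (no early termination)."""
    x = np.zeros_like(b)
    r = b.copy()
    d = (lam_max + lam_min) / 2.0
    c = (lam_max - lam_min) / 2.0
    for i in range(1, t + 1):
        z = precond(r)              # apply the preconditioning routine f(.)
        if i == 1:
            p = z
            alpha = 2.0 / d
        else:
            beta = (c * alpha / 2.0) ** 2
            alpha = 1.0 / (d - beta)
            p = z + beta * p
        x = x + alpha * p
        r = b - A @ x
    return x

# Toy usage: a 2-by-2 SPD system with the identity as "preconditioner",
# so lam_min and lam_max are simply the extreme eigenvalues of A.
A = np.diag([1.0, 2.0])
x = precond_cheby(A, np.array([1.0, 1.0]), 50, lambda r: r, 1.0, 2.0)
```

With the identity preconditioner this reduces to plain Chebyshev iteration on A, and the iterate converges to A⁻¹b.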

Proposition 5.2 (Linear Chebyshev). Let A be a positive semi-definite matrix and f be a positive semi-definite, symmetric linear operator such that

    λmin f† ⪯ A ⪯ λmax f†.    (6)

Let ǫ < 1 and let

    t ≥ ⌈ (1/2) √(λmax/λmin) ln(2/ǫ) ⌉.    (7)

Then, the function precondCheby(A,b,t,f,λmin,λmax) is a symmetric linear operator in b with the same nullspace as A. Moreover, if Z is the matrix realizing this operator, then

    (1 − ǫ) Z† ⪯ A ⪯ (1 + ǫ) Z†.

¹One could obtain a slightly weaker analysis of this algorithm if one instead allowed the Chebyshev solvers to terminate as soon as they found a sufficiently accurate solution. In an early version of this paper, we analyzed such an algorithm using the analysis of the inexact preconditioned Chebyshev iteration by Golub and Overton [GO88]. This analysis was improved by applying a slight extension by Joshi [Jos97] of Golub and Overton's analysis. The idea of bypassing these analyses by forcing our solvers to be linear operators was suggested to us by Vladimir Rokhlin.

Proof. By Proposition 3.1, condition (6) implies that f and A have the same nullspace. An inspection of the pseudo-code reveals that the function computed by precondCheby can be expressed as a sum of monomials of the form f(Af)^i, from which it follows that this function is a symmetric linear operator having the same nullspace as A. Let Z be the matrix realizing this operator.

Standard analyses of the preconditioned Chebyshev algorithm [Axe85, Section 5.3] imply that for all b in the range of A,

    ‖Zb − A†b‖_A ≤ ǫ ‖A†b‖_A.

Now, consider any non-zero eigenvalue λ and eigenvector b of AZ, so that

    AZb = λb.

Multiplying on the left by A† and using the fact that Z and A have the same nullspace, we obtain

    Zb = λ A†b.

Plugging this into the previous inequality, we find

    ǫ ‖A†b‖_A ≥ ‖Zb − A†b‖_A = |λ − 1| ‖A†b‖_A,

and so λ must lie between 1 − ǫ and 1 + ǫ. Applying Proposition 3.3, we obtain

    (1 − ǫ) Z† ⪯ A ⪯ (1 + ǫ) Z†.

We can now state the subroutine solveB_i for i = 1,...,ℓ.

x = solveB_i(b)

1. Set λmin = 1 − 2e−2, λmax = (1 + 2e−2)k and t = ⌈1.33√k⌉.

2. Set s = L_i^{−1} P_i^{−1} Π_i b.

3. Write s = (s_0; s_1), where the dimension of s_1 is the size of A_i.

4. Set y_0 = s_0, and
   (a) if i = ℓ, set y_1 = Z_ℓ s_1;
   (b) else, set y_1 = precondCheby(A_i, s_1, solveB_{i+1}, t, λmin, λmax).

5. Set x = Π_i P_i^{−T} L_i^{−T} (y_0; y_1).

We have chosen the parameters λmin, λmax, and t so that inequality (7) holds for ǫ = 2e−2.
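This parameter choice can be verified by direct arithmetic: with ǫ = 2e−2 we have ln(2/ǫ) = ln(e²) = 2, so the right-hand side of (7) is (1/2)√((1 + 2e−2)/(1 − 2e−2)) · 2 · √k ≈ 1.3201√k, which is below 1.33√k. A one-line check (an illustration of the arithmetic, not part of the algorithm):

```python
import math

eps = 2 * math.e ** -2                 # epsilon = 2e^{-2}
lam_min = 1 - eps                      # lambda_min used in solveB_i
lam_max_over_k = 1 + eps               # lambda_max = (1 + 2e^{-2}) k

# Right-hand side of inequality (7), divided by sqrt(k):
rhs_over_sqrt_k = 0.5 * math.sqrt(lam_max_over_k / lam_min) * math.log(2 / eps)
```

Since `rhs_over_sqrt_k` is about 1.3201, taking t = ⌈1.33√k⌉ satisfies (7).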

We note that we apply L_i^{−T} and L_i^{−1} by forward and backward substitution, rather than by constructing the inverses. Similarly, Π_i may be applied in time proportional to the length of b as it is either the identity, or the operator that orthogonalizes with respect to the all-1's vector. We remark that the multiplications by Π_i are actually unnecessary in our code, as solveB_i will only appear inside a call to precondCheby, in which case it is multiplied on either side by matrices that implicitly contain Π_i. However, our analysis is simpler if we include these applications of Π_i.

Lemma 5.3 (Correctness of solveB_i). If A is an irreducible SDDM0-matrix and B_i ⪯ A_{i−1} ⪯ kB_i for all i ≥ 1, then for 1 ≤ i ≤ ℓ,

(a) The function solveB_i is a symmetric linear operator.

(b) The function precondCheby(A_{i−1}, b, solveB_i, t, λmin, λmax) is a symmetric linear operator in b.

(c) (1 − 2e−2) Z_i† ⪯ A_i ⪯ (1 + 2e−2) Z_i†, where for i ≤ ℓ − 1, Z_i is the matrix such that

    Z_i s_1 = precondCheby(A_i, s_1, solveB_{i+1}, t, λmin, λmax).

(d) (1 − 2e−2) solveB_i† ⪯ B_i ⪯ (1 + 2e−2) solveB_i†.

Proof. We first prove (a) and (b) by reverse induction on i. The base case of our induction is when i = ℓ, in which case BuildPreconditioners sets Z_ℓ = A_ℓ†, and so

    solveB_ℓ = Π_ℓ P_ℓ^{−T} L_ℓ^{−T} [ I, 0 ; 0, Z_ℓ ] L_ℓ^{−1} P_ℓ^{−1} Π_ℓ,

which is obviously a symmetric linear operator. Given that solveB_i is a symmetric linear operator, part (b) for A_{i−1} follows from Proposition 5.2. Given that (b) holds for A_i and that the call to precondCheby is realized by a symmetric matrix Z_i, we then have that

    solveB_i = Π_i P_i^{−T} L_i^{−T} [ I, 0 ; 0, Z_i ] L_i^{−1} P_i^{−1} Π_i

is a symmetric linear operator. We may thereby establish that (a) and (b) hold for all 1 ≤ i ≤ ℓ.

We now prove properties (c) and (d), again by reverse induction. By construction Z_ℓ = A_ℓ†, so (c) holds for i = ℓ. To see that if (c) holds for i, then (d) does also, note that

    (1 − 2e−2) Z_i† ⪯ A_i

implies

    (1 − 2e−2) A_i† ⪯ Z_i,

by Proposition 3.2, which implies

    (1 − 2e−2) [ I, 0 ; 0, A_i† ] ⪯ [ I, 0 ; 0, Z_i ],

which implies

    (1 − 2e−2) B_i† = (1 − 2e−2) Π_i P_i^{−T} L_i^{−T} [ I, 0 ; 0, A_i† ] L_i^{−1} P_i^{−1} Π_i    (by Proposition 4.1 (e))
                    ⪯ Π_i P_i^{−T} L_i^{−T} [ I, 0 ; 0, Z_i ] L_i^{−1} P_i^{−1} Π_i = solveB_i,

which by Proposition 3.2 implies (1 − 2e−2) solveB_i† ⪯ B_i. The inequality B_i ⪯ (1 + 2e−2) solveB_i† may be established similarly.

To show that when (d) holds for i then (c) holds for i − 1, note that (d) and B_i ⪯ A_{i−1} ⪯ k·B_i imply

    (1 − 2e−2) solveB_i† ⪯ A_{i−1} ⪯ k(1 + 2e−2) solveB_i†.

So, (c) for i − 1 now follows from Proposition 5.2 and the fact that λmin, λmax and t have been chosen so that inequality (7) is satisfied with ǫ = 2e−2.

Lemma 5.4 (Complexity of solveB_i). If A_0 is an irreducible, n-by-n SDDM0-matrix with 2m non-zero off-diagonal entries and each B_i is a k-ultra-sparsifier of A_{i−1}, then solveB_1 runs in time

    O(n + m).

Proof. Let T_i denote the running time of solveB_i. We will prove by reverse induction on i that there exists a constant c such that

    T_i ≤ c (dim(B_i) + (γχ + δ)(noff(A_i) + dim(A_i))),    (8)

where

    γ = 196 and δ = 15.

This will prove the lemma as dim(B_1) = dim(A_0) = n, and Proposition 5.1 implies

    (γχ + δ)(noff(A_i) + dim(A_i)) ≤ (γχ + δ) · 5χm/k ≤ m (5γχ² + 5δχ)/(14χ + 1)² = O(m).

To prove (8), we note that there exists a constant c so that steps 2 and 5 take time at most c·dim(B_i) (by Proposition 4.1), step 4a takes time at most c·dim(A_ℓ)², and step 4b takes time at most t(c·dim(A_i) + c·noff(A_i) + T_{i+1}), where t is as defined in step 1 of solveB_i.

The base case of our induction will be i = ℓ, in which case the preceding analysis implies

    T_ℓ ≤ c(dim(B_ℓ) + dim(A_ℓ)²)    (by step 2 of BuildPreconditioners)
        ≤ c(dim(B_ℓ) + (66χ + 6) dim(A_ℓ)),

which satisfies (8). We now prove (8) is true for i < ℓ, assuming it is true for i + 1. We have

    T_i ≤ c·dim(B_i) + t(c·dim(A_i) + c·noff(A_i) + T_{i+1})
        ≤ c(dim(B_i) + t(dim(A_i) + noff(A_i) + dim(B_{i+1}) + (γχ + δ)(noff(A_{i+1}) + dim(A_{i+1}))))    (by the induction hypothesis)
        ≤ c(dim(B_i) + t(2 dim(A_i) + noff(A_i) + (γχ + δ)(5 noff(A_i)χ/k)))    (by Proposition 5.1)
        ≤ c[dim(B_i) + t(2 dim(A_i) + 6 noff(A_i))],

as γχ² + δχ ≤ k. As

    6t ≤ 6 · (1.33(14χ + 1) + 1) ≤ γχ + δ,

we have proved that (8) is true for i as well.

We now state and analyze our ultimate solver.

x = solve(A, b, ǫ)

1. Set λmin = 1 − 2e−2, λmax = (1 + 2e−2)k and t = ⌈0.67√k ln(2/ǫ)⌉.

2. Run BuildPreconditioners(A).

3. x = precondCheby(A, b, solveB_1, t, λmin, λmax).

Theorem 5.5 (Nearly Linear-Time Solver). On input an irreducible n-by-n SDDM0-matrix A with 2m non-zero off-diagonal entries and an n-vector b, with probability at least 1 − 1/500, solve(A,b,ǫ) runs in time

    O(m log^{c4} m log(1/ǫ)) + m log^{O(1)} m

and produces an x̃ satisfying

    ‖x̃ − A†b‖_A ≤ ǫ ‖A†b‖_A.

Proof. By Proposition 5.1, the numbers noff(A_i) are geometrically decreasing, and ℓ ≤ log_{k/3χ} m. So we may use Theorem 10.5 to show that the time required to build the preconditioners is at most m log^{O(1)} m. If each B_i is a k-ultra-sparsifier of A_{i−1}, then the bound on the A-norm of the output follows by an analysis similar to that used to prove Lemma 5.3. In this case, we may use Lemma 5.4 to bound the running time of step 3 by

    O(mt) = O(m √k log(1/ǫ)) = O(m log^{c4} n log(1/ǫ)).

The probability that there is some B_i that is not a k-ultra-sparsifier of A_{i−1} is at most

    Σ_i 1/(2 dim(B_i)) ≤ ℓ/(2(66χ + 6)) ≤ log_{k/3χ} m / (2(66χ + 6)) < 1/500,

assuming c3, c4 ≥ 1.

If the non-zero structure of A is planar, then by Theorem 9.5, we can replace all the calls to UltraSparsify in the above algorithm with calls to UltraSimple. By Theorem 9.1, this is like having (k,h)-ultra-sparsifiers with h = O(log n log² log n). Thus, the same analysis goes through with χ = O(log n log² log n), and the resulting linear system solver runs in time

    O(n log² n + n log n log² log n log(1/ǫ)).

We remark that our analysis is very loose when m is much larger than n. In this case, the first ultra-sparsifier constructed, B_1, will probably have close to n edges, which could be much lower than the bound proved in Proposition 5.1. While it is not necessary for the proof of our theorem, one could remove this slack by setting B_1 = Sparsify(A_0, 1/2n) in this case.

6 Computing Approximate Fiedler Vectors

Fiedler [Fie73] was the first to recognize that the eigenvector associated with the second-smallest

eigenvalue of the Laplacian matrix of a graph could be used to partition a graph. From a result

of Mihail [Mih89], we know that any vector whose Rayleigh quotient is close to this eigenvalue

can also be used to find a good partition. We call such a vector an approximate Fiedler vector.

Definition 6.1 (Approximate Fiedler Vector). For a Laplacian matrix A, v is an ǫ-approximate Fiedler vector if v is orthogonal to the all-1's vector and

    vᵀAv / vᵀv ≤ (1 + ǫ) λ2(A),

where λ2(A) is the second-smallest eigenvalue of A.

Our linear system solvers may be used to quickly compute ǫ-approximate Fiedler vectors.

We will prove that the following algorithm does so with probability at least 1 − p.

v = ApproxFiedler(A, ǫ, p)

1. Set λmin = 1 − 2e−2, λmax = (1 + 2e−2)k and t = ⌈0.67√k ln(8/ǫ)⌉.

2. Set k = 8 ln(18(n − 1)/ǫ)/ǫ.

3. For a = 1,...,⌈log2 p⌉:
   (a) Run BuildPreconditioners(A).
   (b) Choose r_0 to be a random unit vector orthogonal to the all-1's vector.
   (c) For b = 1,...,k:
       r_b = precondCheby(A, r_{b−1}, solveB_1, t, λmin, λmax).
   (d) Set v_a = r_k.

4. Let a_0 be the index of the vector minimizing v_aᵀ A v_a / v_aᵀ v_a.

5. Set v = v_{a_0}.
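The core of ApproxFiedler — repeatedly applying an approximate inverse of A to a random vector orthogonal to the all-1's vector, then reading off the Rayleigh quotient — can be illustrated with a dense sketch. This is hypothetical illustration code, not the paper's algorithm: `np.linalg.pinv` stands in for the fast recursive solver, so it demonstrates the iteration structure but not the stated running time.

```python
import numpy as np

def approx_fiedler_sketch(L, iters=50, seed=0):
    """Approximate the Fiedler vector of a Laplacian L by inverse power
    iteration, with the pseudo-inverse standing in for the fast solver."""
    n = L.shape[0]
    rng = np.random.default_rng(seed)
    Z = np.linalg.pinv(L)          # stand-in for precondCheby/solveB_1
    r = rng.standard_normal(n)
    r -= r.mean()                  # project orthogonal to the all-1's vector
    r /= np.linalg.norm(r)
    for _ in range(iters):
        r = Z @ r                  # amplify the component along u_2
        r -= r.mean()              # stay orthogonal to the nullspace
        r /= np.linalg.norm(r)
    return r

# Path graph on 4 vertices: Laplacian of edges (0,1), (1,2), (2,3).
L = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  1]], dtype=float)
v = approx_fiedler_sketch(L)
rayleigh = v @ L @ v / (v @ v)
```

For this path graph, λ2 = 2 − √2, and the Rayleigh quotient of the returned vector converges to it geometrically at rate λ2/λ3 per iteration.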

Theorem 6.2. On input a Laplacian matrix A with m non-zero entries and ǫ, p > 0, with probability at least 1 − 1/p, ApproxFiedler(A,ǫ,p) computes an ǫ-approximate Fiedler vector of A in time

    m log^{O(1)} m log(1/p) log(1/ǫ)/ǫ.

Our proof of Theorem 6.2 will use the following proposition.

Proposition 6.3. If Z is a matrix such that

    (1 − ǫ) Z† ⪯ A ⪯ (1 + ǫ) Z†,

and v is a vector such that vᵀZ†v/vᵀv ≤ (1 + ǫ) λ2(Z†), for some ǫ ≤ 1/5, then v is a 4ǫ-approximate Fiedler vector of A.

Proof. We first observe that

    λ2(Z†) ≤ λ2(A)/(1 − ǫ).

We then compute

    vᵀAv/vᵀv ≤ (1 + ǫ) vᵀZ†v/vᵀv
             ≤ (1 + ǫ)(1 + ǫ) λ2(Z†)
             ≤ (1 + ǫ)(1 + ǫ) λ2(A)/(1 − ǫ)
             ≤ (1 + 4ǫ) λ2(A),

for ǫ ≤ 1/5.

Proof of Theorem 6.2. As we did in the proofs of Lemma 5.3 and Theorem 5.5, we can show that precondCheby(A,b,solveB_1,t,λmin,λmax) is a linear operator in b. Let Z denote the matrix realizing this operator. As in the proof of Lemma 5.3, we can show that (1 − ǫ/4) Z† ⪯ A ⪯ (1 + ǫ/4) Z†.

By Proposition 6.3, it suffices to show that with probability at least 1/2 each vector v_a satisfies

    v_aᵀ Z† v_a / v_aᵀ v_a ≤ (1 + ǫ/4) λ2(Z†).

To this end, let 0 = µ_1 ≤ µ_2 ≤ ··· ≤ µ_n be the eigenvalues of Z†, and let 1 = u_1,...,u_n be corresponding eigenvectors. Let

    r_0 = Σ_{i≥2} α_i u_i,

and recall that (see e.g. [SST06, Lemma B.1])

    Pr[ |α_2| ≥ 2/(3√(n − 1)) ] ≥ (2/√(2π)) ∫_{2/3}^∞ e^{−t²/2} dt ≥ 0.504.

Thus, with probability at least 1/2, the call to BuildPreconditioners succeeds and |α_2| ≥ 2/(3√(n − 1)). In this case,

    k ≥ 8 ln(8/(α_2² ǫ))/ǫ.    (9)

We now show that this inequality implies that r_k satisfies

    (r_k)ᵀ Z† r_k / (r_k)ᵀ r_k ≤ (1 + ǫ/4) µ_2.

To see this, let j be the greatest index such that µ_j ≤ (1 + ǫ/8) µ_2, and compute

    r_k = Z^k r_0 = Σ_{i≥2} (α_i/µ_i^k) u_i,

so

    (r_k)ᵀ Z† r_k / (r_k)ᵀ r_k
      = (Σ_{i≥2} α_i²/µ_i^{2k−1}) / (Σ_{i≥2} α_i²/µ_i^{2k})
      ≤ (Σ_{j≥i≥2} α_i²/µ_i^{2k−1}) / (Σ_{j≥i≥2} α_i²/µ_i^{2k}) + (Σ_{i>j} α_i²/µ_i^{2k−1}) / (Σ_{i≥2} α_i²/µ_i^{2k})
      ≤ µ_j + (Σ_{i>j} α_i²/µ_i^{2k−1}) / (α_2²/µ_2^{2k})
      ≤ (1 + ǫ/8) µ_2 + µ_2 (Σ_{i>j} α_i² (µ_2/µ_i)^{2k−1}) / α_2²
      ≤ (1 + ǫ/8) µ_2 + µ_2 (Σ_{i>j} α_i² (1/(1 + ǫ/8))^{2k−1}) / α_2²
      ≤ (1 + ǫ/8) µ_2 + µ_2 Σ_{i>j} α_i² ǫ/8    (by inequality (9))
      ≤ (1 + ǫ/8) µ_2 + µ_2 (ǫ/8)
      ≤ (1 + ǫ/4) µ_2.

7 Laplacians and Weighted Graphs

We will find it convenient to describe and analyze our preconditioners for Laplacian matrices in terms of weighted graphs. This is possible because of the isomorphism between Laplacian matrices and weighted graphs. To an n-by-n Laplacian matrix A, we associate the graph with vertex set {1,...,n} having an edge between vertices u and v of weight −A(u,v) for each u and v such that A(u,v) is non-zero.

All the graphs we consider in this paper will be weighted. If u and v are distinct vertices in a graph, we write ((u,v)) to denote an edge between u and v of weight 1. Similarly, if w > 0, then we write w((u,v)) to denote an edge between u and v of weight w. A weighted graph is then a pair G = (V,E) where V is a set of vertices and E is a set of weighted edges on V, each of

which spans a distinct pair of vertices. The Laplacian matrix L_G of the graph G is the matrix such that

    L_G(u,v) = −w,                  if there is an edge w((u,v)) ∈ E;
    L_G(u,v) = 0,                   if u ≠ v and there is no edge between u and v in E;
    L_G(u,u) = Σ_{w((u,x))∈E} w.

We recall that for every vector x ∈ ℝⁿ,

    xᵀ L_G x = Σ_{w((u,v))∈E} w (x_u − x_v)².
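This correspondence can be checked directly: building L_G edge by edge and comparing xᵀL_Gx against Σ w(x_u − x_v)². A minimal sketch with arbitrarily chosen weights (illustration only):

```python
import numpy as np

def laplacian(n, edges):
    """Build the Laplacian L_G from a list of weighted edges (u, v, w)."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w   # diagonal: weighted degree
        L[u, v] -= w; L[v, u] -= w   # off-diagonal: -w for edge w((u,v))
    return L

edges = [(0, 1, 2.0), (1, 2, 0.5), (0, 2, 1.0)]   # a weighted triangle
L = laplacian(3, edges)
x = np.array([1.0, -2.0, 0.5])
quad = x @ L @ x
direct = sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)
```

The two quantities agree, and every row of L sums to zero, reflecting that the all-1's vector is in the nullspace.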

For graphs G and H, we define the graph G + H to be the graph whose Laplacian matrix is L_G + L_H.

8 Graphic Inequalities, Resistance, and Low-Stretch Spanning Trees

In this section, we introduce the machinery of “graphic inequalities” that underlies the proofs in

the rest of the paper. We then introduce low-stretch spanning trees, and use graphic inequalities

to bound how well a low-stretch spanning tree preconditions a graph. This proof provides the

motivation for the construction in the next section.

We begin by overloading the notation ⪯ by writing

    G ⪯ H    and    E ⪯ F

if G = (V,E) and H = (V,F) are two graphs such that their Laplacian matrices, L_G and L_H, satisfy

    L_G ⪯ L_H.

Many facts that have been used in the chain of work related to this paper can be simply expressed with this notation. For example, the Splitting Lemma of [BGH+06] becomes

    A_1 ⪯ B_1 and A_2 ⪯ B_2    implies    A_1 + A_2 ⪯ B_1 + B_2.

We also observe that if B is a subgraph of A, then

    B ⪯ A.

We define the resistance of an edge to be the reciprocal of its weight. Similarly, we define the resistance of a simple path to be the sum of the resistances of its edges. For example, the resistance of the path w_1((1,2)), w_2((2,3)), w_3((3,4)) is (1/w_1 + 1/w_2 + 1/w_3). Of course, the resistance of a trivial path with one vertex and no edges is zero. If one multiplies all the weights of the edges in a path by α, its resistance decreases by a factor of α.

The next lemma says that a path of resistance r supports an edge of resistance r. This lemma may be derived from the Rank-One Support Lemma of [BH03b], and appears in simpler form as the Congestion-Dilation Lemma of [BGH+06] and Lemma 4.6 of [Gre96].

Lemma 8.1 (Path Inequality). Let e = w((u,v)) and let P be a path from u to v. Then,

    e ⪯ w · resistance(P) · P.

Proof. After dividing both sides by w, it suffices to consider the case w = 1. Without loss of generality, we may assume that e = ((1,k+1)) and that P consists of the edges w_i((i,i+1)) for 1 ≤ i ≤ k. In this notation, the lemma is equivalent to

    ((1,k+1)) ⪯ (Σ_i 1/w_i) · (w_1((1,2)) + w_2((2,3)) + ··· + w_k((k,k+1))).

We prove this for the case k = 2. The general case follows by induction.

Recall Cauchy's inequality, which says that for all 0 < α < 1,

    (a + b)² ≤ a²/α + b²/(1 − α).

For k = 2, the lemma is equivalent to

    (x_1 − x_3)² ≤ (1 + w_1/w_2)(x_1 − x_2)² + (1 + w_2/w_1)(x_2 − x_3)²,

which follows from Cauchy's inequality with α = w_2/(w_1 + w_2).
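Lemma 8.1 can be checked numerically on a small instance: the inequality e ⪯ w·resistance(P)·P says that the scaled path Laplacian minus the edge Laplacian is positive semi-definite. A sketch with arbitrarily chosen weights (illustration only, not part of the paper's algorithms):

```python
import numpy as np

def edge_laplacian(n, u, v, w):
    """Laplacian of the single weighted edge w((u,v)) on n vertices."""
    L = np.zeros((n, n))
    L[u, u] += w; L[v, v] += w
    L[u, v] -= w; L[v, u] -= w
    return L

n = 4
# Path P: w1((0,1)), w2((1,2)), w3((2,3)) with arbitrary weights.
weights = [2.0, 1.0, 4.0]
L_path = sum(edge_laplacian(n, i, i + 1, wi) for i, wi in enumerate(weights))
# Edge e = w((0,3)) across the ends of the path.
w = 3.0
L_edge = edge_laplacian(n, 0, 3, w)

resistance_P = sum(1.0 / wi for wi in weights)   # 1/w1 + 1/w2 + 1/w3
diff = w * resistance_P * L_path - L_edge        # PSD by Lemma 8.1
```

The smallest eigenvalue of `diff` is non-negative (up to rounding), as the lemma guarantees.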

Recall that a spanning tree of a weighted graph G = (V,E) is a connected subgraph of G

with exactly |V |−1 edges. The weights of edges that appear in a spanning tree are assumed to

be the same as in G. If T is a spanning tree of a graph G = (V,E), then for every pair of vertices

u,v ∈ V , T contains a unique path from u to v. We let T(u,v) denote this path. We now use

graphic inequalities to derive a bound on how well T preconditions G. This bound strengthens

a bound of Boman and Hendrickson [BH03b, Lemma 4.9].

Lemma 8.2. Let G = (V,E) be a graph and let T be a spanning tree of G. Then,

    T ⪯ G ⪯ (Σ_{e∈E} resistance(T(e))/resistance(e)) · T.

Proof. As T is a subgraph of G, T ⪯ G is immediate. To prove the right-hand inequality, we compute

    E = Σ_{e∈E} e
      ⪯ Σ_{e∈E} (resistance(T(e))/resistance(e)) · T(e),    by Lemma 8.1
      ⪯ Σ_{e∈E} (resistance(T(e))/resistance(e)) · T,       as T(e) ⪯ T.

Definition 8.3. Given a tree T spanning a set of vertices V and a weighted edge e = w((u,v)) with u,v ∈ V, we define the stretch of e with respect to T to be

    st_T(e) = resistance(T(e))/resistance(e) = w · resistance(T(e)).

If E is a set of edges on V, then we define

    st_T(E) = Σ_{e∈E} st_T(e).

With this definition, the statement of Lemma 8.2 may be simplified to

    T ⪯ G ⪯ st_T(E) · T.    (10)

We will often use the following related inequality, which follows immediately from Lemma 8.1 and the definition of stretch:

    w((u,v)) ⪯ st_T(w((u,v))) T(u,v) = w st_T(((u,v))) T(u,v),    (11)

where we recall that T(u,v) is the unique path in T from u to v.
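The stretch quantities, and the sandwich T ⪯ G ⪯ st_T(E)·T of inequality (10), can be illustrated on a tiny graph. The sketch below is hypothetical illustration code with a hand-picked tree; the `stretch` helper exploits the fact that this particular tree is the sorted path 0-1-2-3, so the tree path between u < v consists exactly of the tree edges with both endpoints in [u, v].

```python
import numpy as np

def laplacian(n, edges):
    """edges: list of (u, v, w) for weighted edges w((u,v))."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

n = 4
tree = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0)]   # spanning tree T (a path)
extra = [(0, 3, 0.5)]                            # one non-tree edge
E = tree + extra

def stretch(e):
    u, v, w = e
    # Resistance of the unique tree path T(u,v); valid because T here
    # is the sorted path 0-1-2-3.
    path_resistance = sum(1.0 / wt for a, b, wt in tree if u <= a and b <= v)
    return w * path_resistance

st_E = sum(stretch(e) for e in E)   # tree edges have stretch exactly 1
L_T, L_G = laplacian(n, tree), laplacian(n, E)
# Inequality (10): T ⪯ G ⪯ st_T(E)·T, i.e. both differences are PSD.
assert min(np.linalg.eigvalsh(L_G - L_T)) >= -1e-9
assert min(np.linalg.eigvalsh(st_E * L_T - L_G)) >= -1e-9
```

Here the non-tree edge has stretch 0.5 · (1 + 1/2 + 1) = 1.25, each tree edge has stretch 1, and both Loewner inequalities hold.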


9 Preconditioning with Low-Stretch Trees

In this section, we present a simple preconditioning algorithm, UltraSimple, that works by

simply adding edges to low-stretch spanning trees. This algorithm is sufficient to obtain all our

results for planar graphs. For arbitrary graphs, this algorithm might add too many additional

edges. We will show in Section 10 how these extra edges can be removed via sparsification.

9.1 Low-Stretch Trees

Low-stretch spanning trees were introduced by Alon, Karp, Peleg and West [AKPW95]. At present, the construction of spanning trees with the lowest stretch is due to Abraham, Bartal and Neiman [ABN08], who prove

Theorem 9.1 (Low Stretch Spanning Trees). There exists an O(m log n + n log² n)-time algorithm, LowStretch, that on input a weighted connected graph G = (V,E), outputs a spanning tree T of G such that

    st_T(E) ≤ c_ABN m log n log log n (log log log n)³,

where m = |E|, for some constant c_ABN. In particular, st_T(E) = O(m log n log² log n).

9.2 Augmenting Low-Stretch Spanning Trees

To decide which edges to add to the tree, we first decompose the tree into a collection of

subtrees so that no non-singleton subtree is attached to too many edges of E of high stretch.

In the decomposition, we allow subtrees to overlap at a single vertex, or even consist of just a

single vertex. Then, for every pair of subtrees connected by edges of E, we add one such edge

of E to the tree. The subtrees are specified by the subset of the vertices that they span.

Definition 9.2. Given a tree T that spans a set of vertices V, a T-decomposition is a decomposition of V into sets W_1,...,W_h such that V = ∪_i W_i, the graph induced by T on each W_i is a tree, possibly with just one vertex, and for all i ≠ j, |W_i ∩ W_j| ≤ 1.

Given an additional set of edges E on V, a (T,E)-decomposition is a pair ({W_1,...,W_h},ρ) where {W_1,...,W_h} is a T-decomposition and ρ is a map that sends each edge of E to a set or pair of sets in {W_1,...,W_h} so that for each edge (u,v) ∈ E,

(a) if ρ(u,v) = {W_i} then {u,v} ⊆ W_i, and

(b) if ρ(u,v) = {W_i,W_j}, then either u ∈ W_i and v ∈ W_j, or u ∈ W_j and v ∈ W_i.

We remark that as the sets W_i and W_j can overlap, it is possible that ρ(u,v) = {W_i,W_j}, u ∈ W_i and v ∈ W_i ∩ W_j.

We use the following tree decomposition theorem to show that one can always quickly find a

T-decomposition of E with few components in which the sum of stretches of the edges attached

to each component is not too big. As the theorem holds for any non-negative function η on the

edges, not just stretch, we state it in this general form.

Figure 2: An example of a tree decomposition. Note that sets W_1 and W_6 overlap, and that set W_5 is a singleton set and that it overlaps W_4.

Theorem 9.3 (decompose). There exists a linear-time algorithm, which we invoke with the syntax

    ({W_1,...,W_h},ρ) = decompose(T,E,η,t),

that on input a set of edges E on a vertex set V, a spanning tree T on V, a function η : E → ℝ⁺, and an integer 1 < t ≤ Σ_{e∈E} η(e), outputs a (T,E)-decomposition ({W_1,...,W_h},ρ) such that

(a) h ≤ t,

(b) for all W_i such that |W_i| > 1,

    Σ_{e∈E: W_i∈ρ(e)} η(e) ≤ (4/t) Σ_{e∈E} η(e).

For pseudo-code and a proof of this theorem, see Appendix C. We remark that when t ≥ n, the algorithm can just construct a singleton set for every vertex.

For technical reasons, edges with stretch less than 1 can be inconvenient. So, we define

    η(e) = max(st_T(e), 1)    and    η(E) = Σ_{e∈E} η(e).    (12)

The tree T should always be clear from context.

Given a (T,E)-decomposition, ({W_1,...,W_h},ρ), we define the map

    σ : {1,...,h} × {1,...,h} → E ∪ {undefined}

by setting

    σ(i,j) = argmax_{e: ρ(e)={W_i,W_j}} weight(e)/η(e),  if i ≠ j and such an e exists;
    σ(i,j) = undefined,                                   otherwise.    (13)

In the event of a tie, we let e be the lexicographically least edge maximizing weight(e)/η(e) such that ρ(e) = {W_i,W_j}. Note that σ(i,j) is a weighted edge.

The map σ tells us which edge from E between W_i and W_j to add to T. The following property of σ, which follows immediately from its definition, will be used in our analysis in this and the next section.

Proposition 9.4. For every i,j such that σ(i,j) is defined and for every e ∈ E such that ρ(e) = {W_i,W_j},

    weight(e)/η(e) ≤ weight(σ(i,j))/η(σ(i,j)).

We can now state the procedure by which we augment a spanning tree.

F = AugmentTree(T,E,t)
    E is a set of weighted edges,
    T is a spanning tree of the vertices underlying E,
    t is an integer.

1. Compute st_T(e) for each edge e ∈ E.

2. Set ({W_1,...,W_h},ρ) = decompose(T,E,η,t), where η(e) is as defined in (12).

3. Set F to be the union of the weighted edges σ(i,j) over all pairs 1 ≤ i < j ≤ h for which σ(i,j) is defined, where σ(i,j) is as defined in (13).

A = UltraSimple(E,t)

1. Set T = LowStretch(E).

2. Set F = AugmentTree(T,E,t).

3. Set A = T ∪ F.

We remark that when t ≥ n, UltraSimple can just return A = E.
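Step 3 of AugmentTree — keeping, for each pair of components joined by an edge, only the bridge edge maximizing weight(e)/η(e), as in (13) — can be sketched as follows. The data layout is hypothetical: ρ is given as a dict from edge tuples (u, v, w) to frozensets of component indices, η as a dict on the same edges, and tie-breaking is left out.

```python
def w_over_eta(e, eta):
    """Score weight(e)/eta(e) for an edge e = (u, v, w)."""
    return e[2] / eta[e]

def augment_edges(rho, eta):
    """For each pair {i, j} of components joined by some edge, keep the
    edge maximizing weight(e)/eta(e); this mimics the map sigma of (13)."""
    sigma = {}
    for e, pair in rho.items():
        if len(pair) != 2:           # internal edges never enter F
            continue
        best = sigma.get(pair)
        if best is None or w_over_eta(e, eta) > w_over_eta(best, eta):
            sigma[pair] = e
    return set(sigma.values())       # the set F of augmenting edges

# Toy usage: two bridge edges between components 0 and 1, one internal edge.
rho = {(0, 1, 3.0): frozenset({0, 1}),
       (0, 2, 1.0): frozenset({0, 1}),
       (3, 4, 2.0): frozenset({0})}
eta = {(0, 1, 3.0): 1.0, (0, 2, 1.0): 2.0, (3, 4, 2.0): 1.0}
F = augment_edges(rho, eta)
```

In this toy instance the edge (0, 1) of weight 3 wins its pair (score 3 versus 0.5), and the internal edge is skipped.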

Theorem 9.5 (AugmentTree). On input a set of weighted edges E, a spanning subtree T, and an integer 1 < t ≤ η(E), the algorithm AugmentTree runs in time O(m log n), where m = |E|. The set of edges F output by the algorithm satisfies

(a) F ⊆ E,

(b) |F| ≤ t²/2,

(c) if T ⊆ E, as happens when AugmentTree is called by UltraSimple, then (T ∪ F) ⪯ E, and

(d)

    E ⪯ (12 η(E)/t) · (T ∪ F).    (14)

Moreover, if E is planar then A is planar and |F| ≤ 3t − 6.

Proof. In Appendix B, we present an algorithm for computing the stretch of each edge of E in time O(m log n). The remainder of the analysis of the running time is trivial. Part (a) follows immediately from the statement of the algorithm. When T ⊆ E, T ∪ F ⊆ E, so part (c) follows as well.

To verify (b), note that the algorithm adds at most one edge to F for each pair of sets in W_1,...,W_h, and there are at most (t choose 2) ≤ t²/2 such pairs. If E is planar, then F must be planar as F is a subgraph of E. Moreover, we can use Lemma C.1 to show that the graph induced by E on the sets W_1,...,W_h is also planar. Thus, the number of pairs of these sets connected by edges of E is at most the maximum number of edges in a planar graph with t vertices, 3t − 6.

We now turn to the proof of part (d). Set

    β = 4 η(E)/t.    (15)

By Theorem 9.3, ρ and W_1,...,W_h satisfy

    Σ_{e: W_i∈ρ(e)} η(e) ≤ β, for all W_i such that |W_i| > 1.    (16)

Let E_i^int denote the set of edges e with ρ(e) = {W_i}, and let E_i^ext denote the set of edges e with |ρ(e)| = 2 and W_i ∈ ρ(e). Let E^int = ∪_i E_i^int and E^ext = ∪_i E_i^ext. Also, let T_i denote the tree formed by the edges of T inside the set W_i. Note that when |W_i| = 1, T_i and E_i^int are empty.

We will begin by proving that when |W_i| > 1,

    E_i^int ⪯ (Σ_{e∈E_i^int} η(e)) T_i,    (17)

from which it follows that

    E^int ⪯ Σ_{i:|W_i|>1} (Σ_{e∈E_i^int} η(e)) T_i.    (18)

For any edge e ∈ E_i^int, the path in T between the endpoints of e lies entirely in T_i. So, by (11) we have

    e ⪯ st_T(e) · T_i ⪯ η(e) · T_i.

Inequality (17) now follows by summing over the edges e ∈ E_i^int.

We now define the map τ : E → E ∪ {undefined} by

    τ(e) = σ(i,j),     if |ρ(e)| = 2, where ρ(e) = {W_i,W_j};
    τ(e) = undefined,  otherwise.    (19)

To handle the edges bridging components, we prove that for each edge e with ρ(e) = {W_i,W_j},

    e ⪯ 3η(e)(T_i + T_j) + 3 (weight(e)/weight(τ(e))) · τ(e).    (20)

Let e = w((u,v)) be such an edge, with u ∈ W_i and v ∈ W_j. Let τ(e) = z((x,y)), with x ∈ W_i and y ∈ W_j. Let t_i denote the last vertex in T_i on the path in T from u to v (see Figure 3). If T_i is empty, t_i = u. Note that t_i is also the last vertex in T_i on the path in T from x to y. Define t_j similarly. As T_i(u,x) ⊆ T_i(u,t_i) ∪ T_i(t_i,x), the tree T_i contains a path from u to x of resistance

Figure 3: In this example, e = w((u,v)) and τ(e) = z((x,y)).

at most

    resistance(T_i(u,t_i)) + resistance(T_i(t_i,x)),

and the tree T_j contains a path from y to v of resistance at most

    resistance(T_j(y,t_j)) + resistance(T_j(t_j,v)).

Furthermore, as T_i(u,t_i) + T_j(t_j,v) ⊆ T(u,v) and T_i(t_i,x) + T_j(y,t_j) ⊆ T(x,y), the sum of the resistances of the paths from u to x in T_i and from y to v in T_j is at most

    resistance(T(u,v)) + resistance(T(x,y)) = st_T(e)/w + st_T(τ(e))/z
                                            ≤ η(e)/w + η(τ(e))/z
                                            ≤ 2η(e)/w,

where the last inequality follows from Proposition 9.4. Thus, the graph

    3η(e)(T_i + T_j) + 3w((x,y)) = 3η(e)(T_i + T_j) + 3 (weight(e)/weight(τ(e))) · τ(e)

contains a path from u to v of resistance at most

    (2/3)(1/w) + (1/3)(1/w) = 1/w,

which by Lemma 8.1 implies (20).

We will now sum (20) over every edge e ∈ E_i^ext for every i, observing that this counts every edge in E^ext twice:

    E^ext = (1/2) Σ_i Σ_{e∈E_i^ext} e
          ⪯ Σ_i Σ_{e∈E_i^ext} 3η(e) T_i + (1/2) Σ_i Σ_{e∈E_i^ext} 3 (weight(e)/weight(τ(e))) · τ(e)
          = 3 Σ_i (Σ_{e∈E_i^ext} η(e)) T_i + 3 Σ_{e∈E^ext} (weight(e)/weight(τ(e))) · τ(e)
          = 3 Σ_{i:|W_i|>1} (Σ_{e∈E_i^ext} η(e)) T_i + 3 Σ_{e∈E^ext} (weight(e)/weight(τ(e))) · τ(e),    (21)

as T_i is empty when |W_i| = 1.

We will now upper bound the right-hand side of (21). To handle boundary cases, we divide E^ext into two sets. We let E^ext_single consist of those e ∈ E^ext for which both sets in ρ(e) have size 1. We let E^ext_general = E^ext − E^ext_single contain the rest of the edges in E^ext. For e ∈ E^ext_single, τ(e) = e, while for e ∈ E^ext_general, τ(e) ∈ E^ext_general.

For E^ext_single, we have

    Σ_{e∈E^ext_single} (weight(e)/weight(τ(e))) · τ(e) = Σ_{e∈E^ext_single} τ(e) = E^ext_single.

To evaluate the sum over the edges e ∈ E^ext_general, consider any f ∈ E^ext_general in the image of τ. Let i be such that f ∈ E_i^ext and |W_i| > 1. Then, for every e such that τ(e) = f, we have e ∈ E_i^ext. So, by Proposition 9.4,

    Σ_{e∈E^ext: τ(e)=f} weight(e)/weight(τ(e)) = Σ_{e∈E_i^ext: τ(e)=f} weight(e)/weight(τ(e))
        ≤ Σ_{e∈E_i^ext} weight(e)/weight(τ(e))
        ≤ Σ_{e∈E_i^ext} η(e)/η(τ(e))
        ≤ Σ_{e∈E_i^ext} η(e) ≤ β.    (22)

Thus,

    Σ_{e∈E^ext} (weight(e)/weight(τ(e))) · τ(e) ⪯ E^ext_single + Σ_{f∈image(τ)∩E^ext_general} β · f ⪯ β · F.

Plugging this last inequality into (21), we obtain

    E^ext ⪯ 3 Σ_{i:|W_i|>1} (Σ_{e∈E_i^ext} η(e)) T_i + 3β · F.

Applying (18) and then (16), we compute

    E = E^ext + E^int ⪯ 3 Σ_{i:|W_i|>1} (Σ_{e∈E_i^int} η(e) + Σ_{e∈E_i^ext} η(e)) T_i + 3β · F ⪯ 3β · (T ∪ F),

which by (15) implies the lemma.

We now observe three sources of slack in Theorem 9.5, in decreasing order of importance. The first is the motivation for the construction of ultra-sparsifiers in the next section.

1. In the proof of Theorem 9.5, we assume in the worst case that the tree decomposition could result in each tree T_i being connected to t − 1 other trees, for a total of t(t − 1)/2 extra edges. Most of these edges seem barely necessary, as they could be included at a small fraction of their original weight. To see why, consider the crude estimate at the end of inequality (22). We upper bound the multiplier of one bridge edge f from T_i,

    Σ_{e∈E_i^ext: τ(e)=f} weight(e)/weight(τ(e)),

by the sum of the multipliers of all bridge edges from T_i,

    Σ_{e∈E_i^ext} weight(e)/weight(τ(e)).

The extent to which this upper bound is loose is the factor by which we could decrease the weight of the edge f in the preconditioner.

While we cannot accelerate our algorithms by decreasing the weights with which we include edges, we are able to use sparsifiers to trade many low-weight edges for a few edges of higher weight. This is how we reduce the number of edges we add to the spanning tree to t log^{O(1)} n.

2. The number of edges added equals the number of pairs of trees that are connected. While

we can easily obtain an upper bound on this quantity when the graph has bounded genus,

it seems that we should also be able to bound this quantity when the graph has some nice

geometrical structure.

3. The constant 12 in inequality (14) can be closer to 2 in practice. To see why, first note that the internal and external edges count quite differently: the external edges have three times as much impact. However, most of the edges will probably be internal. In fact, if one uses the algorithm of [ABN08] to construct the tree, then one can incorporate the augmentation into this process to minimize the number of external edges. Another factor of 2 can be saved by observing that decompose, as stated, counts the internal edges twice, but could be modified to count them once.


10  Ultra-Sparsifiers

We begin our construction of ultra-sparsifiers by building ultra-sparsifiers for the special case in

which our graph has a distinguished vertex r and a low-stretch spanning tree T with the property

that for every edge e ∈ E − T, the path in T connecting the endpoints of e goes through r.

In this case, we will call r the root of the tree. All of the complications of ultra-sparsification

will be handled in this construction. The general construction will follow simply by using tree

splitters to choose the roots and decompose the input graph.

The algorithm RootedUltraSparsify begins by computing the same set of edges σ(i,j) as was computed by UltraSimple. However, when RootedUltraSparsify puts one of these edges into the set F, it gives it a different weight: ω(i,j). For technical reasons, the set F is decomposed into subsets Fb according to the quantities φ(f), which will play a role in the analysis of RootedUltraSparsify analogous to the role played by η(e) in the analysis of UltraSimple. Each set of edges Fb is sparsified, and the union of the edges of E that appear in the resulting sparsifiers is returned by the algorithm. The edges in Fb cannot necessarily be sparsified directly, as they might all have different endpoints. Instead, Fb is first projected to a graph Hb on vertex set {1,...,h}. After a sparsifier Hb_s of Hb is computed, it is lifted back to the original graph to form Eb_s. Note that the graph Es returned by RootedUltraSparsify is a subgraph of E, with the same edge weights.

We now prove that F = ∪_{b=1}^{⌈log2 η(E)⌉} Fb. Our proof will use the function η, which we recall was defined in (12) and which was used to define the map σ.

Lemma 10.1. For φ as defined in (24), for every f = ψ(i,j)σ(i,j) ∈ F,

1 ≤ ψ(i,j) ≤ φ(f) ≤ η(E).

Proof. Recall from the definitions of φ and ψ that

φ(f) ≥ ψ(i,j) = ∑_{e∈E:ρ(e)={Wi,Wj}} weight(e) / weight(σ(i,j)).   (25)

By definition σ(i,j) is an edge in E satisfying ρ(σ(i,j)) = {Wi,Wj}; so, the right-hand side of the last expression is at least 1.

To prove the upper bound on φ(f), first apply Proposition 9.4 to show that

ψ(i,j) = ∑_{e∈E:ρ(e)={Wi,Wj}} weight(e) / weight(σ(i,j)) ≤ ∑_{e∈E:ρ(e)={Wi,Wj}} η(e) / η(σ(i,j)) ≤ η(E),

as η is always at least 1. Similarly,

stT(f) = (ω(i,j)/weight(σ(i,j))) stT(σ(i,j)) = (stT(σ(i,j))/weight(σ(i,j))) ∑_{e∈E:ρ(e)={Wi,Wj}} weight(e) ≤ (η(σ(i,j))/weight(σ(i,j))) ∑_{e∈E:ρ(e)={Wi,Wj}} weight(e) ≤ ∑_{e∈E:ρ(e)={Wi,Wj}} η(e) ≤ η(E),

where the second-to-last inequality follows from Proposition 9.4.


Es = RootedUltraSparsify(E, T, r, t, p)

Condition: for all e ∈ E, r ∈ T(e). The parameter t is a positive integer at most ⌈η(E)⌉.

1. Compute stT(e) and η(e) for each edge e ∈ E, where η is as defined in (12).

2. If t ≥ |E|, return Es = E.

3. Set ({W1,...,Wh}, ρ) = decompose(T, E, η, t).

4. Compute σ, as given by (13), everywhere it is defined.

5. For every (i,j) such that σ(i,j) is defined, set

   ω(i,j) = ∑_{e∈E:ρ(e)={Wi,Wj}} weight(e)   and   ψ(i,j) = ω(i,j)/weight(σ(i,j)).   (23)

6. Set F = {ψ(i,j)σ(i,j) : σ(i,j) is defined}.

7. For each f = ψ(i,j)σ(i,j) ∈ F, set

   φ(f) = max(ψ(i,j), stT(f)).   (24)

8. For b ∈ {1,...,⌈log2 η(E)⌉}:

   (a) Set Fb = {f ∈ F : φ(f) ∈ [1,2]} if b = 1, and Fb = {f ∈ F : φ(f) ∈ (2^{b−1}, 2^b]} otherwise.

   (b) Let Hb be the set of edges on vertex set {1,...,h} defined by

       Hb = { ω(i,j)·(i,j) : ψ(i,j)σ(i,j) ∈ Fb }.

   (c) Set Hb_s = Sparsify2(Hb, p).

   (d) Set Eb_s = { σ(i,j) : ∃w such that w·(i,j) ∈ Hb_s }.

9. Set Es = ∪_b Eb_s.
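Steps 5–8 above can be sketched in Python. The input representation (a map from each pair {Wi, Wj} to the edges between them) and the choice of σ(i,j) as the maximum-weight edge of its group are our assumptions, since (13) is defined earlier in the paper; the reweighting (23), the definition (24) of φ, and the power-of-two bucketing follow the pseudo-code.

```python
import math

def bucket_bridge_edges(groups, weight, stretch):
    """Sketch of steps 5-8 of RootedUltraSparsify.

    groups:  dict mapping a pair (i, j) to the list of edges with rho(e) = {Wi, Wj}
    weight:  dict mapping each edge e to weight(e)
    stretch: dict mapping each edge e to st_T(e)
    Returns a dict mapping a bucket index b to the reweighted representatives in Fb."""
    buckets = {}
    for (i, j), edges in groups.items():
        # representative edge sigma(i, j) -- here: the heaviest edge of the group
        # (an assumed choice standing in for definition (13))
        sigma = max(edges, key=lambda e: weight[e])
        omega = sum(weight[e] for e in edges)      # (23): total weight of the group
        psi = omega / weight[sigma]                # (23): reweighting factor
        st_f = psi * stretch[sigma]                # stretch of the reweighted edge f
        phi = max(psi, st_f)                       # (24)
        b = 1 if phi <= 2 else math.ceil(math.log2(phi))
        buckets.setdefault(b, []).append((sigma, omega))
    return buckets
```

Each bucket Fb would then be projected to the vertex set {1,...,h} and handed to the sparsification routine.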


It will be convenient for us to extend the domain of ρ to F by setting ρ(f) = ρ(e), where e ∈ E has the same vertices as f; that is, when there exists γ ∈ ℝ+ such that f = γe. Define β = 4η(E)/t.

Our analysis of RootedUltraSparsify will exploit the inequalities contained in the following

two lemmas.

Lemma 10.2. For every i for which |Wi| > 1,

∑_{f∈F:Wi∈ρ(f)} stT(f) ≤ β.

Proof. Consider any f ∈ F, and let f = ψ(i,j)σ(i,j). Note that the weight of f is ω(i,j), and recall that stT(f) ≤ η(f). We first show that

∑_{e:τ(e)=σ(i,j)} η(e) ≥ η(f).

By Proposition 9.4 and the definition of τ in (19),

∑_{e:τ(e)=σ(i,j)} η(e) ≥ (η(σ(i,j))/weight(σ(i,j))) ∑_{e:τ(e)=σ(i,j)} weight(e)
  = (η(σ(i,j))/weight(σ(i,j))) weight(f)
  = max( weight(f)/weight(σ(i,j)), (stT(σ(i,j))/weight(σ(i,j))) weight(f) )
  = max(ψ(i,j), stT(f))   (by (25))
  = φ(f)   (by (24))
  ≥ max(1, stT(f))   (as ψ(i,j) ≥ 1, by Lemma 10.1)
  = η(f).

We then have

∑_{e∈E:Wi∈ρ(e)} η(e) ≥ ∑_{f∈F:Wi∈ρ(f)} η(f).

The lemma now follows from the upper bound of 4η(E)/t imposed on the left-hand term by Theorem 9.3.

Lemma 10.3. For every i for which |Wi| > 1,

∑_{f∈F:Wi∈ρ(f)} φ(f) ≤ 2β.   (26)


Proof. For an edge f ∈ F, let ψ(f) equal ψ(i,j) where f = ψ(i,j)σ(i,j). With this notation, we may compute

∑_{f∈F:Wi∈ρ(f)} φ(f) ≤ ∑_{f∈F:Wi∈ρ(f)} stT(f) + ∑_{f∈F:Wi∈ρ(f)} ψ(f) ≤ β + ∑_{f∈F:Wi∈ρ(f)} ψ(f),

by Lemma 10.2. We now bound the right-hand term as in the proof of inequality (22):

∑_{f∈F:Wi∈ρ(f)} ψ(f) = ∑_{e∈Eext_i} weight(e)/weight(τ(e)) ≤ ∑_{e∈Eext_i} η(e)/η(τ(e)) ≤ ∑_{e∈Eext_i} η(e) ≤ β,

by our choice of β and Theorem 9.3.

Lemma 10.4 (RootedUltraSparsify). Let T be a spanning tree on a vertex set V, and let E be a non-empty set of edges on V for which there exists an r ∈ V such that for all e ∈ E, r ∈ T(e). For p > 0 and t a positive integer at most ⌈η(E)⌉, let Es be the graph returned by RootedUltraSparsify(E,T,r,t,p). The graph Es is a subgraph of E, and with probability at least 1 − ⌈log2 η(E)⌉p,

|Es| ≤ c1 log^{c2}(n/p) max(1, ⌈log2 η(E)⌉) t,   (27)

and

E ⪯ (3β + 126β max(1, log2 η(E))) · T + 120β · Es,   (28)

where β = 4η(E)/t.

Proof. We first dispense with the case in which the algorithm terminates at line 2. If t ≥ m, then both (27) and (28) are trivially satisfied by setting Es = E, as β ≥ 2.

By Theorem 1.3, each graph Hb_s computed by Sparsify2 is a c1 log^{c2}(n/p)-sparsifier of Hb according to Definition 1.2 with probability at least 1 − p. As there are at most ⌈log2 η(E)⌉ such graphs Hb, this happens for all of these graphs with probability at least 1 − ⌈log2 η(E)⌉p. For the remainder of the proof, we will assume that each graph Hb_s is a c1 log^{c2}(n/p)-sparsifier of Hb. Recalling that h ≤ t, the bound on the number of edges in Es is immediate.

Our proof of (28) will go through an analysis of intermediate graphs. As some of these could be multi-graphs, we will find it convenient to write them as sums of edges.

To define these intermediate graphs, let ri be the vertex in Wi that is closest to r in T. As in Section 9, let Ti denote the edges of the subtree of T with vertex set Wi. We will view ri as the root of tree Ti. Note that if |Wi| = 1, then Wi = {ri} and Ti is empty. As distinct sets Wi and Wj can overlap in at most one vertex, ∑_i Ti ≤ T. We will exploit the fact that for each e ∈ E with ρ(e) = {Wi,Wj}, the path T(e) contains both ri and rj, which follows from the condition r ∈ T(e).


We now define the edge set Db, which is a projection of Hb to the vertex set r1,...,rh, and Db_s, which is an analogous projection of the sparsifier Hb_s. We set

Db = ∑_{(i,j):ψ(i,j)σ(i,j)∈Fb} ω(i,j)·(ri,rj)   and   Db_s = ∑_{w·(i,j)∈Hb_s} w·(ri,rj).

As the sets Wi and Wj are allowed to overlap slightly, it could be the case that some ri = rj for i ≠ j. In this case, Db would not be isomorphic to Hb.

Set

Fb_s = { γψ(i,j)σ(i,j) : ∃γ and (i,j) so that γω(i,j)·(i,j) ∈ Hb_s }.

The edge set Hb can be viewed as a projection of the edge set Fb to the vertex set {1,...,h}, and the edge set Fb_s can be viewed as a lift of Hb_s back into a reweighted subgraph of Fb.

We will prove the following inequalities:

E ⪯ 3β · T + 3F   (29)
Fb ⪯ 2β · T + 2Db   (30)
Db ⪯ (5/4)Db_s   (31)
Db_s ⪯ 16β · T + 2Fb_s   (32)
Fb_s ⪯ 8β · Eb_s   (33)

Inequality (28) in the statement of the lemma follows from these inequalities and F = ∪_b Fb.

To prove inequality (29), we exploit the proof of Theorem 9.5. The edges F constructed in RootedUltraSparsify are the same as those chosen by UltraSimple, except that they are reweighted by the function ψ. If we follow the proof of inequality (14) in Theorem 9.5, but neglect to apply inequality (22), we obtain

E ⪯ 3β · T + 3 ∑_{e∈Eext} (weight(e)/weight(τ(e))) · τ(e) = 3β · T + 3F.

To prove inequality (30), consider any edge w·(u,v) = f ∈ Fb. Assume ρ(f) = {Wi,Wj}, u ∈ Wi and v ∈ Wj. We will now show that

f ⪯ 2stT(f)(Ti + Tj) + 2w·(ri,rj).   (34)

As the path from u to v in T contains both ri and rj,

resistance(T(u,ri)) + resistance(T(rj,v)) ≤ resistance(T(u,v)) = stT(f)/w.

Thus, the resistance of the path

2stT(f)T(u,ri) + 2w·(ri,rj) + 2stT(f)T(rj,v)

is at most 1/w, and so Lemma 8.1 implies that

f ⪯ 2stT(f)T(u,ri) + 2w·(ri,rj) + 2stT(f)T(rj,v),

which in turn implies (34). Summing (34) over all f ∈ Fb yields

Fb ⪯ 2 ∑_i ( ∑_{f∈F:Wi∈ρ(f)} stT(f) ) · Ti + 2Db
   = 2 ∑_{i:|Wi|>1} ( ∑_{f∈F:Wi∈ρ(f)} stT(f) ) · Ti + 2Db   (as Ti is empty when |Wi| = 1)
   ⪯ 2 ∑_{i:|Wi|>1} β · Ti + 2Db   (by Lemma 10.2)
   ⪯ 2β · T + 2Db.

We now prove inequality (32), as it uses similar techniques. Let fs = w·(u,v) ∈ Fb_s. Then, there exist γ and (i,j) so that γω(i,j)·(i,j) ∈ Hb_s, u ∈ Wi, and v ∈ Wj. Set γ(fs) to be this multiplier γ. By part (c) of Definition 1.2, we must have ω(i,j)·(i,j) ∈ Hb and ψ(i,j)σ(i,j) ∈ Fb. Let f = ψ(i,j)σ(i,j). Note that fs = γ(fs)f. The sum of the resistances of the paths from ri to u in Ti and from v to rj in Tj satisfies

resistance(T(ri,u)) + resistance(T(v,rj)) ≤ resistance(T(u,v)) = stT(f)/ω(i,j),

as weight(f) = ω(i,j). Thus, the resistance of the path

2stT(f)T(ri,u) + 2f + 2stT(f)T(v,rj)

is at most 1/ω(i,j), and so Lemma 8.1 implies that

ω(i,j)·(ri,rj) ⪯ 2stT(f)(Ti + Tj) + 2f,

and

γ(fs)ω(i,j)·(ri,rj) ⪯ 2γ(fs)stT(f)(Ti + Tj) + 2fs
  ⪯ 2γ(fs)φ(f)(Ti + Tj) + 2fs   (by (24))
  ⪯ 2^{b+1}γ(fs)(Ti + Tj) + 2fs   (by f ∈ Fb).

Summing this inequality over all fs ∈ Fb_s, we obtain

Db_s ⪯ ∑_i ( 2^{b+1} ∑_{fs∈Fb_s:Wi∈ρ(fs)} γ(fs) ) · Ti + 2Fb_s.

For all i such that |Wi| > 1,

∑_{fs∈Fb_s:Wi∈ρ(fs)} γ(fs) ≤ 2 |{ f ∈ Fb : Wi ∈ ρ(f) }|   (by part (d) of Definition 1.2)
  ≤ 2 ∑_{f∈Fb:Wi∈ρ(f)} φ(f)/2^{b−1}
  ≤ 4β/2^{b−1}   (by Lemma 10.3)
  = β/2^{b−3}.   (35)

So,

Db_s ⪯ ∑_i 16β · Ti + 2Fb_s ⪯ 16β · T + 2Fb_s.

To prove inequality (33), let fs be any edge in Fb_s, let f be the edge in F such that fs = γ(fs)f, and let σ(i,j) be the edge such that fs = γ(fs)ψ(i,j)σ(i,j). It suffices to show that

weight(fs) ≤ 8β weight(σ(i,j)).   (36)

Set b so that f ∈ Fb. By (35),

γ(fs) ≤ β/2^{b−3} ≤ 8β/φ(f) = 8β/max(ψ(i,j), stT(f)) ≤ 8β/ψ(i,j).

As weight(fs) = γ(fs)ψ(i,j)weight(σ(i,j)), inequality (36) follows.

It remains to prove inequality (31). The only reason this inequality is not immediate from part (a) of Definition 1.2 is that we may have ri = rj for some i ≠ j. Let R = {r1,...,rh} and S = {1,...,h}. Define the map π : ℝ^R → ℝ^S by π(x)_i = x_{ri}. We then have, for all x ∈ ℝ^R,

x^T L_{Db} x = π(x)^T L_{Hb} π(x)   and   x^T L_{Db_s} x = π(x)^T L_{Hb_s} π(x);

so,

x^T L_{Db} x = π(x)^T L_{Hb} π(x) ≤ (5/4) π(x)^T L_{Hb_s} π(x) = (5/4) x^T L_{Db_s} x.

The algorithm UltraSparsify will construct a low-stretch spanning tree T of a graph, choose a root vertex r, apply RootedUltraSparsify to sparsify all edges whose path in T contains r, and then work recursively on the trees obtained by removing the root vertex from T. The root vertex will be chosen to be a tree splitter, where we recall that a vertex r is a splitter of a tree T if the trees T1,...,Tq obtained by removing r each have at most two-thirds as many vertices as T. It is well-known that a tree splitter can be found in linear time. By making the root a splitter of the tree, we bound the depth of the recursion. This is critical both for bounding the running time of the algorithm and for proving a bound on the quality of the approximation it returns. For each edge e such that r ∉ T(e), T(e) is entirely contained in one of T1,...,Tq. Such edges are sparsified recursively.
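The linear-time splitter computation mentioned above can be sketched as follows: a standard centroid-style search over subtree sizes (which in fact achieves the stronger bound n/2). The adjacency-list representation and the function name are ours.

```python
def tree_splitter(adj, n):
    """Return a vertex whose removal leaves components of at most 2n/3 vertices each.

    adj: dict mapping each vertex to its list of neighbors in the tree."""
    root = next(iter(adj))
    # iterative DFS from an arbitrary root, recording parents and a traversal order
    parent = {root: None}
    order = [root]
    for v in order:
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                order.append(u)
    # subtree sizes, computed bottom-up
    size = {v: 1 for v in adj}
    for v in reversed(order[1:]):
        size[parent[v]] += size[v]
    for v in order:
        # largest component after removing v: either a child subtree
        # or the rest of the tree "above" v
        worst = max([size[u] for u in adj[v] if parent.get(u) == v] + [n - size[v]])
        if worst <= 2 * n / 3:
            return v
    return root  # unreachable for a valid tree
```

Both passes are linear in the size of the tree.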

U = UltraSparsify(G = (V,E), k)

Condition: G is connected.

1. T = LowStretch(E).

2. Set t = 517 · max(1, log2 η(E)) · ⌈log_{3/2} n⌉ · η(E)/k and p = (2⌈log η(E)⌉n²)^{−1}.

3. If t ≥ η(E), then set A = E − T; otherwise, set A = TreeUltraSparsify(E − T, t, T, p).

4. U = T ∪ A.


A = TreeUltraSparsify(E′, t′, T′, p)

1. If E′ = ∅, return A = ∅.

2. Compute a splitter r of T′.

3. Set Er = {edges e ∈ E′ such that r ∈ T′(e)} and tr = ⌈t′η(Er)/η(E′)⌉.

4. If tr > 1, set Ar = RootedUltraSparsify(Er, T′, r, tr, p); otherwise, set Ar = ∅.

5. Set T1,...,Tq to be the trees obtained by removing r from T′. Set V1,...,Vq to be the vertex sets of these trees, and set E1,...,Eq so that Ei = {(u,v) ∈ E′ : {u,v} ⊆ Vi}.

6. For i = 1,...,q, set A = Ar ∪ TreeUltraSparsify(Ei, t′η(Ei)/η(E′), Ti, p).

Theorem 10.5 (Ultra-Sparsification). On input a weighted, connected n-vertex graph G = (V,E) and k ≥ 1, UltraSparsify(E,k) returns a set of edges U = T ∪ A ⊆ E such that T is a spanning tree of G, U ⊆ E, and with probability at least 1 − 1/2n,

U ⪯ E ⪯ kU,   (37)

and

|A| ≤ O( (m/k) log^{c2+5} n ),   (38)

where m = |E|. Furthermore, UltraSparsify runs in expected time m log^{O(1)} n.

We remark that this theorem is very loose when m/k ≥ n. In this case, the calls made to

decompose by RootedUltraSparsify could have t ≥ n, in which case decompose will just return

singleton sets, and the output of RootedUltraSparsify will essentially just be the output of

Sparsify2 on Er. In this case, the upper bound in (38) can be very loose.

Proof. We first dispense with the case t ≥ η(E). In this case, UltraSparsify simply returns the graph E, so (37) is trivially satisfied. The inequality t ≥ η(E) implies k ≤ O(log² n), so (38) is trivially satisfied as well.

At the end of the proof, we will use the inequality t < η(E). It will be useful to observe that every time TreeUltraSparsify is invoked,

t′ = tη(E′)/η(E).

To apply the analysis of RootedUltraSparsify, we must have

tr ≤ ⌈η(Er)⌉.

This follows from

tr = ⌈t′η(Er)/η(E′)⌉ = ⌈tη(Er)/η(E)⌉ ≤ ⌈η(Er)⌉,

as TreeUltraSparsify is only called if t < η(E).


Each vertex of V can be a root in a call to RootedUltraSparsify at most once, so this subroutine is called at most n times during the execution of UltraSparsify. Thus by Lemma 10.4,

with probability at least

1 − n⌈log2η(E)⌉p = 1 − 1/2n,

every graph Esreturned by a call to RootedUltraSparsify satisfies (27) and (28). Accordingly,

we will assume both of these conditions hold for the rest of our analysis.

We now prove the upper bound on the number of edges in A. During the execution of UltraSparsify, many vertices become the root of some tree. For those vertices v that do not, set tv = 0. By (27),

|A| = ∑_{r∈V:tr>1} |Ar| ≤ c1 log^{c2}(n/p) max(1, ⌈log2 η(E)⌉) ∑_{r∈V:tr>1} tr.   (39)

As ⌈z⌉ ≤ 2z for z ≥ 1 and Er1 ∩ Er2 = ∅ for each r1 ≠ r2,

∑_{r∈V:tr>1} tr = ∑_{r∈V:tr>1} ⌈ (η(Er)/η(E)) t ⌉ ≤ ∑_{r∈V:tr>1} 2 (η(Er)/η(E)) t ≤ 2t.

Thus,

(39) ≤ 2c1 log^{c2}(n/p) ⌈log2 η(E)⌉ t
     ≤ 2c1 log^{c2}(n/p) ⌈log2 η(E)⌉ · 517 · log2 η(E) · ⌈log_{3/2} n⌉ · η(E)/k
     ≤ O( (m/k) log^{c2+5} n ),

where the last inequality uses η(E) = O(m log n (loglog n)²) = O(m log² n) from Theorem 9.1 and log m = O(log n).

We now establish (37). For every vertex r that is ever selected as a tree splitter in line 2 of TreeUltraSparsify, let Tr be the tree T′ of which r is a splitter, and let Er denote the set of edges and tr the parameter set in line 3. Observe that ∪_r Er = E − T. Let

βr = 4η(Er)/tr,

and note this is the parameter used in the analysis of RootedUltraSparsify in Lemma 10.4. If tr > 1, let Ar be the set of edges returned by the call to RootedUltraSparsify. By Lemma 10.4, RootedUltraSparsify returns a set of edges Ar satisfying

Er ⪯ (3βr + 126βr max(1, log2 η(Er))) · Tr + 120βr · Ar.   (40)

On the other hand, if tr = 1 and so Ar = ∅, then βr = 4η(Er). We know that (40) is satisfied in this case because Er ⪯ η(Er)Tr (by (10)). If tr = 0, then Er = ∅ and (40) is trivially satisfied. As tr = ⌈tη(Er)/η(E)⌉,

βr ≤ 4η(E)/t.

We conclude

Er ⪯ 129βr max(1, log2 η(Er)) · Tr + 120βr · Ar ⪯ 516(η(E)/t) max(1, log2 η(Er)) · Tr + 480(η(E)/t) · Ar.


Adding T, summing over all r, and remembering η(Er) ≤ η(E), we obtain

T + (E − T) ⪯ T + 516(η(E)/t) max(1, log2 η(E)) ∑_r Tr + 480(η(E)/t) · A.

As r is always chosen to be a splitter of the tree input to TreeUltraSparsify, the depth of the recursion is at most ⌈log_{3/2} n⌉. Thus, no edge of T appears more than ⌈log_{3/2} n⌉ times in the sum ∑_r Tr, and we may conclude

T + (E − T) ⪯ T + 516(η(E)/t) max(1, log2 η(E)) ⌈log_{3/2} n⌉ · T + 480(η(E)/t) · A
            ⪯ 517(η(E)/t) max(1, log2 η(E)) ⌈log_{3/2} n⌉ · T + 480(η(E)/t) · A
            ⪯ k(T + A)
            = kU,

where the second inequality follows from t ≤ η(E), and the third inequality follows from the value chosen for t in line 2 of UltraSparsify.

To bound the expected running time of UltraSparsify, first observe that the call to LowStretch takes time O(m log² n). Then, note that the routine TreeUltraSparsify is recursive, the recursion has depth at most O(log n), and all the graphs being processed by TreeUltraSparsify at any level of the recursion are disjoint. The running time of TreeUltraSparsify is dominated by the calls made to Sparsify2 inside RootedUltraSparsify. Each of these takes nearly-linear expected time, so the overall expected running time of TreeUltraSparsify is m log^{O(1)} n.

References

[ABN08] I. Abraham, Y. Bartal, and O. Neiman. Nearly tight low stretch spanning trees.

In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer

Science, pages 781–790, Oct. 2008.

[AKPW95] Noga Alon, Richard M. Karp, David Peleg, and Douglas West. A graph-theoretic

game and its application to the k-server problem. SIAM Journal on Computing,

24(1):78–100, February 1995.

[Axe85] O. Axelsson. A survey of preconditioned iterative methods for linear systems of

algebraic equations. BIT Numerical Mathematics, 25(1):165–187, March 1985.

[BBC+94] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout,

R. Pozo, C. Romine, and H. Van der Vorst. Templates for the Solution of Linear

Systems: Building Blocks for Iterative Methods, 2nd Edition. SIAM, Philadelphia,

PA, 1994.

[BCHT04] Erik G. Boman, Doron Chen, Bruce Hendrickson, and Sivan Toledo. Maximum-

weight-basis preconditioners. Numerical linear algebra with applications, 11(8–

9):695–721, October/November 2004.


[BGH+06] M. Bern, J. Gilbert, B. Hendrickson, N. Nguyen, and S. Toledo. Support-graph

preconditioners. SIAM J. Matrix Anal. & Appl, 27(4):930–951, 2006.

[BH01]Erik Boman and B. Hendrickson. On spanning tree preconditioners. Manuscript,

Sandia National Lab., 2001.

[BH03a] Mario Bebendorf and Wolfgang Hackbusch. Existence of H-matrix approximants to the inverse FE-matrix of elliptic operators with L∞-coefficients. Numerische Mathematik, 95(1):1–28, July 2003.

[BH03b]Erik G. Boman and Bruce Hendrickson. Support theory for preconditioning. SIAM

Journal on Matrix Analysis and Applications, 25(3):694–717, 2003.

[BHM01]W. L. Briggs, V. E. Henson, and S. F. McCormick. A Multigrid Tutorial, 2nd

Edition. SIAM, 2001.

[BHV04] Erik G. Boman, Bruce Hendrickson, and Stephen A. Vavasis. Solving elliptic finite element systems in near-linear time with support preconditioners. CoRR, cs.NA/0407022, 2004.

[BMN04]M. Belkin, I. Matveeva, and P. Niyogi. Regularization and semi-supervised learning

on large graphs. Proc. 17th Conf. on Learning Theory, pages 624–638, 2004.

[BSS09]Joshua D. Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan

sparsifiers. In Proceedings of the 41st Annual ACM Symposium on Theory of com-

puting, pages 255–262, 2009.

[CW82]D. Coppersmith and S. Winograd. On the asymptotic complexity of matrix multi-

plication. SIAM Journal on Computing, 11(3):472–492, August 1982.

[DER86]I. S. Duff, A. M. Erisman, and J. K. Reid. Direct Methods for Sparse Matrices.

Oxford Science Publications, 1986.

[DS07]Samuel I. Daitch and Daniel A. Spielman. Support-graph preconditioners for 2-

dimensional trusses. CoRR, abs/cs/0703119, 2007.

[DS08] Samuel I. Daitch and Daniel A. Spielman. Faster approximate lossy generalized flow

via interior point algorithms. In Proceedings of the 40th Annual ACM Symposium

on Theory of Computing, pages 451–460, 2008.

[EEST08] Michael Elkin, Yuval Emek, Daniel A. Spielman, and Shang-Hua Teng. Lower-

stretch spanning trees. SIAM Journal on Computing, 32(2):608–628, 2008.

[EGB05] Erik G. Boman, Doron Chen, Ojas Parekh, and Sivan Toledo. On factor width and symmetric H-matrices. Linear Algebra and its Applications, 405(1):239–248, August 2005.

[FG04]A. Frangioni and C. Gentile. New preconditioners for KKT systems of network flow

problems. SIAM Journal on Optimization, 14(3):894–913, 2004.


[Fie73] Miroslav Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(98):298–305, 1973.

[Geo73]J. A. George. Nested dissection of a regular finite element mesh. SIAM J. Numer.

Anal., 10:345–363, 1973.

[GMZ95] Keith D. Gremban, Gary L. Miller, and Marco Zagha. Performance evaluation of a

new parallel preconditioner. In Proceedings of the 9th International Symposium on

Parallel Processing, pages 65–69. IEEE Computer Society, 1995.

[GO88] G. H. Golub and M. Overton. The convergence of inexact Chebychev and Richardson

iterative methods for solving linear systems. Numerische Mathematik, 53:571–594,

1988.

[Gre96]Keith Gremban. Combinatorial Preconditioners for Sparse, Symmetric, Diagonally

Dominant Linear Systems. PhD thesis, Carnegie Mellon University, CMU-CS-96-

123, 1996.

[GT87]John R. Gilbert and Robert Endre Tarjan. The analysis of a nested dissection

algorithm. Numerische Mathematik, 50(4):377–404, February 1987.

[Har72]Frank Harary. Graph Theory. Addison-Wesley, 1972.

[Jos97]Anil Joshi. Topics in Optimization and Sparse Linear Systems. PhD thesis, UIUC,

1997.

[KM07] Ioannis Koutis and Gary L. Miller. A linear work, O(n^{1/6}) time, parallel algorithm

for solving planar Laplacians. In Proceedings of the 18th Annual ACM-SIAM Sym-

posium on Discrete Algorithms, pages 1002–1011, 2007.

[LRT79]Richard J. Lipton, Donald J. Rose, and Robert Endre Tarjan. Generalized nested

dissection. SIAM Journal on Numerical Analysis, 16(2):346–358, April 1979.

[Mih89] Milena Mihail. Conductance and convergence of Markov chains—A combinatorial

treatment of expanders. In 30th Annual IEEE Symposium on Foundations of Com-

puter Science, pages 526–531, 1989.

[MMP+05] Bruce M. Maggs, Gary L. Miller, Ojas Parekh, R. Ravi, and Shan Leung Maverick

Woo. Finding effective support-tree preconditioners. In Proceedings of the seven-

teenth annual ACM symposium on Parallelism in algorithms and architectures, pages

176–185, 2005.

[Rei98] John Reif. Efficient approximate solution of sparse linear systems. Computers and

Mathematics with Applications, 36(9):37–58, 1998.

[SS08]Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resis-

tances. In Proceedings of the 40th annual ACM Symposium on Theory of Computing,

pages 563–568, 2008.


[SST06]A. Sankar, D. A. Spielman, and S.-H. Teng. Smoothed analysis of the condition

numbers and growth factors of matrices. SIAM Journal on Matrix Analysis and

Applications, 28(2):446–476, 2006.

[ST04] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph

partitioning, graph sparsification, and solving linear systems. In Proceedings of the

thirty-sixth annual ACM Symposium on Theory of Computing, pages 81–90, 2004.

Full version available at http://arxiv.org/abs/cs.DS/0310051.

[ST08a]Gil Shklarski and Sivan Toledo. Rigidity in finite-element matrices: Sufficient con-

ditions for the rigidity of structures and substructures. SIAM Journal on Matrix

Analysis and Applications, 30(1):7–40, 2008.

[ST08b]Daniel A. Spielman and Shang-Hua Teng. A local clustering algorithm for mas-

sive graphs and its application to nearly-linear time graph partitioning. CoRR,

abs/0809.3232, 2008. Available at http://arxiv.org/abs/0809.3232.

[ST08c]Daniel A. Spielman and Shang-Hua Teng. Spectral sparsification of graphs. CoRR,

abs/0808.4134, 2008. Available at http://arxiv.org/abs/0808.4134.

[Str86]Gilbert Strang. Introduction to Applied Mathematics. Wellesley-Cambridge Press,

1986.

[SW09] Daniel A. Spielman and Jaeoh Woo. A note on preconditioning by low-stretch spanning trees. CoRR, abs/0903.2816, 2009. Available at http://arxiv.org/abs/0903.2816.

[TB97] L. N. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, Philadelphia, PA,

1997.

[Vai90] Pravin M. Vaidya. Solving linear equations with symmetric diagonally dominant matrices by constructing good preconditioners. Unpublished manuscript, UIUC, 1990. A talk based on the manuscript was presented at the IMA Workshop on Graph Theory and Sparse Matrix Computation, October 1991, Minneapolis.

[ZBL+03] Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard

Schölkopf. Learning with local and global consistency. In Adv. in Neural Inf. Proc.

Sys. 16, pages 321–328, 2003.

[ZGL03]Xiaojin Zhu, Zoubin Ghahramani, and John D. Lafferty. Semi-supervised learning

using Gaussian fields and harmonic functions. In Proc. 20th Int. Conf. on Mach.

Learn., 2003.

AGremban’s reduction

Gremban [Gre96] (see also [MMP+05]) provides the following method for handling positive off-

diagonal entries. If A is a SDD0-matrix, then Gremban decomposes A into D +An+Ap, where

43

Page 44

D is the diagonal of A, Anis the matrix containing all the negative off-diagonal entries of A,

and Apcontains all the positive off-diagonals. Gremban then considers the linear system

?

? A

x1

x2

?

=ˆb, where

? A =

?

D + An

−Ap

−Ap

D + An

?

and

ˆb =

?

b

−b

?

,

and observes that x = (x1−x2)/2 will be the solution to Ax = b, if a solution exists. Moreover,

approximate solutions of Gremban’s system yield approximate solutions of the original:

????

SDD0-matrix into that of solving a linear system in a SDDM0-matrix that is at most twice as

large and has at most twice as many non-zero entries.

?

x1

x2

?

−? A†ˆb

????≤ ǫ

???? A†ˆb

???

implies

???x − A†b

??? ≤ ǫ

???A†b

???,

where again x = (x1− x2)/2. Thus we may reduce the problem of solving a linear system in a
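A minimal numerical sketch of this reduction, using NumPy; the small example matrix is our own choice, not taken from the paper.

```python
import numpy as np

# an SDD matrix with a positive off-diagonal entry
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0])

D = np.diag(np.diag(A))
off = A - D
An = np.minimum(off, 0)   # the negative off-diagonal entries
Ap = np.maximum(off, 0)   # the positive off-diagonal entries

# Gremban's doubled system: all off-diagonal entries of A_hat are non-positive
A_hat = np.block([[D + An, -Ap],
                  [-Ap, D + An]])
b_hat = np.concatenate([b, -b])

z = np.linalg.solve(A_hat, b_hat)
x1, x2 = z[:2], z[2:]
x = (x1 - x2) / 2          # recovers the solution of A x = b

assert np.allclose(A @ x, b)
```

The doubled matrix has twice the dimension and at most twice the non-zeros of A, as stated above.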

B  Computing the stretch

We now show that given a weighted graph G = (V,E) and a spanning tree T of G, we can

compute stT(e) for every edge e ∈ E in O((m + n)logn) time, where m = |E| and n = |V |.

For each pair of vertices u,v ∈ V , let resistance(u,v) be the resistance of T(u,v), the path

in T connecting u and v. We first observe that for an arbitrary r ∈ V , we can compute

resistance(v,r) for all v ∈ V in O(n) time by a top-down traversal on the rooted tree obtained

from T with root r. Using this information, we can compute the stretch of all edges in Er=

{edges e ∈ E such that r ∈ T(e)} in time O(|Er|). We can then use tree splitters in the same

manner as in TreeUltraSparsify to compute the stretch of all edges in E in O((m + n)logn)

time.
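The single-root step of this procedure can be sketched in Python; the graph representation and the function name are our assumptions.

```python
def stretches_through_root(tree_adj, r, offtree):
    """Compute st_T(e) for off-tree edges whose tree path passes the root r,
    in O(n + |E_r|) time.

    tree_adj: dict v -> list of (neighbor, weight) pairs describing the tree T
    offtree:  list of (u, v, weight) edges with r on the tree path T(u, v)."""
    # resistance from r to every vertex, by one traversal of the rooted tree
    res = {r: 0.0}
    stack = [r]
    while stack:
        v = stack.pop()
        for u, w in tree_adj[v]:
            if u not in res:
                res[u] = res[v] + 1.0 / w   # the resistance of a weight-w edge is 1/w
                stack.append(u)
    # for an edge through r, resistance(T(u, v)) = res[u] + res[v]
    return {(u, v): w * (res[u] + res[v]) for u, v, w in offtree}
```

Repeating this at each splitter, as in TreeUltraSparsify, covers every edge of E within O((m + n) log n) total time.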

C Decomposing Trees

The pseudo-code for decompose appears below. The algorithm performs a depth-first traversal of the tree, greedily forming sets Wi once they are attached to a sufficient number of edges of E. While these sets are being created, the edges they are responsible for are stored in Fsub, and the sum of the value of η on these edges is stored in wsub. When a set Wi is formed, the edges e for which ρ(e) = Wi are set to some combination of Fsub and Fv.

We assume that some vertex r has been chosen to be the root of the tree. This choice is used to determine which nodes in the tree are children of which others.

Proof of Theorem 9.3. As algorithm decompose traverses the tree T once and visits each edge in E once, it runs in linear time.

In our proof, we will say that an edge e is assigned to a set Wj if Wj ∈ ρ(e). To prove part (a) of the theorem, we use the following observations: If Wj is formed in step 3.c.ii or step 6.b, then the sum of η over edges assigned to Wj is at least φ, and if Wj is formed in step 7.b, then the sum of η of edges incident to Wj and Wj+1 (which is a singleton) is at least 2φ. Finally, if a set Wh is formed in line 5.b of decompose, then the sum of η over edges assigned to Wh is


({W1,...,Wh},ρ) = decompose(T,E,η,t)

Comment: h, ρ, and the Wi's are treated as global variables.

1. Set h = 0.

2. For all e ∈ E, set ρ(e) = ∅.

3. Set φ = 2 ∑_{e∈E} η(e)/t.

4. (F,w,U) = sub(r).

5. If U ≠ ∅,

   (a) h = h + 1.
   (b) Wh = U.
   (c) For all e ∈ F, set ρ(e) = ρ(e) ∪ {Wh}.

(F,w,U) = sub(v)

1. Let v1,...,vs be the children of v.

2. Set wsub = 0, Fsub = ∅ and Usub = ∅.

3. For i = 1,...,s

   (a) (Fi,wi,Ui) = sub(vi).
   (b) wsub = wsub + wi, Fsub = Fsub ∪ Fi, Usub = Usub ∪ Ui.
   (c) If wsub ≥ φ,
       i. h = h + 1.
       ii. Set Wh = Usub ∪ {v}.
       iii. For all e ∈ Fsub, set ρ(e) = ρ(e) ∪ {Wh}.
       iv. Set wsub = 0, Fsub = ∅ and Usub = ∅.

4. Set Fv = {(u,v) ∈ E}, the edges attached to v.

5. Set wv = ∑_{e∈Fv} η(e).

6. If φ ≤ wv + wsub ≤ 2φ,

   (a) h = h + 1.
   (b) Set Wh = Usub ∪ {v}.
   (c) For all e ∈ Fsub ∪ Fv, set ρ(e) = ρ(e) ∪ {Wh}.
   (d) Return (∅,0,∅).

7. If wv + wsub > 2φ,

   (a) h = h + 1.
   (b) Set Wh = Usub.
   (c) For all e ∈ Fsub, set ρ(e) = ρ(e) ∪ {Wh}.
   (d) h = h + 1.
   (e) Set Wh = {v}.
   (f) For all e ∈ Fv, set ρ(e) = ρ(e) ∪ {Wh}.
   (g) Return (∅,0,∅).

8. Return (Fsub ∪ Fv, wsub + wv, Usub ∪ {v})


greater than zero. But, at most one set is formed this way. As each edge is assigned to at most two sets in W1,...,Wh, we may conclude

2 ∑_{e∈E} η(e) > (h − 1)φ,

which implies t > h − 1. As both t and h are integers, this implies t ≥ h.

We now prove part (b). First, observe that steps 6 and 7 guarantee that when a call to sub(v) returns a triple (F,w,U),

w = ∑_{e∈F} η(e) < φ.

Thus, when a set Wh is formed in step 3.c.ii, we know that the sum of η over edges assigned to Wh equals wsub and is at most 2φ. Similarly, we may reason that wsub < φ at step 4. If a set Wh is formed in step 6.b, the sum of η over edges associated with Wh is wv + wsub, and must be at most 2φ. If a set Wh is formed in step 7.b, the sum of η over edges associated with Wh is wsub, which we established is at most φ. As the set formed in step 7.e is a singleton, we do not need to bound the sum of η over its associated edges.
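The pseudo-code above can be transcribed almost line for line into Python. In this sketch the rooted tree is given as a child-list, ρ is represented by indices into the list of created sets, and all names are ours.

```python
def decompose(children, root, edges, eta, t):
    """Sketch of decompose: children maps each vertex to its list of children
    in the tree rooted at `root`; eta maps each edge to its eta-value."""
    phi = 2 * sum(eta[e] for e in edges) / t
    sets = []                              # the W_i's, in order of creation
    rho = {e: set() for e in edges}        # edge -> indices of the sets responsible for it
    incident = {}                          # vertex -> edges attached to it (F_v)
    for e in edges:
        for v in e:
            incident.setdefault(v, []).append(e)

    def assign(F):
        for e in F:
            rho[e].add(len(sets) - 1)

    def sub(v):
        w_sub, F_sub, U_sub = 0.0, [], set()
        for c in children.get(v, []):
            Fi, wi, Ui = sub(c)
            w_sub += wi
            F_sub += Fi
            U_sub |= Ui
            if w_sub >= phi:               # step 3.c
                sets.append(U_sub | {v})
                assign(F_sub)
                w_sub, F_sub, U_sub = 0.0, [], set()
        F_v = incident.get(v, [])
        w_v = sum(eta[e] for e in F_v)
        if phi <= w_v + w_sub <= 2 * phi:  # step 6
            sets.append(U_sub | {v})
            assign(F_sub + F_v)
            return [], 0.0, set()
        if w_v + w_sub > 2 * phi:          # step 7
            sets.append(U_sub)
            assign(F_sub)
            sets.append({v})
            assign(F_v)
            return [], 0.0, set()
        return F_sub + F_v, w_sub + w_v, U_sub | {v}

    F, w, U = sub(root)
    if U:                                  # step 5 of decompose
        sets.append(U)
        assign(F)
    return sets, rho
```

As in the paper, each edge ends up assigned to at most two of the returned sets.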

Lemma C.1. Suppose G = (V,E) is a planar graph, π is a planar embedding of G, T is a spanning tree of G, and t > 1 is an integer. Let ({W1,...,Wh},ρ) = decompose(T,E,η,t) with the assumption that in Step 1 of sub, the children v1,...,vs of v always appear in clockwise order according to π. Then the graph G{W1,...,Wh} = ({1,...,h}, {(i,j) : ∃ e ∈ E, ρ(e) = {Wi,Wj}}) is planar.

Proof. Recall that the contraction of an edge e = (u,v) in a planar graph G = (V,E) defines

a new graph (V − {u},E ∪ {(x,v) : (x,u) ∈ E} − {(x,u) ∈ E}). Also recall that edge deletions

and edge contractions preserve planarity.

We first prove the lemma in the special case in which the sets W1,...,Wh are disjoint. For each j, let Tj be the subgraph of T induced by Wj. As each Tj is connected, G{W1,...,Wh} is a subgraph of the graph obtained by contracting all the edges in each subgraph Tj. Thus, in this special case, G{W1,...,Wh} is planar.

We now analyze the general case, recalling that the sets W1,...,Wh can overlap. However, the only way sets Wj and Wk with j < k can overlap is if the set Wj was formed at step 3.c.ii, and the vertex v becomes part of Wk after it is returned by a call to sub. In this situation, no edge is assigned to Wj for having v as an end-point. That is, the only edges of the form (x,v) that can be assigned to Wj must have x ∈ Wj. So, these edges will not appear in G{W1,...,Wh}.

Accordingly, for each j we define

    Xj = Wj − v,  if Wj was formed at step 3.c.ii, and
    Xj = Wj,      otherwise.

We have shown that G{W1,...,Wh} = G{X1,...,Xh}. Moreover, the sets X1,...,Xh are disjoint. Our proof would now be finished if only each subgraph of G induced by a set Xj were connected. While this is not necessarily the case, we can make it the case by adding edges to E.

The only way the subgraph of G induced on a set Xj can fail to be connected is if Wj was formed at step 3.c.ii from the union of v with a collection of sets Ui for i0 ≤ i ≤ i1 returned by


recursive calls to sub. Now, consider what happens if we add edges of the form (vi, vi+1) to the graph for i0 ≤ i < i1, whenever they are not already present. As the vertices vi0,...,vi1 appear in clockwise order around v, the addition of these edges preserves the planarity of the graph. Moreover, their addition makes the induced subgraph on each set Xj connected, so we may conclude that G{X1,...,Xh} is in fact planar.
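The quotient graph G{W1,...,Wh} in the statement of Lemma C.1 can be formed mechanically from the assignment ρ. The helper below is a hypothetical sketch (the function name and the dictionary representation of ρ are ours): it keeps one quotient edge {i,j} for every original edge assigned to exactly two sets.

```python
def quotient_graph(h, rho):
    """Build G_{W_1,...,W_h} = ({1,...,h}, {(i,j) : exists e with rho(e) = {W_i, W_j}})."""
    vertices = set(range(1, h + 1))
    edges = set()
    for e, assigned in rho.items():
        if len(assigned) == 2:        # e crosses between two distinct sets
            i, j = sorted(assigned)
            edges.add((i, j))
    return vertices, edges

# Example: edges ("a","b") and ("c","d") cross between sets; ("b","c") is internal.
rho = {("a", "b"): {1, 2}, ("b", "c"): {2}, ("c", "d"): {2, 3}}
V, E = quotient_graph(3, rho)
# V == {1, 2, 3} and E == {(1, 2), (2, 3)}
```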

D The Pseudo-Inverse of a Factored Symmetric Matrix

We recall that B† is the pseudo-inverse of B if and only if it satisfies

    BB†B = B,           (41)
    B†BB† = B†,         (42)
    (BB†)T = BB†,       (43)
    (B†B)T = B†B.       (44)
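These four Moore–Penrose conditions are easy to check numerically. The sketch below (not part of the paper) uses NumPy's pinv on a generic rectangular matrix; the same checks apply verbatim to the symmetric matrices considered here:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 6))
Bp = np.linalg.pinv(B)

assert np.allclose(B @ Bp @ B, B)          # (41)
assert np.allclose(Bp @ B @ Bp, Bp)        # (42)
assert np.allclose((B @ Bp).T, B @ Bp)     # (43)
assert np.allclose((Bp @ B).T, Bp @ B)     # (44)
```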

We now prove that if B = XCXT, where X is a non-singular matrix and C is symmetric, then

    B† = ΠX−TC†X−1Π,

where Π is the projection onto the span of B. We prove this by showing that ΠX−TC†X−1Π satisfies axioms (41)–(44). Recall that Π = BB† = B†B and that ΠB = B.
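Before the formal verification, the identity can be sanity-checked numerically. The following sketch (our construction, with arbitrarily chosen dimensions and random data) builds a non-singular X and a symmetric, rank-deficient C, and compares the claimed formula against NumPy's pseudo-inverse of B = XCXT:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally shifted, almost surely non-singular
M = rng.standard_normal((n, n - 2))
C = M @ M.T                                       # symmetric, rank n - 2

B = X @ C @ X.T
B_pinv = np.linalg.pinv(B)
Pi = B @ B_pinv                                   # orthogonal projection onto span(B)

Xinv = np.linalg.inv(X)
candidate = Pi @ Xinv.T @ np.linalg.pinv(C) @ Xinv @ Pi
assert np.allclose(candidate, B_pinv)             # matches Pi X^{-T} C^+ X^{-1} Pi
```

Note that the projection Π is essential: without the outer factors of Π, the middle expression X−TC†X−1 need not even have its range contained in the span of B.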

To verify (41), we compute

B(ΠX−TC†X−1Π)B = BX−TC†X−1B

= (XCXT)X−TC†X−1(XCXT)

= XCC†CXT

= XCXT

= B.

To verify (42), we compute

(ΠX−TC†X−1Π)B(ΠX−TC†X−1Π) = ΠX−TC†X−1BX−TC†X−1Π

= ΠX−TC†X−1XCXTX−TC†X−1Π

= ΠX−TC†CC†X−1Π

= ΠX−TC†X−1Π.


To verify (43), it suffices to verify that BΠX−TC†X−1Π is symmetric, which we now do:

B(ΠX−TC†X−1Π) = ΠBX−TC†X−1Π

= Π(XCXT)(X−TC†X−1Π)

= ΠXCC†X−1Π

= B†BXCC†X−1BB†

= B†XCXTXCC†X−1XCXTB†

= B†XCXTXCC†CXTB†

= B†XCXTXCXTB†

= B†BBB†,

which is symmetric, as B (and hence B†) is symmetric.

As B and ΠX−TC†X−1Π are symmetric, it follows that (44) is satisfied as well.
