Solving Sparse, Symmetric, Diagonally-Dominant Linear Systems in Time $O (m^{1.31})$
ABSTRACT We present a linear-system solver that, given an $n$-by-$n$ symmetric positive semi-definite, diagonally dominant matrix $A$ with $m$ non-zero entries and an $n$-vector $b$, produces a vector $\tilde{x}$ within relative distance $\epsilon$ of the solution to $A x = b$ in time $O (m^{1.31} \log (n \kappa_{f} (A)/\epsilon)^{O (1)})$, where $\kappa_{f} (A)$ is the ratio of the largest to smallest non-zero eigenvalue of $A$. In particular, $\log (\kappa_{f} (A)) = O (b \log n)$, where $b$ is the logarithm of the ratio of the largest to smallest non-zero entry of $A$. If the graph of $A$ has genus $m^{2\theta}$ or does not have a $K_{m^{\theta}}$ minor, then the exponent of $m$ can be improved to the minimum of $1 + 5 \theta$ and $(9/8) (1+\theta)$. The key contribution of our work is an extension of Vaidya's techniques for constructing and analyzing combinatorial preconditioners.
arXiv:cs/0310036v2 [cs.DS] 31 Mar 2004
Daniel A. Spielman∗
Department of Mathematics
Massachusetts Institute of Technology
Shang-Hua Teng†
Department of Computer Science
Boston University and
Akamai Technologies Inc.
February 1, 2008
Second post-FOCS revision.
1 Introduction
Sparse linear systems are ubiquitous in scientific computing and optimization. In this work, we develop fast
algorithms for solving some of the best-behaved linear systems: those specified by symmetric, diagonally
dominant matrices with positive diagonals. We call such matrices PSDDD as they are positive semi-definite
and diagonally dominant. Such systems arise in the solution of certain elliptic differential equations via the
finite element method, the modeling of resistive networks, and in the solution of certain network optimization
problems [SF73, McC87, HY81, Var62, You71].
While one is often taught to solve a linear system Ax = b by computing $A^{-1}$ and then multiplying $A^{-1}$
by b, this approach is quite inefficient for sparse linear systems: the best known bound on the time required
to compute $A^{-1}$ is $O(n^{2.376})$ [CW82] and the representation of $A^{-1}$ typically requires $\Omega(n^2)$ space. In
contrast, if A is symmetric and has m non-zero entries, then one can use the Conjugate Gradient method, as a
direct method, to solve for $A^{-1}b$ in $O(nm)$ time and $O(n)$ space! Until Vaidya’s revolutionary introduction
of combinatorial preconditioners [Vai90], this was the best complexity bound for the solution of general
PSDDD systems.
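
To make the $O(nm)$ bound concrete, the following Python sketch (ours, not part of the original presentation) runs at most n Conjugate Gradient iterations, each dominated by one O(m) sparse matrix-vector product. The dictionary-of-entries matrix format, the function names, and the stopping tolerance are assumptions made for illustration, and the sketch assumes that A is non-singular.

```python
def matvec(entries, x):
    """Multiply a sparse matrix, stored as {(i, j): value}, by x in O(m) time."""
    y = [0.0] * len(x)
    for (i, j), a in entries.items():
        y[i] += a * x[j]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(entries, b, tol=1e-12):
    """Run at most n CG iterations on A x = b for symmetric positive definite A."""
    n = len(b)
    x = [0.0] * n
    r = list(b)              # residual b - A x for the starting point x = 0
    p = list(r)              # current search direction
    rr = dot(r, r)
    for _ in range(n):
        if rr <= tol * tol * dot(b, b):
            break            # relative residual is already below tol
        Ap = matvec(entries, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Hypothetical example: a path Laplacian on 3 vertices with extra weight on the
# middle diagonal entry, which makes the matrix non-singular.
A = {(0, 0): 2.0, (0, 1): -1.0,
     (1, 0): -1.0, (1, 1): 3.0, (1, 2): -1.0,
     (2, 1): -1.0, (2, 2): 2.0}
b = [1.0, 0.0, 1.0]
x = conjugate_gradient(A, b)
```

In exact arithmetic the loop terminates with the solution after at most n iterations; in floating point the result is only approximate, which is one reason the iteration is usually analyzed as an iterative rather than a direct method.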
∗Partially supported by NSF grant CCR-0112487. spielman@math.mit.edu
†Partially supported by NSF grant CCR-9972532. steng@cs.bu.edu
The two most popular families of methods for solving linear systems are the direct methods and the iterative
methods. Direct methods, such as Gaussian elimination, perform arithmetic operations that produce x treat-
ing the entries of A and b symbolically. As discussed in Section 1.4, direct methods can be used to quickly
compute x if the matrix A has special topological structure.
Iterative methods, which are discussed in Section 1.5, compute successively better approximations to x. The
Chebyshev and Conjugate Gradient methods take time proportional to $m\sqrt{\kappa_f(A)}\log(\kappa_f(A)/\epsilon)$ to produce
approximations to x with relative error $\epsilon$, where $\kappa_f(A)$ is the ratio of the largest to the smallest non-zero
eigenvalue of A. These algorithms are improved by preconditioning: essentially, solving $B^{-1}Ax = B^{-1}b$
for a preconditioner B that is carefully chosen so that $\kappa_f(A,B)$ is small and so that it is easy to solve
linear systems in B. These systems in B may be solved using direct methods, or by again applying iterative
methods.
Vaidya [Vai90] discovered that for PSDDD matrices A one could use combinatorial techniques to construct
matrices B that provably satisfy both criteria. In his seminal work, Vaidya shows that when B corresponds
to a subgraph of the graph of A, one can bound $\kappa_f(A,B)$ by bounding the dilation and congestion of the best
embedding of the graph of A into the graph of B. By using preconditioners derived by adding a few edges
to maximum spanning trees, Vaidya’s algorithm finds $\epsilon$-approximate solutions to PSDDD linear systems of
maximum valence d in time $O((dn)^{1.75}\log(\kappa_f(A)/\epsilon))$.$^1$ When these systems have special structure, such as
having a sparsity graph of bounded genus or avoiding certain minors, he obtains even faster algorithms. For
example, his algorithm solves planar linear systems in time $O((dn)^{1.2}\log(\kappa_f(A)/\epsilon))$. This paper follows the
outline established by Vaidya: our contributions are improvements in the techniques for bounding $\kappa_f(A,B)$, a
construction of better preconditioners, a construction that depends upon average degree rather than maximum
degree, and an analysis of the recursive application of our algorithm.
As Vaidya’s paper was never published$^2$, and his manuscript lacked many proofs, the task of formally working
out his results fell to others. Much of its content appears in the thesis of his student, Anil Joshi [Jos97].
Gremban, Miller and Zagha [Gre96, GMZ95] explain parts of Vaidya’s paper as well as extend Vaidya’s
techniques. Among other results, they found ways of constructing preconditioners by adding vertices to the
graphs and using separator trees.
Much of the theory behind the application of Vaidya’s techniques to matrices with non-positive off-diagonals
is developed in [BGH+]. The machinery needed to apply Vaidya’s techniques directly to matrices with
positive off-diagonal elements is developed in [BCHT]. The present work builds upon an algebraic extension
of the tools used to prove bounds on $\kappa_f(A,B)$ by Boman and Hendrickson [BH]. Boman and
Hendrickson [BH01] have pointed out that by applying one of their bounds on support to the tree constructed by Alon,
Karp, Peleg, and West [AKPW95] for the k-server problem, one obtains a spanning tree preconditioner B
with $\kappa_f(A,B) = m 2^{O(\sqrt{\log n \log\log n})}$. They thereby obtain a solver for PSDDD systems that produces
$\epsilon$-approximate solutions in time $m^{1.5+o(1)}\log(\kappa_f(A)/\epsilon)$. In their manuscript, they asked whether one could
possibly augment this tree to obtain a better preconditioner. We answer this question in the affirmative. An
algorithm running in time $O(mn^{1/2}\log^2(n))$ has also recently been obtained by Maggs et al. [MMP+02].
The present paper is the first to push past the $O(n^{1.5})$ barrier. It is interesting to observe that this is exactly
the point at which one obtains sub-cubic time algorithms for solving dense PSDDD linear systems.
Reif [Rei98] proved that by applying Vaidya’s techniques recursively, one can solve bounded-degree planar
positive definite diagonally dominant linear systems to relative accuracy $\epsilon$ in time $O(m^{1+o(1)}\log(\kappa(A)/\epsilon))$.
We extend this result to general planar PSDDD linear systems.
Due to space limitations in the FOCS proceedings, some proofs have been omitted. These are being gradually
included in the on-line version of the paper.
$^1$ For the reader unaccustomed to condition numbers, we note that for a PSDDD matrix A in which each entry is specified using b
bits of precision, $\log(\kappa_f(A)) = O(b \log n)$.
$^2$ Vaidya founded the company Computational Applications and System Integration (http://www.casicorp.com) to market his linear
system solvers.
1.1 Background and Notation
A symmetric matrix A is positive semi-definite if $x^T A x \ge 0$ for all vectors x. This is equivalent to having
all eigenvalues of A non-negative.
In most of the paper, we will focus on Laplacian matrices: symmetric matrices with non-negative diagonals
and non-positive off-diagonals such that for all i, $\sum_j A_{i,j} = 0$. However, our results will apply to the more
general family of positive semidefinite, diagonally dominant (PSDDD) matrices, where a matrix is diagonally
dominant if $|A_{i,i}| \ge \sum_{j \ne i} |A_{i,j}|$ for all i. We remark that a symmetric matrix is PSDDD if and only if it is
diagonally dominant and all of its diagonals are non-negative.
In this paper, we will restrict our attention to the solution of linear systems of the form Ax = b where A is a
PSDDD matrix. When A is non-singular, that is when $A^{-1}$ exists, there exists a unique solution $x = A^{-1}b$
to the linear system. When A is singular and symmetric, for every b ∈ Span(A) there exists a unique
x ∈ Span(A) such that Ax = b. If A is the Laplacian of a connected graph, then the null space of A is
spanned by the all-ones vector $\mathbf{1}$.
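
The definitions above translate directly into membership tests. The following Python sketch is only an illustration: the dense list-of-rows representation and the function names are our assumptions, the dominance test sums over off-diagonal entries only, and the exact comparisons are meant for integer or exactly representable entries.

```python
def is_symmetric(A):
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(i))


def is_diagonally_dominant(A):
    """|A[i][i]| >= sum of |A[i][j]| over j != i, for every row i."""
    return all(
        abs(row[i]) >= sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(A)
    )


def is_psddd(A):
    """Symmetric, diagonally dominant, and with non-negative diagonal entries."""
    return (is_symmetric(A)
            and is_diagonally_dominant(A)
            and all(A[i][i] >= 0 for i in range(len(A))))


def is_laplacian(A):
    """Symmetric, non-positive off-diagonals, and every row sums to zero."""
    n = len(A)
    return (is_symmetric(A)
            and all(A[i][j] <= 0 for i in range(n) for j in range(n) if i != j)
            and all(sum(row) == 0 for row in A))


# Hypothetical examples.
path = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]   # Laplacian of a 3-vertex path
positive_offdiag = [[2, 1], [1, 2]]            # PSDDD but not a Laplacian
not_dominant = [[1, 2], [2, 1]]                # fails diagonal dominance
```

Every Laplacian passes `is_psddd`, while `positive_offdiag` shows that the PSDDD family is strictly larger.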
There are two natural ways to formulate the problem of finding an approximate solution to a system Ax = b.
A vector $\tilde{x}$ has relative residual error $\epsilon$ if $\|A\tilde{x} - b\| \le \epsilon\|b\|$. We say that a solution $\tilde{x}$ is an $\epsilon$-approximate
solution if it is at relative distance at most $\epsilon$ from the actual solution, that is, if $\|x - \tilde{x}\| \le \epsilon\|x\|$. One
can relate these two notions of approximation by observing that the relative distance of $\tilde{x}$ to the solution and the
relative residual error differ by a multiplicative factor of at most $\kappa_f(A)$. We will focus our attention on the
problem of finding $\epsilon$-approximate solutions.
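
The gap between the two notions can be seen on a very small example. The Python sketch below is an illustration under our own assumptions (a diagonal matrix with condition number 100 and hand-picked vectors); it computes both error measures for an approximate solution for which the factor $\kappa_f(A)$ is attained.

```python
import math


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def relative_residual(A, x_approx, b):
    """||A x_approx - b|| / ||b||."""
    r = [y - bi for y, bi in zip(matvec(A, x_approx), b)]
    return norm(r) / norm(b)


def relative_distance(x, x_approx):
    """||x - x_approx|| / ||x||."""
    return norm([xi - yi for xi, yi in zip(x, x_approx)]) / norm(x)


# Hypothetical example: eigenvalues 1 and 100, so the condition number is 100.
A = [[1.0, 0.0], [0.0, 100.0]]
x = [0.0, 1.0]                 # exact solution
b = matvec(A, x)               # right-hand side [0, 100]
x_approx = [1.0, 1.0]          # error lies along the small eigenvalue

res = relative_residual(A, x_approx, b)
dist = relative_distance(x, x_approx)
```

Here the relative residual error is 1/100 while the relative distance is 1, so a small residual alone does not certify an $\epsilon$-approximate solution.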
The ratio $\kappa_f(A)$ of the largest to the smallest non-zero eigenvalue of A is the finite condition number of A.
The $\ell_2$ norm of a matrix, $\|A\|$, is the maximum of $\|Ax\|/\|x\|$, and equals the largest absolute value of an
eigenvalue of A if A is symmetric. For non-symmetric matrices, $\lambda_{\max}(A)$
and $\|A\|$ are typically different. We let |A| denote the number of non-zero entries in A, and min(A) and
max(A) denote the smallest and largest non-zero elements of A in absolute value, respectively.
The condition number plays a prominent role in the analysis of iterative linear system solvers. When A is
PSD, it is known that, after $\sqrt{\kappa_f(A)}\log(1/\epsilon)$ iterations, the Chebyshev iterative method and the Conjugate
Gradient method produce solutions with relative residual error at most $\epsilon$. To obtain an $\epsilon$-approximate solution,
one need merely run $\log(\kappa_f(A))$ times as many iterations. If A has m non-zero entries, each of these
iterations takes time O(m). When applying the preconditioned versions of these algorithms to solve systems
of the form $B^{-1}Ax = B^{-1}b$, the number of iterations required by these algorithms to produce an $\epsilon$-accurate
solution is bounded by $\sqrt{\kappa_f(A,B)}\log(\kappa_f(A)/\epsilon)$ where
$$\kappa_f(A,B) = \left( \max_{x : Ax \ne 0} \frac{x^T A x}{x^T B x} \right) \left( \max_{x : Ax \ne 0} \frac{x^T B x}{x^T A x} \right),$$
for symmetric A and B with Span(A) = Span(B). However, each iteration of these methods takes time
O(m) plus the time required to solve linear systems in B. In our initial algorithm, we will use direct methods
to solve these systems, and so will not have to worry about approximate solutions. For the recursive applica-
tion of our algorithms, we will use our algorithm again to solve these systems, and so will have to determine
how well we need to approximate the solution. For this reason, we will analyze the Chebyshev iteration
instead of the Conjugate Gradient, as it is easier to analyze the impact of approximation in the Chebyshev
iterations. However, we expect that similar results could be obtained for the preconditioned Conjugate Gra-
dient. For more information on these methods, we refer the reader to [GV89] or [Bru95].
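
As a small worked instance of the definition of $\kappa_f(A,B)$ given above, consider the following example. The choice of matrices is ours and is not taken from the paper: A is the Laplacian of a triangle with unit weights and B is the Laplacian of one of its spanning trees, a path.

```latex
% Illustrative example (our choice of A and B).
% A: Laplacian of the unit-weight triangle; B: Laplacian of the path 1-2-3.
\[
A = \begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix},
\qquad
B = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}.
\]
% Both null spaces are spanned by the all-ones vector, so Span(A) = Span(B).
% For x orthogonal to the all-ones vector, x^T A x = 3 x^T x, while the
% non-zero eigenvalues of B are 1 and 3, so x^T x \le x^T B x \le 3 x^T x.
\[
\max_{x : Ax \neq 0} \frac{x^T A x}{x^T B x} = \frac{3}{1} = 3,
\qquad
\max_{x : Ax \neq 0} \frac{x^T B x}{x^T A x} = \frac{3}{3} = 1,
\qquad
\kappa_f(A,B) = 3 \cdot 1 = 3.
\]
```

In this instance, preconditioning the triangle by its spanning path leaves a relative condition number of 3, independent of how badly conditioned either matrix is on its own.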
1.2 Laplacians and Weighted Graphs
All weighted graphs in this paper have positive weights. There is a natural isomorphism between weighted
graphs and Laplacian matrices: given a weighted graph G = (V,E,w), we can form the Laplacian matrix in
which $A_{i,j} = -w(i,j)$ for $(i,j) \in E$, and with diagonals determined by the condition $A\mathbf{1} = 0$. Conversely,
a weighted graph is naturally associated to each Laplacian matrix. Each vertex of the graph corresponds to
both a row and column of the matrix, and we will often abuse notation by identifying this row/column pair
with the associated vertex.
We note that if $G_1$ and $G_2$ are weighted graphs on the same vertex set with disjoint sets of edges, then the
Laplacian of the union of $G_1$ and $G_2$ is the sum of their Laplacians.
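
The correspondence between weighted graphs and Laplacians, and the additivity over edge-disjoint unions, can be sketched in a few lines of Python. The dense representation, the 0-based vertex labels, and the function name `laplacian` are assumptions made for illustration.

```python
def laplacian(n, weighted_edges):
    """Dense Laplacian of a weighted graph on vertices 0..n-1.

    weighted_edges maps an unordered pair (i, j) with i != j to a positive weight.
    """
    A = [[0.0] * n for _ in range(n)]
    for (i, j), w in weighted_edges.items():
        A[i][j] -= w          # off-diagonal entries are -w(i, j)
        A[j][i] -= w
        A[i][i] += w          # diagonals are fixed by requiring zero row sums
        A[j][j] += w
    return A


# Hypothetical example: two edge-disjoint graphs on the same three vertices.
g1 = {(0, 1): 2.0}
g2 = {(1, 2): 3.0}
union = {**g1, **g2}

L1 = laplacian(3, g1)
L2 = laplacian(3, g2)
L_union = laplacian(3, union)
L_sum = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(L1, L2)]
```

With these inputs `L_union` and `L_sum` coincide, and every row of `L_union` sums to zero.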
1.3 Reductions
In most of this paper we just consider Laplacian matrices of connected graphs. This simplification is enabled
by two reductions.
First, we note that it suffices to construct preconditioners for matrices satisfying $A_{i,i} = \sum_{j \ne i} |A_{i,j}|$, for all i.
This follows from the observation in [BGH+] that if $\tilde{A} = A + D$, where A satisfies the above condition, then
$\kappa_f(\tilde{A}, B + D) \le \kappa_f(A,B)$. So, it suffices to find a preconditioner after subtracting off the maximal diagonal
matrix that maintains positive diagonal dominance.
We then use an idea of Gremban [Gre96] for handling positive off-diagonal entries. If A is a symmetric
matrix such that for all i, $A_{i,i} \ge \sum_{j \ne i} |A_{i,j}|$, then Gremban decomposes A into $D + A_n + A_p$, where D is
the diagonal of A, $A_n$ is the matrix containing all negative off-diagonal entries of A, and $A_p$ contains all the
positive off-diagonals. Gremban then considers the linear system
$$\begin{pmatrix} D + A_n & -A_p \\ -A_p & D + A_n \end{pmatrix} \begin{pmatrix} x \\ x' \end{pmatrix} = \begin{pmatrix} b \\ -b \end{pmatrix},$$
and observes that its solution will have $x' = -x$ and that x will be the solution to Ax = b. Thus, by making
this transformation, we can convert any PSDDD linear system into one with non-positive off-diagonals.
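
The following Python sketch illustrates Gremban's transformation on a small matrix. The dense representation and the function name `gremban_system` are our assumptions; rather than solving the enlarged system, the example checks that the stacked vector (x, -x) satisfies it when x solves Ax = b.

```python
def gremban_system(A, b):
    """Build the 2n-by-2n system [[D + An, -Ap], [-Ap, D + An]] and right-hand side (b, -b)."""
    n = len(A)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            a = A[i][j]
            if i == j or a < 0:
                # Diagonal and negative off-diagonal entries stay inside each copy.
                big[i][j] = a
                big[n + i][n + j] = a
            elif a > 0:
                # Positive off-diagonal entries are negated and cross between copies.
                big[i][n + j] = -a
                big[n + i][j] = -a
    return big, list(b) + [-t for t in b]


# Hypothetical PSDDD matrix with one positive off-diagonal pair.
A = [[2.0, 1.0, -1.0],
     [1.0, 3.0, -2.0],
     [-1.0, -2.0, 3.0]]
x = [1.0, 2.0, 3.0]
b = [sum(a * t for a, t in zip(row, x)) for row in A]   # b = A x

big, rhs = gremban_system(A, b)
stacked = x + [-t for t in x]                            # candidate solution (x, -x)
product = [sum(a * t for a, t in zip(row, stacked)) for row in big]
```

For this input `product` equals `rhs`, and every off-diagonal entry of `big` is non-positive.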
One can understand this transformation as making two copies of every vertex in the graph, and two copies of
every edge. The edges corresponding to negative off-diagonals connect nodes in the same copy of the graph,
while the others cross copies. To capture the resulting family of graphs, we define a weighted graph G to be
a Gremban cover if it has 2n vertices and
• for i,j ≤ n, (i,j) ∈ E if and only if (i + n,j + n) ∈ E, and w(i,j) = w(i + n,j + n),
• for i,j ≤ n, (i,j + n) ∈ E if and only if (i + n,j) ∈ E, and w(i,j + n) = w(i + n,j), and
• the graph contains no edge of the form (i,i + n).
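
The three conditions above can be checked mechanically. In the Python sketch below, the 0-based labels (vertex i and its copy i + n), the edge dictionary keyed by sorted pairs, and the function name `is_gremban_cover` are assumptions made for illustration.

```python
def is_gremban_cover(n, edges):
    """Check the Gremban cover conditions on a weighted graph with vertices 0..2n-1.

    edges maps a pair (u, v) with u < v to a positive weight.
    """
    def weight(u, v):
        # Returns None when the edge is absent.
        return edges.get((min(u, v), max(u, v)))

    for i in range(n):
        if weight(i, i + n) is not None:
            return False                      # no edge may join a vertex to its own copy
        for j in range(n):
            if i == j:
                continue
            if weight(i, j) != weight(i + n, j + n):
                return False                  # edges inside the two copies must match
            if weight(i, j + n) != weight(i + n, j):
                return False                  # crossing edges must come in matched pairs
    return True


# Hypothetical example with n = 3: two edges inside each copy and one crossing pair.
cover = {(0, 2): 1.0, (1, 2): 2.0,            # first copy
         (3, 5): 1.0, (4, 5): 2.0,            # second copy
         (0, 4): 1.0, (1, 3): 1.0}            # crossing pair for the edge (0, 1)
broken = {e: w for e, w in cover.items() if e != (1, 3)}
```

Here `cover` satisfies all three conditions, while `broken` fails because the crossing edge (0, 4) has lost its partner.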
When necessary, we will explain how to modify our arguments to handle Laplacians that are Gremban covers.
Finally, if A is the Laplacian of an unconnected graph, then the blocks corresponding to the connected
components may be solved independently.
1.4 Direct Methods
The standard direct method for solving symmetric linear systems is Cholesky factorization. Those unfamiliar
with Cholesky factorization should think of it as Gaussian elimination in which one simultaneously eliminates
on rows and columns so as to preserve symmetry. Given a permutation matrix P, Cholesky factorization
produces a lower-triangular matrix L such that $LL^T = PAP^T$. Because one can use forward and back
substitution to multiply vectors by $L^{-1}$ and $L^{-T}$ in time proportional to the number of non-zero entries in L,
one can use the Cholesky factorization of A to solve the system Ax = b in time O(|L|).
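
A minimal dense Python sketch of this procedure follows. It assumes the identity permutation and a positive definite input, and it does not exploit sparsity, so it illustrates the factor-then-substitute structure rather than the O(|L|) running time; the function names are ours.

```python
import math


def cholesky(A):
    """Return lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def solve_with_cholesky(L, b):
    """Solve L L^T x = b by forward substitution followed by back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                       # forward: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # backward: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


# Hypothetical example: a non-singular PSDDD matrix whose graph is a path.
A = [[2.0, -1.0, 0.0],
     [-1.0, 3.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
L = cholesky(A)
x = solve_with_cholesky(L, b)
```

Because the graph of this A is a path and the vertices are eliminated from one end, L has no fill: its only non-zeros are on the diagonal and first subdiagonal.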
Each pivot in the factorization comes from the diagonal of A, and one should understand the permutation P
as providing the order in which these pivots are chosen. Many heuristics exist for producing permutations P
for which the number of non-zeros in L is small. If the graph of A is a tree, then a permutation P that orders
the vertices of A from the leaves up will result in an L with at most 2n − 1 non-zero entries. In this work,
we will use results concerning matrices whose sparsity graphs resemble trees with a few additional edges and
whose graphs have small separators, which we now review.
If B is the Laplacian matrix of a weighted graph (V,E,w), and one eliminates a vertex a of degree 1, then
the remaining matrix has the form
$$\begin{pmatrix} 1 & 0 \\ 0 & A_1 \end{pmatrix},$$
where $A_1$ is the Laplacian of the graph in which a and its attached edge have been removed. Similarly, if a
vertex a of degree 2 is eliminated, then the remaining matrix is the Laplacian of the graph in which the vertex
a and its adjacent edges have been removed, and an edge with weight $1/(1/w_1 + 1/w_2)$ is added between
the two neighbors of a, where $w_1$ and $w_2$ are the weights of the edges connecting a to its neighbors.
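
Both elimination rules can be checked numerically by forming the Schur complement that one step of symmetric elimination leaves behind. The Python sketch below is an illustration; the function name `eliminate_vertex` and the example weights are our assumptions.

```python
def eliminate_vertex(A, a):
    """Schur complement of A after eliminating row and column a (needs A[a][a] > 0)."""
    keep = [i for i in range(len(A)) if i != a]
    return [[A[i][j] - A[i][a] * A[a][j] / A[a][a] for j in keep] for i in keep]


# Hypothetical example: the path 0 - 1 - 2 with weights w1 = 2 and w2 = 6.
w1, w2 = 2.0, 6.0
B = [[w1, -w1, 0.0],
     [-w1, w1 + w2, -w2],
     [0.0, -w2, w2]]

# Eliminating the degree-2 vertex 1 leaves one edge of weight 1/(1/w1 + 1/w2) = 1.5.
after_degree_two = eliminate_vertex(B, 1)

# Eliminating the degree-1 vertex 0 leaves the Laplacian of the remaining edge (1, 2).
after_degree_one = eliminate_vertex(B, 0)
```

The first result is the Laplacian of a single edge of weight 1.5, and the second is the Laplacian of the edge between vertices 1 and 2 with its original weight 6.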
Given a graph G with edge set E = R ∪ S, where the edges in R form a tree, we will perform a partial
Cholesky factorization of G in which we successively eliminate all the degree 1 and 2 vertices that are not
endpoints of edges in S. We introduce the algorithm trim to define the order in which the vertices should be
eliminated, and we call the trim order the order in which trim deletes vertices.
Algorithm: trim(V,R,S)
1. While G contains a vertex of degree one that is not an endpoint of an edge in S, remove that vertex and
its adjacent edge.
2. While G contains a vertex of degree two that is not an endpoint of an edge in S, remove that vertex and
its adjacent edges, and add an edge between its two neighbors.
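
The two steps of trim can be sketched in Python as follows. This is an illustration under our own assumptions: vertices are integers, R is a dictionary of weighted tree edges, the new edge in step 2 receives the series weight $1/(1/w_1 + 1/w_2)$ from the elimination rule described earlier, and the sketch only manipulates the tree R, since a vertex that is not an endpoint of an edge in S has the same degree in G as in R.

```python
def trim(vertices, R, S):
    """Return (remaining vertices, trimmed tree edges, elimination order).

    R maps tree edges (u, v) to positive weights; S is a list of extra edges (u, v).
    """
    protected = {v for e in S for v in e}          # endpoints of edges in S
    adj = {v: {} for v in vertices}
    for (u, v), w in R.items():
        adj[u][v] = w
        adj[v][u] = w
    order = []

    # Step 1: repeatedly remove unprotected vertices of degree one.
    stack = [v for v in adj if len(adj[v]) == 1 and v not in protected]
    while stack:
        v = stack.pop()
        if v not in adj or len(adj[v]) != 1:
            continue
        u = next(iter(adj[v]))
        del adj[u][v]
        del adj[v]
        order.append(v)
        if len(adj[u]) == 1 and u not in protected:
            stack.append(u)

    # Step 2: splice out unprotected vertices of degree two.
    for v in list(adj):
        if len(adj[v]) == 2 and v not in protected:
            (u1, w1), (u2, w2) = adj[v].items()
            del adj[u1][v]
            del adj[u2][v]
            del adj[v]
            adj[u1][u2] = adj[u2][u1] = 1.0 / (1.0 / w1 + 1.0 / w2)
            order.append(v)

    edges = {(u, v): w for u in adj for v, w in adj[u].items() if u < v}
    return set(adj), edges, order


# Hypothetical example: a path 0-1-2-3-4 with a pendant branch 2-5-6, plus one edge in S.
R = {(0, 1): 2.0, (1, 2): 2.0, (2, 3): 2.0, (3, 4): 2.0, (2, 5): 2.0, (5, 6): 2.0}
S = [(0, 4)]
remaining, trimmed, order = trim(range(7), R, S)
```

In this example the pendant branch is removed in step 1 and the path between the two endpoints of the edge in S collapses to a single tree edge of weight 0.5, so only vertices 0 and 4 remain, consistent with the bound of 4|S| vertices.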
Proposition 1.1. The output of trim is a graph with at most 4|S| vertices and 5|S| edges.
Remark 1.2. If (V,R) and (V,S) are Gremban covers, then we can implement trim so that the output
graph is also a Gremban cover. Moreover, the genus and maximum size clique minor of the output graph do
not increase.
After performing partial Cholesky factorization of the vertices in the trim order, one obtains a factorization
of the form
$$B = LCL^T, \quad \text{where } C = \begin{pmatrix} I & 0 \\ 0 & A_1 \end{pmatrix},$$
L is lower triangular, and the left and right columns in the above representation correspond to the
eliminated and remaining vertices, respectively. Moreover, $|L| \le 2n - 1$, and this Cholesky factorization may
be performed in time $O(n + |S|)$.
The following Lemma may be proved by induction.
Lemma 1.3. Let B be a Laplacian matrix and let L and $A_1$ be the matrices arising from the partial Cholesky
factorization of B according to the trim order. Let U be the set of eliminated vertices, and let W be the set of
remaining vertices. For each pair of vertices (a,b) in W joined by a simple path containing only vertices of
U, let $B_{(a,b)}$ be the Laplacian of the graph containing just one edge between a and b of weight $1/(\sum_i 1/w_i)$,
where the $w_i$ are the weights on the path between a and b. Then,
(a) the matrix $A_1$ is the sum of the Laplacian of the induced graph on W and the sum of all the Laplacians
$B_{(a,b)}$,