
All content in this area was uploaded by Caleb Robelle on Jun 05, 2020


arXiv:2002.10889v2 [cs.DS] 24 May 2020

Efficient and Simple Algorithms for Fault-Tolerant Spanners

Michael Dinitz∗

Johns Hopkins University

Caleb Robelle

University of Maryland, Baltimore County

Abstract

It was recently shown that a version of the greedy algorithm gives a construction of fault-tolerant spanners that is size-optimal, at least for vertex faults. However, the algorithm to construct this spanner is not polynomial-time, and the best-known polynomial time algorithm is significantly suboptimal. Designing a polynomial-time algorithm to construct (near-)optimal fault-tolerant spanners was given as an explicit open problem in the two most recent papers on fault-tolerant spanners ([Bodwin, Dinitz, Parter, Vassilevska Williams SODA ’18] and [Bodwin, Patel PODC ’19]). We give a surprisingly simple algorithm which runs in polynomial time and constructs fault-tolerant spanners that are extremely close to optimal (off by only a linear factor in the stretch) by modifying the greedy algorithm to run in polynomial time. To complement this result, we also give simple distributed constructions in both the LOCAL and CONGEST models.

1 Introduction

Let $G = (V, E)$ be a graph, possibly with edge lengths $w : E \to \mathbb{R}_{\ge 0}$. A $t$-spanner of $G$, for $t \ge 1$, is a subgraph $G' = (V, E')$ that preserves all pairwise distances within factor $t$, i.e.,

\[ d_{G'}(u, v) \le t \cdot d_G(u, v) \tag{1} \]

for all $u, v \in V$ (where $d_H$ denotes the shortest-path distance in a graph $H$). The distance preservation factor $t$ is called the stretch of the spanner. Less formally, graph spanners are a form of sparsifiers that approximately preserve distances (as opposed to other notions of graph sparsification which approximately preserve cuts [BK15], the spectrum [SS11, BSS14], or other graph properties). When considering spanners through the lens of sparsification, perhaps the most important goal in the study of graph spanners is understanding the tradeoff between the stretch and the sparsity. The main result in this area, which is tight assuming the “Erdős girth conjecture” [Erd64], was given by Althöfer et al.:

Theorem 1 ([ADD+93]). For every positive integer $k$, every weighted graph $G = (V, E)$ has a $(2k-1)$-spanner with at most $O(n^{1+1/k})$ edges.

This notion of graph spanners was first introduced by Peleg and Schäffer [PS89] and Peleg and Ullman [PU89] in the context of distributed computing, and has been studied extensively for the last three decades in the distributed computing community as well as more broadly. Spanners are not only inherently interesting mathematical objects, but they also have an enormous number of applications. A small sampling includes uses in distance oracles [TZ05], property testing [BGJ+09, BBG+14], synchronizers [PU89], compact routing [TZ01], preprocessing for approximation algorithms [BKM09, DKN17], and many others.

∗Supported in part by NSF award CCF-1909111

Many of these applications, particularly in distributed computing, arise from modeling computer networks or distributed systems as graphs. But one aspect of distributed systems that is not captured by the above spanner definition is the possibility of failures. We would like our spanner to be robust to failures, so that even if some nodes fail we still have a spanner of what remains. More formally, $G'$ is an $f$-(vertex-)fault-tolerant $t$-spanner of $G$ if for every set $F \subseteq V$ with $|F| \le f$ the spanner condition holds for $G \setminus F$, i.e.,

\[ d_{G' \setminus F}(u, v) \le t \cdot d_{G \setminus F}(u, v) \]

for all $u, v \in V \setminus F$. If $F$ is instead an edge set then this gives a definition of an $f$-edge-fault-tolerant $t$-spanner.

This notion of fault-tolerant spanners was first introduced by Levcopoulos, Narasimhan, and Smid [LNS98] in the context of geometric spanners (the special case when the vertices are in Euclidean space and the distance between two points is the Euclidean distance), and has since been studied extensively in that setting [LNS98, Luk99, CZ04, NS07]. Note that in the geometric setting $d_{G \setminus F}(u, v) = d_G(u, v)$ for all $u, v \in V \setminus F$, since faults do not change the underlying geometric distances.

In general graphs, though, $d_{G \setminus F}(u, v)$ may be extremely different from $d_G(u, v)$, making this definition more difficult to work with. The first results on fault-tolerant graph spanners were by Chechik, Langberg, Peleg, and Roditty [CLPR10], who showed how to modify the Thorup-Zwick spanner [TZ05] to be $f$-fault-tolerant with an additional cost of approximately $k^f$: the number of edges in the $f$-fault-tolerant $(2k-1)$-spanner that they create is approximately $\tilde{O}(k^f n^{1+1/k})$ (where $\tilde{O}$ hides polylogarithmic factors). Since [CLPR10] there has been a significant amount of work on improving the sparsity, particularly as a function of the number of faults $f$ (since we would like to protect against large numbers of faults but usually care most about small stretch values). First, Dinitz and Krauthgamer [DK11] improved the size to $\tilde{O}(f^{2-1/k} n^{1+1/k})$ by giving a black-box reduction to the traditional non-fault-tolerant setting. Then Bodwin, Dinitz, Parter, and Vassilevska Williams [BDPW18] decreased this to $O(\exp(k) f^{1-1/k} n^{1+1/k})$, which they also showed was optimal (for vertex faults) as a function of $f$ and $n$ (i.e., the only non-optimal dependence was the $\exp(k)$). Unlike previous fault-tolerant spanner constructions, this optimal construction was based on a natural greedy algorithm (the natural generalization of the greedy algorithm of [ADD+93]). An improved analysis of the same greedy algorithm was then given by Bodwin and Patel [BP19], who managed to show the fully optimal bound of $O(f^{1-1/k} n^{1+1/k})$.

Unlike the previous fault-tolerant spanner construction of [DK11] and the greedy non-fault-tolerant algorithm of [ADD+93], the greedy algorithm of [BDPW18, BP19] has a significant weakness: it takes exponential time. Obtaining the same (or similar) size bound in polynomial time was explicitly mentioned as an important open question in both [BDPW18] and [BP19].

1.1 Our Results and Techniques

In this paper we design a surprisingly simple algorithm to construct nearly-optimal fault-tolerant

spanners in polynomial time, in both unweighted and weighted graphs.


Theorem 2. There is a polynomial time algorithm which, given integers $k \ge 1$ and $f \ge 1$ and a (weighted) graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, constructs an $f$-fault-tolerant $(2k-1)$-spanner with at most $O(k f^{1-1/k} n^{1+1/k})$ edges in time $O(m k f^{2-1/k} n^{1+1/k})$.

Note that while we are a factor of $k$ away from complete optimality (for vertex faults), this is truly optimal when the stretch is constant and, for non-constant stretch values, is still significantly sparser than the analysis of the exponential time algorithm by [BDPW18] (which lost an exponential factor in $k$).

The main idea in our algorithm is to replace the exponential-time subroutine used in the greedy algorithm of [BDPW18, BP19] with an appropriate polynomial-time approximation algorithm. More specifically, the main step of the exponential time greedy algorithm is to consider whether a given candidate edge is “already spanned” by the subgraph $H$ that has already been built. This means determining whether, for some candidate edge $\{u, v\}$, there is a fault set $F$ with $|F| \le f$ such that $d_{H \setminus F}(u, v) > (2k-1) \cdot d_{G \setminus F}(u, v)$. If such a fault set exists then the algorithm adds $\{u, v\}$ to $H$, and otherwise does not.¹ In both [BDPW18] and [BP19], the only method given to find such a set $F$ was to try all possible sets, giving running time that is exponential in $f$ and thus exponential in the size of the input.

Our main approach is to speed this up by designing a polynomial-time algorithm to replace this exponential-time step. Unfortunately, the corresponding problem (known as Length-Bounded Cut) is NP-hard [BEH+06], so we cannot hope to actually solve it efficiently. Instead, we design an approximation algorithm for Length-Bounded Cut and use it instead. We end up with a fairly weak approximation (basically a $k$-approximation), and one which only holds in the unweighted case. But this turns out to be enough for the unweighted case: it intuitively allows us to build (in polynomial time) an $f$-fault-tolerant spanner with the size of a $kf$-fault-tolerant spanner, which changes the size from $O(f^{1-1/k} n^{1+1/k})$ to $O((kf)^{1-1/k} n^{1+1/k}) = O(k f^{1-1/k} n^{1+1/k})$. However, this is only intuition. The graph we end up creating is not necessarily even a subgraph of the $kf$-fault-tolerant spanner that the true greedy algorithm would have built, so we cannot simply argue that our algorithm returns something with at most as many edges as the greedy $kf$-fault-tolerant spanner. Instead, we need to analyze the size of our spanner from scratch. Fortunately, we can do this by simply following the proof strategy of [BP19] with only some minor modifications.

A natural approach to the weighted case would be to try to generalize this by creating an $O(k)$-approximation for Length-Bounded Cut in the weighted setting. Such an algorithm would certainly suffice, but unfortunately we do not know how to design any nontrivial approximation algorithm for Length-Bounded Cut in the presence of weights. While this might appear to rule out using a similar technique, we show that special properties of the greedy algorithm allow us to essentially reduce to the unweighted setting. We use the weights to determine the order in which we consider edges, but for the rest of the algorithm we simply “pretend” to be in the unweighted setting. Since the size bound for the unweighted case worked for any ordering, that same size bound will apply to our spanner. And then we can use the fact that we considered edges in order of nondecreasing weights to argue that the subgraph we create is in fact an $f$-fault-tolerant $(2k-1)$-spanner even though we ignored the weights.

¹ Note that in the fault-free case this just means checking whether there is already a path of stretch at most $(2k-1)$ between the endpoints, which is precisely the original greedy algorithm of [ADD+93].

Distributed Settings

While the focus of this paper is on a centralized polynomial-time algorithm since the existence of

such an algorithm was an explicit open question from [BDPW18] and [BP19], we complement this

result with some simple algorithms in the standard LOCAL and CONGEST models of distributed

computation.

In the LOCAL model, we can use standard network decompositions to find a clustering of the graph where the clusters have low diameter, every edge is in at least one cluster, and the clustering comes from $O(\log n)$ partitions. Since in the LOCAL model we are allowed unbounded message sizes, this means that in $O(\log n)$ time we can send the subgraph induced by each cluster to the cluster center (an arbitrary node in the cluster), which can then locally run the greedy algorithm on that cluster and then inform the nodes in the cluster about the edges that have been chosen. This will take only $O(\log n)$ communication rounds (since clusters have diameter $O(\log n)$) and will incur only an extra $O(\log n)$ factor in the number of edges (since the clustering can be divided into $O(\log n)$ partitions).

In the CONGEST model we cannot apply this approach (even though we could find a similar clustering) because we are not able to gather large induced subgraphs at the cluster centers (due to the bound on message sizes). Instead, we show that the older fault-tolerant spanner construction of [DK11] can be combined with the standard (non-fault-tolerant) spanner algorithm in the CONGEST model due to Baswana and Sen [BS07] to give a fault-tolerant spanner algorithm in CONGEST. This approach means that the size increases to $O(k f^{2-1/k} n^{1+1/k} \log n)$ (so we are a factor of $f \log n$ away from the bounds of the polynomial-time greedy algorithm), but the number of rounds needed is quite small despite the limitation on message sizes ($O(f^2(\log f + \log\log n) + k^2 f \log n)$ rounds).

2 Notation and Preliminaries

We will be discussing graphs $G = (V, E)$ where $n = |V|$ and $m = |E|$. Sometimes these graphs will also have a weight function $w : E \to \mathbb{R}_{\ge 0}$. We will slightly abuse notation to let $w(u, v) = w(\{u, v\})$ for all $\{u, v\} \in E$. For a (possibly weighted) graph $G$, we will let $d_G(u, v)$ denote the length of the shortest (lowest-weight) path from $u$ to $v$ (if no such path exists then this length is $\infty$). For any $C \subseteq V$, we let $G[C]$ denote the subgraph of $G$ induced by $C$. For $F \subseteq V$ let $G \setminus F$ be $G[V \setminus F]$, and for $F \subseteq E$ let $G \setminus F$ be $(V, E \setminus F)$.

Definition 1. Let $G = (V, E)$ be a (possibly weighted) graph. A subgraph $H$ of $G$ is an $f$-vertex-fault-tolerant ($f$-VFT) $t$-spanner of $G$ if $d_{H \setminus F}(u, v) \le t \cdot d_{G \setminus F}(u, v)$ for all $F \subseteq V$ with $|F| \le f$ and $u, v \notin F$. A subgraph $H$ of $G$ is an $f$-edge-fault-tolerant ($f$-EFT) $t$-spanner of $G$ if $d_{H \setminus F}(u, v) \le t \cdot d_{G \setminus F}(u, v)$ for all $F \subseteq E$ with $|F| \le f$.
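Definition 1 can be checked directly on very small unweighted graphs by enumerating every fault set. The following is a minimal sketch for illustration only: the function names (`bfs_dist`, `to_adj`, `is_vft_spanner`) and the edge-list input format are assumptions of this example, not part of the paper, and the enumeration takes time exponential in $f$.

```python
from collections import deque
from itertools import combinations
import math


def bfs_dist(adj, src, dst, removed):
    """Hop distance from src to dst avoiding `removed`; math.inf if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt in removed or nxt in seen:
                continue
            if nxt == dst:
                return d + 1
            seen.add(nxt)
            queue.append((nxt, d + 1))
    return math.inf


def to_adj(vertices, edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_vft_spanner(vertices, g_edges, h_edges, t, f):
    """Brute-force check of the f-VFT t-spanner condition (unweighted case)."""
    g_adj, h_adj = to_adj(vertices, g_edges), to_adj(vertices, h_edges)
    for size in range(f + 1):
        for fault in combinations(vertices, size):
            removed = set(fault)
            alive = [v for v in vertices if v not in removed]
            for u, v in combinations(alive, 2):
                d_g = bfs_dist(g_adj, u, v, removed)
                d_h = bfs_dist(h_adj, u, v, removed)
                if d_h > t * d_g:  # inf > inf is False, so disconnected pairs pass
                    return False
    return True
```

For example, a Hamiltonian path of the 4-cycle is a 3-spanner of the cycle without faults, but it stops being one as soon as a single vertex may fail.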

Throughout this paper, for simplicity we will only discuss the vertex fault-tolerant case since that is the more difficult one to prove upper bounds for. The proofs for the edge fault-tolerant case are essentially identical.

We first show an equivalent definition that will let us restrict which pairs of vertices we care about.

Lemma 3. Let $G = (V, E)$ be a graph with weight function $w$ and let $H$ be a subgraph of $G$. Then $H$ is an $f$-VFT $t$-spanner of $G$ if and only if $d_{H \setminus F}(u, v) \le t \cdot w(u, v)$ for all $F \subseteq V$ with $|F| \le f$ and $u, v \in V \setminus F$ such that $\{u, v\} \in E$ and $d_{G \setminus F}(u, v) = w(u, v)$.


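Proof. The only if direction is immediately implied by Definition 1, since for any $F \subseteq V$ with $|F| \le f$ and $u, v \in V \setminus F$ such that $\{u, v\} \in E$ and $d_{G \setminus F}(u, v) = w(u, v)$, we know from Definition 1 that $d_{H \setminus F}(u, v) \le t \cdot d_{G \setminus F}(u, v) \le t \cdot w(u, v)$.

For the if direction, let $F \subseteq V$ with $|F| \le f$ and $u, v \in V \setminus F$. If there is no path between $u$ and $v$ in $G \setminus F$ then the spanner condition holds trivially, so let $P = (u = x_0, x_1, \dots, x_p = v)$ be the shortest path in $G \setminus F$ between $u$ and $v$. If $p = 1$ then $P = (u, v)$, so $\{u, v\} \in E$ and $d_{G \setminus F}(u, v) = w(u, v)$, and thus by assumption $d_{H \setminus F}(u, v) \le t \cdot w(u, v) = t \cdot d_{G \setminus F}(u, v)$. If $p > 1$, then since every edge of a shortest path is itself a shortest path between its endpoints, we know that $d_{G \setminus F}(x_{i-1}, x_i) = w(x_{i-1}, x_i)$ for all $i \in \{1, 2, \dots, p\}$, and thus

\[ d_{H \setminus F}(u, v) \le \sum_{i=1}^{p} d_{H \setminus F}(x_{i-1}, x_i) \le \sum_{i=1}^{p} t \cdot w(x_{i-1}, x_i) = t \sum_{i=1}^{p} w(x_{i-1}, x_i) = t \cdot d_{G \setminus F}(u, v). \]

Hence $H$ is an $f$-VFT $t$-spanner of $G$.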

The original greedy algorithm for fault-tolerant spanners was introduced and analyzed by [BDPW18], with an improved analysis by [BP19], and is given in Algorithm 1. The part of this algorithm which takes exponential time is the “if” condition, i.e., checking whether there is a fault set which hits all stretch-$(2k-1)$ paths. For edge fault-tolerance, the algorithm is the same except that $F$ is an edge set.

Algorithm 1 Greedy f-VFT (2k−1)-Spanner Algorithm

function FT-GREEDY(G = (V, E, w), k, f)
    H ← (V, ∅, w)
    for all {u, v} ∈ E in nondecreasing weight order do
        if there exists a set F of at most f vertices such that d_{H\F}(u, v) > (2k−1)·w(u, v) then
            add {u, v} to H
        end if
    end for
    return H
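To make the exponential-time step concrete, Algorithm 1 can be sketched as follows. This is an illustrative sketch only: the names `dijkstra` and `ft_greedy`, the `(u, v, weight)` edge triples, and the choice to enumerate fault sets among vertices other than the two endpoints are assumptions of this example. The inner enumeration over all fault sets of size at most $f$ is exactly the part that takes exponential time.

```python
import heapq
import math
from itertools import combinations


def dijkstra(adj, src, dst, removed):
    """Weighted distance from src to dst avoiding `removed`; math.inf if none."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, w in adj[node]:
            if nxt in removed:
                continue
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return math.inf


def ft_greedy(vertices, weighted_edges, k, f):
    """Exponential-time greedy f-VFT (2k-1)-spanner (sketch of Algorithm 1)."""
    adj = {v: [] for v in vertices}  # adjacency lists of the spanner H
    spanner = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        others = [x for x in vertices if x not in (u, v)]
        # Try every fault set of size <= f (assumed to exclude u and v).
        needs_edge = any(
            dijkstra(adj, u, v, set(fault)) > (2 * k - 1) * w
            for size in range(f + 1)
            for fault in combinations(others, size)
        )
        if needs_edge:
            adj[u].append((v, w))
            adj[v].append((u, w))
            spanner.append((u, v, w))
    return spanner
```

On a unit-weight triangle with $k = 2$, the sketch keeps two edges when $f = 0$ and all three edges when $f = 1$, since a single fault can destroy the only two-hop detour.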

3 Unweighted Graphs

In this section we design a polynomial-time algorithm for the special case of unweighted (or unit-weighted) graphs. We begin by designing a simple approximation algorithm for the Length-Bounded Cut problem, and then show that this algorithm can be plugged into the greedy algorithm with only a small loss.

3.1 Length-Bounded Cut

In order to design a polynomial-time variant of the greedy algorithm, we want to replace the “if” condition by something that can be computed in polynomial time. While there are many possibilities, there are two obvious approaches: we could try to compute the maximum $t$ such that there is a fault set of size $f$ which hits all $t$-hop paths, or we could try to compute the minimum $f$ such that there is a fault set of size $f$ which hits all $t$-hop paths. It turns out that this second approach is more fruitful.

Consider the following problem, known as the Length-Bounded Cut problem [BEH+06]. The input is an unweighted graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, vertices $u, v \in V$ (known as the terminals), and a positive integer $t$. A length-$t$-cut is a subset $F \subseteq V \setminus \{u, v\}$ such that $d_{G \setminus F}(u, v) > t$. The goal is to find the length-$t$-cut of minimum cardinality.

We are essentially going to design a $t$-approximation for this problem. But since we do not need the full power of this approximation, in order to speed it up we will instead consider a gap decision version of the problem. In the LBC($t, \alpha$) problem, the input is the same as in Length-Bounded Cut but there is an additional input parameter $\alpha$. If there is a length-$t$-cut of size at most $\alpha$, then we must return YES. If there is no length-$t$-cut of size at most $\alpha t$, then we must return NO. For intermediate values we are allowed to return either YES or NO.

Recall that breadth-first search (BFS) finds shortest paths in unweighted graphs in $O(m + n)$ time. So we can use BFS to check whether there is a path with at most $t$ hops from $u$ to $v$ in $O(m + n)$ time. This gives the following natural algorithm (Algorithm 2), which is essentially the standard “frequency” approximation of Set Cover (or Hitting Set).

Algorithm 2 Algorithm for LBC(t, α)

F ← ∅
for i = 1 to α + 1 do
    Run BFS to find a path P of length at most t from u to v in G \ F, if one exists.
    if no such P exists then
        return YES
    else
        Add all vertices of P \ {u, v} to F
    end if
end for
return NO

Theorem 4. Algorithm 2 correctly decides LBC($t, \alpha$) and runs in $O((m + n)\alpha)$ time.

Proof. By the running time of BFS, we know that each iteration of Algorithm 2 takes $O(m + n)$ time, and thus the total time is $O((m + n)\alpha)$ as claimed.

Suppose that there is a length-$t$-cut $F^*$ of size at most $\alpha$. Then for every path $P$ which our algorithm considers (and adds to $F$), it must be the case that $|P \cap F^*| \ge 1$ since $F^*$ must hit all paths of length at most $t$. Since we remove each path we consider (by adding its internal vertices to $F$), the paths considered are internally vertex-disjoint, so each of them contains a distinct vertex of $F^*$. This means that there will be no more such paths after at most $\alpha$ iterations and thus the algorithm will return YES as required.

Now suppose that every length-$t$-cut has size larger than $\alpha t$. Since we add at most $t$ vertices to $F$ in each iteration, at the beginning of iteration $\alpha + 1$ the set $F$ has size at most $\alpha t$, and hence $F$ is not a length-$t$-cut at the start of any iteration. Thus in every iteration some path $P$ of length at most $t$ exists, so the algorithm will return NO.
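Algorithm 2 can be sketched in a few lines. The names `short_path` and `lbc`, and the adjacency-map representation (a dictionary from each vertex to the set of its neighbours), are assumptions of this sketch. The function also returns the set $F$ it built, which is convenient for inspection but is not required by the decision problem.

```python
from collections import deque


def short_path(adj, u, v, t, removed):
    """Return a u-v path with at most t edges that avoids `removed`, or None."""
    parent = {u: None}
    depth = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if depth[node] == t:
            continue  # do not explore beyond t hops
        for nxt in adj[node]:
            if nxt not in parent and nxt not in removed:
                parent[nxt] = node
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return None


def lbc(adj, u, v, t, alpha):
    """Gap decision for LBC(t, alpha): returns (True, F) for YES, (False, F) for NO."""
    removed = set()
    for _ in range(alpha + 1):
        path = short_path(adj, u, v, t, removed)
        if path is None:
            return True, removed
        removed.update(path[1:-1])  # internal vertices only, never u or v
    return False, removed
```

With two internally disjoint two-hop paths between the terminals, the minimum length-3-cut has size 2, so the sketch answers YES for $\alpha = 2$. For $\alpha = 1$ the instance falls in the gap, where either answer is allowed, and this sketch answers NO.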

To handle edge fault-tolerance, we need to slightly change the definition of LBC($t, \alpha$) to be about edge sets rather than vertex sets, so in the algorithm $F$ is an edge set and we add the edges of $P$ rather than the vertices. But other than that trivial change, the algorithm and analysis are identical.

3.2 Modified Greedy

Let $G = (V, E)$ be an undirected unweighted graph. We will modify Algorithm 1 by using our new algorithm for LBC, Algorithm 2. For an EFT spanner algorithm, we simply use the edge-based version of Algorithm 2.

Algorithm 3 Modified Greedy VFT Spanner Algorithm
function FT-GREEDY(G = (V, E), k, f)
    H ← (V, ∅)
    for all {u, v} ∈ E in arbitrary order do
        if Algorithm 2 returns YES when run on input graph H with terminals u, v and t = 2k−1 and α = f then
            Add {u, v} to H
        end if
    end for
    return H
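A minimal sketch of Algorithm 3 is given below, with the LBC subroutine inlined so that the block is self-contained. The function names and the edge-list input format are assumptions of this example; the edges are processed in whatever order the input list provides.

```python
from collections import deque


def find_short_path(adj, u, v, t, removed):
    """Return a u-v path with at most t edges avoiding `removed`, or None."""
    parent, depth, queue = {u: None}, {u: 0}, deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if depth[node] < t:
            for nxt in adj[node]:
                if nxt not in parent and nxt not in removed:
                    parent[nxt], depth[nxt] = node, depth[node] + 1
                    queue.append(nxt)
    return None


def lbc_yes(adj, u, v, t, alpha):
    """Algorithm 2: True means YES (a small length-t-cut was found)."""
    removed = set()
    for _ in range(alpha + 1):
        path = find_short_path(adj, u, v, t, removed)
        if path is None:
            return True
        removed.update(path[1:-1])
    return False


def modified_greedy(vertices, edges, k, f):
    """Polynomial-time greedy f-VFT (2k-1)-spanner for unweighted graphs."""
    adj = {x: set() for x in vertices}  # the spanner H built so far
    spanner = []
    for u, v in edges:
        if lbc_yes(adj, u, v, 2 * k - 1, f):
            adj[u].add(v)
            adj[v].add(u)
            spanner.append((u, v))
    return spanner
```

On the complete graph with four vertices and $k = 2$, this sketch keeps a star (3 edges) for $f = 0$ and 5 of the 6 edges for $f = 1$ when the edges are given in lexicographic order.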

We first prove that this algorithm does indeed return a valid solution, despite the use of an approximation algorithm to determine whether or not to add an edge (we prove this only for VFT for simplicity, but the proof for EFT is analogous).

Theorem 5. Algorithm 3 returns an $f$-VFT $(2k-1)$-spanner.

Proof. Let $F \subseteq V$ be an arbitrary fault set with $|F| \le f$ and $\{u, v\} \in E$ with $u, v \notin F$. By Lemma 3, we just need to show that $d_{H \setminus F}(u, v) \le 2k-1$ (since $G$ is unweighted) in order to prove the theorem. Clearly this is true if $\{u, v\} \in E(H)$. If $\{u, v\} \notin E(H)$, then when the algorithm considered $\{u, v\}$ it must have been the case that Algorithm 2 returned NO. Theorem 4 then implies that every length-$(2k-1)$-cut (for $u, v$) in the subgraph built at that time has size larger than $f$. Since that subgraph is contained in the final $H$ and adding edges can only decrease distances, the same holds for $H$. Thus $F$ is not a length-$(2k-1)$-cut in $H$ for $u, v$, and so $d_{H \setminus F}(u, v) \le 2k-1$.

Now we want to bound the size of the returned spanner. To do this, a natural approach would be to argue that the spanner it returns is a subgraph of the greedy $((2k-1)f)$-VFT spanner, since it seems like whenever our modified algorithm requires us to add an edge it has found a cut certifying that the greedy $((2k-1)f)$-VFT spanner would also have had to add that edge. Unfortunately, this is not true since the modified algorithm might not add some edges that the true greedy algorithm would have added, and thus later on our algorithm might have to actually add some edges that the true greedy algorithm would not have had to add.

The next natural approach would be to try to use the analysis of [BP19] as a black box. Unfortunately we cannot do this either, since the lemmas they use are specific to the true greedy algorithm rather than our modification. However, it is straightforward to modify their analysis so that it continues to hold for our modified algorithm, with only an additional loss of a factor of $k$. We do this here for completeness. As in [BP19], we start with the definition of a blocking set, and then give two lemmas using this definition. Also as in [BDPW18, BP19], we only prove this for VFT, as the proof for EFT is essentially identical.

Definition 2 ([BP19]). For any graph $G = (V, E)$, we define $B \subseteq V \times E$ to be a $t$-blocking set of $G$ if for all $(v, e) \in B$, we have $v \notin e$ and for any cycle $C$ in $G$ with $|C| \le t$, there exists $(v, e) \in B$ such that $v, e \in C$.

Lemma 6. Any graph $H$ returned by Algorithm 3 with parameters $k, f$ has a $(2k)$-blocking set of size at most $(2k-1) f |E(H)|$.

It was shown in [BP19] that the graph $H$ returned by the standard VFT greedy algorithm with parameters $k, f$ has a $(2k)$-blocking set of size at most $f |E(H)|$.² So our modified algorithm satisfies the same lemma up to a factor of $O(k)$. The proof is almost identical in our case; we essentially replace all instances of $f$ in their proof with $(2k-1)f$.

Proof of Lemma 6. Let $e = \{u, v\}$ be some edge in $E(H)$, and let $H'$ be the subgraph maintained by the algorithm just before $e$ is added to $E(H)$ (so $H'$ is a subgraph of the final $H$). Since $e$ was added by Algorithm 3, when it was considered Algorithm 2 must have returned YES. Thus, as in the proof of Theorem 4, the set built by Algorithm 2 is some $F_e \subseteq V \setminus \{u, v\}$ with $|F_e| \le f(2k-1)$ such that $d_{H' \setminus F_e}(u, v) > 2k-1$. Now we can define the blocking set: let $B = \{(x, e) : e \in E(H), x \in F_e\}$.

Since $|F_e| \le f(2k-1)$ for all $e \in E(H)$, we immediately get that $|B| \le |E(H)| f (2k-1)$ as claimed. So we now need to show that $B$ is a $(2k)$-blocking set. To see this, let $C$ be any cycle with at most $2k$ vertices in $H$, and let $e = \{u, v\}$ be the last edge of this cycle to be added to $H$. Let $H'$ be the subgraph of $H$ built by the algorithm just before $e$ is added. Then $C \setminus e$ is a $u$-$v$ path in $H'$ of length at most $2k-1$, and thus there is some $x \in C \setminus \{u, v\}$ that is in $F_e$. Thus $(x, e) \in B$.

Now we know that the spanner returned by Algorithm 3 has a small blocking set. The next lemma implies that any such graph must have a dense but high-girth subgraph.

Lemma 7. Let $H$ be any graph on $n$ nodes and $m$ edges (with $f = o(n)$) that has a $(2k)$-blocking set $B$ of size at most $(2k-1)fm$. Then $H$ has a subgraph on $O(n/(kf))$ nodes and $\Omega(m/(kf)^2)$ edges that has girth greater than $2k$.

Proof. Let $H'$ denote the induced subgraph of $H$ on a uniformly random subset of exactly $\lfloor n/(2(2k-1)f) \rfloor$ nodes. Let $B' := B \cap (V(H') \times E(H'))$, and let $H''$ denote the graph obtained by removing from $H'$ every edge contained in any pair in $B'$. The graph $H''$ will be the one we analyze.

The easiest property to analyze is the number of nodes in $H''$: there are precisely $\lfloor n/(2(2k-1)f) \rfloor$ vertices in $H''$, which is $O(n/(kf))$ as claimed.

The next easiest property of $H''$ to prove is the girth. Let $C$ be a cycle in $H$ with at most $2k$ nodes. $C$ is either in $H'$ or it is not. If it is not in $H'$ then some vertex in $C$ is not in $V(H')$, and thus $C$ is not in $H''$. On the other hand, if $C$ is in $H'$ then by the definition of $B$ there is some pair $(x, e) \in B$ with $x, e \in C$. Both $x$ and $e$ then lie in $H'$, so $(x, e) \in B'$, the edge $e$ is removed, and thus $C$ does not exist in $H''$.

To analyze $|E(H'')|$, we start with the following observations.

• Each $\{u, v\} \in E(H)$ remains in $E(H')$ if $u, v \in V(H')$. This happens with probability

\[ \frac{\lfloor n/(2(2k-1)f) \rfloor}{n} \cdot \frac{\lfloor n/(2(2k-1)f) \rfloor - 1}{n-1} \ge (1 - o(1)) \frac{1}{4((2k-1)f)^2}. \]

• Each $(x, \{u, v\}) \in B$ remains in $B'$ if $u, v, x \in V(H')$. This happens with probability

\[ \frac{\lfloor n/(2(2k-1)f) \rfloor}{n} \cdot \frac{\lfloor n/(2(2k-1)f) \rfloor - 1}{n-1} \cdot \frac{\lfloor n/(2(2k-1)f) \rfloor - 2}{n-2} \le \frac{1}{8((2k-1)f)^3}. \]

² In [BP19] the parameter “k” is used to denote the stretch, while for us the stretch is $2k-1$, and thus there are slight constant factor differences between the statements as written in [BP19] and our interpretation of their statements. But our statements about [BP19] are correct under this change of variables.

Now we can use these observations to compute the expected size of $E(H'')$:

\[
\begin{aligned}
\mathbb{E}[|E(H'')|] &\ge \mathbb{E}[|E(H')| - |B'|] = \mathbb{E}[|E(H')|] - \mathbb{E}[|B'|] \\
&\ge (1 - o(1)) \frac{|E(H)|}{4((2k-1)f)^2} - \frac{|B|}{8((2k-1)f)^3} \\
&\ge (1 - o(1)) \frac{m}{4((2k-1)f)^2} - \frac{(2k-1)fm}{8((2k-1)f)^3} \\
&\ge (1 - o(1)) \frac{m}{4((2k-1)f)^2} - \frac{m}{8((2k-1)f)^2} \\
&= (1 - o(1)) \frac{m}{8((2k-1)f)^2} \\
&= \Omega\left(\frac{m}{(kf)^2}\right).
\end{aligned}
\]

Note that the bounds on $|V(H'')|$ and on the girth of $H''$ are deterministic. So there is some subgraph which has those bounds and where the number of edges is at least the expectation, proving the lemma.

This lemma allows us to prove the size bound.

Theorem 8. The subgraph $H$ returned by Algorithm 3 has at most $O(k f^{1-1/k} n^{1+1/k})$ edges.

Proof. If $f = \Omega(n)$ then the theorem is trivially true. Otherwise, by Lemmas 6 and 7 we know that $H$ has a subgraph $S$ of girth larger than $2k$ on $O(n/(kf))$ nodes and with $|E(S)| \ge \Omega(|E(H)|/(kf)^2)$ edges. But it has long been known that any graph with $n$ vertices and girth larger than $2k$ must have at most $O(n^{1+1/k})$ edges (this is the key fact used in the original non-fault-tolerant greedy algorithm analysis [ADD+93]). Hence $|E(S)| \le O((n/(kf))^{1+1/k})$. Therefore there are constants $c_1, c_2 > 0$ such that for large enough $n$,

\[ c_1 \left(\frac{n}{kf}\right)^{1+1/k} \ge |E(S)| \ge c_2 \frac{|E(H)|}{(kf)^2} \implies |E(H)| \le O\left((kf)^{1-1/k} n^{1+1/k}\right) = O\left(k f^{1-1/k} n^{1+1/k}\right). \]
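For readers who want the algebra behind this last implication spelled out, the following worked step may help. It only rearranges the two-sided bound on $|E(S)|$ and uses $k \ge 1$, so that $k^{1-1/k} \le k$; the constants $c_1, c_2$ are the ones from the proof.

\[
\begin{aligned}
c_2 \frac{|E(H)|}{(kf)^2} \le c_1 \left(\frac{n}{kf}\right)^{1+1/k}
&\implies |E(H)| \le \frac{c_1}{c_2} (kf)^{2} (kf)^{-(1+1/k)} n^{1+1/k} \\
&= \frac{c_1}{c_2} (kf)^{1-1/k} n^{1+1/k} \\
&= \frac{c_1}{c_2} k^{1-1/k} f^{1-1/k} n^{1+1/k} \\
&\le \frac{c_1}{c_2} k f^{1-1/k} n^{1+1/k}.
\end{aligned}
\]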

Theorem 9. The worst-case running time of Algorithm 3 is at most $O(m k f^{2-1/k} n^{1+1/k})$.

Proof. Algorithm 3 has $|E| = m$ iterations, each of which consists of one call to Algorithm 2 with $\alpha = f$ on graph $H$. So the running time of each iteration (by Theorem 4) is at most $O((|E(H)| + n) f)$. Theorem 8 implies that $|E(H)| \le O(k f^{1-1/k} n^{1+1/k})$, and thus the total running time is at most $O(m k f^{2-1/k} n^{1+1/k})$.

Theorems 5, 8, and 9 together imply Theorem 2 in the unweighted case.


4 Weighted Graphs

We now show that we can use the algorithm we designed for the unweighted setting even in the

presence of weights. Our algorithm is very simple: we order the edges in nondecreasing weight

order, but then run the unweighted algorithm on the edges in this order. We give this algorithm

more formally as Algorithm 4. Again, changing to edge fault-tolerance is straightforward: we just

use the edge version of Algorithm 2. So we prove this only for vertex fault-tolerance for simplicity.

Algorithm 4 Modified Greedy VFT Spanner Algorithm (Weighted)
function FT-GREEDY(G = (V, E, w), k, f)
    H ← (V, ∅, w)
    for all {u, v} ∈ E in nondecreasing weight order do
        if Algorithm 2 returns YES when run on input graph H (with no weights) with terminals u, v and t = 2k−1 and α = f then
            Add {u, v} to H
        end if
    end for
    return H

Theorem 10. Algorithm 4 returns an $f$-VFT $(2k-1)$-spanner with at most $O(k f^{1-1/k} n^{1+1/k})$ edges in time at most $O(m k f^{2-1/k} n^{1+1/k})$.

Proof. The running time is directly from Theorem 9, since the only additional step in the algorithm is sorting the edges by weight, which takes only $O(m \log m)$ additional time. The size also follows directly from Theorem 8, since Algorithm 4 is just a particular instantiation of Algorithm 3 where the ordering (which is unspecified in Algorithm 3) is determined by the weights. In other words, Theorem 8 holds for an arbitrary order, so it certainly holds for the weight ordering.

The more interesting part of this theorem is correctness: why does this algorithm return an $f$-VFT $(2k-1)$-spanner despite ignoring weights? Let $F \subseteq V$ be an arbitrary fault set with $|F| \le f$ and $\{u, v\} \in E$ with $u, v \notin F$ and $d_{G \setminus F}(u, v) = w(u, v)$. By Lemma 3, we just need to show that $d_{H \setminus F}(u, v) \le (2k-1) w(u, v)$ in order to prove the theorem. Clearly this is true if $\{u, v\} \in E(H)$. So suppose that $\{u, v\} \notin E(H)$. Then when the algorithm considered $\{u, v\}$ it must have been the case that Algorithm 2 returned NO, and hence by Theorem 4 every length-$(2k-1)$-cut in $H$ (unweighted) for $u, v$ has size larger than $f$ and so $F$ is not such a cut. Thus at the time the algorithm was considering $\{u, v\}$, there was some path $P$ between $u$ and $v$ in $H \setminus F$ with at most $2k-1$ edges. But since we considered edges in order of nondecreasing weight, every edge in $P$ has weight at most $w(u, v)$. Thus

\[ d_{H \setminus F}(u, v) \le \sum_{e \in P} w(e) \le \sum_{e \in P} w(u, v) = |P| \, w(u, v) \le (2k-1) w(u, v), \]

as required.
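The weighted variant differs from the unweighted one only in the order in which edges are processed. A minimal sketch, assuming edges are given as `(u, v, weight)` triples and using hypothetical function names, is:

```python
from collections import deque


def hop_path(adj, u, v, t, removed):
    """Return a u-v path with at most t edges avoiding `removed`, or None."""
    parent, depth, queue = {u: None}, {u: 0}, deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if depth[node] < t:
            for nxt in adj[node]:
                if nxt not in parent and nxt not in removed:
                    parent[nxt], depth[nxt] = node, depth[node] + 1
                    queue.append(nxt)
    return None


def lbc_says_yes(adj, u, v, t, alpha):
    """Algorithm 2 run on the unweighted version of the current spanner."""
    removed = set()
    for _ in range(alpha + 1):
        path = hop_path(adj, u, v, t, removed)
        if path is None:
            return True
        removed.update(path[1:-1])
    return False


def modified_greedy_weighted(vertices, weighted_edges, k, f):
    """Sort by weight, then ignore weights when deciding whether to add an edge."""
    adj = {x: set() for x in vertices}
    spanner = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        if lbc_says_yes(adj, u, v, 2 * k - 1, f):
            adj[u].add(v)
            adj[v].add(u)
            spanner.append((u, v, w))
    return spanner
```

The sort matters: on a triangle with weights 1, 1, and 5, processing the heavy edge first would leave one unit-weight edge spanned only by a two-hop path of total weight 6, which exceeds the allowed stretch of 3. Sorting first means every edge on a detour is no heavier than the edge it replaces.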


5 Distributed Algorithms

In this section we give efficient randomized algorithms to compute fault-tolerant spanners of weighted graphs in two standard distributed models: the LOCAL model and the CONGEST model [Pel00]. Recall that in both models we assume communication happens in synchronous rounds, and our goal is to minimize the number of rounds needed. In the LOCAL model each node can send an arbitrary message on each incident edge in each round, while in the CONGEST model these messages must have size at most $O(\log n)$ bits (or $O(1)$ words, so we can send a constant number of node IDs and weights in each message). Note that both models allow unlimited computation at each node, and hence the difficulty with applying the greedy algorithm is not the exponential running time, but its inherently sequential nature.

5.1 LOCAL

In the LOCAL model we will be able to implement the greedy algorithm at only a small extra

cost in the size of the spanner. Our approach is simple: we use standard network decompositions

to decompose the graph into clusters, run the greedy algorithm in each cluster, and then take the

union of the spanner for each cluster.

The following theorem is a simple corollary of the construction of “padded decompositions”

given explicitly in previous work on fault-tolerant spanners [DK11]. It also appears implicitly in

various forms in [LS93,Bar96,MPX13,MPVX15] (among others). In what follows, the hop diameter

of a cluster refers to its unweighted diameter.

Theorem 11. There is an algorithm in the LOCAL model which runs in $O(\log n)$ rounds and constructs $P_1, P_2, \dots, P_\ell$ such that:

1. Each $P_i$ is a partition of $V$, with each part of the partition referred to as a cluster. Let $\mathcal{C} = \bigcup_{i=1}^{\ell} P_i$ be the collection of all clusters of all $\ell$ partitions.

2. Each cluster has hop diameter at most $O(\log n)$ and contains some special node known as the cluster center.

3. $\ell = O(\log n)$ (there are $O(\log n)$ partitions).

4. With high probability ($1 - 1/n^c$ for any constant $c$) for every edge $e \in E$ there is a cluster $C \in \mathcal{C}$ such that $e \subseteq C$.

With this tool, it is easy to describe our algorithm. First we use Theorem 11 to construct the partitions. Then in each cluster $C$ we gather at the cluster center the entire subgraph $G[C]$ induced by that cluster. Each cluster center uses the greedy algorithm (Algorithm 1) on $G[C]$ to construct an $f$-VFT $(2k-1)$-spanner $H_C$ of $G[C]$, and then sends out the selected edges to the nodes in $C$. Let $H$ be the final subgraph created (the union of the edges of each $H_C$).

Theorem 12. With high probability, $H$ is an $f$-VFT $(2k-1)$-spanner of $G$ with at most $O(f^{1-1/k} n^{1+1/k} \log n)$ edges and the algorithm terminates in $O(\log n)$ rounds.

Proof. The round complexity is obvious from the round complexity and cluster hop diameter bounds in Theorem 11.

The total number of edges added is at most

\[
\begin{aligned}
\sum_{i=1}^{\ell} \sum_{C \in P_i} |E(H_C)| &\le \sum_{i=1}^{\ell} \sum_{C \in P_i} f^{1-1/k} |V(H_C)|^{1+1/k} \\
&= f^{1-1/k} \sum_{i=1}^{\ell} \sum_{C \in P_i} |C|^{1+1/k} \\
&\le f^{1-1/k} \sum_{i=1}^{\ell} n^{1+1/k} \\
&= O\left(f^{1-1/k} n^{1+1/k} \log n\right),
\end{aligned}
\]

where we used the size bound on the greedy algorithm from [BP19] and the fact from Theorem 11 that each $P_i$ is a partition of $V$.

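To show correctness, consider some $\{u, v\} \in E$ and $F \subseteq V$ with $|F| \le f$ and $u, v \notin F$ so that $d_{G \setminus F}(u, v) = w(u, v)$. By Lemma 3, we just need to prove that $d_{H \setminus F}(u, v) \le (2k-1) w(u, v)$. Let $C \in \mathcal{C}$ be a cluster which contains both $u$ and $v$, which we know exists (with high probability) from Theorem 11. Let $F_C = F \cap C$, so that $|F_C| \le |F| \le f$. Then

$$\begin{aligned}
d_{H \setminus F}(u, v) &\le d_{H_C \setminus F_C}(u, v) \\
&\le (2k-1) \cdot d_{G[C] \setminus F_C}(u, v) && \text{(definition of } H_C\text{)} \\
&\le (2k-1) \cdot w(u, v) && (\{u, v\} \in E(G[C] \setminus F_C)).
\end{aligned}$$

Thus $H$ is indeed an $f$-VFT $(2k-1)$-spanner of $G$.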
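For small instances, the condition used in this proof can be checked by brute force. The sketch below is an illustration under a simplifying assumption: the graph is unweighted, so every surviving edge $\{u, v\}$ has $d_{G \setminus F}(u, v) = w(u, v) = 1$ and the check reduces to hop distances in $H \setminus F$. It relies on the reduction to edges that the proof attributes to Lemma 3, and it takes exponential time in $f$, so it is a testing aid rather than an algorithm from the paper. All function names are ours.

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    """Build an adjacency map from a list of (u, v) undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distance(adj, source, target, removed):
    """Hop distance from source to target avoiding `removed`; inf if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt in removed or nxt in seen:
                continue
            if nxt == target:
                return dist + 1
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return float("inf")


def is_vft_spanner(vertices, g_edges, h_edges, k, f):
    """Check d_{H\\F}(u, v) <= 2k-1 for every fault set F with |F| <= f
    and every edge {u, v} of G that avoids F (unweighted graphs only)."""
    h_adj = adjacency(h_edges)
    stretch = 2 * k - 1
    for size in range(f + 1):
        for fault_set in combinations(vertices, size):
            removed = set(fault_set)
            for u, v in g_edges:
                if u in removed or v in removed:
                    continue
                if bfs_distance(h_adj, u, v, removed) > stretch:
                    return False
    return True
```

For example, on a 4-cycle with one edge dropped, the remaining path is a 3-spanner with no faults but fails as soon as a single interior vertex may be removed.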

5.2 CONGEST

Unfortunately, the approach we used in the LOCAL model does not carry over to the CONGEST model, since we cannot efficiently gather the entire topology of a cluster at a single node. We will instead use the fault-tolerant spanner of Dinitz and Krauthgamer [DK11], rather than the greedy algorithm, and combine it with the non-fault-tolerant spanner of [BS07], which can be efficiently constructed in CONGEST. This approach means that, unlike in the centralized setting or the LOCAL model, we will not be able to get size-optimal fault-tolerant spanners.

The algorithm of [DK11] works as follows (in the traditional centralized model). Suppose that we have some algorithm $A$ which constructs a $(2k-1)$-spanner with at most $g(n)$ edges on any graph with $n$ nodes. The algorithm of [DK11] consists of $O(f^3 \log n)$ iterations, and in each iteration every node chooses to participate independently with probability $1/f$. For each iteration $i$, let $V_i$ be the set of vertices that participate and let $G_i$ be the subgraph of $G$ induced by them. We let $H_i$ be the $(2k-1)$-spanner constructed by $A$ on $G_i$. Then we return the union of all $H_i$.
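The sampling scheme just described can be sketched in a few lines. This is a minimal centralized illustration, not the implementation of [DK11]: `base_spanner` is a hypothetical callback standing in for the algorithm $A$, and because the text only specifies $O(f^3 \log n)$ iterations, the constant `c` in `iteration_count` is an assumption chosen purely for illustration.

```python
import math
import random


def iteration_count(n, f, c=1.0):
    """Hypothetical concrete choice for the O(f^3 log n) iteration count;
    the constant c is an assumption, not a value from the paper."""
    return max(1, math.ceil(c * f**3 * math.log(max(n, 2))))


def dk11_union(vertices, edges, f, base_spanner, iterations, seed=0):
    """Union of base spanners over random induced subgraphs.

    vertices:     iterable of vertex ids
    edges:        set of (u, v) undirected edges of G
    base_spanner: hypothetical callback (vertex_set, edge_set) -> edge set,
                  standing in for the non-fault-tolerant algorithm A
    """
    rng = random.Random(seed)
    spanner = set()
    for _ in range(iterations):
        # Each vertex joins this iteration independently with probability 1/f.
        sample = {v for v in vertices if rng.random() < 1.0 / f}
        # G_i is the subgraph of G induced by the participating vertices.
        induced = {(u, v) for (u, v) in edges if u in sample and v in sample}
        spanner |= set(base_spanner(sample, induced))
    return spanner
```

With $f = 1$ every vertex participates in every iteration, so the routine degenerates to running `base_spanner` on all of $G$.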

The main theorem that [DK11] proved about this is the following.

Theorem 13 ([DK11]). This algorithm returns an $f$-VFT $(2k-1)$-spanner of $G$ with $O(f^3 g(2n/f) \log n)$ edges with high probability.

Note that when $g(n) = n^{1+1/k}$, this results in an $f$-VFT $(2k-1)$-spanner with at most $O(f^{2-1/k} n^{1+1/k} \log n)$ edges, which is precisely the bound from [DK11].
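For completeness, the substitution behind this bound can be written out as follows (our expansion of the arithmetic, using only the expression in Theorem 13 and the fact that $2^{1+1/k} \le 4$ for $k \ge 1$):

```latex
f^3 \cdot g\!\left(\frac{2n}{f}\right) \cdot \log n
  = f^3 \left(\frac{2n}{f}\right)^{1+1/k} \log n
  = 2^{1+1/k} \, f^{3-(1+1/k)} \, n^{1+1/k} \log n
  = O\!\left(f^{2-1/k} \, n^{1+1/k} \log n\right).
```

The same calculation with $g(n) = O(k n^{1+1/k})$ simply carries an extra factor of $k$ through to the final bound.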


Since the algorithm of [DK11] uses an arbitrary non-fault-tolerant spanner algorithm $A$, by using a distributed spanner algorithm for $A$ we naturally end up with a distributed fault-tolerant spanner algorithm. In particular, we will combine the algorithm of [DK11] with the following algorithm due to Baswana and Sen [BS07].

Theorem 14 ([BS07]). There is an algorithm that computes a $(2k-1)$-spanner with at most $O(k n^{1+1/k})$ edges of any weighted graph in $O(k^2)$ rounds in the CONGEST model.

Combining Theorems 13 and 14 immediately gives an algorithm in CONGEST that returns an $f$-VFT $(2k-1)$-spanner of size at most $O(k f^{2-1/k} n^{1+1/k} \log n)$ and runs in at most $O(k^2 f^3 \log n)$ rounds (with high probability). We can just run the iterations of the Dinitz-Krauthgamer algorithm [DK11] in series, using the Baswana-Sen algorithm [BS07] in each iteration. Since there are $O(f^3 \log n)$ iterations, and Baswana-Sen takes $O(k^2)$ rounds, this gives a total round complexity of $O(k^2 f^3 \log n)$.

We can improve on this bound by taking advantage of the fact that each iteration of Dinitz-

Krauthgamer runs on a relatively small graph (approximately n/f nodes), so we can run some of

these iterations in parallel.

Theorem 15. There is an algorithm that, for any weighted graph $G$, computes an $f$-VFT $(2k-1)$-spanner of $G$ with $O(k f^{2-1/k} n^{1+1/k} \log n)$ edges and which runs in $O(f^2 (\log f + \log\log n) + k^2 f \log n)$ rounds in the CONGEST model (all with high probability).

Proof. In the first phase of the algorithm each vertex randomly selects the iterations in which it participates, choosing each of the $O(f^3 \log n)$ iterations independently with probability $1/f$. So by a Chernoff bound, with high probability every node picks $O(f^2 \log n)$ iterations in which to participate. Then each vertex sends its chosen iterations to all of its neighbors. Identifying these iterations takes $O(f^2 \log n \cdot \log(f^3 \log n)) = O(f^2 \log n \cdot (\log f + \log\log n))$ bits, and thus $O(f^2 (\log f + \log\log n))$ rounds in CONGEST, since each message can carry $\Theta(\log n)$ bits.

After this has completed we enter the second phase of the algorithm, and now every node knows which iterations it is participating in and which iterations each of its neighbors is participating in. With high probability (by a simple Chernoff bound), for every edge there are at most $O(f \log n)$ iterations in which both endpoints participate. Thus if we try to run all $O(f^3 \log n)$ iterations of Baswana-Sen (Theorem 14) in parallel, we have “congestion” of $O(f \log n)$ on each edge (at each time step), since there could be up to that many iterations in which a message is supposed to be sent along that edge at that time. Thus we can simply use $O(f \log n)$ time steps for each time step of Baswana-Sen and can simulate all $O(f^3 \log n)$ iterations of the Dinitz-Krauthgamer algorithm. (Each Baswana-Sen message needs to have a tag added to it with the iteration number, but since that takes at most $O(\log(f^3 \log n)) = O(\log f + \log\log n) \le O(\log n)$ bits, it fits within the required message size.) Hence the total running time of this second phase is at most $O(k^2 f \log n)$.

The size and correctness bounds are direct from Theorems 13 and 14, and the round complexity

is from our analysis of the two phases above.
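The two quantities that drive this analysis, the number of iterations chosen by a vertex and the number of iterations shared by both endpoints of an edge, are easy to observe empirically. The sketch below simulates only the random choices of the first phase; it is an illustration, not part of the paper, and the function names and the `seed` parameter are our own. With $T$ iterations, a vertex joins about $T/f$ of them in expectation and a fixed edge has about $T/f^2$ shared iterations, which for $T = O(f^3 \log n)$ matches the $O(f^2 \log n)$ and $O(f \log n)$ bounds used in the proof.

```python
import random


def choose_iterations(vertices, f, num_iterations, seed=0):
    """Each vertex picks each iteration independently with probability 1/f."""
    rng = random.Random(seed)
    return {
        v: {i for i in range(num_iterations) if rng.random() < 1.0 / f}
        for v in vertices
    }


def max_load(choices):
    """Largest number of iterations chosen by any single vertex."""
    return max(len(chosen) for chosen in choices.values())


def max_edge_congestion(edges, choices):
    """Largest number of iterations in which both endpoints of an edge participate."""
    return max(len(choices[u] & choices[v]) for u, v in edges)
```

The value returned by `max_edge_congestion` is the slowdown factor needed when all iterations are simulated in parallel, since it bounds how many per-iteration messages may compete for the same edge in one time step.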

6 Conclusion and Future Work

In this paper we designed an algorithm to compute nearly-optimal fault-tolerant spanners in polyno-

mial time, answering a question posed by [BDPW18,BP19]. We also gave an optimal construction

in the LOCAL model which runs in O(log n) rounds, and an eﬃcient algorithm in the CONGEST


model that constructs fault-tolerant spanners which have the same size as in [DK11] rather than

the optimal size.

There are many interesting open questions remaining about eﬃcient algorithms for fault-tolerant

spanners, as well as about the extremal properties of these spanners. Most obviously, the size we

achieve is a factor of k away from the optimal size, due to our use of an O(k)-approximation

for Length-Bounded Cut. Can this be removed, either by giving a better approximation for

Length-Bounded Cut or through some other construction? While k is somewhat small since

spanners tend to be most useful for constant stretch (and never have stretch larger than O(log n)),

it would still be nice to get fully optimal size in polynomial time. Similarly, our distributed

constructions are extremely simple, and there is no reason to think that we actually need Ω(log n)

rounds in LOCAL or that we cannot get optimal size fault-tolerant spanners in CONGEST. It would

be interesting to design better distributed and parallel algorithms for these objects, particularly

since the greedy algorithm (the only size-optimal algorithm we know) tends to be diﬃcult to

parallelize.

From a structural point of view, we reiterate one of the main open questions from [BDPW18]

and [BP19]: understanding the optimal bounds for edge-fault-tolerant spanners. The best upper

bound we have is the same $O(f^{1-1/k} n^{1+1/k})$ that we have for the vertex case, while the best lower

bound is $\Omega(f^{\frac{1}{2}(1-1/k)} n^{1+1/k})$ (from [BDPW18]). What is the correct bound?

References

[ADD+93] Ingo Althöfer, Gautam Das, David P. Dobkin, Deborah Joseph, and José Soares. On

sparse spanners of weighted graphs. Discrete & Computational Geometry, 9:81–100,

1993.

[Bar96] Y. Bartal. Probabilistic approximation of metric spaces and its algorithmic applications.

In Proceedings of 37th Conference on Foundations of Computer Science, pages 184–193,

Oct 1996.

[BBG+14] Piotr Berman, Arnab Bhattacharyya, Elena Grigorescu, Sofya Raskhodnikova, David P.

Woodruﬀ, and Grigory Yaroslavtsev. Steiner transitive-closure spanners of low-

dimensional posets. Combinatorica, 34(3):255–277, 2014.

[BDPW18] Greg Bodwin, Michael Dinitz, Merav Parter, and Virginia Vassilevska Williams. Op-

timal vertex fault tolerant spanners (for ﬁxed stretch). In Artur Czumaj, editor, Pro-

ceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms,

SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 1884–1900. SIAM,

2018.

[BEH+06] Georg Baier, Thomas Erlebach, Alexander Hall, Ekkehard Köhler, Heiko Schilling, and

Martin Skutella. Length-bounded cuts and ﬂows. In Michele Bugliesi, Bart Preneel,

Vladimiro Sassone, and Ingo Wegener, editors, Automata, Languages and Program-

ming, pages 679–690, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.

[BGJ+09] Arnab Bhattacharyya, Elena Grigorescu, Kyomin Jung, Sofya Raskhodnikova, and

David P. Woodruﬀ. Transitive-closure spanners. In Proceedings of the Twentieth Annual

ACM-SIAM Symposium on Discrete Algorithms, SODA ’09, pages 932–941, 2009.


[BK15] András A. Benczúr and David R. Karger. Randomized approximation schemes for cuts

and ﬂows in capacitated graphs. SIAM J. Comput., 44(2):290–319, 2015.

[BKM09] Glencora Borradaile, Philip Klein, and Claire Mathieu. An O(n log n) approximation

scheme for Steiner tree in planar graphs. ACM Trans. Algorithms, 5(3):31:1–31:31, July

2009.

[BP19] Greg Bodwin and Shyamal Patel. A trivial yet optimal solution to vertex fault tolerant

spanners. In Proceedings of the 2019 ACM Symposium on Principles of Distributed

Computing, PODC '19, pages 541–543, New York, NY, USA, 2019. Association for

Computing Machinery.

[BS07] Surender Baswana and Sandeep Sen. A simple and linear time randomized algorithm for

computing sparse spanners in weighted graphs. Random Struct. Algorithms, 30(4):532–

563, 2007.

[BSS14] Joshua D. Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan

sparsifiers. SIAM Review, 56(2):315–334, 2014.

[CLPR10] Shiri Chechik, Michael Langberg, David Peleg, and Liam Roditty. Fault tolerant span-

ners for general graphs. SIAM J. Comput., 39(7):3403–3423, 2010.

[CZ04] Artur Czumaj and Hairong Zhao. Fault-tolerant geometric spanners. Discrete & Com-

putational Geometry, 32(2):207–230, 2004.

[DK11] Michael Dinitz and Robert Krauthgamer. Fault-tolerant spanners: better and sim-

pler. In Proceedings of the 30th Annual ACM Symposium on Principles of Distributed

Computing, PODC 2011, San Jose, CA, USA, June 6-8, 2011, pages 169–178, 2011.

[DKN17] Michael Dinitz, Guy Kortsarz, and Zeev Nutov. Improved approximation algorithm

for steiner k-forest with nearly uniform weights. ACM Trans. Algorithms, 13(3), July

2017.

[Erd64] Paul Erdős. Extremal problems in graph theory. In Theory of Graphs and its

Applications, Proc. Sympos. Smolenice. Citeseer, 1964.

[LNS98] Christos Levcopoulos, Giri Narasimhan, and Michiel Smid. Eﬃcient algorithms for

constructing fault-tolerant geometric spanners. In Proceedings of the Thirtieth Annual

ACM Symposium on Theory of Computing, pages 186–195. ACM, 1998.

[LS93] Nathan Linial and Michael E. Saks. Low diameter graph decompositions. Combina-

torica, 13(4):441–454, 1993.

[Luk99] Tamas Lukovszki. New results on fault tolerant geometric spanners. Algorithms and

Data Structures, pages 774–774, 1999.

[MPVX15] Gary L Miller, Richard Peng, Adrian Vladu, and Shen Chen Xu. Improved parallel

algorithms for spanners and hopsets. In Proceedings of the Symposium on Parallelism

in Algorithms and Architectures. ACM, 2015.


[MPX13] Gary L Miller, Richard Peng, and Shen Chen Xu. Parallel graph decompositions using

random shifts. In Proceedings of the ACM Symposium on Parallelism in algorithms

and architectures. ACM, 2013.

[NS07] Giri Narasimhan and Michiel Smid. Geometric Spanner Networks. Cambridge Univer-

sity Press, 2007.

[Pel00] David Peleg. Distributed computing: a locality-sensitive approach. SIAM, 2000.

[PS89] David Peleg and Alejandro A. Schäffer. Graph spanners. Journal of Graph Theory,

13(1):99–116, 1989.

[PU89] David Peleg and Jeﬀrey D. Ullman. An optimal synchronizer for the hypercube. SIAM

J. Comput., 18(4):740–747, 1989.

[SS11] Daniel A. Spielman and Nikhil Srivastava. Graph sparsiﬁcation by eﬀective resistances.

SIAM J. Comput., 40(6):1913–1926, 2011.

[TZ01] Mikkel Thorup and Uri Zwick. Compact routing schemes. In SPAA, pages 1–10, 2001.

[TZ05] Mikkel Thorup and Uri Zwick. Approximate distance oracles. J. ACM, 52(1):1–24,

2005.
