Article

Single Pass Spectral Sparsification in Dynamic Streams

Authors:
Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, Aaron Sidford

Abstract

We present the first single pass algorithm for computing spectral sparsifiers of graphs in the dynamic semi-streaming model. Given a single pass over a stream containing insertions and deletions of edges to a graph $G$, our algorithm maintains a randomized linear sketch of the incidence matrix of $G$ into dimension $O(\frac{1}{\epsilon^2} n \,\mathrm{polylog}(n))$. Using this sketch, at any point, the algorithm can output a $(1 \pm \epsilon)$ spectral sparsifier for $G$ with high probability. While $O(\frac{1}{\epsilon^2} n \,\mathrm{polylog}(n))$ space algorithms are known for computing *cut sparsifiers* in dynamic streams [AGM12b, GKP12] and spectral sparsifiers in *insertion-only* streams [KL11], prior to our work, the best known single pass algorithm for maintaining spectral sparsifiers in dynamic streams required sketches of dimension $\Omega(\frac{1}{\epsilon^2} n^{5/3})$ [AGM14]. To achieve our result, we show that, using a coarse sparsifier of $G$ and a linear sketch of $G$'s incidence matrix, it is possible to sample edges by effective resistance, obtaining a spectral sparsifier of arbitrary precision. Sampling from the sketch requires a novel application of $\ell_2/\ell_2$ sparse recovery, a natural extension of the $\ell_0$ methods used for cut sparsifiers in [AGM12b]. Recent work of [MP12] on row sampling for matrix approximation gives a recursive approach for obtaining the required coarse sparsifiers. Under certain restrictions, our approach also extends to the problem of maintaining a spectral approximation for a general matrix $A^T A$ given a stream of updates to rows in $A$.
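
The core primitive behind this result is sampling edges with probability proportional to effective resistance. Below is a minimal, non-streaming Python/NumPy sketch of that primitive, assuming an unweighted graph given explicitly; the oversampling constant is illustrative, and the paper's actual contribution, performing this sampling from a small linear sketch via $\ell_2/\ell_2$ sparse recovery, is not reproduced here.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Signed edge-vertex incidence matrix B (one row per edge), so that
    B.T @ B is the graph Laplacian."""
    B = np.zeros((len(edges), n))
    for i, (u, v) in enumerate(edges):
        B[i, u], B[i, v] = 1.0, -1.0
    return B

def sparsify_by_effective_resistance(n, edges, eps, seed=0):
    """Offline illustration only: sample each edge with probability
    proportional to its effective resistance and reweight the kept edges."""
    rng = np.random.default_rng(seed)
    B = incidence_matrix(n, edges)
    L_pinv = np.linalg.pinv(B.T @ B)
    # Effective resistance of edge e is b_e^T L^+ b_e (its leverage score in B).
    r = np.einsum('ij,jk,ik->i', B, L_pinv, B)
    C = 9.0 * np.log(n) / eps**2          # oversampling factor; illustrative
    p = np.minimum(1.0, C * r)
    keep = rng.random(len(edges)) < p
    kept_edges = [edges[i] for i in np.flatnonzero(keep)]
    weights = 1.0 / p[keep]               # reweight so the sparsifier is unbiased
    return kept_edges, weights
```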

... Their algorithms maintain spanners in a graph, which is a different approach from our algorithms based on sampling rows in the online setting. When A is the Laplacian matrix, we can also obtain a spectral approximation in the dynamic semi-streaming setting [19,20]. ...
... When $p_i = 1$, Eq. (19) holds, since the gaps $X^U_{i-1}$ and $X^L_{i-1}$ strictly increase. When $p_i < 1$, ...
Preprint
This paper studies spectral approximation for a positive semidefinite matrix in the online setting. It is known in [Cohen et al. APPROX 2016] that we can construct a spectral approximation of a given $n \times d$ matrix in the online setting if an additive error is allowed. In this paper, we propose an online algorithm that avoids an additive error with the same time and space complexities as the algorithm of Cohen et al., and provides a better upper bound on the approximation size when a given matrix has small rank. In addition, we consider the online random order setting where a row of a given matrix arrives uniformly at random. In this setting, we propose time and space efficient algorithms to find a spectral approximation. Moreover, we reveal that a lower bound on the approximation size in the online random order setting is $\Omega (d \epsilon^{-2} \log n)$, which is larger than the one in the offline setting by an $\mathrm{O}\left( \log n \right)$ factor.
... Again, the case of $p = 2$ has the richest research history, with the state-of-the-art results due to Cohen et al. [5] and Jiang et al. [8], which retain only $O(\epsilon^{-1} d \log d \log(\|A\|_2^2))$ rows of $A$ (where $\|A\|_2$ denotes the operator norm of $A$). The idea of the algorithms is to sample according to the online leverage scores, which was first employed in [10]. The online leverage score of a row is simply the leverage score of the row in the submatrix of $A$ consisting of all the rows revealed so far. ...
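
To make "online leverage scores" concrete, here is a hedged Python sketch of online row sampling in that spirit; the ridge term `lam` and the oversampling constant are illustrative assumptions, not the tuned parameters of the cited algorithms.

```python
import numpy as np

def online_row_sample(rows, eps, lam=1e-6, seed=0):
    """Keep each arriving row with probability proportional to its online
    (ridge) leverage score: its leverage score within the rows seen so far."""
    rng = np.random.default_rng(seed)
    d = len(rows[0])
    M = lam * np.eye(d)                   # running sum a_1 a_1^T + ... + a_i a_i^T
    kept = []
    for a in map(np.asarray, rows):
        M += np.outer(a, a)
        tau = float(a @ np.linalg.solve(M, a))            # online leverage score in [0, 1]
        p = min(1.0, 8.0 * np.log(d + 1) / eps**2 * tau)  # illustrative constant
        if rng.random() < p:
            kept.append(a / np.sqrt(p))   # rescale so kept rows approximate A^T A
    return np.array(kept)
```

By construction, the expected sum of the rescaled outer products $\frac{1}{p_i} a_i a_i^T$ over kept rows equals $A^T A$, which is what makes this sampling scheme unbiased.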
Preprint
Active regression considers a linear regression problem where the learner receives a large number of data points but can only observe a small number of labels. Since online algorithms can deal with incremental training data and take advantage of low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately decides whether it should collect the corresponding labels. The goal is to efficiently maintain the regression of received data points with a small budget of label queries. We propose novel algorithms for this problem under $\ell_p$ loss where $p\in[1,2]$. To achieve a $(1+\epsilon)$-approximate solution, our proposed algorithms only require $\tilde{\mathcal{O}}(\epsilon^{-2} d \log(n\kappa))$ queries of labels, where $n$ is the number of data points and $\kappa$ is a quantity, called the condition number, of the data points. The numerical results verify our theoretical results and show that our methods have comparable performance with offline active regression algorithms.
... Cuts in graphs are a fundamental object of study, and play a central role in the study of graph algorithms. Consequently, the problem of sparsifying a graph while approximately preserving its cut structure has been extensively studied (see, for instance, [17,6,18,25,1,2,13,5,3,21,15,4,16], and references therein). A cut-preserving sparsifier not only reduces the space requirement for any computation, but it can also reduce the time complexity of solving many fundamental cut, flow, and matching problems as one can now run the algorithms on the sparsifier which may contain far fewer edges. ...
Preprint
The problem of sparsifying a graph or a hypergraph while approximately preserving its cut structure has been extensively studied and has many applications. In a seminal work, Bencz\'ur and Karger (1996) showed that given any $n$-vertex undirected weighted graph $G$ and a parameter $\varepsilon \in (0,1)$, there is a near-linear time algorithm that outputs a weighted subgraph $G'$ of $G$ of size $\tilde{O}(n/\varepsilon^2)$ such that the weight of every cut in $G$ is preserved to within a $(1 \pm \varepsilon)$-factor in $G'$. The graph $G'$ is referred to as a {\em $(1 \pm \varepsilon)$-approximate cut sparsifier} of $G$. Subsequent recent work has obtained a similar result for the more general problem of hypergraph cut sparsifiers. However, all known sparsification algorithms require $\Omega(n + m)$ time where $n$ denotes the number of vertices and $m$ denotes the number of hyperedges in the hypergraph. Since $m$ can be exponentially large in $n$, a natural question is if it is possible to create a hypergraph cut sparsifier in time polynomial in $n$, {\em independent of the number of edges}. We resolve this question in the affirmative, giving the first sublinear time algorithm for this problem, given appropriate query access to the hypergraph.
... If the list of edges includes deletions, then the model is called the turnstile model; otherwise it is called the insertion-only model. In both models, some graph problems, such as spanning trees [3], k-connectivity [25], densest subgraph [37], degeneracy [15], cut-sparsifier [30], and (∆ + 1)-coloring [4], can be exactly solved or (1 + ε)-approximated in a single pass, while other graph problems, such as triangle detection and unweighted all-pairs shortest paths [7], are known to require Ω(n) passes to compute. For many fundamental graph problems, e.g., standard spanning trees, the tractability in these models is open. ...
Preprint
The semi-streaming model is a variant of the streaming model frequently used for the computation of graph problems. It allows the edges of an $n$-node input graph to be read sequentially in $p$ passes using $\tilde{O}(n)$ space. In this model, some graph problems, such as spanning trees and $k$-connectivity, can be exactly solved in a single pass; while other graph problems, such as triangle detection and unweighted all-pairs shortest paths, are known to require $\tilde{\Omega}(n)$ passes to compute. For many fundamental graph problems, the tractability in these models is open. In this paper, we study the tractability of computing some standard spanning trees. Our results are: (1) Maximum-Leaf Spanning Trees. This problem is known to be APX-complete with inapproximability constant $\rho\in[245/244,2)$. By constructing an $\varepsilon$-MLST sparsifier, we show that for every constant $\varepsilon > 0$, MLST can be approximated in a single pass to within a factor of $1+\varepsilon$ w.h.p. (albeit in super-polynomial time for $\varepsilon \le \rho-1$ assuming $\mathrm{P} \ne \mathrm{NP}$). (2) BFS Trees. It is known that BFS trees require $\omega(1)$ passes to compute, but the na\"{i}ve approach needs $O(n)$ passes. We devise a new randomized algorithm that reduces the pass complexity to $O(\sqrt{n})$, and it offers a smooth tradeoff between pass complexity and space usage. (3) DFS Trees. The current best algorithm by Khan and Mehta {[}STACS 2019{]} takes $\tilde{O}(h)$ passes, where $h$ is the height of computed DFS trees. Our contribution is twofold. First, we provide a simple alternative proof of this result, via a new connection to sparse certificates for $k$-node-connectivity. Second, we present a randomized algorithm that reduces the pass complexity to $O(\sqrt{n})$, and it also offers a smooth tradeoff between pass complexity and space usage.
... Sparse recovery sketches are also used to address the more challenging problem of spectral sparsification in [AGM13], but lead to a solution with suboptimal Õ(n^{5/3} ε^{-2}) space and runtime. The open problem of achieving near optimal Õ(n ε^{-2}) space for streaming spectral sparsifiers was eventually resolved in [KLM+17]. This result, which was also surveyed in [Woo14], is based on edge sampling. ...
... Thus, it becomes more significant to explore these dynamic graph problems on a computation model that efficiently handles large storage and computations involved. In the past three decades a lot of work has been done to address dynamic graph problems in parallel [16,22,40,45,46], semi-streaming [3,10,33,42,43], and distributed (also called dynamic networks) [4,9,18,25,29,30,37,48] environments. ...
Article
Full-text available
Depth first search (DFS) tree is a fundamental data structure for solving various graph problems. The classical algorithm [54] for building a DFS tree requires O(m+n) time for a given undirected graph G having n vertices and m edges. Recently, Baswana et al. [5] presented a simple algorithm for updating the DFS tree of an undirected graph after an edge/vertex update in Õ(n) time. However, their algorithm is strictly sequential. We present an algorithm achieving similar bounds that can be easily adapted to the parallel environment. In the parallel environment, a DFS tree can be computed from scratch in expected Õ(1) time [2] on an EREW PRAM, whereas the best deterministic algorithm takes Õ(√n) time [2, 27] on a CRCW PRAM. Our algorithm can be used to develop optimal time (to poly log n factors) deterministic parallel algorithms for maintaining fully dynamic DFS and fault tolerant DFS of an undirected graph. (1) Parallel Fully Dynamic DFS: Given an arbitrary online sequence of vertex or edge updates, we can maintain a DFS tree of an undirected graph in Õ(1) time per update using m processors on an EREW PRAM. (2) Parallel Fault tolerant DFS: An undirected graph can be preprocessed to build a data structure of size O(m), such that for any set of k updates (where k is constant) in the graph, a DFS tree of the updated graph can be computed in Õ(1) time using n processors on an EREW PRAM. For constant k, this is also work optimal (to poly log n factors). Moreover, our fully dynamic DFS algorithm provides, in a seamless manner, nearly optimal (to poly log n factors) algorithms for maintaining a DFS tree in the semi-streaming environment and a restricted distributed model. These are the first parallel, semi-streaming, and distributed algorithms for maintaining a DFS tree in the dynamic setting.
... We plan to develop a multi-machine solution in the future to further scale NetSMF. Second, building upon NetSMF, we would like to efficiently and accurately learn embeddings for large-scale directed [9], dynamic [20], and/or heterogeneous networks. Third, given the demonstrated advantage of matrix factorization methods, we are also interested in exploring other matrix definitions that may be effective in capturing different structural properties in networks. ...
Preprint
Full-text available
We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such a matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix---which is dense---is prohibitively expensive in terms of both time and space, making it not scalable for large networks. In this work, we present the algorithm of large-scale network embedding as sparse matrix factorization (NetSMF). NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, enabling significantly improved efficiency in embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, while it would cost DeepWalk months and is computationally infeasible for the dense matrix factorization solution. The source code of NetSMF is publicly available (https://github.com/xptree/NetSMF).
... Kelner and Levin [23] give a single-pass incremental streaming algorithm using near-linear space and total update time. This was extended by Kapralov et al. [20] to the dynamic semi-streaming model, which allows both edge insertions and deletions. Recently, Abraham et al. [5] gave a fully-dynamic algorithm for maintaining spectral sparsifiers in poly-logarithmic amortized update time. ...
Conference Paper
Full-text available
We introduce a new algorithmic framework for designing dynamic graph algorithms in minor-free graphs, by exploiting the structure of such graphs and a tool called vertex sparsification, which is a way to compress large graphs into small ones that well preserve relevant properties among a subset of vertices and has previously mainly been used in the design of approximation algorithms. Using this framework, we obtain a Monte Carlo randomized fully dynamic algorithm for $(1+\epsilon)$-approximating the energy of electrical flows in $n$-vertex planar graphs with $\tilde{O}(r\epsilon^{-2})$ worst-case update time and $\tilde{O}((r + n/\sqrt{r})\epsilon^{-2})$ worst-case query time, for any $r$ larger than some constant. For $r = n^{2/3}$, this gives $\tilde{O}(n^{2/3}\epsilon^{-2})$ update time and $\tilde{O}(n^{2/3}\epsilon^{-2})$ query time. We also extend this algorithm to work for minor-free graphs with similar approximation and running time guarantees. Furthermore, we illustrate our framework on the all-pairs max flow and shortest path problems by giving corresponding dynamic algorithms in minor-free graphs with both sublinear update and query times. To the best of our knowledge, our results are the first to systematically establish such a connection between dynamic graph algorithms and vertex sparsification. We also present both upper bound and lower bound for maintaining the energy of electrical flows in the incremental subgraph model, where updates consist of only vertex activations, which might be of independent interest.
... This follows from the main theorem of [21] (also proved in the work [1]). By the following theorem, every cut of G is multiplicatively approximated and hence G is connected iff H is connected, since a graph is disconnected iff it has a zero cut. ...
Preprint
Full-text available
We consider algorithms with access to an unknown matrix $M\in\mathbb{F}^{n \times d}$ via matrix-vector products, namely, the algorithm chooses vectors $\mathbf{v}^1, \ldots, \mathbf{v}^q$, and observes $M\mathbf{v}^1,\ldots, M\mathbf{v}^q$. Here the $\mathbf{v}^i$ can be randomized as well as chosen adaptively as a function of $ M\mathbf{v}^1,\ldots,M\mathbf{v}^{i-1}$. Motivated by applications of sketching in distributed computation, linear algebra, and streaming models, as well as connections to areas such as communication complexity and property testing, we initiate the study of the number $q$ of queries needed to solve various fundamental problems. We study problems in three broad categories, including linear algebra, statistics problems, and graph problems. For example, we consider the number of queries required to approximate the rank, trace, maximum eigenvalue, and norms of a matrix $M$; to compute the AND/OR/Parity of each column or row of $M$, to decide whether there are identical columns or rows in $M$ or whether $M$ is symmetric, diagonal, or unitary; or to compute whether a graph defined by $M$ is connected or triangle-free. We also show separations for algorithms that are allowed to obtain matrix-vector products only by querying vectors on the right, versus algorithms that can query vectors on both the left and the right. We also show separations depending on the underlying field the matrix-vector product occurs in. For graph problems, we show separations depending on the form of the matrix (bipartite adjacency versus signed edge-vertex incidence matrix) to represent the graph. Surprisingly, this fundamental model does not appear to have been studied on its own, and we believe a thorough investigation of problems in this model would be beneficial to a number of different application areas.
... Randomization has played an increasingly fundamental role in the design of modern data structures. The current best algorithms for fully-dynamic graph connectivity [28], [29], shortest paths [1], graph spanners [4], maximal matchings [32], and the dimensionality-reductions of large matrices [24], [26] all critically rely on randomization. An increasing majority of these data structures operate under the oblivious adversary model, which assumes that updates are generated independently of the internal randomness used in the data structure. ...
... The fully dynamic setting, where edges can be both inserted and removed, is an important extension where our approach could be further investigated, especially because it has been observed in several domains that graphs become denser as they evolve over time (Leskovec et al., 2005). While sparsifiers have been developed for this setting (see e.g., (Kapralov et al., 2014)), current solutions would require O(n^2 polylog(n)) time to construct the sparsifier, thus making it unfeasible to repeat this computation many times over the stream. Extending sparsification techniques to the fully dynamic setting in a computationally efficient manner is currently an open problem. ...
Thesis
The main advantage of non-parametric models is that the accuracy of the model (degrees of freedom) adapts to the number of samples. The main drawback is the so-called "curse of kernelization": to learn the model we must first compute a similarity matrix among all samples, which requires quadratic space and time and is unfeasible for large datasets. Nonetheless, the underlying effective dimension (effective d.o.f.) of the dataset is often much smaller than its size, and we can replace the dataset with a subset (dictionary) of highly informative samples. Unfortunately, fast data-oblivious selection methods (e.g., uniform sampling) almost always discard useful information, while data-adaptive methods that provably construct an accurate dictionary, such as ridge leverage score (RLS) sampling, have a quadratic time/space cost. In this thesis we introduce a new single-pass streaming RLS sampling approach that sequentially constructs the dictionary, where each step compares a new sample only with the current intermediate dictionary and not all past samples. We prove that the size of all intermediate dictionaries scales only with the effective dimension of the dataset, and therefore guarantee a per-step time and space complexity independent from the number of samples. This reduces the overall time required to construct provably accurate dictionaries from quadratic to near-linear, or even logarithmic when parallelized. Finally, for many non-parametric learning problems (e.g., K-PCA, graph SSL, online kernel learning) we show that we can use the generated dictionaries to compute approximate solutions in near-linear time that are both provably accurate and empirically competitive.
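
As a rough illustration of the per-step dictionary test described above, here is a simplified single-pass sketch; the kernel function, the ridge parameter `gamma`, and the oversampling factor `q` are assumptions for illustration, and the full algorithm's dictionary reweighting/resampling steps and accuracy guarantees are omitted.

```python
import numpy as np

def streaming_rls_dictionary(X, kernel, gamma=1.0, q=8.0, seed=0):
    """Single pass over the samples X; each new point is compared only with
    the current dictionary D, never with all past samples. Rebuilding K_DD
    from scratch each step keeps this sketch short; an incremental update
    would be used in practice."""
    rng = np.random.default_rng(seed)
    D = []                                # indices of retained dictionary points
    for t, x in enumerate(X):
        if not D:
            D.append(t)
            continue
        K_DD = np.array([[kernel(X[i], X[j]) for j in D] for i in D])
        k_D = np.array([kernel(X[i], x) for i in D])
        # Approximate ridge leverage score of x using only the dictionary:
        # tau ~ (k(x,x) - k_D^T (K_DD + gamma I)^{-1} k_D) / gamma
        resid = kernel(x, x) - k_D @ np.linalg.solve(K_DD + gamma * np.eye(len(D)), k_D)
        tau = resid / gamma
        if rng.random() < min(1.0, q * tau):
            D.append(t)
    return D

# e.g. streaming_rls_dictionary(list_of_vectors, kernel=lambda a, b: float(np.dot(a, b)))
```
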
... In the streaming setting, there are several algorithms [2,29,20,3,21,28] that produce cut or spectral sparsifiers with O(n/ε^2) edges using O(n/ε^2) space. Such algorithms preserve every cut within a (1 + ε)-factor (and therefore also preserve the max cut). ...
Article
Full-text available
We study sublinear algorithms for two fundamental graph problems, MAXCUT and correlation clustering. Our focus is on constructing core-sets as well as developing streaming algorithms for these problems. Constant space algorithms are known for dense graphs for these problems, while $\Omega(n)$ lower bounds exist (in the streaming setting) for sparse graphs. Our goal in this paper is to bridge the gap between these extremes. Our first result is to construct core-sets of size $\tilde{O}(n^{1-\delta})$ for both the problems, on graphs with average degree $n^{\delta}$ (for any $\delta >0$). This turns out to be optimal, under the exponential time hypothesis (ETH). Our core-set analysis is based on studying random-induced sub-problems of optimization problems. To the best of our knowledge, all the known results in our parameter range rely crucially on near-regularity assumptions. We avoid these by using a biased sampling approach, which we analyze using recent results on concentration of quadratic functions. We then show that our construction yields a 2-pass streaming $(1+\epsilon)$-approximation for both problems; the algorithm uses $\tilde{O}(n^{1-\delta})$ space, for graphs of average degree $n^\delta$.
... A core idea to our approach is to use graph sparsification: it is known that any graph on N nodes has an ε-sparsifier with only Õ(ε^{-2} N) edges [39]. Furthermore, an ε-sparsifier can be constructed in the dynamic graph stream model using one pass and Õ(ε^{-2} N) space [21,24]. ...
Article
Full-text available
We study the classic NP-Hard problem of finding the maximum $k$-set coverage in the data stream model: given a set system of $m$ sets that are subsets of a universe $\{1,\cdots,n \}$, find the $k$ sets that cover the most number of distinct elements. The problem can be approximated up to a factor $1-1/e$ in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to $1-1/e$, that use sublinear space $o(mn)$. We present two $(1-1/e-\epsilon)$-approximation algorithms: The first uses $O(\epsilon^{-1}\log (k/\epsilon))$ passes and $\tilde{O}(\epsilon^{-2} k)$ space whereas the second uses only a single pass but $\tilde{O}(\epsilon^{-3} m)$ space. We show that any approximation factor better than $(1-1/e)$ in constant passes requires $\Omega(m)$ space for constant $k$ even if the algorithm is allowed unbounded processing time. We also study the maximum $k$-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on $N$ vertices. The goal is to find $k$ vertices that cover the most number of distinct edges. We show that any constant approximation in constant passes requires $\Omega(N)$ space for constant $k$ whereas $\tilde{O}(\epsilon^{-2}N)$ space is sufficient for a $(1-\epsilon)$ approximation and arbitrary $k$ in a single pass. For regular graphs, we show that $\tilde{O}(\epsilon^{-3}k)$ space is sufficient for a $(1-\epsilon)$ approximation in a single pass. We generalize this to a $\kappa(1-\epsilon)$ approximation when the ratio between the minimum and maximum degree is bounded below by $\kappa$.
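
For context, the $(1-1/e)$ offline baseline mentioned in the abstract is the classical greedy algorithm, shown here in a minimal Python form; the streaming results above are about approaching this factor in sublinear space, which this snippet does not attempt.

```python
def greedy_max_coverage(sets, k):
    """Pick, k times, the set covering the most still-uncovered elements;
    this classical greedy achieves a (1 - 1/e) approximation."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, len(covered)

# Example: greedy_max_coverage([{1, 2, 3}, {3, 4}, {4, 5, 6}], k=2) -> ([0, 2], 6)
```
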
Preprint
Full-text available
Structural balance theory studies stability in networks. Given a $n$-vertex complete graph $G=(V,E)$ whose edges are labeled positive or negative, the graph is considered \emph{balanced} if every triangle either consists of three positive edges (three mutual ``friends''), or one positive edge and two negative edges (two ``friends'' with a common ``enemy''). From a computational perspective, structural balance turns out to be a special case of correlation clustering with the number of clusters at most two. The two main algorithmic problems of interest are: $(i)$ detecting whether a given graph is balanced, or $(ii)$ finding a partition that approximates the \emph{frustration index}, i.e., the minimum number of edge flips that turn the graph balanced. We study these problems in the streaming model where edges are given one by one and focus on \emph{memory efficiency}. We provide randomized single-pass algorithms for: $(i)$ determining whether an input graph is balanced with $O(\log{n})$ memory, and $(ii)$ finding a partition that induces a $(1 + \varepsilon)$-approximation to the frustration index with $O(n \cdot \text{polylog}(n))$ memory. We further provide several new lower bounds, complementing different aspects of our algorithms such as the need for randomization or approximation. To obtain our main results, we develop a method using pseudorandom generators (PRGs) to sample edges between independently-chosen \emph{vertices} in graph streaming. Furthermore, our algorithm that approximates the frustration index improves the running time of the state-of-the-art correlation clustering with two clusters (Giotis-Guruswami algorithm [SODA 2006]) from $n^{O(1/\varepsilon^2)}$ to $O(n^2\log^3{n}/\varepsilon^2 + n\log n \cdot (1/\varepsilon)^{O(1/\varepsilon^4)})$ time for $(1+\varepsilon)$-approximation. These results may be of independent interest.
Preprint
Full-text available
For any norms $N_1,\ldots,N_m$ on $\mathbb{R}^n$ and $N(x) := N_1(x)+\cdots+N_m(x)$, we show there is a sparsified norm $\tilde{N}(x) = w_1 N_1(x) + \cdots + w_m N_m(x)$ such that $|N(x) - \tilde{N}(x)| \leq \epsilon N(x)$ for all $x \in \mathbb{R}^n$, where $w_1,\ldots,w_m$ are non-negative weights, of which only $O(\epsilon^{-2} n \log(n/\epsilon) (\log n)^{2.5} )$ are non-zero. Additionally, we show that such weights can be found with high probability in time $O(m (\log n)^{O(1)} + \mathrm{poly}(n)) T$, where $T$ is the time required to evaluate a norm $N_i(x)$, assuming that $N(x)$ is $\mathrm{poly}(n)$-equivalent to the Euclidean norm. This immediately yields analogous statements for sparsifying sums of symmetric submodular functions. More generally, we show how to sparsify sums of $p$th powers of norms when the sum is $p$-uniformly smooth.
Preprint
We present a streaming algorithm for the vertex connectivity problem in dynamic streams with a (nearly) optimal space bound: for any $n$-vertex graph $G$ and any integer $k \geq 1$, our algorithm with high probability outputs whether or not $G$ is $k$-vertex-connected in a single pass using $\widetilde{O}(k n)$ space. Our upper bound matches the known $\Omega(k n)$ lower bound for this problem even in insertion-only streams -- which we extend to multi-pass algorithms in this paper -- and closes one of the last remaining gaps in our understanding of dynamic versus insertion-only streams. Our result is obtained via a novel analysis of the previous best dynamic streaming algorithm of Guha, McGregor, and Tench [PODS 2015] who obtained an $\widetilde{O}(k^2 n)$ space algorithm for this problem. This also gives a model-independent algorithm for computing a "certificate" of $k$-vertex-connectivity as a union of $O(k^2\log{n})$ spanning forests, each on a random subset of $O(n/k)$ vertices, which may be of independent interest.
Article
Full-text available
The general method of graph coarsening or graph reduction has been a remarkably useful and ubiquitous tool in scientific computing and it is now just starting to have a similar impact in machine learning. The goal of this paper is to take a broad look into coarsening techniques that have been successfully deployed in scientific computing and see how similar principles are finding their way in more recent applications related to machine learning. In scientific computing, coarsening plays a central role in algebraic multigrid methods as well as the related class of multilevel incomplete LU factorizations. In machine learning, graph coarsening goes under various names, e.g., graph downsampling or graph reduction. Its goal in most cases is to replace some original graph by one which has fewer nodes, but whose structure and characteristics are similar to those of the original graph. As will be seen, a common strategy in these methods is to rely on spectral properties to define the coarse graph.
Preprint
Full-text available
The Boolean Hidden Matching (BHM) problem, introduced in a seminal paper of Gavinsky et al. [STOC'07], has played an important role in the streaming lower bounds for graph problems such as triangle and subgraph counting, maximum matching, MAX-CUT, Schatten $p$-norm approximation, maximum acyclic subgraph, testing bipartiteness, $k$-connectivity, and cycle-freeness. The one-way communication complexity of the Boolean Hidden Matching problem on a universe of size $n$ is $\Theta(\sqrt{n})$, resulting in $\Omega(\sqrt{n})$ lower bounds for constant factor approximations to several of the aforementioned graph problems. The related (and, in fact, more general) Boolean Hidden Hypermatching (BHH) problem introduced by Verbin and Yu [SODA'11] provides an approach to proving higher lower bounds of $\Omega(n^{1-1/t})$ for integer $t\geq 2$. Reductions based on Boolean Hidden Hypermatching generate distributions on graphs with connected components of diameter about $t$, and basically show that long range exploration is hard in the streaming model of computation with adversarial arrivals. In this paper we introduce a natural variant of the BHM problem, called noisy BHM (and its natural noisy BHH variant), that we use to obtain higher than $\Omega(\sqrt{n})$ lower bounds for approximating several of the aforementioned problems in graph streams when the input graphs consist only of components of diameter bounded by a fixed constant. We also use the noisy BHM problem to show that the problem of classifying whether an underlying graph is isomorphic to a complete binary tree in insertion-only streams requires $\Omega(n)$ space, which seems challenging to show using BHM or BHH alone.
Article
Full-text available
Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as k-center, k-median, and k-means. Such algorithms need to be both time and space efficient. In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on $n$ nodes and the goal is to find a node-partition such that the end-points of negative-weight edges are typically in different clusters whereas the end-points of positive-weight edges are typically in the same cluster. We present polynomial-time, $O(n \cdot \mathrm{polylog}\, n)$-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches that allow the "quality" of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. Unfortunately, the standard LP and SDP formulations are not obviously solvable in $O(n \cdot \mathrm{polylog}\, n)$-space. Our work presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling.
Preprint
In this paper we provide an $\tilde{O}(nd+d^{3})$ time randomized algorithm for solving linear programs with $d$ variables and $n$ constraints with high probability. To obtain this result we provide a robust, primal-dual $\tilde{O}(\sqrt{d})$-iteration interior point method inspired by the methods of Lee and Sidford (2014, 2019) and show how to efficiently implement this method using new data-structures based on heavy-hitters, the Johnson-Lindenstrauss lemma, and inverse maintenance. Interestingly, we obtain this running time without using fast matrix multiplication and consequently, barring a major advance in linear system solving, our running time is near optimal for solving dense linear programs among algorithms that do not use fast matrix multiplication.
Preprint
Recently, Musco and Woodruff (FOCS, 2017) showed that given an $n \times n$ positive semidefinite (PSD) matrix $A$, it is possible to compute a relative-error $(1+\epsilon)$-approximate low-rank approximation to $A$ by querying $\widetilde{O}(nk/\epsilon^{2.5})$ entries of $A$ in time $\widetilde{O}(nk/\epsilon^{2.5} +n k^{\omega-1}/\epsilon^{2(\omega-1)})$. They also showed that any relative-error low-rank approximation algorithm must query $\widetilde{\Omega}(nk/\epsilon)$ entries of $A$, and closing this gap is an important open question. Our main result is to resolve this question by showing an algorithm that queries an optimal $\widetilde{O}(nk/\epsilon)$ entries of $A$ and outputs a relative-error low-rank approximation in $\widetilde{O}(n\cdot(k/\epsilon)^{\omega-1})$ time. Note, our running time improves that of Musco and Woodruff, and matches the information-theoretic lower bound if the matrix-multiplication exponent $\omega$ is $2$. Next, we introduce a new robust low-rank approximation model which captures PSD matrices that have been corrupted with noise. We assume that the Frobenius norm of the corruption is bounded. Here, we relax the notion of approximation to additive-error, since it is information-theoretically impossible to obtain a relative-error approximation in this setting. While a sample complexity lower bound precludes sublinear algorithms for arbitrary PSD matrices, we provide the first sublinear time and query algorithms when the corruption on the diagonal entries is bounded. As a special case, we show sample-optimal sublinear time algorithms for low-rank approximation of correlation matrices corrupted by noise.
Preprint
Consider the following {\em 2-respecting min-cut} problem. Given a weighted graph $G$ and its spanning tree $T$, find the minimum cut among the cuts that contain at most two edges in $T$. This problem is an important subroutine in Karger's celebrated randomized near-linear-time min-cut algorithm [STOC'96]. We present a new approach for this problem which can be easily implemented in many settings, leading to the following randomized min-cut algorithms for weighted graphs. * An $O(m \log^2 n+n\log^5 n)$-time sequential algorithm: This improves Karger's long-standing $O(m \log^3 n)$ bound when the input graph is not extremely sparse. Improvements over Karger's bounds were previously known only under a rather strong assumption that the input graph is {\em simple} (unweighted without parallel edges) [Henzinger, Rao, Wang, SODA'17; Ghaffari, Nowicki, Thorup, SODA'20]. * An algorithm that requires $\tilde O(n)$ {\em cut queries} to compute the min-cut of a weighted graph: This answers an open problem by Rubinstein, Schramm, and Weinberg [ITCS'18], who obtained a similar bound for simple graphs. Our bound is tight up to polylogarithmic factors. * A {\em streaming} algorithm that requires $\tilde O(n)$ space and $O(\log n)$ passes to compute the min-cut: The only previous non-trivial exact min-cut algorithm in this setting is the 2-pass $\tilde O(n)$-space algorithm on simple graphs [Rubinstein~et~al., ITCS'18] (observed by Assadi, Chen, and Khanna [STOC'19]). In contrast to Karger's 2-respecting min-cut algorithm which deploys sophisticated dynamic programming techniques, our approach exploits some cute structural properties so that it only needs to compute the values of $\tilde O(n)$ cuts corresponding to removing $\tilde O(n)$ pairs of tree edges, an operation that can be done quickly in many settings.
Conference Paper
We consider the problem of estimating the value of MAX-CUT in a graph in the streaming model of computation. At one extreme, there is a trivial 2-approximation for this problem that uses only O(log n) space, namely, count the number of edges and output half of this value as the estimate for the size of the MAX-CUT. On the other extreme, for any fixed ε > 0, if one allows Õ(n) space, a (1+ε)-approximate solution to the MAX-CUT value can be obtained by storing an Õ(n)-size sparsifier that essentially preserves MAX-CUT value. Our main result is that any (randomized) single pass streaming algorithm that breaks the 2-approximation barrier requires Ω(n)-space, thus resolving the space complexity of any non-trivial approximations of the MAX-CUT value to within polylogarithmic factors in the single pass streaming model. We achieve the result by presenting a tight analysis of the Implicit Hidden Partition Problem introduced by Kapralov et al. [SODA'17] for an arbitrarily large number of players. In this problem a number of players receive random matchings of Ω(n) size together with random bits on the edges, and their task is to determine whether the bits correspond to parities of some hidden bipartition, or are just uniformly random. Unlike all previous Fourier analytic communication lower bounds, our analysis does not directly use bounds on the ℓ2 norm of Fourier coefficients of a typical message at any given weight level that follow from hypercontractivity. Instead, we use the fact that graphs received by players are sparse (matchings) to obtain strong upper bounds on the ℓ1 norm of the Fourier coefficients of the messages of individual players using their special structure, and then argue, using the convolution theorem, that similar strong bounds on the ℓ1 norm are essentially preserved (up to an exponential loss in the number of players) once messages of different players are combined. We feel that our main technique is likely of independent interest.
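
The trivial baseline in the opening sentence is worth spelling out: a uniformly random bipartition cuts each edge with probability 1/2, so m/2 ≤ MAX-CUT ≤ m, and the edge count alone yields a 2-approximation to the value. A minimal sketch:

```python
def maxcut_value_2approx(edge_stream):
    """O(log n)-space streaming baseline: count edges, output m/2.
    Since m/2 <= MAX-CUT <= m, the answer is within a factor of 2."""
    m = sum(1 for _ in edge_stream)      # the only state kept is this counter
    return m / 2.0
```
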
Preprint
Full-text available
In this paper we consider the problem of computing spectral approximations to graphs in the single pass dynamic streaming model. We provide a linear sketching based solution that, given a stream of edge insertions and deletions to an $n$-node undirected graph, uses $\tilde O(n)$ space, processes each update in $\tilde O(1)$ time, and with high probability recovers a spectral sparsifier in $\tilde O(n)$ time. Prior to our work, state-of-the-art results either used near-optimal $\tilde O(n)$ space but required brute-force $\Omega(n^2)$ recovery time [Kapralov et al.'14], or achieved subquadratic runtime but polynomially suboptimal space complexity [Ahn et al.'14, Kapralov et al.'19]. Our main technical contribution is a novel method for `bucketing' vertices of the input graph into clusters that allows fast recovery of edges of sufficiently large effective resistance. Our algorithm first buckets vertices of the graph by performing ball-carving using (an approximation to) its effective resistance metric, and then recovers the high effective resistance edges from a sketched version of an electrical flow between vertices in a bucket, taking nearly linear time in the number of vertices overall. This process is performed at different geometric scales to recover a sample of edges with probabilities proportional to effective resistances and obtain an actual sparsifier of the input graph. This work provides both the first efficient $\ell_2$-sparse recovery algorithm for graphs and new primitives for manipulating the effective resistance embedding of a graph, both of which we hope have further applications.
Preprint
Subgraph counting is a fundamental primitive in graph processing, with applications in social network analysis (e.g., estimating the clustering coefficient of a graph), database processing and other areas. The space complexity of subgraph counting has been studied extensively in the literature, but many natural settings are still not well understood. In this paper we revisit the subgraph (and hypergraph) counting problem in the sketching model, where the algorithm's state as it processes a stream of updates to the graph is a linear function of the stream. This model has recently received a lot of attention in the literature, and has become a standard model for solving dynamic graph streaming problems. In this paper we give a tight bound on the sketching complexity of counting the number of occurrences of a small subgraph $H$ in a bounded degree graph $G$ presented as a stream of edge updates. Specifically, we show that the space complexity of the problem is governed by the fractional vertex cover number of the graph $H$. Our subgraph counting algorithm implements a natural vertex sampling approach, with sampling probabilities governed by the vertex cover of $H$. Our main technical contribution lies in a new set of Fourier analytic tools that we develop to analyze multiplayer communication protocols in the simultaneous communication model, allowing us to prove a tight lower bound. We believe that our techniques are likely to find applications in other settings. Besides giving tight bounds for all graphs $H$, both our algorithm and lower bounds extend to the hypergraph setting, albeit with some loss in space complexity.
Article
We give query-efficient algorithms for the global min-cut and the s-t cut problem in unweighted, undirected graphs. Our oracle model is inspired by the submodular function minimization problem: on query $S \subset V$, the oracle returns the size of the cut between $S$ and $V \setminus S$. We provide algorithms computing an exact minimum $s$-$t$ cut in $G$ with $\tilde{O}(n^{5/3})$ queries, and computing an exact global minimum cut of $G$ with only $\tilde{O}(n)$ queries (while learning the graph requires $\tilde{\Theta}(n^2)$ queries).
Article
This paper addresses matrix approximation problems for matrices that are large, sparse and/or that are representations of large graphs. To tackle these problems, we consider algorithms that are based primarily on coarsening techniques, possibly combined with random sampling. A multilevel coarsening technique is proposed which utilizes a hypergraph associated with the data matrix and a graph coarsening strategy based on column matching. Theoretical results are established that characterize the quality of the dimension reduction achieved by a coarsening step, when a proper column matching strategy is employed. We consider a number of standard applications of this technique as well as a few new ones. Among the standard applications we first consider the problem of computing the partial SVD for which a combination of sampling and coarsening yields significantly improved SVD results relative to sampling alone. We also consider the Column subset selection problem, a popular low rank approximation method used in data related applications, and show how multilevel coarsening can be adapted for this problem. Similarly, we consider the problem of graph sparsification and show how coarsening techniques can be employed to solve it. Numerical experiments illustrate the performances of the methods in various applications.
Conference Paper
A common approach for designing scalable algorithms for massive data sets is to distribute the computation across, say $k$, machines and process the data using limited communication between them. A particularly appealing framework here is the simultaneous communication model whereby each machine constructs a small representative summary of its own data and one obtains an approximate/exact solution from the union of the representative summaries. If the representative summaries needed for a problem are small, then this results in a communication-efficient and round-optimal (requiring essentially no interaction between the machines) protocol. Some well-known examples of techniques for creating summaries include sampling, linear sketching, and composable coresets. These techniques have been successfully used to design communication efficient solutions for many fundamental graph problems. However, two prominent problems are notably absent from the list of successes, namely, the maximum matching problem and the minimum vertex cover problem. Indeed, it was shown recently that for both these problems, even achieving a modest approximation factor of $\mathrm{polylog}(n)$ requires using representative summaries of size $\widetilde{\Omega}(n^2)$, i.e., essentially no better summary exists than each machine simply sending its entire input graph. The main insight of our work is that the intractability of matching and vertex cover in the simultaneous communication model is inherently connected to an adversarial partitioning of the underlying graph across machines. We show that when the underlying graph is randomly partitioned across machines, both these problems admit randomized composable coresets of size $\widetilde{O}(n)$ that yield an $\widetilde{O}(1)$-approximate solution (here and throughout, $\widetilde{O}(\cdot)$ notation suppresses $\mathrm{polylog}(n)$ factors, where $n$ is the number of vertices in the graph). In other words, a small subgraph of the input graph at each machine can be identified as its representative summary and the final answer then is obtained by simply running any maximum matching or minimum vertex cover algorithm on these combined subgraphs. This results in an $\widetilde{O}(1)$-approximation simultaneous protocol for these problems with $\widetilde{O}(nk)$ total communication when the input is randomly partitioned across $k$ machines. We also prove our results are optimal in a very strong sense: we not only rule out existence of smaller randomized composable coresets for these problems but in fact show that our $\widetilde{O}(nk)$ bound for total communication is optimal for any simultaneous communication protocol (i.e., not only for randomized coresets) for these two problems. Finally, by a standard application of composable coresets, our results also imply MapReduce algorithms with the same approximation guarantee in one or two rounds of communication, improving the previous best known round complexity for these problems.
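
A toy Python rendering of the random-partition protocol described above, with a greedy (maximal) matching standing in for the paper's coreset construction; this simplification loses the paper's precise approximation guarantee but shows the communication pattern: each machine sends only a matching of its share, and the answer is computed on the union.

```python
import random

def greedy_matching(edges):
    """Maximal matching by a single greedy pass over the edge list."""
    matched, M = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            M.append((u, v))
    return M

def simultaneous_matching(edges, k, seed=0):
    """Randomly partition the edges across k machines, let each summarize its
    share by a matching, and match again on the combined summaries."""
    random.seed(seed)
    shares = [[] for _ in range(k)]
    for e in edges:
        shares[random.randrange(k)].append(e)   # random partition of the input
    summary = [e for share in shares for e in greedy_matching(share)]
    return greedy_matching(summary)
```
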
Conference Paper
In this invited talk, we will survey some of the recent work on designing algorithms for analyzing massive graphs. Such graphs may not fit in main memory, may be distributed across numerous machines, and may change over time. This has motivated a rich body of work on analyzing graphs in the data stream model and the development of general algorithmic techniques, such as graph sketching, that can help minimize the space and communication costs required to process these massive graphs.
Article
In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x, y \in\{0,1\}^n$ with the promise that $x\neq y$. The last player to receive a message must output an index $i$ such that $x_i\neq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is exactly $\Theta(\min\{n,\log(1/\delta)\log^2(\frac n{\log(1/\delta)})\})$ for failure probability $\delta$. Our lower bound holds even if promised $\mathop{support}(y)\subset \mathop{support}(x)$. As a corollary, we obtain optimal lower bounds for $\ell_p$-sampling in strict turnstile streams for $0\le p < 2$, as well as for the problem of finding duplicates in a stream. Our lower bounds do not need to use large weights, and hold even if promised $x\in\{0,1\}^n$ at all points in the stream. We give two different proofs of our main result. The first proof demonstrates that any algorithm $\mathcal A$ solving sampling problems in turnstile streams in low memory can be used to encode subsets of $[n]$ of certain sizes into a number of bits below the information theoretic minimum. Our encoder makes adaptive queries to $\mathcal A$ throughout its execution, but done carefully so as to not violate correctness. This is accomplished by injecting random noise into the encoder's interactions with $\mathcal A$, which is loosely motivated by techniques in differential privacy. Our second proof is via a novel randomized reduction from Augmented Indexing [MNSW98] which needs to interact with $\mathcal A$ adaptively. To handle the adaptivity we identify certain likely interaction patterns and union bound over them to guarantee correct interaction on all of them. To guarantee correctness, it is important that the interaction hides some of its randomness from $\mathcal A$ in the reduction.
Conference Paper
Given an n-node m-edge graph G, the degeneracy of G and an associated node ordering can be computed in linear time in the RAM model by a greedy algorithm that iteratively removes a node of minimum degree [28]. In the semi-streaming model for large graphs, where memory is limited to \(\mathcal {O}(n \,\mathrm{polylog}\,n)\) and edges can only be accessed in sequential passes, the greedy algorithm requires too many passes, so another approach is needed. In the semi-streaming model, there is a deterministic log-pass algorithm for generating an ordering whose degeneracy approximates the minimum possible to within a factor of \((2+\varepsilon )\) for any constant \(\varepsilon > 0\) [12]. In this paper, we propose a randomized algorithm that improves the approximation factor to \((1+\varepsilon )\) with high probability and needs only a single pass. Our algorithm can be generalized to the model that allows edge deletions, but it then requires more computation and space. The generated node ordering not only yields a \((1+\varepsilon )\)-approximation for the degeneracy but also gives constant-factor approximations for arboricity and thickness.
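For reference, the RAM-model greedy routine that these streaming algorithms approximate can be written in a few lines. This is a sketch of the textbook algorithm (bucket bookkeeping is simplified for clarity, so it is not strictly linear time):

    def degeneracy_ordering(adj):
        # adj: dict mapping each node to its set of neighbors (simple undirected graph)
        deg = {v: len(ns) for v, ns in adj.items()}
        buckets = {}
        for v, d in deg.items():
            buckets.setdefault(d, set()).add(v)
        order, removed, degeneracy = [], set(), 0
        for _ in range(len(adj)):
            d = min(b for b, s in buckets.items() if s)  # current minimum degree
            degeneracy = max(degeneracy, d)
            v = buckets[d].pop()
            order.append(v)
            removed.add(v)
            for u in adj[v]:  # removing v lowers its remaining neighbors' degrees
                if u not in removed:
                    buckets[deg[u]].discard(u)
                    deg[u] -= 1
                    buckets.setdefault(deg[u], set()).add(u)
        return order, degeneracy

    # triangle with a pendant node: degeneracy 2
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(degeneracy_ordering(adj))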
Article
We present the first single pass algorithm for computing spectral sparsifiers for graphs in the dynamic semi-streaming model. Given a single pass over a stream containing insertions and deletions of edges to a graph $G$, our algorithm maintains a randomized linear sketch of the incidence matrix of $G$ into dimension $O(\frac{1}{\epsilon^2} n \,\mathrm{polylog}(n))$. Using this sketch, at any point, the algorithm can output a $(1 \pm \epsilon)$ spectral sparsifier for $G$ with high probability. While $O(\frac{1}{\epsilon^2} n \,\mathrm{polylog}(n))$ space algorithms are known for computing cut sparsifiers in dynamic streams [K. J. Ahn, S. Guha, and A. McGregor, in Proceedings of the 31st ACM Symposium on Principles of Database Systems, 2012, pp. 5--14; A. Goel, M. Kapralov, and I. Post, arXiv:1203.4900, 2012] and spectral sparsifiers in insertion-only streams [J. A. Kelner and A. Levin, Theory Comput. Syst., 53 (2013), pp. 243--262], prior to our work, the best known single pass algorithm...
Conference Paper
Full-text available
We undertake a systematic study of sketching a quadratic form: given an n × n matrix A, create a succinct sketch sk(A) which can produce (without further access to A) a multiplicative (1+ε)-approximation to x^T A x for any desired query x ∈ Rⁿ. While a general matrix does not admit non-trivial sketches, positive semi-definite (PSD) matrices admit sketches of size Θ(ε^{-2} n), via the Johnson-Lindenstrauss lemma, achieving the "for each" guarantee, namely, for each query x, with a constant probability the sketch succeeds. (For the stronger "for all" guarantee, where the sketch succeeds for all x's simultaneously, again there are no non-trivial sketches.) We design significantly better sketches for the important subclass of graph Laplacian matrices, which we also extend to symmetric diagonally dominant matrices. A sequence of work culminating in that of Batson, Spielman, and Srivastava (SIAM Review, 2014), shows that by choosing and reweighting O(ε^{-2} n) edges in a graph, one achieves the "for all" guarantee. Our main results advance this front. • For the "for all" guarantee, we prove that Batson et al.'s bound is optimal even when we restrict to "cut queries" x ∈ {0,1}ⁿ. Specifically, an arbitrary sketch that can (1+ε)-estimate the weight of all cuts (S, S̄) in an n-vertex graph must be of size Ω(ε^{-2} n) bits. Furthermore, if the sketch is a cut-sparsifier (i.e., itself a weighted graph and the estimate is the weight of the corresponding cut in this graph), then the sketch must have Ω(ε^{-2} n) edges. (In contrast, previous lower bounds showed this only for spectral sparsifiers.) • For the "for each" guarantee, we design a sketch of size Õ(ε^{-1} n) bits for "cut queries" x ∈ {0,1}ⁿ. We apply this sketch to design an algorithm for the distributed minimum cut problem. We prove a nearly-matching lower bound of Ω(ε^{-1} n) bits. For general queries x ∈ Rⁿ, we construct sketches of size Õ(ε^{-1.6} n) bits. Our results provide the first separation between the sketch size needed for the "for all" and "for each" guarantees for Laplacian matrices.
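The Θ(ε^{-2} n)-size "for each" sketch for PSD matrices is just Johnson-Lindenstrauss applied to A^{1/2}, since x^T A x = ‖A^{1/2}x‖₂². A small numpy illustration (the Gaussian projection and the constants are our illustrative choices):

    import numpy as np

    def psd_quadratic_sketch(A, eps, rng):
        # A PSD: x^T A x = ||A^{1/2} x||^2, so a JL matrix applied to A^{1/2}
        # preserves the quadratic form for each fixed query x, w.h.p.
        n = A.shape[0]
        m = int(np.ceil(8 / eps ** 2))  # O(eps^-2) rows; constant illustrative
        w, V = np.linalg.eigh(A)
        half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T  # PSD square root
        S = rng.standard_normal((m, n)) / np.sqrt(m)
        return S @ half  # the sketch sk(A): an m x n matrix, size O(eps^-2 n)

    rng = np.random.default_rng(0)
    n = 200
    B = rng.standard_normal((n, n)); A = B @ B.T  # random PSD matrix
    sk = psd_quadratic_sketch(A, eps=0.2, rng=rng)
    x = rng.standard_normal(n)
    est, true = np.linalg.norm(sk @ x) ** 2, x @ A @ x
    print(abs(est - true) / true)  # typically on the order of eps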
Article
This article studies the set cover problem under the semi-streaming model. The underlying set system is formalized in terms of a hypergraph G = (V, E) whose edges arrive one by one, and the goal is to construct an edge cover F ⊆ E with the objective of minimizing the cardinality (or cost in the weighted case) of F. We further consider a parameterized relaxation of this problem, where, given some 0 ≤ ε < 1, the goal is to construct an edge (1 − ε)-cover, namely, a subset of edges incident to all but an ε-fraction of the vertices (or of their benefit in the weighted case). The key limitation imposed on the algorithm is that its space is limited to (poly)logarithmically many bits per vertex. Our main result is an asymptotically tight tradeoff between ε and the approximation ratio: We design a semi-streaming algorithm that on input hypergraph G constructs a succinct data structure D such that for every 0 ≤ ε < 1, an edge (1 − ε)-cover that approximates the optimal edge (1 − ε)-cover within a factor of f(ε, n) can be extracted from D (efficiently and with no additional space requirements), where f(ε, n) = O(1/ε) if ε > 1/√n, and f(ε, n) = O(√n) otherwise. In particular, for the traditional set cover problem, we obtain an O(√n)-approximation. This algorithm is proved to be best possible by establishing a family (parameterized by ε) of matching lower bounds.
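As a point of comparison for these approximation ratios, the classical offline greedy algorithm (which the streaming model cannot afford to run directly, since it needs repeated access to all sets) looks as follows; a minimal sketch of the textbook baseline, not the paper's data structure D:

    def greedy_set_cover(universe, sets):
        # repeatedly take the set covering the most still-uncovered vertices
        uncovered, cover = set(universe), []
        while uncovered:
            best = max(sets, key=lambda s: len(uncovered & s))
            if not uncovered & best:
                break  # remaining vertices appear in no set
            cover.append(best)
            uncovered -= best
        return cover

    # example: covers {1..6} with two sets
    print(greedy_set_cover(range(1, 7), [{1, 2, 3}, {4, 5, 6}, {1, 4}]))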
Article
Full-text available
We study the $\ell_1$-low rank approximation problem, where for a given $n \times d$ matrix $A$ and approximation factor $\alpha \geq 1$, the goal is to output a rank-$k$ matrix $\widehat{A}$ for which $$\|A-\widehat{A}\|_1 \leq \alpha \cdot \min_{\textrm{rank-}k\textrm{ matrices}~A'}\|A-A'\|_1,$$ where for an $n \times d$ matrix $C$, we let $\|C\|_1 = \sum_{i=1}^n \sum_{j=1}^d |C_{i,j}|$. This error measure is known to be more robust than the Frobenius norm in the presence of outliers and is indicated in models where Gaussian assumptions on the noise may not apply. The problem was shown to be NP-hard by Gillis and Vavasis and a number of heuristics have been proposed. It was asked in multiple places if there are any approximation algorithms. We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$. If $k$ is constant, we further improve the approximation ratio to $O(1)$ with a $\mathrm{poly}(nd)$-time algorithm. Under the Exponential Time Hypothesis, we show there is no $\mathrm{poly}(nd)$-time algorithm achieving a $(1+\frac{1}{\log^{1+\gamma}(nd)})$-approximation, for $\gamma > 0$ an arbitrarily small constant, even when $k = 1$. We give a number of additional results for $\ell_1$-low rank approximation: nearly tight upper and lower bounds for column subset selection, CUR decompositions, extensions to low rank approximation with respect to $\ell_p$-norms for $1 \leq p < 2$ and earthmover distance, low-communication distributed protocols and low-memory streaming algorithms, algorithms with limited randomness, and bicriteria algorithms. We also give a preliminary empirical evaluation.
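The robustness claim is easy to see numerically: a single corrupted entry drags the Frobenius-optimal (SVD) fit away from the underlying low-rank structure, while the entrywise ℓ₁ criterion charges that entry only its own magnitude. A small numpy experiment (illustrative of the error measure only, not of the paper's algorithm; the SVD fit's ℓ₁ error is typically the larger of the two printed values):

    import numpy as np

    rng = np.random.default_rng(1)
    A = np.outer(rng.standard_normal(50), rng.standard_normal(40))  # exactly rank 1
    A[0, 0] += 100.0  # one outlier entry

    U, s, Vt = np.linalg.svd(A)
    A_svd = s[0] * np.outer(U[:, 0], Vt[0])     # best rank-1 fit in Frobenius norm
    A_clean = A.copy(); A_clean[0, 0] -= 100.0  # the uncorrupted rank-1 matrix

    print("l1 error of SVD fit:     ", np.abs(A - A_svd).sum())
    print("l1 error of clean rank-1:", np.abs(A - A_clean).sum())  # exactly 100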
Article
Finding a small spectral approximation for a tall $n \times d$ matrix $A$ is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of $A$. Row sampling improves interpretability, saves space when $A$ is sparse, and preserves row structure, which is especially important, for example, when $A$ represents a graph. However, correctly sampling rows from $A$ can be costly when the matrix is large and cannot be stored and processed in memory. Hence, a number of recent publications focus on row sampling in the streaming setting, using little more space than what is required to store the output approximation [KL13, KLM+14]. Inspired by a growing body of work on online algorithms for machine learning and data analysis, we extend this work to a more restrictive online setting: we read rows of $A$ one by one and immediately decide whether each row should be kept in the spectral approximation or discarded, without ever retracting these decisions. We present an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation. We also present an algorithm that uses $O(d^2)$ memory but only requires $O(d\log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ samples, which we show is optimal. Our methods are clean and intuitive, allow for lower memory usage than prior work, and expose new theoretical properties of leverage score based matrix approximation.
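A compressed rendering of the online sampling idea in Python: keep each arriving row with probability proportional to its regularized online leverage score and reweight kept rows so the sampled covariance is unbiased. This variant stores the exact running covariance, so it corresponds to the O(d²)-memory regime; the oversampling constant c is our illustrative choice, not the paper's.

    import numpy as np

    def online_row_sample(rows, eps, delta, rng):
        # keep row a irrevocably with prob min(1, c * tau), where tau is the
        # regularized online leverage score a^T (A_i^T A_i + delta*I)^{-1} a
        d = rows.shape[1]
        M = delta * np.eye(d)             # running A_i^T A_i + delta*I
        c = 8 * np.log(d) / eps ** 2      # oversampling constant (illustrative)
        kept = []
        for a in rows:
            M += np.outer(a, a)
            tau = float(a @ np.linalg.solve(M, a))
            p = min(1.0, c * tau)
            if rng.random() < p:
                kept.append(a / np.sqrt(p))  # reweight so B^T B is unbiased
        return np.array(kept)

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5000, 20))
    B = online_row_sample(A, eps=0.5, delta=1e-3, rng=rng)
    print(B.shape[0],
          np.linalg.norm(A.T @ A - B.T @ B, 2) / np.linalg.norm(A.T @ A, 2))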
Article
Full-text available
In this paper, we survey algorithms for sparse recovery problems that are based on sparse random matrices. Such matrices have several attractive properties: they support algorithms with low computational complexity, and make it easy to perform incremental updates to signals. We discuss applications to several areas, including compressive sensing, data stream computing, and group testing.
Conference Paper
Full-text available
We present a streaming algorithm for constructing sparse spanners and show that our algorithm significantly outperforms the state-of-the-art algorithm for this task (due to Feigenbaum et al.). Specifically, the processing time per edge of our algorithm is O(1), drastically smaller than that of the algorithm of Feigenbaum et al., and all other efficiency parameters of our algorithm are no greater (and some of them are strictly smaller) than the respective parameters of the state-of-the-art algorithm. We also devise a fully dynamic centralized algorithm maintaining sparse spanners. This algorithm has an incremental update time of O(1) and a nontrivial decremental update time. To our knowledge, this is the first fully dynamic centralized algorithm for maintaining sparse spanners that provides nontrivial bounds on both incremental and decremental update time for a wide range of the stretch parameter t.
Article
Full-text available
We present a streaming algorithm for constructing sparse spanners and show that our algorithm significantly outperforms the state-of-the-art algorithm for this task [20]. Specifically, the processing time per edge of our algorithm is drastically smaller than that of the algorithm of [20], and all other efficiency parameters of our algorithm are no greater (and some of them are strictly smaller) than the respective parameters of the state-of-the-art algorithm. We also devise a fully dynamic centralized algorithm maintaining sparse spanners. This algorithm has a very small incremental update time and a non-trivial decremental update time. To our knowledge, this is the first fully dynamic centralized algorithm for maintaining sparse spanners that provides non-trivial bounds on both incremental and decremental update time for a wide range of the stretch parameter t.
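For intuition, the classical greedy spanner rule that underlies much of this line of work: scan edges and keep (u, v) only if the current spanner has no u-v path of at most t hops. A compact Python sketch of that baseline greedy (not the paper's algorithm; for stretch t = 2k - 1 it is known to keep O(n^{1+1/k}) edges):

    from collections import deque

    def greedy_spanner(edges, n, t):
        # keep edge (u, v) iff H currently has no u-v path of <= t hops
        H = {v: set() for v in range(n)}

        def dist_at_most_t(s, g):
            dist = {s: 0}
            q = deque([s])
            while q:
                u = q.popleft()
                if dist[u] == t:
                    continue  # depth-capped BFS
                for w in H[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        if w == g:
                            return True
                        q.append(w)
            return False

        for u, v in edges:
            if not dist_at_most_t(u, v):
                H[u].add(v)
                H[v].add(u)
        return H

    edges = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]
    print({v: sorted(ns) for v, ns in greedy_spanner(edges, 4, t=2).items()})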
Article
Full-text available
A Euclidean approximate sparse recovery system consists of parameters $k, N$, an $m \times N$ measurement matrix $\Phi$, and a decoding algorithm $D$. Given a vector $x$, the system approximates $x$ by $\hat{x} = D(\Phi x)$, which must satisfy $\|\hat{x} - x\|_2 \le C \|x - x_k\|_2$, where $x_k$ denotes the optimal $k$-term approximation to $x$. (The output $\hat{x}$ may have more than $k$ terms.) For each vector $x$, the system must succeed with probability at least 3/4. Among the goals in designing such systems are minimizing the number $m$ of measurements and the runtime of the decoding algorithm $D$. In this paper, we give a system with $m = O(k \log(N/k))$ measurements--matching a lower bound, up to a constant factor--and decoding time $k \log^{O(1)} N$, matching a lower bound up to $\log(N)$ factors. We also consider the encode time (i.e., the time to multiply $\Phi$ by $x$), the time to update measurements (i.e., the time to multiply $\Phi$ by a 1-sparse $x$), and the robustness and stability of the algorithm (adding noise before and after the measurements). Our encode and update times are optimal up to $\log(k)$ factors. The columns of $\Phi$ have at most $O(\log^2(k)\log(N/k))$ non-zeros, each of which can be found in constant time. Our full result, an FPRAS, is as follows. If $x = x_k + \nu_1$, where $\nu_1$ and $\nu_2$ (below) are arbitrary vectors (regarded as noise), then, setting $\hat{x} = D(\Phi x + \nu_2)$, and for properly normalized $\nu_2$, we get $\|\hat{x} - x\|_2^2 \le (1+\epsilon)\|\nu_1\|_2^2 + \epsilon\|\nu_2\|_2^2$, using $O((k/\epsilon)\log(N/k))$ measurements and $(k/\epsilon)\log^{O(1)}(N)$ time for decoding.
Article
Full-text available
We improve on random sampling techniques for approximately solving problems that involve cuts in graphs. We give a linear-time construction that transforms any graph on n vertices into an O(n log n)-edge graph on the same vertices whose cuts have approximately the same value as the original graph's. In this new graph, for example, we can run the Õ(mn)-time maximum flow algorithm of Goldberg and Tarjan to find an s-t minimum cut in Õ(n²) time. This corresponds to a (1 + ε)-times minimum s-t cut in the original graph. In a similar way, we can approximate a sparsest cut in Õ(n²) time.
Book
Data stream algorithms as an active research agenda emerged only over the past few years, even though the concept of making few passes over the data for performing computations has been around since the early days of Automata Theory. The data stream agenda now pervades many branches of Computer Science including databases, networking, knowledge discovery and data mining, and hardware systems. Industry is in synch too, with Data Stream Management Systems (DSMSs) and special hardware to deal with data speeds. Even beyond Computer Science, data stream concerns are emerging in physics, atmospheric science and statistics. Data Streams: Algorithms and Applications focuses on the algorithmic foundations of data streaming. In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Data Streams: Algorithms and Applications surveys the emerging area of algorithms for processing data streams and associated applications. An extensive bibliography with over 200 entries points the reader to further resources for exploration.
Article
Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analysing distributions in a streaming setting, but the progress on graph problems has been limited. A main reason for this has been the existence of linear space lower bounds for even simple problems such as determining the connectedness of a graph. However, in many new scenarios that arise from social and other interaction networks, the number of vertices is significantly less than the number of edges. This has led to the formulation of the semi-streaming model where we assume that the space is (near) linear in the number of vertices (but not necessarily the edges), and the edges appear in an arbitrary (and possibly adversarial) order. In this paper we focus on graph sparsification, which is one of the major building blocks in a variety of graph algorithms. There has been a long history of (non-streaming) sampling algorithms that provide sparse graph approximations, and it is natural to ask whether sparsification can be achieved using small space and, in addition, a single pass over the data. The question is interesting from the standpoint of both theory and practice, and we answer it in the affirmative by providing a one pass $\tilde{O}(n/\epsilon^{2})$ space algorithm that produces a sparsification that approximates each cut to a $(1+\epsilon)$ factor. We also show that $\Omega(n \log \frac1\epsilon)$ space is necessary for a one pass streaming algorithm to approximate the min-cut, improving upon the $\Omega(n)$ lower bound that arises from lower bounds for testing connectivity.
Article
Over the last decade, there has been considerable interest in designing algorithms for processing massive graphs in the data stream model. The original motivation was two-fold: a) in many applications, the dynamic graphs that arise are too large to be stored in the main memory of a single machine and b) considering graph problems yields new insights into the complexity of stream computation. However, the techniques developed in this area are now finding applications in other areas including data structures for dynamic graphs, approximation algorithms, and distributed and parallel computation. We survey the state-of-the-art results; identify general techniques; and highlight some simple algorithms that illustrate basic ideas.
Article
Linear sketching is a popular technique for computing in dynamic streams, where one needs to handle both insertions and deletions of elements. The underlying idea of taking randomized linear measurements of input data has been extremely successful in providing space-efficient algorithms for classical problems such as frequency moment estimation and computing heavy hitters, and was very recently shown to be a powerful technique for solving graph problems in dynamic streams [AGM'12]. Ideally, one would like to obtain algorithms that use one or a small constant number of passes over the data and a small amount of space (i.e. sketching dimension) to preserve some useful properties of the input graph presented as a sequence of edge insertions and edge deletions. In this paper, we concentrate on the problem of constructing linear sketches of graphs that (approximately) preserve the spectral information of the graph in a few passes over the stream. We do so by giving the first sketch-based algorithm for constructing multiplicative graph spanners in only two passes over the stream. Our spanners use $\tilde{O}(n^{1+1/k})$ bits of space and have stretch $2k$. While this stretch is larger than the conjectured optimal $2k-1$ for this amount of space, we show for an appropriate $k$ that it implies the first 2-pass spectral sparsifier with $n^{1+o(1)}$ bits of space. Previous constructions of spectral sparsifiers in this model with a constant number of passes would require $n^{1+c}$ bits of space for a constant $c > 0$. We also give an algorithm for constructing spanners that provides an additive approximation to the shortest path metric using a single pass over the data stream, also achieving an essentially best possible space/approximation tradeoff.
Article
We present a new bound relating edge connectivity in a simple, unweighted graph with effective resistance in the corresponding electrical network. The bound is tight. While we believe the bound is of independent interest, our work is motivated by the problem of constructing combinatorial and spectral sparsifiers of a graph, i.e., sparse, weighted sub-graphs that preserve cut information (in the case of combinatorial sparsifiers) and additional spectral information (in the case of spectral sparsifiers). Recent results by Fung et al. (STOC 2011) and Spielman and Srivastava (SICOMP 2011) show that sampling edges with probability based on edge-connectivity gives rise to a combinatorial sparsifier whereas sampling edges with probability based on effective resistance gives rise to a spectral sparsifier. Our result implies that by simply increasing the sampling probability by an $O(n^{2/3})$ factor in the combinatorial sparsifier construction, we also preserve the spectral properties of the graph. Combining this with the algorithms of Ahn et al. (SODA 2012, PODS 2012) gives rise to the first data stream algorithm for the construction of spectral sparsifiers in the dynamic setting where edges can be added or removed from the stream. This was posed as an open question by Kelner and Levin (STACS 2011).
Conference Paper
We improve on random sampling techniques for approximately solving problems that involve cuts in graphs. We give a linear-time construction that transforms any graph on n vertices into an O(n log n)-edge graph on the same vertices whose cuts have approximately the same value as the original graph's. In this new graph, for example, we can run the Õ(mn)-time maximum flow algorithm of Goldberg and Tarjan to find an s-t minimum cut in Õ(n²) time. This corresponds to a (1 + ε)-times minimum s-t cut in the original graph. In a similar way, we can approximate a sparsest cut in Õ(n²) time.
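The sampling template behind this result is easy to state: keep each edge with some probability p, scale kept edges by 1/p, and every cut is preserved in expectation; Benczúr and Karger's contribution is choosing non-uniform probabilities (by edge strength) so that O(n log n) edges suffice with high probability. A toy uniform-sampling version in Python (illustrative only; a dense graph is used so uniform sampling concentrates):

    import random

    def sample_and_reweight(edges, p, rng):
        # uniform sampling: each surviving edge gets weight 1/p, so the
        # expected weight of every cut is unchanged
        return [(u, v, 1.0 / p) for (u, v) in edges if rng.random() < p]

    def cut_weight(edges, S):
        return sum(w for (u, v, w) in edges if (u in S) != (v in S))

    rng = random.Random(0)
    n = 60
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)]  # complete graph
    H = sample_and_reweight(edges, p=0.3, rng=rng)
    S = set(range(n // 2))
    print(cut_weight([(u, v, 1.0) for u, v in edges], S), cut_weight(H, S))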
Conference Paper
An oblivious subspace embedding (OSE) given some parameters ε, d is a distribution D over matrices Π ∈ ℝ^{m×n} such that for any linear subspace W ⊆ ℝⁿ with dim(W) = d, $\Pr_{\Pi \sim D}(\forall x \in W,\ \|\Pi x\|_2 \in (1 \pm \epsilon)\|x\|_2) > 2/3$. We show that a certain class of distributions, Oblivious Sparse Norm-Approximating Projections (OSNAPs), provides OSEs with $m = O(d^{1+\gamma}/\epsilon^2)$, and where every matrix Π in the support of the OSE has only $s = O_\gamma(1/\epsilon)$ non-zero entries per column, for γ > 0 any desired constant. Plugging OSNAPs into known algorithms for approximate least squares regression, ℓ_p regression, low rank approximation, and approximating leverage scores implies faster algorithms for all these problems. Our main result is essentially a Bai-Yin type theorem in random matrix theory and is likely to be of independent interest: we show that for any fixed U ∈ ℝ^{n×d} with orthonormal columns and random sparse Π, all singular values of ΠU lie in [1 − ε, 1 + ε] with good probability. This can be seen as a generalization of the sparse Johnson-Lindenstrauss lemma, which was concerned with d = 1. Our methods also recover a slightly sharper version of a main result of [Clarkson-Woodruff, STOC 2013], with a much simpler proof. That is, we show that OSNAPs give an OSE with $m = O(d^2/\epsilon^2)$, $s = 1$.
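A direct way to instantiate an OSNAP-style sparse embedding and check the subspace-embedding property empirically; the parameters m and s below are our illustrative choices, not the theorem's:

    import numpy as np

    def osnap(m, n, s, rng):
        # each column has exactly s nonzero entries, each +-1/sqrt(s), in random rows
        Pi = np.zeros((m, n))
        for j in range(n):
            rows = rng.choice(m, size=s, replace=False)
            Pi[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
        return Pi

    rng = np.random.default_rng(0)
    n, d = 2000, 10
    U = np.linalg.qr(rng.standard_normal((n, d)))[0]  # orthonormal basis of W
    Pi = osnap(m=400, n=n, s=4, rng=rng)
    print(np.linalg.svd(Pi @ U, compute_uv=False))    # singular values near 1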
Article
There has been significant interest and progress recently in algorithms that solve regression problems involving tall and thin matrices in input sparsity time. These algorithms find a shorter equivalent of an n × d matrix where n ≫ d, which allows one to solve a poly(d)-sized problem instead. In practice, the best performances are often obtained by invoking these routines in an iterative fashion. We show these iterative methods can be adapted to give theoretical guarantees comparable to and better than the current state of the art. Our approaches are based on computing the importances of the rows, known as leverage scores, in an iterative manner. We show that alternating between computing a short matrix estimate and finding more accurate approximate leverage scores leads to a series of geometrically smaller instances. This gives an algorithm that runs in $O(nnz(A) + d^{\omega + \theta} \epsilon^{-2})$ time for any $\theta > 0$, where the $d^{\omega + \theta}$ term is comparable to the cost of solving a regression problem on the small approximation. Our results are built upon the close connection between randomized matrix algorithms, iterative methods, and graph sparsification.
Article
When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses are sketches, i.e., those based on linear projections of the data. These are applicable in many models including various parallel, stream, and compressed sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements. In this paper we consider properties of graphs including the size of the cuts, the distances between nodes, and the prevalence of dense sub-graphs. Our main result is a sketch-based sparsifier construction: we show that $\tilde{O}(n\epsilon^{-2})$ random linear projections of a graph on n nodes suffice to (1 + ε) approximate all cut values. Similarly, we show that $O(\epsilon^{-2})$ linear projections suffice for (additively) approximating the fraction of induced sub-graphs that match a given pattern such as a small clique. Finally, for distance estimation we present sketch-based spanner constructions. In this last result the sketches are adaptive, i.e., the linear projections are performed in a small number of batches where each projection may be chosen dependent on the outcome of earlier sketches. All of the above results immediately give rise to data stream algorithms that also apply to dynamic graph streams where edges are both inserted and deleted. The non-adaptive sketches, such as those for sparsification and subgraphs, give us single-pass algorithms for distributed data streams with insertion and deletions. The adaptive sketches can be used to analyze MapReduce algorithms that use a small number of rounds.
Article
In this paper we introduce a new notion of distance between nodes in a graph that we refer to as robust connectivity. Robust connectivity between a pair of nodes u and v is parameterized by a threshold k and intuitively captures the number of paths between u and v of length at most k. Using this new notion of distances, we show that any black box algorithm for constructing a spanner can be used to construct a spectral sparsifier. We show that given an undirected weighted graph G, simply taking the union of spanners of a few (polylogarithmically many) random subgraphs of G obtained by sampling edges at different probabilities, after appropriate weighting, yields a spectral sparsifier of G. We show how this can be done in Õ(m) time, producing a sparsifier with Õ(n/ε²) edges. While the cut sparsifiers of Benczur and Karger are based on weighting edges according to (inverse) strong connectivity, and the spectral sparsifiers are based on resistance, our method weights edges using the robust connectivity measure. The main property that we use is that this new measure is always greater than the resistance when scaled by a factor of O(k) (k is chosen to be O(log n)), but, just like resistance and connectivity, has a bounded sum, i.e. Õ(n), over all the edges of the graph.
Article
An abstract is not available.
Article
We show faster algorithms for solving regression problems based on estimating statistical leverage scores. A growing number of applications involve large, sparse n × d matrices A where n ≫ d. For many of these applications the more expensive operations involve the d × d matrix A^T A. When n is much larger than d, the running time bottleneck is in the cost of computing this matrix, rather than the more expensive operations on the smaller matrix. Recent works by Clarkson, Drineas, Magdon-Ismail, Mahoney and Woodruff led to algorithms that approximate A^T A in O(nd log n + poly(d)) and O(nnz(A) + poly(d)) time respectively. When the size of A is much larger than the cost of more expensive operations involving the smaller d × d matrix, these algorithms offer significant savings in running time. We give alternate approaches for approximating A^T A based on techniques originally developed for spectral sparsification. Our algorithm finds a matrix B consisting of a small number of scaled rows of A such that with high probability $\|Ax\|_2 = (1 \pm \epsilon) \|Bx\|_2$ for all vectors x. The running time of our algorithm can be bounded by $O(nnz(A) + d^{\omega + \alpha} \epsilon^{-2})$ for any constant $\alpha > 0$. The key to our approach is to find a sequence of estimates of A^T A and gradually improve the approximations.
Article
In this paper we show several results obtained by combining the use of stable distributions with pseudorandom generators for bounded space. In particular: we show how to maintain (using only O(log n/ε²) words of storage) a sketch C(p) of a point p ∈ ℓ₁ⁿ under dynamic updates of its coordinates, such that given sketches C(p) and C(q) one can estimate ‖p − q‖₁ up to a factor of (1 + ε) with large probability. We obtain another sketch function C′ which maps ℓ₁ⁿ into a normed space ℓ₁^m (as opposed to C), such that m = m(n) is much smaller than n; to our knowledge this is the first dimensionality reduction lemma for the ℓ₁ norm. We give an explicit embedding of ℓ₂ⁿ into ℓ₁^{n^{O(log n)}} with distortion (1 + 1/n^{Θ(1)}) and a non-constructive embedding of ℓ₂ⁿ into ℓ₁^{O(n)} with distortion (1 + ε) such that the embedding can be represented using only O(n log² n) bits (as opposed to at least n² used by earlier methods).
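The ℓ₁ sketch in the first result rests on the 1-stability of the Cauchy distribution: for a matrix S of i.i.d. standard Cauchy entries, each coordinate of S(p − q) is distributed as ‖p − q‖₁ times a standard Cauchy, and the median of |standard Cauchy| is 1, so the median of absolute sketch coordinates recovers the distance. A numpy sketch (the dimensions are our illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 1000, 400                  # m rows give constant accuracy w.h.p.
    S = rng.standard_cauchy((m, n))   # shared sketch matrix (public randomness)

    p = rng.standard_normal(n)
    q = rng.standard_normal(n)
    # the sketch is linear, so C(p) - C(q) = S @ (p - q)
    est = np.median(np.abs(S @ p - S @ q))  # median |Cauchy| = 1, so this
    print(est, np.abs(p - q).sum())         # estimates ||p - q||_1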
Conference Paper
Several results appeared that show significant reduction in time for matrix multiplication, singular value decomposition as well as linear (ℓ₂) regression, all based on data dependent random sampling. Our key idea is that low dimensional embeddings can be used to eliminate data dependence and provide more versatile, linear time pass efficient matrix computation. Our main contribution is summarized as follows. 1) Independent of the results of Har-Peled and of Deshpande and Vempala, one of the first - and to the best of our knowledge the most efficient - relative error $(1 + \epsilon)\|A - A_k\|_F$ approximation algorithms for the singular value decomposition of an m × n matrix A with M non-zero entries that requires 2 passes over the data and runs in time $O((M(k/\epsilon + k \log k) + (n+m)(k/\epsilon + k \log k)^2)\log(1/\sigma))$. 2) The first $o(nd^2)$ time $(1 + \epsilon)$ relative error approximation algorithm for n × d linear (ℓ₂) regression. 3) A matrix multiplication and norm approximation algorithm that easily applies to implicitly given matrices and can be used as a black box probability boosting tool.
Article
We formalize a potentially rich new streaming model, the semi-streaming model, that we believe is necessary for the fruitful study of efficient algorithms for solving problems on massive graphs whose edge sets cannot be stored in memory. In this model, the input graph, G=(V,E), is presented as a stream of edges (in adversarial order), and the storage space of an algorithm is bounded by O(n·polylog n), where n=|V|. We are particularly interested in algorithms that use only one pass over the input, but, for problems where this is provably insufficient, we also look at algorithms using constant or, in some cases, logarithmically many passes. In the course of this general study, we give semi-streaming constant approximation algorithms for the unweighted and weighted matching problems, along with a further algorithmic improvement for the bipartite case. We also exhibit semi-streaming approximations to the diameter and the problem of computing the distance between specified vertices in a weighted graph. These are complemented by lower bounds.
Article
In this paper we give a construction of cut sparsifiers of Benczur and Karger in the {\em dynamic} streaming setting in a single pass over the data stream. Previous constructions either required multiple passes or were unable to handle edge deletions. We use $\tilde{O}(1/\epsilon^2)$ time for each stream update and $\tilde{O}(n/\epsilon^2)$ time to construct a sparsifier. Our $\epsilon$-sparsifiers have $O(n\log^3 n/\epsilon^2)$ edges. The main tools behind our result are an application of sketching techniques of Ahn et al. [SODA'12] to estimate edge connectivity together with a novel application of sampling with limited independence and sparse recovery to produce the edges of the sparsifier.
Conference Paper
We give near-optimal space bounds in the streaming model for linear algebra problems that include estimation of matrix products, linear regression, low-rank approximation, and approximation of matrix rank. In the streaming model, sketches of input matrices are maintained under updates of matrix entries; we prove results for turnstile updates, given in an arbitrary order. We give the first lower bounds known for the space needed by the sketches, for a given estimation error ε. We sharpen prior upper bounds, with respect to combinations of space, failure probability, and number of passes. The sketch we use for matrix $A$ is simply $S^T A$, where $S$ is a sign matrix. Our results include the following upper and lower bounds on the bits of space needed for 1-pass algorithms. Here $A$ is an $n \times d$ matrix, $B$ is an $n \times d'$ matrix, and $c := d + d'$. These results are given for fixed failure probability; for failure probability $\delta > 0$, the upper bounds require a factor of $\log(1/\delta)$ more space. We assume the inputs have integer entries specified by $O(\log(nc))$ bits, or $O(\log(nd))$ bits. (Matrix Product) Output matrix $C$ with $\|A^T B - C\|_F \le \epsilon \|A\|_F \|B\|_F$. We show that $\Theta(c\epsilon^{-2}\log(nc))$ space is needed. (Linear Regression) For $d' = 1$, so that $B$ is a vector $b$, find $x$ so that $\|Ax - b\| \le (1+\epsilon)\min_{x' \in \mathbb{R}^d}\|Ax' - b\|$. We show that $\Theta(d^2\epsilon^{-1}\log(nd))$ space is needed. (Rank-$k$ Approximation) Find matrix $\tilde{A}_k$ of rank no more than $k$, so that $\|A - \tilde{A}_k\|_F \le (1+\epsilon)\|A - A_k\|_F$, where $A_k$ is the best rank-$k$ approximation to $A$. Our lower bound is $\Omega(k\epsilon^{-1}(n+d)\log(nd))$ space, and we give a one-pass algorithm matching this when $A$ is given row-wise or column-wise. For general updates, we give a one-pass algorithm needing $O(k\epsilon^{-2}(n + d/\epsilon^2)\log(nd))$ space. We also give upper and lower bounds for algorithms using multiple passes, and a sketching analog of the CUR decomposition.
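The matrix-product guarantee can be checked directly: for a random sign matrix S with rows scaled by 1/√m, E[S^T S] = I, so (SA)^T(SB) is an unbiased estimate of A^T B with Frobenius error on the order of ‖A‖_F‖B‖_F/√m. A short numpy check (sizes are our illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, dp, m = 5000, 30, 20, 500
    A = rng.standard_normal((n, d))
    B = rng.standard_normal((n, dp))
    S = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)  # sign sketch

    C = (S @ A).T @ (S @ B)  # estimate of A^T B from the two sketches alone
    err = np.linalg.norm(A.T @ B - C) / (np.linalg.norm(A) * np.linalg.norm(B))
    print(err)  # roughly 1/sqrt(m), i.e. eps for m = O(1/eps^2)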
Conference Paper
We present an improved algorithm for solving symmetrically diagonally dominant linear systems. On input of an $n \times n$ symmetric diagonally dominant matrix $A$ with $m$ non-zero entries and a vector $b$ such that $A\bar{x} = b$ for some (unknown) vector $\bar{x}$, our algorithm computes a vector $x$ such that $\|x - \bar{x}\|_A \le \epsilon \|\bar{x}\|_A$ in time $\tilde{O}(m \log n \log(1/\epsilon))$. The solver utilizes in a standard way a 'preconditioning' chain of progressively sparser graphs. To claim the faster running time we make a two-fold improvement in the algorithm for constructing the chain. The new chain exploits previously unknown properties of the graph sparsification algorithm given in [Koutis, Miller, Peng, FOCS 2010], allowing for stronger preconditioning properties. We also present an algorithm of independent interest that constructs nearly-tight low-stretch spanning trees in time $\tilde{O}(m \log n)$, a factor of $O(\log n)$ faster than the algorithm in [Abraham, Bartal, Neiman, FOCS 2008]. This speedup directly reflects on the construction time of the preconditioning chain.
Conference Paper
We initiate the study of graph sketching, i.e., algorithms that use a limited number of linear measurements of a graph to determine the properties of the graph. While a graph on n nodes is essentially O(n²)-dimensional, we show the existence of a distribution over random projections into d-dimensional "sketch" space (d ≪ n²) such that the relevant properties of the original graph can be inferred from the sketch with high probability. Specifically, we show that: 1. d = O(n · polylog n) suffices to evaluate properties including connectivity, k-connectivity, bipartiteness, and to return any constant approximation of the weight of the minimum spanning tree. 2. d = O(n^{1+γ}) suffices to compute graph sparsifiers, the exact MST, and approximate the maximum weighted matchings if we permit O(1/γ)-round adaptive sketches, i.e., a sequence of projections where each projection may be chosen dependent on the outcome of earlier sketches. Our results have two main applications, both of which have the potential to give rise to fruitful lines of further research. First, our results can be thought of as giving the first compressed-sensing style algorithms for graph data. Secondly, our work initiates the study of dynamic graph streams. There is already extensive literature on processing massive graphs in the data-stream model. However, the existing work focuses on graphs defined by a sequence of inserted edges and does not consider edge deletions. We think this is a curious omission given the existing work on both dynamic graphs in the non-streaming setting and dynamic geometric streaming. Our results include the first dynamic graph semi-streaming algorithms for connectivity, spanning trees, sparsification, and matching problems.
Article
Pseudorandom generators are constructed which convert O(S log R) truly random bits to R bits that appear random to any algorithm that runs in SPACE(S). In particular, any randomized polynomial time algorithm that runs in space S can be simulated using only O(S log n) random bits. An application of these generators is an explicit construction of universal traversal sequences (for arbitrary graphs) of length n^{O(log n)}. The generators constructed are technically stronger than just appearing random to space-bounded machines, and have several other applications. In particular, applications are given for "deterministic amplification" (i.e. reducing the probability of error of randomized algorithms), as well as generalizations of it.
Article
Let G be a graph with n vertices and m edges. A sparsifier of G is a sparse graph on the same vertex set approximating G in some natural way. It allows us to say useful things about G while considering much fewer than m edges. The strongest commonly-used notion of sparsification is spectral sparsification; H is a spectral sparsifier of G if the quadratic forms induced by the Laplacians of G and H approximate one another well. This notion is strictly stronger than the earlier concept of combinatorial sparsification. In this paper, we consider a semi-streaming setting, where we have only \(\tilde{O}(n)\) storage space, and we thus cannot keep all of G. In this case, maintaining a sparsifier instead gives us a useful approximation to G, allowing us to answer certain questions about the original graph without storing all of it. We introduce an algorithm for constructing a spectral sparsifier of G with \(O(n\log n/\epsilon^2)\) edges (where ε is a parameter measuring the quality of the sparsifier), taking \(\tilde{O}(m)\) time and requiring only one pass over G. In addition, our algorithm has the property that it maintains at all times a valid sparsifier for the subgraph of G that we have received. Our algorithm is natural and conceptually simple. As we read edges of G, we add them to the sparsifier H. Whenever H gets too big, we resparsify it in \(\tilde{O}(n)\) time. Adding edges to a graph changes the structure of its sparsifier's restriction to the already existing edges. It would thus seem that the above procedure would cause errors to compound each time that we resparsify, and that we should need to either retain significantly more information or reexamine previously discarded edges in order to construct the new sparsifier. However, we show how to use the information contained in H to perform this resparsification using only the edges retained by earlier steps in nearly linear time.
Article
This paper presents new probability inequalities for sums of independent, random, self-adjoint matrices. These results place simple and easily verifiable hypotheses on the summands, and they deliver strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. Tail bounds for the norm of a sum of random rectangular matrices follow as an immediate corollary. The proof techniques also yield some information about matrix-valued martingales. In other words, this paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid. The matrix inequalities promise the same diversity of application, ease of use, and strength of conclusion that have made the scalar inequalities so valuable.
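One representative bound from this family, stated here for orientation (this is the matrix Bernstein inequality in its common form; see the paper for exact constants and variants): for independent, zero-mean, self-adjoint $d \times d$ random matrices $X_i$ with $\|X_i\| \le R$ almost surely,

$$\Pr\left[\lambda_{\max}\Big(\sum_i X_i\Big) \ge t\right] \;\le\; d \cdot \exp\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right), \qquad \sigma^2 := \Big\|\sum_i \mathbb{E}\big[X_i^2\big]\Big\|.$$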
Conference Paper
Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analysing distributions in a streaming setting, but the progress on graph problems has been limited. A main reason for this has been the existence of linear space lower bounds for even simple problems such as determining the connectedness of a graph. However, in many new scenarios that arise from social and other interaction networks, the number of vertices is significantly less than the number of edges. This has led to the formulation of the semi-streaming model where we assume that the space is (near) linear in the number of vertices (but not necessarily the edges), and the edges appear in an arbitrary (and possibly adversarial) order. However there has been limited progress in analysing graph algorithms in this model. In this paper we focus on graph sparsification, which is one of the major building blocks in a variety of graph algorithms. Further, there has been a long history of (non-streaming) sampling algorithms that provide sparse graph approximations, and it is natural to ask: since the end result of the sparse approximation is a small (linear) space structure, can we achieve that using small space, and in addition using a single pass over the data? The question is interesting from the standpoint of both theory and practice and we answer the question in the affirmative, by providing a one pass \(\tilde{O}(n/\epsilon^{2})\) space algorithm that produces a sparsification that approximates each cut to a (1 + ε) factor. We also show that \(\Omega(n \log \frac1\epsilon)\) space is necessary for a one pass streaming algorithm to approximate the min-cut, improving upon the Ω(n) lower bound that arises from lower bounds for testing connectivity.
Article
We study the maximum weight matching problem in the semi-streaming model, and improve on the currently best one-pass algorithm due to Zelke (Proc. of STACS 2008, pages 669-680) by devising a deterministic approach whose performance guarantee is $4.91 + \epsilon$. In addition, we study preemptive online algorithms, a sub-class of one-pass algorithms where we are only allowed to maintain a feasible matching in memory at any point in time. All known results prior to Zelke's belong to this sub-class. We provide a lower bound of 4.967 on the competitive ratio of any such deterministic algorithm, and hence show that future improvements will have to store in memory a set of edges which is not necessarily a feasible matching.
Article
Contents: 1 Introduction (1.1 Puzzle 1: Finding Missing Numbers; 1.2 Puzzle 2: Fishing; 1.3 Lessons); 2 Map; 3 Data Stream Phenomenon; 4 Data Streaming: Formal Aspects (4.1 Data Stream Models; 4.2 A Motivating Scenario; 4.3 Other Applications for Data Stream Models); 5 Foundations (5.1 Basic Mathematical Ideas: 5.1.1 Sampling; 5.1.2 Random Projections ...)
Article
This paper has been divided into three papers: arXiv:0809.3232, arXiv:0808.4134, arXiv:cs/0607105. Comment: withdrawn by author.
Article
We present a nearly-linear time algorithm that produces high-quality sparsifiers of weighted graphs. Given as input a weighted graph $G=(V,E,w)$ and a parameter $\epsilon>0$, we produce a weighted subgraph $H=(V,\tilde{E},\tilde{w})$ of $G$ such that $|\tilde{E}|=O(n\log n/\epsilon^2)$ and for all vectors $x\in\mathbb{R}^V$: $(1-\epsilon)\sum_{uv\in E}(x(u)-x(v))^2 w_{uv}\le \sum_{uv\in\tilde{E}}(x(u)-x(v))^2\tilde{w}_{uv} \le (1+\epsilon)\sum_{uv\in E}(x(u)-x(v))^2 w_{uv}$. (*) This improves upon the sparsifiers constructed by Spielman and Teng, which had $O(n\log^c n)$ edges for some large constant $c$, and upon those of Bencz\'ur and Karger, which only satisfied (*) for $x\in\{0,1\}^V$. A key ingredient in our algorithm is a subroutine of independent interest: a nearly-linear time algorithm that builds a data structure from which we can query the approximate effective resistance between any two vertices in a graph in $O(\log n)$ time.
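The sampling rule in this construction is the one the main paper emulates from a sketch: sample edges with probability proportional to $w_e$ times the effective resistance $R_e$, then reweight. A dense numpy sketch of that rule (constants and the dense pseudoinverse are our illustrative choices; the paper's nearly-linear-time version uses approximate resistances instead):

    import numpy as np

    def spectral_sparsify(n, edges, eps, rng):
        # edges: list of (u, v, w). Sample q edges with replacement, with
        # probability proportional to w_e * R_e, and reweight by 1/(q * p_e).
        L = np.zeros((n, n))
        for u, v, w in edges:
            L[u, u] += w; L[v, v] += w
            L[u, v] -= w; L[v, u] -= w
        Lp = np.linalg.pinv(L)  # dense pseudoinverse, for illustration only
        taus = np.array([w * (Lp[u, u] + Lp[v, v] - 2 * Lp[u, v])
                         for u, v, w in edges])  # w_e * R_e; sums to n-1
        probs = taus / taus.sum()
        q = int(np.ceil(4 * n * np.log(n) / eps ** 2))  # constant illustrative
        counts = rng.multinomial(q, probs)
        return [(edges[i][0], edges[i][1], edges[i][2] * c / (q * probs[i]))
                for i, c in enumerate(counts) if c > 0]

    rng = np.random.default_rng(0)
    n = 300
    edges = [(u, v, 1.0) for u in range(n) for v in range(u + 1, n)]
    H = spectral_sparsify(n, edges, eps=1.0, rng=rng)
    print(len(edges), len(H))  # the sparsifier keeps far fewer edges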
Matrix concentration
  • Nick Harvey
Nick Harvey. Matrix concentration. http://www.cs.rpi.edu/~drinep/RandNLA/slides/Harvey_Rand 2012.
Randomized algorithms lecture notes
  • A Blum
  • A Gupta
Improved approximation guarantees for weighted matching in the semi-streaming model
  • L Epstein
  • A Levin
  • J Mestre
  • D Segev
Graph sketches: sparsification, spanners, and subgraphs
  • Sudipto Kook Jin Ahn
  • Andrew Guha
  • Mcgregor
Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Graph sketches: sparsification, spanners, and subgraphs. In PODS, pages 5-14, 2012.
Spectral sparsification in dynamic graph streams
  • Sudipto Kook Jin Ahn
  • Andrew Guha
  • Mcgregor
Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Spectral sparsification in dynamic graph streams. In APPROX-RANDOM, pages 1-10, 2013.
Computing on data streams. In External Memory Algorithms
  • Monika R Henzinger
  • Prabhakar Raghavan
  • Sridhar Rajagopalan
Monika R. Henzinger, Prabhakar Raghavan, and Sridhar Rajagopalan. Computing on data streams. In External Memory Algorithms, pages 107-118. American Mathematical Society, Boston, MA, USA, 1999.