Reid Andersen

Microsoft, Washington, United States

Publications (26) · 0.98 Total impact

  • ABSTRACT: Most graph decomposition procedures seek to partition a graph into disjoint sets of vertices. Motivated by applications of clustering in distributed computation, we describe a graph decomposition algorithm for the paradigm where the partitions intersect. This algorithm covers the vertex set with a collection of overlapping clusters. Each vertex in the graph is well-contained within some cluster in the collection. We then describe a framework for distributed computation across a collection of overlapping clusters and describe how this framework can be used in various algorithms based on the graph diffusion process. In particular, we focus on two illustrative examples: (i) the simulation of a randomly walking particle and (ii) the solution of a linear system, e.g. PageRank. Our simulation results for these two cases show a significant reduction in swapping between clusters in a random walk, a significant decrease in communication volume during a linear system solve in a geometric mesh, and some ability to reduce the communication volume during a linear system solve in an information network.
    Proceedings of the Fifth International Conference on Web Search and Web Data Mining, WSDM 2012, Seattle, WA, USA, February 8-12, 2012; 01/2012
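    The first illustrative example above can be pictured with a short simulation. Below is a minimal sketch (not the paper's code): it assumes a toy undirected graph given as an adjacency dict of neighbour lists, a precomputed overlapping cover `clusters`, and a `home` assignment of each vertex to one cluster, and it counts how often a randomly walking particle must be handed off between clusters.

      import random

      def count_cluster_swaps(adj, clusters, home, start, steps, seed=0):
          """Simulate a random walk and count hand-offs between clusters.
          `clusters` maps a cluster id to a set of vertices (clusters may
          overlap); `home` maps each vertex to one cluster responsible for
          it.  The walk stays with its current cluster for as long as the
          current vertex is still covered by that cluster."""
          rng = random.Random(seed)
          current_cluster = home[start]
          v, swaps = start, 0
          for _ in range(steps):
              v = rng.choice(adj[v])                  # one random-walk step
              if v not in clusters[current_cluster]:  # particle left the overlap
                  current_cluster = home[v]           # hand off to v's home cluster
                  swaps += 1
          return swaps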
  • ABSTRACT: In this paper, we propose WebCop to identify malicious web pages and neighborhoods of malware on the internet. Using a bottom-up approach, telemetry data from commercial Anti-Malware (AM) clients running on millions of computers first identifies malware distribution sites hosting malicious executables on the web. Next, traversing hyperlinks in the reverse direction in a web graph constructed from a commercial search engine crawler quickly discovers malware landing pages linking to the malware distribution sites. In addition, the malicious distribution sites and the web graph are used to identify neighborhoods of malware, locate additional executables distributed on the internet which may be unknown malware, and identify false positives in AM signatures. We compare the malicious URLs generated by the proposed method with those found by a commercial drive-by download approach and show that the lists are independent; both methods can be used to identify malware on the internet and help protect end users.
    3rd Usenix Workshop on Large-Scale Exploits and Emergent Threats; 01/2010
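    The reverse traversal from known distribution sites can be sketched as a bounded breadth-first search over the reversed web graph. A minimal illustration, assuming an in-link dictionary and a hypothetical hop bound `max_hops` (the paper does not specify this interface):

      from collections import deque

      def find_candidate_landing_pages(in_links, distribution_sites, max_hops=2):
          """Walk the hyperlink graph backwards from known malware
          distribution sites: any page reached within `max_hops` reverse
          hops is flagged as a candidate malware landing page.
          `in_links[u]` lists the pages that link to u."""
          frontier = deque((site, 0) for site in distribution_sites)
          seen = set(distribution_sites)
          candidates = set()
          while frontier:
              page, hops = frontier.popleft()
              if hops == max_hops:
                  continue
              for linker in in_links.get(page, ()):
                  if linker not in seen:
                      seen.add(linker)
                      candidates.add(linker)
                      frontier.append((linker, hops + 1))
          return candidates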
  • Reid Andersen, Uriel Feige
    ABSTRACT: Harald Racke [STOC 2008] described a new method for obtaining hierarchical decompositions of networks in a way that minimizes congestion. Racke's approach is based on an equivalence that he discovered between minimizing congestion and minimizing stretch (in a certain setting). Here we present Racke's equivalence in an abstract setting that is more general than the one described in Racke's work and that clarifies the power of Racke's result. In addition, we present a related (but different) equivalence that was developed by Yuval Emek [ESA 2009] and is only known to apply to planar graphs.
    07/2009;
  • Reid Andersen, Kumar Chellapilla
    ABSTRACT: We consider the problem of finding dense subgraphs with specified upper or lower bounds on the number of vertices. We introduce two optimization problems: the densest at-least-k-subgraph problem (dalks), which is to find an induced subgraph of highest average degree among all subgraphs with at least k vertices, and the densest at-most-k-subgraph problem (damks), which is defined similarly. These problems are relaxed versions of the well-known densest k-subgraph problem (dks), which is to find the densest subgraph with exactly k vertices. Our main result is that dalks can be approximated efficiently, even for web-scale graphs. We give a (1/3)-approximation algorithm for dalks that is based on the core decomposition of a graph and that runs in time O(m + n), where n is the number of nodes and m is the number of edges. In contrast, we show that damks is nearly as hard to approximate as the densest k-subgraph problem, for which no good approximation algorithm is known. In particular, we show that if there exists a polynomial time approximation algorithm for damks with approximation ratio γ, then there is a polynomial time approximation algorithm for dks with approximation ratio γ²/8. In the experimental section, we test the algorithm for dalks on large publicly available web graphs. We observe that, in addition to producing near-optimal solutions for dalks, the algorithm also produces near-optimal solutions for dks for nearly all values of k.
    Algorithms and Models for the Web-Graph, 6th International Workshop, WAW 2009, Barcelona, Spain, February 12-13, 2009. Proceedings; 01/2009
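    The (1/3)-approximation above is based on the core decomposition. The following is a minimal greedy-peeling sketch in that spirit, for an undirected graph given as an adjacency dict of neighbour sets; it is an illustration, not the authors' implementation:

      import heapq

      def densest_at_least_k(adj, k):
          """Repeatedly delete a minimum-degree vertex and remember the
          densest (highest average degree) intermediate subgraph that
          still has at least k vertices."""
          deg = {v: len(ns) for v, ns in adj.items()}
          edges = sum(deg.values()) // 2
          alive = set(adj)
          heap = [(d, v) for v, d in deg.items()]
          heapq.heapify(heap)
          removed, best_density, best_cut = [], -1.0, None
          while alive:
              d, v = heapq.heappop(heap)
              if v not in alive or d != deg[v]:
                  continue                              # stale heap entry
              if len(alive) >= k:
                  density = 2.0 * edges / len(alive)    # average degree of current subgraph
                  if density > best_density:
                      best_density, best_cut = density, len(removed)
              alive.remove(v)
              removed.append(v)
              edges -= deg[v]                           # drop v's edges to alive vertices
              for u in adj[v]:
                  if u in alive:
                      deg[u] -= 1
                      heapq.heappush(heap, (deg[u], u))
          best_set = set(removed[best_cut:]) if best_cut is not None else set()
          return best_set, best_density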
  • ABSTRACT: A variety of lossless compression schemes has been proposed to reduce the storage requirements of web graphs. One successful approach is virtual-node compression, in which often-used patterns of links are replaced by links to virtual nodes, creating a compressed graph that succinctly represents the original. In this paper, we show that several important classes of web graph algorithms can be extended to run directly on virtual-node-compressed graphs, such that their running times depend on the size of the compressed graph rather than on that of the original. These include algorithms for link analysis, estimating the size of vertex neighborhoods, and a variety of algorithms based on matrix-vector products and random walks. Similar speedups have been obtained previously for classical graph algorithms such as shortest paths and maximum bipartite matching. We measure the performance of our modified algorithms on several publicly available web graph data sets, and demonstrate significant empirical speedups that nearly match the compression ratios.
    Proceedings of the Second International Conference on Web Search and Web Data Mining, WSDM 2009, Barcelona, Spain, February 9-11, 2009; 01/2009
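    One way to see why matrix-vector products can run on the compressed graph directly: every real-to-real path through virtual nodes stands for one original link, so propagating values through the virtual nodes reproduces the uncompressed product while touching only compressed edges. A minimal sketch under that assumption; the data layout and names are illustrative, and the virtual nodes are assumed to form a DAG given in topological order:

      def compressed_matvec(out_edges, real_nodes, virtual_order, x):
          """Distribute the values in x along the links of the original
          graph, using only the virtual-node-compressed graph `out_edges`.
          Returns y with y[v] = sum of x[u] over original links u -> v."""
          acc = {}                                      # running totals at virtual nodes
          y = {u: 0.0 for u in real_nodes}
          # real nodes send their value along compressed out-edges
          for u in real_nodes:
              xu = x.get(u, 0.0)
              for w in out_edges.get(u, ()):
                  if w in y:
                      y[w] += xu
                  else:
                      acc[w] = acc.get(w, 0.0) + xu
          # virtual nodes forward their accumulated value, in topological order
          for z in virtual_order:
              total = acc.get(z, 0.0)
              for w in out_edges.get(z, ()):
                  if w in y:
                      y[w] += total
                  else:
                      acc[w] = acc.get(w, 0.0) + total
          return y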
  • ABSTRACT: Since the link structure of the web is an important element in ranking systems on search engines, web spammers widely use the link structure of the web to increase the rank of their pages. Various link-based features of web pages have been introduced and have proven effective at identifying link spam. One particularly successful family of features (as described in the SpamRank algorithm) is based on examining the sets of pages that contribute most to the PageRank of a given vertex, called supporting sets. In a recent paper, the current authors described an algorithm for efficiently computing, for a single specified vertex, an approximation of its supporting sets. In this paper, we describe several link-based spam-detection features, both supervised and unsupervised, that can be derived from these approximate supporting sets. In particular, we examine the size of a node's supporting sets and the approximate l2 norm of the PageRank contributions from other nodes. As a supervised feature, we examine the composition of a node's supporting sets. We perform experiments on two labeled real data sets to demonstrate the effectiveness of these features for spam detection, and demonstrate that these features can be computed efficiently. Furthermore, we design a variation of PageRank (called Robust PageRank) that incorporates some of these features into its ranking, argue that this variation is more robust against link spam engineering, and give an algorithm for approximating Robust PageRank.
    05/2008;
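    Two of the unsupervised quantities mentioned above are easy to state once an (approximate) contribution vector for a page is available. A toy sketch; the threshold parameter `delta` and the exact definitions are assumptions and may differ from the paper's:

      import math

      def supporting_set_features(contrib, pagerank_v, delta=0.001):
          """Given approximate PageRank contributions to a page v
          (contrib[u] ~ contribution of u to v's PageRank) and v's own
          PageRank value, return the size of the delta-supporting set and
          the l2 norm of the contributions."""
          supporting = [u for u, c in contrib.items() if c >= delta * pagerank_v]
          l2_norm = math.sqrt(sum(c * c for c in contrib.values()))
          return len(supporting), l2_norm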
  • Reid Andersen, Kevin J. Lang
    ABSTRACT: We present an algorithm called Improve that improves a proposed partition of a graph, taking as input a subset of vertices and returning a new subset of vertices with a smaller quotient cut score. The most powerful previously known method for improving quotient cuts, which is based on parametric flow, returns a partition whose quotient cut score is at least as small as that of any set contained within the proposed set. For our algorithm, we can prove a stronger guarantee: the quotient score of the set returned is nearly as small as that of any set in the graph with which the proposed set has a larger-than-expected intersection. The algorithm finds such a set by solving a sequence of polynomially many s-t minimum cut problems, a sequence that cannot be cast as a single parametric flow problem. We demonstrate empirically that applying Improve to the output of various graph partitioning algorithms greatly improves the quality of cuts produced without significantly impacting the running time.
    Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008; 01/2008
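    For reference, the quotient cut score being minimized here is, in one standard formulation (stated for orientation, not quoted from the paper),

      q(S) = \frac{e(S,\; V \setminus S)}{\min\{|S|,\ |V \setminus S|\}},

    the number of edges leaving S divided by the number of vertices on the smaller side of the cut.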
  • ABSTRACT: In this paper, we analyze a graph-theoretic property motivated by web crawling. We introduce a notion of stable cores, which is the set of web pages that are usually contained in the crawling buffer when the buffer size is smaller than the total number of web pages. We analyze the size of the core in a random graph model based on the bounded Pareto power law distribution. We prove that a core of significant size exists for a large range of parameters.
    Algorithms and Computation, 19th International Symposium, ISAAC 2008, Gold Coast, Australia, December 15-17, 2008. Proceedings; 01/2008
  • Reid Andersen, Fan R. K. Chung, Kevin J. Lang
    Internet Mathematics. 01/2008; 5:3-22.
  • Internet Mathematics. 01/2008; 5:23-45.
  • ABSTRACT: High-quality, personalized recommendations are a key feature in many online systems. Since these systems often have explicit knowledge of social network structures, the recommendations may incorporate this information. This paper focuses on networks which represent trust and recommendations which incorporate trust relationships. The goal of a trust-based recommendation system is to generate personalized recommendations from known opinions and trust relationships. In analogy to prior work on voting and ranking systems, we use the axiomatic approach from the theory of social choice. We develop a natural set of five axioms that we desire any recommendation system to exhibit. We then show that no system can simultaneously satisfy all of these axioms. We also exhibit systems which satisfy any four of the five axioms. Next, we consider ways of weakening the axioms, which can lead to a unique recommendation system based on random walks. We consider other recommendation systems (personalized PageRank, majority of majorities, and min cut) and search for alternative axiomatizations which uniquely characterize these systems. Finally, we determine which of these systems are incentive compatible. This is an important property for systems deployed in a monetized environment: groups of agents interested in manipulating recommendations to make others share their opinion have nothing to gain from lying about their votes or their trust links.
    Proceedings of the 17th International Conference on World Wide Web, WWW 2008, Beijing, China, April 21-25, 2008; 01/2008
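    The idea of a random-walk based recommendation can be illustrated with a simple Monte-Carlo estimate: walk along trust edges from the querying agent until an agent with an opinion is reached, and average the opinions found. This is only an illustration of the idea, not the precise system characterized in the paper; the walk-length cap and data layout are assumptions.

      import random

      def random_walk_recommendation(trust, votes, source, walks=1000, seed=0):
          """`trust[a]` is the list of agents that a trusts; `votes` maps
          voters to +1 or -1.  Returns the average vote reached by random
          walks started at `source` (0.0 if no walk reaches a voter)."""
          rng = random.Random(seed)
          total, hits = 0, 0
          for _ in range(walks):
              node = source
              for _ in range(100):                 # cap the walk length
                  if node in votes:
                      total += votes[node]
                      hits += 1
                      break
                  out = trust.get(node)
                  if not out:
                      break                        # dead end: discard this walk
                  node = rng.choice(out)
          return total / hits if hits else 0.0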
  • Reid Andersen, Fan Chung, Kevin Lang
    ABSTRACT: A local partitioning algorithm finds a set with small conductance near a specified seed vertex. In this paper, we present a generalization of a local partitioning algorithm for undirected graphs to strongly connected directed graphs. In particular, we prove that by computing a personalized PageRank vector in a directed graph, starting from a single seed vertex within a set S that has conductance at most α, and by performing a sweep over that vector, we can obtain a set of vertices S′ with conductance Φ_M(S′) = O(√(α log |S|)). Here, the conductance function Φ_M is defined in terms of the stationary distribution of a random walk in the directed graph. In addition, we describe how this algorithm may be applied to the PageRank Markov chain of an arbitrary directed graph, which provides a way to partition directed graphs that are not strongly connected.
    11/2007: pages 166-178;
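    The conductance Φ_M used above is the one induced by the chain's stationary distribution π and transition matrix P; in one standard formulation (stated for orientation, not quoted from the chapter),

      \Phi_M(S) = \frac{\sum_{u \in S} \pi(u)\, P(u, \bar{S})}{\min\{\pi(S),\ \pi(\bar{S})\}}.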
  • Reid Andersen, Fan Chung
    ABSTRACT: We show that whenever there is a sharp drop in the numerical rank defined by a personalized PageRank vector, the location of the drop reveals a cut with small conductance. We then show that for any cut in the graph, and for many starting vertices within that cut, an approximate personalized PageRank vector will have a sharp drop sufficient to produce a cut with conductance nearly as small as the original cut. Using this technique, we produce a nearly linear time local partitioning algorithm whose analysis is simpler than previous algorithms.
    07/2007: pages 1-12;
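    For context, the "approximate personalized PageRank plus sweep" pipeline that this analysis concerns can be sketched as follows for an undirected graph given as an adjacency dict of neighbour lists. The constants, the lazy-walk push rule, and the stopping threshold are generic choices for illustration, not the exact ones analyzed in the paper.

      from collections import deque

      def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
          """Push-style approximation of a personalized PageRank vector
          started at `seed`."""
          p, r = {}, {seed: 1.0}
          queue = deque([seed])
          while queue:
              u = queue.popleft()
              du = len(adj[u])
              ru = r.get(u, 0.0)
              if du == 0 or ru < eps * du:
                  continue                          # nothing (or too little) to push
              p[u] = p.get(u, 0.0) + alpha * ru
              r[u] = (1.0 - alpha) * ru / 2.0       # lazy walk: keep half at u
              share = (1.0 - alpha) * ru / (2.0 * du)
              for v in adj[u]:                      # spread the rest to neighbours
                  r[v] = r.get(v, 0.0) + share
                  if r[v] >= eps * len(adj[v]):
                      queue.append(v)
              if r[u] >= eps * du:
                  queue.append(u)
          return p

      def sweep_cut(adj, p):
          """Order vertices by p(u)/deg(u) and return the prefix set of
          smallest conductance, together with that conductance."""
          order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
          total_vol = sum(len(ns) for ns in adj.values())
          in_set, vol, cut = set(), 0, 0
          best_phi, best_prefix = float("inf"), 0
          for i, u in enumerate(order, 1):
              du = len(adj[u])
              vol += du
              cut += du - 2 * sum(1 for v in adj[u] if v in in_set)
              in_set.add(u)
              denom = min(vol, total_vol - vol)
              if denom > 0 and cut / denom < best_phi:
                  best_phi, best_prefix = cut / denom, i
          return set(order[:best_prefix]), best_phi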
  • Reid Andersen, Fan R. K. Chung, Linyuan Lu
    ABSTRACT: It has been noted that many realistic graphs have a power law degree distribution and exhibit the small-world phenomenon. We present drawing methods influenced by recent developments in the modeling of such graphs. Our main approach is to partition the edge set of a graph into "local" edges and "global" edges and to use a standard drawing method that allows us to give added importance to local edges. We show that our drawing method works well for graphs that contain underlying geometric graphs augmented with random edges, and we demonstrate the method on a few examples. We define edges to be local or global depending on the size of the maximum short flow between the edge's endpoints. Here, a short flow, or alternatively an ℓ-short flow, is one composed of paths whose length is at most some constant ℓ. We present fast approximation algorithms for the maximum short flow problem and for testing whether a short flow of a certain size exists between given vertices. Using these algorithms, we give an algorithm for computing approximate local subgraphs of a given graph. The drawing algorithm we present can be applied to general graphs, but it is particularly well suited for small-world networks with power law degree distribution.
    Algorithmica 04/2007; 47:379-397. · 0.49 Impact Factor
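    The notion of an ℓ-short flow above can be made concrete with a simple greedy count of edge-disjoint short paths. This is only a naive lower bound for illustration, not the approximation algorithm developed in the paper; an undirected adjacency dict of neighbour sets is assumed.

      from collections import deque

      def greedy_short_flow_lower_bound(adj, s, t, ell):
          """Repeatedly find a path of length at most ell from s to t by
          breadth-first search and delete its edges; return the number of
          edge-disjoint short paths found."""
          if s == t:
              return 0
          edges = {u: set(vs) for u, vs in adj.items()}
          flow = 0
          while True:
              parent, depth = {s: None}, {s: 0}
              q = deque([s])
              while q and t not in parent:
                  u = q.popleft()
                  if depth[u] == ell:
                      continue                      # do not grow paths beyond length ell
                  for v in edges[u]:
                      if v not in parent:
                          parent[v], depth[v] = u, depth[u] + 1
                          q.append(v)
              if t not in parent:
                  return flow
              v = t
              while parent[v] is not None:          # delete the path's edges
                  u = parent[v]
                  edges[u].discard(v)
                  edges[v].discard(u)
                  v = u
              flow += 1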
  • Reid Andersen, Fan R. K. Chung, Linyuan Lu
    ABSTRACT: We examine reconfigurations between triangulations and near-triangulations of point sets. We give new bounds on the number of point moves and edge flips sufficient for any reconfiguration. We show that with O(n log n) edge flips and point moves, we can ...
    Algorithmica 01/2007; 47:397. · 0.49 Impact Factor
  • Reid Andersen, Fan R. K. Chung, Kevin J. Lang
    ABSTRACT: A local graph partitioning algorithm finds a cut near a specified starting vertex, with a running time that depends largely on the size of the small side of the cut, rather than the size of the input graph. In this paper, we present a local partitioning algorithm using a variation of PageRank with a specified starting distribution. We derive a mixing result for PageRank vectors similar to that for random walks, and show that the ordering of the vertices produced by a PageRank vector reveals a cut with small conductance. In particular, we show that for any set C with conductance and volume
    Internet Mathematics. 01/2007; 4:35-64.
  • Reid Andersen, Sebastian M. Cioaba
    ABSTRACT: In this paper, we study spectral versions of the densest subgraph problem and the largest independence subset problem. In the first part, we give an algorithm for identifying small subgraphs with large spectral radius. We also prove a Hoffman-type ratio bound for the order of an induced subgraph whose spectral radius is bounded from above.
    J. UCS. 01/2007; 13:1501-1513.
  • Reid Andersen, Fan R. K. Chung, Kevin J. Lang
    Algorithms and Models for the Web-Graph, 5th International Workshop, WAW 2007, San Diego, CA, USA, December 11-12, 2007, Proceedings; 01/2007
  • ABSTRACT: Motivated by the problem of detecting link spam, we consider the following graph-theoretic primitive: given a webgraph G, a vertex v in G, and a parameter δ ∈ (0,1), compute the set of all vertices that contribute to v at least a δ fraction of v's PageRank. We call this set the δ-contributing set of v. To this end, we define the contribution vector of v to be the vector whose entries measure the contributions of every vertex to the PageRank of v. A local algorithm is one that produces a solution by adaptively examining only a small portion of the input graph near a specified vertex. We give an efficient local algorithm that computes an ε-approximation of the contribution vector for a given vertex by adaptively examining O(1/ε) vertices. Using this algorithm, we give a local approximation algorithm for the primitive defined above. Specifically, we give an algorithm that returns a set containing the δ-contributing set of v and at most O(1/δ) vertices from the δ/2-contributing set of v, and which does so by examining at most O(1/δ) vertices. We also give a local algorithm for solving the following problem: if there exist k vertices that contribute a ρ-fraction to the PageRank of v, find a set of k vertices that contribute at least a (ρ − ε)-fraction to the PageRank of v. In this case, we prove that our algorithm examines at most O(k/ε) vertices.
    Algorithms and Models for the Web-Graph, 5th International Workshop, WAW 2007, San Diego, CA, USA, December 11-12, 2007, Proceedings; 01/2007
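    The local computation above can be pictured as a "backward push" from the target vertex along its in-links. The sketch below is in that spirit only; the stopping rule, the push cap, and the parameter names are assumptions rather than the paper's exact procedure.

      def approx_contributions(in_links, out_degree, v, alpha=0.15, eps=1e-4, max_pushes=100000):
          """Estimate how much each page contributes to the PageRank of a
          target page v by pushing mass from v backwards along in-links.
          `in_links[u]` lists pages linking to u; `out_degree[w]` is w's
          out-degree in the web graph."""
          p, r = {}, {v: 1.0}
          active, pushes = [v], 0
          while active and pushes < max_pushes:      # cap is a safeguard, not from the paper
              u = active.pop()
              ru = r.get(u, 0.0)
              if ru < eps:
                  continue
              pushes += 1
              p[u] = p.get(u, 0.0) + alpha * ru
              r[u] = 0.0
              for w in in_links.get(u, ()):          # send residual to in-neighbours
                  r[w] = r.get(w, 0.0) + (1.0 - alpha) * ru / out_degree[w]
                  if r[w] >= eps:
                      active.append(w)
          return p                                   # p[u] estimates u's contribution to v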
  • Kevin J. Lang, Reid Andersen
    ABSTRACT: Methods for improving sponsored search revenue are often tested or deployed within a small submarket of the larger marketplace. For many applications, the ideal submarket contains a small number of nodes, a large amount of spending within the submarket, and a small amount of spending leaving the submarket. We introduce an efficient algorithm for finding submarkets that are optimal for a user-specified tradeoff between these three quantities. We apply our algorithm to find submarkets that are both dense and isolated in a large spending graph from Yahoo! sponsored search.
    Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007; 01/2007
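    The three quantities traded off above are easy to measure for a candidate submarket of a weighted, directed spending graph. A small helper for illustration only; the data layout is an assumption, and the optimization itself is not shown.

      def submarket_stats(spend, submarket):
          """`spend[(u, v)]` is the money flowing from advertiser/keyword u
          to v.  Returns the number of nodes in the submarket, the spend
          staying inside it, and the spend leaving it."""
          inside = outgoing = 0.0
          for (u, v), w in spend.items():
              if u in submarket:
                  if v in submarket:
                      inside += w
                  else:
                      outgoing += w
          return len(submarket), inside, outgoing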