Publications (26)

Conference Paper: Overlapping clusters for distributed computation
ABSTRACT: Most graph decomposition procedures seek to partition a graph into disjoint sets of vertices. Motivated by applications of clustering in distributed computation, we describe a graph decomposition algorithm for the paradigm where the partitions intersect. This algorithm covers the vertex set with a collection of overlapping clusters. Each vertex in the graph is well-contained within some cluster in the collection. We then describe a framework for distributed computation across a collection of overlapping clusters and describe how this framework can be used in various algorithms based on the graph diffusion process. In particular, we focus on two illustrative examples: (i) the simulation of a randomly walking particle and (ii) the solution of a linear system, e.g. PageRank. Our simulation results for these two cases show a significant reduction in swapping between clusters in a random walk, a significant decrease in communication volume during a linear system solve in a geometric mesh, and some ability to reduce the communication volume during a linear system solve in an information network.
Conference Paper: WebCop: Locating Neighborhoods of Malware on the Web
ABSTRACT: In this paper, we propose WebCop to identify malicious web pages and neighborhoods of malware on the internet. Using a bottom-up approach, telemetry data from commercial Anti-Malware (AM) clients running on millions of computers first identify malware distribution sites hosting malicious executables on the web. Next, traversing hyperlinks in a web graph constructed from a commercial search engine crawler in the reverse direction quickly discovers malware landing pages linking to the malware distribution sites. In addition, the malicious distribution sites and web graph are used to identify neighborhoods of malware, locate additional executables distributed on the internet which may be unknown malware, and identify false positives in AM signatures. We compare the malicious URLs generated by the proposed method with those found by a commercial, drive-by download approach and show that the lists are independent; both methods can be used to identify malware on the internet and help protect end users.
ABSTRACT: Harald Racke [STOC 2008] described a new method to obtain hierarchical decompositions of networks in a way that minimizes the congestion. Racke's approach is based on an equivalence that he discovered between minimizing congestion and minimizing stretch (in a certain setting). Here we present Racke's equivalence in an abstract setting that is more general than the one described in Racke's work, and clarifies the power of Racke's result. In addition, we present a related (but different) equivalence that was developed by Yuval Emek [ESA 2009] and is only known to apply to planar graphs. Comment: 16 pages, no figures 
Conference Paper: Finding Dense Subgraphs with Size Bounds
ABSTRACT: We consider the problem of finding dense subgraphs with specified upper or lower bounds on the number of vertices. We introduce two optimization problems: the densest at-least-k-subgraph problem (dalks), which is to find an induced subgraph of highest average degree among all subgraphs with at least k vertices, and the densest at-most-k-subgraph problem (damks), which is defined similarly. These problems are relaxed versions of the well-known densest k-subgraph problem (dks), which is to find the densest subgraph with exactly k vertices. Our main result is that dalks can be approximated efficiently, even for web-scale graphs. We give a (1/3)-approximation algorithm for dalks that is based on the core decomposition of a graph, and that runs in time O(m + n), where n is the number of nodes and m is the number of edges. In contrast, we show that damks is nearly as hard to approximate as the densest k-subgraph problem, for which no good approximation algorithm is known. In particular, we show that if there exists a polynomial time approximation algorithm for damks with approximation ratio γ, then there is a polynomial time approximation algorithm for dks with approximation ratio γ^2/8. In the experimental section, we test the algorithm for dalks on large publicly available web graphs. We observe that, in addition to producing near-optimal solutions for dalks, the algorithm also produces near-optimal solutions for dks for nearly all values of k.
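The core-decomposition idea in the abstract above can be illustrated with a greedy peeling pass: delete a minimum-degree vertex at each step and remember the densest intermediate subgraph that still has at least k vertices. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function name `densest_at_least_k`, the adjacency-dict input, and the use of density = edges / vertices (half the average degree) are illustrative choices, and the heap makes this run in O(m log n) rather than the O(m + n) stated in the abstract.

```python
import heapq


def densest_at_least_k(adj, k):
    """Greedy peeling for a dense subgraph with at least k vertices.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected graph, no self-loops). Hypothetical input format.
    Returns (vertex_set, density), where density = edges / vertices.
    """
    if k > len(adj):
        raise ValueError("k exceeds the number of vertices")

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    edges = sum(degree.values()) // 2
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)

    removal_order = []
    best_density, best_removed = -1.0, 0
    while len(alive) >= max(k, 1):
        # Score the current subgraph before peeling another vertex.
        density = edges / len(alive)
        if density > best_density:
            best_density, best_removed = density, len(removal_order)

        # Pop the current minimum-degree vertex, skipping stale heap entries.
        while True:
            d, v = heapq.heappop(heap)
            if v in alive and d == degree[v]:
                break

        alive.remove(v)
        removal_order.append(v)
        edges -= degree[v]
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                heapq.heappush(heap, (degree[u], u))

    removed = set(removal_order[:best_removed])
    return set(adj) - removed, best_density
```

On a 4-clique with a two-edge tail attached, calling the function with k = 4 peels the tail and returns the clique; with k = 5 it has to keep one tail vertex and reports a lower density.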
Conference Paper: Speeding Up Algorithms on Compressed Web Graphs
ABSTRACT: A variety of lossless compression schemes has been proposed to reduce the storage requirements of web graphs. One successful approach is virtual-node compression, in which often-used patterns of links are replaced by links to virtual nodes, creating a compressed graph that succinctly represents the original. In this paper, we show that several important classes of web graph algorithms can be extended to run directly on virtual-node-compressed graphs, such that their running times depend on the size of the compressed graph rather than on that of the original. These include algorithms for link analysis, estimating the size of vertex neighborhoods, and a variety of algorithms based on matrix-vector products and random walks. Similar speedups have been obtained previously for classical graph algorithms such as shortest paths and maximum bipartite matching. We measure the performance of our modified algorithms on several publicly available web graph data sets, and demonstrate significant empirical speedups that nearly match the compression ratios.
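To make the matrix-vector claim above concrete, here is a small sketch of one adjacency "push" step computed directly on a compressed edge list. It rests on simplifying assumptions that are not taken from the paper: a single level of compression (no virtual-to-virtual edges), each original link represented exactly once, and a hypothetical id convention in which ids below `n_real` are real pages and larger ids are virtual nodes. The function name `push_compressed` is likewise illustrative.

```python
def push_compressed(n_real, edges, x):
    """Compute y[j] = sum of x[i] over original links i -> j,
    touching only the edges of the compressed graph.

    n_real: number of real nodes, with ids 0 .. n_real - 1.
    edges:  iterable of (source, target) pairs in the compressed graph;
            ids >= n_real denote virtual nodes (assumed convention).
    x:      list of length n_real with one value per real node.
    """
    y = [0.0] * n_real
    virtual_mass = {}    # value accumulated at each virtual node
    out_of_virtual = []  # virtual -> real edges, handled in a second pass

    for u, w in edges:
        if u >= n_real:
            out_of_virtual.append((u, w))
        elif w >= n_real:
            # real -> virtual: park the value at the virtual node
            virtual_mass[w] = virtual_mass.get(w, 0.0) + x[u]
        else:
            # ordinary real -> real link
            y[w] += x[u]

    # Each virtual node forwards its accumulated value to its real targets.
    for v, w in out_of_virtual:
        y[w] += virtual_mass.get(v, 0.0)
    return y
```

For example, if pages 0, 1, 2 each link to pages 3, 4, 5, the nine original links can be stored as six compressed edges through one virtual node with id 6, and the function does work proportional to those six edges while producing the same result as a pass over all nine.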
Conference Paper: On the Stability of Web Crawling and Web Search
ABSTRACT: In this paper, we analyze a graph-theoretic property motivated by web crawling. We introduce the notion of a stable core, which is the set of web pages that are usually contained in the crawling buffer when the buffer size is smaller than the total number of web pages. We analyze the size of the core in a random graph model based on the bounded Pareto power law distribution. We prove that a core of significant size exists for a large range of parameters 2
ABSTRACT: Since the link structure of the web is an important element in ranking systems on search engines, web spammers widely use the link structure of the web to increase the rank of their pages. Various link-based features of web pages have been introduced and have proven effective at identifying link spam. One particularly successful family of features (as described in the SpamRank algorithm) is based on examining the sets of pages that contribute most to the PageRank of a given vertex, called supporting sets. In a recent paper, the current authors described an algorithm for efficiently computing, for a single specified vertex, an approximation of its supporting sets. In this paper, we describe several link-based spam-detection features, both supervised and unsupervised, that can be derived from these approximate supporting sets. In particular, we examine the size of a node's supporting sets and the approximate l2 norm of the PageRank contributions from other nodes. As a supervised feature, we examine the composition of a node's supporting sets. We perform experiments on two labeled real data sets to demonstrate the effectiveness of these features for spam detection, and demonstrate that these features can be computed efficiently. Furthermore, we design a variation of PageRank (called Robust PageRank) that incorporates some of these features into its ranking, argue that this variation is more robust against link spam engineering, and give an algorithm for approximating Robust PageRank.
Conference Paper: An algorithm for improving graph partitions
ABSTRACT: We present an algorithm called Improve that improves a proposed partition of a graph, taking as input a subset of vertices and returning a new subset of vertices with a smaller quotient cut score. The most powerful previously known method for improving quotient cuts, which is based on parametric flow, returns a partition whose quotient cut score is at least as small as that of any set contained within the proposed set. For our algorithm, we can prove a stronger guarantee: the quotient score of the set returned is nearly as small as that of any set in the graph with which the proposed set has a larger-than-expected intersection. The algorithm finds such a set by solving a sequence of polynomially many s-t minimum cut problems, a sequence that cannot be cast as a single parametric flow problem. We demonstrate empirically that applying Improve to the output of various graph partitioning algorithms greatly improves the quality of cuts produced without significantly impacting the running time.
Conference Paper: Trust-based recommendation systems
ABSTRACT: High-quality, personalized recommendations are a key feature in many online systems. Since these systems often have explicit knowledge of social network structures, the recommendations may incorporate this information. This paper focuses on networks which represent trust and recommendations which incorporate trust relationships. The goal of a trust-based recommendation system is to generate personalized recommendations from known opinions and trust relationships. In analogy to prior work on voting and ranking systems, we use the axiomatic approach from the theory of social choice. We develop a natural set of five axioms which we would like any recommendation system to exhibit. Then we show that no system can simultaneously satisfy all these axioms. We also exhibit systems which satisfy any four of the five axioms. Next we consider ways of weakening the axioms, which can lead to a unique recommendation system based on random walks. We consider other recommendation systems (personal page rank, majority of majorities, and min cut) and search for alternative axiomatizations which uniquely characterize these systems. Finally, we determine which of these systems are incentive compatible. This is an important property for systems deployed in a monetized environment: groups of agents interested in manipulating recommendations to make others share their opinion have nothing to gain from lying about their votes or their trust links.


Conference Paper: Local Computation of PageRank Contributions
ABSTRACT: Motivated by the problem of detecting link spam, we consider the following graph-theoretic primitive: Given a webgraph G, a vertex v in G, and a parameter δ ∈ (0,1), compute the set of all vertices that contribute to v at least a δ fraction of v's PageRank. We call this set the δ-contributing set of v. To this end, we define the contribution vector of v to be the vector whose entries measure the contributions of every vertex to the PageRank of v. A local algorithm is one that produces a solution by adaptively examining only a small portion of the input graph near a specified vertex. We give an efficient local algorithm that computes an ε-approximation of the contribution vector for a given vertex by adaptively examining O(1/ε) vertices. Using this algorithm, we give a local approximation algorithm for the primitive defined above. Specifically, we give an algorithm that returns a set containing the δ-contributing set of v and at most O(1/δ) vertices from the δ/2-contributing set of v, and which does so by examining at most O(1/δ) vertices. We also give a local algorithm for solving the following problem: If there exist k vertices that contribute a ρ-fraction to the PageRank of v, find a set of k vertices that contribute at least a (ρ − ε)-fraction to the PageRank of v. In this case, we prove that our algorithm examines at most O(k/ε) vertices.
ABSTRACT: A local partitioning algorithm finds a set with small conductance near a specified seed vertex. In this paper, we present a generalization of a local partitioning algorithm for undirected graphs to strongly connected directed graphs. In particular, we prove that by computing a personalized PageRank vector in a directed graph, starting from a single seed vertex within a set S that has conductance at most α, and by performing a sweep over that vector, we can obtain a set of vertices S′ with conductance $\Phi_{M}(S') = O(\sqrt{\alpha \log S})$. Here, the conductance function $\Phi_{M}$ is defined in terms of the stationary distribution of a random walk in the directed graph. In addition, we describe how this algorithm may be applied to the PageRank Markov chain of an arbitrary directed graph, which provides a way to partition directed graphs that are not strongly connected.
ABSTRACT: We show that whenever there is a sharp drop in the numerical rank defined by a personalized PageRank vector, the location of the drop reveals a cut with small conductance. We then show that for any cut in the graph, and for many starting vertices within that cut, an approximate personalized PageRank vector will have a sharp drop sufficient to produce a cut with conductance nearly as small as the original cut. Using this technique, we produce a nearly linear time local partitioning algorithm whose analysis is simpler than previous algorithms. 
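The step of turning a PageRank vector into a cut, as described in the abstract above, is commonly done with a sweep: order the vertices by score divided by degree and take the prefix of smallest conductance. The sketch below illustrates that generic sweep for an undirected graph and assumes the personalized PageRank scores are already available in a dict `p`; the function name `sweep_cut` and the input format are illustrative, and this is not presented as the paper's algorithm or its sharp-drop analysis.

```python
def sweep_cut(adj, p):
    """Return the minimum-conductance prefix of the degree-normalised ordering.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected graph, no self-loops). Hypothetical input format.
    p:   dict mapping vertices to nonnegative scores, for example an
         approximate personalized PageRank vector computed elsewhere.
    Returns (vertex_set, conductance).
    """
    deg = {v: len(adj[v]) for v in adj}
    total_vol = sum(deg.values())

    # Sweep order: largest score-per-degree first; zero scores are skipped.
    order = sorted(
        (v for v in p if p[v] > 0 and deg[v] > 0),
        key=lambda v: p[v] / deg[v],
        reverse=True,
    )

    in_set = set()
    vol = 0   # sum of degrees inside the current prefix
    cut = 0   # number of edges leaving the current prefix
    best_phi, best_size = float("inf"), 0
    for i, v in enumerate(order, start=1):
        inside = sum(1 for u in adj[v] if u in in_set)
        # Edges from v into the prefix stop being cut; the rest become cut.
        cut += deg[v] - 2 * inside
        vol += deg[v]
        in_set.add(v)

        denom = min(vol, total_vol - vol)
        if denom == 0:
            break  # the prefix covers the whole graph's volume
        phi = cut / denom
        if phi < best_phi:
            best_phi, best_size = phi, i

    return set(order[:best_size]), best_phi
```

On two triangles joined by a single edge, with scores concentrated on one triangle, the sweep returns that triangle, whose only outgoing edge is the bridge.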
Article: No-Three-in-Line-in-3D
ABSTRACT: It has been noted that many realistic graphs have a power law degree distribution and exhibit the small-world phenomenon. We present drawing methods influenced by recent developments in the modeling of such graphs. Our main approach is to partition the edge set of a graph into "local" edges and "global" edges and to use a standard drawing method that allows us to give added importance to local edges. We show that our drawing method works well for graphs that contain underlying geometric graphs augmented with random edges, and we demonstrate the method on a few examples. We define edges to be local or global depending on the size of the maximum short flow between the edge's endpoints. Here, a short flow, or alternatively an ℓ-short flow, is one composed of paths whose length is at most some constant ℓ. We present fast approximation algorithms for the maximum short flow problem and for testing whether a short flow of a certain size exists between given vertices. Using these algorithms, we give an algorithm for computing approximate local subgraphs of a given graph. The drawing algorithm we present can be applied to general graphs, but it is particularly well suited for small-world networks with power law degree distribution.
ABSTRACT: Methods for improving sponsored search revenue are often tested or deployed within a small submarket of the larger marketplace. For many applications, the ideal submarket contains a small number of nodes, a large amount of spending within the submarket, and a small amount of spending leaving the submarket. We introduce an efficient algorithm for finding submarkets that are optimal for a user-specified tradeoff between these three quantities. We apply our algorithm to find submarkets that are both dense and isolated in a large spending graph from Yahoo! sponsored search.
ABSTRACT: In this paper, we study spectral versions of the densest subgraph problem and the largest independence subset problem. In the first part, we give an algorithm for identifying small subgraphs with large spectral radius. We also prove a Hoffman-type ratio bound for the order of an induced subgraph whose spectral radius is bounded from above.
ABSTRACT: A local graph partitioning algorithm finds a cut near a specified starting vertex, with a running time that depends largely on the size of the small side of the cut, rather than the size of the input graph. In this paper, we present a local partitioning algorithm using a variation of PageRank with a specified starting distribution. We derive a mixing result for PageRank vectors similar to that for random walks, and show that the ordering of the vertices produced by a PageRank vector reveals a cut with small conductance. In particular, we show that for any set C with conductance and volume 
Conference Paper: Local Partitioning for Directed Graphs Using PageRank.
Publication Stats
598  Citations  
3.07  Total Impact Points  
Top Journals
Institutions

2007-2012

Microsoft
Washington, West Virginia, United States


2004-2007

University of California, San Diego
 Department of Mathematics
San Diego, California, United States
