Conference Paper
PEGASUS: A peta-scale graph mining system - Implementation and observations
SCS, Carnegie Mellon Univ., Pittsburgh, PA, USA
DOI: 10.1109/ICDM.2009.14 · Conference: ICDM '09, Ninth IEEE International Conference on Data Mining · Source: DBLP

 "In order to achieve scalable graph computing, researchers have proposed many distributed or single-machine solutions [3]–[23]. Representative distributed systems include PowerGraph [18], Giraph [15], Pregel [16], GraphLab [17], GraphX [19], PEGASUS [20], etc. Some of these systems are built on popular distributed computing frameworks such as MapReduce [24] and Spark [25]. "
ABSTRACT: Recent studies show that graph processing systems on a single machine can achieve competitive performance compared with cluster-based graph processing systems. In this paper, we present NXgraph, an efficient graph processing system on a single machine. With the abstraction of vertex intervals and edge sub-shards, we propose the Destination-Sorted Sub-Shard (DSSS) structure to store a graph. By dividing vertices and edges into intervals and sub-shards, NXgraph ensures graph data access locality and enables fine-grained scheduling. By sorting edges within each sub-shard according to their destination vertices, NXgraph reduces write conflicts among different threads and achieves a high degree of parallelism. Then, three updating strategies, i.e., Single-Phase Update (SPU), Double-Phase Update (DPU), and Mixed-Phase Update (MPU), are proposed in this paper. NXgraph can adaptively choose the fastest strategy for different graph problems according to the graph size and the available memory resources, fully utilizing the memory space and reducing the amount of data transfer. All three strategies exploit a streamlined disk access pattern. Extensive experiments on three real-world graphs and five synthetic graphs show that NXgraph can outperform GraphChi, TurboGraph, VENUS, and GridGraph in various situations. Moreover, NXgraph, running on a single commodity PC, can finish an iteration of PageRank on the Twitter graph with 1.5 billion edges in 2.05 seconds, while PowerGraph, a distributed graph processing system, needs 3.6 seconds for the same task.
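The DSSS idea described in the abstract can be illustrated with a toy sketch: vertices are split into fixed-size intervals, each edge is assigned to a sub-shard keyed by its (source interval, destination interval) pair, and edges within a sub-shard are sorted by destination so that threads processing different destination intervals do not write to the same vertices. All names here are illustrative assumptions, not NXgraph's actual API.

```python
# Hypothetical sketch of interval/sub-shard partitioning in the DSSS style.
from collections import defaultdict

def build_subshards(edges, interval_size):
    """Partition edges into sub-shards keyed by (src interval, dst interval),
    with each sub-shard sorted by destination vertex."""
    interval = lambda v: v // interval_size
    subshards = defaultdict(list)
    for src, dst in edges:
        subshards[(interval(src), interval(dst))].append((src, dst))
    # Destination-sorted order gives sequential writes within a sub-shard
    # and confines each sub-shard's updates to one vertex interval.
    for key in subshards:
        subshards[key].sort(key=lambda e: e[1])
    return dict(subshards)

edges = [(0, 3), (1, 2), (3, 0), (2, 3), (1, 0)]
shards = build_subshards(edges, interval_size=2)
```

Because every sub-shard only updates destinations inside a single interval, sub-shards with different destination intervals can be processed by different threads without write conflicts, which is the parallelism argument the abstract makes.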
 "Since its inception, the Hadoop/MapReduce ecosystem has grown considerably in support of related Big Data tasks. However, these distributed frameworks are not suited to all purposes, and in many cases can even result in poor performance [31] [59] [85]. Algorithms that require multiple iterations, especially those using graph or matrix data representations, are particularly poorly suited to popular Big Data processing systems. "
Article: Thinking Like a Vertex: a Survey of Vertex-Centric Frameworks for Distributed Graph Processing
ABSTRACT: The vertex-centric programming model is an established computational paradigm recently incorporated into distributed processing frameworks to address challenges in large-scale graph processing. Billion-node graphs that exceed the memory capacity of standard machines are not well-supported by popular Big Data tools like MapReduce, which are notoriously poor-performing for iterative graph algorithms such as PageRank. In response, a new type of framework challenges one to Think Like A Vertex (TLAV) and implements user-defined programs from the perspective of a vertex rather than a graph. Such an approach improves locality, demonstrates linear scalability, and provides a natural way to express and compute many iterative graph algorithms. These frameworks are simple to program and widely applicable, but, like an operating system, are composed of several intricate, interdependent components, of which a thorough understanding is necessary in order to elicit top performance at scale. To this end, the first comprehensive survey of TLAV frameworks is presented. In this survey, the vertex-centric approach to graph processing is overviewed, TLAV frameworks are deconstructed into four main components and respectively analyzed, and TLAV implementations are reviewed and categorized.
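The "think like a vertex" style the survey describes can be sketched for PageRank: each vertex's update is expressed only in terms of messages from its in-neighbors, the shape of program that TLAV frameworks such as Pregel expose. This is a plain single-process sketch under that assumption, not any framework's actual API, and it omits handling of dangling vertices.

```python
# Illustrative vertex-centric PageRank: per-superstep message passing.
def vertex_pagerank(out_edges, num_iters=20, damping=0.85):
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(num_iters):
        # Superstep, phase 1: each vertex sends rank/out_degree to neighbors.
        inbox = {v: [] for v in out_edges}
        for v, nbrs in out_edges.items():
            for u in nbrs:
                inbox[u].append(rank[v] / len(nbrs))
        # Superstep, phase 2: each vertex folds its messages into a new rank.
        rank = {v: (1 - damping) / n + damping * sum(msgs)
                for v, msgs in inbox.items()}
    return rank

graph = {0: [1, 2], 1: [2], 2: [0]}
ranks = vertex_pagerank(graph)
```

The locality argument in the abstract follows from this shape: because each vertex touches only its own state and its inbox, the framework is free to partition vertices across machines and route messages between them.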
 "Other large-scale techniques are visual in a different sense; they present plots of calculated features of the graph instead of depicting their structural information. This is the case for Apolo [7], OPAvion [2], Pegasus [16], and OddBall [3]. There are also techniques [5] that rely on sampling to gain scalability, but this approach assumes that parts of the graph will be absent, including parts of potential interest. "
Article: StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices
ABSTRACT: Given a planetary-scale graph with millions of nodes and billions of edges, how can one reveal macro patterns of interest, like cliques, bipartite cores, stars, and chains? Furthermore, how can such patterns be visualized together to yield insights from the graph that support wise decision-making? Although there are many algorithmic and visual techniques to analyze graphs, none of the existing approaches is able to present the structural information of graphs at planetary scale. Hence, this paper describes StructMatrix, a methodology aimed at highly scalable visual inspection of graph structures with the goal of revealing macro patterns of interest. StructMatrix combines algorithmic structure detection and adjacency matrix visualization to present cardinality, distribution, and relationship features of the structures found in a given graph. We performed experiments on real, planetary-scale graphs with up to millions of nodes and over 10 billion edges. StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia, and DBLP) have characterizations that reflect the nature of their corresponding domains; our findings have not been seen in the literature so far. We expect that our technique will bring deeper insights into large graph mining, leveraging its use for decision-making.
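The dense-matrix half of the approach can be illustrated with a toy sketch: nodes are bucketed into a coarse grid and each cell counts the edges between two buckets, producing a fixed-size adjacency-matrix summary regardless of graph size. This is purely illustrative of the matrix-visualization idea; StructMatrix itself additionally detects structures (cliques, stars, chains) before plotting, which is not shown here.

```python
# Toy coarse adjacency-matrix summary: cell [i][j] counts edges whose
# endpoints fall in node buckets i and j.
def matrix_summary(edges, num_nodes, grid=2):
    bucket = lambda v: min(v * grid // num_nodes, grid - 1)
    cells = [[0] * grid for _ in range(grid)]
    for u, v in edges:
        cells[bucket(u)][bucket(v)] += 1
    return cells

edges = [(0, 1), (0, 2), (3, 3), (2, 3)]
summary = matrix_summary(edges, num_nodes=4, grid=2)
```

Because the summary's size depends only on the grid resolution, it stays plottable as a dense matrix even when the underlying graph has billions of edges.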