Conference Paper

Static graph challenge: Subgraph isomorphism

... Static graph structure. The static graph structure uses fixed nodes and edges to represent the relationship between geographical locations and positions, which can effectively capture the spatial dependencies and features between different sea area positions (Samsi et al. 2017). ...
Article
Full-text available
Ocean temperature prediction is significant in climate change research and marine ecosystem management. However, relevant statistical and physical methods focus on assuming relationships between variables and simulating complex physical processes of ocean temperature changes, facing challenges such as high data dependence and insufficient processing of long-term dependencies. This paper comprehensively reviews the development and latest progress of ocean temperature prediction models based on deep learning. We first provide a formulaic definition for ocean temperature prediction and a brief overview of deep learning models widely used in this field. Using data sources and model structures, we systematically divide ocean temperature prediction models into data-driven deep learning models and physically guided deep learning models; and comprehensively explore the relevant literature involved in each method. In addition, we summarize an ocean temperature dataset and sea areas, laying a solid foundation for ocean temperature prediction. Finally, we propose current challenges and future development directions in ocean temperature prediction research based on deep learning. This article aims to analyze existing research, identify research gaps and challenges, provide complete and reliable technical support for climate forecasting, marine disaster prevention, and fishery resource management, and promote the further development of ocean temperature research.
... The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. Subgraph Isomorphism [15]-[29] and PageRank [30]-[32] Graph Challenges have enabled a new generation of graph analysis by highlighting the benefits of novel innovations. Graph Challenge is part of the long tradition of challenges that have played a key role in advancing computation, AI, and other fields, such as YOHO [33], MNIST [34], HPC Challenge [35], ImageNet [36] and VAST [37], [38]. ...
Preprint
The MIT/IEEE/Amazon GraphChallenge encourages community approaches to developing new solutions for analyzing graphs and sparse data derived from social media, sensor feeds, and scientific data to discover relationships between events as they unfold in the field. The anonymized network sensing Graph Challenge seeks to enable large, open, community-based approaches to protecting networks. Many large-scale networking problems can only be solved with community access to very broad data sets with the highest regard for privacy and strong community buy-in. Such approaches often require community-based data sharing. In the broader networking community (commercial, federal, and academia) anonymized source-to-destination traffic matrices with standard data sharing agreements have emerged as a data product that can meet many of these requirements. This challenge provides an opportunity to highlight novel approaches for optimizing the construction and analysis of anonymized traffic matrices using over 100 billion network packets derived from the largest Internet telescope in the world (CAIDA). This challenge specifies the anonymization, construction, and analysis of these traffic matrices. A GraphBLAS reference implementation is provided, but the use of GraphBLAS is not required in this Graph Challenge. As with prior Graph Challenges the goal is to provide a well-defined context for demonstrating innovation. Graph Challenge participants are free to select (with accompanying explanation) the Graph Challenge elements that are appropriate for highlighting their innovations.
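As an illustration of the traffic-matrix construction described above, the following is a minimal Python sketch (not the GraphBLAS reference implementation); the record fields, hashing-based index mapping, and bin count are illustrative assumptions, and the anonymization itself is assumed to happen upstream:

    # Minimal sketch: build a source-to-destination traffic matrix from
    # anonymized packet records. Field names and the hash-based binning are
    # illustrative assumptions, not the challenge's reference implementation.
    import numpy as np
    from scipy.sparse import coo_matrix

    def traffic_matrix(packets, n_bins=2**17):
        """packets: iterable of (src, dst) strings already anonymized upstream."""
        rows, cols = [], []
        for src, dst in packets:
            # Map anonymized addresses into a fixed index range (illustrative only).
            rows.append(hash(src) % n_bins)
            cols.append(hash(dst) % n_bins)
        data = np.ones(len(rows), dtype=np.int64)
        # Duplicate (row, col) pairs are summed, giving packet counts per link.
        return coo_matrix((data, (rows, cols)), shape=(n_bins, n_bins)).tocsr()

    # Example: three packets, two of them on the same link.
    A = traffic_matrix([("10.0.0.1", "10.0.0.2"),
                        ("10.0.0.1", "10.0.0.2"),
                        ("10.0.0.3", "10.0.0.4")])
    print(A.nnz, A.sum())  # 2 distinct links, 3 packets total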
... analytics. Introducing important AI problems to the public allows for increased collaboration among diverse research teams and maximization of potential solutions (see Figure 1). Challenges such as YOHO (J.P. Campbell, 1995), MNIST (C.C.Y. LeCun and C.J. Burges, 2017), HPC Challenge (HPC Challenge, 2017), Graph Challenge (E. J. Kepner et al., 2019; S. Samsi et al., 2017), ImageNet (O. Russakovsky et al., 2015) and VAST (K.A. Cook et al., 2014; J. Scholtz et al., 2012) have driven major developments in various fields. Each of these challenges has catalyzed critical research efforts in their respective fields: YOHO enabled speech research; MNIST remains foundational to the computer vision research community ...
Preprint
Full-text available
Artificial intelligence (AI) has enormous potential to improve Air Force pilot training by providing actionable feedback to pilot trainees on the quality of their maneuvers and enabling instructor-less flying familiarization for early-stage trainees in low-cost simulators. Historically, AI challenges consisting of data, problem descriptions, and example code have been critical to fueling AI breakthroughs. The Department of the Air Force-Massachusetts Institute of Technology AI Accelerator (DAF-MIT AI Accelerator) developed such an AI challenge using real-world Air Force flight simulator data. The Maneuver ID challenge assembled thousands of virtual reality simulator flight recordings collected by actual Air Force student pilots at Pilot Training Next (PTN). This dataset has been publicly released at Maneuver-ID.mit.edu and represents the first of its kind public release of USAF flight training data. Using this dataset, we have applied a variety of AI methods to separate "good" vs "bad" simulator data and categorize and characterize maneuvers. These data, algorithms, and software are being released as baselines of model performance for others to build upon to enable the AI ecosystem for flight simulator training.
... Unlike other near-clique substructures like k-plexes, n-clans, and n-clubs, which are computationally intractable to enumerate and count, k-trusses can be efficiently found in polynomial time. Many parallel, external-memory, and distributed algorithms have been developed in the past decade for k-cores [20,25,35,37,49,70] and k-trusses [7,12,13,17,36,44,62,67,75], and computing all trussness values of a graph is one of the challenge problems in the yearly MIT GraphChallenge [50]. A related problem is to compute the k-clique densest subgraph [65] and (k, Ψ)-core [24], for which efficient parallel algorithms have been recently designed [59]. ...
Preprint
This paper studies the nucleus decomposition problem, which has been shown to be useful in finding dense substructures in graphs. We present a novel parallel algorithm that is efficient both in theory and in practice. Our algorithm achieves a work complexity matching the best sequential algorithm while also having low depth (parallel running time), which significantly improves upon the only existing parallel nucleus decomposition algorithm (Sariyuce et al., PVLDB 2018). The key to the theoretical efficiency of our algorithm is a new lemma that bounds the amount of work done when peeling cliques from the graph, combined with the use of a theoretically-efficient parallel algorithms for clique listing and bucketing. We introduce several new practical optimizations, including a new multi-level hash table structure to store information on cliques space-efficiently and a technique for traversing this structure cache-efficiently. On a 30-core machine with two-way hyper-threading on real-world graphs, we achieve up to a 55x speedup over the state-of-the-art parallel nucleus decomposition algorithm by Sariyuce et al., and up to a 40x self-relative parallel speedup. We are able to efficiently compute larger nucleus decompositions than prior work on several million-scale graphs for the first time.
... AI challenges are also a key driver of AI computational requirements. Challenges such as YOHO [1], MNIST [2], HPC Challenge [3], Graph Challenge [4]- [6], ImageNet [7] and VAST [8], [9] have played important roles in driving progress in fields as diverse as machine learning, high performance computing, and visual analytics. YOHO is the Linguistic Data Consortium database for voice verification systems and has been a critical enabler of speech research. ...
Preprint
AI algorithms that identify maneuvers from trajectory data could play an important role in improving flight safety and pilot training. AI challenges allow diverse teams to work together to solve hard problems and are an effective tool for developing AI solutions. AI challenges are also a key driver of AI computational requirements. The Maneuver Identification Challenge hosted at maneuver-id.mit.edu provides thousands of trajectories collected from pilots practicing in flight simulators, descriptions of maneuvers, and examples of these maneuvers performed by experienced pilots. Each trajectory consists of positions, velocities, and aircraft orientations normalized to a common coordinate system. Construction of the data set required significant data architecture to transform flight simulator logs into AI ready data, which included using a supercomputer for deduplication and data conditioning. There are three proposed challenges. The first challenge is separating physically plausible (good) trajectories from unfeasible (bad) trajectories. Human labeled good and bad trajectories are provided to aid in this task. Subsequent challenges are to label trajectories with their intended maneuvers and to assess the quality of those maneuvers.
... Input graphs from the Stanford Network Analysis Project (SNAP) [5] were downloaded from the GraphChallenge collection [6]. These graphs were made upper-triangular before being used as inputs. ...
Preprint
In this work we present a performance exploration on Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases available parallelism, making it amenable to GPU execution. We demonstrate our fine-grained parallel approach using implementations in Kokkos and evaluate them on an Intel Skylake CPU and an Nvidia Tesla V100 GPU. Overall, we observe between a 1.26-1.48x improvement on the CPU and a 9.97-16.92x improvement on the GPU due to our fine-grained parallel formulation.
... It combines graph-theoretic, statistical, and database technology to model, store, retrieve, and analyze graph-structured data. Samsi [127] used subgraph isomorphism to address earlier scalability difficulties in machine learning, high-performance computing, and visual analytics. Serial implementations in C++, Python, Pandas, and MATLAB are provided, and their single-thread performance is measured. ...
Article
Full-text available
Machine learning is driving development across many fields in science and engineering. A simple and efficient programming language could accelerate applications of machine learning in various fields. Currently, the programming languages most commonly used to develop machine learning algorithms include Python, MATLAB, and C/C ++. However, none of these languages well balance both efficiency and simplicity. The Julia language is a fast, easy-to-use, and open-source programming language that was originally designed for high-performance computing, which can well balance the efficiency and simplicity. This paper summarizes the related research work and developments in the applications of the Julia language in machine learning. It first surveys the popular machine learning algorithms that are developed in the Julia language. Then, it investigates applications of the machine learning algorithms implemented with the Julia language. Finally, it discusses the open issues and the potential future directions that arise in the use of the Julia language in machine learning.
... The public data are available in a variety of formats, such as linked list, tab separated, and labeled/unlabeled. The Graph Challenge consists of a pre-challenge and three full challenges:
• Pre-challenge: PageRank pipeline [27]
• Static graph challenge: subgraph isomorphism [28]
• Streaming graph challenge: stochastic block partition [29]
• Sparse DNN challenge [14]
The static graph challenge is further broken down into triangle counting and k-truss. The sparse DNN challenge is the focus of this paper. ...
Preprint
Full-text available
The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sparse AI systems. The sparse DNN challenge is based on a mathematically well-defined DNN inference computation and can be implemented in any programming environment. In 2019 several sparse DNN challenge submissions were received from a wide range of authors and organizations. This paper presents a performance analysis of the best performers of these submissions. These submissions show that their state-of-the-art sparse DNN execution time, $T_{\rm DNN}$, is a strong function of the number of DNN operations performed, $N_{\rm op}$. The sparse DNN challenge provides a clear picture of current sparse DNN systems and underscores the need for new innovations to achieve high performance on very large sparse DNNs.
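As an illustration of the mathematically well-defined sparse DNN inference computation underlying the challenge, the following is a minimal Python/SciPy sketch of a single layer; the exact bias handling and output thresholding of the official challenge are defined in its specification, so this is a hedged approximation rather than the reference implementation:

    # Minimal sketch of one sparse DNN inference layer, Y_next = ReLU(Y W + bias),
    # using SciPy sparse matrices. Bias is added only to stored (nonzero) entries
    # here; the challenge's exact rules follow its official specification.
    import numpy as np
    import scipy.sparse as sp

    def sparse_layer(Y, W, bias):
        Z = Y @ W                       # sparse-sparse matrix multiply
        Z = Z.tocsr()
        Z.data += bias                  # bias applied to stored entries only
        Z.data = np.maximum(Z.data, 0)  # ReLU on the stored entries
        Z.eliminate_zeros()             # keep the result sparse
        return Z

    # Tiny example: 2 input samples, 3 features, one 3x3 sparse weight layer.
    Y0 = sp.random(2, 3, density=0.5, format="csr", random_state=0)
    W0 = sp.random(3, 3, density=0.5, format="csr", random_state=1)
    Y1 = sparse_layer(Y0, W0, bias=-0.1)
    print(Y1.toarray())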
... The Graph Challenge consists of a pre-challenge and three full challenges:
• Pre-challenge: PageRank pipeline [74]
• Static graph challenge: subgraph isomorphism [75]
• Streaming graph challenge: stochastic block partition [76]
• Sparse deep neural network challenge [77]
The static graph challenge is further broken down into triangle counting and k-truss. Triangle counting is the focus of this paper. ...
Preprint
Full-text available
The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of pre-parsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. The triangle counting component of GraphChallenge.org tests the performance of graph processing systems to count all the triangles in a graph and exercises key graph operations found in many graph algorithms. In 2017, 2018, and 2019 many triangle counting submissions were received from a wide range of authors and organizations. This paper presents a performance analysis of the best performers of these submissions. These submissions show that their state-of-the-art triangle counting execution time, $T_{\rm tri}$, is a strong function of the number of edges in the graph, $N_e$, which improved significantly from 2017 ($T_{\rm tri} \approx (N_e/10^8)^{4/3}$) to 2018 ($T_{\rm tri} \approx N_e/10^9$) and remained comparable from 2018 to 2019. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs.
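As an illustration of the kind of linear-algebraic triangle counting used by many Graph Challenge submissions, a minimal Python/SciPy sketch follows (assuming an undirected simple graph pre-processed to an upper-triangular adjacency matrix, as with the SNAP inputs mentioned earlier); it is not any particular submission's code:

    # Minimal sketch of linear-algebraic triangle counting: with a strictly
    # upper-triangular adjacency matrix U, the triangle count is
    # sum((U @ U) .* U), with each triangle counted exactly once.
    import numpy as np
    import scipy.sparse as sp

    def count_triangles(A):
        """A: symmetric 0/1 adjacency matrix (SciPy sparse), no self-loops."""
        U = sp.triu(A, k=1).tocsr()            # keep edges (i, j) with i < j
        return int((U @ U).multiply(U).sum())  # wedges i<j<k closed by edge (i, k)

    # Example: a 4-cycle plus one chord contains exactly 2 triangles.
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    A = sp.lil_matrix((4, 4))
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    print(count_triangles(A.tocsr()))  # -> 2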
... Triangle counting is widely used to study subgraph isomorphism. Triangles are a special case of patterns that repeat in a graph [29]. Triangle counting is used to study the interconnectedness of a graph. ...
... Since a network motif is a kind of high-frequency low-order subgraph in the network, the subgraph isomorphism problem is often encountered during detection and counting. Scholars have studied the subgraph isomorphism problem for a long period [9]- [11]. This problem refers to the node mapping relationship between two subgraphs wherein all edges are also eligible. ...
Article
Full-text available
Motifs have been recognized as basic network blocks and are found to be quite powerful in modeling certain patterns. Generally speaking, local characteristics of big networks could be reflected in network motifs. Over the years, motifs have attracted a lot of attention from researchers. However, most current literature reviews on motifs generally focus on the field of biological science. In contrast, here we try to present a comprehensive survey on motifs in the context of big networks. We introduce the definition of motifs and other related concepts. Big networks with motif-based structures are analyzed. Specifically, we respectively analyze four kinds of networks, including biological networks, social networks, academic networks, and infrastructure networks. We then examine methods for motif discovery, motif counting, and motif clustering. The applications of motifs in different areas have also been reviewed. Finally, some challenges and open issues in this direction are discussed.
Preprint
The problem of finding the vertex correspondence between two noisy graphs with different numbers of vertices, where the smaller graph is still large, has many applications in social networks, neuroscience, and computer vision. We propose a solution to this problem via a graph matching matched filter: centering and padding the smaller adjacency matrix and applying graph matching methods to align it to the larger network. The centering and padding schemes can be incorporated into any algorithm that matches using adjacency matrices. Under a statistical model for correlated pairs of graphs, which yields a noisy copy of the small graph within the larger graph, the resulting optimization problem can be guaranteed to recover the true vertex correspondence between the networks. However, there are currently no efficient algorithms for solving this problem. To illustrate the possibilities and challenges of such problems, we use an algorithm that can exploit a partially known correspondence and show via varied simulations and applications to Drosophila and human connectomes that this approach can achieve good performance.
Article
Full-text available
Graph drawing, involving the automatic layout of graphs, is vital for clear data visualization and interpretation but poses challenges due to the optimization of a multi-metric objective function, an area where current search-based methods seek improvement. In this paper, we investigate the performance of Jaya algorithm for automatic graph layout with straight lines. Jaya algorithm has not been previously used in the field of graph drawing. Unlike most population-based methods, Jaya algorithm is a parameter-less algorithm in that it requires no algorithm-specific control parameters and only population size and number of iterations need to be specified, which makes it easy for researchers to apply in the field. To improve Jaya algorithm’s performance, we applied Latin Hypercube Sampling to initialize the population of individuals so that they widely cover the search space. We developed a visualization tool that simplifies the integration of search methods, allowing for easy performance testing of algorithms on graphs with weighted aesthetic metrics. We benchmarked the Jaya algorithm and its enhanced version against Hill Climbing and Simulated Annealing, commonly used graph-drawing search algorithms which have a limited number of parameters, to demonstrate Jaya algorithm’s effectiveness in the field. We conducted experiments on synthetic datasets with varying numbers of nodes and edges using the Erdős–Rényi model and real-world graph datasets and evaluated the quality of the generated layouts, and the performance of the methods based on number of function evaluations. We also conducted a scalability experiment on Jaya algorithm to evaluate its ability to handle large-scale graphs. Our results showed that Jaya algorithm significantly outperforms Hill Climbing and Simulated Annealing in terms of the quality of the generated graph layouts and the speed at which the layouts were produced. Using improved population sampling generated better layouts compared to the original Jaya algorithm using the same number of function evaluations. Moreover, Jaya algorithm was able to draw layouts for graphs with 500 nodes in a reasonable time.
Article
Wing and tip decomposition are motif-based analytics for bipartite graphs that construct a hierarchy of butterfly (2,2-biclique) dense edge- and vertex-induced subgraphs, respectively. They have applications in several domains including e-commerce, recommendation systems, document analysis and others. Existing decomposition algorithms use a bottom-up approach that constructs the hierarchy in an increasing order of the subgraph density. They iteratively select the edges or vertices with minimum butterfly count and peel them, i.e., remove them along with their butterflies. The number of butterflies in real-world bipartite graphs makes bottom-up peeling computationally demanding. Furthermore, the strict order of peeling entities results in a large number of sequentially dependent iterations. Consequently, parallel algorithms based on bottom-up peeling incur heavy synchronization and poor scalability. In this paper, we propose a novel Parallel Bipartite Network peelinG (PBNG) framework which adopts a two-phased peeling approach to relax the order of peeling, and in turn, dramatically reduce synchronization. The first phase divides the decomposition hierarchy into few partitions, and requires little synchronization. The second phase concurrently processes all partitions to generate individual levels of the hierarchy, and requires no global synchronization. The two-phased peeling further enables batching optimizations that dramatically improve the computational efficiency of PBNG. We empirically evaluate PBNG using several real-world bipartite graphs and demonstrate radical improvements over the existing approaches. On a shared-memory 36 core server, PBNG achieves up to 19.7× self-relative parallel speedup. Compared to the state-of-the-art parallel framework ParButterfly, PBNG reduces synchronization by up to 15260× and execution time by up to 295×. Furthermore, it achieves up to 38.5× speedup over state-of-the-art algorithms specifically tuned for wing decomposition. Our source code is made available at https://github.com/kartiklakhotia/RECEIPT.
Chapter
The field of network forensics currently lacks a methodology for attack fingerprinting. Such a methodology would enhance attack attribution, which at present is often quite subjective. This research provides a mathematically rigorous procedure for creating fingerprints of network intrusions. These fingerprints can be compared to the fingerprints of known cyber-attacks, providing a mathematically robust method for attack attribution. Keywords: Network fingerprinting, Graph theory, Network forensics, Cyber-attack, Cyber forensics
Chapter
Triangles are an essential part of network analysis, representing metrics such as transitivity and clustering coefficient. Using the correspondence between sparse adjacency matrices and graphs, linear algebraic methods have been developed for triangle counting and enumeration, where the main computational kernel is sparse matrix-matrix multiplication. In this paper, we use an intersection representation of graph data implemented as a sparse matrix, and engineer an algorithm to compute the “k-count” distribution of the triangles of the graph. The main computational task of computing sparse matrix-vector products is carefully crafted by employing compressed vectors as accumulators. Our method avoids redundant work by counting and enumerating each triangle exactly once. We present results from extensive computational experiments on large-scale real-world and synthetic graph instances that demonstrate good scalability of our method. In terms of run-time performance, our algorithm has been found to be orders of magnitude faster than the reference implementations of the miniTri data analytics application [18].
Chapter
The truss decomposition algorithm decomposes a graph into a hierarchical subgraph structure. A k-truss ($k \ge 2$) is a subgraph in which each edge is contained in at least $k-2$ triangles. The existing algorithm first computes the number of triangles for each edge, and then iteratively increases k to peel off the edges that are not in the (k+1)-truss. Due to the scale of the data and the intensity of the computation, truss decomposition on a billion-edge graph may take hours on a commodity server. In addition, more servers today adopt a NUMA architecture, which also affects the scalability of the algorithm. Therefore, we propose a NUMA-aware shared-memory parallel algorithm to accelerate truss decomposition on NUMA systems by (1) computing different levels of k-truss across NUMA nodes, (2) dividing the range of k heuristically to ensure load balance, and (3) optimizing the data structure and triangle counting method to reduce remote memory access, data contention, and data skew. Our experiments show that on real-world datasets our OpenMP implementation accelerates truss decomposition effectively on NUMA systems. Keywords: Truss decomposition, Triangle counting, NUMA, Multithread, Graph analysis
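To make the peeling process described above concrete, the following is a minimal serial Python sketch of truss decomposition (using networkx for graph storage); it is illustrative only and is not the NUMA-aware parallel algorithm of the chapter:

    # Minimal serial sketch of truss decomposition by peeling. For each edge,
    # "support" is the number of triangles containing it; an edge survives in
    # the k-truss iff its support within that subgraph is at least k - 2.
    import networkx as nx

    def truss_decomposition(G):
        """Return {edge: trussness}, the largest k whose k-truss contains the edge."""
        H = G.copy()
        sup = {frozenset(e): len(set(H[e[0]]) & set(H[e[1]])) for e in H.edges()}
        trussness, k = {}, 2
        while sup:
            # Peel every edge that cannot belong to the (k+1)-truss.
            peel = [e for e, s in sup.items() if s <= k - 2]
            while peel:
                e = peel.pop()
                if e not in sup:
                    continue          # already peeled via a duplicate entry
                u, v = tuple(e)
                for w in set(H[u]) & set(H[v]):  # triangles broken by removing e
                    for f in (frozenset((u, w)), frozenset((v, w))):
                        sup[f] -= 1
                        if sup[f] <= k - 2:
                            peel.append(f)
                trussness[tuple(sorted(e))] = k
                del sup[e]
                H.remove_edge(u, v)
            k += 1
        return trussness

    # Example: a triangle with a pendant edge.
    G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])
    print(truss_decomposition(G))  # triangle edges -> 3, pendant edge (2, 3) -> 2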
Article
Tip decomposition is a crucial kernel for mining dense subgraphs in bipartite networks, with applications in spam detection, analysis of affiliation networks etc. It creates a hierarchy of vertex-induced subgraphs with varying densities determined by the participation of vertices in butterflies (2, 2-bicliques). To build the hierarchy, existing algorithms iteratively follow a delete-update (peeling) process: deleting vertices with the minimum number of butterflies and correspondingly updating the butterfly count of their 2-hop neighbors. The need to explore 2-hop neighborhood renders tip-decomposition computationally very expensive. Furthermore, the inherent sequentiality in peeling only minimum butterfly vertices makes derived parallel algorithms prone to heavy synchronization. In this paper, we propose a novel parallel tip-decomposition algorithm - REfine CoarsE-grained Independent Tasks (RECEIPT) that relaxes the peeling order restrictions by partitioning the vertices into multiple independent subsets that can be concurrently peeled. This enables RECEIPT to simultaneously achieve a high degree of parallelism and dramatic reduction in synchronizations. Further, RECEIPT employs a hybrid peeling strategy along with other optimizations that drastically reduce the amount of wedge exploration and execution time. We perform detailed experimental evaluation of RECEIPT on a shared-memory multicore server. It can process some of the largest publicly available bipartite datasets orders of magnitude faster than the state-of-the-art algorithms - achieving up to 1100× and 64× reduction in the number of thread synchronizations and traversed wedges, respectively. Using 36 threads, RECEIPT can provide up to 17.1× self-relative speedup.
Chapter
The motivation for the work in this paper comes from the need in research and applied fields for synthetic social network data, due to (i) difficulties in obtaining real data and (ii) data privacy issues of the real data. The issues to address are, first, to obtain a graph with a social-network-type structure and to label it with communities. The main focus is the generation of realistic data, its assignment to and propagation within the graph. The main aim of this work is to implement an easy-to-use standalone end-user application which addresses the aforementioned issues. The methods used are the R-MAT and Louvain algorithms, with some modifications, for graph generation and community labeling respectively, and the development of a Java-based system for data generation using an original seed assignment algorithm followed by a second algorithm for weighted and probabilistic data propagation to neighbors and other nodes. The results show that a close fit can be achieved between the initial user specification and the generated data, and that the algorithms have potential for scale-up. The system is made publicly available as a Java project on GitHub.
Chapter
Full-text available
Question Answering (QA) systems play an important role in decision support systems. Deep neural network-based passage rankers have recently been developed to more effectively rank likely answer-containing passages for QA purposes. These rankers utilize distributed word or sentence embeddings. Such distributed representations mostly carry semantic relatedness of text units in which explicit linguistic features are under-represented. In this paper, we take novel approaches to combine linguistic features (such as different part-of-speech measures) with distributed sentence representations of questions and passages. The QUASAR-T fact-seeking questions and short text passages were used in our experiments to show that while ensembling of deep relevance measures based on pure sentence embedding with linguistic features using several machine learning techniques fails to improve upon the passage ranking performance of our baseline neural network ranker, the concatenation of the same features within the network structure significantly improves the overall performance of passage ranking for QA.
Preprint
Hypersparse matrices are a powerful enabler for a variety of network, health, finance, and social applications. Hierarchical hypersparse GraphBLAS matrices enable rapid streaming updates while preserving algebraic analytic power and convenience. In many contexts, the rate of these updates sets the bounds on performance. This paper explores hierarchical hypersparse update performance on a variety of hardware with identical software configurations. The high-level language bindings of the GraphBLAS readily enable performance experiments on simultaneous diverse hardware. The best single process performance measured was 4,000,000 updates per second. The best single node performance measured was 170,000,000 updates per second. The hardware used spans nearly a decade and allows a direct comparison of hardware improvements for this computation over this time range; showing a 2x increase in single-core performance, a 3x increase in single process performance, and a 5x increase in single node performance. Running on nearly 2,000 MIT SuperCloud nodes simultaneously achieved a sustained update rate of over 200,000,000,000 updates per second. Hierarchical hypersparse GraphBLAS allows the MIT SuperCloud to analyze extremely large streaming network data sets.
Preprint
Full-text available
The SuiteSparse GraphBLAS C-library implements high performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into a hypersparse matrix. The parameters of hierarchical hypersparse matrices rely on controlling the number of entries in each level in the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical hypersparse matrices achieve over 1,000,000 updates per second in a single instance. Scaling to 31,000 instances of hierarchical hypersparse matrices on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 75,000,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
Chapter
Triangle counting is an important step in calculating the network clustering coefficient and transitivity, and is widely used in important role recognition, spam detection, community discovery, and biological detection. In this paper, we introduce a GPU-based load balancing triangle counting scheme (GBTCS), which contains three techniques. First, we designed an algorithm for preprocessing the graph to obtain the CSR (Compressed Sparse Row) representation of the graph, which not only reduces GPU memory usage by half, but also distributes the computational overhead across the cores of the GPU. Second, we designed a SIMD (Single Instruction Multiple Data)-based set intersection algorithm that improves thread-parallel performance on the GPU. Third, we designed a load balancing algorithm to dynamically schedule the GPU workload. Performance evaluations demonstrate that our proposed scheme is 5x to 120x faster than the serial CPU algorithm.
Article
Full-text available
The increasing size of Big Data is often heralded but how data are transformed and represented is also profoundly important to knowledge discovery, and this is exemplified in Big Graph analytics. Much attention has been placed on the scale of the input graph but the product of a graph algorithm can be many times larger than the input. This is true for many graph problems, such as listing all triangles in a graph. Enabling scalable graph exploration for Big Graphs requires new approaches to algorithms, architectures, and visual analytics. A brief tutorial is given to aid the argument for thoughtful representation of data in the context of graph analysis. Then a new algebraic method to reduce the arithmetic operations in counting and listing triangles in graphs is introduced. Additionally, a scalable triangle listing algorithm in the MapReduce model will be presented followed by a description of the experiments with that algorithm that led to the current largest and fastest triangle listing benchmarks to date. Finally, a method for identifying triangles in new visual graph exploration technologies is proposed.
Conference Paper
Full-text available
We implement exact triangle counting in graphs on the GPU using three different methodologies: subgraph matching to a triangle pattern; programmable graph analytics, with a set-intersection approach; and a matrix formulation based on sparse matrix-matrix multiplies. All three deliver best-of-class performance over CPU implementations and over comparable GPU implementations, with the graph-analytic approach achieving the best performance due to its ability to exploit efficient filtering steps to remove unnecessary work and its high-performance set-intersection core.
Article
Full-text available
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.
Article
Full-text available
One of the main objectives of the DARPA High Productivity Computing Systems (HPCS) program is to reassess the way we define and measure performance, programmability, portability, robustness and ultimately productivity in the High Performance Computing (HPC) domain. This article describes the Scalable Synthetic Compact Applications (SSCA) benchmark suite, a community product delivered under support of the DARPA HPCS program. The SSCA benchmark suite consists of six benchmarks. The first three SSCA benchmarks are specified and described in this article. The last three are to be developed and will relate to simulation. SSCA #1 Bioinformatics Optimal Pattern Matching stresses integer and character operations (no floating point required) and is compute-limited; SSCA #2 Graph Analysis stresses memory access, uses integer operations, is compute-intensive, and is hard to parallelize on most modern systems; and SSCA #3 Synthetic Aperture Radar Application is computationally taxing, seeks a high rate at which answers are generated, and contains a significant file I/O component. These SSCA benchmarks are envisioned to emerge as complements to current scalable micro-benchmarks and complex real applications to measure high-end productivity and system performance. They are also described in sufficient detail to drive novel HPC programming paradigms, as well as architecture development and testing. The benchmark written and executable specifications are available from www.highproductivity.org.
Conference Paper
Full-text available
How can we generate realistic graphs? In addition, how can we do so with a mathematically tractable model that makes it feasible to analyze their properties rigorously? Real graphs obey a long list of surprising properties: Heavy tails for the in- and out-degree distribution; heavy tails for the eigenvalues and eigenvectors; small diameters; and the recently discovered “Densification Power Law” (DPL). All published graph generators either fail to match several of the above properties, are very complicated to analyze mathematically, or both. Here we propose a graph generator that is mathematically tractable and matches this collection of properties. The main idea is to use a non-standard matrix operation, the Kronecker product, to generate graphs that we refer to as “Kronecker graphs”. We show that Kronecker graphs naturally obey all the above properties; in fact, we can rigorously prove that they do so. We also provide empirical evidence showing that they can mimic very well several real graphs.
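A minimal Python sketch of the stochastic Kronecker construction described above follows; the 2x2 initiator values and iteration count are illustrative, not fitted parameters from the paper, and the dense probability matrix is only practical at this toy scale:

    # Minimal sketch of stochastic Kronecker graph generation: repeatedly take
    # the Kronecker product of a small initiator probability matrix, then
    # sample each edge independently. Initiator values are illustrative.
    import numpy as np

    def kronecker_graph(initiator, iterations, rng=None):
        rng = rng or np.random.default_rng(0)
        P = np.array(initiator, dtype=float)
        for _ in range(iterations - 1):
            P = np.kron(P, initiator)        # edge-probability matrix grows geometrically
        return (rng.random(P.shape) < P).astype(int)  # sampled adjacency matrix

    # Example: 2x2 initiator expanded 4 times -> a 16-node graph with heavy-tailed degrees.
    A = kronecker_graph([[0.9, 0.5], [0.5, 0.1]], iterations=4)
    print(A.shape, A.sum())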
Article
Full-text available
Community structure plays a significant role in the analysis of social networks and similar graphs, yet this structure is little understood and not well captured by most models. We formally define a community to be a subgraph that is internally highly connected and has no deeper substructure. We use tools of combinatorics to show that any such community must contain a dense Erdős-Rényi (ER) subgraph. Based on mathematical arguments, we hypothesize that any graph with a heavy-tailed degree distribution and community structure must contain a scale-free collection of dense ER subgraphs. These theoretical observations corroborate well with empirical evidence. From this, we propose the Block Two-Level Erdős-Rényi (BTER) model, and demonstrate that it accurately captures the observable properties of many real-world social networks.
Article
Full-text available
We propose elementary ASCII exchange formats for matrices. Specific instances of the format are defined for dense and sparse matrices with real, complex, integer and pattern entries, with special cases for symmetric, skew-symmetric and Hermitian matrices. Sparse matrices are represented in a coordinate storage format. The overall file structure is designed to allow future definition of other specialized matrix formats, as well as for objects other than matrices.
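As a usage illustration of the coordinate Matrix Market format described above, a short Python/SciPy round-trip sketch follows; the filename is illustrative:

    # Minimal sketch of writing and reading a sparse matrix in the Matrix
    # Market coordinate format using SciPy. The filename is illustrative.
    import scipy.sparse as sp
    from scipy.io import mmread, mmwrite

    A = sp.random(5, 5, density=0.2, format="coo", random_state=0)
    # Header produced: %%MatrixMarket matrix coordinate real general
    mmwrite("example.mtx", A, comment="toy sparse matrix")
    B = mmread("example.mtx").tocoo()
    assert (abs(A - B) > 1e-12).nnz == 0  # round-trip preserves the entries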
Article
The rise of big data systems has created a need for benchmarks to measure and compare the capabilities of these systems. Big data benchmarks present unique scalability challenges. The supercomputing community has wrestled with these challenges for decades and developed methodologies for creating rigorous scalable benchmarks (e.g., HPC Challenge). The proposed PageRank pipeline benchmark employs supercomputing benchmarking methodologies to create a scalable benchmark that is reflective of many real-world big data processing systems. The PageRank pipeline benchmark builds on existing prior scalable benchmarks (Graph500, Sort, and PageRank) to create a holistic benchmark with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. The linear algebraic nature of PageRank makes it well suited to being implemented using the GraphBLAS standard. The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed PageRank pipeline benchmark is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented and their single threaded performance has been measured.
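A minimal Python/SciPy sketch of the PageRank kernel expressed as sparse linear algebra (power iteration) follows; the damping factor, iteration count, and the simplified handling of dangling vertices are illustrative choices, not the benchmark's specification:

    # Minimal sketch of PageRank as sparse linear algebra (power iteration);
    # damping factor and iteration count are illustrative defaults, and
    # dangling-vertex mass is ignored for brevity.
    import numpy as np
    import scipy.sparse as sp

    def pagerank(A, alpha=0.85, iters=50):
        """A: sparse adjacency matrix with A[i, j] = 1 for an edge i -> j."""
        n = A.shape[0]
        out_deg = np.asarray(A.sum(axis=1)).ravel()
        inv_deg = np.divide(1.0, out_deg, out=np.zeros(n), where=out_deg > 0)
        M = sp.diags(inv_deg) @ A            # row-stochastic transition matrix
        r = np.full(n, 1.0 / n)
        for _ in range(iters):
            r = alpha * (M.T @ r) + (1 - alpha) / n
        return r / r.sum()

    # Example: a 3-node cycle has uniform PageRank.
    A = sp.csr_matrix(np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]]))
    print(pagerank(A))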
Article
Big data and the Internet of Things era continue to challenge computational systems. Several technology solutions such as NoSQL databases have been developed to deal with this challenge. In order to generate meaningful results from large datasets, analysts often use a graph representation which provides an intuitive way to work with the data. Graph vertices can represent users and events, and edges can represent the relationship between vertices. Graph algorithms are used to extract meaningful information from these very large graphs. At MIT, the Graphulo initiative is an effort to perform graph algorithms directly in NoSQL databases such as Apache Accumulo or SciDB, which have an inherently sparse data storage scheme. Sparse matrix operations have a history of efficient implementations and the Graph Basic Linear Algebra Subprogram (GraphBLAS) community has developed a set of key kernels that can be used to develop efficient linear algebra operations. However, in order to use the GraphBLAS kernels, it is important that common graph algorithms be recast using the linear algebra building blocks. In this article, we look at common classes of graph algorithms and recast them into linear algebra operations using the GraphBLAS building blocks.
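As an illustration of recasting a common graph algorithm into linear algebra building blocks, the following is a hedged Python/SciPy sketch of breadth-first search as repeated sparse matrix-vector products; it mimics the GraphBLAS masked-frontier pattern but does not use an actual GraphBLAS library:

    # Illustrative recasting of breadth-first search as repeated sparse
    # matrix-vector products (SciPy here rather than a GraphBLAS kernel).
    import numpy as np
    import scipy.sparse as sp

    def bfs_levels(A, source):
        """A: sparse adjacency matrix; returns the BFS level per vertex (-1 if unreachable)."""
        n = A.shape[0]
        levels = np.full(n, -1)
        frontier = np.zeros(n, dtype=bool)
        frontier[source] = True
        level = 0
        while frontier.any():
            levels[frontier] = level
            # One sparse matrix-vector product expands the frontier by one hop;
            # masking out visited vertices plays the role of the GraphBLAS mask.
            reached = (A.T @ frontier.astype(np.int64)) > 0
            frontier = reached & (levels == -1)
            level += 1
        return levels

    A = sp.csr_matrix(np.array([[0, 1, 1, 0],
                                [0, 0, 0, 1],
                                [0, 0, 0, 0],
                                [0, 0, 0, 0]]))
    print(bfs_levels(A, 0))  # -> [0 1 1 2]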
Article
The annual Visual Analytics Science and Technology (VAST) challenge provides Visual Analytics researchers, developers, and designers an opportunity to apply their best tools and techniques against invented problems that include a realistic scenario, data, tasks, and questions to be answered. Submissions are processed much like conference papers, contestants are provided reviewer feedback, and excellence is recognized with awards. A day-long VAST Challenge workshop takes place each year at the IEEE VAST conference to share results and recognize outstanding submissions. Short papers are published each year in the annual VAST proceedings. Over the history of the challenge, participants have investigated a wide variety of scenarios, such as bioterrorism, epidemics, arms smuggling, social unrest, and computer network attacks, among many others. Contestants have been provided with large numbers of realistic but synthetic Coast Guard interdiction records, intelligence reports, hospitalization records, microblog records, personal RFID tag locations, huge amounts of cyber security log data, and several hours of video. This paper describes the process for developing the synthetic VAST Challenge datasets and conducting the annual challenges. This paper also provides an introduction to this special issue of Information Visualization, focusing on the impacts of the VAST Challenge.
Conference Paper
We describe the evolution of the IEEE Visual Analytics Science and Technology (VAST) Challenge from its origin in 2006 to present (2012). The VAST Challenge has provided an opportunity for visual analytics researchers to test their innovative thoughts on approaching problems in a wide range of subject domains against realistic datasets and problem scenarios. Over time, the Challenge has changed to correspond to the needs of researchers and users. We describe those changes and the impacts they have had on topics selected, data and questions offered, submissions received, and the Challenge format.
Article
This paper presents a new space-efficient algorithm for counting and sampling triangles--and more generally, constant-sized cliques--in a massive graph whose edges arrive as a stream. Compared to prior work, our algorithm yields significant improvements in the space and time complexity for these fundamental problems. Our algorithm is simple to implement and has very good practical performance on large graphs.
Article
Social network analysts employ automated methods of identifying cohesive subgraphs as a means of focusing their attention on areas of the network that are likely to be fruitful. A variety of standard cohesive subgraphs have been defined in the literature, but most suffer from computational intractability and other drawbacks. This paper introduces a new cohesive subgraph called a truss. Like most of the existing definitions, the truss is a relaxation of a clique. The truss offers guaranteed computation in polynomial time, is motivated by a natural observation of social cohesion, and is related nicely to other standard structures. After a review of other cohesive group definitions, the truss is defined, many of its properties are enumerated, a detailed accounting of computing trusses is given, and some examples are shown.
Article
The k-truss is a type of cohesive subgraphs proposed recently for the study of networks. While the problem of computing most cohesive subgraphs is NP-hard, there exists a polynomial time algorithm for computing k-truss. Compared with k-core which is also efficient to compute, k-truss represents the "core" of a k-core that keeps the key information of, while filtering out less important information from, the k-core. However, existing algorithms for computing k-truss are inefficient for handling today's massive networks. We first improve the existing in-memory algorithm for computing k-truss in networks of moderate size. Then, we propose two I/O-efficient algorithms to handle massive networks that cannot fit in main memory. Our experiments on real datasets verify the efficiency of our algorithms and the value of k-truss.
Article
As the size of graphs for analysis continues to grow, methods of graph processing that scale well have become increasingly important. One way to handle large datasets is to disperse them across an array of networked computers, each of which implements simple sorting and accumulating, or MapReduce, operations. This cloud computing approach offers many attractive features. If decomposing useful graph operations in terms of MapReduce cycles is possible, it provides incentive for seriously considering cloud computing. Moreover, it offers a way to handle a large graph on a single machine that can't hold the entire graph as well as enables streaming graph processing. This article examines this possibility.
Conference Paper
A standard database for testing voice verification systems, called YOHO, is now available from the Linguistic Data Consortium (LDC). The purpose of this database is to enable research, spark competition, and provide a means for comparative performance assessments between various voice verification systems. A test plan is presented for the suggested use of the LDC's YOHO CD-ROM for testing voice verification systems. This plan is based upon ITT's voice verification test methodology as described by Higgins, et al. (1992), but differs slightly in order to match the LDC's CD-ROM version of YOHO and to accommodate different systems. Test results of several algorithms using YOHO are also presented
Brad Appleton. sclc - Source-code line counter.
Jeremy Kepner. Perfect power law graphs: Generation, sampling, construction and fitting.
Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection.