Bisimulation summaries of graph data have multiple applications, including facilitating graph exploration and enabling query optimization techniques, but efficient, scalable, summary construction is challenging. The literature describes parallel construction algorithms using message-passing, and these have been recently adapted to MapReduce environments. The fixpoint nature of bisimulation is
... [Show full abstract] well suited to iterative graph processing, but the existing MapReduce solutions do not drastically decrease per-iteration times as the computation progresses.
In this paper, we focus on leveraging parallel multi-core graph frameworks with the goal of constructing summaries in roughly the same amount of time that it takes to input the data into the framework (for a range of real world data graphs) and output the summary. To achieve our goal we introduce a singleton optimization that significantly reduces per-iteration times after only a few iterations. We present experimental results validating that our scalable GraphChi implementation achieves our goal with bisimulation summaries of million to billion edge graphs.