
Merging Adjacency Lists for Efficient Web Graph Compression

Book chapter · DOI: 10.1007/978-3-642-23169-8_42 · In: Man-Machine Interactions 2, AISC 103, pp. 385–392

ABSTRACT

Analysing Web graphs is hampered by the need to store a major part of these huge graphs in external memory, which prevents efficient random access to the edge (hyperlink) lists. A number of algorithms involving compression techniques have therefore been proposed to represent Web graphs succinctly while still providing random access. Our algorithm belongs to this category. It works on contiguous blocks of adjacency lists, and its key mechanism is merging each block into a single ordered list. This method achieves compression ratios much better than those of most methods known from the literature, at rather competitive access times.

Keywords: graph compression · random access
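
To make the abstract's key mechanism concrete, below is a minimal Python sketch of merging a block of adjacency lists into one ordered list of distinct targets, with a per-target bitmask recording which lists of the block contain it. The flag layout and helper names are illustrative assumptions, not the encoding actually used in the paper; a real representation would additionally gap- and entropy-code the merged list.

    from itertools import chain

    def merge_block(adj_lists):
        # Merge a block of sorted adjacency lists into one ordered list
        # of distinct targets; flags[i] is a bitmask saying which lists
        # of the block contain merged[i] (illustrative layout).
        merged = sorted(set(chain.from_iterable(adj_lists)))
        pos = {v: i for i, v in enumerate(merged)}
        flags = [0] * len(merged)
        for row, lst in enumerate(adj_lists):
            for v in lst:
                flags[pos[v]] |= 1 << row
        return merged, flags

    def extract_list(merged, flags, row):
        # Recover the adjacency list of the row-th node of the block.
        return [v for v, f in zip(merged, flags) if (f >> row) & 1]

    block = [[2, 5, 9], [2, 9, 11], [5, 9, 11, 12]]
    merged, flags = merge_block(block)
    assert merged == [2, 5, 9, 11, 12]
    assert extract_list(merged, flags, 1) == [2, 9, 11]

Random access then only requires decoding the single block that holds the queried node, which is how such schemes trade a little access time for much better compression.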

CITED BY

  • Citing context: "To deal with this challenge, we use a compressed graph representation. Memory-efficient graph representations have been widely studied [5], [6]. In particular, Brisaboa et al. introduce a compact tree to represent the adjacency matrix of the web graph (the K²-tree) [7]." (A toy sketch of the K²-tree idea follows this entry.)
    ABSTRACT: With the advent of cloud computing, a significant number of web services are available on the Internet. Services can be combined when a user's requirements are too complex to be solved by individual services. Since there are many services, searching for a solution may require much storage. We propose to apply a compact data structure to represent the web service composition graph. To the best of our knowledge, our work is the first attempt to consider compact structures in solving the web service composition problem. Experimental results show that our method can find a valid solution to the composition problem; meanwhile, it takes less space and shows good scalability when handling a large number of web services.
    Conference Paper · Jun 2015
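
Since the citing context above mentions the K²-tree of Brisaboa et al., here is a toy Python sketch of the underlying idea: the adjacency matrix is split recursively into k² submatrices, one bit per submatrix records whether it contains any edge, and only non-empty submatrices are expanded. The (T, L) bitmap split below follows the common textbook description; the exact layout of the published structure may differ.

    from collections import deque

    def k2_tree(mat, k=2):
        # Level-order bitmaps of a k^2-tree over a square 0/1 matrix
        # whose side is a power of k.  T holds the internal levels,
        # L the last (cell) level.  Toy sketch only.
        T, L = [], []
        queue = deque([(0, 0, len(mat))])   # (row, col, side) to split
        while queue:
            r, c, side = queue.popleft()
            sub = side // k
            for i in range(k):
                for j in range(k):
                    r0, c0 = r + i * sub, c + j * sub
                    bit = any(mat[x][y]
                              for x in range(r0, r0 + sub)
                              for y in range(c0, c0 + sub))
                    if sub == 1:
                        L.append(int(bit))  # leaf level: raw matrix cells
                    else:
                        T.append(int(bit))
                        if bit:             # only non-empty parts recurse
                            queue.append((r0, c0, sub))
        return T, L

On sparse Web-graph matrices most submatrices are empty, so whole regions of zeros cost a single 0-bit, and neighbor queries become root-to-leaf traversals over T.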
  • Citing context: "Assuming 20 outgoing links per node, 5-byte links (4-byte indexes to other pages are simply too small) and pointers to each adjacency list, we would need more than 5.2 TB of memory, way beyond the capacities of current RAM. Preliminary versions of this manuscript were published in [4] and [5]." (A sketch of the differential-encoding idea follows this entry.)
    ABSTRACT: Analyzing Web graphs has applications in determining page ranks, fighting Web spam, detecting communities and mirror sites, and more. This study is, however, hampered by the necessity of storing a major part of huge graphs in external memory, which prevents efficient random access to edge (hyperlink) lists. A number of algorithms involving compression techniques have thus been presented to represent Web graphs succinctly while also providing random access. Those techniques are usually based on differential encodings of the adjacency lists, finding repeating nodes or node regions in the successive lists, more general grammar-based transformations, or 2-dimensional representations of the binary matrix of the graph. In this paper we present three Web graph compression algorithms. The first can be seen as engineering of the Boldi and Vigna (2004) [8] method. We extend the notion of similarity between link lists and use a more compact encoding of residuals. The algorithm works on blocks of varying size (in the number of input lists) and sacrifices access time for better compression ratio, achieving a more succinct graph representation than other algorithms reported in the literature. The second algorithm works on blocks of the same size, in the number of input lists. Its key mechanism is merging the block into a single ordered list. This method achieves much more attractive space–time tradeoffs. Finally, we present an algorithm for bidirectional neighbor query support, which offers compression ratios better than those known from the literature.
    Article · Jan 2014 · Discrete Applied Mathematics
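
The abstract above attributes part of the compression to differential encodings of the adjacency lists and to a compact coding of residuals. As a point of reference, a minimal Python sketch of that generic idea follows; the variable-byte coder is just one common residual code, not necessarily the one used in the paper.

    def gaps(sorted_adj):
        # Differential (gap) encoding of a sorted adjacency list:
        # keep the first target, then successive differences.
        if not sorted_adj:
            return []
        return [sorted_adj[0]] + [b - a for a, b in zip(sorted_adj, sorted_adj[1:])]

    def vbyte(n):
        # Variable-byte code: 7 data bits per byte, high bit set on
        # the final byte (one of many possible residual codes).
        out = bytearray()
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
        return bytes(out)

    adj = [5, 9, 12, 30, 312]
    encoded = b"".join(vbyte(g) for g in gaps(adj))
    # gaps(adj) == [5, 4, 3, 18, 282]; small gaps fit in one byte each.

Links on a page tend to point to nearby pages in lexicographic URL order, so the gaps are typically small and such codes compress them well.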
  • ABSTRACT: Compressed representations have become effective for storing and accessing large Web and social graphs, in order to support various graph querying and mining tasks. The existing representations exploit various typical patterns in those networks and provide basic navigation support. In this paper, we obtain unprecedented results by finding “dense subgraph” patterns and combining them with techniques such as node orderings and compact data structures. On those representations, we support out-neighbor and out/in-neighbor queries, as well as mining queries based on the dense subgraphs. First, we propose a compression scheme for Web graphs that reduces edges by representing dense subgraphs with “virtual nodes”; over this scheme, we apply node orderings and other compression techniques. With this approach, we match the best current compression ratios that support out-neighbor queries (i.e., nodes pointed to from a given node), using 1.0–1.8 bits per edge (bpe) on large Web graphs and retrieving each neighbor of a node in 0.6–1.0 microseconds (µs). When supporting both out- and in-neighbor queries, our technique generally offers the best time when using little space. If the reduced graph is instead represented with a compact data structure that supports bidirectional navigation, we obtain the most compact Web graph representations (0.9–1.5 bpe) that support out/in-neighbor navigation; however, the time per neighbor extracted rises to around 5–20 µs. We also propose a compact data structure that represents dense subgraphs without using virtual nodes. It allows us to recover out/in-neighbors and answer other, more complex queries on the dense subgraphs identified. This structure is not competitive on Web graphs, but on social networks it achieves 4–13 bpe and 8–12 µs per out/in-neighbor retrieved, which improves upon all existing representations. (A sketch of the virtual-node reduction follows this entry.)
    Article · Aug 2013 · Knowledge and Information Systems
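
The virtual-node scheme described above can be illustrated with a small Python sketch: once a complete bipartite subgraph S × T has been found (the hard part, assumed done here), its |S|·|T| edges are replaced by |S| + |T| edges routed through a fresh virtual node. The function name and edge representation are illustrative assumptions.

    def contract_biclique(edges, S, T, virtual):
        # Replace the |S| * |T| edges of a complete bipartite subgraph
        # S x T by |S| + |T| edges through a new virtual node.
        reduced = {(u, v) for (u, v) in edges if not (u in S and v in T)}
        reduced |= {(s, virtual) for s in S}
        reduced |= {(virtual, t) for t in T}
        return reduced

    S, T = {1, 2, 3}, {7, 8, 9}
    edges = {(s, t) for s in S for t in T} | {(1, 4), (4, 7)}
    reduced = contract_biclique(edges, S, T, virtual=100)
    assert len(edges) == 11 and len(reduced) == 8

Out-neighbor queries then expand virtual nodes transparently: the out-list of node 1 becomes {4, 100}, and 100 expands back to {7, 8, 9}.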