Figure 2 - uploaded by Iwona Bialynicka-Birula
3: The second-level structures of the range tree in 2, linked using fractional cascading. For clarity, the first-level nodes are not depicted; they are the same as those in 2. The bold lines indicate the fractional cascading pointers that follow the query paths in the primary tree. The bold dashed lines indicate the fractional cascading pointers followed from a node on the main query path to a child whose secondary structure must be queried.
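
The caption describes a layered range tree: each primary (x) node keeps its points sorted by y, and every entry in that list carries pointers into its children's y-lists, so only the root needs a binary search. The following is a minimal Python sketch of that idea under our own assumptions: points are (x, y) tuples, the primary tree splits the x-sorted points by position, and the names `build`, `query`, `ptr_left`, and `ptr_right` are illustrative rather than taken from the source.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y)


@dataclass
class Node:
    xmin: float
    xmax: float
    by_y: List[Point]  # secondary structure: this node's points sorted by y
    ys: List[float] = field(default_factory=list)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # ptr_left[i]: first index in left.by_y whose y >= by_y[i]'s y (last entry is a sentinel)
    ptr_left: List[int] = field(default_factory=list)
    ptr_right: List[int] = field(default_factory=list)


def _cascade(parent: List[Point], child: List[Point]) -> List[int]:
    """Fractional-cascading pointers from a parent's y-list into a child's y-list."""
    ptrs, j = [], 0
    for _, y in parent:
        while j < len(child) and child[j][1] < y:
            j += 1
        ptrs.append(j)
    ptrs.append(len(child))  # sentinel: no parent entry has y >= the bound
    return ptrs


def build(pts_by_x: List[Point]) -> Node:
    """Build the range tree over a non-empty list of points sorted by x."""
    by_y = sorted(pts_by_x, key=lambda p: p[1])
    node = Node(pts_by_x[0][0], pts_by_x[-1][0], by_y, [p[1] for p in by_y])
    if len(pts_by_x) > 1:
        mid = len(pts_by_x) // 2
        node.left = build(pts_by_x[:mid])
        node.right = build(pts_by_x[mid:])
        node.ptr_left = _cascade(node.by_y, node.left.by_y)
        node.ptr_right = _cascade(node.by_y, node.right.by_y)
    return node


def query(root: Node, x1: float, x2: float, y1: float, y2: float) -> List[Point]:
    """Report all points in [x1, x2] x [y1, y2]."""
    out: List[Point] = []

    def visit(node: Optional[Node], i: int) -> None:
        # Invariant: i is the first index in node.by_y with y >= y1.
        if node is None or node.xmax < x1 or node.xmin > x2:
            return
        if x1 <= node.xmin and node.xmax <= x2:
            # Canonical node: scan its y-list from i until y exceeds y2.
            while i < len(node.by_y) and node.by_y[i][1] <= y2:
                out.append(node.by_y[i])
                i += 1
            return
        # Partial overlap: descend, translating i in O(1) via the cascading pointers.
        visit(node.left, node.ptr_left[i])
        visit(node.right, node.ptr_right[i])

    # The only binary search happens once, at the root.
    visit(root, bisect_left(root.ys, y1))
    return out
```

For example, `query(build(sorted(pts)), 2, 5, 2, 4)` reports the points with 2 <= x <= 5 and 2 <= y <= 4. The recursion visits O(log n) primary nodes and pays O(1) per step to carry the y-position along, so a query costs O(log n + k) for k reported points. For simplicity, `build` re-sorts each node's points; merging the children's y-lists instead would avoid the extra log factor in construction.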

3: The second-level structures of the range tree in 2 linked using fractional cascading. For clarity, the first-level nodes are not depicted-they are the same as the ones in 2. The bold lines indicate the fractional cascading pointers following the paths of the query in the primary tree. The bold dashed lines indicate the fractional cascading pointers followed from a node on the main query path to its child whose secondary structure needs to be queried.

Citations

Article
We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using similar space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of using the costlier two-step classical processing methods. To achieve compression, we represent the treap topology using several alternative compact data structures. Further, the treap invariants allow us to elegantly encode both document identifiers and frequencies differentially. We also show how to extend this representation to support incremental updates over the index. Results show that, under the tf-idf scoring scheme, our index uses about the same space as state-of-the-art compact representations while performing ranked single-word, union, or intersection queries up to 2-20 times faster. Under the BM25 scoring scheme, our index may use up to 40% more space than the others and outperforms them less often, but still reaches improvement factors of 2-20 in the best cases. The index supporting incremental updates incurs an overhead of 50%-100% over the static variants in terms of space, construction time, and query time.
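
As the abstract describes, a treap stores a posting list as a binary search tree on document identifiers that is simultaneously a max-heap on term frequencies. The sketch below is a minimal, pointer-based illustration under our own assumptions: plain Python objects rather than the compact topology encodings the article uses, a single posting list rather than unions or intersections, and hypothetical names `build_treap`, `at_least`, and `top_k`. It shows why frequency thresholding comes for free: a traversal can abandon any subtree whose root frequency is already below the threshold.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple

Posting = Tuple[int, int]  # (document identifier, term frequency)


@dataclass
class TreapNode:
    doc: int   # BST key: in-order traversal yields increasing docids
    freq: int  # heap priority: parent.freq >= child.freq
    left: Optional["TreapNode"] = None
    right: Optional["TreapNode"] = None


def build_treap(postings: List[Posting]) -> Optional[TreapNode]:
    """Build the treap in O(n) from a posting list sorted by docid (Cartesian-tree stack method)."""
    stack: List[TreapNode] = []  # right spine of the treap built so far
    for doc, freq in postings:
        node = TreapNode(doc, freq)
        last = None
        while stack and stack[-1].freq < freq:
            last = stack.pop()  # lower-frequency nodes become node's left subtree
        node.left = last
        if stack:
            stack[-1].right = node
        stack.append(node)
    return stack[0] if stack else None


def at_least(node: Optional[TreapNode], threshold: int, out: List[Posting]) -> List[Posting]:
    """Append, in docid order, every posting with freq >= threshold."""
    # Heap order lets us prune: no descendant of a node below threshold can qualify.
    if node is not None and node.freq >= threshold:
        at_least(node.left, threshold, out)
        out.append((node.doc, node.freq))
        at_least(node.right, threshold, out)
    return out


def top_k(root: Optional[TreapNode], k: int) -> List[Posting]:
    """Return k postings of highest frequency, expanding the treap best-first."""
    heap = [(-root.freq, root.doc, root)] if root is not None else []
    out: List[Posting] = []
    while heap and len(out) < k:
        _, _, node = heapq.heappop(heap)
        out.append((node.doc, node.freq))
        for child in (node.left, node.right):
            if child is not None:
                heapq.heappush(heap, (-child.freq, child.doc, child))
    return out
```

For the posting list `[(1, 3), (4, 7), (5, 1), (9, 7), (12, 2), (20, 5)]`, `at_least(root, 5, [])` touches only the nodes with frequency at least 5 plus their immediate children, which is the single-pass "merge while thresholding" behavior the abstract contrasts with two-step processing.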
Conference Paper
We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of using the costlier two-step classical processing methods. To achieve compression, we represent the treap topology using compact data structures. Further, the treap invariants allow us to elegantly encode both document identifiers and frequencies differentially. Results show that our index uses about 20% less space, and performs queries up to three times faster, than state-of-the-art compact representations.
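
The abstract states that the treap invariants make differential encoding natural. One straightforward way to exploit them, assumed here rather than taken from the article, is to store each node's docid as its distance from the parent's docid (positive in either direction, by BST order) and its frequency as the parent's frequency minus its own (never negative, by heap order). The sketch writes nodes in preorder with two hypothetical topology flags per node; the names `build`, `encode`, and `decode` are illustrative.

```python
from typing import List, Optional, Tuple

# A treap node as a tuple: (docid, freq, left, right); left/right are nodes or None.
Node = Tuple[int, int, Optional[tuple], Optional[tuple]]
Record = Tuple[bool, bool, int, int]  # (has_left, has_right, docid delta, freq delta)


def build(postings: List[Tuple[int, int]]) -> Optional[Node]:
    """Cartesian-tree construction: the max-frequency posting (leftmost on ties) is the root."""
    if not postings:
        return None
    i = max(range(len(postings)), key=lambda j: postings[j][1])
    doc, freq = postings[i]
    return (doc, freq, build(postings[:i]), build(postings[i + 1:]))


def encode(node: Node, parent: Optional[Tuple[int, int]] = None,
           out: Optional[List[Record]] = None) -> List[Record]:
    """Preorder list of records; every value except the root's is a difference from the parent."""
    if out is None:
        out = []
    doc, freq, left, right = node
    if parent is None:
        dd, df = doc, freq         # root stored verbatim
    else:
        dd = abs(doc - parent[0])  # BST order: left child < parent < right child
        df = parent[1] - freq      # heap order: never negative
    out.append((left is not None, right is not None, dd, df))
    for child in (left, right):
        if child is not None:
            encode(child, (doc, freq), out)
    return out


def decode(records: List[Record]) -> Node:
    """Inverse of encode: rebuild the treap from its preorder records."""
    it = iter(records)

    def rec(parent: Optional[Tuple[int, int]], is_left: bool) -> Node:
        has_l, has_r, dd, df = next(it)
        if parent is None:
            doc, freq = dd, df
        else:
            # The side we came from tells us whether the docid delta is subtracted or added.
            doc = parent[0] - dd if is_left else parent[0] + dd
            freq = parent[1] - df
        left = rec((doc, freq), True) if has_l else None
        right = rec((doc, freq), False) if has_r else None
        return (doc, freq, left, right)

    return rec(None, False)
```

For the posting list `[(1, 3), (4, 7), (5, 1), (9, 7), (12, 2), (20, 5)]`, the non-root frequency deltas are 4, 0, 6, 2, and 3, all non-negative as the heap invariant guarantees. A real index would store the topology flags and these small deltas in compact form, which this sketch does not attempt.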