Analyzing design choices for distributed multidimensional indexing

The Journal of Supercomputing (Impact Factor: 0.92). 03/2012; 59(3):1552-1576. DOI: 10.1007/s11227-011-0567-7
Source: DBLP

ABSTRACT: Scientific datasets are often stored on distributed archival storage systems, both because geographically distributed sensor devices
store the datasets on their local machines and because the size of scientific datasets demands a large amount of disk space.
Multidimensional indexing techniques have been shown to greatly improve range query performance on large scientific datasets.
In this paper, we discuss several ways of distributing a multidimensional index in order to speed up access to large distributed
scientific datasets. This paper compares the designs, challenges, and problems of distributed multidimensional indexing schemes,
and presents a comprehensive performance study of distributed indexing to provide guidelines for choosing a distributed multidimensional
index for a specific data analysis application.

Keywords: Multidimensional indexing, Distributed indexing, Decentralized indexing, Data intensive computing

  • ABSTRACT: General-purpose computing on graphics processing units (GP-GPU) has emerged as a new cost-effective parallel computing paradigm in high performance computing research, one that enables a large amount of data to be processed in parallel. Large-scale, data-intensive scientific applications have been playing an important role in modern high performance computing research. A common access pattern in such scientific data analysis applications is the multi-dimensional range query, but little research has been conducted on multi-dimensional range queries on the GPU. Multi-dimensional indexing trees such as R-trees are inherently not well suited to the GPU environment because of their irregular tree traversal. Traversing an irregular tree search path makes it hard to maximize the utilization of massively parallel architectures. In this paper, we propose a novel MPTS (Massively Parallel Three-phase Scanning) R-tree traversal algorithm for multi-dimensional range queries that converts recursive access to tree nodes into sequential access. Our extensive experimental study shows that the MPTS R-tree traversal algorithm on an NVIDIA Tesla M2090 GPU consistently outperforms the traditional recursive R-tree search algorithm on Intel Xeon E5506 processors.
    Journal of Parallel and Distributed Computing 08/2013; 73(8):1195–1207. · 1.12 Impact Factor