Analyzing design choices for distributed multidimensional indexing
ABSTRACT Scientific datasets are often stored on distributed archival storage systems, both because geographically distributed sensor devices
store the datasets on their local machines and because the size of scientific datasets demands large amounts of disk space.
Multidimensional indexing techniques have been shown to greatly improve the performance of range queries over large scientific datasets.
In this paper, we discuss several ways of distributing a multidimensional index in order to speed up access to large distributed
scientific datasets. This paper compares the designs, challenges, and open problems of distributed multidimensional indexing schemes,
and presents a comprehensive performance study of distributed indexing, offering guidelines for choosing a distributed multidimensional
index for a specific data analysis application.
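To make the idea of indexed range queries concrete, here is a minimal sketch (not taken from the paper) of how a multidimensional index answers a range query: a tiny in-memory k-d tree that prunes subtrees whose splitting plane lies outside the query box. The k-d tree stands in for whichever index structure an application actually uses, and all names are illustrative.

```python
# Illustrative only: a minimal k-d tree range query. A distributed index would
# partition or replicate such a structure across nodes; here everything is local.

class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build(points, depth=0):
    """Build a k-d tree by median-splitting on a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build(points[:mid], depth + 1),
                  build(points[mid + 1:], depth + 1))

def range_query(node, lo, hi, out):
    """Collect points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if node is None:
        return
    p, a = node.point, node.axis
    if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
        out.append(p)
    if lo[a] <= p[a]:          # query box may extend into the left subtree
        range_query(node.left, lo, hi, out)
    if hi[a] >= p[a]:          # ... and/or into the right subtree
        range_query(node.right, lo, hi, out)

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
hits = []
range_query(tree, (3, 1), (8, 5), hits)
print(sorted(hits))  # -> [(5, 4), (7, 2), (8, 1)]
```

The pruning tests on the splitting axis are what make an index pay off: subtrees wholly outside the query box are never visited, which matters even more when each subtree lives on a different storage node.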
Keywords: Multidimensional indexing – Distributed indexing – Decentralized indexing – Data-intensive computing
Article: The R*-tree: An Efficient and Robust Access Method for Points and Rectangles (Source: uni-muenchen.de)
ABSTRACT: The R-tree, one of the most popular access methods for rectangles, is based on the heuristic optimization of the area of the enclosing rectangle in each inner node. By running numerous experiments in a standardized testbed under highly varying data, queries, and operations, we were able to design the R*-tree, which incorporates a combined optimization of area, margin, and overlap of each enclosing rectangle in the directory. Using our standardized testbed in an exhaustive performance comparison, it turned out that the R*-tree clearly outperforms the existing R-tree variants: Guttman's linear and quadratic R-trees and Greene's variant of the R-tree. This superiority of the R*-tree holds for different types of queries and operations, such as map overlay, for both rectangles and multidimensional points in all experiments. From a practical point of view the R*-tree is very attractive for two reasons: (1) it efficiently supports point and spatial data at the same time, and (2) its implementation cost is only slightly higher than that of other R-trees. (01/1990)
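The split heuristic described in this abstract can be sketched in a few lines. The code below is our own illustration, assuming 2-D axis-aligned rectangles given as ((xmin, ymin), (xmax, ymax)); the helper names are not from the paper. It computes the three quantities the R*-tree jointly optimizes when evaluating candidate node splits: area, margin (perimeter), and overlap of the enclosing rectangles.

```python
# Illustrative helpers for the three R*-tree split criteria (2-D case).

def mbr(rects):
    """Minimum bounding rectangle enclosing a list of rectangles."""
    x0 = min(r[0][0] for r in rects); y0 = min(r[0][1] for r in rects)
    x1 = max(r[1][0] for r in rects); y1 = max(r[1][1] for r in rects)
    return ((x0, y0), (x1, y1))

def area(r):
    return (r[1][0] - r[0][0]) * (r[1][1] - r[0][1])

def margin(r):
    """Perimeter of the rectangle (the R*-tree's 'margin' criterion)."""
    return 2 * ((r[1][0] - r[0][0]) + (r[1][1] - r[0][1]))

def overlap(r, s):
    """Area of the intersection of two rectangles (0 if disjoint)."""
    w = min(r[1][0], s[1][0]) - max(r[0][0], s[0][0])
    h = min(r[1][1], s[1][1]) - max(r[0][1], s[0][1])
    return max(0, w) * max(0, h)

# Two hypothetical groups produced by a candidate node split:
g1 = [((0, 0), (2, 2)), ((1, 1), (3, 3))]
g2 = [((2, 0), (5, 2))]
b1, b2 = mbr(g1), mbr(g2)
print(area(b1), margin(b1), overlap(b1, b2))  # -> 9 12 2
```

A split with a smaller combined score on these three measures tends to produce tighter, less overlapping directory rectangles, which is what lets queries prune more subtrees.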
Article: The SDSC Storage Resource Broker
ABSTRACT: This paper describes the architecture of the SDSC Storage Resource Broker (SRB). The SRB is middleware that provides applications a uniform API to access heterogeneous distributed storage resources, including filesystems, database systems, and archival storage systems. The SRB utilizes a metadata catalog service, MCAT, to provide a "collection"-oriented view of data. Thus, data items that belong to a single collection may, in fact, be stored on heterogeneous storage systems. The SRB infrastructure is being used to support digital library projects at SDSC. This paper describes the architecture and various features of the SDSC SRB. 1 Introduction The San Diego Supercomputer Center (SDSC) is involved in developing infrastructure for a high-performance distributed computing environment as part of its National Partnership for Advanced Computational Infrastructure (NPACI) project funded by the NSF. The NSF program in Partnerships for Advanced Computational Infrastructure (PACI), which fund... (11/1998)
ABSTRACT: At regional scales, satellite-based sensors are the primary source of information to study the earth's environment, as they provide the needed dynamic temporal view of the earth's surface. Raw satellite orbit data have to be processed and mapped into a standard projection to produce multitemporal data sets, which can then be used for regional or global earth science studies. In this paper, we describe a software system, Kronos, for the generation of custom-tailored data products from the Advanced Very High Resolution Radiometer (AVHRR) sensor. Kronos allows the generation of a rich set of products that can be easily specified through a Java interface by scientists wishing to carry out earth system modeling or analysis based on AVHRR Global Area Coverage (GAC) data. Kronos is based on a flexible methodology and consists of four major components: ingest and preprocessing, indexing and storage, search and processing engine, and a Java interface. We illustrate the power of our methodology by ... (09/2000)