Conference Paper

Relative prefix sums: an efficient approach for querying dynamic OLAP data cubes

Dept. of Comput. Sci., California Univ., Santa Barbara, CA
DOI: 10.1109/ICDE.1999.754948
Conference: Proceedings of the 15th International Conference on Data Engineering, 1999
Source: DBLP

ABSTRACT Range sum queries on data cubes are a powerful tool for analysis.
A range sum query applies an aggregation operation (e.g., SUM) over all
selected cells in a data cube, where the selection is specified by
providing ranges of values for numeric dimensions. Many application
domains require that information provided by analysis tools be current
or “near-current.” Existing techniques for range sum queries
on data cubes, however, can incur update costs on the order of the size
of the data cube. Since the size of a data cube is exponential in the
number of its dimensions, rebuilding the entire data cube can be very
costly. We present an approach that achieves constant time range sum
queries while constraining update costs. Our method reduces the overall
complexity of the range sum problem.
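
The trade-off the abstract describes can be seen in a small sketch. The code below is not the relative prefix sum method of the paper; it shows a plain two-dimensional prefix-sum array, one well-known technique that answers a range sum query with a constant number of lookups but may have to rewrite up to the whole array when a single cell changes. The function names (build_prefix_sums, range_sum, update_cell) and the sample data are hypothetical and chosen only for illustration.

```python
def build_prefix_sums(cube):
    """Return P where P[i][j] is the sum of cube[0..i][0..j], inclusive."""
    rows, cols = len(cube), len(cube[0])
    P = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        row_total = 0
        for j in range(cols):
            row_total += cube[i][j]
            P[i][j] = row_total + (P[i - 1][j] if i > 0 else 0)
    return P


def range_sum(P, lo, hi):
    """Sum of the cells in the inclusive range lo=(r1, c1) .. hi=(r2, c2).

    Uses inclusion-exclusion on P, so at most four lookups are needed
    regardless of how many cells the range covers.
    """
    (r1, c1), (r2, c2) = lo, hi
    total = P[r2][c2]
    if r1 > 0:
        total -= P[r1 - 1][c2]
    if c1 > 0:
        total -= P[r2][c1 - 1]
    if r1 > 0 and c1 > 0:
        total += P[r1 - 1][c1 - 1]
    return total


def update_cell(P, r, c, delta):
    """Add delta to cell (r, c) of the underlying cube.

    Every P[i][j] with i >= r and j >= c depends on that cell, so all of
    them are rewritten. Returns the number of prefix cells touched.
    """
    touched = 0
    for i in range(r, len(P)):
        for j in range(c, len(P[0])):
            P[i][j] += delta
            touched += 1
    return touched


# Hypothetical 3 x 3 cube used only for this illustration.
cube = [[3, 5, 1],
        [7, 3, 2],
        [2, 4, 2]]
P = build_prefix_sums(cube)
lower_right_sum = range_sum(P, (1, 1), (2, 2))   # cells 3, 2, 4, 2
cells_rewritten = update_cell(P, 0, 0, 1)        # worst case: every cell
```

In this sketch the query cost does not depend on the size of the range, while updating cell (0, 0) rewrites all nine prefix cells. That worst-case update cost, proportional to the size of the cube, is the kind of cost the abstract says the presented approach constrains.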

  • ABSTRACT: In-memory OLAP systems require a space-efficient representation of sparse data cubes in order to accommodate large data sets. On the other hand, many efficient online aggregation techniques, such as prefix sums, are built on dense array-based representations. These are often not applicable to real-world data due to the size of the arrays, which usually cannot be compressed well, as most sparsity is removed during pre-processing. A possible solution is to identify dense regions in a sparse cube and only represent those using arrays, while storing sparse data separately, e.g. in a spatial index structure. Previous dense-region-based approaches have concentrated mainly on the effectiveness of the dense-region detection (i.e. on the space-efficiency of the result). However, especially in higher-dimensional cubes, data is usually more cluttered, resulting in a potentially large number of small dense regions, which negatively affects query performance on such a structure. In this article, our focus is not only on space-efficiency but also on time-efficiency, both for the initial dense-region extraction and for queries carried out in the resulting hybrid data structure. After describing a pre-aggregation method for representing dense sub-cubes which supports efficient online aggregate queries as well as cell updates, our sub-cube extraction approach is outlined in detail. In addition, optimizations in our approach significantly reduce the time to build the initial data structure compared to former systems. Two methods to trade available memory for increased aggregate query performance are provided. Also, we present a straightforward adaptation of our approach to support multi-core or multi-processor architectures, which can further enhance query performance. Experiments with different real-world data sets show how various parameter settings can be used to adjust the efficiency and effectiveness of our algorithms.
    09/2010; pages 73-102
  • ABSTRACT: In this research, we propose using the discrete cosine transform to approximate the cumulative distributions of data cube cells' values. The cosine transform is known to have a good energy compaction property and can therefore approximate data distribution functions with a small number of coefficients. The derived estimator is accurate and easy to update. We perform experiments to compare its performance with a well-known technique, the (Haar) wavelet. The experimental results show that the cosine transform performs much better than the wavelet in estimation accuracy, speed, space efficiency, and ease of update. Keywords: DCT, Data Cube
    Database and Expert Systems Applications, 19th International Conference, DEXA 2008, Turin, Italy, September 1-5, 2008. Proceedings; 01/2008
  • ABSTRACT: As the applications of wireless sensor networks continue to expand, it is important to support fast and simultaneous data aggregation over multiple regions for advanced data analysis. In this paper, we propose a solution by using a novel distributed data structure called distributed data cube (DDC). A DDC maintains a set of special forms of aggregate values (prefix sum, prefix average, prefix max, and prefix min) in distributed sensor nodes. We will first present fast algorithms to build a DDC within a sharp time bound. Then, we will present efficient distributed query-processing algorithms to handle aggregate queries by using a DDC. For a query region with n sensor nodes, our algorithms can return within O(√n) time. Finally, extensive simulation studies confirm that a DDC can be built very quickly, which is consistent with the theoretical time bound. The network traffic injected while constructing a DDC is acceptable and also scalable as the network size grows. Query processing on a DDC is fast and energy efficient in terms of the time units needed and the number of messages incurred.
    IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews) 06/2011; · 2.55 Impact Factor

