Conference Paper

Relative prefix sums: an efficient approach for querying dynamic OLAP data cubes

Department of Computer Science, University of California, Santa Barbara, CA
DOI: 10.1109/ICDE.1999.754948 · Conference: Proceedings of the 15th International Conference on Data Engineering (ICDE 1999)
Source: DBLP


Range sum queries on data cubes are a powerful tool for analysis.
A range sum query applies an aggregation operation (e.g., SUM) over all
selected cells in a data cube, where the selection is specified by
providing ranges of values for numeric dimensions. Many application
domains require that information provided by analysis tools be current
or “near-current.” Existing techniques for range sum queries
on data cubes, however, can incur update costs on the order of the size
of the data cube. Since the size of a data cube is exponential in the
number of its dimensions, rebuilding the entire data cube can be very
costly. We present an approach that achieves constant-time range sum
queries while constraining update costs. Our method reduces the overall
complexity of the range sum problem.
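The constant-time query claim rests on the prefix-sum idea: materialize cumulative sums so that any range sum reduces to a handful of lookups via inclusion-exclusion. A minimal 2-D sketch in Python, showing the plain prefix-sum cube that this paper's relative prefix sum method improves upon (not the paper's structure itself):

```python
def build_prefix_sum(cube):
    """Build a 2-D prefix-sum cube P where P[i][j] = sum of cube[0..i][0..j]."""
    rows, cols = len(cube), len(cube[0])
    P = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            P[i][j] = (cube[i][j]
                       + (P[i - 1][j] if i > 0 else 0)
                       + (P[i][j - 1] if j > 0 else 0)
                       - (P[i - 1][j - 1] if i > 0 and j > 0 else 0))
    return P

def range_sum(P, r1, c1, r2, c2):
    """SUM over cells [r1..r2] x [c1..c2] using at most four lookups."""
    total = P[r2][c2]
    if r1 > 0:
        total -= P[r1 - 1][c2]
    if c1 > 0:
        total -= P[r2][c1 - 1]
    if r1 > 0 and c1 > 0:
        total += P[r1 - 1][c1 - 1]
    return total

cube = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
P = build_prefix_sum(cube)
print(range_sum(P, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Each query costs a constant number of array accesses regardless of the range size, which is exactly the property the abstract refers to; the cost is pushed to updates, as the abstract and the citing excerpts below discuss.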

Cited by:
  • Source
    • "[23] introduces some techniques for minimizing the update overhead required to maintain the prefix-sums. However, because the fine-grained grid that maintains the counts resides in main memory it is cheap to update even if the techniques in [23] are not applied. A partition, say p, can be divided into four regions R1 through R4 as illustrated in Figure 6. "
    ABSTRACT: The unprecedented spread of location-aware devices has resulted in a plethora of location-based services in which huge amounts of spatial data need to be efficiently processed by large-scale computing clusters. Existing cluster-based systems for processing spatial data employ static data-partitioning structures that cannot adapt to data changes, and that are insensitive to the query workload. Hence, these systems are incapable of consistently providing good performance. To close this gap, we present AQWA, an adaptive and query-workload-aware mechanism for partitioning large-scale spatial data. AQWA does not assume prior knowledge of the data distribution or the query workload. Instead, as data is consumed and queries are processed, the data partitions are incrementally updated. With extensive experiments using real spatial data from Twitter, and various workloads of range and k-nearest-neighbor queries, we demonstrate that AQWA can achieve an order of magnitude enhancement in query performance compared to the state-of-the-art systems.
    Full-text · Conference Paper · Jan 2015
  • Source
    • "In this scenario, every data point can be thought of as a cell in a multidimensional array. In preprocessing [12] [13] [18] [29], a value is computed and materialized for every cell in the array, regardless of whether it contains a data point; then, every query can be answered in a small number of lookups on the materialized information. This technique is not applicable to our problem (even with the reduction from temporal aggregation to point enclosure counting) because objects' keys are distributed in a continuous domain. "
    ABSTRACT: This paper studies aggregate search in transaction-time databases. Specifically, each object in such a database can be modeled as a horizontal segment, whose y-projection is its search key, and whose x-projection represents the period when the key was valid in history. Given a query timestamp qt and a key range qk, a count query retrieves the number of objects that are alive at qt and whose keys fall in qk. We provide a method that accurately answers such queries, with error less than ε · Nalive(qt), where Nalive(qt) is the number of objects alive at time qt, and ε is any constant in (0, 1). Denoting the disk page size as B, and n = N/B, our technique requires O(n) space, processes any query in O(log_B n) time, and supports each update in O(log_B n) amortized I/Os. As demonstrated by extensive experiments, the proposed solutions guarantee query results with extremely high precision (median relative error below 5%), while consuming only a fraction of the space occupied by the existing approaches that promise precise results.
    Full-text · Article · Aug 2008 · The VLDB Journal
  • Source
    • "Although this method can answer queries fast, an update in the worst case can propagate to the entire prefix-sum cube, which is as large as the original data cube. To control the cascading of updates, Geffner et al. [4], [5] decomposed the prefix-sum cubes recursively and Chan et al. [3] organized the prefix-sum cubes hierarchically; but the complexity of update still increases exponentially with the number of dimensions. In general, update propagation is a common problem for all these prefix-sum data cube approaches. "
    ABSTRACT: In this research, we propose to use the discrete cosine transform to approximate the cumulative distributions of data cube cells' values. The cosine transform is known to have a good energy-compaction property and thus can approximate data distribution functions with a small number of coefficients. The derived estimator is accurate and easy to update. We perform experiments to compare its performance with a well-known technique, the (Haar) wavelet. The experimental results show that the cosine transform performs much better than the wavelet in estimation accuracy, speed, space efficiency, and ease of update. Keywords—DCT, Data Cube
    Full-text · Conference Paper · Jan 2008