Conference Paper

Dictionary Compression in Point Cloud Data Management


Abstract

Nowadays, massive amounts of point cloud data can be collected thanks to advances in data acquisition and processing technologies like dense image matching and airborne LiDAR (Light Detection and Ranging) scanning. With the increase in volume and precision, point cloud data offers a useful source of information for natural resource management, urban planning, self-driving cars and more. At the same time, the scale at which point cloud data is produced, introduces management challenges: it is important to achieve efficiency both in terms of querying performance and space requirements. Traditional file-based solutions to point cloud management offer space efficiency, however, cannot scale to such massive data and provide the same declarative power as a database management system (DBMS). In this paper, we propose a time- and space-efficient solution to storing and managing point cloud data in main memory column-store DBMS. Our solution, Space-Filling Curve Dictionary-Based Compression (SFC-DBC), employs dictionary-based compression in the spatial data management domain and enhances it with indexing capabilities by using space-filling curves. It does so by constructing the space-filling curve over a compressed, artificially introduced 3D dictionary space. Consequently, SFC-DBC significantly optimizes query execution, and yet it does not require additional storage resources, compared to traditional dictionary-based compression. With respect to space-filling curve-based approaches, it minimizes storage footprint and increases resilience to skew. As a proof of concept, we develop and evaluate our approach as a research prototype in the context of SAP HANA. SFC-DBC outperforms other dictionary-based compression schemes by up to 61% in terms of space and up to 9.4x in terms of query performance.
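The core idea, as described in the abstract, can be illustrated with a short sketch: build a per-dimension dictionary of distinct coordinate values, replace each coordinate with its dictionary code, and interleave the code bits into a Morton key over the compressed dictionary (code) space. This is a minimal illustration of the concept, not the paper's implementation; the function names and the assumption of at most 2^21 distinct values per dimension are ours.

```python
import numpy as np

def spread21(v) -> int:
    """Insert two zero bits between each of the low 21 bits of v."""
    v, out = int(v), 0
    for i in range(21):
        out |= ((v >> i) & 1) << (3 * i)
    return out

def sfc_dbc_encode(points: np.ndarray):
    """points: (N, 3) float array of x/y/z coordinates.

    Returns per-dimension dictionaries, per-point dictionary codes,
    and a Morton key computed over the dictionary (code) space.
    Assumes at most 2**21 distinct values per dimension.
    """
    dicts, codes = [], []
    for d in range(3):
        vals, code = np.unique(points[:, d], return_inverse=True)
        dicts.append(vals)
        codes.append(code)
    keys = np.array([
        spread21(cx) | (spread21(cy) << 1) | (spread21(cz) << 2)
        for cx, cy, cz in zip(*codes)
    ], dtype=np.uint64)
    return dicts, np.stack(codes, axis=1), keys
```

Sorting the column by these keys clusters spatially nearby points, so a range query can be answered by scanning a few key intervals, while the dictionaries keep the stored codes small.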


... Although the addition of nodes/cores generally results in improved performance, in the PCDM literature this is observed only in shared-nothing architecture-oriented PCDM work. More specifically, the current shared-memory and shared-disk architecture-oriented PCDM systems do not demonstrate the performance improvements that PCDM systems can yield when more cores/processor-memory nodes are added to an existing PCDM system [19,20,24,31,[46][47][48][49]. A potential reason could be the infeasibility of adding more cores/processor-memory nodes to existing systems once the systems are configured. ...
... When analyzing the state-of-the-art PCDM research work, most of the database-oriented PCDM research efforts are shown to be based on shared-memory architecture-oriented systems [19,20,24,31,46,47,49,52,53]. Critically, current research work in the PCDM literature does not provide straightforward reasons for the adoption of shared-memory architectures. ...
... In addition to the particular case of EarthServer, modern NewSQL databases belong to a class of relational database systems [62] that provide the high scalability of NoSQL systems together with the strong consistency and usability of relational databases [39]. However, the system of Pavlovic et al. [49] appears to be the only PCDM system built atop a NewSQL DBMS. In [49], the authors employ the SAP HANA database, an in-memory column-oriented RDBMS, for PCDM. ...
Article
Full-text available
Current state-of-the-art point cloud data management (PCDM) systems rely on a variety of parallel architectures and diverse data models. The main objective of these implementations is achieving higher scalability without compromising performance. This paper reviews the scalability and performance of state-of-the-art PCDM systems with respect to both parallel architectures and data models. More specifically, in terms of parallel architectures, shared-memory architecture, shared-disk architecture, and shared-nothing architecture are considered. In terms of data models, relational models and novel data models (such as wide-column models) are considered, as are new structured query language (NewSQL) models. The impacts of parallel architectures and data models are discussed with respect to theoretical perspectives and in the context of existing PCDM implementations. Based on the review, a methodical approach for the selection of parallel architectures and data models for highly scalable and performance-efficient PCDM system development is proposed. Finally, notable research gaps in the PCDM literature are presented as possible directions for future research.
... As the Morton order is highly discontinuous, an optimized variant of the Morton order is finally proposed on this basis [3]. Pavlovic et al. introduced space-filling-curve dictionary-based compression, which employs dictionary-based compression in the spatial data management domain and enhances it with indexing capabilities by using space-filling curves [4]. Javier et al. used a fast variant of Gaussian mixture models and an expectation-maximization algorithm to replace the points grouped in the previous step with a set of Gaussian distributions. ...
Article
Data visualization of static images continues to grow and change dynamically. In visualization applications, memory, time, and bandwidth are crucial concerns when handling high-resolution three-dimensional (3D) Light Detection and Ranging (LiDAR) data, which increasingly demands efficient data compression strategies. This need strongly motivates the development of an efficient 3D point cloud image compression methodology. This work introduces an innovative lossless compression algorithm for 3D point cloud images based on the higher-order singular value decomposition (HOSVD) technique. The algorithm starts with a preprocessing step that removes unreliable 3D points, and then combines HOSVD with normalization and predictive coding, followed by run-length encoding, to compress the HOSVD coefficients. This work achieves low mean square error (MSE) and infinite peak signal-to-noise ratio (PSNR), producing a lossless decompressed 3D point cloud image, while reducing storage to one fourth of the original 3D LiDAR point cloud image size.
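A minimal sketch of the decomposition step underlying such a pipeline: HOSVD computes one orthonormal factor matrix per tensor mode via an SVD of the mode unfolding, then projects the tensor onto those bases to obtain the core coefficients. The quantization, predictive-coding, and run-length stages are omitted here, and this is not the paper's implementation, only the standard HOSVD construction.

```python
import numpy as np

def unfold(t: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: rows indexed by `mode`, columns by the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_dot(t: np.ndarray, m: np.ndarray, mode: int) -> np.ndarray:
    """Multiply tensor t by matrix m along the given mode."""
    tm = np.moveaxis(t, mode, 0)
    out = (m @ tm.reshape(tm.shape[0], -1)).reshape((m.shape[0],) + tm.shape[1:])
    return np.moveaxis(out, 0, mode)

def hosvd(t: np.ndarray):
    """Return factor matrices and core tensor: t = core x1 U0 x2 U1 x3 U2."""
    factors = [np.linalg.svd(unfold(t, k), full_matrices=False)[0]
               for k in range(t.ndim)]
    core = t
    for k, u in enumerate(factors):
        core = mode_dot(core, u.T, k)   # project onto each mode basis
    return factors, core

# Round-trip check on a small random "point cloud image" tensor.
t = np.random.rand(8, 8, 3)
factors, core = hosvd(t)
recon = core
for k, u in enumerate(factors):
    recon = mode_dot(recon, u, k)
assert np.allclose(recon, t)            # lossless up to floating-point error
```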
... Pajić et al. [9] have extended the use of Apache Spark DataFrames for determining k-nearest neighbors. Pavlovic et al. [10] have explored the use of an in-memory database, namely SAP HANA, for large-scale point cloud management, supported by indexing based on space-filling-curve dictionary-based compression. ...
... In general, a tree structure or a sequential ordering method is used for compression of point clouds. In [18], Pavlovic et al. proposed a compression method for point cloud data using space-filling curves (SFCs). The method is based on a static space and the distribution of coordinates of the input point cloud data. ...
Article
Full-text available
Point clouds have become a primitive and fundamental material for manifold spatial representations. They can precisely render real-world environments as high-density points that include three-dimensional (3D) coordinates (x, y, and z) and other features (color, intensity, and so on). Accordingly, various applications, including robot navigation and self-driving, make use of point clouds not only to detect nearby objects but to comprehend the overall geospatial surroundings. However, it is challenging to exploit point clouds for spatial query processing in traditional database systems because of their enormous volume and unstructured formats. In this paper, we propose an efficient method for the manipulation of 3D point clouds based on a Discrete Global Grid System (DGGS). As a DGGS represents the Earth as hierarchical sequences of equal-area/volume tessellations, it provides an accurate partitioning to integrate and analyze big geospatial data, unlike a base64 geohash representation. This study extends our previous DGGS-based encoding/decoding work to process 3D range queries with more than 64 bits for precise 3D coordinates of point clouds. In particular, we apply the PH-tree as a multi-resolution tessellation storage and indexing structure for 3D bounding box queries. The experimental results show that our query processing significantly outperforms a baseline with a linear quadtree. Also, we present the encoding/decoding efficiency of converting large Morton codes from geographic coordinates using a combination of bit interleaving and lookup tables.
... However, its encoding/decoding performance is lower than that of the constant-type code. In [17], Pavlovic et al. proposed a compression method for point cloud data using SFCs. The method is based on a static space and the distribution of coordinates of the input point cloud data. ...
Conference Paper
Full-text available
With the development of mobile surveying and mapping technologies, point cloud data has been emerging in a variety of applications including robot navigation, self-driving drones/vehicles, and three-dimensional (3D) urban space modeling. In addition, there is an increasing demand for database management systems to share and reuse point cloud data, rather than treating it as archive files as in traditional uses and applications. However, database scalability needs to be explored to process and manage a massive volume of point cloud data defined by a 3D (X, Y, and Z) coordinate system. The typical approach to handling big data and distributing it across multiple nodes is data partitioning. Geohashing is a popular way to convert a latitude/longitude spatial point into a code/string and has been used for storing data into buckets of a grid. Many methods of handling big geospatial data, especially in NoSQL databases, are based on geohashing techniques. In this paper, we propose an efficient method to encode/decode 3D point clouds in a Discrete Global Grid System (DGGS) that represents the Earth as hierarchical sequences of equal-area/volume tessellations, similar to geohash. The current base36 geohash has difficulty working with high-resolution 3D point clouds for data storage, filtering, integration, and analytics because of its limitations on cell size and unequal areas. We employ DGGS-based Morton codes with more than 64 bits for precise 3D coordinates of point clouds and compare the encoding/decoding performance between two implementations: using strings, and using a combination of bit interleaving and lookup tables.
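The bit-interleaving-plus-lookup-table technique compared here against string encoding can be sketched as follows: precompute, for every byte value, its bits spread three positions apart, then assemble a 3D Morton code byte by byte. The table size and the 32-bit-per-axis default are illustrative assumptions; Python integers grow as needed, so codes beyond 64 bits work out of the box.

```python
# Precompute: each byte's 8 bits spread to positions 0, 3, 6, ..., 21.
LUT = []
for b in range(256):
    v = 0
    for i in range(8):
        v |= ((b >> i) & 1) << (3 * i)
    LUT.append(v)

def morton3(x: int, y: int, z: int, nbytes: int = 4) -> int:
    """Interleave nbytes*8 bits per axis into one Morton code."""
    code = 0
    for i in range(nbytes):
        byte_code = (LUT[(x >> (8 * i)) & 0xFF]
                     | (LUT[(y >> (8 * i)) & 0xFF] << 1)
                     | (LUT[(z >> (8 * i)) & 0xFF] << 2))
        code |= byte_code << (24 * i)   # each byte triple fills 24 code bits
    return code

def demorton3(code: int, nbytes: int = 4):
    """Inverse: gather every third bit back into each axis."""
    x = y = z = 0
    for i in range(8 * nbytes):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z

assert demorton3(morton3(123456, 789012, 345678)) == (123456, 789012, 345678)
```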
Article
Nowadays, massive amounts of point cloud data can be collected thanks to advances in data acquisition and processing technologies such as dense image matching and airborne LiDAR scanning. With the increase in volume and precision, point cloud data offers a useful source of information for natural-resource management, urban planning, self-driving cars, and more. At the same time, on the scale that point cloud data is produced, management challenges are introduced: it is important to achieve efficiency both in terms of querying performance and space requirements. Traditional file-based solutions to point cloud management offer space efficiency, however, they cannot scale to such massive data and provide the declarative power of a DBMS. In this article, we propose a time- and space-efficient solution to storing and managing point cloud data in main memory column-store DBMS. Our solution, Space-Filling Curve Dictionary-Based Compression (SFC-DBC), employs dictionary-based compression in the spatial data management domain and enhances it with indexing capabilities by using space-filling curves. SFC-DBC does so by constructing the space-filling curve over a compressed, artificially introduced dictionary space. Consequently, SFC-DBC significantly optimizes query execution and yet does not require additional storage resources, compared to traditional dictionary-based compression. With respect to space-filling-curve-based approaches, it minimizes storage footprint and increases resilience to skew. As a proof of concept, we develop and evaluate our approach as a research prototype in the context of SAP HANA. SFC-DBC outperforms other dictionary-based compression schemes by up to 61% in terms of space and up to 9.4× in terms of query performance.
Conference Paper
Full-text available
3D digital city models, important for urban planning, are currently constructed from massive point clouds obtained through airborne LiDAR (Light Detection and Ranging). They are semantically enriched with information obtained from auxiliary GIS data, such as cadastral data, which contains information about property boundaries, road networks, rivers, lakes, etc. Technical advances in LiDAR data acquisition systems have made possible the rapid acquisition of high-resolution topographical information for an entire country. Such data sets are now reaching the trillion-point barrier. To cope with this data deluge and provide up-to-date 3D digital city models on demand, current geospatial management strategies should be rethought. This work presents a column-oriented Spatial Database Management System which provides in-situ data access, effective data skipping, efficient spatial operations, and interactive data visualization. Its efficiency and scalability are demonstrated using a dense LiDAR scan of the Netherlands consisting of 640 billion points and the latest cadastral information, and compared with PostGIS.
Article
Full-text available
The popularity, availability, and sizes of point cloud data sets are increasing, raising interesting data management and processing challenges. Various software solutions are available for the management of point cloud data. A benchmark for point cloud data management systems was defined and executed for several solutions. In this paper we focus on the solutions based on the column-store MonetDB; the generic out-of-the-box approach is compared with two alternative approaches that exploit the spatial coherence of the data to improve data access and minimize storage requirements.
Article
Full-text available
Point cloud data are important sources of 3D geo-information. An inventory of point cloud data management user requirements has been compiled using structured interviews with users from different backgrounds: government, industry, and academia. Based on these requirements, a benchmark has been developed to compare various point cloud data management solutions with regard to functionality and performance. The main test dataset is the second national height map of the Netherlands, AHN2, with 6 to 10 samples for every square meter of the country, resulting in 640 billion points. At the database level, a data storage model based on grouping the points in blocks is available in Oracle and PostgreSQL. This model is compared with the ‘flat table’ model, where each point is stored in a table row, in Oracle, PostgreSQL, and the column-store MonetDB. In addition, the commonly used file-based solution Rapidlasso LAStools is used for comparison with the database solutions. The results of executing the benchmark on different platforms are presented as obtained during the increasingly challenging stages with more functionality and more data: mini (20 million points), medium (20 billion points), and full benchmark (the complete AHN2).
Article
Full-text available
This paper is devoted to the applications of Peano space-filling curves to modelling graphics information. These curves, based on bit interlacing, allow easy modelling of images with quadtrees, solids with octrees and ray-casting techniques, colour coding for paint systems, terrains, and some kinds of animation in film bases. Finally, Peano keys and space-filling curves appear to be an efficient tool for logical modelling of computer-graphics objects.
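The bit-interlacing behind Peano keys is compact enough to show directly: interleaving the bits of x and y yields a key whose prefixes are exactly quadtree cells, which is what makes the same code usable for quadtrees, octrees (with a third axis), and image coding. A small illustrative sketch (function names and the 16-bit default are ours):

```python
def peano_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave bits of (x, y): x in even positions, y in odd positions."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def quadtree_cell(key: int, level: int, bits: int = 16) -> int:
    """The quadtree cell at `level` containing a point is just the key's
    top 2*level bits - prefix sharing is the quadtree/Peano link."""
    return key >> (2 * (bits - level))

a, b = peano_key(1200, 3400), peano_key(1201, 3400)   # neighboring pixels
assert quadtree_cell(a, 6) == quadtree_cell(b, 6)     # same coarse quadtree cell
```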
Article
Full-text available
Requirements of enterprise applications have become much more demanding because they execute complex reports on transactional data while thousands of users may read or update records of the same data. The goal of the SAP HANA database is the integration of transactional and analytical workloads within the same database management system. To achieve this, a columnar engine exploits modern hardware (multiple CPU cores, large main memory, and caches), compression of database content, maximum parallelization in the database kernel, and database extensions required by enterprise applications, e.g., specialized data structures for hierarchies or support for domain specific languages. In this paper we highlight the architectural concepts employed in the SAP HANA database. We also report on insights gathered with the SAP HANA database in real-world enterprise application scenarios.
Conference Paper
Full-text available
The SQL Server 11 release (code named "Denali") introduces a new data warehouse query acceleration feature based on a new index type called a column store index. The new index type combined with new query operators processing batches of rows greatly improves data warehouse query performance: in some cases by hundreds of times and routinely a tenfold speedup for a broad range of decision support queries. Column store indexes are fully integrated with the rest of the system, including query processing and optimization. This paper gives an overview of the design and implementation of column store indexes including enhancements to query processing and query optimization to take full advantage of the new indexes. The resulting performance improvements are illustrated by a number of example queries.
Conference Paper
Full-text available
In order to handle spatial data efficiently, as required in computer-aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations. However, traditional indexing methods are not well suited to data objects of non-zero size located in multi-dimensional spaces. In this paper we describe a dynamic index structure called an R-tree which meets this need, and give algorithms for searching and updating it. We present the results of a series of tests which indicate that the structure performs well, and conclude that it is useful for current database systems in spatial applications.
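A sketch of the search procedure over a (hypothetical, pre-built) R-tree illustrates the idea: every entry stores a minimum bounding rectangle (MBR), and search descends only into subtrees whose MBR intersects the query window. The node layout and names here are illustrative, not Guttman's original structures.

```python
from dataclasses import dataclass

Rect = tuple  # (xmin, ymin, xmax, ymax)

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Node:
    mbr: Rect
    children: list      # child Nodes, or data rectangles at a leaf
    leaf: bool = False

def search(node: Node, query: Rect, out: list) -> None:
    """Collect all data rectangles intersecting `query`."""
    for child in node.children:
        if node.leaf:
            if intersects(child, query):
                out.append(child)
        elif intersects(child.mbr, query):
            search(child, query, out)   # descend only into overlapping subtrees

leaf1 = Node((0, 0, 2, 2), [(0, 0, 1, 1), (1, 1, 2, 2)], leaf=True)
leaf2 = Node((5, 5, 8, 8), [(5, 5, 6, 6), (7, 7, 8, 8)], leaf=True)
root = Node((0, 0, 8, 8), [leaf1, leaf2])
hits = []
search(root, (0.5, 0.5, 1.5, 1.5), hits)
print(hits)   # [(0, 0, 1, 1), (1, 1, 2, 2)]
```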
Article
Full-text available
The availability of huge system memory, even on standard servers, has generated a lot of interest in main-memory database engines. In data warehouse systems, highly compressed column-oriented data structures are quite prominent. In order to scale with the data volume and the system load, many of these systems are highly distributed with a shared-nothing approach. The fundamental principle of all these systems is a full table scan over one or multiple compressed columns. Recent research has proposed different techniques to speed up table scans, such as intelligent compression or the use of additional hardware such as graphics cards or FPGAs. In this paper, we show that utilizing the embedded Vector Processing Units (VPUs) found in standard superscalar processors can speed up the performance of main-memory full table scans by factors. This is achieved without changing the hardware architecture and thereby without additional power consumption. Moreover, as on-chip VPUs directly access the system's RAM, no additional costly copy operations are needed to use the new SIMD-scan approach in standard main-memory database engines. Therefore, we propose this scan approach as the standard scan operator for compressed column-oriented main-memory storage. We then discuss how well our solution scales with the number of processor cores and, consequently, to what degree it can be applied in multi-threaded environments. To verify the feasibility of our approach, we implemented the proposed techniques on a modern Intel multicore processor using Intel® Streaming SIMD Extensions (Intel® SSE). In addition, we integrated the new SIMD-scan approach into SAP® NetWeaver® Business Warehouse Accelerator. We conclude by describing the performance benefits of using our approach for processing and scanning compressed data using VPUs in column-oriented main-memory database systems.
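The essence of a predicate scan over a compressed, dictionary-encoded column can be sketched with NumPy standing in for the vector unit: translate the value predicate into a code range once against the sorted dictionary, then evaluate the whole column with data-parallel comparisons. This illustrates the scan pattern only, not the paper's SSE implementation; the data and names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.choice([10.5, 11.0, 12.25, 13.0, 97.5], size=1_000_000)

# Dictionary encoding: sorted distinct values + small integer codes.
dictionary, codes = np.unique(raw, return_inverse=True)
codes = codes.astype(np.uint32)

# Predicate: value BETWEEN lo AND hi, rewritten once as a code range.
lo, hi = 11.0, 13.0
lo_code = np.searchsorted(dictionary, lo, side="left")
hi_code = np.searchsorted(dictionary, hi, side="right") - 1

# Full-column scan with SIMD-style, data-parallel comparisons on the codes.
hits = np.flatnonzero((codes >= lo_code) & (codes <= hi_code))
assert np.all((raw[hits] >= lo) & (raw[hits] <= hi))
```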
Article
Full-text available
Several schemes for the linear mapping of a multidimensional space have been proposed for various applications, such as access methods for spatio-temporal databases and image compression. In these applications, one of the most desired properties from such linear mappings is clustering, which means the locality between objects in the multidimensional space being preserved in the linear space. It is widely believed that the Hilbert space-filling curve achieves the best clustering (Abel and Mark, 1990; Jagadish, 1990). We analyze the clustering property of the Hilbert space-filling curve by deriving closed-form formulas for the number of clusters in a given query region of an arbitrary shape (e.g., polygons and polyhedra). Both the asymptotic solution for the general case and the exact solution for a special case generalize previous work. They agree with the empirical results that the number of clusters depends on the hypersurface area of the query region and not on its hypervolume. We also show that the Hilbert curve achieves better clustering than the z curve. From a practical point of view, the formulas given provide a simple measure that can be used to predict the required disk access behaviors and, hence, the total access time
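The clustering measure analyzed here is easy to reproduce empirically: map every cell of a query region to its curve index and count maximal runs of consecutive indices; each run is one cluster (one sequential read). The Hilbert conversion below is the well-known iterative xy-to-index algorithm; the grid size and query rectangle are arbitrary test values.

```python
def z_index(x, y, bits):
    d = 0
    for i in range(bits):
        d |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return d

def hilbert_index(n, x, y):
    """Map (x, y) in an n-by-n grid (n a power of two) to its Hilbert index."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def clusters(indices):
    """Number of maximal runs of consecutive curve indices."""
    idx = sorted(indices)
    return sum(1 for i, v in enumerate(idx) if i == 0 or v != idx[i - 1] + 1)

n, bits = 64, 6
query = [(x, y) for x in range(10, 30) for y in range(17, 40)]
print("z-curve clusters:      ", clusters(z_index(x, y, bits) for x, y in query))
print("hilbert-curve clusters:", clusters(hilbert_index(n, x, y) for x, y in query))
# The Hilbert curve typically yields fewer clusters, matching the analysis.
```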
Conference Paper
LIDAR is a popular remote sensing method used to examine the surface of the Earth. LIDAR instruments use light in the form of a pulsed laser to measure ranges (variable distances) and generate vast amounts of precise three dimensional point data describing the shape of the Earth. Processing large collections of point cloud data and combining them with auxiliary GIS data remain an open research problem. Past research in the area of geographic information systems focused on handling large collections of complex geometric objects stored on disk and most algorithms have been designed and studied in a single-thread setting even though multi-core systems are well established. In this paper, we describe parallel alternatives of known algorithms for evaluating spatial selections over point clouds and spatial joins between point clouds and rectangle collections.
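A minimal sketch of the parallel spatial-selection pattern studied here: partition the point set into chunks and evaluate the window predicate on each chunk in its own worker. The chunk count and data are illustrative, and a real system would avoid re-sending the data to workers; this only shows the decomposition.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def select_chunk(args):
    pts, (x0, y0, x1, y1) = args
    m = (pts[:, 0] >= x0) & (pts[:, 0] <= x1) & \
        (pts[:, 1] >= y0) & (pts[:, 1] <= y1)
    return pts[m]                       # points inside the query window

if __name__ == "__main__":
    pts = np.random.rand(2_000_000, 2)  # toy point cloud
    window = (0.25, 0.25, 0.5, 0.5)
    chunks = np.array_split(pts, 8)     # one partition per worker
    with ProcessPoolExecutor(max_workers=8) as ex:
        parts = ex.map(select_chunk, [(c, window) for c in chunks])
    result = np.vstack(list(parts))
    print(len(result), "points selected")
```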
Article
Earth observation sciences, astronomy, and seismology have large data sets with inherently rich spatial and geospatial information. In combination with large collections of semantically rich objects that have a large number of thematic properties, they form a new source of knowledge for urban planning, smart cities, and natural resource management. Modeling and storing these properties, and the relationships between them, is best handled in a relational database. Furthermore, the scalability requirements posed by the latest 26-attribute light detection and ranging (LIDAR) data sets are a challenge for file-based solutions. In this demo we show how to query a 640 billion point data set using a column store enriched with GIS functionality. Through a lightweight and cache-conscious secondary index called Imprints, spatial query performance on flat-table storage is comparable to traditional file-based solutions. All results are visualised in real time using QGIS.
Many of the programming techniques used in solving two-dimensional problems can be extended to three dimensions. Here oct-trees are developed as a three-dimensional analog of quad-trees. Oct-trees can be used in geometric modeling and space planning. A fast algorithm is given for 90° rotation of oct-tree representations of objects. A space-efficient algorithm is given for translation in space. A PASCAL program for experimenting with oct-trees is described.
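The 90° rotation described here reduces, for a pointer octree, to recursively permuting the eight children according to how the rotation maps octants. A sketch under an assumed representation (a node is either a leaf value or a list of eight children, with child index bits (z<<2)|(y<<1)|x); this is our illustration, not the paper's algorithm:

```python
def rotate_z90(node):
    """Rotate an octree 90 degrees counterclockwise about the z axis.

    In octant coordinates (x, y, z in {0, 1}), the rotation maps
    (x, y) -> (1 - y, x) and leaves z unchanged.
    """
    if not isinstance(node, list):      # leaf: nothing to permute
        return node
    out = [None] * 8
    for idx, child in enumerate(node):
        x, y, z = idx & 1, (idx >> 1) & 1, (idx >> 2) & 1
        nx, ny = 1 - y, x
        out[(z << 2) | (ny << 1) | nx] = rotate_z90(child)
    return out

# Octree with one subdivided level: octant 0 is itself subdivided.
tree = [[1, 2, 3, 4, 5, 6, 7, 8], 0, 0, 0, 0, 0, 0, 0]
print(rotate_z90(tree))
```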
Article
There is often a need to map a multi-dimensional space onto a one-dimensional space. For example, this kind of mapping has been proposed to permit the application of one-dimensional indexing techniques to a multi-dimensional index space, such as in a spatial database. This kind of mapping is also valuable in assigning physical storage, such as assigning buckets to records that have been indexed on multiple attributes, to minimize the disk access effort. In this paper, we discuss the desired properties of such a mapping and evaluate, through analysis and simulation, several mappings that have been proposed in the past. We present a mapping based on Hilbert's space-filling curve, which outperforms previously proposed mappings on average over a variety of different operating conditions.
Article
In this note we determine two functions x and y, single-valued and continuous in a (real) variable t, which, as t ranges over the interval (0, 1), take on every pair of values such that 0≤x≤1, 0≤y≤1. If, following usage, we call a continuous curve the locus of points whose coordinates are continuous functions of a variable, we thus obtain an arc of a curve that passes through every point of a square. Hence, given an arc of a continuous curve, without further hypotheses it is not always possible to enclose it in an arbitrarily small area.
Article
A tutorial survey is presented of the quadtree and related hierarchical data structures. They are based on the principle of recursive decomposition. The emphasis is on the representation of data used in applications in image processing, computer graphics, geographic information systems, and robotics. There is a greater emphasis on region data (i.e., two-dimensional shapes) and, to a lesser extent, on point, curvilinear, and three-dimensional data. A number of operations in which such data structures find use are examined in greater detail.
Article
In multi-dimensional databases the essential tool for accessing data is the range query (or window query). In this paper we introduce a new algorithm for processing range queries in the universal B-tree (UB-tree), an index structure for searching multi-dimensional databases. The new range query algorithm (called the DRU algorithm) works efficiently even for high-dimensional databases. In particular, with the DRU algorithm many of the UB-tree inner nodes need not be accessed. We explain the DRU algorithm using a simple geometric model, providing clear insight into the problem. More specifically, the model exploits an interesting relation between the Z-curve and generalized quadtrees. We also present experimental results for the DRU algorithm implementation.
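The relation between the Z-curve and generalized quadtrees that this approach exploits can be sketched directly: every quadtree cell corresponds to one contiguous Z-address interval, so a range query decomposes recursively into the intervals of the cells it fully covers, and only those intervals (and the pages they map to) need to be accessed. A simplified 2D sketch of that decomposition, not the DRU algorithm itself:

```python
def z_ranges(q, level, x0=0, y0=0, z0=0):
    """Z-address intervals covering query rect q = (qx0, qy0, qx1, qy1),
    inclusive, inside the cell of side 2**level anchored at (x0, y0)."""
    qx0, qy0, qx1, qy1 = q
    size = 1 << level
    x1, y1 = x0 + size - 1, y0 + size - 1
    if qx0 > x1 or qx1 < x0 or qy0 > y1 or qy1 < y0:
        return []                                # disjoint: prune subtree
    if qx0 <= x0 and x1 <= qx1 and qy0 <= y0 and y1 <= qy1:
        return [(z0, z0 + size * size - 1)]      # fully covered: one interval
    half, out = size // 2, []
    for i, (dx, dy) in enumerate([(0, 0), (1, 0), (0, 1), (1, 1)]):  # Z order
        out += z_ranges(q, level - 1, x0 + half * dx, y0 + half * dy,
                        z0 + i * half * half)
    merged = []                                  # merge Z-adjacent intervals
    for lo, hi in out:
        if merged and merged[-1][1] + 1 == lo:
            merged[-1] = (merged[-1][0], hi)
        else:
            merged.append((lo, hi))
    return merged

print(z_ranges((2, 3, 6, 5), level=3))  # intervals on an 8x8 grid
```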
Article
This tutorial paper gives an introduction and overview of various topics related to airborne laser scanning (ALS) as used to measure range to and reflectance of objects on the earth surface. After a short introduction, the basic principles of laser, the two main classes, i.e., pulse and continuous-wave lasers, and relations with respect to time-of-flight, range, resolution, and precision are presented. The main laser components and the role of the laser wavelength, including eye safety considerations, are explained. Different scanning mechanisms and the integration of laser with GPS and INS for position and orientation determination are presented. The data processing chain for producing digital terrain and surface models is outlined. Finally, a short overview of applications is given.
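The core pulse-laser relations from such a tutorial fit in a few lines: range follows from the two-way travel time, and range resolution from the pulse width, both divided by two because the light travels out and back. The numeric inputs below are illustrative.

```python
C = 299_792_458.0                      # speed of light, m/s

def tof_range(t_flight_s: float) -> float:
    """Range from two-way time of flight: R = c * t / 2."""
    return C * t_flight_s / 2

def range_resolution(pulse_width_s: float) -> float:
    """Smallest separable range difference: dR = c * t_pulse / 2."""
    return C * pulse_width_s / 2

print(tof_range(6.67e-6))              # ~1000 m flying height
print(range_resolution(10e-9))         # 10 ns pulse -> ~1.5 m
```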
Conference Paper
By interleaving the bits of the binary representations of the attribute values in a tuple, an integer corresponding to the tuple is created. A set of these integers represents a relation. The usual ordering of these integers corresponds to an ordering of multidimensional data that allows the use of conventional file organizations, such as B-trees, in the efficient processing of multidimensional queries (e.g., range queries). The class of data structures generated by this scheme includes a type of k-d tree whose balance can be efficiently maintained, a multidimensional B-tree which is simpler than previously proposed generalizations, and some previously reported data structures for range searching. All of the data structures in this class also support the efficient implementation of the set operations.
Conference Paper
A number of emerging applications of data management technology involve the monitoring and querying of large quantities of continuous variables, e.g., the positions of mobile service users, termed moving objects. In such applications, large quantities of state samples obtained via sensors are streamed to a database. Indexes for moving objects must support queries efficiently, but must also support frequent updates. Indexes based on minimum bounding regions (MBRs) such as the R-tree exhibit high concurrency overheads during node splitting, and each individual update is known to be quite costly. This motivates the design of a solution that enables the B+-tree to manage moving objects. We represent moving-object locations as vectors that are timestamped based on their update time. By applying a novel linearization technique to these values, it is possible to index the resulting values using a single B+-tree that partitions values according to their timestamp and otherwise preserves spatial proximity. We develop algorithms for range and nearest neighbor queries, as well as continuous queries. The proposal can be grafted into existing database systems cost effectively. An extensive experimental study explores the performance characteristics of the proposal and also shows that it is capable of substantially outperforming the R-tree based TPR-tree for both single and concurrent access scenarios.
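The linearization idea can be sketched as follows: each moving object's key concatenates a time-partition number (derived from its update timestamp) with a space-filling-curve value of its position, so one B+-tree keeps updates of the same phase together while preserving spatial proximity. The partitioning scheme and parameters below are a simplified stand-in for the paper's actual one.

```python
def interleave2(x: int, y: int, bits: int = 20) -> int:
    k = 0
    for i in range(bits):
        k |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return k

def linearized_key(x: int, y: int, t_update: float,
                   phase_len: float = 60.0, num_phases: int = 2,
                   bits: int = 20) -> int:
    """Timestamp partition in the high bits, curve value in the low bits."""
    partition = int(t_update // phase_len) % num_phases
    return (partition << (2 * bits)) | interleave2(x, y, bits)

# Objects updated in the same phase sort near each other when nearby in space.
print(linearized_key(1000, 2000, t_update=30.0))
print(linearized_key(1001, 2000, t_update=45.0))
```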
Conference Paper
Today almost all database systems use B-trees as their main access method. One of the main drawbacks of the classical B-tree is, however, that it works well only for one-dimensional data. In this paper we present a new access structure, called the UB-tree (universal B-tree), for multidimensional data. The UB-tree is balanced and has all the guaranteed performance characteristics of B-trees, i.e., it requires linear space for storage and logarithmic time for the basic operations INSERT, FIND, and DELETE. In addition, the UB-tree has the fundamental property that it preserves clustering of objects with respect to Cartesian distance. Therefore, the UB-tree shows its main strengths for multidimensional data. It has very high potential for parallel processing. With the new method, a single UB-tree can replace an arbitrary number of secondary indexes. For updates this means that only one UB-tree must be managed instead of several secondary indexes. This reduces runtime and storage requirements substantially. For queries, and in particular range queries, the UB-tree has multiplicative complexity instead of the additive complexity of multiple secondary indexes. This results in dramatic performance improvements over secondary indexes. The UB-tree is obviously useful for geometric databases, data warehousing, and data mining applications, but even more for databases in general, where multiple secondary indexes are widespread and can all be replaced by a single UB-tree index.
Conference Paper
This paper presents a point-based rendering approach to visualize massive sets of 3D points in real time. In many disciplines, such as architecture, engineering, and archeology, LiDAR technology is used to capture sites and landscapes; the resulting massive 3D point clouds pose challenges for traditional storage, processing, and presentation techniques. The available hardware resources of CPU and GPU are limited, and the 3D point cloud data generally exceeds available memory size. Hence out-of-core strategies are required to overcome the limits of memory. We discuss concepts and implementations of rendering algorithms and interaction techniques that make out-of-core real-time visualization and exploration of massive 3D point clouds feasible. We demonstrate with our implementation real-time visualization of arbitrarily sized 3D point clouds on current PC hardware, using a spatial data structure in combination with a point-based rendering algorithm. A rendering front is used to increase performance, taking into account user interaction as well as available hardware resources. Furthermore, we evaluate our approach, describe its characteristics, and report on applications.
Article
A mapping from multidimensional data (multi-key records) to one dimension is described. It consists of bitwise interlacing the keys. This mapping allows using dynamically balanced binary search trees for efficient multidimensional range searching. A range search algorithm for bitwise interlaced keys is presented. Experimental results show that for small hypercube ranges the average number of records to be inspected is logarithmic with the number of records. Storage requirements are only two pointers per record to establish the search tree.
Article
Given a query Q, a one-dimensional index structure I (e.g., B-tree), and a set of D-dimensional points, a space-filling curve S is used to map the D-dimensional points into a set of one-dimensional points that can be indexed through I for efficient execution of query Q. The main idea is that space-filling curves are used as a way of mapping the multi-dimensional space into the one-dimensional space such that existing one-dimensional query processing and indexing techniques can be applied.
Article
It is suggested that Gray codes be used to improve the performance of methods for partial match and range queries. Specifically, the author illustrates the improved clustering of similar records that Gray codes can achieve with multiattribute hashing. Gray codes are used instead of binary codes to map record signatures to buckets. In Gray codes, successive codewords differ in the value of exactly one bit position; thus, successive buckets hold records with similar record signatures. The proposed method achieves better clustering of similar records, thus reducing the I/O time. A mathematical model is developed to derive formulas giving the average performance of both methods, and it is shown that the proposed method achieves 0-50% relative savings over binary codes. The author also discusses how Gray codes could be applied to some retrieval methods designed for range queries, such as the grid file and the approach based on the so-called z-ordering. Gray codes are also used to design good distance-preserving functions, which map a k-dimensional (k-D) space into a one-dimensional one, in such a way that points that are close in the k-D space are likely to be close in the 1-D space.
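The binary-to-Gray mapping at the heart of this proposal is a one-liner, and its defining property (successive codewords differ in exactly one bit) is easy to verify:

```python
def to_gray(b: int) -> int:
    return b ^ (b >> 1)

def from_gray(g: int) -> int:
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

for b in range(16):
    assert from_gray(to_gray(b)) == b
    if b:  # successive Gray codewords differ in exactly one bit
        assert bin(to_gray(b) ^ to_gray(b - 1)).count("1") == 1
```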
Article
In this paper we propose the use of fractals, and especially the Hilbert curve, in order to design good distance-preserving mappings. Such mappings improve the performance of secondary-key and spatial access methods, where multi-dimensional points have to be stored on a 1-dimensional medium (e.g., disk). Good clustering reduces the number of disk accesses on retrieval, improving the response time. Our experiments on range queries and nearest neighbor queries showed that the proposed Hilbert curve achieves better clustering than older methods ("bit-shuffling", or the Peano curve), for every situation we tried.
Massive point cloud data management, Computers and Graphics, v.49 n.C
  • Peter Van Oosterom
  • Oscar Martinez-Rubi
  • Milena Ivanova
  • Mike Horhammer
  • Daniel Geringer
  • Siva Ravada
  • Theo Tijssen
  • Martin Kodde
  • Romulo Gonçalves
SQL server column store indexes
  • Per-Åke Larson
  • Cipri Clinciu
  • Eric N Hanson
  • Artem Oks
  • Susan L Price
  • Srikumar Rangarajan
  • Aleksandras Surna
  • Qingqing Zhou
A spatial column-store to triangulate the Netherlands on the fly
  • Romulo Goncalves
  • Tom Van Tilburg
  • Kostis Kyzirakos
  • Foteini Alvanaki
  • Panagiotis Koutsourakis
  • Ben Van Werkhoven
  • Willem Robert Van Hage
Jack A Orenstein and Tim H Merrett. 1984. A class of data structures for associative searching
  • Jack A Orenstein
  • Tim H Merrett
Christos Faloutsos and Shari Roseman. 1989. Fractals for Secondary Key Retrieval
  • Christos Faloutsos
  • Shari Roseman
Query and Update Efficient B+-Tree Based Indexing of Moving Objects
  • Christian S Jensen
  • Dan Lin
  • Beng Chin Ooi
Actueel Hoogte Bestand Nederland
Norbert Haala. 2011. Multiray photogrammetry and dense image matching
  • Norbert Haala
Robert Laurini. 1985. Graphics databases built on Peano space-filling curves
  • Robert Laurini