Conference Paper

Approximation and Analytical Studies of Inter-clustering Performances of Space-Filling Curves

January 2003

Source
DBLP

Conference: Discrete Random Walks, DRW'03, Paris, France, September 1-5, 2003

A discrete space-filling curve provides a linear traversal/indexing of a multi-dimensional grid space. This paper presents an application of random walks to the study of inter-clustering of space-filling curves and an analytical study of the inter-clustering performance of the 2-dimensional Hilbert and z-order curve families. Two underlying measures are employed: the mean inter-cluster distance over all inter-cluster gaps and the mean total inter-cluster distance over all subgrids. We show how approximating the mean inter-cluster distance statistics of continuous multi-dimensional space-filling curves fits into the formalism of random walks, and derive the exact formulas for the two statistics for both curve families. The excellent agreement between the approximate and true mean inter-cluster distance statistics suggests that the random walk may furnish an effective model for developing approximations to clustering and locality statistics for space-filling curves. Based on the analytical results, the asymptotic comparisons indicate that the z-order curve family performs better than the Hilbert curve family with respect to both statistics.

Studies of Norm-Based Locality Measures of Two-Dimensional Hilbert Curves

Article

Sep 2021

A discrete space-filling curve provides a one-dimensional indexing or traversal of a multi-dimensional grid space. Sample applications of space-filling curves include multi-dimensional indexing methods, data structures and algorithms, parallel computing, and image compression. Common measures for the applicability of space-filling curve families are locality and clustering. Locality preservation reflects proximity between grid points, that is, close-by grid points are mapped to close-by indices or vice versa. We present analytical and empirical studies on the locality properties of the two-dimensional Hilbert curve family. The underlying locality measure, based on the p-normed metric \(d_{p}\), is the maximum ratio of \(d_{p}(v, u)^{m}\) to \(d_{p}({\tilde{v}}, {\tilde{u}})\) over all corresponding point-pairs (v, u) and \(({\tilde{v}}, {\tilde{u}})\) in the m-dimensional grid space and one-dimensional index space, respectively. Our analytical results close the gaps between the current best lower and upper bounds with exact formulas for \(p \in \{1, 2\}\), and extend to all reals \(p \ge 2\). We also verify the results with computer programs over various grid-orders and p-values. Our empirical results will shed some light on determining the exact formulas for the locality measure for all reals \(p \in (1, 2)\).
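For small grids, the locality measure described above can be evaluated by exhaustive search, which is one way such results might be checked empirically. The following is a minimal sketch under stated assumptions: the function names and the particular Hilbert orientation are chosen here, not taken from the article, and the distance in the one-dimensional index space is taken to be the absolute index difference (every p-norm reduces to that in one dimension).

```python
from itertools import combinations


def hilbert_index(order, x, y):
    """Hilbert index of (x, y) on a 2^order x 2^order grid (one common orientation)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect the quadrant so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def locality_measure(order, p, m=2):
    """Maximum of d_p(v, u)^m / |index(v) - index(u)| over all point pairs."""
    n = 1 << order
    points = [(x, y) for x in range(n) for y in range(n)]
    index = {pt: hilbert_index(order, *pt) for pt in points}
    best = 0.0
    for v, u in combinations(points, 2):
        # p-normed distance in the grid space
        d_grid = (abs(v[0] - u[0]) ** p + abs(v[1] - u[1]) ** p) ** (1.0 / p)
        # distance in the index space; never zero because the curve is a bijection
        d_index = abs(index[v] - index[u])
        best = max(best, d_grid ** m / d_index)
    return best


if __name__ == "__main__":
    for p in (1, 1.5, 2, 3):
        print(p, [round(locality_measure(k, p), 4) for k in (1, 2, 3, 4)])
```

The search is quadratic in the number of grid points, so it is only practical for low grid-orders; it is meant as a cross-check, not as a substitute for the exact formulas.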

Survey of graph partitioning algorithms

Article

Jan 2020

Evdokiya Nikolayevna Golovchenko

PatchWork, a scalable density-grid clustering algorithm

Conference Paper

Apr 2016

Clustering is a fundamental task in Knowledge Discovery and Data Mining. It aims to discover the unknown nature of data by grouping together data objects that are more similar. While hundreds of clustering algorithms have been proposed, many are complex and do not scale well as more data become available, making them inadequate for analyzing very large datasets. In addition, many clustering algorithms are sequential and thus inherently difficult to parallelize. We propose PatchWork, a novel clustering algorithm that addresses those issues. PatchWork is a distributed density clustering algorithm with linear computational complexity and linear horizontal scalability. It presents several desirable characteristics for knowledge discovery: in particular, it does not require the number of clusters to be specified a priori, and it offers natural protection against outliers and noise. In addition, PatchWork makes it possible to discover spatially large clusters instead of dense clusters only. PatchWork relies on the map/reduce paradigm to parallelize computations and was implemented using Apache Spark, the distributed computation framework. As a result, PatchWork can cluster a billion points in only a few minutes, a 40x improvement over the distributed implementation of k-means in Spark MLLib.

Clustering Analyses of Two-Dimensional Space-Filling Curves

Chapter

Nov 2021

A discrete space-filling curve provides a linear traversal or indexing of a multi-dimensional grid space. This paper presents two analytical studies on clustering analyses of the 2-dimensional Hilbert and z-order curve families. The underlying measure is the mean number of clusters over all identically shaped subgrids. We derive the exact formulas for the clustering statistics for the 2-dimensional Hilbert and z-order curve families. The exact results allow us to compare their relative performances with respect to this measure: when the grid-order is sufficiently larger than the subgrid-order (the typical scenario for most applications), the Hilbert curve family performs significantly better than the z-order curve family.
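The clustering measure can be made concrete with a brute-force count over a small grid. This is a sketch, not the chapter's derivation: the function names, the z-order bit convention, and the Hilbert orientation are assumptions, and a cluster is taken to mean a maximal run of consecutive curve indices inside a square subgrid.

```python
def z_index(order, x, y):
    """Z-order index: x bits on even positions, y bits on odd ones (assumed convention)."""
    d = 0
    for b in range(order):
        d |= ((x >> b) & 1) << (2 * b)
        d |= ((y >> b) & 1) << (2 * b + 1)
    return d


def hilbert_index(order, x, y):
    """Hilbert index of (x, y) on a 2^order x 2^order grid (one common orientation)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect the quadrant so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def count_clusters(indices):
    """Number of maximal runs of consecutive integers in a set of distinct indices."""
    idx = sorted(indices)
    return 1 + sum(1 for a, b in zip(idx, idx[1:]) if b - a > 1)


def mean_clusters(index_fn, order, side):
    """Mean number of clusters over all side x side subgrids of the 2^order grid."""
    n = 1 << order
    counts = [
        count_clusters(index_fn(order, x, y)
                       for x in range(x0, x0 + side)
                       for y in range(y0, y0 + side))
        for x0 in range(n - side + 1)
        for y0 in range(n - side + 1)
    ]
    return sum(counts) / len(counts)


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, mean_clusters(hilbert_index, k, 2), mean_clusters(z_index, k, 2))
```

On the 4 x 4 grid with 2 x 2 subgrids, a hand count gives 14/9 clusters on average for the Hilbert curve and 2 for the z-order curve, which is consistent with the direction of the comparison stated in the abstract.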

Distance Metric on Multidimensional Spatial Objects

Conference Paper

Jun 2015

Clustering of Geospatial Big Data in a Distributed Environment

Chapter

Jan 2015

Norm-Based Locality Measures of Two-Dimensional Hilbert Curves

Conference Paper

Jul 2016

A discrete space-filling curve provides a 1-dimensional indexing or traversal of a multi-dimensional grid space. Applications of space-filling curves include multi-dimensional indexing methods, parallel computing, and image compression. Common goodness-measures for the applicability of space-filling curve families are locality and clustering. Locality reflects proximity preservation: close-by grid points are mapped to close-by indices or vice versa. We present an analytical study on the locality property of the 2-dimensional Hilbert curve family. The underlying locality measure, based on the p-normed metric \(d_{p}\), is the maximum ratio of \(d_{p}(u, v)^{m}\) to \(d_{p}(\tilde{u}, \tilde{v})\) over all corresponding point-pairs (u, v) and \((\tilde{u}, \tilde{v})\) in the m-dimensional grid space and 1-dimensional index space, respectively. Our analytical results identify all candidate representative grid-point pairs (realizing the locality-measure values) for all real norm-parameters in the unit interval [1, 2] and all grid-orders. Together with the known results for other norm-parameter values, we have almost complete knowledge of the locality measure of 2-dimensional Hilbert curves over the entire spectrum of possible norm-parameter values.

Clustering Performance of 3-Dimensional Hilbert Curves

Conference Paper

Jul 2014

A discrete space-filling curve provides a linear traversal or indexing of a multi-dimensional grid space. This paper presents an analytical study of the clustering performance of the 3-dimensional Hilbert curve family. The underlying measure is the mean number of clusters over all identically shaped cubic subgrids. We derive an exact formula for the statistics for the Hilbert curve family, and have verified all exact formulas (intermediate and final) involved in the derivations in the analytical study with computer programs over various grid- and subgrid-orders.

Locality of Corner Transformation for Multidimensional Spatial Access Methods

Article

Apr 2008
Electron Notes Theor Comput Sci

The geometric structural complexity of spatial objects does not render an intuitive distance metric on the data space that measures spatial proximity. However, such a metric provides a formal basis for analytical work in transformation-based multidimensional spatial access methods, including locality preservation of the underlying transformation and distance-based spatial queries. We study the Hausdorff distance metric on the space of multidimensional polytopes, and prove a tight relationship between the metric on the original space of k-dimensional hyperrectangles and the standard p-normed metric on the transform space of 2k-dimensional points under the corner transformation, which justifies the effectiveness of the transformation-based technique in preserving spatial locality.
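A small numerical check may help make the corner transformation concrete. The sketch below covers only one special case that can be verified by hand, namely integer-coordinate hyperrectangles with the max-norm as the point metric, where the Hausdorff distance coincides with the max-norm distance between the transformed 2k-dimensional points. It is not the article's general theorem, and the function names and the (lower corner, upper corner) coordinate ordering are assumptions made here.

```python
from itertools import product


def corner_transform(lo, hi):
    """Map the hyperrectangle [lo, hi] in k dimensions to a point in 2k dimensions."""
    return tuple(lo) + tuple(hi)


def chebyshev(u, v):
    """Max-norm distance between two points of equal dimension."""
    return max(abs(a - b) for a, b in zip(u, v))


def lattice_points(lo, hi):
    """All integer points of the closed box [lo, hi]."""
    return product(*(range(a, b + 1) for a, b in zip(lo, hi)))


def hausdorff_max_norm(lo_a, hi_a, lo_b, hi_b):
    """Brute-force Hausdorff distance between two integer boxes under the max-norm.

    For integer boxes the extremes are attained at lattice points, so
    enumerating lattice points is enough in this special case.
    """
    def directed(lo1, hi1, lo2, hi2):
        targets = list(lattice_points(lo2, hi2))
        return max(min(chebyshev(x, y) for y in targets)
                   for x in lattice_points(lo1, hi1))

    return max(directed(lo_a, hi_a, lo_b, hi_b),
               directed(lo_b, hi_b, lo_a, hi_a))


if __name__ == "__main__":
    a = ((0, 0), (2, 3))
    b = ((1, 2), (5, 4))
    print(hausdorff_max_norm(*a, *b))
    print(chebyshev(corner_transform(*a), corner_transform(*b)))
```

Both printed values agree for the example boxes, which illustrates the kind of metric correspondence the abstract refers to in this restricted setting.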

On the Locality Properties of Space-Filling Curves

Conference Paper

Dec 2003

A discrete space-filling curve provides a linear traversal or indexing of a multi-dimensional grid space. We present an analytical study of the locality properties of the m-dimensional k-order discrete Hilbert and z-order curve families, \(\{H^m_k | k = 1,2,...\}\) and \(\{Z^m_k | k = 1,2,...\}\), respectively, based on the locality measure \(L_\delta\) that cumulates all index-differences of point-pairs at a common 1-normed distance \(\delta\). We derive the exact formulas for \(L_\delta(H^m_k)\) and \(L_\delta(Z^m_k)\) for m = 2 and arbitrary \(\delta\) that is an integral power of 2, and m = 3 and \(\delta = 1\). The results yield a constant asymptotic ratio \(\lim_{k\rightarrow\infty}\frac{L_\delta(H^m_k)}{L_\delta(Z^m_k)} > 1\), which suggests that the z-order curve family performs better than the Hilbert curve family over the considered parameter ranges.
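The cumulative locality measure can be evaluated directly on small 2-dimensional grids. The sketch below is an illustration under assumptions, not the paper's method: the function names, the z-order bit convention, and the Hilbert orientation are chosen here, and each unordered point-pair at 1-normed distance delta is counted once.

```python
from itertools import combinations


def z_index(order, x, y):
    """Z-order index: x bits on even positions, y bits on odd ones (assumed convention)."""
    d = 0
    for b in range(order):
        d |= ((x >> b) & 1) << (2 * b)
        d |= ((y >> b) & 1) << (2 * b + 1)
    return d


def hilbert_index(order, x, y):
    """Hilbert index of (x, y) on a 2^order x 2^order grid (one common orientation)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect the quadrant so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def cumulative_index_difference(index_fn, order, delta):
    """Sum of |index(u) - index(v)| over unordered pairs with |u - v|_1 == delta."""
    n = 1 << order
    points = [(x, y) for x in range(n) for y in range(n)]
    index = {pt: index_fn(order, *pt) for pt in points}
    return sum(
        abs(index[u] - index[v])
        for u, v in combinations(points, 2)
        if abs(u[0] - v[0]) + abs(u[1] - v[1]) == delta
    )


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        h = cumulative_index_difference(hilbert_index, k, 1)
        z = cumulative_index_difference(z_index, k, 1)
        print(k, h, z, h / z)
```

For the 4 x 4 grid and delta = 1, a hand count gives 64 for the Hilbert curve and 60 for the z-order curve, so the ratio already exceeds 1 at this small order.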

On p-Norm Based Locality Measures of Space-Filling Curves

Conference Paper

Dec 2004
Lect Notes Comput Sci

A discrete space-filling curve provides a linear indexing or traversal of a multi-dimensional grid space. We present an analytical study on the locality properties of the 2-dimensional Hilbert curve family. The underlying locality measure, based on the p-normed metric \(d_{p}\), is the maximum ratio of \(d_{p}(v, u)^{m}\) to \(d_{p}(\tilde{v},\tilde{u})\) over all corresponding point-pairs (v, u) and \((\tilde{v},\tilde{u})\) in the m-dimensional grid space and (1-dimensional) index space, respectively. Our analytical results close the gaps between the current best lower and upper bounds with exact formulas for \(p \in \{1, 2\}\), and extend to all reals \(p \ge 2\).

Efficient processing of spatial joins with DOT-based indexing

Article

Apr 2010
INFORM SCIENCES

A spatial join is a query that searches for a set of object pairs satisfying a given spatial relationship from a database. It is one of the most costly queries, and thus requires an efficient processing algorithm that fully exploits the features of the underlying spatial indexes. In our earlier work, we devised a fairly effective algorithm for processing spatial joins with double transformation (DOT) indexing, which is one of several spatial indexing schemes. However, the algorithm is restricted to only the one-dimensional cases. In this paper, we extend the algorithm for the two-dimensional cases, which are general in Geographic Information Systems (GIS) applications. We first extend DOT to two-dimensional original space. Next, we propose an efficient algorithm for processing range queries using extended DOT. This algorithm employs the quarter division technique and the tri-quarter division technique devised by analyzing the regularity of the space-filling curve used in DOT. This greatly reduces the number of space transformation operations. We then propose a novel spatial join algorithm based on this range query processing algorithm. In processing a spatial join, we determine the access order of disk pages so that we can minimize the number of disk accesses. We show the superiority of the proposed method by extensive experiments using data sets of various distributions and sizes. The experimental results reveal that the proposed method improves the performance of spatial join processing up to three times in comparison with the widely-used R-tree-based spatial join method.

Analysis of the Clustering Properties of Hilbert Space-filling Curve

Article

Feb 2001

Several schemes for the linear mapping of a multidimensional space have been proposed for various applications, such as access methods for spatio-temporal databases and image compression. In these applications, one of the most desired properties of such linear mappings is clustering, which means that the locality between objects in the multidimensional space is preserved in the linear space. It is widely believed that the Hilbert space-filling curve achieves the best clustering (Abel and Mark, 1990; Jagadish, 1990). We analyze the clustering property of the Hilbert space-filling curve by deriving closed-form formulas for the number of clusters in a given query region of an arbitrary shape (e.g., polygons and polyhedra). Both the asymptotic solution for the general case and the exact solution for a special case generalize previous work. They agree with the empirical results that the number of clusters depends on the hypersurface area of the query region and not on its hypervolume. We also show that the Hilbert curve achieves better clustering than the z curve. From a practical point of view, the formulas given provide a simple measure that can be used to predict the required disk access behaviors and, hence, the total access time.

Locality properties of discrete space-filling curves: Results with relevance for computer science

Aug 1997

J Alber

J. Alber. Locality properties of discrete space-filling curves: Results with relevance for computer science (in German). Studienarbeit Universität Tübingen, Wilhelm-Schickard-Institut für Informatik. July 1997.
