
Four-Dimensional Hilbert Curves for R-Trees

Herman Haverkort∗

Freek van Walderveen†

Abstract

Two-dimensional R-trees are a class of spatial index

structures in which objects are arranged to enable fast

window queries: report all objects that intersect a given

query window. One of the most successful methods of

arranging the objects in the index structure is based

on sorting the objects according to the positions of

their centres along a two-dimensional Hilbert space-

filling curve. Alternatively one may use the coordinates

of the objects’ bounding boxes to represent each object

by a four-dimensional point, and sort these points along

a four-dimensional Hilbert-type curve. In experiments

by Kamel and Faloutsos and by Arge et al. the first

solution consistently outperformed the latter when ap-

plied to point data, while the latter solution clearly out-

performed the first on certain artificial rectangle data.

These authors did not specify which four-dimensional

Hilbert-type curve was used; many exist.

In this paper we show that the results of the previ-

ous papers can be explained by the choice of the four-

dimensional Hilbert-type curve that was used and by

the way it was rotated in four-dimensional space. By

selecting a curve that has certain properties and choos-

ing the right rotation one can combine the strengths of

the two-dimensional and the four-dimensional approach

into one, while avoiding their apparent weaknesses. The

effectiveness of our approach is demonstrated with ex-

periments on various data sets. For real data taken from

VLSI design, our new curve yields R-trees with query

times that are better than those of R-trees that were

obtained with previously used curves.

∗Dept. of Mathematics and Computer Science, Eindhoven
University of Technology, The Netherlands.
cs.herman@haverkort.net

†Dept. of Mathematics and Computer Science, Eindhoven
University of Technology, The Netherlands. freek@vanwal.nl

1 Introduction

In many applications one needs a spatial database to
store and query objects in a plane. For example, in
geographic information systems, objects on the surface
of the earth are stored, and one needs to be able to
retrieve the objects that lie within a certain query range.
A database with the components of a VLSI design
constitutes another example. When such databases are

big, disk access becomes a major issue. Therefore a lot

of research has been done into spatial databases that

are designed to be stored on disk.

R-trees, originally introduced by Guttman [5], form

a class of spatial index structures that are particularly

interesting as a general-purpose method to organise a

variety of spatial objects. The R-tree can be understood

as a multi-dimensional variant of a B-tree that stores

minimum bounding boxes instead of one-dimensional

keys. An R-tree is structured as follows. The minimum

axis-parallel bounding boxes of the data objects are

organised into leaves (pages) that can accommodate a

certain number of bounding boxes. Let B be the number

of bounding boxes that fit in a leaf. The leaves are then

organised into a height-balanced tree, of which each

node has roughly B children (the root node may have

fewer children). Each node of the tree occupies a page

on disk and stores, for each of its children ν, the address

(page number) of ν and the bounding box of the objects

stored in the subtree rooted at ν.

To find the objects intersecting a query window Q,

we start at the root and recursively query all subtrees

whose bounding boxes intersect Q. Whenever the query

reaches a leaf, we check the bounding boxes stored in it,

and for each bounding box that intersects Q we check

the corresponding object. Fig. 1 shows an example of

an R-tree, highlighting the nodes that are accessed to

answer an example query. R-trees can also be used to

find the objects that are nearest to a query range Q; the

process is similar to searching the bounding boxes in a

range around Q that is grown until the nearest objects

are found.
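The query procedure described above can be sketched in a few lines of Python. This is only an illustrative in-memory sketch, not the implementation used in the experiments: the names `Node`, `intersects` and `window_query` are hypothetical, boxes are assumed to be closed rectangles given as (xmin, ymin, xmax, ymax), and the final check of each candidate object against Q is left out. The list `visited` records which nodes a query touches, which is the quantity that matters for query efficiency.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def intersects(a: Box, b: Box) -> bool:
    """True if two closed axis-parallel boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    # Each entry pairs a bounding box with a child Node (internal node)
    # or with an object identifier (leaf).
    entries: List[tuple] = field(default_factory=list)
    is_leaf: bool = True


def window_query(node: Node, q: Box, visited: List[Node]) -> List[object]:
    """Report the identifiers of all stored bounding boxes that intersect q."""
    visited.append(node)
    result = []
    for box, child in node.entries:
        if not intersects(box, q):
            continue  # prune: nothing below this entry can intersect q
        if node.is_leaf:
            result.append(child)
        else:
            result.extend(window_query(child, q, visited))
    return result
```

A subtree is entered only if its bounding box intersects the query window, so tighter bounding boxes translate directly into fewer visited nodes.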

R-trees have been proven quite effective in practice,
but the query efficiency depends on how exactly the
rectangles are ordered in the tree. In practice it is
typically possible to keep all internal nodes of the tree
cached in memory, while the leaves often have to be
retrieved from disk. Therefore the number of leaves
accessed is the most important factor determining the
query efficiency of an R-tree. Intuitively, one would
like to make sure that the rectangles that are stored
in any particular leaf ν lie close to each other in the
plane. Thus the bounding box of ν will be small, thus
the chance that any particular query intersects that
bounding box is small, and thus the chance that ν needs
to be retrieved from disk is small. The need to optimise
the organisation of rectangles in an R-tree has spawned
a lot of research: many algorithms have been designed
to construct R-trees and to insert or delete rectangles
in them. For an overview see Manolopoulos et al. [10].

Figure 1: An example of an R-tree. When queried with
the hatched query range Q, the nodes marked by bold
outlines will be visited.

Copyright © by SIAM.
Unauthorized reproduction of this article is prohibited.

A particularly easy and successful method is based

on sorting the set of input rectangles into a linear order.

Then the tree can be constructed by packing the input

rectangles into leaves in the order in which they have

been sorted, and simply putting a balanced tree of

degree roughly B on top of them. This results in an

R-tree construction algorithm that is as fast as sorting

and scanning the input once. Furthermore, it is easy

to maintain the structure by keeping the rectangles in

the tree in order as in a B-tree. The main consideration

with this approach is how to define the linear order, and

several heuristics and variations on this approach have

been explored [4, 8, 9, 13]. In particular, the Hilbert-

order used by Kamel and Faloutsos [8] was shown to

result in efficient queries in experiments.
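The sort-and-pack construction can be illustrated as follows. This is a minimal sketch under stated assumptions rather than any of the cited algorithms: the function names `union` and `pack_levels` are hypothetical, boxes are (xmin, ymin, xmax, ymax) tuples, every node is filled with exactly B entries except possibly the last one on each level, and the linear order is supplied by the caller through `key`.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def union(boxes: Sequence[Box]) -> Box:
    """Minimum axis-parallel bounding box of a non-empty group of boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def pack_levels(rects: List[Box], key: Callable[[Box], float], B: int) -> List[List[Box]]:
    """Sort rects by key, then pack B consecutive boxes per node, level by level.

    Returns the node bounding boxes of each level, leaves first and root last.
    """
    if not rects or B < 2:
        raise ValueError("need at least one rectangle and B >= 2")
    level = sorted(rects, key=key)
    levels = []
    while True:
        # One node per run of B consecutive boxes in the current order.
        level = [union(level[i:i + B]) for i in range(0, len(level), B)]
        levels.append(level)
        if len(level) == 1:
            return levels
```

Only `key` changes between the heuristics discussed in the text; the packing step itself is a single scan over the sorted input.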

Figure 2: Top left: the definition of the Hilbert-order.
The grey curve illustrates the scanning order of the
quadrants. Bottom left: two levels of subdivision.
Right: four levels of subdivision. The first cell p and
the last cell s of quadrant P are in the corners. The
last cell q of Q and the first cell r of R share an edge.

1.1 A Hilbert R-tree on a set of points. We
define a scanning order ≺ of points in the unit square as
follows. We define a set of rules, each of which specifies
(i) a scanning order ≺ of the four quadrants of a square,
and (ii) for each quadrant, which rule is to be applied to
establish the scanning order within that quadrant. We
choose a starting rule R and apply it to the unit square.
Fig. 2 illustrates the starting rule defining the Hilbert-
order. It puts the quadrants of the unit square in the
order lower left, upper left, upper right, lower right. To
the upper left and the upper right quadrants, rule R is
applied recursively. To the lower quadrants, we apply a
rotated and mirrored version of rule R.

By applying these rules recursively, we can order
the squares of an arbitrarily fine grid. Drawing a

curve through the centres of these squares gives an

arbitrarily fine approximation of Hilbert’s space-filling

curve [7], see Fig. 2. For two points a and b, we say

a ≺ b (a precedes b) if and only if, through recursive

subdivision of the unit square according to these rules,

we can find two squares A and B such that a ∈ A,

b ∈ B, and A ≺ B. When a point lies on a vertical

boundary between quadrants, we assume it belongs

to the quadrant to the right; when a point lies on a

horizontal boundary between quadrants, we assume it

belongs to the quadrant above.
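For a grid of 2^k × 2^k cells, the position of a cell in this order can be computed iteratively instead of by explicit recursion. The sketch below uses the widely known bit-manipulation formulation of the two-dimensional Hilbert index; we assume, and have checked only for small grids, that it matches the rule described above (quadrant order lower left, upper left, upper right, lower right, with the lower quadrants transformed). The names `hilbert_index` and `hilbert_key` are hypothetical, and x is taken to grow to the right and y upwards.

```python
def hilbert_index(x: int, y: int, order: int) -> int:
    """Rank of grid cell (x, y) in the Hilbert-order of a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0  # 1 if the cell lies in the right half
        ry = 1 if y & s else 0  # 1 if the cell lies in the upper half
        # Quadrant ranks: lower left 0, upper left 1, upper right 2, lower right 3.
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            # Lower quadrants use a transformed version of the rule.
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_key(point, order: int) -> int:
    """Hilbert rank of a point in the unit square, at grid resolution 2^order.

    Rounding down puts a point on a quadrant boundary into the quadrant
    to the right or above, as in the tie-breaking rule in the text.
    """
    n = 1 << order
    x = min(int(point[0] * n), n - 1)
    y = min(int(point[1] * n), n - 1)
    return hilbert_index(x, y, order)
```

Sorting points with `key=lambda p: hilbert_key(p, order)` then yields the Hilbert-order up to the chosen grid resolution; points that fall in the same cell are left in arbitrary relative order.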

1.2 A Hilbert R-tree on a set of rectangles.
Unfortunately the approach described above only works
for point data. To build an R-tree on a set of objects in
the plane that are not points, we can map the bounding
boxes of the objects to points and put the objects in
the tree according to the Hilbert-order on those points.
Kamel and Faloutsos investigated three ways to do this:

• map bounding boxes $[x_{mn}, x_{mx}] \times [y_{mn}, y_{mx}]$ to
their centre points $\frac{1}{2}(x_{mn} + x_{mx}, y_{mn} + y_{mx})$
and use the Hilbert-order defined above (the two-
dimensional mapping, or H2 for short);

• map bounding boxes $[x_{mn}, x_{mx}] \times [y_{mn}, y_{mx}]$
to four-dimensional points $(x_{mn}, y_{mn}, x_{mx}, y_{mx})$
and order them according to a Hilbert-like scanning
order of the four-dimensional unit hypercube (the
four-dimensional xy-mapping, H4xy for short);

• map bounding boxes $[x_{mn}, x_{mx}] \times [y_{mn}, y_{mx}]$ to
four-dimensional points $(c_x, c_y, d_x, d_y)$, where
$c_x = \frac{1}{2}(x_{mn} + x_{mx})$, $c_y = \frac{1}{2}(y_{mn} + y_{mx})$, $d_x = x_{mx} -$


ity measures for four-dimensional scanning orders that

can be analysed and computed theoretically, similar to

those which we defined for two-dimensional scanning

orders [6]. This would not only be useful to speed up

the search for good scanning orders, it could also ensure

that the “best” scanning order is selected on the basis of

more objective criteria than good average performance

on a small test set. Note that a mere generalisation

of two-dimensional measures to four dimensions would

not be good enough for our purposes: we need a quality

measure for four-dimensional scanning orders that are

used to make bounding boxes in two dimensions.

References

[1] J. Alber, R. Niedermeier: On Multidimensional Curves

with Hilbert Property. Theory of Computing Systems

33:295–312 (2000).

[2] L. Arge, M. de Berg, H. J. Haverkort, and K. Yi. The

Priority R-tree: a practically efficient and worst-case

optimal R-tree. In Proc. ACM SIG Management of

Data (SIGMOD), pages 347–358, 2004.

[3] J. van den Bercken, B. Seeger. An Evaluation of Generic

Bulk Loading Techniques. In Proc. Int. Conf. on Very

Large Databases (VLDB), pages 461–470, 2001.

[4] D. J. DeWitt, N. Kabra, J. Luo, J. M. Patel, and J.-B.

Yu. Client-server paradise. In Proc. Int. Conf. on Very

Large Databases (VLDB), pages 558–569, 1994.

[5] A. Guttman. R-trees: A dynamic index structure for

spatial searching. In Proc. ACM Special Interest Group

on Management of Data (SIGMOD), pages 47–57, 1984.

[6] H. Haverkort and F. van Walderveen. Locality and
bounding-box quality of two-dimensional space-filling
curves. In Proc. 16th Eur. Symp. on Algorithms (ESA),
2008. Full manuscript at arXiv:0806.4787 [cs.CG].

[7] D. Hilbert. Über die stetige Abbildung einer Linie auf
ein Flächenstück. Math. Ann. 38 (1891), 459–460.

[8] I. Kamel and C. Faloutsos. On packing R-trees. In Proc.

Int. Conf. on Information and Knowledge Management

(CIKM), pages 490–499, 1993.

[9] S. T. Leutenegger, M. A. López, and J. Edgington.

STR: A simple and efficient algorithm for R-tree pack-

ing. In Proc. IEEE Int. Conf. on Data Engineering

(ICDE), pages 497–506, 1996.

[10] Y. Manolopoulos, A. Nanopoulos, A. N. Papadopoulos,
Y. Theodoridis. R-trees: Theory and Applications.
Springer, 2005.

[11] D. Moore. Fast Hilbert Curve Generation, Sorting,
and Range Queries.
http://web.archive.org/web/
www.caam.rice.edu/~dougm/twiddle/Hilbert/

[12] G. Peano. Sur une courbe, qui remplit toute une aire

plane. Math. Ann. 36 (1890), 157–160.

[13] N. Roussopoulos and D. Leifker. Direct spatial search

on pictorial databases using packed R-trees. In Proc.

ACM Special Interest Group on Management of Data

(SIGMOD), pages 17–31, 1985.
