
Four-Dimensional Hilbert Curves for R-Trees

Herman Haverkort∗

Freek van Walderveen†

Abstract

Two-dimensional R-trees are a class of spatial index structures in which objects are arranged to enable fast window queries: report all objects that intersect a given query window. One of the most successful methods of arranging the objects in the index structure is based on sorting the objects according to the positions of their centres along a two-dimensional Hilbert space-filling curve. Alternatively one may use the coordinates of the objects’ bounding boxes to represent each object by a four-dimensional point, and sort these points along a four-dimensional Hilbert-type curve. In experiments by Kamel and Faloutsos and by Arge et al. the first solution consistently outperformed the latter when applied to point data, while the latter solution clearly outperformed the first on certain artificial rectangle data. These authors did not specify which four-dimensional Hilbert-type curve was used; many exist.

In this paper we show that the results of the previous papers can be explained by the choice of the four-dimensional Hilbert-type curve that was used and by the way it was rotated in four-dimensional space. By selecting a curve that has certain properties and choosing the right rotation one can combine the strengths of the two-dimensional and the four-dimensional approach into one, while avoiding their apparent weaknesses. The effectiveness of our approach is demonstrated with experiments on various data sets. For real data taken from VLSI design, our new curve yields R-trees with query times that are better than those of R-trees that were obtained with previously used curves.

∗Dept. of Mathematics and Computer Science, Eindhoven University of Technology, The Netherlands. cs.herman@haverkort.net

†Dept. of Mathematics and Computer Science, Eindhoven University of Technology, The Netherlands. freek@vanwal.nl

1 Introduction

In many applications one needs a spatial database to store and query objects in a plane. For example in geographic information systems, objects on the surface of the earth are stored, and one needs to be able to retrieve the objects that lie within a certain query range. A database with the components of a VLSI design constitutes another example. When such databases are big, disk access becomes a major issue. Therefore a lot of research has been done into spatial databases that are designed to be stored on disk.

R-trees, originally introduced by Guttman [5], form a class of spatial index structures that are particularly interesting as a general-purpose method to organise a variety of spatial objects. The R-tree can be understood as a multi-dimensional variant of a B-tree that stores minimum bounding boxes instead of one-dimensional keys. An R-tree is structured as follows. The minimum axis-parallel bounding boxes of the data objects are organised into leaves (pages) that can accommodate a certain number of bounding boxes. Let B be the number of bounding boxes that fit in a leaf. The leaves are then organised into a height-balanced tree, of which each node has roughly B children (the root node may have fewer children). Each node of the tree occupies a page on disk and stores, for each of its children ν, the address (page number) of ν and the bounding box of the objects stored in the subtree rooted at ν.

To find the objects intersecting a query window Q, we start at the root and recursively query all subtrees whose bounding boxes intersect Q. Whenever the query reaches a leaf, we check the bounding boxes stored in it, and for each bounding box that intersects Q we check the corresponding object. Fig. 1 shows an example of an R-tree, highlighting the nodes that are accessed to answer an example query. R-trees can also be used to find the objects that are nearest to a query range Q; the process is similar to searching the bounding boxes in a range around Q that is grown until the nearest objects are found.
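The window query described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the `Node` class, the box layout `(xmn, ymn, xmx, ymx)`, and the `stats` counter are hypothetical names chosen for this sketch, and the tree is kept in memory rather than on disk pages.

```python
from dataclasses import dataclass, field


def intersects(a, b):
    """True if boxes a and b, each (xmn, ymn, xmx, ymx), share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    # Leaf: (bounding box, object) pairs.
    # Internal node: (bounding box of subtree, child Node) pairs.
    entries: list = field(default_factory=list)


def window_query(node, query, stats):
    """Return the objects whose bounding boxes intersect `query`.

    stats["leaves"] counts visited leaves, the cost measure discussed in the text.
    A real implementation would still test each reported object itself against
    the query; here only the bounding boxes are compared.
    """
    results = []
    if node.is_leaf:
        stats["leaves"] += 1
        for box, obj in node.entries:
            if intersects(box, query):
                results.append(obj)
    else:
        for box, child in node.entries:
            if intersects(box, query):  # descend only into intersecting subtrees
                results.extend(window_query(child, query, stats))
    return results


# A tiny hand-built tree with two leaves.
leaf_a = Node(True, [((0, 0, 1, 1), "a"), ((1, 1, 2, 2), "b")])
leaf_b = Node(True, [((8, 8, 9, 9), "c")])
root = Node(False, [((0, 0, 2, 2), leaf_a), ((8, 8, 9, 9), leaf_b)])
stats = {"leaves": 0}
hits = window_query(root, (0.5, 0.5, 1.5, 1.5), stats)
```

In this example only one of the two leaves is visited, because the bounding box of the second leaf does not intersect the query window.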

Figure 1: An example of an R-tree. When queried with the hatched query range Q, the nodes marked by bold outlines will be visited.

R-trees have been proven quite effective in practice, but the query efficiency depends on how exactly the rectangles are ordered in the tree. In practice it is typically possible to keep all internal nodes of the tree cached in memory, while the leaves often have to be retrieved from disk. Therefore the number of leaves accessed is the most important factor determining the query efficiency of an R-tree. Intuitively, one would like to make sure that the rectangles that are stored in any particular leaf ν lie close to each other in the plane. Thus the bounding box of ν will be small, thus the chance that any particular query intersects that bounding box is small, and thus the chance that ν needs to be retrieved from disk is small. The need to optimise the organisation of rectangles in an R-tree has spawned a lot of research: many algorithms have been designed to construct R-trees and to insert or delete rectangles in them. For an overview see Manolopoulos et al. [10].

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

A particularly easy and successful method is based on sorting the set of input rectangles into a linear order. Then the tree can be constructed by packing the input rectangles into leaves in the order in which they have been sorted, and simply putting a balanced tree of degree roughly B on top of them. This results in an R-tree construction algorithm that is as fast as sorting and scanning the input once. Furthermore, it is easy to maintain the structure by keeping the rectangles in the tree in order as in a B-tree. The main consideration with this approach is how to define the linear order, and several heuristics and variations on this approach have been explored [4, 8, 9, 13]. In particular, the Hilbert-order used by Kamel and Faloutsos [8] was shown to result in efficient queries in experiments.
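The sort-and-pack construction can be sketched as follows. This is a minimal in-memory illustration, not the implementation used in the experiments: the dictionary-based node layout and the function names are our own, and the sort key in the usage example is a simple placeholder (the x-coordinate of the centre) standing in for whichever linear order one wants to study.

```python
def union(boxes):
    """Smallest box (xmn, ymn, xmx, ymx) that contains all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def pack(boxes, key, B):
    """Sort `boxes` by `key`, cut them into leaves of B, and stack levels on top."""
    if B < 2:
        raise ValueError("B must be at least 2")
    # Leaf entries pair each bounding box with its object (here: the box itself).
    entries = [(box, box) for box in sorted(boxes, key=key)]
    is_leaf = True
    while True:
        nodes = [{"leaf": is_leaf, "entries": entries[i:i + B]}
                 for i in range(0, len(entries), B)]
        if len(nodes) <= 1:
            return nodes[0] if nodes else {"leaf": True, "entries": []}
        # The next level stores, per child, its bounding box and a reference to it.
        entries = [(union([b for b, _ in n["entries"]]), n) for n in nodes]
        is_leaf = False


def count_leaves(node):
    if node["leaf"]:
        return 1
    return sum(count_leaves(child) for _, child in node["entries"])


# Ten unit boxes in a row, given in reverse order; placeholder key: centre x.
boxes = [(i, 0, i + 1, 1) for i in reversed(range(10))]
root = pack(boxes, key=lambda b: (b[0] + b[2]) / 2, B=3)
```

Swapping in a different `key` changes only which rectangles end up together in a leaf; the packing itself stays the same, which is why the choice of linear order is the main design decision.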

1.1 A Hilbert R-tree on a set of points.

We define a scanning order ≺ of points in the unit square as follows. We define a set of rules, each of which specifies (i) a scanning order ≺ of the four quadrants of a square, and (ii) for each quadrant, which rule is to be applied to establish the scanning order within that quadrant. We choose a starting rule R and apply it to the unit square. Fig. 2 illustrates the starting rule defining the Hilbert-order. It puts the quadrants of the unit square in the order lower left, upper left, upper right, lower right. To the upper left and the upper right quadrants, rule R is applied recursively. To the lower quadrants, we apply a rotated and mirrored version of rule R.

Figure 2: Top left: the definition of the Hilbert-order. The grey curve illustrates the scanning order of the quadrants. Bottom left: two levels of subdivision. Right: four levels of subdivision. The first cell p and the last cell s of quadrant P are in the corners. The last cell q of Q and the first cell r of R share an edge.

By applying these rules recursively, we can order the squares of an arbitrarily fine grid. Drawing a curve through the centres of these squares gives an arbitrarily fine approximation of Hilbert’s space-filling curve [7], see Fig. 2. For two points a and b, we say a ≺ b (a precedes b) if and only if, through recursive subdivision of the unit square according to these rules, we can find two squares A and B such that a ∈ A, b ∈ B, and A ≺ B. When a point lies on a vertical boundary between quadrants, we assume it belongs to the quadrant to the right; when a point lies on a horizontal boundary between quadrants, we assume it belongs to the quadrant above.
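For a grid of finite resolution, the position of a cell in this scanning order can be computed with the well-known iterative bit-manipulation method for the two-dimensional Hilbert curve. The sketch below is an illustration, not the authors' code; the function names and the default resolution of 16 levels are assumptions. The quadrant order it produces (lower left, upper left, upper right, lower right, with the y-axis pointing up) matches the starting rule described above.

```python
def hilbert_index(x, y, order):
    """Rank of grid cell (x, y) on a 2**order by 2**order grid, y pointing up."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0   # 1 if the cell lies in the right half
        ry = 1 if y & s else 0   # 1 if the cell lies in the upper half
        # Quadrant ranks: lower left 0, upper left 1, upper right 2, lower right 3.
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            # Lower quadrants use a rotated and mirrored copy of the rule.
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_key(px, py, order=16):
    """Sort key for a point (px, py) in the unit square.

    Truncating px * n assigns a point on a quadrant boundary to the quadrant
    to the right or above, as in the convention of the text.
    """
    n = 1 << order
    cx = min(int(px * n), n - 1)
    cy = min(int(py * n), n - 1)
    return hilbert_index(cx, cy, order)


# The 16 cells of a 4 x 4 grid in scanning order.
cells = sorted(((x, y) for x in range(4) for y in range(4)),
               key=lambda c: hilbert_index(c[0], c[1], 2))
```

Sorting points by `hilbert_key` and packing them into leaves in that order gives the point version of the construction; consecutive cells in `cells` always share an edge.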

1.2 A Hilbert R-tree on a set of rectangles.

Unfortunately the approach described above only works for point data. To build an R-tree on a set of objects in the plane that are not points, we can map the bounding boxes of the objects to points and put the objects in the tree according to the Hilbert-order on those points. Kamel and Faloutsos investigated three ways to do this:

• map bounding boxes $[x_{mn}, x_{mx}] \times [y_{mn}, y_{mx}]$ to their centre points $\frac{1}{2}(x_{mn} + x_{mx},\; y_{mn} + y_{mx})$ and use the Hilbert-order defined above (the two-dimensional mapping, or H2 for short);

• map bounding boxes $[x_{mn}, x_{mx}] \times [y_{mn}, y_{mx}]$ to four-dimensional points $(x_{mn}, y_{mn}, x_{mx}, y_{mx})$ and order them according to a Hilbert-like scanning order of the four-dimensional unit hypercube (the four-dimensional xy-mapping, H4xy for short);

• map bounding boxes $[x_{mn}, x_{mx}] \times [y_{mn}, y_{mx}]$ to four-dimensional points $(c_x, c_y, d_x, d_y)$, where $c_x = \frac{1}{2}(x_{mn} + x_{mx})$, $c_y = \frac{1}{2}(y_{mn} + y_{mx})$, $d_x = x_{mx} - x_{mn}$, and $d_y = y_{mx} - y_{mn}$; order these points according to a Hilbert-like scanning order of the four-dimensional unit hypercube (the four-dimensional cd-mapping, H4cd for short).
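The three mappings can be written down directly. In the sketch below the tuple layout `(xmn, ymn, xmx, ymx)` for a bounding box and the function names are assumptions made for illustration; only the formulas come from the list above.

```python
def map_h2(box):
    """H2: centre point of the bounding box."""
    xmn, ymn, xmx, ymx = box
    return ((xmn + xmx) / 2, (ymn + ymx) / 2)


def map_h4xy(box):
    """H4xy: the four corner coordinates as a four-dimensional point."""
    xmn, ymn, xmx, ymx = box
    return (xmn, ymn, xmx, ymx)


def map_h4cd(box):
    """H4cd: centre (cx, cy) followed by width dx and height dy."""
    xmn, ymn, xmx, ymx = box
    return ((xmn + xmx) / 2, (ymn + ymx) / 2, xmx - xmn, ymx - ymn)


# A vertical and a horizontal line segment with the same midpoint.
vertical = (0.5, 0.25, 0.5, 0.75)
horizontal = (0.25, 0.5, 0.75, 0.5)
```

Here `map_h2` sends both segments to the same point (0.5, 0.5), whereas `map_h4cd` keeps them apart in the width and height coordinates.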

The two-dimensional approach ignores the widths and heights of the input objects’ bounding boxes. As a result, a vertical line segment and a horizontal line segment with the same midpoint would always be put together in one leaf, creating a leaf with a large bounding box, see Fig. 3. The four-dimensional approach may avoid this problem by grouping boxes together that are similar in both location and orientation. However, this raises the question of how exactly to define a scanning order of the four-dimensional unit hypercube. The two-dimensional Hilbert-order can be generalised to higher dimensions in many different, equally plausible ways, and there is no agreement in the literature of what would constitute the four-dimensional Hilbert-order. Different implementations of multi-dimensional Hilbert-orders may sort points differently.

Kamel and Faloutsos [8] and Arge et al. [2] show comparisons between the two-dimensional approach H2 and one or both of the four-dimensional approaches, but they do not discuss what four-dimensional Hilbert-like scanning order they used. Kamel and Faloutsos concluded from their experiments that H2 works best in practice. The results of Arge et al. are consistent with that conclusion, except for extreme, artificially constructed data sets, which may be unlike anything one would expect to find in practice. On those extreme data sets, the H2 approach resulted in very bad query times while the H4xy approach did well.

Neither Kamel and Faloutsos nor Arge et al. showed any advantages of a four-dimensional approach applied to a practical data set. On the contrary, their results seem to show that the four-dimensional approach is clearly worse in practice. This is quite counterintuitive. Even if the four-dimensional approach does not do any good on ‘nice’ data (which consists mostly of very small objects), it is hard to see why taking height and width information into account when ordering small bounding boxes would have to do any harm.

1.3 Our results.

Considering the counterintuitive results of Kamel and Faloutsos and Arge et al. with unspecified four-dimensional scanning orders, we decided to investigate the effect of the choice of the four-dimensional scanning order on the query efficiency of the resulting R-trees. We found that the choice of scanning order has a significant impact. In fact the impact is such that Kamel and Faloutsos could have arrived at entirely different conclusions if they had tried different four-dimensional Hilbert-like scanning orders.

In particular, we found a four-dimensional Hilbert-like scanning order that seems to combine the strengths of the previously used scanning orders while avoiding their apparent weaknesses. On relatively ‘nice’ data, the new order results in a query efficiency that is as good as with the two-dimensional approach, or only slightly worse. On extreme data the new order results in query times that are as good as with the previously used four-dimensional scanning orders. Moreover, it turns out that there are data sets in the practice of VLSI design for which the new order gives considerably better results than the two-dimensional approach.

Figure 3: Left: a set of objects (line segments). Centre: when objects with approximately the same centre are grouped together into leaves, regardless of their orientation, leaves with large bounding boxes result. Right: when the orientation of the objects is taken into account when packing them together, smaller bounding boxes are possible.

2 Defining a four-dimensional scanning order

To define a four-dimensional Hilbert-like scanning order we use the approach of Alber and Niedermeier [1]. They define a class of multi-dimensional scanning orders that maintain the most characteristic properties of the two-dimensional Hilbert-order. To explain their method, we first discuss the relevant properties of the two-dimensional Hilbert-order. Next we illustrate how one can define multi-dimensional scanning orders with an example in three dimensions. Then we give definitions of four-dimensional scanning orders, and explain the algorithms we used to find new scanning orders.

2.1 Two dimensions.

Recall the two-dimensional Hilbert-order described in Section 1.1: it is defined by ordering the quadrants of the unit square, and by specifying, for each quadrant, how to mirror and/or rotate the order within that quadrant. The Hilbert-order has two special properties, illustrated in Fig. 2.

Property 2.1. Consider a grid of square cells that results from an arbitrarily deep recursive application of the rules that generate the scanning order, and consider any square P to which the rules have been applied, on any level of the recursion. The cells p and s in P that come first and last in the scanning order, are in two corners of P.

Property 2.2. Let Q and R be two quadrants of a square P as defined above, such that R is the immediate successor of Q in the scanning order. Then the first cell r of R shares an edge with the last cell q of Q.

The latter is a useful property when the scanning order is used to make R-trees: it guarantees that consecutive cells in the Hilbert-order are adjacent in space. Thus points that end up in the same leaf of the R-tree are likely to be close to each other in space, and thus they have a small bounding box.

2.2 Three dimensions.

Alber and Niedermeier generalise the Hilbert order to higher dimensions. Their d-dimensional scanning orders are based on recursively ordering the $2^d$ ‘hyperquadrants’ of the d-dimensional unit hypercube, while maintaining the above properties 2.1 and 2.2. They find that there are 1536 structurally different three-dimensional scanning orders that have these properties. To describe a particular order, one needs to specify in which order the ‘hyperquadrants’ of the unit hypercube appear, and how the order is mirrored and/or rotated within each hyperquadrant. Alber and Niedermeier use a permutation-based notation for this purpose, which we will illustrate with an example in three dimensions.

Consider a scanning order that puts the octants of the three-dimensional unit cube in the order LTF (left top front), LTH (left top hind), RTH (right top hind), RTF, RBF (right bottom front), RBH, LBH, LBF, see Fig. 4. Using 0 to encode left, bottom, and front, and 1 to encode right, top, and hind, we can also present this order as 010-011-111-110-100-101-001-000. Note that in this case, the first and the last octant are adjacent.

Now we need to specify how to order the suboctants inside the left-top-front octant $A_1$. The first of these will also be the first suboctant of the unit hypercube as a whole; therefore it must be in a corner (to ensure Property 2.1), so it must be the suboctant $A_{11}$ on the left-top-front side of $A_1$, see Fig. 4. The last suboctant of $A_1$ must be adjacent to the next octant $A_2$ (to ensure Property 2.2) and to $A_{11}$, so it must be suboctant $A_{18}$ on the left-top-hind side of $A_1$. This leaves two possible orders of suboctants within $A_1$ that can be obtained by mirroring and/or rotating the order of the octants of the unit cube: it must be either 010-110-100-000-001-101-111-011 (as in Fig. 4, top), or 010-000-100-110-111-101-001-011 (as in Fig. 4, bottom).

Figure 4: Example of a top-level order, showing different orders for the first octant.

To specify which order we take, we describe how to permute the order of the octants of the unit cube (010-011-111-110-100-101-001-000) to get the order of the suboctants inside $A_1$. Such a permutation is given as a set of sequences of indices, where a sequence of k indices $(i_0\, i_1 \ldots i_{k-1})$, with $i_j \in \{1, 2, \ldots, 2^d\}$, indicates that octant $i_j$ in the permuted order is octant $i_{(j+1) \bmod k}$ in the original order, for $0 \le j < k$. Thus the two choices of permutations for the first octant of the unit cube are (2 4 8) (3 5 7) and (2 8) (3 5).

To specify a full scanning order, we specify the top-level order and we specify a permutation of that order for every octant. See Fig. 5 for an example.
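The permutation notation can be checked mechanically. The following sketch applies a permutation in this cycle notation to the top-level order of the three-dimensional example; the helper names `parse_cycles` and `apply_permutation` are our own, and the code only encodes the rule stated above (octant $i_j$ of the permuted order is octant $i_{(j+1) \bmod k}$ of the original order).

```python
def parse_cycles(text):
    """Turn a string such as '(2 8)(3 5)' into [[2, 8], [3, 5]]."""
    cycles = []
    for part in text.replace(")", "").split("("):
        if part.strip():
            cycles.append([int(token) for token in part.split()])
    return cycles


def apply_permutation(order, cycles):
    """Permute `order` (a list of octant codes) using 1-based index cycles."""
    permuted = list(order)
    for cycle in cycles:
        k = len(cycle)
        for j in range(k):
            # Position i_j of the result takes the octant at position
            # i_((j+1) mod k) of the original order.
            permuted[cycle[j] - 1] = order[cycle[(j + 1) % k] - 1]
    return permuted


top_level = "010-011-111-110-100-101-001-000".split("-")
first = "-".join(apply_permutation(top_level, parse_cycles("(2 4 8) (3 5 7)")))
second = "-".join(apply_permutation(top_level, parse_cycles("(2 8) (3 5)")))
```

With these inputs, `first` is 010-110-100-000-001-101-111-011 and `second` is 010-000-100-110-111-101-001-011, the two candidate orders for the first octant given in the text.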

Octant   Permutation within the octant
000      (2 8)(3 5)
001      (2 8 4)(3 7 5)
101      (2 8 4)(3 7 5)
100      (1 3)(2 4)(5 7)(6 8)
110      (1 3)(2 4)(5 7)(6 8)
111      (1 5 7)(2 4 6)
011      (1 5 7)(2 4 6)
010      (1 7)(4 6)

Figure 5: Example of a fully specified scanning order in three dimensions. The scanning order is defined by the table, with the left column defining the top-level order and the right column defining the permutations within the octants.

2.3 Four dimensions (H4cd).

In the H4cd approach we distinguish four dimensions, which correspond to the following attributes of input boxes $[x_{mn}, x_{mx}] \times [y_{mn}, y_{mx}]$: (1) horizontal location $\frac{1}{2}(x_{mn} + x_{mx})$ (from left to right); (2) vertical location $\frac{1}{2}(y_{mn} + y_{mx})$ (from bottom to top); (3) width $(x_{mx} - x_{mn})$ (from narrow to wide); (4) height $(y_{mx} - y_{mn})$ (from short to tall). We identify a hyperquadrant H of the four-dimensional unit hypercube by a four-digit binary number $b_1 b_2 b_3 b_4$, where $b_i = 0$ if H is on the small (left, bottom, narrow, short) side of dimension i, and $b_i = 1$ if H is on the large (right, top, wide, tall) side of dimension i. Fig. 6 shows the definitions of a four-dimensional scanning order by Alber and Niedermeier and the order of Moore’s software [11] (H4cdM).
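As a small illustration of the binary labels, the sketch below computes the top-level hyperquadrant of a point $(c_x, c_y, d_x, d_y)$. It rests on two assumptions that are not taken from the text: all four coordinates have already been scaled to the interval [0, 1], and a coordinate exactly on the boundary 0.5 is assigned to the large side, by analogy with the two-dimensional convention of Section 1.1. The function name is hypothetical.

```python
def hyperquadrant(point):
    """Label b1b2b3b4 of the top-level hyperquadrant containing `point`.

    `point` is (cx, cy, dx, dy) with every coordinate assumed to lie in [0, 1].
    Digit i is '1' when the point is on the large side of dimension i.
    """
    return "".join("1" if coordinate >= 0.5 else "0" for coordinate in point)


# A point object (zero width and height) in the lower right of the plane.
label_point = hyperquadrant((0.7, 0.2, 0.0, 0.0))
# A wide, short box centred in the upper left of the plane.
label_box = hyperquadrant((0.2, 0.8, 0.6, 0.1))
```

Point data always has zero width and height, so its labels end in 00; this is why only four of the sixteen hyperquadrants matter when a four-dimensional order is applied to points.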

When we order bounding boxes of point data (with width and height dimension zero) with the known four-dimensional scanning orders, then their order is quite different from their order according to the two-dimensional approach. For example, Fig. 7 shows the order in which sixteen points of the type (x,y) appear when we order the corresponding points (x,y,0,0) with Alber and Niedermeier’s scanning order. As it turns out, the scanning order of such points in a grid of 16 squares contains ‘jumps’: sometimes cells that are consecutive in the scanning order are not adjacent in the plane. As a result, two points that lie very close to each other in the scanning order, and are therefore likely to be packed into the same leaf, can be very far apart in the plane, causing the leaf to have a large bounding box. We believe this may explain why previous authors found that the four-dimensional approach did not work well in practice. Therefore we decided to look for four-dimensional orders with the following property:

Property 2.3. For any two points $p_1 = (x_1, y_1, 0, 0)$ and $p_2 = (x_2, y_2, 0, 0)$ we have $p_1 \prec p_2$ in the four-dimensional scanning order if and only if $(x_1, y_1) \prec (x_2, y_2)$ according to the two-dimensional Hilbert-order.
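Whether a given ordering of grid cells contains such ‘jumps’ is easy to test. The sketch below counts consecutive cells that do not share an edge. The two sample orderings are our own illustrations on a 2 by 2 grid: the Hilbert quadrant order from Section 1.1 and, as a contrast, a row-by-row Z-shaped order. Neither is the order of Fig. 7, and the function name is hypothetical.

```python
def count_jumps(cells):
    """Number of consecutive cell pairs in `cells` that do not share an edge."""
    jumps = 0
    for (x1, y1), (x2, y2) in zip(cells, cells[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            jumps += 1
    return jumps


# Lower left, upper left, upper right, lower right: no jumps.
hilbert_2x2 = [(0, 0), (0, 1), (1, 1), (1, 0)]
# Bottom row left to right, then top row left to right: one diagonal jump.
z_order_2x2 = [(0, 0), (1, 0), (0, 1), (1, 1)]
```

An order that satisfies Property 2.3 visits planar cells exactly as the two-dimensional Hilbert-order does, so this count is zero for point data at every resolution.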

To find a scanning order that has properties 2.1, 2.2 and 2.3, we implemented an algorithm to search the space of four-dimensional scanning orders within the framework of Alber and Niedermeier. Our search method is explained in the next section, and resulted in the order shown in Fig. 6 (bottom).

We also experimented with a rotated version of the H4cdAN-order (Fig. 6). We call this version H4cdANR:

Alber-Niedermeier order (H4cdAN)
0000   (2 16)(3 13)(6 12)(7 9)
0010   (3 15)(4 16)(5 9)(6 10)
0110   (3 15)(4 16)(5 9)(6 10)
0100   (1 3 13 11 9 7)(2 4 14 12 10 8)(5 15)(6 16)
1100   (1 3 13 11 9 7)(2 4 14 12 10 8)(5 15)(6 16)
1110   (1 5 13 9)(2 6 14 10)(3 11 15 7)(4 12 16 8)
1010   (1 5 13 9)(2 6 14 10)(3 11 15 7)(4 12 16 8)
1000   (1 7)(4 6)(10 16)(11 13)
1001   (1 7)(4 6)(10 16)(11 13)
1011   (1 9 13 5)(2 10 14 6)(3 7 15 11)(4 8 16 12)
1111   (1 9 13 5)(2 10 14 6)(3 7 15 11)(4 8 16 12)
1101   (1 11)(2 12)(3 5 7 9 15 13)(4 6 8 10 16 14)
0101   (1 11)(2 12)(3 5 7 9 15 13)(4 6 8 10 16 14)
0111   (1 13)(2 14)(7 11)(8 12)
0011   (1 13)(2 14)(7 11)(8 12)
0001   (1 15)(4 14)(5 11)(8 10)

Moore order (H4cdM)
0000   (2 4 8 16)(3 5 9 15)(6 12 10 14)(7 13)
1000   (2 8)(3 9)(4 16)(5 15)(6 10)(12 14)
1100   (2 8)(3 9)(4 16)(5 15)(6 10)(12 14)
0100   (1 3 13 5)(2 14 12 8)(6 16)(7 15 11 9)
0110   (1 3 13 5)(2 14 12 8)(6 16)(7 15 11 9)
1110   (1 5 11 15)(2 4 12 10)(3 13 9 7)(6 14 16 8)
1010   (1 5 11 15)(2 4 12 10)(3 13 9 7)(6 14 16 8)
0010   (1 7)(2 8)(3 5)(4 6)(9 15)(10 16)(11 13)(12 14)
0011   (1 7)(2 8)(3 5)(4 6)(9 15)(10 16)(11 13)(12 14)
1011   (1 9 11 3)(2 16 12 6)(4 8 10 14)(5 7 15 13)
1111   (1 9 11 3)(2 16 12 6)(4 8 10 14)(5 7 15 13)
0111   (1 11)(2 6 8 10)(3 5 9 15)(4 12 16 14)
0101   (1 11)(2 6 8 10)(3 5 9 15)(4 12 16 14)
1101   (1 13)(2 12)(3 5)(7 11)(8 14)(9 15)
1001   (1 13)(2 12)(3 5)(7 11)(8 14)(9 15)
0001   (1 15 13 9)(2 14 12 8)(3 11 5 7)(4 10)

New order (H4cdNew)
0000   (2 16)(3 9)(4 8)(6 12)(7 13)(10 14)
0010   (3 15)(4 16)(5 9)(6 10)
0110   (2 8)(3 9)(4 16)(5 15)(6 10)(12 14)
0100   (1 3)(5 13)(6 16)(7 15)(8 14)(9 11)
1100   (1 3 15 11 9 5)(2 14 10 12 8 4)(6 16)(7 13)
1110   (1 5 11 15)(2 4 12 10)(3 13 9 7)(6 14 16 8)
1010   (1 5 3 11 15 9)(2 12)(4 6 14 10 16 8)(7 13)
1000   (1 7)(4 6)(10 16)(11 13)
1001   (1 7)(4 6)(10 16)(11 13)
1011   (1 9 13 11 3 7)(2 8 16 12 14 6)(4 10)(5 15)
1111   (1 9 11 3)(2 16 12 6)(4 8 10 14)(5 7 15 13)
1101   (1 11)(2 6 8 12 16 14)(3 7 5 9 13 15)(4 10)
0101   (1 11)(2 10)(3 9)(4 12)(6 8)(14 16)
0111   (1 13)(2 12)(3 5)(7 11)(8 14)(9 15)
0011   (1 13)(2 14)(7 11)(8 12)
0001   (1 15)(3 7)(4 10)(5 11)(8 14)(9 13)

Figure 6: Definitions of three four-dimensional scanning orders.
