Computers and Graphics 18 (6): 815-822, 1994.
Modeling Topological Spatial Relations:
Strategies for Query Processing
Eliseo Clementini*, Jayant Sharma, and Max J. Egenhofer§
National Center for Geographic Information and Analysis,
University of Maine, Orono, ME 04469-5711, U.S.A.
Abstract
This paper investigates the processing of spatial queries with topological constraints, for which
current database solutions are inappropriate. Topological relations, such as disjoint, meet, overlap,
inside, and contains, have been well defined by the 9-intersection, a comprehensive model for
binary topological relations. We focus on two types of queries: (1) “Which objects have a stated
topological relation with a given spatial object?” and (2) “What is the topological relation between
two given spatial objects?” Such queries are processed at two levels of detail. First, Minimum
Bounding Rectangles are used as an approximation of the objects’ geometry and as a means of
identifying candidates that might satisfy the query. Next, the nine intersections that determine the
topological relations between candidate pairs are calculated. We present algorithms for minimizing
these computations. Considerable performance can be gained by exploiting the semantics of spatial
relations. We also compare the approach for a naive cost model, which assumes that all relations
have the same frequency of occurrence, with a refined cost model, which considers the probability
of occurrence of the topological relations. The strategies presented here have three key benefits: (1)
they are based on a well-defined formalism; (2) they are customizable; and (3) they can take into
account important statistical information about the data.
1. Introduction
Geographic Information Systems (GISs) contain high-level spatial operators that are uncommon in
conventional database management systems (DBMSs) [1]. Spatial operators appear, for instance,
as constraints in spatial queries to select spatial objects. They may include such simple selections
as, “Retrieve all lakes in the state of Maine,” or more complex ones like, “Find the shortest path
from Boston to Bangor based on travel time.” Traditionally, concerns about processing spatial
queries have been addressed primarily at the level of spatial access methods in order to minimize
the number of disk accesses by clustering spatial objects according to their spatial neighborhoods.
This method supports requests like, “Display a map of Maine showing highways and cities with a
population of more than 25,000” where the constraint is that objects intersect with a search
window. For other types of queries in which the semantics of the constraints are more complex,
the mere support of spatial access methods is insufficient to guarantee efficient processing of
queries.
Spatial access has to be complemented by methods that consider (1) the semantics of the spatial
relations, i.e., how the relations are defined, (2) heuristics for evaluating spatial constraints, and
(3) estimates of the distribution, i.e., the probability of occurrence of the relations. The semantics
of the spatial relations allow for the inference of a relation from a set of given relations. For
example, given that region A is disjoint from region B and that B contains region C, it can be
inferred that A is disjoint from C [2]. Heuristics for evaluating spatial constraints are dependent on
the spatial data model and data structures used. An example of a heuristic would be the use of
Minimum Bounding Rectangles (MBRs) as a first approximation of the objects’ geometry, serving as a
fast filter. Estimates of the distribution are important, because the application often determines what
relations are feasible. For example, in a cadastral application the only possible topological relations
between land parcels are disjoint or meet.
* This work was performed while on a leave of absence from Università di L’Aquila, Dipartimento di Ingegneria Elettrica,
67040 Poggio di Roio, L’Aquila, Italy. Eliseo Clementini is partially supported by the Italian National Council of
Research (CNR) under grant No. 92.01574.PF69.
Jayant Sharma receives partial support from a University of Maine Graduate Research Assistantship and the NCGIA.
§ Max Egenhofer’s work is partially supported by NSF grant No. IRI-9309230, a grant from Intergraph Corporation, a
University of Maine Summer Faculty Research Grant, and the NCGIA through NSF grant No. SBR-8810917.
This paper focuses on the processing and algebraic optimization of spatial queries with topological
constraints. An example of such a query is, “Find all residential lots for sale adjacent to Branch
Lake,” where adjacent is a topological relation. Such relations are usually not explicitly stored
among spatial objects, but have to be inferred from the objects’ geometry. For example, the fact
that two land parcels are adjacent would be inferred from the fact that the two regions have a part
of their boundaries, but no interior, in common. While existing DBMSs do not support such
complex relations, extensible DBMSs [3] have the provisions to incorporate them into query
languages. To be successful as geographic databases, extensible DBMSs need models of how to
process and optimize queries over spatial relations.
The query processing strategies presented in this paper are based on the 9-intersection, a
comprehensive model for binary topological relations among point-, line-, and area-objects [4, 5].
It identifies eight basic relations between two regions in ℝ²; 19 topological relations between a
region and a simple line in ℝ²; and 33 relations between two simple lines in ℝ². The strategies for
processing queries with topological constraints are based on the observation that only a true subset
of the nine intersections need to be determined in order to identify the topological relation between
two spatial objects. For example, determining whether two regions are disjoint only requires
determining that two intersections, the boundary-boundary and the interior-interior, are empty
because in none of the other seven possible cases are these two intersections both empty.
The optimization of topological queries is particularly challenging because the terminology and
semantics of the relations vary across application domains. As long as no formalizations of such
spatial predicates exist that would match with humans’ interpretations, it is standard practice to
define the set of topological relations for an application domain as disjunctions of a set of basic,
mutually exclusive relations [6]. The sets of relations defined by the 9-intersection form base sets,
from which database users and administrators can construct the non-primitive sets of relations
relevant to their specific application domain. For example, an application domain may not need the
distinction between covers and contains as defined in the 9-intersection model for region-region
relations; therefore, for this user group, the set of relevant region-region relations would be
{disjoint, containsOrCovers, insideOrCoveredBy, meet, equal, overlap}. It is important that any
query optimization strategy selected will work independently of the particular combinations made.
This general applicability is of great importance since initial tests with human subjects clearly
demonstrate that humans group conceptually close relations and devise a prototypical representative
of this group [7]. Two topological relations are conceptually close if there is a transition from one
to another as a result of a gradual deformation applied to one object [8, 9]. Given this evidence it is
unlikely that all users would need or grasp the nuances between the 19 distinct region-line or 33
line-line relations in their applications. The particular subset, or subgroupings, of these relations
will necessarily vary with the application domain. Hence any query processing strategy based on
the base sets of relations must be applicable to such domain specific groupings. In this paper we
give examples of how the strategies presented can be directly applied to such cases, and therefore,
these strategies provide a degree of flexibility and customizability not typically found in
conventional spatial query processors.
We assume an object-centered geographic database. Object-centered geographic databases [1] have
a vector or topological data model and deal with spatial objects that have a distinct identity, e.g., in
the form of simplicial complexes or cells [10, 11]. This spatial data model represents the geometry
of geographic objects in terms of points, lines, and areas, and records explicitly boundary and co-
boundary relations among the geometric elements.
There are two types of queries whose processing requires computing the values of
the nine intersections between the interiors, boundaries, and exteriors:
(1) “Find all objects that have the topological relation R to object A,” and
(2) “What is the topological relation between objects A and B?”
The latter type of query is asked less frequently, although it is as important as the former
in geographic applications. The results of these queries have been called “qualitative answers”
[12].
The goal of this paper is to present the most promising strategies for processing topological
queries. The novel result is an algorithm to determine the smallest subset of the nine intersections
that have to be evaluated. We demonstrate that such spatial queries can frequently be answered
with less effort than computing all nine intersections for a topological relation. We start with a
“naive” model of query processing, assuming that the computation of all intersections is of equal
complexity, that all relations are equally distributed, and that all relations are equally frequently
queried. The naive model is then refined by assigning a probability distribution to the set of
relations. For example, in a cadastral application the most frequent topological relation is “disjoint”
followed by “meet.”
The framework introduced in this paper allows a query processor to find the best strategy to assess
a topological relation between two spatial entities. The 9-intersection model gives the primitives for
describing a topological relation, and by using a combination of such primitives it is possible to
model every set of topological relations. The same elementary tools can be used even when such
complex geographic objects as regions with holes or 1-spheres are involved [13].
The remainder of this paper is structured as follows: after a brief summary of the pertinent work in
spatial query optimization, we review the concepts of the 9-intersection as the model for which we
will investigate query processing strategies. As a first step, we introduce the mappings from
topological relations as defined by the 9-intersection onto relations between MBRs and show how
such knowledge can be exploited as a fast filter to find candidates that would satisfy a particular
topological relation. Subsequently, we design algorithms to select objects from the candidates for
two types of spatial queries: (1) finding the set of objects that hold a particular topological relation,
or set of topological relations, with respect to a given object; and (2) determining the topological
relation between two given objects. For the latter we compare the approach for the naive cost
model with a refined cost model.
2. Previous Work in Spatial Query Optimization
Most approaches to spatial query processing reported in the literature optimize spatial queries by
transforming user queries into evaluation plans that take into account the extended physical storage
mechanisms and access methods. This section reviews four such approaches and compares them
with the goals of this paper.
2.1 Spatial and Non-Spatial Database (SAND)
Aref and Samet [14, 15] describe strategies for constructing query evaluation plans and assessing
their costs. SAND consists of separate spatial and non-spatial data stores, which have two-way
links between records that describe the spatial and non-spatial attributes of some object. The
strategies essentially extend traditional non-spatial approaches with a spatial selection, which
retrieves records from the spatial data store that satisfy the given constraints. The evaluation plans
involve reordering the selections, i.e., choosing between performing the spatial or non-spatial
selection first; using indexes on both spatial and non-spatial data; and performing spatial operations
while accessing the data rather than storing links in a temporary store and subsequently traversing
these records to perform the desired operations. The major contribution of SAND is its
extensibility because the techniques are applicable to various data types, not only to spatial data.
2.2 GEOQL
Another proposal for extending a traditional DBMS and query processor uses an augmented SQL,
called GEOQL, which has spatial operators and works with an extended SQL-based DBMS [16].
Queries are optimized in four stages. (1) A logical transformation removes redundant constraints
and builds a query tree such that spatial indexes can be effectively utilized; (2) decomposition
partitions the tree such that spatial and non-spatial subqueries are formed; (3) from this set of
subqueries various execution plans are formulated and evaluated, one of which is executed; and (4)
the spatial subquery is handled by an auxiliary spatial processor, which is part of the extended
DBMS, and stores spatial attributes. The query processing strategies are designed to maximize the
advantages of using this spatial processor by extending the traditional query handling techniques of
decomposition and rewrite rules to work with spatial data. The benefits of GEOQL lie in the
simplicity of the extensions, which facilitate the use of existing optimizers.
2.3 GRAL
Becker and Güting [17] approach spatial query processing from a fundamental and formal
standpoint. Gral is an extensible database system with geometric data types and a geometric query
language, called geo-relational algebra. Unlike the 9-intersection formalism [4], however, the
classification and semantics of the spatial relations between objects are not explicitly specified. In
Gral, the query language and the executable query plans are based on a many-sorted algebra. This
formalism defines the source and target languages of the optimization, as well as the optimization
rules themselves, providing the major advantage of a uniform framework for extensibility and
optimization strategies. Optimization is then a process of transformation and translation of algebraic
expressions. The rules governing these transformations and translations can be general and
independent of the operations in the query language or they can be specialized, facilitating the
inclusion of new spatial data types and operations.
2.4 Spatial Joins
Günther [18] examines the efficient computation of spatial joins where two datasets are related to
each other by some spatial constraint. Traditional join processing strategies are ill-suited for this
problem, because there is no linear ordering that preserves spatial proximity and hence sort-merge
cannot be used. Günther describes a class of tree structures called generalization trees that can be
used to devise efficient means of join computations. The generalization tree is suited for any
application where the data has inherent containment hierarchies. Spatial joins are computed by
considering nodes at a higher level in the tree in order to determine the branches that may contain
candidates on which the spatial join condition is performed. The concept of hierarchy of detail is
applicable to topological relations in the 9-intersection model too, with MBRs forming the coarse
level and the actual geometry being the detailed level.
2.5 Comparison with Present Work
Our concern is more fundamental and yet also more abstract than previous approaches in that it
addresses issues at a conceptually higher level. The fundamental question being investigated is that
of a spatial data model and a formal specification of this model. We use the mathematical definition
of binary topological relations [4]. In particular we propose means of reducing the computations
involved in evaluating a topological spatial constraint. The strategies presented here have three key
benefits: (1) they are based on a well-defined formalism, (2) they are customizable so that users or
spatial database administrators may redefine or limit the topological relations needed for their
specific application domain and still use the strategies presented here, and (3) the strategies can take
into account important statistical information about the data, the probability distribution, or
estimates, of the occurrence of various topological relations.
3. 9-Intersection Model
Topological relations are spatial relations that are preserved under such transformations as rotation,
scaling, and rubber sheeting. The model for binary topological relations used in this paper is based
on the usual concepts of point-set topology with open and closed sets. The binary topological
relation between two objects, A and B, in ℝ² is based upon the intersection of A’s interior (A°),
boundary (∂A), and exterior (A⁻) with B’s interior (B°), boundary (∂B), and exterior (B⁻). The
nine intersections between the six object parts describe a topological relation and can be concisely
represented by a 3×3 matrix, I_9, called the 9-intersection (Equation 1).
$$
I_9(A,B) = \begin{pmatrix}
A^{\circ} \cap B^{\circ} & A^{\circ} \cap \partial B & A^{\circ} \cap B^{-} \\
\partial A \cap B^{\circ} & \partial A \cap \partial B & \partial A \cap B^{-} \\
A^{-} \cap B^{\circ} & A^{-} \cap \partial B & A^{-} \cap B^{-}
\end{pmatrix} \qquad (1)
$$
By considering the values empty (0) and non-empty (1), one can distinguish between 2⁹ = 512
binary topological relations. Only a small subset of them can be realized when the objects of
concern are embedded in ℝ² [19]. For two regions (connected, homogeneously 2-dimensional
areas with connected boundaries) in ℝ², there are eight relations which provide a mutually
exclusive complete coverage [4] (Figure 1). The terms adopted for them (disjoint, meet, overlap,
equal, contains, inside, covers, and coveredBy) will often be shortened to d, m, o, e, ct, i, cv, and
cB.
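relation     9-intersection matrix (rows: A°, ∂A, A⁻; columns: B°, ∂B, B⁻)
disjoint     [0 0 1; 0 0 1; 1 1 1]
contains     [1 1 1; 0 0 1; 0 0 1]
inside       [1 0 0; 1 0 0; 1 1 1]
equal        [1 0 0; 0 1 0; 0 0 1]
meet         [0 0 1; 0 1 1; 1 1 1]
covers       [1 1 1; 0 1 1; 0 0 1]
coveredBy    [1 0 0; 1 1 0; 1 1 1]
overlap      [1 1 1; 1 1 1; 1 1 1]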
Figure 1: The eight topological relations between two spatial regions and their corresponding 9-
intersection matrices [2].
4. Exploiting Minimal Bounding Rectangles
Geographic databases that employ one of the common spatial access methods [20, 21] usually
store for each spatial object its MBR. An MBR is an X-Y-parallel rectangle that fully encloses the
geometry of the spatial object. While the MBR is usually only a crude approximation of the
object’s geometry, it is sufficient in most cases to locate the object in space; therefore, spatial
access methods use MBRs as the primary criterion when selecting the physical page on which the
spatial object will be stored. Likewise, when accessing objects based upon their spatial location,
the MBR is used as a filter to determine whether it is worthwhile to load an object from disk into
memory for more detailed tests.
When processing topological queries, the situation is somewhat different. Due to the potential
inconsistencies between topology and metric information, no immediate conclusions can be drawn
from the relations among the approximations. Occasionally, however, the MBR information can be
exploited to make very fast decisions as to whether an object qualifies to fulfill a particular relation
or not. This filter method is based on the fact that there are some consistent mappings between
MBR relations and the topological relations of interest. Subsequently, we will develop the
mappings between MBR relations and the topological relations between regions. Similar
correspondences can be found for the more extensive region-line and line-line relations.
Provided the MBRs are “tight” approximations, i.e., no space is wasted, and no inconsistencies
due to the number system exist, the mappings in Table 1 hold from the topological relations
between MBRs onto the topological relations between the exact geometric representations. This set
of mappings is relevant for queries in which one asks for the topological relation between two
given objects. The most dramatic performance improvement will occur if the MBRs are disjoint,
because this implies that their topological relation is disjoint as well. No further analysis of the
topological relation will be necessary. Considerable improvements are also obtained if the MBRs
meet, because in this case only the boundary-boundary intersection has to be evaluated.
MBR relation    topological relation
d_M             d_r
m_M             d_r ∨ m_r
o_M             d_r ∨ m_r ∨ o_r
i_M             d_r ∨ m_r ∨ o_r ∨ i_r ∨ cB_r
cB_M            d_r ∨ m_r ∨ o_r ∨ i_r ∨ cB_r
ct_M            d_r ∨ m_r ∨ o_r ∨ ct_r ∨ cv_r
cv_M            d_r ∨ m_r ∨ o_r ∨ ct_r ∨ cv_r
e_M             e_r ∨ o_r ∨ cB_r ∨ cv_r
Table 1: Mappings from MBR relations onto exact topological relations.
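The first filtering step can be sketched in Python as follows. This is only an illustration under stated assumptions: MBRs are non-degenerate axis-parallel rectangles with exact coordinates, and the names `Rect`, `mbr_relation`, `TABLE_1`, and `candidate_relations` are hypothetical, not taken from the paper. The dictionary reproduces the rows of Table 1 as printed.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-parallel rectangle; assumes xmin < xmax and ymin < ymax."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def mbr_relation(a: Rect, b: Rect) -> str:
    """Topological relation between two non-degenerate rectangles."""
    if a == b:
        return "equal"
    # The rectangles share no point at all.
    if a.xmax < b.xmin or b.xmax < a.xmin or a.ymax < b.ymin or b.ymax < a.ymin:
        return "disjoint"
    # They share points, but the open interiors do not intersect.
    interiors_intersect = (a.xmin < b.xmax and b.xmin < a.xmax and
                           a.ymin < b.ymax and b.ymin < a.ymax)
    if not interiors_intersect:
        return "meet"
    if (a.xmin <= b.xmin and a.ymin <= b.ymin and
            a.xmax >= b.xmax and a.ymax >= b.ymax):
        strict = (a.xmin < b.xmin and a.ymin < b.ymin and
                  a.xmax > b.xmax and a.ymax > b.ymax)
        return "contains" if strict else "covers"
    if (b.xmin <= a.xmin and b.ymin <= a.ymin and
            b.xmax >= a.xmax and b.ymax >= a.ymax):
        strict = (b.xmin < a.xmin and b.ymin < a.ymin and
                  b.xmax > a.xmax and b.ymax > a.ymax)
        return "inside" if strict else "coveredBy"
    return "overlap"


# Table 1: MBR relation -> exact relations that remain possible.
_IN = {"disjoint", "meet", "overlap", "inside", "coveredBy"}
_CT = {"disjoint", "meet", "overlap", "contains", "covers"}
TABLE_1 = {
    "disjoint": {"disjoint"},
    "meet": {"disjoint", "meet"},
    "overlap": {"disjoint", "meet", "overlap"},
    "inside": _IN,
    "coveredBy": _IN,
    "contains": _CT,
    "covers": _CT,
    "equal": {"equal", "overlap", "coveredBy", "covers"},
}


def candidate_relations(mbr_a: Rect, mbr_b: Rect) -> set:
    """Exact relations still possible after looking only at the MBRs."""
    return TABLE_1[mbr_relation(mbr_a, mbr_b)]
```

When `candidate_relations` returns a single relation, as it does for disjoint MBRs, no intersection between the exact geometries has to be computed.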
From Table 1, the reverse mapping from exact topological relations onto MBR relations can be
derived in a straightforward manner (Table 2). This mapping indicates in which cases the
consideration of MBR relations will improve performance of queries asking for all objects that
satisfy a given topological constraint with respect to a given object. The use of the MBR here is as
a filter since it eliminates all those cases that can be disregarded right away, because they will never
fulfill the particular topological constraint. For example, if the query is to find all objects that are
equal to a given object A, one first maps the topological relation e_r onto e_M (Table 2) to make a fast
evaluation based on the spatial access method. Afterwards, for each object B whose MBR is equal
to A’s MBR, one has to evaluate whether the topological relation between the exact representations
is e_r rather than o_r, cB_r, or cv_r, the other relations that Table 1 lists for e_M.
topological relation    MBR relations
d_r                     d_M ∨ m_M ∨ o_M ∨ i_M ∨ cB_M ∨ ct_M ∨ cv_M
m_r                     m_M ∨ o_M ∨ e_M ∨ i_M ∨ cB_M ∨ ct_M ∨ cv_M
o_r                     o_M ∨ e_M ∨ i_M ∨ cB_M ∨ ct_M ∨ cv_M
i_r                     i_M ∨ cB_M
ct_r                    ct_M ∨ cv_M
cB_r                    i_M ∨ cB_M ∨ e_M
cv_r                    ct_M ∨ cv_M ∨ e_M
e_r                     e_M
Table 2: Mappings from exact topological relations onto MBR relations.
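A minimal sketch of how Table 2 could drive candidate selection for queries of the first type is given below. The dictionary reproduces Table 2 as printed; the function name `filter_candidates`, the object identifiers, and the idea of passing precomputed MBR relations in a dictionary are assumptions made for the example.

```python
# Table 2: exact relation -> MBR relations under which it can occur.
TABLE_2 = {
    "disjoint": {"disjoint", "meet", "overlap", "inside", "coveredBy",
                 "contains", "covers"},
    "meet": {"meet", "overlap", "equal", "inside", "coveredBy",
             "contains", "covers"},
    "overlap": {"overlap", "equal", "inside", "coveredBy", "contains", "covers"},
    "inside": {"inside", "coveredBy"},
    "contains": {"contains", "covers"},
    "coveredBy": {"inside", "coveredBy", "equal"},
    "covers": {"contains", "covers", "equal"},
    "equal": {"equal"},
}


def filter_candidates(target_relations, mbr_relation_by_object):
    """Keep the objects whose MBR relation admits at least one target relation.

    target_relations: exact relations asked for (a disjunction).
    mbr_relation_by_object: object id -> relation between the query
        object's MBR and that object's MBR.
    """
    admissible = set().union(*(TABLE_2[r] for r in target_relations))
    return [oid for oid, rel in mbr_relation_by_object.items()
            if rel in admissible]


# Hypothetical MBR relations of four stored objects with respect to object A.
observed = {"lot1": "disjoint", "lot2": "equal", "lot3": "covers", "lot4": "inside"}
print(filter_candidates({"equal"}, observed))               # ['lot2']
print(filter_candidates({"contains", "covers"}, observed))  # ['lot2', 'lot3']
```

Only the surviving objects need to be loaded for the detailed evaluation of their intersections.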
5. Queries on Whether Two Objects Satisfy a Set of Topological Relations
The set of intersections to be calculated must be such that the content, empty or non-empty, of the
intersections is sufficient to identify the relation Ri. This leads to the definition of a minimal subset
of the nine intersections. The selection of the subset, which uniquely characterizes a relation Ri,
can be optimally determined for each of the relations. For example, the relation meet between two
regions is characterized by the following 9-intersection matrix:
$$
M_{meet} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \qquad (2)
$$
Comparing this matrix with the matrices of the other 7 relations (Figure 1), it can be determined
that only M_meet matches the template
$$
\begin{pmatrix} 0 & \delta & \delta \\ \delta & 1 & \delta \\ \delta & \delta & \delta \end{pmatrix}
$$
where each δ indicates a don’t care value, because only the relation meet has a non-empty
boundary-boundary intersection and an empty interior-interior intersection. Thus the minimal
subset for meet is {interior-interior, boundary-boundary}.
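The template test described above can be checked mechanically. The following sketch encodes the eight matrices of Figure 1 and the meet template, using `None` for the don’t care value δ; the names `MATRICES`, `MEET_TEMPLATE`, and `matches` are illustrative only.

```python
# 9-intersection matrices of the eight region-region relations (Figure 1).
# Rows: interior, boundary, exterior of A; columns: the same parts of B.
# 0 = empty intersection, 1 = non-empty intersection.
MATRICES = {
    "disjoint":  ((0, 0, 1), (0, 0, 1), (1, 1, 1)),
    "contains":  ((1, 1, 1), (0, 0, 1), (0, 0, 1)),
    "inside":    ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "equal":     ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    "meet":      ((0, 0, 1), (0, 1, 1), (1, 1, 1)),
    "covers":    ((1, 1, 1), (0, 1, 1), (0, 0, 1)),
    "coveredBy": ((1, 0, 0), (1, 1, 0), (1, 1, 1)),
    "overlap":   ((1, 1, 1), (1, 1, 1), (1, 1, 1)),
}

D = None  # don't care value (delta in the template)
MEET_TEMPLATE = ((0, D, D),
                 (D, 1, D),
                 (D, D, D))


def matches(template, matrix):
    """True if the matrix agrees with every specified entry of the template."""
    return all(t is None or t == v
               for t_row, m_row in zip(template, matrix)
               for t, v in zip(t_row, m_row))


print([name for name, m in MATRICES.items() if matches(MEET_TEMPLATE, m)])
# ['meet']
```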
The following is an algorithm to determine the minimal subset. It is assumed that the computational
cost of each of the nine possible intersections, for an arbitrary 9-intersection, is equal. This
problem of identifying the minimal distinguishing subset is a form of the Minimum Set Cover
problem [22]. We use a greedy heuristic [23] to obtain the minimal sets in our application. R is the
set of binary topological relations. A and B are disjoint subsets of R such that A ∪ B = R. M is the
set of 9-intersection matrices for R. Each M_i in M is a 3×3 binary-valued matrix. D is the set of
XOR matrices D_ij, where D_ij is M_ai XOR M_bj. Thus D_ij[k, l] = 1, where k, l ∈ {°, ∂, ⁻}, if that
intersection value distinguishes relation a_i from b_j.
Algorithm 1: Determine the minimal subset, P, of the 9-intersection that distinguishes between
mutually disjoint subsets A and B of the set of relations R.
procedure MinimalSet (R, A, B, M)
    for each relation a_i in A
        for each relation b_j in B
            Construct the XOR matrices D_ij = M_ai ⊕ M_bj;
    Construct the set of sets I, with elements I_kl, where k, l ∈ {°, ∂, ⁻} and I_kl is the set of
    a_i-b_j relation pairs such that M_ai[k, l] ≠ M_bj[k, l].
    Apply the minimum set cover [21] to find the subset of I such that their union is {a_i × b_j |
    a_i ∈ A and b_j ∈ B}. From this subset create P_kl = {(k, l) | k, l ∈ {°, ∂, ⁻}}. P_kl is the
    minimal set of intersections that is sufficient to distinguish relations in A from those in B.
    return P_kl;
For the set of eight region-region relations, the following templates represent the minimal subsets.
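$$
R_{contains} = \begin{pmatrix} \delta & 1 & \delta \\ \delta & 0 & \delta \\ \delta & \delta & \delta \end{pmatrix}
\quad
R_{inside} = \begin{pmatrix} \delta & \delta & \delta \\ 1 & 0 & \delta \\ \delta & \delta & \delta \end{pmatrix}
\quad
R_{disjoint} = \begin{pmatrix} 0 & \delta & \delta \\ \delta & 0 & \delta \\ \delta & \delta & \delta \end{pmatrix}
$$
$$
R_{covers} = \begin{pmatrix} \delta & 1 & \delta \\ 0 & 1 & \delta \\ \delta & \delta & \delta \end{pmatrix}
\quad
R_{coveredBy} = \begin{pmatrix} \delta & 0 & \delta \\ 1 & 1 & \delta \\ \delta & \delta & \delta \end{pmatrix}
\quad
R_{overlap} = \begin{pmatrix} \delta & 1 & \delta \\ 1 & \delta & \delta \\ \delta & \delta & \delta \end{pmatrix}
$$
$$
R_{equal} = \begin{pmatrix} \delta & \delta & \delta \\ 0 & \delta & 0 \\ \delta & \delta & \delta \end{pmatrix}
\quad
R_{meet} = \begin{pmatrix} 0 & \delta & \delta \\ \delta & 1 & \delta \\ \delta & \delta & \delta \end{pmatrix}
$$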
The worst case is for covers and coveredBy, where three intersection values must be computed,
while all other relations can be uniquely determined with only two intersections. This is a
considerable improvement over computing all nine intersections.
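Algorithm 1 can be sketched in Python as shown below. This is one possible reading, not the authors’ implementation: the set cover step uses the standard greedy rule (repeatedly pick the position that separates the most still-unseparated pairs), ties are broken by row-major position order, and the names `minimal_set`, `MATRICES`, and `POSITIONS` are assumptions. Because ties can be broken differently, the positions chosen may differ from the templates above even when the subset sizes agree.

```python
from itertools import product

# 9-intersection matrices (Figure 1). Rows: interior, boundary, exterior of A;
# columns: the same parts of B. 0 = empty, 1 = non-empty.
MATRICES = {
    "disjoint":  ((0, 0, 1), (0, 0, 1), (1, 1, 1)),
    "contains":  ((1, 1, 1), (0, 0, 1), (0, 0, 1)),
    "inside":    ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "equal":     ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    "meet":      ((0, 0, 1), (0, 1, 1), (1, 1, 1)),
    "covers":    ((1, 1, 1), (0, 1, 1), (0, 0, 1)),
    "coveredBy": ((1, 0, 0), (1, 1, 0), (1, 1, 1)),
    "overlap":   ((1, 1, 1), (1, 1, 1), (1, 1, 1)),
}
POSITIONS = list(product(range(3), range(3)))  # (row, column) pairs


def minimal_set(a_rels, b_rels, matrices=MATRICES):
    """Greedy set cover over the positions that distinguish A from B."""
    pairs = set(product(a_rels, b_rels))
    # I[k, l]: the (a, b) pairs whose matrices differ at position (k, l),
    # i.e. the pairs whose XOR matrix has a 1 there.
    separated = {
        (k, l): {(a, b) for a, b in pairs
                 if matrices[a][k][l] != matrices[b][k][l]}
        for k, l in POSITIONS
    }
    chosen, uncovered = [], set(pairs)
    while uncovered:
        best = max(POSITIONS, key=lambda p: len(separated[p] & uncovered))
        if not separated[best] & uncovered:
            raise ValueError("a relation in A has the same matrix as one in B")
        chosen.append(best)
        uncovered -= separated[best]
    return chosen


for name in MATRICES:
    others = [r for r in MATRICES if r != name]
    print(name, len(minimal_set([name], others)))
```

With the eight matrices of Figure 1, this sketch selects three positions for covers and coveredBy and two for every other relation, in line with the counts stated above.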
The same algorithm, used to determine the minimal sets above, applies to queries with disjunctions
of topological relations over the same objects. For example, Algorithm 1 used on the constraints in
the query, “Find object b such that A contains b ∨ A covers b ∨ A equal b,” gives the minimal set
represented by the following template:
$$
R_{ct \vee cv \vee e} = \begin{pmatrix} \delta & \delta & \delta \\ \delta & \delta & \delta \\ \delta & 0 & \delta \end{pmatrix} \qquad (3)
$$
Therefore, for each object b, only one intersection (between A’s exterior and b’s boundary; the
intersection between A’s exterior and b’s interior discriminates equally well) must be
computed in lieu of seven (2 each for contains and equal, and 3 for covers).
Only disjunctions of the base relations are meaningful combinations. Conjunctions of relations
over the same objects are not meaningful since each basis set consists of mutually exclusive
relations [4]; therefore, queries with conjunctions of topological relations over the same objects
result in an empty set. Since the set of relations is closed, negations can be expressed as the
complement, i.e., as disjunctions of the remaining relations.
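Treating each constraint as a set of base relations makes these three cases easy to see. The sketch below is an illustration only; the helper names `disjunction`, `conjunction`, and `negation` are assumptions.

```python
BASE = frozenset({"disjoint", "meet", "overlap", "equal",
                  "contains", "inside", "covers", "coveredBy"})


def disjunction(*relation_sets):
    """A disjunction of constraints is the union of their relation sets."""
    return frozenset().union(*relation_sets)


def conjunction(*relation_sets):
    """A conjunction over the same object pair is the intersection."""
    return BASE.intersection(*relation_sets)


def negation(relation_set):
    """A negation is the complement within the closed set of base relations."""
    return BASE - frozenset(relation_set)


# A domain-specific grouping built from base relations.
contains_or_covers = disjunction({"contains"}, {"covers"})

# Two different base relations can never hold together.
assert conjunction({"meet"}, {"overlap"}) == frozenset()

# "not disjoint" is the disjunction of the seven remaining relations.
assert negation({"disjoint"}) == BASE - {"disjoint"}
```

A set produced this way can be split into A (the relations asked for) and B (the rest) and passed to Algorithm 1 unchanged.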
6. Queries on the Topological Relation Between Two Objects
Queries on the topological relation between two objects are computationally expensive since the
relation between two given objects must be uniquely determined. The basic strategy for such
queries is the construction of a decision tree that partitions the search space at each node, thereby
progressively excluding all the other relations. In the best case, at each step of the decision process
the search space gets partitioned into two halves, therefore requiring log₂ n computations, where n
is the number of relations. In the case of region-region relations, n = 8 and so at most three steps
are necessary.
The decision tree for topological relations characterized by 9-intersections is based on the values of
these intersections and different trees will be obtained by selecting different discriminants at the
nodes. For example, if region-region relations are being considered and the discriminant at the root
of the tree is the boundary-boundary intersection, then the relations in the ∅-valued and ¬∅-valued
child nodes are {disjoint, contains, inside} and {meet, equal, covers, coveredBy, overlap},
respectively. Hence for the specific problem at hand the decision tree will not be perfectly
balanced. The goal however is to build a near-optimal tree.
6.1 Naive Cost Model
The naive cost model assumes that all relations occur with equal probability. For a given tree, T,
the computation cost, l_r^T, associated with a relation r is
$$
l_r^T = w_r \, d_r^T \qquad (4)
$$
where d_r^T is the depth of the node at which a particular relation is identified and w_r is the weight
associated with relation r.
Definition 1. The characteristic, χ_T, of a tree T is the sum of the computation costs for all
relations, that is,
$$
\chi_T = \sum_r l_r^T = \sum_r w_r \, d_r^T \qquad (5)
$$
For the naive model, the characteristic of a tree is the average path length, since w_r = 1/N for all N
relations. Figure 2 shows the decision tree for the eight region-region relations that results from
using a naive cost model.
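[Figure 2 diagram: a binary decision tree whose internal nodes test whether a single intersection (for example, ∂∩∂ = ∅ or °∩° = ∅) is empty, with branches labeled T and F, and whose leaves are the relations d, m, o, e, ct, i, cv, and cB.]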
Figure 2: The decision tree for the eight region-region relations based on the naive cost model.
6.2 Refined Cost Model
The naive model can be refined by assuming a certain frequency distribution of relations and a
different cost for each operation. A refined model would assign a frequency distribution to the
instances of the relations in a database. For example, in a cadastral map of n parcels a fair estimate
is that 95% of all relations are disjoint, while the remaining 5% are meet. The cost of each
operation, i.e., computing an intersection, is highly dependent on the choice of data structure and
spatial access methods. Operation costs do not influence the choice of the tree, since on average all
operations need to be calculated once in the tree; therefore, the characteristic of the tree is stable
with respect to the cost of the operations. This section presents an algorithm to build a near optimal
decision tree that considers the expected frequency of occurrence of relations for a particular data
set.
To illustrate such a refined cost model, we will use the following distribution for region-region
relations: If one computes all n² topological relations for a given data set, 80% of all relations are
disjoint; 10% are meet; 5% are overlap; 2% each are contains and inside; 0.4% each are covers and
coveredBy; and 0.2% are equal. These percentages correspond to the weights w_d = 0.8, w_m = 0.1, etc.
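The effect of the weights on the characteristic (Equation 5) can be illustrated with a short computation. The weights below are those given above; the two depth assignments are hypothetical and serve only to show how the characteristic responds: `BALANCED` assumes an idealized tree with every leaf at depth 3, and `SKEWED` assumes a tree that reaches disjoint and meet after two tests at the price of deeper leaves elsewhere.

```python
RELATIONS = ["disjoint", "meet", "overlap", "contains",
             "inside", "covers", "coveredBy", "equal"]

# Naive model: every relation has weight 1/N.
NAIVE = {r: 1 / len(RELATIONS) for r in RELATIONS}

# Refined model: the frequency distribution given in the text.
REFINED = {"disjoint": 0.8, "meet": 0.1, "overlap": 0.05,
           "contains": 0.02, "inside": 0.02,
           "covers": 0.004, "coveredBy": 0.004, "equal": 0.002}


def characteristic(weights, depths):
    """Equation (5): sum over all relations of weight times leaf depth."""
    return sum(weights[r] * depths[r] for r in weights)


# Hypothetical leaf depths (assumptions for illustration only).
BALANCED = {r: 3 for r in RELATIONS}
SKEWED = {"disjoint": 2, "meet": 2, "contains": 3, "inside": 3,
          "overlap": 4, "covers": 4, "coveredBy": 4, "equal": 4}

for label, weights in (("naive", NAIVE), ("refined", REFINED)):
    print(label,
          round(characteristic(weights, BALANCED), 3),
          round(characteristic(weights, SKEWED), 3))
```

Under these assumed depths the naive weights favor the balanced tree (3 versus 3.25), whereas the refined weights favor the skewed tree (3 versus about 2.16).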
The following is a greedy algorithm for building a decision tree. At each node, if a particular
relation is already being identified, the algorithm continues with that relation; otherwise, it
considers the relation with the highest weight. L is the list of relations, and P is the set of
intersections (e.g., interior-interior) used so far. Initially P = {}. When a leaf node is reached,
P contains the set of intersections that are sufficient for identifying the relation in the leaf node.
M is the set of 9-intersection matrices for the relations in L.
Algorithm 2. Build a decision tree, T, for determining the relation between objects.
procedure BuildTree (L, P, M);
    k := first (L);
    Now consider the template Rk;
    Create the list of positions [x, y] such that Rk[x, y] ≠ δ;
    for each position [x, y] that is not in the set P of positions used
        for each relation i in list L
            if Ri[x, y] ≠ δ then increment the count for position [x, y];
    Choose the position with the highest count. Break ties arbitrarily.
    for each relation i in list L
        if Mi[x, y] = Ø then add relation i to the sublist LT;
        else add relation i to the sublist LF;
    Add the chosen position to the set of positions used, i.e., PT := P ∪ {[x, y]}; PF := P ∪ {[x, y]};
    if |LT| > 1 then T := BuildTree (LT, PT, M);
    if |LF| > 1 then T := BuildTree (LF, PF, M);
    return T;
The algorithm builds the decision tree recursively using the list L of relations sorted in descending
order of their expected frequency of occurrence in the database. The first relation, Rk, in this list is
considered and a discriminant, by which L will be partitioned into two sublists, is chosen. Each
relation in the list L has a set of intersections, for example boundary-boundary, that can be used as
a discriminant. For each such intersection a count is kept of its presence in the sets for the relations
in L. Now from the set of intersections for Rk that have not been used as a discriminant, the one
with the maximum count is chosen as the current discriminant. List L is then partitioned into two
sublists LF and LT and the corresponding sets PF and PT of intersections used as discriminants
are also set up for the next recursive call. Note that the sublists are also sorted in descending order,
and hence if relation Rk is in one of the lists it will be the first element of that list. This ensures that
relation Rk is identified before any other relation on the list.
Algorithm 2 ensures that the most frequently occurring relation is determined in the minimum
number of steps. The same criterion is applied at each node of the decision tree. For example, when
Algorithm 2 is applied to region-region relations, with the frequency distribution discussed at the
beginning of this section, the ordered list of relations is L = (d, m, o, ct, i, cv, cB, e). Disjoint is
the first relation considered, and position [1,1], denoting the boundary-boundary intersection, is the
discriminant. Since Md[1,1] = Ø, the sublists are LT = (d, ct, i) and LF = (m, o, cv, cB, e). The
procedure is invoked recursively until the leaf nodes are reached, resulting in the decision tree
shown in Figure 3.
Figure 3: The decision tree for the eight region-region relations based on a refined cost model.
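As an illustration of Algorithm 2, the following Python sketch builds a tree for the eight region-region relations and reports the depth of each leaf. The matrices in M are the standard 9-intersection matrices (1 = non-empty, 0 = empty; index 0 = interior, 1 = boundary, 2 = exterior). The templates in R are an assumption made for this sketch: each lists one possible set of relevant positions (read here as the entries that differ from a don't-care value δ) that suffices to identify the relation. They are not taken from the paper, and the paper's own templates may differ. Ties are broken by taking the first candidate, and all function and variable names are hypothetical.

```python
# Weights of the refined cost model (Section 6.2).
W = {"d": 0.8, "m": 0.1, "o": 0.05, "ct": 0.02, "i": 0.02,
     "cv": 0.004, "cB": 0.004, "e": 0.002}

# 9-intersection matrices: rows = parts of A, columns = parts of B,
# index 0 = interior, 1 = boundary, 2 = exterior; 1 = non-empty, 0 = empty.
M = {
    "d":  ((0, 0, 1), (0, 0, 1), (1, 1, 1)),
    "m":  ((0, 0, 1), (0, 1, 1), (1, 1, 1)),
    "o":  ((1, 1, 1), (1, 1, 1), (1, 1, 1)),
    "ct": ((1, 1, 1), (0, 0, 1), (0, 0, 1)),
    "i":  ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "cv": ((1, 1, 1), (0, 1, 1), (0, 0, 1)),
    "cB": ((1, 0, 0), (1, 1, 0), (1, 1, 1)),
    "e":  ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
}

# ASSUMED templates (not from the paper): relevant positions per relation.
OO, OB, BO, BB = (0, 0), (0, 1), (1, 0), (1, 1)
R = {"d": [OO, BB], "m": [OO, BB], "o": [OB, BO],
     "ct": [BB, OB], "i": [BB, BO],
     "cv": [OB, BB, BO], "cB": [BO, BB, OB], "e": [OO, OB, BO]}


def build_tree(L, P=frozenset()):
    """Return a leaf (relation name) or a node (position, subtree_T, subtree_F)."""
    if len(L) == 1:
        return L[0]
    k = L[0]  # most frequent relation still in the list
    # Raises ValueError if the template of k has no unused position left,
    # which cannot happen when every template identifies its relation.
    candidates = [p for p in R[k] if p not in P]
    # Count in how many templates of the relations in L each candidate occurs;
    # max() returns the first maximal candidate, which settles ties.
    x, y = max(candidates, key=lambda p: sum(p in R[i] for i in L))
    LT = [i for i in L if M[i][x][y] == 0]  # intersection is empty
    LF = [i for i in L if M[i][x][y] == 1]  # intersection is non-empty
    used = P | {(x, y)}
    return ((x, y),
            build_tree(LT, used) if LT else None,
            build_tree(LF, used) if LF else None)


def leaf_depths(tree, depth=0):
    """Map each relation to the number of tests on its root-to-leaf path."""
    if tree is None:
        return {}
    if isinstance(tree, str):
        return {tree: depth}
    _, t, f = tree
    return {**leaf_depths(t, depth + 1), **leaf_depths(f, depth + 1)}


L = sorted(W, key=W.get, reverse=True)  # stable sort: ties keep insertion order
tree = build_tree(L)
depths = leaf_depths(tree)
print(tree[0], depths)
```

Under these assumed templates the root discriminant is position [1,1] and the two sublists are (d, ct, i) and (m, o, cv, cB, e), as in the example above. The leaf depths come out as 2 for d and m, 3 for ct and i, and 4 for the remaining relations, which agrees with the depth factors used in Equation 8.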
6.3 Characteristic of the Tree
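The characteristic of a tree represents the average number of computations that are needed in order
to assess a relation. It is defined as the weighted sum over all relations r, where $w_r$ is the weight
of relation r and $d_r^T$ is the depth of relation r in tree T:
$\chi_T = \sum_{r} w_r \, d_r^T$    (6)
The tree for the naive cost model (Figure 2) has the characteristic:
$\chi_{T_n} = 0.8 \times 3 + 0.1 \times 4 + 0.05 \times 3 + 0.02 \times 2 + 0.02 \times 3 + 0.004 \times 3 + 0.004 \times 3 + 0.002 \times 4 = 3.082$    (7)
Using the same distribution, the refined cost model tree (Figure 3) has the characteristic:
$\chi_{T_r} = 0.8 \times 2 + 0.1 \times 2 + 0.02 \times 3 + 0.02 \times 3 + 0.05 \times 4 + 0.004 \times 4 + 0.004 \times 4 + 0.002 \times 4 = 2.16$    (8)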
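The arithmetic in Equations 7 and 8 can be checked with a few lines of Python. This is a minimal sketch: the function name `characteristic` and the list layout are illustrative, and the (weight, depth) pairs are copied term by term from the two equations.

```python
def characteristic(pairs):
    """Weighted sum of leaf depths: sum of w_r * d_r over all relations."""
    return sum(w * d for w, d in pairs)


# (weight, depth) pairs in the order the terms appear in Equation 7 (naive tree).
naive = [(0.8, 3), (0.1, 4), (0.05, 3), (0.02, 2), (0.02, 3),
         (0.004, 3), (0.004, 3), (0.002, 4)]

# (weight, depth) pairs in the order the terms appear in Equation 8 (refined tree).
refined = [(0.8, 2), (0.1, 2), (0.02, 3), (0.02, 3), (0.05, 4),
           (0.004, 4), (0.004, 4), (0.002, 4)]

chi_naive = characteristic(naive)
chi_refined = characteristic(refined)
print(round(chi_naive, 3), round(chi_refined, 3))  # 3.082 2.16
```

Most of the difference comes from the disjoint term alone: moving the 80% case from depth 3 to depth 2 saves 0.8 computations on average.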
The theoretical lower bound Ω for the characteristic of the decision trees for a given set of relations
can be determined by the Huffman algorithm [24]. This bound cannot be achieved because the
“codes” (the sets of intersection values, in this case) that signify a particular relation are
predetermined. The Huffman algorithm, however, determines the optimal prefix codes that should
be used. For the distribution considered in this section the lower bound, if the relations were
characterizable by Huffman codes, would be 1.422.
The theoretical upper bound Θ is the maximum depth of the decision tree built assuming a
uniform distribution of the relations. For region-region relations this bound would be 4, since meet
and equal are at depth 4 in the tree of Figure 2.
A measure of the optimality of a tree T is given by $\mu(T) = \frac{\chi_T - \Omega}{\Theta - \Omega}$. This has the
value 0 for the optimal tree, while it is 1 for the worst tree. For the two characteristics $\chi_{T_r}$ and
$\chi_{T_n}$ we have $\mu(T_r) = 0.287$ and $\mu(T_n) = 0.644$; therefore, the tree $T_r$ obtained from the refined
cost model is closer to the theoretical optimum.
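As a quick check of the optimality measure, the sketch below evaluates µ(T) with the values stated in this section (Ω = 1.422, Θ = 4, and the two characteristics 2.16 and 3.082). The function name `optimality` is illustrative only.

```python
def optimality(chi, omega, theta):
    """Normalized distance of a characteristic from the lower bound:
    0 at the lower bound omega, 1 at the upper bound theta."""
    return (chi - omega) / (theta - omega)


OMEGA = 1.422  # lower bound quoted in the text
THETA = 4      # upper bound: maximum depth in the naive tree

mu_refined = optimality(2.16, OMEGA, THETA)
mu_naive = optimality(3.082, OMEGA, THETA)
print(f"{mu_refined:.2f} {mu_naive:.2f}")  # 0.29 0.64
```

The two results are in line with the reported values up to rounding in the last digit.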
7. Conclusions
We presented a new approach to processing and optimizing queries over spatial relations, which
relies on the existence of any of the common spatial access methods, such as R-trees, and exploits
the semantics of the relations, heuristics, and the distribution of the objects in geographic space. It
builds on the well-defined formalism of the 9-intersection, a model for binary topological relations.
This model is popular in GIS research and has been implemented in commercial GISs. We
developed algorithms that minimize the calculations of intersections necessary to uniquely
determine the topological relation between two given objects, and to find those objects that satisfy a
particular topological constraint with respect to a given object. Though the present paper used only
the eight region-region relations to demonstrate the principles, the method applies immediately to
other topological relations defined by the 9-intersection, such as line-line relations or region-line
relations.
The present work assumes that the computation of boundary-boundary intersections has the same
cost as, say, interior-interior intersections. In general, this is not the case. The choice of data model
and structure will determine the costs of the operations, as will the choice of computational
geometry algorithms. For example, using plane-sweep algorithms, rather than a series of line-line
intersection and point-in-polygon tests, will enable testing boundary-boundary and interior-interior
intersections in the same iteration. Future work will evaluate these effects, and tests will be run on
datasets such as TIGER files. We also propose to extend the present algorithms to process queries
such as “Find all lakes in Maine that are fed by rivers”, which have constraints between two regions
and between a region and a line.
Acknowledgments
George Markowsky’s and Rahul Simha's comments on an earlier version are gratefully
acknowledged.
8. References.
1. O. Günther and A. Buchmann. Research Issues in Spatial Databases. SIGMOD RECORD
19(4), 61-68 (1990).
2. M. J. Egenhofer. Reasoning about Binary Topological Relations. in: O. Günther and H.-J.
Schek (Ed.), Advances in Spatial Databases-Second Symposium, SSD ‘91. LNCS 525, 143-
160, Springer Verlag, Zurich, Switzerland (1991).
3. D. S. Batory, T. Y. Lang, and T. S. Wise. Implementation concepts for an extensible data
model and language. ACM Transactions on Database Systems 13(3), 231-262 (1988).
4. M. J. Egenhofer and R. Franzosa. Point-Set Topological Spatial Relations. International
Journal of Geographical Information Systems 5(2), 161-174 (1991).
5. M. J. Egenhofer and J. Herring. Categorizing Binary Topological Relationships Between
Regions, Lines, and Points in Geographic Databases. Department of Surveying Engineering,
University of Maine, Technical Report (1991).
6. J. R. Herring, R. Larsen, and J. Shivakumar. Extensions to the SQL Language to Support
Spatial Analysis in a Topological Data Base. Proc. GIS/LIS’88, San Antonio, TX, 741-750
(1988).
7. D. Mark and M. Egenhofer. An Evaluation of the 9-intersection for Region-Line Relations.
Proc. GIS/LIS ‘92, San Jose, CA, (1992).
8. M. J. Egenhofer and K. Al-Taha. Reasoning about Gradual Changes of Topological
Relationships. in: A. U. Frank, I. Campari, and U. Formentini (Ed.), Theories and Models of
Spatio-Temporal Reasoning in Geographic Space. LNCS 639, 196-219, Springer Verlag,
Pisa, Italy (1992).
9. C. Freksa. Temporal Reasoning Based on Semi-Intervals. Artificial Intelligence 54, 199-227
(1992).
10. M. J. Egenhofer, A. Frank, and J. Jackson. A Topological Data Model for Spatial Databases.
in: A. Buchmann, O. Günther, T. Smith, and Y. Wang (Ed.), Design and Implementation of
Large Spatial Databases. LNCS 409, 271-286, Springer Verlag, Santa Barbara, CA (1989).
11. J. Herring. The Mathematical Modeling of Spatial and Non-Spatial Information in
Geographic Information Systems. in: D. Mark and A. Frank (Ed.), Cognitive and Linguistic
Aspects of Geographic Space. 313-350, Kluwer Academic, Dordrecht (1991).
12. M. J. Egenhofer. Why not SQL! International Journal of Geographical Information Systems
6(2), 71-85 (1992).
13. M. J. Egenhofer, E. Clementini, and P. Di Felice. Topological Relations Between Regions
with Holes. International Journal of Geographical Information Systems 8(2), (1994).
14. W. G. Aref and H. Samet. Optimization Strategies for Spatial Query Processing. Proc. 17th
International Conference on Very Large Databases, Barcelona, Spain, 81-90 (1991).
15. W. G. Aref and H. Samet. Extending a DBMS with Spatial Operations. in: O. Günther and
H.-J. Schek (Ed.), Advances in Spatial Databases—Second Symposium, SSD’91. LNCS
525, 299-318, Springer-Verlag, Zurich, Switzerland (1991).
16. B.-C. Ooi and R. Sacks-Davis. Query Optimization in an Extended DBMS. in: W. Litwin
and H.-J. Schek (Ed.), Foundations of Data Organization and Algorithms. LNCS 367, 48-63,
Springer Verlag, Paris, France (1989).
17. L. Becker and R. H. Güting. Rule-Based Optimization and Query Processing in an extensible
Geometric Database System. ACM Transactions on Database Systems 17(2), 247-303 (1992).
18. O. Günther. Efficient Computation of Spatial Joins. Proc. Ninth International Conference on
Data Engineering, Vienna, Austria, 81-90 (1993).
19. M. J. Egenhofer and J. Herring. A Mathematical Framework for the Definition of
Topological Relationships. Proc. Fourth International Symposium on Spatial Data Handling,
Zurich, Switzerland, 803-813 (1990).
20. H.-P. Kriegel, M. Schiwietz, R. Schneider, and B. Seeger. Performance Comparison of
Point and Spatial Access Methods. in: A. Buchmann, O. Günther, T. Smith, and Y. Wang
(Ed.), Design and Implementation of Large Spatial Databases. LNCS 409, 89-114, Springer-
Verlag, Santa Barbara, CA (1989).
21. H. Samet. Applications of Spatial Data Structures: Computer Graphics, Image Processing,
and GIS. Addison-Wesley, Reading, MA (1989).
22. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-
Completeness. W. H. Freeman, San Francisco, CA (1979).
23. V. Chvátal. A Greedy Heuristic for the Set-Covering Problem. Mathematics of Operations
Research 4(3), 233-235 (1979).
24. D. A. Huffman. A Method for the Construction of Minimum-Redundancy Codes.
Proceedings of the IRE 40(9), 1098-1101 (1952).