Spatial network analysis is a collection of methods for measuring accessibility potentials as well as for analyzing flows over transport networks. Though it has been part of the practice of geographic information systems for a long time, designing network analytical workflows still requires a considerable amount of expertise. In principle, artificial intelligence methods for workflow synthesis could be used to automate this task. This would improve the (re)usability of analytic resources. However, though underlying graph algorithms are well understood, we still lack a conceptual model that captures the required methodological know‐how. The reason is that in practice this know‐how goes beyond graph theory to a significant extent. In this article we suggest interpreting spatial networks in terms of quantified relations between spatial objects, where both the objects themselves and their relations can be quantified in an extensive or an intensive manner. Using this model, it becomes possible to effectively organize data sources and network functions towards common analytical goals for answering questions. We tested our model on 12 analytical tasks, and evaluated automatically synthesized workflows with network experts. Results show that standard data models are insufficient for answering questions, and that our model adds information crucial for understanding spatial network functionality.
Transactions in GIS. 2021;00:1–38.
Received: 22 March 2021
Revised: 7 September 2021
Accepted: 14 September 2021
DOI: 10.1111/tgis.12855
A conceptual model for automating spatial
network analysis
Simon Scheider1 | Tom de Jong2
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and
reproduction in any medium, provided the original work is properly cited.
© 2021 The Authors. Transactions in GIS published by John Wiley & Sons Ltd.
1Department of Human Geography and
Spatial Planning, Utrecht University,
Utrecht, the Netherlands
2Department of Logistics, Stellenbosch
University, Stellenbosch, South Africa
Simon Scheider, Department of Human
Geography and Spatial Planning, Utrecht
University, the Netherlands.
Funding information
H2020 European Research Council, Grant/
Award Number: 803498
   SCHEIDER Et al.
Computational models of spatial networks for geographic information systems (GIS) have been known for a long
time (Sutton, 1998). They are frequently used in applications such as spatial planning (Geertman, de Jong, &
Wessels, 2003), transport analysis (Thill, 2000), supply infrastructures, and the analysis of flows (Curry, 1972, cf.
Miller & Shaw, 2001, for an overview). Corresponding functions are nowadays implemented in many GIS software
tools, such as ArcGIS Network Analyst ( us/arcgis/products/arcgis-network-analyst/
overview), as well as in Web APIs and geo-services (https://devel
Yet, despite the ubiquity of technical resources, answering questions about spatial networks still requires
organizing analytic functionality into workflows, and the latter presupposes a considerable amount of expertise.
Suppose our task is to assess the accessibility and distribution of transport flows within a road network. Could
ArcGIS’s service area tool ( app/latest/help/analysis/networks/service-area-analysis-layer.htm)
be used for this task, or rather a different one? And is a road network data set sufficient, or do we
need travel statistics as well? It is clear that while such tasks are of relevance for many data scientists, manual
identification of functions and data is a time-consuming process (Scheider & Tomko, 2016), and manual composition
of workflows remains a non-trivial craft.
To address this challenge, program synthesis algorithms were developed in (symbolic) artificial intelligence (AI)1
(Naujokat, Lamprecht, & Steffen, 2011). They provide a way to automate this task, allowing analysts to loosely
specify workflows without knowing the details about available resources (Kasalica & Lamprecht, 2020b). These
algorithms have predecessors in geographical information service composition (Lutz, 2007), but go beyond them by
searching through the composition space of functions described by an information ontology, in order to satisfy a
given task specification (Lamprecht, Naujokat, Margaria, & Steffen, 2010). To automate spatial network analysis,
the main challenge lies in finding the appropriate semantic constraints for both task specifications and function
descriptions (Kruiger et al., 2021).
Yet geographic information science (GIScience) has struggled to come up with a model that is able to capture the
semantic constraints implied by this practice (see Section 2). The difficulty seems to lie in a frequent confusion of
networks as concepts used in geographic practice, with networks as data models implemented in particular information
systems (Kuhn & Ballatore, 2015). Network data models are usually understood as embedded graphs (Scheider
& Kuhn, 2008), where vertices are embedded as points in Euclidean space allowing us to assess metric distances.
While sufficient for implementing network procedures, this model seems to disregard important concepts needed
to analyze spatial networks, and in consequence, fails to capture underlying analytical tasks. To illustrate, suppose
our goal is to assess the effect of football games on traffic load on the streets, caused by football fans traveling
to their respective clubs. How could a graph model be used to specify the task of determining flows of fans from
residential areas to clubs based on the numbers of residents and their distances to clubs? There is no concept in
embedded graphs that would allow us to distinguish numbers from ratios on nodes or flows from distances on
edges. Another sub-task is to assign flows to particular paths on a road network to assess the traffic caused by
fans. To handle this problem, different kinds of weights for different kinds of edges need to be distinguished, yet
we currently lack a theory that makes such distinctions. Sticking to graph-theoretic terms seems to merely transfer
the problem to the semantics of graph labels, or of edge and node labels (Kanjilal & Schneider, 2010).
We therefore argue that the concepts underlying spatial network data models need to go beyond embedded
graphs. An explicit model of these concepts would help us better understand not only what kind of information a
spatial network contains, and which questions can therefore be answered with it (Scheider et al., 2021), but also
what kinds of analyses are possible. This leads to automating the analysis process itself. To address this goal, we
argue that spatial networks should be conceived in terms of core concepts of spatial information (Kuhn, 2012), which
implies important restrictions on the applicability of functions. More precisely, we consider networks as quantified
relations between spatial objects,2 where both object and relational qualities can be considered extensive or
intensive (Scheider & Huisjes, 2019). Spatially extensive measures are additive with respect to the spatial extent
of their controlling objects, whereas intensive measures are not additive in this sense. Consider, for example, the
potential of football fans living in districts of a city. These potentials add up when merging underlying districts, as
opposed to the distance to the city center. Notice that both potential and distance measures are required for estimating
travel flow in our example above, and more generally, to model spatial network analysis (see Section 4.2).
We argue that this new model, by its very simplicity, can go a long way towards clearing a pathway through the
jungle of available functionality and corresponding network tasks. We focus on the following questions:
1. How can spatial network tasks be specified in terms of core concepts and extensivity, to assess the
suitability of resources?
2. To what extent can network functionality be distinguished in terms of concept transformations?
3. What is the quality of automatically synthesized workflows that are based on such concepts?
Note that in this article, spatial network analysis is not a method, but an object of investigation. Correspondingly,
we are not targeting empirical questions about spatial networks, as usually intended by GIS analysts. Instead, our
study is about conceptual modeling (Guarino, Guizzardi, & Mylopoulos, 2020) of geographic information, and a network
analysis scenario serves merely as our empirical basis. Even though our goal is to distinguish network concepts
from other kinds of concepts relevant in GIS, we are aware that the underlying functions always form an integrated
whole in practice. Correspondingly, our model feeds into a more general geographic information ontology (Scheider
et al., 2020). Our goal is a lightweight type system that is able to model the part of this practice needed to compose
workflows for answering questions (Scheider et al., 2021). In the following, we start with a review of spatial network
theory and corresponding conceptual models (Section 2), before giving an overview of our methodological approach
(Section 3). Our own conceptual model is developed in Section 4, and is then used to introduce computational signatures
for spatial network functions (Section 5), as well as to specify 12 spatial network tasks in an application scenario
(Section 6). Finally, we evaluate our model by automatically synthesizing workflows for each scenario task and by
assessing their quality (Section 7).
If we look at current standard textbooks on GIS, spatial networks seem to play only a minor role (Burrough,
McDonnell, & Lloyd, 2015; Chrisman, 2002; Heywood, Cornelius, & Carver, 2010; Longley et al., 2015). Yet the
relevance of spatial networks for geo-spatial analysis has been known to geographers since the rise of quantitative
methods in the second half of the twentieth century. It is insightful to take a look at the history of spatial network
related concepts, which runs in parallel to the change of research paradigms within geography and GIScience.
Furthermore, we review recent work on geospatial semantics as a basis for modeling spatial network concepts.
2.1 | Spatial network analysis
Peter Haggett and Richard Chorley’s book (Haggett & Chorley, 1969) provides an early integrated view on passive
(drainage networks) and active transportation networks (e.g. roads). In this text, graph theory plays a minor part,
including definitions of trees and circular graphs, as well as shortest path algorithms. Beyond graphs, the authors
focus their discussion on flow networks versus barrier networks; relations of channel order numbers, flow and
lengths in drainage networks; geometric shapes, densities and orientations of networks; the relation between distance,
flow and efficiency/costs of networks, relating to Christaller’s optimal settlement system (Christaller, 1933),
as well as network change over time. Furthermore, optimization methods include not only shortest path algorithms,
but also districting and problems of regionalization (how to divide space into tessellated regions using networks).
In the 1970s and 1980s, when GIS evolved, human geographers discovered the powerful concept of a potential
in geographic space (Rich, 1980). This is related to the idea of accessibility, which combines the concept of distance
with the utility of activities that can be performed at the destinations in a network (Moseley, 1979; Ingram, 1971).
Accessibility allows us to assess a potential interaction (Masser & Brown, 1977) of numbers of people or amounts of
goods between places, in analogy to a gravity model (Batty, 1976; Curry, 1972; Wilson, 1974). These methods have
become essential tools of spatial planning with GIS (Geertman & Ritsema van Eck, 1995; Jong & Ritsema van Eck,
1996). Besides path algorithms, Ritsema van Eck (1993) identified zoning, districting and origin–destination matrix
methods as essential for spatial network analysis in GIS.
Research on spatial network models and GIS during the 1990s, in contrast, focused less on conceptual or
methodological issues, and more on network data models that would allow integration of transport science functionality
into GIS databases (Miller & Shaw, 2001; Sutton, 1998; Thill, 2000). These systems were called GIS-T, and
researchers were mainly concerned with how data structures and algorithms for transportation research could
best be integrated within a GIS infrastructure. This “structural” view of networks continues to the present day,
though the focus has shifted from implementation models to formal models that would support efficient design
of databases across software environments (Kanjilal & Schneider, 2010; Qi, Zhang, & Schneider, 2016), as well as
efficient querying of network data, including graph databases (Güting, 1994) and moving objects on networks
(Güting, De Almeida, & Ding, 2006). Other authors have focused on network complexity measures for spatial
graphs (Arlinghaus, Arlinghaus, & Harary, 2002; Jiang & Claramunt, 2004). The latter approach, however, largely
abstracts from the conceptual basis of network analysis in geography.
2.2 | Networks as core concepts of spatial information
What kind of semantics should be adopted to model spatial networks as concepts? Some researchers have been
investigating transport networks from the viewpoint of environmental cognition, such as wayfinding activities and
affordances (Winter, 2002; Scheider & Kuhn, 2010, 2008). A more general, transdisciplinary account of networks was
given by Kuhn in terms of the core concepts of spatial information (Kuhn, 2012). On this account, networks are one of
a range of concepts needed for interpreting the environment and for reasoning with GIS. These concepts constitute
conceptual “lenses” through which the environment can be studied independently of technical representations (Allen
et al., 2016; Kuhn & Ballatore, 2015). Besides the base concept of location, allowing for metric distance assessments
in space, Kuhn distinguished the following content concepts, which we interpret here in a broader research context:
Fields are understood as continuous functions (Galton, 2004) whose domain is time and location, and whose range
may be any kind of measurable quality. Temperature fields are a prime example.
Objects are understood as functions from time to locations and qualities (Galton, 2004). Objects are distinct from
fields and events in the sense that they have an identity and that they are fully localized in each moment of their
existence. We assume that objects include both bona fide (perceivable) and fiat (conventional) boundaries, as in
the case of administrative units.
Events are understood as entities that, besides having identity and having qualities like objects, happen during
some temporal interval. Earthquakes, which have a time, a location, and a magnitude, are a prime example.
Networks are quantified relations between objects, that is, functions from pairs of objects to some quality.
Networks measure a relationship between objects. Kuhn (2012) distinguished link networks which connect
objects in a qualitative way (e.g. friendship, treaty or business relation) from path networks, which can measure
flows or paths between objects. Similar distinctions can be drawn in our model.
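The four content concepts can be read as function types. The following Python sketch is only one possible reading of the definitions above, not a formalization taken from the cited work; the type aliases (`Time`, `Location`, `ObjectId`, `Quality`), the helper names, and all values are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical base types, chosen only for this illustration.
Time = float
Location = Tuple[float, float]  # (x, y) in some metric space
ObjectId = str
Quality = float

# Field: a function from time and location to a quality.
Field = Callable[[Time, Location], Quality]

# Object: a function from time to a location and a quality.
ObjectTrack = Callable[[Time], Tuple[Location, Quality]]


@dataclass(frozen=True)
class Event:
    """An identifiable entity that happens during a temporal interval."""
    start: Time
    end: Time
    location: Location
    magnitude: Quality


# Network: a function from pairs of objects to a quality,
# represented here as a finite mapping.
Network = Dict[Tuple[ObjectId, ObjectId], Quality]


def temperature(t: Time, loc: Location) -> Quality:
    """Toy temperature field (made-up formula): cooler towards larger y."""
    return 20.0 - 0.5 * loc[1]


def stadium(t: Time) -> Tuple[Location, Quality]:
    """Toy object: a stationary stadium with a constant capacity."""
    return ((3.0, 4.0), 25000.0)


quake = Event(start=0.0, end=1.5, location=(1.0, 2.0), magnitude=5.1)

# Made-up path network: a distance for two pairs of objects.
distance: Network = {("A", "B"): 45.0, ("A", "C"): 60.0}
```

In this reading, what separates a network from the other concepts is only the shape of its key: a pair of object identifiers rather than a location, a time, or a single object.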
We believe that geo-analytical tasks, and network analysis in particular, can only be understood when modeling
these concepts in combination, because they depend on each other. Yet, so far, computational models of core
concepts have not taken networks into focus (Kuhn & Ballatore, 2015). Furthermore, it is an open question how core
concepts combine with other semantic concepts needed for geographic analysis (Scheider et al., 2020). Our model of
spatial networks was designed to reflect precisely this underlying practice.
2.3 | Ontologies for geo-analytic workflow synthesis
Automated workflow composition first appears in the context of geographical information web processing services
(Yue et al., 2007). However, its effectiveness mainly depends on the quality of the ontology used to describe the
information resources (Hofer et al., 2017). As recognized early on (Albrecht, 1998; Giordano et al., 1994), this includes
the need for generalized taxonomies of GIS that focus on functionality rather than technicalities. The main difficulty
seems to lie in the fact that analytical concepts are not fully reflected in data types, and thus can occur in various
syntactical variations. In Scheider et al. (2020), we have therefore suggested an OWL3 ontology of types of core concepts
that can occur in combination with measurement levels and data types, to serve as a method for reasoning about GIS
workflows and geo-analytical tasks. Based on this work, there have been recent attempts at automating GIS workflow
synthesis for tasks that are not network related (Kruiger et al., 2021). Computationally, this approach is based on loose
programming, that is, the sequencing of functions satisfying task constraints specified over an ontology with some
temporal logic (Lamprecht, Naujokat, Margaria, & Steffen, 2010) (see Section 7.1). To handle spatial network analysis
tasks in the same manner, network concepts need to be combined with other core concepts. Yet, formal models of
the role that networks play in this respect are lacking. We also do not know of any studies about modeling network
functionality with the goal of automating geo-analytical tasks. This gap is addressed in the current article.
In this section we explain the steps taken towards developing and testing a conceptual model of spatial network
analysis. Empirically, our study is based on a network analysis scenario: the analysis of football clubs and their fans
in the Netherlands, as outlined below. This scenario gives us a way to explore core tasks of spatial network analysis
as a basis for developing our model (Grüninger & Fox, 1995). Furthermore, to evaluate our model, we manually
generated expert-level workflows for these tasks, and compared them with workflows automatically synthesized
using our conceptual model.
3.1 | Network analysis scenario and task design
The following scenario was selected based on whether it captures precisely those practices that distinguish spatial
network analysis from other types of spatial analysis. This mainly includes the capabilities of handling spatial
interaction data, going beyond geometrical GIS models that focus on topological relations and distances.
Dejonghe, Van Hoof, and Kemmeren (2006) published a book on professional football clubs and their fan base in
the Netherlands. One of the data sets they used is the 2003 nationwide complete list of the number of seasonal
ticket holders by football club and by municipality. Football fans in the Netherlands are usually season ticket holders,
and as such form regular transport flows when traveling to their clubs.
We assume an analyst plans a follow-up GIS study exploring spatial interaction of the fan base at a municipal
scale. Suppose he or she is given municipal data about population numbers, football clubs (within municipalities),
a road network, and some data about fan (ticket) statistics. Using these data, the analyst can answer various
network-related questions. In total, we formulated 12 different workflow tasks that cover major forms of analysis
(Section 6). For illustration purposes, we explain the first three examples:
1. What is the suitability of municipalities (e.g. as a place for a new stadium) in terms of the fan potential
reachable within a certain distance?
2. What is the suitability of municipalities in terms of the minimal travel distance to reach a certain number of
football fans?
3. What is the accessibility of football clubs for people living in municipalities?
Workflows to answer the first two questions can be found using threshold distance/amount analysis. For example
(Task 1), we can assess the minimal distance that needs to be traveled to reach a threshold number of potential fans,
which generates a map of municipal travel times (Figure 1a). Alternatively (Task 2), one could assess the number of
football fans reachable within a threshold distance (not shown here). Answers to the third question can be found by
generating a map of catchment areas (Task 3), where each municipality is assigned to its nearest club according to some
club capacity. This results in a map like the one in Figure 1b, where the smaller the distance, the more accessible clubs are. The
data can be used to compute accessibility statistics, revealing, for example, that over 77% of ticket holders live within
a 15-minute drive of at least one club.
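The two threshold analyses described above can be illustrated with a small Python sketch. It assumes that a complete distance matrix and the fan counts per municipality are already available as dictionaries; the function names `amount_within` and `distance_to_reach` and all numbers are hypothetical and do not correspond to the Flowmap tools used in the study.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]


def amount_within(origin: str, dist: Dict[Pair, float],
                  amount: Dict[str, float], max_dist: float) -> float:
    """Threshold distance analysis: amount reachable within max_dist."""
    return sum(a for dest, a in amount.items()
               if dist[(origin, dest)] <= max_dist)


def distance_to_reach(origin: str, dist: Dict[Pair, float],
                      amount: Dict[str, float],
                      target: float) -> Optional[float]:
    """Threshold amount analysis: smallest distance within which at
    least `target` is reachable, or None if it is never reached."""
    reached = 0.0
    # Visit destinations in order of increasing distance from the origin.
    for dest in sorted(amount, key=lambda d: dist[(origin, d)]):
        reached += amount[dest]
        if reached >= target:
            return dist[(origin, dest)]
    return None


# Made-up example: fans per municipality and travel times in minutes.
fans = {"A": 3000.0, "B": 2500.0, "C": 4000.0}
minutes = {
    ("A", "A"): 0.0, ("A", "B"): 5.0, ("A", "C"): 12.0,
    ("B", "A"): 5.0, ("B", "B"): 0.0, ("B", "C"): 8.0,
    ("C", "A"): 12.0, ("C", "B"): 8.0, ("C", "C"): 0.0,
}

print(amount_within("A", minutes, fans, 10.0))        # fans within 10 min of A
print(distance_to_reach("A", minutes, fans, 5000.0))  # minutes to reach 5,000 fans
```

Both functions take the same inputs, a distance relation and an amount per object, and differ only in which of the two is fixed as a threshold and which is returned as the result.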
3.2 | Expert-level workflow design
Once analytical tasks were formulated, we designed workflows manually as a basis for developing and evaluating
our model. We were interested in understanding how experts choose and organize software tools into a workflow
graph which generates valid answer maps. The answers were computed and illustrated using Flowmap, which is a
software designed to handle spatial interaction (http://flowm Some of this functionality can also be
found in other GIS software, such as ArcGIS Network Analyst (https://www.esr m/en-us/arcgis/products/arcgis-network-analyst).
Example workflows for answering Tasks 1, 2 and 3 can be seen in Figure 2.
FIGURE 1 (a) Fan potential and (b) club accessibility. (a) Threshold distance map, showing the travel distance
to reach a given number of fans on a scale from red (5,000 tickets within a five-minute drive time) via orange
(5,000 within 10 min) to yellow (5,000 within 15 min). Stadiums are shown as points. (b) Catchment areas are
drawn in different colors for each club and the superimposed allocation lines indicate the closest club for each
municipality. The analysis generates both shortest distances to closest club and information about this club.
To computationally solve these three tasks, we first need to measure the length of road segments (or their
travel impedance) using street data. Then we need to turn the latter into a transport network (graph), by taking
segment ends as intersections. This also includes checking segment topology. Origin and destination locations
(municipalities) together with the transport network then need to be fed into a distance matrix function to compute
a matrix of shortest paths between municipalities (including also the “last mile” feedlinks from origins and
destinations to the closest network intersection). The distance matrix together with the origin (destination) locations
including their capacity (demand) then feed into either a threshold or a catchment area function, to produce
either a suitability or an accessibility map. Parameters (not shown here) are the use of travel speed for computing
distances as travel time, as well as the choice of threshold distance or amount.
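The first steps of this pipeline, from measured road segments to a distance matrix, can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions: segments are already measured and given as hypothetical `(end_a, end_b, length)` tuples, origins and destinations coincide with network intersections (so feedlinks are omitted), and topology checking is skipped. Dijkstra's algorithm is used here as one standard shortest path method; the text does not state which algorithm Flowmap applies.

```python
import heapq
from collections import defaultdict


def build_graph(segments):
    """Turn (end_a, end_b, length) road segments into an undirected
    adjacency list, taking segment ends as intersections."""
    graph = defaultdict(list)
    for a, b, length in segments:
        graph[a].append((b, length))
        graph[b].append((a, length))
    return graph


def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest network distance from `source`
    to every reachable intersection."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was found already
        for neighbour, length in graph[node]:
            candidate = d + length
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


def distance_matrix(graph, origins, destinations):
    """Shortest distances for the full cross-product of origins and
    destinations; unreachable pairs get an infinite distance."""
    matrix = {}
    for o in origins:
        dist = shortest_distances(graph, o)
        for d in destinations:
            matrix[(o, d)] = dist.get(d, float("inf"))
    return matrix


# Made-up road segments with lengths in km.
segments = [("n1", "n2", 4.0), ("n2", "n3", 3.0), ("n1", "n3", 10.0)]
graph = build_graph(segments)
matrix = distance_matrix(graph, ["n1", "n3"], ["n1", "n3"])
print(matrix[("n1", "n3")])  # shortest route runs via n2
```

The resulting dictionary, keyed by pairs of objects, is the kind of input that a threshold or catchment area function would then combine with capacities or demands.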
3.3 | Conceptual modeling and workflow synthesis study
The goal of our investigation is to learn how to produce workflows comparable to the examples above in an automated
manner, given just the task descriptions and the starting data. Since these computational steps are implicit
in the task, they need to be figured out automatically. This is done based on some conceptual model that can be
used to describe the task, the data and the computational functions. We develop such a model in Section 4 in the
form of an ontology. We used this ontology to describe typical spatial network functions as transformations of
concepts in our model. This means we described functions in terms of their input/output types, resulting in a type
signature in Section 5. Furthermore, we specified the 12 analytical tasks in terms of concept transformations using
the same types (Section 6).
The conceptual model, together with the task specification and the function type signatures were then fed
into a loose programming algorithm. The latter searches for ontologically consistent sequences of function applications
of increasing complexity that satisfy a given task description (Lamprecht, Naujokat, Margaria, & Steffen,
2010). As explained in Section 7, we evaluated synthesized workflows based on expert assessments.
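The idea of searching for type-consistent function sequences of increasing length can be illustrated with a deliberately naive Python sketch. It is not the loose programming algorithm used in the study, which relies on an ontology and temporal logic constraints; it is a brute-force enumeration over flat type names. The tool names, the type names and the `synthesize` function are all hypothetical.

```python
from itertools import product

# Hypothetical tool signatures: name -> (required input types, output type).
TOOLS = {
    "build_network": (("RoadSegments",), "TransportNetwork"),
    "distance_matrix": (("TransportNetwork", "Municipalities"), "DistanceMatrix"),
    "threshold": (("DistanceMatrix", "Capacity"), "SuitabilityMap"),
    "catchment": (("DistanceMatrix", "Capacity"), "AccessibilityMap"),
}


def synthesize(tools, start_types, goal_type, max_length=4):
    """Enumerate tool sequences of increasing length and return the first
    one whose steps are all applicable and that produces goal_type."""
    for length in range(1, max_length + 1):
        for sequence in product(tools, repeat=length):
            available = set(start_types)
            applicable = True
            for name in sequence:
                inputs, output = tools[name]
                if not set(inputs) <= available:
                    applicable = False  # a required input type is missing
                    break
                available.add(output)
            if applicable and goal_type in available:
                return list(sequence)
    return None  # no workflow found within max_length steps


start = {"RoadSegments", "Municipalities", "Capacity"}
print(synthesize(TOOLS, start, "AccessibilityMap"))
```

Because shorter sequences are tried first, the sketch returns a minimal workflow. A realistic synthesizer would additionally need subtype reasoning over the ontology and constraints on the order and use of intermediate results.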
The model introduced in this section is less about computation than about the level of thinking in GIS. Thinking
happens in parallel to computation by interpreting the computational products in terms of concepts (Guarino,
Guizzardi, & Mylopoulos, 2020). In a nutshell, we suggest regarding spatial networks as quantified relations
between objects embedded in a metric space, such that both objects and their relations can be quantified in a
spatially extensive or intensive manner. This model is used to formulate analytical tasks and to guide the composition
of workflows.
FIGURE 2 Expert workflow implementing Tasks 1, 2 and 3 in Flowmap. Ellipses denote computational steps,
rectangles denote data sets. We have generated such expert solutions for every task (not shown here because
of lack of space); see Section 7
4.1 | Spatial networks as quantified relations
One way to think of core concepts of spatial information (Kuhn, 2012) is in terms of particular kinds of relations
in the sense of relational algebra4 (Codd, 1979). For example, information about a spatial field can be regarded as
a relation between locations and some quality (“at this location, the temperature is
C”), and information about
objects as a relation between object identifiers and object qualities (“this building has a height of 10 m”). In the first
case, locations form the primary key, in the second case, object identifiers serve as the primary key, while qualities
are foreign keys in all cases. We call such relations unary qualities, because the measured quality is controlled by a
single entity. A spatial network, in contrast, captures the idea of a relation with a composite key: the key consists
of some pair of instances of objects or other concepts, and we measure some quality for each pair. For example, a
distance matrix between cities has pairs of objects as a primary key and distance measurements as a foreign key.
We call such relations quantified relations, and their qualities binary qualities.
In principle, all core concepts can play a role in determining quantified relations. The measured quality, for example,
can be generated by various kinds of concepts. Analyzing a drainage network in a catchment area requires summation
of a hydrological field (rainfall, water content) within the river catchment to determine network flow (Haggett & Chorley,
1969). To study movement or changes in a transport network, traffic or construction events need to be summarized.
Furthermore, the primary key of a quantified relation can be formed by different concepts. Prominent GIS methods
such as visibility analysis and Euclidean distance analysis can be conceived in terms of a Boolean or ratio-scaled relation
between locations in space. We might call the latter relational fields, given that they quantify a measure for pairs of
locations, similar to ordinary fields quantifying a single location. Such hybrid models have been proposed earlier; see, for
example, Cova and Goodchild’s (2002) idea of object fields. However, within the limited scope of this article, we focus
only on object-based primary keys. This interpretation may correspond to a default understanding of spatial networks.
4.2 | Measuring extensive and intensive network qualities
Unary and binary qualities can be measured on different levels, and in this way determine whether functions are
applicable or not (Scheider & Tomko, 2016). For example, it is well known that different levels of measurement,
including count, ratio, interval, ordinal and nominal, are relevant for understanding analysis in GIS (Chrisman, 2002).
In this article we will make use of a Boolean quality including the values true and false, as well as plain nominal qualities,
which correspond to qualities that are on a nominal level and not on any other level. We will also consider the
regions of space that an object occupies as a measurable quality of that object.
The most important distinction for network qualities, however, is that between spatially extensive and intensive
qualities (Scheider & Huisjes, 2019). Extensivity is known to influence the applicability of arithmetic functions,
such as the possibility of forming sums:
Extensive qualities, which are closely related to amounts, are ratio-scaled qualities that are additive with respect
to the spatial extent of non-overlapping control units. An example of an extensive quality would be the population
of administrative units. If we merge two such units into a larger one (assuming the units do not overlap),
then their population counts sum in a corresponding way (Scheider & Huisjes, 2019). And the population count
of a region shrunk to zero size becomes zero, making it ratio-scaled (Chrisman, 2002). We consider extensivity
as a class not only of unary qualities, but also of binary qualities or networks. Following this idea, extensive
binary qualities are determined by the extents of the objects that constitute the network relation. Take the
example of commuter flows: when merging a destination region (e.g. a city) with a new destination (a satellite
town), the commuter flow between origin and destination will increase by the sum of flows from the origin to
the new destination.5
Intensive qualities, in contrast, are ratio-scaled qualities that do not sum when merging units. An example would
be the percentage of elderly people of a municipality, or the distance to the closest sport club. When merging
control units, the first quality needs to be aggregated using weighted averages, not sums. For spatial networks,
we consider intensive binary qualities. An example would be the distance measured between two regions, which
needs to be minimized, rather than summed, when merging one of these regions with others.
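The different aggregation rules for extensive and intensive qualities can be made concrete with a short Python sketch. The function names and all numbers are made up for illustration; the only content taken from the text is which rule applies to which kind of quality when two non-overlapping control units are merged.

```python
def merge_extensive(values):
    """Extensive qualities (population counts, commuter flows) sum
    when non-overlapping units are merged."""
    return sum(values)


def merge_intensive_share(shares, weights):
    """An intensive share (e.g. percentage of elderly people) is merged
    as a weighted average, here weighted by population."""
    total = sum(weights)
    return sum(s * w for s, w in zip(shares, weights)) / total


def merge_intensive_distance(distances):
    """An intensive distance between regions is merged by taking the
    minimum over the merged units."""
    return min(distances)


# Made-up values for two municipalities that are merged into one.
population = [1000.0, 3000.0]
elderly_percent = [20.0, 10.0]
flow_from_origin = [500.0, 120.0]   # commuters from one origin to each unit
distance_to_origin = [12.0, 7.5]    # km from the same origin to each unit

print(merge_extensive(population))                         # unary, extensive
print(merge_intensive_share(elderly_percent, population))  # unary, intensive
print(merge_extensive(flow_from_origin))                   # binary, extensive
print(merge_intensive_distance(distance_to_origin))        # binary, intensive
```

Summing the two percentages, or the two distances, would produce values without meaning, which is why the distinction restricts the applicability of arithmetic functions.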
These ideas give rise to the relational types listed in Table 1. In Figure 3 these types are illustrated by entity
relationship diagrams, with primary keys (PK) taken from data examples in our scenario (see Section 6). For example,
layers of municipalities and football clubs are modeled as unary qualities with objects as primary key and some
geometry as foreign key (OS). Road sizes and population numbers are examples of extensive unary qualities (OE). Distance
networks (between municipalities and clubs), in contrast, correspond to intensive binary object qualities of type OIO,
whereas traffic flows between road intersections correspond to extensive binary object qualities of type OEO. Binary
qualities can also be Boolean, indicating whether paths go through a pair of objects, or consist of geometries that
denote such a path (= path networks, type OSO).
Just as in relational algebra, we leave open how complete a given relation is with respect to its set of tuples
and the domains that make up its key. Binary concepts that consist of an incomplete subset of the cross-product
of two given sets of objects are called networks. Networks might consist of only a single pair of objects as a key.
Sometimes we want to be more exhaustive, and then the complete cross-product of two sets of objects makes
up the primary key of that relation, which we call a matrix. We use the star symbol * to refer to relations of that
latter sort (e.g. OEO*).
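The difference between a network and a matrix can be made concrete with a small sketch. Relations are represented here as Python dictionaries keyed by object pairs; this representation and the object identifiers are assumptions made for illustration only.

```python
from itertools import product

def is_matrix(relation, origins, destinations):
    """True if the relation's key is the complete cross-product of both sets."""
    return set(relation) == set(product(origins, destinations))

# Hypothetical objects: two municipalities and two clubs.
origins = ["m1", "m2"]
destinations = ["c1", "c2"]

# A network (OIO): distances for an incomplete subset of pairs.
network = {("m1", "c1"): 3.0, ("m2", "c2"): 5.0}

# A matrix (OIO*): one distance for every pair in the cross-product.
matrix = {("m1", "c1"): 3.0, ("m1", "c2"): 7.5,
          ("m2", "c1"): 6.0, ("m2", "c2"): 5.0}
```

Here `is_matrix(network, origins, destinations)` is false and `is_matrix(matrix, origins, destinations)` is true, which corresponds to the starred and unstarred types.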
Spatial network analysis, in essence, consists of transformations between such qualities (Figure 4). For example,
a catchment area analysis, which computes network distances to the closest object in a layer, transforms an
intensive (distance-based) network between spatial objects with extensive quantities into intensive object qualities
(distance to closest object). This corresponds to going from the middle layer to the upper layer in Figure 4. Gravity
models (Batty, 1976), in contrast, allow us to estimate amounts of interactions between objects. In essence, they
convert an intensive (distance-based) quality between spatial objects with extensive quantities (middle layer) into
some extensive quality (lower layer).
4.3 | Representing object and network qualities as data types
The concepts discussed above are interpretations of input or output data of network functions, that is, they
constitute intermediary types. Which formal type system should be used to add such interpretations to the data?
A given core concept can be represented by various geometry types, and conversely, a given geometric model
might be interpreted in terms of different concepts (Scheider et al., 2020). A field, for example, may be repre-
sented by vector lines or polygons (think about contours or land cover polygons), as well as by some raster layer.
TABLE 1 Concepts as types of relations of objects and measured qualities
Measure | Unary quality | Binary quality
S (spatial geometry) | OS (object geometry) | OSO (path network)
B (Boolean quality) | OB (Boolean object quality) | OBO (Boolean network)
N (nominal quality) | ON (nominal object quality) | ONO (nominal network)
I (intensive quality) | OI (intensive object quality) | OIO (intensive network)
E (extensive quality) | OE (extensive object quality) | OEO (extensive network)
Note: Unary object qualities have a simple primary key, networks are binary object qualities (composite primary key).
Similarly, networks may be represented by many kinds of geometries, not only by lines.6 And conversely, a line
data set alone does not yet imply the existence of a network: to turn a roads file into a network, we first need to
build a network topology. We take account of this representational variety simply by three orthogonal semantic
dimensions: the core concept represented by a given attribute, its measurement level, and the geometry type of
its layer. Each dimension forms an independent subsumption hierarchy, where subsumed classes are interpreted
as sub-classes. Classes can be combined arbitrarily between hierarchies, while leaf classes of one dimension are
considered mutually exclusive.
Dimensions were encoded by extending the core concept data types (CCD) ontology (http://geogr aphic knowl CoreC oncep tData) (Scheider et al., 2020) with corresponding OWL classes (see Figure 5). The
first dimension (Figure 5a) includes the hierarchy of core concept types. CoreConceptQ is the upper bound of this
hierarchy and subsumes ObjectQ (object quality), NetworkQ (network quality) and MatrixQ (matrix quality). The
latter two are subsumed by RelationalQ (binary quality). AmountQ denotes amounts of objects or other content
that is not bound to any object quality. We use this class to denote summary statistics. In the layer geometry
dimension (Figure 5b), LayerA subsumes LineA (line attribute), VectorTessellationA (polygon tessellation attribute)
and PlainVectorRegionA (attribute of a non-tessellated polygon layer). The third dimension (Figure 5c) subsumes
FIGURE 3 Entity relationship diagram of spatial network concepts, with realization examples (tables with
primary/foreign keys) taken from our football scenario (see Section 6). Note how different tables can be
realizations of a given concept
measurement levels of an attribute, with NominalA being the upper bound. IRA/ERA are considered subtypes of
RatioA standing for intensive/extensive region attributes. PlainNominalA denotes nominal attributes that are not
on a more specific measurement level. Conjunctions of these classes are used in the following to specify tasks,
describe functions and compute workflows.
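How conjunctions of classes from the three dimensions can be matched against a requirement is sketched below. The hierarchy is a simplified assumption: only classes named in the text are included, intermediate classes of the actual ontology are omitted, and the helper names are hypothetical.

```python
# Child -> parent edges (simplified; intermediate ontology classes are omitted).
PARENT = {
    "ObjectQ": "CoreConceptQ", "RelationalQ": "CoreConceptQ",
    "NetworkQ": "RelationalQ", "MatrixQ": "RelationalQ",
    "LineA": "LayerA", "VectorTessellationA": "LayerA",
    "PlainVectorRegionA": "LayerA",
    "RatioA": "NominalA", "PlainNominalA": "NominalA",
    "IRA": "RatioA", "ERA": "RatioA",
}

def ancestors(cls):
    """The class itself plus every class that subsumes it."""
    found = {cls}
    while cls in PARENT:
        cls = PARENT[cls]
        found.add(cls)
    return found

def satisfies(data_type, required):
    """Each required class must subsume at least one class of the data type."""
    return all(any(req in ancestors(c) for c in data_type) for req in required)

# A distance matrix between municipalities, typed in all three dimensions.
distance_matrix = {"MatrixQ", "VectorTessellationA", "IRA"}
```

Under these assumptions, `distance_matrix` satisfies the requirement `{"RelationalQ", "RatioA"}` but neither `{"ObjectQ"}` nor `{"ERA"}`.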
Building on our model, we can distinguish available network functions based on how they transform one concept
into another. This is done based on type signatures using the types from our model. The signatures of functions
relevant to our scenario are given in Table 2, and each one is briefly explained below. The table contains software
examples from ArcGIS as well as Flowmap.7 Tool annotations in RDF ( - prime r/) are
available online (ht tps://figsh 030aa f8b78 175ab), including also geometry types for function in-
puts and outputs which are omitted here. Tasks illustrating their use are mentioned together with each function
(see Appendix A).
FIGURE 4 Modeling spatial networks in terms of qualities of objects and their relations. Both kinds of qualities
can be extensive or intensive. Spatial network analysis essentially transforms these qualities into each other
We start with basic functions that are underlying yet not usually considered to be network analysis. Usually,
the first step in constructing an intensive (distance) network is to measure road lengths using street segment lines.
We call this operation measure size, and it takes object regions (OS; in this case lines) and generates object sizes
(OE; in this case lengths of lines), which are extensive measurements. Object sizes can then be used together
with the geometry of their object regions in order to construct a distance network, based on topological (touch)
relations between geometries. The latter are used to generate new (intersection) object pairs in the network,
while the object sizes become intensive distance qualities of the network (OIO). This step corresponds to “building
a topological network” in GIS. Following our logic of naming functions according to their outputs, we call it a
distance network here. The distance matrix function takes an intensive network of distances (OIO), as well as a set of
object regions (OS), and generates a matrix of network distances between all pairs of objects. Commonly this is the
shortest path between these objects on the network, and involves, in case some objects are not in the network,
also a metric distance measurement between these objects and their entry points to the network.
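The chain from measure size via distance network to distance matrix can be sketched as follows. This is a minimal illustration, not the implementation used in ArcGIS or Flowmap: segment coordinates are invented, intersections are taken to be shared segment endpoints, Dijkstra's algorithm is assumed for shortest paths, and the last-mile measurement for objects outside the network is left out.

```python
import heapq
from math import hypot

# Hypothetical street segments (OS): id -> (start point, end point).
segments = {"s1": ((0, 0), (3, 4)), "s2": ((3, 4), (3, 10)), "s3": ((0, 0), (3, 10))}

def measure_size(segs):
    """OS -> OE: length of every line object."""
    return {k: hypot(b[0] - a[0], b[1] - a[1]) for k, (a, b) in segs.items()}

def distance_network(segs, sizes):
    """OS x OE -> OIO: touching endpoints become pairs of intersection objects."""
    net = {}
    for k, (a, b) in segs.items():
        net[(a, b)] = net[(b, a)] = min(sizes[k], net.get((a, b), float("inf")))
    return net

def shortest_distances(net, source):
    """Dijkstra's algorithm from one intersection to all reachable ones."""
    adjacent = {}
    for (u, v), d in net.items():
        adjacent.setdefault(u, []).append((v, d))
    dist, heap = {source: 0.0}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adjacent.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def distance_matrix(net, objects):
    """OIO x OS -> OIO*: network distance for all pairs of objects."""
    rows = {o: shortest_distances(net, o) for o in objects}
    return {(o, d): rows[o].get(d, float("inf")) for o in objects for d in objects}

sizes = measure_size(segments)
network = distance_network(segments, sizes)
matrix = distance_matrix(network, [(0, 0), (3, 4), (3, 10)])
```

Note how the extensive lengths in `sizes` turn into intensive distances once they are attached to pairs of intersections: along a path they are summed, but between two objects only the minimum over all paths is kept.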
Functional clustering (Brown & Horton, 1970) between two locations in space is the reverse of the amount of
interaction between them. For example, the intramax method developed by Brown and Masser clusters (adjacent)
locations based on the amount (Masser & Brown, 1975) or relative amount (Masser & Brown, 1977) of interaction.
It therefore takes an extensive (interaction) matrix, as well as some object regions, and generates a nominal object
quality, where the nominal value indicates the cluster to which a given object belongs. Object regions are needed
to determine whether objects are neighbors. An example is given with Task 6. A catchment area function takes an
intensive (distance) matrix, some object regions as origins, as well as some object regions as destinations, and
indicates, for each origin object, its distance to the closest destination object, as illustrated in Task 3. Network analysis
does a similar thing, only based on an intensive distance network and some destination object regions, computing
shortest distances to the closest object for all possible origins given within this network (Task 4). The resulting
distance measurements on objects can be used to compute accessibility statistics. In addition, this function also
outputs corresponding shortest paths given as a Boolean network, where true indicates that some path goes
through the corresponding pair of objects. Threshold distance and threshold amount functions both take an
intensive (distance) matrix and some extensive object quality (amount). The latter generates, for each object, the sum
of amounts reachable within some distance, and the former the minimal distance to a given sum of object-based
amounts. In our scenario, an example is given in terms of fan potential analysis as part of answers to Tasks 1 and 2.
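The catchment area and threshold functions operate on a distance matrix and can be sketched compactly. The dictionary representation, the function signatures and all values below are assumptions for illustration; real tools additionally handle geometries and tool parameters.

```python
# Hypothetical distance matrix (OIO*) from municipalities to clubs, in km.
dist = {
    ("m1", "c1"): 5.0, ("m1", "c2"): 12.0,
    ("m2", "c1"): 9.0, ("m2", "c2"): 3.0,
    ("m3", "c1"): 20.0, ("m3", "c2"): 8.0,
}
population = {"m1": 1000, "m2": 2500, "m3": 400}  # extensive object quality (OE)

def catchment_area(dist, origins, destinations):
    """Distance from each origin to its closest destination (OI)."""
    return {o: min(dist[(o, d)] for d in destinations) for o in origins}

def threshold_amount(dist, amounts, target, max_dist):
    """Sum of amounts located within max_dist of the target object (OE)."""
    return sum(a for o, a in amounts.items() if dist[(o, target)] <= max_dist)

def threshold_distance(dist, amounts, target, min_amount):
    """Minimal distance within which at least min_amount is reachable (OI)."""
    total = 0
    for o in sorted(amounts, key=lambda o: dist[(o, target)]):
        total += amounts[o]
        if total >= min_amount:
            return dist[(o, target)]
    return None  # the requested amount is not reachable at any distance

closest = catchment_area(dist, ["m1", "m2", "m3"], ["c1", "c2"])
fans_near_c1 = threshold_amount(dist, population, "c1", 10.0)
reach_c2 = threshold_distance(dist, population, "c2", 3000)
```

The two threshold functions are inverses of each other in the sense that one fixes the distance and returns an amount, while the other fixes the amount and returns a distance.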
A doubly constrained flow matrix function takes some intensive (distance) matrix and two extensive object
qualities (amounts) and generates an extensive (interaction) matrix between these objects, as well as some
attractiveness/productivity score on objects, which is intensive. For example, a gravity model (Batty, 1976; Huff, 1964;
Wilson, 1974) can be used to estimate interactions between municipalities and football clubs based on both the
number of ticket holders residing in each municipality and the number of tickets sold by each football club using
some distance decay function. The parameter of the distance decay function is either given or fitted to a mea-
sured mean trip length. A singly constrained model, in contrast, takes some attractiveness/productivity score on
FIGURE 5 Three semantic dimensions of CCD types used in this article, including core concept, geometric
types and measurement levels of attributes. Arrows denote subsumption relations. (a) The core concept
represented by some geodata attribute. (b) The geometry type of some geodata attribute. (c) The measurement
level of some geodata attribute
TABLE 2 Functional signatures of basic spatial network transformations
Function | Software examples (ArcGIS, Flowmap) | Inputs | Outputs
Measure size | Calculate geometry; Calculate length/ | Object regions | Object size OE
Distance network | Build network; Import to | Object size OE; Object regions OS | Distance
Network distance matrix | OD Cost matrix; Network distance | network OIO; Object regions OS | Distance
Functional clustering | Intramax analysis | Flow matrix; Object regions OS | Object
Catchment area | Closest facility | Distance matrix; Object regions OS; Object regions OS | Object
Network analysis | Transport | network OIO; Object regions OS | Object; Boolean network
Threshold amount (amount within | (Service area; Proximity count | Distance matrix; Object amounts |
Threshold distance (distance to amount) | Regular treshold | Distance matrix; Object amounts |
Accessibility analysis | Summary statistics; Catchment | Object distances | Statistics I
Flow matrix estimation (doubly constrained) | doubly constr.; Gravity model | Distance matrix; Object amounts; Object amounts | Flow matrix; scores OI
Flow matrix estimation (singly constrained) | Huff model; singly constr.; Gravity model | Distance matrix; Object amounts; attr./prod. factors | Flow matrix; Object amounts
Flow summation | Interaction | Flow matrix |
Trip length analysis | Trip end ranking | Distance matrix; Flow matrix OEO* | Statistics I
Trade area | (Probability based; Trade area | Distance matrix; Flow matrix OEO*; Object regions OS | Object regions
Flow assignment | Flow assignment to network | Flow matrix; Distance network; Object regions OS | Flow network
Note: I, intensive quality.
destinations (origins) and some capacity on origins (destinations) to generate interaction estimations and amounts
for destinations (origins). Examples are the different sorts of gravity models that can be used in Tasks 9, 10 and 11.
Flow summation takes an extensive (interaction) matrix and sums up all outgoing flows to corresponding amounts
on origin objects, as illustrated in Task 5. Trip length analysis is a statistical summary of the distribution of interac-
tions over distances between objects, resulting in some trip statistics (average trip length (Task 7) or average trip
end ranking), like the average car travel time for all trips being approximately 16 min. Trade area functions also
take a distance and an interaction matrix as inputs, as well as some object regions, and determine some smallest
(minimal distance based) object region that contains a particular sum of interactions. For example, it allows us to
demarcate an area around each football club that contains a certain percentage of its closest ticket holders (Task
8). Finally, a flow assignment function takes some interaction matrix and some distance network as well as some
object regions, and assigns flows to the network according to the shortest paths between flow origin and
destination objects (Task 12). Functions are also summarized in the computational diagram in Figure 6.
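A doubly constrained flow matrix estimation, followed by flow summation and trip length analysis, can be sketched as follows. The sketch assumes an exponential distance decay with a given parameter `beta` and solves the balancing factors by simple iteration; the decay form, the parameter value and all counts are illustrative assumptions rather than values from our scenario.

```python
from math import exp

def doubly_constrained(dist, produced, attracted, beta, iterations=100):
    """Estimate flows that reproduce both origin and destination totals."""
    a = {o: 1.0 for o in produced}    # balancing factors on origins
    b = {d: 1.0 for d in attracted}   # balancing factors on destinations
    decay = {pair: exp(-beta * km) for pair, km in dist.items()}
    for _ in range(iterations):
        for o in produced:
            a[o] = 1.0 / sum(b[d] * attracted[d] * decay[(o, d)] for d in attracted)
        for d in attracted:
            b[d] = 1.0 / sum(a[o] * produced[o] * decay[(o, d)] for o in produced)
    flows = {(o, d): a[o] * produced[o] * b[d] * attracted[d] * decay[(o, d)]
             for o in produced for d in attracted}
    return flows, a, b

def flow_summation(flows):
    """OEO* -> OE: sum outgoing flows per origin object."""
    totals = {}
    for (o, _), value in flows.items():
        totals[o] = totals.get(o, 0.0) + value
    return totals

def mean_trip_length(flows, dist):
    """OEO* x OIO* -> I: flow-weighted average distance."""
    return sum(flows[p] * dist[p] for p in flows) / sum(flows.values())

# Hypothetical ticket holders per municipality and tickets sold per club.
dist = {("m1", "c1"): 2.0, ("m1", "c2"): 8.0, ("m2", "c1"): 6.0, ("m2", "c2"): 3.0}
holders = {"m1": 60.0, "m2": 40.0}
tickets = {"c1": 70.0, "c2": 30.0}

flows, a, b = doubly_constrained(dist, holders, tickets, beta=0.2)
by_origin = flow_summation(flows)
avg_trip = mean_trip_length(flows, dist)
```

When `beta` is not given, it could be fitted by repeating the estimation until `avg_trip` matches a measured mean trip length, as mentioned above. The balancing factors `a` and `b` are the intensive by-products from which attractiveness or productivity scores are derived.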
Note that only three of the 15 functions in Table 2 require an actual transport network file. Most (10) of the
other functions require a distance table that can be based on transport network distance but also on airline distances,
time schedules, tariff structures or functional distances. This illustrates that spatial network analysis is
much broader than implied by the common focus on transport networks. Furthermore, note that seven of the 15
functions did not have an equivalent in the standard software ArcGIS, though this functionality can of course be
Starting from a simple data source, we went through 12 different analytical tasks8 as an empirical basis for
evaluating our model. We begin with a description of the available data sources. Note that an in-depth study of the data
and the results is beyond the scope of this article.9
6.1 | Data source specification
There are five different data sources, which were interpreted in terms of the following types in our model:
). Polygon layer containing 489 municipalities, plus the
four-digit postcode areas of 37 professional football clubs in the Netherlands. The “LABEL” field contains the
residential municipal name or football club name, and the “FC” field is 1 in the latter and 2 in the former case.
Conceptually, this corresponds to a collection of objects, including ON and OS. Municipalities form a vector
tessellation of the area.
). Table POP_2003 contains the total population of each
municipality (CBS Statline), corresponding to an extensive object-based vector tessellation attribute, and
corresponding to OE.
). The Dutch road transport network ROADS08 (BASNET by
Adviesdienst Verkeer en Vervoer) contains names of roads and line geometries, thus including both ON and OS.
Note that roads are conceived as linear objects, not networks.
). This spatial interaction table of the Royal Dutch Football Association
contains the total number of season tickets for each combination of residential municipality and club. In total,
in 2003, there were 349,538 ticket holders distributed over 3,459 different combinations of municipality and
club. The interaction table reports numbers of ticket holders. Thus it corresponds to an extensive matrix quality
). This gives the hypothetical attractiveness score for each football
club (OI) in a scenario where the lower professional league is abolished.
Though these sources cover only a limited set of types, further types of data are generated as part of the work-
flows described below.
6.2 | Specification of analytical tasks and expert workflows
Each task was described by a unique question (workflow task; see Table 3 and Appendix A). The latter was
then specified in terms of our type model (CCD), including input data types, goal types and (optionally) requests
for intermediate data types that should be used in the workflow. Specifications were later used as a basis for
automatic workflow synthesis. Furthermore, we manually generated one expert workflow for each question
(examples below). In Appendix A we explain in more detail how each task specification reflects the information
given in the question, which computational steps are needed to answer it, and how the resulting maps look.
6.2.1 | Distance-based analysis
We first considered analytical Tasks 1–4 that exploit distances between residential areas and football clubs
measured on a road network, in addition to amounts measured at origins or destinations. Workflow tasks include
the assessment of fan potentials and accessibility analysis. Computationally, these tasks require the generation
of a distance matrix between objects, by computing shortest path distances on the road network and including
the last mile between road intersections and these objects (use types). To assess fan potentials, the goal types
are extensive/intensive object qualities. Accessibility analysis requires intensive (distance-based) object qualities,
represented either as regions (municipality level) or lines (street level). Workflows for Tasks 1–3 were discussed
in Section 3.
FIGURE 6 Computational diagram of spatial network transformations. Note that some signatures have been
simplified in this diagram
6.2.2 | Interaction-based analysis
Here we focus on tasks that analyze spatial interaction or flows between residential areas and clubs, in addition
to the network distance, making use of a (measured or modeled) interaction matrix (type OEO*). This includes flow
summation (Task 5) to summarize flows of destination/origin amount totals, which was specified by requesting
extensive object qualities as goal type. Functional distance clustering (Task 6) was specified by requesting nominal
values (cluster identifiers) for objects. Trip length distribution (Task 7) was specified by requesting some intensive
measure. Finally, trade area analysis (Task 8) was specified by requesting an object-based region. Workflow
solutions for Tasks 7 and 8 are shown in Figure 7.
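The trade area request of Task 8 can be illustrated with a short sketch that selects, for one club, the closest municipalities until a given share of its ticket holders is covered. The function name, the data and the list-based output are assumptions; an actual tool would return the dissolved region geometry of the selected objects.

```python
def trade_area(dist, flows, club, share):
    """Closest origins that together hold at least `share` of the club's flows."""
    origins = [o for (o, c) in flows if c == club]
    total = sum(flows[(o, club)] for o in origins)
    area, covered = [], 0.0
    for o in sorted(origins, key=lambda o: dist[(o, club)]):
        if covered >= share * total:
            break  # the requested share is already enclosed
        area.append(o)
        covered += flows[(o, club)]
    return area

# Hypothetical distance (OIO*) and interaction (OEO*) matrices for one club.
dist = {("m1", "c1"): 5.0, ("m2", "c1"): 9.0, ("m3", "c1"): 20.0}
flows = {("m1", "c1"): 50, ("m2", "c1"): 30, ("m3", "c1"): 20}

area_60 = trade_area(dist, flows, "c1", 0.6)
```

With these numbers the closest municipality alone holds only 50% of the fans, so the second closest is added and the resulting area covers 80%. This matches the use types of the task, which request both an extensive and an intensive matrix.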
6.2.3 | Flow generation
The final type of analysis provides ways of estimating interactions from other kinds of spatial information. Expert
workflows solving these tasks are depicted in Figure 8.
The task of estimating the potential number of season ticket holders (Task 9) was specified by requesting an
extensive matrix. Another task was to estimate relative attractiveness scores (Task 10) for clubs, based on the
product of (club or municipal) amount and their matching “balancing” factor. This was specified by an intensive object
quality. Finally, suppose the lower professional football league is abolished and the attractiveness of its clubs becomes zero.
What will happen to the fans and the remainder of the clubs? To answer this question, the task (Task 11) was to
generate an extensive object quality (goal). A more challenging version of the same task (Task 11a) is to start without
manually generated attractiveness scores, but require the generation of attractiveness scores in an intermediate
step, via use types. The final flow generation task takes an interaction matrix between municipalities and clubs,
as well as a street network as input, and generates finer-grained flows between road intersections, based on
assuming that trips are made on the shortest paths on this network. This task is called traffic load analysis (Task 12),
specified by requesting an extensive network quality on lines.
When thinking is turned into workflows, concepts need to be translated into concrete tools and data sources. Our
hypothesis is that common geodata models alone, as well as graph-theoretic models, are insufficient to perform
such a translation. To test this hypothesis, we follow an approach of workflow synthesis quality assessment that
was developed in Kruiger et al. (2021). An overview of the evaluation process is shown in Figure 9. We compare
the quality of automatically synthesized workflows that were generated using our conceptual model against two
benchmark models. In this section, we explain the synthesis algorithm, the benchmark models and our workflow
quality assessment approach.
7.1 | Synthesis algorithm and workflow repository
We used a workflow composition algorithm as described in Kasalica and Lamprecht (2020a). Automated Pipeline
Explorer (APE, uuary/ APE) generates sequences of tool applications satisfying logical (type)
TABLE 3 Spatial network analysis tasks for synthesizing workflows
Task category | Task subcategory | Workflow task | Task specification
Distance-based | Fan potential | 1 "What is the potential number of fans within a travel distance for each municipality?" | input: (1), (2); goal types: OE (ObjectQ, RegionA, ERA); use type: OIO* (MatrixQ, IRA)
Distance-based | Fan potential | 2 "What is the potential minimal travel distance to reach a certain number of fans for each | input: (1), (2); goal types: OI (ObjectQ, RegionA, IRA); use types: OIO* (MatrixQ, RegionA, IRA)
Distance-based | Accessibility | 3 "What is the accessibility of clubs from each | input: (1), (2); goal types: OI (ObjectQ, RegionA, IRA); use types: OIO* (MatrixQ, RegionA, IRA)
Distance-based | Accessibility | 4 "What is the accessibility of clubs from each | input: (1), (2); goal types: OI (ObjectQ, LineA, IRA); use types: OIO (NetworkQ, LineA, IRA)
Interaction-based | Flow summation | 5 "What is the number fans for each | input: (1); goal types: OE (ObjectQ, RegionA, ERA)
Interaction-based | Functional clustering | 6 "To which functional cluster does a club | input: (1), (2); goal types: OS, ON (ObjectQ, RegionA, PlainNominalA)
Interaction-based | Trip length distribution | 7 "What is the average travel time to/trip rank of a club?" | input: (1), (2), (3); goal types: I (IRA)
Interaction-based | Trade area analysis | 8 "What is the area enclosing 60% of the number of fans closest to each club?" | input: (1), (2), (3); goal types: OS, ON (ObjectQ, RegionA, PlainNominalA); use types: OEO* (MatrixQ, ERA), OIO* (MatrixQ, IRA)
Flow generation | Gravity modeling | 9 "What is the potential number of fans in each municipality for each club assuming distance decay?" | input: (1), (2), (3); goal types: OEO* (MatrixQ, RegionA, ERA)
Flow generation | Gravity modeling | 10 "What is the attractiveness of clubs for fans?" | input: (1), (2), (3); goal types: OI (ObjectQ, RegionA, IRA)
Flow generation | Gravity modeling | 11 "What is the potential number of fans for each club when the lower professional league is | input: (1), (2), (3); goal types: OE (ObjectQ, RegionA, ERA)
Flow generation | Gravity modeling | 11a (a more challenging version of 11) | input: (1), (2), (3); goal types: OE (ObjectQ, RegionA, ERA); use types: OI (ObjectQ, RegionA, IRA)
Flow generation | Traffic load analysis | 12 "What is the potential traffic load for each road assuming fans travel by car at the same | input: (1), (2), (3); goal types: OEO (NetworkQ, LineA, ERA)
Note: Tasks were formulated as questions and specified using CCD types.
constraints as used in our task specification (input types, output types, use types). The latter are expressed in semantic
linear-time logic using the classes of our ontology. The three semantic dimensions of the CCD model were used
independently as constraints for this kind of reasoning, and class combinations were automatically interpreted as class
conjunctions. Furthermore, leaf classes in one dimension were interpreted as mutually exclusive and jointly exhaustive.
In APE, workflow models satisfying these task specifications are generated with increasing size, drawing from a
repository of tool signatures (see Table 2) annotated with the same types. The maximum number and size of workflows were
given as parameters. In our test, we generated five workflows up to a length of 10 tool applications for each variant of a
task. More workflows increased only the amount of soft errors (see Section 7.3). Furthermore, we used the constraints
that all given input data should be used in the workflow, and that at least one of the data instances that are generated
as output, per tool, has to be used. The workflow synthesis repository with all resources is available online
(https://figsh 00a98 58db6 9e37f), including task specification files (ape.configuration and constraints.json) for
the 12 tasks as well as resulting workflows, for both CCD and the benchmark solutions. Workflow outputs are
generally encoded as directed acyclic graphs with function applications as vertices. Examples of automatically synthesized
workflows are shown in Appendix B. In APE, workflows can also be exported in a serialized form, as an executable
script. This requires, however, a way to deal with function parameters (see the discussion below).
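The idea of composing tools purely from type signatures can be illustrated with a toy search. This sketch is not the algorithm used by APE: it is a plain breadth-first enumeration over a hand-written, simplified subset of the signatures described for Table 2, without use-type constraints or the three semantic dimensions. The tool names and the set-based type encoding are assumptions.

```python
# Simplified signatures: tool name -> (required input types, output type).
TOOLS = {
    "measure_size": ({"OS"}, "OE"),
    "distance_network": ({"OS", "OE"}, "OIO"),
    "distance_matrix": ({"OIO", "OS"}, "OIO*"),
    "catchment_area": ({"OIO*", "OS"}, "OI"),
}

def synthesize(available, goal, max_len=10):
    """Shortest sequence of tool applications that produces the goal type."""
    start = frozenset(available)
    frontier, seen = [(start, [])], {start}
    for _ in range(max_len):
        next_frontier = []
        for types, plan in frontier:
            for name, (inputs, output) in TOOLS.items():
                # A tool applies if its inputs exist and it adds a new type.
                if inputs <= types and output not in types:
                    new_plan = plan + [name]
                    if output == goal:
                        return new_plan
                    new_types = types | {output}
                    if new_types not in seen:
                        seen.add(new_types)
                        next_frontier.append((new_types, new_plan))
        frontier = next_frontier
    return None  # no workflow within max_len tool applications

workflow = synthesize({"OS"}, "OI")
```

Starting from object regions only, the search chains measure size, distance network, distance matrix and catchment area. With coarser types, many more tools become applicable at each step, which is one way to see why less informative type systems admit more invalid workflows.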
7.2 | Benchmarking
We compared the synthesized workflows from our model against workflows obtained under the exact same
conditions, except that we used some modified type system reflecting the kind of information available in current data
models used to represent spatial networks. We considered two benchmark variants:
1. Geometric benchmark (abbreviated geom). This is a proper subset of CCD where the two conceptual
dimensions (including core concepts and measurement levels) were removed, including only one dimension
related to geometry types, namely the distinction between raster and vector attributes, as well as
between point, line and region attributes (see Figure 5b). The distinction between VectorTessellationA and
PlainVectorRegionA was also removed, since it does not occur in current data structures.
2. Embedded graph benchmark (abbreviated graph). This version retains the idea of a graph embedded into
geometric space. We distinguish between nodes and directed edges (relations between nodes) based on the core
concept superclasses ObjectQ and RelationalQ (see Figure 5a) respectively, together forming one dimension.
Furthermore, nodes as well as edges can be embedded by either of the geometric types in the geometric
benchmark. This is encoded by taking the geometric benchmark types as a second dimension.
FIGURE 7 Expert workflows solving Tasks 7 and 8
FIGURE 8 Expert workflows solving Tasks 9–11
Using these benchmark versions of the ontology, we manually created corresponding tool annotations by sub-
stituting every type with the least upper bound (supremum) concept that is still in the corresponding benchmark
ontology. In the same way, we generated benchmark versions of all task specifications, by substituting input, use and
goal types with their benchmark equivalents, respectively.
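The substitution step can be sketched as walking up a subsumption hierarchy until a class retained in the benchmark ontology is reached. The hierarchy below is an assumption: it covers only a few classes, and the placement of the tessellation classes under RegionA is inferred from the type names used in Table 3. In a tree-shaped hierarchy the nearest retained ancestor is the least upper bound.

```python
# Child -> parent edges for a few CCD classes (simplified assumption).
PARENT = {
    "NetworkQ": "RelationalQ", "MatrixQ": "RelationalQ",
    "RelationalQ": "CoreConceptQ", "ObjectQ": "CoreConceptQ",
    "VectorTessellationA": "RegionA", "PlainVectorRegionA": "RegionA",
    "RegionA": "LayerA", "LineA": "LayerA",
}

# Classes assumed to survive in the embedded graph benchmark.
GRAPH_BENCHMARK = {"ObjectQ", "RelationalQ", "RegionA", "LineA", "LayerA"}

def to_benchmark(cls, retained):
    """Replace a class by its nearest ancestor that the benchmark retains."""
    while cls not in retained:
        if cls not in PARENT:
            return None  # the whole dimension was removed from the benchmark
        cls = PARENT[cls]
    return cls

def substitute(type_conjunction, retained):
    """Map every class of an annotation, dropping removed dimensions."""
    mapped = {to_benchmark(c, retained) for c in type_conjunction}
    return mapped - {None}

distance_matrix = {"MatrixQ", "VectorTessellationA"}
graph_version = substitute(distance_matrix, GRAPH_BENCHMARK)
```

After substitution, a matrix and a network annotation both collapse to `RelationalQ`, so the benchmark can no longer tell them apart.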
FIGURE 9 A summary of our ontology evaluation framework for workflow synthesis. For an ontology,
five steps are performed. All steps are performed both for the ontology and the benchmarks to measure
improvements (Kruiger et al., 2021)
[Figure labels: (1) Tool annotation; (2) Taxonomy preparation; (4) APE; (5) Evaluation; Error types; Tool annotations]
7.3 | Evaluation metrics and quality assessment
We treated workflow synthesis like a retrieval process, measuring its quality with respect to an expert judgment
and considering expert workflows produced independently with Flowmap. We decided to measure both precision
(the proportion of retrieved answers that are correct given all retrieved answers) as well as recall (the proportion
of retrieved answers that are correct given all correct answers).
To assess recall, an expert on spatial network analysis went through the tasks ahead of our study and manually
generated a gold standard of expert workflows, using the set of spatial network functions in Table 2. Afterwards,
when going through the synthesized workflows for each task, the expert simply indicated whether one of them
corresponded to the expert workflow for this task.
To assess precision, our expert assessed synthesized workflows individually based on different error types.
We used three error types on two different severity levels, which are summarized and illustrated in Table 4.
Hard errors are critical errors which result either in a wrong or non-meaningful answer, or in a workflow that
is non-executable due to wrong data formats. We distinguish two kinds of hard errors: syntax errors, which have
a part of the workflow that cannot be executed because a tool is incorrectly applied, and semantic errors, which
produce a meaningless or invalid answer for the given question. Soft errors are non-critical errors where workflows
do entail a correct answer, but which are in some sense of lesser quality. We focused on redundancy errors, where
workflows make use of unnecessary tool applications.
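Both measures are simple ratios, as the following sketch shows. The counts are the totals reported in Table 5 and in the discussion of results; the function and variable names are hypothetical.

```python
def precision(correct_retrieved, retrieved):
    """Share of synthesized workflows that are free of hard errors."""
    return correct_retrieved / retrieved

def recall(expert_found, expert_total):
    """Share of expert workflows that were reproduced by synthesis."""
    return expert_found / expert_total

# Totals per model: (correct workflows, generated workflows).
totals = {"CCD": (36, 61), "graph": (4, 60), "geom": (1, 60)}
precisions = {model: precision(c, n) for model, (c, n) in totals.items()}

# Unique expert workflows found, out of those in the gold standard.
recalls = {
    "CCD": recall(13, 13),   # 12 tasks plus the variant 11a
    "graph": recall(4, 12),
    "geom": recall(1, 12),
}
```

Precision counts a workflow as correct when it has no hard errors, so workflows with redundancy errors still contribute to the numerator.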
Evaluation result data sets are available online (https://figsh 57de0 58b51 c19e3). In Table 5,
evaluation results for each task variant are shown as a statistic over the first five workflows generated using each task
specification. Num indicates the number of workflows for each task variant, which can be less than five in case
not more options were found. Semantic error denotes the number of semantic errors in these workflows, Syntactic
error denotes the number of syntactic errors in these workflows, Correct denotes the number of workflows
without hard errors, Rdn denotes the number of correct workflows with redundancy errors. Expert solution denotes
the number of correct workflows that correspond to an expert solution. Expert order denotes the order of
occurrence of the first expert solution, in case it occurred within the set of generated workflows, and
Results are listed for workflows generated with the CCD model (CCD[x]), the embedded graph model (graph[x]),
and the geometric benchmark model (geom[x]). In the total row we summed up all workflow counts and averaged
the length and order measurements for each of these three test variants. In total, we checked the quality of 181
workflows. Our interpretation of these results is summarized as follows:
Our study shows that the CCD model is capable of reproducing at least one expert workflow for each single task
(see the expert solution column). In total, 22 expert workflows could be recalled by the CCD model. Removing
duplicates, this amounts to 13 unique expert workflows (including Tasks 11 and 11a; see Figure 14), which is
a recall of 100%. This is in stark contrast to the geometric benchmark, which only produced a single expert
TABLE 4 An overview of the different error types (figures in Appendix B)
Error severity | Error type | Figure
Hard | Syntax | Figure B3
Hard | Semantic | Figure B2
Soft | Redundancy | Figure B4
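TABLE 5 Results of evaluating the core concept (CCD) model of spatial networks against the benchmark models
Task | Variant | Num | Avg length | Semantic error | Syntactic error | Correct | Rdn | Expert solution | Expert order
1 | CCD1 | 5 | 4.8 | 2 | 1 | 2 | 1 | 1 | 1
1 | graph1 | 5 | 3.0 | 4 | 5 | 0 | 0 | 0 |
1 | geom1 | 5 | 1.8 | 5 | 5 | 0 | 0 | 0 |
2 | CCD2 | 5 | 4.6 | 3 | 2 | 1 | 0 | 1 | 1
2 | graph2 | 5 | 3.0 | 5 | 5 | 0 | 0 | 0 |
2 | geom2 | 5 | 1.8 | 5 | 5 | 0 | 0 | 0 |
3 | CCD3 | 5 | 5.6 | 2 | 3 | 2 | 1 | 1 | 1
3 | graph3 | 5 | 3.0 | 4 | 5 | 0 | 0 | 0 |
3 | geom3 | 5 | 1.8 | 5 | 5 | 0 | 0 | 0 |
4 | CCD4 | 5 | 4.6 | 0 | 3 | 2 | 1 | 1 | 1
4 | graph4 | 5 | 2.8 | 0 | 4 | 1 | 0 | 1 | 5
4 | geom4 | 5 | 1.6 | 4 | 5 | 0 | 0 | 0 |
5 | CCD5 | 1 | 1.0 | 0 | 0 | 1 | 0 | 1 | 1
5 | graph5 | 5 | 1.6 | 2 | 4 | 1 | 0 | 1 | 2
5 | geom5 | 5 | 1.0 | 4 | 3 | 1 | 0 | 1 | 1
6 | CCD6 | 5 | 3.0 | 0 | 4 | 1 | 0 | 1 | 1
6 | graph6 | 5 | 1.0 | 4 | 4 | 1 | 0 | 1 | 3
6 | geom6 | 5 | 1.0 | 4 | 5 | 0 | 0 | 0 |
7 | CCD7 | 5 | 4.8 | 1 | 1 | 4 | 3 | 1 | 1
7 | graph7 | 5 | 2.4 | 4 | 5 | 0 | 0 | 0 |
7 | geom7 | 5 | 2.8 | 0 | 5 | 0 | 0 | 0 |
8 | CCD8 | 5 | 4.8 | 0 | 0 | 5 | 4 | 1 | 1
8 | graph8 | 5 | 2.4 | 4 | 5 | 0 | 0 | 0 |
8 | geom8 | 5 | 2.0 | 4 | 5 | 0 | 0 | 0 |
9 | CCD9 | 5 | 5.2 | 0 | 1 | 4 | 0 | 4 | 1
9 | graph9 | 5 | 2.6 | 5 | 5 | 0 | 0 | 0 |
9 | geom9 | 5 | 2.0 | 5 | 5 | 0 | 0 | 0 |
10 | CCD10 | 5 | 5.0 | 4 | 0 | 1 | 0 | 1 | 5
10 | graph10 | 5 | 2.4 | 5 | 5 | 0 | 0 | 0 |
10 | geom10 | 5 | 2.0 | 5 | 5 | 0 | 0 | 0 |
11 | CCD11 | 5 | 5.2 | 0 | 0 | 5 | 1 | 4 | 1
11 | CCD11a | 5 | 6.0 | 1 | 0 | 4 | 0 | 4 | 2
11 | graph11 | 5 | 3.0 | 5 | 5 | 0 | 0 | 0 |
11 | geom11 | 5 | 2.0 | 5 | 5 | 0 | 0 | 0 |
12 | CCD12 | 5 | 3.8 | 0 | 1 | 4 | 3 | 1 | 1
12 | graph12 | 5 | 2.8 | 0 | 4 | 1 | 0 | 1 | 5
12 | geom12 | 5 | 1.8 | 3 | 5 | 0 | 0 | 0 |
Total | CCD | 61 | 4.5 | 13 | 16 | 36 | 14 | 22 | 1.4
Total | graph | 60 | 2.5 | 42 | 56 | 4 | 0 | 4 | -
Total | geom | 60 | 1.8 | 49 | 58 | 1 | 0 | 1 | -
Notes: Each task variant included the first five workflows generated by APE under the given specification. In total, 181
workflows were evaluated. See text for explanation.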
solution (for Task 5) over all 12 tasks (recall 8%), and to the embedded graph model, which found four expert
solutions (recall 33%). Whether the exceptionally high recall of the CCD model can be sustained for larger
sets of expert workflows or other kinds of tasks remains to be seen. However, it shows that our model is
indeed capable of accounting for a significant amount of such expert knowledge.
Furthermore, the expert solutions found by CCD appear very early in the process (see the expert order
column). Most often they appeared as the first solution, except for Tasks 10 and 11a, where they appeared in
fifth and second place, respectively. In the four cases in which the graph model was able to produce expert solutions,
these were generated in places 5, 2, 3 and 5. This indicates that, despite the presence of semantically incorrect
or redundant workflows, high-quality solutions produced by the CCD model can be singled out simply by
constraining the number of workflows generated.
CCD solutions are on average much longer than benchmark workflows (4.5 nodes compared to 1.8 in the
geometric and 2.5 in the graph model) (see the Avg length column). This indicates that the CCD ontology adds more
constraints to the space of workflow composition, and thus contains more information than both the geometric
and the graph model.
Thirty-six out of 61 CCD workflows (59%) were correct solutions of the task (without any semantic or syntactic
errors) (see the Correct column). This is again in stark contrast to the geometric model, with a precision of less
than 2%, and also to the graph model, with a precision of 6%. This indicates that without deeper semantics, it
becomes nearly impossible to generate high-quality solutions, even when using an embedded graph. Furthermore,
since errors tend to occur with larger solutions, the precision of the CCD model increases markedly to 84%
(11 out of 13) when only the first workflow is selected as a solution for each task. Still, a considerable number of
semantic and syntactic errors remain in the CCD solutions. The 13 semantic errors were due to missing workflow
constraints implicitly contained in the task (see the discussion below). The 16 syntax errors were mainly due to the
fact that some of the computational functions in our model, which are treated independently, are not actually
implemented as independent components in the Flowmap software.11 In consequence, some possible
combinations and repetitions of these tools in our model are syntactically impossible in Flowmap.
These errors can be easily avoided by forcing the tools to be used only once or only in conjunction with others.
Furthermore, syntactic errors due to repetitions can be considered redundancy errors. If we count these errors
as redundancy errors instead, the number of hard errors in the CCD solutions falls by 10, resulting in a precision of
75% (46 out of 61).
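The precision figures above follow from the Total rows of Table 5. The following minimal sketch only restates that arithmetic; the function and variable names are illustrative and not part of the evaluation tooling.

```python
def precision(correct: int, total: int) -> float:
    """Fraction of evaluated workflows that are free of hard errors."""
    return correct / total

# Counts taken from the Total rows of Table 5.
ccd = precision(36, 61)
graph = precision(4, 60)
geom = precision(1, 60)

# First workflow only: 11 of the 13 CCD task variants were correct.
ccd_first = precision(11, 13)

# Recounting the 10 repetition-related syntax errors as redundancy errors
# moves 10 workflows from "hard error" to "correct".
ccd_recounted = precision(36 + 10, 61)

for label, value in [
    ("CCD, all workflows", ccd),
    ("graph, all workflows", graph),
    ("geom, all workflows", geom),
    ("CCD, first workflow per task", ccd_first),
    ("CCD, repetitions recounted", ccd_recounted),
]:
    print(f"{label}: {value:.1%}")
```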
Redundancy errors occur within CCD workflows mainly because CCD imposes increased constraints on the
workflow composition process, and so the only possibility of generating longer workflows is to repeat function
applications. This is compatible with earlier results (Kruiger et al., 2021). The problem can be handled by further
restricting the number of workflows produced for each task.
Regarding the validity of these results, we would like to add the following considerations. First, one may ask
whether the chosen benchmark for comparison is of sufficient quality. Our argument is that the benchmarks cover
precisely the concepts used and available in current spatial network information systems. These are, on the one hand,
geometric data types, and on the other hand, graph-theoretic models. We were rather lenient with the combinability
of graph elements and geometry types to distinguish functions, which in practice is more restricted. Second,
one might ask whether our chosen tasks and scenarios are too limited in range. Our list indeed lacks some
common network functions, including more complex routing functions, such as traveling salesman or Chinese postman
routing, or location allocation methods. However, the first two of these can be seen as a special case of the network
distance matrix function. Shortest-path routing deals with a single origin and destination and some path network
(OSO) as output that contains all trips as geometries between origin and destination objects. In the traveling salesman
variant, the only thing added is another object input, namely, the objects to be visited on a tour. Location allocation
functions are methods to place objects with respect to both amounts and distances, and thus should also fit well into our
framework. Third, regarding the complexity of our tasks, we believe they correspond to the level required in practice.
Nevertheless, it should be investigated in the future how longer tasks and larger repositories of functions influence
the quality of workflows. And fourth, in the practice of spatial network analysis, parameter settings and fitting of
parameter values (e.g. the distance decay parameter for gravity models) and manual interventions are essential parts
of a workflow. In this respect, our model still commits to a considerable simplification, leaving completely automatized
workflow synthesis beyond current reach. However, this could be addressed in the future by incorporating abstract
parameter semantics. What kinds of concepts could be used for this purpose, however, is an open question. Finally, in
line with previous results (Kruiger et al., 2021), it seems that the number of semantic errors can only be further
reduced by incorporating information about the type of transformation. As shown in Figure 15, this workflow for
Task 10 fails because the threshold distance function has the same result type as the (required) attractiveness score of
the doubly constrained flow matrix function. To prevent this error, we would need to distinguish between measuring
threshold distances and measuring attractiveness, which is beyond the current model. However, the workflow
synthesis algorithm would allow such tool constraints to be incorporated (Lamprecht et al., 2010).
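The Task 10 failure can be illustrated with a toy type-matching step. The sketch below is a deliberate simplification and not APE's SAT-based procedure: the tool names, input signatures and transformation labels are hypothetical, and only the shared result type OI reflects the case described above.

```python
# Hypothetical tool annotations: only the shared output type "OI"
# (object-based intensive measure) mirrors the case discussed in the text.
TOOLS = {
    "threshold_distance": {"inputs": ("OIO", "OE"), "output": "OI"},
    "flow_matrix_attractiveness": {"inputs": ("OIO", "OEO"), "output": "OI"},
    "sum_interactions": {"inputs": ("OEO",), "output": "OE"},
}

def tools_producing(goal_type, tools=TOOLS):
    """Return all tools whose annotated output type equals the goal type."""
    return sorted(name for name, sig in tools.items() if sig["output"] == goal_type)

def tools_producing_labeled(goal_type, transformation, labels, tools=TOOLS):
    """Same match, narrowed by an extra (assumed) transformation label per tool."""
    return [t for t in tools_producing(goal_type, tools) if labels.get(t) == transformation]

# Type matching alone cannot tell the two OI-producing tools apart.
print(tools_producing("OI"))

# A transformation label, which the current model lacks, would disambiguate them.
LABELS = {
    "threshold_distance": "distance",
    "flow_matrix_attractiveness": "attractiveness",
}
print(tools_producing_labeled("OI", "attractiveness", LABELS))
```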
In this article we suggested and tested the idea that spatial network analysis, as implemented in GIS, and as
envisioned by early writers in network-related geography, can be fruitfully understood as a repertoire of functions
that transform between relations of objects and their qualities. Qualities can be unary or binary, extensive or
intensive (depending on whether they are additive with respect to the spatial extent of the controlling objects),
and on different levels of measurement. To this end, we extended the core concept data types ontology with new
classes along three semantic dimensions, including core concept, measurement level and geometry type. We also
included two benchmark models, one of them corresponding to a geometrically embedded graph.
We tested our model against the benchmarks on a scenario with 12 different network analysis tasks. We
evaluated automatically synthesized workflows by expert judgements and by comparing them with independently
generated expert workflows. Despite its simplicity, we demonstrated that the model helps us not only to more clearly
understand the underlying functions, but also to automate spatial network analysis to a degree that can support
analysts in answering questions. Our model distinguishes (question 1) 12 network analysis tasks in terms of input/
output and intermediary types, which was sufficient to instruct corresponding workflow synthesis. Only in a few
cases (e.g. Task 10) was the model not able to distinguish between tasks that should result in different workflows.
Furthermore, the model was sufficient (question 2) to distinguish between all relevant spatial network functions,
except for functional differences that depend on function parameters or type-equivalent transformations (e.g. threshold
distances and attractiveness scores), which were not distinguished in this study. Finally, regarding the quality
of synthesized workflows (question 3), results show not only that the model was capable of regenerating all expert
workflows, but also that the semantic depth added by our model over and above graph theory is crucial for
high-quality workflows, improving their accuracy from about 6% to about 60%, and potentially to over 75% under
certain adjustments.
To enable fully automatized workflows and executable workflow scripts, there are still several open issues.
First, future work should focus on models for incorporating method parameters (which were not considered here)
and for removing remaining syntax errors. To remove the considerable number of semantic errors, the model
needs to be extended to types of network transformations. Modeling parameter semantics is closely related to a
transformation model, because function parameters are often functions themselves (e.g. “averaging” trip lengths
versus “taking the median” of trip ranks). We are currently working on a transformation algebra that is based on
a higher-order type system for specifying such conceptual transformations. Finally, tool annotations should be
extended to encompass further relevant software for spatial network analysis, including QGIS, ArcGIS and Python
libraries, allowing for cross-software comparisons.
What are the wider implications of these results? We see our work in the context of symbolic AI for GIS
(Janowicz et al., 2019). For purposes of GIS automation, we can learn from this study that the know-how required
to deal with spatial information generally goes beyond knowing the computational procedures or having the data.
Thus reducing know-how to knowledge extraction runs the risk of underestimating this task. This is especially
important in an age where intelligence tends to be reduced to a variant of machine learning. By reducing analysis
to the computational process on data, we disregard the underlying reasoning process that is necessary to arrive
at meaningful results. As our study demonstrates, this reasoning process requires concepts instilled into data, not
extracted from data. Correspondingly, while Janowicz et al. (2019) claim that “GeoAI research will have to make a
case for spatially explicit models,” our study clearly shows that for purposes of automation, explicit spatial models
are beyond question, and that even such models can still be insufficient. While we have made a suggestion for
the kind of knowledge lacking, it remains unknown what we will lose once our network experts are substituted
by machines.
Simon Scheider
1. SAT solvers, for example, form a basis for algorithms behind workflow synthesis (Lamprecht et al., 2010) and belong
to symbolic AI.
2. This idea is rooted in a suggestion made by Werner Kuhn (cf. 2012) in personal conversation, so we can only partially
take credit for it.
3. Web Ontology Language; see
4. Relations behave similarly to tables in a database or to entity relationship models in that they have primary and
foreign keys. In the following, such analogy is used only to illustrate our idea, not to imply that core concepts are actually
implemented as tables.
5. Extensivity in this case would need to take account of the product of origin and destination regions. Formalization of
this kind of unary and binary extensivity is considered future work.
6. Think, for example, about grid or lattice graphs, which represent a network of raster cells or lattice polygons.
7. In the future we plan to extend the coverage of software to further network functions, such as the functions
available in QGIS.
8. Detailed task descriptions can be found in Appendix A.
9. A more comprehensive documentation of the analysis is available at http://geogr aphic knowl
a p % 2 0 A p p l i c a t i o n s % 2 0 Q 1 - 1 6 . p d f
10. Meaning that the workflow may still appear later in the process, but not within the first five workflows.
11. “Measure size”, for example, is implemented in Flowmap only as part of the network topology generation.
Albrecht, J. (1998). Universal analytical GIS operations: A task-oriented systematization of data structure-independent GIS functionality. In M. Craglia & H. Onsrud (Eds.), Geographic information research: Transatlantic perspectives (pp. 577–591). Boca Raton, FL: CRC Press.
Allen, C., Hervey, T., Lafia, S., Phillips, D. W., Vahedi, B., & Kuhn, W. (2016). Exploring the notion of spatial lenses. In J. Miller, D. O’Sullivan, & N. Wiegand (Eds.), Geographic information science: GIScience 2016 (Lecture Notes in Computer Science, Vol. 9927, pp. 259–274). Berlin, Germany: Springer.
Arlinghaus, S. L., Arlinghaus, W. C., & Harary, F. (2002). Graph theory and geography: An interactive view. New York, NY: John Wiley & Sons.
Batty, M. (1976). Urban modelling: Algorithms, calibrations, predictions. Cambridge, UK: Cambridge University Press.
Brown, L. A., & Horton, F. E. (1970). Functional distance: An operational approach. Geographical Analysis, 2, 76–83. https://doi.org/10.1111/j.1538-4632.1970.tb00146.x
Burrough, P. A., McDonnell, R. A., & Lloyd, C. D. (2015). Principles of geographical information systems (3rd ed.). Oxford, UK: Oxford University Press.
Chrisman, N. R. (2002). Exploring geographic information systems (2nd ed.). Chichester, UK: John Wiley & Sons.
Christaller, W. (1933). Die zentralen Orte in Süddeutschland [The central places in southern Germany]. Jena, Germany: Gustav
Codd, E. F. (1979). Extending the database relational model to capture more meaning. ACM Transactions on Database Systems, 4, 38.
Cova, T. J., & Goodchild, M. F. (2002). Extending geographical representation to include fields of spatial objects. International Journal of Geographical Information Science, 16, 509–532. http s:// /10.1080/13658 81021 01370 40
Curry, L. (1972). A spatial analysis of gravity flows. Regional Studies, 6, 131–147. ht tp s:// /10.1080/09595 2372 0 018 5141
Dejonghe, T., Van Hoof, S., & Kemmeren, T. (2006). Voetballen in een kleine ruimte: Een onderzoek naar de geografische marktgebieden en ruimtelijke uitbreidingsmogelijkheden voor de clubs in het Nederlandse betaald voetbal. Nieuwegein, The Netherlands: Arko Sports Media.
Galton, A. (2004). Fields and objects in space, time, and space-time. Spatial Cognition and Computation, 4, 39–68. https:// 7633s cc0401_4
Geertman, S., de Jong, T., & Wessels, C. (2003). Flowmap: A support system for strategic network analysis. In S. Geertman & J. Stillwell (Eds.), Planning support systems in practice (pp. 155–175). Berlin, Germany: Springer. https://doi.org/10.1007/978-3-540-24795-1_9
Geertman, S. C., & Ritsema van Eck, J. R. (1995). GIS and models of accessibility potential: An application in planning. International Journal of Geographical Information Systems, 9, 67–80. 79950 8902025
Giordano, A., Veregin, H., Borak, E., & Lanter, D. (1994). A conceptual model of GIS-based spatial analysis. Cartographica, 31, 44–57.
Grüninger, M., & Fox, M. S. (1995). The role of competency questions in enterprise engineering. In A. Rolstadås (Ed.), Benchmarking: Theory and practice (pp. 22–31). London: Chapman & Hall. 0- 387- 34847 - 6_3
Guarino, N., Guizzardi, G., & Mylopoulos, J. (2020). On the philosophical foundations of conceptual models. Information Modelling and Knowledge Bases, 31, 1. 0 00 02
Güting, R. H. (1994). GraphDB: Modeling and querying graphs in databases. In J. B. Bocca, M. Jarke, & C. Zaniolo (Eds.), VLDB ’94: Proceedings of the 20th International Conference on Very Large Data Bases (pp. 297–308). San Francisco, CA: Morgan Kaufmann.
Güting, R. H., De Almeida, V. T., & Ding, Z. (2006). Modeling and querying moving objects in networks. VLDB Journal, 15, 165–190. https://doi.org/10.1007/s00778-005-0152-x
Haggett, P., & Chorley, R. J. (1969). Network analysis in geography. London, UK: Edward Arnold.
Heywood, D. I., Cornelius, S., & Carver, S. (2010). An introduction to geographical information systems. New York, NY: Addison Wesley Longman.
Hofer, B., Mäs, S., Brauner, J., & Bernard, L. (2017). Towards a knowledge base to support geoprocessing workflow development. International Journal of Geographical Information Science, 31, 694–716. 816.2 016.1227441
Huff, D. L. (1964). Defining and estimating a trading area. Journal of Marketing, 28, 34–38.
Ingram, D. R. (1971). The concept of accessibility: A search for an operational form. Regional Studies, 5, 101–107. https:// 80/0 9595 23710 0185131
Janowicz, K., Gao, S., McKenzie, G., Hu, Y., & Bhaduri, B. (2019). GeoAI: Spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond. International Journal of Geographical Information Science, 34(4), 625–636. 816.2019.1684500
Jiang, B., & Claramunt, C. (2004). Topological analysis of urban street networks. Environment and Planning B: Planning and Design, 31, 151–162. 06
Jong, T. D., & Ritsema van Eck, J. R. (1996). Location profile-based measures as an improvement on accessibility modelling in GIS. Computers, Environment and Urban Systems, 20, 181–190. https://doi.org/10.1016/S0198-9715(96)00013-0
Kanjilal, V., & Schneider, M. (2010). Modeling and querying spatial networks in databases. Journal of Multimedia Processing and Technologies, 1, 142–159.
Kasalica, V., & Lamprecht, A.-L. (2020a). APE: A command-line tool and API for automated workflow composition. In V. V. Krzhizhanovskaya, G. Závodszky, M. H. Lees, J. J. Dongarra, P. M. A. Sloot, S. Brissos, & J. Teixeira (Eds.), Computational science: ICCS (pp. 464–476). Cham, Switzerland: Springer.
Kasalica, V., & Lamprecht, A.-L. (2020b). Workflow discovery with semantic constraints: The SAT-based implementation of APE. Electronic Communications of the EASST, 78, 1–25.
Kruiger, J., Kasalica, V., Meerlo, R., Lamprecht, A., Nyamsuren, E., & Scheider, S. (2021). Loose programming of GIS workflows with geo-analytical concepts. Transactions in GIS, 25, 424–449.
Kuhn, W. (2012). Core concepts of spatial information for transdisciplinary research. International Journal of Geographical Information Science, 26, 2267–2276. 816.2012.722637
Kuhn, W., & Ballatore, A. (2015). Designing a language for spatial computing. In F. Bacao, M. Santos, & M. Painho (Eds.), AGILE 2015 (Lecture Notes in Geoinformation and Cartography, pp. 309–326). Cham, Switzerland: Springer.
Lamprecht, A.-L., Naujokat, S., Margaria, T., & Steffen, B. (2010). Synthesis-based loose programming. Proceedings of the Seventh International Conference on the Quality of Information and Communications Technology (pp. 262–267). Piscataway, NJ: IEEE.
Longley, P. A., Goodchild, M. F., Maguire, D. J., & Rhind, D. W. (2015). Geographic information science and systems. New York, NY: John Wiley & Sons.
Lutz, M. (2007). Ontology-based descriptions for semantic discovery and composition of geoprocessing services. Geoinformatica, 11, 1–36. https://doi.org/10.1007/s10707-006-7635-9
Masser, I., & Brown, P. J. (1975). Hierarchical aggregation procedures for interaction data. Environment and Planning A, 7, 509–523.
Masser, I., & Brown, P. J. (1977). Spatial representation and spatial interaction. Papers of the Regional Science Association, 38, 71–92. 33513
Miller, H. J., & Shaw, S.-L. (2001). Geographic information systems for transportation: Principles and applications. Oxford, UK: Oxford University Press.
Moseley, M. J. (1979). Accessibility: The rural challenge. London, UK: Methuen.
Naujokat, S., Lamprecht, A.-L., & Steffen, B. (2011). Tailoring process synthesis to domain characteristics. In N. V. Las Vegas (Ed.), Proceedings of the 16th IEEE International Conference on Engineering of Complex Computer Systems (pp. 167–175). Piscataway, NJ: IEEE.
Qi, L., Zhang, H., & Schneider, M. (2016). SNAL: Spatial network algebra for modeling spatial networks in database systems. Proceedings of the Second International Conference on Geographical Information Systems Theory, Applications and Management, Rome, Italy (pp. 145–152).
Rich, D. C. (1980). Potential models in human geography. Norwich, UK: Geo Abstracts.
Ritsema van Eck, J. R. (1993). Analyse van Transportnetwerken in GIS voor Sociaal-geografisch Onderzoek (Netherlands Geographical Studies 164). Utrecht, The Netherlands: University of Utrecht.
Scheider, S., & Huisjes, M. D. (2019). Distinguishing extensive and intensive properties for meaningful geocomputation and mapping. International Journal of Geographical Information Science, 33(1), 28–54.
Scheider, S., & Kuhn, W. (2008). Road networks and their incomplete representation by network data models. In T. J. Cova (Ed.), Geographic information science: GIScience 2008 (Lecture Notes in Computer Science, Vol. 5266, pp. 290–307). Berlin, Germany: Springer.
Scheider, S., & Kuhn, W. (2010). Affordance-based categorization of road network data using a grounded theory of channel networks. International Journal of Geographical Information Science, 24, 1249–1267. 8 1 0 9 0 3 5 1 4 1 9 8
Scheider, S., Meerlo, R., Kasalica, V., & Lamprecht, A.-L. (2020). Ontology of core concept data types for answering geo-analytical questions. Journal of Spatial Information Science, 2020, 167–201.
Scheider, S., Nyamsuren, E., Kruiger, H., & Xu, H. (2021). Geo-analytical question-answering with GIS. International Journal of Digital Earth, 14, 1–14. htt ps:// 0/1753 8 947.2 020.173 856 8
Scheider, S., & Tomko, M. (2016). Knowing whether spatio-temporal analysis procedures are applicable to datasets. In R. Ferrario, & W. Kuhn (Eds.), Formal ontology in information systems: Proceedings of the 9th International Conference (pp. 67–80). Amsterdam, The Netherlands: IOS Press.
Sutton, J. (1998). Data attribution and network representation issues in GIS and transportation. Transportation Planning and Technology, 21, 25–41. 06970 8717600
Thill, J.-C. (2000). Geographic information systems for transportation in perspective. Transportation Research Part C: Emerging Technologies, 8, 3–12. https://doi.org/10.1016/S0968-090X(00)00029-2
Wilson, A. G. (1974). Urban and regional models in geography and planning. Chichester, UK: John Wiley & Sons.
Winter, S. (2002). Modeling costs of turns in route planning. GeoInformatica, 6, 345–361.
Yue, P., Di, L., Yang, W., Yu, G., & Zhao, P. (2007). Semantics-based automatic composition of geospatial web service chains. Computers & Geosciences, 33, 649–665.
How to cite this article: Scheider, S., & de Jong, T. (2021). A conceptual model for automating spatial
network analysis. Transactions in GIS, 00, 1–38. https://doi.org/10.1111/tgis.12855
Specification of analytical tasks
This appendix contains more detailed descriptions of the analytical tasks 1–12 for designing workflows, which
were used to build and evaluate our conceptual model.
A1 | Distance-based network analysis
A1.1 | Fan potential of municipalities
Starting from the number of inhabitants of municipalities, the potential of fans for a club in a municipality can be
assessed by assuming a fixed distance threshold that these potential football fans would be willing to travel:
Workflow Task 1 “What is the potential number of football fans within a certain travel distance for each municipality
in the Netherlands?”
Task Specification 1
input: (1), (2)
goal types: OE (ObjectQ, RegionA, ERA)
use types: OIO* (MatrixQ, IRA)
For this task, we start with the roads file and the population data on municipalities, and the goal is to assess some
object-based extensive measure (the number of football fans reachable at some travel distance from a given municipality).
To account for the concept of travel distances, we request in addition that some intensive matrix be used in the solution.
Alternatively, we can measure a minimum travel distance to reach a threshold number of fans that a football
club can attract:
Workflow Task 2 “What is the potential minimal travel distance to reach a certain number of football fans for each
municipality in the Netherlands?”
Task Specification 2
input: (1), (2)
goal types: OI (ObjectQ, RegionA, IRA)
use types: OIO* (MatrixQ, RegionA, IRA)
Starting again with the roads file and the population data, our goal here is to estimate some object-based
intensive measure (minimal travel distance of some number of fans). For the same reason as above, we require that
some distance matrix be used in the solution.
Both kinds of analysis result in a map that shows a potential for each municipality. Figure 1a shows the map for
Task 2, in which all high-ranking (red) municipalities are covered by at least one actual stadium. This supports the
validity of the chosen potential measure.
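The two potential measures of Tasks 1 and 2 can be sketched on a toy distance matrix. This is a minimal illustration, assuming the distance matrix has already been derived from the road network; the function names, municipality labels and numbers are invented and do not correspond to Flowmap tools.

```python
def fan_potential(distance, population, threshold):
    """Task 1 (extensive result): population reachable within `threshold`."""
    return {
        origin: sum(population[dest] for dest, d in row.items() if d <= threshold)
        for origin, row in distance.items()
    }

def min_distance_to_reach(distance, population, target):
    """Task 2 (intensive result): smallest distance at which the cumulative
    population reaches `target`; None if the target is never reached."""
    result = {}
    for origin, row in distance.items():
        reached = 0
        result[origin] = None
        for dest, d in sorted(row.items(), key=lambda item: item[1]):
            reached += population[dest]
            if reached >= target:
                result[origin] = d
                break
    return result

# Invented toy data: a symmetric distance matrix and population counts.
distance = {
    "A": {"A": 0, "B": 10, "C": 40},
    "B": {"A": 10, "B": 0, "C": 25},
    "C": {"A": 40, "B": 25, "C": 0},
}
population = {"A": 1000, "B": 500, "C": 2000}

print(fan_potential(distance, population, threshold=30))
print(min_distance_to_reach(distance, population, target=1500))
```

Both functions consume the same intensive matrix, which is why the task specifications require an OIO* type in the solution in both cases; only the goal type differs.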
A1.2 | Accessibility of football clubs from municipalities
In this task, we are interested in finding out how accessible football clubs are for each municipality:
Workflow Task 3 “What is the accessibility of football clubs for each municipality in the Netherlands?”
Task Specification 3
input: (1), (2)
goal types: OI (ObjectQ, RegionA, IRA)
use types: OIO* (MatrixQ, RegionA, IRA)
Here we start with plain municipalities (including some nominal attribute) and roads, to assess some
object-based intensive measure (the accessibility of football clubs). Since accessibility implies distance measurements, we
likewise require that some distance matrix be used in the solution.
A1.3 | Accessibility of football clubs from roads
Roads and intersections are the objects that constitute a road network. Here we determine the distance between
each road and its closest football club.
Workflow Task 4 “What is the accessibility of football clubs from each road?”
Task Specification 4
input: (1), (2)
goal types: OI (ObjectQ, LineA, IRA)
use types: OIO (NetworkQ, LineA, IRA)
Using roads and clubs as input, we request some object-based intensive measure on lines, representing road
objects. We require distances measured on some line network to account for the concept of accessibility from
roads (Figure A1).
FIGURE A1 The map shows for each road segment the travel time to the closest football club, from more
than half an hour in yellow, via half an hour in red, to less than 5 min in purple. The black lines indicate the
shortest path from each municipality to its closest football club
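One way to obtain a distance to the closest club for every part of a network is a multi-source shortest-path search. The sketch below is an assumption about how such a measure could be computed, not a description of Flowmap: it works on an invented toy graph and reports distances at road intersections (nodes), which would still have to be transferred to the road segments themselves.

```python
import heapq

def distance_to_nearest(graph, facilities):
    """Multi-source Dijkstra: network distance from every node to its
    closest facility node. `graph` maps node -> {neighbor: cost}."""
    dist = {f: 0.0 for f in facilities}
    queue = [(0.0, f) for f in facilities]
    heapq.heapify(queue)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # outdated queue entry
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return dist

# Invented undirected toy network with travel times as edge costs.
graph = {
    "a": {"b": 2},
    "b": {"a": 2, "c": 3},
    "c": {"b": 3, "d": 4},
    "d": {"c": 4},
}
clubs = ["a", "d"]  # nodes at which football clubs are located
nearest = distance_to_nearest(graph, clubs)
print(nearest)
```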
A2 | Interaction-based network analysis
A2.1 | Numbers and flows of football fans
Flows in a matrix are shown in terms of the thickness of connecting lines in Figure A2a.
Starting from an interaction matrix, a simple transformation is needed in order to assess how many football fans
originate in each municipality.
Workflow Task 5 “What is the number of season ticket holders for each municipality in the Netherlands?”
Task Specification 5
input: (1)
goal types: OE (ObjectQ, RegionA, ERA)
In this task, we start from the interaction table with ticket holders and request some extensive object-based
measure (number of ticket holders). To generate the map in Figure A2b, we need to summarize the interaction
table over one of its keys. This shows that season ticket holders seem concentrated around existing football clubs.
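Summarizing an interaction table over one of its keys amounts to a grouped sum. A minimal sketch, assuming the table is keyed by (municipality, club) pairs; the identifiers and counts are invented for illustration.

```python
from collections import defaultdict

def ticket_holders_per_municipality(interactions):
    """Sum an interaction table over its destination (club) key, leaving one
    extensive value per origin municipality."""
    totals = defaultdict(int)
    for (municipality, _club), holders in interactions.items():
        totals[municipality] += holders
    return dict(totals)

# Invented interaction table: (origin municipality, destination club) -> holders.
interactions = {
    ("m1", "club_a"): 120,
    ("m1", "club_b"): 30,
    ("m2", "club_a"): 45,
}
print(ticket_holders_per_municipality(interactions))
```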
A2.2 | Functional distance clustering of football clubs
Workflow Task 6 “To which functional cluster does a football club belong?”
Task Specification 6
input: (1), (2)
goal types: OS, ON (ObjectQ, RegionA, PlainNominalA)
FIGURE A2 Numbers and flows of football fans in the Netherlands: (a) Flow of ticket holders, displayed as
desire lines; and (b) Numbers of ticket holders by municipality, displayed as a bar chart.
Here, we start again with the interaction table. Together with the plain municipality data, it should be used to
derive some nominal attribute for objects (cluster labels for municipalities).
The dendrogram in Figure A3a shows the progress of the fusion process as residential municipalities are
merged with football clubs. The map in Figure A3b shows, in purple, the fusion lines representing a merge
between a residential municipality and a football club. The blue fusion lines indicate the first 24 merges between
football clubs resulting in 14 (sub-)regional clusters. After this stage over 86.6% of all ticket holders are
internalized in one of the clusters. The red fusion lines indicate the next nine steps, after which five clusters at the
national level remain.
A2.3 | Football trip length distribution and trip end ranking
Workflow Task 7 “What is the average travel time to/rank of their football club if all ticket holders were to travel by car?”
Task Specification 7
input: (1), (2), (3)
goal types: I (IRA)
In this task we use roads, municipalities and an interaction table to assess some intensive measure, namely the
mean travel time. Alternatively, one can also compare a given trip with potentially closer trip alternatives, by
ranking destinations (football clubs) for each given origin (municipality) with respect to the closest destination. When
we weight this rank by the amount of interaction and average it over all flows, we obtain an average rank number.
This is called trip end ranking. In this example, it shows that out of a choice of 37 clubs the average season ticket
holder chooses the 2.5th closest club.
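Both summary measures are interaction-weighted averages over the flows. The sketch below illustrates the computation on invented data; the function names are hypothetical, and ties in distance are resolved arbitrarily by the sort, which is an assumption rather than something specified in the text.

```python
def mean_travel_time(travel_time, flows):
    """Interaction-weighted mean travel time over all origin-destination flows."""
    total = sum(flows.values())
    return sum(travel_time[o][d] * n for (o, d), n in flows.items()) / total

def average_trip_end_rank(travel_time, flows):
    """Interaction-weighted mean rank of the chosen destination, where rank 1
    is the closest destination as seen from the origin."""
    weighted_rank_sum = 0.0
    total = 0.0
    for (origin, dest), n in flows.items():
        ordered = sorted(travel_time[origin], key=travel_time[origin].get)
        rank = ordered.index(dest) + 1
        weighted_rank_sum += rank * n
        total += n
    return weighted_rank_sum / total

# Invented travel times (minutes) from municipalities to clubs, and flows.
travel_time = {
    "m1": {"club_a": 5, "club_b": 12, "club_c": 20},
    "m2": {"club_a": 30, "club_b": 15, "club_c": 8},
}
flows = {("m1", "club_a"): 60, ("m1", "club_b"): 30, ("m2", "club_a"): 10}

print(mean_travel_time(travel_time, flows))
print(average_trip_end_rank(travel_time, flows))
```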
FIGURE A3 Functional distance-based clustering. (a) Functional distance clusters. (b) Functional distance
clusters on map
A2.4 | Trade area analysis of football clubs
Workflow Task 8 “What is the area enclosing 60% of the number of season ticket holders (trade area) closest to each
football club in the Netherlands?”
Task Specification 8
input: (1), (2), (3)
goal types: OS, ON (ObjectQ, RegionA, PlainNominalA)
use types: OEO* (MatrixQ, ERA), OIO* (MatrixQ, IRA)
In this task we are requesting some object-based region enclosing a given number of the closest ticket holders.
The term “closest” in this task implies the use of some distance matrix, and the number implies some extensive
matrix between clubs and municipalities. We are interested in the size and the extent of overlap of these trade areas
(Figure A4a). It can be seen that the big three clubs (Ajax Amsterdam, Feyenoord Rotterdam and PSV Eindhoven)
fully dominate their neighbors.
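The selection step behind such a trade area can be sketched as follows: origins are visited in order of increasing distance to a club until the requested share of its ticket holders is covered. This is an illustrative reading of the task, not Flowmap's procedure; the data are invented, and the geometric step of drawing a hull around the selected municipalities is omitted.

```python
def trade_area(club, distance, flows, share=0.6):
    """Return the origins closest to `club` that together account for at
    least `share` of the club's interactions."""
    origins = [
        (distance[origin][club], origin, holders)
        for (origin, dest), holders in flows.items()
        if dest == club
    ]
    total = sum(holders for _, _, holders in origins)
    covered = 0
    selected = []
    for _, origin, holders in sorted(origins):
        if covered >= share * total:
            break
        selected.append(origin)
        covered += holders
    return selected

# Invented distances (origin -> club) and ticket holder flows.
distance = {"m1": {"club_x": 5}, "m2": {"club_x": 10}, "m3": {"club_x": 40}}
flows = {("m1", "club_x"): 50, ("m2", "club_x"): 30, ("m3", "club_x"): 20}

print(trade_area("club_x", distance, flows))
```

The function needs both an intensive relation (distance) and an extensive relation (flows) between the same objects, matching the two use types in the task specification.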
A3 | Flow generation
A3.1 | Gravity model of football fan interaction
FIGURE A4 Trade areas of football clubs and flow assignments of football fan trips in the Netherlands.
(a) Trade areas of football clubs: convex hulls around origin municipalities that demarcate the closest 60%
of interactions with destinations. (b) Flows of football ticket holders assigned to street network. Amounts
designated by line width and color hue (red, 3,000–49,000 ticket holders)
Workflow Task 9 “What is the potential number of season tickets to be sold in each municipality for each football club
in the Netherlands, if some form of distance decay is assumed?”
Task Specification 9
input: (1), (2), (3)
goal types: OEO* (MatrixQ, RegionA, ERA)
Given some road data, some ticket interaction data and some municipality/club data, we are interested in
predicting an extensive matrix, denoting the numbers of tickets sold for a municipality and some club.
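As an illustration of how an extensive matrix can be predicted from distances, the sketch below uses an origin-constrained gravity model with exponential distance decay. This is one common textbook form and is chosen here as an assumption; the text does not state which model variant or decay function is used, and all names and numbers are invented.

```python
import math

def origin_constrained_gravity(supply, attractiveness, distance, beta):
    """Distribute each origin's supply over destinations in proportion to
    attractiveness * exp(-beta * distance)."""
    flows = {}
    for origin, amount in supply.items():
        weights = {
            dest: a * math.exp(-beta * distance[origin][dest])
            for dest, a in attractiveness.items()
        }
        norm = sum(weights.values())
        for dest, w in weights.items():
            flows[(origin, dest)] = amount * w / norm
    return flows

# Invented inputs: potential ticket buyers per municipality, club
# attractiveness scores, a distance matrix and a decay parameter.
supply = {"m1": 100, "m2": 50}
attractiveness = {"c1": 1.0, "c2": 1.0}
distance = {"m1": {"c1": 5, "c2": 20}, "m2": {"c1": 15, "c2": 10}}

predicted = origin_constrained_gravity(supply, attractiveness, distance, beta=0.1)
for pair, value in sorted(predicted.items()):
    print(pair, round(value, 1))
```

In practice the decay parameter would be fitted to the observed interaction table, which is the kind of parameter setting the current model leaves out.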
Workflow Task 10 “What is the attractiveness of football clubs for season ticket holders?”
Task Specification 10 input: (1), (2), (3)
goal types: OI (ObjectQ, RegionA, IRA)
In this task, our goal is to assess some intensive object-based measure (attractiveness of clubs) using the same
Workflow Task 11 “What is the potential number of season ticket holders for remaining football clubs, when the
same distance decay effect and the same attractiveness for the remaining clubs are assumed as before closure?”
Task Specification 11 input: (1), (2), (3), (4)
goal types: OE (ObjectQ, RegionA, ERA)
In this task, we use some hypothetical club attractiveness together with roads and other data to obtain some
object-based quantities (ticket holders for each club).
Task Specification 11a input: (1), (2), (3)
goal types: OE (ObjectQ, RegionA, ERA)
use types: OI (ObjectQ, RegionA, IRA)
Note that in Task 11a we leave out the fourth input, but additionally require some intermediate step that
generates this input.
A3.2 | Traffic load analysis for football trip flow
Workflow Task 12 “What would be the traffic load for each road in the Netherlands, assuming all season ticket holders
were to travel by car at the same time?”
Task Specification 12 Traffic load for each road in a network assuming shortest paths:
input: (1), (2), (3)
goal types: OEO (NetworkQ, LineA, ERA)
In this task we start with the interaction table, the municipalities and the roads to estimate some extensive
(flow) measure on these roads. Based on flow assignment, we find that the traffic load caused by season ticket
holders may run up to almost 49,000 on a single road segment in the vicinity of the most popular football clubs
(Figure A4b).
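A flow assignment of this kind can be sketched as an all-or-nothing assignment: every origin-destination amount is routed along its shortest path and added to the load of each road segment on that path. The sketch below is an illustration under simplifying assumptions (an undirected graph given as a nested dictionary of segment lengths, no capacity limits or congestion effects); the names `shortest_path` and `assign_flows` are hypothetical and do not refer to any GIS tool used in the article.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: segment length}


def shortest_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns the node sequence or None if unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nb, length in graph.get(node, {}).items():
            nd = d + length
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                prev[nb] = node
                heapq.heappush(heap, (nd, nb))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def assign_flows(
    graph: Graph, od_flows: Dict[Tuple[str, str], float]
) -> Dict[Tuple[str, str], float]:
    """Add each origin-destination amount to every segment on its shortest path.
    Segments are keyed by their sorted end nodes (undirected network)."""
    load: Dict[Tuple[str, str], float] = defaultdict(float)
    for (origin, dest), amount in od_flows.items():
        path = shortest_path(graph, origin, dest)
        if path is None:
            continue  # unreachable pairs contribute no load
        for u, v in zip(path, path[1:]):
            load[tuple(sorted((u, v)))] += amount
    return dict(load)
```

Because the loads are sums of ticket holder counts over segments, the result is an extensive quantity on network lines, which matches the goal type of this task.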
Examples of synthesized workflows
FIGURE B1 Example of an expert solution for Task 11a automatically generated with the CCD model in APE.
Here we model potential numbers of ticket holders for each football club in a scenario where some clubs are
closing. We start from a roads file (2), municipalities and clubs given as an object tessellation (1), and a ticket
interaction table between these objects (3). Attractiveness scores for clubs (used to obtain amounts in the last
step applying a singly constrained gravity model) are generated on the fly using a doubly constrained gravity
FIGURE B2 Example of a semantic error produced by the CCD model for Task 10. The task is to generate
attractiveness scores for clubs, based on a municipality/club tessellation (1), a roads file (2) and an interaction
table (3). The problem is that threshold distances are not attractiveness scores, and that the task specification
lacks semantic detail to prevent this confusion
FIGURE B3 Example of a syntax error produced by the geometric benchmark model for Task 7. The distance
matrix function needs a topological network as data input, but it is given a roads file, resulting in a syntax error
FIGURE B4 Example of a redundancy error produced by the CCD model for Task 7. The workflow produces
a correct result of trip length analysis from roads (1), municipalities (2) and an interaction file (3), but an
unnecessary functional clustering step is added