Exploring the Notion of Spatial Lenses

Christopher Allen², Thomas Hervey¹, Sara Lafia¹, Daniel W. Phillips¹, Behzad Vahedi¹, and Werner Kuhn¹

¹ Department of Geography, University of California Santa Barbara (UCSB), Santa Barbara, CA, USA
werner@ucsb.edu, http://geog.ucsb.edu
² Department of Geography, San Diego State University (SDSU), San Diego, CA, USA
Abstract. We explore the idea of spatial lenses as pieces of software interpreting data sets in a particular spatial view of an environment. The lenses serve to prepare the data sets for subsequent analysis in that view. Examples include a network lens to view places in a literary text, or a field lens to interpret pharmacy sales in terms of seasonal allergy risks. The theory underlying these lenses is that of core concepts of spatial information, but here we exploit how these concepts enhance the usability of data rather than that of systems. Spatial lenses also supply transformations between multiple views of an environment, for example, between field and object views. They lift these transformations from the level of data format conversions to that of understanding an environment in multiple ways. In software engineering terms, spatial lenses are defined by constructors, generating instances of core concept representations from spatial data sets. Deployed as web services or libraries, spatial lenses would make larger varieties of data sets amenable to mapping and spatial analysis, compared to today’s situation, where file formats determine and limit what one can do. To illustrate and evaluate the idea of spatial lenses, we present a set of experimental lenses, implemented in a variety of languages, and test them with a variety of data sets, some of them non-spatial.
Keywords: conceptual lenses, core concepts of spatial information, spatial analysis, data usability, format conversions
1 Introduction
There is an implicit assumption underlying most work with GIS, namely that
a data set encodes a certain view of an environment and should therefore be
analyzed with tools conforming to that view only. For example, a raster data set
is normally seen as encoding a field view of space, consequently admitting map
algebra operations, while a set of polygons would be seen as encoding an object
view of space, and a literary text would in itself not be considered spatial data.
While this assumption can guide the choice of analysis tools, it can also stand in
the way of more flexible uses and richer interpretations of data. It limits the use
of mapping and spatial analysis tools to certain data formats that these tools
can handle, and it prevents the exploitation of spatial references in data that are
not in a GIS format. Furthermore, the assumption can lead to unnecessary data
format conversions and information loss.
In this paper, we challenge this assumption by decoupling data from world
views. We introduce the notion of spatial lenses for data, which we define as
pieces of software that interpret data sets in terms of a chosen world view. For
example, a field lens interprets a data set as a representation of an environment
viewed as a field. The data set to be interpreted in this way can be an image,
a set of point measurements, a live sensor network, or anything else that may
be interpreted as representing a continuous function from positions in space-
time to values (in other words, a field [1]). Spatial lenses defined in this way
would normally be built by application programmers or software developers,
according to specifications produced by geographic information scientists, ideally
in consultation with domain scientists (such as climatologists, archaeologists, or
historians).
The decoupling of data from views of environments also allows for (and often requires) introducing some auxiliary information. To interpret a set of point measurements as a field, for example, one obviously needs to supply an interpolation function. If a literary text is to be seen as representing a social network, the nodes (for example, literary characters) and links (for example, their kinship) need to be defined in computable form.
Spatial lenses also reduce the need for explicit data format conversions by users. Spatial analyses often involve multiple switches between different views of space, based on the formats of data sources. With a dedicated choice of how to conceptualize an environment, there is no need to change that lens for computational reasons only, no matter what format the data sources are in. For example, an analysis of night-time lights in certain areas and at multiple levels of granularity does not require alternation between field and object views, though this is often done in practice [2]. Aggregation is handled as a granularity lens operation, which can be layered on top of other content lenses to answer questions about data quality.
Similar to the idea of constructor and observer operations, which are well-known from abstract data types in programming and software engineering [19], the spatial lens idea allows for separating analysis from pre-processing. Spatial lenses are constructors that generate instances of core concept representations, to which observers can subsequently be applied in order to answer spatio-temporal questions.
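As a minimal sketch of this separation in Python (the names Field, field_lens, and value_at, as well as the deliberately trivial nearest-neighbor interpolation, are our illustrative assumptions, not a prescribed design):

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Position = Tuple[float, float]

@dataclass
class Field:
    # Observer: answers the question "what is the value at position p?"
    value_at: Callable[[Position], float]

def field_lens(points: Sequence[Position], values: Sequence[float],
               interpolate) -> Field:
    # Constructor: interprets point measurements as a field, given an
    # interpolation function supplied as auxiliary information.
    return Field(value_at=lambda p: interpolate(points, values, p))

def nearest(points, values, p):
    # Simplest possible auxiliary information: nearest-neighbor interpolation.
    d2 = [(p[0] - x) ** 2 + (p[1] - y) ** 2 for x, y in points]
    return values[d2.index(min(d2))]

temperature = field_lens([(0.0, 0.0), (10.0, 10.0)], [15.0, 25.0], nearest)
print(temperature.value_at((2.0, 3.0)))  # -> 15.0 (nearest station)
```

Analysis code only ever calls observers such as value_at; which data set and which auxiliary information produced the field remains the constructor's concern.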
Our main goal is to allow for a more flexible view of what is considered “spatial data”. A large proportion of data has implicit or explicit spatial references and is therefore in principle amenable to mapping and spatial analysis. However, the gap between the data and the tools is often too large to bridge for those without solid technical GIS expertise. Spatial lenses lift mapping and spatial analysis from the implementation level of data formats and GIS commands to the level of questions about spatial phenomena [5]. Each of the lenses comes with a set of questions it answers. For example, a network lens answers questions about connectivity, centrality, and paths, while an event lens answers questions about temporal sequencing and possible causation.
The paper first surveys previous work in several information sciences, then
presents five case studies, each of them proposing and implementing a set of
lenses, and ends with conclusions.
2 Previous Work
We take the seminal work of Edgar F. Codd on databases in the early 1970s [3] as inspiration for the (admittedly harder) task of creating a higher-level understanding of spatial information. Codd’s relational algebra essentially defined a lens on the world in terms of tables. Once one understands the world as consisting of phenomena that one can represent in rows of tables with columns for their attributes, the power of relational algebra unfolds without a need to understand how the tables are stored and manipulated in a database. The first sentence in Codd’s CACM article says it all: “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation)” [3].
Replace “large data banks” with “GIS” (or any other type of spatial computing platform) and ask yourself what you can do without knowing how a system organizes the data internally. Here we do not mean the physical level of data organization, but the logical one, i.e., the data structures. Whole curricula are in fact built on the assumption that you cannot and should not do much without that knowledge. While this assumption creates a cast of GIS experts (generally recognized to be too small, and likely to stay so), it misses out on the vast potential that GIS has for users without the technical skills, time, or financial resources to acquire a thorough understanding of GIS internals before asking a spatial question. In this paper, we do not address the usability of GIS and other tools, but the usability of data, improving the means to interpret any data spatially.
A case similar to that about databases could be made for the power (and limitations) of seeing everything stored in a computer as a document of some type (text, table, graphic, etc.) or for the idea of Linked Data, built on the simplest possible data model of subject-predicate-object triples [4]. Both paradigms have lifted data manipulation from the level of dealing with data structures to that of dealing with real-world concepts (documents and statements). GIScience has not yet reached similar levels of simplicity and clarity in describing what it is (and GIS are) about.
The core concepts of spatial information have been defined previously to bridge the gap between spatial thinking and spatial computing [18, 5]. The following six of them are now explored as concepts to interpret spatial data:

1. Object – An individual that has properties and relations with other objects.
2. Field – A property with a value for each position in space and time.
3. Network – A set of objects (nodes) linked by a binary relation (edges).
4. Event – Something that happens and involves fields, objects, and/or networks as participants.
5. Granularity – The level of detail in objects, fields, networks, and events.
6. Accuracy – The correspondence of spatial information with what is considered a true state of affairs.
Note that the core concept of Location underlies the four “content concepts” (object, field, network, event), in the sense that all of these serve to answer “where” questions. The “quality concepts” of granularity and accuracy, in turn, can be applied to all content concepts. As location is always observed in space and time, all concepts are spatio-temporal (not just the event concept).
In the case of spatial lenses as well as in the cases of databases, documents,
and linked data, the idea of a conceptual lens should be understood as a means
to view reality using some data and software, rather than viewing the data.
For example, if one decides to view temperature measurements from a network
of weather stations as a representation of a temperature field, this conceptual
lens allows for having a temperature value at each position and time of interest
(some of which are measured, others interpolated). The field lens is, thus, not
applied to the point data set per se (as this would leave out the interpolation
function to be supplied), but to the environment, seeing it through the data.
Our notion of Spatial Lenses is not directly related to the Urban Lens³ from the Senseable City Lab at MIT, which allows users to extract trends from large datasets through the application of a figurative lens. Unlike this data visualization work, our spatial lens notion applies a computational approach that carries a set of associated questions to help users spatialize data in a particular way.
In GIScience, attempts to provide clearer conceptual structure to geographic
information have so far mainly focused on organizing GIS commands around the
sorts of items manipulated by a GIS [6, 7], or on finding a single general model
to deal with the largest possible range of geographic data [8, 1]. Our approach,
by contrast, identifies sorts of items or phenomena in the world, together with
questions to be asked about them [5]. It then fits the data and operations to
these conceptualizations, rather than coercing the understanding of the world to
data models.
3 Case Studies
To articulate the flexibility in choosing spatial lenses, we investigate non-trivial domain-specific questions through a set of case studies. These studies highlight lens views of environments based on existing data, spatial or other, taken from diverse sources and mostly available online. The data range from historical newspaper texts through typical GIS data in raster and vector form, to research objects (publications and research data of any kind).

³ http://senseable.mit.edu/urban-lens/
Fig. 1. Spatial lenses applied to a campus environment, illustrating symbolically how
a campus can be seen as a set of objects, as observed through various fields, as forming
networks, as participating in events, and how all these views come with a certain level
of granularity (detail) and accuracy
3.1 Baltimore City Vacant Buildings
Data Source and Uses. Our first data set is an online repository of vacant building features across Baltimore City⁴. Updated monthly, the data set keeps track of parcel properties that are not currently on the market and are condemned or no longer occupied. Each feature contains attributes for several parent administrative jurisdictions, including neighborhood and police districts, as well as a pair of coordinates and a notice date generated upon initial inspection. Available on the web portal OpenBaltimore, the data can be displayed through online mapping tools, or downloaded in tabular form. We know of no spatial analyses that use this data so far. However, for economists and social scientists, the data could be useful when studying spatial urban dynamics or deciding on investments.
Questions and Lens Support. Addressing such possible user perspectives, we can ask the following questions of the data:

- Where can one find vacancy clusters?
- Which neighborhoods contain most clusters?

⁴ https://data.baltimorecity.gov
To study clustering, we propose an object lens, viewing clustered vacancies as objects generated from vacant parcel points. A user could then, based on the neighborhood attribute, observe how many clusters are within each neighborhood. This information could be valuable for influencing policy. For example, if a user discovers a large number of vacancy clusters within the Sandtown-Winchester neighborhood, they could suggest to the respective council members where to focus and how to allocate rehabilitation funds. Note that this is an example that illustrates how objects are not defined by boundaries (there are none, in this case), but by their identity.
Fig. 2. West Baltimore vacancies plotted as cluster objects
Constructing the Lens. Since the vacant lots in this data set have a coordinate pair attribute, generating cluster objects is straightforward (for this data, DBSCAN is appropriate [9]), and a user would only need to supply the data in tabular form along with clustering specifications. The object lens is applied twice, to individual parcels as well as to the clusters; the following processing steps constitute the object lens constructor:

- Determine and plot point locations based on provided coordinate pairs.
- With user-defined settings, run the clustering algorithm.
- Count clusters within each neighborhood.
In our Python implementation, the tabular data is loaded into a matrix, and DBSCAN is run. Several clustering characteristics, including the minimum number of samples (in this case five other vacant parcels) and the maximum sampling distance (in this case eight houses away), are supplied by the user. The results in Figure 2 show point objects for each vacant building plotted by location and with clusters differentiated by color. An automated count reveals that the Central Park Heights and Sandtown-Winchester neighborhoods contain the highest number of clusters, with 30 and 22 lots respectively.
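The following Python sketch illustrates this constructor with scikit-learn's DBSCAN; the column names and the eps value (standing in for the "eight houses away" distance) are assumptions, not the exact script behind Figure 2:

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Tabular OpenBaltimore export; column names are assumed for illustration.
vacants = pd.read_csv("vacant_buildings.csv")
coords = vacants[["Lon", "Lat"]].to_numpy()

# min_samples=5 follows the text; eps approximates "eight houses away".
vacants["cluster"] = DBSCAN(eps=0.001, min_samples=5).fit_predict(coords)

# Observer: count distinct clusters per neighborhood (label -1 marks noise).
clustered = vacants[vacants["cluster"] != -1]
counts = clustered.groupby("Neighborhood")["cluster"].nunique()
print(counts.sort_values(ascending=False).head())
```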
Secondary Lens. Viewing vacancies through a field lens rather than as a set of objects can provide additional insights. By constructing a field, values could instead represent vacancy density. This “heat map” view could be used to study the spatial distribution of decay rather than just answering the question, where do discrete vacancy clusters occur? An economist or city planner could then suggest to developers where best to launch demolition and development projects. A similar view of vacancies by density is currently being used by The Johns Hopkins Medical Center, which is looking for opportunities to expand further into east Baltimore⁵.

⁵ https://hub.jhu.edu/gazette/2013/january/east-baltimore-changes-development
3.2 Santa Barbara Communities of Interest
Data Source and Uses. The data set used here was gathered by one of the authors, Daniel W. Phillips, while conducting research about redistricting in Santa Barbara, California. The data were collected to determine how well residents thought the boundaries of the city council district in which they lived reflect what they believed to be their community of interest. They consist of 114 responses to a survey given to residents during the summer of 2015, collected in three of the six city council districts in the city. One of the items on the survey involved each participant taking a base street map of the city and drawing a line around the area that they believed to be their community of interest, defined as a contiguous group of people with shared values, concerns, and cultural traits.
Questions and Lens Support. Analysis of the polygons drawn by residents involved the following questions:

- Do residents of a given district roughly agree about the location and extent of their community of interest?
- Can one identify an area of highest agreement, which might be considered the core of the community of interest within each district?
Applying a field lens supports answering these questions. By overlaying the individual polygons for a given district, one can determine the degree of overlap as a field. While the street map survey instrument might have led respondents into more network-based thinking about their communities than field-based, it identified familiar locations better than an aerial image would have and thus enabled people to make more informed decisions; even still, most drew simple oval shapes that did not conform to the underlying street network. For any given point within this field there is a certain percentage of respondents who included that point in the polygon that they drew. The percentages range from zero to more than 60 percent.
Constructing the Lens. After digitizing all drawings, the resulting polygons were merged into a shapefile that served as the input for the constructor operations⁶:

- Compute a count of the overlapping polygons at each point in space.
- Use that count to create an output raster (with 25 meter cells).

Constructing classes for degree of overlap allowed for a simpler representation, with a light yellow to dark red color scheme applied to differentiate four classes of agreement.
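With geopandas and rasterio, the two constructor steps can be sketched as follows; the input file name and a projected (metric) coordinate system are assumptions:

```python
import geopandas as gpd
import numpy as np
from rasterio.features import rasterize
from rasterio.enums import MergeAlg
from rasterio.transform import from_origin

polys = gpd.read_file("district1_communities.shp")  # digitized drawings
minx, miny, maxx, maxy = polys.total_bounds
cell = 25.0                                          # 25 meter cells
width = int(np.ceil((maxx - minx) / cell))
height = int(np.ceil((maxy - miny) / cell))
transform = from_origin(minx, maxy, cell, cell)

# Burn 1 per polygon and sum the overlaps: the result is the agreement field.
agreement = rasterize(((geom, 1) for geom in polys.geometry),
                      out_shape=(height, width), transform=transform,
                      merge_alg=MergeAlg.add, dtype="int32")
percent = 100.0 * agreement / len(polys)  # share of respondents per cell
```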
Fig. 3. Agreement level of polygons drawn by District 1 residents, with classes from
light to dark of 0-39, 40-49, 50-59, and 60+ percent agreement
Secondary Lens. One could apply an accuracy lens, comparing the results
of applying different survey instruments. If the survey collectors used a more
detailed base map or showed it digitally, allowing for zooming and panning,
they might increase the accuracy of the responses obtained.
⁶ Using a Python script written by Adam Davis, UCSB Department of Geography
3.3 Yucatan Peninsula Research Data Footprints
Data Source and Uses. Universities increasingly curate data repositories to promote the discovery and reuse of research data. The data considered here expose the spatial extents of research projects from different domains. The first data set examined is UCSB archaeologist Dr. Anabel Ford’s archaeological sites and protected areas layer, which contains a collection of point locations and names for 530 archaeological sites on the Yucatan peninsula⁷. The second data set examined is Stanford political scientist Tom Patterson’s global disputed border layer, which contains polyline features for disputed areas and breakaway regions derived from the CIA’s World Factbook boundary database⁸.
Questions and Lens Support. Leveraging the location of researcher data through the creation of footprints for data sets promotes data discovery and integration across disciplines. Researchers working across various domains may have overlapping study areas and would benefit from spatial data discovery. For example, a political scientist interested in contested regions should be able to discover and utilize a relevant protected heritage sites layer contributed by an archaeologist. Applying an object lens to study areas makes it possible to ask questions about their spatial properties and relationships, in particular:

- Which data sets overlap with the spatial extent of the area of interest?

A generic method is needed to generate a footprint of any research data type. Spatial metadata often includes an extent attribute, which delineates a minimum rectangular bounding box for the object. However, this extent alone is not a desirable data envelope, as the inclusion of an outlying feature can greatly exaggerate the geometry of the object [11]. Some library resources may also include place names, for example, Library of Congress subject headings. These place names need to be turned into footprints using a gazetteer.
Constructing the Lens. Convex hulls are constructed from an input researcher data set, such as Ford’s geocoded archaeological site points and Patterson’s disaggregated contested border polylines:

- Determine the spatial extent of each research object.
- Determine overlap between research objects.

A convex hull is the smallest convex set that contains all the points of an object [12]; one is constructed for each data object using geoprocessing tools. The intersection of the convex hull shapefiles representing data set footprints reveals overlapping extents.
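A sketch of these two steps with geopandas and shapely (the layer file names are placeholders for the Ford and Patterson data):

```python
import geopandas as gpd

sites = gpd.read_file("ford_archaeological_sites.shp")     # point features
borders = gpd.read_file("patterson_disputed_borders.shp")  # polyline features

# Step 1: each research object's footprint is the convex hull of its features.
sites_hull = sites.unary_union.convex_hull
borders_hull = borders.unary_union.convex_hull

# Step 2: overlapping footprints reveal shared study areas.
if sites_hull.intersects(borders_hull):
    overlap = sites_hull.intersection(borders_hull)
    print("Footprints overlap; shared area:", overlap.area)
```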
Figure 4 demonstrates the spatial relationships among the constructed convex hulls. A partial intersection reveals a correspondence between protected areas of archaeological interest and a contested border region in Belize, which may be of interest to a domain scientist working in the region. Using location as an integrator of contents has the potential to enhance the data discovery process, improving information retrieval across domains.

Fig. 4. Convex hulls constructed for research objects reveal intersections in extent.

⁷ http://discovery.ucsb.opendata.arcgis.com/
⁸ https://earthworks.stanford.edu/
Secondary Lens. Another lens view on research data sets that enables data discovery is that of a non-spatial network of linked data triples. The connections between nodes of researchers and the data that they produce are represented as edges for the predicate isReferencedBy:

- Which research data sets reference publications in the area of interest?
- Which publications are authored by researchers in the area of interest?

Applying spatial lenses to researcher-generated data sets and to publications authored by researchers exposes undiscovered relationships, providing a basis for data discovery and leading to opportunities for trans-disciplinary research collaboration.
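Such a triple network can be sketched with rdflib, using the Dublin Core term dcterms:isReferencedBy; the resource URIs below are hypothetical:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import DCTERMS

DATA = Namespace("http://example.org/research/")
g = Graph()
# Edges of the non-spatial network: data set isReferencedBy publication.
g.add((DATA.ford_sites_layer, DCTERMS.isReferencedBy, DATA.maya_forest_paper))
g.add((DATA.disputed_borders_layer, DCTERMS.isReferencedBy,
       DATA.belize_boundary_paper))

# Observer: which data sets are referenced by which publications?
for dataset, _, publication in g.triples((None, DCTERMS.isReferencedBy, None)):
    print(dataset, "is referenced by", publication)
```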
3.4 United States Historical News Archive
Data Source and Uses. The Chronicling America data set⁹ is an archive that contains newspaper issues published between 1836 and 1922. Although the database includes newspapers from across the United States, some regions appear better represented than others. Large-scale efforts to digitize historical archives are a relatively recent phenomenon and thus researchers are just beginning to explore the possibilities that this new data presents. Scholars, for example, have investigated the temporal dynamics of themes found in the Richmond Daily Dispatch during the Civil War [13]. With respect to the Chronicling America data set in particular, there do not appear to be any studies that approach the data set from a geographic point of view so far.

⁹ http://chroniclingamerica.loc.gov/about/api
Questions and Lens Support. A network lens can be used to investigate questions about relationships between place names in newspaper text:

- Which pairs of place names frequently co-occur in newspaper texts and why?
- What place names are most centrally mentioned (i.e., most connected to others)?

Such a network can be inspected using traditional network analysis tools. For instance, important nodes (i.e., place names) can be identified using centrality measures, and clusters can be found with community detection algorithms. Moreover, it is possible to visualize this network using programming libraries or software tools such as Gephi (see Figure 5).
Constructing the Lens. Pre-processing is a crucial step in applying spatial lenses to natural language data, as these data are unstructured and often noisy. Natural language processing tools are available to normalize and parse text, as well as to identify potential place names in text. Additionally, such data are fraught with misspellings and other formatting issues resulting from scanning. The Natural Language Toolkit (NLTK) for Python has been used to clean and parse text data. We also take advantage of a tool called CLIFF¹⁰ to recognize place names in newspaper articles. CLIFF accepts unstructured text as input and returns a list of standardized place names that were found. The key steps for constructing a network in this case are:

- Determine proximity parameters for defining co-occurrences.
- Identify place names using the CLIFF tool.
- Iterate over the entire text and maintain a list of place names that occur within the specified proximity.
- Create an undirected network using the list of co-occurring place names and weigh each edge by the frequency with which the places co-occur.
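The last two steps can be sketched with networkx as follows, assuming place names have already been extracted per article (e.g., by CLIFF) and taking "same article" as the proximity parameter:

```python
import itertools
import networkx as nx

# Place names per article, e.g., as returned by CLIFF (toy input).
articles_places = [
    ["Seattle", "Tacoma", "Olympia"],
    ["Seattle", "Olympia"],
]

G = nx.Graph()
for places in articles_places:
    for a, b in itertools.combinations(sorted(set(places)), 2):
        # Weigh each edge by the frequency of co-occurrence.
        weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)

# Observer: centrality identifies the most centrally mentioned place names.
print(nx.degree_centrality(G))
```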
Secondary Lens. The Chronicling America data set includes many historical issues of individual newspapers, and it is appealing to apply a temporal granularity lens to it. Once the mechanisms for constructing a co-occurrence network are built, it becomes easy to investigate the data at different temporal scales by aggregating newspaper issues into multiple time windows. Observing changes in the network structure over time can help scholars understand urban processes such as shifting neighborhood demographics or transformations in spatial structure.

¹⁰ http://cliff.mediameter.org/

Fig. 5. Co-occurrence network of all 1914 Aberdeen Herald issues
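A temporal granularity lens of this kind can be sketched with pandas by grouping issues into time windows and rebuilding the network per window; the field names and the quarterly window size are assumptions:

```python
import pandas as pd

issues = pd.DataFrame({
    "date": pd.to_datetime(["1914-01-10", "1914-02-07", "1914-07-04"]),
    "places": [["Seattle", "Tacoma"], ["Seattle"], ["Olympia", "Tacoma"]],
})

# One co-occurrence network per quarter; coarser windows mean coarser granularity.
for window, group in issues.groupby(pd.Grouper(key="date", freq="QS")):
    place_lists = group["places"].tolist()
    # build_network would be the network constructor sketched above.
    print(window.date(), place_lists)
```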
3.5 California Wildfires and Land Cover Change
Data Source and Uses. Two sets of data are used in this section. The first is regional land cover data of California in four different years (1996, 2001, 2006, and 2010), produced by the Coastal Change Analysis Program (C-CAP) of the National Oceanic and Atmospheric Administration (NOAA) and updated every five years¹¹. Landsat TM 5 satellite imagery is used to produce these data, with 25 different land cover classes. The second data set is a list of the 20 largest California wildfires from 1932 to 2015, produced by the California Department of Forestry and Fire Protection¹² (see Figure 6).
Monitoring land cover change is important for policy decisions, regulatory actions, and subsequent land-use activities. These data are frequently used to generate landscape-based metrics and to assess landscape condition and monitor status and trends over a specified time interval [14]. Land cover change is sometimes set in motion by individual landowners and sometimes driven by environmental forces. Over the past few decades, the most prominent land changes within the U.S. have been changes in the amount and kind of forest cover. Logging practices, development, urban expansion, and wildfires play the most important roles in this trend [16].

¹¹ https://coast.noaa.gov/ccapftp/
¹² http://www.fire.ca.gov/communications/downloads/fact_sheets/20LACRES.pdf

Fig. 6. Data sources: Land cover in California in 1996 (left); 20 largest California wildfires (right).
Questions and Lens Support. To analyze the effect of wildfires on land cover change in California between 1996 and 2010, in five-year time periods, we propose to use an event lens to look at the data. Both land cover change and wildfires can be seen as events, allowing users to relate and compare land cover change to wildfires, asking questions such as:

- Is there a relationship between the number and the magnitude of wildfires and the amount of land cover change in a given time period?
- What time period has experienced the greatest number of wildfires?
- In what time period has the area of land affected by wildfires been greatest?

Defining environmental phenomena as events enables the users to evaluate the interaction of these phenomena with land cover change and to reason about them. This is not easily done, if at all, with conventional methods.
Constructing the Lens. An event lens enables the user to choose the unit on which they want to study change (the whole field, sub-fields, or even a set of objects). Each of these units could be defined in various sizes and forms, applying a granularity lens. To define land cover change events, we consider each satellite image as representing a field. By comparing two consecutive fields in time, an event is constructed wherever there is a difference between these two fields. The total number of events in which a certain class has changed can then be calculated, and this value gives an estimate of the amount of land cover change. Next, each of the 20 wildfires is defined as a separate event that has location (county), name, date, and affected area as properties. To construct events:

- Compare land cover fields for all consecutive pairs.
- Create an event (and populate its properties) if there is a difference between two fields at a certain location.
- Create wildfire events with corresponding properties.
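A sketch of these steps in Python with numpy; the toy arrays and attribute values stand in for the C-CAP rasters and the CalFire list:

```python
import numpy as np

# Two consecutive land cover fields (toy 2x2 grids of class codes).
landcover = {
    1996: np.array([[11, 42], [42, 42]]),
    2001: np.array([[11, 42], [21, 42]]),  # one cell changed class
}

change_events = []
years = sorted(landcover)
for y0, y1 in zip(years, years[1:]):
    changed = landcover[y0] != landcover[y1]
    # Create an event wherever two consecutive fields differ.
    for row, col in zip(*np.nonzero(changed)):
        change_events.append({"period": (y0, y1), "cell": (row, col),
                              "from": landcover[y0][row, col],
                              "to": landcover[y1][row, col]})

# Wildfire events carry location, name, date, and affected area as properties.
wildfires = [{"name": "Cedar", "county": "San Diego",
              "date": "2003-10", "acres": 273246}]

print(len(change_events), "land cover change events")
```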
Having these two sets of events, we can observe their overlap, containment, and time of occurrence. The results of applying the overlap observer, for example, show that the 2006-2010 period, with six major wildfires and almost 1116 acres of land affected, has by far the greatest number of wildfires and the largest total area affected among the three periods.
Secondary Lens. Since the wildfire data are at the county level, an idea for the secondary lens would be to use an object lens. One can then apply the event lens to these objects to assess the effect of each wildfire on its corresponding county and to study their relationship. Objects can be constructed using Geospatial Object Based Image Analysis (GEOBIA) methods such as a supervised maximum likelihood classifier [17].
4 Conclusions
The idea of spatial lenses to view environments through data sets is presented
here as a counterpoint to considering data sets as implying singular world views.
Data are just data, and while they always result from a certain conceptualization, one can often beneficially interpret them in other ways, including ways not intended by their authors. Today’s GIS practice tends to lock users into conceptualizations based on data formats, discouraging the exploration of alternative views of data and, more importantly, of the phenomena under study. Balancing a possible need to restrict what can be done with data against the opportunity to exploit spatiality, current practice errs on the timid side, i.e., it is overly restrictive (while still not really preventing inappropriate uses).
Core concepts are a way of thinking about, encoding, and computing with phenomena in a few intuitive spatial ways. Users of spatial data will ideally approach these lenses knowing what questions they want to answer, and these questions will inform the choice of operations and consequently the lens. Therefore, in order to get meaningful results, data sets should be fed to lenses applied to study areas, rather than applying lenses to data sets. The question of which data sets can inform which lenses remains to be studied. Implementing more generic constructor operations for each lens will make the lens more specific by admitting certain data types but not others. Switching between different lenses, on the other hand, appears to be possible even for one and the same data set (for example, seeing point measurements as defining objects or fields).
The five case studies presented in this paper explore, illustrate, and test the
idea of spatial lenses. They were chosen to represent a broad spectrum of data
sets and applications, as well as to cover all four content and two quality concepts
of spatial information defined in our previous work [18]. We have arranged them
to start with straightforward object and field lens applications and progress to
more elaborate or unusual applications. Each case study furthermore illustrates
some less obvious aspects of the chosen data sets. Jointly, they demonstrate
the versatility of the spatial lens idea, resulting from the decoupling of spatio-
temporal question answering from data and file formats. For example, examining
land cover time series from an event perspective is conceptually intuitive and
powerful, but not possible using traditional techniques.
Our lenses were straightforward for GIS-savvy users to build. Our ambition in ongoing work is to build six generic lenses, usable by domain experts with little or no GIS expertise. Consequently, the next step in this research is to abstract each lens constructor from peculiarities in the data sets and build pieces of software that can be deployed as libraries or web services. These lenses will be parameterized to become applicable to all data sets that allow for an interpretation through them. For example, a field lens, when fed with a point data set with values from a continuously varying phenomenon, would determine the spatial and temporal domain of the field, ask the user for an interpolation method (or apply a default method), and produce an enriched data set that can then be queried by map algebra tools, used to produce isoline maps, or applied to any other field-based analysis.
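As a sketch of how such a generic field lens might behave, assuming scattered 2D point measurements and scipy's griddata as the default interpolation method:

```python
import numpy as np
from scipy.interpolate import griddata

def field_lens(points, values, method="linear"):
    # Constructor: fixes the field's domain (the convex hull of the points)
    # and an interpolation method, returning a queryable field observer.
    points = np.asarray(points)
    values = np.asarray(values)
    def value_at(p):
        return float(griddata(points, values, np.atleast_2d(p),
                              method=method)[0])
    return value_at

temp_at = field_lens([(0, 0), (0, 10), (10, 0), (10, 10)],
                     [14.0, 16.0, 18.0, 20.0])
print(temp_at((5, 5)))  # interpolated value inside the domain
```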
Acknowledgments. The work presented in this paper (and the writing of the paper) was part of a graduate research seminar at the Geography Department of UCSB. All authors have contributed equally to the paper and are therefore listed in alphabetical order, with the seminar teacher going last. Additional contributions by Carlos Baez, Andrea Ballatore, Chandra Krintz, George Technitis, and Rich Wolski are gratefully acknowledged. The work was supported by the Center for Spatial Studies at UCSB.
References
1. Camara, G., Egenhofer, M.J., Ferreira, K., Andrade, P., Queiroz, G., Sanchez, A.,
Jones, J. and Vinhas, L.: Fields as a generic data type for big spatial data. In
Geographic Information Science (pp. 159-172). Springer International Publishing
(2014)
2. Lowe, M.: Night Lights and ArcGIS: A Brief Guide (2014). http://economics.mit.edu/files/8945; Online, accessed Nov-2015
3. Codd, E. F.: A relational model of data for large shared data banks. Communications
of the ACM, 13(6), 377-387 (1970)
4. Kuhn, W., Kauppinen, T., and Janowicz, K.: Linked Data - A paradigm shift for geographic information science. In Geographic Information Science (pp. 173-186). Springer International Publishing (2014)
5. Vahedi, B., Kuhn, W., and Ballatore, A.: Question Based Spatial Computing - A
Case Study. In AGILE 2016 (in press). Springer International Publishing (2016)
6. Albrecht, J.: Universal analytical GIS operations: A task-oriented systematization
of data structure-independent GIS functionality. Geographic information research:
Transatlantic perspectives, 577-591 (1998)
7. Tomlin, C. D.: Geographic Information Systems and Cartographic Modeling. Prentice Hall (1990)
8. Goodchild, M.F., Yuan, M. and Cova, T.J.: Towards a general theory of geographic
representation in GIS. International journal of geographical information science,
21(3), pp.239-260 (2007)
9. Ester, M., Kriegel, H. P., Sander, J., and Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pp. 226-231 (1996)
10. Montello, D. R., Friedman, A., and Phillips, D. W.: Vague cognitive regions in ge-
ography and geographic information science. International Journal of Geographical
Information Science, 28(9), 1802-1820 (2014)
11. Hill, L. L.: Georeferencing: The geographic associations of information. MIT Press
(2009)
12. De Berg, M., Van Kreveld, M., Overmars, M., and Schwarzkopf, O. C.: Convex Hulls: Mixing Things. In Computational Geometry (pp. 243-258). Springer Berlin Heidelberg (2000)
13. Templeton, C., Brown, T., Battacharyya, S., and Boyd-Graber, J.: Mining the Dispatch under Supervision: Using Casualty Counts to Guide Topics from the Richmond Daily Dispatch Cor. In Chicago Colloquium on Digital Humanities and Computer Science (2011)
14. Jones, K. B., Riitters, K. H., Wickham, J. D., D. Roger Jr, O'Neill, R. V., Chaloud, D. J., Smith, E. R., and Neale, A. C.: An ecological assessment of the United States mid-Atlantic region: a landscape atlas (1997)
15. Pielke, R. A., Marland, G., Betts, R. A., Chase, T. N., Eastman, J. L., Niles, J. O.,
and Running, S. W.: The influence of land-use change and landscape dynamics on
the climate system: relevance to climate-change policy beyond the radiative effect
of greenhouse gases. Philosophical Transactions of the Royal Society of London A:
Mathematical, Physical and Engineering Sciences, 360(1797), 1705-1719 (2002)
16. Brown, D. G., C. Polsky, P. Bolstad, S. D. Brody, D. Hulse, R. Kroh, T. R. Love-
land, and A. Thomson: Land Use and Land Cover Change. Climate Change Impacts
in the United States. in The Third National Climate Assessment, J. M. Melillo,
Terese (T.C.) Richmond, and G. W. Yohe (Eds.), U.S. Global Change Research
Program, 318-332 (2014)
17. Walter, V.: Object-based classification of remote sensing data for change detection. ISPRS Journal of Photogrammetry and Remote Sensing, 58(3), 225-238 (2004)
18. Kuhn, W. and Ballatore, A.: Designing a Language for Spatial Computing. In
AGILE 2015 (pp. 309-326). Springer International Publishing (2015)
19. Liskov, B. and Guttag, J.: Abstraction and Specification in Program Development. MIT Press (1986)