Talk of the Town: Discovering Open Public Data via Voice Assistants
Sara Lafia
Department of Geography, University of California, Santa Barbara, USA
slafia@ucsb.edu
Jingyi Xiao
Department of Geography, University of California, Santa Barbara, USA
jingyi_xiao@ucsb.edu
Thomas Hervey
Department of Geography, University of California, Santa Barbara, USA
thomasahervey@ucsb.edu
Werner Kuhn
Department of Geography, University of California, Santa Barbara, USA
werner@ucsb.edu
Abstract
Access to public data in the United States and elsewhere has steadily increased as governments
have launched geospatially-enabled web portals like Socrata, CKAN, and Esri Hub. However,
data discovery in these portals remains a challenge for the average user. Differences between
users’ colloquial search terms and authoritative metadata impede data discovery. For example, a
motivated user with expertise can leverage valuable public data about transportation, real estate
values, and crime, yet it remains difficult for the average user to discover and leverage data. To
close this gap, community dashboards that use public data are being developed to track initiatives
for public consumption; however, dashboards still require users to discover and interpret data.
Alternatively, local governments are now developing data discovery systems that use voice assistants
like Amazon Alexa and Google Home as conversational interfaces to public data portals. We explore
these emerging technologies, examining the application areas they are designed to address and the
degree to which they currently leverage existing open public geospatial data. In the context of
ongoing technological advances, we envision using core concepts of spatial information to organize
the geospatial themes of data exposed through voice assistant applications. This will allow us to
curate them for improved discovery, ultimately supporting more meaningful user questions and their
translation into spatial computations.
2012 ACM Subject Classification Information systems → Service discovery and interfaces
Keywords and phrases data discovery, open public data, voice assistants, essential model, GIS
Digital Object Identifier 10.4230/LIPIcs.COSIT.2019.10
Category Short Paper
Acknowledgements
The work presented in this paper resulted from a graduate research seminar at
the UCSB Department of Geography. Feedback from Behzad Vahedi and Gengchen Mai is gratefully
acknowledged. The work was supported by the UCSB Center for Spatial Studies.
1 Open Data: Of, By, and For the People?
Open data, also called public sector information, aspire to increase the transparency of
government activities and their accountability to the public [10]. In the United States,
mandates for open data are often satisfied in part by the adoption of platforms, like CKAN
and ArcGIS Hub, which mediate public access to government data catalogs. The platforms
often include both geospatial (e.g. parcel maps) and non-geospatial data (e.g. tax tables).
However, the production, maintenance, and dissemination of such authoritative open public
data is costly for providers and the effort does not guarantee increased public engagement
[6]. Furthermore, work remains to be done to help governments keep track of the direct and
indirect benefits of their open data policies, measured in part by tracking data reuse [2].
Impediments to the uptake of open public data include challenges with data discovery
and usability. Discovery is understood broadly as a mode of exploratory search that involves
browsing for task-appropriate data, while usability describes the fitness of data for a defined task¹. A major impediment to data discovery in human-system communication is reflected by the “vocabulary problem” [5], in which users rarely agree on what to call the things
that they want to find. This makes effective keyword-based search and discovery difficult to
accomplish in public open data portals. To address this problem, some open data portals like
Esri’s ArcGIS Hub have partnered with community organizations to develop ontologies that
map the terms used to describe community-level initiatives from authoritative vocabularies
(e.g. the USGS Thesaurus) to users’ colloquial terminology [8].
While this strategy addresses data discovery, it does not address underlying issues with
data usability. Even if users are better able to identify task-appropriate data, they generally do
not know how to assess the fitness of data for a given task and are still expected to manipulate
and analyze data to gain insights. Given these constraints, services like data dashboards are
being developed, allowing users to track vital community issues (e.g. pedestrian fatalities²) without requiring them to manipulate, clean, or visualize data. Even more empowering are alternative modalities, such as those offered by voice assistants, which are growing in popularity³ and have implications for open public data discovery and use. Governments
have suggested that voice assistants might offer new interfaces for connecting community
members to public services and information exposed through open public data portals.
In this paper, we explore the current capabilities of various voice assistants under
development by local governments across the United States. We focus on the application
areas that these systems are designed to address and examine how (if at all) they leverage
geospatial data. Next, we discuss the challenges that voice assistants face when answering
geospatial questions. Finally, we envision using core concepts of spatial information [7] to
organize the geospatial themes of data that users want to discover, with the goal of supporting
a broader range of user questions and spatial computations on them. We focus primarily on
improvements to discovery for existing systems that also carry benefits for data usability.
2 State of the Art for Government Voice Assistants
Voice assistants, such as Google Assistant and Amazon Alexa, are now widely available on commercial smart speakers. A recent survey⁴ has projected that half of all U.S. homes
will own smart speakers by the end of 2019. The same survey also reported that the most
common interactions with voice services include asking questions, performing online searches,
performing basic research like confirming information, and asking for directions.
While today’s voice assistants are used to control home automation systems and perform
other basic daily tasks, interest has shifted to more intelligent interactions such as enabling
natural conversations and answering questions.

¹ https://www.force11.org/group/fairgroup/fairprinciples
² http://visionzero.lacity.org/map/
³ https://www.citylab.com/solutions/2018/10/amazon-alexa-smart-speakers-city/573412/
⁴ https://www.cmo.com/features/articles/2018/9/7/adobe-2018-consumer-voice-survey.html

Users are able to ask questions about real-time information, such as what time is it now? and what will the weather be like tomorrow?
When users talk to a voice assistant, their spoken words can be converted to text by APIs (e.g.
Amazon’s automatic speech recognition⁵ and Google’s Speech-to-Text API⁶). The diversity of expression in human language poses enormous difficulties for language understanding. With state-of-the-art natural language understanding (NLU) techniques, including syntax
analysis (e.g. tokenization, identifying part-of-speech), entity recognition (e.g. organization,
person, location), sentiment analysis, and intent and topic detection, machines are able to
“understand” user questions. By detecting the given topic and intent, related information and
potential answers are retrieved from various databases like Wikipedia, Google’s Knowledge Graph, and Microsoft’s Concept Graph. Retrieved information is then used to generate
responses using different methods, such as rule-based and generative methods. The responses
are then converted from text back to speech to answer user questions conversationally.
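To make this pipeline concrete, the following minimal Python sketch strings the stages together. It is our illustration, not code from any of the systems above: `speech_to_text`, `detect_intent`, `retrieve_answer`, and `text_to_speech` are hypothetical stand-ins for the commercial ASR, NLU, retrieval, and TTS services just mentioned.

```python
# A minimal, illustrative voice-assistant pipeline. All functions are
# hypothetical stand-ins for commercial services (ASR, NLU, retrieval, TTS);
# only the control flow mirrors the stages described above.

def speech_to_text(audio: bytes) -> str:
    """Stand-in for an ASR API (e.g. Amazon ASR, Google Speech-to-Text)."""
    return "what will the weather be like tomorrow"

def detect_intent(utterance: str) -> dict:
    """Stand-in for NLU: tokenization, entity recognition, intent detection."""
    if "weather" in utterance:
        return {"intent": "get_forecast", "entities": {"when": "tomorrow"}}
    return {"intent": "unknown", "entities": {}}

def retrieve_answer(intent: dict) -> str:
    """Stand-in for retrieval from knowledge bases or open data portals."""
    if intent["intent"] == "get_forecast":
        return "Tomorrow will be sunny with a high of 21 degrees."
    return "Sorry, I could not find an answer."

def text_to_speech(response: str) -> bytes:
    """Stand-in for speech synthesis; here we simply encode the text."""
    return response.encode("utf-8")

def handle_request(audio: bytes) -> bytes:
    utterance = speech_to_text(audio)   # speech -> text
    intent = detect_intent(utterance)   # text -> intent and entities
    response = retrieve_answer(intent)  # intent -> retrieved answer
    return text_to_speech(response)     # text -> speech

if __name__ == "__main__":
    print(handle_request(b"<audio frames>").decode("utf-8"))
```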
Voice assistant technology is now being leveraged to retrieve and reason on open public
data through the development of skills (which are essentially micro tasks). In 2017, Esri
prototyped an early government voice assistant application called Sonar⁷. It offered a chatbot
that completed predefined tasks and addressed standard questions about a given community
by leveraging open data available through Esri Hub. As shown in Figure 1, Sonar performs
lookups on data matching the themes described in a user’s query at a defined location. Users
can ask about city services (e.g., trash pickup), safety (e.g., crimes), and transportation (e.g.,
bus routing). Sonar facilitates both open public data discovery and use by templatizing a set
of intents designed to perform basic computations on geospatial data. In other words, Sonar
provides a set of “core questions” that a community member would want to ask, and maps
them to available, thematically relevant data, using location as context. Thus, governments
can build additional skills upon Sonar’s foundation.
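As a rough illustration of how such templatized intents could map onto portal lookups, the sketch below routes a few Sonar-like intents (named after those in Figure 1) to placeholder queries. The handler and query functions are hypothetical stand-ins written for this paper, not Sonar’s actual implementation.

```python
# Hypothetical intent router in the spirit of Sonar's Alexa skill.
# The intent names follow Figure 1; the query functions are placeholders
# for lookups against an open data portal such as Esri Hub.

def get_population(location: str) -> str:
    # Placeholder: would query a census or boundary layer near `location`.
    return f"About 92,000 people live in {location}."

def get_data(theme: str, location: str) -> str:
    # Placeholder: would search portal metadata for datasets matching `theme`.
    return f"I found 3 open datasets about {theme} for {location}."

def summarize_data(theme: str, location: str) -> str:
    # Placeholder: would compute simple statistics on the matched dataset.
    return f"Last month there were 14 {theme} reports near {location}."

INTENT_HANDLERS = {
    "GetPopulation": lambda slots: get_population(slots["location"]),
    "GetData": lambda slots: get_data(slots["theme"], slots["location"]),
    "SummarizeData": lambda slots: summarize_data(slots["theme"], slots["location"]),
}

def handle_intent(intent_name: str, slots: dict) -> str:
    handler = INTENT_HANDLERS.get(intent_name)
    return handler(slots) if handler else "Sorry, I can't help with that yet."

if __name__ == "__main__":
    print(handle_intent("GetData", {"theme": "crime", "location": "Goleta"}))
```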
Since the advent of Sonar, many U.S. cities have developed ad hoc voice assistant
applications. Many are designed to reduce administrative burdens, such as “311 information”
calls. For example, the Alexa skills developed for Albuquerque, New Mexico⁸ allow residents to register complaints about graffiti, weeds, and abandoned vehicles, and to ask questions about
city-owned facilities, like fee information for public parks. Raleigh, North Carolina⁹ also
allows residents to ask questions about the government, such as trash pickup days or elected
representatives for a given neighborhood. Similarly, specific city departments, like New York City’s Department of Environmental Protection¹⁰, have created Alexa skills that allow
residents to check their water usage and pay their bills. Los Angeles, California¹¹ has
released several voice applications that provide residents with local information about recent
earthquakes. The earthquake alert works on the Google Home system, which harvests USGS
seismic data to notify residents of recent earthquake events based on the location of their
device. The Alexa skills of Johns Creek, Georgia¹² are robust, continuously mining the city’s
open data portal to provide updated information about zoning and road closures.
These applications all work by knowing where to find open public data and how to use
it in order to answer typical questions that people ask about government.

⁵ https://developer.amazon.com/alexa-skills-kit/asr
⁶ https://cloud.google.com/speech-to-text/
⁷ https://github.com/Esri/sonar
⁸ https://www.cabq.gov/alexa
⁹ https://www.raleighnc.gov/home/news/content/CorNews/Articles/AlexaApp.html
¹⁰ https://www1.nyc.gov/html/dep/html/customer_assistance/amazon-alexa.shtml
¹¹ https://assistant.google.com/services/a/uid/00000096ea087604?hl=en
¹² https://www.amazon.com/City-of-Johns-Creek-GA/dp/B07BHPGDR1

Figure 1: A list of the six intents (ping, get population, get data, summarize data, add note, and get map) accessible to users through the Sonar project’s Alexa skill.

Today, many
applications are built on top of voice assistant-accessible databases that contain standardized
open public data (e.g. government data catalogs exposed through Esri’s Open Data Hub).
However, new trends such as the uptake of the schema.org Dataset standard¹³ for the annotation of open public metadata (e.g. in Google Dataset Search¹⁴) enable the discovery of open public data through search engines [3]. As public data discoverability increases for
the average user, it will also likely increase for the average voice assistant application. Thus,
as it becomes easier for humans and machines to discover open public data, how can data be
organized to facilitate use? We propose that time and space, inherent to geospatial data in particular, make the themes that such data are “about” more amenable to such curation.
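For illustration, a schema.org Dataset annotation of the kind harvested by Google Dataset Search can be generated and embedded in a portal page as JSON-LD. The sketch below builds such an annotation in Python for a hypothetical parcel dataset; all names, values, and URLs are invented for the example.

```python
import json

# Minimal schema.org/Dataset annotation for a hypothetical open dataset.
# Embedding this JSON-LD in a portal page makes the dataset harvestable
# by search engines such as Google Dataset Search.
dataset_annotation = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "City Parcel Boundaries (example)",  # invented example values
    "description": "Tax parcel polygons maintained by the city assessor.",
    "keywords": ["parcels", "property", "zoning"],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "spatialCoverage": {
        "@type": "Place",
        "geo": {"@type": "GeoShape", "box": "34.39 -119.88 34.46 -119.63"},
    },
    "distribution": [{
        "@type": "DataDownload",
        "encodingFormat": "GeoJSON",
        "contentUrl": "https://data.example.gov/parcels.geojson",  # placeholder URL
    }],
}

print(json.dumps(dataset_annotation, indent=2))
```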
3 Geospatial Limitations of Government Voice Assistants
The prospect for data discovery and question answering in the applications described in
Section 2 is promising. Many municipalities are working to rapidly expand the skills that
their voice assistants use to help answer questions and engage their communities. This
is a reasonable tactic because it is likely that the efficacy of voice assistants will improve
greatly over the next decade. These systems are synthesizing factual data with real-time
computational abilities, using semantic technologies to answer increasingly complex questions.
However, we have observed two problems with this trend, which are even more evident
when it comes to addressing geospatial questions: 1) voice assistant applications frequently
bypass discovery, and 2) governments are building unsustainable skills. The first problem
means that a system supplies an answer to a question without first allowing a user to explore
available data. This may not seem like a problem when considering the alternative: a voice assistant that would conversationally list available data.

¹³ https://schema.org/Dataset
¹⁴ https://toolbox.google.com/datasetsearch

This mode of interaction would
be tedious and far less efficient than exploring data by using a graphical browser. In a
way, voice assistants are perceived to have abilities like those of a question-answering oracle.
These question-answering systems bypass the process of manually discovering, manipulating,
using, and reasoning on data themselves. In many cases, users quickly accept the top suggestions returned by search engines¹⁵. However, much of the value of open data, especially open
geospatial data, is the ability to explore and synthesize information, and conduct visual
analysis. This is not possible with a voice assistant. What would be optimal for discovery
is to make voice assistants more conversational. If a voice assistant application creates an
index of datasets based on generic concepts that a user is familiar with, such as objects and
networks, then the system could conversationally suggest relevant datasets.
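A minimal sketch of such an index, assuming datasets have already been tagged with the core concept they best fit: detecting a concept in the user’s question then drives a short, conversational list of candidates. The concept tags and dataset names below are invented for illustration.

```python
# Hypothetical index of portal datasets keyed by core concepts of spatial
# information. A voice assistant could use the concept detected in a user's
# question to suggest a short list of relevant datasets conversationally.
DATASET_INDEX = {
    "object": ["Hospitals", "Fire stations", "Parcels"],
    "network": ["Bus routes", "Road centerlines", "Bike lanes"],
    "field": ["Rainfall", "Elevation", "Air quality"],
}

def suggest_datasets(concept: str, limit: int = 3) -> str:
    candidates = DATASET_INDEX.get(concept, [])[:limit]
    if not candidates:
        return "I did not find datasets matching that kind of question."
    return ("You might be interested in these datasets: "
            + ", ".join(candidates) + ". Should I look closer at one of them?")

if __name__ == "__main__":
    # e.g. "how many bus stops are between my house and downtown?" -> network
    print(suggest_datasets("network"))
```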
The second problem is that if governments continue to build skills in their current manner,
after a few years, they will likely have to maintain many heterogeneous (geospatial) tasks that
will also be hard to improve. In other words, building skills in this manner is unsustainable.
Furthermore, most of the aforementioned examples of applications in Section 2 are not
explicitly geospatial. Those that could be considered geospatial work by retrieving pre-
generated data from factual databases (e.g. water usage), and some leverage near real-time
geospatial information (e.g. earthquakes). More complex geospatial questions, like those
specific to a user’s location, require more complex geospatial computing and cannot yet be
answered. For instance, a question like which hospitals are open now and are also within a
20-minute drive from home? cannot be answered simply by retrieving data from databases.
Such questions require geospatial analysis and computing, which could be partially supported
by leveraging existing APIs. We therefore believe that if skill building could leverage the
organizational structure of data, and a corresponding conceptual model that humans have of
these types of data, then perhaps computing with them could be easier as well.
4 A Vision for Geospatially-Enabled Voice Assistants
We propose a conceptual framework adopted from Cook and Daniels’ software design
methodology [4] as a means of facilitating geospatial data discovery and subsequent use to
provide answers to users’ geospatial questions. Our work formalizing this conceptual model
for spatial data is ongoing and is applicable to both GIS and voice assistant environments.
We are not proposing an implementation solution; rather, we are proposing a conceptual
model to help organize the things that people want to ask about and the computations on
geospatial data to answer those questions.
Cook and Daniels’ software design methodology comprises an essential model, a formal model, and a system model. The essential model is a model of the world, built from objects and events, that is used to understand a situation. The formal model (also called the specification model) states what the software will do and formalizes the essential model with mathematical operations. The system model (also called the implementation model) specifies
system-level behavior based on the formal model.
In our framework, the essential model specifies concepts about the real world. Since
spatial questions are about things in the real world, they are cognitively represented by core
concepts of spatial information like fields and networks [7]. Thus, the procedure to answer a question can be formalized as a set of spatial operations with mathematical foundations
(as a formal model). The information detected from user questions can be used as input for the spatial operations. The spatial operations can then be implemented in a chosen software (as a system model). The results are finally computed by the chosen software and returned to the user as an answer. An example of this framework is shown in Figure 2.

¹⁵ https://moz.com/blog/google-organic-click-through-rates-in-2014
Figure 2: The essential (blue), formal (green), and system (red) conceptual levels.
To operationalize this framework, we need a means of relating human concepts to
formal operations and system-level commands in a GIS [1]. Kuhn’s core concepts of spatial information [7] provide a bridge, specifying concepts in user questions at the essential level
and relating them to operations at the formal level. Previous work on question-based spatial
computing used data abstraction to relate user questions to computations in a GIS [9].
Progress can be made on the essential and formal models for at least two core concepts:
fields and networks. Fields as an essential model conceptualize continuous phenomena and
are characterized by continuous functions from location to theme. Prototypical examples
include elevation, temperature, and rainfall. Fields are formalized by map algebra. The field
concept allows users to ask questions like how much did it rain in my neighborhood last night?
Networks are a topological essential model, formalized by graph theory. They allow users to
ask questions like how many bus stops are between my house and downtown? The system
model could take the form of an existing geocomputation API (e.g. GDAL, ArcGIS Online, etc.). Today, architectures of many geospatially enabled portals (e.g. Socrata with QGIS¹⁶, Hub with ArcGIS Online¹⁷) are already equipped to handle the system model specifications.
By formalizing the operations that are to take place on the geospatial data in the portal,
today’s voice assistant applications move closer to the capabilities of conversational GIS.
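To hint at how the two formalizations differ, the toy sketch below answers one field-style and one network-style question in plain Python: a zonal statistic over a gridded rainfall field stands in for map algebra, and a shortest path over a small stop graph stands in for graph theory. A real system model would dispatch these operations to a geocomputation API such as GDAL or ArcGIS Online; all data here are invented.

```python
from collections import deque

# --- Field: "how much did it rain in my neighborhood last night?" ----------
# A field is a function from location to theme; here a dict from grid cell
# to rainfall in mm, so the question reduces to a zonal statistic.
rainfall = {(0, 0): 2.1, (0, 1): 3.4, (1, 0): 1.8, (1, 1): 2.6}
neighborhood_cells = {(0, 1), (1, 1)}  # cells covering "my neighborhood"

def zonal_sum(field, zone):
    return sum(value for cell, value in field.items() if cell in zone)

# --- Network: "how many bus stops are between my house and downtown?" ------
# A network is formalized by graph theory; here an adjacency list of stops,
# so the question reduces to a shortest-path length.
bus_graph = {"Home": ["A"], "A": ["Home", "B"], "B": ["A", "Downtown"],
             "Downtown": ["B"]}

def stops_between(graph, start, end):
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        node, hops = frontier.popleft()
        if node == end:
            return max(hops - 1, 0)  # count intermediate stops only
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return None

if __name__ == "__main__":
    print(f"Rainfall in my neighborhood: {zonal_sum(rainfall, neighborhood_cells)} mm")
    print(f"Stops between home and downtown: {stops_between(bus_graph, 'Home', 'Downtown')}")
```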
The mathematical formalization of fields and networks suggests a manageable set of
questions that users could ask of open government data. We surmise that these two concepts,
their mathematical models, and the accompanying software packages, could provide an entry
point for mapping between user questions and computations, following the architecture
illustrated in Figure 2. In this vision, voice assistants serve as a kind of conversational GIS,
answering a far broader range of geospatial questions about government.
¹⁶ https://dev.socrata.com/blog/2016/06/13/geospatial-analysis.html
¹⁷ https://doc.arcgis.com/en/hub/sites/explore-data.htm
Voice assistants following this framework would organize contents based on their spatial
concepts, suggesting data and operations to perform on them based on the concepts in
a user’s question. For example, such a system could parse the previous question which
hospitals are open now and are also within a 20-minute drive from home? and recognize that
“hospitals” are likely to be objects in a health care data set and determine that “a 20-minute
drive” would require a road network data set. A computation would intersect currently open
hospitals (stored as an attribute of the open dataset) and a 20-minute roadway service area
from the user’s home. If suitable open data sets do not exist, the voice assistant could suggest
alternatives with similar themes based on the concepts present in the original question.
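A minimal sketch of that decomposition, with invented data: the attribute filter (open now) is intersected with a service-area test. For simplicity, the 20-minute service area is reduced to a precomputed drive time per hospital rather than a true road-network isochrone, which a real system would request from a routing API.

```python
# Hypothetical decomposition of "which hospitals are open now and within a
# 20-minute drive from home?". A real system would compute a road-network
# service area via a routing API; here drive times are given directly.
hospitals = [
    {"name": "General Hospital", "open_now": True,  "drive_minutes": 12},
    {"name": "Valley Clinic",    "open_now": False, "drive_minutes": 8},
    {"name": "Coast Medical",    "open_now": True,  "drive_minutes": 27},
]

def open_hospitals_within(hospitals, max_minutes=20):
    # Intersect the attribute filter (open now) with the service-area test.
    return [h["name"] for h in hospitals
            if h["open_now"] and h["drive_minutes"] <= max_minutes]

if __name__ == "__main__":
    print(open_hospitals_within(hospitals))  # -> ['General Hospital']
```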
5 Conclusion
A vast amount of open public data is ready for discovery. Advances in voice assistant technology have the potential to actively connect users to developments in their
communities. In this paper, we have explored voice assistant applications that governments
are developing to improve open public data discovery and use. To address challenges that
today’s applications face, we have proposed a conceptual framework informed by core concepts
of spatial information and structured as an essential, a formal, and a system model. Relating
the language of user questions about the world to spatial computations is a step toward
improving discovery and use of open public data for users and their communities.
References
1. Guoray Cai, Hongmei Wang, Alan M. MacEachren, and Sven Fuhrmann. Natural Conversational Interfaces to Geospatial Databases. Transactions in GIS, 9(2):199–221, March 2005. doi:10.1111/j.1467-9671.2005.00213.x.
2. Wendy Carrara, Wae San Chan, Sander Fischer, and Evan Steenbergen. Creating value through open data: Study on the impact of re-use of public data resources. European Commission, 2015. doi:10.2759/328101.
3. Davide Castelvecchi. Google unveils search engine for open data. Nature, 561(7722):161–162, 2018. doi:10.1038/d41586-018-06201-x.
4. John Daniels and Steve Cook. Designing Object Systems: Object-oriented Modelling with Syntropy. Prentice Hall, Englewood Cliffs, NJ, September 1994. ISBN: 0-13-203860-9.
5. George W. Furnas, Thomas K. Landauer, Louis M. Gomez, and Susan T. Dumais. The vocabulary problem in human-system communication. Communications of the ACM, 30(11):964–971, 1987. doi:10.1145/32206.32212.
6. Peter A. Johnson, Renee Sieber, Teresa Scassa, Monica Stephens, and Pamela Robinson. The Cost(s) of Geospatial Open Data. Transactions in GIS, 21(3):434–445, 2017. doi:10.1111/tgis.12283.
7. Werner Kuhn. Core concepts of spatial information for transdisciplinary research. International Journal of Geographical Information Science, 26(12):2267–2276, 2012. doi:10.1080/13658816.2012.722637.
8. Sara Lafia, Andrew Turner, and Werner Kuhn. Improving Discovery of Open Civic Data. LIPIcs–Leibniz International Proceedings in Informatics, 114(9):1–15, 2018. doi:10.4230/LIPIcs.GIScience.2018.9.
9. Behzad Vahedi, Werner Kuhn, and Andrea Ballatore. Question-based spatial computing—A case study. In Geospatial Data in a Changing World, pages 37–50. Springer, 2016. doi:10.1007/978-3-319-33783-8_3.
10. Anneke Zuiderwijk and Marijn Janssen. Open data policies, their implementation and impact: A framework for comparison. Government Information Quarterly, 31(1):17–29, 2014. doi:10.1016/j.giq.2013.04.003.