Talk of the Town: Discovering Open Public Data
via Voice Assistants
Department of Geography, University of California, Santa Barbara, USA
Access to public data in the United States and elsewhere has steadily increased as governments
have launched geospatially-enabled web portals like Socrata, CKAN, and Esri Hub. However,
data discovery in these portals remains a challenge for the average user. Differences between
users’ colloquial search terms and authoritative metadata impede data discovery. For example, a
motivated user with expertise can leverage valuable public data about transportation, real estate
values, and crime, yet it remains difficult for the average user to discover and leverage data. To
close this gap, community dashboards that use public data are being developed to track initiatives
for public consumption; however, dashboards still require users to discover and interpret data.
Alternatively, local governments are now developing data discovery systems that use voice assistants
like Amazon Alexa and Google Home as conversational interfaces to public data portals. We explore
these emerging technologies, examining the application areas they are designed to address and the
degree to which they currently leverage existing open public geospatial data. In the context of
ongoing technological advances, we envision using core concepts of spatial information to organize
the geospatial themes of data exposed through voice assistant applications. This will allow us to
curate them for improved discovery, ultimately supporting more meaningful user questions and their
translation into spatial computations.
2012 ACM Subject Classification Information systems → Service discovery and interfaces
Keywords and phrases data discovery, open public data, voice assistants, essential model, GIS
Digital Object Identifier 10.4230/LIPIcs.COSIT.2019.10
Category Short Paper
The work presented in this paper resulted from a graduate research seminar at
the UCSB Department of Geography. Feedback from Behzad Vahedi and Gengchen Mai is gratefully
acknowledged. The work was supported by the UCSB Center for Spatial Studies.
1 Open Data: Of, By, and For the People?
Open data, also called public sector information, aspire to increase the transparency of
government activities and their accountability to the public [
]. In the United States,
mandates for open data are often satisfied in part by the adoption of platforms, like CKAN
and ArcGIS Hub, which mediate public access to government data catalogs. The platforms
often include both geospatial (e.g. parcel maps) and non-geospatial data (e.g. tax tables).
© Sara Lafia, Jingyi Xiao, Thomas Hervey, and Werner Kuhn;
licensed under Creative Commons License CC-BY
14th International Conference on Spatial Information Theory (COSIT 2019).
Editors: Sabine Timpf, Christoph Schlieder, Markus Kattenbeck, Bernd Ludwig, and Kathleen Stewart;
Article No. 10; pp. 10:1–10:7
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
However, the production, maintenance, and dissemination of such authoritative open public
data are costly for providers, and the effort does not guarantee increased public engagement
[ ]. Furthermore, work remains to be done to help governments keep track of the direct and
indirect benefits of their open data policies, measured in part by tracking data reuse.
Impediments to the uptake of open public data include challenges with data discovery
and usability. Discovery is understood broadly as a mode of exploratory search that involves
browsing for task-appropriate data, while usability describes the fitness of data for a defined
task. A major impediment to data discovery is the “vocabulary problem” in human-system
communication [ ], in which users rarely agree on what to call the things
that they want to find. This makes effective keyword-based search and discovery difficult to
accomplish in open public data portals. To address this problem, some open data portals like
Esri’s ArcGIS Hub have partnered with community organizations to develop ontologies that
map the terms used to describe community-level initiatives from authoritative vocabularies
(e.g. the USGS Thesaurus) to users’ colloquial terminology.
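The crosswalk idea described above can be illustrated with a minimal sketch. It assumes a hand-built dictionary from colloquial phrases to authoritative terms; the entries, the `CROSSWALK` name, and the toy catalog are hypothetical and are not drawn from the USGS Thesaurus or from any portal's actual ontology.

```python
from typing import Dict, List, Set

# Hypothetical crosswalk from colloquial phrases to authoritative metadata terms.
# The entries are illustrative only and do not come from a real thesaurus.
CROSSWALK: Dict[str, str] = {
    "car crashes": "traffic collisions",
    "break-ins": "burglary",
    "home prices": "real estate values",
}


def expand_query(query: str) -> Set[str]:
    """Return the normalized query plus any authoritative terms it maps to."""
    normalized = query.lower().strip()
    terms = {normalized}
    for colloquial, authoritative in CROSSWALK.items():
        if colloquial in normalized:
            terms.add(authoritative)
    return terms


def search(query: str, catalog: Dict[str, Set[str]]) -> List[str]:
    """Return titles of datasets whose keywords overlap the expanded query."""
    terms = expand_query(query)
    return sorted(title for title, keywords in catalog.items() if terms & keywords)


# Toy catalog: dataset title -> authoritative keywords in its metadata.
catalog = {
    "Police Incident Reports": {"burglary", "crime"},
    "Parcel Map": {"parcels"},
}
print(search("Recent break-ins near me", catalog))
```

Without the crosswalk, the colloquial query shares no keyword with the catalog; with it, the query is expanded to include the authoritative term before matching.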
While this strategy addresses data discovery, it does not address underlying issues with
data usability. Even if users are better able to identify task-appropriate data, they generally do
not know how to assess the fitness of data for a given task and are still expected to manipulate
and analyze data to gain insights. Given these constraints, services like data dashboards are
being developed, allowing users to track vital community issues (e.g. pedestrian fatalities)
without requiring them to manipulate, clean, or visualize data. Even more empowering
are alternative modalities, such as those offered by voice assistants, which are growing in
popularity and have implications for open public data discovery and use. Governments
have suggested that voice assistants might offer new interfaces for connecting community
members to public services and information exposed through open public data portals.
In this paper, we explore the current capabilities of various voice assistants under
development by local governments across the United States. We focus on the application
areas that these systems are designed to address and examine how (if at all) they leverage
geospatial data. Next, we discuss the challenges that voice assistants face when answering
geospatial questions. Finally, we envision using core concepts of spatial information [ ] to
organize the geospatial themes of data that users want to discover, with the goal of supporting
a broader range of user questions and spatial computations on them. We focus primarily on
improvements to discovery for existing systems that also carry benefits for data usability.
2 State of the Art for Government Voice Assistants
Voice assistants are now widely available on commercial smart speakers, such as Google
Assistant and Amazon Alexa. A recent survey
has projected that half of all U.S. homes
will own smart speakers by the end of 2019. The same survey also reported that the most
common interactions with voice services include asking questions, performing online searches,
performing basic research like confirming information, and asking for directions.
3 https://www.citylab.com/solutions/2018/10/amazon-alexa-smart-speakers-city/573412/
4 https://www.cmo.com/features/articles/2018/9/7/adobe-2018-consumer-voice-survey.html
While today’s voice assistants are used to control home automation systems and perform
other basic daily tasks, interest has shifted to more intelligent interactions such as enabling
natural conversations and answering questions. Users are able to ask questions about real-time
information, such as “what time is it now?” and “what will the weather be like tomorrow?”
When users talk to a voice assistant, their spoken words can be converted to text by APIs (e.g.
Amazon’s automatic speech recognition and Google’s Speech-to-Text API). The diversity
of expression in human language poses enormous difficulties for language understanding.
With state-of-the-art natural language understanding (NLU) techniques, including syntax
analysis (e.g. tokenization, part-of-speech tagging), entity recognition (e.g. organization,
person, location), sentiment analysis, and intent and topic detection, machines are able to
“understand” user questions. Once the topic and intent are detected, related information and
potential answers are retrieved from various databases like Wikipedia, Google’s knowledge
graphs, and Microsoft’s concept graphs. Retrieved information is then used to generate
responses using different methods, such as rule-based and generative methods. The responses
are then converted from text back to speech to answer user questions conversationally.
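The stages described above (tokenization, intent and topic detection, location entity recognition, and rule-based response generation) can be sketched on already-transcribed text. This is a deliberately simplified illustration, not how Amazon's or Google's services work: the keyword tables, the `ParsedQuestion` type, and the regular expression used in place of entity recognition are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical cue tables; a production system would use trained NLU models.
INTENT_CUES = {"when": "get_schedule", "how many": "count", "where": "locate"}
TOPIC_KEYWORDS = {
    "trash": "city services",
    "crime": "safety",
    "crimes": "safety",
    "bus": "transportation",
}


@dataclass
class ParsedQuestion:
    intent: Optional[str]
    topic: Optional[str]
    location: Optional[str]


def parse_question(text: str) -> ParsedQuestion:
    """Detect intent, topic, and a place name with simple keyword rules."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z0-9']+", lowered)  # naive tokenization
    intent = next(
        (name for cue, name in INTENT_CUES.items() if lowered.startswith(cue)), None
    )
    topic = next((TOPIC_KEYWORDS[t] for t in tokens if t in TOPIC_KEYWORDS), None)
    # Stand-in for entity recognition: a capitalized phrase after "in", "near", or "at".
    match = re.search(r"\b(?:in|near|at)\s+([A-Z][\w ]*)", text)
    location = match.group(1).strip() if match else None
    return ParsedQuestion(intent, topic, location)


def respond(parsed: ParsedQuestion) -> str:
    """Rule-based response generation from the parsed question."""
    if parsed.intent is None or parsed.topic is None:
        return "Sorry, I did not understand the question."
    place = parsed.location or "your area"
    return f"Looking up {parsed.topic} data ({parsed.intent}) for {place}."


print(respond(parse_question("How many crimes were reported in Santa Barbara?")))
```

The retrieval step is omitted here; the parsed intent, topic, and location are the inputs such a step would need.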
Voice assistant technology is now being leveraged to retrieve and reason on open public
data through the development of skills (which are essentially micro tasks). In 2017, Esri
prototyped an early government voice assistant application called Sonar. It offered a chatbot
that completed predefined tasks and addressed standard questions about a given community
by leveraging open data available through Esri Hub. As shown in Figure 1, Sonar performs
lookups on data matching the themes described in a user’s query at a defined location. Users
can ask about city services (e.g., trash pickup), safety (e.g., crimes), and transportation (e.g.,
bus routing). Sonar facilitates both open public data discovery and use by templatizing a set
of intents designed to perform basic computations on geospatial data. In other words, Sonar
provides a set of “core questions” that a community member would want to ask, and maps
them to available, thematically relevant data, using location as context. Thus, governments
can build additional skills upon Sonar’s foundation.
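The notion of templatized intents that are bound to thematic data at a location can be sketched as a small handler registry. This is not Sonar's code: only the intent names (“ping”, “get data”, “summarize data”) are taken from the list in Figure 1, while the `PORTAL` dictionary, the handler functions, and all values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for an open data portal, keyed by (theme, place); values are made up.
PORTAL: Dict[Tuple[str, str], List[int]] = {
    ("crimes", "downtown"): [3, 5, 2],
    ("trash pickup", "downtown"): [1],
}

HANDLERS: Dict[str, Callable[[str, str], str]] = {}


def intent(name: str):
    """Decorator that registers a handler function under an intent name."""
    def decorator(func):
        HANDLERS[name] = func
        return func
    return decorator


@intent("ping")
def ping(theme: str, place: str) -> str:
    return "pong"


@intent("get data")
def get_data(theme: str, place: str) -> str:
    records = PORTAL.get((theme, place))
    if records is None:
        return f"No {theme} data found for {place}."
    return f"Found {len(records)} {theme} records for {place}."


@intent("summarize data")
def summarize_data(theme: str, place: str) -> str:
    records = PORTAL.get((theme, place))
    if not records:
        return f"No {theme} data found for {place}."
    mean = sum(records) / len(records)
    return f"{theme} in {place}: total {sum(records)}, mean {mean:.1f}."


def handle(intent_name: str, theme: str, place: str) -> str:
    """Dispatch a detected intent to its handler, using theme and place as context."""
    handler = HANDLERS.get(intent_name)
    if handler is None:
        return f"Intent '{intent_name}' is not supported."
    return handler(theme, place)


print(handle("summarize data", "crimes", "downtown"))
```

Adding a skill in this sketch means registering one more handler, which is one way to read the idea of building on a shared foundation.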
Since the advent of Sonar, many U.S. cities have developed ad hoc voice assistant
applications. Many are designed to reduce administrative burdens, such as “311 information”
calls. For example, the Alexa skills developed for Albuquerque, New Mexico allow residents
to register complaints about graffiti, weeds, and abandoned vehicles, and to ask questions about
city-owned facilities, like fee information for public parks. Raleigh, North Carolina
allows residents to ask questions about the government, such as trash pickup days or elected
representatives for a given neighborhood. Similarly, specific city departments, like New
York City’s Department of Environmental Protection, have created Alexa skills that allow
residents to check their water usage and pay their bills. Los Angeles, California
released several voice applications that provide residents with local information about recent
earthquakes. The earthquake alert works on the Google Home system, which harvests USGS
seismic data to notify residents of recent earthquake events based on the location of their
device. The Alexa skills of Johns Creek, Georgia are robust, continuously mining the city’s
open data portal to provide updated information about zoning and road closures.
12 https://www.amazon.com/City-of-Johns-Creek-GA/dp/B07BHPGDR1
Figure 1 A list of the six intents (ping, get population, get data, summarize data, add note, and
get map) accessible to users through the Sonar project’s Alexa skill.
These applications all work by knowing where to find open public data and how to use
it in order to answer typical questions that people ask about government. Today, many
applications are built on top of voice assistant-accessible databases that contain standardized
open public data (e.g. government data catalogs exposed through Esri’s Open Data Hub).
However, new trends such as the uptake of the schema.org Dataset standard for the
annotation of open public metadata (e.g. in Google Dataset Search) enable the discovery
of open public data through search engines [ ]. As public data discoverability increases for
the average user, it will also likely increase for the average voice assistant application. Thus,
as it becomes easier for humans and machines to discover open public data, how can data be
organized to facilitate use? We propose that time and space, inherent to geospatial data in
particular, make the themes that these data are “about” more amenable to such curation.
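To make the role of space and time in such annotations concrete, the following sketch builds a schema.org Dataset description as JSON-LD. The property names (`spatialCoverage`, `temporalCoverage`, `distribution`, and the `GeoShape` `box`) are schema.org terms; the helper function, the sample values, and the example.org URL are placeholders invented for illustration.

```python
import json
from typing import Dict, Iterable, Tuple


def dataset_jsonld(
    name: str,
    description: str,
    keywords: Iterable[str],
    bbox: Tuple[float, float, float, float],
    start: str,
    end: str,
    download_url: str,
) -> Dict:
    """Build a schema.org Dataset record; bbox is (south, west, north, east) in degrees."""
    south, west, north, east = bbox
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": list(keywords),
        # Space: a bounding box given as two corner points (lower, then upper).
        "spatialCoverage": {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        },
        # Time: an ISO 8601 interval.
        "temporalCoverage": f"{start}/{end}",
        "distribution": {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": download_url,
        },
    }


record = dataset_jsonld(
    name="Road Closures",
    description="Planned road closures (placeholder example).",
    keywords=["roads", "closures"],
    bbox=(34.0, -120.0, 35.0, -119.0),
    start="2019-01-01",
    end="2019-06-30",
    download_url="https://example.org/road-closures.csv",  # placeholder URL
)
print(json.dumps(record, indent=2))
```

A voice assistant application could filter records like this one by bounding box and time interval before considering their themes.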
3 Geospatial Limitations of Government Voice Assistants
The prospect for data discovery and question answering in the applications described in
Section 2 is promising. Many municipalities are working to rapidly expand the skills that
their voice assistants use to help answer questions and engage their communities. This
is a reasonable tactic because it is likely that the efficacy of voice assistants will improve
greatly over the next decade. These systems are synthesizing factual data with real-time
computational abilities, using semantic technologies to answer increasingly complex questions.
However, we have observed two problems with this trend, which are even more evident
when it comes to addressing geospatial questions: 1) voice assistant applications frequently
bypass discovery, and 2) governments are building unsustainable skills. The first problem
means that a system supplies an answer to a question without first allowing a user to explore
available data. This may not seem like a problem when considering the alternative: a voice
assistant that would conversationally list available data. This mode of interaction would
be tedious and far less efficient than exploring data by using a graphical browser. In a
way, voice assistants are perceived to have abilities like those of a question-answering oracle.
Such question-answering systems let users bypass the process of manually discovering,
manipulating, using, and reasoning on data themselves. In many cases, users quickly accept
the top suggestions offered by search engines. However, much of the value of open data,
especially open geospatial data, lies in the ability to explore and synthesize information and
to conduct visual analysis. This is not possible with a voice assistant. What would be optimal
for discovery
is to make voice assistants more conversational. If a voice assistant application creates an
index of datasets based on generic concepts that a user is familiar with, such as objects and
networks, then the system could conversationally suggest relevant datasets.
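A minimal sketch of such a concept-based index follows. It assumes each catalog entry has already been tagged with one generic concept; the catalog entries, the tags, and the wording of the spoken reply are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical catalog: each dataset is tagged with one generic concept.
CATALOG = [
    {"title": "Hospital Locations", "concept": "object"},
    {"title": "Bus Routes", "concept": "network"},
    {"title": "Street Centerlines", "concept": "network"},
    {"title": "Rainfall Grid", "concept": "field"},
]


def build_index(catalog: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group dataset titles by the concept they are tagged with."""
    index: Dict[str, List[str]] = defaultdict(list)
    for entry in catalog:
        index[entry["concept"]].append(entry["title"])
    return index


def suggest(concept: str, index: Dict[str, List[str]], limit: int = 2) -> str:
    """Compose a short spoken suggestion instead of reading out the whole catalog."""
    titles = sorted(index.get(concept, []))
    if not titles:
        return f"I could not find any {concept} datasets."
    shown = titles[:limit]
    remaining = len(titles) - len(shown)
    noun = "dataset" if len(titles) == 1 else "datasets"
    reply = f"I found {len(titles)} {concept} {noun}, including " + " and ".join(shown) + "."
    if remaining:
        reply += f" Would you like to hear {remaining} more?"
    return reply


index = build_index(CATALOG)
print(suggest("network", index))
```

Capping the number of titles per turn keeps the exchange conversational rather than turning it into a recited list.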
The second problem is that if governments continue to build skills in their current manner,
after a few years, they will likely have to maintain many heterogeneous (geospatial) tasks that
will also be hard to improve. In other words, building skills in this manner is unsustainable.
Furthermore, most of the aforementioned examples of applications in Section 2 are not
explicitly geospatial. Those that could be considered geospatial work by retrieving pre-
generated data from factual databases (e.g. water usage), and some leverage near real-time
geospatial information (e.g. earthquakes). More complex geospatial questions, like those
specific to a user’s location, require more complex geospatial computing and cannot yet be
answered. For instance, a question like “which hospitals are open now and are also within a
20-minute drive from home?” cannot be answered simply by retrieving data from databases.
Such questions require geospatial analysis and computing, which could be partially supported
by leveraging existing APIs. We therefore believe that if skill building could leverage the
organizational structure of data, and a corresponding conceptual model that humans have of
these types of data, then perhaps computing with them could be easier as well.
4 A Vision for Geospatially-Enabled Voice Assistants
We propose a conceptual framework adopted from Cook and Daniels’ software design
methodology [ ] as a means of facilitating geospatial data discovery and subsequent use to
provide answers to users’ geospatial questions. Our work formalizing this conceptual model
for spatial data is ongoing and is applicable to both GIS and voice assistant environments.
We are not proposing an implementation solution; rather, we are proposing a conceptual
model to help organize the things that people want to ask about and the computations on
geospatial data to answer those questions.
Cook and Daniels’ software design methodology comprises an essential model, a
formal model, and a system model. The essential model is a model of the world built
from objects and events used to understand a situation. The formal model (also called the
specification model) states what the software will do and formalizes the essential model by
mathematical operations. The system model (also called the implementation model) specifies
system-level behavior based on the formal model.
In our framework, the essential model specifies concepts about the real world. Since
spatial questions are about things in the real world, they are cognitively represented by core
concepts of spatial information like fields and networks [ ]. Thus, the procedure to answer a
question can be formalized as a set of spatial operations with mathematical foundations
(as a formal model). The information detected from user questions can be used as input for
the spatial operations. The spatial operations can then be implemented in a chosen software
package (as a system model). The results are finally computed by the chosen software and
returned to the user as an answer. An example of this framework is shown in Figure 2.
15 https://moz.com/blog/google-organic-click-through-rates-in-2014
Figure 2 The essential (blue), formal (green), and system (red) conceptual levels.
To operationalize this framework, we need a means of relating human concepts to
formal operations and system-level commands in a GIS [ ]. Kuhn’s core concepts of spatial
information [ ] provide a bridge, specifying concepts in user questions at the essential level
and relating them to operations at the formal level. Previous work on question-based spatial
computing used data abstraction to relate user questions to computations in a GIS.
Progress can be made on the essential and formal models for at least two core concepts:
fields and networks. Fields as an essential model conceptualize continuous phenomena and
are characterized by continuous functions from location to theme. Prototypical examples
include elevation, temperature, and rainfall. Fields are formalized by map algebra. The field
concept allows users to ask questions like “how much did it rain in my neighborhood last night?”
Networks are a topological essential model, formalized by graph theory. They allow users to
ask questions like “how many bus stops are between my house and downtown?” The system
model could take the form of an existing geocomputation API (e.g. GDAL or ArcGIS Online).
Today, architectures of many geospatially enabled portals (e.g. Socrata with QGIS,
Hub with ArcGIS Online) are already equipped to handle the system model specifications.
By formalizing the operations that are to take place on the geospatial data in the portal,
today’s voice assistant applications move closer to the capabilities of conversational GIS.
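The two example questions can be sketched at the formal level using only the Python standard library. The rainfall question becomes a zonal operation from map algebra, and the bus stop question becomes a breadth-first search on a graph. The rasters, zone labels, stop names, and function names are all hypothetical, and a real system model would delegate these steps to a geocomputation library.

```python
from collections import deque
from typing import Dict, List, Optional

# Field: a raster of rainfall values (mm) and a matching raster of neighborhood ids.
RAINFALL = [
    [2.0, 4.0, 0.0],
    [6.0, 8.0, 0.0],
]
ZONES = [
    ["A", "A", "B"],
    ["A", "A", "B"],
]


def zonal_mean(values: List[List[float]], zones: List[List[str]], zone_id: str) -> Optional[float]:
    """Map algebra zonal operation: mean of the cells that fall inside one zone."""
    cells = [
        v
        for row_values, row_zones in zip(values, zones)
        for v, z in zip(row_values, row_zones)
        if z == zone_id
    ]
    return sum(cells) / len(cells) if cells else None


# Network: bus stops as nodes, direct connections as undirected edges.
BUS_NETWORK: Dict[str, List[str]] = {
    "home": ["elm"],
    "elm": ["home", "oak"],
    "oak": ["elm", "downtown"],
    "downtown": ["oak"],
}


def stops_between(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first search: count intermediate stops on a fewest-stops path."""
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return max(hops - 1, 0)  # exclude the goal itself from the count
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None  # goal is not reachable from start


print(zonal_mean(RAINFALL, ZONES, "A"))
print(stops_between(BUS_NETWORK, "home", "downtown"))
```

Each function takes only inputs that could be detected from the user's question (a zone, a start, a goal) plus a dataset of the matching concept.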
The mathematical formalization of fields and networks suggests a manageable set of
questions that users could ask of open government data. We surmise that these two concepts,
their mathematical models, and the accompanying software packages, could provide an entry
point for mapping between user questions and computations, following the architecture
illustrated in Figure 2. In this vision, voice assistants serve as a kind of conversational GIS,
answering a far broader range of geospatial questions about government.
Voice assistants following this framework would organize contents based on their spatial
concepts, suggesting data and operations to perform on them based on the concepts in
a user’s question. For example, such a system could parse the previous question, “which
hospitals are open now and are also within a 20-minute drive from home?”, and recognize that
“hospitals” are likely to be objects in a health care data set and determine that “a 20-minute
drive” would require a road network data set. A computation would intersect currently open
hospitals (stored as an attribute of the open dataset) and a 20-minute roadway service area
from the user’s home. If suitable open data sets do not exist, the voice assistant could suggest
alternatives with similar themes based on the concepts present in the original question.
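The hospital computation can be sketched as a service area calculation followed by an attribute filter. The sketch assumes a toy road network with travel times in minutes and opening hours stored as attributes; the node names, hours, and function names are hypothetical, and Dijkstra's algorithm stands in for whatever routing service a real system model would call.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical road network: travel time in minutes between neighboring nodes.
ROADS: Dict[str, Dict[str, int]] = {
    "home": {"a": 5, "b": 12},
    "a": {"home": 5, "general": 10},
    "b": {"home": 12, "clinic": 15},
    "general": {"a": 10},
    "clinic": {"b": 15},
}
# Hypothetical hospital objects with opening hours as (opens, closes) in 24-hour time.
HOSPITALS: Dict[str, Tuple[int, int]] = {
    "general": (0, 24),
    "clinic": (8, 17),
}


def service_area(roads: Dict[str, Dict[str, int]], origin: str, max_minutes: int) -> Dict[str, int]:
    """Dijkstra's algorithm: nodes reachable within max_minutes and their travel times."""
    best = {origin: 0}
    heap = [(0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, minutes in roads.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost <= max_minutes and new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return best


def open_hospitals_within(
    roads: Dict[str, Dict[str, int]],
    hospitals: Dict[str, Tuple[int, int]],
    origin: str,
    max_minutes: int,
    hour: int,
) -> List[str]:
    """Intersect the service area with hospitals that are open at the given hour."""
    reachable = service_area(roads, origin, max_minutes)
    return sorted(
        name
        for name, (opens, closes) in hospitals.items()
        if name in reachable and opens <= hour < closes
    )


print(open_hospitals_within(ROADS, HOSPITALS, "home", 20, 22))
```

The network dataset supplies the service area and the object dataset supplies the candidates and their attributes, mirroring the two concepts recognized in the question.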
A vast amount of open public data is ready for discovery. Advances in voice
assistant technology have the potential to actively connect users to developments in their
communities. In this paper, we have explored voice assistant applications that governments
are developing to improve open public data discovery and use. To address challenges that
today’s applications face, we have proposed a conceptual framework informed by core concepts
of spatial information and structured as an essential, a formal, and a system model. Relating
the language of user questions about the world to spatial computations is a step toward
improving discovery and use of open public data for users and their communities.
1 Guoray Cai, Hongmei Wang, Alan M. MacEachren, and Sven Fuhrmann. Natural Conversational Interfaces to Geospatial Databases. Transactions in GIS, 9(2):199–221, March 2005.
2 Wendy Carrara, Wae San Chan, Sander Fischer, and Evan Steenbergen. Creating value through open data: Study on the impact of re-use of public data resources. European Commission,
3 Davide Castelvecchi. Google unveils search engine for open data. Nature, 561(7722):161–162,
4 John Daniels and Steve Cook. Designing Object Systems: Object-oriented Modelling with Syntropy. Prentice Hall, Englewood Cliffs, NJ, September, 1994. ISBN: 0-13-203860-9.
5 George W. Furnas, Thomas K. Landauer, Louis M. Gomez, and Susan T. Dumais. The vocabulary problem in human-system communication. Communications of the ACM, 30(11):964–971,
6 Peter A. Johnson, Renee Sieber, Teresa Scassa, Monica Stephens, and Pamela Robinson. The Cost(s) of Geospatial Open Data. Transactions in GIS, 21(3):434–445, 2017.
7 Werner Kuhn. Core concepts of spatial information for transdisciplinary research. International Journal of Geographical Information Science, 26(12):2267–2276, 2012.
8 Sara Lafia, Andrew Turner, and Werner Kuhn. Improving Discovery of Open Civic Data. LIPIcs-Leibniz International Proceedings in Informatics, 114(9):1–15, 2018.
9 Behzad Vahedi, Werner Kuhn, and Andrea Ballatore. Question-based spatial computing—A case study. In Geospatial Data in a Changing World, pages 37–50. Springer, 2016.
10 Anneke Zuiderwijk and Marijn Janssen. Open data policies, their implementation and impact: A framework for comparison. Government Information Quarterly, 31(1):17–29, 2014.