ArticlePDF Available

Open source data mining infrastructure for exploring and analysing OpenStreetMap

Authors:

Abstract and Figures

OpenStreetMap and other Volunteered Geographic Information datasets have been explored in the last years, with the aim of understanding how their meaning is rendered, of assessing their quality, and of understanding the community-driven process that creates and maintains the data. Research mostly focuses either on the data themselves while ignoring the social processes behind, or solely discusses the community-driven process without making sense of the data at a larger scale. A holistic understanding that takes these and other aspects into account is, however, seldom gained. This article describes a server infrastructure to collect and process data about different aspects of OpenStreetMap. The resulting data are offered publicly in a common container format, which fosters the simultaneous examination of different aspects with the aim of gaining a more holistic view and facilitates the results’ reproducibility. As an example of such uses, we discuss the project OSMvis. This project offers a number of visualizations, which use the datasets produced by the server infrastructure to explore and visually analyse different aspects of OpenStreetMap. While the server infrastructure can serve as a blueprint for similar endeavours, the created datasets are of interest themselves too.
This content is subject to copyright. Terms and conditions apply.
Open Geospatial Data,
Software and Standards
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7
https://doi.org/10.1186/s40965-018-0047-6
SOFTWARE Open Access
Open source data mining infrastructure
for exploring and analysing OpenStreetMap
Franz-Benjamin Mocnik*, Amin Mobasheri and Alexander Zipf
Abstract
OpenStreetMap and other Volunteered Geographic Information datasets have been explored in the last years, with
the aim of understanding how their meaning is rendered, of assessing their quality, and of understanding the
community-driven process that creates and maintains the data. Research mostly focuses either on the data themselves
while ignoring the social processes behind, or solely discusses the community-driven process without making sense
of the data at a larger scale. A holistic understanding that takes these and other aspects into account is, however,
seldom gained. This article describes a server infrastructure to collect and process data about different aspects of
OpenStreetMap. The resulting data are offered publicly in a common container format, which fosters the
simultaneous examination of different aspects with the aim of gaining a more holistic view and facilitates the results’
reproducibility. As an example of such uses, we discuss the project OSMvis. This project offers a number of
visualizations, which use the datasets produced by the server infrastructure to explore and visually analyse different
aspects of OpenStreetMap. While the server infrastructure can serve as a blueprint for similar endeavours, the created
datasets are of interest themselves too.
Keywords: Infrastructure, Data repository, Volunteered geographic information (VGI), OpenStreetMap (OSM),
Information visualization, Visual analysis, Data quality
Introduction and background
The way geographical information is collected and used,
and also the characteristics of the data themselves, has
changed in the last years and decades. Geographical
information is not longer the domain of experts, but
citizens, often without formal qualification, voluntarily
collect information about the environment they are living
or working in, or are familiar with for some other reason.
Citizens also derive geographical information from aerial
images and other sources. Such geographical information
that is collected in a cummunity-driven process is called
Volunteered Geographic Information (VGI) [1,2].
One of the most prominent examples of VGI is Open-
StreetMap (OSM) [3] – a community-driven effort to
collect geographical information about the environment:
streets and buildings; villages, cities, administrative divi-
sions, and country boundaries; land use classification; natural
and physical land features; amenities; leisure, tourism,
*Correspondence: mocnik@uni-heidelberg.de
Institute of Geography, Heidelberg University, Im Neuenheimer Feld 348,
69120 Heidelberg, Germany
sport sites, and shops; and many more information. The
data derived and maintained by the OSM project can be
used for different purposes [4]. While the data are often
used to create street maps (www.openstreetmap.org),
also other services have been created based on or using
OSM data: topographic maps (www.opentopomap.org),
thematic maps for cyclists (www.opencyclemap.org), for
sailors (www.openseamap.org), and for users of ski pistes
(www.opensnowmap.org); a (reverse) geocoder (nominatim.
openstreetmap.org); a digital globe (marble.kde.org);
routing services (www.project-osrm.org,www.openroute
service.org); etc. OSM data are used by citizens, com-
panies, and organizations for producing and viewing
maps, for planning tasks, for humanitarian aid, and crisis
management.
DataqualityisamajorissueforOSMandVGIingen-
eral, because a holistic and deep understanding of the
data creating process is needed [57]. The characteristics
of VGI to be collected and maintained by a commu-
nity implies that the advancement of the dataset and its
underlaying folksonomy, that is, the semantics of the tags
used in OSM, are not controlled by a single person or
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 2 of 15
authority but rather the result of a community-driven pro-
cess [810]. The tools and services used to produce and
process the data are neither. The data are rather edited
and the folksonomy and tools created on a voluntary basis
by individuals who coordinate their activities, leading to
strong heterogeneity among the users and their mapping
behaviour [11,12]; among the choice of what to map;
among the chosen representation; and among the sources
used to map, for example, aerial images, local knowledge,
and GPS tracks. This heterogeneity implies that some
regions are mapped more complete than others [11,12];
concepts are used differently [13]; the data are not always
up to date [12]; and locations are described with differing
precision [12].
The comprehension of the complex, community-driven
process that leads to VGI datasets is necessary to under-
stand the data and their quality, because it is foremost
this process which distinguishes VGI data from other
data and potentially causes heterogeneity. Information
visualization investigates visual representations of com-
plex information to reinforce human cognition and, in
turn, make complex processes more tangible. This article
demonstrates how techniques from information visual-
ization can be used to gain a deeper understanding of
the community-driven process of creating and modifying
OSM data.
There are several studies and tools that inspect, anal-
yse and/or employ VGI and OSM data in particular. In
the case of OSM there exist several systems to display
and investigate OSM data, for example, the OSM web
platform, GIS systems such as QGIS (open source), and
ArcGIS. Several other software also exist for editing OSM
data, with iD,Potlach 2,JOSM,Maps.me,Vespucci being
the most common tools for this purpose. A capability that
is lacking in before-mentioned software is the possibil-
ity to analyse and display information about the creation
process of data.
For this specific purpose, several other software tools
exist that can only examine and visualize the creation
process of a small part of the database. An example
of such services is show-me-the-way (www.github.
com/osmlab/show-me-the-way), which provides users
the possibility to visualize the recent changes of OSM
data (with short delay). Further examples include osm-
deep-history (www.github.com/osmlab/osm-deep-history),
which allows to analyze the history of OSM; Aug-
mented OSM Change Viewer (overpass-api.de/achavi),
which allows to analyze a collection of changes sub-
mitted to OSM in the form of a ‘changeset’; and the
tool Who did it? (zverik.osm.rambler.ru/whodidit/),
which provides information about local changes.
In addition, there exist websites that collect, aggre-
gate, and analyse information about the tags used to
describe OSM objects. The most famous websites are:
Tagi n fo (taginfo.openstreetmap.org), Tag f i n de r (tagfinder.
herokuapp.com), OSM Tag History (taghistory.raifer.
tech), and OSMstats (osmstats.neis-one.org). Further-
more, a first statistical analysis of OSM users has been
provided by Mooney and Corcoran [9,14].
The quality of OSM data has been examined in respect
to many purposes and many areas using different methods
[7]. Trame et al. [15], for example, examined the lineage
of OSM objects and visualized the result as a cartographic
heat map. Roick et al. [16,17] have visualized several sta-
tistical properties of OSM data by a cartographic heat
map, with the aim to understand the quality of the data.
An overview of how the quality of OSM data compares
to the one of other datasets has been provided by Hak-
lay [18] for the first time and repeated in several other
studies and use case scenarios. In terms of geographic
areas, the quality of OSM has, among others, been exam-
ined for Germany [19], France [20], Brazil [21]aswellas
for Iran [22]. OSM data have been widely used in various
application domains including disaster management [23],
routing and navigation [24,25], and urban demographic
estimation [26].
In the context of the community-driven process, data
quality has widely been discussed, among others, by
Keßler et al. [27] in respect to trust; by Arsanjani et al.
[28] in respect to the quality of single contributors; by
Rehrl et al. [29] in respect to contribution patterns; by
Rehrl et al. [30] in respect to the motivations to con-
tribute; and by Mooney et al. [9] in respect to the com-
munity consisting of individual contributors. Hashemi
et al. [31] have assessed the logical consistency; Ballatore
et al. [6], the conceptual quality; and Vandecasteele et al.
[32] and Mooney et al. [13] have discussed the influ-
ence of the tagging process on data quality. Some intrin-
sic quality measures have been discussed by Mooney
et al. [33] and Gröchenig et al. [11]. A general overview
over methods and indicators to assess data quality of
OSM has been provided by Barron et al. [34]and
Senaratne et al. [7].
Despite the extensive literature body on the examination
of OSM data and OSM mapping efforts as well as on data
quality in respect to VGI and OSM in particular, technolo-
gies to support these aims have been discussed rarely. This
is despite the fact that technology is an important tool for
conducting research. This article aims at developing and
improving the view on infrastructure to investigate OSM
data and OSM mapping efforts in a scientific context. In
particular, we present new infrastructure that
(1) fosters the analysis and the comprehension of differ-
ent aspects of OSM,
(2) efficiently supports the analysis without overusing
existing server infrastructure, and
(3) makes the analysis of the data reproducible.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 3 of 15
In this article, we examine infrastructure to collect and
mine data from and about the OSM project in order
to explore and analyse them holistically, but the results
can easily be transferred to other VGI projects. First, we
discuss the OSM dataset and numerous additional data
sources, among them the documentation of the folkson-
omy in the OSM wiki and statistical information about
the OSM dataset (“Data sources” section). This discussion
is followed by a general overview over the infrastruc-
ture that aims at achieving the above requirements 1 to 3
(“Conceptual overview” section). The infrastructure con-
sists, in particular, of software to retrieve, merge, filter,
and aggregate data from the data sources, which are
then stored in a repository (“Technologies and resulting
datasets” section). The resulting datasets of this reposi-
tory can be used to examine several aspects of original
data related to the OSM project in detail. As an exam-
ple of such software, visualizations that use original data
from the repository are discussed. These visualizations
demonstrate, in particular, how the mapping behaviour
and the folksonomy of OSM can be examined (“Results
and discussion of use cases”section).
Implementation
A holistic understanding of the OSM data and its creation
process can only be gained by examining several datasets.
In the following sections, we discuss some important
datasets by referring to their characteristics and their
potential use, and we discuss how they can be mined and
combined into new datasets.
Data sources
Several data sources are needed to get an overview over
the community-driven data collection and maintenance
process, which explains the heterogeneous nature of the
data and eventually renders the datas’ meaning. This
section discusses the most important data sources related
to OSM: the OSM database, which contains the col-
lected data about the environment themselves; the OSM
changesets, which contain information about which data
of the database were edited, by whom, and under which
circumstances; the OSM wiki, which can be seen as a doc-
umentation of the folksonomy of the OSM database and
a documentation of the community activity; and some
additional sources.
The OSM database. It is the aim of OSM to collect
information about the environment, which can be repre-
sented in a map. This information is saved in the OSM
database, which is accordingly a major component of
OSM. The database is currently hosted at the Imperial
College in London and technically maintained by the
OpenStreetMap Foundation. Despite this technical cus-
todianship, the data is created by a community-driven
process. This process of data collection has a strong influ-
ence on how the environment is formally represented in
the data, on which data are stored into the database, and
thus on the quality of OSM data – data contained in
the OSM database. While this community-driven process
is not directly reflected by the data stored in the OSM
database, information about this process can indirectly
be concluded by understanding local differences between
the data of different areas, by tracing the heterogene-
ity of the dataset, and by comparing the data with other
information, for example, aerial images.
The OSM database is mostly accessed indirectly, for
example, when using tiles to view a map, or when using
a routing service or a geocoder based on OSM data. Dif-
ferent APIs have been used to access the data. Currently
(in 2017), the OSM API and the Overpass API can be
considered as de facto standards for accessing the data
stored in the official instance of the OSM database, or
a mirror. In addition, dumps of the OSM database are
offered by Planet OSM (http://planet.openstreetmap.org).
These world-wide database dumps are complemented by
regional extracts, which are available, for example, on
Geofabrik downloads (http://download.geofabrik.de).
OSM changesets. Changes in the OSM database are
grouped into changesets, which are collections of changes
in the OSM database. Such a change can be the addition
or deletion of a node, a way, or a relation, that is, an OSM
object, and it can be the addition, modification, or dele-
tion of a tag of an object. Every changeset also contains
information about whom did edit and in which timespan
the edits have been performed. In addition, information
about the used editor and the source, as well as other rel-
evant information, are optionally part of the changeset. A
changeset provides information about the circumstances
under which data were modified, but potentially also
about the intention of the user and the data quality of the
added data. Planet OSM provides weekly dumps of the
changesets.
The OSM wiki. Without a community of volunteers,
VGI cannot exist. Data and information about how to
interpret the data, in particular the tags (that is what we
will call the folksonomy of OSM), would not be collected;
the collected data would not be maintained; tools would
not be developed; and the use of the data would not be
promoted. These activities are documented in the OSM
wiki (http://wiki.openstreetmap.org)asameansofcom-
munication in the community. When someone wants to
participate in the community-driven process, for exam-
ple, by contributing to the OSM database, he or she may
refer to the wiki to understand the current progress, which
help is needed, ongoing decision processes, and the cur-
rent folksonomy. Many pages in the OSM wiki exist in
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 4 of 15
several languages, which we will call language versions
in the following. The language versions are often direct
translations, often shortened to the most important infor-
mation or country specific information. The OSM wiki
contains a documentation of the folksonomy of the OSM
data, in particular of many concepts used for tagging OSM
objects [8].
Additional sources. Many information around the OSM
database, the OSM changesets, and the OSM wiki
have been created as part of the community effort
to create and use OSM data. Most of this additional
information is statistical information created by aggre-
gating existing information, but also complementing
datasets exist. The software discussed in this article
uses, supplementary to the data sources discussed above,
statistical data from OSMStats (http://blackadder.dev.
openstreetmap.org/OSMStats/) about the history of the
OSM database. In addition, data about the usage of certain
tags in the OSM database are retrieved from Tag H is-
tory (http://taghistory.raifer.tech). Finally, the OSM spe-
cific data are complemented by data about the Earth, in
particular about the boundaries of the continents. Such
data are provided by the Natural Earth project (http://
naturalearthdata.com). The files have even been con-
verted to the GeoJSON formatted (http://github.com/
nvkelso/natural-earth-vector).
Conceptual overview
VGI is created by a community process, and the data
can only be interpreted and used if this community pro-
cess is well understood. As the data and the community
process are very heterogeneous, it is necessary to exam-
ine different aspects of this process in detail. A thorough
understanding of these aspects renders the interpretation
of the data possible, in particular, with regards to the
heterogeneity of the data and their folksonomy. The tech-
nology that we present in this article aims at fostering such
examination of the data by a larger number of scientists,
which can be achieved by storing the data at a persis-
tent location and making them available under an open
license. The advantages of open software and open data in
science have been examined by many studies, which can,
for example, be seen by a review paper published by von
Krogh et al. [35], but also in other articles, among them
by Murray-Rust [36], and Uhlir and Schröder [37]. Among
these advantages are the transparency of the methods and
the results, and the traceability and potential replication
of the results [38]. It can be hoped for that the sharing of
the data facilitates their broader use.
The infrastructure discussed in this article structures
the process of analyzing aspects of the OSM project
into two steps (Fig. 1). In the first step, data from the
data sources are retrieved, potentially merged, filtered,
Fig. 1 Technical overview
and finally aggregated, in order to provide datasets that
highlight one particular aspect of the OSM project. The
resulting data are published in a public Git (http://git-
scm.com)repository.Thedatacanhencebeusedby
many researchers without the need to mine the origi-
nal data sources again, which would otherwise strongly
increase the load of the original servers. To be able to
reuse the mined data is of particular importance for
the OSM Wiki server, because a multiplicity of pages
have to be retrieved several times otherwise. In a sec-
ond step, the data from the repository can be analysed
in detail. As the data and the software to produce the
data are open, the analysis results can easily be repli-
cated. In addition, the examination of exactly the same
dataset (and hence the same aspect of OSM) increases the
reproducibility of the analyses, which becomes of partic-
ular interest if analyses are conducted at different points
in time.
The data sources have already been discussed in the
last section. In the next section, we will discuss the data
mining software and potential analyses in more detail.
Technologies and resulting datasets
Different data sources are mined in the context of the
discussed infrastructure with the aim to retrieve infor-
mation about several aspects of the OSM project, its
database, and its community. An overview of the differ-
ent services generating the new datasets an be found in
Tabl e 1.
For instance, one of the resulting datasets collects sta-
tistical information about all the words used in various
languages in the documentation of the OSM Wiki. Statis-
tical data about the history of the documentation of the
OSM folksonomy are stored as another resulting dataset.
Please note that these two resulting datasets both use the
OSM wiki as their primary source, while combining the
OSM wiki and the tag history into one dataset renders
possible to relate the usage of tags in the OSM dataset and
the documentation of the folksonomy (Table 1).
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 5 of 15
Table 1 Data mining
Data source Resulting dataset Programming
language
Frequency Description OSMvis
Natural Earth naturalearth_ne_
110m_admin_0_
countries.topojson
ECMAScript 6 Monthly Compression by topojsonOSM Changes Map
OSM wikiosm-tags-word-fre-
quency-wiki.json
Haskell Monthly Collect statistical information about the
words used in different language versions
of the documentation of the folksonomy
OSM Tags Word Frequency Wiki
OSM wikiosm-tags-history-
wiki.json
Haskell Monthly Collect statistical information about the
history of the documentation of the folk-
sonomy
OSM Tags Wiki History
OSM wiki,
Tag History
osm-tags-wiki-vs-
osmdata.json
ECMAScript 6 Monthly Relate the usage of tags in the OSM dataset
and the documentation of the folksonomy
First Documentation in the Wiki vs.
First Use in the Database
OSM Stats osmstats.json ECMAScript 6 Daily Download only OSM Changes per Day
Planet OSM osm-node-changes-
per-area.json
Haskell Daily Collect and aggregate statistical informa-
tion about OSM changesets
OSM Changes Map
*http://github.com/topojson/topojson
http://wiki.openstreetmap.org/wiki/Map_Features and linked pages, as well as corresponding pages in other languages
http://wiki.openstreetmap.org/wiki/Map_Features and linked pages, English language version only
Some of the proposed services mine the data on a daily
basis, while others perform monthly. This is due to the
fact that some data change very often and hence need to
be analysed on a more regular basis. The OSM change-
set is an example of such data. The information in the
OSM wiki, in contrast, does not change on a daily or even
weekly basis, and retrieving all sites of the OSM wiki is
slow and increases the load of the corresponding OSM
wiki server. Therefore, the service mines the OSM wiki on
a monthly schedule. For the ease of use, a common format
(Fig. 2) has been established, and the datasets are offered
in a repository, from which users can easily download the
requested datasets. Even different versions from differ-
ent points in time are available. Among the advantages of
such a repository are the provided metadata information,
such as license information, a description of the data, and
more importantly, the temporal information about when
the mining was performed.
Fig. 2 Format of a resulting dataset file, exemplified at
osm-node-changes-per-area.json
The data mining is performed by using different pro-
gramming languages, depending on which language is
most suitable. As a ‘default’, a recent variant of Javascript,
ECMAScript, is used due to its widely adoption in the
programming community. It is easy to read and write,
and it handles JSON files efficiently. Other programming
languages have their own beneficiaries, depending on the
analysis that needs to be performed. Haskell, a purely
functional programming language, is, for example, used
duetoitsefficiencyincomplexdataaggregation.Thedata
format JSON has been chosen as the data exchange for-
mat due to its simplicity and feasibility to be used within
web applications and web processing services.
The discussed infrastructure creates datasets that shed
light on very different aspects of OSM. These datasets
canbeusedtoanalysetheOSMprojectandthemapping
behaviour of the contributors. Statistical analysis, visual
analytics, and other approaches can be used to make sense
of the data. As the datasets are publicly available, it is
hoped for that several applications, websites, and services
will take advantage of these datasets. In the next section,
we discuss OSMvis, a project that visualizes these datasets
with the aim of fostering explorative analyses.
Results and discussion of use cases
This section is dedicated to a critical discussion of the
described server infrastructure and the resulting datasets.
Hereby, we discuss the merits and limitations of the server
infrastructure, in particular with respect to the long-term
perspective. In subsequent subsections, we then discuss
exemplary use cases. These use cases – a number of visu-
alizations – are meant to demonstrate how the resulting
data can be used in a simple and straight-forward way.
These visualizations are part of the project OSMvis1,a
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 6 of 15
collection of visualizations related to the OSM project.
The aim of OSMvis is to generate insights about the OSM
community and its mapping behaviour as well as about
the OSM data themselves by a visual exploration of the
datasets. The usefulness of such visualizations, in partic-
ular when combining different aspects of the data from
different datasets, has been demonstrated by Mocnik
et al. [8].
Merits and limitations
The discussed infrastructure aims at fostering repro-
ducible research about VGI in general, and the OSM
project in particular. It approaches the question of how
to bundle the efforts of data mining, of understanding
licenses, and of how to archive the mined data. Such
advantages come, however, with limitations. This section
aims at critically set the discussed infrastructure into
context in order to recognize its merits and limitations.
Technical infrastructure. The data repository is one of
the central aspects of the discussed infrastructure, which
is used to store and archive the data. Currently, the data
repository is hosted by the company GitHub. This is, how-
ever, not a real limitation, because GitHub is essentially
based on Git (git-scm.com), an open source versioning
control system. In case GitHub becomes unserviceable,
for example, when it changes its terms of service, offers
different services than before, or even shuts down com-
pletely, it can easily be replaced by other servers that run
Git and offer similar graphical user interfaces. Hosted
infrastructure like the one provided by GitHub usually
restricts the size of the repository and the contained files.
If the data mining process results in larger files, it might
become necessary to host the files on similar systems that
are self-hosted.
The data mining server is currently hosted at and main-
tained by Heidelberg University. While this ensures main-
tenance in the short and medium term, there might be
the need of adapting this approach in future times. As an
example, several data mining servers might be used. These
servers might be hosted at different places and main-
tained by different organizations, despite their data all
being included into the same repository. Even the reposi-
tory itself could be maintained by different organizations.
Another option would be to host the data in different
repositories, which all conform to the same organizational
principles but are yet hosted and maintained by different
organizations. As the software is publicly available under
an open license, there are no real obstacles for exploring
andimplementingsuchcollaborativeapproachesofdata
mining and data repositories.
Data mining. The discussed infrastructure has differ-
ent strengths. One of the most important strengths is
to render possible the combination of different datasets,
which all are about the OSM project and its manifold
aspects. Datasets created by the community, for exam-
ple, the ones described in the section “Data sources”, s e r v e
for different purposes and are thus not always meant to
be formally analyzed or combined. Accordingly, it needs
some effort to relate the OSM wiki to data from the OSM
database. In addition, meta data is often only published
by the OSM community if it serves for the purpose of
improving OSM data or renders new applications possi-
ble. The infrastructure discussed in this article aims, in
contrast, for bundling data mining activities that focus
on the scientific understanding of the entire OSM project
and the principles behind VGI. Thereby, the infrastruc-
ture renders the following advantages: First, the mined
datasets are available publicly without the need to mine
the data on one’s own. Secondly, the data is not only
available at the point in time when being mined but also
stored and made permanently available, which, thirdly,
renders possible to run the same experiments and visu-
alizations with the same results at a later point in time
and thus affords reproducible research. These advantages
are critical in case of VGI due to several reasons: First,
very different aspects of VGI projects need to be incorpo-
rated in holistic approaches for examining and analyzing
VGI projects. Secondly, these datasets about VGI projects
are very different in their nature, and they expose a high
variety and heterogeneity. Thirdly, the hardware resources
used by VGI projects are often limited due to the volun-
tary nature of the project, potentially leading to a critical
overuse when extensively performing data mining. For
instance, the costs of crawling the OSM wiki as well as
theserverloadofthecorrespondingserveroftheOSM
foundation are reduced.
The infrastructure depends on many data sources
(Fig. 1). These data sources have their own data formats,
and they can be accessed in different ways, for example,
as files on a http or a ftp server, as text on html web-
sites, APIs, etc. Such data formats and interfaces usually
change over time, and data sources might become inac-
cessibleovertime.Theriskofsuchissuesbecomeslarger
the more data sources are involved. As the data mining is
performed only once by the data mining server (and not
by every scientist on their own), a quick adaption of the
infrastructure to such changes is needed. On the other
side, the common container format and the unchanging
way of how the data is archived in the data repository
allows to hide these issues from the scientist by removing
the need to adapt the analyses and visualizations them-
selves to new data formats or methods of accessing the
data. This, in turn, enables reproducible research, despite
the very rapid evolvement and improvement of VGI tools
and infrastructure, because current and historical data are
both available and share a common data structure.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 7 of 15
Reproducability. The infrastructure focusses on pro-
ducing datasets that can easily be stored and archived.
While this allows to perform analyses easily and in a
reproducible way, extensive adaptions of the data min-
ing process to the particular use case are only possible in
parts: principle adaptions can be done in the data mining
process, and minor adaptions can be implemented by fil-
tering the dataset before analysing or visualizing it. This
limitation is strongly related to the data being mined only
once, which shifts the required processing power in the
data mining step from the scientist who analyses or visu-
alizes the OSM project for gaining general insights to the
group or community that runs the data mining server. In
addition, the dataset with the same name, which refers to
the point in time and the used algorithms, always contains
the same information – the data is only mined once and
remains unchanged in the following.
Reproducibility is the effect of several factors, among
them the public availability of the data under a shared
license; software and workflows that are shared publicly;
and the review of the software and workflows [39]. The
proposed infrastructure is able to ensure the first two fac-
tors – public availability of the data, the software, and
the workflows. Thereby, it seems to be important that the
data are stored independently of any university or research
group in order to guarantee the data to be available in
the long term. Storing the data in a version control sys-
tem prevents, in addition, single files from being modified
or deleted: all additions, modifications, or deletions of the
data become traceable. However, the infrastructure does
not provide a formal review process. It is though hoped
that the use of the data by different researchers leads to an
informal review process, which is also common to many
other collaborative projects hosted on GitHub or simi-
lar platforms. In summary, the proposed infrastructure
is able to ensure many of the aspects necessary to foster
reproducible research.
In the next sections, we discuss several use cases that
demonstrate the use of the mined datasets. These use
cases shall demonstrate how useful information can eas-
ily be visualized and how these visualizations adapt to the
infrastructure.
Use case: how do contributors and their contributions
spatially relate?
Users from all over the world contribute to OSM by
describing their own environment, or the environment in
other parts of the world. Before a user can contribute,
he or she has to observe the environment in some way,
for example, by walking around and visually observing
the environment; by recording GPS tracks, or manually
writing down GPS waypoints; by tracing and interpreting
aerial images; or by incorporating additional informa-
tion sources, such as information from websites, tablets
and signs, and from encyclopaedias. The gained or col-
lected knowledge is stored into the database by either
manually editing a small number of objects in some edi-
tor; by importing GPS tracks; by tracing aerial images;
or by importing datasets. These different methods of
observing the environment and adding the data to the
database are creating very different data – data with
different spatial precision; data about different features;
data referring to different concepts; and data of different
quality.
Some of the methods to collect knowledge about the
environment and incorporate it into the database are only
applicable if one is at the place that is to be mapped,
while other methods are even applicable if one is some-
where else on Earth. Also the personal motivation to
contribute is influencing the way one contributes: the per-
sonal interest in having one’s own environment mapped
may lead to adding local knowledge, while the inten-
tion to provide humanitarian aid by helping to map the
environment in a crisis region may lead to mapping
regions far away. To get a first idea of which people
map their own environment, which areas are primarily
mapped by people from other parts of the Earth, and
which motivation these people may have, we ask: At which
time of the day do users edit OSM data at which place
on Earth?
In Fig. 3, the changes of the OSM data on 2 April 2018
are depicted based on where they have been happening.
A change which affects many objects is represented by a
large disk, while a change affecting only some objects is
represented by a small disk. The point in time from which
the changes are depicted can, in the online version, be
chosen by a time slider, which ranges from 0 a. m. to 12
p. m. When the position of the time slider is changed, the
changes for the corresponding point in time are depicted.
When the reference time, in this case UTC, is passing, the
twilight zones move on Earth. The visualization depicts
three lines which contain all places on Earth at which
mean solar time equals 8 a.m., 12 a.m. , or 16 p.m. respec-
tively. These lines are straight and not curved because the
Mercator projection is used.
The visualization of the dataset osm-node-changes-
per-area.json reveals several patterns in the collec-
tive behaviour of OSM users. Most contributions are
about places in Europe, and other continents are much
less edited. Changes of OSM data at most places occur
mostly during day-time; before 8 a. m., only very little data
are edited. This may facilitate the potential interpretation
of a high number of OSM data contributions initiated by
direct observations, in particular in Europe, where this
day and night pattern is particularly distinctive. On other
continents than Europe, this pattern is still present but
not as distinctive, that is, the number of contributions
about North America during night-times is about as high
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 8 of 15
8:00 am 12:00 (noon) 16:00 pm
8:00 am 12:00 (noon) 16:00 pm
a
b
Fig. 3 World map depicting changes in the OSM database. Every change of nodes in the data is depicted as a disk, and the number of changes at
that particular point in place and time is encoded by the size of the disk. Changeset data © OpenStreetMap contributors (cf. www.openstreetmap.
org/copyright). a2018-04-02, 6:00 a.m. (UTC) b2018-04-02, 10:45 a.m. (UTC)
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 9 of 15
as the number of contributions about Europe during
night-times, but during day-times, there are many more
contributions about Europe. This behaviour suggests that
the number of users contributing to a foreign country
or continent is higher in North America than in Europe.
It can, in any case, be assumed that the relative number
of visual observations leading to changes in OSM data
is much higher in Europe compared to other continents.
Extensive changes of a huge amount of data at neighbour-
ing locations may indicate that the changes have been
coordinated in some way. A possible interpretation of a
high number of contributions about Mozambique during
local night-time on 28 December 2016 may, for example,
be the result of a mapping event dedicated to the very
region. These conclusions demonstrate that the visualiza-
tion is capable of efficiently communicating some impor-
tant patterns in the collective user behaviour, explaining
some aspects of how OSM data are added and modified.
Use case: temporal patterns in mapping activities
Mapping activities depend on various factors and vary
thus over time. Good weather may facilitate outdoor map-
ping activities, bad weather the use of aerial images. Also
the introduction of new tools as well as promoting activ-
ities have an influence on mapping activities. Important
and global events potentially change the users habits,
and may, in turn, also influence OSM mapping activities.
While many potential influences on mapping activities
exist, their superposition impedes the traceability of the
actual influences. Temporal patterns in the number of
changes and the number of new users may shed light on
which influences have to be taken a closer look at. This
leads to the question: Which temporal patterns exist in the
history of the changes of OSM data?
A calendar heat map depicting the number of new OSM
objects, users, and similar values is a visualization tech-
nique that is suitable to approach the question of temporal
patterns. It can, for example, be applied to the dataset
osmstats.json. Each cell of the raster which is used in
the calendar heat map in Fig. 4corresponds to a day in the
history of OSM, and the cells are grouped by the temporal
units of a week, a month, and a year. This arrangement
of the cells facilitates the visual emergence of weekly,
monthly, and annual patterns in the data. Such a technique
of a calendar heat map was already used by Wickling et al.
[40]. In contrast to the cartographic heat maps which were
used by Trame et al. [15] and Roick et al. [16,17], the cal-
endar heat maps in Fig. 4are not cartographic maps but
general diagrams.
The visualization affirms some well known facts, for
example, that there are far more nodes than ways and
far more ways than relations, but it also reveals fur-
ther insights about their temporal trend. Since nodes,
ways, and relations, in short OSM objects, have been
introduced – nodes and ways existed from the begin-
ning, but relations have first been introduced on 7
October 2007 – their number is incessantly growing.
Also this growth of the number of objects is increasing
over time.
Several temporal patterns can be observed. The OSM
data editing activity has been increased for one to three
month long periods during the year. These periods of
editing activity can often be observed in the second quar-
ter of the year, but occur also in other ones. This trend
has become more distinct over the years, in particular
in 2013–2016. The annual pattern is superimposed by
a weekly pattern: the activity of adding nodes and ways
seems to be slightly less subject to variance on Saturdays
and Sundays. During the annual phases of high activity,
the activity at weekends increases less than at weekdays;
and during phases of normal or little activity, the activity
is, at least in some periods, slightly stronger at weekends.
The weekly pattern as well as the daily fluctuation is much
less visible as the annual pattern.
It is not always clear which reasons cause the phases
at which there is only little activity compared to other
periods. Even single days occur, at which the activ-
ity is strongly decreased. On 25 December 2015, for
example, only very few nodes have been added to
the OSM database, and Christmas might be a good
explanation. Only very few new edges, compared to
other days, have been added to the database on the
weekend of the 13 and 14 August 2016, which was
the middle weekend of the 2016 Summer Olympics
(5–21 August). Different factors might be explanations
for other dates, for example, bad weather conditions that
affect an entire continent, or changes in the available
editors.
The number of new users also exhibits an annual pat-
tern of periods in which an increasing growth can be
observed. A weekly pattern of less new users at weekends
is present as well. There is, however, no observable effect
of the number of new users on the number of new objects.
In particular, the annual pattern of users is not correlated
to the annual pattern of new objects. As can be seen by
these examples, conclusions about the user behaviour and
its influence on the mapping process can be drawn from
the visualizations discussed in the previous and in this
section.
Use case: visualizing the folksonomy of OSM
ThelocationsofthenodesintheOSMdatabasearecom-
plemented by semantic information. A collection of tags,
each of them consisting of a key and a value, is stored
to every node, way, or relation to capture the meaning
of the object. These tags describe concepts, which often
are fuzzy. Such fuzziness is an inherent property of geo-
graphic concepts as has been pointed out by Bennet [41].
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 10 of 15
ab
Fig. 4 Calendar heat map with temporally arranged cells. Yellow colour indicates a low number of nodes/users per day, blue colour indicates a high
number. In case no data were available for a day, the cell is left white. Data from OSMStats © OpenStreetMap contributors (cf. www.openstreetmap.
org/copyright). aNew nodes per day bNew users per day
Coexisting and fuzzy concepts of a forrest, for example,
exist, because it is not clear whether a clearing is part of
a forest, how many trees are needed to be called a forest,
and how dense the trees need to be to be called a forest
[41]. The concepts used for OSM tags may accordingly
be fuzzy and also be changing over time. The description
of the concepts, and also the consensus among the users
of what is referred to by a certain concept, is of varying
nature for different tags and for different language ver-
sions alike. The OSM folksonomy is, compared with an
ontology, weak and not structured by the means of more
complex relations.
The description of the concepts used for OSM data, that
is, the description of the underlaying folksonomy, poten-
tially has a large influence on which concepts and which
tags are used when users contribute. If a concept is, for
example, not documented, neither in the OSM wiki nor
provided in the list of tags or keys of an editor, it may be
used much less than well documented concepts. As the
OSM wiki and tools to edit OSM data have a potentially
strong influence on the creation process and the use of
OSM data, it is of interest to understand the folksonomy
and its description. In this section, visualization tech-
niques are discussed to approach this comprehension in
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 11 of 15
respect to coexisting concepts, and the temporal evolution
of the folksonomy.
Which types of terms are used to describe the folksonomy
in the OSM wiki?
Different types of ontologies and concepts exist. Some
concepts are descriptive, for example, by describing a
zebra crossing as white stripes that are arranged in par-
allel and with equal distance in between, such that the
optical illusion of alternating black and white stripes is
created. Other concepts describe objects by the function
they have in their environment, for example, a zebra cross-
ing as a place where pedestrians can safely cross a street.
The OSM wiki pages describe the concepts of the tags and
values differently, by referring to very different features
(functional and non-functional ones), but also with differ-
ent precision, a differing number of examples, and with
different length. All these properties of the description of
the concepts influence how users collect and formalize
information. We thus ask: Which words play an important
role in the descriptions of the folksonomy underlaying OSM
data, and which differences exist between the different
language versions?
Word clouds are a visualization technique commonly
used to get an overview of one or more texts. In Fig. 5,
we use word clouds to analyse the dataset osm-tags-
word-frequency-wiki.json with the aim of getting
a first overview of which linguistic concepts are referred
to in the descriptions of the concepts of OSM tags in the
OSM wiki. The word cloud depict the word frequency
in the descriptions of the tags in the language versions
English and German respectively (Fig. 5a and c). Some
words are used frequently on many pages, while other
words are only used very frequently on one page. The
word clouds in Fig. 5b and dagain depict the word fre-
quency, but the occurrence of a word is only counted once
per tag, that is, once per key-value combination, to com-
pensate for the effect of a word frequently occurring in the
description of one tag only.
The visualizations reveal some differences between the
language versions of the wiki. In the English version, the
words “tagging” and “tagged” are very prominent, while
in the German version, the word “mappen” (meaning “to
map”) is much more prominent. (The visualizations in
Fig. 5only depict words consisting of at least 5 charac-
ters to filter out many filler words, but even when showing
words of length 3, the word “tag” is much more popular
than the word “map” in the English version.) This differ-
ence shows that the English version refers more often to
the tagging process itself, while, even when describing the
tagging process, the German version refers to the entire
mapping process, which is more than the tagging pro-
cess. The high frequency of these terms is, among others,
due to their frequent occurrence in headings. The English
version refers frequently to the concept of a “place”. In
the German version, however, there does not seem to
be used any corresponding concept. On the other side,
the German version refers to “Öffnungszeiten” (meaning
“opening hours”), “Telefonnummer” (meaning “telephone
number”), and “Betreiber” (meaning “Operator”) more
often than the English version, which might indicate that
formal and descriptive information is referred to more
often.
The English version of the OSM wiki refers to many
concepts which might allow for some fuzziness in the
tagging process: “often”, “usually”, “possible”, “typically”,
“recommended”, “common”, “indicate”, and “sometimes”.
The word cloud depicting the German version, how-
ever, only contains the words “meist” (meaning “mostly”)
and “optional” (meaning “optional”/“optionally”). This
might indicate that the existing fuzziness of the con-
cepts and the observations is part of the descriptions
in the English version. The reason behind these differ-
ences in word frequencies and its potential effect on the
mapping process and the resulting data may be subject
of future research.
The history of the folksonomy and its effect on data quality
There is no fixed taxonomy of OSM data, because many
people contribute to the dataset in a joint effort. Accord-
ingly, the taxonomy is changing over time. As the tax-
onomy is created by its use in the data rather than by
documentation efforts, it is referred to as a folkson-
omy [8]. The quality of OSM data is strongly interlinked
with the evolution of the folksonomy. An understand-
ing of how the folksonomy evolves over time can thus
provide useful insights in how the data quality changes
over time. The OSMvis visualizations, which are able to
provide such insights, have already been discussed in lit-
erature [8] and are thus rather shortly summarized in
this section.
The dataset osm-tags-wiki-vs-osmdata.json
contains for all relevant tags the points in time when
tags have been used and when they have been docu-
mented. A corresponding visualization has been pub-
lished by Mocnik et al. [8](Fig.6). It compares, for
each tag, its 100th use in the OSM dataset to its first
use in the documentation in the OSM wiki. As can be
seen, most of the tags have been documented after their
first usages, which corroborates that the taxonomy is a
folksonomy.
The OSM folksonomy becomes richer and more fine-
grained over time, its granularity becomes more uniform,
and its scope was growing until recently. The conceptual
quality of OSM has, for example been discussed by Balla-
tore et al. [6] and Ali et al. [42]. The discussed dimensions
of conceptual quality include the accuracy, the granularity,
the completeness, the consistency, the compliance, and
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 12 of 15
ab
cd
Fig. 5 Word clouds depicting the word frequency in the descriptions of tags in the OSM wiki. Only words of minimum length 5 are depicted. Data
from the OSM wiki © OpenStreetMap contributors (cf. http://wiki.openstreetmap.org/wiki/Wiki_content_license). aEnglish version of the OSM wiki
bEnglish version of the OSM wiki, occurence only counted once per tag cGerman version of the OSM wiki dGerman version of the OSM wiki,
occurence only counted once per tag
the richness of the data. These dimensions are interrelated
to temporal aspects. Data are added, and the folksonomy
becomes more complete and more consistent over time.
An examination of single tags is though needed in order to
understand the folksonomy in greater detail, in particular,
with respect to temporal changes of single tags. Mocnik
et al. [8] have proposed a visualization, which interac-
tively visualizes the tags contained in the dataset in more
detail (Fig. 7).
Conclusion
Volunteered geographic information (VGI) differs from
many other geographic information by being collected
by a group of volunteers and by potentially being more
heterogeneous. An important part of VGI is the typi-
cally coordinated effort to create and maintain the data.
The interpretation and analysis of VGI data is thus only
possible when considering the social process that leads
to their creation. Typically, VGI data are studied without
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 13 of 15
Fig. 6 Comparison of the 100th use of a tag in the OSM database and its first documentation in the OSM wiki. Each blue disk represents a tag. The
size of the disk reflects how frequently the tag is used in the OSM database. Only tags that are used at least 1,000 times in the data and that are
documented in the OSM wiki are included, tags with value "*"are excluded. Data from the OSM database/wiki © OpenStreetMap contributors (cf.
http://openstreetmap.org/copyright); image and caption by Mocnik et al. [8], CC BY 4.0
ab
Fig. 7 Visualization technique for the documentation of the folksonomy in the OSM wiki. The nodes of the inner circle refer to the documented
keys, while the nodes around, to the corresponding values. The longer a value exists, the more it moves away from the origin. Data from the OSM
wiki © OpenStreetMap contributors (cf. http://wiki.openstreetmap.org/wiki/Wiki_content_license). Image and caption by Mocnik et al. [8], CC BY 4.0.
a2007 b2012
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 14 of 15
or in combination with their creation process. A holistic
understanding is though to be established and data qual-
ity can, as a result, not be examined thoroughly, because a
more complete understanding of which factors influence
the emergence of the data is still missing.
In this article, we have described an infrastructure for
collecting and mining data about the OSM project. This
infrastructure does not restrict to certain aspects, like
the OSM dataset, but aims at collecting many available
information. The resulting datasets are offered under an
open license in order to foster their analysis and use. Each
dataset also contains a copyright notice and a link to the
original datasets that were used for data mining, as well as
a description. It is hoped for that the collection of datasets
fosters projects that aim at a holistic understanding of the
OSM project rather than an understanding of the OSM
dataset only.
The discussed visualizations use datasets that were cre-
ated by the described infrastructure. They offer insights
into basic but yet important aspects of the OSM project
and offer a way to explore the OSM project from dif-
ferent points of view and in a more holistical way.
The visualizations have, in parts, been discussed in the
context of data quality, but a more detailed analysis
may be needed to gain a deeper understanding of the
influence on data quality. In particular, more language
versions of the OSM wiki may be compared and the
influence of the differing descriptions of concepts in the
language versions on the actual data may be explored
in greater detail. A visualization of correlations between
the modification and introduction of new concepts, and
properties of the data at the corresponding points in
time, may provide further insights about how data qual-
ity is influenced by the folksonomy and its granularity
in particular.
Many more aspects of OSM-related data can be visual-
ized by using methods from the field of information visu-
alization. The number of new objects may, for example, be
visually compared to large events and weather data. Also
nodes and ways in a certain area may be compared with
nodes and ways in another area, by visualizing their pos-
sibly systematically differing properties, for example, by
the use of parallel coordinates. Data related to OSM may
also be a good starting point for developing new visualiza-
tion techniques that incorporate spatial and non-spatial
data by combining maps with non-spatial visualizations.
The described software and the resulting datasets can
serve as a starting point for these and further research
directions.
Availability and requirements
Project name: OSMvis Data
Project home page: http://github.com/giscience/osm-
vis-data
Operating system: Unix-like systems
Programming language: Haskell, ECMAScript 6, bash
scripts
Other requirements: none
License: GPL-3
Any restrictions to use by non-academics: see license
Endnote
1http://osm-vis.geog.uni-heidelberg.de
Abbreviations
VGI: Volunteered geographic information; OSM: OpenStreetMap
Acknowledgments
The authors would like to thank Martin Raifer for his helpful comments on the
manuscript.
Funding
This research has been supported by the DFG project A framework for
measuring the fitness for purpose of OpenStreetMap data based on intrinsic quality
indicators (FA 1189/3-1).
Availability of data and materials
– code repository for data mining and for visualization: http://github.com/
mocnik-science/osm- vis
– data repository: http://github.com/giscience/osm-vis- data
– visualization: http://osm-vis.geog.uni- heidelberg.de.
Authors’ contributions
FBM developed the visualizations. AZ and FBM jointly developed the idea of
the visualization presented in Fig. 7. FBM and AM wrote the manuscript. All
authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Received: 26 November 2017 Accepted: 17 March 2018
References
1. Goodchild MF. Citizens as sensors: the world of volunteered geography.
GeoJournal. 2007;69:211–21.
2. See L, Mooney P, Foody GM, Bastin L, Comber A, Estima J, Fritz S,
Kerle N, Jiang B, Laakso M, Liu H-Y, Milˇ
cinski G, Nikši ˇ
c M, Painho M,
P˝
odör A, Olteanu-Raimond A-M, Rutzinger M. Crowdsourcing, citizen
science or volunteered geographic information? The current state of
crowdsourced geographic information. ISPRS Int J Geo-Inf. 2016; 5(5).
https://doi.org/10.3390/ijgi5050055.
3. Haklay M, Weber P. OpenStreetMap: user-generated street maps. IEEE
Pervasive Comput. 2008;7(4):12–8. https://doi.org/10.1109/MPRV.2008.80.
4. Mooney P, Minghini M. A review of OpenStreetMap data. In: Foody GM,
See L, Fritz S, Mooney P, Olteanu-Raimond A-M, Fonte CC, Antoniou V,
editors. Mapping and the Citizen Sensor. London: Ubiquity Press. 2017.
p. 37–59. https://doi.org/10.5334/bbf.c.
5. Mocnik F-B, Mobasheri A, Griesbaum L, Eckle M, Jacobs C, Klonner C.
A grounding-based ontology of data quality measures. J Spat Inf Sci.
2018; 16. https://doi.org/10.5311/JOSIS.2018.16.360.
6. Ballatore A, Zipf A. A conceptual quality framework for volunteered
geographic information. In: Proceedings of the 12th Conference on
Spatial Information Theory (COSIT). 2015. p. 89–107. https://doi.org/10.
1007/978-3- 319-23374-1_5.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Mocnik et al. Open Geospatial Data, Software and Standards (2018) 3:7 Page 15 of 15
7. Senaratne H, Mobasheri A, Ali AL, Capineri C, Haklay M. A review of
volunteered geographic information quality assessment methods. Int J
Geogr Inf Sci. 2017;31(1):139–67. https://doi.org/10.1080/13658816.2016.
1189556.
8. Mocnik F-B, Zipf A, Raifer M. The OpenStreetMap folksonomy and its
evolution. Geo-Spat Inform Sci. 2017;20(3):219–30. https://doi.org/10.
1080/10095020.2017.1368193.
9. Mooney P, Corcoran P. How social is OpenStreetMap? In: Proceedings of
the 15th AGILE Conference on Geographic Information Science. 2012. p.
282–287.
10. Ballatore A, Mooney P. Conceptualising the geographic world: the
cimensions of negotiation in crowdsourced cartography. Int J Geogr Inf
Sci. 2015;29(12):2310–27. https://doi.org/10.1080/13658816.2015.
1076825.
11. Gröchenig S, Brunauer R, Rehrl K. Estimating completeness of VGI
datasets by analyzing community activity over time periods. In:
Proceedings of the 17th AGILE Conference on Geographic Information
Science. 2014. p. 3–18. https://doi.org/10.1007/978- 3-319-03611- 3_1.
12. Neis P, Zielstra D, Zipf A. Comparison of volunteered geographic
information data contributions and community development for
selected world regions. Future Internet. 2013;5(2):282–300. https://doi.
org/10.3390/fi5020282.
13. Mooney P, Corcoran P. The annotation process in OpenStreetMap.
Transactions in GIS. 2012;16(4):561–79. https://doi.org/10.1111/j.1467-
9671.2012.01306.x.
14. Mooney P, Corcoran P. Who are the contributors to OpenStreetMap and
what do they do? In: Proceedings of the 20th Annual GIS Research UK
(GISRUK). 2012.
15. Trame J, Keßler C. Exploring the lineage of volunteered geographic
information with heat maps. GeoViz. 2011.
16. Roick O, Hagenauer J, Zipf A. OSMatrix – grid-based analysis and
visualization of OpenStreetMap. In: Proceedings of the 1st European State
of the Map Conference (SOTM-EU). 2011.
17. Roick O, Loos L, Zipf A. A technical framework for visualizing
spatio-temporal quality metrics of volunteered geographic information.
In: Proceedings of the Conference Geoinformatik. 2012.
18. Haklay M. How good is volunteered goegraphical information?
A comparative study of OpenStreetMap and Ordnance Survey datasets.
Environ Plan B. 2010;37(4):682–703. https://doi.org/10.1068/b35097.
19. Neis P, Zielstra D, Zipf A. The street network evolution of crowdsourced
maps: OpenStreetMap in Germany 2007–2011. Future Internet. 2012;4:
1–21. https://doi.org/10.3390/fi4010001.
20. Girres J-F, Touya G. Quality assessment of the French OpenStreetMap
dataset. Trans GIS. 2010;14(4):435–59. https://doi.org/10.1111/j.1467-
9671.2010.01203.x.
21. Camboim SP, Bravo JVM, Sluter CR. An investigation into the
completeness of, and the updates to, OpenStreetMap data in a
heterogeneous area in Brazil. ISPRS Int J Geo-Inf. 2015;4:1366–88. https://
doi.org/10.3390/ijgi4031366.
22. Forghani M, Delavar MR. A quality study of the OpenStreetMap dataset
for Tehran. ISPRS Int J Geo-Inf. 2014;3(2):750–63. https://doi.org/10.3390/
ijgi3020750.
23. Eckle M, de Albuquerque JP. Quality assessment of remote mapping in
OpenStreetMap for disaster management purposes. In: Proceedings of
the 12th International Conference on Information Systems for Crisis
Response and Management (ISCRAM). 2015.
24. Zipf A, Mobasheri A, Rousell A, Hahmann S. Crowdsourcing for
individual needs – thecaseof routing and navigation for mobility-impaired
persons. In: Capineri C, Haklay M, Huang H, Antoniou V, Kettunen J,
Ostermann F, Purves R, editors. European Handbook of Crowdsourced
Geographic Information. London: Ubiquity Press; 2016. p. 325–337.
https://doi.org/10.5334/bax.x.
25. Rousell A, Zipf A. Towards a landmark-based pedestrian navigation
service using OSM data. ISPRS Int J Geo-Inf. 2017;6(64). https://doi.org/10.
3390/ijgi6030064.
26. Bakillah M, Liang S, Mobasheri A, Arsanjani JJ, Zipf A. Fine-resolution
population mapping using OpenStreetMap points-of-interest. Int J Geogr
Inf Sci. 2014;28(9):1940–63. https://doi.org/10.1080/13658816.2014.
909045.
27. Keßler C, de Groot RTA. Trust as a proxy measure for the quality of
volunteered geographic information in the case of OpenStreetMap.
In: Proceedings of the 16th AGILE Conference on Geographic Information
Science. 2013. p. 21–37. https://doi.org/10.1007/978- 3-319-00615- 4_2.
28. Arsanjani JJ, Barron C, Bakillah M, Helbich M. Assessing the quality of
OpenStreetMap contributors together with their contributions. In:
Proceedings of the 16th AGILE Conference on Geographic Information
Science. 2013.
29. Rehrl K, Gröchenig S, Hochmair H, Leitinger S, Steinmann R, Wagner A.
A conceptual model for analyzing contribution patterns in the context of
VGI. In: Krisp JM, editor. Progress in Location-Based Services. Berlin:
Springer; 2013. p. 373–388. https://doi.org/10.1007/978- 3-642-34203-
5_21.
30. Rehrl K, Gröchenig S. A framework for data-centric analysis of mapping
activity in the context of volunteered geographic information. ISPRS Int J
Geo-Inf. 2016;5(37). https://doi.org/10.3390/ijgi5030037.
31. Hashemi P, Abbaspour RA. Assessment of logical consistency in
OpenStreetMap based on the spatial similarity concept. In: Arsanjani JJ,
Zipf A, Mooney P, Helbich M, editors. OpenStreetMap in GIScience.
Experiences, Research, and Applications. Heidelberg: Springer; 2015. p.
19–36. https://doi.org/10.1007/978-3- 319-14280-7_2.
32. Vandecasteele A, Devillers R. Improving volunteered geographic
information quality using a tag recommender system: the case of
OpenStreetMap. In: Arsanjani JJ, Zipf A, Mooney P, Helbich M, editors.
OpenStreetMap in GIScience. Experiences, Research, and Applications.
Heidelberg: Springer; 2015. p. 59–80. https://doi.org/10.1007/978- 3-319-
14280-7_4.
33. Mooney P, Corcoran P, Winstanley AC. Towards quality metrics for
OpenStreetMap. In: Proceedings of the 18th ACM SIGSPATIAL
International Symposium on Advances in Geographic Information
Systems. 2010. p. 514–517.
34. Barron C, Neis P, Zipf A. A comprehensive framework for intrinsic
OpenStreetMap quality analysis. Trans GIS. 2014;18(6):877–95. https://doi.
org/10.1111/tgis.12073.
35. von Krogh G, von Hippel E. The promise of research on open source
software. Manag Sci. 2006;52(7):975–83. https://doi.org/10.1287/mnsc.
1060.0560.
36. Murray-Rust P. Open data in science. Ser Rev. 2008;34(1):52–64. https://
doi.org/10.1080/00987913.2008.10765152.
37. Uhlir PF, Schröder P. Open data for global science. Data Sci J. 2007;6:
36–53. https://doi.org/10.2481/dsj.6.OD36.
38. Nosek BA, Alter G, Banks GC, Boorsbom D, Bowman SD, Breckler SJ, Buck
S, Chambers CD, Chin G, Christensen G, Contestabile M, A D, Eich E,
Freese J, Glennerster R, Goroff D, Green DP, Hesse B, Humphreys M,
Ishiyama J, Karlan D, Kraut A, Lupia A, Mabry P, Madon T, Malhotra N,
Mayo-Wilson E, McNutt M, Miguel E, Levy Paluck E, Simonsohn U,
Soderberg C, Spellman BA, Turitto J, VandenBos G, Vazire S,
Wagenmakers EJ, Wilson R, Yarkoni T. Promoting an open research
culture. Science. 2015;348(6242):1422–5. https://doi.org/10.1126/science.
aab2374.
39. Singleton AD, Spielman S, Brunsdon C. Establishing a framework for
open geographic information science. Int J Geogr Inf Sci. 2016;30(8):
1507–21. https://doi.org/10.1080/13658816.2015.1137579.
40. Wicklin R, Allison R. Congestion in the sky. Visualizing domestic airline
traffic with SAS software. ASA Statistical Computing and Graphics Data
Expo. 2009.
41. Bennett B. What is a forest? On the vagueness of certain geographic
concepts. Topoi. 2001;20(2):189–201.
42. Ali AL, Sirilertworakul N, Zipf A, Mobasheri A. Guided classification
system for conceptual overlapping classes in OpenStreetMap. ISPRS Int J
Geo-Inf. 2016;5(87). https://doi.org/10.3390/ijgi5060087.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com
... It is possible to access weekly snapshots and changesets of the full dataset or the full history by downloading them in XML or Planet PBF files (https://planet.openstreetmap.org/). To analyze the creation process of OSM data some software were implemented by the are limited to examine and view a small portion of the database [39]. Martini et al. [40] presented a methodology for analysis of changes in OSM. ...
Preprint
Full-text available
Modern research applies the Open Science approach that fosters the production and sharing of Open Data according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. In the geospatial context this is generally achieved through the setup of OGC Web services that implements open standards that satisfies the FAIR requirements. Nevertheless, the requirement of Findability is not fully satisfied by those services since there’s no use of persistent identifiers and no guarantee that the same dataset used for a study can be immutably accessed in a later period: a fact that hinders the replicability of research. This is particularly true in recent years where data-driven research and technological advances have boosted frequent updates of datasets. Here, we review needs and practices, supported by some real case examples, on frequent data or metadata updates in geo-datasets of different data types. Additionally we assess the currently available tools that support data versioning for databases, files and log-structured tables. Finally we discuss challenges and opportunities to enable geospatial web services that are fully FAIR: a fact that would provide, due to the massive use and increasing availability of geospatial data, a great push toward open science compliance with ultimately impacts on the science transparency and credibility.
... The patient's address, given by the pair <ZIP code, City> (e.g. 00168, Rome), was translated into geographical coordinates using the OpenStreetMap API for Python [16]. ...
Article
Full-text available
Background The 30‐day hospital re‐admission rate is a quality measure of hospital care to monitor the efficiency of the healthcare system. The hospital re‐admission of acute stroke (AS) patients is often associated with higher mortality rates, greater levels of disability and increased healthcare costs. The aim of our study was to identify predictors of unplanned 30‐day hospital re‐admissions after discharge of AS patients and define an early re‐admission risk score (RRS). Methods This observational, retrospective study was performed on AS patients who were discharged between 2014 and 2019. Early re‐admission predictors were identified by machine learning models. The performances of these models were assessed by receiver operating characteristic curve analysis. Results Of 7599 patients with AS, 3699 patients met the inclusion criteria, and 304 patients (8.22%) were re‐admitted within 30 days from discharge. After identifying the predictors of early re‐admission by logistic regression analysis, RRS was obtained and consisted of seven variables: hemoglobin level, atrial fibrillation, brain hemorrhage, discharge home, chronic obstructive pulmonary disease, one and more than one hospitalization in the previous year. The cohort of patients was then stratified into three risk categories: low (RRS = 0–1), medium (RRS = 2–3) and high (RRS >3) with re‐admission rates of 5%, 8% and 14%, respectively. Conclusions The identification of risk factors for early re‐admission after AS and the elaboration of a score to stratify at discharge time the risk of re‐admission can provide a tool for clinicians to plan a personalized follow‐up and contain healthcare costs.
... OSM History Viewer (osmrmhv) [129] visualizes the changes made in single changesets, using different colours for different actions (delete, modify, create) and analyses the history of OSM relations, while Show me the way [128] is a popular near-real-time graphical representation of the latest OSM edits. Using the OSM Full History Planet File, OSMvis [95,96] offers a number of visualizations to explore the generation, modification, and use of OSM through the methods of information visualization. Finally, OSM Latest Changes [117] lists the latest changesets happened in any given region and highlights the ones corresponding to the object selected on the map. ...
Thesis
Full-text available
Data stored in OpenStreetMap can be more accurate or more recently updated than authoritative sources in some areas, thus requiring to look at other approaches than evaluating OSM comparing it against other databases: methods to perform intrinsic quality analysis are then needed. As this topic is so wide, it has been decided to focus on a specific and less investigated aspect: temporal accuracy. Comparing different areas or features within OSM can provide a relative measurement of temporal accuracy and up-to-dateness, which has been done using standard tools. After mapping existing solutions and evaluating new approaches to gather and process historical data, it has been hypothesized that it could have been worth trying to develop a new software, targeting both OSM contributors and researchers, with a simple interface, reasonable speed and modest technical requirements. This goal has been achieved to a good extent with the development of "Is OSM up-to-date?", after a long process defined by numerous changes of both approaches and libraries, in an increasingly stronger effort to process larger areas more quickly and more efficiently.
... However, geocoding services such as Google Maps offer data only in a limited way. In contrast, geodata from OpenStreetMap (OSM) contain geographic information provided by volunteers and thus are less restricted [20]. In a recent study, we performed an extensive literature search to identify obesity-related environmental factors [18]. ...
Article
Full-text available
Background Overweight and obesity are severe public health problems worldwide. Obesity can lead to chronic diseases such as type 2 diabetes mellitus. Environmental factors may affect lifestyle aspects and are therefore expected to influence people’s weight status. To assess environmental risks, several methods have been tested using geographic information systems. Freely available data from online geocoding services such as OpenStreetMap (OSM) can be used to determine the spatial distribution of these obesogenic factors. The aim of our study was to develop and test a spatial obesity risk score (SORS) based on data from OSM and using kernel density estimation (KDE). Methods Obesity-related factors were downloaded from OSM for two municipalities in Bavaria, Germany. We visualized obesogenic and protective risk factors on maps and tested the spatial heterogeneity via Ripley’s K function. Subsequently, we developed the SORS based on positive and negative KDE surfaces. Risk score values were estimated at 50 random spatial data points. We examined the bandwidth, edge correction, weighting, interpolation method, and numbers of grid points. To account for uncertainty, a spatial bootstrap (1000 samples) was integrated, which was used to evaluate the parameter selection via the ANOVA F statistic. Results We found significantly clustered patterns of the obesogenic and protective environmental factors according to Ripley’s K function. Separate density maps enabled ex ante visualization of the positive and negative density layers. Furthermore, visual inspection of the final risk score values made it possible to identify overall high- and low-risk areas within our two study areas. Parameter choice for the bandwidth and the edge correction had the highest impact on the SORS results. Discussion The SORS made it possible to visualize risk patterns across our study areas. Our score and parameter testing approach has been proven to be geographically scalable and can be applied to other geographic areas and in other contexts. Parameter choice played a major role in the score results and therefore needs careful consideration in future applications.
... 1. Finding the boundary of logged GPS data. 2. Downloading of OSM file using Python API [30]. ...
Conference Paper
Full-text available
A digital twin (DT) is a technology that allows to create a virtual copy of a real-world system or product. This allows real-world activities to be synchronized, validated and production costs to be reduced. In this study, a DT model is developed to simulate the maneuvers of the vehicle in the virtual environment and optimize the vehicles’ control parameters within the scope of Advanced Driver Assistance Systems (ADAS) and Automated Driving (AD) applications. In order to create and validate the DT, we focused on the follow acceleration maneuver (FAM). Firstly, road tests were carried out based on the cases, and vehicle dynamics relevant data were collected with the sensors. The collected data were used to generate scenario files containing static and dynamic environmental information, and movements of road users using ASAM OpenDRIVE® (ODR) and ASAM OpenSCENARIO® (OSC) scenario modelling standards, respectively. The Model.CONNECT co-simulation platform (CSP), which enables subsystems containing vehicle model, controller, and scenario information to work together, was used to simulate maneuvers regarding the generated scenarios. While creating the DT, the parameters in the vehicle model were tuned by performing the coastdown maneuver as well as FAM. After all, subsystems were integrated into the AVL Model.CONNECT (MC), FAM is performed and controller parameters were optimized using the particle swarm optimization (PSO) algorithm. The main goal of this study is to analyze and validate the DT model, then optimize controller parameters of adaptive cruise control (ACC) with the DT.
... The proposed approach uses some of the geometric properties of the points constituting a building (corner points) to evaluate drawing behavior. To assess different kinds of measures, previous studies have used data mining techniques [6,26,32,33]. Basiri et al. [15] measured the scores of some of ML classifiers and found that K-nearest neighbor and random forest classifiers are suitable for geometric and geographic type classifications, respectively. Furthermore, Pazoky and Pahlavani [34] recently compared several ML classifiers with the centrality measures of OSM data for data enrichment. ...
Article
Full-text available
Mapping as an action in volunteered geographic information is complex in light of the human diversity within the volunteer community. There is no integrated solution that models and fixes all data heterogeneity. Instead, researchers are attempting to assess and understand crowdsourced data. Approaches based on statistics are helpful to comprehend trends in crowd-drawing behaviors. This study examines trends in contributors’ first decisions when drawing OpenStreetMap buildings. The proposed approach evaluates how important the properties of a point are in determining the first point of building drawings. It classifies the adjacency types of the buildings using a random forest classifier for the properties and aids in inferring drawing trends from the relative impact of each property. To test the approach, detached and attached building groups in Istanbul and Izmir, Turkey, were used. The result had an 83% F-score. In summary, the volunteers tended to choose as first points those further away from the street and building centroid and provided lower point density in the detached buildings than the attached ones. This means that OSM volunteers paid more attention to open spaces when drawing the first points of the detached buildings in the study areas. The study reveals common drawing trends in building-mapping actions.
Article
Citizen science projects are open and available to anyone to contribute data. The literature concerning volunteered geographic information, however, has demonstrated significant demographic participation biases across time and space. Understanding the significance and impacts of these biases is challenging due to privacy concerns, which lead to the (pseudo-)anonymity of contributors. Using a sample of 265 users, this article statistically analyzes edits to the crowdsourced mapping platform OpenStreetMap to examine the impact of gender and age on spatial and temporal contribution patterns. We find that men aged in the others group (i.e., below twenty-five or over fifty-four) made more contributions during the week and on weekends than those in the economically active age group (i.e., ages twenty-five through fifty-four). Using the Kruskal–Wallis test to compare temporal contributions between gender groups, the economically active group showed a significant gender difference on both weekdays and weekends, as well as the hours of the day, with men making more contributions than women regardless of age category. Men in the others group made the most contributions overall. Calculating the Simpson Index of Diversity for user edits reveals that women have more limited spatial interests (i.e., they contribute to fewer countries) than their male counterparts, suggesting particular spatial preferences by gender.
Article
Urbanization has changed the Earth’s surface, resulting in the urban heat island effect. There has been a recent focus on increasing urban albedo as a strategy to mitigate this phenomenon. Studies on Montreal’s albedo have primarily looked at the impact of albedo manipulations upon the urban heat island effect. However, the current albedo of the island, broken down by land use type, has yet to be quantified. Therefore, previous studies often rely on generalized urban albedo and land use estimates that have not been proven to be generalizable to Montreal. This study attempted to quantify the current albedo of the Island of Montreal through urban land use categorization. The findings were then used to estimate albedo increase under different roof replacement scenarios. Data sets for building footprints, vegetation, and roadways were incomplete in Montreal, requiring the combination of several sources to obtain representative data for analysis. This study found the albedo of Montreal island to be 0.19 ± 0.057. Further, the hypothetical roof change scenarios then aligned with a 0.1 albedo increase, which is the albedo change used in current urban heat island effect mitigation literature. Using the albedo increase potential that resulted from the three scenarios tested here, future research should explore further estimation of the associated surface and air temperature decrease.
Thesis
Social Media is a new age of data sources that emerged in the last decade. Users who have diverse different motivations (such as; entertainment, communicating, or promoting) sign up the platforms worldwide. Currently, there is 3.5 billion active social media account worldwide. This growing number of account holders are accepted as human sensors that provide information about their environment. Unlike traditional sensors, these human sensors have no certainty in their capacity to sense and share information. In addition, the data provided by human sensors is unstructured. Still, social media is an invaluable data source for studies, especially that require continuous and real-time data widely. Currently, the data is widely used for politics, marketing, and most importantly in crisis management. In this thesis, social media data is assessed for incidence mapping during or shortly after a disaster with the motivation of increasing resilience to the expected major earthquake in Istanbul. The disaster management cycle has four phases as response, recovery, mitigation, and preparation. In the response phase, having real-time data from the affected area is important to properly allocate the resources. Conventional mapping technologies such as remote sensing and photogrammetry have the capacity of detecting the occurrence of a natural hazard however they are not eligible for information retrieval about the impacts of the natural hazards on human life such as emotions, opinions, and emergency situations. At this point, social media become forward as an immediate data source for incidence mapping during the response time of a disaster. Incidence mapping for resources management requires fine-grained data analyses. However, the uncertainty in data capacity, questions in the reliability of chosen techniques for pre-processing, and data bias are the key obstacles to the fine-grained analyses with the use of social media data. In this thesis, social media data is evaluated in terms of these key obstacles for Istanbul City since the data varies to the area that belongs to depending on its own human sensors. The main objective of this thesis is the determination of social media data potential for its use during the response phase of disaster management. There are three sub-objectives in order to reach the main objective; revealing the adequacy of the data for incidence mapping, adapting the pre-processing steps to the Turkish language, and questioning the reliability of the used filtering and classifying techniques with the quantification of its impacts on mapping, and investigating the intrinsic quality of the data (such as anomalies, trends, and biases) for the further interpretation of the incidence maps. The thesis is composed of three papers tackling these three objectives. Istanbul City is determined as the case area of each paper. In the first paper, the capacity of social media data to detect incidences in a fine-grained spatiotemporal perspective is investigated. For the case, the coup attempt data georeferenced within Istanbul city boundary is used and a series of incidences by the hour is mapped with the hotspots. According to that study, it is revealed that social media data has the capacity to identify an incidence with a fine-grain spatiotemporal resolution. In the second paper, the reliability of the chosen techniques for pre-processing and filtering social media data is researched with their effects on incidence mapping. Two terror attack data that are georeferenced within Istanbul City are used for the case of this study. The study is not also testing the adaptation of the current pre-processing and filtering techniques to the Turkish language and also proposes a quantitative comparative index for quantifying the spatial reliability of each filtering process. This index named Giz Index which can be replicated for the similarity searches between two incidence maps. It is found in this study, with the proposed methodology for pre-processing and filtering, over 80% of spatial reliability can be achieved for incidence mapping based on social media data. In the third paper, the intrinsic quality of data is researched for the right interpretation of the incidence maps. The study overviews the weekly sampled social media data from each month during a year that is georeferenced within Istanbul City. The data is assessed from the perspective of data anomaly, trend, and bias with the spatial statistical tests. The study infers that the data has spatial representation bias, anomaly tendency in some parts of the city, the spatiotemporal bias in terms of the time of day and day of the week. The results of the study contribute to the incidence mapping with the reference maps to avoid biased hot spot occurrences or missing information due to less amount of data.
Article
Full-text available
Data quality and fitness for purpose can be assessed by data quality measures. Existing ontologies of data quality dimensions reflect, among others, which aspects of data quality are assessed and the mechanisms that lead to poor data quality. An understanding of which source of information is used to judge about data quality and fitness for purpose is, however, lacking. This article introduces an ontology of data quality measures by their grounding, that is, the source of information to which the data is compared to in order to assess their quality. The ontology is exemplified with several examples of volunteered geographic information (VGI), while also applying to other geographical data and data in general. An evaluation of the ontology in the context of data quality measures for OpenStreetMap (OSM) data, a well-known example of VGI, provides insights about which types of quality measures for OSM data have and which have not yet been considered in literature.
Chapter
Full-text available
While there is now a considerable variety of sources of Volunteered Geographic Information (VGI) available, discussion of this domain is often exemplified by and focused around OpenStreetMap (OSM). In a little over a decade OSM has become the leading example of VGI on the Internet. OSM is not just a crowdsourced spatial database of VGI; rather, it has grown to become a vast ecosystem of data, software systems and applications, tools, and Web-based information stores such as wikis. An increasing number of developers, industry actors, researchers and other end users are making use of OSM in their applications. OSM has been shown to compare favourably with other sources of spatial data in terms of data quality. In addition to this, a very large OSM community updates data within OSM on a regular basis. This chapter provides an introduction to and review of OSM and the ecosystem which has grown to support the mission of creating a free, editable map of the whole world. The chapter is especially meant for readers who have no or little knowledge about the range, maturity and complexity of the tools, services, applications and organisations working with OSM data. We provide examples of tools and services to access, edit, visualise and make quality assessments of OSM data. We also provide a number of examples of applications, such as some of those used in navigation and routing, that use OSM data directly. The chapter finishes with an indication of where OSM will be discussed in the other chapters in this book, and we provide a brief speculative outlook on what the future holds for the OSM project.
Article
Full-text available
The comprehension of folksonomies is of high importance when making sense of Volunteered Geographic Information (VGI), in particular in the case of OpenStreetMap (OSM). So far, only little research has been conducted to understand the role and the evolution of folksonomies in VGI and OSM, which is despite the fact that without a comprehension of the folksonomies the thematic dimension of data can hardly be used. This article examines the history of the OSM folksonomy, with the aim to predict its future evolution. In particular, we explore how the documentation of the OSM folksonomy relates to its actual use in the data, and we investigate the historical and future scope and granularity of the folksonomy. Finally, a visualization technique is proposed to examine the folksonomy in more detail.
Article
Full-text available
With the advent of location-aware smartphones, the desire for pedestrian-based navigation services has increased. Unlike car-based services where instructions generally are comprised of distance and road names, pedestrian instructions should instead focus on the delivery of landmarks to aid in navigation. OpenStreetMap (OSM) contains a vast amount of geospatial information that can be tapped into for identifying these landmark features. This paper presents a prototype navigation service that extracts landmarks suitable for navigation instructions from the OSM dataset based on several metrics. This is coupled with a short comparison of landmark availability within OSM, differences in routes between locations with different levels of OSM completeness and a short evaluation of the suitability of the landmarks provided by the prototype. Landmark extraction is performed on a server-side service, with the instructions being delivered to a pedestrian navigation application running on an Android mobile device.
Article
Full-text available
The increased development of Volunteered Geographic Information (VGI) and its potential role in GIScience studies raises questions about the resulting data quality. Several studies address VGI quality from various perspectives like completeness, positional accuracy, consistency, etc. They mostly have consensus on the heterogeneity of data quality. The problem may be due to the lack of standard procedures for data collection and absence of quality control feedback for voluntary participants. In our research, we are concerned with data quality from the classification perspective. Particularly in VGI-mapping projects, the limited expertise of participants and the non-strict definition of geographic features lead to conceptual overlapping classes, where an entity could plausibly belong to multiple classes, e.g., lake or pond, park or garden, marsh or swamp, etc. Usually, quantitative and/or qualitative characteristics exist that distinguish between classes. Nevertheless, these characteristics might not be recognizable for non-expert participants. In previous work, we developed the rule-guided classification approach that guides participants to the most appropriate classes. As exemplification, we tackle the conceptual overlapping of some grass-related classes. For a given data set, our approach presents the most highly recommended classes for each entity. In this paper, we present the validation of our approach. We implement a web-based application called Grass&Green that presents recommendations for crowdsourcing validation. The findings show the applicability of the proposed approach. In four months, the application attracted 212 participants from more than 35 countries who checked 2,865 entities. The results indicate that 89% of the contributions fully/partially agree with our recommendations. We then carried out a detailed analysis that demonstrates the potential of this enhanced data classification. This research encourages the development of customized applications that target a particular geographic feature.
Article
Full-text available
With the ubiquity of advanced web technologies and location-sensing hand held devices, citizens regardless of their knowledge or expertise, are able to produce spatial information. This phenomenon is known as volunteered geographic information (VGI). During the past decade VGI has been used as a data source supporting a wide range of services, such as environmental monitoring, events reporting, human movement analysis, disaster management, etc. However, these volunteer-contributed data also come with varying quality. Reasons for this are: data is produced by heterogeneous contributors, using various technologies and tools, having different level of details and precision, serving heterogeneous purposes, and a lack of gatekeepers. Crowd-sourcing, social, and geographic approaches have been proposed and later followed to develop appropriate methods to assess the quality measures and indicators of VGI. In this article, we review various quality measures and indicators for selected types of VGI and existing quality assessment methods. As an outcome, the article presents a classification of VGI with current methods utilized to assess the quality of selected types of VGI. Through these findings, we introduce data mining as an additional approach for quality handling in VGI.
Article
Full-text available
Citizens are increasingly becoming an important source of geographic information, sometimes entering domains that had until recently been the exclusive realm of authoritative agencies. This activity has a very diverse character as it can, amongst other things, be active or passive, involve spatial or aspatial data and the data provided can be variable in terms of key attributes such as format, description and quality. Unsurprisingly, therefore, there are a variety of terms used to describe data arising from citizens. In this article, the expressions used to describe citizen sensing of geographic information are reviewed and their use over time explored, prior to categorizing them and highlighting key issues in the current state of the subject. The latter involved a review of ~100 Internet sites with particular focus on their thematic topic, the nature of the data and issues such as incentives for contributors. This review suggests that most sites involve active rather than passive contribution, with citizens typically motivated by the desire to aid a worthy cause, often receiving little training. As such, this article provides a snapshot of the role of citizens in crowdsourcing geographic information and a guide to the current status of this rapidly emerging and evolving subject.
Article
Open Data (OD) is an emerging term in the process of defining how scientific data may be published and re-used without price or permission barriers. Scientists generally see published data as belonging to the scientific community, but many publishers claim copyright over data and will not allow its re-use without permission. This is a major impediment to the progress of scholarship in the digital age. This article reviews the need for Open Data, shows examples of why Open Data are valuable, and summarizes some early initiatives in formalizing the right of access to and re-use of scientific data.
Chapter
Studies have analyzed the quality of volunteered geographic information (VGI) datasets, assessing the positional accuracy of features and the semantic accuracy of the attributes. While it has been shown that VGI can, in some contexts, reach a high positional accuracy, these studies have also highlighted a large spatial heterogeneity in positional accuracy and completeness, but also concerning the semantics of the objects. Such high semantic heterogeneity of VGI datasets becomes a significant obstacle to a number of possible uses that could be made of the data. This paper proposes an approach for both improving the semantic quality and reducing the semantic heterogeneity of VGI datasets. The improvement of the semantic quality is achieved by using a tag recommender system, called OSMantic, which automatically suggests relevant tags to contributors during the editing process. Such an approach helps contributors find the most appropriate tags for a given object, hence reducing the overall dataset semantic heterogeneity. The approach was implemented into a plugin for the Java OpenStreetMap editor (JOSM) and different examples illustrate how this plugin can be used to improve the quality of VGI data. This plugin has been tested by OSM contributors and evaluated using an online questionnaire. Results of the evaluation suggest a high level of satisfaction from users and are discussed.