Using OpenStreetMap as a Data Source
in Psychology and the Social Sciences
Maren Mayer1,2, Daniel W. Heck3, and Franz-Benjamin Mocnik4
1University of Mannheim
2Heidelberg Academy of Sciences and Humanities
3University of Marburg
4University of Twente
Version: February 11, 2022
Correspondence concerning this article should be addressed to Maren Mayer, Department
of Psychology, University of Mannheim, B6 30-32 Room 316, D-68159 Mannheim,
Germany. E-mail:
Author Note
Maren Mayer
Daniel W. Heck
Franz-Benjamin Mocnik
Data and R scripts are available at the Open Science Framework
( The software OSM-Psychology is available on GitHub
This work was supported by the Heidelberg Academy of Sciences and Humanities
(WIN project Shared Data Sources).
The authors made the following contributions. Maren Mayer: Conceptualization,
Software Development, Investigation, Methodology, Writing - Original Draft, Writing -
Review & Editing; Daniel W. Heck: Conceptualization, Methodology, Writing - Review &
Editing; Franz-Benjamin Mocnik: Conceptualization, Software Development, Writing -
Review & Editing.
Big data are not yet commonly used in psychological research as they are often difficult
to access and process. One source of behavioral data containing both spatial and
thematic information is OpenStreetMap, a collaborative online project aiming to develop
a comprehensive world map. Besides spatial and thematic information about buildings,
streets, and other geographical features, the collected data also contains information
about the contribution process itself. Even though such data can be potentially useful for
studying individual judgments and group processes within a natural context, behavioral
data generated in OpenStreetMap have not yet been easily accessible for scholars in
psychology and the social sciences. To overcome this obstacle, we developed a software
package which makes OpenStreetMap data more accessible and allows researchers to
extract data sets from the OpenStreetMap database as CSV or JSON files. Furthermore,
we show how to select relevant map sections in which contributor activity is high and how
to model and predict the behavior of contributors in OpenStreetMap. Moreover, we
discuss opportunities and possible limitations of using behavioral data from
OpenStreetMap as a data source.
Keywords: Big data, mass collaboration, online collaboration, group processes.
1 Introduction
Sources of Big Data are of increasing importance in the social sciences in general
and in psychological research in particular. As the storage, processing, and analysis of
large data has become easier (Adjerid & Kelley, 2018), many researchers have
recommended the use of Big Data given that it provides many benefits (Kern et al., 2016;
Kosinski et al., 2015; Luhmann, 2017). Big Data offers the opportunity to analyze large
and diverse samples while partly or completely skipping the recruiting process for
participants. In particular, user-generated content (e.g., as in Wikipedia and other
Wiki-based projects, OpenStreetMap, Twitter, Facebook, or forums) makes it possible to
analyze human behavior, communication, and decisions through contributions to the
content of these projects. Furthermore, having large samples enables researchers to fit
more sophisticated models with many parameters and to test hypotheses and theories on
behavioral data generated in real-world settings. For instance, Wang et al. (2016)
analyzed Twitter tweets for content indicating positive or negative affect and found that
these tweets resemble the affect experienced in a typical working week.
While Facebook, Twitter, and Wikipedia have received considerable attention in
various fields of psychology (e.g., Cress et al., 2016; Liu et al., 2015; Wang et al., 2016),
other sources of big data have not yet been explored in similar depth. In particular,
user-generated content in collaborative projects such as Wikipedia is only rarely examined
but can be a valuable data source for a variety of research questions such as the
development of social rules and norms in collaboration, exemplar theories, or contribution
and correction processes in communities. For example, Cress et al. (2016) used Wikipedia
data to test a theory of social collaboration in online settings.
Although potentially useful and freely available, Wikipedia mostly offers
data structured as full-text articles consisting of sections and subsections. Such data are
more difficult to access and process than numeric data since extensive preprocessing is
necessary. For a quantitative analysis, different versions of Wikipedia articles must be
compared to extract the changes that were made from one version to another, and the
extent of these changes needs to be translated into comparable and valid numerical measures
(Cress et al., 2016). This can be very time-consuming for several hundred or even several
thousand Wikipedia articles.
In contrast, the availability of numeric information facilitates the study of
incremental changes using quantitative approaches. For example, changes to the outline
of a building on an interactive map as in OpenStreetMap can easily be traced by
comparing the area of the building and its centroid before and after a change. Moreover,
verbal thematic information that has a simple structure (such as bullet points or semantic
tags) can be analyzed more efficiently than unstructured text. Analyzing changes
between different versions of a data set then becomes easier as one can compare the
number and type of tags that have been modified. A collaborative project that provides
both of these types of data—interval-scaled numeric information and simply structured
thematic information—is OpenStreetMap. Concerning semantic information, it also offers
the advantage that a pool of frequently used tags and their combinations are
available (Mocnik et al., 2017). Since OpenStreetMap is used and maintained by many
contributors worldwide, it is furthermore possible to test the generalizability of
hypotheses and results using data from many diverse cultural contexts. Thus,
OpenStreetMap data may be of interest especially for research combining geographical
features and information with psychological concepts and theories (Rentfrow et al., 2013;
Rentfrow et al., 2008).
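To make the idea of tracing a geometric change concrete, the following minimal Java sketch compares the area and the centroid of a building outline before and after an edit. It is not part of OSM-Psychology: the class OutlineChange and its methods are hypothetical names, and the coordinates are assumed to be planar (e.g., projected meters), so raw longitude and latitude values would first have to be projected.

```java
// Illustrative sketch only; OutlineChange and its methods are hypothetical
// names and not part of OSM-Psychology. Coordinates are assumed to be planar.
final class OutlineChange {

    // Signed area of a simple polygon (shoelace formula). Vertices are given
    // in order, without repeating the first vertex at the end.
    static double signedArea(double[][] p) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double[] a = p[i];
            double[] b = p[(i + 1) % p.length];
            sum += a[0] * b[1] - b[0] * a[1];
        }
        return sum / 2.0;
    }

    // Unsigned area, independent of the vertex orientation.
    static double area(double[][] p) {
        return Math.abs(signedArea(p));
    }

    // Area-weighted centroid of a simple polygon with non-zero area.
    static double[] centroid(double[][] p) {
        double a = signedArea(p);
        double cx = 0.0;
        double cy = 0.0;
        for (int i = 0; i < p.length; i++) {
            double[] u = p[i];
            double[] v = p[(i + 1) % p.length];
            double cross = u[0] * v[1] - v[0] * u[1];
            cx += (u[0] + v[0]) * cross;
            cy += (u[1] + v[1]) * cross;
        }
        return new double[] { cx / (6.0 * a), cy / (6.0 * a) };
    }

    public static void main(String[] args) {
        // Hypothetical outline before and after a contribution.
        double[][] before = { {0, 0}, {10, 0}, {10, 10}, {0, 10} };
        double[][] after = { {0, 0}, {20, 0}, {20, 10}, {0, 10} };
        double areaChange = area(after) - area(before);
        double[] c0 = centroid(before);
        double[] c1 = centroid(after);
        double shift = Math.hypot(c1[0] - c0[0], c1[1] - c0[1]);
        System.out.println("area change: " + areaChange + ", centroid shift: " + shift);
    }
}
```

The two resulting numbers (change in area and shift of the centroid) are the kind of interval-scaled measures that can be entered directly into a statistical model.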
OpenStreetMap is a worldwide collaborative project in which volunteer
contributors, also called mappers, create and edit geographical data in order to
obtain a detailed world map (OpenStreetMap Contributors, 2020). OpenStreetMap
contains geometric as well as thematic information about streets, buildings, and many
more geographical features. Contributors use satellite images, GPS trackers, personal
knowledge, and other sources to complete and maintain the map. To reach this goal,
contributors correct each other and extend existing information. This results in a rich
history of changes that are made to the features represented in OpenStreetMap. For
instance, Figure 1 displays the history of geometric changes to the Mannheim Castle in
Germany. Similar to Wikipedia, users accessing a map on OpenStreetMap in the browser
usually only see the latest version of a feature while the complete change history is
optionally available for further inspection.
Figure 1
Building footprint of Mannheim Palace as represented in OpenStreetMap.
(a) Representation in 2009 (b) Representation in 2018
Note. The varying level of detail indicates how the representation has been modified over time.
OpenStreetMap data can in principle be easily accessed and used, because the
data are available under an open license and corresponding database dumps can be
downloaded free of charge. Several ways of accessing the data stored in such a
dump have been established (e.g., Raifer et al., 2019). None of these seem, however, to be
compatible with the tool set commonly used in psychology. In order to make these data
accessible for analyses within the domain of psychology and the social sciences, we thus
developed the software package OSM-Psychology. The software allows researchers to
extract information from the database and export a data set in a format compatible with
the tool set used in psychological research.
OSM-Psychology is written in Java and builds heavily on the OSHDB (Raifer
et al., 2019), software written to extract the history of the OpenStreetMap data from a
full dump of the database. The extracted history is then stored in formats
compatible with standard statistics software, thus facilitating further statistical modeling
and analyses. The export focuses either on the complete contribution history of
OpenStreetMap elements (i.e., geographical features represented in the OpenStreetMap
data) over a certain time period or on a snapshot capturing the status of such an element
at a certain point in time. In each case, the user can specify for which areas and which
information about the features should be extracted.
In the following, we first provide an overview of online mass collaboration in
general and of collaboration in OpenStreetMap in particular. Secondly, we describe the
software interface, how to use the software, and the resulting data. Thirdly, we present
an example of how to extract information from OpenStreetMap to answer a psychological
research question with the resulting OpenStreetMap data. More precisely, we illustrate
how to select areas appropriate for further analyses and how to model and predict the
probability of future changes to OpenStreetMap content for the identified areas. Finally,
we discuss possible directions for future research and challenges of using OpenStreetMap
and big data as a data source for studying psychological research questions.
2 OpenStreetMap: Online Mass Collaboration of Individuals
When Internet users collaborate, they leave traces in the data sets they operate
on. These traces can be quite different in nature, partly as a result of
different types of contribution structures. On the one hand, networking services such as
Twitter or Facebook allow their users to build and maintain social networks,
communicate with each other, and reference and comment on each other’s content. On
the other hand, mass-collaboration projects such as Wikipedia or OpenStreetMap allow
their users to collaborate in order to achieve a common goal. In line with Cress et al.
(2016, p. 85), we define mass collaboration as “an activity where masses of individuals
work collaboratively on common products that capture the current state of the group.”
Thus, social media platforms such as Twitter or Facebook are not defined as mass
collaboration because users do not intentionally work collaboratively towards achieving a
common goal or generating a common product (Mocnik et al., 2019).
Mass-collaboration projects are a rather new source of information, which have
emerged with the rise of the Internet. In recent years, the data quality of such projects
has been examined frequently. For Wikipedia, studies show a good overall quality
compared to the Encyclopedia Britannica (Giles, 2005) as well as for specific topics
compared to textbooks used in university lectures and professionally managed websites
(Clauson et al., 2008; Kräenbring et al., 2014; Leithner et al., 2010; Rajagopalan et al.,
2011). The accuracy of OpenStreetMap data has been extensively assessed in recent
years, often by comparing worldwide OpenStreetMap data to other reference data. For
Germany, the accuracy of OpenStreetMap was evaluated as medium (Ludwig et al., 2011)
to high (Helbich et al., 2012; Zielstra & Zipf, 2010), while urban areas are usually
attributed a better accuracy than rural areas (Chen, 2011). Similar results were obtained
in other parts of the world (Ciepłuch et al., 2010; El-Ashmawy, 2016; Haklay, 2010;
Zhang & Malczewski, 2017; Zheng & Zheng, 2014). An overview of these results and of
further activities towards the establishment of quantitative measures of data quality was
provided by Mocnik et al. (2018).
Elements in OpenStreetMap are classified as nodes, ways, or relations. Nodes are
used to represent point features such as trees, lampposts, or traffic lights. Ways consist of
at least two connected nodes, which are for instance used to display any kind of street. If
the start and the end node of a way are identical, the resulting geometry is closed and
can be used to represent any type of area such as building outlines, parks, and lakes.
Relations are compositions of several OpenStreetMap elements. They can contain ways
such as in the case of a university which consists of several buildings, or in the case of a
cycling track which consists of corresponding roads. Relations can also consist of different
types of elements. For instance, bus lines can be represented by a relation that consists of
the roads the bus takes on a tour and nodes to represent the corresponding bus stops.
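The distinction between open and closed ways can be sketched in a few lines of Java. The class below is hypothetical and does not mirror the data model of OSM-Psychology or the OSHDB; it only assumes that a way is given as an ordered list of node IDs.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; OsmWaySketch is a hypothetical class and does not
// mirror the data model of OSM-Psychology or the OSHDB.
final class OsmWaySketch {

    // A way is an ordered list of node IDs with at least two entries.
    static boolean isValidWay(List<Long> nodeIds) {
        return nodeIds.size() >= 2;
    }

    // A way is closed when its first and last node are identical, so that it
    // can outline an area such as a building, a park, or a lake.
    static boolean isClosed(List<Long> nodeIds) {
        return isValidWay(nodeIds)
                && nodeIds.get(0).equals(nodeIds.get(nodeIds.size() - 1));
    }

    public static void main(String[] args) {
        // Hypothetical node IDs for a street and a building outline.
        List<Long> street = Arrays.asList(1L, 2L, 3L);
        List<Long> building = Arrays.asList(10L, 11L, 12L, 13L, 10L);
        System.out.println("street closed: " + isClosed(street));
        System.out.println("building closed: " + isClosed(building));
    }
}
```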
Each element in OpenStreetMap can be described by tags which encode any
thematic information beyond the geometric information. Tags are organized in key–value
pairs, such as “building = university,” where “building” is the key and “university” the
corresponding value. Tags are commonly used to indicate feature types, names, and
addresses of buildings, speed limits on streets and highways, business hours of shops, and
many other aspects. Note that each element can have tags attached, even if it is a
member of a relation or (in case of a node) forms part of a way. For example, a way (i.e.,
a collection of nodes) may have the tag “highway = residential” in case it represents a
street within a residential area. At the same time, one of its nodes may have the tag
“highway = traffic_signals”, thereby indicating that there are traffic lights at this specific
location along the street. In view of the virtually endless possibilities of describing
elements by tags, a certain consensus with implicit conventions regarding the use of tags
has emerged as the result of a community process (Mocnik et al., 2017).
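Because tags are plain key–value pairs, changes between two versions of an element can be summarized by comparing two maps. The following Java sketch shows one way to do so; TagDiff is a hypothetical helper that is not part of OSM-Psychology, and the example tags are chosen for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; TagDiff is a hypothetical helper and not part of
// OSM-Psychology.
final class TagDiff {

    // Returns the keys whose value was added, removed, or changed between
    // two versions of an element, in alphabetical order.
    static Set<String> changedKeys(Map<String, String> before, Map<String, String> after) {
        Set<String> keys = new TreeSet<String>(before.keySet());
        keys.addAll(after.keySet());
        Set<String> changed = new TreeSet<String>();
        for (String key : keys) {
            String oldValue = before.get(key);
            String newValue = after.get(key);
            boolean differs = (oldValue == null) ? (newValue != null) : !oldValue.equals(newValue);
            if (differs) {
                changed.add(key);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Hypothetical versions of a street: a name tag is added.
        Map<String, String> before = new LinkedHashMap<String, String>();
        before.put("highway", "residential");
        Map<String, String> after = new LinkedHashMap<String, String>(before);
        after.put("name", "Lerchenweg");
        System.out.println("changed keys: " + changedKeys(before, after));
    }
}
```

The number of changed keys is a simple candidate measure for the extent of a thematic contribution.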
OpenStreetMap builds on the contributors’ motivation to create an open and
comprehensive data set that can be used to generate maps. However, the resulting data
can also be of interest for researchers from various fields, because they contain
information about geographical features, the way individuals represent these, decision
processes behind the generated content, and the various modes of how individuals
interact when collaborating in such a project. For instance, previous research investigated
topics such as the contributors’ motivation (Budhathoki & Haythornthwaite, 2013),
network growth (Corcoran et al., 2013), and city planning (Liu & Long, 2016). Importantly
for scholars in psychology and the social sciences, contributors leave digital traces when creating
new elements and when editing or deleting existing ones. The resulting behavioral data
included in the change history of OpenStreetMap can thus be used to trace the judgments
and decisions that underlie the process of contributing to a large, collaborative project.
3 Software Interface
In this section, we describe the open-source software package OSM-Psychology
and briefly illustrate the basics of how to extract a data set. A more detailed step-by-step
tutorial for installing the software and getting started can be found in the documentation
of the package. The documentation also describes all the available functions and
includes specific examples of code and data.
3.1 Extracting Data
The software can be used to extract collaboration data from the
OpenStreetMap History Database in a format that can easily be used for standard
statistical analyses in psychology. After the installation and the download of a
corresponding database, a short snippet of Java code such as that shown in Figure 2
allows researchers to extract the desired data and export them as a CSV file. The
example code highlights the three steps that are required for doing so: In the first step
(via the call Data.load), the data set is loaded from the specified path.
In a second, optional step (via the class BoundingBox), a bounding box is
specified, which contains the minimum and maximum of both longitude and
latitude of the area of interest.
The third and most important step is to export the data (e.g., via the call
ExportEntities.csv). Two modes of export are available: the first exports the different
historical versions of the elements themselves, whereas the second exports only the
incremental contributions (i.e., the changes from one version to another). These modes
are described in more detail in the next section. For each of these modes, the data can be
exported either in the CSV or in the JSON format. Both methods accept several
arguments: (1) the filtering of which data to select, such as only buildings, roads and
streets, nodes, or all elements; (2) the bounding box as an optional argument in case the
exported data set should be spatially restricted; and (3) a list of properties to be
exported. Table 1 shows the data set obtained by running the example code in Figure 2.
Table 1
Example data exported from OpenStreetMap using OSM-Psychology
| OsmID | ObjectID | Timestamp | OsmType | NumberOfChanges | CentroidLon | CentroidLat | Tags |
| --- | --- | --- | --- | --- | --- | --- | --- |
| node/310797727 | 310797727 | 2013-11-06 17:26:20 | NODE | 2 | 8.664628 | 49.3789372 | recycling:glass = "yes", amenity = "recycling" |
| node/3770174857 | 3770174857 | 2015-10-03 09:00:14 | NODE | 1 | 8.6464978 | 49.404143 | shop = "bakery", name = "Riegler" |
| node/5395319198 | 5395319198 | 2018-02-07 08:55:48 | NODE | 1 | 8.6755765 | 49.4077557 | wheelchair = "yes", amenity = "university", name = "Universitäts-[…] |
| relation/1186436 | 1186436 | 2015-10-15 17:54:55 | RELATION | 14 | 8.67920133636414 | 49.4038327227724 | boundary = […] postal_code = "69115" |
| relation/255164 | 255164 | 2009-09-18 23:09:18 | RELATION | 1 | 8.60882500953722 | 49.4188673670933 | ref = "K 9709", route = "road", type = "route" |
| way/463527848 | 463527848 | 2018-12-04 15:25:13 | WAY | 3 | 8.67158229968341 | 49.408671636229 | […] = "63", addr:street = "Vangerowstraße", addr:postcode = "69115", building = "yes", addr:city = "Hei-[…] |
| way/641884455 | 641884455 | 2018-11-07 13:56:38 | WAY | 1 | 8.68255951422936 | 49.4093721339541 | fenced = "yes", playground = "sandpit" |
| way/457349073 | 457349073 | 2017-11-20 09:31:20 | WAY | 3 | 8.60144385 | 49.4452956 | name = "Lerchenweg", highway = "footway" |
Figure 2
Example of Java code for extracting data from OpenStreetMap using OSM-Psychology.
    import com.osm.psychology.core.*;
    import com.osm.psychology.strategies.*;

    public class Main {
        public static void main(String[] args) throws Exception {
            // load the OSM History Database
            Data.load("D:/OSMdata/germany.oshdb.mv.db");
            // specify a bounding box
            BoundingBox hd = new BoundingBox("Heidelberg", 8.573179,
                    49.352003, 8.79405, 49.459693);
            // export to CSV file
            ExportEntities.csv(new StrategyAll(), hd, "2019-01-01",
                    Col.BASIC_INFORMATION, Col.CENTROID, Col.TAGS);
        }
    }
3.2 Modes of Export
Two aspects of the data are of particular interest for research in psychology and
the social sciences: a snapshot of all the elements included in the data at a certain point
in time (called element view in the following), and the change history of these elements
within a certain time period (called contribution view). The element view exports a
snapshot of the data at a certain point in time. It is element-centered in that every row
in the extracted data file represents one OpenStreetMap element with its properties at
the specified point in time. In contrast, the contribution view exports the history of
changes made to the elements within a specific time span. It is contribution-centered in
that each row indicates one contribution, meaning that each OpenStreetMap element
covers as many rows as the number of times it was modified by the contributors. For each
modification, the properties of the element are provided both before and after they have
been changed. The creation of an object is counted as the first change, with no properties
before the creation and the entered information after the creation.
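The contribution-centered layout described above can be pictured with a small Java sketch that turns the ordered versions of one element into one row per contribution. The class ContributionRows is hypothetical and is not how OSM-Psychology or the OSHDB represent contributions internally; element states are reduced to plain strings, and deletions are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; ContributionRows is a hypothetical helper and not
// part of OSM-Psychology. Element states are reduced to plain strings.
final class ContributionRows {

    // One row of a contribution-centered data set.
    static final class Row {
        final String osmId;
        final String before; // null when the contribution creates the element
        final String after;

        Row(String osmId, String before, String after) {
            this.osmId = osmId;
            this.before = before;
            this.after = after;
        }
    }

    // Converts the chronologically ordered versions of one element into one
    // row per contribution, pairing each version with its predecessor.
    static List<Row> toRows(String osmId, List<String> versions) {
        List<Row> rows = new ArrayList<Row>();
        String previous = null;
        for (String version : versions) {
            rows.add(new Row(osmId, previous, version));
            previous = version;
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical history of a single way with three versions.
        List<String> versions = Arrays.asList(
                "highway=footway",
                "highway=footway; name=Lerchenweg",
                "highway=path; name=Lerchenweg");
        for (Row row : toRows("way/1", versions)) {
            System.out.println(row.osmId + ": " + row.before + " -> " + row.after);
        }
    }
}
```

An element with three versions thus yields three rows, the first of which has an empty state before the contribution because it records the creation.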
The geographical area for which data are exported can be specified by a bounding
box. The bounding box in Figure 2 defines the minimum longitude and latitude,
and the maximum longitude and latitude (in that order) of the area of interest for
extraction. If the bounding box is omitted in the extraction function, the elements
and contributions are not filtered by area during the export; instead, all data in the
OpenStreetMap database are considered. Due to the large size of the exported data,
especially when employing the contribution view, it is highly recommended to focus only
on specific areas of interest and to refrain from extracting all elements or contributions
from the database at once. In the preparatory processing outlined below, we provide an
impression of realistic numbers of elements and file sizes. Moreover, we demonstrate how
to define and select an appropriate area of interest to obtain reasonably sized data sets,
thus also reducing the computation time for further analyses.
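The effect of a bounding box can be illustrated with a minimal Java sketch that checks whether a centroid falls inside a box given as minimum longitude, minimum latitude, maximum longitude, and maximum latitude. BoxFilter is a hypothetical class and not the BoundingBox class of OSM-Psychology; the sketch also assumes that the box does not cross the antimeridian.

```java
// Illustrative sketch only; BoxFilter is a hypothetical class and not the
// BoundingBox class of OSM-Psychology. Boxes crossing the antimeridian
// (longitude +/-180) are not handled.
final class BoxFilter {
    final double minLon;
    final double minLat;
    final double maxLon;
    final double maxLat;

    // Arguments follow the order used in the text:
    // minimum longitude, minimum latitude, maximum longitude, maximum latitude.
    BoxFilter(double minLon, double minLat, double maxLon, double maxLat) {
        if (minLon > maxLon || minLat > maxLat) {
            throw new IllegalArgumentException("minimum must not exceed maximum");
        }
        this.minLon = minLon;
        this.minLat = minLat;
        this.maxLon = maxLon;
        this.maxLat = maxLat;
    }

    // True if the point lies inside the box or on its border.
    boolean contains(double lon, double lat) {
        return lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat;
    }

    public static void main(String[] args) {
        // Coordinates of the box taken from the example in Figure 2.
        BoxFilter box = new BoxFilter(8.573179, 49.352003, 8.79405, 49.459693);
        // Centroid of the first row in Table 1.
        System.out.println(box.contains(8.664628, 49.3789372));
        // A point west of the box.
        System.out.println(box.contains(8.4, 49.4));
    }
}
```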
After defining a bounding box or deciding to omit it, the export command is
specified (e.g., ExportEntities.csv in Figure 2). The command itself specifies the type
of view (either element view or contribution view) and the data format to which the data
are exported (either CSV or JSON). Within the export command, the type of objects to
export (either buildings, roads, nodes, or all) is specified via the strategy argument (e.g.,
new StrategyAll()). If a bounding box was specified, the object name of this bounding box is
also included in the export command (e.g., hd). Depending on the type of view, a single
point in time (e.g., "2019-01-01") or a period of time can be added next. This
time-related parameter can also be skipped. In this case, the most recent version of all
elements available in the database is exported by the element view, whereas all
contributions stored in the database are extracted by the contribution view. Lastly, at
the end of the export command, all the relevant features of the OpenStreetMap objects
are indicated (e.g., Col.BASIC_INFORMATION or Col.TAGS), each adding one or more
columns to the exported data.
3.3 Variables Available for Export
For each OpenStreetMap element, a number of columns containing variables can
be exported. The available information can be roughly categorized into basic information,
geometric information, and tag information. Which of these variables can be extracted
depends on whether the element view or the contribution view is used as not every type
of information is available for both types of views. An overview of all available
variables is given in the documentation of the program at GitHub.
Basic information is available for both views and contains the ID of the element, a
timestamp indicating when it was edited, and the type of the element, among others.
The ID of the element is, however, only unique within each class of element type (i.e.,
node, way, or relation). To provide IDs that are unique across all types of elements, the
exported data also include the OSM ID composed of the element type and ID. Specific
elements in the resulting data can be traced through the official OpenStreetMap API by
appending the OSM ID to the domain name of OpenStreetMap. For instance, the
relation representing Mannheim Castle with the ID 60105 is obtained via
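The composite OSM ID can be sketched in Java as follows, assuming the format shown in the OsmID column of Table 1 (lower-case element type, a slash, and the numeric ID). OsmIds is a hypothetical helper and not part of OSM-Psychology.

```java
import java.util.Locale;

// Illustrative sketch only; OsmIds is a hypothetical helper and not part of
// OSM-Psychology. The format "type/id" is assumed from Table 1.
final class OsmIds {

    // Combines element type (e.g., "NODE") and numeric ID into an ID that is
    // unique across nodes, ways, and relations.
    static String compose(String osmType, long objectId) {
        return osmType.toLowerCase(Locale.ROOT) + "/" + objectId;
    }

    // Extracts the numeric ID from a composite OSM ID.
    static long objectIdOf(String osmId) {
        return Long.parseLong(osmId.substring(osmId.indexOf('/') + 1));
    }

    // Extracts the element type in upper case from a composite OSM ID.
    static String typeOf(String osmId) {
        return osmId.substring(0, osmId.indexOf('/')).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String id = compose("NODE", 310797727L);
        System.out.println(id + " has type " + typeOf(id) + " and ID " + objectIdOf(id));
    }
}
```

Without the type prefix, a node and a way may share the same numeric ID, which is why the composite form is the safer key when merging exported data sets.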
When extracting data in the element view, the timestamp represents the date and
time of the last change of an OpenStreetMap element. Moreover, the number of changes
made to an object until the specified point in time for the snapshot is additionally
available as basic information. Note that the number of changes corresponds to the version of
the OpenStreetMap element as counted internally in the OpenStreetMap database. In
contrast, contributions as extracted in the contribution view register not only changes
to the element itself but also changes to subordinate elements, such as changes to nodes
that form part of a way or to ways in a relation (Raifer et al., 2019).
When using the contribution view, the exported data set may include several
contributions per element. In this case, the timestamp indicates the date and time of
each contribution. The contribution view additionally allows the extraction of the type of
contribution (creation, deletion, geometry, and tag change), the user ID of the
contributor, and the ID of the change set (i.e., the session in which the contribution was
made). No user account was required to edit OpenStreetMap before 2007, and for users of
the API v0.6 before 2009 (Bégin et al., 2017); in such cases, the contributor user ID is
provided as 0. The classification of contribution types is performed by the OSHDB API, which
is used by OSM-Psychology internally. Such classification can, however, be more complex
than it seems at first glance. For instance, when adding subordinate elements to an
existing one (e.g., a node to a way), this is not classified as one of these contribution
types for the superordinate element if the geometry does not change. However, the
creation of the new subordinate element is classified as a creation.
Geometric information is available for OpenStreetMap elements in both the
element and the contribution view. It contains the type of geometry (i.e., point, line
string, polygon, multi-line string, multi polygon, geometry collection), the length of the
element’s boundary in meters, its area in square meters, the number of nodes used to
define the geometry, and the centroid represented by geographical coordinates (e.g., via
the argument Col.CENTROID in Figure 2). Geometric information is exported for all
elements even when it is not informative (e.g., the area of ways and nodes as well as the
length of nodes all vanish by definition). For data obtained using the element view,
geometric information refers to the state of the OpenStreetMap element at the specified
point in time. When using the contribution view, all geometric information is available
for both the state before and the state after the contribution.
Tag information comprises the number of tags of an OpenStreetMap element and
a list of these tags as key–value pairs (e.g., via the argument Col.TAGS in Figure 2). For
the element view, the most recent information as of the specified date
of the snapshot is exported. For the contribution view, these variables are available for both the state
before and the state after the contribution.
The data can be stored in either the CSV or the JSON data format, both of which
are convenient for further statistical analysis using R or Python. When extracting the list
of tags for an element, it is recommended to use the JSON file format since tags can be
represented more efficiently in this case. The exported data files are saved to the
sub-directory “exported-data” within the folder in which the Java code is run. In the
following section, we illustrate how psychological research questions can be addressed
using OpenStreetMap data.
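When tags are nevertheless read from a CSV export, the tag list has to be split into key–value pairs again. The Java sketch below shows one possible way, assuming that a cell has the form displayed in the Tags column of Table 1 (key = "value" pairs separated by commas) and that values contain no double quotes. The exact escaping used by OSM-Psychology is not specified here, so TagCellParser is a hypothetical helper resting on this assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; TagCellParser is a hypothetical helper. The cell
// format (key = "value", key = "value") is assumed from Table 1.
final class TagCellParser {

    // Matches one pair of the form key = "value". Keys contain no whitespace,
    // commas, or equals signs; values contain no double quotes.
    private static final Pattern PAIR =
            Pattern.compile("([^\\s,=]+)\\s*=\\s*\"([^\"]*)\"");

    // Parses a tag cell into an ordered map from keys to values.
    static Map<String, String> parse(String cell) {
        Map<String, String> tags = new LinkedHashMap<String, String>();
        Matcher matcher = PAIR.matcher(cell);
        while (matcher.find()) {
            tags.put(matcher.group(1), matcher.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Tag cell of the first row in Table 1.
        String cell = "recycling:glass = \"yes\", amenity = \"recycling\"";
        System.out.println(parse(cell));
    }
}
```

With the JSON export, this parsing step is unnecessary because the tags are already stored as structured key–value pairs.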
4 Exemplary Application
4.1 Preparatory Processing: Identifying Appropriate Areas for Analysis
Before investigating any research question using OpenStreetMap data, one needs
to identify appropriate geographical areas for which data are exported and analyzed. In
some cases, specific areas of the map are required by the hypothesis itself, for instance,
when testing cultural differences in how certain elements are represented on the map. In
other cases, however, the first step is to select an informative map area in which the
behavior of interest can be studied. For instance, when testing predictions about future
contribution behavior, it is necessary to use areas in which the elements have been
changed frequently enough. Similarly, when analyzing the development of tag structures
(e.g., in the context of exemplar theories of categorization, Nosofsky, 2011, or with
respect to the development of social norms, Emmerich et al., 1971), it is necessary that
the exported elements have a sufficiently large number of tags. When examining the
contributors themselves, one should focus on sufficiently large areas to capture all
contributions of the contributors of interest.
Selecting only those geographical areas that are appropriate for the research
questions and hypotheses of interest also reduces computation time. Instead of
analyzing overly large areas, one may focus on smaller, preselected areas that contain
the elements that are of interest for the analysis. Although it is tempting, we do not
recommend using the complete OpenStreetMap database or very large areas such as
densely populated countries or even continents. Especially when examining contributions,
the extracted data can be extensive, thus presenting a hurdle to the analysis with
standard software for statistical analyses. Instead, it is preferable to preselect one or
several areas of interest that are appropriate for the research question. This approach
also facilitates a cross-validation of results across different geographical areas, and thus, a
test of the generalizability of the conclusions.
To illustrate how appropriate areas can be identified, we outline how to preselect
areas for the substantive analysis below. In this example, we aim at predicting whether
OpenStreetMap elements are changed in the future. As contribution activity is not
uniformly distributed across regions, we need to select areas in which elements are
changed frequently enough. A similar approach can also be applied when focusing on the
number of tags or other selection criteria appropriate for different research questions.
To identify areas in which elements are frequently changed, we run a regression to
predict the number of changes by the coordinates of the OpenStreetMap elements. The
number of changes is provided in the basic information of the extracted data. Moreover,
the centroid of an element described by longitude and latitude is available as geometric
information. We expect that contributors make more changes to elements that are
located in densely populated and touristic areas such as city centers, which offer sights,
shops, and other amenities, than to elements in more sparsely populated, rural areas.
First, we downloaded the OpenStreetMap database for Germany. Using the
database file, we extracted all OpenStreetMap elements from Germany’s three largest
cities (i.e., Berlin, Hamburg, and Munich) and their surrounding areas as of December
31st, 2019. Since we were not interested in the changes themselves but only in their
frequency, we used the element view and exported basic information and the centroid
using the CSV format.
The statistical analysis aims at detecting areas with many contributions. For this
purpose, we take into account some specifics of OpenStreetMap data and geographical
data in general. First, we do not expect the centroid coordinates (i.e., longitude and
latitude) to have a linear or simple curvilinear relationship with the number of changes.
Instead, we use generalized additive models (GAMs) to fit nonlinear spatial patterns
(Wood, 2011). GAMs are a statistical method for fitting regressions in which the criterion
is linearly regressed on smooth functions of the predictors with the specific functional
form estimated based on the data. Thereby, GAMs can model relationships between
variables that follow complex nonlinear functions. Second, longitude and latitude cannot
simply be interpreted as independent, linear variables since the Earth’s surface strongly
deviates from that of a plane. Instead, the Earth can be considered a sphere in the
present context, and we thus use a certain class of splines tailored to generalized additive
models on a sphere (Wood, 2011; Wood, 2017). Third, as the creation of an element
counts as a first change, the number of changes is always strictly positive and can never
be zero. We account for the resulting skewness by using a logit link function and a
zero-truncated Poisson distribution for the dependent variable in all analyses. Lastly,
since the areas of interest can be very large, we use parallel computation to reduce
computation time (Wood, 2017).
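To make the zero-truncated Poisson assumption concrete, the following minimal Python sketch evaluates the zero-truncated probability mass function and recovers the rate parameter λ from a sample mean. It is only an intercept-only illustration without spatial smooths, link function, or parallel computation, and it is not the R code used for the analyses. All function names are hypothetical. As input, it uses the mean of 2.49 changes per element reported for Berlin in Table 2.

```python
import math


def ztp_pmf(k: int, lam: float) -> float:
    """P(Y = k | Y > 0) for a Poisson variable with rate lam (defined for k >= 1)."""
    if k < 1:
        return 0.0
    log_p = k * math.log(lam) - lam - math.lgamma(k + 1)
    return math.exp(log_p) / (-math.expm1(-lam))  # divide by P(Y > 0) = 1 - exp(-lam)


def ztp_mean(lam: float) -> float:
    """Expected value of the zero-truncated Poisson distribution."""
    return lam / (-math.expm1(-lam))


def ztp_rate_from_mean(mean: float, tol: float = 1e-10) -> float:
    """Solve ztp_mean(lam) = mean for lam by bisection (requires mean > 1)."""
    if mean <= 1.0:
        raise ValueError("the zero-truncated mean is always greater than 1")
    lo, hi = 1e-9, mean  # ztp_mean(lam) > lam, so the solution lies below `mean`
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if ztp_mean(mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Mean number of changes per element for Berlin, taken from Table 2.
lam_hat = ztp_rate_from_mean(2.49)
```

Because an element always has at least one change, the estimated rate is smaller than the observed mean. The truncation removes the probability mass at zero and thereby shifts the mean upward.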
All GAM analyses were conducted in R using the package mgcv (Wood, 2011;
Wood, 2017), zero-truncated Poisson distributions were implemented with the
package described by Zeileis and Kleiber (2018), and plots were created using the
package described by Wickham (2016) together with ggmap (Kahle & Wickham, 2013).
The data and the code for extracting and analyzing the data in R are available at
the Open Science Framework.
Before analyzing the data and selecting areas for further analyses, we excluded
subordinate elements (helper elements) from the data. The sole purpose of these elements
is to create superordinate elements (such as nodes in ways) but they do not have any
meaning themselves (such as points on the way that mark traffic lights). These elements
can be easily identified since they have no tags. From 7,274,232 elements (2,935,537 for
Berlin, 2,619,431 for Hamburg, and 1,719,264 for Munich), 6,762,182 elements (2,752,924
for Berlin, 2,456,263 for Hamburg, and 1,552,995 for Munich) remained for analysis.
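The exclusion of untagged helper elements can be sketched in a few lines of Python. The CSV excerpt and its column names (city, element_id, element_type, n_changes, n_tags) are hypothetical and chosen for illustration only. The actual column names of an OSM-Psychology export may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of an element-view export; real column names may differ.
EXPORT = """city,element_id,element_type,n_changes,n_tags
Berlin,1,node,1,0
Berlin,2,node,3,2
Berlin,3,way,2,1
Hamburg,4,node,1,0
Hamburg,5,relation,12,6
Munich,6,way,4,3
"""


def drop_helper_elements(rows):
    """Keep only elements that carry at least one tag."""
    return [row for row in rows if int(row["n_tags"]) > 0]


rows = list(csv.DictReader(io.StringIO(EXPORT)))
kept = drop_helper_elements(rows)

# Number of remaining elements per city.
kept_per_city = Counter(row["city"] for row in kept)
```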
Table 2 shows descriptive statistics of the frequency of changes for the extracted
data. The results indicate that many OpenStreetMap elements are rarely changed while a
small number of elements is extensively edited. Overall, this results in a large range of
the frequency of changes. The most edited but also rarest type of elements are relations,
which are often used to represent more complex elements. These aspects need to be
considered when selecting appropriate areas for statistical analyses such as those in the
example below.
Table 2
Frequency of changes for OpenStreetMap elements in Berlin, Hamburg, and Munich and
their surrounding areas.

                                                 Frequency of Changes
City      Element Type   Number of Elements      M       SD      Range
Berlin    Nodes            906,546 (0.329)       2.53     2.14     214
          Relations         27,712 (0.010)       7.43    34.7     2203
          Ways           1,818,666 (0.661)       2.39     2.83     130
          Total          2,752,924               2.49     4.38    2203
Hamburg   Nodes            569,371 (0.232)       2.29     2.25      91
          Relations         36,442 (0.015)       6.44    35.60    2194
          Ways           1,850,450 (0.753)       2.22     2.49     195
          Total          2,456,263               2.30     5.00    2194
Munich    Nodes            459,226 (0.296)       2.18     2.20      97
          Relations         16,777 (0.011)       8.58    32.4     1701
          Ways           1,076,992 (0.693)       2.49     2.82      89
          Total          1,552,995               2.46     4.33    1701

Note. Brackets indicate the relative proportion for each type of OpenStreetMap element.
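As a consistency check, the proportions and the overall means in Table 2 can be recomputed from the counts and means per element type. The Python sketch below does this. The three blocks of rows are assigned to Berlin, Hamburg, and Munich because their totals match the element counts reported in the text. The function name is hypothetical.

```python
# Counts and mean change frequencies per element type, copied from Table 2.
TABLE_2 = {
    "Berlin": {
        "nodes": (906_546, 2.53),
        "relations": (27_712, 7.43),
        "ways": (1_818_666, 2.39),
    },
    "Hamburg": {
        "nodes": (569_371, 2.29),
        "relations": (36_442, 6.44),
        "ways": (1_850_450, 2.22),
    },
    "Munich": {
        "nodes": (459_226, 2.18),
        "relations": (16_777, 8.58),
        "ways": (1_076_992, 2.49),
    },
}


def summarize(city_rows):
    """Return total count, proportion per element type, and the count-weighted mean."""
    total = sum(n for n, _ in city_rows.values())
    proportions = {etype: round(n / total, 3) for etype, (n, _) in city_rows.items()}
    weighted_mean = sum(n * m for n, m in city_rows.values()) / total
    return total, proportions, round(weighted_mean, 2)


summary = {city: summarize(rows) for city, rows in TABLE_2.items()}
```

The recomputed totals, proportions, and weighted means agree with the values printed in the table. This also shows that the overall mean is dominated by ways and nodes, because relations make up only about 1% of the elements.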
Next, we fitted GAMs to the extracted data by regressing the number of changes
on the centroid using a 2-dimensional spline for spheres. The models showed a satisfactory
explained deviance of 6.27% for Berlin, 3.68% for Hamburg, and 4.51% for Munich.
Figure 3 shows the results of the fitted GAMs for these three cities. The contour plot
indicates which areas are more or less frequently changed from yellow to violet,
respectively. As expected, elements in the center of each city are more frequently changed
by the contributors than those in the peripheral areas.
Figure 3
Predicted number of changes by longitude and latitude for Berlin, Hamburg, and Munich
and their surrounding areas.
Our results provide important information about which areas are appropriate for
which type of analysis. When examining the contribution process and how contributors
interact through contributing to OpenStreetMap, cities and city centers may be more
appropriate areas for analysis than peripheral or rural areas. In city centers, contributors
make a larger number of changes to OpenStreetMap elements, making it easier to trace
the contribution process. Thus, elements are changed sufficiently often, resulting in a
sample that is not only informative with respect to the frequency of changes but also has
sufficient variance for detecting systematic relationships reliably.
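For a quick first impression of where contribution activity is concentrated, a much cruder alternative to the GAMs is to average the number of changes within cells of a regular grid. The Python sketch below shows this swapped-in technique. It is not the method used above and ignores the spherical geometry that the spline-based models account for. The function names, the cell size, and the example coordinates are hypothetical.

```python
import math
from collections import defaultdict


def grid_cell_means(elements, cell_size_deg=0.01):
    """Average the number of changes within square longitude/latitude cells.

    `elements` is an iterable of (longitude, latitude, n_changes) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, n_changes in elements:
        cell = (math.floor(lon / cell_size_deg), math.floor(lat / cell_size_deg))
        sums[cell] += n_changes
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def busiest_cells(cell_means, top=1):
    """Return the `top` cells with the highest mean number of changes."""
    return sorted(cell_means, key=cell_means.get, reverse=True)[:top]


# Hypothetical centroids: two elements in a central cell, two in a peripheral cell.
elements = [
    (13.404, 52.521, 5),
    (13.406, 52.523, 7),
    (13.101, 52.401, 1),
    (13.103, 52.402, 2),
]
cell_means = grid_cell_means(elements)
top_cell = busiest_cells(cell_means, top=1)[0]
```

Such a grid summary can serve as a plausibility check for the contour plots, but it depends strongly on the chosen cell size and does not provide a smooth surface.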
4.2 Analysis Example: Predicting Convergence
The following example aims at gaining first insights into how the collaboration in
projects such as OpenStreetMap is organized. For this purpose, we analyze on which
entries contributors focus when they contribute and to which degree this behavior can be
predicted. This also gives an impression of how stable the OpenStreetMap data are over
time and which entries may still change frequently in the future.
The analysis builds on prior research on predicting social contacts (Pachur et al.,
2012, 2014). Pachur and colleagues showed that the probability of future social
interactions can be predicted by recency, frequency, and spacing of past interactions. In
their study, the patterns of social interaction of 40 participants over 30 days were used to
predict the probability of a contact on day 31 (Pachur et al., 2014). The more frequent or
the more recent a contact with a certain person was during the time period of 30 days,
the higher was the probability of interacting with this person on day 31. The spacing of the
contacts moderated the relationship between recency and social interaction such that the
probability of a future interaction was higher when contacts were not only recent but also
massed compared to recent but spaced contact.
We apply this modeling approach to the OpenStreetMap data. However, instead
of focusing on social interactions with different individuals, we focus on the interaction of
contributors with different elements. Nevertheless, the structure of contributions to
OpenStreetMap elements by contributors may be considered similar to the structure of
social interactions in the study of Pachur et al. (2014). Social contacts that the
participants interacted with can be conjectured to resemble OpenStreetMap elements
that contributors edit and, thus, interact with. Furthermore, the “interactions” of
contributors with OpenStreetMap elements are also spaced over time. Hence, it is
possible to compute the relevant input variables for the model: frequency of edits for one
element, the recency of the last change, and the spacing between changes. With this
information, the probability of future edits can be predicted.
According to the findings of Pachur et al. (2014), we expect that recency,
frequency, and spacing of edits predict future changes. More specifically, contributors
should be more likely to change an element if it has recently been changed (Hypothesis 1)
and if it has been changed frequently in the past (Hypothesis 2). Moreover, we expect an
interaction of the spacing of changes with recency such that the probability of a future
change is higher when the last change was recent and when the past changes were massed
rather than spaced (Hypothesis 3).
Based on the preparatory processing, which showed that contributors in
OpenStreetMap change more elements in cities rather than in rural areas, we use data of
cities for the analysis of contributions. To limit the possible influence of very active, local
contributors, we extracted data from the three most populated cities in Germany (Berlin,
Hamburg, and Munich) and combined these data. We included all available elements
from these cities starting in January 2017 until December 2019 to predict changes in
January 2020. Using OSM-Psychology, we thus extracted two data sets with basic
information and the number of tags for each city: one data set with the contribution view
(i.e., changes from 2017 to 2019) and another data set with the element snapshot (i.e., the
state of the map in January 2020).
We had to exclude several elements before performing the analysis. We excluded
13,148 elements from the contribution data since they were initially created in
OpenStreetMap in January 2020, and thus, their creation could not be predicted by data
from the previous three years. Similarly, we excluded 12,409 elements for which the earliest
change in the time span was made in January 2020. We also removed 129,744 elements
that were deleted between 2017 and January 2020. As in the preparatory
processing, we excluded subordinate elements with no tags since changes in these
elements are also captured by the higher-order elements in the contribution view (64,552
elements excluded). Finally, since the element snapshot provides a snapshot of the
desired map area at a given point in time, we excluded all elements from these data that
were last changed before 2017 according to the contribution data (2,101,081 elements).
From initially 3,582,832 unique elements for all cities and 3,258,530 contributions,
1,375,223 unique elements and 2,881,503 contributions remained.
Based on the extracted data set, we computed the dependent and independent
variables. As the dependent variable, we used a dichotomous variable indicating whether
an element was changed in January 2020 or not. Regarding the independent variables, we
used the definition of contributions as in the contribution view, meaning that
contributions are also counted as such when changes were made to associated elements
and not only to the main element. Since we aim at predicting the probability of a change
in January 2020 (month 37), we computed recency, frequency, and spacing monthly.
Similar to Pachur et al. (2014) who computed frequency as the number of days on which
an interaction between two persons happened, we computed frequency as the number of
months in which an element was changed at least once. The mean frequency was 1.54
months with a standard deviation of 1.58 months. Recency was operationalized as the
number of months between the last change and January 2020, which was on average 12.89
months with a standard deviation of 8.52 months. Finally, we computed the spacing of
changes according to Pachur et al. (2014) by selecting elements with exactly two past
changes and computing the interval between these changes in months (M = 10.89, SD =
8.22). The spacing was categorized into massed changes if changes were only one month
apart and spaced changes if they happened further apart creating a dichotomous variable
(Pachur et al., 2014).
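The three predictors can be computed from the list of months in which an element was changed. The following Python sketch assumes that months are numbered from 1 (January 2017) to 36 (December 2019), so that January 2020 is month 37. It further assumes that "exactly two past changes" refers to changes in exactly two distinct months. This reading is an assumption, as are the function name and the returned field names.

```python
TARGET_MONTH = 37  # January 2020 when January 2017 is counted as month 1


def predictors(change_months):
    """Compute frequency, recency, and spacing from the months (1 to 36) with changes."""
    # Keep each month only once and ignore anything outside the observation window.
    months = sorted({m for m in change_months if 1 <= m < TARGET_MONTH})
    if not months:
        return None  # element was not changed between 2017 and 2019

    frequency = len(months)               # number of months with at least one change
    recency = TARGET_MONTH - months[-1]   # months between the last change and January 2020

    spacing = None                        # only defined for exactly two change months
    if frequency == 2:
        interval = months[1] - months[0]
        spacing = "massed" if interval == 1 else "spaced"

    return {"frequency": frequency, "recency": recency, "spacing": spacing}
```

For example, an element changed in November and December 2019 (months 35 and 36) has a frequency of 2, a recency of 1, and massed changes, whereas an element changed only in October 2017 (month 10) has a frequency of 1, a recency of 27, and no spacing value.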
Figure 4
Probability of a change in January 2020 regressed on recency, frequency, and the
interaction of spacing and recency over the previous 36 months.
Note. Spacing only contains data with exactly two contributions. Massed changes are defined as
one month between contributions, while spaced changes are separated by more than one month.
For the statistical analyses, we used the R package lme4 (Bates et al., 2015). We
fitted generalized linear models with a logit link function to account for the dichotomous
dependent variable. Additionally, we mean-centered the recency and frequency variables.
Lastly, within the full data set, some values of the independent variables were heavily
overrepresented (e.g., change frequency of one or recency of one) and would thus have an
excessively large influence on the regression estimates (Krawczyk, 2016). To avoid
overweighting these specific values of recency and frequency, we aimed to ensure an
approximately equal number of observations across the full range of predictor values
(similar to a balanced factorial design). For this purpose, we randomly sampled 1,000
OSM elements for each value of recency or frequency and for each type of element. We
also applied this procedure for the analysis of Hypothesis 3 for each combination of the
levels of recency and spacing. Additionally, to reduce the strong influence of single
elements that were far more frequently or recently changed than all others, we excluded
elements if the respective recency, frequency, or combination of spacing and recency
occurred in fewer than ten elements for each element type.
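The balancing procedure can be sketched as follows in Python. The sketch groups elements by element type and predictor value, drops groups with fewer than ten elements, and draws at most 1,000 elements from each remaining group. How groups with between 10 and 999 elements were handled is not stated above. Keeping all of their elements is an assumption made here, as are the function name, the field names, and the toy data.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(elements, predictor, per_cell=1000, min_count=10, seed=1):
    """Draw up to `per_cell` elements for each (element type, predictor value) cell."""
    rng = random.Random(seed)  # fixed seed so that the sample is reproducible
    cells = defaultdict(list)
    for element in elements:
        cells[(element["type"], element[predictor])].append(element)

    sample = []
    for key in sorted(cells):
        members = cells[key]
        if len(members) < min_count:
            continue  # drop rare predictor values
        # Assumption: cells with fewer than `per_cell` elements are kept completely.
        sample.extend(rng.sample(members, min(per_cell, len(members))))
    return sample


# Hypothetical toy data: recency 1 is heavily overrepresented, recency 30 is rare.
toy = (
    [{"type": "node", "recency": 1} for _ in range(2500)]
    + [{"type": "node", "recency": 2} for _ in range(50)]
    + [{"type": "node", "recency": 30} for _ in range(5)]
)
sample = balanced_sample(toy, "recency")
cell_sizes = Counter((e["type"], e["recency"]) for e in sample)
```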
The different types of elements in OpenStreetMap clearly vary in complexity:
Nodes present single points on the map, ways consist of points forming a polygon, and
relations are elements consisting of other elements themselves. Hence, the relationships
between change probability and the independent variables may be moderated by the type
of element. Thus, we modeled the relationship between the independent variables and the
change probability separately for each element type. Moreover, we ran a joint analysis to
test for an interaction of element type and recency or frequency, respectively. For this
purpose, we compared a nested model including only the main effects of element type and
either recency or frequency or the interaction of recency and spacing against a model that
also contained the corresponding interaction term with the OSM type.
With respect to the recency of editing relations, we found that the change
probability for mean recency was significantly below chance as indicated by a negative
intercept of the logistic regression (β = −2.464, CI = [−2.521; −2.408], z = −85.48,
p < .001). As predicted, the change probability significantly decreased the less recent the
last change was (β = −0.170, CI = [−0.177; −0.163], z = −46.81, p < .001). For
both ways and nodes, the change probability for mean recency was also below chance
(β = −4.090, CI = [−4.107; −4.073], z = −477.94, p < .001 and β = −4.508, CI
= [−4.538; −4.480], z = −305.13, p < .001, respectively). Moreover, the
relationship between recency of changes and change probability was again negative both
for ways (β = −0.036, CI = [−0.038; −0.034], z = −42.59, p < .001) and for nodes
(β = −0.028, CI = [−0.031; −0.025], z = −18.45, p < .001). However, even though
significant, these relationships between recency and change probability are much weaker
for ways and nodes than for relations and cannot be considered meaningful. This may be
due to the data structure since the structure of ways and especially nodes is simpler than
that of relations, which leads overall to fewer changes and less need to revise information.
Next, we tested the interaction with element type by comparing a nested model having
recency and element type only as main effects against a more general model allowing for
an interaction. The nested model with main effects fitted significantly worse as indicated
by a likelihood ratio test (χ² = 2075.8, df = 2, p < .001), thus providing evidence for an
interaction of recency and element type. This result is also displayed by Panel a) in
Figure 4, showing that contributions to elements of the type relation followed the pattern
predicted by Hypothesis 1 while contributions for nodes and ways did not conform to the
expected pattern.
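To give the logistic regression coefficients for recency a more tangible interpretation, the Python sketch below converts the negative intercepts reported above into change probabilities at mean recency and the negative slopes into odds ratios per additional month. The probabilities refer to the balanced subsample and should not be read as base rates for the complete data. The variable and function names are hypothetical.

```python
import math


def inverse_logit(log_odds):
    """Convert log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Intercepts: log-odds of a change in January 2020 at mean recency.
intercepts = {"relations": -2.464, "ways": -4.090, "nodes": -4.508}
# Slopes: change in log-odds for each additional month since the last change.
slopes = {"relations": -0.170, "ways": -0.036, "nodes": -0.028}

prob_at_mean_recency = {etype: inverse_logit(b0) for etype, b0 in intercepts.items()}
odds_ratio_per_month = {etype: math.exp(b1) for etype, b1 in slopes.items()}
```

Under these coefficients, the change probability at mean recency is roughly 8% for relations, 2% for ways, and 1% for nodes. Each additional month since the last change lowers the odds of a change by about 16% for relations but only by about 3% to 4% for ways and nodes, which illustrates why the latter effects are described as weak.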
In line with Hypothesis 2, Panel b) in Figure 4 displays that, with an increasing
frequency of changes made to an element between 2017 and 2019, the probability of a
change in January 2020 increased. Regarding the absolute level of changes, the
generalized linear model showed a significant negative intercept, meaning that the change
probability was below 50% for relations (β = …077, CI = […], p < .001), for ways
(β = −2.752, CI = [−2.843; −2.664], z = −60.20, p < .001), and for
nodes (β = −3.747, CI = [−3.929; −3.575], z = −41.509, p < .001). More importantly,
the model showed a significant positive trend for each type of element (relations:
β = 0.168, CI = [0.162; 0.175], z = 52.55, p < .001; ways: β = 0.187, CI = [0.160; 0.215],
z = 13.44, p < .001; and nodes: β = 0.215, CI = [0.123; 0.307], z = 4.578, p < .001).
Even though the separate models show rather similar results concerning the relationship
between frequency and change probability, a model comparison comparing a nested model
with frequency and element type as main effects only against a model also including the
interaction term revealed that the restricted model fitted the data significantly worse
(χ² = 1412.9, df = 2, p < .001). Overall, these results support Hypothesis 2.
Lastly, we examined the effect of the interaction of recency and spacing on the
probability that an element is changed (Hypothesis 3). For elements of the type relation,
we found that, when changes were massed, recency was negatively related to change
probability (β = −0.116, CI = [−0.153; −0.083], z = −6.502, p < .001) whereas this
trend was significantly diminished for spaced changes (β = 0.110, CI = [0.074; 0.150],
z = 5.617, p < .001). However, for ways, recency was only very weakly and not
meaningfully related to the change probability in January 2020 when changes were massed
(β = −0.023, CI = [−0.033; −0.013], z = −4.468, p < .001) and not significantly
related to change probability when changes were spaced (β = 0.005, CI = [−0.008; 0.…],
z = 0.792, p = .429). For nodes, neither relationship was significant (massed changes:
β = −0.001, CI = [−0.019; 0.016], z = −0.140, p = .889; spaced changes: β = 0.006, CI
= [−0.013; 0.027], z = 0.651, p = .515). Furthermore, a model predicting the
change probability by the interaction of recency and spacing and a main effect of element
type showed a significantly worse fit to the data than a model predicting change probability
with a three-way interaction of recency, spacing, and element type (χ² = 104.74, df = 6,
p < .001). This indicates that the element type has a significant influence on the
relationship between change probability and the interaction of recency and spacing. The
respective trends are displayed in Panel c) of Figure 4, indicating that only contributions
to elements of the type relation followed the expected pattern, thus providing mixed
evidence for Hypothesis 3.
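The likelihood ratio tests used for the model comparisons above follow a simple recipe: twice the difference in log-likelihood between the general and the nested model is compared against a chi-square distribution whose degrees of freedom equal the number of additional parameters. The Python sketch below implements this with the closed-form upper tail of the chi-square distribution, which holds for even degrees of freedom such as the 2 and 6 reported here. The function names are hypothetical, and the sketch does not reproduce the model fitting itself.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even degrees of freedom.

    For df = 2m, P(X > x) = exp(-x / 2) * sum_{k=0}^{m-1} (x / 2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x / 2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_restricted, loglik_full, df):
    """Return the test statistic and p-value for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_restricted)
    return statistic, chi2_sf_even_df(statistic, df)


# p-value for the three-way interaction test reported above.
p_three_way = chi2_sf_even_df(104.74, 6)
```

For test statistics as large as those reported for recency and frequency, the p-value is numerically indistinguishable from zero, which is consistent with reporting p < .001.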
Overall, our example illustrates how the OpenStreetMap data base can be used to
test substantive hypotheses about how contributors collaborate in order to edit and
modify elements. Extending research on social interactions (Pachur et al., 2014), we
tested whether changes can be predicted by the recency, frequency, and spacing of prior
contributions. For elements of the type relation, we found that the more recent and
frequent changes were in the past, the more likely a change was made in the future.
Moreover, recent changes were especially predictive for future changes when past changes
were massed rather than spaced. However, for ways and nodes, only the frequency of past
changes predicted the probability of a change in January 2020.
The behavior of how contributors change elements resembles social interactions
only for relations but not for ways and nodes. A reason why nodes and ways are not
changed in the same way as relations may be that relations are more complex elements
than nodes and ways. Relations are composed of several subordinate elements that
provide more information than a single element alone. Hence, relations allow contributors
to make more modifications and changes to the existing information and are thereby
prone to more dispute, in turn resulting in further changes. In contrast, consensus among
contributors seems to be reached faster for simple elements such as ways or nodes, in turn
resulting in fewer changes. This finding is also supported by Table 2 and Figure 4, which
show that ways and nodes are generally edited less frequently than relations.
The results have practical implications when selecting data for testing behavioral
models. Our results show that the changing behavior of contributors in OpenStreetMap
follows basic predictable patterns of interaction behavior, meaning that changes do not
occur at random or unsystematically. Hence, it might be possible to develop and validate
a quantitative index that measures the change stability of OpenStreetMap elements.
Such an index could serve as a proxy for the quality of an element’s representation in the
data base. More generally, these findings may be used as a preliminary foundation for
future research on the editing behavior of contributors to mass collaboration projects.
5 Discussion
The OSM-Psychology software is a tool to extract collaboration data from the
OpenStreetMap project to investigate phenomena related to online mass collaboration of
contributors. We demonstrated that such data can serve as a testbed for generating and
investigating research questions related to psychology and the social sciences. Practically,
we showed how to identify appropriate geographical areas for such analysis to then
predict future contribution behavior by past contribution behavior analogously to social
interactions (Pachur et al., 2014).
Analyses like the ones described in this article are only rendered possible through
the use of software tools such as OSM-Psychology. Without the use of such tools to
extract corresponding data, advanced programming skills would be required to create the
data structures commonly used in the social sciences. Furthermore, OSM-Psychology
offers several filtering mechanisms to limit the extracted data to geographical areas,
ranges of OpenStreetMap element types, time spans, and variables of interest. This helps
researchers to select exactly those elements that are relevant for further analyses. A
major benefit of OpenStreetMap is that the data do not consist of unstructured text as in
Wikipedia (Cress et al., 2016) or social-media networks (Liu et al., 2015; Wang et al.,
2016), but rather contain numerical information for geographical properties as well as
semantic tags in the form of key–value pairs, which facilitates data preparation. By making
OpenStreetMap data more accessible to psychology and the social sciences,
OSM-Psychology offers researchers the possibility to test behavioral hypotheses using
large-scale behavioral data obtained in a natural, ecologically valid environment rather
than in artificial laboratory settings. Thereby, OSM-Psychology broadens the range of
available sources of big data for psychology and the social sciences.
5.1 Application of OpenStreetMap data to psychological research questions
OpenStreetMap data as provided by OSM-Psychology offer the possibility to
study collaborative behavior in online communities by analyzing contribution processes
(Cress et al., 2016). The change history including the modifications of all OpenStreetMap
elements can be traced back to its creation by using the contribution view for data
extraction. This comprehensive information on the processes of adding and modifying
elements can help to gain insights on how collaboration in online communities works and
how information on OpenStreetMap develops (Mayer & Heck, 2021).
OpenStreetMap data may be used to investigate research questions on semantic
networks (Nosofsky, 2011; Stein et al., 2015). The structure of semantic tags in
OpenStreetMap is very detailed and contains categories and subcategories that developed
naturally over time. Hence, theories on categorization may profit from analyzing the tag
structure in OpenStreetMap with respect to the development as well as the structure
itself. The contribution view allows users to extract the history of all contributions while
focusing on the changes of semantic tags over time. Furthermore, tagging can provide
insights into the development of norms and consensus. The OpenStreetMap Wiki provides
an overview of all available tags and how they are used. Similar to the OpenStreetMap
data base itself, these rules have emerged over time, which can be traced in the history of
the Wiki articles. It can thus be analyzed how norms are established in the
OpenStreetMap Wiki, how they are applied during the contribution process, and
how these processes interact with each other. This offers the possibility to apply theories
of norm construction (e.g., Emmerich et al., 1971) to real-life data.
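Tracing how the semantic tags of an element develop over time amounts to comparing the key–value pairs of successive versions. The Python sketch below illustrates this for a single element. The version history, the place name, and the function names are hypothetical and do not correspond to the output format of OSM-Psychology.

```python
def diff_tags(old, new):
    """Compare two tag dictionaries and report added, removed, and changed keys."""
    added = {key: new[key] for key in new.keys() - old.keys()}
    removed = {key: old[key] for key in old.keys() - new.keys()}
    changed = {
        key: (old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }
    return {"added": added, "removed": removed, "changed": changed}


def tag_history(versions):
    """Return the tag differences between each pair of successive versions."""
    return [diff_tags(a, b) for a, b in zip(versions, versions[1:])]


# Hypothetical history of one element: a name is added, then the category is revised.
versions = [
    {"amenity": "cafe"},
    {"amenity": "cafe", "name": "Example Cafe"},
    {"amenity": "restaurant", "name": "Example Cafe"},
]
history = tag_history(versions)
```

Aggregating such differences across many elements would show, for instance, how often contributors refine an existing category rather than add new information.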
As proposed by Mayer et al. (2020), OpenStreetMap data and the corresponding
OpenStreetMap Wiki may also be used to examine shared mental models among
contributors (Johnson-Laird, 1980, 2005). Shared mental models concern
the contribution process itself, comprising ideas of how to add a structure in a
geometrically correct way and tag it appropriately. OpenStreetMap data give
information about which steps are performed by contributors when creating and editing
elements, and how these steps may have changed over time. Additionally, the
OpenStreetMap Wiki provides insights about the editing rules and how they have
changed over time. This situation makes it possible to obtain a comprehensive view on
the development and application of shared mental models through analyzing formal
editing rules as described in the OpenStreetMap Wiki, the editing process, and the
contributions themselves.
5.2 Limitations
Even though potentially useful for investigating research questions,
OSM-Psychology has some limitations. First, it is not possible to extract data in a
contributor-centered way. This means that the subset of data to be extracted can only be
defined by timestamp, area, and element type. For a comprehensive analysis of
contributor behavior, it is not possible to extract all changes made by specific
contributors or by a subset of contributors who meet certain criteria without further
effort. When the contributor behavior is only traced and analyzed within a predefined
geographical area, not all changes of a contributor may be considered.
Second, and related to the first limitation, it is not possible in OSM-Psychology to obtain
information about characteristics of the contributors since only the unique contributor
identifier (i.e., the user name shown on the website) is available. This may limit
predictions on future contribution behavior, for instance, which contributors can be
expected to further contribute to OpenStreetMap and which contributors are likely to
withdraw. As a remedy, it is possible to collect information about OpenStreetMap
contributors within a survey (e.g., Budhathoki & Haythornthwaite, 2013) and, with the
consent of the participants, to connect these data to individual contributions.
Third, the data obtained through OpenStreetMap are not as controlled as data
collected in a laboratory setting and thus comprise more random noise. Nevertheless,
OpenStreetMap has a high data quality, as has already been outlined above (e.g., Helbich
et al., 2012; Zielstra & Zipf, 2010). Furthermore, even if a contributor makes erroneous
changes, other contributors can and often will correct these errors over time to align the
individual with the collaborative effort. Thus, erroneous entries may be less concerning in
light of the high data quality and rather be intermediate steps in the correction process.
6 Conclusion
The use of big data in psychology and the social sciences has been growing
steadily. However, some sources of big data are still hardly accessible and require
knowledge and skills in various programming languages. The software OSM-Psychology
offers easy access to data generated by contributors in OpenStreetMap. Thereby, it
provides researchers with access to a novel and rich source of behavioral data. Compared
to unstructured text data obtained from Wikipedia or social media, OpenStreetMap
offers a relatively simple data structure containing both numeric geographical
information and semantic tags, thus facilitating statistical analyses. These data can be
useful to address research questions on various topics such as collaborative behavior,
development of norms, mental models, and semantic networks.
References
Adjerid, I., & Kelley, K. (2018). Big data in psychology: A framework for research
advancement. American Psychologist, 73(7), 899.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects
models using lme4. Journal of Statistical Software, 67, 1–48.
Bégin, D., Devillers, R., & Roche, S. (2017). Contributors’ withdrawal from online
collaborative communities: The case of OpenStreetMap. ISPRS International
Journal of Geo-Information, 6(11), 340.
Budhathoki, N. R., & Haythornthwaite, C. (2013). Motivation for open collaboration:
Crowd and community models and the case of OpenStreetMap. American
Behavioral Scientist, 57, 548–575.
Chen, H. (2011). Entwicklung von Verfahren zur Beurteilung und Verbesserung der Qualität
von Navigationsdaten (Doctoral dissertation). Universität Stuttgart.
Ciepłuch, B., Jacob, R., Mooney, P., & Winstanley, A. C. (2010). Comparison of the
accuracy of OpenStreetMap for Ireland with Google Maps and Bing Maps.
Proceedings of the Ninth International Symposium on Spatial Accuracy Assessment
in Natural Resources and Environmental Sciences 20-23rd July 2010, 337–340.
Clauson, K. A., Polen, H. H., Boulos, M. N. K., & Dzenowagis, J. H. (2008). Scope,
completeness, and accuracy of drug information in Wikipedia. Annals of
Pharmacotherapy, 42(12), 1814–1821.
Corcoran, P., Mooney, P., & Bertolotto, M. (2013). Analysing the growth of
OpenStreetMap networks. Spatial Statistics, 3, 21–32.
Cress, U., Feinkohl, I., Jirschitzka, J., & Kimmerle, J. (2016). Mass collaboration as
coevolution of cognitive and social systems. In U. Cress, J. Moskaliuk, & H. Jeong
(Eds.), Mass collaboration and education (pp. 85–104). Springer International
El-Ashmawy, K. L. A. (2016). Testing the positional accuracy of OpenStreetMap data for
mapping applications. Geodesy and Cartography, 42(1), 25–30.
Emmerich, W., Goldman, K. S., & Shore, R. E. (1971). Differentiation and development
of social norms. Journal of Personality and Social Psychology, 18, 323–353.
Giles, J. (2005). Internet encyclopaedias go head to head [Nature].
Haklay, M. (2010). How good is volunteered geographical information? A comparative
study of OpenStreetMap and Ordnance Survey datasets. Environment and
Planning B: Planning and Design, 37(4), 682–703.
Helbich, M., Amelunxen, C., Neis, P., & Zipf, A. (2012). Comparative spatial analysis of
positional accuracy of OpenStreetMap and proprietary geodata. In T. Jekel,
A. Car, J. Strobl, & G. Griesebner (Eds.), GI_forum 2012: Geovizualisation,
society and learning. (p. 10). Herbert Wichmann Verlag, VDE VERLAG GMBH.
Johnson-Laird, P. N. (2005). Mental models and thought. In K. J. Holyoak &
R. G. Morrison (Eds.), The cambridge handbook of thinking and reasoning.
thinking and reasoning: A reader’s guide (pp. 185–208). Wiley.
Johnson-Laird, P. (1980). Mental models in cognitive science. Cognitive Science,4(1),
Kahle, D., & Wickham, H. (2013). Ggmap: Spatial visualization with ggplot2. The R
Journal,5(1), 144–161.
Kern, M. L., Park, G., Eichstaedt, J. C., Schwartz, H. A., Sap, M., Smith, L. K., &
Ungar, L. H. (2016). Gaining insights from social media language: Methodologies
and challenges. Psychological methods,21 (4), 507–525.
Kosinski, M., Matz, S. C., Gosling, S. D., Popov, V., & Stillwell, D. (2015). Facebook as
a research tool for the social sciences: Opportunities, challenges, ethical
considerations, and practical guidelines. American Psychologist,70, 543.
Kräenbring, J., Monzon Penza, T., Gutmann, J., Muehlich, S., Zolk, O., Wojnowski, L.,
. . . Sarikas, A. (2014). Accuracy and completeness of drug information in
wikipedia: A comparison with standard textbooks of pharmacology. PLoS ONE,
Krawczyk, B. (2016). Learning from imbalanced data: Open challenges and future
directions. Progress in Artificial Intelligence,5, 221–232.
Leithner, A., Maurer-Ertl, W., Glehr, M., Friesenbichler, J., Leithner, K., &
Windhager, R. (2010). Wikipedia and osteosarcoma: A trustworthy patients’
information? Journal of the American Medical Informatics Association: JAMIA,
17(4), 373–374.
Liu, P., Tov, W., Kosinski, M., Stillwell, D. J., & Qiu, L. (2015). Do Facebook status
updates reflect subjective well-being? Cyberpsychology, Behavior, and Social
Networking, 18(7), 373–379.
Liu, X., & Long, Y. (2016). Automated identification and characterization of parcels with
OpenStreetMap and points of interest. Environment and Planning B: Planning
and Design, 43, 341–360.
Ludwig, I., Voss, A., & Krause-Traudes, M. (2011). A comparison of the street networks
of Navteq and OSM in Germany. In S. Geertman, W. Reinhardt, & F. Toppen
(Eds.), Advancing geoinformation science for a changing world (pp. 65–84).
Luhmann, M. (2017). Using big data to study subjective well-being. Current Opinion in
Behavioral Sciences, 18, 28–33.
Mayer, M., Heck, D., & Mocnik, F.-B. (2020). Shared mental models as a psychological
explanation for converging mental representations of place – the example of
OpenStreetMap. Zenodo.
Mayer, M., & Heck, D. W. (2021). Sequential collaboration: Comparing the accuracy of
dependent, incremental judgments to wisdom of crowds. PsyArXiv.
Mocnik, F.-B., Ludwig, C., Grinberger, A. Y., Jacobs, C., Klonner, C., & Raifer, M.
(2019). Shared data sources in the geographical domain—a classification schema
and corresponding visualization techniques. ISPRS International Journal of
Geo-Information, 8(5), 242.
Mocnik, F.-B., Mobasheri, A., Griesbaum, L., Eckle, M., Jacobs, C., & Klonner, C.
(2018). A grounding-based ontology of data quality measures. Journal of Spatial
Information Science, 16, 1–25.
Mocnik, F.-B., Zipf, A., & Raifer, M. (2017). The OpenStreetMap folksonomy and its
evolution. Geo-spatial Information Science, 20(3), 219–230.
Nosofsky, R. M. (2011). The generalized context model: An exemplar model of
classification. In E. M. Pothos & A. J. Wills (Eds.), Formal approaches in
categorization (pp. 18–39). Cambridge University Press.
OpenStreetMap Contributors. (2020). OpenStreetMap.
Pachur, T., Schooler, L. J., & Stevens, J. R. (2012). When will we meet again?
Regularities of social connectivity and their reflections in memory and decision
making. In R. Hertwig, U. Hoffrage, & the ABC Research Group (Eds.), Simple
heuristics in a social world (pp. 199–224). Oxford University Press.
Pachur, T., Schooler, L. J., & Stevens, J. R. (2014). We’ll meet again: Revealing
distributional and temporal patterns of social contact. PLoS ONE, 9(1), e86081.
Raifer, M., Troilo, R., Kowatsch, F., Auer, M., Loos, L., Marx, S., . . . Zipf, A. (2019).
OSHDB: A framework for spatio-temporal analysis of OpenStreetMap history
data. Open Geospatial Data, Software and Standards, 4(1), 3.
Rajagopalan, M. S., Khanna, V. K., Leiter, Y., Stott, M., Showalter, T. N., Dicker, A. P.,
& Lawrence, Y. R. (2011). Patient-oriented cancer information on the internet: A
comparison of Wikipedia and a professionally maintained database. Journal of
Oncology Practice, 7(5), 319–323.
Rentfrow, P. J., Gosling, S. D., Jokela, M., Stillwell, D. J., Kosinski, M., & Potter, J.
(2013). Divided we stand: Three psychological regions of the United States and
their political, economic, social, and health correlates. Journal of Personality and
Social Psychology, 105, 996–1012.
Rentfrow, P. J., Gosling, S. D., & Potter, J. (2008). A theory of the emergence,
persistence, and expression of geographic variation in psychological characteristics.
Perspectives on Psychological Science, 3, 339–369.
Stein, K., Kremer, D., & Schlieder, C. (2015). Spatial collaboration networks of
OpenStreetMap. In J. J. Arsanjani, A. Zipf, P. Mooney, & M. Helbich (Eds.),
OpenStreetMap in GIScience: Experiences, research, and applications
(pp. 167–186). Springer.
Wang, W., Hernandez, I., Newman, D. A., He, J., & Bian, J. (2016). Twitter analysis:
Studying US weekly trends in work stress and emotion. Applied Psychology, 65(2),
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood
estimation of semiparametric generalized linear models. Journal of the Royal
Statistical Society: Series B (Statistical Methodology), 73(1), 3–36.
Wood, S. (2017). Generalized additive models: An introduction with R (2nd ed.).
Chapman and Hall/CRC.
Zeileis, A., & Kleiber, C. (2018). countreg: Count data regression.
Zhang, H., & Malczewski, J. (2017). Accuracy evaluation of the Canadian OpenStreetMap
road networks. International Journal of Geospatial and Environmental Research,
Zheng, S., & Zheng, J. (2014). Assessing the completeness and positional accuracy of
OpenStreetMap in China. In T. Bandrova, M. Konecny, & S. Zlatanova (Eds.),
Thematic cartography for the society (pp. 171–189). Springer International
Zielstra, D., & Zipf, A. (2010). Quantitative studies on the data quality of
OpenStreetMap in Germany, 8.