
The intrinsic quality assessment of building footprints data on OpenStreetMap in Baden-Württemberg



In this work, we propose a framework to assess the quality of OSM building footprints data without using any reference data. More specifically, the OSM history data are examined with regard to the development of attributes, geometries and positions of building footprints. In total, seven quality indicators are defined for the intrinsic quality assessment. For our case study in the federal state of Baden-Württemberg (BW), Germany, a PostgreSQL database is established based on a spatiotemporal data model which can track both individual objects and editing events on OSM. The preliminary experiments show that the quality of building footprints in BW is relatively high, and that the quality in terms of semantics, geometries and positions has been improving over time thanks to the considerable contribution of OSM volunteers.
Hongchao Fan, Anran Yang, and Alexander Zipf
1 Introduction
OpenStreetMap (OSM) is considered one of the most successful and popular VGI
(volunteered geographic information) projects, and it has attracted significant and
sustained interest from academia, industry, and governmental agencies. Currently,
there are almost three million registered members (OSM, 2016), which has led OSM to
grow rapidly. With this rapid development in recent years, sparked in particular by
the availability of high-resolution imagery from Bing since 2010, the amount of
building information in OSM has increased, showing that volunteers do not only
contribute roads or points of interest (POIs) to the database. According to the
latest statistics (the values are derived from our internal OSM database, which is
updated daily), the number of buildings in OSM is above 200 million, of which
18.4 million building footprints are in Germany. The research of Fan et al. (2014)
demonstrated that the building footprints data on OSM have a high degree of
completeness and semantic accuracy. There is an offset of about four meters on
average in terms of position accuracy. With respect to shape, OSM building
footprints have a high similarity to ATKIS (the German authority data) footprint
data. Moreover, there is more and more information about building heights and roof
structures, which is required for 3D reconstruction.
Because the data are collected through crowdsourcing, the quality of OSM data is
often questioned. In 2013, Kunze and her colleagues applied several methods
to assess the completeness of the building information in OSM in comparison to an
administrative dataset for two federal states in Germany (Kunze et al. 2013; Hecht
et al. 2013). As the criterion of quality assessment, the work mainly analyzed the
area difference of groups of buildings within hexagonal or square cells instead of
the correspondence between individual buildings. Fan et al. (2014) addressed the
OSM building completeness in Munich, Germany. The authors compared OSM building
footprints data with ATKIS data in terms of completeness, semantic accuracy,
position accuracy, and shape accuracy. Klonner et al. (2015) also addressed
building completeness and conducted a data quality analysis of building footprints
in Bregenz, Austria. Most recently, authority datasets have been used to evaluate
the quality of OSM in terms of completeness and thematic accuracy (Törnros et al.
2015; Dorn et al. 2015).
The abovementioned methods of quality assessment rely on access to
reference datasets, which unfortunately are in many cases not available due to
conflicting licensing restrictions or high procurement costs. Therefore, intrinsic
quality assessment has been introduced in recent years. Many existing
approaches examine OSM data by checking the change history of features. For
example, Keßler and de Groot (2013) evaluated feature-level attributes such as the
number of versions, the stability against changes, and the corrections and rollbacks
of features in order to infer the quality of OSM features. Barron et al. (2014)
developed a comprehensive analysis framework, called iOSMAnalyzer, for investigating
the intrinsic data quality of OSM based on its mapping history. In their work, a broad
range of more than 25 different methods and indicators were presented to evaluate
the quality of an OSM dataset.
This work is dedicated to the intrinsic quality assessment of building footprints data
on OSM. First, a conceptual framework is developed for the intrinsic
quality assessment of OSM building footprints data. Second, a conceptual framework
for the effective analysis of OSM history data is presented, which is needed to
carry out the intrinsic quality assessment of OSM data. Finally, the preliminary
results of the intrinsic quality assessment in Baden-Württemberg, Germany, are
demonstrated.
2 The conceptual framework of intrinsic quality
assessment for OSM building footprints data
In total, seven indicators are defined for the intrinsic quality assessment of building
footprints data on OSM:
In terms of completeness:
(i) the development of built-up area over time,
(ii) the development of building count over time,
(iii) the development of positional accuracy attributes over time,
(iv) the average vertex displacement of a building footprint when edited by
OSM contributors,
and in terms of shape accuracy:
(v) the orthogonality of building footprints,
(vi) the parallelism of building footprint edges to the nearby line segments
of roads, and
(vii) the fragmentariness of patterns formed by building footprints.
While the first three indicators and the fifth one are straightforward, the fourth,
sixth and seventh indicators are elaborated in the following to describe how they
are calculated.
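Although indicator (v) is treated as self-explanatory here, a small sketch may help to fix ideas. The text does not give a formula for orthogonality, so the definition below (the share of polygon corners whose angle lies within a tolerance of 90 degrees) is an assumption made for illustration only. The function names and the 5 degree tolerance are hypothetical, and planar (projected) coordinates are assumed.

```python
import math

def corner_angles(ring):
    """Angle in degrees (0..180) between the two edges meeting at each vertex.

    `ring` is a list of (x, y) tuples without a repeated closing vertex.
    """
    n = len(ring)
    angles = []
    for i in range(n):
        px, py = ring[i - 1]          # previous vertex (wraps around)
        cx, cy = ring[i]              # current vertex
        nx, ny = ring[(i + 1) % n]    # next vertex (wraps around)
        ax, ay = px - cx, py - cy
        bx, by = nx - cx, ny - cy
        dot = ax * bx + ay * by
        cross = ax * by - ay * bx
        angles.append(math.degrees(math.atan2(abs(cross), dot)))
    return angles

def orthogonality(ring, tolerance_deg=5.0):
    """Share of corners that are right angles within the (assumed) tolerance."""
    angles = corner_angles(ring)
    right = sum(1 for a in angles if abs(a - 90.0) <= tolerance_deg)
    return right / len(angles)
```

Under this definition a rectangle or an L-shaped footprint scores 1.0, while a footprint digitized with skewed corners scores lower.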
(1) Calculation of the average displacement of a vertex on a building footprint
polygon after several editing processes on OSM
It is assumed that a vertex $v$ of a building footprint polygon is (re-)edited
many times ($v_1, v_2, \dots, v_n$) by different contributors on OSM.
Due to a non-rectangular shape at the vertex and other reasons
such as shadow or occlusion by vegetation, the contributors might have
difficulty seeing the exact vertex on the aerial or satellite images. For this
reason, the exact position might be estimated and corrected differently by
different OSM contributors. As a result, there are many records of
the position of this vertex. In order to estimate its most likely location,
a grid-based accumulation space is generated first. Then the number of
recorded positions falling within each accumulation cell is counted. The cell with
the largest count is taken as the most likely location of the vertex. The average
displacement of the recorded positions from this location is then used as the
measure of positional accuracy.
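The accumulation step can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the cell size of 0.5 m, the use of the cell centre as the most likely location, and the function names are all assumptions, and the vertex positions are assumed to be given in a projected coordinate system in metres.

```python
import math
from collections import Counter

def most_likely_location(positions, cell_size=0.5):
    """Centre of the grid cell that contains the most recorded positions."""
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size))
        for x, y in positions
    )
    (col, row), _ = counts.most_common(1)[0]
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)

def average_displacement(positions, cell_size=0.5):
    """Mean Euclidean distance of all recorded positions to that location."""
    cx, cy = most_likely_location(positions, cell_size)
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / len(positions)
```

A smaller cell size localizes the vertex more precisely but needs more edits per vertex before one cell clearly dominates.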
(2) Calculation of the parallelism of building footprint edges
The right image of fig.1 shows some buildings in Heidelberg, Germany.
Normally, the edges of buildings immediately adjacent to a street are parallel to
that street. In most cases, building footprints are decomposed into long
line segments, while the nearby street also consists of line segments. In
these cases, the parallelism is easy to calculate. However, there
are some cases in which one of the edges of the building footprint is curved
and follows the nearby street, as demonstrated in the left image of fig.1. In
this case, the edge of the building footprint is denoted as $L_b$, while the nearby
street is represented as $L_s$.
In the first step, $L_b$ is converted into a sequence of points $(p_1, p_2, \dots, p_n)$
with small and equal intervals. For each point $p_i$, its foot point $f_i$, i.e. the
perpendicular projection of $p_i$ onto $L_s$, is calculated. If $f_i$ is located on
$L_s$, the distance $d_i$ from the point $p_i$ to $L_s$ is calculated as the
Euclidean distance between $p_i$ and $f_i$, namely $d_i = \lVert p_i - f_i \rVert$.
It is assumed that there are $m$ points $(p_1, p_2, \dots, p_m)$ ($m \le n$) on $L_b$
that have foot points $(f_1, f_2, \dots, f_m)$ on $L_s$. The distance between $L_b$
and $L_s$ is then the RMS (root mean square) of $(d_1, d_2, \dots, d_m)$. The RMSE
(root mean square error) of these distances is then used to evaluate the parallelism
of $L_b$ and $L_s$. In this work, the interval between two consecutive points is set
at 0.3 m, which is sufficiently small compared with a line segment of a road in the
physical world, so that the RMSE can be used to evaluate parallelism.
Fig.1 The parallelism of edges of building footprints to the road network (the image
on the right is taken from Google Maps)
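The procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: both lines are given as lists of planar (x, y) coordinates in metres, the RMSE is interpreted here as the root mean square deviation of the distances from their mean (the text does not spell out the formula), and all function names are hypothetical.

```python
import math

def resample(polyline, step=0.3):
    """Points spaced `step` apart along the polyline, starting at its first vertex."""
    points = [polyline[0]]
    carried = 0.0  # length travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0.0:
            continue
        d = step - carried
        while d <= seg:
            t = d / seg
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        carried = seg - (d - step)
    return points

def foot_distance(p, road):
    """Shortest perpendicular distance from p to the road, or None if no
    foot point falls on any road segment."""
    px, py = p
    best = None
    for (x0, y0), (x1, y1) in zip(road, road[1:]):
        dx, dy = x1 - x0, y1 - y0
        length_sq = dx * dx + dy * dy
        if length_sq == 0.0:
            continue
        t = ((px - x0) * dx + (py - y0) * dy) / length_sq
        if 0.0 <= t <= 1.0:  # the foot point lies on this segment
            d = math.hypot(px - (x0 + t * dx), py - (y0 + t * dy))
            if best is None or d < best:
                best = d
    return best

def parallelism(edge, road, step=0.3):
    """Return (rms_distance, rmse) for a building edge and a road, or None."""
    candidates = (foot_distance(p, road) for p in resample(edge, step))
    distances = [d for d in candidates if d is not None]
    if not distances:
        return None
    m = len(distances)
    rms = math.sqrt(sum(d * d for d in distances) / m)
    mean = sum(distances) / m
    rmse = math.sqrt(sum((d - mean) ** 2 for d in distances) / m)  # assumed definition
    return rms, rmse
```

With this reading, an edge that runs at a constant offset from the road yields an RMSE close to zero regardless of how far away it is, whereas an edge that converges towards or diverges from the road yields a larger value.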
(3) Calculation of the fragmentariness of patterns formed by building footprints
Buildings with similar shapes and sizes in the same area can form patterns
if they are distributed regularly. This kind of knowledge can be used for the
intrinsic quality assessment of OSM building footprints. Firstly, the building
footprints in a local area are compared in terms of shape and size,
whereby a turning function is applied to measure the similarity. Secondly, the
centroids of similar building footprints are used to estimate the distribution
of the buildings by using regression or partition regression models. In the
next step, it is checked whether there are buildings that intersect with the
pattern but are not considered similar to the buildings in the pattern. According to
the abovementioned hypothesis, such a building should share the shape and size of
the buildings in the pattern. A pseudo building footprint with the same
shape and size is therefore generated at its position, and its orientation is
computed by interpolation along the pattern regression. Finally, the
positional accuracy and shape accuracy can be calculated by comparing
the OSM building footprint at that location with the pseudo building footprint.
It is also possible to obtain the deviation in orientation.
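The regression and interpolation steps can be illustrated with a deliberately simplified sketch. It assumes that the group of similar buildings has already been identified (the turning-function comparison is not shown), that the pattern is a single straight row fitted by a total-least-squares line rather than a partition regression model, and that orientations are plain angles in degrees without wrap-around handling. All names are hypothetical.

```python
import math

def fit_line(points):
    """Total-least-squares line through centroids: (origin, unit direction)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of the principal axis
    return (mx, my), (math.cos(theta), math.sin(theta))

def offset_from_pattern(point, origin, direction):
    """(along, across): position along the line and signed perpendicular offset."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    ux, uy = direction
    return dx * ux + dy * uy, ux * dy - uy * dx

def interpolate_orientation(along, members):
    """Linear interpolation of orientation at `along` from (along_i, angle_i) pairs."""
    members = sorted(members)
    if along <= members[0][0]:
        return members[0][1]
    if along >= members[-1][0]:
        return members[-1][1]
    for (a0, o0), (a1, o1) in zip(members, members[1:]):
        if a0 <= along <= a1:
            t = (along - a0) / (a1 - a0) if a1 > a0 else 0.0
            return o0 + t * (o1 - o0)
```

In this sketch the `across` value of a suspicious building's centroid plays the role of its positional deviation from the pattern, and the difference between its own orientation and the interpolated one plays the role of the deviation in orientation.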
3 The Spatiotemporal data model for OSM history data
In order to conduct the intrinsic quality assessment of OSM data, the full history of
OSM data has to be made available. The original format of the OSM history is very
terse and capable of expressing rich information about the entities, their
relationships, and their temporal changes. The geometry-oriented models used by
editing tools cannot provide an equivalent representation of the data. In this work,
we developed a spatiotemporal data model not only for assessing building
footprints data, but also for a comprehensive investigation of OSM
features in terms of data quality and user behaviour.
We first discuss the pure temporal model of OSM history, namely, entity and event.
In the so-called Interval Based Model, an entity is homogeneous across the interval
but an event is not; that is, an entity over an interval $[t_1, t_2]$ and the entity
over a longer interval $[t_1, t_3]$ is the same entity, which is not the case for an
event. For example, a building in 2010-2011 will be the same building as in
2010-2012, but the improvement of a building from 2010 to 2011 is only part of the
improvement from 2010 to 2012. In the Instant Based Model, the entity and the event
are not so distinguishable in theory, but usually we can recognize them from the
context. For example, "nodes added" is clearly an event, while "road 111 at
2016-01-01 00:00" is an entity.
We propose a model including four types, which are entities over intervals, events
over intervals, entities at instants, and events at instants. The objects over
intervals have two essential properties, enter_time and exit_time, while objects at
instants have one property named timespot. Granted that there is a function
  calculates the value of the entity at time . The semantic of  is trivial
in the instant based view, but in the interval based view,  is inconclusive unless
 ,  . For convenience, we further define the predicate
 in the interval view as:
       
The definition of  clearly suggests that it depends on how  is
understood, which brings the confusion about versions of ways and relations. If 
is defined as the tags and referred node/member entities, then the result is exactly
the same versions marked in the original data. However, if ways and relations are
regarded as geometries with tags, many more versions result. Suppose that
a "version" of a highway spans from January 1st to February 3rd, and that one of
its inner nodes N changes its position on January 29th. The "version" is in fact not a
version since           . Instead,  in such
case can be defined as:
     
The five fundamental types in OSM history are defined as shown in figure 2. We
preserve most of the original structures with some differences to make the whole
model more consistent and convenient.
Fig. 2 Five major types corresponding to the original data.
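To make the four temporal types and the version discussion more concrete, the following sketch models them as plain records and splits a tag-level way version wherever one of its nodes changes. It is only one possible reading of the model: the class and field names other than enter_time, exit_time and timespot are hypothetical, intervals are assumed to be half-open (enter_time inclusive, exit_time exclusive), and the year in the usage example is chosen arbitrarily.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any

@dataclass(frozen=True)
class EntityOverInterval:
    entity_id: int
    enter_time: datetime
    exit_time: datetime
    value: Any = None

@dataclass(frozen=True)
class EventOverInterval:
    description: str
    enter_time: datetime
    exit_time: datetime

@dataclass(frozen=True)
class EntityAtInstant:
    entity_id: int
    timespot: datetime
    value: Any = None

@dataclass(frozen=True)
class EventAtInstant:
    description: str
    timespot: datetime

def split_by_changes(version, change_times):
    """Split a tag-level version into geometry-level versions.

    `change_times` are the instants at which a referenced node moved; only
    those strictly inside the version's interval create a new boundary.
    """
    cuts = sorted(
        t for t in set(change_times)
        if version.enter_time < t < version.exit_time
    )
    bounds = [version.enter_time, *cuts, version.exit_time]
    return [
        EntityOverInterval(version.entity_id, start, end, version.value)
        for start, end in zip(bounds, bounds[1:])
    ]
```

For the highway example in the text, a version spanning January 1st to February 3rd with a node move on January 29th would be split into two geometry-level versions:

```python
way = EntityOverInterval(111, datetime(2016, 1, 1), datetime(2016, 2, 3), "highway")
parts = split_by_changes(way, [datetime(2016, 1, 29)])
```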
4 Preliminary results in Baden-Württemberg
At the current stage, a PostgreSQL database has been established using the
spatiotemporal model introduced in section 3. We use the building footprints data in
Baden-Württemberg (BW) to test the proposed intrinsic quality assessment.
Fig.3 The development of parameters of building footprints in BW
As shown in figure 3, the total area (fig.3a) of building footprints in BW tends
to converge, while the number (fig.3b) of building footprints still seems to rise
with a steep slope. Fig.3c shows that the number of buildings tagged with attributes
is still increasing rapidly, and fig.3d shows the development of the percentage of
buildings with a rectangular shape. The valley in 2012 is followed by an increase in
the percentage of rectangular building footprints. This reflects the fact that
buildings were mapped in blocks as rectangles at the early stage of OSM
development. Then OSM contributors started mapping buildings with complex
shapes. From 2012 onwards, the polygons representing groups of buildings have been
refined, so that individual buildings are digitized. This development means that
both the semantic and the geometric accuracy of building data on OSM have improved
in recent years.
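Curves such as the building count and total area over time can be derived directly from interval-based records. The sketch below is an illustration only and does not reproduce the database queries used in this work: each record is assumed to be a tuple (enter_time, exit_time, area) for one geometry version of one building, with exit_time set to None while the version is still current, and the function names are hypothetical.

```python
from datetime import date

def snapshot(records, when):
    """(count, total_area) of the footprint versions that exist at `when`."""
    alive = [
        area
        for enter, exit_, area in records
        if enter <= when and (exit_ is None or when < exit_)
    ]
    return len(alive), sum(alive)

def development(records, sample_dates):
    """Time series of (date, count, total_area) for the given sample dates."""
    return [(d, *snapshot(records, d)) for d in sample_dates]
```

Because the versions of one building do not overlap in time, each building is counted at most once per sample date.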
Using the method presented in section 2, the parallelism of the edges of building
footprints to the nearby road line segments is calculated. Fig.5 shows that most of
the buildings have an edge which is more than 90 % parallel to the immediately
adjacent street/road. This is largely consistent with reality. In other words, the
positional accuracy of building footprints in BW seems to be very high.
Fig.5 The parallelism of building footprint edges to their nearby roads
5 Conclusion
In this paper, a framework is presented for the intrinsic quality assessment of OSM
building footprints data. In total, seven quality indicators are suggested for
quality measurement without any reference data. The main idea is, on the one hand,
to observe the historical development of OSM data. On the other hand, we
utilize knowledge about urban areas to define indicators of intrinsic quality,
because the urban area is a man-made environment in which almost everything is
constructed according to certain rules. For instance, buildings are mostly
constructed with rectangular shapes, so the rectangularity of building footprints
can be used as an indicator of the quality of building footprints. Preliminary
statistics on building footprints have been computed using the OSM history data in
Baden-Württemberg, Germany. The experimental results show that building footprints
in BW are mapped with relatively high accuracy and that the quality in terms of
semantics, geometries and positions is still improving over time.
6 References
Barron, C., Neis, P., & Zipf, A. (2014). A comprehensive framework for intrinsic
OpenStreetMap quality analysis. Transactions in GIS, 18(6), 877-895.
Dorn, H., Törnros, T., & Zipf, A. (2015). Quality Evaluation of VGI Using Authoritative
Data: A Comparison with Land Use Data in Southern Germany. ISPRS International
Journal of Geo-Information, 4(3), 1657-1671.
Fan, H., Zipf, A., Fu, Q., & Neis, P. (2014). Quality assessment for building footprints
data on OpenStreetMap. International Journal of Geographical Information Science,
28(4), 700-719.
Hecht, R., Kunze, C., & Hahmann, S. (2013). Measuring completeness of building
footprints in OpenStreetMap over space and time. ISPRS International Journal of
Geo-Information, 2(4), 1066-1091.
Keßler, C., & de Groot, R. T. A. (2013). Trust as a proxy measure for the quality of
volunteered geographic information in the case of OpenStreetMap. In Geographic
information science at the heart of Europe (pp. 21-37). Springer International
Klonner, C., Barron, C., Neis, P., & Höfle, B. (2015). Updating digital elevation models
via change detection and fusion of human and remote sensor data in urban
environments. International Journal of Digital Earth, 8(2), 153-171.
Kunze, C., Hecht, R., & Hahmann, S. (2013, August). Assessing the completeness of
building footprints in OpenStreetMap: an example from Germany. In 26th
International Cartographic conference (pp. 25-30).
OSM, 2016. . (last access on the 15.08.2016)
Törnros, T., Dorn, H., Hahmann, S., & Zipf, A. (2015). UNCERTAINTIES OF
Photogrammetry, Remote Sensing and Spatial Information Sciences, 1, 353-357.
... They found that while the OSM data in a region were almost complete, the increment of data in such a region was less than 3%. Fan [35] used "development of building count over time" which was also based on analyzing the historical data in OSM. Mobasheri et al. [36] analyzed the OSM sidewalk data by counting the number of road segments with/without a tag. ...
... OSM building count denotes the number of OSM buildings in a given region. Several studies [34,35] have proposed that the "development of building count over time" can be used for quality assessment of OSM building completeness. Theoretically, the OSM building count is positively correlated with the completeness of OSM building data in a region, although the former cannot specifically indicate a completeness value. ...
Full-text available
OpenStreetMap (OSM) is a free map that can be created, edited, and updated by volunteers globally. The quality of OSM datasets is therefore of great concern. Extensive studies have focused on assessing the completeness (a quality measure) of OSM datasets in various countries, but very few have been paid attention to investigating the OSM building dataset in China. This study aims to present an analysis of the evolution, completeness and spatial patterns of OSM building data in China across the years 2012 to 2017. This is done using two quality indicators, OSM building count and OSM building density, although a corresponding reference dataset for the whole country is not freely available. Development of OSM building counts from 2012 to 2017 is analyzed in terms of provincial- and prefecture-level divisions. Factors that may affect the development of OSM building data in China are also analyzed. A 1 × 1 km2 regular grid is overlapped onto urban areas of each prefecture-level division, and the OSM building density of each grid cell is calculated. Spatial distributions of high-density grid cells for prefecture-level divisions are analyzed. Results show that: (1) the OSM building count increases by almost 20 times from 2012 to 2017, and in most cases, economic (gross domestic product) and OSM road length are two factors that may influence the development of OSM building data in China; (2) most grid cells in urban areas do not have any building data, but two typical patterns (dispersion and aggregation) of high-density grid cells are found among prefecture-level divisions.
... In earlier work, Barron, Neis, and Zipf (2014) also proposed a number of methods and proxy indicators that can be used to assess the quality of OSM products utilizing just data history, while Antoniou and Skopeliti (2015) summarized four such proxies (i.e., data, demographic, socioeconomic, and contributor) via a literature review. Applying the first of these proxies, Ciepłuch, Mooney, and Winstanley (2011) used the density of points within 5 km grid squares as the basis of a completeness assessment, while Fan, Yang, and Zipf (2016) proposed the use of seven indicators founded on historical observations inherent to the development of OSM data and utilized existing urban area knowledge to assess the completeness of buildings. The application of demographic and socioeconomic indicators has also revealed that OSM data completeness is affected by both population density (Zielstra & Zipf, 2010) and the income levels of contributors (Neis, Zielstra, & Zipf, 2013), while Camboim et al. (2015) have also shown that road completeness in urban areas is moderately well correlated with population density. ...
Full-text available
OpenStreetMap (OSM) is a free global map dataset that was created by volunteers around the world. This inevitably means that there are a number of quality issues with the final OSM product, however. Extensive research has therefore been carried out to assess the quality of this product by applying a range of measures versus reference datasets, but little effort to date has been focused on quantitative quality estimation without reference data-sets. The aim of this study is therefore to quantitatively estimate the completeness of street blocks in an OSM dataset. This was accomplished by initially exploring the relationship between geometric indicators (i.e., area, perimeter , and density) and street block completeness in an OSM road dataset, before these relationships were applied to quantitatively estimate completeness in other datasets. The results of this study show that: (1) street block completeness is positively correlated with density and negatively correlated with area and perimeter; and (2) in most cases, estimated completeness values for all street blocks within an OSM road dataset do not differ by more than 10% in absolute terms from actual completeness values. These results indicate that geometric indicators can be used as proxies to quantitatively estimate road completeness in OSM datasets. The link for this article is
... As an example, the authors in [18] introduced a comprehensive framework to assess the quality of OSM datasets. In [19], the authors investigated the quality of building footprints data in Baden-Württemberg in Germany extracted from OSM. Furthermore, [20] provided an analysis the fitness for using OSM database for routing and navigation and assessed its completeness regarding sidewalk information. ...
Full-text available
Energizing the future energy systems requires comprehensive understanding at the urban level. However, many uncertainties hinder such goals, leastwise the data and tools used in modeling urban energy systems (UES). Urban Energy Modeling including spatial parameters of urban forms and settlements can lead to more sustainable approaches in computational simulations. Moreover, scientific policy advice requires that models and data to be transparent and prepared in an unbiased manner. Hence, benchmarking the quality criteria of open source datasets like OpenStreetMap (OSM) data is needed. This contribution introduces UES modelling and its data requirements for different application sectors. Further, it extends and provides a first assessment of OSM data which is used in modeling energy systems in cities. This data was previously used in the context of a UES model developed by the authors. A main conclusion is that although many OSM features are missing and cannot be extracted from OSM, these datasets constitute a great source of open data which can contribute to a more sustainable and transparent modelling.
... A number of indicators have been proposed (Fan et al. 2016, Mobasheri et al. 2017, Senaratne et al. 2017. However, most have been used for qualitatively analyzing rather than quantitatively exploring the relationship between each indicator and the quality of OSM data, which may be one of the necessary steps in a quantitative quality estimation. ...
Full Text is now freely downloaded OpenStreetMap (OSM) is a free spatial data source based on crowd sourced data. Although the OSM data have a range of applications, such as generating 3D models, and routing and navigation, quality issues are still significant concerns when using the data. Several studies have undertaken quality assessments by comparing OSM data with reference data. However, reference data are not always available due to high costs or licensing restrictions, and very few studies have quantitatively estimated the quality of OSM data under conditions where the corresponding reference data are not available. This study proposed the use of a building density (or building coverage ratio) indicator as a proxy, and designed a series of experiments involving different study areas to quantitatively explore the relationship between building density and building completeness for OSM data in urban areas. The residuals (estimated building completeness and reference building completeness) were also analyzed. Two main results were found from the experiments. (1) There was an approximate linear relationship between building density and building completeness in the OSM data. More precisely, the building completeness of OSM data was approximately 3.4–4 times the building density of OSM data. (2) Approximately 70–80% of the absolute residuals were smaller than 10%, and 80–90% of them were smaller than 20%. This shows that, in most cases, estimated building completeness was close to the corresponding reference building completeness. Therefore, we concluded that the building density indicator is a potential proxy for the quantitative completeness estimation of OSM building data in urban areas. The limitations of using this indicator were also addressed.
Full-text available
The academic community frequently engages with OpenStreetMap (OSM) as a data source and research subject, acknowledging its complex and contextual nature. However, existing literature rarely considers the position of academic research in relation to the OSM community. In this paper we explore the extent and nature of engagement between the academic research community and the larger communities in OSM. An analysis of OSM-related publications from 2016 to 2019 and seven interviews conducted with members of one research group engaged in OSM-related research are described. The literature analysis seeks to uncover general engagement patterns while the interviews are used to identify possible causal structures explaining how these patterns may emerge within the context of a specific research group. Results indicate that academic papers generally show few signs of engagement and adopt data-oriented perspectives on the OSM project and product. The interviews expose that more complex perspectives and deeper engagement exist within the research group to which the interviewees belong, e.g., engaging in OSM mapping and direct interactions based on specific points-of-contact in the OSM community. Several conclusions and recommendations emerge, most notably: that every engagement with OSM includes an interpretive act which must be acknowledged and that the academic community should act to triangulate its interpretation of the data and OSM community by diversifying their engagement. This could be achieved through channels such as more direct interactions and inviting members of the OSM community to participate in the design and evaluation of research projects and programmes.
Full-text available
The completeness of buildings in OpenStreetMap (OSM) is estimated for a medium-sized German city and its surroundings by comparing the OSM data with data from an official building cadastre. As completeness measures we apply two unit-based methods that are frequently applied in similar studies. It is found that the estimation of OSM building completeness strongly differ between the methods. A count ratio (number of OSM buildings / number of reference buildings) tends to underestimate the actual building completeness and an area ratio (total OSM building area / total reference building area) instead tends to overestimate the completeness within the study area. It is argued that a simple pre-processing of the building footprint polygons leads to a more accurate completeness estimation when applying the count ratio. It is also suggested to more carefully examine the areas that have been mapped in OSM but not in the reference data set (false positives). In the present study region, these values are mainly due to simplified OSM polygons and they contribute to an overestimation of the OSM building completeness when applying the area ratio.
Full-text available
Volunteered Geographic Information (VGI) such as data derived from the OpenStreetMap (OSM) project is a popular data source for freely available geographic data. Normally, untrained contributors gather these data. This fact is frequently a cause of concern regarding the quality and usability of such data. In this study, the quality of OSM land use and land cover (LULC) data is investigated for an area in southern Germany. Two spatial data quality elements, thematic accuracy and completeness are addressed by comparing the OSM data with an authoritative German reference dataset. The results show that the kappa value indicates a substantial agreement between the OSM and the authoritative dataset. Nonetheless, for our study region, there are clear variations between the LULC classes. Forest covers a large area and shows both a high OSM completeness (97.6%) and correctness (95.1%). In contrast, farmland also covers a large area, but for this class OSM shows a low completeness value (45.9%) due to unmapped areas. Additionally, the results indicate that a high population density, as present in urbanized areas, seems to denote a higher strength of agreement between OSM and the DLM (Digital Landscape Model). However, a low population density does not necessarily imply a low strength of agreement.
Full-text available
In the past two years, several applications for generating three-dimensional (3D) buildings from OpenStreetMap (OSM) have been made available, for instance, OSM-3D, OSM2World, OSM Building, etc. In these projects, 3D buildings are reconstructed using the buildings’ footprints and information about their attributes, which are documented as tags in OSM. Therefore, the quality of 3D buildings relies strongly on the quality of the building footprints data in OSM. This article is dedicated to a quality assessment of building footprints data in OSM for the German city of Munich, which is one of the most developed cities in OSM. The data are evaluated in terms of completeness, semantic accuracy, position accuracy, and shape accuracy by using building footprints in ATKIS (German Authority Topographic–Cartographic Information System) as reference data. The process contains three steps: finding correspondence between OSM and ATKIS data, calculating parameters of the four quality criteria, and statistical analysis. The results show that OSM footprint data in Munich have a high completeness and semantic accuracy. There is an offset of about four meters on average in terms of position accuracy. With respect to shape, OSM building footprints have a high similarity to those in ATKIS data. However, some architectural details are missing; hence, the OSM footprints can be regarded as a simplified version of those in ATKIS data.
Due to financial or administrative constraints, access to official spatial base data is currently limited to a small subset of all potential users in the field of spatial planning and research. This increases the usefulness of Volunteered Geographic Information (VGI), in particular OpenStreetMap (OSM), as supplementary datasets or, in some cases, alternative sources of primary data. In contrast to the OSM street network, which has already been thoroughly investigated and found to be practically complete in many areas, the degree of completeness of OSM data on buildings is still unclear. In this paper we describe methods to analyze building completeness and apply these to various test areas in Germany. Official data from national mapping and cadastral agencies is used as a basis for comparison. The results show that unit-based completeness measurements (e.g., total number or area of buildings) are highly sensitive to disparities in modeling between official data and VGI. Therefore, we recommend object-based methods to study the completeness of OSM building footprint data. An analysis from November 2011 in Germany indicated a completeness of 25% in the federal states of North Rhine-Westphalia and 15% in Saxony. Although further analyses from 2012 confirm that data completeness in Saxony has risen to 23%, the rate of new data input was slowing in the year 2012.
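The difference between unit-based and object-based completeness can be sketched with a toy example. Footprints are simplified here to axis-aligned rectangles, and the matching rule (a reference building counts as mapped when at least half of its area is covered by a single OSM footprint) is an assumption chosen for illustration, not the matching procedure used in the paper. All function names and coordinates are hypothetical.

```python
# Footprints as axis-aligned rectangles (xmin, ymin, xmax, ymax); real
# footprints are polygons, so this is a deliberate simplification.

def area(box):
    xmin, ymin, xmax, ymax = box
    return max(0, xmax - xmin) * max(0, ymax - ymin)


def overlap_area(a, b):
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0


def unit_based_completeness(osm, ref):
    """Ratios of total building count and total building area (OSM / reference)."""
    count_ratio = len(osm) / len(ref)
    area_ratio = sum(area(o) for o in osm) / sum(area(r) for r in ref)
    return count_ratio, area_ratio


def object_based_completeness(osm, ref, min_overlap=0.5):
    """Share of reference buildings matched by an OSM footprint.

    min_overlap is an assumed threshold on the covered share of the
    reference footprint.
    """
    matched = sum(
        1 for r in ref
        if any(overlap_area(r, o) / area(r) >= min_overlap for o in osm)
    )
    return matched / len(ref)


# The reference models a terrace as three separate buildings plus one detached
# house; OSM models the terrace as one polygon and lacks the detached house.
ref = [(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10), (50, 0, 60, 10)]
osm = [(0, 0, 30, 10)]

if __name__ == "__main__":
    print(unit_based_completeness(osm, ref))
    print(object_based_completeness(osm, ref))
```

In this toy case the count ratio is far lower than the object-based value, although three of the four reference buildings are in fact mapped, which is the kind of modeling disparity the abstract warns about.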
High availability and diversity make Volunteered Geographic Information (VGI) an interesting source of information for an increasing number of use cases. Varying quality, however, is a concern often raised when it comes to using VGI in professional applications. Recent research directs towards the estimation of VGI quality through the notion of trust as a proxy measure. In this chapter, we investigate which indicators influence trust, focusing on inherent properties that do not require any comparison with a ground truth dataset. The indicators are tested on a sample dataset extracted from OpenStreetMap. High numbers of contributors, versions and confirmations are considered as positive indicators, while corrections and revisions are treated as indicators that have a negative influence on the development of feature trustworthiness. In order to evaluate the trust measure, its results have been compared to the results of a quality measure obtained from a field survey. The quality measure is based on thematic accuracy, topological consistency, and information completeness. To address information completeness as a criterion of data quality, the importance of individual tags for a given feature type was determined based on a method adopted from information retrieval. The results of the comparison between trust assessments and quality measure show significant support for the hypothesis that feature-level VGI data quality can be assessed using a trust model based on data provenance.
OpenStreetMap (OSM) is one of the most popular examples of a Volunteered Geographic Information (VGI) project. In the past years it has become a serious alternative source for geodata. Since the quality of OSM data can vary strongly, different aspects have been investigated in several scientific studies. In most cases the data is compared with commercial or administrative datasets which, however, are not always accessible due to the lack of availability, contradictory licensing restrictions or high procurement costs. In this investigation a framework containing more than 25 methods and indicators is presented, allowing OSM quality assessments based solely on the data's history. Without the usage of a reference data set, approximate statements on OSM data quality are possible. For this purpose existing methods are taken up, developed further, and integrated into an extensible open source framework. This enables arbitrarily repeatable intrinsic OSM quality analyses for any part of the world.
Kunze, C., Hecht, R., & Hahmann, S. (2013, August). Assessing the completeness of building footprints in OpenStreetMap: an example from Germany. In 26th International Cartographic Conference (pp. 25-30).