Exploratory Completeness Analysis of Mapillary
for Selected Cities in Germany and Austria
Levente Juhasz and Hartwig Hochmair
University of Florida, Fort Lauderdale/USA · levente.juhasz@ufl.edu
Full paper double blind review
Abstract
Mapillary, founded in early 2014, is a Web 2.0 application that allows voluntary users to
contribute street level photographs from all over the world. This paper analyses the growth
in uploaded data and contributor numbers over time, and assesses data completeness by
comparing Mapillary with Google Street View and OpenStreetMap data for selected cities
in Germany and Austria. Results show that, as of now, Google Street View generally provides
better coverage in the cities in which it is offered, but that Mapillary has the potential
to reach higher coverage along off-road segments, such as footpaths and bicycle trails,
or even railroads.
1 Introduction
Street level photographs are an important data source for a variety of analysis tasks, including
the identification of road features (e.g., crosswalks, traffic signs) and the assessment of
wheelchair accessibility of sidewalks. Whereas Google Street View and Mapjack are commercial
products, Mapillary is the first platform offering a street level photograph service
based on crowd-sourced data. It therefore provides a unique addition to the currently available
data resources of Volunteered Geographic Information (VGI) (GOODCHILD 2007).
Mapillary is run by a company located in Malmö, Sweden, and began providing services at
the beginning of 2014. Imagery is available (at http://www.mapillary.com) to the public for
any kind of purpose under an open license, CC BY-SA 4.0.
This study provides an overview of the development of Mapillary data within its first year,
including the number of individual contributors and data growth. Mapillary data has already
been collected for all five continents, with Europe showing the most contributions. The
geographic focus of this study is the 40 largest cities in Germany, and the 4 largest cities in
Austria. Since half of the analysed German cities offer the Google Street View service,
which provides the same type of data as Mapillary, we can use these cities to analyse the
potential effect of the presence of Google Street View data on the completeness of Mapillary
data. Google Street View dates back to 2007 and is currently the most prominent provider
of street level photographs. It allows users to access 360° panorama images from
selected roads all over the world.
GI_Forum ‒ Journal for Geographic Information Science, 1-2015.
© Herbert Wichmann Verlag, VDE VERLAG GMBH, Berlin/Offenbach. ISBN 978-3-87907-558-4.
© ÖAW Verlag, Wien. ISSN 2308-1708, doi:10.1553/giscience2015s535.
Contributors to Mapillary can either upload photos through an application on a GPS-enabled
smartphone, or manually via the Mapillary Website as long as the photograph comes with
EXIF metadata containing geographic coordinates. Google Street View focuses on main
roads, since this facilitates fast data capture from cars, and therefore frequent data updates.
Google also developed solutions to capture Street View photos for off-road trails, including
a special bike, or even a backpack. Mapillary contributors typically use mobile phones to
upload data from wherever they are traveling. This frequently also includes off-road paths,
such as trails in parks.
Fig. 1(a) provides a screenshot of how a collected street level picture is shown on the
Mapillary Website. The lines visualize GPS tracks, along which pictures were taken, and
the icon with the arrow indicates the image location together with the orientation of the
camera. This picture was taken from a scenic viewpoint at an off-road location in Salzburg,
Austria.
2 Previous Work
Previous studies report on the use of Google Street View imagery as an effective method
for neighbourhood audit to eliminate in-person fieldwork (CLARKE et al. 2010, RUNDLE et
al. 2011, GRIEW et al. 2013, VANWOLLEGHEM et al. 2014). Google Street View imagery has
also been analysed using computer vision to determine the geographic position of other
photos, or to identify road features (ZAMIR & SHAH 2010, HARA et al. 2013). Previous
research has, however, not analysed the spatial coverage and completeness of Google Street
View. This will be addressed in our paper, at least for selected cities, through assessing the
overlap of Google Street View geometries with OpenStreetMap (OSM) road features. One
related study compared the completeness of Google Maps road feature data (not Google
Street View, though) with those from Bing Maps and OSM for five cities in Ireland, con-
cluding that none of these datasets were substantially better than another (CIPELUCH et al.
2010).
For part of the research that assesses the completeness of Mapillary and Google Street
View data in this study, OSM network data is used as a reference dataset. Although OSM is
not governed by an authoritative agency that guarantees certain quality standards, OSM was
found to provide good coverage for roads in analysed urban areas (GIRRES & TOUYA 2010,
HAKLAY 2010, ZIELSTRA & ZIPF 2010). OSM coverage of road segments for non-motorized
traffic (e.g., walking or cycling trails) was also shown to be of high quality. For example,
for seven cities in the US and Europe it was found that the length of off-road bicycle
trails mapped in OSM nearly doubled or grew even more between 2009 and 2013, except in
London (HOCHMAIR et al. 2015). Another study showed that, for selected cities in Germany,
the total length of mapped footpaths was higher in OSM than in the data of a proprietary
provider (Tele Atlas at the time, now TomTom) (ZIELSTRA & HOCHMAIR 2011). For example,
in Berlin OSM had 3.2 times as much pedestrian-related network data as Tele Atlas, and
for Munich this ratio was as high as 5.6. These numbers provide evidence that OSM is a
robust reference dataset for determining Mapillary and Google Street View data completeness,
including for off-road data analysis.
3 Study Design
3.1 Study Area
In Germany, Google Street View coverage is available only for the 20 largest cities and
some smaller towns, such as Oberstaufen. For our analysis we selected the 40 largest cities
in Germany, comprising 20 cities with and 20 cities without Google Street View imagery.
This split allows us to study the potential effect of the availability of Street View
imagery on Mapillary coverage in different cities. We also examined the
Mapillary coverage in the four largest Austrian cities. The effect of Google Street View
imagery on Mapillary coverage could not be assessed for Austrian cities since this Google
service is not available in Austria. The German and Austrian cities comprising the study
area are shown in Fig. 1(b) (green and blue areas). Red lines indicate Mapillary coverage
outside the selected cities. As can be seen, the data collection efforts of the Mapillary com-
munity have so far focused on urban areas.
Fig. 1: Street level image as shown on the Mapillary Website (a) and study area (b)
3.2 Data Preparation
3.2.1 Tile System
The completeness analysis of this study is based on raster tiles. To facilitate the comparison
of Google Street View data with other datasets, we adhered to the tile specifications used in
Google Street View. Tiles of the Street View coverage are accessible via the Google Maps
API and provided as 256 x 256 pixel PNG images. In this system, the world is divided into
tiles corresponding to zoom levels. In each zoom level, tiles are indexed by X (column) and
Y (row) values starting from the top-left corner. Zoom level 0 covers the whole world in
one tile. The number of tiles in each zoom level is $2^{2z}$, where z is the zoom level. Logically,
this system can be considered a hierarchy of folders and files, where each zoom level is a
folder, each X coordinate is a subfolder and each Y coordinate is a PNG image file. This
so-called XYZ tile scheme provided by Google became a de facto standard in Web mapping,
and is used by other map providers including Bing Maps, Yahoo Maps, OSM, and Mapbox.
Using this schema allows one to convert geographic coordinates to tile coordinates and the
other way round. As a preliminary step in our analysis, we converted all data sources into
this schema.
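The conversion between geographic coordinates and tile coordinates can be sketched as follows. This is a minimal illustration of the commonly used Web Mercator tile formulas, not the script used in this study; the function names are chosen here for illustration only.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 coordinate."""
    n = 2 ** zoom  # number of tiles per axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_latlon(x, y, zoom):
    """Return (lat, lon) of the top-left (north-west) corner of tile (x, y)."""
    n = 2 ** zoom
    lon_deg = x / n * 360.0 - 180.0
    lat_rad = math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n)))
    return math.degrees(lat_rad), lon_deg


def tile_count(zoom):
    """Total number of tiles at a zoom level: 2^z columns times 2^z rows."""
    return 2 ** (2 * zoom)
```

Passing fractional tile indices to tile_to_latlon (e.g. x + 0.5, y + 0.5) yields the tile centre instead of its corner.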
3.2.2 Google Street View
Google does not provide information on how its Street View tiles are generated. For this
study, zoom level 13 tiles were chosen for the analysis, but Street View coverage tiles are
highly generalized by default (Fig. 2a) at that level. To make them comparable to the
Mapillary dataset, tiles were regenerated (Fig. 2b). To do so, a vector version of Street
View lines was extracted based on the Street View coverage, which was downloaded at
zoom level 17 (i.e. further zoomed in), using a client-side script in September 2014. As a
result of this step, all PNG tiles appeared in the Web browser’s cache. Individual images of
zoom level 17 were then extracted from the cache and stored within the XYZ tile folder
structure. This zoom level corresponds to a ground pixel size of approximately 1.2 m. After
geocoding each tile, the PNG raster tiles were loaded into GRASS GIS, where they were
patched together, a thinning algorithm was applied, and the 1 pixel wide raster images were
vectorised. All extracted lines were then uploaded to a PostgreSQL database with line ge-
ometries.
Fig. 2: Original (a) and regenerated (b) Street View tiles at zoom level 13
As the last step, the TileMill and Mapnik toolkits were used to render tiles at zoom level 13.
Background pixels had a value of 0. The pixel size at this zoom level is ~19 m, which
conceals GPS positioning errors that occurred during the data collection process.
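The pixel sizes quoted for zoom levels 13 and 17 can be checked with the standard ground resolution formula of the Web Mercator tile scheme. The sketch below assumes 256 pixel tiles and the WGS84 semi-major axis; the constant and function names are illustrative. The quoted values of ~19 m and ~1.2 m match the nominal resolution at the equator, and the formula gives smaller values at higher latitudes.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis used by Web Mercator
TILE_SIZE_PX = 256          # tile edge length in pixels


def ground_resolution(zoom, lat_deg=0.0):
    """Metres per pixel at the given zoom level and latitude."""
    circumference = 2.0 * math.pi * EARTH_RADIUS_M
    return circumference * math.cos(math.radians(lat_deg)) / (TILE_SIZE_PX * 2 ** zoom)


def tile_extent(zoom, lat_deg=0.0):
    """Edge length of one tile in metres."""
    return TILE_SIZE_PX * ground_resolution(zoom, lat_deg)
```

For example, ground_resolution(13) is about 19.1 m and tile_extent(13) is about 4.9 km, whereas ground_resolution(13, 51.0) is about 12 m.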
3.2.3 Mapillary
Mapillary offers various methods to download data via their JSON API. For this study,
image sequences were used, which are LineStrings of coherent images and their attributes.
Images are taken one after another by walking, driving, or riding a bike. Although geome-
tries and additional attributes of each sequence can be downloaded, the Mapillary team
provided us with a database dump that included some additional information, including user
ID, timestamp, and geometry of each sequence. The lines are GPS trajectories, where each
node is the position of an image. Individual GPS trajectories are mapped as separate lines
even if they were taken on the same road. Using map tiles with a zoom level of 13 (pixel
size ~19 m) was found to be an efficient method to avoid double counting sequences on the
same road. This means that segments of multiple lines are counted as only one line if they
fall within the same pixel. Mapillary tiles were then rendered with the same parameters
used to render Street View tiles. We used a dataset that contains LineStrings of photos
uploaded to Mapillary up until November 18, 2014.
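The effect of rasterisation on double counting can be illustrated with a small sketch. It maps image positions to global pixel coordinates at zoom level 13 and collects them in a set, so that positions from different sequences falling into the same pixel are counted once. This is a simplification under stated assumptions: only the image positions (nodes) are marked, whereas a tile renderer also draws the line between consecutive nodes, and the function names are hypothetical.

```python
import math


def latlon_to_pixel(lat_deg, lon_deg, zoom, tile_size=256):
    """Global pixel (column, row) of a WGS84 coordinate in the XYZ scheme."""
    n = tile_size * 2 ** zoom  # pixels per axis at this zoom level
    px = (lon_deg + 180.0) / 360.0 * n
    py = (1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n
    return int(px), int(py)


def covered_pixels(sequences, zoom=13):
    """Set of pixels touched by any image position in any sequence."""
    pixels = set()
    for sequence in sequences:
        for lat, lon in sequence:
            pixels.add(latlon_to_pixel(lat, lon, zoom))
    return pixels
```

Because the result is a set, the number of covered pixels does not grow when a second contributor photographs the same stretch of road.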
3.2.4 OpenStreetMap
An OSM database dump was downloaded from Geofabrik in October, 2014. All roads were
extracted with Osmosis using a highway=* filter (any value of the highway key), and uploaded into a spatially-enabled
PostgreSQL database. Since this study uses certain OSM road categories, additional queries
were formulated to extract the following road categories:
• Main roads: connect settlements and cities
• Residential roads: minor, lower level roads with moderate traffic
• Pedestrian/Cycle roads: minor elements of the road network used by pedestrians or
  cyclists for daily routine or recreational purposes
Inaccessible roads, sidewalks, road crossings, tunnels, and indoor features were excluded
based on their tags. Further, all pedestrian and bicycle highway features within 25 m from
main and residential roads were removed to be able to assess completeness of off-road
pedestrian and bicycle features in Mapillary and Google Street View. In a final step, OSM
map tiles were generated for zoom level 13 in the same way as previously done for Google
Street View and Mapillary data.
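The 25 m exclusion rule can be sketched as a simple distance test. The example below is an illustration only and not the database query used in this study. It assumes that all coordinates are given in a projected coordinate system with metres as units, and it uses a simplified criterion in which a path counts as off-road only if none of its vertices lies within the threshold distance of any road segment.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to segment a-b (all (x, y) in metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def is_off_road(path, roads, threshold_m=25.0):
    """True if every vertex of path is farther than threshold_m from all roads."""
    for p in path:
        for road in roads:
            for a, b in zip(road, road[1:]):
                if point_segment_distance(p, a, b) <= threshold_m:
                    return False
    return True
```

With a road running from (0, 0) to (100, 0), a parallel footpath at an offset of 10 m is rejected, whereas a park trail at an offset of 60 m is kept.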
3.3 Determination of Relative Completeness Between Data Sources
3.3.1 Mapillary vs. Google Street View
20 out of the 40 analysed cities in Germany provide Google Street View service, which
allows a comparison between Mapillary and Street View coverage. Since Mapillary is a
relatively new service, it can be expected that Street View has better coverage in most ana-
lysed areas. However, since Street View is mostly bound to car accessible roads, it can also
be expected that Mapillary exceeds Street View coverage in some off-road areas, e.g. at
recreational sites inaccessible to cars.
A self-developed Python script compared tiles from the Mapillary and Street View datasets
within predefined boundaries. The script loaded each tile from the two datasets and counted
the non-zero pixels within the tile. As a result, each unique tile identifier could be
associated with the counts of Mapillary and Street View pixels of that tile. The geographic
extent of a tile at zoom level 13 is ~4.9 x 4.9 km. Each tile was subsequently divided into
16 squares with an extent of ~1.2 x 1.2 km each. This provided enough detail to illustrate
local differences in mapping completeness at the city level. The resulting reference system
is identical to the tile system at zoom level 15. Results were uploaded into a PostgreSQL
database with polygon vector geometries. A relative completeness difference that
ranges between -1 and 1 was calculated (Equation 1a). A value of 1 indicates that a tile
contains only Street View coverage but no Mapillary data, whereas -1 means the opposite.
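The counting step can be sketched as follows. The example is not the original script. It assumes that a tile is available as a square list of pixel rows whose size is divisible by four (a 256 pixel tile yields 16 blocks of 64 x 64 pixels), and it assumes the normalised form d = (SV - Map) / (SV + Map) for the relative completeness difference, which satisfies the range and the interpretation of the end points described above.

```python
def block_counts(tile, blocks_per_side=4):
    """Count non-zero pixels in each sub-square of a square tile (list of rows).

    Assumes the tile size is divisible by blocks_per_side.
    """
    size = len(tile)
    step = size // blocks_per_side
    counts = [[0] * blocks_per_side for _ in range(blocks_per_side)]
    for row in range(size):
        for col in range(size):
            if tile[row][col] != 0:
                counts[row // step][col // step] += 1
    return counts


def relative_difference(sv, mapillary):
    """Normalised difference in [-1, 1]; None when both counts are zero."""
    total = sv + mapillary
    if total == 0:
        return None
    return (sv - mapillary) / total
```

A square with 3 Street View pixels and 1 Mapillary pixel yields a value of 0.5, and squares without any coverage are left undefined rather than set to 0.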
3.3.2 Completeness Relative to OpenStreetMap
The method to calculate Mapillary and Street View completeness relative to predefined
OSM road categories is similar. The difference lies in how pixels are counted. Instead of
counting all pixels within a tile, this method identifies only those Mapillary and
Street View pixels that overlap with pixels of selected OSM road categories. Since the
reference tile system is the same for all datasets, it is sufficient to identify the position of
OSM road pixels in a tile, and then check whether that specific pixel has a value of one in
the other datasets or not. Results were again uploaded to a PostgreSQL database. Overall
completeness of each city can then be calculated by selecting tiles within the city boundary
and computing the fraction of Mapillary or Street View pixels overlapping with OSM road
pixels in the selected tiles (Equation 1b). Calculations were performed on all predefined
OSM road categories for Mapillary and Street View.
(a) $d = \frac{SV - Map}{SV + Map}$, for $SV + Map > 0$
Where d is the relative completeness difference, SV is the count of Street View pixels and
Map is the count of Mapillary pixels.
(b) $C_r = \frac{\sum Px_r}{\sum OSM_r}$
Where $C_r$ is the completeness of Mapillary or Street View on a road category, $Px_r$ is a
pixel overlapping with a pixel from the OSM road category r and $OSM_r$ is a pixel of the
OSM road category r.
Equation 1: Relative completeness difference (a) and computation of completeness (b)
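The computation of completeness relative to an OSM road category can be sketched in a few lines. The example assumes that the OSM tile and the Mapillary or Street View tile are aligned lists of pixel rows of equal size; the function names are illustrative and the code is not the original script.

```python
def overlap_counts(osm_tile, other_tile):
    """Return (overlapping pixels, OSM road pixels) for one pair of aligned tiles."""
    overlap = 0
    osm_total = 0
    for osm_row, other_row in zip(osm_tile, other_tile):
        for osm_px, other_px in zip(osm_row, other_row):
            if osm_px != 0:
                osm_total += 1
                if other_px != 0:
                    overlap += 1
    return overlap, osm_total


def city_completeness(tile_pairs):
    """Completeness in percent over all tiles of a city; None without OSM pixels."""
    overlap = 0
    osm_total = 0
    for osm_tile, other_tile in tile_pairs:
        tile_overlap, tile_total = overlap_counts(osm_tile, other_tile)
        overlap += tile_overlap
        osm_total += tile_total
    if osm_total == 0:
        return None
    return 100.0 * overlap / osm_total
```

Pixel counts are summed over all tiles before the ratio is taken, so tiles with many road pixels contribute more to the city value than tiles with few.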
4 Results
4.1 User Contributions
Fig. 3(a) shows the number of Mapillary users actively contributing in Germany and Austria
for each month. The majority of users contribute on a regular basis (returning users).
By November 2014 the total number of contributing users had reached 388. Mappers covered
sequences of more than 46,500 km, whereby it must be noted that some of the sequences
were taken on the same roads. Fig. 3(b) shows the distribution of data contributors for various
total distance ranges. The most active contributor uploaded pictures along almost 6,100
km to Mapillary. Fig. 3(b) also reveals inequalities in data contributions among volunteers,
which have been previously identified for other VGI data sources, such as OSM (NEIS
& ZIELSTRA 2014) or drone images (HOCHMAIR & ZIELSTRA 2014). In those VGI data
sources, as with Mapillary, few users contribute most of the data, whereas the majority
of users make only few data contributions. In the case of Mapillary, more than 60%
(205) of the users contributed less than 10 km. Conversely, only 5% (19) mapped 500 km
or more. Another aspect of the contribution pattern is the number of cities a user mapped in.
Fig. 3(c) shows that a large share of users (40%) contributed in only one city, as opposed
to only 4% of users who contributed in 5 or more study cities, which again reflects
contribution inequality.
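Measures of contribution inequality such as those reported above can be derived from a list of total distances per user. The sketch below is illustrative only; the distances in the usage note are invented for demonstration and are not taken from the Mapillary dataset.

```python
def share_below(distances_km, threshold_km):
    """Fraction of users whose total mapped distance is below the threshold."""
    return sum(1 for d in distances_km if d < threshold_km) / len(distances_km)


def top_share(distances_km, top_fraction):
    """Fraction of the total distance contributed by the most active users."""
    ranked = sorted(distances_km, reverse=True)
    # At least one user is always included in the top group.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)
```

For a hypothetical list of ten users with distances of 1 to 9 km and one user with 900 km, share_below(distances, 10) returns 0.9 and top_share(distances, 0.1) returns about 0.95.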
Fig. 3: User contributions to Mapillary
4.2 Completeness
4.2.1 Relative Completeness Difference Between Mapillary and Street View
The relative completeness difference between Mapillary and Google Street View data was
determined for the 20 largest cities in Germany. This computation was not possible for
Austria, where Street View is not provided. First, all tiles within a city were selected, then
all relevant pixels within those tiles were used to calculate the relative completeness differ-
ence based on Equation 1a. Values range between 0.38 (Dresden) and 0.98 (Düsseldorf).
The values are all positive, meaning that Google Street View provides better overall cover-
age in the analysed cities.
The median of the relative completeness differences is 0.84. An outlier (Dresden) with a value
of 0.38 indicates that this city has high Mapillary coverage compared to other cities. Fig.
4(a) and 4(b) visualize patterns of relative completeness difference for Dresden and Berlin
(the latter with a value of 0.65). Purple tiles (without borders) show areas where more roads
are mapped in Google Street View, whereas orange tiles (with borders) show areas with
higher Mapillary coverage. The low relative completeness difference for Dresden matches
the visual appearance of the tile map with its large portions of orange areas in the suburbs
and the town centre. In contrast, Berlin shows fewer orange areas. Whereas Street View
imagery mostly relies on the road network, Mapillary photos are often taken off-road or on
minor roads. Typical examples can be found on riversides. For example, Mapillary is more
complete in a park in Dresden (Fig. 4(c)) and along a river in Frankfurt (Fig. 4(d)). In general,
differences between both data sources are smaller in city centres or other populated areas,
indicated through lighter colours.
Fig. 4: Spatial distribution of relative completeness difference (a-b) and examples of
better Mapillary coverage (c-d)
4.2.2 Completeness Relative to OpenStreetMap Features
At the city level, Street View is more complete than Mapillary, but it is limited to the 20
biggest German cities, with some exceptions. Completeness values relative to OSM are
shown in Table 1 for some selected German cities and the largest four cities in Austria.
Table 1: Completeness relative to selected OSM road categories (%)
City        OSM Main             OSM Residential      OSM Ped./Cycle       Street View
            Google   Mapillary   Google   Mapillary   Google   Mapillary
Austria
Vienna      0        13.93       0        2.26        0        0.80
Graz        0        32.72       0        16.25       0        7.00
Linz        0        8.82        0        2.39        0        0.82
Salzburg    0        67.73       0        13.47       0        5.68
Germany
Berlin      91.44    42.41       81.44    8.10        1.67     1.86        X
Hamburg     83.35    15.36       72.80    1.84        2.15     0.54        X
Munich      90.08    22.11       82.81    3.76        2.43     1.52        X
Frankfurt   82.72    14.95       63.66    1.37        2.62     3.63        X
Düsseldorf  87.97    1.32        74.44    0.19        3.30     0.19        X
Dortmund    85.72    20.82       79.02    2.51        3.76     0.65        X
Bremen      85.33    6.46        75.49    0.45        1.85     0.28        X
Dresden     86.77    59.01       78.10    18.75       2.18     4.65        X
Nuremberg   92.10    22.94       80.19    3.21        2.96     0.54        X
Duisburg    88.84    4.31        83.99    0.07        3.49     0.12        X
Bielefeld   77.11    20.50       58.23    1.50        1.72     0.67        X
Major roads are well covered in Google Street View, with completeness ranging between
77% (Bielefeld) and 92% (Nuremberg). Numbers are also high for residential roads,
ranging between 58% (Bielefeld) and 84% (Duisburg), but much smaller for off-road
pedestrian/cycle paths, with a maximum of 4%. Completeness is considerably smaller for
Mapillary in all categories, with a few exceptions in the pedestrian/cycle category. For the
latter category it is possible that some of the positive percentage numbers in Google Street
View stem from images taken by cars along roads in the vicinity of pedestrian or cycling
paths. This situation can occur near roads that were not considered in the exclusion step
(only pedestrian and cycle features near main or residential roads were removed) but are
still driveable, such as “track” or “service” roads in OSM. Salzburg provides the most
complete coverage of main roads in Mapillary among all cities with a value of 68%.
Considering all 44 selected cities, main roads in Mapillary are mapped most completely
(18.6%), followed by residential roads (3.2%) and pedestrian or cycling paths (1.1%).
No association could be identified between city size and data completeness in Mapillary.
For example, some of the larger cities, such as Düsseldorf (pop. ~ 590,000), Bremen (pop.
~ 550,000), or Duisburg (pop. ~ 490,000) are poorly covered in Mapillary. Previous research
suggests that the availability of free datasets from other sources in a region diminishes
the community’s motivation for voluntary data collection efforts, e.g. in
the case of OSM (ZIELSTRA & HOCHMAIR 2011). However, a comparison of mean Mapillary
completeness between cities with and without Google Street View did not reveal a
statistically significant difference between both groups of cities.
4.2.3 Data Growth
Fig. 5(a) shows the growth of average Mapillary completeness values for the three OSM
road categories and the selected 44 cities. Completeness is growing more rapidly for main
roads than for the other road categories. Fig. 5(b)-(d) show the completeness growth curves
for five selected cities in the three road categories. Dresden and Salzburg show the highest
current completeness in all three categories. A very active mapping period can be identified
between May and July 2014 in Salzburg. For the other four selected cities, growth slowed
down in August 2014. However, this trend cannot be observed when considering all 44
selected cities (Fig. 5(a)).
Fig. 5: Completeness growth in Mapillary data: Average for three road categories in
44 cities (a), and completeness in five selected cities for three road categories
(b)-(d)
5 Conclusion and Future Work
This study analysed the completeness and development of Mapillary data, which presents
an alternative data source of street level photographs to commercial datasets. This study
focused on 40 German and 4 Austrian cities. A comparison with Google Street View
showed that Google Street View provides better coverage than Mapillary, with a few
exceptions. Since Mapillary is not bound to professional equipment that needs to be moved
by car, it could become a complementary data source of street level imagery to Google
Street View for footpaths and bicycle trails, even in those cities where Google Street View
is present.
For future work we plan to extend the spatio-temporal analysis of Mapillary data beyond
Germany and Austria. It remains to be seen whether privacy concerns of local residents will
hamper the continuous growth of Mapillary, as was the case for Google Street View in some
European countries. Another aspect of future work is to analyse to what extent Mapillary is
used as a data source for OSM. Tags in the source field of OSM features indicate that OSM
mappers are already using Mapillary imagery to add landmark point features, such as bus
stops, which can be visually identified on Mapillary street level photographs. Like Mapillary,
OSM also had one of its most active regions in Germany during the beginning of the project.
We therefore plan to analyse whether the same group of users that pushed the
OSM project is also actively contributing to the Mapillary project, or whether Mapillary
reaches out to a new crowd of voluntary mappers.
References
CLARKE, P., AILSHIRE, J., MELENDEZ, R., BADER, M. & MORENOFF, J. (2010), Using
Google Earth to conduct a neighborhood audit: Reliability of a virtual audit instrument.
Health & Place, 16 (6), 1224-1229.
GIRRES, J. F. & TOUYA, G. (2010), Quality assessment of the French OpenStreetMap
dataset. Transactions in GIS, 14 (4), 435-459.
GOODCHILD, M. F. (2007), Citizens as Voluntary Sensors: Spatial Data Infrastructure in the
World of Web 2.0 (Editorial). International Journal of Spatial Data Infrastructures
Research (IJSDIR), 2, 24-32.
GRIEW, P., HILLSDON, M., FOSTER, C., COOMBES, E., JONES, A. & WILKINSON, P. (2013),
Developing and testing a street audit tool using Google Street View to measure
environmental supportiveness for physical activity. International Journal of Behavioral
Nutrition and Physical Activity, 10 (1), 103.
HAKLAY, M. (2010), How good is Volunteered Geographical Information? A comparative
study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B:
Planning and Design, 37 (4), 682-703.
HARA, K., LE, V., SUN, J., JACOBS, D. & FROEHLICH, J. E. (2013), Exploring Early
Solutions for Automatically Identifying Inaccessible Sidewalks in the Physical World
Using Google Street View. Human Computer Interaction Consortium 2013.
HOCHMAIR, H. H. & ZIELSTRA, D. (2014), Analysing User Contribution Patterns of Drone
Pictures to the dronestagram Photo Sharing Portal. Journal of Spatial Science.
HOCHMAIR, H. H., ZIELSTRA, D. & NEIS, P. (2015), Assessing the Completeness of Bicycle
Trail and Designated Lane Features in OpenStreetMap for the United States.
Transactions in GIS, 19 (1), 63-81.
NEIS, P. & ZIELSTRA, D. (2014), Recent developments and future trends in volunteered
geographic information research: The case of OpenStreetMap. Future Internet, 6 (1),
76-106.
RUNDLE, A. G., BADER, M. D. M., RICHARDS, C. A., NECKERMAN, K. M. & TEITLER, J. O.
(2011), Using Google Street View to Audit Neighborhood Environments. American
Journal of Preventive Medicine, 40 (1), 94-100.
VANWOLLEGHEM, G., DYCK, D. V., DUCHEYNE, F., BOURDEAUDHUIJ, I. D. & CARDON, G.
(2014), Assessing the environmental characteristics of cycling routes to school: a study
on the reliability and validity of a Google Street View-based audit. International Journal
of Health Geographics, 13 (19).
ZAMIR, A. R. & SHAH, M. (2010), Accurate Image Localization Based on Google Maps
Street View. In: DANIILIDIS, K., MARAGOS, P. & PARAGIOS, N. (Eds.), Computer Vision
– ECCV 2010. Springer, Berlin/Heidelberg, 255-268.
ZIELSTRA, D. & HOCHMAIR, H. H. (2011), A Comparative Study of Pedestrian Accessibility
to Transit Stations Using Free and Proprietary Network Data. Transportation Research
Record: Journal of the Transportation Research Board, 2217, 145-152.